Automated preliminary design using artificial neural networks


Material Information

Title:
Automated preliminary design using artificial neural networks
Physical Description:
vi, 276 leaves : ill. ; 29 cm.
Language:
English
Creator:
Garcelon, John H
Publication Date:

Subjects

Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1995.
Bibliography:
Includes bibliographical references (leaves 266-275).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by John H. Garcelon.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002111552
notis - AKV1137
oclc - 36019056
System ID:
AA00003192:00001

Full Text








AUTOMATED PRELIMINARY DESIGN USING ARTIFICIAL NEURAL
NETWORKS












By

JOHN H. GARCELON


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE
UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE
REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY



UNIVERSITY OF FLORIDA



1995














TABLE OF CONTENTS

ABSTRACT

INTRODUCTION
    Computational Models of Design Processes
    Iterative Design Processes
        The Synthesis Process
        The Analysis Process
        Evaluation and Redesign Processes
    A Role for Neural Network Systems
    Overview of Approach
    Organization

DESIGN THEORY AND METHODOLOGY
    What Is Design
    Design Requirements
    Models of Design Processes
        Prescriptive Design Process Models
        Descriptive Design Process Models
        Computational Design Process Models
    Artificial Neural Networks in Engineering
    Connectionism as a Computational Model

ARTIFICIAL NEURAL NETWORKS
    Historical Perspective
    The Neuron
        The Biological Neuron
        The Computational Neuron
            Input stage
            Activation stage
            Output stage
    Networks of Neurons
        Network Structure
        Distributed and Local Representations
        Network Dynamics
        Summary
    Learning
    Constraint Satisfaction

HARMONY THEORY NETWORKS
    Qualitative Reasoning
    Examples
        Structure 1
        Structure 2
        Structure 3
    Summary

BACKPROPAGATION NETWORKS
    Background
    Backpropagation of Error
        Forward Pass--Calculating the Output
        Backward Pass--Adjusting the Weights
    Learning Rate, Weight Initialization, and Momentum
    Variations
        Flat Spots
        Symmetric Activation Function
        Hyperbolic Arctangent Error Function
        Newton's Method
        Quickprop
        Learning Rate Adaptation
    Summary

QUIKPROP DESIGN AND IMPLEMENTATION
    Object-Oriented Programming
        Abstraction
        Encapsulation
        Inheritance
    Object-Oriented Program Design
        Identifying Classes
        Assigning Attributes and Behavior
        Identifying Relationships Between Classes
        Creating a Class Hierarchy
    Design and Implementation of QuikProp
        Goals and Requirements
        Design
    Using QuikProp
        Input Files
            Definition file
            Training file
            Scale factors file
            Test file
        Test mode
        Run mode
        Output Files
            Output file
            Plot file
            Weights file

PRELIMINARY DESIGN STUDIES
    Neural Engineering
        Problem and Paradigm Selection
        Generalization
            Network Size
            Hidden Neurons
        Training
    Beam Design Example
        Problem Definition and Setup
        Measuring Network Performance
        Recall--Test 1
        Generalization--Test 2
        Generalization--Test 3
        Generalization--Test 4
        Generalization--Test 5
        Generalization--Test 6
        Evaluation
    Binary Beam Design Example
        Recall--Test 1
        Generalization--Test 2
        Evaluation
    Frame Design Example
    Summary

CONCLUSIONS
    Summary
    Evaluation
    Future Work

APPENDIX A

APPENDIX B

BIBLIOGRAPHY







Abstract of Dissertation Presented to the Graduate School of the University of Florida in
Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy


AUTOMATED PRELIMINARY DESIGN USING ARTIFICIAL NEURAL
NETWORKS

By

John H. Garcelon


August 1995


Chairman: Dr. Gale E. Nevill, Jr.


Major Department: Aerospace Engineering, Mechanics, and Engineering Science

This dissertation investigates the applicability of artificial neural network systems to preliminary engineering design tasks. Synthesizing new, possibly innovative designs by exploring the development of structural topologies and determining their possible behaviors are two steps of preliminary design where this research concentrates. These two areas of preliminary structural design have proven difficult for design researchers. Using the neural network approach toward these tasks is feasible, but issues such as representing design problems in neural networks, collecting good design examples, and measuring network performance are still unresolved.

This research begins by examining philosophies of design, which provides a basis for later discussions. In particular, the influence of design automation and computational models of design processes on the science of design is considered.

Next, this work provides an introduction to artificial neural networks. Two classes of neural models, constraint satisfaction and supervised learning models, are examined and provide a cornerstone for development of a model that uses induction in an attempt to learn from design examples, generalize results, and generate preliminary structural designs.

A major bottleneck in developing most knowledge-based systems is acquiring and representing requisite knowledge. Supervised learning models of connectionism have the potential to alleviate this obstacle. The second neural network system discussed and demonstrated is a hybrid backpropagation model. This system can learn from examples of previous designs and is able to generate new designs.

In addition to design issues, the discussion of connectionist models includes details of the different models, their performance, attributes, integrity, and shortcomings. The results of this research are an initial investigation into connectionism as applied to design. Both connectionism and the theory of design are relatively young in terms of formal research when compared to traditional areas of engineering and science. This work contributes to the maturing effort and identifies promising areas for further research.












INTRODUCTION


The objectives of this chapter are to provide a general framework for preliminary

structural design processes considered in this research and identify the role of neural

network systems within this framework. This chapter also includes an overview of the

approach taken here and the motivations for exploring neural network techniques. The last

section of this chapter gives the organization of the remainder of this thesis.


Computational Models of Design Processes


Computational models of design concentrate on two broad areas. First,

computational models focus on how computers can design or assist in designing artifacts.

In this area, computational design models can describe, replicate, or simulate the cognitive

process that human designers employ, or they can describe how a computer can

accomplish some design task. These models can be derived from observation of human

designers, but not necessarily. Secondly, they can serve as a controlled environment for

research into design theory. By providing a design system that can reproduce results in a

consistent, logical fashion, computational models allow design researchers to examine

different processes and theories into the nature of design itself.

Until recently, computational models of design have concentrated on designing solely for function and fit. An artifact's preliminary design has ignored the implications of issues such as disposability.

Designers have traditionally considered these issues only after important design decisions and commitments have been made, resulting in designs that could not meet life-cycle requirements from conception to disposal.

Figure 1: Mapping Between Abstraction Levels

The economic cost resulting from this practice has led to a


growing interest in what designers call design for manufacture, concurrent design,

simultaneous engineering, and design for the life cycle. Computational models of

preliminary design should address these issues.

Models of design normally consider processes that map an explicit set of design requirements into a description of a physically realizable object that satisfies those given requirements. These models are incremental and iterative in nature and consist of several stages or steps. Bell et al. [Bell91] describe a general process where an abstract design goes through a series of mappings, iterative redesign steps, and optimizations, as shown in Figure 1. Given a set of initial design requirements, the design process improves the artifact through some iterative design and/or optimization procedure until it can no longer make further progress. At this point, the computational process maps the artifact to a less abstract stage using all information available. Then the design process is repeated. This process would typically increase the abstraction level, moving backwards, and therefore increase the design time.1
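The mapping model just described can be read as a simple control loop: improve the design at the current abstraction level until no further progress can be made, then map the design to a less abstract level and repeat. The following Python sketch, added here only as an illustration, expresses that control flow; the functions passed in (improve, map_to_less_abstract, is_fully_detailed) are hypothetical placeholders rather than components of any system described in this work.

    def general_mapping_model(requirements, design, improve,
                              map_to_less_abstract, is_fully_detailed):
        """Sketch of the general mapping process attributed to Bell et al. [Bell91].

        improve(design, requirements) returns (new_design, made_progress);
        map_to_less_abstract(design) carries the design down one abstraction level.
        """
        while not is_fully_detailed(design):
            # Iterative design and/or optimization at the current abstraction level.
            progressing = True
            while progressing:
                design, progressing = improve(design, requirements)
            # No further progress is possible here, so map the artifact to a less
            # abstract stage using all information available, then repeat.
            design = map_to_less_abstract(design)
        return design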

What is not clear about this general mapping model is how the mapping between abstraction levels is identified and performed. Human designers easily apply these mappings at convenient times; however, it has been difficult for existing computational models to mimic this activity. This research attempts to apply artificial neural networks to this task, one in which knowledge-based systems have had difficulty.


Iterative Design Processes


Nevill et al. [Nevill89a] and Flemming et al. [Flemming92] describe the iterative design process that occurs between abstraction mappings. Both of these models are similar; however, it is illustrative to note their differences. Nevill et al. characterize a design model with the following phases:

Evaluation of the status of an artifact's design with respect to the design requirements,
Generation of candidate design steps,
Prediction of the implications of the candidate design steps,
Selection of a candidate design step,
Implementation of the candidate design step, and
Notification of the implications of that step.
Flemming et al. describe a similar incremental, iterative design process model. This model describes the resulting artifact by its form, function, and behavior, and it involves four stages: synthesis, analysis, evaluation, and redesign. These stages are described as follows:

Synthesis is the process of developing one or more candidate forms given a set of design requirements.
Analysis is the process of determining each candidate's behavior.
Evaluation is the process of comparing the behavior and candidate form to the requirements.
Redesign is the process of further refinement and selection of one or more candidate artifacts using information gained from the evaluation of current and earlier candidate designs.
Both computational models are iterative, but the primary difference between these two computational design models is that Flemming et al. [Flemming92] explicitly identify the importance of design requirements, whereas Nevill et al. [Nevill89a] imply their significance. In addition, the Nevill et al. model considers design as a constraint satisfaction process (i.e., the explicit Notification step), whereas the Flemming et al. model is less specific and does not prescribe a design methodology. These differences illustrate that there is no single design process model that has been accepted by the design community.

As design theory research has progressed over the past decade, most computational models have adopted like approaches. Figure 2 shows a schematic diagram of an incremental, iterative computational design model with each of the design stages described by Flemming et al. [Flemming92] and Nevill et al. [Nevill89a]. The following sections describe each design stage as it pertains to this research.
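As a concrete reading of Figure 2, the following sketch strings the four stages described by Flemming et al. into a single loop: synthesize candidate forms, analyze their behavior, evaluate them against the requirements, and redesign until some candidate satisfies the requirements. This is only an illustration written for this discussion; the stage functions and the "satisfies" flag are hypothetical placeholders, not code from this research.

    def iterative_design(requirements, synthesize, analyze, evaluate, redesign,
                         max_iterations=100):
        """Illustrative synthesis-analysis-evaluation-redesign loop (cf. Figure 2)."""
        candidates = synthesize(requirements)            # candidate forms from the requirements
        for _ in range(max_iterations):
            behaviors = [analyze(form) for form in candidates]
            evaluations = [evaluate(form, behavior, requirements)
                           for form, behavior in zip(candidates, behaviors)]
            if any(result["satisfies"] for result in evaluations):
                break                                    # some candidate meets the requirements
            # Refine and select candidates using information from the evaluations.
            candidates = redesign(candidates, evaluations)
        return candidates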






The Synthesis Process


The synthesis process maps functional requirements to a description of an artifact's form, which includes its geometry, topology, shape, and even materials. Finger and Dixon [Finger89a] call this both conceptual design and configuration design. According to Finger and Dixon, these are different steps in the sense that in conceptual design an artifact's function is explicit and is used to generate new designs; however, in configuration design, an artifact's function is usually implicit and is used to evaluate designs. Both conceptual and configuration design stages are necessary to generate an artifact's form. Research models for synthesis processes are only just beginning to appear. This work investigates a neural network approach to this stage of an iterative design process for preliminary structural design.

Figure 2: Iterative Design Process

Preliminary structural design offers a difficult problem area where there are few constraints, requirements, and objectives that can be expressed in algebraic form. Formal numerical optimization is not readily adapted to this task. The relationships between form and performance are not clearly defined. There is not enough heuristic knowledge for preliminary structural design, nor even a domain theory, which contains the knowledge that a system can use in the problem solving process. Good conceptual structural designers rely significantly on their past design experience. Thus, both mathematical optimization and knowledge-based approaches lack the experiential knowledge needed for structural synthesis. Therefore, one of the goals of this research is to investigate and identify promising approaches to acquiring and using structural synthesis knowledge.


The Analysis Process

The purpose of the analysis process is to determine the behavioral characteristics of the forms developed from the synthesis process with respect to the functional requirements. At the initial stages of design, attributes of an artifact often are not yet fully described; however, analysis of preliminary designs is important before mapping to another, less abstract level. Without information concerning the behavior of partially instantiated artifacts, these designs can only be analyzed subjectively and implicitly. This research attempts to identify ways for better analysis of incomplete designs in order to explore more configuration and conceptual alternatives during synthesis. Since a general configuration without fully instantiated attributes can result from the synthesis stage, this research demonstrates a method of qualitative analysis of preliminary structural designs using the design's functional requirements and first principles of engineering.


Evaluation and Redesign Processes


This research does not attempt to explicitly investigate the evaluation and redesign

stages of iterative design processes.


A Role for Neural Network Systems


For design problems where design requirements can be expressed in algebraic form, formal mathematical optimization attempts to minimize (or maximize) an objective function without violating any constraints. In formal mathematical optimization, constraints are hard in the sense that a solution cannot violate any constraint. The preliminary design stage, and in particular the synthesis process, is extremely difficult if not impossible to cast directly into a mathematical optimization problem without first assigning values to design attributes to formulate an objective function and any associated constraints. The lack of both constraint and objective functions for form-function relationships makes numerical optimization a deficient approach for structural synthesis design processes at this time.

Knowledge-based preliminary design systems offer a different technique. Here automated design systems use heuristic knowledge about specific design domains to search through a space of possible design solutions for the one that best satisfies a set of design heuristics. The levels of knowledge required for these heuristics vary widely, from first principles to domain-specific knowledge. Design heuristics are difficult to develop because the amount of knowledge and the variation in its types make collections of heuristics sufficient for many design domains, specifically structural synthesis, almost impossible to compile. In addition, structural synthesis lacks a significant domain theory, a collection of knowledge that designers recognize and follow, which inhibits developing heuristics. The lack of a domain theory can be attributed to the chaotic and creative nature of preliminary design processes, which rely heavily on experiential knowledge that is difficult to qualify or quantify.

Good designers in most fields follow their personal, past design experiences. They take advantage of opportunity, divide complicated situations into manageable parts, and create abstractions that simplify design processes. Because both numerical optimization and knowledge-based design models lack experiential knowledge and a "complete" basis of first principles, they have difficulty performing as well as good human designers in design synthesis processes.

Human designers readily map between abstraction layers, oftentimes without identifiable rules. Although the abstraction layers they use may be of common types, different designers move between these layers in a chaotic manner. More likely than not, these designers are relying on their experience and intuition. If their foray into a less abstract description does not provide the desired effect, then humans easily adapt by becoming more abstract. This type of behavior is difficult to model using knowledge-based systems.

Artificial neural networks can overcome many of these obstacles. Neural networks can learn from experiences and work with large numbers of constraints or requirements. Learning allows neural systems to identify relationships and self-organize these relationships, producing a mapping between a set of inputs and some set of outputs.
relationships, producing a mapping between a set of inputs and some set of outputs.


The parallel nature and layered architecture of artificial neural network systems offer a potential for working with large amounts of interdependent information in a relatively efficient manner. Thus, artificial neural networks have the potential to organize and use the immense amount of requisite information characteristic of design problems. Ivezic and Garrett [Ivezic92] state that machine learning of synthesis knowledge facilitates a more effective synthesis process. One role for artificial neural networks in design synthesis domains is to acquire from past design experiences relationships between specified design requirements and physically realizable objects that satisfy those requirements. A goal of this research is to investigate and identify promising neural network approaches to preliminary structural design synthesis that can learn from previous design experiences and efficiently utilize large numbers of constraints.


Overview of Approach

Mapping between abstraction layers, developing structural topologies, and

determining possible behavior of candidate preliminary designs are three areas of

preliminary structural design synthesis that have proved difficult for design researchers.

The approach taken in this research concentrates on investigating neural network

approaches for these tasks. The following assumptions guide this research:

Design is by nature a multidisciplinary effort involving teams of designers with
different areas of expertise. This research will not attempt to invent a complete
computational design model for general structural design comprising complete,
large real world projects. Instead, this study will concentrate on small, compliant
structural designs that encompass as many different aspects of the preliminary
structural design domain as is computationally feasible.
Although design tasks are integrated, the state of current research into design and
neural networks predicates disconnected undertakings in these areas, primarily for
tractability.
This study considers three areas of preliminary structural design:

qualitative analysis of preliminary structural systems,
synthesis of preliminary structural designs,
and mapping between abstraction levels.
Design requirements are identifiable and can be expressed in some manner as goals
and constraints.







Based on the above assumptions, the goal of investigating and identifying promising neural

network approaches to preliminary structural design synthesis leads to the following sub-

goals:


Identify a neural network approach to managing constraints in preliminary
structural design for analyzing and synthesizing preliminary designs.
Identify a suitable neural network learning approach for acquiring structural design
synthesis knowledge.
Identify possible representations of abstract concepts and objects that are suitable for the neural systems and that simplify the design task at hand.
Thus, this dissertation discusses and illustrates the use of artificial neural networks to

manage design constraints and to acquire and represent synthesis knowledge for

preliminary structural design to achieve the goal of identifying useful neural network

approaches to preliminary design tasks.

Preliminary design models have been limited in their ability to acquire and reuse experiential knowledge. Preliminary design lacks a strong domain theory, which makes development of computational design models very difficult. An inductive learning

approach could acquire and then reuse knowledge embodied in past design experiences,

which are portrayed as successful and valid design cases. This knowledge could then be

brought to bear on synthesis, abstraction mapping, and constraint management tasks.

Analysis of preliminary designs to determine the behavioral characteristics of the

forms developed from the synthesis process has also been restricted by a lack of reusable

knowledge. In these cases, knowledge from basic principles to domain specific knowledge

is required; however, the lack of specific values for variables and the wide range of






may be used to manage those constraints in a constructive, intuitive manner compatible

with preliminary design processes, thereby providing an analysis capability limited or

nonexistent until this time.

This research focuses on neural network systems that manage constraints and learn

synthesis and abstraction mapping knowledge for preliminary structural design. The

following are motivations for investigating artificial neural networks:

Artificial neural networks can learn complex, nonlinear relationships from a sample
of input-output pairs that represent those relationships. The relationships need not
be explicit. By presenting a network with a training sample of previous designs, it
may learn those design relationships and develop an appropriate taxonomy (a minimal
training sketch follows this list).
During the learning stage, these systems store entities to be represented as a
pattern of activity distributed over many computing elements. Since the knowledge
is stored in the strengths of the interconnections between processing units, the
knowledge about any individual input-output pattern pair is not stored in the
connections of a special unit reserved for that pattern, but it is distributed over the
connections among many processing units. Distributed representations provide a
way to implement best-fit searches of a solution space, and they have the ability to
learn new concepts without having to increase the size of memory.
Because knowledge is distributed over many processing units in a trained neural
network, the system's response can be insensitive to slight variations in input,
gracefully degrade in these situations, allow for automatic generalization, and
produce novel outputs [Hinton86].
Some neural network training paradigms demonstrate inductive learning processes
where general, basic principles are derived from samples. Training sets must
include the principles that the system will learn in either explicit or implicit form.
Because artificial neural networks can learn, the knowledge acquisition bottleneck
associated with knowledge-based computational design models is alleviated.
Neural networks perform well on tasks similar to design, where there are large
numbers of constraints, partial information, and parallel tasks, such as
combinatorial optimization, pattern-recognition, speech understanding, and vision
processing [Fukushima87].
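As an illustration of the first motivation above, learning an implicit input-output relationship from a sample of examples, the following sketch trains a tiny feedforward network by gradient descent on a handful of invented requirement-to-attribute pairs. It is a generic toy written for this discussion, not the simulator developed in this research, and the numbers are placeholders.

    import numpy as np

    # Toy training set: normalized design requirements -> normalized design attributes.
    inputs  = np.array([[0.1, 0.9], [0.4, 0.3], [0.8, 0.7], [0.6, 0.2]])
    targets = np.array([[0.2], [0.5], [0.9], [0.4]])

    rng = np.random.default_rng(0)
    w1 = rng.normal(scale=0.5, size=(2, 4))   # input -> hidden weights
    w2 = rng.normal(scale=0.5, size=(4, 1))   # hidden -> output weights

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    learning_rate = 0.5
    for epoch in range(5000):
        hidden = sigmoid(inputs @ w1)              # forward pass
        output = sigmoid(hidden @ w2)
        error = output - targets                   # backward pass: backpropagation of error
        grad_out = error * output * (1 - output)
        grad_hid = (grad_out @ w2.T) * hidden * (1 - hidden)
        w2 -= learning_rate * hidden.T @ grad_out
        w1 -= learning_rate * inputs.T @ grad_hid

    print(sigmoid(sigmoid(inputs @ w1) @ w2))      # recall of the training pairs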








Organization


This section outlines the remainder of this thesis. The next chapter presents an

overview of design theory and methodology research with an emphasis on computational

models of design processes. Here a distinction between traditional knowledge-based

approaches and artificial neural networks is made and the motivation for exploring

artificial neural networks in the context of design is solidified. The third chapter reviews

artificial neural networks in a general sense. The fourth chapter describes a particular type

of artificial neural network called harmony theory and its use as a prototype computer-based model for qualitative analysis of preliminary designs. The fifth chapter goes into


detail about the backpropagation neural network paradigm. It describes the general theory

of backpropagation and includes enhancements and pseudo second-order methods for

learning. The sixth chapter details the development and implementation of a feedforward


neural network simulator used in this research.


The seventh chapter provides several


design examples exploring the use of feedforward neural networks for preliminary design.

The final chapter summarizes the results of this research and contains conclusions and

recommendations for further work in this field.













DESIGN THEORY AND METHODOLOGY


This chapter begins by providing a broad description of engineering design,

concentrating on the general theory and methodology. Common design models are then

reviewed which leads to an examination of computational design process models. By

developing a broad description of design and reviewing current work in engineering design

theory and in the development of computational models of design processes, this chapter

provides a perspective and the motivation for succeeding connectionist computational

design models described in later chapters.


What Is Design


The proper study of mankind is the science of design. [Simon69, page 83]


In The Sciences of the Artificial, Simon introduces the possibility of creating a science or


sciences of design. In this series of essays, he shows that it is possible to explain an

artificial science (as opposed to a natural science) and illustrate that artificial science's

nature. His two illustrative examples of artificial sciences in this book are the fields of

cognitive psychology and engineering design. Simon's work was one of the first such

essays to challenge design researchers to explore and define their science, and it has not


been, by any means, conclusive. Design researchers [Dixon88, Finger89a, Finger89b] have continued this exploration. By viewing engineering design as a science of the artificial, we can gain some insight into the question: what is design? After

defining design, we can then critically examine the theory and some existing computational

models of design.

If a natural science is knowledge about natural phenomena and objects that occur

in nature without human intervention, then an artificial science is knowledge about

artificial phenomena and objects. Simon identified four distinguishable indicators for

artificial phenomena and objects. They are as follows:


1. Artificial phenomena and objects are synthesized by humans.
2. Artificial phenomena and objects can imitate natural things.
3. Artificial phenomena and objects can be characterized by how they function, how
they attain goals, and how they have been adapted.
4. Artificial phenomena and objects are discussed and described in not only
descriptive terms but also in imperative terms that detail desired functioning or
goal achievement.
Using Simon's four indicators of artificial phenomena and objects and relating them to

design processes and artifacts, we can characterize design as a process that synthesizes an


artifact that functions to attain some specified goal or goals.


We must take note, however,


that the terms artificial with respect to artificial phenomena and objects and artificial

science actively describe real, not imaginary, artifacts and knowledge. It is important to

emphasize that these artifacts function to achieve specific goals.

Computationally, attainment of design goals should not be considered an

optimization problem. Christopher Alexander in his book, Notes on the Synthesis of Form

states:







...level which suffices to prevent misfit between the form and the context, and to do this in the least arbitrary manner possible. [Alexander64, page 99]

Thus, design attempts to satisfy requirements rather than optimize for those requirements.

This is particularly evident in preliminary design because in preliminary design we rarely

have a method for finding an optimum since these types of design problems have limited


available quantitative information.


When comparing preliminary design solutions, we


usually use qualitative terms such as "better" and "worse" rather than quantitative terms.

This is not to say that optimization methods are unimportant design tools. They are

actually underutilized, particularly at later stages of design where numerical optimization

techniques can be readily applied, but in preliminary design, they are unsuitable for direct

application because of the qualitative nature of design requirements.


Design Requirements


Design is the synthesis of an artifact that satisfies requirements. These requirements or design goals help define the function and purpose of an artifact. By making incremental steps to satisfy each requirement, design requirements may help guide a design process. Explicit statement of each design requirement helps describe what a design must achieve; however, general requirements cannot be uniformly described for all the varied phenomena that designers encounter.

Since we cannot really expect to give complete descriptions of all design

requirements for complex design problems, how can we expect to generate design






alternatives that satisfy requirements that we cannot describe? Here lies the designer's

paradox:

By designing an artifact that satisfies given requirements, we can identify further
requirements or more details of the given requirements that were unknown or
unforeseen during the initial stages of our design.

In other words, the context of a design and the design's form are complementary. For innovative design cases, prototyping and simulation are important design tools since they let us explore both a design's form and context. Another way to explore a design's context is to take incremental steps towards satisfying a design artifact's known requirements. This allows for a careful and critical review of current design requirements and facilitates "stepping back" to previous design states when new or more detailed requirements and goals are discovered. Some researchers refer to this as a redesign stage, which can be a costly process.

Typically, a design starts as a problem statement containing one or more abstract requirements. Abstract requirements are typically fuzzy or incomplete descriptions of design criteria that most often do not adequately portray the intentions of a design problem in an explicit way, that is, by referencing specific design values. Therefore, abstract requirements must be transformed into more detailed ones before continuing onto further design steps. Alexander states:


Physical clarity cannot be achieved in a form until there is first some programmatic clarity in the designer's mind and actions; and that for this to be possible, in turn, the designer must first trace his design problem to its earliest functional origins and be able to find some pattern in them. [Alexander64, page 15]


It is important to note that Alexander does not require a complete transformation







of a design problem into a numerical optimization problem. A design's context, no matter how hard we try to define it, is a field problem in which we have some forces that are too difficult to understand. Another way to state the designer's paradox is that understanding a design's context is the same problem as synthesizing an artifact that does not violate that context. For complex design problems, the designer's paradox becomes even more difficult since we cannot always understand the context without violating it.

If we return to the concept of satisfying requirements, an underlying motivation for

satisfaction arises from the previous discussion on the nature of design requirements.

Because we can rarely completely describe a design's context and requirements, what is

meant by satisfaction? What does Alexander mean by meeting requirements in the best

possible way? The answer lies not in identifying what is good but in recognizing what is


not bad.


Since we cannot fully understand a design's context but we can recognize if an


artifact does not violate what we do understand, then a design that does not violate any

design requirements (misfit between form and context) satisfies those requirements and is

a good, acceptable design solution. Obviously, if we could maximize the achievement of

each design goal, then we could logically say that the resulting design artifact not only


satisfies identified design requirements but is also the best design.


Thus, satisfying design


requirements is simply to prevent misfit between form and context.

Not only do designers concern themselves with misfit between form and context,

but they also must contend with design requirements that often conflict. This is a

characteristic of design problems that often makes design decisions a trade-off. Instead of






that makes up the design. Interrelated and conflicting design requirements are a driving

force behind studying design theory, automating design, and creating computational

design process models because they may help designers create better designs in a

complicated, ever changing environment.

As an example of abstract, coupled, conflicting requirements, consider the design


of a simple beam to safely and economically resist a load.


From this statement we can


identify the primary design requirement of kinematically resisting a load.


Any beam we


design will suffice if it will resist the given load in a stable fashion. This is the basic

performance requirement for our design. Considering the two additional, implied

requirements, we have in total three abstract requirements: load resistance, economy, and

safety, which help define the context of this design. Each of these requirements is

indeterminate since we still lack information and details such as the magnitude, location,

or direction of the loading; we do not have a "definition" of economical, nor do we have a

"definition" of safe.

Transforming as many abstract design goals as possible into a set of more detailed

goals or specifications is necessary before any design can continue, particularly when

determining interactions among design requirements. Many design problems require much

more information about the design situation than abstract requirements state. Even in this

simple example, we need such information as available support conditions, materials, even

fabrication methods before we can consider possible design solutions. In essence, we are

exploring a design context by trying to define a design search space of possible design solutions.







For this example, we will consider a vertical load at the center of a span of length, L; we will consider any combination of fixed, pin, and roller supports at the ends of the span as shown in Figure 3; and we will choose from constant circular, I-shape, or channel cross sections as shown in Figure 4. (Appendix A shows three tables of cross section dimensions used in this example.) We will not specify any materials; however, we will assume that any materials would equally resist tensile and compressive forces.

Figure 3: Support Types


We will think about economy by making the beam as light as possible (based on normalizing the cross-sectional areas) and considering fabrication and maintenance costs. Fabrication and maintenance costs are dealt with by assigning two cost factor types and respective importance multipliers. The first cost factor considers the type of each support (shown in parentheses below each support in Figure 3). The second cost factor takes into account matching the type of section chosen to the type of support. Table 1 shows a matrix of fabrication and maintenance costs for each section and support type. Each of these cost factors (weight, fabrication, and maintenance) can be scaled by an importance multiplier, which adjusts the relative influence of these costs with respect to the other requirements. The scaled cost factors will be added together based on the number of supports to get a total cost estimate. Our safety requirement will be transformed into low displacement and low stress goals.2
















Figure 4: Cross Section Types


Table 1: Support/Cross-Section Costs

              Circular   I-Section   Channel
    Roller        2          1          2
    Pin           2          1          1
    Fixed         1         1.5        1.5
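The following sketch shows one way the cost bookkeeping described above could be carried out: the support/cross-section costs of Table 1 and the per-support cost factors of Figure 3 are summed over the supports, combined with a normalized weight, and each term is scaled by its importance multiplier to give a total cost estimate. The support-type factors are not legible in this copy of Figure 3, so the values passed in the example are placeholders; only the Table 1 entries are taken from the text.

    # Fabrication/maintenance costs from Table 1, indexed by (support type, cross-section type).
    TABLE_1 = {
        ("roller", "circular"): 2, ("roller", "i-section"): 1,   ("roller", "channel"): 2,
        ("pin",    "circular"): 2, ("pin",    "i-section"): 1,   ("pin",    "channel"): 1,
        ("fixed",  "circular"): 1, ("fixed",  "i-section"): 1.5, ("fixed",  "channel"): 1.5,
    }

    def total_cost(supports, section, normalized_weight, support_factors,
                   weight_mult=1.0, fab_maint_mult=1.0, support_mult=1.0):
        """Total cost estimate for one beam candidate.

        supports          -- e.g. ["pin", "roller"]
        section           -- "circular", "i-section", or "channel"
        normalized_weight -- cross-sectional area normalized by the lightest alternative
        support_factors   -- per-support-type factors (the parenthesized values in Figure 3)
        *_mult            -- importance multipliers weighting each cost component
        """
        support_cost = sum(support_factors[s] for s in supports)
        fab_maint_cost = sum(TABLE_1[(s, section)] for s in supports)
        return (weight_mult * normalized_weight
                + support_mult * support_cost
                + fab_maint_mult * fab_maint_cost)

    # Example with placeholder support-type factors (illustrative values only).
    factors = {"roller": 1.0, "pin": 1.0, "fixed": 1.5}
    print(total_cost(["fixed"], "i-section", normalized_weight=1.2, support_factors=factors))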


Each abstract requirement has been transformed, in some sense, to a more detailed design goal as summarized in Figure 5. This transformation is an important step as we explore the design context. The transformation is based on an interpretation of what we are trying to achieve and our design experience. Experiential knowledge is a human designer's greatest asset and is what most computational models for design processes attempt to codify and emulate. An additional feature of our detailed design goals is that we can estimate the level of achievement of each in some way. Stability is easily determined from statics and is Boolean, true or false; weight is based on the total volume of material; fabrication and maintenance costs are based on the values given in Table 1 and Figure 3. Solid mechanics lets us measure the magnitudes of displacement and stress and thus rank each design alternative.







Designers, whether human or machine, must address each design requirement to some degree in order to generate "good" designs; however, many design requirements conflict.

Figure 5: Transformation of Design Goals (Load Resistance to Stability; Economy to Weight, Fabrication, and Maintenance; Safety/Comfort to Displacement and Stress)


Opposing requirements are those for which increasing performance relative to one goal reduces the level of achievement of another goal. Interactions between requirements make them harder to achieve than if they are independently considered, and for complicated, large design problems where interactions between requirements are prevalent, designers need some way to reduce this complexity.

Figure 6 shows an interaction diamond for the detailed design goals from Figure 5 (stability, displacement, stress, weight, maintenance, and fabrication). Each line in Figure 6 represents an interaction, and the type of interaction is shown as a minus sign (-) for a conflict and a plus sign (+) for mutual benefit or no conflict. Each goal interacts with every other goal, but the relative influence between goals varies. Figure 6 only shows generalized, qualitative goal interactions and is based on the previously described performance measurements for each requirement.



Figure 6: Goal Interaction Diamond

As an example of interpreting this diagram, reducing both stress and displacement has a beneficial effect on the basic performance requirement of stability, along with the reductions benefiting each other. Reducing either the maintenance or fabrication cost requirement, however, tends to have a conflicting effect on the other three goals such that stress and displacement will likely increase and overall stability (in a qualitative sense) will probably decrease. Even for this simplified design problem, tradeoffs are apparent, and in general, design requirements will more often conflict than what we can observe in this problem.
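One compact way to record the qualitative interactions of Figure 6 is as a signed lookup over pairs of goals. The sketch below encodes only the interactions stated in the surrounding text (mutual benefit among stability, displacement, and stress; conflict between each cost goal and those three); pairs the text does not describe are deliberately left out rather than guessed, since the figure itself is not legible in this copy.

    # +1 = mutual benefit or no conflict, -1 = conflict (signs from the discussion of Figure 6).
    INTERACTIONS = {
        frozenset({"stability", "displacement"}):   +1,
        frozenset({"stability", "stress"}):         +1,
        frozenset({"displacement", "stress"}):      +1,
        frozenset({"fabrication", "stability"}):    -1,
        frozenset({"fabrication", "displacement"}): -1,
        frozenset({"fabrication", "stress"}):       -1,
        frozenset({"maintenance", "stability"}):    -1,
        frozenset({"maintenance", "displacement"}): -1,
        frozenset({"maintenance", "stress"}):       -1,
    }

    def interaction(goal_a, goal_b):
        """Return +1, -1, or None when the interaction is not stated in the text."""
        return INTERACTIONS.get(frozenset({goal_a, goal_b}))

    print(interaction("stress", "displacement"))   # +1: reducing one helps the other
    print(interaction("fabrication", "stress"))    # -1: cheaper fabrication tends to raise stress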






An advantage of illustrating these concepts through a simple example is that we

can identify possible classes of solutions, given the design requirements that we are

considering. Satisfying our basic performance requirement of stability within the context

of available support types and locations defines four classes of possible solutions, a simply

supported beam, a cantilevered beam, and two types of indeterminate beams as shown in

Figure 7.

Figure 7: Possible Beam Solutions

Another advantage of our simple problem and how we have developed its context is that we can represent the displacement, stress, and cost requirements with some quantity. To compare each solution, we can aggregate the level of achievement of each of the requirements into a single performance index. We will do this by normalizing the level of achievement of each requirement, then scaling the resulting number. For each class of beam solution, we can derive the maximum displacement and moment magnitudes from solid mechanics and use the cross-sectional area as a weight indicator since all solutions must span the same distance. We will normalize these magnitudes with the minimum values determined or provided. Both normalizations are done across all cross sections as defined in Appendix A. We want to minimize the total cost (e.g., maximize the performance); however, because of the available cross section disparity, we also can apply scale factors to the weight, fabrication, and maintenance requirements, adding these values to the costs based on weight, cross-section type, and support type. The resulting numbers are the relative costs that we want to minimize. Thus, when choosing among a number of solutions, we will be confident that our choice will satisfy to a high degree the specified design requirements. These advantages are not normally present in complex preliminary design problems; however, we will use them to illustrate how design requirements interact.3
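A minimal sketch of the aggregation just described: each quantity is normalized by the smallest value found across the candidate solutions, scaled by an importance multiplier, and summed into a single relative cost, so the candidate with the lowest index best satisfies the weighted requirements. The candidate values and multipliers below are illustrative placeholders, not the figures tabulated in Appendix A.

    def performance_index(candidates, multipliers):
        """Relative cost of each candidate solution; a lower index is better.

        candidates  -- dict of name -> {"displacement": ..., "stress": ..., "weight": ..., "cost": ...}
        multipliers -- importance multiplier for each requirement key
        """
        # Normalize every requirement by the minimum value across all candidates.
        mins = {key: min(values[key] for values in candidates.values()) for key in multipliers}
        return {name: sum(multipliers[key] * values[key] / mins[key] for key in multipliers)
                for name, values in candidates.items()}

    # Placeholder candidate data (maximum displacement, maximum stress, normalized area, cost).
    candidates = {
        "simply supported": {"displacement": 2.0, "stress": 1.5, "weight": 1.0, "cost": 3.0},
        "cantilever":       {"displacement": 6.0, "stress": 3.0, "weight": 1.0, "cost": 2.0},
        "propped":          {"displacement": 1.0, "stress": 1.0, "weight": 1.2, "cost": 3.5},
    }
    multipliers = {"displacement": 1.0, "stress": 1.0, "weight": 1.0, "cost": 1.0}
    print(performance_index(candidates, multipliers))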

Appendix A shows through several design scenarios that by emphasizing different

combinations of design requirements, designers cannot ignore interactions between

requirements. By concentrating on increasing the design's performance with respect to

stress and displacement, we in essence disregard the other requirements by selecting a

propped beam with a large I-section. This solution drastically differs from when we

accentuate either manufacturing or fabrication requirements, which indicates that a large

sectioned cantilevered beam would be best. Overall, it is not surprising that I-sections are

generally preferred since they resist bending most efficiently. Both fixed end and

cantilevered beams are often chosen where the choice is a tradeoff between safety and

economy.

This example directly illustrates several concepts. First, by identifying and then

mapping abstract design requirements into detailed design goals, we further define a

design's context, which in turn helps define a design's form. In general, identification and


3 This design problem, as specified, resembles more of a structural optimization problem rather than a preliminary design problem because we have specified the design context to a high degree.





transformation of design requirements are not as straightforward as in our simple example.

Mapping from abstract requirements to more detailed ones may be regarded as a process that combines "discovery" of the unexpected with anticipation of what is known about a design domain. Those requirements that designers discover during refinement most often prescribe a redesign phase based on an analysis of a design's failures. Since designers rarely can completely define a design's context, they must experiment with its form to clarify the context, thus recognizing new, unanticipated requirements. This symbiosis (Figure 8) is characteristic of most design problems.

Second, design requirements interact to a varying degree. Complexity of design problems increases dramatically during the design process as refinement defines requirements and interactions between requirements. These interactions are relationships between design goals that either inhibit or assist in satisfying those goals. Identifying interactions and effectively dealing with them are some of the most challenging aspects of design in general and of developing computational models of design.

Figure 8: Symbiosis of Design (Context and Form)

Third, by formalizing context, form, and goal interactions, both design variables

and parameters are more easily identified. Design variables and parameters define specific

quantitative and qualitative features of a design that have the potential for adding requisite


detail to the artifact.


When a design process has completely specified an artifact, all design


variables and parameters have values within a valid range for the given design context, and






that the value for a given design variable does not violate either the context or form of the

design, then the value for that design variable is acceptable.

In summary, design is a process that synthesizes an artifact that functions to attain


some specified requirements.


To accomplish this task, any design process model, whether


cognitive or computational, must in some way do the following:


conform to abstract design requirements,

map abstract requirements to detailed requirements,
identify and accommodate interacting requirements, and


satisfy design requirements.
The next section of this chapter reviews existing models of design processes. From

this review, a case for investigating and developing connectionist models for design

processes will be made.


Models of Design Processes


Models of design processes fall into three general categories: descriptive, prescriptive, and computational. Although this research focuses on development of


computational models, research in both descriptive and prescriptive models is necessary to

help identify those areas of design research and methodology that are well understood and


those that are not.


Design researchers developing and studying descriptive and prescriptive models base their work on, and direct it towards, human designers.


These models


oftentimes provide the basis for computational design process models and are necessary to

review.







Prescriptive Design Process Models


Prescriptive models can dictate either how the design process should proceed or

what attributes the design artifact ought to have. To date, these two classes of prescriptive

models are disjoint and have little in common, other than proposing a prescription on the

philosophies of design. Most books on design processes offer prescriptive models


[Cross89, Dieter83, Ertas93, Pahl84, Pugh90, Ullman92, Walton91] that define a strategy


for the design of a quality product. The authors attempt to enlighten engineering students


with better ways to approach design problems.


It is interesting to note that one of the common themes these books stress is that reading about design is not enough; the student must actually do design in order to become proficient at the process. Design

experience is a necessary ingredient in design education. This theme is a common thread

among all design researchers.

Prescriptive design processes describe a plan for how to get from the need for an


artifact to the final product.


This plan ideally best utilizes the knowledge at hand so that


the artifact is of high quality and is quickly and economically developed.


Ullman


[Ullman92] identifies the product life cycle that is the basis for the mechanical design

process. The product life cycle he prescribes consists of six phases:


Specification development/planning -- understanding the design problem and planning for design.

Conceptual design -- generating and evaluating concepts.

Product design -- generating the product, evaluating the product, and finalizing the product.

Production -- manufacturing and assembling the product.






Concurrent design is the simultaneous evolution of the product and the process for


producing it.


The process of refining a design from a concept to a manufacturable product


is done concurrently in order to increase the quality of the resulting artifact and make its

manufacture as economical and efficient as possible. Although these six phases are

described in an order that reflects a sequential series of actions, they are often carried out concurrently by a design team. Each phase of this design process strategy must be


investigated in the context of the overall design process. Ullman estimates that roughly 75 percent of an artifact's manufacturing cost is committed by the end of the conceptual design phase, while the product design phase itself consumes only a small percentage of that cost. This means that design decisions made later in the design process will have little effect on the product's manufacturing cost.


Prescriptive methods can account for the outcome of many possible artifacts that

satisfy the requirements since the knowledge employed during the design process is


designer dependent.


This implies a dynamic process. Designers apply two types of


knowledge: design process knowledge and domain knowledge. Domain knowledge

consists of information that is directly and indirectly associated with design requirements,


variables, terminology, procedures, processes, etc. that make up a design's context. Human


designers constantly learn more about their domain while practicing design. Since domain

knowledge is so important to prescriptive models, designers rely on experience to gain

domain knowledge. Design process knowledge affects how the designer applies domain

knowledge, and prescriptive design models provide a strategy for applying that

knowledge. Just as different designers create different designs that satisfy the same design requirements, different applications of a prescriptive model can lead to different artifacts.


An argument can be made that prescriptive processes limit creativity in design;

instead, design processes should be viewed as chaotic. On the other end of the spectrum is


the belief that design processes should be organized and disciplined.


What most design


researchers have proposed to date are prescriptive models that fall in the middle of the

spectrum. Software engineering is an example domain that employs a strict prescriptive

process model [Conger94, Rumbaugh91], and in this domain, a strict model performs well. Software engineers are able to define the context with great detail before conceptual


design and product design phases begin since designers can specify requirements with


great detail.


In addition, the targeted computing environment can partially define the form


of a software product before software engineers consider the conceptual design.

Therefore, software engineering inherently allows for strict prescribed design processes,

whereas some engineering design domains, particularly preliminary structural design,


suffer from the designer's


paradox, which allows and sometimes requires an exploration of


the design domain to gather further knowledge.

As knowledge about a design problem increases, designers make decisions that

define the form of an artifact. As a form becomes determined, there is less unbridled

creativity since the emerging form adds constraints to the as yet undetermined aspects of


the artifact.


Violation of these constraints implies expense in terms of design changes that


may affect re-tooling of manufacturing resources and delay in releasing the product to

market. Changes to a product early in the design process incur little expense since only a

small investment has been made; therefore, designers have more opportunity for creativity early in the process.







The conceptual design phase in most prescriptive processes generates and


evaluates possible preliminary solutions to a design problem.


The generation and


evaluation of alternatives, as opposed to a single solution, is desirable since designers


rarely have a design's


context completely formalized. Again, the designer's


paradox


dictates that we will gain more knowledge about our design domain (and design process)

by exploration. Generating and evaluating alternatives are such an exploration [Pugh90].

A very successful prescriptive design process model that describes the attributes

that the artifact should possess, rather than prescribing the process to attain an artifact, was


developed by Genichi Taguchi [Roy90].


The Taguchi method, as his prescriptive


processes have come to be known, relies on the relationship between design and


manufacturing.


The revitalization of Japanese manufacturing since World War II can in


part be attributed to their adoption of Taguchi methods, which strive to minimize the quality loss of a design over the design's life cycle.


Taguchi defines quality loss as the deviation from desired performance of a design.

The system that Taguchi developed is statistical in nature and relies on the development of


experiments for parameter and tolerance design.


While Alexander [Alexander64] defines a


good design as one that satisfies all design requirements, Taguchi defines a good design as

one that is relatively insensitive to uncontrollable factors that might be encountered in

manufacturing and during the life of the product. Taguchi calls these factors that can cause


a design's


functional criteria to deviate from expected values "noise factors."

Taguchi methods emphasize designing quality into not only an artifact, but also the process that produces it. This philosophy rests on three concepts:

1. Quality should be designed into a product and not inspected into it during manufacture.


2. Quality is best achieved by minimizing the deviation of design parameters from
target values.
3. The cost of quality loss can be measured as a function of design parameter
deviations in terms of the overall life cycle of a product.
Quality improvement requires continuous effort to reduce the variation around target

values, and it is very important that this effort be done early in a design process in order


for Taguchi methods to succeed.


The first step towards improving quality is to statistically


reduce the deviation as much as possible. To accomplish this, Taguchi methods call for


designing experiments using tables called "orthogonal arrays" that serve to determine the least number of experiments and their conditions.


The second step to improving product


quality is to reduce noise factors that can adversely influence the response of an artifact.

Machinery wear and weather are examples of what Taguchi considers as noise factors.

Taguchi uses "outer arrays" to study the influence of noise factors with minimum effort.

To achieve desirable product quality through design, Taguchi suggests a three

stage design process:


1. Systems design, which focuses on identifying a design's context and determining valid ranges of design variables and requirements for both a product and manufacturing process. This may include materials, process parameters, and possible configurations.
2. Parameter design, which seeks to determine values of design variables that
produce the best performance of an artifact and manufacturing process.
3. Tolerance design, which fine tunes the results of parameter design by tightening
tolerance factors in order to reduce the variance of the product and manufacturing
process with respect to requirements.
Taguchi methods require designing experiments to identify acceptable design variable values. Each row of an orthogonal array represents a trial while each column corresponds to the design factors

specified for the process. As an example, if a design has seven factors that can take on one


of two values, then a complete exhaustive search of the space would require 2^7, or 128, experiments. Using the orthogonal array method, Taguchi can specify eight design

experiments that adequately explore the design space.
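To make the arithmetic concrete, the following is a minimal sketch, not taken from Taguchi's texts, of one standard way an L8 array for seven two-level factors can be constructed and checked; the parity-based construction and factor indexing are illustrative assumptions.

# A minimal sketch (not from the dissertation) of constructing an L8 orthogonal
# array for seven two-level factors and verifying its balance property.
from itertools import combinations

def l8_orthogonal_array():
    """Build the 8-run array: column c of row i is the parity of i AND c."""
    columns = range(1, 8)              # seven factors, one per nonzero 3-bit mask
    rows = []
    for i in range(8):                 # eight trials instead of 2**7 = 128
        rows.append([bin(i & c).count("1") % 2 for c in columns])
    return rows

def is_orthogonal(array):
    """Every pair of columns must contain each (level, level) combination equally often."""
    n_cols = len(array[0])
    for a, b in combinations(range(n_cols), 2):
        counts = {}
        for row in array:
            counts[(row[a], row[b])] = counts.get((row[a], row[b]), 0) + 1
        if set(counts.values()) != {len(array) // 4}:
            return False
    return True

if __name__ == "__main__":
    oa = l8_orthogonal_array()
    for trial, levels in enumerate(oa, start=1):
        print(f"trial {trial}: {levels}")
    print("balanced pairs:", is_orthogonal(oa))   # True: 8 trials cover the 7-factor space

The balance check is what makes eight trials informative: every pair of factors is exercised at all four level combinations an equal number of times.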

The results of these experiments can achieve one or more of the following

objectives. First, they can establish an optimum set of values for design parameters.

Second, they can estimate the contribution of individual factors to the quality of a design.

Third, they can estimate the response of a design with respect to the desired response

level.

The best results using Taguchi methods occur in industries characterized by a high-

volume, low-cost manufacturing environment, such as consumer electronics or


automobiles. In these environments, the cost of a large number of experiments is offset by


the revenue generated; however, low-volume, high-cost manufacturing environments such

as the aerospace industry may not be readily adaptable to Taguchi methods since the cost

of experiments is high [Montgomery91].


Descriptive Design Process Models

Descriptive design process models differ from prescriptive models by attempting to

describe how human designers design instead of suggesting a strategy for designers to


follow. They study the cognitive processes designers use. Descriptive models fall into two general classes: protocol analysis studies and cognitive models. The question of how humans design has no simple answer; it is a human characteristic that is unique and not well understood.

Attempting to answer the question of how we design is in a large part the primary

motivation behind many researchers studying cognitive modeling. Their results are often in

the form of a descriptive design process. This section reviews work done in developing

descriptive design process models.

Protocol analysis involves presenting a subject with a problem and asking the


subject to verbalize while solving the problem. Protocol analysis tries to obtain a subject's stream-of-consciousness thoughts. Typically, protocol analysis requires some type of

recording of protocol sessions, either with sound only or with both sound and video

recordings. By analyzing design protocol sessions, it is possible to identify some of the

mental processes designers use during design. A primary feature of protocol analysis is

that it records all the information a subject can communicate, whether verbally or through

drawings or gestures.

There are several inherent weaknesses with protocol analysis. Considering how

little we know about design processes, the benefits of protocol analysis far outweigh the


drawbacks. The first limitation is in interpretation of protocol sessions.


Most researchers


limit any bias by having more than one person interpret each session and then negotiate

differences between explanations. Protocol analysis of designers is limited by studying

individuals because design is generally a team effort involving more than one designer. To

date, no work has been done on applying protocol analysis to design teams. Finally, the

weakest aspect of protocol analysis is that it cannot record what happens when a designer is not actively verbalizing. Design tasks also tend to be longer than in other disciplines that use protocol analysis; therefore, data gathering is divided into several

sessions over the course of several days allowing designers to ponder and organize their

thoughts outside of formal data gathering sessions. Regardless of the limitations, design


researchers have discovered many interesting design "habits"


that are consistent across


several design domains and appear to be inherent human traits.


Ullman and coworkers [Ullman87, Ullman88] have done extensive protocol studies of


mechanical designers. They chose to study all phases of human design processes with both


experienced and inexperienced engineering designers.


Designers were recorded using both


video and sound recording equipment during their entire design session, which started by

providing subjects with abstract design specifications and proceeded until they produced

detailed working drawings for most of a final design. The protocol analysis recorded two

types of design problems. One type was the design of a mass-produced product and the


other required designing a unique one-of-a-kind product.


The resulting protocol analysis


contains descriptions of the form and function of the resulting artifacts and also the


designer's


process.


The results of different protocol analyses are similar [Adelson88, Ullman87, Ullman88]. The major findings are as follows:


Individual designers establish a single preliminary design early in their process. As
they discover problems or inefficiencies in their original concept, they fix/patch a


design instead of formulating a different preliminary design.


This single-concept


strategy is contrary to what most prescriptive process models advocate; however,
since these protocol studies are of single designers, not design teams, it is
reasonable to assume that individual designers cannot mentally develop parallel,
preliminary concepts.




Designers rely heavily on sketches, which are more than a medium in which results are calculated. They also may provide a visual stimulation for ideas, provide external memory, and facilitate communication of ideas to others.

The strategy designers employ changes from being systematic to more
opportunistic as a design becomes less abstract. Once designers have explored a
design's context and provided some form to an artifact, patterns in sketches can
trigger ideas or identify problems causing designers to immediately shift their
attention to that idea or problem. In addition, as a design becomes more complex,
it is more difficult for a designer to consider the whole design. Instead, designers
subdivide a problem into manageable parts, which become items of focus.
Prescriptive design processes suggest a more systematic process rather than this
type of opportunistic behavior.

Designers try to keep the state of a design balanced by focusing on abstract
portions. This general strategy of keeping all parts of a design at the same level of
detail contradicts the opportunistic strategy previously mentioned; however, if a
design state becomes bottlenecked, then the convenience of what can be achieved
may alleviate the standstill.
Protocol studies can be valuable tools for studying design processes humans use

and for testing prescriptive design process models. They should be extended to include

design teams rather than just studying individuals.

A cognitive model describes the processes and behaviors that constitute a skill. A

cognitive model specifies a set of mechanisms, each with a well defined function and

defined interactions, that transform a set of inputs into outputs. Since a cognitive model

describes a process by employing well defined mechanisms, it can generate explanations

and predictions about the process that it models [Adelson89]. This is a useful feature for

studying the theory of processes such as design that are not well understood. Theorists

can develop cognitive models from protocol studies; however, retrospective reporting,

where a subject explains what was done, and informal reporting, where an observer

watches what is done and asks questions, can also provide the basis for cognitive models.







Protocol studies complement cognitive modeling in several ways:
Protocol studies can help identify complex, interacting behaviors.
Protocol analysis is well suited to test and give support to the explanations
generated by a cognitive model.

Protocol studies are normally done in a natural setting that can protect results from
any skewing that might occur in an experimental setting.
Regardless of the foundation for a cognitive model, most researchers recognize


that cognitive models deserve further investigation [Mostow85].


Research into a cognitive


theory of design is just beginning and suffers from the lack of a design theory taxonomy within which to work and from the contrasting approaches used by researchers in computer science and those in psychology [Dixon88]. As cognitive modeling and protocol analysis continue, the methods, skills, and strategies employed by designers will be better understood.


Computational design process models, which test many aspects of most, if not all, cognitive modeling [Adelson88, Adelson89, Brown86, Mitchell8, Tong87], are the subject of the next section.


Computational Design Process Models


Using prescriptive and descriptive models of how designers design, many

researchers have developed computational models as tools to assist designers, as

autonomous systems, or as experiments to research how designers design or should

design. All three types of process models contribute to the overall understanding of design

processes. Each of these types of models helps in identifying abstractions, variables,

techniques, and general knowledge about design. Although there is no requirement that a


design process model need to design as people do,4 protocol studies, cognitive research, and prescriptive process models study human designers as examples of what computational

models may want to emulate. Computational models give researchers a controlled and

reasonably understood testing ground for their theories [Dixon88].

Most computational design process models map some set of design requirements

into a description of a physically realizable object that satisfies those requirements. As

previously illustrated in this chapter, these requirements may be very abstract or quite

detailed; however, given requirements typically attempt to specify function, performance,

context, and available resources. The design task performed by most computational


models is to create a design solution that satisfies these requirements.


Various models take advantage of special characteristics of their design domains, such as

Modularity -- allows partitioning design problems into subproblems.

Interactions -- subproblems may not be independent, but interactions between classes of subproblems may be well defined.

Requirements -- may be specific or abstract.

Level of Solution Detail -- are detailed "final" results sought, or is an abstract description expected?

Available Knowledge -- some domains do not have a recognized domain theory.

In most computational design process models, particularly knowledge-based ones, the form, quantity, and availability of design knowledge is critical and an often overlooked factor in the development of a computational model. Computational models must in some way acquire, organize, represent, and use a variety of types of knowledge. Oftentimes this knowledge is experiential from humans; it may be fundamental in the design domain; or it even may come from the design process as it unfolds. To complicate matters, the ways of representing this knowledge raise two


issues to note. First, the effective combination of different types of knowledge is difficult,

and second, it is difficult to effectively represent many types of knowledge that human


designers efficiently use [Nevill88].


Knowledge representation is a research area within


itself; therefore, this section will not explicitly address different techniques of

representation but will concentrate on different processes computational models may

apply.

Top-down refinement is a general problem solving method that starts with an

initial, abstract problem specification and refines it by adding detail. In order to do this, a

computational process may have to decompose a problem into smaller subproblems until

basic, primitive operators can fully specify each subproblem. A computational model

usually does this in an iterative manner. Top-down refinement requires that a design

domain have identifiable levels of abstraction and subproblem classes. In addition,

interactions between subproblems must also be consistent and identifiable.
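The following is a schematic sketch of the top-down refinement idea just described; it is not drawn from any system reviewed here, and the problem names, refinement table, and primitive set are invented purely for illustration.

# A schematic sketch of top-down refinement: an abstract specification is
# recursively decomposed into subproblems until each is handled by a primitive.
REFINEMENTS = {
    "bridge":         ["superstructure", "substructure"],
    "superstructure": ["deck", "girders"],
    "substructure":   ["piers", "foundations"],
}

PRIMITIVES = {"deck", "girders", "piers", "foundations"}

def refine(problem, depth=0):
    """Recursively expand a problem; primitives terminate the recursion."""
    print("  " * depth + problem)
    if problem in PRIMITIVES:
        return [problem]                       # fully specified by a primitive operator
    leaves = []
    for subproblem in REFINEMENTS[problem]:    # iterate over the decomposition
        leaves.extend(refine(subproblem, depth + 1))
    return leaves

if __name__ == "__main__":
    print("primitive subproblems:", refine("bridge"))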

A typical computational model may use many levels of abstraction. MOLGEN

[Stefik80], a molecular genetic design system, uses six abstraction levels, and MOSAIC

[Nevill89a] uses three abstraction levels. Each abstraction level a computational model

uses (and human designers, too) tends to focus on a particular viewpoint or primary focus

of a design. For instance, the first or top abstraction level in MOSAIC is concerned with


developing a path around obstacles in the design space.


The results from subproblems at


one level of abstraction are commonly transferred to other subproblems and abstraction

levels to assist refining a solution. Both designers and computational design models use multiple abstraction levels in this way. At each level of abstraction, computational models may decompose given problem

specifications into smaller subproblems. Modularity of a design domain and natural

separability lead to different subproblem classes. Design models commonly use both

decomposition and abstraction techniques to simplify design problems. By dividing

complicated design problems into smaller subproblems, the number and scope of design

decisions that need to be made at any point in a design process can be controlled;

however, for computational design models that do not acquire new knowledge, good

abstractions and decompositions must be known and coded into a computer model a

priori. In addition, conflicts can arise between subproblems.

In design domains, independent sets of subproblems are rare and interactions

between subproblems may result in conflicts or assistance, recall Figure 6 showing


interacting requirements.


Alexander [Alexander64] suggests identifying subproblem


classes that have a minimum number of interactions, but in design, interactions are

inevitable and computational models must either have them identified a priori or have


some means of detecting them. A common method of handling interactions is through a

process called constraint propagation. Whenever an interaction occurs, a design process


model would create a constraint that communicates the interaction.


A computational


model using constraints must propagate them to those subproblems that are affected.

Steinberg [Steinberg87] notes that constraint propagation is effective but computationally very expensive.
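A minimal sketch of the constraint-posting idea follows; it is not taken from any of the systems described in this chapter, and the variable names and the single "depth versus clearance" constraint are illustrative assumptions only.

# A minimal sketch of propagating a posted constraint between two interacting
# design variables by pruning their candidate value sets.
domains = {
    "beam_depth_mm": {200, 300, 400, 500},
    "clearance_mm":  {350, 450, 550},
}

# A constraint posted when two subproblems are found to interact:
# the beam depth plus a 100 mm service allowance must fit inside the clearance.
def depth_fits_clearance(depth, clearance):
    return depth + 100 <= clearance

def propagate(domains, constraint, var_a, var_b):
    """Remove values of each variable that no partner value can support."""
    changed = True
    while changed:                      # iterate until no domain shrinks further
        changed = False
        for x, y, test in ((var_a, var_b, constraint),
                           (var_b, var_a, lambda b, a: constraint(a, b))):
            supported = {v for v in domains[x]
                         if any(test(v, w) for w in domains[y])}
            if supported != domains[x]:
                domains[x] = supported
                changed = True
    return domains

if __name__ == "__main__":
    print(propagate(domains, depth_fits_clearance, "beam_depth_mm", "clearance_mm"))
    # beam depths of 500 mm are pruned; all clearances remain feasible

Even this tiny example hints at the cost Steinberg notes: every posted constraint forces repeated re-examination of all affected variable domains.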


When an interaction causes a conflict or results in some requirement being


unsatisfied, a computational model must be able to resolve the conflict. Most models do this either by backtracking to an earlier decision or by patching the current design. Backtracking typically causes an additional constraint to be posted before the process continues. In

patching a design, a computational model would replace part of or augment a design such

that the patching action eliminates the conflict. Patching does not require that a model

revert to a previous level of abstraction.

Bottom-up composition is a design process that is opposite of top-down

refinement. Bottom-up composition starts with a set of design requirements, but instead of

refining a system of subproblems to find a solution, a design is constructed from a set of

available components. The components are combined in various ways, oftentimes

exhaustively, to create a design that satisfies the requirements. In order to prevent

exponential explosion, control structures guide the process of building a design from basic

components. Two interesting uses of bottom-up composition have found their way into

design models. First, generative grammars for reasoning about spatial and functional

representations have been used with limited success in structural design [Fenves87], and

second, heuristic rule sets have been used to generate possible solutions to a given set of design requirements [Coyne87]. Both these methods suffer from a limited knowledge set and

process control problems.

Most knowledge-based processes need some type of control system that guides a

process by invoking rules, identifying and resolving interactions, pursuing certain goals,

and evaluating the state of a design. In general, a control system for computational models

of design invokes the specific knowledge and processes used to solve a design problem.

Human designers readily move between abstractions, create design variables, and refine requirements with a facility that exceeds that of computational models in terms of creativity, innovation, and in dealing with unfamiliar design situations.

These characteristics of human designers actually help in difficult design situations since

humans can become opportunistic in the sense that they change abstractions and their


focus because of recognizable patterns in their design [Ullman87, Baker87]. Humans can


identify patterns that cross abstraction boundaries and even design domain bounds.

Associating design features for pattern recognition can even mean conveniently ignoring


certain features in order to become opportunistic.


Tong [Tong87] suggests that changes in


abstractions are necessary when a design problem contains identifiable bottlenecks.


Unfortunately, opportunistic behavior can appear to be chaotic and difficult to control, in the sense that designers will not always create repeatable designs. Humans constantly learn

and good designers depend on learning from their design experiences. They cannot be

relied on to create repeatable designs.

Computational models, on the other hand, need some type of controlling structure

if they are to produce repeatable designs, even if opportunistic behavior can be modeled.

However, there is not a single recognized procedure for guidance in design let alone

taking advantage of and recognizing opportunity. Most computational models implement

some control structure that provides some guidance in moving between abstractions,

identifying critical design variables, and for refining requirements [Mostow85].

Knowledge-based approaches in design have taken several courses. Brown


[Brown86] developed a hierarchy of "specialist" modules, which implement well-understood design alternatives. Design proceeds in a top-down fashion from the abstract to the detailed level. Because the specialist modules are well defined and understood,

abstraction levels are clearly defined as are relationships between different requirements

and design variables through and between all abstraction levels. Hayes-Roth and Hayes-

Roth [Hayes-Roth78] recognized the opportunistic nature of human planning processes,


which was reinforced by protocol studies [Ullman87, Baker87, Ullman88].


In their model,


Hayes-Roth and Hayes-Roth proposed a common data area called a blackboard where

independent and asynchronous specialists would exchange information. These specialists

are similar to the specialists of Brown [Brown86]; however, Hayes-Roth and Hayes-Roth

left the issue of controlling this process as a topic for future research. Stefik [Stefik80],

working in conjunction with the Hayes-Roths, developed a hierarchical planning system

for molecular genetics. Stefik's work dealt with handling interactions between


subproblems in automated planning and design models. In Stefik's


work, subproblems are


created when simplifying large problems by decomposing them into smaller subproblems.

These subproblems normally cannot be solved independently since they interact; therefore,

the hierarchical model of the Hayes-Roths is hindered without some means of effectively

managing these interactions.

Stefik proposed using constraints, a requirement that must be satisfied, to handle

subproblem interactions. Using a least commitment control strategy, subproblems that are

most constrained and specified are refined into more detailed subproblems until they are


fully specified. Decisions about poorly specified, under-constrained subproblems are deferred until more of the design has been committed. Mittal et al. addressed similar control issues in the mechanical design area, successfully applying Stefik's control strategy in an expert system that designs paper handling assemblies for machine

copiers. Both Clinton [Clinton88] and Brown [Brown88] applied similar control methods

to the preliminary design of two-dimensional mechanical structures.

Tong [Tong87] describes a control strategy called "opportunistic commitment,"6

which attempts to provide some of the opportunistic behavior observed in protocol studies


[Ullman87, Baker87, Ullman88].


Clinton [Clinton88] implemented this type of control


structure in the MOSAIC computational design model [Nevill89a, Nevill89b].

Most of the knowledge-based approaches use constraints to manage interactions

between subproblems, goals, and design variables, and since each is a knowledge-based

approach, the knowledge available to the system required a clearly defined set of goals,

constraints, variables, and any relationships between these items. Abstractions and the

delineation between abstractions must also be clearly defined in order to implement a


choice of hierarchical plans.


When new or undefined conditions arise in these systems,


some constraints remain unsatisfied and the expert system fails; however, even though the

knowledge used by these expert systems was brittle, the role of constraints in design

models was shown to be an important means of representing some forms of design

knowledge.

Even though all the computational models this chapter previously described are

successful to varying degrees in modeling design processes, they have not completely

captured the essence of design. They can model specific design areas with brittle






knowledge and limited interactions, but as they explore their design domains, it is

extremely difficult to expand their knowledge bases due to the widely varying types of

knowledge that design domains require. Interactions become more prevalent as a design

domain expands, and as the number of interactions increases, the complexity of managing

these interactions exponentially increases. Knowledge-based approaches to computational

design process models are currently stagnant due to these problems; however, it is the

ground-breaking work of these earlier investigations that has stimulated this research.


The next section of this


chapter examines some connectionist design process models.


Artificial Neural Networks in Engineering


Artificial neural network models


applied to engineering problem domains have


begun to appear in the literature. Although very few have dealt directly with some design

domain issues previously illustrated in this chapter, this section reviews several interesting

applications. This section does not go into detail on the theory underlying artificial neural

networks.

Stojadinvic [Stojadinvic90] investigated connectionism as a computing paradigm

and its applicability to engineering design. His study considers two aspects of neural

computing. First, he discusses the computational aspects of the paradigm. His

investigation develops a foundation for the analysis of neural models by defining models of


computational neurons and their assemblies into networks. Specifically,


he looks at five


commonly used neural network models: competitive, self-organizing, associative, stochastic, and backpropagation models. The second part of his study is an analysis of potential application areas for neural networks in design: classifiers,







optimizers, controllers, AI tools, and models of physical processes. He then investigates

two of these areas, classifiers and optimizers, by implementing and experimenting with

several neural models to determine the feasibility of using artificial neural networks in a


design domain. Although his work explores only a limited dimension of design theory, he gives a good presentation of neural network basics.

Stojadinvic defines a classifier as a process of mapping instances from one domain

to instances of another, usually a more structured domain. Although he does not make the

connection, a general design process is a series of mappings from abstract to more detailed


model spaces. Stojadinvic seems more concerned with memorized input/output pairs of


neural models such as classifiers and associative memories; however, he does make the

link between using backpropagation neural models in this domain and their strong

generalization properties. He demonstrates the classification problem using a standards


processing example7 for structural design.


Using the "Interactive Activation and


Competition"


neural model [Rumelhart86b, McClelland88],


he implements two


classification example networks.

The first network is a decision table evaluator that given a number of conditions

finds an associated action or set of actions that should take place. His network

performance is comparable to the performance of conventional decision table evaluators.

The neural model has additional features such as handling ambiguous situations by


suggesting rules that were not exact but sufficiently similar to a given input,


and the






network can do the inverse problem where given a desired action or set of actions, the

network determines the required conditions.

The second example network is a neural organization system, which is a system of

classifiers for a collection of provisions that make up a standard. The organizational

system provides a mapping between the requirements of a standard and the behavior

limits. It enables access to those provisions that are important to the design problem at

hand while ignoring those standards that do not apply. This model also performs well in

comparison to conventional organization systems. Although both networks accomplished

the assigned tasks, Stojadinvic does note that determination of neural model parameters

may be problem dependent and critical for good network performance.


There are several other works [Mooney89,


Weiss89, Fisher89] that compare


general symbolic classification algorithms like ID3 and backpropagation. Backpropagation

compares well in terms of the quality of classification; however, learning in artificial neural

networks generally takes more time. Classification applications are not new to neural

networks. Pao [Pao89] provides a good overview of neural networks in pattern

recognition.

Stojadinvic next investigated optimization applications for neural networks in

design. Stojadinvic does not consider general numerical optimization as a field to apply a

neural computational model. Instead, he limits his application to combinatorial

optimization, where the problem is to find an optimal solution among a finite number of

discrete alternative solutions. The best known problem of this type is the traveling salesman problem. Stojadinvic implemented two neural models, a Hopfield network [Wasserman89] and a Boltzmann machine [Ackley85, McClelland88], for solving the traveling salesman problem.


In general, he found that both types of networks


accomplished the task but did so quite differently and both had limitations. The Hopfield

network implements an associative memory model that searches the solution space for a

set of values that minimizes the energy of the system. It does this by a process that is

equivalent to simple gradient descent, which is highly dependent on the starting point. Simple gradient descent will find a local minimum as a stable state of the model, which is

within the basin of the starting point. Thus, the results of the Hopfield model depend

entirely on the shape of the energy landscape and an essentially random starting point. The


Boltzmann machine is a stochastic neural model and in Stojadinvic's


case performed better


than the Hopfield model. The Boltzmann machine is an optimization process that is

analogous to hill-climbing optimization with simulated annealing and is able to avoid

shallow local minima, but it is restricted by long execution times and can get stuck in deep local minima. Given this experience, Stojadinvic feels that with some

modifications to the energy landscape and dedicated neural hardware, neural computing

systems will quickly and efficiently handle combinatorial optimization problems. As for its

applicability in an engineering design domain, he speculates that some problems in

construction management and resource allocation might be solved this way but does not

give details. The three remaining categories that Stojadinvic considers: controllers, AI

tools, and models of physical processes, were not investigated using any neural computing

model.
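As a concrete illustration of the energy-minimization view described above, the following sketch anneals a binary state vector toward a low-energy configuration of a small Hopfield-style energy function. It is my own illustrative code, not Stojadinvic's implementation; the weight matrix and temperature schedule are arbitrary assumptions chosen only to make the example run.

# A small illustrative sketch of stochastic energy minimization of a
# Hopfield-style network using a simulated-annealing acceptance rule,
# as used in Boltzmann machines.
import math
import random

random.seed(0)

N = 8
# Symmetric weight matrix with zero diagonal defines the energy landscape.
W = [[0.0] * N for _ in range(N)]
for i in range(N):
    for j in range(i + 1, N):
        W[i][j] = W[j][i] = random.uniform(-1.0, 1.0)

def energy(state):
    """Hopfield/Boltzmann energy: E = -1/2 * sum_ij w_ij * s_i * s_j."""
    return -0.5 * sum(W[i][j] * state[i] * state[j]
                      for i in range(N) for j in range(N))

state = [random.choice([-1, 1]) for _ in range(N)]
temperature = 2.0
while temperature > 0.01:
    for _ in range(20):                       # several unit updates per temperature
        k = random.randrange(N)
        trial = state[:]
        trial[k] = -trial[k]                  # propose flipping one unit
        delta = energy(trial) - energy(state)
        # Accept downhill moves always; uphill moves with Boltzmann probability,
        # which is what lets the search escape shallow local minima.
        if delta <= 0 or random.random() < math.exp(-delta / temperature):
            state = trial
    temperature *= 0.9                        # cool the system gradually

print("final state :", state)
print("final energy:", round(energy(state), 3))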






Reich [Reich91] aimed to build a more robust and less brittle expert system for designing cable stayed bridges. Reich uses a concept formation system called COBWEB for the creation of hierarchical classification

trees. In adapting COBWEB to design, Reich characterizes each design by a list of


property-value pairs.


When presented with these lists, COBWEB would classify them in a


hierarchy where the leaf nodes of the tree represented each design and the branches to the

leaves would classify each design based on similar property-values pairs. COBWEB uses


an unsupervised learning system that is statistically based to develop the hierarchy.


When


designing a bridge, a user would enter all specifications and COBWEB would find the best

match in terms of known designs. Incomplete specifications result in retrieval of all leaf

nodes below the point in the hierarchy where the specifications could not be met.


COBWEB's


knowledge base is not static in the sense that it can be continually expanded


and extended as new examples arise. In dealing with conflicts and ambiguous design

specifications, Reich developed his system without complete autonomy and allows human


interaction. Without human interaction, the system takes a conservative approach to


refining the design and presents several design alternatives even though each may not


satisfy the given specifications. Reich combined COBWEB's knowledge of design synthesis into a "complete" cable stayed bridge design system that includes evaluation,


analysis, and redesign. Some areas that Reich notes for further work include extending the

system to other design domains, tackling more complex designs, and supporting human

learning from the knowledge base.

Ivezic and Garrett [Ivezic92] developed an artificial neural network system, NETSYN, for learning and using design synthesis knowledge. Like Reich's COBWEB, Ivezic and Garrett assume that collections of property-value pairs are

sufficient to represent designs, and that the synthesis process is equivalent to assigning


property values.


When all properties have a value, then a complete design description


results. Each property value assignment is made in some design context, and there are

typically many possible design contexts.

An interesting new outlook that Ivezic and Garrett propose is that it is possible to

construct a sufficiently accurate estimation of the probability of each value of each design

property being used in a given design context. By using probabilities to classify designs,

Ivezic and Garrett ensure NETSYN represents the many-to-many relationships typical of

synthesis knowledge derived from example sets. Many-to-many relationships arise during


design when some variables are set while others remain unspecified.


Human designers,


through experience, can consider already specified variables along with design

requirements when specifying other variables. Depending on how a computational model

represents a design domain, it may ignore many-to-many relationships.

Given a set of known property values, NETSYN predicts the a posteriori

probabilities of each possible unknown property value using a backpropagation


connectionist model.


NETSYN is trained using a set of design contexts which is made up


of a number of bound design properties and a corresponding set of desired design property

value probabilities that are to be determined. The network architecture that NETSYN uses

is one in which a small network exists for each design property value considered. Each

neural network structure acts as a probability estimation function for a design variable.
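The sketch below illustrates the kind of per-property network just described: given the encoded values of already-specified design properties, it estimates a probability for each possible value of one unknown property. The layer sizes, softmax output, and random weights are my assumptions for illustration; NETSYN's exact architecture and training are not reproduced here.

# A minimal sketch of a per-property probability-estimation network.
import numpy as np

rng = np.random.default_rng(0)

n_known_inputs = 20     # e.g., 4 known properties x 5 one-hot values (assumption)
n_hidden = 10
n_values = 5            # candidate values of the single property this net predicts

# Randomly initialized weights stand in for weights learned by backpropagation.
W1 = rng.normal(scale=0.5, size=(n_hidden, n_known_inputs))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_values, n_hidden))
b2 = np.zeros(n_values)

def predict_value_probabilities(design_context):
    """Forward pass: hidden sigmoid layer, then softmax over candidate values."""
    h = 1.0 / (1.0 + np.exp(-(W1 @ design_context + b1)))
    scores = W2 @ h + b2
    exp_scores = np.exp(scores - scores.max())        # numerically stable softmax
    return exp_scores / exp_scores.sum()

if __name__ == "__main__":
    # One-hot encoding of four already-specified properties (illustrative input).
    context = np.zeros(n_known_inputs)
    context[[2, 8, 11, 19]] = 1.0
    probs = predict_value_probabilities(context)
    print("P(value_i | context):", np.round(probs, 3), "sum =", probs.sum())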







Ivezic and Garrett compared NETSYN with symbolic learning systems, one of which is Reich's COBWEB system [Reich91]. In order to evaluate each of the systems,


they created an artificial design problem that was defined by eight design properties, each

of which could take on five different values. They treated the first four properties as

design specifications and the other four as design descriptions that the synthesis process

would determine.


When evaluating NETSYN's capabilities, Ivezic and Garrett developed a total of 6250 valid design cases, some of which are repeated design cases in order to capture the probabilistic nature of the synthesis process. They partitioned these design cases into five train-test collections: 6014 training cases and 50 test cases, 500 training and 500 test cases, 1000 training and 1000 test cases, 4000 training cases and 2000 test cases, and 5000 training cases and 1000 test cases. In testing NETSYN's performance on the


artificial design problem, they looked at two different scenarios: when the entire synthesis

space is available for training and testing (the first train-test collection) and when part of

the synthesis space is available (the last four train-test collections.) In comparing

NETSYN and symbolic learning systems, COBWEB was the only symbolic system tested


that could capture the design knowledge in a form that Ivezic and Garrett desired. For these tests, Ivezic and Garrett used only the last four train-test collections.


NETSYN's performance on the given design problem showed respectable results, up to 80% perfect when the training tests cover a wide range of the synthesis space. Testing on a smaller number of training samples showed that NETSYN can maintain this performance. The worst case with a limited training set occurred when only 500 training cases were used, which still showed 70% perfect results.


When compared to COBWEB, NETSYN showed


consistently better performance with errors ranging from 6% to 27%.

Based on these limited results, Ivezic and Garrett conclude that connectionism

appears to be a promising approach to acquiring and using design synthesis knowledge,

but he acknowledges that his work was limited and further extensive research should


proceed


Some of the limitations of NETSYN are as follows:


Better representational capabilities -- NETSYN requires the input and output vectors to be mapped to binary processing elements. Using continuous-valued properties would make the network easier to use and to interpret results.

Investigate larger problems -- connectionist research in most problem domains initially works in small, tractable areas.

Apply NETSYN to realistic synthesis tasks.

Investigate approaches to incremental learning -- most connectionist learning architectures, including backpropagation, work in such a way that once a training set has been learned, additional training sets cannot be used to augment what has already been learned. This limits a network to the knowledge at hand when training occurs. In the neural network literature, this is referred to as the stability-plasticity dilemma.
Ivezic and Garrett developed an autonomous design model and did not explore
other parts of a general model of design such as constraint management, evaluation
of partial or preliminary designs, and mapping between abstractions.

NETSYN performs well using limited knowledge, but Ivezic and Garrett did not
investigate working with conflicting requirements.
Finally, all training sets given to NETSYN were randomly generated from the
6250 possible solutions, and these solutions included duplicates. Most supervised
learning algorithms are somewhat sensitive to the choice of training sets, an issue
that Ivezic and Garrett ignore.
Overall, NETSYN shows that computational models of design can be built from

connectionist systems. They perform comparatively with symbolic systems both in learning


and in using design knowledge.


Where they particularly seem to excel is in self-organizing the design knowledge contained in training examples.







Kamarthi and Kumara [Kamarthi93] investigated a similar use for connectionism in

the design synthesis process. They consider the classification and mapping problems in


conceptual design as two separate processes.


The classification problem they examine is a


twofold operation that both learns classes of design solutions from an example set and

recalls one or more of those design solutions. In connectionist terms, this is an associative

memory application. The mapping issue in conceptual design is more generative and

challenging for any computational preliminary design model. The network must learn

plausible mappings between functional requirements and corresponding design solutions

using previously solved design problems. In implementing computational models for

conceptual design, they created four different network architectures for comparison of


these two tasks using three different network paradigms.


They represented all the designs


as binary vectors of either eight or eleven elements. Each element of the input vectors

represents a single requirement as either on or off.
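To make the representation concrete, here is a small sketch of how a design can be encoded as the kind of binary requirement vector described above; the requirement names and stored designs are invented, and the simple Hamming-distance recall is only a stand-in for the associative recall a trained network performs.

# Illustrative sketch of the binary requirement encoding described above.
REQUIREMENTS = ["high_stiffness", "low_weight", "corrosion_resistant",
                "low_cost", "easy_fabrication", "high_fatigue_life",
                "compact", "standard_parts"]

def encode(active):
    """Map a set of active requirements to an 8-element binary vector."""
    return [1 if name in active else 0 for name in REQUIREMENTS]

# A tiny "memory" of previously solved designs keyed by their requirement vectors.
STORED_DESIGNS = {
    "welded steel I-beam":      encode({"high_stiffness", "low_cost", "standard_parts"}),
    "aluminum space frame":     encode({"low_weight", "corrosion_resistant", "compact"}),
    "composite sandwich panel": encode({"low_weight", "high_stiffness",
                                        "high_fatigue_life"}),
}

def recall(query):
    """Return the stored design whose requirement vector is closest to the query."""
    def hamming(a, b):
        return sum(x != y for x, y in zip(a, b))
    return min(STORED_DESIGNS, key=lambda name: hamming(STORED_DESIGNS[name], query))

if __name__ == "__main__":
    new_problem = encode({"low_weight", "high_stiffness", "compact"})
    print(new_problem)                    # e.g. [1, 1, 0, 0, 0, 0, 1, 0]
    print("closest stored design:", recall(new_problem))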

For the classification problem, Kamarthi and Kumara used a backpropagation

network and an ART-1 network and then compared the two. The backpropagation

network was trained using identical input and output vectors such that the network would

auto-associate the design solely based on design requirements. The backpropagation

network Kamarthi and Kumara adopted grappled with long training times and suffered

from the stability-plasticity dilemma that is characteristic of backpropagation networks.

One of the design constraints of the ART-1 network architecture is to overcome this

dilemma. The ART-1 network was presented with the same binary vectors and grouped similar designs into families. Comparing the two approaches, Kamarthi and Kumara observed the following:

ART-1 networks exhibit dynamic learning properties. They can continuously learn
new associations without forgetting old ones, provided the capacity of the network
has not been reached.
Backpropagation networks are more efficient in retrieving previously learned
design problems.
Given a design problem that is different from the learned set, backpropagation
networks can generate a new solution using features from the previously learned
examples. ART-1 networks do not have this ability.
The backpropagation architecture reproduces only a single candidate output,
whereas ART-1 networks can retrieve one-to-many relationships between design
specifications and design solutions since they categorize similar designs into
families.
Kamarthi and Kumara analyzed the mapping problem using backpropagation and


another adaptive network called ARTMAP. Training the backpropagation network was


similar to what they did for the classification problem. The only change was in the output

vectors, which included additional binary elements to make each design solution unique.

The disadvantages and advantages of backpropagation for mapping were identical to those

for classification.

The ARTMAP network is made up of two ART-1 networks along with a Map

Field module. One ART-1 network stores the design requirements and the other ART-1


network stores the design solutions.


The Map Field module learns the associations


between the families created by both ART-1 networks.


When presented with a new design


problem, ARTMAP could recall both similar design problem solutions along with plausible

new design solutions. In addition, the ARTMAP network could perform the inverse

problem, that is, it can recall the plausible design problems associated with a given design

solution.




For the mapping task, ARTMAP exhibited the same benefits and limitations as the ART-1 networks. One limitation that ARTMAP and

ART-1 networks have that backpropagation does not have is that they can only work with

binary input and output vectors. As far as applicability as a computational model of


design, Kamarthi and Kumara's


networks have the same limitations as those of NETSYN


[Ivezic92], particularly in the area of investigating simplified design problems; however,

their work further supports the notion that connectionist architectures can be used successfully

in preliminary design.


In another comparison with a symbolic design reasoning system,


Wilson and


Sharda [Wilson93] examine a neural network approach to duplicating the performance of


a rule-based expert system.


The primary appeal that a neural network offers according to


Wilson and Sharda is the inductive approach to knowledge acquisition that does not

require strict specification of IF-THEN rules or other knowledge representation schemes.


The goal of their study is to assess the performance of a backpropagation-type network in


mimicking a rule-based approach to design decision making. An already existing expert

system for packer selection in oil well design was used to create random training and test

cases for the network where one specific packer is recommended for each case. A total of

240 cases were created and divided into different training sets in order to investigate the

effects of different training set sizes. There were a total of eight input neurons representing

design specifications, ten hidden neurons, and six output neurons, each representing one of

the possible packer designs.
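A minimal sketch of the 8-10-6 topology just described follows. The random weights stand in for values that would be learned by backpropagation, and the input values are invented; this is not Wilson and Sharda's actual trained network.

# Sketch of an 8-input, 10-hidden, 6-output feedforward classifier; the
# highest-activation output unit is taken as the recommended packer.
import numpy as np

rng = np.random.default_rng(1)
W_hidden = rng.normal(scale=0.3, size=(10, 8))
b_hidden = np.zeros(10)
W_out = rng.normal(scale=0.3, size=(6, 10))
b_out = np.zeros(6)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recommend_packer(spec):
    """Forward pass; argmax over the six output units picks the packer."""
    hidden = sigmoid(W_hidden @ spec + b_hidden)
    output = sigmoid(W_out @ hidden + b_out)
    return int(np.argmax(output)), output

if __name__ == "__main__":
    well_spec = np.array([0.4, 0.8, 0.1, 0.9, 0.3, 0.7, 0.2, 0.5])  # scaled inputs
    choice, activations = recommend_packer(well_spec)
    print("recommended packer index:", choice)
    print("output activations:", np.round(activations, 3))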

Three different networks were trained with different training set sizes, starting at 120 and 180 cases; with the largest training set size, that network could correctly classify 95% of the cases. In comparison to the


performance of the expert system,


Wilson and Sharda were able to conclude that none of


the networks had learned the exact upper bound on one of the design variables. Wilson and Sharda's system did not require binary input or output vectors, and they surmise that the instances where the network fails are those where the network must learn characteristics


of continuous variables or where hard constraints were required.


Wilson and Sharda feel that exemplar training sets should appropriately represent these hard constraints and that random generation of cases did not accurately describe the actual case distribution

characteristics. In summary, this work shows that artificial neural networks offer distinct

advantages in knowledge acquisition for tasks that can be solved using rule-based expert


systems and might be especially useful in domains where a domain expert's


knowledge is


unavailable.

Berke et al. [Berke93] examine a more routine design task of several aerospace

structural components using artificial neural networks. Their focus is the application of

neural networks to capture structural design expertise through their ability to learn from

examples. Berke et al. recognize a major advantage of some connectionist approaches

over traditional computational methods such as numerical optimization in that a trained


network can produce results with trivial computational effort.


Another advantage is that


the predictive capabilities of connectionist models are insensitive to numerical instabilities


and convergence difficulties typically associated with optimization.


One disadvantage that


Berke et al. see in a neural network approach is that the requisite number of design examples can be large, and training can encounter convergence problems, which are similar to convergence problems in numerical

optimization.

Their routine design tasks were to generate designs for a trussed ring and two

types of wing sections. Optimum design data were used to create 125 sets of optimum

minimum weight ring designs subject to stress, frequency, and displacement constraints.

The first wing design problem used 15 optimum minimum weight wing designs under

displacement constraints, and the second wing design problem used 50 optimum minimum

weight forward swept wing designs under displacement constraints. For the ring design,

the input parameters were the inner and outer radii and the frequency limit for a total of

three input units, and the minimum weight and cross-sectional areas of 25 truss bars are

the output units. All input and output units were continuous, real values. The 125 designs

varied in weight, depending on their dimension and frequency requirements, from 1,000

pounds to 150,000 pounds. The designs were divided into a training set that consisted of

120 designs and a testing set of five designs. Predictions displayed individual error rates

between 0% and 10% for close to 80% of the variables; however, several design variables

showed much higher error rates. Berke et al. surmise that considering the complexity of

the optimum designs with shifting load paths and the huge weight range, the training and


predictions are satisfactory relative to an expert human designer.


The results for the two


wing designs further support their conclusions that artificial neural networks can predict

optimum designs under difficult design requirements, but several design parameters had


higher errors. Similar to Stojadinvic's findings [Stojadinvic90], Berke et al. feel that when a design space contains discontinuities, artificial neural networks have a difficult time modeling them. This finding


corresponds to Wilson and Sharda's


packer design system's


inability to properly learn


hard numerical limits [Wilson93].


Berke et al. speculate that clustering the training data


and decomposing the network within these clusters may be a viable solution to dealing


with complex design spaces. Berke et al. urge more research to assess the viability and usefulness of neural networks as expert designers.
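The input/output mapping described for the trussed ring can be sketched as follows: three continuous inputs (inner radius, outer radius, frequency limit) map to 26 continuous outputs (minimum weight plus 25 bar areas). The layer sizes, linear output layer, and random weights are assumptions; this is not the network Berke et al. trained.

# A minimal sketch of a 3-input, 26-output regression network for the ring task.
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hidden, n_out = 3, 12, 26

W1 = rng.normal(scale=0.4, size=(n_hidden, n_in))
b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.4, size=(n_out, n_hidden))
b2 = np.zeros(n_out)

def predict_ring_design(inner_radius, outer_radius, freq_limit):
    """Forward pass with sigmoid hidden units and linear outputs (regression)."""
    x = np.array([inner_radius, outer_radius, freq_limit])
    h = 1.0 / (1.0 + np.exp(-(W1 @ x + b1)))
    y = W2 @ h + b2
    return {"weight": y[0], "bar_areas": y[1:]}

if __name__ == "__main__":
    # Inputs would normally be scaled to a common range before training and use.
    design = predict_ring_design(0.5, 0.9, 0.3)
    print("predicted weight (scaled):", round(float(design["weight"]), 3))
    print("number of bar areas      :", design["bar_areas"].size)   # 25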


Connectionism as a Computational Model


There are many different views of what design is or what it should be. Design

research has stimulated these discussions and expanded our thinking into areas of human

cognition and computing that several years ago designers never considered. One of the

primary benefits from any type of research into design is in the alternative dimensions or

increased depth that a viewpoint proposes or illustrates. Any discussion or critique should

not rank or declare one model as better than another. Every model has strengths and

weaknesses, and examinations of any paradigm contribute to the general discourse of


design.


My previous research into knowledge-based design models provided the


stimulation to investigate an alternative paradigm, and artificial neural networks were

chosen for several compelling reasons that are detailed in this section.

Knowledge-based systems focus on capturing the essence of human reasoning by

representing knowledge as a collection of symbols and manipulating logical statements.

The knowledge may be in the form of rules, frames, scripts or other symbolic

representations. Connectionism, on the other hand, does not explicitly represent knowledge in terms of symbolic categories and formal logic. As previously discussed, humans use many reasoning methods during design.


When developing a computational model, the majority of research has


concentrated on developing such features of design as design rules, design hierarchies, and

articulating design processes. Symbolic computational models do a good job of supporting

these areas, but design is more than this. A large part of design is grounded in experience. Computationally, experiential knowledge goes beyond recalling a specific case from a database of known designs; it also involves the emergence of designs from combining multiple experiences into a new design situation that may be innovative and superficially

unrelated to previous design situations. Humans have difficulty articulating this process

that some call creative or inventive.

Connectionist models that learn can synthesize new forms from previously learned

examples. This goes beyond classification problems. These generalization capabilities

make connectionist models both robust and innovative, provided the training examples cover many dimensions of a solution domain. Because artificial neural networks exhibit only inductive learning capabilities, they are not truly spontaneous in their solutions, but their solutions may have a variety of properties, each derived from learned examples. Both


prescriptive and descriptive design process models suffer from a knowledge bottleneck.

Good human designers have decades of formal training and experience as a basis for

creating new designs. Humans continuously assimilate their life experiences into memories

that influence their creative processes, and only closed-minded people have a knowledge bottleneck. Computational design process models that learn may help alleviate this bottleneck.


Computational design process models must learn two classes of design knowledge: domain knowledge and design process knowledge. Design researchers have different means of communicating these types of knowledge that might depend on the type of design model (prescriptive or descriptive), the design domain, and the design task (an autonomous system or only one stage in a larger process model). Domain knowledge may consist of abstractions, requirements, design variables, and even design procedures. Design process knowledge is the application of domain knowledge. Both these types of

knowledge require representations within a computational model.

Connectionism emphasizes the structure of the human cognitive process as an

artificial neural network. It is artificial in the sense that the parallel processes that occur in

a biological neural network are computationally simulated on serial computers and the

biological processes are simplified. The primary difference between knowledge-based

models and connectionist models is that knowledge is not explicitly represented in


symbolic terms but in the strength of the connections between "neurons." Neural computational models rely on the notion of memory, and that in response to some outside

stimulus, the model will recall various "memories."

In the studying of design, knowledge-based approaches make the knowledge that

they use explicit. That is, the design knowledge itself is available for study and available

for creating explanations. In teaching design expertise, this is a very desirable attribute,

and in complex design environments, human designers may need to check the design

processes or product for consistency. Unfortunately, the knowledge base itself is very






When learning from examples, artificial neural networks self-organize the

knowledge in the magnitude of the weights between processing units. Since the

representation of specific knowledge concepts is not explicit within the network,

connectionist models can represent a great deal of information that might be implied

within the training examples such as abstract design requirements, mappings between

abstraction levels, and interacting subproblems and requirements. Network models do not

require the control structure that knowledge-based systems employ to manage their

constraints and knowledge, but their training examples have a huge effect on their

performance. As an example, if we train a network to recognize the first twenty-five


characters in the alphabet, the network will never be able to recall the letter z since we have never given it that association. What the network will attempt to do is characterize the letter z as one of the twenty-five letters it does know about. Thus in terms of design,


when we employ learning in a network implementation, the network will learn the implied

details about domain knowledge and about processes such that it can differentiate between

each learned example. If we do not choose examples in sufficient quantity or quality to

represent those dimensions of a design domain that are significant, the network will never

learn them.

When given novel situations, connectionist systems are able to generalize by

recalling the most mutually compatible and consistent memory from the given stimulus.

Numerically, this is a kind of relaxation procedure that is typically part of the algorithm


that simulates recall.


This feature of connectionism appears to capture the essence of






Many design domains are characterized by abstract, conflicting requirements that

provide a starting point for a design process. Because artificial neural networks have the

ability to generalize, they are robust in the sense that given conflicting or incomplete

stimulus, they will gracefully degrade by recalling a compatible and consistent solution.

This is an inherent feature of artificial neural networks and not part of an additional

control structure that knowledge-based systems would require. Recall that protocol

studies identified opportunistic behavior in many designers that tended to cross abstraction


boundaries. A network's generalization capabilities emulate this same process.


Connectionism is not a complete paradigm for design computational models. Other

approaches to computational models may provide a better structure for studying such

important aspects of design as design rules, abstractions of design features, explanation of


reasoning processes, and explicit evaluation of design artifacts [Coyne90]. Where connectionism appeals is in the ability to use experiential knowledge, apparent ability to synthesize in novel situations, and the implicit ability to self-organize knowledge. These features of connectionist systems are the stimulus for this research.













ARTIFICIAL NEURAL NETWORKS


This chapter provides a foundation for artificial neural networks by presenting a

brief historical perspective and the general theory behind artificial neural networks. In later

chapters details of two classes of networks, stochastic and feedforward networks, are

presented, but before any analysis of neural computing is done, a solid base in that field

must be established. The following sections define what an artificial neural network is and

why they appear appealing to practitioners in many different fields of research. Along with

other artificial intelligence paradigms, artificial neural networks have experienced

explosive growth and interest, even to the point where some people consider them as

another computing model, along with numeric and symbolic computing. Development of


artificial neural networks in their current form began in the 1980s; however, their roots can be traced back much further as described in the next section.


Historical Perspective


Humans have always speculated on how our brain generates and organizes our

thoughts and memories. Both the spiritual and anatomical nature has been examined by

philosophers, theologians, psychologists, physiologists, and anatomists with limited

progress and lacking agreement. Due to the complex nature of thought processes and the complexity of even the most simple neural systems, the application of scientific methods to understanding human thought has been slow.

Neurobiologists and neuroanatomists have made substantial progress in

understanding how the brain and nervous system are put together, but have made little

progress towards understanding its operation. The complexity of the brain is

staggering with hundreds of billions of neurons, each connected to thousands of other

neurons. Systems of this size dwarf even the most ambitious super computers known

to date.

Gradually, as researchers developed a rudimentary understanding of the

functioning of the neuron and its pattern of interconnections, mathematical models

have emerged to test these theories. From these early works, it became apparent that

even simple models of neurons and their interconnections not only functioned in a

similar manner as the brain, but they displayed many practical functions beyond just

mimicking the brain. Thus, even from the early days of neural network research, two

mutually reinforcing objectives have emerged. First, some researchers focus on

understanding the physiological and psychological functions of the brain, and second,

some researchers develop artificial neural systems that perform brain-like functions.

This research is directed to the latter. It is interesting to note that some of these same

dilemmas concerning understanding human thought processes and memories that

confront researchers in neural systems also face design researchers.


Artificial neural networks made their first appearance in the 1940s as hardware models of neurons and their interconnections. Even though these hardware

models were simple, they achieved impressive results. McCulloch and Pitts

[McCulloch43] published the first systematic study of the mathematical foundations

for neural network research that was to follow. Most of their work was in developing


the simple binary neuron model known as the perceptron. These systems generally have a single layer of neurons connected by weights to a set of inputs as shown in Figure 9. The sigma (Σ) unit multiplies each input, x_i, by a weight, w_i, and sums the resulting values. The model then passes these values to a threshold unit that compares the value to a predetermined threshold value. If the sum is greater than the threshold, then the output is one, else the output is zero.

Figure 9: Simple Perceptron
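To make this computation concrete, the following is a minimal Python sketch of a single perceptron unit; the particular weights, threshold, and AND-gate example are illustrative assumptions rather than values from this study.

    import numpy as np

    def perceptron(x, w, theta):
        """Single perceptron: weighted sum of inputs followed by a hard threshold."""
        u = np.dot(w, x)              # sigma unit: sum of inputs times weights
        return 1 if u > theta else 0  # threshold unit: 1 if the sum exceeds theta, else 0

    # Illustrative example: with these assumed weights and threshold, the unit
    # behaves like a logical AND of its two binary inputs.
    w = np.array([1.0, 1.0])
    theta = 1.5
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x), w, theta))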

Hebb [Hebb49] proposed the first explicit statement of a physiological learning


rule for synaptic modification. Although Hebb's work was not a mathematical statement, Hebbian learning is the foundation for most network learning algorithms and is based on Hebb's description,


When an axon of cell A is near enough to excite a cell B and repeatedly
or persistently takes part in firing it, some growth process or metabolic


change takes place in one or both cells such that A's efficiency, as one of the cells firing B, is increased. [Hebb49]







Hebb's book encompasses more than just a proposed learning rule; he provides a useful discussion of the link between psychology and physiology and coined the term "connectionism." Several other neural concepts were also proposed or recognized by


Hebb. Other than his learning rule, Hebb also asseverates the distributed nature of

representation that the nervous system uses. In order to represent something, many

cells in the nervous system must take part in the representation. This gives rise to


Hebb's third concept, which postulated that cells are arranged in assemblies. These


assemblies are interconnected, self-reinforcing subsets of neurons that form the

representation of information. Individual cells could belong to more than one assembly

depending on the context, and multiple assemblies could be active at any one time.

Thus, Hebb proposed that there is a distributed representation at both the anatomical

level and at the functional level.

In the 1950s, the world entered the computer age, and artificial neural network


research benefited from this trend.


Early neural network research and simulation relied


on few mathematical statements and many wordy descriptions. Rochester et al.


[Rochester56] studied Hebb's learning system using a computer simulation of the


nervous system. This work was one of the first to test a well formulated, detailed

neural theory using a computer simulation. This paper made an important point about

neural network research that is directly tied to the computer. Before the computer,

neural theories could be proposed, discussed, and analyzed, but they could never be

tested. Using a computer, researchers could test the form and precision of assumptions







be ignored, and success would depend as much on the details as on developing the

general theories.

Rosenblatt [Rosenblatt58] proved a perceptron learning theorem which demonstrated that a perceptron could learn anything it could represent. His work stimulated many researchers to further investigate the potential of perceptrons. Rosenblatt's excitement is clear from the following quote:


The question may well be raised at this point of where the perceptron's capabilities actually stop . . . the system described is sufficient for pattern recognition, associative learning, and such cognitive sets as are necessary for selective attention and selective recall. The system appears to be potentially capable of trial and error learning and can learn to emit ordered sequences of responses . . . [Rosenblatt58, page 404]

However, he also recognized some of the more serious computational limitations of

perceptrons that still plague artificial neural networks today. He notes that perceptrons

act in a "brain damaged" manner by being literal, inflexible, and unable to handle

abstractions.

Widrow and Hoff [Widrow60] extended the perceptron model by proposing a perceptron-like system that could potentially learn quickly and accurately. The neurons


of this system were binary threshold logic units with interconnections of variable

strength. The neurons computed a weighted sum of the inputs times the synaptic

weights and added a bias term. If the sum was greater than zero, then the neuron

output was +1, and if the sum was equal or less than zero, then the output was -1.

The learning rule that Widrow and Hoff employed was a simple supervised procedure requiring a set of target values that take on either +1 or -1. When learning, their system computed an error signal for each neuron, error_j, which is the difference between what the neuron computed and the exact answer. The synaptic weights, w_ij, were then adjusted at time t + 1 given the neuron input, x_i, and a constant, α, as:

w_ij(t + 1) = w_ij(t) + α · x_i · error_j

This process would continue until the system's response was exactly correct (i.e., the error signal became exactly zero).
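As an illustration of this update rule, the following is a minimal Python sketch of training a single Widrow-Hoff unit; the learning constant, data, and stopping criterion are illustrative assumptions.

    import numpy as np

    def train_widrow_hoff(X, targets, alpha=0.1, epochs=50):
        """Adjust weights with w(t+1) = w(t) + alpha * x * error until the error is zero."""
        w = np.zeros(X.shape[1])
        for _ in range(epochs):
            total_error = 0
            for x, t in zip(X, targets):
                y = 1 if np.dot(w, x) > 0 else -1   # binary threshold unit (+1 / -1)
                error = t - y                        # difference between target and output
                w = w + alpha * error * x            # Widrow-Hoff weight adjustment
                total_error += abs(error)
            if total_error == 0:                     # stop when the response is exactly correct
                break
        return w

    # Illustrative example: learn a linearly separable +1/-1 classification.
    X = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
    targets = np.array([1, 1, 1, -1])   # last input column acts as a bias term
    print(train_widrow_hoff(X, targets))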

Perceptrons and perceptron-like systems generated a great deal of interest in


the early 1960s due to their initial successes at learning some simple useful functions and exhibiting brain-like behavior; however, even from the earliest days, many scientists suspected or even showed that the types of problems that perceptrons could solve were limited in scope [Rosenblatt58]. The book, Perceptrons [Minsky69], simply proved these computational limitations of perceptrons, formalizing what they could represent and learn.

The analysis that Minsky and Papert employed is based on the simple

perceptron model shown in Figure 10. The computation of a function Ψ(x) in response to some stimulus x is performed in two stages. First, functions φ(x) are computed and combined through a function Ω that outputs a single value ψ. Minsky and Papert show that this model performs like a logical predicate function. By imposing certain conditions and restrictions on perceptrons, they were able to prove several important limitations of perceptrons.













Figure 10: Minsky's Perceptron Model

The first limitation that they consider is that on order, which essentially

recognizes that only a limited number of input units can be connected to each association unit, φ. The second limitation is that of diameter, where input units can only connect a limited geometrical region to an association unit, φ. Because of these limitations, Minsky and Papert proved that order and diameter limited perceptrons could not compute the predicate for parity nor that of connectedness. The

parity problem requires counting the number of active inputs and determining if the

total is odd or even.1 The connectedness problem is defined as a predicate for

determining if all points in any geometric figure are connected to one another.

In summary, Perceptrons described a general dissatisfaction with the perceptron concept and was not solely responsible for the decline of neural network research in the United States. Because Minsky and Papert did such a clear and thorough job of illuminating the perceptron's limitations, some of the presumptions presented in the







last chapter of Perceptrons did dampen and delay future work in neural networks as

the following quote exemplifies:


. . . we consider it to be an important research problem to elucidate . . . our intuitive judgment that the extension to [multiple layer systems] is sterile. [Minsky69, page 232]

This assumption was later proved wrong [Rumelhart86c] using the back propagation

algorithm, which could solve the parity problem. Viewed optimistically, the underlying result of Perceptrons was a chance to consolidate and extend the field away from the glare of the initial hype created by early successes. Where neural network research did continue was in the area of psychological modeling, which carried artificial neural network research into the 1980s.

Perhaps the most famous works presented in the early 1970s are those by Kohonen [Kohonen72] and Anderson [Anderson72] who independently proposed the

same model for associative memories. Kohonen is primarily concerned with the

mathematical properties of such systems, whereas Anderson focuses on the

physiological plausibility of these systems. The linear associator proposed in these two

papers is markedly different from the perceptron. They consider that most neurons are

not binary restricted neurons, with only two possible outputs, but have continuous

valued outputs.

The basic neuron model is a very simple analogue integrator with a continuous

valued output. It takes a set of inputs, X, multiplies them by the synaptic weights, W,







adds them up, with the neuron's output proportional to the sum. The input-output


relations of such systems are specified in terms of matrix multiplication.

Since these models are memory models, they require some type of learning rule

and both Anderson and Kohonen use a generalization of Hebbian learning, which

modifies synaptic weights in proportion to the correlation between input and output

elements. In mathematical terms, the connection matrix storing the memories becomes

the outer product of the input and output vectors, and for recall, multiplying the input

vector by the connection matrix yields the output vector.
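A minimal Python sketch of this outer-product storage and matrix-multiplication recall follows; the stored vectors are illustrative assumptions and are chosen orthogonal so that recall is exact.

    import numpy as np

    # Two orthonormal input patterns and their associated output patterns (assumed values).
    x1 = np.array([1.0, 0.0, 0.0]);  y1 = np.array([ 1.0, -1.0])
    x2 = np.array([0.0, 1.0, 0.0]);  y2 = np.array([-1.0,  1.0])

    # Hebbian storage: the connection matrix is the sum of outer products of output and input.
    W = np.outer(y1, x1) + np.outer(y2, x2)

    # Recall: multiplying an input vector by the connection matrix yields the stored output.
    print(W @ x1)   # recovers y1
    print(W @ x2)   # recovers y2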

As more associations are stored in the connection matrix, the resulting

association is generally not perfect. The only case where association is perfect is when

the input vectors are orthogonal. This puts an upper limit on the number of vectors


that can be stored based on the dimensionality of the memory (i.e., the connection matrix). This obviously wastes neurons since the capacity of each neuron is not


necessarily maximized, but the usefulness of the model made the trade-off acceptable.

Although this type of system is simplistic in its linearity, it can model many useful

properties and has served as a starting point for larger, more complicated systems.

Stephen Grossberg has been one of the leaders in neural network research over

the past twenty-five years. His work is founded on his complex and very detailed

mathematical analysis of brain function and has found utility in many areas, even in

engineering design [Kamarthi93]. He is particularly well known for his series of

computer simulation programs implementing variations of his Adaptive Resonance Theory (ART); however, Grossberg's research is consequential to current neural network theory. In


one of his most important papers [Grossberg80], he provides access to much of his

basic thinking on how the brain should work.


Most of Grossberg's work is related to how he thinks a neural network should handle error correction during learning. His theory is different from Widrow and Hoff's learning theory [Widrow60] and other subsequent supervised learning


algorithms, and it serves as a basis for his series of ART programs. The key point

made by Grossberg is that a neural network should generally do error correction by

itself rather than from a "teacher" that indicates what is wrong. In order to do this type

of error correction, Grossberg suggests that the neural system is made up of two

systems in series communicating with each other. Input or stimulus to one neural

system causes some neurons to be stimulated on the other system.

When one input pattern causes the wrong set of cells to be stimulated in the

second system, this corresponds to an error. By having reciprocal connections between

the second neural system and the first, a pattern of activity that stimulates the second

systems will result in "learned feedback" returning to the first neural system. Thus, the

associated pattern from the second system interacts with the actual input pattern, and

the neural network requires no outside error correction feedback.

Another way in which Grossberg deviates from the norm concerning error

correction is that instead of using the difference between the input pattern and stored

pattern, he proposes using the sum. In using the sum, Grossberg must deal with






effectively suppressing small signals and enhancing large ones through what he calls a

"quenching threshold."

Grossberg tests for matching between the input pattern and the learned

feedback. If the learned feedback matches the input pattern, then the sum is larger than

the input pattern and thus there has been an enhancement since the feedback matches

the input pattern. On the other hand, if the feedback and input do not match, then the

sum will be more uniform without the enhanced peaks of a correct signal. If the

quenching threshold is properly set, some neural activity must become suppressed in

order to tune the response.

Grossberg notes that the feedback portion of a set of neurons can in essence

provide a means to maintain a set of patterns in what he terms "short term memory"

that remains active even if some inputs are shut off. When an input and feedback

pattern match, strong signals result and the patterns reinforce one another. Grossberg


refers to this phenomenon as "adaptive resonance." Such network dynamics are now


used in many neural networks, even though the feedback mechanisms may be different.

The work of Hopfield [Hopfield82] brought together many of the ideas of his

precursors in neural network research; however, Hopfield augmented his discussion

with a clear and detailed mathematical analysis of a computational neural network.


Hopfield's network is recurrent since all neurons connect to each other and to


themselves. The neurons can achieve only binary states, and he assumes that the neural

network needs to learn a set of states or activation patterns. The network employs







Hopfield's premise was that the function of a neural system is to develop a number of locally stable points or attractors in state space. Other points in state space flow toward a stable point. This is the most interesting aspect of Hopfield's work, and


by making the connection that neural network dynamics are analogous to statistical

mechanics, he helped broaden the field of neural networks. No longer would the field

solely consist of those researchers wishing to understand how neural networks work

from a biological and psychological standpoint, but his work prompted many other

scientists to ask what artificial neural networks could accomplish.

In making this analogy, Hopfield defined a term that is analogous to energy

and that characterized the current network state as either stable or unstable. He

mathematically showed that the algorithm for modifying an unstable state vector

causes the energy term to be monotonically decreasing; thus, state changes continue

until a local minimum energy is achieved. The modifying algorithm chooses a neuron


unit at random, examines its inputs, and changes its state to either on or off, depending on the sum of the inputs being above or below a set threshold. Therefore, the system energy either decreases or remains the same and a stable state corresponds to an energy minimum.
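The following is a minimal Python sketch of this asynchronous update and the associated energy term for a small binary network; the stored pattern, network size, and number of update steps are illustrative assumptions.

    import numpy as np

    def energy(W, s):
        """Hopfield energy term; it never increases under the update rule below."""
        return -0.5 * s @ W @ s

    def update(W, s, steps=100, theta=0.0):
        """Pick a unit at random, examine its inputs, and set it on (+1) or off (-1)."""
        s = s.copy()
        for _ in range(steps):
            i = np.random.randint(len(s))
            s[i] = 1 if W[i] @ s > theta else -1
        return s

    # Store one pattern with a Hebbian outer product (zero diagonal), then recall it
    # from a corrupted version of itself.
    pattern = np.array([1, -1, 1, -1, 1], dtype=float)
    W = np.outer(pattern, pattern)
    np.fill_diagonal(W, 0.0)
    noisy = pattern.copy()
    noisy[0] = -noisy[0]
    print(energy(W, noisy), energy(W, update(W, noisy)))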

Hopfield identified many useful properties of his network that closely follow

from his original premise. His networks have a built-in error correcting mechanism,

since deviations from stable points disappear as all points in state space flow toward a

stable point. If the network is given incomplete information, that point in state space will still flow toward a nearby stable point.


Ackley et al. [Ackley85] developed a connectionist system called a "Boltzmann


machine." It is an artificial neural network whose basic elements are similar to those


used by Hopfield and is well suited to constraint satisfaction tasks. Boltzmann

machines work with weak constraints in the sense that the best solution to a problem

can violate some constraints and the quality of the solution is determined from the cost

or number of the violated constraints. Although a Boltzmann machine network is


similar to a Hopfield network, there are several important and interesting differences.


Like a Hopfield network, the neuron processing units of a Boltzmann machine are

binary and are connected by weights that can take on real values. They add up all their

inputs and compare the sum to a threshold. If the value is greater than the threshold,

then the neuron takes a value of one, else the unit takes on a value of zero.

As in a Hopfield network, the energy of the system monotonically decreases to


a local minimum, rather than a global minimum. Using Hopfield's simple updating rule, the network will get caught in local minima, which is the case with gradient descent and hill-climbing algorithms. Being stuck in a local minimum is not a problem with Hopfield networks since they are associative memory models used to store items at the local minima. Local minima, however, cause problems for constraint satisfaction

tasks since the quality of the solution suffers if the network settles at a local rather

than a global minimum.

Ackley et al. proposed that a simple way to get out of a local minimum is to

occasionally allow the system to jump to higher energy states. This process is







stochastic and based on the energy gap between a unit being on or off (ΔE). The unit is turned on with a probability given by the following equation:

p = 1 / (1 + e^(-ΔE/T))

The term, T, acts like temperature, and a system of units will eventually reach "thermal equilibrium" as the temperature is lowered. The relative probability of the two global states obeys the Boltzmann distribution given as

P_α / P_β = e^(-(E_α - E_β)/T)

Here, P_α is the probability of being in the αth global state, and E_α is the energy of that state. The β subscript terms represent the other energy state.

The temperature term controls the sensitivity of energy differences. At low

temperatures, there is a strong bias towards low energy states but at a high time cost

to reach those states. Conversely, at high temperatures equilibrium is achieved faster

but at a higher energy state. Thus, similar to annealing in metals, starting at high

temperatures and progressing to lower temperatures, gradually cooling the system,

allows the system to rapidly approach equilibrium. High temperatures allow a coarse search of the global state space, and lower temperatures let the system respond to smaller energy differences within the coarse minimum discovered at higher temperatures. Both Hopfield's network and the Boltzmann machine network provided the basis for another network, Harmony Theory, that is discussed in the next chapter.
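A minimal Python sketch of this stochastic unit update with a gradually lowered temperature follows; the network, weights, and cooling schedule are illustrative assumptions, not the Boltzmann machine implementation used in this research.

    import numpy as np

    def boltzmann_settle(W, s, T0=10.0, steps=500):
        """Stochastic binary units: a unit turns on with probability 1/(1 + exp(-dE/T))."""
        s = s.copy()
        for t in range(1, steps + 1):
            T = T0 / np.log(1 + t)                  # slowly cool the temperature parameter
            i = np.random.randint(len(s))
            dE = W[i] @ s                           # energy gap between unit i being on or off
            p_on = 1.0 / (1.0 + np.exp(-dE / T))
            s[i] = 1 if np.random.rand() < p_on else 0
        return s

    # Illustrative three-unit network with symmetric weights.
    W = np.array([[ 0.0,  2.0, -1.0],
                  [ 2.0,  0.0, -1.0],
                  [-1.0, -1.0,  0.0]])
    print(boltzmann_settle(W, np.zeros(3)))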






The network dynamics of a Boltzmann machine are effective in finding

minima in the energy state space for given input patterns. At the time Ackley's

paper was written, neural networks were on the verge of making a comeback, but only

if a generalized learning algorithm could be found. Minsky and Papert [Minsky69] had

shown that single layer perceptron networks were incapable of solving many

interesting but simple problems. In order to solve these types of problems, a network

must contain nonlinear processing units that are not directly constrained by the input;


however, up until this time, training multiple-layer networks was impossible.


When the


network produced wrong results, it was seemingly impossible to determine which of

the many connection strengths were at fault.

Ackley et al. developed a learning algorithm for Boltzmann machine networks

that overcame this problem. The neuron units of a Boltzmann machine are divided into


two types.


The first type is a set of "visible" units that are the interface between the


network and the environment.


The second type consists of "hidden" units that are not fixed


and can represent more complex relationships about the visible units.


The definition of


learning for Boltzmann machines involves matching probabilities between a given

external environment (input/output pairs) and the network. In essence, learning

attempts to find a set of weights that is most likely to have generated the given

environment, so if the probabilities of states of the network match the probabilities of

states of the environment, then the network accurately represents the environment.

The learning rule to adjust the weights to increase the fit between the network and the environment involves two steps. The first step is performed by letting the network run to an initial equilibrium state such that probabilities of the


states for each unit can be estimated.


The second step fixes the visible units to take


appropriate values as specified by the input/output pairs, and values of the probabilities

of the states of units are again estimated. Local weight changes are then made

proportional to the difference in the probabilities of the units coupled by their linking


weight.


Although only locally available information is used to change weights, the


change moves toward an optimum of a global measure of fitness. The primary

drawback is that this is a slow learning process since the network must estimate

complete sets of probabilities in order to adjust weights.

The papers discussed in this section are some of the seminal works that have

stimulated research in this field for almost half a century. As can be seen, research has

originated from many different, seemingly disparate fields but is unified in a

common goal: to understand the workings of massively parallel networks similar to


biological neural networks.


This concludes the historical perspective, and the


remainder of this chapter reviews general properties of artificial neural networks.


The Neuron


As can be inferred from the previous discussions, artificial neural networks can

appear to be quite diverse; however, all these systems have a great deal in common.

This section identifies recurrent themes and begins by briefly examining the structure

and behavior of biological neurons since they strongly influence artificial neural

networks. Next, general characteristics of computational neurons are examined.






The Biological Neuron

Artificial neural networks are biologically inspired, and may be considered as

just a different level of abstraction of their biological counterparts. Initially, many


researchers turned to artificial neural networks to help hypothesize about the brain's overall operation; however, in the past ten years, even though the functions of artificial

neural networks are often suggestive of human cognition, some network designers

have gone beyond biological knowledge of the brain and discarded biological

plausibility.


Figure 11: A Biological Neuron (dendrites, synapses, cell body, nucleus)

Biological neurons make up a nervous system. It is estimated that humans have more than 10^11 neurons with perhaps 10^15 interconnections


between neurons. Neurons receive, process, and transmit electrochemical impulses. A

typical biological neuron is shown in Figure 11. A neuron has a well defined region

that houses the nucleus and is known as the cell body. Originating from the cell body

are long, branching fibers that are divided into possibly several, shorter dendrites and a

single, longer axon. Dendrites extend from the cell body, branching to provide

receptive surfaces for signals from other neurons. The cell body gathers all input signals and conducts a signal to the axon. The axon extends from the cell body and terminates in a

branching pattern called terminal fibers. The terminal fibers connect to the dendrites of

other neurons through synapses, which transmit an electrochemical signal from one

neuron to another.

Functionally, a biological neuron is a complex, electrochemical device. The

activity of a neuron is measured in terms of firing frequency, which is the number of

axon signals generated in a constant time interval. The axon signals are continuously


valued.


When the signals reach the synapses, they are transmitted to neighboring


neurons through a chemical transmitter. Given a series of input signals, a neuron can

be either excitatory or inhibitory. An inhibitory action suppresses the transmission of a

signal; whereas, an excitatory response continues transmission of a signal. Some

neurons are capable of transmitting many different types of signals via different

chemical transmitters and potentials.


The Computational Neuron


Computational neurons mimic the simplest capabilities of biological neurons.

Although there are many types of artificial neurons, this section describes the basic

functioning of artificial neurons. Computationally, the functioning of artificial neurons

can be divided into the following three stages:


Input
Activation
Output





At the highest level of abstraction, a computational neuron receives a set of inputs, X,

representing output from other neurons or possibly input from the environment.

Dendrites perform this task in a biological neuron. The set of inputs are multiplied by

corresponding weights, W, which represent synaptic strengths. These results may be

passed through a function, φ. The result to this point is called the input to a neuron. This input is passed to an activation function, f, to determine the level of activation,

S, for that neuron. In a biological neuron, this occurs in the cell body. The activation

level, S, may then go through an output function, g, producing the output value, F,

which is passed on to all other connected neurons. Many computational neurons


simply output the result of the activation function.3 This is similar to the function of an axon. Figure 12 illustrates this general concept.

Figure 12: Computational Neuron (Input Stage: u = φ(X, W); Activation Stage: S = f(u); Output Stage: F = g(S))

3 Since many computational neurons use the result of the activation stage as their output, the terms output and activation are often used interchangeably.


Input stage

Although there are no restrictions as to the type of function that a neuron may

use at the input stage, most employ a simple combination of the inputs and the


weights.


The function, φ(x_i, w_i), typically multiplies each input value by the corresponding weight as

u = Σ (x_i · w_i), for i = 1 to n,

where n is the number of inputs to a neuron. Variations on this theme exist and have been employed in some networks.


Activation stage


The activation state, S, of a neuron is indicative of that neuron's contribution to the state of the network at a particular time in response to some input.

Computational neurons can only take on a single activation value at any time, and

typical computational neurons are limited in the range of values they can achieve. The

activation of biological neurons are more complex, but they also appear to have a

limited activation range.


The simplest type of computational neurons are the binary type. The activation of these types of neurons is limited to two values in the set, S = {0, 1} or S = {-1, 1}, where the value of 1 indicates that the neuron is active [McCulloch43, Widrow60, Hopfield82].





Continuous neurons, which take activation values from the complete set of real numbers, S = ℜ, are on the other end of the spectrum from binary neurons. Subsets of this type of neuron are those that take on real values from a bounded, closed interval such as S = [0, 1] or S = [-1, 1]. Several networks [Kohonen72, Anderson72, Ackley85] use these types of neurons.


The activation values are arrived at by passing the results from the input stage


through an activation function.


Two classes of activation functions are employed,


depending on the type of network. These classes are divided into stochastic and

deterministic activation functions. Stochastic activation functions compute the

probability that a neuron will have an activation value based on some probability

distribution function, f, the current input to the neuron, u, the previous activation state, S_(t-1), and possibly a global parameter analogous to temperature, T:

p(S) = f(u, S_(t-1), T)

A common probability distribution function is the Boltzmann distribution,

p(u) = e^(-u/T).

The global parameter, T, has an important effect on the shape of this function, and most stochastic networks vary T based on an annealing schedule. German and German [German84] showed that the rate of temperature reduction in the annealing schedule should be no faster than

T(t) = T_0 / log(1 + t),

where T_0 is the initial temperature and t represents time. Using this annealing schedule,

the neuron activations will gradually settle onto extremes that represent desirable

network states [Hopfield82, Ackley85]; however, as can be seen from Figure 13, this

equation predicts very slow cooling rates. As a result, both running and training

stochastic networks can take impractical periods of time.


Figure 13: German's Annealing Schedule

Deterministic activation functions may be subclassed into three categories: linear, thresholding, and squashing. The squashing class is predominant since it can emulate the others by varying its parameters. The most common linear activation function is the identity function, which passes the total input to connected neurons. Thresholding activation functions require a specified threshold value, θ, to compare against the total input.

Figure 14: Step Function

There are several forms of thresholding activation functions. The most common are the step function (Figure 14):


f(u, θ) = -1 if u < θ; 1 otherwise,

and the linear threshold function (Figure 15), f(u, θ_1, θ_2), which takes the value -1 for u ≤ θ_1, increases linearly for θ_1 < u < θ_2, and takes the value 1 for u ≥ θ_2.

Figure 15: Linear Threshold Function

Squashing functions are used in those networks where an unbounded set of real values must be mapped to a bounded set. The most common is the sigmoid function:

f(u) = 1 / (1 + e^(-βu)),

where larger values of β produce a step-like function (Figure 16) and smaller values cause the sigmoid to behave like a linear function (Figure 17).







Output stage

Networks can use any function as an output function, g(s); however, most

networks simply pass along their activation value to connected neurons. To reflect the

lack of a specific neuron output function, the output of a neuron is often called its

activation or activation level.
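To tie the three stages together, the following is a minimal Python sketch of a computational neuron with a weighted-sum input stage, a choice of step or sigmoid activation, and an identity output stage; the particular inputs, weights, and parameter values are illustrative assumptions.

    import numpy as np

    def input_stage(x, w):
        """u: sum of each input multiplied by its corresponding weight."""
        return np.dot(x, w)

    def step(u, theta=0.0):
        """Thresholding activation: -1 if u < theta, 1 otherwise."""
        return -1.0 if u < theta else 1.0

    def sigmoid(u, beta=1.0):
        """Squashing activation: maps any real input into the interval (0, 1)."""
        return 1.0 / (1.0 + np.exp(-beta * u))

    def neuron(x, w, activation=sigmoid):
        u = input_stage(x, w)     # input stage
        S = activation(u)         # activation stage
        return S                  # output stage: identity (output equals activation)

    x = np.array([0.5, -1.0, 2.0])
    w = np.array([1.0, 0.5, 0.25])
    print(neuron(x, w, step), neuron(x, w, sigmoid))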


Networks of Neurons


Both computational and biological neurons by themselves are incapable of doing

much more than turning themselves on or off. As an example, consider a single neural

receptor in a retina. As an individual neuron, it is incapable of recognizing a familiar

visual scene; however, organized with millions of other neurons, we are able to recall


well-known places.


Thus, it is clear that the information processing power of neural


systems does not need to come from using complex neurons but from the aggregate


activity of a network of neurons. Rumelhart et al. [Rumelhart86b] characterize this important feature as, "all the knowledge is in the connections" (p. 75), and the primary


research focus in artificial neural networks is in the behavior of these systems, not in

developing complex neurons.

There are three fundamental factors in determining what an artificial neural

network can accomplish. These are


. the structure or network topology,
. the representational scheme, and
. network dynamics.






In order to use neural networks to solve practical problems, issues of

representation must be addressed since neural models must interact with and represent

their environment in some way. The topology of a network, with reference to its

organization, strongly influences the representational capabilities. Therefore, network

users must not only choose how they pose a problem and interpret results, but also the

structure of the network in order to correctly encode features of the environment that

are vital.

Neural networks respond to external input. The dynamics of this response are

implicitly time related. That is, the state of individual neurons change over time in

response to the activation of other connected neurons. The result of processing is an


equilibrium condition on the state of all neurons.


The dynamics of processing greatly


affect the capabilities of these networks. This section provides a conceptual and

mathematical framework for assemblies of simple neurons and aspects of their

dynamics.


Network Structure


The structure of a neural network is defined by a finite set of neurons and the


connectivity between neurons. The set of neurons is usually homogeneous with respect

to their computational characteristics. The size of a network is simply the number of

neurons. Mathematically, sets of neurons can simply be represented by vectors of

activation values.


Connectivity between neurons is represented by weight matrices, w_ij. The coefficients of a weight matrix specify how pairs of neurons are connected. The row index, i, indicates the index of the neuron from which


a signal may emanate, and the column index, j, denotes the index of the neuron that


will receive the signal.


The coefficients are normally real values with positive numbers


representing excitatory connections and negative values indicating inhibitory


connections. When two neurons are not connected, then the corresponding weight is zero. When a network requires both feedforward and feedback connections between neurons, a square weight matrix results, and oftentimes the strength of the connection must be the same; therefore, a symmetric square matrix arises. Weights along the diagonal act as biases.4

Identifying layers of neurons is a convenient way to help analyze networks.

Identification of layers is done according to the direction of connections among

neurons. Multiple layer networks have multiple weight matrices that designate the

connectivity between connected layers. Each layer of neurons is represented by a

separate vector of activations.
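As a small illustration of this layered organization, the following Python sketch (with assumed layer sizes and random weights) represents each layer as a vector of activations and each pair of connected layers by its own weight matrix.

    import numpy as np

    rng = np.random.default_rng(0)

    # Layer sizes: 3 input units, 4 hidden units, 2 output units (illustrative).
    W_ih = rng.normal(size=(3, 4))   # row i, column j: connection from input i to hidden j
    W_ho = rng.normal(size=(4, 2))   # connections from hidden units to output units

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    x = np.array([1.0, 0.0, -1.0])       # input layer activations
    hidden = sigmoid(x @ W_ih)           # hidden layer activations
    output = sigmoid(hidden @ W_ho)      # output layer activations
    print(hidden, output)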

Layers of a network are differentiated according to their visibility to the

outside environment. Neurons in a layer that directly receives input from the


environment form the "input layer." Neurons that represent the results of processing are organized into a layer called the "output layer." All other neurons are known as "hidden neurons" and may be organized into one or more "hidden layers," depending upon their connectivity.







Distributed and Local Representations


Representational issues are divided into internal and external interpretations.

External representations are dealt with in the following chapters concerning specific

networks and problem domains. In this section, we inspect internal representations and

information processing of neural networks.

Considering an assembly of neurons, neural processing is a change in the


activation states or activation levels of those neurons.


We can interpret these patterns


of activation as either a local representation or as a distributed representation


[Hinton86].


In local representations, each neuron represents a concept. In distributed


representations, concepts are represented by specific patterns of activity distributed

over a set of neurons. Each neuron may be part of the activation pattern consistent

with more than one concept. Thus a neuron, instead of representing an entire concept, represents a "microfeature" of a concept. Connection strengths between microfeatures stand for "microinferences" [Hinton86, page 80].


Local representations make understanding what the network is doing much

easier since the state of activation of each neuron has a definite meaning. It is also

easier to hand-craft the structure of the network including weight magnitudes. On the

other hand, distributed representations have a number of compelling features that

make them attractive. Using distributed representations makes a network more robust,

improves generalization capabilities, and gives a network the ability to generate new

concepts. Since each neuron depicts a microfeature, noise or a damaged neuron will not destroy a concept since the network represents each concept via many neurons. It is entirely conceivable that a damaged network will still


be able to perform adequately. A network's generalization ability may be improved


since microfeatures may be combined in new patterns of activation and thus represent

new concepts.

Distributed representations are not without caveats. Most distributed

representations are more efficient in terms of memory usage since each neuron may be


part of several patterns; however, distributed representations do not guarantee this. In addition, different concepts cannot be represented in a network at the same time, but

using subnetworks can eliminate this problem. Perhaps the most difficult problem is in

clarifying the relationship between knowledge stored in the activation patterns of a

network using distributed representations and traditional knowledge representation

schemes. Using distributed representations makes identification of what knowledge

has been stored in the network extremely difficult.


Network Dynamics


Network dynamics are divided into two processes, activation and learning

dynamics. Each is implicitly a function of time, which is discretized and abstracted into

steps, epochs, or cycles. Activation dynamics involve determination of the activation

level of all neurons in an assembly. Activation dynamics are typically very fast and

analogous to biological neural assemblies where total required computation time is in

the 10 millisecond range. Learning dynamics involve changing the synaptic weights of

the assembly, which requires more time and can take on the order of weeks to complete.


Activation dynamics involve a change in the pattern of activation of neurons in

a network assembly. As mentioned in the previous section on the computational

aspects of neurons, determining the activation level of a single neuron involves the

following three steps:


Input processing, u = φ(X, W)
Activation, S = f(u)
Output, F = g(S)


Each neuron computes its activation independently, in a time sense, of other


neurons.


They compute their activation based only on the output provided by


connected neurons. In keeping with biological plausibility, most networks follow some

form of a parallel update schedule in that many if not all neurons simultaneously

change their states. Parallel update schedules can be synchronous or asynchronous.

For hierarchical networks, where neurons are grouped into layers, synchronization of

processing normally requires that neurons in a layer wait until neurons in the preceding

layer compute their outputs. This is a form of synchronous updating since layers or

groups of neurons compute their output simultaneously while all other neurons remain


fixed. Computations depend on the previous time step's activations, and the network


propagates new activations only after all neurons in the group complete their

computations. Asynchronous updates allow neurons to change their states

independently and simultaneously. Asynchronous updates do not account for the

possibility that computations might not be up to date and be based on old neuron outputs that are themselves in the process of changing.


The importance of learning dynamics becomes clear when considering that the

activation dynamics are controlled by the connections between neurons, represented


by the weight matrices, w_ij. There are two methods for establishing the values in


weight matrices and the method utilized depends on the internal representational


scheme, problem complexity, available computing power,


and time.


These are


. by hand and
. algorithmically.


When local, internal representations are used in a neural model, specifying the

connection weights by hand is entirely feasible, providing the network is not immense.5

Using local representations, each neuron represents a single concept or entity, and it is

possible to determine the relationship between neurons on the basis of the desired


concepts and associations to be present.


Those neurons that represent conflicting concepts are connected with inhibitory connections (e.g., w_ij = -1); those neurons symbolizing supporting concepts are linked with excitatory connections (e.g., w_ij = 1). Unrelated neurons are not connected (e.g., w_ij = 0). The magnitude of the connections is typically constant throughout. Manual methods of developing connection strengths are well suited to those domains that are well structured since relationships between concepts (neurons) need to be well defined and understood.
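A minimal Python sketch of such a hand-crafted, locally represented network follows; the concepts and weight values are illustrative assumptions chosen only to show the pattern of inhibitory, excitatory, and absent connections.

    import numpy as np

    # Each neuron stands for one concept (local representation).
    concepts = ["steel", "concrete", "lightweight", "heavy"]

    # Hand-set weights: -1 between conflicting concepts, +1 between supporting
    # concepts, 0 between unrelated concepts (assumed values for illustration).
    W = np.array([
        #  steel  concrete  light  heavy
        [   0.0,   -1.0,     1.0,   0.0],   # steel
        [  -1.0,    0.0,     0.0,   1.0],   # concrete
        [   1.0,    0.0,     0.0,  -1.0],   # lightweight
        [   0.0,    1.0,    -1.0,   0.0],   # heavy
    ])

    # Stimulating "steel" excites compatible concepts and inhibits conflicting ones.
    stimulus = np.array([1.0, 0.0, 0.0, 0.0])
    print(dict(zip(concepts, W @ stimulus)))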

Algorithmic learning methods are employed when distributed representations

are used, the problem is large, or the domain is complex. There are two classes of


5 For hand-crafted weight matrices, what is meant by a large network is subjective.







algorithmic learning methods, either supervised or unsupervised.


Most algorithmic


learning methods evolved from the concepts first presented by Hebb [Hebb49].


These


methods tune the weights of a network in such a way that the network performs as

desired for a given set of input patterns. Learning procedures are generally iterative


and involve a two step process. The first step of the iteration is to determine the current state of activation, F, of the network in response to a set of input patterns, U = {u_i, i = 1, ..., n}, where n represents the number of input patterns. For supervised learning, F is used in combination with a set of desired target responses, T, to calculate a measure of the quality of learning, E,

E = E(U, F, T).

The second step in the iterative process requires computing the necessary change in the connection weights, ΔW, to increase E:

ΔW = Φ(W, η, E, E'),

where η is a matrix of updating rate parameters, and E' is a matrix of derivatives of E. Once ΔW is calculated, the weights are updated and a new iteration starts. The iteration continues until E, the quality measure of learning, is acceptable. Learning dynamics assume that changes in the weights do not affect the activation dynamics
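The following Python sketch illustrates this two-step iteration for supervised learning; the sum-squared quality measure, derivative-based update rule, and small data set are illustrative assumptions rather than any specific procedure described in this work.

    import numpy as np

    def train(X, T, eta=1.0, max_iters=5000, tol=0.05):
        """Iterate: (1) compute activations F for all input patterns,
        (2) change the weights in proportion to the derivative of the error."""
        rng = np.random.default_rng(0)
        W = rng.normal(scale=0.1, size=(X.shape[1], T.shape[1]))
        E = np.inf
        for _ in range(max_iters):
            F = 1.0 / (1.0 + np.exp(-(X @ W)))       # step 1: current state of activation
            E = 0.5 * np.sum((T - F) ** 2)            # quality of learning (sum-squared error)
            if E < tol:                               # stop when the quality measure is acceptable
                break
            dW = X.T @ ((T - F) * F * (1.0 - F))      # step 2: derivative-based weight change
            W = W + eta * dW
        return W, E

    # Illustrative training set: learn a simple OR mapping (last column is a bias input).
    X = np.array([[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]])
    T = np.array([[0.0], [1.0], [1.0], [1.0]])
    print(train(X, T)[1])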

Both classes of learning processes essentially perform a search in a multiple-dimensional weight space that may contain local minima that are deep and difficult to emerge from. Different learning processes are in essence different

types of search procedures. Both supervised and unsupervised learning procedures

have limitations and research continues to find faster and more reliable training

algorithms.

The performance of a training algorithm is dependent upon the training set, the

input and possibly output patterns presented during learning. In general, the training


set should fully and accurately represent the problem domain.


Features and concepts


that are important must be either explicitly or implicitly encoded in the training set;

otherwise, the network will not be able to represent them.


Summary


All artificial neural networks perform essentially the same function, a vector


mapping. That is, they take an input pattern and produce an output pattern. An


artificial neural network encodes these mapping relationships via a learning process.

Different neural networks vary greatly in the range of mappings that they can

represent, and some networks are quite general in their mapping abilities. The ability to

map conceptually unifies all networks at a high, abstract level; however, there are


many other appealing properties that most of these networks share.


These are learning,


generalization, and self organization.


Learning


The ability of a network to learn from examples (i.e., experience) has created a great deal of interest in artificial neural networks. There are two kinds of learning processes:







1. Supervised -- where each input pattern is paired with a target output pattern.
Together they form a training pair.
2. Unsupervised -- does not require target output patterns, but iterates until a
consistent set of output patterns results. The effective results are that this
process extracts statistical properties of the input patterns and groups them
accordingly.

Generalization


The goal of learning is not simply to reproduce the output patterns of the

training set. A simple lookup table would perform this task and not require a training


algorithm.


We want a network to also generalize correctly. Generalization is an ability


to produce correct output patterns from input patterns that are not part of the training

set, and for the network to be relatively insensitive to minor variations in the input.

Generalization allows a neural network to accommodate variability, producing a

correct output pattern despite significant deviations from the training set.


Self Organization of Knowledge

Artificial neural networks do not explicitly store or create knowledge. A

neural network learns knowledge and represents that knowledge in the connection

strengths between processors. Although the topology of a network is static in most

implementations, the network self organizes the learned knowledge through the

learning process.