OPTIMIZATION OF NONCONVEX SYSTEMS
AND THE SYNTHESIS OF OPTIMUM
PROCESS FLOWSHEETS
By
GEORGE STEPHANOPOULOS
A DISSERTATION PRESENTED TO THE GRADUATE
COUNCIL OF THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1974
Dedicated to
The Memory of My Father
Nicholas Stephanopoulos
Man has three ways of acting wisely: Firstly, on meditation; this is the noblest.
Secondly, on imitation; this is the easiest. And thirdly, on experience; this is
the bitterest.
Confucius
ACKNOWLEDGMENT
The author wishes to express his gratitude to:
The chairman of his supervisory committee, Dr. Arthur W. Westerberg,
Associate Professor of Chemical Engineering, for his firm support,
sound advice, and guidance throughout what has been the most
stimulating and enjoyable period of his academic life;
The members of his supervisory committee: Dr. F.P. May,
Professor of Chemical Engineering, Dr. D.W. Kirmse, Assistant Professor
of Chemical Engineering, Dr. T.E. Bullock, Associate Professor of
Electrical Engineering and Dr. M.E. Thomas, Professor of Industrial
and Systems Engineering;
The faculty and staff of the Department of Chemical Engineering
for their assistance;
The National Science Foundation, which provided financial support
through the grants GK18633 and 41606 and the Graduate School for a
research assistantship;
Jeanette for her encouragement through the lows and the sharing
of the highs of life.
TABLE OF CONTENTS
Page
ACKNOWLEDGMENTS ......................................... iii
LIST OF TABLES ............................................ vii
LIST OF FIGURES ........................................... viii
LIST OF SYMBOLS ........................................... xi
ABSTRACT ........................................... ...... xiv
CHAPTERS:
I. INTRODUCTION ................................
II. TWO-LEVEL OPTIMIZATION METHOD. DUAL GAPS AND
THEIR RESOLUTION USING THE METHOD OF MULTIPLIERS ... 5
II.1. Review of the Previous Works on Nonconvex Optimization .....................
II.2. Statement of the Problem and the Two-Level Procedure ............................... 7
II.3. Dual Gaps ............................... 10
II.4. Resolution of Dual Gaps Using the Method of Multipliers .......................... 13
II.5. Computational Separability .............. 22
II.6. The Algorithm ........................... 24
II.7. A Discrete Minimum Principle ............ 26
II.8. Discussion .............................. 34
III. A STRONG VERSION OF THE DISCRETE MINIMUM
PRINCIPLE ...................................... 37
III.1. Review of Previous Works ............... 37
III.2. Statement of the Problem and Its Lagrangian Formulation ................. 39
III.3. Development of a Stronger Version of the Discrete Minimum Principle ......... 45
III.4. The Algorithm and Computational Characteristics ........................ 51
III.5. Discussion ............................ 53
IV. EXAMPLES OF NONCONVEX OPTIMIZATION ............. 56
IV.1. Numerical Examples ...................... 56
IV.1.a. Two-stage examples ............. 56
IV.1.b. Three-stage example with recycle ........................ 60
IV.1.c. Soland's example ............... 64
IV.1.d. Jackson and Horn's counterexample ...................... 64
IV.1.e. Denn's counterexample .......... 69
IV.2. The Design of a Heat Exchange Network ... 72
V. SYNTHESIS OF PROCESS FLOWSHEETS. A GENERAL
REVIEW ......................................... 83
VI. BRANCH AND BOUND STRATEGY FOR THE SYNTHESIS OF
OPTIMAL SEPARATION SCHEMES ..................... 90
VI.1. Previous Works on the Synthesis of Separation Schemes ...................... 90
VI.2. Statement of the Problem and the List Techniques for the Representation of the Separation Operations ................... 93
VI.3. Branch and Bound Strategy ............... 100
VI.4. Examples ................................ 107
VI.4.a. Example 1: n-butylene purification system ............ 108
VI.4.b. Example 2: olefins-paraffins separation system .............. 121
VI.5. Discussion .............................. 132
VII. EVOLUTIONARY SYNTHESIS OF PROCESS FLOWSHEETS ... 134
VII.1. A General Philosophy on Evolutionary
Synthesis .............................. 134
VII.2. Evolutionary Synthesis of Optimal Multicomponent Separation Sequences ......... 143
VII.2.a. Representation of separation
sequences as binary trees .... 143
VII.2.b. Neighboring flowsheets and the
evolutionary rules for a
separation sequence .......... 146
VII.2.c. Polish strings and their representation of separation sequences .................... 150
VII.2.d. Proof of the completeness of
the evolutionary rules ....... 155
VII.2.e. Evolutionary strategy ........ 161
VII.3. Examples of Evolutionary Synthesis ............ 163
VII.3.a. Example 1: synthesis of a
solids' separation system .... 163
VII.3.b. Example 2: synthesis of a
multicomponent separation
sequence ..................... 167
VII.4. Discussion ............................. 174
VIII. CONCLUSIONS AND RECOMMENDATIONS FOR FURTHER
RESEARCH ....................................... 178
APPENDICES ................................................ 183
APPENDIX A ...................................... 184
BIBLIOGRAPHY .............................................. 187
BIOGRAPHICAL SKETCH ....................................... 191
LIST OF TABLES
Table Page
1 Points Generated for the Two-Stage Example ....... 58
2 Points Generated for the Modified Two-Stage
Example .......................................... 61
3 Points Generated for Soland's Example ............ 65
4 The Numbers of Distinct Separators B(N) and
Distinct Flowsheets F(N) for a Mixture of N
Components and One Separation Method ............. 96
5 Initial Feed to the n-Butylene Purification
System ........................................... 109
6 Generated Flowsheets and Their Minimum Costs ..... 120
7 Specifications of the Initial Feed and the Desired
Products for Example 2 ........................... 122
8 Specification of the Solids in Example 1 ......... 165
9 The Evolutionary Steps Taken During the Synthesis
of the Solids' Separation System ................. 168
10 Table Indicating the Effectiveness of the Proposed
Evolutionary Strategy in Synthesizing an Optimal
Separation Sequence (One Separation Method Used).. 176
11 Table Indicating the Effectiveness of the Proposed
Evolutionary Strategy in Synthesizing an Optimal
Separation Sequence (Three Separation Methods
Used) ............................................ 177
LIST OF FIGURES
Figure Page
1 Geometric View of the Success of the Dual Approach .. 12
2 Geometric View of the Failure of the Dual Approach
to Yield the Primal Solution ....................... 14
3 Improvement of the Dual Bound h*(λ) for the
Penalized Problem Over the Dual Bound h(λ) for the
Unpenalized Problem for the Same Multipliers λ ..... 36
4 Relationship of Penalty Constant K Required for
Three Lagrangian Based Optimization Algorithms ..... 55
5 Two-stage Example .................................. 57
6 Three-stage Example with Recycle ................... 62
7 Jackson and Horn's Counterexample .................. 66
8 Heat Recovery Process .............................. 74
9 Uncoupled Heat Recovery Process Showing the Enthalpy
Variables Used in the Two-level Optimization Method. 75
10 Diagram Showing Notation for a Single Exchanger .... 78
11 All the Distinct Separators Generated and the Basic
Flowsheet for a Fictitious Example of 4 Components
Using 2 Separation Methods ......................... 98
12 Two-stage Example to Demonstrate the Insensitivity
of the Dual Function to the Values of the Lagrange
Multipliers ........................................ 106
13 Generation of the Basic Flowsheet for the Synthesis
of the n-Butylene Purification System .............. 111
14 Generation of all the Flowsheets Starting with
Separator 1 for the n-Butylene Purification System
Example ............................................ 112
15 Generation of all the Flowsheets Starting with
Separators 2 or 5 for the n-Butylene Purification
System Example ..................................... 114
16 All the Distinct Separators Employed During the
Synthesis by Branch and Bound of the n-Butylene
Purification System ................................ 116
17 Nearly Optimum Flowsheets Retained at the End of
the Branch and Bound Synthesis of the n-Butylene
Purification System ................................ 119
18 All the Distinct Separators Employed During the
Synthesis by Branch and Bound of the Olefins,
Paraffins Separation System ........................ 123
19 Generation of the Basic Flowsheet for the Synthesis
of the Olefins, Paraffins Separation System ........ 127
20 Nearly Optimum Flowsheets Retained at the End of
the Branch and Bound Synthesis of the Olefins,
Paraffins Separation System ........................ 128
21 Generation of all Flowsheets Starting with
Separator 1 for the Olefins, Paraffins Separation
System ............................................. 129
22 Generation of all Flowsheets Starting with
Separators 2 or 3 for the Olefins, Paraffins
Separation System .................................. 130
23 Generation of all Flowsheets Starting with
Separator 4 for the Olefins, Paraffins Separation
System ............................................. 131
24 An Illustrative Diagram Showing Two Alternate Sets
of Evolutionary Rules for a Family of Flowsheets A.. 137
25 Flowsheet A and Its Neighboring Flowsheets B,C,D
and E Resulting from A with Simple Structural
Modifications ...................................... 139
26 A Separation Sequence (A), Its Corresponding Binary
Tree (B) and the Skeleton Structure (C) Corresponding
to This Tree ....................................... 144
27 Flowsheet (A), a Down Neighbor (B) to It, and a
New Separator Type Neighbor (C) to It .............. 147
28 Separation Sequences Corresponding to the Binary
Trees of Figures 27 A and B ......................... 149
29 A Binary Tree for the Algebraic Expression
3x²y + z²/u ........................................ 151
30 Interpreting a Polish String Development of an
Algebraic Expression Using a Stack of Operands and
a Set of Operators ................................. 153
31 The Operators of a Polish String and Their Operands,
and the Schematic Representation of the Generation
of the Neighboring Flowsheets by Applying Rules 1
and 2 .............................................. 156
32 Flowsheet A1 and Its Neighboring Flowsheets A2,
A3 and A4 Generated Using Rules 1 and 2 .......... 158
33 A Schematic Representation of a System for the
Separation of Solids and the Corresponding Binary
Tree ............................................... 166
34 Flowsheets Generated During the Evolutionary
Synthesis of the n-Butylene Purification System
Starting from Flowsheet (a) ........................ 170
35 The Evolution Process for the n-Butylene Purification
System Starting from the Flowsheet (a) ............. 171
36 The Evolution Process for the n-Butylene Purification
System Starting from the Flowsheet (k) ............. 172
37 Flowsheets Generated During the Evolutionary
Synthesis of the n-Butylene Purification System
Starting from Flowsheet (k) ........................ 173
LIST OF SYMBOLS
A_i = Heat transfer area of the ith heat exchanger.
a,b,c = Flowrates of hot streams in a heat exchange network.
C_i = Heat capacity of stream i.
c_i = Cost of the ith heat exchanger.
D = Set of feasible dual variables or multipliers.
F = Scalar return function.
F* = Scalar return function for the augmented problem, may be subscripted.
f_i = Stage transformation function, for stage i.
f_j = Scalar return function for subsystem j.
g = Vector of equality constraint functions, may be subscripted.
H = Supporting hyperplane for the set R.
H_i = Hessian matrix of subsystem i.
H = Hamiltonian function. Subscripted with i, refers to the Hamiltonian for subsystem i.
H* = Hamiltonian function for the augmented problem.
h = Dual function.
h_j = Vector of inequality constraint functions for subsystem j.
I(j) = The set of i such that stream i is an input to subsystem j.
K = Penalty constant, always nonnegative.
L = Lagrangian function.
L* = Lagrangian function for the augmented problem, may be subscripted.
l_i = Sublagrangian for subsystem i.
l*_i = Sublagrangian for subsystem i after the linear approximation of nonseparable terms.
O(j) = The set of i such that stream i is an output of subsystem j.
P(q) = Scalar quantity = g^T(q)g(q).
Q = Heat duty of an exchanger.
q = Composite vector variable (x|u), may be subscripted.
R = The set of (z_0, z) such that z_0 >= w(z), z in Z, may be subscripted.
S = Constrained variable set, may be subscripted.
s = Direction vector, may be subscripted.
T , T = Temperatures of a cold stream at the entrance and the exit of a heat exchange network.
t_i = Stage transformation function, for stage i.
t_a, t_b, t_c = Temperatures at which the hot streams a,b,c are available.
u = System decision variable vector, may be subscripted.
w(z) = Minimum {F(x); g(x) = z, x in S}.
w*(z) = Minimum {F*(x); g(x) = z, x in S}.
x = Vector variable associated with interconnecting streams, may be subscripted.
y = Vector variable associated with streams leaving the system, may be subscripted.
Z = The set of z such that there exists x in S with g(x) = z.
z = Perturbation vector, may be subscripted.
Greek Letters
α = Scalar parameter.
α_i = Positive constant, usually = 0.6.
β = Scalar parameter.
γ_i = Positive constant.
ΔT = Log mean temperature difference.
ε = Small positive quantity.
θ = System decision variable vector, may be subscripted.
λ = Vector of Lagrange multipliers, may be subscripted.
μ = Vector of Lagrange multipliers for the augmented problem.
ν = Vector of Kuhn-Tucker multipliers, may be subscripted.
π = Scalar parameter.
Mathematical Symbols
∋ = Such that.
∃, ∄ = There exists, there does not exist.
∀ = For all.
0 = Zero vector.
⊂ = Subset of.
∩ = Intersection with.
∪ = Union with.
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida in Partial Fulfillment
of the Requirements for the Degree of Doctor of Philosophy
OPTIMIZATION OF NONCONVEX SYSTEMS
AND THE SYNTHESIS OF OPTIMUM
PROCESS FLOWSHEETS
by
George Stephanopoulos
June, 1974
Chairman: Arthur W. Westerberg
Major Department: Chemical Engineering
Previous efforts in the area of optimization of chemical
processes have not accounted for the nonconvexities commonly
encountered in such systems. These nonconvexities cause many of the
proposed large-scale optimization strategies to fail. The subject
of chemical process synthesis was also largely ignored until
recently, despite its importance in chemical engineering practice.
This dissertation presents techniques to overcome the deficiencies
of two related and often studied optimization methods in the presence
of nonconvexities and develops strategies for the synthesis of
process flowsheets.
Many Lagrangian based methods for the optimization of large scale
systems require that the Lagrangian function possess a saddle point.
The two-level optimization method may not be generally applicable to
chemical process design problems due to the mathematical character
of commonly encountered constraint sets and objective functions, which
commonly do not allow the existence of saddle points for the Lagrangian
function. The dissertation presents a method to overcome these
shortcomings of the two-level optimization procedure by employing
Hestenes' method of multipliers. The objective function is augmented
by a penalty term which is the sum of the squares of the connection
constraints multiplied by a positive constant. This penalty term
under certain conditions turns every stationary point of the
Lagrangian into a saddle point, thus securing the success of the
two-level method. The separability of the initial system, which was
lost because of the penalty term, is regained by expanding the
nonseparable terms in a Taylor series and retaining only the linear
part. As a direct extension of the above strategy, the dissertation
presents the development of a stronger version of the discrete minimum
principle. Two algorithms have been developed to implement these
theoretical results. The success of the new methods has been
demonstrated in several small size numerical examples and in the
design of a simple heat recovery network on which the previous methods
fail.
The dissertation also develops two strategies for the synthesis
of chemical process flowsheets. The first is a branch and bound
strategy which exploits the bounding properties of the dual and primal
values which can be obtained for the flowsheet objective function.
The flowsheets are constructed in a block-by-block building procedure,
and this method was used to synthesize optimal multicomponent separation
sequences. In the method, list processing techniques are used to
develop the distinct separators which can occur. At the end of the
synthesis a small number of nearly optimum flowsheets is retained.
Further screening among them is possible to locate the optimum
flowsheet. This strategy is demonstrated in two different examples
with very encouraging results.
The second synthesis strategy, the evolutionary strategy, is
considered next and is systematized for use in the synthesis of
general process flowsheets. The evolutionary synthesis procedure is
broken into four subtasks: a) the development of a starting flowsheet,
b) the generation of evolutionary rules to produce the structural
modifications to be considered during the synthesis, c) the development
of the proper evolutionary strategy to lead to the optimum solution
in the most effective manner, and d) the screening among the current
flowsheet and the alternative flowsheets generated. Notions such
as that of the neighboring flowsheets and the evolutionary strategy
help to put evolutionary synthesis in a correct perspective. The
evolutionary approach is illustrated with the synthesis of two
distinct separation problems with very encouraging results.
CHAPTER I
INTRODUCTION
The essential concern, purpose, and culmination of engineering
is design. In chemical engineering it is the prevailing and most
important element of the field's scope. Very often, a final design is achieved
without due consideration to all aspects of the design morphology.
This is necessitated by the complexity of the design problem and
by the limited state of engineering advances in certain areas.
Proper design procedure includes the three essential stages of synthesis,
analysis and evaluation. The design process is complicated by the
interrelationships existing among these stages. Frequently these
interrelationships are complex and cause design to be an iterative
process, requiring the special attention of the designer and the
development of flexible strategies which will lead to good solutions.
Analysis is a term which is equally familiar to both practicing
engineers and students of engineering. It has been developed
deductively and quantitatively to a high degree. Strategies have been
developed to analyze whole processes and very effective, sophisticated
methods have been proposed to resolve the complicated, time consuming
activities of the design process.
Considerable work has been done and is in progress on optimization
theory for large structured systems. Both theory and applications
have found very fertile ground in chemical engineering. The particular
features of chemical processes (i.e., sparseness and complexity)
have caused the development of highly effective strategies for
analysis and optimization. Chemical process design has been the
instigator of such developments.
The synthesis stage of the design procedure was largely ignored
until recently in the chemical engineering literature despite its
importance in chemical engineering practice. During the last few
years the importance of creativity, innovation and invention in
designing chemical processes has been stressed and has received the
proper attention.
Because synthesis is such an important step, it became the
principal goal of this thesis. Initially we were exploring the use
of McGalliard's (McGalliard, 1971) approach to structural synthesis
for the synthesis of optimal multicomponent separation schemes. This
strategy involves the development of dual bounds for alternate
flowsheets. Thus the first problem of concern was a good initial
estimation of the Lagrange multipliers to be used for the evaluation
of dual bounds. A more thorough and detailed investigation of the
physical meaning of the Lagrange multipliers and the discovery of
Hestenes' method in the literature (Hestenes 1969) led to the
development of a new method to overcome the deficiencies of Lasdon's
(Lasdon, 1964) two-level optimization method.
Then it became evident that a stronger version of the discrete
minimum principle could also be established. Thus two new strategies
evolved which can very likely be used effectively (as their application
to several small examples has indicated) to optimize large-scale
systems in the presence of nonconvexities. These approaches are of
particular interest to a chemical engineer, since the cost functions
used in process design involve the throughput to a unit raised to
the 0.6 power which is a characteristic nonconvex function.
Furthermore we explored the use of the bounding properties of
the primal and dual functions in connection with list processing
techniques to generate very good alternate solutions to the
multicomponent separation problem. Finally, the evolutionary approach to
the synthesis of optimal process flowsheets was systematized, and
all these principles and ideas were illustrated in the synthesis
of an optimal separation scheme.
Thus, in Chapter II we discuss the two-level optimization method,
its theoretical foundations, its advantages and its drawbacks. The
generation of dual gaps because of nonconvexities and their resolution
using Hestenes' method of multipliers is discussed. An algorithmic
procedure employing these ideas is described. In Chapter III the
"classical" discrete minimum principle is outlined, and its short
comings because of nonconvexities are defined. Again Hestenes' multi
pliers are used to develop a strong version of the discrete minimum
principle, and a new algorithmic procedure is proposed. In Chapter IV
the theories developed in Chapters II and III are tested on some
simple numerical examples and also some examples drawn from the
chemical engineering literature. In Chapter V a general review is
presented of previous works in the area of process flowsheet synthesis.
Then a branch and bound strategy, using list processing techniques,
is outlined for the synthesis of optimal multicomponent separation
sequences in Chapter VI, while Chapter VII develops a systematic
evolutionary approach for the synthesis of process flowsheets.
Chapter VII concludes with an illustrative presentation of the principles
governing the evolution of the design on a separation problem.
Finally in Chapter VIII we summarize the results of this thesis and
outline a program for further related research in the area of
chemical process design.
CHAPTER II
TWO-LEVEL OPTIMIZATION METHOD. DUAL GAPS.
RESOLUTION USING THE METHOD OF MULTIPLIERS
II.1. Review of Previous Works on the Nonconvex Optimization.
The Lagrangian approach, representing a dual method, has often
been proposed for solving optimization problems. In many engineering
design problems, the system to be optimized comprises several fairly
complex subsystems or units which are connected together by sets of
equality constraints. By appending the equality constraints to the
objective function with Lagrange multipliers, the Lagrange function,
for fixed multiplier values, is separable, leaving a subproblem for
each subsystem. The approach is therefore very attractive for this type
of problem. It can fail however because of the presence of a dual gap
at the solution point (Everett, 1963; Lasdon, 1970). Unfortunately
this failure is common in engineering design problems; therefore, the
development of a method to resolve dual gaps is of great importance.
Gaps may arise for various reasons, and a quite thorough treatment
of their causes and resolutions is presented by Greenberg (1969).
He reviewed the use of nonlinear supports (Gould, 1969; Bellman and
Karush, 1961), the use of surrogates (Greenberg and Pierskalla, 1970;
Glover, 1968), the use of cuts via dominance and efficiency concepts
(Loane, 1971; Greenberg, 1969), and the use of branch and bound methods
designed for finite separable problems. It should be noted that in all these methods
to resolve gaps, inequality connection constraints are emphasized.
Bellmore, Greenberg and Jarvis (1970) have examined the use of
penalty functions to resolve gaps, and they reviewed the relationship
between the original problem and the augmented one along with proposed
solution procedures.
Falk and Soland (1969) have proposed an algorithm to solve the
separable nonconvex programming problems where the constraints are only
upper and lower bounds on the variables. Soland (1971) has extended
the algorithm to include inequality constraints of a more general form.
Both the algorithms are of a branch and bound type and solve a sequence
of problems with convex objective functions. These problems correspond
to successive partitions of the feasible set. Greenberg (1973), in a
recent publication, has provided a sharper lower bound on the optimum
value with less computation using the Generalized Lagrange Multiplier
method, and the Falk-Soland algorithm can be modified appropriately.
All the above methods have a major drawback: the separability of
the objective function and the constraints, if it existed in the
original problem, is destroyed after the proposed modifications. Only
the last method reported by Greenberg (1969) and using a branch and
bound technique preserves the separability, but it is only applicable
for finite problems. Separability is a characteristic providing many
advantages for the solution of a large system, and it is very desirable
to preserve it.
In applying the Lagrangian approach and structural sensitivity
analysis (McGalliard and Westerberg, 1972) to engineering design problems,
we want to resolve the dual gap problem while keeping the structural
characteristics of the system.
In the present work an algorithm is developed which makes use of
the penalty function approach together with a linear approximation of
the nonseparable terms. The original problem is replaced by a sequence
of problems, each one yielding a tighter dual bound on the optimum
solution. The solutions to the above problems form a nondecreasing
sequence of real numbers, bounded from above. Under certain conditions
developed later on, this sequence of solutions converges to the solution
of the original problem. The structural characteristics of the original
system are preserved: each subproblem is solved separately.
II.2. Statement of the Problem and the Two-Level Procedure
As a basis for the description of the two-level optimization
procedure, of the encountered dual "gaps", and of their proposed
resolution, we will consider the following model, which represents many
engineering system design problems (as opposed to resource allocation
problems):
I(j) = {i ; stream i is an input to subsystem j}
O(j) = {i ; stream i is an output of subsystem j}
x_i is a vector variable associated with the interconnecting stream i.
u_j is a vector decision variable associated with subsystem j.
Define the composite vector variable q_j as
\[
q_j = (x_{k_1}, \ldots, x_{k_m} \mid u_j), \qquad k_i \in I(j). \tag{II-1}
\]
The transformation equations which connect the subsystems are
\[
x_i = t_i(q_j), \qquad i \in O(j), \quad j = 1,\ldots,n. \tag{II-2}
\]
The overall system return function is the sum of the subsystem return functions:
\[
F = \sum_{j=1}^{n} f_j(q_j). \tag{II-3}
\]
The system constraint set, excepting the interconnection constraints (II-2), is separable, i.e.,
\[
q_j \in S_j = \{ q_j \mid h_j(q_j) \le 0 \}, \qquad j = 1,\ldots,n.
\]
Thus, the overall optimization problem is
\[
\begin{aligned}
\text{Minimize} \quad & F = \sum_{j=1}^{n} f_j(q_j) \\
\text{subject to} \quad & x_i = t_i(q_j), \quad i \in O(j), \; j = 1,\ldots,n \\
& q_j \in S_j, \quad j = 1,\ldots,n.
\end{aligned} \tag{II-4}
\]
The Lagrange function for the above problem described by (II-4) is given by
\[
L = \sum_{j=1}^{n} f_j(q_j) + \sum_{j=1}^{n} \sum_{i \in O(j)} \lambda_i^T \big( t_i(q_j) - x_i \big), \tag{II-5}
\]
where the λ_i are Lagrange multipliers. Rearranging the terms in (II-5), the Lagrangian can be written as follows:
\[
L = \sum_{j=1}^{n} \Big\{ f_j(q_j) + \sum_{i \in O(j)} \lambda_i^T t_i(q_j) - \sum_{i \in I(j)} \lambda_i^T x_i \Big\}
= \sum_{j=1}^{n} \ell_j(q_j;\lambda). \tag{II-6}
\]
For fixed λ the problem of minimizing L becomes separable, and it is equivalent to solving the subproblems
\[
\begin{aligned}
\text{Minimize} \quad & \ell_j(q_j;\lambda) \\
\text{subject to} \quad & q_j \in S_j.
\end{aligned} \tag{II-7}
\]
Let us now define a dual function and a dual problem. The dual function is
\[
h(\lambda) = \sum_{j=1}^{n} \Big( \min_{q_j \in S_j} \ell_j(q_j;\lambda) \Big), \tag{II-8}
\]
and the dual problem is
\[
\begin{aligned}
\text{Maximize} \quad & h(\lambda) \\
\text{subject to} \quad & \lambda \in D, \qquad D = \{ \lambda;\ h(\lambda) \text{ exists} \}.
\end{aligned} \tag{II-9}
\]
The two-level optimization procedure requires the following two
distinct operations:
First level: Calculate h(λ) by solving the n subproblems (II-7).
Second level: Adjust the multipliers, λ, so as to satisfy the
interconnection constraints in (II-4).
In effect this procedure solves the dual problem described by (II-9).
The important question concerning this procedure relates to the
existence of a saddle point for the Lagrange function of the problem.
The following theorems provide the theoretical basis and give some
answers to the saddle point existence question. For further details
and proofs of the theorems, see Lasdon (1970).
Theorem 1: Let λ° ∈ D. A point (q°, λ°) is a constrained saddle point
for L(q,λ) if and only if 1) q° minimizes L(q,λ°) over S, and 2)
x_i - t_i(q_j) = 0, i ∈ O(j), j = 1,...,n.
Theorem 2: If (q°, λ°) is a saddle point for L, then q° solves the
primal problem described by (II-4).
Theorem 3: h(λ) is concave over any convex subset of D.
Theorem 4: h(λ) ≤ F(q) for all q ∈ S such that x_i - t_i(q_j) = 0,
i ∈ O(j), j = 1,...,n, and for all λ ∈ D.
II.3. Dual Gaps
The basic drawback of the Lagrangian approach, and therefore of
the two-level optimization procedure, is its failure to find the solution
of a problem when the solution is in a "dual gap".
To provide additional insight into the relationship between the
primal and the dual problems and to demonstrate the formation of gaps,
we will give at first a geometric interpretation of the procedure and
second the theoretical justification of the failure of the twolevel
approach in certain cases (Lasdon, 1970; Rockafellar, 1967).
Consider the family of perturbed problems, with perturbations z_i:
\[
\begin{aligned}
\text{Minimize} \quad & F(q) = \sum_{j=1}^{n} f_j(q_j) \\
\text{subject to} \quad & x_i - t_i(q_j) = z_i, \quad i \in O(j), \; j = 1,\ldots,n \\
& q_j \in S_j, \quad j = 1,\ldots,n.
\end{aligned}
\]
The primal problem corresponds to z_i = 0, i ∈ O(j), j = 1,...,n. Assuming continuity of the objective function and of the constraint functions, let us define
\[
w(z) = \min \{ F(q);\ x_i - t_i(q_j) = z_i,\ i \in O(j),\ j = 1,\ldots,n,\ q \in S \},
\]
where z^T = [z_i^T, for all i ∈ O(j), j = 1,...,n]. The domain of w is
\[
Z = \{ z;\ \exists\, q \in S \text{ such that } x_i - t_i(q_j) = z_i,\ i \in O(j),\ j = 1,\ldots,n \}.
\]
Consider now the set R ⊂ E^{m+1},
\[
R = \{ (z_0, z);\ z_0 \ge w(z),\ z \in Z \}.
\]
If w and Z are convex then R is convex. We shall call the space containing R the WZ space.
The following theorem demonstrates the connection between duality and supporting hyperplanes in WZ for the set R; see Lasdon (1970).
Theorem 5: If q̄ ∈ S and λ ∈ E^m, then q̄ minimizes L(q,λ) over S if and only if H = {(z_0, z); z_0 - λ^T z = L(q̄,λ)} is a supporting hyperplane for the set R at the point (F(q̄), z), with z_i = x_i - t_i(q̄_j), i ∈ O(j), j = 1,...,n.
Supporting hyperplanes exist for every point, and therefore for
the solution point, if R is a convex set (Fig. 1).
Figure 1. Geometric View of the Success of the Dual Approach
In the case of
nonconvex R sets, there are regions consisting of constraint vectors
that are not generated by any vector λ (Fig. 2). Optimum solutions
for constraints inside such inaccessible regions cannot be discovered
by straightforward application of the two-level optimization procedure,
and must be sought by other means.
The following corollary (Greenberg, 1969) helps to anticipate
the existence of such an inaccessible region, which will be termed a
"dual gap".
Corollary: A dual gap arises if some choice of Lagrange multipliers
λ ∈ D produces at least two solutions to the Lagrangian problem (II-8)
with distinct z_i = x_i - t_i(q_j) ≠ 0, i ∈ O(j), j = 1,...,n.
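A one-dimensional illustration (constructed here; it is not one of the examples treated later in this work) makes the corollary concrete. Take
\[
\min_{x \in S} F(x) = -\big(x - \tfrac{1}{2}\big)^2, \qquad g(x) = x - \tfrac{1}{2} = 0, \qquad S = [0,1],
\]
so that w(z) = -z^2 is concave and R is not convex. For any λ the Lagrangian problem
\[
h(\lambda) = \min_{x \in [0,1]} \Big[ -\big(x - \tfrac{1}{2}\big)^2 - \lambda \big(x - \tfrac{1}{2}\big) \Big] = -\tfrac{1}{4} - \tfrac{1}{2}\,|\lambda|
\]
is attained at an endpoint of S. At λ = 0 both x = 0 and x = 1 solve it, with distinct z = ∓1/2 ≠ 0, so the corollary signals a dual gap. Indeed, max_λ h(λ) = -1/4 < w(0) = 0, and no choice of λ recovers the primal solution x = 1/2.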
II.4. Resolution of Dual Gaps Using the Method of Multipliers
Let us assume that the solution to problem (II-4) is in a dual
gap. According to Theorem 5, the two-level approach fails to find the
solution. The method of multipliers developed by Hestenes (1969)
will be used to cure this shortcoming of the two-level method.
Let us make the following notational changes:
\[
g_{ij} = x_i - t_i(q_j), \qquad i \in O(j), \quad j = 1,\ldots,n
\]
\[
g_j^T = [\, g_{i_1 j}^T,\ g_{i_2 j}^T,\ \ldots,\ g_{i_m j}^T \,], \qquad j = 1,\ldots,n, \quad i_k \in O(j), \quad k = 1,\ldots,m
\]
\[
g^T = [\, g_1^T,\ g_2^T,\ \ldots,\ g_n^T \,], \qquad
q^T = [\, q_1^T,\ q_2^T,\ \ldots,\ q_n^T \,], \qquad
h^T = [\, h_1^T,\ h_2^T,\ \ldots,\ h_n^T \,].
\]
Figure 2. Geometric View of the Failure of the Dual Approach to Yield the Primal Solution
Then problem (II-4) is written as follows:
\[
\begin{aligned}
\text{Min} \quad & F = \sum_{j=1}^{n} f_j(q_j) = F(q) \\
\text{s.t.} \quad & g(q) = 0 \\
& q \in S,
\end{aligned} \tag{II-P1}
\]
where S = S_1 × S_2 × ... × S_n. Consider now the following augmented problem:
\[
\begin{aligned}
\text{Min} \quad & F^*(q, K_i) = F(q) + K_i\, g^T(q)\, g(q) \\
\text{s.t.} \quad & g(q) = 0 \\
& q \in S, \qquad K_i > 0.
\end{aligned} \tag{II-P2}
\]
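For the one-dimensional illustration introduced in Section II.3 (again a constructed example, not from this work), the effect of the penalty term can be seen directly:
\[
F^*(x, K_i) = -\big(x - \tfrac{1}{2}\big)^2 + K_i \big(x - \tfrac{1}{2}\big)^2,
\]
so that w_i(z) = w(z) + K_i z^2 = (K_i - 1) z^2, which is convex for K_i ≥ 1. For K_i > 1 the penalized Lagrangian -\,(x - 1/2)^2 + K_i (x - 1/2)^2 - λ(x - 1/2) is strictly convex in x; at λ = 0 its unconstrained minimum over S is x = 1/2, the primal solution, and the dual value equals w(0) = 0. The penalty term has turned the stationary point of the Lagrangian into a saddle point and closed the gap.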
Let us now examine the relationship between problems (II-P1) and (II-P2).
Let us make the following assumptions:
A1: The objective function F(q) and the constraints g(q) are of class C².
A2: The set S is compact and nonempty.
Theorem 6 (Hestenes, 1969): Let q° be a nonsingular solution to
problem (II-P1); then there exists a multiplier vector λ and a constant
K_i such that the point q° is an unconstrained minimum of the Lagrangian
of problem (II-P2).
Proof: Since q° is the solution of problem (II-P1), the following
conditions are true:
\[
g(q^\circ) = 0,
\]
and
\[
\Delta L = \Delta q^T\, \frac{\partial^2 L}{\partial q^2}\, \Delta q \ \ge\ 0,
\qquad \text{where } L = F(q) - \lambda^T g(q) - \nu^T h(q),
\]
for all permissible variations Δq, i.e., those such that
\[
\Delta g = \frac{\partial g}{\partial q}\, \Delta q = 0
\qquad \text{and} \qquad
\nu_j\, \Delta h_j = 0, \quad j = 1,\ldots,r.
\]
Note: The Kuhn-Tucker multipliers ν_j are always nonpositive at the
solution to (II-P1). The Lagrangian of problem (II-P2) is
\[
L_i = L + K_i\, g^T(q)\, g(q).
\]
At the point q°:
\[
\Delta L_i = \Delta L + K_i\, \Delta q^T \Big( \frac{\partial g^T}{\partial q}\, \frac{\partial g}{\partial q} \Big) \Delta q .
\]
For permissible variations Δq such that Δg = 0 and ν_j Δh_j = 0,
ΔL_i ≥ 0; for nonpermissible variations Δq it is possible that
ΔL < 0, but there exists a K_i > 0 such that
\[
\Delta L_i \ \ge\ 0,
\]
and the proof is complete.
From Theorem 6 we conclude that there exist a vector λ and a K_i > 0
such that the point (q°, λ) is a saddle point for the Lagrangian L_i
of the augmented problem (II-P2). Therefore problem (II-P1)
can be solved using the two-level method if it is augmented by the
penalty term K_i g^T(q)g(q). Define:
\[
w(z) = \min_{q \in S} \{ F(q) : g(q) = z \}
\]
\[
w_i(z) = \min_{q \in S} \{ F^*(q, K_i) : g(q) = z \}.
\]
Let us examine now how the value of K_i affects w_i(z), and under
what conditions a proper choice of K_i yields a supporting hyperplane
for the set R_i, where
\[
R_i = \{ (z_0, z) : z_0 \ge w_i(z),\ z \in Z \},
\]
at the solution point (w_i(0), 0). The following theoretical treatment
will demonstrate the effect of the penalty term on the resolution of
dual gaps via a geometrical representation.
Lemma 1: w(z) < w_i(z) for z ≠ 0, and w(0) = w_i(0). More generally, if
K_j > K_i, then w_j(z) > w_i(z) and w_j(0) = w_i(0).
Proof: The equality for z = 0 is direct, since the penalty term is then
zero. Let q̄ minimize F(q) with g(q̄) = z; then q̄ minimizes F_i* with
g(q) = z, because if this were not true and q̃ minimized F_i* with g(q̃) = z,
then
\[
F(\tilde q) + K_i\, g^T(\tilde q) g(\tilde q) < F(\bar q) + K_i\, g^T(\bar q) g(\bar q)
\]
and consequently
\[
F(\tilde q) < F(\bar q),
\]
which is not true, since by assumption q̄ minimizes F(q) with g(q) = z.
Similarly:
\[
\min_{q \in S,\ g(q)=z} F_j^*(q, K_j) = \min_{q \in S,\ g(q)=z} \{ F(q) + K_j\, z^T z \}
\ \ge\ \min_{q \in S,\ g(q)=z} \{ F(q) + K_i\, z^T z \} = \min_{q \in S,\ g(q)=z} F_i^*(q, K_i).
\]
Therefore w_j(z) > w_i(z) for z ≠ 0. Q.E.D.
Lemma 1 implies that the curve w_i(z) vs. z (in the one-dimensional
case) moves upward as K_i increases, always keeping the same value
at z = 0.
Lemma 2: If w(z) is a continuous function with continuous first derivatives, K_j > K_i, and
\[
H(K_p) = \max_{\lambda} h^*(\lambda) = \max_{\lambda} \min_{q \in S} L_p(q, \lambda, K_p)
= \max_{\lambda} \min_{q \in S} \big[ F^*(q, K_p) - \lambda^T g(q) \big],
\]
then H(K_j) ≥ H(K_i), and H(K_j) = H(K_i) only at the solution of problem (II-P1).
Proof: Let
\[
H(K_j) = F_j^*(q^*_{(j)}) - \lambda^{*T}_{(j)}\, g(q^*_{(j)}),
\qquad
H(K_i) = F_i^*(q^*_{(i)}) - \lambda^{*T}_{(i)}\, g(q^*_{(i)}).
\]
Assume that H(K_j) < H(K_i). Since
\[
H(K_j) \ \ge\ \min_{q \in S} \big[ F_j^*(q) - \lambda^{*T}_{(i)}\, g(q) \big]
= F_j^*(\hat q) - \lambda^{*T}_{(i)}\, g(\hat q)
\]
for the minimizing point q̂ of the right-hand problem, the assumption implies
\[
F_j^*(\hat q) - \lambda^{*T}_{(i)}\, g(\hat q) < H(K_i).
\]
The last inequality implies that the point (F_j^*(q̂), g(q̂)) lies below or on the hyperplane defined by
\[
H = \{ (z_0, z) \mid z_0 - \lambda^{*T}_{(i)}\, z = H(K_i) \}.
\]
Let us first examine the case where it lies below the hyperplane. Since H supports the set
\[
R_i = \{ (z_0, z) \mid z_0 \ge w_i(z),\ z \in Z \}
\]
from below, it follows that (F_j^*(q̂), g(q̂)) ∉ R_i. But
\[
(F_j^*(\hat q),\ g(\hat q)) \in R_j,
\qquad R_j = \{ (z_0, z) \mid z_0 \ge w_j(z),\ z \in Z \},
\]
and because of Lemma 1, R_j ⊂ R_i with (R_i - R_j) ∩ R_j = {(w(0), 0)}; therefore
(F_j^*(q̂), g(q̂)) ∈ R_i, which contradicts the above result. Therefore H(K_j) ≥ H(K_i).
In the case where the point lies on the hyperplane, because (R_i - R_j) ∩ R_j = {(w(0), 0)},
we conclude that H(K_j) = H(K_i) only at the solution of (II-P1).
Theorem 7: If w(z) is finite for all z ∈ Z and is continuous with continuous first derivatives and existing finite one-sided directional second derivatives in any direction s at z = 0, then there exists a finite K* such that for all K ≥ K*, H(K) = H(K*) = the primal solution.
Proof: Since w(z) is continuous with continuous first derivatives at z = 0, there exists a hyperplane tangent to w(z) at the point (w(0), 0). This is not a supporting hyperplane for the set R, since there is no saddle point for the Lagrangian of problem (II-P1) inside the dual gap. This hyperplane is described by
\[
z_0 - \lambda^T z = w(0), \qquad \text{with } w(z) < w(0) + \lambda^T z \text{ for some } z \in Z.
\]
Now we have to find a K_i such that the above hyperplane is a supporting hyperplane for the set R_i, i.e.,
\[
z_0 = w(0) + \lambda^T z \ \le\ w_i(z) \qquad \text{for all } z \in Z.
\]
We note that
\[
w_i(z) = \min_{q \in S} \{ F(q) + K_i\, g^T(q) g(q) \mid g(q) = z \} = w(z) + K_i\, z^T z;
\]
therefore we require
\[
w(z) + K_i\, z^T z - \lambda^T z \ \ge\ w(0),
\]
i.e., there must be a
\[
K_i \ \ge\ \frac{w(0) - w(z) + \lambda^T z}{z^T z}
\qquad \text{for all } z \in Z = \{ z \mid \exists\, q \in S \ni g(q) = z \}.
\]
For z ≠ 0, and with w(z) > -∞ for z ∈ Z, a finite value of K_i exists, say α_z. The only potential problem might occur at z = 0. We can use L'Hospital's rule to find the limit as z → 0 along any direction s, namely
\[
\lim_{\beta \to 0} \alpha
= \lim_{\beta \to 0} \frac{w(0) - w(\beta s) + \lambda^T (\beta s)}{\beta^2\, s^T s}
= -\,\frac{1}{2\, s^T s}\, \frac{\partial^2 w(\beta s)}{\partial \beta^2}\bigg|_{\beta \to 0} = \text{finite},
\]
if this limit exists. If the limit does not exist, then the finiteness of the one-sided directional second derivatives in any direction s at z = 0 yields
\[
\lim_{\beta_j \to 0} \alpha = \alpha_j = \text{finite, along the direction } s_j.
\]
Therefore choose K_i such that
\[
K_i \ \ge\ \max\,[\, \alpha_z,\ \alpha_j \text{ for every direction } s_j \,] = \text{finite}.
\]
Therefore there exists a finite K* = K_i such that the hyperplane tangent to w(z) at the point (w(0), 0), with slope λ, is a supporting hyperplane for the set R_i; therefore the dual solution H(K*) equals the primal solution.
Now, for any K > K*, H(K) ≥ H(K*), but it cannot be that H(K) > H(K*), since that would lead to H(K) > the primal solution. Therefore H(K) = H(K*).
If ∂w(z)/∂z is discontinuous while w(z) is continuous at z = 0, then K has to be +∞, and the dual approach fails to give a solution for the problem (II-P2); but as K increases, approaching +∞, it produces tighter lower bounds on the primal solution. The method can fail altogether if w(z) is discontinuous at z = 0.
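For the same one-dimensional illustration used above (a check of the theorem on the constructed example, not a computation from this work), the bound appearing in the proof can be evaluated in closed form. With w(z) = -z^2 and tangent slope λ = ∂w/∂z at z = 0 equal to 0,
\[
K_i \ \ge\ \frac{w(0) - w(z) + \lambda^T z}{z^T z} = \frac{z^2}{z^2} = 1 \qquad \text{for all } z \ne 0,
\]
and the limit as z → 0 is likewise 1, so K* = 1. This agrees with the direct observation made after (II-P2) that w_i(z) = (K_i - 1)z^2 admits a supporting hyperplane at (w(0), 0) exactly when K_i ≥ 1.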
II.5. Computational Separability
The previously described resolution of the dual gaps suffers from
a serious drawback: the separability which existed in problem
(II-P1) has been destroyed in problem (II-P2) because of the cross-product
terms in the penalty term. Separability preserves the structural
characteristics of a system and induces properties which are
always desired in solving a large-scale problem, such as many of those
in engineering design.
In the following paragraphs an algorithm is proposed which resolves
the dual gap while preserving a computational separability of the system.
Using the initial notation, consider the penalty term
\[
K_m (g(q) - z)^T (g(q) - z)
= K_m \sum_{j=1}^{n} \sum_{i \in O(j)} [\, t_i(q_j) - x_i - z_i \,]^T [\, t_i(q_j) - x_i - z_i \,]
\]
\[
= K_m \sum_{j=1}^{n} \sum_{i \in O(j)} \sum_{r=1}^{s}
\big[\, t_{ir}^2 + x_{ir}^2 + z_{ir}^2 - 2\, t_{ir} z_{ir} - 2\, t_{ir} x_{ir} + 2\, x_{ir} z_{ir} \,\big],
\]
where z is the deviation from satisfying the constraints. Under appropriate assumptions it has been seen that, for large enough K_m, the solution of the problem can occur at z_i = 0 for all i ∈ O(j) and j = 1,...,n.
Each term of the above triple summation consists of the separable terms t_{ir}^2(q_j), x_{ir}^2, z_{ir}^2, -2 t_{ir}(q_j) z_{ir}, and 2 x_{ir} z_{ir}, and a cross-product -2 t_{ir}(q_j) x_{ir} which is not separable. Expand the cross-product in a Taylor series and consider the following linear approximation around the point x̄_{ir}, t̄_{ir} = t_{ir}(q̄_j):
\[
t_{ir}(q_j)\, x_{ir} \ \approx\ \bar t_{ir}\, x_{ir} + \bar x_{ir}\, t_{ir}(q_j) - \bar t_{ir}\, \bar x_{ir}.
\]
Then the Lagrangian of problem (II-P2) with constraints g(q) = z (instead of g(q) = 0) and Lagrange multipliers μ takes the following approximate separable form:
\[
L_m^* = \sum_{j=1}^{n} \Big\{ f_j(q_j)
+ K_m \sum_{i \in O(j)} \sum_{r=1}^{s} \big[\, t_{ir}^2 + z_{ir}^2 - 2\, t_{ir} z_{ir} - 2\, \bar x_{ir} t_{ir} + 2\, \bar t_{ir} \bar x_{ir} \,\big]
\]
\[
+ K_m \sum_{i \in I(j)} \sum_{r=1}^{s} \big[\, x_{ir}^2 + 2\, x_{ir} z_{ir} - 2\, \bar t_{ir} x_{ir} \,\big]
+ \sum_{i \in O(j)} \mu_i^T t_i(q_j) - \sum_{i \in I(j)} \mu_i^T x_i \Big\}
= \sum_{j=1}^{n} \ell_{mj}^*(q_j, \mu).
\]
Thus the problem of minimizing this form of L_m^* for fixed K_m and μ is equivalent to solving the subproblems
\[
\begin{aligned}
\min_{q_j} \quad & \ell_{mj}^*(q_j, \mu), \qquad j = 1,\ldots,n, \\
\text{s.t.} \quad & q_j \in S_j.
\end{aligned}
\]
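A numerical instance of the linearization may help (the numbers are illustrative): around the expansion point t̄_{ir} = 2, x̄_{ir} = 3,
\[
t_{ir}\, x_{ir} \ \approx\ \bar t_{ir}\, x_{ir} + \bar x_{ir}\, t_{ir} - \bar t_{ir}\, \bar x_{ir}
\ =\ 2\, x_{ir} + 3\, t_{ir}(q_j) - 6,
\]
which reproduces the cross-product exactly at the expansion point and, being linear, splits into a term in x_{ir} alone (assigned to the subsystem that receives stream i as an input) and a term in t_{ir}(q_j) alone (assigned to the subsystem that produces it).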
II.6. The Algorithm
Using the multiplier method (Section II.4) along with the above
mentioned linear approximation for the cross-product terms, the
following algorithmic procedure is developed which resolves the dual
gap while preserving a computational separability of a large-scale system.
In Section II.7 a minimum principle is developed which completes the
theoretical foundation on which the algorithm is based.
Step 1: Assume a value for the penalty constant K_m.
Step 2: Assume values for the Lagrange multipliers λ (we shall relate
these to μ in step 4).
Step 3: Assume a point (x̄_i, t̄_i(q̄_j)) for each i ∈ O(j) and j = 1,2,...,n.
Step 4: Put z_i = t̄_i(q̄_j) - x̄_i and μ_i = 2K_m z_i + λ_i. Form the subproblems
based on the linear approximation for the cross-product term.
Step 5: Find the q_j, j = 1,2,...,n, which solve
Min_{q_j ∈ S_j} ℓ*_mj(q_j, μ).
Step 6: Update x̄_i and t̄_i(q̄_j) and iterate from step 4 until z_i = t_i(q_j) - x_i.
Step 7: Update the Lagrange multipliers λ and go to step 3 until
Max_λ Min_{q∈S} L*_m has been attained. Check the constraints. If
satisfied, stop; otherwise update K_m and go to step 2.
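The following sketch (Python) traces Steps 1-7 on a toy nonconvex problem assumed for illustration: minimize f_1(u_1) + f_2(x_2) = -u_1^2 + 2 x_2^2 subject to x_2 = t_1(u_1) = u_1, with u_1, x_2 ∈ [-1, 1]. The plain two-level method exhibits a dual gap here, since the unpenalized sublagrangian of unit 1 is concave; with K_m > 1 and the linearized cross-product, the iteration recovers the primal solution u_1 = x_2 = 0. The stage functions, the bounds, K_m and the starting point are all assumptions of this sketch, not data from this work.

    # Sketch of the Section II.6 algorithm on an assumed toy problem.
    from scipy.optimize import minimize_scalar

    f1 = lambda u1: -u1**2          # nonconvex stage cost, unit 1
    f2 = lambda x2: 2 * x2**2       # stage cost, unit 2
    t1 = lambda u1: u1              # connection: x2 = t1(u1)

    Km, lam = 4.0, 0.0              # Steps 1-2: penalty constant, multipliers
    xbar = tbar = 0.5               # Step 3: expansion point (xbar, tbar)
    opts = dict(bounds=(-1, 1), method="bounded", options={"xatol": 1e-12})
    for outer in range(100):
        for inner in range(1000):
            z = tbar - xbar                  # Step 4: z = tbar - xbar,
            mu = 2 * Km * z + lam            #         mu = 2*Km*z + lam
            # Step 5: minimize the two linearized sublagrangians separately
            l1 = lambda u: (f1(u) + mu * t1(u)
                            + Km * (t1(u)**2 - 2*z*t1(u) - 2*xbar*t1(u)))
            l2 = lambda x: (f2(x) - mu * x
                            + Km * (x**2 + 2*z*x - 2*tbar*x))
            u1 = minimize_scalar(l1, **opts).x
            x2 = minimize_scalar(l2, **opts).x
            if abs(t1(u1) - tbar) + abs(x2 - xbar) < 1e-10:
                break                        # Step 6: expansion point settled
            xbar, tbar = x2, t1(u1)
        if abs(t1(u1) - x2) < 1e-8:          # Step 7: constraints satisfied?
            break
        lam += 2 * Km * (t1(u1) - x2)        # otherwise update the multipliers
    print(u1, x2)                            # -> both near 0, the primal optimum

Note that the penalized quadratic terms make each subproblem convex (the coefficient of u_1^2 in ℓ*_m1 is K_m - 1 > 0), which is exactly the convexification effect discussed in Section II.4.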
The algorithm described above requires that a sequence of Max-Min
problems be solved, each possibly requiring a large number of
iterations. Consequently the total number of iterations for
convergence may become excessively large for practical applications.
In order to accelerate convergence, we have adopted the
modifications of the updating rules proposed by Miele et al. (1972) for
the method of multipliers. The adopted modifications are the
following:
(i) Shorten the length of a cycle of computations. A cycle of
computations is defined to be the sequence of iterations in which the
multiplier λ and the penalty constant K_m are held unchanged, while the
vector q_j for the subsystem j is viewed as unconstrained. The number
of iterations permitted in each cycle depends on the unconstrained
optimization technique which is employed; it is ΔN = number of
iterations = 1 for the ordinary gradient algorithm and the modified
quasilinearization algorithm (Miele et al., 1972), and ΔN = dimension
of the vector q_j for the conjugate-gradient algorithm.
(ii) Improve the estimate of the Lagrange multipliers λ, using
the formula
\[
\lambda^{(i+1)} = \lambda^{(i)} + 2\, \beta\, K_m\, g(q),
\]
where β is a scalar parameter determined so as to produce some optimum
effect, namely, to minimize the error in the necessary conditions for
optimality.
(iii) Select the penalty constant K_m in an appropriate fashion.
The method used to select K_m depends largely on the unconstrained
optimization method which is employed. If the ordinary gradient method
is employed, the penalty constant is given by the formula
\[
K_m = \frac{2\, P(q)}{P_q^T(q)\, P_q(q)},
\]
where P(q) = g^T(q) g(q) and P_q(q) = ∂P/∂q. If the conjugate-gradient
or the modified quasilinearization algorithm is used, the penalty
constant is updated by the formula
\[
K_m^{(i+1)} = \min\,(K_0,\ K_m^{(i)}) \quad \text{if } P(q) \le \tilde Q(q, \lambda),
\qquad
K_m^{(i+1)} = \max\,(K_0,\ r\, K_m^{(i)}) \quad \text{if } P(q) > \tilde Q(q, \lambda),
\]
where Q̃(q,λ) measures the error in the necessary conditions for
optimality and K_0 and r are constants prescribed by the method.
See Miele et al. (1972) for further information on these rules.
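For the ordinary gradient case, the penalty-constant rule reconstructed above reduces to a one-line computation; the sketch below (Python) shows it. The calling convention, with the residual vector g and its Jacobian J passed in, is an assumption of this sketch, not a prescription from Miele et al. (1972).

    import numpy as np

    def penalty_constant(g, J):
        # Km = 2 P(q) / (Pq(q)^T Pq(q)), with P(q) = g(q)^T g(q)
        # and Pq = dP/dq = 2 J^T g, where J = dg/dq.
        # Intended for points off the constraint manifold (g != 0),
        # where Pq is nonzero.
        P = g @ g
        Pq = 2.0 * J.T @ g
        return 2.0 * P / (Pq @ Pq)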
II.7. A Discrete Minimum Principle
In this section we shall give a theoretical justification for the
algorithm given in Section II.6. Consider the following two Lagrangian
functions:
\[
L_m = F(q) + K_m\, g^T(q)\, g(q) - \lambda^T g(q) \tag{II-10}
\]
and
\[
\bar L_m = F(q) + K_m\, (g(q) - z)^T (g(q) - z) - \mu^T (g(q) - z). \tag{II-11}
\]
Geometrically, (II-11) is the Lagrange function resulting from moving
the origin to z in the WZ space and adding the penalty term for
deviation from that new origin. Theorem 8 indicates a relationship
which exists between them.
Theorem 8: If q̄ solves the problem
\[
h^*(\lambda) = \min_{q \in S} L_m(q, K_m, \lambda) \tag{II-P3}
\]
resulting in
\[
g(\bar q) = \bar z, \tag{II-12}
\]
then q̄ also solves, for K_m and z̄ fixed at these same values, the problem
\[
\min_{q \in S} \{\, \bar L_m(q, K_m, \mu, \bar z) \mid g(q) = \bar z \,\}. \tag{II-P4}
\]
Conversely, if one can find a z̄ which permits (II-P4) to have a solution,
say q̄, then q̄ solves a problem of form (II-P3).
Proof: Suppose that q̄ solves (II-P3) and results in g(q̄) = z̄. Then by
Everett's main theorem (Everett, 1963), q̄ solves the problem
\[
\min_{q \in S} \{\, F(q) + K_m\, g^T(q)\, g(q) \mid g(q) = \bar z \,\}, \tag{II-13}
\]
which is equivalent to solving the problem
\[
\min_{q \in S} \{\, F(q) \mid g(q) = \bar z \,\}, \tag{II-14}
\]
since the term K_m g^T(q)g(q) = K_m z̄^T z̄ is constant for z̄ fixed. Problem
(II-P4) is equivalent to problem (II-14); thus q̄ solves problem (II-P4).
If z̄ is fixed and permits (II-P4) to have a solution, say q̄,
then one has solved the problem
\[
\min_{q \in S} \{\, F(q) + K_m\, g^T(q)\, g(q) - (2 K_m \bar z + \mu)^T g(q) + K_m\, \bar z^T \bar z + \mu^T \bar z \,\},
\]
which is equivalent to solving the problem
\[
\min_{q \in S} \{\, F(q) + K_m\, g^T(q)\, g(q) - \lambda^T g(q) \,\}, \tag{II-15}
\]
where
\[
\lambda = 2 K_m \bar z + \mu. \tag{II-16}
\]
Problem (II-15) is a problem of form (II-P3), and consequently q̄ solves
a problem of form (II-P3).
Theorem 8 says problems (II-P3) and (II-P4) are equivalent, in that
they give rise to the same (z̄, q̄) values. The following corollary
follows directly from this observation.
Corollary 8.1: If and only if z̄ permits (II-P4) to have a solution,
then the set R_m corresponding to problem (II-P3) has a supporting
hyperplane at the point (w_m(z̄), z̄) with a "slope" given by (II-16).
The following corollary also follows directly.
Corollary 8.2: If z̄ permits (II-P4) to have a solution, say q̄, then
\[
h^*(\lambda) = \bar L_m(\bar q, K_m, \mu, \bar z) + K_m\, \bar z^T \bar z - \lambda^T \bar z, \tag{II-17}
\]
where λ is given by (II-16).
Proof: If z̄ permits (II-P4) to have a solution q̄, then
\[
g(\bar q) = \bar z \tag{II-18}
\]
and
\[
\bar L_m(\bar q, K_m, \mu, \bar z) = F(\bar q). \tag{II-19}
\]
Equation (II-17) follows immediately from (II-10), (II-18) and
(II-19). Q.E.D.
Note that Corollary 8.2 gives us a formula to calculate a dual bound,
h*(λ), to w(0) if we have in fact solved (II-P4) rather than (II-P3).
The result following exposes some very useful properties of problem
(II-P4) for the type of engineering design problems being considered
in this work. It also constitutes the theoretical reasoning behind the
algorithm presented in Section II.6.
Result: If the point q̄^T = [q̄_1^T, ..., q̄_n^T] and the multipliers μ solve
problem (II-P4), then each subproblem j, j = 1,...,n, resulting after the
Taylor series linear approximation of the cross-product terms is
minimized with respect to the corresponding q_j at the point q̄_j.
Proof: Consider a system of two stages and the following family of
problems:
\[
\begin{aligned}
\min \quad & F = f_1(x_1, u_1) + f_2(x_2, u_2) \\
\text{subject to:} \quad & x_2 - t_1(x_1, u_1) = z \\
& h_1(x_1, u_1) \le 0 \\
& h_2(x_2, u_2) \le 0,
\end{aligned} \tag{II-P1$'$}
\]
where z ∈ Z' = {z | x_2 - t_1(x_1,u_1) = z, h_1(x_1,u_1) ≤ 0, and h_2(x_2,u_2) ≤ 0}.
The Lagrangian function for the augmented problem is:
\[
\begin{aligned}
\bar L_m &= f_1(x_1,u_1) + f_2(x_2,u_2)
+ K_m [x_2 - t_1(x_1,u_1) - z]^T [x_2 - t_1(x_1,u_1) - z] \\
&\quad - \mu^T [x_2 - t_1(x_1,u_1) - z]
- \nu_1^T h_1(x_1,u_1) - \nu_2^T h_2(x_2,u_2) \\
&= L + K_m [x_2 - t_1(x_1,u_1) - z]^T [x_2 - t_1(x_1,u_1) - z],
\end{aligned}
\]
where ν_1 and ν_2 are Kuhn-Tucker multipliers.
Suppose that L̄_m is minimized at the point (x̄_1, ū_1, x̄_2, ū_2). At this
point the Hessian matrix A of the second derivatives of L̄_m with respect
to (x_1, u_1, x_2, u_2) must be positive definite. Its (x_1, x_1) block is
\[
\frac{\partial^2 L}{\partial x_1^2}
+ 2K_m\, \frac{\partial t_1^T}{\partial x_1} \frac{\partial t_1}{\partial x_1^T}
- 2K_m\, [x_2 - t_1 - z]^T \frac{\partial^2 t_1}{\partial x_1^2},
\]
and similarly for the (x_1, u_1) and (u_1, u_1) blocks; the (x_1, x_2) and
(u_1, x_2) blocks are -2K_m ∂t_1^T/∂x_1 and -2K_m ∂t_1^T/∂u_1; the
(x_2, x_2) block is ∂²L/∂x_2² + 2K_m I; the (x_2, u_2) and (u_2, u_2) blocks
are ∂²L/∂x_2∂u_2^T and ∂²L/∂u_2²; and the (x_1, u_2) and (u_1, u_2) blocks
are zero.
Note that A contains terms which are not separable, each of which
contains the factor x_2 - t_1 - z. Since (x̄_1, ū_1, x̄_2, ū_2) minimizes L̄_m
with x_2 - t_1(x_1, u_1) - z = 0, these terms vanish and the matrix A
reduces to the following form (A is symmetric; the dots denote the
transposes of the corresponding upper blocks):
\[
A = \begin{bmatrix}
\dfrac{\partial^2 L}{\partial x_1^2} + 2K_m \dfrac{\partial t_1^T}{\partial x_1}\dfrac{\partial t_1}{\partial x_1^T} &
\dfrac{\partial^2 L}{\partial x_1 \partial u_1^T} + 2K_m \dfrac{\partial t_1^T}{\partial x_1}\dfrac{\partial t_1}{\partial u_1^T} &
-2K_m \dfrac{\partial t_1^T}{\partial x_1} & 0 \\
\cdot &
\dfrac{\partial^2 L}{\partial u_1^2} + 2K_m \dfrac{\partial t_1^T}{\partial u_1}\dfrac{\partial t_1}{\partial u_1^T} &
-2K_m \dfrac{\partial t_1^T}{\partial u_1} & 0 \\
\cdot & \cdot & \dfrac{\partial^2 L}{\partial x_2^2} + 2K_m I & \dfrac{\partial^2 L}{\partial x_2 \partial u_2^T} \\
\cdot & \cdot & \cdot & \dfrac{\partial^2 L}{\partial u_2^2}
\end{bmatrix}.
\]
Since A must be positive definite, its principal minor matrices must be
positive definite; in particular,
\[
A_1 = \begin{bmatrix}
\dfrac{\partial^2 L}{\partial x_1^2} + 2K_m \dfrac{\partial t_1^T}{\partial x_1}\dfrac{\partial t_1}{\partial x_1^T} &
\dfrac{\partial^2 L}{\partial x_1 \partial u_1^T} + 2K_m \dfrac{\partial t_1^T}{\partial x_1}\dfrac{\partial t_1}{\partial u_1^T} \\
\cdot &
\dfrac{\partial^2 L}{\partial u_1^2} + 2K_m \dfrac{\partial t_1^T}{\partial u_1}\dfrac{\partial t_1}{\partial u_1^T}
\end{bmatrix}
\quad \text{(in the variables } x_1, u_1\text{)}
\]
and
\[
A_2 = \begin{bmatrix}
\dfrac{\partial^2 L}{\partial x_2^2} + 2K_m I & \dfrac{\partial^2 L}{\partial x_2 \partial u_2^T} \\
\cdot & \dfrac{\partial^2 L}{\partial u_2^2}
\end{bmatrix}
\quad \text{(in the variables } x_2, u_2\text{)}
\]
are positive definite.
Consider now the sublagrangians ℓ*_m1 and ℓ*_m2. After the linear
approximation of the cross-product term 2K_m x_2^T t_1(x_1, u_1), they take
the following form:
\[
\ell_{m1}^* = f_1(x_1,u_1) + \mu^T t_1(x_1,u_1) - \nu_1^T h_1(x_1,u_1)
+ K_m\, t_1^T(x_1,u_1)\, t_1(x_1,u_1)
- 2 K_m\, \bar x_2^T t_1(x_1,u_1) + 2 K_m\, z^T t_1(x_1,u_1),
\]
\[
\ell_{m2}^* = f_2(x_2,u_2) - \mu^T x_2 - \nu_2^T h_2(x_2,u_2)
+ K_m\, x_2^T x_2 - 2 K_m\, t_1^T(\bar x_1, \bar u_1)\, x_2 - 2 K_m\, z^T x_2,
\]
where (x̄_1, ū_1, x̄_2) is the point around which the Taylor series expansion
takes place. At the point (x̄_1, ū_1, x̄_2, ū_2) the following conditions are
satisfied:
\[
\frac{\partial \ell_{m1}^*}{\partial x_1} = \frac{\partial \bar L_m}{\partial x_1} = 0, \qquad
\frac{\partial \ell_{m1}^*}{\partial u_1} = \frac{\partial \bar L_m}{\partial u_1} = 0,
\]
\[
\frac{\partial \ell_{m2}^*}{\partial x_2} = \frac{\partial \bar L_m}{\partial x_2} = 0, \qquad
\frac{\partial \ell_{m2}^*}{\partial u_2} = \frac{\partial \bar L_m}{\partial u_2} = 0.
\]
Also the Hessian matrices of ℓ*_m1 and ℓ*_m2, which are precisely A_1 and A_2
respectively, are positive definite. Thus we conclude that ℓ*_m1 and ℓ*_m2
are minimized at the point (x̄_1, ū_1, x̄_2, ū_2) which minimizes L̄_m. By
induction the same result is found to apply to any number of stages.
Since the primal problem (II-P1) corresponds to that member of
the class of problems (II-P1') with z = 0, the above mentioned result
applies to (II-P1) for a proper K_m and μ.
II.8. Discussion
We should note the following two observations, which constitute
the essence of the proposed algorithm: 1) The use of a Taylor series
expansion really provides two results. The first is, of course, that
the problem becomes separable. The second is evident only when one
realizes that other devices could effect the separability feature; for
example, one could rewrite the cross-product term -2 x_{ir} t_{ir} as
-2 x_{ir}(x_{ir} - z_{ir}), which is also separable for fixed z_{ir}. Unfortunately,
in this form, in contrast to the proposed one, the subproblem for
the unit containing x_{ir} as a variable becomes a concave minimization
in x_{ir} as K_m increases. 2) For the Taylor series expansion approach,
we have found that problem (II-P4), and not (II-P3) to which it is
equivalent, has the desired property that the subproblems must always
be minimized if the overall problem is minimized, even if K_m is not
sufficiently large.
We should also note that the convex quadratic terms t_{ir}^2 and x_{ir}^2
produced by the penalty term will tend to improve, and could dominate,
the behavior of the subproblems as K_m increases, making them easier
to solve.
The introduction of the quadratic terms in the objective function
also offers another advantage: it desensitizes the dual function with
respect to the multipliers λ. For given multipliers λ,
\[
h^*(\lambda) = \min_{x_i,\, u_i,\ i = 1,\ldots,n} L_m^*
\]
is closer to the primal solution than
\[
h(\lambda) = \min_{x_i,\, u_i,\ i = 1,\ldots,n} L
\]
(see Figure 3). This is a characteristic
which can be of importance for a dual bounding procedure, such as
the one used in structural sensitivity analysis (McGalliard and
Westerberg, 1972).
Figure 3. Improvement of the Dual Bound h*(λ) for the
Penalized Problem Over the Dual Bound h(λ) for
the Unpenalized Problem for the Same Multipliers λ
CHAPTER III
A STRONG VERSION OF
THE DISCRETE MINIMUM PRINCIPLE
The discrete form of Pontryagin's Minimum Principle proposed by
a number of authors has been shown by others in the past to be
fallacious; only a weak result can be obtained. Due to the
mathematical character of the objective function and the stage transformation
equations, only a small class of chemical engineering problems have
been solved by the strong discrete minimum principle. This chapter
presents a method to overcome the previous shortcomings of the strong
principle. An algorithmic procedure is developed which uses this new
version. Numerical examples are provided to clarify the approach
and demonstrate its usefulness.
III.1. Review of Previous Works
Pontryagin's minimum principle (Pontryagin et al., 1962) is a
well-known method to solve a wide class of extremal problems associated
with given initial conditions. A discrete analog of the minimum
principle, where the differential equations are substituted by
difference equations, is not valid in general but only in certain almost
trivial cases. Rozonoer (1959) first pointed out this fact. Katz (1962)
and Fan and Wang (1964) later on developed a discrete minimum principle
which was shown to be fallacious by Horn and Jackson (1965a), by means
of simple counterexamples. As was pointed out by Horn and Jackson
(1965a, b) and lucidly presented by Denn (1969), the failure of a
strong minimum principle lies in the fact that we cannot deduce the
nature of the stationary values of the Hamiltonian from a consideration
of first-order variations only. Inclusion of the second-order terms
does not help to draw a general conclusion about the nature of the
stationary points in advance. A weak minimum principle which relates
the solution of the problem to a stationary point of the Hamiltonian
exists and is valid (Horn, 1961 ;Jackson, 1964).
In the case of control systems described by differential equations,
time, by its evolution on a continuum, has a "convexifying" effect
(Halkin, 1966) which does not make necessary the addition of some
convexity assumptions to the specification of the problem. Thus a
strong minimum principle can be applied for these problems, requiring
the minimization of the Hamiltonian, even in the case that a continuous
problem is solved by discretizing it with respect to time and
using a strong discrete minimum principle. For discrete, staged systems
described by difference equations, the evolution of the system does
not have any "convexifying" effect and, in order to obtain a minimum
principle, we must add some convexity assumptions to the problem
specification or reformulate the problem in an equivalent form which
possesses inherently the convexity assumptions. This present work
belongs to the second class.
In the present chapter we propose to show a strong version of
the minimum principle which relates the solution of the problem to a
minimum point, rather than a stationary point, of the Hamiltonian.
This is attained through the use of the Hestenes' method of multipliers,
a technique used effectively in Chapter II to resolve the dual gaps
of the twolevel optimization method. This method turns a stationary
point of the Hamiltonian into a minimum point, thus minimum seeking
algorithms can be used.
III.2. Statement of the Problem, and Its Lagrangian Formulation
As a basis for the description of the minimum principle, the two-level
optimization approach, their success and failure, and the
development of a strong minimum principle, consider the following
sequential unconstrained problem. (Constraints and recycles do not
change the following results (Westerberg, 1973), and we want to keep
the presentation here as simple as possible.)
\[
\begin{aligned}
\min \quad & F = \sum_{i=1}^{N} \phi_i(x_i, u_i) \\
\text{subject to} \quad & x_{i+1} = f_i(x_i, u_i), \quad i = 1,\ldots,N \\
& x_1 = x^\circ \ \text{(given)}.
\end{aligned} \tag{III-P1}
\]
For every i = 1,...,N the vector valued function f_i(x_i,u_i) is given and
satisfies the following conditions:
a. the function f_i is defined for all x_i and u_i,
b. for every u_i the function f_i(x_i,u_i) is twice continuously
differentiable with respect to x_i,
c. the f_i(x_i,u_i) and all its first and second partial derivatives
are uniformly bounded.
These conditions correspond to the usual "smoothness" assumptions.
The Lagrangian function for this problem (III-P1) is given by:
\[
L = \sum_{i=1}^{N} \phi_i(x_i,u_i) + \sum_{i=1}^{N} \lambda_{i+1}^T \big[ -x_{i+1} + f_i(x_i,u_i) \big] + \lambda_1^T (x^\circ - x_1)
= \sum_{i=1}^{N} \big\{ \phi_i(x_i,u_i) - \lambda_i^T x_i + \lambda_{i+1}^T f_i(x_i,u_i) \big\} + \lambda_1^T x^\circ. \tag{III-1}
\]
The solution to problem (III-P1) is a stationary point of the
Lagrangian function L. The necessary conditions for a stationary
point of the Lagrangian are:
\[
\frac{\partial L}{\partial x_i} = \frac{\partial \phi_i}{\partial x_i} - \lambda_i + \frac{\partial f_i^T}{\partial x_i}\, \lambda_{i+1} = 0, \qquad i = 1,\ldots,N \tag{III-2}
\]
\[
\frac{\partial L}{\partial u_i} = \frac{\partial \phi_i}{\partial u_i} + \frac{\partial f_i^T}{\partial u_i}\, \lambda_{i+1} = 0, \qquad i = 1,\ldots,N \tag{III-3}
\]
\[
x_{i+1} - f_i(x_i, u_i) = 0, \qquad i = 1,\ldots,N. \tag{III-4}
\]
From equation (III-2) we have the defining equations for the multipliers,
\[
\lambda_i = \frac{\partial \phi_i}{\partial x_i} + \frac{\partial f_i^T}{\partial x_i}\, \lambda_{i+1}, \tag{III-5}
\]
with the natural boundary condition
\[
\lambda_{N+1} = 0. \tag{III-6}
\]
Equation (III-4) simply necessitates the satisfaction of the connection
constraints.
The Lagrangian approach constitutes a unifying and general
presentation of the necessary conditions which must be satisfied at the
solution of a problem. For the solution of the necessary conditions
different strategies have been developed. In a tutorial presentation
(Westerberg, 1973) the relationship of the different strategies for
solving problem (III-P1) to the Lagrangian approach is established,
and it is shown that methods such as sensitivity analysis, the discrete
minimum principle, and the two-level optimization method are simply
different techniques to solve the same necessary conditions, eqs.
(III-2), (III-3) and (III-4).
Let us define the stage Hamiltonian H_i as follows:

    H_i = \phi_i(x_i, u_i) + \lambda_{i+1}^T f_i(x_i, u_i),    i = 1, ..., N          (III-7)

and the overall Hamiltonian by:

    H = \sum_{i=1}^{N} H_i + \lambda_1^T x^o

Then,

    L = \sum_{i=1}^{N} H_i - \sum_{i=1}^{N} \lambda_i^T x_i + \lambda_1^T x^o

and the necessary conditions (III-2), (III-3) yield:

    \partial L/\partial x_i = \partial H_i/\partial x_i - \lambda_i = 0,    i = 1, ..., N          (III-8)

    \partial L/\partial u_i = \partial H_i/\partial u_i = 0,    i = 1, ..., N          (III-9)

Thus, in order to solve problem (III-P1) by the discrete minimum
principle, we require that each stage Hamiltonian be at a stationary
point.
Consider the point (\hat{x}_1, ..., \hat{x}_N; \hat{u}_1, ..., \hat{u}_N)
and the variations \delta u_1, ..., \delta u_N around the previous point
with respect to the controls, such that |\delta u_{ij}| \le \epsilon,
where the second index j denotes the jth element of the control vector
u_i. \delta x_1 will be taken as zero. The variational equations
corresponding to the connection constraints of the problem (III-P1),
with up to the second-order terms included, are:

    \delta x_{i+1} = (\partial f_i/\partial u_i)^T \delta u_i + (\partial f_i/\partial x_i)^T \delta x_i
                   + (1/2)(\delta u_i)^T (\partial^2 f_i/\partial u_i^2)(\delta u_i)
                   + (1/2)(\delta x_i)^T (\partial^2 f_i/\partial x_i^2)(\delta x_i)
                   + (\delta u_i)^T (\partial^2 f_i/\partial u_i \partial x_i)(\delta x_i) + O(\epsilon^3)

For the considered sequential system, the solution of the above system
with respect to the \delta x_i's, i = 1, ..., N, is straightforward and
yields the following general formula in terms of the variations in the
controls only:

    \delta x_{\ell+1} = \sum_{k=1}^{\ell} [\prod_{s=k+1}^{\ell} (\partial f_s/\partial x_s)]^T (\partial f_k/\partial u_k)^T \delta u_k
                      + (1/2) \sum_{v=1}^{\ell} [\prod_{s=v+1}^{\ell} (\partial f_s/\partial x_s)]^T
                        [(\delta u_v)^T (\partial^2 f_v/\partial u_v^2)(\delta u_v)
                         + (\delta x_v)^T (\partial^2 f_v/\partial x_v^2)(\delta x_v)
                         + 2(\delta u_v)^T (\partial^2 f_v/\partial u_v \partial x_v)(\delta x_v)] + O(\epsilon^3)          (III-10)

where each \delta x_v appearing in the second-order terms is in turn
replaced by its first-order expression
\sum_{k=1}^{v-1} [\prod_{s=k+1}^{v-1} (\partial f_s/\partial x_s)]^T (\partial f_k/\partial u_k)^T \delta u_k.
The variation in the objective function F caused by the variations
in the controls is given by:

    \delta F = \sum_{i=1}^{N} [(\partial\phi_i/\partial u_i)^T \delta u_i + (\partial\phi_i/\partial x_i)^T \delta x_i
             + (1/2)(\delta u_i)^T (\partial^2\phi_i/\partial u_i^2)(\delta u_i)
             + (1/2)(\delta x_i)^T (\partial^2\phi_i/\partial x_i^2)(\delta x_i)
             + (\delta u_i)^T (\partial^2\phi_i/\partial u_i \partial x_i)(\delta x_i)] + O(\epsilon^3)

Substituting into the last expression for the \delta x_i's, i = 2, ..., N,
their equals from eq. (III-10) and noting that the stage Hamiltonians
H_i, i = 1, ..., N, are given by eq. (III-7), we find:

    \delta F = \sum_{\ell=1}^{N} [(\partial H_\ell/\partial u_\ell)^T \delta u_\ell + (1/2)(\delta u_\ell)^T (\partial^2 H_\ell/\partial u_\ell^2)(\delta u_\ell)]
             + (1/2) \sum_{\ell=1}^{N} \sum_{k,m=1}^{\ell-1} (\delta u_k)^T (\partial f_k/\partial u_k) [\prod_{s=k+1}^{\ell-1} (\partial f_s/\partial x_s)]
               (\partial^2 H_\ell/\partial x_\ell^2) [\prod_{t=m+1}^{\ell-1} (\partial f_t/\partial x_t)]^T (\partial f_m/\partial u_m)^T \delta u_m
             + \sum_{\ell=1}^{N} \sum_{k=1}^{\ell-1} (\delta u_k)^T (\partial f_k/\partial u_k) [\prod_{s=k+1}^{\ell-1} (\partial f_s/\partial x_s)]
               (\partial^2 H_\ell/\partial x_\ell \partial u_\ell) \delta u_\ell + O(\epsilon^3)          (III-11)
Unlike the continuous case (Denn, 1969; Halkin, 1966), there is no
general way in which the last two terms in eq. (III-11) can be made to
vanish. Therefore, the variations considered in the controls may well
produce \delta F < 0, \delta F > 0, or \delta F = 0. Thus it is evident
from the above that a strong minimum principle, which requires that the
solution to problem (III-P1) minimize the stage Hamiltonians H_\ell,
\ell = 1, ..., N, is not generally available (Horn and Jackson, 1965b;
Denn, 1969; Halkin, 1966). A weaker form of the discrete minimum
principle can be used; it requires that the solution to problem (III-P1)
make the stage Hamiltonians stationary. The examples presented by Horn
and Jackson, which counter the strong minimum principle, are such that
the stage Hamiltonians do not possess a minimum stationary point
whereas the problem itself does. However, there exist special cases
where the strong minimum principle can be applied (Horn and Jackson,
1965b; Denn, 1969).
At this point a further clarification is required. The strong
discrete minimum principle fails for physically staged steady-state
systems. Halkin (1966) has shown that it always succeeds for
discrete-time systems obtained as approximations to continuous-time
systems, since the time increment \Delta t can become arbitrarily small
and make the last two terms in eq. (III-11) vanish. The success or
failure of the strong discrete minimum principle can also be explained
in terms of convexity (or directional convexity), or the lack of it,
for the sets of reachable states of the system (Halkin, 1966; Holtzman
and Halkin, 1966). This constitutes an important characteristic and is
in fact the basis for the development of a strong version of the
discrete minimum principle, which follows.
The two-level optimization procedure is an infeasible decomposition
strategy of a Lagrangian nature which solves problem (III-P1). The
similarity between the Lagrangian and the weak discrete minimum
principle is a well-known fact, and in a recent note (Schock and Luus,
1972) the similarity between the two-level optimization procedure and
the discrete minimum principle has been pointed out. Further insight
into the relationship between the last two methods will be given later
in this chapter, and a stronger version of the discrete minimum
principle will be developed, based on the method developed in
Chapter II to overcome the shortcomings of the two-level optimization
procedure.
III.3. Development of a Stronger Version of the Discrete
Minimum Principle

In this section we will develop a stronger version of the discrete
minimum principle using Hestenes' method of multipliers and following
a procedure similar to the one developed in Chapter II to overcome the
dual gaps of the two-level optimization procedure. As a basis for the
presentation we will use problem (III-P1).
Consider now the augmented problem (III-P2):

    Min F^* = \sum_{i=1}^{N} \phi_i(x_i, u_i) + K \sum_{i=1}^{N} [x_{i+1} - f_i]^T [x_{i+1} - f_i] + K (x_1 - x^o)^T (x_1 - x^o)

subject to                                                   (III-P2)

    x_{i+1} = f_i(x_i, u_i),    i = 1, ..., N
    x_1 = x^o  (given)

with K > 0. The Hamiltonian for this problem is given by:

    H^* = F^* + \sum_{i=1}^{N} \lambda_{i+1}^T f_i(x_i, u_i) + \lambda_1^T x^o

In the following theorems we will establish some important properties
of H^*.
Theorem 9: If (\hat{u}_1, ..., \hat{u}_N) is a stationary point of the
Hamiltonian H for the problem (III-P1), then it is also a stationary
point for the Hamiltonian H^* of the augmented problem (III-P2), for
any real value of the parameter K.

Proof: From the equations defining H and H^* we take,

    H^* = H + K \sum_{i=1}^{N} [x_{i+1} - f_i]^T [x_{i+1} - f_i] + K (x_1 - x^o)^T (x_1 - x^o)          (III-12)

Differentiation with respect to the controls yields:

    \partial H^*/\partial u = \partial H/\partial u + 2K \sum_{i=1}^{N} (\partial [x_{i+1} - f_i]/\partial u)^T [x_{i+1} - f_i]

Evaluating the derivatives at the point (\hat{u}_1, ..., \hat{u}_N),
where \partial H/\partial u = 0 and x_{i+1} - f_i = 0 for i = 1, ..., N,
we find

    \partial H^*/\partial u = \partial H/\partial u = 0 .     Q.E.D.
Theorem 10: Let \hat{u} be a local isolated minimum of F with all the
connection constraints satisfied. Then there exists a real-valued
parameter K_o such that \partial H^*/\partial u|_{\hat{u}} = 0 and
\partial^2 H^*/\partial u^2|_{\hat{u}} > 0 (i.e., the matrix is positive
definite) for K > K_o.

Proof: In Theorem 9 it was shown that \partial H^*/\partial u|_{\hat{u}} = 0
independently of the constant K; it therefore remains valid for K > K_o.
Let us consider next the matrix \partial^2 H^*/\partial u^2. At a
feasible point, where x_{i+1} - f_i = 0,

    \partial^2 H^*/\partial u^2 = \partial^2 H/\partial u^2 + 2K \sum_{i=1}^{N} (\partial [x_{i+1} - f_i]/\partial u)^T (\partial [x_{i+1} - f_i]/\partial u)

The solution of problem (III-P1) is a stationary point of the
Hamiltonian H. The nature of the stationary point for the discrete
case cannot be predetermined, and thus \partial^2 H/\partial u^2 may
have any sign. Assuming that we have a nonsingular problem, i.e.

    (\partial [x_{i+1} - f_i]/\partial u)^T (\partial [x_{i+1} - f_i]/\partial u) \ne 0,    i = 1, ..., N,

we conclude that

    (\delta u)^T (\partial^2 H^*/\partial u^2)(\delta u)
        = (\delta u)^T (\partial^2 H/\partial u^2)(\delta u)
        + 2K \sum_{i=1}^{N} || (\partial [x_{i+1} - f_i]/\partial u)(\delta u) ||^2 > 0

for a large enough K, so that the second term, which is always positive,
prevails over any negativity of the first term.     Q.E.D.
The second theorem implies that any stationary point of the
Hamiltonian H^* can be made a minimum point by choosing the parameter K
large enough. Theorems 9 and 10 establish the following result: for a
large enough K, a local solution to problem (III-P2), which is also a
local solution to problem (III-P1), minimizes the Hamiltonian H^* of
the augmented problem (III-P2).
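(The effect asserted by Theorems 9 and 10 is easy to observe numerically.
The sketch below uses a made-up one-stage case in which H has a maximum
at the stationary point; the residual slope a is a hypothetical number.
The curvature of H^* turns positive once K exceeds a threshold.)

    # Toy case: H(u) = -u**2 has a maximum at u = 0.  Augmenting with
    # K * r(u)**2, where the residual r = x_2 - f_1(x_1, u) vanishes at the
    # feasible point but has slope dr/du = a != 0, gives at that point
    #     d2H*/du2 = -2 + 2*K*a**2
    # (the terms multiplying r itself vanish, since r = 0 there).
    a = 0.7
    for K in [0.0, 1.0, 2.0, 2.1, 5.0]:
        curvature = -2.0 + 2.0 * K * a**2
        print(f"K = {K:4.1f}   d2H*/du2 = {curvature:6.2f}  ->",
              "minimum" if curvature > 0 else "not a minimum")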
The above result requires that we minimize the overall problem
Hamiltonian H^*. The available weak minimum principle permits us to
solve the overall problem by solving one problem per stage per
iteration, since H decomposes into a sum of stage Hamiltonians.
Unfortunately H^* does not decompose, as we shall now see.

Let us see how we can simplify the above result and relate the
solution of problem (III-P1) or (III-P2) to stage Hamiltonians.
Consider the penalty term in the objective function of the problem
(III-P2):

    K \sum_{i=1}^{N} [x_{i+1} - f_i]^T [x_{i+1} - f_i] = K \sum_{i=1}^{N} [x_{i+1}^T x_{i+1} + f_i^T f_i - 2 x_{i+1}^T f_i]

We note that each member of the summation includes separable terms,
e.g., x_{i+1}^T x_{i+1} and f_i^T f_i, and nonseparable terms such as
the cross-product x_{i+1}^T f_i. The following develops a strategy to
decompose computationally the problem of minimizing H^*. This approach
is motivated by the work on solving nonconvex problems using the
two-level approach.
Expand the cross-product term in a Taylor series and consider the
following linear approximation around the point (\hat{x}_{i+1}, \hat{f}_i),
where \hat{f}_i = f_i(\hat{x}_i, \hat{u}_i):

    x_{i+1}^T f_i \approx \hat{x}_{i+1}^T \hat{f}_i + \hat{x}_{i+1}^T (f_i - \hat{f}_i) + (x_{i+1} - \hat{x}_{i+1})^T \hat{f}_i
                 = \hat{x}_{i+1}^T f_i + x_{i+1}^T \hat{f}_i - \hat{x}_{i+1}^T \hat{f}_i

Then the Hamiltonian of the augmented problem takes the following
form \bar{H}^*:

    \bar{H}^* = \sum_{i=1}^{N} {\phi_i(x_i, u_i) + \lambda_{i+1}^T f_i
              + K [x_{i+1}^T x_{i+1} + f_i^T f_i - 2\hat{x}_{i+1}^T f_i - 2x_{i+1}^T \hat{f}_i + 2\hat{x}_{i+1}^T \hat{f}_i]}
              + K (x_1 - x^o)^T (x_1 - x^o) + \lambda_1^T x^o
            = \sum_{i=1}^{N} \bar{H}_i^* + K (x_1 - x^o)^T (x_1 - x^o) + \lambda_1^T x^o          (III-13)

with the stage Hamiltonians

    \bar{H}_i^* = H_i + K [x_{i+1}^T x_{i+1} + f_i^T f_i - 2\hat{x}_{i+1}^T f_i - 2x_{i+1}^T \hat{f}_i + 2\hat{x}_{i+1}^T \hat{f}_i] .
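(Computationally, only the u_i-dependent part of \bar{H}_i^* matters for
the stage minimization, since x_{i+1} is frozen at \hat{x}_{i+1} during
the sweep. A sketch for scalar quantities; the function names are
hypothetical.)

    def stage_hamiltonian(phi_i, f_i, lam_next, x_i, x_hat_next, K):
        """u-dependent part of the stage Hamiltonian of eq. (III-13),
        scalar case.  x_i, lam_next and x_hat_next are held at the current
        iterate; the terms of (III-13) not involving u_i are constants
        here and are dropped."""
        def H_bar(u):
            fx = f_i(x_i, u)
            return (phi_i(x_i, u) + lam_next * fx
                    + K * (fx * fx - 2.0 * x_hat_next * fx))
        return H_bar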
Let us now establish the following result, which also constitutes
the basis of the proposed algorithm (given later in this chapter).

Theorem 11: If the point \hat{u} = (\hat{u}_1^T, \hat{u}_2^T, ..., \hat{u}_N^T)^T
minimizes the overall Hamiltonian \bar{H}^*, then each stage Hamiltonian
\bar{H}_i^*, i = 1, ..., N, resulting after the Taylor series linear
approximation of the cross-product terms, is minimized with respect to
the corresponding control variable u_i at the point \hat{u}_i.

Proof: At the solution point we require feasibility, i.e.

    x_{i+1} = f_i(x_i, u_i)

(This requirement is evident from the algorithm to be presented in
the next section.) From eq. (III-13) it is clear then that

    \partial \bar{H}^*/\partial u_i = \partial \bar{H}_i^*/\partial u_i,    i = 1, ..., N,

and therefore a stationary point of \bar{H}^* with respect to u_i is a
stationary point of \bar{H}_i^* with respect to u_i. The Hessian matrix
of the second derivatives of \bar{H}^* must be positive definite at the
point \hat{u}; thus all the submatrices on its diagonal must be positive
definite, i.e., \partial^2 \bar{H}^*/\partial u_1^2, ...,
\partial^2 \bar{H}^*/\partial u_N^2 must be positive definite. But
\partial^2 \bar{H}^*/\partial u_i^2 = \partial^2 \bar{H}_i^*/\partial u_i^2;
therefore all the \partial^2 \bar{H}_i^*/\partial u_i^2, i = 1, ..., N,
are positive definite.     Q.E.D.
Theorem 11 implies that the minimization of \bar{H}^* with respect
to the control variables can be replaced by the problems

    Min_{u_i} \bar{H}_i^* ,    i = 1, ..., N .
III.4. The Algorithm and Computational Characteristics

The theorems of Section III.3 imply the following algorithm, which
constitutes a stronger version of the currently available discrete
minimum principle.
Step 1: Assume a value for the penalty constant K.
Step 2: Assume values \hat{u}_1, ..., \hat{u}_N for the control variables.
Step 3: Using the values \hat{u}_1, ..., \hat{u}_N, solve the state equations

    x_{i+1} = f_i(x_i, u_i),    i = 1, ..., N

forward and find x_2, ..., x_{N+1}. Let \hat{x}_2, ..., \hat{x}_{N+1}
be the found values (x_1 = x^o, a given constant).
Step 4: Using the values \hat{u}_1, ..., \hat{u}_N and \hat{x}_1, ..., \hat{x}_{N+1},
solve the adjoint equations

    \lambda_i = \partial\phi_i/\partial x_i + (\partial f_i/\partial x_i)^T \lambda_{i+1},    i = N, ..., 1
    \lambda_{N+1} = 0

backwards. Let \hat{\lambda}_1, ..., \hat{\lambda}_{N+1} be the found values.
Step 5: Formulate the Hamiltonian \bar{H}^* of the augmented problem
(III-P2) and expand the cross-product terms around the point
(\hat{x}_1, ..., \hat{x}_{N+1}; \hat{u}_1, ..., \hat{u}_N). Formulate
the stage Hamiltonians \bar{H}_i^*, i = 1, ..., N. Minimize all
\bar{H}_i^*, i = 1, ..., N, and find optimal values for the controls,
say u_i^*, i = 1, ..., N. If the minimization procedure fails to
produce a minimum for at least one \bar{H}_i^*, increase K and go to
step 2.
Step 6: If |u_{ij}^* - \hat{u}_{ij}| < \epsilon for all j and for
i = 1, ..., N (j denotes the jth component of the vector u_i), stop;
the solution has been found. Otherwise assume new values for the
controls, putting

    new \hat{u}_i = u_i^* ,    i = 1, ..., N,

and go back to step 3.
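(The following Python sketch puts Steps 1-6 together for scalar states
and controls. The analytic stage derivatives and the use of a generic
one-dimensional minimizer from scipy are assumptions of convenience;
this illustrates the structure of the algorithm and is not the code
used in this work.)

    from scipy.optimize import minimize_scalar

    def strong_dmp(phi, f, dphi_dx, df_dx, x0, u_hat, K=10.0, eps=1e-3, max_iter=200):
        """Steps 1-6 of Section III.4; phi, f, dphi_dx, df_dx are lists of
        N stage functions of (x, u), all quantities scalar."""
        N, u = len(u_hat), list(u_hat)
        for _ in range(max_iter):
            xs = [x0]                                  # Step 3: forward pass
            for i in range(N):
                xs.append(f[i](xs[i], u[i]))
            lam = [0.0] * (N + 2)                      # Step 4: backward pass,
            for i in range(N, 0, -1):                  # lam[N+1] = 0
                lam[i] = (dphi_dx[i - 1](xs[i - 1], u[i - 1])
                          + df_dx[i - 1](xs[i - 1], u[i - 1]) * lam[i + 1])
            u_new = []                                 # Step 5: stage problems
            for i in range(N):
                def H_bar(v, i=i):                     # u-part of eq. (III-13)
                    fx = f[i](xs[i], v)
                    return (phi[i](xs[i], v) + lam[i + 2] * fx
                            + K * (fx * fx - 2.0 * xs[i + 1] * fx))
                u_new.append(minimize_scalar(H_bar).x)
            if max(abs(a - b) for a, b in zip(u, u_new)) < eps:   # Step 6
                return u_new, xs
            u = u_new                                  # new u_hat; back to Step 3
        raise RuntimeError("no convergence; Step 5 suggests increasing K")

A failure of some stage minimization to return a finite minimum would,
per Step 5, be handled by increasing K and restarting; that refinement
is omitted from the sketch for brevity.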
The effective use of the above algorithm depends largely on the value
of the constant K. Very small values of K will not produce the
necessary convexity of the stage Hamiltonians, while a very large K
will mask the objective function and make the algorithm insensitive
to the descent direction of the objective function. Miele et al. (1972)
have suggested a method for choosing a proper value of the constant K
in the context of the Hestenes method of multipliers. As will be
discussed in Section III.5, the two methods are related, and this
method for choosing K should be satisfactory here.
III.5. Discussion and Conclusions
As mentioned before, the failure of the strong discrete minimum
principle is caused by the fact that the stationary points of the
stage Hamiltonians with respect to the controls are not always minimum
points. The method proposed in the previous sections of this chapter
turns every stationary point of the Hamiltonian into a minimum point
for a large enough K. Thus we can find the solution of the original
problem at a local minimum point of the stage Hamiltonians. This
constitutes a stronger result than that currently available, where
we must search for a stationary point of the stage Hamiltonians.
The method used in this chapter to establish a stronger version
of the discrete minimum principle parallels in many respects the method
used in Chapter II to resolve the dual gaps in the two-level
optimization procedure. The source of the shortcomings of both methods
(minimum principle and two-level method) is the nonconvexity of the
objective function and/or the stage transformation equations. The
two-level optimization method fails if either the objective function
or the transformation equations or both are nonconvex with respect to
the control and/or the state variables.

For the resolution of the dual gaps of the two-level method, the
same penalty term was used, multiplied by a positive constant K. It
was required that K be large enough that the stage sub-Lagrangians
become locally convex with respect to the control and state variables.
Since, in the present work, we have required that K be large enough to
turn the stage Hamiltonians convex with respect to the controls only,
we conclude that the K required by the strengthened form of the minimum
principle to solve a nonconvex problem is at most as large as the K
required by the two-level optimization method to resolve dual gaps.
Fig. 4 compares the values of K required to solve a nonconvex
problem by the three methods discussed, namely, the weak discrete
minimum principle, the strengthened form of the discrete minimum
principle developed in this work, and the two-level optimization
method. Note that K_TLM \ge K_SMP \ge 0, where K_TLM is the least value
of K required for the two-level method and K_SMP is the least value of
K for the minimum principle developed here. Thus it should be clear
that, although the methods are related, they do not have equivalent
shortcomings; it is possible to find problems which can be solved by
the weak minimum principle, or even by the method developed here, and
not by the two-level method. However, given a K \ge K_TLM, all three
methods succeed.
CHAPTER IV
EXAMPLES OF NONCONVEX OPTIMIZATION

IV.1. Numerical Examples

IV.1.a. Two-stage examples
Consider the following two-stage example, Fig. 5, taken from
McGalliard's thesis (McGalliard, 1971) and described by:

    Min F = -t_3(x_2, u_2) + x_1^{0.6} + 2u_1 + x_2^{0.6} + 5u_2

    s.t.  x_2 = t_2(x_1, u_1) = 3x_1 + 3u_1
          x_3 = t_3(x_2, u_2) = 2x_2 + 2u_2

          S_1 = {x_1 \ge 0, u_1 \ge 0, x_1 \le 3, x_1 + 2u_1 \le 4}
          S_2 = {x_2 \ge 0, u_2 \ge 0, x_2 + 2u_2 \le 4}

Using the regular two-level optimization procedure, a table was
generated (see Table 1), and the solution is found to be:

    x_1 = 0 or 3,  u_1 = 0,  x_2 = 4,  u_2 = 0
Figure 5. Two-stage Example
TABLE 1
POINTS GENERATED FOR THE TWO-STAGE EXAMPLE

    \lambda     Min l_1    Min l_2    h(\lambda)
     2.0        -18.07       0.0      -18.07
     1.425      -12.1        0.0      -12.1
     1.0         -7.57      -1.70      -9.3
     0.667       -4.07      -3.04      -7.1
     0.5         -2.57      -3.70      -6.3
     0.215        0.0       -4.84      -4.84
     0.0          0.0       -5.70      -5.7
    -0.5          0.0       -7.70      -7.7
and

    Max_\lambda  Min_{x_1,u_1,x_2,u_2} [F - \lambda(x_2 - 3x_1 - 3u_1)] = -4.84

Two solutions are produced, with

    z_1 = x_2 - 3x_1 - 3u_1 = +4 - 3(3) + 0 = -5
    z_2 = x_2 - 3x_1 - 3u_1 = +4 - 3(0) + 0 = +4

and from the corollary in Section II.3 we conclude that a dual gap
exists.
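(The dual function can be reproduced numerically. The sketch below
evaluates h(\lambda) for this example by brute-force grid minimization
of the two sub-Lagrangians over their own feasible sets; the grid sizes
and the sign convention for the multiplier term are chosen here to
reproduce the tabulated values.)

    import numpy as np

    def h(lam, n=401, m=101):
        # l_1 over S_1 = {0 <= x1 <= 3, u1 >= 0, x1 + 2 u1 <= 4}
        best1 = np.inf
        for x in np.linspace(0.0, 3.0, n):
            u = np.linspace(0.0, (4.0 - x) / 2.0, m)
            best1 = min(best1, np.min(x**0.6 + 2*u - 3*lam*(x + u)))
        # l_2 over S_2 = {x2 >= 0, u2 >= 0, x2 + 2 u2 <= 4}; the stage-2
        # cost is x2**0.6 - 2 x2 + 3 u2 after substituting t3 into F
        best2 = np.inf
        for x in np.linspace(0.0, 4.0, n):
            u = np.linspace(0.0, (4.0 - x) / 2.0, m)
            best2 = min(best2, np.min(x**0.6 - 2*x + 3*u + lam*x))
        return best1 + best2

    for lam in [1.0, 0.5, 0.215, 0.0]:
        print(lam, round(h(lam), 3))    # lambda = 0.215 gives about -4.84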
Introduce the penalty term K(x_2 - 3x_1 - 3u_1)^2 in the objective F
and make a linear approximation of the cross-product terms x_2 x_1 and
x_2 u_1. Following the algorithm of Chapter II, we find the solution:

    x_1 = 4/3,  u_1 = 0,  x_2 = 4,  u_2 = 0   with \lambda = 0.5 and K = 10

and

    F = Max_\lambda  Min_{x_1,u_1,x_2,u_2} [F - \lambda(x_2 - 3x_1 - 3u_1)] = -4.51 .
Substitute for unit 2, in example 1, another unit described as follows:

    \phi_2 = -t_3(x_2, u_2) + 2x_2^{0.6} + 2u_2
    t_3(x_2, u_2) = 2x_2 + 3u_2
    S_2 = {x_2 \ge 0, 0 \le u_2 \le 2, x_2 + u_2 \le 4}

The problem becomes:

    Min F = x_1^{0.6} + 2u_1 - 2x_2 - u_2 + 2x_2^{0.6}

    s.t.  x_2 = 3x_1 + 3u_1
          x_1, u_1 \in S_1
          x_2, u_2 \in S_2

The two-level procedure yields the solution (see Table 2)

    x_1 = 0 and 3,  u_1 = 0,  x_2 = 4,  u_2 = 0

with

    Max_\lambda  Min_{x_1,u_1,x_2,u_2} [F - \lambda(x_2 - 3x_1 - 3u_1)] = -2.54
u2 = 0
and a dual gap is detected again.
The proposed method yields the solution
x2 = 4
u2 = 0
F = Max Min
X x1,ulx2,u2
for A = 0.8 and K = 10
[Fh(x2 3x1 3ul)] = 2.07
with all constraints satisfied.
IV.l.b. Threestage example with recycle
In the twostage example, let us insert an additional stage
with a recycle loop from this new stage to stage 1, Fig 6.
x = 4/3
u = 0
Mi n
xl,'l,x2,u2
TABLE 2
POINTS GENERATED FOR THE MODIFIED TWO-STAGE EXAMPLE

    \lambda     Min l_1    Min l_2    h(\lambda)
     3          -28.56     -2.0      -30.56
     2          -18.06     -2.0      -20.06
     1           -7.56     -2.0       -9.56
     0.667       -4.06     -2.0       -6.06
     0.5         -2.56     -2.0       -4.56
     0.484       -2.42     -2.0       -4.42
     0.3         -0.767    -2.37      -3.137
     0.218       -0.031    -2.53      -2.561
     0.215        0.0      -2.546     -2.546
     0.1          0.0      -3.00      -3.00
Figure 6. Three-stage Example with Recycle
The problem is:

    Min F = x_1^{0.6} + 2u_1 + x_2^{0.6} + 5u_2 - 4x_3 - u_3 + x_3^{0.4}

    s.t.  x_2 = 3x_1 + 3u_1
          x_3 = 2x_2 + 2u_2
          u_1 = u_3/4
          x_1, u_1 \in S_1
          x_2, u_2 \in S_2
          x_3, u_3 \in S_3 = {x_3 \ge 0, u_3 \ge 0, x_3 \le 4, x_3 + u_3 \le 6}

Since the solution of this problem is in a dual gap, we apply the
proposed resolution, which yields

    x_1 = 0.67,  x_2 = 2,  x_3 = 4
    u_1 = 0,     u_2 = 0,  u_3 = 0

for \lambda_1 = 1.0, \lambda_2 = 1.0, \lambda_3 = 0.5 and K = 10:

    Min F = Max_{\lambda_1,\lambda_2,\lambda_3}  Min_{x_1,u_1,x_2,u_2,x_3,u_3}
            [F - \lambda_1(x_2 - 3x_1 - 3u_1) - \lambda_2(x_3 - 2x_2 - 2u_2) - \lambda_3(u_1 - u_3/4)]
          = -11.96

with all constraints satisfied.
IV.1.c. Soland's example

This example is taken from Soland (1971):

    Min f = -12x_1 - 7x_2 + x_2^2

    s.t.  g(x) = -2x_1^4 + 2 - x_2 = 0
          0 \le x_1 \le 2,  0 \le x_2 \le 3

The solution found by Soland (which is not the global minimum), after
the generation of 12 points, which necessitated the solution of 23
subproblems, is:

    x_1 = 0,  x_2 = 2,   with f = -10

The algorithm proposed in this work generated a sequence of 8 points,
solving 8 subproblems, and found the solution:

    x_1 = 0.718,  x_2 = 1.46847,   with f = -16.7387

The sequence of generated points is shown in Table 3. The starting
point is x_1 = 2.00, x_2 = 0.0.
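(A two-line numerical check of the reported point, under the problem
statement as given above:)

    x1, x2 = 0.718, 1.46847
    print(-12*x1 - 7*x2 + x2**2)     # about -16.74
    print(-2*x1**4 + 2 - x2)         # about 0: the constraint is satisfied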
TABLE 3
POINTS GENERATED FOR SOLAND'S EXAMPLE

    \lambda       x_1        x_2       h(\lambda)
    -60          2.00       0.025     -919.04
    -58.9        1.7365     0.0393    -713.07
    -25.5        1.6513     0.06146   -182.64
    -18.9        1.5511     0.07873   -108.35
    -12.5        1.4318     0.1719     -78.52
     -6.5        1.2799     0.3435     -27.25
     -1.0        1.06       0.6326     -22.16
     +3.0        0.718      1.46847    -16.7389

IV.1.d. Jackson and Horn's counterexample

Figure 7. Jackson and Horn's Counterexample

Consider the following two-stage example, Fig. 7, taken from Horn and
Jackson (1965a). This example is the counterexample which demonstrated
the fallacy of the strong minimum principle and is
described by:

    Min_{\theta^1,\theta^2} x_1^2

subject to

    x_1^1 = x_1^0 - 2\theta^1 - (1/2)(\theta^1)^2
    x_2^1 = x_2^0 + \theta^1
    x_1^2 = x_1^1 + (x_2^1)^2 + (\theta^2)^2
    x_2^2 = arbitrary
    x_1^0 = x_2^0 = 1

(superscripts on the states denote the stage). The solution to this
problem is \theta^1 = \theta^2 = 0, which gives the smallest value of
x_1^2. It does not, however, minimize the Hamiltonian for stage 1,
since this Hamiltonian has only one stationary point, which is a
maximum. Let us now apply the procedure of the strong discrete
minimum principle developed in the previous sections.
Thus we form the augmented stage Hamiltonians \bar{H}_1^* and
\bar{H}_2^* according to eq. (III-13), with the cross-product terms
linearized about the current point. The adjoint equations give

    \lambda_1^1 = \lambda_1^2 = 1 ,    \lambda_2^1 = 2 x_2^1 ,

and the first necessary conditions,
\partial\bar{H}_1^*/\partial\theta^1 = 0 and
\partial\bar{H}_2^*/\partial\theta^2 = 0, reduce to a cubic equation in
\theta^1 (containing the terms K(\theta^1)^3 + 6K(\theta^1)^2 + ...)
and to a condition of the form 2\theta^2(\cdot) = 0, which yields
\theta^2 = 0. The algorithm of Section III.4 then proceeds as follows:
1. Put K = 1.
2. Assume \theta^1 = 1 and \theta^2 = 0.
3. Find x_1^1 = 1 - 2 - 1/2 = -3/2 and x_2^1 = 1 + 1 = 2.
4. Also, \lambda_1^1 = 1 and \lambda_2^1 = 2(2) = 4.
5. Minimize \bar{H}_1^* and \bar{H}_2^*, which give \theta^{1*} = 0.947
   and \theta^{2*} = 0.
   |\theta^{1*} - \theta^1| = 0.053 > \epsilon = 0.001; go back to step 2.
2a. Assume \theta^1 = 0.947 and \theta^2 = 0.
3a. Find x_1^1 = -1.343 and x_2^1 = 1.947.
4a. Find \lambda_1^1 = 1 and \lambda_2^1 = 2(1.947) = 3.894.
5a. Minimize \bar{H}_1^* and \bar{H}_2^*, and find \theta^{1*} = 0.9 and
    \theta^{2*} = 0.
Go back to step 2a and assume \theta^1 = 0.9, \theta^2 = 0. Continue in
the same way until |\theta^{1*} - \theta^1| < \epsilon and
|\theta^{2*} - \theta^2| < \epsilon. This finally leads, in 7 iterations,
to the point \theta^1 = \theta^2 = 0, which is the solution of the
problem.
Note that if K = 0.1, the nonconvexity of the first connection
constraint with respect to \theta^1 does not disappear, and the stronger
version of the minimum principle developed in this work does not succeed
in giving the solution.
IV.1.e. Denn's counterexample

Consider the following example taken from Denn (1969), which is also
a counterexample to Katz' strong minimum principle and is described by:

    Min_{u^1,u^2} x_1^2

subject to

    x_1^1 = x_1^0 (1 + x_2^0) - (1/2)(u^1)^2
    x_2^1 = 4x_1^0 - 2x_2^0 + u^1
    x_1^2 = x_1^1 (1 + x_2^1) + (1/2)(u^2)^2
    x_2^2 = 4x_1^1 - 2x_2^1 + u^2
    x_1^0 = 1/4  and  x_2^0 = 1

where u^1 and u^2 are to be chosen subject to

    0 < u^* \le u^1, u^2 .

This is a constrained problem, but it can be handled by the developed
method equally as well as an unconstrained problem; the inequalities
will be handled through Kuhn-Tucker multipliers.

The above problem has its solution at the point u^1 = 1 and u^2 = u^*,
but the Hamiltonian for the first stage does not have a minimum there,
and thus Katz' strong minimum principle cannot be used. In fact, the
Hamiltonian of the first stage possesses one stationary point, which
is a maximum. Let us now apply the algorithm developed in the present
work.
Thus we form the augmented stage Hamiltonians \bar{H}_1^* and
\bar{H}_2^* according to eq. (III-13), with the cross-product terms
linearized about the current point. The necessary conditions to be
satisfied are

    \partial\bar{H}_1^*/\partial u^1 + \mu_1 = 0 ,    \partial\bar{H}_2^*/\partial u^2 + \mu_2 = 0 ,

while the adjoint variables \lambda_1^1 and \lambda_2^1 satisfy the
corresponding adjoint equations of Step 4 of the algorithm.
Note that in the above equations the variables \mu_1 and \mu_2 are
Kuhn-Tucker multipliers through which we handle the inequalities
u^1 \ge u^* and u^2 \ge u^*; we require \mu_1, \mu_2 \le 0.

Starting with K = 10 and the initial assumptions u^1 = 2 and u^2 = 2,
we apply the same algorithmic procedure as presented in the previous
example, and after 5 iterations we find the solution

    u^1 = 1  and  u^2 = u^*

which is the solution of the problem.
IV.2. The Design of a Heat Exchange Network

Avery and Foss (1971) have demonstrated that the method of two-level
optimization may not be generally applicable to chemical process design
problems, due to the mathematical character of commonly encountered
objective functions. Since the cost functions used in chemical
engineering design problems typically include a term x^a, 0 < a < 1,
where x is the throughput in a unit, nonconvex problems arise. The
method for resolving the nonconvexities described in Chapter II will be
applied in this section to the design of a heat exchange network
presented by Avery and Foss and possessing inherent nonconvexities.
The cold stream D of Figure 8 is to be heated from temperature T_o
to T_p. Three hot streams having flowrates a, b, c and temperatures
t_a, t_b, t_c may be used for the heating; here they are considered to
have sufficient heat capacity and availability to accomplish the task.
The distribution of the total heat load among the three exchangers is
to be accomplished at the minimum cost of equipment, which is related
to the total heat transfer surface area by

    C = \sum_{i=1}^{3} C_i = \sum_{i=1}^{3} \gamma_i A_i^{a_i}          (IV-1)

The parameters \gamma_i and a_i are positive constants; typically
a_i = 0.6. It is easily demonstrated that this problem has a minimum
cost solution and that it is unique.
Let x_1 and x_2 be the enthalpies of stream D entering and leaving,
respectively, the first heat exchanger (see Figure 9). Similarly, y_1
and y_2 are the enthalpies for the second exchanger, and z_1 and z_2
for the third. Note that x_1 and z_2 are known, since the flowrate and
temperatures of stream D are known at the entrance to the first
exchanger and the exit of the third exchanger.

Figure 8. Heat Recovery Process

Figure 9. Uncoupled Heat Recovery Process Showing the Enthalpy
          Variables Used in the Two-level Optimization Method

Thus, considering x_2, y_1, y_2 and z_1 as the design variables, the
minimization problem can be formulated as follows:

    Min_{x_2,y_1,y_2,z_1} [C_1(x_2) + C_2(y_1, y_2) + C_3(z_1)]

    s.t.  x_2 - y_1 = 0  and  y_2 - z_1 = 0          (IV-2)
The Lagrangian for this problem is:

    L = \sum_{i=1}^{3} C_i - \lambda_1 (x_2 - y_1) - \lambda_2 (y_2 - z_1)
      = {C_1(x_2) - \lambda_1 x_2} + {C_2(y_1, y_2) + \lambda_1 y_1 - \lambda_2 y_2} + {C_3(z_1) + \lambda_2 z_1}
      = l_1(x_2, \lambda_1) + l_2(y_1, y_2, \lambda_1, \lambda_2) + l_3(z_1, \lambda_2)          (IV-3)

For given values of the multipliers \lambda_1 and \lambda_2, the
two-level optimization method requires the minimization of the
sub-Lagrangians l_1, l_2 and l_3. A sufficient condition for a function
to be at a minimum point is that its Hessian matrix be positive definite
at that point. The Hessian for l_2 is defined as

    H_2 = [ \partial^2 C_2/\partial Q^2              \partial^2 C_2/\partial Q \partial y_1 ]   =   [ d_1   e  ]          (IV-4)
          [ \partial^2 C_2/\partial y_1 \partial Q   \partial^2 C_2/\partial y_1^2         ]       [ e    d_2 ]

and the conditions assuring that both eigenvalues are positive are

    d_1 > 0 ,   d_2 > 0          (IV-5)
    \Delta = (d_1 d_2 - e^2) > 0          (IV-6)

In the above, the heat duty Q = y_2 - y_1 in exchanger 2 has been
substituted for the variable y_2.
The quantities d_1, d_2 and e may be determined from equation (IV-1)
and the following relations, which are suitable for the determination
of the heat exchange area. Figure 10 identifies the notation used here.

    Q = C_D (T_2 - T_1) = C_B (t_2 - t_1)
    Q = y_2 - y_1
    T_1 = y_1 / C_D
    A = Q / (U \Delta T_m)
    \Delta T_m = [(t_2 - T_2) - (t_1 - T_1)] / ln[(t_2 - T_2)/(t_1 - T_1)] ,    r = C_D / C_B

Without loss of generality, it is convenient to take r > 1; identical
conclusions hold when r \le 1. It may be shown that

    d_1 = q[(a - 1)(rs - 1)^2 + p{(rs)^2 - 1}]
    d_2 = q[(a - 1)(s - 1)^2 + p(s^2 - 1)]
    e  = q[(a - 1)(rs - 1)(s - 1) + p(rs^2 - 1)]
Figure 10. Diagram Showing Notation for a Single Exchanger
where

    \delta_2 = C_D t_2 - y_1 - Q > 0
    \delta_1 = C_D t_2 - rQ - y_1 > 0
    s = \delta_2 / \delta_1
    q = K a p^2 / \delta_2^2 \ge 0
    K = \gamma (C_D / U)^a > 0

Substitution of d_1, d_2 and e into equation (IV-6) gives

    \Delta = (d_1 d_2 - e^2) = -q^2 p^2 s^2 (r - 1)^2

which implies that \Delta is never positive and that, regardless of the
parameters of the problem, the stationary point can never be a minimum.
Therefore the two-level method fails.
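(The algebra behind this conclusion can be checked symbolically; the
following sketch, using sympy, verifies that d_1 d_2 - e^2 collapses to
-q^2 p^2 s^2 (r - 1)^2, the form quoted above.)

    import sympy as sp

    q, p, s, r, a = sp.symbols("q p s r a", positive=True)
    d1 = q*((a - 1)*(r*s - 1)**2 + p*((r*s)**2 - 1))
    d2 = q*((a - 1)*(s - 1)**2 + p*(s**2 - 1))
    e  = q*((a - 1)*(r*s - 1)*(s - 1) + p*(r*s**2 - 1))

    delta = sp.expand(d1*d2 - e**2)
    print(sp.simplify(delta + (q*p*s*(r - 1))**2))    # prints 0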
Consider now the following augmented problem:

    Min F^* = C_1(x_2) + C_2(y_1, y_2) + C_3(z_1) + K^*(x_2 - y_1)^2 + K^*(y_2 - z_1)^2

    s.t.  x_2 - y_1 = 0  and  y_2 - z_1 = 0

with K^* > 0. The coupled terms x_2 y_1 and y_2 z_1 are expanded in a
Taylor series around the point (\hat{x}_2, \hat{y}_1, \hat{y}_2, \hat{z}_1)
and are approximated by the linear terms in the expansion. The
Lagrangian of the above problem becomes:

    L^* = {C_1(x_2) - (\lambda_1 + 2K^*\hat{y}_1) x_2 + K^* x_2^2 + K^* \hat{x}_2 \hat{y}_1}
        + {C_2(y_1, y_2) + (\lambda_1 - 2K^*\hat{x}_2) y_1 - (\lambda_2 + 2K^*\hat{z}_1) y_2
           + K^* y_1^2 + K^* y_2^2 + K^* \hat{x}_2 \hat{y}_1 + K^* \hat{y}_2 \hat{z}_1}
        + {C_3(z_1) + (\lambda_2 - 2K^*\hat{y}_2) z_1 + K^* z_1^2 + K^* \hat{y}_2 \hat{z}_1}
        = l_1^* + l_2^* + l_3^*

The necessary conditions for the minimum of L^* are

    \nabla_v L^* = 0  and  \nabla_\lambda L^* = 0

where v = [x_2, y_1, y_2, z_1]^T and \lambda = (\lambda_1, \lambda_2)^T.
This results in the following equations:

    \partial l_1^*/\partial x_2 = 0
    \partial l_2^*/\partial y_1 = \partial l_2^*/\partial y_2 = 0
    \partial l_3^*/\partial z_1 = 0
    x_2 = y_1  and  y_2 = z_1 ;

that is, each subsystem is at a stationary point with respect to the
variables associated with it. The Hessian matrix for the new
sub-Lagrangian l_2^* of the second exchanger is:
    H_2^* = [ \partial^2 C_2/\partial Q^2 + 2K^*              \partial^2 C_2/\partial Q \partial y_1 + 2K^* ]
            [ \partial^2 C_2/\partial y_1 \partial Q + 2K^*   \partial^2 C_2/\partial y_1^2 + 4K^*         ]

The terms d_1^*, d_2^* and e^* are easily found to be

    d_1^* = d_1 + 2K^*
    d_2^* = d_2 + 4K^*
    e^*  = e + 2K^*

    \Delta^* = [d_1^* d_2^* - (e^*)^2]
             = 4(K^*)^2 + (4d_1 + 2d_2 - 4e) K^* + (d_1 d_2 - e^2)
             = 4(K^*)^2 + \Gamma_2 K^* + \Gamma_3

where

    \Gamma_2 = 4d_1 + 2d_2 - 4e
    \Gamma_3 = d_1 d_2 - e^2
\Delta^* has roots

    K^* = [-\Gamma_2 \pm (\Gamma_2^2 - 16\Gamma_3)^{1/2}] / 8

It was shown before that \Gamma_3 \le 0; thus

    (\Gamma_2^2 - 16\Gamma_3)^{1/2} \ge |\Gamma_2|

and the roots are real and of opposite sign. Let K_1^* > 0 and
K_2^* \le 0; then for K^* > K_1^*, \Delta^* > 0. Also, for

    K^* > -d_1/2 = K_3^* ,    d_1^* > 0

and for

    K^* > -d_2/4 = K_4^* ,    d_2^* > 0 .

Therefore, if K^* = max {K_1^*, K_3^*, K_4^*}, we find H_2^* > 0, and
the stationary point for unit 2 is a minimum, as required.
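(Given numerical values of d_1, d_2 and e at the stationary point, this
prescription for K^* is immediate to compute; a sketch, with arbitrary
sample numbers:)

    import math

    def penalty_threshold(d1, d2, e):
        """Smallest K* making H2* positive definite: the largest of the
        positive root of 4 K^2 + G2 K + G3 (with G3 <= 0), -d1/2 and -d2/4."""
        G2 = 4*d1 + 2*d2 - 4*e
        G3 = d1*d2 - e**2
        K1 = (-G2 + math.sqrt(G2**2 - 16*G3)) / 8.0
        return max(K1, -d1/2.0, -d2/4.0, 0.0)

    print(penalty_threshold(d1=-1.0, d2=0.5, e=1.2))    # about 2.17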
Thus we see that the curvature of a sub-Lagrangian at the stationary
point is determined not only by the subunit cost function, C_2 in this
case, but also by the numerical value of the penalty factor K^*.

The above example demonstrates the applicability of the developed
strategy to a wide class of chemical engineering design problems where
the conventional two-level method fails to provide the solution.
CHAPTER V
THE SYNTHESIS OF PROCESS FLOWSHEETS.
A GENERAL REVIEW.
Processing systems are characterized by two distinct features.
The first is the nature of the process units and their interconnections,
and the second is the capacities and operating conditions of the units.
The synthesis of optimal processing schemes requires a search over the
whole space of structural alternatives as well as over the space of the
design variables for each structural configuration.

The synthesis of an optimal process flowsheet represents a great
challenge for the chemical engineer in the area of systems engineering.
A number of significant contributions have moved the problem from the
state of an art to the state of a semi-art, where the personal
capabilities of the designer have been organized and systematized in a
logical manner. The size of the problem is very large, and the methods
currently available simply give good solutions which are happily
accepted by practicing designers. The progress of mathematical systems
theory in this area has been very slow, and no major thrusts have been
attained. Thus the chemical engineer has to resort to the techniques
that are available, together with his own experience and intuition.

We shall attempt to present the major contributions to the solution
of the synthesis problem, in order to relate and compare the synthesis
strategies proposed in this work, and presented in Chapters VI and VII,
with the general trends of thought already established by other
researchers in this area.