
Citation 
 Permanent Link:
 https://ufdc.ufl.edu/UF00097456/00001
Material Information
 Title:
 Iterative methods for solving largescale linear programs
 Creator:
 Kumar, Gollakota Surya, 1939
 Publication Date:
 1979
 Copyright Date:
 1979
 Language:
 English
 Physical Description:
 iv, 106 leaves : ill. ; 28 cm.
Subjects
 Subjects / Keywords:
 Algorithms ( jstor )
Linear inequalities ( jstor ) Linear programming ( jstor ) Mathematical procedures ( jstor ) Mathematical programming ( jstor ) Mathematical relaxation method ( jstor ) Mathematics ( jstor ) Perceptron convergence procedure ( jstor ) Polyhedrons ( jstor ) Simplex method ( jstor ) Dissertations, Academic  Management  UF ( lcsh ) Linear programming ( lcsh ) Management thesis Ph. D ( lcsh )
 Genre:
 bibliography ( marcgt )
nonfiction ( marcgt )
Notes
 Thesis:
 ThesisUniversity of Florida.
 Bibliography:
 Bibliography: leaves 101104.
 Additional Physical Form:
 Also available on World Wide Web
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by Gollakota Surya Kumar.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 023308990 ( AlephBibNum )
06429711 ( OCLC ) AAL1949 ( NOTIS )

Downloads 
This item has the following downloads:

Full Text 
ITEPATIVE lETiHODS FOR SOLVING
LARGESCALE LIN:EAF. PROGR.'lS
Gollakoca Surya Kumar
A DISSERTAlIO;N PRESEfNTED ui TilE GRADUATE COLilICIL OF
THE UNIVERSITY' OF FLORIDA
II PARTIAL FLLFILLHIELNT OF THE REQULiIREIEENTS FOR THE
DEGFE OF DOCTOR OF PIIILOSOPHY
UIJIVERSIT'f OF FLORIDA
19 79
ACKNOWLEDGEMENTS
I wish to express my deepest gratitude to Professor Gary
Koehler, my dissertation chairman, for the excellent guidance, the
constant encouragement and the support he has provided me during
my career at the University of Florida. Despite his busy schedule
he was always available at a moment's notice.
I also wish to thank Professor Antal Majthay whose depth of
knowledge not only in the area of operations research but on a
wide range of subjects will always remain aweinspiring. I am
also grateful to Professors Ira Horowitz and Henry Tosi for
providing financial assistance during my stay at the University of
Florida. I also wish to thank Ms. Kathy J. Combs for her superb
typing of the manuscript.
Last but not the least this work would not have been possible
but for the patience, understanding and encouragement of my wife
Rekha.
TABLE OF CONTENTS
PAGE
ACKNOWLEDGEMENTS ............................................ ii
ABSTRACT ............................ ... ....... .... .... ..* v
CHAPTER
ONE THE PROBLEM: WHY ITERATIVE TECHNIQUES? ........... 1
TWO BACKGROUND RESULTS AND TOOLS ..................... 8
Introduction ............. .... ............... 8
Motivation ................................ .... 8
Task ... ........... ......................... .. 11
Definitions .................... ......... .. 12
Typical Relaxation Method ....................... 13
Procedure ....................... ............. 13
Variations .................................... 14
Convergence ................................... 14
Fejermonotone sequence ..................... 14
Motzkin and Schoenberg ..................... 16
Relaxation Procedures ......................... 18
Agmon .................................... ..... 18
Motzkin and Schoenberg, Eaves ............... 21
Goffin ........................................ 
Definitions .................................. 22
Another proof of Agmon's result .......... 33
Range of A for finite convergence ........ 37
Merzlyakov .................................... 39
Definitions and motivation ............... 39
Method ....................................... 40
Generalized Merzlyakov procedure ....,..... 
Typical Subgradient Method ..................... 6
Procedure .................................... 6
Variations .................................
Some Subgradient Procedures ...,................. '
Polyak ... ........ ...........................
Oettli ...................................... 51
Some of the more recent algorithms .......... 5)
Camerini, Fratta and Maffioli ............ 5
Wolfe ...................................... 59
iii
CHAPTER
THREE THEORETICAL RESULTS ................................ 62
Comparison of Generalized Merzlyakov Method
and Some Typical Subgradient Methods ............ 62
Generalized Merzlyakov method ................ 62
Further developments of Oettli method ........ 66
Generalized Merzlyakov method and Oettli method 68
Comparison with some other subgradient
methods ...................................... 71
An Algorithm Using Relaxation Method ............ 73
Motivation ................. ................. 73
The algorithm ............................... 74
Phase I ............... ......... ......... 74
Phase II ................................. 79
Application to solving large linear
problems .................................... 80
Computational results ........................ 86
Concluding Remarks .................... ......... 98
APPENDIX ....................................................... 99
BIBLIOGRAPHY ................................................. 101
BIOGRAPHICAL SKETCH ................ .......................... 105
PAGE
Abstract of Dissertation Presented to the GrAduate Council
of the University of FloridJ
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy
ITERA\TiVE METHODS FOR SOLVING
LARGESCALE LINEAR PROGCFAIS
B '
Gollakota Surya Kuirar
August 1979
Chairman: Gary J. Koehler
MIjor Department: Management
The simplex: method for solving linear programs has achieved
its overwhelming success over the years due to the advances during
the last three decades on extending the effectivEness of the basic
algorithm introduced by Danczig in 1951. However the largest
problem that can be economically solved by this method is dictated
by the size of the basis. The limitation being the ability to
maintain a sparse yer accurate representation of che basis inverse
CocuTiercial computer codes using the recent developments of the
simplex method are capable of solving manLIem, Lical program systems
uf Llie order of 10,'i.iUU rows. IJowev.er, inrker l incur p.rogr ns do
arise in practice. The reason why LIe', are not being set up is
because none of the production LP codes can solve such laree
problems in a timely and cost effective fashion. For such large
problems iterative techniques that do not require a basis may be
an attractive alternative.
We review two classes of iterative techniquesrelaxation
methods and subgradient optimization. Solving a linear program
can be shown to be identical to finding a point of a polyhedron P,
the polyhedron being formed from primal and dual constraints and an
additional constraint using strong duality. In relaxation methods
we successively find points of relaxed forms P c pn. That is, at
iteration n we find an x Pn such that P c P. We assume P # 0.
The problem is solved when the sequence converges to a point
x* P.
Subgradient methods are an alternative class of iterative
techniques for solving these problems. In subgradient methods we
minimize a convex though not necessarily differentiable function
n+l n n n n
f(') by the iterative scheme x = x t u where t > 0 is a
n n n
a scalar obtained by the approximate minimization of f(x t u),
n n
and u is a subgradient of f at x When we use the minimization
method for finding a point of a polyhedron P, the function f is
so defined that if x* is a minimum of f then x* E P.
These iterative techniques work with original data; thus
advantages of supersparsity can be fully realized and the program
run in core. Also they are self correcting and knowledge of a
vector close to the solution can be used to advantage.
We show that a generalized form of Merzlyakov's relaxation
procedure subsumes most of the recent subgradient procedures when
the objective is to find a point of a polyhedron. A new algorithm
using relaxation method is presented. This algorithm was coded to
explore values for the relaxation parameters used in the algorithm.
CHAPTER 1
THE PROBLEM: WHY ITERATIVE TECHNIQUES?
Mathematical programming has achieved its present popularity
in a large measure due to the success of the simplex method for
solving linear programs published by Dantzig [6] in 1951. Over the
years considerable effort has been expended towards exploiting and
extending the effectiveness of this procedure. We will briefly
outline some of the improvements.
Most practical linear programs have a sparse matrix [2] and it
is necessary to exploit this sparseness to reduce memory and time
requirements and to maintain accurate information. The first such
procedure was the revisedsimplex procedure. This procedure
operates only on original problem data and requires an inverse of
the current basis. If there are m constraints and n variables
(including slacks, surplus and artificial) then the regular
simplex procedure requires
(1.1) (m + l)(n + 1)
storage locations. There are m + 1 rows (including the objective
and n + 1 columns (including the righthand side). In the revised
simplex procedure we need
(1.2) p(m + l)(n + 1) + m2
where p is the problem density. (1.1) is much larger than (1.2)
if p is close to zero.
It was also discovered that if the inverse matrix were stored
in product form [29] an additional decrease in storage requirements
could be realized. However, in product form, the basis inverse
grows in size at each iteration so that eventually we must either
run out of room or scrap the current form of the basis inverse and
reinvert the current basis. In practice this happens quite often.
Using product form inversion, the storage requirements are
(1.3) p(m + l)(n + 1) + qmr
where q is the average density of the eta vectors needed in the
product form inverse and r is the average number of eta vectors.
Notice, though, that to gain the decrease in (1.3) over (1.2) we
must spend time in the reinversion step.
It was soon realized that there are many ways of forming a
product form of a basis matrix. For instance, the matri:; could
first be decomposed into the product of two triangular matrices.
The eta vectors for these are sparse and easily formed. In
addition, within a class of procedures for viewing the original
basis, one can choose the rowcolumn pivot order to minimize the
creation of new members  i.e. one can choose the order in forming
the eta vectors to minimize the density of the resulting basis
inverse representation [20]. A related issue has to do with
updating the currect basis inverse with a new eta vector. ,inc cjii
perform this operation with a minimal buildup in storage
requirements [14].
Kalan [22] pointed out that of the p(m + l)(n + 1) elements of
a typical LP problem, most are identical (usually most are + Is).
Kalan suggested storing only the unique numbers of an LP problem
and using pointers to store the problem structure. In so doing
one stores fewer elements than exist in the problem. The new
sparseness is termed supersparsity. Kalan noted that not only can
original data be stored this way but also new data (formed in the
basis inverse) can be stored in this fashion.
Another useful idea  scaling of data  has numerical
stability for its motivation. Generally for real problems the
coefficients of matrix may take on a wide range of values. This
can cause numerical difficulties  for example in inversion 
since the computer must round off fractions. To reduce the effect
of this problem, the matrix entries are scaled before starting the
problem.
There are a number of ways in which to obtain a starting
basis: from scratch using an "artificial basis," using a CRASHed
basis forcing some of the variables into the basis, or based on
information about the problem. The last approach is especially
useful in solving a series of problems with only minor variations
in data.
Candidate selection can play an important part in the
efficiency of an LP algorithm. There are a number of ways to select
the entering variable  "multiple pricing" is one of them which is
especially useful for large problems. In this approach we typically
divide the variables into segments and pick the segments sequentially.
In the chosen segment we select the g most negative reduced cost
variables as candidates for pivoting. We concentrate on these g
nonbasic variables for the remainder of some h iterations. And
then repeat the procedure with the next segment. A more recent
development concerning pivot selection is due to Paula Harris [17].
The reduced costs here are weighted prior to selection. The
weights are chosen with respect to a fixed set of nonbasic
variables. The idea is that in the traditional most negative
reduced cost approach we are trying to find out the best candidate
but on a local basis  locally we move in the direction of
steepest ascent of the objective function. Such a selection may
not serve global interests. By constantly adjusting the pricing
by weights derived from a fixed reference point, the global
interests of the problem are hopefully maintained. Expr rimental
results corroborate this expectation and show it to be a very
effective procedure.
When there is degeneracy in a linear programming Drobleim,
there exists the possibility of a set of basic feasible solutions
occurring repeatedly  called "cycling." Bland 13] has introduced
several new methods to avoid cycling  the simplest of whicn is
to select as entering variable one with the smallest subscript
among all candidates to enter basis. If there is a tic for leaving
variable, select the one with the smallest basic subscript.
Parallel with efforts to increase the size of problcim that
can be tackled by the simplex method have been efforts to incrcnse
its capabilities for problems having special structure. l ie
computing effort using the simplex method depends primarily, on the
number of rows; so it is worthwhile considering the possibillty of
taking care of some constraints without increasing the size of the
basis. Generalized Upper Bounding (GUB) technique introduced by
Dantzig and Van Slyke 19] achieves this for contraints of the form
(1.4) Ex. = b.
J 1
with the restriction that the same variable does not occur in more
than one GUB constraint. Decomposition technique is another method
of extending the capability of problem solving by breaking it into
smaller subproblems and using a master program iteratively to tie
them together. Unfortunately the convergence of this method is
rather slow and in addition one is unable to compute bounds on the
value of the objective function at each iteration, unless each of
the subproblems is bounded.
Despite the far reaching advances made during the last three
decades in extending the capability of the simplex method, some of
which have been mentioned above, the basis inverse still remains
the limiting factor dictating the size of the problem that could
be tackled. Even when we use variants like GUB and decomposition
techniques, the size of the basis is still the bottleneck. In the
case of GUB it is the basis of the master problem consisting of
other than GUB constraints and with decomposition it is the size of
the basis of the largest of subproblems. Thus this basis inverse
is the heart of all that is bad in simplex method related techniques
of solving an LP.
Commercial computer codes using above developments of simplex
method are capable of solving mathematical programming systems of
the order of 10,000 rows. However, useful linear programs of such
magnitude do arise in practice for example in food, oil and
chemical related industries. The reason why larger ones are not
being set up and solved is that no production LP code has been
developed to solve such large problems in a timely and cost
effective fashion. This is not to say that there are not large
linear programming production codes. For example, IBM's MPSX/370,
CDC's APEX, Honeywell's LP 6000 X, Techtronic's
MPS3, and others are reliable and available at reasonable rates.
However, one would not seriously consider using one of these to
solve a lincar program of 10,000 or more rows on a routine basis
unless the problem possessed some nice structure or the resulEs of
the problem solution had a high economic value.
Dantzig et. al. [S] in their justification for the setting up
of a system optimization lab for solving large scale models have
noted:
Society would benefit greatly if certain
total systems can be modeled and successfully
solved. For example, crude economic planning
models of many developing countries indicate
a potential growLh race of 10 to 15. per year.
To implement such a 2rou.thl (aside from political
differences; requires a carefully worked out
detailed model and the availability of computer
programs that can solve the resulting large
scale systems. The world is currently faced
with difficult problems related to population
grouch, availability of natural resources,
ecological evalu.'t ions and control. urban
redesign. design :f lar.c scalc cn.inccrin:.,
systemb'ni (e.g. atomic en r,;y .id ru:yc lin*i
systcni. and the modeling of mrn;n's plih iologican
.v teCm for the p purpose of diagnosis and trc;itmenic c.
These problems are complex: and urgent and can
only he solved if viewed as cotal systems. If
not, [lhcn oiily patclhuork picconical solutions s w. ill
be developed (as it las been in the past) and
the u.rrld will continue to be plagued by one crisis
after another caused by poor planning techniques.
The bottleneck in solving such large unstructured LPs is the
inability to maintain a sparse yet accurate representation of the
basis inverse. When the simplex method is not computationally
feasible for such large problems, iterative techniques that do not
require a basis inverse may be an attractive alternative. In this
dissertation we concentrate on iterative procedures for solving
linear programs.
We discuss iterative techniques under the headings (a)
relaxation methods and (b) subgradient methods. These methods are
insensitive to the number of constraints. In fact an increase in
the number of constraints for the same problem improves the
convergence of the relaxation methods. Unlike the simplex method,
iterative techniques are selfcorrecting. We always use original
data thus advantages of supersparsity [22] can be fully realized
and the program run in main memory. Finally knowledge of a vector
close to the solution can be used to advantage. This is very
useful when a sequence of problems has to be solved with only
slightly varying data.
As an aside to the main thrust of this dissertation, we note
that iterative methods are more attractive for solving certain
large Markov decision processes and also for problems having a
Leontief Substitution constraint set [23, 25]both of these problems
are an important special class of LPs.
CHAPTER 2
BACKGROUND RESULTS AND TOOLS
Introduction
Mo tivation
In this chapter we will review various iterative techniques
for finding a point of a polyhedron. We are interested in finding
a point of a polyhedron because a wide variety of important
mathematical programs can be reduced to such a problem. For
example, solving a linear program could be reduced to finding a
point of a polyhedron using primal and dual constraints along wich
an additional constraint implying strong duality. To elucidate tliis
further, consider the following primaldual linear programming pair:
(2.1) Primal Dual
Max c'x Min v'b
s.t. Ax b s.t. v'A c c
x >O v 0
Here A is m x n; c is n x 1 and b is m x 1. v' stands for cranspose
of v. Problem 2.1 can be restated as
(2.2) Ax + b 0
v'A c 0
v'b c'x 0
v 2 0, x 0
Let P = {((): (v) satisfies 2.2} be the set of solutions to 2.2.
()E P implies that x solves the Primal problem in 2.1 and v solves
the dual problem.
Linear programs of the form
z* = Max Min (a'x + b.)
1 1
x i=l,...,m
relate directly to the problem of finding a point of a polyhedron P
where
k
P = {x C R : a'x + b. z*, i = 1, ..., m}
1 1
Such problems arise, for example, in largescale transportation
problems and decomposition problems [27]. We will develop such a
formulation for the transportation problem.
Given k origins with stocks aQ > 0 of a single commodity, and
m destinations with demands d. > 0 the transportation problem involves
determining a shipping schedule that meets all demand and supply
constraints at minimum cost. Let
zyj be the amount shipped from I to j
cj cost of shipping one unit from Z to j.
The problem is
k m
Min Z E c ,j zzj
Z=l j=1
subject to
m
E z j a = 1, ... k
j=l
k
E = d. j = 1,..., m
%=i j J
z > 0
j =
Without loss of generality, we assume
k m
Z a = d
=1l j=l 1
If m and k are too large to be handled by conventional network
methods [4], then the decomposition principle can be used. Let
(z : = ., n; j = 1, m): i I'}
be a list of extreme points of
k
E z k = d., j = 1, ..., m
Z=1
z >2 0
Then the transportation problem is equivalent to the master problem
k m
Min E Z c Z z A
k=l j=1 idI
subject to
m
a k z AP = 0, Z = 1, .. ., k
il j=l
SA = 1
iEI
A1 0.
Let
k m
b Z Z cj z
i =1 j=1
m
i i i
a = a z z
j=1
The master problem becomes
Min E bii
iEI
subject to
i i
A = 0, 1 = 1, ..., k
iE I
icI
i 0
The dual to the above problem is
Max w
subject to
k
E a x + w b
Z=1
w b + a'x
where
ai i k
ai = (al' a2, ..., ak) c R
This problem is equivalent to
Max, Min rax + b.
iax + b.)
xER icI 1 i
Task
Two of the methods of finding a point of a polyhedron are the
FourierMotzkin Elimination Method [7] and a Phase I simplex method.
The Elimination Method reduces the number of variables by one in
each cycle but may greatly increase the number of inequalities in
the remaining variables. For larger problems this procedure is
impractical [26]  since the number of inequalities growsat a rapid
rate. The Phase I simplex procedure can also be used to find a
point of a polyhedron. For very large problems the simplex method
breaks down due to a practical limitation on the size of the matrix
that can be inverted as explained in the last chapter. It is for
problems of this nature that iterative methods may be an attractive
alternative. We discuss iterative techniques under two classes 
relaxation methods and minimization methods.
In relaxation methods, at each iteration we attempt to find a
point of a relaxed form Pn of P. That is at iteration n we find an
n n n
x P such that P c P We look for the property that the sequence
of points thus generated is pointwise closer to P. Minimization
methods attempt to solve the problem of minimizing a general, maybe
nondifferentiable, convex function f('). The sequence generated
generally has the property that the successive iterates are closer
to the level set of solutions in distance, x* is one such point.
The value of the objective function need not necessarily improve ac
each iteration. When we use the minimization method for finding a
point of P, the function f is so defined that if x* is a minimum of
f then x* E P.
We will now review the relaxation methods. As was mentioned
n n
earlier, we find at each iteration n, a point xn p where P is a
n.
relaxed form of P. That is P c P We continue finding points
successively in this manner till we can find an x* E P. :L leasE
one such point exists by our assumption that P J 0.
Definitions
We need a few definitions before some of tle methods c.n be
described. Let
R.(x) = a'x + b
1 1 1
k k
where i E I, I # 0 and finite, x R a. R and b. E R. Without
1 1
loss of generality assume a.i = 1 for all i, where 11 II is the
Euclidean norm. H. given by
1
H. = {x c Rk: k.(x) 0)
1 1
is the halfspace associated with i C I. P given by
P = n H. = {x e Rk: (.(x) 2 0, j I} is the polyhedron
1 1
icI
of interest. E. given by
1
E. = {x C R: L. (x) = 0}
1 1
is the plane associated with halfspace 11 d(x,H.) given by
d(x, H.) = inf II xzI
z H.
1
is the Euclidean distance from x R to halfspace H.. d (x) given
1 p
by
d (x) = max d(x, H.)
p i
i I
is the maximal distance from x to P. B (x) given by
B (x) {ycRk: 1 xyjll r}
r
is the ball with radius r and center x. S (x) given by
r
Sr(x) = {yRk: Ixy = r}
is the kl dimensional sphere of radius r and center x
Typical Relaxation Method
Procedure
A typical relaxation method for finding a point of P is
o k
(1) Pick x R arbitrarily
(2) If xn P stop, otherwise let i* be the
most isolated halfspace, i.e.,
n n
a'. : x b.* > a' x b., for all iel.
1 1 = 1 1
We obtain the sequence {xn} recursively as follows:
n+1 n +n n n
x = x + A (t x )
n n n n
where t = PE (x ) is the projection of x on Ei,. A is called
i*
the relaxation parameter at iteration n. Generally An is constant
across n = 1, 2, ...
Variations
If An = 1 for all n,we call the procedure a projection method,
and when An = 2 for all n,we call it a reflection method. See
Figure 1 for an example of the relaxation procedure. In this
example we use A = A = 2, for all n. At xo, E is the plane
corresponding to the most violated halfspace. We reflect in E.
1 1 2
to get x At x we reflect in E3 to get x If we concinlju
the procedure we are in P in four iterations.
Convergence
Fejermonotone sequence
We are ultimately interested in finding a point of P or at least
finding a sequence of points which converge to a point of P. If the
sequence terminates with a point of P (as in Figure 1) we ha.'e
accomplished our goal. If the sequence does not terminate, then .'e
wish to know whether it converges to a point of P. A well kno..in
result concerning "Fejermonotone" sequences will be used to shed
light on this question.
a3 + b> 0
3* 3 =
alx + b > 0
1 1=
x
a2x + b2 0
a 2 + b 2
FIGURE 1
A sequence of points {xn} in Rk is said to be Fejermonotone
[13, 31] with respect to the set P if
n+l n
(i) xn+ xn and
(ii) I x+ zll I xn z  for all z P and for all n.
Motzkin and Schoenberg
A well known result in analysis is that if {sn) is monotone,
then {sn) converges if and only if it is bounded. Hence we see that
a Fejermonotone sequence always converges in distance. However, we
need to relate this convergence to the convergence of sequence {xn}.
Before we relate a theorem on convergence of Fej6rmonotone sequence
of points, we need a few more definitions.
k
Given a set K c Rk, the affine hull of K, denoted by 'f(K), ii
k ii i
M(K) = {x E R : x = x x E K,
iEL
A = 1, L finite, Ai R}.
iEL
For example, the affine hull of a point is the point itself and affine
hull of a line segment is a line containing the segment. The affine
hull of a triangle in R3 is the plane containing the triangle. A
set P is affine if P = M(P).
Let M be affine in Rk, x a point of M and r E R be a positive
real number. Then we define S (M,x) as the spherical surface .:itl
axis M, center x and radius r by
(2.3) Sr(M,x) = x + ((M x)L n {z E R k zj = r})
Figure 2 shows the construction of a spherical surface. Gi'.in a
point of affine set M, M x is the translation of M by x. (" 
is orthogonal to M x. S (0) is a spherical surface with radius r,
r
which intersects (M x) at two points. These two points are
translated back by x to give S (M, x),
(2.4) Theorem (Motzkin and Schoenberg)
Let the infinite sequence {xn) be Fej6rmonotone with respect
to the set P, then
Case (a): If P has an interior, the sequence converges to a
point.
S (M, x)
r
I. x
i 
s (0)
r
(M x)
FIGURE 2
18
Case (b): If P does not have an interior, then the sequence
can be convergent to a point or the set of its limit points may lie
on a spherical surface with axixMl(P).
See Figure 3 for an illustration. In case (a) P has an interior
0 1 2 3 *
and the sequence x x x x ... converges to a point x of P.
In case (b) P does not have an interior and the Fejermonotone
0 1 2 3
sequence x x x x ... results in the two limit points of the
sequence lying on a spherical surface with axis AI(P) and center y.
1
y being the projection of x on P
Relaxation Procedures
Agmon
We are now in a position to start our survey of relaxation
procedures. One of the earliest published methods is the projection
method due to Agmon [1]. Here we specify an initial veccor x; E F,
(chosen arbitrarily) and set An = 1 for all n. At iteracion n if
n n+l
x E P stop. Otherwise we find a new point x as follo..'s:
n+l n n n n
x =x (t x
I n
(a x + b. ) a
n n n in
n
= x 
a. a.
n in
n n
= x (a x + b.
in in) a.
since Ia.i I = 1. Here i represents a most violated conLr3iinc of
n
n
P at x
Case (a)
S (M(P), y)
r
135
1 3 5
X ,X ,X ,
y
M (P)
p
0246
x ,x ,x ,x ,
Case (b)
FIGURE 3
f
This procedure yields a sequence which converges to a point of
P. In the Agmon procedure
P = {x: a. x + b. 0)
in in
That is, pn is the closed halfspace defined by the constraint in and
P c In keeping with our notation, here P is a relaxed form of P.
(25) Theorem (Agmon)
Let P = n H., P # (, I 7 0 and finite, be the solution set of
iEI 1
a consistent system of linear inequalities, and let {xn) be the
sequence defined by the projection method, that is, n = 1 for all n.
Then either the process terminates after a finite number of steps or
the sequence {x n converges to a point of P, x
Furthermore,
n 0 n
Ix x  2d(x P) ,
where 0 c (0, 1) and depends only on the matrix A, where
a
a2
A =
a
a m
and m = ll.
Agmon explicitly proved the convergence for the projection
method. He also showed that the results could be extended to the
case A E (0, 2), where A = A for all n.
Motzkin and Schoenberg, Eaves
Motzkin and Schoenberg 131] (among others) described a reflexion
method for finding a point of P. They showed that if P has an
interior then the sequence terminates with a point in P after a
finite number of steps. Let x0 P be an arbitrary starting point
and generate a sequence as follows:
If x C P stop
n n
If xn P select a halfspace such that x n H.1
n+1 n
Let x be obtained by reflecting x in E.. After
J
finitely many iterations, x will fall in P if P has an
interior.
The general reflection function can be expressed as follows:
k k
For i = 1, ..., m define f : R  R by
i
x 2(a.x + b.)a. if x 1 H.
f.(x) = x + 2d(x, H.)a. =
x if x E H.
1
Let gl' g2, ... be a sequence where
gj {fl, .. ., f } for j = 1, 2, .
Define g to be the composite function
gn(x) = gn(gn 1 (...(g (x)) ...))
(2.6) Theorem (Motzkin and Schoenberg)
Let P # 0 and I # 0 and finite. If P = n H. has an interior,
iI
0
then for any x there is an k such that
g (x) E P.
Eaves 111] extended this result and demonstrated that the
reflection procedure has a uniform property. Namely, if x is within
a distance 6 of P, then g (x ) will fall in P for some e where
0
Z and k depends on 6 and not on x .
(2.8) Theorem (Eaves)
Assume that P = n H. has an interior. Let X be the set of
icl
points within a distance 6 of P, that is,
X = {x C R: d(x, P) 6}
Then there is an L such that g (X) c P. g is thus a piecewise,
linear retraction from X to P.
Piecewise and linear means that it has finitely many linear
pieces. Retraction means g : X  P is continuous, P c X and
g (x) = x for x C P. See Figure 4 for example. Six points
chosen arbitrarily within 1" of P all converge in under four
iterations.
Goffin
Definitions. Coffin's 115] work deserves special menLiocn.
He has provided a comprehensive treatment of relaxation mechods
and presented several useful finiteness properties. We need a
few more definitions before presenting Coffin's results.
A convex subset F of P is a face of P if
x, y C P
(> x, y c F.
(x, y) F # J
We denote the set of faces of P by F(P). Figure 5 illustrates
F(P) for a given P. Zerodimensional faces are 1, 2, 3. On'
dimensional faces are 4, 5, 6. The polytope 7 itself is also a race.
FIGURE 4
24
The zerodimensional faces of P are called the extreme points of P.
The n1 dimensional faces are called facets of P.
2
FIGURE 5
NS(x) defined by
NS(x) = {u E Rk: u'(z x) 0, Vz E S}
is the normal cone to S at x. NS(x) is a nonempty, closed convex
cone. It is nonempty because it contains at least the origin. Ic
is closed and convex because the intersection of closed halispace is
closed and convex. It is a cone because for all u E NS(x),
Au E N (x), A 2 0.
CS(x) = (NS(x))p
is the supporting cone to S at x. Figure 6 illustrates theoe last
two definitions.
A point x of a set P is a relative interior point designated
r.i.(P.) if it is an interior point of P with respect to the relative
topology induced in M(P) by the Euclidean topology, i.e., x E r.i.(P)
if there exists 6 > 0 such that B6(x) n M(P) c P. For example, as
2 2
shown in Figure 7, a line segment in R has no interior in R but
has a relative interior.
NS (x)
FIGURE 6
FIGURE 7
It can be shown that the normal cone is the same for all
relatively interior points of a given face, i.e.,
Np(x) = Np(F) for any x c r.i.(F)
where F is a face of P. Similarly
C (x) = C (F) for any x c r.i.(F).
Let T be any set, then T is defined as
T = {x E Rk: x = y, y T}
A closed convex set is said to be smooth enough at a point y of
its boundary if
Np(y) c Cp(y)
A closed convex set is said to be smooth enough if it is smooth
enough at every boundary point, or equivalently if
N (F) C (F) V FE F(P) 0
where F(P) stands for the set of faces of P and F(P) 0 is the set
F(P) with 0 removed.
The smooth enough property applied to a polyhedron would require
all its "corners" to be "smooth enough," carrying the analogy to
kdimensional Euclidean space.
Some characteristics of smooth enough convex sets are mentioned
below:
A polyhedral cone C is smooth enough if and
only if (iff) C c C, where C is the polar of C.
C={u E Rk: u'z < 0, Vz E C}.
A polytope (bounded polyhedron) is smooth
enough iff all its extreme points are smooth
enough.
A polyhedron is smooth enough iff Np(F) c C (F)
for all the minimal elements of the poset {F(P) c, c}
where c stands for set inclusion.
Some illustrations are given in Figure 8.
A poset (P, <) is a set P on which a binary relation < is
defined such that it satisfies the following relations:
(i) for all x, x < x (reflexivity)
(ii) if x < y and y < z then x < z transitivityy)
(iii) if x < y and y < x then x = y (antisymmetry)
Examples of posets are {R, <}, i.e., the reals with the ordinary
less than equal to operation, a dictionary with lexicographic
ordering, and the power set S, P(S), with set inclusion, c as the
binary operation.
Given a unit vector e E R and an angle a E [0,11], the set of
vectors which make an angle with e less than or equal to a is called
the spherical cone, C (e), with axis e and half aperture angle a.
N (y) (y)
(a) Not smooth enough at the extreme points of P.
P
(b) Not smooth enough at any point of P.
N (y)
(c) Smooth enough.
FIGURE 8
C (e) = {x E Rk: x'e x  Cos a}
See Figure 9 for an illustration
C (e) = C{B (e)}
a = sin a, a c [0, i)
2
FIGURE 9
Some properties of C (e) are given below:
Ca(e) is closed for a E [0,1]
Ca(e) is a convex set for a G 10, N/2]
For a E [0,1/2)if we let v = Sin a, then
Ca(e) = C{B (e)} where
k ii i i
C(S) = {x R x = A x x E S, A > 0, L finite)
iEL
C(S) is the convex cone hull of S.
Following Goffin [15] we define an obtuseness index u(C) of a closed
convex cone C to be the sine of the half aperture angle of the
largest spherical cone contained in C. Properties of V(C) are:
u(C) = sup {Sin a: C (e) c C, e i B (0)}
= sup mnin(a e)
eS1 (0) icI
u(C) > 0 iff C lias an interior
C1 c C,  u(C ) < u(C,)
u(C) = 1 iff C is a halfspace.
If C (e) is a spherical cone then u(C (e)) = Sin a.
The obtuseness index of a polyhedron P is defined b.,
U(P) = inf 'u(Cp (::)) = min U(Cp,(F))
:z P F F'(P).i .
For a pol cope 'e have
'(P) = rin u(C,(F))
Ft'.'ertices of P
If U(P)= l. 1/2, then P is smooth enough. However, clis is noc a
necessary condition for P to be smooth enough. If P is the pocsicive
ortchant in R then P is smooth enough but
U(P) = 1..'D 1/..F
kk
The distance from a point X:; c a set S C Rk is defined bv,
d(x. S) = inf I x z
zE S
If S is closed, chen che inflmum is acctined, and che set ol points
of S where it is attained is the set of points closest re :: in S
Ts(x%) = {' FE S: I x y:  = d(x., S)}
If a ser K is closed and convex, then T..(::, reduces tc a single point
which will also be denoted by TK(x) This point is called the
k
projection of x into K. Hence T (x) is a retraction from R to K.
Let K be a closed convex set in R The following important
result implies the continuity of the map TK.
(2.8) Theorem (Cheney, Goldstein)
The projection map TK satisfies the Lipschitz condition (for
C = 1). That is
IITK() TK(y)I Ix yI
See Figure 10 for an illustration;
x
K() y
K(y)
IT (X) T (y)I =< I x YIJ
FIGURE 10
(2.9) Theorem (Goffin)
Let K be a closed convex set of Rk and TK(x) the unique closest
point to x in K. Then
{[TK(x) + NK(T (x))], TK(x) K}1 is a partition of Pk
See Figure 11 for an illustration.
FIGURE 11
(2. 10) Lc rEuT
Le C c R' be a closed con'.ex cone. Then
T C" ) c )
where C is i hc polar of C.
An alternaci'.e formulation of a spht.erical surface '..ich da:is. 11,
radius d(x, I) and center T (X) (see 2.3) is concaincd in lhe
following lenrma.
(2.11) Lemma (Goffin)
k k
Let M be affine in R and x C R then
Sd(x,) T(x)) = {y c Rk: x z11= Ily l z M}
'Finally a well known result on normequivalence is stated
below. Here we state the result in terms of distances from x to P
with x V P.
(2.12) Lemma
Let P = n II., P / 0, I J 0, then there exists a scalar
iCl 1
0 < J < 1 such that for all x c Rk
pd(x, P) d (x) < d(x, P).
This result is illustrated in Figure 12.
Another Proof of Agmon's Result. The following is a different
proof for the result originally due to Agmon. We will find this
useful in subsequent parts of this material. In this proof we
make use of the result that the sequence generated by the relaxation
method is Fejermonotone with respect to P.
d (x)
P TT' ,,
FIGURE 12
(2.13) Theorem
Let P = n H., P J 0, I # 0 and finite, then the relaxation
icl
method generates a sequence which converges finitely or infinitely
to a point x* E P for any given value of the relaxation parameter X
such that A E (0, 2).
Furthermore, we have
n *0
Sx x 2d(x P) 6n
where 0 = [1 A(2 _A)p2]1/2
AE [0,1)
Proof:
Assume x n P
First we will show using Figure 13 that
d (xn+, P) = d 2(x, P) A(2 A)d (xn)
where d (xn) is the maximal distance from xn to P i.e., d xn
max d(x, H.).
iEI
n+l n n+l
From the right triangle x t T(x ) we have:
2 n+1 2 n+1 n 2 n
d (x P) = d (x t ) + d (t P) ........ ....(1)
n n n
From the right triangle x t Tp(xn) we have:
2 n+1 2n n 2n
d (x P) = d (x t ) + d (t P) ............... (2)
From (1) and (2)
2 n+1l 2 n 2 n+1 n 2 n n
d (x P) d (x P) = d (x t) d (x t )
2 n+1 2 n n+l n n n.
d (x n+, ) = d (x P) [d(x t ) + d(xn, cL )
[d(x t) d(x+ t )]
= d(xn, P) Adp(xn)[(2A)dp(xn)]
= d2(x, P) A(2 A)d (x )
P
It will be noticed that our proof is independent of Figure 13. The
figure only helps to illustrate the ideas.
T (xn) = T (xn+)
p p
FIGURE 13
By lemma 2.12 there exists
pd(x, P) a dp(x)
p'
a p > 0 such that
d(x, P)
hence
2 n+l 2 n 2 2 n
d (xn1, P) < d(x, P) 2A(2 )d(xn, P)
= d(xn, P)[l A(2 A) p2]
(2.14) Let 9 = [1 A(2 A) 2 1/2
O = 0 when A = 1, p = 1. In general 0 E [0, 1). Thus,
d2(xn, P) O2d2(xn, P)
or d(xn, P) Ond(x0 P)
Case I: If the sequence does not terminate
lim d(xn, P) = 0
n*o
Since {xn} is Fejermonotone and dim P = n, then {x}n converges to
x C DP by Theorem 2.4 where DP is the boundary of P.
n+l
x
Case II: If the sequence terminates in a finite number of steps
Sxn+ z f
Fejermonotone with respect to P. Clearly I x* Tp(xn)l~l xn T(x)II,
hence
x* E Bd(xn, P)Tp(n)
and x*, xn both belong to the ball B d(xn pTp(x ) (see Figure 14).
Hence,
xn x*1x 2d(xn, P)
< 2nd(x0, P)
If the sequence terminates, we can take x* to be the last point of
the sequence./
n+2
x
FIGURE 14
In the above procedure, if 1 A < 2 then
P = {x: a' x + b. > 0}
1 1 
n n
That is, pn is the closed halfspace defined by the constant i .
n
If 0 < A < 1, then
pn = {x: a, x + (1 A)b. Aa' xn > 0)
i i i =
n n n
and again P c pn. In both cases Pn is a relaxed form of P and at
iteration n we find xn E Pn
Range of A for finite convergence. Using relation 2.14 giving
o against different values of A and p, we get the results in Table 1
shown graphically in Figure 15.
TABLE 1
V = 1 A .1 .5 .75 1 1.5 1.9
61=f(A) .9 .5 .24 0 .5 .9
S= .1 .5 .75 1 1.5 1.9
0,=f(A) .98 .9 .87 .87 .9 .98
S= 4 A .1 .5 .75 1 1.5 1.9
6 =f(A) .99 .98 .97 .97 .98 .99
From Figure 15 it would appear that A = 1 gives best results (lowest
0). However, Motzkin and Schoenberg demonstrated that when P has an
interior A = 2 leads to finite convergence. Also if P has acute
angles, p will be low and 0 high. Thus an obtuse angle should lead
to faster convergence. Goffin determined the range of values of A
which gives finite convergence.
38
/4
/2
1.0 1.5 2.0
FIGURE 15
\ = 1
1 = 1
i =1
.5
(2.15) Theorem CGoffin)
Let P be a polyhedron with an interior. Then the sequence
generated by the relaxation method converges finitely to a point of
P if
(a) P is smooth enough and A c 11, 2]
(b) A E (X*, 2]
where
2
= + 2P and U(P) is the obtuseness index of P.
1 + U(P)
Furthermore, if A > 2 then the sequence either converges
finitely to a point of P or it does not converge.
The first part of the theorem shows that if all the corners
of P are smooth enough then the range of A over which we can get
convergence in a finite number of steps is [1, 2].
The second part of the theorem shows that for a polyhedron that
is not smooth enough, the extent by which we can deviate from A = 2
depends on the acuteness of the most restrictive extreme point of P
 the more acute it is, the less we can relax A from a value of 2.
Merzlyakov
Definitions and Motivation. The final work on relaxation
methods that we review is by Merzlyakov {30]. Merzlyakov's
method takes advantage of the fact that convergence properties
of relaxation methods improve if, instead of considering only
the most violated constraint, as was done by Agmon, among
others, a larger number of supporting planes are considered.
We would like to emphasize this unusual property of
relaxation methods  redundancy, in terms of number of constraints,
improves the convergence rate. Additional halfspaces have the effect
of increasing p which lowers 0, the convergence ratio.
At each x C R let
V(x) = {i: a'x + b. < 0, i = 1, ..., m}
1 1
That is, V(x) is the set of indices of constraints violated by x.
A cavity CV is the set
C = {x : R : V(x) = V} where V is a subset of first m natural
numbers. A subcavity SVj of CV is the subset of all x c CV which
violate constraint i C V no less than another constraint j E V.
Ties are broken by the natural ordering. Notice that subcavities
along with P finitely partition Rk. Figure 16 illustrates the above.
Associate with each subcavity S a fixed X.(j, V) '.here
0 if i V V
.(j, V) = 0 if i V
1
> 0 if i = j
2
For example for subcavity S{2 ,3(see Figure 16).
X1(2, {2, 3}) = 0
A2(2, {2, 3}) > 0
A3(2, {2, 3}) > 0
Method. Merzlyakov's relaxation procedure is as foll,,s:
Let A E (0, 2) and x specified. If at iteration n. :n P
stop. Otherwise, let
I
s 21
{I, 2}
C{1, 2}
, 2 3
{1, 2, 3)
, 2, 3)
FIGURE 16
3
S
{2, 3}
2 3
f2, 3)
3}
(2.16) x = xi [ (j,V)(axn+b )] (j)a
[ZiA(j,V)ai] [LiA(j,V)ai]
n j
where x S.
We again make the assumptions that P # 0 and without loss of
generality assume I a.si =1 for each i = 1, ..., m. Also assume
,withouc loss of generality thac Z7.i(jV) = 1. Coffin sho..ed chat if
P t 0 chen the vecccor 7\. (j,V)a. cannot be zero.
lierzlvakov showed that the procedure given above converges
Co a poinc of wiih a linear ract of convergence. If 1 < 2
clhen w.e can say
Fn = {:: 7. (j,V)(a': + b.) 0}
1 1
i
and clearly P c Fn. If 0 < < 1, chen
Pn = {:.: \,.(jV) Ia'x + (1 \)b. ',a .: ] }
1 i 1 1
and P = P Here again Pn is a relaxed form of P and the sequence
obeys our stipulation for rela:xacion method, i.e., ac iteration n we
n n n
find :.; c P ..lere P = P .
The rela:;ation sequence I{:; } congerge infinicely co, a point
C .dP, where ciP is the boundary of P, only if the whole sequence
n: 'J ja r" into :; + H (: ). by, moving in a direction that is a
convex com.binacion of violated constraints ,we force the sequence out
of tlie jam. For Ehis reason terzl,,,akov method can be rcerred co
as an an tijairinug procedure. Table 2 and Figure 17 iilutScnte this
poinE. The cable and figure are based on the ex:.ample givcn in
Merzlvakrov's paper.
TABLE 2
Consider the following system of linear inequalities
x1 + 10x2 + 10 0
3x1 10x2 30 > 0
0
Let x = (0, 0). We solve this by
(a) Agmon's method (b) Merzlyakov's method
where X = 3/4, a. = 1
n
(a) Agmon's method
n+1 n n
x = x x + b. )a. ;
I 1n
n n n
TABLE 2 (continued)
(b) Merzlyakov Method
n+1 n [ZA.(j,V)(a'xn + b.)][ZA.(j,V)ai]
xI = x(jV I i(j a
In i(j,V)a. ] In i (jV)ai]
n [TA. (j,V)a.]'
x E X.(j,V)(a'.xn + b.) Y .(j,V)a. 1 1
n xln x n 1 1 1 1 1 [.i(j,V)a.]
X1 X2 l (3,V)a
.287
0 .000 .000 2.873 ( ) 1
.958
188
1 .619 2.064 1.838 ( ) .037
.037
.188
2 7.622 .686 .459 ( ) .037
.037
.099
3 9.371 .342 .273 ( 99 1
188
4 9.351 .138 .125 (13) .037
.037
188
5 9.827 .044 .032 ( 1) .037
037
.099
6 9.949 .020 .010 ( 99 1
.955
188
7 9.948 .013 .008 ( 03 .037
.037
287
8 9.978 .007 .002 ( 8 1
.958978
9 9.978 .008
Notice that the Merzlyakov method
the Agmon procedure. Convergence to a
faster with the Merzlyakov Method.
is much more effective than
point of P appears much
Generalized Merzlyakov Procedure. The Merzlyakov procedure
can be generalized [24] by allowing the A.'s to be chosen based on
1
the current point rather than fixed for a subcavity. For any:.
where V(x) # 0 let
NAl A II
0 0
F
+ I
x x
o 0
+ I
X X
II II
II
i rC N
P
DC
0
4 P
N 0
0 0
rII
< 0
I I
0 if a'x + b. > 0
A.(x)
( 0 if aix + b. 0
1 1
where 0 < EA.(x) B, A.(x) > 0 for some i E V(x) and at each n
i
there is a j e V(xn) such that
A.(xn)(a'.x + b.)
1 1 1 >
n
a'x + bj,
where j* indexes a most isolated constraint and 0 > 0.
The convergence properties of this method are described in the
next chapter.
Typical Subgradient Method
Procedure
Let f be a conve.:: but not necessarily differentiable funccion
defined on R u E R is said to be a subgradienc of f at :: if it
satisfies the following inequality:
f(y) 2 f(x) + u'(7 :x:) for all Rn.
The set of subgradiencs of f at x is called the subdifferential of
f at :: and denoced by cf(>:).
Si(::) = {u c R : u is a subgradient of f at ::}
Sis a minimum of f if and only if 0 e cf(x ).
A typical subgiadient algoritlhm works as follows:
(1) choose F. I. nrbicraril,
n  n n 11
(2) coiipu t .i bgradieit u >i at :. i.e. 11 c d :: ).
If u = 0 an optimal point has been found. If not go to (3).
n+l n+ n n n
(3) Find : as tollous; :: = x t u
where tn is a scalar generally obtained by the approximate
minimization of
f(xn tun) for t a 0.
n n+1
Go to (2) with x replacing xn
The function f could be an arbitrary convex function. If the
objective is to find a point of a polyhedron, one could choose a
number of different functions. One choice could be
n n
f(xn) = Max (a.x b.)
i 1
i=l,... ,m
Since f is the maximum of a finite number of convex functions, f is
convex. Oettli [32] has given another interesting way to define a
function which can be used to find a point of a polyhedron. This
is given latter.
Variations
Reverting back to the general subgradient procedure given by
the iterative scheme
n+l n n n
xx tu
various proposals [16] have been made for the step size tn. The
earliest one was by Polyak [34] who suggested tn = o and lim n = 0.
n"
The merit of this procedure is that we could choose any arbitrary
subgradient u nE f(x ) and the sequence would converge to some x*
with the stipulated stepsize (e.g., with t = 1). However
convergence is dismally slow and the method is of little practical
value. Polyak attributes the original idea of this method to
Shor [36].
Another idea also due to Polyak [35] is to use
(2.17) ? If(x') ]
n n
I n*
with f = f(x ) the optimum value of the function. Polyak has shown
that with 0 < b An = b < 2 the sequence converges to x with a
1 n 2
linear rate of convergence. A variation of 2.17, also due to Polyak
is to use f > f(x ), i.e., an overestimate of the optimum f(x ).
Here again if 0
linear rate but to the level set
S = {x: f(x) < f}.
Thus for the result to be of value we must choose close enough
overestimates of f(x ) and this is difficult in practice.
A final variation on the theme of 2.17 is to use an
underestimate f < f(x ). Eremin [12] has shown that the sequence
converges to x if lim n = 0 and XA = n Here again convergence
n n
no
to the optimal solution is very slow.
Some Subgradient Procedures
Polyak
(2.18) Theorem (Polyak)
Let f be a quasi convex function to be minimized over F:. If
the sequence {xn} is obtained as follows
n+1 n n un
x = x A u with A = u n, then there e:xists a
"ni ul n
subsequence (xnk} such that lim f(xnk) = f
koo
Proof: Select a > f and define
S = {x E R f(x) = a}
S is convex (it is a level set). Hence there exists an x in int S
a a
since a > f (see Figure 18). Choose p > 0 such that the neighbor
hood B (x) c S
p a
Suppose x ~ S for all n
f(x)
a
I I P.025
(a) x E RI
contours of f
(b) x R2
FIGURE 18
f(x) f(x") > u" (x xn), x
n n
since u is a subgradient at x Thus
f(xn) f(x) un' (xn x). Also
f(x) < f(xn) Vx S
Ct
Combining the above
un (xn x) 2 f(x") f(x) ? 0, yx C S
n
This holds for a point x + u E B (,x)c S
II u P
n un n pun
u x u ( + )
n+ln
+ Iu
n 11 n 1 2 n ~1
p u Ij u (x x)
x x  = x 
n ~ 2 n
II x x + IIu
< n ~ 2 n
= x x2 + u 
n ~I 2 n
= x xll + uI u
n ~ 1 2
u x I
Choose N such that A = p, Vn N.
n
For n = N to n = N + m
S IN+m1 ~ 2 N
0 5 x x I x
SN+m
g x x p A
n
n=n
2un (xn x)
2p I un J
2Pln
2pA .
n
This is possible since lim .', =0.
n
n.0
 x2 + E An(n 2p .,
n=N
For m large enough the right hand side is negative, since Z.\ Ji'.erges
 a contradiction. Hence x EC S
a
Let lim a = f Then we have proved there exists :'nk a
k k AV
such that
n
lim f(x k) = f _I
k*oo
Computational experience with heuristic subgradient methods has
been reported by Held and Karp [18], and Held, Wolfe, and Cr:,.der [19].
Hence
Held, Wolfe, and Crowder use underestimates to f(x ). However the
stepsize Xn = X is kept constant for 2m iterations where m is a
measure of the problem size. Then both A and number of iterations
are halved successively till the number of iterations reaches some
value k, X is then halved every k iteration. This sequence violates
the requirement Xn = m and it is possible that the sequence may
converge to a wrong point. However in their computational experience
this did not happen. Held, Wolfe and Crowder experimentally found
that their procedure was effective for approximating the maximum of
a set of piecewise linear concave functions. In their problems
(assignment, multicommodity, and maxflow) the number of constraints
is enormous, of the order of 100'. In all these cases they were able
to get linear convergence (which is assured by the choice of stepsize).
Held and Karp used overestimates to f(x ) to obtain bounds for
a branch and bound procedure for solving the traveling salesman
problem. Using this procedure they were able to produce optimal
solutions to all traveling salesman problems ranging in size upto
64 cities.
Oettli
Oettli proposed a subgradient method for finding a point of an
arbitrary nonempty polyhedron. We will discuss his less general
method which can be used to solve a linear program. As before, using
the dual and an additional constraint expressing strong duality, a
linear program can be stated to be the problem of finding an x C Rk
satisfying
+j(x) 0 i = 1, ..., m
1
where Z.(x) represents the magnitude by which constant i is violated
at x;i.e.,
i (x) = max {0, a.x b.}
1 1 1
Define (x) to be the vector formed from violations of
different constraints
Z+(x) = (Z+(x), Z2+(x), ..., Z (x))
S2 m
Let p(*) be an isotone norm on Rm, i.e., if 0 = x y then
p(x) p(y). It may be noted that all the usual norms like Euclidean
and Tchebycheff are isotone norms. Define
d(x) = p(Z+(x))
Then clearly x satisfies +(x) 2 0 iff d(x) = 0.
Oectli sho'.ed that d(') is a conv'.e: function andi then described
an iterative scheme using subgradients of d(') that minimizes d()
over Rk. This scheme is quite general since we have a different
function for different norm.
(2.18) Lemu. [Oettlil
d(') is a conxex function.
Proof;
'. (z) is con'vex for all j.
.1+ +
Thus i (atz1 + ',z~) ": i (z ) + t i (z ) for d., :. O, ., +; = i.
Since p is monolonic
.+ '+ 1 +
pr'( +'izt + Cz) p[,i':> (z ) + Si t (z )i
and since p as a norm is con'vx:
I p[lo (zi) + izt ) + z )
h= plk+(zl) I + d pl (z )
hence dxz1 + Lz~) L d(z') + Cd(z). i
Let x0 be given initially. If xn P stop. Otherwise, let
n+l n d(xn) .
(2.19) x = x _ (xnt(x)
t(x )"t(xn)
where t(xn) is any subgradient of d(') formed according to Oettli's
n
rule at x The sequence {x } converges to a point of P with a
linear rate of convergence.
(2.20) Lemma [Oettlil
Let y C P, then for the Oettli's procedure
n+1 2 n 2 d2(xn)
Ix I (x ) 1 2
Proof:
n+1 2 n d(x ) n 2
Sx yl = n x n) t(x ) yl 2
t(x")'t(nn)
n 2 dx) 2d(x)
=I x  + x 2d(xn) t(xn)' (x Y)
t(xn)'t(x ) t(x )'t(x )
By definition of subgradient
d(y) > d(x ) + t(xn) (y x )
Since d(y) = 0 therefore
d(xn) 5 t(xn) (x y)
Collecting the above results
t(x ) t(x )
(2.21) Theorem [Oettli]
n+1 n <0<
I x x 6 II x x ,0 < 0 < 1
Proof:
n+l 2 11 2 d2(xn
(2.22) j x ii Ixn (xn)
t(x ) t(x )
n 2
S11 Y112
By triangle inequality
Ii x xl II x y l + I y xI
S2 I x yll
Since this holds for all y E P including Tp(xn)
II x xl 2d(x', P)
*
Substituting x in 2.22
In+1 *11 2 l n x* 2 d2(")
lx x ] n x x I d()
t(x ) t(x )
n 2 d2(xn) 1
= II x x I [1 ,n. 2 2]
n 2
2 nn
< n *; 1 d (x ) 1
Sn, F n. 2 n 1
cn ) tc x ) d x )
or I x. 1 :x x 1 x ]
1 B
where A = inf B = dup I t(xn")
n
n d(: P) n
Oeccttli has show..n that A > 0 and E i. finice. From these c'Dser'.acticn.,
the result follows if we take
6 11 ,4 2,"
SB
In the ne:x. chapter we show that the Generalized llerzl;,akov
procedure is equivalent to the Oettli procedure in as far as the
sequence of possible moves made by che tw.'o procedures are concerned.
Souime of the Hore Riecc'nt Algori thlns
A limictaion of subgradient opcimization methods is the absence
of a benchmark co compare the computational efficiency of difference
algorithms. Algorithms for unconstrained optimization of smooth
convex functions have the reliable method of steepest descent
(Cauchy procedure) as a yardstick against which other algorithms
are compared. If they are better than the Cauchy procedure, they
usually deserve further consideration. In subgradient optimization,
the Cauchy procedure may not work. An example where the Cauchy
procedure is ineffective is reproduced below from [37]:
The function considered is
max {3 x + 2y, 2x + 10y)
z
8
8 >
FIGURE 19
The contours of this function are sketched in Figure 19. One
can calculate that if the Cauchy algorithm is started at z, the
sequence converges to the origin (0,0) which is not the minimum
of the function.
Some of the recent subgradient procedures can be divided into
two categories. In the first category are those that claim to be
at least as good as some other subgradient procedures. These
algorithms claim their superiority based on empirical (.computational)
results. In the second category are those that claim to be reasonably
effective for both differentiable and nondifferntiable convex
functions and claim that the algorithm is superior when applied to
quadratic and convex differentiable functions.
As an example of the former we present the method of Camerini,
Fratta and Maffioli 15], and as an example of the latter we present
Wolfe's 138] method of conjugate subgradients for minimizing non
differentiable functions. Wolfe's algorithm closely resembles
Lemarchel's method 128].
Camerini, Fratta and laffioli. Here one adopts the following
iterative scheme for minimizing f('). Let
0
x = 0 and
n+1 n n n .
x = x t s where s is a modified "gradient"
n
direction defined as
sn = 0 for n = 0 and
sn = f (xn) + n1
n
n n n
with f'(x ) a subgradient of f at x and 0 a suitable scalar. s
n
is equivalent to a weighted sum of preceding gradient directions and
is used as an antijamming device. The following lemmas and
theorems develop the essential elements of Camerini, Fratta and
Maffioli method.
(2.23) Lemma
n
Let x and x be such that
f(x ) f(xn), then
0 f(xn) f(x ) f'(x) (x x )
This lemma states that the negative of the subgradient makes an
S n *
acute angle with the vector (x x ) where x corresponds to an
optimal solution. See Figure 20.
Sn
/ x
f'(xn)
f' (xn)
FIGURE 20
(2.24) Lemma
If for all n
0 < t f(x) f(x*) and 3 0, then
n 11 sn 12 n
0 5 f' (x) (x xn) < s (x x)
n
Lemma 2.24 extends the subgradient property of Lemma 2.23 to s
n
s also makes an acute angle with (x x ).
(2.25) Theorem
ni n
Yn s f (x ) if s f(xn) < 0
Letn I "n 112
0 otherwise
with 0 = = 2: then
n
n' n n'' n
(x x ) s (x x f (xn)
ls" l = Ilf,(xn) ll
n
Theorem 2.25 shows that a proper choice of ensures that s is at
n
least as good a direction as f'(xn) in the sense that s makes
more of an acute angle with (x x ) compared to f'(x ). If
f'(xn) and sn1 make an acute angle we take sn = f'(xn). Only
if f'(xn) and sn make an obtuse angle do we take
s = (f (n) + s")
n
Figure 21 illustrates the above theorem.
f' ( n)=I
ii1
I,
Sn1
S
ni
x
FIGURE 21
(2.26) Theorem
Ix* xn+ll < j x* xnl
Proof:
nH
2f' (xn) (x* xn by Lemma 2.24
2 (x* x by Lemma 2.25
Hence
t il sn 112 + 2sn'(x* xn) < 0
or t sn2 + 2t s (x* x") < 0
n n
adding II x* x n2 to both sides gives
1Ix* xn+l < I x* xI I/
Theorem 2.26 shows that the sequence is Fej6rmonotone with respect
to an optimum point x*. Using arguments similar to those used
earlier in relaxation methods, it can be shown that the sequence
{xn) converges to x*.
Camerini et al., use heauristic arguments to show that the
best value for y is 1.5. They then claim superiority of their
method over that of Held and Karp [18] and Polyak [35] based on
computational results.
Wolfe. In Wolfe's method [38] we generate a sequence of points
{xn} of directions {dn}, of subgradients {f'(xn)} and scalars {tn}
such that for n = 0, 1, 2, ...
t > 0, f'(xn) E 3f(xn)
n
n+1 n n
x = x + td and
n
f(xn+l) < f(xn).
At step n we have a bundle of vectors G which is formed by a set of
n
rules. G consists of f'(x ), possibly dn and a (possibly empty)
n
string f'(x ), f'(x ), ... of limited length.
We set d = N G where N G is a unique point in the convex
rn rn
hull of G having minimum norm. Demjanov 110] has shown that
Vf(x) = Nr f(x) where Vf(x) is the gradient of f at x when f is
smooth at x and 3f(x) is the subdifferential of f at x. As is
typically the case, the stepsize, t is determined by an approximate
n
minimization of f(:< + td ) for t 0. At some step, d = ; G
n r n
will be small, and in addition the tlst p + 1 points (for some
interger p) are all close together. Thlen :' w. ill be our appro:imrate
soluc i ln.
For e:xample consider the illustration in Wolfe's paper
f(::,v = ma:x [f I, f:., f, where
t f
S = . f = .: + v, t = ," 2"
1 3
sketched in Figure 22. A subgradient of f at an',' (::,') is chosen to
be '7f. where i is the least inde:.: for which f(:,',') = f.. If we start
i 1
with the point (6,3), we move in the direction 'f(6,3) = (1,1) co
the point (3,0). Using our prescription for choosing che direction,
the oni:, available direction at (3,0) is still (1,1). We must
ctake a small scep in tliac direction an':,.a,' even tclouPg it is not a
descent direction. hien the new subgradient will be (1,2) forming
the bundle G6, = [(1,1), (1,2)}, .'e find the direction C, = (1,0).
r
y
10
r\ NA8
[1 J \ x
FIGURE 22
The next step in this direction takes us to the neighborhood
of (0,0). At the next iteration we add (1,0) to the bundle. Then
N = 0. We "reset" and start all over again at this point. We
r3
accept the last point reached as a solution if N G = 0 and the
rn
last "p" points are "close enough."
For differentiable convex functions, the method is an extension
of the method of conjugate gradients and terminates for quadratic
functions.
CHAPTER 3
THEORETICAL RESULTS
Comparison ofGeneralized Merzlyakov Method and
Some Typical Subgradient Methods
The equivalence of relaxation methods and minimization methods
has been noted by several researchers. However, these two approaches
are only mathematically equivalent and this equivalence has not been
generally made explicit. On the other hand computationally relaxa
tion methods are much simpler. It has been shown [24] thac _
generalized form of Merzlyakov's method subsumes most of the typical
subgradient methods. In particular it was shown that the geneuralized
Merzlyakov method can generate any move that is made by thc ucettli
method. In this section we draw heavily from the results of the
above paper [24].
Generalized Merzlyakov Method
The Merzlyakov method is generalized [24] so as to facilitate
its comparison with some of the subgradient methods. This extension
greatly increases the flexibility of the original method. The
following procedure allows the A.s to be chosen based on the current
1
point rather than being constant for a subcavity. For an:, ,l.,lere
V(x) # 0, let
0 if a'x + b > 0
A.(x) =
S0 if a'x + b. < 0
i 1 
where 0 < E A.(x) 8, A.(x) > 0 for some i CV(x) and at each n
1 1
there is a j C V(xn) such that
.(xn)(a'.xn + b.)
3.1 I J 90
a' ,xn + bj,
where j* indexes a most violated constraint and 0 > 0.
For A C (0, 2) and x initially specified, stop at iteration n
if x E P. Otherwise, let
n+ A. (xn)(a'xn + b.)(E A.(x )ai)
n+l n i 1 1 1 1
3.2 x = x 
(E A. i(xn)a.)' ( .(xn)a.)
We continue to assume H aill = 1 for all i.
In the next two theorems, we show that the generalized
Merzlyakov method converges with a linear rate of convergence to P.
3.3 Theorem
Let z C P, and {xn} obtained by the generalized Merzlyakov
method. Then
x n+l 1 < 11 n 11
llx+1 zll < x z
Proof:
2(xnz)' A Z A.(xn)(a'xn + b.)E X.(xn)a.
I n+l i ll n 2 1 1 1 1 1
1 1 1 1
 x Z11=[x n nn
12(Z A.(xn)(a'xn + b.))2
+ 1 1 1
(E A.(xn)ai)'(. .i(xn)a.)
The last two terms can be confined Lo get
AX A .(xn)(a'xn + b.)
S 1 n [2X A.(xn)a'(x z)
(X A.(xn)a.)'( n [.(xn)a) 1 1
A(E A.(xn)(a'xn + b.))].
1 1 1
Now a'xn + b. 0 whenever A.(xn) > 0 and a'xn + b. < 0 for some
1 i I i 1
A.(xn) > 0 and thus the first bracketed term is positive.
1
Since z C P and a'z > b., then
1 1
E A.(xn)a'z 5 Z A.(xn)b.
1 1 1 1
Thus the second bracketed term is less than or equal to
(2 A) A.(xn)(a'x" + b.)
1 1 1
and this term is strictly negative.
Thus
Sxn1 zll < i x n zl Ii/
Thus we have established that the sequence is Fejermonotone
with respect to P. We next show that the procedure gives points
which converge linearly to P. Let
d(xn, P) = inf II x z .
ZEP
3.4 Theorem
Either the generalized Merzlyakov procedure terminates with a
point of P or it generates a sequence which converges to a point
x* E P.
Furthermore, the convergence rate is linear.
n n n
Proof: Assume x V P and z is the unique closest point of P to x .
Then
2 n+1 p) n+l n n 2
d (x P) [ x z .
S In+1 n 2
After expanding x z 112 we get
,n n 2
A(2A)[Z A.(x )a'x + b.)]
2 n+l, P)  n n 2 i i
d (xn P) = x z  
( ( A.x )ai)'( A. (xn)a.)
1 1 1 1
A(2A)[Z A. (xn)(axn + b.)]2
d2(xn p) 1 i
= dA x P) A(xn)a
(E A.(xn)a.)'t( A.(xn)a.)
1 1 1 1
By assumption
IJ A.(xn)(a'xn + b.)]2 [A.(xn)(a'xn + b.)]2
i 1 1 1 1
( 62 (a'xn + b. )2 for some i E V(xn)
From Hoffman [21] we have that there is a p > 0
pd(xn, P) < a'. x b..
Thus
d (x P) = d2(xn, P) 22(x, P)
(E .(xn)a.)' (E (xn)a.)
1 1 1 1
Finally since E A.(xn) < B and I a.ll= 1 for all i, we have
(E Ai(xn)ai)' ( .(xn)a.) 82
And we get
d2(xn+, P) = II A(2)e2 /2/ 2]d2(xn, P) y2d2(xn, P)
Then 0 y < 1 and d(xn, P) < nd(x0, ).
Case 1: If the sequence does not terminate lim d(xn, P) = 0 and
nKo
Motzkin and Shoenberg [31] have shown that the sequence converges to
a point x* in the boundary of P for 0 < A < 2.
Case 2: If the sequence terminates
x n+x zjl < xn zil for all z E P (Theorem 3.3).
Also Ilx* znl < I xn z nl where zn is the closest point to P
from xn
Th us
nn n
< 21 xn z"II = 2d(x", P)
2ynd(x P)
where x* is the last point in the terminating sequence. ///
We can further generalize the Merzlyakov method and Theorems 3.3 and
3.4 by replacing A with A where 0 < b. < A < b < 2 for all n. In
n 1 = n = 2
the remainder of the discussion we usually mean A = A for all n but
n
do not have to restrict ourselves in this manner.
Further Developments of Oettli Method
Oettli's minimization method was discussed in the last chapter.
We wish to show the equivalence of Generalized Merzlyakov method and
Oettli's method in the next section but before that we need to develop
some properties of Oettli's method to facilitate this comparison.
Again most of the theorems and proofs in this section are taken from
the paper [24] referred to earlier.
We represent the subdifferential of a convex function f at x by
f(x)and a subgradient of f at x by f'(x) i.e., f'(x) E af(x).
To recapitulate the Oettli procedure, we form the sequence {xn}
as follows:
3.6
where
3. 7
n+l n d(xn)t(xn)
x =x
t(xn)'t(xn)
n n
t(x ) is any subgradient of d(') at x
Lemma
For any norm p(') on Rm
ap(x) = ap(0) n {p'(x): p(x) = p'(x
)'x)
Proof:
We will show
pp(x) =
Conversely if the
that if p'(x) p(x)then the following ralation holds
{p'(x): p(y) p'(x)'y for all y} n
{p'(x): p(x) = p'(x)'x}
above relation holds, we will show p'(x) E ap(x).
By the definition of a subgradient p'(x) E ap(x) if
p(y) > p(x) + p'(x)'(y x) for all y.
Also
p(Ax) p(x) > p'(x)'(Ax x) for all scalars X.
Since p(') is a norm
p(Ax) = A(lp(x)
For X > 1 p(x) > p'(x)'x
0 < A < 1 p(x) < p'(x)'x
Thus p(x) = p'(x)'x.
From the subgradient inequality
p(y) p(x) + p'(x)'(y x) by substituting for p(x) we get
p(y) p'(x)'y for all y. Hence
3p(x) = {p'(x): p(y) p'(x)'y for all y} n
{p'(x): p(x) = p'(x)'x}.
Conversely, if p(y) > p' (x)'y, for all y and p(x) = p'(x)'x, we get
by subtraction
p(y) p(x) p'(x)'y p(x) = p'(x)(y x)
and p'(x) E Dp(x).
Finally
pp(0) = {p'(0): p(y) > p'(0)'y, for all y} n Rm
= {p'(0): p(y) p'(0)'y, for all y). V/
To find the subgradients of d() in terms of subgradients of the
composite function p(+ (x)) we use the following chain rule given by
Oettli.
3.8 Lemma
Let p(') be an isotone norm and p'(Z (z)) 0, then
3.9 d'(z) = (z) p'( +(z))
Proof:
+ + 0 > 0 + '
e (z) A (z ) = (z z ) jl Cz )
multiplying both sides by p'( +(x)) where p'(Z (x)) = 0
(Z+(z) Z+(z0))p', +(z 0)) = (z z )' +'(z0 p'(+(z0))
Using the definition of subgradient for p'( +(z0))
p(Z+(z)) p(C+(z0)) 0 (+(z) Z+(z0 )'p'(Z+(z0)
( (z zO)'+' (z0)p( +(z0))
d(z) d(z0) (z z )+'(z p' (+z))
But d(z) d(z0) 2 (z z )'d'(z )
hence d'(z ) = +' (z )p, +(z0 ).
The subgradient of (x) may be found using the following well
1
known.. result
3.10 t.+ (x) = {Aa.: = 0 if a'x + b. > 0
1 1 1 1
A = 1 if a'x + b. < 0 and
1 1
A [0,1] if a'x + b. = 0}
I 1
The subdifferential of Z (x) can in turn be found from
3.11 a +(x) = {(hi, ... h ): h. E S (x)}
1 m i l
Generalized Merzlyakov method and Oettli method
With the results developed so far in the previous sec ions we
are now in a position to develop an expression for equation 3.6 in
terms of the parameters of the generalized Merzlyakov method.
3.12 Theorem
Let x c Rk, x V P and t(x) be a subgradient of d(') ic :w formJd
by using the chain rule of equation 3.9. Then
d(x)t(x) [ (a'x + b.)]EA.a.
d(x)t(x) 1 1 1 1
t(x)'t(x) (~A.a.)'( .a.)
1 1 1 1
for some X.'s where A. O, = 0 if a'x + b > 0, and EA. > 0.
1 i i i i i
Proof:
By equation 3.9 we have
t(x) = e+ (x)p'( (x)) for some e (x) e S +(x)
and p'(Z+(x)) ap( +(x)).
By equations 3.10 and 3.11
z (x) = (nlal' r2 2' .. n a ) with
0 if a'x + b. > 0
1 1
= 1 if a'x + b. < 0
I 1 1
S[10,1] if a'x + b. = 0
1 1
Also from the proof of Lemma 3.7
d(x) = p'(Z+(x))'+ (x) where
n, (a'x + b.)
z1
Z+(x) =
n (a'x + b )
m m m
Let
h p' ( (x)). In Oettli's method we need h 0. Substituting
d(x)t(x)
these last few expressions into (x)t(x)gives
t(x)'t(x)
d(x)t(x) [EX. (a'x + b.)]EA.a.
d(x)t(x)1 1 1 1 1
t(x) 't(x) (EA.a.) (EA.a.)
with A. = n.h. 0. We also have EA.(a'x + b.) = p( +(x)) > 0 since
1 11 1 1 1
x V P and EA. > 0.
We have thus established a close similarity between the two
methods. We have however yet to establish the two other requirements
viz EA. =< and that condition 3.1 is satisfied. First we note that
1
since the subdifferential of p(') is compact EA. is bounded above.
1
Hence the condition LA. B is satisfied. Now to show that condition
1
3.1 is satisfied, we see on the lines of the proof of Theorem 3.12
that
d(x) = p(Z+(x)) = p'( +(x))' +(x) E h'(A+(x))
= ET.h.(a'x + b.) EA.(x)(a'.x + b.)
1 1 1 1 1 1 1
where A.(x) = .h. > 0.
1 1 1
Hence
E. (x)(a.x + b.) = p( +(x))
1 1 1
and m max X.(x)(a'x b.) 2 p(C (x))
1 1 1 
i
Also Z (x) 2 Z.,(x)ej, where j* is the most violated
constraint and e. is a unit vector with a 1 in position j*. Since
p(') is monotonic
S+ (x)ej)
p(Z (x)) p(A (x)e)
= (a,x + bj.)p(ej.)
Thus
m max A.(x)(a'x b.) 2 (a' x + b. )p(e.,)
i 1 1 1 J
2 (a',x + bj ) min p(e.)
Hence
main P (e)
A,(x)(a'x + b ) m p(e
i i > i a
= 0 > 0
(a',x + bj,) =
where i E V(x) maximizes the left hand side.
If we allow X = 1 in the generalized Merzlyakov method,we find
from the above analysis that any move made by the Oettli procedure
can also be made by the generalized Merzlyakov method. However, the
reverse is not true. This can be seen if we let A # 1 and take the
case where x violates only one constraint. Thus the Oettli procedure
is strictly subsumed by the generalized Merzlyakov method. Compu
tationally the relaxation method is very simple to implement since
the directions of movement can be easily computed. Also it is a
very flexible procedure since all that is required to change the
direction of movement and the step length is to change the weights
X.(x). On the other hand the Oettli method requires a subgradient
to be computed at each point which is not a trivial calculation.
In an earlier paragraph we indicated that the
Merzlyakov procedure could be further generalized by allowing A to
vary with each iteration. However this would require the additional
condition 0 < b Xn A b2 < 2. Oettli has recently generalized his
method [33] to incorporate the above feature. His generalized
method requires that An (0, 2) for all n with the additional
stipulation
CA (2 X ) = + .
n n
A second aspect in which the two methods differ is that in the
generalized Merzlyakov method we could use a different set of X.'s at
1
each iteration. This could be introduced into the Oettli method by
considering different norms at each iteration.
Comparison With Some Other Subgradient Nethods
In this section we show the versatility of the generalized
Merzlyakov method by showing that some of the other typical subgradient
procedures are strictly subsumed by the generalized Merzlyakov method.
We consider the methods of Polyak (134], 135]), Eremin [12] and Held,
Wolfe and Crowder [19]. These subgradient methods are capable of
tackling more general functions, however we compare them when the
objective is merely to find a point of a polyhedron P. There are a
number of ways of defining the function f(') such that when we
minimize it we get x* P. A typical choice is
3.13 f(xn) = max (a'xn + b.).
1 i
i=l,...,m
We will show that with f thus defined the subgradient methods are
subsumed by the generalized Merzlyakov method. The above subgradient
methods can be collectively described by the following sequence:
n 
n+l n [f(x) f] n
x = x 2 f(x )
S n f'(xn)T2
where f is an estimate of f(x*) the optimum value of f. If f(x )
assumes its value for a unique i in the relation (3.13) then the
subgradient at x f'(xn) = a.. If there are ties, f'(x ) can be
1
taken to be a convex combination of the maximizing a.s. Let 0(x )
represent the set of indices i E {l, ..., m} for which
f(xn) = max (a'.xn b.). Then
i=l,... ,m
f'(xn) = ZA.(xn)a. with EA.(x ) = 1
1 1 1
A.(xn) 0 and A.(xn) = 0 for i i 0(xn).
1 1
Hence
n 
nl n ((x) f)
n+1 n \n A n,
x = x f'(x )
i f' (xn) 2 (x
(EA.i(xn)(a'xn + b ) f)(E.(xn)a.)
1 1x 1
n (EA.i(xn)a.)' ( i. (xn)a.)
with A.(xn) 0, A.i(xn) = 1 and A.(xn) = 0 for i V 0(xn)
1 1 1
This can be considered equivalent to finding a point of P where
P = {x: a.x + b. f ? 0, i = 1, ..., m}.
1 1
If f 0, the procedure gives a point of P. Thus if P J 0 the
subgradient procedures mentioned in this section are special cases
of the generalized Merzlyakov method.
An Algorithm Using Relaxation Method
Motivation
We have seen that with relaxation parameter of A = 1 we can get
the best convergence rate for a fixed i (see Figure 14). Also a
higher p leads to better convergence. We wish to combine both these
features in our algorithm. We also derive motivation from Polyak's
[34] and Oettli's procedure 132]. In this section we give a procedure
for finding a point of an arbitrary polyhedron P and later apply it
to the specific problem of solving large linear programs.
Given a polyhedron P
P = {x E Rk: a'.x + b. = 0, i C I}.
1 1
We assume P J 0 and I j 0 and finite. Goffin [15] has shown that when
P is full dimensioned finite convergence is assured for a range of
relaxation parameters A depending on the obtuseness index of the
polyhedron. When P is not full dimensioned,we can partition it to
subsets M and C, such that M is full dimensioned, and then devise an
iterative procedure relative to these subsets so as to take advantage
of the full dimensionality. Even though finite convergence is not
assured, we can hope to get a better convergence rate with a proper
choice of the relaxation parameters. There is considerable flexibility
in the manner M and C are chosen. One important consideration is that
it should be easy to project on to C. The algorithm is in two phases.
In Phase I we use an iterative scheme on the sets
M and C which drives us close to P. We do this by starting with an
arbitrary point in C. We then move in the direction of M. This
movement could be a projection onto M, or a reflection or under or
over relaxation, depending on the value of the relaxation parameter
A. From the new point thus obtained we project on to C. We show
that the sequence thus constructed converges to P with a linear rate
of convergence. In our algorithm however, this iterative procedure
is continued merely to get close to P. In this special case when
k
C = R Phase I becomes the relaxation procedure of Agmon.
Once we get close to P we switch to Phase II. In Phase II we
consider the constraints of the set P and apply the
Generalized Merzlyakov method. The motivation for use of the
Generalized Merzlyakov method at this stage is the fact that the
set of violated constraints as we approach P are precisely the set
of violated constraints at the limit point and under these
circumstances, the Generalized Merzlyakov method can be very
effective.
We first describe the procedure for finding a point of an
arbitrary polyhedron P.
The Algorithm
Phase I.
Let z CP
If z E P stop, otherwise define
n+l } T )
z = TC {zn + Ad (zn)a.i } TC(sn)
n
where 0 < A < 2 and a. corresponds to the most violated halfspace
1
n
of M at z and
3.14 sn = zn + Ad (zn)a
n
We will show that the sequence {zn) is Fejermonotone with
respect to P.
3.15 Lemma [Goffin]
k
Let x, y, t be points of R and let
x' = x + A(t x) where A E R. Then
I x' y 12 = Ix y 2 2 A) It x 2 + 2A(ty)'(tx)
Proof:
2
I x' yll = x + A(t x) yi 2
= 1 x y1 2 + 12(x y) + A(t x)]' A(t x)
= Ix y 2 A(2 A) t xII 2 + A[2(x y) + 2(tx)]'(tx)
= x y 2 A(2 )I t xI 2 + 2A(t y)'(t x). i//
3.16 Lemma
sn ul2 = izn u2 AdM(zn)[(2 )d(zn) 2a' (tn u)]
n
for u E M.
Proof:
n n n
Substitute s for x z for x and t for t, where
t = z + d (z )a Then
n
1I s ul 2 = 11 z u12 A( (2 ) i )t zn12
+ 2A(tn u)'(tn z)
22n
= z u 2 A(2 A)d (zn) + 2Ad (zn)al (tn u)
9 1
= I zn u AdM(Zn)I(2 A)d:(zn) 2ai (t" u)].
n
//
3.17 Theorem
Let M be a nonempty polyhedron of Rk. Then the sequence of
points generated by equation (3.14) has the following property:
Ss" ull 1 z" ull for all u E M.
Proof:
If u E M, u H Hence a' (tn u) 1 0 for all u M. By
1 1
n n
lemma 3.16
I sn ui I z ull .
3.18 Theorem
k
Let P be a nonempty polyhedron of Rk. Then the sequence of
points {zn} generated by equation (3.14) is Fejermonotone with
respect to P, in fact
I zn+ ull < 1n ull for all u P.
Proof:
Let u c P and let b be the unique point obtained by projiccring
n n+l
u onto the line formed by extending s z (Figure 23). The angle
n
formed by s b, u is a right angle and for the right triangle
n
s b, u.
n 2 2 n 2
I s all 2 + II a u 2 = 1 s u[ 2
I n n+l 2 + n+l al2 + 2j su n+l 1 nl+ _
I s z 1 +  z al + 2 s z I[ 1 z a 
+ a ul 2 = sn ul
or
n n+1 2 n+l 2 n n+1 n + I
s + u + 2 js I i
= sn u
or
n+l u 2 <11n 2
 z ull <  s ull
n n+l
The strict inequality arises from the fact that s and z are
distinct or else z E P. But by Theorem (3.17)
II n ul 11 zn uI hence
Szn+l ull < I zn uI I7/7
In the next theorem we show that the sequence {zn} converges
linearly to P.
FIGURE 23
3.19 Theorem
The sequence of points {zn) generated by the above procedure
converges finitely or infintely to a point of P. Furthermore the
convergence rate is linear.
Proof:
Izn+ T (zn+12 = I zn+ u 2 for a unique u E P (the
n+l
closest point of z to P). Thus
I1z n ul 2 = TC{zn + AdM(z )a } TC(u) 2
n
S1 zn u2 Ad (zn)[2a' (znu)Ad (zn)]
n
since the projection map satisfies the Lipschitz condition with
c = 1. Continuing
n+1 2 n 2 n n n
z u z u12 Ad (z )[2d (Z ) Ad (Z )]
since a' (zn u) = d ,(zn) hence
i M
n
Iz n+l ul 2 < d2(zn, P)[l A(2 A)"2] by Lemma 2.12
= 2d2 (z P).
We get
d(z n+, P) < ed(zn, P) where
0 = [1 A(2 A),*2]1/2
and 0 E [0, 1).
Hence
d(zn, P) < 9nd(z0, P) and if the sequence is infinite
lim d(zn,P) = 0
n+o
Using Theorem [2.4] and by the continuity of the distance function
this implies {z n converges to z* in the boundary of P.
If the sequence terminates at z*, then
Iz* Tp(zn)I l< Tz(zn)jT = d(zn, P)
by Fejermonotonicity of the sequence. Hence z* and zn both belong
to the ball B (Tp(zn)), and
d(zn, P)
d(zn
1I z* zn < 2d(zn, P) < 2Ond(0, P). 
Phase II. When the last k points are within E of each other,
we switch to the Generalized Merzlyakov procedure. This is because
if we are near the limit point, the only active constraints would
be the tight constraints at the limit point of the sequence. In
such a situation the Generalized Merzlyakov method can be very
effective with a proper choice of A and A.(x). We recapitulate
that in Merzlyakov's procedure (which is a special case of
Generalized Merzlyakov method) the sequence is generated as follows:
A n (j, V)(a'x" + b.)][A(j, V)ai]
n+l n 1 1
x =x 
[x.,(j, V)ai] [Ei(j, V)ai]
n
where x e S
If we assume that we know the set of constraints satisfied
with equality by Tp(xn), we can generate the halfspace
H* = {x Rn:(Tp(xn) x") (x Tp(xn)) > 0}
which gives convergence to Tp(xn) in exactly one step with X = 1.
Suppose we number the set of active constraints at T (xn) by
I* = {1, ..., p}. Let A* E (al,..., a )' be the p x k matrix
P
formed by interior normals of the halfspaces H. for iC I*. Let
1
b* = (b ..., bp)'. Define
(A, ...,A )' = (AA*')(A*xn + b*).
With this set of X.(j, V)'s using Merzlyakov method, Goffin [15]
has shown that convergence to T (x ) is obtained in one step.
p
However, this requires a comparably larger amount of computation
and finding (A*A*')1 is not really practical on large problems.
In this study we have therefore concentrated on the Generalized
Merzlyakov Method and attempted to find what would constitute a
favorable set for A and X.(x).
Application to Solving Large Linear Programs
We now consider how the procedure could be used to solve
large linear programs.
Consider the LP
max c'x
s.t. Ax b
x_ 0
and its dual
min T'b
s.t. A'T c
0.
By stong duality, an optimal I and x satisfy
b'T c'x < 0. Let
C = ((): b'T c'x 0}
M= {(x): Ax = b, A'T : c, x 2 0, T 0}
and P = M n C.
TIis is only one of the ways of partitioning the cons. L '1 ilL of
P. There is in fact great flexibility in the choice of C and II.
The advantage of splitting C and M is indicated above is
(a) M has a special structure and can be decomposed along T
and x. This leads to saving in storage as well as easier arithmetic.
(b) C contains only the coupling constraints and is easy to
project to.
However, there are other ways of obtaining P. Another
choice could be to let C have only the nonnegativity constraints.
Again the advantage of such a construction would be ease of pro
jection to C. Still another choice is for C to take the form
C = {(x): b'n c'x 5 0, x 2 0, Tr 2 0}.
There is user discrimination in this algorithm in the choice
of M and C, 1 and Z in Phase I, and A and weights A.(x) in Phase
II. We have already commented on the possible strategies for
selecting Ii and C. We give below strategies for selecting the
other parameters. We will first discuss the parameters for Phase I.
(1) A: If 0 < A 5 1 convergence may be slow if some of the
"angles" of M are very acute (small obtuseness index of P). In
this case increasing A to near 2 will have the effect of opening
the angles and accelerating the convergence. It therefore appears
to be a good strategy to have A approximately equal to 2 in Phase I.
(2) Z : In line with the general observation for relaxation
methods that anya priori information on the location of the solution
could be used to advantage, if we have to solve a linear program
with only slightly varying data, we could Lake advantage of this
feature. However, in general when we do not have any such prior
information, the choice of Z0 could be dictated by the structure of
C, so that we can start with a feasible point of C.
Now for the parameters of Phase II
(1) A: As a general guideline 1 < A <2 may be chosen as the
parameter for Phase II as was done in Phase I. However, Table 3
and Figure 24 show the sequence obtained with A = 1.5 and h = .75
and illustrate that in some cases choice of 0 < A < 1 may give
better results.
(2) A.(j, V): In Merzlyakov method the choice of weights
X.(j, V) is crucial and in a large measure dictates the convergence
1
of the sequence. It may be noted that Ai(j, V)'s determine both
the direction as well as the step size. Suppose a., i C K
represent the set of violated constraints in subcavity S where
xn SV. We can solve the quadratic problem of finding the point
of minimum norm in convex hull of the finite point set a., iE K
to get the weights X.(j, V). Such a direction locally has a great
deal of merit. However the computational effort is too great to
merit its consideration, since at each iteration we have to solve
a quadratic program. Also it may not always lead to the best set
of weights as shown by the counterexample given in the next
paragraph. Instead we specify X (j, V) =) , where KI represents
the number of constraints violated at x When IKj = 2 the
direction obtained coincides with the vector of minimum norm.
The strategy suggested by us is not the best in all cases as
shown by the following counterexample:
Consider x > 1
x2 = 2
1 0
x =(
Case 1: A = .75
TABLE 3
A.(j, V) = 1 for all i E V.
1
n El.(j, V) [ZA. (j, V)a. ]'
x 1 A.(j, V)a 1
n n n xn i [7%i (j
x x2 (a'x + b.) 11 IE(j, V)a.]
1 21 1 1
287
0 0.000 0.000 2.873 2'87 1
.958
188
1 .619 2.064 1.838 (' ) .037
.037
188
2 7.622 .686 .459 ( 88 .037
.037
099
3 9.371 .342 .273 ( '99 1
.995
188
4 9.351 .138 .125 ( 37 .037
.037
188
5 9.827 .044 .032 ( ) .037
.037
.099
6 9.949 .020 .010 ( 99) 1
.995
7 9.948 .013 .008 ( 18 .037
.037
287
8 9.978 .007 .002 ( 58) 1
958
Case 2: A = 1.5 A.(j, V) = 1 for all i E V.
1
n EAi (j, V) [Z .(j, V)a.]' *
i .(j, V)a., V)ai
1_1 2 i 1 1
287
0 0.000 0.000 2.873 ( 8 1
.958
.099
1 1.237 4.128 3.235 ( 9) 1
.995
2 .757 .700 3.326 ( 28 1
.958
099
3 2.189 4.079 3.280 ( 99) 1
.995
287
4 1.702 .816 3.166 ( 8) 1
.958
099
5 3.065 3.733 3.022 99) 1
.995
287
6 2.616 .777 2.866 ( 58 1
.958
099
7 3.850 3.347 2.710 ( '995)
.995)
8 3.447 .704 2.558 287
.958
The
problem considered is
P = {(x2): .099x1 +
.287x1 
.995x2 + .995 > 0,
.958x2 2.873 2 0, x 1 0, x2 0}.
4
2 x
x
8
6 x
x
A = 1.5
S= 5
,' '
FIGURE E L'
From the proof of Theorem 3.4 we have
2 nl (2 A)[EL.(xn)(a'xn + b)]2
2 n+1 2 n i i
d (x P) d(x, P) ( X (
(C^i(xn)ai) C Ci(xn)ai)
Thus to get the best set of k.'s we may solve the following
1
optimization problem
(A.(a'xn b.))2
1 1 1
max = k
I EA.a.l 2
I 1 1 2
,*
Suppose A* solves the above, then i also solves the above.
1 l n Ea.H
S1 1i
Hence an equivalent problem is
max (EX.(a'.x b.))
1 1
s.t. I EA.ai = 1.
In our specific problem
max A1 + 2A2
2 2
s.t. A + = 1
1 2
or max jl A22 + 2X2'
1 2
This gives iA = and .
1 a 2 J57
This set ofA.s gives
1
2 1
x =( ) which is feasible with k = 5.
1
Had we specified A, = A i.e., A. =
1 2 ' i ,
2
x =3 which is not feasible and k = 4.5.
Thus our prescription does not lead to the best set of Ais, it is
however a good heuristic and easy to implement.
(3) A.(x). In Generalized Merzlyakov method, the weights A.(x)
3 J
are determined by the point alone and are not fixed for a subcavity.
The following appears to be intuitively a good set of weights since
the weightage is dependent on the magnitude of violation of a
constraint at the specific point
.() = (violation of constraint j at x
j total violation at x
Computational Results
Die algorithm described in the previous section was coded
primarily to come up i Jith a favorable set of '.alues for the
arbitrary parameters \ in Phase I and and the weights \.(x) for
J
Phase II. It may' be stated at the outset that selection of these
parameters is an art rather than a science and it uould be hard to
specify the best set of paramccers for the general .:ase.
The test problem considered uas the problem on "LeaseBuy
Planning Decisions" on pages 93 to 114 of Salkin and Saha's
"Studies in Linear Programming." The problem has 17 conscaints
and 20 variables. Together uich dual and nonnegati'.'icy, set H
thus had 74 constraints. Set C consisted of the constraint obtained
by using strong duality. This problem has a unique solution. The
constraints of the problem are reproduced in A.ppendi:.:
Test results corroboraLe tlie gncrai l ,biiJd:lines for selection of
cliesc p[ rame Lcrs indicated in the ;rc'. ious secLiol. '1ile di L.ince
from 3 point of I' 1wliere P = 11 ii C, ,was calculatcJd 3C ench iteration.
Tlie number of iterations required to reduce the distance from P bY'
3 certain amount was used as the criterion of improvement. Two
different starting points were selected. Results of computations
are summarized below:
Phase I
A. The relaxation parameter X was varied in the interval (0,2).
For the first starting point, the number of iterations required to
reduce the distance d from P from d = 74 to d = 60, was noted, and
for the second point the number iterations required to reduce the
distance from d = 135 to d = 132 was noted. These results are
summarized in Tables 4 and 5 respectively. The results are in line
with our expectation based on intuition that a higher A has the effect
of accelerating convergence.
Phase II
The Phase II relaxation parameter was varied in the interval (0,2)
and the following alternatives were tested for the weights A.(x):
J
1
(a)(x) # of violated constraints at x
(b)X (x) violation of constraint j at x
W total violation at x
(c). (x) violationn of constraint j at x)2
3 total violation at x
(d) .(x) =(violation of constraint j at x 3
j total violation at x
4
(e).(x) violation of constraint j at x)
j total violation at x
() violation of constraint j at x)
)if ( total of violation at x = .3
= violation of constraint i at x
j( = total violation at x
S.2 (violation of constraint j at a x
total violation at x
f .2 5< (v violation of constraint j at x
I total violation at x
J total violation at x
then
)
.3 then
)
TABLE 4
Initial d = 74
X Final d No of iterations
.25 63.757 > 20,000
.5 60.0 12,955
.75 59.999 7,339
1.0 60.0 4,866
1.25 59.997 3,469
1.5 59.997 2,565
1.75 59.996 1,927
1.95 59.997 1,525
TABLE 5
Initial d = 135.
A Final d No. of iterations
.25 133.113 > 20,000
.5 132.0 12,698
.75 132.0 6,692
1.0 132.0 4,208
1.25 132.0 2,934
1.5 131.999 2,137
1.75 131.999 1,587
1.95 131.999 1,263
90
if < (violation of constraint j at x .2
total violation at x < 2 then
=3.3*(violation of constraint j at x)
total violation at x
if violationn of constraint j at x <1 then
if (x)  :  = 3.13then
total violation at x
.(x) = violation of constraint j at x)
total violation at x
(g) f ( violation of constraint j at .3 then
total violation at x
= i00"(violation of constraint j at x
j total violation at x
if" 2 < (violation of constraint j at) < .3
if (x2 = 1 0(   :  ) .3 t e
total violation at x
S= I0(violation of constraint j at x
total violation at x
if 1 1 < (violation of constraint j at x) <.2 tien
if 2 < (    ) < .32 then
total violation at x
A.(x) = .3violation of constraint j at x
j total violation at x
if (violation of constraint j at ) < then
if (  1 <. ( ) < .1 then
total violation at x
.) = (violation of constraint j at x
j total violation at x
Results of alternative (a) are given in Tables 6 and 7 for
the same two points used in Phase I.
From Tables 6 and 7 it would appear that again A of around 2
gives best convergence, but the counterexample in the previous
section shows that this is not always the best policy with this set
of weights.
Results of alternative (b) are given in Table 8.
TABLE 6
(a) X (x) = 1
j # of violated constraints at x
Initial d = 74
A Final d No. of iterations
.25  time limit exceeded
.5  time limit exceeded
.75 60.0 14,839
1.0 60.0 8,028
1.25 59.998 4,702
1.5 59.998 2,861
1.75 60.0 1,424
1.95 59.857 850
(a) X.(x) =
J
TABLE 7
1
# of violated constraints at x
Initial d = 135
X Final d No. of iterations
.25  time limit exceeded
.5 133.45 >20,000
.75 132.0 15,249
1.0 131.999 7,568
1.25 131.999 4,135
1.5 131.999 2,435
1.75 131.998 1,212
1.95 132.0 713
TABLE 8
(b) .(x) violation of constraint j at x
j total violation at x
Intiial d = 74
X Final d No. of iterations
.25 59.999 1,253
.5 59.964 1,396
.75 59.997 1,505
1.0 60.0 1,084
1.25 59.999 1,254
1.5 59.976 1,061
1.75 59.997 863
1.95 59.998 894

Full Text 
PAGE 1
ITERATIVE METHODS FOR SOLVING LARGESCALE LINEAR PROGRAMS By Gollakota Surya Kumar A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 19 79
PAGE 2
ACKNOWLEDGEMENTS I wish to express my deepest gratitude to Professor Gary Koehler, my dissertation chairman, for the excellent guidance, the constant encouragement and the support he has provided me during my career at the University of Florida. Despite his busy schedule he was always available at a moment's notice. I also wish to thank Professor Antal Majthay whose depth of knowledge not only in the area of operations research but on a wide range of subjects will always remain aweinspiring. I am also grateful to Professors Ira Horowitz and Henry Tosi for providing financial assistance during my stay at the University of Florida. I also wish to thank Ms. Kathy J. Combs for her superb typing of the manuscript. Last but not the least this work would not have been possible but for the patience, understanding and encouragement of my wife Rekha. 11
PAGE 3
TABLE OF CONTENTS PAGE ACKNOlin^EDCeIENTS ii ABSTRACT v CHAPTER ONE THE PROBLEI1: WHY ITERATIVE TECHNIQUES? 1 Tl'JO BACKGROUND RESULTS AND TOOLS 8 Introduction , , , 8 Motivation , , 8 Task , , 11 Definitions , , , 12 Typical Relaxation Method 13 Procedure Â• . , 13 Variations . , 14 Convergence 14 Fejermonotone sequence 14 Motzkin and Schoenberg 16 Relaxation Procedures 18 Agmon 18 Motzkin and Schoenberg, Eaves 21 Goff in 22 Definitions 22 Another proof of Agmon's result 33 Range of A for finite convergence 37 Merzlyakov 39 Definitions and motivation 39 Method 40 Generalized Merzlyakov procedure 44 Typical Subgradient Method , 46 Procedure 46 Variations , 47 Some Subgradient Procedures 48 I'olynk 4y Oettli 51 Some of the more recent algorithms 54 Camerini, Fratta and Maffioli 56 Wolfe , eg 11 X
PAGE 4
CHAPTER PAGE THREE THEORETICAL RESULTS 62 Coraparison of Generalized Merzlyakov Method and Some Typical Subgradient Methods 62 Generalized Merzlyakov method , 62 Further developments of Oettli method 66 Generalized Merzlyakov method and Oettli method 68 Comparison with some other subgradient methods , 71 An Algorithm Using Relaxation Method 73 Motivation 73 The algorithm 7A Phase I m Phase II 79 Application to solving large linear problems 80 Computational results 86 Concluding Remarks 98 APPENDIX 99 BIBLIOGRAPHY 101 BIOGRAPHICAL SKETCH 105 IV
PAGE 5
Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy ITERATIVE METHODS FOR SOLVING LARGESCALE LINEAR PROGRAMS By Gollakota Surya Kumar August 1979 Chairman: Gary J. Koehler Major Department: Management Tlie simplex method for solving linear programs has achieved its overwhelming success over the years due to the advances during the last three decades on extending the effectiveness of the basic algorithm introduced by Dantzig in 1951. However the largest problem that can be economically solved by this method is dictated by the size of the basis. The limitation being the ability to maintain a sparse yet accurate representation of the basis inverse. Commercial computer codes using the recent developments of the simplex method are capable of solving mathematical program systems of tlie order of 10,000 rows. However, larger linear programs do arise in practice. llie reason why they are not being set up is because none of the production LP codes can solve such large problems in a timely and cost effective fashion. For such large
PAGE 6
problems iterative tecliniques that do not require a basis may be an attractive alternative. We review two classes of iterative techniques Â— relaxation methods and subgradient optimisation. Solving a linear program can be shown to be identical to finding a point of a polyhedron P, the polyhedron being formed from primal and dual constraints and an additional constraint using strong duality. In relaxation methods we successively find points of relaxed forms P c p'^_ That is, at iteration n we find an x c P such that P <= P , We assume P ^ 0. The problem is solved when the sequence converges to a point X* Â£ P. Subgradient methods are an alternative class of iterative techniques for solving these problems. In subgradient methods we minimize a convex though not necessarily dif ferentiable function ^ / \ , , Â• Â• , ri+1 n n n , n Â„ . f(*) by the iterative scheme x = x t u , where t > is a a scalar obtained by the approximate minimization of f(x t u ), and u is a subgradient of f at x . VJhen we use the minimization method for finding a point of a polyhedron P, the function f is so defined that if x* is a minimum of f tlien x''' G P. These iterative techniques work with original data; thus advantages of supersparsity can be fully realized and the program run in core. Also they are self correcting and knowledge of a vector close to the solution can be used to advantage. We sliow that a generalized form of Merzlyakov's relaxation procedure subsumes most of the recent subgradient procedures when the objective is to find a point of a polyhedron. A new algorithm
PAGE 7
using relaxation method is presented. Tliis algorithm v;as coded to explore values for the relaxation parameters used in the algorithm.
PAGE 8
CHAPTER 1 THE PROBLEIl: WHY ITERi\TIVE TECHNIQUES? Mathematical programming has achieved its present popularity in a large measure due to the success of the simplex method for solving linear programs published by Dantzig [6 J in 1951. Over the years considerable effort has been expended towards exploiting and extending the effectiveness of this procedure. We will briefly outline some of the improvements. Most practical linear programs have a sparse matrix [2] and it is necessary to exploit this sparseness to reduce memory and time requirements and to maintain accurate information. The first such procedure was the revisedsimplex procedure. This procedure operates only on original problem data and requires an inverse of the current basis. If there are m constraints and n variables (including slacks, surplus and artificials) then the regular simplex procedure requires (1.1) (m + l)(n + 1) storage locations. lliere are m + 1 rows (including the objective and n + 1 columns (including the righthand side). In the revisedsimplex procedure we need (1.2) p(ra + l)(n +1) + m^ where p is the problem density. (1.1) is much larger than (1.2) if p is close to zero.
PAGE 9
It was also discovered that if the inverse matrix were stored in product form [29] an additional decrease in storage requirements could be realized. However, in product form, the basis inverse grows in size at each iteration so that eventually we must either run out of room or scrap the current form of the basis inverse and reinvert the current basis. In practice this happens quite often. Using product form inversion, the storage requirements are (1.3) p(m + l)(n + 1) + qmr where q is the average density of the eta vectors needed in the product form inverse and r is the average number of eta vectors. Notice, though, that to gain the decrease in (1.3) over (1.2) we must spend time in tlie reinversion step. It was soon realized that there are many ways of forming a product form of a basis matrix. For instance, the matrix could first be decomposed into the product of two triangular matrices. The eta vectors for these are sparse and easily formed. In addition, within a class of procedures for viewing the original basis, one can choose the rowcolumn pivot order to minimize the creation of new members Â— i.e. one can choose the order in forming the eta vectors to minimize the density of the resulting basis inverse representation [20]. A related issue has to do with updating the currect basis inverse with a new eta vector. One can perform this operation with a minimal buildup in storage requirements [14]. Kalan [22] pointed out that of the p(m + l)(n + 1) elements of a typical LP problem, most are identical (usually most are + Is).
PAGE 10
Kalan suggested storing only the unique numbers of an LP problem and using pointers to store the problem structure. In so doing one stores fewer elements than exist in the problem. The new sparseness is termed supersparsity . Kalan noted that not only can original data be stored this way but also new data (formed in tlie basis inverse) can be stored in tliis fashion. Another useful idea Â— scaling of data Â— has numerical stability for its motivation. Generally for real problems the coefficients of matrix may take on a wide range of values. ITiis can cause numerical difficulties Â— for example in inversion Â— since the computer must round off fractions. To reduce the effect of this problem, the matrix entries are scaled before starting the problem. There are a number of ways in which to obtain a starting basis: from scratch using an "artificial basis," using a CRASHed basis forcing some of the variables into the basis, or based on information about the problem. The last approach is especially useful in solving a series of problems with only minor variations in data. Candidate selection can play an important part in the efficiency of an LP algorithm. There are a number of ways to select the entering variable Â— "multiple pricing" is one of them which is especially useful for large problems. In this approach we typically divide the variables into segments and pick tlie segments sequentially. In the chosen segment we select the g most negative reduced cost variables as candidates for pivoting. We concentrate on these g
PAGE 11
4 nonbasic variables for the remainder of some h iterations. And then repeat the procedure with the next segment. A more recent development concerning pivot selection is due to Paula Harris [17]. The reduced costs here are weighted prior to selection. The weights are chosen with respect to a fixed set of nonbasic variables. The idea is that in the traditional most negative reduced cost approach we are trying to find out the best candidate but on a local basis Â— locally we move in the direction of steepest ascent of the objective function. Such a selection may not serve global interests. By constantly adjusting the pricing by weights derived from a fixed reference point, the global interests of the problem are hopefully maintained. Experimental results corroborate this expectation and show it to be a very effective procedure. When there is degeneracy in a linear programming problem, there exists the possibility of a set of basic feasible solutions occurring repeatedly Â— called "cycling." Bland [3] has introduced several new methods to avoid cycling Â— the simplest of which is to select as entering variable one with the smallest subscript among all candidates to enter basis. If there is a tie for leaving variable, select the one with the smallest basic subscript. Parallel with efforts to increase the size of problems that can be tackled by the simplex method have been efforts to increase its capabilities for problems having special structure. llie computing effort using the simplex method depends primarily on the number of rows; so it is worthwhile considering the possibility of
PAGE 12
taking care of some constraints without increasing tlie size of the basis. Generalized Upper Bounding (GUB) technique introduced by Dantzig and Van Slyke [9] achieves this for contraints of the form (l.A) Zx. = b. with the restriction that the same variable does not occur in more than one GUB constraint. Decomposition technique is another method of extending the capability of problem solving by breaking it into smaller subprobleiiLS and using a master program iteratively to tie them together. Unfortunately the convergence of this method is rather slow and in addition one is unable to compute bounds on the value of the objective function at each iteration, unless each of the subproblems is bounded. Despite the far reaching advances made during the last three decades in extending the capability of the simplex method, some of which have been mentioned above, the basis inverse still remains the limiting factor dictating the size of the problem that could be tackled. Even when we use variants like GUB and decomposition techniques, the size of the basis is still the bottleneck. In the case of GUB it is the basis of the master problem consisting of other than GUB constraints and with decomposition it is the size of the basis of the largest of subproblems. Thus this basis inverse is the heart of all that is bad in simplex method related techniques of solving an LP. Commercial computer codes using above developments of simplex method are capable of solving mathematical programming systems of the order of 10,000 rows. However, useful linear programs of such
PAGE 13
6 magnitude do arise in practice for example in food, oil and chemical related industries. The reason why larger ones are not being set up and solved is that no production LP code has been developed to solve such large problems in a timely and cost effective fashion. Tliis is not to say that there are not large linear programming production codes. For example, IBM's MPSX/370, CDC's APEX, Honeywell's LP 6000 X, Techtronic's MPS3, and others are reliable and available at reasonable rates. However, one would not seriously consider using one of these to solve a linear program of 10,000 or more rows on a routine basis unless the problem possessed some nice structure or the results of the problem solution had a high economic value. Dantzig et. al. [8] in their justification for the setting up of a system optimization lab for solving large scale models have noted : Society would benefit greatly if certain total systems can be modeled and successfully solved. For example, crude economic planning models of many developing countries indicate a potential growth rate of 10 to 15% per year. To implement such a growth (aside from political differences) requires a carefully worked out detailed model and the availability of computer programs that can solve the resulting largescale systems. The world is currently faced with difficult problems related to population growth, availability of natural resources, ecological evaluations and control, urban redesign, design of large scale engineering systems (e.g. atomic energy and recycling systems), and the modeling of man's physiological system for the purpose of diagnosis and treatment. These problems are complex, and urgent and can only be solved if viewed as total systems. If not, then only patchwork piecemeal solutions will be developed (as it has been in the past) and the world will continue to be plagued by one crisis after another caused by poor planning techniques.
PAGE 14
Tlie bottleneck in solving such large unstructured LPs is the inability to maintain a sparse yet accurate representation of the basis inverse. When the simplex method is not computationally feasible for such large problems, iterative techniques that do not require a basis inverse may be an attractive alternative. In this dissertation we concentrate on iterative procedures for solving linear programs. VJe discuss iterative techniques under the headings (a) relaxation methods and (b) subgradient methods. Tliese methods are insensitive to the number of constraints. In fact an increase in the number of constraints for the same problem improves the convergence of the relaxation methods. Unlike the simplex method, iterative techniques are selfcorrecting. We always use original data thus advantages of supersparsity [22] can be fully realized and the program run in main memory. Finally knowledge of a vector close to the solution can be used to advantage. This is very useful when a sequence of problems has to be solved with only slightly varying data. As an aside to the main thrust of this dissertation, we note that iterative methods are more attractive for solving certain large Markov decision processes and also for problems having a Leontief Substitution constraint set [23, 25]both of these problems are an important special class of LPs.
PAGE 15
CHAPTER 2 BACKGROUND RESULTS AND TOOLS Introduction Motivation In this chapter we will review various iterative techniques for finding a point of a polyhedron. We are interested in finding a point of a polyhedron because a wide variety of important mathematical programs can be reduced to such a problem. For example, solving a linear program could be reduced to finding a point of a polyhedron using primal and dual constraints along with an additional constraint implying strong duality. To elucidate this further, consider the following primaldual linear programming pair: (2.1) Primal Dual Max c' X Min v'b s.t. Ax^b s.t. v'A^c X >0 V > Here A is m x n; c is n x 1 and b is m x 1 . v' stands for transpose of V. Problem 2.1 can be restated as (2.2) Ax + b ^ v'A c ^ v'b c'x ^ V ^ 0, X ^
PAGE 16
Let 1' = {(^)(^) satisfies 2.2} be the set of solutions to 2.2. (^)C P implies that x solves the Primal problem in 2.1 and v solves the dual problem. Linear programs of the form z* = Max Min (alx + b.) X i=l, . . . ,ra relate directly to the problem of finding a point of a polyhedron P where P= (xe r'^: alx + b. k z*, i = 1, ..., m} Such problems arise, for example, in largescale transportation problems and decomposition problems [2 7]. We will develop such a formulation for the transportation problem. Given k origins with stocks a, > of a single commodity, and m destinations with demands d. > the transportation problem involves J determining a shipping schedule that meets all demand and supply contraints at minimum cost. Let zÂ„. be the amount shipped from Â£ to j c cost of shipping one unit from Â£ to j . Â£j The problem is k m Min Z T. Cn . Zn i=l j=l ^J ^J subject to '^ Zo . = ^o Â» ^ = 1. Â• Â• Â• , !< j=l ^^ ^ k ^ 2p . = d. , j = 1, . . . , m Â£=1 ^J J
PAGE 17
10 ^.Ij'" Without loss of generality, we assume k m Z 3. = Id Â£=1 ^ j=l If m and k are too large to be handled by conventional network methods [4], then the decomposition principle can be used. Let {(z^.: i = 1, ..., n; j = 1, ..., m) : i G I'} be a list of extreme points of k ^ zÂ„=d.,j=l, ...,m Â£=1 ^J ^ Then the transportation problem is equivalent to the master problem km . . Min Z L c Z Zp . A Â£=1 j=l ^J id ^J subject to ra a, Z Z Zp. A = 0, Â£ = 1, ..., k ie I j = l ' Let Z A^ = 1 iel A^ ^ 0. k m b . = Z Z cÂ„ . z. . m Â— 1 _ 1 y 1 j = l The master problem becomes Z iel Min Z b^A^
PAGE 18
11 subject to E a. X =0, J2. = l, ...,k. iel Z X = 1 iÂ£l A^ > The dual to the above problem is Max w subject to k L aj xÂ„ + w g b^ Â£=1 ^ ^ w < b + a ! X 1 where a^ = (a , a . . . , a ) G R This problem is equivalent to Max^ Min r , , , i ^Jk ... ta.x + b . } xeK lel 1 1 Task Two of the methods of finding a point of a polyhedron are the FourierMo tzkin Elimination Method [7] and a Phase I simplex method. The Elimination Method reduces the number of variables by one in each cycle but may greatly increase the number of inequalities in the remaining variables. For larger problems tliis procedure is impractical [26] Â— since the number of inequalities grows at a rapid rate. The Phase I simplex procedure can also be used to find a point of a polyliedron. For very large problems the simplex method breaks down due to a practical limitation on the size of the matrix
PAGE 19
12 that can be inverted as explained in the last chapter. It is for problems of this nature that iterative methods may be an attractive alternative. We discuss iterative techniques under two classes Â— relaxation methods and minimization methods. In relaxation methods, at each iteration we attempt to find a point of a relaxed form P of P. That is at iteration n we find an X e P such that P c P . We look for the property that the sequence of points thus generated is pointwise closer to P. Minimization methods attempt to solve the problem of minimizing a general, maybe nondif f erentiable, convex function f(*). The sequence generated generally has the property that the successive iterates are closer to the level set of solutions in distance. X" is one such point. The value of the objective function need not necessarily improve at each iteration. When we use the minimization method for finding a point of P, the function f is so defined that if x* is a minimum of f then X* e P. We will now review the relaxation methods. As was mentioned 1 r, , . . . n Â„n , ^n . earlier, we find at each iteration n, a point x e P where P is a relaxed form of P. That is P c p . We continue finding points successively in this manner till we can find an x* e P. At least one such point exists by our assumption that P 7^ 0. Definitions We need a lew definitions before some of tlie methods can be described. Let Ji. (x) = alx + b. 1 11
PAGE 20
13 k k where i Â£ I, 1^0 and finite, x e R , a. e R and b. e R. Without loss of generality assume  a.  = 1 for all i, where  Â•  is the Euclidean norm. H. given by H. = {x e R^: I. (x) > 0} 1 1 is the halfspace associated with i C I. P given by P = n 11. = {x e R^: 5,.(x) ^ 0, j Â£ 1} is the polyhedron iel "^ * of interest. E. given by E. = {x C R*^: Â£. (x) = 0} 1 1 is the plane associated with halfspace II. . d(x,ll.) given by d(x, H.) = inf II xz  Z Â£ H. 1 is the Euclidean distance from x Â£ R to halfspace H. . d (x) given by d (x) = max d (x, H . ) P Jiel is the maximal distance from x to P. B (x) given by B^(x) {yÂ£R^: xy ^ r} is the ball with radius r and center x. S (x) given by r S^(x) = {yER^'iJI xy = r} is the k1 dimensional sphere of radius r and center x Typical Relaxation Method Procedure A typical relaxation method for finding a point of P is o k (1) Pick X Â£ R arbitrarily
PAGE 21
u (2) If X e P stop, otherwise let i''' be the most isolated halfspace, i.e ., a! AX b . * > a! X b., for all iel. 1 1=1 1 We obtain the sequence {x } recursively as follows: n+1 n , , n . n n, X =x+A (tx) where t ~ ^n (x ) is the projection of x on E.^^. A is called the relaxation parameter at iteration n. Generally A is constant across n = 1, 2, ... Variations If A =1 for all n, we call the procedure a projection method, and when A = 2 for all n, we call it a reflection method. See Figure 1 for an example of the relaxation procedure. In this example we use A = A = 2, for all n. At xÂ°, E is the plane corresponding to the most violated halfspace. We reflect in C 11 2 to get X . At x we reflect in E to get x . If we continue the procedure we are in P in four iterations. Convergence Fejermonotone sequence We are ultimately interested in finding a point of P or at least finding a sequence of points which converge to a point of P. If the sequence terminates with a point of P (as in Figure 1) we have accomplished our goal. If the sequence does not terminate, then we wish to know whether it converges to a point of P. A well known result concerning "Feje'rmonotone" sequences will be used to shed light on this question.
PAGE 22
15 a^x + b > / L. E, FIGURE 1
PAGE 23
16 n k A sequence of points {x } in R is said to be Fejerraonotone [13, 31] with respect to the set P if . X n+1 , n (i) X f y. and (ii) II X zjl ^ II X z for all z Â£ P and for all n. Motzkin and Schoenberg A well known result in analysis is that if (s } is monotone, then {s } converges if and only if it is bounded. Hence we see that a Fejermonotone sequence always converges in distance. However, we need to relate this convergence to the convergence of sequence {x } . Before we relate a theorem on convergence of Fejermonotone sequence of points, we need a few more definitions. Given a set K c R , the af f ine hull of K, denoted by jU(K) , is M(K) = {x e R*^: X = Z A^x^, x^ e K, ieL ^ A = 1, L finite, A"^ c r} . ieL For example, the affine hull of a point is the point itself and affine hull of a line segment is a line containing the segment. The affine 3 hull of a triangle in R is the plane containing the triangle. A set P is affin e if P = M(P) . k + Let M be affine in R , x a point of M and r e R be a positive real number. Then we define S (M, x) as the spherical surface with axis M, center x and radius r by (2.3) S^(M,x) = x + ((M x>L n {z c R*":  z = r}) Figure 2 shows the construction of a spherical surface. Given a point of affine set M, M x is the translation of M by x. (M x)*is orthogonal to M x. S (0) is a spherical surface with radius r, r '
PAGE 24
17 which intersects (M x)^ at Wo points. Tliese two points are translated back by x to give S (M, x) . (2. A) nieorera (Motzkin and Schoenberg) Let the infinite sequence {x } be fejermonotone with respect to the set P, then Case (a) : If P has an interior, the sequence converges to a point. (M x)L (M x) FIGURE 2
PAGE 25
Case (b): If P does not have an interior, then the sequence can be convergent to a point or the set of its limit points niay lie on a spherical surface with axixA[(P). See Figure 3 for an illustration. In case (a) P has an interior 12 3 * and the sequence x , x , x , x , ... converges to a point x of P. In case (b) P does not have an interior and the Fejermono tone 0123 ,.u . . . p u sequence x , x , x , x , ... results in the two limit points or the sequence lying on a spherical surface with axis M(P) and center y. y being the projection of x on P Relaxation Procedures Agmon We are now in a position to start our survey of relaxation procedures. One of the earliest published methods is the projection k method due to Agmon [Ij. Here we specify an initial vector x e R (chosen arbitrarily) and set A = 1 for all n. At iteration n if X e P stop. Otherwise we find a nev; point x as follows: n+1 n , , n , n n. X =x+A (tx) (a. X + b . ) a. n ^n In ^n = X a. a. =x (a. X +b.v In In) a,In since a. 11 = 1. Here i represents a most violated contraint of II 1 M n n P at X .
PAGE 26
19 Case (a) S^(M(P), y) 13 5 X , X , X , M(P) 2 4 6 X ,X ,X ,X y Case (b) FIGURE 3
PAGE 27
20 This procedure yields a sequence which converges to a point of P. In the Agmon procedure P" = {x: a'. X + b. ^ 0} That is, P is the closed halfspace defined by the constraint i^^ and P c ? . In keeping with our notation, here p" is a relaxed form of P. (2.5) Theorem (Agmon) Let P = n H.,P/(t), Ij^i}) and finite, be the solution set of iel ^ a consistent system of linear inequalities, and let {x"} be the sequence defined by the projection method, that is, A" = 1 for all n. Tlien either the process terminates after a finite number of steps or the sequence (x } converges to a point of P, x , Furthermore, n "" X X g 2d(xÂ°, p) e" where c (0, 1) and depends only on the matrix A, where A = ^ 1 I ra / and m = 1 1 1 . Agmon explicitly proved the convergence for tlie projection method. He also showed that the results could be extended to the case A c (0, 2), where A" = A for all n.
PAGE 28
21 Motzkin and Schoenbera, Eaves Motzkin and Schoenberg 131] (among others) described a reflexion method for finding a point of P. They showed that if P has an interior then the sequence terminates with a point in P after a finite number of steps. Let x ^ P be an arbitrary starting point and generate a sequence as follows: If x" e P stop If X ?! P select a half space such that x ^ H. Let X be obtained by reflecting x in E . . After finitely many iterations, x will fall in P if P has an interior. The general re flee ion function can be expressed as follows; k k For i = 1, ... , m define f.: K Â— * R by 1 (X 2(a.x + b.)a. if x ?! H. Ill 1 X if x C H. Let g , gÂ„, ... be a sequence where g. e {f , ..., f } for j = 1, 2, . J 1 m Define g to be the composite function g"(x) = g (g , (...(g,(x)) ...)) n ni 1 (2.6) Theorem (Motzkin and Schoenberg) Let P ^ and I t^ and finite. If P = n H. has an interior, iEl ^ then for any x , there is an H such that g (x) c P. Eaves 111] extended this result and demonstrated that the reflection procedure has a uniform property. Namely, if x is within
PAGE 29
22 I, 0, a distance 6 of P, then g (x ) will fall in P for some I where * * % ^ % and t depends on o and not on x . (2.8) Theorem (Eaves) Assiime that P = n H has an interior. Let X be the set of id " points within a distance 6 of P, that is, X = {x c R*^: d(x, P) ^ 6} % 1 Tlien there is an Â£ such that g (X) c P . g is thus a piecewise, linear retraction from X to P . Piecewise and linear means that it has finitely many linear pieces. Retraction means g : X ^P is continuous, P c X and g (x) = X for X Â£ P. See Figure 4 for example. Six points chosen arbitrarily within 1" of P all converge in under four iterations . Gof f in Definitions . Coffin's [15] work deserves special mention. He has provided a comprehensive treatment of relaxation methods and presented several useful finiteness properties. We need a few more definitions before presenting Coffin's results. A convex subset F of P is a face of P if X. y e P 'I l> Â— ^ X, ye F. (x, y) \< i ^ ] We denote the set of faces of P by F(P). Figure 5 illustrates F(P) for a given P. Zerodimensional faces are 1, 2, 3. Onedimensional faces are 4, 5, 6. The polytope 7 itself is also a face.
PAGE 30
23 FIGURE 4
PAGE 31
24 Tlie zerodimensional faces of P are called the extreme points of P. Tlie n1 dimensional faces are called facets of P. FIGURE 5 N (x) defined by Ng(x) = {u Â£ R^: u'(z x) < 0, Vz e S} is the normal cone to S at x. N (x) is a nonempty, closed convex o cone. It is nonempty because it contains at least the origin. It is closed and convex because the intersection of closed halpspace is closed and convex. It is a cone because for all u c N (x) , Au Â£ N (x) , A ^ 0. Cg(x) = (Ng(x))^ is the supporting cone to S at x. Figure 6 illustrates these last two definitions. A point X of a set P is a relative interior point designated r.i.(P. ) if it is an interior point of P with respect to the relative
PAGE 32
25 topology induced in M(P) by the Euclidean topology, i.e., xe r.i.(P) if there exists 6 > such that Br(x) n M(P) c p. For example, as 2 2 shown in Figure 7, a line segment in R has no interior in R but has a relative interior. FIGURE 6
PAGE 33
26 FIGURE 7 It can be shown that the normal cone is the same for all relatively interior points of a given face, i.e., Np(x) = Np(F) for any X c r.i.(F) where F is a face of P. Similarly Cp(x) = Cp(F) for any x E r.i.(F). Let T be any set, then T is defined as T = {x e R : X = y, y Â£ T} A closed convex set is said to be smooth enough at a point v of its boundary if Np(y) c Cp(y) A closed convex set is said to be smootlLenou^,_ if it is smooth enough at every boundary point, or equivalently if Np(F) c Cp(F) â€¢FÂ£ F(P) where F(p) stands for the set of faces of P and F(P) cl> is the set F(P) with removed.
PAGE 34
27 Tlie smooth enough property applied to a polyliedron would require all its "corners" to be "smooth enough," carrying the analogy to kdimensional Euclidean space. Some characteristics of smooth enough convex sets are mentioned below: A polyhedral cone C is smooth enough if and P P only if (iff) C "^ C, where C is the polar of C. c''={u e R*^: u'z ^0, Vz e C}. A polytope (bounded polyhedron) is smooth enough iff all its extreme points are smooth eno ugh . A polyhedron is smooth enough iff N (F) <^ C (F) for ail the minimal elements of the poset {F(P) (}>, c} where *= stands for set inclusion. Some illustrations are given in Figure 8. A poset (P, <^) is a set P on which a binary relation Â£ is defined such that it satisfies the following relations; (i) for all X, X _< X (reflexivity) (ii) if X j< y and y Â£ z then x Â£ z (transitivity) (iii) if X _< y and y <_ x then x = y (antisymmetry) Examples of posets are {R, <] , i.e., the reals with the ordinary less than equal to operation, a dictionary with lexicographic ordering, and the power set S, P(S), with set inclusion, r , as the binary operation. Given a unit vector e e R and an angle a Â£ [0,11], the set of vectors which make an angle with e less than or equal to a is called the spherical cone , C (e), with axis e and half aperture angle a.
PAGE 35
28 (a) Not smooth enough at the extreme points of P. (b) Not smooth enough at any point of P. (c) Smooth enough. FIGURE 8
PAGE 36
29 C^(e) = {x Â£ R^: x'e = See Figure 9 for an illustration X II Cos a} C^(e) = c{B^(e)} a = sin a, a Â£ [0, jr) 2 FIGURE 9 Some properties of C (e) are given below: a C (e) is closed for a e [0,11] C^(e) is a convex set for a e [0, 11/2] For a Â£ [0,11/2) if we let v = Sin a, then C^(e) = c{B^(e)} where C(S) = {x Â£ r'': X = Z X\\ x^ Â£ S, A^ > 0, L finite} iÂ£L C(S) is the convex cone hull of S.
PAGE 37
30 Following Goffin [15] we define an obtuseness index u(C) of a closed convex cone C to be the sine of the half aperture angle of the largest spherical cone contained in C. Properties of V(C) are: U(C) = sup {Sin a; C (e) c C, e e B (0)} = sup min(a'. e) eeS^(O) id ^ U(C) > iff C has an interior u(C) = 1 iff C is a halfsp ace . If C (e) is a spherical cone then U(C (e)) = Sin a, a a The obtuseness index of a polyhedron P is defined by U(P) = inf U(Cp(x)) = min U(Cp(F)) xÂ£;3P FÂ£F(P)i}) For a polytope we have U(P) = min U(Cp(F)) FeVertices of P If U(P) = 1/42, then P is smooth enough. However, tliis is not a necessary condition for P to be smooth enough. If P is the positive 3 orthant in R , then P is smooth enough but i(P) = I//3 < I/J2. The distance from a point x to a set S <= r*^ is defined by d(x, S) = inf II X z zES If S is closed, then the infimum is attained, and the set of points of S where it is attained is the set of points closest to x in S Tg(x) = {y e S: II X y= d(x, S)} If a set K is closed and convex, then T (x) reduces to a single point
PAGE 38
31 which will also bo denoted by T (x) . Tliis point is cnllcd the K projection of x into K . Hence T (x) is a retraction from R to K. K Let K be a closed convex set in R . Tlie following important result implies the continuity of the map T^ (2.8) Theorem (Cheney, Goldstein) K The projection map T satisfies the Lipschitz condition (for K. C = 1). That i IS II Tj,(x) Tj^(y)^ x yl See Figure 10 for an illustration; \\M \(y)\\ < x yjl FIGURE 10
PAGE 39
32 (2.9)Tlieoreni (Goffin) Let K be a closed convex set of R and T (x) the unique closest K point to X in K. Then {[Tj^(x) + Nj^(T^(x))J, Tj^(x) e K} is a partition of R^. See Figure 11 for an illustration. FIGURE 11 (.2.10) Le mma Let C c R be a closed convex cone. Then P P T^(C ) c c"^ P where C is the polar of C. An alternative formulation of a spherical surface with axia M, radius d(x, M) and center T^XX) (see 2.3) is contained in the M following le mma.
PAGE 40
33 (2.11) Lemma (Coffin) k k Let M be affine in R and x c R , then Sd(^^j^)(M.Tj^jM) = {y e r'': II X z\\ = y,, vz e m} Finally a well known result on normequivalence is stated below. Here we state the result in terms of distances from x to P witli X i ?. (2.12) Lemma Let P = n II . , P / 0, I j^ 0, then there exists a scalar iÂ£l ^ < y ^ 1 such that for all x c R yd(x, P) ^ dp(x) ^ d(x, P). This result is illustrated in Figure 12. Another Proof of Agmon's Result . ITie following is a different proof for the result originally due to Agmon. We will find this useful in subsequent parts of this material. In this proof we make use of the result that the sequence generated by the relaxation method is Fejermonotone with respect to P. FIGURE 12
PAGE 41
34 (2.13) Tlieorem Let P= n H.,P?^0, I?^0 and finite, then the relaxation id ^ method generates a sequence which converges finitely or infinitely to a point x* Â£ P for any given value of the relaxation parameter A such that X Â£ (0, 2) . Furthermore, we have II x" x^'ll s 2d(x", p) e" where = [1 A(2 A)u^]^^^ A e [0,1) Proof: Assume x ?! P First we will show using Figure 13 that d^x'''^\ P) = d^x", P) A(2 A)dp(x") where d (x ) is the maximal distance from x to P i.e., d (x ) = max d(x, H. ) . iel ^ T, , . , . , n+1 n Â„ , n+1^ From the right triangle x , t , T (x ) we have: ,2, n+1 Â„. j2. n+1 n, , ^2. n ^. ._. d (x , P) = d (x , t ) + d (t , P) (1) From the right triangle x , t , T (x ) we have: ,2, n+1 ,2,n n.,,2,nÂ„, ,Â„, d (x , P) = d (x , t ) + d (t , P) (2) From (1) and (2) ,2, n+1 j2.n Â„, .2, n+1 n, ,2, n n. d (x , P) d (x , P) = d (x , t ) d (x , t ) ,2 n+1 2. n n+1 n, , , . n n. ^ d (x , P) = d (x , P) [d(x , t ) + d(x , t )] r , / n n . , . n+1 n . , [d(x , t ) d(x , t )] = d^x", P) Adp(x'')[(2A)dj,(x")] = d^(x", P) A(2 A)dp(x")
PAGE 42
35 It will be noticed that our proof is independent of Figure 13. The figure only helps to illustrate the ideas. n+I ^^ ^ ::^ ^^ \ \ 1 (x ) = 1 (x ) P P FIGURE 13 By lemma 2.12 there exists a y > such that yd(x, P) ^ d^,(x) ^ d(x, P) hence d^x"^\ P) < d^x", P) y^A(2 ^ A)d^x", P) = d^(x", P)[l A(2 A) y^] (2.14) Let = [1 A(2 A)y^]^^^ 0=0 when A = 1, y = 1. In general 8 e [0, 1). Thus, d^x^+\ p) ^ e^d^x", p) or d(x", P) ^ e^'dCx^, P) Case I: If the sequence does not terminate lim d(x", P) = nÂ— KÂ» Since {x } is Fejermonotone and dim P = n, then {x } converges to X e 8P by Theorem 2.4 where 3P is the boundary of P.
PAGE 43
36 Case II: If the sequence terminates in a finite number of steps II X z < II x"^ z, ^ z 9 P since the sequence is Fejermono tone with respect to P. Clearly  x* T (x")^  x" T (x), hence and x^ x" both belong to the ball B^^^n^ p)Tp(x") (see Figure 14). Hence, II x" x^ll < 2d(x", P) < 2e"d(xÂ°, p) If the sequence terminates, we can take x* to be the last point of the sequence. /// n+2 X FIGURE 14
PAGE 44
37 In the above procedure, if 1 i A < 2 then P" = (x: a'. X + b,. > 0} That is, P is the closed halfspace defined by the constaint i If < A < 1, then P" = (x: a! X + (1 A)b. Aa ! x" > 0} \ 11n n n 1 11 n .n , , , Â„n and again P <= p . in both cases P is a relaxed form of P and at r . , n Â„n Iteration n we find x e P . Range of A for finite convergence . Using relation 2.14 giving 9 against different values of A and \i, we get the results in Table 1 shown graphically in Figure 15. TABLE 1 p = 1
PAGE 45
38 FIGURE 15
PAGE 46
39 (2.15) Theorem (Coffin) Let P be a polyhedron with an interior. Then the sequence generated by the relaxation method converges finitely to a point of P if (a) P is smooth enough and A Â£ Jl, 2J (b) A c (A*, 2] where 2 y* = ; Â— ; 7^;^ and u(P) is the obtuseness index of P. A 1 + u(P) Furthermore, if A > 2 then the sequence either converges finitely to a point of P or it does not converge. The first part of the theorem shows that if all the corners of P are smooth enough then the range of A over which we can get convergence in a finite number of steps is [1, 2]. The second part of the theorem shows that for a polyhedron that is not smooth enough, the extent by which we can deviate from A = 2 depends on the acuteness of the most restrictive extreme point of P Â— the more acute it is, the less we can relax A from a value of 2. Merzlyakov Definitions and Motivation . The final work on relaxation methods that we review is by Merzlyakov [30]. Merzlyakov 's method takes advantage of the fact that convergence properties of relaxation methods improve if, instead of considering only the most violated constraint, as was done by Agmon, among others, a larger nunÂ±)er of supporting planes are considered. We would like to emphasize this unusual property of
PAGE 47
AO relaxation methods Â— redundancy, in terms of number of constraints, improves the convergence rate. Additional halfspaces have the effect of increasing )i which lowers 6, the convergence ratio. At each x e R let V(x) = {i: a'.x + b. < 0, i = 1, . .. , m} That is, V(x) is the set of indices of constraints violated by x. A cavity C is the set C = {x e R : V(x) = V} where V is a subset of first m natural numbers. A subcavity S """ of C is the subset of all x G C which violate constraint i e V no less thananyother constraint j e V. Ties are broken by the natural ordering. Notice that subcavities along with P finitely partition R . Figure 16 illustrates the above. Associate with each subcavity S a fixed A.(j, V) where if i Â«! V ^,.(J' V) = < > if i e V ^> if i = j . 2 For example for subcavity Sr , (see Figure 16). Xj_(2, {2, 3}) = X^i2, {2, 3}) > A^(2, {2, 3}) > Method . Merzlyakov's relaxation procedure is as follows: Let X e (0, 2) and x specified. If at iteration n, x c P stop. Otherwise, let
PAGE 48
41 (2, 3} '{!', 2, 3} FIGURE 16
PAGE 49
A2 (2.16) x""*"^ = x" A[ZA.(j,V)(a!x"+b.)][ZA.(j,V)a.] [EA.(j,V)a.] [ZA.(j,V)a.] where x e ^w Â• We again make the assumptions that ? ^ and without loss of generality assume a.=l for each i = 1, . . . , m. Also assume without loss of generality that ZA.(j,V) = 1. Goffin showed that if P 7^ then the vector ZA.(i,V)a. cannot be zero. 1 1 Merzlyakov showed that the procedure given above converges to a point of P with a linear rate of convergence. If 1 ^ A < 2 then we can say P" = {x: ZA.(j,V)(a:x + b.) > 0} .1 1 1 ~ 1 and clearly P c p". If < A < 1, then P"" = {x: ZA.(j,V)Ia'.x + (1 A)b. A(a!x")] > 0} .11 11 1 and P c p . Here again P is a relaxed form of P and the sequence obeys our stipulation for relaxation method, i.e., at iteration n we n n n find x e P where P <= p . The relaxation sequence {x } congerges infinitely to a point X c 3P, where 3P is the boundary of P, only if the whole sequence (x } > "jams" into x + N (x ). By moving in a direction that is a convex combination of violated constraints we force the sequence out of tlie jam. For this reason Merzlyakov method can be referred to as an antijamming procedure. Table 2 and Figure 17 illustrate this point. The table and figure are based on the example given in Merzlyakov' s paper.
PAGE 50
43 TABLE 2 Consider the following system of linear inequalities X + lOx + 10 ^ 3x^ lOx^ 30 i Let X = (0, 0). We solve this by (a) Agmon's method (b) Merzlyakov's method where X = 3/4,  a.  = 1 n (a) Agmon's method X =x A (ax +b. )a. ; 1 11 n n n
PAGE 51
TABLE 2 (continued) (b) Merzlyakov Method n+1 n , [ZA.(j,V)(a'.x'' + b.)][ZA.(j,V)a.] X = X A 1 1 1 1 1 [ZA.(j,V)a.]'lLA.(j,V)a.] 44
PAGE 52
45 11 u ex; o M
PAGE 53
46 fO if a!x + b. > A.(x) 4 ^ 1^ if a'x + b . < V. 11 where < ZA.(x) < 3, A.(x) > for some i e V(x) and at each n i there is a j e V(x ) such that A.(x'')(a'.x" + b.) ^ _J i L > a ' X + b . , J* J'' where j ' indexes a most isolated constraint and > 0. The convergence properties of this method are described in the next chapter. Typical Subgradient Method Procedure Let f be a convex but not necessarily dif ferentiable function defined on R . u Â£ R is said to be a subgradient of f at x if it satisfies the following inequality: f(y) > f(x) + u'(y x) for all y Â£ r". The set of subgradients of f at x is called the subdif f erential of f at X and denoted by 9f(x). 9f(x) = {u e R : u is a subgradient of f at x} X is a minimum of f if and only if e 9f(x ). A typical subgradient algorithm works as f ollov%?s : (1) choose X e R arbitrarily (/; compute a subgradient u of f at x , i.e., u C 9L(x ). If u = an optimal point has been found. If not go to (3). fi\ T?J """"l c IT n+l n n n (J; Find X as follows; x = x t u
PAGE 54
47 wliere t is a scalar generally obtained by the approximate rainiraization of f(x" tu") for t ^ 0. Go to (2) with X replacing x llie function f could be an arbitrary convex function. If the objective is to find a point of a polyhedron, one could choose a number of different functions. One choice could be f(x") = Max (alx" b.) i=l, . . . ,m Since f is the maximum of a finite number of convex functions, f is convex. Oettli [32J has given another interesting way to define a function which can be used to find a point of a polyhedron. This is given latter. Variations Reverting back to the general subgradient procedure given by the iterative scheme n+1 n n n X = X t u various proposals [16] have been made for the step size t . The earliest one was by Polyak [34] who suggested Zt = =Â° and lira t = 0. nÂ— >oo The merit of this procedure is that we could choose any arbitrary subgradient u e 8f(x ) and the sequence would converge to some x* with the stipulated stepsize (e.g., with t = n^ Â• However convergence is dismally slow and the method is of .little practical value. Polyak attributes the original idea of this method to Shor [36]. Another idea also due to Polyak [35] is to use
PAGE 55
48 (2.17) ^^ A^IfCx") f] t = II "l2 Â— k with f = f(x ) the optimum value of the function. Polyak has shown that vjith 0 f(x ), i.e., an overestimate of the optimum f(x ). Here again if < b , = X = bÂ„ < 2 the sequence converges at a 1 n f and define S^ = {x e r'^: f(x) i a}
PAGE 56
49 S is convex (it is a level set). Hence there exists an x in int S a a Â•k since a > f (see Figure 18). Choose p > such that the neighborhood B (x) c S . p a Suppose x q' S for all n a f(x) (a) X Â£ R FIGURE 18 f(x) f(x") ^ u'' (x x"), Vx since u is a subgradient at x . Thus f(x") f(x) g u"' (x" x). Also contours of f (b) X e R^. f(x) 1 f(x") ^x e S a
PAGE 57
50 Combining the above u" (x" x) i f(x") f(x) ^ 0, ^x e S a n This holds for a point x + l. e B^(.x)c S , I n ^ nn^n.^pu. Hence u x ^ u (x H ) llu" 1 1 ri II ^ n , n ~, or p u I ^ u (x x) Â„ . II n+1 ~ II II n n ~ m 2 But x x = x u x II ^ ~l2, n2 nn ^ = II x x + II u II 2u (x x) ^ II x"ill 2+ u"22pu" = II x"ill 2+ u"2 2pA^. Choose N such that A = p, Vn = N. This is possible since lim A =0. n n n Â— xÂ» For n = N to n = N + ra 0^ x^'^lx2^ II x^S2 + Ta,(A, 2p) n=N ., _ N+m g II x^ X II 2 p Z A^ n=n For m large enough the right hand side is negative, since ZA diverges Â— a contradiction. Hence x e S . a Let lim a = f . Then we have proved there exists x ^^ e S such that m n ^^ lim f(x 1^) = f '. Computational experience with heuristic subgradient methods has been reported by Held and Karp [18], and Held, Wolfe, and Crowder [19]
PAGE 58
51 Held, Wolfe, and Crowder use underestimates to f(x ). However the stepslze X = A is kept constant for 2m iterations where m is a measure of the problem size. Then both A and nimiber of iterations are halved successively till the number of iterations reaches some value k, \ is then halved every k iteration. This sequence violates the requirement ZA = "Â» and it is possible that the sequence may n converge to a wrong point. However in their computational experience this did not happen. Held, Wolfe and Crowder experimentally found that their procedure was effective for approximating the maximum of a set of piecewise linear concave functions. In their problems (assignment, multicommodity , and maxflow) the number of constraints is enormous, of the order of 100 1 . In all these cases they were able to get linear convergence (which is assured by the choice of stepsize) Held and Karp used overestimates to f(x ) to obtain bounds for a branch and bound procedure for solving the traveling salesman problem. Using this procedure they were able to produce optimal solutions to all traveling salesman problems ranging in size upto 64 cities. Oettli Oettli proposed a subgradient method for finding a point of an arbitrary nonempty polyhedron. We will discuss his less general method which can be used to solve a linear program. As before, using the dual and an additional constraint expressing strong duality, a linear jirogram can bo stated to be the problem of finding an x E 1\ satisfying 5"!'(x) ^0 i = 1, . . ., m
PAGE 59
52 v^rhere t. (x) represents the magnitude by which constaint i is violated at x; i . e . , 5, (x) = max {O, a.x b } i 11 Define Â£ (x) to be the vector formed from violations of different constraints l^(x) = a.'^(x), Â£Â„'^(x), ..., l'^(x)) i Z m Let p(*) be an isotone norm on R , i.e., if = x ^ y then p(x) = p(y). It may be noted that all the usual norms like Euclidean and Tchebycheff are isotone norms. Define d(x) = p(Â£^(x)) Then clearly x satisfies i. (x) ^ iff d(x) = 0. Oettli showed that d(*) is a convex function and then described an iterative scheme using subgradients of d(*) that minimizes d(*) over R . Tliis scheme is quite general since we have a different function for different norms. (2.18) Lemma fOettlil d(*) is a convex function. Proof: Z. (z) is convex for all i. J Thus I (azl + 3z") < aS,"^(z^) + BS"*"(z^) for a, 3 > 0, a+3 = 1. Since p is monotonic p(^^(azl + 3z^) < p[aÂ£'*"(zb + Bl^(z^)] and since p as a norm is convex ^ p[aÂ£"^(zl) + p[3Â£'^(z^)] = apU^Czl)] + 3 p[Ji^(z^)] hence d(az^ + 3z^) ^ ad(z^) + 3d(z^) ///
PAGE 60
53 Let X be given initially. If x Â£ P stop. Othen^;ise, let n+1 n d(x" j n, (2.19) X = x , t(x ) . n ' , n. t(x ) t(x ) where t (x ) is any subgradient of d(') formed according to Oettli's rule at x . Ti\e sequence {x } converges to a point of P with a linear rate of convergence, (2.20) Lemma [Oettlil Let Y Â£ P> then for the Oettli's procedure II n+1 II 2 < II n ,1 2 d2(x") II x yH ^ II X yII ; 1 t(x")' Proof: 11 ri+1 II 2 II n d (x ) , n, n 2 II X yII =Ix ^Â— ^ Â— t(x ) yII t(x")'t(n") = II x"'yII^ ^^^^^ .ml) t(x")'(x" Y) / n. , . n, . n, , . n. ' t(x )' t(x ) t(x )'t(x ) By definition of subgradient d(Y) I d(x") + t(x'')'(Y x") Since d(Y) = therefore ,,n. ^ ,n.i,n . d(x ) $ t(x ) (x Y) Collecting the above results n+1 II 2 , II n II 2 d^(x") II x"^^ yII ^ 5 II x" yII " ^, ^ Â„ . IU\ t(x ) t(x ) (2.21) Theorem [Oettli] c"""^ x" g e II x" x"',0 < < 1 Proof: (2.22) II x"+l yII ' ^ II XyII ' ' ^f ; Â„ t(x ) t(x )
PAGE 61
^ II x"yII By triangle inequality X X < X Y + I Y X 5A ^ 2 II XyII Since this holds for all Y E P including Tp(x ) II x" x'^ll ^ 2d(x^ P) Substituting x in 2.22 1^ / n^ t(x") t(x") 1 i2 / n. n "ll 2 ri d (x ) X X 11 / n N ' / n . II n '' 1 1 2 ' t(x ) t(x ) X X ,"./l2Â„.ii!(Zi ,2. n ] II "+1 '' or X X < X X [1 Â— ]2 11 , ^^\2 t(x') t(x') d"(x , P) 2 1 . rd(x) Â„ ii,n. II where A = mf ; B = sup  t(x )  n d(x , P) n Oettli has shown that A > and B is finite. From these observations, the result follows if we take 2 1 /// =u^n In the next chapter we show that the Generalized Merzlyakov procedure is equivalent to the Oettli procedure in as far as the sequence of possible moves made by the two procedures are concerned. Some of the More Recent Algorithms A limitation of subgradient optimization methods is the absence of a benchmark to compare the computational efficiency of different
PAGE 62
55 algorithms. Algorithms for unconstrained optimization of smooth convex functions have the reliable method of steepest descent (Cauchy procedure). as a yardstick against which other algorithms are compared. If they are better than the Cauchy procedure, they usually deserve further consideration. In subgradient optimization, the Cauchy procedure may not work. An exampJe v;liere the Cauchy procedure is ineffective is reproduced below from [37]: The function considered is max {3 X + 2y, 2x + lOy} FIGUI^ 19 The contours of this function are sketched in Figure 19. One can calculate that if the Cauchy algorithm is started at z, the sequence converges to the origin (0,0) which is not the minimum of the function.
PAGE 63
56 Some of the recent subgradient procedures can be divided into two categories. In the first category are those that claim to be at least as good as some other subgradient procedures. Tliese algorithms claim their superiority based on empirical (computational) results. In the second category are those that claim to be reasonably effective for both dif f erentiable and nondiff erntiable convex functions and claim that the algorithm is superior when applied to quadratic and convex dif f erentiable functions. As an example of the former we present the method of Camerini, Fratta and Maffioli 15], and as an example of the latter we present Wolfe's [38] method of conjugate subgradients for minimizing nondiff erentiable functions. Wolfe's algorithm closely resembles Lemarchel's method 128]. Camerini, Fratta and Maffioli . Here one adopts the following iterative scheme for minimizing f('). Let X = and n+1 n n, n. ,.^.,tr ,. n X = X t s where s is a modified gradient n direction defined as s =0 for n = and n c\ f ^\ , n n1 s =f(x)+3s n with f ' (x ) a subgradient of f at x and 3 a suitable scalar. s n is equivalent to a weighted sura of preceding gradient directions and is used as an antijamming device. The following lemmas and theorems develop the essential elements of Camerini, Fratta and Maffioli method.
PAGE 64
57 (2.23) Lemma * n Let X and x be such that f(x ) ^ f(x ), then S f (x") f (x*) = f ' (x"")' (x* x") This lemma states that the negative of the subgradient makes an acute angle with the vector (x x ) where x corresponds to an optimal solution. See Figure 20. / X f (x") FIGURE 20 f (x") (2.2A) Lemma If for all n < t < 1^^^(^ ) ^^ o > n,, 2 and 3 =0, then n ' A ^ f (x ) (x X ) ^ s (x X )
PAGE 65
58 Lemma 1.1k extends the subgradient property of Lerama 2.23 to s s" also makes an acute angle with (x x ). (2.25) Theorem t * nl,,n, ii,, rY, ^ Â— Lix_ if 3^1 f (x") < Let 6^=1 II s"^ II 2 \^ othervjise with 0=1 =2: then n (x x") s > _ (x X ) f (x ) lls^ll If'Cx") Theorem 2.25 shows that a proper choice of 6 ensures that s is at least as good a direction as f (x ) in the sense that s makes more of an acute angle with (x x ) compared to f ' (x ). If f'(x") and s"""*" make an acute angle we take s^ = f'(x ). Only if f'(x") and s" make an obtuse angle do we take s = (f ' (x ) + 3 s ) n Figure 21 illustrates the above theorem. n1 f (x ) = s n1 FIGURE 21
PAGE 66
59 (2.26) Ttieorem n+1 II < II X*x" Proof: "" ls"l' C^lj s" ^ g (f(x") f(x*)) < 2[f(x") f(xv)] ^ 2f'(x ) (x'< X ) by Lemma 2.24 n ' n S 2s Cx* X ) by Lemma 2.25 t^s"2 + 2s"' (X* x") < or t^s" 2 + 2t s"'(x* x") < n n adding  x* x  to both sides gives Hence n+1 I I . 1 1 , n 1 1 rryj X" X II < II x* X II I/// Theorem 2.26 shows that the sequence is Fejermonotone with respect to an optimum point x*. Using arguments similar to those used earlier in relaxation methods, it can be shown that the sequence r ni ix ; converges to x'^'. Camerini et al . , use heauristic arguments to show that the best value for Y is 1.5. They then claim superiority of their method over that of Held and Karp [18] and Polyak [35] based on computational results. Wolfe . In Wolfe's method [38] we generate a sequence of points {x } of directions {d }, of subgradients {f'(x )} and scalars {t"} such that for n = 0, 1, 2, ...
PAGE 67
60 t ^ 0, f (x"") e 3f(x") n n+1 n , .n , X = X + t a and n ^, n+1, ^ (T/ n, f (x ) < f (x ). At step n we have a bundle of vectors G which is formed by a set of rules. G consists of f ' (x ), possibly d and a (possibly empty) string f'(x ), f'(x ) , . . . of limited length. We set d = N G where N G is a unique point in the convex r n r n hull of G having minimum norm. Demjanov I 10] has shown that Vf (x) = N 8f (x) where Vf(x) is the gradient of f at x when f is smooth at x and 3f(x) is the subdif ferential of f at x. As is typically the case, the stepsize, t , is determined by an approximate minimization of f(x + td ) for t = 0. At some step, d = N G n r n will be small, and in addition the last p + 1 points (for some interger p) are all close together. Tlien x will be our approximate solution. For example consider the illustration in Wolfe's paper f(x,y) = max {f , f , f } whei ;re fj^ = X, f^ = X + y, f^ = x 2y sketched in Figure 22. A subgradient of f at any (x,y) is chosen to be Vf . where i is the least index for which f(x,y) = f.. If we start with the point (6,3), we move in the direction Vf(6,3) = (1,1) to the point (3,0). Using our prescription for choosing the direction, the only available direction at (3,0) is still (1,1). We must take a small step in that direction anyway even thougli it is not a descent direction. Tlien the new subgradient will be (1,2) forming the bundle G = {(1,1), (1,2)}, we find the direction N Q = (1,0).
PAGE 68
61 FIGURE 22 The next step in this direction takes us to the neighborhood of (0,0). At the next iteration we add (1,0) to the bundle. Then N^G^ = 0. We "reset" and start all over again at this point. We accept the last point reached as a solution if N G = and the r n last "p" points are "close enough." For differentiable convex functions, the method is an extension of the method of conjugate gradients and terminates for quadratic functions.
PAGE 69
CHAPTER 3 THEORETICAL RESULTS Comparison of Generalized Merzlyakov Method and Some Typical Subgradient Methods The equivalence of relaxation methods and minimization methods has been noted by several researchers. However, these two approaches are only mathematically equivalent and this equivalence has not been generally made explicit. On the other hand computationally relaxation methods are much simpler. It has been shown [24] that a generalized form of Merzlyakov' s method subsumes most of the typical subgradient methods. In particular it was shown that the generalized Merzlyakov method can generate any move that is made by the Oettli method. In this section we draw heavily from the results of the above paper [24]. Generalized Merzlyakov Method The Merzlyakov method is generalized [24] so as to facilitate its comparison with some of the subgradient methods. This extension greatly increases the flexibility of the original method. The following procedure allows the A.s to be chosen based on the current point rather tlian being constant for a subcavity. For any x where V(x) / 0, let /"o if a^x + b. > A.(x) = ^ "" "" ^ I > if a'.x + b. ^ 62
PAGE 70
63 where < Y. A.(x) = 3, A . (x) > for some i cV(x) and aL each n 1 1 there is a j e V(x ) such that ^.(x")(a'.x" + b.) 3.1 J^ ^ > e a '. . X + b . . J* J* where j* indexes a most violated constraint and > 0. For A e (0, 2) and x initially specified, stop at iteration n if X e P. Otherwise, let XL A.(x")(a:x" + b.)(Z A.(x")a.) T Â„ n+1 n i_ 1 1 1 X 3.2 X = X (E A.(x")a.)'(Z A.(x")a.) 11 11 We continue to assume a. =1 for all i. 1 In the next two theorems, we show that the generalized Merzlyakov method converges with a linear rate of convergence to P. 3. 3 Tlieorera Let z e P, and {x } obtained by the generalized Merzlyakov method. Then II x"*!z < x"z Proof: 2(x"z)' A I A.(x")(a:x" + b.)E A.(x")a. x"^^z=lx"z^(Z A.(x")a.)'(Z A.(x")a.) 11 11 A^(Z A. (x'')(a'.x'' + b.))^ + i ~ (Z A. (x'')a.)'(Z. A.(x")a.) The last two terms can be conAiined to get A Z A.(x'^)(a'.x" + b.) [2Z A.(x")a:(x" z) a A^(x")a^)'(Z A^(x")a^) A(Z A.(x")(a'.x" + b.))] Ill
PAGE 71
YR 64 Now a! X + b. = whenever A.(x ) > and a ! x"^ + b < for some 11 1 11 A.(x ) > and thus the first bracketed term is positive. Since z G P and alz ^ b., then 1 ~ 1 Z A.(x")a'.z ^ E A.(x")b. 11 11 Thus the second bracketed term is less than or equal to (2 A) Z A.(x")(a:x'' + b.) Ill and this term is strictly negative. Thus II "+1 II ^11 n x z ^x 2 Thus we have established that the sequence is Fej ermonotone with respect to P. We next show that the procedure gives points which converge linearly to P. Let d(x^, P) = inf II x" zjl . zeP 3.4 Ilieorem Either the generalized Merzlyakov procedure terminates with a point of P or it generates a sequence which converges to a point X* e P. Furthermore, the convergence rate is linear. Proof: Assume x ?! P and z" is the unique closest point of P to x". Then d^x""\ P) ^ x"+lz"2. After expanding  x" 2*^11 we get (i: A (x'')a.)'(Z A.(x")a.) 11 11 . Â„ A(2A)[Z A.(x")(a:x" + b.)]^ = d^Cx", P) (Z A.(x")a.)'(Z A.(x")a.)
PAGE 72
6 5 By assumption IE A.(x")(a:x" + b.)]^ k [A.(x")(a'.x'^ + b.)]^ ^ e^(a!^x" + b.^^)^ for some i e V(x") From Hoffman [21] we have that there is a p > n r.s < ^, n 111 us pd(x , P) ^ a!..x' b..,. d^x"^\ P) < d2(x^ p)_ A( , 2A)p^9V(x", P) (L A.(x")a.)'(E A.(x")a.) 11 11 Finally since Z A.(x ) ^ g and a.= 1 for all i, we have a A.(x")a.)'(Z A.(x")a.) < B^ And we get d2(x"^\ P) < U A(2A)e2y2/3^]d2(x", P) = y^d^Cx", P) Tlien ^ Y < 1 and d(x", P) i Y"d(xÂ°, P) . Case 1 : If tlie sequence does not terminate lim d(x", P) = and nxÂ» Motzkin and Shoenberg [31] have shown that the sequence converges to a point X* in the boundary of P for < A < 2. Case 2 : If the sequence terminates II x""*" z < II x" zjl for all z e P (Theorem 3.3). Also II X* z II < II X 2 II where z" is the closest point to P from X . Thus x.'c x" ^ II X* z" + z" x" < 2x" z'^ll = 2d(x", P) ^ 2Y"d(xÂ°, P) where x* is the last point in the terminating sequence. ///
PAGE 73
66 We can further generalize the Kerzlyakov method and Tlieorems 3.3 and 3.4 by replacing A with A where < b. < A < b < 2 for all n. In the remainder of the discussion we usually mean A = A for all n but do not have to restrict ourselves in this manner. Further Developments of Oettli Method Oettli's minimization method was discussed in the last chapter. We wish to show the equivalence of Generalized Merzlyakov method and Oettli's method in the next section but before that we need to develop some properties of Oettli's method to facilitate this comparison. Again most of the theorems and proofs in this section are taken from the paper [24] referred to earlier. We represent the subdifferential of a convex function f at x by ' 3f(x)and a subgradient of f at x by f ' (x) i.e., f ' (x) e 3f(x). To recapitulate the Oettli procedure, we form the sequence {x"} as follows: 3.6 ,,"+! ,,n d(x")t(x") t(x")'t(x^) where t(x ) is any subgradient of d(Â«) at x". 3. 7 Lenmia For any norm p(Â«) on RÂ™ ap(x) = 9p(0) n {p'(x): p(x) = p'(x)'x} Proof: We will show that if p' (x) c3p (x) then the following ralation holds 9p(x) = {p'(x): p(y) 2 p'(x)'y for all y} n ^p'(x): p(x) = p'(x)'x} Conversely if the above relation holds, we will show p' (x) e 3p(x).
PAGE 74
67 By the definition of a subgradient p'(x) e 3p(x) if p(y) 2 p(x) + p'(x)'(y x) for all y. Also p(Ax) p(x) > p'(x)'(Ax x) for all scalars A. Since p(*) is a norm p(Ax) = Ap(x) For A > 1 p(x) = p' (x)'x < A < 1 p(x) i p' (x)'x Thus p(x) = p' (x)'x. From the subgradient inequality p(y) ^ p(x) + p'(x)'(y x) by substituting for p(x) we get p(y) > p'(x)'y for all y. Hence 3p(x) = {p'(x): p(y) > p'(x)'y for all y} n {p'(x): p(x) = p'(x)'x}. Conversely, if p(y) > p'(x)'y, for all y and p(x) = p'(x)'x, we get by subtraction p(y) P(x) ^ p'(x)'y p(x) = p'(x)(y x) and p' (x) e dp (x) . Finally 8p(0) = {p'(0): p(y) ^ p'(0)'y, for all y} n r'" {p'(0): p(y) > p'(0)'y, for all y}. Ul To find the subgradients of d(Â«) in terms of subgradients of the composite function p(?, (x)) we use the following chain rule given by Oettli. 3.8 Lemma Let p(*) be an isotone norm and p'C^. (z)) 2 0, then 3.9 d'(z) =Â£"^'(2) p (l^iz))
PAGE 75
68 Proof: l^{z) Â£^(zÂ°) = Cz zÂ°)'il^'czÂ°) multiplying both sides by p ' (Â£ (x)) where p ' (A (x) ) = (^^(z) Â£^(zÂ°))p'(ll^(zÂ°)) I {7. zÂ°).'Ji"*''(zÂ°)p'(Â£^(zÂ°)) Using the definition of subgradient for p'(i2(z )) p(Jl'^(z)) p(Â£'*'(zÂ°)) ^ (!i^(z) Â£"*'(zÂ°))'p'(ii^(zÂ°) ^ (z zÂ°)'Â£^'(zÂ°)p'(il+(zÂ°)) d(z) d(zÂ°) = (z zÂ°)'Â£^' (z'^)p'(Â£"^(zÂ°)). But d(z) d(zÂ°) ^ (z zÂ°)'d'(z'^) hence d'(zÂ°) = Â£^ (zÂ°)p ' (Â£"''(zÂ°) ) . The subgradient of 5.(x) may be found using tlie following well known result 3.10 35,. "'"(x) = {Aa.: A = if a'.x + b. > 1 1 11 A = lifa'. x + b.<0 and 1 1 A c [0,1] if a'.x + b. =0} The subdiff erential of Â£ (x) can in turn be found from 3.11 3i2,"^(x) = {(h,, ..., h ): h. e 3K,"*"(x)} 1 mil Generalized Merzlyakov method and Oettli method With the results developed so far in the previous sections we are now in a position to develop an expression for equation 3.6 in terras of the parameters of the generalized Merzlyakov method. 3.12 Theorem Let X r. K , x ?! P and t(x) be a subgradient of d(*) at x formed by using the chain rule of equation 3.9. Then ,, . , . [ZA. (alx + b.)]ZA.a. d(x) t(x) ^ 1 1 1 11 t(x)'t(x) (ZA.a.)' (EA.a.) 11 11
PAGE 76
69 for some A.'s where A. =0, A. =0ifalx+b.>0, and ZA > 0, 1 11 1 1 i Proof: By equation 3.9 we have t(x) = l^ (x)p'(Â£"*'(x)) for some l^ (x) c 3Â£"'"(x) and p'(Jl"^(x)) c 3p(il"'"(x)). By equations 3.10 and 3.11 +' ^ (x) = (n,a , r]r.a , ..,, n a ) with i i I A mm if a!x + b. > 1 1 n. = < 1 if a'.x + b. < G [0,1] if a! x + b. = Also from the proof of Lemma 3.7 d(x) = p' {I (x))'il (x) where /n^Ca^x + b.) \ l^U) = \ n (a'x + b ) mm m / Let ,+ p' (Â£ (x)). In Oettli's method we need h ^ 0. Substituting ^u 1 c Â• Â• d(x)t(x) these last rew expressions into Â— , Â— rn Â— iÂ—:gives ^ t(x)'t(x) ^ d(x)t(x) t(x)'t(x) [ZA.Ca'.x + b.)]ZA.a. 1 1 1 11 (ZA.a.)'(EA.a.) 11 11 , + , with A. = n.li. ^ 0. We also have ZA.(a!x + b.) = p(Â£ (x)) > since 111 111^ X t ? and ZA. > 0. 1 m We have thus established a close similarity between the two methods. We have however yet to establish the two other requirements viz ZA . ^ 6 and that condition 3.1 is satisfied. First we note that
PAGE 77
70 since the subdif ferential of p(*) is compact ZA. is bounded above. Hence the condition EA. ;^ 6 is satisfied. Now to show that condition 1 3.1 is satisfied, we see on the lines of the proof of Theorem 3.12 that d(x) = p(l^(x)) = p'(Â£'^(x))'Â£"^(x) = h'(Â£^(x)) = i:ri.h.(a'.x + b.) = ZA, (x)(a'.x + b.) Ill 1 1 1 1 where A. (x) = n .h . ^ 0. 1 11 Hence ZA. (x)(a:x + b.) = p(JL'^(x)) 111 and m max A.(x)(a'. x b.) > pCS, (x)) i Also i (x) ^ Â£.,(x)e,. where i* is the most violated constraint and e.^ is a unit vector with a 1 in position j*. Since p(*) is monotonic P(^'^(x)) > P(Â£^.(x)e..) 2^ J* = (a: X + b. .)p(e... ) Thus m max A . (x) (a.'x b.)^(a'x + b ,)p(e ,) _^ 1 1 1 2* J'' j'' 1 (a ! , X + b . . ) min p (e . ) Hence A rx)(a:x+b.) , "^^" P(^) where i e V(x) maximizes the left hand side. If we allow A = 1 in the generalized Merzlyakov method, we find from the above analysis that any move made by the Oettli procedure
PAGE 78
71 can also be made by Che generalized Merzlyakov method. However, the reverse is not true. This can be seen if we let A 7^ 1 and take the case where x violates only one constraint. llius the Oettli procedure is strictly subsumed by the generalized Merzlyakov method. Computationally the relaxation method is very simple to implement since the directions of movement can be easily computed. Also it is a very flexible procedure since all that is required to change the direction of movement and the step length is to change the weights A.(x). On the other hand the Oettli method requires a suberadient 1 TO to be computed at each point which is not a trivial calculation. In an earlier paragraph we indicated that the Merzlyakov procedure could be further generalized by allowing X to vary with each iteration. However this would require tiie additional condition 0
PAGE 79
72 We consider the methods of Polyak ([34], [35]), Eremin [12] and Held, Wolfe and Crowder [19]. These subgradient methods are capable of tackling more general functions, however we compare them when the objective is merely to find a point of a polyhedron P. There are a number of ways of defining the function f(*) such that when we minimize it we get x* e P. A typical choice is 3.13 f(x") = max (a]x^ + b.). 1 1 i=l, . . . ,m We will show that with f thus defined the subgradient methods are subsumed by the generalized Merzlyakov method. The above subgradient methods can be collectively described by the following sequence: n+1 n , [f(x ) f] , , n, ^ = ^ \ f'(x")P ^ ^^ ) where f is an estimate of fCx*) the optimum value of f. If f(x ) assumes its value for a unique i in the relation (3.13) then the subgradient at x , f ' (x ) = a . . If there are ties, f'(x ) can be taken to be a convex combination of the maximizing a.s. Let 0(x ) represent the set of indices i C {l, ..., m} for which fCx"^) = max (a'.x^ b.). Then i=l, . . . ,m f ' (x") = ZA.(x")a. with ZA.Cx"^) = 1 11 1 A_.(x'') I and A.Cx"") = for i 5^ 0(x''). Hence _ ,, 1 (f(x") f) n+1 n An ri , "s X = X 'C (x ) iif(x")r = x A (ZA.(x")(a^x" + b_^) f)(i;A.(x")a.) " (ZA.(x")a.)'(ZA.(x")a.) 1111
PAGE 80
73 with X.(x") > 0, ZA.(x") = 1 and A.(x") = for i ?! 0(x"). This can be considered equivalent to finding a point of P where P= {x: a.x+b. ?^0, i= 1, .... m}. 11 If f ^ 0, the procedure gives a point of P. Thus if P ?^ the subgradient procedures mentioned in this section are special cases of the generalized Merzlyakov method. An Algorithm Using Relaxation Method Motivation We have seen that with relaxation parameter of A = 1 we can get the best convergence rate for a fixed y (see Figure lA). Also a higher y leads to better convergence. We wish to combine both these features in our algorithm. We also derive motivation from Polyak's [34] and Oettli's procedure [32]. In this section we give a procedure for finding a point of an arbitrary polyhedron P and later apply it to the specific problem of solving large linear programs. Given a polyhedron P P = {x e R^: a'.x + b. ^ 0, i c l}. 1 1 We assume P ^ and I f^ and finite. Goffin [15] has shown tliat when P is full dimensioned finite convergence is assured for a range of relaxation parameters A depending on the obtuseness index of the polyhedron. WHien P is not full dimensioned, we can partition it to subsets M and C, such that M is full dimensioned, and then devise an iterative procedure relative to these subsets so as to take advantage of the full dimensionality. Even though finite convergence is not assured, we can hope to get a better convergence rate with a proper choice of the relaxation parameters. There is considerable flexibility
PAGE 81
74 in the manner M and C are chosen. One important consideration is that it should be easy to project on to C. The algorithm is in two phases. In Phase I we use an iterative scheme on the sets M and C which drives us close to P. We do this by starting with an arbitrary point in C. We then move in the direction of M. This movement could be a projection onto M, or a reflection or under or over relaxation, depending on the value of the relaxation parameter X. From the new point thus obtained we project on to C. We show tliat the sequence thus constructed converges to P with a linear rate of convergence. In our algorithm however, this iterative procedure is continued merely to get close to P. In this special case when C = R , Phase I becomes tlie relaxation procedure of Agmon. Once we get close to P we switch to Pliase II. In Phase II we consider the constraints of the set P and apply the Generalized Merzlyakov method. The motivation for use of the Generalized Merzlyakov method at this stage is the fact that the set of violated constraints as we approach P are precisely the set of violated constraints at the limit point and under these circumstances, the Generalized Merzlyakov method can be very effective. We first describe the procedure for finding a point of an arbitrary polyhedron P. The Algorithm Phase I . Let z Â£ CP If z C P stop, otherwise define
PAGE 82
75 n+1 = T^ {z"" + Ad,j(z")a. } = T^(s") where < A < 2 and a. corresponds to the most violated half space n n of M at z and 3.14 n _ n , , n, s :^ z + Ad,,(z )a . M 1 We will show that the sequence { z } is Fejermonotone with respect to P. 3.15 Lemma [GoffinJ Let X, y, t be points of R and let x' = X + A(t x) where A e R. Tlien Proof: X' y X' y y^ AC2 A) lit x ^ + 2A(ty)'(tx) X + A(t x) yl = xy + I2(x y) + A(t x)]' A(t x) = II X y 2 A(2 A) t x ^ + A[2(x y) + 2 (tx) ] ' ( tx) = x y^ A(2 A) t xll 2 + 2A(t y)'(t x) . /7Z1 3.16 Lemma ls"u2 = 1," ul2Ad^(z")[(2A)dj^j(.") 2a! (t" u) ] n for u c M. Proof: Substitute s for x , z for x and t for t, where t" = z" + dj^j(z")a. . Then n 2 II n II 2 , ,Â„ , . II n n, 2 5" ur = II z" ur A(2 A) t z + 2A(t u)' (t z ) z" u ^ A(2 A)d^j(z") + 2Ad^j(z")a^ ( t" u) n .n. n. n z u Adj^(^')I(2 A)dj^,(z") 2a^ (t" u) ] . UÂ± n
PAGE 83
76 3.17 Tlieorem Let M be a nonempty polyhedron of R . Then the sequence of points generated by equation (3.14) has the following property: II s" u ^ II z^ u for all u e M. Proof: If u e M, u e H_. . Hence a^. ( t" u) =0 for all u e M. By lemma 3.16 1 1 n n s" ull ^ II z" ull . E77] 3.18 Theorem Let P be a nonempty polyhedron of R . Then the sequence of points {z } generated by equation (3.14) is Fejermonotone with respect to P, in fact II z""^^ u < II z" ujl for all uC P. Proof: Let u e P and let b be the unique point obtained by projecting u onto the line formed by extending s , z (Figure 23) . The angle formed by s , b, u is a right angle and for the right triangle n , s , b , u. s" a2 + Ha ull 2 = lis" ull 2 or s" z"+l 2 + II z"+l all 2 + 2 s" z"^^I  z"+^ a] + au2 = lis"u2 n n+l, 2 II n+1 ii 2 , Â„ n n n+1 n ,, n+1 s z II + II z u + 2 s / II II z a II '^ l2 = s u or n+1 z ull' < lis"u2.
PAGE 84
77 are The strict inequality arises from the fact that s" and z""^ distinct or else z c P. But by Theorem (3,17) II s u < II z u hence II ^'^^ II ^ II " II TÂ— II z ujl < II z u JljJ In the next theorem we show that the sequence {z"} converges linearly to P. FIGURE 23
PAGE 85
78 3.19 Theorem The sequence of points {z } generated by the above procedure converges finitely or infintely to a point of P. Furthermore the convergence rate is linear. Proof: I I n+1 ^ , n+1. 1 1 2 II n+1 1 1 2 . . r^ / , \\ z. T(z ) =z ~'J f Â°r a unique u e P (the closest point of z to P) . Thus lU'^'u2= T (z^ + Ad rz")a, }T ru) M^ ' i C g II zT" ujl ^ Ad^j(z")[2a (z''u)Adj^(z")] n since the projection map satisfies the Lipschitz condition with c = 1. Continuing II n+1 II 2 ^ II n II 2 u^ < II z"u^Ad rz")[2d,Xz") Ad rz")] since al (z u) = d Â„(z ) hence 1 h n II ^"^^ u ^ < d^Cz"", P)[l A(2 A)y"^] by Lemma 2.12 We get )'d2(z", P). dCz""*""*", P) < ed(z", P) where e = U A(2 X)\i''^]^^^ and Â£ [0, 1). Hence d(z , P) < 9 d(z , P) and if the sequence is infinite lim d(z",P) = nÂ— Kjo Using Theorem [2.4] and by the continuity of the distance function this implies {z } converges to Z'^ in the boundary of P.
PAGE 86
79 If the sequence terminates at z''', then II z>'' Tp(z") < II z" Tp(z") = d(z", P) by Fejermonotonicity of the sequence. Hence z* and z both belong to the ball B (T^Cz")), and d(z", P) nn ^oj/H Â„. ^ Â„Â„n , . z'V z" < 2d(z", P) < 20"d(z^, P). Ul Phase II . When the last k points are within Â£ of each other, we switch to the Generalized Merzlyakov procedure. This is because if we are near the limit point, tlie only active constraints would be the tight constraints at the limit point of the sequence. In such a situation the Generalized Merzlyakov method can be very effective with a proper choice of A and A.(x). We recapitulate that in Merzlyakov's procedure (which is a special case of Generalized Merzlyakov method) the sequence is generated as follows: ^^^ ^ XIEA^.(j, V)(a;.x" + bJ][ZA^.(j, V)aJ X = X [ZA.(j, V)a.] [ZA.(j, V)a.] where x Â£ S^ If we assume that we know the set of constraints satisfied with equality by T (x ), we can generate the halfspace H* = {x e R'':(Tp(x") x")'(x TpCx"")) > 0} which gives convergence to T (x ) in exactly one step with A = 1 Suppose we number the set of active constraints at T (x ) by I* = {l, ..., p}. Let A* E (a.,..., a )' be the p x k matrix formed by interior normals of the halfspaces 11. for ic I*. Let b* = (b^, .... b^)' . Define (A^, ...,A P)' = (A*A*')"^(A*x" +b*).
PAGE 87
80 With this set of A . ( j , V) ' s using Merzlyakov method, Goffin [15] has shown that convergence to T (x ) is obtained in one step. P However, this requires a comparably larger amount of computation and finding (A*A*') is not really practical on large problems. In this study we have therefore concentrated on the Generalized Merzlyakov Method and attempted to find what would constitute a favorable set for X and A . (x) . Application to Solving Large Linear Programs We now consider how the procedure could be used to solve large linear programs. Consider the LP max c ' X s. t. Ax S b X ^ and its dual min TT'b s . t. A'tt = c TT ^ 0. By stong duality, an optimal IT and x satisfy b'TTc'x<0. Let C = {(,'^): b'^ c'x ^ 0} M = {(^): Ax ^ b, A'tt ^ c, x ^ 0, tt > O} and P = M n C. Tliis is only one of the ways of partitioning the constraint of P. There is in fact great flexibility in the choice of C and M. The advantage of splitting C and M is indicated above is
PAGE 88
81 (a) M has a special structure and can be decomposed along TT and X. Tills leads to saving in storage as well as easier arithmetic. (b) C contains only the coupling constraints and is easy to project to. However, there are other ways of obtaining p. Another choice could be to let C have only the nonnegativity constraints. Again the advantage of such a construction would be ease of projection to C. Still another choice is for C to take the form C = {(^): b'TT c'x ^ 0, X ^ 0, 7T ^ 0}. There is user discrimination in this algorithm in the choice of M and C, A and Z in Phase I, and A and weights A.(x) in Phase II. We have already commented on the possible strategies for selecting M and C. We give below strategies for selecting the other parameters. We will first discuss the parameters for Phase I. (1) A^: If < A ^ 1 convergence may be slow if some of the "angles" of M are very acute (small obtuseness index of P) . In this case increasing A to near 2 will have tlie effect of opening the angles and accelerating the convergence. It therefore appears to be a good strategy to have A approximately equal to 2 in Phase I. (2) _Z : In line with the general observation for relaxation methods that any a priori information on the location of the solution could be used to advantage, if we have to solve a linear program witli only sliglitly varying data, wc could take Â£idvantage of tliis feature. However, in general wlien we do not have any sucli prior information, the choice of Z could be dictated by the structure of C, so that we can start with a feasible point of C.
PAGE 89
82 Now for the parameters of Phase II (1) _X: As a general guideline 1 < A <2 may be chosen as the parameter for Phase II as was done in Phase I. However, Table 3 and Figure 2A show the sequence obtained with A = 1.5 and A = . 75 and illustrate that in some cases choice of < A < 1 may give better results. (2) A.(j, V): In Merzlyakov method the choice of weights A.(j, V) is crucial and in a large measure dictates the convergence of the sequence. It may be noted that A . ( j , V) ' s determine both the direction as well as the step size. Suppose a., i c K represent the set of violated constraints in subcavity S where X Z S'}. We can solve the quadratic problem of finding the point of minimum norm in convex hull of the finite point set a., i Â£ K to get the weights A . ( j , V). Such a direction locally has a great deal of merit. However the computational effort is too great to merit its consideration, since at each iteration we have to solve a quadratic program. Also it may not always lead to the best set of weights as shown by the counterexample given in the next paragraph. Instead we specify A . ( j , V) =, where k represents the number of constraints violated at x , When k = 2 the direction obtained coincides with the vector of minimum norm. Tlie strategy suggested by us is not the best in all cases as shovvJii by the following counterexample: Consider x^ 1 > X2 = 2 1 .0.
PAGE 90
83 Case i: A 75 TABLE 3 A.(j, V) 1 for all i Â£ V.
PAGE 91
84 2 X 4 X 6 X A = .75 FIGURE 24
PAGE 92
85 From the proof of Theorem 3.4 we have d2cx"+\ p) s d^x^ p) ^ Â— "^ Â— i Â— (EA.(x")a.)' aA.(x")a,) Thus to get the best set of A.'s we may solve the following optimization problem (ZA.(a!x" b.))^ 1 1 1 _ , max ^ k 2A.a.2 X* Suppose A* solves the above, then i also solves the above. Hence an equivalent problem is max (ZA . (a*. x b . ) ) s.t. IIZA.a.II^ = 1. In our specific problem max A, + 2AÂ„ 2 2 s.t. A + A =1 or ma .JTT A^ + 2X21 This gives A^'' =r" and A'^' = f~ . Tliis set of A. s gives 1 2 1 X = (Â„) which is feasible with k = 5, Had we specified A = A i.e . , A. =jywhich is not feasible and k = 4.5. Tlius our prescription does not lead to the best set of A.s, it is however a good lieuristic and easy to implement.
PAGE 93
86 (3) }i. (^) ' In Generalized Merzlyakov method, the weights A,(x) are determined by the point alone and are not fixed for a subcavity. The following appears to be intuitively a good set of weights since the ivreightage is dependent on the magnitude of violation of a constraint at the specific point , , . , violation of constraint i at x, A . (x) = ( ; :Â— :; : "^ ) J total violation at x Computational Results The algorithm described in the previous section was coded primarily to come up with a favorable set of values for the arbitrary parameters A in Phase I and X and the weights X.(x) for Phase II. It may be stated at the outset that selection of these parameters is an art rather than a science and it would be hard to specify the best set of parameters for the general case. The test problem considered was the problem on "LeaseBuy Planning Decisions" on pages 93 to 114 of Salkin and Saha's "Studies in Linear Programming." The problem has 17 constaints and 20 variables. Together with dual and nonnegativity, set M thus had 74 constraints. Set C consisted of the constraint obtained by using strong duality. This problem has a unique solution. The constraints of the problem are reproduced in Appendix. Test results corroborate the general guidelines for selection of these parameters indicated in the previous section. The distance from a point oC P where P = M n C, was calculated at each iteration. Tlie number of iterations required to reduce the distance from P by a certain amount was used as the criterion of improvement. Two
PAGE 94
different starting points were selected. Results of computations are summarized below; Phase I A_. The relaxation parameter A was varied in the interval (0,2). For the first starting point, the number of iterations required to reduce the distance d from P from d = 74 to d = 60, was noted, and for the second point the number iterations required to reduce the distance from d = 135 to d = 132 was noted. These results are summarized in Tables A and 5 respectively. The results are in line with our expectation based on intuition that a higher A has the effect of accelerating convergence. Phase II llie Phase II relaxation parameter was varied in the interval (0,2) and the following alternatives were tested for the weights A . (x) : (a)A.(x) ^ j // of violated constraints at x , , _ violation of constraint j at x j total violation at x , ,, , , , violation of constraint j at x . 2 (c)A_.Cx) ( ^^^^^ violation at x ^ .,., , . violation of constraint i at x, ! ('^^^j^^) = ^ total violation at x ^ 4 en ,,,,, /Violation of constraint .i at x ^ (e)A.(x) = ( : Â— :; : ) 2 total violation at x ,^. , ^ , violation of constraint j at x. > t ^, (f) If ( ; 7 :Â—, : T ) = Â• 3 th total of violation at x , , , ,Â„, .violation of constraint i at Xv A.(x) = 10^( ; r; : ^ ) J total violation at x .violation of constraint j at x. ^ _ , if .2 < ( . , ^ :: ^ ) < .3 then Â— total violation at x , , , ^ ^ .violation of constraint i at x. A.(x) = 5* ( ; :Â— ; . :: ^ ) J total violation at x
PAGE 95
TABLE 4 Initial d = 74 88 X
PAGE 96
TABLE 5 Initial d = 135, 89 A
PAGE 97
90 , < .violation of constraint i at x. ^ Â« , if Â• 1 = ( . , ^ . ^ ) < Â• 2 then total violation at x , , . _ Â„, violation of constraint j at x. A.(x) = 3.3'=( Â—Â—r Â— 1 ^ ) J total violation at x en . r /Violation of constraint i at x, ^ , , if ( : rÂ— ; : "^ ) < Â• 1 th total violation at x , , , violation of constraint i at Xv A . (x) = ( ; :Â— ; : "^ ) J total violation at x , , ^^ violation of constraint i at x^ > Â„ (g) If ( :; :Â— ; ' ) _ . 3 then ^ total violation at x 1 / \ , /^^ . /Violation of constraint i at x, A.(x) = 100Â«( :; :Â— ; : ^ ) J total violation at x .^ Â„ < /Violation of constraint i at x. ^ if Â• 2 = ( r~r~i Â• 1 ^T^^ ) < 3 then total violation at x , / , ,_, /Violation of constraint ] at x> A.(x) = 10"( :; : Â— :; : ^Â— ) J total violation at x , /Violation of constraint i at x, , Â„ if Â• 1 < ( : r; : ^ ) < . 2 then = total violation at x , / V Â„ .,, /Violation of constraint i at x, A . (x) = 3. 3" ( : Â— ; : ' ) J total violation at x en ., /Violation of constraint i at x. ^ if ( Â— ' :Â—: : ^ ) < .1 th total violation at x , / > /Violation of constraint i at x, A.(x) = ( ; : Â— :, : ^ Â— ) J total violation at x Results of alternative (a) are given in Tables 6 and 7 for the same two points used in Phase I. From Tables 6 and 7 it would appear that again A of around 2 gives best convergence, but the counterexample in the previous section shows ihat this is not alv^/.Â•lys the best policy with this set of weights. Results of alternative (b) are given in Table 8.
PAGE 98
TABLE 6 (a) ^ (x) = . j // of violated constraints at x' Initial d = 74 91 A
PAGE 99
TABLE 7 (a) A.(x) = ., ^ . ^ ^ Â— J // of violated constraints at x" Initial d = 135 92 25 75 1.0 Final d 133.45 132.0 131.999 No. of iterations time limit exceeded >20,000 15,249 7,568 1.25 131.999 4,135 1.5 131.999 2,435 1.75 131.998 1,212 1.95 132.0 713
PAGE 100
93 (b) X .(x) J TABLE 8 violation of constraint j at x total violation at x Intiial d = 74 X
PAGE 101
94 Alternative (b) appears comparatively insensitive to the relaxation parameter X. Results of alternatives (b), (c) , (d) and (e) are tabulated below: TABLE 9 ,, . , / N .violation of constraint i at x. Alternative (b) : A.(x) = ( T~r~i ^~'^ Â— r^~ Â— ~~Z ) J total violation at x . . , / V , violation of constraint j at x 2 Alternative (c) : A.(x) = ( ^ ^ . ^~, Â— TZ ) J total violation at x . , . ^ . . 3 ,,. , / X .violation of constraint i at x. Alternative (d) : A.(x) = ( Â— . . ^ . Â— ) J total violation at x , . , / N .Violation or constraint i at x. Alternative (e) : A . (x) = ( ^ ^ , . , ^ . ;;: ^ ) J total violation at x Initial d = 74 A = . 25 Iteration no. Alternative Alternative Alternative Alternative
PAGE 102
95 Primafacie alternative (c) and (d) looked comparable to alternative (b). So the number of iterations required to reduce d to 60.0 was determined and is given in Table 10. From Table 10 there appears no significant difference between alternative (b), (c), and (d) . Results of alternatives (b), (f), and (g) are shown in Table 11. Since no significant difference in performance was observed, (f) and (g) were not pursued any further. Results of above study show that the following constitute a favorable set of parameters: Phase I. X = 2 Phase II. A = 2 \ f \ violation of constraint j at x j total violation at x However as indicated earlier, there are simple counterexamples showing alternative values of the parameters which give better convergence. Hence the above prescriptions can at best be considered good heuristics. For Phase I of above study, we moved in the direction of most violated constraint of M. Investigations were also conducted combining Phase I and II. Instead of moving in the direction of the most violated constraint of M, we used Generalized Merzlyakov method for relaxation with respect to M. The results were very heartening and are shown in Table 12. In this table a comp;irison has been made with alternative (b) of Phase II.
PAGE 103
96 TABLE 10 Initial d = 74
PAGE 104
TABLE 12 Initial d = 74 Final d = 60 97 A
PAGE 105
98 Concluding Remarks The simplex method of solving linear programs has achieved its present popularity in a large measure due to the advances during the last three decades on extending the effectiveness of the basic algorithm introduced by Dantzig in 1951. There are however some useful largescale linear programs beyond the capability of the simplex method which are not being formulated and solved at present due perhaps to their size. It is for such problems that iterative techniques seem to be a viable alternative. We have surveyed two such techniques Â— relaxation methods and subgradient methods Â— in considerable detail. It has also been shown that most of the recent subgradient methods are mathematically subsumed by the Generalized Merzlyakov method. The latter along with relaxation methods in general are computationally far simpler. In this dissertation another algorithm using relaxation method has been presented which possesses a linear rate of convergence. The new algorithm has been coded and an attempt made to find for it a favorable set of parameters. A variant of the algorithm has been seen experimentally to have considerable merit over the present relaxation methods. Convergence properties of this algorithm are still not attractive enough to be of practical value. However, it is hoped that these could be improved upon and an algorithm found that can economically solve large scale unstructured linear programs.
PAGE 106
APPENDIX The Test Problem min. 1516x, + 1238x^ + 963x^ + 750x, + 450x^ + 1200x^ + lA30x^ 1 2 J 4 5 6 7 + 1300x + 1260JC + 1300x + 870:x + 500x + 1400x + 1360x,, + 2160X,,. + 2325x,, + 1095x,^ + 2720x,14 Ij lb i / io + 1440x^g + SOOx^Q subject to ^6 + ^7 + "18 Â•* ^9 = ^"^ ^4'""i5 ""le ^ ^7 ^^s'^ig = ^"^ "l ^ "6 ^ "7 ^ ^4 ^ ^5 ^ ^6 " \8 " \9 ^ '' "l + "2 " "6 ^ "7 "^ "8 " "9 ' "13 + "14 "^ "l5 "^ "l6 * "l8 = ^Â° x^ + X2 + X3 + x^ + x^ + Xg + x^g + ^ 13 + ^14 + ^15 + ^15 + ^ig = ^^ x^ + x^ + X3 + x^ + x^ + xg + Xg + x^Q + x^^ + x^3 + x^3 + x^^ + x^g > 65 x^ + x^ + X3 + x^ + X3 + x^ + Xg + x^Q + x^^ + x^2 " "13 ^ "15 *"18 ^ "20 =^ ^Â° "16 * "17 = ^2 "18 ^ "19 = ^^ "13 ^ 1Â° "14 + "15 ^ ^5 "16 ^ "20 =< Â° "1 + "2 ^ "6 " "7 " "8 ^ "9 "l4 "15 ^ "17 " "19 ^ 3^ "3 ^ "4 ^ "10 + "11 "13 = ^Â° "5 + "6 ^ "8 ^ "12 + "14 "" "16 "20 = ^^ "10 ^ "11 * "13 Â•* "18 =< ^Â° "7 ^ "9 ^ "12 ^ "17 ^ "20 ^ ^Â° X. >0, 1=1, ...,20 99
PAGE 107
100 Note : Problem abstracted from Case Study on "LeaseBuy Planning Decisions" in Salkin, H. M. , and J, Saha Studies in Linear Programming , 19 75, pp. 93114.
PAGE 108
BIBLIOGRAPHY 1. Agmon, S,, "The Relaxation Method for Linear Inequalities," Canadian Journal of Mathematics , 6, No. 3, 1954, pp. 382392. 2. Beale, E. M. L. , "The Current Algorithmic Scope of Mathematical Programming Systems," Mathematical Programming Study 4 , 1975, pp. 111. 3. Bland, R. G. , "New Finite Pivoting Rules for the Simplex iMethod. , Mathematics of Operations Research , 2, 1977, pp. 103107. 4. Busacker, R. G. and T. L. Saaty, Finite Graphs and Networks , McGrawHill, New York, 1965. 5. Camerini, P. M., L. Fratta, and F. Maffioli, "On Improving Relaxation Methods by Modified Gradient Techniques," Mathematical Programming Study 3 , 1975, pp. 2634. 6. Dantzig, G. B., "Maximization of a Linear Function of Variables Subject to Linear Inequalities," in T. C. Koopraan, ed. Activity Analysis of Production and Allocation , Wiley, New York, 1951, Ch. XXI. 7. Dantzig, G. B., Linear Programming and Extensions , Princeton University Press, Princeton, New Jersey, 1963, pp. 8485. 8. Dantzig, G. B., R. W. Cottle, B. C. Eaves, F. S. Hillier, A. S. Manne, G. H. Golub, D. J. Wilde and R. B. Wilson, "On the Need for a System Optimization Laboratory," in T. C. Hu and S. M. Robinson, eds . Mathematical Programming , Academic Press, New York, 19 73, p. 2. 9. Dantzig, G. B. and R. Van Slyke, "Generalized Upper Bounding Techniques," Journal of Computer System Science , 1968, pp. 213226. 10. Demjanov, V. F. , "Algorithms for some minimax problems," Journa l of Computer and System Sciences 2 (1968), pp. 342380. 11. Eaves, B. C, "Piecewise Linear Retractions by Reflextion," Linear Algebra and its Applications , 7, 1973, pp. 9398. 101
PAGE 109
102 12. Ererain, I. I., "An Iterative Method for Cebysev Approximations of Incompatible Systems of Linear Inequalities," Soviet Mathematical Doklady , 1961, pp , 82182A. 13. Fejer, L., "Ueber die Lage der Nulstellen von Polynomen die aus Mininium forderungen genisser Arten entspringen, " Mathematical Annalen , 85, 1922, pp. 41A8. 14. Forrest, J. J. H. and J. A. Tomlin, "Updating Triangular Factors of the Basis in the Product Form Simplex Method," Mathematical Programming 2, 1972, pp. 263278. 15. Goffin, J. L. , On the Finite Convergence of the Relaxation Method for Solving Systems of Inequalities, Dissertation University of California, Berkeley, 1971. 16. Goffin, J. L., "On Convergence Rates of Subgradient Optimization Methods," Working Paper No. 34, McGill University, 1976. 17. Harris, P. M. J., "Pivot Selection Methods of the Devex LP Code," Mathematical Programming Study 4 , 1975, pp. 3057. 18. Held, M. and R. M. Karp, "The TravellingSalesman Problem and Minimum Spanning Trees; Part II," Mathematical Programming , 1, 1971, pp. 625. 19. Held, M., P. Wolfe, and H. Crowder, "Validation of Subgradient Optimization," Mathematical Programming , 6, 1974, pp , 6288. 20. Hellerman, E. and D. C. Rarick, "The Partitioned Preassigned Pivot Procedure (P'^)," in D. J. Rose and R. A. Willoughby eds., Sparse Matrices and Their Applications , Plenum Press, New York, 1972. 21. Hoffman, A. J., "On Approximate Solutions of Systems of Linear Inequalities," Journal of Research of the National Bureau of Standards, 48, 4, 1952, pp. 263265. 22. Kalan, J. E. , "Aspects of LargeScale, InCore Linear Programming," Proceedings of ACM Annual Conference , Chicago, August, 19 71. 23. Koehler, G. J., "A Case for Relaxation Methods in LargeScale Linear Programming," Large Scale Systems Tlieory and Applications . Proceedings of the IFAC Symposium, June 1976, Udine, Italy. 24. Koehler, G. J. and G. S. Kumar, "A Look at Finding Points of a Convex Polyhedron Using Relaxation and Subgradient Procedures," to appear in Operations Research .
PAGE 110
103 25. Koehler, G. J., A. B. VVhinston and G. P. Wright, Optimization over Leontief Substitution Systems , NorthHolland Publishing Company, Amsterdam, 1975. 26. Kohler, D. A., Projections of Convex Polyhedral Sets, Dissertation, University of California, Berkeley, 1967. 27. Lasdon, L. S,, Optimization llieory for Large Systems , The MacMillan Company, New York, 19 70. 28. Lemarchel, C, "An Extension of Davidon Methods to Nondif ferentiable Problems," Mathematical Programming Study , 3, 1975, pp. 95109. 29. Markovitz, H. M. , "The Elimination Form of Inverse and its Application to Linear Programming," Management Science , 3, 1957, pp. 255269. 30. Merzlyakov, Y. I,, "On a Relaxation Method of Solving Systems of Linear Inequalities," USSR Computational Mathematics and Mathematical Physics , 2 (3), 1963, pp. 504510. 31. Motzkin, T. S. and I. J. Schoenberg, "The Relaxation Method for Linear Inequalities," Canadian Journal of Mathematics , 6, No. 3, 1954, pp. 393404. 32. Oettli, W., "An Iterative Method, Having Linear Rate of Convergence, for Solving a Pair of Dual Linear Programs," Mathematical Programming , 3, 1972, pp. 302311. 33. Oettli, W. , "Symmetric Duality and a Convergent Subgradient Method for Discrete, Linear, Constrained Approximation Problems with Arbitrary Norms Appearing in the Objective Function and in the Constraints," Journal of Approximation Theory , 14, 1, 1975, pp. 4350. 34. Polyak, B. T. , "A General Method of Solving Extremum Problems," Soviet Mathematical Doklady , 1967, pp. 593597. 35. Polyak, B. T. , "Minimization of Unsmooth Functions," USSR Computational Mathematics and Mathematical Physics , 1969, pp. 1429. 36. Shor, N. Z., On the Structure of Algorithms for the Numerical Solution of Optimal Planning and Design Problems, Dissertation, Cybernetics Institute An, Kiev, 1974. 37. Wolfe, P., "A Method of Conjugate Subgradients for Minimizing Nondifferentiable Functions," Proceedings Twelfth Annual Allerton Conference on Circuit and System Tlieory , Chicago, October 24, 1974, pp. 815.
PAGE 111
104 38. Wolfe, P., "A Method of Conjugate Subgradients for Minimizing Nondiff erentiable functions," Mathematical Programming Study 3, 1975, pp. 145173.
PAGE 112
BIOGRAPHICAL SKETCH Gollakota Surya Kumar was born in Kakinada, India, on April 5, 1939. After secondary schooling at Modern School, New Delhi, India he joined St. Stephens College, Delhi University in BSc (Physics Honours) in July 1956. After tv;o years in Delhi University he was one of 24 students selected in an allIndia competition to pursue a fouryear course (1958 to 1962) in mechanical engineering at Indian Railways School of Mechanical and Electrical Engineering at Jamalpur, India. During this period he passed Parts I, II and III of Institution of Mechanical Engineers (London). He became a Graduate of Institution of Mechanical Engineers (London) in July 1962 and was Elected a Member of the British Institution in November 1969. From June 1964 to March 1970 he worked as Assistant Plant Manager and Plant Manager of three major plants of Indian Railways employing a total of over 10,000 persons. From March 19 70 to August 19 72 he worked as Assistant Director, Research, Design and Standards Organization of Indian Railways. In August 1972 he came to the United States and joined the MBA Program at Indiana University, Bloomington, Indiana. From January 1976 he has been pursuing doctoral studies in business administration v>7ith a major in management at the University of 105
PAGE 113
106 Florida. He has been teaching senior level courses in managerial operations analysis over the last two years. He has recently accepted a position as an assistant professor in the College of Business Administration, Northern Illinois University, Dekalb, Illinois. He was General Editor of his school weekly "Sandesh" and has several professional papers to his credit in India and USA. His hobbies include badminton, basketball and photography. He presented a oneman exhibition of photographs in Jamalpur entitled, "Four Years in Jaralpur. "
PAGE 114
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. Gary J^Koehler, Chairman Associate Professor of Management I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. j:ukJi ^^7W/AAntal Majthay Associate Professor of Management I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy. ^ / y/CX'Lrr(e

