ITERATIVE METHODS FOR SOLVING
LARGE-SCALE LINEAR PROGRAMS
Gollakota Surya Kumar
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1979
ACKNOWLEDGEMENTS
I wish to express my deepest gratitude to Professor Gary
Koehler, my dissertation chairman, for the excellent guidance, the
constant encouragement and the support he has provided me during
my career at the University of Florida. Despite his busy schedule
he was always available at a moment's notice.
I also wish to thank Professor Antal Majthay whose depth of
knowledge not only in the area of operations research but on a
wide range of subjects will always remain awe-inspiring. I am
also grateful to Professors Ira Horowitz and Henry Tosi for
providing financial assistance during my stay at the University of
Florida. I also wish to thank Ms. Kathy J. Combs for her superb
typing of the manuscript.
Last but not least, this work would not have been possible
but for the patience, understanding and encouragement of my wife
Rekha.
TABLE OF CONTENTS

                                                                 PAGE
ACKNOWLEDGEMENTS ..................................................ii
ABSTRACT ...........................................................v

CHAPTER
ONE    THE PROBLEM: WHY ITERATIVE TECHNIQUES? ......................1
TWO    BACKGROUND RESULTS AND TOOLS ................................8
       Introduction ................................................8
         Motivation ................................................8
         Task .....................................................11
         Definitions ..............................................12
       Typical Relaxation Method ..................................13
         Procedure ................................................13
         Variations ...............................................14
       Convergence ................................................14
         Fejér-monotone sequence ..................................14
         Motzkin and Schoenberg ...................................16
       Relaxation Procedures ......................................18
         Agmon ....................................................18
         Motzkin and Schoenberg, Eaves ............................21
         Goffin ...................................................22
           Definitions ............................................22
           Another proof of Agmon's result ........................33
           Range of λ for finite convergence ......................37
         Merzlyakov ...............................................39
           Definitions and motivation .............................39
           Method .................................................40
           Generalized Merzlyakov procedure .........................
       Typical Subgradient Method ...................................
         Procedure ..................................................
         Variations .................................................
       Some Subgradient Procedures ..................................
         Polyak .....................................................
         Oettli ...................................................51
         Some of the more recent algorithms .........................
           Camerini, Fratta and Maffioli ............................
           Wolfe ..................................................59

CHAPTER
THREE  THEORETICAL RESULTS ........................................62
       Comparison of Generalized Merzlyakov Method
         and Some Typical Subgradient Methods .....................62
         Generalized Merzlyakov method ............................62
         Further developments of Oettli method ....................66
         Generalized Merzlyakov method and Oettli method ..........68
         Comparison with some other subgradient methods ...........71
       An Algorithm Using Relaxation Method .......................73
         Motivation ...............................................73
         The algorithm ............................................74
           Phase I ................................................74
           Phase II ...............................................79
         Application to solving large linear problems .............80
         Computational results ....................................86
       Concluding Remarks .........................................98
APPENDIX ..........................................................99
BIBLIOGRAPHY .....................................................101
BIOGRAPHICAL SKETCH ..............................................105
Abstract of Dissertation Presented to the Graduate Council
of the University of Florida
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy

ITERATIVE METHODS FOR SOLVING
LARGE-SCALE LINEAR PROGRAMS

By
Gollakota Surya Kumar
August 1979

Chairman: Gary J. Koehler
Major Department: Management
The simplex method for solving linear programs has achieved
its overwhelming success over the years due to the advances during
the last three decades in extending the effectiveness of the basic
algorithm introduced by Dantzig in 1951. However, the largest
problem that can be economically solved by this method is dictated
by the size of the basis, the limitation being the ability to
maintain a sparse yet accurate representation of the basis inverse.
Commercial computer codes using the recent developments of the
simplex method are capable of solving mathematical programming systems
of the order of 10,000 rows. However, larger linear programs do
arise in practice. The reason they are not being set up is
that none of the production LP codes can solve such large
problems in a timely and cost-effective fashion. For such large
problems iterative techniques that do not require a basis may be
an attractive alternative.
We review two classes of iterative techniques: relaxation
methods and subgradient optimization. Solving a linear program
can be shown to be identical to finding a point of a polyhedron P,
the polyhedron being formed from primal and dual constraints and an
additional constraint using strong duality. In relaxation methods
we successively find points of relaxed forms P^n of P. That is, at
iteration n we find an x^n ∈ P^n such that P ⊆ P^n. We assume P ≠ ∅.
The problem is solved when the sequence converges to a point
x* ∈ P.
Subgradient methods are an alternative class of iterative
techniques for solving these problems. In subgradient methods we
minimize a convex though not necessarily differentiable function
f(·) by the iterative scheme x^{n+1} = x^n - t_n u^n, where t_n > 0 is
a scalar obtained by the approximate minimization of f(x^n - t u^n),
and u^n is a subgradient of f at x^n. When we use the minimization
method for finding a point of a polyhedron P, the function f is
so defined that if x* is a minimum of f then x* ∈ P.
These iterative techniques work with original data; thus
advantages of supersparsity can be fully realized and the program
run in core. Also they are self correcting and knowledge of a
vector close to the solution can be used to advantage.
We show that a generalized form of Merzlyakov's relaxation
procedure subsumes most of the recent subgradient procedures when
the objective is to find a point of a polyhedron. A new algorithm
using the relaxation method is presented. This algorithm was coded to
explore values for the relaxation parameters used in the algorithm.
CHAPTER 1
THE PROBLEM: WHY ITERATIVE TECHNIQUES?
Mathematical programming has achieved its present popularity
in a large measure due to the success of the simplex method for
solving linear programs published by Dantzig [6] in 1951. Over the
years considerable effort has been expended towards exploiting and
extending the effectiveness of this procedure. We will briefly
outline some of the improvements.
Most practical linear programs have a sparse matrix [2] and it
is necessary to exploit this sparseness to reduce memory and time
requirements and to maintain accurate information. The first such
procedure was the revised simplex procedure. This procedure
operates only on original problem data and requires an inverse of
the current basis. If there are m constraints and n variables
(including slacks, surplus and artificial) then the regular
simplex procedure requires

(1.1)   (m + 1)(n + 1)

storage locations. There are m + 1 rows (including the objective)
and n + 1 columns (including the right-hand side). In the revised
simplex procedure we need

(1.2)   p(m + 1)(n + 1) + m^2

storage locations, where p is the problem density. (1.1) is much
larger than (1.2) if p is close to zero.
It was also discovered that if the inverse matrix were stored
in product form [29] an additional decrease in storage requirements
could be realized. However, in product form, the basis inverse
grows in size at each iteration so that eventually we must either
run out of room or scrap the current form of the basis inverse and
reinvert the current basis. In practice this happens quite often.
Using product form inversion, the storage requirements are
(1.3)   p(m + 1)(n + 1) + qmr
where q is the average density of the eta vectors needed in the
product form inverse and r is the average number of eta vectors.
Notice, though, that to gain the decrease in (1.3) over (1.2) we
must spend time in the reinversion step.
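The storage formulas (1.1)-(1.3) are easy to compare numerically. The
following sketch (in modern Python; the problem size and density figures
are illustrative assumptions, not data from the text) shows why the
revised and product-form variants dominate for sparse problems:

```python
# Storage estimates (1.1)-(1.3) for a hypothetical sparse LP.
# m, n, and the densities below are illustrative assumptions.

def full_tableau(m, n):
    # (1.1): the regular simplex method stores every tableau entry
    return (m + 1) * (n + 1)

def revised(m, n, p):
    # (1.2): revised simplex stores nonzeros of the data plus an
    # explicit m-by-m basis inverse
    return p * (m + 1) * (n + 1) + m * m

def product_form(m, n, p, q, r):
    # (1.3): the product-form inverse replaces the explicit inverse
    # by r eta vectors of average density q
    return p * (m + 1) * (n + 1) + q * m * r

m, n, p = 1000, 5000, 0.01          # a 1%-dense problem (assumed)
q, r = 0.05, 200                    # eta-file statistics (assumed)
print(full_tableau(m, n))           # about 5 million cells
print(revised(m, n, p))             # data nonzeros + explicit inverse
print(product_form(m, n, p, q, r))  # data nonzeros + eta file
```

For these assumed figures the product form needs well under a tenth of
the storage of the revised simplex inverse, which in turn is a small
fraction of the full tableau.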
It was soon realized that there are many ways of forming a
product form of a basis matrix. For instance, the matrix could
first be decomposed into the product of two triangular matrices.
The eta vectors for these are sparse and easily formed. In
addition, within a class of procedures for viewing the original
basis, one can choose the row-column pivot order to minimize the
creation of new members, i.e., one can choose the order in forming
the eta vectors to minimize the density of the resulting basis
inverse representation [20]. A related issue has to do with
updating the current basis inverse with a new eta vector. One can
perform this operation with a minimal buildup in storage
requirements [14].
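The product-form idea can be illustrated with a short sketch: each eta
matrix is an identity with one column replaced, and B^{-1}b is computed
by applying the eta transformations in sequence instead of ever forming
B^{-1}. This is a minimal modern illustration, not the layout of any
production code:

```python
import numpy as np

def apply_eta(eta_col, pivot_row, x):
    # Apply E^{-1} to x, where E is the identity matrix with column
    # `pivot_row` replaced by `eta_col` (eta_col[pivot_row] is the
    # pivot element). Solving E y = x needs only one division.
    y = x.copy()
    y[pivot_row] = x[pivot_row] / eta_col[pivot_row]
    for i in range(len(x)):
        if i != pivot_row:
            y[i] = x[i] - eta_col[i] * y[pivot_row]
    return y

# Two successive pivots: B = E1 @ E2, so B^{-1} b = E2^{-1}(E1^{-1} b).
n = 3
E1 = np.eye(n); E1[:, 0] = [2.0, 1.0, 0.0]   # eta column for pivot row 0
E2 = np.eye(n); E2[:, 2] = [0.5, 1.0, 4.0]   # eta column for pivot row 2
B = E1 @ E2
b = np.array([4.0, 5.0, 6.0])
x = apply_eta(E2[:, 2], 2, apply_eta(E1[:, 0], 0, b))
assert np.allclose(B @ x, b)
```

Only the nonzeros of each eta column need to be stored, which is the
source of the savings in (1.3).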
Kalan [22] pointed out that of the p(m + 1)(n + 1) elements of
a typical LP problem, most are identical (usually most are ±1s).
Kalan suggested storing only the unique numbers of an LP problem
and using pointers to store the problem structure. In so doing
one stores fewer elements than exist in the problem. The new
sparseness is termed supersparsity. Kalan noted that not only can
original data be stored this way but also new data (formed in the
basis inverse) can be stored in this fashion.
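Kalan's observation can be sketched as follows: keep a pool of distinct
values and let each nonzero entry point into the pool. The matrix below
is an assumed toy example, not Kalan's actual data structure:

```python
# Supersparsity sketch: many nonzeros share few distinct values.
# Store each distinct value once; entries hold an index into the pool.
matrix = {(0, 0): 1.0, (0, 3): 1.0, (1, 1): -1.0,
          (2, 0): 2.5, (2, 2): 1.0, (3, 3): -1.0}

pool = []        # unique values, stored once
index = {}       # value -> position in pool
entries = {}     # (row, col) -> pool position
for pos, val in matrix.items():
    if val not in index:
        index[val] = len(pool)
        pool.append(val)
    entries[pos] = index[val]

print(len(matrix), "nonzeros but only", len(pool), "distinct values")
# every entry is recoverable from the pool and the pointer table
assert all(pool[entries[p]] == matrix[p] for p in matrix)
```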
Another useful idea, scaling of data, has numerical
stability as its motivation. Generally for real problems the
coefficients of the matrix may take on a wide range of values. This
can cause numerical difficulties, for example in inversion,
since the computer must round off fractions. To reduce the effect
of this problem, the matrix entries are scaled before starting the
problem.
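One common realization of this idea (a plausible sketch, not necessarily
the scheme used in the codes discussed) divides each row and then each
column by the geometric mean of the magnitudes of its nonzero entries:

```python
import math

def geometric_mean_scale(A):
    # Scale rows, then columns, of a dense matrix (list of lists) by
    # the geometric mean of the absolute values of their nonzeros.
    m, n = len(A), len(A[0])
    B = [row[:] for row in A]
    for i in range(m):
        nz = [abs(v) for v in B[i] if v != 0]
        r = math.exp(sum(math.log(v) for v in nz) / len(nz))
        B[i] = [v / r for v in B[i]]
    for j in range(n):
        nz = [abs(B[i][j]) for i in range(m) if B[i][j] != 0]
        c = math.exp(sum(math.log(v) for v in nz) / len(nz))
        for i in range(m):
            B[i][j] /= c
    return B

A = [[1e6, 2e-3], [4.0, 8e3]]   # assumed badly scaled data
B = geometric_mean_scale(A)
# the spread of entry magnitudes in B is far smaller than in A
```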
There are a number of ways in which to obtain a starting
basis: from scratch using an "artificial basis," using a CRASHed
basis forcing some of the variables into the basis, or based on
information about the problem. The last approach is especially
useful in solving a series of problems with only minor variations
in data.
Candidate selection can play an important part in the
efficiency of an LP algorithm. There are a number of ways to select
the entering variable; "multiple pricing" is one of them that is
especially useful for large problems. In this approach we typically
divide the variables into segments and pick the segments sequentially.
In the chosen segment we select the g most negative reduced cost
variables as candidates for pivoting. We concentrate on these g
nonbasic variables for the remainder of some h iterations, and
then repeat the procedure with the next segment. A more recent
development concerning pivot selection is due to Paula Harris [17].
The reduced costs here are weighted prior to selection. The
weights are chosen with respect to a fixed set of nonbasic
variables. The idea is that in the traditional most negative
reduced cost approach we are trying to find the best candidate,
but on a local basis: locally we move in the direction of
steepest ascent of the objective function. Such a selection may
not serve global interests. By constantly adjusting the pricing
by weights derived from a fixed reference point, the global
interests of the problem are hopefully maintained. Experimental
results corroborate this expectation and show it to be a very
effective procedure.
When there is degeneracy in a linear programming problem,
there exists the possibility of a set of basic feasible solutions
occurring repeatedly, called "cycling." Bland [3] has introduced
several new methods to avoid cycling, the simplest of which is
to select as entering variable the one with the smallest subscript
among all candidates to enter the basis. If there is a tie for the
leaving variable, select the one with the smallest basic subscript.
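Bland's smallest-subscript rule is short to state in code. The sketch
below is a minimal modern illustration; the reduced costs and ratios
are hypothetical inputs, and the point is only the index-based
tie-breaking:

```python
def blands_entering(reduced_costs, tol=1e-9):
    # Entering variable: the smallest index with a negative reduced
    # cost (a minimization convention is assumed here).
    for j, d in enumerate(reduced_costs):
        if d < -tol:
            return j
    return None  # no candidate: current basis is optimal

def blands_leaving(ratios, basic_index, tol=1e-9):
    # Leaving variable: among rows attaining the minimum ratio, pick
    # the one whose basic variable has the smallest subscript.
    # ratios[i] is None for rows that are not eligible.
    eligible = [i for i, r in enumerate(ratios) if r is not None]
    rmin = min(ratios[i] for i in eligible)
    ties = [i for i in eligible if ratios[i] <= rmin + tol]
    return min(ties, key=lambda i: basic_index[i])

assert blands_entering([0.0, -2.0, -5.0]) == 1       # not the most negative
assert blands_leaving([3.0, 3.0, 7.0], [8, 4, 2]) == 1  # tie broken by subscript
```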
Parallel with efforts to increase the size of problem that
can be tackled by the simplex method have been efforts to increase
its capabilities for problems having special structure. The
computing effort using the simplex method depends primarily on the
number of rows; so it is worthwhile considering the possibility of
taking care of some constraints without increasing the size of the
basis. The Generalized Upper Bounding (GUB) technique introduced by
Dantzig and Van Slyke [9] achieves this for constraints of the form

(1.4)   Σ_j x_j = b_i
with the restriction that the same variable does not occur in more
than one GUB constraint. Decomposition technique is another method
of extending the capability of problem solving by breaking it into
smaller subproblems and using a master program iteratively to tie
them together. Unfortunately the convergence of this method is
rather slow and in addition one is unable to compute bounds on the
value of the objective function at each iteration, unless each of
the subproblems is bounded.
Despite the far reaching advances made during the last three
decades in extending the capability of the simplex method, some of
which have been mentioned above, the basis inverse still remains
the limiting factor dictating the size of the problem that could
be tackled. Even when we use variants like GUB and decomposition
techniques, the size of the basis is still the bottleneck. In the
case of GUB it is the basis of the master problem consisting of
other than GUB constraints and with decomposition it is the size of
the basis of the largest of subproblems. Thus this basis inverse
is the heart of all that is bad in simplex method related techniques
of solving an LP.
Commercial computer codes using the above developments of the simplex
method are capable of solving mathematical programming systems of
the order of 10,000 rows. However, useful linear programs of such
magnitude do arise in practice, for example in the food, oil and
chemical related industries. The reason why larger ones are not
being set up and solved is that no production LP code has been
developed to solve such large problems in a timely and cost-
effective fashion. This is not to say that there are not large
linear programming production codes. For example, IBM's MPSX/370,
CDC's APEX, Honeywell's LP 6000 X, Techtronic's
MPS3, and others are reliable and available at reasonable rates.
However, one would not seriously consider using one of these to
solve a linear program of 10,000 or more rows on a routine basis
unless the problem possessed some nice structure or the results of
the problem solution had a high economic value.
Dantzig et al. [8] in their justification for the setting up
of a system optimization lab for solving large scale models have
noted:

    Society would benefit greatly if certain
    total systems can be modeled and successfully
    solved. For example, crude economic planning
    models of many developing countries indicate
    a potential growth rate of 10 to 15% per year.
    To implement such a growth (aside from political
    differences) requires a carefully worked out
    detailed model and the availability of computer
    programs that can solve the resulting large
    scale systems. The world is currently faced
    with difficult problems related to population
    growth, availability of natural resources,
    ecological evaluations and control, urban
    redesign, design of large scale engineering
    systems (e.g. atomic energy and recycling
    systems), and the modeling of man's physiological
    system for the purpose of diagnosis and treatment.
    These problems are complex and urgent and can
    only be solved if viewed as total systems. If
    not, then only patchwork piecemeal solutions will
    be developed (as it has been in the past) and
    the world will continue to be plagued by one crisis
    after another caused by poor planning techniques.
The bottleneck in solving such large unstructured LPs is the
inability to maintain a sparse yet accurate representation of the
basis inverse. When the simplex method is not computationally
feasible for such large problems, iterative techniques that do not
require a basis inverse may be an attractive alternative. In this
dissertation we concentrate on iterative procedures for solving
linear programs.
We discuss iterative techniques under the headings (a)
relaxation methods and (b) subgradient methods. These methods are
insensitive to the number of constraints. In fact an increase in
the number of constraints for the same problem improves the
convergence of the relaxation methods. Unlike the simplex method,
iterative techniques are self-correcting. We always use original
data; thus the advantages of supersparsity [22] can be fully realized
and the program run in main memory. Finally, knowledge of a vector
close to the solution can be used to advantage. This is very
useful when a sequence of problems has to be solved with only
slightly varying data.
As an aside to the main thrust of this dissertation, we note
that iterative methods are more attractive for solving certain
large Markov decision processes and also for problems having a
Leontief substitution constraint set [23, 25]; both of these
problems are important special classes of LPs.
CHAPTER 2
BACKGROUND RESULTS AND TOOLS
Introduction
Motivation
In this chapter we will review various iterative techniques
for finding a point of a polyhedron. We are interested in finding
a point of a polyhedron because a wide variety of important
mathematical programs can be reduced to such a problem. For
example, solving a linear program can be reduced to finding a
point of a polyhedron using primal and dual constraints along with
an additional constraint implying strong duality. To elucidate this
further, consider the following primal-dual linear programming pair:
(2.1)   Primal                     Dual
        Max  c'x                   Min  v'b
        s.t. Ax ≤ b                s.t. v'A ≥ c'
             x ≥ 0                      v ≥ 0

Here A is m x n, c is n x 1 and b is m x 1; v' stands for the transpose
of v. Problem 2.1 can be restated as

(2.2)   b - Ax ≥ 0
        v'A - c' ≥ 0
        c'x - v'b ≥ 0
        v ≥ 0, x ≥ 0

Let P = {(x, v) : (x, v) satisfies 2.2} be the set of solutions to 2.2.
(x, v) ∈ P implies that x solves the primal problem in 2.1 and v solves
the dual problem.
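The reduction in (2.2) can be made concrete: stack primal feasibility,
dual feasibility, the reversed weak-duality inequality, and
nonnegativity into one membership test. The small LP below is an
assumed example for illustration:

```python
import numpy as np

# Assumed example: max 3x1 + 2x2 s.t. x1 + x2 <= 4, x1 <= 2, x >= 0.
A = np.array([[1.0, 1.0], [1.0, 0.0]])
b = np.array([4.0, 2.0])
c = np.array([3.0, 2.0])

def in_P(x, v, tol=1e-9):
    # Membership test for the polyhedron (2.2): primal feasibility,
    # dual feasibility, c'x >= v'b (which, with weak duality,
    # forces equality), and nonnegativity.
    return (np.all(b - A @ x >= -tol) and
            np.all(A.T @ v - c >= -tol) and
            c @ x - b @ v >= -tol and
            np.all(x >= -tol) and np.all(v >= -tol))

# A jointly optimal pair lies in P; a merely feasible x does not.
assert in_P(np.array([2.0, 2.0]), np.array([2.0, 1.0]))
assert not in_P(np.array([0.0, 0.0]), np.array([2.0, 1.0]))
```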
Linear programs of the form

    z* = Max_x Min_{i=1,...,m} (a_i'x + b_i)

relate directly to the problem of finding a point of a polyhedron P
where

    P = {x ∈ R^k : a_i'x + b_i ≥ z*, i = 1, ..., m}
Such problems arise, for example, in largescale transportation
problems and decomposition problems [27]. We will develop such a
formulation for the transportation problem.
Given k origins with stocks a_ℓ > 0 of a single commodity, and
m destinations with demands d_j > 0, the transportation problem involves
determining a shipping schedule that meets all demand and supply
constraints at minimum cost. Let

    z_ℓj be the amount shipped from ℓ to j,
    c_ℓj the cost of shipping one unit from ℓ to j.

The problem is

    Min Σ_{ℓ=1}^k Σ_{j=1}^m c_ℓj z_ℓj

subject to

    Σ_{j=1}^m z_ℓj = a_ℓ,   ℓ = 1, ..., k
    Σ_{ℓ=1}^k z_ℓj = d_j,   j = 1, ..., m
    z_ℓj ≥ 0

Without loss of generality, we assume

    Σ_{ℓ=1}^k a_ℓ = Σ_{j=1}^m d_j
If m and k are too large to be handled by conventional network
methods [4], then the decomposition principle can be used. Let

    {(z_ℓj^i : ℓ = 1, ..., k; j = 1, ..., m) : i ∈ I}

be a list of extreme points of

    Σ_{ℓ=1}^k z_ℓj = d_j,   j = 1, ..., m
    z_ℓj ≥ 0

Then the transportation problem is equivalent to the master problem

    Min Σ_{ℓ=1}^k Σ_{j=1}^m c_ℓj Σ_{i∈I} z_ℓj^i λ_i

subject to

    a_ℓ - Σ_{i∈I} Σ_{j=1}^m z_ℓj^i λ_i = 0,   ℓ = 1, ..., k
    Σ_{i∈I} λ_i = 1
    λ_i ≥ 0.

Let

    b_i = Σ_{ℓ=1}^k Σ_{j=1}^m c_ℓj z_ℓj^i
    a_ℓ^i = a_ℓ - Σ_{j=1}^m z_ℓj^i

The master problem becomes

    Min Σ_{i∈I} b_i λ_i

subject to

    Σ_{i∈I} a_ℓ^i λ_i = 0,   ℓ = 1, ..., k
    Σ_{i∈I} λ_i = 1
    λ_i ≥ 0

The dual to the above problem is

    Max w

subject to

    Σ_{ℓ=1}^k a_ℓ^i x_ℓ + w ≤ b_i,   i ∈ I

that is, w ≤ b_i - a^i'x, where

    a^i = (a_1^i, a_2^i, ..., a_k^i)' ∈ R^k

This problem is equivalent to

    Max_{x∈R^k} Min_{i∈I} (b_i - a^i'x)

which is again of the max-min form above, with -a^i in the role of a_i.
Task
Two of the methods of finding a point of a polyhedron are the
Fourier-Motzkin Elimination Method [7] and a Phase I simplex method.
The Elimination Method reduces the number of variables by one in
each cycle but may greatly increase the number of inequalities in
the remaining variables. For larger problems this procedure is
impractical [26], since the number of inequalities grows at a rapid
rate. The Phase I simplex procedure can also be used to find a
point of a polyhedron. For very large problems the simplex method
breaks down due to a practical limitation on the size of the matrix
that can be inverted as explained in the last chapter. It is for
problems of this nature that iterative methods may be an attractive
alternative. We discuss iterative techniques under two classes:
relaxation methods and minimization methods.
In relaxation methods, at each iteration we attempt to find a
point of a relaxed form P^n of P. That is, at iteration n we find an
x^n ∈ P^n such that P ⊆ P^n. We look for the property that the sequence
of points thus generated is pointwise closer to P. Minimization
methods attempt to solve the problem of minimizing a general, maybe
nondifferentiable, convex function f(·). The sequence generated
generally has the property that the successive iterates are closer
in distance to the level set of solutions, of which x* is one point.
The value of the objective function need not necessarily improve at
each iteration. When we use the minimization method for finding a
point of P, the function f is so defined that if x* is a minimum of
f then x* ∈ P.

We will now review the relaxation methods. As was mentioned
earlier, we find at each iteration n a point x^n ∈ P^n, where P^n is a
relaxed form of P. That is, P ⊆ P^n. We continue finding points
successively in this manner till we find an x* ∈ P. At least
one such point exists by our assumption that P ≠ ∅.
Definitions

We need a few definitions before some of the methods can be
described. Let

    ℓ_i(x) = a_i'x + b_i

where i ∈ I, I ≠ ∅ and finite, x ∈ R^k, a_i ∈ R^k and b_i ∈ R. Without
loss of generality assume ||a_i|| = 1 for all i, where ||·|| is the
Euclidean norm. H_i given by

    H_i = {x ∈ R^k : ℓ_i(x) ≥ 0}

is the halfspace associated with i ∈ I. P given by

    P = ∩_{i∈I} H_i = {x ∈ R^k : ℓ_i(x) ≥ 0, i ∈ I}

is the polyhedron of interest. E_i given by

    E_i = {x ∈ R^k : ℓ_i(x) = 0}

is the plane associated with halfspace H_i. d(x, H_i) given by

    d(x, H_i) = inf_{z∈H_i} ||x - z||

is the Euclidean distance from x ∈ R^k to the halfspace H_i. d_P(x)
given by

    d_P(x) = max_{i∈I} d(x, H_i)

is the maximal distance from x to P. B_r(x) given by

    B_r(x) = {y ∈ R^k : ||x - y|| ≤ r}

is the ball with radius r and center x. S_r(x) given by

    S_r(x) = {y ∈ R^k : ||x - y|| = r}

is the (k-1)-dimensional sphere of radius r and center x.
Typical Relaxation Method

Procedure

A typical relaxation method for finding a point of P is

    (1) Pick x^0 ∈ R^k arbitrarily.
    (2) If x^n ∈ P stop; otherwise let i* be the most violated
        halfspace, i.e.,

        -a_{i*}'x^n - b_{i*} ≥ -a_i'x^n - b_i,   for all i ∈ I.

We obtain the sequence {x^n} recursively as follows:

    x^{n+1} = x^n + λ_n(t^n - x^n)

where t^n = P_{E_{i*}}(x^n) is the projection of x^n on E_{i*}. λ_n is
called the relaxation parameter at iteration n. Generally λ_n is
constant across n = 1, 2, ....
Variations

If λ_n = 1 for all n, we call the procedure a projection method,
and when λ_n = 2 for all n, we call it a reflection method. See
Figure 1 for an example of the relaxation procedure. In this
example we use λ_n = λ = 2 for all n. At x^0, E_1 is the plane
corresponding to the most violated halfspace. We reflect in E_1
to get x^1. At x^1 we reflect in E_3 to get x^2. If we continue
the procedure we are in P in four iterations.
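The typical relaxation step is equally short in code. The sketch below
is a minimal modern illustration: the halfplanes are assumed examples,
the rows of A are normalized to unit length as in the text, and the
update is x^{n+1} = x^n + λ(t^n - x^n) with t^n the projection onto the
plane of a most violated constraint; λ = 1 gives the projection method
and λ = 2 the reflection method.

```python
import numpy as np

def relaxation(A, b, x0, lam=1.0, max_iter=1000, tol=1e-10):
    # Find a point of P = {x : a_i'x + b_i >= 0} by successive
    # relaxation. Rows of A are assumed to have unit Euclidean norm.
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        viol = A @ x + b                 # l_i(x) for each i
        i = int(np.argmin(viol))         # a most violated constraint
        if viol[i] >= -tol:
            return x                     # x is (numerically) in P
        # projection of x on E_i is x - (a_i'x + b_i) a_i, ||a_i|| = 1
        x = x - lam * viol[i] * A[i]
    return x

# Assumed halfplanes: x1 >= 0, x2 >= 0, and x1 + x2 <= 2
# (the last row is written as -(x1 + x2)/sqrt(2) + sqrt(2) >= 0).
s = 1 / np.sqrt(2)
A = np.array([[1.0, 0.0], [0.0, 1.0], [-s, -s]])
b = np.array([0.0, 0.0, np.sqrt(2.0)])
x = relaxation(A, b, x0=[-3.0, 5.0], lam=2.0)   # reflection method
assert np.all(A @ x + b >= -1e-8)               # the result lies in P
```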
Convergence
Fejér-monotone sequence

We are ultimately interested in finding a point of P or at least
finding a sequence of points which converge to a point of P. If the
sequence terminates with a point of P (as in Figure 1) we have
accomplished our goal. If the sequence does not terminate, then we
wish to know whether it converges to a point of P. A well known
result concerning "Fejér-monotone" sequences will be used to shed
light on this question.
FIGURE 1
(The halfplanes a_1'x + b_1 ≥ 0, a_2'x + b_2 ≥ 0, a_3'x + b_3 ≥ 0 and the
reflected iterates of the relaxation procedure.)
A sequence of points {x^n} in R^k is said to be Fejér-monotone
[13, 31] with respect to the set P if

    (i) x^{n+1} ≠ x^n, and
    (ii) ||x^{n+1} - z|| ≤ ||x^n - z|| for all z ∈ P and for all n.
Motzkin and Schoenberg

A well known result in analysis is that if {s_n} is monotone,
then {s_n} converges if and only if it is bounded. Hence we see that
a Fejér-monotone sequence always converges in distance. However, we
need to relate this convergence to the convergence of the sequence
{x^n}. Before we state a theorem on convergence of Fejér-monotone
sequences of points, we need a few more definitions.
Given a set K ⊆ R^k, the affine hull of K, denoted by M(K), is

    M(K) = {x ∈ R^k : x = Σ_{i∈L} λ_i x^i, x^i ∈ K,
            Σ_{i∈L} λ_i = 1, L finite, λ_i ∈ R}.

For example, the affine hull of a point is the point itself and the
affine hull of a line segment is the line containing the segment. The
affine hull of a triangle in R^3 is the plane containing the triangle.
A set P is affine if P = M(P).

Let M be affine in R^k, x a point of M and r ∈ R be a positive
real number. Then we define S_r(M, x), the spherical surface with
axis M, center x and radius r, by

(2.3)   S_r(M, x) = x + ((M - x)⊥ ∩ {z ∈ R^k : ||z|| = r})

Figure 2 shows the construction of a spherical surface. Given a
point x of the affine set M, M - x is the translation of M by -x.
(M - x)⊥ is orthogonal to M - x. S_r(0) is a spherical surface with
radius r, which intersects (M - x)⊥ at two points. These two points
are translated back by x to give S_r(M, x).
(2.4) Theorem (Motzkin and Schoenberg)

Let the infinite sequence {x^n} be Fejér-monotone with respect
to the set P, then

Case (a): If P has an interior, the sequence converges to a
point.
FIGURE 2
(Construction of the spherical surface S_r(M, x) from M - x, (M - x)⊥
and S_r(0).)
Case (b): If P does not have an interior, then the sequence
can be convergent to a point or the set of its limit points may lie
on a spherical surface with axis M(P).

See Figure 3 for an illustration. In case (a) P has an interior
and the sequence x^0, x^1, x^2, x^3, ... converges to a point x* of P.
In case (b) P does not have an interior and the Fejér-monotone
sequence x^0, x^1, x^2, x^3, ... results in the two limit points of the
sequence lying on a spherical surface with axis M(P) and center y,
y being the projection of x^1 on P.
Relaxation Procedures

Agmon

We are now in a position to start our survey of relaxation
procedures. One of the earliest published methods is the projection
method due to Agmon [1]. Here we specify an initial vector x^0
(chosen arbitrarily) and set λ_n = 1 for all n. At iteration n, if
x^n ∈ P stop. Otherwise we find a new point x^{n+1} as follows:

    x^{n+1} = x^n + (t^n - x^n)
            = x^n - (a_{i_n}'x^n + b_{i_n}) a_{i_n} / (a_{i_n}'a_{i_n})
            = x^n - (a_{i_n}'x^n + b_{i_n}) a_{i_n}

since ||a_{i_n}|| = 1. Here i_n represents a most violated constraint
of P at x^n.
FIGURE 3
(Case (a): P has an interior and the sequence converges to a point of
P. Case (b): P has no interior; the odd iterates x^1, x^3, x^5, ... and
even iterates x^0, x^2, x^4, ... accumulate on the spherical surface
S_r(M(P), y).)
This procedure yields a sequence which converges to a point of
P. In the Agmon procedure

    P^n = {x : a_{i_n}'x + b_{i_n} ≥ 0}

That is, P^n is the closed halfspace defined by the constraint i_n and
P ⊆ P^n. In keeping with our notation, here P^n is a relaxed form of P.

(2.5) Theorem (Agmon)

Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅ and finite, be the solution set of
a consistent system of linear inequalities, and let {x^n} be the
sequence defined by the projection method, that is, λ_n = 1 for all n.
Then either the process terminates after a finite number of steps or
the sequence {x^n} converges to a point of P, x*.

Furthermore,

    ||x^n - x*|| ≤ 2 d(x^0, P) θ^n,

where θ ∈ (0, 1) and depends only on the matrix A, the matrix whose
rows are a_1', a_2', ..., a_m', and m = |I|.

Agmon explicitly proved the convergence for the projection
method. He also showed that the results could be extended to the
case λ ∈ (0, 2), where λ_n = λ for all n.
Motzkin and Schoenberg, Eaves

Motzkin and Schoenberg [31] (among others) described a reflection
method for finding a point of P. They showed that if P has an
interior then the sequence terminates with a point in P after a
finite number of steps. Let x^0 ∉ P be an arbitrary starting point
and generate a sequence as follows:

    If x^n ∈ P stop.
    If x^n ∉ P select a halfspace H_j such that x^n ∉ H_j.
    Let x^{n+1} be obtained by reflecting x^n in E_j.

After finitely many iterations, x^n will fall in P if P has an
interior.

The general reflection function can be expressed as follows:
For i = 1, ..., m define f_i : R^k → R^k by

    f_i(x) = x + 2 d(x, H_i) a_i = { x - 2(a_i'x + b_i) a_i   if x ∉ H_i
                                   { x                        if x ∈ H_i

Let g_1, g_2, ... be a sequence where

    g_j ∈ {f_1, ..., f_m} for j = 1, 2, ....

Define g^n to be the composite function

    g^n(x) = g_n(g_{n-1}(...(g_1(x))...))
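The reflection functions f_i and their composition can be sketched
directly; the polyhedron below and the cyclic choice of the sequence
g_j are assumptions for illustration:

```python
import numpy as np

def make_reflection(a, b):
    # f_i(x) = x - 2(a'x + b)a when x violates a'x + b >= 0, else x.
    # ||a|| = 1 is assumed, as in the text.
    def f(x):
        v = a @ x + b
        return x - 2.0 * v * a if v < 0 else x
    return f

# Assumed halfplanes: x1 >= 0, x2 >= 0, x1 + x2 <= 2.
s = 1 / np.sqrt(2)
halves = [(np.array([1.0, 0.0]), 0.0),
          (np.array([0.0, 1.0]), 0.0),
          (np.array([-s, -s]), np.sqrt(2.0))]
fs = [make_reflection(a, b) for a, b in halves]

def g(x, rounds=50):
    # composite g^n: here g_j cycles through f_1, ..., f_m
    for _ in range(rounds):
        for f in fs:
            x = f(x)
    return x

x = g(np.array([-4.0, 7.5]))
assert all(a @ x + b >= -1e-8 for a, b in halves)        # g lands in P
assert np.allclose(g(np.array([0.5, 0.5])), [0.5, 0.5])  # fixed on P
```

The second assertion illustrates the retraction property of Theorem
2.8: points already in P are left unchanged.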
(2.6) Theorem (Motzkin and Schoenberg)

Let P ≠ ∅ and I ≠ ∅ and finite. If P = ∩_{i∈I} H_i has an interior,
then for any x^0 there is an ℓ such that

    g^ℓ(x^0) ∈ P.

Eaves [11] extended this result and demonstrated that the
reflection procedure has a uniform property. Namely, if x^0 is within
a distance δ of P, then g^ℓ(x^0) will fall in P for some ℓ where
ℓ ≤ L, and L depends on δ and not on x^0.
(2.8) Theorem (Eaves)

Assume that P = ∩_{i∈I} H_i has an interior. Let X be the set of
points within a distance δ of P, that is,

    X = {x ∈ R^k : d(x, P) ≤ δ}

Then there is an L such that g^L(X) ⊆ P. g^L is thus a piecewise
linear retraction from X to P.

Piecewise linear means that it has finitely many linear
pieces. Retraction means g^L : X → P is continuous, P ⊆ X and
g^L(x) = x for x ∈ P. See Figure 4 for an example. Six points
chosen arbitrarily within a distance 1 of P all converge in under
four iterations.
Goffin

Definitions. Goffin's [15] work deserves special mention.
He has provided a comprehensive treatment of relaxation methods
and presented several useful finiteness properties. We need a
few more definitions before presenting Goffin's results.

A convex subset F of P is a face of P if

    x, y ∈ P and (x, y) ∩ F ≠ ∅  ⟹  x, y ∈ F,

where (x, y) denotes the open line segment between x and y. We
denote the set of faces of P by F(P). Figure 5 illustrates
F(P) for a given P. Zero-dimensional faces are 1, 2, 3. One-
dimensional faces are 4, 5, 6. The polytope 7 itself is also a face.

FIGURE 4

The zero-dimensional faces of P are called the extreme points of P.
The (n-1)-dimensional faces are called the facets of P.

FIGURE 5
N_S(x) defined by

    N_S(x) = {u ∈ R^k : u'(z - x) ≤ 0, ∀z ∈ S}

is the normal cone to S at x. N_S(x) is a nonempty, closed convex
cone. It is nonempty because it contains at least the origin. It
is closed and convex because it is an intersection of closed
halfspaces. It is a cone because for all u ∈ N_S(x), λu ∈ N_S(x),
λ ≥ 0.

    C_S(x) = (N_S(x))°

is the supporting cone to S at x. Figure 6 illustrates these last
two definitions.
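When S is a polytope, the condition u'(z - x) ≤ 0 need only be checked
at the extreme points z of S, since the condition is linear in z. A
minimal sketch, with S an assumed unit square:

```python
import numpy as np

def in_normal_cone(u, x, vertices):
    # u is in N_S(x) iff u'(z - x) <= 0 for every z in S; for a
    # polytope it suffices to check the extreme points.
    return all(u @ (z - x) <= 1e-12 for z in vertices)

# Assumed example: S is the unit square, x the corner (1, 1).
square = [np.array(v, dtype=float) for v in [(0, 0), (1, 0), (0, 1), (1, 1)]]
x = np.array([1.0, 1.0])

assert in_normal_cone(np.array([1.0, 2.0]), x, square)   # points outward
assert in_normal_cone(np.array([0.0, 0.0]), x, square)   # origin is always in N_S(x)
assert not in_normal_cone(np.array([-1.0, 0.5]), x, square)
```

At this corner the normal cone is the nonnegative orthant, which
matches the geometric picture of Figure 6.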
A point x of a set P is a relative interior point, designated
x ∈ r.i.(P), if it is an interior point of P with respect to the
relative topology induced in M(P) by the Euclidean topology, i.e.,
x ∈ r.i.(P) if there exists δ > 0 such that B_δ(x) ∩ M(P) ⊆ P. For
example, as shown in Figure 7, a line segment in R^2 has no interior
in R^2 but has a relative interior.

FIGURE 6

FIGURE 7
It can be shown that the normal cone is the same for all
relatively interior points of a given face, i.e.,

    N_P(x) = N_P(F) for any x ∈ r.i.(F)

where F is a face of P. Similarly

    C_P(x) = C_P(F) for any x ∈ r.i.(F).

Let T be any set; then -T is defined as

    -T = {x ∈ R^k : x = -y, y ∈ T}

A closed convex set is said to be smooth enough at a point y of
its boundary if

    N_P(y) ⊆ -C_P(y)

A closed convex set is said to be smooth enough if it is smooth
enough at every boundary point, or equivalently if

    N_P(F) ⊆ -C_P(F)   ∀ F ∈ F(P) \ ∅

where F(P) stands for the set of faces of P and F(P) \ ∅ is the set
F(P) with ∅ removed.
The smooth enough property applied to a polyhedron would require
all its "corners" to be "smooth enough," carrying the analogy to
k-dimensional Euclidean space.

Some characteristics of smooth enough convex sets are mentioned
below:

    A polyhedral cone C is smooth enough if and only if (iff)
    C° ⊆ -C, where C° is the polar of C,

        C° = {u ∈ R^k : u'z ≤ 0, ∀z ∈ C}.

    A polytope (bounded polyhedron) is smooth enough iff all its
    extreme points are smooth enough.

    A polyhedron is smooth enough iff N_P(F) ⊆ -C_P(F) for all the
    minimal elements of the poset {F(P) \ ∅, ⊆}, where ⊆ stands
    for set inclusion.

Some illustrations are given in Figure 8.
A poset (P, ≤) is a set P on which a binary relation ≤ is defined such that it satisfies the following relations:
(i) for all x, x ≤ x (reflexivity)
(ii) if x ≤ y and y ≤ z then x ≤ z (transitivity)
(iii) if x ≤ y and y ≤ x then x = y (antisymmetry)
Examples of posets are {R, ≤}, i.e., the reals with the ordinary less-than-or-equal-to relation; a dictionary with lexicographic ordering; and the power set of S, P(S), with set inclusion ⊂ as the binary relation.
Given a unit vector e ∈ R^k and an angle α ∈ [0, π], the set of vectors which make an angle with e less than or equal to α is called the spherical cone, C_α(e), with axis e and half-aperture angle α.
(a) Not smooth enough at the extreme points of P.
(b) Not smooth enough at any point of P.
(c) Smooth enough.
FIGURE 8
C_α(e) = {x ∈ R^k: x'e ≥ ||x|| cos α}
See Figure 9 for an illustration, where
C_α(e) = C{B_v(e)}, v = sin α, α ∈ [0, π/2).
FIGURE 9
Some properties of C_α(e) are given below:
C_α(e) is closed for α ∈ [0, π].
C_α(e) is a convex set for α ∈ [0, π/2].
For α ∈ [0, π/2), if we let v = sin α, then C_α(e) = C{B_v(e)} where
C(S) = {x ∈ R^k: x = Σ_{i∈L} λ_i x^i, x^i ∈ S, λ_i ≥ 0, L finite}.
C(S) is the convex cone hull of S.
Following Goffin [15] we define an obtuseness index u(C) of a closed convex cone C to be the sine of the half-aperture angle of the largest spherical cone contained in C. Properties of u(C) are:
u(C) = sup {sin α: C_α(e) ⊂ C, e ∈ B₁(0)}
     = sup_{e∈S₁(0)} min_{i∈I} (a_i'e) for a polyhedral cone C = {x: a_i'x ≥ 0, i ∈ I}, ||a_i|| = 1.
u(C) > 0 iff C has an interior.
C₁ ⊂ C₂ implies u(C₁) ≤ u(C₂).
u(C) = 1 iff C is a halfspace.
If C_α(e) is a spherical cone then u(C_α(e)) = sin α.
The obtuseness index of a polyhedron P is defined by
U(P) = inf_{x∈∂P} u(C_P(x)) = min_{F∈F(P)\∅} u(C_P(F)).
For a polytope we have
U(P) = min_{F ∈ vertices of P} u(C_P(F)).
If U(P) ≥ 1/√2, then P is smooth enough. However, this is not a necessary condition for P to be smooth enough. If P is the positive orthant in R^k then P is smooth enough, but
U(P) = 1/√k ≤ 1/√2.
The distance from a point x to a set S ⊂ R^k is defined by
d(x, S) = inf_{z∈S} ||x − z||.
If S is closed, then the infimum is attained, and the set of points of S where it is attained is the set of points of S closest to x:
T_S(x) = {y ∈ S: ||x − y|| = d(x, S)}.
If a set K is closed and convex, then T_K(x) reduces to a single point, which will also be denoted by T_K(x). This point is called the projection of x into K. Hence T_K(x) is a retraction from R^k to K.
Let K be a closed convex set in R^k. The following important result implies the continuity of the map T_K.
(2.8) Theorem (Cheney, Goldstein)
The projection map T_K satisfies the Lipschitz condition (for C = 1). That is,
||T_K(x) − T_K(y)|| ≤ ||x − y||.
See Figure 10 for an illustration.
FIGURE 10
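For a single halfspace the projection has a simple closed form, which makes Theorem 2.8 easy to check numerically. The following Python sketch (ours, not part of the original text) projects onto K = {z: a'z + b ≥ 0} and verifies the Lipschitz bound on random pairs of points.

```python
import numpy as np

def project_halfspace(x, a, b):
    """Projection T_K(x) onto the halfspace K = {z : a'z + b >= 0}."""
    viol = a @ x + b
    if viol >= 0:
        return x.copy()                 # x is already in K
    # Move x along a just far enough to reach the boundary a'z + b = 0.
    return x - (viol / (a @ a)) * a

# Numerical check of Theorem 2.8: ||T_K(x) - T_K(y)|| <= ||x - y||.
rng = np.random.default_rng(0)
a, b = np.array([3.0, -1.0]), 2.0
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)
    lhs = np.linalg.norm(project_halfspace(x, a, b) - project_halfspace(y, a, b))
    assert lhs <= np.linalg.norm(x - y) + 1e-12
```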
(2.9) Theorem (Goffin)
Let K be a closed convex set of R^k and T_K(x) the unique closest point to x in K. Then
{[T_K(x) + N_K(T_K(x))]: T_K(x) ∈ K} is a partition of R^k.
See Figure 11 for an illustration.
FIGURE 11
(2.10) Lemma
Let C ⊂ R^k be a closed convex cone. Then
T_{C°}(x) = x − T_C(x)
where C° is the polar of C.
An alternative formulation of a spherical surface with axis M, radius d(x, M) and center T_M(x) (see 2.3) is contained in the following lemma.
(2.11) Lemma (Goffin)
Let M be affine in R^k and x ∈ R^k; then
S_{d(x,M)}(T_M(x)) = {y ∈ R^k: ||y − z|| = ||x − z||, ∀z ∈ M}.
Finally, a well known result on norm equivalence is stated below. Here we state the result in terms of distances from x to P with x ∉ P.
(2.12) Lemma
Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅; then there exists a scalar 0 < ρ ≤ 1 such that for all x ∈ R^k
ρd(x, P) ≤ d_P(x) ≤ d(x, P)
where d_P(x) = max_{i∈I} d(x, H_i). This result is illustrated in Figure 12.
Another Proof of Agmon's Result. The following is a different proof for the result originally due to Agmon. We will find this useful in subsequent parts of this material. In this proof we make use of the result that the sequence generated by the relaxation method is Fejér-monotone with respect to P.
FIGURE 12
(2.13) Theorem
Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅ and finite; then the relaxation method generates a sequence which converges finitely or infinitely to a point x* ∈ P for any given value of the relaxation parameter λ such that λ ∈ (0, 2).
Furthermore, we have
||x^n − x*|| ≤ 2d(x^0, P)θ^n
where θ = [1 − λ(2 − λ)ρ²]^(1/2),
θ ∈ [0, 1).
Proof:
Assume x^n ∉ P.
First we will show, using Figure 13, that
d²(x^{n+1}, P) = d²(x^n, P) − λ(2 − λ)d_P²(x^n)
where d_P(x^n) is the maximal distance from x^n to the halfspaces defining P, i.e., d_P(x^n) = max_{i∈I} d(x^n, H_i).
From the right triangle x^{n+1}, t^n, T_P(x^n) we have:
d²(x^{n+1}, P) = d²(x^{n+1}, t^n) + d²(t^n, P) ..............(1)
From the right triangle x^n, t^n, T_P(x^n) we have:
d²(x^n, P) = d²(x^n, t^n) + d²(t^n, P) ..............(2)
From (1) and (2),
d²(x^{n+1}, P) − d²(x^n, P) = d²(x^{n+1}, t^n) − d²(x^n, t^n)
d²(x^{n+1}, P) = d²(x^n, P) − [d(x^n, t^n) + d(x^{n+1}, t^n)][d(x^n, t^n) − d(x^{n+1}, t^n)]
= d²(x^n, P) − λd_P(x^n)[(2 − λ)d_P(x^n)]
= d²(x^n, P) − λ(2 − λ)d_P²(x^n).
It will be noticed that our proof is independent of Figure 13; the figure only helps to illustrate the ideas.
T_P(x^n) = T_P(x^{n+1})
FIGURE 13
By Lemma 2.12 there exists a ρ > 0 such that
ρd(x, P) ≤ d_P(x) ≤ d(x, P),
hence
d²(x^{n+1}, P) ≤ d²(x^n, P) − λ(2 − λ)ρ²d²(x^n, P)
= d²(x^n, P)[1 − λ(2 − λ)ρ²].
(2.14) Let θ = [1 − λ(2 − λ)ρ²]^(1/2).
θ = 0 when λ = 1, ρ = 1. In general θ ∈ [0, 1). Thus,
d²(x^{n+1}, P) ≤ θ²d²(x^n, P)
or d(x^n, P) ≤ θ^n d(x^0, P).
Case I: If the sequence does not terminate,
lim_{n→∞} d(x^n, P) = 0.
Since {x^n} is Fejér-monotone with respect to P and d(x^n, P) → 0, {x^n} converges to a point x* ∈ ∂P by Theorem 2.4, where ∂P is the boundary of P. Clearly ||x* − T_P(x^n)|| ≤ ||x^n − T_P(x^n)||, hence x* and x^n both belong to the ball B_{d(x^n, P)}(T_P(x^n)) (see Figure 14). Hence,
||x^n − x*|| ≤ 2d(x^n, P)
≤ 2θ^n d(x^0, P).
Case II: If the sequence terminates in a finite number of steps, we can take x* to be the last point of the sequence. ///
FIGURE 14
In the above procedure, if 1 ≤ λ < 2, then
P^n = {x: a_{i_n}'x + b_{i_n} ≥ 0}.
That is, P^n is the closed halfspace defined by the constraint i_n. If 0 < λ < 1, then
P^n = {x: a_{i_n}'x + λb_{i_n} − (1 − λ)a_{i_n}'x^n ≥ 0}
and again P ⊂ P^n. In both cases P^n is a relaxed form of P, and at iteration n we find x^{n+1} ∈ P^n.
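The relaxation step against a most violated constraint translates directly into code. The following Python sketch (ours, not part of the original text) iterates toward P = {x: Ax + b ≥ 0}; rows of A are assumed normalized to unit length, as stipulated for the method.

```python
import numpy as np

def agmon_step(x, A, b, lam):
    """One relaxation step toward P = {x : Ax + b >= 0}; rows of A unit norm.
    Returns the new point and a flag telling whether x was already feasible."""
    r = A @ x + b                  # r_i < 0 means constraint i is violated
    i = int(np.argmin(r))          # index of a most violated constraint
    if r[i] >= 0:
        return x, True
    # x^{n+1} = x^n - lam (a_i'x^n + b_i) a_i
    return x - lam * r[i] * A[i], False

# Example: P = nonnegative orthant in R^2, lam = 1 (pure projection).
A = np.eye(2)
b = np.zeros(2)
x = np.array([-1.0, -2.0])
done = False
for _ in range(10):
    x, done = agmon_step(x, A, b, 1.0)
    if done:
        break
```

With λ = 1 each step is an exact projection onto the most violated halfspace, so the iterates reach the orthant's corner in two steps.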
Range of λ for finite convergence. Using relation 2.14, giving θ against different values of λ and ρ, we get the results in Table 1, shown graphically in Figure 15.
TABLE 1
ρ = 1      λ        .1    .5    .75   1     1.5   1.9
           θ = f(λ) .9    .5    .25   0     .5    .9
ρ = 1/2    λ        .1    .5    .75   1     1.5   1.9
           θ = f(λ) .98   .9    .87   .87   .9    .98
ρ = 1/4    λ        .1    .5    .75   1     1.5   1.9
           θ = f(λ) .99   .98   .97   .97   .98   .99
From Figure 15 it would appear that λ = 1 gives the best results (lowest θ). However, Motzkin and Schoenberg demonstrated that when P has an interior, λ = 2 leads to finite convergence. Also, if P has acute angles, ρ will be low and θ high; thus an obtuse angle should lead to faster convergence. Goffin determined the range of values of λ which gives finite convergence.
FIGURE 15
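The entries of Table 1 follow directly from relation 2.14. A small Python helper (ours) regenerates them:

```python
import math

def theta(lam, rho):
    """Convergence ratio of relation 2.14: [1 - lam(2 - lam) rho^2]^(1/2)."""
    return math.sqrt(1.0 - lam * (2.0 - lam) * rho ** 2)

# A few entries of Table 1:
row_rho1 = [round(theta(lam, 1.0), 2) for lam in (0.1, 0.5, 0.75, 1.0, 1.5, 1.9)]
# -> [0.9, 0.5, 0.25, 0.0, 0.5, 0.9]
```

Note the symmetry about λ = 1, visible in every row of the table: λ and 2 − λ give the same θ.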
(2.15) Theorem (Goffin)
Let P be a polyhedron with an interior. Then the sequence generated by the relaxation method converges finitely to a point of P if
(a) P is smooth enough and λ ∈ [1, 2], or
(b) λ ∈ (λ*, 2]
where
λ* = 2/(1 + U(P)) and U(P) is the obtuseness index of P.
Furthermore, if λ > 2 then the sequence either converges finitely to a point of P or it does not converge.
The first part of the theorem shows that if all the corners of P are smooth enough, then the range of λ over which we can get convergence in a finite number of steps is [1, 2].
The second part of the theorem shows that for a polyhedron that is not smooth enough, the extent by which we can deviate from λ = 2 depends on the acuteness of the most restrictive extreme point of P: the more acute it is, the less we can relax λ from a value of 2.
Merzlyakov
Definitions and Motivation. The final work on relaxation methods that we review is by Merzlyakov [30]. Merzlyakov's method takes advantage of the fact that the convergence properties of relaxation methods improve if, instead of considering only the most violated constraint, as was done by Agmon, among others, a larger number of supporting planes are considered.
We would like to emphasize this unusual property of relaxation methods: redundancy, in terms of number of constraints, improves the convergence rate. Additional halfspaces have the effect of increasing ρ, which lowers θ, the convergence ratio.
At each x ∈ R^k let
V(x) = {i: a_i'x + b_i < 0, i = 1, ..., m}.
That is, V(x) is the set of indices of constraints violated by x. A cavity C_V is the set
C_V = {x ∈ R^k: V(x) = V}, where V is a subset of the first m natural numbers. A subcavity S_V^j of C_V is the subset of all x ∈ C_V which violate constraint j ∈ V no less than any other constraint i ∈ V. Ties are broken by the natural ordering. Notice that the subcavities along with P finitely partition R^k. Figure 16 illustrates the above.
Associate with each subcavity S_V^j a fixed λ_i(j, V) where
λ_i(j, V) = 0 if i ∉ V
          ≥ 0 if i ∈ V
          > 0 if i = j.
For example, for subcavity S_{2,3}^2 (see Figure 16),
λ₁(2, {2, 3}) = 0
λ₂(2, {2, 3}) > 0
λ₃(2, {2, 3}) ≥ 0.
Method. Merzlyakov's relaxation procedure is as follows:
Let λ ∈ (0, 2) and x^0 be specified. If at iteration n, x^n ∈ P, stop. Otherwise, let
FIGURE 16
(2.16) x^{n+1} = x^n − λ [Σ_i λ_i(j,V)(a_i'x^n + b_i)][Σ_i λ_i(j,V)a_i] / ([Σ_i λ_i(j,V)a_i]'[Σ_i λ_i(j,V)a_i])
where x^n ∈ S_V^j.
We again make the assumption that P ≠ ∅ and, without loss of generality, assume ||a_i|| = 1 for each i = 1, ..., m. Also assume, without loss of generality, that Σ_i λ_i(j,V) = 1. Goffin showed that if P ≠ ∅ then the vector Σ_i λ_i(j,V)a_i cannot be zero.
Merzlyakov showed that the procedure given above converges to a point of P with a linear rate of convergence. If 1 ≤ λ < 2, then we can say
P^n = {x: Σ_i λ_i(j,V)(a_i'x + b_i) ≥ 0}
and clearly P ⊂ P^n. If 0 < λ < 1, then
P^n = {x: Σ_i λ_i(j,V)[a_i'x + λb_i − (1 − λ)a_i'x^n] ≥ 0}
and again P ⊂ P^n. Here again P^n is a relaxed form of P, and the sequence obeys our stipulation for a relaxation method, i.e., at iteration n we find x^{n+1} ∈ P^n where P ⊂ P^n.
The relaxation sequence {x^n} converges infinitely to a point x* ∈ ∂P, where ∂P is the boundary of P, only if the whole sequence "jams" close to x*. By moving in a direction that is a convex combination of violated constraints, we force the sequence out of the jam. For this reason the Merzlyakov method can be referred to as an antijamming procedure. Table 2 and Figure 17 illustrate this point. The table and figure are based on the example given in Merzlyakov's paper.
TABLE 2
Consider the following system of linear inequalities:
x₁ + 10x₂ + 10 ≥ 0
3x₁ − 10x₂ − 30 ≥ 0
Let x^0 = (0, 0). We solve this by
(a) Agmon's method and (b) Merzlyakov's method,
where λ = 3/4 and ||a_i|| = 1.
(a) Agmon's method:
x^{n+1} = x^n − λ(a_{i_n}'x^n + b_{i_n})a_{i_n}
TABLE 2 (continued)
(b) Merzlyakov's method:
x^{n+1} = x^n − λ [Σ_i λ_i(j,V)(a_i'x^n + b_i)][Σ_i λ_i(j,V)a_i] / ([Σ_i λ_i(j,V)a_i]'[Σ_i λ_i(j,V)a_i])
[Tabulated iterations, with columns n, x₁^n, x₂^n, Σ_i λ_i(j,V)(a_i'x^n + b_i), Σ_i λ_i(j,V)a_i, and [Σ_i λ_i(j,V)a_i]'[Σ_i λ_i(j,V)a_i]: starting from x^0 = (0, 0), with Σ_i λ_i(j,V)(a_i'x^0 + b_i) = −2.873, the iterates proceed toward a boundary point of P near (10, 0).]
Notice that the Merzlyakov method is much more effective than the Agmon procedure. Convergence to a point of P appears much faster with the Merzlyakov method.
Generalized Merzlyakov Procedure. The Merzlyakov procedure can be generalized [24] by allowing the λ_i's to be chosen based on the current point rather than fixed for a subcavity.
FIGURE 17
For any x where V(x) ≠ ∅ let
λ_i(x) = 0 if a_i'x + b_i > 0
       ≥ 0 if a_i'x + b_i ≤ 0
where 0 < Σ_i λ_i(x) ≤ β, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that
λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ
where j* indexes a most violated constraint and θ > 0.
The convergence properties of this method are described in the next chapter.
Typical Subgradient Method
Procedure
Let f be a convex but not necessarily differentiable function defined on R^n. u ∈ R^n is said to be a subgradient of f at x if it satisfies the following inequality:
f(y) ≥ f(x) + u'(y − x) for all y ∈ R^n.
The set of subgradients of f at x is called the subdifferential of f at x and denoted by ∂f(x):
∂f(x) = {u ∈ R^n: u is a subgradient of f at x}.
x* is a minimum of f if and only if 0 ∈ ∂f(x*).
A typical subgradient algorithm works as follows:
(1) Choose x^0 arbitrarily.
(2) Compute a subgradient u^n at x^n, i.e., u^n ∈ ∂f(x^n). If u^n = 0 an optimal point has been found. If not, go to (3).
(3) Find x^{n+1} as follows: x^{n+1} = x^n − t_n u^n,
where t_n is a scalar generally obtained by the approximate minimization of f(x^n − tu^n) for t ≥ 0. Go to (2) with x^{n+1} replacing x^n.
The function f could be an arbitrary convex function. If the objective is to find a point of a polyhedron, one could choose a number of different functions. One choice could be
f(x) = max_{i=1,...,m} [−(a_i'x + b_i)].
Since f is the maximum of a finite number of convex functions, f is convex. Oettli [32] has given another interesting way to define a function which can be used to find a point of a polyhedron. This is given later.
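The feasibility choice of f above can be run with a simple level-targeting step rule. The Python sketch below (ours; using target level 0 for the step, an assumption, since f ≤ 0 exactly on P):

```python
import numpy as np

def f_and_subgrad(x, A, b):
    """f(x) = max_i [-(a_i'x + b_i)] and one subgradient -a_i at an argmax i."""
    viol = -(A @ x + b)
    i = int(np.argmax(viol))
    return viol[i], -A[i]

def find_point(x, A, b, iters=500):
    """Subgradient steps x <- x - t u with t = f(x)/||u||^2 (target level 0)."""
    for _ in range(iters):
        fx, u = f_and_subgrad(x, A, b)
        if fx <= 0:
            break                       # x is in P
        x = x - (fx / (u @ u)) * u
    return x

# Constraints x1 >= 1 and x2 >= 2, written as Ax + b >= 0.
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -2.0])
x = find_point(np.array([-3.0, 0.0]), A, b)
```

With unit-norm rows, each step here is exactly a projection onto the most violated halfspace, which is why the single-constraint view of this f recovers the relaxation method with λ = 1.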
Variations
Reverting back to the general subgradient procedure given by the iterative scheme
x^{n+1} = x^n − t_n u^n,
various proposals [16] have been made for the step size t_n. The earliest one was by Polyak [34], who suggested Σ_n t_n = ∞ and lim_{n→∞} t_n = 0. The merit of this procedure is that we could choose any arbitrary subgradient u^n ∈ ∂f(x^n) and the sequence would converge to some x* with the stipulated stepsize. However, convergence is dismally slow and the method is of little practical value. Polyak attributes the original idea of this method to Shor [36].
Another idea, also due to Polyak [35], is to use
(2.17) t_n = λ_n [f(x^n) − f*] / ||u^n||²
with f* = f(x*) the optimum value of the function. Polyak has shown that with 0 < b₁ ≤ λ_n ≤ b₂ < 2 the sequence converges to x* with a linear rate of convergence. A variation of 2.17, also due to Polyak, is to use f̄ ≥ f(x*), i.e., an overestimate of the optimum f(x*). Here again, if 0 < b₁ ≤ λ_n ≤ b₂ < 2, the sequence converges at a linear rate, but to the level set
S = {x: f(x) ≤ f̄}.
Thus for the result to be of value we must choose close enough overestimates of f(x*), and this is difficult in practice.
A final variation on the theme of 2.17 is to use an underestimate f̲ ≤ f(x*). Eremin [12] has shown that the sequence converges to x* if lim_{n→∞} λ_n = 0 and Σ_n λ_n = ∞. Here again, convergence to the optimal solution is very slow.
Some Subgradient Procedures
Polyak
(2.18) Theorem (Polyak)
Let f be a quasi-convex function to be minimized over R^n. If the sequence {x^n} is obtained as follows:
x^{n+1} = x^n − λ_n u^n/||u^n||, with lim_{n→∞} λ_n = 0 and Σ_n λ_n = ∞,
then there exists a subsequence {x^{n_k}} such that lim_{k→∞} f(x^{n_k}) = f*.
Proof: Select a > f* and define
S_a = {x ∈ R^n: f(x) ≤ a}.
S_a is convex (it is a level set). Hence there exists an x̄ in int S_a since a > f* (see Figure 18). Choose ρ > 0 such that the neighborhood B_ρ(x̄) ⊂ S_a.
Suppose x^n ∉ S_a for all n.
(a) x ∈ R¹  (b) x ∈ R², contours of f
FIGURE 18
Since u^n is a subgradient at x^n,
f(x) − f(x^n) ≥ u^n'(x − x^n) for all x.
Thus
f(x^n) − f(x) ≤ u^n'(x^n − x). Also
f(x) ≤ f(x^n) ∀x ∈ S_a.
Combining the above,
u^n'(x^n − x) ≥ f(x^n) − f(x) ≥ 0, ∀x ∈ S_a.
This holds for the point x̄ + ρu^n/||u^n|| ∈ B_ρ(x̄) ⊂ S_a:
u^n'(x^n − x̄ − ρu^n/||u^n||) ≥ 0,
so that u^n'(x^n − x̄) ≥ ρ||u^n||.
Then
||x^{n+1} − x̄||² = ||x^n − λ_n u^n/||u^n|| − x̄||²
= ||x^n − x̄||² + λ_n² − 2λ_n u^n'(x^n − x̄)/||u^n||
≤ ||x^n − x̄||² + λ_n² − 2ρλ_n
= ||x^n − x̄||² + λ_n(λ_n − 2ρ).
Choose N such that λ_n ≤ ρ, ∀n ≥ N. This is possible since lim_{n→∞} λ_n = 0.
For n = N to n = N + m − 1,
0 ≤ ||x^{N+m} − x̄||² ≤ ||x^N − x̄||² + Σ_{n=N}^{N+m−1} λ_n(λ_n − 2ρ)
≤ ||x^N − x̄||² − ρ Σ_{n=N}^{N+m−1} λ_n.
For m large enough the right hand side is negative, since Σλ_n diverges, a contradiction. Hence x^n ∈ S_a for some n.
Let lim_{k→∞} a_k = f*. Then we have proved there exists x^{n_k} ∈ S_{a_k} such that
lim_{k→∞} f(x^{n_k}) = f*. ///
Computational experience with heuristic subgradient methods has been reported by Held and Karp [18], and Held, Wolfe, and Crowder [19]. Held, Wolfe, and Crowder use underestimates of f(x*). However, the stepsize λ_n = λ is kept constant for 2m iterations, where m is a measure of the problem size. Then both λ and the number of iterations are halved successively till the number of iterations reaches some value k; λ is then halved every k iterations. This sequence violates the requirement Σ_n λ_n = ∞ and it is possible that the sequence may converge to a wrong point. However, in their computational experience this did not happen. Held, Wolfe and Crowder experimentally found that their procedure was effective for approximating the maximum of a set of piecewise linear concave functions. In their problems (assignment, multicommodity, and max-flow) the number of constraints is enormous. In all these cases they were able to get linear convergence (which is assured by the choice of stepsize).
Held and Karp used overestimates of f(x*) to obtain bounds for a branch and bound procedure for solving the traveling salesman problem. Using this procedure they were able to produce optimal solutions to all traveling salesman problems ranging in size up to 64 cities.
Oettli
Oettli proposed a subgradient method for finding a point of an arbitrary nonempty polyhedron. We will discuss his less general method, which can be used to solve a linear program. As before, using the dual and an additional constraint expressing strong duality, a linear program can be stated as the problem of finding an x ∈ R^k satisfying
ℓ_i⁺(x) = 0, i = 1, ..., m,
where ℓ_i⁺(x) represents the magnitude by which constraint i is violated at x, i.e.,
ℓ_i⁺(x) = max {0, −(a_i'x + b_i)}.
Define ℓ⁺(x) to be the vector formed from the violations of the different constraints:
ℓ⁺(x) = (ℓ₁⁺(x), ℓ₂⁺(x), ..., ℓ_m⁺(x)).
Let p(·) be an isotone norm on R^m, i.e., if 0 ≤ x ≤ y then p(x) ≤ p(y). It may be noted that all the usual norms, like the Euclidean and Tchebycheff norms, are isotone. Define
d(x) = p(ℓ⁺(x)).
Then clearly x satisfies ℓ⁺(x) = 0 iff d(x) = 0.
(2.18) Lemma [Oettli]
d(·) is a convex function.
Proof:
ℓ_j⁺(z) is convex for all j. Thus, componentwise,
ℓ⁺(αz¹ + βz²) ≤ αℓ⁺(z¹) + βℓ⁺(z²) for α, β ≥ 0, α + β = 1.
Since p is isotone,
p(ℓ⁺(αz¹ + βz²)) ≤ p(αℓ⁺(z¹) + βℓ⁺(z²)),
and since p, as a norm, is convex,
p(αℓ⁺(z¹) + βℓ⁺(z²)) ≤ αp(ℓ⁺(z¹)) + βp(ℓ⁺(z²)).
Hence d(αz¹ + βz²) ≤ αd(z¹) + βd(z²). ///
Let x^0 be given initially. If x^n ∈ P, stop. Otherwise, let
(2.19) x^{n+1} = x^n − d(x^n)t(x^n) / (t(x^n)'t(x^n))
where t(x^n) is any subgradient of d(·) formed according to Oettli's rule at x^n. The sequence {x^n} converges to a point of P with a linear rate of convergence.
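With p chosen as the Euclidean norm (which is isotone), d(x) = ||ℓ⁺(x)||₂ and the chain rule developed in the next chapter gives the subgradient t(x) = −A'ℓ⁺(x)/||ℓ⁺(x)||. A Python sketch of one step of scheme 2.19 under this choice (ours, not from the text):

```python
import numpy as np

def oettli_step(x, A, b):
    """One step of scheme 2.19 with p = Euclidean norm, so d(x) = ||l+(x)||_2.
    The chain-rule subgradient is t(x) = -A' l+(x) / ||l+(x)||."""
    lplus = np.maximum(0.0, -(A @ x + b))   # componentwise violations l_i+(x)
    d = np.linalg.norm(lplus)
    if d == 0.0:
        return x, True                      # x is in P
    t = -(A.T @ lplus) / d
    return x - (d / (t @ t)) * t, False

A = np.array([[1.0, 0.0], [0.0, 1.0]])      # constraints x1 >= 1, x2 >= 2
b = np.array([-1.0, -2.0])
x, done = np.array([-3.0, 0.0]), False
for _ in range(200):
    x, done = oettli_step(x, A, b)
    if done:
        break
```

Because both violated constraints enter the step through ℓ⁺(x), a single iteration here already combines them, in contrast with the one-constraint-at-a-time relaxation step.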
(2.20) Lemma [Oettli]
Let y ∈ P. Then for Oettli's procedure,
||x^{n+1} − y||² ≤ ||x^n − y||² − d²(x^n) / (t(x^n)'t(x^n)).
Proof:
||x^{n+1} − y||² = ||x^n − d(x^n)t(x^n)/(t(x^n)'t(x^n)) − y||²
= ||x^n − y||² + d²(x^n)/(t(x^n)'t(x^n)) − 2d(x^n)t(x^n)'(x^n − y)/(t(x^n)'t(x^n)).
By the definition of a subgradient,
d(y) ≥ d(x^n) + t(x^n)'(y − x^n).
Since d(y) = 0, therefore
d(x^n) ≤ t(x^n)'(x^n − y).
Collecting the above results gives the lemma. ///
(2.21) Theorem [Oettli]
||x^{n+1} − x*|| ≤ θ||x^n − x*||, 0 ≤ θ < 1.
Proof:
(2.22) ||x^{n+1} − x*||² ≤ ||x^n − x*||² − d²(x^n)/(t(x^n)'t(x^n))
by Lemma 2.20 with y = x*.
By the triangle inequality, for any y ∈ P,
||x^n − x*|| ≤ ||x^n − y|| + ||y − x*|| ≤ 2||x^n − y||,
the last step following from the monotonicity of Lemma 2.20, which gives ||x* − y|| ≤ ||x^n − y||. Since this holds for all y ∈ P, including T_P(x^n),
||x^n − x*|| ≤ 2d(x^n, P).
Substituting in 2.22,
||x^{n+1} − x*||² ≤ ||x^n − x*||² [1 − d²(x^n)/(t(x^n)'t(x^n) ||x^n − x*||²)]
≤ ||x^n − x*||² [1 − d²(x^n)/(t(x^n)'t(x^n) · 4d²(x^n, P))]
or ||x^{n+1} − x*|| ≤ [1 − A²/(4B²)]^(1/2) ||x^n − x*||
where A = inf_n d(x^n)/d(x^n, P) and B = sup_n ||t(x^n)||. Oettli has shown that A > 0 and B is finite. From these observations, the result follows if we take
θ = [1 − A²/(4B²)]^(1/2). ///
In the next chapter we show that the Generalized Merzlyakov procedure is equivalent to the Oettli procedure in so far as the sequences of possible moves made by the two procedures are concerned.
Some of the More Recent Algorithms
A limitation of subgradient optimization methods is the absence of a benchmark to compare the computational efficiency of different algorithms. Algorithms for unconstrained optimization of smooth convex functions have the reliable method of steepest descent (Cauchy procedure) as a yardstick against which other algorithms are compared. If they are better than the Cauchy procedure, they usually deserve further consideration. In subgradient optimization, the Cauchy procedure may not work. An example where the Cauchy procedure is ineffective is reproduced below from [37]:
The function considered is
max {3x + 2y, 2x + 10y}
FIGURE 19
The contours of this function are sketched in Figure 19. One can calculate that if the Cauchy algorithm is started at z, the sequence converges to the origin (0,0), which is not the minimum of the function.
Some of the recent subgradient procedures can be divided into two categories. In the first category are those that claim to be at least as good as some other subgradient procedures. These algorithms claim their superiority based on empirical (computational) results. In the second category are those that claim to be reasonably effective for both differentiable and nondifferentiable convex functions and claim that the algorithm is superior when applied to quadratic and convex differentiable functions.
As an example of the former we present the method of Camerini, Fratta and Maffioli [5], and as an example of the latter we present Wolfe's [38] method of conjugate subgradients for minimizing nondifferentiable functions. Wolfe's algorithm closely resembles Lemaréchal's method [28].
Camerini, Fratta and Maffioli. Here one adopts the following iterative scheme for minimizing f(·). Let x^0 be given and
x^{n+1} = x^n − t_n s^n, where s^n is a modified "gradient" direction defined as
s^{−1} = 0 and
s^n = f'(x^n) + β_n s^{n−1}
with f'(x^n) a subgradient of f at x^n and β_n a suitable scalar. s^n is equivalent to a weighted sum of preceding gradient directions and is used as an antijamming device. The following lemmas and theorems develop the essential elements of the Camerini, Fratta and Maffioli method.
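The direction update can be sketched in a few lines of Python (ours, not from the text), with β_n chosen as in Theorem 2.25 below and γ = 1.5, the value the authors recommend heuristically:

```python
import numpy as np

def cfm_direction(g, s_prev, gamma=1.5):
    """Modified direction s^n = f'(x^n) + beta_n s^{n-1}.
    beta_n = -gamma s^{n-1}'f'(x^n) / ||s^{n-1}||^2 when the previous direction
    and the new subgradient make an obtuse angle; beta_n = 0 otherwise."""
    if s_prev is None or s_prev @ g >= 0:
        return g.copy()                  # acute angle: keep s^n = f'(x^n)
    beta = -gamma * (s_prev @ g) / (s_prev @ s_prev)
    return g + beta * s_prev

g = np.array([1.0, 0.0])
s_prev = np.array([-1.0, 1.0])           # obtuse with g
s = cfm_direction(g, s_prev)             # tilted toward the previous direction
```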
(2.23) Lemma
Let x* and x^n be such that f(x*) ≤ f(x^n). Then
0 ≤ f(x^n) − f(x*) ≤ f'(x^n)'(x^n − x*).
This lemma states that the negative of the subgradient makes an acute angle with the vector (x* − x^n), where x* corresponds to an optimal solution. See Figure 20.
FIGURE 20
(2.24) Lemma
If for all n
0 < t_n ≤ [f(x^n) − f(x*)] / ||s^n||² and β_n ≥ 0, then
0 ≤ f'(x^n)'(x^n − x*) ≤ s^n'(x^n − x*).
Lemma 2.24 extends the subgradient property of Lemma 2.23 to s^n: −s^n also makes an acute angle with (x* − x^n).
(2.25) Theorem
Let
β_n = −γ_n s^{n−1}'f'(x^n) / ||s^{n−1}||² if s^{n−1}'f'(x^n) < 0
    = 0 otherwise,
with 0 ≤ γ_n ≤ 2; then
s^n'(x^n − x*)/||s^n|| ≥ f'(x^n)'(x^n − x*)/||f'(x^n)||.
Theorem 2.25 shows that a proper choice of β_n ensures that s^n is at least as good a direction as f'(x^n), in the sense that −s^n makes more of an acute angle with (x* − x^n) compared to −f'(x^n). If f'(x^n) and s^{n−1} make an acute angle, we take s^n = f'(x^n). Only if f'(x^n) and s^{n−1} make an obtuse angle do we take
s^n = f'(x^n) + β_n s^{n−1}.
Figure 21 illustrates the above theorem.
FIGURE 21
(2.26) Theorem
||x* − x^{n+1}|| < ||x* − x^n||.
Proof:
t_n||s^n||² ≤ f(x^n) − f(x*)
≤ f'(x^n)'(x^n − x*) by Lemma 2.23
≤ s^n'(x^n − x*) by Lemma 2.24.
Hence
t_n||s^n||² − 2s^n'(x^n − x*) < 0,
or t_n²||s^n||² + 2t_n s^n'(x* − x^n) < 0.
Adding ||x* − x^n||² to both sides gives
||x* − x^{n+1}||² < ||x* − x^n||². ///
Theorem 2.26 shows that the sequence is Fejér-monotone with respect to an optimum point x*. Using arguments similar to those used earlier in relaxation methods, it can be shown that the sequence {x^n} converges to x*.
Camerini et al. use heuristic arguments to show that the best value for γ is 1.5. They then claim superiority of their method over that of Held and Karp [18] and Polyak [35] based on computational results.
Wolfe. In Wolfe's method [38] we generate a sequence of points {x^n}, of directions {d^n}, of subgradients {f'(x^n)} and scalars {t_n} such that for n = 0, 1, 2, ...
t_n > 0, f'(x^n) ∈ ∂f(x^n),
x^{n+1} = x^n + t_n d^n and
f(x^{n+1}) ≤ f(x^n).
At step n we have a bundle of vectors G_n which is formed by a set of rules. G_n consists of f'(x^n), possibly d^{n−1}, and a (possibly empty) string f'(x^{n−1}), f'(x^{n−2}), ... of limited length.
We set d^n = −Nr G_n, where Nr G_n is the unique point in the convex hull of G_n having minimum norm. Demjanov [10] has shown that ∇f(x) = Nr ∂f(x), where ∇f(x) is the gradient of f at x when f is smooth at x and ∂f(x) is the subdifferential of f at x. As is typically the case, the stepsize t_n is determined by an approximate minimization of f(x^n + td^n) for t ≥ 0. At some step, d^n = −Nr G_n will be small, and in addition the last p + 1 points (for some integer p) will all be close together. Then x^n will be our approximate solution.
For example, consider the illustration in Wolfe's paper:
f(x,y) = max {f₁, f₂, f₃}, where
f₁ = −x, f₂ = x + y, f₃ = x − 2y,
sketched in Figure 22. A subgradient of f at any (x,y) is chosen to be ∇f_i, where i is the least index for which f(x,y) = f_i. If we start with the point (6,3), we move in the direction −∇f(6,3) = −(1,1) to the point (3,0). Using our prescription for choosing the direction, the only available direction at (3,0) is still −(1,1). We must take a small step in that direction anyway, even though it is not a descent direction. Then the new subgradient will be (1,−2), forming the bundle G = {(1,1), (1,−2)}; we find the direction d = −(1,0).
FIGURE 22
The next step in this direction takes us to the neighborhood of (0,0). At the next iteration we add (−1,0) to the bundle. Then Nr G = 0. We "reset" and start all over again at this point. We accept the last point reached as a solution if Nr G_n = 0 and the last "p" points are "close enough."
For differentiable convex functions, the method is an extension of the method of conjugate gradients and terminates for quadratic functions.
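The Nr G subproblem is a minimum-norm point over a convex hull; in general it needs a small quadratic program, but for a bundle of two vectors it has a closed form. A Python sketch (ours, not from the text):

```python
import numpy as np

def min_norm_two(u, v):
    """Nearest point to the origin on the segment conv{u, v}: the Nr G
    subproblem of Wolfe's method for a two-vector bundle."""
    w = v - u
    denom = w @ w
    if denom == 0.0:
        return u.copy()
    t = min(1.0, max(0.0, -(u @ w) / denom))   # clamp to the segment
    return u + t * w

# For instance, the bundle {(1,1), (1,-2)} yields the min-norm point (1, 0),
# so the search direction is d = -(1, 0).
nr = min_norm_two(np.array([1.0, 1.0]), np.array([1.0, -2.0]))
```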
CHAPTER 3
THEORETICAL RESULTS
Comparison of Generalized Merzlyakov Method and Some Typical Subgradient Methods
The equivalence of relaxation methods and minimization methods has been noted by several researchers. However, these two approaches are only mathematically equivalent, and this equivalence has not been generally made explicit. On the other hand, computationally, relaxation methods are much simpler. It has been shown [24] that a generalized form of Merzlyakov's method subsumes most of the typical subgradient methods. In particular, it was shown that the generalized Merzlyakov method can generate any move that is made by the Oettli method. In this section we draw heavily from the results of the above paper [24].
Generalized Merzlyakov Method
The Merzlyakov method is generalized [24] so as to facilitate its comparison with some of the subgradient methods. This extension greatly increases the flexibility of the original method. The following procedure allows the λ_i's to be chosen based on the current point rather than being constant for a subcavity. For any x where V(x) ≠ ∅, let
λ_i(x) = 0 if a_i'x + b_i > 0
       ≥ 0 if a_i'x + b_i ≤ 0
where 0 < Σ_i λ_i(x) ≤ β, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that
3.1 λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ
where j* indexes a most violated constraint and θ > 0.
For λ ∈ (0, 2) and x^0 initially specified, stop at iteration n if x^n ∈ P. Otherwise, let
3.2 x^{n+1} = x^n − λ [Σ_i λ_i(x^n)(a_i'x^n + b_i)](Σ_i λ_i(x^n)a_i) / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)).
We continue to assume ||a_i|| = 1 for all i.
In the next two theorems, we show that the generalized Merzlyakov method converges with a linear rate of convergence to P.
3.3 Theorem
Let z ∈ P, and let {x^n} be obtained by the generalized Merzlyakov method. Then
||x^{n+1} − z|| < ||x^n − z||.
Proof:
||x^{n+1} − z||² = ||x^n − z||² − 2λ [Σ_i λ_i(x^n)(a_i'x^n + b_i)] Σ_i λ_i(x^n)a_i'(x^n − z) / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i))
+ λ² [Σ_i λ_i(x^n)(a_i'x^n + b_i)]² / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)).
The last two terms can be combined to get
−λ [Σ_i λ_i(x^n)(a_i'x^n + b_i)] / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)) · [2Σ_i λ_i(x^n)a_i'(x^n − z) − λ(Σ_i λ_i(x^n)(a_i'x^n + b_i))].
Now a_i'x^n + b_i ≤ 0 whenever λ_i(x^n) > 0, and a_i'x^n + b_i < 0 for some i with λ_i(x^n) > 0; thus the first bracketed factor is positive.
Since z ∈ P and a_i'z ≥ −b_i, then
Σ_i λ_i(x^n)a_i'z ≥ −Σ_i λ_i(x^n)b_i.
Thus the second bracketed term is less than or equal to
(2 − λ) Σ_i λ_i(x^n)(a_i'x^n + b_i),
and this term is strictly negative. Thus
||x^{n+1} − z|| < ||x^n − z||. ///
Thus we have established that the sequence is Fejér-monotone with respect to P. We next show that the procedure gives points which converge linearly to P. Let
d(x^n, P) = inf_{z∈P} ||x^n − z||.
3.4 Theorem
Either the generalized Merzlyakov procedure terminates with a point of P or it generates a sequence which converges to a point x* ∈ P.
Furthermore, the convergence rate is linear.
Proof: Assume x^n ∉ P and let z^n be the unique closest point of P to x^n. Then
d²(x^{n+1}, P) ≤ ||x^{n+1} − z^n||².
After expanding ||x^{n+1} − z^n||², as in Theorem 3.3, we get
d²(x^{n+1}, P) ≤ ||x^n − z^n||² − λ(2−λ)[Σ_i λ_i(x^n)(a_i'x^n + b_i)]² / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i))
= d²(x^n, P) − λ(2−λ)[Σ_i λ_i(x^n)(a_i'x^n + b_i)]² / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)).
By assumption 3.1,
[Σ_i λ_i(x^n)(a_i'x^n + b_i)]² ≥ [λ_j(x^n)(a_j'x^n + b_j)]² ≥ θ²(a_{j*}'x^n + b_{j*})².
From Hoffman [21] we have that there is a ρ > 0 such that
ρd(x^n, P) ≤ |a_{j*}'x^n + b_{j*}|.
Thus
d²(x^{n+1}, P) ≤ d²(x^n, P) − λ(2−λ)θ²ρ²d²(x^n, P) / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)).
Finally, since Σ_i λ_i(x^n) ≤ β and ||a_i|| = 1 for all i, we have
(Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i) ≤ β².
And we get
d²(x^{n+1}, P) ≤ [1 − λ(2−λ)θ²ρ²/β²] d²(x^n, P) = γ²d²(x^n, P).
Then 0 ≤ γ < 1 and d(x^n, P) ≤ γ^n d(x^0, P).
Case 1: If the sequence does not terminate, lim_{n→∞} d(x^n, P) = 0, and Motzkin and Schoenberg [31] have shown that the sequence converges to a point x* in the boundary of P for 0 < λ < 2.
Case 2: If the sequence terminates,
||x^{n+1} − z|| < ||x^n − z|| for all z ∈ P (Theorem 3.3).
Also ||x* − z^n|| ≤ ||x^n − z^n||, where z^n is the closest point of P to x^n. Thus
||x^n − x*|| ≤ ||x^n − z^n|| + ||z^n − x*|| ≤ 2||x^n − z^n|| = 2d(x^n, P) ≤ 2γ^n d(x^0, P),
where x* is the last point in the terminating sequence. ///
We can further generalize the Merzlyakov method and Theorems 3.3 and 3.4 by replacing λ with λ_n, where 0 < b₁ ≤ λ_n ≤ b₂ < 2 for all n. In the remainder of the discussion we usually mean λ_n = λ for all n, but we do not have to restrict ourselves in this manner.
Further Developments of Oettli Method
Oettli's minimization method was discussed in the last chapter. We wish to show the equivalence of the generalized Merzlyakov method and Oettli's method in the next section, but before that we need to develop some properties of Oettli's method to facilitate this comparison. Again, most of the theorems and proofs in this section are taken from the paper [24] referred to earlier.
We represent the subdifferential of a convex function f at x by ∂f(x) and a subgradient of f at x by f'(x), i.e., f'(x) ∈ ∂f(x).
To recapitulate the Oettli procedure, we form the sequence {xn}
as follows:
3.6
where
3. 7
n+l n d(xn)t(xn)
x =x
t(xn)'t(xn)
n n
t(x ) is any subgradient of d(') at x
Lemma
For any norm p(') on Rm
ap(x) = ap(0) n {p'(x): p(x) = p'(x
)'x)
Proof:
We will show
pp(x) =
Conversely if the
that if p'(x) p(x)then the following ralation holds
{p'(x): p(y) p'(x)'y for all y} n
{p'(x): p(x) = p'(x)'x}
above relation holds, we will show p'(x) E ap(x).
By the definition of a subgradient, p'(x) ∈ ∂p(x) if

        p(y) ≥ p(x) + p'(x)'(y - x) for all y.

In particular,

        p(λx) - p(x) ≥ p'(x)'(λx - x) for all scalars λ.

Since p(·) is a norm,

        p(λx) = |λ| p(x).

For λ > 1:       p(x) ≥ p'(x)'x
For 0 < λ < 1:   p(x) ≤ p'(x)'x.

Thus p(x) = p'(x)'x. From the subgradient inequality p(y) ≥ p(x) + p'(x)'(y - x), substituting for p(x) we get p(y) ≥ p'(x)'y for all y. Hence

        ∂p(x) ⊆ {p'(x): p(y) ≥ p'(x)'y for all y} ∩ {p'(x): p(x) = p'(x)'x}.
Conversely, if p(y) ≥ p'(x)'y for all y and p(x) = p'(x)'x, we get by subtraction

        p(y) - p(x) ≥ p'(x)'y - p(x) = p'(x)'(y - x),

and p'(x) ∈ ∂p(x).

Finally,

        ∂p(0) = {p'(0): p(y) ≥ p'(0)'y for all y} ∩ R^m
              = {p'(0): p(y) ≥ p'(0)'y for all y}.  ///
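As an illustrative aside (not part of the original development), the characterization in the Lemma can be checked numerically for the l1 norm on R^m, whose subgradient at x is the sign vector; the dimension and random data below are assumptions made only for this sketch.

```python
import numpy as np

def p(x):
    # the l1 norm, a norm on R^m
    return np.abs(x).sum()

def p_prime(x):
    # a subgradient of the l1 norm at x: the sign vector
    # (at coordinates with x_i = 0, any value in [-1, 1] would also do)
    return np.sign(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
g = p_prime(x)

# second defining property of the Lemma: p(x) = p'(x)'x
assert abs(p(x) - g @ x) < 1e-12

# first property (membership in the subdifferential at 0): p(y) >= p'(x)'y
for _ in range(1000):
    y = rng.standard_normal(5)
    assert p(y) >= g @ y - 1e-12
```

Both defining conditions of the Lemma hold for every sampled y, as expected.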
To find the subgradients of d(·) in terms of subgradients of the composite function p(ℓ⁺(x)), we use the following chain rule given by Oettli.
3.8 Lemma

Let p(·) be an isotone norm and p'(ℓ⁺(z)) ≥ 0; then

3.9        d'(z) = ℓ⁺'(z) p'(ℓ⁺(z)).

Proof:

Since ℓ⁺'(z⁰) ∈ ∂ℓ⁺(z⁰),

        ℓ⁺(z) - ℓ⁺(z⁰) ≥ (z - z⁰)' ℓ⁺'(z⁰).

Multiplying both sides by p'(ℓ⁺(z⁰)), where p'(ℓ⁺(z⁰)) ≥ 0,

        (ℓ⁺(z) - ℓ⁺(z⁰))' p'(ℓ⁺(z⁰)) ≥ (z - z⁰)' ℓ⁺'(z⁰) p'(ℓ⁺(z⁰)).

Using the definition of subgradient for p'(ℓ⁺(z⁰)),

        p(ℓ⁺(z)) - p(ℓ⁺(z⁰)) ≥ (ℓ⁺(z) - ℓ⁺(z⁰))' p'(ℓ⁺(z⁰))
                             ≥ (z - z⁰)' ℓ⁺'(z⁰) p'(ℓ⁺(z⁰)),

i.e.,

        d(z) - d(z⁰) ≥ (z - z⁰)' ℓ⁺'(z⁰) p'(ℓ⁺(z⁰)).

But d'(z⁰) is any vector satisfying d(z) - d(z⁰) ≥ (z - z⁰)' d'(z⁰); hence we may take d'(z⁰) = ℓ⁺'(z⁰) p'(ℓ⁺(z⁰)).  ///
The subgradient of ℓ_i⁺(x) may be found using the following well-known result:

3.10        ∂ℓ_i⁺(x) = {λ_i a_i:  λ_i = 0 if a_i'x + b_i > 0,
                                  λ_i = 1 if a_i'x + b_i < 0, and
                                  λ_i ∈ [0, 1] if a_i'x + b_i = 0}.

The subdifferential of ℓ⁺(x) can in turn be found from

3.11        ∂ℓ⁺(x) = {(h_1, ..., h_m): h_i ∈ ∂ℓ_i⁺(x)}.
Generalized Merzlyakov method and Oettli method
With the results developed so far in the previous sections, we are now in a position to develop an expression for equation 3.6 in terms of the parameters of the generalized Merzlyakov method.
3.12 Theorem

Let x ∈ R^k, x ∉ P, and let t(x) be a subgradient of d(·) at x formed by using the chain rule of equation 3.9. Then

        [d(x) / (t(x)'t(x))] t(x) = [Σλ_i(a_i'x + b_i)] Σλ_i a_i / ((Σλ_i a_i)'(Σλ_i a_i))

for some λ_i's, where λ_i ≥ 0, λ_i = 0 if a_i'x + b_i > 0, and Σλ_i > 0.
Proof:

By equation 3.9 we have

        t(x) = ℓ⁺'(x) p'(ℓ⁺(x)) for some ℓ⁺'(x) ∈ ∂ℓ⁺(x)

and p'(ℓ⁺(x)) ∈ ∂p(ℓ⁺(x)). By equations 3.10 and 3.11,

        ℓ⁺'(x) = (η_1 a_1, η_2 a_2, ..., η_m a_m), with

        η_i = 0 if a_i'x + b_i > 0,
        η_i = 1 if a_i'x + b_i < 0,
        η_i ∈ [0, 1] if a_i'x + b_i = 0.

Also, from the proof of Lemma 3.7,

        d(x) = p'(ℓ⁺(x))' ℓ⁺(x), where

        ℓ⁺(x) = (η_1(a_1'x + b_1), ..., η_m(a_m'x + b_m))'.

Let h = p'(ℓ⁺(x)). In Oettli's method we need h ≥ 0. Substituting these last few expressions into [d(x)/(t(x)'t(x))] t(x) gives

        [d(x) / (t(x)'t(x))] t(x) = [Σλ_i(a_i'x + b_i)] Σλ_i a_i / ((Σλ_i a_i)'(Σλ_i a_i))

with λ_i = η_i h_i ≥ 0. We also have Σλ_i(a_i'x + b_i) = p(ℓ⁺(x)) > 0 since x ∉ P, and Σλ_i > 0.  ///
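Theorem 3.12's identification of the Oettli step with a generalized Merzlyakov step can be checked numerically when p is the l1 norm. The data below, and the sign convention that the violation of constraint i is max(0, -(a_i'x + b_i)), are assumptions made for this sketch only, not taken from the text.

```python
import numpy as np

# Hypothetical toy data: P = {x : A x + b >= 0}, chosen for illustration.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([-1.0, -2.0, 5.0])
x = np.zeros(2)

viol = np.maximum(0.0, -(A @ x + b))    # per-constraint violations
d = viol.sum()                          # d(x) = p(l+(x)) with p = l1 norm

# Oettli step: t(x) = l+'(x) p'(l+(x)); for the l1 norm, h_i = 1 on
# violated constraints, and the subgradient of violation i there is -a_i.
h = (viol > 0).astype(float)
t = -(A.T @ h)
x_oettli = x - d / (t @ t) * t

# Generalized Merzlyakov step with weights lam_i = eta_i * h_i (Theorem 3.12)
lam = h                                 # eta_i = 1 on violated constraints
g = A.T @ lam
x_merz = x - (lam @ (A @ x + b)) / (g @ g) * g

assert np.allclose(x_oettli, x_merz)
```

The two update rules produce the same point, as the theorem asserts.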
We have thus established a close similarity between the two methods. We have, however, yet to establish the two other requirements, viz. that Σλ_i ≤ B and that condition 3.1 is satisfied. First we note that since the subdifferential of p(·) is compact, Σλ_i is bounded above. Hence the condition Σλ_i ≤ B is satisfied. Now, to show that condition 3.1 is satisfied, we see along the lines of the proof of Theorem 3.12 that

        d(x) = p(ℓ⁺(x)) = p'(ℓ⁺(x))' ℓ⁺(x) = h'ℓ⁺(x)
             = Ση_i h_i(a_i'x + b_i) = Σλ_i(x)(a_i'x + b_i),

where λ_i(x) = η_i h_i ≥ 0.
Hence

        Σλ_i(x)(a_i'x + b_i) = p(ℓ⁺(x))

and

        m · max_i λ_i(x)(a_i'x + b_i) ≥ p(ℓ⁺(x)).

Also ℓ⁺(x) ≥ ℓ_{j*}⁺(x) e_{j*}, where j* is the most violated constraint and e_{j*} is a unit vector with a 1 in position j*. Since p(·) is monotonic,

        p(ℓ⁺(x)) ≥ p(ℓ_{j*}⁺(x) e_{j*})
                 = (a_{j*}'x + b_{j*}) p(e_{j*}).

Thus

        m · max_i λ_i(x)(a_i'x + b_i) ≥ (a_{j*}'x + b_{j*}) p(e_{j*})
                                      ≥ (a_{j*}'x + b_{j*}) min_j p(e_j).

Hence

        λ_i(x)(a_i'x + b_i) / (a_{j*}'x + b_{j*}) ≥ (min_j p(e_j)) / m = σ > 0,

where i ∈ V(x) maximizes the left-hand side.
If we allow λ = 1 in the generalized Merzlyakov method, we find from the above analysis that any move made by the Oettli procedure can also be made by the generalized Merzlyakov method. However, the reverse is not true. This can be seen if we let λ ≠ 1 and take the case where x violates only one constraint. Thus the Oettli procedure is strictly subsumed by the generalized Merzlyakov method. Computationally, the relaxation method is very simple to implement, since the directions of movement can be easily computed. It is also a very flexible procedure, since all that is required to change the direction of movement and the step length is to change the weights λ_i(x). On the other hand, the Oettli method requires a subgradient to be computed at each point, which is not a trivial calculation.
In an earlier paragraph we indicated that the Merzlyakov procedure could be further generalized by allowing λ to vary with each iteration. However, this would require the additional condition 0 < b_1 ≤ λ_n ≤ b_2 < 2. Oettli has recently generalized his method [33] to incorporate the above feature. His generalized method requires that λ_n ∈ (0, 2) for all n, with the additional stipulation

        Σλ_n(2 - λ_n) = +∞.
A second aspect in which the two methods differ is that in the generalized Merzlyakov method we could use a different set of λ_i's at each iteration. This could be introduced into the Oettli method by considering different norms at each iteration.
Comparison With Some Other Subgradient Methods
In this section we show the versatility of the generalized
Merzlyakov method by showing that some of the other typical subgradient
procedures are strictly subsumed by the generalized Merzlyakov method.
We consider the methods of Polyak ([34], [35]), Eremin [12], and Held, Wolfe and Crowder [19]. These subgradient methods are capable of tackling more general functions; however, we compare them when the objective is merely to find a point of a polyhedron P. There are a number of ways of defining the function f(·) such that when we minimize it we get x* ∈ P. A typical choice is

3.13        f(x^n) = max_{i=1,...,m} (a_i'x^n + b_i).
We will show that with f thus defined the subgradient methods are subsumed by the generalized Merzlyakov method. The above subgradient methods can be collectively described by the following sequence:

        x^{n+1} = x^n - λ_n [f(x^n) - f̄] / ||f'(x^n)||² f'(x^n),

where f̄ is an estimate of f(x*), the optimum value of f. If f(x^n) assumes its value for a unique i in relation (3.13), then the subgradient at x^n is f'(x^n) = a_i. If there are ties, f'(x^n) can be taken to be a convex combination of the maximizing a_i's. Let O(x^n) represent the set of indices i ∈ {1, ..., m} for which a_i'x^n + b_i = f(x^n). Then

        f'(x^n) = Σλ_i(x^n)a_i with Σλ_i(x^n) = 1,
        λ_i(x^n) ≥ 0, and λ_i(x^n) = 0 for i ∉ O(x^n).
Hence

        x^{n+1} = x^n - λ_n [f(x^n) - f̄] / ||f'(x^n)||² f'(x^n)
                = x^n - λ_n [Σλ_i(x^n)(a_i'x^n + b_i) - f̄][Σλ_i(x^n)a_i] / ((Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i))

with λ_i(x^n) ≥ 0, Σλ_i(x^n) = 1, and λ_i(x^n) = 0 for i ∉ O(x^n). This can be considered equivalent to finding a point of P where

        P = {x: a_i'x + b_i - f̄ ≤ 0, i = 1, ..., m}.

If f̄ ≤ 0, the procedure gives a point of P. Thus if P ≠ ∅, the subgradient procedures mentioned in this section are special cases of the generalized Merzlyakov method.
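A minimal sketch of one such subgradient step for f(x) = max_i (a_i'x + b_i), with estimate f̄ = 0; the matrix A, the vector b, and the starting point are illustrative assumptions, not data from the text.

```python
import numpy as np

# Assumed toy data: find a point x with A x + b <= 0.
A = np.array([[1.0, 2.0],
              [-1.0, 1.0],
              [0.0, -1.0]])
b = np.array([-4.0, -1.0, -1.0])

def subgradient_step(x, lam=1.0, f_bar=0.0):
    """One Polyak-type step for f(x) = max_i (a_i'x + b_i); by the
    discussion above this is also a generalized Merzlyakov move."""
    vals = A @ x + b
    f = vals.max()
    if f <= f_bar:
        return x                 # x is already a point of P
    i = int(vals.argmax())       # unique maximizer assumed; ties -> convex comb.
    g = A[i]                     # the subgradient f'(x) = a_i
    return x - lam * (f - f_bar) / (g @ g) * g

x = np.array([5.0, 5.0])
for _ in range(100):
    x = subgradient_step(x)
assert (A @ x + b).max() <= 1e-6   # a point of P has been found
```

With λ = 1 and f̄ = 0, each step is exactly a projection onto the most violated halfspace, i.e., the classical relaxation move.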
An Algorithm Using Relaxation Method
Motivation
We have seen that with a relaxation parameter of λ = 1 we can get the best convergence rate for a fixed μ (see Figure 14). Also, a higher μ leads to better convergence. We wish to combine both these features in our algorithm. We also derive motivation from Polyak's [34] and Oettli's [32] procedures. In this section we give a procedure for finding a point of an arbitrary polyhedron P and later apply it to the specific problem of solving large linear programs.
Given a polyhedron P,

        P = {x ∈ R^k: a_i'x + b_i ≥ 0, i ∈ I}.
We assume P ≠ ∅ and I ≠ ∅ and finite. Goffin [15] has shown that when P is full dimensional, finite convergence is assured for a range of relaxation parameters λ depending on the obtuseness index of the polyhedron. When P is not full dimensional, we can partition it into subsets M and C such that M is full dimensional, and then devise an iterative procedure relative to these subsets so as to take advantage of the full dimensionality. Even though finite convergence is not assured, we can hope to get a better convergence rate with a proper choice of the relaxation parameters. There is considerable flexibility in the manner in which M and C are chosen. One important consideration is that it should be easy to project onto C. The algorithm is in two phases.
In Phase I we use an iterative scheme on the sets M and C which drives us close to P. We do this by starting with an arbitrary point in C. We then move in the direction of M. This movement could be a projection onto M, or a reflection, or under- or over-relaxation, depending on the value of the relaxation parameter λ. From the new point thus obtained we project onto C. We show that the sequence thus constructed converges to P with a linear rate of convergence. In our algorithm, however, this iterative procedure is continued merely to get close to P. In the special case when C = R^k, Phase I becomes the relaxation procedure of Agmon.
Once we get close to P we switch to Phase II. In Phase II we consider the constraints of the set P and apply the Generalized Merzlyakov method. The motivation for the use of the Generalized Merzlyakov method at this stage is the fact that the set of violated constraints as we approach P is precisely the set of violated constraints at the limit point, and under these circumstances the Generalized Merzlyakov method can be very effective.
We first describe the procedure for finding a point of an
arbitrary polyhedron P.
The Algorithm
Phase I.
Let z⁰ ∈ C. If z^n ∈ P, stop; otherwise define

        z^{n+1} = T_C{z^n + λ d_M(z^n) a_{i_n}} = T_C(s^n),

where 0 < λ < 2, a_{i_n} corresponds to the most violated halfspace of M at z^n, and

3.14        s^n = z^n + λ d_M(z^n) a_{i_n}.

We will show that the sequence {z^n} is Fejer-monotone with respect to P.
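A sketch of the Phase I iteration on a small assumed instance, with M an intersection of halfspaces (unit normals assumed) and C a hyperplane so that T_C is closed-form; all data here are illustrative assumptions, not taken from the dissertation.

```python
import numpy as np

# Assumed toy instance: M = {x: A x + b >= 0}, C = {x: c'x = d}.
A = np.array([[1.0, 0.0],
              [0.0, 1.0]])       # unit interior normals
b = np.array([-1.0, -2.0])
c = np.array([1.0, -1.0]); d = 0.0

def proj_C(x):
    # orthogonal projection onto the hyperplane c'x = d
    return x - (c @ x - d) / (c @ c) * c

def phase1_step(z, lam=1.5):
    """One Phase-I step: relax toward the most violated halfspace of M,
    then project the point s^n back onto C."""
    viol = -(A @ z + b)              # positive where a_i'z + b_i < 0
    i = int(viol.argmax())
    if viol[i] <= 0:
        return z                     # z is already in M
    s = z + lam * viol[i] * A[i]     # s^n = z^n + lam * d_M(z^n) * a_i
    return proj_C(s)

z = proj_C(np.array([-3.0, -5.0]))   # start from a point of C
for _ in range(200):
    z = phase1_step(z)
assert (A @ z + b).min() >= -1e-6 and abs(c @ z - d) < 1e-9
```

The iterates stay in C and approach P = M ∩ C linearly, as proved below.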
3.15 Lemma [Goffin]

Let x, y, t be points of R^k and let x' = x + λ(t - x), where λ ∈ R. Then

        ||x' - y||² = ||x - y||² - λ(2 - λ)||t - x||² + 2λ(t - y)'(t - x).

Proof:

        ||x' - y||² = ||x + λ(t - x) - y||²
                    = ||x - y||² + [2(x - y) + λ(t - x)]' λ(t - x)
                    = ||x - y||² - λ(2 - λ)||t - x||² + λ[2(x - y) + 2(t - x)]'(t - x)
                    = ||x - y||² - λ(2 - λ)||t - x||² + 2λ(t - y)'(t - x).  ///
3.16 Lemma

        ||s^n - u||² = ||z^n - u||² - λ d_M(z^n)[(2 - λ) d_M(z^n) - 2a_{i_n}'(t^n - u)]

for u ∈ M.

Proof:

Substitute s^n for x', z^n for x, and t^n for t, where t^n = z^n + d_M(z^n) a_{i_n}. Then

        ||s^n - u||² = ||z^n - u||² - λ(2 - λ)||t^n - z^n||² + 2λ(t^n - u)'(t^n - z^n)
                     = ||z^n - u||² - λ(2 - λ) d_M²(z^n) + 2λ d_M(z^n) a_{i_n}'(t^n - u)
                     = ||z^n - u||² - λ d_M(z^n)[(2 - λ) d_M(z^n) - 2a_{i_n}'(t^n - u)].  ///
3.17 Theorem

Let M be a nonempty polyhedron of R^k. Then the sequence of points generated by equation (3.14) has the following property:

        ||s^n - u|| ≤ ||z^n - u|| for all u ∈ M.

Proof:

If u ∈ M, then u ∈ H_{i_n}; hence a_{i_n}'(t^n - u) ≤ 0 for all u ∈ M. By Lemma 3.16,

        ||s^n - u|| ≤ ||z^n - u||.  ///
3.18 Theorem

Let P be a nonempty polyhedron of R^k. Then the sequence of points {z^n} generated by equation (3.14) is Fejer-monotone with respect to P; in fact,

        ||z^{n+1} - u|| < ||z^n - u|| for all u ∈ P.

Proof:

Let u ∈ P and let a be the unique point obtained by projecting u onto the line formed by extending s^n - z^{n+1} (Figure 23). The angle formed at a is a right angle, and for the right triangle s^n, a, u,

        ||s^n - a||² + ||a - u||² = ||s^n - u||²

        ||s^n - z^{n+1}||² + ||z^{n+1} - a||² + 2||s^n - z^{n+1}|| ||z^{n+1} - a|| + ||a - u||² = ||s^n - u||²

or, since ||z^{n+1} - a||² + ||a - u||² = ||z^{n+1} - u||²,

        ||s^n - z^{n+1}||² + ||z^{n+1} - u||² + 2||s^n - z^{n+1}|| ||z^{n+1} - a|| = ||s^n - u||²

or

        ||z^{n+1} - u||² < ||s^n - u||².

The strict inequality arises from the fact that s^n and z^{n+1} are distinct, or else z^n ∈ P. But by Theorem (3.17), ||s^n - u|| ≤ ||z^n - u||; hence

        ||z^{n+1} - u|| < ||z^n - u||.  ///
In the next theorem we show that the sequence {zn} converges
linearly to P.
FIGURE 23
3.19 Theorem

The sequence of points {z^n} generated by the above procedure converges finitely or infinitely to a point of P. Furthermore, the convergence rate is linear.

Proof:

||z^{n+1} - T_P(z^{n+1})|| = ||z^{n+1} - u|| for a unique u ∈ P (the closest point of z^{n+1} to P). Thus

        ||z^{n+1} - u||² = ||T_C{z^n + λ d_M(z^n) a_{i_n}} - T_C(u)||²
                         ≤ ||z^n - u||² - λ d_M(z^n)[2a_{i_n}'(z^n - u) - λ d_M(z^n)],

since the projection map satisfies the Lipschitz condition with c = 1. Continuing,

        ||z^{n+1} - u||² ≤ ||z^n - u||² - λ d_M(z^n)[2 d_M(z^n) - λ d_M(z^n)],

since a_{i_n}'(z^n - u) ≥ d_M(z^n); hence

        ||z^{n+1} - u||² ≤ d²(z^n, P)[1 - λ(2 - λ)μ²]   (by Lemma 2.12)
                         = θ² d²(z^n, P).

We get

        d(z^{n+1}, P) ≤ θ d(z^n, P), where θ = [1 - λ(2 - λ)μ²]^{1/2}

and θ ∈ [0, 1). Hence

        d(z^n, P) ≤ θ^n d(z^0, P),

and if the sequence is infinite,

        lim_{n→∞} d(z^n, P) = 0.

Using Theorem [2.4] and the continuity of the distance function, this implies {z^n} converges to z* in the boundary of P.

If the sequence terminates at z*, then

        ||z* - T_P(z^n)|| ≤ ||z^n - T_P(z^n)|| = d(z^n, P)

by the Fejer-monotonicity of the sequence. Hence z* and z^n both belong to the ball B_{d(z^n, P)}(T_P(z^n)), and

        ||z* - z^n|| ≤ 2d(z^n, P) ≤ 2θ^n d(z^0, P).  ///
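The rate constant θ = [1 - λ(2 - λ)μ²]^{1/2} is smallest at λ = 1 for a fixed μ, which is the observation recalled in the Motivation above; a quick numerical check follows, where the value of μ (the constant from Lemma 2.12) is an assumption of the sketch.

```python
import math

mu = 0.5  # assumed obtuseness-type constant from Lemma 2.12

def theta(lam):
    # the linear rate bound of Theorem 3.19
    return math.sqrt(1.0 - lam * (2.0 - lam) * mu ** 2)

# lam(2 - lam) peaks at lam = 1, so theta is minimized there
assert all(theta(1.0) <= theta(lam) for lam in [0.25, 0.5, 1.5, 1.95])
assert abs(theta(1.0) - math.sqrt(1 - mu ** 2)) < 1e-12
```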
Phase II. When the last k points are within ε of each other, we switch to the Generalized Merzlyakov procedure. This is because if we are near the limit point, the only active constraints would be the tight constraints at the limit point of the sequence. In such a situation the Generalized Merzlyakov method can be very effective with a proper choice of λ and λ_i(x). We recapitulate that in Merzlyakov's procedure (which is a special case of the Generalized Merzlyakov method) the sequence is generated as follows:

        x^{n+1} = x^n - λ [Σλ_i(j, V)(a_i'x^n + b_i)][Σλ_i(j, V)a_i] / ([Σλ_i(j, V)a_i]'[Σλ_i(j, V)a_i]),

where x^n ∈ S_V.
If we assume that we know the set of constraints satisfied with equality by T_P(x^n), we can generate the halfspace

        H* = {x ∈ R^k: (T_P(x^n) - x^n)'(x - T_P(x^n)) ≥ 0},

which gives convergence to T_P(x^n) in exactly one step with λ = 1. Suppose we number the set of active constraints at T_P(x^n) by I* = {1, ..., p}. Let A* = (a_1, ..., a_p)' be the p × k matrix formed by the interior normals of the halfspaces H_i for i ∈ I*. Let b* = (b_1, ..., b_p)'. Define

        (λ_1, ..., λ_p)' = (A*A*')⁻¹(A*x^n + b*).

With this set of λ_i(j, V)'s, using the Merzlyakov method, Goffin [15] has shown that convergence to T_P(x^n) is obtained in one step. However, this requires a comparably larger amount of computation, and finding (A*A*')⁻¹ is not really practical on large problems. In this study we have therefore concentrated on the Generalized Merzlyakov method and attempted to find what would constitute a favorable set for λ and λ_i(x).
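Goffin's one-step weights can be illustrated directly: with λ = 1 and the weight vector (A*A*')⁻¹(A*x^n + b*), a single Merzlyakov-type step reaches the projection of x^n onto the affine set where the active constraints hold with equality. The small instance below is an assumed example, not data from the text.

```python
import numpy as np

# Assumed active-constraint data: project x0 onto {x : A x + b = 0}.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, -1.0]])       # rows = active-constraint normals
b = np.array([-1.0, 2.0])
x0 = np.array([3.0, -1.0, 0.5])

lam = np.linalg.solve(A @ A.T, A @ x0 + b)   # Goffin's weights
g = A.T @ lam                                # direction sum(lam_i a_i)
x1 = x0 - (lam @ (A @ x0 + b)) / (g @ g) * g

# one step lands exactly on the affine set, i.e., on the projection of x0
assert np.allclose(A @ x1 + b, 0.0)
assert np.allclose(x1, x0 - A.T @ np.linalg.solve(A @ A.T, A @ x0 + b))
```

The scalar step length works out to exactly 1 with these weights, which is why the move coincides with the orthogonal projection; the cost is forming and solving with the Gram matrix A*A*', which is what makes the scheme impractical for large problems.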
Application to Solving Large Linear Programs
We now consider how the procedure could be used to solve
large linear programs.
Consider the LP

        max  c'x
        s.t. Ax ≤ b
             x ≥ 0

and its dual

        min  π'b
        s.t. A'π ≥ c
             π ≥ 0.

By strong duality, an optimal π and x satisfy b'π - c'x ≤ 0. Let

        C = {(π, x): b'π - c'x ≤ 0},
        M = {(π, x): Ax ≤ b, A'π ≥ c, x ≥ 0, π ≥ 0},

and P = M ∩ C.
This is only one of the ways of partitioning the constraints of P. There is in fact great flexibility in the choice of C and M. The advantage of splitting C and M as indicated above is:

(a) M has a special structure and can be decomposed along π and x. This leads to savings in storage as well as easier arithmetic.

(b) C contains only the coupling constraint and is easy to project to.

However, there are other ways of obtaining P. Another choice could be to let C have only the nonnegativity constraints. Again, the advantage of such a construction would be ease of projection to C. Still another choice is for C to take the form

        C = {(π, x): b'π - c'x ≤ 0, x ≥ 0, π ≥ 0}.
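The claim that C is easy to project onto rests on the fact that projection onto a single halfspace is closed-form. A sketch for the coupling constraint b'π - c'x ≤ 0 follows; the b and c data are assumptions for illustration.

```python
import numpy as np

def proj_halfspace(y, g, r):
    """Project y onto {z : g'z <= r}: move along g only if violated."""
    s = g @ y - r
    return y - max(s, 0.0) / (g @ g) * g

# For C = {(pi, x): b'pi - c'x <= 0}, stack z = (pi, x) and g = (b, -c).
b = np.array([4.0, 1.0]); c = np.array([2.0, 3.0])
g = np.concatenate([b, -c])
z = np.array([1.0, 1.0, 0.0, 0.0])     # here b'pi - c'x = 5 > 0: violated
z_proj = proj_halfspace(z, g, 0.0)
assert g @ z_proj <= 1e-12             # the projection lies in C
```

The projection touches every coordinate of (π, x) but costs only one inner product, which is what makes this choice of C attractive.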
There is user discretion in this algorithm in the choice of M and C, λ and z⁰ in Phase I, and λ and the weights λ_i(x) in Phase II. We have already commented on the possible strategies for selecting M and C. We give below strategies for selecting the other parameters. We will first discuss the parameters for Phase I.

(1) λ: If 0 < λ ≤ 1, convergence may be slow if some of the "angles" of M are very acute (small obtuseness index of P). In this case, increasing λ to near 2 will have the effect of opening the angles and accelerating the convergence. It therefore appears to be a good strategy to have λ approximately equal to 2 in Phase I.
(2) z⁰: In line with the general observation for relaxation methods that any a priori information on the location of the solution could be used to advantage, if we have to solve a linear program with only slightly varying data, we could take advantage of this feature. However, in general, when we do not have any such prior information, the choice of z⁰ could be dictated by the structure of C, so that we can start with a feasible point of C.
Now for the parameters of Phase II:

(1) λ: As a general guideline, 1 < λ < 2 may be chosen as the parameter for Phase II, as was done in Phase I. However, Table 3 and Figure 24 show the sequences obtained with λ = 1.5 and λ = .75 and illustrate that in some cases a choice of 0 < λ < 1 may give better results.
(2) λ_i(j, V): In the Merzlyakov method the choice of weights λ_i(j, V) is crucial and in large measure dictates the convergence of the sequence. It may be noted that the λ_i(j, V)'s determine both the direction as well as the step size. Suppose a_i, i ∈ K, represent the set of violated constraints in subcavity S_V, where x^n ∈ S_V. We can solve the quadratic problem of finding the point of minimum norm in the convex hull of the finite point set a_i, i ∈ K, to get the weights λ_i(j, V). Such a direction locally has a great deal of merit. However, the computational effort is too great to merit its consideration, since at each iteration we would have to solve a quadratic program. Also, it may not always lead to the best set of weights, as shown by the counterexample given in the next paragraph. Instead we specify λ_i(j, V) = 1/|K|, where |K| represents the number of constraints violated at x^n. When |K| = 2, the direction obtained coincides with the vector of minimum norm.
The strategy suggested by us is not the best in all cases, as shown by the following counterexample. Consider the constraints

        x_1 ≥ 1
        x_2 ≥ 2

with x⁰ = (0, 0)'.
TABLE 3

Case 1: λ = .75.  λ_i(j, V) = 1 for all i ∈ V.

  n     x1^n     x2^n     Σλ_i(j,V)(a_i'x^n + b_i)    Σλ_i(j,V)a_i      [Σλ_i(j,V)a_i]'[Σλ_i(j,V)a_i]
  0     0.000    0.000    2.873                       (.287, .958)      1
  1     .619     2.064    1.838                       (.188, .037)      .037
  2     7.622    .686     .459                        (.188, .037)      .037
  3     9.371    .342     .273                        (.099, .995)      1
  4     9.351    .138     .125                        (.188, .037)      .037
  5     9.827    .044     .032                        (.188, .037)      .037
  6     9.949    .020     .010                        (.099, .995)      1
  7     9.948    .013     .008                        (.188, .037)      .037
  8     9.978    .007     .002                        (.287, .958)      1
Case 2: λ = 1.5.  λ_i(j, V) = 1 for all i ∈ V.

  n     x1^n     x2^n     Σλ_i(j,V)(a_i'x^n + b_i)    Σλ_i(j,V)a_i      [Σλ_i(j,V)a_i]'[Σλ_i(j,V)a_i]
  0     0.000    0.000    2.873                       (.287, .958)      1
  1     1.237    4.128    3.235                       (.099, .995)      1
  2     .757     .700     3.326                       (.287, .958)      1
  3     2.189    4.079    3.280                       (.099, .995)      1
  4     1.702    .816     3.166                       (.287, .958)      1
  5     3.065    3.733    3.022                       (.099, .995)      1
  6     2.616    .777     2.866                       (.287, .958)      1
  7     3.850    3.347    2.710                       (.099, .995)      1
  8     3.447    .704     2.558                       (.287, .958)      1
The problem considered is

        P = {(x_1, x_2): .099x_1 + .995x_2 + .995 ≥ 0,
                         .287x_1 - .958x_2 - 2.873 ≥ 0,
                         x_1 ≥ 0, x_2 ≥ 0}.
FIGURE 24 (sequences for λ = 1.5 and λ = .75)
From the proof of Theorem 3.4 we have

        d²(x^{n+1}, P) ≤ d²(x^n, P) - λ(2 - λ)[Σλ_i(x^n)(a_i'x^n + b_i)]² / ((Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)).

Thus, to get the best set of λ_i's, we may solve the following optimization problem:

        max  k = [Σλ_i(a_i'x^n + b_i)]² / ||Σλ_i a_i||².

Suppose λ* solves the above; then λ*/||Σλ_i* a_i|| also solves the above. Hence an equivalent problem is

        max  Σλ_i(a_i'x^n + b_i)
        s.t. ||Σλ_i a_i|| = 1.
In our specific problem,

        max  λ_1 + 2λ_2
        s.t. λ_1² + λ_2² = 1

or

        max  √(1 - λ_2²) + 2λ_2.

This gives λ_1 = 1/√5 and λ_2 = 2/√5. This set of λ_i's gives

        x¹ = (1, 2)',

which is feasible, with k = 5. Had we specified λ_1 = λ_2, i.e., λ_i = 1/2,

        x¹ = (3/2, 3/2)',

which is not feasible, with k = 4.5. Thus our prescription does not lead to the best set of λ_i's; it is, however, a good heuristic and easy to implement.
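The arithmetic of this counterexample can be reproduced directly; the sketch below recomputes k and the resulting points x¹ for the optimal weights and for the equal weights.

```python
import numpy as np

# The counterexample data: constraints x1 >= 1, x2 >= 2, start x0 = (0, 0).
A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, -2.0])
x0 = np.zeros(2)

def merzlyakov_point(lam, w):
    """One generalized Merzlyakov step from x0 with weights w."""
    g = A.T @ w
    return x0 - lam * (w @ (A @ x0 + b)) / (g @ g) * g

def k_value(w):
    g = A.T @ w
    return (w @ (A @ x0 + b)) ** 2 / (g @ g)

# best weights here: maximize lam1 + 2*lam2 on the unit circle,
# i.e. weights proportional to the violations (1, 2)
w_best = np.array([1.0, 2.0]) / np.sqrt(5.0)
w_equal = np.array([0.5, 0.5])

assert abs(k_value(w_best) - 5.0) < 1e-12
assert abs(k_value(w_equal) - 4.5) < 1e-12
assert np.allclose(merzlyakov_point(1.0, w_best), [1.0, 2.0])    # feasible
assert np.allclose(merzlyakov_point(1.0, w_equal), [1.5, 1.5])   # infeasible
```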
(3) λ_j(x): In the Generalized Merzlyakov method, the weights λ_j(x) are determined by the point alone and are not fixed for a subcavity. The following appears intuitively to be a good set of weights, since the weightage depends on the magnitude of violation of a constraint at the specific point:

        λ_j(x) = (violation of constraint j at x) / (total violation at x).
Computational Results
The algorithm described in the previous section was coded primarily to come up with a favorable set of values for the arbitrary parameters: λ in Phase I, and λ and the weights λ_j(x) for Phase II. It may be stated at the outset that selection of these parameters is an art rather than a science, and it would be hard to specify the best set of parameters for the general case.

The test problem considered was the problem on "Lease-Buy Planning Decisions" on pages 93 to 114 of Salkin and Saha's "Studies in Linear Programming." The problem has 17 constraints and 20 variables. Together with the dual and nonnegativity constraints, set M thus had 74 constraints. Set C consisted of the constraint obtained by using strong duality. This problem has a unique solution. The constraints of the problem are reproduced in the Appendix.
Test results corroborate the general guidelines for selection of these parameters indicated in the previous section. The distance from a point to P, where P = M ∩ C, was calculated at each iteration. The number of iterations required to reduce the distance from P by a certain amount was used as the criterion of improvement. Two different starting points were selected. Results of computations are summarized below:
Phase I

The relaxation parameter λ was varied in the interval (0, 2). For the first starting point, the number of iterations required to reduce the distance d from P from d = 74 to d = 60 was noted, and for the second point the number of iterations required to reduce the distance from d = 135 to d = 132 was noted. These results are summarized in Tables 4 and 5, respectively. The results are in line with our expectation, based on intuition, that a higher λ has the effect of accelerating convergence.
Phase II

The Phase II relaxation parameter λ was varied in the interval (0, 2), and the following alternatives were tested for the weights λ_j(x):

        (a) λ_j(x) = 1 / (number of violated constraints at x)
        (b) λ_j(x) = (violation of constraint j at x) / (total violation at x)
        (c) λ_j(x) = [(violation of constraint j at x) / (total violation at x)]²
        (d) λ_j(x) = [(violation of constraint j at x) / (total violation at x)]³
        (e) λ_j(x) = [(violation of constraint j at x) / (total violation at x)]⁴
(f) Let r_j(x) = (violation of constraint j at x) / (total violation at x). Then

        λ_j(x) = r_j(x)        if r_j(x) ≥ .3,
TABLE 4

Initial d = 74

   λ       Final d    No. of iterations
  .25      63.757     > 20,000
  .5       60.0       12,955
  .75      59.999     7,339
  1.0      60.0       4,866
  1.25     59.997     3,469
  1.5      59.997     2,565
  1.75     59.996     1,927
  1.95     59.997     1,525
TABLE 5

Initial d = 135

   λ       Final d    No. of iterations
  .25      133.113    > 20,000
  .5       132.0      12,698
  .75      132.0      6,692
  1.0      132.0      4,208
  1.25     132.0      2,934
  1.5      131.999    2,137
  1.75     131.999    1,587
  1.95     131.999    1,263
        λ_j(x) = 3.3 r_j(x)    if .1 ≤ r_j(x) < .3,
        λ_j(x) = r_j(x)        if r_j(x) < .1,

with r_j(x) as defined under (f).
(g) With r_j(x) as in (f),

        λ_j(x) = 100 r_j(x)    if r_j(x) ≥ .3,
        λ_j(x) = 10 r_j(x)     if .2 ≤ r_j(x) < .3,
        λ_j(x) = .3 r_j(x)     if .1 ≤ r_j(x) < .2,
        λ_j(x) = r_j(x)        if r_j(x) < .1.
Results of alternative (a) are given in Tables 6 and 7 for the same two starting points used in Phase I. From Tables 6 and 7 it would appear that again a λ of around 2 gives the best convergence, but the counterexample in the previous section shows that this is not always the best policy with this set of weights.
Results of alternative (b) are given in Table 8.
TABLE 6

(a) λ_j(x) = 1 / (number of violated constraints at x)

Initial d = 74

   λ       Final d    No. of iterations
  .25      --         time limit exceeded
  .5       --         time limit exceeded
  .75      60.0       14,839
  1.0      60.0       8,028
  1.25     59.998     4,702
  1.5      59.998     2,861
  1.75     60.0       1,424
  1.95     59.857     850
TABLE 7

(a) λ_j(x) = 1 / (number of violated constraints at x)

Initial d = 135

   λ       Final d    No. of iterations
  .25      --         time limit exceeded
  .5       133.45     > 20,000
  .75      132.0      15,249
  1.0      131.999    7,568
  1.25     131.999    4,135
  1.5      131.999    2,435
  1.75     131.998    1,212
  1.95     132.0      713
TABLE 8

(b) λ_j(x) = (violation of constraint j at x) / (total violation at x)

Initial d = 74

   λ       Final d    No. of iterations
  .25      59.999     1,253
  .5       59.964     1,396
  .75      59.997     1,505
  1.0      60.0       1,084
  1.25     59.999     1,254
  1.5      59.976     1,061
  1.75     59.997     863
  1.95     59.998     894