Title Page
 Table of Contents
 The problem: Why iterative...
 Background results and tools
 Theoretical results
 Biographical sketch

Title: Iterative methods for solving large-scale linear programs
Full Citation
Permanent Link: http://ufdc.ufl.edu/UF00097456/00001
 Material Information
Title: Iterative methods for solving large-scale linear programs
Physical Description: iv, 106 leaves : ill. ; 28 cm.
Language: English
Creator: Kumar, Gollakota Surya, 1939-
Publication Date: 1979
Copyright Date: 1979
Subject: Linear programming   ( lcsh )
Management thesis Ph. D   ( lcsh )
Dissertations, Academic -- Management -- UF   ( lcsh )
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
Statement of Responsibility: by Gollakota Surya Kumar.
Thesis: Thesis--University of Florida.
Bibliography: Bibliography: leaves 101-104.
Additional Physical Form: Also available on World Wide Web
General Note: Typescript.
General Note: Vita.
 Record Information
Bibliographic ID: UF00097456
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: alephbibnum - 000096516
oclc - 06429711
notis - AAL1949


This item has the following downloads:

iterativemethods00kumarich ( PDF )

Table of Contents
    Title Page
        Page i
        Page ii
    Table of Contents
        Page iii
        Page iv
        Page v
        Page vi
        Page vii
    The problem: Why iterative techniques?
        Page 1
        Page 2
        Page 3
        Page 4
        Page 5
        Page 6
        Page 7
    Background results and tools
        Page 8
        Page 9
        Page 10
        Page 11
        Page 12
        Page 13
        Page 14
        Page 15
        Page 16
        Page 17
        Page 18
        Page 19
        Page 20
        Page 21
        Page 22
        Page 23
        Page 24
        Page 25
        Page 26
        Page 27
        Page 28
        Page 29
        Page 30
        Page 31
        Page 32
        Page 33
        Page 34
        Page 35
        Page 36
        Page 37
        Page 38
        Page 39
        Page 40
        Page 41
        Page 42
        Page 43
        Page 44
        Page 45
        Page 46
        Page 47
        Page 48
        Page 49
        Page 50
        Page 51
        Page 52
        Page 53
        Page 54
        Page 55
        Page 56
        Page 57
        Page 58
        Page 59
        Page 60
        Page 61
    Theoretical results
        Page 62
        Page 63
        Page 64
        Page 65
        Page 66
        Page 67
        Page 68
        Page 69
        Page 70
        Page 71
        Page 72
        Page 73
        Page 74
        Page 75
        Page 76
        Page 77
        Page 78
        Page 79
        Page 80
        Page 81
        Page 82
        Page 83
        Page 84
        Page 85
        Page 86
        Page 87
        Page 88
        Page 89
        Page 90
        Page 91
        Page 92
        Page 93
        Page 94
        Page 95
        Page 96
        Page 97
        Page 98
        Page 99
        Page 100
        Page 101
        Page 102
        Page 103
        Page 104
    Biographical sketch
        Page 105
        Page 106
        Page 107
Full Text


Gollakoca Surya Kumar



19 79


I wish to express my deepest gratitude to Professor Gary

Koehler, my dissertation chairman, for the excellent guidance, the

constant encouragement and the support he has provided me during

my career at the University of Florida. Despite his busy schedule

he was always available at a moment's notice.

I also wish to thank Professor Antal Majthay whose depth of

knowledge not only in the area of operations research but on a

wide range of subjects will always remain awe-inspiring. I am

also grateful to Professors Ira Horowitz and Henry Tosi for

providing financial assistance during my stay at the University of

Florida. I also wish to thank Ms. Kathy J. Combs for her superb

typing of the manuscript.

Last but not the least this work would not have been possible

but for the patience, understanding and encouragement of my wife




ACKNOWLEDGEMENTS ............................................ ii

ABSTRACT ............................ ... ....... .... .... ..* v



TWO BACKGROUND RESULTS AND TOOLS ..................... 8

Introduction ............. .... ............... 8
Motivation ................................ .... 8
Task ... ........... ......................... .. 11
Definitions .................... ......... .. 12
Typical Relaxation Method ....................... 13
Procedure ....................... ............. 13
Variations .................................... 14
Convergence ................................... 14
Fejer-monotone sequence ..................... 14
Motzkin and Schoenberg ..................... 16
Relaxation Procedures ......................... 18
Agmon .................................... ..... 18
Motzkin and Schoenberg, Eaves ............... 21
Goffin ........................................ -
Definitions .................................. 22
Another proof of Agmon's result .......... 33
Range of A for finite convergence ........ 37
Merzlyakov .................................... 39
Definitions and motivation ............... 39
Method ....................................... 40
Generalized Merzlyakov procedure ....,..... -
Typical Subgradient Method ..................... 6
Procedure .................................... 6
Variations .................................
Some Subgradient Procedures ...,................. '
Polyak ... ........ ...........................
Oettli ...................................... 51
Some of the more recent algorithms .......... 5)
Camerini, Fratta and Maffioli ............ 5
Wolfe ...................................... 59



THREE THEORETICAL RESULTS ................................ 62

Comparison of Generalized Merzlyakov Method
and Some Typical Subgradient Methods ............ 62
Generalized Merzlyakov method ................ 62
Further developments of Oettli method ........ 66
Generalized Merzlyakov method and Oettli method 68
Comparison with some other subgradient
methods ...................................... 71
An Algorithm Using Relaxation Method ............ 73
Motivation ................. ................. 73
The algorithm ............................... 74
Phase I ............... ......... ......... 74
Phase II ................................. 79
Application to solving large linear
problems .................................... 80
Computational results ........................ 86
Concluding Remarks .................... ......... 98

APPENDIX ....................................................... 99

BIBLIOGRAPHY ................................................. 101

BIOGRAPHICAL SKETCH ................ .......................... 105


Abstract of Dissertation Presented to the GrAduate Council
of the University of FloridJ
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy


B '

Gollakota Surya Kuirar

August 1979

Chairman: Gary J. Koehler
MIjor Department: Management

The simplex: method for solving linear programs has achieved

its overwhelming success over the years due to the advances during

the last three decades on extending the effectivEness of the basic

algorithm introduced by Danczig in 1951. However the largest

problem that can be economically solved by this method is dictated

by the size of the basis. The limitation being the ability to

maintain a sparse yer accurate representation of che basis inverse

CocuTiercial computer codes using the recent developments of the

simplex method are capable of solving manLIem,- Lical program systems

uf Llie order of 10,'i.iUU rows. IJowev.er, inrker l incur p.rogr ns do

arise in practice. The reason why LIe', are not being set up is

because none of the production LP codes can solve such laree

problems in a timely and cost effective fashion. For such large

problems iterative techniques that do not require a basis may be

an attractive alternative.

We review two classes of iterative techniques--relaxation

methods and subgradient optimization. Solving a linear program

can be shown to be identical to finding a point of a polyhedron P,

the polyhedron being formed from primal and dual constraints and an

additional constraint using strong duality. In relaxation methods

we successively find points of relaxed forms P c pn. That is, at

iteration n we find an x Pn such that P c P. We assume P # 0.

The problem is solved when the sequence converges to a point

x* P.

Subgradient methods are an alternative class of iterative

techniques for solving these problems. In subgradient methods we

minimize a convex though not necessarily differentiable function

n+l n n n n
f(') by the iterative scheme x = x t u where t > 0 is a

n n n
a scalar obtained by the approximate minimization of f(x t u),
n n
and u is a subgradient of f at x When we use the minimization

method for finding a point of a polyhedron P, the function f is

so defined that if x* is a minimum of f then x* E P.

These iterative techniques work with original data; thus

advantages of supersparsity can be fully realized and the program

run in core. Also they are self correcting and knowledge of a

vector close to the solution can be used to advantage.

We show that a generalized form of Merzlyakov's relaxation

procedure subsumes most of the recent subgradient procedures when

the objective is to find a point of a polyhedron. A new algorithm

using relaxation method is presented. This algorithm was coded to

explore values for the relaxation parameters used in the algorithm.



Mathematical programming has achieved its present popularity

in a large measure due to the success of the simplex method for

solving linear programs published by Dantzig [6] in 1951. Over the

years considerable effort has been expended towards exploiting and

extending the effectiveness of this procedure. We will briefly

outline some of the improvements.

Most practical linear programs have a sparse matrix [2] and it

is necessary to exploit this sparseness to reduce memory and time

requirements and to maintain accurate information. The first such

procedure was the revised-simplex procedure. This procedure

operates only on original problem data and requires an inverse of

the current basis. If there are m constraints and n variables

(including slacks, surplus and artificial) then the regular

simplex procedure requires

(1.1) (m + l)(n + 1)

storage locations. There are m + 1 rows (including the objective

and n + 1 columns (including the right-hand side). In the revised-

simplex procedure we need

(1.2) p(m + l)(n + 1) + m2

where p is the problem density. (1.1) is much larger than (1.2)

if p is close to zero.

It was also discovered that if the inverse matrix were stored

in product form [29] an additional decrease in storage requirements

could be realized. However, in product form, the basis inverse

grows in size at each iteration so that eventually we must either

run out of room or scrap the current form of the basis inverse and

reinvert the current basis. In practice this happens quite often.

Using product form inversion, the storage requirements are

(1.3) p(m + l)(n + 1) + qmr

where q is the average density of the eta vectors needed in the

product form inverse and r is the average number of eta vectors.

Notice, though, that to gain the decrease in (1.3) over (1.2) we

must spend time in the reinversion step.

It was soon realized that there are many ways of forming a

product form of a basis matrix. For instance, the matri:-; could

first be decomposed into the product of two triangular matrices.

The eta vectors for these are sparse and easily formed. In

addition, within a class of procedures for viewing the original

basis, one can choose the row-column pivot order to minimize the

creation of new members -- i.e. one can choose the order in forming

the eta vectors to minimize the density of the resulting basis

inverse representation [20]. A related issue has to do with

updating the currect basis inverse with a new eta vector. ,inc cjii

perform this operation with a minimal buildup in storage

requirements [14].

Kalan [22] pointed out that of the p(m + l)(n + 1) elements of

a typical LP problem, most are identical (usually most are + Is).

Kalan suggested storing only the unique numbers of an LP problem

and using pointers to store the problem structure. In so doing

one stores fewer elements than exist in the problem. The new

sparseness is termed supersparsity. Kalan noted that not only can

original data be stored this way but also new data (formed in the

basis inverse) can be stored in this fashion.

Another useful idea -- scaling of data -- has numerical

stability for its motivation. Generally for real problems the

coefficients of matrix may take on a wide range of values. This

can cause numerical difficulties -- for example in inversion --

since the computer must round off fractions. To reduce the effect

of this problem, the matrix entries are scaled before starting the


There are a number of ways in which to obtain a starting

basis: from scratch using an "artificial basis," using a CRASHed

basis forcing some of the variables into the basis, or based on

information about the problem. The last approach is especially

useful in solving a series of problems with only minor variations

in data.

Candidate selection can play an important part in the

efficiency of an LP algorithm. There are a number of ways to select

the entering variable -- "multiple pricing" is one of them which is

especially useful for large problems. In this approach we typically

divide the variables into segments and pick the segments sequentially.

In the chosen segment we select the g most negative reduced cost

variables as candidates for pivoting. We concentrate on these g

non-basic variables for the remainder of some h iterations. And

then repeat the procedure with the next segment. A more recent

development concerning pivot selection is due to Paula Harris [17].

The reduced costs here are weighted prior to selection. The

weights are chosen with respect to a fixed set of non-basic

variables. The idea is that in the traditional most negative

reduced cost approach we are trying to find out the best candidate

but on a local basis -- locally we move in the direction of

steepest ascent of the objective function. Such a selection may

not serve global interests. By constantly adjusting the pricing

by weights derived from a fixed reference point, the global

interests of the problem are hopefully maintained. Expr rimental

results corroborate this expectation and show it to be a very

effective procedure.

When there is degeneracy in a linear programming Drobleim,

there exists the possibility of a set of basic feasible solutions

occurring repeatedly -- called "cycling." Bland 13] has introduced

several new methods to avoid cycling -- the simplest of whicn is

to select as entering variable one with the smallest subscript

among all candidates to enter basis. If there is a tic for leaving

variable, select the one with the smallest basic subscript.

Parallel with efforts to increase the size of problcim that

can be tackled by the simplex method have been efforts to incrcnse

its capabilities for problems having special structure. l ie

computing effort using the simplex method depends primarily, on the

number of rows; so it is worthwhile considering the possibillty of

taking care of some constraints without increasing the size of the

basis. Generalized Upper Bounding (GUB) technique introduced by

Dantzig and Van Slyke 19] achieves this for contraints of the form

(1.4) Ex. = b.
J 1

with the restriction that the same variable does not occur in more

than one GUB constraint. Decomposition technique is another method

of extending the capability of problem solving by breaking it into

smaller subproblems and using a master program iteratively to tie

them together. Unfortunately the convergence of this method is

rather slow and in addition one is unable to compute bounds on the

value of the objective function at each iteration, unless each of

the subproblems is bounded.

Despite the far reaching advances made during the last three

decades in extending the capability of the simplex method, some of

which have been mentioned above, the basis inverse still remains

the limiting factor dictating the size of the problem that could

be tackled. Even when we use variants like GUB and decomposition

techniques, the size of the basis is still the bottleneck. In the

case of GUB it is the basis of the master problem consisting of

other than GUB constraints and with decomposition it is the size of

the basis of the largest of subproblems. Thus this basis inverse

is the heart of all that is bad in simplex method related techniques

of solving an LP.

Commercial computer codes using above developments of simplex

method are capable of solving mathematical programming systems of

the order of 10,000 rows. However, useful linear programs of such

magnitude do arise in practice for example in food, oil and

chemical related industries. The reason why larger ones are not

being set up and solved is that no production LP code has been

developed to solve such large problems in a timely and cost

effective fashion. This is not to say that there are not large

linear programming production codes. For example, IBM's MPSX/370,

CDC's APEX, Honeywell's LP 6000 X, Techtronic's

MPS3, and others are reliable and available at reasonable rates.

However, one would not seriously consider using one of these to

solve a lincar program of 10,000 or more rows on a routine basis

unless the problem possessed some nice structure or the resulEs of

the problem solution had a high economic value.

Dantzig et. al. [S] in their justification for the setting up

of a system optimization lab for solving large scale models have


Society would benefit greatly if certain
total systems can be modeled and successfully
solved. For example, crude economic planning
models of many developing countries indicate
a potential growLh race of 10 to 15. per year.
To implement such a 2rou.thl (aside from political
differences; requires a carefully worked out
detailed model and the availability of computer
programs that can solve the resulting large-
scale systems. The world is currently faced
with difficult problems related to population
grouch, availability of natural resources,
ecological evalu.'t ions and control. urban
redesign. design :f lar.c scalc cn.inccrin:.,
systemb'ni (e.g. atomic en r,;y .id ru:yc lin*i
systcni. and the modeling of mrn;n's plih iologican
.v teCm for the p purpose of diagnosis and trc;itmenic c.
These problems are complex: and urgent and can
only he solved if viewed as cotal systems. If
not, [lhcn oiily patclhuork picconical solutions s w. ill
be developed (as it las been in the past) and
the u.rrld will continue to be plagued by one crisis
after another caused by poor planning techniques.

The bottleneck in solving such large unstructured LPs is the

inability to maintain a sparse yet accurate representation of the

basis inverse. When the simplex method is not computationally

feasible for such large problems, iterative techniques that do not

require a basis inverse may be an attractive alternative. In this

dissertation we concentrate on iterative procedures for solving

linear programs.

We discuss iterative techniques under the headings (a)

relaxation methods and (b) subgradient methods. These methods are

insensitive to the number of constraints. In fact an increase in

the number of constraints for the same problem improves the

convergence of the relaxation methods. Unlike the simplex method,

iterative techniques are self-correcting. We always use original

data thus advantages of supersparsity [22] can be fully realized

and the program run in main memory. Finally knowledge of a vector

close to the solution can be used to advantage. This is very

useful when a sequence of problems has to be solved with only

slightly varying data.

As an aside to the main thrust of this dissertation, we note

that iterative methods are more attractive for solving certain

large Markov decision processes and also for problems having a

Leontief Substitution constraint set [23, 25]-both of these problems

are an important special class of LPs.




Mo tivation

In this chapter we will review various iterative techniques

for finding a point of a polyhedron. We are interested in finding

a point of a polyhedron because a wide variety of important

mathematical programs can be reduced to such a problem. For

example, solving a linear program could be reduced to finding a

point of a polyhedron using primal and dual constraints along wich

an additional constraint implying strong duality. To elucidate tliis

further, consider the following primal-dual linear programming pair:

(2.1) Primal Dual

Max c'x Min v'b

s.t. Ax b s.t. v'A c c

x >O v 0

Here A is m x n; c is n x 1 and b is m x 1. v' stands for cranspose

of v. Problem 2.1 can be restated as

(2.2) -Ax + b 0

v'A c 0

v'b c'x 0

v 2 0, x 0

Let P = {((): (v) satisfies 2.2} be the set of solutions to 2.2.

()E P implies that x solves the Primal problem in 2.1 and v solves

the dual problem.

Linear programs of the form

z* = Max Min (a'x + b.)
1 1
x i=l,...,m

relate directly to the problem of finding a point of a polyhedron P

P = {x C R : a'x + b. z*, i = 1, ..., m}
1 1
Such problems arise, for example, in large-scale transportation

problems and decomposition problems [27]. We will develop such a

formulation for the transportation problem.

Given k origins with stocks aQ > 0 of a single commodity, and

m destinations with demands d. > 0 the transportation problem involves

determining a shipping schedule that meets all demand and supply

constraints at minimum cost. Let

zyj be the amount shipped from I to j

cj cost of shipping one unit from Z to j.

The problem is
k m
Min Z E c ,j zzj
Z=l j=1
subject to
E z j a = 1, ... k

E = d. j = 1,..., m
%=i j J

z > 0
j =

Without loss of generality, we assume

k m
Z a = d
=1l j=l 1

If m and k are too large to be handled by conventional network

methods [4], then the decomposition principle can be used. Let

(z : = ., n; j = 1, m): i I'}

be a list of extreme points of

E z k = d., j = 1, ..., m

z >2 0

Then the transportation problem is equivalent to the master problem

k m
Min E Z c Z z A
k=l j=1 idI

subject to
a -k z AP = 0, Z = 1, .. ., k
il j=l

SA = 1

A1 0.

k m
b Z Z cj z
i =1 j=1

-i i i
a = a z z

The master problem becomes

Min E bii

subject to
i i
A = 0, 1 = 1, ..., k
iE I


i 0

The dual to the above problem is

Max w

subject to
-E a x + w b

w b + a'x

ai i k
ai = (al' a2, ..., ak) c R

This problem is equivalent to

Max, Min rax + b.
iax + b.)
xER icI 1 i


Two of the methods of finding a point of a polyhedron are the

Fourier-Motzkin Elimination Method [7] and a Phase I simplex method.

The Elimination Method reduces the number of variables by one in

each cycle but may greatly increase the number of inequalities in

the remaining variables. For larger problems this procedure is

impractical [26] -- since the number of inequalities growsat a rapid

rate. The Phase I simplex procedure can also be used to find a

point of a polyhedron. For very large problems the simplex method

breaks down due to a practical limitation on the size of the matrix

that can be inverted as explained in the last chapter. It is for

problems of this nature that iterative methods may be an attractive

alternative. We discuss iterative techniques under two classes --

relaxation methods and minimization methods.

In relaxation methods, at each iteration we attempt to find a

point of a relaxed form Pn of P. That is at iteration n we find an
n n n
x P such that P c P We look for the property that the sequence

of points thus generated is pointwise closer to P. Minimization

methods attempt to solve the problem of minimizing a general, maybe

non-differentiable, convex function f('). The sequence generated

generally has the property that the successive iterates are closer

to the level set of solutions in distance, x* is one such point.

The value of the objective function need not necessarily improve ac

each iteration. When we use the minimization method for finding a

point of P, the function f is so defined that if x* is a minimum of

f then x* E P.

We will now review the relaxation methods. As was mentioned

n n
earlier, we find at each iteration n, a point xn p where P is a
relaxed form of P. That is P c P We continue finding points

successively in this manner till we can find an x* E P. :L leasE

one such point exists by our assumption that P J 0.


We need a few definitions before some of tle methods c.n be

described. Let

R.(x) = a'x + b
1 1 1

k k
where i E I, I # 0 and finite, x R a. R and b. E R. Without
1 1
loss of generality assume a.i = 1 for all i, where 11 II is the

Euclidean norm. H. given by
H. = {x c Rk: k.(x) 0)
1 1

is the halfspace associated with i C I. P given by

P = n H. = {x e Rk: (.(x) 2 0, j I} is the polyhedron
1 1

of interest. E. given by

E. = {x C R: L. (x) = 0}
1 1
is the plane associated with halfspace 11 d(x,H.) given by

d(x, H.) = inf II x-zI

z H.
is the Euclidean distance from x R to halfspace H.. d (x) given
1 p

d (x) = max d(x, H.)
p i
i I

is the maximal distance from x to P. B (x) given by

B (x) {ycRk: 1 x-yjll r}
is the ball with radius r and center x. S (x) given by
Sr(x) = {yRk: Ix-y| = r}

is the k-l dimensional sphere of radius r and center x

Typical Relaxation Method


A typical relaxation method for finding a point of P is
o k
(1) Pick x R arbitrarily

(2) If xn P stop, otherwise let i* be the

most isolated halfspace, i.e.,
n n
-a'. : x b.* > -a' x b., for all iel.
1 1 = 1 1

We obtain the sequence {xn} recursively as follows:

n+1 n +n n n
x = x + A (t x )
n n n n
where t = PE (x ) is the projection of x on Ei,. A is called

the relaxation parameter at iteration n. Generally An is constant

across n = 1, 2, ...


If An = 1 for all n,we call the procedure a projection method,

and when An = 2 for all n,we call it a reflection method. See

Figure 1 for an example of the relaxation procedure. In this

example we use A = A = 2, for all n. At xo, E is the plane

corresponding to the most violated halfspace. We reflect in E.
1 1 2
to get x At x we reflect in E3 to get x If we concinlju

the procedure we are in P in four iterations.


Fejer-monotone sequence

We are ultimately interested in finding a point of P or at least

finding a sequence of points which converge to a point of P. If the

sequence terminates with a point of P (as in Figure 1) we ha.'e

accomplished our goal. If the sequence does not terminate, then .'e

wish to know whether it converges to a point of P. A well kno..in

result concerning "Fejer-monotone" sequences will be used to shed

light on this question.

a3 + b> 0
3* 3 =

alx + b > 0
1 1=


a2x + b2 0
a 2 + b 2


A sequence of points {xn} in Rk is said to be Fejer-monotone

[13, 31] with respect to the set P if
n+l n
(i) xn+ xn and

(ii) I x+ zll I xn z | for all z P and for all n.

Motzkin and Schoenberg

A well known result in analysis is that if {sn) is monotone,

then {sn) converges if and only if it is bounded. Hence we see that

a Fejer-monotone sequence always converges in distance. However, we

need to relate this convergence to the convergence of sequence {xn}.

Before we relate a theorem on convergence of Fej6r-monotone sequence

of points, we need a few more definitions.
Given a set K c Rk, the affine hull of K, denoted by 'f(K), ii
k ii i
M(K) = {x E R : x = x x E K,

A = 1, L finite, Ai R}.

For example, the affine hull of a point is the point itself and affine

hull of a line segment is a line containing the segment. The affine

hull of a triangle in R3 is the plane containing the triangle. A

set P is affine if P = M(P).

Let M be affine in Rk, x a point of M and r E R be a positive

real number. Then we define S (M,x) as the spherical surface .:itl

axis M, center x and radius r by

(2.3) Sr(M,x) = x + ((M x)L n {z E R k zj = r})

Figure 2 shows the construction of a spherical surface. Gi'.i-n a

point of affine set M, M x is the translation of M by x. (" -

is orthogonal to M x. S (0) is a spherical surface with radius r,

which intersects (M x)- at two points. These two points are
translated back by x to give S (M, x),
(2.4) Theorem (Motzkin and Schoenberg)
Let the infinite sequence {xn) be Fej6r-monotone with respect
to the set P, then
Case (a): If P has an interior, the sequence converges to a

S (M, x)

I. x

i -

s (0)

(M x)



Case (b): If P does not have an interior, then the sequence

can be convergent to a point or the set of its limit points may lie

on a spherical surface with axixMl(P).

See Figure 3 for an illustration. In case (a) P has an interior

0 1 2 3 *
and the sequence x x x x ... converges to a point x of P.

In case (b) P does not have an interior and the Fejer-monotone

0 1 2 3
sequence x x x x ... results in the two limit points of the

sequence lying on a spherical surface with axis AI(P) and center y.
y being the projection of x on P

Relaxation Procedures


We are now in a position to start our survey of relaxation

procedures. One of the earliest published methods is the projection

method due to Agmon [1]. Here we specify an initial veccor x; E F,

(chosen arbitrarily) and set An = 1 for all n. At iteracion n if
n n+l
x E P stop. Otherwise we find a new point x as follo..'s:

n+l n n n n
x =x (t x

I n
(a x + b. ) a
n n n in
= x -
a. a.
n in

n n
= x (a x + b.
in in) a.

since Ia.i I = 1. Here i represents a most violated conLr3iinc of

P at x

Case (a)

S (M(P), y)

1 3 5
X ,X ,X ,


M (P)


x ,x ,x ,x ,

Case (b)



This procedure yields a sequence which converges to a point of

P. In the Agmon procedure

P = {x: a. x + b. 0)
in in

That is, pn is the closed halfspace defined by the constraint in and

P c In keeping with our notation, here P is a relaxed form of P.

(2-5) -Theorem (Agmon)

Let P = n H., P # (, I 7 0 and finite, be the solution set of
iEI 1

a consistent system of linear inequalities, and let {xn) be the

sequence defined by the projection method, that is, n = 1 for all n.

Then either the process terminates after a finite number of steps or

the sequence {x n converges to a point of P, x

n 0 n
Ix x | 2d(x P) ,

where 0 c (0, 1) and depends only on the matrix A, where


A =

a m

and m = ll.

Agmon explicitly proved the convergence for the projection

method. He also showed that the results could be extended to the

case A E (0, 2), where A = A for all n.

Motzkin and Schoenberg, Eaves

Motzkin and Schoenberg 131] (among others) described a reflexion

method for finding a point of P. They showed that if P has an

interior then the sequence terminates with a point in P after a

finite number of steps. Let x0 P be an arbitrary starting point

and generate a sequence as follows:

If x C P stop
n n
If xn P select a halfspace such that x n H.1

n+1 n
Let x be obtained by reflecting x in E.. After
finitely many iterations, x will fall in P if P has an


The general reflection function can be expressed as follows:

k k
For i = 1, ..., m define f : R -- R by

x 2(a.x + b.)a. if x 1 H.
f.(x) = x + 2d(x, H.)a. =-
x if x E H.

Let gl' g2, ... be a sequence where

gj {fl, .. ., f } for j = 1, 2, .

Define g to be the composite function

gn(x) = gn(gn 1 (...(g (x)) ...))

(2.6) Theorem (Motzkin and Schoenberg)

Let P # 0 and I # 0 and finite. If P = n H. has an interior,
then for any x there is an k such that

g (x) E P.

Eaves 111] extended this result and demonstrated that the

reflection procedure has a uniform property. Namely, if x is within

a distance 6 of P, then g (x ) will fall in P for some e where
Z and k depends on 6 and not on x .

(2.8) Theorem (Eaves)

Assume that P = n H. has an interior. Let X be the set of

points within a distance 6 of P, that is,

X = {x C R: d(x, P) 6}

Then there is an L such that g (X) c P. g is thus a piecewise,

linear retraction from X to P.

Piecewise and linear means that it has finitely many linear

pieces. Retraction means g : X -- P is continuous, P c X and

g (x) = x for x C P. See Figure 4 for example. Six points

chosen arbitrarily within 1" of P all converge in under four



Definitions. Coffin's 115] work deserves special menLiocn.

He has provided a comprehensive treatment of relaxation mechods

and presented several useful finiteness properties. We need a

few more definitions before presenting Coffin's results.

A convex subset F of P is a face of P if

x, y C P
(>-- x, y c F.
(x, y) F # J

We denote the set of faces of P by F(P). Figure 5 illustrates

F(P) for a given P. Zero-dimensional faces are 1, 2, 3. On'-

dimensional faces are 4, 5, 6. The polytope 7 itself is also a race.



The zero-dimensional faces of P are called the extreme points of P.

The n-1 dimensional faces are called facets of P.



NS(x) defined by

NS(x) = {u E Rk: u'(z x) 0, Vz E S}

is the normal cone to S at x. NS(x) is a non-empty, closed convex

cone. It is non-empty because it contains at least the origin. Ic

is closed and convex because the intersection of closed halispace is

closed and convex. It is a cone because for all u E NS(x),

Au E N (x), A 2 0.

CS(x) = (NS(x))p

is the supporting cone to S at x. Figure 6 illustrates theoe last

two definitions.

A point x of a set P is a relative interior point designated

r.i.(P.) if it is an interior point of P with respect to the relative

topology induced in M(P) by the Euclidean topology, i.e., x E r.i.(P)

if there exists 6 > 0 such that B6(x) n M(P) c P. For example, as
2 2
shown in Figure 7, a line segment in R has no interior in R but

has a relative interior.

NS (x)



It can be shown that the normal cone is the same for all

relatively interior points of a given face, i.e.,

Np(x) = Np(F) for any x c r.i.(F)

where F is a face of P. Similarly

C (x) = C (F) for any x c r.i.(F).

Let T be any set, then -T is defined as

-T = {x E Rk: x = -y, y T}

A closed convex set is said to be smooth enough at a point y of

its boundary if

-Np(y) c Cp(y)

A closed convex set is said to be smooth enough if it is smooth

enough at every boundary point, or equivalently if

-N (F) C (F) V- FE F(P) 0

where F(P) stands for the set of faces of P and F(P) 0 is the set

F(P) with 0 removed.

The smooth enough property applied to a polyhedron would require

all its "corners" to be "smooth enough," carrying the analogy to

k-dimensional Euclidean space.

Some characteristics of smooth enough convex sets are mentioned


A polyhedral cone C is smooth enough if and

only if (iff) -C c C, where C is the polar of C.

C={u E Rk: u'z < 0, Vz E C}.

A polytope (bounded polyhedron) is smooth

enough iff all its extreme points are smooth


A polyhedron is smooth enough iff -Np(F) c C (F)

for all the minimal elements of the poset {F(P) c, c}

where c stands for set inclusion.

Some illustrations are given in Figure 8.

A poset (P, <) is a set P on which a binary relation < is

defined such that it satisfies the following relations:

(i) for all x, x < x (reflexivity)

(ii) if x < y and y < z then x < z transitivityy)

(iii) if x < y and y < x then x = y (antisymmetry)

Examples of posets are {R, <}, i.e., the reals with the ordinary

less than equal to operation, a dictionary with lexicographic

ordering, and the power set S, P(S), with set inclusion, c as the

binary operation.

Given a unit vector e E R and an angle a E [0,11], the set of

vectors which make an angle with e less than or equal to a is called

the spherical cone, C (e), with axis e and half aperture angle a.

N (y) (y)

(a) Not smooth enough at the extreme points of P.


(b) Not smooth enough at any point of P.

N (y)

(c) Smooth enough.


C (e) = {x E Rk: x'e x || Cos a}

See Figure 9 for an illustration

C (e) = C{B (e)}

a = sin a, a c [0, i)


Some properties of C (e) are given below:

Ca(e) is closed for a E [0,1]

Ca(e) is a convex set for a G 10, N/2]

For a E [0,1/2)if we let v = Sin a, then

Ca(e) = C{B (e)} where
k ii i i
C(S) = {x R x = A x x E S, A > 0, L finite)

C(S) is the convex cone hull of S.

Following Goffin [15] we define an obtuseness index u(C) of a closed

convex cone C to be the sine of the half aperture angle of the

largest spherical cone contained in C. Properties of V(C) are:

u(C) = sup {Sin a: C (e) c C, e i B (0)}

= sup mnin(a e)
eS1 (0) icI

u(C) > 0 iff C lias an interior

C1 c C, -- u(C ) < u(C,)

u(C) = 1 iff C is a halfspace.

If C (e) is a spherical cone then u(C (e)) = Sin a.

The obtuseness index of a polyhedron P is defined b.,

U(P) = inf 'u(Cp (:-:)) = min U(Cp,(F))
:-z P F- F'(P)-.i .

For a pol cope 'e have

'(P) = ri-n u(C,(F))
Ft'.'ertices of P

If U(P)= l. 1/-2, then P is smooth enough. However, clis is noc a

necessary condition for P to be smooth enough. If P is the pocsicive

ortchant in R then P is smooth enough but

U(P) = 1..'D 1/..F

The distance from a point X:; c a set S C Rk is defined bv,

d(x. S) = inf I x z
zE S

If S is closed, chen che inflmum is acctined, and che set ol points

of S where it is attained is the set of points closest re :-: in S

Ts(x%) = {' FE S: I x y: | = d(x., S)}

If a ser K is closed and convex, then T..(::, reduces tc a single point

which will also be denoted by TK(x) This point is called the
projection of x into K. Hence T (x) is a retraction from R to K.

Let K be a closed convex set in R The following important

result implies the continuity of the map TK.

(2.8) Theorem (Cheney, Goldstein)

The projection map TK satisfies the Lipschitz condition (for

C = 1). That is

IITK() TK(y)I Ix yI

See Figure 10 for an illustration;


K() y


IT (X) T (y)|I =< I x YIJ


(2.9)- Theorem (Goffin)

Let K be a closed convex set of Rk and TK(x) the unique closest

point to x in K. Then

{[TK(x) + NK(T (x))], TK(x) K}1 is a partition of Pk

See Figure 11 for an illustration.


(2. 10) Lc rEuT

Le C c R' be a closed con'.ex cone. Then

T C" ) c )

where C is i hc polar of C.

An alternaci'.e formulation of a spht.erical surface '..ich da:is. 11,

radius d(x, I) and center T (X) (see 2.3) is concaincd in lhe

following lenrma.

(2.11) Lemma (Goffin)
k k
Let M be affine in R and x C R then

Sd(x,) T(x)) = {y c Rk: x z11= Ily- l z M}

'Finally a well known result on norm-equivalence is stated

below. Here we state the result in terms of distances from x to P

with x V P.

(2.12) Lemma

Let P = n II., P / 0, I J 0, then there exists a scalar
iCl 1

0 < J < 1 such that for all x c Rk

pd(x, P) d (x) < d(x, P).

This result is illustrated in Figure 12.

Another Proof of Agmon's Result. The following is a different

proof for the result originally due to Agmon. We will find this

useful in subsequent parts of this material. In this proof we

make use of the result that the sequence generated by the relaxation

method is Fejer-monotone with respect to P.

d (x)
P TT' ,-,


(2.13) Theorem

Let P = n H., P J 0, I # 0 and finite, then the relaxation

method generates a sequence which converges finitely or infinitely

to a point x* E P for any given value of the relaxation parameter X

such that A E (0, 2).

Furthermore, we have
n *0
Sx x 2d(x P) 6n

where 0 = [1 A(2 _A)p2]1/2

AE [0,1)


Assume x n P

First we will show using Figure 13 that

d (xn+, P) = d 2(x, P) A(2 -A)d (xn)

where d (xn) is the maximal distance from xn to P i.e., d xn

max d(x, H.).
n+l n n+l
From the right triangle x t T(x ) we have:
2 n+1 2 n+1 n 2 n
d (x P) = d (x t ) + d (t P) ........ ....(1)
n n n
From the right triangle x t Tp(xn) we have:
2 n+1 2n n 2n
d (x P) = d (x t ) + d (t P) ............... (2)

From (1) and (2)
2 n+1l 2 n 2 n+1 n 2 n n
d (x P)- d (x P) = d (x t) d (x t )

2 n+1 2 n n+l n n n.
d (x n+, ) = d (x P) [d(x t ) + d(xn, cL )

[d(x t) d(x+ t )]

= d(xn, P) Adp(xn)[(2-A)dp(xn)]

= d2(x, P) A(2 A)d (x )

It will be noticed that our proof is independent of Figure 13. The

figure only helps to illustrate the ideas.

T (xn) = T (xn+)
p p


By lemma 2.12 there exists

pd(x, P) a dp(x)

a p > 0 such that

d(x, P)


2 n+l 2 n 2 2 n
d (xn1, P) < d(x, P) 2A(2 )d(xn, P)

= d(xn, P)[l A(2 A) p2]

(2.14) Let 9 = [1 A(2 A) 2 1/2

O = 0 when A = 1, p = 1. In general 0 E [0, 1). Thus,

d2(xn, P) O2d2(xn, P)

or d(xn, P) Ond(x0 P)

Case I: If the sequence does not terminate

lim d(xn, P) = 0

Since {xn} is Fejer-monotone and dim P = n, then {x}n converges to

x C DP by Theorem 2.4 where DP is the boundary of P.


Case II: If the sequence terminates in a finite number of steps

Sxn+ z f
Fejer-monotone with respect to P. Clearly I x*- Tp(xn)|l~l xn- T(x)II,


x* E Bd(xn, P)Tp(n)

and x*, xn both belong to the ball B d(xn pTp(x ) (see Figure 14).


xn x*1x 2d(xn, P)

< 2nd(x0, P)

If the sequence terminates, we can take x* to be the last point of

the sequence./



In the above procedure, if 1 A < 2 then

P = {x: a' x + b. > 0}
1 1 -
n n

That is, pn is the closed halfspace defined by the constant i .
If 0 < A < 1, then

pn = {x: a, x + (1 A)b. Aa' xn > 0)
i i i =
n n n

and again P c pn. In both cases Pn is a relaxed form of P and at

iteration n we find xn E Pn

Range of A for finite convergence. Using relation 2.14 giving

o against different values of A and p, we get the results in Table 1

shown graphically in Figure 15.


V = 1 A .1 .5 .75 1 1.5 1.9

61=f(A) .9 .5 .24 0 .5 .9

S= .1 .5 .75 1 1.5 1.9

0,=f(A) .98 .9 .87 .87 .9 .98

S= 4 A .1 .5 .75 1 1.5 1.9

6 =f(A) .99 .98 .97 .97 .98 .99

From Figure 15 it would appear that A = 1 gives best results (lowest

0). However, Motzkin and Schoenberg demonstrated that when P has an

interior A = 2 leads to finite convergence. Also if P has acute

angles, p will be low and 0 high. Thus an obtuse angle should lead

to faster convergence. Goffin determined the range of values of A

which gives finite convergence.




1.0 1.5 2.0


\ = 1

1 = 1

i =1


(2.15) Theorem CGoffin)

Let P be a polyhedron with an interior. Then the sequence

generated by the relaxation method converges finitely to a point of

P if

(a) P is smooth enough and A c 11, 2]

(b) A E (X*, 2]


= + 2P and U(P) is the obtuseness index of P.
1 + U(P)

Furthermore, if A > 2 then the sequence either converges

finitely to a point of P or it does not converge.

The first part of the theorem shows that if all the corners

of P are smooth enough then the range of A over which we can get

convergence in a finite number of steps is [1, 2].

The second part of the theorem shows that for a polyhedron that

is not smooth enough, the extent by which we can deviate from A = 2

depends on the acuteness of the most restrictive extreme point of P

-- the more acute it is, the less we can relax A from a value of 2.


Definitions and Motivation. The final work on relaxation

methods that we review is by Merzlyakov {30]. Merzlyakov's

method takes advantage of the fact that convergence properties

of relaxation methods improve if, instead of considering only

the most violated constraint, as was done by Agmon, among

others, a larger number of supporting planes are considered.

We would like to emphasize this unusual property of

relaxation methods -- redundancy, in terms of number of constraints,

improves the convergence rate. Additional halfspaces have the effect

of increasing p which lowers 0, the convergence ratio.

At each x C R let

V(x) = {i: a'x + b. < 0, i = 1, ..., m}
1 1
That is, V(x) is the set of indices of constraints violated by x.

A cavity CV is the set

C = {x : R : V(x) = V} where V is a subset of first m natural

numbers. A subcavity SVj of CV is the subset of all x c CV which

violate constraint i C V no less than another constraint j E V.

Ties are broken by the natural ordering. Notice that subcavities

along with P finitely partition Rk. Figure 16 illustrates the above.

Associate with each subcavity S a fixed X.(j, V) '.here

0 if i V V

.(j, V) = 0 if i V
> 0 if i = j
For example for subcavity S{2 ,3(see Figure 16).

X1(2, {2, 3}) = 0

A2(2, {2, 3}) > 0

A3(2, {2, 3}) > 0

Method. Merzlyakov's relaxation procedure is as foll,,s:

Let A E (0, 2) and x specified. If at iteration n. :n P

stop. Otherwise, let

s 21
{I, 2}

C{1, 2}

, 2 3
{1, 2, 3)

, 2, 3)


{2, 3}

2 3
f2, 3)


(2.16) x = xi [ (j,V)(axn+b )] (j)a

[ZiA(j,V)ai] [LiA(j,V)ai]

n j
where x S.

We again make the assumptions that P # 0 and without loss of

generality assume I a.si =1 for each i = 1, ..., m. Also assume

,withouc loss of generality thac Z7.i(jV) = 1. Coffin sho..ed chat if

P t 0 chen the vecccor 7\. (j,V)a. cannot be zero.

lierzlvakov showed that the procedure given above converges

Co a poinc of wiih a linear ract of convergence. If 1 < 2

clhen w.e can say
Fn = {:: 7. (j,V)(a':- + b.) 0}
1 1

and clearly P c Fn. If 0 < < 1, chen

Pn = {:.: \,.(jV) Ia'x + (1 \)b. ',a .: ] }
1 i 1 1

and P = P Here again Pn is a relaxed form of P and the sequence

obeys our stipulation for rela:xacion method, i.e., ac iteration n we
n n n
find :.; c P ..lere P = P .

The rela:-;ation sequence I{:; } congerge- infinicely co, a point

C .dP, where ciP is the boundary of P, only if the whole sequence

n: -'J ja r" into :; + H (: ). by, moving in a direction that is a

convex com.binacion of violated constraints ,we force the sequence out

of tlie jam. For Ehis reason terzl,,,akov method can be rc-erred co

as an an tijairinug procedure. Table 2 and Figure 17 iilutScnte this

poinE. The cable and figure are based on the ex:.ample givcn in

Merzlvakrov's paper.


Consider the following system of linear inequalities

-x1 + 10x2 + 10 0

3x1 10x2 30 > 0

Let x = (0, 0). We solve this by

(a) Agmon's method (b) Merzlyakov's method

where X = 3/4, a. = 1

(a) Agmon's method
n+1 n n
x = x x + b. )a. ;
I 1n
n n n

TABLE 2 (continued)

(b) Merzlyakov Method
n+1 n [ZA.(j,V)(a'xn + b.)][ZA.(j,V)ai]
xI = x(jV I i(j a
In i(j,V)a. ] In i (jV)ai]

n [TA. (j,V)a.]'
x E X.(j,V)(a'.xn + b.) Y .(j,V)a. 1 1
n xln x n 1 1 1 1 1 [.i(j,V)a.]
X1 X2 l (3,V)a
0 .000 .000 -2.873 ( ) 1
1 .619 -2.064 -1.838 ( ) .037
2 7.622 -.686 -.459 ( ) .037
3 9.371 -.342 -.273 ( 99 1
4 9.351 -.138 -.125 (13) .037
5 9.827 -.044 -.032 ( 1) .037
6 9.949 -.020 -.010 ( 99 1
7 9.948 -.013 -.008 ( 03 .037
8 9.978 -.007 -.002 ( 8 1

9 9.978 -.008

Notice that the Merzlyakov method

the Agmon procedure. Convergence to a

faster with the Merzlyakov Method.

is much more effective than

point of P appears much

Generalized Merzlyakov Procedure. The Merzlyakov procedure

can be generalized [24] by allowing the A.'s to be chosen based on

the current point rather than fixed for a subcavity. For any:.

where V(x) # 0 let

0 0
+ I

x x
o 0

+ I


-i rC N



-4 P

N 0

0 0

< 0


0 if a'x + b. > 0
( 0 if aix + b. 0
1 1

where 0 < EA.(x) B, A.(x) > 0 for some i E V(x) and at each n
there is a j e V(xn) such that

A.(xn)(a'.x + b.)
1 1 1 >
a'x + bj,

where j* indexes a most isolated constraint and 0 > 0.

The convergence properties of this method are described in the

next chapter.

Typical Subgradient Method


Let f be a conve.:-: but not necessarily differentiable funccion

defined on R u E R is said to be a subgradienc of f at :: if it

satisfies the following inequality:

f(y) 2 f(x) + u'(7 :x:) for all Rn.

The set of subgradiencs of f at x is called the subdifferential of

f at :: and denoced by cf(>:).

Si(:-:) = {u c R : u is a subgradient of f at :-:}

Sis a minimum of f if and only if 0 e cf(x ).

A typical subgiadient algoritlhm works as follows:

(1) choose F. I. nrbicraril,

n - n n 11
(2) coiipu t .i bgradieit u >i at :-. i.e. 11 c d :-: ).

If u = 0 an optimal point has been found. If not go to (3).

n+l n-+ n n n
(3) Find : as tollous; :: = x t u

where tn is a scalar generally obtained by the approximate

minimization of

f(xn tun) for t a 0.
n n+1
Go to (2) with x replacing xn

The function f could be an arbitrary convex function. If the

objective is to find a point of a polyhedron, one could choose a

number of different functions. One choice could be
n n
f(xn) = Max (-a.x b.)
i 1
i=l,... ,m

Since f is the maximum of a finite number of convex functions, f is

convex. Oettli [32] has given another interesting way to define a

function which can be used to find a point of a polyhedron. This

is given latter.


Reverting back to the general subgradient procedure given by

the iterative scheme

n+l n n n
xx tu

various proposals [16] have been made for the step size tn. The

earliest one was by Polyak [34] who suggested tn = o and lim n = 0.

The merit of this procedure is that we could choose any arbitrary

subgradient u nE f(x ) and the sequence would converge to some x*

with the stipulated stepsize (e.g., with t = 1). However

convergence is dismally slow and the method is of little practical

value. Polyak attributes the original idea of this method to

Shor [36].

Another idea also due to Polyak [35] is to use

(2.17) ? If(x') ]
n n

I n*
with f = f(x ) the optimum value of the function. Polyak has shown

that with 0 < b An = b < 2 the sequence converges to x with a
1 n 2
linear rate of convergence. A variation of 2.17, also due to Polyak

is to use f > f(x ), i.e., an overestimate of the optimum f(x ).

Here again if 0
linear rate but to the level set

S = {x: f(x) < f}.

Thus for the result to be of value we must choose close enough

overestimates of f(x ) and this is difficult in practice.

A final variation on the theme of 2.17 is to use an

underestimate f < f(x ). Eremin [12] has shown that the sequence

converges to x if lim n = 0 and XA = n Here again convergence
n n
to the optimal solution is very slow.

Some Subgradient Procedures


(2.18) Theorem (Polyak)

Let f be a quasi convex function to be minimized over F:. If

the sequence {xn} is obtained as follows
n+1 n n un
x = x A u with A = u n, then there e:-xists a
"ni ul n

subsequence (xnk} such that lim f(xnk) = f

Proof: Select a > f and define

S = {x E R f(x) = a}

S is convex (it is a level set). Hence there exists an x in int S
a a
since a > f (see Figure 18). Choose p > 0 such that the neighbor-

hood B (x) c S
p a

Suppose x ~ S for all n



I I P.025

(a) x E RI

contours of f

(b) x R2


f(x) f(x") > u" (x xn), --x
n n
since u is a subgradient at x Thus

f(xn) f(x) un' (xn x). Also

f(x) < f(xn) -V-x S

Combining the above

un (xn x) 2 f(x") f(x) ? 0, -y-x C S

This holds for a point x + u E B (,x)c S
II u P

n un n pun
u x u ( + )

+ Iu

n 11 n 1 2 n ~1
p u Ij u (x x)

x x || = ||x -

n ~ 2 n
II x x| + IIu

< n ~ 2 n
= x x2 + |u |

n ~I 2 n
= x xll + uI u

n ~ 1 2
u x I

Choose N such that A = p, -V-n N.
For n = N to n = N + m

S IN+m-1 ~ 2 N
0 5 x x| I x

g x x p A

2un (xn- x)

2p I un J

2pA .

This is possible since lim .', =0.

- x|2 + E An(n 2p .,

For m large enough the right hand side is negative, since Z.\ Ji'.erges

-- a contradiction. Hence x EC S
Let lim a = f Then we have proved there exists :'nk a
k-- k AV
such that
lim f(x k) = f _-I

Computational experience with heuristic subgradient methods has

been reported by Held and Karp [18], and Held, Wolfe, and Cr-:,.der [19].


Held, Wolfe, and Crowder use underestimates to f(x ). However the

stepsize Xn = X is kept constant for 2m iterations where m is a

measure of the problem size. Then both A and number of iterations

are halved successively till the number of iterations reaches some

value k, X is then halved every k iteration. This sequence violates

the requirement Xn = m and it is possible that the sequence may

converge to a wrong point. However in their computational experience

this did not happen. Held, Wolfe and Crowder experimentally found

that their procedure was effective for approximating the maximum of

a set of piecewise linear concave functions. In their problems

(assignment, multicommodity, and maxflow) the number of constraints

is enormous, of the order of 100'. In all these cases they were able

to get linear convergence (which is assured by the choice of stepsize).

Held and Karp used overestimates to f(x ) to obtain bounds for

a branch and bound procedure for solving the traveling salesman

problem. Using this procedure they were able to produce optimal

solutions to all traveling salesman problems ranging in size upto

64 cities.


Oettli proposed a subgradient method for finding a point of an

arbitrary non-empty polyhedron. We will discuss his less general

method which can be used to solve a linear program. As before, using

the dual and an additional constraint expressing strong duality, a

linear program can be stated to be the problem of finding an x C Rk


+j(x) 0 i = 1, ..., m

where Z.(x) represents the magnitude by which constant i is violated

at x;i.e.,

i (x) = max {0, -a.x b.}
1 1 1

Define (x) to be the vector formed from violations of

different constraints

Z+(x) = (Z+(x), Z2+(x), ..., Z (x))
S2 m

Let p(*) be an isotone norm on Rm, i.e., if 0 = x y then

p(x) p(y). It may be noted that all the usual norms like Euclidean

and Tchebycheff are isotone norms. Define

d(x) = p(Z+(x))

Then clearly x satisfies +(x) 2 0 iff d(x) = 0.

Oectli sho'.ed that d(') is a conv'.e: function andi then described

an iterative scheme using subgradients of d(') that minimizes d(-)

over Rk. This scheme is quite general since we have a different

function for different norm-.

(2.18) Lemu.- [Oettlil

d(') is a conxex function.

'. (z) is con'vex for all j.
.1+ +
Thus i (atz1 + ',z~) ": i (z ) + t i (z- ) for d., :. O, ., +-; = i.

Since p is monolonic
.+ '+ 1 +
pr'( +'izt + Cz-) p[,i':> (z ) + Si t (z )i

and since p as a norm is con'vx:

I p[lo (zi) + izt ) + z )

h= plk+(zl) I + d pl (z -)

hence d-xz1 + L-z~) L d(z') + Cd(z). i

Let x0 be given initially. If xn P stop. Otherwise, let

n+l n d(xn) .
(2.19) x = x -_ (xn-t(x)
t(x )"t(xn)

where t(xn) is any subgradient of d(') formed according to Oettli's
rule at x The sequence {x } converges to a point of P with a

linear rate of convergence.

(2.20) Lemma [Oettlil

Let y C P, then for the Oettli's procedure
n+1 2 n 2 d2(xn)
Ix I (x ) 1 2

n+1 2 n d(x ) n 2
Sx yl = n x n) t(x ) yl 2

n 2 dx) 2d(x)
=I x || + -x 2d(xn) t(xn)' (x Y)
t(xn)'t(x ) t(x )'t(x )

By definition of subgradient

d(y) > d(x ) + t(xn) (y x )

Since d(y) = 0 therefore

d(xn) 5 t(xn) (x y)

Collecting the above results

t(x ) t(x )

(2.21) Theorem [Oettli]
n+1 n <0<
I| x x 6 II x x ,0 < 0 < 1

n+l 2 11 2 d2(xn
(2.22) j x ii Ixn- (xn)
t(x ) t(x )

n 2
S11 Y112

By triangle inequality

Ii x xl II x y l + I y- xI

S2 I x yll

Since this holds for all y E P including Tp(xn)

II x xl 2d(x', P)
Substituting x in 2.22
In+1 *11 2 l n x* 2 d2(")
lx x ] n x x I -d()
t(x ) t(x )

n 2 d2(xn) 1
= II x x I [1 ,n. 2 2]

n 2

2 nn
< n *; 1 d (x ) 1
Sn, F n. 2 n 1
cn ) tc x ) d x )

or I x. 1 :x- x 1 x ]
1 B

where A = inf B = dup I| t(xn")
n d(: P) n

Oeccttli has show..n that A > 0 and E i. finice. From these c'Dser'.acticn.,

the result follows if we take

6 11 -,4 2,"

In the ne:x. chapter we show that the Generalized llerzl;,akov

procedure is equivalent to the Oettli procedure in as far as the

sequence of possible moves made by che tw.'o procedures are concerned.

Souime of the Hore Riecc'nt Algori thlns

A limictaion of subgradient opcimization methods is the absence

of a benchmark co compare the computational efficiency of difference

algorithms. Algorithms for unconstrained optimization of smooth

convex functions have the reliable method of steepest descent

(Cauchy -procedure) as a yardstick against which other algorithms

are compared. If they are better than the Cauchy procedure, they

usually deserve further consideration. In subgradient optimization,

the Cauchy procedure may not work. An example where the Cauchy

procedure is ineffective is reproduced below from [37]:

The function considered is

max {3 x + 2y, 2x + 10y)



-8 >


The contours of this function are sketched in Figure 19. One

can calculate that if the Cauchy algorithm is started at z, the

sequence converges to the origin (0,0) which is not the minimum

of the function.

Some of the recent subgradient procedures can be divided into

two categories. In the first category are those that claim to be

at least as good as some other subgradient procedures. These

algorithms claim their superiority based on empirical (.computational)

results. In the second category are those that claim to be reasonably

effective for both differentiable and non-differntiable convex

functions and claim that the algorithm is superior when applied to

quadratic and convex differentiable functions.

As an example of the former we present the method of Camerini,

Fratta and Maffioli 15], and as an example of the latter we present

Wolfe's 138] method of conjugate subgradients for minimizing non-

differentiable functions. Wolfe's algorithm closely resembles

Lemarchel's method 128].

Camerini, Fratta and laffioli. Here one adopts the following

iterative scheme for minimizing f('). Let
x = 0 and

n+1 n n n .
x = x t s where s is a modified "gradient"

direction defined as

sn- = 0 for n = 0 and

sn = f (xn) + n-1
n n n
with f'(x ) a subgradient of f at x and 0 a suitable scalar. s
is equivalent to a weighted sum of preceding gradient directions and

is used as an anti-jamming device. The following lemmas and

theorems develop the essential elements of Camerini, Fratta and

Maffioli method.

(2.23) Lemma
Let x and x be such that

f(x ) f(xn), then

0 f(xn) f(x ) f'(x) (x x )

This lemma states that the negative of the subgradient makes an
S n *
acute angle with the vector (x x ) where x corresponds to an

optimal solution. See Figure 20.

/ x


f' (xn)


(2.24) Lemma

If for all n

0 < t f(x) f(x*) and 3 0, then
n 11 sn 12 n

0 5 f' (x) (x xn) < -s (x x)

Lemma 2.24 extends the subgradient property of Lemma 2.23 to -s
-s also makes an acute angle with (x x ).

(2.25) Theorem

n-i n
-Yn s f (x ) if s f(xn) < 0

Letn I "n- 112
0 otherwise

with 0 = = 2: then

n' n n'' n
-(x x ) s (x x f (xn)

ls" l = Ilf,(xn) ll
Theorem 2.25 shows that a proper choice of ensures that -s is at

least as good a direction as -f'(xn) in the sense that -s makes

more of an acute angle with (x x ) compared to -f'(x ). If

-f'(xn) and -sn1 make an acute angle we take -sn = -f'(xn). Only

if -f'(xn) and -sn- make an obtuse angle do we take

-s = -(f (n) + s"-)

Figure 21 illustrates the above theorem.

-f' ( n)=-I





(2.26) Theorem

Ix* xn+ll < j x* xnl


-2f' (xn) (x* xn by Lemma 2.24

-2 (x* x by Lemma 2.25


t il sn 112 + 2sn'(x* xn) < 0

or t sn2 + 2t s (x* x") < 0
n n

adding II x* x n2 to both sides gives

1Ix* xn+l < I x* xI I/

Theorem 2.26 shows that the sequence is Fej6r-monotone with respect

to an optimum point x*. Using arguments similar to those used

earlier in relaxation methods, it can be shown that the sequence

{xn) converges to x*.

Camerini et al., use heauristic arguments to show that the

best value for y is 1.5. They then claim superiority of their

method over that of Held and Karp [18] and Polyak [35] based on

computational results.

Wolfe. In Wolfe's method [38] we generate a sequence of points

{xn} of directions {dn}, of subgradients {f'(xn)} and scalars {tn}

such that for n = 0, 1, 2, ...

t > 0, f'(xn) E 3f(xn)
n+1 n n
x = x + td and

f(xn+l) < f(xn).

At step n we have a bundle of vectors G which is formed by a set of

rules. G consists of f'(x ), possibly -dn- and a (possibly empty)

string f'(x ), f'(x ), ... of limited length.

We set d = -N G where N G is a unique point in the convex
rn rn

hull of G having minimum norm. Demjanov 110] has shown that

Vf(x) = -Nr f(x) where Vf(x) is the gradient of f at x when f is

smooth at x and 3f(x) is the subdifferential of f at x. As is

typically the case, the stepsize, t is determined by an approximate
minimization of f(:< + td ) for t 0. At some step, d = -; G
n r n
will be small, and in addition the tlst p + 1 points (for some

interger p) are all close together. Thlen :' w. ill be our appro:imrate

soluc i ln.

For e:x-ample consider the illustration in Wolfe's paper

f(::,v = ma:x [f I, f:., f, where
t f
S = -. f = .: + v, t = ," 2"
1 3
sketched in Figure 22. A subgradient of f at an',' (::,') is chosen to

be '7f. where i is the least inde:.: for which f(:,',') = f.. If we start
i 1
with the point (6,3), we move in the direction -'f(6,3) = (-1,-1) co

the point (3,0). Using our prescription for choosing che direction,

the oni:, available direction at (3,0) is still (-1,-1). We must

ctake a small scep in tliac direction an':,.a,' even tclouPg it is not a

descent direction. hien the new subgradient will be (-1,-2) forming

the bundle G6, = [(1,1), (1,-2)}, .'e find the direction C, = (-1,0).



r\ NA8

[1 J -\ x


The next step in this direction takes us to the neighborhood

of (0,0). At the next iteration we add (-1,0) to the bundle. Then

N = 0. We "reset" and start all over again at this point. We
accept the last point reached as a solution if N G = 0 and the
last "p" points are "close enough."

For differentiable convex functions, the method is an extension

of the method of conjugate gradients and terminates for quadratic




Comparison of-Generalized Merzlyakov Method and
Some Typical Subgradient Methods

The equivalence of relaxation methods and minimization methods

has been noted by several researchers. However, these two approaches

are only mathematically equivalent and this equivalence has not been

generally made explicit. On the other hand computationally relaxa-

tion methods are much simpler. It has been shown [24] thac _

generalized form of Merzlyakov's method subsumes most of the typical

subgradient methods. In particular it was shown that the geneuralized

Merzlyakov method can generate any move that is made by thc ucettli

method. In this section we draw heavily from the results of the

above paper [24].

Generalized Merzlyakov Method

The Merzlyakov method is generalized [24] so as to facilitate

its comparison with some of the subgradient methods. This extension

greatly increases the flexibility of the original method. The

following procedure allows the A.s to be chosen based on the current
point rather than being constant for a subcavity. For an:, ,l.,lere

V(x) # 0, let

0 if a'x + b > 0
A.(x) =
S0 if a'x + b. < 0
i 1 -

where 0 < E A.(x) 8, A.(x) > 0 for some i CV(x) and at each n
1 1
there is a j C V(xn) such that

.(xn)(a'.xn + b.)
3.1 I J 90
a' ,xn + bj,

where j* indexes a most violated constraint and 0 > 0.

For A C (0, 2) and x initially specified, stop at iteration n

if x E P. Otherwise, let

n+ A. (xn)(a'xn + b.)(E A.(x )ai)
n+l n i 1 1 1 1
3.2 x = x -
(E A. i(xn)a.)' ( .(xn)a.)

We continue to assume H aill = 1 for all i.

In the next two theorems, we show that the generalized

Merzlyakov method converges with a linear rate of convergence to P.

3.3 Theorem

Let z C P, and {xn} obtained by the generalized Merzlyakov

method. Then
x n+l 1 < 11 n 11
llx+1 zll < x z


2(xn-z)' A Z A.(xn)(a'xn + b.)E X.(xn)a.
I n+l i ll n 2 1 1 1 1 1

1 1 1 1
|| x -Z11=[|x n nn

12(Z A.(xn)(a'xn + b.))2
+ 1 1 1
(E A.(xn)ai)'(. .i(xn)a.)

The last two terms can be confined Lo get

-AX A .(xn)(a'xn + b.)
S 1 -n [2X A.(xn)a'(x z)
(X A.(xn)a.)'( n [.(xn)a) 1 1

-A(E A.(xn)(a'xn + b.))].
1 1 1

Now a'xn + b. 0 whenever A.(xn) > 0 and a'xn + b. < 0 for some
1 i I i 1
A.(xn) > 0 and thus the first bracketed term is positive.
Since z C P and a'z > b., then
1 1-
-E A.(xn)a'z 5 Z A.(xn)b.
1 1 1 1
Thus the second bracketed term is less than or equal to

(2 -A) A.(xn)(a'x" + b.)
1 1 1
and this term is strictly negative.


Sxn1 zll < i x -n zl Ii/

Thus we have established that the sequence is Fejer-monotone

with respect to P. We next show that the procedure gives points

which converge linearly to P. Let

d(xn, P) = inf II x z .

3.4 Theorem

Either the generalized Merzlyakov procedure terminates with a

point of P or it generates a sequence which converges to a point

x* E P.

Furthermore, the convergence rate is linear.
n n n
Proof: Assume x V P and z is the unique closest point of P to x .

2 n+1 p) |n+l n n 2
d (x P) [ x z .
S In+1 n 2
After expanding x z 112 we get

,n n 2
A(2-A)[Z A.(x )a'x + b.)]
2 n+l, P) | n n 2 i i
d (xn P) = ||x z || -
( ( A.x )ai)'( A. (xn)a.)
1 1 1 1

A(2-A)[Z A. (xn)(axn + b.)]2
d2(xn p) 1 i
= dA x P) A---(x-n-)a
(E A.(xn)a.)'t( A.(xn)a.)
1 1 1 1

By assumption

IJ A.(xn)(a'xn + b.)]2 [A.(xn)(a'xn + b.)]2
i 1 1 1 1

( 62 (a'xn + b. )2 for some i E V(xn)

From Hoffman [21] we have that there is a p > 0

pd(xn, P) < -a'. x b..


d (x P) = d2(xn, P) 2-2(x, P)
(E .(xn)a.)' (E (xn)a.)
1 1 1 1

Finally since E A.(xn) < B and I a.ll= 1 for all i, we have

(E Ai(xn)ai)' ( .(xn)a.) 82

And we get

d2(xn+, P) = II A(2-)e2 /2/ 2]d2(xn, P) y2d2(xn, P)

Then 0 y < 1 and d(xn, P) < nd(x0, ).

Case 1: If the sequence does not terminate lim d(xn, P) = 0 and

Motzkin and Shoenberg [31] have shown that the sequence converges to

a point x* in the boundary of P for 0 < A < 2.

Case 2: If the sequence terminates

x n+x zjl < xn zil for all z E P (Theorem 3.3).

Also Ilx* znl| < I xn z nl where zn is the closest point to P

from xn

Th us
nn n

< 21 xn z"II = 2d(x", P)

2ynd(x P)

where x* is the last point in the terminating sequence. ///

We can further generalize the Merzlyakov method and Theorems 3.3 and

3.4 by replacing A with A where 0 < b. < A < b < 2 for all n. In
n 1 = n = 2
the remainder of the discussion we usually mean A = A for all n but
do not have to restrict ourselves in this manner.

Further Developments of Oettli Method

Oettli's minimization method was discussed in the last chapter.

We wish to show the equivalence of Generalized Merzlyakov method and

Oettli's method in the next section but before that we need to develop

some properties of Oettli's method to facilitate this comparison.

Again most of the theorems and proofs in this section are taken from

the paper [24] referred to earlier.

We represent the subdifferential of a convex function f at x by

f(x)and a subgradient of f at x by f'(x) i.e., f'(x) E af(x).

To recapitulate the Oettli procedure, we form the sequence {xn}

as follows:



3. 7

n+l n d(xn)t(xn)
x =x
n n
t(x ) is any subgradient of d(') at x


For any norm p(') on Rm

ap(x) = ap(0) n {p'(x): p(x) = p'(x



We will show

pp(x) =

Conversely if the

that if p'(x) p(x)then the following ralation holds

{p'(x): p(y) p'(x)'y for all y} n

{p'(x): p(x) = p'(x)'x}

above relation holds, we will show p'(x) E ap(x).

By the definition of a subgradient p'(x) E ap(x) if

p(y) > p(x) + p'(x)'(y x) for all y.


p(Ax) p(x) > p'(x)'(Ax x) for all scalars X.

Since p(') is a norm

p(Ax) = A(lp(x)

For X > 1 p(x) > p'(x)'x

0 < A < 1 p(x) < p'(x)'x

Thus p(x) = p'(x)'x.

From the subgradient inequality

p(y) p(x) + p'(x)'(y x) by substituting for p(x) we get

p(y) p'(x)'y for all y. Hence

3p(x) = {p'(x): p(y) p'(x)'y for all y} n

{p'(x): p(x) = p'(x)'x}.

Conversely, if p(y) > p' (x)'y, for all y and p(x) = p'(x)'x, we get

by subtraction

p(y) p(x) p'(x)'y p(x) = p'(x)(y x)

and p'(x) E Dp(x).


pp(0) = {p'(0): p(y) > p'(0)'y, for all y} n Rm

= {p'(0): p(y) p'(0)'y, for all y). V/

To find the subgradients of d(-) in terms of subgradients of the

composite function p(+ (x)) we use the following chain rule given by


3.8 Lemma

Let p(') be an isotone norm and p'(Z (z)) 0, then

3.9 d'(z) = (z) p'( +(z))

+ + 0 > 0 + '
e (z) A (z ) = (z- z ) jl Cz )

multiplying both sides by p'( +(x)) where p'(Z (x)) = 0

(Z+(z) Z+(z0))p', +(z 0)) = (z z )' +'(z0 p'(+(z0))

Using the definition of subgradient for p'( +(z0))

p(Z+(z)) p(C+(z0)) 0 (+(z) Z+(z0 )'p'(Z+(z0)

( (z zO)'+' (z0)p( +(z0))

d(z) d(z0) (z z )+'(z p' (+z))

But d(z) d(z0) 2 (z- z )'d'(z )

hence d'(z ) = +' (z )p, +(z0 ).

The subgradient of (x) may be found using the following well
known.. result

3.10 t.+ (x) = {-Aa.: = 0 if a'x + b. > 0
1 1 1 1

A = 1 if a'x + b. < 0 and
1 1

A [0,1] if a'x + b. = 0}
I 1

The subdifferential of Z (x) can in turn be found from

3.11 a +(x) = {(hi, ... h ): h. E S (x)}
1 m i l

Generalized Merzlyakov method and Oettli method

With the results developed so far in the previous sec ions we

are now in a position to develop an expression for equation 3.6 in

terms of the parameters of the generalized Merzlyakov method.

3.12 Theorem

Let x c Rk, x V P and t(x) be a subgradient of d(') ic :w formJd

by using the chain rule of equation 3.9. Then
d(x)t(x) [ (a'x + b.)]EA.a.
d(x)t(x) 1 1 1 1
t(x)'t(x) (~A.a.)'( .a.)
1 1 1 1

for some X.'s where A. O, = 0 if a'x + b > 0, and EA. > 0.
1 i i i i i

By equation 3.9 we have

t(x) = e+ (x)p'( (x)) for some e (x) e S +(x)

and p'(Z+(x)) ap( +(x)).

By equations 3.10 and 3.11

z (x) = -(nlal' r2 2' .. n a ) with

0 if a'x + b. > 0
1 1

= 1 if a'x + b. < 0
I 1 1

S[10,1] if a'x + b. = 0
1 1

Also from the proof of Lemma 3.7

d(x) = p'(Z+(x))'+ (x) where

n, (a'x + b.)
Z+(x) =

n (a'x + b )
m m m


h p' ( (x)). In Oettli's method we need h 0. Substituting
these last few expressions into (x)t(x)gives
d(x)t(x) [EX. (a'x + b.)]EA.a.
d(x)t(x)1 1 1 1 1
t(x) 't(x) (EA.a.) (EA.a.)

with A. = n.h. 0. We also have -EA.(a'x + b.) = p( +(x)) > 0 since
1 11 1 1 1
x V P and EA. > 0.

We have thus established a close similarity between the two

methods. We have however yet to establish the two other requirements

viz EA. =< and that condition 3.1 is satisfied. First we note that

since the subdifferential of p(') is compact EA. is bounded above.
Hence the condition LA. B is satisfied. Now to show that condition
3.1 is satisfied, we see on the lines of the proof of Theorem 3.12


d(x) = p(Z+(x)) = p'( +(x))' +(x) E h'(A+(x))

= -ET.h.(a'x + b.) -EA.(x)(a'.x + b.)
1 1 1 1 1 1 1
where A.(x) = .h. > 0.
1 1 1

-E. (x)(a.x + b.) = p( +(x))
1 1 1
and m max X.(x)(-a'x b.) 2 p(C (x))
1 1 1 -

Also Z (x) 2 Z.,(x)ej, where j* is the most violated

constraint and e. is a unit vector with a 1 in position j*. Since

p(') is monotonic

S+ (x)ej)
p(Z (x)) p(A (x)e)

= -(a,x + bj.)p(ej.)


m max A.(x)(-a'x b.) 2 -(a' x + b. )p(e.,)
i 1 1 1 J

2 -(a',x + bj ) min p(e.)


main P (e)
A,(x)(a'x + b ) m p(e
i i > i a
= 0 > 0
(a',x + bj,) =

where i E V(x) maximizes the left hand side.

If we allow X = 1 in the generalized Merzlyakov method,we find

from the above analysis that any move made by the Oettli procedure

can also be made by the generalized Merzlyakov method. However, the

reverse is not true. This can be seen if we let A # 1 and take the

case where x violates only one constraint. Thus the Oettli procedure

is strictly subsumed by the generalized Merzlyakov method. Compu-

tationally the relaxation method is very simple to implement since

the directions of movement can be easily computed. Also it is a

very flexible procedure since all that is required to change the

direction of movement and the step length is to change the weights

X.(x). On the other hand the Oettli method requires a subgradient

to be computed at each point which is not a trivial calculation.

In an earlier paragraph we indicated that the

Merzlyakov procedure could be further generalized by allowing A to

vary with each iteration. However this would require the additional

condition 0 < b Xn A b2 < 2. Oettli has recently generalized his

method [33] to incorporate the above feature. His generalized

method requires that An (0, 2) for all n with the additional


CA (2 X ) = + .
n n

A second aspect in which the two methods differ is that in the

generalized Merzlyakov method we could use a different set of X.'s at

each iteration. This could be introduced into the Oettli method by

considering different norms at each iteration.

Comparison With Some Other Subgradient Nethods

In this section we show the versatility of the generalized

Merzlyakov method by showing that some of the other typical subgradient

procedures are strictly subsumed by the generalized Merzlyakov method.

We consider the methods of Polyak (134], 135]), Eremin [12] and Held,

Wolfe and Crowder [19]. These subgradient methods are capable of

tackling more general functions, however we compare them when the

objective is merely to find a point of a polyhedron P. There are a

number of ways of defining the function f(') such that when we

minimize it we get x* P. A typical choice is

3.13 f(xn) = max -(a'xn + b.).
1 i

We will show that with f thus defined the subgradient methods are

subsumed by the generalized Merzlyakov method. The above subgradient

methods can be collectively described by the following sequence:

n -
n+l n [f(x) f] n
x = x 2 f(x )
S n f'(xn)T2

where f is an estimate of f(x*) the optimum value of f. If f(x )

assumes its value for a unique i in the relation (3.13) then the

subgradient at x f'(xn) = a.. If there are ties, f'(x ) can be
taken to be a convex combination of the maximizing a.s. Let 0(x )

represent the set of indices i E {l, ..., m} for which

f(xn) = max (-a'.xn b.). Then
i=l,... ,m

f'(xn) = ZA.(xn)a. with EA.(x ) = 1
1 1 1
A.(xn) 0 and A.(xn) = 0 for i i 0(xn).
1 1

n -
nl n ((x) f)
n+1 n \n A n,
x = x f'(x )
i f' (xn) 2 (x

(EA.i(xn)(a'xn + b ) f)(E.(xn)a.)
1 1x 1
n (EA.i(xn)a.)' ( i. (xn)a.)

with A.(xn) 0, A.i(xn) = 1 and A.(xn) = 0 for i V 0(xn)
1 1 1
This can be considered equivalent to finding a point of P where

P = {x: a.x + b. f ? 0, i = 1, ..., m}.
1 1
If f 0, the procedure gives a point of P. Thus if P J 0 the

subgradient procedures mentioned in this section are special cases

of the generalized Merzlyakov method.

An Algorithm Using Relaxation Method


We have seen that with relaxation parameter of A = 1 we can get

the best convergence rate for a fixed i (see Figure 14). Also a

higher p leads to better convergence. We wish to combine both these

features in our algorithm. We also derive motivation from Polyak's

[34] and Oettli's procedure 132]. In this section we give a procedure

for finding a point of an arbitrary polyhedron P and later apply it

to the specific problem of solving large linear programs.

Given a polyhedron P

P = {x E Rk: a'.x + b. = 0, i C I}.
1 1
We assume P J 0 and I j 0 and finite. Goffin [15] has shown that when

P is full dimensioned finite convergence is assured for a range of

relaxation parameters A depending on the obtuseness index of the

polyhedron. When P is not full dimensioned,we can partition it to

subsets M and C, such that M is full dimensioned, and then devise an

iterative procedure relative to these subsets so as to take advantage

of the full dimensionality. Even though finite convergence is not

assured, we can hope to get a better convergence rate with a proper

choice of the relaxation parameters. There is considerable flexibility

in the manner M and C are chosen. One important consideration is that

it should be easy to project on to C. The algorithm is in two phases.

In Phase I we use an iterative scheme on the sets

M and C which drives us close to P. We do this by starting with an

arbitrary point in C. We then move in the direction of M. This

movement could be a projection onto M, or a reflection or under or

over relaxation, depending on the value of the relaxation parameter

A. From the new point thus obtained we project on to C. We show

that the sequence thus constructed converges to P with a linear rate

of convergence. In our algorithm however, this iterative procedure

is continued merely to get close to P. In this special case when
C = R Phase I becomes the relaxation procedure of Agmon.

Once we get close to P we switch to Phase II. In Phase II we

consider the constraints of the set P and apply the

Generalized Merzlyakov method. The motivation for use of the

Generalized Merzlyakov method at this stage is the fact that the

set of violated constraints as we approach P are precisely the set

of violated constraints at the limit point and under these

circumstances, the Generalized Merzlyakov method can be very


We first describe the procedure for finding a point of an

arbitrary polyhedron P.

The Algorithm

Phase I.

Let z C-P

If z E P stop, otherwise define

n+l } T )
z = TC {zn + Ad (zn)a.i } TC(sn)

where 0 < A < 2 and a. corresponds to the most violated halfspace
of M at z and

3.14 sn = zn + Ad (zn)a

We will show that the sequence {zn) is Fejer-monotone with

respect to P.

3.15 Lemma [Goffin]
Let x, y, t be points of R and let

x' = x + A(t x) where A E R. Then

I x' y 12 = Ix y 2 2 A) It x 2 + 2A(t-y)'(t-x)


I x' yll = x + A(t x) yi 2

= 1 x y1| 2 + 12(x y) + A(t x)]' A(t x)

= Ix y 2 A(2 A)| t xII 2 + A[2(x y) + 2(t-x)]'(t-x)

= x y 2 A(2- )I t xI 2 + 2A(t y)'(t x). i//

3.16 Lemma

-sn ul2 = izn u2- AdM(zn)[(2 -)d(zn) 2a' (tn u)]
for u E M.


n n n
Substitute s for x z for x and t for t, where

t = z + d (z )a Then
1I s ul 2 = 11 z u12 A( (2 ) i )t zn12

+ 2A(tn u)'(tn z)

= |z u 2 A(2 A)d (zn) + 2Ad (zn)al (tn u)
9 1
= I zn u| AdM(Zn)I(2 A)d:(zn) 2ai (t" u)].


3.17 Theorem

Let M be a non-empty polyhedron of Rk. Then the sequence of

points generated by equation (3.14) has the following property:

Ss" ull 1 z" ull for all u E M.


If u E M, u H Hence a' (tn u) 1 0 for all u M. By
1 1
n n
lemma 3.16

I sn ui I z ull .

3.18 Theorem
Let P be a non-empty polyhedron of Rk. Then the sequence of

points {zn} generated by equation (3.14) is Fejer-monotone with

respect to P, in fact

I zn+ ull < 1n ull for all u P.


Let u c P and let b be the unique point obtained by projiccring
n n+l
u onto the line formed by extending s z (Figure 23). The angle
formed by s b, u is a right angle and for the right triangle
s b, u.
n 2 2 n 2
I s all 2 + II a u 2 = 1 s u[ 2
I n n+l 2 + n+l al2 + 2j su n+l 1 nl+ _
I| s z 1| + || z al| + 2 s z I[ |1 z a |

+ a ul 2 = sn ul

n n+1 2 n+l 2 n n+1 n + I
s + u + 2 js I i|

= sn u


n+l u 2 <11n 2
|| z ull < || s ull

n n+l
The strict inequality arises from the fact that s and z are

distinct or else z E P. But by Theorem (3.17)

II n ul 11 zn u|I hence

Szn+l ull < I zn uI I7/7

In the next theorem we show that the sequence {zn} converges

linearly to P.


3.19 Theorem

The sequence of points {zn) generated by the above procedure

converges finitely or infintely to a point of P. Furthermore the

convergence rate is linear.


Izn+ T (zn+12 = I zn+ u 2 for a unique u E P (the
closest point of z to P). Thus

I1z n ul 2 = TC{zn + AdM(z )a } TC(u) 2
S1 zn u|2 Ad (zn)[2a' (zn-u)-Ad (zn)]
since the projection map satisfies the Lipschitz condition with

c = 1. Continuing
n+1 2 n 2 n n n
z u z u12 Ad (z )[2d (Z ) Ad (Z )]

since a' (zn u) = d ,(zn) hence
i M

Iz n+l ul 2 < d2(zn, P)[l A(2 A)"2] by Lemma 2.12

= 2d2 (z P).

We get

d(z n+, P) < ed(zn, P) where

0 = [1 A(2 A),*2]1/2

and 0 E [0, 1).


d(zn, P) < 9nd(z0, P) and if the sequence is infinite

lim d(zn,P) = 0

Using Theorem [2.4] and by the continuity of the distance function

this implies {z n converges to z* in the boundary of P.

If the sequence terminates at z*, then

Iz* Tp(zn)I l< Tz(zn)jT = d(zn, P)

by Fejer-monotonicity of the sequence. Hence z* and zn both belong

to the ball B (Tp(zn)), and
d(zn, P)

1I z*- zn < 2d(zn, P) < 2Ond(0, P). -

Phase II. When the last k points are within E of each other,

we switch to the Generalized Merzlyakov procedure. This is because

if we are near the limit point, the only active constraints would

be the tight constraints at the limit point of the sequence. In

such a situation the Generalized Merzlyakov method can be very

effective with a proper choice of A and A.(x). We recapitulate

that in Merzlyakov's procedure (which is a special case of

Generalized Merzlyakov method) the sequence is generated as follows:

A n (j, V)(a'x" + b.)][A(j, V)ai]
n+l n 1 1
x =x -
[x.,(j, V)ai] [Ei(j, V)ai]
where x e S

If we assume that we know the set of constraints satisfied

with equality by Tp(xn), we can generate the halfspace

H* = {x Rn:(Tp(xn) x") (x Tp(xn)) > 0}

which gives convergence to Tp(xn) in exactly one step with X = 1.

Suppose we number the set of active constraints at T (xn) by

I* = {1, ..., p}. Let A* E (al,..., a )' be the p x k matrix
formed by interior normals of the halfspaces H. for iC I*. Let
b* = (b ..., bp)'. Define

(A, ...,A )' = -(AA*')-(A*xn + b*).

With this set of X.(j, V)'s using Merzlyakov method, Goffin [15]

has shown that convergence to T (x ) is obtained in one step.
However, this requires a comparably larger amount of computation

and finding (A*A*')-1 is not really practical on large problems.

In this study we have therefore concentrated on the Generalized

Merzlyakov Method and attempted to find what would constitute a

favorable set for A and X.(x).

Application to Solving Large Linear Programs

We now consider how the procedure could be used to solve

large linear programs.

Consider the LP

max c'x

s.t. Ax b

x_ 0

and its dual

min T'b

s.t. A'T c


By stong duality, an optimal I and x satisfy

b'T c'x < 0. Let

C = ((): b'T c'x 0}

M= {(x): Ax = b, -A'T : -c, x 2 0, T 0}

and P = M n C.

TIis is only one of the ways of partitioning the cons. L '1 ilL of

P. There is in fact great flexibility in the choice of C and II.

The advantage of splitting C and M is indicated above is

(a) M has a special structure and can be decomposed along T

and x. This leads to saving in storage as well as easier arithmetic.

(b) C contains only the coupling constraints and is easy to

project to.

However, there are other ways of obtaining P. Another

choice could be to let C have only the non-negativity constraints.

Again the advantage of such a construction would be ease of pro-

jection to C. Still another choice is for C to take the form

C = {(x): b'n c'x 5 0, x 2 0, Tr 2 0}.

There is user discrimination in this algorithm in the choice

of M and C, 1 and Z in Phase I, and A and weights A.(x) in Phase

II. We have already commented on the possible strategies for

selecting Ii and C. We give below strategies for selecting the

other parameters. We will first discuss the parameters for Phase I.

(1) A: If 0 < A 5 1 convergence may be slow if some of the

"angles" of M are very acute (small obtuseness index of P). In

this case increasing A to near 2 will have the effect of opening

the angles and accelerating the convergence. It therefore appears

to be a good strategy to have A approximately equal to 2 in Phase I.

(2) Z : In line with the general observation for relaxation

methods that anya priori information on the location of the solution

could be used to advantage, if we have to solve a linear program

with only slightly varying data, we could Lake advantage of this

feature. However, in general when we do not have any such prior

information, the choice of Z0 could be dictated by the structure of

C, so that we can start with a feasible point of C.

Now for the parameters of Phase II

(1) A: As a general guideline 1 < A <2 may be chosen as the

parameter for Phase II as was done in Phase I. However, Table 3

and Figure 24 show the sequence obtained with A = 1.5 and h = .75

and illustrate that in some cases choice of 0 < A < 1 may give

better results.

(2) A.(j, V): In Merzlyakov method the choice of weights

X.(j, V) is crucial and in a large measure dictates the convergence
of the sequence. It may be noted that Ai(j, V)'s determine both

the direction as well as the step size. Suppose a., i C K

represent the set of violated constraints in subcavity S where

xn SV. We can solve the quadratic problem of finding the point

of minimum norm in convex hull of the finite point set a., iE K

to get the weights X.(j, V). Such a direction locally has a great

deal of merit. However the computational effort is too great to

merit its consideration, since at each iteration we have to solve

a quadratic program. Also it may not always lead to the best set

of weights as shown by the counterexample given in the next

paragraph. Instead we specify X (j, V) =) -, where KI represents

the number of constraints violated at x When IKj = 2 the

direction obtained coincides with the vector of minimum norm.

The strategy suggested by us is not the best in all cases as

shown by the following counterexample:

Consider x > 1

x2 = 2

1 0
x =(

Case 1: A = .75

A.(j, V) = 1 for all i E V.

n El.(j, V) [ZA. (j, V)a. ]'
x 1 A.(j, V)a 1
n n n xn i [7%i (j
x x2 (a'x + b.) 11 IE(j, V)a.]
1 21 1 1
0 0.000 0.000 -2.873 2'87 1
1 .619 -2.064 -1.838 (' ) .037
2 7.622 .686 .459 ( 88 .037
3 9.371 .342 .273 ( '99 1
4 9.351 .138 .125 ( 37 .037
5 9.827 .044 .032 ( ) .037
6 9.949 .020 .010 ( 99) 1
7 9.948 .013 .008 ( 18 .037
8 9.978 .007 .002 ( 58) 1

Case 2: A = 1.5 A.(j, V) = 1 for all i E V.
n EAi (j, V) [Z .(j, V)a.]' *
i .(j, V)a., V)ai
1_1 2 i 1 1
0 0.000 0.000 -2.873 ( 8 1
1 1.237 -4.128 -3.235 ( 9) 1
2 .757 .700 -3.326 ( -28 1
3 2.189 -4.079 -3.280 ( 99) 1
4 1.702 .816 -3.166 ( -8) 1
5 3.065 -3.733 -3.022 99) 1
6 2.616 .777 -2.866 ( 58 1
7 3.850 -3.347 -2.710 ( '995)
8 3.447 .704 -2.558 287


problem considered is
P = {(x2): -.099x1 +
.287x1 -

.995x2 + .995 > 0,
.958x2 2.873 2 0, x 1 0, x2 0}.

2 x

6 x

A = 1.5

S= 5
,' '


From the proof of Theorem 3.4 we have

2 nl (2 A)[EL.(xn)(a'xn + b)]2
2 n+1 2 n i i
d (x P) d(x, P)- -( X (
(C^i(xn)ai) C Ci(xn)ai)

Thus to get the best set of k.'s we may solve the following
optimization problem

(A.(-a'xn b.))2
1 1 1
max = k
I EA.a.l| 2
I 1 1 2
Suppose A* solves the above, then i also solves the above.
1 l n Ea.H
S1 1i
Hence an equivalent problem is

max (EX.(-a'.x b.))
1 1

s.t. I EA.ai = 1.

In our specific problem

max A1 + 2A2

2 2
s.t. A + = 1
1 2
or max jl A22 + 2X2'
1 2
This gives iA = and .
1 a 2 J57
This set ofA.s gives
2 1
x =( ) which is feasible with k = 5.
Had we specified A, = A i.e., A. =-
1 2 '-- i ,

x =3 which is not feasible and k = 4.5.

Thus our prescription does not lead to the best set of Ais, it is

however a good heuristic and easy to implement.

(3) A.(x). In Generalized Merzlyakov method, the weights A.(x)
3 J
are determined by the point alone and are not fixed for a subcavity.

The following appears to be intuitively a good set of weights since

the weightage is dependent on the magnitude of violation of a

constraint at the specific point

.() = (violation of constraint j at x
j total violation at x

Computational Results

Die algorithm described in the previous section was coded

primarily to come up i Jith a favorable set of '.alues for the

arbitrary parameters \ in Phase I and and the weights \.(x) for
Phase II. It may' be stated at the outset that selection of these

parameters is an art rather than a science and it uould be hard to

specify the best set of paramccers for the general .:ase.

The test problem considered uas the problem on "Lease-Buy

Planning Decisions" on pages 93 to 114 of Salkin and Saha's

"Studies in Linear Programming." The problem has 17 conscaints

and 20 variables. Together uich dual and non-negati'.'icy, set H

thus had 74 constraints. Set C consisted of the constraint obtained

by using strong duality. This problem has a unique solution. The

constraints of the problem are reproduced in A.ppendi:.:

-Test results corroboraLe tlie gncra-i l ,biiJd:lines for selection of

cliesc p[ rame Lcrs indicated in the ;rc'. ious secLiol. '1ile di L.ince

from 3 point of I' 1wliere P = 11 ii C, ,was calculatcJd 3C ench iteration.

Tlie number of iterations required to reduce the distance from P bY'

3 certain amount was used as the criterion of improvement. Two

different starting points were selected. Results of computations

are summarized below:

Phase I

A. The relaxation parameter X was varied in the interval (0,2).

For the first starting point, the number of iterations required to

reduce the distance d from P from d = 74 to d = 60, was noted, and

for the second point the number iterations required to reduce the

distance from d = 135 to d = 132 was noted. These results are

summarized in Tables 4 and 5 respectively. The results are in line

with our expectation based on intuition that a higher A has the effect

of accelerating convergence.

Phase II

The Phase II relaxation parameter was varied in the interval (0,2)

and the following alternatives were tested for the weights A.(x):

(a)(x) # of violated constraints at x

(b)X (x) violation of constraint j at x
W total violation at x
(c). (x) violationn of constraint j at x)2
3 total violation at x

(d) .(x) =(violation of constraint j at x 3
j total violation at x
(e).(x) violation of constraint j at x)
j total violation at x

() violation of constraint j at x)
)if ( total of violation at x = .3

= violation of constraint i at x
j( = total violation at x

S.2 (violation of constraint j at a x
total violation at x
f .2 5< (v violation of constraint j at x
I total violation at x
J total violation at x



.3 then


Initial d = 74

X Final d No of iterations

.25 63.757 > 20,000

.5 60.0 12,955

.75 59.999 7,339

1.0 60.0 4,866

1.25 59.997 3,469

1.5 59.997 2,565

1.75 59.996 1,927

1.95 59.997 1,525

Initial d = 135.

A Final d No. of iterations

.25 133.113 > 20,000

.5 132.0 12,698

.75 132.0 6,692

1.0 132.0 4,208

1.25 132.0 2,934

1.5 131.999 2,137

1.75 131.999 1,587

1.95 131.999 1,263


if < (violation of constraint j at x .2
total violation at x < 2 then

=3.3*(violation of constraint j at x)
total violation at x

if violationn of constraint j at x <1 then
if (x) ----- -: -- = 3.13then
total violation at x

.(x) = violation of constraint j at x)
total violation at x

(g) f ( violation of constraint j at .3 then
total violation at x

= i00"(violation of constraint j at x
j total violation at x

if" 2 < (violation of constraint j at) < .3
if (x2 = 1 0( ---- - -: -- --) .3 t e
total violation at x

S= I0(violation of constraint j at x
total violation at x

if 1 1 < (violation of constraint j at x) <.2 tien
if 2 < ( ----- - ----- ) < .32 then

total violation at x

A.(x) = .3-violation of constraint j at x
j total violation at x

if (violation of constraint j at ) < then
if ( -- 1 <--. ( -) < .1 then
total violation at x

.) = (violation of constraint j at x
j total violation at x

Results of alternative (a) are given in Tables 6 and 7 for

the same two points used in Phase I.

From Tables 6 and 7 it would appear that again A of around 2

gives best convergence, but the counterexample in the previous

section shows that this is not always the best policy with this set

of weights.

Results of alternative (b) are given in Table 8.

(a) X (x) = 1
j # of violated constraints at x

Initial d = 74

A Final d No. of iterations

.25 -- time limit exceeded

.5 -- time limit exceeded

.75 60.0 14,839

1.0 60.0 8,028

1.25 59.998 4,702

1.5 59.998 2,861

1.75 60.0 1,424

1.95 59.857 850

(a) X.(x) =

# of violated constraints at x

Initial d = 135

X Final d No. of iterations

.25 -- time limit exceeded

.5 133.45 >20,000

.75 132.0 15,249

1.0 131.999 7,568

1.25 131.999 4,135

1.5 131.999 2,435

1.75 131.998 1,212

1.95 132.0 713

(b) .(x) violation of constraint j at x
j total violation at x

Intiial d = 74

X Final d No. of iterations

.25 59.999 1,253

.5 59.964 1,396

.75 59.997 1,505

1.0 60.0 1,084

1.25 59.999 1,254

1.5 59.976 1,061

1.75 59.997 863

1.95 59.998 894

University of Florida Home Page
© 2004 - 2010 University of Florida George A. Smathers Libraries.
All rights reserved.

Acceptable Use, Copyright, and Disclaimer Statement
Last updated October 10, 2010 - - mvs