Citation
Iterative methods for solving large-scale linear programs

Material Information

Title:
Iterative methods for solving large-scale linear programs
Creator:
Kumar, Gollakota Surya, 1939-
Publication Date:
Copyright Date:
1979
Language:
English
Physical Description:
iv, 106 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Linear inequalities ( jstor )
Linear programming ( jstor )
Mathematical procedures ( jstor )
Mathematical programming ( jstor )
Mathematical relaxation method ( jstor )
Mathematics ( jstor )
Perceptron convergence procedure ( jstor )
Polyhedrons ( jstor )
Simplex method ( jstor )
Dissertations, Academic -- Management -- UF ( lcsh )
Linear programming ( lcsh )
Management thesis Ph. D ( lcsh )
Genre:
bibliography ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis--University of Florida.
Bibliography:
Bibliography: leaves 101-104.
Additional Physical Form:
Also available on World Wide Web
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Gollakota Surya Kumar.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
023308990 ( AlephBibNum )
06429711 ( OCLC )
AAL1949 ( NOTIS )



Full Text












ITERATIVE METHODS FOR SOLVING
LARGE-SCALE LINEAR PROGRAMS













Gollakota Surya Kumar














A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY








UNIVERSITY OF FLORIDA


1979

















ACKNOWLEDGEMENTS


I wish to express my deepest gratitude to Professor Gary

Koehler, my dissertation chairman, for the excellent guidance, the

constant encouragement and the support he has provided me during

my career at the University of Florida. Despite his busy schedule

he was always available at a moment's notice.

I also wish to thank Professor Antal Majthay whose depth of

knowledge not only in the area of operations research but on a

wide range of subjects will always remain awe-inspiring. I am

also grateful to Professors Ira Horowitz and Henry Tosi for

providing financial assistance during my stay at the University of

Florida. I also wish to thank Ms. Kathy J. Combs for her superb

typing of the manuscript.

Last but not least, this work would not have been possible

but for the patience, understanding and encouragement of my wife

Rekha.

















TABLE OF CONTENTS

PAGE

ACKNOWLEDGEMENTS ............................................ ii

ABSTRACT ..................................................... v

CHAPTER

ONE    THE PROBLEM: WHY ITERATIVE TECHNIQUES? ................ 1

TWO    BACKGROUND RESULTS AND TOOLS .......................... 8

       Introduction .......................................... 8
          Motivation ......................................... 8
          Task ............................................... 11
          Definitions ........................................ 12
       Typical Relaxation Method ............................. 13
          Procedure .......................................... 13
          Variations ......................................... 14
       Convergence ........................................... 14
          Fejer-monotone sequence ............................ 14
          Motzkin and Schoenberg ............................. 16
       Relaxation Procedures ................................. 18
          Agmon .............................................. 18
          Motzkin and Schoenberg, Eaves ...................... 21
          Goffin .............................................
             Definitions ..................................... 22
             Another proof of Agmon's result ................. 33
             Range of λ for finite convergence ............... 37
          Merzlyakov ......................................... 39
             Definitions and motivation ...................... 39
             Method .......................................... 40
             Generalized Merzlyakov procedure ................
       Typical Subgradient Method ............................
          Procedure ..........................................
          Variations .........................................
       Some Subgradient Procedures ...........................
          Polyak .............................................
          Oettli ............................................. 51
          Some of the more recent algorithms .................
             Camerini, Fratta and Maffioli ...................
             Wolfe ........................................... 59


                                  iii









CHAPTER


THREE THEORETICAL RESULTS ................................ 62

Comparison of Generalized Merzlyakov Method
and Some Typical Subgradient Methods ............ 62
Generalized Merzlyakov method ................ 62
Further developments of Oettli method ........ 66
Generalized Merzlyakov method and Oettli method 68
Comparison with some other subgradient
methods ...................................... 71
An Algorithm Using Relaxation Method ............ 73
Motivation ................. ................. 73
The algorithm ............................... 74
Phase I ............... ......... ......... 74
Phase II ................................. 79
Application to solving large linear
problems .................................... 80
Computational results ........................ 86
Concluding Remarks .................... ......... 98

APPENDIX ....................................................... 99

BIBLIOGRAPHY ................................................. 101

BIOGRAPHICAL SKETCH ................ .......................... 105



















Abstract of Dissertation Presented to the Graduate Council
of the University of Florida
in Partial Fulfillment of the Requirements
for the Degree of Doctor of Philosophy


ITERATIVE METHODS FOR SOLVING
LARGE-SCALE LINEAR PROGRAMS

By

Gollakota Surya Kumar

August 1979

Chairman: Gary J. Koehler
Major Department: Management


The simplex method for solving linear programs has achieved its overwhelming success over the years due to the advances during the last three decades in extending the effectiveness of the basic algorithm introduced by Dantzig in 1951. However, the largest problem that can be economically solved by this method is dictated by the size of the basis, the limitation being the ability to maintain a sparse yet accurate representation of the basis inverse. Commercial computer codes using the recent developments of the simplex method are capable of solving mathematical programming systems of the order of 10,000 rows. However, larger linear programs do arise in practice. The reason they are not being set up is that none of the production LP codes can solve such large problems in a timely and cost effective fashion. For such large









problems iterative techniques that do not require a basis may be

an attractive alternative.

We review two classes of iterative techniques--relaxation

methods and subgradient optimization. Solving a linear program

can be shown to be identical to finding a point of a polyhedron P, the polyhedron being formed from the primal and dual constraints and an additional constraint using strong duality. In relaxation methods we successively find points of relaxed forms P^n of P. That is, at iteration n we find an x^n ∈ P^n such that P ⊂ P^n. We assume P ≠ ∅. The problem is solved when the sequence converges to a point x* ∈ P.

Subgradient methods are an alternative class of iterative techniques for solving these problems. In subgradient methods we minimize a convex, though not necessarily differentiable, function f(·) by the iterative scheme x^{n+1} = x^n − t_n u^n, where t_n > 0 is a scalar obtained by the approximate minimization of f(x^n − t u^n), and u^n is a subgradient of f at x^n. When we use the minimization method for finding a point of a polyhedron P, the function f is so defined that if x* is a minimum of f then x* ∈ P.

These iterative techniques work with original data; thus

advantages of supersparsity can be fully realized and the program

run in core. Also they are self correcting and knowledge of a

vector close to the solution can be used to advantage.

We show that a generalized form of Merzlyakov's relaxation

procedure subsumes most of the recent subgradient procedures when

the objective is to find a point of a polyhedron. A new algorithm








using the relaxation method is presented. This algorithm was coded to

explore values for the relaxation parameters used in the algorithm.















CHAPTER 1

THE PROBLEM: WHY ITERATIVE TECHNIQUES?


Mathematical programming has achieved its present popularity

in a large measure due to the success of the simplex method for

solving linear programs published by Dantzig [6] in 1951. Over the

years considerable effort has been expended towards exploiting and

extending the effectiveness of this procedure. We will briefly

outline some of the improvements.

Most practical linear programs have a sparse matrix [2] and it

is necessary to exploit this sparseness to reduce memory and time

requirements and to maintain accurate information. The first such

procedure was the revised-simplex procedure. This procedure

operates only on original problem data and requires an inverse of

the current basis. If there are m constraints and n variables

(including slacks, surplus and artificial) then the regular

simplex procedure requires

(1.1)    (m + 1)(n + 1)

storage locations. There are m + 1 rows (including the objective) and n + 1 columns (including the right-hand side). In the revised-simplex procedure we need

(1.2)    p(m + 1)(n + 1) + m^2

where p is the problem density. (1.1) is much larger than (1.2) if p is close to zero.
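As a rough illustration (the figures here are assumed purely for the sake of the example), take m = 1,000 constraints, n = 5,000 variables and a density p = 0.01. Then (1.1) gives (1,001)(5,001), about 5.0 million storage locations, while (1.2) gives 0.01(1,001)(5,001) + 1,000^2, about 1.05 million, roughly a five-fold saving.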









It was also discovered that if the inverse matrix were stored

in product form [29] an additional decrease in storage requirements

could be realized. However, in product form, the basis inverse

grows in size at each iteration so that eventually we must either

run out of room or scrap the current form of the basis inverse and

reinvert the current basis. In practice this happens quite often.

Using product form inversion, the storage requirements are

(1.3)    p(m + 1)(n + 1) + qmr

where q is the average density of the eta vectors needed in the

product form inverse and r is the average number of eta vectors.

Notice, though, that to gain the decrease in (1.3) over (1.2) we

must spend time in the reinversion step.

It was soon realized that there are many ways of forming a

product form of a basis matrix. For instance, the matrix could

first be decomposed into the product of two triangular matrices.

The eta vectors for these are sparse and easily formed. In

addition, within a class of procedures for inverting the original basis, one can choose the row-column pivot order to minimize the creation of new elements -- i.e. one can choose the order in forming the eta vectors to minimize the density of the resulting basis inverse representation [20]. A related issue has to do with updating the current basis inverse with a new eta vector. One can perform this operation with a minimal buildup in storage requirements [14].

Kalan [22] pointed out that of the p(m + 1)(n + 1) elements of a typical LP problem, most are identical (usually most are ±1s).









Kalan suggested storing only the unique numbers of an LP problem

and using pointers to store the problem structure. In so doing

one stores fewer elements than exist in the problem. The new

sparseness is termed supersparsity. Kalan noted that not only can

original data be stored this way but also new data (formed in the

basis inverse) can be stored in this fashion.

Another useful idea -- scaling of data -- has numerical

stability for its motivation. Generally for real problems the

coefficients of matrix may take on a wide range of values. This

can cause numerical difficulties -- for example in inversion --

since the computer must round off fractions. To reduce the effect

of this problem, the matrix entries are scaled before starting the

problem.

There are a number of ways in which to obtain a starting

basis: from scratch using an "artificial basis," using a CRASHed

basis forcing some of the variables into the basis, or based on

information about the problem. The last approach is especially

useful in solving a series of problems with only minor variations

in data.

Candidate selection can play an important part in the

efficiency of an LP algorithm. There are a number of ways to select

the entering variable -- "multiple pricing" is one of them which is

especially useful for large problems. In this approach we typically

divide the variables into segments and pick the segments sequentially.

In the chosen segment we select the g most negative reduced cost

variables as candidates for pivoting. We concentrate on these g









non-basic variables for the remainder of some h iterations, and then repeat the procedure with the next segment. A more recent

development concerning pivot selection is due to Paula Harris [17].

The reduced costs here are weighted prior to selection. The

weights are chosen with respect to a fixed set of non-basic

variables. The idea is that in the traditional most negative

reduced cost approach we are trying to find out the best candidate

but on a local basis -- locally we move in the direction of

steepest ascent of the objective function. Such a selection may

not serve global interests. By constantly adjusting the pricing

by weights derived from a fixed reference point, the global

interests of the problem are hopefully maintained. Experimental

results corroborate this expectation and show it to be a very

effective procedure.

When there is degeneracy in a linear programming problem, there exists the possibility of a set of basic feasible solutions occurring repeatedly -- called "cycling." Bland [3] has introduced several new methods to avoid cycling -- the simplest of which is to select as entering variable the one with the smallest subscript among all candidates to enter the basis. If there is a tie for the leaving variable, select the one with the smallest basic subscript.

Parallel with efforts to increase the size of problem that can be tackled by the simplex method have been efforts to increase its capabilities for problems having special structure. The computing effort using the simplex method depends primarily on the number of rows; so it is worthwhile considering the possibility of








taking care of some constraints without increasing the size of the

basis. The Generalized Upper Bounding (GUB) technique introduced by Dantzig and Van Slyke [9] achieves this for constraints of the form

(1.4)    Σ_j x_j = b_i

with the restriction that the same variable does not occur in more

than one GUB constraint. Decomposition technique is another method

of extending the capability of problem solving by breaking it into

smaller subproblems and using a master program iteratively to tie

them together. Unfortunately the convergence of this method is

rather slow and in addition one is unable to compute bounds on the

value of the objective function at each iteration, unless each of

the subproblems is bounded.

Despite the far reaching advances made during the last three

decades in extending the capability of the simplex method, some of

which have been mentioned above, the basis inverse still remains

the limiting factor dictating the size of the problem that could

be tackled. Even when we use variants like GUB and decomposition

techniques, the size of the basis is still the bottleneck. In the

case of GUB it is the basis of the master problem consisting of

other than GUB constraints and with decomposition it is the size of

the basis of the largest of subproblems. Thus this basis inverse

is the heart of all that is bad in simplex method related techniques

of solving an LP.

Commercial computer codes using the above developments of the simplex

method are capable of solving mathematical programming systems of

the order of 10,000 rows. However, useful linear programs of such









magnitude do arise in practice, for example in the food, oil and

chemical related industries. The reason why larger ones are not

being set up and solved is that no production LP code has been

developed to solve such large problems in a timely and cost

effective fashion. This is not to say that there are not large

linear programming production codes. For example, IBM's MPSX/370,

CDC's APEX, Honeywell's LP 6000 X, Techtronic's

MPS3, and others are reliable and available at reasonable rates.

However, one would not seriously consider using one of these to

solve a linear program of 10,000 or more rows on a routine basis

unless the problem possessed some nice structure or the results of

the problem solution had a high economic value.

Dantzig et al. [8] in their justification for the setting up

of a system optimization lab for solving large scale models have

noted:

Society would benefit greatly if certain
total systems can be modeled and successfully
solved. For example, crude economic planning
models of many developing countries indicate
a potential growth rate of 10 to 15% per year.
To implement such a growth (aside from political
differences) requires a carefully worked out
detailed model and the availability of computer
programs that can solve the resulting large-
scale systems. The world is currently faced
with difficult problems related to population
growth, availability of natural resources,
ecological evaluations and control, urban
redesign, design of large scale engineering
systems (e.g. atomic energy and recycling
systems), and the modeling of man's physiological
systems for the purpose of diagnosis and treatment.
These problems are complex and urgent and can
only be solved if viewed as total systems. If
not, then only patchwork piecemeal solutions will
be developed (as it has been in the past) and
the world will continue to be plagued by one crisis
after another caused by poor planning techniques.








The bottleneck in solving such large unstructured LPs is the

inability to maintain a sparse yet accurate representation of the

basis inverse. When the simplex method is not computationally

feasible for such large problems, iterative techniques that do not

require a basis inverse may be an attractive alternative. In this

dissertation we concentrate on iterative procedures for solving

linear programs.

We discuss iterative techniques under the headings (a)

relaxation methods and (b) subgradient methods. These methods are

insensitive to the number of constraints. In fact an increase in

the number of constraints for the same problem improves the

convergence of the relaxation methods. Unlike the simplex method,

iterative techniques are self-correcting. We always use the original

data; thus the advantages of supersparsity [22] can be fully realized

and the program run in main memory. Finally knowledge of a vector

close to the solution can be used to advantage. This is very

useful when a sequence of problems has to be solved with only

slightly varying data.

As an aside to the main thrust of this dissertation, we note

that iterative methods are more attractive for solving certain

large Markov decision processes and also for problems having a

Leontief substitution constraint set [23, 25] -- both of these problems

are important special classes of LPs.















CHAPTER 2

BACKGROUND RESULTS AND TOOLS


Introduction

Motivation

In this chapter we will review various iterative techniques

for finding a point of a polyhedron. We are interested in finding

a point of a polyhedron because a wide variety of important

mathematical programs can be reduced to such a problem. For

example, solving a linear program could be reduced to finding a

point of a polyhedron using primal and dual constraints along with

an additional constraint implying strong duality. To elucidate this

further, consider the following primal-dual linear programming pair:

(2.1)    Primal:  Max c'x   subject to  Ax ≤ b,  x ≥ 0
         Dual:    Min v'b   subject to  v'A ≥ c',  v ≥ 0

Here A is m x n; c is n x 1 and b is m x 1. v' stands for the transpose of v. Problem 2.1 can be restated as

(2.2)    -Ax + b ≥ 0
         v'A - c' ≥ 0
         c'x - v'b ≥ 0
         v ≥ 0, x ≥ 0









Let P = {(x, v) : (x, v) satisfies 2.2} be the set of solutions to 2.2. (x, v) ∈ P implies that x solves the primal problem in 2.1 and v solves the dual problem.
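The reduction just described can be made concrete with a small sketch. The following is a minimal illustration (not part of the original text; names and the stacked form Gy ≥ h are assumptions) of assembling system 2.2 from given data A, b, c.

import numpy as np

def lp_as_polyhedron(A, b, c):
    """Stack primal feasibility, dual feasibility and strong duality into one
    system G y >= h with y = (x, v), x in R^n, v in R^m, so that any y in
    P = {y : G y >= h} yields an optimal primal-dual pair."""
    m, n = A.shape
    G1 = np.hstack([-A, np.zeros((m, m))])                  # -Ax + b >= 0
    h1 = -b
    G2 = np.hstack([np.zeros((n, n)), A.T])                  # v'A - c' >= 0
    h2 = c
    G3 = np.hstack([c.reshape(1, -1), -b.reshape(1, -1)])    # c'x - v'b >= 0
    h3 = np.zeros(1)
    G4 = np.eye(n + m)                                       # x >= 0, v >= 0
    h4 = np.zeros(n + m)
    return np.vstack([G1, G2, G3, G4]), np.concatenate([h1, h2, h3, h4])

Any of the iterative methods reviewed below can then be applied to the single feasibility system (G, h).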

Linear programs of the form

    z* = Max_x  Min_{i=1,...,m} (a_i'x + b_i)

relate directly to the problem of finding a point of a polyhedron P, where

    P = {x ∈ R^k : a_i'x + b_i ≥ z*, i = 1, ..., m}

Such problems arise, for example, in large-scale transportation problems and decomposition problems [27]. We will develop such a formulation for the transportation problem.

Given k origins with stocks a_ℓ > 0 of a single commodity, and m destinations with demands d_j > 0, the transportation problem involves determining a shipping schedule that meets all demand and supply constraints at minimum cost. Let

    z_ℓj  be the amount shipped from ℓ to j,
    c_ℓj  the cost of shipping one unit from ℓ to j.

The problem is

    Min  Σ_{ℓ=1}^{k} Σ_{j=1}^{m} c_ℓj z_ℓj

subject to

    Σ_{j=1}^{m} z_ℓj = a_ℓ,   ℓ = 1, ..., k

    Σ_{ℓ=1}^{k} z_ℓj = d_j,   j = 1, ..., m

    z_ℓj ≥ 0

Without loss of generality, we assume

    Σ_{ℓ=1}^{k} a_ℓ = Σ_{j=1}^{m} d_j

If m and k are too large to be handled by conventional network methods [4], then the decomposition principle can be used. Let

    {(z_ℓj^i : ℓ = 1, ..., k; j = 1, ..., m) : i ∈ I}

be a list of extreme points of

    Σ_{ℓ=1}^{k} z_ℓj = d_j,   j = 1, ..., m,     z_ℓj ≥ 0

Then the transportation problem is equivalent to the master problem

    Min  Σ_{ℓ=1}^{k} Σ_{j=1}^{m} c_ℓj Σ_{i∈I} z_ℓj^i λ_i

subject to

    Σ_{i∈I} (a_ℓ − Σ_{j=1}^{m} z_ℓj^i) λ_i = 0,   ℓ = 1, ..., k

    Σ_{i∈I} λ_i = 1

    λ_i ≥ 0.

Let

    b_i = Σ_{ℓ=1}^{k} Σ_{j=1}^{m} c_ℓj z_ℓj^i

    ā_ℓ^i = a_ℓ − Σ_{j=1}^{m} z_ℓj^i

The master problem becomes

    Min  Σ_{i∈I} b_i λ_i

subject to

    Σ_{i∈I} ā_ℓ^i λ_i = 0,   ℓ = 1, ..., k

    Σ_{i∈I} λ_i = 1

    λ_i ≥ 0

The dual to the above problem is

    Max  w

subject to

    −Σ_{ℓ=1}^{k} ā_ℓ^i x_ℓ + w ≤ b_i,   i ∈ I,

i.e.,

    w ≤ b_i + ā^i'x

where

    ā^i = (ā_1^i, ā_2^i, ..., ā_k^i) ∈ R^k

This problem is equivalent to

    Max_{x ∈ R^k}  Min_{i ∈ I}  (ā^i'x + b_i)

Task

Two of the methods of finding a point of a polyhedron are the

Fourier-Motzkin Elimination Method [7] and a Phase I simplex method.

The Elimination Method reduces the number of variables by one in

each cycle but may greatly increase the number of inequalities in

the remaining variables. For larger problems this procedure is

impractical [26] -- since the number of inequalities grows at a rapid

rate. The Phase I simplex procedure can also be used to find a

point of a polyhedron. For very large problems the simplex method

breaks down due to a practical limitation on the size of the matrix









that can be inverted as explained in the last chapter. It is for

problems of this nature that iterative methods may be an attractive

alternative. We discuss iterative techniques under two classes --

relaxation methods and minimization methods.

In relaxation methods, at each iteration we attempt to find a point of a relaxed form P^n of P. That is, at iteration n we find an x^n ∈ P^n such that P ⊂ P^n. We look for the property that the sequence

of points thus generated is pointwise closer to P. Minimization

methods attempt to solve the problem of minimizing a general, maybe

non-differentiable, convex function f('). The sequence generated

generally has the property that the successive iterates are closer

to the level set of solutions in distance; x* is one such point.

The value of the objective function need not necessarily improve at

each iteration. When we use the minimization method for finding a

point of P, the function f is so defined that if x* is a minimum of

f then x* E P.

We will now review the relaxation methods. As was mentioned

earlier, we find at each iteration n a point x^n ∈ P^n, where P^n is a relaxed form of P. That is, P ⊂ P^n. We continue finding points successively in this manner till we can find an x* ∈ P. At least one such point exists by our assumption that P ≠ ∅.

Definitions

We need a few definitions before some of the methods can be described. Let

    ℓ_i(x) = a_i'x + b_i

where i ∈ I, I ≠ ∅ and finite, x ∈ R^k, a_i ∈ R^k and b_i ∈ R. Without loss of generality assume ‖a_i‖ = 1 for all i, where ‖·‖ is the Euclidean norm. H_i given by

    H_i = {x ∈ R^k : ℓ_i(x) ≥ 0}

is the halfspace associated with i ∈ I. P given by

    P = ∩_{i∈I} H_i = {x ∈ R^k : ℓ_i(x) ≥ 0, i ∈ I}

is the polyhedron of interest. E_i given by

    E_i = {x ∈ R^k : ℓ_i(x) = 0}

is the plane associated with halfspace H_i. d(x, H_i) given by

    d(x, H_i) = inf_{z ∈ H_i} ‖x − z‖

is the Euclidean distance from x ∈ R^k to halfspace H_i. d_P(x) given by

    d_P(x) = max_{i ∈ I} d(x, H_i)

is the maximal distance from x to P. B_r(x) given by

    B_r(x) = {y ∈ R^k : ‖x − y‖ ≤ r}

is the ball with radius r and center x. S_r(x) given by

    S_r(x) = {y ∈ R^k : ‖x − y‖ = r}

is the (k−1)-dimensional sphere of radius r and center x.


Typical Relaxation Method

Procedure

A typical relaxation method for finding a point of P is

    (1) Pick x^0 ∈ R^k arbitrarily.

    (2) If x^n ∈ P stop; otherwise let i* be the most isolated halfspace, i.e.,

        −a_{i*}'x^n − b_{i*} ≥ −a_i'x^n − b_i,   for all i ∈ I.

We obtain the sequence {x^n} recursively as follows:

    x^{n+1} = x^n + λ^n (t^n − x^n)

where t^n = P_{E_{i*}}(x^n) is the projection of x^n on E_{i*}. λ^n is called the relaxation parameter at iteration n. Generally λ^n is constant across n = 1, 2, ...

Variations

If λ^n = 1 for all n we call the procedure a projection method, and when λ^n = 2 for all n we call it a reflection method. See Figure 1 for an example of the relaxation procedure. In this example we use λ^n = λ = 2 for all n. At x^0, E_1 is the plane corresponding to the most violated halfspace. We reflect in E_1 to get x^1. At x^1 we reflect in E_3 to get x^2. If we continue the procedure we are in P in four iterations.
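The typical relaxation step is easy to state in code. The following is a minimal sketch (the function name, termination tolerance and iteration cap are assumptions, not part of the original text) for the system a_i'x + b_i ≥ 0, i = 1, ..., m, with rows normalized so that ‖a_i‖ = 1.

import numpy as np

def relaxation_method(A, b, x0, lam=2.0, tol=1e-10, max_iter=10_000):
    """Find a point of P = {x : A x + b >= 0} by relaxing against the most
    violated (most isolated) halfspace; rows of A assumed of unit length."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        resid = A @ x + b                 # residual of each constraint at x
        i = int(np.argmin(resid))         # index of the most violated halfspace
        if resid[i] >= -tol:
            return x                      # x is (numerically) a point of P
        # project onto E_i = {z : a_i'z + b_i = 0}, then relax by lam
        x = x - lam * resid[i] * A[i]
    return x

With lam=1.0 this reproduces the projection method (Agmon, below) and with lam=2.0 the reflection method.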


Convergence

Fejer-monotone sequence

We are ultimately interested in finding a point of P or at least

finding a sequence of points which converge to a point of P. If the

sequence terminates with a point of P (as in Figure 1) we have

accomplished our goal. If the sequence does not terminate, then we

wish to know whether it converges to a point of P. A well known

result concerning "Fejer-monotone" sequences will be used to shed

light on this question.
































FIGURE 1  (the reflection procedure on three halfspaces a_i'x + b_i ≥ 0, i = 1, 2, 3)









A sequence of points {x^n} in R^k is said to be Fejer-monotone [13, 31] with respect to the set P if

    (i)  x^{n+1} ≠ x^n, and

    (ii) ‖x^{n+1} − z‖ ≤ ‖x^n − z‖ for all z ∈ P and for all n.

Motzkin and Schoenberg

A well known result in analysis is that if {s_n} is monotone, then {s_n} converges if and only if it is bounded. Hence we see that a Fejer-monotone sequence always converges in distance. However, we need to relate this convergence to the convergence of the sequence {x^n}. Before we state a theorem on the convergence of Fejer-monotone sequences of points, we need a few more definitions.

Given a set K ⊂ R^k, the affine hull of K, denoted by M(K), is

    M(K) = {x ∈ R^k : x = Σ_{i∈L} λ_i x^i, x^i ∈ K, Σ_{i∈L} λ_i = 1, L finite, λ_i ∈ R}.

For example, the affine hull of a point is the point itself and the affine hull of a line segment is the line containing the segment. The affine hull of a triangle in R^3 is the plane containing the triangle. A set P is affine if P = M(P).

Let M be affine in R^k, x a point of M and r ∈ R a positive real number. Then we define S_r(M, x), the spherical surface with axis M, center x and radius r, by

(2.3)    S_r(M, x) = x + ((M − x)^⊥ ∩ {z ∈ R^k : ‖z‖ = r})

Figure 2 shows the construction of a spherical surface. Given a point x of the affine set M, M − x is the translation of M by x. (M − x)^⊥ is orthogonal to M − x. S_r(0) is a spherical surface with radius r, which intersects (M − x)^⊥ at two points. These two points are translated back by x to give S_r(M, x).
(2.4) Theorem (Motzkin and Schoenberg)

Let the infinite sequence {x^n} be Fejer-monotone with respect to the set P. Then

Case (a): If P has an interior, the sequence converges to a point.


FIGURE 2  (construction of the spherical surface S_r(M, x) from S_r(0) and (M − x)^⊥)








Case (b): If P does not have an interior, then the sequence can be convergent to a point, or the set of its limit points may lie on a spherical surface with axis M(P).

See Figure 3 for an illustration. In case (a) P has an interior and the sequence x^0, x^1, x^2, x^3, ... converges to a point x* of P. In case (b) P does not have an interior and the Fejer-monotone sequence x^0, x^1, x^2, x^3, ... results in the two limit points of the sequence lying on a spherical surface with axis M(P) and center y, y being the projection of x^1 on P.


Relaxation Procedures

Agmon

We are now in a position to start our survey of relaxation

procedures. One of the earliest published methods is the projection

method due to Agmon [1]. Here we specify an initial vector x^0 ∈ R^k (chosen arbitrarily) and set λ^n = 1 for all n. At iteration n, if x^n ∈ P stop. Otherwise we find a new point x^{n+1} as follows:

    x^{n+1} = x^n + (t^n − x^n)

            = x^n − [(a_{i_n}'x^n + b_{i_n}) / (a_{i_n}'a_{i_n})] a_{i_n}

            = x^n − (a_{i_n}'x^n + b_{i_n}) a_{i_n}

since ‖a_{i_n}‖ = 1. Here i_n represents a most violated constraint of P at x^n.













































FIGURE 3  (Case (a): P has an interior and the iterates converge to a point of P.  Case (b): P has no interior; the odd iterates x^1, x^3, x^5, ... and even iterates x^0, x^2, x^4, ... have limit points on the spherical surface S_r(M(P), y))









This procedure yields a sequence which converges to a point of P. In the Agmon procedure

    P^n = {x : a_{i_n}'x + b_{i_n} ≥ 0}

That is, P^n is the closed halfspace defined by the constraint i_n, and P ⊂ P^n. In keeping with our notation, here P^n is a relaxed form of P.

(2.5) Theorem (Agmon)

Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅ and finite, be the solution set of a consistent system of linear inequalities, and let {x^n} be the sequence defined by the projection method, that is, λ^n = 1 for all n. Then either the process terminates after a finite number of steps or the sequence {x^n} converges to a point of P, x*.

Furthermore,

    ‖x^n − x*‖ ≤ 2 d(x^0, P) θ^n,

where θ ∈ (0, 1) and depends only on the matrix A whose rows are a_1', a_2', ..., a_m', and m = |I|.

Agmon explicitly proved the convergence for the projection

method. He also showed that the results could be extended to the

case λ ∈ (0, 2), where λ^n = λ for all n.









Motzkin and Schoenberg, Eaves

Motzkin and Schoenberg [31] (among others) described a reflection

method for finding a point of P. They showed that if P has an

interior then the sequence terminates with a point in P after a

finite number of steps. Let x0 P be an arbitrary starting point

and generate a sequence as follows:

If x C P stop
n n
If xn P select a halfspace such that x n H.1

n+1 n
Let x be obtained by reflecting x in E.. After
J
finitely many iterations, x will fall in P if P has an

interior.

The general reflection function can be expressed as follows:

k k
For i = 1, ..., m define f : R -- R by
i

x 2(a.x + b.)a. if x 1 H.
f.(x) = x + 2d(x, H.)a. =-
x if x E H.
1

Let gl' g2, ... be a sequence where

gj {fl, .. ., f } for j = 1, 2, .

Define g to be the composite function

gn(x) = gn(gn 1 (...(g (x)) ...))

(2.6) Theorem (Motzkin and Schoenberg)

Let P ≠ ∅ and I ≠ ∅ and finite. If P = ∩_{i∈I} H_i has an interior, then for any x^0 there is an ℓ such that

    g^ℓ(x^0) ∈ P.

Eaves [11] extended this result and demonstrated that the reflection procedure has a uniform property. Namely, if x^0 is within a distance δ of P, then g^ℓ(x^0) will fall in P for some ℓ, where ℓ depends on δ and not on x^0.

(2.7) Theorem (Eaves)

Assume that P = ∩_{i∈I} H_i has an interior. Let X be the set of points within a distance δ of P, that is,

    X = {x ∈ R^k : d(x, P) ≤ δ}

Then there is an ℓ such that g^ℓ(X) ⊂ P. g^ℓ is thus a piecewise linear retraction from X to P.

Piecewise linear means that it has finitely many linear pieces. Retraction means g^ℓ : X → P is continuous, P ⊂ X and g^ℓ(x) = x for x ∈ P. See Figure 4 for an example. Six points chosen arbitrarily within 1" of P all converge in under four iterations.

Goffin

Definitions. Goffin's [15] work deserves special mention. He has provided a comprehensive treatment of relaxation methods and presented several useful finiteness properties. We need a few more definitions before presenting Goffin's results.

A convex subset F of P is a face of P if

    x, y ∈ P  and  (x, y) ∩ F ≠ ∅   imply   x, y ∈ F.

We denote the set of faces of P by F(P). Figure 5 illustrates F(P) for a given P. Zero-dimensional faces are 1, 2, 3. One-dimensional faces are 4, 5, 6. The polytope 7 itself is also a face.




































































FIGURE 4







The zero-dimensional faces of P are called the extreme points of P.

The n-1 dimensional faces are called facets of P.







FIGURE 5


N_S(x) defined by

    N_S(x) = {u ∈ R^k : u'(z − x) ≤ 0, ∀z ∈ S}

is the normal cone to S at x. N_S(x) is a non-empty, closed convex cone. It is non-empty because it contains at least the origin. It is closed and convex because the intersection of closed halfspaces is closed and convex. It is a cone because for all u ∈ N_S(x), λu ∈ N_S(x), λ ≥ 0.

    C_S(x) = (N_S(x))°

is the supporting cone to S at x. Figure 6 illustrates these last two definitions.

A point x of a set P is a relative interior point, designated x ∈ r.i.(P), if it is an interior point of P with respect to the relative topology induced in M(P) by the Euclidean topology, i.e., x ∈ r.i.(P) if there exists δ > 0 such that B_δ(x) ∩ M(P) ⊂ P. For example, as shown in Figure 7, a line segment in R^2 has no interior in R^2 but has a relative interior.


FIGURE 6  (the normal cone N_S(x) and the supporting cone C_S(x))




























FIGURE 7


It can be shown that the normal cone is the same for all relatively interior points of a given face, i.e.,

    N_P(x) = N_P(F)   for any x ∈ r.i.(F)

where F is a face of P. Similarly,

    C_P(x) = C_P(F)   for any x ∈ r.i.(F).

Let T be any set; then −T is defined as

    −T = {x ∈ R^k : x = −y, y ∈ T}

A closed convex set is said to be smooth enough at a point y of its boundary if

    −N_P(y) ⊂ C_P(y)

A closed convex set is said to be smooth enough if it is smooth enough at every boundary point, or equivalently if

    −N_P(F) ⊂ C_P(F)   ∀ F ∈ F(P) \ ∅

where F(P) stands for the set of faces of P and F(P) \ ∅ is the set F(P) with ∅ removed.








The smooth enough property applied to a polyhedron would require

all its "corners" to be "smooth enough," carrying the analogy to

k-dimensional Euclidean space.

Some characteristics of smooth enough convex sets are mentioned below:

    A polyhedral cone C is smooth enough if and only if (iff) −C° ⊂ C, where C° is the polar of C,

        C° = {u ∈ R^k : u'z ≤ 0, ∀z ∈ C}.

    A polytope (bounded polyhedron) is smooth enough iff all its extreme points are smooth enough.

    A polyhedron is smooth enough iff −N_P(F) ⊂ C_P(F) for all the minimal elements of the poset {F(P) \ ∅, ⊂}, where ⊂ stands for set inclusion.

Some illustrations are given in Figure 8.

A poset (P, ≤) is a set P on which a binary relation ≤ is defined such that it satisfies the following relations:

    (i)   for all x, x ≤ x (reflexivity)

    (ii)  if x ≤ y and y ≤ z then x ≤ z (transitivity)

    (iii) if x ≤ y and y ≤ x then x = y (antisymmetry)

Examples of posets are {R, ≤}, i.e., the reals with the ordinary less-than-or-equal-to relation, a dictionary with lexicographic ordering, and the power set of S, P(S), with set inclusion ⊂ as the binary relation.

Given a unit vector e ∈ R^k and an angle α ∈ [0, π], the set of vectors which make an angle with e less than or equal to α is called the spherical cone, C_α(e), with axis e and half aperture angle α.


















FIGURE 8  ((a) Not smooth enough at the extreme points of P.  (b) Not smooth enough at any point of P.  (c) Smooth enough.)








    C_α(e) = {x ∈ R^k : x'e ≥ ‖x‖ cos α}

See Figure 9 for an illustration.

FIGURE 9  (the spherical cone C_α(e) = C{B_ν(e)}, ν = sin α, α ∈ [0, π/2))

Some properties of C_α(e) are given below:

    C_α(e) is closed for α ∈ [0, π].

    C_α(e) is a convex set for α ∈ [0, π/2].

    For α ∈ [0, π/2), if we let ν = sin α, then C_α(e) = C{B_ν(e)}, where

        C(S) = {x ∈ R^k : x = Σ_{i∈L} λ_i x^i, x^i ∈ S, λ_i ≥ 0, L finite}

    C(S) is the convex cone hull of S.








Following Goffin [15] we define an obtuseness index u(C) of a closed convex cone C to be the sine of the half aperture angle of the largest spherical cone contained in C. Properties of u(C) are:

    u(C) = sup {sin α : C_α(e) ⊂ C, e ∈ S_1(0)}

         = sup_{e ∈ S_1(0)} min_{i ∈ I} (a_i'e)   (for a polyhedral cone C = {x : a_i'x ≥ 0, i ∈ I}, ‖a_i‖ = 1)

    u(C) > 0 iff C has an interior.

    C_1 ⊂ C_2 implies u(C_1) ≤ u(C_2).

    u(C) = 1 iff C is a halfspace.

    If C_α(e) is a spherical cone then u(C_α(e)) = sin α.

The obtuseness index of a polyhedron P is defined by

    u(P) = inf_{x ∈ P} u(C_P(x)) = min_{F ∈ F(P)\∅} u(C_P(F))

For a polytope we have

    u(P) = min_{F ∈ vertices of P} u(C_P(F))

If u(P) ≥ 1/√2, then P is smooth enough. However, this is not a necessary condition for P to be smooth enough. If P is the positive orthant in R^k, k ≥ 3, then P is smooth enough but

    u(P) = 1/√k < 1/√2.

The distance from a point x to a set S ⊂ R^k is defined by

    d(x, S) = inf_{z ∈ S} ‖x − z‖

If S is closed, then the infimum is attained, and the set of points of S where it is attained is the set of points closest to x in S:

    T_S(x) = {y ∈ S : ‖x − y‖ = d(x, S)}

If a set K is closed and convex, then T_K(x) reduces to a single point








which will also be denoted by T_K(x). This point is called the projection of x into K. Hence T_K(x) is a retraction from R^k to K.

Let K be a closed convex set in R^k. The following important result implies the continuity of the map T_K.

(2.8) Theorem (Cheney, Goldstein)

The projection map T_K satisfies the Lipschitz condition (with constant 1). That is,

    ‖T_K(x) − T_K(y)‖ ≤ ‖x − y‖

See Figure 10 for an illustration.

FIGURE 10  (the projection map is nonexpansive: ‖T_K(x) − T_K(y)‖ ≤ ‖x − y‖)








(2.9) Theorem (Goffin)

Let K be a closed convex set of R^k and T_K(x) the unique closest point to x in K. Then

    {y + N_K(y) : y ∈ K}

is a partition of R^k. See Figure 11 for an illustration.

FIGURE 11


(2.10) Lemma

Let C ⊂ R^k be a closed convex cone. Then

    T_C(C°) = {0}

where C° is the polar of C.

An alternative formulation of a spherical surface with axis M, radius d(x, M) and center T_M(x) (see 2.3) is contained in the following lemma.








(2.11) Lemma (Goffin)

Let M be affine in R^k and x ∈ R^k. Then

    S_{d(x,M)}(M, T_M(x)) = {y ∈ R^k : ‖x − z‖ = ‖y − z‖, ∀z ∈ M}

Finally, a well known result on norm-equivalence is stated below. Here we state the result in terms of distances from x to P, with x ∉ P.

(2.12) Lemma

Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅; then there exists a scalar 0 < μ ≤ 1 such that for all x ∈ R^k

    μ d(x, P) ≤ d_P(x) ≤ d(x, P).

This result is illustrated in Figure 12.

Another Proof of Agmon's Result. The following is a different

proof for the result originally due to Agmon. We will find this

useful in subsequent parts of this material. In this proof we

make use of the result that the sequence generated by the relaxation

method is Fejer-monotone with respect to P.













FIGURE 12  (illustration of Lemma 2.12: μ d(x, P) ≤ d_P(x) ≤ d(x, P))








(2.13) Theorem

Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅ and finite. Then the relaxation method generates a sequence which converges finitely or infinitely to a point x* ∈ P for any given value of the relaxation parameter λ such that λ ∈ (0, 2).

Furthermore, we have

    ‖x^n − x*‖ ≤ 2 d(x^0, P) θ^n

where θ = [1 − λ(2 − λ)μ²]^{1/2}, θ ∈ [0, 1).

Proof:

Assume x^n ∉ P. First we will show, using Figure 13, that

    d²(x^{n+1}, P) = d²(x^n, P) − λ(2 − λ) d_P²(x^n)

where d_P(x^n) is the maximal distance from x^n to P, i.e., d_P(x^n) = max_{i∈I} d(x^n, H_i).

From the right triangle x^{n+1}, t^n, T_P(x^{n+1}) we have:

    d²(x^{n+1}, P) = d²(x^{n+1}, t^n) + d²(t^n, P) ....................... (1)

From the right triangle x^n, t^n, T_P(x^n) we have:

    d²(x^n, P) = d²(x^n, t^n) + d²(t^n, P) ............................... (2)

From (1) and (2),

    d²(x^{n+1}, P) − d²(x^n, P) = d²(x^{n+1}, t^n) − d²(x^n, t^n)

    d²(x^{n+1}, P) = d²(x^n, P) − [d(x^n, t^n) + d(x^{n+1}, t^n)][d(x^n, t^n) − d(x^{n+1}, t^n)]

                   = d²(x^n, P) − λ d_P(x^n) [(2 − λ) d_P(x^n)]

                   = d²(x^n, P) − λ(2 − λ) d_P²(x^n)








It will be noticed that our proof is independent of Figure 13. The

figure only helps to illustrate the ideas.


FIGURE 13  (T_P(x^n) = T_P(x^{n+1}))


By Lemma 2.12 there exists a μ > 0 such that

    μ d(x^n, P) ≤ d_P(x^n) ≤ d(x^n, P),

hence

    d²(x^{n+1}, P) ≤ d²(x^n, P) − λ(2 − λ) μ² d²(x^n, P)

                  = d²(x^n, P)[1 − λ(2 − λ)μ²].

(2.14) Let θ = [1 − λ(2 − λ)μ²]^{1/2}.

θ = 0 when λ = 1, μ = 1. In general θ ∈ [0, 1). Thus,

    d²(x^{n+1}, P) ≤ θ² d²(x^n, P)

or

    d(x^n, P) ≤ θ^n d(x^0, P).

Case I: If the sequence does not terminate,

    lim_{n→∞} d(x^n, P) = 0.

Since {x^n} is Fejer-monotone with respect to P and d(x^n, P) → 0, {x^n} converges to a point x* ∈ ∂P, where ∂P is the boundary of P (Theorem 2.4). Clearly ‖x* − T_P(x^n)‖ ≤ ‖x^n − T_P(x^n)‖, hence x* and x^n both belong to the ball B_{d(x^n,P)}(T_P(x^n)) (see Figure 14). Hence,

    ‖x^n − x*‖ ≤ 2 d(x^n, P) ≤ 2 θ^n d(x^0, P).

Case II: If the sequence terminates in a finite number of steps, we can take x* to be the last point of the sequence.


FIGURE 14









In the above procedure, if 1 ≤ λ < 2 then

    P^n = {x : a_{i_n}'x + b_{i_n} ≥ 0}

That is, P^n is the closed halfspace defined by the constraint i_n. If 0 < λ < 1, then

    P^n = {x : a_{i_n}'x + λ b_{i_n} − (1 − λ) a_{i_n}'x^n ≥ 0}

and again P ⊂ P^n. In both cases P^n is a relaxed form of P and at iteration n we find x^{n+1} ∈ P^n.

Range of λ for finite convergence. Using relation 2.14, which gives θ against different values of λ and μ, we get the results in Table 1, shown graphically in Figure 15.


TABLE 1

    μ = 1      λ          .1    .5    .75    1     1.5   1.9
               θ = f(λ)   .90   .50   .24    0     .50   .90

    μ = 1/2    λ          .1    .5    .75    1     1.5   1.9
               θ = f(λ)   .98   .90   .87    .87   .90   .98

    μ = 1/4    λ          .1    .5    .75    1     1.5   1.9
               θ = f(λ)   .99   .98   .97    .97   .98   .99




From Figure 15 it would appear that λ = 1 gives the best results (lowest θ). However, Motzkin and Schoenberg demonstrated that when P has an interior, λ = 2 leads to finite convergence. Also, if P has acute angles, μ will be low and θ high. Thus an obtuse angle should lead to faster convergence. Goffin determined the range of values of λ which gives finite convergence.
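The dependence of θ on λ and μ in relation 2.14 is easy to tabulate. The sketch below (written for illustration only; the values of μ are those used in Table 1) simply evaluates θ = [1 − λ(2 − λ)μ²]^{1/2} and agrees with Table 1 up to rounding.

import math

def theta(lam, mu):
    """Convergence ratio of relation 2.14."""
    return math.sqrt(1.0 - lam * (2.0 - lam) * mu ** 2)

for mu in (1.0, 0.5, 0.25):
    row = [round(theta(lam, mu), 2) for lam in (0.1, 0.5, 0.75, 1.0, 1.5, 1.9)]
    print(f"mu = {mu}: {row}")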
























FIGURE 15  (θ as a function of λ for μ = 1, 1/2, 1/4)








(2.15) Theorem (Goffin)

Let P be a polyhedron with an interior. Then the sequence generated by the relaxation method converges finitely to a point of P if

    (a) P is smooth enough and λ ∈ [1, 2], or

    (b) λ ∈ (λ*, 2],

where

    λ* = 2 / (1 + u(P))

and u(P) is the obtuseness index of P.

Furthermore, if λ > 2 then the sequence either converges finitely to a point of P or it does not converge.

The first part of the theorem shows that if all the corners of P are smooth enough then the range of λ over which we can get convergence in a finite number of steps is [1, 2].

The second part of the theorem shows that for a polyhedron that is not smooth enough, the extent by which we can deviate from λ = 2 depends on the acuteness of the most restrictive extreme point of P -- the more acute it is, the less we can relax λ from a value of 2.

Merzlyakov

Definitions and Motivation. The final work on relaxation

methods that we review is by Merzlyakov {30]. Merzlyakov's

method takes advantage of the fact that convergence properties

of relaxation methods improve if, instead of considering only

the most violated constraint, as was done by Agmon, among

others, a larger number of supporting planes are considered.

We would like to emphasize this unusual property of








relaxation methods -- redundancy, in terms of number of constraints,

improves the convergence rate. Additional halfspaces have the effect

of increasing μ, which lowers θ, the convergence ratio.

At each x ∈ R^k let

    V(x) = {i : a_i'x + b_i < 0, i = 1, ..., m}

That is, V(x) is the set of indices of constraints violated by x. A cavity C_V is the set

    C_V = {x ∈ R^k : V(x) = V}

where V is a subset of the first m natural numbers. A subcavity S_V^j of C_V is the subset of all x ∈ C_V which violate constraint j ∈ V no less than any other constraint i ∈ V. Ties are broken by the natural ordering. Notice that the subcavities, along with P, finitely partition R^k. Figure 16 illustrates the above.

Associate with each subcavity S_V^j a fixed λ_i(j, V), where

    λ_i(j, V) = 0   if i ∉ V
              ≥ 0   if i ∈ V
              > 0   if i = j

For example, for the subcavity S_{2,3}^2 (see Figure 16),

    λ_1(2, {2, 3}) = 0

    λ_2(2, {2, 3}) > 0

    λ_3(2, {2, 3}) > 0




Method. Merzlyakov's relaxation procedure is as follows.

Let λ ∈ (0, 2) and x^0 be specified. If at iteration n, x^n ∈ P, stop. Otherwise, let

































FIGURE 16  (cavities C_V and subcavities S_V^j for a system of three constraints)









(2.16)    x^{n+1} = x^n − λ [Σ_i λ_i(j,V)(a_i'x^n + b_i)] [Σ_i λ_i(j,V) a_i] / ‖Σ_i λ_i(j,V) a_i‖²

where x^n ∈ S_V^j.

We again make the assumptions that P ≠ ∅ and, without loss of generality, that ‖a_i‖ = 1 for each i = 1, ..., m. Also assume, without loss of generality, that Σ_i λ_i(j,V) = 1. Goffin showed that if P ≠ ∅ then the vector Σ_i λ_i(j,V) a_i cannot be zero.

Merzlyakov showed that the procedure given above converges to a point of P with a linear rate of convergence. If 1 ≤ λ < 2 then we can say

    P^n = {x : Σ_i λ_i(j,V)(a_i'x + b_i) ≥ 0}

and clearly P ⊂ P^n. If 0 < λ < 1, then

    P^n = {x : Σ_i λ_i(j,V)[a_i'x + λ b_i − (1 − λ) a_i'x^n] ≥ 0}

and again P ⊂ P^n. Here again P^n is a relaxed form of P and the sequence obeys our stipulation for the relaxation method, i.e., at iteration n we find x^{n+1} ∈ P^n where P ⊂ P^n.
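A minimal sketch of one step of 2.16 is given below (for illustration only; taking the weights λ_i proportional to the current constraint violations is one admissible choice, not Merzlyakov's fixed-per-subcavity rule, and rows of A are assumed normalized).

import numpy as np

def merzlyakov_step(A, b, x, lam=1.5):
    """One relaxation step against a convex combination of the violated
    constraints of {x : A x + b >= 0} rather than the single worst one."""
    resid = A @ x + b
    violated = resid < 0
    if not violated.any():
        return x, True                       # x is already a point of P
    w = np.zeros_like(resid)
    w[violated] = -resid[violated]           # weights lambda_i > 0 on violated rows
    w /= w.sum()                             # normalize so sum of lambda_i = 1
    d = w @ A                                # combined direction, sum of lambda_i a_i
    step = (w @ resid) / (d @ d)             # nonzero denominator when P is nonempty
    return x - lam * step * d, False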

The relaxation sequence {x^n} converges infinitely to a point x* ∈ ∂P, where ∂P is the boundary of P, only if the whole sequence "jams" into x* + N_P(x*). By moving in a direction that is a convex combination of violated constraints, we force the sequence out of the jam. For this reason the Merzlyakov method can be referred to as an antijamming procedure. Table 2 and Figure 17 illustrate this point. The table and figure are based on the example given in Merzlyakov's paper.









TABLE 2

Consider the following system of linear inequalities:

    -x_1 + 10x_2 + 10 ≥ 0

    3x_1 - 10x_2 - 30 ≥ 0

Let x^0 = (0, 0). We solve this by (a) Agmon's method and (b) Merzlyakov's method, where λ = 3/4 and ‖a_i‖ = 1.

(a) Agmon's method:

    x^{n+1} = x^n − λ(a_{i_n}'x^n + b_{i_n}) a_{i_n}

(b) Merzlyakov's method:

    x^{n+1} = x^n − λ [Σ_i λ_i(j,V)(a_i'x^n + b_i)][Σ_i λ_i(j,V) a_i] / ‖Σ_i λ_i(j,V) a_i‖²

The iterates produced by the Merzlyakov method are:

    n      x_1^n      x_2^n
    0       .000       .000
    1       .619     -2.064
    2      7.622      -.686
    3      9.371      -.342
    4      9.351      -.138
    5      9.827      -.044
    6      9.949      -.020
    7      9.948      -.013
    8      9.978      -.007
    9      9.978      -.008

Notice that the Merzlyakov method is much more effective than the Agmon procedure. Convergence to a point of P appears much faster with the Merzlyakov method.


Generalized Merzlyakov Procedure. The Merzlyakov procedure

can be generalized [24] by allowing the λ_i's to be chosen based on the current point rather than fixed for a subcavity. For any x where V(x) ≠ ∅ let



























FIGURE 17  (paths of the iterates for the example of Table 2)









    λ_i(x) = 0   if a_i'x + b_i ≥ 0
           ≥ 0   if a_i'x + b_i < 0

where 0 < Σ_i λ_i(x) ≤ B, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that

    λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ

where j* indexes a most isolated constraint and θ > 0.

The convergence properties of this method are described in the

next chapter.


Typical Subgradient Method

Procedure

Let f be a convex but not necessarily differentiable function defined on R^k. u ∈ R^k is said to be a subgradient of f at x if it satisfies the following inequality:

    f(y) ≥ f(x) + u'(y − x)   for all y ∈ R^k.

The set of subgradients of f at x is called the subdifferential of f at x and is denoted by ∂f(x):

    ∂f(x) = {u ∈ R^k : u is a subgradient of f at x}

x* is a minimum of f if and only if 0 ∈ ∂f(x*).

A typical subgradient algorithm works as follows:

    (1) Choose x^0 ∈ R^k arbitrarily.

    (2) Compute a subgradient u^n of f at x^n, i.e., u^n ∈ ∂f(x^n). If u^n = 0 an optimal point has been found. If not, go to (3).

    (3) Find x^{n+1} as follows: x^{n+1} = x^n − t_n u^n,

where t_n is a scalar generally obtained by the approximate minimization of f(x^n − t u^n) for t ≥ 0. Go to (2) with x^{n+1} replacing x^n.

The function f could be an arbitrary convex function. If the

objective is to find a point of a polyhedron, one could choose a

number of different functions. One choice could be

    f(x) = max_{i=1,...,m} (−a_i'x − b_i)

Since f is the maximum of a finite number of convex functions, f is convex. Oettli [32] has given another interesting way to define a function which can be used to find a point of a polyhedron. This is given later.
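As an illustration of the first choice of f above, the following sketch (an assumption-laden illustration, not a method from the text) minimizes f(x) = max_i(−a_i'x − b_i) with the step rule 2.17 below, using the target value 0, which plays the role of an overestimate of f* whose level set is exactly P when P ≠ ∅.

import numpy as np

def subgradient_point_of_P(A, b, x0, lam=1.0, tol=1e-10, max_iter=10_000):
    """Minimize f(x) = max_i (-a_i'x - b_i); f(x) <= 0 exactly when x is in P."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = -(A @ x + b)                 # g_i = -a_i'x - b_i
        i = int(np.argmax(g))
        fx = g[i]
        if fx <= tol:
            return x                     # a point of P has been found
        u = -A[i]                        # a subgradient of f at x
        t = lam * fx / (u @ u)           # Polyak-type step with target value 0
        x = x - t * u
    return x

With lam = 1 and unit-norm rows this step coincides with Agmon's projection step, which hints at the relation between the two families developed in Chapter 3.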

Variations

Reverting back to the general subgradient procedure given by the iterative scheme

    x^{n+1} = x^n − t_n u^n,

various proposals [16] have been made for the step size t_n. The earliest one was by Polyak [34], who suggested Σ_n t_n = ∞ and lim_{n→∞} t_n = 0. The merit of this procedure is that we could choose any arbitrary subgradient u^n ∈ ∂f(x^n) and the sequence would converge to some x* with the stipulated stepsize (e.g., with t_n = 1/n). However, convergence is dismally slow and the method is of little practical value. Polyak attributes the original idea of this method to Shor [36].

Another idea, also due to Polyak [35], is to use

(2.17)    t_n = λ_n [f(x^n) − f*] / ‖u^n‖²

with f* = f(x*) the optimum value of the function. Polyak has shown that with 0 < b_1 ≤ λ_n ≤ b_2 < 2 the sequence converges to x* with a linear rate of convergence. A variation of 2.17, also due to Polyak, is to use f̄ > f(x*), i.e., an overestimate of the optimum f(x*). Here again, if 0 < b_1 ≤ λ_n ≤ b_2 < 2 the sequence converges at a linear rate, but to the level set

    S = {x : f(x) ≤ f̄}.

Thus for the result to be of value we must choose close enough overestimates of f(x*), and this is difficult in practice.

A final variation on the theme of 2.17 is to use an underestimate of f(x*). Eremin [12] has shown that the sequence converges to x* if lim λ_n = 0 and Σ_n λ_n = ∞. Here again convergence to the optimal solution is very slow.


Some Subgradient Procedures

Polyak

(2.18) Theorem (Polyak)

Let f be a quasi-convex function to be minimized over R^k. If the sequence {x^n} is obtained as follows:

    x^{n+1} = x^n − λ_n u^n / ‖u^n‖,   with lim_{n→∞} λ_n = 0 and Σ_n λ_n = ∞,

then there exists a subsequence {x^{n_k}} such that lim_{k→∞} f(x^{n_k}) = f*.

Proof: Select α > f* and define

    S_α = {x ∈ R^k : f(x) ≤ α}






S_α is convex (it is a level set). Hence there exists an x̃ in int S_α, since α > f* (see Figure 18). Choose ρ > 0 such that the neighborhood B_ρ(x̃) ⊂ S_α.

Suppose x^n ∉ S_α for all n.





FIGURE 18  ((a) x ∈ R^1; (b) x ∈ R^2, contours of f and the level set S_α)


    f(x) − f(x^n) ≥ u^n'(x − x^n),   ∀x,

since u^n is a subgradient at x^n. Thus

    f(x^n) − f(x) ≤ u^n'(x^n − x).   Also,

    f(x) ≤ f(x^n),   ∀x ∈ S_α.

Combining the above,

    u^n'(x^n − x) ≥ f(x^n) − f(x) ≥ 0,   ∀x ∈ S_α.

This holds in particular for the point x̃ + ρ u^n/‖u^n‖ ∈ B_ρ(x̃) ⊂ S_α:

    u^n'(x^n − x̃ − ρ u^n/‖u^n‖) ≥ 0,   so that   u^n'(x^n − x̃) ≥ ρ ‖u^n‖.

Hence

    ‖x^{n+1} − x̃‖² = ‖x^n − λ_n u^n/‖u^n‖ − x̃‖²

                   = ‖x^n − x̃‖² + λ_n² − 2 λ_n u^n'(x^n − x̃)/‖u^n‖

                   ≤ ‖x^n − x̃‖² + λ_n² − 2 ρ λ_n

                   = ‖x^n − x̃‖² + λ_n(λ_n − 2ρ).

Choose N such that λ_n ≤ ρ, ∀n ≥ N. This is possible since lim λ_n = 0. For n = N to n = N + m,

    0 ≤ ‖x^{N+m+1} − x̃‖² ≤ ‖x^N − x̃‖² + Σ_{n=N}^{N+m} λ_n(λ_n − 2ρ)

                          ≤ ‖x^N − x̃‖² − ρ Σ_{n=N}^{N+m} λ_n.

For m large enough the right hand side is negative, since Σ λ_n diverges -- a contradiction. Hence x^n ∈ S_α for some n.

Let lim_{k→∞} α_k = f*. Then we have proved there exists {x^{n_k}} such that

    lim_{k→∞} f(x^{n_k}) = f*.

Computational experience with heuristic subgradient methods has been reported by Held and Karp [18], and Held, Wolfe, and Crowder [19].









Held, Wolfe, and Crowder use underestimates of f(x*). However, the stepsize λ_n = λ is kept constant for 2m iterations, where m is a measure of the problem size. Then both λ and the number of iterations are halved successively till the number of iterations reaches some value k; λ is then halved every k iterations. This sequence violates the requirement Σ_n λ_n = ∞ and it is possible that the sequence may converge to a wrong point. However, in their computational experience this did not happen. Held, Wolfe and Crowder experimentally found that their procedure was effective for approximating the maximum of a set of piecewise linear concave functions. In their problems (assignment, multicommodity, and maxflow) the number of constraints is enormous. In all these cases they were able to get linear convergence (which is assured by the choice of stepsize).
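The halving schedule just described can be written as a small generator (a sketch of the rule as summarized above; the parameter names and the stopping condition are assumptions).

def step_size_schedule(lam0, m, k, total):
    """Yield step sizes: constant lam0 for 2m iterations, then lambda and the
    block length are halved until the block length reaches k, after which
    lambda is halved every k iterations."""
    lam, block, produced = lam0, 2 * m, 0
    while produced < total:
        for _ in range(block):
            if produced >= total:
                break
            yield lam
            produced += 1
        lam /= 2.0
        block = max(block // 2, k)

For example, list(step_size_schedule(2.0, m=4, k=2, total=20)) produces the first 20 step sizes of such a schedule.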

Held and Karp used overestimates of f(x*) to obtain bounds for a branch and bound procedure for solving the traveling salesman problem. Using this procedure they were able to produce optimal solutions to all traveling salesman problems ranging in size up to 64 cities.

Oettli

Oettli proposed a subgradient method for finding a point of an arbitrary non-empty polyhedron. We will discuss his less general method, which can be used to solve a linear program. As before, using the dual and an additional constraint expressing strong duality, a linear program can be stated as the problem of finding an x ∈ R^k satisfying

    ℓ_i^+(x) = 0,   i = 1, ..., m

where ℓ_i^+(x) represents the magnitude by which constraint i is violated at x, i.e.,

    ℓ_i^+(x) = max {0, −a_i'x − b_i}

Define ℓ^+(x) to be the vector formed from the violations of the different constraints,

    ℓ^+(x) = (ℓ_1^+(x), ℓ_2^+(x), ..., ℓ_m^+(x))

Let p(·) be an isotone norm on R^m, i.e., if 0 ≤ x ≤ y then p(x) ≤ p(y). It may be noted that all the usual norms, like the Euclidean and Tchebycheff norms, are isotone. Define

    d(x) = p(ℓ^+(x))

Then clearly x satisfies ℓ^+(x) = 0 iff d(x) = 0.

Oettli showed that d(·) is a convex function and then described an iterative scheme using subgradients of d(·) that minimizes d(·) over R^k. This scheme is quite general since we have a different function for each different norm.

(2.18) Lemma [Oettli]

d(·) is a convex function.

Proof:

ℓ_j^+(z) is convex for all j. Thus

    ℓ^+(αz^1 + βz^2) ≤ α ℓ^+(z^1) + β ℓ^+(z^2)   for α, β ≥ 0, α + β = 1.

Since p is monotonic (isotone),

    p[ℓ^+(αz^1 + βz^2)] ≤ p[α ℓ^+(z^1) + β ℓ^+(z^2)]

and since p, as a norm, is convex,

    p[α ℓ^+(z^1) + β ℓ^+(z^2)] ≤ α p[ℓ^+(z^1)] + β p[ℓ^+(z^2)],

hence d(αz^1 + βz^2) ≤ α d(z^1) + β d(z^2).







Let x^0 be given initially. If x^n ∈ P stop. Otherwise, let

(2.19)    x^{n+1} = x^n − [d(x^n) / (t(x^n)'t(x^n))] t(x^n)

where t(x^n) is any subgradient of d(·) formed according to Oettli's rule at x^n. The sequence {x^n} converges to a point of P with a linear rate of convergence.
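A minimal sketch of one step of 2.19 with p(·) taken to be the Euclidean norm is given below (an illustrative assumption; any isotone norm would do, and the subgradient used here is the natural one for this norm rather than a transcription of Oettli's general rule).

import numpy as np

def oettli_step(A, b, x):
    """One step of 2.19 for d(x) = ||max(0, -(Ax + b))||_2."""
    viol = np.maximum(0.0, -(A @ x + b))     # componentwise violations l_i^+(x)
    d = np.linalg.norm(viol)                 # d(x) = p(l^+(x))
    if d == 0.0:
        return x, True                       # x is a point of P
    t = -(A.T @ viol) / d                    # a subgradient of d at x
    return x - (d / (t @ t)) * t, False

Repeated application of oettli_step until the flag is True mimics the iteration 2.19.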

(2.20) Lemma [Oettli]

Let y ∈ P. Then for Oettli's procedure

    ‖x^{n+1} − y‖² ≤ ‖x^n − y‖² − d²(x^n) / (t(x^n)'t(x^n))

Proof:

    ‖x^{n+1} − y‖² = ‖x^n − [d(x^n)/(t(x^n)'t(x^n))] t(x^n) − y‖²

                   = ‖x^n − y‖² + d²(x^n)/(t(x^n)'t(x^n)) − 2 [d(x^n)/(t(x^n)'t(x^n))] t(x^n)'(x^n − y)

By definition of subgradient,

    d(y) ≥ d(x^n) + t(x^n)'(y − x^n).

Since d(y) = 0, therefore

    d(x^n) ≤ t(x^n)'(x^n − y).

Collecting the above results,

    ‖x^{n+1} − y‖² ≤ ‖x^n − y‖² − d²(x^n)/(t(x^n)'t(x^n)).

(2.21) Theorem [Oettli]

    ‖x^{n+1} − x*‖ ≤ θ ‖x^n − x*‖,   0 < θ < 1


Proof:

(2.22)    ‖x^{n+1} − y‖² ≤ ‖x^n − y‖² − d²(x^n)/(t(x^n)'t(x^n))

for any y ∈ P. Since the sequence is Fejer-monotone with respect to P (Lemma 2.20), ‖y − x*‖ ≤ ‖x^n − y‖ for the limit point x*, so by the triangle inequality

    ‖x^n − x*‖ ≤ ‖x^n − y‖ + ‖y − x*‖ ≤ 2 ‖x^n − y‖.

Since this holds for all y ∈ P, including T_P(x^n),

    ‖x^n − x*‖ ≤ 2 d(x^n, P).

Substituting x* in 2.22,

    ‖x^{n+1} − x*‖² ≤ ‖x^n − x*‖² − d²(x^n)/(t(x^n)'t(x^n))

                   = ‖x^n − x*‖² [1 − d²(x^n)/(t(x^n)'t(x^n) ‖x^n − x*‖²)]

                   ≤ ‖x^n − x*‖² [1 − d²(x^n)/(t(x^n)'t(x^n) · 4 d²(x^n, P))]

or

    ‖x^{n+1} − x*‖ ≤ ‖x^n − x*‖ [1 − A²/(4B²)]^{1/2}

where A = inf_n [d(x^n)/d(x^n, P)] and B = sup_n ‖t(x^n)‖. Oettli has shown that A > 0 and B is finite. From these observations, the result follows if we take

    θ = [1 − A²/(4B²)]^{1/2}.


In the next chapter we show that the Generalized Merzlyakov procedure is equivalent to the Oettli procedure insofar as the sequence of possible moves made by the two procedures is concerned.

Some of the More Recent Algorithms

A limitation of subgradient optimization methods is the absence of a benchmark to compare the computational efficiency of different






algorithms. Algorithms for unconstrained optimization of smooth

convex functions have the reliable method of steepest descent

(Cauchy -procedure) as a yardstick against which other algorithms

are compared. If they are better than the Cauchy procedure, they

usually deserve further consideration. In subgradient optimization,

the Cauchy procedure may not work. An example where the Cauchy

procedure is ineffective is reproduced below from [37]:

The function considered is

max {3x + 2y, 2x + 10y}







FIGURE 19


The contours of this function are sketched in Figure 19. One

can calculate that if the Cauchy algorithm is started at z, the

sequence converges to the origin (0,0) which is not the minimum

of the function.









Some of the recent subgradient procedures can be divided into two categories. In the first category are those that claim to be at least as good as some other subgradient procedures. These algorithms claim their superiority based on empirical (computational) results. In the second category are those that claim to be reasonably effective for both differentiable and non-differentiable convex functions and claim that the algorithm is superior when applied to quadratic and convex differentiable functions.

As an example of the former we present the method of Camerini, Fratta and Maffioli [5], and as an example of the latter we present Wolfe's [38] method of conjugate subgradients for minimizing non-differentiable functions. Wolfe's algorithm closely resembles Lemarechal's method [28].

Camerini, Fratta and Maffioli. Here one adopts the following iterative scheme for minimizing f(·). Let

x^0 = 0  and

x^{n+1} = x^n - t_n s^n,  where s^n is a modified "gradient"

direction defined as

s^{n-1} = 0  for n = 0  and

s^n = f'(x^n) + β_n s^{n-1}

with f'(x^n) a subgradient of f at x^n and β_n a suitable scalar. s^n is equivalent to a weighted sum of preceding gradient directions and is used as an anti-jamming device. The following lemmas and theorems develop the essential elements of the Camerini, Fratta and Maffioli method.








(2.23) Lemma

Let x* and x^n be such that

f(x*) ≤ f(x^n); then

0 ≤ f(x^n) - f(x*) ≤ f'(x^n)'(x^n - x*)

This lemma states that the negative of the subgradient makes an acute angle with the vector (x* - x^n), where x* corresponds to an optimal solution. See Figure 20.


FIGURE 20


(2.24) Lemma

If for all n

0 < t_n ≤ [f(x^n) - f(x*)] / ||s^n||²   and   β_n ≥ 0,  then

0 ≤ f'(x^n)'(x^n - x*) ≤ -s^n'(x* - x^n)

Lemma 2.24 extends the subgradient property of Lemma 2.23 to -s^n: -s^n also makes an acute angle with (x* - x^n).

(2.25) Theorem

Let

β_n = -γ_n [s^{n-1}'f'(x^n)] / ||s^{n-1}||²   if s^{n-1}'f'(x^n) < 0

    = 0   otherwise,

with 0 ≤ γ_n ≤ 2; then

-s^n'(x* - x^n) / ||s^n||  ≥  -f'(x^n)'(x* - x^n) / ||f'(x^n)||

Theorem 2.25 shows that a proper choice of β_n ensures that -s^n is at least as good a direction as -f'(x^n), in the sense that -s^n makes more of an acute angle with (x* - x^n) than -f'(x^n) does. If -f'(x^n) and -s^{n-1} make an acute angle we take -s^n = -f'(x^n). Only if -f'(x^n) and -s^{n-1} make an obtuse angle do we take

-s^n = -(f'(x^n) + β_n s^{n-1})

Figure 21 illustrates the above theorem.


FIGURE 21








(2.26) Theorem

||x* - x^{n+1}|| < ||x* - x^n||

Proof:

t_n ||s^n||² ≤ f(x^n) - f(x*)

≤ -2f'(x^n)'(x* - x^n)    by Lemma 2.24

≤ -2s^n'(x* - x^n)        by Lemma 2.25

Hence

t_n ||s^n||² + 2s^n'(x* - x^n) < 0

or   t_n²||s^n||² + 2t_n s^n'(x* - x^n) < 0.

Adding ||x* - x^n||² to both sides gives

||x* - x^{n+1}|| < ||x* - x^n||.  ///

Theorem 2.26 shows that the sequence is Fejér-monotone with respect to an optimum point x*. Using arguments similar to those used earlier in relaxation methods, it can be shown that the sequence {x^n} converges to x*.

Camerini et al. use heuristic arguments to show that the best value for γ is 1.5. They then claim superiority of their

method over that of Held and Karp [18] and Polyak [35] based on

computational results.
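As a concrete illustration of the scheme, the following Python sketch implements the direction rule of Theorem 2.25 with γ = 1.5; the step size t_n is taken at the upper limit allowed by Lemma 2.24, which presumes that the optimal value f(x*) is known, an assumption made only for this sketch, and the function names are illustrative.

    import numpy as np

    def cfm_direction(subgrad, s_prev, gamma=1.5):
        # s^n = f'(x^n) + beta_n s^{n-1}, with beta_n chosen as in Theorem 2.25
        if s_prev is None or s_prev @ subgrad >= 0:
            return subgrad.copy()                 # acute angle: keep the plain subgradient
        beta = -gamma * (s_prev @ subgrad) / (s_prev @ s_prev)
        return subgrad + beta * s_prev

    def cfm_step(x, subgrad, s_prev, f_val, f_star):
        # t_n = (f(x^n) - f*) / ||s^n||^2, the largest step permitted by Lemma 2.24
        s = cfm_direction(subgrad, s_prev)
        t = (f_val - f_star) / (s @ s)
        return x - t * s, s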

Wolfe. In Wolfe's method [38] we generate a sequence of points {x^n}, of directions {d^n}, of subgradients {f'(x^n)} and of scalars {t_n} such that for n = 0, 1, 2, ...

t_n > 0,   f'(x^n) ∈ ∂f(x^n),

x^{n+1} = x^n + t_n d^n   and

f(x^{n+1}) < f(x^n).

At step n we have a bundle of vectors G_n which is formed by a set of rules. G_n consists of f'(x^n), possibly -d^{n-1}, and a (possibly empty) string f'(x^{n-1}), f'(x^{n-2}), ... of limited length.

We set d^n = -Nr G_n where Nr G_n is the unique point in the convex hull of G_n having minimum norm. Demjanov [10] has shown that ∇f(x) = Nr ∂f(x), where ∇f(x) is the gradient of f at x when f is smooth at x and ∂f(x) is the subdifferential of f at x. As is typically the case, the stepsize t_n is determined by an approximate minimization of f(x^n + td^n) for t ≥ 0. At some step, d^n = -Nr G_n will be small, and in addition the last p + 1 points (for some integer p) will all be close together. Then x^n will be our approximate solution.

For example, consider the illustration in Wolfe's paper

f(x,y) = max {f_1, f_2, f_3},  where

f_1 = -x,   f_2 = x + y,   f_3 = x - 2y,

sketched in Figure 22. A subgradient of f at any (x,y) is chosen to be ∇f_i where i is the least index for which f(x,y) = f_i. If we start with the point (6,3), we move in the direction -∇f(6,3) = (-1,-1) to the point (3,0). Using our prescription for choosing the direction, the only available direction at (3,0) is still (-1,-1). We must take a small step in that direction anyway even though it is not a descent direction. Then the new subgradient will be (1,-2); forming the bundle G = {(1,1), (1,-2)}, we find the direction d = (-1,0).
r















FIGURE 22


The next step in this direction takes us to the neighborhood

of (0,0). At the next iteration we add (-1,0) to the bundle. Then

Nr G = 0. We "reset" and start all over again at this point. We accept the last point reached as a solution if Nr G_n = 0 and the
last "p" points are "close enough."

For differentiable convex functions, the method is an extension

of the method of conjugate gradients and terminates for quadratic

functions.















CHAPTER 3

THEORETICAL RESULTS


Comparison of Generalized Merzlyakov Method and
Some Typical Subgradient Methods

The equivalence of relaxation methods and minimization methods

has been noted by several researchers. However, these two approaches

are only mathematically equivalent and this equivalence has not been

generally made explicit. On the other hand computationally relaxa-

tion methods are much simpler. It has been shown [24] that a generalized form of Merzlyakov's method subsumes most of the typical subgradient methods. In particular it was shown that the generalized Merzlyakov method can generate any move that is made by the Oettli

method. In this section we draw heavily from the results of the

above paper [24].

Generalized Merzlyakov Method

The Merzlyakov method is generalized [24] so as to facilitate

its comparison with some of the subgradient methods. This extension

greatly increases the flexibility of the original method. The

following procedure allows the λ_i's to be chosen based on the current point rather than being constant for a subcavity. For any x where V(x) ≠ ∅, let

λ_i(x) = 0    if a_i'x + b_i > 0
λ_i(x) ≥ 0    if a_i'x + b_i ≤ 0

where 0 < Σ λ_i(x) ≤ β, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that

3.1    λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ

where j* indexes a most violated constraint and θ > 0.

For λ ∈ (0, 2) and x^0 initially specified, stop at iteration n if x^n ∈ P. Otherwise, let

3.2    x^{n+1} = x^n - λ [Σ λ_i(x^n)(a_i'x^n + b_i)] (Σ λ_i(x^n)a_i) / [(Σ λ_i(x^n)a_i)'(Σ λ_i(x^n)a_i)]

We continue to assume ||a_i|| = 1 for all i.
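As an illustration of update 3.2, a minimal Python sketch of one generalized Merzlyakov iteration follows; the particular weighting rule used here (equal weights on the violated constraints) is only one admissible choice of the λ_i(x), and the array names are illustrative.

    import numpy as np

    def merzlyakov_step(x, A, b, lam=1.0):
        # One iteration of 3.2 for P = {x : Ax + b >= 0}; rows of A normalized.
        # Weights lambda_i(x): 1 on violated constraints, 0 elsewhere.
        resid = A @ x + b
        w = (resid < 0).astype(float)            # indicator of the set V(x)
        if w.sum() == 0:
            return x                             # x is already in P
        g = A.T @ w                              # sum of lambda_i a_i
        return x - lam * (w @ resid) * g / (g @ g)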

In the next two theorems, we show that the generalized

Merzlyakov method converges with a linear rate of convergence to P.

3.3 Theorem

Let z ∈ P, and let {x^n} be obtained by the generalized Merzlyakov method. Then

||x^{n+1} - z|| < ||x^n - z||

Proof:

||x^{n+1} - z||² = ||x^n - z||² - 2λ(x^n - z)'[Σλ_i(x^n)(a_i'x^n + b_i)][Σλ_i(x^n)a_i] / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

+ λ²[Σλ_i(x^n)(a_i'x^n + b_i)]² / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

The last two terms can be combined to get

[-λ Σλ_i(x^n)(a_i'x^n + b_i) / ((Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i))] · [2 Σλ_i(x^n)a_i'(x^n - z) - λ Σλ_i(x^n)(a_i'x^n + b_i)]

Now a_i'x^n + b_i ≤ 0 whenever λ_i(x^n) > 0, and a_i'x^n + b_i < 0 for some i with λ_i(x^n) > 0; thus the first bracketed term is positive.

Since z ∈ P and a_i'z ≥ -b_i,

-Σλ_i(x^n)a_i'z ≤ Σλ_i(x^n)b_i

Thus the second bracketed term is less than or equal to

(2 - λ) Σλ_i(x^n)(a_i'x^n + b_i)

and this term is strictly negative.

Thus

||x^{n+1} - z|| < ||x^n - z||.  ///

Thus we have established that the sequence is Fejer-monotone

with respect to P. We next show that the procedure gives points

which converge linearly to P. Let

d(x^n, P) = inf_{z ∈ P} ||x^n - z||.

3.4 Theorem

Either the generalized Merzlyakov procedure terminates with a

point of P or it generates a sequence which converges to a point

x* E P.

Furthermore, the convergence rate is linear.
Proof: Assume x^n ∉ P and let z^n be the unique closest point of P to x^n.

Then

d²(x^{n+1}, P) ≤ ||x^{n+1} - z^n||².

After expanding ||x^{n+1} - z^n||² we get

d²(x^{n+1}, P) ≤ ||x^n - z^n||² - λ(2-λ)[Σλ_i(x^n)(a_i'x^n + b_i)]² / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

= d²(x^n, P) - λ(2-λ)[Σλ_i(x^n)(a_i'x^n + b_i)]² / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

By assumption (condition 3.1)

[Σλ_i(x^n)(a_i'x^n + b_i)]² ≥ [λ_j(x^n)(a_j'x^n + b_j)]² ≥ θ²(a_{j*}'x^n + b_{j*})²   for some j ∈ V(x^n).

From Hoffman [21] we have that there is a μ > 0 with

μ d(x^n, P) ≤ -a_{j*}'x^n - b_{j*}.

Thus

d²(x^{n+1}, P) ≤ d²(x^n, P) - λ(2-λ)θ²μ² d²(x^n, P) / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

Finally, since Σλ_i(x^n) ≤ β and ||a_i|| = 1 for all i, we have

(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i) ≤ β²

And we get

d²(x^{n+1}, P) ≤ [1 - λ(2-λ)θ²μ²/β²] d²(x^n, P) = γ² d²(x^n, P)

Then 0 ≤ γ < 1 and d(x^n, P) ≤ γ^n d(x^0, P).

Case 1: If the sequence does not terminate, lim_{n→∞} d(x^n, P) = 0 and Motzkin and Schoenberg [31] have shown that the sequence converges to a point x* in the boundary of P for 0 < λ < 2.

Case 2: If the sequence terminates,

||x^{n+1} - z|| < ||x^n - z|| for all z ∈ P  (Theorem 3.3).

Also ||x* - z^n|| ≤ ||x^n - z^n|| where z^n is the closest point to P from x^n.

Thus

||x* - x^n|| ≤ ||x* - z^n|| + ||z^n - x^n|| ≤ 2||x^n - z^n|| = 2d(x^n, P) ≤ 2γ^n d(x^0, P)

where x* is the last point in the terminating sequence.  ///








We can further generalize the Merzlyakov method and Theorems 3.3 and 3.4 by replacing λ with λ_n, where 0 < b_1 ≤ λ_n ≤ b_2 < 2 for all n. In the remainder of the discussion we usually mean λ_n = λ for all n but do not have to restrict ourselves in this manner.

Further Developments of Oettli Method

Oettli's minimization method was discussed in the last chapter.

We wish to show the equivalence of Generalized Merzlyakov method and

Oettli's method in the next section but before that we need to develop

some properties of Oettli's method to facilitate this comparison.

Again most of the theorems and proofs in this section are taken from

the paper [24] referred to earlier.

We represent the subdifferential of a convex function f at x by ∂f(x) and a subgradient of f at x by f'(x), i.e., f'(x) ∈ ∂f(x).

To recapitulate the Oettli procedure, we form the sequence {x^n} as follows:

3.6    x^{n+1} = x^n - d(x^n)t(x^n) / (t(x^n)'t(x^n))

where t(x^n) is any subgradient of d(·) at x^n.

3.7 Lemma

For any norm p(·) on R^m

∂p(x) = ∂p(0) ∩ {p'(x): p(x) = p'(x)'x}


Proof:

We will show that if p'(x) ∈ ∂p(x) then the following relation holds:

∂p(x) = {p'(x): p(y) ≥ p'(x)'y for all y} ∩ {p'(x): p(x) = p'(x)'x}

Conversely, if the above relation holds, we will show p'(x) ∈ ∂p(x).

By the definition of a subgradient, p'(x) ∈ ∂p(x) if

p(y) ≥ p(x) + p'(x)'(y - x)  for all y.

Also

p(αx) - p(x) ≥ p'(x)'(αx - x)  for all scalars α ≥ 0.

Since p(·) is a norm,

p(αx) = α p(x)  for α ≥ 0.

For α > 1:       p(x) ≥ p'(x)'x
For 0 ≤ α < 1:   p(x) ≤ p'(x)'x

Thus p(x) = p'(x)'x.

From the subgradient inequality

p(y) ≥ p(x) + p'(x)'(y - x); by substituting for p(x) we get

p(y) ≥ p'(x)'y for all y. Hence

∂p(x) = {p'(x): p(y) ≥ p'(x)'y for all y} ∩ {p'(x): p(x) = p'(x)'x}.

Conversely, if p(y) ≥ p'(x)'y for all y and p(x) = p'(x)'x, we get by subtraction

p(y) - p(x) ≥ p'(x)'y - p(x) = p'(x)'(y - x)

and p'(x) ∈ ∂p(x).

Finally

∂p(0) = {p'(0): p(y) ≥ p'(0)'y, for all y} ∩ R^m

= {p'(0): p(y) ≥ p'(0)'y, for all y}.  ///

To find the subgradients of d(·) in terms of subgradients of the composite function p(ℓ^+(x)) we use the following chain rule given by Oettli.

3.8 Lemma

Let p(·) be an isotone norm and p'(ℓ^+(z^0)) ≥ 0; then

3.9    d'(z^0) = ℓ^+'(z^0) p'(ℓ^+(z^0))

Proof:

ℓ^+(z) - ℓ^+(z^0) ≥ ℓ^+'(z^0)'(z - z^0)

Multiplying both sides by p'(ℓ^+(z^0)) ≥ 0,

[ℓ^+(z) - ℓ^+(z^0)]'p'(ℓ^+(z^0)) ≥ (z - z^0)'ℓ^+'(z^0)p'(ℓ^+(z^0))

Using the definition of the subgradient p'(ℓ^+(z^0)),

p(ℓ^+(z)) - p(ℓ^+(z^0)) ≥ [ℓ^+(z) - ℓ^+(z^0)]'p'(ℓ^+(z^0))

≥ (z - z^0)'ℓ^+'(z^0)p'(ℓ^+(z^0))

i.e.,  d(z) - d(z^0) ≥ (z - z^0)'ℓ^+'(z^0)p'(ℓ^+(z^0))

But d(z) - d(z^0) ≥ (z - z^0)'d'(z^0); hence d'(z^0) = ℓ^+'(z^0)p'(ℓ^+(z^0)).  ///

The subgradient of ℓ_i^+(x) may be found using the following well known result:

3.10    ∂ℓ_i^+(x) = {-ηa_i :  η = 0 if a_i'x + b_i > 0,
                               η = 1 if a_i'x + b_i < 0, and
                               η ∈ [0,1] if a_i'x + b_i = 0}

The subdifferential of ℓ^+(x) can in turn be found from

3.11    ∂ℓ^+(x) = {(h_1, ..., h_m): h_i ∈ ∂ℓ_i^+(x)}

Generalized Merzlyakov method and Oettli method

With the results developed so far in the previous sections we

are now in a position to develop an expression for equation 3.6 in

terms of the parameters of the generalized Merzlyakov method.

3.12 Theorem

Let x ∈ R^k, x ∉ P, and let t(x) be a subgradient of d(·) at x formed by using the chain rule of equation 3.9. Then

d(x)t(x) / (t(x)'t(x)) = [Σλ_i(a_i'x + b_i)] Σλ_i a_i / [(Σλ_i a_i)'(Σλ_i a_i)]

for some λ_i's where λ_i ≥ 0, λ_i = 0 if a_i'x + b_i > 0, and Σλ_i > 0.
Proof:

By equation 3.9 we have

t(x) = ℓ^+'(x) p'(ℓ^+(x))  for some ℓ^+'(x) ∈ ∂ℓ^+(x)

and p'(ℓ^+(x)) ∈ ∂p(ℓ^+(x)).

By equations 3.10 and 3.11,

ℓ^+'(x) = -(η_1 a_1, η_2 a_2, ..., η_m a_m)  with

η_i = 0        if a_i'x + b_i > 0
η_i = 1        if a_i'x + b_i < 0
η_i ∈ [0,1]    if a_i'x + b_i = 0

Also, from the proof of Lemma 3.7,

d(x) = p'(ℓ^+(x))'ℓ^+(x)  where

ℓ^+(x) = -(η_1(a_1'x + b_1), ..., η_m(a_m'x + b_m))'

Let h ≡ p'(ℓ^+(x)). In Oettli's method we need h ≥ 0. Substituting these last few expressions into d(x)t(x)/(t(x)'t(x)) gives

d(x)t(x) / (t(x)'t(x)) = [Σλ_i(a_i'x + b_i)] Σλ_i a_i / [(Σλ_i a_i)'(Σλ_i a_i)]

with λ_i = η_i h_i ≥ 0. We also have -Σλ_i(a_i'x + b_i) = p(ℓ^+(x)) > 0 since x ∉ P, and Σλ_i > 0.  ///

We have thus established a close similarity between the two methods. We have, however, yet to establish the two other requirements, viz. Σλ_i ≤ β and that condition 3.1 is satisfied. First we note that since the subdifferential of p(·) is compact, Σλ_i is bounded above. Hence the condition Σλ_i ≤ β is satisfied. Now, to show that condition 3.1 is satisfied, we see along the lines of the proof of Theorem 3.12 that

d(x) = p(ℓ^+(x)) = p'(ℓ^+(x))'ℓ^+(x) = h'ℓ^+(x)

= -Ση_i h_i(a_i'x + b_i) = -Σλ_i(x)(a_i'x + b_i)

where λ_i(x) = η_i h_i ≥ 0.

Hence

-Σλ_i(x)(a_i'x + b_i) = p(ℓ^+(x))

and   m · max_i λ_i(x)(-a_i'x - b_i) ≥ p(ℓ^+(x))

Also ℓ^+(x) ≥ ℓ_{j*}^+(x) e_{j*}, where j* is the most violated constraint and e_{j*} is a unit vector with a 1 in position j*. Since p(·) is monotonic,

p(ℓ^+(x)) ≥ p(ℓ_{j*}^+(x) e_{j*})

= -(a_{j*}'x + b_{j*}) p(e_{j*})

Thus

m · max_i λ_i(x)(-a_i'x - b_i) ≥ -(a_{j*}'x + b_{j*}) p(e_{j*})

≥ -(a_{j*}'x + b_{j*}) min_j p(e_j)

Hence

λ_i(x)(a_i'x + b_i) / (a_{j*}'x + b_{j*}) ≥ min_j p(e_j) / m ≡ θ > 0

where i ∈ V(x) maximizes the left hand side.

If we allow λ = 1 in the generalized Merzlyakov method, we find from the above analysis that any move made by the Oettli procedure can also be made by the generalized Merzlyakov method. However, the reverse is not true. This can be seen if we let λ ≠ 1 and take the case where x violates only one constraint. Thus the Oettli procedure is strictly subsumed by the generalized Merzlyakov method. Computationally the relaxation method is very simple to implement since the directions of movement can be easily computed. Also it is a very flexible procedure, since all that is required to change the direction of movement and the step length is to change the weights λ_i(x). On the other hand, the Oettli method requires a subgradient to be computed at each point, which is not a trivial calculation.

In an earlier paragraph we indicated that the Merzlyakov procedure could be further generalized by allowing λ to vary with each iteration. However, this would require the additional condition 0 < b_1 ≤ λ_n ≤ b_2 < 2. Oettli has recently generalized his method [33] to incorporate this feature. His generalized method requires that λ_n ∈ (0, 2) for all n with the additional stipulation

Σ_n λ_n(2 - λ_n) = +∞.

A second aspect in which the two methods differ is that in the generalized Merzlyakov method we could use a different set of λ_i's at each iteration. This could be introduced into the Oettli method by considering different norms at each iteration.

Comparison With Some Other Subgradient Methods

In this section we show the versatility of the generalized

Merzlyakov method by showing that some of the other typical subgradient

procedures are strictly subsumed by the generalized Merzlyakov method.








We consider the methods of Polyak ([34], [35]), Eremin [12] and Held,

Wolfe and Crowder [19]. These subgradient methods are capable of

tackling more general functions; however, we compare them when the objective is merely to find a point of a polyhedron P. There are a number of ways of defining the function f(·) such that when we minimize it we get x* ∈ P. A typical choice is

3.13    f(x^n) = max_{i=1,...,m} -(a_i'x^n + b_i).

We will show that with f thus defined the subgradient methods are subsumed by the generalized Merzlyakov method. The above subgradient methods can be collectively described by the following sequence:

x^{n+1} = x^n - λ_n [f(x^n) - f̄] f'(x^n) / ||f'(x^n)||²

where f̄ is an estimate of f(x*), the optimum value of f. If f(x^n) assumes its value for a unique i in relation (3.13), then the subgradient at x^n is f'(x^n) = -a_i. If there are ties, f'(x^n) can be taken to be a convex combination of the maximizing -a_i's. Let O(x^n) represent the set of indices i ∈ {1, ..., m} for which

f(x^n) = max_{i=1,...,m} (-a_i'x^n - b_i).  Then

f'(x^n) = -Σλ_i(x^n)a_i   with   Σλ_i(x^n) = 1,

λ_i(x^n) ≥ 0 and λ_i(x^n) = 0 for i ∉ O(x^n).

Hence

x^{n+1} = x^n - λ_n [f(x^n) - f̄] f'(x^n) / ||f'(x^n)||²

= x^n - λ_n [Σλ_i(x^n)(a_i'x^n + b_i) + f̄][Σλ_i(x^n)a_i] / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

with λ_i(x^n) ≥ 0, Σλ_i(x^n) = 1 and λ_i(x^n) = 0 for i ∉ O(x^n).

This can be considered equivalent to finding a point of P̄ where

P̄ = {x: a_i'x + b_i + f̄ ≥ 0, i = 1, ..., m}.

If f̄ = 0, the procedure gives a point of P. Thus if P ≠ ∅ the subgradient procedures mentioned in this section are special cases of the generalized Merzlyakov method.
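A minimal Python sketch of the common step above, specialized to the choice 3.13 and to f̄ = 0, makes the correspondence visible; breaking ties with equal weights is an illustrative choice for this sketch and not part of the cited methods.

    import numpy as np

    def polyak_step(x, A, b, f_bar=0.0, lam=1.0):
        # Subgradient step for f(x) = max_i -(a_i'x + b_i) of 3.13.
        resid = -(A @ x + b)
        f_val = resid.max()
        if f_val <= f_bar:
            return x                         # a point of the shifted polyhedron is found
        ties = np.isclose(resid, f_val)      # the index set O(x)
        w = ties / ties.sum()                # convex combination weights on the ties
        g = -(A.T @ w)                       # f'(x)
        return x - lam * (f_val - f_bar) * g / (g @ g)

With f_bar = 0 this move coincides with a generalized Merzlyakov move whose weights are concentrated on the most violated constraints.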


An Algorithm Using Relaxation Method

Motivation

We have seen that with a relaxation parameter of λ = 1 we can get the best convergence rate for a fixed μ (see Figure 14). Also, a higher μ leads to better convergence. We wish to combine both these features in our algorithm. We also derive motivation from Polyak's [34] and Oettli's [32] procedures. In this section we give a procedure for finding a point of an arbitrary polyhedron P and later apply it to the specific problem of solving large linear programs.

Given a polyhedron P,

P = {x ∈ R^k: a_i'x + b_i ≥ 0, i ∈ I}.

We assume P ≠ ∅ and I ≠ ∅ and finite. Goffin [15] has shown that when P is full dimensioned, finite convergence is assured for a range of relaxation parameters λ depending on the obtuseness index of the polyhedron. When P is not full dimensioned, we can partition its constraints into two sets defining polyhedra M and C, such that M is full dimensioned, and then devise an iterative procedure relative to these subsets so as to take advantage of the full dimensionality. Even though finite convergence is not assured, we can hope to get a better convergence rate with a proper choice of the relaxation parameters. There is considerable flexibility









in the manner M and C are chosen. One important consideration is that

it should be easy to project on to C. The algorithm is in two phases.

In Phase I we use an iterative scheme on the sets

M and C which drives us close to P. We do this by starting with an

arbitrary point in C. We then move in the direction of M. This

movement could be a projection onto M, or a reflection, or under- or over-relaxation, depending on the value of the relaxation parameter λ. From the new point thus obtained we project onto C. We show that the sequence thus constructed converges to P with a linear rate of convergence. In our algorithm, however, this iterative procedure is continued merely to get close to P. In the special case when C = R^k, Phase I becomes the relaxation procedure of Agmon.

Once we get close to P we switch to Phase II. In Phase II we

consider the constraints of the set P and apply the

Generalized Merzlyakov method. The motivation for use of the

Generalized Merzlyakov method at this stage is the fact that the

set of violated constraints as we approach P is precisely the set

of violated constraints at the limit point and under these

circumstances, the Generalized Merzlyakov method can be very

effective.

We first describe the procedure for finding a point of an

arbitrary polyhedron P.

The Algorithm

Phase I.

Let z^0 ∈ C.

If z^n ∈ P stop; otherwise define








z^{n+1} = T_C{z^n + λ d_M(z^n) a_{i_n}} ≡ T_C(s^n)

where 0 < λ < 2 and a_{i_n} corresponds to the most violated halfspace of M at z^n, and

3.14    s^n = z^n + λ d_M(z^n) a_{i_n}.

We will show that the sequence {z^n} is Fejér-monotone with respect to P.

3.15 Lemma [Goffin]

Let x, y, t be points of R^k and let

x' = x + λ(t - x)  where λ ∈ R. Then

||x' - y||² = ||x - y||² - λ(2 - λ)||t - x||² + 2λ(t - y)'(t - x)

Proof:

||x' - y||² = ||x + λ(t - x) - y||²

= ||x - y||² + [2(x - y) + λ(t - x)]'λ(t - x)

= ||x - y||² - λ(2 - λ)||t - x||² + λ[2(x - y) + 2(t - x)]'(t - x)

= ||x - y||² - λ(2 - λ)||t - x||² + 2λ(t - y)'(t - x).  ///

3.16 Lemma

||s^n - u||² = ||z^n - u||² - λ d_M(z^n)[(2 - λ)d_M(z^n) - 2a_{i_n}'(t^n - u)]

for u ∈ M.

Proof:

In Lemma 3.15 substitute s^n for x', z^n for x and t^n for t, where

t^n = z^n + d_M(z^n) a_{i_n}.  Then

||s^n - u||² = ||z^n - u||² - λ(2 - λ)||t^n - z^n||² + 2λ(t^n - u)'(t^n - z^n)

= ||z^n - u||² - λ(2 - λ)d_M²(z^n) + 2λ d_M(z^n) a_{i_n}'(t^n - u)

= ||z^n - u||² - λ d_M(z^n)[(2 - λ)d_M(z^n) - 2a_{i_n}'(t^n - u)].  ///








3.17 Theorem

Let M be a non-empty polyhedron of R^k. Then the sequence of points generated by equation (3.14) has the following property:

||s^n - u|| ≤ ||z^n - u||  for all u ∈ M.

Proof:

If u ∈ M then u ∈ H_{i_n}. Hence a_{i_n}'(t^n - u) ≤ 0 for all u ∈ M. By Lemma 3.16,

||s^n - u|| ≤ ||z^n - u||.  ///

3.18 Theorem

Let P be a non-empty polyhedron of R^k. Then the sequence of points {z^n} generated by equation (3.14) is Fejér-monotone with respect to P; in fact

||z^{n+1} - u|| < ||z^n - u||  for all u ∈ P.

Proof:

Let u ∈ P and let a be the unique point obtained by projecting u onto the line formed by extending s^n - z^{n+1} (Figure 23). The angle formed at a in the triangle s^n, a, u is a right angle, and for this right triangle

||s^n - a||² + ||a - u||² = ||s^n - u||²

||s^n - z^{n+1}||² + ||z^{n+1} - a||² + 2||s^n - z^{n+1}|| ||z^{n+1} - a|| + ||a - u||² = ||s^n - u||²

or

||s^n - z^{n+1}||² + ||z^{n+1} - u||² + 2||s^n - z^{n+1}|| ||z^{n+1} - a|| = ||s^n - u||²

or

||z^{n+1} - u||² < ||s^n - u||².

The strict inequality arises from the fact that s^n and z^{n+1} are distinct, or else z^n ∈ P. But by Theorem (3.17)

||s^n - u|| ≤ ||z^n - u||;  hence

||z^{n+1} - u|| < ||z^n - u||.  ///

In the next theorem we show that the sequence {zn} converges

linearly to P.


FIGURE 23








3.19 Theorem

The sequence of points {z^n} generated by the above procedure converges finitely or infinitely to a point of P. Furthermore, the convergence rate is linear.

Proof:

d²(z^{n+1}, P) = ||z^{n+1} - T_P(z^{n+1})||² ≤ ||z^{n+1} - u||², where u ∈ P is the closest point of P to z^n. Thus

||z^{n+1} - u||² = ||T_C{z^n + λd_M(z^n)a_{i_n}} - T_C(u)||²

≤ ||z^n - u||² - λd_M(z^n)[2a_{i_n}'(u - z^n) - λd_M(z^n)]

since the projection map satisfies the Lipschitz condition with c = 1. Continuing,

||z^{n+1} - u||² ≤ ||z^n - u||² - λd_M(z^n)[2d_M(z^n) - λd_M(z^n)]

since a_{i_n}'(u - z^n) ≥ d_M(z^n). Hence

||z^{n+1} - u||² ≤ d²(z^n, P)[1 - λ(2 - λ)μ²]   by Lemma 2.12

= θ²d²(z^n, P).

We get

d(z^{n+1}, P) ≤ θ d(z^n, P)  where

θ = [1 - λ(2 - λ)μ²]^{1/2}

and θ ∈ [0, 1).

Hence

d(z^n, P) ≤ θ^n d(z^0, P), and if the sequence is infinite,

lim_{n→∞} d(z^n, P) = 0.

Using Theorem 2.4 and the continuity of the distance function, this implies {z^n} converges to a point z* in the boundary of P.

If the sequence terminates at z*, then

||z* - T_P(z^n)|| ≤ ||z^n - T_P(z^n)|| = d(z^n, P)

by the Fejér-monotonicity of the sequence. Hence z* and z^n both belong to the ball B_{d(z^n, P)}(T_P(z^n)), and

||z* - z^n|| ≤ 2d(z^n, P) ≤ 2θ^n d(z^0, P).  ///

Phase II. When the last k points are within ε of each other, we switch to the Generalized Merzlyakov procedure. This is because if we are near the limit point, the only active constraints would be the tight constraints at the limit point of the sequence. In such a situation the Generalized Merzlyakov method can be very effective with a proper choice of λ and λ_i(x). We recapitulate that in Merzlyakov's procedure (which is a special case of the Generalized Merzlyakov method) the sequence is generated as follows:

x^{n+1} = x^n - λ [Σλ_i(j, V)(a_i'x^n + b_i)][Σλ_i(j, V)a_i] / [Σλ_i(j, V)a_i]'[Σλ_i(j, V)a_i]

where x^n ∈ S_V.

If we assume that we know the set of constraints satisfied with equality by T_P(x^n), we can generate the halfspace

H* = {x ∈ R^k: (T_P(x^n) - x^n)'(x - T_P(x^n)) ≥ 0}

which gives convergence to T_P(x^n) in exactly one step with λ = 1.

Suppose we number the set of active constraints at T_P(x^n) by I* = {1, ..., p}. Let A* = (a_1, ..., a_p)' be the p × k matrix formed by the interior normals of the halfspaces H_i for i ∈ I*. Let b* = (b_1, ..., b_p)'. Define

(λ_1, ..., λ_p)' = -(A*A*')⁻¹(A*x^n + b*).

With this set of λ_i(j, V)'s, using the Merzlyakov method, Goffin [15] has shown that convergence to T_P(x^n) is obtained in one step. However, this requires a comparably larger amount of computation, and finding (A*A*')⁻¹ is not really practical on large problems. In this study we have therefore concentrated on the Generalized Merzlyakov Method and attempted to find what would constitute a favorable set for λ and the λ_i(x).
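The two phases can be pictured with the following minimal Python sketch: Phase I moves against the most violated halfspace of M and projects onto C as in (3.14), and Phase II applies a Generalized Merzlyakov step with violation-proportional weights. The choice of C as a single halfspace (so that T_C has a closed form), the stopping tests and the parameter values are illustrative assumptions rather than prescriptions of this study.

    import numpy as np

    def project_halfspace(x, c, d):
        # T_C for C = {x : c'x + d >= 0} (one coupling constraint).
        viol = c @ x + d
        return x if viol >= 0 else x - viol * c / (c @ c)

    def phase1_step(z, A_M, b_M, c, d, lam=1.95):
        # s^n = z^n + lam * d_M(z^n) a_i* (most violated row of M), then project onto C.
        resid = A_M @ z + b_M                 # rows of A_M normalized
        i = resid.argmin()
        if resid[i] >= 0:
            return project_halfspace(z, c, d) # no violated halfspace of M
        s = z - lam * resid[i] * A_M[i]       # d_M(z) = -resid[i]
        return project_halfspace(s, c, d)

    def phase2_step(x, A, b, lam=1.0):
        # Generalized Merzlyakov step on all of P, weights proportional to violations.
        resid = A @ x + b
        w = np.maximum(0.0, -resid)
        if w.sum() == 0:
            return x
        w = w / w.sum()
        g = A.T @ w
        return x - lam * (w @ resid) * g / (g @ g)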

Application to Solving Large Linear Programs

We now consider how the procedure could be used to solve

large linear programs.

Consider the LP

max  c'x

s.t.  Ax ≤ b

      x ≥ 0

and its dual

min  π'b

s.t.  A'π ≥ c

      π ≥ 0.

By strong duality, an optimal π and x satisfy

b'π - c'x ≤ 0.  Let

C = {(x, π): b'π - c'x ≤ 0}

M = {(x, π): Ax ≤ b, -A'π ≤ -c, x ≥ 0, π ≥ 0}

and P = M ∩ C.

This is only one of the ways of partitioning the constraints of P. There is in fact great flexibility in the choice of C and M. The advantage of splitting C and M as indicated above is that

(a) M has a special structure and can be decomposed along π and x. This leads to savings in storage as well as easier arithmetic.

(b) C contains only the coupling constraint and is easy to project to.

However, there are other ways of obtaining P. Another choice could be to let C have only the non-negativity constraints. Again, the advantage of such a construction would be ease of projection to C. Still another choice is for C to take the form

C = {(x, π): b'π - c'x ≤ 0, x ≥ 0, π ≥ 0}.
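To make the structure of M concrete, a minimal Python sketch of assembling its constraint system from the LP data follows (dense arrays, for clarity only); it merely illustrates the decomposition along x and π noted in (a), and the rows would still have to be normalized to ||a_i|| = 1 before use in the relaxation steps.

    import numpy as np

    def build_M_constraints(A, b, c):
        # Rows (G, g) with M = {(x, pi) : G @ (x, pi) + g >= 0}.
        # Primal block: b - Ax >= 0; dual block: A'pi - c >= 0; plus x, pi >= 0.
        # The primal and dual blocks never couple, so they can be stored
        # and processed separately.
        m, n = A.shape
        G = np.block([
            [-A,               np.zeros((m, m))],   # b - Ax >= 0
            [np.zeros((n, n)), A.T             ],   # A'pi - c >= 0
            [np.eye(n),        np.zeros((n, m))],   # x >= 0
            [np.zeros((m, n)), np.eye(m)       ],   # pi >= 0
        ])
        g = np.concatenate([b, -c, np.zeros(n), np.zeros(m)])
        return G, g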

There is user discretion in this algorithm in the choice of M and C, λ and z^0 in Phase I, and λ and the weights λ_i(x) in Phase II. We have already commented on possible strategies for selecting M and C. We give below strategies for selecting the other parameters. We will first discuss the parameters for Phase I.

(1) λ: If 0 < λ ≤ 1, convergence may be slow if some of the "angles" of M are very acute (small obtuseness index of P). In this case increasing λ to near 2 will have the effect of opening the angles and accelerating the convergence. It therefore appears to be a good strategy to have λ approximately equal to 2 in Phase I.

(2) z^0: In line with the general observation for relaxation methods that any a priori information on the location of the solution could be used to advantage, if we have to solve a linear program with only slightly varying data, we could take advantage of this feature. However, in general, when we do not have any such prior information, the choice of z^0 could be dictated by the structure of C, so that we can start with a feasible point of C.








Now for the parameters of Phase II:

(1) λ: As a general guideline, 1 < λ < 2 may be chosen as the parameter for Phase II, as was done in Phase I. However, Table 3 and Figure 24 show the sequences obtained with λ = 1.5 and λ = .75 and illustrate that in some cases a choice of 0 < λ < 1 may give better results.

(2) λ_i(j, V): In the Merzlyakov method the choice of the weights λ_i(j, V) is crucial and in a large measure dictates the convergence of the sequence. It may be noted that the λ_i(j, V)'s determine both the direction as well as the step size. Suppose a_i, i ∈ K, represent the set of violated constraints in subcavity S_V, where x^n ∈ S_V. We can solve the quadratic problem of finding the point of minimum norm in the convex hull of the finite point set a_i, i ∈ K, to get the weights λ_i(j, V). Such a direction locally has a great deal of merit. However, the computational effort is too great to merit its consideration, since at each iteration we would have to solve a quadratic program. Also, it may not always lead to the best set of weights, as shown by the counterexample given in the next paragraph. Instead we specify λ_i(j, V) = 1/|K|, where |K| represents the number of constraints violated at x^n. When |K| = 2 the direction obtained coincides with the vector of minimum norm.

The strategy suggested by us is not the best in all cases as

shown by the following counterexample:

Consider   x_1 ≥ 1

           x_2 ≥ 2

           x^0 = (0, 0)'









Case 1: A = .75


TABLE 3
A.(j, V) = 1 for all i E V.
1


n El.(j, V) [ZA. (j, V)a. ]'
x 1 A.(j, V)a 1
n n n xn i [7%i (j
x x2 (a'x + b.) 11 IE(j, V)a.]
1 21 1 1
287
0 0.000 0.000 -2.873 2'87 1
-.958
188
1 .619 -2.064 -1.838 (' ) .037
.037
188
2 7.622 .686 .459 ( 88 .037
.037
099
3 9.371 .342 .273 ( '99 1
.995
188
4 9.351 .138 .125 ( 37 .037
.037
188
5 9.827 .044 .032 ( ) .037
.037
-.099
6 9.949 .020 .010 ( 99) 1
.995
7 9.948 .013 .008 ( 18 .037
.037
287
8 9.978 .007 .002 ( 58) 1
958

Case 2: A = 1.5 A.(j, V) = 1 for all i E V.
1
n EAi (j, V) [Z .(j, V)a.]' *
i .(j, V)a., V)ai
1_1 2 i 1 1
287
0 0.000 0.000 -2.873 ( 8 1
-.958
-.099
1 1.237 -4.128 -3.235 ( 9) 1
.995
2 .757 .700 -3.326 ( -28 1
-.958
-099
3 2.189 -4.079 -3.280 ( 99) 1
.995
287
4 1.702 .816 -3.166 ( -8) 1
-.958
-099
5 3.065 -3.733 -3.022 99) 1
.995
287
6 2.616 .777 -2.866 ( 58 1
-.958
-099
7 3.850 -3.347 -2.710 ( '995)
.995)
8 3.447 .704 -2.558 287
-.958


The


problem considered is
P = {(x2): -.099x1 +
.287x1 -


.995x2 + .995 > 0,
.958x2 2.873 2 0, x 1 0, x2 0}.


















FIGURE 24  (the sequences for λ = 1.5 and λ = .75)









From the proof of Theorem 3.4 we have

d²(x^{n+1}, P) ≤ d²(x^n, P) - λ(2 - λ)[Σλ_i(x^n)(a_i'x^n + b_i)]² / [(Σλ_i(x^n)a_i)'(Σλ_i(x^n)a_i)]

Thus to get the best set of λ_i's we may solve the following optimization problem:

max  [Σλ_i(-a_i'x^n - b_i)]² / ||Σλ_i a_i||²  ≡ k

Suppose λ* solves the above; then λ*_i / ||Σλ*_j a_j|| also solves the above. Hence an equivalent problem is

max  Σλ_i(-a_i'x^n - b_i)

s.t.  ||Σλ_i a_i|| = 1.

In our specific problem,

max  λ_1 + 2λ_2

s.t.  λ_1² + λ_2² = 1

or  max  √(1 - λ_2²) + 2λ_2.

This gives λ_1 = 1/√5 and λ_2 = 2/√5.

This set of λ_i's gives

x^1 = (1, 2)'  which is feasible, with k = 5.

Had we specified λ_1 = λ_2, i.e., λ_i = 1/2,

x^1 = (3/2, 3/2)'  which is not feasible, and k = 4.5.

Thus our prescription does not lead to the best set of λ_i's; it is, however, a good heuristic and easy to implement.
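A quick numerical check of this counterexample, written as a Python sketch with illustrative names, reproduces the two moves:

    import numpy as np

    A = np.array([[1.0, 0.0], [0.0, 1.0]])      # x_1 >= 1, x_2 >= 2
    b = np.array([-1.0, -2.0])
    x0 = np.zeros(2)

    def merz_move(lams, lam=1.0):
        g = A.T @ lams
        num = lams @ (A @ x0 + b)
        return x0 - lam * num * g / (g @ g)

    best = np.array([1.0, 2.0]) / np.sqrt(5.0)  # the maximizing weights
    equal = np.array([0.5, 0.5])                # the 1/|K| heuristic
    print(merz_move(best))                      # [1. 2.]   -> feasible
    print(merz_move(equal))                     # [1.5 1.5] -> x_2 >= 2 still violated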









(3) λ_j(x): In the Generalized Merzlyakov method, the weights λ_j(x) are determined by the point alone and are not fixed for a subcavity. The following appears intuitively to be a good set of weights, since the weight given to a constraint depends on the magnitude of its violation at the specific point:

λ_j(x) = (violation of constraint j at x) / (total violation at x)


Computational Results

The algorithm described in the previous section was coded primarily to come up with a favorable set of values for the arbitrary parameters: λ in Phase I, and λ and the weights λ_j(x) for Phase II. It may be stated at the outset that the selection of these parameters is an art rather than a science, and it would be hard to specify the best set of parameters for the general case.

The test problem considered was the problem on "Lease-Buy Planning Decisions" on pages 93 to 114 of Salkin and Saha's "Studies in Linear Programming." The problem has 17 constraints and 20 variables. Together with the dual and non-negativity constraints, set M thus had 74 constraints. Set C consisted of the constraint obtained by using strong duality. This problem has a unique solution. The constraints of the problem are reproduced in the Appendix.

Test results corroborate the general guidelines for the selection of these parameters indicated in the previous section. The distance from a point of P, where P = M ∩ C, was calculated at each iteration. The number of iterations required to reduce the distance from P by a certain amount was used as the criterion of improvement. Two








different starting points were selected. Results of computations

are summarized below:

Phase I

The relaxation parameter λ was varied in the interval (0,2). For the first starting point, the number of iterations required to reduce the distance d from P from d = 74 to d = 60 was noted, and for the second point the number of iterations required to reduce the distance from d = 135 to d = 132 was noted. These results are summarized in Tables 4 and 5 respectively. The results are in line with our expectation, based on intuition, that a higher λ has the effect of accelerating convergence.

Phase II

The Phase II relaxation parameter was varied in the interval (0,2) and the following alternatives were tested for the weights λ_j(x):


(a)  λ_j(x) = 1 / (number of constraints violated at x)

(b)  λ_j(x) = (violation of constraint j at x) / (total violation at x)

(c)  λ_j(x) = [(violation of constraint j at x) / (total violation at x)]²

(d)  λ_j(x) = [(violation of constraint j at x) / (total violation at x)]³

(e)  λ_j(x) = [(violation of constraint j at x) / (total violation at x)]⁴

(f)  With r_j(x) = (violation of constraint j at x) / (total violation at x):

     λ_j(x) = r_j(x)        if r_j(x) ≥ .3
     λ_j(x) = 3.3 r_j(x)    if .1 ≤ r_j(x) < .3
     λ_j(x) = r_j(x)        if r_j(x) < .1

(g)  With r_j(x) as in (f):

     λ_j(x) = 100 r_j(x)    if r_j(x) ≥ .3
     λ_j(x) = 10 r_j(x)     if .2 ≤ r_j(x) < .3
     λ_j(x) = 3 r_j(x)      if .1 ≤ r_j(x) < .2
     λ_j(x) = r_j(x)        if r_j(x) < .1
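The simpler of these weighting rules are immediate to state in code; the following Python sketch gives alternatives (a) and (b) (names illustrative, and at least one violated constraint is assumed). The remaining alternatives only rescale the ratio used in (b).

    import numpy as np

    def weights_a(resid):
        # Alternative (a): equal weight on each violated constraint.
        violated = resid < 0
        return violated / violated.sum()

    def weights_b(resid):
        # Alternative (b): weight proportional to each constraint's violation.
        viol = np.maximum(0.0, -resid)
        return viol / viol.sum()

    # resid holds a_i'x + b_i for every constraint at the current point x.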








TABLE 4
Initial d = 74

  λ       Final d     No. of iterations
 .25      63.757      > 20,000
 .5       60.0          12,955
 .75      59.999         7,339
1.0       60.0           4,866
1.25      59.997         3,469
1.5       59.997         2,565
1.75      59.996         1,927
1.95      59.997         1,525








TABLE 5
Initial d = 135

  λ       Final d     No. of iterations
 .25      133.113     > 20,000
 .5       132.0         12,698
 .75      132.0          6,692
1.0       132.0          4,208
1.25      132.0          2,934
1.5       131.999        2,137
1.75      131.999        1,587
1.95      131.999        1,263






Results of alternative (a) are given in Tables 6 and 7 for

the same two points used in Phase I.

From Tables 6 and 7 it would appear that again a λ of around 2 gives the best convergence, but the counterexample in the previous

section shows that this is not always the best policy with this set

of weights.

Results of alternative (b) are given in Table 8.







TABLE 6
(a) λ_j(x) = 1 / (number of constraints violated at x)
Initial d = 74

  λ       Final d     No. of iterations
 .25      --          time limit exceeded
 .5       --          time limit exceeded
 .75      60.0          14,839
1.0       60.0           8,028
1.25      59.998         4,702
1.5       59.998         2,861
1.75      60.0           1,424
1.95      59.857           850









TABLE 7
(a) λ_j(x) = 1 / (number of constraints violated at x)
Initial d = 135

  λ       Final d     No. of iterations
 .25      --          time limit exceeded
 .5       133.45      > 20,000
 .75      132.0         15,249
1.0       131.999        7,568
1.25      131.999        4,135
1.5       131.999        2,435
1.75      131.998        1,212
1.95      132.0            713








TABLE 8
(b) λ_j(x) = (violation of constraint j at x) / (total violation at x)
Initial d = 74

  λ       Final d     No. of iterations
 .25      59.999         1,253
 .5       59.964         1,396
 .75      59.997         1,505
1.0       60.0           1,084
1.25      59.999         1,254
1.5       59.976         1,061
1.75      59.997           863
1.95      59.998           894




Full Text

PAGE 1

ITERATIVE METHODS FOR SOLVING LARGE-SCALE LINEAR PROGRAMS By Gollakota Surya Kumar A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 19 79

PAGE 2

ACKNOWLEDGEMENTS I wish to express my deepest gratitude to Professor Gary Koehler, my dissertation chairman, for the excellent guidance, the constant encouragement and the support he has provided me during my career at the University of Florida. Despite his busy schedule he was always available at a moment's notice. I also wish to thank Professor Antal Majthay whose depth of knowledge not only in the area of operations research but on a wide range of subjects will always remain awe-inspiring. I am also grateful to Professors Ira Horowitz and Henry Tosi for providing financial assistance during my stay at the University of Florida. I also wish to thank Ms. Kathy J. Combs for her superb typing of the manuscript. Last but not the least this work would not have been possible but for the patience, understanding and encouragement of my wife Rekha. 11

PAGE 3

TABLE OF CONTENTS PAGE ACKNOlin^EDCe-IENTS ii ABSTRACT v CHAPTER ONE THE PROBLEI-1: WHY ITERATIVE TECHNIQUES? 1 Tl'JO BACKGROUND RESULTS AND TOOLS 8 Introduction , , , 8 Motivation , , 8 Task , , 11 Definitions , , , 12 Typical Relaxation Method 13 Procedure • . , 13 Variations . , 14 Convergence 14 Fejer-monotone sequence 14 Motzkin and Schoenberg 16 Relaxation Procedures 18 Agmon 18 Motzkin and Schoenberg, Eaves 21 Goff in 22 Definitions 22 Another proof of Agmon's result 33 Range of A for finite convergence 37 Merzlyakov 39 Definitions and motivation 39 Method 40 Generalized Merzlyakov procedure 44 Typical Subgradient Method , 46 Procedure 46 Variations , 47 Some Subgradient Procedures 48 I'olynk 4y Oettli 51 Some of the more recent algorithms 54 Camerini, Fratta and Maffioli 56 Wolfe , eg 11 X

PAGE 4

CHAPTER PAGE THREE THEORETICAL RESULTS 62 Coraparison of Generalized Merzlyakov Method and Some Typical Subgradient Methods 62 Generalized Merzlyakov method , 62 Further developments of Oettli method 66 Generalized Merzlyakov method and Oettli method 68 Comparison with some other subgradient methods , 71 An Algorithm Using Relaxation Method 73 Motivation 73 The algorithm 7A Phase I m Phase II 79 Application to solving large linear problems 80 Computational results 86 Concluding Remarks 98 APPENDIX 99 BIBLIOGRAPHY 101 BIOGRAPHICAL SKETCH 105 IV

PAGE 5

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy ITERATIVE METHODS FOR SOLVING LARGE-SCALE LINEAR PROGRAMS By Gollakota Surya Kumar August 1979 Chairman: Gary J. Koehler Major Department: Management Tlie simplex method for solving linear programs has achieved its overwhelming success over the years due to the advances during the last three decades on extending the effectiveness of the basic algorithm introduced by Dantzig in 1951. However the largest problem that can be economically solved by this method is dictated by the size of the basis. The limitation being the ability to maintain a sparse yet accurate representation of the basis inverse. Commercial computer codes using the recent developments of the simplex method are capable of solving mathematical program systems of tlie order of 10,000 rows. However, larger linear programs do arise in practice. llie reason why they are not being set up is because none of the production LP codes can solve such large problems in a timely and cost effective fashion. For such large

PAGE 6

problems iterative tecliniques that do not require a basis may be an attractive alternative. We review two classes of iterative techniques — relaxation methods and subgradient optimisation. Solving a linear program can be shown to be identical to finding a point of a polyhedron P, the polyhedron being formed from primal and dual constraints and an additional constraint using strong duality. In relaxation methods we successively find points of relaxed forms P c p'^_ That is, at iteration n we find an x c P such that P <= P , We assume P ^ 0. The problem is solved when the sequence converges to a point X* £ P. Subgradient methods are an alternative class of iterative techniques for solving these problems. In subgradient methods we minimize a convex though not necessarily dif ferentiable function ^ / \ , , • • , ri+1 n n n , n „ . f(*) by the iterative scheme x = x t u , where t > is a a scalar obtained by the approximate minimization of f(x t u ), and u is a subgradient of f at x . VJhen we use the minimization method for finding a point of a polyhedron P, the function f is so defined that if x* is a minimum of f tlien x''' G P. These iterative techniques work with original data; thus advantages of supersparsity can be fully realized and the program run in core. Also they are self correcting and knowledge of a vector close to the solution can be used to advantage. We sliow that a generalized form of Merzlyakov's relaxation procedure subsumes most of the recent subgradient procedures when the objective is to find a point of a polyhedron. A new algorithm

PAGE 7

using relaxation method is presented. Tliis algorithm v;as coded to explore values for the relaxation parameters used in the algorithm.

PAGE 8

CHAPTER 1 THE PROBLEI-l: WHY ITERi\TIVE TECHNIQUES? Mathematical programming has achieved its present popularity in a large measure due to the success of the simplex method for solving linear programs published by Dantzig [6 J in 1951. Over the years considerable effort has been expended towards exploiting and extending the effectiveness of this procedure. We will briefly outline some of the improvements. Most practical linear programs have a sparse matrix [2] and it is necessary to exploit this sparseness to reduce memory and time requirements and to maintain accurate information. The first such procedure was the revised-simplex procedure. This procedure operates only on original problem data and requires an inverse of the current basis. If there are m constraints and n variables (including slacks, surplus and artificials) then the regular simplex procedure requires (1.1) (m + l)(n + 1) storage locations. lliere are m + 1 rows (including the objective and n + 1 columns (including the right-hand side). In the revisedsimplex procedure we need (1.2) p(ra + l)(n +1) + m^ where p is the problem density. (1.1) is much larger than (1.2) if p is close to zero.

PAGE 9

It was also discovered that if the inverse matrix were stored in product form [29] an additional decrease in storage requirements could be realized. However, in product form, the basis inverse grows in size at each iteration so that eventually we must either run out of room or scrap the current form of the basis inverse and reinvert the current basis. In practice this happens quite often. Using product form inversion, the storage requirements are (1.3) p(m + l)(n + 1) + qmr where q is the average density of the eta vectors needed in the product form inverse and r is the average number of eta vectors. Notice, though, that to gain the decrease in (1.3) over (1.2) we must spend time in tlie reinversion step. It was soon realized that there are many ways of forming a product form of a basis matrix. For instance, the matrix could first be decomposed into the product of two triangular matrices. The eta vectors for these are sparse and easily formed. In addition, within a class of procedures for viewing the original basis, one can choose the row-column pivot order to minimize the creation of new members — i.e. one can choose the order in forming the eta vectors to minimize the density of the resulting basis inverse representation [20]. A related issue has to do with updating the currect basis inverse with a new eta vector. One can perform this operation with a minimal buildup in storage requirements [14]. Kalan [22] pointed out that of the p(m + l)(n + 1) elements of a typical LP problem, most are identical (usually most are + Is).

PAGE 10

Kalan suggested storing only the unique numbers of an LP problem and using pointers to store the problem structure. In so doing one stores fewer elements than exist in the problem. The new sparseness is termed supersparsity . Kalan noted that not only can original data be stored this way but also new data (formed in tlie basis inverse) can be stored in tliis fashion. Another useful idea — scaling of data — has numerical stability for its motivation. Generally for real problems the coefficients of matrix may take on a wide range of values. ITiis can cause numerical difficulties — for example in inversion — since the computer must round off fractions. To reduce the effect of this problem, the matrix entries are scaled before starting the problem. There are a number of ways in which to obtain a starting basis: from scratch using an "artificial basis," using a CRASHed basis forcing some of the variables into the basis, or based on information about the problem. The last approach is especially useful in solving a series of problems with only minor variations in data. Candidate selection can play an important part in the efficiency of an LP algorithm. There are a number of ways to select the entering variable — "multiple pricing" is one of them which is especially useful for large problems. In this approach we typically divide the variables into segments and pick tlie segments sequentially. In the chosen segment we select the g most negative reduced cost variables as candidates for pivoting. We concentrate on these g

PAGE 11

4 non-basic variables for the remainder of some h iterations. And then repeat the procedure with the next segment. A more recent development concerning pivot selection is due to Paula Harris [17]. The reduced costs here are weighted prior to selection. The weights are chosen with respect to a fixed set of non-basic variables. The idea is that in the traditional most negative reduced cost approach we are trying to find out the best candidate but on a local basis — locally we move in the direction of steepest ascent of the objective function. Such a selection may not serve global interests. By constantly adjusting the pricing by weights derived from a fixed reference point, the global interests of the problem are hopefully maintained. Experimental results corroborate this expectation and show it to be a very effective procedure. When there is degeneracy in a linear programming problem, there exists the possibility of a set of basic feasible solutions occurring repeatedly — called "cycling." Bland [3] has introduced several new methods to avoid cycling — the simplest of which is to select as entering variable one with the smallest subscript among all candidates to enter basis. If there is a tie for leaving variable, select the one with the smallest basic subscript. Parallel with efforts to increase the size of problems that can be tackled by the simplex method have been efforts to increase its capabilities for problems having special structure. llie computing effort using the simplex method depends primarily on the number of rows; so it is worthwhile considering the possibility of

PAGE 12

taking care of some constraints without increasing tlie size of the basis. Generalized Upper Bounding (GUB) technique introduced by Dantzig and Van Slyke [9] achieves this for contraints of the form (l.A) Zx. = b. with the restriction that the same variable does not occur in more than one GUB constraint. Decomposition technique is another method of extending the capability of problem solving by breaking it into smaller subprobleiiLS and using a master program iteratively to tie them together. Unfortunately the convergence of this method is rather slow and in addition one is unable to compute bounds on the value of the objective function at each iteration, unless each of the subproblems is bounded. Despite the far reaching advances made during the last three decades in extending the capability of the simplex method, some of which have been mentioned above, the basis inverse still remains the limiting factor dictating the size of the problem that could be tackled. Even when we use variants like GUB and decomposition techniques, the size of the basis is still the bottleneck. In the case of GUB it is the basis of the master problem consisting of other than GUB constraints and with decomposition it is the size of the basis of the largest of subproblems. Thus this basis inverse is the heart of all that is bad in simplex method related techniques of solving an LP. Commercial computer codes using above developments of simplex method are capable of solving mathematical programming systems of the order of 10,000 rows. However, useful linear programs of such

PAGE 13

6 magnitude do arise in practice for example in food, oil and chemical related industries. The reason why larger ones are not being set up and solved is that no production LP code has been developed to solve such large problems in a timely and cost effective fashion. Tliis is not to say that there are not large linear programming production codes. For example, IBM's MPSX/370, CDC's APEX, Honeywell's LP 6000 X, Techtronic's MPS3, and others are reliable and available at reasonable rates. However, one would not seriously consider using one of these to solve a linear program of 10,000 or more rows on a routine basis unless the problem possessed some nice structure or the results of the problem solution had a high economic value. Dantzig et. al. [8] in their justification for the setting up of a system optimization lab for solving large scale models have noted : Society would benefit greatly if certain total systems can be modeled and successfully solved. For example, crude economic planning models of many developing countries indicate a potential growth rate of 10 to 15% per year. To implement such a growth (aside from political differences) requires a carefully worked out detailed model and the availability of computer programs that can solve the resulting largescale systems. The world is currently faced with difficult problems related to population growth, availability of natural resources, ecological evaluations and control, urban redesign, design of large scale engineering systems (e.g. atomic energy and recycling systems), and the modeling of man's physiological system for the purpose of diagnosis and treatment. These problems are complex, and urgent and can only be solved if viewed as total systems. If not, then only patchwork piecemeal solutions will be developed (as it has been in the past) and the world will continue to be plagued by one crisis after another caused by poor planning techniques.

PAGE 14

Tlie bottleneck in solving such large unstructured LPs is the inability to maintain a sparse yet accurate representation of the basis inverse. When the simplex method is not computationally feasible for such large problems, iterative techniques that do not require a basis inverse may be an attractive alternative. In this dissertation we concentrate on iterative procedures for solving linear programs. VJe discuss iterative techniques under the headings (a) relaxation methods and (b) subgradient methods. Tliese methods are insensitive to the number of constraints. In fact an increase in the number of constraints for the same problem improves the convergence of the relaxation methods. Unlike the simplex method, iterative techniques are self-correcting. We always use original data thus advantages of supersparsity [22] can be fully realized and the program run in main memory. Finally knowledge of a vector close to the solution can be used to advantage. This is very useful when a sequence of problems has to be solved with only slightly varying data. As an aside to the main thrust of this dissertation, we note that iterative methods are more attractive for solving certain large Markov decision processes and also for problems having a Leontief Substitution constraint set [23, 25]-both of these problems are an important special class of LPs.

PAGE 15

CHAPTER 2 BACKGROUND RESULTS AND TOOLS Introduction Motivation In this chapter we will review various iterative techniques for finding a point of a polyhedron. We are interested in finding a point of a polyhedron because a wide variety of important mathematical programs can be reduced to such a problem. For example, solving a linear program could be reduced to finding a point of a polyhedron using primal and dual constraints along with an additional constraint implying strong duality. To elucidate this further, consider the following primal-dual linear programming pair: (2.1) Primal Dual Max c' X Min v'b s.t. Ax^b s.t. v'A^c X >0 V > Here A is m x n; c is n x 1 and b is m x 1 . v' stands for transpose of V. Problem 2.1 can be restated as (2.2) -Ax + b ^ v'A c ^ v'b c'x ^ V ^ 0, X ^

PAGE 16

Let 1' = {(^)(^) satisfies 2.2} be the set of solutions to 2.2. (^)C P implies that x solves the Primal problem in 2.1 and v solves the dual problem. Linear programs of the form z* = Max Min (alx + b.) X i=l, . . . ,ra relate directly to the problem of finding a point of a polyhedron P where P= (xe r'^: alx + b. k z*, i = 1, ..., m} Such problems arise, for example, in large-scale transportation problems and decomposition problems [2 7]. We will develop such a formulation for the transportation problem. Given k origins with stocks a, > of a single commodity, and m destinations with demands d. > the transportation problem involves J determining a shipping schedule that meets all demand and supply contraints at minimum cost. Let z„. be the amount shipped from £ to j c cost of shipping one unit from £ to j . £j The problem is k m Min Z T. Cn . Zn i=l j=l ^J ^J subject to '^ Zo . = ^o » ^ = 1. • • • , !< j=l ^^ ^ k ^ 2p . = d. , j = 1, . . . , m £=1 ^J J

PAGE 17

10 ^.Ij'" Without loss of generality, we assume k m Z 3. = Id £=1 ^ j=l If m and k are too large to be handled by conventional network methods [4], then the decomposition principle can be used. Let {(z^.: i = 1, ..., n; j = 1, ..., m) : i G I'} be a list of extreme points of k ^ z„=d.,j=l, ...,m £=1 ^J ^ Then the transportation problem is equivalent to the master problem km . . Min Z L c Z Zp . A £=1 j=l ^J id ^J subject to ra a, Z Z Zp. A = 0, £ = 1, ..., k ie I j = l -' Let Z A^ = 1 iel A^ ^ 0. k m b . = Z Z c„ . z. . m — 1 _ 1 y 1 j = l The master problem becomes Z iel Min Z b^A^

PAGE 18

11 subject to -E a. X =0, J2. = l, ...,k. iel Z X = 1 i£l A^ > The dual to the above problem is Max w subject to k -L aj x„ + w g b^ £=1 ^ ^ w < b + a ! X 1 where a^ = (a , a . . . , a ) G R This problem is equivalent to Max^ Min r , , , -i ^Jk ... ta.x + b . } xeK lel 1 1 Task Two of the methods of finding a point of a polyhedron are the Fourier-Mo tzkin Elimination Method [7] and a Phase I simplex method. The Elimination Method reduces the number of variables by one in each cycle but may greatly increase the number of inequalities in the remaining variables. For larger problems tliis procedure is impractical [26] — since the number of inequalities grows at a rapid rate. The Phase I simplex procedure can also be used to find a point of a polyliedron. For very large problems the simplex method breaks down due to a practical limitation on the size of the matrix

PAGE 19

12 that can be inverted as explained in the last chapter. It is for problems of this nature that iterative methods may be an attractive alternative. We discuss iterative techniques under two classes — relaxation methods and minimization methods. In relaxation methods, at each iteration we attempt to find a point of a relaxed form P of P. That is at iteration n we find an X e P such that P c P . We look for the property that the sequence of points thus generated is pointwise closer to P. Minimization methods attempt to solve the problem of minimizing a general, maybe non-dif f erentiable, convex function f(*). The sequence generated generally has the property that the successive iterates are closer to the level set of solutions in distance. X" is one such point. The value of the objective function need not necessarily improve at each iteration. When we use the minimization method for finding a point of P, the function f is so defined that if x* is a minimum of f then X* e P. We will now review the relaxation methods. As was mentioned 1 r, , . . . n „n , ^n . earlier, we find at each iteration n, a point x e P where P is a relaxed form of P. That is P c p . We continue finding points successively in this manner till we can find an x* e P. At least one such point exists by our assumption that P 7^ 0. Definitions We need a lew definitions before some of tlie methods can be described. Let Ji. (x) = alx + b. 1 11


where i ∈ I, I ≠ ∅ and finite, x ∈ R^k, a_i ∈ R^k and b_i ∈ R. Without loss of generality assume ||a_i|| = 1 for all i, where ||·|| is the Euclidean norm.

H_i given by H_i = {x ∈ R^k: ℓ_i(x) ≥ 0} is the halfspace associated with i ∈ I.

P given by P = ∩_{i∈I} H_i = {x ∈ R^k: ℓ_i(x) ≥ 0, i ∈ I} is the polyhedron of interest.

E_i given by E_i = {x ∈ R^k: ℓ_i(x) = 0} is the plane associated with halfspace H_i.

d(x, H_i) given by d(x, H_i) = inf_{z∈H_i} ||x - z|| is the Euclidean distance from x ∈ R^k to halfspace H_i.

d_P(x) given by d_P(x) = max_{i∈I} d(x, H_i) is the maximal distance from x to P.

B_r(x) given by B_r(x) = {y ∈ R^k: ||x - y|| ≤ r} is the ball with radius r and center x.

S_r(x) given by S_r(x) = {y ∈ R^k: ||x - y|| = r} is the (k-1)-dimensional sphere of radius r and center x.

Typical Relaxation Method

Procedure

A typical relaxation method for finding a point of P is:

(1) Pick x^0 ∈ R^k arbitrarily.


(2) If x^n ∈ P stop; otherwise let i_n be a most violated (most isolated) halfspace, i.e.,

-a_{i_n}'x^n - b_{i_n} ≥ -a_i'x^n - b_i for all i ∈ I.

We obtain the sequence {x^n} recursively as follows:

x^{n+1} = x^n + λ^n (t^n - x^n)

where t^n = T_{E_{i_n}}(x^n) is the projection of x^n on E_{i_n}. λ^n is called the relaxation parameter at iteration n. Generally λ^n is constant across n = 1, 2, ...

Variations

If λ^n = 1 for all n, we call the procedure a projection method, and when λ^n = 2 for all n, we call it a reflection method. See Figure 1 for an example of the relaxation procedure. In this example we use λ^n = λ = 2 for all n. At x^0, E_1 is the plane corresponding to the most violated halfspace. We reflect in E_1 to get x^1. At x^1 we reflect in E_2 to get x^2. If we continue the procedure we are in P in four iterations.

Convergence

Fejér-monotone sequences

We are ultimately interested in finding a point of P or at least finding a sequence of points which converges to a point of P. If the sequence terminates with a point of P (as in Figure 1) we have accomplished our goal. If the sequence does not terminate, then we wish to know whether it converges to a point of P. A well known result concerning "Fejér-monotone" sequences will be used to shed light on this question.
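The procedure above translates almost directly into code. The following is a minimal sketch, assuming the constraints a_i'x + b_i ≥ 0 are stored as the rows of an array A and a vector b with each a_i already normalized to unit length; the function names are illustrative only.

import numpy as np

def relaxation_step(x, A, b, lam=1.0):
    # One iteration of the typical relaxation method for
    # P = {x : Ax + b >= 0}; rows of A are assumed to have unit norm.
    residuals = A @ x + b                  # l_i(x) for every constraint
    if np.all(residuals >= 0):             # x already lies in P
        return x, True
    i = int(np.argmin(residuals))          # index of a most violated halfspace
    t = x - residuals[i] * A[i]            # projection of x onto the plane E_i
    return x + lam * (t - x), False        # relaxed move with parameter lambda

def find_point(x0, A, b, lam=1.0, max_iter=1000):
    x, done = np.asarray(x0, dtype=float), False
    for _ in range(max_iter):
        x, done = relaxation_step(x, A, b, lam)
        if done:
            break
    return x

With lam = 1 the step is the projection method and with lam = 2 it is the reflection method just described.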


FIGURE 1. The reflection procedure (λ = 2): the iterates reach P in four steps.


16 n k A sequence of points {x } in R is said to be Fejer-raonotone [13, 31] with respect to the set P if . X n+1 , n (i) X f y. and (ii) II X zjl ^ II X z|| for all z £ P and for all n. Motzkin and Schoenberg A well known result in analysis is that if (s } is monotone, then {s } converges if and only if it is bounded. Hence we see that a Fejer-monotone sequence always converges in distance. However, we need to relate this convergence to the convergence of sequence {x } . Before we relate a theorem on convergence of Fejer-monotone sequence of points, we need a few more definitions. Given a set K c R , the af f ine hull of K, denoted by jU(K) , is M(K) = {x e R*^: X = Z A^x^, x^ e K, ieL ^ A = 1, L finite, A"^ c r} . ieL For example, the affine hull of a point is the point itself and affine hull of a line segment is a line containing the segment. The affine 3 hull of a triangle in R is the plane containing the triangle. A set P is affin e if P = M(P) . k + Let M be affine in R , x a point of M and r e R be a positive real number. Then we define S (M, x) as the spherical surface with axis M, center x and radius r by (2.3) S^(M,x) = x + ((M x>L n {z c R*": || z|| = r}) Figure 2 shows the construction of a spherical surface. Given a point of affine set M, M x is the translation of M by x. (M x)-*is orthogonal to M x. S (0) is a spherical surface with radius r, r '


which intersects (M - x)^⊥ at two points. These two points are translated back by x to give S_r(M, x).

(2.4) Theorem (Motzkin and Schoenberg)
Let the infinite sequence {x^n} be Fejér-monotone with respect to the set P. Then
Case (a): If P has an interior, the sequence converges to a point.

FIGURE 2


Case (b): If P does not have an interior, then the sequence can be convergent to a point, or the set of its limit points may lie on a spherical surface with axis M(P).

See Figure 3 for an illustration. In case (a) P has an interior and the sequence x^0, x^1, x^2, x^3, ... converges to a point x* of P. In case (b) P does not have an interior and the Fejér-monotone sequence x^0, x^1, x^2, x^3, ... results in the two limit points of the sequence lying on a spherical surface with axis M(P) and center y, y being the projection of x^0 on P.

Relaxation Procedures

Agmon

We are now in a position to start our survey of relaxation procedures. One of the earliest published methods is the projection method due to Agmon [1]. Here we specify an initial vector x^0 ∈ R^k (chosen arbitrarily) and set λ^n = 1 for all n. At iteration n, if x^n ∈ P stop. Otherwise we find a new point x^{n+1} as follows:

x^{n+1} = x^n + λ^n (t^n - x^n)
        = x^n - (a_{i_n}'x^n + b_{i_n}) a_{i_n} / (a_{i_n}'a_{i_n})
        = x^n - (a_{i_n}'x^n + b_{i_n}) a_{i_n}

since ||a_{i_n}|| = 1. Here i_n represents a most violated constraint of P at x^n.


FIGURE 3. Case (a): P has an interior and the sequence converges to a point of P. Case (b): P has no interior; the limit points lie on a spherical surface S(M(P), y).


This procedure yields a sequence which converges to a point of P. In the Agmon procedure

P^n = {x: a_{i_n}'x + b_{i_n} ≥ 0}.

That is, P^n is the closed halfspace defined by the constraint i_n, and P ⊂ P^n. In keeping with our notation, here P^n is a relaxed form of P.

(2.5) Theorem (Agmon)
Let P = ∩_{i∈I} H_i, P ≠ ∅, I ≠ ∅ and finite, be the solution set of a consistent system of linear inequalities, and let {x^n} be the sequence defined by the projection method, that is, λ^n = 1 for all n. Then either the process terminates after a finite number of steps or the sequence {x^n} converges to a point x* of P. Furthermore,

||x^n - x*|| ≤ 2 d(x^0, P) θ^n

where θ ∈ (0, 1) and depends only on the matrix A formed from the rows a_i', i = 1, ..., m, and m = |I|.

Agmon explicitly proved the convergence for the projection method. He also showed that the results could be extended to the case λ ∈ (0, 2), where λ^n = λ for all n.


21 Motzkin and Schoenbera, Eaves Motzkin and Schoenberg 131] (among others) described a reflexion method for finding a point of P. They showed that if P has an interior then the sequence terminates with a point in P after a finite number of steps. Let x ^ P be an arbitrary starting point and generate a sequence as follows: If x" e P stop If X ?! P select a half space such that x ^ H. Let X be obtained by reflecting x in E . . After finitely many iterations, x will fall in P if P has an interior. The general re flee ion function can be expressed as follows; k k For i = 1, ... , m define f.: K — * R by 1 (X 2(a.x + b.)a. if x ?! H. Ill 1 X if x C H. Let g , g„, ... be a sequence where g. e {f , ..., f } for j = 1, 2, . J 1 m Define g to be the composite function g"(x) = g (g , (...(g,(x)) ...)) n n-i 1 (2.6) Theorem (Motzkin and Schoenberg) Let P ^ and I t^ and finite. If P = n H. has an interior, iEl ^ then for any x , there is an H such that g (x) c P. Eaves 111] extended this result and demonstrated that the reflection procedure has a uniform property. Namely, if x is within


22 I, 0, a distance 6 of P, then g (x ) will fall in P for some I where * * % ^ % and t depends on o and not on x . (2.8) Theorem (Eaves) Assiime that P = n H has an interior. Let X be the set of id " points within a distance 6 of P, that is, X = {x c R*^: d(x, P) ^ 6} % 1 Tlien there is an £ such that g (X) c P . g is thus a piecewise, linear retraction from X to P . Piecewise and linear means that it has finitely many linear pieces. Retraction means g : X -^P is continuous, P c X and g (x) = X for X £ P. See Figure 4 for example. Six points chosen arbitrarily within 1" of P all converge in under four iterations . Gof f in Definitions . Coffin's [15] work deserves special mention. He has provided a comprehensive treatment of relaxation methods and presented several useful finiteness properties. We need a few more definitions before presenting Coffin's results. A convex subset F of P is a face of P if X. y e P 'I l> — ^ X, ye F. (x, y) \< i ^ ] We denote the set of faces of P by F(P). Figure 5 illustrates F(P) for a given P. Zero-dimensional faces are 1, 2, 3. Onedimensional faces are 4, 5, 6. The polytope 7 itself is also a face.


FIGURE 4


24 Tlie zero-dimensional faces of P are called the extreme points of P. Tlie n-1 dimensional faces are called facets of P. FIGURE 5 N (x) defined by Ng(x) = {u £ R^: u'(z x) < 0, Vz e S} is the normal cone to S at x. N (x) is a non-empty, closed convex o cone. It is non-empty because it contains at least the origin. It is closed and convex because the intersection of closed halpspace is closed and convex. It is a cone because for all u c N (x) , Au £ N (x) , A ^ 0. Cg(x) = (Ng(x))^ is the supporting cone to S at x. Figure 6 illustrates these last two definitions. A point X of a set P is a relative interior point designated r.i.(P. ) if it is an interior point of P with respect to the relative


25 topology induced in M(P) by the Euclidean topology, i.e., xe r.i.(P) if there exists 6 > such that Br(x) n M(P) c p. For example, as 2 2 shown in Figure 7, a line segment in R has no interior in R but has a relative interior. FIGURE 6


26 FIGURE 7 It can be shown that the normal cone is the same for all relatively interior points of a given face, i.e., Np(x) = Np(F) for any X c r.i.(F) where F is a face of P. Similarly Cp(x) = Cp(F) for any x E r.i.(F). Let T be any set, then -T is defined as -T = {x e R : X = -y, y £ T} A closed convex set is said to be smooth enough at a point v of its boundary if -Np(y) c Cp(y) A closed convex set is said to be smootlLenou^,_ if it is smooth enough at every boundary point, or equivalently if -Np(F) c Cp(F) •-F£ F(P) where F(p) stands for the set of faces of P and F(P) cl> is the set F(P) with removed.


27 Tlie smooth enough property applied to a polyliedron would require all its "corners" to be "smooth enough," carrying the analogy to k-dimensional Euclidean space. Some characteristics of smooth enough convex sets are mentioned below: A polyhedral cone C is smooth enough if and P P only if (iff) -C "^ C, where C is the polar of C. c''={u e R*^: u'z ^0, Vz e C}. A polytope (bounded polyhedron) is smooth enough iff all its extreme points are smooth eno ugh . A polyhedron is smooth enough iff -N (F) <^ C (F) for ail the minimal elements of the poset {F(P) (}>, c} where *= stands for set inclusion. Some illustrations are given in Figure 8. A poset (P, <^) is a set P on which a binary relation £ is defined such that it satisfies the following relations; (i) for all X, X _< X (reflexivity) (ii) if X j< y and y £ z then x £ z (transitivity) (iii) if X _< y and y <_ x then x = y (antisymmetry) Examples of posets are {R, <] , i.e., the reals with the ordinary less than equal to operation, a dictionary with lexicographic ordering, and the power set S, P(S), with set inclusion, r , as the binary operation. Given a unit vector e e R and an angle a £ [0,11], the set of vectors which make an angle with e less than or equal to a is called the spherical cone , C (e), with axis e and half aperture angle a.


FIGURE 8. (a) Not smooth enough at the extreme points of P. (b) Not smooth enough at any point of P. (c) Smooth enough.


29 C^(e) = {x £ R^: x'e = See Figure 9 for an illustration X II Cos a} C^(e) = c{B^(e)} a = sin a, a £ [0, jr) 2 FIGURE 9 Some properties of C (e) are given below: a C (e) is closed for a e [0,11] C^(e) is a convex set for a e [0, 11/2] For a £ [0,11/2) if we let v = Sin a, then C^(e) = c{B^(e)} where C(S) = {x £ r'': X = Z X\\ x^ £ S, A^ > 0, L finite} i£L C(S) is the convex cone hull of S.


30 Following Goffin [15] we define an obtuseness index u(C) of a closed convex cone C to be the sine of the half aperture angle of the largest spherical cone contained in C. Properties of V(C) are: U(C) = sup {Sin a; C (e) c C, e e B (0)} = sup min(a'. e) eeS^(O) id ^ U(C) > iff C has an interior u(C) = 1 iff C is a halfsp ace . If C (e) is a spherical cone then U(C (e)) = Sin a, a a The obtuseness index of a polyhedron P is defined by U(P) = inf U(Cp(x)) = min U(Cp(F)) x£;3P F£F(P)-i}) For a polytope we have U(P) = min U(Cp(F)) FeVertices of P If U(P) = 1/42, then P is smooth enough. However, tliis is not a necessary condition for P to be smooth enough. If P is the positive 3 orthant in R , then P is smooth enough but i(P) = I//3 < I/J2. The distance from a point x to a set S <= r*^ is defined by d(x, S) = inf II X z|| zES If S is closed, then the infimum is attained, and the set of points of S where it is attained is the set of points closest to x in S Tg(x) = {y e S: II X y||= d(x, S)} If a set K is closed and convex, then T (x) reduces to a single point


31 which will also bo denoted by T (x) . Tliis point is cnllcd the K projection of x into K . Hence T (x) is a retraction from R to K. K Let K be a closed convex set in R . Tlie following important result implies the continuity of the map T^ (2.8) Theorem (Cheney, Goldstein) K The projection map T satisfies the Lipschitz condition (for K. C = 1). That i IS II Tj,(x) Tj^(y)||^ ||x yl See Figure 10 for an illustration; \\M \(y)\\ < ||x yjl FIGURE 10


32 (2.9)Tlieoreni (Goffin) Let K be a closed convex set of R and T (x) the unique closest K point to X in K. Then {[Tj^(x) + Nj^(T^(x))J, Tj^(x) e K} is a partition of R^. See Figure 11 for an illustration. FIGURE 11 (.2.10) Le mma Let C c R be a closed convex cone. Then P P T^(-C ) c c"^ P where C is the polar of C. An alternative formulation of a spherical surface with axia M, radius d(x, M) and center T^XX) (see 2.3) is contained in the M following le mma.


33 (2.11) Lemma (Coffin) k k Let M be affine in R and x c R , then Sd(^^j^)(M.Tj^jM) = {y e r'': II X z\\ = ||y-,||, v-z e m} Finally a well known result on norm-equivalence is stated below. Here we state the result in terms of distances from x to P witli X i ?. (2.12) Lemma Let P = n II . , P / 0, I j^ 0, then there exists a scalar i£l ^ < y ^ 1 such that for all x c R yd(x, P) ^ dp(x) ^ d(x, P). This result is illustrated in Figure 12. Another Proof of Agmon's Result . ITie following is a different proof for the result originally due to Agmon. We will find this useful in subsequent parts of this material. In this proof we make use of the result that the sequence generated by the relaxation method is Fejer-monotone with respect to P. FIGURE 12


34 (2.13) Tlieorem Let P= n H.,P?^0, I?^0 and finite, then the relaxation id ^ method generates a sequence which converges finitely or infinitely to a point x* £ P for any given value of the relaxation parameter A such that X £ (0, 2) . Furthermore, we have II x" x^'ll s 2d(x", p) e" where = [1 A(2 -A)u^]^^^ A e [0,1) Proof: Assume x ?! P First we will show using Figure 13 that d^x'''^\ P) = d^x", P) A(2 -A)dp(x") where d (x ) is the maximal distance from x to P i.e., d (x ) = max d(x, H. ) . iel ^ T, , . , . -, n+1 n „ , n+1^ From the right triangle x , t , T (x ) we have: ,2, n+1 „. j2. n+1 n, , ^2. n ^. ._. d (x , P) = d (x , t ) + d (t , P) (1) From the right triangle x , t , T (x ) we have: ,2, n+1 ,2,n n.,,2,n„, ,„, d (x , P) = d (x , t ) + d (t , P) (2) From (1) and (2) ,2, n+1 j2.n „, .2, n+1 n, ,2, n n. d (x , P) d (x , P) = d (x , t ) d (x , t ) ,2 n+1 2. n n+1 n, , , . n n. ^ d (x , P) = d (x , P) [d(x , t ) + d(x , t )] r , / n n . , . n+1 n . , [d(x , t ) d(x , t )] = d^x", P) Adp(x'')[(2-A)dj,(x")] = d^(x", P) A(2 A)dp(x")


35 It will be noticed that our proof is independent of Figure 13. The figure only helps to illustrate the ideas. n+I ^^ ^ ::^ ^^ \ \ 1 (x ) = 1 (x ) P P FIGURE 13 By lemma 2.12 there exists a y > such that yd(x, P) ^ d^,(x) ^ d(x, P) hence d^x"^\ P) < d^x", P) -y^A(2 ^ A)d^x", P) = d^(x", P)[l A(2 A) y^] (2.14) Let = [1 A(2 A)y^]^^^ 0=0 when A = 1, y = 1. In general 8 e [0, 1). Thus, d^x^+\ p) ^ e^d^x", p) or d(x", P) ^ e^'dCx^, P) Case I: If the sequence does not terminate lim d(x", P) = n— K» Since {x } is Fejer-monotone and dim P = n, then {x } converges to X e 8P by Theorem 2.4 where 3P is the boundary of P.


36 Case II: If the sequence terminates in a finite number of steps II X z|| < II x"^ z||, ^ z 9 P since the sequence is Fejermono tone with respect to P. Clearly || x* T (x")||^ || x" T (x)||, hence and x^ x" both belong to the ball B^^^n^ p)Tp(x") (see Figure 14). Hence, II x" x^-ll < 2d(x", P) < 2e"d(x°, p) If the sequence terminates, we can take x* to be the last point of the sequence. /// n+2 X FIGURE 14


In the above procedure, if 1 ≤ λ < 2 then

P^n = {x: a_{i_n}'x + b_{i_n} ≥ 0}.

That is, P^n is the closed halfspace defined by the constraint i_n. If 0 < λ < 1, then

P^n = {x: a_{i_n}'x + (1 - λ)b_{i_n} - λ a_{i_n}'x^n ≥ 0}

and again P ⊂ P^n. In both cases P^n is a relaxed form of P and at iteration n we find a point of P^n.

Range of λ for finite convergence. Using relation 2.14, which gives θ against different values of λ and μ, we get the results in Table 1, shown graphically in Figure 15.

TABLE 1. θ = [1 - λ(2 - λ)μ^2]^{1/2} for selected values of λ and μ.
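The dependence of θ on λ and μ in relation 2.14 can be tabulated directly. The short computation below is a sketch of the kind of comparison Table 1 and Figure 15 report; the particular grid of λ and μ values is chosen arbitrarily here, since the printed entries of the original table are not recoverable.

import numpy as np

def theta(lam, mu):
    # Convergence ratio of relation 2.14: theta = [1 - lam(2 - lam) mu^2]^(1/2)
    return np.sqrt(1.0 - lam * (2.0 - lam) * mu ** 2)

for mu in (1.0, 0.75, 0.5, 0.25):
    row = "  ".join(f"{theta(lam, mu):.3f}" for lam in (0.25, 0.5, 1.0, 1.5, 1.75))
    print(f"mu = {mu:4.2f}:  {row}")

For any fixed μ the ratio is smallest at λ = 1, and a larger μ always lowers θ, which is the qualitative behaviour the table and the figure illustrate.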


FIGURE 15


39 (2.15) Theorem (Coffin) Let P be a polyhedron with an interior. Then the sequence generated by the relaxation method converges finitely to a point of P if (a) P is smooth enough and A £ Jl, 2J (b) A c (A*, 2] where 2 y* = ; — ; 7^;^ and u(P) is the obtuseness index of P. A 1 + u(P) Furthermore, if A > 2 then the sequence either converges finitely to a point of P or it does not converge. The first part of the theorem shows that if all the corners of P are smooth enough then the range of A over which we can get convergence in a finite number of steps is [1, 2]. The second part of the theorem shows that for a polyhedron that is not smooth enough, the extent by which we can deviate from A = 2 depends on the acuteness of the most restrictive extreme point of P — the more acute it is, the less we can relax A from a value of 2. Merzlyakov Definitions and Motivation . The final work on relaxation methods that we review is by Merzlyakov [30]. Merzlyakov 's method takes advantage of the fact that convergence properties of relaxation methods improve if, instead of considering only the most violated constraint, as was done by Agmon, among others, a larger nun±)er of supporting planes are considered. We would like to emphasize this unusual property of


AO relaxation methods — redundancy, in terms of number of constraints, improves the convergence rate. Additional halfspaces have the effect of increasing )i which lowers 6, the convergence ratio. At each x e R let V(x) = {i: a'.x + b. < 0, i = 1, . .. , m} That is, V(x) is the set of indices of constraints violated by x. A cavity C is the set C = {x e R : V(x) = V} where V is a subset of first m natural numbers. A subcavity S """ of C is the subset of all x G C which violate constraint i e V no less thananyother constraint j e V. Ties are broken by the natural ordering. Notice that subcavities along with P finitely partition R . Figure 16 illustrates the above. Associate with each subcavity S a fixed A.(j, V) where if i «! V ^,.(J' V) = < > if i e V ^> if i = j . 2 For example for subcavity Sr , (see Figure 16). Xj_(2, {2, 3}) = X^i2, {2, 3}) > A^(2, {2, 3}) > Method . Merzlyakov's relaxation procedure is as follows: Let X e (0, 2) and x specified. If at iteration n, x c P stop. Otherwise, let


FIGURE 16. Subcavities corresponding to violated-constraint sets such as {2, 3} and {1, 2, 3}.


(2.16) x^{n+1} = x^n - λ [Σ λ_i(j,V)(a_i'x^n + b_i)][Σ λ_i(j,V) a_i] / ([Σ λ_i(j,V) a_i]'[Σ λ_i(j,V) a_i])

where x^n ∈ S_V^j. We again make the assumption that P ≠ ∅ and, without loss of generality, assume ||a_i|| = 1 for each i = 1, ..., m. Also assume without loss of generality that Σ λ_i(j,V) = 1. Goffin showed that if P ≠ ∅ then the vector Σ λ_i(j,V) a_i cannot be zero.

Merzlyakov showed that the procedure given above converges to a point of P with a linear rate of convergence. If 1 ≤ λ < 2 then we can say

P^n = {x: Σ λ_i(j,V)(a_i'x + b_i) ≥ 0}

and clearly P ⊂ P^n. If 0 < λ < 1, then

P^n = {x: Σ λ_i(j,V)[a_i'x + (1 - λ)b_i - λ a_i'x^n] ≥ 0}

and P ⊂ P^n. Here again P^n is a relaxed form of P and the sequence obeys our stipulation for relaxation methods, i.e., at iteration n we find a point of P^n where P ⊂ P^n.

The relaxation sequence {x^n} converges infinitely to a point x* ∈ ∂P, where ∂P is the boundary of P, only if the whole sequence {x^n} "jams" into x* + N_P(x*). By moving in a direction that is a convex combination of violated constraints we force the sequence out of the jam. For this reason Merzlyakov's method can be referred to as an anti-jamming procedure. Table 2 and Figure 17 illustrate this point. The table and figure are based on the example given in Merzlyakov's paper.
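A minimal sketch of a single move of equation 2.16 follows, assuming unit-norm rows and the particular choice of equal weights λ_i(j, V) = 1/|V| over the violated set, which is only one admissible assignment among many; the function name is illustrative.

import numpy as np

def merzlyakov_step(x, A, b, lam=1.5):
    # One move of Merzlyakov's method (2.16) for P = {x : Ax + b >= 0}.
    residuals = A @ x + b
    V = np.where(residuals < 0)[0]         # indices of violated constraints
    if V.size == 0:
        return x, True
    w = np.full(V.size, 1.0 / V.size)      # equal weights over the violated set
    g = w @ A[V]                           # sum_i lambda_i a_i
    viol = w @ residuals[V]                # sum_i lambda_i (a_i'x + b_i), negative
    return x - lam * viol * g / (g @ g), False   # g is nonzero when P is non-empty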


TABLE 2

Consider the following system of linear inequalities:

-x_1 + 10x_2 + 10 ≥ 0
3x_1 - 10x_2 - 30 ≥ 0

Let x^0 = (0, 0). We solve this by (a) Agmon's method and (b) Merzlyakov's method, where λ = 3/4 and ||a_i|| = 1.

(a) Agmon's method:

x^{n+1} = x^n - λ(a_{i_n}'x^n + b_{i_n}) a_{i_n}


TABLE 2 (continued)

(b) Merzlyakov's method:

x^{n+1} = x^n - λ [Σ λ_i(j,V)(a_i'x^n + b_i)][Σ λ_i(j,V) a_i] / ([Σ λ_i(j,V) a_i]'[Σ λ_i(j,V) a_i])


FIGURE 17


λ_i(x) = 0 if a_i'x + b_i > 0, and λ_i(x) ≥ 0 if a_i'x + b_i ≤ 0,

where 0 < Σ_i λ_i(x) ≤ β, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that

λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ,

where j* indexes a most violated constraint and θ > 0. The convergence properties of this method are described in the next chapter.

Typical Subgradient Method

Procedure

Let f be a convex but not necessarily differentiable function defined on R^k. u ∈ R^k is said to be a subgradient of f at x if it satisfies the following inequality:

f(y) ≥ f(x) + u'(y - x) for all y ∈ R^k.

The set of subgradients of f at x is called the subdifferential of f at x and is denoted by ∂f(x):

∂f(x) = {u ∈ R^k: u is a subgradient of f at x}.

x* is a minimum of f if and only if 0 ∈ ∂f(x*). A typical subgradient algorithm works as follows:

(1) Choose x^0 ∈ R^k arbitrarily.
(2) Compute a subgradient u^n of f at x^n, i.e., u^n ∈ ∂f(x^n). If u^n = 0 an optimal point has been found. If not, go to (3).
(3) Find x^{n+1} as follows: x^{n+1} = x^n - t_n u^n


47 wliere t is a scalar generally obtained by the approximate rainiraization of f(x" tu") for t ^ 0. Go to (2) with X replacing x llie function f could be an arbitrary convex function. If the objective is to find a point of a polyhedron, one could choose a number of different functions. One choice could be f(x") = Max (-alx" b.) i=l, . . . ,m Since f is the maximum of a finite number of convex functions, f is convex. Oettli [32J has given another interesting way to define a function which can be used to find a point of a polyhedron. This is given latter. Variations Reverting back to the general subgradient procedure given by the iterative scheme n+1 n n n X = X t u various proposals [16] have been made for the step size t . The earliest one was by Polyak [34] who suggested Zt = =° and lira t = 0. n— >oo The merit of this procedure is that we could choose any arbitrary subgradient u e 8f(x ) and the sequence would converge to some x* with the stipulated stepsize (e.g., with t = n^ • However convergence is dismally slow and the method is of .little practical value. Polyak attributes the original idea of this method to Shor [36]. Another idea also due to Polyak [35] is to use


48 (2.17) ^^ A^IfCx") f] t = II "l|2 — -k with f = f(x ) the optimum value of the function. Polyak has shown that vjith 0 f(x ), i.e., an overestimate of the optimum f(x ). Here again if < b , = X = b„ < 2 the sequence converges at a 1 n f and define S^ = {x e r'^: f(x) i a}


S_α is convex (it is a level set). Hence there exists an x̄ in int S_α since α > f* (see Figure 18). Choose ρ > 0 such that the neighborhood B_ρ(x̄) ⊂ S_α. Suppose x^n ∉ S_α for all n.

FIGURE 18

f(x) - f(x^n) ≥ u^n'(x - x^n) for all x,

since u^n is a subgradient at x^n. Thus

f(x^n) - f(x) ≤ u^n'(x^n - x).

Also f(x) ≤ f(x^n) for all x ∈ S_α.


Combining the above,

u^n'(x^n - x) ≥ f(x^n) - f(x) ≥ 0 for all x ∈ S_α.

This holds for the point x̄ + ρ u^n/||u^n|| ∈ B_ρ(x̄) ⊂ S_α. Hence

u^n'x^n ≥ u^n'(x̄ + ρ u^n/||u^n||), or ρ||u^n|| ≤ u^n'(x^n - x̄).

But

||x^{n+1} - x̄||^2 = ||x^n - Δ_n u^n/||u^n|| - x̄||^2
= ||x^n - x̄||^2 + Δ_n^2 - 2(Δ_n/||u^n||) u^n'(x^n - x̄)
≤ ||x^n - x̄||^2 + Δ_n^2 - 2ρΔ_n.

Choose N such that Δ_n ≤ ρ for all n ≥ N. This is possible since lim Δ_n = 0. For n = N to n = N + m,

0 ≤ ||x^{N+m+1} - x̄||^2 ≤ ||x^N - x̄||^2 + Σ_{n=N}^{N+m} Δ_n(Δ_n - 2ρ) ≤ ||x^N - x̄||^2 - ρ Σ_{n=N}^{N+m} Δ_n.

For m large enough the right hand side is negative, since Σ Δ_n diverges, a contradiction. Hence x^n ∈ S_α for some n.

Let lim α_m = f*. Then we have proved that there exists x^{n_m} ∈ S_{α_m} such that lim f(x^{n_m}) = f*.

Computational experience with heuristic subgradient methods has been reported by Held and Karp [18], and Held, Wolfe, and Crowder [19].
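A minimal sketch of the divergent-series scheme just discussed, specialized to the choice f(x) = max_i(-a_i'x - b_i); the step rule t_n = λ/n is only one admissible sequence with Σ t_n = ∞ and t_n → 0, and the function name is illustrative.

import numpy as np

def subgradient_divergent_series(x0, A, b, lam=1.0, max_iter=500):
    # Subgradient method for f(x) = max_i (-a_i'x - b_i); f(x) <= 0 iff x is in P.
    x = np.asarray(x0, dtype=float)
    for n in range(1, max_iter + 1):
        vals = -(A @ x + b)
        i = int(np.argmax(vals))
        if vals[i] <= 0:                   # f(x) <= 0, so x is a point of P
            return x
        u = -A[i]                          # a subgradient of f at x
        x = x - (lam / n) * u              # step size t_n = lam / n
    return x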


51 Held, Wolfe, and Crowder use underestimates to f(x ). However the stepslze X = A is kept constant for 2m iterations where m is a measure of the problem size. Then both A and nimiber of iterations are halved successively till the number of iterations reaches some value k, \ is then halved every k iteration. This sequence violates the requirement ZA = "» and it is possible that the sequence may n converge to a wrong point. However in their computational experience this did not happen. Held, Wolfe and Crowder experimentally found that their procedure was effective for approximating the maximum of a set of piecewise linear concave functions. In their problems (assignment, multicommodity , and maxflow) the number of constraints is enormous, of the order of 100 1 . In all these cases they were able to get linear convergence (which is assured by the choice of stepsize) Held and Karp used overestimates to f(x ) to obtain bounds for a branch and bound procedure for solving the traveling salesman problem. Using this procedure they were able to produce optimal solutions to all traveling salesman problems ranging in size upto 64 cities. Oettli Oettli proposed a subgradient method for finding a point of an arbitrary non-empty polyhedron. We will discuss his less general method which can be used to solve a linear program. As before, using the dual and an additional constraint expressing strong duality, a linear jirogram can bo stated to be the problem of finding an x E 1\ satisfying 5-"!'(x) ^0 i = 1, . . ., m


where ℓ_i^+(x) represents the magnitude by which constraint i is violated at x; i.e.,

ℓ_i^+(x) = max {0, -a_i'x - b_i}.

Define ℓ^+(x) to be the vector formed from the violations of the different constraints,

ℓ^+(x) = (ℓ_1^+(x), ℓ_2^+(x), ..., ℓ_m^+(x))'.

Let p(·) be an isotone norm on R^m, i.e., if 0 ≤ x ≤ y then p(x) ≤ p(y). It may be noted that all the usual norms, such as the Euclidean and Tchebycheff norms, are isotone. Define

d(x) = p(ℓ^+(x)).

Then clearly x satisfies ℓ_i^+(x) = 0 for every i if and only if d(x) = 0. Oettli showed that d(·) is a convex function and then described an iterative scheme using subgradients of d(·) that minimizes d(·) over R^k. This scheme is quite general since we have a different function for each norm.

(2.18) Lemma [Oettli]
d(·) is a convex function.
Proof: ℓ_i^+(z) is convex for all i. Thus

ℓ^+(αz^1 + βz^2) ≤ α ℓ^+(z^1) + β ℓ^+(z^2) for α, β ≥ 0, α + β = 1.

Since p is monotonic,

p(ℓ^+(αz^1 + βz^2)) ≤ p[α ℓ^+(z^1) + β ℓ^+(z^2)],

and since p, being a norm, is convex, this is

≤ p[α ℓ^+(z^1)] + p[β ℓ^+(z^2)] = α p[ℓ^+(z^1)] + β p[ℓ^+(z^2)].

Hence d(αz^1 + βz^2) ≤ α d(z^1) + β d(z^2). ///
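A sketch of d(·) and of one of its subgradients under the Euclidean norm (one admissible isotone norm); the subgradient formula used here is the chain-rule expression developed in the next chapter, and the function name is illustrative.

import numpy as np

def oettli_residual(x, A, b):
    # d(x) = ||l+(x)|| for the system Ax + b >= 0, together with one subgradient.
    lplus = np.maximum(0.0, -(A @ x + b))    # componentwise violations l_i+(x)
    d = float(np.linalg.norm(lplus))
    if d == 0.0:                             # every constraint is satisfied
        return 0.0, np.zeros(A.shape[1])
    t = -(lplus / d) @ A                     # a subgradient of d at x
    return d, t

The iteration described next uses exactly these two quantities.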


Let x^0 be given initially. If x^n ∈ P stop. Otherwise, let

(2.19) x^{n+1} = x^n - [d(x^n) / (t(x^n)'t(x^n))] t(x^n)

where t(x^n) is any subgradient of d(·) formed according to Oettli's rule at x^n. The sequence {x^n} converges to a point of P with a linear rate of convergence.

(2.20) Lemma [Oettli]
Let y ∈ P; then for Oettli's procedure

||x^{n+1} - y||^2 ≤ ||x^n - y||^2 - d^2(x^n) / (t(x^n)'t(x^n)).

Proof:
||x^{n+1} - y||^2 = ||x^n - [d(x^n)/(t(x^n)'t(x^n))] t(x^n) - y||^2
= ||x^n - y||^2 + d^2(x^n)/(t(x^n)'t(x^n)) - 2 [d(x^n)/(t(x^n)'t(x^n))] t(x^n)'(x^n - y).

By the definition of a subgradient,

d(y) ≥ d(x^n) + t(x^n)'(y - x^n).

Since d(y) = 0, therefore d(x^n) ≤ t(x^n)'(x^n - y). Collecting the above results,

||x^{n+1} - y||^2 ≤ ||x^n - y||^2 - d^2(x^n)/(t(x^n)'t(x^n)). ///

(2.21) Theorem [Oettli]
||x^{n+1} - x*|| ≤ θ ||x^n - x*||, 0 < θ < 1.

Proof:
(2.22) ||x^{n+1} - y||^2 ≤ ||x^n - y||^2 - d^2(x^n)/(t(x^n)'t(x^n))


^ II x"yII By triangle inequality X X < X Y + I Y X 5A ^ 2 II XyII Since this holds for all Y E P including Tp(x ) II x" x'^ll ^ 2d(x^ P) Substituting x in 2.22 1-^ / n^ t(x") t(x") -1 i2 / n. n "ll 2 ri d (x ) X X 11 / n N ' / n . II n '' 1 1 2 ' t(x ) t(x ) X X ,"./l|2„.ii!(Zi ,2. n -] II "+1 '' or X X < X X [1 — ]2 11 , ^^\2 t(x') t(x') d"(x , P) 2 1 . rd(x) „ ii,n. II where A = mf ; B = sup || t(x ) || n d(x , P) n Oettli has shown that A > and B is finite. From these observations, the result follows if we take 2 1 /// =u-|^n In the next chapter we show that the Generalized Merzlyakov procedure is equivalent to the Oettli procedure in as far as the sequence of possible moves made by the two procedures are concerned. Some of the More Recent Algorithms A limitation of subgradient optimization methods is the absence of a benchmark to compare the computational efficiency of different


55 algorithms. Algorithms for unconstrained optimization of smooth convex functions have the reliable method of steepest descent (Cauchy procedure). as a yardstick against which other algorithms are compared. If they are better than the Cauchy procedure, they usually deserve further consideration. In subgradient optimization, the Cauchy procedure may not work. An exampJe v;liere the Cauchy procedure is ineffective is reproduced below from [37]: The function considered is max {3 X + 2y, 2x + lOy} FIGUI^ 19 The contours of this function are sketched in Figure 19. One can calculate that if the Cauchy algorithm is started at z, the sequence converges to the origin (0,0) which is not the minimum of the function.


Some of the recent subgradient procedures can be divided into two categories. In the first category are those that claim to be at least as good as some other subgradient procedures. These algorithms claim their superiority based on empirical (computational) results. In the second category are those that claim to be reasonably effective for both differentiable and non-differentiable convex functions and that claim superiority when applied to quadratic and convex differentiable functions. As an example of the former we present the method of Camerini, Fratta and Maffioli [5], and as an example of the latter we present Wolfe's [38] method of conjugate subgradients for minimizing non-differentiable functions. Wolfe's algorithm closely resembles Lemarechal's method [28].

Camerini, Fratta and Maffioli. Here one adopts the following iterative scheme for minimizing f(·). Let x^0 be given and

x^{n+1} = x^n - t_n s^n

where s^n is a modified gradient direction defined by

s^n = f'(x^n) + β_n s^{n-1} (with s^{-1} = 0, so that s^0 = f'(x^0)),

with f'(x^n) a subgradient of f at x^n and β_n a suitable scalar. s^n is equivalent to a weighted sum of preceding gradient directions and is used as an anti-jamming device. The following lemmas and theorems develop the essential elements of the Camerini, Fratta and Maffioli method.
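A minimal sketch of the direction update just described, with β_n chosen by the rule of Theorem 2.25 below and γ = 1.5, the value the authors recommend; the function name is illustrative.

import numpy as np

def cfm_direction(fprime, s_prev, gamma=1.5):
    # Modified subgradient direction s_n = f'(x_n) + beta_n s_{n-1} of
    # Camerini, Fratta and Maffioli; beta_n is nonzero only when the previous
    # direction makes an obtuse angle with the new subgradient.
    if s_prev is None:                       # first iteration: s_0 = f'(x_0)
        return np.asarray(fprime, dtype=float)
    inner = s_prev @ fprime
    beta = -gamma * inner / (s_prev @ s_prev) if inner < 0 else 0.0
    return fprime + beta * s_prev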


57 (2.23) Lemma * n Let X and x be such that f(x ) ^ f(x ), then S f (x") f (x*) = f ' (x"")' (x* x") This lemma states that the negative of the subgradient makes an acute angle with the vector (x x ) where x corresponds to an optimal solution. See Figure 20. / X f (x") FIGURE 20 -f (x") (2.2A) Lemma If for all n < t < 1^^^(^ ) -^^ o > n,, 2 and 3 =0, then n ' A ^ f (x ) (x X ) ^ -s (x X )


58 Lemma 1.1k extends the subgradient property of Lerama 2.23 to -s -s" also makes an acute angle with (x x ). (2.25) Theorem t * n-l,,n, ii,, r-Y, ^ — Lix_ if 3^-1 f (x") < Let 6^=1 II s"-^ II 2 \^ othervjise with 0=1 =2: then n -(x x") s > _ (x X ) f (x ) lls^ll If'Cx") Theorem 2.25 shows that a proper choice of 6 ensures that -s is at least as good a direction as -f (x ) in the sense that -s makes more of an acute angle with (x x ) compared to -f ' (x ). If -f'(x") and -s"""*" make an acute angle we take -s^ = -f'(x ). Only if -f'(x") and -s" make an obtuse angle do we take -s = -(f ' (x ) + 3 s ) n Figure 21 illustrates the above theorem. n-1 -f (x ) = -s n-1 FIGURE 21


59 (2.26) Ttieorem n+1 II < II X*x" Proof: "" l|s"l|' C^lj s"|| ^ g (f(x") f(x*)) < 2[f(x") f(x--v)] ^ -2f'(x ) (x'< X ) by Lemma 2.24 n ' n S -2s Cx* X ) by Lemma 2.25 t^||s"||2 + 2s"' (X* x") < or t^||s"|| 2 + 2t s"'(x* x") < n n adding || x* x || to both sides gives Hence n+1 I I . 1 1 , n 1 1 r-ry-j X" X II < II x-* X II I/// Theorem 2.26 shows that the sequence is Fejer-monotone with respect to an optimum point x*. Using arguments similar to those used earlier in relaxation methods, it can be shown that the sequence r n-i ix ; converges to x'^'. Camerini et al . , use heauristic arguments to show that the best value for Y is 1.5. They then claim superiority of their method over that of Held and Karp [18] and Polyak [35] based on computational results. Wolfe . In Wolfe's method [38] we generate a sequence of points {x } of directions {d }, of subgradients {f'(x )} and scalars {t"} such that for n = 0, 1, 2, ...


t_n ≥ 0, f'(x^n) ∈ ∂f(x^n),
x^{n+1} = x^n + t_n d^n, and
f(x^{n+1}) ≤ f(x^n).

At step n we have a bundle of vectors G_n, which is formed by a set of rules. G_n consists of f'(x^n), possibly -d^{n-1}, and a (possibly empty) string f'(x^{n-1}), f'(x^{n-2}), ... of limited length. We set d^n = -N_r G_n, where N_r G_n is the unique point in the convex hull of G_n having minimum norm. Demjanov [10] has shown that ∇f(x) = N_r ∂f(x), where ∇f(x) is the gradient of f at x when f is smooth at x and ∂f(x) is the subdifferential of f at x. As is typically the case, the stepsize t_n is determined by an approximate minimization of f(x^n + t d^n) for t ≥ 0. At some step, d^n = -N_r G_n will be small, and in addition the last p + 1 points (for some integer p) will all be close together. Then x^n will be our approximate solution.

For example, consider the illustration in Wolfe's paper:

f(x,y) = max {f_1, f_2, f_3} where f_1 = -x, f_2 = x + y, f_3 = x - 2y,

sketched in Figure 22. A subgradient of f at any (x,y) is chosen to be ∇f_i where i is the least index for which f(x,y) = f_i. If we start with the point (6,3), we move in the direction -∇f(6,3) = (-1,-1) to the point (3,0). Using our prescription for choosing the direction, the only available direction at (3,0) is still (-1,-1). We must take a small step in that direction anyway even though it is not a descent direction. Then the new subgradient will be (1,-2), forming the bundle G = {(1,1), (1,-2)}, and we find the direction -N_r G = (-1,0).


FIGURE 22

The next step in this direction takes us to the neighborhood of (0,0). At the next iteration we add (-1,0) to the bundle. Then N_r G_n = 0. We "reset" and start all over again at this point. We accept the last point reached as a solution if N_r G_n = 0 and the last "p" points are "close enough." For differentiable convex functions, the method is an extension of the method of conjugate gradients and terminates for quadratic functions.
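Wolfe's direction is the negative of the minimum-norm point of the convex hull of the bundle. For a bundle of two vectors that point has a simple closed form; the sketch below is only an illustration of this single ingredient, not of Wolfe's full algorithm, and it reproduces the direction found at (3, 0) in the example above.

import numpy as np

def min_norm_two(g1, g2):
    # Minimum-norm point of the segment conv{g1, g2}.
    d = g1 - g2
    if d @ d == 0.0:
        return np.asarray(g1, dtype=float)
    t = np.clip((g1 @ d) / (d @ d), 0.0, 1.0)   # minimize ||(1 - t) g1 + t g2||
    return (1.0 - t) * g1 + t * g2

g1 = np.array([1.0, 1.0])       # subgradient carried along to (3, 0)
g2 = np.array([1.0, -2.0])      # subgradient picked up after the small step
print(-min_norm_two(g1, g2))    # direction -N_r G = (-1, 0), as in the text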


CHAPTER 3
THEORETICAL RESULTS

Comparison of the Generalized Merzlyakov Method and Some Typical Subgradient Methods

The equivalence of relaxation methods and minimization methods has been noted by several researchers. However, these two approaches are only mathematically equivalent and the equivalence has not generally been made explicit. On the other hand, computationally, relaxation methods are much simpler. It has been shown [24] that a generalized form of Merzlyakov's method subsumes most of the typical subgradient methods. In particular, it was shown that the generalized Merzlyakov method can generate any move that is made by the Oettli method. In this section we draw heavily from the results of the above paper [24].

Generalized Merzlyakov Method

The Merzlyakov method is generalized [24] so as to facilitate its comparison with some of the subgradient methods. This extension greatly increases the flexibility of the original method. The following procedure allows the λ_i's to be chosen based on the current point rather than being constant for a subcavity. For any x where V(x) ≠ ∅, let

λ_i(x) = 0 if a_i'x + b_i > 0, and λ_i(x) ≥ 0 if a_i'x + b_i ≤ 0,


where 0 < Σ_i λ_i(x) ≤ β, λ_i(x) > 0 for some i ∈ V(x), and at each n there is a j ∈ V(x^n) such that

(3.1) λ_j(x^n)(a_j'x^n + b_j) / (a_{j*}'x^n + b_{j*}) ≥ θ,

where j* indexes a most violated constraint and θ > 0.

For λ ∈ (0, 2) and x^0 initially specified, stop at iteration n if x^n ∈ P. Otherwise, let

(3.2) x^{n+1} = x^n - λ [Σ_i λ_i(x^n)(a_i'x^n + b_i)][Σ_i λ_i(x^n) a_i] / ([Σ_i λ_i(x^n) a_i]'[Σ_i λ_i(x^n) a_i]).

We continue to assume ||a_i|| = 1 for all i.

In the next two theorems, we show that the generalized Merzlyakov method converges with a linear rate of convergence to P.

(3.3) Theorem
Let z ∈ P, and let {x^n} be obtained by the generalized Merzlyakov method. Then

||x^{n+1} - z|| ≤ ||x^n - z||.

Proof:
||x^{n+1} - z||^2 = ||x^n - z||^2 - 2λ (x^n - z)'[Σ_i λ_i(x^n)(a_i'x^n + b_i)][Σ_i λ_i(x^n)a_i] / ([Σ_i λ_i(x^n)a_i]'[Σ_i λ_i(x^n)a_i]) + λ^2 [Σ_i λ_i(x^n)(a_i'x^n + b_i)]^2 / ([Σ_i λ_i(x^n)a_i]'[Σ_i λ_i(x^n)a_i]).

The last two terms can be combined to get

[-λ Σ_i λ_i(x^n)(a_i'x^n + b_i) / ((Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i))] [2 Σ_i λ_i(x^n) a_i'(x^n - z) - λ Σ_i λ_i(x^n)(a_i'x^n + b_i)].


Now a_i'x^n + b_i ≤ 0 whenever λ_i(x^n) > 0, and a_i'x^n + b_i < 0 for some i with λ_i(x^n) > 0; thus the first bracketed term is positive. Since z ∈ P and a_i'z ≥ -b_i,

-Σ_i λ_i(x^n) a_i'z ≤ Σ_i λ_i(x^n) b_i.

Thus the second bracketed term is less than or equal to

(2 - λ) Σ_i λ_i(x^n)(a_i'x^n + b_i),

and this term is strictly negative. Thus

||x^{n+1} - z|| ≤ ||x^n - z||. ///

Thus we have established that the sequence is Fejér-monotone with respect to P. We next show that the procedure gives points which converge linearly to P. Let d(x^n, P) = inf_{z∈P} ||x^n - z||.

(3.4) Theorem
Either the generalized Merzlyakov procedure terminates with a point of P or it generates a sequence which converges to a point x* ∈ P. Furthermore, the convergence rate is linear.

Proof: Assume x^n ∉ P and let z^n be the unique closest point of P to x^n. Then d^2(x^{n+1}, P) ≤ ||x^{n+1} - z^n||^2. After expanding ||x^{n+1} - z^n||^2, we get

d^2(x^{n+1}, P) ≤ d^2(x^n, P) - λ(2 - λ)[Σ_i λ_i(x^n)(a_i'x^n + b_i)]^2 / [(Σ_i λ_i(x^n) a_i)'(Σ_i λ_i(x^n) a_i)].


By assumption,

[Σ_i λ_i(x^n)(a_i'x^n + b_i)]^2 ≥ [λ_j(x^n)(a_j'x^n + b_j)]^2 ≥ θ^2 (a_{j*}'x^n + b_{j*})^2 for some j ∈ V(x^n).

From Hoffman [21] we have that there is a ρ > 0 such that

ρ d(x^n, P) ≤ -a_{j*}'x^n - b_{j*}.

Thus

d^2(x^{n+1}, P) ≤ d^2(x^n, P) - λ(2 - λ)ρ^2 θ^2 d^2(x^n, P) / [(Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i)].

Finally, since Σ_i λ_i(x^n) ≤ β and ||a_i|| = 1 for all i, we have

(Σ_i λ_i(x^n)a_i)'(Σ_i λ_i(x^n)a_i) ≤ β^2,

and we get

d^2(x^{n+1}, P) ≤ [1 - λ(2 - λ)θ^2 ρ^2 / β^2] d^2(x^n, P) = γ^2 d^2(x^n, P).

Then 0 ≤ γ < 1 and d(x^n, P) ≤ γ^n d(x^0, P).

Case 1: If the sequence does not terminate, lim d(x^n, P) = 0, and Motzkin and Schoenberg [31] have shown that the sequence converges to a point x* in the boundary of P for 0 < λ < 2.

Case 2: If the sequence terminates, ||x^{n+1} - z|| ≤ ||x^n - z|| for all z ∈ P (Theorem 3.3). Also ||x* - z^n|| ≤ ||x^n - z^n||, where z^n is the closest point to x^n in P. Thus

||x* - x^n|| ≤ ||x* - z^n|| + ||z^n - x^n|| ≤ 2||x^n - z^n|| = 2d(x^n, P) ≤ 2γ^n d(x^0, P),

where x* is the last point in the terminating sequence. ///
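A sketch of one move of equation 3.2 follows, using point-dependent weights proportional to each constraint's violation, the choice suggested for Phase II later in this chapter; any weights satisfying condition 3.1 could be substituted, and the function name is illustrative.

import numpy as np

def generalized_merzlyakov_step(x, A, b, lam=1.0):
    # One move of the generalized Merzlyakov method (3.2) for P = {x : Ax + b >= 0}.
    residuals = A @ x + b
    V = residuals < 0
    if not np.any(V):
        return x, True
    w = np.zeros(len(b))
    w[V] = -residuals[V] / (-residuals[V]).sum()   # violation share of constraint i
    g = w @ A                                      # sum_i lambda_i(x) a_i
    viol = w @ residuals                           # sum_i lambda_i(x)(a_i'x + b_i)
    return x - lam * viol * g / (g @ g), False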


66 We can further generalize the Kerzlyakov method and Tlieorems 3.3 and 3.4 by replacing A with A where < b. < A < b < 2 for all n. In the remainder of the discussion we usually mean A = A for all n but do not have to restrict ourselves in this manner. Further Developments of Oettli Method Oettli's minimization method was discussed in the last chapter. We wish to show the equivalence of Generalized Merzlyakov method and Oettli's method in the next section but before that we need to develop some properties of Oettli's method to facilitate this comparison. Again most of the theorems and proofs in this section are taken from the paper [24] referred to earlier. We represent the subdifferential of a convex function f at x by ' 3f(x)and a subgradient of f at x by f ' (x) i.e., f ' (x) e 3f(x). To recapitulate the Oettli procedure, we form the sequence {x"} as follows: 3.6 ,,"+! ,,n d(x")t(x") t(x")'t(x^) where t(x ) is any subgradient of d(«) at x". 3. 7 Lenmia For any norm p(«) on R™ ap(x) = 9p(0) n {p'(x): p(x) = p'(x)'x} Proof: We will show that if p' (x) c3p (x) then the following ralation holds 9p(x) = {p'(x): p(y) 2 p'(x)'y for all y} n ^p'(x): p(x) = p'(x)'x} Conversely if the above relation holds, we will show p' (x) e 3p(x).


67 By the definition of a subgradient p'(x) e 3p(x) if p(y) 2 p(x) + p'(x)'(y x) for all y. Also p(Ax) p(x) > p'(x)'(Ax x) for all scalars A. Since p(*) is a norm p(Ax) = |A|p(x) For A > 1 p(x) = p' (x)'x < A < 1 p(x) i p' (x)'x Thus p(x) = p' (x)'x. From the subgradient inequality p(y) ^ p(x) + p'(x)'(y x) by substituting for p(x) we get p(y) > p'(x)'y for all y. Hence 3p(x) = {p'(x): p(y) > p'(x)'y for all y} n {p'(x): p(x) = p'(x)'x}. Conversely, if p(y) > p'(x)'y, for all y and p(x) = p'(x)'x, we get by subtraction p(y) P(x) ^ p'(x)'y p(x) = p'(x)(y x) and p' (x) e dp (x) . Finally 8p(0) = {p'(0): p(y) ^ p'(0)'y, for all y} n r'" {p'(0): p(y) > p'(0)'y, for all y}. Ul To find the subgradients of d(«) in terms of subgradients of the composite function p(?, (x)) we use the following chain rule given by Oettli. 3.8 Lemma Let p(*) be an isotone norm and p'C^. (z)) 2 0, then 3.9 d'(z) =£"^'(2) p (l^iz))


68 Proof: l^{z) £^(z°) = Cz z°)'il^'cz°) multiplying both sides by p ' (£ (x)) where p ' (A (x) ) = (^^(z) £^(z°))p'(ll^(z°)) I {7. z°).'Ji"*''(z°)p'(£^(z°)) Using the definition of subgradient for p'(i2(z )) p(Jl'^(z)) p(£'*'(z°)) ^ (!i^(z) £"*'(z°))'p'(ii^(z°) ^ (z z°)'£^'(z°)p'(il+(z°)) d(z) d(z°) = (z z°)'£^' (z'^)p'(£"^(z°)). But d(z) d(z°) ^ (z z°)'d'(z'^) hence d'(z°) = £^ (z°)p ' (£"''(z°) ) . The subgradient of 5-.(x) may be found using tlie following well known result 3.10 35,. "'"(x) = {-Aa.: A = if a'.x + b. > 1 1 11 A = lifa'. x + b.<0 and 1 1 A c [0,1] if a'.x + b. =0} The subdiff erential of £ (x) can in turn be found from 3.11 3i2,"^(x) = {(h,, ..., h ): h. e 3K,"*"(x)} 1 mil Generalized Merzlyakov method and Oettli method With the results developed so far in the previous sections we are now in a position to develop an expression for equation 3.6 in terras of the parameters of the generalized Merzlyakov method. 3.12 Theorem Let X r. K , x ?! P and t(x) be a subgradient of d(*) at x formed by using the chain rule of equation 3.9. Then ,, . , . [ZA. (alx + b.)]ZA.a. d(x) t(x) ^ 1 1 1 11 t(x)'t(x) (ZA.a.)' (EA.a.) 11 11


69 for some A.'s where A. =0, A. =0ifalx+b.>0, and ZA > 0, 1 11 1 1 i Proof: By equation 3.9 we have t(x) = l^ (x)p'(£"*'(x)) for some l^ (x) c 3£"'"(x) and p'(Jl"^(x)) c 3p(il"'"(x)). By equations 3.10 and 3.11 +' ^ (x) = -(n,a , r]r.a , ..,, n a ) with i i I A mm if a!x + b. > 1 1 n. = < 1 if a'.x + b. < G [0,1] if a! x + b. = Also from the proof of Lemma 3.7 d(x) = p' {I (x))'il (x) where /n^Ca^x + b.) \ l^U) = \ n (a'x + b ) mm m / Let ,+ p' (£ (x)). In Oettli's method we need h ^ 0. Substituting ^u 1 c • • d(x)t(x) these last rew expressions into — -, — rn — i—:gives ^ t(x)'t(x) ^ d(x)t(x) t(x)'t(x) [ZA.Ca'.x + b.)]ZA.a. 1 1 1 11 (ZA.a.)'(EA.a.) 11 11 , + , with A. = n.li. ^ 0. We also have -ZA.(a!x + b.) = p(£ (x)) > since 111 111^ X t ? and ZA. > 0. 1 m We have thus established a close similarity between the two methods. We have however yet to establish the two other requirements viz ZA . ^ 6 and that condition 3.1 is satisfied. First we note that


70 since the subdif ferential of p(*) is compact ZA. is bounded above. Hence the condition EA. ;^ 6 is satisfied. Now to show that condition 1 3.1 is satisfied, we see on the lines of the proof of Theorem 3.12 that d(x) = p(l^(x)) = p'(£'^(x))'£"^(x) = h'(£^(x)) = -i:ri.h.(a'.x + b.) = -ZA, (x)(a'.x + b.) Ill 1 1 1 1 where A. (x) = n .h . ^ 0. 1 11 Hence -ZA. (x)(a:x + b.) = p(JL'^(x)) 111 and m max A.(x)(-a'. x b.) > pCS, (x)) i Also i (x) ^ £.,(x)e,. where i* is the most violated constraint and e.^ is a unit vector with a 1 in position j*. Since p(*) is monotonic P(^'^(x)) > P(£^.(x)e..) 2^ J* = -(a: X + b. .)p(e... ) Thus m max A . (x) (-a.'x -b.)^-(a'x + b ,)p(e ,) _^ 1 1 1 2* J'-' j'' 1 (a ! , X + b . . ) min p (e . ) Hence A rx)(a:x+b.) , "^^" P(^) where i e V(x) maximizes the left hand side. If we allow A = 1 in the generalized Merzlyakov method, we find from the above analysis that any move made by the Oettli procedure


71 can also be made by Che generalized Merzlyakov method. However, the reverse is not true. This can be seen if we let A 7^ 1 and take the case where x violates only one constraint. llius the Oettli procedure is strictly subsumed by the generalized Merzlyakov method. Computationally the relaxation method is very simple to implement since the directions of movement can be easily computed. Also it is a very flexible procedure since all that is required to change the direction of movement and the step length is to change the weights A.(x). On the other hand the Oettli method requires a suberadient 1 TO to be computed at each point which is not a trivial calculation. In an earlier paragraph we indicated that the Merzlyakov procedure could be further generalized by allowing X to vary with each iteration. However this would require tiie additional condition 0

72 We consider the methods of Polyak ([34], [35]), Eremin [12] and Held, Wolfe and Crowder [19]. These subgradient methods are capable of tackling more general functions, however we compare them when the objective is merely to find a point of a polyhedron P. There are a number of ways of defining the function f(*) such that when we minimize it we get x* e P. A typical choice is 3.13 f(x") = max -(a]x^ + b.). 1 1 i=l, . . . ,m We will show that with f thus defined the subgradient methods are subsumed by the generalized Merzlyakov method. The above subgradient methods can be collectively described by the following sequence: n+1 n , [f(x ) f] , , n, ^ = ^ \ ||f'(x")P ^ ^^ ) where f is an estimate of fCx*) the optimum value of f. If f(x ) assumes its value for a unique i in the relation (3.13) then the subgradient at x , f ' (x ) = a . . If there are ties, f'(x ) can be taken to be a convex combination of the maximizing a.s. Let 0(x ) represent the set of indices i C {l, ..., m} for which fCx"^) = max (-a'.x^ b.). Then i=l, . . . ,m f ' (x") = ZA.(x")a. with ZA.Cx"^) = 1 11 1 A_.(x'') I and A.Cx"") = for i 5^ 0(x''). Hence _ ,, 1 (f(x") f) n+1 n An ri , "s X = X 'C (x ) iif(x")r = x A (ZA.(x")(a^x" + b_^) f)(i;A.(x")a.) " (ZA.(x")a.)'(ZA.(x")a.) 1111

PAGE 80

73 with X.(x") > 0, ZA.(x") = 1 and A.(x") = for i ?! 0(x"). This can be considered equivalent to finding a point of P where P= {x: a.x+b. -?^0, i= 1, .... m}. 11 If f ^ 0, the procedure gives a point of P. Thus if P ?^ the subgradient procedures mentioned in this section are special cases of the generalized Merzlyakov method. An Algorithm Using Relaxation Method Motivation We have seen that with relaxation parameter of A = 1 we can get the best convergence rate for a fixed y (see Figure lA). Also a higher y leads to better convergence. We wish to combine both these features in our algorithm. We also derive motivation from Polyak's [34] and Oettli's procedure [32]. In this section we give a procedure for finding a point of an arbitrary polyhedron P and later apply it to the specific problem of solving large linear programs. Given a polyhedron P P = {x e R^: a'.x + b. ^ 0, i c l}. 1 1 We assume P ^ and I f^ and finite. Goffin [15] has shown tliat when P is full dimensioned finite convergence is assured for a range of relaxation parameters A depending on the obtuseness index of the polyhedron. WHien P is not full dimensioned, we can partition it to subsets M and C, such that M is full dimensioned, and then devise an iterative procedure relative to these subsets so as to take advantage of the full dimensionality. Even though finite convergence is not assured, we can hope to get a better convergence rate with a proper choice of the relaxation parameters. There is considerable flexibility

PAGE 81

74 in the manner M and C are chosen. One important consideration is that it should be easy to project on to C. The algorithm is in two phases. In Phase I we use an iterative scheme on the sets M and C which drives us close to P. We do this by starting with an arbitrary point in C. We then move in the direction of M. This movement could be a projection onto M, or a reflection or under or over relaxation, depending on the value of the relaxation parameter X. From the new point thus obtained we project on to C. We show tliat the sequence thus constructed converges to P with a linear rate of convergence. In our algorithm however, this iterative procedure is continued merely to get close to P. In this special case when C = R , Phase I becomes tlie relaxation procedure of Agmon. Once we get close to P we switch to Pliase II. In Phase II we consider the constraints of the set P and apply the Generalized Merzlyakov method. The motivation for use of the Generalized Merzlyakov method at this stage is the fact that the set of violated constraints as we approach P are precisely the set of violated constraints at the limit point and under these circumstances, the Generalized Merzlyakov method can be very effective. We first describe the procedure for finding a point of an arbitrary polyhedron P. The Algorithm Phase I . Let z £ C-P If z C P stop, otherwise define

PAGE 82

75 n+1 = T^ {z"" + Ad,j(z")a. } = T^(s") where < A < 2 and a. corresponds to the most violated half space n n of M at z and 3.14 n _ n , , n, s :^ z + Ad,,(z )a . M 1 We will show that the sequence { z } is Fejer-monotone with respect to P. 3.15 Lemma [GoffinJ Let X, y, t be points of R and let x' = X + A(t x) where A e R. Tlien Proof: X' y X' y| y||^ AC2 A) lit x|| ^ + 2A(t-y)'(t-x) X + A(t x) yl = ||x-y|| + I2(x y) + A(t x)]' A(t x) = II X y|| 2 A(2 A)|| t x|| ^ + A[2(x y) + 2 (t-x) ] ' ( t-x) = ||x y||^ A(2 A)|| t xll 2 + 2A(t y)'(t x) . |/7Z1 3.16 Lemma |ls"-u||2 = 1|," ul|2Ad^(z")[(2-A)dj^j(.") 2a! (t" u) ] n for u c M. Proof: Substitute s for x , z for x and t for t, where t" = z" + dj^j(z")a. . Then n 2 II n II 2 , ,„ , . II n n,| 2 5" u|r = II z" u|r A(2 A) t z + 2A(t u)' (t z ) z" u|| ^ A(2 A)d^j(z") + 2Ad^j(z")a^ ( t" u) n .n. n. n z u|| Adj^(^')I(2 A)dj^,(z") 2a^ (t" u) ] . U± n

PAGE 83

76 3.17 Tlieorem Let M be a non-empty polyhedron of R . Then the sequence of points generated by equation (3.14) has the following property: II s" u|| ^ II z^ u|| for all u e M. Proof: If u e M, u e H_. . Hence a^. ( t" u) =0 for all u e M. By lemma 3.16 1 1 n n s" ull ^ II z" ull . E77] 3.18 Theorem Let P be a non-empty polyhedron of R . Then the sequence of points {z } generated by equation (3.14) is Fejer-monotone with respect to P, in fact II z""^^ u|| < II z" ujl for all uC P. Proof: Let u e P and let b be the unique point obtained by projecting u onto the line formed by extending s , z (Figure 23) . The angle formed by s , b, u is a right angle and for the right triangle n , s , b , u. s" a||2 + Ha ull 2 = lis" ull 2 or s" z"+l|| 2 + II z"+l all 2 + 2|| s" z"^^|I || z"+^ a] + ||au||2 = lis"u||2 n n+l,| 2 II n+1 ii 2 , „ n n n+1 n ,, n+1 s z II + II z u|| + 2|| s / II II z a II '^ l|2 = s u or n+1 z ull' < lis"u||2.

PAGE 84

77 are The strict inequality arises from the fact that s" and z""^ distinct or else z c P. But by Theorem (3,17) II s u|| < II z u|| hence II ^'^^ II ^ II " II T— II z ujl < II z u|| JljJ In the next theorem we show that the sequence {z"} converges linearly to P. FIGURE 23

PAGE 85

78 3.19 Theorem The sequence of points {z } generated by the above procedure converges finitely or infintely to a point of P. Furthermore the convergence rate is linear. Proof: I I n+1 ^ , n+1. 1 1 2 II n+1 1 1 2 . . r^ / , \\ z. -T(z )|| =||z ~'J|| f °r a unique u e P (the closest point of z to P) . Thus lU'^'u||2= ||T (z^ + Ad rz")a, }T ru)|| M^ ' i C g II zT" ujl ^ Ad^j(z")[2a| (z''-u)-Adj^(z")] n since the projection map satisfies the Lipschitz condition with c = 1. Continuing II n+1 II 2 ^ II n II 2 u||^ < II z"u||^Ad rz")[2d,Xz") Ad rz")] since al (z u) = d „(z ) hence 1 h n II ^"^^ u|| ^ < d^Cz"", P)[l A(2 A)y"^] by Lemma 2.12 We get )'d2(z", P). dCz""*""*", P) < ed(z", P) where e = U A(2 X)\i-''^]^^^ and £ [0, 1). Hence d(z , P) < 9 d(z , P) and if the sequence is infinite lim d(z",P) = n— Kjo Using Theorem [2.4] and by the continuity of the distance function this implies {z } converges to Z'^ in the boundary of P.

PAGE 86

79 If the sequence terminates at z''', then II z>'' Tp(z")|| < II z" Tp(z")|| = d(z", P) by Fejer-monotonicity of the sequence. Hence z* and z both belong to the ball B (T^Cz")), and d(z", P) nn ^oj/H „. ^ „„n , . z'V z"|| < 2d(z", P) < 20"d(z^, P). Ul Phase II . When the last k points are within £ of each other, we switch to the Generalized Merzlyakov procedure. This is because if we are near the limit point, tlie only active constraints would be the tight constraints at the limit point of the sequence. In such a situation the Generalized Merzlyakov method can be very effective with a proper choice of A and A.(x). We recapitulate that in Merzlyakov's procedure (which is a special case of Generalized Merzlyakov method) the sequence is generated as follows: ^^^ ^ XIEA^.(j, V)(a;.x" + bJ][ZA^.(j, V)aJ X = X [ZA.(j, V)a.] [ZA.(j, V)a.] where x £ S^ If we assume that we know the set of constraints satisfied with equality by T (x ), we can generate the halfspace H* = {x e R'':(Tp(x") x")'(x TpCx"")) > 0} which gives convergence to T (x ) in exactly one step with A = 1 Suppose we number the set of active constraints at T (x ) by I* = {l, ..., p}. Let A* E (a.,..., a )' be the p x k matrix formed by interior normals of the halfspaces 11. for ic I*. Let b* = (b^, .... b^)' . Define (A^, ...,A P)' = -(A*A*')"^(A*x" +b*).

PAGE 87

80 With this set of A . ( j , V) ' s using Merzlyakov method, Goffin [15] has shown that convergence to T (x ) is obtained in one step. P However, this requires a comparably larger amount of computation and finding (A*A*') is not really practical on large problems. In this study we have therefore concentrated on the Generalized Merzlyakov Method and attempted to find what would constitute a favorable set for X and A . (x) . Application to Solving Large Linear Programs We now consider how the procedure could be used to solve large linear programs. Consider the LP max c ' X s. t. Ax S b X ^ and its dual min TT'b s . t. A'tt = c TT ^ 0. By stong duality, an optimal IT and x satisfy b'TT-c'x<0. Let C = {(,'^): b'^ c'x ^ 0} M = {(^): Ax ^ b, -A'tt ^ -c, x ^ 0, tt > O} and P = M n C. Tliis is only one of the ways of partitioning the constraint of P. There is in fact great flexibility in the choice of C and M. The advantage of splitting C and M is indicated above is

PAGE 88

81 (a) M has a special structure and can be decomposed along TT and X. Tills leads to saving in storage as well as easier arithmetic. (b) C contains only the coupling constraints and is easy to project to. However, there are other ways of obtaining p. Another choice could be to let C have only the non-negativity constraints. Again the advantage of such a construction would be ease of projection to C. Still another choice is for C to take the form C = {(^): b'TT c'x ^ 0, X ^ 0, 7T ^ 0}. There is user discrimination in this algorithm in the choice of M and C, A and Z in Phase I, and A and weights A.(x) in Phase II. We have already commented on the possible strategies for selecting M and C. We give below strategies for selecting the other parameters. We will first discuss the parameters for Phase I. (1) A^: If < A ^ 1 convergence may be slow if some of the "angles" of M are very acute (small obtuseness index of P) . In this case increasing A to near 2 will have tlie effect of opening the angles and accelerating the convergence. It therefore appears to be a good strategy to have A approximately equal to 2 in Phase I. (2) _Z : In line with the general observation for relaxation methods that any a priori information on the location of the solution could be used to advantage, if we have to solve a linear program witli only sliglitly varying data, wc could take £idvantage of tliis feature. However, in general wlien we do not have any sucli prior information, the choice of Z could be dictated by the structure of C, so that we can start with a feasible point of C.


Now for the parameters of Phase II.

(1) λ: As a general guideline, 1 ≤ λ < 2 may be chosen as the relaxation parameter for Phase II, as was done in Phase I. However, Table 3 and Figure 24 show the sequences obtained with λ = 1.5 and λ = .75 and illustrate that in some cases a choice of 0 < λ < 1 may give better results.

(2) λ(j, V): In Merzlyakov's method the choice of weights λ(j, V) is crucial and in large measure dictates the convergence of the sequence. It may be noted that the λ(j, V)'s determine both the direction and the step size. Suppose a_i, i ∈ K, represent the violated constraints in the subcavity S_V containing x^n. We can solve the quadratic problem of finding the point of minimum norm in the convex hull of the finite point set {a_i : i ∈ K} to obtain the weights λ(j, V). Such a direction locally has a great deal of merit; however, the computational effort is too great to merit its consideration, since at each iteration we would have to solve a quadratic program. Also, it may not always lead to the best set of weights, as shown by the counterexample given below. Instead we specify λ(j, V) = 1/|K|, where |K| represents the number of constraints violated at x^n. When |K| = 2 the direction obtained coincides with the vector of minimum norm. The strategy suggested by us is not the best in all cases, as shown by the following counterexample. Consider the system x_1 ≥ 1, x_2 ≥ 2 and the starting point x^0 = (0, 0)'.


TABLE 3
Case 1: λ = .75, with λ(j, V) = 1 for all j ∈ V.


FIGURE 24
The sequence obtained with λ = .75.


From the proof of Theorem 3.4 we have

    d^2(x^{n+1}, P) ≤ d^2(x^n, P) - [Σ_i λ_i(-a_i'x^n - b_i)]^2 / (Σ_i λ_i a_i)'(Σ_i λ_i a_i).

Thus to get the best set of λ_i's we may solve the optimization problem

    max over the weights λ_i ≥ 0 of  [Σ_i λ_i(-a_i'x^n - b_i)]^2 / ||Σ_i λ_i a_i||^2.

If λ* solves the above, then so does any positive multiple of λ*. Hence an equivalent problem is

    max  Σ_i λ_i(-a_i'x^n - b_i)
    s.t. ||Σ_i λ_i a_i||^2 = 1.

In our specific problem this becomes

    max  λ_1 + 2λ_2
    s.t. λ_1^2 + λ_2^2 = 1,

which gives λ_1* = 1/√5 and λ_2* = 2/√5. This set of λ's gives x^1 = (1, 2)', which is feasible, with a decrease term k = 5. Had we specified λ_1 = λ_2 = 1/2, we would have obtained x^1 = (1.5, 1.5)', which is not feasible, with k = 4.5. Thus our prescription does not lead to the best set of λ's; it is, however, a good heuristic and easy to implement.
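The arithmetic of this counterexample is easy to check numerically. The short sketch below is an illustration only (the helper name is ours); it applies one weighted relaxation step from x^0 = (0, 0)' with λ = 1, first with the equal weights λ_i = 1/2 and then with the optimal weights (1/√5, 2/√5), and reports the decrease term k of Theorem 3.4 together with the new point.

    import numpy as np

    A = np.array([[1.0, 0.0],      # x1 >= 1, written as a_1'x + b_1 >= 0
                  [0.0, 1.0]])     # x2 >= 2
    b = np.array([-1.0, -2.0])
    x0 = np.zeros(2)

    def decrease_and_step(weights, lam=1.0):
        residual = A @ x0 + b                 # negative where violated
        d = A.T @ weights
        r = weights @ residual
        k = r**2 / (d @ d)                    # decrease term of Theorem 3.4
        x1 = x0 - lam * r / (d @ d) * d
        return k, x1

    print(decrease_and_step(np.array([0.5, 0.5])))               # k = 4.5, x1 = (1.5, 1.5): infeasible
    print(decrease_and_step(np.array([1.0, 2.0]) / np.sqrt(5)))  # k = 5.0, x1 = (1.0, 2.0): feasible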


(3) λ_j(x): In the Generalized Merzlyakov method the weights λ_j(x) are determined by the current point alone and are not fixed over a subcavity. The following appears intuitively to be a good set of weights, since the weight assigned to a constraint depends on the magnitude of its violation at the specific point:

    λ_j(x) = (violation of constraint j at x) / (total violation at x).

Computational Results

The algorithm described in the previous section was coded primarily to come up with a favorable set of values for the arbitrary parameters: λ in Phase I, and λ and the weights λ_j(x) in Phase II. It may be stated at the outset that the selection of these parameters is an art rather than a science, and it would be hard to specify the best set of parameters for the general case. The test problem considered was the problem on "Lease-Buy Planning Decisions" on pages 93 to 114 of Salkin and Saha's Studies in Linear Programming. The problem has 17 constraints and 20 variables. Together with the dual and the non-negativity constraints, the set M thus had 74 constraints; the set C consisted of the single constraint obtained by using strong duality. This problem has a unique solution. The constraints of the problem are reproduced in the Appendix. The test results corroborate the general guidelines for the selection of these parameters indicated in the previous section. The distance from the current point to P, where P = M ∩ C, was calculated at each iteration, and the number of iterations required to reduce the distance from P by a certain amount was used as the criterion of improvement. Two


different starting points were selected. The results of the computations are summarized below.

Phase I

λ: The relaxation parameter λ was varied in the interval (0, 2). For the first starting point, the number of iterations required to reduce the distance d from P from d = 74 to d = 60 was noted; for the second point, the number of iterations required to reduce the distance from d = 135 to d = 132 was noted. These results are summarized in Tables 4 and 5 respectively. The results are in line with our intuition-based expectation that a higher λ has the effect of accelerating convergence.

Phase II

The Phase II relaxation parameter was varied in the interval (0, 2) and the following alternatives were tested for the weights λ_j(x); a compact coding of these alternatives is sketched below. In what follows, let r_j(x) denote the ratio (violation of constraint j at x) / (total violation at x).

(a) λ_j(x) = 1 / (number of violated constraints at x)

(b) λ_j(x) = r_j(x)

(c) λ_j(x) = r_j(x)^2

(d) λ_j(x) = r_j(x)^3

(e) λ_j(x) = r_j(x)^{1/2}

(f) If r_j(x) ≥ .3, then λ_j(x) = 10 r_j(x);
    if .2 ≤ r_j(x) < .3, then λ_j(x) = 5 r_j(x);


TABLE 4
Initial d = 74 (number of iterations required to reduce d to 60, for various λ).


TABLE 5
Initial d = 135 (number of iterations required to reduce d to 132, for various λ).


    if .1 ≤ r_j(x) < .2, then λ_j(x) = 3.3 r_j(x);
    and if r_j(x) < .1, then λ_j(x) = r_j(x).

(g) If r_j(x) ≥ .3, then λ_j(x) = 100 r_j(x);
    if .2 ≤ r_j(x) < .3, then λ_j(x) = 10 r_j(x);
    if .1 ≤ r_j(x) < .2, then λ_j(x) = 3.3 r_j(x);
    and if r_j(x) < .1, then λ_j(x) = r_j(x).

Results of alternative (a) are given in Tables 6 and 7 for the same two starting points used in Phase I. From Tables 6 and 7 it would appear that, again, a λ of around 2 gives the best convergence, but the counterexample in the previous section shows that this is not always the best policy with this set of weights. Results of alternative (b) are given in Table 8.
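As noted before the list, the weight alternatives can be written compactly. The sketch below is an illustration only (the function names are ours); it covers (a) and the power family (b) through (e), with the piecewise-scaled schemes (f) and (g) obtained by multiplying the (b) ratio by the factors given above.

    import numpy as np

    def violations(x, A, b):
        # Per-constraint violation of the system A x + b >= 0 (zero where satisfied).
        return np.maximum(0.0, -(A @ x + b))

    def weights_a(x, A, b):
        # Alternative (a): equal weight on each violated constraint.
        v = violations(x, A, b)
        k = np.count_nonzero(v)
        return np.where(v > 0, 1.0 / k, 0.0) if k else v

    def weights_power(x, A, b, p=1.0):
        # Alternatives (b)-(e): r_j(x)**p with p = 1, 2, 3, 0.5 respectively.
        v = violations(x, A, b)
        total = v.sum()
        return (v / total) ** p if total > 0 else v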


TABLE 6
Alternative (a): λ_j(x) = 1 / (number of violated constraints at x); initial d = 74.


TABLE 7
Alternative (a): λ_j(x) = 1 / (number of violated constraints at x); initial d = 135.

    λ        Final d     No. of iterations
    .25      133.45      time limit exceeded (>20,000)
    .75      132.0       15,249
    1.0      131.999     7,568
    1.25     131.999     4,135
    1.5      131.999     2,435
    1.75     131.998     1,212
    1.95     132.0       713


TABLE 8
Alternative (b): λ_j(x) = (violation of constraint j at x) / (total violation at x); initial d = 74.


Alternative (b) appears comparatively insensitive to the relaxation parameter λ. Results of alternatives (b), (c), (d) and (e) are tabulated below.

TABLE 9
Alternatives (b) λ_j(x) = r_j(x), (c) λ_j(x) = r_j(x)^2, (d) λ_j(x) = r_j(x)^3, and (e) λ_j(x) = r_j(x)^{1/2}; initial d = 74, λ = .25 (iteration number versus the four alternatives).


Prima facie, alternatives (c) and (d) looked comparable to alternative (b), so the number of iterations required to reduce d to 60.0 was determined and is given in Table 10. From Table 10 there appears to be no significant difference between alternatives (b), (c), and (d). Results of alternatives (b), (f), and (g) are shown in Table 11; since no significant difference in performance was observed, (f) and (g) were not pursued any further.

The results of the above study show that the following constitute a favorable set of parameters:

    Phase I:  λ = 2.
    Phase II: λ = 2, with λ_j(x) = (violation of constraint j at x) / (total violation at x).

However, as indicated earlier, there are simple counterexamples showing alternative values of the parameters that give better convergence. Hence the above prescriptions can at best be considered good heuristics.

For Phase I of the above study we moved in the direction of the most violated constraint of M. Investigations were also conducted combining Phases I and II: instead of moving in the direction of the most violated constraint of M, we used the Generalized Merzlyakov method for the relaxation with respect to M (a rough sketch of such a combined step is given below). The results were very heartening and are shown in Table 12. In that table a comparison has been made with alternative (b) of Phase II.
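The following is a rough sketch of one sweep of the combined variant just described, reusing the merzlyakov_step and weight helpers sketched earlier. It is an illustration under our own assumption about the sequencing (a weighted relaxation step with respect to the violated constraints of M, followed by a relaxed projection onto the single coupling halfspace C); the code used for the experiments may have ordered these operations differently.

    import numpy as np

    def halfspace_step(z, g_row, g0, lam=1.0):
        # Relaxed projection of z onto {z : g_row.z + g0 >= 0}; no move if already satisfied.
        r = g_row @ z + g0
        return z if r >= 0 else z - lam * r / (g_row @ g_row) * g_row

    def combined_step(z, M, C, weight_fn, lam=2.0):
        # One sweep: Generalized Merzlyakov step with respect to M, then relaxed projection onto C.
        (G_M, g_M), (G_C, g_C) = M, C
        w = weight_fn(z, G_M, g_M)                      # e.g. weights_power (alternative (b))
        if w.sum() > 0:
            z = merzlyakov_step(z, G_M, g_M, w, lam)    # helper sketched earlier
        return halfspace_step(z, G_C[0], g_C[0], lam)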


TABLE 10
Initial d = 74 (number of iterations required to reduce d to 60, for alternatives (b), (c), and (d)).


TABLE 12
Combined Phases I and II: initial d = 74, final d = 60 (compared with alternative (b) of Phase II).


Concluding Remarks

The simplex method of solving linear programs has achieved its present popularity in large measure because of the advances made during the last three decades in extending the effectiveness of the basic algorithm introduced by Dantzig in 1951. There are, however, some useful large-scale linear programs beyond the capability of the simplex method that are not being formulated and solved at present, perhaps because of their size. It is for such problems that iterative techniques seem to be a viable alternative. We have surveyed two such techniques, relaxation methods and subgradient methods, in considerable detail. It has also been shown that most of the recent subgradient methods are mathematically subsumed by the Generalized Merzlyakov method; the latter, along with relaxation methods in general, is computationally far simpler. In this dissertation another algorithm using the relaxation approach has been presented which possesses a linear rate of convergence. The new algorithm has been coded and an attempt made to find a favorable set of parameters for it. A variant of the algorithm has been seen experimentally to have considerable merit over the present relaxation methods. The convergence properties of this algorithm are still not attractive enough to be of practical value; however, it is hoped that these can be improved upon and an algorithm found that can economically solve large-scale unstructured linear programs.


APPENDIX

The Test Problem

    min  1516x_1 + 1238x_2 + 963x_3 + 750x_4 + 450x_5 + 1200x_6 + 1430x_7
         + 1300x_8 + 1260x_9 + 1300x_10 + 870x_11 + 500x_12 + 1400x_13 + 1360x_14
         + 2160x_15 + 2325x_16 + 1095x_17 + 2720x_18 + 1440x_19 + 800x_20

subject to the seventeen structural constraints of the lease-buy planning model (the coefficients are not legible in this copy; see the source cited in the Note below) and

    x_i ≥ 0,   i = 1, ..., 20.


Note: Problem abstracted from the case study on "Lease-Buy Planning Decisions" in Salkin, H. M., and J. Saha, Studies in Linear Programming, 1975, pp. 93-114.


BIBLIOGRAPHY

1. Agmon, S., "The Relaxation Method for Linear Inequalities," Canadian Journal of Mathematics, 6, No. 3, 1954, pp. 382-392.

2. Beale, E. M. L., "The Current Algorithmic Scope of Mathematical Programming Systems," Mathematical Programming Study 4, 1975, pp. 1-11.

3. Bland, R. G., "New Finite Pivoting Rules for the Simplex Method," Mathematics of Operations Research, 2, 1977, pp. 103-107.

4. Busacker, R. G. and T. L. Saaty, Finite Graphs and Networks, McGraw-Hill, New York, 1965.

5. Camerini, P. M., L. Fratta, and F. Maffioli, "On Improving Relaxation Methods by Modified Gradient Techniques," Mathematical Programming Study 3, 1975, pp. 26-34.

6. Dantzig, G. B., "Maximization of a Linear Function of Variables Subject to Linear Inequalities," in T. C. Koopmans, ed., Activity Analysis of Production and Allocation, Wiley, New York, 1951, Ch. XXI.

7. Dantzig, G. B., Linear Programming and Extensions, Princeton University Press, Princeton, New Jersey, 1963, pp. 84-85.

8. Dantzig, G. B., R. W. Cottle, B. C. Eaves, F. S. Hillier, A. S. Manne, G. H. Golub, D. J. Wilde and R. B. Wilson, "On the Need for a System Optimization Laboratory," in T. C. Hu and S. M. Robinson, eds., Mathematical Programming, Academic Press, New York, 1973, p. 2.

9. Dantzig, G. B. and R. Van Slyke, "Generalized Upper Bounding Techniques," Journal of Computer and System Sciences, 1968, pp. 213-226.

10. Demjanov, V. F., "Algorithms for Some Minimax Problems," Journal of Computer and System Sciences, 2, 1968, pp. 342-380.

11. Eaves, B. C., "Piecewise Linear Retractions by Reflexion," Linear Algebra and its Applications, 7, 1973, pp. 93-98.


12. Eremin, I. I., "An Iterative Method for Cebysev Approximations of Incompatible Systems of Linear Inequalities," Soviet Mathematical Doklady, 1961, pp. 821-824.

13. Fejér, L., "Über die Lage der Nullstellen von Polynomen, die aus Minimumforderungen gewisser Art entspringen," Mathematische Annalen, 85, 1922, pp. 41-48.

14. Forrest, J. J. H. and J. A. Tomlin, "Updating Triangular Factors of the Basis in the Product Form Simplex Method," Mathematical Programming, 2, 1972, pp. 263-278.

15. Goffin, J. L., On the Finite Convergence of the Relaxation Method for Solving Systems of Inequalities, Dissertation, University of California, Berkeley, 1971.

16. Goffin, J. L., "On Convergence Rates of Subgradient Optimization Methods," Working Paper No. 34, McGill University, 1976.

17. Harris, P. M. J., "Pivot Selection Methods of the Devex LP Code," Mathematical Programming Study 4, 1975, pp. 30-57.

18. Held, M. and R. M. Karp, "The Travelling-Salesman Problem and Minimum Spanning Trees: Part II," Mathematical Programming, 1, 1971, pp. 6-25.

19. Held, M., P. Wolfe, and H. Crowder, "Validation of Subgradient Optimization," Mathematical Programming, 6, 1974, pp. 62-88.

20. Hellerman, E. and D. C. Rarick, "The Partitioned Preassigned Pivot Procedure (P4)," in D. J. Rose and R. A. Willoughby, eds., Sparse Matrices and Their Applications, Plenum Press, New York, 1972.

21. Hoffman, A. J., "On Approximate Solutions of Systems of Linear Inequalities," Journal of Research of the National Bureau of Standards, 48, 4, 1952, pp. 263-265.

22. Kalan, J. E., "Aspects of Large-Scale, In-Core Linear Programming," Proceedings of the ACM Annual Conference, Chicago, August 1971.

23. Koehler, G. J., "A Case for Relaxation Methods in Large-Scale Linear Programming," in Large Scale Systems Theory and Applications, Proceedings of the IFAC Symposium, Udine, Italy, June 1976.

24. Koehler, G. J. and G. S. Kumar, "A Look at Finding Points of a Convex Polyhedron Using Relaxation and Subgradient Procedures," to appear in Operations Research.


25. Koehler, G. J., A. B. Whinston and G. P. Wright, Optimization over Leontief Substitution Systems, North-Holland Publishing Company, Amsterdam, 1975.

26. Kohler, D. A., Projections of Convex Polyhedral Sets, Dissertation, University of California, Berkeley, 1967.

27. Lasdon, L. S., Optimization Theory for Large Systems, The Macmillan Company, New York, 1970.

28. Lemaréchal, C., "An Extension of Davidon Methods to Nondifferentiable Problems," Mathematical Programming Study 3, 1975, pp. 95-109.

29. Markowitz, H. M., "The Elimination Form of the Inverse and its Application to Linear Programming," Management Science, 3, 1957, pp. 255-269.

30. Merzlyakov, Y. I., "On a Relaxation Method of Solving Systems of Linear Inequalities," USSR Computational Mathematics and Mathematical Physics, 2 (3), 1963, pp. 504-510.

31. Motzkin, T. S. and I. J. Schoenberg, "The Relaxation Method for Linear Inequalities," Canadian Journal of Mathematics, 6, No. 3, 1954, pp. 393-404.

32. Oettli, W., "An Iterative Method, Having Linear Rate of Convergence, for Solving a Pair of Dual Linear Programs," Mathematical Programming, 3, 1972, pp. 302-311.

33. Oettli, W., "Symmetric Duality and a Convergent Subgradient Method for Discrete, Linear, Constrained Approximation Problems with Arbitrary Norms Appearing in the Objective Function and in the Constraints," Journal of Approximation Theory, 14, 1, 1975, pp. 43-50.

34. Polyak, B. T., "A General Method of Solving Extremum Problems," Soviet Mathematical Doklady, 1967, pp. 593-597.

35. Polyak, B. T., "Minimization of Unsmooth Functionals," USSR Computational Mathematics and Mathematical Physics, 1969, pp. 14-29.

36. Shor, N. Z., On the Structure of Algorithms for the Numerical Solution of Optimal Planning and Design Problems, Dissertation, Cybernetics Institute AN, Kiev, 1974.

37. Wolfe, P., "A Method of Conjugate Subgradients for Minimizing Non-differentiable Functions," Proceedings of the Twelfth Annual Allerton Conference on Circuit and System Theory, Chicago, October 2-4, 1974, pp. 8-15.


38. Wolfe, P., "A Method of Conjugate Subgradients for Minimizing Non-differentiable Functions," Mathematical Programming Study 3, 1975, pp. 145-173.


BIOGRAPHICAL SKETCH

Gollakota Surya Kumar was born in Kakinada, India, on April 5, 1939. After secondary schooling at Modern School, New Delhi, India, he joined St. Stephen's College, Delhi University, in the B.Sc. (Physics Honours) program in July 1956. After two years in Delhi University he was one of 24 students selected in an all-India competition to pursue a four-year course (1958 to 1962) in mechanical engineering at the Indian Railways School of Mechanical and Electrical Engineering at Jamalpur, India. During this period he passed Parts I, II and III of the Institution of Mechanical Engineers (London). He became a Graduate of the Institution of Mechanical Engineers (London) in July 1962 and was elected a Member of the British Institution in November 1969.

From June 1964 to March 1970 he worked as Assistant Plant Manager and Plant Manager of three major plants of Indian Railways employing a total of over 10,000 persons. From March 1970 to August 1972 he worked as Assistant Director, Research, Design and Standards Organization of Indian Railways. In August 1972 he came to the United States and joined the MBA program at Indiana University, Bloomington, Indiana. Since January 1976 he has been pursuing doctoral studies in business administration with a major in management at the University of


Florida. He has been teaching senior-level courses in managerial operations analysis over the last two years. He has recently accepted a position as an assistant professor in the College of Business Administration, Northern Illinois University, DeKalb, Illinois.

He was General Editor of his school weekly, "Sandesh," and has several professional papers to his credit in India and the USA. His hobbies include badminton, basketball and photography. He presented a one-man exhibition of photographs in Jamalpur entitled "Four Years in Jamalpur."


I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Gary J. Koehler, Chairman
Associate Professor of Management

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Antal Majthay
Associate Professor of Management

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.