A Sparse Proximal Implementation of the
LP Dual Active Set Algorithm *
Timothy A. Davis†   William W. Hager‡
July 26, 2003
Revised August 31, 2004
Abstract
We present an implementation of the LP Dual Active Set Algorithm (LP DASA) based on a quadratic proximal approximation, a strategy for dropping inactive equations from the constraints, and recently developed algorithms for updating a sparse Cholesky factorization after a small rank change. Although our main focus is linear programming, the first- and second-order proximal techniques that we develop are applicable to general concave-convex Lagrangians and to linear equality and inequality constraints. We use Netlib LP test problems to compare our proximal implementation of LP DASA to Simplex and Barrier algorithms, as implemented in CPLEX.
Key words. Dual active set algorithm, linear programming, simplex
method, barrier method, interior point method, equation dropping
AMS(MOS) subject classifications. 90C05, 90C06, 65Y20
*This material is based upon work supported by the National Science Foundation under Grant No. (i'1i..2
†davis@cise.ufl.edu, http://www.cise.ufl.edu/~davis, PO Box 116120, Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611-6120. Phone (352) 392-1481. Fax (352) 392-1220.
‡hager@math.ufl.edu, http://www.math.ufl.edu/~hager, PO Box 118105, Department of Mathematics, University of Florida, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.
1 Introduction
In this paper we present an implementation of the LP Dual Active Set Algorithm (LP DASA) [21] for solving linear programming problems. Since our quadratic proximal approach can be applied to general concave-convex Lagrangians, we develop the theory in an abstract setting, and then apply it to the linear programming problem. Given a function L : Λ × X → R, where Λ ⊂ R^m and X ⊂ R^n are closed and convex, we consider the problem

    sup_{λ∈Λ} inf_{x∈X} L(λ, x).    (1)

We refer to L in (1) as the "Lagrangian," while the "dual function," defined on Λ, is given by

    L(λ) = inf_{x∈X} L(λ, x).    (2)

Hence, the maximin problem (1) is equivalent to the maximization of the dual function.
The linear program (LP)

    min c^T x  subject to  Ax = b,  x ≥ 0,    (3)

where A is m by n, and all vectors are of compatible size, corresponds to the following choices in (1):

    X = {x ∈ R^n : x ≥ 0},  Λ = R^m,  and    (4)

    L(λ, x) = c^T x + λ^T (b - Ax).    (5)

Any LP can be written in the canonical form (3) [34]. Since the dual function for an LP is nonsmooth, we utilize the following proximal regularization:

    L_ε(λ; y) = min_{x∈X} L(λ, x) + (ε/2)||x - y||²,    (6)

where ε > 0 is a scalar and y ∈ R^n.
First- and second-order algorithms for solving (1) are developed. The first-order algorithm corresponds to the iteration:

    λ_j = arg max_{λ∈Λ} L_ε(λ; y_j),    (7)

    y_{j+1} = arg min_{x∈X} L(λ_j, x) + (ε/2)||x - y_j||².    (8)
The dual active set algorithm (DASA) [18, 19, 21, 22, 23, 25] is used to evaluate the maximum in (7). As we will show, the first-order algorithm amounts to steepest descent applied to the function

    Q(y) = sup_{λ∈Λ} L_ε(λ; y).    (9)

A minimizer of (9) is a solution of (1). The second-order algorithm, which uses a second-order approximation to Q, is finitely convergent, while we only establish convergence for the first-order algorithm. On the other hand, for an LP the first-order algorithm and the second-order algorithm can coincide near a solution; hence, for an LP, the gradient algorithm (7)-(8) can be finitely convergent since the second-order algorithm is finitely convergent. The fact that the first-order algorithm becomes a second-order algorithm may be one reason why the first-order algorithm is almost always superior to the second-order algorithm in numerical experiments.
In [21] we develop the LP Dual Active Set Algorithm, a finitely convergent algorithm for solving a linear programming problem based on a series of orthogonal projections. In [22, (4.4)] we show that if a proximal regularization is applied to the LP and the resulting quadratic programming problem is solved by the Dual Active Set Algorithm, then the search directions generated by the proximal DASA closely approximate the orthogonal projections generated by LP DASA. In this paper, a complete algorithm based on this proximal approach is developed and analyzed.
Our numerical results use factorization-based sparse matrix techniques [7, 8, 9] to solve the linear systems that arise in each step of DASA. In contrast, [22] gives numerical results based on an SSOR iterative solver [20]. This iterative implementation is attractive in cases where the constraint matrix A is sparse, while the Cholesky factorization of AA^T is relatively dense. We saw in [22] that this iterative approach could solve large LPs connected with lower bounds to the quadratic assignment problem, which were intractable for factorization-based simplex, barrier, or dual active set methods.

The iterative approach, however, is not as efficient as a factorization-based approach when both A and the Cholesky factorization of AA^T are sparse. In this case, where the time required to solve a factored linear system may be comparable to the time required to multiply A by a vector, direct methods are superior to iterative methods. We compare the efficiency of our factorization-based LP DASA scheme to Simplex and Barrier algorithms as implemented in the commercial code CPLEX [3].
Besides developing our proximal LP DASA algorithm, we also develop an important simplification which we call "equation dropping." The basic idea is that as the iterations progress, we seek to identify equations that can be dropped without affecting the solution. We drop these inactive inequalities and solve a simpler problem with fewer constraints. This dynamic strategy for dropping inactive equations is more aggressive than presolve techniques, which drop only constraints that are always satisfied.
We remark that another variation of LP DASA, also based on orthogonal projection, appears in [30]. There the bound constrained least squares problem appearing in the first step of the LP DASA algorithm [19, 21] is replaced by an unconstrained least squares problem. As a result, a nondegeneracy assumption is needed to ensure convergence, while we obtain in [21] convergence without any assumptions on the problem. Previous applications of DASA have been to classes of bound constrained control problems [2, 24, 38] and to quadratic network optimization [23].
We briefly compare LP DASA [21] to both simplex and barrier approaches. In the dual active set approach, we start with a guess for the dual multiplier associated with the constraint Ax = b, and we ascend the dual function, eventually obtaining a subset of the columns of A which contains b in its span. Either b is a nonnegative combination of these columns of A, and we have obtained an optimal solution of the LP, or some of the columns are discarded, and the iteration repeats. Since the algorithm constantly ascends the dual function, the collection of columns obtained in each iteration does not repeat, and convergence is obtained in a finite number of steps. This finite convergence property is similar to that of the Simplex method, where the iterates travel along edges of the feasible set, descending the cost function in a finite number of steps. Unlike the Simplex method, neither rank nor nondegeneracy assumptions are invoked in the analysis. In essence, one is able to prove finite convergence without any assumptions. In the Simplex method, typically one constraint is activated and one constraint is deactivated in each iteration. In contrast, many constraints often change in an LP DASA iteration.
In Barrier methods, the iterates move through the relative interior of the feasible region. In LP DASA, the iterates are infeasible, and the algorithm seeks to identify the columns of A associated with an optimal basis. Each iteration of the Barrier method computes a scaled projection of the cost vector into the null space of A. LP DASA projects the constraint violation vector into the space orthogonal to the "free" columns of A. Consequently, a Barrier method solves symmetric linear systems involving a matrix whose sparsity pattern coincides with that of AA^T; LP DASA solves symmetric linear systems with the matrix A_F A_F^T, where A_F is the submatrix of A corresponding to the current free variables (primal variables not at a bound). In a sparse setting, the LP DASA matrix A_F A_F^T often has more zeros than the barrier matrix associated with AA^T. Thus storage requirements and solution times associated with the LP DASA solves can be less than those for Barrier methods.
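The sparsity contrast above is easy to see in a small example. The following sketch uses a made-up 3-by-5 matrix A and an arbitrary free set F (both illustrative, not data from the paper) to count nonzeros in the two matrix patterns:

```python
import numpy as np

# Compare the pattern of A A^T (the Barrier solve) with that of A_F A_F^T
# (the LP DASA solve), where A_F keeps only the free columns indexed by F.
A = np.array([
    [1., 0., 0., 2., 0.],
    [0., 3., 0., 0., 1.],
    [0., 0., 4., 5., 0.],
])
F = [0, 1, 2]              # hypothetical free (basic) column indices
AF = A[:, F]

nnz_full = np.count_nonzero(A @ A.T)    # 5 nonzeros
nnz_free = np.count_nonzero(AF @ AF.T)  # 3 nonzeros (diagonal only)
```

Here dropping the bound columns decouples the rows, so A_F A_F^T is diagonal while A A^T is not; in large sparse LPs the same effect shrinks the Cholesky factor.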
Our paper is organized as follows: In Section 2 we prove convergence of the iteration (7)-(8), and we present the dual active set algorithm for solving (7). The update (8) is a first-order update. In Section 3 we develop a second-order, finitely convergent update. In Section 4 we show that in some cases, the first- and second-order updates for an LP coincide. Hence, the analysis of Sections 3 and 4 explains why the proximal LP DASA associated with the first-order update (8) can be finitely convergent for an LP. In Section 5, we develop a strategy for deleting "inactive equations" when using DASA. Numerical comparisons with Simplex and Barrier codes in CPLEX are given in Section 6.
2 The first-order proximal update and DASA

In this section we observe that the iteration (7)-(8) is convergent to a saddle point of L when one exists, and we review DASA. If L(λ, ·) is convex, then the extremand in (6) is strongly convex in x, and the minimizing argument, denoted x(λ), is unique. As a consequence of [5, Thm. 2.1], if the derivative ∇_λ L(λ, x) is continuous with respect to λ and x, then L_ε(·; y) is differentiable and

    ∇_λ L_ε(λ; y) = ∇_λ L(λ, x(λ)).    (10)

Throughout the paper, the gradient is a row vector. If the mixed derivative ∇_x ∇_λ L(λ, x) is continuous, then by [15, Lemma 2.2] or [16, Theorem 4.1], x(λ) is Lipschitz continuous in λ, and by (10), L_ε(·; y) is Lipschitz continuously differentiable in λ. Hence, by introducing the ε term in (6), we are able to transform a (possibly) nonsmooth extremand in (1) into a smooth function L_ε(·; y).
We now make a simplifying assumption: the Lagrangian L(λ, x) is strongly concave in λ. That is, there exists δ > 0, independent of x, such that

    L(λ, x) ≤ L(μ, x) + ∇_λ L(μ, x)(λ - μ) - (δ/2)||λ - μ||²    (11)

for each λ, μ ∈ Λ. In this case, the maximizer λ_j in (7) is unique. Technically, an LP does not satisfy this strong concavity condition. On the other hand, similar to the regularization in x done in (6), we could augment L with a small concave term of the form -(δ/2)||λ||² in order to satisfy this condition. From a practical viewpoint, this small term is insignificant in the computations. From a theoretical viewpoint, this small term yields a bound on the iterates generated by (7)-(8), from which convergence follows. Alternatively, for an LP, a similar bound on the iterates can be achieved by letting λ_j be the minimum norm maximizer in (7).
Theorem 1. Suppose the Lagrangian L(·, ·) is continuously differentiable, L(·, x) is uniformly, strongly concave for each fixed x ∈ X, and L(λ, ·) is convex for each fixed λ ∈ Λ. If L has a saddle point (λ*, x*) ∈ Λ × X:

    L(λ, x*) ≤ L(λ*, x*) ≤ L(λ*, x) for all λ ∈ Λ, x ∈ X,    (12)

then the λ_j generated by (7)-(8) converge to the maximizer λ* of (1), the associated y_j converge to a minimizer y* of the function

    Φ(x) = sup_{λ∈Λ} L(λ, x),

and for each j we have

    L_ε(λ_{j+1}; y_{j+1}) ≤ L_ε(λ_j; y_j) - (δ/2)||λ_{j+1} - λ_j||² - (ε/2)||y_{j+1} - y_j||².    (13)
Proof. We focus on the iterates associated with j = 1 and j = 2. Putting x = y₂ in (6) when λ = λ₂ and y = y₂, and exploiting the uniform, strong concavity of L(·, x), gives

    L_ε(λ₂; y₂) = min_{x∈X} L(λ₂, x) + (ε/2)||x - y₂||² ≤ L(λ₂, y₂)
        ≤ L(λ₁, y₂) + ∇_λ L(λ₁, y₂)(λ₂ - λ₁) - (δ/2)||λ₂ - λ₁||².    (14)

Also, by the first-order optimality conditions at the maximizer in (7) and by (10), it follows that

    ∇_λ L(λ_j, y_{j+1})(λ - λ_j) ≤ 0    (15)

for all λ ∈ Λ. By the definition of y₂,

    L_ε(λ₁; y₁) = L(λ₁, y₂) + (ε/2)||y₂ - y₁||².    (16)

We combine (14), (15) with j = 1, and (16) to obtain

    L_ε(λ₂; y₂) ≤ L_ε(λ₁; y₁) - (δ/2)||λ₂ - λ₁||² - (ε/2)||y₂ - y₁||².    (17)

Analogous to (17), the iterates (λ_{j+1}, y_{j+1}) and (λ_j, y_j) satisfy (13), which establishes monotone descent.
The function

    L(λ, x) + (ε/2)||x - y_j||²

is strongly convex in x for each fixed λ and strongly concave in λ for each fixed x. Hence, we have the following identity (see [12, p. 173]):

    max_{λ∈Λ} inf_{x∈X} L(λ, x) + (ε/2)||x - y_j||² = min_{x∈X} sup_{λ∈Λ} L(λ, x) + (ε/2)||x - y_j||²    (18)
        = min_{x∈X} Φ(x) + (ε/2)||x - y_j||².

Moreover, by [12, p. 167], if λ_j and y_{j+1} achieve the maximum and the minimum in (18) respectively, then (λ_j, y_{j+1}) is a saddle point:

    L(λ, y_{j+1}) + (ε/2)||y_{j+1} - y_j||² ≤ L(λ_j, y_{j+1}) + (ε/2)||y_{j+1} - y_j||²
        ≤ L(λ_j, x) + (ε/2)||x - y_j||²

for all λ ∈ Λ and x ∈ X. It follows that

    y_{j+1} = arg min_{x∈X} L(λ_j, x) + (ε/2)||x - y_j||².

Hence, the x achieving the minimum in (18), denoted y_{j+1} above, coincides with the iterate y_{j+1} given in (8), and the iteration (7)-(8) is equivalent to the proximal point iteration

    y_{j+1} = arg min_{x∈X} Φ(x) + (ε/2)||x - y_j||².
Since L(λ, ·) is convex, Φ is convex; that is, take θ ∈ [0, 1] and x₁, x₂ ∈ X to obtain:

    Φ(θx₁ + (1-θ)x₂) = sup_{λ∈Λ} L(λ, θx₁ + (1-θ)x₂)
        ≤ sup_{λ∈Λ} θL(λ, x₁) + (1-θ)L(λ, x₂)
        ≤ θΦ(x₁) + (1-θ)Φ(x₂).    (19)

Due to the saddle point assumption (12), we have (see [12, p. 167])

    x* ∈ arg min_{x∈X} Φ(x).

By the convexity of Φ and the existence of a minimizer, the convergence theory for the proximal point method (see [28], [29], [32, Thm. 1]) tells us that the y_j approach a minimizer y* of Φ; obviously, Φ(x*) = Φ(y*).
We now show that

    λ* = arg max_{λ∈Λ} L(λ, y*).    (20)

If the maximizer μ of L(·, y*) satisfies L(μ, y*) > L(λ*, y*), then by the saddle point assumption (12),

    Φ(y*) = L(μ, y*) > L(λ*, y*) ≥ L(λ*, x*) = Φ(x*) = Φ(y*).    (21)

Hence, the inequality L(μ, y*) > L(λ*, y*) yields a contradiction and (20) holds.
By the identity (18) with y_j replaced by y*, we have

    max_{λ∈Λ} inf_{x∈X} L(λ, x) + (ε/2)||x - y*||² = min_{x∈X} sup_{λ∈Λ} L(λ, x) + (ε/2)||x - y*||²    (22)
        = min_{x∈X} Φ(x) + (ε/2)||x - y*||².

By (21), the minimax in (22) is attained by x = y* and λ = λ*. The first-order optimality conditions for y* give

    ∇_x L(λ*, y*)(x - y*) ≥ 0 for all x ∈ X.    (23)

Due to the convexity of L(λ*, ·), (23) implies that

    y* = arg min_{x∈X} L(λ*, x) + (ε/2)||x - y*||².

Since y* ∈ arg min{Φ(x) : x ∈ X}, it follows that

    min_{x∈X} Φ(x) + (ε/2)||x - y*||² = Φ(y*) = L(λ*, y*),

where the last equality is due to (21). Hence, the maximin in (22) is achieved by λ = λ* and x = y*. The first-order optimality condition at λ* is

    ∇_λ L(λ*, y*)(λ - λ*) ≤ 0 for all λ ∈ Λ.    (24)

Adding (24) with λ = λ_j to (15) with λ = λ* gives

    (∇_λ L(λ*, y*) - ∇_λ L(λ_j, y_{j+1}))(λ_j - λ*) ≤ 0.    (25)

By the uniform strong concavity assumption, we have

    (∇_λ L(λ*, y_{j+1}) - ∇_λ L(λ_j, y_{j+1}))(λ_j - λ*) ≥ δ||λ_j - λ*||².    (26)

Combining (25) and (26) yields:

    δ||λ_j - λ*||² ≤ (∇_λ L(λ*, y_{j+1}) - ∇_λ L(λ*, y*))(λ_j - λ*).

We apply the Schwarz inequality to obtain

    ||λ_j - λ*|| ≤ ||∇_λ L(λ*, y_{j+1}) - ∇_λ L(λ*, y*)|| / δ.

Since the y_j approach y*, the continuity of ∇_λ L implies that the λ_j approach λ*. □
Remark 1. Although ε is independent of j in the statement of Theorem 1, it could depend on j without any changes in the analysis. When ε depends on j, (13) becomes:

    L_{ε_{j+1}}(λ_{j+1}; y_{j+1}) ≤ L_{ε_j}(λ_j; y_j) - (δ/2)||λ_{j+1} - λ_j||² - (ε_j/2)||y_{j+1} - y_j||².
Let us assume that the primal constraint set X is the nonnegative cone (4). The statement of the dual active set algorithm (DASA) for solving (1), appearing in Algorithm 1, uses the following notation: Given a bound set B ⊂ {1, 2, ..., n}, let x_B be the subvector of x consisting of those components x_i associated with i ∈ B. We define the functions

    L_{B+}(λ) = min {L(λ, x) : x ∈ R^n, x_B ≥ 0},    (27)
l = 0
λ_l = starting guess
while ∇L(λ_l) ≠ 0
    v_0 = λ_l
    B_0 = {i ∈ [1, n] : x_i(λ_l) = 0}
    for k = 0, 1, ...
        w_k = arg max {L_{B_k 0}(λ) : λ ∈ R^m}
        v_{k+1} = arg max {L_{B_k +}(λ) : λ ∈ [v_k, w_k]}
        B_{k+1} = {i ∈ B_k : x_i(v_{k+1}, B_k) = 0}
        if v_{k+1} = w_k break
    end
    l = l + 1
    λ_l = w_k
end

Algorithm 1: Dual Active Set Algorithm
and

    L_{B0}(λ) = min {L(λ, x) : x ∈ R^n, x_B = 0}.    (28)

The components of x in (27) and (28) corresponding to indices in the complement of B are unconstrained during the minimizations. For an LP, the complement of B, the indices of free variables, corresponds to "basic variables." Assuming L(λ, x) is strongly convex in x, there exists a unique minimizer x(λ, B) in (27) for each choice of λ and B. Let x(λ) denote x(λ, B) in the case B = {1, 2, ..., n}.
In Algorithm 1, there is an outer loop indexed by l, and an inner loop indexed by k. In the inner loop, indices are deleted from B_k to obtain B_{k+1}; that is, B_{k+1} is a subset of B_k consisting of those indices for which x_i(v_{k+1}, B_k) = 0. When the inner loop terminates, we reinitialize B_0 by solving the problem

    min_{x≥0} L(λ_l, x)

to obtain x(λ_l). Those i for which x_i(λ_l) = 0 form the new B_0. In the LP setting studied here, or in the networks studied in [23], the indices contained in the final B_k of the subiteration are included in the initial B_0 of the next iteration. Hence, the initialization of B_0 adds indices to the current bound set, while the evaluation of B_{k+1} frees bound indices. The proof [21, Thm. 1] that DASA solves (1) in a finite number of iterations is based on the fact that the subiteration strictly increases the value of the dual function; as a result, the final set B_k generated in the subiteration cannot repeat. Since the sets B_k are chosen from a finite set, convergence occurs in a finite number of steps.
We use DASA to solve (7). We now explain each step of Algorithm 1 in the case where the dual function is the regularized function L_ε(λ; y) associated with the LP (3):

1. In the outer loop of Algorithm 1, the iteration continues until the gradient of the dual function vanishes. As in (10), this gradient can be expressed:

    ∇L_ε(λ) = b - Ax(λ),    (29)

where x(λ) achieves the minimum in (6) corresponding to the LP Lagrangian (5):

    x(λ) = max{0, z(λ)},  z(λ) = y - (c - A^T λ)/ε.    (30)

2. Before entering the inner loop, B_0 is initialized to be the indices i for which x_i(λ) = 0, where x(λ) is given in (30).
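Steps 1 and 2 above can be sketched directly for the LP Lagrangian (5); the problem data below are arbitrary illustrative values, not a test problem from the paper:

```python
import numpy as np

# Evaluate (29)-(30): the minimizer x(lambda) of the proximal subproblem (6)
# for the LP Lagrangian (5), and the gradient of the regularized dual.
def prox_minimizer(A, b, c, lam, y, eps):
    z = y - (c - A.T @ lam) / eps   # z(lambda) in (30)
    x = np.maximum(0.0, z)          # projection onto the cone x >= 0
    grad = b - A @ x                # gradient (29)
    return x, grad

A = np.array([[1., 1., 0.], [0., 1., 1.]])
b = np.array([1., 1.])
c = np.array([1., 2., 1.])
x, grad = prox_minimizer(A, b, c, np.zeros(2), np.zeros(3), eps=1.0)
# With lambda = 0 and y = 0, z = -c < 0, so x = 0 and the gradient equals b;
# every index then lands in the initial bound set B_0.
```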
3. In the first step of the inner iteration, we maximize L_{B_k 0}. This amounts to solving the problem

    max_λ min {c^T x + λ^T (b - Ax) + (ε/2)||x - y||² : x ∈ R^n, x_{B_k} = 0}.    (31)

If F = B_k^c is the complement of B_k, then the maximin problem (31) is equivalent to the following equations:

    (A_F A_F^T) λ = A_F c_F + ε(b - A_F y_F),  and    (32)

    x_F = y_F - (c_F - A_F^T λ)/ε,    (33)

where A_F denotes the submatrix of A associated with columns corresponding to indices in F. The solution λ of (32) is the w_k of Algorithm 1. The x_F given in (33) is the associated primal minimizer in (31).
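A dense sketch of the solve (32)-(33) follows; the random data and the dense `numpy` solve (rather than the sparse Cholesky factorization the paper maintains) are illustrative assumptions:

```python
import numpy as np

# Solve (32) for lambda (the w_k of Algorithm 1), then recover x_F from (33).
def solve_subproblem(AF, b, cF, yF, eps):
    lam = np.linalg.solve(AF @ AF.T, AF @ cF + eps * (b - AF @ yF))  # (32)
    xF = yF - (cF - AF.T @ lam) / eps                                # (33)
    return lam, xF

rng = np.random.default_rng(0)
AF = rng.standard_normal((3, 5))   # full row rank with probability one
b = rng.standard_normal(3)
cF = rng.standard_normal(5)
yF = rng.standard_normal(5)
lam, xF = solve_subproblem(AF, b, cF, yF, eps=0.5)

# At the maximizer the gradient (29) vanishes on the free set: A_F x_F = b.
assert np.allclose(AF @ xF, b)
```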
4. The computation of v_{k+1} represents a line search in which the piecewise quadratic L_{B+}(λ) is maximized along the line segment connecting the current subiterate v_k and the maximizer w_k (the solution of (32)). For any λ and B, we have

    L_{B+}(λ) = c^T x + λ^T (b - Ax) + (ε/2)||x - y||²,    (34)

where x = x(λ) is given by

    x_i(λ) = max{0, z_i(λ)} if i ∈ B^c,  x_i(λ) = 0 if i ∈ B.    (35)

As λ moves along the line segment [v_k, w_k], z(λ) changes linearly with λ, and x_i(λ) in (35) changes in a piecewise linear fashion. Hence, L_{B+} in (34) is a concave, piecewise quadratic function.
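The breakpoint structure behind this line search can be sketched as follows; the endpoint values are made up for illustration:

```python
import numpy as np

# Along lambda(t) = v_k + t (w_k - v_k), each component z_i is affine:
# z_i(t) = (1 - t) z_i(v_k) + t z_i(w_k), so it crosses zero at most once.
# The crossings are the kinks of the piecewise quadratic being maximized.
def breakpoints(z_v, z_w):
    ts = []
    for zv, zw in zip(z_v, z_w):
        if zv * zw < 0:                 # sign change => one crossing in (0, 1)
            ts.append(zv / (zv - zw))
    return sorted(ts)

z_v = np.array([-1.0, 2.0, 0.5])        # z at the current subiterate v_k
z_w = np.array([1.0, -2.0, 0.75])       # z at the maximizer w_k
# Both sign-changing components cross at t = 0.5 in this example.
```

Between consecutive breakpoints the function is a single quadratic in t, so an exact maximization over the segment reduces to scanning these crossings.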
5. By the definition of x_i in (35), B_{k+1} is obtained by pruning from B_k those indices i ∈ B_k for which x_i(v_{k+1}) > 0.

6. If x_i(v_{k+1}) = 0 for all i ∈ B_k, then the subiteration terminates and the set B_0 is reinitialized to be the final B_k of the subiteration plus all indices i ∈ B_k^c for which z_i(w_k) < 0.
Numerical details connected with our DASA implementation are given in Section 6. Briefly, we maintain a Cholesky factorization of the matrix A_F A_F^T + αI, which is updated as the free set F = B^c changes. Here α, a small multiple of the machine epsilon, is chosen to ensure numerical stability in the evaluation of the Cholesky factorization. Sparse numerical linear algebra techniques for updating this factorization are developed in [7, 8, 9]. Typically, several indices are dropped from B_k in each subiteration (step 5 above); also, the initial set B_0 (constructed in step 6) often has many more indices than the final set B_k of the previous iteration. Hence, multiple rank updates of A_F A_F^T are used in step 5, and multiple rank downdates are used in step 6. If the rank of the update or the downdate is large enough, then it can be more efficient to completely reform the matrix A_F A_F^T and compute its Cholesky factorization directly.
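A dense, unoptimized analogue of such an update illustrates the idea: given the Cholesky factor of a matrix M, the factor of M + vv^T is obtained in O(n²) work instead of a full refactorization. (The sparse, multiple-rank versions in [7, 8, 9] are far more involved; this sketch only shows the dense rank-1 case.)

```python
import numpy as np

# Rank-1 Cholesky update: given lower-triangular L with L L^T = M,
# return the factor of M + v v^T using one sweep of plane rotations.
def chol_update(L, v):
    L, v = L.copy(), v.copy()
    n = v.size
    for k in range(n):
        r = np.hypot(L[k, k], v[k])
        c, s = r / L[k, k], v[k] / L[k, k]
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] + s * v[k+1:]) / c
        v[k+1:] = c * v[k+1:] - s * L[k+1:, k]
    return L

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 4))
M = B @ B.T + 4.0 * np.eye(4)          # symmetric positive definite
L = np.linalg.cholesky(M)
v = rng.standard_normal(4)
L1 = chol_update(L, v)                 # factor of M + v v^T
```

A downdate (subtracting vv^T) is analogous but numerically more delicate, which is one reason a fresh factorization is preferable when the total rank of the change is large.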
Remark 2. It was pointed out by a referee that (32)-(33) is equivalent to the following symmetric system (see [35, Result 3, p. 94]):

    [ εI    A_F^T ] [ x_F ]   [ εy_F - c_F ]
    [ A_F    0    ] [ -λ  ] = [      b     ]    (36)

For sparse A, the Cholesky factor of the matrix in (36) can be much sparser than that of A_F A_F^T. In particular, this reformulation would allow for the treatment of dense columns in a natural way, without having to resort to Sherman-Morrison-Woodbury techniques.
3 A second-order proximal update

Let Q be defined by

    Q(y) = sup_{λ∈Λ} inf_{x∈X} L(λ, x) + (ε/2)||x - y||² = sup_{λ∈Λ} L_ε(λ; y).    (37)

Observe that the first step (7) of the first-order proximal algorithm is equivalent to evaluating Q(y_j). Under the assumptions of Theorem 1, it follows from [5, Thm. 2.1] that the function Q is differentiable and

    ∇Q(y) = ε(y - x(y)),    (38)

where x(y) is the minimizing x in (37) corresponding to the maximizing λ. Thus the update (8) corresponds to a step of length 1/ε along the negative gradient of Q:

    y_{j+1} = y_j - (1/ε) ∇Q(y_j).

Together, (7) and (8) are a first-order gradient algorithm for minimizing Q. In this section, we develop a second-order, finitely convergent algorithm. In the next section, we show that for LPs, the first-order algorithm can coincide with the second-order algorithm, and in this case, the first-order algorithm (7)-(8) is finitely convergent. In fact, we observe in numerical experiments that the first-order update generally performs better than the second-order update. The analysis given in Sections 3 and 4 linking the first-order algorithm to the finitely convergent second-order algorithm provides some insight into this phenomenon. We begin by establishing some connections between L_ε and Q.
Theorem 2. If L(·, x) is concave for each fixed x ∈ X and L(λ, ·) is convex for each fixed λ ∈ Λ, where X ⊂ R^n and Λ ⊂ R^m are convex, then Q is convex on R^n. If L has a saddle point (λ*, x*) ∈ Λ × X:

    L(λ, x*) ≤ L(λ*, x*) ≤ L(λ*, x) for all λ ∈ Λ, x ∈ X,    (39)

then (λ*, x*) is a saddle point of L_ε and Q(x*) = L(λ*) = L(λ*, x*). If y* is any minimizer of Q over X, then λ* maximizes L_ε(·; y*) over Λ.
Proof. For any fixed x ∈ X and y ∈ R^n, the function

    L(λ, x, y) = L(λ, x) + (ε/2)||x - y||²

is concave in λ by assumption. Similar to the analysis (19), but with convexity replaced by concavity, L_ε(·; y) is concave. We now show that for any fixed λ, L_ε(λ; ·) is convex. First observe that the function L(λ, x, y), for fixed λ, is convex with respect to the pair (x, y) because the Lagrangian L(λ, ·) is convex by assumption while the term (ε/2)||x - y||² is convex with respect to the pair (x, y). Given any x_i ∈ X and y_i ∈ R^n, i = 1, 2, and θ ∈ [0, 1], the convexity of L(λ, ·, ·) yields

    L_ε(λ; θy_1 + (1-θ)y_2) = min_{x∈X} L(λ, x, θy_1 + (1-θ)y_2)
        ≤ L(λ, θx_1 + (1-θ)x_2, θy_1 + (1-θ)y_2)
        ≤ θL(λ, x_1, y_1) + (1-θ)L(λ, x_2, y_2).

Minimizing over x_1 and x_2 ∈ X gives

    L_ε(λ; θy_1 + (1-θ)y_2) ≤ θL_ε(λ; y_1) + (1-θ)L_ε(λ; y_2).

Hence, L_ε(λ; ·) is convex on R^n. As in (19), Q is convex on R^n.
For any λ ∈ Λ, the first inequality in (39) and the definition of L_ε give

    L(λ*, x*) ≥ L(λ, x*) ≥ inf_{x∈X} L(λ, x) + (ε/2)||x - x*||² = L_ε(λ; x*).    (40)

From the second inequality in (39), we have, for any y ∈ R^n,

    L(λ*, x*) = inf_{x∈X} L(λ*, x) ≤ inf_{x∈X} L(λ*, x) + (ε/2)||x - y||² = L_ε(λ*; y).    (41)

Combining (40) and (41) gives

    L_ε(λ; x*) ≤ L_ε(λ*; x*) ≤ L_ε(λ*; y) for all λ ∈ Λ, y ∈ R^n.    (42)

Since Q is given by a maximum over λ in (37), we have, for any λ ∈ Λ and y ∈ R^n,

    Q(y) ≥ inf_{x∈X} L(λ, x) + (ε/2)||x - y||² ≥ L(λ).    (43)

Again, by the saddle point assumption (39),

    L(λ*) = max {L(λ) : λ ∈ Λ}.

Combining this with (43) gives

    Q(y) ≥ L(λ*) for all y ∈ R^n.    (44)

By the definition of Q, the new saddle point condition (42), and (44), we have

    Q(x*) = sup_{λ∈Λ} L_ε(λ; x*) ≤ inf_{y∈X} L_ε(λ*; y)
        = inf_{x,y∈X} L(λ*, x) + (ε/2)||x - y||² = L(λ*) ≤ Q(y).    (45)

Thus x* minimizes Q, λ* maximizes the dual function, and Q(x*) = L(λ*). If y = y* is an arbitrary minimizer of Q, then the right side of (45) is equal to Q(y*) = Q(x*) and all the inequalities are equalities. Consequently, λ* maximizes L_ε(·; y*). □
Algorithm 2 is a second-order active set method, based on Theorem 2, for minimizing Q in the case that X is the nonnegative cone (4). In the statement of Algorithm 2, we use the following notation: For any B ⊂ {1, 2, ..., n}, define

    Q_B(y) = max_{λ∈Λ} min_{x_B=0} L(λ, x) + (ε/2)||x - y||².    (46)

In (37) let λ(y) denote the maximizing λ and let x(y) denote the associated x that attains the minimum. Similarly, in (46) let λ(y, B) denote the maximizing λ and let x(y, B) denote the associated x that attains the minimum. If L(·, x) is strongly concave for each fixed x ≥ 0, and L(λ, ·) is convex for each fixed λ ∈ Λ, these maximizers and minimizers exist. We prove finite convergence of Algorithm 2 under the hypotheses of Theorem 1.
Theorem 3. Suppose the Lagrangian L(·, ·) is continuously differentiable, L(·, x) is uniformly, strongly concave for each fixed x ≥ 0, L(λ, ·) is convex for each fixed λ ∈ Λ, and the dual function is not identically equal to -∞. If L has a saddle point satisfying (39) and the points z_k described in Algorithm 2 all exist, then the iterates reach a minimizer of Q in a finite number of steps.
j = 0
y_0 = starting guess
while ∇Q(y_j) ≠ 0
    η_0 = y_j
    B_0 = {i ∈ [1, n] : x_i(y_j) = 0}
    for k = 0, 1, ...
        either z_k = arg min {Q_{B_k}(y) : y ∈ R^n} or
            Q_{B_k}(z_k) < Q_{B_k}(η_k) and x_i(z_k, B_k) < 0 for some i ∈ B_k^c
        η_{k+1} = η(α) where η(α) = η_k + α(z_k - η_k) and
            α = min{β ∈ [0, 1] : x_i(η(β), B_k) = 0 for some i ∈ B_k^c}
        if η_{k+1} = z_k break
        B_{k+1} = {i ∈ [1, n] : x_i(η_{k+1}, B_k) = 0}
    end
    j = j + 1
    (y_j)_i = max{0, (z_k)_i}, 1 ≤ i ≤ n
end

Algorithm 2: Second-order proximal update, X = {x ∈ R^n : x ≥ 0}

Proof. By Theorem 2, Q has a minimizer x*. We show that the iterates of Algorithm 2 descend Q, reaching a point y* where ∇Q vanishes in a finite number of steps. Since Q is convex by Theorem 2, y* minimizes Q. The proof has five steps:
Step 1. Continuity of x(y, B) and λ(y, B) as a function of y.

Define the function

    L_B(λ; y) = min {L(λ, x) + (ε/2)||x - y||² : x_B = 0}.    (47)

Since the extremand is strongly convex in x, it follows (see [21, p. 265]) that the x attaining the minimum in (47) is a continuous function of λ and y. As in (10), L_B(λ; y) is differentiable in λ and the derivative is a continuous function of λ and y. Due to the strong concavity assumption for L, L_B(λ; y) is strongly concave in λ, uniformly in y. Again by [21, p. 265], the maximizer λ(y, B) in (46) is a continuous function of y. Since the minimizer x in (47) is a continuous function of λ and y and since λ(y, B) is a continuous function of y, we conclude that both x(y, B) and λ(y, B) are continuous functions of y.

Let ȳ denote the current y_j = η_0, let λ̄ denote λ(ȳ), the maximizer in (37) associated with y = ȳ, and let x̄ denote x(ȳ), the associated x that attains the minimum in (37) corresponding to y = ȳ.

Step 2. λ̄ = λ(ȳ, B_0) and Q(ȳ) = Q_{B_0}(ȳ).

To paraphrase Step 2, we wish to show that the value of Q(ȳ), corresponding to the constraint x ≥ 0 in (37), coincides with the value of the function Q_{B_0}(ȳ) and the constraint x_{B_0} = 0 in (46), where

    B_0 = {i ∈ [1, n] : x_i(y_j) = 0}.

By (10),

    ∇_λ L_ε(λ̄; ȳ) = ∇_λ L(λ̄, x(ȳ)).    (48)

Hence, the optimality condition associated with λ̄ can be expressed

    ∇_λ L(λ̄, x̄)(λ - λ̄) ≤ 0 for all λ ∈ Λ.    (49)

By the first-order optimality conditions associated with the minimizer x̄ of the strongly convex extremand in (37), we have

    ∂L(λ̄, x̄)/∂x_i + ε(x̄_i - ȳ_i) = 0 for all i ∈ B_0^c,

since x̄_i > 0 for all i ∈ B_0^c. It follows that x = x̄ also attains the minimum in the following problem (since the first-order optimality conditions are satisfied):

    min {L(λ̄, x) + (ε/2)||x - ȳ||² : x_{B_0} = 0}.

By (49), λ = λ̄ satisfies the first-order optimality condition for (46) in the case B = B_0 and y = ȳ. Since the first-order condition is sufficient for optimality, λ(ȳ, B_0) = λ̄.
Step 3. For each k > 0, Q_{B_k}(η_k) = Q_{B_{k-1}}(η_k).

We show that λ(η_k, B_k) = λ(η_k, B_{k-1}) and x(η_k, B_k) = x(η_k, B_{k-1}), from which the identity Q_{B_k}(η_k) = Q_{B_{k-1}}(η_k) follows. Similar to (48), the derivative of the function (47) with respect to λ, evaluated at λ̂, can be expressed

    ∇_λ L_B(λ̂; y) = ∇_λ L(λ̂, x̂),    (50)

where x̂ achieves the minimum in (47) when λ = λ̂. Since λ̂ is a maximizer of L_B(·; y) over Λ, it follows from (50) that the first-order optimality condition is

    ∇_λ L(λ̂, x̂)(λ - λ̂) ≤ 0 for all λ ∈ Λ.    (51)

We consider the following special case of (51):

    λ̂ = λ(η_k, B_{k-1}),  x̂ = x(η_k, B_{k-1}),  and  y = η_k.    (52)

For B = B_{k-1} in (47), x_i is unconstrained for i ∈ B_{k-1}^c, while at the minimizer x̂ associated with λ = λ̂ and y = η_k, we have x̂_i > 0 for all i ∈ B_{k-1}^c \ B_k. Hence, by the optimality condition characterizing the minimizer of (47),

    ∂L(λ̂, x̂)/∂x_i + ε(x̂_i - (η_k)_i) = 0 for all i ∈ B_{k-1}^c = B_k^c ∪ (B_k \ B_{k-1}).

Thus x̂ satisfies the first-order optimality conditions for (47) at B = B_k and λ = λ̂. By (50),

    ∇_λ L_{B_k}(λ̂; η_k) = ∇_λ L(λ̂, x̂).

Combining this with (51), we conclude, in the special case (52), that λ̂ maximizes L_{B_k}(·; η_k) over Λ and x̂ attains the associated minimum in (47) for B = B_k and λ = λ̂.
Step 4. If ∇Q(y_j) ≠ 0, then Q_{B_0}(η_1) < Q(η_0).

In Step 2, we showed that x̄ achieves the minimum in both (37), for y = y_j = ȳ, and (46), for y = y_j = ȳ and B = B_0. Again, by [5, Thm. 2.1],

    ∇Q(y_j) = ε(y_j - x̄) = ∇Q_{B_0}(y_j).    (53)

Since ∇Q(y_j) ≠ 0, we conclude that ∇Q_{B_0}(y_j) ≠ 0. Hence, Q_{B_0}(z_0) < Q_{B_0}(η_0). Since x̄_i > 0 for all i ∈ B_0^c, it follows from the continuity result of Step 1 that x_i(y, B_0) > 0 for i ∈ B_0^c and y ∈ [η_0, z_0] near η_0. As a result, α is strictly positive, and η_1 ≠ η_0. The convexity of Q_{B_0} (see the analysis at the beginning of the proof of Theorem 2) and the condition Q_{B_0}(z_0) < Q_{B_0}(η_0) imply that Q_{B_0}(η_1) < Q_{B_0}(η_0) = Q(η_0), where the last equality is Step 2.
Step 5. Finite convergence.

Since B_{k+1} has at least one more element than B_k, it follows that the inner loop of Algorithm 2 must terminate. Let B_K denote the final set before termination. By the convexity of Q_{B_k}, Q_{B_k}(η_{k+1}) < Q_{B_k}(η_k) for each k. By Step 3, Q_{B_k}(η_{k+1}) = Q_{B_{k+1}}(η_{k+1}). By Step 4, Q_{B_0}(η_1) < Q(y_j). Combining these relations gives

    Q_{B_K}(z_K) < Q(y_j).

Since the constraints of Q are inequalities while the constraints of Q_{B_K} are equalities, we have

    Q(z_K) ≤ Q_{B_K}(z_K) < Q(y_j).

When we replace any negative components of z_K by 0, the value of Q in (37) can only decrease since x ≥ 0. Hence, Q(y_{j+1}) ≤ Q(z_K) < Q(y_j) when ∇Q(y_j) ≠ 0. Since there are a finite number of possible choices for the set B_K, convergence is achieved in a finite number of steps. □
4 Second-order = first-order for LP

We now examine Algorithm 2 in the case where the dual function is the regularized Lagrangian (6) for an LP. The central step in Algorithm 2 is the minimization of Q_B(y) over y ∈ R^n. A minimizer is obtained by setting the derivative of Q_B to zero:

    ∇Q_B(y) = ε(y - x) = 0,    (54)

where x achieves the minimum in (46). The minimizing x and the maximizing λ in (46) satisfy (32)-(33), where F is the complement of B. Combining (54) with (32)-(33), we see that y minimizes Q_B if and only if y_B = 0 and y_F satisfies the equation:

    M y_F = A_F^T (A_F A_F^T)^{-1} b + (1/ε)(M - I)c_F,    (55)

where M = A_F^T (A_F A_F^T)^{-1} A_F.

Although the matrix A_F A_F^T in (55) may not be invertible, we only apply the inverse to elements in the range of A_F A_F^T. The vector b lies in the range of A_F since the gradient (29) vanishes at a maximizer of (46). The columns of A_F trivially lie in the range of A_F A_F^T. The notation (·)^{-1} denotes the inverse operator; the inverse operator exists since the vector b and the columns of A_F lie in the range of the operator A_F A_F^T.

Based on (55), minimizing Q_B is equivalent to minimizing the following quadratic:

    (1/2) y_F^T M y_F - (A_F y_F)^T (A_F A_F^T)^{-1} b - (1/ε) y_F^T (M - I)c_F.    (56)

That is, when we set to zero the gradient of the quadratic (56), we get (55). Notice that p_F = (I - M)c_F is the projection of c_F into the null space of A_F. If p_F ≠ 0, then the minimum of Q_B is -∞ since p_F^T M p_F = 0 and A_F p_F = 0. Hence, when y_F moves along p_F, the first two terms in (56) are unchanged while the last term tends to -∞.
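The projection p_F and its defining property can be sketched as follows; the random data are illustrative, and a real implementation would reuse the Cholesky factor of A_F A_F^T rather than solving from scratch:

```python
import numpy as np

# p_F = (I - M) c_F with M = A_F^T (A_F A_F^T)^{-1} A_F: the projection of
# c_F into the null space of A_F, used to detect unboundedness of (56).
def null_space_projection(AF, cF):
    McF = AF.T @ np.linalg.solve(AF @ AF.T, AF @ cF)   # M c_F
    return cF - McF

rng = np.random.default_rng(1)
AF = rng.standard_normal((3, 6))
cF = rng.standard_normal(6)
pF = null_space_projection(AF, cF)

# A_F p_F = 0, so moving y_F along p_F leaves the solution of (32) unchanged.
assert np.allclose(AF @ pF, np.zeros(3))
```

When p_F = 0, c_F lies in the row space of A_F and the quadratic (56) has a finite minimizer.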
In our implementation of Algorithm 2, which follows, F = B^c stands for
the complement of the current set of bound indices.

1. For each k, we set

    (y_{k+1})_F = (y_k)_F + s p_F.

Movement of y_F in the direction p_F does not change the solution λ
in (32) since p_F lies in the null space of A_F. The step s is chosen so that
y_{k+1} is the first point in the search direction where some component of x_F
in (33) vanishes. If no component of x_F vanishes, then the LP cost is −∞.
Assuming the optimal cost is finite, the projection p_F will be zero for k large
enough; in this case, we proceed to the next step.
2. When p_F vanishes, the last term in (55) vanishes, and the quadratic
(56) is minimized by taking

    y_F = (y_{k+1})_F = (y_k)_F + A_F^T μ,    (57)

where μ = (A_F A_F^T)^{-1} (b − A_F (y_k)_F). It can be checked by substitution
that this choice for y_F yields a solution of (55). The change in the
solution λ to (32) corresponding to the change (y_{k+1})_F − (y_k)_F =
A_F^T μ in y_F is εμ. On the other hand, when y and λ change in this
way, x in (33) does not change (since the terms reflecting the change
in y and the change in λ cancel). Since the components of x_F do not
move, they remain strictly positive, and the subiteration of Algorithm
2 terminates.
To summarize, the implementation of Algorithm 2 has two phases: In the
first phase, y_F moves along p_F, the solution λ in (32) does not change, and
some components of x in (33) reach their bounds. In the second phase, we
solve an equation connected with the b term in (55). The solution λ of (32)
changes by εμ while the solution x of (33) does not change. There may be
several steps in phase one, while phase two terminates in one step.
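The phase-two step can be sketched as follows. The 2×3 instance below is made up for the example, and the 2×2 system for μ is solved with an explicit (Cramer) inverse for brevity; after the update, A_F y_F = b holds exactly in exact arithmetic.

```python
# Sketch of the phase-two update (57): mu = (A_F A_F^T)^{-1}(b - A_F y_F),
# then y_F <- y_F + A_F^T mu.  All data below is illustrative only.
A_F = [[1.0, 1.0, 0.0],
       [0.0, 1.0, 1.0]]
b   = [3.0, 5.0]
y_F = [0.0, 0.0, 0.0]                      # current proximal point

# residual r = b - A_F y_F
r = [b[i] - sum(A_F[i][j] * y_F[j] for j in range(3)) for i in range(2)]

# form A_F A_F^T and solve the 2x2 system by Cramer's rule
AAt = [[sum(A_F[i][k] * A_F[j][k] for k in range(3)) for j in range(2)]
       for i in range(2)]
det = AAt[0][0] * AAt[1][1] - AAt[0][1] * AAt[1][0]
mu = [( AAt[1][1] * r[0] - AAt[0][1] * r[1]) / det,
      (-AAt[1][0] * r[0] + AAt[0][0] * r[1]) / det]

# y_F <- y_F + A_F^T mu
y_F = [y_F[j] + sum(A_F[i][j] * mu[i] for i in range(2)) for j in range(3)]

# the updated y_F satisfies A_F y_F = b
for i in range(2):
    assert abs(sum(A_F[i][j] * y_F[j] for j in range(3)) - b[i]) < 1e-12
```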
In numerical experiments, we often observe that the number of phase one
steps decreases as we converge to the solution. Close to the solution, there
may be no phase one steps. We now observe that the first-order update is
essentially a phase two step. The phase two step of Algorithm 2 amounts to
solving the equation A_F y_F = b, where F is the current free set. In other
words, if (y_{k+1})_F in (57) is multiplied by A_F, we obtain b. The first-order
update (8) amounts to taking for y_{k+1} the solution x of (33). With some
rearrangement of (32)-(33), we obtain the relation A_F x_F = b. Hence, the
phase two second-order update and the first-order update solve the same
equation. The fact that near a solution, the first-order update can coincide
with the finitely convergent second-order update may explain the better than
expected convergence of the first-order update.
In general, the second-order update gives more rapid convergence than
the first-order update; however, during phase one of the second-order update,
there is one solve (when we compute p_F) associated with each variable that
reaches a bound. In contrast, with Algorithm 1, we often update many
variables after each solve (due to the line search). It appears that the second-order
method can only surpass the first-order method in efficiency when a
line search, analogous to the line search of Algorithm 1, is developed.
Remark 3. For a linear programming problem, the update for y given
in Algorithm 2 is a second-order update since it is obtained by minimizing
a quadratic approximation to the dual function. In contrast, the first-order
update (8) corresponds to a step of length 1/ε along ∇θ.
5 Equation dropping
In this section, we focus on a strategy which we call "equation dropping."
Although we will apply the strategy to linear programs, the algorithm we
develop is applicable to problems with linear constraints that contain a variable
x_j appearing in only one equation, and with the property that L(λ, x)
is a linear function of x_j. An important situation where this occurs is the
following: When an inequality is converted to an equality by introducing a
slack variable, the slack variable does not appear in the cost function and it
appears in precisely one equation. Hence, the Lagrangian L(λ, x) is a linear
function of the slack variable.
Given an optimal solution x of the LP, we say that equation i is inactive
if there exists a column j of A that is completely zero except for an element
a_{ij} for which x_j > 0. In this case, it follows from complementary slackness
that at a corresponding solution to the dual problem, λ_i a_{ij} = c_j. Thus, if
the inactive equations were known, we could hold the associated multipliers
λ_i = c_j/a_{ij} fixed during the solution of the dual problem, and solve an
equivalent reduced problem with the inactive rows removed.
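A minimal sketch of this observation, with invented data: scan for column singletons, and for each singleton column whose variable is positive at the (assumed) optimal x, fix the corresponding multiplier at c_j/a_{ij}. A dense matrix is used purely for clarity.

```python
# Illustrative sketch: identify inactive equations via column singletons
# and complementary slackness.  A, c, x are made-up example data.
A = [[2.0, 1.0, 0.0],
     [0.0, 0.0, 3.0]]
c = [4.0, 1.0, 6.0]
x = [1.0, 0.0, 2.0]    # a (hypothetical) optimal primal solution

m, n = len(A), len(A[0])
fixed = {}
for j in range(n):
    rows = [i for i in range(m) if A[i][j] != 0.0]
    if len(rows) == 1 and x[j] > 0.0:       # column singleton with x_j > 0
        i = rows[0]
        fixed[i] = c[j] / A[i][j]           # equation i is inactive
```

Here both equations turn out to be inactive, so both multipliers could be held fixed while a reduced dual problem is solved.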
Since we do not know the inactive equations when we start to solve the
dual problem, the solution algorithm must dynamically identify the inactive
equations. Our approach is roughly the following: Initially, we locate all
the column singletons and we compute associated multipliers λ_i = c_j/a_{ij}.
In the line search of Algorithm 1, any component λ_i which reaches the
value c_j/a_{ij} associated with a column singleton is held fixed, and equation
i is dropped from the problem. Thus in the inner loop of DASA, rows are
dropped and columns are freed. In the initialization step of DASA, where B_0
is defined, we also check whether any dropped rows should be restored to the
problem. Row i is restored if the value of the dual function can be increased
by a small change in the previously fixed value of λ_i. These steps are now
explained in more detail.
Let S denote the set of column indices j corresponding to column singletons.
That is, for each j ∈ S, there is precisely one i such that a_{ij} ≠ 0. Let
I(j) denote the row index associated with column singleton j; hence, I^{-1}(i)
is the set of column singletons in row i. If j, k ∈ I^{-1}(i) and c_j/a_{ij} = c_k/a_{ik},
then the variables x_j and x_k can be combined into a single variable
[1]. To simplify the exposition, we assume that the LP has been presolved
so that c_j/a_{ij} ≠ c_k/a_{ik} for all j, k ∈ I^{-1}(i).
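The index structures S, I, and I^{-1} can be built in a few lines. The dense 2×4 matrix below is illustrative only; an actual implementation would work with a sparse column structure.

```python
# Sketch of the index sets of this section: S (column singletons),
# I[j] = the row of singleton j, Iinv[i] = the singletons in row i.
# The matrix is invented example data.
A = [[1.0, 0.0, 2.0, 0.0],
     [0.0, 3.0, 0.0, 4.0]]

m, n = len(A), len(A[0])
S, I, Iinv = [], {}, {i: [] for i in range(m)}
for j in range(n):
    rows = [i for i in range(m) if A[i][j] != 0.0]
    if len(rows) == 1:          # column j has exactly one nonzero
        S.append(j)
        I[j] = rows[0]
        Iinv[rows[0]].append(j)
```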
When column singletons are present, we delete the regularization terms
in (6) associated with the set S:

    L(λ; y) = min_{x ∈ X} { L(λ, x) + (ε/2) Σ_{j ∉ S} (x_j − y_j)² }.

For an LP, due to the separable structure of the Lagrangian (6), we have

    L(λ; y) = λ^T b + Σ_{j=1}^{n} ℓ_j(λ),

where

    ℓ_j(λ) = min_{x_j ≥ 0} (c_j − λ_i a_{ij}) x_j,  i = I(j),    if j ∈ S,

    ℓ_j(λ) = min_{x_j ≥ 0} (c_j − Σ_{i=1}^{m} a_{ij} λ_i) x_j + (ε/2)(x_j − y_j)²,    otherwise.
If j ∈ S and i = I(j), then

    ℓ_j(λ) = 0     if a_{ij} λ_i ≤ c_j,
    ℓ_j(λ) = −∞    if a_{ij} λ_i > c_j.

Since the dual function is being maximized, we should never allow a_{ij} λ_i > c_j;
in the case that a_{ij} λ_i ≤ c_j, we have ℓ_j(λ) = 0. In other words, the terms in
the Lagrangian associated with column singletons can be ignored as long as
we satisfy the dual constraint a_{ij} λ_i ≤ c_j for i = I(j).
With these insights, the equation dropping version of LP DASA is organized
in the following way: Within the subiteration, we maintain two sets:
B_k, the indices of bound columns, and D_k, the indices of dropped equations.
In the line search where ν_{k+1} is computed, we enforce the constraint
a_{ij} λ_i ≤ c_j, i = I(j), in the following way: If a_{ij} λ_i reaches c_j at some point
along the line segment [ν_k, w_k], then we add i to the dropped equation set,
and we fix λ_i = c_j/a_{ij}. More precisely, in the line search of Algorithm 1, we
replace the segment [ν_k, w_k] by a projected segment:

    P[ν_k, w_k] = {Pω : ω ∈ [ν_k, w_k]},

where

    (Pλ)_i = c_j/a_{ij}   if a_{ij} λ_i > c_j and i = I(j),
    (Pλ)_i = λ_i          otherwise.
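Componentwise, the projection P can be sketched as below; the list of (i, a_ij, c_j) triples and the multiplier values are invented for the example.

```python
# Sketch of the projection P used in the line search: a multiplier of a
# singleton row is clipped at c_j / a_ij when a_ij * lambda_i overshoots c_j.
def project(lam, singletons):
    """singletons: list of (i, a_ij, c_j) triples with i = I(j)."""
    lam = list(lam)
    for i, a, cj in singletons:
        if a * lam[i] > cj:          # dual constraint a_ij * lam_i <= c_j violated
            lam[i] = cj / a          # fix lam_i at the bound; row i drops
    return lam

# example: row 0 clips from above (a_ij > 0), row 1 clips from below (a_ij < 0)
lam = project([5.0, -1.0], [(0, 2.0, 6.0), (1, -4.0, 2.0)])
```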
Once a row drops during the subiteration, it remains dropped for all the
subsequent subiterations. At the start of the next iteration, before the k
loop, we initialize both B_0 and D_0. In this initialization phase, we check
whether a fixed dual variable λ_i = c_j/a_{ij}, i = I(j), should be freed. We
free λ_i if the dual function value increases after a small change in λ_i. This
local behavior of the dual function is determined from the sign of the ith
component of the gradient (29). If a_{ij} > 0 and the ith component of the
gradient is < 0 after an infinitesimal decrease in the ith dual variable, then
a small decrease in λ_i preserves dual feasibility (a_{ij} λ_i ≤ c_j) and increases
the value of the dual function. If a_{ij} < 0 and the ith component of the
gradient is > 0 after an infinitesimal increase in the ith dual variable, then
a small increase in λ_i preserves dual feasibility and increases the value of the
dual function. Algorithm 3 provides a complete statement of the equation
dropping version of LP DASA. The superscript ± on the gradient term in the
definition of D_0 denotes the values of the gradient after either an infinitesimal
increase or decrease in the ith dual variable. Since the dual function is not
differentiable at λ_i = c_j/a_{ij} when j ∈ S and i = I(j), we need to consider λ_i
both slightly larger and slightly smaller than c_j/a_{ij}.
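The freeing test reduces to a sign check. In the sketch below, g_minus and g_plus are our names for the ith gradient component after an infinitesimal decrease or increase of λ_i; they stand in for quantities the implementation would obtain from the gradient (29).

```python
# Sketch of the freeing test for a dropped row i with singleton
# coefficient a_ij.  Names g_minus / g_plus are ours, for illustration.
def should_free(a_ij, g_minus, g_plus):
    if a_ij > 0.0:
        # feasible move: decrease lambda_i; dual increases if gradient < 0
        return g_minus < 0.0
    else:
        # a_ij < 0; feasible move: increase lambda_i; dual increases if gradient > 0
        return g_plus > 0.0

free_a = should_free( 2.0, -0.5, 0.0)   # a_ij > 0, dual increases on decrease
free_b = should_free(-3.0,  0.0, 0.4)   # a_ij < 0, dual increases on increase
```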
The proof (e.g., [18, 21, 23]) of finite convergence for Algorithm 1 also
applies to Algorithm 3. Finite convergence is due to the fact that certain sets
cannot repeat. For Algorithm 3, the entity that cannot repeat is the pair
(B_k, D_k) associated with the final dual subiteration.
l = 0
λ_0 = starting guess
while ∇L(λ_l) ≠ 0
    ν_0 = λ_l
    S̄ = {j ∈ [1, n] : (ν_0)_i = c_j/a_{ij}, i = I(j)}
    D_0 = {i ∈ [1, m] : i = I(j) for some j ∈ S̄, a_{ij} ∇L(ν_0)_i^± ≥ 0}
    B_0 = {j ∉ S : x_j(λ_l) = 0} ∪ {j ∈ S : I(j) ∉ D_0}
    for k = 0, 1, ...
        w_k = arg max {L_{B_k}(λ) : λ_{D_k} = (ν_k)_{D_k}}
        ν_{k+1} = arg max {L_{B_k}(λ) : λ ∈ P[ν_k, w_k]}
        B_{k+1} = {j ∈ B_k : x_j(ν_{k+1}, B_k) = 0}
        D_{k+1} = {i : (ν_{k+1})_i = c_j/a_{ij}, j ∈ I^{-1}(i)}
        if ν_{k+1} = w_k break
    end
    l = l + 1
    λ_l = w_k
end

Algorithm 3: DASA with equation dropping
The equation dropping strategy that we have developed can be implemented
recursively. That is, when an equation is removed from A, additional
column singletons may be generated, leading to new equations that
could drop. We did not implement this recursive procedure; however, many
of the test problems could be solved much faster using such a recursive implementation.
For example, in the Netlib test problems SCAGR7 and OSA,
every single equation would drop in a recursive implementation. With AGG2,
505 out of 516 rows drop; with SCTAP3, 1365 out of 1480 rows drop; and with
BNL2, 1358 out of 2265 rows drop. Problems might also be presolved to expose
additional column singletons. That is, a multiple of one equation could
be subtracted from another equation in order to create a column singleton.
During the solution process, an entire row of A could drop.
6 Numerical comparisons
In this section, we compare the performance of LP DASA to that of Simplex
and Barrier methods as implemented in the commercial code ILOG CPLEX
version 7.0 [3], focusing on problems from the Netlib LP test set. Since the
CPLEX Barrier code automatically applies a presolver to the input problem,
we use the CPLEX presolved problem as input to our code. All the codes,
both CPLEX and LP DASA, are written in C.
Both the first-order and second-order updates were implemented. Since
the first-order update was almost always more efficient than the second-order
update, the numerical experiments reported below are based on the first-order
update. Although Theorem 1 establishes convergence of the iteration
(7)-(8) for any choice of ε, it is observed numerically that for any fixed ε,
convergence can be very slow; by decreasing ε after each iteration of (7)-(8),
much faster convergence is achieved. The initial choice for ε and the factor
used to reduce the size of ε after each proximal iteration were based on the
number of rows m of A:

    initial ε = 2^{-6}, decay factor 1/16, if m < 100,
    initial ε = 2^{-3}, decay factor 1/8, if 100 ≤ m < 2500,
    initial ε = 1, decay factor 1/4, otherwise.
Qualitatively, we find that the initial ε should be larger for a large problem
than for a small problem, and ε should decrease more slowly for a large
problem than for a small problem. For any specific problem, ε can be fine-tuned
to achieve faster convergence. For example, the PDS test problems can
be solved more quickly using a smaller initial ε than the choices given above.
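The rule above, written as a function of the row count m (the thresholds and the readings 2^{-6} and 2^{-3} of the initial values are our transcription of the text):

```python
# Proximal parameter schedule, transcribed from the rule in the text.
def proximal_schedule(m):
    """Return (initial epsilon, decay factor) given the number of rows m."""
    if m < 100:
        return 2.0 ** -6, 1.0 / 16.0
    elif m < 2500:
        return 2.0 ** -3, 1.0 / 8.0
    else:
        return 1.0, 1.0 / 4.0
```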
Before starting the iteration (7)-(8), we reorder the rows of A, using the
approximate minimum degree algorithm COLAMD [6], in order to minimize fill during a
Cholesky factorization of AA^T. If exact numerical cancellation is ignored,
the fill-in associated with the matrix A_F A_F^T appearing in LP DASA is a
subset of the fill-in associated with AA^T. Hence, by minimizing the fill for
AA^T, we minimize the worst possible fill that could occur with A_F A_F^T for
any choice of F.
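The subset claim can be checked directly from the outer-product form A A^T = Σ_j a_j a_j^T: dropping columns can only remove terms from the sum, so the nonzero pattern of A_F A_F^T is contained in that of A A^T. A tiny sketch with an invented matrix:

```python
# Check: pattern(A_F A_F^T) is a subset of pattern(A A^T) for any column set F.
A = [[1.0, 0.0, 2.0],
     [0.0, 3.0, 4.0],
     [5.0, 0.0, 0.0]]

def outer_pattern(A, cols):
    """Nonzero pattern of sum over j in cols of a_j a_j^T (cancellation ignored)."""
    m = len(A)
    pat = set()
    for j in cols:
        nz = [i for i in range(m) if A[i][j] != 0.0]
        pat |= {(i, k) for i in nz for k in nz}
    return pat

full = outer_pattern(A, range(3))          # pattern of A A^T
for F in ([0], [1, 2], [0, 2]):
    assert outer_pattern(A, F) <= full     # subset for every column choice F
```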
To enhance numerical stability, we factorize the matrix A_F A_F^T + σI, where
σ = 2^{-44} and the columns of A are scaled to be unit vectors. The factorization
is done using a row-oriented sparse Cholesky factorization algorithm
[26]. We do not exploit supernodes in this code, and thus it runs about
7 times slower, in terms of floating-point operations per second, than the
supernodal code used in the CPLEX Barrier method. After a column
is freed or a row is dropped, we use the sparse modification techniques developed
in [7, 8, 9] to modify the factorization. The codes for modifying
a factorization after column updates take advantage of the speedup associated
with multiple-rank updates. As a result, a rank-16 update of a sparse matrix
achieves flop rates comparable to those of a BLAS dot product, and about
1/3 the speed of a BLAS matrix-matrix multiply [11], in the experiments
reported in [8].
Our starting guesses for an optimal dual multiplier and primal solution
are always λ = 0 and x = 0, respectively. Our convergence test is based on
the LP optimality conditions. The primal approximation x(λ) in (30) always
satisfies the bound constraints x ≥ 0. The column indices are expressed as
B ∪ F where F = B^c. For all j ∈ B, x_j(λ) = 0 and (c − A^T λ)_j ≥ 0. Hence,
the optimality conditions would be satisfied if the primal and dual linear
systems, Ax = b and A_F^T λ = c_F, were satisfied. Our convergence criterion
is expressed in terms of the relative residuals in the primal and dual systems:

    ‖b − Ax‖_∞ / (1 + ‖b‖_∞) + ‖c_F − A_F^T λ‖_∞ / (1 + ‖c‖_∞) ≤ 10^{-8}.

Here ‖·‖_∞ denotes the maximum absolute component of a vector. Since
the test problems are given in MPS format, for which the data often has 5-digit
accuracy, our 8-digit error tolerance for the equations is, in a sense, much
more accurate than the input data.
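In code, the stopping test reads as follows; the "1 +" in the denominators is our reading of the criterion, a common guard against tiny ‖b‖ or ‖c‖.

```python
# Sketch of the stopping test on the primal and dual relative residuals.
def converged(r_primal, r_dual, b_norm, c_norm, tol=1e-8):
    """r_primal = ||b - A x||_inf, r_dual = ||c_F - A_F^T lambda||_inf,
    b_norm = ||b||_inf, c_norm = ||c||_inf."""
    return r_primal / (1.0 + b_norm) + r_dual / (1.0 + c_norm) <= tol
```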
The test problems were solved on a single processor (200 MHz) of an
IBM SP cluster. In Tables 1 and 2 we compare the execution times in CPU
seconds for all problems in the Netlib LP test set that required at least 0.1
sec. The execution time is shown in bold if it is the fastest for that problem,
and the tables are sorted according to the fastest run time. For problems
OSA_60 and KEN_18, the CPLEX Barrier code generated an out-of-memory
message, and hence no times are given. In Tables 1 and 2, we see that for
the 58 test cases, LP DASA was fastest in 13 problems, the Barrier code
(with crossover) was fastest in 15 problems, and the Simplex code (by default,
dual Simplex) was fastest in 30 problems.
In the current version of the LP DASA code, the best results relative to the
other codes were obtained for the OSA and FIT problems, for which many
rows drop. The worst relative results were obtained with the PILOT problems,
where there was significant growth in the error in the Cholesky factorization
during downdates.
Tables 3 and 4 give statistics for the 3 methods on a sampling of the test
problems. The number of rows and columns given in Table 3 corresponds to
the presolved problems. In Table 3, we give the number of column updates
(col+), column downdates (col−), row updates (row+), and row downdates
(row−) employed by LP DASA. By a column update, we mean that we add
a column to the current A_F and update the factorization. By a row update,
we mean that we add a row to the current A_F and update the factorization.
In Table 4 we give the number of times that LP DASA solved the system
(32) or computed the Cholesky factorization of A_F A_F^T. For comparison, the
last 2 columns of Table 4 give the number of iterations of the CPLEX Barrier
code and the number of iterations of the Simplex code. Each iteration of the
Barrier code requires the Cholesky factorization of a matrix with the sparsity
pattern of AA^T.
The data of Tables 3 and 4 show that LP DASA typically uses fewer
Cholesky factorizations than the Barrier code and fewer solves than the Simplex
code. With LP DASA, the total number of updates and downdates
is often comparable to the number of rows in the matrix; some exceptions
are FIT2D, a matrix for which the number of rows is much smaller than the
number of columns, and MAROS_R7, for which the number of updates was
much smaller than the number of rows. The number of LP DASA column
updates is often much larger than the number of solves. For example, with
KEN_18, there are about 15 updates for each solve. There would have been
many more than 15 updates per solve; however, there were 44 cases where
there were so many updates that it was more efficient to refactor the matrix
directly.
In Figure 1 we study the convergence towards optimality of the three
methods. The horizontal axis is log10 of the relative error in the cost. The
vertical axis is the relative CPU time; that is, the CPU time divided by the
total CPU time to solve the problem for any of the methods. For LP DASA,
the relative error is computed for the primal approximation associated with
the dual iterate in (7). The convergence of LP DASA is initially similar to
that of the Barrier method. On the other hand, with LP DASA, there are
no conditioning problems in a neighborhood of the solution, and a crossover
routine is not needed. Full accuracy, on the order of the machine epsilon, can
be computed, while barrier methods typically are limited to accuracy on the
order of the square root of the machine epsilon. With the Simplex method,
2-digit accuracy is not achieved until the computer run is more than 75%
complete. With both Barrier and LP DASA, 2-digit accuracy is achieved
Figure 1: Accuracy of solution versus relative CPU time for PDS_10
when the run is 30% complete. In a similar study in [22], we saw that when
iterative methods are used to solve (32), we obtain 2-digit accuracy in a small
fraction of the total computing time.
The times reported for QAP8, QAP12, and QAP15 were obtained using an
iterative version of LP DASA described in [22], where (32) is solved using the
SSOR preconditioned conjugate gradient scheme of Björck and Elfving [4].
For these problems, A is relatively sparse while the Cholesky factorization of
AA^T is relatively dense. Hence, the time to factor AA^T grows in proportion
to m³, the number of rows in A cubed. For example, the ratio of the Barrier
solution times for QAP15 and QAP12 is 750.49/94.54 ≈ 7.94, roughly equal
to the row ratio cubed: (6330/3192)³ ≈ 7.80. Hence, the time to solve
nug30, a problem with 52,260 rows, by the Barrier method would be roughly
750.49 × (52260/6330)³ ≈ 422,000 secs, while the memory requirements would
be several gigabytes. For comparison, we have solved this problem to 8-digit
accuracy, using the latest version of our iterative code, in 126,000 secs with
only 12 megabytes of memory needed to store A. (In a different approach [31]
to this test problem, using a barrier method, some of the nonzeros in
the matrix are neglected initially, while more are taken into account as the
iterations converge. With this approach, the memory requirement is low
when the iterates are far from the solution, but much higher as the iterates
approach the solution.)
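The back-of-the-envelope cubic-scaling arithmetic above can be checked directly (the figures are taken from the text):

```python
# Verify the cubic-scaling estimates quoted in the text.
time_ratio = 750.49 / 94.54                     # Barrier: QAP15 vs. QAP12
row_ratio_cubed = (6330 / 3192) ** 3            # cube of the row ratio
nug30_estimate = 750.49 * (52260 / 6330) ** 3   # projected Barrier time, secs

assert abs(time_ratio - 7.94) < 0.01
assert abs(row_ratio_cubed - 7.80) < 0.01
assert 4.0e5 < nug30_estimate < 4.4e5           # roughly 422,000 secs
```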
In summary, LP DASA is able to compete with today's state-of-the-art
commercial software. The same LP DASA platform can be run using either
an iterative solver or a factorization-based solver to obtain solutions of
high relative accuracy. Equation dropping techniques significantly lower the
overall solution times. Recursive implementations of the equation dropping
techniques should further reduce solution times. Another approach for further
reducing solution times is to implement the algorithm in a multilevel
fashion. This multilevel approach is developed in [10]. The solution times
reported in the present paper were obtained by running the multilevel code
using a single level.
Problem       LP DASA   Simplex   Barrier
STAIR            0.63      0.10      0.25
FIT1D            0.13      0.15      0.22
SHIP08L          0.21      0.10      0.27
KEN_07           0.17      0.11      0.34
FIT1P            0.13      0.41      0.33
1: T A I [       0.23      0.13      0.23
DEGEN2           0.25      0.15      0.27
SHIP12L          0.34      0.16      0.41
SCSD8            0.48      0.56      0.17
CZPROB           0.18      0.20      0.39
sciv: 12         0.38      0.18      0.23
GROW15           0.28      0.54      0.26
BNL1             0.90      0.30      0.36
STOCFOR2         (i ,      0.30      0.44
SC(F:x 1.)       0.68      0.33      0.36
WOODW            0.63      0.37      0.72
MAROS            0.92      0.39      0.42
GROW22           0.40      1.58      0.41
BNL2             7.16      0.44      1.51
PILOT            2.96      0.47      0.57
25FV47           1.25      1.57      0.65
PEROLD           7.06      0.90      0.77
D6CUBE          21.17      0.77      2.02
CRE_C            1.91      0.79      1.30
CRE_A            2.48      0.81      1.51
FIT2P            0.82     21.58      2.30
OSA_07           0.99      1.81      2.03
PILOT_WE         8.13      3.37      0.99

Table 1: Numerical Comparisons, part 1
Problem
PILOTNOV
TRUSS
,:;:: 1
PDS_06
KEN_11
GREENBEA
GREENBEB
PILOT_JA
FIT2D
80BAU3B
QAP8
OSA_14
D2Q06C
STOCFOR3
PDS_10
KEN_13
OSA_30
PILOT
MAROS_R7
CRE_D
CRE_B
OSA_60
PILOT87
PDS_20
KEN_18
DFL001
QAP12
QAP15
LP DASA
9.68
6.04
3.32
14.17
2.20
3.04
11.73
25.65
1.88
8.40
2.79
2.50
3.14
17.08
19.47
35.69
19.24
7.36
34.82
3.72
37.46
44.30
20.00
151.40
378.57
199.72
356.63
260.83
750.06
Table 2: Numerical Comparisons, part 2
Simplex
1.26
49.08
2.10
1.48
1.50
5.15
3.76
1.79
3.23
1.98
2.09
8.61
11.47
10.47
10.20
6.06
6.45
45.50
13.80
12.22
7.76
13.17
228.52
97.14
42.38
44.98
89.71
1568.00
84606.00
Barrier
1.10
1.18
1.19
9.67
2.94
1.52
1.73
1.81
2.84
2.23
2.70
2.59
5.04
3.35
5.76
32.04
9.71
11.58
6.79
8.41
13.69
16.98
*.**
23.91
157.16
***
121.92
94.54
750.49
Problem            rows     cols    col+    col-   row+   row-
STAIR               241      269     240     216     72     70
GROW22              440      860     ''4      46     11     29
TRUSS              1000      7',    2773    1018      0      1
C[; I: 1:,1,:i 1   1015     2926    3637    1251    127    414
FIT2D                25    10363    1474    1650     11     14
.:(:.:::;          1406     1712    1288     548    349    456
MAROS_R7           2152     2301      40      65     74     83
OSA_60            10209   224125   4".''    3763   1361    220
PILOT87             1,,    44'II    4170    4379    578    442
PDS_20            10214      7'7    3149    4327     46     69
KEN_18          3'1),.6    89347  4','I;    2516      1     35
DFL001            :il 1     8441    2901    2163     39     63

Table 3: LP DASA statistics
                  LP DASA   LP DASA   Barrier    Simplex
Problem            solves     chols     chols    iterations
STAIR                 205         3        17          180
GROW22                 74         7        13         1598
TRUSS                 391        13        18         7287
C: I: : i: ; :ii      548        19        34         4136
FIT2D                 148         2        19         3')'
r[):(: l:[:,;,        282         6        19        14 2'
MAROS_R7               61         1        10         2742
OSA_60                132         1        **        ,"' 4
PILOT87              1724         9        27        10779
PDS_20                318        31        42       174' I
KEN_18               3410        44        **        51162
DFL001               1507        11        60        17363

Table 4: LP DASA solves and chols, Barrier chols, Simplex iterations
Acknowledgments
Initial work connected with the application of DASA to linear programming
appears in the thesis of Chun-Liang Shih [36]. Experimentation with a very
early version of the code was done by Erik Lundin, a student participating
in an exchange program between the Center for Applied Optimization at
the University of Florida and the Royal Institute of Technology (KTH) in
Sweden. Constructive comments by the referees led to a much improved
presentation.
References
[1] E. D. Andersen and K. D. Andersen, Presolving in linear programming, Mathematical Programming, 71 (1995), pp. 221-245.

[2] M. Bergounioux and K. Kunisch, Primal-dual strategy for state-constrained optimal control problems, Computational Optimization and Applications, 22 (2002), pp. 193-224.

[3] R. E. Bixby, Progress in linear programming, ORSA J. on Computing, 6 (1994), pp. 15-22.

[4] A. Björck and T. Elfving, Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations, BIT, 19 (1979), pp. 145-163.

[5] F. H. Clarke, Generalized gradients and applications, Transactions of the American Mathematical Society, 205 (1975), pp. 247-262.

[6] T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng, A column approximate minimum degree ordering algorithm, ACM Transactions on Mathematical Software (to appear; see also TR-00-005, Univ. of Florida, CISE Dept., www.cise.ufl.edu).

[7] T. A. Davis and W. W. Hager, Modifying a sparse Cholesky factorization, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606-627.

[8] T. A. Davis and W. W. Hager, Multiple rank modifications of a sparse Cholesky factorization, SIAM J. Matrix Anal. Appl., 22 (2001), pp. 997-1013.

[9] T. A. Davis and W. W. Hager, Row modifications of a sparse Cholesky factorization, October 21, 2003 (to appear in SIAM J. Matrix Anal. Appl.).

[10] T. A. Davis and W. W. Hager, Dual multilevel optimization, July 21, 2004.

[11] J. Dongarra, J. Du Croz, S. Hammarling, and I. Duff, A set of level 3 basic linear algebra subprograms, ACM Transactions on Mathematical Software, 16 (1990), pp. 1-17.

[12] I. Ekeland and R. Temam, Convex Analysis and Variational Problems, North-Holland, Amsterdam, 1976.

[13] P. E. Gill and W. Murray, A numerically stable form of the simplex algorithm, Linear Algebra and Its Applications, 7 (1973), pp. 99-138.

[14] C. D. Ha, A generalization of the proximal point algorithm, SIAM J. Control Optim., 28 (1990), pp. 503-512.

[15] W. W. Hager, Convex control and dual approximations, part I, Control and Cybernetics, 8 (1979), pp. 5-22.

[16] W. W. Hager, Inequalities and approximation, in Constructive Approaches to Mathematical Models, C. V. Coffman and G. J. Fix, eds., Academic Press, New York, 1979, pp. 189-202.

[17] W. W. Hager, Updating the inverse of a matrix, SIAM Review, 31 (1989), pp. 221-239.

[18] W. W. Hager, The dual active set algorithm, in Advances in Optimization and Parallel Computing, P. M. Pardalos, ed., North-Holland, Amsterdam, 1992, pp. 137-142.

[19] W. W. Hager, The LP dual active set algorithm, in High Performance Algorithms and Software in Nonlinear Optimization, R. De Leone, A. Murli, P. M. Pardalos, and G. Toraldo, eds., Kluwer, Dordrecht, 1998, pp. 243-254.

[20] W. W. Hager, Iterative methods for nearly singular linear systems, SIAM J. Sci. Comput., 22 (2000), pp. 747-766.

[21] W. W. Hager, The dual active set algorithm and its application to linear programming, Computational Optimization and Applications, 21 (2002), pp. 263-275.

[22] W. W. Hager, The dual active set algorithm and the iterative solution of linear programs, in Novel Approaches to Hard Discrete Optimization, P. M. Pardalos and H. Wolkowicz, eds., Fields Institute Communications, Vol. 37 (2003), pp. 95-107.

[23] W. W. Hager and D. Hearn, Application of the dual active set algorithm to quadratic network optimization, Computational Optimization and Applications, 1 (1993), pp. 349-373.

[24] W. W. Hager and G. Ianculescu, Dual approximations in optimal control, SIAM J. Control Optim., 22 (1984), pp. 423-465.

[25] W. W. Hager, Chun-Liang Shih, and Erik O. Lundin, Active set strategies and the LP dual active set algorithm, Department of Mathematics, University of Florida, Gainesville, FL, 1996.

[26] J. W. H. Liu, A generalized envelope method for sparse factorization by rows, ACM Transactions on Mathematical Software, 17 (1991), pp. 112-129.

[27] F. J. Luque, Asymptotic convergence analysis of the proximal point algorithm, SIAM J. Control Optim., 22 (1984), pp. 277-293.

[28] B. Martinet, Régularisation d'inéquations variationnelles par approximations successives, Revue Française Informatique et Recherche Opérationnelle, 4 (1970), pp. 154-159.

[29] B. Martinet, Détermination approchée d'un point fixe d'une application pseudo-contractante, Comptes Rendus des Séances de l'Académie des Sciences, Paris, 274 (1972), pp. 163-165.

[30] P.-Q. Pan, A revised dual projective pivot algorithm for linear programming, Computational Optimization and Applications, 29 (2004), to appear.

[31] M. G. C. Resende, K. G. Ramakrishnan, and Z. Drezner, Computing lower bounds for the quadratic assignment problem with an interior point algorithm for linear programming, Operations Research, 43 (1995), pp. 781-791.

[32] R. T. Rockafellar, Monotone operators and the proximal point algorithm, SIAM J. Control Optim., 14 (1976), pp. 877-898.

[33] R. T. Rockafellar, Augmented Lagrangians and applications of the proximal point algorithm in convex programming, Mathematics of Operations Research, 1 (1976), pp. 97-116.

[34] C. Roos, T. Terlaky, and J.-Ph. Vial, Theory and Algorithms for Linear Optimization: An Interior Point Approach, John Wiley and Sons, New York, 1997.

[35] M. A. Saunders, Cholesky-based methods for sparse least squares: the benefits of regularization, in Linear and Nonlinear Conjugate Gradient-Related Methods, L. Adams and J. L. Nazareth, eds., SIAM, Philadelphia, 1996, pp. 92-100.

[36] C.-L. Shih, Active set strategies in optimization, PhD dissertation, University of Florida, 1995.

[37] J. E. Spingarn, Submonotone mappings and the proximal point algorithm, Numerical Functional Analysis and Optimization, 4 (1982), pp. 123-150.

[38] S. Volkwein, Lagrange-SQP techniques for the control constrained optimal boundary control for Burgers equation, Computational Optimization and Applications, to appear.

[39] C. Zhu, Asymptotic convergence analysis of some inexact proximal point algorithms for minimization, SIAM J. Optim., 6 (1996), pp. 626-637.