Dual Multilevel Optimization*
Timothy A. Davist William W. Hagert
July 21, 2004
Abstract
We study the structure of dual optimization problems associated
with linear constraints, bounds on the variables, and separable cost.
We show how the dependency structure of the linear equations is re
lated to the separability of the dual cost function. As a result, tech
niques for ordering sparse matrices based on nested dissection or graph
partitioning can be used to decompose a dual optimization problem
into independent subproblems that could be solved in parallel. The
performance of a multilevel implementation of the Dual Active Set
Algorithm is compared with CPLEX Simplex and Barrier codes using
Netlib linear programming test problems.
Key words. Multilevel optimization, dual optimization, dual separabil
ity, dual active set algorithm, parallel algorithms
AMS(MOS) subject classifications. 90C05, 90C06, 65Y20
*This material is based upon work supported by the National Science Foundation under
Grant No. (i,'..1'7I
tdavis@cise.ufl.edu, http://www.cise.ufl.edu/~davis, PO Box 116120, Department
of Computer and Information Science and E,,_ iH in,1 University of Florida, Gainesville,
FL 326116120. Phone (352) 3921481. Fax (352) 3921220.
thageramath.ufl.edu, http://www.math.ufl.edu/l..... I PO Box 118105, Depart
ment of Mathematics, University of Florida, Gainesville, FL 326118105. Phone (352)
3920281. Fax (352) 3928357.
1 Introduction
We consider problems of the form
sup (A), (1)
A
where
(A) = F(A) + inf (f(x) TAx),
x>O
F(A) = (Ai),
i=1
n
f(x) = Zfj(xj).
j=1
Here Fi : P  IR, fj : [0, oo) IR, and A is an m by n matrix. If F,(Ai)
biAi, i = 1, 2, ..., m, for some b E ."', then (1) is the dual of the primal
problem
minf(x) subject to Ax = b, x > 0. (2)
We refer to L as the "dual function" and to (1) as the "dual problem,"
regardless of how the F, are chosen.
Due to the separable structure of f, the dual function can be expressed
n
(A)= F(A) + Y t(A), (3)
j=1
where
m
tj(A) = inf h(xj) xj, Aij (4)
xj>O i=1
Even though both f and F are separable, the dual cost function does not, in
general, separate since each of the functions tj can depend on the entire A
vector. In this paper, we examine how the dual problem can be decomposed
into relatively independent subproblems by exploiting the sparsity of A.
The basic step in the decomposition process is to write
(A) = (A) + L1(A,1, A) + L2(A2, A), (5)
L2(A2, A) A,(Ai,, )
Figure 1: The dual optimization tree for the decomposition (5)
where AX, A2, and A are disjoint subvectors of A; that is, the components
of the vectors AX, A2, and A are components of A and no two vectors have
a common component. When L can be decomposed in this way, we can
visualize the dual optimization problem using the tree shown in Figure 1.
The full dual function C(A) is associated with the root of the tree, while
1 and 2 are associated with each leaf of the tree. If A is fixed, then
the optimization of L1(A1, A) over A1 is independent of the optimization of
C2(A2, A) over A2.
Figure 1 suggests various approaches for exploiting the structure. For
example, if the optimization is done using dual coordinate ascent, we could
start at the leaves of the tree and optimize over A1 and A2 while holding A
fixed. The optimization over A1 and A2 can be done in parallel. Then we
can move to the root of the tree and optimize over A while holding A1 and
A2 fixed. After a new value of A is determined, we would drop to the leaves
and optimize over A1 and A2, with A fixed. Under suitable assumptions, the
dual coordinate ascent iteration converges to a stationary point of L (e.g.
[30, p. 228]). The decomposition (5) reveals that two of the dual coordinate
ascent iterates could be computed independently.
In this paper, we develop general multilevel techniques for decomposing
the dual problem into independent subproblems, with each subproblem as
sociated with a node in a tree. Multilevel techniques have proved to be very
effective for solving elliptic partial differential equations [4, 15], and for solv
ing NPhard graph partitioning problems [24, 25, 26, 27, 28]. Our multilevel
strategy is related to the arrowhead partitioning presented in [12]. However,
instead of working towards an arrowhead form, we utilize a more general re
cursive partitioning, closer in spirit to the nested dissection ordering [23, 25]
used for symmetric matrices, and to the discussion in [13] of blockstructured
matrices in linear programming and interior point solvers.
We present a version of the Dual Active Set Algorithm (DASA) [10, 16, 17,
18, 19, 20, 22] for maximizing L, which exploits a multilevel decomposition
and moves up and down the multilevel tree in a way loosely related to that
of multilevel methods for graph partitioning or multigrid methods for partial
differential equations. Although the decoupled problems could be solved
in parallel, we find that even for a single processor, it can be quicker to
solve some problems using a multilevel approach, as opposed to the single
level approach, because the subproblems are solved quickly, and there are
a relatively small number of iterations at the root node. Comparisons with
CPLEX Simplex and Barrier codes, using Netlib linear programming test
problems, are given in Section 7.
2 Level 1 partitioning
Given vectors x, y E P.", define
Y 0 if Xjyj = 0 for all j,
xy=
1 if Xjyj = 0 for some j.
If ai denotes the ith row of A, then ai 0 aj = 0 if and only if the variables
in equations i and j of Ax = b are different. In essence, equations i and j
are independent of each other.
Lemma 1. Let Ri, i = 1, 2,..., k, be disjoint subsets of {1,2,..., m}
and Ji fiT/7
7= (R1 u 2 u... u k)c,
where the superscript "c" denotes complement. If the "orIii;,il ,'.,I condi
;,,,," ap 0 aq = 0 holds for all p E ,R and q E Rj whenever 1 < i < j < k,
then can be expressed as
k
L(A)= L(A)+ Li(Ai,A), (6)
i=1
where A,, i = 1,..., k, and A are disjoint subvectors of A associated with
indices in ,R, i = 1, . k, and in R respectively.
Proof. Define the sets
C = {j [1,n] : aij forsome i E R}, l= 1,2,...,k 1, (7)
and
Ck (CU U ... U Ck1) (8)
Observe that
(a) Ci and Cj are disjoint for i = j: Clearly, Ck is disjoint from Cj for j < k
by the definition of Ck. If I E Ci n Cj for i < j < k, then there exists
p E Ri and q E Rj such that apt # 0 and aqg # 0. This implies that
ap a aq = 1, a contradiction.
(b) For any 1, aij = 0 for every i E (RI U R)" and j E Ci: If I < k and
j E C1, then by (7) there exists p E 7 such that apj 7 0. By the
orthogonality condition, aij = 0 whenever i E (7 U R)c. If I = k and
j E Ck, then j ( CJ for 1 < k. Since j ( C1, (7) implies that aij = 0 for
every i E C1.
By (a) and (8), we can group the terms of (3) in the following way:
(A) =F(A)+ (e (A))
l=1 jECl
If j E C1, then by (b), the only nonzero terms Aiaij in (4) correspond to
indices i E t. It follows that ej(A) is independent of p when p 1l. The
decomposition (6) is obtained by taking, for example,
L (A, A) F,(A,) + Z (A), (9)
T;" jECi
and
L(A) = F(Az). (10)
The dependency matrix D associated with A is defined by
dij = ai 0 aj.
0 z2
Figure 2: Structure of the dependency matrix of A in Lemma 1
Under the hypotheses of Lemma 1, if we permute the rows of A in the order
R1, 72, and R and shade the nonzero parts of the dependency matrix, we
obtain Figure 2. Observe that D is 0 in the region corresponding to (i, j)
with i EC 7 and j E R2 or i E R2 and j E Ri. If in addition the columns of
A are permuted in the order C1 followed by C2, then structure of permuted
A appears in Figure 3. By the definition of C, in (8), aij is zero for i EC 7
and j E C2 or i E z2 and j E C1.
As a specific illustration, let us consider the Netlib test problem i i:i s
In Figure 4 we use MATLAB's spy function to plot the nonzeros in the
dependency matrix; each nonzero in D is plotted as a block spot in an m by
m grid. Although this matrix, in its original form, does not have the structure
seen in Figure 2, it can be put in this form using graph partitioning codes
such as Cli, [24, 25] or '.1ii. [26, 27, 28]. These heuristic methods find a
small edge separator of the graph G whose adjacency matrix is D. An edge
separator is a subset of the edges of G which, when removed from G, cause it
to split into two (or more) disconnected components. Ideally, the separator
is a balanced one, where the two components are of equal size. Finding a
minimal balanced edge separator is an NPcomplete problem. When applied
to D, these codes partition the row and column indices into two sets Q1
and Q2, chosen to approximately minimize the number of nonzeros dij with
i E Q1 and j E Q2. The sets Q1 and Q2 are roughly the same size. We then
extract elements from Q1 or Q2 in order to obtain a set I with the property
that dij = 0 whenever i E Q1 \ 9 and j E Q2 \ 7. Finally, R, = Qi \ 9 for
0
C1
0 R3
Figure 3: The permuted A associated with Lemma 1
Figure 4: Dependency structure of Netlib's test problem i im: i
(a) (b)
Figure 5: Permuted dependency matrix for IiI
For illustration, we apply to the matrix i to generate the sets
600 600
matrix between the first nonzero in each row or column and the diagonal to
obtain Figure 5: Permu ted dependency matructure depicted i n Figure 2.
1 = 1 and 2. This step converts the edge separator found by Ci:, or .,.i.
into a3 Multilevel partitioning
For illustration, we apply Cli,, ,, to the matrix i i:i ss to generate the sets
R1, R2, and 9 described in Lemma 1. The reordered matrix is plotted in
Figure 5a. To better visualize the structure, we darken the portion of the
matrix between the first nonzero in each row or column and the diagonal to
obtain Figure 5b, which corresponds to the structure depicted in Figure 2.
3 Multilevel partitioning
In Lemma 1 we take the dual function and write it as a sum of the functions
i given in (9). Notice that i has the same structure as itself in (3); it is
the sum of a separable function of A (the F, terms in (9)) and terms ej(A),
j e Ci. For any j E Ci, the nonzeros aij in column of j of A correspond to
indices i E Ri or i E 7. Hence, we have
tj(A) = inf fj(xj) X Aaij xj aij (11)
Xj>o ? ,.,
Let us view A as constant. If we identity the first 2 terms in (11) with the
fj term in (4) and if we identity the Ri sum in (11) with the sum in (4), we
can apply Lemma 1 to i to further decompose the dual function.
We will carry out this recursion in the case k = 2. In the initial application
of Lemma 1, we have
(A) = (A) + (AX, A) + 2(A2, A),
where Ai denotes the components of A associated with the set Ri, and where
i and are given in (9)(10) in terms of the sets Ci in (8). Suppose that
Ri is partitioned further as:
Ri = Ril U Ri2 U i,
where the sets Ril and Ri2 satisfy the orthogonality condition of Lemma 1.
When Lemma 1 is applied to (9), in the case k = 2, we group the terms that
define i in the following way:
L(A,, A) = i(A\i) + L,(Ai, Ai) + Li2 (Ai2, Ai), (12)
where
1(Ail, Ai) = Fp(Ap) + t (A),
,, jEcil
Cil = {j E Ci : apj i 0 for some p E il }, C2 i \ Cil,
Li,(A,) = Fp(Ap).
In the recursion (12), some components of Ai form the vector Ail (corre
sponding to qil), other components go into the vector Ai2 (corresponding to
Ri2), while the remaining components (corresponding to Ri) are added to A
to form Ai. Thus Ail and Ai2 are subvectors of Ai and Ai is A augmented
with the remaining components of Ai. When Ai is fixed in (12), the mini
mization of il over Ail and the minimization of Li2 over Ai2 can be done
independently.
In Figure 3 we show the arrangement of nonzeros when A is initially
partitioned (and permuted) in accordance with Lemma 1. And in Figure
6 we show the further rearrangements of nonzeros due to the additional
partitioning of R1 and R2
If the sets of Lemma 1 were generated by graph partitioning, as we did
for i i: s in the previous section, then the dependency matrix of A used for
the initial partitioning would be replaced by the dependency matrix for Ai,
the submatrix of A associated with the intersection of the rows with indices
722
Figure 6: The matrix in Figure 3 after additional partitioning
Figure 7: Doubly permuted dependency matrix for i i: is
1 2 4
Figure 8: Tree associated with 2level decomposition (12)
in 7i and columns with indices in C,. The partitioning process writes the
initial Ri as a disjoint union of sets %ij, j = 1, 2, and Ri. The dependency
matrix of the permuted A appears in Figure 7a. Again, to better visualize
the structure of the ordered matrix, we darken the portion of the matrix
between the first nonzero in each row or column and the diagonal to obtain
Figure 7b.
With these insights, we now formally define a multilevel decomposition
of the dual problem. It is described by a tree T with r nodes labeled 1, 2,
..., r, which satisfy the following four conditions:
M1. Corresponding to each node i of T, there is an associated set Ri. If r
is the root of the tree, then R, = {1, 2,..., m}.
M2. For each node i of T, Ri is strictly contained in 7R(i), where 7r(i) is
the parent of i, the node directly above i in T.
.1:1 For each node i of T, Rj n Rk = 0 whenever j and k are distinct
children of i (that is, i = r(j) = 7r(k) and j # k).
.1 I. For each node i of T, a, 0 aq = 0 whenever p E Rj and q 7Rk where
j and k are distinct children of i.
In the tree of Figure 8 associated with a 2level decomposition, 7r(1) = 3, and
7r(3) = 7. By M2 the sets 7, grow in size as we move from the leaves up to
the root of the tree. At any level in the tree, the sets are disjoint according
to ".1: The orthogonality property .1 I leads to the decomposition of the
dual function as described in Lemma 1.
Using the sets Ri associated with each node in the multilevel decompo
sition, we construct the following additional sets and functions:
(a) the vector Ai formed from components of A associated with elements
of Ri,
(b) the complementary set Ri and the complementary vector Ai,
(c) the function Li.
We now give precise definitions for (b) and (c).
The complementary set Ri is obtained by removing from Ri the sets R,
of its children. In other words,
R,= Ri\ U Rc
cE7r 1 ()
In sparse matrix terminology, the complementary set Ri is the separator of
the children Rc, c E r'(i), in the sense that the equations (Ax b)p = 0
associated with p E Ri couple together the equations associated with each
of the children R.e
The complementary vector Ai is defined as follows: At the root, Ar is the
vector with components Ai, i E 9r. At any other node i 7 r, A, consists of the
complementary vector A,(i) of the parent augmented by those components of
A associated with the set Ri. Since the parent of the root 7r(r) is undefined,
we take A\(r) to be empty. An equivalent description of the complementary
vector Ai is the following: It consists of those components of A associated
with sets R, for each p on the path between i and r, the root of the tree.
Ni,. we write the decomposition rule (6) of Lemma 1 in a recursive fash
ion. At the root, r,(A) = L(A). For nodes beneath the root, the structure
of the recursion is gleaned from (12). The left side of the equation contains
a function i associated with node i of the tree. The right side contains
functions associated with each of the children of i. The left side involves
the variable Ai associated with node i, and the complementary variable A
associated with the parent of i (and the set I). The right side involves the
dual variables Aij of the children of i, and the complementary variable Ai of
the parent of the children. Hence, we have
Li(Ai, A,()) = Li(A) + 1 c(Ac A),
CE7_r (1)
Figure 9: Tree associated with 2level decomposition (12); (a) R23 = 0, (b)
723 = R13 0
= i(Ai) + ,((AC,,X/()), (13)
with the initialization Lr(A) = (X).
If is split into two parts as in (5), and if these parts are further sub
divided into 2 parts, the tree associated with the decomposition process is
binary. In the special case where the complementary set is vacuous, the mul
tilevel tree can be simplified. For example, if Ra = 0 in Figure 8, then the
decomposed Lagrangian is represented by the tree in Figure 9a. If in addition
96 = 0 in Figure 8, then the decomposed Lagrangian is represented by the
tree in Figure 9b. As an example, the dependency matrix of the Netlib test
problem KEN_07 shown in Figure 10 is associated with a tree that has 49
nodes beneath the root.
4 Dual dependencies
The dual functions i, i = 1,..., r, are not all independent of each other.
For the decomposition associated with Figure 8, 1, 2, L4, and 5 are all
independent of each other (assuming the complementary variables are fixed).
Also, 1 and 2 are independent of 6 since 1 and 2 are obtained by
decomposing 3, which is independent of 6. On the other hand, La is
not independent of 1 and 2, even when the complementary variables are
fixed, since they share common variables. In this section, we specify the
independent functions in the dual decomposition.
Given a subset AF of the nodes of T, the special subtree T7 is the tree
obtained by pruning from T all descendants of the nodes AN. For any tree
Figure 10: Dependency matrix for [0. [:': _07 in Netlib/lp
T, we let OT denote the leaves of the tree (that is, the childless nodes).
Lemma 2. Given a multilevel decomposition associated with a tree T
which satisfies (13) and a special subtree T7, we have
(A) (1)+ L(, A(1)). (14)
lETg\a9T IWSTM
Proof. Suppose that (14) is violated for some special subtree TA7 of
T. Let lo E TgNo, and let Po = r(lo) be its parent. Let TAr be the special
subtree of T obtained by pruning from T7A all the children r1 (Po). That is,
Ai = Ao U {Po} Fit li* (14) holds for TAr and we stop the pruning process,
or it is violated for T7A, in which case we choose 1li E 0A,, pi = r(lP),
and TA2 is the special subtree of T induced by i2 = Afi U {pj}. Since
the decomposition (14) holds for the special subtree {r}, which is contained
in any special subtree, this pruning process must terminate with a special
subtree TAk for which (14) is violated, while (14) holds for 7AT+1. Since T,+
is obtained by pruning the children of some node p from TAk, it follows from
(13) that
Lp (ApA7(p)) = p(A~p) + J (AcA(c)). (15)
cE7 1(p)
i j
Figure 11: k is first common ancestor of nodes i and j
Since (14) holds for TAk+l, we have
L(A) + (AA ). (16)
lETAk+1 \aTk+1 ETk+1
Since p E Ta7k+1, it is one of the terms in the last summation in (16). After
making the substitution (15) in (16), we obtain (14) in the case AN = Nk.
This contradicts the fact that (15) was violated for AN = Nk. Hence, (14)
holds for all special subtrees of T. 0
Lemma 3. If T' is a special subtree of T and i and j E TAr, i 4 j, then
R)inRj = 0. If Si is the set of indices of A associated with the complementary
variable A then Si n Rj = 0.
Proof. Consider the sequence of ancestors ''(i), p = 1, 2,... and 7i(j),
q = 1,2,.... Let k = '"(i) = 7r(j) be the first common ancestor of i
and j (see Figure 11). Since k is the first common ancestor, the children
cl = .;" (i) and c2 = .7' 1(j) are distinct. By .1:, ; c, n RC2 = 0. By M2,
Ri C R c and Rj C Re2. Hence, Zi n Rj = 0.
By the definition of the complementary set Rk, it is disjoint from the
sets R, of each of the children c E Tr1(k). That is, 7k n R, = 0 for each
ScE Tr(k). Applying this result to each node between k and the root r, we
conclude that Sk n cR = 0 for each c E 7rl(k). For each node on the path
from cl to i, the complementary set is contained in cR, due to M2. Hence,
Si C (R,, U Sk). The relations
Rcl n Rec2 0, = SkN ZC 0, C Z2, and S, C (Rc U Sk),
imply that 3, n Rj = 0. 0
By Lemmas 2 and 3, it follows that when TA is a special subtree of T,
maximizing the dual function in (14) over At for each 1 E Tgr, while holding
the other components of A fixed, decomposes into independent maximiza
tions:
max Ci(AX, X,())
5 An example
We give a concrete illustration of a multilevel decomposition using the dual
of a linear programming problem
mincTx subject to Ax = b, x > 0,
with
110000
001000
A 1 1 1 1 0 0 (17)
000011
101010
For the tree shown in Figure 12, it can be checked that conditions M1 .I I
hold. The first level of the tree corresponds to the decomposition
C(A) = b5A5 + 3(Ai, A2, A3, A5) + C4(A4, A5),
where the complementary variable is A5 = A5, and the first term b5A5 corre
sponds to 45(A5). The function 3 and 4 are given as follows:
cl > A1 + A3 + A5
blA1 + b2A2 + baa3 if 2 1 + A3
La(A1, 2 A3, A5) C 3 > A2 + A3 + A5
C4 > A3
oo otherwise,
Ra = {1,2,3}
W/
71 = {1}
Z5 = {1,2,3,4,5}
R4 = {4}
n2 = {2}
Figure 12: Multilevel decomposition associated with (17)
f i .f cs > A4 + A5
4 (A4, A5) 44 if C6 > A4
oo otherwise.
La is independent of 4 when A5 is fixed. At nodes 3 and 4, we have
As = (A1, A2, A) and A4 = (A4). By the structure of 13, it can be further
decomposed as:
13(A) = b3A3 + 11(A1, A3, A5) + 12(A2, A3, A5),
where
L1(Ai, A3, A5)
2 (, A3, A5)
blA1 if c{ > Ai + A3 + A5
1 C2 > A1 + A3
oo otherwise,
b2A2 if c3 > A2 + A3 + A5
C4 > A3
oo otherwise.
The complementary variable at node 3 is As = (As, As), the complementary
variable at the parent of 3 augmented by the variable As associated with
3R = {3}. L1 and 12 are independent when A3 is fixed. The nonzero
structure of A associated with the tree of Figure 12 is shaded in Figure 13.
1 000 0 0 0
72 0 0 0 0 R
0 0 R5
R4 0 0 0 0
Figure 13: Nonzero structure of A corresponding to multilevel partitioning
tree in Figure 12
6 Multilevel DASA
It was pointed out in Section 1 that a multilevel decomposition could be
used in various algorithms for solving the dual problem; a specific illustra
tion was dual coordinate ascent. Here we focus on a more efficient use of a
multilevel decomposition based on the Dual Active Set Algorithm (DASA).
By [18, Thm. 1], the DASA solves (1) in a finite number of iterations when
f is strongly convex and F is strongly concave. Given a bound set B C
{1, 2,..., n}, let XB be the subvector of x consisting of those components xi
associated with i E B. We define the functions
CB+(A) = min {(A,x) : x P.", XB > 0}, (18)
and
0Bo(A) = min {(A, x) : x E P",XB = 0}. (19)
where
(A, x) = F(A)+ f(x) ATAx.
The components of x in the minimization problems (18) and (19) correspond
ing to indices in the complement of B are unconstrained. Assuming the fj
are strongly convex, there exists a unique minimizer x(A, B) in (18) for each
choice of A and B. Let x(A) denote x(A, B) in the case B = {1, 2,..., n}.
The statement of the DASA in Algorithm 1 has an outer loop indexed by
1, and an inner loop indexed by k. In the inner loop, indices are deleted from
Bk to obtain Bk+l; that is, Bk+1 is a subset of Bk consisting of those indices
for which xi(vk+1, Bk) = 0. When the inner loop terminates, we reinitialize
Bo by solving the problem
min (Aj, x) (20)
x>O
1=0
Ao = starting guess
while V(AI) 0
vo =A
Bo = {j n] : x(AX ) = 0}
for k = 0,1,...
wk = arg max {(Bo(A) : A P '}
,k+ = arg max { B(A) : A [ik, wk}
Bk+1 j E Bk: xj(V/k+, Bk)= 0}
if v~k+1 Wk break
end
l= +1
A, = Wk
end
Algorithm 1: Dual Active Set Algorithm
to obtain x(AI). Since the optimization over x in (20) decomposes into
independent optimization over xj as in (4), it follows that the Bo set at the
start of the iteration includes some indices that were missing from the final
Bk associated with the inner loop in the previous iteration. Hence, the dual
initialization step adds indices to the current bound set, while the formula
for Bk+1 in the subiteration deletes bound indices. The proof [18, Thm. 1]
that DASA solves (1) in a finite number of iterations is based on the fact
that the subiteration strictly increases the value of the dual function; as a
result, the final set Bk generated in the subiteration cannot repeat. Since the
sets Bk are chosen from a finite set, convergence occurs in a finite number of
steps.
Algorithm 2 is the multilevel version of DASA. Here we replace the maxi
mization over A in the computation of wk in Algorithm 1 by a maximization
over the leaves of a special subtree. As seen after Lemmas 2 and 3, maxi
mization over the leaves of a special subtree decomposes into the independent
maximization of local dual functions, which could be done in parallel. These
local maximizations, the eloop of Algorithm 2, all increase the value of the
dual function. The proof of finite convergence is the same given previously
for the DASA (e.g. [16, 18, 20]). That is, in the multilevel setting, we work
1=0
Ao = starting guess
while V(A1) : 0
v0o = A,, Ao = 0
Bo= {j 1, n] x:(A) = 0}
for k = 0,1,...
k+i1 = Vk
for each e E OCRk
we = arg max{Bo(A) : Ai = ki, for all i E e'
I = arg max {g( (A) A [1k,Wel}
(vk+i)i = vU for all i E Re
end
Bk+1 = {j e Bk: xj(vk+i, Bk) = 0}
Ak+1 = A/k U {p E 7Ark :
(Vk+l)i = Wki for all i E U{R, : i(c) = r(p)} }
if T7k+1 = {r} and Vk+l = r break
end
= 1+1
Al = fWk
end
Algorithm 2. M1ultilccvl Dual Active Set Algorithm
up to the root of the tree T increasing the value of the dual function, and
by M1, the set R, contains all the indices of A. Hence, in the multilevel
setting, the final dual subiterate again takes us to a maximizer of ,,;, for
some set Bk, which can never repeat without contradicting the strict ascent
of the dual iterates.
In Algorithms 1 and 2, we maximize Bgo in each subiteration. As an al
ternative, we could approximately maximize Bo in each subiteration. Under
suitable assumptions (see [19]), convergence is achieved in the limit, as the
number of iterations tends to infinity.
7 Numerical comparisons
In this section, we apply both single level and multilevel versions of DASA
to Netlib linear programming test problems and compare the performance
Rows Columns Nonzeros
PDS_02 877 3087 7546
PDS_06 2972 1 :2u 42304
PDS_10 4725 33270 76307
PDS_20 10214 81224 184176
KEN_07 887 2033 4354
KEN_11 5511 11984 26538
KEN_13 10957 24813 ,. 2:s
KEN_18 1' d., 89 I's_ 205019
I in s 1000 8806 27836
GREENBEA 1015 3222 23144
GREENBEB 1015 3213 23044
D2Q06C 1 , 5399 31342
CYCLE 912 2147 13172
Table 1: Description of the test problems
to that of Simplex and Barrier methods as implemented in the commercial
code ILOG CPLEX version 7.0 [2]. The test problems were solved on a
single processor (200 Mhz) in an 11B.. SP cluster. Since the CPLEX Barrier
code automatically applies a presolver [1] to the input problem, we use the
CPLEX resolved problem as input to the codes. There were 13 Netlib test
problems that seemed wellsuited for a multilevel decomposition in the sense
that the size of the initial 7 was at most 10% of the number of rows in the
matrix. A description of the 13 test problems appears in Table 1. These
statistics are for the resolved problems used in the numerical experiments,
not for the original problems on the Netlib test site.
The multilevel partitioning was done in the following way: Problems
KEN_07, KEN_11, KEN_13, KEN_18, PDS_02, PDS_06, PDS_10, PDS_20 have
the property that the nonzeros of the dependency matrix for A fall in a
series of diagonal blocks and along the outside border. For example, KEN_07
in Figure 10 has 7 x 7 = 49 diagonal blocks. The PDS problems are similar
except that there are 11 diagonal blocks as seen in Figure 14 for PDS_02. For
this set of 8 wellstructured problems, we specified that there were two levels,
with the leaf nodes corresponding to the diagonal blocks of the dependency
matrix. The remaining 5 problems, i l:1 ss, D2Q06C, GREENBEA, GREENBEB,
and CYCLE, were partitioned using the graph partitioning approach described
Problem
Figure 14: Nonzero structure of the dependency matrix for PDS02
in Section 2. We continued to partition into R1 U R2 U 7 as long as the
number of elements in R were at most 10% of the total number of elements
being partitioned. The set 9 was evaluated in meshpart [14], a MATLAB
toolbox, which finds an edge separator via CI., and then converts the edge
separator to a vertex separator via maximum matching (MATLAB's dmperm).
The entire partitioning process was carried out through a MATLAB interface
that generated the partitioning tree and an associated reordering of the rows.
As an example, the problem D2Q06C (Figure 15) was partitioned into a 32
node tree depicted in Figure 16. The CPU times that we report do not
include the time to partition the matrix.
For LP DASA, developed in [10], we take
f(x)= yK,
2
where e > 0 is a regularization parameter and y is a shift. Once we achieve
the maximum in the dual problem, the regularization parameter and shift
are updated. The details of their adjustments are given in [10].
In coding the LP DASA multilevel algorithm, we need to factor a matrix
of the form AFA' where F is the complement of the set Bk in Algorithm
1, and the subscript "F" denotes the submatrix of A associated with col
umn indices in F. This factorization is done using a roworiented sparse
CllIIkiy factorization algorithm [5, 29]. We do not exploit supernodes in
this code, and thus it runs about 7 times slower, in terms of floatingpoint
operations per second, than the supernodalbased code used in the CPLEX
Figure 15: >Nui,/r.i, structure of AAs for D2Q06C
a
o 9
o
0o5
o4
o3
o 1
ol
o o 2 o 3 o 0 5 06 0 7 o o8
neg 5
Figure 16: Partitioning tree associated with D2Q06C
barrier method. Fortunately, we refactor the matrix infrequently, and after
a column is freed or a row is dropped, we use the sparse modification tech
niques developed in [7, 8, 9] to modify the factorization. Since the ClI1ixky
factorization is computed rowbyrow, it can be done in steps, in the multi
level setting. The submatrix associated with a subtree (nodes 1, 2 and 3 in
Figure 12, for example) can be factored without having to compute the en
tire factorization. New rows corresponding to separators can be added to the
factorization without touching the previously computed partial factorization.
The codes for modifying a factorization after column or row updates, take
advantage of speedup associated with multiple rank updates. As a result, a
rank 16 update of a sparse matrix achieves flop rates comparable to those
of a BLAS dot product, and about 1/3 the speed of a BLAS matrixmatrix
multiply [11], in the experiments given in [8].
In coding LP DASA, we use a new version of COLAMD [6], which we
call constrained COLAMD, to reorder the rows of A to minimize the fill as
sociated with the CIi, ,1ky factorization of AAT. We constrain the ordering
so that rows associated with a set ,R for a leaf of the partitioning tree are
constrained to remain in 7, the sets 7, are chosen to satisfy the orthog
onality condition of Lemma 1; to preserve this orthogonality property, we
can only exchange rows within 7,. Our constrained COLAMD, developed
in collaboration with S. Rajamanickam, is related to the scheme developed
in [3] for a sparse LU factorization.
Our starting guesses for an optimal dual multiplier and primal solution
are always A = 0 and x = 0 respectively. Our convergence test is based
on the LP optimality conditions. The primal approximation x(A) always
satisfies the bound constraints x > 0. The column indices are expressed as
B U F where F = BC. For all j E B, xj(A) = 0 and (c ATA)j > 0. Hence,
the optimality conditions would be satisfied if the primal and dual linear
systems, Ax = b and AJA = CF, were satisfied. Our convergence criterion
is expressed in terms of the relative residuals in the primal and dual systems:
11),b1, A, A l. < 108
 I" X II, I ll.,
Here  I, denotes the maximum absolute component of a vector. Since
the test problems are given in MPS format, for which data often has 5 digit
accuracy, our 8 digit error tolerance for the equation is, in a sense, much
more accurate than the input data.
One Level Multilevel Simplex
PDS_02 0.33 0.33 0.05 0.92
PDS_06 14.14 7.81 1.48 9.67
PDS_10 35.68 27.16 6.06 32.04
PDS_20 376.87 238.68 42.38 157.16
KEN_07 0.17 0.12 0.11 0.34
KEN_11 2.20 1.53 1.50 2.94
KEN_13 19.14 10.25 6.45 9.71
KEN_18 199.39 94.83 44.98 *.**
I i: ss 6.08 5.01 I'N I I 1.18
GREENBEA 11.63 7.73 3.76 1.73
GREENBEB 3.01 2.58 2.98 5.15
D2Q06C 17.11 9.92 10.47 3.35
CYCLE 2.56 2.34 0.09 0.83
Table 2: LP DASA (single and multilevel), CPLEX (Simplex and Barrier)
The CPU times on the single processor 110. 1 computer are given in Table
2 for single and multilevel versions of LP DASA and for CPLEX Barrier and
Simplex codes. For problem KEN_18, the Barrier code generated an outof
memory message, and hence, no time is given. Notice that even for a single
processor, the multilevel implementation was faster than the single level im
plementation. The small subproblems in the multilevel approach were solved
quickly and the overall flop count was much smaller than that of a single
level implementation. Of course in a parallel computing environment where
the 324 independent subproblems associated with KEN_18 can be assigned
to separate computers, significant speedup is possible.
In comparing multilevel LP DASA to the CPLEX Simplex code, Simplex
was faster in 9 out of the 13 problems; in comparing multilevel LP DASA to
the CPLEX Barrier code, LP DASA was faster in 8 out of the 13 problems;
in comparing CPLEX Simplex to CPLEX Barrier, Simplex was faster in 9
out of the 13 problems.
Problem
Barrier
8 Summary
Sparse optimization problems with separable cost and with linear constraints
and bounds can be decomposed into a multilevel structure by applying codes
such as CIn, or '. 1 i to the dependency matrix. The multilevel structure
is described by a tree in which each node i is associated with a function
Li. Those i associated with the leaves of a special subtree can be optimized
independently, when the complementary variables are fixed. In an implemen
tation of the Dual Active Set Algorithm, the computation proceeds from the
leaves of the tree up to the root, optimizing over the leaves of special subtrees
along the way. Even on a single processor, the multilevel implementation was
faster than the single level implementation. Performance comparable to that
of CPLEX was achieved for the test set of Table 1.
References
[1] E. D. AND[I:1: [:': AND K. D. AND[:i:s1.[.:, Pr( ,/. I;' in linear pro
gr,,,,,i;,,, Mathematical Programming, 71 (1995), pp. 221245.
[2] R. .E. BIXBY, Progress in linear progri,,,,,;,,, ORSA J. on Computing,
6 (1994), pp. 1522.
[3] I. BRAINMAN AND S. TOLEDO, Nesteddissection orderings for sparse
LU with partial /,. 1./,,/ SIAM J. Matrix Anal. Appl., 23 (2002)
pp. !'r 1I)12.
[4] J. H. BRAMBLE, f,11ll, I;i; methods, John Wiley, New York, NY, 1993.
[5] T. A. DAVIS, Al, .,l,,ni 8xx: a concise sparse ('i g /,i fac
torization i,,.pI,/I submitted to AC'.I Trans. Math. Software.
http://www.cise.ufl.edu/~davis.
[6] T. A. DAVIS, J. R. G [[.'ii[:, S. I. LARIMORE, AND E. G. NG, A
column approximate minimum Ji, I, order;,o, ;(.,,ll,,, AC'.I Transac
tions on Mathematical Software, (to appear, see also TR00005, Univ. of
Florida, CISE Dept., www.cise.ufl.edu).
[7] T. A. DAVIS AND W. W. HAGER, .1, 1;f/,;,'/ a sparse C'lil/I 1y fac
torization, SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606627.
[8] T. A. DAVIS AND W. W. HAGER, Multiple rank rn fi!flt.i,. of a
sparse Cl,,, .1 J factorization, SIAM J. Matrix Anal. Appl., 22 (2001),
pp. 9971013.
[9] T. A. DAVIS AND W. W. HAGER, Row 0mdfilt7i^r of a sparse
Cholesky factorization, October 21, 2003 (to appear in SIAM J. Ma
trix Anal. Appl.).
[10] T. A. DAVIS AND W. W. HAGER, A Sparse Proximal Implementation
of the LP Dual Active Set Al,.;i li,, July 21, 2004.
[11] J. DONGARRA, J. DU CROZ, S. HAMMARLING AND I. DUFF, A Set
of Level 3 Basic Linear A,1 I,, S;ldr;r,in; AC(.I Transactions on
Mathematical Software, 16 (1990) 117.
[12] M. C. Fl:I;:I:[ AND J. D. HORN, Po,';i';I.. mathematical prriqrm i
for parallel solution, Math. Prog., 80 (1' '), pp. 3562.
[13] J. GONDZIO AND R. '$ il[,.[viAN, Parallel interiorpoint solver for
structured linear programs, Math. Prog., 96 (2003), pp. 561584.
[14] J. R. GCiril:iar, G. L. MILLER, AND S.H. TENG, Geometric mesh
partitioning: Implementation and experiments. SIAM J. Scientific Com
puting 19 (1 'i'i,), pp. 20912110.
[15] W. II Ha'i,. rli i s .iJi,,,i .i methods and applications, SpringerVerlag,
Berlin, 1''
[16] W. W. HAGER, The dual active set ul',,, ,;i,, Advances in Optimiza
tion and Parallel Computing, P.M. Pardalos, ed., North Holland, Ams
terdam, (1992), pp. 137142.
[17] W. W. HAGER, The LP dual active set ui,,,llin,, High Performance
Algorithms and Software in Nonlinear Optimization, R. De Leone,
A. Murli, P. M. Pardalos, and G. Toraldo, eds., Kluwer, Dordrecht,
l'I' pp. 243254.
[18] W. W. HAGER, The dual active set uli,,,i ,;1i and its application to
linear program ,,,,, Computational Optimization and Applications, 21
(2002), pp. 263275.
[19] W. W. HAGER, The dual active set ,1,,, .I,,, and the iterative solution
of linear proqr.,i ii in Novel Approaches to Hard Discrete Optimization,
P. M. Pardalos and H. Wolkowicz, Eds., Fields Institute Communica
tions, Vol. 37 (2003), 95107. Mathematical Programming, submitted.
[20] W. W. HAGER AND D. HEARN, Application of the dual active set (,l, ,
rithm to quadratic network optimization, Computational Optimization
and Applications, 1 (1993), pp. 3 11 373.
[21] W. W. HAGER AND G. IANCI .i[:CU, Dual approximations in optimal
control, SIAM J. Control Optim., 22 (1984), pp. 423 It,'.
[22] W. W. HAGER, CHUNLIANG SHIH, AND ERIK 0. LUNDIN, Active set
strategies and the LP dual active set (l,,,. i',,, Department of Mathe
matics, University of Florida, Gainesville, Florida, 1996.
[23] M. T. HEATH AND P. RAGHAVAN, A Cartesian parallel nested dissec
tion (/i. /i,,I,,,. SIAM J. Matrix Anal. Appl., 16 (1995), pp. 235253.
[24] B. II *:.: 'm:',. )N AND R. LELAND, A multilevel (,/,,, ,;.l for parti
fi /. /,, ii,,/,11s Proc. Supercomputing '95, ACM.1, 1995.
[25] B. III: *:i):'Ir.kON AND E. ROTHBERG, Impro, .,I' the runtime and
quality of nested dissection order',,, SIAM J. Sci. Comput., 20 (1999),
4 1' I .
[26] G. KARYPIS AND V. KUMAR, A fast and ii., quality multilevel scheme
for ipi, i;;.,.,,,',; ni, ,, r,1 i qrph SIAM J. Sci. Stat. Comput., 20 (1'"'I ),
359392.
[27] G. KARYPIS AND V. KUMAR, Multilevel kway par;li,. .', scheme for
,,,i,,.i,, 7ri~1h. J. Parallel Distrib. Comput., 48 (1999), 96129.
[28] G. KARYPIS AND V. KUMAR, Parallel multilevel kway par;i;.l,.,,!i,
scheme for ,,, .,.al,, 7qriaih SIAM Review, 41 (1999), 278300.
[29] J. W. H. LIu, A ,i realized envelope method for sparse factorization by
rows, AC'.I Transactions on Mathematical Software, 17 (1991) 112129.
[30] D. G. LUENBERGER, Linear and Nonlinear Progrniiii,,,,, Addison
Wesley, Reading, MA, 1984.
