MULTIPLE-RANK MODIFICATIONS OF A SPARSE CHOLESKY FACTORIZATION*
TIMOTHY A. DAVIS† AND WILLIAM W. HAGER‡
Abstract. Given a sparse symmetric positive definite matrix AA^T and an associated sparse Cholesky factorization LDL^T or LL^T, we develop sparse techniques for updating the factorization after either adding a collection of columns to A or deleting a collection of columns from A. Our techniques are based on an analysis and manipulation of the underlying graph structure, using the framework developed in an earlier paper on rank-1 modifications [SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606-627]. Computationally, the multiple-rank update has better memory traffic and executes much faster than an equivalent series of rank-1 updates, since the multiple-rank update makes one pass through L computing the new entries, while a series of rank-1 updates requires multiple passes through L.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates

AMS subject classifications. 65F05, 65F50, 65-04
1. Introduction. This paper presents a method for evaluating a multiple-rank update or downdate of the sparse Cholesky factorization LDL^T or LL^T of the matrix AA^T, where A is m by n. More precisely, given an m × r matrix W, we evaluate the Cholesky factorization of AA^T + σWW^T, where either σ is +1 (corresponding to an update) and W is arbitrary, or σ is −1 (corresponding to a downdate) and W consists of columns of A. Both AA^T and AA^T + σWW^T must be positive definite. It follows that n ≥ m in the case of an update, and n − r ≥ m in the case of a downdate.
One approach to the multiple-rank update is to express it as a series of rank-1 updates and use the theory developed in [10] for updating a sparse factorization after a rank-1 change. This approach, however, requires multiple passes through L as it is updated after each rank-1 change. In this paper, we develop a sparse factorization algorithm that makes only one pass through L.
For a dense Cholesky factorization, a one-pass algorithm to update a factorization is obtained from Method C1 in [18] by making all the changes associated with one column of L before moving to the next column, as is done in the following algorithm that overwrites L and D with the new factors of AA^T + σWW^T. Algorithm 1 performs 2rm^2 + 4rm floating-point operations.
ALGORITHM 1 (Dense rank-r update/downdate).
for i = 1 to r do
    α_i = 1
end for
for j = 1 to m do
    for i = 1 to r do
        ᾱ = α_i + σ w_ji^2 / d_j    (σ = +1 for update or −1 for downdate)
        d_j = d_j ᾱ
        γ_i = σ w_ji / d_j
        d_j = d_j / α_i
        α_i = ᾱ
    end for
    for p = j + 1 to m do
        for i = 1 to r do
            w_pi = w_pi − w_ji l_pj
            l_pj = l_pj + γ_i w_pi
        end for
    end for
end for

*This work was supported by the National Science Foundation.
†davis@cise.ufl.edu, http://www.cise.ufl.edu/~davis, PO Box 116120, Department of Computer and Information Science and Engineering, University of Florida, Gainesville, FL 32611-6120. Phone (352) 392-1481. Fax (352) 392-1220. TR-99-006 (June 1999, revised Sept. 2000).
‡hager@math.ufl.edu, http://www.math.ufl.edu/~hager, PO Box 118105, Department of Mathematics, University of Florida, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.
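For reference, Algorithm 1 can be transcribed into executable form. The following Python sketch is ours (the function names and the list-of-lists matrix layout are assumptions, not part of the paper's software); it performs the one-pass rank-r modification and can be checked against refactorizing AA^T + σWW^T from scratch.

```python
def ldl(A):
    """Plain dense LDL^T factorization of a symmetric positive definite A."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    d = [0.0] * m
    for j in range(m):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for p in range(j + 1, m):
            L[p][j] = (A[p][j] - sum(L[p][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return L, d

def rank_r_modify(L, d, W, sigma):
    """Algorithm 1: one pass through the columns of L for a rank-r update
    (sigma = +1) or downdate (sigma = -1). L, d, and W are overwritten."""
    m, r = len(L), len(W[0])
    alpha = [1.0] * r
    gamma = [0.0] * r
    for j in range(m):
        for i in range(r):
            abar = alpha[i] + sigma * W[j][i] ** 2 / d[j]
            d[j] = d[j] * abar
            gamma[i] = sigma * W[j][i] / d[j]
            d[j] = d[j] / alpha[i]
            alpha[i] = abar
        for p in range(j + 1, m):
            for i in range(r):
                W[p][i] = W[p][i] - W[j][i] * L[p][j]
                L[p][j] = L[p][j] + gamma[i] * W[p][i]
```

Updating in place this way performs the same arithmetic as r successive applications of Method C1, but each column of L is read and written only once.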
We develop a sparse version of this algorithm that only accesses and modifies those entries in L and D which can change. For r = 1, the theory in our rank-1 paper [10] shows that those columns which can change correspond to the nodes in an elimination tree on a path starting from the node k associated with the first nonzero element w_k1 in W. For r > 1 we show that the columns of L which can change correspond to the nodes in a subtree of the elimination tree, and we express this subtree as a modification of the elimination tree of AA^T. Also, we show that with a reordering of the columns of W, it can be arranged so that in the inner loop where elements in row p of W are updated, the elements that change are adjacent to each other. The sparse techniques that we develop lead to sequential access of matrix elements and to efficient computer memory traffic. These techniques to modify a sparse factorization have many applications, including the Linear Program Dual Active Set Algorithm (LP DASA) [20], least-squares problems in statistics, the analysis of electrical circuits and power systems, structural mechanics, sensitivity analysis in linear programming, boundary condition changes in partial differential equations, domain decomposition methods, and boundary element methods (see [19]).
Section 2 describes our notation. In section 3, we present an algorithm for computing the symbolic factorization of AA^T using multisets, which determines the location of nonzero entries in L. Sections 4 and 5 describe our multiple-rank symbolic update and downdate algorithms for finding the nonzero pattern of the new factors. Section 6 describes our algorithm for computing the new numerical values of L and D, for either an update or downdate. Our experimental results are presented in section 7.
2. Notation and background. Given the location of the nonzero elements of AA^T, we can perform a symbolic factorization (this terminology is introduced by George and Liu in [15]) of the matrix to predict the location of the nonzero elements of the Cholesky factor L. In actuality, some of these predicted nonzeros may be zero due to numerical cancellation during the factorization process. The statement "l_ij ≠ 0" will mean that l_ij is symbolically nonzero. The main diagonals of L and D are always nonzero since the matrices that we factor are positive definite (see [26, p. 253]). The nonzero pattern of column j of L is denoted ℒ_j:

ℒ_j = {i : l_ij ≠ 0},

while ℒ denotes the collection of patterns:

ℒ = {ℒ_1, ℒ_2, ..., ℒ_m}.
Similarly, 𝒜_j denotes the nonzero pattern of column j of A,

𝒜_j = {i : a_ij ≠ 0},

while 𝒜 is the collection of patterns:

𝒜 = {𝒜_1, 𝒜_2, ..., 𝒜_n}.
The elimination tree can be defined in terms of a parent map π (see [22]). For any node j, π(j) is the row index of the first nonzero element in column j of L beneath the diagonal element:

π(j) = min ℒ_j \ {j},

where "min X" denotes the smallest element of X:

min X = min_{i ∈ X} i.

Our convention is that the min of the empty set is zero. Note that j < π(j) except in the case where the diagonal element in column j is the only nonzero element. The children of node j is the set of nodes whose parent is j:

{c : j = π(c)}.

The ancestors of a node j, denoted P(j), is the set of successive parents:

P(j) = {j, π(j), π(π(j)), ...}.

Since π(j) > j for each j, the ancestor sequence is finite. The sequence of nodes j, π(j), π(π(j)), ..., forming P(j), is called the path from j to the associated tree root, the final node on the path. The collection of paths leading to a root form an elimination tree. The set of all trees is the elimination forest. Typically, there is a single tree whose root is m; however, if column j of L has only one nonzero element, the diagonal element, then j will be the root of a separate tree.
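These definitions translate directly into code. The following small Python sketch (the helper names and the dict-of-sets data layout are our own, for illustration) computes π(j) and the path P(j) from the column patterns ℒ_j, with 1-based node numbers and 0 meaning "no parent":

```python
def parent(pattern_j, j):
    """pi(j): smallest row index in L_j below the diagonal, or 0 (no parent)."""
    below = [i for i in pattern_j if i != j]
    return min(below) if below else 0

def ancestors(patterns, j):
    """P(j) = {j, pi(j), pi(pi(j)), ...}: the path from j to its tree root.
    `patterns` maps each column to its nonzero row-index set."""
    path = [j]
    p = parent(patterns[j], j)
    while p:                       # convention: min of the empty set is 0
        path.append(p)
        p = parent(patterns[p], p)
    return path
```

For instance, with patterns ℒ_1 = {1,2}, ℒ_2 = {2,6}, ℒ_6 = {6,8}, ℒ_8 = {8}, the call ancestors(patterns, 1) returns the path [1, 2, 6, 8].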
The number of elements (or size) of a set X is denoted |X|, while |𝒜| or |ℒ| denote the sum of the sizes of the sets they contain.
3. Symbolic factorization. For a matrix of the form AA^T, the pattern ℒ_j of column j is the union of the patterns of each column of L whose parent is j and each column of A whose smallest row index of its nonzero entries is j (see [16, 22]):

(3.1)    ℒ_j = {j} ∪ ( ⋃_{c : j = π(c)} ℒ_c \ {c} ) ∪ ( ⋃_{min 𝒜_k = j} 𝒜_k ).
To modify (3.1) during an update or downdate, without recomputing it from scratch, we need to keep track of how each entry i entered into ℒ_j [10]. For example, if π(c) changes, we may need to remove a term ℒ_c \ {c}. We cannot simply perform a set subtraction, since we may remove entries that appear in other terms. To keep track of how entries enter and leave the set ℒ_j, we maintain a multiset associated with column j. It has the form

ℒ♮_j = {(i, m(i, j)) : i ∈ ℒ_j},

where the multiplicity m(i, j) is the number of children of j that contain row index i in their pattern, plus the number of columns of A whose smallest entry is j and that contain row index i. Equivalently, for i ≠ j,

m(i, j) = |{c : j = π(c) and i ∈ ℒ_c}| + |{k : min 𝒜_k = j and i ∈ 𝒜_k}|.

For i = j, we increment the above equation by one to ensure that the diagonal entries never disappear during a downdate. The set ℒ_j is obtained from ℒ♮_j by removing the multiplicities.
We define the addition of a multiset 𝒳 and a set 𝒴 in the following way:

𝒳 + 𝒴 = {(i, m'(i)) : i ∈ 𝒳 or i ∈ 𝒴},

where

m'(i) = 1           if i ∉ 𝒳 and i ∈ 𝒴,
m'(i) = m(i)        if i ∈ 𝒳 and i ∉ 𝒴,
m'(i) = m(i) + 1    if i ∈ 𝒳 and i ∈ 𝒴.

Similarly, the subtraction of a set 𝒴 from a multiset 𝒳 is defined by

𝒳 − 𝒴 = {(i, m'(i)) : i ∈ 𝒳 and m'(i) > 0},

where

m'(i) = m(i)        if i ∉ 𝒴,
m'(i) = m(i) − 1    if i ∈ 𝒴.

The multiset subtraction of 𝒴 from 𝒳 undoes a prior addition. That is, for any multiset 𝒳 and any set 𝒴, we have

((𝒳 + 𝒴) − 𝒴) = 𝒳.

In contrast, ((𝒳 ∪ 𝒴) \ 𝒴) is equal to 𝒳 if and only if 𝒳 and 𝒴 are disjoint sets.
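These two operations map naturally onto a hash table of multiplicities. A minimal Python sketch using collections.Counter (the helper names are our own):

```python
from collections import Counter

def madd(X, Y):
    """Multiset addition X + Y: raise the multiplicity of each i in set Y."""
    Z = Counter(X)
    for i in Y:
        Z[i] += 1
    return Z

def msub(X, Y):
    """Multiset subtraction X - Y: lower the multiplicity of each i in Y;
    entries whose multiplicity reaches zero drop out of the multiset."""
    Z = Counter(X)
    for i in Y:
        Z[i] -= 1
        if Z[i] == 0:
            del Z[i]
    return Z
```

The defining property ((𝒳 + 𝒴) − 𝒴) = 𝒳 holds for any multiset 𝒳, whereas with plain sets (𝒳 ∪ 𝒴) \ 𝒴 loses the entries of 𝒳 that also lie in 𝒴.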
Using multiset addition instead of set union, (3.1) leads to the following algorithm for computing the symbolic factorization of AA^T.
ALGORITHM 2 (Symbolic factorization of AA^T, using multisets).
for j = 1 to m do
    ℒ♮_j = {(j, 1)}
    for each c such that j = π(c) do
        ℒ♮_j = ℒ♮_j + (ℒ_c \ {c})
    end for
    for each k where min 𝒜_k = j do
        ℒ♮_j = ℒ♮_j + 𝒜_k
    end for
    π(j) = min ℒ_j \ {j}
end for
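A direct Python transcription of Algorithm 2 follows (the dict-of-sets input layout, 1-based node numbering, and helper names are assumptions for illustration, not the authors' code):

```python
from collections import Counter, defaultdict

def symbolic_aat(A_pattern, m):
    """Symbolic factorization of A A^T using multisets (Algorithm 2).
    A_pattern[k] is the nonempty set A_k of row indices of column k of A.
    Returns the multisets Lnat[j], the patterns Lj[j], and the parent map pi."""
    Lnat, Lj, pi = {}, {}, {}
    children = defaultdict(list)          # children[j] = {c : pi(c) = j}
    first = defaultdict(list)             # columns k of A with min A_k = j
    for k, Ak in A_pattern.items():
        first[min(Ak)].append(k)
    for j in range(1, m + 1):
        Lnat[j] = Counter({j: 1})
        for c in children[j]:             # multiset addition of L_c \ {c}
            for i in Lj[c] - {c}:
                Lnat[j][i] += 1
        for k in first[j]:                # multiset addition of A_k
            for i in A_pattern[k]:
                Lnat[j][i] += 1
        Lj[j] = set(Lnat[j])              # drop the multiplicities
        pi[j] = min(Lj[j] - {j}, default=0)
        if pi[j]:
            children[pi[j]].append(j)
    return Lnat, Lj, pi
```

Here A_pattern[k] plays the role of 𝒜_k; the returned pi encodes the elimination forest, with pi[j] = 0 at a root.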
4. Multiple-rank symbolic update. We consider how the pattern ℒ changes when AA^T is replaced by AA^T + WW^T. Since

AA^T + WW^T = [A W][A W]^T,

we can in essence augment A by W in order to evaluate the new pattern of column j in L. According to (3.1), the new pattern ℒ̄_j of column j of L after the update is

(4.1)    ℒ̄_j = {j} ∪ ( ⋃_{c : j = π̄(c)} ℒ̄_c \ {c} ) ∪ ( ⋃_{min 𝒜_k = j} 𝒜_k ) ∪ ( ⋃_{min 𝒲_i = j} 𝒲_i ),

where 𝒲_i is the pattern of column i in W. Throughout, we put a bar over a matrix or a set to denote its new value after the update or downdate.
In the following theorem, we consider a column j of the matrix L, and how its pattern is modified by the sets 𝒲_i. Let ℒ̄♮_j denote the multiset for column j after the rank-r update or downdate has been applied.
THEOREM 4.1. To compute the new multiset ℒ̄♮_j, initialize ℒ̄♮_j = ℒ♮_j and perform the following modifications.
Case A: For each i such that j = min 𝒲_i, add 𝒲_i to the pattern for column j,

ℒ̄♮_j = ℒ̄♮_j + 𝒲_i.

Case B: For each c such that j = π(c) = π̄(c), compute

ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ ℒ_c)

(c is a child of j in both the old and new elimination tree).
Case C: For each c such that j = π̄(c) ≠ π(c), compute

ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ {c})

(c is a child of j in the new tree, but not the old one).
Case D: For each c such that j = π(c) ≠ π̄(c), compute

ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ {c})

(c is a child of j in the old tree, but not the new one).
Proof. Cases A-D account for all the adjustments we need to make in ℒ♮_j in order to obtain ℒ̄♮_j. These adjustments are deduced from a comparison of (3.1) with (4.1). In case A, we simply add in the 𝒲_i multisets of (4.1) that do not appear in (3.1). In case B, node c is a child of node j both before and after the update. In this case, we must adjust for the deviation between ℒ̄_c and ℒ_c. By [10, Prop. 3.2], after a rank-1 update, ℒ_c ⊆ ℒ̄_c. If w_i denotes the ith column of W, then

WW^T = w_1 w_1^T + w_2 w_2^T + ··· + w_r w_r^T.

Hence, updating AA^T by WW^T is equivalent to r successive rank-1 updates of AA^T. By repeated application of [10, Prop. 3.2], ℒ_c ⊆ ℒ̄_c after a rank-r update of AA^T. It follows that ℒ̄_c and ℒ_c deviate from each other by the set ℒ̄_c \ ℒ_c. Consequently, in case B we simply add in ℒ̄_c \ ℒ_c.
In case C, node c is a child of j in the new elimination tree, but not in the old tree. In this case we need to add in the entire set ℒ̄_c \ {c} since the corresponding term does not appear in (3.1). Similarly, in case D, node c is a child of j in the old elimination tree, but not in the new tree. In this case, the entire set ℒ_c \ {c} should be deleted. The case where c is not a child of j in either the old or the new elimination tree does not result in any adjustment since the corresponding ℒ_c term is absent from both (3.1) and (4.1). □
An algorithm for updating a Cholesky factorization that is based only on this theorem would have to visit all nodes j from 1 to m, and consider all possible children c < j. On the other hand, not all nodes j from 1 to m need to be considered since not all columns of L change when AA^T is modified. In [10, Thm. 4.1] we show that for r = 1, the nodes whose patterns can change are contained in P̄(k_1), where we define k_i = min 𝒲_i. For a rank-r update, let P^(i) be the ancestor map associated with the elimination tree for the Cholesky factorization of the matrix

(4.2)    AA^T + σ Σ_{j=1}^{i} w_j w_j^T.

Again, by [10, Thm. 4.1], the nodes whose patterns can change during the rank-r update are contained in the union of the paths P^(i)(k_i), 1 ≤ i ≤ r. Although we could evaluate P^(i)(k_i) for each i, it is inefficient to do this directly, since we would need to perform a series of rank-1 updates and evaluate the ancestor map after each of these. On the other hand, by [10, Prop. 3.1] and [10, Prop. 3.2], P^(i)(j) ⊆ P^(i+1)(j) for each i and j, from which it follows that P^(i)(k_i) ⊆ P̄(k_i) for each i. Consequently, the nodes whose patterns change during a rank-r update are contained in the set

T = ⋃_{1 ≤ i ≤ r} P̄(k_i).
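The set T can be gathered in O(|T|) time by walking parent pointers from each k_i and stopping as soon as a previously visited node is reached, since from that point on the paths coincide. A small sketch (the names and dict layout are our own):

```python
def subtree_T(pibar, start_nodes):
    """Union of the paths Pbar(k_i): pibar[j] is the new parent of j
    (0 at a root); start_nodes are the k_i. Each path walk stops as soon
    as it reaches a node some earlier path already visited."""
    T = set()
    for k in start_nodes:
        j = k
        while j and j not in T:   # stop where the paths merge
            T.add(j)
            j = pibar[j]
    return T
```

With the elimination tree after the second update in Figure 4.1 (π̄(1)=2, π̄(2)=6, π̄(3)=4, π̄(4)=5, π̄(5)=6, π̄(6)=7, π̄(7)=8), subtree_T over the start nodes 1 and 3 collects all eight nodes.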
Theorem 4.2, below, shows that any node in T is also contained in one or more of the sets P^(i)(k_i). From this it follows that the nodes in T are precisely those nodes for which entries in the associated columns of L can change during a rank-r update.
Before presenting the theorem, we illustrate this with a simple example shown in Figure 4.1. The left of Figure 4.1 shows the nonzero pattern of the original matrix AA^T, its Cholesky factor L, and the corresponding elimination tree. The nonzero pattern of the first column of W is 𝒲_1 = {1, 2}. If performed as a single rank-1 update, this causes a modification of columns 1, 2, 6, and 8 of L. The corresponding nodes in the original tree are encircled; these nodes form the path P^(1)(1) = {1, 2, 6, 8} from node 1 to the root (node 8) in the second tree. The middle of Figure 4.1 shows the matrix after this rank-1 update, and its factor and elimination tree. The entries in the second matrix AA^T + w_1 w_1^T that differ from the original matrix AA^T are shown as small pluses. The second column of W has the nonzero pattern 𝒲_2 = {3, 4, 7}. As a rank-1 update, this affects columns P^(2)(3) = P̄(3) = {3, 4, 5, 6, 7, 8} of L. These columns form a single path in the final elimination tree shown in the right of the figure.
For the first rank-1 update, the set of columns that actually change are P^(1)(1) = {1, 2, 6, 8}. This is a subset of the path P̄(1) = {1, 2, 6, 7, 8} in the final tree. If we use P̄(1) to guide the work associated with column 1 of W, we visit all the columns
[Figure 4.1 contains nine panels: the nonzero patterns of the original matrix AA^T, of AA^T + w_1 w_1^T after the first update, and of AA^T + w_1 w_1^T + w_2 w_2^T after the second update (entries that differ from the preceding matrix shown as small pluses); the corresponding factors L (original, after the first update, after the second update); and the three elimination trees, each rooted at node 8.]

FIG. 4.1. Example rank-2 update.
that need to be modified, plus column 7. Node 7 is, however, in the set of nodes P̄(3) affected by the second rank-1 update, as shown in the following theorem.
THEOREM 4.2. Each of the paths P^(i)(k_i) is contained in T and, conversely, if j ∈ T, then j is contained in P^(i)(k_i) for some i.
Proof. As observed before the theorem, each of the paths P^(i)(k_i) is contained in T. Now suppose that some node j lies in the tree T. We need to prove that it is contained in P^(i)(k_i) for some i. Let s be the largest integer such that P̄(k_s) contains j, and let c be any child of j in T. If c lies on the path P̄(k_i) for some i, then j lies on the path P̄(k_i) since j is the parent of c. Since j does not lie on the path P̄(k_i) for any i > s, it follows that c does not lie on the path P̄(k_i) for any i > s. Applying this same argument recursively, we conclude that none of the nodes on the subtree of T rooted at j lie on the path P̄(k_i) for any i > s. Let T_j denote the subtree of T rooted at j. Since P^(i)(k_i) is contained in P̄(k_i) for each i, none of the nodes of T_j lie on any of the paths P^(i)(k_i) for i > s. By [10, Thm. 4.1], the patterns of all nodes outside the path P^(i)(k_i) are unchanged for each i. Let ℒ_c^(i) be the pattern of column c in the Cholesky factorization of (4.2). Since any node c contained in T_j does not lie on any of the paths P^(i)(k_i) for i > s, ℒ_c^(i) = ℒ_c^(s) for all i ≥ s. Since k_s is a node of T_j, the path P^(s)(k_s) must include j. □

FIG. 4.2. Example rank-8 symbolic update and subtree T.
Figure 4.2 depicts a subtree T for an example rank-8 update. The subtree consists of all those nodes and edges in one or more of the paths P̄(k_1), P̄(k_2), ..., P̄(k_8). These paths form a subtree, and not a general graph, since they are all paths from an initial node to the root of the elimination tree of the matrix L. The subtree T might actually be a forest, if L has an elimination forest rather than an elimination tree. The first nonzero positions in w_1 through w_8 correspond to nodes k_1 through k_8. For this example, node k_4 happens to lie on the path P̄(k_1). Nodes at which paths first intersect are shown as smaller circles, and are labeled a through f. Other nodes along the paths are not shown. Each curved arrow denotes a single subpath. For example, the arrow from nodes b to e denotes the subpath from b to e in P̄(b). This subpath is denoted as P̄(b, e) in Figure 4.2.
The following algorithm computes the rank-r symbolic update. It keeps track of an array of m path-queues, one for each column of L. Each queue contains a set of path-markers in the range 1 to r, which denote which of the paths P̄(k_1) through P̄(k_r) will modify column j next. If two paths have merged, only one of the paths needs to be considered (we arbitrarily select the higher-numbered path to represent the merged paths). This set of path-queues requires O(m + r) space. Removing and inserting a path-marker in a path-queue takes O(1) time. The only outputs of the algorithm are the new pattern of L and its elimination tree, namely, ℒ̄_j and π̄(j) for all j ∈ [1, m]. Not all columns are affected by the rank-r update. We define ℒ̄_j = ℒ_j and π̄(j) = π(j) for any node j not in T.
Case C will occur for c and j prior to visiting column π(c), since j = π̄(c) < π(c). We thus place c in the lost-child-queue of column π(c) when encountering case C for nodes c and j. When the algorithm visits node π(c), its lost-child-queue will contain all those nodes for which case D holds. This set of lost-child-queues is not the same as the set of path-queues (although there is exactly one lost-child-queue and one path-queue for each column j of L).
ALGORITHM 3 (Symbolic rank-r update, add new matrix W).
Find the starting nodes of each path
for i = 1 to r do
    𝒲_i = {k : w_ki ≠ 0}
    k_i = min 𝒲_i
    place path-marker i in path-queue of column k_i
end for
Consider all columns corresponding to nodes in the paths P̄(k_1) through P̄(k_r)
for j = min_i k_i to m do
    if path-queue of column j is nonempty do
        ℒ̄♮_j = ℒ♮_j
        for each path-marker i on path-queue of column j do
            Path P̄(k_i) includes column j
            Let c be the prior column on this path (if any), where π̄(c) = j
            if j = k_i do
                Case A: j is the first node on the path P̄(k_i), no prior c
                ℒ̄♮_j = ℒ̄♮_j + 𝒲_i
            else if j = π(c) then
                Case B: c is an old child of j, possibly changed
                ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ ℒ_c)
            else
                Case C: c is a new child of j and a lost child of π(c)
                ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ {c})
                place c in lost-child-queue of column π(c)
            endif
        end for
        Case D: consider each lost child of j
        for each c in lost-child-queue of column j do
            ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ {c})
        end for
        Move up one step in the path(s)
        π̄(j) = min ℒ̄_j \ {j}
        if ℒ̄_j \ {j} ≠ ∅ then
            Let i be the largest path-marker in path-queue of column j
            place path-marker i in path-queue of column π̄(j)
        end if
    end if path-queue of column j nonempty
end for
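The effect of Algorithm 3 can be checked with a condensed Python sketch that applies cases A-D of Theorem 4.1 column by column. For clarity it scans all candidate children c < j instead of maintaining the O(1) path-queues, so it is quadratic rather than near-optimal; the data layout and names are our own, not the authors' code:

```python
from collections import Counter, defaultdict

def symbolic_update(Lnat, Lj, pi, W_patterns, m):
    """Apply cases A-D of Theorem 4.1 to obtain the symbolic factorization
    of A A^T + W W^T. Lnat[j] is the multiset (a Counter), Lj[j] the pattern,
    pi[j] the old parent (0 at a root); W_patterns[i] is the set W_i."""
    nLnat = {j: Counter(Lnat[j]) for j in Lnat}
    nLj, npi = dict(Lj), dict(pi)
    starts = defaultdict(list)
    for i, Wi in W_patterns.items():
        starts[min(Wi)].append(i)
    for j in range(1, m + 1):
        C = nLnat[j]
        for i in starts[j]:                          # case A: j = min W_i
            for row in W_patterns[i]:
                C[row] += 1
        for c in range(1, j):
            if j == npi[c] == pi[c]:                 # case B: child in both trees
                for row in nLj[c] - Lj[c]:
                    C[row] += 1
            elif j == npi[c] != pi[c]:               # case C: new child only
                for row in nLj[c] - {c}:
                    C[row] += 1
            elif j == pi[c] != npi[c]:               # case D: lost child
                for row in Lj[c] - {c}:
                    C[row] -= 1
                    if C[row] == 0:
                        del C[row]
        nLj[j] = set(C)
        npi[j] = min(nLj[j] - {j}, default=0)
    return nLnat, nLj, npi
```

For a small example with 𝒜_1 = {1,3}, 𝒜_2 = {2,3}, 𝒜_3 = {3,4} and an update column with 𝒲_1 = {1,2}, this reproduces the factorization of [A w][A w]^T: column 1 gains row 2, π̄(1) changes from 3 to 2, and the multiplicity of row 3 in column 3 drops from 4 to 3.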
The optimal time for a general rank-r update is

O( Σ_{j ∈ T} |ℒ̄_j| ).

The actual time taken by Algorithm 3 is only slightly higher, namely,

O( m + Σ_{j ∈ T} |ℒ̄_j| ),

because of the O(m) bookkeeping required for the path-queues. In most practical cases, the O(m) term will not be the dominant term in the run time.
Algorithm 3 can be used to compute an entire symbolic factorization. We start by factorizing the identity matrix I = II^T into LDL^T (that is, L = I and D = I). In this case, we have ℒ♮_j = {(j, 1)} for all j. The initial elimination tree is a forest of m nodes and no edges. We can now determine the symbolic factorization of I + AA^T using the rank-r symbolic update algorithm above, with r = n. This matrix has identical symbolic factors as AA^T. Case A will apply for each column in A, corresponding to the

⋃_{min 𝒜_k = j} 𝒜_k

term in (3.1). Since π(c) = 0 for each c, cases B and D will not apply. At column j, case C will apply for all children in the elimination tree, corresponding to the

⋃_{c : j = π(c)} ℒ_c \ {c}

term in (3.1). Since duplicate paths are discarded when they merge, we modify each column j once, for each child c in the elimination tree. This is the same work performed by the symbolic factorization algorithm, Algorithm 2, which is O(|ℒ|). Hence, Algorithm 3 is equivalent to Algorithm 2 when we apply it to the update I + AA^T. Its run time is optimal in this case.
5. Multiple-rank symbolic downdate. The downdate algorithm is analogous. The downdated matrix is AA^T − WW^T, where W is a subset of the columns of A. In a downdate, P̄(k) ⊆ P(k), and thus rather than following the paths P̄(k_i), we follow the paths P(k_i). Entries are dropped during a downdate, and thus ℒ̄_j ⊆ ℒ_j and π(j) ≤ π̄(j). We start with ℒ̄♮_j = ℒ♮_j and make the following changes.
Case A: If j = min 𝒲_i for some i, then the pattern 𝒲_i is removed from column j,

ℒ̄♮_j = ℒ̄♮_j − 𝒲_i.

Case B: If j = π(c) = π̄(c) for some node c, then c is a child of j in both the old and new tree. We need to remove from ℒ♮_j entries in the old pattern ℒ_c but not in the new pattern ℒ̄_c,

ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ ℒ̄_c).

Case C: If j = π(c) ≠ π̄(c) for some node c, then c is a child of j in the old elimination tree, but not the new tree. We compute

ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ {c}).

Case D: If j = π̄(c) ≠ π(c) for some node c, then c is a child of j in the new tree, but not the old one. We compute

ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ {c}).

Case C will occur for c and j prior to visiting column π̄(c), since j = π(c) < π̄(c). We thus place c in the new-child-queue of π̄(c) when encountering case C for nodes c and j. When the algorithm visits node π̄(c), its new-child-queue will contain all those nodes for which case D holds.
ALGORITHM 4 (Symbolic rank-r downdate, remove matrix W).
Find the starting nodes of each path
for i = 1 to r do
    𝒲_i = {k : w_ki ≠ 0}
    k_i = min 𝒲_i
    place path-marker i in path-queue of column k_i
end for
Consider all columns corresponding to nodes in the paths P(k_1) through P(k_r)
for j = min_i k_i to m do
    if path-queue of column j is nonempty do
        ℒ̄♮_j = ℒ♮_j
        for each path-marker i on path-queue of column j do
            Path P(k_i) includes column j
            Let c be the prior column on this path (if any), where π(c) = j
            if j = k_i do
                Case A: j is the first node on the path P(k_i), no prior c
                ℒ̄♮_j = ℒ̄♮_j − 𝒲_i
            else if j = π̄(c) then
                Case B: c is an old child of j, possibly changed
                ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ ℒ̄_c)
            else
                Case C: c is a lost child of j and a new child of π̄(c)
                ℒ̄♮_j = ℒ̄♮_j − (ℒ_c \ {c})
                place c in new-child-queue of column π̄(c)
            endif
        end for
        Case D: consider each new child of j
        for each c in new-child-queue of column j do
            ℒ̄♮_j = ℒ̄♮_j + (ℒ̄_c \ {c})
        end for
        Move up one step in the path(s)
        π̄(j) = min ℒ̄_j \ {j}
        if ℒ_j \ {j} ≠ ∅ then
            Let i be the largest path-marker in path-queue of column j
            place path-marker i in path-queue of column π(j)
        end if
    end if path-queue of column j nonempty
end for
The time taken by Algorithm 4 is

O( m + Σ_{j ∈ T} |ℒ_j| ),

which is slightly higher than the optimal time

O( Σ_{j ∈ T} |ℒ_j| ).

In most practical cases, the O(m) term in the asymptotic run time for Algorithm 4 will not be the dominant term.
6. Multiple-rank numerical update and downdate. The following numerical rank-r update/downdate algorithm, Algorithm 5, overwrites L and D with the updated or downdated factors. The algorithm is based on Algorithm 1, the one-pass version of Method C1 in [18] presented in section 1. The algorithm is used after the symbolic update algorithm (Algorithm 3) has found the subtree T corresponding to the nodes whose patterns can change, or after the symbolic downdate algorithm (Algorithm 4) has found T̄. Since the columns of the matrix W can be reordered without affecting the product WW^T, we reorder the columns of W using a depth-first search [6] of T (or T̄) so that as we march through the tree, consecutive columns of W are utilized in the computations. This reordering improves the numerical update/downdate algorithm by placing all columns of W that affect any given subpath next to each other, eliminating an indexing operation. Reordering the columns of a sparse matrix prior to Cholesky factorization is very common [3, 22, 23, 24]. It improves data locality and simplifies the algorithm, just as it does for reordering W in a multiple-rank update/downdate. The depth-first ordering of the tree changes as the elimination tree changes, so columns of W must be ordered for each update or downdate.
To illustrate this reordering, consider the subtree T in Figure 4.2 for a rank-8 update. If the depth-first-search algorithm visits child subtrees from left to right, the resulting reordering is as shown in Figure 6.1. Each subpath in Figure 6.1 is labeled with the range of columns of W that affect that subpath, and with the order in which the subpath is processed by Algorithm 5. Consider the path from node c to e. In Figure 4.2, the columns of L corresponding to nodes on this subpath are updated by columns 2, 8, 3, and 5 of W, in that order. In the reordered subtree (Figure 6.1), the columns on this subpath are updated by columns 5 through 8 of the reordered W.
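The reordering itself reduces to sorting the columns of W by the depth-first position of their start nodes. A small Python sketch (the children lists, helper names, and the left-to-right visiting order are assumptions for illustration):

```python
def dfs_order(children, roots):
    """Depth-first preorder of the subtree's nodes, children left to right.
    Returns a dict mapping each node to its position in the traversal."""
    order, stack = {}, list(reversed(roots))
    while stack:
        j = stack.pop()
        order[j] = len(order)
        stack.extend(reversed(children.get(j, [])))
    return order

def reorder_W_columns(start_nodes, children, roots):
    """Column indices of W sorted by the DFS position of their start node k_i."""
    pos = dfs_order(children, roots)
    return sorted(range(len(start_nodes)), key=lambda i: pos[start_nodes[i]])
```

Sorting by depth-first position guarantees that the columns of W whose paths share a subpath occupy a contiguous range c_1..c_2, which is what the inner loop of Algorithm 5 relies on.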
[FIG. 6.1 shows the subtree of Figure 4.2 after the depth-first-search reordering; each subpath is labeled with the range of columns of the reordered W that affect it and with the order in which it is processed.]

FIG. 6.1. Example rank-8 update after depth-first-search reordering.

ALGORITHM 5 (Sparse numeric rank-r modification, add σWW^T).
The columns of W have been reordered.
for i = 1 to r do
    α_i = 1
end for
for each subpath in depth-first-search order in T (σ = +1) or T̄ (σ = −1) do
    Let c_1 through c_2 be the columns of W that affect this subpath
    for each column j in the subpath do
        for i = c_1 to c_2 do
            ᾱ = α_i + σ w_ji^2 / d_j
            d_j = d_j ᾱ
            γ_i = σ w_ji / d_j
            d_j = d_j / α_i
            α_i = ᾱ
        end for
        for all p ∈ ℒ̄_j \ {j} (σ = +1) or p ∈ ℒ_j \ {j} (σ = −1) do
            for i = c_1 to c_2 do
                w_pi = w_pi − w_ji l_pj
                l_pj = l_pj + γ_i w_pi
            end for
        end for
    end for
end for
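To make the sparse traversal concrete, the following Python sketch performs a single rank-1 update (r = 1) along the path P(k): it stores L as a dict of column dicts and assumes the symbolic phase has already ensured the pattern of L accommodates the update. The storage layout and names are ours, not the authors' code:

```python
def sparse_rank1_update(Lcols, d, w, sigma):
    """Rank-1 sparse modification of LDL^T along the path P(k).
    Lcols[j] maps each row p > j in the pattern of column j to l_pj;
    d maps j to d_j; w is a sparse column {row: value} (consumed)."""
    alpha = 1.0
    j = min(w)                         # k = min W_1: first node on the path
    while j is not None:
        wj = w.pop(j, 0.0)
        abar = alpha + sigma * wj * wj / d[j]
        d[j] = d[j] * abar
        gamma = sigma * wj / d[j]
        d[j] = d[j] / alpha
        alpha = abar
        for p in Lcols[j]:             # pattern of column j below the diagonal
            wp = w.get(p, 0.0) - wj * Lcols[j][p]
            w[p] = wp
            Lcols[j][p] = Lcols[j][p] + gamma * wp
        j = min(Lcols[j]) if Lcols[j] else None   # move up to the parent pi(j)
```

Only the columns on the path from k = min 𝒲_1 to the root are visited; each step moves to the parent π(j) = min(ℒ_j \ {j}).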
The time taken by r rank-1 updates [10] is

(6.1)    O( Σ_{i=1}^{r} Σ_{j ∈ P^(i)(k_i)} |ℒ_j^(i)| ),

where ℒ_j^(i) is the pattern of column j after the ith rank-1 update. This time is asymptotically optimal. A single rank-r update cannot determine the paths P^(i)(k_i), but uses P̄(k_i) instead. Thus, the time taken by Algorithm 5 for a rank-r update is

O( Σ_{i=1}^{r} Σ_{j ∈ P̄(k_i)} |ℒ̄_j| ).

This is slightly higher than (6.1), because P^(i)(k_i) ⊆ P̄(k_i) and ℒ_j^(i) ⊆ ℒ̄_j.
TABLE 7.1
Dense matrix performance for 64-by-64 matrices and 64-by-1 vectors

BLAS operation                                   Mflops
DGEMM (matrix-matrix multiply)                    171.6
DGEMV (matrix-vector multiply)                    130.0
DTRSV (solve Lx = b)                               81.5
DAXPY (the vector computation y = αx + y)          78.5
DDOT (the dot product α = x^T y)                   68.7
Since P^(i)(k_i) ⊆ P̄(k_i), the ith column of W does not necessarily affect all of the columns in the path P̄(k_i). If w_i does not affect column j, then w_ji and γ_i will both be zero in the inner loop in Algorithm 5. An example of this occurs in Figure 4.1, where column 1 of W does not affect column 7 of L. We could check this condition, and reduce the asymptotic run time to

O( Σ_{i=1}^{r} Σ_{j ∈ P^(i)(k_i)} |ℒ̄_j| ).

In practice, however, we found that the paths P^(i)(k_i) and P̄(k_i) did not differ much. Including this test did not improve the overall performance of our algorithm. The time taken by Algorithm 5 for a rank-r downdate is similar, namely,

O( Σ_{i=1}^{r} Σ_{j ∈ P(k_i)} |ℒ_j| ).
The numerical algorithm for updating and downdating LL^T is essentially the same as that for LDL^T [4, 24]; the only difference is a diagonal scaling. For either LL^T or LDL^T, the symbolic algorithms are identical.
7. Experimental results. To test our methods, we selected the same experiment as in our earlier paper on the single-rank update and downdate [10], which mimics the behavior of the Linear Programming Dual Active Set Algorithm [20]. The first matrix is 10^-6 I + A_0 A_0^T, where A_0 consists of 5446 columns from a larger 6071-by-12,230 matrix B with 35,632 nonzeros arising in an airline scheduling problem (DFL001) [13]. The 5446 columns correspond to the optimal solution of the linear programming problem. Starting with an initial LDL^T factorization of the matrix 10^-6 I + A_0 A_0^T, we added columns from B (corresponding to an update) until we obtained the factors of 10^-6 I + BB^T. We then removed columns in a first-in-first-out order (corresponding to a downdate) until we obtained the original factors. The LP DASA algorithm would not perform this much work (6784 updates and 6784 downdates) to solve this linear programming problem.
Our experiment took place on a Sun Ultra Enterprise running the Solaris 2.6 operating system, with eight UltraSparc-II processors (only one processor was used) and 2GB of main memory. The dense matrix performance in millions of floating-point operations per second (Mflops) of the BLAS [12] is shown in Table 7.1. All results presented below are for our own codes (except for colmmd, spooles, and the BLAS) written in the C programming language and using double precision floating point arithmetic.
We first permuted the rows of B to preserve sparsity in the Cholesky factors of BB^T. This can be done efficiently with colamd [7, 8, 9, 21], which is based on an
TABLE 7.2
Average update and downdate performance results

rank r   rank-r time / r in seconds        Mflops
           update     downdate       update    downdate
  1        0.0840      0.0880         30.3       29.6
  2        0.0656      0.0668         38.9       39.0
  3        0.0589      0.0597         43.3       43.6
  4        0.0513      0.0549         49.7       47.5
  5        0.0500      0.0519         51.0       50.2
  6        0.0469      0.0487         54.4       53.5
  7        0.0451      0.0468         56.6       55.7
  8        0.0434      0.0448         58.8       58.2
  9        0.0431      0.0458         59.1       57.0
 10        0.0426      0.0447         60.0       58.3
 11        0.0415      0.0437         61.5       59.6
 12        0.0413      0.0432         61.8       60.3
 13        0.0403      0.0424         63.2       61.4
 14        0.0402      0.0420         63.6       62.1
 15        0.0395      0.0413         64.6       63.1
 16        0.0392      0.0408         65.1       63.9
approximate minimum degree ordering algorithm [1]. However, to keep our results consistent with our prior rank-1 update/downdate paper [10], we used the same permutation as in those experiments (from colmmd [17]). Both colamd and Matlab's colmmd compute the ordering without forming BB^T explicitly. A symbolic factorization of BB^T finds the nonzero counts of each column of the factors. This step takes an amount of space that is proportional to the number of nonzero entries in B. It gives us the size of a static data structure to hold the factors during the updating and downdating process. The numerical factorization of BB^T is not required. A second symbolic factorization finds the first nonzero pattern ℒ. An initial numerical factorization computes the first factors L and D. We used our own nonsupernodal factorization code (similar to SPARSPAK [5, 15]), since the update/downdate algorithms do not use supernodes. A supernodal factorization code such as spooles [3] or a multifrontal method [2, 14] can get better performance. The factorization method used has no impact on the performance of the update and downdate algorithms.
We ran 16 different experiments, each one using a different rank-r update and
downdate, where r varied from 1 to 16. After each rank-r update, we solved the
sparse linear system LDLTx = b using a dense right-hand side b. To compare the
performance of a rank-1 update with a rank-r update (r > 1), we divided the run time
of the rank-r update by r. This gives us a normalized time for a single rank-1 update.
The average time and Mflop rate for a normalized rank-1 update and downdate for
the entire experiment are shown in Table 7.2. The time for the update, downdate, or
solve increases as the factors become denser, but the performance in terms of Mflops
is fairly constant for all three operations. The first rank-16 update, when the factor
L is sparsest, takes 0.47 seconds (0.0294 seconds normalized) and runs at 65.5 Mflops,
compared to 65.1 Mflops in Table 7.2 for the average speed of all the rank-16 updates.
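The normalization is simple arithmetic, and the quoted figures are mutually consistent; the check below uses the average times from Table 7.3 (0.63 s for a rank-16 update, 0.084 s for a rank-1 update) against the r = 16 row of Table 7.2.

```python
# Normalization used in the experiments: divide the run time of one rank-r
# update by r to get a per-rank-1 time comparable across values of r.
rank16_time = 0.63   # average time of one rank-16 update, seconds (Table 7.3)
rank1_time = 0.084   # average time of one rank-1 update, seconds (Table 7.3)

normalized = rank16_time / 16
assert abs(normalized - 0.0392) < 0.0005   # matches the r = 16 row of Table 7.2
assert rank1_time / normalized > 2.0       # over twice as fast per unit rank
```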
The performance of each step is summarized in Table 7.3. A rank-5 update takes
about the same time as using the updated factors to solve the sparse linear system
LDLTx = b, even though the rank-5 update performs 2.6 times the work.
TABLE 7.3
Performance of each step of the experiment

Operation                                      Time (sec)  Mflops  Notes
colamd ordering                                0.45
Symbolic factorization (of BBT)                0.07                1.49 million nonzeros
Symbolic factorization for first C             0.46                831 thousand nonzeros
Numeric factorization for first L (our code)   20.07       24.0
Numeric factorization for first L (spooles)    18.10       26.6
Numeric factorization of BBT (our code)        61.04       18.5    not required
Numeric factorization of BBT (spooles)         17.80       63.3    not required
Average rank-16 update                         0.63        65.1    compare with rank-1
Average rank-5 update                          0.25        51.0    compare with solve step
Average rank-1 update                          0.084       30.3
Average solve LDLTx = b                        0.27        18.2

The work, in terms of floating-point operations, varies only slightly as r changes.
With rank-1 updates, the total work for all the updates is 17.293 billion floating-point
operations, or 2.55 million per rank-1 update. With rank-16 updates (the
worst case), the total work increases to 17.318 billion floating-point operations. The
rank-1 downdates take a total of 17.679 billion floating-point operations (2.61 million
per rank-1 downdate), while the rank-16 downdates take a total of 17.691 billion
operations. This confirms the near-optimal operation count of the multiple-rank
update/downdate, as compared to the optimal rank-1 update/downdate.
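The "near-optimal" claim can be quantified directly from the totals quoted above: even in the worst case, the multiple-rank approach performs well under one percent more work than the optimal series of rank-1 modifications.

```python
# Operation counts from the experiments (billions of flops).
rank1_update_flops = 17.293e9    # total, all rank-1 updates
rank16_update_flops = 17.318e9   # total, all rank-16 updates (worst case)
rank1_down_flops = 17.679e9      # total, all rank-1 downdates
rank16_down_flops = 17.691e9     # total, all rank-16 downdates

update_overhead = rank16_update_flops / rank1_update_flops - 1
downdate_overhead = rank16_down_flops / rank1_down_flops - 1
assert update_overhead < 0.002     # under 0.2% extra work for updates
assert downdate_overhead < 0.001   # under 0.1% extra work for downdates
```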
Solving Lx = b when L is sparse and b is dense, and computing the sparse LDLT
factorization using a nonsupernodal method, both give a rather poor computation-to-memory-reference
ratio of only 2/3. We tried the same loop unrolling technique
used in our update/downdate code for our sparse solve and sparse LDLT factorization
codes, but this resulted in no improvement in performance.
A sparse rank-r update or downdate can be implemented in a one-pass algorithm
that has much better memory traffic than that of a series of r rank-1 modifications. In
our numerical experiments with the DFL001 linear programming test problem, the
rank-r modification was more than twice as fast as r rank-1 modifications for r > 11.
The superior performance of the multiple-rank algorithm can be explained using the
computation-to-memory-reference ratio. If c1 = c2 in Algorithm 5 (a subpath affected
by only one column of W), it can be shown that this ratio is about 4/5 when j is
large. The ratio when c2 = c1 + 15 (a subpath affected by 16 columns of W) is
about 64/35 when j is large. Hence, going from a rank-1 to a rank-16 update
improves the computation-to-memory-reference ratio by a factor of about 2.3 when
column j of L has many nonzeros. By comparison, the level-1 BLAS routines for
dense matrix computations (vector computations such as DAXPY and DDOT) [11]
have computation-to-memory-reference ratios between 2/3 and 1. The level-2 BLAS
(DGEMV and DTRSV, for example) have a ratio of 2.
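Both quoted ratios are consistent with roughly 4k flops against 2k + 3 memory references per factor entry on a subpath affected by k columns of W. The closed form below is our own inference from the two quoted data points (it is not stated explicitly in the text), recorded here only to make the 2.3x improvement factor easy to verify.

```python
from fractions import Fraction

def compute_to_memref_ratio(k):
    """Hypothesized flops-per-memory-reference for a subpath touched by k
    columns of W (inferred from the k = 1 and k = 16 values quoted above)."""
    return Fraction(4 * k, 2 * k + 3)

assert compute_to_memref_ratio(1) == Fraction(4, 5)     # rank-1 subpath
assert compute_to_memref_ratio(16) == Fraction(64, 35)  # rank-16 subpath
# going from rank-1 to rank-16 improves the ratio by a factor of 16/7, about 2.3
improvement = compute_to_memref_ratio(16) / compute_to_memref_ratio(1)
assert abs(float(improvement) - 2.3) < 0.02
```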
8. Summary. Because of improved memory traffic, our multiple-rank sparse
update/downdate method is over twice as fast as our prior rank-1 update/downdate
method. The performance of our new method (65.1 Mflops for a sparse rank-16
update) compares favorably with both the dense matrix performance (81.5 Mflops to
solve the dense system Lx = b) and the sparse matrix performance (18.0 Mflops to
solve the sparse system Lx = b, and an observed peak numerical factorization rate of 63.3
Mflops in spooles) on the computer used in our experiments. Although not strictly
optimal, the multiple-rank update/downdate method has nearly the same operation
count as the rank-1 update/downdate method, which has an optimal operation count.
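The one-pass structure at the heart of the method can be sketched on a small dense example. The code below is our own toy illustration, not the paper's sparse algorithm (which operates on paths in the elimination tree and exploits the pattern of W): it applies the scalar rank-1 LDL^T recurrence of [10] for all r columns of W while sweeping through the columns of L exactly once, and checks the result against refactorization. The function names and the dense helper factorization are ours.

```python
import numpy as np

def ldlt(A):
    """Plain right-looking LDL^T factorization; A must be symmetric positive definite."""
    m = A.shape[0]
    L, d, S = np.eye(m), np.zeros(m), A.astype(float).copy()
    for j in range(m):
        d[j] = S[j, j]
        L[j+1:, j] = S[j+1:, j] / d[j]
        S[j+1:, j+1:] -= d[j] * np.outer(L[j+1:, j], L[j+1:, j])
    return L, d

def multirank_update(L, d, W, sigma=1.0):
    """Update L D L^T to L D L^T + sigma * W W^T in ONE pass over L's columns."""
    m, r = W.shape
    L, d, W = L.copy(), d.copy(), W.copy()
    alpha = np.ones(r)              # one scalar recurrence per column of W
    for j in range(m):              # a single sweep through the factor
        for k in range(r):          # fold each rank-1 term into column j in turn
            abar = alpha[k] + sigma * W[j, k] ** 2 / d[j]
            gamma = sigma * W[j, k] / (d[j] * abar)
            d[j] *= abar / alpha[k]
            W[j+1:, k] -= W[j, k] * L[j+1:, j]
            L[j+1:, j] += gamma * W[j+1:, k]
            alpha[k] = abar
        # column j is now final and is never revisited
    return L, d

# Small check against refactorization: update AA^T by WW^T.
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 8))     # m = 5, n = 8 (n > m, as required for an update)
W = rng.standard_normal((5, 3))     # rank-3 update
L, d = ldlt(A @ A.T)
L2, d2 = multirank_update(L, d, W)
Lref, dref = ldlt(A @ A.T + W @ W.T)
assert np.allclose(L2, Lref) and np.allclose(d2, dref)
```

A series of r rank-1 updates would run the outer loop over j once per column of W; exchanging the two loops, as above, is what turns r passes through L into one.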
REFERENCES
[1] P. R. AMESTOY, T. A. DAVIS, AND I. S. DUFF, An approximate minimum degree ordering
algorithm, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 886-905.
[2] P. R. AMESTOY AND I. S. DUFF, Vectorization of a multiprocessor multifrontal code,
Internat. J. Supercomputer Appl., 3 (1989), pp. 41-59.
[3] C. ASHCRAFT AND R. G. GRIMES, SPOOLES: an object-oriented sparse matrix library, in
Proc. 1999 SIAM Conf. on Parallel Processing for Sci. Comput., Mar. 1999.
[4] C. H. BISCHOF, C.-T. PAN, AND P. T. P. TANG, A Cholesky up- and downdating algorithm for
systolic and SIMD architectures, SIAM J. Sci. Comput., 14 (1993), pp. 670-676.
[5] E. CHU, A. GEORGE, J. W. H. LIU, AND E. NG, SPARSPAK: Waterloo sparse matrix package,
user's guide for SPARSPAK-A, Tech. Report, Univ. of Waterloo Dept. of Comp. Sci., Nov.
1984.
[6] T. H. CORMEN, C. E. LEISERSON, AND R. L. RIVEST, Introduction to Algorithms, MIT Press,
Cambridge, Massachusetts, and McGraw-Hill, New York, 1990.
[7] T. A. DAVIS, J. R. GILBERT, S. I. LARIMORE, E. NG, AND B. PEYTON, A column approximate
minimum degree ordering algorithm, in Proc. SIAM Conference on Applied Linear Algebra,
Oct. 1997, p. 29.
[8] T. A. DAVIS, J. R. GILBERT, E. NG, AND B. PEYTON, A column approximate minimum
degree ordering algorithm, in Abstracts of the Second SIAM Conference on Sparse Matrices,
Snowbird, Utah, Oct. 1996.
[9] T. A. DAVIS, J. R. GILBERT, E. NG, AND B. PEYTON, A column approximate minimum degree
ordering algorithm, in XIII Householder Symp. on Numerical Linear Algebra, Switzerland,
June 1996.
[10] T. A. DAVIS AND W. W. HAGER, Modifying a sparse Cholesky factorization, SIAM J. Matrix
Anal. Appl., 20 (1999), pp. 606-627.
[11] J. J. DONGARRA, J. R. BUNCH, C. B. MOLER, AND G. W. STEWART, LINPACK Users' Guide,
SIAM, Philadelphia, 1978.
[12] J. J. DONGARRA, J. Du CROZ, I. S. DUFF, AND S. HAMMARLING, A set of level3 basic linear
algebra subprograms, ACM Trans. Math. Softw., 16 (1990), pp. 117.
[13] J. J. DONGARRA AND E. GROSSE, Distribution of mathematical software via electronic mail,
Comm. ACM, 30 (1987), pp. 403407.
[14] I. S. DUFF AND J. K. REID, The multifrontal solution of indefinite sparse symmetric linear
equations, ACM Trans. Math. Softw., 9 (1983), pp. 302325.
[15] A. GEORGE AND J. W. H. LIU, Computer Solution of Large Sparse Positive Definite Systems,
Prentice-Hall, Englewood Cliffs, New Jersey, 1981.
[16] A. GEORGE, J. W. H. LIU, AND E. NG, A data structure for sparse QR and LU factorizations,
SIAM J. Sci. Statist. Comput., 9 (1988), pp. 100-121.
[17] J. R. GILBERT, C. MOLER, AND R. SCHREIBER, Sparse matrices in MATLAB: design and
implementation, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 333-356.
[18] P. E. GILL, G. H. GOLUB, W. MURRAY, AND M. A. SAUNDERS, Methods for modifying matrix
factorizations, Mathematics of Computation, 28 (1974), pp. 505-535.
[19] W. W. HAGER, Updating the inverse of a matrix, SIAM Review, 31 (1989), pp. 221239.
[20] W. W. HAGER, The LP dual active set algorithm, in High Performance Algorithms and Software in
Nonlinear Optimization, R. De Leone, A. Murli, P. M. Pardalos, and G. Toraldo, eds.,
Kluwer, Dordrecht, 1998, pp. 243-254.
[21] S. I. LARIMORE, An approximate minimum degree column ordering algorithm, Tech. Report
TR-98-016, Univ. of Florida, Gainesville, FL, 1998. http://www.cise.ufl.edu/.
[22] J. W. H. LIU, The role of elimination trees in sparse factorization, SIAM J. Matrix
Anal. Appl., 11 (1990), pp. 134-172.
[23] E. G. NG AND B. W. PEYTON, A supernodal Cholesky factorization algorithm for shared-memory
multiprocessors, SIAM J. Sci. Comput., 14 (1993), pp. 761-769.
[24] C.-T. PAN, A modification to the LINPACK downdating algorithm, BIT, 30 (1990), pp. 707-722.
[25] E. ROTHBERG, A. GUPTA, E. G. NG, AND B. W. PEYTON, Parallel sparse Cholesky factorization
algorithms for shared-memory multiprocessor systems, in Advances in Computer Methods
for Partial Differential Equations VII, R. Vichnevetsky, D. Knight, and G. Richter, eds.,
IMACS, 1992, pp. 622-628.
[26] G. STRANG, Linear Algebra and Its Applications, Academic Press, New York, 1980.
