Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Multiple-rank modifications of a sparse Cholesky factorization
Permanent Link: http://ufdc.ufl.edu/UF00095439/00001
 Material Information
Title: Multiple-rank modifications of a sparse Cholesky factorization
Series Title: Department of Computer and Information Science and Engineering Technical Reports
Physical Description: Book
Language: English
Creator: Davis, Timothy A.
Hager, William W.
Affiliation: University of Florida -- Department of Computer and Information Science and Engineering
University of Florida -- Department of Mathematics
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 1999
 Record Information
Bibliographic ID: UF00095439
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.












MULTIPLE-RANK MODIFICATIONS
OF A SPARSE CHOLESKY FACTORIZATION *

TIMOTHY A. DAVIS† AND WILLIAM W. HAGER‡

Abstract. Given a sparse symmetric positive definite matrix AAT and an associated sparse
Cholesky factorization LDLT or LLT, we develop sparse techniques for updating the factorization
after either adding a collection of columns to A or deleting a collection of columns from A. Our
techniques are based on an analysis and manipulation of the underlying graph structure, using the
framework developed in an earlier paper on rank-1 modifications [SIAM J. Matrix Anal. Appl.,
20 (1999), pp. 606-627]. Computationally, the multiple-rank update has better memory traffic and
executes much faster than an equivalent series of rank-1 updates, since the multiple-rank update
makes one pass through L computing the new entries, while a series of rank-1 updates requires
multiple passes through L.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices,
mathematical software, matrix updates.

AMS subject classifications. 65F05, 65F50, 65-04.

1. Introduction. This paper presents a method for evaluating a multiple-rank
update or downdate of the sparse Cholesky factorization LDLT or LLT of the matrix
AAT, where A is m by n. More precisely, given an m by r matrix W, we evaluate the
Cholesky factorization of AAT + σWWT, where either σ is +1 (corresponding to an
update) and W is arbitrary, or σ is -1 (corresponding to a downdate) and W consists
of columns of A. Both AAT and AAT + σWWT must be positive definite. It follows
that n ≥ m in the case of an update, and n - r ≥ m in the case of a downdate.
One approach to the multiple-rank update is to express it as a series of rank-1
updates and use the theory developed in [10] for updating a sparse factorization after
a rank-1 change. This approach, however, requires multiple passes through L as it is
updated after each rank-1 change. In this paper, we develop a sparse factorization
algorithm that makes only one pass through L.
For a dense Cholesky factorization, a one-pass algorithm to update a factorization
is obtained from Method C1 in [18] by making all the changes associated with one
column of L before moving to the next column, as is done in the following algorithm,
which overwrites L and D with the new factors of AAT + σWWT. Algorithm 1
performs 2rm^2 + 4rm floating-point operations.

ALGORITHM 1 (Dense rank-r update/downdate).
for i = 1 to r do
   α_i = 1
end for
for j = 1 to m do
   for i = 1 to r do
      ᾱ = α_i + σ w_ji^2 / d_j   (σ = +1 for update or -1 for downdate)

*This work was supported by the National Science Foundation.
†davis@cise.ufl.edu, http://www.cise.ufl.edu/~davis, PO Box 116120, Department of Computer
and Information Science and Engineering, University of Florida, Gainesville, FL 32611-6120. Phone
(352) 392-1481. Fax (352) 392-1220. TR-99-006 (June 1999, revised Sept. 2000)
‡hager@math.ufl.edu, http://www.math.ufl.edu/~hager, PO Box 118105, Department of Mathe-
matics, University of Florida, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.












      d_j = d_j ᾱ
      γ_i = σ w_ji / d_j
      d_j = d_j / α_i
      α_i = ᾱ
   end for
   for p = j + 1 to m do
      for i = 1 to r do
         w_pi = w_pi - w_ji l_pj
         l_pj = l_pj + γ_i w_pi
      end for
   end for
end for
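As a sanity check on the recurrences above, here is a small pure-Python sketch of Algorithm 1 (our own illustrative code, not from the paper; the function and variable names `ldl` and `rank_r_modify` are ours). It factorizes a dense symmetric positive definite matrix as LDL^T, applies a rank-r modification in one pass through the columns, and can be checked against a refactorization of the modified matrix.

```python
def ldl(A):
    """Dense LDL^T factorization of a symmetric positive definite matrix
    (nested lists); returns unit lower-triangular L and the diagonal d."""
    m = len(A)
    L = [[0.0] * m for _ in range(m)]
    d = [0.0] * m
    for j in range(m):
        d[j] = A[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        L[j][j] = 1.0
        for p in range(j + 1, m):
            L[p][j] = (A[p][j] - sum(L[p][k] * L[j][k] * d[k]
                                     for k in range(j))) / d[j]
    return L, d


def rank_r_modify(L, d, W, sigma):
    """One-pass rank-r modification in the style of Algorithm 1: overwrite
    L, d with the factors of L*D*L^T + sigma*W*W^T (sigma = +1 update,
    sigma = -1 downdate). W is m-by-r as nested lists."""
    m, r = len(L), len(W[0])
    W = [row[:] for row in W]          # work on a copy of W
    alpha = [1.0] * r
    gamma = [0.0] * r
    for j in range(m):
        for i in range(r):             # all r updates applied to column j
            abar = alpha[i] + sigma * W[j][i] ** 2 / d[j]
            d[j] *= abar
            gamma[i] = sigma * W[j][i] / d[j]
            d[j] /= alpha[i]
            alpha[i] = abar
        for p in range(j + 1, m):      # then the subdiagonal of column j
            for i in range(r):
                W[p][i] -= W[j][i] * L[p][j]
                L[p][j] += gamma[i] * W[p][i]
```

Updating the factors of a matrix M and refactorizing M + WW^T should give the same L and d up to roundoff, and a subsequent downdate with σ = -1 and the same W should restore the original factors.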

We develop a sparse version of this algorithm that only accesses and modifies those
entries in L and D which can change. For r = 1, the theory in our rank-1 paper [10]
shows that those columns which can change correspond to the nodes in an elimination
tree on a path starting from the node k associated with the first nonzero element w_k1
in W. For r > 1 we show that the columns of L which can change correspond
to the nodes in a subtree of the elimination tree, and we express this subtree as a
modification of the elimination tree of AAT. Also, we show that with a reordering
of the columns of W, it can be arranged so that in the inner loop where elements in
row p of W are updated, the elements that change are adjacent to each other. The
sparse techniques that we develop lead to sequential access of matrix elements and to
efficient computer memory traffic. These techniques to modify a sparse factorization
have many applications, including the Linear Program Dual Active Set Algorithm
(LP DASA) [20], least-squares problems in statistics, the analysis of electrical circuits
and power systems, structural mechanics, sensitivity analysis in linear programming,
boundary condition changes in partial differential equations, domain decomposition
methods, and boundary element methods (see [19]).
Section 2 describes our notation. In Section 3, we present an algorithm for comput-
ing the symbolic factorization of AAT using multisets, which determines the location
of nonzero entries in L. Sections 4 and 5 describe our multiple-rank symbolic update
and downdate algorithms for finding the nonzero pattern of the new factors. Section 6
describes our algorithm for computing the new numerical values of L and D, for either
an update or downdate. Our experimental results are presented in Section 7.

2. Notation and background. Given the location of the nonzero elements of
AAT, we can perform a symbolic factorization (this terminology is introduced by
George and Liu in [15]) of the matrix to predict the location of the nonzero elements
of the Cholesky factor L. In actuality, some of these predicted nonzeros may be
zero due to numerical cancellation during the factorization process. The statement
"l_ij ≠ 0" will mean that l_ij is symbolically nonzero. The main diagonals of L and
D are always nonzero since the matrices that we factor are positive definite (see [26,
p. 253]). The nonzero pattern of column j of L is denoted C_j:

C_j = {i : l_ij ≠ 0},

while ℒ denotes the collection of patterns:

ℒ = {C_1, C_2, ..., C_m}.












Similarly, A_j denotes the nonzero pattern of column j of A,

A_j = {i : a_ij ≠ 0},

while 𝒜 is the collection of patterns:

𝒜 = {A_1, A_2, ..., A_n}.

The elimination tree can be defined in terms of a parent map π (see [22]). For any
node j, π(j) is the row index of the first nonzero element in column j of L beneath
the diagonal element:

π(j) = min C_j \ {j},

where "min X" denotes the smallest element of X:

min X = min_{i ∈ X} i.

Our convention is that the min of the empty set is zero. Note that j < π(j) except
in the case where the diagonal element in column j is the only nonzero element. The
children of node j is the set of nodes whose parent is j:

{c : j = π(c)}.

The ancestors of a node j, denoted P(j), is the set of successive parents:

P(j) = {j, π(j), π(π(j)), ...}.

Since π(j) > j for each j, the ancestor sequence is finite. The sequence of nodes
j, π(j), π(π(j)), ... forming P(j) is called the path from j to the associated tree
root, the final node on the path. The collection of paths leading to a root form an
elimination tree. The set of all trees is the elimination forest. Typically, there is a
single tree whose root is m; however, if column j of L has only one nonzero element,
the diagonal element, then j will be the root of a separate tree.
The number of elements (or size) of a set X is denoted |X|, while |𝒜| or |ℒ| denote
the sum of the sizes of the sets they contain.
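To make these definitions concrete, here is a small hypothetical Python sketch (the function names, and the example patterns used to test it, are our own illustration, not from the paper) that computes the parent map π and an ancestor path P(j) directly from given column patterns C_j, using the convention min ∅ = 0:

```python
def parent_map(patterns):
    """patterns[j] is the set C_j of (1-based) row indices of column j of L,
    with j itself included.  pi(j) = min(C_j \\ {j}), or 0 for a root."""
    return {j: min(c - {j}) if c - {j} else 0 for j, c in patterns.items()}


def ancestors(pi, j):
    """P(j): the path from node j to the root of its elimination tree."""
    path = [j]
    while pi[path[-1]] != 0:
        path.append(pi[path[-1]])
    return path
```

For example, with hypothetical patterns C_1 = {1,2}, C_2 = {2,6}, C_5 = {5,6}, C_6 = {6,8}, C_8 = {8}, and so on, `ancestors` walks 1 → 2 → 6 → 8 and stops at the root.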
3. Symbolic factorization. For a matrix of the form AAT, the pattern C_j of
column j is the union of the patterns of each column of L whose parent is j and each
column of A whose smallest row index of its nonzero entries is j (see [16, 22]):

(3.1)  C_j = {j} ∪ ( ∪_{c : j = π(c)} C_c \ {c} ) ∪ ( ∪_{min A_k = j} A_k ).

To modify (3.1) during an update or downdate, without recomputing it from
scratch, we need to keep track of how each entry i entered into C_j [10]. For example,
if π(c) changes, we may need to remove a term C_c \ {c}. We cannot simply perform
a set subtraction, since we may remove entries that appear in other terms. To keep
track of how entries enter and leave the set C_j, we maintain a multiset associated
with column j. It has the form

C♮_j = {(i, m(i,j)) : i ∈ C_j},

where the multiplicity m(i,j) is the number of children of j that contain row index i
in their pattern plus the number of columns of A whose smallest entry is j and that
contain row index i. Equivalently, for i ≠ j,

m(i,j) = |{c : j = π(c) and i ∈ C_c}| + |{k : min A_k = j and i ∈ A_k}|.

For i = j, we increment the above equation by one to ensure that the diagonal entries
never disappear during a downdate. The set C_j is obtained from C♮_j by removing the
multiplicities.
We define the addition of a multiset X♮ and a set Y in the following way:

X♮ + Y = {(i, m'(i)) : i ∈ X or i ∈ Y},

where

m'(i) = 1         if i ∉ X and i ∈ Y,
m'(i) = m(i)      if i ∈ X and i ∉ Y,
m'(i) = m(i) + 1  if i ∈ X and i ∈ Y.

Similarly, the subtraction of a set Y from a multiset X♮ is defined by

X♮ - Y = {(i, m'(i)) : i ∈ X and m'(i) > 0},

where

m'(i) = m(i)      if i ∉ Y,
m'(i) = m(i) - 1  if i ∈ Y.

The multiset subtraction of Y from X♮ undoes a prior addition. That is, for any
multiset X♮ and any set Y, we have

((X♮ + Y) - Y) = X♮.

In contrast, ((X ∪ Y) \ Y) is equal to X if and only if X and Y are disjoint sets.
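These two multiset operations can be sketched in a few lines of Python (a hypothetical illustration with our own names `ms_add` and `ms_sub`; a multiset is represented as a dict mapping an entry to its multiplicity):

```python
def ms_add(X, Y):
    """X + Y: increment the multiplicity of each i in the set Y,
    inserting i with multiplicity 1 if it is absent from X."""
    Z = dict(X)
    for i in Y:
        Z[i] = Z.get(i, 0) + 1
    return Z


def ms_sub(X, Y):
    """X - Y: decrement multiplicities for each i in the set Y,
    dropping entries whose multiplicity reaches zero."""
    Z = dict(X)
    for i in Y:
        Z[i] = Z.get(i, 0) - 1
        if Z[i] <= 0:
            del Z[i]
    return Z
```

The key property above is visible here: `ms_sub(ms_add(X, Y), Y)` always returns X, whereas the plain set computation ({1, 2} ∪ {2}) \ {2} = {1} loses the original membership of 2.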
Using multiset addition instead of set union, (3.1) leads to the following algorithm
for computing the symbolic factorization of AAT.

ALGORITHM 2 (Symbolic factorization of AAT, using multisets).
for j = 1 to m do
   C♮_j = {(j, 1)}
   for each c such that j = π(c) do
      C♮_j = C♮_j + (C_c \ {c})
   end for
   for each k where min A_k = j do
      C♮_j = C♮_j + A_k
   end for
   π(j) = min C_j \ {j}
end for
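Algorithm 2 can be sketched in Python as follows (a hypothetical illustration; the name `symbolic_aat` and the dict-based multiset representation are ours, not the paper's). Children of j are discovered as parents are assigned, which works because π(c) > c:

```python
def symbolic_aat(m, A_patterns):
    """Sketch of Algorithm 2: symbolic factorization of A*A^T via multisets.

    A_patterns[k] is the set A_k of (1-based) row indices of column k of A.
    Returns the multisets C[j] (dict: row index -> multiplicity) and the
    parent map pi of the elimination forest (pi[j] == 0 means j is a root).
    """
    C, pi = {}, {}
    children = {j: [] for j in range(1, m + 1)}
    first = {j: [] for j in range(1, m + 1)}   # columns k with min A_k == j
    for k, Ak in enumerate(A_patterns):
        first[min(Ak)].append(k)
    for j in range(1, m + 1):
        Cj = {j: 1}                            # diagonal entry, multiplicity 1
        for c in children[j]:                  # add C_c \ {c} for each child c
            for i in C[c]:
                if i != c:
                    Cj[i] = Cj.get(i, 0) + 1
        for k in first[j]:                     # add A_k for columns starting at j
            for i in A_patterns[k]:
                Cj[i] = Cj.get(i, 0) + 1
        C[j] = Cj
        below = set(Cj) - {j}
        pi[j] = min(below) if below else 0
        if pi[j]:
            children[pi[j]].append(j)
    return C, pi
```

On a small hypothetical example with A columns {1,3}, {2,3}, {3,4}, the resulting patterns are C_1 = {1,3}, C_2 = {2,3}, C_3 = {3,4}, C_4 = {4}, with parents π(1) = π(2) = 3 and π(3) = 4.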











4. Multiple rank symbolic update. We consider how the pattern changes
when AAT is replaced by AAT + WWT. Since

AAT + WWT = [A W][A W]^T,

we can in essence augment A by W in order to evaluate the new pattern of column
j in L. According to (3.1), the new pattern C̄_j of column j of L after the update is

(4.1)  C̄_j = {j} ∪ ( ∪_{c : j = π̄(c)} C̄_c \ {c} ) ∪ ( ∪_{min A_k = j} A_k ) ∪ ( ∪_{min W_i = j} W_i ),

where W_i is the pattern of column i in W. Throughout, we put a bar over a matrix
or a set to denote its new value after the update or downdate.
In the following theorem, we consider a column j of the matrix L, and how its
pattern is modified by the sets W_i. Let C̄♮_j denote the multiset for column j after the
rank-r update or downdate has been applied.
THEOREM 4.1. To compute the new multiset C̄♮_j, initialize C̄♮_j = C♮_j and perform
the following modifications.
Case A: For each i such that j = min W_i, add W_i to the pattern for column
j,

C̄♮_j = C̄♮_j + W_i.

Case B: For each c such that j = π(c) = π̄(c), compute

C̄♮_j = C̄♮_j + (C̄_c \ C_c)

(c is a child of j in both the old and new elimination tree).
Case C: For each c such that j = π̄(c) ≠ π(c), compute

C̄♮_j = C̄♮_j + (C̄_c \ {c})

(c is a child of j in the new tree, but not the old one).
Case D: For each c such that j = π(c) ≠ π̄(c), compute

C̄♮_j = C̄♮_j - (C_c \ {c})

(c is a child of j in the old tree, but not the new one).
Proof. Cases A-D account for all the adjustments we need to make in C♮_j in order
to obtain C̄♮_j. These adjustments are deduced from a comparison of (3.1) with (4.1).
In case A, we simply add in the W_i multisets of (4.1) that do not appear in (3.1). In
case B, node c is a child of node j both before and after the update. In this case, we
must adjust for the deviation between C_c and C̄_c. By [10, Prop. 3.2], after a rank-1
update, C_c ⊆ C̄_c. If w_i denotes the i-th column of W, then

WWT = w_1 w_1^T + w_2 w_2^T + ... + w_r w_r^T.

Hence, updating AAT by WWT is equivalent to r successive rank-1 updates of AAT.
By repeated application of [10, Prop. 3.2], C_c ⊆ C̄_c after a rank-r update of AAT. It
follows that C_c and C̄_c deviate from each other by the set C̄_c \ C_c. Consequently, in
case B we simply add in C̄_c \ C_c.
In case C, node c is a child of j in the new elimination tree, but not in the old
tree. In this case we need to add in the entire set C̄_c \ {c} since the corresponding
term does not appear in (3.1). Similarly, in case D, node c is a child of j in the old
elimination tree, but not in the new tree. In this case, the entire set C_c \ {c} should be
deleted. The case where c is not a child of j in either the old or the new elimination
tree does not result in any adjustment, since the corresponding C_c term is absent from
both (3.1) and (4.1).
An algorithm for updating a Cholesky factorization that is based only on this
theorem would have to visit all nodes j from 1 to m, and consider all possible children
c < j. On the other hand, not all nodes j from 1 to m need to be considered, since not
all columns of L change when AAT is modified. In [10, Thm. 4.1] we show that for
r = 1, the nodes whose patterns can change are contained in P̄(k_1), where we define
k_i = min W_i. For a rank-r update, let P^(i) be the ancestor map associated with the
elimination tree for the Cholesky factorization of the matrix

(4.2)  AAT + Σ_{j=1}^{i} w_j w_j^T.

Again, by [10, Thm. 4.1], the nodes whose patterns can change during the rank-r
update are contained in the union of the paths P^(i)(k_i), 1 ≤ i ≤ r. Although we
could evaluate P^(i)(k_i) for each i, it is expensive to do this directly, since we need to
perform a series of rank-1 updates and evaluate the ancestor map after each of these.
On the other hand, by [10, Prop. 3.1] and [10, Prop. 3.2], P^(i)(j) ⊆ P^(i+1)(j) for each
i and j, from which it follows that P^(i)(k_i) ⊆ P̄(k_i) for each i. Consequently, the
nodes whose patterns change during a rank-r update are contained in the set

T = ∪_{1 ≤ i ≤ r} P̄(k_i).
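The set T can be assembled cheaply because once a walk up the tree reaches a node already placed in T, the rest of its path is already present and need not be retraced. A hypothetical sketch (our own names, using a parent map `pi` with 0 marking a root):

```python
def subtree_T(pi, ks):
    """Union of the ancestor paths P(k_i) for the starting nodes in ks.
    Each walk stops as soon as it reaches a node already in T, so merged
    paths are traversed only once."""
    T = set()
    for k in ks:
        j = k
        while j != 0 and j not in T:
            T.add(j)
            j = pi[j]
    return T
```

With the parent map of the earlier illustration (π(1) = 2, π(2) = 6, π(3) = 4, π(4) = 5, π(5) = 6, π(6) = 8, π(7) = 8), starting nodes {1, 3} yield T = {1, 2, 3, 4, 5, 6, 8}: the second walk adds 3, 4, 5 and stops upon reaching node 6.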
Theorem 4.2, below, shows that any node in T is also contained in one or more
of the sets P^(i)(k_i). From this it follows that the nodes in T are precisely those nodes
for which entries in the associated columns of L can change during a rank-r update.
Before presenting the theorem, we illustrate this with a simple example shown in
Figure 4.1. The left of Figure 4.1 shows the nonzero pattern of the original matrix AAT,
its Cholesky factor L, and the corresponding elimination tree. The nonzero pattern
of the first column of W is W_1 = {1, 2}. If performed as a single rank-1 update, this
causes a modification of columns 1, 2, 6, and 8 of L. The corresponding nodes in the
original tree are encircled; these nodes form the path P^(1)(1) = {1, 2, 6, 8} from node
1 to the root (node 8) in the second tree. The middle of Figure 4.1 shows the matrix
after this rank-1 update, and its factor and elimination tree. The entries in the second
matrix AAT + w_1 w_1^T that differ from the original matrix AAT are shown as small
pluses. The second column of W has the nonzero pattern W_2 = {3, 4, 7}. As a rank-1
update, this affects columns P^(2)(3) = P̄(3) = {3, 4, 5, 6, 7, 8} of L. These columns
form a single path in the final elimination tree shown in the right of the figure.
For the first rank-1 update, the set of columns that actually change are P^(1)(1) =
{1, 2, 6, 8}. This is a subset of the path P̄(1) = {1, 2, 6, 7, 8} in the final tree. If we
use P̄(1) to guide the work associated with column 1 of W, we visit all the columns














FIG. 4.1. Example rank-2 update. Panels (left to right): the nonzero pattern of the
original matrix AAT, of AAT + w_1 w_1^T after the first update, and of
AAT + w_1 w_1^T + w_2 w_2^T after the second update; the original factor L and the
factors after the first and second updates; and the original elimination tree and the
elimination trees after the first and second updates.


that need to be modified, plus column 7. Node 7 is in the set of nodes P̄(3) affected
by the second rank-1 update, however, as shown in the following theorem.
THEOREM 4.2. Each of the paths P^(i)(k_i) is contained in T, and conversely, if
j ∈ T, then j is contained in P^(i)(k_i) for some i.
Proof. Before the theorem, we observed that each of the paths P^(i)(k_i) is contained
in T. Now suppose that some node j lies in the tree T. We need to prove that it is
contained in P^(i)(k_i) for some i. Let s be the largest integer such that P̄(k_s) contains
j and let c be any child of j in T. If c lies on the path P̄(k_i) for some i, then j lies
on the path P̄(k_i) since j is the parent of c. Since j does not lie on the path P̄(k_i)
for any i > s, it follows that c does not lie on the path P̄(k_i) for any i > s. Applying
this same argument recursively, we conclude that none of the nodes on the subtree
of T rooted at j lie on the path P̄(k_i) for any i > s. Let T_j denote the subtree of T
rooted at j. Since P^(i)(k_i) is contained in P̄(k_i) for each i, none of the nodes of T_j
lie on any of the paths P^(i)(k_i) for i > s. By [10, Thm. 4.1], the patterns of all nodes
outside the path P^(i)(k_i) are unchanged for each i. Let C^(i)_c be the pattern of column
c in the Cholesky factorization of (4.2). Since any node c contained in T_j does not lie
on any of the paths P^(i)(k_i) for i > s, C^(i)_c = C^(s)_c for all i ≥ s. Since k_s is a node
of T_j, the path P^(s)(k_s) must include j.
Figure 4.2 depicts a subtree T for an example rank-8 update. The subtree consists
of all those nodes and edges in one or more of the paths P̄(k_1), P̄(k_2), ..., P̄(k_8).
These paths form a subtree, and not a general graph, since they are all paths from
an initial node to the root of the elimination tree of the matrix L̄. The subtree T
might actually be a forest, if L̄ has an elimination forest rather than an elimination
tree. The first nonzero positions in w_1 through w_8 correspond to nodes k_1 through
k_8. For this example, node k_4 happens to lie on the path P^(1)(k_1). Nodes at which
paths first intersect are shown as smaller circles, and are labeled a through f. Other
nodes along the paths are not shown. Each curved arrow denotes a single subpath.
For example, the arrow from nodes b to e denotes the subpath from b to e in P̄(b).
This subpath is denoted as P̄(b, e) in Figure 4.2.
The following algorithm computes the rank-r symbolic update. It keeps track of
an array of m "path-queues," one for each column of L. Each queue contains a set
of path-markers in the range 1 to r, which denote which of the paths P̄(k_1) through
P̄(k_r) will modify column j next. If two paths have merged, only one of the paths
needs to be considered (we arbitrarily select the higher-numbered path to represent
the merged paths). This set of path-queues requires O(m + r) space. Removing and
inserting a path-marker in a path-queue takes O(1) time. The only outputs of the
algorithm are the new pattern of L and its elimination tree, namely, C̄_j and π̄(j) for
all j ∈ [1, m]. Not all columns are affected by the rank-r update. We define C̄_j = C_j
and π̄(j) = π(j) for any node j not in T.
Case C will occur for c and j prior to visiting column π(c), since j = π̄(c) < π(c).
We thus place c in the lost-child-queue of column π(c) when encountering case C
for nodes c and j. When the algorithm visits node π(c), its lost-child-queue will
contain all those nodes for which case D holds. This set of lost-child-queues is not
the same as the set of path-queues (although there is exactly one lost-child-queue and
one path-queue for each column j of L).

ALGORITHM 3 (Symbolic rank-r update, add new matrix W).
Find the starting nodes of each path
for i = 1 to r do
   W_i = {k : w_ki ≠ 0}
   k_i = min W_i
   place path-marker i in path-queue of column k_i
end for
Consider all columns corresponding to nodes in the paths P̄(k_1) through P̄(k_r)
for j = min {k_1, ..., k_r} to m do
   if path-queue of column j is non-empty do
      C̄♮_j = C♮_j
      for each path-marker i on path-queue of column j do
         Path P̄(k_i) includes column j
         Let c be the prior column on this path (if any), where π̄(c) = j
         if j = k_i do
            Case A: j is the first node on the path P̄(k_i), no prior c
            C̄♮_j = C̄♮_j + W_i
         else if j = π(c) then
            Case B: c is an old child of j, possibly changed
            C̄♮_j = C̄♮_j + (C̄_c \ C_c)
         else
            Case C: c is a new child of j and a lost child of π(c)
            C̄♮_j = C̄♮_j + (C̄_c \ {c})
            place c in lost-child-queue of column π(c)
         endif
      end for
      Case D: consider each lost child of j
      for each c in lost-child-queue of column j do
         C̄♮_j = C̄♮_j - (C_c \ {c})
      end for
      Move up one step in the path(s)
      π̄(j) = min C̄_j \ {j}
      if C̄_j \ {j} ≠ ∅ then
         Let i be the largest path-marker in path-queue of column j
         Place path-marker i in path-queue of column π̄(j)
      end if
   end if path-queue of column j non-empty
end for

The optimal time for a general rank-r update is

O( Σ_{j ∈ T} |C̄_j| ).

The actual time taken by Algorithm 3 is only slightly higher, namely,

O( m + Σ_{j ∈ T} |C̄_j| ),

because of the O(m) book-keeping required for the path-queues. In most practical
cases, the O(m) term will not be the dominant term in the run time.
Algorithm 3 can be used to compute an entire symbolic factorization. We start
by factorizing the identity matrix I = IIT into LDLT = IIT. In this case, we have
C♮_j = {(j, 1)} for all j. The initial elimination tree is a forest of m nodes and no
edges. We can now determine the symbolic factorization of I + AAT using the rank-r
symbolic update algorithm above, with r = n. This matrix has identical symbolic
factors as AAT. Case A will apply for each column in A, corresponding to the

∪_{min A_k = j} A_k

term in (3.1). Since π(c) = 0 for each c, cases B and D will not apply. At column j,
case C will apply for all children in the elimination tree, corresponding to the

∪_{c : j = π(c)} C_c \ {c}

term in (3.1). Since duplicate paths are discarded when they merge, we modify
each column j once, for each child c in the elimination tree. This is the same work
performed by the symbolic factorization algorithm, Algorithm 2, which is O(|ℒ|).
Hence, Algorithm 3 is equivalent to Algorithm 2 when we apply it to the update
I + AAT. Its run time is optimal in this case.
5. Multiple rank symbolic downdate. The downdate algorithm is analogous.
The downdated matrix is AAT - WWT, where W is a subset of the columns of A.
In a downdate, P̄(k) ⊆ P(k), and thus rather than following the paths P̄(k_i), we
follow the paths P(k_i). Entries are dropped during a downdate, and thus C̄_j ⊆ C_j
and π(j) ≤ π̄(j). We start with C̄♮_j = C♮_j and make the following changes.
Case A: If j = min W_i for some i, then the pattern W_i is removed from
column j,

C̄♮_j = C̄♮_j - W_i.

Case B: If j = π(c) = π̄(c) for some node c, then c is a child of j in both the
old and new tree. We need to remove from C_j entries in the old pattern C_c
but not in the new pattern C̄_c,

C̄♮_j = C̄♮_j - (C_c \ C̄_c).

Case C: If j = π(c) ≠ π̄(c) for some node c, then c is a child of j in the old
elimination tree, but not the new tree. We compute

C̄♮_j = C̄♮_j - (C_c \ {c}).

Case D: If j = π̄(c) ≠ π(c) for some node c, then c is a child of j in the new
tree, but not the old one. We compute

C̄♮_j = C̄♮_j + (C̄_c \ {c}).

Case C will occur for c and j prior to visiting column π̄(c), since j = π(c) < π̄(c).
We thus place c in the new-child-queue of π̄(c) when encountering case C for nodes c
and j. When the algorithm visits node π̄(c), its new-child-queue will contain all those
nodes for which case D holds.

ALGORITHM 4 (Symbolic rank-r downdate, remove matrix W).
Find the starting nodes of each path
for i = 1 to r do
   W_i = {k : w_ki ≠ 0}
   k_i = min W_i
   place path-marker i in path-queue of column k_i
end for
Consider all columns corresponding to nodes in the paths P(k_1) through P(k_r)
for j = min {k_1, ..., k_r} to m do
   if path-queue of column j is non-empty do
      C̄♮_j = C♮_j
      for each path-marker i on path-queue of column j do
         Path P(k_i) includes column j
         Let c be the prior column on this path (if any), where π(c) = j
         if j = k_i do
            Case A: j is the first node on the path P(k_i), no prior c
            C̄♮_j = C̄♮_j - W_i
         else if j = π̄(c) then
            Case B: c is an old child of j, possibly changed
            C̄♮_j = C̄♮_j - (C_c \ C̄_c)
         else
            Case C: c is a lost child of j and a new child of π̄(c)
            C̄♮_j = C̄♮_j - (C_c \ {c})
            place c in new-child-queue of column π̄(c)
         endif
      end for
      Case D: consider each new child of j
      for each c in new-child-queue of column j do
         C̄♮_j = C̄♮_j + (C̄_c \ {c})
      end for
      Move up one step in the path(s)
      π̄(j) = min C̄_j \ {j}
      if C̄_j \ {j} ≠ ∅ then
         Let i be the largest path-marker in path-queue of column j
         Place path-marker i in path-queue of column π(j)
      end if
   end if path-queue of column j non-empty
end for












The time taken by Algorithm 4 is

O( m + Σ_{j ∈ T} |C_j| ),

which is slightly higher than the optimal time,

O( Σ_{j ∈ T} |C_j| ).

In most practical cases, the O(m) term in the asymptotic run time for Algorithm 4
will not be the dominant term.
6. Multiple rank numerical update and downdate. The following numer-
ical rank-r update/downdate algorithm, Algorithm 5, overwrites L and D with the
updated or downdated factors. The algorithm is based on Algorithm 1, the one-pass
version of Method C1 in [18] presented in Section 1. The algorithm is used after
the symbolic update algorithm (Algorithm 3) has found the subtree T corresponding
to the nodes whose patterns can change, or after the symbolic downdate algorithm
(Algorithm 4) has found T̄. Since the columns of the matrix W can be reordered
without affecting the product WWT, we reorder the columns of W using a depth-
first search [6] of T (or T̄) so that as we march through the tree, consecutive columns
of W are utilized in the computations. This reordering improves the numerical up-
date/downdate algorithm by placing all columns of W that affect any given subpath
next to each other, eliminating an indexing operation. Reordering the columns of
a sparse matrix prior to Cholesky factorization is very common [3, 22, 23, 2.]. It
improves data locality and simplifies the algorithm, just as it does for reordering W
in a multiple-rank update/downdate. The depth-first ordering of the tree changes as
the elimination tree changes, so the columns of W must be reordered for each update
or downdate.
To illustrate this reordering, consider the subtree T in Figure 4.2 for a rank-8
update. If the depth-first-search algorithm visits child subtrees from left to right, the
resulting reordering is as shown in Figure 6.1. Each subpath in Figure 6.1 is labeled
with the range of columns of W that affect that subpath, and with the order in which
the subpath is processed by Algorithm 5. Consider the path from node c to e. In
Figure 4.2, the columns of L corresponding to nodes on this subpath are updated by
columns 2, 8, 3, and 5 of W, in that order. In the reordered subtree (Figure 6.1), the
columns on this subpath are updated by columns 5 through 8 of the reordered W.
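One plausible way to realize this reordering (our own sketch, not the paper's code) is to order the columns of W by the depth-first discovery time of their starting nodes k_i, so that columns whose paths share a branch of T become contiguous:

```python
def dfs_column_order(roots, children, ki):
    """Return a permutation of the column indices of W, sorted by the
    depth-first discovery time of each column's starting node ki[i].

    roots: list of root nodes of T (T may be a forest)
    children: dict mapping node -> list of its children within T
    ki: ki[i] is the starting node of the path for column i of W
    """
    order_stack = list(reversed(roots))
    visit = []                       # nodes of T in depth-first order
    while order_stack:
        j = order_stack.pop()
        visit.append(j)
        # push children in reverse so subtrees are visited left to right
        order_stack.extend(reversed(children.get(j, [])))
    rank = {j: t for t, j in enumerate(visit)}
    return sorted(range(len(ki)), key=lambda i: rank[ki[i]])
```

For a hypothetical tree rooted at 8 with children 8 → {6, 7}, 6 → {2, 5}, 2 → {1}, 5 → {4}, 4 → {3}, the DFS visits 8, 6, 2, 1, 5, 4, 3, 7, so columns starting at nodes 7, 1, 3 are reordered to put the column starting at node 1 first and the one starting at node 7 last.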

ALGORITHM 5 (Sparse numeric rank-r modification, add σWWT).
The columns of W have been reordered.
for i = 1 to r do
   α_i = 1
end for
for each subpath in depth-first-search order in T (σ = 1) or T̄ (σ = -1) do
   Let c1 through c2 be the columns of W that affect this subpath
   for each column j in the subpath do
      for i = c1 to c2 do
         ᾱ = α_i + σ w_ji^2 / d_j













FIG. 6.1. Example rank-8 update after depth-first-search reordering (each subpath
is labeled with the range of columns of W that affect it and the order in which it is
processed).


         d_j = d_j ᾱ
         γ_i = σ w_ji / d_j
         d_j = d_j / α_i
         α_i = ᾱ
      end for
      for all p ∈ C̄_j \ {j} (σ = 1) or p ∈ C_j \ {j} (σ = -1) do
         for i = c1 to c2 do
            w_pi = w_pi - w_ji l_pj
            l_pj = l_pj + γ_i w_pi
         end for
      end for
   end for
end for

The time taken by r rank-1 updates [10] is

(6.1)  O( Σ_{i=1}^{r} Σ_{j ∈ P^(i)(k_i)} |C^(i)_j| ),

where C^(i)_j is the pattern of column j after the i-th rank-1 update. This time is
asymptotically optimal. A single rank-r update cannot determine the paths P^(i)(k_i),
but uses P̄(k_i) instead. Thus, the time taken by Algorithm 5 for a rank-r update is

O( Σ_{i=1}^{r} Σ_{j ∈ P̄(k_i)} |C̄_j| ).

This is slightly higher than (6.1), because P^(i)(k_i) ⊆ P̄(k_i) and C^(i)_j ⊆ C̄_j. Since



TABLE 7.1
Dense matrix performance for 64-by-64 matrices and 64-by-1 vectors

BLAS operation                                    Mflops
DGEMM (matrix-matrix multiply)                     171.6
DGEMV (matrix-vector multiply)                     130.0
DTRSV (solve Lx = b)                                81.5
DAXPY (the vector computation y = αx + y)           78.5
DDOT (the dot product α = x^T y)                    68.7


P^(i)(k_i) ⊆ P̄(k_i), the i-th column of W does not necessarily affect all of the columns
in the path P̄(k_i). If w_i does not affect column j, then w_ji and γ_i will both be zero in
the inner loop in Algorithm 5. An example of this occurs in Figure 4.1, where column
1 of W does not affect column 7 of L. We could check this condition, and reduce the
asymptotic run time to

O( Σ_{i=1}^{r} Σ_{j ∈ P^(i)(k_i)} |C̄_j| ).

In practice, however, we found that the paths P^(i)(k_i) and P̄(k_i) did not differ much.
Including this test did not improve the overall performance of our algorithm. The
time taken by Algorithm 5 for a rank-r downdate is similar, namely,

O( Σ_{i=1}^{r} Σ_{j ∈ P(k_i)} |C_j| ).

The numerical algorithm for updating and downdating LLT is essentially the
same as that for LDLT [4, 24]; the only difference is a diagonal scaling. For either
LLT or LDLT, the symbolic algorithms are identical.
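The diagonal scaling relating the two forms follows from LDL^T = (L D^{1/2})(L D^{1/2})^T: scaling column j of the unit factor by sqrt(d_j) yields the LL^T Cholesky factor. A minimal sketch (our own names; assumes all d_j > 0, as holds for positive definite matrices):

```python
import math


def ldl_to_llt(L, d):
    """Given unit lower-triangular L (nested lists) and diagonal d with
    L*D*L^T symmetric positive definite, return the Cholesky factor G
    with G*G^T = L*D*L^T, via G[p][j] = L[p][j] * sqrt(d[j])."""
    m = len(L)
    return [[L[p][j] * math.sqrt(d[j]) for j in range(m)] for p in range(m)]
```

For example, for M = [[2, 1], [1, 3]] with unit factor L = [[1, 0], [0.5, 1]] and d = [2, 2.5], the scaled factor G satisfies G G^T = M.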
7. Experimental results. To test our methods, we selected the same exper-
iment as in our earlier paper on the single-rank update and downdate [10], which
mimics the behavior of the Linear Programming Dual Active Set Algorithm [20]. The
first matrix is 10^-6 I + A_0 A_0^T, where A_0 consists of 5446 columns from a larger 6071-
by-12,230 matrix B with 35,632 nonzeros arising in an airline scheduling problem
(DFL001) [13]. The 5446 columns correspond to the optimal solution of the linear
programming problem. Starting with an initial LDLT factorization of the matrix
10^-6 I + A_0 A_0^T, we added columns from B (corresponding to an update) until we
obtained the factors of 10^-6 I + BB^T. We then removed columns in a first-in-first-out
order (corresponding to a downdate) until we obtained the original factors. The LP
DASA algorithm would not perform this much work (6784 updates and 6784 down-
dates) to solve this linear programming problem.
Our experiment took place on a Sun Ultra Enterprise running the Solaris 2.6
operating system, with eight UltraSparc-II processors (only one processor was used)
and 2 GB of main memory. The dense matrix performance in millions of
floating-point operations per second (Mflops) of the BLAS [12] is shown in Table 7.1.
All results presented below are for our own codes (except for colmmd, spooles, and
the BLAS) written in the C programming language and using double precision floating
point arithmetic.
We first permuted the rows of B to preserve sparsity in the Cholesky factors of
BBT. This can be done efficiently with colamd [7, 8, 9, 21], which is based on an













TABLE 7.2
Average update and downdate performance results

rank    rank-r time / r (seconds)    Mflops
 r      update    downdate           update   downdate
 1      0.0840    0.0880             30.3     29.6
 2      0.0656    0.0668             38.9     39.0
 3      0.0589    0.0597             43.3     43.6
 4      0.0513    0.0549             49.7     47.5
 5      0.0500    0.0519             51.0     50.2
 6      0.0469    0.0487             54.4     53.5
 7      0.0451    0.0468             56.6     55.7
 8      0.0434    0.0448             58.8     58.2
 9      0.0431    0.0458             59.1     57.0
10      0.0426    0.0447             60.0     58.3
11      0.0415    0.0437             61.5     59.6
12      0.0413    0.0432             61.8     60.3
13      0.0403    0.0424             63.2     61.4
14      0.0402    0.0420             63.6     62.1
15      0.0395    0.0413             64.6     63.1
16      0.0392    0.0408             65.1     63.9



approximate minimum degree ordering algorithm [1]. However, to keep our results
consistent with our prior rank-1 update/downdate paper [10], we used the same per-
mutation as in those experiments (from colmmd [17]). Both colamd and Matlab's
colmmd compute the ordering without forming BB^T explicitly. A symbolic factoriza-
tion of BB^T finds the nonzero counts of each column of the factors. This step takes
an amount of space that is proportional to the number of nonzero entries in B. It
gives us the size of a static data structure to hold the factors during the updating
and downdating process. The numerical factorization of BB^T is not required. A
second symbolic factorization finds the nonzero pattern of the first factor L. An
initial numerical factorization computes the first factors L and D. We used our own
non-supernodal factorization code (similar to SPARSPAK [5, 15]), since the
update/downdate algorithms do not use supernodes. A supernodal factorization code
such as spooles [3] or a multifrontal method [2, 14] can get better performance.
The factorization method used has no impact on the performance of the update and
downdate algorithms.
We ran 16 different experiments, each one using a different rank-r update and
downdate, where r varied from 1 to 16. After each rank-r update, we solved the
sparse linear system LDL^T x = b using a dense right-hand side b. To compare the
performance of a rank-1 update with a rank-r update (r > 1), we divided the run time
of the rank-r update by r. This gives us a normalized time for a single rank-1 update.
The average time and Mflops rate for a normalized rank-1 update and downdate for
the entire experiment is shown in Table 7.2. The time for the update, downdate, or
solve increases as the factors become denser, but the performance in terms of Mflops
is fairly constant for all three operations. The first rank-16 update when the factor
L is sparsest takes 0.47 seconds (0.0294 seconds normalized) and runs at 65.5 Mflops
compared to 65.1 Mflops in Table 7.2 for the average speed of all the rank-16 updates.
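The normalization described above is simple bookkeeping: the rank-r time is divided by r, and the Mflops rate is the operation count divided by the run time. A minimal sketch in C (the function names are ours, not from the paper):

```c
/* Normalized time for a single rank-1 update, given the
 * measured run time of one rank-r update. */
static double normalized_time(double rank_r_seconds, int r)
{
    return rank_r_seconds / r;
}

/* Mflops rate: floating-point operation count divided by
 * one million times the run time in seconds. */
static double mflops(double flop_count, double seconds)
{
    return flop_count / (1e6 * seconds);
}
```

For example, 2.55 million flops per rank-1 update at 0.0840 seconds gives roughly the 30.3 Mflops reported in Table 7.2.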
The performance of each step is summarized in Table 7.3. A rank-5 update takes
about the same time as using the updated factors to solve the sparse linear system
LDL^T x = b, even though the rank-5 update performs 2.6 times the work.
TABLE 7.3
Performance of each step of the experiment

Operation                                      Time (sec)   Mflops   Notes
colamd ordering                                  0.45
Symbolic factorization (of BB^T)                 0.07                 1.49 million nonzeros
Symbolic factorization for first L               0.46                 831 thousand nonzeros
Numeric factorization for first L (our code)    20.07        24.0
Numeric factorization for first L (spooles)     18.10        26.6
Numeric factorization of BB^T (our code)        61.04        18.5     not required
Numeric factorization of BB^T (spooles)         17.80        63.3     not required
Average rank-16 update                           0.63        65.1     compare with rank-1
Average rank-5 update                            0.25        51.0     compare with solve step
Average rank-1 update                            0.084       30.3
Average solve LDL^T x = b                        0.27        18.2

The work, in terms of floating-point operations, varies only slightly as r changes.
With rank-1 updates, the total work for all the updates is 17.293 billion floating-
point operations, or 2.55 million per rank-1 update. With rank-16 updates (the
worst case), the total work increases to 17.318 billion floating-point operations. The
rank-1 downdates take a total of 17.679 billion floating-point operations (2.61 million
per rank-1 downdate), while the rank-16 downdates take a total of 17.691 billion
operations. This confirms the near-optimal operation count of the multiple rank
update/downdate, as compared to the optimal rank-1 update/downdate.
Solving Lx = b when L is sparse and b is dense, and computing the sparse LDL^T
factorization using a non-supernodal method, both give a rather poor computation-
to-memory-reference ratio of only 2/3. We tried the same loop unrolling technique
used in our update/downdate code for our sparse solve and sparse LDL^T factorization
codes, but this resulted in no improvement in performance.
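The poor ratio can be seen in the inner loop of a sparse forward solve. A minimal sketch in C, assuming a compressed-column format with column pointers Lp, row indices Li, and numerical values Lx (the array names are ours, not from the paper; the diagonal entry is assumed stored first in each column):

```c
/* Forward solve L*x = b for a sparse lower triangular matrix L
 * in compressed-column form.  On input x holds b; on output it
 * holds the solution.  Each nonzero below the diagonal costs
 * 2 flops (one multiply, one subtract) against 3 or more memory
 * references (Lx[p], Li[p], and the update of x[Li[p]]),
 * so the computation-to-memory-reference ratio is low. */
static void lsolve(int n, const int *Lp, const int *Li,
                   const double *Lx, double *x)
{
    for (int j = 0; j < n; j++)
    {
        x[j] /= Lx[Lp[j]];               /* divide by diagonal entry */
        for (int p = Lp[j] + 1; p < Lp[j+1]; p++)
        {
            x[Li[p]] -= Lx[p] * x[j];    /* 2 flops per nonzero */
        }
    }
}
```

The exact ratio depends on what is charged to registers, but the loop body is dominated by memory traffic rather than arithmetic, which is why loop unrolling alone does not help.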
A sparse rank-r update or downdate can be implemented in a one-pass algorithm
that has much better memory traffic than that of a series of r rank-1 modifications. In
our numerical experimentation with the DFL001 linear programming test problem, the
rank-r modification was more than twice as fast as r rank-1 modifications for r > 11.
The superior performance of the multiple-rank algorithm can be explained using the
computation-to-memory-reference ratio. If c2 = c1 in Algorithm 5 (a subpath affected
by only one column of W), it can be shown that this ratio is about 4/5 when j is
large. The ratio when c2 = c1 + 15 (a subpath affected by 16 columns of W) is
about 64/35 when j is large. Hence, going from a rank-1 to a rank-16 update
improves the computation-to-memory-reference ratio by a factor of about 2.3 when
column j of L has many nonzeros. By comparison, the level-1 BLAS routines for
dense matrix computations (vector computations such as DAXPY and DDOT) [11]
have computation-to-memory-reference ratios between 2/3 and 1. The level-2 BLAS
(DGEMV and DTRSV, for example) have a ratio of 2.
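The gain comes from amortizing the load and store of each entry of L over all r rank-1 modifications. A hedged sketch of such a fused inner loop for one column j, modeled on the rank-1 recurrences of [10, 18] (the function and variable names, and the dense n-by-r layout of W, are our own simplification of the sparse algorithm; the multipliers gamma[k] and column-j values wj[k] are assumed already computed by the scalar recurrences):

```c
/* Apply r rank-1 column updates to the nonzeros Lx[lo..hi-1] of
 * column j of L (row indices in Li), fused into one pass.  W is
 * n-by-r with column stride n.  Per nonzero: 4r flops against
 * 2r + 3 memory references (Li[p], a load and store of Lx[p],
 * and a load and store of W for each of the r columns).  This
 * gives a ratio of 4/5 for r = 1 and 64/35 for r = 16, as in
 * the text. */
static void fused_column_update(int lo, int hi, const int *Li,
                                double *Lx, double *W, int n, int r,
                                const double *wj, const double *gamma)
{
    for (int p = lo; p < hi; p++)
    {
        int i = Li[p];                             /* 1 index load  */
        double lij = Lx[p];                        /* 1 value load  */
        for (int k = 0; k < r; k++)
        {
            double wi = W[k*n + i] - wj[k] * lij;  /* load W, 2 flops */
            W[k*n + i] = wi;                       /* store W         */
            lij += gamma[k] * wi;                  /* 2 flops         */
        }
        Lx[p] = lij;                               /* 1 value store */
    }
}
```

Applying the r modifications one at a time would instead cost 5 references for every 4 flops, since Li[p] and Lx[p] would be re-read (and Lx[p] re-written) on every pass.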

8. Summary. Because of improved memory traffic, our multiple-rank sparse
update/downdate method is over twice as fast as our prior rank-1 update/downdate
method. The performance of our new method (65.1 Mflops for a sparse rank-16
update) compares favorably with both the dense matrix performance (81.5 Mflops to
solve the dense system Lx = b) and the sparse matrix performance (18.2 Mflops to
solve the sparse system Lx = b and an observed peak numerical factorization of 63.3
Mflops in spooles) on the computer used in our experiments. Although not strictly
optimal, the multiple-rank update/downdate method has nearly the same operation
count as the rank-1 update/downdate method, which has an optimal operation count.














REFERENCES

[1] P. R. AMESTOY, T. A. DAVIS, AND I. S. DUFF, An approximate minimum degree ordering
algorithm, SIAM J. Matrix Anal. Appl., 17 (1996), pp. 886-905.
[2] P. R. AMESTOY AND I. S. DUFF, Vectorization of a multiprocessor multifrontal code, Inter-
nat. J. Supercomputer Appl., 3 (1989), pp. 41-59.
[3] C. ASHCRAFT AND R. G. GRIMES, SPOOLES: an object-oriented sparse matrix library, in
Proc. 1999 SIAM Conf. on Parallel Processing for Sci. Comput., Mar. 1999.
[4] C. H. BISCHOF, C.-T. PAN, AND P. T. P. TANG, A Cholesky up- and downdating algorithm for
systolic and SIMD architectures, SIAM J. Sci. Comput., 14 (1993), pp. 670-676.
[5] E. CHU, A. GEORGE, J. W. H. LIU, AND E. NG, SPARSPAK: Waterloo sparse matrix package,
user's guide for SPARSPAK-A, tech. report, Univ. of Waterloo Dept. of Comp. Sci., Nov.
1984.
[6] T. H. CORMEN, C. E. LEISERSON, AND R. L. RIVEST, Introduction to Algorithms, MIT Press,
Cambridge, Massachusetts, and McGraw-Hill, New York, 1990.
[7] T. A. DAVIS, J. R. GILBERT, S. I. LARIMORE, E. NG, AND B. PEYTON, A column approximate
minimum degree ordering algorithm, in Proc. SIAM Conference on Applied Linear Algebra,
Oct. 1997, p. 29.
[8] T. A. DAVIS, J. R. GILBERT, E. NG, AND B. PEYTON, A column approximate minimum de-
gree ordering algorithm, in Abstracts of the Second SIAM Conference on Sparse Matrices,
Snowbird, Utah, Oct. 1996.
[9] ———, A column approximate minimum degree ordering algorithm, in XIII Householder Symp.
on Numerical Linear Algebra, Switzerland, June 1996.
[10] T. A. DAVIS AND W. W. HAGER, Modifying a sparse Cholesky factorization, SIAM J. Matrix
Anal. Appl., 20 (1999), pp. 606-627.
[11] J. J. DONGARRA, J. R. BUNCH, C. B. MOLER, AND G. W. STEWART, LINPACK Users Guide,
Philadelphia: SIAM Publications, 1978.
[12] J. J. DONGARRA, J. Du CROZ, I. S. DUFF, AND S. HAMMARLING, A set of level-3 basic linear
algebra subprograms, ACM Trans. Math. Softw., 16 (1990), pp. 1-17.
[13] J. J. DONGARRA AND E. GROSSE, Distribution of mathematical software via electronic mail,
Comm. ACM, 30 (1987), pp. 403-407.
[14] I. S. DUFF AND J. K. REID, The multifrontal solution of indefinite sparse symmetric linear
equations, ACM Trans. Math. Softw., 9 (1983), pp. 302-325.
[15] A. GEORGE AND J. W. H. LIU, Computer Solution of Large Sparse Positive Definite Systems,
Englewood Cliffs, New Jersey: Prentice-Hall, 1981.
[16] A. GEORGE, J. W. H. LIU, AND E. NG, A data structure for sparse QR and LU factorizations,
SIAM J. Sci. Statist. Comput., 9 (1988), pp. 100-121.
[17] J. R. GILBERT, C. MOLER, AND R. SCHREIBER, Sparse matrices in MATLAB: design and
implementation, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 333-356.
[18] P. E. GILL, G. H. GOLUB, W. MURRAY, AND M. A. SAUNDERS, Methods for modifying matrix
factorizations, Mathematics of Computation, 28 (1974), pp. 505-535.
[19] W. W. HAGER, Updating the inverse of a matrix, SIAM Review, 31 (1989), pp. 221-239.
[20] ———, The LP dual active set algorithm, in High Performance Algorithms and Software in
Nonlinear Optimization, R. D. Leone, A. Murli, P. M. Pardalos, and G. Toraldo, eds.,
Kluwer, Dordrecht, 1998, pp. 243-254.
[21] S. I. LARIMORE, An approximate minimum degree column ordering algorithm, Tech. Report
TR-98-016, Univ. of Florida, Gainesville, FL, 1998. http://www.cise.ufl.edu/.
[22] J. W. H. LIU, The role of elimination trees in sparse factorization, SIAM J. Matrix Anal.
Appl., 11 (1990), pp. 134-172.
[23] E. G. NG AND B. W. PEYTON, A supernodal Cholesky factorization algorithm for shared-
memory multiprocessors, SIAM J. Sci. Comput., 14 (1993), pp. 761-769.
[24] C.-T. PAN, A modification to the LINPACK downdating algorithm, BIT, 30 (1990), pp. 707-
722.
[25] E. ROTHBERG, A. GUPTA, E. NG, AND B. W. PEYTON, Parallel sparse Cholesky factorization
algorithms for shared-memory multiprocessor systems, in Advances in Computer Methods
for Partial Differential Equations VII, R. Vichnevetsky, D. Knight, and G. Richter, eds.,
IMACS, 1992, pp. 622-628.
[26] G. STRANG, Linear Algebra and Its Applications, Academic Press, New York, 1980.



