Series: Department of Computer and Information Science and Engineering Technical Reports
Title: Row modifications of a sparse Cholesky factorization
Creators: Davis, Timothy A.; Hager, William W.
Publisher: Department of Computer and Information Science and Engineering, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 2003
Abstract. Given a sparse, symmetric positive definite matrix C and an associated sparse Cholesky factorization LDL^T, we develop sparse techniques for updating the factorization after a symmetric modification of a row and column of C. We show how the modification in the Cholesky factorization associated with this rank-2 modification of C can be computed efficiently using a sparse rank-1 technique developed in an earlier paper [SIAM J. Matrix Anal. Appl., 20 (1999), pp. 606-627]. We also determine how the solution of a linear system Lx = b changes after changing a row and column of C or after a rank-r change in C.

Key words. numerical linear algebra, direct methods, Cholesky factorization, sparse matrices, mathematical software, matrix updates.

AMS subject classifications. 65F05, 65F50, 65Y20, 65-04.

1. Introduction. The problem of updating a Cholesky factorization after a
small rank change in the matrix is a fundamental problem with many applications
including optimization algorithms, least-squares problems in statistics, the analysis
of electrical circuits and power systems, structural mechanics, boundary condition
changes in partial differential equations, domain decomposition methods, and bound-
ary element methods [12]. A variety of techniques for modifying a dense Cholesky
factorization are given in the classic reference [11]. Recently [3, 4], we considered a sparse Cholesky factorization LDL^T of a symmetric, positive definite matrix C, and the modification associated with a rank-r change of the form C̄ = C ± WW^T, where W is n-by-r with r typically much less than n. In this paper, we consider a special,
but important, rank-2 change corresponding to a symmetric modification of a row
and column of C. Although we could, in principle, use our previous methodology to
update the factorization, we observe that this rank-2 approach is much less efficient
than the streamlined approach we develop here. In fact, the rank-2 approach could re-
sult in a completely dense modification of the factorization, where nonzero entries are
first introduced and then canceled out. With the new approach, the work connected
with removing a nonzero row and column is comparable to the work associated with
a sparse rank-1 update, while the work associated with adding a new nonzero row and
column is comparable to the work associated with a sparse rank-1 downdate. This
connection between the modification of the matrix and the modification of the fac-
torization is nonintuitive: When we remove elements from the matrix, we update the
factorization; when we add elements to the matrix, we downdate the factorization. As
a by-product of our analysis, we show how the solution to a triangular system Lx = b
changes when both L and b change as a result of the row and column modification

October 21, 2003. This material is based upon work supported by the National Science Foundation under Grant No. CCR-0203270. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
Department of Computer and Information Science and Engineering, University of Florida, PO Box 116120, Gainesville, FL 32611-6120. Phone (352) 392-1481. Fax (352) 392-1220.
Department of Mathematics, University of Florida, PO Box 118105, Gainesville, FL 32611-8105. Phone (352) 392-0281. Fax (352) 392-8357.


problem discussed in this paper, or as a result of a rank-r change to C [3, 4].
One specific application for the techniques developed in this paper is the Linear
Programming Dual Active Set Algorithm (LPDASA) [5, 13, 14, 15, 16, 17]. In this
dual approach to linear programming, inactive inequalities are identified dynamically,
during the solution process; dropping these inactive inequalities amounts to removing
a row and a column from a Cholesky factored matrix. In the same way, when a
dropped inequality later becomes active, a new row and column must be inserted in
the matrix, and the resulting modification in the Cholesky factorization evaluated.
In general, the techniques developed in this paper are useful in any setting where a
system of equations is solved repeatedly, with equations added or dropped before each solve.
A brief overview of our paper follows. In Section 2 we consider the row addition
problem, in which a row and column of a dense matrix C, originally zero except for
the diagonal element, are modified. Section 3 extends this method to a sparse matrix
C. The row deletion problem is the opposite of row addition, and is discussed in
Sections 4 and 5 (the dense and sparse cases, respectively). Sections 6 and 7 describe
how to make arbitrary modifications to a row and column of C. The efficient methods
presented in Sections 2 through 7 are contrasted with performing the modifications
as a rank-2 outer product modification in Section 8, which is shown to be costly,
particularly in the sparse case. Section 9 shows how to efficiently modify the solution
to Lx = b when L and b change. A brief presentation of the experimental performance
of these methods in the context of matrices arising in linear programming is given in
Section 10.
We use the notation C̄ to denote the matrix C after it has been modified. Bold uppercase A refers to a matrix. Bold lowercase a is a column vector; thus, a^T always refers to a row vector. Plain lowercase letters (such as a and α) are scalars. We use |A| to denote the number of nonzero entries in the sparse matrix A. Without parentheses, the notation A_i or A_{ij} refers to a submatrix of a matrix A (sometimes a 1-by-1 submatrix). We use parentheses (A)_{ij} to refer to the entry in row i and column j of the matrix A, (A)_{*j} to refer to all of column j of A, and (A)_{i*} to refer to all of row i of A. When counting floating-point operations, we count one flop for any arithmetic operation, including *, /, +, -, and √.

2. Adding a row and column to C. Modifying the kth row and column of a matrix C can be considered in two different contexts. If we have a rectangular n-by-m matrix A, and C = αI + AA^T, then modifying row k of A leads to changes in the kth row and column of C. In the more general context, we consider arbitrary changes to the kth row and column of a symmetric positive definite matrix C. We will first consider the special case where row k is initially zero and becomes nonzero in A (the row addition case). Alternatively, the kth row and column of C is initially a multiple of the kth row and column of the identity matrix, and changes to some other value. Let ā_2^T be the new nonzero kth row of Ā,

A = \begin{bmatrix} A_1 \\ 0^T \\ A_3 \end{bmatrix},
\qquad
\bar{A} = \begin{bmatrix} A_1 \\ \bar{a}_2^T \\ A_3 \end{bmatrix},

where 0^T is a row vector whose entries are all 0. This leads to a modification to row and column k of C,

C = \begin{bmatrix}
\alpha I + A_1 A_1^T & 0 & A_1 A_3^T \\
0^T & \alpha & 0^T \\
A_3 A_1^T & 0 & \alpha I + A_3 A_3^T
\end{bmatrix}
= \begin{bmatrix}
C_{11} & 0 & C_{31}^T \\
0^T & c_{22} & 0^T \\
C_{31} & 0 & C_{33}
\end{bmatrix},

where c_{22} = α. We can let α be zero; even though C would no longer be positive definite, the submatrix excluding row and column k would still be positive definite. The linear system Cx = b is well-defined in this case, except for x_k. The new matrix C̄ is given as

\bar{C} = \begin{bmatrix}
\alpha I + A_1 A_1^T & A_1 \bar{a}_2 & A_1 A_3^T \\
\bar{a}_2^T A_1^T & \alpha + \bar{a}_2^T \bar{a}_2 & \bar{a}_2^T A_3^T \\
A_3 \bar{a}_2 & A_3 \bar{a}_2 \;\text{(transposed above)} & \alpha I + A_3 A_3^T
\end{bmatrix}
= \begin{bmatrix}
C_{11} & \bar{c}_{12} & C_{31}^T \\
\bar{c}_{12}^T & \bar{c}_{22} & \bar{c}_{32}^T \\
C_{31} & \bar{c}_{32} & C_{33}
\end{bmatrix}.
Thus, adding a row k to Ā is equivalent to adding a row and column k to C̄. Note that changing row and column k of C from zero (except for the diagonal entry) to a nonzero value can also be viewed as increasing the dimension of the (n-1)-by-(n-1) matrix

\begin{bmatrix} C_{11} & C_{31}^T \\ C_{31} & C_{33} \end{bmatrix}.

The prior factorization of the n-by-n matrix C is given as

LDL^T =
\begin{bmatrix} L_{11} & & \\ 0^T & 1 & \\ L_{31} & 0 & L_{33} \end{bmatrix}
\begin{bmatrix} D_1 & & \\ & \alpha & \\ & & D_3 \end{bmatrix}
\begin{bmatrix} L_{11}^T & 0 & L_{31}^T \\ & 1 & 0^T \\ & & L_{33}^T \end{bmatrix}
=
\begin{bmatrix} C_{11} & 0 & C_{31}^T \\ 0^T & \alpha & 0^T \\ C_{31} & 0 & C_{33} \end{bmatrix},
which leads to the four equations,

L_{11} D_1 L_{11}^T = C_{11},
d_2 = c_{22} = \alpha,
L_{31} D_1 L_{11}^T = C_{31},
(2.1) \quad L_{31} D_1 L_{31}^T + L_{33} D_3 L_{33}^T = C_{33}.

After adding row and column k to obtain C̄, we have the factorization

(2.2) \quad \bar{L}\bar{D}\bar{L}^T =
\begin{bmatrix} L_{11} & & \\ \bar{l}_{12}^T & 1 & \\ L_{31} & \bar{l}_{32} & \bar{L}_{33} \end{bmatrix}
\begin{bmatrix} D_1 & & \\ & \bar{d}_2 & \\ & & \bar{D}_3 \end{bmatrix}
\begin{bmatrix} L_{11}^T & \bar{l}_{12} & L_{31}^T \\ & 1 & \bar{l}_{32}^T \\ & & \bar{L}_{33}^T \end{bmatrix}
= \bar{C}.

Note that L_{11} and L_{31} do not change as a result of modifying row and column k of C. From (2.2), the relevant equations are

(2.3) \quad L_{11} D_1 \bar{l}_{12} = \bar{c}_{12},
\bar{l}_{12}^T D_1 \bar{l}_{12} + \bar{d}_2 = \bar{c}_{22},
L_{31} D_1 \bar{l}_{12} + \bar{l}_{32} \bar{d}_2 = \bar{c}_{32},
(2.4) \quad L_{31} D_1 L_{31}^T + \bar{l}_{32} \bar{d}_2 \bar{l}_{32}^T + \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = C_{33}.

Let w = \bar{l}_{32} \sqrt{\bar{d}_2}. Combining (2.4) with the original equation (2.1), we obtain

\bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T
= (C_{33} - L_{31} D_1 L_{31}^T) - \bar{l}_{32} \bar{d}_2 \bar{l}_{32}^T
= L_{33} D_3 L_{33}^T - w w^T.

The factorization of L_{33} D_3 L_{33}^T - w w^T can be computed as a rank-1 downdate of the original factorization L_{33} D_3 L_{33}^T [3]. This derivation leads to Algorithm 1 for computing the modified factorization \bar{L}\bar{D}\bar{L}^T:

Algorithm 1 (Row addition)
1. Solve the lower triangular system L_{11} D_1 \bar{l}_{12} = \bar{c}_{12} for \bar{l}_{12}.
2. \bar{d}_2 = \bar{c}_{22} - \bar{l}_{12}^T D_1 \bar{l}_{12}
3. \bar{l}_{32} = (\bar{c}_{32} - L_{31} D_1 \bar{l}_{12}) / \bar{d}_2
4. w = \bar{l}_{32} \sqrt{\bar{d}_2}
5. Perform the rank-1 downdate \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = L_{33} D_3 L_{33}^T - w w^T
end Algorithm 1

Step (1) of Algorithm 1 requires the solution of a unit lower triangular system L_{11} y = \bar{c}_{12} of order k-1. The computation of y takes (k-1)^2 - (k-1) flops, and computing \bar{l}_{12} = D_1^{-1} y takes another k-1 flops. Step (2) requires 2(k-1) flops, reusing y. Step (3) is the matrix-vector multiply L_{31} y, where L_{31} is (n-k)-by-(k-1), and thus takes 2(n-k)(k-1) operations, and k-1 more to divide by \bar{d}_2. Step (4) requires one square root operation and n-k multiplications. Finally, the rank-1 downdate of step (5) takes 2(n-k)^2 + 4(n-k) operations using method C1 of [11] (see [4]). For the dense case, the total number of floating-point operations performed by Algorithm 1 is 2n^2 + 3n + k^2 - (2nk + 2k + 1). This is roughly 2n^2 when k = 1, n^2 when k = n, and (5/4)n^2 when k = n/2.
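To make the dense case concrete, the following Python sketch (our own illustration, not the authors' code; `ldl`, `rank1_downdate`, and `row_add` are hypothetical names, and plain lists stand in for a sparse data structure) carries out steps (1)-(5), with method C1 used for the rank-1 downdate:

```python
def ldl(C):
    """Dense LDL^T factorization (no pivoting): C = L * diag(D) * L^T
    with unit lower triangular L."""
    n = len(C)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = C[j][j] - sum(L[j][t] ** 2 * D[t] for t in range(j))
        for i in range(j + 1, n):
            L[i][j] = (C[i][j] - sum(L[i][t] * L[j][t] * D[t]
                                     for t in range(j))) / D[j]
    return L, D

def rank1_downdate(L, D, w):
    """Method C1: overwrite (L, D) with the factorization of
    L * diag(D) * L^T - w * w^T."""
    n = len(D)
    a = -1.0                       # -1 for a downdate, +1 for an update
    for j in range(n):
        p = w[j]
        dnew = D[j] + a * p * p
        beta = a * p / dnew
        a = a * D[j] / dnew
        D[j] = dnew
        for i in range(j + 1, n):
            w[i] -= p * L[i][j]
            L[i][j] += beta * w[i]

def row_add(L, D, k, c12, c22, c32):
    """Algorithm 1: row/column k of C was alpha*e_k; it becomes
    (c12, c22, c32).  L and D are modified in place."""
    n = len(D)
    # Step 1: forward solve L11 * y = c12, where y = D1 * l12.
    y = list(c12)
    for j in range(k):
        for i in range(j + 1, k):
            y[i] -= L[i][j] * y[j]
    l12 = [y[j] / D[j] for j in range(k)]
    # Step 2: d2 = c22 - l12^T D1 l12  (reuses y).
    d2 = c22 - sum(l12[j] * y[j] for j in range(k))
    # Step 3: l32 = (c32 - L31 D1 l12) / d2.
    l32 = [(c32[i] - sum(L[k + 1 + i][j] * y[j] for j in range(k))) / d2
           for i in range(n - k - 1)]
    for j in range(k):
        L[k][j] = l12[j]
    D[k] = d2
    for i in range(n - k - 1):
        L[k + 1 + i][k] = l32[i]
    # Steps 4-5: downdate the (33)-block with w = l32 * sqrt(d2).
    w = [v * d2 ** 0.5 for v in l32]
    m = n - k - 1
    L33 = [[L[k + 1 + i][k + 1 + j] for j in range(m)] for i in range(m)]
    D33 = D[k + 1:]
    rank1_downdate(L33, D33, w)
    for i in range(m):
        D[k + 1 + i] = D33[i]
        for j in range(i):
            L[k + 1 + i][k + 1 + j] = L33[i][j]
```

A quick way to check such a sketch is to compare it against a from-scratch factorization of the modified matrix, which agrees to rounding error.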

3. Sparse row addition. If C is sparse, each step of Algorithm 1 must operate
on sparse matrices. The graph algorithms and data structures must efficiently support
each step. We will assume that L is stored in a compressed column vector form, where
the row indices in each column are sorted in ascending order. This is the same data
structure used in [3, 4], except that the algorithms presented there do not require sorted row indices, but they do require the integer multiplicity of each nonzero entry of L to support an efficient symbolic downdate operation. The algorithm discussed below will not require the multiplicities.
Maintaining the row indices in sorted order requires a merge operation for the
set union computation to determine the new nonzero patterns of the columns of L,
rather than a simpler unsorted set union used in [3, 4]. It has no effect on asymptotic
complexity, and little effect on the run time. Although more work is required to
maintain the row indices in sorted order, time is gained elsewhere in the algorithm.
Operating on columns in sorted order in the forward solve of Lx = b, for example, is
faster than operating on a matrix with jumbled columns.
Step (1) of Algorithm 1 solves the lower triangular system Lily = c12, where all
three terms in this system are sparse. Gilbert and Peierls have shown how to solve


this system optimally, in time proportional to the number of floating-point operations
required [9, 10]. We review their method here.
Consider the n-by-n system Lx = b, where L is lower triangular and both L and
b are sparse. The solution will be sparse, and the total work may be less than O(n).
We cannot use a conventional algorithm that iterates over each column of L and skips
those for which x is zero, since the work involved will then be at least n. Instead, the
nonzero pattern of x must first be computed, and then the corresponding columns of
L can be used to compute x in time proportional to the floating-point work required.
Let G_L be a graph with n nodes and with a directed edge from node j to node i if and only if l_{ij} is nonzero. Gilbert and Peierls show that x_j is nonzero (ignoring numerical cancellation) if and only if there is a path of length zero or more from some node i, where b_i ≠ 0, to node j in the graph G_L. Computing the pattern of x requires
a graph traversal, starting from the nodes corresponding to the nonzero pattern of b.
It can be done in time proportional to the number of edges traversed. Each of these
edges is a nonzero in L that takes part in the subsequent numerical computation.
The above result holds for any lower triangular matrix L. In our case, L arises from a Cholesky factorization, and has an elimination tree [18, 19]. The elimination tree of L has n nodes. The parent of node j in the elimination tree is the smallest index i > j for which l_{ij} ≠ 0; node j is a root if there is no such i. Since the nonzero pattern of (L)_{*j} is a subset of its path to the root of the elimination tree [20], all the nodes in G_L that can be reached from node j correspond to the path from node j to the root of the elimination tree. Traversing the paths in the tree, starting at nodes corresponding to nonzero entries in b, takes time proportional to the number of nonzero entries in x. A general graph traversal of G_L is not required. Step (1) of Algorithm 1 takes

O\left( \sum_{(\bar{l}_{12})_j \neq 0} |(L_{11})_{*j}| \right)

time to compute the nonzero pattern and numerical values of both y and \bar{l}_{12}. The insertion of the nonzero entries of \bar{l}_{12} into the data structure of L is performed in conjunction with step (3).
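The elimination-tree traversal and the resulting sparse forward solve can be sketched as follows (our illustration, not the authors' implementation; we assume each column j stores its below-diagonal entries as a list `Lcols[j]` of (row, value) pairs and that `parent` holds the elimination tree):

```python
def reach_via_etree(parent, b_pattern):
    """Nonzero pattern of x in Lx = b: the union of the tree paths from
    each nonzero of b to the root.  parent[j] is the etree parent of j
    (-1 at a root).  Ascending index order is a valid elimination order,
    since the parent of j always has a larger index than j."""
    seen = set()
    for i in b_pattern:
        j = i
        while j != -1 and j not in seen:
            seen.add(j)
            j = parent[j]
    return sorted(seen)

def sparse_lower_solve(Lcols, parent, b):
    """Solve Lx = b for unit lower triangular sparse L; b is a dict
    {i: b_i}, and the result holds only the reached entries of x."""
    x = dict(b)
    for j in reach_via_etree(parent, b):
        xj = x.get(j, 0.0)
        for i, lij in Lcols[j]:        # columns outside the reach do no work
            x[i] = x.get(i, 0.0) - lij * xj
    return x
```

The work is proportional to the entries in the reached columns only, matching the time bound above.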
Step (2) is a scaled dot product operation, and can be computed in time proportional to the number of nonzero entries in \bar{l}_{12}.
Step (3) is a matrix-vector multiply operation. It accesses the same columns of L used by the sparse lower triangular solve, namely, each column j for which the jth entry in \bar{l}_{12} is nonzero. These same columns need to be modified, by shifting entries in L_{31} down by one, and inserting the new entries in \bar{l}_{12}, the kth row of L. No other columns in the range 1 to k-1 need to be accessed or modified by steps (1) through (3). When step (3) completes, the new column k of L needs to be inserted into the data structure. This can be done in one of two ways. In the general case, we can store the columns themselves in a non-contiguous manner, and simply allocate new space for this column. A similar strategy can be used for any columns 1 through k-1 of L that outgrow their originally allocated space, with no increase in asymptotic run time. Alternatively, we may know an a priori upper bound on the size of each column of L after all row additions have been performed. In this case, a simpler static allocation strategy is possible. This latter case occurs in our use of the row addition algorithm in LPDASA, our target application [5]. In either case, the time to insert \bar{l}_{12} into the data structure and to compute \bar{l}_{32} in step (3) is

O\left( \sum_{(\bar{l}_{12})_j \neq 0} |(L_{31})_{*j}| \right).

Step (4) is a simple scalar-times-vector operation. The total time for steps (1) through (4) is

O\left( \sum_{(\bar{l}_{12})_j \neq 0} |(L)_{*j}| \right).
Step (5) almost fits the specifications of the sparse rank-1 modification in [3], but with one interesting twist. The original kth row and column of C are zero, except for the placeholder diagonal entry α. The new row and column only add entries to C, and thus the nonzero pattern of the original factor L is a subset of the nonzero pattern of \bar{L} (ignoring numerical cancellation). The rank-1 modification in Algorithm 1 is a symbolic update (new nonzero entries are added, not removed) and a numeric downdate L_{33} D_3 L_{33}^T - w w^T. Since the multiplicities used in [3] are needed only for a subsequent symbolic downdate, they are not required by the row addition algorithm. They would be required by a row deletion algorithm that maintains a strict nonzero pattern of L; this issue is addressed in Section 5.
The rank-1 modification to obtain the factorization \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T takes time proportional to the number of nonzero entries in L_{33} that change. The columns that change correspond to the path from node k to the root of the elimination tree of \bar{L}. This path is denoted P in [3]. At each node j along the path P, at most four floating-point operations are performed for each nonzero entry in (L)_{*j}.
With our choice of data structures, exploitation of the elimination tree, and the rank-1 modification from [3], the total time taken by Algorithm 1 is proportional to the total number of nonzero entries in columns corresponding to nonzero entries in \bar{l}_{12}, to compute steps (1) through (4), plus the time required for the rank-1 modification in step (5). The total time is

O\left( \sum_{(\bar{l}_{12})_j \neq 0} |(L)_{*j}| + \sum_{j \in P} |(L)_{*j}| \right).
This time includes all data structure manipulations, sparsity pattern computation,
and graph algorithms required to implement the algorithm. It is identical to the total
number of floating-point operations required, and thus Algorithm 1 is optimal. In
the sparse case, if every column j takes part in the computation, the time is O(|L|).
Normally, not all columns will be affected by the sparse row addition. If the new row
and column of L are very sparse, only a few columns take part in the computation.
4. Row deletion. By deleting a row and column k from the matrix C, we mean setting the entire row and column to zero, except for the diagonal entry (C̄)_{kk}, which is set to α. Prior to deleting row and column k, we have the original factorization

LDL^T =
\begin{bmatrix} L_{11} & & \\ l_{12}^T & 1 & \\ L_{31} & l_{32} & L_{33} \end{bmatrix}
\begin{bmatrix} D_1 & & \\ & d_2 & \\ & & D_3 \end{bmatrix}
\begin{bmatrix} L_{11}^T & l_{12} & L_{31}^T \\ & 1 & l_{32}^T \\ & & L_{33}^T \end{bmatrix}
=
\begin{bmatrix} C_{11} & c_{12} & C_{31}^T \\ c_{12}^T & c_{22} & c_{32}^T \\ C_{31} & c_{32} & C_{33} \end{bmatrix}.

After deleting row and column k, we have

\bar{L}\bar{D}\bar{L}^T =
\begin{bmatrix} L_{11} & & \\ 0^T & 1 & \\ L_{31} & 0 & \bar{L}_{33} \end{bmatrix}
\begin{bmatrix} D_1 & & \\ & \alpha & \\ & & \bar{D}_3 \end{bmatrix}
\begin{bmatrix} L_{11}^T & 0 & L_{31}^T \\ & 1 & 0^T \\ & & \bar{L}_{33}^T \end{bmatrix}
=
\begin{bmatrix} C_{11} & 0 & C_{31}^T \\ 0^T & \alpha & 0^T \\ C_{31} & 0 & C_{33} \end{bmatrix}.

Thus we only need to set row and column k of L to zero, set the diagonal entry to α, and compute \bar{L}_{33} and \bar{D}_3. The original factorization is

L_{33} D_3 L_{33}^T = C_{33} - L_{31} D_1 L_{31}^T - l_{32} d_2 l_{32}^T,

while the new factorization is given as

\bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = C_{33} - L_{31} D_1 L_{31}^T.

Combining these two equations, we have a numeric rank-1 update,

(4.1) \quad \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = L_{33} D_3 L_{33}^T + w w^T,

where w = l_{32} \sqrt{d_2}. Algorithm 2 gives the complete row deletion algorithm:

Algorithm 2 (Row deletion)
1. \bar{l}_{12} = 0
2. \bar{d}_2 = \alpha
3. \bar{l}_{32} = 0
4. w = l_{32} \sqrt{d_2}
5. Perform the rank-1 update \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = L_{33} D_3 L_{33}^T + w w^T
end Algorithm 2

When C is dense, the number of floating-point operations performed by Algorithm 2 is 2(n-k)^2 + 5(n-k) + 1 (the same as steps (4) and (5) of Algorithm 1). This is roughly 2n^2 when k = 1, and (1/2)n^2 when k = n/2. No work is required when k = n.
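A dense Python sketch of Algorithm 2 (ours; names hypothetical) mirrors the row addition code: zero out row and column k of the factor, set the diagonal to α, and apply a method-C1 rank-1 update to the trailing (33)-block. The `ldl` helper is included only so the sketch is self-contained.

```python
def ldl(C):
    """Dense LDL^T factorization with unit lower triangular L."""
    n = len(C)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for j in range(n):
        D[j] = C[j][j] - sum(L[j][t] ** 2 * D[t] for t in range(j))
        for i in range(j + 1, n):
            L[i][j] = (C[i][j] - sum(L[i][t] * L[j][t] * D[t]
                                     for t in range(j))) / D[j]
    return L, D

def rank1_update_trailing(L, D, w, s):
    """Method C1: factor L33*diag(D3)*L33^T + w*w^T in place, where the
    (33)-block occupies rows/columns s..n-1 and w has length n-s."""
    n = len(D)
    a = 1.0
    for j in range(s, n):
        p = w[j - s]
        dnew = D[j] + a * p * p
        beta = a * p / dnew
        a = a * D[j] / dnew
        D[j] = dnew
        for i in range(j + 1, n):
            w[i - s] -= p * L[i][j]
            L[i][j] += beta * w[i - s]

def row_delete(L, D, k, alpha):
    """Algorithm 2: set row/column k of the factored matrix to alpha*e_k."""
    n = len(D)
    w = [L[i][k] * D[k] ** 0.5 for i in range(k + 1, n)]  # w = l32*sqrt(d2)
    for j in range(k):
        L[k][j] = 0.0            # step 1: l12 = 0
    D[k] = alpha                 # step 2: d2 = alpha
    for i in range(k + 1, n):
        L[i][k] = 0.0            # step 3: l32 = 0
    rank1_update_trailing(L, D, w, k + 1)   # steps 4-5
```

Reassembling L·diag(D)·L^T after the deletion should reproduce C with its kth row and column replaced by α·e_k.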

5. Sparse row deletion. When row and column k of C are "deleted" (set to
zero) to obtain C, no new nonzero terms will appear in L, and some nonzero entries
in L may become zero. We refer to the deletion of entries in L as a symbolic downdate
[3]. The symbolic downdate is combined with a numeric rank-1 update, because of
the addition of wwT in (4.1).
We cannot simply delete entries from L that become numerically zero. An entry
in L can only be removed if it becomes symbolically zero (that is, its value is zero
regardless of the assignment of numerical values to the nonzero pattern of C). If
entries are zero because of exact numerical cancellation, and are dropped from the
data structure of L, then the elimination tree no longer characterizes the structure


of the matrix. If the elimination tree is no longer valid, subsequent updates and
downdates will not be able to determine which columns of L must be modified.
Steps (1) through (3) of Algorithm 2 require no numerical work, but do require some data structure modifications. All of the entries in row and column k of L become symbolically zero and can be removed. If L is stored by columns, it is trivial in step (3) to immediately delete all entries in column l_{32}. On the other hand, the statement \bar{l}_{12} = 0 in step (1) is less obvious. Each nonzero element in row k lies in a different column. We must either set these values to zero, delete them from the data structure, or flag row k as zero and require any subsequent algorithm that accesses the matrix L to ignore flagged rows. The latter option would lead to a sparse row deletion algorithm with optimal run time, but complicates all other algorithms in our application and increases their run time. We choose to search for the row-k entries in each column in which they appear and delete them from the data structure.
To set \bar{l}_{12} to zero and delete the entries from the data structure for L requires a scan of all the columns j of L for which the jth entry of l_{12} is nonzero. The time taken for this operation is asymptotically bounded by the time taken for steps (1) through (3) of sparse row addition (Algorithm 1), but the bound is not tight. The nonzero pattern of row k of L can easily be found from the elimination tree. Finding the row index k in the columns takes less time than step (1) of Algorithm 1, since a binary search can be used. Deleting the entries takes time equivalent to step (3) of Algorithm 1, since we maintain each column with sorted row indices.
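For instance (our sketch, hypothetical names), with each column held as parallel sorted-row-index and value lists, locating and deleting the row-k entry is a binary search plus a deletion:

```python
from bisect import bisect_left

def delete_row_entry(rows, vals, k):
    """rows: sorted row indices of one column of L; vals: the matching
    numerical values.  Delete the entry in row k if present; return
    True if an entry was removed."""
    p = bisect_left(rows, k)          # O(log |column|) search
    if p < len(rows) and rows[p] == k:
        del rows[p]
        del vals[p]
        return True
    return False
```

This routine would be called once per column in which row k has a nonzero entry, a pattern obtainable from the elimination tree as described above.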
The immediate removal of entries in L33 can be done using the symbolic rank-1
downdate presented in [3]. However, this requires an additional array of multiplicities,
which is one additional integer value for each nonzero in the matrix. Instead, we can
allow these entries to become numerically zero (or very small values due to numerical
roundoff) and not remove them immediately. Since they become numerically zero
(or tiny), they can simply remain in the data structure for the matrix, and have no
effect on subsequent operations that use the matrix L. The entries can be pruned later on by a complete symbolic factorization, taking O(|L|) time [6, 7, 8]. If this is done rarely, the overall run time of the application that uses the sparse row deletion algorithm will not be affected adversely.
The asymptotic run time of our sparse row deletion algorithm is the same as
sparse row addition (or less, because of the binary search), even though sparse row
addition requires more numerical work. This is non-optimal, but no worse than sparse
row addition, whose run time is optimal.

6. Arbitrary row modification. It is possible to generalize our row deletion and addition algorithms to handle the case where the kth row and column of C is neither originally zero (the row addition case) nor set to zero (the row deletion case), but is changed arbitrarily. Any change of this form can be handled as a row deletion followed by a row addition, but the question remains as to whether or not it can be done faster as a single step. In this section we show that no floating-point operations are saved in a single-pass row modification algorithm, as compared to the row deletion + row addition approach, if the change in the kth row and column is arbitrary.
The original matrix C and the new matrix C̄ are

(6.1) \quad
C = \begin{bmatrix} C_{11} & c_{12} & C_{31}^T \\ c_{12}^T & c_{22} & c_{32}^T \\ C_{31} & c_{32} & C_{33} \end{bmatrix}
\quad\text{and}\quad
\bar{C} = \begin{bmatrix} C_{11} & \bar{c}_{12} & C_{31}^T \\ \bar{c}_{12}^T & \bar{c}_{22} & \bar{c}_{32}^T \\ C_{31} & \bar{c}_{32} & C_{33} \end{bmatrix}.


Computing \bar{l}_{12}, \bar{d}_2, and \bar{l}_{32} is the same as in the row addition algorithm, and takes exactly the same number of floating-point operations. The original factorization of the (33)-block is

L_{33} D_3 L_{33}^T = C_{33} - L_{31} D_1 L_{31}^T - l_{32} d_2 l_{32}^T,

while the new factorization is

\bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = C_{33} - L_{31} D_1 L_{31}^T - \bar{l}_{32} \bar{d}_2 \bar{l}_{32}^T.

These can be combined into a rank-1 update and rank-1 downdate,

(6.2) \quad \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T = L_{33} D_3 L_{33}^T + w_1 w_1^T - w_2 w_2^T,

where w_1 = l_{32} \sqrt{d_2} and w_2 = \bar{l}_{32} \sqrt{\bar{d}_2}. The multiple rank update/downdate presented in [4] cannot perform a simultaneous update and downdate, but if the sparse downdate (removal of entries that become symbolically zero) is not performed, it would be possible to compute the simultaneous update and downdate (6.2) in a single pass. This may result in some time savings, since the data structure for L is scanned once, not twice. No floating-point work would be saved, however. The total floating-point operation count of the row modification algorithm is identical to a row deletion followed by a row addition.
7. Making sparse changes to a sparse row. Suppose we constrain the modifications to the kth row and column of C. This assumption can reduce the total amount of floating-point work required to compute the modified factorization \bar{L}\bar{D}\bar{L}^T, as shown below.
More precisely, consider the case where only a few entries in row and column k of C are changed. Let

\Delta c_{12} = \bar{c}_{12} - c_{12},
\Delta c_{22} = \bar{c}_{22} - c_{22},
\Delta c_{32} = \bar{c}_{32} - c_{32},

and assume that the change in the kth row and column is much sparser than the original kth row and column of C (for example, |\Delta c_{12}| \ll |c_{12}|).
If we consider (2.3) and its analog for the original matrix C, we have

(7.1) \quad L_{11} D_1 l_{12} = c_{12},
(7.2) \quad L_{11} D_1 \bar{l}_{12} = \bar{c}_{12}.

Combining these two equations, we have

L_{11} D_1 \Delta l_{12} = \Delta c_{12},

where \Delta l_{12} = \bar{l}_{12} - l_{12}. Since \Delta c_{12} is sparse, the solution to this lower triangular system will also likely be sparse. It can be solved in time proportional to the time required to multiply L_{11} times \Delta l_{12}, or

O\left( \sum_{(\Delta l_{12})_j \neq 0} |(L_{11})_{*j}| \right)

[9, 10]. We can then compute \bar{l}_{12} = l_{12} + \Delta l_{12}. This approach for computing \bar{l}_{12} can be much faster than solving (7.2) directly, which would take

O\left( \sum_{(\bar{l}_{12})_j \neq 0} |(L_{11})_{*j}| \right)

time. Computing \bar{d}_2 can be done in time proportional to |\Delta l_{12}|, using the following formula that modifies the dot product computation in Algorithm 1,

\bar{d}_2 = d_2 + \Delta d_2
= d_2 + \Delta c_{22}
- \sum_{(\Delta l_{12})_j \neq 0} (\Delta l_{12})_j \left( (\bar{l}_{12})_j + (l_{12})_j \right) (D_1)_{jj}.

Similarly, the kth column of \bar{L} can be computed as

\bar{l}_{32} = \left( \Delta c_{32} + l_{32} d_2 - L_{31} D_1 \Delta l_{12} \right) / \bar{d}_2.

The key component in this computation is the sparse matrix-vector multiplication L_{31} D_1 \Delta l_{12}, which takes less time to compute than the corresponding computation L_{31} D_1 \bar{l}_{12} in step (3) of Algorithm 1.
The remaining work for the rank-1 update/downdate of \bar{L}_{33} \bar{D}_3 \bar{L}_{33}^T is identical to the arbitrary row modification (6.2). If k is small, it is likely that this update/downdate computation will take much more time than the computation of \bar{l}_{12}, \bar{d}_2, and \bar{l}_{32}. If k is large, however, a significant reduction in the total amount of work can be obtained by exploiting sparsity in the change in the row and column of C.
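The Δ-based computation of the new l12 can be sketched as follows (our illustration; dense storage for brevity, whereas a sparse implementation would only touch columns reached by a nonzero of Δc12):

```python
def delta_l12_update(L, D, k, l12, dc12):
    """Given the old l12 (solving L11 D1 l12 = c12) and a sparse change
    dc12 = {j: delta} to c12, return the new l12 without re-solving the
    full system: solve L11 D1 (delta l12) = delta c12 and add."""
    y = [0.0] * k                  # y = D1 * (delta l12)
    for j, v in dc12.items():
        y[j] += v
    for j in range(k):             # forward solve with L11
        if y[j] != 0.0:            # skip columns the change never reaches
            for i in range(j + 1, k):
                y[i] -= L[i][j] * y[j]
    return [l12[j] + y[j] / D[j] for j in range(k)]
```

The `if y[j] != 0.0` guard is where the savings come from: only columns in the reach of the nonzero pattern of Δc12 do any work.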

8. Row modifications as a rank-2 outer-product. Modifying row and column k of a symmetric matrix C can be written as a rank-2 modification C̄ = C + w_1 w_1^T - w_2 w_2^T. Suppose C and C̄ are as given in (6.1). Let

d = \begin{bmatrix} \bar{c}_{12} - c_{12} \\ (\bar{c}_{22} - c_{22})/2 \\ \bar{c}_{32} - c_{32} \end{bmatrix}.

Let e_k be the kth column of the identity. Then \bar{C} = C + d e_k^T + e_k d^T. This can be put into the form \bar{C} = C + w_1 w_1^T - w_2 w_2^T using the following relationship (note that e, below, is an arbitrary column vector, not necessarily e_k). Given d and e ∈ R^n, define

p = \frac{d}{\|d\|} + \frac{e}{\|e\|}
\quad\text{and}\quad
q = \frac{d}{\|d\|} - \frac{e}{\|e\|}.

Then we have

(8.1) \quad d e^T + e d^T = \frac{\|d\| \, \|e\|}{2} \left( p p^T - q q^T \right).

In our case, e = e_k and \|e\| = 1. Defining

w_1 = \sqrt{\frac{\|d\|}{2}} \left( \frac{d}{\|d\|} + e_k \right)
\quad\text{and}\quad
w_2 = \sqrt{\frac{\|d\|}{2}} \left( \frac{d}{\|d\|} - e_k \right),

it follows from (8.1) that

(8.2) \quad \bar{C} = C + w_1 w_1^T - w_2 w_2^T.
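The identity (8.1) with e = e_k is easy to check numerically; a small sketch (ours, with a hypothetical name) that builds w1 and w2 from d and k:

```python
import math

def rank2_vectors(d, k):
    """Return (w1, w2) with d*ek^T + ek*d^T == w1*w1^T - w2*w2^T,
    following (8.2) with e = ek (so ||e|| = 1)."""
    n = len(d)
    nd = math.sqrt(sum(v * v for v in d))   # ||d||
    s = math.sqrt(nd / 2.0)
    e = [1.0 if i == k else 0.0 for i in range(n)]
    w1 = [s * (d[i] / nd + e[i]) for i in range(n)]
    w2 = [s * (d[i] / nd - e[i]) for i in range(n)]
    return w1, w2
```

Expanding w1*w1^T - w2*w2^T term by term reproduces d*ek^T + ek*d^T exactly, which is the content of (8.1).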




In the dense case, computing (8.2) using a rank-1 update and rank-1 downdate takes 4n^2 + 11n floating-point operations, independent of k (including the work to compute w_1 and w_2). If we use Algorithms 1 and 2 to make an arbitrary change in the kth row and column of C, the total work is 4n^2 + 8n + 3k^2 - (6nk + 7k), which is roughly 4n^2 when k = 1, n^2 + n when k = n, and (7/4)n^2 when k = n/2. Using the rank-2 update/downdate method to modify row and column k of C can thus take up to four times the work compared to the row addition/deletion presented here.
If the modification requires only the addition or deletion of row and column k, then only Algorithm 1 or 2 needs to be used, but the entire rank-2 update/downdate method presented in this section is still required (about 4n^2 work, independent of k). Considering only the quadratic terms, Algorithm 1 performs 2n^2 + k^2 - 2nk operations, and Algorithm 2 performs 2(n-k)^2 operations. The row modification methods require much less work, particularly for the row deletion case when k ≈ n, in which the work drops from 4n^2 for the rank-2 update/downdate method to nearly no work at all.
In the sparse case, the differences in work and memory usage can be extreme. Both the rank-1 update and downdate could affect the entire matrix L, and could cause catastrophic intermediate fill-in. Consider the row addition case when k ≈ n, and the new row and column of C is dense. The factorization of C̄ will still be fairly sparse, since the large submatrix C_{11} does not change, and remains very sparse. The factor L_{11} can have as few as O(n) entries. However, after one rank-1 update, the (11)-block of C + w_1 w_1^T is completely dense, since w_1 is a dense column vector. After the rank-1 downdate (with -w_2 w_2^T), massive cancellation occurs, and the factorization of C_{11} is restored to its original sparse nonzero pattern. But the damage has been done, since we require O(n^2) memory to hold the intermediate factorization. This is infeasible in a sparse matrix algorithm. The memory problem could be solved if a single-pass rank-2 update/downdate algorithm were to be used, but even then, the total work required would be O(n^2), which is much more than the O(|L|) time required for Algorithm 1 in this case. The same problem occurs if the downdate with -w_2 w_2^T is applied first.
9. Modifying a lower triangular system. We now consider a related operation that can be performed efficiently at the same time that we modify the Cholesky factorization. Suppose we have a linear system Cx = b, and the system is modified, either by changing a row and column of C as discussed above, or due to a low-rank change C̄ = C ± WW^T as discussed in [3, 4]. After the factorization is modified, the new factorization \bar{L}\bar{D}\bar{L}^T = \bar{C} will normally be used to solve a modified linear system, \bar{C}\bar{x} = \bar{b}. The complete solution \bar{x} will likely be different in every component because of the backsolve, but a significant reduction of work can be obtained in the forward solve of the lower triangular system \bar{L}\bar{x} = \bar{b}. We thus focus only on this lower triangular system, not the complete system.
First, consider the simpler case of a low-rank change of a dense matrix. As shown below, it takes double the work to modify the solution instead of computing it from scratch, but the dense matrix method can be used for submatrices in the sparse case, resulting in a significant reduction in work. Suppose we have the system Lx = b, including its solution x, and the new linear system is \bar{L}\bar{x} = \bar{b}. Combining these two equations gives

\bar{L}\bar{x} = \bar{b} = \bar{b} - b + Lx = \Delta b + Lx,

where \Delta b = \bar{b} - b. Suppose we are given \bar{L}, x, and \Delta b. The matrix \bar{L} is computed


column-by-column by either the low-rank update/downdate algorithm in [3, 4], or the
row and column modification algorithm discussed in this paper. We can combine the
modification of \bar{L} with the computation of \bar{x}, as shown in Algorithm 3:

Algorithm 3 (Dense modification of Lx = b to solve \bar{L}\bar{x} = \bar{b})
\bar{x} = \Delta b
for j = 1 to n do
    \bar{x} = \bar{x} + (L)_{*j} x_j
    compute the new column (\bar{L})_{*j}
    \bar{x}_{j+1...n} = \bar{x}_{j+1...n} - (\bar{L})_{j+1...n,j} \bar{x}_j
end for
end Algorithm 3

The total work to modify the solution to \bar{L}\bar{x} = \bar{b} is roughly 2n^2, as compared to n^2 work to solve \bar{L}\bar{x} = \bar{b} from scratch. One would never use this method if L and b are dense.
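A dense Python sketch of Algorithm 3 (ours; `new_cols[j]` stands in for whatever routine produces column j of the new factor during the update):

```python
def modify_solve(L, x, db, new_cols):
    """Given the old factor L (list of rows, unit diagonal), the old
    solution x of Lx = b, db = bbar - b, and the columns of the new
    factor Lbar, return xbar solving Lbar * xbar = bbar."""
    n = len(x)
    xb = list(db)                    # xbar = delta b
    for j in range(n):
        for i in range(j, n):        # xbar = xbar + (L)_{*j} * x_j
            xb[i] += L[i][j] * x[j]
        col = new_cols[j]            # "compute" the new column of Lbar
        for i in range(j + 1, n):    # eliminate with the new column
            xb[i] -= col[i] * xb[j]
    return xb
```

After the loop adds column j of L, component j of the running vector equals Δb_j + (Lx)_j minus the eliminations already performed, which is exactly the finished x̄_j; the subsequent subtraction then uses the new column, interleaving the solve with the factor update.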
Now consider the sparse case. Suppose we have a low-rank sparse update of L.
The columns that change in L correspond to a single path P in the elimination tree
for a rank-1 update. Every entry in these specific columns is modified. For a rank-r
update, where r > 1, the columns that change correspond to a set of paths in the
elimination tree, which we will also refer to as P in this more general case. In both
cases, the nonzero pattern of each column j in the path P is a subset of the path [20].
We can thus partition the matrices L and L into two parts, according to the set P.
The original system is

\begin{bmatrix} L_{11} & 0 \\ L_{21} & L_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ b_2 \end{bmatrix},

where the matrix L_{22} consists of all the rows and columns corresponding to nodes in the path P. If the changes in the right-hand side b are also constrained to the set P, the new linear system is

\begin{bmatrix} L_{11} & 0 \\ L_{21} & \bar{L}_{22} \end{bmatrix}
\begin{bmatrix} x_1 \\ \bar{x}_2 \end{bmatrix}
=
\begin{bmatrix} b_1 \\ \bar{b}_2 \end{bmatrix}.

Note that L11, L12 and bl do not change, and thus xl does not change. We have
made a sparse change to both the matrix L and the right-hand-side. The solution to
the subsystem Llxl = bl does not change. We have

L21xl + L22x2 = b2
L21x + L22X2 = b2.

We can apply Algorithm 3 to the subsystem

L22x2 = Ab2 + L22x22,

where Δb2 = b̄2 - b2, to obtain the new solution x̄2. Algorithm 3 takes 4|L22| + O(|P|)
floating-point operations to compute x̄2. An additional 4r|L22| + O(|P|) floating-point
operations are required to update L22. Solving L̄x̄ = b̄ from scratch after L is updated
takes 2|L̄| floating-point operations. If the path P is short, making a sparse
modification to the old solution x to obtain the new solution x̄ during the update
of L takes much less work than solving the new linear system after L is updated.
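As an illustrative sketch (Python/NumPy, hypothetical data; the trailing k-by-k block plays the role of the path P), the subsystem trick can be checked numerically: x1 is reused unchanged and only the small trailing block is re-solved.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 8, 3                                        # k = size of the block for P
L = np.tril(rng.random((n, n)), -1) + np.eye(n)    # unit lower triangular
b = rng.random(n)
x = np.linalg.solve(L, b)

# hypothetical update confined to the path: only L22 and b2 change
Lbar, bbar = L.copy(), b.copy()
Lbar[n-k:, n-k:] = np.tril(rng.random((k, k)), -1) + np.eye(k)
bbar[n-k:] += rng.random(k)

# x1 is untouched; re-solve only the subsystem Lbar22 xbar2 = db2 + L22 x2
db2 = bbar[n-k:] - b[n-k:]
rhs2 = db2 + L[n-k:, n-k:] @ x[n-k:]
xbar2 = np.linalg.solve(Lbar[n-k:, n-k:], rhs2)
xbar = np.concatenate([x[:n-k], xbar2])            # solves Lbar @ xbar = bbar
```

In the sparse code the small solve would itself be performed by Algorithm 3 restricted to the L22 block, costing work proportional to |L22| rather than |L|.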


Finally, consider the case discussed in this paper of modifying row and column k
of C. If we add row k, the original lower triangular system is

          [ L11  0  0   ] [ x1 ]   [ b1 ]
(9.1)     [ 0^T  1  0   ] [ x2 ] = [ b2 ].
          [ L31  0  L33 ] [ x3 ]   [ b3 ]

The new lower triangular system is

          [ L11    0    0    ] [ x1  ]   [ b1  ]
(9.2)     [ l12^T  1    0    ] [ x̄2 ] = [ b̄2 ],
          [ L31    l32  L̄33 ] [ x̄3 ]   [ b̄3 ]

where we assume b1 does not change, and the entries that change in b3 are a subset
of the path P. Let Δb3 = b̄3 - b3. The term x̄2 can be computed as

    x̄2 = b̄2 - l12^T x1.

The (3,3)-blocks of (9.1) and (9.2) give the equations

          L33 x3 = b3 - L31 x1,
(9.3)     L̄33 x̄3 = (b3 - L31 x1) + (Δb3 - l32 x̄2).
The change in the right-hand side of this system is Δb3 - l32 x̄2. Since the nonzero
pattern of l32 is a subset of P, this computation fits the same requirements for the
sparse change in x due to a sparse low-rank change in L and b, discussed above. The
row deletion case is analogous.
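A small numerical sketch of the row addition update (Python/NumPy, hypothetical data; for simplicity the (3,3)-block is left unchanged by the addition, i.e. L̄33 = L33, which is a special case of (9.2)-(9.3)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 7, 3                          # row/column k (0-based) is being added
L = np.tril(rng.random((n, n)), -1) + np.eye(n)
L[k, :] = 0.0                        # before the addition, row k of L is the
L[:, k] = 0.0                        # identity row and column k is zero below
L[k, k] = 1.0                        # the diagonal, as in (9.1)
b = rng.random(n)
x = np.linalg.solve(L, b)

# the addition fills in l12^T and l32 as in (9.2); b2 and b3 also change
Lbar, bbar = L.copy(), b.copy()
Lbar[k, :k] = rng.random(k)          # l12^T
Lbar[k+1:, k] = rng.random(n-k-1)    # l32
bbar[k] = rng.random()               # new b2
bbar[k+1:] += rng.random(n-k-1)      # change in b3, confined to the path

x1 = x[:k]                                   # unchanged
xbar2 = bbar[k] - Lbar[k, :k] @ x1           # xbar2 = bbar2 - l12^T x1
db3 = bbar[k+1:] - b[k+1:]
rhs3 = (b[k+1:] - L[k+1:, :k] @ x1) + (db3 - Lbar[k+1:, k] * xbar2)
xbar3 = np.linalg.solve(Lbar[k+1:, k+1:], rhs3)   # the (9.3) solve
xbar = np.concatenate([x1, [xbar2], xbar3])       # solves Lbar @ xbar = bbar
```

Only the scalar x̄2 and the trailing block x̄3 are recomputed; the leading block x1 carries over from the old solution.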
The number of floating-point operations in Algorithm 3 to modify the solution to
Lx = b is roughly the same as a rank-1 update to compute the modified L. However,
the run time is not doubled compared to a stand-alone rank-1 update. At step j,
for each j in the path P, the rank-1 update must read column j of L, modify it,
and then write the jth column of L back into memory. No extra memory traffic to
access L is required to compute the solution to the lower triangular system using
(9.3). With current technology, memory traffic can consume more time than floating
point operations. Thus, when modifying the kth row and column of C, or performing
a rank-r update/downdate to C, we can update the solution to the lower triangular
system at almost no extra cost. In our target application [5], solving the linear system
Cx = b, given its LDL^T factorization, can often be the dominant step. The method
presented here can cut this time almost in half, since the forward solve time is
virtually eliminated, leaving us with the time for the upper triangular backsolve.
10. Experimental results. To illustrate the performance of the algorithms
developed in this paper in a specific application, Table 10.1 gives the flops associated
with the solution of the four largest problems in the Netlib LP test set using the Linear
Programming Dual Active Set Algorithm (LPDASA [5]). The problems passed to the
solver were first simplified using the ILOG CPLEX [2] version 7.0 presolve routine.
An LP presolver [1] preprocesses the problem by removing redundant constraints and
variables whose values can be easily determined. Hence, the number of rows n of the
problems listed in Table 10.1 is much smaller than the number of rows in the original
Netlib problems.
The problems are displayed according to the average density of the lower trian-
gular L with the densest problems at the top of the table. The density measures the


fraction of the elements beneath the diagonal that are nonzero. The density varies
from 0.097 for DFL001 down to 0.00011 for OSA60. The column labeled "forward solve"
gives the average number of flops that would have been required if forward solves were
done in a conventional way, by a forward elimination process. As the LP is solved,
the sparsity of L changes slightly due to changes in the current basis. Hence, we give
the average flops needed for a conventional solve of Lx = b, which can be compared
with the average flops in modifying the solution using the technique developed in
this paper. The columns labeled "column update" and "row deletion" partition the
update flops into those associated with a change of the form C + WWT and those
associated with the deletion of a row and column from C.
In Table 10.1 we also give the average rank of the update. Since we have developed
[4] a multiple rank approach for performing column updates, the average ranks listed
in column 4 are all greater than 1; they are near 1 for the densest problem DFL001 and
near 60 for the sparsest problem OSA60. Since the row deletion algorithm developed
in this paper is a single rank process, the ranks listed in column 5 are all 1.
We also list the flops associated with modifying L. For the column updates, this is
the flop count for a typical rank-r update, where r is the average rank of the updates
as given in the table. Recall that the column update requires about 4r flops per
entry in L that changes. Modifying the forward solve takes 4 flops per entry in L that
changes. The average flops associated with the modification of L is thus always greater
than the flops associated with the forward solve update, especially for multiple rank
column updates. The conventional forward solve requires 2 flops for each nonzero
entry in L, whether it changes or not. In the worst case, when all of the entries
in L change, modifying the forward solve takes no more than twice the work of the
conventional forward solve, and a column update takes no more than 2r times the
work of the conventional forward solve.
When the matrix L is very sparse (such as OSA60), both the column update and
row deletion methods modify only a small portion of the matrix. The elimination tree
tends to be short and wide, with short paths from any node to the root. Thus, for
very sparse problems the conventional forward solve (which accesses all of L) takes
much more work than either the column update or the row deletion. For the OSA60

TABLE 10.1
Experimental results

Problem           Performance         Forward        Column         Row
                                      solve          updates        deletions
DFL001            number                             2629           87
 n: 3881          avg. mod. rank                     1.2            1
 density: .097    avg. mod. flops                    2.43 x 10^6    1.64 x 10^6
                  avg. solve flops    1.46 x 10^6    2.03 x 10^6    1.64 x 10^6
PDS20             number                             1736           65
 n: 10214         avg. mod. rank                     3.5            1
 density: .011    avg. mod. flops                    3.37 x 10^6    2.13 x 10^6
                  avg. solve flops    1.11 x 10^6    1.21 x 10^6    2.12 x 10^6
KEN18             number                             2799           23
 n: 39856         avg. mod. rank                     15.6           1
 density: .00012  avg. mod. flops                    61.7 x 10^3    2242
                  avg. solve flops    195.8 x 10^3   7155           2075
OSA60             number                             71             277
 n: 10209         avg. mod. rank                     58.9           1
 density: .00011  avg. mod. flops                    4415           62
                  avg. solve flops    11.0 x 10^3    270            37


matrix, the conventional forward solve takes about 40 times the work compared to
modifying the forward solve during a column update.
For a matrix L that is fairly dense (such as DFL001), the elimination tree tends
to be tall and thin, and the column updates or row deletions modify most of the entries
in L. The work required to modify the forward solve is more on average than a full
forward solve. However, since the update of the forward solve can be done at the
same time as the update of the factorization, this single pass over L is typically faster
than the two pass algorithm, even though the flops are slightly higher on average.

11. Summary. We have presented a method for modifying the sparse Cholesky
factorization of a symmetric, positive definite matrix C after a row and column of
C have been modified. One algorithm, the sparse row addition, is optimal. The
corresponding row deletion algorithm is not optimal, but takes no more time than the
row addition. Although changing a row and column of C can be cast as a rank-2 change
of the form C ± WW^T, the latter is impractical in a sparse context. We have shown
how to modify the solution to a lower triangular system Lx = b when the matrix L
changes as a result of either the row addition/deletion operation discussed here, or
an update/downdate of the form C ± WW^T described in our previous papers [3, 4].
By postponing the symbolic downdate, the memory usage is also reduced (assuming
8-byte floating-point values and 4-byte integers), compared with the column
update/downdate methods described in [3, 4]. Together, the row addition/deletion
algorithms and the column update/downdate algorithms form a useful suite of tools
for modifying a sparse Cholesky factorization and for solving a sparse system of
linear equations. Using these algorithms, the linear programming solver LPDASA
is able to achieve an overall performance that rivals, and sometimes exceeds, the
performance of current commercially available solvers [5].


REFERENCES

[1] E. D. ANDERSEN AND K. D. ANDERSEN, Presolving in linear programming, Math. Program.,
71 (1995), pp. 221-245.
[2] R. E. BIXBY, Progress in linear programming, ORSA J. on Computing, 6 (1994), pp. 15-22.
[3] T. A. DAVIS AND W. W. HAGER, Modifying a sparse Cholesky factorization, SIAM J. Matrix
Anal. Appl., 20 (1999), pp. 606-627.
[4] T. A. DAVIS AND W. W. HAGER, Multiple-rank modifications of a sparse Cholesky
factorization, SIAM J. Matrix Anal. Appl., 22 (2001), pp. 997-1013.
[5] T. A. DAVIS AND W. W. HAGER, A sparse multilevel implementation of the LP dual active
set algorithm, tech. report, Dept. of Mathematics, Univ. of Florida, Gainesville, FL 32611;
also available as CISE tech. report TR-03-014, July 26, 2003. Submitted to Math. Program.
[6] S. C. EISENSTAT, M. C. GURSKY, M. H. SCHULTZ, AND A. H. SHERMAN, Yale sparse matrix
package, I: The symmetric codes, Internat. J. Numer. Methods Engrg., 18 (1982), pp. 1145-1151.
[7] A. GEORGE AND J. W. H. LIU, An optimal algorithm for symbolic factorization of symmetric
matrices, SIAM J. Computing, 9 (1980), pp. 583-593.
[8] A. GEORGE AND J. W. H. LIU, Computer Solution of Large Sparse Positive Definite Systems,
Prentice-Hall, Englewood Cliffs, NJ, 1981.
[9] J. R. GILBERT, Predicting structure in sparse matrix computations, SIAM J. Matrix Anal.
Appl., 15 (1994), pp. 62-79.
[10] J. R. GILBERT AND T. PEIERLS, Sparse partial pivoting in time proportional to arithmetic
operations, SIAM J. Sci. Statist. Comput., 9 (1988), pp. 862-874.
[11] P. E. GILL, G. H. GOLUB, W. MURRAY, AND M. A. SAUNDERS, Methods for modifying matrix
factorizations, Math. Comp., 28 (1974), pp. 505-535.
[12] W. W. HAGER, Updating the inverse of a matrix, SIAM Rev., 31 (1989), pp. 221-239.


[13] W. W. HAGER, The dual active set algorithm, in Advances in Optimization and Parallel
Computing, P. M. Pardalos, ed., North Holland, Amsterdam, 1992, pp. 137-142.
[14] W. W. HAGER, The LP dual active set algorithm, in High Performance Algorithms and
Software in Nonlinear Optimization, R. D. Leone, A. Murli, P. M. Pardalos, and G. Toraldo,
eds., Kluwer, Dordrecht, 1998, pp. 243-254.
[15] W. W. HAGER, The dual active set algorithm and its application to linear programming,
Comput. Optim. Appl., 21 (2002).
[16] W. W. HAGER, The dual active set algorithm and the iterative solution of linear programs,
in Novel Approaches to Hard Discrete Optimization, P. M. Pardalos and H. Wolkowicz, eds.,
Kluwer Academic Publishers, 2003, pp. 95-107.
[17] W. W. HAGER AND D. W. HEARN, Application of the dual active set algorithm to quadratic
network optimization, Comput. Optim. Appl., 1 (1993), pp. 349-373.
[18] J. W. H. LIU, A compact row storage scheme for Cholesky factors using elimination trees,
ACM Trans. Math. Software, 12 (1986), pp. 127-148.
[19] J. W. H. LIU, The role of elimination trees in sparse factorization, SIAM J. Matrix Anal.
Appl., 11 (1990), pp. 134-172.
[20] R. SCHREIBER, A new implementation of sparse Gaussian elimination, ACM Trans. Math.
Software, 8 (1982), pp. 256-276.
