Algorithm 8xx: CHOLMOD, supernodal sparse
Cholesky factorization and update/downdate*
Yanqing Chen, Timothy A. Davis, William W. Hager,
and Sivasankaran Rajamanickam
September 8, 2006
Technical report TR-2006-005, CISE Dept., Univ. of Florida, Gainesville, FL
Abstract
CHOLMOD is a set of routines for factorizing sparse symmetric positive definite
matrices of the form A or AA^T, updating/downdating a sparse Cholesky factorization,
solving linear systems, updating/downdating the solution to the triangular system
Lx = b, and many other sparse matrix functions for both symmetric and unsymmetric
matrices. Its supernodal Cholesky factorization relies on LAPACK and the Level-3
BLAS, and obtains a substantial fraction of the peak performance of the BLAS. Both
real and complex matrices are supported. CHOLMOD is written in ANSI/ISO C, with
both C and MATLAB interfaces. It appears in MATLAB 7.2 as x=A\b when A is
sparse symmetric positive definite, as well as in several other sparse matrix functions.
1 Overview
Methods for the direct solution of a sparse linear system Ax = b rely on matrix
factorizations. When A is symmetric positive definite, sparse Cholesky factorization is
typically used. Supernodal and multifrontal methods are among those that can exploit
dense matrix kernels (the BLAS [26, 14, 13]) and can thus achieve high performance on
modern computers. For more background on direct methods for sparse matrices, see
[7, 16, 15]. Supernodal methods are discussed in [3, 12, 24, 29, 32, 33].
*This work was supported by the National Science Foundation, under grants 0203270 and 0620286.
†Dept. of Computer and Information Science and Engineering, Univ. of Florida, Gainesville, FL, USA.
email: ycl@cise.ufl.edu.
‡Dept. of Computer and Information Science and Engineering, Univ. of Florida, Gainesville, FL, USA.
email: davis@cise.ufl.edu. http://www.cise.ufl.edu/~davis.
§Dept. of Mathematics, Univ. of Florida, Gainesville, FL, USA. email: hager@math.ufl.edu.
http://www.math.ufl.edu/~hager.
¶Dept. of Computer and Information Science and Engineering, Univ. of Florida, Gainesville, FL, USA.
CHOLMOD is a package that provides sparse Cholesky factorization methods and
related sparse matrix functions. It includes both a supernodal method and a
non-supernodal up-looking method [6] that does not exploit the BLAS. If the original
matrix A is replaced by A ± WW^T, where W is n-by-k with k << n, the package can
update or downdate the factorization while exploiting dynamic supernodes. An update
or downdate of a conventional supernodal structure is not feasible, since changes in the
nonzero pattern of L cause supernodes to merge and split apart. Reorganizing a
conventional supernodal structure of L would dominate the run time of the
update/downdate. Dynamic supernodes, in contrast, are detected and exploited as the
algorithm progresses. For a rank-1 update/downdate, the running time is proportional
to the number of entries in L that change; the time for a rank-k update/downdate is
proportional to the time for k separate rank-1 updates/downdates [9, 10]. CHOLMOD
also exploits dynamic supernodes in the triangular solves, and can update/downdate the
corresponding solution to Lx = b after updating/downdating L itself. A detailed
discussion of how CHOLMOD uses dynamic supernodes is given in [11].
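The arithmetic of a rank-1 update or downdate can be illustrated with the classical dense algorithm (see [19] for background). The NumPy sketch below shows what is being computed; it is not CHOLMOD's sparse, dynamic-supernodal implementation, which performs the same arithmetic on an LDL^T factor while touching only the entries of L that change.

```python
import numpy as np

def chol_update(L, w, sigma=+1.0):
    """Rank-1 update (sigma=+1) or downdate (sigma=-1) of a dense
    Cholesky factor: returns L' with L' L'^T = L L^T + sigma * w w^T.
    Classical O(n^2) algorithm, column by column."""
    L = L.copy()
    w = w.astype(float).copy()
    n = L.shape[0]
    for k in range(n):
        r = np.sqrt(L[k, k]**2 + sigma * w[k]**2)
        c, s = r / L[k, k], w[k] / L[k, k]
        L[k, k] = r
        # rotate the rest of column k against the workspace vector w
        L[k+1:, k] = (L[k+1:, k] + sigma * s * w[k+1:]) / c
        w[k+1:] = c * w[k+1:] - s * L[k+1:, k]
    return L

A = np.array([[4.0, 2.0], [2.0, 5.0]])
L = np.linalg.cholesky(A)          # A = L L^T
w = np.array([1.0, 1.0])
L_new = chol_update(L, w, +1.0)    # factor of A + w w^T, without refactorizing
```

A downdate with the same w (sigma = -1) undoes the update, which is a convenient sanity check on the algorithm.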
Section 2 summarizes the features in CHOLMOD. The performance of CHOLMOD's
sparse Cholesky factorization and nested dissection ordering methods is given in
Section 3. Section 4 explains how to obtain the code and the packages it relies on.
2 Features
CHOLMOD is composed of a set of modules, each of which defines a set of objects and/or
operations on those objects. CHOLMOD's modules (Core, Cholesky, Check, Demo,
MATLAB, MatrixOps, Modify, Partition, and Supernodal) are described below. In addition,
CHOLMOD includes a full user guide, a Makefile, and an exhaustive test suite that
exercises 100% of the statements in the code.
2.1 Core Module
The Core Module defines the five basic objects that CHOLMOD supports:
1. cholmod_sparse: a sparse matrix in compressed sparse column form, either symmetric
or unsymmetric (the latter may be rectangular). In the symmetric case, either the
upper or lower triangular part is stored. Most of CHOLMOD's functions operate on
cholmod_sparse matrices.
2. cholmod_triplet: a sparse matrix in triplet form (a list of nonzero values with their
row and column indices). This is a flexible data structure for the user to generate.
Only a few functions are provided (including functions for converting to and from the
triplet form).
3. cholmod_dense: a dense matrix in column-oriented form (compatible with MATLAB
and Fortran).
4. cholmod_factor: a sparse Cholesky factorization, either supernodal or non-supernodal,
and either LL^T or LDL^T (where D is diagonal).
5. cholmod_common: CHOLMOD parameters, workspace, and statistics.
The first four objects can be real or complex (both double precision), or pattern-only
(with no numerical values). CHOLMOD supports two forms of complex matrices and
factors: the C/C++/Fortran style, and the split style used in MATLAB. The Core Module
provides functions to allocate, reallocate, and free these objects. The Core can also copy
all but the cholmod_common object, and convert between different object types.
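The triplet-to-compressed-column conversion can be sketched with SciPy, whose COO type plays the role of cholmod_triplet here. This is an illustration only, not CHOLMOD's implementation; note that SciPy sums duplicate (row, column) entries during the conversion.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Triplet form: unordered (row, col, value) entries; duplicates allowed.
rows = np.array([0, 2, 1, 0, 0])
cols = np.array([0, 2, 1, 2, 0])           # the (0,0) entry appears twice
vals = np.array([1.0, 3.0, 2.0, 4.0, 0.5])

T = coo_matrix((vals, (rows, cols)), shape=(3, 3))  # triplet (COO) form
A = T.tocsc()                                       # compressed sparse column form

# CSC stores three arrays: column pointers (A.indptr), row indices
# (A.indices), and numerical values (A.data); the duplicate (0,0)
# entries have been summed into a single entry of value 1.5.
```

The compressed form is what nearly all of CHOLMOD's functions operate on; the triplet form exists because it is easy for a user to generate.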
Lower case letters are used for subsets (f) or permutation vectors (p); alpha and beta
are scalars. Sparse matrix operators (for cholmod_sparse) in the Core Module include
(in MATLAB notation) A*A', A(:,f)*A(:,f)', alpha*A+beta*B, tril, triu, diag, A',
A(p,f)', and A(p,p)'. The Core Module includes a function to sort row indices within
each sparse column vector and a function to extract entries within a band.
The cholmod_factor object contains either a symbolic or numeric factorization, in either
LL^T or LDL^T form, and either supernodal or non-supernodal. The Core Module includes
a function that converts a cholmod_factor between these various forms.
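The numerical relationship between the two factor forms is simple: if A = LL^T, then A = L1 D L1^T with L1 = L diag(1/sqrt(d)) and D = diag(d^2), where d is the diagonal of L. The dense NumPy sketch below shows only this relationship; CHOLMOD's conversion works in place on the sparse factor.

```python
import numpy as np

def ll_to_ldl(L):
    """Split a Cholesky factor L (A = L L^T) into the unit-diagonal
    factor L1 and diagonal D of A = L1 D L1^T."""
    d = np.diag(L).copy()
    return L / d, d**2          # column-scale L; D's diagonal is d^2

def ldl_to_ll(L1, Ddiag):
    """Recombine: L = L1 diag(sqrt(D))."""
    return L1 * np.sqrt(Ddiag)

A = np.array([[4.0, 2.0], [2.0, 5.0]])
L = np.linalg.cholesky(A)       # A = L L^T
L1, d = ll_to_ldl(L)            # A = L1 diag(d) L1^T, diag(L1) = 1
```

The LDL^T form is the one CHOLMOD modifies during an update/downdate, since D absorbs the diagonal changes without square roots.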
2.2 Cholesky Module
The Cholesky Module provides functions for computing or using a sparse Cholesky
factorization. These functions compute the elimination tree of A or A*A' and its
postordering, row/column counts of chol(A) or chol(A*A'), the symbolic factorization of
A or A*A', an interface to a supernodal Cholesky factorization, non-supernodal LL^T and
LDL^T factorizations (up-looking), incremental LL^T and LDL^T factorizations (one row
at a time), a re-symbolic factorization (to remove entries after an update/downdate), and
solutions to upper/lower triangular systems (supernodal, dynamic supernodal, and dense,
sparse, and/or multiple right-hand sides). The module also provides an interface to the
AMD [1] and COLAMD [8] ordering methods. The methods used in this Module are
discussed in [28, 18, 17, 16] and [6].
If the Supernodal Module is installed, the sparse Cholesky factorization automatically
selects between a non-supernodal up-looking factorization and a supernodal BLAS-based
factorization. The former is much faster than the supernodal method for very sparse
matrices (such as tridiagonal matrices). The selection is made during symbolic analysis,
when |L| and the floating-point operation count are found.¹ If the operation count
divided by |L| is greater than or equal to 40, then the supernodal method is selected.
Otherwise, the non-supernodal method is selected. This ratio is a coarse metric of how
much memory reuse a supernodal method will be able to exploit during numeric
factorization. If it is high, the supernodes will tend to be large and the BLAS routines
will be efficient. If it is low, there will be little scope for efficient exploitation of cache
within the BLAS kernels. The use of this ratio was selected analytically, based on a model
of cache memory behavior, but the default threshold of 40 was selected based on
performance measurements on a large range of sparse matrices arising in real
applications. These measurements are given in Section 3.
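The selection rule can be sketched from the symbolic column counts alone, where counts[j] is the number of nonzeros in column j of L (known after symbolic analysis). The flop-count formula below (the sum of squared column counts) is only an approximation of the exact operation count, and this sketch is illustrative, not CHOLMOD's code; the threshold of 40 is the default described above.

```python
import numpy as np

def choose_method(counts, threshold=40.0):
    """Given the column counts of L from symbolic analysis, return the
    factorization method the flops/|L| heuristic would select."""
    counts = np.asarray(counts, dtype=float)
    nnz_L = counts.sum()
    flops = (counts**2).sum()     # ~ floating-point work of an LL^T factorization
    return "supernodal" if flops / nnz_L >= threshold else "up-looking"

n = 1000
# Tridiagonal matrix: L is bidiagonal, 2 entries per column (1 in the last).
tridiag_counts = np.r_[np.full(n - 1, 2), 1]
# Dense matrix: column j of L has n - j entries.
dense_counts = np.arange(n, 0, -1)
```

For the tridiagonal case the ratio is about 2, far below the threshold; for the dense case it is roughly 2n/3, well above it.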
If the Partition Module is not installed, CHOLMOD uses a minimum degree ordering
(AMD or COLAMD). Otherwise, it automatically selects between minimum degree and
nested dissection. The default strategy is to first try AMD. If AMD finds an ordering
where the floating-point operation count divided by |L| is less than 500, or if
|L| < 5|A|, then the AMD ordering is used and no other ordering is attempted.
Otherwise, METIS [25] or CHOLMOD's nested dissection ordering (NESDIS) is tried,
and the best ordering (the one with the lowest |L|) is used.

¹ |L| denotes the number of nonzeros in L, or nnz(L) in MATLAB notation.
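The default strategy can be sketched as follows. The ordering routines are passed in as callables returning (flops, nnz_L, permutation), a hypothetical interface used here only so the sketch is self-contained; the real AMD, METIS, and NESDIS implementations are outside its scope.

```python
def select_ordering(nnz_A, amd, nd_orderings):
    """Sketch of the default ordering strategy: accept AMD if its fill
    and work are already low, otherwise try nested dissection orderings
    and keep whichever ordering yields the lowest nnz(L)."""
    flops, nnz_L, perm = amd()
    if flops / nnz_L < 500 or nnz_L < 5 * nnz_A:
        return "AMD", perm                    # AMD is good enough; stop here
    best_name, best_nnz, best_perm = "AMD", nnz_L, perm
    for name, method in nd_orderings:         # e.g. METIS, NESDIS
        f, nz, p = method()
        if nz < best_nnz:
            best_name, best_nnz, best_perm = name, nz, p
    return best_name, best_perm
```

Since minimum degree is cheap relative to nested dissection, trying it first costs little and often avoids the more expensive orderings entirely.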
CHOLMOD can also be requested to try up to nine different orderings and select the best
one found (user-provided, natural, AMD/COLAMD, METIS, NESDIS, and the latter three
with varying non-default parameter selections). This is useful in the case where a sequence
of matrices with identical nonzero pattern must be factorized, as often occurs in nonlinear
solvers, optimization methods, eigensolvers, and many other applications.
Note that MATLAB 7.2 includes neither the Partition Module nor METIS. It thus always
uses AMD for its ordering in x=A\b when A is sparse and symmetric positive definite.
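Why the choice of ordering matters so much is easy to see on an "arrow" matrix: eliminating the dense row/column first fills L completely, while eliminating it last (which is what a minimum degree ordering would do) leaves L as sparse as A. A small NumPy illustration, not part of CHOLMOD:

```python
import numpy as np

n = 50
A = n * np.eye(n)            # strongly diagonally dominant, hence positive definite
A[0, 1:] = 1.0
A[1:, 0] = 1.0               # "arrow": dense first row and column

def chol_nnz(M):
    """Number of (numerically) nonzero entries in the Cholesky factor."""
    L = np.linalg.cholesky(M)
    return int(np.count_nonzero(np.abs(L) > 1e-12))

p = np.arange(n)[::-1]                      # reverse ordering: arrow point last
nnz_natural = chol_nnz(A)                   # L is completely dense
nnz_reversed = chol_nnz(A[np.ix_(p, p)])    # L has no fill-in at all
```

With the natural ordering, L has n(n+1)/2 nonzeros; with the reversed ordering it has only 2n - 1, the same pattern as the lower triangle of the permuted A.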
2.3 Check Module
The Check Module checks the validity of the five CHOLMOD objects, and prints their
contents. It can also read a sparse matrix from a file, in triplet or Matrix Market form [4].
2.4 Demo Module
The Demo Module provides sample main programs to illustrate how CHOLMOD can be
used in a user's application.
2.5 MATLAB Module
The MATLAB Module is CHOLMOD's interface to MATLAB, providing most of
CHOLMOD's functionality to the MATLAB environment. Note that MATLAB 7.2
already includes much of CHOLMOD itself, as built-in functions (namely, the Core,
Cholesky, MATLAB, and Supernodal Modules). Built-in functions that rely on
CHOLMOD include x=A\b and x=b/A when A is sparse and symmetric positive definite,
chol, etree, and symbfact.
CHOLMOD provides additional functions to the MATLAB caller that are not built into
MATLAB, including nested dissection orderings (METIS and NESDIS), and the ability to
update and downdate a sparse LDL^T factorization. Its sparse-matrix-times-dense-matrix
function, sdmult, is significantly faster than the corresponding operation in MATLAB 7.2,
particularly when the dense matrix has many columns.
2.6 MatrixOps Module
The MatrixOps Module provides functions for the cholmod_sparse object, including [A,B],
[A;B], alpha*A*X+beta*Y and alpha*A'*X+beta*Y where A is sparse and X and Y are dense,
A*B, and A(i,j) where i and j are arbitrary integer vectors. It can also compute norms of
sparse or dense matrices. It can perform row and column scaling, and can drop small entries
from a sparse matrix.
2.7 Modify Module
The Modify Module provides functions that can add or delete a row of L (from an LDL^T
factorization) and compute the dynamic supernodal update/downdate of an LDL^T
factorization. The solution to Lx = b can also be updated/downdated when L is modified. Details
of the methods used are given by [9, 10, 11]. For background on update/downdate methods,
see [19, 23, 34, 35].
2.8 Partition Module
The Partition Module provides graphpartitioning based orderings. The graph partitioning
methods are currently based on METIS. Thus, this Module can only be used in CHOLMOD
if METIS is also available.
The Partition Module includes an interface to three constrained minimum degree ordering
methods: CAMD, CCOLAMD, and CSYMAMD. These methods are constrained versions of
AMD, COLAMD, and SYMAMD, respectively. In a constrained minimum degree ordering
method, each node is in one of up to n constraint sets. All nodes in constraint set zero are
ordered first, followed by all nodes in set one, and so on. These ordering methods are most
useful when combined with nested dissection [30, 31].
The Module provides a direct interface to METIS functions for computing a node
separator of an undirected graph (METIS_NodeComputeSeparator), and for computing
the nested dissection ordering of an undirected graph (METIS_NodeND). The latter
recursively partitions the graph (via node separators) until the subgraphs are small.
Small subgraphs are ordered independently with minimum degree (MMD [27]), not with
a constrained minimum degree ordering.
CHOLMOD also includes its own nested dissection ordering, NESDIS. It uses
METIS_NodeComputeSeparator to partition the graph recursively, in the same manner as
METIS_NodeND. Unlike METIS_NodeND, which orders each subgraph independently,
NESDIS uses the separators as ordering constraints for CAMD or CCOLAMD (for symmetric
and unsymmetric matrices, respectively). This tends to result in a better ordering at the
expense of a modest increase in ordering time. Performance comparisons between NESDIS,
METIS_NodeND, and AMD/COLAMD are given in Section 3.
2.9 Supernodal Module
The Supernodal Module provides supernodal symbolic and numeric factorization (LL^T
only) of A or AA^T, and a conventional supernodal triangular solver. Details of the method it uses
are given in [11].
3 Performance
This section highlights the performance of CHOLMOD's sparse Cholesky factorization
methods, and compares its nested dissection ordering with METIS.
3.1 Sparse Cholesky factorization
The following example illustrates the performance of CHOLMOD in MATLAB and also
provides an interesting anecdote about how performance results can be misinterpreted.
The introduction of CHOLMOD into MATLAB 7.2 has prompted a recurring spurious
"bug report" from MATLAB users. The MATLAB bench program benchmarks MATLAB
and compares the performance with other computers running the same version of MATLAB.
The sparse benchmark in bench is the sparse Cholesky factorization of a matrix arising from
the discretization of a 2-D L-shaped domain, using the natural ordering instead of the default
fill-reducing ordering. The matrix is A=delsq(numgrid('L',n)), where the mesh size is
n-by-n. In MATLAB 7.1, the mesh size was 120-by-120. In MATLAB 7.2, the mesh size was
increased to 300-by-300 to reflect the improved performance of chol. The results in Table 1
were obtained on a 3.2GHz Pentium 4 using the Goto BLAS [20].
Some MATLAB users, curious to compare the speed of different versions of MATLAB,
compare the sparse bench time between MATLAB 7.1 and 7.2 and find that MATLAB 7.2
is "slower" (0.556 seconds in MATLAB 7.1 versus 2.82 seconds in MATLAB 7.2). They
report this as a "bug" since MATLAB appears to have slowed down by a factor of about
five. They fail to heed the note in bench that the problem sizes can differ; MATLAB 7.2 is
doing about 40 times the work of the MATLAB 7.1 benchmark. For this matrix, the new
sparse Cholesky factorization is about eight times faster than the old one, not five times
slower.
For large matrices, sparse backslash when A is symmetric positive definite is typically
five to ten times faster in MATLAB 7.2 as compared with the left-looking
non-supernodal method in MATLAB 7.1. CHOLMOD's up-looking non-supernodal
factorization, coupled with the better and faster AMD ordering, provides a speedup even
for small and very sparse matrices. For large matrices, a speedup of up to 40 has been
observed, due to the supernodal factorization in CHOLMOD and the better ordering.
For example, for the ND/ND3K matrix, a 3-D mesh, the backslash time is reduced from
178.6 seconds in MATLAB 7.1 to 10.7 seconds in MATLAB 7.2 when using the default
built-in orderings in x=A\b (SYMMMD in MATLAB 7.1 and AMD in MATLAB 7.2).
With the Partition Module and CHOLMOD's nested dissection ordering, the time drops
to 9.5 seconds (including the ordering time). With METIS, the time drops to 8.5 seconds
(0.9 seconds of the difference are from a reduction in ordering time from 2.1 to 1.2
seconds; the ordering quality is also slightly improved with METIS for this matrix).

Table 1: Performance of the MATLAB sparse benchmark (in bench)

mesh size, n                120            300
matrix dimension            10,443         66,603
nnz(A)                      51,743         331,823
nnz(L)                      1.02 x 10^6    16.5 x 10^6
flop count                  109 x 10^6     4434 x 10^6
MATLAB 7.1 x=A\b: time      0.556 sec      18.31 sec
MATLAB 7.1 x=A\b: Mflops    196            242
MATLAB 7.2 x=A\b: time      0.124 sec      2.82 sec
MATLAB 7.2 x=A\b: Mflops    879            1572

Figure 1: CHOLMOD supernodal and non-supernodal performance (performance in
Mflops versus floating-point operations / nnz(L); left panel: non-supernodal, right
panel: supernodal)
Full details of an independent performance comparison between CHOLMOD and ten
other solvers are given in [22, 21]. With 87 symmetric positive definite test matrices,
CHOLMOD had the fastest total run time (including analysis, factorization, and solution
of the triangular systems) for 55 matrices. It typically uses the least amount of memory as
well. Considering just the 49 larger systems requiring 5 seconds or more to solve using the
fastest method, CHOLMOD was fastest for 32 matrices, and had a run time no higher than
15% more than the fastest run time for all but two matrices.² These two matrices are from
an interior point linear programming problem and contain many dense rows and columns
that cause problems with many solvers. CHOLMOD's run time was 2.2 and 3.8 times the
fastest run time for these two matrices, respectively. The bulk of the time was spent in
AMD. Modifying AMD's default dense row/column control parameter can greatly reduce
the ordering time for these two matrices.
Figure 1 shows the performance of CHOLMOD's non-supernodal up-looking method and
its supernodal method, as a function of the ratio of floating-point operations to the number
of nonzeros in L. Figure 2 compares the relative performance of these two methods. The test
set of 320 matrices consists of all matrices in the UF Sparse Matrix Collection [5] that are
either positive definite or binary with a symmetric nonzero pattern. For the binary matrices,
numerical values were constructed to ensure the matrices were positive definite; off-diagonal
entries were set to 1, and the diagonal entry a_ii was set to one plus the number of nonzero
entries in column i. Random matrices were excluded. The x-axis of each plot is the
floating-point operation count divided by |L|, which is the metric used to automatically
select between the two methods. Run times include ordering, analysis, factorization, and
solution of the
resulting triangular systems. This is all the work done by x=A\b, except for the mldivide
meta-algorithm that selects which method to use (LU, Cholesky, QR, banded solver, etc.).
Each circle in the figure is a single matrix.

² The GUPTA1 and GUPTA2 matrices, which are also the two worst-case outliers in Figure 1.

Figure 2: CHOLMOD relative supernodal and non-supernodal performance (relative run
time versus floating-point operations / nnz(L))
These results indicate that the floating-point count per nonzero in L is a remarkably
accurate method for predicting both the absolute and relative performance of these two
methods. The relative performance, and more importantly the threshold of 40 flops/|L|
used to automatically select between the up-looking and supernodal methods, may of
course differ on different computers. The value of 40 is the default value of a parameter
that the user can modify. Reasonable thresholds on a range of other computers are shown
in Table 2, based on the same test set. On all platforms listed, the flops/|L| ratio proves
to be an accurate predictor of the performance of each method.

On dual-core processors, a higher flops/|L| ratio is required to warrant the use of a
multithreaded BLAS, as compared to a single-threaded BLAS (about 300 on the Intel
T2500, and 500 on the AMD Opteron).
3.2 Nested dissection ordering
The results in this section demonstrate the improvement in fill-in that can be obtained by
combining nested dissection with a constrained minimum degree ordering.
AMD and the CHOLMOD and METIS nested dissection orderings were tested on 649
matrices with symmetric or nearly symmetric nonzero pattern. These are all matrices in the
UF Sparse Matrix Collection that are either symmetric or whose nonzero pattern is more
than 50% symmetric.
In addition, 203 unsymmetric matrices were selected from the same collection. These are
either rectangular, or square with a nonzero pattern that is less than 50% symmetric.
Diagonal matrices and matrices with fewer than 10,000 nonzeros were excluded from both
test sets. CHOLMOD and METIS were compared with COLAMD for these matrices.

Table 2: Performance of CHOLMOD sparse Cholesky factorization on a range of computers

Computer                             threshold  up-looking   supernodal   theoretical
                                                peak GFlops  peak GFlops  peak GFlops
Intel Pentium 4
(3.2GHz, 512KB cache, 4GB RAM)
  Goto BLAS                              40        0.38         3.76          6.4
Intel Pentium 4M
(2GHz, 512KB cache, 1GB RAM)
  Intel MKL BLAS                         40        0.20         1.77          4.0
  Goto BLAS                              40        0.20         1.80          4.0
Intel Core Duo T2500 (2-core)
(2GHz, 2MB cache, 2GB RAM)
  Intel MKL BLAS (1 thread)              40        0.41         1.34          2.0
  Goto BLAS (1 thread)                   40        0.41         1.47          4.0
  Goto BLAS (2 threads)                 120        0.41         2.47          4.0
AMD Opteron 252 (2-core)
(2.6GHz, 1MB cache, 8GB RAM)
  ACML BLAS (1 thread)                   30        0.38         2.56          5.2
  Goto BLAS (1 thread)                   30        0.38         3.93          5.2
  Goto BLAS (2 threads)                  80        0.38         6.65         10.4
Sun UltraSparc-II V9+vis (4-core)
(450MHz, 4MB cache, 4GB RAM)
  ATLAS BLAS (1 thread)                  45        0.06         0.43          0.9
For the symmetric or nearly symmetric case, the graph of A+A' was ordered. CHOLMOD's
NESDIS function uses CAMD, a constrained version of AMD. For the unsymmetric case, the
graph of A'*A was ordered if A had at least as many rows as columns (for LU factorization,
or QR factorization for a least-squares problem). The matrix A*A' was ordered if A had
fewer rows than columns (a linear programming problem). CHOLMOD uses CCOLAMD, a
constrained version of COLAMD, in this case.
For the 649 symmetric matrices, AMD finds the best ordering for 311 matrices, METIS
for 107, and NESDIS for 236 (the total is higher than 649 because of exact ties). AMD is
4.1 times faster than METIS, and 5.0 times faster than NESDIS (median results). NESDIS
is slower than METIS (by about 25%), but it typically finds a better ordering than METIS.
The median fill-in is 2% less with NESDIS than with METIS over all 649 matrices.
Most sparse Cholesky factorization packages that include a nested dissection ordering
method also include a minimum degree ordering method. Like CHOLMOD, it is typical to
try both and select the method returning the best ordering, since minimum degree is so fast
in practice. It is thus useful to compare nested dissection orderings only for matrices for
which minimum degree does not find the best ordering. Of these 338 symmetric matrices,
METIS finds a significantly better ordering than NESDIS (a reduction in |L| by 5% or more)
for 20 matrices, whereas NESDIS finds a significantly better ordering than METIS for 91.
They essentially tie (within 5%) for the other 227 matrices.
For the 203 unsymmetric matrices, COLAMD gives the best ordering for 77 matrices,
METIS for 40, and NESDIS for 92. COLAMD is 3.15 and 3.9 times faster than METIS and
NESDIS, respectively (median results). The median fill-in is less with NESDIS than with
METIS for all 203 matrices, and also for just the 126 matrices for which COLAMD does
not find the best ordering. Of these 126 matrices, METIS finds a significantly better
ordering than NESDIS for only 8 matrices, NESDIS for 38, and they tie (within 5%) for
81 matrices.
4 Availability
In addition to appearing as a Collected Algorithm of the ACM, CHOLMOD is available as
a built-in function in MATLAB version 7.2 or later, and online at
http://www.cise.ufl.edu/research/sparse. It requires the ordering functions AMD, COLAMD,
CAMD, and CCOLAMD, the dense matrix libraries LAPACK [2] and the BLAS [26, 14, 13],
and optionally uses METIS for its nested dissection orderings. LAPACK and the BLAS are
available at http://www.netlib.org. The Goto BLAS is available at
http://www.tacc.utexas.edu/resources/software [20]. METIS can be found at
http://glaros.dtc.umn.edu [25].
References
[1] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering
algorithm. SIAM J. Matrix Anal. Appl., 17(4):886-905, 1996.
[2] E. Anderson, Z. Bai, C. H. Bischof, S. Blackford, J. W. Demmel, J. J. Dongarra, J. Du
Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. C. Sorensen. LAPACK
Users' Guide. SIAM, Philadelphia, 3rd edition, 1999.
[3] C. C. Ashcraft and R. G. Grimes. SPOOLES: an object-oriented sparse matrix library.
In Proc. 1999 SIAM Conf. Parallel Processing for Scientific Computing, Mar. 1999.
[4] R. F. Boisvert, R. Pozo, K. Remington, R. Barrett, and J. J. Dongarra. The Matrix
Market: A web resource for test matrix collections. In R. F. Boisvert, editor, Quality of
Numerical Software, Assessment and Enhancement, pages 125-137. Chapman & Hall,
London, 1997. (http://math.nist.gov/MatrixMarket).
[5] T. A. Davis. University of Florida sparse matrix collection.
www.cise.ufl.edu/research/sparse, 1994. NA Digest, vol 92, no. 42.
[6] T. A. Davis. Algorithm 849: A concise sparse Cholesky factorization package. ACM
Trans. Math. Software, 31(4):587-591, 2005.
[7] T. A. Davis. Direct Methods for Sparse Linear Systems. SIAM, Philadelphia, PA, 2006.
[8] T. A. Davis, J. R. Gilbert, S. I. Larimore, and E. G. Ng. A column approximate
minimum degree ordering algorithm. ACM Trans. Math. Software, 30(3):353-376, 2004.
[9] T. A. Davis and W. W. Hager. Modifying a sparse Cholesky factorization. SIAM J.
Matrix Anal. Appl., 20(3):606-627, 1999.
[10] T. A. Davis and W. W. Hager. Multiple-rank modifications of a sparse Cholesky
factorization. SIAM J. Matrix Anal. Appl., 22:997-1013, 2001.
[11] T. A. Davis and W. W. Hager. Dynamic supernodes in sparse Cholesky
update/downdate and triangular solves. ACM Trans. Math. Software, 2006. (submitted).
[12] F. Dobrian, G. K. Kumfert, and A. Pothen. The design of sparse direct solvers using
object oriented techniques. In Adv. in Software Tools in Sci. Computing, pages 89-131.
Springer-Verlag, 2000.
[13] J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling. A set of level-3 basic linear
algebra subprograms. ACM Trans. Math. Software, 16(1):1-17, 1990.
[14] J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. An extended set of
Fortran basic linear algebra subprograms. ACM Trans. Math. Software, 14:18-32, 1988.
[15] I. S. Duff, A. M. Erisman, and J. K. Reid. Direct Methods for Sparse Matrices. London:
Oxford Univ. Press, 1986.
[16] A. George and J. W. H. Liu. Computer Solution of Large Sparse Positive Definite
Systems. Prentice-Hall, Englewood Cliffs, New Jersey, 1981.
[17] J. R. Gilbert, X. S. Li, E. G. Ng, and B. W. Peyton. Computing row and column counts
for sparse QR and LU factorization. BIT, 41(4):693-710, 2001.
[18] J. R. Gilbert, E. G. Ng, and B. W. Peyton. An efficient algorithm to compute row
and column counts for sparse Cholesky factorization. SIAM J. Matrix Anal. Appl.,
15(4):1075-1091, 1994.
[19] P. E. Gill, G. H. Golub, W. Murray, and M. A. Saunders. Methods for modifying matrix
factorizations. Math. Comp., 28(126):505-535, 1974.
[20] K. Goto and R. van de Geijn. On reducing TLB misses in matrix multiplication.
TR-2002-55, Univ. Texas at Austin, Dept. of Computer Sciences, 2002.
[21] N. I. M. Gould, Y. Hu, and J. A. Scott. Complete results from a numerical evaluation
of sparse direct solvers for the solution of large sparse, symmetric linear systems of
equations. Internal report 2005-1 (revision 1), CCLRC, Rutherford
Appleton Laboratory, 2005.
[22] N. I. M. Gould, Y. Hu, and J. A. Scott. A numerical evaluation of sparse direct solvers
for the solution of large sparse, symmetric linear systems of equations. ACM Trans.
Math. Software, 200x. (to appear).
[23] W. W. Hager. Updating the inverse of a matrix. SIAM Review, 31(2):221-239, 1989.
[24] P. Hénon, P. Ramet, and J. Roman. PaStiX: A high-performance parallel direct solver
for sparse symmetric definite systems. Parallel Computing, 28(2):301-321, 2002.
[25] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning
irregular graphs. SIAM J. Sci. Comput., 20:359-392, 1998.
[26] C. L. Lawson, R. J. Hanson, D. R. Kincaid, and F. T. Krogh. Basic linear algebra
subprograms for Fortran usage. ACM Trans. Math. Software, 5:308-323, 1979.
[27] J. W. H. Liu. Modification of the minimum-degree algorithm by multiple elimination.
ACM Trans. Math. Software, 11(2):141-153, 1985.
[28] J. W. H. Liu. The role of elimination trees in sparse factorization. SIAM J. Matrix
Anal. Appl., 11(1):134-172, 1990.
[29] E. G. Ng and B. W. Peyton. Block sparse Cholesky algorithms on advanced uniprocessor
computers. SIAM J. Sci. Comput., 14(5):1034-1056, 1993.
[30] F. Pellegrini, J. Roman, and P. R. Amestoy. Hybridizing nested dissection and halo
approximate minimum degree for efficient sparse matrix ordering. Concurrency: Pract.
Exp., 12:68-84, 2000.
[31] E. Rothberg and S. C. Eisenstat. Node selection strategies for bottom-up sparse matrix
orderings. SIAM J. Matrix Anal. Appl., 19(3):682-695, 1998.
[32] E. Rothberg and A. Gupta. Efficient sparse matrix factorization on high-performance
workstations: Exploiting the memory hierarchy. ACM Trans. Math. Software,
17(3):313-334, 1991.
[33] V. Rotkin and S. Toledo. The design and implementation of a new out-of-core sparse
Cholesky factorization method. ACM Trans. Math. Software, 30(1):19-46, 2004.
[34] G. W. Stewart. The effects of rounding error on an algorithm for downdating a Cholesky
factorization. J. Inst. Math. Appl., 23:203-213, 1979.
[35] G. W. Stewart. Matrix Algorithms, Volume 1: Basic Decompositions. SIAM,
Philadelphia, 1998.