Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: An Approximate minimum degree ordering algorithm
Permanent Link: http://ufdc.ufl.edu/UF00095308/00001
 Material Information
Title: An Approximate minimum degree ordering algorithm
Series Title: Department of Computer and Information Science and Engineering Technical Report ; 94-039
Physical Description: Book
Language: English
Creator: Davis, Timothy A.
Amestoy, Patrick
Duff, Iain S.
Affiliation: University of Florida
École Nationale Supérieure d'Électronique, d'Électrotechnique, d'Informatique, d'Hydraulique, et des Télécommunications -- Toulouse, France
Rutherford Appleton Laboratory
Publisher: Department of Computer and Information Sciences, University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: July, 1995
Copyright Date: 1994
Edition: rev.
 Record Information
Bibliographic ID: UF00095308
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.











AN APPROXIMATE MINIMUM DEGREE ORDERING ALGORITHM

TIMOTHY A. DAVIS*, PATRICK AMESTOY†, AND IAIN S. DUFF‡

Computer and Information Sciences Dept., University of Florida,
Technical Report TR-94-039, December, 1994 (revised July 1995).
Abstract. An Approximate Minimum Degree ordering algorithm (AMD) for preordering a
symmetric sparse matrix prior to numerical factorization is presented. We use techniques based on
the quotient graph for matrix factorization that allow us to obtain computationally cheap bounds
for the minimum degree. We show that these bounds are often equal to the actual degree. The
resulting algorithm is typically much faster than previous minimum degree ordering algorithms, and
produces results that are comparable in quality with the best orderings from other minimum degree
algorithms.

Keywords: approximate minimum degree ordering algorithm, quotient graph,
sparse matrices, graph algorithms, ordering algorithms
AMS classifications: 65F50, 65F05

1. Introduction. When solving large sparse symmetric linear systems of the
form Ax = b, it is common to precede the numerical factorization by a symmetric
reordering. This reordering is chosen so that pivoting down the diagonal in order on
the resulting permuted matrix PAPT = LLT produces much less fill-in and work than
computing the factors of A by pivoting down the diagonal in the original order. This
reordering is computed using only information on the matrix structure without taking
account of numerical values and so may not be stable for general matrices. However,
if the matrix A is positive-definite [21], a Cholesky factorization can safely be used.
This technique of preceding the numerical factorization with a symbolic analysis can
also be extended to unsymmetric systems although the numerical factorization phase
must allow for subsequent numerical pivoting [1, 2, 16]. The goal of the preordering
is to find a permutation matrix P so that the subsequent factorization has the least
fill-in. Unfortunately, this problem is NP-complete [31], so heuristics are used.
The minimum degree ordering algorithm is one of the most widely used heuristics,
since it produces factors with relatively low fill-in on a wide range of matrices. Because
of this, the algorithm has received much attention over the past three decades. The
algorithm is a symmetric analogue of Markowitz' method [26] and was first proposed
by Tinney and Walker [30] as algorithm S2. Rose [27, 28] developed a graph theo-
retical model of Tinney and Walker's algorithm and renamed it the minimum degree
algorithm, since it performs its pivot selection by selecting from a graph a node of
minimum degree. Later implementations have dramatically improved the time and
memory requirements of Tinney and Walker's method, while maintaining the basic
idea of selecting a node or set of nodes of minimum degree. These improvements have
reduced the memory complexity so that the algorithm can operate within the storage
of the original matrix, and have reduced the amount of work needed to keep track of
the degrees of nodes in the graph (which is the most computationally intensive part

* Computer and Information Sciences Department, University of Florida, Gainesville, Florida,
USA. phone: (904) 392-1481, email: davis@cis.ufl.edu. Support for this project was provided by
the National Science Foundation (ASC-9111263 and DMS-9223088). Portions of this work were
supported by a post-doctoral grant from CERFACS.
† ENSEEIHT-IRIT, Toulouse, France. email: amestoy@enseeiht.fr.
‡ Rutherford Appleton Laboratory, Chilton, Didcot, Oxon. OX11 0QX England, and European
Center for Research and Advanced Training in Scientific Computation (CERFACS), Toulouse,
France. email: isd@letterbox.rl.ac.uk.












of the algorithm). This work includes that of Duff and Reid [10, 13, 14, 15]; George
and McIntyre [23]; Eisenstat, Gursky, Schultz, and Sherman [17, 18]; George and Liu
[19, 20, 21, 22]; and Liu [25]. More recently, several researchers have relaxed this
heuristic by computing upper bounds on the degrees, rather than the exact degrees,
and selecting a node of minimum upper bound on the degree. This work includes
that of Gilbert, Moler, and Schreiber [24], and Davis and Duff [7, 8]. Davis and Duff
use degree bounds in the unsymmetric-pattern multifrontal method (UMFPACK), an
unsymmetric Markowitz-style algorithm. In this paper, we describe an approximate
minimum degree ordering algorithm based on the symmetric analogue of the degree
bounds used in UMFPACK.
Section 2 presents the original minimum degree algorithm of Tinney and Walker
in the context of the graph model of Rose. Section 3 discusses the quotient graph
(or element graph) model and the use of that model to reduce the time taken by
the algorithm. In this context, we present our notation for the quotient graph, and
present a small example matrix and its graphs. We then use the notation to describe
our approximate degree bounds in Section 4. The Approximate Minimum Degree
(AMD) algorithm and its time complexity are presented in Section 5. In Section 6,
we first analyse the performance and accuracy of our approximate degree bounds on
a set of test matrices from a wide range of disciplines. The AMD algorithm is then
compared with other established codes that compute minimum degree orderings.

2. Elimination graphs. The nonzero pattern of a symmetric $n$-by-$n$ matrix,
$A$, can be represented by a graph $G^0 = (V^0, E^0)$, with nodes $V^0 = \{1, \ldots, n\}$ and
edges $E^0$. An edge $(i,j)$ is in $E^0$ if and only if $a_{ij} \neq 0$. Since $A$ is symmetric, $G^0$ is
undirected.

The elimination graph, $G^k = (V^k, E^k)$, describes the nonzero pattern of the submatrix
still to be factorized after the first $k$ pivots have been chosen. It is undirected,
since the matrix remains symmetric as it is factorized. At step $k$, the graph $G^k$ depends
on $G^{k-1}$ and the selection of the $k$th pivot. To find $G^k$, the $k$th pivot node $p$
is selected from $V^{k-1}$. Edges are added to $E^{k-1}$ to make the nodes adjacent to $p$ in
$G^{k-1}$ a clique (a fully connected subgraph). This addition of edges (fill-in) means that
we cannot know the storage requirements in advance. The edges added correspond to
fill-in caused by the $k$th step of factorization. A fill-in is a nonzero entry $L_{ij}$, where
$(PAP^T)_{ij}$ is zero. The pivot node $p$ and its incident edges are then removed from the
graph $G^{k-1}$ to yield the graph $G^k$. Let $\mathrm{Adj}_{G^k}(i)$ denote the set of nodes adjacent to
$i$ in the graph $G^k$. Throughout this paper, we will use the superscript $k$ to denote a
graph, set, or other structure obtained after the first $k$ pivots have been chosen. For
simplicity, we will drop the superscript when the context is clear.

The minimum degree algorithm selects node $p$ as the $k$th pivot such that the
degree of $p$, $t_p \equiv |\mathrm{Adj}_{G^{k-1}}(p)|$, is minimized (where $|\cdot|$ denotes the size of a set or the
number of nonzeros in a matrix). The minimum degree algorithm is a non-optimal
greedy heuristic for reducing the number of new edges (fill-ins) introduced during the
factorization. We have already noted that finding the optimal ordering is NP-complete [31].
By minimizing the degree, the algorithm minimizes the upper bound on the fill-in
caused by the $k$th pivot. Selecting $p$ as pivot creates at most $(t_p^2 - t_p)/2$ new edges in
$G$.
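To make the graph transformation concrete, the following minimal Python sketch (ours, not part of the report) performs one pivot step of the elimination-graph model: it selects a node of minimum degree, adds the clique (fill-in) edges among its neighbors, and removes the node. The function and its toy graph are illustrative names, not the authors' code.

    # One elimination-graph pivot step (illustrative only; a practical code
    # uses the quotient graph of Section 3 instead).
    from itertools import combinations

    def eliminate_min_degree_node(adj):
        """adj: dict mapping node -> set of adjacent nodes (undirected).
        Performs one pivot step in place; returns (pivot, fill_edges)."""
        p = min(adj, key=lambda v: len(adj[v]))   # node of minimum degree t_p
        neighbors = adj[p]
        fill = []
        for i, j in combinations(neighbors, 2):   # make Adj(p) a clique
            if j not in adj[i]:                   # edge (i,j) is a fill-in
                adj[i].add(j)
                adj[j].add(i)
                fill.append((i, j))
        for i in neighbors:                       # remove p and its edges
            adj[i].discard(p)
        del adj[p]
        return p, fill                            # |fill| <= (t_p**2 - t_p)/2

    # Tiny example: a 4-node path 1-2-3-4; a degree-1 node causes no fill-in.
    g = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
    print(eliminate_min_degree_node(g))           # (1, [])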

3. Quotient graphs. In contrast to the elimination graph, the quotient graph
models the factorization of $A$ using an amount of storage that never exceeds the
storage for the original graph, $G^0$ [21]. The quotient graph is also referred to as the











generalized element model [13, 14, 15, 29]. An important component of a quotient
graph is a clique. It is a particularly economical structure, since a clique is represented
by a list of its members rather than by a list of all the edges in the clique. Following
the generalized element model, we refer to nodes removed from the elimination graph
as elements (George and Liu refer to them as eliminated nodes). We use the term
variable to refer to uneliminated nodes.

The quotient graph, $\mathcal{G}^k = (V^k, \overline{V}^k, E^k, \overline{E}^k)$, implicitly represents the elimination
graph $G^k$, where $\mathcal{G}^0 = G^0$. For clarity, we drop the superscript $k$ in the following. The
nodes in $\mathcal{G}$ consist of variables (the set $V$) and elements (the set $\overline{V}$). The edges are
divided into two sets: edges between variables, $E \subseteq V \times V$, and edges between variables
and elements, $\overline{E} \subseteq V \times \overline{V}$. There are no edges between elements, since they are removed
by element absorption. The sets $\overline{V}^0$ and $\overline{E}^0$ are empty.

We use the following set notation ($\mathcal{A}$, $\mathcal{E}$, and $\mathcal{L}$) to describe the quotient graph
model and our approximate degree bounds. Let $\mathcal{A}_i$ be the set of variables adjacent to
variable $i$ in $\mathcal{G}$, and let $\mathcal{E}_i$ be the set of elements adjacent to variable $i$ in $\mathcal{G}$ (we refer
to $\mathcal{E}_i$ as element list $i$). That is, if $i$ is a variable in $V$, then

$\mathcal{A}_i = \{j : (i,j) \in E\} \subseteq V$,

$\mathcal{E}_i = \{e : (i,e) \in \overline{E}\} \subseteq \overline{V}$,

and

$\mathrm{Adj}_{\mathcal{G}}(i) = \mathcal{A}_i \cup \mathcal{E}_i \subseteq V \cup \overline{V}$.

The set $\mathcal{A}_i$ refers to a subset of the nonzero entries in row $i$ of the original matrix $A$
(thus the notation $\mathcal{A}$). That is, $\mathcal{A}_i^0 = \{j : a_{ij} \neq 0\}$, and $\mathcal{A}_i^k \subseteq \mathcal{A}_i^{k-1}$, for $1 \le k \le n$.
Let $\mathcal{L}_e$ denote the set of variables adjacent to element $e$ in $\mathcal{G}$. That is, if $e$ is an
element in $\overline{V}$, then we define

$\mathcal{L}_e = \mathrm{Adj}_{\mathcal{G}}(e) = \{i : (i,e) \in \overline{E}\} \subseteq V$.

The edges $E$ and $\overline{E}$ in the quotient graph are represented explicitly as the sets $\mathcal{A}_i$
and $\mathcal{E}_i$ for each variable in $V$, and the sets $\mathcal{L}_e$ for each element in $\overline{V}$. We will use $\mathcal{A}$, $\mathcal{E}$,
and $\mathcal{L}$ to denote three sets containing all $\mathcal{A}_i$, $\mathcal{E}_i$, and $\mathcal{L}_e$, respectively, for all variables
$i$ and all elements $e$. George and Liu [21] show that the quotient graph takes no more
storage than the original graph ($|\mathcal{A}^k| + |\mathcal{L}^k| \le |\mathcal{A}^0|$ for all $k$).

The quotient graph $\mathcal{G}$ and the elimination graph $G$ are closely related. If $i$ is a
variable in $\mathcal{G}$, it is also a variable in $G$, and

(3.1)  $\mathrm{Adj}_G(i) = \left( \mathcal{A}_i \cup \bigcup_{e \in \mathcal{E}_i} \mathcal{L}_e \right) \setminus \{i\}$,

where the "\" is the standard set subtraction operator.

When variable $p$ is selected as the $k$th pivot, element $p$ is formed (variable $p$ is removed
from $V$ and added to $\overline{V}$). The set $\mathcal{L}_p = \mathrm{Adj}_G(p)$ is found using Equation (3.1).
The set $\mathcal{L}_p$ represents a permuted nonzero pattern of the $k$th column of $L$ (thus the
notation $\mathcal{L}$). If $i \in \mathcal{L}_p$, where $p$ is the $k$th pivot, and variable $i$ will become the $m$th
pivot (for some $m > k$), then the entry $L_{mk}$ will be nonzero. Equation (3.1) implies
that $\mathcal{L}_e \setminus \{p\} \subseteq \mathcal{L}_p$ for all elements $e$ adjacent to variable $p$. This means that all
variables adjacent to an element $e \in \mathcal{E}_p$ are adjacent to the element $p$, and these
elements $e \in \mathcal{E}_p$ are no longer needed.












FIG. 3.1. Elimination graph, quotient graph, and matrix for first three steps.
(a) Elimination graph. (b) Quotient graph. (c) Factors and active submatrix.

They are absorbed into the new element $p$ and deleted
[15], and reference to them is replaced by reference to the new element $p$. The new
element $p$ is added to the element lists, $\mathcal{E}_i$, for all variables $i$ adjacent to element $p$.
Absorbed elements, $e \in \mathcal{E}_p$, are removed from all element lists. The sets $\mathcal{A}_p$ and $\mathcal{E}_p$,
and $\mathcal{L}_e$ for all $e$ in $\mathcal{E}_p$, are deleted. Finally, any entry $j$ in $\mathcal{A}_i$, where both $i$ and $j$ are
in $\mathcal{L}_p$, is redundant and is deleted. The set $\mathcal{A}_i$ is thus disjoint from any set $\mathcal{L}_e$ for
$e \in \mathcal{E}_i$. In other words, $\mathcal{A}_i$ is the pattern of those entries in row $i$ of $A$ that are not
modified by steps 1 through $k$ of the Cholesky factorization of $PAP^T$. The net result
is that the new graph $\mathcal{G}$ takes the same, or less, storage than before the $k$th pivot was
selected.

3.1. Quotient graph example. We illustrate the sequence of elimination graphs
and quotient graphs of a 10-by-10 sparse matrix in Figures 3.1 and 3.2. The example
is ordered so that a minimum degree algorithm recommends pivoting down the diagonal
in the natural order (that is, the permutation matrix is the identity). In Figures
3.1 and 3.2, variables and elements are shown as thin-lined and heavy-lined circles,
respectively. In the matrices in these figures, diagonal entries are numbered, and unmodified
original nonzero entries (entries in $\mathcal{A}$) are shown as solid squares. The solid
squares in row $i$ form the set $\mathcal{A}_i$. The variables in current unabsorbed elements (the sets
$\mathcal{L}_e$) are indicated by solid circles in the columns of $L$ corresponding to the unabsorbed elements.













FIG. 3.2. Elimination graph, quotient graph, and matrix for steps 4 to 7.
(a) Elimination graph. (b) Quotient graph. (c) Factors and active submatrix.

The solid circles in row $i$ form the set $\mathcal{E}_i$. Entries that do not correspond
to edges in the quotient graph are shown as an $\times$. Figure 3.1 shows the elimination
graph, quotient graph, and the matrix prior to elimination (in the left column) and
after the first three steps (from left to right). Figure 3.2 continues the example for
the next four steps.

Consider the transformation of the graph $\mathcal{G}^2$ to the graph $\mathcal{G}^3$. Variable 3 is
selected as pivot. We have $\mathcal{L}_3 = \mathcal{A}_3 = \{5, 6, 7\}$ (a simple case of Equation (3.1)). The
new element 3 represents the pairwise adjacency of variables 5, 6, and 7. The explicit
edge $(5, 7)$ is now redundant, and is deleted from $\mathcal{A}_5$ and $\mathcal{A}_7$.
Also consider the transformation of the graph $\mathcal{G}^4$ to the graph $\mathcal{G}^5$. Variable 5 is
selected as pivot. The set $\mathcal{A}_5$ is empty and $\mathcal{E}_5 = \{2, 3\}$. Following Equation (3.1),

$\mathcal{L}_5 = (\mathcal{A}_5 \cup \mathcal{L}_2 \cup \mathcal{L}_3) \setminus \{5\}$
$= (\emptyset \cup \{5, 6, 9\} \cup \{5, 6, 7\}) \setminus \{5\}$
$= \{6, 7, 9\}$,

which is the pattern of column 5 of $L$ (excluding the diagonal). Since the new element
5 implies that variables 6, 7, and 9 are pairwise adjacent, elements 2 and 3 do not
add any information to the graph. They are removed, having been "absorbed" into
element 5. Additionally, the edge $(7, 9)$ is redundant, and is removed from $\mathcal{A}_7$ and
$\mathcal{A}_9$. In $\mathcal{G}^4$, we have

$\mathcal{A}_6 = \emptyset$,  $\mathcal{E}_6 = \{2, 3, 4\}$,
$\mathcal{A}_7 = \{9, 10\}$,  $\mathcal{E}_7 = \{3, 4\}$,
$\mathcal{A}_9 = \{7, 8, 10\}$,  $\mathcal{E}_9 = \{2\}$.

After these transformations, we have in $\mathcal{G}^5$,

$\mathcal{A}_6 = \emptyset$,  $\mathcal{E}_6 = \{4, 5\}$,
$\mathcal{A}_7 = \{10\}$,  $\mathcal{E}_7 = \{4, 5\}$,
$\mathcal{A}_9 = \{8, 10\}$,  $\mathcal{E}_9 = \{5\}$,

and the new element in $\mathcal{G}^5$,

$\mathcal{L}_5 = \{6, 7, 9\}$.
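This computation is easy to check mechanically. The following small Python sketch (ours, for illustration) applies Equation (3.1) to the sets above and reproduces $\mathcal{L}_5 = \{6, 7, 9\}$:

    # Verify L_5 = (A_5 union L_2 union L_3) \ {5} for the example graph G^4.
    A = {5: set()}                      # A_5 is empty in G^4
    E = {5: {2, 3}}                     # element list E_5 = {2, 3}
    L = {2: {5, 6, 9}, 3: {5, 6, 7}}    # patterns of elements 2 and 3

    def element_pattern(p, A, E, L):
        """Equation (3.1): L_p = (A_p union of L_e for e in E_p) \\ {p}."""
        Lp = set(A[p])
        for e in E[p]:
            Lp |= L[e]
        Lp.discard(p)
        return Lp

    print(sorted(element_pattern(5, A, E, L)))   # [6, 7, 9]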
3.2. Indistinguishable variables and external degree. Two variables $i$ and
$j$ are indistinguishable in $G$ if $\mathrm{Adj}_G(i) \cup \{i\} = \mathrm{Adj}_G(j) \cup \{j\}$. They will have the
same degree until one is selected as pivot. If $i$ is selected, then $j$ can be selected
next without causing any additional fill-in. Selecting $i$ and $j$ together is called mass
elimination [23]. Variables $i$ and $j$ are replaced in $\mathcal{G}$ by a supervariable containing
both $i$ and $j$, labeled by its principal variable ($i$, say) [13, 14, 15]. Variables that
are not supervariables are called simple variables. In practice, new supervariables are
constructed at step $k$ only if both $i$ and $j$ are in $\mathcal{L}_p$ (where $p$ is the pivot selected at
step $k$). In addition, rather than checking the graph $G$ for indistinguishability, we use
the quotient graph $\mathcal{G}$, so that two variables $i$ and $j$ are found to be indistinguishable
if $\mathrm{Adj}_{\mathcal{G}}(i) \cup \{i\} = \mathrm{Adj}_{\mathcal{G}}(j) \cup \{j\}$. This comparison is faster than determining whether
two variables are indistinguishable in $G$, but may miss some identifications because,
although indistinguishability in $\mathcal{G}$ implies indistinguishability in $G$, the reverse is not
true.
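The quotient-graph test is a plain set comparison. The sketch below (ours; elements are tagged with an 'e' prefix to keep the variable and element name spaces distinct) applies it to variables 7 and 9 using their sets in $\mathcal{G}^5$ from the example above:

    # Quotient-graph indistinguishability: Adj(i) + {i} == Adj(j) + {j},
    # where Adj(v) = A_v union E_v.
    def indistinguishable(i, j, A, E):
        return (A[i] | E[i] | {i}) == (A[j] | E[j] | {j})

    # Variables 7 and 9 in G^5 of the example of Section 3.1:
    A = {7: {10}, 9: {8, 10}}
    E = {7: {'e4', 'e5'}, 9: {'e5'}}
    print(indistinguishable(7, 9, A, E))   # False: still distinguishable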
We denote the set of simple variables in the supervariable with principal variable
$i$ as $\overline{i}$, and define $\overline{i} = \{i\}$ if $i$ is a simple variable. When $p$ is selected as pivot at the
$k$th step, all variables in $\overline{p}$ are eliminated. The use of supervariables greatly reduces
the number of degree computations performed, which is the most costly part of the
algorithm. Non-principal variables and their incident edges are removed from the
quotient graph data structure when they are detected. The set notation $\mathcal{A}$ and $\mathcal{L}$ refers
either to a set of supervariables or to the variables represented by the supervariables,
depending on the context. In degree computations and when used in representing
elimination graphs, the sets refer to variables; otherwise they refer to supervariables.

In Figure 3.2, detected supervariables are circled by dashed lines. Non-principal
variables are left inside the dashed supervariables. These are, however, removed from
the quotient graph. The last quotient graph in Figure 3.2 represents the selection of
pivots 7, 8, and 9, and thus the right column of the figure depicts $G^9$, $\mathcal{G}^9$, and the
matrix after the ninth pivot step.

The external degree $d_i \equiv t_i - |\overline{i}| + 1$ of a principal variable $i$ is

(3.2)  $d_i = |\mathrm{Adj}_G(i) \setminus \overline{i}| = |\mathcal{A}_i \setminus \overline{i}| + \left| \left( \bigcup_{e \in \mathcal{E}_i} \mathcal{L}_e \right) \setminus \overline{i} \right|$,

since the set $\mathcal{A}_i$ is disjoint from any set $\mathcal{L}_e$ for $e \in \mathcal{E}_i$. At most $(d_i^2 - d_i)/2$ fill-ins occur
if all variables in $\overline{i}$ are selected as pivots. We refer to $t_i$ as the true degree of variable $i$.
Selecting the pivot with minimum external degree tends to produce a better ordering
than selecting the pivot with minimum true degree [25] (also see Section 6.2).
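For illustration (our sketch, not the authors' code), the external degree of Equation (3.2) can be evaluated directly from the quotient-graph sets; the data below are taken from $\mathcal{G}^4$ in the example of Section 3.1:

    # External degree, Equation (3.2):
    #   d_i = |A_i \ ibar| + |(union of L_e for e in E_i) \ ibar|
    def external_degree(i, A, E, L, bar):
        union = set()
        for e in E[i]:
            union |= L[e]
        return len(A[i] - bar[i]) + len(union - bar[i])

    # Variable 6 in G^4 of the example (a simple variable, ibar = {6}):
    A = {6: set()}
    E = {6: {2, 3, 4}}
    L = {2: {5, 6, 9}, 3: {5, 6, 7}, 4: {6, 7, 8}}
    print(external_degree(6, A, E, L, {6: {6}}))   # 4 (see Section 4.2)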











Algorithm 1 (Minimum degree algorithm, based on the quotient graph)

    $V = \{1, \ldots, n\}$
    $\overline{V} = \emptyset$
    for $i = 1$ to $n$ do
        $\mathcal{A}_i = \{j : a_{ij} \neq 0 \text{ and } i \neq j\}$
        $\mathcal{E}_i = \emptyset$
        $d_i = |\mathcal{A}_i|$
        $\overline{i} = \{i\}$
    end for
    $k = 1$
    while $k \le n$ do
        mass elimination:
            select variable $p \in V$ that minimizes $d_p$
            $\mathcal{L}_p = (\mathcal{A}_p \cup \bigcup_{e \in \mathcal{E}_p} \mathcal{L}_e) \setminus \overline{p}$
        for each $i \in \mathcal{L}_p$ do
            remove redundant entries:
                $\mathcal{A}_i = (\mathcal{A}_i \setminus \mathcal{L}_p) \setminus \overline{p}$
            element absorption:
                $\mathcal{E}_i = (\mathcal{E}_i \setminus \mathcal{E}_p) \cup \{p\}$
            compute external degree:
                $d_i = |\mathcal{A}_i \setminus \overline{i}| + |(\bigcup_{e \in \mathcal{E}_i} \mathcal{L}_e) \setminus \overline{i}|$
        end for
        supervariable detection, pairs found via hash function:
            for each pair $i$ and $j \in \mathcal{L}_p$ do
                if $i$ and $j$ are indistinguishable then
                    remove the supervariable $j$:
                        $\overline{i} = \overline{i} \cup \overline{j}$
                        $d_i = d_i - |\overline{j}|$
                        $V = V \setminus \{j\}$
                        $\mathcal{A}_j = \emptyset$
                        $\mathcal{E}_j = \emptyset$
                end if
            end for
        convert variable $p$ to element $p$:
            $\overline{V} = (\overline{V} \cup \{p\}) \setminus \mathcal{E}_p$
            $V = V \setminus \{p\}$
            $\mathcal{A}_p = \emptyset$
            $\mathcal{E}_p = \emptyset$
        $k = k + |\overline{p}|$
    end while
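For readers who want to experiment, the following compact Python sketch (ours, not the authors' code) implements the core of Algorithm 1 on the quotient graph. It is deliberately simplified: supervariable detection, hashing, and external degrees are omitted, so every variable is simple and the exact degree is used.

    # Simplified minimum degree ordering on a quotient graph: a teaching
    # sketch, not the AMD code itself.
    def min_degree_order(n, pattern):
        """pattern: dict i -> set of j with a_ij != 0 and i != j (symmetric)."""
        A = {i: set(pattern[i]) for i in range(1, n + 1)}  # variable adjacency
        E = {i: set() for i in range(1, n + 1)}            # element lists
        L = {}                                             # element patterns
        V = set(range(1, n + 1))
        order = []

        def degree(i):                       # |Adj_G(i)| via Equation (3.1)
            adj = set(A[i])
            for e in E[i]:
                adj |= L[e]
            adj.discard(i)
            return len(adj)

        while V:
            p = min(V, key=degree)           # pivot of minimum degree
            Lp = set(A[p])                   # Equation (3.1): pattern of the
            for e in E[p]:                   # new element p
                Lp |= L[e]
            Lp.discard(p)
            for i in Lp:
                A[i] = (A[i] - Lp) - {p}     # remove redundant entries
                E[i] = (E[i] - E[p]) | {p}   # element absorption
            for e in E[p]:                   # absorbed elements disappear
                L.pop(e, None)
            L[p] = Lp                        # store the new element
            A[p].clear(); E[p].clear()
            V.discard(p)
            order.append(p)
        return order

    # Example: a 4-cycle; every node has degree 2, and the first pivot
    # creates one element of size 2.
    print(min_degree_order(4, {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3}}))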




3.3. Quotient-graph-based minimum degree algorithm. A minimum de-
gree algorithm based on the quotient graph is shown in Algorithm 1. It includes
element absorption, mass elimination, supervariables, and external degrees. Super-
variable detection is simplified by computing a hash function on each variable, so that
not all pairs of variables need be compared [3]. Algorithm 1 does not include two












important features of Liu's Multiple Minimum Degree (MMD) algorithm: incomplete
update [17, 18] and multiple elimination [25]. With multiple elimination, an independent
set of pivots with minimum degree is selected before any degrees are updated. If
a variable is adjacent to two or more pivot elements, its degree is computed only once.
A variable $j$ is outmatched if $\mathrm{Adj}_G(i) \subseteq \mathrm{Adj}_G(j)$. With incomplete degree update,
the degree update of the outmatched variable $j$ is avoided until variable $i$ is selected
as pivot. These two features further reduce the amount of work needed for the degree
computation in MMD. We will discuss their relationship to the AMD algorithm in
the next section.
The time taken to compute $d_i$ using Equation (3.2) by a quotient-graph-based
minimum degree algorithm is

(3.3)  $\Theta\left( |\mathcal{A}_i| + \sum_{e \in \mathcal{E}_i} |\mathcal{L}_e| \right)$,

which is $\Omega(|\mathrm{Adj}_{G^k}(i)|)$ if all variables are simple.¹ This degree computation is the
most costly part of the minimum degree algorithm. When supervariables are present,
the time taken is in the best case proportional to the degree of the variable in the
"compressed" elimination graph, where all non-principal variables and their incident
edges are removed.

4. Approximate degree. Having discussed the data structures and the
standard minimum degree implementations, we now consider our approximation for
the minimum degree and indicate its lower complexity.
We assume that $p$ is the $k$th pivot, and that we compute the bounds only for
supervariables $i \in \mathcal{L}_p$. Rather than computing the exact external degree, $d_i$, our
Approximate Minimum Degree algorithm (AMD) computes an upper bound [7, 8],

(4.1)  $\overline{d}_i^{\,k} = \min \left\{ \begin{array}{l} n - k, \\ \overline{d}_i^{\,k-1} + |\mathcal{L}_p \setminus \overline{i}|, \\ |\mathcal{A}_i \setminus \overline{i}| + |\mathcal{L}_p \setminus \overline{i}| + \sum_{e \in \mathcal{E}_i \setminus \{p\}} |\mathcal{L}_e \setminus \mathcal{L}_p| \end{array} \right.$

The first two terms ($n - k$, the size of the active submatrix, and $\overline{d}_i^{\,k-1} + |\mathcal{L}_p \setminus \overline{i}|$,
the worst-case fill-in) are usually not as tight as the third term in Equation (4.1).
Algorithm 2 computes $|\mathcal{L}_e \setminus \mathcal{L}_p|$ for all elements $e$ in the entire quotient graph. The
set $\mathcal{L}_e$ splits into two disjoint subsets: the external subset $\mathcal{L}_e \setminus \mathcal{L}_p$ and the internal
subset $\mathcal{L}_e \cap \mathcal{L}_p$. If Algorithm 2 scans element $e$, the term $w(e)$ is initialized to $|\mathcal{L}_e|$
and then decremented once for each variable $i$ in the internal subset $\mathcal{L}_e \cap \mathcal{L}_p$, and, at
the end of Algorithm 2, we have $w(e) = |\mathcal{L}_e| - |\mathcal{L}_e \cap \mathcal{L}_p| = |\mathcal{L}_e \setminus \mathcal{L}_p|$. If Algorithm 2
does not scan element $e$, the term $w(e)$ is less than zero. Combining these two cases,
we obtain

(4.2)  $|\mathcal{L}_e \setminus \mathcal{L}_p| = \left\{ \begin{array}{ll} w(e) & \text{if } w(e) \ge 0 \\ |\mathcal{L}_e| & \text{otherwise} \end{array} \right.$  for all $e \in \overline{V}$.


¹ Asymptotic complexity notation is defined in [6]. We write $f(n) = \Theta(g(n))$ if there exist
positive constants $c_1$, $c_2$, and $n_0$ such that $0 \le c_1 g(n) \le f(n) \le c_2 g(n)$ for all $n \ge n_0$. Similarly,
$f(n) = \Omega(g(n))$ if there exist positive constants $c$ and $n_0$ such that $0 \le c g(n) \le f(n)$ for all $n \ge n_0$;
and $f(n) = O(g(n))$ if there exist positive constants $c$ and $n_0$ such that $0 \le f(n) \le c g(n)$ for all
$n \ge n_0$.












Algorithm 2 (Computation of $|\mathcal{L}_e \setminus \mathcal{L}_p|$ for all $e \in \overline{V}$)

    assume $w(1 \ldots n) < 0$
    for each supervariable $i \in \mathcal{L}_p$ do
        for each element $e \in \mathcal{E}_i$ do
            if $w(e) < 0$ then $w(e) = |\mathcal{L}_e|$
            $w(e) = w(e) - |\overline{i}|$
        end for
    end for



Algorithm 2 is followed by a second loop that computes our upper bound degree, $\overline{d}_i$,
for each supervariable $i \in \mathcal{L}_p$, using Equations (4.1) and (4.2). The total time for
Algorithm 2 is

$O\left( \sum_{i \in \mathcal{L}_p} |\mathcal{E}_i| \right)$.

The second loop to compute the upper bound degrees takes time

(4.3)  $O\left( \sum_{i \in \mathcal{L}_p} (|\mathcal{A}_i| + |\mathcal{E}_i|) \right)$,

which is thus equal to the total asymptotic time.
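To make the two passes concrete, here is a Python sketch (ours; all variables are assumed simple, so $|\overline{i}| = 1$, and the names `wbar` and `dbar` are ours):

    # Pass 1 (Algorithm 2): compute w(e) = |L_e| - |L_e intersect L_p| for
    # every element e adjacent to some supervariable in L_p.
    def wbar(p, Lp, E, L):
        w = {}                               # w.get(e, -1) < 0: e not scanned
        for i in Lp:
            for e in E[i]:
                if w.get(e, -1) < 0:
                    w[e] = len(L[e])
                w[e] -= 1                    # |ibar| = 1 for simple variables
        return w

    # Pass 2: the upper bound of Equation (4.1); Equation (4.2) reads
    # |L_e \ L_p| off w.
    def dbar(i, p, Lp, A, E, L, w, n, k, dprev):
        ext = lambda e: w[e] if w.get(e, -1) >= 0 else len(L[e])
        third = (len(A[i] - {i}) + len(Lp - {i})
                 + sum(ext(e) for e in E[i] if e != p))
        return min(n - k, dprev + len(Lp - {i}), third)

    # Tiny demo: pivot p = 1 with L_1 = {2, 3}; since |E_2| <= 1 after the
    # update, the bound is exact.
    A = {2: {4}, 3: set()}
    E = {2: {1}, 3: {1}}
    L = {1: {2, 3}}
    w = wbar(1, L[1], E, L)
    print(dbar(2, 1, L[1], A, E, L, w, n=5, k=1, dprev=3))   # prints 2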
Multiple elimination [25] improves the minimum degree algorithm by updating the
degree of a variable only once for each set of independent pivots. Incomplete degree
update [17, 18] skips the degree update of outmatched variables. We cannot take full
advantage of the incomplete degree update, since it avoids the degree update for some
supervariables adjacent to the pivot element. With our technique (Algorithm 2), we
must scan the element lists for all supervariables $i$ in $\mathcal{L}_p$. If the degree update of one
of the supervariables is to be skipped, its element list must still be scanned so that the
external subset terms can be computed for the degree update of the other supervariables
in $\mathcal{L}_p$. The only advantage of multiple elimination or incomplete degree update would
be to skip the second loop that computes the upper bound degrees for outmatched
variables or supervariables for which the degree has already been computed.

If the total time in Equation (4.3) is amortized across the computation of all
supervariables $i \in \mathcal{L}_p$, then the time taken to compute $\overline{d}_i$ is

$\Theta(|\mathcal{A}_i| + |\mathcal{E}_i|) = O(|\mathcal{A}_i^0|)$,

which is $\Theta(|\mathrm{Adj}_{\mathcal{G}}(i)|)$ if all variables are simple. Computing our bound takes time
proportional to the degree of the variable in the quotient graph, $\mathcal{G}$. This is much faster
than the time taken to compute the exact external degree (see Equation (3.3)).
4.1. Accuracy of our approximate degrees. Gilbert, Moler, and Schreiber
[24] also use approximate external degrees that they can compute in the same time
as our degree bound $\overline{d}_i$. In our notation, their bound $\hat{d}_i$ is

$\hat{d}_i = |\mathcal{A}_i \setminus \overline{i}| + \sum_{e \in \mathcal{E}_i} |\mathcal{L}_e \setminus \overline{i}|$.












Since many pivotal variables are adjacent to two or fewer elements when selected,
Ashcraft and Eisenstat [4] have suggested a combination of $d$ and $\hat{d}$,

$\tilde{d}_i = \left\{ \begin{array}{ll} d_i & \text{if } |\mathcal{E}_i| = 2 \\ \hat{d}_i & \text{otherwise.} \end{array} \right.$

Computing $\tilde{d}$ takes the same time as $\hat{d}$ or $\overline{d}$, except when $|\mathcal{E}_i| = 2$. In this case, it
takes $O(|\mathcal{A}_i| + |\mathcal{L}_e|)$ time to compute $\tilde{d}$, whereas computing $\hat{d}$ or $\overline{d}$ takes $O(|\mathcal{A}_i|)$ time.
In the Yale Sparse Matrix Package [17] the $|\mathcal{L}_e \setminus \mathcal{L}_p|$ term for the $\mathcal{E}_i = \{e, p\}$ case
is computed by scanning $\mathcal{L}_e$ once. It is then used to compute $d_i$ for all $i \in \mathcal{L}_p$ for
which $\mathcal{E}_i = \{e, p\}$. This technique can also be used to compute $\tilde{d}$, and thus the amortized
time to compute $\tilde{d}$ is $O(|\mathcal{A}_i| + |\mathcal{L}_e| / |\mathcal{L}_p|)$ and not $O(|\mathcal{A}_i| + |\mathcal{L}_e|)$.

Theorem 1: Relationship between external degree and the three approximate
degree bounds. The equality $d_i = \tilde{d}_i = \overline{d}_i = \hat{d}_i$ holds when $|\mathcal{E}_i| \le 1$.
The inequality $d_i = \tilde{d}_i = \overline{d}_i \le \hat{d}_i$ holds when $|\mathcal{E}_i| = 2$. Finally, the inequality
$d_i \le \overline{d}_i \le \tilde{d}_i = \hat{d}_i$ holds when $|\mathcal{E}_i| > 2$.
Proof:
The bound $\hat{d}_i$ is equal to the exact degree when variable $i$ is adjacent to at most
one element ($|\mathcal{E}_i| \le 1$). The accuracy of the bound is unaffected by the size of $\mathcal{A}_i$,
since entries are removed from $\mathcal{A}_i$ that fall within the pattern $\mathcal{L}_e$ of an element. Thus,
if there is just one element (the current element $p$, say), the bound $\hat{d}_i$ is tight. If $|\mathcal{E}_i|$
is two (the current element, $p$, and a prior element $e$, say), we have

$\hat{d}_i = |\mathcal{A}_i \setminus \overline{i}| + |\mathcal{L}_p \setminus \overline{i}| + |\mathcal{L}_e \setminus \overline{i}| = d_i + |(\mathcal{L}_e \cap \mathcal{L}_p) \setminus \overline{i}|$.

The bound $\hat{d}_i$ counts entries in the set $(\mathcal{L}_e \cap \mathcal{L}_p) \setminus \overline{i}$ twice, and so $\hat{d}_i$ will be an
overestimate in the possible (even likely) case that a variable $j \neq i$ exists that is
adjacent to both $e$ and $p$. Combined with the definition of $\tilde{d}$, we have $d_i = \tilde{d}_i = \hat{d}_i$
when $|\mathcal{E}_i| \le 1$, and $d_i = \tilde{d}_i \le \hat{d}_i$ when $|\mathcal{E}_i| = 2$.

If $|\mathcal{E}_i| \le 1$, our bound $\overline{d}_i$ is exact for the same reason that $\hat{d}_i$ is exact. If $|\mathcal{E}_i|$ is two,
we have

$\overline{d}_i = |\mathcal{A}_i \setminus \overline{i}| + |\mathcal{L}_p \setminus \overline{i}| + |\mathcal{L}_e \setminus \mathcal{L}_p| = d_i$.

No entry is in both $\mathcal{A}_i$ and any element $\mathcal{L}_e$, since these redundant entries are removed
from $\mathcal{A}_i$. Any entry in $\mathcal{L}_p$ does not appear in the external subset $\mathcal{L}_e \setminus \mathcal{L}_p$. Thus, no
entry is counted twice, and $\overline{d}_i = d_i$ when $|\mathcal{E}_i| \le 2$. Finally, consider both $\overline{d}_i$ and $\hat{d}_i$
when $|\mathcal{E}_i| > 2$. We have

$\overline{d}_i = |\mathcal{A}_i \setminus \overline{i}| + |\mathcal{L}_p \setminus \overline{i}| + \sum_{e \in \mathcal{E}_i \setminus \{p\}} |\mathcal{L}_e \setminus \mathcal{L}_p|$

and

$\hat{d}_i = |\mathcal{A}_i \setminus \overline{i}| + |\mathcal{L}_p \setminus \overline{i}| + \sum_{e \in \mathcal{E}_i \setminus \{p\}} |\mathcal{L}_e \setminus \overline{i}|$.

Since these degree bounds are only used when computing the degree of a supervariable
$i \in \mathcal{L}_p$, we have $\overline{i} \subseteq \mathcal{L}_p$. Thus, $\overline{d}_i \le \hat{d}_i$ when $|\mathcal{E}_i| > 2$.
□












Combining the three inequalities in Theorem 1, the inequality $d_i \le \overline{d}_i \le \tilde{d}_i \le \hat{d}_i$
holds for all values of $|\mathcal{E}_i|$.
Note that, if a variable $i$ is adjacent to two elements or fewer, then our bound is
equal to the exact external degree. This is very important, since most variables of
minimum degree are adjacent to two elements or fewer. Additionally, our degree bound
takes advantage of element absorption, since the bound depends on $|\mathcal{E}_i|$ after elements
are absorbed.
4.2. Degree computation example. We illustrate the computation of our approximate
external degree bound in Figures 3.1 and 3.2. Variable 6 is adjacent to three
elements in $\mathcal{G}^3$ and $\mathcal{G}^4$. All other variables are adjacent to two or fewer elements. In
$\mathcal{G}^3$, the bound $\overline{d}_6$ is tight, since the two sets $\mathcal{L}_1 \setminus \mathcal{L}_3$ and $\mathcal{L}_2 \setminus \mathcal{L}_3$ are disjoint.
In graph $\mathcal{G}^4$, the current pivot element is $p = 4$. We compute

$\overline{d}_6 = |\mathcal{A}_6 \setminus \overline{6}| + |\mathcal{L}_4 \setminus \overline{6}| + \left( \sum_{e \in \mathcal{E}_6 \setminus \{4\}} |\mathcal{L}_e \setminus \mathcal{L}_4| \right)$
$= |\emptyset \setminus \{6\}| + |\{6, 7, 8\} \setminus \{6\}| + (|\mathcal{L}_2 \setminus \mathcal{L}_4| + |\mathcal{L}_3 \setminus \mathcal{L}_4|)$
$= |\{7, 8\}| + (|\{5, 6, 9\} \setminus \{6, 7, 8\}| + |\{5, 6, 7\} \setminus \{6, 7, 8\}|)$
$= |\{7, 8\}| + (|\{5, 9\}| + |\{5\}|)$
$= 5$.

The exact external degree of variable 6 is $d_6 = 4$, as can be seen in the elimination
graph $G^4$ on the left of Figure 3.2(a). Our bound is one more than the exact external
degree, since variable 5 appears in both $\mathcal{L}_2 \setminus \mathcal{L}_4$ and $\mathcal{L}_3 \setminus \mathcal{L}_4$, but is one less than
the bound $\hat{d}_6$, which is equal to 6 in this case. Our bound on the degree of variable
6 is again tight after the next pivot step, since elements 2 and 3 are absorbed into
element 5.
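This arithmetic can be confirmed directly with a few set operations (an illustrative check we add, not part of the original report):

    # Check dbar_6 = 5 in G^4:
    #   |A_6 \ {6}| + |L_4 \ {6}| + |L_2 \ L_4| + |L_3 \ L_4|
    A6 = set()
    L4, L2, L3 = {6, 7, 8}, {5, 6, 9}, {5, 6, 7}
    dbar6 = len(A6 - {6}) + len(L4 - {6}) + len(L2 - L4) + len(L3 - L4)
    print(dbar6)        # 5: variable 5 is counted twice, whereas d_6 = 4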
5. The approximate minimum degree algorithm. The Approximate Minimum
Degree algorithm is identical to Algorithm 1, except that the external degree, $d_i$,
is replaced with the bound $\overline{d}_i$ throughout. The bound on the external degree, $\overline{d}_i$, is computed
using Algorithm 2 and Equations (4.1) and (4.2). In addition to absorbing the elements
in $\mathcal{E}_p$, any element with an empty external subset ($\mathcal{L}_e \setminus \mathcal{L}_p = \emptyset$) is also absorbed
into element $p$, even if $e$ is not adjacent to $p$. This "aggressive" element absorption
improves the degree bounds by reducing $|\mathcal{E}_i|$.

As in many other minimum degree algorithms, we use linked lists to assist the
search for a variable of minimum degree. List $d$ holds all supervariables $i$ with degree
bound $\overline{d}_i = d$. Maintaining this data structure takes time proportional to the total
number of degree computations, or $O(|L|)$.

Computing the pattern of each pivot element, $\mathcal{L}_p$, takes a total of $O(|L|)$ time
overall, since each element is used in the computation of at most one other element,
and the total size of all elements constructed is $O(|L|)$.

The AMD algorithm is based on the quotient graph data structure used in the
MA27 minimum degree algorithm [13, 14, 15]. Initially, the sets $\mathcal{A}_i$ are stored, followed
by a small amount of elbow room. When the set $\mathcal{L}_p$ is formed, it is placed in the elbow
room (or in place of $\mathcal{A}_p$ if $\mathcal{E}_p = \emptyset$). Garbage collection occurs if the elbow room is
exhausted. During garbage collection, the space taken by $\mathcal{A}_i$ and $\mathcal{E}_i$ is reduced to
exactly $|\mathcal{A}_i| + |\mathcal{E}_i|$ for each supervariable $i$ (which is less than or equal to $|\mathcal{A}_i^0|$) and
the extra space is reclaimed. The space for $\mathcal{A}_e$ and $\mathcal{E}_e$ for all elements $e \in \overline{V}$ is fully
reclaimed, as is the space for $\mathcal{L}_e$ of any absorbed element $e$. Each garbage collection
takes time proportional to the size of the workspace (normally $O(|\mathcal{A}|)$). In
practice, elbow room of size $n$ is sufficient.
During the computation of our degree bounds, we compute the following hash
function for supervariable detection [3],

$\mathrm{Hash}(i) = \left( \left( \sum_{j \in \mathcal{A}_i} j + \sum_{e \in \mathcal{E}_i} e \right) \bmod (n - 1) \right) + 1$,

which increases the degree computation time by a small constant factor. We place
each supervariable $i$ in a hash bucket according to $\mathrm{Hash}(i)$, taking time $O(|L|)$ overall.
If two or more supervariables are placed in the same hash bucket, then each pair of
supervariables $i$ and $j$ in the hash bucket is tested for indistinguishability. If the
hash function results in no collisions, then the total time taken by the comparison is
$O(|\mathcal{A}|)$.

Ashcraft [3] uses this hash function as a preprocessing step on the entire matrix
(without the $\bmod (n-1)$ term, and with an $O(|V| \log |V|)$ sort instead of $|V|$ hash
buckets). In contrast, we use this function during the ordering, and only hash those
variables adjacent to the current pivot element.
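A sketch of this hash (ours; the sets below are hypothetical): variables with identical adjacency lists always land in the same bucket, and colliding pairs are then compared exactly.

    # Hash(i) = ((sum over A_i) + (sum over E_i)) mod (n - 1) + 1
    def svar_hash(i, A, E, n):
        return (sum(A[i]) + sum(E[i])) % (n - 1) + 1

    # Two variables with identical A and E sets collide; an exact
    # comparison then confirms or rejects the merge.
    n, A, E = 10, {7: {10}, 9: {10}}, {7: {5}, 9: {5}}
    print(svar_hash(7, A, E, n), svar_hash(9, A, E, n))   # 7 7: same bucket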
For example, variables 7, 8, and 9 are indistinguishable in $G^5$, in Figure 3.2(a).
The AMD algorithm would not consider variable 8 at step 5, since it is not adjacent
to the pivot element 5 (refer to quotient graph $\mathcal{G}^5$ in Figure 3.2(b)). AMD would
not construct $\overline{7} = \{7, 9\}$ at step 5, since 7 and 9 are distinguishable in $\mathcal{G}^5$. It would
construct $\overline{7} = \{7, 8, 9\}$ at step 6, however.

The total number of times the approximate degree $\overline{d}_i$ of variable $i$ is computed
during the elimination is no more than the number of nonzero entries in row $k$ of $L$, where
variable $i$ is the $k$th pivot. The time taken to compute $\overline{d}_i$ is $O(|\mathcal{A}_i^0|)$, or equivalently
$O(|(PAP^T)_{k*}|)$, the number of nonzero entries in row $k$ of the permuted matrix.
The total time taken by the entire AMD algorithm is thus bounded by the degree
computation,

(5.1)  $O\left( \sum_{k=1}^{n} |L_{k*}| \cdot |(PAP^T)_{k*}| \right)$.

This bound assumes no (or few) supervariable hash collisions and a constant number of
garbage collections. In practice these assumptions seem to hold, but the asymptotic
time would be higher if they did not. In many problem domains, the number of
nonzeros per row of $A$ is a constant, independent of $n$. For matrices in these domains,
our AMD algorithm takes time $O(|L|)$ (with the same assumptions).
6. Performance results. In this section, we present the results of our experiments
with AMD on a wide range of test matrices. We first compare the degree
computations discussed above ($t$, $d$, $\overline{d}$, $\tilde{d}$, and $\hat{d}$), as well as an upper bound on the
true degree, $\overline{t} \equiv \overline{d} + |\overline{i}| - 1$. We then compare the AMD algorithm with other established
minimum degree codes (MMD and MA27).

6.1. Test matrices. We tested all degree bounds and codes on all matrices
in the Harwell/Boeing collection of type PUA, RUA, PSA, and RSA [11, 12] (at
orion.cerfacs.fr or numerical.cc.rl.ac.uk), all non-singular matrices in Saad's
SPARSKIT2 collection (at ftp.cs.umn.edu), all matrices in the University of Florida
collection (available from ftp.cis.ufl.edu in the directory pub/umfpack/matrices),
and several other matrices from NASA and Boeing.













TABLE 6.1
Selected matrices in test set

Matrix     n       nz         % pivots     % updates    Description
                              |E_p| > 2    |E_i| > 2
RAEFSKY3   21,200  733,784    0.00         13.4         fluid/structure interaction, turbulence
VENKAT01   62,424  827,684    0.71         15.7         unstructured 2D Euler solver
BCSSTK32   44,609  985,046    0.20         27.3         structural eng., automobile chassis
EX19       12,005  123,937    1.57         29.4         2D developing pipe flow (turbulent)
BCSSTK30   28,924  1,007,284  0.66         31.8         structural eng., off-shore platform
CT20STIF   52,329  1,323,067  0.77         33.2         structural eng., CT20 engine block
NASASRB    54,870  1,311,227  0.06         35.0         shuttle rocket booster
OLAF       16,146  499,505    0.41         35.2         NASA test problem
RAEFSKY1   3,242   145,517    0.00         38.9         incompressible flow, pressure-driven pipe
CRYSTK03   24,696  863,241    0.00         40.9         structural eng., crystal vibration
RAEFSKY4   19,779  654,416    0.00         41.4         buckling problem for container model
CRYSTK02   13,965  477,309    0.00         42.0         structural eng., crystal vibration
BCSSTK33   8,738   291,583    0.00         42.6         structural eng., auto steering mech.
BCSSTK31   35,588  572,914    0.60         43.1         structural eng., automobile component
EX11       16,614  540,167    0.04         43.3         CFD, 3D cylinder & flat plate heat exch.
FINAN512   74,752  261,120    1.32         46.6         economics, portfolio optimization
RIM        22,560  862,411    2.34         63.2         chemical eng., fluid mechanics problem
BBMAT      38,744  1,274,141  5.81         64.4         CFD, 2D airfoil with turbulence
EX40       7,740   225,136    17.45        64.7         CFD, 3D die swell problem on square die
WANG4      26,068  75,564     15.32        78.3         3D MOSFET semicond. (30x30x30 grid)
LHR34      35,152  608,830    7.69         78.7         chemical eng., light hydrocarbon recovery
WANG3      26,064  75,552     15.29        79.2         3D diode semiconductor (30x30x30 grid)
LHR71      70,304  1,199,704  8.47         81.1         chemical eng., light hydrocarbon recovery
ORANI678   2,529   85,426     6.68         86.9         Australian economic model
PSMIGR1    3,140   410,781    6.65         91.0         US county-by-county migration
APPU       14,000  1,789,392  15.64        94.4         NASA test problem (random matrix)



Of those 378 matrices, we present results below for those matrices requiring 500 million
or more floating-point operations for the Cholesky factorization, as well as the ORANI678
matrix from the Harwell/Boeing collection and the EX19 matrix from Saad's collection (a
total of 26 matrices). The latter two are best-case and worst-case examples from the
set of smaller matrices.
For the unsymmetric matrices in the test set, we first used the maximum transver-
sal algorithm MC21 from the Harwell Subroutine Library [9] to reorder the matrix so
that the permuted matrix has a zero-free diagonal. We then formed the symmetric
pattern of the permuted matrix plus its transpose. This is how a minimum degree
ordering algorithm is used in MUPS [1, 2]. For these matrices, Table 6.1 lists the
statistics for the symmetrized pattern.
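The symmetrized pattern can be formed as sketched below using SciPy (our illustration; MC21 itself is a Fortran routine in the Harwell Subroutine Library, so the permutation `perm` is a hypothetical stand-in for its output):

    # Form the pattern of B = PA + (PA)^T, where the row permutation P
    # (from MC21) gives PA a zero-free diagonal.  Numerical cancellation
    # is ignored in this sketch.
    import numpy as np
    from scipy import sparse

    def symmetrized_pattern(A, perm):
        PA = A[perm, :]                      # apply the row permutation
        B = (PA + PA.T).tocsr()              # matrix plus its transpose
        B.data[:] = 1                        # keep the structure only
        return B

    A = sparse.csr_matrix(np.array([[0., 1., 0.],
                                    [2., 0., 5.],
                                    [0., 0., 3.]]))
    perm = np.array([1, 0, 2])               # hypothetical MC21 output
    print(symmetrized_pattern(A, perm).toarray())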
Table 6.1 lists the matrix name, the order, the number of nonzeros in the lower
triangular part, two statistics obtained with an exact minimum degree ordering (using
$d$), and a description. In column 4, we report the percentage of pivots $p$ such that
$|\mathcal{E}_p| > 2$. Column 4 shows that only a small percentage of pivots selected
using an exact minimum degree ordering have more than two elements in their
adjacency list. Therefore, we can expect a good quality ordering with an algorithm
based on our approximate degree bound. In column 5, we indicate how often a
degree $d_i$ is computed when $|\mathcal{E}_i| > 2$ (as a percentage of the total number of degree
updates). Table 6.1 is sorted according to this degree update percentage. Column 5
thus reports the percentage of "costly" degree updates performed by a minimum
degree algorithm based on the exact degree. For matrices with relatively large values












in column 5, significant time reductions can be expected with an approximate-degree-based
algorithm.
Since any minimum degree algorithm is sensitive to tie-breaking issues, we randomly
permuted all matrices and their adjacency lists 21 times (except for the random
APPU matrix, which we ran only once). All methods were given the same set of 21
randomized matrices. We also ran each method on the original matrix. On some matrices,
the original matrix gives better ordering time and fill-in results for all methods
than the best result obtained with the randomized matrices. The overall comparisons
do not, however, depend on whether the original or randomized matrices are used. We
thus report only the median ordering time and fill-in obtained for the randomized
matrices.
The APPU matrix is a random matrix used in a NASA benchmark, and is thus
not representative of sparse matrices from real problems. We include it in our test
set as a pathological case that demonstrates how well AMD handles a very irregular
problem. Its factors are about 90% dense. It was not practical to run the APPU
matrix 21 times because the exact degree update algorithms took too much time.

6.2. Comparing the exact and approximate degrees. To make a valid comparison
between degree update methods, we modified our code for the AMD algorithm
so that we could compute the exact external degree ($d$), our bound ($\overline{d}$), Ashcraft and
Eisenstat's bound ($\tilde{d}$), Gilbert, Moler, and Schreiber's bound ($\hat{d}$), the exact true degree
($t$), and our upper bound on the true degree ($\overline{t}$). The six codes based on $d$, $\overline{d}$, $\tilde{d}$,
$\hat{d}$, $t$, and $\overline{t}$ (columns 3 to 8 of Table 6.2) differ only in how they compute the degree.
Since aggressive absorption is more difficult when using some bounds than others,
we switched off aggressive absorption for these six codes. The actual AMD code (in
column 2 of Table 6.2) uses $\overline{d}$ with aggressive absorption.
Table 6.2 lists the median number of nonzeros below the diagonal in $L$ (in thousands)
for each method. Results at least 20% higher than the lowest median $|L|$ in the table
are underlined. Our upper bound on the true degree ($\overline{t}$) and the exact true degree
($t$) give nearly identical results. As expected, using minimum degree algorithms
based on external degree noticeably improves the quality of the ordering (compare
columns 3 and 7, or columns 4 and 8). From the inequality $d \le \overline{d} \le \tilde{d} \le \hat{d}$, we
would expect a similar ranking in the quality of the orderings produced by these methods.
Table 6.2 confirms this. The bound $\overline{d}$ and the exact external degree $d$ produce nearly
identical results. Comparing the AMD results and the $\overline{d}$ column, aggressive absorption
tends to result in slightly lower fill-in, since it reduces $|\mathcal{E}_i|$ and thus improves the
accuracy of our bound. The $\tilde{d}$ bound is often accurate enough to produce good results,
but can fail catastrophically for matrices with a high percentage of approximate
pivots (see column 4 in Table 6.1). The less accurate $\hat{d}$ bound produces notably worse
results for many matrices.

Comparing all 378 matrices, the median $|L|$ when using $\overline{d}$ is never more than 9%
higher than the median fill-in obtained when using the exact external degree, $d$ (with
the exception of the FINAN512 matrix). The fill-in results for $\overline{d}$ and $d$ are identical
for nearly half of the 378 matrices. The approximate degree bound $\overline{d}$ thus gives a very
reliable estimate of the degree in the context of a minimum degree algorithm.

The FINAN512 matrix is highly sensitive to tie-breaking variations. Its graph
consists of two types of nodes: "constraint" nodes and "linking" nodes [5]. The
constraint nodes form independent sparse subgraphs, connected together via a tree of
linking nodes. This matrix is a pathological worst-case matrix for any minimum degree
method.




















TABLE 6.2
Median fill-in results of the degree update methods
(number of nonzeros below diagonal in L, in thousands)

Matrix     AMD     d       d̄       d̃       d̂       t       t̄
RAEFSKY3   4709    4709    4709    4709    5114    4992    4992
VENKAT01   5789    5771    5789    5798    6399    6245    6261
BCSSTK32   5080    5081    5079    5083    5721    5693    5665
EX19       319     319     319     318     366     343     343
BCSSTK30   3752    3751    3753    3759    4332    4483    4502
CT20STIF   10858   10758   10801   11057   13367   12877   12846
NASASRB    12282   12306   12284   12676   14909   14348   14227
OLAF       2860    2858    2860    2860    3271    3089    3090
RAEFSKY1   1151    1151    1151    1151    1318    1262    1262
CRYSTK03   13836   13836   13836   13836   17550   15507   15507
RAEFSKY4   7685    7685    7685    7685    9294    8196    8196
CRYSTK02   6007    6007    6007    6007    7366    6449    6449
BCSSTK33   2624    2624    2624    2640    3236    2788    2787
BCSSTK31   5115    5096    5132    5225    6194    6079    6057
EX11       6014    6016    6014    6014    7619    6673    6721
FINAN512   4778    4036    6042    11418   11505   8235    8486
RIM        3948    3898    3952    3955    4645    4268    4210
BBMAT      19673   19880   19673   21422   37820   21197   21445
EX40       1418    1386    1417    1687    1966    1526    1530
WANG4      6547    6808    6548    6566    7871    7779    7598
LHR34      3618    3743    3879    11909   27125   4383    4435
WANG3      6545    6697    6545    6497    7896    7555    7358
LHR71      7933    8127    8499    28241   60175   9437    9623
ORANI678   147     147     146     150     150     147     146
PSMIGR1    3020    3025    3011    3031    3176    2966    2975
APPU       87648   87613   87648   87566   87562   87605   87631


All constraint nodes should be ordered first, but linking nodes have low
degree and tend to be selected first, which causes high fill-in. Using a tree dissection
algorithm, Berger, Mulvey, Rothberg, and Vanderbei [5] obtain an ordering with only
1.83 million nonzeros in L.
Table 6.3 lists the median ordering time (in seconds on a SUN SPARCstation
10) for each method. Ordering times at least twice the minimum median ordering
time listed in the table are underlined. Computing the $\hat{d}$ bound is often
the fastest, since it requires a single pass over the element lists instead of the two
passes required for the $\overline{d}$ bound. It is, however, sometimes slower than $\overline{d}$ because it
can generate more fill-in, which increases the ordering time (see Equation (5.1)). The
ordering time of the two exact degree updates ($d$ and $t$) increases dramatically as the
percentage of "costly" degree updates increases (those for which $|\mathcal{E}_i| > 2$).

Garbage collection has little effect on the ordering time obtained. In the above
runs, we gave each method elbow room of size $n$. Usually a single garbage collection
occurred. At most two garbage collections occurred for AMD, and at most three for
the other methods (aggressive absorption reduces the memory requirements).


TABLE 6.3
Median ordering time of the degree update methods
(ordering time, in seconds)

Matrix     AMD     d        d̄       d̃       d̂       t        t̄
RAEFSKY3   1.05    1.10     1.09    1.05    1.02    1.15     1.09
VENKAT01   4.07    4.95     4.11    4.47    3.88    4.32     3.85
BCSSTK32   4.67    5.64     4.54    4.91    4.35    5.55     4.48
EX19       0.87    1.12     0.89    1.01    0.86    1.09     0.87
BCSSTK30   3.51    5.30     3.55    3.65    3.51    4.38     3.38
CT20STIF   6.62    8.66     6.54    7.07    6.31    8.63     6.45
NASASRB    7.69    11.03    7.73    9.23    7.78    11.78    7.99
OLAF       1.83    2.56     1.90    2.16    1.83    2.33     1.78
RAEFSKY1   0.27    0.34     0.28    0.32    0.25    0.35     0.28
CRYSTK03   3.30    4.84     3.08    3.68    3.14    5.23     3.30
RAEFSKY4   2.32    2.90     2.18    2.45    2.08    3.12     2.07
CRYSTK02   1.49    2.34     1.55    1.64    1.45    2.04     1.52
BCSSTK33   0.91    1.36     1.05    0.99    0.85    1.62     0.91
BCSSTK31   4.55    7.53     4.92    5.68    4.56    7.41     4.92
EX11       2.70    4.06     2.77    3.00    2.60    4.23     2.89
FINAN512   15.03   34.11    14.45   17.79   15.84   46.49    18.58
RIM        5.74    10.38    5.69    6.12    5.72    10.01    5.58
BBMAT      27.80   115.75   27.44   42.17   23.02   129.32   28.33
EX40       1.04    1.56     1.10    1.09    0.95    1.46     1.12
WANG4      5.45    11.45    5.56    6.98    5.21    11.59    5.88
LHR34      19.56   109.10   25.62   45.36   43.70   125.41   24.73
WANG3      5.02    10.45    5.49    6.52    4.81    11.02    5.02
LHR71      46.03   349.58   58.25   129.85  121.96  389.70   60.40
ORANI678   5.49    196.01   8.13    6.97    7.23    199.01   8.45
PSMIGR1    10.61   334.27   10.07   14.20   8.16    339.28   9.94
APPU       41.75   2970.54  39.83   43.20   40.64   3074.44  38.93

6.3. Comparing algorithms. In this section, we compare AMD with two other
established minimum degree codes: Liu's Multiple Minimum Degree (MMD) code
[25] and Duff and Reid's MA27 code [15]. MMD stores the element patterns in a
fragmented manner and requires no elbow room [20, 21]. It uses the exact external
degree, $d$. MMD creates supervariables only when two variables $i$ and $j$ have no


adjacent variables and exactly two adjacent elements ($\mathcal{E}_i = \mathcal{E}_j = \{e, p\}$ and $\mathcal{A}_i =
\mathcal{A}_j = \emptyset$, where $p$ is the current pivot element). A hash function is not required. MMD
takes advantage of multiple elimination and incomplete update.
MA27 uses the true degree, $t$, and the same data structures as AMD. It detects
supervariables whenever two variables are adjacent to the current pivot element and
have the same structure in the quotient graph (as does AMD). MA27 uses the true
degree as the hash function for supervariable detection, and does aggressive absorption.
Neither AMD nor MA27 takes advantage of multiple elimination or incomplete
update.

Structural engineering matrices tend to have many rows of identical nonzero pattern.
Ashcraft has found that the total ordering time of MMD can be significantly
improved by detecting these initial supervariables before starting the elimination [3].
We implemented Ashcraft's pre-compression algorithm, and modified MMD to allow
for initial supervariables. We call the resulting code CMMD ("compressed" MMD).
Pre-compression has little effect on AMD, since AMD finds these supervariables when their
degrees are first updated. The AMD algorithm on compressed matrices, together with
the cost of pre-compression, was never faster than AMD alone.
Table 6.4 lists the median number of nonzeros below the diagonal in $L$ (in thousands)
for each code. Results at least 20% higher than the lowest median $|L|$ in the table
are underlined. AMD, MMD, and CMMD find orderings of about the
same quality. MA27 is slightly worse because it uses the true degree ($t$) instead of the
external degree ($d$).

TABLE 6.4
Median fill-in results of the four codes
(number of nonzeros below diagonal in L, in thousands)

Matrix     AMD     MMD     CMMD    MA27
RAEFSKY3   4709    4779    4724    5041
VENKAT01   5789    5768    5811    6303
BCSSTK32   5080    5157    5154    5710
EX19       319     322     324     345
BCSSTK30   3752    3788    3712    4529
CT20STIF   10858   11212   10833   12760
NASASRB    12282   12490   12483   14068
OLAF       2860    2876    2872    3063
RAEFSKY1   1151    1165    1177    1255
CRYSTK03   13836   13812   14066   15496
RAEFSKY4   7685    7539    7582    8245
CRYSTK02   6007    5980    6155    6507
BCSSTK33   2624    2599    2604    2766
BCSSTK31   5115    5231    5216    6056
EX11       6014    5947    6022    6619
FINAN512   4778    8180    8180    8159
RIM        3948    3947    3914    4283
BBMAT      19673   19876   19876   21139
EX40       1418    1408    1401    1521
WANG4      6547    6619    6619    7598
LHR34      3618    4162    4162    4384
WANG3      6545    6657    6657    7707
LHR71      7933    9190    9190    9438
ORANI678   147     147     147     147
PSMIGR1    3020    2974    2974    2966
APPU       87648   87647   87647   87605


Considering the entire set of 378 matrices, AMD produces a better median fill-in
than MMD, CMMD, and MA27 for 62%, 61%, and 81% of the matrices, respectively.
AMD never generates more than 7%, 7%, and 4% more nonzeros in $L$ than MMD,
CMMD, and MA27, respectively. We have shown empirically that AMD produces an
ordering at least as good as these other three methods for this large test set.

If the apparent slight difference in ordering quality between AMD and MMD is
statistically significant, we conjecture that it has more to do with earlier supervariable
detection (which affects the external degree) than with the differences between the
external degree and our upper bound.

Table 6.5 lists the median ordering time (in seconds on a SUN SPARCstation 10)
for each method. The ordering time for CMMD includes the time taken by the pre-compression
algorithm. Ordering times at least twice the minimum median ordering
time listed in the table are underlined. On certain classes of matrices,
typically those from structural analysis applications, CMMD is significantly faster
than MMD. AMD is the fastest method for all but the EX19 matrix. For the other
352 matrices in our full test set, the differences in ordering time between these various
methods are typically smaller. If we compare the ordering time of AMD with the other
methods on all matrices in our test set requiring at least a tenth of a second of
ordering time, then AMD is slower than MMD, CMMD, and MA27 for only 6, 15,
and 8 matrices, respectively. For the full set of matrices, AMD is never more than 30%
slower than these other methods. The best and worst cases for the relative run-time
of AMD among the smaller matrices are included in Table 6.5 (the EX19 and ORANI678
matrices).


TABLE 6.5
Median ordering time of the four codes
(ordering time, in seconds)

Matrix     AMD     MMD      CMMD     MA27
RAEFSKY3   1.05    2.79     1.18     1.23
VENKAT01   4.07    9.01     4.50     5.08
BCSSTK32   4.67    12.47    5.51     6.21
EX19       0.87    0.69     0.83     1.03
BCSSTK30   3.51    7.78     3.71     4.40
CT20STIF   6.62    26.00    9.59     9.81
NASASRB    7.69    22.47    11.28    12.75
OLAF       1.83    5.67     4.41     2.64
RAEFSKY1   0.27    0.82     0.28     0.40
CRYSTK03   3.30    10.63    3.86     5.27
RAEFSKY4   2.32    5.24     2.36     2.91
CRYSTK02   1.49    3.89     1.53     2.37
BCSSTK33   0.91    2.24     1.32     1.31
BCSSTK31   4.55    11.60    7.76     7.92
EX11       2.70    7.45     5.05     3.90
FINAN512   15.03   895.23   897.15   40.31
RIM        5.74    9.09     8.11     10.13
BBMAT      27.80   200.86   201.03   134.58
EX40       1.04    2.13     2.04     1.30
WANG4      5.45    10.79    11.60    9.86
LHR34      19.56   139.49   141.16   77.83
WANG3      5.02    10.37    10.62    8.23
LHR71      46.03   396.03   400.40   215.01
ORANI678   5.49    124.99   127.10   124.66
PSMIGR1    10.61   186.07   185.74   229.51
APPU       41.75   5423.23  5339.24  2683.27

7. Summary. We have described a new upper bound for the degree of nodes
in the elimination graph that can be easily computed in the context of a minimum
degree algorithm. We have demonstrated that this upper bound for the degree is more
accurate than all previously used degree approximations. We have experimentally
shown that we can replace an exact degree update with our approximate degree update
and obtain almost identical fill-in.

An Approximate Minimum Degree algorithm (AMD), based on this external degree
approximation, has been described. We have shown that the AMD algorithm is highly
competitive with other ordering algorithms. It is typically faster than other minimum
degree algorithms, and produces results comparable to those of MMD (which is also based on
external degree) in terms of fill-in. AMD typically produces better results, in terms
of fill-in and computing time, than the MA27 minimum degree algorithm (based on
true degrees).

8. Acknowledgments. We would like to thank John Gilbert for outlining the
$\overline{d}_i \le \hat{d}_i$ portion of the proof of Theorem 1, Joseph Liu for providing a copy of the
MMD algorithm, and Cleve Ashcraft and Stan Eisenstat for their comments on a
draft of this paper.


REFERENCES
















[1] P. R. AMESTOY, Factorization of large sparse matrices based on a multifrontal approach
in a multiprocessor environment, INPT PhD Thesis TH/PA/91/2, CERFACS, Toulouse,
France, 1991.
[2] P. R. AMESTOY, M. DAYDE, AND I. S. DUFF, Use of level 3 BLAS in the solution of full and
sparse linear equations, in High Performance Computing: Proceedings of the International
Symposium on High Performance Computing, Montpellier, France, 22-24 March, 1989,
J.-L. Delhaye and E. Gelenbe, eds., Amsterdam, 1989, North Holland, pp. 19-31.
[3] C. ASHCRAFT, Compressed graphs and the minimum degree algorithm, SIAM Journal on Sci-
entific Computing, (1995, to appear).
[4] C. ASHCRAFT AND S. C. EISENSTAT. personal communication.
[5] A. BERGER, J. MULVEY, E. ROTHBERG, AND R. VANDERBEI, Solving multistage stochastic
programs using tree dissection, Tech. Report SOR-97-07, Program in Statistics and
Operations Research, Princeton University, Princeton, New Jersey, 1995.
[6] T. H. CORMEN, C. E. LEISERSON, AND R. L. RIVEST, Introduction to Algorithms, MIT Press,
Cambridge, Massachusetts, and McGraw-Hill, New York, 1990.
[7] T. A. DAVIS AND I. S. DUFF, Unsymmetric-pattern multifrontal methods for parallel sparse
LU factorization, Tech. Report TR-91-023, CIS Dept., Univ. of Florida, Gainesville, FL,
1991.
[8] T. A. DAVIS AND I. S. DUFF, An unsymmetric-pattern multifrontal method for sparse LU factorization, Tech. Report
TR-94-038, CIS Dept., Univ. of Florida, Gainesville, FL, 1994. (submitted to the SIAM
Journal on Matrix Analysis and Applications in March 1993, revised).
[9] I. S. DUFF, On algorithms for obtaining a maximum transversal, ACM Transactions on Math-
ematical Software, 7 (1981), pp. 315-330.
[10] I. S. DUFF, A. M. ERISMAN, AND J. K. REID, Direct Methods for Sparse Matrices, London:
Oxford Univ. Press, 1986.
[11] I. S. DUFF, R. G. GRIMES, AND J. G. LEWIS, Sparse matrix test problems, ACM Trans. Math.
Softw., 15 (1989), pp. 1-14.
[12] I. S. DUFF, R. G. GRIMES, AND J. G. LEWIS, Users' guide for the Harwell/Boeing sparse matrix collection (Release 1), Tech. Report
RAL-92-086, Rutherford Appleton Laboratory, Didcot, Oxon, England, Dec. 1992.
[13] I. S. DUFF AND J. K. REID, A comparison of sparsity orderings for obtaining a pivotal sequence
in Gaussian elimination, Journal of the Institute of Mathematics and its Applications, 14
(1974), pp. 281-291.
[14] I. S. DUFF AND J. K. REID, MA27: a set of Fortran subroutines for solving sparse symmetric sets of linear equa-
tions, Tech. Report AERE R10533, HMSO, London, 1982.
[15] I. S. DUFF AND J. K. REID, The multifrontal solution of indefinite sparse symmetric linear equations, ACM Trans.
Math. Softw., 9 (1983), pp. 302-325.
[16] I. S. DUFF AND J. K. REID, The multifrontal solution of unsymmetric sets of linear equations, SIAM J. Sci. Statist.
Comput., 5 (1984), pp. 633-641.
[17] S. C. EISENSTAT, M. C. GURSKY, M. H. SCHULTZ, AND A. H. SHERMAN, Yale sparse ma-
trix package, I: The symmetric codes, International Journal for Numerical Methods in
Engineering, 18 (1982), pp. 1145-1151.
[18] S. C. EISENSTAT, M. H. SCHULTZ, AND A. H. SHERMAN, Algorithms and data structures
for sparse symmetric Gaussian elimination, SIAM Journal on Scientific and Statistical
Computing, 2 (1981), pp. 225-237.
[19] A. GEORGE AND J. W. H. LIU, A fast implementation of the minimum degree algorithm using
quotient graphs, ACM Transactions on Mathematical Software, 6 (1980), pp. 337-358.
[20] A. GEORGE AND J. W. H. LIU, A minimal storage implementation of the minimum degree algorithm, SIAM J. Numer.
Anal., 17 (1980), pp. 282-299.
[21] A. GEORGE AND J. W. H. LIU, Computer Solution of Large Sparse Positive Definite Systems, Englewood Cliffs, New
Jersey: Prentice-Hall, 1981.
[22] A. GEORGE AND J. W. H. LIU, The evolution of the minimum degree ordering algorithm, SIAM Review, 31 (1989),
pp. 1-19.
[23] A. GEORGE AND D. R. MCINTYRE, On the application of the minimum degree algorithm to
finite element systems, SIAM J. Numer. Anal., 15 (1978), pp. 90-111.
[24] J. R. GILBERT, C. MOLER, AND R. SCHREIBER, Sparse matrices in MATLAB: design and
implementation, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 333-356.
[25] J. W. H. LIU, Modification of the minimum-degree algorithm by multiple elimination, ACM
Trans. Math. Softw., 11 (1985), pp. 141-153.
[26] H. M. MARKOWITZ, The elimination form of the inverse and its application to linear program-
ming, Management Science, 3 (1957), pp. 255-269.
[27] D. J. ROSE, Symmetric Elimination on Sparse Positive Definite Systems and the Potential
Flow Network Problem, PhD thesis, Applied Math., Harvard Univ., 1970.













[28] D. J. ROSE, A graph-theoretic study of the numerical solution of sparse positive definite systems of
linear equations, in Graph Theory and Computing, R. C. Read, ed., New York: Academic
Press, 1973, pp. 183-217.
[29] B. SPEELPENNING, The generalized element method, Tech. Report UIUCDCS-
R-78-946, Dept. of Computer Science, Univ. of Illinois, Urbana, IL, 1978.
[30] W. F. TINNEY AND J. W. WALKER, Direct solutions of sparse network equations by
ordered triangular factorization, Proc. of the IEEE, 55 (1967), pp. 1801-1809.
[31] M. YANNAKAKIS, Computing the minimum fill-in is NP-complete, SIAM J. Algebraic and
Discrete Methods, 2 (1981), pp. 77-79.
Note: all University of Florida technical reports in this list of references are
available in postscript form via anonymous ftp to ftp.cis.ufl.edu in the directory
cis/tech-reports.



