LOST PIVOT RECOVERY FOR AN UNSYMMETRIC-PATTERN
MULTIFRONTAL METHOD
STEVEN M. HADFIELD* AND TIMOTHY A. DAVIS†
Computer and Information Sciences Department, University of Florida
Technical Report TR-94-029.
Abstract. The unsymmetric-pattern multifrontal method by Davis and Duff relaxes the assumption of a symmetric-pattern matrix and has considerable performance advantages for the factorization of sequences of identically structured sparse matrices with unsymmetric patterns. However, when anticipated pivots become numerically unacceptable, standard multifrontal recovery techniques are inadequate. This paper develops a robust lost pivot recovery capability for the unsymmetric-pattern multifrontal method. The recovery capability is built into a distributed memory, parallel implementation of the method, and both its sequential and parallel performance are evaluated.
Key words. LU factorization, unsymmetric sparse matrices, multifrontal methods, parallel
algorithms
AMS subject classifications. 65F50, 65F05
1. Introduction. Systems of differential-algebraic equations arise in many application areas including circuit simulation, chemical engineering, magnetic resonance spectroscopy, and air pollution modeling [4, 17, 20, 22]. Methods for solving such systems frequently produce sequences of identically structured sparse matrices whose patterns can be highly unsymmetric. The LU factorization of these matrices allows their corresponding linear systems to be solved easily. Multifrontal factorization methods are useful as they can define a rich computational structure for the factorization based on a single analysis of the matrix's structure and then reuse this computational structure for the numerical factorization of the various matrices in the sequence. Furthermore, the computational structures defined by a multifrontal method, together with its use of dense submatrices, allow parallelism to be exploited in the factorization process.
However, as the values of matrix entries change across matrices in the sequence, entries that were once numerically acceptable as pivots may become unacceptable for subsequent matrices in the sequence. Techniques that recover from the loss of these anticipated pivots without having to abandon the previously defined computational structure can significantly improve overall execution time.
Until recently, all multifrontal methods assumed a symmetric-pattern matrix, which results in a corresponding computational structure that is a tree (or forest) [1, 5, 6, 7, 8, 18]. This simplified structure allows for a simple lost pivot recovery mechanism. Recently, Davis and Duff have introduced a new multifrontal method that relaxes the assumption of a symmetric pattern [3] and has demonstrated impressive performance advantages over previous methods [19]. In addition to reducing computational requirements, this method exposes considerable parallelism [14, 15]. With this unsymmetric-pattern multifrontal method, however, the resulting computational structure
* Department of Mathematical Sciences, US Air Force Academy, Colorado, USA. phone: (719) 472-4470, email: hadfieldsm%dfms%usafa@dfmail.usafa.af.mil
† Computer and Information Sciences Department, University of Florida, Gainesville, Florida, USA. phone: (904) 392-1481, email: davis@cis.ufl.edu. This project is supported by the National Science Foundation (ASC-9111263, DMS-9223088), and by Cray Research, Inc., through the allocation of supercomputer resources.
is no longer a tree but rather a more generalized directed acyclic graph (DAG). The
relationships between computational units in this structure are more complicated and
require more sophisticated techniques for the recovery of lost pivots.
The focus of this paper is the theoretical development of a lost pivot recovery mechanism for the unsymmetric-pattern multifrontal method. The resulting mechanism is implemented within the context of a distributed memory, parallel refactorization code based on the unsymmetric-pattern multifrontal method.
Section 2 discusses the basic concepts behind multifrontal methods, highlighting the differences between the new unsymmetric-pattern method and classical symmetric-pattern methods. Section 3 identifies the lost pivot recovery issues that are unique to the unsymmetric-pattern method and summarizes how these issues are addressed. Section 4 provides the theoretical foundation for the lost pivot recovery mechanism, and Section 5 summarizes the implementation. Section 6 provides some key performance results in terms of both sequential execution times and the effects of lost pivot recovery on exploitable parallelism.
2. Multifrontal Concepts. Multifrontal methods for sparse matrix factorization decompose the sparse matrix into a set of overlapping dense submatrices called frontal matrices. Each frontal matrix is partially factorized by one or more pivots. Entries in the unfactorized portion of the frontal matrix (called the contribution block) must be uniquely assembled (added) into subsequent frontal matrices. An assembly directed acyclic graph (DAG) is used to define the computational structure, with nodes representing frontal matrices and edges representing the passing of contribution block entries from one frontal matrix to another. With symmetric-pattern multifrontal methods, the assembly DAG is a tree (or forest), as each frontal matrix's contribution block can be completely absorbed within a single subsequent frontal matrix [1, 5, 6, 7, 8, 18]. With the unsymmetric-pattern multifrontal method, the contribution block can be fragmented and portions must be passed to different subsequent frontal matrices, which results in a more generalized DAG structure.
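As a purely illustrative sketch of the partial factorization step just described (the interface and names are ours, not taken from any UMFPACK code), the following Python fragment eliminates the first k pivots of one dense frontal matrix and returns the contribution block, i.e. the Schur complement that must be assembled into subsequent frontal matrices. No pivoting is performed within this sketch.

```python
import numpy as np

def partial_factorize(F, k):
    """Eliminate the first k pivots of dense frontal matrix F.

    Returns (L21, U12, C): the pivot-column entries of L below the pivot
    block, the pivot-row entries of U right of the pivot block, and the
    contribution block C (the Schur complement) to pass to later fronts.
    """
    F = F.astype(float).copy()
    for p in range(k):
        F[p+1:, p] /= F[p, p]                                # column of L
        F[p+1:, p+1:] -= np.outer(F[p+1:, p], F[p, p+1:])    # rank-1 update
    L21 = F[k:, :k]          # rows of L below the pivot block
    U12 = F[:k, k:]          # columns of U right of the pivot block
    C = F[k:, k:]            # contribution block (Schur complement)
    return L21, U12, C
```

In a full multifrontal factorization, the rows and columns of C would then be assembled (added) into the appropriate entries of one or more parent frontal matrices.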
In the most general case, there are four types of relationships that can exist between frontal matrices, with these relationships defined by the pattern of nonzeros in the matrix. These relationships are shown in Figure 2.1 below. The notation Pi refers to the pivot block of the ith frontal matrix. It is a submatrix formed by the pivot rows and columns within a single frontal matrix.

    No relationship: [ P1  0  ]      L relationship:  [ P1  0  ]
                     [ 0   P2 ]                       [ X   P2 ]

    U relationship:  [ P1  X  ]      LU relationship: [ P1  X  ]
                     [ 0   P2 ]                       [ X   P2 ]

FIG. 2.1. Possible Relationships between Frontal Matrices
These relationships imply certain inclusion properties between the frontal matrices' patterns. Specifically, an L relationship implies that the subsequent (parent) frontal matrix's pattern of columns will include all of the nonpivotal columns of the preceding (child) frontal matrix's contribution block. Likewise, a U relationship implies a similar inclusion of the child's contribution block pattern of rows in the parent's pattern of rows. The LU relationship implies the inclusion properties of both the L and U relationships and allows the entire child's contribution block to be absorbed within the parent. The inclusion properties of a single LU parent, as is always the case with a symmetric-pattern multifrontal method, allow any failed pivots in the child frontal matrix to be shifted to the LU parent and simply handled as an extension of the number of pivots associated with that frontal matrix. There is no extension of the parent's contribution block and no impact on the assembly tree. The situation becomes significantly more complicated when there are L and/or U relationships to other frontal matrices.
3. Lost Pivot Recovery Issues. Multifrontal methods can address lost pivot recovery with three distinct strategies. First, avoidance tries to preclude the need for recovering lost pivots by relaxing the criteria used to judge potential pivots as numerically acceptable. This is done via the concept of threshold pivoting [6]. Threshold pivoting requires the magnitude of the pivot entry's value to be greater than a threshold parameter (p) times the magnitude of the largest entry in the column. The threshold parameter is set between 0 and 1. Lost pivots can be avoided by setting p closer to one for the initial matrix in the sequence, and then lowering it to improve the potential for acceptable pivots in subsequent matrices in the sequence.
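The threshold test can be sketched in a few lines of Python (our illustration; the function and parameter names are not taken from any particular code):

```python
import numpy as np

def acceptable_pivot(col, i, p):
    """Threshold test: col[i] is an acceptable pivot for this column if its
    magnitude is at least p times the largest magnitude in the column,
    with threshold parameter 0 < p <= 1."""
    return abs(col[i]) >= p * np.max(np.abs(col))
```

Setting p = 1.0 forces full partial pivoting on the column; smaller values of p admit more candidate pivots, making lost pivots less likely in later matrices of the sequence at some cost in numerical stability.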
The second strategy, intrafrontal matrix recovery, tries to recover from lost pivots
using only permutations confined to the pivot block of the frontal matrix. Such
permutations have no effects on the assembly DAG or composition of other frontal
matrices.
The problem with avoidance and intrafrontal matrix recovery is that they are not guaranteed to work. Thus, there is a need for the third strategy, which is interfrontal matrix recovery. Interfrontal matrix recovery uses symmetric permutations of the entire matrix to shift lost pivot rows and columns to subsequent frontal matrices. In the case of a symmetric-pattern multifrontal method, the impacts are limited to the single LU parent. With the unsymmetric-pattern multifrontal method, the impacts are more significant and widespread. Specifically, all L and U parent frontal matrices that occur prior to a specified recovery matrix in the assembly DAG must be expanded by portions of the failed pivot row(s) and column(s). The specified recovery matrix must be an LU ancestor of the failed frontal matrix, where an LU ancestor is defined as a subsequent frontal matrix to which there is a path of strictly L and/or LU edges (L path) and a path of strictly U and/or LU edges (U path) from the frontal matrix containing the failed pivots. This recovery matrix will absorb any remaining portion of the lost pivot row(s) and column(s), just as a single LU parent would do in the symmetric-pattern methods. The intervening frontal matrices (between the failed frontal matrix and its recovery matrix) that lie on an L path are expanded by the lost pivot column(s) and those on a U path are expanded by the lost pivot row(s).
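The search for a recovery matrix can be sketched as a pair of reachability sweeps over the assembly DAG. This is our own hedged illustration, not the paper's implementation: `parents` maps a node ID to a list of (parent ID, edge label) pairs, node IDs follow the pivot ordering, and "first" therefore means smallest ID.

```python
def reachable(node, parents, labels):
    """All nodes reachable from `node` via edges whose label is in `labels`."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        for parent, lab in parents.get(n, []):
            if lab in labels and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def recovery_matrix(A, parents):
    """First LU ancestor of A, or None if A has no LU ancestor."""
    l_ancestors = reachable(A, parents, {'L', 'LU'})
    u_ancestors = reachable(A, parents, {'U', 'LU'})
    lu_ancestors = l_ancestors & u_ancestors
    return min(lu_ancestors) if lu_ancestors else None
```

For a DAG like the one described in the example below (A with an L parent B, a U parent C, and edges from both into D), the routine selects D as A's recovery matrix.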
Figures 3.1, 3.2, and 3.3 illustrate interfrontal matrix recovery within the unsymmetric-pattern multifrontal method. In Figure 3.1 the entire matrix is shown with rows and columns labeled with capital letters. Corresponding frontal matrices are identified with the row/column label of their first pivot entry. Frontal matrix A has two pivots, with the (A_g, A_g) entry being an acceptable pivot and the (A_l, A_l) entry being a lost pivot that must be symmetrically permuted to expand the pivot block of the recovery frontal matrix D. Figure 3.2 shows the corresponding assembly DAG with the interfrontal matrix relationships shown as labeled edges, and Figure 3.3 illustrates the specific frontal matrices. Notice that the symmetric permutation shown in Figure 3.1 for interfrontal matrix recovery causes new nonzero entries to be created. This situation is called fill-in and is indicated by a '*'. This fill-in is accounted for by the column extension to frontal matrix B that lies on an L path from A and the row extension to frontal matrix C that lies on a U path from A.
[The matrix of Figure 3.1, with rows and columns A_g, A_l, B, C, D, E and fill-in marked '*', is not reproduced here.]
FIG. 3.1. Sparse Matrix with Lost Pivots
[The DAG of Figure 3.2 is not reproduced here.]
FIG. 3.2. Sample Assembly DAG
The theoretical foundation established in the next section will show that the interfrontal recovery mechanisms just summarized will handle any lost pivot recovery necessary. Furthermore, the extent of fill-in due to lost pivot recovery will be fully defined and the entire process shown to be achievable without modifying the structure of the assembly DAG, which is especially important for distributed memory parallel implementations where static scheduling and allocations are used.
[The frontal matrices of Figure 3.3 are not reproduced here.]
FIG. 3.3. Frontal Matrices Extended by Lost Pivot Recovery

4. Theoretical Foundation. The interfrontal matrix recovery strategy just summarized is fairly simple in concept but complicated in realization. Specifically, the
scope of the lost pivot recovery effects (including new fill-in) must be fully defined. Impacts on the assembly DAG's structure must be understood. Furthermore, the adequacy of the strategy to handle multiple interacting recoveries and repeated pivot failures must be established.
The theoretical foundation established in this section does this by first defining
some necessary notation and foundational theorems that describe the relationships
between frontal matrices' patterns that are imposed by relationships (edges and paths)
in the assembly DAG. These results are then used to address and resolve the larger
issues of sufficiency and required detailed mechanisms.
4.1. Definitions and Notation. The ordered set F is defined as the set of all frontal matrices in a multifrontal formulation for a particular sparse matrix factorization. Each element of this set represents a distinct frontal matrix, and an equivalence relation is established between these frontal matrices and the original position of their first pivot in the pivot ordering (assuming the necessary permutations have been applied to put the matrix into the pivot ordering used by the initial analyze-factorize routine [3]). Hence the ordering of the pivot entries provides the ordering for the set F, and a frontal matrix's ID is defined to be the pivot column/row index of the first pivot within the frontal matrix.
When block upper triangular form is in use, the notation F_i will represent the subset of F that contains all of the frontal matrices in the ith diagonal block. Within the context of the assembly DAG, F_i will define the node set of a sub-DAG that is in no way connected to any other nodes in the assembly DAG (nodes not in F_i).
Frontal matrices will typically be represented by capital letters. A special notation is reserved for frontal matrices that are designated as recovery frontal matrices.
Specifically, for A ∈ F, the notation R_A (where R_A ∈ F) represents the recovery frontal matrix of A. As mentioned earlier, R_A will be the first LU ancestor of A.
As the set F is an ordered set, the less-than (<), equal-to (=), and greater-than (>) relations are well defined, and trichotomy ensures that for any two frontal matrices A and B ∈ F only one of these relations applies, and that if A = B, then A and B represent the same frontal matrix.
Within a particular frontal matrix, the specific components (submatrices) will frequently need to be referenced. This will be done using the same notation found in Davis and Duff's original work [3]. Specifically, L_A refers to the set of row indices that define the rows from the overall matrix that make up the frontal matrix A ∈ F. Likewise, U_A refers to the set of column indices that define the columns of the entire matrix. (Again, the indices of the overall matrix assume the necessary permutations have been applied to establish the initial pivot ordering as determined by the analyze-factorize algorithm.) The subset L'_A ⊆ L_A is the set of row indices that define the pivot rows of the frontal matrix A. Prior to any lost pivot recovery, L'_A will be equal to {A, A+1, ..., A+(k_A - 1)}, where A is the row and column index of the first pivot of A and k_A is the number of potential pivots in A. The subset L''_A ⊆ L_A contains all the row indices of the nonpivotal rows of A. These nonpivotal rows define the rows of the frontal matrix's contribution block and hence define the L edges from the frontal matrix A to L parent frontal matrices. In a similar fashion, U'_A ⊆ U_A defines the indices of the pivot columns of A and will initially be equal to L'_A. The set U''_A ⊆ U_A defines the indices of the nonpivotal columns of A and hence also defines the columns of the contribution block and the U edges to U parents of A.
Furthermore, sometimes subsets of the L''_A and U''_A patterns will need to be referenced. Specifically, the notation L''_A(B) will refer to the subset of frontal matrix A's pattern of nonpivotal rows that are pivotal rows for frontal matrices B and greater. Likewise, U''_A(B) refers to the nonpivotal columns of frontal matrix A that are pivot columns for frontal matrices B and greater.
When discussing the failed pivots of a frontal matrix, the notation L'_A,lost and U'_A,lost will be used to define the row and column indices, respectively, of the lost pivots within frontal matrix A. The notation L'_A,acc and U'_A,acc will be used to represent the row and column indices of the acceptable pivots within the frontal matrix A. These patterns accommodate any unsymmetric permutations that may have taken place within the frontal matrix's pivot block.
Relations between frontal matrices reflect the need to pass the contributions of one frontal matrix to another frontal matrix, where the contributions will be applied to entries in the destination's pivot rows and/or pivot columns. There are two types of such relations. An L edge from frontal matrix A to B occurs when L''_A ∩ L'_B ≠ ∅. That is, A produces contributions that need to be assembled into B's pivot rows. This L edge is represented by the notation A -L-> B. In such a relation, A is said to be the L child of B and B the L parent of A. Similarly, a U edge from A to B exists if U''_A ∩ U'_B ≠ ∅ and represents the fact that A produces contributions that need to be assembled into the pivot columns of B. This edge is represented by the notation A -U-> B, and A is called the U child of B and B the U parent of A. When L and U edges both exist from A to B, an LU edge is said to exist and is shown as A -LU-> B, with A the LU child of B and B the LU parent of A.
Notice that, due to the ordering scheme imposed on F and the fact that contributions are always passed to frontal matrices that appear later in the pivot ordering, all edges from a frontal matrix A to B will be such that A < B.
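Under these definitions, the true edge set can be computed directly from the patterns. The sketch below is our own illustration: the keys 'Lp'/'Lpp' stand in for the pivotal and nonpivotal row index sets of a front, and 'Up'/'Upp' for the corresponding column index sets; an L edge exists when the nonpivotal rows of A intersect the pivot rows of B, and a U edge when the nonpivotal columns of A intersect the pivot columns of B.

```python
def true_edges(fronts):
    """fronts: {id: {'Lp': set, 'Lpp': set, 'Up': set, 'Upp': set}}
    Returns {(A, B): label} with label in {'L', 'U', 'LU'} and A < B."""
    edges = {}
    for A, fa in fronts.items():
        for B, fb in fronts.items():
            if A >= B:
                continue    # contributions only flow later in the pivot ordering
            l = bool(fa['Lpp'] & fb['Lp'])   # nonpivotal rows of A hit pivots of B
            u = bool(fa['Upp'] & fb['Up'])   # nonpivotal cols of A hit pivots of B
            if l and u:
                edges[(A, B)] = 'LU'
            elif l:
                edges[(A, B)] = 'L'
            elif u:
                edges[(A, B)] = 'U'
    return edges
```

The quadratic pairwise scan is for clarity only; a practical implementation would index pivot rows and columns to find parents directly.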
An L path exists between frontal matrices A and B if and only if there is a path from A to B consisting of all L and/or LU edges. In this case B is called an L ancestor of A and A is an L predecessor of B. When such a path contains at least one edge, the notation A -L+-> B is used to represent the path. Correspondingly, a U path from A to B exists if and only if there is a path from A to B of all U and/or LU edges, with B the U ancestor of A and A the U predecessor of B. Such a path is denoted by A -U+-> B. An LU ancestor of a frontal matrix A is another frontal matrix such that there is both an L path and a U path from A to this other frontal matrix. The L and U paths may or may not share frontal matrices as path nodes. Such relationships are denoted by A -LU+-> B. Furthermore, in some situations, paths of zero or more edges will need to be considered (where a zero-length path exists only if both the source and destination frontal matrices are the same). Such paths are denoted by A -L*-> B and A -U*-> B.
The definitions of L and U edges represent the true relationships between frontal matrices, and thus the set of all of these edges is called the true edge set and denoted E_T. As E_T really consists of two types of edges (L edges and U edges), it can be decomposed into two disjoint subsets: E_L, which contains only the L edges, and E_U, which contains only the U edges. This is similar to a formulation of elimination DAGs done by Gilbert and Liu [12] and Eisenstat and Liu [9, 10].
The edge set used to define the assembly DAG is called the assembly edge set and denoted by E_A. It differs from the true edge set, E_T. The E_A edge set is generated by the combined analyze-factorize algorithm (UMFPACK [2, 3]) for the first matrix in the sequence. There is an edge from A to B in E_A if any contributions were assembled from frontal matrix A into frontal matrix B. An assembly of one or more rows from A is done in UMFPACK whenever U''_A(B) ⊆ U_B and L''_A(B) ∩ L'_B is not empty. Similarly, an assembly of one or more columns from A is done in UMFPACK whenever L''_A(B) ⊆ L_B and U''_A(B) ∩ U'_B is not empty. The entire (remaining) contribution block of A is assembled into B if both U''_A(B) ⊆ U_B and L''_A(B) ⊆ L_B hold. In general, E_A is a transitive reduction of E_T, and it is frequently useful to think of it that way. However, unless the matrix is in block triangular form, there may exist edges in E_T that are not in E_A.
With the preceding notation and definitions established, development of the necessary theoretical results may commence. The theory simplifies if an assumption of block triangular form can be made, and such an assumption is made in this article. The development starts by stating a key result of block triangular form and then proceeds to some foundational theorems and the rest of the lost pivot recovery results.
4.2. Block Triangular Form. Block triangular form is significant because it can expose a high level of parallelism between assembly sub-DAGs, reduce overall computation (as off-diagonal blocks need not be factorized), and simplify lost pivot recovery [13]. The advantages in terms of lost pivot recovery occur because L''_A ≠ ∅ and U''_A ≠ ∅ for all frontal matrices A except the last frontal matrix in each diagonal block F_i. A proof can be found in Hadfield's dissertation [13].
4.3. Foundational Theorems. Three theorems and two corollaries form the foundation upon which most of the lost pivot recovery results are built. These are provided below.
THEOREM 4.1 (L ANCESTOR FILL-IN). If A -L+-> B then U''_A(B) ⊆ U_B.
Proof: The proof of this theorem follows an induction argument based on the L edges of the L path from A to B.
BASIS: For A -L-> B (direct L edge), the entries a_{i1,j1}, a_{i2,j1}, and a_{i1,j2} must be assumed nonzero for some i1 ∈ L'_A, i2 ∈ L''_A ∩ L'_B, j1 ∈ U'_A, and for all j2 ∈ U''_A. Then, as frontal matrix A's contribution block must be assumed nonzero, all of the a_{i2,j2} entries are nonzero, which implies U''_A(B) ⊆ U_B.
INDUCTION: If A -L+-> C -L-> B, then by the inductive hypothesis U''_A(C) ⊆ U_C. Furthermore, as C -L-> B, the argument of the induction basis provides U''_C(B) ⊆ U_B, and an edge from C to B implies that C < B. Thus, the following set of inclusions is established:
U''_A(B) ⊆ U''_C(B) ⊆ U_B. □
COROLLARY 4.2. If A -L*-> B then U''_A(B) ⊆ U_B.
This corollary simply adds the null edge case, which is true via the definition U''_A ⊆ U_A.
THEOREM 4.3 (U ANCESTOR FILL-IN). If A -U+-> B then L''_A(B) ⊆ L_B.
Proof: Follows directly as the transpose of the proof of Theorem 4.1. □
COROLLARY 4.4. If A -U*-> B then L''_A(B) ⊆ L_B.
This corollary simply adds the null edge case, which is true via the definition L''_A ⊆ L_A.
THEOREM 4.5 (PATHS DUE TO FILL-IN). Assume that A -L+-> B and A -U+-> C; then (a) if B < C there is a U path B -U+-> C, or (b) if B > C there is an L path C -L+-> B.
Proof: The proof of this theorem is an induction argument based on the L and U paths. Furthermore, the basis is strengthened to include a path of zero edges, which is needed to facilitate the proof.
BASIS: Assume A -L-> B and A -U*-> C. Then consider the case of B < C and all C1 and C2 such that
A -U*-> C1 -U-> C2 -U*-> C and C1 < B < C2.
(Note that C1 could be A and C2 could be C, and as B < C there must be at least one edge in A -U*-> C.) As A -U*-> C1, the inclusion L''_A(C1) ⊆ L_C1 holds per Corollary 4.4 and, together with C1 < B and A -L-> B, implies that B ∈ L_C1. This implies the existence of the edge C1 -L-> B, which together with C1 < B and Corollary 4.2 implies U''_C1(B) ⊆ U_B. Due to C1 -U-> C2 and B < C2, C2 ∈ U''_C1(B) ⊆ U_B, which establishes the existence of the edge B -U-> C2, and as C2 -U*-> C, the result B -U+-> C follows.
If B > C then A -U*-> C implies L''_A(C) ⊆ L_C by Corollary 4.4. (Note that A could be equal to C here.) Since A -L-> B and B > C, the results B ∈ L_C and C -L-> B follow.
Furthermore, the transpose of this argument establishes that the results also hold for A -L*-> B and A -U-> C.
INDUCTION: For the inductive step, the argument assumes paths of length one or more as follows: A -L+-> B and A -U+-> C. First the case of B < C is considered, with any three frontal matrices C1, C2, and B1 such that C1 < B1 < C2, A -U*-> C1 -U-> C2 -U*-> C, and A -L*-> B1 -L-> B. By the inductive hypothesis, the path C1 -L+-> B1 exists. This result, together with Theorem 4.1, implies that U''_C1(B1) ⊆ U_B1. Hence, as C1 -U-> C2 and B1 < C2, one finds that C2 ∈ U_B1, and the edge B1 -U-> C2 exists, which establishes the path B1 -U+-> C. As none of the selection criteria on C1, B1, or C2 preclude B1 = B, these results include B -U+-> C.
The transpose of the argument just provided establishes that if B > C the result C -L+-> B also holds. □
THEOREM 4.6 (ALWAYS AN LU ANCESTOR). Assuming block upper triangular form, every frontal matrix will have either an LU ancestor or no ancestors at all.
Proof: Assume that there exists a frontal matrix A with ancestors but no LU ancestor. The assumption of block triangular form ensures that L''_A ≠ ∅ and U''_A ≠ ∅; hence, the frontal matrix must have edges to both L and U parents in E_T. Likewise, each L parent must have L edges to later frontal matrices in F_i unless the parent is the last frontal matrix in the block. As each F_i has only a finite number of frontal matrices in it, there will be an L path from A to the last frontal matrix in the block. In the same fashion, an argument based on the U edges that follows the same logic will establish the existence of a U path from A to the last frontal matrix in the block. Hence, there are both L and U paths from A to the last frontal matrix in the block, and thus this frontal matrix is an LU ancestor of A. □
With this foundational theory established, construction may commence on the
specific lost pivot recovery theory.
4.4. Fill-In due to Recovery. In this section, theoretical results are developed to describe how and where new fill-in is produced due to the recovery of a frontal matrix's lost pivot rows and columns. Specifically, new fill-in resulting from the recovery of lost pivots occurs in the lost pivot rows and columns. New fill-in in the lost pivot columns is limited in scope to the row patterns of L ancestors. New fill-in in the lost pivot rows is limited in scope to the column patterns of U ancestors. Furthermore, fill-in in a particular row of the lost pivot columns (or column of the lost pivot rows) can result from several frontal matrices. This will have significant impacts on the distributed memory implementation to be described in the next section, as the fill-in contributions become decentralized and must be combined in a fashion that eliminates duplicates and ensures all distinct contributions are included.
A final, yet very significant, result is that the first LU ancestor will be able to absorb the remaining nonzero rows of the lost pivot columns and nonzero columns of the lost pivot rows by simply extending its pivot block (defined by its L' and U' patterns) to include the lost pivot rows and columns. Hence, the failed frontal matrix can be completely recovered by its first LU ancestor and will have no additional direct effects on the remaining frontal matrices and the assembly DAG. If any pivots still fail in the LU ancestor, they can be treated as pivot failures of the LU ancestor (as discussed in Section 4.7, below).
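The recovery recipe this section justifies can be summarized in a short Python sketch (a hedged illustration under our own data layout, not the authors' implementation): every L ancestor before R_A grows by the lost pivot columns, every U ancestor before R_A grows by the lost pivot rows, and R_A itself absorbs the remainder by growing its pivot block.

```python
def apply_recovery(A, RA, fronts, l_ancestors, u_ancestors):
    """Extend ancestor patterns for the recovery of failed front A.

    fronts maps a frontal matrix ID to dicts of index sets (key names are
    ours): 'rows'/'cols' are the full row/column patterns, 'pivot_rows'/
    'pivot_cols' the pivot block, and 'lost_pivot_rows'/'lost_pivot_cols'
    the indices of A's failed pivots.  l_ancestors and u_ancestors are the
    IDs of A's L and U ancestors; RA is the first LU ancestor.
    """
    lost_rows = fronts[A]['lost_pivot_rows']
    lost_cols = fronts[A]['lost_pivot_cols']
    for B in l_ancestors:
        if B < RA:
            fronts[B]['cols'] |= lost_cols   # column extension on the L path
    for B in u_ancestors:
        if B < RA:
            fronts[B]['rows'] |= lost_rows   # row extension on the U path
    # The recovery matrix only needs its pivot block extended; it absorbs
    # whatever remains of the lost pivot rows and columns.
    fronts[RA]['pivot_rows'] |= lost_rows
    fronts[RA]['pivot_cols'] |= lost_cols
    fronts[RA]['rows'] |= lost_rows
    fronts[RA]['cols'] |= lost_cols
```

In a distributed implementation, the decentralized fill-in contributions produced by these extensions would additionally have to be merged with duplicates eliminated, as discussed above.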
Throughout the theorems and corollaries of this section, the reader should recall that the recovery frontal matrix for any particular failed frontal matrix is defined as the first LU ancestor, which exists for all frontal matrices except for those that occur last in a diagonal block. The last frontal matrices are always square with L = L' and U = U'. Thus any lost pivots in these last frontal matrices indicate the overall matrix is singular.
THEOREM 4.7 (L ANCESTOR RECOVERY FILL-IN). For A, B, R_A ∈ F_i for some i such that A contains failed pivots, A -L+-> B, R_A is the recovery frontal matrix of A, and B < R_A, there will be fill-in in the intersection of the rows of L_B and the columns of U'_A,lost.
Proof: The proof follows an induction argument on the L edges of the L path from A to B.
BASIS: If A -L-> B (direct L edge), then a_{i,j} is assumed nonzero for all i ∈ L_A and all j ∈ U'_A, including j ∈ U'_A,lost. Thus, when the lost pivot columns (U'_A,lost) are permuted to extend the pivot block of R_A, these assumed nonzero entries will cause an extension of U_B by the columns of U'_A,lost, and the resulting fill-in will cause the intersection of the L_B rows and the U'_A,lost columns to become nonzero.
INDUCTION: If A -L+-> B (L path with more than one edge) with B < R_A, then for C ∈ F_i such that A -L+-> C -L-> B, the inductive hypothesis provides that U_C has been extended by U'_A,lost. Since C -L-> B and B < R_A, Theorem 4.1 provides that U'_A,lost ⊆ U''_C(B) ⊆ U_B, which means frontal matrix B must be extended by the lost pivot columns. Hence, there will be new fill-in in the intersection of the L_B rows and the columns of U'_A,lost. □
COROLLARY 4.8. The nonzero rows of the U'_A,lost columns that are contained in the L'_B pattern for B, as defined in Theorem 4.7, will be fully absorbed by the frontal matrix B and will play no further role in the recovery of A.
This is true as these entries go into the pivot rows of B and will now be considered as part of the frontal matrix B. If good pivots are found in these rows, they will become part of the U factor associated with frontal matrix B; otherwise they will be handled as B's failed pivot rows.
COROLLARY 4.9. For B, C ∈ F_i such that A -L+-> B and A -L+-> C, with B < R_A and C < R_A and R_A the recovery matrix for A, there is the possibility that either L_B ∩ L''_C ≠ ∅ and/or L''_B ∩ L_C ≠ ∅. Thus, there may be multiple contributions made to a particular row of the U'_A,lost columns as a result of the recovery of A.
THEOREM 4.10 (U ANCESTOR RECOVERY FILL-IN). For A, B, R_A ∈ F_i such that A contains failed pivots, A -U+-> B, R_A is the recovery frontal matrix of A, and B < R_A, there will be fill-in in the intersection of the columns of U_B and the rows of L'_A,lost.
Proof: Follows the transpose of the argument for Theorem 4.7. Furthermore, the transposes of Corollaries 4.8 and 4.9 also follow. □
COROLLARY 4.11. The nonzero columns of the L'_A,lost rows that are contained in the U'_B pattern for B, as defined in Theorem 4.10, will be fully absorbed by the frontal matrix B and will play no further role in the recovery of A.
COROLLARY 4.12. For B, C ∈ F_i such that A -U+-> B and A -U+-> C, with B < R_A and C < R_A and R_A the recovery matrix for A, there is the possibility that either U_B ∩ U''_C ≠ ∅ and/or U''_B ∩ U_C ≠ ∅. Thus, there may be multiple contributions made to a particular column of the L'_A,lost rows as a result of the recovery of A.
THEOREM 4.13 (LIMITS OF LOST PIVOT COLUMN FILL-IN). The only fill-in in the lost pivot columns of a frontal matrix A is due to L ancestors of A.
Proof: Assume a_{i,k} is new fill-in created due to the recovery of A, where k ∈ U'_A,lost but i ∉ L_B for any B ∈ F_i such that A -L+-> B. Then, there must exist a j such that A < j < R_A with a_{i,j} ≠ 0, a_{j,j} ≠ 0, and a_{j,k} ≠ 0; otherwise the fill-in would not have occurred. If a_{j,k} ≠ 0 regardless of the pivot failure in A, then the edge A -L-> J with j ∈ L'_J must exist for the frontal matrix J containing pivot j, which raises a contradiction. Otherwise, if a_{j,k} became nonzero due to fill-in from the recovery of A, then this argument is inductively applied to the a_{j,k} entry. The induction is terminated either by finding an entry in column k that is nonzero regardless of the recovery (which results in the contradiction previously described) or, if no such entry is found, by concluding that the fill-in could not have occurred (another contradiction of the assumptions). □
THEOREM 4.14 (LIMITS OF LOST PIVOT Row FILLIN). The only fillin in the
lost pivot rows of a frontal matrix A is due to U ancestors of A.
Proof: Follows the transpose of the argument for Theorem 4.13.
THEOREM 4.15 (RECOVERY MATRIX EXTENSIONS). The recovery frontal matrix
R_A for a failed frontal matrix A needs only to have its pivot block (defined by C_RA
and U_RA) extended to accommodate the recovery of A.
Proof: For any B ∈ T such that A ⇒L+ B ⇒L* R_A, Theorem 4.1 provides that
U′_B(R_A) ⊆ U_RA. Similarly, for any C ∈ T such that A ⇒U+ C ⇒U* R_A, Theorem 4.3
provides that C′_C(R_A) ⊆ C_RA. As U_RA and C_RA have only been extended by U′_Alost
and C′_Alost, respectively, these are the only extensions needed to the recovery frontal
matrix R_A. □
COROLLARY 4.16. Any edges A →L B and/or A →U B such that B > C and
A ⇒LU+ C are redundant due to the existence of corresponding L and U paths from C
to B, respectively.
The next theorem establishes that by using the first LU ancestor as the recovery
matrix, all contributions to the lost pivot block by earlier ancestors are avoided. This
result allows the implementation to preclude the multiple contribution handling for
the lost pivot block portion of the lost pivot rows and columns. Corollaries 4.9 and
4.12 established the need for this processing for the rest of the lost pivot row and
column entries.
THEOREM 4.17 (NO CONTRIBUTIONS TO LOST PIVOT BLOCK). If the first
LU ancestor B of A (that is, A ⇒LU+ B with A ⇒LU+ C for no C < B) is used as the
recovery matrix for A (designated R_A), then there will be no contributions made to
the lost pivot block of A by L or U ancestors of A.
Proof: The lost pivot block of the failed frontal matrix A is defined by the rows
in C′_Alost and the columns in U′_Alost. By Theorem 4.13 the columns of U′_Alost are only
updated via L ancestors. By Theorem 4.14 the rows of C′_Alost are only updated by
U ancestors. Thus, for updates to the lost pivot block of A to occur, the updating
frontal matrix must be both an L and a U ancestor of the failed frontal matrix. As
the recovery matrix, R_A, is the first such LU ancestor and as the patterns C_RA and
U_RA are extended by C′_Alost and U′_Alost, respectively, Corollaries 4.8 and 4.11 provide
that no further contributions to A's lost pivot block will be made as a result of the
recovery of A. □
The practical results of the theory just developed are that the recovery of a failed
frontal matrix can be facilitated by (a) extending all L ancestors that occur before
the first LU ancestor by the lost pivot rows, (b) extending all U ancestors that occur
before the first LU ancestor by the lost pivot columns, and (c) extending the pivot
block of the recovery frontal matrix by the lost pivot rows and columns. Furthermore,
L ancestors absorb the entries of the lost pivot columns that occur in the ancestor's
pivot rows and, likewise, U ancestors absorb the entries of the lost pivot rows that
occur in the ancestor's pivot columns.
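The three recovery actions above can be sketched in a few lines. The following toy Python fragment (the `Front` class and `recover` function are hypothetical names for illustration only, not part of the PRF code) walks the ancestors of a failed frontal matrix in topological order and applies actions (a), (b), and (c):

```python
# Illustrative sketch of recovery actions (a), (b), and (c); the names
# Front and recover are hypothetical, not taken from the PRF code.

class Front:
    def __init__(self, name, pivot_rows, pivot_cols):
        self.name = name
        self.pivot_rows = set(pivot_rows)   # pivot row pattern C'
        self.pivot_cols = set(pivot_cols)   # pivot column pattern U'

def recover(ancestors, lost_rows, lost_cols):
    """ancestors: (front, kind) pairs in topological order, with kind
    in {'L', 'U', 'LU'}; the first LU ancestor is the recovery matrix."""
    for front, kind in ancestors:
        if kind == 'LU':
            # (c) extend the recovery matrix's pivot block by the lost
            # pivot rows and columns; the recovery stops here
            front.pivot_rows |= lost_rows
            front.pivot_cols |= lost_cols
            return front
        if kind == 'L':
            # (a) extend an earlier L ancestor by the lost pivot rows
            front.pivot_rows |= lost_rows
        else:
            # (b) extend an earlier U ancestor by the lost pivot columns
            front.pivot_cols |= lost_cols
    raise ValueError("no LU ancestor: recovery matrix not found")

B = Front('B', {4, 5}, {4, 5})   # L ancestor of the failed matrix A
C = Front('C', {6}, {6, 7})      # U ancestor of A
R = Front('R', {8, 9}, {8, 9})   # first LU ancestor: recovery matrix R_A
rA = recover([(B, 'L'), (C, 'U'), (R, 'LU')], {1, 2}, {1, 2})
print(rA.name, sorted(rA.pivot_rows), sorted(rA.pivot_cols))
```

Note that, as in Theorem 4.17, the traversal stops at the first LU ancestor, so no ancestor beyond the recovery matrix is touched.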
4.5. Impacts on Assembly DAG. One of the most significant features of these
lost pivot recovery mechanisms is the minimal impact on the relationships between
frontal matrices as defined by the edge set E_T. Specifically, since LU ancestors are
available to serve as recovery matrices, no additional edges need be added to E_T as a
result of the recovery.
THEOREM 4.18 (NO ADDITIONAL EDGES FOR LU ANCESTOR RECOVERIES).
Only redundant edges are created due to recovery fill-in when the recovery frontal
matrix R_A is an LU ancestor of the failed frontal matrix A.
Proof: Since A ⇒LU+ R_A, Theorem 4.3 provides that C′_A(R_A) ⊆ C_RA. Furthermore,
for each B ∈ T such that A ⇒L+ B and B < R_A, Theorem 4.5 insures the
existence of a U path B ⇒U+ R_A. This path, together with Theorem 4.3, provides
that C′_B(R_A) ⊆ C_RA. By Theorem 4.13, this accounts for all the recovery fill-in in the
pattern C_RA, so no new L edges are created from R_A. Furthermore, the transpose of
this argument insures that no new U edges are created from R_A.
Next, the new edges to R_A are considered. If A ⇒L+ B with B < R_A, then there
will potentially be new fill-in in the C′_B rows of the U′_Alost columns. But for each row
c ∈ C′_B, by definition, there exists an edge B →L C where c ∈ C′_C. Hence, the path
A ⇒L+ C exists and, if C < R_A, then by Theorem 4.5 the path C ⇒U+ R_A exists. Thus, if
the new fill-in of row c ∈ C′_C creates a new edge C →U R_A, this edge will be redundant
to the path C ⇒U+ R_A that already exists. In a similar fashion and by a transpose of
this argument, any new L edge will be redundant to an existing L path. □
Hence only redundant edges are created due to the recovery by the LU ancestor R_A,
and the edge set E_T still correctly defines the relationships between frontal matrices.
4.6. Effects of Multiple Recoveries. Up to this point, all the results established
have focused on the effects of a single failed frontal matrix. Nothing, however,
precludes the possibility of multiple failed frontal matrices. Furthermore, several
failed frontal matrices may all affect a particular frontal matrix. If the failed frontal
matrices are all L predecessors or all U predecessors of the affected frontal matrix
then the results thus far provide for the appropriate recovery actions. However, if the
affected frontal matrix is an L ancestor of one failed frontal matrix and a U ancestor
of another, it will be expanded in both row and column patterns. The theory
developed thus far fails to provide direction on how to handle the new contributions
in the overlap of the new recovery columns and new recovery rows.
Furthermore, the row and column patterns of a frontal matrix's pivot block define
its relationship to the preceding frontal matrices. Since this pivot block can be
expanded due to the recovery of LU predecessors, it is necessary to complete all pivot
block expansions before determining how to handle the recovery of other failed frontal
matrices. Ordering the handling of recoveries by increasing topological order of their
recovery matrices will facilitate both overlap resolution and the necessary prerequisite
pivot block expansions.
THEOREM 4.19 (PATHS BETWEEN RECOVERY MATRICES). For A, B, C, R_A,
and R_B ∈ T, with A and B failed frontal matrices, R_A and R_B their respective
recovery matrices, C < R_A, C < R_B, and E_T the edge set, if A ⇒L+ C and B ⇒U+ C,
then one and only one of R_A = R_B, R_A ⇒L+ R_B, or R_B ⇒U+ R_A will be true.
Proof: Due to the definition of the E_T edge set and that of the recovery matrices,
the paths A ⇒LU+ R_A and B ⇒LU+ R_B exist. Thus, as B ⇒U+ C, B ⇒U+ R_B, and C < R_B,
Theorem 4.5 provides the existence of C ⇒L+ R_B. Combined with the assumption
of A ⇒L+ C, the path A ⇒L+ R_B is established. But the existence of A ⇒LU+ R_A is
also known, so by Theorem 4.5 and the trichotomy of the ordered set T, either (a)
R_A < R_B and R_A ⇒L+ R_B, (b) R_B < R_A and R_B ⇒U+ R_A, or (c) R_A = R_B. □
Now that a relationship between the recovery matrices of the failed frontal matrices
has been established by Theorem 4.19, the handling of the overlap area contributions
of the multiply affected frontal matrix can be defined.
THEOREM 4.20 (OVERLAP AREA CONTRIBUTION HANDLING). If A, B ∈ T
are failed frontal matrices with R_A and R_B their respective recovery matrices, and
A ⇒L+ C and B ⇒U+ C (but no U path from A to C or L path from B to C), then the
contributions made by C in the portion of the extended contribution block defined by
the rows of C′_Blost and the columns of U′_Alost need to be incorporated into the pivot
rows/columns of whichever of R_A or R_B occurs first in the assembly DAG.
Proof: If A and B have the same recovery matrix, then there is no issue to resolve.
If R_A < R_B, then Theorem 4.19 establishes the existence of a path from R_A to R_B,
and Corollaries 4.8 and 4.11 require that the overlap area contributions be absorbed
by R_A. Likewise, if R_B < R_A, then the overlap area contributions will be absorbed
by R_B. □
COROLLARY 4.21. Resolution of the overlap area contribution handling described
in Theorem 4.20 can also be achieved by absorbing the contributions in whichever
recovery frontal matrix (RA or RB) has the lower topological level.
This latest corollary provides a way of organizing multiple recoveries that will
simplify the implementation (to be discussed in detail later).
The second major issue of this section is to define the need to complete all the
pivot block expansions due to failed LU predecessors prior to accommodating any
additional effects due to other failed frontal matrices.
THEOREM 4.22 (PIVOT BLOCK RECOVERIES FIRST). When multiple failed
frontal matrices affect a particular frontal matrix, it is necessary that the effects of
recoveries for which the current frontal matrix is the recovery frontal matrix be resolved
before any other recoveries are addressed.
Proof: Corollaries 4.8 and 4.11 establish the importance of an L ancestor's pivot
block pattern of rows (C′) and a U ancestor's pivot block pattern of columns (U′)
in determining how a failed frontal matrix's recovery will affect the ancestor frontal
matrix. When a frontal matrix is the LU ancestor recovery matrix for a failed frontal
matrix, its C′ and U′ patterns will be expanded to accommodate the lost pivots.
Since the recoveries of other failed frontal matrices will depend on these patterns, all
of the effects of failed frontal matrices that the current frontal matrix recovers must
first be resolved before other recovery effects can be addressed. Furthermore, as the
relationship between the failed frontal matrix and its recovery matrix is well defined
a priori, use of the recovery frontal matrix's pivot block pattern is not necessary in
recovering these failed frontal matrices. □
This theorem will be combined with Corollary 4.21 and the results of the next
section to prescribe the ordering for resolving the effects of multiple failed frontal
matrices.
Besides the possibility of distinct frontal matrices having their original anticipated
pivots fail, there is the possibility that these same pivots will fail once they have been
absorbed into their recovery matrix's pivot block.
4.7. Repeated Failures. Once a failed frontal matrix's pivot block has been
absorbed into a recovery matrix, there are no guarantees that the previously lost
pivots will be acceptable in their new frontal matrix. Hence, the question of whether
the recovery is now complete needs to be asked.
The answer is simple and elegant and aids significantly in both the theoretical
development and implementation of lost pivot recovery. The following theorem provides
this answer:
THEOREM 4.23 (RECOVERY COMPLETION). Once the lost pivot block of a failed
frontal matrix (A) has been absorbed into the specified recovery frontal matrix (RA),
the recovery of the failed frontal matrix is complete. Any subsequent failures of the
lost pivot block of A can be handled as failures in the recovery frontal matrix.
Proof: For all B ∈ T such that A ⇒L+ B and B < R_A, the rows of C′_B in the U′_Alost
columns have become part of the pivot rows of B by Corollary 4.8. Furthermore, any
other nonzero rows of the U′_Alost columns that are not contained in C′_B for one of the
B's defined above are absorbed in C_RA by Theorem 4.15.
In a similar fashion, for all C ∈ T such that A ⇒U+ C and C < R_A, the columns of
U′_C in the C′_Alost rows have become part of the pivot columns of C by Corollary 4.11.
Furthermore, any other nonzero columns of the C′_Alost rows that are not contained in
U′_C for one of the C's defined above are absorbed in U_RA by Theorem 4.15.
Thus, all the effects of A's lost pivot recovery have been absorbed into other frontal
matrices, and subsequent failures of the pivots in R_A can be handled as failures in R_A
since no new edges from R_A have been added to E_T, based on the results of Theorem
4.18. □
COROLLARY 4.24. The recovery of a particular failed frontal matrix need not
progress past the topological level of its recovery matrix.
The significance of this result is that the scope of a failed frontal matrix's recovery
is now limited to a well-defined subset of the frontal matrices. This result can also
be used to specify the ordering of recovery resolutions necessary when multiple failed
frontal matrices affect a particular frontal matrix.
4.8. Ordering Recovery Resolutions. When multiple failed frontal matrices
affect a particular frontal matrix, the ordering with which the effects of each recovery
upon the current frontal matrix are determined is important. In Theorem 4.20, the
ordering of failed L and U ancestors' effects is based on the positions of the ancestors'
recovery matrices. Furthermore, Theorem 4.22 requires that all recovery operations
for failed frontal matrices that are recovered by the current frontal matrix be resolved
before any other failed frontal matrices' recoveries. With Theorem 4.23 now also
established, a composite ordering of the recovery resolutions is defined by the following
theorem.
THEOREM 4.25 (ORDERING RECOVERY RESOLUTIONS). If resolution of the
effects of multiple failed frontal matrices on a particular frontal matrix is ordered
by ascending topological level of the failed frontal matrices' recovery matrices, the
ordering criteria of Theorems 4.20 and 4.22 will be met.
Proof: By Theorem 4.23 all recovery effects are terminated at the recovery matrix.
Theorem 4.5 insures paths from all affected frontal matrices to an LU ancestor
recovery matrix. Thus, any frontal matrix affected by a failed frontal matrix must
be at a topological level lower than that of the recovery matrix of the failed frontal matrix.
Combining this result with the results of Corollary 4.21 and Theorem 4.22 completes
the proof. □
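As a hypothetical sketch (the function and tuple layout below are illustrative, not taken from the PRF code), the composite ordering can be expressed as: resolve first the recoveries for which the current frontal matrix is itself the recovery matrix, then the remainder in ascending topological level of their recovery matrices:

```python
# Illustrative sketch: ordering recovery resolutions at one frontal matrix.
# Each pending recovery is (failed_id, recovery_id, recovery_level);
# these names are hypothetical, not from the PRF code.

def order_resolutions(current_id, pending):
    """Return recoveries in the order they must be resolved:
    first those recovered by the current frontal matrix (Theorem 4.22),
    then the rest by ascending topological level of their recovery
    matrices (Theorem 4.25 and Corollary 4.21)."""
    own = [r for r in pending if r[1] == current_id]
    rest = sorted((r for r in pending if r[1] != current_id),
                  key=lambda r: r[2])
    return own + rest

pending = [('A', 'R1', 7), ('B', 'X', 3), ('C', 'X', 3), ('D', 'R2', 5)]
print(order_resolutions('X', pending))
```

Here frontal matrix X resolves the recoveries of B and C (which it recovers itself) before those of D and A, whose recovery matrices lie at levels 5 and 7.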
With this result established, all the necessary mechanisms can be built to
accommodate any number of failed frontal matrices. Yet, the communication paths
upon which frontal matrices will share the information necessary for these recovery
mechanisms have not been fully defined. The next section completes the theoretical
development by establishing that either E_T, a transitive reduction of E_T, or E_A can
be used for the control edge set E_C for lost pivot recovery.
4.9. Communication Paths. During the recovery of a failed frontal matrix,
the lost pivot block, the nonzero rows of the lost pivot columns, and the nonzero
columns of the lost pivot rows must be passed to and absorbed by other frontal matrices.
In addition, new fill-in can occur in the lost pivot rows and columns, and this fill-in
must also be passed to and absorbed by other frontal matrices. This section will show
that the communication paths necessary for this data passing can be either E_T, a
transitive reduction of E_T, or the assembly edge set E_A. The resulting edge set is
called the control edge set and is designated E_C.
THEOREM 4.26 (E_C BUILT DIRECTLY FROM E_T). E_T is sufficient to accommodate
all necessary communication for lost pivot recovery.
Proof: The definition of E_T insures that there are paths from each frontal matrix
to its recovery frontal matrix. Theorems 4.13, 4.14, and 4.18 insure that all recovery
effects are then limited to frontal matrices on paths in E_T. □
This definition of E_C does, however, have one serious drawback. Specifically,
the E_T edge set is very large and thus imposes serious performance penalties. The
next theorem establishes the sufficiency of a much smaller edge set to be used as E_C.
THEOREM 4.27 (E_C BASED ON A TRANSITIVE REDUCTION OF E_T). If a
transitive reduction of E_T is used as E_C, then the resulting edge set will still be sufficient
for all necessary lost pivot recovery.
Proof: Follows directly from the definition of a transitive reduction. That is, if
there is an edge A → B in E_T, then there is also a path from A to B in the transitive
reduction of E_T. When combined with the results of Theorem 4.26, this completes
the proof. □
Furthermore, if an assumption of block triangular form can be made, the original
edge set from the assembly DAG, designated E_A, can also be used as the control edge
set E_C. While use of E_A for this purpose precludes the need to compute the transitive
reduction of E_T, there are typically about 10% more edges in E_A than in the reduced
E_T, which results in additional run-time overhead.
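A transitive reduction of a DAG can be computed by deleting every edge that is implied by a longer path; for a DAG this reduction is unique. The sketch below is illustrative only (in PRF this computation would belong to the host preprocessing phase) and uses a simple reachability test:

```python
# Hypothetical sketch: transitive reduction of a DAG given as an
# adjacency dict, removing each edge implied by a longer path.

def transitive_reduction(adj):
    def reachable(src, dst, skip_edge):
        # depth-first search that ignores the single edge `skip_edge`
        stack, seen = [src], set()
        while stack:
            u = stack.pop()
            if u == dst:
                return True
            if u in seen:
                continue
            seen.add(u)
            stack.extend(w for w in adj.get(u, ())
                         if (u, w) != skip_edge)
        return False

    reduced = {u: set(vs) for u, vs in adj.items()}
    for u, vs in adj.items():
        for v in vs:
            # drop (u, v) if v is still reachable from u without it
            if reachable(u, v, (u, v)):
                reduced[u].discard(v)
    return reduced

adj = {'A': {'B', 'C'}, 'B': {'C'}, 'C': set()}
print(transitive_reduction(adj))   # the edge A -> C is implied by A -> B -> C
```

This quadratic-ish approach suffices for illustration; a production preprocessing step would use a more efficient algorithm.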
4.10. Absence of Block Triangular Form. If block triangular form cannot
be assumed, lost pivot recovery is still possible but more difficult. Specifically, empty
contribution blocks are possible and LU ancestors are not guaranteed. Furthermore,
E_T is insufficient for lost pivot recovery and must be augmented. Likewise, E_A is also
insufficient for lost pivot recovery. While these results complicate lost pivot recovery,
it remains possible. For details on this more generalized case, see [13].
5. Implementation. The implementation of lost pivot recovery is based on a
parallel distributed memory version of the unsymmetric-pattern multifrontal method
called PRF (for Parallel ReFactor). The PRF software accepts an assembly DAG
and factorizes sequences of matrices that have a nonzero structure identical to the
matrix for which the assembly DAG was created. Initially, the PRF code strictly
uses the initial pivot ordering. With the inclusion of lost pivot recovery, the PRF
code is enhanced to include an avoidance strategy (relaxing of the pivot threshold),
intrafrontal matrix recovery, and interfrontal matrix recovery.
The PRF software uses host-based preprocessing to statically schedule, allocate,
and assign frontal matrix tasks. The schedules, frontal matrix descriptions, and
matrix entry values are then passed to the appropriate parallel processing nodes, which
are part of an nCUBE 2 multiprocessor. The factorization of subsequent matrices in
the sequence reuses the schedules and frontal matrix descriptions, and thus only new
matrix entry values are required. The actual factorization of a matrix is accomplished
by each parallel processing node executing the loop described in Figure 5.1, with
multiple processors cooperatively handling the larger frontal matrix tasks. In this fixed
pivot ordering version of the PRF code, synchronization between frontal matrices
is accomplished using the schedules and the blocking message receive primitives for
contribution messages and pivot column broadcasts.
for (each frontal matrix in this processor's schedule) do
allocate storage for local portion of current frontal matrix
assemble original entries and contributions into frontal matrix
partially factorize the frontal matrix according to its pivots
send contribution block to subsequent frontal matrices
retain the LU factors for this frontal matrix
endfor
FIG. 5.1. Original PRF Factorization Loop (Fixed Pivot Order)
With the introduction of lost pivot recovery into the PRF code, additional
synchronization is required. Specifically, the effects of failed predecessor frontal matrices
can cause frontal matrices to expand by new rows and/or columns. Such expansions
must be assigned to processors and will require larger memory allocations for
frontal matrix storage, updating of frontal matrix row and column patterns, and
possible remapping of the assembly of original entries and contributions. Furthermore,
if multiple processors have been allocated to a particular frontal matrix, all of
the cooperating processors must agree upon the resolution and allocation of the lost
pivot recoveries. Hence two new levels of synchronization are required for lost pivot
recovery. First, synchronization is required between frontal matrices as defined by
the control edge set Cc. This synchronization is done between the root processors
of the frontal matrices' subcubes (the relative node 0's within the subcubes). Use of
the transitively reduced E_T edge set as E_C reduces the number of required messages,
but requires that specific frontal matrix recoveries be forwarded along paths in E_C
to all affected frontal matrices. The E_C edge set is determined during host preprocessing
and distributed to the parallel processing nodes. The recovery matrices (first
LU ancestors) and their topological levels are also determined during host preprocessing
for each frontal matrix. Corollary 4.24 specifies how this information can be
used to discontinue the forwarding of individual frontal matrix recoveries. The second
synchronization level is between the subcube processors for a distinct frontal matrix.
That is, after the subcube's root processor has received and resolved all lost pivot
recovery effects on the current frontal matrix, the resulting updates are broadcast to
each of the other subcube processors. This insures agreement on how the lost pivot
recovery effects have been resolved and allocated.
The two new levels of synchronization required for lost pivot recovery must be
integrated into the frontal matrix task processing as defined by the PRF factorization
loop of Figure 5.1. This integration and the other additional lost pivot recovery
processing is illustrated in Figure 5.2. Within this description, recovery structures
are created for each frontal matrix with failed pivots. These structures include scalar
data identifying the number of lost pivot rows and columns and the recovery matrix's
ID and level. There is also vector data defining the patterns of lost pivot rows and
columns, as well as the actual values of entries in the lost pivot rows and columns.
For entries outside of the lost pivot block, the values must be accompanied by row,
column, and source frontal matrix ID to facilitate duplicate elimination as multiple
copies of the recovery structure may be created and passed along partially disjoint
paths.
for (each frontal matrix in this processor's schedule) do
if (this is the root processor of the frontal matrix's subcube) then
1: accept, combine, and resolve incoming recovery structures
2: broadcast resulting updates to rest of frontal matrix's subcube
else
3: receive broadcast of updates due to lost pivot recovery
endif
4: allocate storage for local portion of current frontal matrix
5: assemble original entries and contributions into frontal matrix
6: partially factorize the frontal matrix according to its pivots
if (this is not the root processor of the frontal matrix's subcube) then
7: forward local portions of lost pivot rows/columns and new
contributions due to earlier recoveries to root processor
else
8: receive the lost pivot recovery data from other subcube processors
9: associate the lost pivot recovery data with appropriate recovery
structures updating corresponding patterns as necessary
10: forward recovery structures to control parents
endif
11: forward contribution block to subsequent frontal matrices
12: retain the LU factors for this frontal matrix
endfor
FIG. 5.2. PRF Factorization Loop Enhanced for Lost Pivot Recovery
Details of the processing outlined in Figure 5.2 follow from the theoretical
development of the previous section. Detailed descriptions of this processing can be found
in [13]. A major complicating factor in this implementation is that the recovery
structure for an individual failed frontal matrix is duplicated along disjoint control paths.
Rows from this structure are absorbed by L ancestors and columns by U ancestors per
Corollaries 4.8 and 4.11. Furthermore, new rows are introduced by L ancestors and
new columns by U ancestors per Theorems 4.13 and 4.14. As both these absorptions
and introductions can occur in distinct instances of the recovery structure, careful
integration of the patterns and contributions of the various instances is necessary
and accomplished by each frontal matrix task for all received recovery structures. A
shared memory parallel implementation could maintain a single copy of each recovery
structure with access controlled by locking mechanisms. This would eliminate much
of the complexity associated with lost pivot recovery.
Intrafrontal matrix lost pivot recovery did not need to be addressed in the
theoretical development but is handled within the partial dense factorization routine of
step (6) in Figure 5.2. This partial dense factorization routine is based on a pipelined,
fan-out algorithm developed by Geist and Heath [11], which uses a column-oriented
data allocation. In this algorithm, the cooperating processor that holds the next
pivot's column will receive the multipliers (column of the L factor) and then update
just the next pivot's column with these multipliers. The multipliers for this next pivot
can then be computed and sent out after which the rest of columns can be updated
by the current pivot's multipliers. In this way the next pivot multipliers can be on
their way to the other processors before those processors need them.
To adapt this algorithm for intrafrontal matrix recovery, the processor owning
the next pivot column will first check for a valid pivot in the pivot block portion of
the column. If a valid pivot cannot be found in this column, the other pivot columns
held by this processor are updated (using the current pivot's multipliers) and checked
for valid pivots. If one of these other columns has a valid pivot, its multipliers are
computed and sent out with the necessary permutation information appended. If
the processor owning the next pivot has no columns with valid pivots, it sends out a
failure status instead of the multiplier broadcast. When the other processors attempt
to read these multipliers and see the failure status, each will check its own potential
pivot columns for a valid pivot. If a valid pivot is found, the processor will contribute
the forward distance from the original pivot owner's processor ID to its own processor
ID (with these IDs being relative to the frontal matrix's assigned subcube). Otherwise,
if no valid pivot was found, the number of total processors in the assigned subcube is
contributed. These contributions are made to a global synchronization operation that
uses prefix fan-in and fan-out computations to return the minimum of all contributions
to each participating processor. As a result of this operation, each processor will know
the nearest processor to the current pivot owner that has a valid pivot. This processor
will become the new owner of this pivot and compute and send the corresponding
multipliers. If the global synchronization operation returns the size of the assigned
subcube, then there are no valid pivots held by any processor and the frontal matrix
has lost pivots that must be handled by interfrontal matrix recovery mechanisms.
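The nearest-valid-pivot selection just described can be illustrated with the following sketch, in which the global prefix fan-in/fan-out synchronization is simulated by a simple minimum over the processors' contributions (the function name and data layout are hypothetical, not from the PRF code):

```python
# Illustrative sketch of selecting the nearest processor with a valid
# pivot after a failure broadcast. `has_valid_pivot` and the simulated
# all-reduce are hypothetical stand-ins for the prefix fan-in/fan-out
# synchronization described above.

def new_pivot_owner(failed_owner, subcube_size, has_valid_pivot):
    """has_valid_pivot[p] is True if processor p (relative ID within
    the subcube) holds a column with a numerically valid pivot."""
    contributions = []
    for p in range(subcube_size):
        if has_valid_pivot[p]:
            # forward (wrap-around) distance from the failed owner
            contributions.append((p - failed_owner) % subcube_size)
        else:
            # no valid pivot: contribute the subcube size
            contributions.append(subcube_size)
    best = min(contributions)   # the global min-reduction
    if best == subcube_size:
        return None             # interfrontal matrix recovery is needed
    return (failed_owner + best) % subcube_size

# processor 2 failed; processors 0 and 3 still hold valid pivots
print(new_pivot_owner(2, 4, {0: True, 1: False, 2: False, 3: True}))
```

A returned value of `None` corresponds to the case in the text where the reduction yields the subcube size and the lost pivots must be handed to the interfrontal recovery mechanisms.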
6. Performance Results. With the theoretical development of a robust lost
pivot recovery mechanism complete and its implementation summarized, we now
present the performance of the parallel factorize-only algorithm when pivot failures
are present. Performance results are provided for both sequential and parallel execution
time. The three matrix sequences used for this evaluation are RDIST1, RDIST2,
and RDIST3A, which come from chemical engineering applications [20, 21]. Their
characteristics are shown in Table 6.1.
TABLE 6.1
Characteristics of Matrix Sequences

NAME      ORDER   NONZEROS   # OF MATRICES
RDIST1    4134    94,408     41
RDIST2    3198    56,834     40
RDIST3A   2398    61,896     37
The effectiveness and efficiency of lost pivot recovery on a single processor was
measured using single-processor refactorizations of each matrix in the RDIST1, RDIST2,
and RDIST3A sequences. Separate sets of runs were done with the pivot threshold
value set to 0.1 (the value used during the analyze-factorize UMFPACK run
that built the assembly DAG and thus specified the initial pivot ordering) and to
0.001 (which corresponds to the use of a lost pivot avoidance strategy). To put these
times into perspective, they are compared to an estimate of the analyze-factorize
time. Only an estimated analyze-factorize time is possible, as the UMFPACK analyze-factorize
routine required too much memory to be run on a single nCUBE processing
node. The estimate of the analyze-factorize time was obtained by taking the ratio of
the UMFPACK analyze-factorize time to the UMFPACK refactor time, as run on a
single processor of a KSR1 system, and multiplying this ratio by the single nCUBE 2
processing node execution time of the fixed pivot order refactorization.
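The estimate is thus a simple scaling. For example (with made-up timing values, not measured data):

```python
# Illustrative arithmetic for the analyze-factorize time estimate.
# All timing values below are invented for the example.

ksr1_analyze_factorize = 12.0  # UMFPACK analyze-factorize on one KSR1 CPU (s)
ksr1_refactor = 4.0            # UMFPACK refactor on the same KSR1 CPU (s)
ncube_refactor = 10.0          # fixed-pivot-order refactor on one nCUBE 2 node (s)

ratio = ksr1_analyze_factorize / ksr1_refactor
estimated_ncube_analyze_factorize = ratio * ncube_refactor
print(estimated_ncube_analyze_factorize)   # 30.0
```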
The sequential execution time results for the three test matrix sequences are
shown in Figures 6.1, 6.2, and 6.3. The RDIST1 matrices experienced few lost pivots
even when an avoidance strategy was not in use. For all the matrices in this sequence,
the factor-only code with lost pivot recovery ran faster than the estimated time to
perform an analyze-factor operation. For the RDIST2 and RDIST3A sequences, the
use of an avoidance strategy was necessary for the factor-only time to be better
than the estimated analyze-factor time, yet this combination did consistently produce
the fastest execution times. Without an avoidance strategy, the execution times
soon exceeded the estimated analyze-factor times. The conclusion reached from these
sequential execution tests was that a factor-only algorithm that incorporates lost
pivot recovery is a viable alternative to the analyze-factor operation for sequences
of identically structured matrices. This is especially true if an avoidance strategy is
employed.
[Figure 6.1: plot of execution time in seconds versus matrix number in sequence, comparing the estimated analyze-factor time, lost pivot recovery without avoidance (u=0.1), and lost pivot recovery with avoidance (u=0.001).]
FIG. 6.1. Sequential Execution Time for RDIST1 Sequence
The sequential execution time results indicate that lost pivot recovery is viable
even when only a single processor is in use. The implementation, however, also allows
for the exploitation of parallelism. Many parallel executions were done on selected
matrices from each of the three sequences on hypercubes of dimensions 1 to 6 (2 to 64
processors). Matrix 37 of the RDIST3A sequence experienced the most lost pivots
and, hence, the worst parallel performance. With avoidance this matrix experienced
52 lost pivots and without avoidance there were 221 lost pivots. Figure 6.4 presents
[Figure 6.2: plot of execution time in seconds versus matrix number in sequence, comparing the estimated analyze-factor time, lost pivot recovery without avoidance (u=0.1), and lost pivot recovery with avoidance (u=0.001).]
FIG. 6.2. Sequential Execution Time for RDIST2 Sequence
the parallel speedup results for this matrix. In this figure, the perfect (linear) speedup
and speedup without using lost pivot recovery (LPR) curves are shown for reference.
The curve for LPR with no lost pivots was obtained by running the factor-only PRF
code with LPR enabled on the first matrix in the sequence in which all anticipated
pivots were acceptable. Comparison of this curve with the curve that did not employ
LPR shows that the overhead required for the LPR code has minimal adverse impact
on speedup. The LPR with avoidance and LPR without avoidance curves illustrate
performance achieved using matrix 37 of the RDIST3A sequence. These curves
indicate that the benefits of parallelism are significantly reduced when more pivots are
lost. However, gains due to parallelism are still very much achievable. Furthermore,
these results show the worst of all test cases. Typically significantly fewer pivots were
lost and the corresponding degradations in achievable parallelism were significantly
less severe.
7. Conclusion. The unsymmetric-pattern multifrontal method of Davis and Duff
has proven to have extremely competitive sequential performance [19] and significant
parallel potential [16]. When the method is applied to sequences of identically
structured matrices, the assembly DAG from the analyze-factor operation on the first
matrix can be reused for factor-only operations on the subsequent matrices in the
sequence. However, if entries anticipated as pivots by the assembly DAG are no
longer numerically acceptable, recovery mechanisms are necessary to maintain the
viability of a factor-only version of the method.
This paper describes such a lost pivot recovery mechanism for the unsymmetric-pattern
multifrontal method. A thorough theoretical development of the lost pivot
recovery mechanisms established correctness and robustness. Furthermore, these
mechanisms preserve the node and edge sets of the original assembly DAG. Preservation of
the assembly DAG is especially important for distributed memory implementations
so that static scheduling and allocations can be utilized.

FIG. 6.3. Sequential Execution Time for RDIST3A Sequence. (Plot: execution time versus matrix number in the sequence, comparing lost pivot recovery without avoidance (u = 0.1) against the estimated analyze-factor time.)

The implementation of these
mechanisms into a parallel distributed memory factor-only code on the nCUBE 2 was
also described and its performance evaluated. Analysis of these performance results
justifies the following conclusions:
- A fully robust LPR capability is possible for the unsymmetric-pattern multifrontal
method and is applicable to both sequential and parallel implementations.
- The LPR capability can effectively reduce overall execution time, especially
if combined with an avoidance strategy.
- Performance benefits due to parallelism are definitely achievable when lost
pivot recovery mechanisms are incorporated into the PRF code. However,
the gains due to parallelism are limited when more anticipated pivots are lost
during the factorization process.
Future efforts will likely focus on a parallel shared memory factor-only algorithm
based on the unsymmetric-pattern multifrontal method. Many of the performance-
limiting aspects of the parallel distributed memory lost pivot recovery implementation
were directly related to the decentralization of critical data structures. A shared
memory implementation should therefore achieve significantly better parallel performance.
REFERENCES
[1] P. R. AMESTOY AND I. S. DUFF, Vectorization of a multiprocessor multifrontal code, International Journal of Supercomputing Applications, 3 (1989), pp. 41–59.
[2] T. A. DAVIS, Users' guide for the unsymmetric-pattern multifrontal package (UMFPACK), Tech. Report TR-93-020, Computer and Information Sciences Department, University of Florida, Gainesville, FL, June 1993. (UMFPACK is available via Netlib (linalg/umfpack.shar) or via anonymous ftp to ftp.cis.ufl.edu:pub/umfpack.)
FIG. 6.4. RDIST3A (matrix 37) Speed-Ups With and Without Lost Pivot Recovery (LPR). (Plot: speedup versus hypercube dimension; curves for no lost pivots with LPR, LPR with avoidance, and LPR without avoidance.)
[3] T. A. DAVIS AND I. S. DUFF, An unsymmetric-pattern multifrontal method for sparse LU factorization, SIAM J. Matrix Anal. Appl. (submitted March 1993, under revision; also TR-94-038).
[4] I. S. DUFF, Sparse Matrices and their Uses, Academic Press, New York and London, 1981.
[5] I. S. DUFF, Parallel implementation of multifrontal schemes, Parallel Computing, 3 (1986), pp. 193–204.
[6] I. S. DUFF, A. M. ERISMAN, AND J. K. REID, Direct Methods for Sparse Matrices, Oxford
Science Publications, New York, NY, 1989.
[7] I. S. DUFF AND J. K. REID, The multifrontal solution of indefinite sparse symmetric linear systems, ACM Transactions on Mathematical Software, 9 (1983), pp. 302–325.
[8] I. S. DUFF AND J. K. REID, The multifrontal solution of unsymmetric sets of linear equations, SIAM J. Sci. Statist. Comput., 5 (1984), pp. 633–641.
[9] S. C. EISENSTAT AND J. W. H. LIU, Exploiting structural symmetry in unsymmetric sparse symbolic factorization, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 202–211.
[10] S. C. EISENSTAT AND J. W. H. LIU, Exploiting structural symmetry in a sparse partial pivoting code, SIAM J. Sci. Statist. Comput., 14 (1993), pp. 253–257.
[11] G. A. GEIST AND M. HEATH, Matrix factorization on a hypercube, in Hypercube Multiprocessors 1986, M. Heath, ed., Society for Industrial and Applied Mathematics, Philadelphia, PA, 1986, pp. 161–180.
[12] J. R. GILBERT AND J. W. H. LIU, Elimination structures for unsymmetric sparse LU factors, SIAM J. Matrix Anal. Appl., 14 (1993), pp. 334–354.
[13] S. M. HADFIELD, On the LU Factorization of Sequences of Identically Structured Sparse Matrices within a Distributed Memory Environment, PhD thesis, University of Florida, Gainesville, FL, April 1994 (also TR-94-019).
[14] S. M. HADFIELD AND T. A. DAVIS, Analysis of potential parallel implementation of the unsymmetric-pattern multifrontal method for sparse LU factorization, Tech. Report TR-92-017, Computer and Information Sciences Department, University of Florida, Gainesville, FL, 1992.
[15] S. M. HADFIELD AND T. A. DAVIS, A parallel unsymmetric-pattern multifrontal method, SIAM J. Sci. Comput. (submitted 1994; also Univ. of Florida Tech. Report TR-94-028).
[16] S. M. HADFIELD AND T. A. DAVIS, Potential and achievable parallelism in the unsymmetric-pattern multifrontal LU factorization method for sparse matrices, in Fifth SIAM Conference on Applied Linear Algebra, 1994 (also TR-94-006).
[17] K. S. KUNDERT, Sparse matrix techniques and their applications to circuit simulation, in Circuit Analysis, Simulation and Design, A. E. Ruehli, ed., North-Holland, New York, 1986.
[18] J. W. H. LIU, The multifrontal method for sparse matrix solution: Theory and practice, SIAM Review, 34 (1992), pp. 82–109.
[19] E. G.-Y. NG, Comparison of some direct methods for solving sparse nonsymmetric linear systems, in Fifth SIAM Conference on Applied Linear Algebra, SIAM, June 1994, p. 140.
[20] S. E. ZITNEY, Sparse matrix methods for chemical process separation calculations on supercomputers, in Proc. Supercomputing '92, Minneapolis, MN, Nov. 1992, IEEE Computer Society Press, pp. 414–423.
[21] S. E. ZITNEY AND M. A. STADTHERR, Supercomputing strategies for the design and analysis of complex separation systems, Ind. Eng. Chem. Res., 32 (1993), pp. 604–612.
[22] Z. ZLATEV, Sparse matrix techniques for general matrices with real elements: Pivotal strategies, decompositions and applications in ODE software, in Sparsity and Its Applications, D. J. Evans, ed., Cambridge University Press, Cambridge, United Kingdom, 1985, pp. 185–228.
Note: all University of Florida technical reports in this list of references are available in postscript form via anonymous ftp to ftp.cis.ufl.edu in the directory cis/techreports.
