Upgrading Circuit Modules To Improve Performance+
Doowon Paik
University of Minnesota
Sartaj Sahni
University of Florida
Abstract
We consider the problem of selectively upgrading some of the modules in a circuit so as to meet a
specified performance level. The upgrading of a module involves replacing it with an equivalent module
with (effectively) zero delay, and this replacement has a cost or weight associated with it. We show that
some versions of the minimum cost upgrading problem are NP-hard while others are polynomially solvable.
Several heuristics for the general problem are proposed and evaluated experimentally.
Keywords And Phrases
Module upgrading, performance, delay, complexity, NP-hard, heuristics
+ This research was supported, in part, by the National Science Foundation under grant MIP8617374.
1. Introduction
Often, a circuit can be modeled as a directed acyclic graph (dag) G in which there is a delay, d(v),
associated with each vertex, v [CHAN90, GHAN87, MCGE90]. The dag vertices represent circuit modules
while interconnects are modeled by dag edges. The delay, d(v), associated with vertex v of the dag
models both the delay of the module the vertex represents and the interconnect delay. The delay
of any path in the dag is the sum of the delays of the vertices on the path. The circuit delay, d(G), is the
delay of the maximum delay path in the dag. If the modeled circuit delay exceeds the acceptable
delay, one may reduce it by replacing some of the modules by modules with lower
delay. In this paper we consider the case when these replacement modules have a delay that is very much
less than that of the modules they replace. In effect, then, the delay of the new modules is zero. This, for
example, is the case when the new modules are implemented in a much faster technology than the old
ones. There is a cost, w(v), associated with replacing an old module with a functionally equivalent new
one. When an old module is replaced by a new one, we shall say that the corresponding dag vertex v has
been upgraded. The cost of the upgrade is w(v). The cost of upgrading a set X of modules/vertices is the
sum of the costs of the individual upgrades. Let $G|X$ denote the dag obtained by upgrading the vertices in
X. Then the delay of $G|X$ is $d(G|X)$ and the cost of the upgrade is $\sum_{v \in X} w(v)$. Throughout
this paper, we assume that d(v) and w(v) are positive integers for all v.
As an example, consider the dag G of Figure 1(a). The number inside a vertex is its label. Outside
each vertex is a pair of numbers. The first is its delay and the second is the upgrade cost for that vertex.
We see that d(G) = 10. When X = {3, 4}, $G|X$ is as shown in Figure 1(b). In this figure we provide only
the delay of each vertex. The upgraded vertices (i.e., those in X) have a delay of 0. The cost of the
upgrade is 5.
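The definitions of $d(G|X)$ and of the upgrade cost can be illustrated with a minimal Python sketch. The dag below is a hypothetical four-vertex example (it is not the dag of Figure 1, whose edges are not listed in the text), and the names `circuit_delay` and `upgrade_cost` are introduced here only for illustration.

```python
def circuit_delay(preds, delay, upgraded=frozenset()):
    """Return d(G|X): the maximum path delay when vertices in `upgraded` have delay 0.

    preds maps each vertex to the list of its immediate predecessors.
    """
    memo = {}

    def longest_to(v):
        # Maximum delay of any path ending at v, including v's own delay.
        if v not in memo:
            own = 0 if v in upgraded else delay[v]
            memo[v] = own + max((longest_to(u) for u in preds[v]), default=0)
        return memo[v]

    return max((longest_to(v) for v in preds), default=0)


def upgrade_cost(weight, upgraded):
    """Cost of upgrading a vertex set: the sum of the individual weights."""
    return sum(weight[v] for v in upgraded)


# Hypothetical dag (an assumption, not Figure 1): 1 -> 2 -> 4 and 1 -> 3 -> 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
weight = {1: 4, 2: 3, 3: 1, 4: 2}

print(circuit_delay(preds, delay))        # 10, via the path 1 -> 2 -> 4
print(circuit_delay(preds, delay, {2}))   # 6, both paths now have delay <= 6
print(upgrade_cost(weight, {2}))          # 3
```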
In this paper we study the problem of obtaining a least cost upgrade set X such that $d(G|X) \le \delta$
for a specified nonnegative integer $\delta$. This problem is called the directed vertex upgrade problem
(DVUP). It is easy to see that DVUP is NP-hard. For this we simply show how a polynomial time algorithm
for DVUP could be used to solve, in polynomial time, the partition problem which is known to be NP-hard
[GARE79]. In the partition problem we are given n positive integers $a_1, a_2, \ldots, a_n$ and wish to
know if there is a subset $I \subseteq \{1, 2, \ldots, n\}$ such that
$$\sum_{i \in I} a_i = \frac{1}{2} \sum_{i=1}^{n} a_i .$$
Consider the dag that is just a chain of n vertices, $1 \to 2 \to \cdots \to n$. Let
$d(i) = w(i) = a_i$, $1 \le i \le n$. It is easy to see that there is an I such that
$\sum_{i \in I} a_i = \frac{1}{2} \sum_{i=1}^{n} a_i$ iff the minimum cost X such that
$d(G|X) \le \frac{1}{2} \sum_{i=1}^{n} a_i$ has cost $\frac{1}{2} \sum_{i=1}^{n} a_i$.
Figure 1: DVUP example (the first (second) coordinate is the delay (weight) of the vertex).
(a) G with $\delta = 6$. (b) X = {3, 4} is the minimal solution.
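The reduction from partition can be checked on small inputs with the following sketch. It solves the chain instance by exhaustive search, which takes exponential time and is only meant to illustrate the equivalence; the function names are hypothetical.

```python
from itertools import combinations


def min_cost_chain_upgrade(a, delta):
    """Brute-force minimum upgrade cost for a chain with d(i) = w(i) = a[i]."""
    n = len(a)
    total = sum(a)
    best = None
    for r in range(n + 1):
        for X in combinations(range(n), r):
            cost = sum(a[i] for i in X)
            # The chain has a single maximal path through every vertex, so
            # d(G|X) equals the total delay minus the delay of the upgraded vertices.
            if total - cost <= delta and (best is None or cost < best):
                best = cost
    return best


def has_partition(a):
    """Decide partition through the chain DVUP instance described in the text."""
    total = sum(a)
    if total % 2:
        return False  # an odd total can never be split evenly
    half = total // 2
    return min_cost_chain_upgrade(a, half) == half


print(has_partition([3, 1, 1, 2, 2, 1]))  # True: 3 + 2 = 5 = 10 / 2
print(has_partition([2, 3, 7]))           # False: no subset sums to 6
```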
The specific results we obtain in this paper are:
1. DVUP is polynomially solvable for dags (circuits) in which all weights and delays are one (Section
2).
2. DVUP is NP-hard for dags with all vertex weights one but vertex delays not necessarily one when
$\delta \ge 2$. When $\delta = 1$, the problem is polynomially solvable (Section 3).
3. Several heuristics for the general DVUP are proposed (Section 4).
4. An experimental evaluation of the heuristics using ISCAS benchmark circuits [BRGL85, 89] is
performed (Section 5).
Related vertex modification problems for dags have been studied in [PAIK90, 91]. In the first of
these, vertex splitting was used to model the problem of obtaining an optimal selection of scan flip-flops,
and in the second, vertex deletion was used to model the problem of optimal placement of signal boosters in
lossy circuits.
2. Unit Delay Unit Weight DVUP
A dag G = (V, E) is a unit delay unit weight dag iff d(v) = w(v) = 1 for every vertex $v \in V$. A subset
X of V is k-colorable iff we can label the vertices of X using at most k labels such that no two adjacent
vertices have the same label. A maximum k-coloring of a dag G is a maximum subset $X \subseteq V$ that is
k-colorable. A dag G is transitive iff for every $u, v, w \in V$ such that $\langle u, v \rangle \in E$
and $\langle v, w \rangle \in E$, the edge $\langle u, w \rangle$ is also in E. $G^+ = (V, E^+)$ is the
transitive closure of (V, E) iff $\langle u, v \rangle \in E^+$ exactly when there is a path
(with at least one edge on it) in G from u to v. Note that if G is a dag then $G^+$ is a transitive dag.
The unit delay unit weight DVUP for any $\delta$, $\delta \ge 1$, can be solved in $O(n^3 \log n)$ time by
using the $O(n^3 \log n)$ maximum k-coloring algorithm of [GAVR87] for transitive dags. Before showing how
this can be done, we establish a relationship between the number of vertices of a given set X that can be
on any path of G and colorability of X in $G^+$. Intuitively, X represents the set of vertices that are not
upgraded. So, the number of X vertices on a path is the delay for that path.
Theorem 1: Let G = (V, E) be a dag and let $G^+ = (V, E^+)$ be its transitive closure. Let X be a subset of
the vertices in V. G has no path with $> \delta$ vertices of X iff X is $\delta$-colorable in $G^+$.
Proof: Suppose that G has no path with $> \delta$ vertices of X. For each $v \in V$ let m(v) be the maximum
number of X vertices on any path from a source of G to vertex v. m(v) is given by the recurrence:
$$
m(v) = \begin{cases}
0 & v \notin X \text{ and } v \text{ is a source} \\
1 & v \in X \text{ and } v \text{ is a source} \\
\max_{w \in P(v)} \{ m(w) \} & v \notin X \text{ and } v \text{ is not a source} \\
\max_{w \in P(v)} \{ m(w) \} + 1 & v \in X \text{ and } v \text{ is not a source}
\end{cases}
$$
where P(v) is the set of immediate predecessors of v and is given by:
$$P(v) = \{ w \mid \langle w, v \rangle \in E \}.$$
Clearly, if G has no path with $> \delta$ vertices of X, it has no vertex v with $m(v) > \delta$. Let
$S_i \subseteq X$ be the subset of X that has m value i, $1 \le i \le \delta$. I.e.,
$S_i = \{ v \mid m(v) = i \text{ and } v \in X \}$. We note that $\bigcup_i S_i = X$ and that if
$x \in S_i$, $y \in S_i$, $x \ne y$, then there is no path from x to y (or y to x) in G. To see the latter,
observe that if there is a path from x to y (say), then m(x) < m(y) as the path to x with the maximum number
of X vertices can be extended to y to get a path with at least one more X vertex (y). As a result of this
observation, we conclude that $\langle x, y \rangle \notin E^+$ and $\langle y, x \rangle \notin E^+$
whenever $x \in S_i$, $y \in S_i$, $x \ne y$. So, X is $\delta$-colorable in $G^+$ (vertices
in $S_i$ are given the color i, $1 \le i \le \delta$).
For the reverse, assume that X is $\delta$-colorable. Suppose G has at least one path Q that contains
$> \delta$ vertices of X. Let the vertices of X on path Q be $v_1, v_2, \ldots, v_q$, $q > \delta$, and
assume that $v_i$ comes before $v_{i+1}$, $1 \le i < q$. Since $\langle v_i, v_j \rangle \in E^+$ whenever
$i < j$, $v_i$ must have a different color from $v_j$ for all i and j such that $i < j$. Hence,
$q > \delta$ colors are needed to color $\{v_1, \ldots, v_q\} \subseteq X$ and X is not
$\delta$-colorable. This contradicts the assumption on X. Hence if X is $\delta$-colorable, G has no path
with $> \delta$ vertices of X. □
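The forward direction of the proof is constructive: the value m(v) itself serves as the color of v. A small sketch of that construction follows, assuming the dag is given as a predecessor map; the function names and the example dag are illustrative assumptions.

```python
def m_values(preds, X):
    """m(v): maximum number of X vertices on any path from a source to v."""
    memo = {}

    def m(v):
        if v not in memo:
            # For a source the max over predecessors is taken to be 0.
            best_pred = max((m(w) for w in preds[v]), default=0)
            memo[v] = best_pred + (1 if v in X else 0)
        return memo[v]

    return {v: m(v) for v in preds}


def coloring_from_m(preds, X):
    """Color each vertex of X with its m value, as in the proof of Theorem 1."""
    mv = m_values(preds, X)
    return {v: mv[v] for v in X}


def ancestors(preds, v):
    """All vertices with a nonempty path to v (the in-neighbors of v in G+)."""
    seen, stack = set(), list(preds[v])
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(preds[u])
    return seen


def is_proper_in_closure(preds, colors):
    """Check that no two vertices joined by an edge of G+ share a color."""
    return all(
        colors[x] != colors[y]
        for y in colors
        for x in ancestors(preds, y)
        if x in colors
    )


# Hypothetical dag: 1 -> 2 -> 3 and 1 -> 4, with X = {1, 3, 4}.
preds = {1: [], 2: [1], 3: [2], 4: [1]}
colors = coloring_from_m(preds, {1, 3, 4})
print(colors, is_proper_in_closure(preds, colors))
```

Here no path carries more than two X vertices, and the coloring uses exactly two colors.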
From the preceding theorem, it follows that if X is a maximum $\delta$-coloring of $G^+$, then $V - X$ is
the smallest set such that $d(G|(V - X)) \le \delta$. This implies the correctness of the three step
algorithm of Figure 2. The complexity of this is governed by that of step 2 which is $O(n^3 \log n)$. Note
that when $\delta = 1$, a maximum $\delta$-coloring is just a maximum independent set and such a set can be
found in O(ne) time for transitive closure graphs with n vertices and e edges [GAVR87]. So, the case
$\delta = 1$ can be solved in $O(n^3)$ time as the graph $G^+$ computed in step 1 of Figure 2 may have
$O(n^2)$ edges even though G may not.
step 1: Compute $G^+$ from G
step 2: Compute X, a maximum $\delta$-coloring of $G^+$
step 3: $B = V - X$ is the solution for the DVUP instance $(G, \delta)$
Figure 2: Algorithm for unit delay unit weight DVUP.
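The three steps of Figure 2 can be sketched as follows for very small dags. Note the substitution: step 2 here is an exhaustive search over subsets, not the algorithm of [GAVR87], so the sketch runs in exponential time. It tests $\delta$-colorability in $G^+$ through the longest chain of X vertices, which is the characterization given by Theorem 1. All function names are hypothetical.

```python
from itertools import combinations


def transitive_closure(succs):
    """Step 1: reach[u] = vertices reachable from u by a path with >= 1 edge."""
    reach = {}

    def visit(u):
        if u not in reach:
            reach[u] = set()
            for v in succs[u]:
                reach[u] |= {v} | visit(v)
        return reach[u]

    for u in succs:
        visit(u)
    return reach


def longest_chain(reach, X):
    """Largest number of X vertices that lie on a common path of G."""
    memo = {}

    def chain_from(u):
        if u not in memo:
            memo[u] = 1 + max((chain_from(v) for v in reach[u] if v in X), default=0)
        return memo[u]

    return max((chain_from(u) for u in X), default=0)


def unit_dvup_bruteforce(succs, delta):
    """Return a minimum upgrade set B for a unit delay unit weight dag."""
    reach = transitive_closure(succs)
    V = list(succs)
    for size in range(len(V), -1, -1):           # step 2: try the largest X first
        for X in combinations(V, size):
            if longest_chain(reach, set(X)) <= delta:
                return set(V) - set(X)           # step 3: B = V - X


# Hypothetical dag: a -> b -> c and a -> d.
succs = {"a": ["b", "d"], "b": ["c"], "c": [], "d": []}
print(unit_dvup_bruteforce(succs, 1))
print(unit_dvup_bruteforce(succs, 2))
```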
3. Nonunit Delay Unit Weight DVUP
A dag G = (V, E) is a unit weight dag if w(v) = 1 for every $v \in V$. The case when d(v) is also 1 for all
v was considered in the previous section. So, in this section we are only concerned with the case of unit
weight dags that have at least one vertex with delay > 1. In Section 3.1 we show that the nonunit delay
unit weight DVUP can be solved in $O(n^3)$ time when $\delta = 1$ and in Section 3.2, we show that the
problem is NP-hard when $\delta \ge 2$.
3.1. $\delta = 1$
Let X be a minimum set of vertices such that $d(G|X) \le 1$. Clearly, every $v \in V$ with d(v) > 1 must be
in X. For each $v \in V$ with d(v) > 1, let $a_v^1, a_v^2, \ldots$ be the vertices such that
$\langle a_v^i, v \rangle \in E$ and let $b_v^1, b_v^2, \ldots$ be such that
$\langle v, b_v^j \rangle \in E$, and let G' be the dag that results when each such v (together with all
edges incident to/from v) is deleted from G and all edges of the form $\langle a_v^i, b_v^j \rangle$ are
added. Figure 3 shows the transformation for a single vertex v. To get G', this transformation is applied
serially to all v with d(v) > 1.
Figure 3: Eliminating a vertex v with d(v) > 1.
Let $B = \{ v \mid d(v) > 1 \text{ and } v \in V \}$. Let G' = (V', E') and let $C \subseteq V'$ be a
minimum vertex set such that $d(G'|C) \le 1$. It is easy to see that $X = B \cup C$ is a minimum vertex set
such that $d(G|X) \le 1$. C can be obtained in $O(n^3)$ time using the unit delay unit weight algorithm of
Section 2 (note that G' is a unit delay unit weight dag), B can be obtained in O(n) time, and G' can be
constructed in $O(n^3)$ time. So, the overall complexity of our algorithm to compute X is $O(n^3)$.
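The construction of B and G' can be sketched as below. The dag is assumed to be given as a successor map, and `eliminate_heavy_vertices` is a hypothetical name; the sketch favors clarity over the stated time bound.

```python
def eliminate_heavy_vertices(succs, delay):
    """Return (B, G') where B = {v : d(v) > 1} and G' bypasses every vertex of B."""
    B = {v for v in succs if delay[v] > 1}
    g = {u: set(vs) for u, vs in succs.items()}
    for v in B:  # the transformation of Figure 3, applied serially
        preds_v = [u for u in g if v in g[u]]
        for u in preds_v:
            g[u].discard(v)
            g[u] |= g[v]  # add the bypass edges <a, b> for every successor b of v
        del g[v]
    return B, g


# Hypothetical example: 1 -> 2 -> 3 and 2 -> 4, where only vertex 2 has delay > 1.
succs = {1: [2], 2: [3, 4], 3: [], 4: []}
delay = {1: 1, 2: 3, 3: 1, 4: 1}
B, g_prime = eliminate_heavy_vertices(succs, delay)
print(B, g_prime)
```

The remaining dag G' has unit delays only, so the set C can then be computed with the algorithm of Section 2.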
3.2. $\delta \ge 2$
To show that nonunit delay unit weight DVUP is NP-hard for $\delta \ge 2$, we first show the result for
$\delta = 2$ and then use this fact to show the result for $\delta > 2$. The proof for the case
$\delta = 2$ uses the known [PAIK90] NP-hard problem 2-3SAT which is defined below:
Input: A boolean function $F = C_1 C_2 \cdots C_m$ in n variables $x_1, x_2, \ldots, x_n$ such that each
clause has either two or three literals. If $|C_i| = 2$ then both literals in $C_i$ are either negated
or unnegated. If $|C_i| = 3$ then $C_i$ has at least one negated and one unnegated literal.
Output: "Yes" iff F is satisfiable (i.e., there is an assignment of truth values to
$x_1, x_2, \ldots, x_n$ such that all $C_i$ evaluate to true).
To establish the NP-hardness of unit weight DVUP with $\delta = 2$, we show how to construct, in
polynomial time, an instance $(G_F, 2)$ of this problem and an integer k such that there is an X that
satisfies:
(a) $d(G_F|X) \le 2$
(b) $|X| \le k$
iff formula F is satisfiable. This construction will make use of variable and clause subassemblies. These
are described below.
Variable Subassembly
Let $H_l$ be a chain of l vertices as in Figure 4(a). Each vertex on the chain has unit delay. The
schematic for $H_l$ is given in Figure 4(b). The chain source is s and its sink is t. The parallel
combination, $D_l$, of two chains of l unit delay vertices has $2l - 2$ vertices and is shown in
Figure 4(c). Its schematic is shown in Figure 4(d).
The variable subassembly, VS(i), for variable $x_i$ is $D_3$ with the source and sink labeled $x_i$ and
$\bar{x}_i$, respectively (Figure 5). Note that each vertex in VS(i) has unit delay. $x_i$ and $\bar{x}_i$
are the external vertices of VS(i). One easily sees that d(VS(i)) = 3. Hence, if $d(VS(i)|X) \le 2$, then
$|X| \ge 1$. Furthermore, if $|X| = 1$, then $X = \{x_i\}$ or $X = \{\bar{x}_i\}$.
Figure 4: Single and double chains. (a) l vertex chain with source s and sink t. (b) Schematic.
(c) Double chain with $2l - 2$ vertices. (d) Schematic.
Figure 5: Variable Subassembly. (a) VS(i). (b) Schematic.
Lemma 1: If $d(VS(i)|X) \le 2$, then $|X| \ge 1$. If $|X| = 1$ and $d(VS(i)|X) \le 2$, then
$X = \{x_i\}$ or $X = \{\bar{x}_i\}$ (i.e., the vertex in X is an external vertex). □
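Lemma 1 is small enough to confirm by enumeration. The sketch below assumes that $D_3$ consists of two three-vertex chains sharing their source and sink (four vertices in all, matching the $2l - 2$ count); the vertex names `s`, `m1`, `m2`, `t` are placeholders chosen here.

```python
def delay_after_upgrade(succs, upgraded):
    """d(G|X) for a unit delay dag given as a successor map."""
    memo = {}

    def longest_from(u):
        if u not in memo:
            own = 0 if u in upgraded else 1
            memo[u] = own + max((longest_from(v) for v in succs[u]), default=0)
        return memo[u]

    return max(longest_from(u) for u in succs)


# Assumed shape of D_3: s -> m1 -> t and s -> m2 -> t.
D3 = {"s": ["m1", "m2"], "m1": ["t"], "m2": ["t"], "t": []}

print(delay_after_upgrade(D3, set()))  # 3, so at least one upgrade is needed
singles = [v for v in D3 if delay_after_upgrade(D3, {v}) <= 2]
print(singles)  # only the two external vertices qualify
```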
Clause Subassemblies
Our construction of GF will employ different clause subassemblies for different types of clauses. The
subassembly employed will depend on the size of the clause as well as on how many negated literals it
contains. The different subassemblies are described below.
Let $S_0$ be the three vertex dag of Figure 6(a). Vertices a and c have unit delay while the delay of
vertex b is 2. The schematic for $S_0$ is shown in Figure 6(b). We see that $d(S_0) = 3$ and that if
$d(S_0|X) \le 2$, then $|X| \ge 1$. Furthermore, if $|X| = 1$, then $X \subset \{b, c\}$. Let $l_a$
($l_b$) be the longest delay on any path to a (b) when vertex set $X \subset \{b, c\}$ is upgraded. We see
that $(l_a, l_b) = (1, 2)$ when X = {c} and $(l_a, l_b) = (2, 1)$ when X = {b}.
Figure 6: Subgraph $S_0$. (a) $S_0$, with d(a) = d(c) = 1 and d(b) = 2. (b) Schematic.
Lemma 2: If $d(S_0|X) \le 2$, then $|X| \ge 1$. If $|X| = 1$ and $d(S_0|X) \le 2$, then
$X \subset \{b, c\}$ and $(l_a, l_b) = (1, 2)$ when X = {c} and $(l_a, l_b) = (2, 1)$ when X = {b}. □
The subgraph $S_1$ of Figure 7(a) is obtained by combining two instances of $S_0$ together via two
vertices $\alpha$ and $\beta$. The schematic is given in Figure 7(c). The subgraph $S_1^R$ is obtained from
$S_1$ by reversing the direction of all edges. Its delay properties are identical to those of $S_1$ and its
schematic is given in Figure 7(d). $\alpha$ and $\beta$ are called the external vertices of $S_1$ and
$S_1^R$.
We see that $d(S_1) = 4$. Furthermore, if $d(S_1|X) \le 2$, then from the construction of $S_0$ it follows
that X must contain at least one of $\{a_1, b_1, c_1\}$ and one of $\{a_2, b_2, c_2\}$. However, this is
not sufficient and $|X| \ge 3$. $|X| = 3$ is possible only if $(l_{a_1}, l_{b_1}) = (l_{a_2}, l_{b_2})$.
If this is so then $\beta$ may be added to X as the third vertex in case
$(l_{a_1}, l_{b_1}) = (l_{a_2}, l_{b_2}) = (1, 2)$ and $\alpha$ may be added in case
$(l_{a_1}, l_{b_1}) = (l_{a_2}, l_{b_2}) = (2, 1)$. However, if
$(l_{a_1}, l_{b_1}) \ne (l_{a_2}, l_{b_2})$ then both $\alpha$ and $\beta$ must be in X to ensure
$d(S_1|X) \le 2$. In this case $|X| > 3$.
Figure 7: Subgraph $S_1$. (a) $S_1$, with $d(b_1) = d(b_2) = 2$ and
$d(a_1) = d(a_2) = d(c_1) = d(c_2) = d(\alpha) = d(\beta) = 1$. (b) $S_1$ from two $S_0$'s.
(c) Schematic. (d) Schematic for $S_1^R$.
Lemma 3: If $d(S_1|X) \le 2$, then $|X| \ge 3$. If $|X| = 3$ and $d(S_1|X) \le 2$, then
$X = \{b_1, b_2, \alpha\}$ or $X = \{c_1, c_2, \beta\}$ (i.e., exactly one external vertex is in X). □
The clause subassembly for a two literal clause with no negated literals is simply $S_1$ and that for a
two literal clause with two negated literals is $S_1^R$. These are, respectively, denoted $CS_2(j)$ and
$CS_2^R(j)$ (Figure 8). The schematics are the same as for $S_1$ and $S_1^R$ except that the $\alpha$ and
$\beta$ vertices have been labeled by the literals of $C_j$ that they represent. The delay properties of
$CS_2(j)$ and $CS_2^R(j)$ are the same as those of $S_1$.
The clause subassembly, $CS_3(j)$, that we use for a three literal clause
$C_j = (x_{j_1} + x_{j_2} + \bar{x}_{j_3})$ with exactly one negated literal is given in Figure 9(a). The
vertex labeled $x_{j_1}$ is the $\alpha$ vertex of $S_1$ as well as the sink vertex of a $D_3$; vertex
$x_{j_2}$ is both the $\beta$ vertex of $S_1$ and the sink of a $D_3$; vertex $\bar{x}_{j_3}$ is the source
of both $D_3$ instances. Note that $x_{j_1}$ and $x_{j_2}$ are sink vertices while $\bar{x}_{j_3}$ is a
source. The external vertices of $CS_3(j)$ are $x_{j_1}$, $x_{j_2}$, and $\bar{x}_{j_3}$.
Figure 8: Clause subassemblies for two literal clauses. (a) $CS_2(j)$. (b) $CS_2^R(j)$.
Figure 9: Clause subassembly for a three literal clause with one negated literal. (a) $CS_3(j)$.
(b) Schematic.
Lemma 4:
(a) If $d(CS_3(j)|X) \le 2$, then $|X| \ge 4$.
(b) If $|X| = 4$ and $d(CS_3(j)|X) \le 2$, then $|X \cap \{x_{j_1}, x_{j_2}, \bar{x}_{j_3}\}| = 2$.
I.e., X contains at least two external vertices.
(c) For every $B \subset \{x_{j_1}, x_{j_2}, \bar{x}_{j_3}\}$, $|B| = 2$, there exists an X, $|X| = 4$,
such that $d(CS_3(j)|X) \le 2$.
Proof: From the construction it follows that if $d(CS_3(j)|X) \le 2$ then $d(D_3|X) \le 2$ for each
$D_3$ and $d(S_1|X) \le 2$. From Lemmas 1 and 3 it follows that X must contain at least one vertex from
each of the $D_3$'s and at least three from $S_1$. However, from Lemma 3, it also follows that if X
contains only three vertices from $S_1$ then it contains exactly one of $x_{j_1}$ and $x_{j_2}$. And so X
must contain a fourth vertex not in $S_1$ from one of the $D_3$'s to satisfy Lemma 1. Hence, $|X| \ge 4$.
For (b), we note that if all four vertices of X are from $S_1$, then $\{x_{j_1}, x_{j_2}\} \subset X$
since, to get $d(CS_3(j)|X) \le 2$, X must contain at least one vertex from each of the $D_3$'s. If only
three vertices of X are from $S_1$ then one of $x_{j_1}$ and $x_{j_2}$ must be in X (Lemma 3). If it is
$x_{j_1}$ ($x_{j_2}$) then from Lemma 1 it follows that $\bar{x}_{j_3}$ must be in X so that
$d(D_3|X) \le 2$ for the right (left) $D_3$ of $CS_3(j)$. (c) can be proved in a similar way. □
When a three literal clause $C_j = (x_{j_1} + \bar{x}_{j_2} + \bar{x}_{j_3})$ has exactly two negated
literals, we use the clause subassembly $CS_3^R(j)$ (Figure 10) which is obtained by reversing the direction
of all edges in $CS_3(j)$. The schematic $S_1^R$ denotes $S_1$ with all edges reversed. The delay properties
of $CS_3^R(j)$ are identical to those of $CS_3(j)$ and are stated in Lemma 5.
Figure 10: Clause subassembly for a three literal clause with two negated literals. (a) $CS_3^R(j)$.
(b) Schematic.
Lemma 5:
(a) If $d(CS_3^R(j)|X) \le 2$, then $|X| \ge 4$.
(b) If $|X| = 4$ and $d(CS_3^R(j)|X) \le 2$, then $|X \cap \{x_{j_1}, \bar{x}_{j_2}, \bar{x}_{j_3}\}| = 2$.
I.e., X contains at least two external vertices.
(c) For every $B \subset \{x_{j_1}, \bar{x}_{j_2}, \bar{x}_{j_3}\}$, $|B| = 2$, there exists an X,
$|X| = 4$, such that $d(CS_3^R(j)|X) \le 2$.
Proof: Same as for Lemma 4. □
For any n variable instance $F = C_1 C_2 \cdots C_m$ of 2-3SAT, we construct an instance $G_F$ of unit
weight DVUP by using n variable subassemblies (one for each variable $x_i$ in F) and m clause
subassemblies (one for each clause in F). The subassembly used for $C_j$ depends on its type as described
above. Vertex $x_i$ ($\bar{x}_i$) of variable subassembly VS(i) is connected to each of the vertices
labeled $x_i$ ($\bar{x}_i$) in the clause subassemblies using the double chain $D_3$. In the case of
connecting an $x_i$ of VS(i) to an $x_i$ of a clause, the source of $D_3$ is the same as the $x_i$ vertex
of VS(i) while the sink of $D_3$ is the same as the $x_i$ vertex of the clause. For an $\bar{x}_i$
connection, the $\bar{x}_i$ vertex of VS(i) is the sink of $D_3$ and the $\bar{x}_i$ vertex of the clause
is the source of $D_3$. Figure 11 gives $G_F$ for the case F = (xI+22+24) (x2+x3+24) (1+x3) (x2+x3).
Theorem 2: Let F be an n variable m clause instance of 2-3SAT and let $G_F$ be as constructed above. F is
satisfiable iff there exists a vertex set X such that $d(G_F|X) \le 2$ and $|X| = n + 4m - q$, where q is
the number of two literal clauses in F.
Proof: From Lemmas 1 to 5 we know that X must contain at least one vertex from each VS(i), at least three
from each $CS_2$ and $CS_2^R$, and at least four from each $CS_3$ and $CS_3^R$. Since the vertices of the
VS(i)'s, $CS_2(j)$'s, $CS_2^R(j)$'s, $CS_3(j)$'s, and $CS_3^R(j)$'s are disjoint, $|X| \ge n + 4m - q$.
If F is satisfiable then an X satisfying the theorem may be constructed in the following way. Let
$x_i = b_i$, $1 \le i \le n$, be a set of truth values that satisfy F.
(a) Vertex $x_i$ of VS(i) $\in X$ iff $b_i$ = true.
(b) Vertex $\bar{x}_i$ of VS(i) $\in X$ iff $b_i$ = false.
(c) If $C_j$ is a three (two) literal clause, then from its three (two) external vertices eliminate one that
corresponds to a true literal (such a vertex must exist as F is satisfied by the selected truth
assignment). The remaining two (one) external vertices plus two more (Lemmas 3 and 4) as needed to
make the delay of the corresponding clause subassembly $\le 2$ are in X.
One may verify that for X constructed as above, $d(G_F|X) = 2$ and $|X| = n + 4m - q$.
Figure 11: $G_F$ for F = (xI+x2+x4) (x1+x3+x4) (x+x2+x3).
Next, suppose that X is such that $|X| = n + 4m - q$ and $d(G_F|X) \le 2$. For this, X must contain the
minimum number of vertices needed from each variable and clause subassembly to make the delay of that
subassembly $\le 2$. Hence, X contains exactly one vertex from each variable subassembly, exactly three
from each two literal clause subassembly, and exactly four from each three literal clause subassembly.
From Lemmas 1 to 5 we see that the vertex from each variable subassembly is an external vertex, exactly
one of the vertices in X from each two literal clause subassembly and exactly two from each three literal
clause subassembly must be external vertices. If the $x_i$ vertex of VS(i) is in X, set $x_i$ to true,
otherwise set $x_i$ to false, $1 \le i \le n$. Under this truth assignment, each clause of F must be true.
To see this, note that from the preceding discussion it follows that one of the external vertices of each
clause subassembly is not in X. If this is a sink (source) of a $D_3$ connector then the corresponding
source (sink) must be in X, otherwise $d(D_3|X) = 3$. Hence, this literal is true. Note that for each $D_3$
of $G_F$ one of the source and sink vertices is an external vertex of a variable subassembly and the other
is an external vertex of a clause subassembly. □
The NP-hardness of nonunit delay unit weight DVUP for $\delta = 2$ now follows from Theorem 2, the
fact that $G_F$ can be constructed from F in polynomial time, and the fact that 2-3SAT is NP-hard.
Theorem 3: Nonunit delay unit weight DVUP is NP-hard for every $\delta$, $\delta \ge 2$.
Proof: For $\delta = 2$, the theorem has been proved above. For $\delta = q$, $q > 2$, let G be an instance
of nonunit delay unit weight DVUP with $\delta = 2$. Let G' be obtained from G by attaching to each vertex
of G a chain $H_{q-2}$ of $q - 2$ vertices (see Figure 12).
Figure 12: Attaching chains to vertices of G. (a) A node in a graph. (b) After attaching $H_{q-2}$.
Let X and X', respectively, be minimum vertex sets such that $d(G|X) \le 2$ and $d(G'|X') \le q$. We shall
show that $|X| = |X'|$. Since $d(G|X) \le 2$, $d(G'|X) \le q$. Hence, $|X'| \le |X|$. If X' contains a
vertex w that is not in G then w must be on some $H_{q-2}$. Let v be the vertex of G to which the
$H_{q-2}$ that contains w was attached in the construction of G'. Let $X'' = X' - \{w\} + \{v\}$. It is
easy to see that $d(G'|X'') \le q$ and $|X''| \le |X'|$. However, since X' is a minimum set such that
$d(G'|X') \le q$, $|X''| = |X'|$. In this way we can transform X' to X* such that $d(G'|X^*) \le q$,
$|X'| = |X^*|$, and X* consists solely of vertices of G. So, $d(G|X^*) \le 2$. Hence,
$|X'| = |X^*| \ge |X|$. Consequently, $|X'| = |X|$. From this, the observation that G' can be constructed
from G in polynomial time, and the fact that unit weight DVUP with $\delta = 2$ is NP-hard, it follows that
the unit weight DVUP with $\delta = q$ for any $q \ge 2$ is NP-hard. □
Note that since the construction used by us generates a multistage graph, DVUP is NP-hard even
when the dags are restricted to be multistage graphs.
4. Heuristics
We formulate five heuristics to obtain a low weight set X such that $d(G|X) \le \delta$. All of these
upgrade one vertex at a time and terminate as soon as the delay of the upgraded graph becomes
$\le \delta$. The heuristics differ only in how they select the next vertex to upgrade. The general form
of each of the heuristics is given in Figure 13. In the remaining subsections of this section, we describe
the criteria used to select the next vertex to upgrade.
X := $\emptyset$; { X is the set of vertices to upgrade }
while $d(G|X) > \delta$ do
begin
   Select the next vertex, v, to upgrade;
   X := $X \cup \{v\}$;
end;
Figure 13: General form of heuristics.
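The loop of Figure 13 translates directly into the following sketch, in which the selection rule is passed in as a function. The rule `cheapest_first` is only a placeholder used to exercise the loop; it is not one of the heuristics of this paper, and the example dag is hypothetical.

```python
def circuit_delay(preds, delay, X):
    """d(G|X) for a dag given as a predecessor map."""
    memo = {}

    def longest_to(v):
        if v not in memo:
            own = 0 if v in X else delay[v]
            memo[v] = own + max((longest_to(u) for u in preds[v]), default=0)
        return memo[v]

    return max(longest_to(v) for v in preds)


def greedy_upgrade(preds, delay, weight, delta, select):
    """General form of the heuristics: upgrade one vertex at a time."""
    X = set()
    while circuit_delay(preds, delay, X) > delta:
        v = select(preds, delay, weight, delta, X)  # must return a vertex not in X
        X.add(v)
    return X


def cheapest_first(preds, delay, weight, delta, X):
    """Placeholder selection rule (an assumption): the cheapest remaining vertex."""
    return min((v for v in preds if v not in X), key=lambda v: weight[v])


# Hypothetical dag: 1 -> 2 -> 4 and 1 -> 3 -> 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
weight = {1: 4, 2: 3, 3: 1, 4: 2}
print(greedy_upgrade(preds, delay, weight, 6, cheapest_first))
```

With this naive rule three vertices are upgraded even though upgrading vertex 2 alone would suffice, which shows why the choice of selection criterion matters.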
4.1. Heuristic 1 (h1)
For each vertex $v \in V - X$ define N(v) to be the number of edges incident to/from v that are on paths
with delay $> \delta$. I.e.,
$N(v) = |\{ \langle i, v \rangle \in E \mid f(i) + g(v) > \delta \}| + |\{ \langle v, j \rangle \in E \mid f(v) + g(j) > \delta \}|$
where f(i) is the delay of the maximum delay path in $G|X$ that ends at vertex i and g(j) is the delay of
the maximum delay path in $G|X$ that begins at vertex j. The vertex, v, with the largest value of
N(v)/w(v) is the next vertex to upgrade.
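A sketch of the h1 score follows. It computes f and g by memoized recursion over a successor map and then counts, for each vertex not in X, the incident edges that lie on a path of delay $> \delta$. The helper names and the example dag are assumptions made for illustration.

```python
def path_delays(succs, delay, X):
    """Return (f, g, preds): f[v] = max delay of a path ending at v in G|X,
    g[v] = max delay of a path beginning at v in G|X."""
    preds = {v: [] for v in succs}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    d = {v: 0 if v in X else delay[v] for v in succs}
    f, g = {}, {}

    def fwd(v):
        if v not in f:
            f[v] = d[v] + max((fwd(u) for u in preds[v]), default=0)
        return f[v]

    def bwd(v):
        if v not in g:
            g[v] = d[v] + max((bwd(w) for w in succs[v]), default=0)
        return g[v]

    for v in succs:
        fwd(v)
        bwd(v)
    return f, g, preds


def h1_scores(succs, delay, weight, delta, X):
    """N(v) / w(v) for every vertex v not in X."""
    f, g, preds = path_delays(succs, delay, X)
    scores = {}
    for v in succs:
        if v in X:
            continue
        n_in = sum(1 for i in preds[v] if f[i] + g[v] > delta)
        n_out = sum(1 for j in succs[v] if f[v] + g[j] > delta)
        scores[v] = (n_in + n_out) / weight[v]
    return scores


# Hypothetical dag: 1 -> 2 -> 4 and 1 -> 3 -> 4, with delta = 6.
succs = {1: [2, 3], 2: [4], 3: [4], 4: []}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
weight = {1: 4, 2: 3, 3: 1, 4: 2}
scores = h1_scores(succs, delay, weight, 6, set())
print(max(scores, key=scores.get))  # vertex 2 has both of its edges on the long path
```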
4.2. Heuristic 2 (h2)
For each vertex $v \in V - X$ let W(v) be the weight of the vertices in $G|(X \cup \{v\})$ that are on
paths of delay $> \delta$. I.e., $W(v) = \sum_{y \in S(v)} w(y)$ where
$S(v) = \{ y \mid y \text{ is on a path of delay} > \delta \text{ in } G|(X \cup \{v\}),\ y \notin X,\ y \ne v \}$.
For the next vertex to upgrade, select v such that W(v) is minimum.
4.3. Heuristic 3 (h3)
For each vertex $v \in V - X$ let D(v) be the reduction in $d(G|X)$ that results from changing X to
$X \cup \{v\}$. I.e., $D(v) = d(G|X) - d(G|(X \cup \{v\}))$. For the next vertex to upgrade, select v so
that D(v)/w(v) is maximum.
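The h3 criterion can be sketched by recomputing the circuit delay once per candidate vertex. This is a straightforward illustration under the stated definitions, with hypothetical names and a hypothetical dag; ties are broken by whatever order `max` encounters first rather than by the rule of Section 4.6.

```python
def circuit_delay(preds, delay, X):
    """d(G|X) for a dag given as a predecessor map."""
    memo = {}

    def longest_to(v):
        if v not in memo:
            own = 0 if v in X else delay[v]
            memo[v] = own + max((longest_to(u) for u in preds[v]), default=0)
        return memo[v]

    return max(longest_to(v) for v in preds)


def h3_select(preds, delay, weight, X):
    """Pick the vertex maximizing D(v) / w(v), D(v) = d(G|X) - d(G|(X u {v}))."""
    base = circuit_delay(preds, delay, X)

    def score(v):
        return (base - circuit_delay(preds, delay, X | {v})) / weight[v]

    return max((v for v in preds if v not in X), key=score)


# Hypothetical dag: 1 -> 2 -> 4 and 1 -> 3 -> 4.
preds = {1: [], 2: [1], 3: [1], 4: [2, 3]}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
weight = {1: 4, 2: 3, 3: 1, 4: 2}
print(h3_select(preds, delay, weight, set()))  # 4: D(4)/w(4) = 3/2 beats D(2)/w(2) = 4/3
```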
4.4. Heuristic 4 (h4)
Let $S = \{s_1, s_2, \ldots, s_p\}$ and $T = \{t_1, t_2, \ldots, t_q\}$ be the sets of source and sink
vertices of G, respectively. Let f(u) and g(u), respectively, denote the maximum delay of any path in
$G|X$ from a source to u and from u to a sink. Define $E(G|X)$ as below:
$$E(G|X) = \sum_{\substack{u \in S \\ g(u) > \delta}} (g(u) - \delta) + \sum_{\substack{u \in T \\ f(u) > \delta}} (f(u) - \delta)$$
E measures (approximately) how far $G|X$ is from having no source to sink path with delay $> \delta$. For
each vertex $v \in V - X$ we may define $E(G|(X \cup \{v\}))$ in a similar manner. By upgrading vertex v we
have come "closer" to the desired state by the amount:
$$\Delta(v) = E(G|X) - E(G|(X \cup \{v\})).$$
The cost of doing this is w(v). For the next vertex v, we pick one that maximizes $\Delta(v)/w(v)$. Note
that when G has a single source and a single sink, this selection criterion is the same as that used in
heuristic 3.
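The measure E and the resulting h4 selection can be sketched as follows, again over a successor map and with hypothetical function names. On the single-source, single-sink example used here the selected vertex agrees with the one h3 would pick, as the remark above predicts.

```python
def path_delays(succs, delay, X):
    """f[v]: max delay of a path ending at v; g[v]: max delay of a path starting at v."""
    preds = {v: [] for v in succs}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    d = {v: 0 if v in X else delay[v] for v in succs}
    f, g = {}, {}

    def fwd(v):
        if v not in f:
            f[v] = d[v] + max((fwd(u) for u in preds[v]), default=0)
        return f[v]

    def bwd(v):
        if v not in g:
            g[v] = d[v] + max((bwd(w) for w in succs[v]), default=0)
        return g[v]

    for v in succs:
        fwd(v)
        bwd(v)
    return f, g


def E_h4(succs, delay, delta, X):
    """Excess delay summed over sources (using g) and sinks (using f)."""
    f, g = path_delays(succs, delay, X)
    has_pred = {v for vs in succs.values() for v in vs}
    sources = [u for u in succs if u not in has_pred]
    sinks = [u for u in succs if not succs[u]]
    return (sum(g[u] - delta for u in sources if g[u] > delta)
            + sum(f[u] - delta for u in sinks if f[u] > delta))


def h4_select(succs, delay, weight, delta, X):
    """Pick the vertex maximizing Delta(v) / w(v)."""
    base = E_h4(succs, delay, delta, X)
    return max(
        (v for v in succs if v not in X),
        key=lambda v: (base - E_h4(succs, delay, delta, X | {v})) / weight[v],
    )


# Hypothetical dag: 1 -> 2 -> 4 and 1 -> 3 -> 4, with delta = 6.
succs = {1: [2, 3], 2: [4], 3: [4], 4: []}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
weight = {1: 4, 2: 3, 3: 1, 4: 2}
print(E_h4(succs, delay, 6, set()), h4_select(succs, delay, weight, 6, set()))
```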
4.5. Heuristic 5 (h5)
This is similar to heuristic 4 except that the definition of E is changed to include all vertices not in
X that are on paths with delay $> \delta$. The new definition is:
$$E(G|X) = \sum_{\substack{u \in V - X \\ g(u) + f(u) - d(u) > \delta}} (g(u) + f(u) - d(u) - \delta)$$
A variation of this heuristic is to do the summation over all vertices $u \in V$ (rather than
$u \in V - X$). It was experimentally determined that this does not perform as well.
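For comparison with heuristic 4, the modified measure can be sketched as below. Since f(u) and g(u) both include d(u), the quantity $g(u) + f(u) - d(u)$ is the delay of the longest path through u. The function names and the example dag are assumptions.

```python
def path_delays(succs, delay, X):
    """f[v]: max delay of a path ending at v; g[v]: max delay of a path starting at v."""
    preds = {v: [] for v in succs}
    for u, vs in succs.items():
        for v in vs:
            preds[v].append(u)
    d = {v: 0 if v in X else delay[v] for v in succs}
    f, g = {}, {}

    def fwd(v):
        if v not in f:
            f[v] = d[v] + max((fwd(u) for u in preds[v]), default=0)
        return f[v]

    def bwd(v):
        if v not in g:
            g[v] = d[v] + max((bwd(w) for w in succs[v]), default=0)
        return g[v]

    for v in succs:
        fwd(v)
        bwd(v)
    return f, g


def E_h5(succs, delay, delta, X):
    """Sum, over vertices not in X, of the excess delay of the longest path through them."""
    f, g = path_delays(succs, delay, X)
    total = 0
    for u in succs:
        if u in X:
            continue
        through = f[u] + g[u] - delay[u]  # u is not upgraded, so d(u) is its original delay
        if through > delta:
            total += through - delta
    return total


# Hypothetical dag: 1 -> 2 -> 4 and 1 -> 3 -> 4, with delta = 6.
succs = {1: [2, 3], 2: [4], 3: [4], 4: []}
delay = {1: 2, 2: 5, 3: 1, 4: 3}
print(E_h5(succs, delay, 6, set()), E_h5(succs, delay, 6, {4}), E_h5(succs, delay, 6, {2}))
```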
4.6. Tie Breaker Rule
In case any of the selection criteria results in a tie, the tie is broken by computing min {f(u), g(u)} for
all the tied vertices (f and g are defined with respect to $G|X$ as in Section 4.1). The tied vertex for
which this is maximum is selected (any remaining ties are broken arbitrarily).
4.7. Complexity
For a dag G with n vertices and e edges, $e \ge n$, the time required to select the next vertex using the
criterion of heuristics h1 through h5 is O(e) for h1 and O(ne) for h2 through h5.
5. Experimental Results
The heuristics of the preceding section were programmed in Pascal and evaluated on an Apollo DN3500
workstation. The test circuits used were derived from ISCAS benchmark combinational and sequential
circuits [BRGL85, BRGL89]. The sequential circuits were transformed into directed acyclic graphs as in
[LEE90]. The characteristics of the two circuit sets are given in Tables 1 and 2, respectively. For the
sequential circuits, vertex delays in the range 1 to 10 were assigned by us. For the combinational circuits,
the vertex delays were obtained by taking the maximum delay on the incoming edges to each vertex.
For each circuit, G, we determined the solution obtained by each of the heuristics for
$\delta$ = .9d(G), .8d(G), .7d(G), .6d(G), .5d(G). The sizes of the vertex upgrade sets for two of the
circuits (s38417, c1908) are given in Tables 3 and 4. These also give the heuristic run time in seconds.
Table 5 gives the total number of vertices upgraded for all 23 circuits (for each circuit this is summed
over the five $\delta$ values used).
As can be seen from Table 5, heuristics 4 and 5 consistently provide good solutions (on every sequential
circuit they upgrade no more vertices than h1, h2, and h3), and Tables 3 and 4 show that their run time is
competitive with that of h2 and h3.
6. Conclusions
We have shown that the minimum cost upgrading problem is NP-hard for multistage dags with unit vertex
weights when $\delta \ge 2$. Some other versions of the DVUP problem have been shown to be polynomially
solvable. Five heuristics for general dags were proposed and evaluated experimentally. Two of these were
determined to be superior to the others.
7. References
[BRGL85] F. Brglez and H. Fujiwara, "A Neutral Netlist of Ten Combinational Benchmark Circuits
and a Target Translator in FORTRAN", Proc. IEEE Symp. on Circuits & Systems, June
1985, pp. 663-666.
[BRGL89] F. Brglez, D. Bryan, and K. Kozminski, "Combinational Profiles of Sequential Benchmark
Circuits", Proc. of Intern. Symp. on Circuit & Systems, May 1989, pp. 1929-1934.
[CHAN90] P. K. Chan, "Algorithms For Library-Specific Sizing Of Combinational Logic", Proc.
27th DAC Conf, 1990, pp. 353-356.
[GARE79] M. R. Garey and D. S. Johnson, "Computers and Intractability", W. H. Freeman and
Company, San Francisco, 1979.
[GAVR87] F. Gavril, "Algorithms For Maximum k-colorings And k-coverings Of Transitive Graphs",
Networks, Vol. 17, pp. 465-470, 1987.
[GHAN87] S. Ghanta, H. C. Yen, and H. C. Du, "Timing Analysis Algorithms For Large Designs",
University of Minnesota, Technical Report, 8757, 1987.
[LEE90] D. H. Lee and S. M. Reddy, "On Determining Scan Flip-flops In Partial-scan Designs",
Proc. of International Conference on Computer Aided Design, November 1990.
[MCGE90] P. McGeer, R. Brayton, R. Rudell, and A. Sangiovanni-Vincentelli, "Extended Stuck-fault
Testability For Combinational Networks", Proc. of the 6th MIT Conference on Advanced
Research in VLSI, MIT Press, April 1990.
[PAIK90] D. Paik, S. Reddy, and S. Sahni, "Vertex Splitting In Dags And Applications To Partial
Scan Designs And Lossy Circuits", University of Florida, Technical Report, 9034, 1990.
[PAIK91] D. Paik, S. Reddy, and S. Sahni, "Deleting Vertices To Bound Path Lengths", University
of Florida, Technical Report, 914, 1990.
circuit # vertices # edges d(G)
s400 173 282 116
s420 37 130 89
s526 27 98 79
s526n 27 98 79
s838 69 266 216
s1423 74 917 160
s5378 233 1314 132
s9234 216 1633 184
s13207 762 3083 214
s15850 608 8562 348
s35932 1777 3380 214
s38417 1396 8754 192
s38584 1448 9471 723
Table 1: Circuit characteristics of modified sequential circuits
circuit # vertices # edges d(G)
c432 250 426 57.40
c499 555 928 53.30
c880 443 729 53.00
c1355 587 1064 49.90
c1908 913 1498 76.59
c2670 1426 2076 86.87
c3540 1719 2939 98.69
c5315 2485 4386 99.30
c6288 2448 4800 319.88
c7552 3719 6144 85.30
Table 2: Circuit characteristics (with max of falling and rising delay) of ISCAS combinational circuits.
              # nodes upgraded             run time (sec)
$\delta$     h1   h2   h3   h4   h5     h1    h2    h3    h4    h5
.9d(G)       24    3    3    3    3      4    54    54    55    55
.8d(G)      111    7   10    7    7     18   200   244   232   205
.7d(G)      174   16   23   15   14     30   580   811   715   619
.6d(G)      218   35   42   27   29     41  1977  2641  1684  1591
.5d(G)      268   62  104   49   49     51  3822  7958  4604  7570
Table 3: Results for s38417
              # nodes upgraded             run time (sec)
$\delta$     h1   h2   h3   h4   h5     h1    h2    h3    h4    h5
.9d(G)       12    3   17    3    3     <0    20   114    21    24
.8d(G)       24   19   56   19   15      1   166   632   186   154
.7d(G)       48   67   83   42   39      2   803  1232   688   609
.6d(G)      104  123  124   90   76      5  1927  1947  1656  1367
.5d(G)      160  180  148  127  103      9  3159  2984  2480  2114
Table 4: Results for c1908
circuit h1 h2 h3 h4 h5
s400 46 33 57 25 26
s420 34 23 16 16 16
s526 27 16 16 16 16
s526n 27 16 16 16 16
s838 65 62 38 38 38
s1423 46 32 31 27 27
s5378 172 26 51 24 24
s9234 81 37 73 34 34
s13207 128 106 170 81 81
s15850 256 143 217 104 106
s35932 636 527 658 393 393
s38417 795 123 182 101 100
s38584 613 373 380 231 229
c432 28 19 204 22 42
c499 138 257 415 274 173
c880 118 126 268 76 86
c1355 162 303 431 306 204
c1908 348 392 428 281 236
c2670 300 126 1343 71 104
c3540 859 710 1299 356 390
c5315 521 328 995 212 223
c6288 1640 1189 3198 644 1029
c7552 828 564 2577 397 467
Table 5: Total number of nodes upgraded
