Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Vertex splitting in dags and applications to partial scan designs and lossy circuits
Permanent Link: http://ufdc.ufl.edu/UF00095051/00001
 Material Information
Title: Vertex splitting in dags and applications to partial scan designs and lossy circuits
Series Title: Department of Computer and Information Science and Engineering Technical Reports
Physical Description: Book
Language: English
Creator: Paik, Doowon
Reddy, Sudhakar
Sahni, Sartaj
Affiliation: University of Florida
University of Iowa
University of Florida
Publisher: Department of Computer and Information Sciences, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 1990
 Record Information
Bibliographic ID: UF00095051
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.







Vertex Splitting In Dags And Applications

To Partial Scan Designs And Lossy Circuits


Doowon Paik+

University of Florida


Sudhakar Reddy++

University of Iowa


Sartaj Sahni+

University of Florida


Abstract


Directed acyclic graphs (dags) are often used to model circuits. Path lengths in such dags represent circuit delays. In the vertex splitting problem, the objective is to determine a minimum number of vertices to split so that the resulting dag has no path of length greater than $\delta$. This problem has application to the placement of flip-flops in partial scan designs, placement of latches in pipelined circuits, placement of signal boosters in lossy circuits and networks, etc. Several simplified versions of this problem are shown to be NP-hard. A linear time algorithm is obtained for the case when the dag is a tree. A backtracking algorithm and heuristics are developed for general dags, and experimental results using dags obtained from ISCAS benchmark circuits are reported.







KEYWORDS and PHRASES

Partial-scan designs, flip-flop selection, sequential circuits, lossy circuits and networks, pipelined circuits,

NP-hard







+ Research supported, in part, by the National Science Foundation under grants DCR-84-20935 and MIPS-86-17374.
++ Research supported, in part, by the SDIO/IST Contract No. N00014-90-J-1793 managed by US Office of Naval
Research.









1. Introduction


In order to achieve high fault coverage in sequential circuits they are often designed to be easily testable.

The current method of choice is the scan-design. In test mode all flip-flops in a sequential circuit, using

scan-design, are connected into one or more shift registers. This allows one to set the contents of the flip-

flops to the desired state as well as to observe the states of the flip-flops. As the complexity of logic cir-

cuits grows, the overhead for full scan-designs may become unacceptable. For such situations, partial-

scan designs have been proposed. In partial-scan designs only a selected subset of the flip-flops in a

sequential circuit are included in the scan-path. Several methods to choose the flip-flops to be included in

the scan-path have been proposed [CHEN90], [GUPT90], [LEE90]. One of these proposals gives a

method to use the structural information in a sequential circuit to determine the flip-flops to be placed in

a scan-path [CHEN90]. We briefly discuss this method.


A sequential circuit is represented by a directed graph (digraph) called an S-graph. Each flip-flop in a sequential circuit is represented by a node in the S-graph. A directed edge exists in the S-graph from node i to node j if the state of the flip-flop represented by node j depends on the state of the flip-flop represented by node i (that is, there is a path, through combinational logic, in the circuit from the output of flip-flop i to the input of flip-flop j). Figure 1 is an example of an S-graph. Empirical evidence suggests that the existence of cycles and the maximum path length between nodes of the S-graph increase the complexity of deriving tests for sequential circuits. It was therefore suggested in [CHEN90] to include a minimum subset of flip-flops in a scan-path such that the resulting S-graph is cycle-free and the maximum distance between a pair of nodes is small. There are several cycles in the S-graph of Figure 1. If the flip-flop corresponding to node 2 is included in the scan-path then one replaces node 2 with a sink node $2'$ and a source node $2^o$ as shown in Figure 2. This transformation corresponds to the fact that the contents of flip-flops in a scan path can be set and observed in test mode. Notice that the S-graph of Figure 2 is cycle free. The maximum distance between node $2^o$ and $2'$ is six. If the flip-flop corresponding to node 5 is also included in the scan-path then the S-graph of Figure 3 is obtained. In this S-graph the maximum distance between any pair of nodes is less than or equal to 3.


























Figure 1: An example S-graph.


Figure 2: An acyclic S-graph for Figure 1.


Figure 3: An S-graph with maximum distance 3.









Two-step methods to select the flip-flops to be scanned were proposed in [CHEN90], [GUPT90], and [LEE90]. In the first step a minimal subset of flip-flops is selected to be included in the scan-path such that the resulting S-graph is acyclic. In the second step additional flip-flops are selected to be included in the scan path such that in the resulting S-graph the maximum distance between any pair of nodes is less than or equal to a specified number $\delta$. This second step can be modeled as a vertex splitting problem on directed acyclic graphs (dags). In this paper we study solutions to the problem of finding a minimum number of nodes, in a dag, to be split such that the maximum distance between any two nodes in the resulting digraph is less than or equal to a pre-specified value $\delta$. The dags we consider are more general than the ones that arise from S-graphs. We permit each edge in the dag to have a positive integral weight instead of requiring all edges to have unit weight. This generalization can be shown to have application in the placement of latches in pipelined circuits and in the placement of signal boosters in lossy circuits. In Section 2, we introduce the terminology we shall use in the remainder of this paper. The NP-hardness results are developed in Section 3 and the linear time algorithm for tree dags is given in Section 4. A backtracking algorithm and heuristics for the dag vertex splitting problem are proposed in Sections 5 and 6, respectively. Section 7 reports on experiments with the ISCAS benchmark circuits. It should be noted that a quadratic time algorithm for series-parallel dags is easily derived from the quadratic time dag vertex deletion algorithm of [PAIK90].



2. Terminology


Let $G = (V,E,w)$ be a weighted directed acyclic graph (wdag) with vertex set $V$, edge set $E$, and edge weighting function $w$. $w(i,j)$ is the weight of the edge $\langle i,j \rangle \in E$. $w(i,j)$ is a positive integer for $\langle i,j \rangle \in E$ and $w(i,j)$ is undefined if $\langle i,j \rangle \notin E$. A source vertex is a vertex with zero in-degree while a sink vertex is a vertex with zero out-degree. The delay, $d(P)$, of the path $P$ is the sum of the weights of the edges on that path. The delay, $d(G)$, of the graph $G$ is the maximum path delay in the graph, i.e.,

$$d(G) = \max_{P \text{ in } G} \{ d(P) \}$$
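The delay d(G) defined above can be computed in time linear in the size of the dag by relaxing edges in topological order. The following is a minimal sketch under stated assumptions, not code from the report: the edge-map representation and the name `dag_delay` are illustrative choices, and the example graph is hypothetical.

```python
from collections import defaultdict, deque


def dag_delay(edges):
    """Return d(G), the maximum path delay of a weighted dag.

    `edges` maps (i, j) -> w(i, j), with positive integer weights.
    (This representation is an assumption made for illustration.)
    """
    succ = defaultdict(list)
    indeg = defaultdict(int)
    vertices = set()
    for (i, j), w in edges.items():
        succ[i].append((j, w))
        indeg[j] += 1
        vertices.update((i, j))

    # Kahn's algorithm: visit vertices in topological order.
    ready = deque(v for v in vertices if indeg[v] == 0)
    longest = {v: 0 for v in vertices}  # max delay of a path ending at v
    seen = 0
    while ready:
        u = ready.popleft()
        seen += 1
        for v, w in succ[u]:
            longest[v] = max(longest[v], longest[u] + w)
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    if seen != len(vertices):
        raise ValueError("graph has a cycle, so it is not a dag")
    return max(longest.values(), default=0)


# Hypothetical example: the longest path is a -> b -> c -> d.
example = {("a", "b"): 2, ("b", "c"): 3, ("a", "c"): 4, ("c", "d"): 1}
print(dag_delay(example))  # 6
```

Each vertex and edge is handled once, so the sketch runs in O(n + e) time for a dag with n vertices and e edges.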











Let $G/X$ be the wdag that results when each vertex $v$ in $X$ is split into two vertices $v'$ and $v^o$ such that all edges $\langle v,j \rangle \in E$ are replaced by edges of the form $\langle v^o,j \rangle$ and all edges $\langle i,v \rangle \in E$ are replaced by edges of the form $\langle i,v' \rangle$. I.e., outbound edges of $v$ now leave vertex $v^o$ while the inbound edges of $v$ now enter vertex $v'$. Figure 3 shows the result, $G/X$, of splitting the vertex 5 of the dag of Figure 2. The dag vertex splitting problem (DVSP) is to find a least cardinality vertex set $X$ such that $d(G/X) \le \delta$, where $\delta$ is a prespecified delay. For the dag of Figure 2 and $\delta = 3$, $X = \{5\}$ is a solution to the DVSP.



Lemma 1: Let $G = (V,E,w)$ be a weighted dag and let $\delta$ be a prespecified delay value. Let $MaxEdgeDelay = \max_{\langle i,j \rangle \in E} \{ w(i,j) \}$. Then the DVSP has a solution iff $\delta \ge MaxEdgeDelay$.

Proof: Vertex splitting does not eliminate any edges. So, there is no $X$ such that $d(G/X) < MaxEdgeDelay$. Further, $d(G/V) = MaxEdgeDelay$. So, for every $\delta \ge MaxEdgeDelay$, there is a least cardinality set $X$ such that $d(G/X) \le \delta$. □
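The splitting operation G/X and the bound of Lemma 1 can be illustrated with a short sketch. The names `split` and `delay`, the tuple encoding of the two copies of a split vertex, and the example chain are all assumptions made for illustration; they do not come from the report.

```python
def split(edges, X):
    """Return the edge map of G/X.

    Each v in X becomes (v, "in"), which keeps the inbound edges,
    and (v, "out"), which keeps the outbound edges.
    """
    result = {}
    for (i, j), w in edges.items():
        tail = (i, "out") if i in X else i
        head = (j, "in") if j in X else j
        result[(tail, head)] = w
    return result


def delay(edges):
    """Maximum path delay d(G), by memoised depth-first search."""
    succ = {}
    for (i, j), w in edges.items():
        succ.setdefault(i, []).append((j, w))
    memo = {}

    def longest_from(u):
        if u not in memo:
            memo[u] = max(
                (w + longest_from(v) for v, w in succ.get(u, [])), default=0
            )
        return memo[u]

    vertices = {v for edge in edges for v in edge}
    return max((longest_from(v) for v in vertices), default=0)


# Hypothetical chain a -> b -> c -> d with weights 2, 3, 1.
g = {("a", "b"): 2, ("b", "c"): 3, ("c", "d"): 1}
print(delay(g))                      # 6
print(delay(split(g, {"b", "c"})))   # 3
# Lemma 1: splitting every vertex leaves only single-edge paths.
print(delay(split(g, set("abcd"))) == max(g.values()))  # True
```

For this chain and a delay bound of 3, splitting b alone leaves a path of delay 4, so both b and c must be split.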



3. Complexity Results


If $w(i,j) = 1$ for every edge in the wdag, then the edge weighting function $w$ is said to be a unit weighting function and we say that $G$ has unit weights. In this section we show that the following problems are NP-hard.

1. DVSP for unit weight graphs with $\delta \ge 2$.

2. DVSP for unit weight multistage graphs with $\delta \ge 4$ (in a multistage graph the vertices are divided into an ordered set of stages and each edge goes from a vertex in one stage to one in the next stage).

Since unit weight wdags are just a special case of general wdags, the results obtained imply the NP-hardness of the corresponding problems with the unit weight constraint removed.









3.1. Unit Weight DVSP


We shall show that the known NP-complete problem 3SAT can be solved in polynomial time if the unit weight DVSP with $\delta \ge 2$ can.



3SAT Problem [GARE79]


Input: A boolean function $F = C_1 C_2 \cdots C_m$ in n variables $x_1, x_2, \ldots, x_n$. Each clause $C_i$ is the disjunction of exactly three literals.

Output: "Yes" if there is a binary assignment for the n variables such that F = 1. "No" otherwise.

For each instance F of 3SAT, we construct an instance $G_F$ of the unit weight DVSP such that from the size of the solution to $G_F$ we can determine, in polynomial time, the answer to the 3SAT problem for F. This construction employs two unit weight dag subassemblies: the variable subassembly and the clause subassembly.


Variable Subassembly


Figure 4(a) shows a chain with $\delta - 1$ vertices. This chain is represented by the schematic of Figure 4(b). The variable subassembly, VS(i), for variable $x_i$ is given in Figure 4(c). This is obtained by combining together three copies of the chain $H_{\delta-1}$ with another chain that has three vertices. Thus, the total number of vertices in the variable subassembly VS(i) is $3\delta$. Note that $d(VS(i)) = \delta + 1$. Also, note that if $d(VS(i)/X) \le \delta$, then $|X| \ge 1$. The only X for which $|X| = 1$ and $d(VS(i)/X) \le \delta$ are $X = \{ x_i \}$ and $X = \{ \bar{x}_i \}$. Figure 4(d) shows the schematic for VS(i).


Clause Subassembly


The clause subassembly CS(j) is obtained by connecting together four $\delta - 1$ vertex chains with another three vertex subgraph as shown in Figure 5(a). The schematic for CS(j) is given in Figure 5(b). The number of vertices in CS(j) is $4\delta - 1$ and $d(CS(j)) = 2\delta$. One may easily verify that if $|X| = 1$, then $d(CS(j)/X) > \delta$. So, if $d(CS(j)/X) \le \delta$, then $|X| > 1$. Since $\delta \ge 2$, the only X with $|X| = 2$ for which $d(CS(j)/X) \le \delta$ are such that $X \subset \{l_{j1}, l_{j2}, l_{j3}\}$. Furthermore, every $X \subset \{l_{j1}, l_{j2}, l_{j3}\}$ with $|X| = 2$ results in $d(CS(j)/X) \le \delta$.

Figure 4: Variable subassembly for DVSP. (a) Chain with $\delta - 1$ vertices; (b) schematic; (c) VS(i); (d) schematic.

To construct $G_F$ from F, we use n VS(i)'s, one for each variable $x_i$ in F, and m CS(j)'s, one for each clause $C_j$ in F. There is a directed edge from vertex $x_i$ ($\bar{x}_i$) of VS(i) to vertex $l_{jk}$ of CS(j) iff $x_i$ ($\bar{x}_i$) is the k'th literal of $C_j$ (we assume the three literals in $C_j$ are ordered). Figure 6 shows the $G_F$ obtained for a sample three-clause formula in the variables $x_1, \ldots, x_4$.

Since the total number of vertices in $G_F$ is $3\delta n + (4\delta - 1)m$, the construction of $G_F$ can be done in polynomial time for any fixed $\delta$.









(a) CS(1)


(b) Schematic


Figure 5: Clause assembly for DVSP.


Figure 6: GF forF = (xi+x'2X4) (X1+t3+4) (x1+X2+X3).









Theorem 1: Let F be an instance of 3SAT and let $G_F$ be the instance of unit weight DVSP obtained using the above construction. For $\delta \ge 2$, F is satisfiable iff there is a vertex set X such that $d(G_F/X) \le \delta$ and $|X| = n + 2m$.

Proof: If F is satisfiable then there is a binary assignment to the $x_i$'s such that F has value 1. Let $b_1, b_2, \ldots, b_n$ be this assignment. Construct a vertex set X in the following way:

1. $x_i$ is in X if $b_i = 1$. If $b_i = 0$, then $\bar{x}_i$ is in X.

2. From each CS(j) add exactly two of the vertices $l_{j1}, l_{j2}, l_{j3}$ to X. These are chosen such that the literal corresponding to the vertex not chosen has value 1. Each clause has at least one literal with value 1.

We readily see that $|X| = n + 2m$ and that $d(G_F/X) \le \delta$.

Next, suppose that there is an X such that $|X| = n + 2m$ and $d(G_F/X) \le \delta$. From the construction of the variable and clause assemblies and from the fact that $|X| = n + 2m$, it follows that X must contain exactly one vertex from each of the sets $\{x_i, \bar{x}_i\}$, $1 \le i \le n$, and exactly 2 from each of the sets $\{l_{j1}, l_{j2}, l_{j3}\}$, $1 \le j \le m$. Hence there is no i such that both $x_i \in X$ and $\bar{x}_i \in X$ and there is no j for which $l_{j1} \in X$ and $l_{j2} \in X$ and $l_{j3} \in X$. Consider the Boolean assignment $b_i = 1$ iff $x_i \in X$. Suppose that $l_{jk} \notin X$ and $l_{jk} = x_i$ ($\bar{x}_i$). Since $d(G_F/X) \le \delta$, vertex $x_i$ ($\bar{x}_i$) must be split as otherwise there is a source to sink path with delay greater than $\delta$. So, $x_i$ ($\bar{x}_i$) $\in X$ and $b_i = 1$ (0). As a result, the k'th literal of clause $C_j$ is true. Hence, $b_1, \ldots, b_n$ results in each clause having at least one true literal and F has value 1. □

When $\delta = 1$, the unit weight DVSP is easily solved as now every vertex that is not a source or sink has to be split.









3.2. DVSP For Unit Weight Multistage Graphs


A multistage graph is a dag in which the vertices are partitioned into stages and each edge connects two vertices in adjacent stages. An example is given in Figure 7.















Figure 7: Example multistage graph.

In the construction of Section 3.1, VS(i) is a multistage graph but CS(j) is not, as the edges $\langle l_{j1}, l_{j2} \rangle$ and $\langle l_{j2}, l_{j3} \rangle$ require $l_{j1}$ and $l_{j3}$ to be two stages apart while the edge $\langle l_{j1}, l_{j3} \rangle$ requires them to be one stage apart.


To show that DVSP for multistage graphs is NP-hard, we use the problem 2-3SAT defined as:


Input: A boolean function $F = C_1 C_2 \cdots C_m$ in n variables $x_1, x_2, \ldots, x_n$. Each clause $C_i$ is the disjunction of either two or three literals. If $|C_i| = 2$, then both literals in $C_i$ are either negated or unnegated. If $|C_i| = 3$, then at least one literal of $C_i$ is unnegated and at least one is negated.


Output: "Yes" iff there is a truth assignment for the n variables such that F = 1. "No" otherwise.




Theorem 2: 2-3SAT is NP-hard.


Proof: From any instance F of 3SAT we can obtain, in polynomial time, an instance H of 2-3SAT such that H is satisfiable (i.e., has answer "yes") iff F is. Consider each clause of F. If $C_i$ has only unnegated literals (say $C_i = (x_1 + x_2 + x_3)$) then replace $C_i$ with $(x_1 + y_1 + \bar{y}_2)(x_2 + \bar{y}_1 + y_2)(x_3 + \bar{y}_1 + \bar{y}_2)(y_1 + y_2)$ where $y_1$ and $y_2$ are new variables. If $C_i$ has only negated literals (say $C_i = (\bar{x}_1 + \bar{x}_2 + \bar{x}_3)$) then replace $C_i$ with $(\bar{x}_1 + \bar{y}_1 + y_2)(\bar{x}_2 + y_1 + \bar{y}_2)(\bar{x}_3 + y_1 + y_2)(\bar{y}_1 + \bar{y}_2)$.

In this way F is transformed into an instance H of 2-3SAT. One may verify that H is satisfiable iff F is. □
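The clause replacement used in the proof of Theorem 2 can be checked mechanically on small instances. The sketch below is illustrative only: the integer encoding of literals and the function names are assumptions, and the sign pattern given to the new variables is one choice that meets the 2-3SAT restrictions, since the exact pattern printed in the source is hard to read.

```python
from itertools import product


def to_2_3sat(clauses, n):
    """Transform 3SAT clauses over variables 1..n into 2-3SAT clauses.

    A literal is a nonzero int: k means x_k and -k means its negation.
    Returns (new_clauses, new_variable_count).
    """
    out = []
    for clause in clauses:
        a, b, c = clause
        if all(lit > 0 for lit in clause):      # only unnegated literals
            y1, y2 = n + 1, n + 2
            n += 2
            out += [(a, y1, -y2), (b, -y1, y2), (c, -y1, -y2), (y1, y2)]
        elif all(lit < 0 for lit in clause):    # only negated literals
            y1, y2 = n + 1, n + 2
            n += 2
            out += [(a, -y1, y2), (b, y1, -y2), (c, y1, y2), (-y1, -y2)]
        else:                                   # already mixed: keep as is
            out.append(tuple(clause))
    return out, n


def is_2_3sat(clauses):
    """Check the clause-shape restrictions of 2-3SAT."""
    for c in clauses:
        signs = {lit > 0 for lit in c}
        if len(c) == 2 and len(signs) != 1:
            return False
        if len(c) == 3 and len(signs) != 2:
            return False
        if len(c) not in (2, 3):
            return False
    return True


def satisfiable(clauses, n):
    """Brute-force satisfiability check, for tiny instances only."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


F = [(1, 2, 3), (-1, -2, -3)]
H, n_h = to_2_3sat(F, 3)
print(is_2_3sat(H), satisfiable(F, 3) == satisfiable(H, n_h))  # True True
```

Each all-positive or all-negative clause contributes two fresh variables and four clauses, so H has at most 4m clauses and n + 2m variables.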



From an instance F of 2-3SAT we can construct an instance $G_F$ of the multistage DVSP using the variable and clause subassemblies of Figure 8.

One may verify that for $\delta \ge 4$:

(1) If $|X| = 1$ and $d(VS(i)/X) \le \delta$ then $X \subset \{x_i, \bar{x}_i\}$.

(2) If $|X| = 2$ and $d(CS3(j)/X) \le \delta$ then $X \subset \{l_{j1}, l_{j2}, l_{j3}\}$.

(3) If $|X| = 1$ and $d(CS2(j)/X) \le \delta$ then $X \subset \{l_{j1}, l_{j2}\}$.

The construction of $G_F$ is similar to that used in Section 3.1 except that the variable and clause subassemblies of Figure 8 are used. In case $|C_j| = 2$, a modified CS2(j) subassembly as in Figure 9(a) is used. If $|C_j| = 3$, then a modified CS3(j) is used. This modification is now described. Suppose the literals in $C_j$ are ordered so that the unnegated ones come first. If $C_j$ has two unnegated literals, use the clause subassembly of Figure 9(b). Otherwise, use that of Figure 9(c). Figure 10 gives the $G_F$ obtained for a sample instance with two three-literal clauses and two two-literal clauses.




Theorem 3: Let F be an instance of 2-3SAT and let $G_F$ be the instance of the unit weight multistage graph DVSP obtained using the above construction. For $\delta \ge 4$, F is satisfiable iff there is a vertex set X such that $d(G_F/X) \le \delta$ and $|X| = n + 2m - q$, where m is the number of clauses in F and q is the number of two literal clauses.




























Figure 8: Subassemblies for DVSP multistage graph. (a) VS(i); (b) schematic; (c) CS3(j) for $|C_j| = 3$; (d) schematic; (e) CS2(j) for $|C_j| = 2$; (f) schematic.

Figure 9: Modified clause subassemblies. (a) $|C_j| = 2$; (b) two unnegated literals; (c) one unnegated literal.


Figure 10: GF for F = (l+x12+x4) (x2+x3+x4) (x1+x3) (x2+x3).









Proof: Analogous to that of Theorem 1. □



4. Tree DVSP


In this section we develop a linear time algorithm for the DVSP when the wdag G is a rooted tree. The algorithm is a simple postorder [HORO90] traversal of the tree. During this traversal we compute, for each node x, the maximum delay, D(x), from x to any other node in its subtree. If x has a parent z and $D(x) + w(z,x)$ exceeds $\delta$, then the node x is split and D(x) is set to 0.







Figure 11: An example tree with root a, internal nodes b, c, d, f, g, and leaves h, i, e, j, k.


Consider the example tree of Figure 11 and assume $\delta = 3$. The delay, D(x), for x a leaf node is 0. So, D(x) = 0 for $x \in \{ h, i, e, j, k \}$. In postorder, a node is visited after its children have been. When a node x is visited, its delay may be computed as:

$$D(x) = \max_{y \text{ is a child of } x} \{ D(y) + w(x,y) \}$$

So, D(d) = 2. Since $D(d) + w(b,d) > \delta = 3$, we split node d to get the tree of Figure 12(a). Next, D(b) = 2 and D(f) = 2 are computed and since $D(b) + w(a,b) \le 3$ and $D(f) + w(c,f) \le 3$, neither b nor f is split. Then since D(g) = 2 and $D(g) + w(c,g) > \delta = 3$, node g is split and we get the tree of Figure 12(b). Next, node c is visited and split since $D(c) + w(a,c) = 5 > 3 = \delta$. No more nodes are split. The final tree after splitting the three nodes d, g, and c is given in Figure 13. The formal algorithm is given in Figure 14. The algorithm assumes that X has been initialized to $\emptyset$ and that $w(i,j) \le \delta$ for every edge in T since otherwise, there is no solution. Its complexity is O(n) where n is the number of vertices in T.

Figure 12: Splitting nodes in Figure 11. (a) After splitting d; (b) after splitting g.




Theorem 4: Procedure DVSPtree finds a minimum cardinality X such that $d(T/X) \le \delta$.

Proof: The proof is by induction on the number, n, of nodes in the tree T. If n = 1, the theorem is trivially valid. Assume this is so for $n \le m$ where m is an arbitrary natural number. Let T be a tree with m + 1 nodes. Let X be the set of vertices split by DVSPtree and let W be a minimum cardinality vertex set such that $d(T/W) \le \delta$. We need to show that $|X| = |W|$. If $|X| = 0$, this is trivially true. If $|X| > 0$, then let z be the first vertex added to X by DVSPtree. Let $T_z$ be the subtree of T rooted at z. As z is added to X by DVSPtree, $D(z) + w(parent(z),z) > \delta$. Hence, W must contain at least one vertex u that is in $T_z$. If W contains more than one such u, then W cannot be of minimum cardinality as $Z = (W - \{ \text{all such } u \}) \cup \{z\}$ is such that $d(T/Z) \le \delta$. Hence, W contains exactly one such u. Let $W' = W - \{u\}$. Let $T'$ be the [...] is a minimum cardinality vertex set such that $d(T'/W') \le \delta$. Also, $X' = X - \{z\}$ is such that $d(T'/X') \le \delta$ and furthermore $X'$ is the answer produced by DVSPtree when started with $T'$. Since the number of vertices in $T'$ is less than m + 1, $|X'| = |W'|$. Hence, $|X| = |X'| + 1 = |W'| + 1 = |W|$. □









procedure DVSPtree(T);
{Find minimum cardinality X such that d(T/X) ≤ δ}
{Assume that w(i,j) ≤ δ for every edge in T and that}
{X is initialized to ∅}
begin
  if T <> nil
  then begin
    D(T) := 0;
    for each child Y of T do
    begin
      DVSPtree(Y);
      D(T) := max {D(T), D(Y) + w(T,Y)};
    end;
    if T is not the root
    then if D(T) + w(parent(T),T) > δ
         then begin
           X := X ∪ {T}; {split T}
           D(T) := 0;
         end;
  end;
end; {of DVSPtree}

Figure 14: DVSP algorithm for trees.
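The procedure of Figure 14 translates directly into a short recursive routine. The sketch below is an illustration, not the authors' Pascal program: the dictionary-based tree representation and the name `dvsp_tree` are assumptions, and the example tree is a reconstruction that is consistent with the worked example for Figure 11 (in particular, w(a,b) = 1 and w(a,c) = 2 are inferred from the arithmetic in the text, not read from the figure).

```python
def dvsp_tree(children, weight, root, delta):
    """Postorder DVSP on a rooted tree; returns the split nodes in order.

    children: node -> list of child nodes
    weight:   (parent, child) -> positive edge weight, each <= delta
    Recursion depth equals the tree height, so very deep trees would
    need an explicit stack instead.
    """
    X = []

    def visit(t, parent):
        d = 0  # D(t): maximum delay from t into its subtree
        for y in children.get(t, []):
            d = max(d, visit(y, t) + weight[(t, y)])
        if parent is not None and d + weight[(parent, t)] > delta:
            X.append(t)  # split t
            d = 0
        return d

    visit(root, None)
    return X


# Assumed reconstruction of the tree of Figure 11.
children = {
    "a": ["b", "c"], "b": ["d", "e"], "c": ["f", "g"],
    "d": ["h", "i"], "f": ["j"], "g": ["k"],
}
weight = {
    ("a", "b"): 1, ("a", "c"): 2, ("b", "d"): 2, ("b", "e"): 2,
    ("c", "f"): 1, ("c", "g"): 2, ("d", "h"): 2, ("d", "i"): 1,
    ("f", "j"): 2, ("g", "k"): 2,
}
print(dvsp_tree(children, weight, "a", 3))  # ['d', 'g', 'c']
```

With a delay bound of 3 the routine splits d, g, and c, in that order, which matches the sequence described in the text.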



5. A Backtracking Algorithm For DVSP


Backtracking algorithms [HORO78] generally search a tree organization of the solution space using bounding functions. The solution to our problem is a 0/1 vector $X = ( x_1, x_2, \ldots, x_n )$ where n is the number of vertices and $x_i = 0$ iff vertex i is not split. We use the binary tree organization used in [HORO78] for the 0/1-knapsack problem. In this organization, the nodes at level i denote a decision on $x_i$, $1 \le i \le n$. If $x_i = 0$ we move to the left subtree. Otherwise we move to the right subtree of a level i node. Figure 15 shows the solution space tree for the case n = 3. Each root to leaf path defines a vector X in the solution space.

The remaining features of our backtracking algorithm are:

1) The vertices of the dag are considered in topological order. Thus, $x_i$ (of Figure 15) denotes a decision on whether or not the i'th vertex, in the topological order, of the dag is split.







Figure 15: Solution space organization for n = 3. The leaves correspond to the vectors 000, 001, 010, 011, 100, 101, 110, 111.

2) If the i'th vertex, in the topological order, is a source or sink vertex then the subtree with $x_i = 1$ is not considered (i.e., it is eliminated from the tree of Figure 15) as source and sink vertices are not to be split.

3) Let Y be a node in the solution space tree. If Y is at level i (root is at level 1), then the path from the root to Y determines values for $x_1, x_2, \ldots, x_{i-1}$. Let G//Y be the dag obtained from the original dag by splitting the vertices with $x_j = 1$, $1 \le j < i$. Let f(G//Y) be the delay of the maximum delay path in G//Y that ends at the i'th vertex in the topological order and let g(G//Y) be the delay of the maximum delay path in G//Y that begins at the i'th vertex. We use the following rules to move to a child of node Y:

3a) If $f(G//Y) + w(i,j) > \delta$ for some $\langle i,j \rangle \in E$, then set $x_i = 1$ and eliminate the $x_i = 0$ subtree.

3b) If $f(G//Y) + g(G//Y) \le \delta$, then set $x_i = 0$ and eliminate the $x_i = 1$ subtree.

3c) If there is only one edge $\langle i,j \rangle$ that leaves vertex i, and $f(G//Y) + w(i,j) \le \delta$, then set $x_i = 0$ and eliminate the subtree $x_i = 1$.

3d) If none of 3a) through 3c) apply, then search the subtree of Y with $x_i = 0$ first and later search the one with $x_i = 1$.

4) To bound a node Y we do the following. Let opt be the number of nodes split in the best solution found so far, and let r be the number of nodes split on the path from the root to Y. Let l(G//Y) be the delay of the maximum delay path in G//Y. A path of delay l(G//Y) must be cut into pieces of delay at most $\delta$, so at least $\lceil l(G//Y)/\delta \rceil - 1$ additional vertices need to be split. So, if $opt \le r + \lceil l(G//Y)/\delta \rceil - 1$ then node Y is bounded and the subtree with root Y is not to be searched.
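The bounding rule 4) can be illustrated with a much simplified exact search. The sketch below is not the backtracking algorithm of this section: it keeps only the exclusion of source and sink vertices (rule 2) and the lower bound of rule 4), omits rules 3a) through 3c), and considers candidate vertices in an arbitrary fixed order instead of topological order. All function names and the example graph are assumptions.

```python
import math


def delay(edges, X):
    """d(G/X): longest path delay when the vertices in X are split."""
    succ = {}
    for (i, j), w in edges.items():
        succ.setdefault(i, []).append((j, w))
    memo = {}

    def longest_from(u):  # a path entering a split vertex stops there
        if u not in memo:
            memo[u] = max(
                (w + (0 if v in X else longest_from(v))
                 for v, w in succ.get(u, [])),
                default=0,
            )
        return memo[u]

    return max((longest_from(u) for u in succ), default=0)


def dvsp_exact(edges, delta):
    """Smallest X with d(G/X) <= delta, or None if no solution exists."""
    if max(edges.values(), default=0) > delta:
        return None  # Lemma 1
    tails = {i for i, _ in edges}
    heads = {j for _, j in edges}
    cands = sorted(tails & heads, key=str)  # never split sources or sinks
    best = [None]

    def search(k, X):
        l = delay(edges, X)
        if l <= delta:  # feasible: further splits can only add cost
            if best[0] is None or len(X) < len(best[0]):
                best[0] = set(X)
            return
        need = math.ceil(l / delta) - 1  # lower bound of rule 4)
        if need > len(cands) - k:
            return  # not enough undecided vertices left
        if best[0] is not None and len(X) + need >= len(best[0]):
            return  # bounded: cannot beat the best solution so far
        search(k + 1, X)        # x_k = 0 first
        X.add(cands[k])
        search(k + 1, X)        # then x_k = 1
        X.remove(cands[k])

    search(0, set())
    return best[0]


# Hypothetical dag: two paths of delay 3 that share vertex b.
g = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "c"): 1, ("c", "b"): 1}
print(dvsp_exact(g, 2))  # {'b'}
```

Even with the bound, the search is exponential in the worst case, which agrees with the NP-hardness results of Section 3.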



6. Heuristics For DVSP


We formulate four simple and intuitively appealing constructive heuristics to obtain a set X such that $d(G/X) \le \delta$. All four split one vertex at a time until the remaining dag has delay $\le \delta$. They assume that the input dag has a feasible solution, i.e., no edge has delay $> \delta$.

The first three heuristics have the form given in Figure 16 and differ only in the criteria used to select the next vertex to split.



6.1. Heuristic 1 (h1)

The selection criteria for the next vertex, v, to split are:

a) $v \notin X$ and v is neither a source nor a sink vertex

b) v is on a path with delay $> \delta$

c) Of all vertices that satisfy a) and b), v has the maximum number of incident edges that are on paths of delay $> \delta$. In case of a tie, let Z be the set of vertices that are tied. For each $u \in Z$ determine l(u) and r(u) such that l(u) is the length of a longest path from a source of G/X to u and r(u) is the length of a longest path from u to a sink of G/X. The vertex with the maximum value of min { l(u), r(u) } is selected. If there is still a tie, this is broken arbitrarily.

This heuristic is easily implemented to have run time O( k(n + e) ) where k is the number of vertices split, n is the number of vertices in the dag, and e is the number of edges in the dag.

X := ∅; { X is the set of vertices to split }
while d(G/X) > δ do
begin
  Select the next vertex, v, to split;
  X := X ∪ {v};
end;

Figure 16: General form of heuristics 1 through 3.



6.2. Heuristic 2 (h2)


In this heuristic, the next vertex, v, to split satisfies criteria a) and b) of Heuristic 1. In addition, the following criterion is employed:

c') Of all the vertices that satisfy a) and b), v is a vertex whose splitting results in a dag that has the fewest number of vertices that are on paths of delay $> \delta$. Ties are broken as in h1.

Heuristic 2 may be implemented to have complexity O( kne ).









6.3. Heuristic 3 (h3)


Heuristic 3 also uses criteria a) and b) used by Heuristic 1. However, criterion c) is replaced by:

c") Of all the vertices that satisfy a) and b), v is such that its splitting results in a dag with least delay. I.e., v is such that $d( G/(X \cup \{v\}) )$ is minimum over all choices for v. Ties are broken as in h1.

The complexity of Heuristic 3 is O( kne ).
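The greedy loop of Figure 16 with the selection rule of Heuristic 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, ties are broken by vertex name instead of by the h1 rule, and no attempt is made to reach the O( kne ) bound.

```python
def longest_paths(edges, X):
    """Return (l, r): longest delay into and out of each vertex of G/X."""
    succ, pred, verts = {}, {}, set()
    for (i, j), w in edges.items():
        succ.setdefault(i, []).append((j, w))
        pred.setdefault(j, []).append((i, w))
        verts.update((i, j))
    l, r = {}, {}

    def into(v):  # longest path ending at v; paths restart at split vertices
        if v not in l:
            l[v] = max((w + (0 if u in X else into(u))
                        for u, w in pred.get(v, [])), default=0)
        return l[v]

    def out(v):   # longest path starting at v; paths stop at split vertices
        if v not in r:
            r[v] = max((w + (0 if u in X else out(u))
                        for u, w in succ.get(v, [])), default=0)
        return r[v]

    for v in verts:
        into(v)
        out(v)
    return l, r


def heuristic3(edges, delta):
    """Greedy DVSP: repeatedly split the vertex that minimises the delay."""
    if max(edges.values(), default=0) > delta:
        raise ValueError("no feasible solution: an edge exceeds delta")
    internal = {i for i, _ in edges} & {j for _, j in edges}
    X = set()
    while True:
        l, r = longest_paths(edges, X)
        if max(r.values(), default=0) <= delta:
            return X
        # criteria a) and b): unsplit internal vertices on a too-long path
        cands = [v for v in internal - X if l[v] + r[v] > delta]

        def delay_after(v):  # criterion c"): delay of G/(X + {v})
            return max(longest_paths(edges, X | {v})[1].values())

        X.add(min(sorted(cands, key=str), key=delay_after))


chain = {("a", "b"): 1, ("b", "c"): 1, ("c", "d"): 1, ("d", "e"): 1}
print(heuristic3(chain, 2))  # {'c'}
```

On the five-vertex chain, splitting the middle vertex c reduces the delay to 2, whereas splitting b or d leaves a path of delay 3, so c is chosen first and the loop stops.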



6.4. Heuristic 4 (h4)


In this heuristic, the vertices of the dag are examined in two different orders: topological and reverse topological. When the i'th vertex in the topological (reverse topological) order is examined, it is split if the current dag contains a path comprised solely of vertices 1, ..., i and one additional vertex that has delay $> \delta$. The heuristic is specified in Figure 17. It can be implemented to run in O( n + e ) time. Note that the additional vertex j can be restricted to the set of vertices adjacent to i.
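One way to realize Heuristic 4 in O( n + e ) time is to carry, for each vertex, the delay of the longest path that ends there, restarting the count at split vertices. The sketch below follows that idea; it is an interpretation of the description above, not the authors' program, and the function names and the example are assumptions. It also assumes a feasible input in which no edge has delay greater than the bound.

```python
from collections import defaultdict, deque


def topological_order(edges):
    """Return the vertices of the dag in a topological order."""
    succ, indeg, verts = defaultdict(list), defaultdict(int), set()
    for i, j in edges:
        succ[i].append(j)
        indeg[j] += 1
        verts.update((i, j))
    ready = deque(sorted((v for v in verts if indeg[v] == 0), key=str))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                ready.append(v)
    return order


def one_pass(edges, order, delta):
    """Scan `order`; split i when a path through i to a neighbour j is too long."""
    succ, pred = defaultdict(list), defaultdict(list)
    for (i, j), w in edges.items():
        succ[i].append((j, w))
        pred[j].append((i, w))
    X, f = set(), {}
    for i in order:
        # f[i]: longest path ending at i, restarting at split vertices
        f[i] = max((w + (0 if p in X else f[p]) for p, w in pred[i]), default=0)
        if any(f[i] + w > delta for _, w in succ[i]):
            X.add(i)
    return X


def heuristic4(edges, delta):
    """Run the forward and reverse passes and keep the smaller set."""
    forward = one_pass(edges, topological_order(edges), delta)
    rev = {(j, i): w for (i, j), w in edges.items()}
    backward = one_pass(rev, topological_order(rev), delta)
    return forward if len(forward) < len(backward) else backward


# Hypothetical dag: two paths of delay 3 that share vertex b.
g = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "c"): 1, ("c", "b"): 1}
print(heuristic4(g, 2))  # {'b'}
```

In this example the forward pass splits only b, while the reverse pass splits both a and c, so the forward result is kept.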



7. Experimental Results


The backtracking algorithm of Section 5 and the four heuristics of Section 6 were programmed in Pascal and run on an Apollo DN3500 workstation. We experimented with two sets of acyclic directed graphs. The first set was obtained from the S-graphs of the ISCAS-89 benchmark sequential circuits [BRGL89]. The S-graphs were first rendered cycle free by the procedure given in [LEE90]. The characteristics of the resulting dags are given in Table 1. The other set of graphs was derived from the ISCAS-85 benchmark combinational circuits [BRGL85]. Here the nodes in the digraph model the gates in the circuit and the edges correspond to the connections between gates. Associated with each edge is the propagation delay along the corresponding circuit gate input. The edge delay was set to the maximum of the rising and falling delays provided in [BRGL85]. The characteristics of these circuits are given in Table 2. For each dag, G, we experimented with the $\delta$ values { 0.9d(G), 0.8d(G), 0.7d(G), 0.6d(G), 0.5d(G), 0.4d(G) }. Table 3 gives the results for the case G = s400. Note from Table 1 that d(s400) = 16. For $\delta$ close to d(s400) (specifically, $\delta$ = 12 and 14), all four heuristics found optimal solutions. Heuristic 2 was the only one that







X := ∅; { set of split vertices }
for i := 1 to n do { in topological order }
  if G/X has a path comprised solely of vertices
     1, 2, ..., i, and j (for any j) with delay > δ
  then X := X ∪ { i };

Y := ∅; { set of split vertices }
for i := n down to 1 do { in reverse topological order }
  if G/Y has a path comprised solely of vertices
     n, n-1, ..., i, and j (for any j) with delay > δ
  then Y := Y ∪ { i };

if |X| < |Y|
then split vertices in X
else split vertices in Y;




Figure 17: Heuristic 4.

obtained optimal solutions for all tested $\delta$ values. Table 4 gives the results for circuit s38584. The backtracking algorithm was able to complete only for the cases $\delta$ = 0.9d(G) and $\delta$ = 0.8d(G) in the time allowed for each run. Heuristic h2 consistently obtained better solutions than those obtained by the remaining heuristics. However, its run time, while quite acceptable, was greater than that of heuristics 1 and 4.

Table 5 gives the results for the combinational circuit c432. For this circuit, heuristics 2 and 3 found the optimal solution for all tested $\delta$ values. The results for circuit c6288 are given in Table 6. The backtracking algorithm successfully found the optimal solution only for the cases $\delta$ = 287.89 = 0.9d(G) and $\delta$ = 255.90 = 0.8d(G). Of the four heuristics, h2 obtained the best solutions for five of the six $\delta$ values tested and h4 was best for the remaining $\delta$ value.









Tables 7 and 8 give the total number of nodes split by each of the four heuristics for each of the sequential and combinational circuits, respectively. For each circuit the six $\delta$ values { 0.9d(G), 0.8d(G), 0.7d(G), 0.6d(G), 0.5d(G), 0.4d(G) } were used and the tables give the sum of the number of vertices split for each of these $\delta$ values. Tables 9 and 10 give the percentage of tests on which each heuristic obtained the best solution. Heuristic 2, on average, was significantly better than the others.

Tables 11-14 give the number of nodes split at the two extremes $\delta$ = 0.9d(G) and $\delta$ = 0.4d(G) of the range of $\delta$ values tested. Generally, for $\delta$ close to d(G) the four heuristics tended to obtain solutions of comparable quality while for smaller $\delta$ the differences were more noticeable. However, in all $\delta$ ranges tested, heuristic 2 tended to produce the best solutions. The average run time for each of the circuits and each $\delta$ value is given in Tables 15 and 16. As can be seen, heuristics 1 and 4 are very fast. While heuristic 2 is significantly faster than heuristic 3, it is much slower than h1 and h4. Despite this, we recommend h2 because it produces relatively better solutions and its run time is acceptable.




8. References


[BRGL85] F. Brglez and H. Fujiwara, "A Neutral Netlist of Ten Combinational Benchmark Circuits and a Target Translator in FORTRAN," Proc. IEEE Symp. on Circuits & Systems, June 1985, pp. 663-666.

[BRGL89] F. Brglez, D. Bryan, and K. Kozminski, "Combinational Profiles of Sequential Benchmark Circuits," Proc. of Intern. Symp. on Circuit & Systems, May 1989, pp. 1929-1934.

[CHEN90] K. T. Cheng and V. D. Agrawal, "A Partial Scan Method for Sequential Circuits with Feedback," IEEE Transactions on Computers, Vol. 39, No. 4, pp. 544-548, April 1990.

[GARE79] M. R. Garey and D. S. Johnson, "Computers and Intractability", W. H. Freeman and Company, San Francisco, 1979.

[GUPT90] R. Gupta, R. Gupta and M. A. Breuer, "BALLAST: A Methodology for Partial Scan Design," IEEE Transactions on Computers, Vol. 39, No. 4, pp. 538-544, April 1990.

[HORO78] E. Horowitz and S. Sahni, "Fundamentals of Computer Algorithms", Computer Science Press, Maryland, 1978.

[HORO90] E. Horowitz and S. Sahni, "Fundamentals of Data Structures in Pascal", Computer Science Press, Maryland, 1990.

[LEE90] D. H. Lee and S. M. Reddy, "On Determining Scan Flip-flops in Partial-scan Designs," Proc. of International Conference on Computer Aided Design, November 1990.

[PAIK90] D. Paik, S. Reddy, and S. Sahni, "Deleting Vertices To Bound Path Lengths", University of Florida, Technical Report, 1990.



circuit # vertices # edges d(G)

s400 173 282 16
s420 37 130 17
s526 27 98 12
s526n 27 98 12
s838 69 266 33
s1423 74 917 26
s5378 233 1314 20
s9234 216 1633 32
s13207 762 3083 35
s15850 608 8562 61
s35932 1777 3380 36
s38417 1396 8754 29
s38584 1448 9471 129


Table 1: Circuit characteristics (unit delay) of modified sequential circuits









circuit # vertices # edges max delay

c432 250 426 57.40
c499 555 928 53.30
c880 443 729 53.00
c1355 587 1064 49.90
c1908 913 1498 76.59
c2670 1426 2076 86.87
c3540 1719 2939 98.69
c5315 2485 4386 99.30
c6288 2448 4800 319.88
c7552 3719 6144 85.30


Table 2: Circuit characteristics (with max of falling and rising delay) of ISCAS combinational circuits.



                 # nodes split                     run time (sec)

δ           h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

14           1    1    1    1        1       <1     <1     <1   <1       <1
12           1    1    1    1        1       <1     <1     <1   <1       <1
11           2    2    2    3        2       <1     <1     <1   <1       <1
9            5    4    4    7        4       <1     <1      1   <1       10
8            7    4    4   11        4       <1     <1      1   <1       24
6           12    8   11   10        8       <1      1      3   <1    35980


Table 3: Results for s400



                 # nodes split                     run time (sec)

δ           h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

116         21    2    2    5        2        3     61     61   <1      222
103         15    2    4   24        2        2     64    127   <1    17280
90          18    2    5   20        -        3     68    181   <1        -
77          27    4    6   20        -        5    127    218   <1        -
64          40    8   13   27        -        7    293    682   <1        -
51          89   10   37   44        -       18    439   2126   <1        -


Table 4: Results for s38584









                 # nodes split                     run time (sec)

δ           h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

51.66        1    1    1    1        1       <1      1      1   <1       <1
45.92        2    1    1    9        1       <1      1      1   <1       <1
40.18        2    2    2   10        2       <1      2      2   <1       <1
34.44        3    2    2   19        2       <1      2      2   <1       <1
28.70        4    3    3   10        3       <1      3      3   <1     1486
22.96        5    3    3   21        3       <1      4      4   <1     1007


Table 5: Results for c432




                 # nodes split                     run time (sec)

δ           h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

287.89      69    1    1    2        1       10    158    161   <1       33
255.90     210    3    6    4        2       34    370    764   <1       36
223.91     241   23   33   29        -       40   2634   4910   <1        -
191.92     216   30   86   44        -       37   3603  16223   <1        -
159.94     349   41   86   50        -       58   4886  18450   <1        -
127.95     328   56   86   45        -       56   6971  20900    1        -


Table 6: Results for c6288
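
The bounds in the first column of Tables 3 to 6 appear to follow from the d(G) and max delay columns of Tables 1 and 2: each value matches x * d(G) for x = 0.9, 0.8, ..., 0.4, truncated to an integer for the unit-delay circuits and to two decimals for the combinational circuits. The Python sketch below reproduces that arithmetic under this reading of the tables. The function name `delta_values` and its `places` parameter are illustrative and do not come from the report, and the truncation rule is inferred from the tabulated numbers rather than stated in the text.

```python
from decimal import Decimal, ROUND_DOWN

def delta_values(d_g: str, places: int = 0) -> list:
    """Bounds x * d(G) for x = 0.9, 0.8, ..., 0.4, truncated to `places` decimals.

    Assumption: truncation (not rounding) is inferred from the tabulated
    values; the report itself may define the bounds differently.
    """
    quantum = Decimal(1).scaleb(-places)  # 1 for places=0, 0.01 for places=2
    d = Decimal(d_g)
    return [(d * tenths / 10).quantize(quantum, rounding=ROUND_DOWN)
            for tenths in range(9, 3, -1)]

# s400 has d(G) = 16 in Table 1; c6288 has max delay 319.88 in Table 2.
print([str(v) for v in delta_values("16")])
print([str(v) for v in delta_values("319.88", places=2)])
```

For these two inputs the sketch yields the first columns of Tables 3 and 6 respectively. Decimal arithmetic is used so that values such as 223.916 truncate to 223.91 without binary floating-point surprises.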









circuit h1 h2 h3 h4

s400 28 20 23 33
s420 31 31 31 31
s526 21 33 25 26
s526n 21 33 25 26
s838 60 36 36 36
s1423 58 63 57 62
s5378 24 9 10 18
s9234 64 27 33 50
s13207 139 48 99 87
s15850 194 59 121 217
s35932 174 128 147 147
s38417 414 133 246 406
s38584 210 28 67 140


Table 7: Total number of nodes split for modified sequential circuits




circuit h1 h2 h3 h4

c432 17 12 12 70
c499 48 72 108 120
c880 65 45 97 82
c1355 60 73 97 152
c1908 128 50 109 144
c2670 183 59 101 147
c3540 389 142 262 226
c5315 260 84 264 184
c6288 1413 154 298 174
c7552 564 249 709 635


Table 8: Total number of nodes split for combinational circuits
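
The totals in Tables 7 and 8 appear to be, for each heuristic, the sum of the number of nodes split over the six bounds tried for that circuit. The short Python sketch below checks this reading for c6288 using the per-bound counts of Table 6. The names `C6288_SPLITS` and `column_totals` are illustrative and do not come from the report.

```python
# Split counts (h1, h2, h3, h4) for c6288 at each of the six bounds,
# copied row by row from Table 6.
C6288_SPLITS = [
    (69, 1, 1, 2),
    (210, 3, 6, 4),
    (241, 23, 33, 29),
    (216, 30, 86, 44),
    (349, 41, 86, 50),
    (328, 56, 86, 45),
]

def column_totals(rows):
    """Sum each heuristic's split count over all bounds."""
    return tuple(sum(column) for column in zip(*rows))

print(column_totals(C6288_SPLITS))  # (1413, 154, 298, 174)
```

The result agrees with the c6288 row of Table 8.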









circuit h1 h2 h3 h4

s400 50 100 83 33
s420 83 83 83 83
s526 100 33 50 50
s526n 100 33 50 50
s838 33 83 83 83
s1423 83 17 100 33
s5378 17 100 83 67
s9234 0 100 50 0
s13207 0 100 17 17
s15850 0 100 33 17
s35932 33 100 67 67
s38417 17 83 0 17
s38584 0 100 17 0

average 39.7 79.5 55.2 39.8


Table 9: Percentage of best solutions for sequential circuits
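
The percentages above are all multiples of one sixth, which is consistent with counting, for each circuit, the bounds (out of six) at which a heuristic splits the fewest nodes among the four heuristics, with ties credited to every tied heuristic. The Python sketch below applies that reading to the s400 counts of Table 3. This interpretation is an assumption inferred from the data, and the names `S400_SPLITS` and `percent_best` are illustrative.

```python
# Split counts (h1, h2, h3, h4) for s400 at each of the six bounds,
# copied row by row from Table 3.
S400_SPLITS = [
    (1, 1, 1, 1),
    (1, 1, 1, 1),
    (2, 2, 2, 3),
    (5, 4, 4, 7),
    (7, 4, 4, 11),
    (12, 8, 11, 10),
]

def percent_best(rows):
    """Percentage of bounds at which each heuristic attains the row minimum."""
    wins = [0] * len(rows[0])
    for row in rows:
        best = min(row)
        for i, count in enumerate(row):
            if count == best:  # ties count for every tied heuristic
                wins[i] += 1
    return tuple(round(100 * w / len(rows)) for w in wins)

print(percent_best(S400_SPLITS))  # (50, 100, 83, 33)
```

The result agrees with the s400 row of Table 9.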




                 # nodes split                     run time (sec)

circuit     h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

s400         1    1    1    1        1       <1     <1     <1   <1       <1
s420         2    2    2    2        2       <1     <1     <1   <1       <1
s526         2    2    2    2        2       <1     <1     <1   <1       <1
s526n        2    2    2    2        2       <1     <1     <1   <1       <1
s838         4    4    4    4        4       <1     <1     <1   <1       <1
s1423        3    3    3    3        3       <1     <1     <1   <1        1
s5378        1    1    1    1        1       <1     <1     <1   <1       <1
s9234        4    1    1    2        1       <1     <1     <1   <1       <1
s13207       5    3    3    3        3       <1      9     10   <1       16
s15850       5    2    2    2        2       <1     25     25   <1        8
s35932      10   10   10   10        -       <1    127    157   <1        -
s38417       5    3    3    2        2       <1     47     56   <1       10
s38584      21    2    2    5        2        3     61     61   <1      222


Table 10: Sequential circuits, δ = 0.9d(G)









                 # nodes split                     run time (sec)

circuit     h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

c432         1    1    1    1        1       <1      1      1   <1       <1
c499         2    2    6    8        2       <1      9     35   <1       <1
c880         1 1 1 1 4 1                     <1      1      1   <1       <1
c1355        2    2    5   24        2       <1     11     36   <1       11
c1908        2    1    1   12        1       <1      7      7   <1        1
c2670        9    1    1    2        1       <1     10     10   <1       <1
c3540       17    5    6   10        -        1     98    176   <1        -
c5315       13    2   11    4        2        1     52    201   <1        5
c6288       69    1    1    2        1       10    158    161   <1       33
c7552       20    5    8    5        -        3    163    280   <1        -


Table 11: Combinational circuits, δ = 0.9d(G)



                 # nodes split                     run time (sec)

circuit     h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

s400        12    8   11   10        8       <1      1      3   <1    35980
s420         7    7    7    7        7       <1     <1     <1   <1       <1
s526         6   13    6    6        6       <1     <1     <1   <1       <1
s526n        6   13    6    6        6       <1     <1     <1   <1       <1
s838         8    8    8    8        8       <1     <1     <1   <1       42
s1423       17   18   16   17        -       <1      5      4   <1        -
s5378        6    3    4    3        3       <1      2      2   <1       <1
s9234       17   13   16   25        -       <1     13     17   <1        -
s13207      40   16   35   24        -        2     85    247   <1        -
s15850      68   23   49   78        -       11    513   1059   <1        -
s35932      64   37   55   38        -        6   1064   2018   <1        -
s38417     160   60  115  190        -       25   1924   8553   <1        -
s38584      89   10   37   44        -       18    439   2126   <1        -


Table 12: Sequential circuits, δ = 0.4d(G)









                 # nodes split                     run time (sec)

circuit     h1   h2   h3   h4  optimal       h1     h2     h3   h4  optimal

c432         5    3    3   21        3       <1      4      4   <1     1007
c499        18   21   30   40        -       <1     99    187   <1        -
c880        33   20   39   25        -       <1     41    112   <1        -
c1355       18   21   25   56        -       <1    117    170   <1        -
c1908       40   21   66   41        -        1    214    724   <1        -
c2670       57   34   56   64        -        3    662   1339   <1        -
c3540      113   48   86   68        -       10   1867   5100   <1        -
c5315       95   40  105   91        -       12   2408   8612   <1        -
c6288      328   56   86   45        -       56   6971  20900    1        -
c7552      189  129  372  330        -       40  23933  83417   <1        -


Table 13: Combinational circuits, δ = 0.4d(G)




circuit h1 h2 h3 h4

c432 33 100 100 17
c499 100 33 0 17
c880 33 83 50 0
c1355 100 17 0 33
c1908 0 100 17 0
c2670 0 100 50 33
c3540 0 100 0 0
c5315 0 100 0 17
c6288 0 83 17 17
c7552 0 100 0 17

average 26.6 81.6 23.4 15.1


Table 14: Percentage of best solutions for combinational circuits









circuit h1 h2 h3 h4

s400 < 1 < 1 1 < 1
s420 < 1 < 1 < 1 < 1
s526 < 1 < 1 < 1 < 1
s526n < 1 < 1 < 1 < 1
s838 < 1 < 1 < 1 < 1
s1423 < 1 3 3 < 1
s5378 < 1 1 1 < 1
s9234 < 1 3 5 < 1
s13207 1 37 95 < 1
s15850 5 195 379 < 1
s35932 2 532 749 < 1
s38417 11 681 2671 < 1
s38584 6 175 558 < 1


Table 15: Average run time for sequential circuits.




circuit h1 h2 h3 h4

c432 < 1 2 2 < 1
c499 < 1 59 112 < 1
c880 < 1 14 36 < 1
c1355 < 1 70 101 < 1
c1908 1 84 194 < 1
c2670 1 168 320 < 1
c3540 6 875 2186 < 1
c5315 5 682 2607 < 1
c6288 39 3104 10235 < 1
c7552 18 6212 22632 < 1


Table 16: Average run time for combinational circuits.



