EFFICIENT ALGORITHMS FOR ELECTRONIC CAD
By
VENKAT THANVANTRI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1995
© Copyright 1995
by
Venkat Thanvantri
ACKNOWLEDGMENTS
My heartfelt appreciation goes to my advisor Professor Sartaj Sahni for giving
me continued guidance in my thesis work. I thank him for the help, patience and
support he provided throughout my stay at the University of Florida. Weekly meetings
and discussions with him have spawned many ideas, for which I am thankful.
Special thanks go to Dr. Baba Vemuri for his interest in my research and for
serving on my supervisory committee. I would like to thank the other members of my
supervisory committee, Dr. Tim Davis, Dr. LiMin Fu and Dr. John Harris, for their
interest and comments. I thank Dr. Rajasekharan for agreeing to attend the thesis
defense on very short notice.
Thanks go to Seonghun Cho for his willingness to discuss the general subject of
algorithms.
I thank Archana Nair for being a good friend and giving encouragement when I
needed it most.
Thanks go to my sister Lakshmi and brother-in-law Anand for being there when
I needed them. Finally, I would like to thank my parents for their love and support, without
which I could not have pursued my doctoral studies. To them I dedicate this work.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 Background
1.2 Physical Design Automation
1.3 Thesis Outline
2 FOLDING A STACK OF EQUAL WIDTH COMPONENTS
2.1 Background
2.2 Introduction
2.3 Normalization
2.4 Equal-Width Height-Constrained
2.5 Parametric Search
2.6 Equal-Width Width-Constrained
2.7 Experimental Results
2.8 Conclusions
3 STANDARD AND CUSTOM CELL FOLDING
3.1 Introduction
3.2 Standard Cell Folding (Problems 1-4)
3.2.1 Width Constrained Case (Problems 1 and 2)
3.2.2 Height Constrained Case (Problems 3-4)
3.3 Standard Cell Folding (Problems 5-7)
3.3.1 Minimum Channel Height (Problem 5)
3.3.2 Minimize Chip Area Subject to Width Constraint (Problem 6)
3.3.3 Minimize Chip Area Subject to Height Constraint (Problem 7)
3.4 Custom Cell Folding (Problems 8 and 9)
3.4.1 Width Constrained Folding (Problem 8)
3.4.2 Height Constrained Folding (Problem 9)
3.5 Experimental Results
3.6 Conclusions
4 PLANAR TOPOLOGICAL ROUTABILITY
4.1 Introduction
4.2 Preliminaries
4.3 The Algorithm
4.4 Topological Routability of Multipin Nets
4.5 Implementation of Two-Pin Algorithm
4.6 Experimental Results
4.7 Conclusion
5 CONCLUSIONS AND FUTURE WORK
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
2.1 Summary of results of Paik and Sahni
2.2 Comparison of equal-width height-constrained algorithms
2.3 Run times of equal-width width-constrained algorithms
3.1 Heights produced by width-constrained standard cell folding algorithms
3.2 Run times of width-constrained folding algorithms for custom cells
4.1 Treelike Connected Circuits
4.2 Six-Way Connected Circuits
4.3 Random Circuit
4.4 Faster Termination for Non-Routable Circuits
4.5 Eight-way Connected Circuits with Multipin Nets
4.6 Treelike Connected Circuits With Multipin Nets
4.7 Faster Termination for Non-Routable Circuits With Multipin Nets
LIST OF FIGURES
2.1 Stack of equal width components
2.2 Routing space reserved
2.3 Case when $h_j + r_{j+1} \le r_j$
2.4 Case when $h_j + r_j \le r_{j+1}$
2.5 Normalizing a stack
2.6 Procedure to obtain a minimum width folding
2.7 Procedure for parametric search
3.1 Standard cell Architecture
3.2 Procedure to obtain a minimum height folding
3.3 Procedure to obtain a minimum channel height folding
3.4 Procedure to obtain a minimum height folding for custom cells
3.5 Procedure to delete F(.) values as in Observation 3
3.6 Procedure to Insert F(.) values
4.1 A planar routable and a nonplanar routable case
4.2 Augmentation
4.3 An example to illustrate some terminology
4.4 Two possibilities to connect a and b
4.5 Another not planar routable situation
4.6 Constructing the envelope of a component
4.7 Rerouting to free independent component
4.8 Topological routing
4.9 Example RI
4.10 Illustration of the routing sequence
4.11 Trapped terminal and module
4.12 To illustrate conflict
4.13 Algorithm to find routing path between pins r and s
4.14 Realization of a planar net using one Steiner point
4.15 Transformation from multipin to two-pin nets
4.16 Topological routing of multipin net for restricted version
4.17 Example RI with a four pin net
4.18 Illustration of the routing sequence with Multiterminal net
4.19 A possible situation where the RI is unroutable
4.20 Treelike connected circuits
4.21 Six-way connected circuits
4.22 Treelike connected circuits with multipin nets
4.23 Eight-way connected circuits with multipin nets
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
EFFICIENT ALGORITHMS FOR ELECTRONIC CAD
By
Venkat Thanvantri
August 1995
Chairman: Dr. Sartaj Sahni
Major Department: Computer and Information Science and Engineering
In this thesis, we develop efficient algorithms for three problems that arise in
electronic computer aided design (ECAD): (1) component stack folding, (2) standard
and custom cell folding, and (3) planar topological routing.
The component stack folding problem arises in the layout of bit-slice
architectures. We consider two versions of this problem. In both, the components have equal
width and when a stack is folded, a routing penalty is incurred at the fold. In the
first version, the height of the folded layout is given and we have to minimize the
width. In the second, the width of the folded layout is given and its height is to be
minimized. We develop a normalization technique that permits the first version to
be solved in linear time by a greedy algorithm. The second version can be solved
efficiently using normalization and parametric search.
In standard and custom cell folding, the component list is folded into rows and a
routing penalty is incurred between two rows. In the model we consider, the number
of wires that have to cross between two rows serves as the routing penalty. Nine
versions of the folding problem are formulated and efficient algorithms are developed
for each.
We develop a simple, fast linear time algorithm to determine if a collection of
two-pin nets can be routed topologically in a plane. Topological routability testing
of a collection of multipin nets is shown to be equivalent to planarity testing, and a
simple linear time algorithm is developed for the case when the collection of modules
remains connected following the deletion of all nets with more than two pins.
Experimental results are presented.
CHAPTER 1
INTRODUCTION
1.1 Background
With current technology, a single chip can have several million transistors. Design
and fabrication of such chips is made possible by the automation of the steps
involving the development of the chip. Starting with the formal specifications, the
VLSI design cycle goes through a series of steps to produce the final product, a fully
packaged chip. The VLSI design cycle consists of the following steps [24]:
1. System Specification: In this step the high level representation of the system is
created. Performance, functionality, the physical dimensions, the choice of the
design techniques and the fabrication technology are considered in this step.
2. Functional Design: The output of this step is a timing diagram which is
obtained by considering the behavioral aspects of the system.
3. Logic Design: The logic design, in general, is represented by Boolean
expressions. The logic design that represents the functional design is obtained in
this step. The Boolean expressions are minimized to obtain the smallest logic
design. Correctness of the logic design is also asserted in this step.
4. Circuit Design: A circuit which represents the logic design of the system is
developed in this step by taking into consideration speed and power
requirements, and electrical behavior of the components used in the development of
the circuit.
5. Physical Design: This is the most time consuming step in the VLSI design
cycle. In this step, the components and the interconnections are represented by
geometric patterns. The objective of this step is to obtain an arrangement of
these geometric patterns which minimizes the area and power and satisfies the
timing requirements of the chip. Due to its high complexity this step is broken
down into smaller substeps. We will look into this step in detail later in this
chapter.
6. Design Verification: In this step design rule checking and circuit extraction are
done to verify that the circuit layout from the physical design step satisfies the
system specification and design rules.
7. Fabrication: The verified layout is used in the fabrication process to produce
the chip.
8. Packaging, Testing and Debugging: The fabricated chip is packaged and tested
to ensure proper functioning.
Each step in the design cycle can be viewed as a change in representation of the
system. The steps in the VLSI design cycle iteratively improve the representation to
meet the specifications.
1.2 Physical Design Automation
The physical design step maps a circuit design into a physical circuit. The input
to this step is a circuit design which is represented by a set of modules, a set of nets,
a chip carrier and the design rules. The modules and the chip carrier are usually
rectangular. The output of the physical design step is a layout for modules and
interconnections which has the desired functionality.
There are several objective functions that are used in the physical design step.
If the chip size is not fixed then the objective is to find a minimum area layout. When
the circuit speed is a consideration, the objective may be to minimize the critical net
length or minimize the sum of connection lengths.
The field of physical design automation involves developing algorithms and data
structures which can be used in the layout process. The algorithms are used to obtain
solutions which satisfy the objective functions and which meet the design rules. Large
designs and the iterative improvements by the physical design engineers require that
the algorithms developed be very fast.
Physical design is an extremely complex process that is usually broken down
into smaller problems such as partitioning, floorplanning and placement, routing and
compaction.
In the partitioning step, the components of a large circuit are divided into a
collection of smaller subcircuits/modules according to some criteria. The factors that
are considered may be the size of the modules, number of modules and the number of
interconnections between the modules. At the end of the partitioning step, we have
a set of modules and a set of interconnections required between modules.
Selecting areas, power consumption, aspect ratios, and I/O pin locations of the
modules forms the floorplanning step. The floorplanning step optimizes design quality
in terms of chip area, power consumption, timing performance and wire density.
Floorplanning is an important step as it lays the foundation for the final layout.
The precise locations of the components are determined during the placement step
to optimize area and timing.
The routing phase completes the interconnections between the modules. Routing
is usually divided into three smaller subproblems: global routing, detailed
routing and specialized routing. The global router decomposes a large routing
problem into small and manageable problems. Steiner trees and spanning trees are
the commonly used approaches for net connection in global routing. Detailed routing
includes switchbox, channel and planar routing. Planar routing is a problem in which
the interconnection topology of the nets is planar. That is, all connections can be realized
on a single layer. Single layer routing is not always possible. In MCM technology
with many routing layers, a subset of nets that is planar routable is preferred. Planar
routing is usually preferred as no vias are needed for the interconnections. Vias reduce
the reliability and performance of a circuit. Routing clock nets and power-ground
nets are specialized routing problems.
During the compaction phase, the components and the interconnections are
moved so as to further optimize the layout in terms of area and delay. By compressing
the chip, the components come closer thereby reducing the delay between the
components. This step must also ensure that by compressing the chip, design rules
are not violated.
1.3 Thesis Outline
One of the placement methods is to obtain a linear list of components minimizing
some criterion and then to fold this list into a given height or width so as to
minimize the area. The objective functions that can be used when forming the linear
list of components include minimizing the maximum density of wires between adjacent
components, minimizing the total number of wire segments between adjacent
components, and minimizing the maximum net length [1].
In the bit-sliced placement model introduced by Paik and Sahni [21], component
reordering is not permitted. That is, the components are ordered by some
objective function to obtain a component stack. This stack is folded into a layout
which is either height-constrained (layout height is given) or width-constrained (layout
width is given). In Chapter 2, we look into two problems considered by Paik and
Sahni [21]. We introduce a normalization technique which in combination with the
greedy method and parametric search helps develop linear time algorithms for these
problems.
In Chapter 3, we develop optimal algorithms to fold a linearly ordered list of
standard and custom cells under various optimization constraints. A total of nine
problems are formulated and their solutions provided.
In Chapter 4, we look into the planar topological routability problem. We
develop a linear time algorithm for planar topological routability for the case when
all nets are 2-pin nets. This algorithm determines the topological routability of the
given problem instance and the loose route of wires when the instance is topologically
routable. We also consider the case when there are multipin nets. For this case, we
prove that (a) the topological routability problem is equivalent to the graph planarity
problem, and (b) the problem of finding the maximum number of nets that are
topologically routable is NP-hard. A linear time algorithm is developed for the case
when the circuit modules remain connected following the deletion of all nets that
have more than two pins.
Finally, we present conclusions and some future directions for this research.
CHAPTER 2
FOLDING A STACK OF EQUAL WIDTH COMPONENTS
2.1 Background
Wu and Gajski [31] introduced a new sliced-layout architecture to alleviate the
problems of the general bit-sliced layouts. Most fabricated chips can be described
by register-transfer schematics. In addition to gates, latches, and flip-flops,
schematics include register-transfer components such as registers, counters, adders, ALUs,
shifters, multiplexers, and register files. Standard cell methodology decomposes the
components into basic gates, latches, and flip-flops before layout. Wu and Gajski [31]
suggest that greater layout density can be achieved if register-transfer components
are laid out in a bit-sliced layout architecture.
For each microarchitectural component there is a layout generator that includes
bit-slice generators. All generated bit slices are of the same width. If a component has
a width w, then it has w slices. Each microarchitectural component has a different
height. The component includes the cell-abutment, over-the-cell routing, and inter-slice
switch box to alleviate the problems of the previous approaches. Intracell routing
is done on metal 1 and intercell routing on metal 2. All regular components in a
design are stacked (stack of components) and routed in metal 2. Component stack
folding, in the context of bit-sliced architectures introduced by Larmore, Gajski, and
Wu [14], is to fold this stack into itself in a way that minimizes the wasted area.
Stack folding (stack partitioning) is also done in case there are too many components
in a single stack. In their paper [14], they used this model to compile layout for CMOS
technology. Further applications of the model were considered by Wu and Gajski
[31].
In the model of Larmore et al. [14] and Wu and Gajski [31] the component stack
can be folded at only one point. In addition, it is possible to reorder the components
on the stack. These folding schemes begin by reordering the components by width.
They also show that the folding problem using this model is NP-complete.
A related, yet different, folding model was considered by Paik and Sahni [21].
In this, no limit is placed on the number of points at which the stack may be folded.
Also, component reordering is forbidden. They point out that this restriction is
realistic as the component stack is usually ordered so as to minimize inter-component
routing requirements and optimize performance. They also point out that this model
may be used in the application cited by Larmore et al. [14] and Wu and Gajski
[31]. Furthermore, it accurately models the placement step of the standard cell and
sea-of-gates layout algorithms of Shragowitz et al. [25, 26]. In the case of standard
cell designs, all modules have the same width while in the case of sea-of-gates designs
module widths and heights vary from module to module.
2.2 Introduction
A stack of equal width components is comprised of variable height components
$C_1, C_2, \ldots, C_n$ stacked one on top of the other. $C_1$ is at the top of the stack and
$C_n$ at the bottom (Figure 2.1(a)). If the stack is realized, physically, in this way,
the area needed is $w \sum_{i=1}^{n} h_i$, where $h_i > 0$ is the height of $C_i$ and $w > 0$ is the width
of each component. If the component stack is folded at $C_i$ we obtain two adjacent
stacks $C_1, C_2, \ldots, C_i$ and $C_n, C_{n-1}, \ldots, C_{i+1}$. The folding also inverts the left to
right orientation of the components $C_n, \ldots, C_{i+1}$. Figure 2.1(b) shows the stack of
Figure 2.1(a) after folding at two components, $C_i$ and $C_j$. Notice that folding results in a snake-like
rearrangement. While not apparent from the figure, each fold flips the left-to-right
orientation of a component. As can be seen from Figure 2.1(b), pairs of folded
stacks may have nested components; components in odd stacks are left aligned; and
components in even stacks are right aligned. The area of the folded stack is the area
of the smallest rectangle that bounds the layout. To determine this, depending on
the model, we may need to add additional space at the stack ends to allow for routing
between components $C_i$ and $C_{i+1}$, where $C_i$ is a folding point. If so, let $r_i \ge 0$,
$2 \le i \le n$, denote the height of the routing space needed if the stack is folded at $C_{i-1}$
(Figure 2.2).

Figure 2.1. Stack of equal width components: (a) Component stack; (b) Folded into three stacks
Figure 2.2. Routing space reserved (interstack routing)

Table 2.1. Summary of results of Paik and Sahni
                                                  Routing area at stack ends
                                                  No                   Yes
Equal width, height constrained                   $O(n)$               $O(n^2)$
Equal width, width constrained                    $O(n)$               $O(n^3)$
Equal height, height constrained                  $O(n^4 \log n)$      $O(n^4 \log n)$
Equal height, width constrained                   $O(n^4 \log^2 n)$    $O(n^4 \log^2 n)$
Variable heights and widths, height constrained   $O(n^5 \log n)$      $O(n^5 \log n)$
Variable heights and widths, width constrained    $O(n^5 \log^2 n)$    $O(n^5 \log^2 n)$
(Source: Paik and Sahni [21])
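To make the layout model of this section concrete, the following Python sketch computes the bounding-rectangle dimensions of a folded stack. It is only an illustration under stated assumptions: the function name `folded_dimensions`, the 0-based input lists, and the convention that a stack holding $C_a, \ldots, C_b$ needs height $r_a + h_a + \cdots + h_b + r_{b+1}$ (routing space at both stack ends, with $r_1 = r_{n+1} = 0$) are choices made here for illustration.

```python
def folded_dimensions(h, r, fold_after, w=1):
    """Return (height, width) of the bounding rectangle of a folded stack.

    h[k] is the height of component C_{k+1}; r[k] is the routing height
    r_{k+1} needed if the stack is folded at C_k (r[0] must be 0).
    fold_after is an increasing list of 1-based indices i such that the
    stack is folded at C_i. All names here are hypothetical.
    """
    n = len(h)
    r_ext = list(r) + [0]                 # append r_{n+1} = 0
    bounds = [0] + list(fold_after) + [n]
    stack_heights = []
    for a, b in zip(bounds, bounds[1:]):
        # This stack holds C_{a+1} .. C_b (0-based slice a:b), with
        # routing space r_{a+1} at one end and r_{b+1} at the other.
        stack_heights.append(r_ext[a] + sum(h[a:b]) + r_ext[b])
    # Assumption: stacks sit side by side, so the layout is as tall as
    # its tallest stack and as wide as the number of stacks times w.
    return max(stack_heights), w * len(stack_heights)


# Three components with heights 3, 1, 4 and routing heights r_2 = 5, r_3 = 2.
print(folded_dimensions([3, 1, 4], [0, 5, 2], fold_after=[1]))  # fold at C_1
print(folded_dimensions([3, 1, 4], [0, 5, 2], fold_after=[2]))  # fold at C_2
```

In this small instance, folding at $C_2$ gives a shorter layout than folding at $C_1$, because the large routing height $r_2$ is avoided.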
In practical situations, the height (width) of the rectangle into which the stack
is to be folded may be limited (and known in advance) and we are to minimize the
width (height). Several versions of folding into height (width) constrained rectangles
were considered by Paik and Sahni [21]. Their results are summarized in Table 2.1.
In this chapter we consider two of the problems considered in Paik and Sahni [21]:
(1) Equal-width, height-constrained with routing area at stack ends. In this problem,
we are to fold a stack of equal width components into a rectangle of given
height so as to minimize the width (and hence area) of the rectangle. For this
problem, the algorithm of [21] runs in $O(n^2)$ time. We develop an $O(n)$ algorithm.
(2) Equal-width, width-constrained with routing area at stack ends. Here the width
of the rectangle into which the folding occurs is given and we are to minimize
its height (and hence area). Four algorithms with complexity $O(n \log n)$,
$O(n \log\log n)$, $O(n \log^* n)$, and $O(n)$, respectively, are obtained. Experimental
results indicate that the $O(n \log n)$ algorithm is fastest in practice. This is due
to the fact that this algorithm has the least overhead.
Our algorithms employ two techniques. The first is normalization in which an
input instance is transformed into an equivalent normalized instance that is relatively
easy to solve. The second technique is parameterized searching. In Section 2.3 we
describe our normalization technique and then in Section 2.4, we show how this
results in a linear time algorithm for the equal-width height-constrained problem.
Parameterized searching is described in Section 2.5 and then used in Section 2.6 to
obtain the algorithms for the equal-width width-constrained problem. Experimental
results comparing the relative performance of the various algorithms for the
equal-width width-constrained problem are given in Section 2.7.
2.3 Normalization
Let $h_i$ be the height of the component $C_i$, $1 \le i \le n$. Let $r_i$ be the routing height
needed between $C_{i-1}$ and $C_i$ if the component stack is folded at $C_{i-1}$, $2 \le i \le n$; and
let $r_1 = r_{n+1} = 0$. The component stack so defined is normalized if the conditions C1
and C2 given below are satisfied for every $i$, $1 \le i \le n$.
C1: $h_i + r_{i+1} > r_i$
C2: $h_i + r_i > r_{i+1}$
Figure 2.3. Case when $h_j + r_{j+1} \le r_j$ (panels (a) and (b))
An unnormalized instance $I$ may be transformed into a normalized instance $\hat{I}$
with the property that from a minimum height or minimum width folding of $\hat{I}$, one
can easily construct a similar folding for $I$. To obtain $\hat{I}$, we identify the least value of
$i$ at which either C1 or C2 is violated. Let this value of $i$ be $j$. By choice of $j$, either
$h_j + r_{j+1} \le r_j$, or
$h_j + r_j \le r_{j+1}$.
We first note that (since $h_j > 0$) it is not possible for both of these inequalities to
hold simultaneously. Suppose that $h_j + r_{j+1} \le r_j$. Now $j > 1$ as $h_1 + r_2 > 0$ while
$r_1 = 0$. Also, $h_j + r_j > r_{j+1}$. Consider any folding of $I$ in which $C_{j-1}$ is a fold
point (Figure 2.3(a)). Let the height of the stack $S_1$ be $h(S_1)$ and that of $S_2$, $h(S_2)$.
Consider the folding obtained from Figure 2.3(a) by moving $C_j$ from $S_2$ to $S_1$
(Figure 2.3(b)). Let the height of the stacks now be $h'(S_1)$ and $h'(S_2)$. We see that
$h'(S_1) = h(S_1) - r_j + h_j + r_{j+1} \le h(S_1)$
and
$h'(S_2) = h(S_2) - r_j - h_j + r_{j+1} < h(S_2)$.
So, the height and width of the folding of Figure 2.3(b) are no more than those of
Figure 2.3(a). Hence, the instance $I'$ obtained from $I$ by replacing the component
pair $((h_{j-1}, r_{j-1}), (h_j, r_j))$ with the single component $(h_{j-1} + h_j, r_{j-1})$ has the same
minimum width/height folding as does $I$. From a minimum width/height folding for
$I'$ one can obtain one for $I$ by replacing the component $(h_{j-1} + h_j, r_{j-1})$ with the two
components of $I$.
If $h_j + r_j \le r_{j+1}$, then $I'$ is obtained by replacing the component pair
$((h_j, r_j), (h_{j+1}, r_{j+1}))$ with the single component $(h_j + h_{j+1}, r_j)$. The proof is similar
to the previous case.
The component pair replacement scheme just described may be repeated as often
as needed to obtain a normalized instance $\hat{I}$. Note that the scheme terminates as
each replacement reduces the number of components by one and every one-component
instance is normalized.
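As a small worked illustration of one replacement step, consider a hypothetical three-component instance (the numbers are chosen here for illustration and do not come from the experiments) with $(h_1, h_2, h_3) = (3, 1, 4)$ and $(r_1, r_2, r_3, r_4) = (0, 5, 2, 0)$:

```latex
\begin{align*}
i = 1:\quad & h_1 + r_2 = 8 > r_1 = 0 \ \text{(C1 holds)}, \qquad
              h_1 + r_1 = 3 \le r_2 = 5 \ \text{(C2 violated, so } j = 1\text{)} \\
\text{replace:}\quad & ((h_1, r_1), (h_2, r_2)) = ((3, 0), (1, 5))
              \ \longrightarrow\ (h_1 + h_2,\ r_1) = (4, 0) \\
\text{new instance:}\quad & (4, 0),\ (4, 2), \qquad r_3 = 0 \text{ after renumbering} \\
\text{check:}\quad & 4 + 2 > 0,\quad 4 + 0 > 2,\quad 4 + 0 > 2,\quad 4 + 2 > 0
\end{align*}
```

After this single replacement both conditions hold for both remaining components, so the two-component instance is normalized.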
The preceding discussion leads to the normalization procedure Normalize of
Figure 2.5. The input to this procedure is a component stack C[1]...C[n] and the
output is a normalized stack C[1]...C[n] (the input n (say n'') will generally be
larger than the output n (say n')).

Figure 2.4. Case when $h_j + r_j \le r_{j+1}$ (panels (a) and (b))

Procedure Normalize(C, n)
{ Normalize the component stack C[1]...C[n] }
i := 1; next := 2;
while next ≤ n + 1 do
  case
    : C[i].h + C[next].r ≤ C[i].r :
        { Combine with C[i-1] }
        C[i-1].h := C[i-1].h + C[i].h;
        C[i-1].l := C[i].l;
        i := i - 1;
    : C[i].h + C[i].r ≤ C[next].r :
        { Combine with C[next] }
        C[i].h := C[i].h + C[next].h;
        C[i].l := C[next].l;
        next := next + 1;
    : else : C[i+1] := C[next];
        i := i + 1; next := next + 1;
  end;
n := i - 1;
end; {Normalize}
Figure 2.5. Normalizing a stack

C[i].h, C[i].r, C[i].f, and C[i].l, respectively, give the height, routing height needed
if the stack is folded at C[i-1], index of the first input component represented by C[i],
and index of the last input component represented by C[i]. At input, we have
C[i].h = $h_i$
C[i].r = $r_i$
C[i].f = C[i].l = i
for $1 \le i \le n$, and C[n+1].r = 0. Note that, by definition, C[1].r = $r_1$ = 0. On
output, component C[i] is the result of combining together the input components
C[i].f, C[i].f + 1, ..., C[i].l. The heights and the r values are appropriately set. The correctness
of procedure Normalize is established in Theorem 1. Its complexity is O(n) as each
iteration of the while loop takes constant time; the first two case clauses can be
entered at most a total of n - 1 times as on each entry the number of components is
reduced by 1. The else clause can be entered at most n times as on each entry
next increases by 1 and this variable is never decreased in the procedure.
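The procedure of Figure 2.5 can be sketched in Python as follows. This is a direct transcription under assumptions, not a reference implementation: the names `Comp` and `normalize` are hypothetical, the input is given as two 0-based lists, and the two merge tests are taken to be non-strict (a pair is merged when C1 or C2, read as strict inequalities, fails).

```python
from dataclasses import dataclass


@dataclass
class Comp:
    h: int  # height of the (possibly merged) component
    r: int  # routing height needed if the stack is folded just before it
    f: int  # index of the first input component it represents
    l: int  # index of the last input component it represents


def normalize(heights, routing):
    """Return the normalized stack as a list of (h, r, f, l) tuples.

    heights[k] and routing[k] hold h_{k+1} and r_{k+1}; routing[0] must be 0
    and all heights must be positive (assumptions taken from the text).
    """
    n = len(heights)
    if n == 0 or routing[0] != 0 or min(heights) <= 0:
        raise ValueError("need n >= 1, r_1 = 0 and positive heights")
    # 1-based array: slot 0 is unused, slot n+1 is a sentinel with r = 0.
    C = [None] + [Comp(heights[k], routing[k], k + 1, k + 1) for k in range(n)]
    C.append(Comp(0, 0, n + 1, n + 1))
    i, nxt = 1, 2
    while nxt <= n + 1:
        if C[i].h + C[nxt].r <= C[i].r:      # C1 fails: combine with C[i-1]
            C[i - 1].h += C[i].h
            C[i - 1].l = C[i].l
            i -= 1
        elif C[i].h + C[i].r <= C[nxt].r:    # C2 fails: combine with C[next]
            C[i].h += C[nxt].h
            C[i].l = C[nxt].l
            nxt += 1
        else:                                # C[i] is fine: advance
            C[i + 1] = C[nxt]
            i += 1
            nxt += 1
    # The output size is i - 1; slots 1 .. i-1 hold the normalized stack.
    return [(c.h, c.r, c.f, c.l) for c in C[1:i]]


print(normalize([3, 1, 4], [0, 5, 2]))
print(normalize([5, 1, 5], [0, 4, 1]))
```

In the first call the second case clause fires (component 2 is absorbed into component 1); in the second call the first clause fires. Both yield two-component stacks whose `f` and `l` fields record which input components were merged.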
Theorem 1: Procedure Normalize produces an equivalent normalized component stack.
Proof: The procedure maintains the following invariant at the start of each iteration
of the while loop:
Invariant: Normalizing conditions C1 and C2 are satisfied by all components C[j], j < i.
This is clearly true when i = 1 as there is no component C[j] with j < 1. If
the invariant is true at the start of some iteration, then it is true at the end of that
iteration. To see this, note that if we enter the first clause of the case then following
the execution of this clause, C[j].h, C[j].r, C[j+1].r, j < i', where i' is the value of
i following execution of the clause, are unchanged. So, the execution does not affect
C1 and C2 for j < i'. If the second case clause is entered, then again C1 and C2
are unaffected by the execution for j < i as C[j].h, C[j].r, and C[j+1].r, j < i, are
unchanged. When the third clause is entered the validity of C1 and C2 for j < i'
follows from the fact that the conditions for the first two clauses are false.
On termination, next = n'' + 2 (n'' is the input value of n). The last iteration of the
while loop could not have entered the first clause of the case statement as in this
clause, next is not increased. While in the second clause, next is increased, the
condition C[i].h + C[i].r ≤ C[next].r cannot be true in the last iteration as now
next = n'' + 1, C[i].h + C[i].r > 0, and C[n''+1].r = 0. So, the last iteration caused
execution of the third clause of the case statement. As a result, C[n''+1] is moved to
position n' + 1 of C. From the invariant, it follows that C1 and C2 are satisfied for
j < i' = n' + 1 (note i' is the final value of i). Hence the output component stack
C[1]...C[n'] is normalized. □
Theorem 2 establishes an important property of a normalized stack. This property
enables one to obtain efficient algorithms for the two folding problems considered
in this chapter.
Theorem 2: Let $(h_i, r_i)$, $1 \le i \le n$, define a normalized component stack. Assume
that $r_1 = r_{n+1} = 0$. The following are true:
P1: $r_k + \sum_{j=k}^{l} h_j + r_{l+1} < r_{k-1} + \sum_{j=k-1}^{l} h_j + r_{l+1}$, $1 < k \le l \le n$
P2: $r_k + \sum_{j=k}^{l} h_j + r_{l+1} < r_k + \sum_{j=k}^{l+1} h_j + r_{l+2}$, $1 \le k \le l < n$
Proof: Direct consequence of C2 and C1, respectively. □
Intuitively, Theorem 2 states that the height needed by a contiguous segment
of components from a normalized stack increases when the segment is expanded by
adding components at either end.
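The one-line proof can be spelled out by subtracting the two sides of each property. The block below is a sketch of that step, assuming C1 and C2 are read as the strict inequalities $h_i + r_{i+1} > r_i$ and $h_i + r_i > r_{i+1}$:

```latex
\begin{align*}
\Bigl(r_{k-1} + \sum_{j=k-1}^{l} h_j + r_{l+1}\Bigr)
  - \Bigl(r_k + \sum_{j=k}^{l} h_j + r_{l+1}\Bigr)
  &= h_{k-1} + r_{k-1} - r_k \;>\; 0
  && \text{(C2 at } i = k-1\text{)} \\
\Bigl(r_k + \sum_{j=k}^{l+1} h_j + r_{l+2}\Bigr)
  - \Bigl(r_k + \sum_{j=k}^{l} h_j + r_{l+1}\Bigr)
  &= h_{l+1} + r_{l+2} - r_{l+1} \;>\; 0
  && \text{(C1 at } i = l+1\text{)}
\end{align*}
```

The first line gives P1 (extending the segment upward by $C_{k-1}$) and the second gives P2 (extending it downward by $C_{l+1}$).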
2.4 Equal-Width Height-Constrained
The height of the layout is limited to h and we are to fold the component stack so
as to minimize its width. This can be accomplished in linear time by first normalizing
the stack and then using a greedy strategy to fold only when the next component
cannot be accommodated in the current stack segment without exceeding the height
bound h. The algorithm is given in Figure 2.6.
From the correctness of procedure Normalize, it follows that a minimum width
folding of the normalized instance is also a minimum width folding of the initial
instance. So, we need only to show that the for loop generates a minimum width
folding of the normalized instance generated by the procedure Normalize. This follows
from properties P1 and P2 (Theorem 2) of a normalized instance. Since a segment
size cannot decrease by adding more components at either end, the infeasibility test
is correct. Also, there can be no advantage to postponing the layout of a component
to the next segment if it fits in the current one.
Procedure MinimizeWidth(C, n, h, width)
{ Obtain a minimum width folding whose height is at most h }
Normalize(C, n);
used := 0; width := 1;
for i := 1 to n do
  case
    : used - C[i].r + C[i].h + C[i+1].r ≤ h :
        { assign C[i] to current segment }
        used := used - C[i].r + C[i].h + C[i+1].r;
    : C[i].r + C[i].h + C[i+1].r > h :
        { infeasible instance }
        output error message; terminate;
    : else : { start next segment, fold at C[i-1] }
        width := width + 1;
        used := C[i].r + C[i].h + C[i+1].r;
  end;
end; {MinimizeWidth}
Figure 2.6. Procedure to obtain a minimum width folding
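The greedy loop of Figure 2.6 can be sketched in Python as below. This is an illustrative sketch only: the name `min_width_folding` is hypothetical, the input is assumed to be already normalized (the normalization step is omitted to keep the example short), `r[0]` is assumed to be 0, and the running height `used` is assumed to start at 0 for the first, still empty, segment.

```python
def min_width_folding(h, r, H):
    """Greedy folding of a normalized stack into height at most H.

    h[k], r[k] hold h_{k+1} and r_{k+1} of a normalized stack (r[0] == 0).
    Returns the list of segments (each a list of 1-based component
    indices), or None if some component cannot fit within height H.
    """
    n = len(h)
    r_ext = list(r) + [0]            # append r_{n+1} = 0
    segments = [[]]
    used = 0                         # height used by the current segment
    for i in range(n):
        alone = r_ext[i] + h[i] + r_ext[i + 1]
        # Replace the trailing routing space r_i of the current segment
        # by component i plus its own trailing routing space.
        extended = used - r_ext[i] + h[i] + r_ext[i + 1]
        if extended <= H:            # component fits: keep it in this segment
            segments[-1].append(i + 1)
            used = extended
        elif alone > H:              # does not fit even alone: infeasible
            return None
        else:                        # fold before this component
            segments.append([i + 1])
            used = alone
    return segments


stack_h, stack_r = [4, 4], [0, 2]    # a normalized two-component stack
print(min_width_folding(stack_h, stack_r, 8))
print(min_width_folding(stack_h, stack_r, 7))
print(min_width_folding(stack_h, stack_r, 5))
```

The width of the folding is the number of segments returned. With height bound 8 both components share one segment; with bound 7 a fold is needed; with bound 5 the first component alone (height 4 plus routing height 2) does not fit.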
Table 2.2. Comparison of equal-width height-constrained algorithms
n      Paik and Sahni [21]    Figure 2.6
16     0.11                   0.05
64     1.80                   0.14
256    24.85                  0.52
Times are in milliseconds
Note that while we are able to solve the equal-width height-constrained problem
in linear time using a combination of normalizing and the greedy method, the
algorithm of Paik and Sahni [21] uses dynamic programming on the unnormalized
instance and takes $O(n^2)$ time. In Table 2.2, we give the observed run times of the two
algorithms. These were obtained by running C programs on a SUN 4 workstation.
As is evident, our algorithm is considerably superior to that of [21] even on small
instances.
2.5 Parametric Search
In this section, we provide an overview of the parametric search method of
Frederickson [4], which uses developments by Frederickson and Johnson [5, 6] and
Frederickson [3]. This overview has, however, been tailored to suit our application
here and is not as general as that provided by Frederickson and coworkers [3, 4, 5, 6].
Assume that we are given a sorted matrix of O(n2) candidate values M,,, 1 <
i,j < n. By sorted, we mean that
Mi < Mij+, < i
and Mij 5 Mi+lj, 1 < i < n, 1
The matrix is provided implicitly. That is, we are given a way to compute $M_{ij}$,
in constant time, for any value of i and j. We are required to find the least $M_{ij}$
that satisfies some criterion F. The criterion F has the property that if F(x) is not
satisfied (i.e., it is infeasible), then F(y) is infeasible for all y ≤ x. Similarly, if
F(x) is satisfied (i.e., it is feasible), then F(y) is feasible for all y ≥ x. In a parametric
search, the minimum $M_{ij}$ that satisfies F is found by trying out some of the $M_{ij}$'s.
As different Mijs are tried, we maintain two values A1 and A2, A1 < A2 with the
properties:
(a) F(A1) is infeasible.
(b) F(A2) is feasible.
Initially, A1 = 0 and A2 = ∞ (we assume F is such that F(0) is infeasible, F(∞)
is feasible, and $M_{ij} > 0$ for all candidate values). To determine the next candidate
value to try, we begin with the matrix set S = {M}. At each iteration, the matrices
in S are partitioned into four equal sized matrices (assume, for simplicity, that n
is a power of 2). As a result of this, the size of S becomes four times its previous
size. Next, a set T comprised of the largest and smallest elements from each of the
matrices in S is constructed. The median of T is the candidate value x to try next.
The following possibilities exist for x and F(x):
(1) x ≤ A1. Since F(A1) is infeasible, F(y) is infeasible for all y ≤ A1. So, F(x) is
infeasible.
Procedure PSEARCH(S, A1, A2, dimension, finish);
repeat
  if dimension > 1 then [ replace each matrix in S by
                          four equal sized submatrices;
                          dimension := dimension/2 ]
  for i := 1 to 3 do
  begin
    if dimension = 1 then
      [ Let T be the multiset of values in all matrices of S; ]
    else
      [ Let T be the multiset obtained by selecting the largest
        and smallest values from each matrix of S; ]
    x := median(T);
    if (A1 < x < A2) then
      if F(x) is feasible then A2 := x
      else A1 := x;
    Eliminate from S all matrices that have no values y
    such that A1 < y < A2;
  end;
until dimension^2 * |S| <= finish;
end; {PSEARCH}
Figure 2.7. Procedure for parametric search
(2) x ≥ A2. Now, F(x) is feasible.
(3) A1 < x < A2. F(x) may be feasible or infeasible. This is determined by
computing F(x). If x is feasible, A2 is set to x. Otherwise, A1 is set to x.
Following the update (if any) of A1 or A2 resulting from trying out the candidate
value x, all matrices in S that do not contain candidate values y in the range A1 <
y < A2 may be eliminated from S.
A more precise statement of the search process is given by procedure PSEARCH
(Figure 2.7). This procedure may be invoked as PSEARCH({M}, 0, ∞, n, 0). Here, dimension
is the current number of rows or columns in each matrix of S and finish is a stopping
rule. The search for the minimum candidate that satisfies F is terminated when the
number of remaining candidates is ≤ finish. If A2 = ∞ when PSEARCH terminates,
then none of the candidate values is feasible. If A2 is finite, then it is the smallest
candidate that is feasible.
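The search loop of Figure 2.7 can be sketched in Python as follows. This is an illustrative sketch only, under several assumptions: the matrix is passed as a hypothetical 0-indexed callable `M(i, j)`, n is a power of 2, the stopping rule is fixed at finish = 0 (the loop runs until S is empty), and the median is taken by sorting T, which is simpler than, but not as fast as, the linear-time selection the complexity bounds rely on.

```python
def psearch(M, n, feasible):
    """Return the smallest positive entry x of the implicit sorted n x n
    matrix M with feasible(x) true, or infinity if there is none."""
    lo, hi = 0, float("inf")      # lo is infeasible, hi is feasible
    S = [(0, 0)]                  # top-left corners of the live submatrices
    dim = n                       # side length of every submatrix in S
    while S:
        if dim > 1:               # split every matrix into four quadrants
            half = dim // 2
            S = [(r + dr, c + dc) for (r, c) in S
                 for dr in (0, half) for dc in (0, half)]
            dim = half
        for _ in range(3):
            if not S:
                break
            T = []
            for (r, c) in S:      # smallest (and largest) value per matrix
                T.append(M(r, c))
                if dim > 1:
                    T.append(M(r + dim - 1, c + dim - 1))
            T.sort()
            x = T[(len(T) - 1) // 2]          # lower median of T
            if lo < x < hi:
                if feasible(x):
                    hi = x
                else:
                    lo = x
            # keep only matrices that may still hold a value in (lo, hi)
            S = [(r, c) for (r, c) in S
                 if M(r + dim - 1, c + dim - 1) > lo and M(r, c) < hi]
    return hi
```

For example, with `M = lambda i, j: (i + 1) * (j + 1)` on a 4 x 4 matrix and the criterion `x >= 7`, the call `psearch(M, 4, lambda x: x >= 7)` returns 8, the least candidate that satisfies the criterion.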
Since we have assumed n is a power of 2, each time a matrix is divided into
four, the submatrices produced are square and have dimension that is also a power
of 2. Since M is provided implicitly, each of its submatrices can be stored implicitly.
For this, we need merely record the matrix coordinates (indices) of the top left and
bottom right elements (actually, the latter can be computed from the former using
the submatrix dimension). The multiset T required on each iteration of the for loop
is easy to construct because of the fact that M is sorted. Note that since M is
sorted, all of its submatrices are also sorted. Consequently, the largest element of
each submatrix is in the bottom right corner and the smallest is in the top left corner.
These elements can therefore be determined in constant time per matrix of S.
Theorem 3 : [4] The number of feasibility tests F performed by procedure PSEARCH
when started with S = {M}, M an n × n sorted matrix that is provided implicitly,
is O(log n) and the total time spent obtaining the candidates for feasibility test is
O(n). □
Corollary 1 : Let t(n) be the time needed to determine if F(x) is feasible. The
complexity of PSEARCH is O(n + t(n) log n). □
For some of the algorithms we describe later, PSEARCH will be initiated with
|S| > 1 (i.e., S will contain more than one M matrix initially; all matrices in S will
still be of the same size). To analyze the complexity of these algorithms, we shall use
the following theorem and corollary.
Theorem 4 : [4] If PSEARCH is initiated with S containing m sorted matrices, each
of dimension n, then the number of feasibility tests is O(log n) and the total time
spent obtaining the candidate values for these tests is O(mn). □
Corollary 2 : Let t(n) be as in Corollary 1. The complexity of PSEARCH under the
assumptions of Theorem 4 is O(mn + t(n) log n). □
While we have described PSEARCH under the assumption that the matrices of
candidate values are square and of dimension a power of 2, parametric search easily
handles other matrix shapes and sizes. For this, we can add more rows at the top and
columns to the left so that the matrices become square and have a dimension that is
a power of 2. The entries in the new rows and columns are 0. This does not affect
the asymptotic complexity of PSEARCH. Alternatively, we can modify the matrix
splitting process to partition into four roughly equal submatrices at each step. The
details of these generalizations are given in the literature [3, 4, 5, 6].
Procedure PSEARCH is a restricted version of procedure MSEARCH of [4]. An
alternative search algorithm in which the for loop is iterated twice, once with T
being the multiset of the largest values in S and once with T being the multiset of
the smallest values in S, is given in Frederickson and Johnson [5, 6]. We experimented
with both formulations and found that, for our stack folding application, the
three-iteration formulation of Figure 2.7 is faster by approximately 43%.
2.6 Equal-Width Width-Constrained
To use parametric search to determine the minimum height folding when the
layout width is constrained to be ≤ w, we must do the following:
(1) Identify a set of candidate values for the minimum height folding. This set must
be provided implicitly as a sorted matrix with the property that each matrix
entry can be computed in constant time.
(2) Provide a way to determine if a candidate height h is feasible; i.e., can the
component stack be folded into a rectangle of height h and width w ?
In this section, for (1), we shall provide an n × n sorted matrix M (n is the
number of components in the stack) of candidate values. For the feasibility test of (2),
we can use procedure MinimizeWidth of Figure 2.6 by setting h equal to the candidate
height value being tested and then determining if width ≤ w following execution of
the procedure. Since the component stack needs to be normalized only once and since
MinimizeWidth will be invoked for O(log n) candidate values, the call to Normalize
should be removed from the procedure MinimizeWidth and normalization done before
the first invocation of this procedure. Also, the remaining code may be modified to
terminate as soon as w folds are made.
Since feasibility testing and normalization each take linear time, from
Corollary 1, it follows that the complexity of the described parametric search to find the
minimum height folding is O(n + t(n) log n) = O(n + n log n) = O(n log n).
To determine the candidate matrix M, we observe that the height of any layout
is given by
$$r_i + \sum_{q=i}^{j} h_q + r_{j+1}$$
for some i, j, $1 \le i \le j \le n$. This formula just gives us the height of the segment
that contains components $C_i$ through $C_j$. Define Q to be the n × n matrix with the
elements
$$Q_{ij} = \begin{cases} r_i + \sum_{q=i}^{j} h_q + r_{j+1}, & 1 \le i \le j \le n \\ 0, & i > j \end{cases}$$
Then for every value of w, Q contains a value that is the height of a minimum height
folding of the component stack such that the folding has width ≤ w. From Theorem
2, it follows that
$$Q_{ij} \le Q_{i,j+1}, \quad 1 \le i \le n,\ 1 \le j < n$$
$$Q_{ij} \ge Q_{i+1,j}, \quad 1 \le i < n,\ 1 \le j \le n$$
Let $M_{ij} = Q_{n-i+1,j}$, $1 \le i, j \le n$. So, M is a sorted matrix that contains all
candidate values. The minimum $M_{ij}$ for which a width w folding is possible is the
height of the minimum height width-w folding. We now need to show how the elements of M may
be computed efficiently given the index pair (i, j). Let
$$H_i = \sum_{j=1}^{i} h_j, \quad 1 \le i \le n$$
and let $H_0 = 0$. We see that
$$Q_{ij} = \begin{cases} r_i + H_j - H_{i-1} + r_{j+1}, & i \le j \\ 0, & i > j \end{cases}$$
and so,
$$M_{ij} = \begin{cases} r_{n-i+1} + H_j - H_{n-i} + r_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases}$$
So, if we precompute the $H_i$'s, each $M_{ij}$ can be determined in constant time. The
precomputation of the $H_i$'s takes O(n) time. Hence, the overall complexity of the
parametric search algorithm to find the minimum height folding remains O(n log n).
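As an illustration of the constant-time lookup, the following Python sketch builds the prefix sums once and returns a function that evaluates a candidate entry on demand. The names and the list layout are assumptions made for the example: `h[k-1]` holds $h_k$ and `r[k-1]` holds $r_k$ for k = 1, ..., n + 1, while `M(i, j)` takes the 1-based indices used in the text.

```python
def candidate_matrix(r, h):
    """Return M(i, j), the implicit candidate matrix for segment heights.

    r: routing heights r_1..r_{n+1} stored as r[0..n] (assumed layout)
    h: component heights h_1..h_n stored as h[0..n-1]
    """
    n = len(h)
    H = [0] * (n + 1)                 # H[k] = h_1 + ... + h_k, H[0] = 0
    for k in range(1, n + 1):
        H[k] = H[k - 1] + h[k - 1]

    def M(i, j):
        if i + j < n + 1:
            return 0                  # padding entries below the candidates
        # r_{n-i+1} + H_j - H_{n-i} + r_{j+1}, shifted to 0-based lists
        return r[n - i] + H[j] - H[n - i] + r[j]

    return M
```

With `h = [2, 3, 4]` and all routing heights equal to 1, `M(1, 3)` is 6 (the segment holding only the last component) and `M(3, 3)` is 11 (the whole stack in one segment).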
We note that our O(n log n) algorithm is very similar to the O(n log n) algorithm
of [5] to partition a path into k subpaths such that the length of the shortest subpath
is maximized. The differences are that
(1) We need to normalize the component stack before parametric search can be
used, and
(2) The definition of $M_{ij}$ needs to be adjusted to account for the routing heights $r_i$
and $r_{j+1}$ needed at either end of the stack.
Frederickson and Johnson [5, 6] and Frederickson [3, 4] present several refinements
of the basic parametric search technique.
These refinements apply to the equal-width width-constrained problem just as well
as to the path partitioning problem provided we start with a normalized instance and
use the candidate matrix M defined above. These refinements result in algorithms of
complexity O(n log log n), O(n log* n), and O(n) for our component stack problem.
2.7 Experimental Results
The four parametric search algorithms for the equal-width width-constrained
problem were programmed in C and run on a SUN 4 workstation. For comparison
purposes, the $O(n^3)$ dynamic programming algorithm of Paik and Sahni [21] was also
programmed. The run time performance of these five algorithms is given in Table 2.3.
These times represent the average time for ten instances of each size. The component
heights were obtained using a random number generator. The four parametric search
algorithms did not exhibit much run time variation among instances with the same
number of components. The algorithm of Paik and Sahni [21] takes much more time
than each of the parametric search algorithms. Within the class of parametric search
algorithms, the O(n log n) one is fastest in the tested problem size range. This may
be attributed to the increased overhead associated with the remaining algorithms.
The O(n log n) algorithm is recommended for use in practice unless the number of
components in a stack is very much larger than 4096.
Table 2.3. Run times of equal-width width-constrained algorithms

    n      [21]     O(n log n)   O(n log log n)   O(n log* n)      O(n)
   16       4.9        1.47           2.28            1.49          1.52
   64     314.7        8.84          15.75           27.14         26.71
  256     23255       45.96          76.55          169.58        169.42
 4096       -       1041.90        2148.60         2597.75       2760.25

Times are in milliseconds. No time is reported for [21] at n = 4096.
2.8 Conclusions
We have shown that the equal-width height-constrained and equal-width
width-constrained stack folding problems can be solved by applying the greedy method and
parametric search, respectively, if the input is first normalized. Normalization can be
done in linear time. Hence the overall complexity is determined by that of applying
the greedy method or parametric search to the normalized data.
We have developed a linear time algorithm for the equal-width height-constrained
problem. This compares very favorably (both analytically and experimentally) with
the $O(n^2)$ dynamic programming algorithm of Paik and Sahni [21].
For the equal-width width-constrained problem we have developed four
algorithms of complexity O(n log n), O(n log log n), O(n log* n), and O(n), respectively.
All compare very favorably with the $O(n^3)$ dynamic programming algorithm of Paik
and Sahni [21]. Experimental results indicate that the O(n log n) algorithm performs
best on practical size instances.
CHAPTER 3
STANDARD AND CUSTOM CELL FOLDING
3.1 Introduction
Standard cell and gate array design styles are characterized by a row (column)
organization of the layout. The layout area is divided into a number of parallel rows
separated by routing channels as shown in Figure 3.1. The layout problem is generally
divided into two independent subtasks: placement and routing. In the placement step
the appropriate locations and orientations of the standard cells are decided. In the
routing step, the required connections are added.
One approach to placement is linear ordering with folding [26, 13, 2]. In this
approach, the placement is divided into two distinct steps. The first is linear ordering
in which an order of the modules is determined so as to minimize the connection
length or minimize maximal density of connections for modules positioned in one
line. The folding step maps the linear order into the row structure of the chip.
The linear ordering problem is NP-hard and heuristic strategies are discussed in [26]
to minimize the connection length as well as maximal density of connections. The
greedy strategy is adopted in [26] for folding the ordered modules.
In this chapter, we consider only the second step of the placement approach just
described. We begin with an ordered component list $C_1, C_2, \ldots, C_n$ and develop
algorithms to fold this list into rows. If the list is folded at $C_i$, then the component
$C_i$ is in one row and $C_{i+1}$ is in the next. If the list is folded at $C_i$ and $C_j$ and
at no component $C_k$ for i < k < j, then components $C_{i+1}, \ldots, C_j$ are in the same
row. Suppose the list is folded at $C_i$. The channel height needed between the rows
containing $C_i$ and $C_{i+1}$ may be estimated [8] using the number of nets that have a pin
in one of the components $C_1, \ldots, C_i$ as well as in one of the components $C_{i+1}, \ldots, C_n$.
Let this height estimate be $l_i$, $1 \le i < n$. Let $l_n = 0$.
Figure 3.1. Standard cell architecture (standard cell rows holding components $C_1, \ldots, C_n$, separated by routing channels; the channel height is marked)
We study the following folding problems:
1. Standard cell folding to minimize total routing channel area subject to a chip
width constraint W. Since each routing channel has the same width, the chip
area assigned for routing is minimized when the sum of the channel heights is
minimized. This problem is solved in O(n) time using dynamic programming
(Section 3.2.1). Note that whenever we use the term chip area, we could instead
use subchip area.
2. Standard cell folding to minimize chip area subject to a chip width constraint
W. In this problem both the routing area and the area assigned for the components
are considered. Since the chip width is fixed at W, area minimization is equivalent
to minimizing chip height. In Section 3.2.1, we use dynamic programming to
obtain an O(n) algorithm for this problem.
3. Standard cell folding to minimize total routing area subject to a total routing
channel height constraint H. This problem differs from problem 1 only in that
the total height of the routing channels is fixed at H and their width is variable,
rather than the routing channels having variable total height and fixed width
W. In Section 3.2.2, we show how to solve this problem in O(n log n) time.
4. Standard cell folding to minimize chip area subject to a chip height constraint
H. This problem is solved in O(n log n) time in Section 3.2.2.
5. Standard cell folding using equal height channels of width W. We are to find
a folding that uses channels of minimum height. Among all such foldings, one
that uses the fewest number of routing channels (and hence fewest number of
component rows) is to be found. In Section 3.3.1, we develop an O(n log n)
time algorithm for this problem. However, for most practical instances,
the algorithm has run time O(n).
6. Standard cell folding using equal height routing channels of width W. Find a
folding that minimizes the total chip area. This can be done in $O(n^2)$ time (see
Section 3.3.2).
7. Standard cell folding using equal height channels and a chip of height H. The
folding should minimize the total chip area. Our algorithm for this problem
can be found in Section 3.3.3. Its complexity is $O(n^2 \log n)$.
8. Custom cell folding to minimize total chip area subject to a chip width W.
Note that in standard cell layout, all cells/components/modules have the same
height and may have variable widths. In custom cell layout, the cells may differ
in both height and width. We assume that the cell row height is set to be the
height of the tallest cell assigned to that row. In Section 3.4.1, we develop an
O(n log n) algorithm for this problem.
9. Custom cell folding to minimize total chip area subject to a chip height
constraint H. We solve this problem in Section 3.4.2 using an algorithm of
complexity $O(n \log^2 n)$.
We note that problem 8 has been studied previously in [21] in the context of bit
slice stack folding. The algorithm developed there has complexity $O(n^2)$ while ours
has complexity O(n log n). Problem 9 has also been studied in [21]. Our $O(n \log^2 n)$
algorithm is an improvement over the $O(n^2 \log n)$ algorithm developed in [21].
3.2 Standard Cell Folding (Problems 1-4)
Our discussion of problems 1-4 is divided into two parts. In Section 3.2.1, we
consider problems 1 and 2. In both these, the chip width and hence the cell and
routing channel widths are fixed at W. In Section 3.2.2, we consider problems 3 and
4 in both of which the chip height is fixed at H. In all four problems, the routing
channels have variable height. Each cell and hence each cell row has height h. The
width of cell i is $w_i$, $1 \le i \le n$. Let $w_{ij} = \sum_{k=i}^{j} w_k$, $1 \le i \le j \le n$. In case of fixed
chip width W, we may assume that $w_i \le W$, $1 \le i \le n$.
3.2.1 Width Constrained Case (Problems 1 and 2)
We first consider problem 1. In this, we are to minimize the total routing area.
Since the channel widths are fixed at W, it is sufficient to minimize the sum of
channel heights. Suppose that $C_1, \ldots, C_n$ is folded at $C_i$ in an optimal folding X.
Then the folding of $C_1, \ldots, C_i$ in X as well as that of $C_{i+1}, \ldots, C_n$ must be minimum
area foldings. Hence, the principle of optimality holds and we can use dynamic
programming [10].
Let f(i, s), i ≤ s, denote the minimum sum of channel heights when the
component list $C_i, \ldots, C_n$ is folded such that $C_i, \ldots, C_s$ are in one cell row and the first
fold is at $C_s$ (so, $C_{s+1}$ is in the next cell row). It is easy to see that $f(n,n) = l_n = 0$.
For $1 \le i < s \le n$,
$$f(i,s) = \begin{cases} \infty & \text{if } w_{is} > W \\ f(i+1,s) & \text{otherwise} \end{cases} \tag{3.1}$$
Also, for $1 \le i = s < n$, we get
$$f(i,i) = \min_{i < q \le n} \{ f(i+1,q) + l_i \} \tag{3.2}$$
The solution to problem 1 is obtained by first using Equations 3.1 and 3.2 to
determine f(i, s), $1 \le i \le s \le n$, and then determining the minimum of f(1, j),
$1 \le j \le n$. The $w_{is}$'s may be precomputed in $O(n^2)$ time. Each f(i, s), i < s, takes
O(1) time to compute and f(i, i) takes O(n − i) time. Hence, all the f(i, s)'s, i ≤ s,
may be obtained in $O(n^2)$ time. The minimum of the f(1, j)'s can be obtained in
O(n) time. So the overall time needed to solve problem 1 using Equations 3.1 and
3.2 is $O(n^2)$.
A more careful implementation of the dynamic programming algorithm results
in a complexity O(n). First we compute the suffix sums
$$Q_i = \sum_{j=i}^{n} w_j, \quad 1 \le i \le n$$
in O(n) time. Let $Q_{n+1} = 0$. From the suffix sums, each $w_{is}$ can be computed in
O(1) time using
$$w_{is} = Q_i - Q_{s+1}$$
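Next, from Equation 3.1 we see that for i < s and $w_{is} \le W$:
$$f(i,s) = f(i+1,s) = f(i+2,s) = \cdots = f(s,s) = F(s)$$
So, Equation 3.1 becomes (for i ≤ s)
$$f(i,s) = \begin{cases} \infty & w_{is} > W \\ F(s) & \text{otherwise} \end{cases} \tag{3.3}$$
Using Equation 3.3, Equation 3.2 may be rewritten as:
$$\begin{aligned} F(i) = f(i,i) &= \min_{i < q \le n} \{ f(i+1,q) + l_i \} \\ &= \min_{i < q \le n,\ w_{i+1,q} \le W} \{ F(q) + l_i \} \\ &= l_i + \min_{i < q \le n,\ w_{i+1,q} \le W} \{ F(q) \} \end{aligned} \tag{3.4}$$
The minimum total routing height needed is
$$\min_{1 \le i \le n,\ w_{1i} \le W} \{ F(i) \} \tag{3.5}$$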
So, problem 1 may be solved by computing the n F(i)'s using Equation 3.4
(rather than the $O(n^2)$ f(i, s)'s using Equations 3.1 and 3.2) and finding the minimum
of O(n) F(i)'s in Equation 3.5. To compute the F(i)'s using Equation 3.4, we begin
with F(n) = 0 and compute F(n − 1), F(n − 2), ..., F(1), in that order. To compute
an F(i) we need to find the minimum of a multiset $S_i$ of previously computed F's.
Specifically,
$$S_i = \{ F(q) \mid i < q \le n \text{ and } w_{i+1,q} \le W \}$$
Observation 1 : If $w_{j+1,q} > W$, then $w_{i+1,q} > W$ for i < j. Hence, if $F(q) \notin S_j$, then
$F(q) \notin S_i$ for i < j. □
From Observation 1, it follows that $S_{i-1}$ may be computed from $S_i$, $1 < i \le n$,
by eliminating those F(q)'s for which $w_{i,q} > W$ and adding in F(i) (note that, by
assumption, $w_i = w_{ii} \le W$).
Lemma 1 : If $F(a) \in S_i$, $F(b) \in S_i$, F(a) < F(b), a < b, then we may eliminate F(b)
from $S_i$ and continue to compute $S_{i-1}, S_{i-2}, \ldots, S_1$ as described above. This does not
affect the values of F(i − 1), ..., F(1).
Proof : Note that F(j), j < i, is being computed using the equation
$$F(j) = l_j + \min_{F(q) \in S_j} \{ F(q) \}$$
If F(b) is eliminated from $S_i$, the value of F(i) is unaffected as F(a) < F(b). If F(a)
is eliminated from $S_j$, j < i, because $w_{j+1,a} > W$, then F(b) would also be eliminated
as a < b and so $w_{j+1,b} > w_{j+1,a} > W$. If F(a) is eliminated because there is an
F(c) ≤ F(a), c < a, then F(b) will also be eliminated as F(c) ≤ F(a) < F(b)
and c < a < b. □
Observation 1 and Lemma 1 motivate us to maintain S as a sequential queue
[11] in an array Result[1..n]. Result[i].q and Result[i].F together represent an entry
of S yielding the value F(q). The elements of S are stored in positions tail, tail +
1, ..., head of array Result. The F(q) values are in descending order left-to-right.
Hence, the q values are in ascending order. Procedure MinimizeHtStandard (Figure
3.2) is the resulting algorithm.
Theorem 5 : The procedure MinimizeHtStandard given in Figure 3.2 is correct.
Proof : There are two parts to the working of procedure MinimizeHtStandard. The
first is computing F(i), during which deletions of F(.)'s can occur. The second
is inserting the computed F(i) at the appropriate place in the array.
The procedure maintains the following invariant at the start of each iteration of
the for loop.
Invariant: Result[tail].F ≥ Result[tail + 1].F ≥ ... ≥ Result[head].F
It is clearly true when i = n − 1 as head = tail.
The invariant is true at the start of the iteration and so Result[head].F is
the minimum maintained F(.) value. The component number is maintained in
Result[head].q. We check whether Q[i + 1] − Q[Result[head].q + 1] > W and if
so, by virtue of Observation 1, we can eliminate this value. We do so by decrementing
the head pointer. We keep repeating this until we find a record k = Result[head].q
such that Q[i + 1] − Q[k + 1] ≤ W. This record pointed to by head has the
minimum of the maintained F(.) values. We compute F(i) and store it in temp. Notice
that at the end of the while loop, we have deleted a few F(.)'s and the invariant
property still holds.
The invariant holds before the start of the second while loop. Here we start
at tail. If the inequality temp < Result[tail].F is true, then we delete the record and this
is justified by Lemma 1. We keep doing so until temp ≥ Result[tail].F. Then we decrement the
Procedure MinimizeHtStandard;
{ Compute the minimum height layout }
{ Initialize S = { F(n) = 0 } }
head := n; tail := n;
Result[tail].F := 0; Result[tail].q := n;
{ Compute F(i) }
for i := n - 1 downto 1 do
begin
  { Compute S_i }
  while (Q[i + 1] - Q[Result[head].q + 1] > W) do
    head := head - 1; { delete from S_{i+1}, Observation 1 }
  temp := l[i] + Result[head].F; { use min F in S_i to compute F(i) }
  while (temp < Result[tail].F) do { delete using Lemma 1 }
    tail := tail + 1;
  { Store F(i) }
  tail := tail - 1;
  Result[tail].F := temp;
  Result[tail].q := i;
end; { of for }
while (Q[1] - Q[Result[head].q + 1] > W) do
  head := head - 1;
MinimizeHtStandard := Result[head].F;
end; { of MinimizeHtStandard }
Figure 3.2. Procedure to obtain a minimum height folding
tail pointer and store the temp record. So, the invariant holds at the end of the
iteration. Consequently, the invariant holds at the start of each iteration of the for
loop and the F's are correctly computed.
The minimum height layout is the minimum of the maintained F(.) values that
satisfy the width constraint, i.e., Q[1] − Q[Result[head].q + 1] ≤ W. The last while
loop of the procedure takes care of this. The last line of the procedure assigns this
value to MinimizeHtStandard, which is the height of the minimum height layout. □
Whenever the pointers head or tail are advanced in the while loops, we delete
F(.) values. The cost of the while loops can therefore be charged to the deletion of
F(.) values, and no more than n deletions can take place. The remaining
code within the for loop takes O(1) time per iteration and hence O(n) time overall.
The complexity of procedure MinimizeHtStandard is, therefore, O(n).
Using standard dynamic programming traceback techniques [10], the fold points can
be obtained in additional O(n) time.
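The queue-based computation of the F(i)'s can be sketched in Python with a double-ended queue in place of the Result array. This is an illustrative sketch rather than the implementation used for the experiments: indices are 0-based, `l[n-1]` is taken to be 0, every component is assumed to satisfy `w[i] <= W`, and the function name is hypothetical.

```python
from collections import deque

def min_routing_height(w, l, W):
    """Minimum sum of channel heights when the list is folded into rows
    of total width at most W (assumes every w[i] <= W and l[-1] == 0)."""
    n = len(w)
    Q = [0] * (n + 1)                    # suffix sums, Q[n] = 0
    for i in range(n - 1, -1, -1):
        Q[i] = Q[i + 1] + w[i]
    # Entries are (q, F(q)). Front: largest q and smallest F.
    # Back: most recently inserted entry, largest F.
    S = deque([(n - 1, 0)])
    for i in range(n - 2, -1, -1):
        # Observation 1: drop entries whose row i+1..q is wider than W
        while Q[i + 1] - Q[S[0][0] + 1] > W:
            S.popleft()
        f = l[i] + S[0][1]               # F(i) from the minimum F in S
        # Lemma 1: drop entries dominated by the new one
        while S and f < S[-1][1]:
            S.pop()
        S.append((i, f))
    # the first row 0..q must also respect the width constraint
    while Q[0] - Q[S[0][0] + 1] > W:
        S.popleft()
    return S[0][1]
```

For widths `[3, 2, 4, 1, 2]`, channel heights `[5, 1, 4, 2, 0]` and `W = 6`, the call returns 3, which corresponds to folding after the second and fourth components. Each entry is inserted and removed at most once, so the loop body runs in O(n) time overall.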
Problem 2, i.e., minimizing total area rather than just routing area, may be solved
in a similar way. Let f(i, s), i ≤ s, now denote the minimum chip height for the
component list $C_i, \ldots, C_n$ assuming the first fold is at $C_s$. As before f(n, n) = 0 and
Equation 3.1 holds for i < s. Equation 3.2 needs to be replaced by
$$f(i,i) = \min_{i < q \le n} \{ f(i+1,q) + l_i + h \} \tag{3.6}$$
Using Equations 3.1 and 3.6 and the development for problem 1, an O(n) time
algorithm for problem 2 may be obtained.
3.2.2 Height Constrained Case (Problems 3 and 4)
The solutions to problems 3 and 4 are similar. Both use parametric search and
we describe only the solution to problem 3. Since the total height of the routing
channels is fixed at H, the area assigned for routing is minimized by minimizing the
chip width W. To use parametric search to minimize W, we must do the following:
1. Identify a set of candidate values for the minimum W. This set must be
provided as a sorted matrix with the property that each matrix entry can be
computed in constant time.
2. Provide a way to determine if a candidate width W is feasible, i.e., can the
component stack be folded using total channel height H and width W?
For the feasibility test of 2, we can use procedure MinimizeHtStandard of Figure 3.2 by
setting W to the candidate value being tested and then determining if MinimizeHtStandard ≤
H following the execution of the procedure.
Next, we provide an n × n sorted matrix M (n is the total number of components
in the component list) of candidate values. To determine the candidate matrix M, we
observe that the width of any layout is given by $\sum_{k=i}^{j} w_k$ for some i, j, $1 \le i \le j \le n$.
This formula gives us the width of the segment that contains components $C_i$ through
$C_j$. M is a sorted matrix that contains all candidate values. The minimum $M_{ij}$ for
which a height H folding is possible is the width of the minimum width height-H folding.
We now show how the elements of M may be computed efficiently given the index pair (i, j).
Let
$$T_i = \sum_{j=i}^{n} w_j, \quad 1 \le i \le n$$
and let $T_{n+1} = 0$. Then,
$$M_{ij} = \begin{cases} T_{n-i+1} - T_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases}$$
So, if we precompute the Ti's each Mij can be determined in constant time.
The precomputation of the Ti's takes O(n) time. Since feasibility testing takes linear
time, from Corollary 2, it follows that the complexity of the described parametric
search to find the minimum width folding is O(n + t(n)log n) = O(n + nlog n) =
O(n log n).
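The suffix-sum construction of the width candidates can be checked with a small Python sketch. It assumes the reading $M_{ij} = T_{n-i+1} - T_{j+1}$ for $i + j \ge n + 1$ (and 0 otherwise), takes the component widths as a 0-based list, and exposes a hypothetical function `M(i, j)` with the 1-based indices of the text.

```python
def width_candidates(w):
    """Return M(i, j), an implicit sorted matrix of all row widths w_ij."""
    n = len(w)
    T = [0] * (n + 2)                # T[i] = w_i + ... + w_n, T[n+1] = 0
    for i in range(n, 0, -1):
        T[i] = T[i + 1] + w[i - 1]

    def M(i, j):
        if i + j < n + 1:
            return 0                 # padding entries
        return T[n - i + 1] - T[j + 1]   # width of C_{n-i+1} .. C_j

    return M
```

For `w = [3, 1, 2]` the rows of M are (0, 0, 2), (0, 1, 3) and (3, 4, 6): every row and column is nondecreasing, and every segment width appears as an entry.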
3.3 Standard Cell Folding (Problems 5-7)
In this section, we deal with layouts which have fixed channel area, e.g.,
semi-custom chips in which each routing channel is of the same height.
3.3.1 Minimum Channel Height (Problem 5)
We may view the result of any width W folding as the transformation of the
component list $C_1, \ldots, C_n$ into a new component list $B_1, \ldots, B_k$, $k \le n$, where $B_i$
represents the components folded into row i of the layout. The width of each $B_i$
equals the sum of the widths of the components assigned to cell row i and this is
≤ W. Also, the routing channel between rows i and i + 1 must have height at least
equal to $l_{j_i}$, where $C_{j_i}$ is the last component assigned to cell row i. We see that
$$\text{width}(B_i) = \sum_{j=j_{i-1}+1}^{j_i} w_j$$
and
$$\text{height of channel}(i) \ge l_{j_i},$$
where $j_0 = 0$. When channel heights are the same, the height must be at least
$$\max_{1 \le i < k} \{ l_{j_i} \}.$$
With this knowledge, we can develop a greedy algorithm to minimize channel
height. In this, we repeatedly combine together pairs of components (this is equivalent
to assigning them to the same cell row or Bi) so that no created component has width
greater than W. The pairs are chosen in nonincreasing order of li. The greedy
algorithm is given in Figure 3.3. Each set of combined components is represented by
a pointer, last, from the first component to the last and another pointer, first, from
the last component to the first. The width of the combined component is kept in the
first elementary component of the combined component.
In the algorithm of Figure 3.3, we initialize the combined component blocks to
consist of elementary components in the first for loop. The sort gives us the order
in which the l's are to be "eliminated" so that the maximum of the remaining l's
is the minimum. In the while loop, l's are eliminated by combining blocks. This
is done until the next highest l cannot be eliminated (we assume that $\sum w_i > W$, so
it is not possible to eliminate all l's). The highest remaining l is l[p[i]] and this is
the smallest channel height needed.
Procedure MinChannelHeight;
for i := 1 to n do { initialize component blocks }
begin
  first[i] := i; last[i] := i;
end;
Sort p[1..n] = [1, 2, ..., n] so that
  l[p[i]] >= l[p[i + 1]], 1 <= i < n
i := 1;
while (width[first[p[i]]] + width[p[i] + 1] <= W) do
begin
  width[first[p[i]]] := width[first[p[i]]] + width[p[i] + 1];
  first[last[p[i] + 1]] := first[p[i]];
  last[first[p[i]]] := last[p[i] + 1];
  i := i + 1;
end;
MinChannelHeight := l[p[i]];
end;
Figure 3.3. Procedure to obtain a minimum channel height folding
The correctness of the procedure is easily established. For its complexity, we
see that except for the sort step, the others take O(n) time. The sort can be done
in O(n log n) time. However, in practice, $\max\{l_i\} - \min\{l_i\} = O(n)$ and the sort
can be done in O(n) time using a radix sort with radix O(n) (i.e., a bin sort) [11].
One may also verify that the minimum number of cell rows needed is obtained by
doing a greedy folding on the combined components that remain when procedure
MinChannelHeight terminates.
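The block-combining greedy of Figure 3.3 can be sketched in Python as follows. The sketch uses 0-based indices, sorts only the n − 1 real fold points (the figure also carries $l_n = 0$ at the end of the order), and returns 0 when the whole list fits in one row, a case the text excludes by assumption. The function name is hypothetical, and Python's comparison sort is used instead of the bin sort mentioned above.

```python
def min_channel_height(w, l, W):
    """Smallest common channel height for a width-W folding.

    w: component widths; l: channel height estimates, l[i] applies to a
    fold between components i and i+1 (l[-1] is unused here).
    """
    n = len(w)
    first = list(range(n))     # first[e]: first component of block ending at e
    last = list(range(n))      # last[s]:  last component of block starting at s
    width = list(w)            # width[s]: width of the block starting at s
    # fold points in nonincreasing order of channel height
    order = sorted(range(n - 1), key=lambda i: -l[i])
    for p in order:
        a = first[p]           # block that ends at p
        b = p + 1              # block that starts right after p
        if width[a] + width[b] > W:
            return l[p]        # this fold cannot be eliminated
        width[a] += width[b]   # combine the two blocks
        first[last[b]] = a
        last[a] = last[b]
    return 0                   # every fold eliminated: a single row suffices
```

For example, `min_channel_height([2, 2, 2, 2], [3, 1, 2, 0], 4)` eliminates the folds of height 3 and 2 and returns 1.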
3.3.2 Minimize Chip Area Subject to Width Constraint (Problem 6)
First, consider a modified version of problem 6 in which, in addition to the chip
width W, we are given the height L of each routing channel. We are to fold the
components so as to minimize the total chip area. To solve modified problem 6 in
linear time, we first make a pass over all the components and combine components $C_i$
and $C_{i+1}$ if $l_i > L$. If any component that results has width > W, L is an infeasible
channel height. Following the combining of blocks in this way, the resulting blocks
are packed into cell rows in a greedy manner (i.e., a new cell row is started only if
the component being placed does not fit in the current cell row). The fact that this
minimizes the number of cell rows and hence chip area is easily verified.
Problem 6 can be solved using the solution to modified problem 6 by trying
out all O(n) possible values for L (i.e., the distinct $l_i$'s) and seeing which minimizes
overall area. (Actually, only $l_i$'s that are no less than the minimum feasible L as
determined by problem 5 need be tried.) The resulting complexity is $O(n^2)$.
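The two-level procedure for problem 6 can be sketched in Python. The sketch is illustrative only and rests on an assumption that is not stated in the text: a folding with k cell rows is charged k row heights and k − 1 channels, so its area is taken to be W(kh + (k − 1)L). Function names are hypothetical and indices are 0-based with `l[-1] == 0`.

```python
def rows_for_channel_height(w, l, W, L):
    """Modified problem 6: number of cell rows when every channel has
    height L, or None if L is infeasible."""
    blocks, cur = [], 0
    for i, wi in enumerate(w):
        cur += wi
        # a fold after component i is allowed only if l[i] <= L
        if i == len(w) - 1 or l[i] <= L:
            if cur > W:
                return None
            blocks.append(cur)
            cur = 0
    rows, used = 1, 0
    for b in blocks:               # greedy packing of blocks into rows
        if used + b <= W:
            used += b
        else:
            rows += 1
            used = b
    return rows

def min_area_equal_channels(w, l, W, h):
    """Try every distinct l value as the channel height L.
    Returns (area, L, rows) or None if no L is feasible."""
    best = None
    for L in sorted(set(l)):
        rows = rows_for_channel_height(w, l, W, L)
        if rows is None:
            continue
        area = W * (rows * h + (rows - 1) * L)   # assumed area model
        if best is None or area < best[0]:
            best = (area, L, rows)
    return best
```

With `w = [2, 2, 2, 2]`, `l = [3, 1, 2, 0]`, `W = 4` and `h = 1`, channel height 0 is infeasible and the best choice is L = 1 with two rows. Each call to the inner routine is linear and there are O(n) candidate heights, which matches the quadratic bound stated above.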
3.3.3 Minimize Chip Area Subject to Height Constraint (Problem 7)
As for problem 6, we define a modified problem 7 in which the channel height L
is known. This modified problem is solved using parametric search. The candidate
values are described by the same M matrix as used in Section 3.2.2. The solution
to modified problem 6 is used for the feasibility test. This enables us to solve the
modified version of problem 7 in O(n log n) time. Now, by trying out all O(n) possible
L values (as in Section 3.3.2) the minimum area folding can be determined. The
overall time complexity is $O(n^2 \log n)$.
3.4 Custom Cell Folding (Problems 8 and 9)
In this section, we relax the requirement that all components have the same
height h. Let $h_i$ be the height of $C_i$. If $C_i, \ldots, C_j$ are assigned to the same cell row
and no other components are assigned to this row, then the cell row height is
$$\max_{i \le q \le j} \{ h_q \}$$
The height of the folding is the sum of the heights of the cell rows and routing
channels.
3.4.1 Width Constrained Folding (Problem 8)
Since the chip width is fixed at W, chip area is minimized by minimizing chip
height. Let $R_{i,j} = \max_{i \le q \le j} \{h_q\}$ and let $f(i, s)$ be the minimum
height into which $C_i, \ldots, C_n$ can be folded such that the first fold is at $C_s$. Following
Procedure MinimizeHtCustom;
{ Compute the minimum height folding }
begin
  head := n; tail := n; left := n; right := n;
  for i := 1 to n do
    Hlist[i].gvalue := ∞;
  Flist[tail].q := n; Flist[tail].F := 0;
  Hlist[n].top := tail; Hlist[n].bottom := tail;
  Hlist[n].hvalue := h[n];
  Hlist[n].gvalue := Hlist[n].hvalue + Flist[Hlist[n].top].F;
  InitializeWinnerTree(T);
  for i := n - 1 downto 1 do
  begin
    DeleteValue(i);
    InsertValue(i);
  end; { of for }
  DeleteValue(0);
  MinimizeHtCustom := Winner of the Tree T;
end; { of MinimizeHtCustom }
Figure 3.4. Procedure to obtain a minimum height folding for custom cells
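the development of Section 3.2.1, we see that $f(n, n) = h_n$ and, for $i < s$,
$$f(i,s) = \begin{cases} \infty & \text{if } w_{i,s} > W \\ f(s,s) + R_{i,s} - h_s & \text{otherwise} \end{cases} \tag{3.7}$$
and for $i = s$,
$$f(i,i) = \min_{i < q \le n} \{ f(i+1, q) \} + h_i + l_i \tag{3.8}$$
The minimum height into which the folding can be done is $\min_{1 \le i \le n} f(1, i)$.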
As described in Section 3.2.1, the set of dynamic programming equations can be
solved in $O(n^2)$ time. However, the development of Section 3.2.1 that results in an
O(n) time solution does not apply to the new set of equations. Instead, we are able
to solve problem 8 in O(n log n) time.
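Define $F(i) = f(i, i) - h_i$. Substituting into Equation 3.7, we get
$$f(i,s) = \begin{cases} \infty & \text{if } w_{i,s} > W \\ F(s) + R_{i,s} & \text{otherwise} \end{cases} \tag{3.9}$$
From Equation 3.8, we get
$$\begin{aligned}
F(i) = f(i,i) - h_i &= \min_{i < q \le n} \{ f(i+1, q) \} + l_i \\
&= l_i + \min\{ f(i+1, i+1), \min_{i+1 < q \le n} \{ f(i+1, q) \} \} \\
&= l_i + \min\{ F(i+1) + h_{i+1}, \min_{i+1 < q \le n,\; w_{i+1,q} \le W} \{ F(q) + R_{i+1,q} \} \} \\
&= l_i + \min_{i < q \le n,\; w_{i+1,q} \le W} \{ F(q) + R_{i+1,q} \}
\end{aligned} \tag{3.10}$$
The height of the minimum height folding is
$$\min_{1 \le i \le n,\; w_{1,i} \le W} \{ F(i) + R_{1,i} \} \tag{3.11}$$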
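As a reference point, Equations 3.10 and 3.11 can be evaluated directly in quadratic time. The sketch below is only that baseline, not the O(n log n) procedure developed in this section; the name min_height_custom and the 0-based input lists w, h and l (with l[i-1] holding $l_i$ for $1 \le i < n$) are assumptions made for the example.

```python
import math

def min_height_custom(w, h, l, W):
    """Quadratic-time evaluation of Equations 3.10 and 3.11.

    w[i-1], h[i-1] are the width and height of C_i; l[i-1] is l_i for i < n.
    Returns math.inf if no folding of width at most W exists.
    """
    n = len(w)
    F = [math.inf] * (n + 1)      # F[1..n]; index 0 is unused
    F[n] = 0                      # F(n) = f(n, n) - h_n = 0
    for i in range(n - 1, 0, -1):
        best, width, R = math.inf, 0, 0
        for q in range(i + 1, n + 1):
            width += w[q - 1]     # w_{i+1,q}
            if width > W:
                break
            R = max(R, h[q - 1])  # R_{i+1,q}
            best = min(best, F[q] + R)
        F[i] = l[i - 1] + best    # Equation 3.10
    best, width, R = math.inf, 0, 0
    for i in range(1, n + 1):
        width += w[i - 1]         # w_{1,i}
        if width > W:
            break
        R = max(R, h[i - 1])      # R_{1,i}
        best = min(best, F[i] + R)  # Equation 3.11
    return best
```

For four unit-width components with heights 5, 1, 1, 4, zero channel heights and W = 2, the best folding places $C_1, C_2$ in one row and $C_3, C_4$ in the next, for a total height of 9.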
Beginning with $F(n) = f(n, n) - h_n = 0$, the remaining F's may be computed,
in the order $F(n-1), \ldots, F(1)$, by using Equation 3.10. To use Equation 3.10, we
keep a multiset $S_i$ of F values as in Section 3.2.1. We begin with $S_{n-1} = \{F(n)\}$ and
rewrite Equation 3.10 as:
$$F(i) = l_i + \min_{F(q) \in S_i} \{ F(q) + R_{i+1,q} \} \tag{3.12}$$
Observation 1 of Section 3.2.1 applies to Equation 3.12 and we may eliminate
from $S_i$ any F(q) for which $w_{i+1,q} > W$.
Observation 2 : $R_{i,q} \ge R_{i+1,q} \ge \cdots \ge R_{q,q}$, $1 \le i \le q \le n$.
Using Observation 2 and Equation 3.12 we can show that Lemma 1 applies for
the computation of the F's as defined in this section.
Observation 3 : If $h_j > h_q$ and $i \le j < q$, then $R_{i,q} \ne h_q$. Also, if $h_j > h_{j+1}$ and
$i \le j$ then $R_{i,j} = R_{i,j+1}$.
Now, we devise a method to find the minimum in Equation 3.12 efficiently. We
store the F(.) values in an array of records called Flist. Each Flist record has two
fields, Flist.q and Flist.F, with Flist.F = F(Flist.q); e.g., if F(8) = 50 then there is a
record which has Flist.q = 8 and Flist.F = 50. There are two pointers, head and
Procedure DeleteValue(i);
{ Delete F(l) such that w_{i+1,l} > W }
begin
  done := false; bool := false;
  while (not done) do
    if (Q[i + 1] - Q[Flist[Hlist[right].top].q + 1] > W) then
    begin
      { Delete this F(.) value }
      Hlist[right].top := Hlist[right].top - 1;
      head := head - 1;
      bool := true;
      if (Hlist[right].top < Hlist[right].bottom) then
      begin
        { Make this record inactive }
        Hlist[right].gvalue := ∞;
        AdjustWinnerTree(T, right);
        right := right - 1; bool := false;
      end; { of if }
    end
    else done := true;
  { of while }
  if bool then
  begin
    Hlist[right].gvalue := Hlist[right].hvalue + Flist[Hlist[right].top].F;
    AdjustWinnerTree(T, right);
  end; { of if }
end; { of DeleteValue }
Figure 3.5. Procedure to delete F(.) values as in Observation 1
tail that are used. Initially, head = tail = n. At any point, head and tail have
values such that head ≥ tail and F(tail) > F(tail + 1) > ... > F(head). This data
structure is the same as the one used in Section 3.2.1.
When computing F(i) we need to associate F(q) values with $R_{i+1,q}$ values and
then generate the values $F(q) + R_{i+1,q}$, and find the minimum of these values. Suppose
$h_q > h_{q+1}$; then $R_{i,q} = R_{i,q+1}$ from Observation 3. Associate the values F(q) and
F(q + 1) with $R_{i,q}$ in this case. In general, if $R_{i,q} = R_{i,q+1} = R_{i,q+2} = \cdots = R_{i,l}$,
then we have a single Hlist record with $h_q$ value and associated with it the values
F(q), F(q + 1), ..., F(l). Note that the F(.) values must satisfy the condition
F(q) > F(q + 1) > ... > F(l). Otherwise, the F(.) values which violate the condition
can be removed as in Lemma 1 by doing a left to right scan. We use an array of
records Hlist of size n with fields Hlist.hvalue, representing the height, and Hlist.top and
Hlist.bottom, the two pointers which keep track of the F(.) values associated with this
record. The top and bottom pointers point to the F(.) values satisfying the condition
Flist[Hlist.top].F < Flist[Hlist.top - 1].F < ... < Flist[Hlist.bottom].F. That is,
Flist[Hlist.top].F is the minimum F(.) value associated with this record. Note that
every F(.) value is associated with a unique Hlist record. We generate the value
Flist[Hlist.top].F + Hlist.hvalue (which is $F(q) + R_{i,q}$) and store it in Hlist.gvalue
(generated value). These generated values are used to construct a winner tree T (see
Horowitz and Sahni [11]).
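A winner tree over the Hlist.gvalue entries can be sketched as a complete binary tree stored in an array, where each internal node holds the smaller of its two children. The class below is a generic illustration under that assumption; the names WinnerTree, adjust and winner are hypothetical and are not taken from [11].

```python
import math

class WinnerTree:
    """Min winner tree over n slots; empty or inactive slots hold infinity."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        # key[1] is the root; leaves occupy key[size .. 2*size - 1].
        self.key = [math.inf] * (2 * self.size)

    def adjust(self, pos, value):
        """Set slot pos (0-based) to value and replay matches up to the root."""
        i = self.size + pos
        self.key[i] = value
        i //= 2
        while i >= 1:
            self.key[i] = min(self.key[2 * i], self.key[2 * i + 1])
            i //= 2

    def winner(self):
        """Return the minimum value currently stored in the tree."""
        return self.key[1]
```

Each call to adjust walks one leaf-to-root path, so it costs O(log n); setting a slot to infinity plays the role of making a record inactive.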
The winner of the tree T is the minimum we are looking for when computing
F(i). Let a Hlist record be active if Hlist.gvalue ≠ ∞. The pointers left and right,
left ≤ right, are used to point to the currently active list of Hlist records. Hlist[left]
is the leftmost active record and Hlist[right] is the rightmost active record.
The procedure MinimizeHtCustom is given in Figure 3.4. The pointers are
initialized and the winner tree T is initialized. In the procedure DeleteValue(i), the
F(l) values that satisfy the conditions in Observation 1, i.e., F(l) values such that
$w_{i+1,l} > W$, are deleted. Let $Q[j] = w_j + w_{j+1} + \cdots + w_n$.
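With the suffix sums Q available, the width test of DeleteValue reduces to one subtraction, since $w_{i+1,l} = Q[i+1] - Q[l+1]$. A small sketch follows; the helper names suffix_sums and span_width, and the sentinel entry Q[n+1] = 0, are assumptions made for the example.

```python
def suffix_sums(w):
    """Return Q with Q[j] = w_j + ... + w_n for 1 <= j <= n, and Q[n+1] = 0."""
    n = len(w)
    Q = [0] * (n + 2)             # index 0 is unused
    for j in range(n, 0, -1):
        Q[j] = Q[j + 1] + w[j - 1]
    return Q

def span_width(Q, i, q):
    """Width w_{i,q} of components C_i .. C_q in constant time."""
    return Q[i] - Q[q + 1]
```

For widths 2, 3, 4 this gives Q[1] = 9, Q[2] = 7 and Q[3] = 4, so $w_{1,2} = 9 - 4 = 5$.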
The procedure DeleteValue is given in Figure 3.5. The boolean bool keeps track
of whether a Hlist record has been made inactive. If a record becomes inactive, the
pointer right is moved to the left to point to an active Hlist record. Also, the winner
tree T is adjusted to update the current minimum. The call to function
AdjustWinnerTree takes O(log n) time [11]. Note that the winner tree T is adjusted
a maximum of two times whenever an F(.) value is deleted. Let the number of
deletes when DeleteValue is invoked be x. Then, the time complexity of DeleteValue
is O(x log n).
The procedure InsertValue(i) first finds the winner of the tree T. The value l[i] is
added to this to get F(i) as in Equation 3.12. Once we find F(i), we insert a
Hlist record with Hlist.hvalue = h[i], and the F(.) value is inserted in the array of
Flist records. The winner tree T is then adjusted. In the first while loop of
InsertValue, the conditions of Lemma 1 are checked. If the conditions apply, then the
F(.) values are deleted and the winner tree is adjusted. Let the number of F(.) value
deletions be y. In the second while loop of InsertValue, it is checked
whether the conditions of Observation 3 apply. If so, the F(.) records of the adjacent
Hlist record are added to the current Hlist record and the record is moved. The winner
Procedure InsertValue(i);
begin
  left := left - 1; tail := tail - 1;
  Flist[tail].q := i;
  Flist[tail].F := Winner of the Min Tree T + l[i];
  Hlist[left].hvalue := h[i];
  Hlist[left].top := tail; Hlist[left].bottom := tail;
  Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
  AdjustWinnerTree(T, left);
  while (head ≠ tail and Flist[tail].F < Flist[tail + 1].F) do
  begin
    Hlist[left + 1].bottom := Hlist[left + 1].bottom + 1;
    if (Hlist[left + 1].bottom > Hlist[left + 1].top) then
    begin
      Hlist[left + 1] := Hlist[left]; { Move the record }
      Hlist[left].gvalue := ∞;
      AdjustWinnerTree(T, left);
      left := left + 1;
    end; { of if }
    Flist[tail + 1] := Flist[tail]; { Move the record }
    tail := tail + 1;
    Hlist[left].top := tail; Hlist[left].bottom := tail;
  end; { of while }
  while (left ≠ right and Hlist[left].hvalue > Hlist[left + 1].hvalue) do
  begin
    { Conditions of Observation 3 apply }
    Hlist[left].top := Hlist[left + 1].top;
    Hlist[left + 1] := Hlist[left]; { Move the record }
    Hlist[left].gvalue := ∞;
    AdjustWinnerTree(T, left);
    left := left + 1;
    Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
    AdjustWinnerTree(T, left);
  end; { of while }
end; { of InsertValue }
Figure 3.6. Procedure to Insert F(.) values
tree is then adjusted. Every time the conditions of Observation 3 apply in the while
loop, we spend O(log n) time; i.e., every time the conditions apply we merge two
adjacent Hlist records. Let the number of merges in a single invocation of InsertValue
be z. The total time taken by a single invocation of InsertValue, assuming y F(.)
values are deleted and z Hlist merges take place, is O((y + z + 1) log n).
Note that not more than n F(.) values can be deleted in total, and not more
than n Hlist records can be merged in total. This implies that the total time taken
by the procedure MinimizeHtCustom is O(n log n). In contrast, the algorithm of [21]
for the same problem takes $O(n^2)$ time.
3.4.2 Height Constrained Folding (Problem 9)
To obtain the minimum width folding, given the height H of the folding, we use
parametric search in conjunction with the procedure MinimizeHtCustom developed
in Section 3.4.1. The procedure MinimizeHtCustom is used for the feasibility testing.
In feasibility testing, we are given the width, x, of the layout and we test whether it
is possible to obtain a folding such that the height of the folding is ≤ H. The set of
candidate values is the same as the one described in Section 3.2.2. The feasibility
testing takes O(n log n) time, and from Corollary 2, the total time taken to obtain
the minimum width folding is O(n + n log n log n) = $O(n \log^2 n)$. The same problem
is solved in O(n log n) time in [21].
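The role of the feasibility test can be illustrated with a deliberately naive search. The sketch below is not the parametric search of Section 3.2.2: it enumerates every contiguous width sum as a candidate chip width, sorts the candidates, and binary searches them, using a quadratic-time evaluation of Equations 3.10 and 3.11 as the feasibility test. The function names and the 0-based lists w, h and l are assumptions made for the example.

```python
import math

def min_height(w, h, l, W):
    """Minimum folding height for chip width W (Equations 3.10 and 3.11)."""
    n = len(w)
    F = [math.inf] * (n + 1)
    F[n] = 0
    for i in range(n - 1, 0, -1):
        best, width, R = math.inf, 0, 0
        for q in range(i + 1, n + 1):
            width += w[q - 1]
            if width > W:
                break
            R = max(R, h[q - 1])
            best = min(best, F[q] + R)
        F[i] = l[i - 1] + best
    best, width, R = math.inf, 0, 0
    for i in range(1, n + 1):
        width += w[i - 1]
        if width > W:
            break
        R = max(R, h[i - 1])
        best = min(best, F[i] + R)
    return best

def min_width_custom(w, h, l, H):
    """Smallest chip width whose minimum-height folding has height <= H, or None."""
    n = len(w)
    # Any row width is a sum of consecutive component widths.
    cands = sorted({sum(w[i:j]) for i in range(n) for j in range(i + 1, n + 1)})
    if min_height(w, h, l, cands[-1]) > H:
        return None
    lo, hi = 0, len(cands) - 1
    while lo < hi:                # feasibility is monotone in the width
        mid = (lo + hi) // 2
        if min_height(w, h, l, cands[mid]) <= H:
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```

The binary search is valid because widening the chip can never increase the minimum height. The procedure described in this section improves on the sketch by avoiding explicit enumeration of the candidates and by using the O(n log n) feasibility test.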
3.5 Experimental Results
The procedure MinimizeHtStandard (Figure 3.2) was programmed in C and
run on a SUN 4 workstation. The solution produced by MinimizeHtStandard was
compared with the one obtained using the greedy heuristic of [26].
The data for these programs were produced by having a linearly ordered list of
modules and making interconnections between the modules using a random number
generator. The connections were prioritized so that there is a large number of
connections between modules which are close together. Our algorithm always produces
better solutions than the greedy heuristic and the results are depicted in Table 3.1.
The results shown are the average of 10 runs for each n. Our algorithm, on average,
took 2 to 3 times more time to arrive at the solution than the greedy heuristic.

Table 3.1. Heights produced by width-constrained standard cell folding algorithms
n       Greedy    Ours
100     624.4     609.8
400     1979      1961.2
1000    3813.7    3721.2

The algorithm MinimizeHtCustom was programmed and its run times were compared
with those of the algorithm of [21]. The results of the experiments are shown in Table 3.2.
Both programs were written in C. It is evident that our algorithm is considerably
faster than that of [21]. Since both algorithms generate optimal solutions, the chip
area is the same using either.

Table 3.2. Run times of width-constrained folding algorithms for custom cells
n       Ours      [21]
64      2.56      23.80
250     10.9      350.3
1000    39.23     6125.5
Times are in milliseconds
3.6 Conclusions
We have developed optimal algorithms to fold a linearly ordered list of standard
and custom cells. Several optimization constraints were considered. These resulted
in a total of nine problem formulations. Two of these correspond to problem formulations
for the bit-slice stack folding problem studied in [21]. The algorithms we have
developed for these two cases are asymptotically superior to those developed in [21].
Experimentation with one of these shows that the asymptotic superiority of our algorithms
translates into a much reduced execution time. For the other formulations,
heuristics were proposed in Shragowitz et al. [25]. Our algorithms have acceptable
asymptotic complexity and guarantee optimal solutions. In fact, experiments
conducted with one yielded foldings with smaller chip area on all tested instances.
CHAPTER 4
PLANAR TOPOLOGICAL ROUTABILITY
4.1 Introduction
The problem of routing two-pin nets on a single layer has been studied previously
by several researchers. The river routing and switch box routing problems are special
cases of this. Efficient algorithms for these can be found in [12, 15, 17, 19, 20,
22, 23, 27, 29, 30]. In this chapter, we are concerned with the problem of routing
(topologically) a collection of two-pin nets in a single layer or plane. We refer to
this problem as the TPR problem. The input to the problem is a two dimensional
routing surface with a collection of modules placed in it (Figure 4.1(a)). We assume
that no two modules touch. There are pins on the periphery of the modules. Pins
with the same number define a net and are to be joined by an interconnect or wire.
In topological routing, we are concerned with defining wire paths. However, no
underlying grid is assumed and there is no minimum wire separation requirement.
Thus wire paths can take any planar shape and may run arbitrarily close to each
other. Wires are not permitted to intersect or run over modules. In Figure 4.1(a),
the broken lines indicate wire paths. The routing instance (RI) of Figure 4.1(a) is
topologically routable in a single layer while that of Figure 4.1(b) is not. The TPR
problem for RIs in which all modules lie on the boundary of the routing region (or
Figure 4.1. A planar routable and a nonplanar routable case: (a) planar routable, (b) a nonplanar routable example
more precisely all pins are on the boundary of the region) was studied in [18, 12, 23].
A simple linear time algorithm for this version of the TPR problem was developed
in these papers. For the case in which none of the modules are on the boundary,
Pinter [23] has suggested using the linear time planarity testing algorithm of Hopcroft
and Tarjan [9]. His algorithm is quite complex. Marek-Sadowska and Tarng [18] have
considered the TPR problem and several variants which include flippable modules
and multiterminal nets. They develop a linear time algorithm for TPR which is
based on module merging. In this chapter, we present, in Section 4.3, another linear
time algorithm for the general TPR problem that is almost as simple as the one
of [18, 12, 23] for the restricted TPR problem. This algorithm was developed by
Lim [16] but the proof that the algorithm is correct was incomplete. In that section,
we also present an algorithm for definite topological routing. That is, if the instance
is topologically routable we give an algorithm to determine the loose route of the
wires. The TPR algorithm is implemented differently than described in Section
4.3. The implementation issues are discussed in Section 4.5. Experimental results
presented in Section 4.6 indicate that our algorithm is considerably faster than the
TPR algorithm of Marek-Sadowska and Tarng [18], particularly if the routing instance
is not planar routable. For the case of multi-pin nets, we show, in Section 4.4, that
testing for topological routability is equivalent to graph planarity testing and that
finding the maximum number of nets that is topologically routable is NP-complete.
We also extend our two-pin algorithm to handle multi-pin instances in which the
modules remain connected following the deletion of all nets with more than two pins.
4.2 Preliminaries
To simplify matters, we shall assume that TPR RIs that have modules on the
boundary (Figure 4.2(a)) have been augmented by a set of nets that are required to
be routed on the boundary and that this routing together with the module boundaries
encloses the routing region (Figure 4.2(b)). This augmentation may require the
addition of corner modules (A, B, C of Figure 4.2(b)). This assumption is needed so
that our algorithm can account for the constraint that one cannot route around a
boundary module but can route around all other modules.
A pin segment, $P = p_1 p_2 \ldots p_k$, is a sequence of pins on the boundary of a
module. The pins $p_1, \ldots, p_k$ appear in this order when the module is traversed counterclockwise
beginning at $p_1$. Some of the pin segments of the modules of Figure 4.3 are: abcde and
gjkH of module 1; MLK and LKJGf of module 3; and AiF of module 2. Let last(P)
and first(P), respectively, denote the last and first pins of segment P. Let $net(p_i)$
denote the net associated with pin $p_i$. Note that two pins $p_i$ and $p_j$ are to be connected
by a wire iff $net(p_i) = net(p_j)$. A curve, $C = P_1 P_2 \ldots P_j$, is a sequence of pin segments
Figure 4.2. Augmentation: (a) RI, (b) augmented RI
Figure 4.3. An example to illustrate some terminology
such that $net(last(P_i)) = net(first(P_{i+1}))$, $1 \le i < j$. A curve, $C = P_1 P_2 \ldots P_j$,
is a closed curve iff $net(last(P_j)) = net(first(P_1))$. In Figure 4.3, $net(p_i)$ is the
lowercase letter corresponding to $p_i$. So, net(h) = net(H) = h. Some of the curves
of Figure 4.3 are Ih Habcdeg Gf FEDCBAi, j JGfM mlh, edcba ABCDE and
ABC cdeg GfM. Ih Habcdeg Gf FEDCBAi and edcba ABCDE are closed curves.
With any curve $C = P_1 P_2 \ldots P_j$, we associate j - 1 (j in case C is closed) wires. These,
respectively, connect the pins $last(P_i)$ and $first(P_{i+1})$, $1 \le i < j$ (and $last(P_j)$ and
$first(P_1)$ in case of a closed curve). Note that the curves, closed curves, and wires
associated with any RI depend only on the modules and the net to pin assignments.
These are not a function of the layout of any of the wires.
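The curve and closed-curve conditions can be checked mechanically. The sketch below uses the naming convention of Figure 4.3, where net(p) is the lowercase form of the pin letter, and represents each pin segment as a string of pin letters; the function names are assumptions made for the example, and the sketch does not verify that each string really is a pin segment of some module.

```python
def net(pin):
    """Net of a pin under the Figure 4.3 convention: the lowercase letter."""
    return pin.lower()

def is_curve(segments):
    """True iff net(last(P_i)) = net(first(P_{i+1})) for consecutive segments."""
    return all(net(a[-1]) == net(b[0]) for a, b in zip(segments, segments[1:]))

def is_closed_curve(segments):
    """True iff the sequence is a curve and net(last(P_j)) = net(first(P_1))."""
    return is_curve(segments) and net(segments[-1][-1]) == net(segments[0][0])
```

Applied to the curves listed above, Ih Habcdeg Gf FEDCBAi and edcba ABCDE are reported as closed, while j JGfM mlh and ABC cdeg GfM are curves that are not closed.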
For any closed curve $C = P_1 P_2 \ldots P_j$ we define the following:
$module(P_i)$ ... module corresponding to pin segment $P_i$
$pins(module(P_i))$ ... set of all pins on module $module(P_i)$
$pins(P_i)$ ... set of all pins on segment $P_i$
$pins(C)$ ... set of all pins on curve C = $\bigcup_{i=1}^{j} pins(P_i)$
$extpins(C)$ ... $\bigcup_{i=1}^{j} pins(module(P_i)) - pins(C)$
Note, it is possible that $module(P_i) = module(P_j)$ for $i \ne j$.
Lemma 2 : [16] Let I be an RI that contains a closed curve C with respect to which
there are two pins $a \in pins(C)$ and $b \in extpins(C)$ such that net(a) = net(b). I is
not planar routable.
Figure 4.4. Two possibilities to connect a and b: (a) original situation, (b) connect terminals a and b, (c) rerouting of some net
Proof : Figure 4.4 shows two possibilities. It should be clear that no matter how
the wires of C and the wire (a, b) are laid out, there must be an intersection between
two of these. □
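The condition of Lemma 2 is a simple set computation once pins(C) and extpins(C) are available. In the sketch below, a module is a list of pin letters, a closed curve is a list of (module name, segment) pairs, and net(p) is again the lowercase letter; these representations and the name lemma2_witness are assumptions made for the example, and the two small instances in the test are hypothetical rather than taken from any figure.

```python
def lemma2_witness(modules, curve):
    """Return a pair (a, b) with a in pins(C), b in extpins(C), net(a) = net(b).

    modules maps a module name to its list of pins; curve is a list of
    (module_name, segment) pairs describing a closed curve C.
    Returns None if no such pair exists.
    """
    pins_c = {p for _, seg in curve for p in seg}
    module_pins = {p for m, _ in curve for p in modules[m]}
    extpins = module_pins - pins_c
    ext_by_net = {p.lower(): p for p in extpins}
    for a in sorted(pins_c):
        b = ext_by_net.get(a.lower())
        if b is not None:
            return (a, b)
    return None
```

A non-None result certifies, by Lemma 2, that the instance is not planar routable; a None result says only that this particular closed curve yields no such certificate.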
Lemma 3 : [16] Let I be an RI that contains a closed curve $C = P_1 P_2 \ldots P_j$
and another curve $R = R_1 R_2 \ldots R_k$ such that $module(R_1) = module(P_d)$ for some d,
$1 \le d \le j$, and $first(R_1) \in extpins(C)$ (see Figure 4.5). Assume that there exist two
pins a and b such that $a \in pins(C)$, $b \in extpins(C) \cup pins(R)$, and net(a) = net(b).
I is not planar routable.
Proof : Follows from Lemma 2. □
Two modules are connected iff there is a curve $C = P_1 P_2 \ldots P_j$ such that both
modules are in $\bigcup_{i=1}^{j} module(P_i)$. A connected component (or simply component) is
Figure 4.5. Another not planar routable situation
a maximal set of modules that are pairwise connected. It is easy to see that the
connected components of an RI are disjoint. A boundary component is a connected
component that includes at least one boundary module. Note that an RI with no
boundary modules has no boundary component while an RI with at least one boundary
module has exactly one boundary component (this is because RIs with boundary
components have been augmented as in Figure 4.2(b)).
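Since a curve can leave a module from any of its pins, two modules are connected exactly when they are linked by a chain of nets, so the components can be found with a union-find pass over the pins. The sketch below assumes modules are given as lists of pin letters with net(p) equal to the lowercase letter; the function name components and the sample instance are hypothetical.

```python
def components(modules):
    """Group modules into connected components; returns a sorted list of sorted lists."""
    parent = {m: m for m in modules}

    def find(x):
        # Find the representative of x with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}               # net -> first module seen carrying that net
    for m, pins in modules.items():
        for p in pins:
            n = p.lower()
            if n in first_seen:
                parent[find(m)] = find(first_seen[n])
            else:
                first_seen[n] = m

    groups = {}
    for m in modules:
        groups.setdefault(find(m), set()).add(m)
    return sorted(sorted(g) for g in groups.values())
```

Each component found this way can then be tested for routability on its own, which is what the next lemma justifies.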
Lemma 4 : An RI is topologically routable iff its components are (independently)
topologically routable.
Proof : It is easy to see that if the RI is topologically routable then each of its
components is topologically routable. Assume that each component is topologically
routable. Order the components of the RI so that the boundary component is first.
The remaining components are in arbitrary order. Let the components in this order
be $K_1, K_2, \ldots, K_k$. If k = 1, then nothing is to be proved. So, assume k > 1. We
shall show how to construct a topological routing for $K_1, K_2, \ldots, K_a$ from a topological
Figure 4.6. Constructing the envelope of a component: (a) modules, (b) spanning tree, (c) envelope
routing for $K_1, \ldots, K_{a-1}$ and $K_a$, $2 \le a \le k$. First, since a > 1, $K_a$ is not a boundary
component. So, it is possible to surround it by a closed non-self-intersecting line
such that the region enclosed by this line includes exactly those modules that are
in $K_a$ and no module touches the line. The region enclosed by this closed line has
the property that any two points in the enclosed region can be joined by a line (not
necessarily straight) that lies wholly within the region. We refer to the surrounding
line as the envelope of $K_a$. One way to obtain an envelope of $K_a$ is to first construct
a set of $|K_a| - 1$ ($|K_a|$ is the number of modules in $K_a$) lines (not necessarily straight)
so that the modules of $K_a$ together with these lines form a connected component in the
graph theoretic sense (see Figure 4.6). These lines do not touch or cross any of the
modules of the RI. This construction can be done as every pair of modules of an RI can
Figure 4.7. Rerouting to free independent component: (a) intersections, (b) rerouting
be connected by such a line. The lines and modules define a spanning tree for
$K_a$. By fattening the lines as in Figure 4.6(c), the envelope is obtained. It is easy
to see that if $K_a$ is topologically routable, then it is topologically routable within the
defined envelope. So, use such a topological routing for $K_a$. When this routing is
embedded into the routing for $K_1, \ldots, K_{a-1}$, some of the topologically routed wires
of $K_1, \ldots, K_{a-1}$ may intersect (or touch) the envelope of $K_a$. However, none of these
wires originate or terminate in the envelope of $K_a$. So, these can be rerouted following
the contour of the envelope (Figure 4.7). □
As a result of Lemma 4, we need concern ourselves only with the case when the
RI has a single component.
4.3 The Algorithm
Our algorithm to obtain a topological routing of a component uses Lemmas 2
and 3 to detect infeasibility. The algorithm is given in Figure 4.8. As stated, it
only produces an ordering of the wires such that when the wires are topologically
Algorithm Testing_Planar_Routability
Step 1: Let m be any module of the component and let p be any pin of m.
Step 2: Examine the pins of m in counterclockwise order beginning at pin p. When
a pin q is being examined, compare net(q) and net(r), where r is the pin (if any)
at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and
the remaining pins of m to the top of stack A. Otherwise output (q, r) and
unstack r from A.
Step 3: If both stacks A and B are empty, then terminate.
Step 4: Let r be the pin at the top of stack A. Let s be the pin such that
net(r) = net(s).
(a) If s is at the top of stack B, then [output (r, s); unstack r from A and
s from B; go to the start of Step 3].
(b) If s is in stack B but not at the top, then [output("The RI is not planar
routable"); terminate].
(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the
start of Step 4].
(d) If s is in neither of the stacks, then [set p to s; let m be the module
containing s; go to Step 2].
Figure 4.8. Topological routing.
routed, one at a time, in this order, then there is always a path between the two
end points of the wire currently being routed such that this path does not intersect
previously routed wires or cross any of the modules. This is sufficient to obtain the
actual topological routing.
Our algorithm employs two stacks A and B. Stack A maintains a pin sequence
that defines a curve of the RI. Stack B is used to retain pins that define closed curves
with respect to a (sub) curve on stack A. We describe the working of the algorithm
Figure 4.9. Example RI: (a) example RI, (b) a possible topological routing
with the aid of an example (Figure 4.9(a)). There are four modules 1-4 and 16 pins
a-h and A-H. net(p) = p if p is a lowercase letter and net(p) = lowercase(p) if
p is an uppercase letter. Suppose we begin in step 1 with m = 3 and p = B. Then
in step 2, BAFEC get stacked, in that order, onto stack A. This corresponds to
the curve of Figure 4.10(a). Pin c is in neither of the stacks and in step 4(d), we set
m = 1, p = c, and go to step 2. In this step, the wires Cc, Ee and Ff are output
for routing. The pins g and D are put on the top of stack A. The curve traced so
far is shown in Figure 4.10(b). The routed wires are also shown as a curve. Note
that these wires have to be routed using the procedure find_route; otherwise they
can enclose a nonempty region. The curve is extended to module 2 and stack A has
configuration BAghb. The wire Dd is output for routing. The curve has the form
shown in Figure 4.10(c). The curve cannot be extended further as both end points
of wire Bb are on the stack. This means that we have detected a closed curve of the
Figure 4.10. Illustration of the routing sequence, panels (a) to (d)
RI. The detected curve is that of Figure 4.10(c). We defer the routing of Bb until we
have verified Lemmas 2 and 3 for this closed curve. The deferment also ensures that
the current topological routing does not contain a closed line. If Bb were routed now,
then the wires Bb, Cc, and Dd together with the boundaries of modules 1, 2, and
3 would define a closed line that encloses a nonempty region. This could result in
future routing problems as there would be no path between a point in the region and
one that is outside the region. For example, if the routing of Figure 4.11 is used, then
there is no path between a and A as a is in the enclosed (shaded) region while A is
outside of it. The routing of Bb is deferred by saving b on stack B. The curve of stack
A is extended to module 4 via the wire hH. Wire hH is output for routing. Also,
the wires gG and Aa are output for routing. Stack A contains the pin B and stack
B contains the pin b. The curve is shown in Figure 4.10(d). Finally, the wire Bb is
output for routing and, since both stacks are then empty, the algorithm terminates
successfully in step 3.
Figure 4.11. Trapped terminal and module
The routing order is Cc, Ee, Ff, Dd, hH, gG, Aa and Bb. Let us try this
out on our example. We see that no matter how Cc is routed there will remain a
routing path for the remaining wires. The routing of Dd and Hh cannot create any
enclosed regions and so cannot affect the feasibility of future routes. When Ee and
Ff are routed, an enclosed region can be formed. Hence these wires have to be
routed using the procedure find_route of Figure 4.13; otherwise they can enclose a
nonempty region. The topologically routed RI is shown in Figure 4.9(b).
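The two-stack procedure of Figure 4.8 can be sketched directly. The code below is an illustration under stated assumptions, not the implementation discussed later in this chapter: the name routing_order, the dictionary representation of modules, and the rule net(p) = lowercase(p) are assumptions, every net is assumed to have exactly two pins, and all modules are assumed to form a single component. The counterclockwise pin orders used in the example are inferred from the stacking order described in the walk-through above, since the figure itself is not reproduced here.

```python
def routing_order(modules, start_pin):
    """Return the list of wires in output order, or None if step 4(b) is reached.

    modules maps a module name to its pins in counterclockwise order.
    """
    owner = {p: m for m, pins in modules.items() for p in pins}
    by_net = {}
    for p in owner:
        by_net.setdefault(p.lower(), []).append(p)
    mate = {}
    for a, b in by_net.values():          # assumes exactly two pins per net
        mate[a], mate[b] = b, a

    stack_a, stack_b, wires = [], [], []
    in_a, in_b = set(), set()

    def visit(m, p):                      # Step 2
        pins = modules[m]
        k = pins.index(p)
        order = pins[k:] + pins[:k]
        for idx, q in enumerate(order):
            if stack_a and stack_a[-1].lower() == q.lower():
                r = stack_a.pop()
                in_a.discard(r)
                wires.append((q, r))
            else:
                rest = order[idx:]
                stack_a.extend(rest)
                in_a.update(rest)
                break

    visit(owner[start_pin], start_pin)    # Step 1
    while stack_a or stack_b:             # Step 3
        if not stack_a:                   # defensive: not expected for valid input
            return None
        r = stack_a[-1]                   # Step 4
        s = mate[r]
        if stack_b and stack_b[-1] == s:  # 4(a)
            wires.append((r, s))
            in_a.discard(stack_a.pop())
            in_b.discard(stack_b.pop())
        elif s in in_b:                   # 4(b): not planar routable
            return None
        elif s in in_a:                   # 4(c): defer r on stack B
            stack_a.pop()
            in_a.discard(r)
            stack_b.append(r)
            in_b.add(r)
        else:                             # 4(d): move to the module of s
            visit(owner[s], s)
    return wires

example = {1: list("cefgD"), 2: list("dhb"), 3: list("BAFEC"), 4: list("HGa")}
order = routing_order(example, "B")
```

On this instance the wires come out in the order Cc, Ee, Ff, Dd, hH, gG, Aa, Bb, which agrees with the routing order given above. A single module with pins a, b, A, B in counterclockwise order is rejected in step 4(b), since its two nets interleave.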
Lemma 5 : If algorithm Testing_Planar_Routability terminates in step 3, then the
input instance is topologically routable.
Proof : We shall show that the algorithm Testing_Planar_Routability maintains the
following invariant:
There is a topological routing of all wires output so far such that each remaining
wire is (individually) topologically routable.
Figure 4.12. To illustrate conflict: panels (a) and (b)
This is true when we start since, at this time, no wires have been output and, for
each wire, there is a routing path between its two pins. Assume that the invariant
holds just before some wire (r, s) is output. We shall show the invariant holds after
this wire is output for routing. Wire (r, s) satisfies exactly one of the following:
(a) It is output in step 2, r is a pin that was on stack A, s is the first pin to be
reached on its module.
(b) It is output in step 2, r was on stack A, s is not the first pin to be reached on
its module.
(c) It is output in step 4(a). At the time of output, r was at the top of stack A
and s at the top of stack B.
If we are in case (a), then since module(s) is reached for the first time, no matter
how wire (r, s) is routed at this time no new enclosed regions are formed. Hence all
remaining wires remain routable.
The proofs for cases (b) and (c) are similar. We consider case (c) only. From the
algorithm it follows that at some time prior to the output of (r, s), both r and s were
on stack A, s was at the top of A and about to be moved to stack B. The pins on
stack A beginning with r and ending at s define a closed curve C (as net(r) = net(s)).
Let these pins be from modules $module(r) = M_1, M_2, \ldots, M_k = module(s)$ (in this
order moving up stack A). Let p be any pin in $pins(C) - \{r, s\}$ and let q be such that
net(p) = net(q). We may assume that either $q \in pins(C)$ or module(q) is unvisited
at the time s is moved from stack A to stack B. Note that if this is not the case,
then q is in stack A and below r at this time. From the working of the algorithm it
follows that when r reaches the top of stack A (as it does by the assumption of (r, s)
being output from step 4(a)), p must be on stack B and above s. So, the algorithm
should have terminated unsuccessfully in step 4(b), contradicting the assumption of
termination in step 3.
Let U be the set of unvisited modules at the time s is transferred from stack A
to stack B. By extending our previous argument, we see that the set N of modules
visited by the algorithm between the time s is transferred from stack A to stack B
and the time (r, s) is output is such that
(1) $N \subseteq U \cup modules(C)$.
(2) All pins in $(N \cap U) \cup (pins(C) - \{r, s\})$ have been output for routing.
Algorithm find_route(r, s)
begin
  currentpin := s; l := clockwisepin(currentpin);
  while (l ≠ r) do
  begin
    Step 1: Route clockwise from currentpin to l following the module boundary
    Step 2: currentpin := q such that net(q) = net(l)
    Step 3: Continue the route from l to currentpin following the existing route
            closely
    Step 4: l := clockwisepin(currentpin)
  end
  Complete the route from currentpin to l = r following the module boundary
end
Figure 4.13. Algorithm to find routing path between pins r and s
(3) All pins reached from $N \cap modules(C)$ are in $pins(C) \cup pins(N \cap U)$.
We now claim that algorithm find_route(r, s) obtains a topological routing of
the wire (r, s) that preserves the invariant. To establish this, we need to show that
(A) The algorithm actually finds the route between r and s.
(B) The region enclosed by this route and the curve C contains no pins that have
not been routed to.
To prove (A), we need to show:
(A1) For each value of currentpin, clockwisepin(currentpin) (i.e., the pin clockwise
from currentpin) is defined and different from currentpin.
(A2) The net (l, currentpin) in step 3 of Figure 4.13 is already routed, so it is
possible to follow this route.
(A3) currentpin does not assume the same value twice.
For (A1), we simply assume that each module has at least two pins. Modules
with a single pin may be ignored initially and routed to after the remaining routes
have been made. For (A2), let the values of currentpin and l at the start of
the i'th iteration of the while loop be $c_i$ and $l_i$, respectively. We note that $c_1 = s$
and $l_1 \in pins(C)$. If $l_i \in pins(C) \cup pins(N \cap U)$, then from conditions (2) and (3),
it follows that $c_{i+1} \in pins(C) \cup pins(N \cap U)$ and wire $(l_i, c_{i+1})$ has been routed.
Suppose there is an $l_i \notin pins(C) \cup pins(N \cap U)$. Let $l_j$ be the first such $l_i$. Since
j > 1, $l_{j-k}$ and $c_{j-k+1}$ are in $pins(C) \cup pins(N \cap U)$ for $k \ge 1$. Since $c_j$ and $l_j$
are on the same module, it follows that $module(c_j) \notin N$. So, $l_j \in extpins(C)$ and
$c_j \in pins(C)$. From the way algorithm Testing_Planar_Routability works, it follows
that $(l_{j-1}, c_j)$ is a segment of the curve C and that curve C, when oriented from r
to s, first reaches $l_{j-1}$ and then $c_j$ via wire $(l_{j-1}, c_j)$. Hence $l_{j-1} \in pins(C)$. Since
$l_{j-2} \in pins(C)$ (by assumption on j), $c_{j-1} \in pins(C) \cup pins(N \cap U)$. Further, since
$c_{j-1}$ is on a module of C, $c_{j-1} \in pins(C)$. Now, since $(l_{j-1}, c_j)$ is a segment of C and $c_{j-1}$
is one pin counterclockwise from $l_{j-1}$ and a pin of C, it follows that $(l_{j-2}, c_{j-1})$ is a segment
of C oriented from $l_{j-2}$ to $c_{j-1}$. Continuing in this way, we conclude that $(l_1, c_2)$ is
a segment of C oriented from $l_1$ to $c_2$. However, we know that when C is oriented
from r to s there is only one wire segment that includes a pin of module(s) and this
is oriented to module(s). That is, the orientation is $c_2$ to $l_1$, a contradiction. Hence,
there is no $l_i \notin pins(C) \cup pins(N \cap U)$. Also, no $l_i \in \{r, s\}$ at the start of a while
loop iteration. From condition (2), it follows that all encountered $(l_i, c_{i+1})$ have been
routed.
For (A3), suppose that c_i = c_j for some i and j, i < j. Since (l_{i-1}, c_i) and
(l_{j-1}, c_j) are two-pin nets, it follows that l_{i-1} = l_{j-1}. Now, since c_{i-1} and c_{j-1}
are, respectively, one pin counterclockwise from l_{i-1} and l_{j-1}, it follows that
c_{i-1} = c_{j-1}. Continuing in this way, we see that s = c_1 = c_{j-i+1}. This implies that
net (l_{j-i}, c_{j-i+1}) = (r, s) has already been routed. But, it has not. So, no c_i is repeated.
(B) follows from the fact that find_route reaches only pins in pins(C) ∪ pins(N ∩ U),
condition (3) and the fact that find_route follows existing routes without enclosing
any new pins. □
Lemma 6 : If the algorithm Testing_Planar_Routability (given in Figure 4.8)
terminates in step 4(b), then the RI is not planar routable.
Proof : If the algorithm terminates in step 4(b), then let r and s be as in step 4. r
is at the top of stack A and s is in stack B but not at the top. Let x be at the top of
stack B and let y be the pin such that net(y) = net(x). y must currently be on stack
A as x can be put on stack B (see step 4(c)) only if y is on stack A. When one pin
of a net is in stack A and the other in stack B, the pins can leave the stacks together
(step 4(a)) or not at all. Since x is on stack B at termination, y must still be on
stack A and hence must be lower than r (as r is at the top). So, there is a curve
y...r in the RI. Furthermore, curves y...r...s and y...r...x must exist as this is
the only way s and x can get to stack A and then to stack B. Figure 4.12(a) shows
an example curve y...r...s. This figure assumes that module(s) ≠ module(r). The
proof for the case module(s) = module(r) is similar. Let m be the module at which
the curves y...r...s and y...r...x diverge (Figure 4.12(b)). Note that m may be
module(r) or a later module on the curve y...r...s. Let u be the pin of m that is the
last pin of m on curve y...r...s and let v be the corresponding pin for y...r...x.
Since all nets are two-pin nets, u ≠ v. Since x is above s in stack B, v must be
on the curve y...r...s. The curve C = y...r...v...x is a closed curve. We see
that r ∈ pins(C), s ∈ extpins(C), and net(s) = net(r). So, s and r satisfy the
conditions of Lemma 3 and the RI is not planar routable. □
Theorem 6 : The algorithm Testing_Planar_Routability (given in Figure 4.8) is
correct.
Proof : Follows from Lemmas 5 and 6. □
The algorithm of Figure 4.8 is easily implemented to have complexity O(n),
where n is the total number of pins. For this we need to use an array status[1..n] to
maintain the current status (i.e., on stack A, on stack B, on neither) of each pin.
4.4 Topological Routability of Multipin Nets
We shall refer to the extension of Testing_Planar_Routability or TPR to the case
where some or all nets may have more than two pins as MTPR. The MTPR problem
may be solved in linear time by mapping MTPR instances into graph planarity
instances [15, 18]. However, the known linear time algorithms [9] for graph planarity
are complex and one is motivated to explore the possibility that simpler algorithms
exist for MTPR (just as they do for TPR). Unfortunately, this is not the case. We
show, in Theorem 7, that any algorithm for MTPR can be used to test graph
planarity with no increase in complexity. In Theorem 8, we show that the problem of
determining the maximum number of topologically routable nets of an MTPR
instance is NP-hard. For the case where all the nets are two-pin nets, we can use the
construction of [18] and the algorithm of [28] to find the maximum subset that is
topologically routable. The complexity of the resulting algorithm is O(n^2), where n
is the total number of nets. Theorem 7 motivates the quest for a simple linear time
algorithm for a restricted version of MTPR. We show that the algorithm of Figure 4.8
may be extended to handle MTPR instances in which every pair of modules remains
connected (though not necessarily by a net) when all nets other than two-pin nets
are eliminated.
Theorem 7 : Let I be an instance of graph planarity. I can be transformed, in linear
time, into an instance I' of MTPR such that I is planar iff I' is topologically routable.
Proof : From the constructions of [15, 18], it follows that the topological routability
of an MTPR instance does not depend on the specific placement of the modules.
Hence, in constructing I', we need not specify the module placement. I' is obtained
from I by replacing each edge (i,j), i < j, of I by a module M_ij with two pins M_ij^i
and M_ij^j. The nets of I' are N_i = {M_ij^i | i < j} ∪ {M_ji^i | j < i}, 1 ≤ i ≤ n,
where n is the number of vertices in I.
If I' is planar routable, then each net, N_i, has a planar realization that does
not contact the realization of any other net. This realization connects the pins of N_i
together, possibly using some Steiner points (see Figure 4.14(a)).
Figure 4.14. Realization of a planar net using one Steiner point
If N_i is a two-pin net, then introduce vertex v_i anywhere on the wire connecting
the two pins of N_i. If N_i has more than two pins, then by using the transformations
of Figure 4.14(b), we can reduce the number of Steiner points to one and also ensure
that each pin of N_i has exactly one wire connected to it. These transformations
preserve the planarity of the routing. The sole surviving Steiner point is replaced by
the vertex v_i.
Now each wire that connects to v_i connects to module M_ij or M_ji. This module
has another wire connecting to vertex v_j. Remove M_ij (or M_ji) and join the ends of
these two wires together by a line joining the terminals of M_ij (or M_ji). We now have
a planar embedding of I.
If I is planar, then start with its planar embedding. Replace v_i by a Steiner
point; place M_ij anywhere on the embedding of edge (i,j), i < j; split the edge (i,j)
at M_ij and connect the two ends (at the split point) to the terminals of M_ij. This
yields a topological routing of I'.
Hence, I is planar iff I' is topologically routable. □
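The construction of I' from I used in this proof can be sketched as below. This is
a minimal illustration under assumed conventions: vertices are numbered 1 to n,
edges are given as pairs, a module is named by a string such as "M1_2", and a pin
is a (module name, vertex) tuple. The function name planarity_to_mtpr and these
representations are hypothetical and are not taken from [15, 18].

```python
def planarity_to_mtpr(n, edges):
    """Build the modules and nets of I' from a graph I on vertices 1..n.

    Each edge (i, j), i < j, becomes a two-pin module; net N_i collects, for
    every edge incident to vertex i, the pin of that module that stands for i.
    No module placement is produced, since routability does not depend on it.
    """
    modules = {}
    nets = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        i, j = min(i, j), max(i, j)
        name = f"M{i}_{j}"
        pin_i, pin_j = (name, i), (name, j)
        modules[name] = [pin_i, pin_j]
        nets[i].append(pin_i)
        nets[j].append(pin_j)
    return modules, nets


# A triangle: three modules, and three nets of two pins each.
modules, nets = planarity_to_mtpr(3, [(1, 2), (2, 3), (3, 1)])
```

Each edge is handled once with constant work, so the sketch runs in time linear in
the size of I. A vertex of degree d yields a net with d pins, so any vertex of degree
three or more gives rise to a multipin net.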
Let MTPRmax be the problem of determining whether or not k of the nets of an
MTPR instance are topologically routable. To show that MTPRmax is NP-complete,
we use the following problem that is known to be NP-complete [7].
Planar Subgraph: Given a graph G = (V, E) and a positive integer k, is there
a subset V' ⊆ V with |V'| ≥ k such that the subgraph induced by the V'
vertices is planar?
Theorem 8 : MTPRmax is NP-complete.
Proof : It is easy to see that MTPRmax is in NP. Also, from an instance I of the
planar subgraph problem, we can construct an instance I' of MTPRmax by replacing
edges by modules as in Theorem 7. It is easy to see that I' has k nets that are
topologically routable iff I has an induced subgraph with k vertices that is planar.
□
Any instance I of MTPR may be transformed into an instance I' of TPR which
includes unordered modules (i.e., modules whose terminals may be rearranged at
will). I' has the property that there is an arrangement of terminals for each of the
unordered modules which results in I' being topologically routable iff I is
topologically routable. To obtain I' from I, for each multipin net N_i of size k, k > 2,
we introduce an unordered module UM_i with k pins. The net N_i is replaced by k
two-pin nets; one pin of each of these nets is an original pin of N_i and the other a pin
of UM_i. Since planar routability is not affected by module placement, UM_i may be
placed anywhere in the routing region. An example is given in Figure 4.15.
(a) A multipin net (b) Two-pin nets
Figure 4.15. Transformation from multipin to two-pin nets
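The replacement of multipin nets by two-pin nets and unordered modules can be
sketched as follows. This is an illustrative sketch only; the function name
split_multipin_nets, the "UM_" naming scheme and the dictionary representation of
nets are assumptions made for the example.

```python
def split_multipin_nets(nets):
    """Replace every net with more than two pins by two-pin nets to a new module.

    nets: dict, net id -> list of pins.
    Returns (two_pin_nets, unordered_modules). The pins of an unordered module
    carry no order; a later step is free to arrange them.
    """
    two_pin_nets = {}
    unordered_modules = {}
    for net_id, pins in nets.items():
        if len(pins) <= 2:
            two_pin_nets[net_id] = list(pins)
            continue
        um = f"UM_{net_id}"
        um_pins = []
        for idx, pin in enumerate(pins):
            um_pin = (um, idx)                 # fresh pin on the unordered module
            um_pins.append(um_pin)
            two_pin_nets[(net_id, idx)] = [pin, um_pin]
        unordered_modules[um] = um_pins
    return two_pin_nets, unordered_modules


# One two-pin net and one four-pin net, as in a small example.
two_pin, unordered = split_multipin_nets({"n1": ["A", "a"], "n2": ["w", "x", "y", "z"]})
```

A net of size k > 2 contributes exactly k two-pin nets and one unordered module with
k pins, while nets that already have two pins are passed through unchanged.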
Theorem 9 : The pins of each unordered module of I' can be ordered so that the
resulting instance of TPR is topologically routable iff I is topologically routable.
Proof : If the pins of the UM_i's can be so ordered, then a topological routing of
I is easily obtained from the topological routing of I' (simply replace each UM_i by
a Steiner point). If I is topologically routable, then using transformations similar to
those in Figure 4.15(b), we may transform the topological routing into one in which
each multipin net N_i of size k > 2 is routed using exactly one Steiner point. This
Steiner point is replaced by module UM_i and the pin ordering is determined by the
topological routing around the Steiner point. □
If we knew which terminal orderings of the UM_i's to use, we could simply convert
each UM_i to an ordered module and run the TPR algorithm. Unfortunately, we do
not know this. Therefore we need to modify algorithm Testing_Planar_Routability so
as to properly handle unordered modules. As in Section 3, we may assume that I
is a single component. For our modification to work, we assume that I remains a
single component when all multipin nets N_i of size k > 2 are eliminated from I. The
modified algorithm MTPR is given in Figure 4.16.
The working of the MTPR algorithm is explained using the example in
Figure 4.17. There are five modules, 1 through 5, and seven nets, of which six are
two-pin nets. The seventh is a four-pin net whose pins are w, x, y and z. In step 1
of the algorithm of Figure 4.16, we replace the multipin net of size four with a new
unordered module UM_1 which has four pins. Suppose we begin in step 2 with m = 5
and p = A. Then in step 3, ACxB get stacked, in that order, on to stack A. This
corresponds to the curve of Figure 4.18(a). The top of stack A now has pin B, and
the curve on stack A is extended by adding pins from module 1 to stack A. In this
process, the wire Bb is output for routing. This situation is depicted in
Figure 4.18(b). Since pin a is at the top of stack A and its mate is below it in the
stack, pin a is moved from stack A to stack B. At the top of stack A, we now have
pin y of the four-pin net and, since this net is seen for the first time, we add the
unordered module to the top of stack B and mark y as having been seen. We route
the wire from pin y to the unordered module. The pin x, which is below pin y, is next
routed to the unordered module. This is done using the procedure find_route of
Figure 4.13. The curve on stack A is extended by adding pins from module 4. At this
time, wire Cc is output for routing. This scenario is depicted in Figure 4.18(c). Now,
pin z is at the top of stack A. This pin is routed to the unordered module in step 6(a)
of the algorithm. Next, in step 3 the wire dD is output for routing. Also, pin E is put
on the top of stack A. At this point stack A contains pins AfE, bottom to top in that
order. This situation is depicted in Figure 4.18(d). We set m = 2 and p = e in step
3 of the algorithm and output wires Ee and fF for routing. The remaining pin w is
put at the top of stack A. In step 6(a) of the algorithm, we mark pin w as seen and
route a wire from this pin to the unordered module. Also, we remove the unordered
module from the top of stack B. This is shown in Figure 4.18(e). Now, stack A
contains pin A and stack B contains pin a, and the wire Aa is output for routing in
step 5(a) of the algorithm. Both the stacks are now empty and the algorithm
terminates successfully in step 4. The topologically routed RI can be found in
Figure 4.17.
Algorithm MTPR
Step 1: For each multipin net N_i of size k > 2, introduce a new unordered module
UM_i with k pins and replace N_i by k two-terminal nets as described earlier.
Step 2: Let m be any ordered module and let p be any pin of m which corresponds
to an original two-pin net.
Step 3: Examine the pins of m in counterclockwise order beginning at pin p. When
a pin q is being examined, compare net(q) and net(r), where r is the pin (if any)
at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and
the remaining pins of m to the top of stack A. Otherwise output (q, r) and
unstack r from A.
Step 4: If both stacks A and B are empty, then terminate.
Step 5: Let r be the pin at the top of stack A. Let s be the pin such that
net(r) = net(s). If module(s) is an unordered module, then go to Step 6.
(a) If s is at the top of stack B, then [output (r, s); unstack r from A and
s from B; go to start of Step 4].
(b) If s is in stack B but not at the top, then [output("The RI is not planar
routable"); terminate].
(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the
start of Step 5].
(d) If s is in neither of the stacks, then [set p to s; let m be the module
containing s; go to Step 3].
Step 6:
(a) If module(s) is at the top of stack B, then [output (r, s); unstack r from
A; mark pin s as having been seen; if all pins of module(s) have been
marked, then unstack module(s) from B; go to start of Step 4].
(b) If module(s) is on B but not at the top, then [output("The RI is not
planar routable"); terminate].
(c) If module(s) is not in stack B, then [unstack r from A; mark pin s as
having been seen; add module(s) to the top of stack B; go to start of
Step 5].
Figure 4.16. Topological routing of multipin net for restricted version
Figure 4.17. Example RI with a four-pin net
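The handling of unordered modules in Step 6 can be illustrated in isolation by the
sketch below. It is not the implementation used in this work. It assumes that stacks
are Python lists with the top at the end, that an unordered module sits on stack B
as a single token, and that marking is tracked by a count of unmarked pins; the names
step6, unmarked and wires are hypothetical. Following the walkthrough above, the
sketch also outputs a wire in case (c), although Figure 4.16 does not state this
explicitly.

```python
def step6(stack_a, stack_b, um, unmarked, wires):
    """Process the pin r on top of stack A whose mate lies on unordered module um.

    unmarked[um] is the number of pins of um not yet marked as seen.
    Returns "step4" or "step5" (where to continue) or "fail" (not routable).
    """
    r = stack_a[-1]
    if um in stack_b:                  # linear scan here; a status array gives O(1)
        if stack_b[-1] != um:
            return "fail"              # Step 6(b): um is buried in stack B
        stack_a.pop()                  # Step 6(a): route r to um
        wires.append((r, um))
        unmarked[um] -= 1
        if unmarked[um] == 0:
            stack_b.pop()              # every pin of um has been seen
        return "step4"
    stack_a.pop()                      # Step 6(c): first pin of this net
    wires.append((r, um))
    unmarked[um] -= 1
    stack_b.append(um)
    return "step5"


# Replay of the multipin part of the walkthrough, as read from the text above.
stack_a, stack_b = ["A", "C", "x", "y"], ["a"]
unmarked, wires = {"UM1": 4}, []
first = step6(stack_a, stack_b, "UM1", unmarked, wires)    # pin y, case (c)
second = step6(stack_a, stack_b, "UM1", unmarked, wires)   # pin x, case (a)
```

After these two calls the sketch leaves A and C on stack A, and pin a below UM1 on
stack B, with two pins of the unordered module still unmarked. The order in which
pins are routed to the module is the order later assigned to its terminals.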
Lemma 7 : If the algorithm MTPR halts in step 4, then the wires are planar routable.
Proof : The proof is very similar to that of Lemma 5, and the same invariant holds.
When we put an unordered module on stack B and mark its first pin, we connect
this pin to the unordered module. This creates no enclosed region, so the invariant
continues to hold. When we later route another pin of the multipin net, the pin to
which it has to be connected is on the unordered module. So as soon as we reach
the unordered module, the next pin is chosen as the pin it has to be connected to
(this also defines the order of pins in the unordered module). The proofs apply in
this case as we have made all nets two-pin nets. □
Figure 4.18. Illustration of the routing sequence with multiterminal net, panels (a) through (e)
Lemma 8 : Let I be an RI that contains a curve C = P_1 P_2 ... P_j. Let
R = R_1 R_2 ... R_k and S = S_1 S_2 ... S_l be two curves such that
module(R_1) = module(P_d) for some d, 1 ≤ d ≤ j, and first(R_1) ∈ extpins(C), and
module(S_1) = module(P_e) for some e, 1 ≤ e ≤ j, and first(S_1) ∈ pins(C).
Let C be such that first(P_1) and last(P_j) are part of the same net N. Assume
that there exist two pins a and b such that a ∈ pins(C) ∪ pins(S), b ∈ extpins(C) ∪
pins(R) and net(a) = net(b) ≠ N.
Then I is not planar routable.
Proof : Follows from Lemma 2. □
Lemma 9 : If algorithm MTPR terminates in step 5(b) or 6(b), the RI is not planar
routable.
Proof : Suppose the algorithm terminates in step 5(b). Let r and s be as in
step 5 and let x be at the top of stack B. Note that r and s define a two-pin net.
If x is a pin of a two-pin net, then the RI is not planar routable (see proof of
Lemma 6). So assume that x is an unordered module (note that only pins of two-pin
nets and unordered modules get on to stack B). Module x must have at least one
marked and one unmarked pin. Let C be the curve defined by the stack A segment
from r to s when s was at the top of stack A just prior to being transferred to stack B.
From the working of MTPR, it follows that there is a pin p ∈ pins(C) from which a
path was traced to the multipin net corresponding to module x. Furthermore, there
is at least one pin a of the multipin net that is on a path from a pin that is not in
pins(C). A possible situation is shown in Figure 4.19. The conditions of Lemma 8 are
satisfied and the RI is not planar routable.
If the algorithm terminates in step 6(b), then r is a pin of a multipin net and
module(s) is an unordered module. Let x be at the top of stack B at the time of
termination. Let j be one of the pins that have already been routed to module(s).
Let C be the curve defined by the stack A segment at the time pin j was output for
routing. Let S be the curve or pin in pins(C) that was used to reach pin x. Let y
be such that net(x) = net(y). Since y must be below r on stack A and r is a pin of
a multipin net, the path from y to r on stack A must include a pin in extpins(C).
By setting a and b of Lemma 8 to x and y respectively, we see that the conditions
of Lemma 8 are satisfied and the RI is unroutable. The proof for the case where x is
an unordered module is similar. □
Figure 4.19. A possible situation where the RI is unroutable
4.5 Implementation of Two-Pin Algorithm
While the correctness proof for our algorithm is somewhat involved, the
algorithm itself is quite simple and easy to implement. To get good performance we
implemented stack A as a stack of modules rather than one of pins as described in
Section 3.3. So, when step 2 of Figure 4.8 adds q and the remaining pins of m to
stack A, we simply add a record of the type (m, q, l), where l is the last pin of m, to
the stack. Also, to get the top pin of stack A, we look at the top record (m, q, l).
The top pin is l. To delete this pin, the top record is changed to (m, q, p(l)), where
p(l) is the predecessor of pin l, unless q = l. In the latter case, the record (m, q, l) is
deleted from the stack. The role of array status needs to be changed to support this
change in stack structure. We now keep a status for a module as well as for a pin.
A module's status reflects whether or not it is in stack A and a pin's status reflects
whether or not it is in stack B.
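The record-based representation of stack A can be sketched as follows. This is a
minimal illustration rather than the C implementation referred to in this chapter; the
class name ModuleStackA and the dictionary prev_pin, which maps each pin to its
predecessor p(l) on the same module, are hypothetical stand-ins for the doubly linked
circular pin lists.

```python
class ModuleStackA:
    """Stack A stored as one record [m, q, l] per module instead of one entry per pin."""

    def __init__(self, prev_pin):
        self.prev_pin = prev_pin    # pin -> predecessor p(pin) on its module
        self.records = []           # each record is [m, q, l]

    def push(self, m, q, last):
        # Adding q and all remaining pins of m costs O(1): a single record.
        self.records.append([m, q, last])

    def top(self):
        # The top pin is the third field l of the top record.
        return self.records[-1][2]

    def pop(self):
        m, q, l = self.records[-1]
        if q == l:
            self.records.pop()                      # last pin of this record
        else:
            self.records[-1][2] = self.prev_pin[l]  # record becomes (m, q, p(l))
        return l


# Module m with pins p, q, x, y in counterclockwise order; push q and the rest.
prev_pin = {"q": "p", "x": "q", "y": "x", "p": "y"}
stack = ModuleStackA(prev_pin)
stack.push("m", "q", "y")
popped = [stack.pop(), stack.pop(), stack.pop()]
```

With this layout, adding the pins of a module to stack A takes constant time
regardless of the number of pins, and each deletion is a constant-time update of the
top record.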
The two-pin net algorithm of Marek-Sadowska and Tarng [18] is a two step
algorithm:
Step 1: Merge modules together to obtain an equivalent routing problem in which
all pins are on the periphery of a routing region.
Step 2: Determine the feasibility of the equivalent problem using a single stack
scheme.
To implement step 1, we performed a traversal of the modules. Each module was
represented as a singly linked circular list of pins. With this representation, modules
can be merged efficiently. By contrast, for the algorithm of Figure 4.8, modules were
represented using doubly linked circular lists.
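A single stack suffices once all pins lie on the periphery of one region, because
two-pin nets can then be routed without crossing exactly when their pins pair up like
balanced parentheses around the boundary. The sketch below shows this standard check.
It is offered as an illustration of the idea behind step 2 and is not claimed to be
the exact procedure of [18]; the function name and the input format (the net id of
each pin, listed in boundary order) are assumptions.

```python
def boundary_nets_planar(boundary_nets):
    """Return True if two-pin nets on a region boundary can be routed without crossing.

    boundary_nets: net id of each pin, listed in order around the periphery.
    Each net id is assumed to occur exactly twice.
    """
    stack = []
    for net in boundary_nets:
        if stack and stack[-1] == net:
            stack.pop()         # this pin closes the net opened most recently
        else:
            stack.append(net)   # first pin of the net, or a net that crosses
    return not stack            # leftovers mean some pair of nets must cross


nested = boundary_nets_planar(["a", "b", "b", "a"])
crossing = boundary_nets_planar(["a", "b", "a", "b"])
```

Here the nested sequence a b b a is accepted, while the interleaved sequence a b a b
is rejected because nets a and b would have to cross inside the region.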
The multipin net algorithm of Marek-Sadowska and Tarng [18] has three steps:
Step 1: Merge modules together to obtain an equivalent routing problem in which
all pins are on the periphery of a routing region.
Step 2: Traverse the pins and transform multipin nets into two-pin nets.
Step 3: Determine the feasibility of the equivalent problem using a single stack
scheme.
Figure 4.20. Tree-like connected circuits
4.6 Experimental Results
We implemented our algorithm for two-pin nets and multipin nets and that
of Marek-Sadowska and Tarng [18] in C and obtained execution times using both
circuits that are routable and those that are not. For the two-pin net case, the
routable circuits used are highly structured ones as shown in Figures 4.20 and 4.21
as well as randomly generated ones. The nonroutable circuits used were obtained by
modifying the tree-like circuits of Figure 4.20.
Figure 4.21. Six-way connected circuits
Figure 4.22. Tree-like connected circuits with multipin nets
Figure 4.23. Eight-way connected circuits with multipin nets
For the multipin net case, we used highly structured circuits as shown in
Figures 4.22 and 4.23. The nonroutable circuits for the multipin case were obtained by
modifying the structured circuits.
The timing results for the routable circuits with two-pin nets are shown in Tables
4.1, 4.2 and 4.3. The times are in milliseconds and the programs were
run on a SUN 4 workstation. On tree-like circuits, the algorithm of Marek-Sadowska
and Tarng [18] took 65% more time than ours, on average; on six-way circuits, it
took approximately 40% more time; and on random circuits, it took approximately
37% more time.
For the multipin net case, the timing results for the routable circuits are shown
in Tables 4.5 and 4.6. On tree-like circuits with multipin nets, the algorithm of
Marek-Sadowska and Tarng [18] took 295% more time than ours, on average; on
