Efficient algorithms for electronic CAD


Material Information

Title:
Efficient algorithms for electronic CAD
Physical Description:
x, 100 leaves : ill. ; 29 cm.
Language:
English
Creator:
Thanvantri, Venkat
Publication Date:

Subjects

Subjects / Keywords:
Computer-aided design   ( lcsh )
Integrated circuits -- Very large scale integration -- Design and construction   ( lcsh )
Computer and Information Science and Engineering thesis, Ph. D
Dissertations, Academic -- Computer and Information Science and Engineering -- UF
Genre:
bibliography   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1995.
Bibliography:
Includes bibliographical references (leaves 97-99).
Statement of Responsibility:
by Venkat Thanvantri.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 002058179
oclc - 33849110
notis - AKP6223
System ID:
AA00004720:00001

Full Text












EFFICIENT ALGORITHMS FOR ELECTRONIC CAD


By

VENKAT THANVANTRI










A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


1995
































© Copyright 1995
by
Venkat Thanvantri














ACKNOWLEDGMENTS


My heartfelt appreciation goes to my advisor Professor Sartaj Sahni for giving

me continued guidance in my thesis work. I thank him for the help, patience and sup-

port he provided throughout my stay in the University of Florida. Weekly meetings

and discussions with him have spawned many ideas, for which I am thankful.

Special thanks go to Dr Baba Vemuri for his interest in my research and for

serving on my supervisory committee. I would like to thank other members in my

supervisory committee, Dr Tim Davis, Dr Li-Min Fu and Dr John Harris, for their

interest and comments. I thank Dr Rajasekharan for agreeing to attend the thesis

defense on very short notice.

Thanks go to Seonghun Cho for his willingness to discuss the general subject of

algorithms.

I thank Archana Nair for being a good friend and giving encouragement when I

needed it most.

Thanks go to my sister Lakshmi and brother-in-law Anand for being there when

I wanted. Finally, I would like to thank my parents for the love and support, without

which I could not have pursued my doctoral studies. To them I dedicate this work.














TABLE OF CONTENTS




ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTERS

1 INTRODUCTION
  1.1 Background
  1.2 Physical Design Automation
  1.3 Thesis Outline

2 FOLDING A STACK OF EQUAL WIDTH COMPONENTS
  2.1 Background
  2.2 Introduction
  2.3 Normalization
  2.4 Equal-Width Height-Constrained
  2.5 Parametric Search
  2.6 Equal-Width Width-Constrained
  2.7 Experimental Results
  2.8 Conclusions

3 STANDARD AND CUSTOM CELL FOLDING
  3.1 Introduction
  3.2 Standard Cell Folding (Problems 1-4)
    3.2.1 Width Constrained Case (Problems 1 and 2)
    3.2.2 Height Constrained Case (Problems 3-4)
  3.3 Standard Cell Folding (Problems 5-7)
    3.3.1 Minimum Channel Height (Problem 5)
    3.3.2 Minimize Chip Area Subject to Width Constraint (Problem 6)
    3.3.3 Minimize Chip Area Subject to Height Constraint (Problem 7)
  3.4 Custom Cell Folding (Problems 8 and 9)
    3.4.1 Width Constrained Folding (Problem 8)
    3.4.2 Height Constrained Folding (Problem 9)
  3.5 Experimental Results
  3.6 Conclusions

4 PLANAR TOPOLOGICAL ROUTABILITY
  4.1 Introduction
  4.2 Preliminaries
  4.3 The Algorithm
  4.4 Topological Routability of Multi-pin Nets
  4.5 Implementation of Two-Pin Algorithm
  4.6 Experimental Results
  4.7 Conclusion

5 CONCLUSIONS AND FUTURE WORK

REFERENCES

BIOGRAPHICAL SKETCH















LIST OF TABLES


2.1 Summary of results of Paik and Sahni
2.2 Comparison of equal-width height-constrained algorithms
2.3 Run times of equal-width width-constrained algorithms
3.1 Heights produced by width-constrained standard cell folding algorithms
3.2 Run times of width-constrained folding algorithms for custom cells
4.1 Tree-like Connected Circuits
4.2 Six-Way Connected Circuits
4.3 Random Circuit
4.4 Faster Termination for NonRoutable Circuits
4.5 Eight-way Connected Circuits with Multi-pin Nets
4.6 Tree-like Connected Circuits With Multi-pin Nets
4.7 Faster Termination for NonRoutable Circuits With Multi-pin Nets

















LIST OF FIGURES




2.1 Stack of equal width components
2.2 Routing space reserved
2.3 Case when $h_j + r_{j+1} < r_j$
2.4 Case when $h_j + r_j < r_{j+1}$
2.5 Normalizing a stack
2.6 Procedure to obtain a minimum width folding
2.7 Procedure for parametric search
3.1 Standard cell architecture
3.2 Procedure to obtain a minimum height folding
3.3 Procedure to obtain a minimum channel height folding
3.4 Procedure to obtain a minimum height folding for custom cells
3.5 Procedure to delete F(.) values as in Observation 3
3.6 Procedure to insert F(.) values
4.1 A planar routable and a nonplanar routable case
4.2 Augmentation
4.3 An example to illustrate some terminology
4.4 Two possibilities to connect a and b
4.5 Another not planar routable situation
4.6 Constructing the envelope of a component
4.7 Re-routing to free independent component
4.8 Topological routing
4.9 Example RI
4.10 Illustration of the routing sequence
4.11 Trapped terminal and module
4.12 To illustrate conflict
4.13 Algorithm to find routing path between pins r and s
4.14 Realization of a planar net using one Steiner point
4.15 Transformation from multipin to two-pin nets
4.16 Topological routing of multipin net for restricted version
4.17 Example RI with a four pin net
4.18 Illustration of the routing sequence with multiterminal net
4.19 A possible situation where the RI is unroutable
4.20 Tree-like connected circuits
4.21 Six-way connected circuits
4.22 Tree-like connected circuits with multipin nets
4.23 Eight-way connected circuits with multipin nets
















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Doctor of Philosophy


EFFICIENT ALGORITHMS FOR ELECTRONIC CAD


By


Venkat Thanvantri


August 1995



Chairman: Dr. Sartaj Sahni
Major Department: Computer and Information Science and Engineering


In this thesis, we develop efficient algorithms for three problems that arise in

electronic computer-aided design (ECAD): (1) component stack folding, (2) standard

and custom cell folding, and (3) planar topological routing.

The component stack folding problem arises in the layout of bit-slice architec-

tures. We consider two versions of this problem. In both, the components have equal

width and when a stack is folded, a routing penalty is incurred at the fold. In the

first version, the height of the folded layout is given and we have to minimize the

width. In the second, the width of the folded layout is given and its height is to be

minimized. We develop a normalization technique that permits the first version to








be solved in linear time by a greedy algorithm. The second version can be solved

efficiently using normalization and parametric search.

In standard and custom cell folding, the component list is folded into rows and a

routing penalty is incurred between two rows. In the model we consider, the number

of wires that have to cross between two rows serves as the routing penalty. Nine

versions of the folding problem are formulated and efficient algorithms are developed

for each.

We develop a simple, fast linear time algorithm to determine if a collection of

two-pin nets can be routed topologically in a plane. Topological routability testing

of a collection of multi-pin nets is shown to be equivalent to planarity testing, and a

simple linear time algorithm is developed for the case when the collection of modules

remains connected following the deletion of all nets with more than two pins.

Experimental results are presented.














CHAPTER 1
INTRODUCTION


1.1 Background

With current technology, a single chip can have several million transistors. De-

sign and fabrication of such chips is made possible by the automation of the steps

involving the development of the chip. Starting with the formal specifications, the

VLSI design cycle goes through a series of steps to produce the final product, a fully

packaged chip. The VLSI design cycle consists of the following steps [24]:


1. System Specification: In this step the high level representation of the system is

created. Performance, functionality, the physical dimensions, the choice of the

design techniques and the fabrication technology are considered in this step.


2. Functional Design: The output of this step is a timing diagram which is ob-

tained by considering the behavioral aspects of the system.


3. Logic Design: The logic design, in general, is represented by Boolean expres-

sions. The logic design that represents the functional design is obtained in

this step. The Boolean expressions are minimized to obtain the smallest logic

design. Correctness of the logic design is also asserted in this step.








4. Circuit Design: A circuit which represents the logic design of the system is

developed in this step by taking into consideration speed and power require-

ments, and electrical behavior of the components used in the development of

the circuit.


5. Physical Design: This is the most time consuming step in the VLSI design

cycle. In this step, the components and the interconnections are represented by

geometric patterns. The objective of this step is to obtain an arrangement of

these geometric patterns which minimizes the area and power and satisfies the

timing requirements of the chip. Due to its high complexity this step is broken

down into smaller sub-steps. We will look into this step in detail later in this

chapter.


6. Design Verification: In this step design rule checking and circuit extraction are

done to verify that the circuit layout from the physical design step satisfies the

system specification and design rules.


7. Fabrication: The verified layout is used in the fabrication process to produce

the chip.


8. Packaging, Testing and Debugging: The fabricated chip is packaged and tested

to ensure proper functioning.


Each step in the design cycle can be viewed as a change in representation of the

system. The steps in the VLSI design cycle iteratively improve the representation to

meet the specifications.








1.2 Physical Design Automation

The physical design step maps a circuit design into a physical circuit. The input

to this step is a circuit design which is represented by a set of modules, a set of nets,

a chip carrier and the design rules. The modules and the chip carrier are usually

rectangular. The output of the physical design step is a layout for modules and

interconnections which has the desired functionality.

There are several objective functions that are used in the physical design step.

If the chip size is not fixed then the objective is to find a minimum area layout. When

the circuit speed is a consideration, the objective may be to minimize the critical net

length or minimize the sum of connection lengths.

The field of physical design automation involves developing algorithms and data

structures which can be used in the layout process. The algorithms are used to obtain

solutions which satisfy the objective functions and which meet the design rules. Large

designs and the iterative improvements by the physical design engineers require that

the algorithms developed be very fast.

Physical design is an extremely complex process that is usually broken down

into smaller problems such as partitioning, floorplanning and placement, routing and

compaction.

In the partitioning step, the components of a large circuit are divided into a

collection of smaller subcircuits/modules according to some criteria. The factors that

are considered may be the size of the modules, number of modules and the number of








interconnections between the modules. At the end of the partitioning step, we have

a set of modules and a set of interconnections required between modules.

Selecting areas, power consumption, aspect ratios, and I/O pin locations of the

modules forms the floorplanning step. The floorplanning step optimizes design quality

in terms of chip area, power consumption, timing performance and wire density.

Floorplanning is an important step as it lays the foundation for the final layout.

The precise locations of the components are determined during the placement step

to optimize area and timing.

The routing phase completes the interconnections between the modules. Rout-

ing is usually divided into three smaller sub-problems which are global routing, de-

tailed routing and specialized routing. The global router decomposes a larger routing

problem into small and manageable problems. Steiner trees and spanning trees are

the commonly used approaches for net connection in global routing. Detailed routing

includes switchbox, channel and planar routing. Planar routing is a problem in which the

interconnection topology of the nets is planar. That is, all connections can be realized

on a single layer. Single layer routing is not always possible. In MCM technology

with many routing layers, a subset of nets that is planar routable is preferred. Planar

routing is usually preferred as no via is needed for the interconnections. Vias reduce

the reliability and performance of a circuit. Routing clock nets and power-ground

nets are specialized routing problems.








During the compaction phase, the components and the interconnections are

moved so as to further optimize the layout in terms of area and delay. By compress-

ing the chip, the components come closer thereby reducing the delay between the

components. This step must also ensure that by compressing the chip, design rules

are not violated.

1.3 Thesis Outline

One placement method is to obtain a linear list of components that minimizes
some criterion and then to fold this list into a given height or width so as to
minimize the area. The objective functions that can be used when forming the linear
list of components include minimizing the maximum density of wires between adjacent
components, minimizing the total number of wire segments between adjacent
components, and minimizing the maximum net length [1].

In the bit-sliced placement model introduced by Paik and Sahni [21], compo-

nent reordering is not permitted. That is, the components are ordered by some

objective function to obtain a component stack. This stack is folded into a layout

which is either height-constrained (layout height is given) or width-constrained (lay-

out width is given). In Chapter 2, we look into two problems considered by Paik and

Sahni [21]. We introduce a normalization technique which in combination with the

greedy method and parametric search helps develop linear time algorithms for these

problems.








In Chapter 3, we develop optimal algorithms to fold a linearly ordered list of

standard and custom cells under various optimization constraints. A total of nine

problems are formulated and their solutions provided.

In Chapter 4, we look into the planar topological routability problem. We

develop a linear time algorithm for planar topological routability for the case when

all nets are 2-pin nets. This algorithm determines the topological routability of the

given problem instance and the loose route of wires when the instance is topologically

routable. We also consider the case when there are multi-pin nets. For this case, we

prove that (a) the topological routability problem is equivalent to the graph planarity

problem, and (b) the problem of finding the maximum number of nets that are

topologically routable is NP-Hard. A linear time algorithm is developed for the case

when the circuit modules remain connected following the deletion of all nets that

have more than two pins.

Finally, we present conclusions and some future directions for this research.














CHAPTER 2
FOLDING A STACK OF EQUAL WIDTH COMPONENTS


2.1 Background

Wu and Gajski [31] introduced a new sliced-layout architecture to alleviate the

problems of the general bit-sliced layouts. Most fabricated chips can be described

by register-transfer schematics. In addition to gates, latches, and flip-flops, schemat-

ics include register-transfer components such as registers, counters, adders, ALUs,

shifters, multiplexers, and register files. Standard cell methodology decomposes the

components into basic gates, latches, and flip-flops before layout. Wu and Gajski [31]

suggest that greater layout density can be achieved if register-transfer components

are laid out in a bit-sliced layout architecture.

For each microarchitectural component there is a layout generator that includes

bit-slice generators. All generated bit slices are of the same width. If a component has

a width w, then it has w slices. Each microarchitectural component has a different

height. The component includes the cell-abutment, over-the-cell routing, and inter-

slice switch box to alleviate the problems of the previous approaches. Intracell routing

is done on metal 1 and intercell routing on metal 2. All regular components in a

design are stacked (stack of components) and routed in metal 2. Component stack

folding, in the context of bit sliced architectures introduced by Larmore, Gajski, and








Wu [14], is to fold this stack into itself in a way that minimizes the wasted area.

Stack folding (stack partitioning) is also done in case there are too many components

in a single stack. In [14], this model was used to compile layouts for CMOS

technology. Further applications of the model were considered by Wu and Gajski

[31].

In the model of Larmore et al. [14] and Wu and Gajski [31] the component stack

can be folded at only one point. In addition, it is possible to reorder the components

on the stack. These folding schemes begin by reordering the components by width.

They also show that the folding problem using this model is NP-complete.

A related, yet different, folding model was considered by Paik and Sahni [21].

In this, no limit is placed on the number of points at which the stack may be folded.

Also, component reordering is forbidden. They point out that this restriction is

realistic as the component stack is usually ordered so as to minimize intercomponent

routing requirements and optimize performance. They also point out that this model

may be used in the application cited by Larmore et al. [14] and Wu and Gajski

[31]. Furthermore, it accurately models the placement step of the standard cell and

sea-of-gates layout algorithms of Shragowitz et al. [26, 25]. In the case of standard

cell designs, all modules have the same width while in the case of sea-of-gates designs

module widths and heights vary from module to module.

2.2 Introduction

A stack of equal width components is comprised of variable height components

$C_1, C_2, \ldots, C_n$ stacked one on top of the other. $C_1$ is at the top of the stack and












[Figure 2.1. Stack of equal width components: (a) Component Stack; (b) Folded into three stacks]

[Figure 2.2. Routing space reserved (interstack routing)]



















Table 2.1. Summary of results of Paik and Sahni

                                                   Routing area at stack ends
Problem                                            No                    Yes
Equal width, height constrained                    $O(n)$                $O(n^2)$
Equal width, width constrained                     $O(n)$                $O(n^3)$
Equal height, height constrained                   $O(n^4 \log n)$       $O(n^4 \log n)$
Equal height, width constrained                    $O(n^4 \log^2 n)$     $O(n^4 \log^2 n)$
Variable heights and widths, height constrained    $O(n^5 \log n)$       $O(n^5 \log n)$
Variable heights and widths, width constrained     $O(n^5 \log^2 n)$     $O(n^5 \log^2 n)$

(Source: Paik and Sahni [21])








$C_n$ at the bottom (Figure 2.1(a)). If the stack is realized, physically, in this way,
the area needed is $w \sum_{i=1}^{n} h_i$, where $h_i > 0$ is the height of $C_i$ and $w > 0$ is the width
of each component. If the component stack is folded at $C_i$, we obtain two adjacent
stacks $C_1, C_2, \ldots, C_i$ and $C_n, C_{n-1}, \ldots, C_{i+1}$. The folding also inverts the left-to-right
orientation of the components $C_n, \ldots, C_{i+1}$. Figure 2.1(b) shows the stack of
Figure 2.1(a) after folding at $C_i$ and $C_j$. Notice that folding results in a snake-like
rearrangement. While not apparent from the figure, each fold flips the left-to-right
orientation of a component. As can be seen from Figure 2.1(b), pairs of folded
stacks may have nested components; components in odd stacks are left aligned and
components in even stacks are right aligned. The area of the folded stack is the area
of the smallest rectangle that bounds the layout. To determine this, depending on
the model, we may need to add additional space at the stack ends to allow for routing
between components $C_i$ and $C_{i+1}$, where $C_i$ is a folding point. If so, let $r_i \ge 0$,
$2 \le i \le n$, denote the height of the routing space needed if the stack is folded at $C_{i-1}$
(Figure 2.2).
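
As a rough illustration of this bookkeeping, the following Python sketch computes the stack heights, bounding-rectangle dimensions, and area of a folded layout. It is not taken from the thesis: the function names `segment_heights` and `folded_area` are hypothetical, and the sketch assumes that a stack holding $C_a, \ldots, C_b$ needs height $r_a + \sum_{j=a}^{b} h_j + r_{b+1}$ with $r_1 = r_{n+1} = 0$, that the bounding rectangle is as tall as the tallest stack, and that its width is the number of stacks times $w$.

```python
from typing import List, Sequence, Tuple


def segment_heights(h: Sequence[float], r: Sequence[float],
                    folds: Sequence[int]) -> List[float]:
    """Heights of the stacks obtained by folding at the components in `folds`.

    h[i-1] is h_i and r[i-1] is r_i for 1 <= i <= n (r_1 is expected to be 0).
    `folds` lists, in increasing order, the 1-based indices i of the
    components C_i that end a stack (the fold points).
    """
    n = len(h)
    rr = list(r) + [0]                       # rr[i-1] = r_i, with r_{n+1} = 0
    starts = [1] + [f + 1 for f in folds]    # first component of each stack
    ends = list(folds) + [n]                 # last component of each stack
    heights = []
    for a, b in zip(starts, ends):
        # Assumed stack height: r_a + h_a + ... + h_b + r_{b+1}
        heights.append(rr[a - 1] + sum(h[a - 1:b]) + rr[b])
    return heights


def folded_area(h: Sequence[float], r: Sequence[float],
                folds: Sequence[int], w: float) -> Tuple[float, float, float]:
    """Return (width, height, area) of the assumed bounding rectangle."""
    heights = segment_heights(h, r, folds)
    width = w * len(heights)
    height = max(heights)
    return width, height, width * height


if __name__ == "__main__":
    h = [6, 1, 2]        # illustrative heights h_1..h_3
    r = [0, 5, 1]        # illustrative routing heights r_1..r_3
    print(folded_area(h, r, folds=[], w=4))    # unfolded stack
    print(folded_area(h, r, folds=[1], w=4))   # fold at C_1
    print(folded_area(h, r, folds=[2], w=4))   # fold at C_2
```

With these illustrative numbers, folding at $C_1$ leaves the large routing height $r_2$ at the ends of both stacks, whereas folding at $C_2$ incurs only $r_3$, which is why the choice of fold points matters.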

In practical situations, the height (width) of the rectangle into which the stack

is to be folded may be limited (and known in advance) and we are to minimize the

width (height). Several versions of folding into height (width) constrained rectangles

were considered by Paik and Sahni [21]. Their results are summarized in Table 2.1.

In this chapter we consider two of the problems considered in Paik and Sahni [21]:


(1) Equal-width, height-constrained with routing area at stack ends. In this problem,

we are to fold a stack of equal width components into a rectangle of given








height so as to minimize the width (and hence area) of the rectangle. For this

problem, the algorithm of [21] runs in $O(n^2)$ time. We develop an $O(n)$ algorithm.


(2) Equal-width, width-constrained with routing area at stack ends. Here the width

of the rectangle into which the folding occurs is given and we are to mini-

mize its height (and hence area). Four algorithms with complexity O(n log n),

O(n log log n), O(n log* n), and O(n) respectively are obtained. Experimental

results indicate that the O(n log n) algorithm is fastest in practice. This is due

to the fact that this algorithm has the least overhead.

Our algorithms employ two techniques. The first is normalization in which an

input instance is transformed into an equivalent normalized instance that is relatively

easy to solve. The second technique is parametric search. In Section 2.3 we
describe our normalization technique and then in Section 2.4, we show how this
results in a linear time algorithm for the equal-width height-constrained problem.
Parametric search is described in Section 2.5 and then used in Section 2.6 to
obtain the algorithms for the equal-width width-constrained problem. Experimental
results comparing the relative performance of the various algorithms for the
equal-width width-constrained problem are given in Section 2.7.

2.3 Normalization

Let $h_i$ be the height of the component $C_i$, $1 \le i \le n$. Let $r_i$ be the routing height
needed between $C_{i-1}$ and $C_i$ if the component stack is folded at $C_{i-1}$, $2 \le i \le n$; and
let $r_1 = r_{n+1} = 0$. The component stack so defined is normalized if the conditions C1
and C2 given below are satisfied for every $i$, $1 \le i \le n$.










[Figure 2.3. Case when $h_j + r_{j+1} < r_j$: panels (a) and (b)]

C1: $h_i + r_{i+1} \ge r_i$

C2: $h_i + r_i \ge r_{i+1}$

An unnormalized instance $I$ may be transformed into a normalized instance $\hat{I}$
with the property that from a minimum height or minimum width folding of $\hat{I}$, one
can easily construct a similar folding for $I$. To obtain $\hat{I}$, we identify the least value of
$i$ at which either C1 or C2 is violated. Let this value of $i$ be $j$. By choice of $j$, either

$h_j + r_{j+1} < r_j$, or

$h_j + r_j < r_{j+1}$.

We first note that (since $h_j > 0$) it is not possible for both of these inequalities to
hold simultaneously. Suppose that $h_j + r_{j+1} < r_j$. Now $j > 1$ as $h_1 + r_2 > 0$ while
$r_1 = 0$. Also, $h_j + r_j > r_{j+1}$. Consider any folding of $I$ in which $C_{j-1}$ is a fold
point (Figure 2.3(a)). Let the height of the stack $S_1$ be $h(S_1)$ and that of $S_2$, $h(S_2)$.
Consider the folding obtained from Figure 2.3(a) by moving $C_j$ from $S_2$ to $S_1$. Let
the height of the stacks now be $h'(S_1)$ and $h'(S_2)$. We see that








$h'(S_1) = h(S_1) - r_j + h_j + r_{j+1} \le h(S_1)$

and

$h'(S_2) = h(S_2) - r_j - h_j + r_{j+1} < h(S_2)$.


So, the height and width of the folding of Figure 2.3(b) are no more than those of
Figure 2.3(a). Hence, the instance $I'$ obtained from $I$ by replacing the component
pair $((h_{j-1}, r_{j-1}), (h_j, r_j))$ with the single component $(h_{j-1} + h_j, r_{j-1})$ has the same
minimum width/height folding as does $I$. From a minimum width/height folding for
$I'$ one can obtain one for $I$ by replacing the component $(h_{j-1} + h_j, r_{j-1})$ with the two
components of $I$.

If $h_j + r_j < r_{j+1}$, then $I'$ is obtained by replacing the component pair
$((h_j, r_j), (h_{j+1}, r_{j+1}))$ with the single component $(h_j + h_{j+1}, r_j)$. The proof is similar
to the previous case.
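
To make the replacement step concrete, here is a small worked instance. The numbers are illustrative only and are not taken from the thesis; the routing heights follow the convention $r_1 = r_{n+1} = 0$.

```latex
% Illustrative instance with n = 3 (values chosen for this example only)
\begin{align*}
&(h_1, h_2, h_3) = (6, 1, 2), \qquad (r_1, r_2, r_3, r_4) = (0, 5, 1, 0) \\
&i = 1:\quad h_1 + r_2 = 11 \ge r_1 = 0, \qquad h_1 + r_1 = 6 \ge r_2 = 5 \\
&i = 2:\quad h_2 + r_3 = 2 < r_2 = 5 \quad\Rightarrow\quad j = 2 \\
&\text{fold at } C_1:\quad h(S_1) = r_1 + h_1 + r_2 = 11, \qquad
  h(S_2) = r_2 + h_2 + h_3 + r_4 = 8 \\
&\text{move } C_2 \text{ to } S_1:\quad h'(S_1) = 11 - 5 + 1 + 1 = 8, \qquad
  h'(S_2) = 8 - 5 - 1 + 1 = 3 \\
&\text{merged instance } I':\quad (h_1 + h_2,\, r_1) = (7, 0), \qquad (h_3, r_3) = (2, 1)
\end{align*}
```

Both stack heights shrink when $C_2$ joins $C_1$, so a fold at $C_1$ is never needed, and the merged two-component instance satisfies both normalizing conditions.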

The component pair replacement scheme just described may be repeated as often
as needed to obtain a normalized instance $\hat{I}$. Note that the scheme terminates as
each replacement reduces the number of components by one and every one-component
instance is normalized.

The preceding discussion leads to the normalization procedure Normalize of
Figure 2.5. The input to this procedure is a component stack C[1] ... C[n] and the
output is a normalized stack C[1] ... C[n] (the input n (say n'') will generally be
larger than the output n (say n')).
C[i].h, C[i].r, C[i].f, and C[i].l, respectively, give the height, routing height needed
if the stack is folded at C[i - 1], index of first input component represented by C[i],








[Figure 2.4. Case when $h_j + r_j < r_{j+1}$: panels (a) and (b)]


Procedure Normalize(C, n)
{ Normalize the component stack C[1] ... C[n] }
    i := 1; next := 2;
    while next <= n + 1 do
        case
            : C[i].h + C[next].r < C[i].r :
                { Combine with C[i - 1] }
                C[i - 1].h := C[i - 1].h + C[i].h;
                C[i - 1].l := C[i].l;
                i := i - 1;
            : C[i].h + C[i].r < C[next].r :
                { Combine with C[next] }
                C[i].h := C[i].h + C[next].h;
                C[i].l := C[next].l;
                next := next + 1;
            : else :
                C[i + 1] := C[next];
                i := i + 1; next := next + 1;
        end;
    n := i - 1;
end; { Normalize }


Figure 2.5. Normalizing a stack








and index of the last input component represented by C[i]. At input, we have

C[i].h = $h_i$

C[i].r = $r_i$

C[i].f = C[i].l = $i$

$1 \le i \le n$, and C[n + 1].r = 0. Note that, by definition, C[1].r = $r_1$ = 0. On
output, component C[i] is the result of combining together the input components
C[i].f, C[i].f + 1, ..., C[i].l. The heights and the r values are appropriately set. The
correctness of procedure Normalize is established in Theorem 1. Its complexity is O(n) as each
iteration of the while loop takes constant time; the first two case clauses can be
entered at most a total of n - 1 times as on each entry the number of components is
reduced by 1. The else clause can be entered at most n times as on each entry
next increases by 1 and this variable is never decreased in the procedure.
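
The procedure can be sketched in Python as follows. This is a minimal transcription under stated assumptions rather than the thesis's own code (the thesis reports C programs): the `Component` class and the function name `normalize` are hypothetical, the loop is taken to run while next is at most n + 1, and a sentinel component with routing height 0 stands in for C[n + 1].

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Component:
    h: float  # height of the (possibly merged) component
    r: float  # routing height needed if the stack is folded just before it
    f: int    # index of the first input component it represents
    l: int    # index of the last input component it represents


def normalize(heights: Sequence[float],
              routing: Sequence[float]) -> List[Component]:
    """Return a normalized stack for h_1..h_n and r_1..r_n (r_1 must be 0)."""
    n = len(heights)
    # 1-based array: C[0] is unused, C[n + 1] is a sentinel with r = 0.
    C = [None]
    C += [Component(heights[k], routing[k], k + 1, k + 1) for k in range(n)]
    C.append(Component(0, 0, n + 1, n + 1))

    i, nxt = 1, 2
    while nxt <= n + 1:
        if C[i].h + C[nxt].r < C[i].r:
            # C1 is violated at C[i]: merge it into C[i - 1].
            C[i - 1].h += C[i].h
            C[i - 1].l = C[i].l
            i -= 1
        elif C[i].h + C[i].r < C[nxt].r:
            # C2 is violated at C[i]: absorb C[next] into C[i].
            C[i].h += C[nxt].h
            C[i].l = C[nxt].l
            nxt += 1
        else:
            # Both conditions hold: keep C[i] and advance.
            C[i + 1] = C[nxt]
            i += 1
            nxt += 1
    return C[1:i]  # components 1 .. i - 1; the sentinel is dropped


if __name__ == "__main__":
    for c in normalize([6, 1, 2], [0, 5, 1]):
        print(c)
```

Each returned component records through `f` and `l` which input components it stands for, so a folding of the normalized stack can be mapped back to the original components.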


Theorem 1 : Procedure Normalize produces an equivalent normalized component stack.


Proof : The procedure maintains the following invariant at the start of each
iteration of the while loop:

Invariant: Normalizing conditions C1 and C2 are satisfied by all components C[j], j < i.

This is clearly true when i = 1 as there is no component C[j] with j < 1. If
the invariant is true at the start of some iteration, then it is true at the end of that
iteration. To see this, note that if we enter the first clause of the case then following
the execution of this clause, C[j].h, C[j].r, C[j + 1].r, j < i', where i' is the value of
i following execution of the clause, are unchanged. So, the execution does not affect
C1 and C2 for j < i'. If the second case-clause is entered, then again C1 and C2
are unaffected by the execution for j < i as C[j].h, C[j].r, and C[j + 1].r, j < i, are
unchanged. When the third clause is entered the validity of C1 and C2 for j < i'
follows from the fact that the conditions for the first two clauses are false.

On termination, next = n'' + 2 (n'' is the input value of n). The last iteration of
the while loop could not have entered the first clause of the case statement as in this
clause, next is not increased. While in the second clause, next is increased, the
condition C[i].h + C[i].r < C[next].r cannot be true in the last iteration as now
next = n'' + 1, C[i].h + C[i].r > 0, and C[n'' + 1].r = 0. So, the last iteration caused
execution of the third clause of the case statement. As a result, C[n'' + 1] is moved to
position n' + 1 of C. From the invariant, it follows that C1 and C2 are satisfied for
j < i' = n' + 1 (note i' is the final value of i). Hence the output component stack
C[1] ... C[n'] is normalized. □

Theorem 2 establishes an important property of a normalized stack. This prop-

erty enables one to obtain efficient algorithms for the two folding problems considered

in this chapter.


Theorem 2 : Let $(h_i, r_i)$, $1 \le i \le n$, define a normalized component stack. Assume
that $r_1 = r_{n+1} = 0$. The following are true:

P1: $r_k + \sum_{j=k}^{l} h_j + r_{l+1} \le r_{k-1} + \sum_{j=k-1}^{l} h_j + r_{l+1}$, $1 < k \le l \le n$

P2: $r_k + \sum_{j=k}^{l} h_j + r_{l+1} \le r_k + \sum_{j=k}^{l+1} h_j + r_{l+2}$, $1 \le k \le l < n$

Proof : Direct consequence of C2 and C1, respectively. □

Intuitively, Theorem 2 states that the height needed by a contiguous segment

of components from a normalized stack does not decrease when the segment is expanded by

adding components at either end.
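
The one-line proof can be unpacked by cancelling the terms common to both sides of each inequality. The derivation below is a sketch that assumes the index ranges $1 < k \le l \le n$ for P1 and $1 \le k \le l < n$ for P2.

```latex
\begin{align*}
\text{P1:}\quad
  & r_k + \sum_{j=k}^{l} h_j + r_{l+1}
    \le r_{k-1} + \sum_{j=k-1}^{l} h_j + r_{l+1} \\
  \iff\quad & r_k \le r_{k-1} + h_{k-1}
    && \text{(C2 with } i = k-1\text{)} \\[4pt]
\text{P2:}\quad
  & r_k + \sum_{j=k}^{l} h_j + r_{l+1}
    \le r_k + \sum_{j=k}^{l+1} h_j + r_{l+2} \\
  \iff\quad & r_{l+1} \le h_{l+1} + r_{l+2}
    && \text{(C1 with } i = l+1\text{)}
\end{align*}
```

In words, P1 extends the segment upward by $C_{k-1}$ and P2 extends it downward by $C_{l+1}$; in a normalized stack neither extension can reduce the height the segment occupies.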

2.4 Equal-Width Height-Constrained

The height of the layout is limited to h and we are to fold the component stack so

as to minimize its width. This can be accomplished in linear time by first normalizing

the stack and then using a greedy strategy to fold only when the next component

cannot be accommodated in the current stack segment without exceeding the height

bound h. The algorithm is given in Figure 2.6.

From the correctness of procedure Normalize, it follows that a minimum width

folding of the normalized instance is also a minimum width folding of the initial

instance. So, we need only to show that the for loop generates a minimum width

folding of the normalized instance generated by the procedure Normalize. This follows

from properties P1 and P2 (Theorem 2) of a normalized instance. Since a segment

size cannot decrease by adding more components at either end, the infeasibility test

is correct. Also, there can be no advantage to postponing the layout of a component

to the next segment if it fits in the current one.



















Procedure MinimizeWidth(C, n, h, width)
{ Obtain a minimum width folding whose height is at most h }
Normalize(C, n);
used := h; width := 1;
for i := 1 to n do
   case
      : used - C[i].r + C[i].h + C[i + 1].r ≤ h :
           { assign C[i] to current segment }
           used := used - C[i].r + C[i].h + C[i + 1].r;
      : C[i].r + C[i].h + C[i + 1].r > h :
           { infeasible instance }
           output error message; terminate;
      : else : { start next segment, fold at C[i - 1] }
           width := width + 1;
           used := C[i].r + C[i].h + C[i + 1].r
   end;
end; {MinimizeWidth}

Figure 2.6. Procedure to obtain a minimum width folding
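The greedy rule described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the dissertation's code: the function name `greedy_min_width`, the 1-based padded lists, and the choice to count segments from zero are assumptions made for the example, and the input is assumed to have been normalized already (the sketch does not check this).

```python
from typing import List, Optional


def greedy_min_width(h: List[int], r: List[int], limit: int) -> Optional[int]:
    """Return the number of segments a greedy fold uses, or None if infeasible.

    h[1..n] are component heights and r[1..n+1] are routing heights
    (index 0 of both lists is padding). The stack is assumed to be
    normalized; this hypothetical helper does not verify that.
    """
    n = len(h) - 1
    width = 0  # segments opened so far
    used = 0   # height of the current segment, routing at both ends included
    for i in range(1, n + 1):
        alone = r[i] + h[i] + r[i + 1]  # C[i] as a segment by itself
        if alone > limit:
            return None  # a single component already exceeds the bound
        # Extending the segment replaces its trailing r[i] by h[i] + r[i+1].
        extended = used - r[i] + h[i] + r[i + 1]
        if width > 0 and extended <= limit:
            used = extended  # C[i] joins the current segment
        else:
            width += 1       # fold: C[i] starts a new segment
            used = alone
    return width
```

For instance, with heights 2, 3, 1 and routing heights 1, 0, 1, 0, a bound of 5 forces one fold, while a bound of 7 lets all three components share one segment.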










Table 2.2. Comparison of equal-width height-constrained algorithms

     n       [7]    Figure 5

    16      0.11      0.05
    64      1.80      0.14
   256     24.85      0.52

Times are in milliseconds


Note that while we are able to solve the equal-width height-constrained prob-

lem in linear time using a combination of normalizing and the greedy method, the

algorithm of Paik and Sahni [21] uses dynamic programming on the unnormalized

instance and takes O(n^2) time. In Table 2.2, we give the observed run times of the two

algorithms. These were obtained by running C programs on a SUN 4 workstation.

As is evident, our algorithm is considerably superior to that of [21] even on small

instances.

2.5 Parametric Search

In this section, we provide an overview of the parametric search method of

Frederickson [4], which uses developments by Frederickson and Johnson [5, 6] and

Frederickson [3]. This overview has, however, been tailored to suit our application

here and is not as general as that provided by Frederickson and coworkers [3, 4, 5, 6].

Assume that we are given a sorted matrix of O(n^2) candidate values M_{ij}, 1 ≤

i, j ≤ n. By sorted, we mean that

\[ M_{ij} \le M_{i,j+1}, \qquad 1 \le i \le n,\; 1 \le j < n \]

and

\[ M_{ij} \le M_{i+1,j}, \qquad 1 \le i < n,\; 1 \le j \le n. \]

The matrix is provided implicitly. That is, we are given a way to compute Mij,

in constant time, for any value of i and j. We are required to find the least Mij

that satisfies some criterion F. The criterion F has the property that if F(x) is not

satisfied, then F(y) is not satisfied (i.e., it is infeasible) for all y < x. Similarly, if

F(x) is satisfied (i.e., it is feasible), then F(y) is feasible for all y > x. In a parametric

search, the minimum Mij that satisfies F is found by trying out some of the Mijs.

As different Mijs are tried, we maintain two values A1 and A2, A1 < A2 with the

properties:

(a) F(A1) is infeasible.


(b) F(A2) is feasible.

Initially, A1 = 0 and A2 = ∞ (we assume F is such that F(0) is infeasible, F(∞)

is feasible, and Mij > 0 for all candidate values). To determine the next candidate

value to try, we begin with the matrix set S = {M}. At each iteration, the matrices

in S are partitioned into four equal sized matrices (assume, for simplicity, that n

is a power of 2). As a result of this, the size of S becomes four times its previous

size. Next, a set T comprised of the largest and smallest elements from each of the

matrices in S is constructed. The median of T is the candidate value x to try next.

The following possibilities exist for x and F(x):

(1) x < A1. Since F(A1) is infeasible, F(y) is infeasible for all y < A1. So, F(x) is

infeasible.








Procedure PSEARCH(S, A1, A2, dimension, finish);
repeat
   if dimension > 1 then [ replace each matrix in S by
                           four equal sized submatrices;
                           dimension := dimension/2 ]
   for i := 1 to 3 do
   begin
      if dimension = 1 then
         [ Let T be the multiset of values in all matrices of S; ]
      else
         [ Let T be the multiset obtained by selecting the largest
           and smallest values from each matrix of S; ]
      x := median(T);
      if (A1 < x < A2) then
         if F(x) is feasible then A2 := x
         else A1 := x;
      Eliminate from S all matrices that have no values y
      such that A1 < y < A2;
   end;
until dimension^2 * |S| ≤ finish;
end; {PSEARCH}

Figure 2.7. Procedure for parametric search

(2) x > A2. Now, F(x) is feasible.


(3) A1 < x < A2. F(x) may be feasible or infeasible. This is determined by

computing F(x). If x is feasible, A2 is set to x. Otherwise, A1 is set to x.


Following the update (if any) of A1 or A2 resulting from trying out the candidate

value x, all matrices in S that do not contain candidate values y in the range A1 <

y < A2 may be eliminated from S.








A more precise statement of the search process is given by procedure PSEARCH

(Figure 2.7). This procedure may be invoked as PSEARCH({M}, 0, ∞, n, 0). dimension

is the current number of rows or columns in each matrix of S and finish is a stopping

rule. The search for the minimum candidate that satisfies F is terminated when the

number of remaining candidates is ≤ finish. If A2 = ∞ when PSEARCH terminates,

then none of the candidate values is feasible. If A2 is finite, then it is the smallest

candidate that is feasible.
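The search loop can be sketched compactly in Python. The following is a simplified illustration under stated assumptions, not Frederickson's procedure verbatim: the name `psearch`, the 0-based `entry(i, j)` callback, and the use of the lower median are choices made for this example; n is assumed to be a power of 2, rows and columns are assumed nondecreasing, and the loop simply runs until no candidate remains strictly between the two bounds (the finish = 0 case). No claim is made that this sketch attains the stated bound on feasibility tests.

```python
import math
from statistics import median_low


def psearch(entry, n, feasible):
    """Smallest candidate x in an implicit sorted n x n matrix with feasible(x).

    entry(i, j) returns the (0-based) matrix value; feasible is monotone:
    once true for x, it is true for every larger value. Returns math.inf
    when no candidate is feasible. All names here are illustrative.
    """
    lam1, lam2 = 0, math.inf      # lam1 is infeasible, lam2 is feasible
    corners = [(0, 0)]            # top-left corner of each matrix in S
    dim = n                       # current side length of every matrix in S
    while corners:
        if dim > 1:               # split every matrix into four quadrants
            half = dim // 2
            corners = [(r + dr, c + dc) for (r, c) in corners
                       for dr in (0, half) for dc in (0, half)]
            dim = half
        for _ in range(3):
            if not corners:
                break
            t = []
            for r, c in corners:
                t.append(entry(r, c))                      # smallest value
                if dim > 1:
                    t.append(entry(r + dim - 1, c + dim - 1))  # largest value
            x = median_low(t)
            if lam1 < x < lam2:   # only such x needs a feasibility test
                if feasible(x):
                    lam2 = x
                else:
                    lam1 = x
            # Drop matrices whose values all lie outside (lam1, lam2).
            corners = [(r, c) for (r, c) in corners
                       if entry(r + dim - 1, c + dim - 1) > lam1
                       and entry(r, c) < lam2]
    return lam2
```

As a small usage example, with entry(i, j) = (i + 1)(j + 1) on a 4 x 4 matrix and the test "x is at least 5", the search settles on the candidate 6.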

Since we have assumed n is a power of 2, each time a matrix is divided into

four, the submatrices produced are square and have dimension that is also a power

of 2. Since M is provided implicitly, each of its submatrices can be stored implicitly.

For this, we need merely record the matrix coordinates (indices) of the top left and

bottom right elements (actually, the latter can be computed from the former using

the submatrix dimension). The multiset T required on each iteration of the for loop

is easy to construct because of the fact that M is sorted. Note that since M is

sorted, all of its submatrices are also sorted. Consequently, the largest element of

each submatrix is in the bottom right corner and the smallest is in the top left corner.

These elements can therefore be determined in constant time per matrix of S.


Theorem 3 : [4] The number of feasibility tests F performed by procedure PSEARCH

when started with S = {M}, M an n x n sorted matrix that is provided implicitly

is O(log n) and the total time spent obtaining the candidates for feasibility test is

O(n). □








Corollary 1 : Let t(n) be the time needed to determine if F(x) is feasible. The

complexity of PSEARCH is O(n + t(n) log n). □


For some of the algorithms we describe later, PSEARCH will be initiated with

|S| > 1 (i.e., S will contain more than one M matrix initially; all matrices in S will

still be of the same size). To analyze the complexity of these algorithms, we shall use

the following theorem and corollary.


Theorem 4 : [4] If PSEARCH is initiated with S containing m sorted matrices, each

of dimension n, then the number of feasibility tests is O(logn) and the total time

spent obtaining the candidate values for these tests is O(mn). □


Corollary 2 : Let t(n) be as in Corollary 1. The complexity of PSEARCH under the

assumptions of Theorem 4 is O(mn + t(n) log n). □


While we have described PSEARCH under the assumption that the matrices of

candidate values are square and of dimension a power of 2, parametric search easily

handles other matrix shapes and sizes. For this, we can add more rows at the top and

columns to the left so that the matrices become square and have a dimension that is

a power of 2. The entries in the new rows and columns are 0. This does not affect

the asymptotic complexity of PSEARCH. Alternatively, we can modify the matrix

splitting process to partition into four roughly equal submatrices at each step. The

details of these generalizations are given in the literature [3, 4, 5, 6].

Procedure PSEARCH is a restricted version of procedure MSEARCH of [4]. An

alternative search algorithm in which the for loop is iterated twice, once with T








being the multiset of the largest values in S and once with T being the multiset of

the smallest values in S is given in Frederickson and Johnson [5, 6]. We experimented

with both the formulations and found that for our stack folding application, the three

iteration formulation of Figure 2.7 is faster by approximately 43%.

2.6 Equal-Width Width-Constrained

To use parametric search to determine the minimum height folding when the

layout width is constrained to be ≤ w, we must do the following:


(1) Identify a set of candidate values for the minimum height folding. This set must

be provided implicitly as a sorted matrix with the property that each matrix

entry can be computed in constant time.


(2) Provide a way to determine if a candidate height h is feasible; i.e., can the

component stack be folded into a rectangle of height h and width w ?


In this section, for (1), we shall provide an n x n sorted matrix M (n is the

number of components in the stack) of candidate values. For the feasibility test of (2),

we can use procedure MinimizeWidth of Figure 2.6 by setting h equal to the candidate

height value being tested and then determining if width ≤ w following execution of

the procedure. Since the component stack needs to be normalized only once and since

MinimizeWidth will be invoked for O(log n) candidate values, the call to Normalize

should be removed from the procedure MinimizeWidth and normalization done before

the first invocation of this procedure. Also, the remaining code may be modified to

terminate as soon as w folds are made.








Since feasibility testing and normalization each take linear time, from Corol-

lary 1, it follows that the complexity of the described parametric search to find the

minimum height folding is O(n + t(n) log n) = O(n + n log n) = O(n log n).

To determine the candidate matrix M, we observe that the height of any layout

is given by



\[ r_i + \sum_{q=i}^{j} h_q + r_{j+1} \]

for some i, j, 1 ≤ i ≤ j ≤ n. This formula just gives us the height of the segment

that contains components Ci through Cj. Define Q to be the n x n matrix with the

elements

\[ Q_{ij} = \begin{cases} r_i + \sum_{q=i}^{j} h_q + r_{j+1}, & 1 \le i \le j \le n \\ 0, & i > j \end{cases} \]


Then for every value of w, Q contains a value that is the height of a minimum height

folding of the component stack such that the folding has width < w. From Theorem

2, it follows that



\[ Q_{ij} \le Q_{i,j+1}, \qquad 1 \le i \le n,\; 1 \le j < n \]

\[ Q_{ij} \ge Q_{i+1,j}, \qquad 1 \le i < n,\; 1 \le j \le n \]

Let M_{ij} = Q_{n-i+1,j}, 1 ≤ i, j ≤ n. So, M is a sorted matrix that contains all

candidate values. The minimum Mij for which a width w folding is possible is the








minimum height width-w folding. We now need to show how the elements of M may

be computed efficiently given the index pair (i,j). Let



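\[ H_i = \sum_{j=1}^{i} h_j, \qquad 1 \le i \le n \]

and let H_0 = 0. We see that

\[ Q_{ij} = \begin{cases} r_i + H_j - H_{i-1} + r_{j+1}, & i \le j \\ 0, & i > j \end{cases} \]

and so,

\[ M_{ij} = \begin{cases} r_{n-i+1} + H_j - H_{n-i} + r_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases} \]

So, if we precompute the H_i's, each Mij can be determined in constant time. The

precomputation of the H_i's takes O(n) time. Hence, the overall complexity of the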

parametric search algorithm to find the minimum height folding remains O(n log n).
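The constant-time evaluation of a candidate from prefix sums can be illustrated as follows. This is a minimal sketch assuming 1-based padded lists; the function name `make_height_candidates` and the closure-based interface are hypothetical and chosen only for the example.

```python
from itertools import accumulate


def make_height_candidates(h, r):
    """Return a function M(i, j) giving the implicit candidate matrix entry.

    h[1..n] are component heights and r[1..n+1] routing heights; h[0] and
    r[0] are padding and h[0] must be 0 so that H[0] = 0. Indices i and j
    are 1-based, as in the text. Names are illustrative assumptions.
    """
    n = len(h) - 1
    H = list(accumulate(h))  # H[i] = h[1] + ... + h[i], computed in O(n)

    def M(i, j):
        if i + j < n + 1:    # corresponds to Q with row index > column index
            return 0
        # Height of the segment holding components n-i+1 through j.
        return r[n - i + 1] + H[j] - H[n - i] + r[j + 1]

    return M
```

With heights 2, 3, 1 and routing heights 1, 0, 1, 0, the entry M(3, 3) is the height of the whole stack as one segment, 1 + 6 + 0 = 7, and the rows and columns of the resulting 3 x 3 matrix come out nondecreasing.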

We note that our O(n log n) algorithm is very similar to the O(n log n) algorithm

of [5] to partition a path into k subpaths such that the length of the shortest subpath

is maximized. The differences are that


(1) We need to normalize the component stack before parametric search can be

used, and








(2) The definition of Mij needs to be adjusted to account for the routing heights r_i

and r_{j+1} needed at either end of the stack.


Frederickson and Johnson [5, 6] and Frederickson [3, 4] present several refinements of the basic parametric search technique.

These refinements apply to the equal-width width-constrained problem just as well

as to the path partitioning problem provided we start with a normalized instance and

use the candidate matrix M defined above. These refinements result in algorithms of

complexity O(n log log n), O(n log* n), and O(n) for our component stack problem.

2.7 Experimental Results

The four parametric search algorithms for the equal-width width-constrained

problem were programmed in C and run on a SUN 4 workstation. For comparison

purposes, the O(n^3) dynamic programming algorithm of Paik and Sahni [21] was also

programmed. The run time performance of these five algorithms is given in Table 2.3.

These times represent the average time for ten instances of each size. The component

heights were obtained using a random number generator. The four parametric search

algorithms did not exhibit much run time variation among instances with the same

number of components. The algorithm of Paik and Sahni [21] takes much more time

than each of the parametric search algorithms. Within the class of parametric search

algorithms, the O(n log n) one is fastest in the tested problem size range. This may

be attributed to the increased overhead associated with the remaining algorithms.

The O(n log n) algorithm is recommended for use in practice unless the number of

components in a stack is very much larger than 4096.










Table 2.3. Run times of equal-width width-constrained algorithms

     n       [7]    O(n log n)   O(n log log n)   O(n log* n)      O(n)

    16       4.9        1.47           2.28            1.49         1.52
    64     314.7        8.84          15.75           27.14        26.71
   256     23255       45.96          76.55          169.58       169.42
  4096               1041.90        2148.60         2597.75      2760.25

Times are in milliseconds


2.8 Conclusions

We have shown that the equal-width height-constrained and equal-width width-

constrained stack folding problems can be solved by applying the greedy method and

parametric search, respectively, if the input is first normalized. Normalization can be

done in linear time. Hence the overall complexity is determined by that of applying

the greedy method or parametric search to the normalized data.

We have developed a linear time algorithm for the equal-width height-constrained

problem. This compares very favorably (both analytically and experimentally) with

the O(n^2) dynamic programming algorithm of Paik and Sahni [21].

For the equal-width width-constrained problem we have developed four algo-

rithms of complexity O(n log n), O(n log log n), O(n log*n), and O(n), respectively.

All compare very favorably with the O(n^3) dynamic programming algorithm of Paik








and Sahni [21]. Experimental results indicate that the O(nlogn) algorithm performs

best on practical size instances.














CHAPTER 3
STANDARD AND CUSTOM CELL FOLDING


3.1 Introduction

Standard cell and gate array design styles are characterized by a row (column)

organization of the layout. The layout area is divided into a number of parallel rows

separated by routing channels as shown in Figure 3.1. The layout problem is generally

divided into two independent subtasks: placement and routing. In the placement step

the appropriate locations and orientations of the standard cells are decided. In the

routing step, the required connections are added.

One approach to placement is linear ordering with folding [26, 13, 2]. In this

approach, the placement is divided into two distinct steps. The first is linear ordering

in which an order of the modules is determined so as to minimize the connection

length or minimize maximal density of connections for modules positioned in one

line. The folding step maps the linear order into the row structure of the chip.

The linear ordering problem is NP-hard and heuristic strategies are discussed in [26]

to minimize the connection length as well as maximal density of connections. The

greedy strategy is adopted in [26] for folding the ordered modules.

In this chapter, we consider only the second step of the placement approach just

described. We begin with an ordered component list C1, C2, ..., Cn and develop












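[Figure 3.1 (image not reproduced): the component list C1, ..., Cn folded into standard cell rows separated by routing channels. Labels in the figure: standard cell row, routing channel, channel height.]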


Figure 3.1. Standard cell Architecture


algorithms to fold this list into rows. If the list is folded at Ci, then the component

Ci is in one row and Ci+l is in the next. If the list is folded at Ci and Cj and

at no component Ck for i < k < j, then components Ci+1, ..., Cj are in the same

row. Suppose the list is folded at Ci. The channel height needed between the rows

containing Ci and Ci+1 may be estimated [8] using the number of nets that have a pin

in one of the components C1, ..., Ci as well as in one of the components Ci+1, ..., Cn.

Let this height estimate be l_i, 1 ≤ i < n. Let l_n = 0.

We study the following folding problems:

1. Standard cell folding to minimize total routing channel area subject to a chip

width constraint W. Since each routing channel has the same width, the chip

area assigned for routing is minimized when the sum of the channel heights is

minimized. This problem is solved in O(n) time using dynamic programming








(Section 3.2.1). Note that whenever we use the term chip area, we could instead

use subchip area.


2. Standard cell folding to minimize chip area subject to a chip width constraint

W. In this problem both the routing area and the area assigned for the components

are considered. Since the chip width is fixed at W, area minimization is equivalent

to minimizing chip height. In Section 3.2.1, we use dynamic programming to

obtain an O(n) algorithm for this problem.


3. Standard cell folding to minimize total routing area subject to a total routing

channel height constraint H. This problem differs from problem 1 only in that

the total height of the routing channels is fixed at H, and their width is variable

rather than the routing channels having variable total height and fixed width

W. In Section 3.2.2, we show how to solve this problem in O(n log n) time.


4. Standard cell folding to minimize chip area subject to a chip height constraint

H. This problem is solved in O(n log n) time in Section 3.2.2.


5. Standard cell folding using equal height channels of width W. We are to find

a folding that uses channels of minimum height. Among all such foldings, one

that uses the fewest number of routing channels (and hence fewest number of

component rows) is to be found. In Section 3.3.1, we develop an O(n log n) ex-

pected time algorithm for this problem. However, for most practical instances,

the algorithm has run time O(n).








6. Standard cell folding using equal height routing channels of width W. Find a

folding that minimizes the total chip area. This can be done in O(n^2) time (see

Section 3.3.2).


7. Standard cell folding using equal height channels and a chip of height H. The

folding should minimize the total chip area. Our algorithm for this problem

can be found in Section 3.3.3. Its complexity is O(n^2).


8. Custom cell folding to minimize total chip area subject to a chip width W.

Note that in standard cell layout, all cells/components/modules have the same

height and may have variable widths. In custom cell layout, the cells may differ

in both height and width. We assume that the cell row height is set to be the

height of the tallest cell assigned to that row. In Section 3.4.1, we develop an

O(n log n) algorithm for this problem.


9. Custom cell folding to minimize total chip area subject to a chip height con-

straint H. We solve this problem in Section 3.4.2 using an algorithm of com-

plexity O(n log^2 n).


We note that problem 8 has been studied previously in [21] in the context of bit

slice stack folding. The algorithm developed there has complexity O(n^2) while ours

has complexity O(n log n). Problem 9 has also been studied in [21]. Our O(n log^2 n)

algorithm is an improvement over the O(n^2 log n) algorithm developed in [21].








3.2 Standard Cell Folding (Problems 1-4)

Our discussion of problems 1-4 is divided into two parts. In Section 3.2.1, we

consider problems 1 and 2. In both these, the chip width and hence the cell and

routing channel widths are fixed at W. In Section 3.2.2, we consider problems 3 and

4 in both of which the chip height is fixed at H. In all four problems, the routing

channels have variable height. Each cell and hence each cell row has height h. The

width of cell i is w_i, 1 ≤ i ≤ n. Let w_{ij} = \sum_{k=i}^{j} w_k, 1 ≤ i ≤ j ≤ n. In case of fixed

chip width W, we may assume that w_i ≤ W, 1 ≤ i ≤ n.

3.2.1 Width Constrained Case (Problems 1 and 2)

We first consider problem 1. In this, we are to minimize the total routing area.

Since the channel widths are fixed at W, it is sufficient to minimize the sum of

channel heights. Suppose that C1, ..., Cn is folded at Ci in an optimal folding X.

Then the folding of C1, ..., Ci in X as well as that of Ci+1, ..., Cn must be minimum

area foldings. Hence, the principle of optimality holds and we can use dynamic

programming [10].

Let f(i, s), i ≤ s, denote the minimum sum of channel heights when the com-

ponent list Ci, ..., Cn is folded such that Ci, ..., Cs are in one cell row and the first

fold is at Cs (so, Cs+1 is in the next cell row). It is easy to see that f(n, n) = l_n = 0.

For 1 ≤ i < s ≤ n,

\[ f(i,s) = \begin{cases} \infty & \text{if } w_{is} > W \\ f(i+1,s) & \text{otherwise} \end{cases} \tag{3.1} \]









Also, for 1 ≤ i = s < n, we get

\[ f(i,i) = \min_{i < q \le n} \{ f(i+1,q) + l_i \} \tag{3.2} \]

The solution to problem 1 is obtained by first using Equations 3.1 and 3.2 to

determine f(i,s), 1 ≤ i ≤ s ≤ n and then determining the minimum of f(1,j),

1 ≤ j ≤ n. The w_{is}'s may be precomputed in O(n^2) time. Each f(i,s), i < s takes

O(1) time to compute and f(i,i) takes O(n - i) time. Hence, all the f(i,s)'s, i ≤ s

may be obtained in O(n^2) time. The minimum of the f(1,j)'s can be obtained in

O(n) time. So the overall time needed to solve problem 1 using Equations 3.1 and

3.2 is O(n^2).
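The quadratic-time use of Equations 3.1 and 3.2 can be sketched as a direct table fill. This is an illustrative sketch rather than the dissertation's program: the function name `min_routing_height_table` and the 1-based padded lists are assumptions, and the segment widths are accumulated on the fly instead of being stored in a separate table.

```python
import math


def min_routing_height_table(w, l, W):
    """Minimum total channel height via the f(i, s) table (quadratic time).

    w[1..n] are cell widths and l[1..n] channel height estimates with
    l[n] = 0; index 0 is padding. Every w[i] is assumed to be at most W.
    Names and list layout are illustrative assumptions.
    """
    n = len(w) - 1
    f = [[math.inf] * (n + 2) for _ in range(n + 2)]
    f[n][n] = 0
    for i in range(n, 0, -1):
        if i < n:
            # Equation 3.2: the row ends at C_i, the next row ends at some C_q.
            f[i][i] = l[i] + min(f[i + 1][q] for q in range(i + 1, n + 1))
        width = w[i]
        for s in range(i + 1, n + 1):
            width += w[s]  # width of C_i, ..., C_s placed in one row
            # Equation 3.1: either the row is too wide or C_i just joins it.
            f[i][s] = math.inf if width > W else f[i + 1][s]
    return min(f[1][j] for j in range(1, n + 1))
```

For four cells of width 2 with W = 4 and estimates l = (5, 1, 4, 0), the best folding puts two cells in each row and folds at C2, giving a total channel height of 1.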

A more careful implementation of the dynamic programming algorithm results

in a complexity O(n). First we compute the suffix sums

\[ Q_i = \sum_{j=i}^{n} w_j, \qquad 1 \le i \le n \]

in O(n) time. Let Q_{n+1} = 0. From the suffix sums, each w_{is} can be computed in

O(1) time using

\[ w_{is} = Q_i - Q_{s+1} \]

Next, from Equation 3.1 we see that for i < s and w_{is} ≤ W:

\[ f(i,s) = f(i+1,s) = f(i+2,s) = \cdots = f(s,s) = F(s) \]








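So, Equation 3.1 becomes (for i < s)

\[ f(i,s) = \begin{cases} \infty & \text{if } w_{is} > W \\ F(s) & \text{otherwise} \end{cases} \tag{3.3} \]

Using Equation 3.3, Equation 3.2 may be rewritten as:

\[ \begin{aligned} F(i) = f(i,i) &= \min_{i < q \le n} \{ f(i+1,q) + l_i \} \\ &= \min_{i < q \le n,\; w_{i+1,q} \le W} \{ F(q) + l_i \} \\ &= l_i + \min_{i < q \le n,\; w_{i+1,q} \le W} \{ F(q) \} \end{aligned} \tag{3.4} \]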

The minimum total routing height needed is

\[ \min_{1 \le i \le n,\; w_{1i} \le W} \{ F(i) \} \tag{3.5} \]

So, problem 1 may be solved by computing the n F(i)'s using Equation 3.4

(rather than the O(n2) f(i, s)'s using Equations 3.1 and 3.2) and finding the minimum

of O(n) F(i)'s in Equation 3.5. To compute the F(i)'s using Equation 3.4, we begin

with F(n) = 0 and compute F(n - 1), F(n - 2), ..., F(1), in that order. To compute

an F(i) we need to find the minimum of a multiset Si of previously computed F's.

Specifically,

\[ S_i = \{ F(q) \mid i < q \le n \text{ and } w_{i+1,q} \le W \} \]








Observation 1 : If w_{j+1,q} > W, then w_{i+1,q} > W for i < j. Hence, if F(q) ∉ S_j, then

F(q) ∉ S_i for i < j. □


From Observation 1, it follows that S_{i-1} may be computed from S_i, 1 < i ≤ n

by eliminating those F(q)'s for which w_{i,q} > W and adding in F(i) (note that, by

assumption, w_i = w_{ii} ≤ W).


Lemma 1 : If F(a) ∈ S_i, F(b) ∈ S_i, F(a) ≤ F(b), a < b, then we may eliminate F(b)

from S_i and continue to compute S_{i-1}, S_{i-2}, ..., S_1 as described above. This does not

affect the values of F(i - 1), ..., F(1).


Proof : Note that F(j), j < i is being computed using the equation

\[ F(j) = l_j + \min_{F(q) \in S_j} \{ F(q) \} \]

If F(b) is eliminated from S_i, the value of F(i) is unaffected as F(a) ≤ F(b). If F(a)

is eliminated from S_j, j < i because w_{j+1,a} > W, then F(b) would also be eliminated

as a < b and so w_{j+1,b} > w_{j+1,a} > W. If F(a) is eliminated because there is a

F(c) ≤ F(a), c < a, then so also will F(b) be eliminated as F(c) ≤ F(a) ≤ F(b)

and c < a < b. □

Observation 1 and Lemma 1 motivate us to maintain S as a sequential queue

[11] in an array Result[1..n]. Result[i].q and Result[i].F together represent an entry

of S yielding the value F(q). The elements of S are stored in positions tail, tail +

1, ..., head of array Result. The F(q) values are in descending order left-to-right.

Hence, the q values are in ascending order. Procedure MinimizeHtStandard (Figure

3.2) is the resulting algorithm.


Theorem 5 : The procedure MinimizeHtStandard given in Figure 3.2 is correct.


Proof : There are two parts to the working of procedure MinimizeHtStandard. The

first one is computing F(i), in which deletions of F(.)'s can occur. The second one

is inserting the computed F(i) at the appropriate place in the array.

The procedure maintains the following invariant at the start of each iteration of

the for loop.

Invariant: Result[tail].F > Result[tail + 1].F > ... > Result[head].F

It is clearly true when i = n - 1 as head = tail.

The invariant is true at the start of the iteration and so Result[head].F is

the minimum maintained F(.) value. The component number is maintained in

Result[head].q. We check whether Q[i + 1] - Q[Result[head].q + 1] > W and if

so by virtue of Observation 1, we can eliminate this value. We do so by decrementing

the head pointer. We keep repeating this until we find a record k = Result[head].q

such that Q[i + 1] - Q[k + 1] ≤ W. This record pointed to by head has the min-

imum of the maintained F(.) values. We compute F(i) and store it in temp. Notice

that at the end of the while loop, we have deleted a few F(.)'s and the invariant

property still holds.

The invariant holds before the start of the second while loop. Here we start

at tail. If the inequality is true, then we delete the record and this is justified by

Lemma 1. We keep doing so until temp > Result[tail].F. Then we decrement the














Procedure MinimizeHtStandard;
{ Compute the minimum height layout }
{ Initialize S = { F(n) = 0 } }
head := n; tail := n;
Result[tail].F := 0; Result[tail].q := n;
{ Compute F(i) }
for i := n - 1 downto 1 do
begin
   { Compute Si }
   while (Q[i + 1] - Q[Result[head].q + 1] > W) do
      head := head - 1; { delete from Si+1, Observation 1 }
   temp := l[i] + Result[head].F; { Use min F in Si to compute F(i) }
   while (temp < Result[tail].F) do { delete using Lemma 1 }
      tail := tail + 1;
   { Store F(i) }
   tail := tail - 1;
   Result[tail].F := temp;
   Result[tail].q := i;
end; { of for }
while (Q[1] - Q[Result[head].q + 1] > W) do
   head := head - 1;
MinimizeHtStandard := Result[head].F;
end; { of MinimizeHtStandard }


Figure 3.2. Procedure to obtain a minimum height folding








tail pointer and store the temp record. So, the invariant holds at the end of the

iteration. Consequently, the invariant holds at the start of each iteration of the for

loop and the F's are correctly computed.

The minimum height layout is the minimum of the maintained F(.) values that

satisfy the width constraint, i.e., Q[1] - Q[Result[head].q + 1] ≤ W. The last while

loop of the procedure takes care of this fact. The last line of the procedure computes

MinimizeHtStandard, which is the minimum height layout. □

Whenever the pointers head or tail are advanced in the while loops, we delete

F(.) values. This cost can be charged towards deletion of F(.) values. The remaining

code within the for loop takes O(n) amortized time. The complexity of the procedure

MinimizeHtStandard is clearly O(n) as no more than n deletions can take place.

Using standard dynamic programming traceback techniques [10], the fold points can

be obtained in additional O(n) time.
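The queue-based computation can also be sketched with a double-ended queue. This is a hedged illustration, not a transcription of Figure 3.2: the name `min_routing_height` and the use of Python's `collections.deque` in place of the array with head and tail pointers are assumptions, and the sketch removes entries on ties (F(a) ≤ F(b), as Lemma 1 permits) with an explicit emptiness guard.

```python
from collections import deque


def min_routing_height(w, l, W):
    """Minimum total channel height for a width-W folding, in linear time.

    w[1..n] are cell widths and l[1..n] channel height estimates with
    l[n] = 0; index 0 is padding. Every w[i] is assumed to be at most W.
    Names and list layout are illustrative assumptions.
    """
    n = len(w) - 1
    Q = [0] * (n + 2)  # suffix sums, with Q[n + 1] = 0
    for i in range(n, 0, -1):
        Q[i] = Q[i + 1] + w[i]
    # Entries are (q, F(q)); the right end plays the role of head (largest q,
    # smallest F) and the left end the role of tail (most recently added).
    s = deque([(n, 0)])
    for i in range(n - 1, 0, -1):
        # Observation 1: drop entries whose row C_{i+1}..C_q is wider than W.
        while Q[i + 1] - Q[s[-1][0] + 1] > W:
            s.pop()
        f_i = l[i] + s[-1][1]  # Equation 3.4
        # Lemma 1: entries that are no better than F(i) can never win again.
        while s and f_i <= s[0][1]:
            s.popleft()
        s.appendleft((i, f_i))
    # Equation 3.5: the first row C_1..C_q must also fit in width W.
    while Q[1] - Q[s[-1][0] + 1] > W:
        s.pop()
    return s[-1][1]
```

Each entry is appended once and removed at most once, which is the same charging argument used above for the head and tail pointers.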

Problem 2, i.e, minimize total area rather than just routing area may be done

in a similar way. Let f(i, s), i < s now denote the minimum chip height for the

component list Ci,..., Cn assuming the first fold is at s. As before f(n, n) = 0 and

Equation 3.1 holds for i < s. Equation 3.2 needs to be replaced by



\[ f(i,i) = \min_{i < q \le n} \{ f(i+1,q) + l_i + h \} \tag{3.6} \]

Using Equations 3.1 and 3.6 and the development for problem 1, an O(n) time

algorithm for problem 2 may be obtained.








3.2.2 Height Constrained Case (Problems 3-4)


The solutions to problems 3 and 4 are similar. Both use parametric search and

we describe only the solution to problem 3. Since the total height of the routing

channels is fixed at H, the area assigned for routing is minimized by minimizing the

chip width W. To use parametric search to minimize W, we must do the following:


1. Identify a set of candidate values for the minimum W. This set must be pro-

vided as a sorted matrix with the property that each matrix entry can be

computed in constant time.


2. Provide a way to determine if a candidate width W is feasible, i.e., can the

component list be folded using total channel height H and width W ?


For the feasibility test of 2, we can use procedure MinimizeHtStandard of Figure 3.2 by

setting W to the candidate value being tested and then determine if MinimizeHtStandard ≤

H following the execution of the procedure.

Next, we provide an n x n sorted matrix M (n is the total number of components

in the component list) of candidate values. To determine the candidate matrix M, we

observe that the width of any layout is given by \sum_{k=i}^{j} w_k for some i, j, 1 ≤ i ≤ j ≤ n.

This formula gives us the width of the segment that contains components Ci through

Cj. M is a sorted matrix that contains all candidate values. The minimum Mij for

which a height H folding is possible is the minimum width height-H folding. We now

show how the elements of M may be computed efficiently given the index pair (i,j).








Let

\[ T_i = \sum_{j=i}^{n} w_j, \qquad 1 \le i \le n \]

and let T_{n+1} = 0. Then,

\[ M_{ij} = \begin{cases} T_{n-i+1} - T_{j+1}, & i + j \ge n + 1 \\ 0, & i + j < n + 1 \end{cases} \]


So, if we precompute the Ti's each Mij can be determined in constant time.

The precomputation of the Ti's takes O(n) time. Since feasibility testing takes linear

time, from Corollary 2, it follows that the complexity of the described parametric

search to find the minimum width folding is O(n + t(n)log n) = O(n + nlog n) =

O(n log n).
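A small sketch of the width candidates, assuming 1-based padded lists, may make the indexing concrete. The function name `make_width_candidates` is hypothetical and the closure interface is just one convenient way to expose an implicit matrix.

```python
def make_width_candidates(w):
    """Return M(i, j), the implicit matrix of candidate chip widths.

    w[1..n] are cell widths (index 0 is padding); i and j are 1-based.
    The name and interface are illustrative assumptions.
    """
    n = len(w) - 1
    T = [0] * (n + 2)  # suffix sums, with T[n + 1] = 0
    for i in range(n, 0, -1):
        T[i] = T[i + 1] + w[i]

    def M(i, j):
        if i + j < n + 1:
            return 0
        # Total width of cells n-i+1 through j placed in one row.
        return T[n - i + 1] - T[j + 1]

    return M
```

For widths 2, 3, 1 the bottom-right entry M(3, 3) is the width of all three cells in one row, 6, and every row and column of the matrix is nondecreasing.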

3.3 Standard Cell Folding (Problems 5-7)

In this section, we deal with layouts which have fixed channel area, e.g., semi-

custom chips in which each routing channel is of the same height.

3.3.1 Minimum Channel Height (Problem 5)

We may view the result of any width W folding as the transformation of the

component list C1, ..., Cn into a new component list B1, ..., Bk, k ≤ n, where Bi

represents the components folded into row i of the layout. The width of each Bi

equals the sum of the widths of the components assigned to cell row i and this is

≤ W. Also, the routing channel between rows i and i + 1 must have height at least

equal to l_{j_i}, where C_{j_i} is the last component assigned to cell row i. We see that

\[ \text{width}(B_i) = \sum_{j=j_{i-1}+1}^{j_i} w_j \]

and height of channel(i) ≥ l_{j_i},

where j_0 = 0. When channel heights are the same, the height must be at least

\[ \max_{1 \le i < k} \{ l_{j_i} \}. \]
With this knowledge, we can develop a greedy algorithm to minimize channel

height. In this, we repeatedly combine together pairs of components (this is equivalent

to assigning them to the same cell row or Bi) so that no created component has width

greater than W. The pairs are chosen in non-increasing order of li. The greedy

algorithm is given in Figure 3.3. Each set of combined components is represented by

a pointer, last, from the first component to the last and another pointer, first, from

the last component to the first. The width of the combined component is kept in the

first elementary component of the combined component.

In the algorithm of Figure 3.3, we initialize the combined component blocks to

consist of elementary components in the first for loop. The sort gives us the order

in which the l's are to be "eliminated" so that the maximum of the remaining l's

is the minimum. In the while loop l's are eliminated by combining blocks. This

is done until the next highest l cannot be eliminated (we assume that \sum w_i > W so it is not possible to

eliminate all l's). The highest remaining l is l[p[i]] and this is the smallest channel

height needed.


















Procedure MinChannelHeight;
for i := 1 to n do { initialize component blocks }
begin
   first[i] := i; last[i] := i;
end;
Sort p[1..n] = [1, 2, ..., n] so that
   l[p[i]] ≥ l[p[i + 1]], 1 ≤ i < n
i := 1;
while ( width[first[p[i]]] + width[p[i] + 1] ≤ W ) do
begin
   width[first[p[i]]] := width[first[p[i]]] + width[p[i] + 1];
   first[last[p[i] + 1]] := first[p[i]];
   last[first[p[i]]] := last[p[i] + 1];
   i := i + 1;
end;
MinChannelHeight := l[p[i]];
end;


Figure 3.3. Procedure to obtain a minimum channel height folding








The correctness of the procedure is easily established. For its complexity, we

see that except for the sort step, the others take O(n) time. The sort can be done

in O(n log n) time. However, in practice, $\max\{l_i\} - \min\{l_i\} = O(n)$ and the sort
can be done in O(n) time using a radix sort with radix O(n) (i.e., a bin sort) [11].
One may also verify that the minimum number of cell rows needed is obtained by
doing a greedy folding on the combined components that remain when procedure
MinChannelHeight terminates.
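
The elimination order used by MinChannelHeight can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the procedure of Figure 3.3 itself: indices are 0-based, the names min_channel_height, w, l and W are chosen here for illustration, and l[i] is assumed to be the channel height needed when a fold follows component i. A comparison sort is used for brevity, and 0 is returned when everything fits in one row, a case the text excludes by assumption.

```python
def min_channel_height(w, l, W):
    """Smallest common channel height for a width-W folding (illustrative sketch).

    w[i] is the width of component i; l[i] (i = 0..n-2) is the channel
    height required if a fold is placed right after component i.
    """
    n = len(w)
    first = list(range(n))  # first[b]: first component of the block ending at b
    last = list(range(n))   # last[a]: last component of the block starting at a
    width = list(w)         # width[a]: block width, kept at its first component

    # Try to eliminate fold positions in non-increasing order of l.
    order = sorted(range(n - 1), key=lambda i: l[i], reverse=True)
    for i in order:
        a = first[i]        # block that currently ends at component i
        b = i + 1           # block that currently starts at component i + 1
        if width[a] + width[b] > W:
            return l[i]     # this fold cannot be eliminated
        width[a] += width[b]
        first[last[b]] = a
        last[a] = last[b]
    return 0                # all components fit in a single row
```

For example, with widths [3, 2, 4, 1], fold heights [5, 1, 3] and W = 6, the folds of height 5 and 3 are eliminated and the fold of height 1 remains.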

3.3.2 Minimize Chip Area Subject to Width Constraint (Problem 6)

First, consider a modified version of problem 6 in which in addition to the chip

width W, we are given the height L of each routing channel. We are to fold the

components so as to minimize the total chip area. To solve modified problem 6 in
linear time, we first make a pass over all the components and combine components $C_i$
and $C_{i+1}$ if $l_i > L$. If any component that results has width > W, L is an infeasible

channel height. Following the combining of blocks in this way, the resulting blocks

are packed into cell rows in a greedy manner (i.e., a new cell row is started only if

the component being placed does not fit in the current cell row). The fact that this

minimizes the number of cell rows and hence chip area is easily verified.

Problem 6 can be solved using the solution to modified problem 6 by trying
out all O(n) possible values for L (i.e., the distinct $l_i$'s) and seeing which minimizes
overall area. (Actually, only $l_i$'s that are no less than the minimum feasible L, as
determined by problem 5, need be tried.) The resulting complexity is $O(n^2)$.
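
The two steps of this section can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, indices are 0-based, and the chip height is assumed to be rows * h + (rows - 1) * L for a common cell height h, which is an assumption about the area model rather than something restated in this section.

```python
def rows_for_channel_height(w, l, W, L):
    """Minimum number of cell rows when every channel has height L.

    Returns None if L is infeasible. Components i and i + 1 are combined
    whenever l[i] > L, then the blocks are packed greedily.
    """
    blocks = []
    cur = w[0]
    for i in range(1, len(w)):
        if l[i - 1] > L:
            cur += w[i]          # a fold here would need a taller channel
        else:
            blocks.append(cur)
            cur = w[i]
    blocks.append(cur)
    if max(blocks) > W:
        return None              # some combined block does not fit
    rows, used = 1, 0
    for b in blocks:
        if used + b > W:         # start a new row only when forced to
            rows += 1
            used = 0
        used += b
    return rows


def min_area(w, l, W, h):
    """Try every distinct l value as L; return (area, L, rows) or None."""
    best = None
    for L in sorted(set(l)):
        rows = rows_for_channel_height(w, l, W, L)
        if rows is None:
            continue
        area = W * (rows * h + (rows - 1) * L)   # assumed area model
        if best is None or area < best[0]:
            best = (area, L, rows)
    return best
```

Each call to rows_for_channel_height is linear, so trying all O(n) candidate values gives the quadratic bound stated above.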








3.3.3 Minimize Chip Area Subject to Height Constraint (Problem 7)

As for problem 6, we define a modified problem 7 in which the channel height L

is known. This modified problem is solved using parametric search. The candidate

values are described by the same M matrix as used in Section 3.2.2. The solution

to modified problem 6 is used for the feasibility test. This enables us to solve the

modified version of problem 7 in O(n log n) time. Now, by trying out all O(n) possible

L values (as in Section 3.3.2) the minimum area folding can be determined. The
overall time complexity is $O(n^2 \log n)$.

3.4 Custom Cell Folding (Problems 8 and 9)

In this section, we relax the requirement that all components have the same

height h. Let $h_i$ be the height of $C_i$. If $C_i, \ldots, C_j$ are assigned to the same cell row
and no other components are assigned to this row, then the cell row height is

$$\max_{i \le q \le j} \{ h_q \}$$

The height of the folding is the sum of the heights of the cell rows and routing

channels.

3.4.1 Width Constrained Folding (Problem 8)

Since the chip width is fixed at W, chip area is minimized by minimizing chip

height. Let $R_{i,j} = \max_{i \le q \le j} \{ h_q \}$ and let $f(i, s)$ be the minimum
height into which $C_i, \ldots, C_n$ can be folded such that the first fold is at $C_s$. Following


















Procedure MinimizeHtCustom;
{ Compute the minimum height folding }
head := n; tail := n; left := n; right := n;
for i := 1 to n do
  Hlist[i].gvalue := ∞;
Flist[tail].q := n; Flist[tail].F := 0;
Hlist[n].top := tail; Hlist[n].bottom := tail;
Hlist[n].hvalue := h[n];
Hlist[n].gvalue := Hlist[n].hvalue + Flist[Hlist[n].top].F;
InitializeWinnerTree(T);
for i := n - 1 downto 1 do
begin
  DeleteValue(i);
  InsertValue(i);
end; { of for }
DeleteValue(0);
MinimizeHtCustom := Winner of the Tree T;
end; { of MinimizeHtCustom }

Figure 3.4. Procedure to obtain a minimum height folding for custom cells








the development of Section 3.2.1, we see that $f(n, n) = h_n$ and, for $i < s$,

$$f(i,s) = \begin{cases} \infty & \text{if } w_{i,s} > W \\ f(s,s) + R_{i,s} - h_s & \text{otherwise} \end{cases} \qquad (3.7)$$

and for $i = s$,

$$f(i,i) = \min_{i < q \le n} \{ f(i+1, q) + h_i + l_i \} \qquad (3.8)$$

The minimum height into which the folding can be done is $\min_{1 \le i \le n} \{ f(1, i) \}$.
As described in Section 3.2.1, the set of dynamic programming equations can be
solved in $O(n^2)$ time. However, the development of Section 3.2.1 that results in an
O(n) time solution does not apply to the new set of equations. Instead, we are able
to solve problem 8 in O(n log n) time.

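Define $F(i) = f(i, i) - h_i$. Substituting into Equation 3.7, we get

$$f(i,s) = \begin{cases} \infty & \text{if } w_{i,s} > W \\ F(s) + R_{i,s} & \text{otherwise} \end{cases} \qquad (3.9)$$

From Equation 3.8, we get

$$\begin{aligned}
F(i) = f(i,i) - h_i &= \min_{i < q \le n} \{ f(i+1, q) \} + l_i \\
&= l_i + \min\{ f(i+1, i+1), \min_{i+1 < q \le n} \{ f(i+1, q) \} \} \\
&= l_i + \min\{ F(i+1) + h_{i+1}, \min_{i+1 < q \le n} \{ F(q) + R_{i+1,q} \} \} \\
&= l_i + \min_{i < q \le n} \{ F(q) + R_{i+1,q} \} \qquad (3.10)
\end{aligned}$$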







The height of the minimum height folding is

$$\min_{1 \le i \le n,\; w_{1,i} \le W} \{ F(i) + R_{1,i} \} \qquad (3.11)$$

Beginning with $F(n) = f(n, n) - h_n = 0$, the remaining F's may be computed,
in the order $F(n-1), \ldots, F(1)$, by using Equation 3.10. To use Equation 3.10, we
keep a multiset $S_i$ of F values as in Section 3.2.1. We begin with $S_n = \{F(n)\}$ and
rewrite Equation 3.10 as:

$$F(i) = l_i + \min_{F(q) \in S_i} \{ F(q) + R_{i+1,q} \} \qquad (3.12)$$
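
As a reference point for the faster method developed below, Equations 3.10 and 3.11 can be evaluated directly in quadratic time. The Python sketch below is illustrative only: indices are 0-based, the names are hypothetical, and l[i] is assumed to be the channel height needed when a fold follows component i (the last entry of l is unused).

```python
import math


def min_height_custom(w, h, l, W):
    """Quadratic-time evaluation of Equations 3.10 and 3.11 (sketch).

    F[i] is the height of everything below the row that ends at
    component i, given that a fold follows component i.
    """
    n = len(w)
    F = [math.inf] * n
    F[n - 1] = 0

    def best(start):
        # min over q >= start of F[q] + R_{start,q}, subject to w_{start,q} <= W
        width, row_height, m = 0, 0, math.inf
        for q in range(start, n):
            width += w[q]
            if width > W:
                break
            row_height = max(row_height, h[q])
            m = min(m, F[q] + row_height)
        return m

    for i in range(n - 2, -1, -1):
        F[i] = l[i] + best(i + 1)      # Equation 3.10
    return best(0)                     # Equation 3.11
```

With w = [2, 2, 2], h = [3, 5, 1], l = [1, 2, 0] and W = 4, the best folding puts the first two components in one row, for a total height of 5 + 2 + 1 = 8.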


Observation 1 of Section 3.2.1 applies to Equation 3.12 and we may eliminate
from $S_i$ any F(q) for which $w_{i+1,q} > W$.


Observation 2 : $R_{i,q} \ge R_{i,q-1} \ge \cdots \ge R_{i,i}$, $1 \le i < q \le n$.


Using Observation 2 and Equation 3.12 we can show that Lemma 1 applies for
the computation of the F's as defined in this section.


Observation 3 : If $h_j > h_q$ and $i < j < q$, then $R_{i,q} \ne h_q$. Also, if $h_j > h_{j+1}$ and
$i < j$, then $R_{i,j} = R_{i,j+1}$.


Now, we devise a method to find the minimum in Equation 3.12 efficiently. We

store the F(.) values in an array of records called Flist. Each Flist record has two

fields, Flist.q and Flist.F, with Flist.F = F(Flist.q). For example, if F(8) = 50, then there is a
record that has Flist.q = 8 and Flist.F = 50. There are two pointers, head and















Procedure DeleteValue(i);
{ Delete F(l) such that w_{i+1,l} > W }
done := false; bool := false;
while (not done) do
  if (Q[i + 1] - Q[Flist[Hlist[right].top].q + 1] > W) then
    { Delete this F(.) value }
    Hlist[right].top := Hlist[right].top - 1;
    head := head - 1;
    bool := true;
    if (Hlist[right].top < Hlist[right].bottom) then
      { Make this record inactive }
      Hlist[right].gvalue := ∞;
      AdjustWinnerTree(T, right);
      right := right - 1; bool := false;
    end; {of if}
  else done := true;
  end; {of if}
end; {of while}
if bool then
  Hlist[right].gvalue := Hlist[right].hvalue + Flist[Hlist[right].top].F;
  AdjustWinnerTree(T, right);
end; {of if}
end; { of DeleteValue }


Figure 3.5. Procedure to delete F(.) values as in Observation 1








tail that are used. Initially, head = tail = n. At any point, head and tail have
values such that head ≥ tail and F(tail) > F(tail + 1) > ... > F(head). This data
structure is the same as the one used in Section 3.2.1.

When computing F(i) we need to associate the F(q) values with the $R_{i+1,q}$ values,
generate the values $F(q) + R_{i+1,q}$, and find the minimum of these values. Suppose
$h_q > h_{q+1}$; then $R_{i,q} = R_{i,q+1}$ from Observation 3. In this case, we associate the values F(q) and
F(q + 1) with $R_{i,q}$. In general, if $R_{i,q} = R_{i,q+1} = R_{i,q+2} = \cdots = R_{i,t}$,
then we have a single Hlist record with value $h_q$ and, associated with it, the values
F(q), F(q + 1), ..., F(t). Note that the F(.) values must satisfy the condition
F(q) > F(q + 1) > ... > F(t). Otherwise, the F(.) values that violate the condition
can be removed as in Lemma 1 by doing a left to right scan. We use an array of
records Hlist of size n with fields Hlist.hvalue, representing the height, and Hlist.top and
Hlist.bottom, two pointers that keep track of the F(.) values associated with this
record. The top and bottom pointers point to the F(.) values satisfying the condition
Flist[Hlist.top].F < Flist[Hlist.top - 1].F < ... < Flist[Hlist.bottom].F. That is,
Flist[Hlist.top].F is the minimum F(.) value associated with this record. Note that
every F(.) value is associated with a unique Hlist record. We generate the value
Flist[Hlist.top].F + Hlist.hvalue (which is $F(q) + R_{i,q}$) and store it in Hlist.gvalue
(generated value). These generated values are used to construct a winner tree T (see
Horowitz and Sahni [11]).
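
For readers unfamiliar with winner trees, the following Python sketch shows one common array-based realization of a min winner tree. It is offered as an assumption-laden illustration, not as the structure of [11]: the class name WinnerTree and the methods adjust and winner are hypothetical, and inactive players are represented by the key infinity, mirroring the use of gvalue = ∞ above.

```python
import math


class WinnerTree:
    """Array-based min winner tree over n players (illustrative sketch)."""

    def __init__(self, n):
        self.size = 1
        while self.size < n:
            self.size *= 2
        # key[1] is the root; leaves occupy key[size .. 2*size - 1].
        self.key = [math.inf] * (2 * self.size)

    def adjust(self, i, value):
        """Set the key of player i and replay matches up to the root: O(log n)."""
        pos = self.size + i
        self.key[pos] = value
        pos //= 2
        while pos >= 1:
            self.key[pos] = min(self.key[2 * pos], self.key[2 * pos + 1])
            pos //= 2

    def winner(self):
        """Smallest key among all players (infinity if none is active)."""
        return self.key[1]
```

Setting a player's key to infinity plays the role of making an Hlist record inactive, and each adjust call touches one root-to-leaf path only.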

The winner of the tree T is the minimum we are looking for when computing

F(i). Let an Hlist record be active if Hlist.gvalue ≠ ∞. The pointers left and right,
left ≤ right, are used to point to the currently active list of Hlist records. Hlist[left]

is the leftmost active record and Hlist[right] is the rightmost active record.

The procedure MinimizeHtCustom is given in Figure 3.4. The pointers and the
winner tree T are initialized. In the procedure DeleteValue(i), the
F(l) values that satisfy the conditions in Observation 1, i.e., F(l) values such that
$w_{i+1,l} > W$, are deleted. Let $Q[j] = w_j + w_{j+1} + \cdots + w_n$.

The procedure DeleteValue is given in Figure 3.5. The boolean bool keeps track
of whether an Hlist record has been made inactive. If so, the pointer right is moved
to the left to point to an active Hlist record. Also, the winner tree T is adjusted to
update the current minimum. The call to function AdjustWinnerTree takes O(log n)
time [11]. Note that the winner tree T is adjusted a maximum of two times whenever
an F(.) value is deleted. Let the number of deletes when DeleteValue is invoked be
x. Then, the time complexity of DeleteValue is O(x log n).

The procedure InsertValue(i) first finds the winner of the tree T. This is added
to l[i] to get F(i) as in Equation 3.12. Once we find F(i), we insert an
Hlist record with Hlist.hvalue = h[i], and the F(.) value is inserted in the array of
Flist records. The winner tree T is then adjusted. In the first while loop of
InsertValue, the conditions of Lemma 1 are checked. If the conditions apply, then the
F(.) values are deleted and the winner tree adjusted. Let the number of F(.) value
deletions be y. In the second while loop of InsertValue, it is checked
whether the conditions of Observation 3 apply. If so, the F(.) records of the adjacent
Hlist record are added to the current Hlist record and the record is moved. The winner









Procedure InsertValue(i);
left := left - 1; tail := tail - 1;
Flist[tail].q := i;
Flist[tail].F := Winner of the Min Tree T + l[i];
Hlist[left].hvalue := h[i];
Hlist[left].top := tail; Hlist[left].bottom := tail;
Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
AdjustWinnerTree(T, left);
while (head ≠ tail and Flist[tail].F < Flist[tail + 1].F) do
  Hlist[left + 1].bottom := Hlist[left + 1].bottom + 1;
  if (Hlist[left + 1].bottom > Hlist[left + 1].top) then
    Hlist[left + 1] := Hlist[left]; { Move the record }
    Hlist[left].gvalue := ∞;
    AdjustWinnerTree(T, left);
    left := left + 1;
  end; {of if}
  Flist[tail + 1] := Flist[tail]; { Move the record }
  tail := tail + 1;
  Hlist[left].top := tail; Hlist[left].bottom := tail;
end; {of while}
while (left ≠ right and Hlist[left].hvalue > Hlist[left + 1].hvalue) do
  { Conditions of Observation 3 apply }
  Hlist[left].top := Hlist[left + 1].top;
  Hlist[left + 1] := Hlist[left]; { Move the record }
  Hlist[left].gvalue := ∞;
  AdjustWinnerTree(T, left);
  left := left + 1;
  Hlist[left].gvalue := Hlist[left].hvalue + Flist[Hlist[left].top].F;
  AdjustWinnerTree(T, left);
end; {of while}
end; { of InsertValue }


Figure 3.6. Procedure to Insert F(.) values








tree is then adjusted. Every time the conditions of Observation 3 apply in the while
loop, we spend O(log n) time, i.e., every time the conditions apply we merge two
adjacent Hlist records. Let the number of merges in a single invocation of InsertValue
be z. The total time taken by a single invocation of InsertValue, assuming y F(.)
values are deleted and z Hlist merges take place, is O((y + z + 1) log n).

Note that not more than n F(.) values can be deleted in total, and not more

than n Hlist records can be merged in total. This implies that the total time taken

by the procedure MinimizeHtCustom is O(n log n). In contrast, the algorithm of [21]
for the same problem takes $O(n^2)$ time.

3.4.2 Height Constrained Folding (Problem 9)

To obtain the minimum height folding, given the width of the folding W, we use
parametric search in conjunction with the procedure MinimizeHtCustom developed
in Section 3.4.1. The procedure MinimizeHtCustom is used for the feasibility testing.
In feasibility testing, we are given the width, x, of the layout and we test whether it
is possible to obtain a folding such that the height of the folding is ≤ H. The set of
candidate values is the same as the one described in Section 3.2.2. The feasibility
testing takes O(n log n) time, and from Corollary 2, the total time taken to obtain
the minimum height folding is $O(n + n \log n \cdot \log n) = O(n \log^2 n)$. The same problem
is solved in O(n log n) time in [21].










3.5 Experimental Results

The procedure MinimizeHtStandard (Figure 3.2) was programmed in C and
run on a SUN 4 workstation. The solution produced by MinimizeHtStandard was
compared with the one obtained using the greedy heuristic of [26].

The data for these programs were produced by having a linearly ordered list of
modules and making interconnections between the modules using a random number
generator. The connections were prioritized so that there is a large number of
connections between modules which are close together. Our algorithm always produces
better solutions than the greedy heuristic and the results are depicted in Table 3.1.
The results shown are the average of 10 runs for each n. Our algorithm, on the
average, took 2 to 3 times longer to arrive at the solution than the greedy
heuristic.

Table 3.1. Heights produced by width-constrained standard cell folding algorithms

n       Greedy    Ours
100     624.4     609.8
400     1979      1961.2
1000    3813.7    3721.2

The algorithm MinimizeHtCustom was programmed and the run times compared
with the algorithm of [21]. The results of the experiments are shown in Table 3.2.
Both the programs were written in C. It is evident that our algorithm is considerably
superior to that of [21]. Since both algorithms generate optimal solutions, the chip
area is the same using either.

Table 3.2. Run times of width-constrained folding algorithms for custom cells

n       Ours      [21]
64      2.56      23.80
250     10.9      350.3
1000    39.23     6125.5

Times are in milliseconds

3.6 Conclusions

We have developed optimal algorithms to fold a linearly ordered list of standard
and custom cells. Several optimization constraints were considered. These resulted
in a total of nine problem formulations. Two of these correspond to problem
formulations for the bit-slice stack folding problem studied in [21]. The algorithms we have
developed for these two cases are asymptotically superior to those developed in [21].
Experimentation with one of these shows that the asymptotic superiority of our
algorithms translates into a much reduced execution time. For the other formulations,
heuristics were proposed in Shragowitz et al. [25]. Our algorithms have acceptable
asymptotic complexity and guarantee optimal solutions. In fact, experiments
conducted with one yielded foldings with smaller chip area on all tested instances.














CHAPTER 4
PLANAR TOPOLOGICAL ROUTABILITY


4.1 Introduction

The problem of routing two-pin nets on a single layer has been studied previously

by several researchers. The river routing and switch box routing problems are special

cases of this. Efficient algorithms for these can be found in [12, 15, 17, 19, 20,

22, 23, 27, 29, 30]. In this chapter, we are concerned with the problem of routing

(topologically) a collection of two-pin nets in a single layer or plane. We refer to

this problem as the TPR problem. The input to the problem is a two dimensional

routing surface with a collection of modules placed in it (Figure 4.1(a)). We assume

that no two modules touch. There are pins on the periphery of the modules. Pins

with the same number define a net and are to be joined by an interconnect or wire.

In topological routing, we are concerned with defining wire paths. However, no

underlying grid is assumed and there is no minimum wire separation requirement.

Thus wire paths can take any planar shape and may run arbitrarily close to each

other. Wires are not permitted to intersect or run over modules. In Figure 4.1(a),

the broken lines indicate wire paths. The routing instance (RI) of Figure 4.1(a) is

topologically routable in a single layer while that of Figure 4.1(b) is not. The TPR

problem for RIs in which all modules lie on the boundary of the routing region (or







(a) Planar Routable (b) A non planar routable example

Figure 4.1. A planar routable and a nonplanar routable case


more precisely all pins are on the boundary of the region) was studied in [18, 12, 23].

A simple linear time algorithm for this version of the TPR problem was developed

in these papers. For the case in which none of the modules are on the boundary,

Pinter [23] has suggested using the linear time planarity testing algorithm of Hopcroft

and Tarjan [9]. His algorithm is quite complex. Marek-Sadowska and Tarng [18] have

considered the TPR problem and several variants which include flippable modules

and multiterminal nets. They develop a linear time algorithm for TPR which is

based on module merging. In this chapter, we present, in Section 4.3, another linear
time algorithm for the general TPR problem that is almost as simple as the one
of [18, 12, 23] for the restricted TPR problem. This algorithm was developed by
Lim [16] but the proof that the algorithm is correct was incomplete. In that section,
we also present an algorithm for definite topological routing. That is, if the instance
is topologically routable, we give an algorithm to determine the loose route of the
wires. The TPR algorithm is implemented differently from the description in Section
4.3. The implementation issues are discussed in Section 4.5. Experimental results








presented in Section 4.6 indicate that our algorithm is considerably faster than the
TPR algorithm of Marek-Sadowska and Tarng [18], particularly if the routing instance
is not planar routable. For the case of multipin nets, we show, in Section 4.4, that

testing for topological routability is equivalent to graph planarity testing and that

finding the maximum number of nets that is topologically routable is NP-complete.

We also extend our two-pin algorithm to handle multipin instances in which the

modules remain connected following the deletion of all nets with more than two pins.

4.2 Preliminaries

To simplify matters, we shall assume that TPR RIs that have modules on the

boundary (Figure 4.2(a)) have been augmented by a set of nets that are required to

be routed on the boundary and that this routing together with the module bound-

aries enclose the routing region (Figure 4.2(b)). This augmentation may require the

addition of corner modules (A, B, C of Figure 4.2(b)). This assumption is needed so

that our algorithm can account for the constraint that one cannot route around a

boundary module but can route around all other modules.

A pin segment, $P = p_1 p_2 \ldots p_k$, is a sequence of pins on the boundary of a
module. $p_1, \ldots, p_k$ appear in this order when the module is traversed counter-clockwise
beginning at $p_1$. Some of the pin segments of the modules of Figure 4.3 are: abcde and
gjkH of module 1; MLK and LKJGf of module 3; and AiF of module 2. Let last(P)
and first(P), respectively, denote the last and first pins of segment P. Let $net(p_i)$
denote the net associated with pin $p_i$. Note that two pins $p_i$ and $p_j$ are to be connected
by a wire iff $net(p_i) = net(p_j)$. A curve, $C = P_1 P_2 \ldots P_j$, is a sequence of pin segments












(a) RI (b) Augmented RI

Figure 4.2. Augmentation


Figure 4.3. An example to illustrate some terminology








such that $net(last(P_i)) = net(first(P_{i+1}))$, $1 \le i < j$. A curve, $C = P_1 P_2 \ldots P_j$,
is a closed curve iff $net(last(P_j)) = net(first(P_1))$. In Figure 4.3, $net(p_i)$ is the
lowercase letter corresponding to $p_i$. So, net(h) = net(H) = h. Some of the curves
of Figure 4.3 are Ih Habcdeg Gf FEDCBAi, j JGfM mlh, edcba ABCDE and
ABC cdeg GfM. IhHabcdeg Gf FEDCBAi and edcba ABCDE are closed curves.
With any curve $C = P_1 P_2 \ldots P_j$, we associate j - 1 (j in case C is closed) wires. These,
respectively, connect the pins $last(P_i)$ and $first(P_{i+1})$, $1 \le i < j$ (and $last(P_j)$ and
$first(P_1)$ in case of a closed curve). Note that the curves, closed curves, and wires
associated with any RI depend only on the modules and the net to pin assignments.
These are not a function of the layout of any of the wires.

For any closed curve $C = P_1 P_2 \ldots P_j$ we define the following:

module($P_i$) ... module corresponding to pin segment $P_i$

pins(module($P_i$)) ... set of all pins on module module($P_i$)

pins($P_i$) ... set of all pins on segment $P_i$

pins(C) ... set of all pins on curve C $= \bigcup_{i=1}^{j} pins(P_i)$

extpins(C) ... $\bigcup_{i=1}^{j} pins(module(P_i)) - pins(C)$

Note, it is possible that $module(P_i) = module(P_j)$, for $i \ne j$.


Lemma 2 : [16] Let I be an RI that contains a closed curve C with respect to which
there are two pins $a \in pins(C)$ and $b \in extpins(C)$ such that net(a) = net(b). Then I is
not planar routable.


















(a) Original situation (b) Connect terminals a and b (c) Re-routing of some net


Figure 4.4. Two possibilities to connect a and b


Proof : Figure 4.4 shows two possibilities. It should be clear that no matter how
the wires of C and the wire (a, b) are laid out, there must be an intersection between
two of these. □


Lemma 3 : [16] Let I be an RI that contains a closed curve $C = P_1 P_2 \ldots P_j$
and another curve $R = R_1 R_2 \ldots R_k$ such that $module(R_1) = module(P_d)$ for some d,
$1 \le d \le j$, and $first(R_1) \in extpins(C)$ (see Figure 4.5). Assume that there exist two
pins a and b such that $a \in pins(C)$, $b \in extpins(C) \cup pins(R)$, and net(a) = net(b).
Then I is not planar routable.


Proof : Follows from Lemma 2. □

Two modules are connected iff there is a curve $C = P_1 P_2 \ldots P_j$ such that both
modules are in $\bigcup_{i=1}^{j} module(P_i)$. A connected component (or simply component) is























Figure 4.5. Another not planar routable situation


a maximal set of modules that are pairwise connected. It is easy to see that the

connected components of an RI are disjoint. A boundary component is a connected

component that includes at least one boundary module. Note that an RI with no

boundary modules has no boundary component while an RI with at least one bound-

ary module has exactly one boundary component (this is because RIs with boundary

components have been augmented as in Figure 4.2(b)).
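
Because any two pins on the same module can be joined by a pin segment, two modules are connected exactly when a chain of nets links them. The components can therefore be found with a union-find structure, as in the Python sketch below. This is an illustration only: the function name and the representation of each two-pin net as a pair of module indices are assumptions made here.

```python
def components(num_modules, nets):
    """Group modules into connected components (illustrative sketch).

    nets is a list of (module_a, module_b) pairs, one per two-pin net,
    giving the modules that carry the two pins of the net.
    """
    parent = list(range(num_modules))

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in nets:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for m in range(num_modules):
        groups.setdefault(find(m), []).append(m)
    return sorted(groups.values())
```

A net whose two pins lie on the same module leaves the grouping unchanged, and a module with no nets forms a component by itself.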


Lemma 4 : An RI is topologically routable iff its components are (independently)

topologically routable.


Proof : It is easy to see that if the RI is topologically routable then each of its

components is topologically routable. Assume that each component is topologically

routable. Order the components of the RI so that the boundary component is first.

The remaining components are in arbitrary order. Let the components in this order

be $K_1, K_2, \ldots, K_k$. If k = 1, then nothing is to be proved. So, assume k > 1. We
shall show how to construct a topological routing for $K_1, K_2, \ldots, K_a$ from a topological












(a) Modules in $K_a$ and modules not in $K_a$ (b) Spanning tree (c) Envelope

Figure 4.6. Constructing the envelope of a component


routing for $K_1, \ldots, K_{a-1}$ and $K_a$, $2 \le a \le k$. First, since a > 1, $K_a$ is not a boundary
component. So, it is possible to surround it by a closed non-self-intersecting line
such that the region enclosed by this line includes exactly those modules that are
in $K_a$ and no module touches the line. The region enclosed by this closed line has
the property that any two points in the enclosed region can be joined by a line (not
necessarily straight) that lies wholly within the region. We refer to the surrounding
line as the envelope of $K_a$. One way to obtain an envelope of $K_a$ is to first construct
a set of $|K_a| - 1$ ($|K_a|$ is the number of modules in $K_a$) lines (not necessarily straight)
so that the modules of $K_a$ together with these lines form a connected component in the
graph theoretic sense (see Figure 4.6). These lines do not touch or cross any of the
modules of the RI. This construction can be done as every pair of modules of an RI can




















(a) Intersections (b) Re-routing


Figure 4.7. Re-routing to free independent component


be connected by such a line. The lines and modules define a spanning tree for
$K_a$. By fattening the lines as in Figure 4.6(c), the envelope is obtained. It is easy
to see that if $K_a$ is topologically routable, then it is topologically routable with the
defined envelope. So, use such a topological routing for $K_a$. When this routing is
embedded into the routing for $K_1, \ldots, K_{a-1}$, some of the topologically routed wires
of $K_1, \ldots, K_{a-1}$ may intersect (or touch) the envelope of $K_a$. However, none of these
wires originate or terminate in the envelope of $K_a$. So, these can be rerouted following
the contour of the envelope (Figure 4.7). □

As a result of Lemma 4, we need concern ourselves only with the case when the

RI has a single component.

4.3 The Algorithm

Our algorithm to obtain a topological routing of a component uses Lemmas 2

and 3 to detect infeasibility. The algorithm is given in Figure 4.8. As stated, it

only produces an ordering of the wires such that when the wires are topologically








Algorithm Testing_Planar_Routability

Step 1: Let m be any module of the component and let p be any pin of m.

Step 2: Examine the pins of m in counterclockwise order beginning at pin p. When
a pin q is being examined, compare net(q) and net(r), where r is the pin (if any)
at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and
the remaining pins of m to the top of stack A. Otherwise output (q, r) and
unstack r from A.

Step 3: If both stacks A and B are empty, then terminate.

Step 4: Let r be the pin at the top of stack A. Let s be the pin such that
net(r) = net(s).

(a) If s is at the top of stack B, then [output (r, s); unstack r from A and
s from B; go to the start of Step 3].

(b) If s is in stack B but not at the top, then [output("The RI is not planar
routable"); terminate].
(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the
start of Step 4].
(d) If s is in neither of the stacks, then [set p to s; let m be the module
containing s; go to Step 2].


Figure 4.8. Topological routing.

routed, one at a time, in this order, then there is always a path between the two

end points of the wire currently being routed such that this path does not intersect

previously routed wires or cross any of the modules. This is sufficient to obtain the

actual topological routing.

Our algorithm employs two stacks A and B. Stack A maintains a pin sequence

that defines a curve of the RI. Stack B is used to retain pins that define closed curves

with respect to a (sub) curve on stack A. We describe the working of the algorithm













(a) Example RI (b) A possible topological routing

Figure 4.9. Example RI


with the aid of an example (Figure 4.9(a)). There are four modules, 1 through 4, and 16 pins,
a through h and A through H. net(p) = p if p is a lowercase letter and net(p) = lowercase(p) if
p is an uppercase letter. Suppose we begin in step 1 with m = 3 and p = B. Then
in step 2, BAFEC get stacked, in that order, onto stack A. This corresponds to
the curve of Figure 4.10(a). Pin c is in neither of the stacks and in step 4(d), we set
m = 1, p = c, and go to step 2. In this step, the wires Cc, Ee and Ff are output
for routing. The pins g and D are put on the top of stack A. The curve traced so
far is shown in Figure 4.10(b). The routed wires are also shown as a curve. Note
that these wires have to be routed using the procedure findroute, otherwise they
can enclose a non-empty region. The curve is extended to module 2 and stack A has
configuration BAghb. The wire Dd is output for routing. The curve has the form
shown in Figure 4.10(c). The curve cannot be extended further as both end points

of wire Bb are on the stack. This means that we have detected a closed curve of the










(a) B-A-F-E-C
(b) B-A-F-E-C-c-e-f-g-D
(c) B-A-F-E-C-c-e-f-g-D-d-h-b
(d) B-A-F-E-C-c-e-f-g-D-d-h-b, H-a

Figure 4.10. Illustration of the routing sequence


RI. The detected curve is that of Figure 4.10(c). We defer the routing of Bb until we
have verified Lemmas 2 and 3 for this closed curve. The deferment also ensures that

the current topological routing does not contain a closed line. If Bb were routed now,

then the wires Bb, Cc, and Dd together with the boundaries of modules 1, 2, and

3 would define a closed line that encloses a non-empty region. This could result in

future routing problems as there would be no path between a point in the region and

one that is outside the region. For example, if the routing of Figure 4.11 is used, then

there is no path between a and A as a is in the enclosed (shaded) region while A is

outside of it. The routing of Bb is deferred by saving b on stack B. The curve of stack

A is extended to module 4 via the wire hH. Wire hH is output for routing. Also

the wires gG and Aa are output for routing. Stack A contains the pin B and stack

B contains the pin b. The curve is shown in Figure 4.10(d). Finally, the wire Bb is
output for routing and, since both stacks are empty, the algorithm terminates
successfully in step 3.























Figure 4.11. Trapped terminal and module


The routing order is Cc, Ee, Ff, Dd, hH, gG, Aa and Bb. Let us try this

out on our example. We see that no matter how Cc is routed there will remain a

routing path for the remaining wires. The routing of Dd and Hh cannot create any

enclosed regions and so cannot affect the feasibility of future routes. When Ee and

Ff are routed, an enclosed region can be formed. Hence these wires have to be
routed using the procedure findroute of Figure 4.13, otherwise they can enclose a
non-empty region. The topologically routed RI can be found in Figure 4.9(b).
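
The walk-through above can be reproduced with a compact Python sketch of the algorithm of Figure 4.8. It is a sketch under assumptions, not the implementation used in the experiments: the function name routing_order is hypothetical, each module is given as its pin list in counterclockwise order, and the two pins of a net are assumed to differ only in letter case, as in the example. The module pin orders used in the usage note are inferred from the stack contents narrated in the text. Membership tests on Python lists make this version quadratic; a per-pin status flag would be needed to keep the running time linear.

```python
def routing_order(modules, start=0):
    """Return the wire output order, or None if not planar routable (sketch)."""
    where = {}       # pin -> (module index, position on that module)
    by_net = {}      # net name -> the two pins of the net
    for m, pins in enumerate(modules):
        for k, pin in enumerate(pins):
            where[pin] = (m, k)
            by_net.setdefault(pin.lower(), []).append(pin)
    partner = {}
    for a, b in by_net.values():       # assumes exactly two pins per net
        partner[a], partner[b] = b, a

    A, B = [], []                      # stacks; the top is the end of the list
    order = []
    m, k = start, 0                    # Step 1
    while True:
        # Step 2: scan module m counterclockwise, starting at position k.
        pins = modules[m][k:] + modules[m][:k]
        for idx, q in enumerate(pins):
            if A and partner[A[-1]] == q:
                order.append((A.pop(), q))
            else:
                A.extend(pins[idx:])   # q and all remaining pins of m
                break
        while True:
            if not A and not B:        # Step 3
                return order
            if not A:                  # defensive check, not expected to occur
                return None
            r = A[-1]                  # Step 4
            s = partner[r]
            if B and B[-1] == s:       # 4(a)
                order.append((r, s))
                A.pop()
                B.pop()
            elif s in B:               # 4(b)
                return None
            elif s in A:               # 4(c)
                B.append(A.pop())
            else:                      # 4(d)
                m, k = where[s]
                break
```

Calling routing_order([list("cefgD"), list("dhb"), list("BAFEC"), list("HGa")], start=2) yields the nets in the order c, e, f, d, h, g, a, b, which matches the routing order stated in the text.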


Lemma 5 : If algorithm Testing_Planar_Routability terminates in step 3, then the
input instance is topologically routable.


Proof : We shall show that the algorithm Testing_Planar_Routability maintains the

following invariant:

There is a topological routing of all wires output so far such that each remaining

wire is (individually) topologically routable.














(a) (b)

Figure 4.12. To illustrate conflict


This is true when we start as at this time, no wires have been output and for

each wire, there is a routing path between its two pins. Assume that the invariant

holds just before some wire (r, s) is output. We shall show the invariant holds after

this wire is output for routing. Wire (r, s) satisfies exactly one of the following:


(a) It is output in step 2, r is a pin that was on stack A, s is the first pin to be

reached on its module.


(b) It is output in step 2, r was on stack A, s is not the first pin to be reached on

its module.


(c) It is output in step 4(a). At the time of output, r was at the top of stack A

and s at the top of stack B.








If we are in case (a), then since module(s) is reached for the first time, no matter

how wire (r, s) is routed at this time no new enclosed regions are formed. Hence all

remaining wires remain routable.

The proofs for cases (b) and (c) are similar. We consider case (c) only. From the

algorithm it follows that at some time prior to the output of (r, s), both r and s were

on stack A, s was at the top of A and about to be moved to stack B. The pins on

stack A beginning with r and ending at s define a closed curve C (as net(r) = net(s)).

Let these pins be from modules $module(r) = M_1, M_2, \ldots, M_k = module(s)$ (in this
order moving up stack A). Let p be any pin in $pins(C) - \{r, s\}$ and let q be such that
net(p) = net(q). We may assume that either $q \in pins(C)$ or module(q) is unvisited

at the time s is moved from stack A to stack B. Note that if this is not the case,

then q is in stack A and below r at this time. From the working of the algorithm it

follows that when r reaches the top of stack A (as it does by the assumption of (r, s)

being output from step 4(a)), p must be on stack B and above s. So, the algorithm

should have terminated unsuccessfully in step 4(b), contradicting the assumption of

termination in step 3.

Let U be the set of unvisited modules at the time s is transferred from stack A

to stack B. By extending our previous argument, we see that the set N of modules

visited by the algorithm between the time s is transferred from stack A to stack B

and the time (r, s) is output is such that


(1) $N \subseteq U \cup modules(C)$.


(2) All pins in $(N \cap U) \cup (pins(C) - \{r, s\})$ have been output for routing.








Algorithm find_route(r, s)
begin
   currentpin := s; l := clockwisepin(currentpin);
   while (l ≠ r) do
   begin
      Step 1: Route clockwise from currentpin to l following the module boundary
      Step 2: currentpin := q such that net(q) = net(l), q ≠ l
      Step 3: Continue the route from l to currentpin following the existing route
              closely
      Step 4: l := clockwisepin(currentpin)
   end
   Complete the route from currentpin to l = r following the module boundary
end

Figure 4.13. Algorithm to find routing path between pins r and s

(3) All pins reached from $N \cap modules(C)$ are in $pins(C) \cup pins(N \cap U)$.


We now claim that algorithm find_route(r, s) obtains a topological routing of

the wire (r, s) that preserves the invariant. To establish this, we need to show that


(A) The algorithm actually finds the route between r and s.


(B) The region enclosed by this route and the curve C contains no pins that have

not been routed to.


To prove (A), we need to show:


(A1) For each value of currentpin, clockwisepin(currentpin) (i.e., the pin clockwise

from currentpin) is defined and different from currentpin.


(A2) The net (l, currentpin) in step 3 of Figure 4.13 is already routed, so it is

possible to follow this route.








(A3) currentpin does not assume the same value twice.

For (A1), we simply assume that each module has at least two pins. Modules

with a single pin may be ignored initially and routed to after the remaining routes

have been made. For (A2), let the values of currentpin and l at the start of

the i'th iteration of the while loop be $c_i$ and $l_i$, respectively. We note that $c_1 = s$

and $l_1 \in pins(C)$. If $l_i \in pins(C) \cup pins(N \cap U)$, then from conditions (2) and (3),

it follows that $c_{i+1} \in pins(C) \cup pins(N \cap U)$ and wire $(l_i, c_{i+1})$ has been routed.

Suppose there is an $l_i \notin pins(C) \cup pins(N \cap U)$. Let $l_j$ be the first such $l_i$. Since

$j > 1$, $l_{j-k}$ and $c_{j-k+1}$ are in $pins(C) \cup pins(N \cap U)$ for $k \ge 1$. Since $c_j$ and $l_j$

are on the same module, it follows that $module(c_j) \notin N$. So, $l_j \in ext\_pins(C)$ and

$c_j \in pins(C)$. From the way algorithm Testing_Planar_Routability works, it follows

that $(l_{j-1}, c_j)$ is a segment of the curve C and that curve C, when oriented from r

to s, first reaches $l_{j-1}$ and then $c_j$ via wire $(l_{j-1}, c_j)$. Hence $l_{j-1} \in pins(C)$. Since

$l_{j-2} \in pins(C)$ (by assumption on j), $c_{j-1} \in pins(C) \cup pins(N \cap U)$. Further, since

$c_{j-1}$ is on a module of C, $c_{j-1} \in pins(C)$. Now, since $(l_{j-1}, c_j)$ is a segment of C and $c_{j-1}$

is one pin counterclockwise from $l_{j-1}$ and a pin of C, it follows that $(l_{j-2}, c_{j-1})$ is a

segment of C oriented from $l_{j-2}$ to $c_{j-1}$. Continuing in this way, we conclude that $(l_1, c_2)$

is a segment of C oriented from $l_1$ to $c_2$. However, we know that when C is oriented

from r to s there is only one wire segment that includes a pin of module(s) and this

is oriented to module(s). That is, the orientation is $c_2$ to $l_1$, a contradiction. Hence,

there is no $l_i \notin pins(C) \cup pins(N \cap U)$. Also, no $l_i \in \{r, s\}$ at the start of a while

loop iteration. From condition (2), it follows that all encountered $(l_i, c_{i+1})$ have been

routed.

For (A3), suppose that $c_i = c_j$ for some i and j, i < j. Since $(l_{i-1}, c_i)$ and

$(l_{j-1}, c_j)$ are two-pin nets, it follows that $l_{i-1} = l_{j-1}$. Now, since $c_{i-1}$ and $c_{j-1}$

are, respectively, one pin counterclockwise from $l_{i-1}$ and $l_{j-1}$, it follows that

$c_{i-1} = c_{j-1}$. Continuing in this way, we see that $s = c_1 = c_{j-i+1}$. This implies that net

$(l_{j-i}, c_{j-i+1}) = (r, s)$ has already been routed. But, it has not. So, no $c_i$ is repeated.

(B) follows from the fact that find_route reaches only pins in $pins(C) \cup pins(N \cap U)$,

condition (3) and the fact that find_route follows existing routes without enclosing

any new pins. □
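
The pin walk performed by find_route (Figure 4.13) can be illustrated with a short Python sketch. It records only the sequence of pins touched by the route, not the geometry. The mappings `clockwise_pin` and `mate` and the function name `find_route_pins` are assumptions of this sketch and do not appear in the original algorithm; the check for a repeated currentpin mirrors property (A3).

```python
def find_route_pins(r, s, clockwise_pin, mate):
    """Return the pins touched while routing wire (r, s), from s to r.

    clockwise_pin[p]: the pin one step clockwise from p on p's module.
    mate[p]: the other pin of p's two-pin net (assumed already routed).
    Both mappings are hypothetical inputs chosen for this sketch.
    """
    path = [s]
    current = s
    seen = {s}
    l = clockwise_pin[current]
    while l != r:
        path.append(l)            # Step 1: follow the module boundary to l
        current = mate[l]         # Step 2: other end of the net at l
        if current in seen:       # property (A3): currentpin never repeats
            raise ValueError("currentpin repeated; input violates (A3)")
        seen.add(current)
        path.append(current)      # Step 3: follow the existing route of that net
        l = clockwise_pin[current]  # Step 4
    path.append(r)                # final boundary segment from currentpin to r
    return path


# Two modules: one holds pins r and a1, the other holds pins s and a2.
# Net (a1, a2) is already routed; we route (r, s) alongside it.
cw = {"r": "a1", "a1": "r", "s": "a2", "a2": "s"}
mate = {"a1": "a2", "a2": "a1"}
route = find_route_pins("r", "s", cw, mate)
```

In this small instance the route leaves s, hugs the boundary to a2, follows the existing wire to a1, and then hugs the boundary to r.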


Lemma 6 : If the algorithm Testing_Planar_Routability (given in Figure 4.8) terminates

in step 4(b), the RI is not planar routable.


Proof : If the algorithm terminates in step 4(b), then let r and s be as in step 4. r

is at the top of stack A and s is in stack B but not at the top. Let x be at the top of

stack B and let y be the pin such that net(y) = net(x). y must currently be on stack

A as x can be put on stack B (see step 4(c)) only if y is on stack A. When one pin

of a net is in stack A and the other in stack B, the pins can leave the stacks together

(step 4(a)) or not at all. Since x is on stack B at termination, y must still be on

stack A and hence must be lower than r (as r is at the top). So, there is a curve

y... r in the RI. Furthermore, curves y...r...s and y...r... x must exist as this is

the only way s and x can get to stack A and then to stack B. Figure 4.12(a) shows

an example curve y... r... s. This figure assumes that $module(s) \neq module(r)$. The








proof for the case module(s) = module(r) is similar. Let m be the module at which

the curves y... r... s and y... r... x diverge (Figure 4.12(b)). Note that m may be

module(r) or a later module on the curve y ... r ... s. Let u be the pin of m that is the

last pin of m on curve y... r... s and let v be the corresponding pin for y... r... x.

Since all nets are two-pin nets, $u \neq v$. Since x is above s in stack B, v must be

on the curve y... r...s. The curve C = y...r...v... x is a closed curve. We see

that $r \in pins(C)$, $s \in ext\_pins(C)$, and net(s) = net(r). So, s and r satisfy the

conditions of Lemma 3 and the RI is not planar routable. □


Theorem 6 : The algorithm Testing_Planar_Routability (given in Figure 4.8) is correct.


Proof : Follows from Lemmas 5 and 6. □

The algorithm of Figure 4.8 is easily implemented to run in O(n) time,

where n is the total number of pins. For this we need to use an array status[1..n] to

maintain the current status (i.e., on stack A, on stack B, on neither) of each pin.
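
A minimal Python sketch of the two-stack test for two-pin nets is given below. It is reconstructed from the step descriptions used in the proofs above (examine a module, test for termination, then handle the top of stack A), so it should be read as an illustration rather than as the implementation evaluated in this chapter. The dictionary `status` plays the role of the array status[1..n]; the input encoding (`modules`, `mate`, `start`) and the function name are assumptions of the sketch, and the RI is assumed to be a single component.

```python
def planar_routable(modules, mate, start):
    """Two-stack routability test for two-pin nets (illustrative sketch).

    modules: dict module -> list of its pins in counterclockwise order.
    mate:    dict pin -> the other pin of its two-pin net.
    start:   pin at which the examination begins.
    Returns (routable, wires) where wires lists the nets in output order.
    """
    module_of = {p: m for m, pins in modules.items() for p in pins}
    status = {p: None for p in module_of}   # None, "A" or "B"
    A, B, wires = [], [], []
    p = start
    while True:
        # Examine the pins of module(p) counterclockwise, beginning at p.
        pins = modules[module_of[p]]
        k = pins.index(p)
        order = pins[k:] + pins[:k]
        for i, q in enumerate(order):
            if A and mate[A[-1]] == q:
                r = A.pop()
                status[r] = None
                wires.append((r, q))        # q meets its mate at the top of A
            else:
                for t in order[i:]:         # stack q and the remaining pins
                    A.append(t)
                    status[t] = "A"
                break
        # Handle the top of stack A until an unvisited module is needed.
        while True:
            if not A:
                return (not B), wires       # success when both stacks are empty
            r = A[-1]
            s = mate[r]
            if status[s] == "B":
                if B[-1] != s:
                    return False, wires     # s is buried in stack B: conflict
                A.pop()
                B.pop()
                status[r] = status[s] = None
                wires.append((r, s))
            elif status[s] == "A":
                A.pop()                     # move r from stack A to stack B
                B.append(r)
                status[r] = "B"
            else:
                p = s                       # s lies on an unvisited module
                break


mate = {"a1": "a2", "a2": "a1", "b1": "b2", "b2": "b1"}
nested = planar_routable({"M": ["a1", "a2", "b1", "b2"]}, mate, "a1")
crossing = planar_routable({"M": ["a1", "b1", "a2", "b2"]}, mate, "a1")
```

Every stack-membership query is a constant-time lookup in `status`, and each pin is pushed and popped a bounded number of times, which is where the linear bound comes from. In the usage lines, the interleaved pin order a1, b1, a2, b2 on a single module is rejected, while the order a1, a2, b1, b2 is accepted.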

4.4 Topological Routability of Multi-pin Nets

We shall refer to the extension of Testing_Planar_Routability or TPR to the case

where some or all nets may have more than two pins as MTPR. The MTPR prob-

lem may be solved in linear time by mapping MTPR instances into graph planarity

instances [15, 18]. However, the known linear time algorithms [9] for graph planarity

are complex and one is motivated to explore the possibility that simpler algorithms

exist for MTPR (just as they do for TPR). Unfortunately, this is not the case. We








show, in Theorem 7, that any algorithm for MTPR can be used to test graph pla-

narity with no increase in complexity. In Theorem 8, we show that the problem of

determining the maximum number of topologically routable nets of an MTPR instance

is NP-hard. For the case where all the nets are two-pin nets, we can use the

construction of [18] and the algorithm of [28] to find the maximum subset that is

topologically routable. The complexity of the resulting algorithm is $O(n^2)$, where n

is the total number of nets. Theorem 7 motivates the quest for a simple linear time

algorithm for a restricted version of MTPR. We show that the algorithm of Figure 4.8

may be extended to handle MTPR instances in which every pair of modules remains

connected (though not necessarily by a net) when all nets other than two-pin nets

are eliminated.


Theorem 7 : Let I be an instance of graph planarity. I can be transformed, in linear

time, into an instance I' of MTPR such that I is planar iff I' is topologically routable.


Proof : From the constructions of [15, 18], it follows that the topological routability

of an MTPR instance does not depend on the specific placement of the modules.

Hence, in constructing I', we need not specify the module placement. I' is obtained

from I by replacing each edge (i, j), i < j, of I by a module $M_{ij}$ with two pins $M_{ij}^{i}$

and $M_{ij}^{j}$. The nets of I' are $N_i = \{M_{ij}^{i} \mid i < j\} \cup \{M_{ji}^{i} \mid j < i\}$, $1 \le i \le n$, where n is
the number of vertices in I.

If I' is planar routable, then each net, Ni, has a planar realization that does

not contact the realization of any other net. This realization connects the pins of Ni

together, possibly using some Steiner points (see Figure 4.14(a)).


















(a) (b)

Figure 4.14. Realization of a planar net using one Steiner point


If Ni is a two-pin net, then introduce vertex vi anywhere on the wire connecting

the two pins of Ni. If Ni has more than two pins, then by using the transformations

of Figure 4.14(b), we can reduce the number of Steiner points to one and also ensure

that each pin of Ni has exactly one wire connected to it. These transformations

preserve the planarity of the routing. The sole surviving Steiner point is replaced by

the vertex vi.

Now each wire that connects to vi connects to module Mij or Mji. This module

has another wire connecting to vertex j. Remove Mij (or Mji) and join the ends of

these two wires together by a line joining the terminals of Mij (or Mji). We now have

a planar embedding of I.

If I is planar, then start with its planar embedding. Replace vi by a Steiner

point; place Mij anywhere on the embedding of edge (i,j), i < j; split the edge (i,j)

at Mij and connect the two ends (at the split point) to the terminals of Mij. This

yields a topological routing of I'.








Hence, I is planar iff I' is topologically routable. □
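
The construction used in the proof of Theorem 7 can be sketched in Python as follows. Each edge becomes a two-pin module and each vertex becomes a net that collects the pins facing it. The encoding of a pin as the tuple `(i, j, endpoint)` and the function name `graph_to_mtpr` are assumptions made for this illustration only.

```python
def graph_to_mtpr(n, edges):
    """Map a graph on vertices 1..n to an MTPR instance (illustrative sketch).

    Each edge (i, j), i < j, becomes a module with two pins; the pin
    (i, j, v) is the pin of module M_ij that belongs to the net of vertex v.
    Returns (modules, nets): modules maps (i, j) to its two pins and
    nets maps each vertex v to the list of pins of net N_v.
    """
    modules = {}
    nets = {v: [] for v in range(1, n + 1)}
    for i, j in edges:
        i, j = min(i, j), max(i, j)        # normalise so that i < j
        pin_i, pin_j = (i, j, i), (i, j, j)
        modules[(i, j)] = [pin_i, pin_j]
        nets[i].append(pin_i)              # pin facing vertex i joins N_i
        nets[j].append(pin_j)              # pin facing vertex j joins N_j
    return modules, nets


# A triangle on vertices 1, 2, 3.
modules, nets = graph_to_mtpr(3, [(1, 2), (2, 3), (1, 3)])
```

The loop does constant work per edge, so the transformation is linear in the size of the graph, and the size of net N_v equals the degree of vertex v.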

Let MTPRmax be the problem of determining whether or not k of the nets of an

MTPR instance are topologically routable. To show that MTPRmax is NP-complete,

we use the following problem that is known to be NP-complete [7].

Planar Subgraph: Given a graph G = (V, E) and a positive integer k,
is there a subset $V' \subseteq V$ with $|V'| \ge k$ such that the subgraph induced by the

vertices of V' is planar?


Theorem 8 : MTPRmax is NP-complete.


Proof : It is easy to see that MTPRmax is in NP. Also, from an instance I of the

planar subgraph problem, we can construct an instance I' of MTPRmax by replacing

edges by modules as in Theorem 7. It is easy to see that I' has k nets that are

topologically routable iff I has an induced subgraph with k vertices that is planar. □

Any instance I of MTPR may be transformed into an instance I' of TPR which

includes unordered modules (i.e., modules whose terminals may be rearranged at

will). I' has the property that there is an arrangement of terminals for each of the

unordered modules which results in I' being topologically routable iff I is topolog-

ically routable. To obtain I' from I, for each multipin net Ni of size k, k > 2, we

introduce an unordered module UMi with k pins. The net Ni is replaced by k two-

pin nets, one pin of each of these nets is an original pin of Ni and the other a pin

of UMi. Since planar routability is not affected by module placement, UMi may be

placed anywhere in the routing region. An example is given in Figure 4.15.
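
The replacement of multipin nets by unordered modules can be sketched as follows. The input encoding (a dictionary from net names to pin lists), the naming scheme `UM_<net>` and the function name are assumptions of this Python sketch.

```python
def split_multipin_nets(nets):
    """Replace each net of size k > 2 by k two-pin nets (illustrative sketch).

    nets: dict net name -> list of pins.
    Returns (two_pin_nets, unordered_modules) where unordered_modules maps
    each new module UM_<net> to its k pins, whose order is left open.
    """
    two_pin_nets = []
    unordered_modules = {}
    for name, pins in nets.items():
        if len(pins) <= 2:
            two_pin_nets.append(tuple(pins))   # two-pin nets are kept as is
            continue
        um = "UM_" + name
        um_pins = [(um, t) for t in range(len(pins))]
        unordered_modules[um] = um_pins
        # One two-pin net per original pin: (original pin, pin of UM).
        two_pin_nets.extend(zip(pins, um_pins))
    return two_pin_nets, unordered_modules


two_pin, ums = split_multipin_nets({"a": ["a1", "a2"], "x": ["w", "x", "y", "z"]})
```

For the four-pin net in the usage line, the result has one unordered module with four pins and four new two-pin nets, in addition to the untouched net a.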























(a) A multipin net (b) Two pin nets

Figure 4.15. Transformation from multipin to two-pin nets


Theorem 9 : The pins of each unordered module of I' can be ordered so that the

resulting instance of TPR is topologically routable iff I is topologically routable.


Proof : If the pins of the UMi's can be so ordered, then a topological routing of

I is easily obtained from the topological routing of I' (simply replace each UMi by

a Steiner point). If I is topologically routable, then using transformations similar to

those in Figure 4.15(b), we may transform the topological routing into one in which

each multipin net Ni of size k > 2 is routed using exactly one Steiner point. This

Steiner point is replaced by module UMi and the pin ordering is determined by the

topological routing around the Steiner point. □

If we knew which terminal orderings of the UMi's to use, we could simply convert

each UMi to an ordered module and run the TPR algorithm. Unfortunately, we do

not know this. Therefore we need to modify algorithm Testing_Planar_Routability so

as to properly handle unordered modules. As in section 3, we may assume that I








is a single component. For our modification to work, we assume that I remains a

single component when all multipin nets Ni of size k > 2 are eliminated from I. The

modified algorithm MTPR is given in Figure 4.16.

The working of the MTPR algorithm is explained using the example in Fig-

ure 4.17. There are five modules 1-5 and there are seven nets out of which six are

two-pin nets. The seventh is a four-pin net. The pins of the four-pin net are w, x, y

and z. In step 1 of the algorithm of Figure 4.16, we replace the multipin net of size

four with a new unordered module UM1 which has four pins. Suppose we begin in

step 2 with m = 5 and p = A. Then in step 3, ACxB get stacked, in that order, on

to stack A. This corresponds to the curve of Figure 4.18(a). The top of stack A now

has pin B, and the curve on stack A is extended by adding pins from module 1 to

stack A. In this process, the wire Bb is output for routing. This situation is depicted

in Figure 4.18(b). Since pin a is at the top of stack A and its mate is below it in the

stack, pin a is moved from stack A to stack B. At the top of stack A, we have a pin

of a four-pin net and since this net is seen for the first time, we add the unordered

module to the top of stack B and mark y as having been seen. We route the wire

from pin y to the unordered module. The pin x, which is below pin y is next routed

to the unordered module. This is done using the procedure find_route of Figure 4.13.

The curve on stack A is extended by adding pins from module 4. At this time, wire

Cc is output for routing. This scenario is depicted in Figure 4.18(c). Now, pin z is

at the top of stack A. This pin is routed to the unordered module in step 6(a) of the

algorithm. Next, in step 3 the wire dD is output for routing. Also, pin E is put on









Algorithm MTPR

Step 1: For each multipin net Ni of size k > 2, introduce a new unordered module
UMi with k pins and replace Ni by k two-terminal nets as described earlier.

Step 2: Let m be any ordered module and let p be any pin of m which corresponds
to an original two-pin net.

Step 3: Examine the pins of m in counterclockwise order beginning at pin p. When
a pin q is being examined, compare net(q) and net(r), where r is the pin (if any)
at the top of stack A. If stack A is empty or net(q) ≠ net(r), then add q and
the remaining pins of m to the top of stack A. Otherwise, output (q, r) and
unstack r from A.

Step 4: If both stacks A and B are empty, then terminate.

Step 5: Let r be the pin at the top of stack A. Let s be the pin such that
net(r) = net(s). If module(s) is an unordered module, then go to Step 6.

(a) If s is at the top of stack B, then [output (r, s); unstack r from A and
s from B; go to the start of Step 4].
(b) If s is in stack B but not at the top, then [output("The RI is not planar
routable"); terminate].
(c) If s is in stack A, then [unstack r from A; add r to stack B; go to the
start of Step 5].
(d) If s is in neither of the stacks, then [set p to s; let m be the module
containing s; go to Step 3].

Step 6:

(a) If module(s) is at the top of stack B, then [output (r, s); unstack r from
A; mark pin s as having been seen; if all pins of module(s) have been
marked, then unstack module(s) from B; go to the start of Step 4].
(b) If module(s) is on B but not at the top, then [output("The RI is not
planar routable"); terminate].
(c) If module(s) is not in stack B, then [unstack r from A; mark pin s as
having been seen; add module(s) to the top of stack B; go to the start of
Step 5].

Figure 4.16. Topological routing of multipin net for restricted version
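
The steps of Figure 4.16 can be sketched in Python as below. This is an illustrative reading of the figure, not the C implementation used for the experiments. Step 1 is folded into the preprocessing: a net with more than two pins is represented by an unordered module named after the net, with a counter of marked pins instead of explicit module pins. The input encoding and all identifiers (`mtpr`, `um_of`, `where`, `marked`) are assumptions of the sketch, and net names are assumed to be distinct from pin names.

```python
def mtpr(modules, nets, start):
    """Restricted MTPR test following Figure 4.16 (illustrative sketch).

    modules: dict ordered module -> list of pins in counterclockwise order.
    nets:    dict net name -> list of pins (two or more pins per net).
    start:   a pin of a two-pin net (Step 2).
    Returns True on termination in Step 4, False in Steps 5(b) or 6(b).
    """
    mate, um_of, um_size = {}, {}, {}
    for name, pins in nets.items():
        if len(pins) == 2:
            mate[pins[0]], mate[pins[1]] = pins[1], pins[0]
        else:                                   # Step 1: unordered module
            um_size[name] = len(pins)
            for p in pins:
                um_of[p] = name
    module_of = {p: m for m, ps in modules.items() for p in ps}
    where = {}                                  # pin or module -> "A" or "B"
    marked = {name: 0 for name in um_size}
    A, B = [], []
    p = start
    while True:
        ps = modules[module_of[p]]              # Step 3
        k = ps.index(p)
        order = ps[k:] + ps[:k]
        for i, q in enumerate(order):
            if A and mate.get(A[-1]) == q:
                del where[A.pop()]              # output (q, r)
            else:
                for t in order[i:]:
                    A.append(t)
                    where[t] = "A"
                break
        while True:
            if not A:                           # Step 4
                return not B
            r = A[-1]
            if r in um_of:                      # Step 6
                um = um_of[r]
                if where.get(um) == "B" and B[-1] != um:
                    return False                # 6(b)
                if where.get(um) != "B":
                    B.append(um)                # 6(c)
                    where[um] = "B"
                del where[A.pop()]              # route r to the module
                marked[um] += 1
                if marked[um] == um_size[um]:   # all pins seen: 6(a)
                    B.pop()
                    del where[um]
                continue
            s = mate[r]                         # Step 5
            if where.get(s) == "B":
                if B[-1] != s:
                    return False                # 5(b)
                del where[A.pop()]              # 5(a)
                del where[B.pop()]
            elif where.get(s) == "A":
                A.pop()                         # 5(c)
                B.append(r)
                where[r] = "B"
            else:
                p = s                           # 5(d)
                break


nets = {"a": ["a1", "a2"], "x": ["x1", "x2", "x3"]}
separated = mtpr({"M": ["x1", "a1", "x2", "a2", "x3"]}, nets, "a1")
grouped = mtpr({"M": ["x1", "x2", "x3", "a1", "a2"]}, nets, "a1")
```

In the first usage line the wire of net a separates pin x2 from the other two pins of the three-pin net, and the sketch reports a conflict in Step 6(b). In the second, the three pins are adjacent and the sketch terminates in Step 4.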
























Figure 4.17. Example RI with a four pin net


the top of stack A. At this point stack A contains pins AfE, bottom to top in that

order. This situation is depicted in Figure 4.18(d). We set m = 2 and p = e in step

3 of the algorithm and output wires Ee and fF for routing. The remaining pin w is

put at the top of stack A. In step 6(a) of the algorithm, we mark pin w as seen and

route a wire from this pin to the unordered module. Also, we remove the unordered

module from the top of stack B. This is shown in Figure 4.18(e). Now, stack A

contains pin A and stack B contains pin a and the wire Aa is output for routing in

step 5(a) of the algorithm. Both the stacks are empty and the algorithm terminates

successfully in step 4. The topologically routed RI can be found in Figure 4.17.


Lemma 7 : If the algorithm MTPR halts in step 4, then the wires are planar routable.


Proof : The proof is very similar to that of Lemma 5. The same invariant holds.

When we put an unordered module on stack B and mark its first pin, we connect

this pin to the unordered module. Observe that this creates no










(a) (b) (c) (d) (e)


Figure 4.18. Illustration of the routing sequence with Multiterminal net


enclosed region, so the invariant continues to hold. Now, if we are routing another pin

of the multipin net, then the pin to which it has to be connected is on the unordered

module. So, as soon as we reach the unordered module, the next pin is chosen as the pin

it has to be connected to (this also defines the order of pins in the unordered module).

The proof of Lemma 5 applies in this case as we have made all nets two-pin nets. □


Lemma 8 : Let I be an RI that contains a curve $C = P_1 P_2 \ldots P_j$. Let $R = R_1 R_2 \ldots R_k$

and $S = S_1 S_2 \ldots S_l$ be two curves such that $module(R_1) = module(P_d)$ for some d,

$1 \le d \le j$, and $first(R_1) \in ext\_pins(C)$, and $module(S_1) = module(P_e)$ for some e,

$1 \le e \le j$, and $first(S_1) \in pins(C)$.








Let C be such that $first(P_1)$ and $last(P_j)$ are part of the same net N. Assume

that there exist two pins a and b such that $a \in pins(C) \cup pins(S)$,

$b \in ext\_pins(C) \cup pins(R)$ and $net(a) = net(b) \neq N$.

Then I is not planar routable.


Proof : Follows from Lemma 2. □


Lemma 9 : If algorithm MTPR terminates in steps 5(b) or 6(b), the RI is not planar

routable.


Proof : Suppose the algorithm terminates in step 5(b). Let r and s be as in

step 5 and let x be at the top of stack B. Note that r and s define a two-pin net.

If x is a pin of a two-pin net, then the RI is not planar routable (see proof of Lemma 6).

So assume that x is an unordered module (note that only pins of two-pin nets and

unordered modules get on to stack B). Module x must have at least one marked and

one unmarked pin. Let C be the curve defined by the stack A segment from r to s

when s was at the top of stack A just prior to being transferred to stack B. From the

working of MTPR, it follows that there is a pin p E pins(C) from which a path was

traced to the multipin net corresponding to module x. Furthermore, there is at least

one pin a of the multipin net that is on a path from a pin that is not in pins(C). A

possible situation is shown in Figure 4.19. The conditions of Lemma 8 are satisfied

and the RI is not planar routable.

If the algorithm terminates in step 6(b), then r is a pin of a multipin net and

module(s) is an unordered module. Let x be at the top of stack B at the time of

termination. Let j be one of the pins that have already been routed to module(s).



























Figure 4.19. A possible situation where the RI is unroutable


Let C be the curve defined by the stack A segment at the time pin j was output for

routing. Let S be the curve or pin in pins(C) that was used to reach pin x. Let y

be such that net(x) = net(y). Since y must be below r on stack A and r is a pin of

a multipin net, the path from y to r on stack A must include a pin in ext_pins(C).

By setting a and b of Lemma 8 to x and y respectively, we see that the conditions

of Lemma 8 are satisfied and the RI is unroutable. The proof for the case x is an

unordered module is similar. □

4.5 Implementation of Two-Pin Algorithm

While the correctness proof for our algorithm is somewhat involved, the algo-

rithm itself is quite simple and easy to implement. To get good performance we

implemented stack A as a stack of modules rather than one of pins as described in

Section 3.3. So, when step 2 of Figure 4.8 adds q and the remaining pins of m to








stack A, we simply add a record of the type (m, q, l), where l is the last pin of m, to

the stack. Also, to get the top pin of stack A, we look at the top record (m, q, l).

The top pin is l. To delete this pin, the top record is changed to (m, q, p(l)), where

p(l) is the predecessor of pin l, unless q = l. In the latter case, the record (m, q, l) is

deleted from the stack. The role of array status needs to be changed to support this

change in stack structure. We now keep a status for a module as well as for a pin.

A module's status reflects whether or not it is in stack A and a pin's status reflects

whether or not it is in stack B.
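
The record-based representation of stack A described above can be sketched in Python as follows. The class name `ModuleStack`, its method names and the `pred` mapping (the predecessor p(l) of each pin on its module) are assumptions of this sketch.

```python
class ModuleStack:
    """Stack A stored as records [m, q, l] (illustrative sketch).

    A record [m, q, l] stands for the pins q, ..., l of module m, in the
    order in which they were stacked, with l on top.
    """

    def __init__(self, pred):
        self.pred = pred          # pred[p]: predecessor p(p) of pin p
        self.records = []

    def push_module(self, m, q, last):
        # Adds q and the remaining pins of m with a single record.
        self.records.append([m, q, last])

    def top(self):
        return self.records[-1][2]

    def pop(self):
        m, q, l = self.records[-1]
        if q == l:
            self.records.pop()    # last pin of the record: drop the record
        else:
            self.records[-1][2] = self.pred[l]   # shrink the record by one pin
        return l

    def __bool__(self):
        return bool(self.records)


# Module 5 contributes pins A, C, x, B, stacked in that order.
stack = ModuleStack(pred={"C": "A", "x": "C", "B": "x"})
stack.push_module(5, "A", "B")
popped = [stack.pop() for _ in range(4)]
```

Pushing a whole module costs one record regardless of the number of pins, which is the saving this representation is meant to provide; popping returns the pins in the order B, x, C, A and removes the record with the last pin.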

The two-pin net algorithm of Marek-Sadowska and Tarng [18] is a two step

algorithm:


Step 1: Merge modules together to obtain an equivalent routing problem in which

all pins are on the periphery of a routing region.


Step 2: Determine the feasibility of the equivalent problem using a single stack

scheme.


To implement step 1, we performed a traversal of the modules. Each module was

represented as a singly linked circular list of pins. With this representation, modules

can be merged efficiently. By contrast, for the algorithm of Figure 4.8, modules were

represented using doubly linked circular lists.

The multipin net algorithm of Marek-Sadowska and Tarng [18] has three steps:


Step 1: Merge modules together to obtain an equivalent routing problem in which

all pins are on the periphery of a routing region.




























Figure 4.20. Tree-like connected circuits


Step 2: Traverse the pins and transform multipin nets into two pin nets.


Step 3: Determine the feasibility of the equivalent problem using a single stack

scheme.


4.6 Experimental Results

We implemented our algorithm for two-pin nets and multipin nets and that

of Marek-Sadowska and Tarng [18] in C and obtained execution times using both

circuits that are routable and those that are not. For the two-pin net case, the

routable circuits used are highly structured ones as shown in Figures 4.20 and 4.21

as well as randomly generated ones. The nonroutable circuits used were obtained by

modifying the tree-like circuits of Figure 4.20.
































Figure 4.21. Six-way connected circuits



Figure 4.22. Tree-like connected circuits with multipin nets



























Figure 4.23. Eight-way connected circuits with multipin nets


For the multipin net case, we used highly structured circuits as shown in Figures

4.22 and 4.23. The nonroutable circuits for the multipin case were obtained by

modifying the structured circuits.

The timing results for the routable circuits with two-pin nets are shown in Tables

4.1, 4.2 and 4.3. The times are in milliseconds and the programs were

run on a SUN 4 workstation. On tree-like circuits, the algorithm of Marek-Sadowska

and Tarng [18] took 65% more time than ours, on average; on six-way circuits, it

took approximately 40% more time; and on random circuits, it took approximately

37% more time.

For the multipin net case, the timing results for the routable circuits are shown

in Tables 4.5 and 4.6. On tree-like circuits with multipin nets, the algorithm of

Marek-Sadowska and Tarng [18] took 295% more time than ours, on average; on