Group Title: Department of Computer and Information Science and Engineering Technical Reports
Title: Optimal folding of bit sliced stacks
Physical Description: Book
Language: English
Creator: Sahni, Sartaj
Paik, Doowon
Affiliation: University of Florida
University of Florida
Publisher: Department of Computer and Information Sciences, University of Florida
Place of Publication: Gainesville, Fla.
Copyright Date: 1991
 Record Information
Bibliographic ID: UF00095098
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.



Optimal Folding Of Bit Sliced Stacks+

Doowon Paik

University of Minnesota

Sartaj Sahni

University of Florida

We develop fast polynomial time algorithms to optimally fold stacked bit sliced architectures to

minimize area subject to height or width constraints. These algorithms may also be applied to

folding problems that arise in standard cell and sea-of-gates designs.


Stacked bit sliced architectures, folding, area

+ Research supported, in part, by the National Science Foundation under grant MIP 86-17374.



1. Introduction

A stack of bit sliced components ([LARM90] and [WU90]) consists of $n$ components of varying height and width with their left ends vertically aligned as in Figure 1(a). The intra component (i.e., inter slice but local to a component) routing is done on metal layer 1 while the inter component (i.e., intra slice but across components) routing is done on metal layer 2.

A component stack may be folded at component $C_{i_1}$ by rotating components $C_{i_1+1}, \ldots, C_n$ by 180° so that their aligned ends are now on the right and the component order is $C_{i_1+1}, \ldots, C_n$ bottom to top (Figure 1(b)). As a result of this, the slices of components $C_{i_1+1}, \ldots, C_n$ are in the order slice 1, slice 2, ... right to left. Folding at $C_{i_1}$ creates two component stacks. One is left aligned (i.e., components $C_1, \ldots, C_{i_1}$) and the other is right aligned (components $C_{i_1+1}, \ldots, C_n$). By folding at $C_{i_1}$, $C_{i_2}$, and $C_{i_3}$, a four stack arrangement as in Figure 1(c) is obtained.

When a component stack is folded as in Figure l(b), it becomes necessary for the available

vertical tracks in a physical slice to be able to carry the routes for two logical slices. For exam-

ple, in Figure l(b), slice 4 of component C, and slice 4 of component C,_1 occupy the same physi-

cal chip space. We assume that the width of a slice is sufficient for this. Further, when a stack is

folded at component $C_{i_1}$ we may need space at the ends of the two created stacks to complete the routes between components $C_{i_1}$ and $C_{i_1+1}$. If this can be done in a third layer, then no such routing space is needed. If not, additional space proportional to the number of wires between $C_{i_1}$ and $C_{i_1+1}$ must be reserved (Figure 2). We consider both of these cases in this paper.

The bit sliced model defined above was introduced by Larmore, Gajski, and Wu [LARM90] for the compilation of arbitrary net lists into layout for CMOS technology. They studied various folding strategies under the assumption that components may be reordered and the stack can be folded only once (so, only a two stack configuration as in Figure 1(b) is possible). These folding schemes begin by reordering the components so that their widths are monotone in the component index ($w_i$ is the width, in number of slices, of component $i$). In [WU90], Wu and Gajski report on the application of their two stack folding model to real circuits.

Figure 1: Stack of bit sliced components. (a) Component stack. (b) Folding at $i_1$. (c) Folded into 4 stacks.

Figure 2: Routing space reserved.

In this paper, we place no restriction on the number of folding points $i_1, i_2, \ldots, i_k$. Additionally, we do not permit component reordering. This restriction is realistic as the component stack

is usually ordered so as to minimize inter component routing requirements and optimize perfor-

mance. We consider both the situation when extra routing space at the stack ends is and is not

needed to accommodate routes between components at the ends of adjacent stacks. Note that our

model can also be used for the folding step of placement algorithms for standard cell and sea-of-

gates designs [SHRA88, SHRA90]. In this, the modules to be placed have been ordered by some

criterion and are then folded into the layout area so as to minimize area. In the case of standard

cell designs, all modules have the same width while in the case of sea-of-gates designs module

widths and heights vary from module to module.

The objective of folding is to obtain a minimum area layout subject to a height or width

constraint. We consider both types of constraints here. Note that if the height (width) is con-

strained to be h (w), then the area is minimized by minimizing the width (height). Note also that

if the height (width) is constrained to be h (w), it is enough to fold the component stack so that

the height (width) of the bounding rectangle is less than or equal to h (w). However, if the height

(width) is < h (w), the physical chip space allocated will have height (width) h (w). So, area is

minimized by minimizing the width (height).

In Section 2, we consider folding under the assumption that all module widths are the same

(module heights may vary). The case when module widths vary but module heights are the same

is considered in Section 3. The general case of variable module widths and heights is considered

in Section 4. In Section 5, we consider the case when no overlap amongst the slices of different

stacks is permitted. This is referred to as folding without nesting.

2. Components With Equal Widths

Let $h_i$ be the height of the $i$'th component. We may assume that all components have a width of one. If the stack is folded at components $i_1, i_2, \ldots, i_{k-1}$, then the width of the layout is $k$ and the height, $h$, is the height of the tallest of the $k$ stacks (Figure 3).

2.1. No Routing Area At Stack Ends

2.1.1. Height Of Folded Layout Is Fixed

When the height is constrained to be at most $h$, the minimum width layout is obtained by first selecting $i_1$ to be the largest $j$ such that $\sum_{l=1}^{j} h_l \le h$. Next $i_2$ is set to be the largest $j$ such that $\sum_{l=i_1+1}^{j} h_l \le h$. Continuing in this way, $i_3, i_4, \ldots$ can be obtained. The process stops when $i_k = n$. The number of stacks is $k$. The optimality of this strategy is easily established and its complexity is readily seen to be $O(n)$.

Figure 3: A k stack folding of equal width components. (a) Unfolded placement. (b) Folded placement and enclosing rectangle.
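The greedy selection of fold points in Section 2.1.1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `greedy_fold_fixed_height` and the convention of returning the 1-based fold points (ending with $n$) are assumptions made here, not part of the paper.

```python
def greedy_fold_fixed_height(heights, h):
    """Greedy folding of equal-width components under a height limit h.

    heights[0] is the height of C_1, heights[1] of C_2, and so on.
    Returns the 1-based indices i_1 < i_2 < ... < i_k = n at which a
    stack ends, so the number of stacks (the layout width) is len(result).
    """
    folds = []
    used = 0  # height already used in the stack being filled
    for idx, hi in enumerate(heights, start=1):
        if hi > h:
            raise ValueError(f"component {idx} is taller than the height limit")
        if used + hi > h:
            folds.append(idx - 1)  # close the current stack at C_{idx-1}
            used = 0
        used += hi
    if heights:
        folds.append(len(heights))  # the last stack always ends at C_n
    return folds
```

For heights 3, 2, 4, 1, 5 and $h = 5$ the stacks are {C_1, C_2}, {C_3, C_4}, {C_5}, so the layout has width 3. Each component is examined once, which matches the $O(n)$ bound stated in the text.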

2.1.2. Width Of Folded Layout Is Fixed

If the width of the layout is constrained to be at most $k$, the minimum height layout can be obtained by using a dynamic programming technique. Let H(i, w) be the height of the rectangle with width $w$ and minimum height into which components $C_i, C_{i+1}, \ldots, C_n$ can be folded. It is easy to see that the following equalities are true:

$$H(i, 1) = \sum_{j=i}^{n} h_j, \quad 1 \le i \le n$$

$$H(n, w) = h_n \quad \text{for all } w \ge 1$$

$$H(n+1, w) = 0 \quad \text{for all } w \ge 1$$

Let $h(i, j) = \sum_{l=i}^{j} h_l$. A recurrence for H(i, w) in terms of the heights of smaller width foldings is obtained by observing that if $C_i, C_{i+1}, \ldots, C_n$ is first folded at $C_j$, then the height of the enclosing width $w$ rectangle is $\max\{h(i,j), H(j+1, w-1)\}$. So,

$$H(i, w) = \min_{i \le j \le n} [\max\{h(i,j), H(j+1, w-1)\}], \quad w > 1. \qquad (1)$$

The minimum height folding of $C_1, C_2, \ldots, C_n$ constrained to have width at most $k$ is H(1, k). Using Equation (1) and the known values of H(i, 1), $1 \le i \le n$, and H(n+1, w), H(i, 2) can be obtained. From the H(i, 2)'s and Equation (1) we can obtain the H(i, 3)'s. Proceeding in this way, the H(i, k)'s and hence H(1, k) can be obtained. A straightforward application of (1) to compute each H(i, w) will take $O(n-i+1)$ time. The total complexity to compute all H(i, w)'s, $1 \le i \le n$, $1 \le w \le k$ will be $O(n^2 k)$. We can reduce this to $O(nk)$ by using the following results.
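A direct evaluation of the recurrence for H(i, w), one width at a time, might look like the following Python sketch. The function name and the use of prefix sums for h(i, j) are choices made here for illustration; the code is the straightforward $O(n^2 k)$ method, not the faster scheme.

```python
from itertools import accumulate


def min_height_fixed_width(heights, k):
    """Return H(1, k): the minimum height of a folding into at most k stacks.

    heights[0] is the height of C_1. Assumes k >= 1. Runs in O(n^2 k) time.
    """
    n = len(heights)
    prefix = [0] + list(accumulate(heights))  # prefix[j] = h_1 + ... + h_j

    def h(i, j):
        # Total height of C_i, ..., C_j (1-based, inclusive).
        return prefix[j] - prefix[i - 1]

    # H[i] holds H(i, w) for the current w; H[n + 1] = 0 is the boundary value.
    H = [0] * (n + 2)
    for i in range(1, n + 1):
        H[i] = h(i, n)  # H(i, 1): everything in one stack
    for _ in range(2, k + 1):
        new = [0] * (n + 2)
        for i in range(1, n + 1):
            # First fold at C_j; j = n means the rest stays in one stack.
            new[i] = min(max(h(i, j), H[j + 1]) for j in range(i, n + 1))
        H = new
    return H[1]
```

For heights 3, 2, 4, 1, 5, widths 1, 2 and 3 give minimum heights 15, 9 and 5 respectively.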

Lemma 1: Let $j_i$ be a value of $j$, $i \le j \le n$, that minimizes $\max\{h(i,j), H(j+1, w-1)\}$ (see Equation (1)). For this value, $\max\{h(i, j_i), H(j_i+1, w-1)\} \le H(j_i, w-1)$.

Proof: Suppose $j_i = i$. Then, since $h(i, i) \le H(i, w-1)$ and $H(i+1, w-1) \le H(i, w-1)$, $\max\{h(i, i), H(i+1, w-1)\} \le H(i, w-1)$. So the lemma is true for $j_i = i$. Suppose $j_i > i$. If $\max\{h(i, j_i), H(j_i+1, w-1)\} = H(j_i+1, w-1)$, then the lemma follows from the observation that $H(j_i+1, w-1) \le H(j_i, w-1)$. So, consider the case $h(i, j_i) > H(j_i+1, w-1)$. If the lemma is not true, then $h(i, j_i) > H(j_i, w-1)$. From this and the observation that $h(i, j_i-1) < h(i, j_i)$, it follows that $\max\{h(i, j_i-1), H(j_i, w-1)\} < h(i, j_i)$. This contradicts the assumption that $j_i$ is a value of $j$ that minimizes $\max\{h(i,j), H(j+1, w-1)\}$. □

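Lemma 2: Let $j_i$ be the largest value of $j$, $i \le j \le n$, that minimizes $\max\{h(i,j), H(j+1, w-1)\}$. Let $j_{i+1}$ be similarly defined. Then $j_i \le j_{i+1}$, $1 \le i < n$.

Proof:

$$\begin{aligned} \max\{h(i+1, j_i), H(j_i+1, w-1)\} &\le \max\{h(i, j_i), H(j_i+1, w-1)\} && (\text{as } h(i, j_i) > h(i+1, j_i)) \\ &\le H(j_i, w-1) && (\text{Lemma 1}) \\ &\le H(p+1, w-1), && 0 \le p < j_i \\ &\le \max\{h(i+1, p), H(p+1, w-1)\}, && 0 \le p < j_i \end{aligned}$$

So, $\max\{h(i+1, j), H(j+1, w-1)\}$ is minimized for a value of $j$ that is $\ge j_i$. Hence, $j_i \le j_{i+1}$. □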

Lemma 3: If $j_i$ (as in Lemma 2) is known, then $j_{i+1}$ (Lemma 2) can be found in $O(j_{i+1} - j_i + 1)$ time if the h(i+1, j)'s and H(j, w-1)'s are known.

Proof: Since h(i+1, j) is an increasing sequence (i.e., $h(i+1, j) < h(i+1, j+1)$ for all $j < n$) and H(j+1, w-1) is a nonincreasing sequence, $f(i+1, j) = \max\{h(i+1, j), H(j+1, w-1)\}$ is a bitonic sequence. I.e., the following is true for some $q$, $i < q \le n$:

$$f(i+1, i+1) \ge f(i+1, i+2) \ge \cdots \ge f(i+1, q) < f(i+1, q+1) < \cdots < f(i+1, n)$$

Hence, $j_{i+1}$ is the least $j$ for which $f(i+1, j) < f(i+1, j+1)$ (in case there is no such $j$, $j_{i+1} = n$). Since $j_{i+1} \ge j_i$ (Lemma 2), the search for $j_{i+1}$ can be confined to the interval $[j_i, n]$. So, we need to check if $f(i+1, j) < f(i+1, j+1)$ only for $j$ values in the range $[j_i, j_{i+1}]$. This takes $O(j_{i+1} - j_i + 1)$ time under the condition that the h(i+1, j)'s and H(j, w-1)'s are known. □

Theorem 1: H(i, w), $1 \le i \le n$, $1 \le w \le k$ can be computed in $O(nk)$ time.

Proof: The computation is done first for w = 1, then w = 2, w = 3, ..., w = k. For w = 1, H(i, w), $1 \le i \le n$, is obtained in $O(n)$ time as $H(i, 1) = \sum_{j=i}^{n} h_j$. For w > 1, the computation is done in the order i = 1, 2, ..., n. For any $i$, the first value of $h$ that is needed is $h(i, j_{i-1})$. This is just $h(i-1, j_{i-1}) - h_{i-1}$. So, $h(i, j_{i-1})$ can be obtained in $O(1)$ time as the value $h(i-1, j_{i-1})$ was computed during the computation of $j_{i-1}$. $h(i, j_{i-1}+r)$ can be obtained from $h(i, j_{i-1}+r-1)$ by adding $h_{j_{i-1}+r}$. So each of the needed h(i, s) values can be obtained in $O(1)$ time. The last h(i, s) value computed is $h(i, j_i)$ and is used to obtain $h(i+1, j_i)$. From Lemma 3, we know that a total of $j_i - j_{i-1} + 1$ h(i, s) values are needed and that the time to compute H(i, w) is proportional to $j_i - j_{i-1} + 1$. Further, the time to get H(i, w) for all $i$, $1 \le i \le n$ is proportional to $\sum_{i=1}^{n} (j_i - j_{i-1} + 1) = O(n)$. Hence, H(i, w), $1 \le i \le n$, $1 \le w \le k$ can be computed in $O(nk)$ time. □
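The scan behind Lemmas 1 to 3 and Theorem 1 can be sketched in Python as a monotone pointer that never moves left while $i$ increases. This is an illustrative sketch under stated assumptions: all heights are strictly positive (as the lemmas require), the names are hypothetical, and h(i, j) is read from a prefix-sum array instead of being updated incrementally as in the proof, which does not change the $O(nk)$ bound.

```python
from itertools import accumulate


def min_height_fixed_width_fast(heights, k):
    """Return H(1, k) in O(nk) time for strictly positive heights, k >= 1."""
    n = len(heights)
    prefix = [0] + list(accumulate(heights))  # prefix[j] = h_1 + ... + h_j

    # prev[i] = H(i, w - 1); prev[n + 1] = 0 is the boundary value.
    prev = [0] * (n + 2)
    for i in range(1, n + 1):
        prev[i] = prefix[n] - prefix[i - 1]  # H(i, 1)

    def f(i, j):
        # max{ h(i, j), H(j + 1, w - 1) } for the current contents of prev.
        return max(prefix[j] - prefix[i - 1], prev[j + 1])

    for _ in range(2, k + 1):
        cur = [0] * (n + 2)
        j = 1  # candidate for j_i; by Lemma 2 it only moves right
        for i in range(1, n + 1):
            j = max(j, i)
            # f(i, .) is bitonic (Lemma 3): advance until it strictly rises.
            while j < n and f(i, j + 1) <= f(i, j):
                j += 1
            cur[i] = f(i, j)
        prev = cur
    return prev[1]
```

On heights 3, 2, 4, 1, 5 this returns 15, 9 and 5 for $k = 1, 2, 3$, the same values a direct evaluation of Equation (1) gives.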

A bounding rectangle of dimension $h \times w$ is said to be a dominating rectangle iff the components $C_1, C_2, \ldots, C_n$ can be folded into it and there is no $H \times W$ rectangle, $H \le h$, $W \le w$, $H \cdot W < h \cdot w$, such that $C_1, C_2, \ldots, C_n$ can be folded into the $H \times W$ rectangle. The dominating rectangles can be found in $O(n^2)$ time by computing H(1, w), $1 \le w \le n$ and eliminating from these $n$ rectangles, those that are dominated by a rectangle in this set.

2.2. Routing Area At Stack Ends

Let $r_i$, $2 \le i \le n$ denote the height of the routing space needed to route the connections between $C_{i-1}$ and $C_i$ under the assumption that $C_{i-1}$ and $C_i$ are in different stacks (so, they must be at the same end of two adjacent stacks). Let $r_1 = r_{n+1} = 0$. Figure 4 shows the situation when $C_{i-1}$ and $C_i$ are at the bottom end. Note that $r_i$ depends on the number of interconnects that pass from $C_{i-1}$ to $C_i$ but not on the relative positioning of $C_{i-1}$ and $C_i$ at the bottom of their respective stacks. Further, note that if components $C_i, \ldots, C_j$ form one stack, then the height needed to accommodate these components and the inter stack routing to $C_{i-1}$ and $C_{j+1}$ is $h(i, j) + r_i + r_{j+1}$.

Figure 4: Inter stack routing height requirement.

2.2.1. Height Of Folded Layout Is Fixed

Let W(i) be the width of a minimum width folding for $C_i, \ldots, C_n$. This folding is required to leave $r_i$ space in the first stack to complete the routes between $C_{i-1}$ and $C_i$. Note that it is possible that $r_j + h_j > h$ for some $j$. In this case $W(j) = \infty$ as it is not possible to fold $C_j, \ldots, C_n$ as desired. Whenever a height $h$ folding is not possible, $W(j) = \infty$.

Let W(n+1) = 0. We see that

$$W(i) = 1 + \min \{ W(j+1) \mid i \le j \le n \text{ and } r_i + \sum_{q=i}^{j} h_q + r_{j+1} \le h \}$$

where the minimum over an empty set is taken to be $\infty$. The above recurrence can be solved for W(1) in $O(n^2)$ time. The actual folding with width W(1) can also be obtained in this much time.
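The recurrence for W(i) with routing space can be evaluated from $i = n$ down to $i = 1$. The Python sketch below is illustrative only; it assumes the stack holding $C_i, \ldots, C_j$ needs height $r_i + h(i, j) + r_{j+1}$ as stated in Section 2.2, uses 0-based lists, and the function and parameter names are hypothetical.

```python
import math


def min_width_with_routing(heights, routing, h):
    """Minimum number of stacks for equal-width components with routing space.

    heights[t] is the height of C_{t+1}; routing[t] is r_{t+1}, so routing
    has length n + 1 and routing[0] = routing[n] = 0 by the paper's
    convention. Returns math.inf when no folding of height h exists.
    """
    n = len(heights)
    if len(routing) != n + 1:
        raise ValueError("routing must have one more entry than heights")
    W = [math.inf] * (n + 1)
    W[n] = 0  # no components left to place
    for i in range(n - 1, -1, -1):
        total = routing[i]  # routing space toward the previous stack
        best = math.inf
        for j in range(i, n):
            total += heights[j]
            if total > h:
                break  # heights only grow, so longer stacks cannot fit
            if total + routing[j + 1] <= h:
                best = min(best, W[j + 1])
        W[i] = 1 + best
    return W[0]
```

With heights 3, 2, 4 and routing heights $r_2 = 1$, $r_3 = 3$, a height limit of 7 needs two stacks, while a limit of 6 admits no folding because $C_3$ alone needs $r_3 + h_3 = 7$.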


2.2.2. Width Of Folded Layout Is Fixed

Let H'(i, k) be the height of a minimum height rectangle of width $k$ into which $C_i, C_{i+1}, \ldots, C_n$ can be folded. This folding requires that there be enough space at the ends of each of the $k$ stacks to complete the inter stack routing. In particular, there must be $r_i$ units of space (i.e., height) between the nearest rectangle boundary (top or bottom) and the end of $C_i$. This space is needed to complete the routing to $C_{i-1}$. Figure 5 illustrates this.

Figure 5: H'(i, k)

One may readily verify the correctness of the following equalities:

$$H'(i, 1) = h(i, n) + r_i$$

$$H'(n, w) = h_n + r_n, \quad w \ge 1 \qquad (2)$$

$$H'(i, w) = \min_{i \le j \le n} [\max\{h(i, j) + r_i + r_{j+1},\ H'(j+1, w-1)\}]$$

H'(1, k) is the height of the minimum height rectangle with width $k$ into which $C_1, \ldots, C_n$ can be folded and in which the inter stack routing can be completed. H'(1, k) can be computed from (2) in $O(n^2 k)$ time by computing H'(i, w) for $w$ in the order w = 2, 3, ..., k. The $O(nk)$ scheme of Section 2.1 does not generalize to (2) as H'(i, w) is no longer monotone in $i$.

3. Components With Equal Heights

3.1. No Routing Area At Stack Ends

3.1.1. Height Of Folded Layout Is Fixed

First, consider the case when the height, $h$, of the folded layout is fixed and we wish to minimize the layout width. Let W(i, h) be the width of a minimum width height $h$ folded layout for the equal height components $C_i, \ldots, C_n$. Consider any folded layout for $C_i, \ldots, C_n$. This consists of a height $h$ layout for $C_i, \ldots, C_j$ (for some $j$, $i \le j \le n$) which is folded at at most one position (Figure 6) followed by a folded layout for $C_{j+1}, \ldots, C_n$. Note that when $j = n$, there is only the layout for $C_i, \ldots, C_n$.

Let S(i, j, h) be the width of a minimum width height $h$ folded layout for $C_i, \ldots, C_j$ under the restriction that there is at most one folding position. Then it follows that

$$W(i, h) = \min_{i \le j \le n} \{ S(i, j, h) + W(j+1, h) \} \qquad (3)$$

and W(n+1, h) = 0.

Figure 6: Possible folded layouts for $C_i, \ldots, C_n$.

Let $w_i$ be the width of $C_i$, $1 \le i \le n$. We easily see that $W(n, h) = w_n$, for $h \ge 1$ (we assume that the height of each component $C_i$ is 1). Since at most $h$ components can be placed on a stack of height $h$ and since S(i, j, h) can contain at most two stacks, it follows that (3) may be rewritten as:

$$W(i, h) = \min_{i \le j \le \min\{n,\ i+2h-1\}} \{ S(i, j, h) + W(j+1, h) \} \qquad (3')$$

The minimum width height $h$ folding we seek has width W(1, h). To compute this width, we need to know S(i, j, h) for $1 \le i \le n$ and $i \le j \le \min\{n, i+2h-1\}$. So we shall proceed to describe how S(i, j, h) may be computed. With a little more book keeping, the layout corresponding to S(i, j, h) may also be obtained.

To compute S(i, j, h), we use the solution to a related problem P1 in which we are given two sets $L = \{L_1, \ldots, L_n\}$ and $R = \{R_1, \ldots, R_m\}$ of equal height components and a rectangle of width $w$. The width of $L_i$ is $wl_i$ and that of $R_i$ is $wr_i$. An example is given in Figure 7(a). The width $w$ rectangle is divided into buckets that are of unit height and width $w$ as in Figure 7(b). The components of L and R are to be assigned to buckets such that:

(a) the fewest number of buckets are used

(b) each bucket contains at most one $L_i$ and one $R_j$

(c) if a bucket contains $L_i$ and $R_j$, then $wl_i + wr_j \le w$

(d) the order in which the $L_i$'s ($R_i$'s) are assigned to buckets is $L_1, L_2, \ldots, L_n$ ($R_1, R_2, \ldots, R_m$) bottom to top.

In order to assure a feasible solution, we require $w \ge \max\{wl_1, \ldots, wl_n, wr_1, \ldots, wr_m\}$. Figure 7(c) shows a possible assignment of the L's and R's of Figure 7(a) into 4 buckets.

Figure 7: An example for P1. (a) $wl_1 = wl_3 = wr_1 = wr_3 = 2$, $wl_2 = wr_2 = 1$. (b) 4 bucket width 3 rectangle. (c) Bucket assignment.

Let B(i, j, w) be the smallest number of buckets needed for $L_1, \ldots, L_i$ and $R_1, \ldots, R_j$. The number of buckets used in any solution of P1 is B(n, m, w). It is easy to see that the following are true:

$$B(i, 0, w) = i, \quad 1 \le i \le n$$

$$B(0, j, w) = j, \quad 1 \le j \le m$$

$$B(i, j, w) \ge B(i, j-1, w), \quad 1 \le i \le n,\ 1 \le j \le m$$

$$B(i, j, w) \ge B(i-1, j, w), \quad 1 \le i \le n,\ 1 \le j \le m$$

Furthermore, for an arbitrary B(i, j, w), $i \ge 1$, $j \ge 1$, we see that if $wl_i + wr_j \le w$, then there is no advantage to not putting $L_i$ and $R_j$ into the same bucket (see Figure 8(a)). Hence, B(i, j, w) = 1 + B(i-1, j-1, w). If $wl_i + wr_j > w$, then $L_i$ and $R_j$ cannot be placed in the same bucket. Because of the ordering requirement (d), we have two possibilities. In one, $L_i$ occupies the highest used bucket. This bucket cannot be shared with any $R_q$ as $R_j$ must occupy the highest bucket occupied by any of $R_1, \ldots, R_j$ and $R_j$ cannot fit into the same bucket as $L_i$. So, B(i, j, w) = 1 + B(i-1, j, w) (Figure 8(b)). In the second case, $R_j$ occupies the highest used bucket and the remaining components occupy lower buckets (Figure 8(c)). Now, B(i, j, w) = 1 + B(i, j-1, w). Combining these observations together, we get:

$$B(i, j, w) = \begin{cases} 1 + B(i-1, j-1, w), & wl_i + wr_j \le w \\ 1 + \min\{B(i-1, j, w),\ B(i, j-1, w)\}, & wl_i + wr_j > w \end{cases} \qquad (4)$$

Equation (4) together with the values of B(i, 0, w), $1 \le i \le n$ and B(0, j, w), $1 \le j \le m$, can be used to compute all B(i, j, w) values, $1 \le i \le n$, $1 \le j \le m$ by computing these first for all $i$, $j$ such that $i + j = 2$, then $i + j = 3$, ..., and finally $i + j = n + m$. The time needed to obtain B(n, m, w) is therefore $O(nm)$. The actual assignment of the $L_i$'s and $R_i$'s to the buckets can be obtained by recording which of the three possibilities of (4) results in each B(i, j, w) value and then performing a traceback (see Horowitz and Sahni [HORO78] for a discussion of dynamic programming and associated tracebacks).
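The bucket recurrence for problem P1 translates almost directly into a table fill. The Python sketch below is illustrative; the function name is hypothetical, and it fills the table row by row rather than by anti-diagonals $i + j = 2, 3, \ldots$, which is equally valid because every entry depends only on entries with smaller $i$ or $j$.

```python
def min_buckets(wl, wr, w):
    """Return B(n, m, w) for problem P1.

    wl[s] is the width of L_{s+1}, wr[s] the width of R_{s+1}; every
    component must fit in a bucket on its own (width at most w).
    """
    n, m = len(wl), len(wr)
    if any(x > w for x in list(wl) + list(wr)):
        raise ValueError("w must be at least the widest component")
    B = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        B[i][0] = i  # only L components: one bucket each
    for j in range(1, m + 1):
        B[0][j] = j  # only R components: one bucket each
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if wl[i - 1] + wr[j - 1] <= w:
                # L_i and R_j share the top bucket.
                B[i][j] = 1 + B[i - 1][j - 1]
            else:
                # Either L_i or R_j sits alone in the top bucket.
                B[i][j] = 1 + min(B[i - 1][j], B[i][j - 1])
    return B[n][m]
```

For the instance of Figure 7 (widths 2, 1, 2 on both sides and $w = 3$) the function returns 4, in agreement with the four buckets mentioned in the text.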

Figure 8: Cases for placement of $L_i$ and $R_j$. (a) $wl_i + wr_j \le w$. (b) $wl_i + wr_j > w$. (c) $wl_i + wr_j > w$.

Let T(i, j, k, w) be the minimum number of width $w$ buckets needed for the problem P1 instance defined by $L = \{C_i, \ldots, C_k\}$, $R = \{C_j, \ldots, C_{k+1}\}$, $i \le k \le j$. Note that T(i, j, k, w) gives the height of the minimum height width $w$ rectangle into which $C_i, \ldots, C_j$ will fit when folded at $C_k$. Let $T(i, j, w) = \min_{i \le k \le j} \{ T(i, j, k, w) \}$. T(i, j, w) is the height of a minimum height width $w$ rectangle into which $C_i, \ldots, C_j$ will fit when the component stack $C_i, \ldots, C_j$ is folded at at most one component (recall that folding at $k = j$ corresponds to no folding). T(i, j, w) can be obtained by first computing T(i, j, k, w) using P1 as described above. Since each T(i, j, k, w) can be computed in $O((k-i+1)(j-k))$ time, T(i, j, w) can be computed in $O(\sum_{k=i}^{j} (k-i+1)(j-k)) = O((j-i)^3)$ time.


A faster way to obtain T(i, j, w) is to solve the P1 instance defined by $L = \{L_1, L_2, \ldots, L_{j-i+1}\}$, $R = \{R_1, R_2, \ldots, R_{j-i+1}\}$, $wl_s = w_{j-s+1}$, $wr_s = w_{i+s-1}$, $1 \le s \le j-i+1$ where $w_p$ is the width of component $C_p$. Let B be the $(j-i+1) \times (j-i+1)$ minimum bucket matrix computed using (4). T(i, j, k, w) = B(j-k, k-i+1, w). To see this, consider Figure 9(a). This gives the two sets of components that are to be assigned to buckets. Figure 9(b) gives the same two component sets using $L_i$, $R_i$ terminology. Note that $C_p$ corresponds to $L_{j-p+1}$ and $R_{p-i+1}$ as $w_p = wl_{j-p+1} = wr_{p-i+1}$. The minimum number of buckets needed is not affected by exchanging the two columns of Figure 9(a). So, the solution to Figure 9(a) is the same as that for Figure 9(c). Furthermore, inverting the two columns to get Figure 9(d) does not affect the minimum number of buckets. Figure 9(d) in $L_i$, $R_i$ notation is given in Figure 9(e). Hence, T(i, j, k, w) = B(j-k, k-i+1, w).

Figure 9: Equivalence. (a) T(i, j, w), fold at k. (b) In $L_i$, $R_i$ terminology. (c) Switch columns. (d) Invert columns. (e) Relabel.

Since B can be computed in $O((j-i+1)^2)$ time, T(i, j, k, w), $i \le k \le j$ can be computed in $O((j-i+1)^2)$ time. Hence, we can compute T(i, j, w) in $O((j-i+1)^2)$ time.

We are now ready to see how S(i, j, h) may be computed. When only one folding is permitted, the minimum width of a height $h$ layout is one of the values in the set $\{w_i, \ldots, w_j\} \cup \{w_u + w_v \mid i \le u < v \le j\}$. Let $d_1 < d_2 < \cdots < d_p$ be the distinct values in this set. Note that $p \le (j-i+1)^2$. Since $T(i, j, d_u) \ge T(i, j, d_{u+1})$, $1 \le u < p$, $S(i, j, h) = d_q$ where $q$ is the least integer for which $T(i, j, d_q) \le h$. $q$ can be found by performing a binary search in the range $[1, p]$. For each examined value $s$ in this range, $T(i, j, d_s)$ is computed. If $T(i, j, d_s) > h$, then values $\le s$ are eliminated from the range. Otherwise, values $> s$ are eliminated. Since a binary search is used, at most $\lceil \log_2(p+1) \rceil$ $s$ values are tried. Hence, at most this many $T(i, j, d_s)$'s are computed. So, S(i, j, h) is computed in $O((j-i+1)^2 \log_2(j-i+1))$ time.

To compute W(i, h) using (3') we need to compute at most $2h$ S(i, j, h) values. Since $j - i < 2h$, this can be done in $O(h^3 \log h)$ time. To compute W(1, h), we first compute W(n, h), then W(n-1, h), ..., and finally W(1, h). The time for this is $O(h^3 n \log h)$. Since $h$ need be at most $n$, the complexity is $O(n^4 \log n)$.

3.1.2. Width Of Folded Layout Is Fixed

When the width is fixed at $w$, we can find the minimum height layout in $O(n^4 \log^2 n)$ time by performing a binary search in the interval $[1, n]$. For each examined value $h$ in this interval, we compute W(1, h) as in the previous section. If $W(1, h) > w$, then heights $\le h$ are eliminated. Otherwise, heights $> h$ are eliminated.

3.2. Routing Area At Stack Ends

Let $r_i$ be as in Section 2.2. Define T'(i, j, k, w) as below:

$$T'(i, j, k, w) = \begin{cases} T(i, j, k, w) + \max\{r_i, r_{j+1}\} + r_{k+1}, & k < j \\ T(i, j, k, w) + r_i + r_{j+1}, & k = j \end{cases}$$

T'(i, j, k, w) is the height of a minimum height width $w$ rectangle into which $C_i, C_{i+1}, \ldots, C_j$ can fit with fold at $C_k$. This height includes the needed routing space at the top and bottom of the component stacks. T(i, j, k, w) is as defined in Section 3.1.1. If we use T'(i, j, k, w) in place of T(i, j, k, w) in the computation of T(i, j, w) then the S(i, j, h)'s and W(i, h)'s computed in Section 3.1.1 account for the routing area. Hence, the minimum width height $h$ folding that allows for routing area can be found in $O(h^3 n \log h)$ time. Similarly, the minimum height width $w$ folding that accounts for routing area can be found in $O(n^4 \log^2 n)$ time.

4. Variable Width And Height Components

4.1. No Routing Area At Stack Ends

4.1.1. Height Of Folded Layout Is Fixed

Let $h_i$ be the height of component $C_i$ and let $w_i$ be its width. Let S(i, j, h) and W(i, h) be as in Section 3.1.1. Using the same reasoning as in Section 3.1.1, we obtain

$$W(i, h) = \min_{i \le j \le n,\ h(i, j) \le 2h} \{ S(i, j, h) + W(j+1, h) \} \qquad (5)$$

and W(n+1, h) = 0.

To obtain S(i, j, h), we generalize P1 to the case where the L's and R's have possibly different heights. Let $hl_i$ and $hr_i$, respectively, be the height of $L_i$ and $R_i$. Let B(i, j, w) be the minimum height rectangle into which $L_1, \ldots, L_i$ and $R_1, \ldots, R_j$ can be placed so that:

(b') a horizontal line drawn at any vertical position of the rectangle cuts at most one member of L and at most one of R.

(c') if $L_i$ and $R_j$ are cut by some horizontal line, then $wl_i + wr_j \le w$.

(d') the order in which the $L_i$'s and $R_i$'s appear in the rectangle bottom to top is $(L_1, \ldots, L_n)$ and $(R_1, \ldots, R_m)$, respectively.

If $wl_i + wr_j > w$ then using the reasoning of Section 3.1.1, we obtain:

$$B(i, j, w) = \min \{ hl_i + B(i-1, j, w),\ hr_j + B(i, j-1, w) \}.$$

However, if $wl_i + wr_j \le w$, the minimum B(i, j, w) may not occur when $L_i$ and $R_j$ are placed adjacent to each other (see Figure 10).

Figure 10: $L_1$ and $R_3$ adjacent doesn't optimize B(1, 3, w). (a) $L_1$ and $R_3$ adjacent. (b) $L_1$ and $R_1$ adjacent.

When $wl_i + wr_j \le w$, we need to compute $i^*$ and $j^*$ such that $wl_{i^*} + wr_{j^*} > w$. This is done as in Figure 11. To ensure proper termination of this code, we define $wl_0 = wr_0 = w + 1$ and $hl_0 = hr_0 = 0$. An example of this computation is given in Figure 12.

Let $B(0, j, w) = \sum_{s=1}^{j} hr_s$ and $B(i, 0, w) = \sum_{s=1}^{i} hl_s$. Theorem 2 establishes a recurrence for B(i, j, w) when $wl_i + wr_j \le w$.

    i* := i ; j* := j ;
    HL := hl_i ; HR := hr_j ;
    while wl_{i*} + wr_{j*} <= w do
        if HL > HR
        then [ j* := j* - 1 ; HR := HR + hr_{j*} ]
        else [ i* := i* - 1 ; HL := HL + hl_{i*} ] ;

Figure 11: Computing i* and j*.

Figure 12: Example computation of i* and j*.
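The loop of Figure 11 can be written out in Python as follows. This is a sketch only: the function name `find_blocking_pair` is hypothetical, the inputs are 0-based lists, and the sentinels $wl_0 = wr_0 = w + 1$, $hl_0 = hr_0 = 0$ described in the text are added internally so that the loop always terminates.

```python
def find_blocking_pair(wl, hl, wr, hr, i, j, w):
    """Compute (i*, j*, HL, HR) for 1-based indices i >= 1 and j >= 1.

    wl/hl are the widths/heights of L_1..L_n and wr/hr those of R_1..R_m,
    given as 0-based lists. On return, wl_{i*} + wr_{j*} > w, HL is the
    total height of L_{i*}..L_i and HR the total height of R_{j*}..R_j.
    """
    # Index 0 holds the sentinels: too wide to pair with anything, zero height.
    wl_s, hl_s = [w + 1] + list(wl), [0] + list(hl)
    wr_s, hr_s = [w + 1] + list(wr), [0] + list(hr)

    i_star, j_star = i, j
    HL, HR = hl_s[i], hr_s[j]
    while wl_s[i_star] + wr_s[j_star] <= w:
        if HL > HR:
            # The L column is taller: bring in the next lower R component.
            j_star -= 1
            HR += hr_s[j_star]
        else:
            # Otherwise bring in the next lower L component.
            i_star -= 1
            HL += hl_s[i_star]
    return i_star, j_star, HL, HR
```

As a small usage example, with one left component of width 2 and height 3, right components of widths 2, 1, 1 and unit heights, and $w = 3$, the call for $i = 1$, $j = 3$ stops at $i^* = 1$, $j^* = 1$ with $HL = HR = 3$.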





Theorem 2: Let $i$ and $j$ be such that $wl_i + wr_j \le w$ and let $i^*$, $j^*$, HL, and HR be as computed by the code of Figure 11. Then, $B(i, j, w) = \min \{ HL + B(i^*-1, j^*, w),\ HR + B(i^*, j^*-1, w) \}$.

Proof: Since $wl_{i^*} + wr_{j^*} > w$, either $L_{i^*}$ is above $R_{j^*}$ or below it in every solution. If $L_{i^*}$ is above $R_{j^*}$ in an optimal solution (see Figure 13), then $HL_{opt} \ge HL = \sum_{s=i^*}^{i} hl_s$ as $L_{i^*+1}, \ldots, L_i$ are above $L_{i^*}$.

Figure 13: $L_{i^*}$ above $R_{j^*}$ in optimal solution.

Furthermore, since at least $L_1, \ldots, L_{i^*-1}$ and $R_1, \ldots, R_{j^*}$ are below $L_{i^*}$, $H_{opt} \ge HL_{opt} + B(i^*-1, j^*, w) \ge HL + B(i^*-1, j^*, w)$. Since there is a feasible solution of height $HL + B(i^*-1, j^*, w)$ (by construction of Figure 11, $R_j, \ldots, R_{j^*+1}$ can be packed in height HL adjacent to $L_i, \ldots, L_{i^*}$) and since $H_{opt}$ is the minimum possible height of a feasible solution, it follows that $H_{opt} = HL + B(i^*-1, j^*, w)$. Similarly, if $R_{j^*}$ is above $L_{i^*}$ in the optimal solution, $H_{opt} = HR + B(i^*, j^*-1, w)$. Hence, $B(i, j, w) = H_{opt} = \min \{ HL + B(i^*-1, j^*, w),\ HR + B(i^*, j^*-1, w) \}$. □

Since $|L| = n$ and $|R| = m$, a total of $nm$ B values are to be computed. For each, we may need to compute $i^*$ and $j^*$ using the code of Figure 11. This takes $O(n + m)$ time. So, while the B matrix could be computed in $O(nm)$ time when all components had the same height, it now takes $O(mn(n + m))$ time to do this. The remaining ideas of Section 3.1.1 directly carry over to the case of variable height and width components. The T(i, j, k, w)'s can be computed in $O((j-i+1)^3)$ time for any fixed $i$ and $j$ and all $k$, $i \le k \le j$. Hence, each T(i, j, w) can be obtained in $O((j-i+1)^3)$ time. So, each S(i, j, h) can be obtained in $O(h^3 \log h)$ time. As a result, (5) can be solved for W(1, h) in $O(h^4 n \log h)$ time.

4.1.2. Width Of Folded Layout Is Fixed

The binary search technique of Section 3.1.2 may be used to obtain the optimal height solution in $O(n^5 \log^2 n)$ time as the optimal height is one of the $n^2$ values h(i, j), $1 \le i \le j \le n$.

4.2. Routing Area At Stack Ends

We assume that all inter stack routes are done at the top or bottom ends of the stacks as in Figure 14(a), rather than in space internal to the stack as in Figure 14(b). More specifically, if R is the minimum height rectangle into which $C_i, C_{i+1}, \ldots, C_j$ can be folded using at most one folding position then all inter stack routing is done external to R.

Figure 14: (a) legal route. (b) illegal route.

The technique of Section 3.2 readily generalizes to the case of variable height and width components. The complexity of the algorithms for this case is the same as that for the case when no routing area is needed at the stack ends.

5. Folding Without Nesting

Our folding model permits the slices of one stack to nest with the slices of an adjacent stack. For example, in Figure 1(b) slice 4 of stack 1 occupies the same physical space as slice 4 of stack 2. As noted earlier, this nesting of stacks may increase the demand for inter component routing tracks. In situations where increased routing tracks cannot be provided, one may forbid stack nesting. So, the stacks of Figure 1(b) will have to be placed as in Figure 15. In this section we consider folding without nesting. Since nesting can occur only when the component widths are not the same, we need only consider this case.

Figure 15: Folding without nesting.

5.1. Height Of Folded Layout Is Fixed

Let W(i) be the width of a minimum width rectangle of height $h$ into which the components $C_i, \ldots, C_n$ can be folded without nesting. The correctness of the following recurrence is easily established.

$$W(i) = \min_{i \le j \le n,\ h(i, j) \le h} \{ \max_{i \le q \le j} \{ w_q \} + W(j+1) \}, \quad 1 \le i \le n$$

$$W(n+1) = 0$$

This recurrence may be solved for W(1) in $O(n^2)$ time.
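The recurrence of Section 5.1 can be evaluated from $i = n$ down to $i = 1$ while maintaining the running height and maximum width of the stack that starts at $C_i$. The following Python sketch is illustrative; the function name is hypothetical and `math.inf` is returned when some component is taller than $h$.

```python
import math


def min_width_no_nesting(heights, widths, h):
    """Minimum layout width for height limit h when stacks may not nest.

    heights[t] and widths[t] describe C_{t+1}. Each stack contributes the
    width of its widest component. Runs in O(n^2) time.
    """
    n = len(heights)
    W = [math.inf] * (n + 1)
    W[n] = 0  # nothing left to place
    for i in range(n - 1, -1, -1):
        stack_height = 0
        stack_width = 0
        best = math.inf
        for j in range(i, n):
            stack_height += heights[j]
            if stack_height > h:
                break  # C_i..C_j no longer fits in one stack
            stack_width = max(stack_width, widths[j])
            best = min(best, stack_width + W[j + 1])
        W[i] = best
    return W[0]
```

For heights 2, 2, 1, 3, widths 3, 1, 2, 2 and $h = 4$, the best folding uses stacks {C_1, C_2} and {C_3, C_4} for a total width of 3 + 2 = 5.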

5.2. Width Of Folded Layout Is Fixed

Since the optimal height is one of the $n^2$ values h(i, j), $1 \le i \le j \le n$, we can perform a binary search over these heights to determine the smallest height that results in a folding of width no more than permissible. For each tested height the $O(n^2)$ height constrained algorithm is used. The overall complexity is $O(n^2 \log n)$.
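The binary search of Section 5.2 can be sketched as below. This is an illustrative sketch with hypothetical names: the candidate heights are the distinct sums h(i, j), the inner helper re-evaluates the height constrained recurrence of Section 5.1 for each tested height, and `None` is returned when even a single stack is wider than the limit.

```python
import math


def min_height_no_nesting(heights, widths, width_limit):
    """Smallest height whose no-nesting folding has width <= width_limit."""
    n = len(heights)

    def min_width(h):
        # Height constrained recurrence of Section 5.1, O(n^2) time.
        W = [math.inf] * (n + 1)
        W[n] = 0
        for i in range(n - 1, -1, -1):
            sh, sw, best = 0, 0, math.inf
            for j in range(i, n):
                sh += heights[j]
                if sh > h:
                    break
                sw = max(sw, widths[j])
                best = min(best, sw + W[j + 1])
            W[i] = best
        return W[0]

    # Candidate heights: all sums h(i, j) of consecutive components.
    sums = set()
    for i in range(n):
        total = 0
        for j in range(i, n):
            total += heights[j]
            sums.add(total)
    cands = sorted(sums)
    if not cands or min_width(cands[-1]) > width_limit:
        return None  # even a single stack is too wide

    # min_width is nonincreasing in h, so binary search applies.
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if min_width(cands[mid]) <= width_limit:
            hi = mid
        else:
            lo = mid + 1
    return cands[lo]
```

For heights 2, 2, 1, 3 and widths 3, 1, 2, 2, a width limit of 5 yields height 4, while a width limit of 3 forces all components into one stack of height 8.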

6. Conclusions

We have considered the problem of folding bit sliced stacks so as to obtain minimum height (sub-

ject to width constraints) and minimum width (subject to height constraints) foldings. Our model

differs from that of [LARM90] in that we do not permit a reordering of the components in the

input component stack. Our model applies to bit sliced architectures as well as to standard cell

and sea-of-gates designs.

Polynomial time algorithms to obtain optimal foldings have been obtained under a variety of assumptions (stack nesting permitted/not permitted, routing space needed/not needed at stack ends, equal height components, equal width components). Our algorithms for the stack nesting case are summarized below in Table 1.

| Stack nesting permitted | Routing area at stack ends: No | Routing area at stack ends: Yes |
|---|---|---|
| Equal width, height constrained | $O(n)$ | $O(n^2)$ |
| Equal width, width constrained | $O(n^2)$ | $O(n^3)$ |
| Equal height, height constrained | $O(n^4 \log n)$ | $O(n^4 \log n)$ |
| Equal height, width constrained | $O(n^4 \log^2 n)$ | $O(n^4 \log^2 n)$ |
| Variable heights and widths, height constrained | $O(n^5 \log n)$ | $O(n^5 \log n)$ |
| Variable heights and widths, width constrained | $O(n^5 \log^2 n)$ | $O(n^5 \log^2 n)$ |

Table 1: Summary of algorithms.

When stack nesting is not permitted, the height constrained case may be solved in $O(n^2)$ time and the width constrained case in $O(n^2 \log n)$ time.

7. References

[HORO78] E. Horowitz and S. Sahni, "Fundamentals of Computer Algorithms," Computer Science Press, Maryland, 1978.

[LARM90] L. Larmore, D. Gajski and A. Wu, "Layout Placement for Sliced Architecture,"

University of California, Irvine, Technical Report, 1990.


[WU90] A. Wu and D. Gajski, "Partitioning Algorithms for Layout Synthesis from Register-Transfer Netlists," Proc. of International Conference on Computer Aided Design, November 1990, pp. 144-147.

[SHRA88] E. Shragowitz, L. Lin, S. Sahni, "Models and algorithms for structured layout,"

Computer Aided Design, Butterworth & Co, 20, 5, 1988, 263-271

[SHRA90] E. Shragowitz, J. Lee, and S. Sahni, "Placer-router for sea-of-gates design style,"

in Progress in computer aided VLSI design, Ed. G. Zobrist, Ablex Publishing, Vol

2, 1990, 43-92
