Optimal Folding Of Bit Sliced Stacks+
Doowon Paik
University of Minnesota
Sartaj Sahni
University of Florida
Abstract
We develop fast polynomial time algorithms to optimally fold stacked bit sliced architectures to
minimize area subject to height or width constraints. These algorithms may also be applied to
folding problems that arise in standard cell and sea-of-gates designs.
KEYWORDS and PHRASES
Stacked bit sliced architectures, folding, area
+ Research supported, in part, by the National Science Foundation under grant MIP 86-17374.
1. Introduction
A stack of bit sliced components ([LARM90] and [WU90]) consists of $n$ components of varying height and width with their left ends vertically aligned as in Figure 1(a). The intra component (i.e., inter slice but local to a component) routing is done on metal layer 1, while the inter component (i.e., intra slice but across components) routing is done on metal layer 2.
A component stack may be folded at component $i_1$ by rotating components $i_1+1, \ldots, n$ by 180° so that their aligned ends are now on the right and the component order is $i_1+1, \ldots, n$ bottom to top (Figure 1(b)). As a result, the slices of components $i_1+1, \ldots, n$ are in the order slice 1, slice 2, ... right to left. Folding at $i_1$ creates two component stacks. One is left aligned (components $1, \ldots, i_1$) and the other is right aligned (components $i_1+1, \ldots, n$). By folding at $i_1$, $i_2$, and $i_3$, a four stack arrangement as in Figure 1(c) is obtained.
When a component stack is folded as in Figure 1(b), the vertical tracks available in a physical slice must be able to carry the routes for two logical slices. For example, in Figure 1(b), slice 4 of a component in stack 1 and slice 4 of a component in stack 2 occupy the same physical chip space. We assume that the width of a slice is sufficient for this. Further, when a stack is folded at component $i_1$, we may need space at the ends of the two created stacks to complete the routes between components $i_1$ and $i_1+1$. If this can be done in a third layer, then no such routing space is needed. If not, additional space proportional to the number of wires between $i_1$ and $i_1+1$ must be reserved (Figure 2). We consider both of these cases in this paper.
The bit sliced model defined above was introduced by Larmore, Gajski, and Wu [LARM90] for the compilation of arbitrary net lists into layout for CMOS technology. They studied various folding strategies under the assumption that components may be reordered and the stack can be folded only once (so only a two stack configuration as in Figure 1(b) is possible). These folding schemes begin by reordering the components by width, where the width $w_i$ of component $i$ is measured in number of slices. In [WU90], Wu and Gajski report on the application of their two stack folding model to real circuits.
Figure 1: Stack of bit sliced components. (a) Component stack; (b) Folding at $i_1$, giving stack 1 and stack 2; (c) Folded into 4 stacks.
Figure 2: Routing space reserved for inter stack routing.
In this paper, we place no restriction on the number of folding points $i_1, i_2, \ldots, i_k$. Additionally, we do not permit component reordering. This restriction is realistic, as the component stack is usually ordered so as to minimize inter component routing requirements and optimize performance. We consider both the situation in which extra routing space at the stack ends is needed to accommodate routes between components at the ends of adjacent stacks and the situation in which it is not. Note that our model can also be used for the folding step of placement algorithms for standard cell and sea-of-gates designs [SHRA88, SHRA90]. There, the modules to be placed have been ordered by some criterion and are then folded into the layout area so as to minimize area. In standard cell designs, all modules have the same width, while in sea-of-gates designs module widths and heights vary from module to module.
The objective of folding is to obtain a minimum area layout subject to a height or width constraint. We consider both types of constraints here. If the height (width) is constrained to be $h$ ($w$), it is enough to fold the component stack so that the height (width) of the bounding rectangle is at most $h$ ($w$). Even when the folded height (width) is less than $h$ ($w$), the physical chip space allocated has height (width) $h$ ($w$). So area is minimized by minimizing the width (height).
In Section 2, we consider folding under the assumption that all module widths are the same (module heights may vary). The case when module widths vary but module heights are the same is considered in Section 3. The general case of variable module widths and heights is considered in Section 4. In Section 5, we consider the case when no overlap amongst the slices of different stacks is permitted. This is referred to as folding without nesting.
2. Components With Equal Widths
Let $h_i$ be the height of the $i$'th component. We may assume that all components have a width of one. If the stack is folded at components $i_1, i_2, \ldots, i_{k-1}$, then the width of the layout is $k$ and the height, $h$, is the height of the tallest of the $k$ stacks (Figure 3).
2.1. No Routing Area At Stack Ends
2.1.1. Height Of Folded Layout Is Fixed
When the height is constrained to be at most $h$, the minimum width layout is obtained by first selecting $i_1$ to be the largest $j$ such that $\sum_{l=1}^{j} h_l \le h$. Next, $i_2$ is set to be the largest $j$ such that $\sum_{l=i_1+1}^{j} h_l \le h$. Continuing in this way, $i_3, i_4, \ldots$ can be obtained. The process stops when $i_k = n$. The number of stacks is $k$. The optimality of this strategy is easily established, and its complexity is readily seen to be $O(n)$.
Figure 3: A $k$ stack folding of equal width components. (a) Unfolded placement; (b) Folded placement (width $w = k$, height $h$) and enclosing rectangle.
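The greedy rule of Section 2.1.1 can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' code: heights are integers listed in stack order $C_1, \ldots, C_n$, and the name `greedy_fold` and its list-of-stacks return value are hypothetical.

```python
from typing import List, Optional


def greedy_fold(heights: List[int], h: int) -> Optional[List[List[int]]]:
    """Fold unit-width components C_1..C_n (in stack order) into the fewest
    stacks of total height at most h, using the greedy rule of Section 2.1.1.

    heights[i-1] is h_i. Returns the stacks as lists of 1-based component
    indices, or None when some h_i > h (no height-h folding exists).
    """
    stacks: List[List[int]] = []
    current: List[int] = []
    used = 0  # height of the stack currently being filled
    for idx, hi in enumerate(heights, start=1):
        if hi > h:
            return None
        if used + hi > h:
            # C_{idx-1} is the largest j whose stack still fits: fold there.
            stacks.append(current)
            current, used = [], 0
        current.append(idx)
        used += hi
    if current:
        stacks.append(current)
    return stacks
```

For heights 3, 2, 4, 1, 2 and $h = 5$ the sketch returns the three stacks {C1, C2}, {C3, C4}, {C5}.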
2.1.2. Width Of Folded Layout Is Fixed
If the width of the layout is constrained to be at most $k$, the minimum height layout can be obtained by using a dynamic programming technique. Let $H(i, w)$ be the height of the minimum height rectangle of width $w$ into which components $C_i, C_{i+1}, \ldots, C_n$ can be folded. It is easy to see that the following equalities are true:
$$H(i,1) = \sum_{j=i}^{n} h_j, \qquad H(n,w) = h_n \ \text{ for all } w \ge 1, \qquad H(n+1,w) = 0 \ \text{ for all } w \ge 1.$$
Let $h(i,j) = \sum_{t=i}^{j} h_t$. A recurrence for $H(i, w)$ in terms of the heights of smaller width foldings is obtained by observing that if $C_i, C_{i+1}, \ldots, C_n$ is first folded at $C_j$, then the height of the enclosing width $w$ rectangle is $\max\{h(i,j), H(j+1, w-1)\}$. So,
$$H(i,w) = \min_{i \le j \le n} \left[ \max\{ h(i,j),\ H(j+1, w-1) \} \right], \quad w > 1. \tag{1}$$
The minimum height folding of $C_1, C_2, \ldots, C_n$ constrained to have width at most $k$ is $H(1, k)$. Using Equation (1) and the known values of $H(i, 1)$, $1 \le i \le n$, and $H(n+1, w)$, the values $H(i, 2)$ can be obtained. From the $H(i, 2)$'s and Equation (1) we can obtain the $H(i, 3)$'s. Proceeding in this way, the $H(i, k)$'s and hence $H(1, k)$ can be obtained. A straightforward application of (1) to compute each $H(i, w)$ takes $O(n-i+1)$ time. The total complexity to compute all $H(i, w)$'s, $1 \le i \le n$, $1 \le w \le k$, is $O(n^2 k)$. We can reduce this to $O(nk)$ by using the following results.
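A direct evaluation of Equation (1), in $O(n^2 k)$ time, might look as follows. The sketch assumes integer heights and reads $h(i,j)$ from prefix sums; the name `min_height_direct` is ours.

```python
from typing import List


def min_height_direct(heights: List[int], k: int) -> int:
    """H(1, k) by evaluating Equation (1) directly, in O(n^2 k) time.

    heights[i-1] is h_i and k >= 1. Row n+1 of the table is the empty suffix,
    whose height H(n+1, w) is 0.
    """
    n = len(heights)
    prefix = [0] * (n + 1)
    for i, hi in enumerate(heights, start=1):
        prefix[i] = prefix[i - 1] + hi

    def h(i: int, j: int) -> int:  # h(i, j) = h_i + ... + h_j
        return prefix[j] - prefix[i - 1]

    H = [[0] * (k + 1) for _ in range(n + 2)]  # H[i][w] = H(i, w)
    for i in range(1, n + 1):
        H[i][1] = h(i, n)  # a single stack holds C_i..C_n
    for w in range(2, k + 1):
        for i in range(1, n + 1):
            # first stack is C_i..C_j; C_{j+1}..C_n are folded into width w-1
            H[i][w] = min(max(h(i, j), H[j + 1][w - 1]) for j in range(i, n + 1))
    return H[1][k]
```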
Lemma 1: Let $j_i$ be a value of $j$, $i \le j \le n$, that minimizes $\max\{h(i,j), H(j+1, w-1)\}$ (see Equation (1)). For this value, $\max\{h(i, j_i), H(j_i+1, w-1)\} \le H(j_i, w-1)$.
Proof: Suppose $j_i = i$. Then, since $h(i, i) \le H(i, w-1)$ and $H(i+1, w-1) \le H(i, w-1)$, $\max\{h(i, i), H(i+1, w-1)\} \le H(i, w-1)$. So the lemma is true for $j_i = i$. Suppose $j_i > i$. If $\max\{h(i, j_i), H(j_i+1, w-1)\} = H(j_i+1, w-1)$, then the lemma follows from the observation that $H(j_i+1, w-1) \le H(j_i, w-1)$. So consider the case $h(i, j_i) > H(j_i+1, w-1)$. If the lemma is not true, then $h(i, j_i) > H(j_i, w-1)$. From this and the observation that $h(i, j_i-1) < h(i, j_i)$, it follows that $\max\{h(i, j_i-1), H(j_i, w-1)\} < h(i, j_i)$. This contradicts the assumption that $j_i$ is a value of $j$ that minimizes $\max\{h(i, j), H(j+1, w-1)\}$. □
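Lemma 2: Let $j_i$ be the largest value of $j$, $i \le j \le n$, that minimizes $\max\{h(i,j), H(j+1, w-1)\}$. Let $j_{i+1}$ be similarly defined. Then $j_i \le j_{i+1}$, $1 \le i < n$.
Proof: If $j_i = i$, the lemma is immediate since $j_{i+1} \ge i+1$. Otherwise, for every $p$ with $i+1 \le p < j_i$,
$$\begin{aligned}
\max\{h(i+1,j_i), H(j_i+1,w-1)\} &\le \max\{h(i,j_i), H(j_i+1,w-1)\} && (\text{as } h(i,j_i) > h(i+1,j_i)) \\
&\le H(j_i,w-1) && (\text{Lemma 1}) \\
&\le H(p+1,w-1) && (\text{as } p+1 \le j_i) \\
&\le \max\{h(i+1,p), H(p+1,w-1)\}.
\end{aligned}$$
So $\max\{h(i+1,j), H(j+1,w-1)\}$ is minimized for a value of $j$ that is $\ge j_i$. Hence, $j_i \le j_{i+1}$. □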
Lemma 3: If $j_i$ (as in Lemma 2) is known, then $j_{i+1}$ (Lemma 2) can be found in $O(j_{i+1} - j_i + 1)$ time if the $h(i+1, j)$'s and $H(j, w-1)$'s are known.
Proof: Since $h(i+1, j)$ is an increasing sequence (i.e., $h(i+1, j) < h(i+1, j+1)$ for all $j < n$) and $H(j+1, w-1)$ is a nonincreasing sequence, $f(i+1, j) = \max\{h(i+1, j), H(j+1, w-1)\}$ is a bitonic sequence. That is, the following is true for some $q$, $i+1 \le q \le n$:
$$f(i+1, i+1) \ge f(i+1, i+2) \ge \cdots \ge f(i+1, q) < f(i+1, q+1) < \cdots < f(i+1, n)$$
Hence, $j_{i+1}$ is the least $j$ for which $f(i+1, j) < f(i+1, j+1)$ (if there is no such $j$, $j_{i+1} = n$). Since $j_{i+1} \ge j_i$ (Lemma 2), the search for $j_{i+1}$ can be confined to the interval $[j_i, n]$. So we need to check whether $f(i+1, j) < f(i+1, j+1)$ only for $j$ values in the range $[j_i, j_{i+1}]$. This takes $O(j_{i+1} - j_i + 1)$ time under the condition that the $h(i+1, j)$'s and $H(j, w-1)$'s are known. □
Theorem 1: $H(i, w)$, $1 \le i \le n$, $1 \le w \le k$, can be computed in $O(nk)$ time.
Proof: The computation is done first for $w = 1$, then $w = 2$, $w = 3$, ..., $w = k$. For $w = 1$, $H(i, w)$, $1 \le i \le n$, is obtained in $O(n)$ time as $H(i, 1) = \sum_{j=i}^{n} h_j$. For $w > 1$, the computation is done in the order $i = 1, 2, \ldots, n$. For any $i$, the first value of $h$ that is needed is $h(i, j_{i-1})$. This is just $h(i-1, j_{i-1}) - h_{i-1}$. So $h(i, j_{i-1})$ can be obtained in $O(1)$ time, as the value $h(i-1, j_{i-1})$ was computed during the computation of $j_{i-1}$. $h(i, j_{i-1}+r)$ can be obtained from $h(i, j_{i-1}+r-1)$ by adding $h_{j_{i-1}+r}$. So each of the needed $h(i, s)$ values can be obtained in $O(1)$ time. The last $h(i, s)$ value computed is $h(i, j_i)$, and it is used to obtain $h(i+1, j_i)$. From Lemma 3, we know that a total of $j_i - j_{i-1} + 1$ values $h(i, s)$ are needed and that the time to compute $H(i, w)$ is proportional to $j_i - j_{i-1} + 1$. Further, the time to get $H(i, w)$ for all $i$, $1 \le i \le n$, is proportional to $\sum_{i=1}^{n} (j_i - j_{i-1} + 1) = O(n)$, since the sum telescopes to $j_n - j_0 + n$ (taking $j_0 = 1$). Hence, $H(i, w)$, $1 \le i \le n$, $1 \le w \le k$, can be computed in $O(nk)$ time. □
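The scheme of Theorem 1 can be sketched as below. One liberty is taken: instead of the incremental $h(i,s)$ bookkeeping in the proof, $h(i,j)$ is read from prefix sums, which is also $O(1)$ per value. Within one width the split pointer `j` only moves right (Lemma 2), and each step tests whether $f$ has started to increase (Lemma 3). Heights are assumed positive; `min_height_fast` is a hypothetical name.

```python
from typing import List


def min_height_fast(heights: List[int], k: int) -> int:
    """H(1, k) in O(nk) time using the monotone split points of Lemmas 1-3.

    heights[i-1] is h_i; every h_i must be positive so that h(i, j) strictly
    increases with j and f(j) = max{h(i, j), H(j+1, w-1)} is bitonic.
    """
    n = len(heights)
    prefix = [0] * (n + 1)
    for i, hi in enumerate(heights, start=1):
        prefix[i] = prefix[i - 1] + hi

    def f(i: int, j: int, prev: List[int]) -> int:
        # max{h(i, j), H(j+1, w-1)}, with prev[x] = H(x, w-1)
        return max(prefix[j] - prefix[i - 1], prev[j + 1])

    # prev[i] holds H(i, w-1); index n+1 is the empty suffix with height 0.
    prev = [0] * (n + 2)
    for i in range(1, n + 1):
        prev[i] = prefix[n] - prefix[i - 1]  # H(i, 1) = h(i, n)

    for _ in range(2, k + 1):
        cur = [0] * (n + 2)
        j = 1  # j_{i-1}; by Lemma 2 it never has to move left
        for i in range(1, n + 1):
            j = max(j, i)
            # Lemma 3: advance while f does not increase; we stop at j_i.
            while j < n and f(i, j, prev) >= f(i, j + 1, prev):
                j += 1
            cur[i] = f(i, j, prev)
        prev = cur
    return prev[1]
```

On heights 3, 2, 4, 1, 2 it returns 12, 7, 5, 4, 4 for $k = 1, \ldots, 5$, the same values a direct evaluation of (1) gives.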
A bounding rectangle of dimension $h \times w$ is said to be a dominating rectangle iff the components $C_1, C_2, \ldots, C_n$ can be folded into it and there is no $H \times W$ rectangle, $H \le h$, $W \le w$, $H \cdot W < h \cdot w$, such that $C_1, C_2, \ldots, C_n$ can be folded into the $H \times W$ rectangle. The dominating rectangles can be found in $O(n^2)$ time by computing $H(1, w)$, $1 \le w \le n$, and eliminating from these $n$ rectangles those that are dominated by a rectangle in this set.
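A sketch of the dominating-rectangle computation, assuming positive integer heights: `all_min_heights` computes $H(1, w)$ for $w = 1, \ldots, n$ with the split-point scheme of Theorem 1 (so $O(n^2)$ in total), and `dominating_rectangles` applies the definition above pairwise to these $n$ candidates. Both names are hypothetical.

```python
from typing import List, Tuple


def all_min_heights(heights: List[int]) -> List[int]:
    """[H(1, 1), ..., H(1, n)] via the O(nk) scheme with k = n."""
    n = len(heights)
    prefix = [0] * (n + 1)
    for i, hi in enumerate(heights, start=1):
        prefix[i] = prefix[i - 1] + hi

    def f(i: int, j: int, prev: List[int]) -> int:
        return max(prefix[j] - prefix[i - 1], prev[j + 1])

    prev = [0] * (n + 2)
    for i in range(1, n + 1):
        prev[i] = prefix[n] - prefix[i - 1]  # H(i, 1)
    result = [prev[1]] if n else []
    for _ in range(2, n + 1):
        cur = [0] * (n + 2)
        j = 1
        for i in range(1, n + 1):
            j = max(j, i)
            while j < n and f(i, j, prev) >= f(i, j + 1, prev):
                j += 1
            cur[i] = f(i, j, prev)
        prev = cur
        result.append(prev[1])
    return result


def dominating_rectangles(heights: List[int]) -> List[Tuple[int, int]]:
    """(height, width) candidates not beaten by another candidate H x W
    with H <= h, W <= w and H*W < h*w."""
    cands = [(hgt, w) for w, hgt in enumerate(all_min_heights(heights), start=1)]
    return [(h, w) for (h, w) in cands
            if not any(H <= h and W <= w and H * W < h * w for (H, W) in cands)]
```

For heights 3, 2, 4, 1, 2 the candidates are 12 x 1, 7 x 2, 5 x 3, 4 x 4 and 4 x 5; only the last is dominated (by 4 x 4).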
2.2. Routing Area At Stack Ends
Let $r_i$, $2 \le i \le n$, denote the height of the routing space needed to route the connections between $C_{i-1}$ and $C_i$ under the assumption that $C_{i-1}$ and $C_i$ are in different stacks (so they must be at the same end of two adjacent stacks). Let $r_1 = r_{n+1} = 0$. Figure 4 shows the situation when $C_{i-1}$ and $C_i$ are at the bottom end. Note that $r_i$ depends on the number of interconnects that pass from $C_{i-1}$ to $C_i$, but not on the relative positioning of $C_{i-1}$ and $C_i$ at the bottom of their respective stacks. Further, note that if components $C_i, \ldots, C_j$ form one stack, then the height needed to accommodate these components and the inter stack routing to $C_{i-1}$ and $C_{j+1}$ is $\sum_{l=i}^{j} h_l + r_i + r_{j+1}$.
Figure 4: Inter stack routing height requirement.
2.2.1. Height Of Folded Layout Is Fixed
Let $W(j)$ be the width of a minimum width, height $h$ folding for $C_j, \ldots, C_n$. This folding is required to leave $r_j$ space in the first stack to complete the routes between $C_{j-1}$ and $C_j$. Note that it is possible that $r_j + h_j > h$ for some $j$. In this case $W(j) = \infty$, as it is not possible to fold $C_j, \ldots, C_n$ as desired. In general, whenever a height $h$ folding is not possible, $W(j) = \infty$.
Let $W(n+1) = 0$. We see that
$$W(i) = 1 + \min\Big\{ W(j+1) \;\Big|\; i \le j \le n,\ r_i + \sum_{q=i}^{j} h_q + r_{j+1} \le h \Big\}.$$
The above recurrence can be solved for $W(1)$ in $O(n^2)$ time. The actual folding with width $W(1)$ can also be obtained in this much time.
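The recurrence for $W(i)$ translates directly into an $O(n^2)$ loop. In this sketch (names hypothetical), `routing[i-1]` holds $r_i$, the entry for $r_1$ is ignored, and $r_1 = r_{n+1} = 0$ as in the text; `math.inf` plays the role of $\infty$. The inner loop stops early once $r_i + h(i,j)$ alone exceeds $h$, since adding components can only make the stack taller.

```python
import math
from typing import List


def min_width_with_routing(heights: List[int], routing: List[int], h: int) -> float:
    """W(1) of Section 2.2.1: fewest unit-width stacks of height at most h.

    heights[i-1] is h_i and routing[i-1] is r_i for 2 <= i <= n; routing[0]
    is ignored because r_1 = 0, and r_{n+1} = 0. A stack C_i..C_j needs
    r_i + h(i, j) + r_{j+1} <= h. Returns math.inf if no folding exists.
    """
    n = len(heights)
    r = [0] * (n + 2)  # r[1] = r[n+1] = 0
    for i in range(2, n + 1):
        r[i] = routing[i - 1]
    W = [math.inf] * (n + 2)
    W[n + 1] = 0
    for i in range(n, 0, -1):
        total = 0  # h(i, j) for the current j
        for j in range(i, n + 1):
            total += heights[j - 1]
            if r[i] + total > h:
                break  # taller stacks cannot fit either
            if r[i] + total + r[j + 1] <= h:
                W[i] = min(W[i], 1 + W[j + 1])
    return W[1]
```

With heights 3, 2, 4, 1, 2 and $r_2, \ldots, r_5 = 1, 0, 2, 1$, the sketch gives width 3 for $h = 6$ and $\infty$ for $h = 5$, because $C_3$ then fits in no stack together with its routing space.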
2.2.2. Width Of Folded Layout Is Fixed
Let $H'(i, k)$ be the height of a minimum height rectangle of width $k$ into which $C_i, C_{i+1}, \ldots, C_n$ can be folded. This folding requires that there be enough space at the ends of each of the $k$ stacks to complete the inter stack routing. In particular, there must be $r_i$ units of space (i.e., height) between the nearest rectangle boundary (top or bottom) and the end of $C_i$. This space is needed to complete the routing to $C_{i-1}$. Figure 5 illustrates this.
Figure 5: $H'(i, k)$.
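One may readily verify the correctness of the following equalities, where $H'(n+1, w) = 0$:
$$\begin{aligned}
H'(i,1) &= h(i,n) + r_i \\
H'(n,w) &= h_n + r_n, \quad w \ge 1 \\
H'(i,w) &= \min_{i \le j \le n} \left[ \max\{ h(i,j) + r_i + r_{j+1},\ H'(j+1, w-1) \} \right]
\end{aligned} \tag{2}$$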
$H'(1, k)$ is the height of the minimum height rectangle with width $k$ into which $C_1, \ldots, C_n$ can be folded and in which the inter stack routing can be completed. $H'(1, k)$ can be computed from (2) in $O(n^2 k)$ time by computing $H'(i, w)$ for $w$ in the order $w = 2, 3, \ldots, k$. The $O(nk)$ scheme of Section 2.1 does not generalize to (2), as $H'(i, w)$ is no longer monotone in $i$.
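Equation (2) can be evaluated as sketched below in $O(n^2 k)$ time. The sketch assumes $H'(n+1, w) = 0$ for the empty suffix (so a folding may use fewer than $k$ stacks) and the same $r_i$ conventions as Section 2.2.1; the function name is ours.

```python
from typing import List


def min_height_with_routing(heights: List[int], routing: List[int], k: int) -> int:
    """H'(1, k) from Equation (2), in O(n^2 k) time.

    heights[i-1] is h_i, routing[i-1] is r_i (routing[0] is ignored since
    r_1 = 0), r_{n+1} = 0, and H'(n+1, w) = 0 for the empty suffix.
    """
    n = len(heights)
    r = [0] * (n + 2)
    for i in range(2, n + 1):
        r[i] = routing[i - 1]
    prefix = [0] * (n + 1)
    for i in range(1, n + 1):
        prefix[i] = prefix[i - 1] + heights[i - 1]

    def h(i: int, j: int) -> int:  # h(i, j) = h_i + ... + h_j
        return prefix[j] - prefix[i - 1]

    Hp = [[0] * (k + 1) for _ in range(n + 2)]  # Hp[i][w] = H'(i, w)
    for i in range(1, n + 1):
        Hp[i][1] = h(i, n) + r[i]
    for w in range(2, k + 1):
        for i in range(1, n + 1):
            # first stack C_i..C_j carries routing space r_i and r_{j+1}
            Hp[i][w] = min(max(h(i, j) + r[i] + r[j + 1], Hp[j + 1][w - 1])
                           for j in range(i, n + 1))
    return Hp[1][k]
```

For heights 3, 2, 4, 1, 2 with $r_2, \ldots, r_5 = 1, 0, 2, 1$, it gives 12, 7 and 6 for $k = 1, 2, 3$.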
3. Components With Equal Heights
3.1. No Routing Area At Stack Ends
3.1.1. Height Of Folded Layout Is Fixed
First, consider the case when the height, $h$, of the folded layout is fixed and we wish to minimize the layout width. Let $W(i, h)$ be the width of a minimum width, height $h$ folded layout for the equal height components $C_i, \ldots, C_n$. Any folded layout for $C_i, \ldots, C_n$ consists of a height $h$ layout for $C_i, \ldots, C_j$ (for some $j$, $i \le j \le n$) that is folded at at most one position (Figure 6), followed by a folded layout for $C_{j+1}, \ldots, C_n$. Note that when $j = n$, there is only the layout for $C_i, \ldots, C_n$.
Let $S(i, j, h)$ be the width of a minimum width, height $h$ folded layout for $C_i, \ldots, C_j$ under the restriction that there is at most one folding position. Then it follows that
$$W(i,h) = \min_{i \le j \le n} \{ S(i,j,h) + W(j+1,h) \} \tag{3}$$
and $W(n+1, h) = 0$.
Figure 6: Possible folded layouts for $C_i, \ldots, C_n$ (a part of width $S(i, j, h)$ followed by a part of width $W(j+1, h)$; total width $W(i, h)$).
Let $w_i$ be the width of $C_i$, $1 \le i \le n$. We easily see that $W(n, h) = w_n$ for $h \ge 1$ (we assume that the height of each component $C_i$ is 1). Since at most $h$ components can be placed on a stack of height $h$ and since the layout for $S(i, j, h)$ contains at most two stacks, it follows that (3) may be rewritten as:
$$W(i,h) = \min_{i \le j \le \min\{n,\, i+2h-1\}} \{ S(i,j,h) + W(j+1,h) \} \tag{3'}$$
The minimum width, height $h$ folding we seek has width $W(1, h)$. To compute this width, we need to know $S(i, j, h)$ for $1 \le i \le n$ and $i \le j \le \min\{n, i + 2h - 1\}$. So we shall proceed to describe how $S(i, j, h)$ may be computed. With a little more bookkeeping, the layout corresponding to $S(i, j, h)$ may also be obtained.
To compute $S(i, j, h)$, we use the solution to a related problem P1, in which we are given two sets $L = \{L_1, \ldots, L_n\}$ and $R = \{R_1, \ldots, R_m\}$ of equal height components and a rectangle of width $w$. The width of $L_i$ is $wl_i$ and that of $R_i$ is $wr_i$. An example is given in Figure 7(a). The width $w$ rectangle is divided into buckets of unit height and width $w$ as in Figure 7(b). The components of $L$ and $R$ are to be assigned to buckets such that:
(a) the fewest buckets are used;
(b) each bucket contains at most one $L_i$ and one $R_j$;
(c) if a bucket contains $L_i$ and $R_j$, then $wl_i + wr_j \le w$;
(d) the order in which the $L_i$'s ($R_i$'s) are assigned to buckets is $L_1, L_2, \ldots, L_n$ ($R_1, R_2, \ldots, R_m$), bottom to top.
To assure a feasible solution, we require $w \ge \max\{wl_1, \ldots, wl_n, wr_1, \ldots, wr_m\}$. Figure 7(c) shows a possible assignment of the $L$'s and $R$'s of Figure 7(a) into 4 buckets.
Figure 7: An example for P1. (a) $wl_1 = wl_3 = wr_1 = wr_3 = 2$, $wl_2 = wr_2 = 1$; (b) 4 bucket, width 3 rectangle (buckets 1 to 4, bottom to top); (c) bucket assignment.
Let $B(i, j, w)$ be the smallest number of buckets needed for $L_1, \ldots, L_i$ and $R_1, \ldots, R_j$. The number of buckets used in an optimal solution of P1 is $B(n, m, w)$. It is easy to see that the following are true:
$$\begin{aligned}
B(0,0,w) &= 0 \\
B(i,0,w) &= i, && 1 \le i \le n \\
B(0,j,w) &= j, && 1 \le j \le m \\
B(i,j,w) &\ge B(i,j-1,w), && 1 \le i \le n,\ 1 \le j \le m \\
B(i,j,w) &\ge B(i-1,j,w), && 1 \le i \le n,\ 1 \le j \le m
\end{aligned}$$
Furthermore, for an arbitrary $B(i, j, w)$, $i \ge 1$, $j \ge 1$, if $wl_i + wr_j \le w$, then there is no advantage to not putting $L_i$ and $R_j$ into the same bucket (see Figure 8(a)). Hence, $B(i, j, w) = 1 + B(i-1, j-1, w)$. If $wl_i + wr_j > w$, then $L_i$ and $R_j$ cannot be placed in the same bucket. Because of ordering requirement (d), we have two possibilities. In one, $L_i$ occupies the highest used bucket. This bucket cannot be shared with any $R_s$, as $R_j$ must occupy the highest bucket occupied by any of $R_1, \ldots, R_j$ and $R_j$ cannot fit into the same bucket as $L_i$. So $B(i, j, w) = 1 + B(i-1, j, w)$ (Figure 8(b)). In the second case, $R_j$ occupies the highest used bucket and the remaining components occupy lower buckets (Figure 8(c)). Now $B(i, j, w) = 1 + B(i, j-1, w)$. Combining these observations, we get:
$$B(i,j,w) = \begin{cases} 1 + B(i-1,j-1,w), & wl_i + wr_j \le w \\ 1 + \min\{B(i-1,j,w),\ B(i,j-1,w)\}, & wl_i + wr_j > w \end{cases} \tag{4}$$
Equation (4), together with the values of $B(i, 0, w)$, $1 \le i \le n$, and $B(0, j, w)$, $1 \le j \le m$, can be used to compute all $B(i, j, w)$ values, $1 \le i \le n$, $1 \le j \le m$, by computing these first for all $i, j$ such that $i + j = 2$, then $i + j = 3$, ..., and finally $i + j = n + m$. The time needed to obtain $B(n, m, w)$ is therefore $O(nm)$. The actual assignment of the $L_i$'s and $R_j$'s to the buckets can be obtained by recording which of the three possibilities of (4) results in each $B(i, j, w)$ value and then performing a traceback (see Horowitz and Sahni [HORO78] for a discussion of dynamic programming and associated tracebacks).
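Equation (4) can be tabulated as in the sketch below. It fills the table row by row rather than by anti-diagonals $i + j$; both orders compute $B(i-1, j-1, w)$, $B(i-1, j, w)$ and $B(i, j-1, w)$ before $B(i, j, w)$. The name `bucket_matrix` is hypothetical, and $w$ is assumed to satisfy the feasibility condition of P1.

```python
from typing import List


def bucket_matrix(wl: List[int], wr: List[int], w: int) -> List[List[int]]:
    """Table B[i][j] = B(i, j, w) of Equation (4), 0 <= i <= n, 0 <= j <= m.

    Assumes w >= every wl_i and wr_j (the feasibility condition of P1).
    """
    n, m = len(wl), len(wr)
    B = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        B[i][0] = i
    for j in range(1, m + 1):
        B[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if wl[i - 1] + wr[j - 1] <= w:
                B[i][j] = 1 + B[i - 1][j - 1]  # L_i and R_j share the top bucket
            else:
                B[i][j] = 1 + min(B[i - 1][j], B[i][j - 1])
    return B
```

On the instance of Figure 7 ($wl = wr = (2, 1, 2)$, $w = 3$) the entry $B(3, 3, 3)$ is 4, the number of buckets used in Figure 7(c).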
Figure 8: Cases for placement of $L_i$ and $R_j$. (a) $wl_i + wr_j \le w$, remaining problem $B(i-1, j-1)$; (b) $wl_i + wr_j > w$, remaining problem $B(i-1, j)$; (c) $wl_i + wr_j > w$, remaining problem $B(i, j-1)$.
Let $T(i, j, k, w)$ be the minimum number of width $w$ buckets needed for the P1 instance defined by $L = \{C_i, \ldots, C_k\}$, $R = \{C_j, \ldots, C_{k+1}\}$, $i \le k \le j$. Note that $T(i, j, k, w)$ gives the height of the minimum height, width $w$ rectangle into which $C_i, \ldots, C_j$ will fit when folded at $C_k$. Let $T(i, j, w) = \min_{i \le k \le j} \{T(i, j, k, w)\}$. $T(i, j, w)$ is the height of a minimum height, width $w$ rectangle into which $C_i, \ldots, C_j$ will fit when the component stack $C_i, \ldots, C_j$ is folded at at most one component (recall that folding at $k = j$ corresponds to no folding). $T(i, j, w)$ can be obtained by first computing each $T(i, j, k, w)$ using P1 as described above. Since each $T(i, j, k, w)$ can be computed in $O((k-i+1)(j-k))$ time, $T(i, j, w)$ can be computed in $O\big(\sum_{k=i}^{j} (k-i+1)(j-k)\big) = O((j-i)^3)$ time.
A faster way to obtain $T(i, j, w)$ is to solve the P1 instance defined by $L = \{L_1, \ldots, L_{j-i+1}\}$, $R = \{R_1, \ldots, R_{j-i+1}\}$, $wl_s = w_{j-s+1}$, $wr_s = w_{i+s-1}$, $1 \le s \le j-i+1$, where $w_p$ is the width of component $C_p$. Let $B$ be the $(j-i+1) \times (j-i+1)$ minimum bucket matrix computed using (4). Then $T(i, j, k, w) = B(j-k, k-i+1, w)$. To see this, consider Figure 9(a). This gives the two sets of components that are to be assigned to buckets. Figure 9(b) gives the same two component sets using $L$, $R$ terminology. Note that $C_p$ corresponds to $L_{j-p+1}$ and $R_{p-i+1}$, as $w_p = wl_{j-p+1} = wr_{p-i+1}$. The minimum number of buckets needed is not affected by exchanging the two columns of Figure 9(a). So the solution to Figure 9(a) is the same as that for Figure 9(c). Furthermore, inverting the two columns to get Figure 9(d) does not affect the minimum number of buckets. Figure 9(d) in $L$, $R$ notation is given in Figure 9(e). Hence, $T(i, j, k, w) = B(j-k, k-i+1, w)$.
Figure 9: Equivalence. (a) $T(i, j, w)$, fold at $k$; (b) in $L$, $R$ terminology; (c) switch columns; (d) invert columns; (e) relabel.
Since $B$ can be computed in $O((j-i+1)^2)$ time, all $T(i, j, k, w)$, $i \le k \le j$, can be computed in $O((j-i+1)^2)$ time. Hence, we can compute $T(i, j, w)$ in $O((j-i+1)^2)$ time.
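The equivalence $T(i, j, k, w) = B(j-k, k-i+1, w)$ can be checked with the sketch below (names hypothetical). `t_all_folds` builds one bucket matrix with $wl_s = w_{j-s+1}$ and $wr_s = w_{i+s-1}$ and reads off every fold position, while `t_direct` solves a separate P1 instance per $k$ from the definition of $T(i, j, k, w)$. Returning infinity when a component is wider than $w$ is our own convention for an infeasible instance.

```python
import math
from typing import List


def bucket_matrix(wl: List[int], wr: List[int], w: int) -> List[List[int]]:
    """B[i][j] = B(i, j, w) from Equation (4)."""
    n, m = len(wl), len(wr)
    B = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        B[i][0] = i
    for j in range(1, m + 1):
        B[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if wl[i - 1] + wr[j - 1] <= w:
                B[i][j] = 1 + B[i - 1][j - 1]
            else:
                B[i][j] = 1 + min(B[i - 1][j], B[i][j - 1])
    return B


def t_all_folds(widths: List[int], i: int, j: int, w: int) -> List[float]:
    """[T(i, j, k, w) for k = i..j] from ONE bucket matrix (1-based i, j).

    L_s = C_{j-s+1} and R_s = C_{i+s-1}, so T(i, j, k, w) = B(j-k, k-i+1, w).
    """
    seg = widths[i - 1:j]  # C_i, ..., C_j
    if max(seg) > w:
        return [math.inf] * len(seg)
    B = bucket_matrix(seg[::-1], seg, w)
    return [B[j - k][k - i + 1] for k in range(i, j + 1)]


def t_direct(widths: List[int], i: int, j: int, k: int, w: int) -> float:
    """T(i, j, k, w) from its definition: L = C_i..C_k, R = C_j, ..., C_{k+1}."""
    left = widths[i - 1:k]
    right = widths[k:j][::-1]
    if max(left + right) > w:
        return math.inf
    return bucket_matrix(left, right, w)[len(left)][len(right)]
```

For widths 2, 1, 2, 2, 1 and $w = 3$, both give 4, 3, 3, 4, 5 for $k = 1, \ldots, 5$, so $T(1, 5, 3) = 3$.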
We are now ready to see how $S(i, j, h)$ may be computed. When only one folding is permitted, the minimum width of a height $h$ layout is one of the values in the set $\{w_i, \ldots, w_j\} \cup \{w_u + w_v \mid i \le u < v \le j\}$. Let $d_1 < d_2 < \cdots < d_p$ be the distinct values in this set. Note that $p \le (j-i+1)^2$. Since $T(i, j, d_u) \ge T(i, j, d_{u+1})$, $1 \le u < p$, $S(i, j, h) = d_q$, where $q$ is the least integer for which $T(i, j, d_q) \le h$. $q$ can be found by performing a binary search in the range $[1, p]$. For each examined value $s$ in this range, $T(i, j, d_s)$ is computed. If $T(i, j, d_s) > h$, then values $\le s$ are eliminated from the range. Otherwise, values $> s$ are eliminated. Since a binary search is used, at most $\lceil \log_2(p+1) \rceil$ values of $s$ are tried. Hence, at most this many $T(i, j, d)$'s are computed. So $S(i, j, h)$ is computed in $O((j-i+1)^2 \log(j-i+1))$ time.
To compute $W(i, h)$ using (3') we need to compute at most $2h$ values $S(i, j, h)$. Since $j - i < 2h$, this can be done in $O(h^3 \log h)$ time. To compute $W(1, h)$, we first compute $W(n, h)$, then $W(n-1, h)$, ..., and finally $W(1, h)$. The time for this is $O(h^3 n \log h)$. Since $h$ need be at most $n$, the complexity is $O(n^4 \log n)$.
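Putting the pieces of Section 3.1.1 together, a sketch of $W(1, h)$ for unit-height components might look as follows. It is illustrative rather than tuned: `t_min` recomputes one bucket matrix per call, `s_width` binary searches the sorted candidate widths $d_1 < \cdots < d_p$, and the outer loop evaluates (3') from $i = n$ down to 1. All function names are ours.

```python
import math
from typing import List


def min_width_equal_heights(widths: List[int], h: int) -> float:
    """W(1, h) for unit-height components via (3'), S(i, j, h) and T(i, j, w).

    widths[p-1] is w_p; h >= 1 is the allowed layout height in unit rows.
    """
    n = len(widths)

    def t_min(i: int, j: int, w: int) -> float:
        """T(i, j, w): best fold (or no fold) of C_i..C_j in width w."""
        seg = widths[i - 1:j]
        if max(seg) > w:
            return math.inf
        wl, wr, a = seg[::-1], seg, len(seg)
        B = [[0] * (a + 1) for _ in range(a + 1)]
        for s in range(1, a + 1):
            B[s][0] = B[0][s] = s
        for x in range(1, a + 1):
            for y in range(1, a + 1):
                if wl[x - 1] + wr[y - 1] <= w:
                    B[x][y] = 1 + B[x - 1][y - 1]
                else:
                    B[x][y] = 1 + min(B[x - 1][y], B[x][y - 1])
        return min(B[j - k][k - i + 1] for k in range(i, j + 1))

    def s_width(i: int, j: int) -> float:
        """S(i, j, h): binary search over the candidate widths d_1 < ... < d_p."""
        seg = widths[i - 1:j]
        d = sorted(set(seg) | {seg[u] + seg[v]
                               for u in range(len(seg))
                               for v in range(u + 1, len(seg))})
        if t_min(i, j, d[-1]) > h:
            return math.inf
        lo, hi = 0, len(d) - 1
        while lo < hi:  # least index q with T(i, j, d_q) <= h
            mid = (lo + hi) // 2
            if t_min(i, j, d[mid]) > h:
                lo = mid + 1
            else:
                hi = mid
        return d[lo]

    W = [math.inf] * (n + 2)
    W[n + 1] = 0
    for i in range(n, 0, -1):
        for j in range(i, min(n, i + 2 * h - 1) + 1):
            W[i] = min(W[i], s_width(i, j) + W[j + 1])
    return W[1]
```

For three components of widths 2, 1, 2 it returns 5, 3 and 2 for $h = 1, 2, 3$; at $h = 2$ the best layout folds all three components into one width 3 rectangle.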
3.1.2. Width Of Folded Layout Is Fixed
When the width is fixed at $w$, we can find the minimum height layout in $O(n^4 \log^2 n)$ time by performing a binary search in the interval $[1, n]$. For each examined value $h$ in this interval, we compute $W(1, h)$ as in the previous section. If $W(1, h) > w$, then heights $\le h$ are eliminated. Otherwise, heights $> h$ are eliminated.
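The height search of Section 3.1.2 is a standard binary search on a monotone predicate. In the sketch below, `width_of` is a hypothetical stand-in for a routine that returns $W(1, h)$ and is assumed nonincreasing in $h$; the sketch returns `None` when even $h = n$ needs more than width $w$.

```python
from typing import Callable, Optional


def min_height_for_width(width_of: Callable[[int], float], n: int,
                         w: float) -> Optional[int]:
    """Least h in [1, n] with width_of(h) <= w (width_of nonincreasing in h)."""
    if width_of(n) > w:
        return None
    lo, hi = 1, n
    while lo < hi:
        mid = (lo + hi) // 2
        if width_of(mid) > w:
            lo = mid + 1  # heights <= mid are eliminated
        else:
            hi = mid      # heights > mid are eliminated
    return lo
```

With $W(1, h) = 5, 3, 2$ for $h = 1, 2, 3$, a width limit of 3 gives height 2.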
3.2. Routing Area At Stack Ends
Let $r_i$ be as in Section 2.2. Define $T'(i, j, k, w)$ as below:
$$T'(i,j,k,w) = \begin{cases} T(i,j,k,w) + \max\{r_i, r_{j+1}\} + r_{k+1}, & k < j \\ T(i,j,k,w) + r_i + r_{j+1}, & k = j \end{cases}$$
$T'(i, j, k, w)$ is the height of a minimum height, width $w$ rectangle into which $C_i, C_{i+1}, \ldots, C_j$ can fit with a fold at $C_k$. This height includes the needed routing space at the top and bottom of the component stacks. $T(i, j, k, w)$ is as defined in Section 3.1.1. If we use $T'(i, j, k, w)$ in place of $T(i, j, k, w)$ in the computation of $T(i, j, w)$, then the $S(i, j, h)$'s and $W(i, h)$'s computed in Section 3.1.1 account for the routing area. Hence, the minimum width, height $h$ folding that allows for routing area can be found in $O(h^3 n \log h)$ time. Similarly, the minimum height, width $w$ folding that accounts for routing area can be found in $O(n^4 \log^2 n)$ time.
4. Variable Width And Height Components
4.1. No Routing Area At Stack Ends
4.1.1. Height Of Folded Layout Is Fixed
Let $h_i$ be the height of component $C_i$ and let $w_i$ be its width. Let $S(i, j, h)$ and $W(i, h)$ be as in Section 3.1.1. Using the same reasoning as in Section 3.1.1, we obtain
$$W(i,h) = \min_{i \le j \le n,\ h(i,j) \le 2h} \{ S(i,j,h) + W(j+1,h) \} \tag{5}$$
and $W(n+1, h) = 0$.
To obtain $S(i, j, h)$, we generalize P1 to the case where the $L$'s and $R$'s have possibly different heights. Let $hl_i$ and $hr_i$, respectively, be the heights of $L_i$ and $R_i$. Let $B(i, j, w)$ be the height of the minimum height rectangle of width $w$ into which $L_1, \ldots, L_i$ and $R_1, \ldots, R_j$ can be placed so that:
(b') a horizontal line drawn at any vertical position of the rectangle cuts at most one member of $L$ and at most one member of $R$;
(c') if $L_s$ and $R_t$ are cut by the same horizontal line, then $wl_s + wr_t \le w$;
(d') the order in which the $L$'s and $R$'s appear in the rectangle, bottom to top, is $(L_1, \ldots, L_i)$ and $(R_1, \ldots, R_j)$, respectively.
If $wl_i + wr_j > w$, then, using the reasoning of Section 3.1.1, we obtain:
$$B(i,j,w) = \min\{ hl_i + B(i-1,j,w),\ hr_j + B(i,j-1,w) \}.$$
However, if $wl_i + wr_j \le w$, the minimum $B(i, j, w)$ may not occur when $L_i$ and $R_j$ are placed adjacent to each other (see Figure 10).
Figure 10: $L_1$ and $R_3$ adjacent doesn't optimize $B(1, 3, w)$. (a) $L_1$ and $R_3$ adjacent; (b) a different placement of $L_1$.
When $wl_i + wr_j \le w$, we need to compute $i^*$ and $j^*$ such that $wl_{i^*} + wr_{j^*} > w$. This is done as in Figure 11. To ensure proper termination of this code, we define $wl_0 = wr_0 = w + 1$ and $hl_0 = hr_0 = 0$. An example of this computation is given in Figure 12.
Let $B(0, j, w) = \sum_{s=1}^{j} hr_s$ and $B(i, 0, w) = \sum_{s=1}^{i} hl_s$. Theorem 2 establishes a recurrence for $B(i, j, w)$ when $wl_i + wr_j \le w$.
i* := i; j* := j;
HL := hl_i; HR := hr_j;
while wl_{i*} + wr_{j*} ≤ w do
    if HL > HR
    then [ j* := j* - 1; HR := HR + hr_{j*} ]
    else [ i* := i* - 1; HL := HL + hl_{i*} ]
Figure 11: Computing i* and j*.
Figure 12: Example computation of i* and j*.
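Theorem 2: Let $i$ and $j$ be such that $wl_i + wr_j \le w$, and let $i^*$, $j^*$, $HL$, and $HR$ be as computed by the code of Figure 11. Then $B(i, j, w) = \min\{HL + B(i^*-1, j^*, w),\ HR + B(i^*, j^*-1, w)\}$.
Proof: Since $wl_{i^*} + wr_{j^*} > w$, either $L_{i^*}$ is above $R_{j^*}$ or below it in every solution. If $L_{i^*}$ is above $R_{j^*}$ in an optimal solution (see Figure 13), then $HL_{opt} \ge HL = \sum_{s=i^*}^{i} hl_s$, where $HL_{opt}$ is the height from the bottom of $L_{i^*}$ to the top of the rectangle, as $L_{i^*+1}, \ldots, L_i$ are above $L_{i^*}$.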
Figure 13: $L_{i^*}$ above $R_{j^*}$ in an optimal solution.
Furthermore, since at least $L_1, \ldots, L_{i^*-1}$ and $R_1, \ldots, R_{j^*}$ are below $L_{i^*}$, $H_{opt} \ge HL_{opt} + B(i^*-1, j^*, w) \ge HL + B(i^*-1, j^*, w)$. Since there is a feasible solution of height $HL + B(i^*-1, j^*, w)$ (by construction of Figure 11, $R_j, \ldots, R_{j^*+1}$ can be packed in height $HL$ adjacent to $L_i, \ldots, L_{i^*}$), and since $H_{opt}$ is the minimum possible height of a feasible solution, it follows that $H_{opt} = HL + B(i^*-1, j^*, w)$. Similarly, if $R_{j^*}$ is above $L_{i^*}$ in the optimal solution, $H_{opt} = HR + B(i^*, j^*-1, w)$. Hence, $B(i, j, w) = H_{opt} = \min\{HL + B(i^*-1, j^*, w),\ HR + B(i^*, j^*-1, w)\}$. □
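A sketch of the variable-height table, following Figure 11 and Theorem 2, is given below (names hypothetical). Position 0 of each padded list holds the sentinels $wl_0 = wr_0 = w + 1$ and $hl_0 = hr_0 = 0$. When $wl_i + wr_j > w$ the loop body never runs, $HL = hl_i$, $HR = hr_j$, and the same expression reduces to the recurrence for that case. If $i^*$ (or $j^*$) reaches the sentinel 0, we treat the corresponding case of Theorem 2 as impossible; this boundary handling is our reading, not something the text states. All widths are assumed to lie between 0 and $w$.

```python
import math
from typing import List


def var_height_buckets(wl: List[int], hl: List[int], wr: List[int], hr: List[int],
                       w: int) -> List[List[float]]:
    """Table B[i][j] = B(i, j, w) for variable-height L and R components.

    Uses Theorem 2 with the i*/j* search of Figure 11. Index 0 of the padded
    lists is the sentinel wl_0 = wr_0 = w + 1, hl_0 = hr_0 = 0.
    """
    n, m = len(wl), len(wr)
    WL, HLS = [w + 1] + list(wl), [0] + list(hl)
    WR, HRS = [w + 1] + list(wr), [0] + list(hr)
    B = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        B[i][0] = B[i - 1][0] + HLS[i]
    for j in range(1, m + 1):
        B[0][j] = B[0][j - 1] + HRS[j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            si, sj = i, j                # i*, j*
            HL, HR = HLS[i], HRS[j]
            while WL[si] + WR[sj] <= w:  # Figure 11
                if HL > HR:
                    sj -= 1
                    HR += HRS[sj]
                else:
                    si -= 1
                    HL += HLS[si]
            # Theorem 2; a case whose frontier index hit the sentinel 0 is impossible.
            l_on_top = HL + B[si - 1][sj] if si >= 1 else math.inf
            r_on_top = HR + B[si][sj - 1] if sj >= 1 else math.inf
            B[i][j] = min(l_on_top, r_on_top)
    return B
```

With unit heights and the widths of Figure 7 ($w = 3$), it gives 4, agreeing with Equation (4).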
Since $|L| = n$ and $|R| = m$, a total of $nm$ $B$ values are to be computed. For each, we may need to compute $i^*$ and $j^*$ using the code of Figure 11. This takes $O(n + m)$ time. So while the $B$ matrix could be computed in $O(nm)$ time when all components had the same height, it now takes $O(nm(n + m))$ time. The remaining ideas of Section 3.1.1 directly carry over to the case of variable height and width components. The $T(i, j, k, w)$'s can be computed in $O((j-i+1)^3)$ time for any fixed $i$ and $j$ and all $k$, $i \le k \le j$. Hence, each $T(i, j, w)$ can be obtained in $O((j-i+1)^3)$ time. So each $S(i, j, h)$ can be obtained in $O(h^3 \log h)$ time. As a result, (5) can be solved for $W(1, h)$ in $O(h^4 n \log h)$ time.
4.1.2. Width Of Folded Layout Is Fixed
The binary search technique of Section 3.1.2 may be used to obtain the optimal height solution in $O(n^5 \log^2 n)$ time, as the optimal height is one of the $n^2$ values $h(i, j)$, $1 \le i \le j \le n$.
4.2. Routing Area At Stack Ends
We assume that all inter stack routes are done at the top or bottom ends of the stacks as in Figure 14(a), rather than in space internal to the stack as in Figure 14(b). More specifically, if $R$ is the minimum height rectangle into which $C_i, C_{i+1}, \ldots, C_j$ can be folded using at most one folding position, then all inter stack routing is done external to $R$.
Figure 14: (a) legal route; (b) illegal route.
The technique of Section 3.2 readily generalizes to the case of variable height and width components. The complexities of the algorithms for this case are the same as those for the case when no routing area is needed at the stack ends.
5. Folding Without Nesting
Our folding model permits the slices of one stack to nest with the slices of an adjacent stack. For example, in Figure 1(b), slice 4 of stack 1 occupies the same physical space as slice 4 of stack 2. As noted earlier, this nesting of stacks may increase the demand for inter component routing tracks. In situations where increased routing tracks cannot be provided, one may forbid stack nesting. Then the stacks of Figure 1(b) have to be placed as in Figure 15. In this section we consider folding without nesting. Since nesting can occur only when the component widths are not all the same, we need only consider this case.
Figure 15: Folding without nesting.
5.1. Height Of Folded Layout Is Fixed
Let $W(i)$ be the width of the minimum width rectangle of height $h$ into which the components $C_i, \ldots, C_n$ can be folded without nesting. The correctness of the following recurrence is easily established:
$$W(i) = \min_{i \le j \le n,\ h(i,j) \le h} \Big\{ \max_{i \le q \le j} \{w_q\} + W(j+1) \Big\}, \quad 1 \le i \le n; \qquad W(n+1) = 0.$$
This recurrence may be solved for $W(1)$ in $O(n^2)$ time.
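The recurrence can be evaluated right to left in $O(n^2)$ time, as sketched below (name hypothetical). For a fixed $i$, the inner loop grows the stack $C_i, \ldots, C_j$ while tracking $h(i,j)$ and $\max_{i \le q \le j} w_q$, and stops as soon as the stack exceeds height $h$.

```python
import math
from typing import List


def min_width_no_nesting(widths: List[int], heights: List[int], h: int) -> float:
    """W(1) of Section 5.1: narrowest height-h folding without nesting.

    Each stack C_i..C_j is as wide as its widest component, and stacks are
    placed side by side. Returns math.inf if some h_i exceeds h.
    """
    n = len(widths)
    W = [math.inf] * (n + 2)
    W[n + 1] = 0
    for i in range(n, 0, -1):
        stack_h, stack_w = 0, 0
        for j in range(i, n + 1):
            stack_h += heights[j - 1]              # h(i, j)
            if stack_h > h:
                break                              # h(i, j) only grows with j
            stack_w = max(stack_w, widths[j - 1])  # max_{i <= q <= j} w_q
            W[i] = min(W[i], stack_w + W[j + 1])
    return W[1]
```

For widths 2, 1, 2 and unit heights, the sketch gives widths 5, 4 and 2 for $h = 1, 2, 3$.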
5.2. Width Of Folded Layout Is Fixed
Since the optimal height is one of the $n^2$ values $h(i, j)$, $1 \le i \le j \le n$, we can perform a binary search over these heights to determine the smallest height that results in a folding of width no more than the permissible width. For each tested height, the $O(n^2)$ height constrained algorithm is used. The overall complexity is $O(n^2 \log n)$.
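A sketch of the width constrained search (names hypothetical): the candidate heights $h(i, j)$ are collected from prefix sums and sorted, and a binary search finds the smallest candidate whose no-nesting width, computed with the recurrence of Section 5.1, is within `max_width`. It assumes at least one component and returns `None` when even a single stack is too wide.

```python
import math
from typing import List, Optional


def width_no_nesting(widths: List[int], heights: List[int], h: int) -> float:
    """Recurrence of Section 5.1 (narrowest height-h folding without nesting)."""
    n = len(widths)
    W = [math.inf] * (n + 2)
    W[n + 1] = 0
    for i in range(n, 0, -1):
        stack_h, stack_w = 0, 0
        for j in range(i, n + 1):
            stack_h += heights[j - 1]
            if stack_h > h:
                break
            stack_w = max(stack_w, widths[j - 1])
            W[i] = min(W[i], stack_w + W[j + 1])
    return W[1]


def min_height_no_nesting(widths: List[int], heights: List[int],
                          max_width: int) -> Optional[int]:
    """Smallest candidate height h(i, j) whose folding fits in max_width."""
    n = len(heights)
    prefix = [0]
    for x in heights:
        prefix.append(prefix[-1] + x)
    cands = sorted({prefix[j] - prefix[i]
                    for i in range(n) for j in range(i + 1, n + 1)})
    if width_no_nesting(widths, heights, cands[-1]) > max_width:
        return None  # even a single stack is too wide
    lo, hi = 0, len(cands) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if width_no_nesting(widths, heights, cands[mid]) > max_width:
            lo = mid + 1
        else:
            hi = mid
    return cands[lo]
```

For widths 1, 3, 1 and heights 2, 1, 2, a width limit of 4 gives height 3, while a limit of 3 forces the single stack of height 5.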
6. Conclusions
We have considered the problem of folding bit sliced stacks so as to obtain minimum height (subject to width constraints) and minimum width (subject to height constraints) foldings. Our model differs from that of [LARM90] in that we do not permit a reordering of the components in the input component stack. Our model applies to bit sliced architectures as well as to standard cell and sea-of-gates designs.
Polynomial time algorithms to obtain optimal foldings have been obtained under a variety of assumptions (stack nesting permitted/not permitted, routing space needed/not needed at stack ends, equal height components, equal width components). Our algorithms for the stack nesting case are summarized below in Table 1.
                                                   Routing area at stack ends
Case (stack nesting permitted)                     No                Yes
Equal width, height constrained                    O(n)              O(n^2)
Equal width, width constrained                     O(n^2)            O(n^3)
Equal height, height constrained                   O(n^4 log n)      O(n^4 log n)
Equal height, width constrained                    O(n^4 log^2 n)    O(n^4 log^2 n)
Variable heights and widths, height constrained    O(n^5 log n)      O(n^5 log n)
Variable heights and widths, width constrained     O(n^5 log^2 n)    O(n^5 log^2 n)
Table 1: Summary of algorithms.
When stack nesting is not permitted, the height constrained case may be solved in $O(n^2)$ time and the width constrained case in $O(n^2 \log n)$ time.
7. References
[HORO78] E. Horowitz and S. Sahni, "Fundamentals of Computer Algorithms," Computer Science Press, Maryland, 1978.
[LARM90] L. Larmore, D. Gajski, and A. Wu, "Layout Placement for Sliced Architecture," University of California, Irvine, Technical Report, 1990.
[WU90] A. Wu and D. Gajski, "Partitioning Algorithms for Layout Synthesis from Register-Transfer Netlists," Proc. of International Conference on Computer Aided Design, November 1990, pp. 144-147.
[SHRA88] E. Shragowitz, L. Lin, and S. Sahni, "Models and algorithms for structured layout," Computer Aided Design, Butterworth & Co., vol. 20, no. 5, 1988, pp. 263-271.
[SHRA90] E. Shragowitz, J. Lee, and S. Sahni, "Placer-router for sea-of-gates design style," in Progress in Computer Aided VLSI Design, Ed. G. Zobrist, Ablex Publishing, vol. 2, 1990, pp. 43-92.