Constant Time Computational Geometry On Reconfigurable Meshes With Buses*
Madhusudan Nigam and Sartaj Sahni
University of Florida
Gainesville, FL 32611
Technical Report 92-35
ABSTRACT
We develop 0(1) time algorithms for the following computational geometry prob-
lems: convex hull, smallest enclosing box, ECDF search, and triangulation. Our algo-
rithms are for the reconfigurable mesh with buses architecture and run on the RMESH,
PARBUS, and MRN models.
Keywords And Phrases
Convex hull, enclosing box, ECDF search, triangulation, reconfigurable mesh with buses.
* This research was supported, in part, by the National Science Foundation under grant MIP-9103379.
1. Introduction
Several different mesh like architectures with reconfigurable buses have been pro-
posed in the literature. These include the content addressable array processor (CAAP) of
Weems et al. [WEEM87,89], the polymorphic torus of Li and Maresca [LI89ab,
MARE89], the reconfigurable mesh with buses (RMESH) of Miller et al. [MILL88abc],
the processor array with a reconfigurable bus system (PARBUS) of Wang and Chen
[WANG90ab], and the reconfigurable network (RN) of Ben-Asher et al. [BENA91].
The CAAP [WEEM87,89] and RMESH [MILL88abc] architectures appear to be
quite similar. So, we shall describe the RMESH only. In this, we have a bus grid with an
n x n arrangement of processors at the grid points (see Figure 1 for a 4x4 RMESH ).
Each grid segment has a switch on it which enables one to break the bus, if desired, at
that point. When all switches are closed, all n2 processors are connected by the grid bus.
The switches around a processor can be set by using local information. If all processors
disconnect the switch on their north, then we obtain row buses (Figure 2). Column buses
are obtained by having each processor disconnect the switch on its east (Figure 3). In the
exclusive write model two processors that are on the same bus cannot simultaneously
write to that bus. In the concurrent write model several processors may simultaneously
write to the same bus. Rules are provided to determine which of the several writers actu-
ally succeeds (e.g., arbitrary, maximum, exclusive or, etc.). Notice that in the RMESH
model it is not possible to simultaneously have n disjoint row buses and n disjoint
column buses that, respectively, span the width and height of the RMESH. It is assumed
that processors on the same bus can communicate in 0(1) time. RMESH algorithms for
fundamental data movement operations and image processing problems can be found in
[MILL88abc, MILL91ab, JENQ91abc].
An n x n PARBUS (Figure 4) [WANG90ab] is an n x n mesh in which the inter-
processor links are bus segments and each processor has the ability to connect together
arbitrary subsets of the four bus segments that connect to it. Bus segments that get so
connected behave like a single bus. The bus segment interconnections at a proccessor are
done by an internal four port switch. If the upto four bus segments at a processor are
labeled N (North), E (East), W (West), and S (South), then this switch is able to realize
any set, A = {A 1, A 2 }, of connections where A, c_ {N,E,W,S}, 1 i< 2 and the A i's are
disjoint. For example A = {{N,S}, {E,W}} results in connecting the North and South
segments together and the East and West segments together. If this is done in each pro-
cessor, then we get, simultaneously, disjoint row and column buses (Figure 5 and 6). If A
= {{N,S,E,W},0}, then all four
variety of applications can
JANG92abcde]. Observe that in
={A }, A1 c {N,E,W,S}.
K
(
K
(
K
[
-1
-j
bus segments are connected. PARBUS algorithms for a
be found in [MILL91ab, WANG90ab, LIN92,
an RMESH the realizable connections are of the form A
D
]
)
]
)
" L
:Processor
S: Switch
S Link
Figure 1 4x4 RMESH
The polymorphic torus architecture [LI89ab, MARE89] is identical to the PARBUS
except that the rows and columns of the underlying mesh wrap around (Figure 7). In a
reconfigurable network (RN) [BENA91] no restriction is placed on the bus segments that
connect pairs of processors or on the relative placement of the processors. I.e., processors
may not lie at grid points and a bus segment may join an arbitrary pair of processors.
Like the PARBUS and polymorphic torus, each processor has an internal switch that is
able to connect together arbitrary subsets of the bus segments that connect to the proces-
sor. Ben-asher et al. [BENA91] also define a mesh restriction (MRN) of their
reconfigurable network In this, the processor and bus segment arrangement is exactly as
for the PARBUS (Figure 4). However the switches internal to processors are able to
-1L -
/_r
0
OE
0
O
0
E3
O O
O O
O O
-o -o -
0
O
0
O
0
OF
Figure 2 Row buses
0 0 0
-- 0 El 0 El 0 E-
Figure 3 Column buses
obtain only the 10 bus configurations given in Figure 8. Thus an MRN is a restricted
PARBUS.
While we have defined the above reconfigurable bus architectures as square two
dimensional meshes, it is easy to see how these may be extended to obtain non square
architectures and architectures with more dimensions than two.
RMB algorithms for computational geometry problems has been explored in
[MILL87, JANG92e, REIS92]. In [MILL87] RMESH algorithms for several geometric
Figure 4 4 x 4 PARBUS
problems on digitized images were developed. These include closest figure, extreme
points of every figure, diameter, smallest enclosing box, and smallest enclosing circle.
An 0(1) time convex hull algorithm for a set of N planar points is given in
[REIS92]. The algorithm uses an N x N configuration and is based on the grid convex
hull algorithm of [MILL88]. Unfortunatley, the algorithm of [REIS92] is flawed. In sec-
tion 2, we give a corrected constant time RMESH convex hull algorithm. This may also
be run on an N x N PARBUS and MRN. Jang and Prasanna [JANG92e] develop con-
stant time PARBUS algorithms for the all pairs nearest neighbor problem on a set of N
planar points, 2-set dominance counting for N planar points, and the 3-dimensional max-
ima problem. All these algorithms are for N x N PARBUS configuration. Their algo-
rithm for the nearest neighbor problem is easily run on an N x N RMESH and MRN in
constant time. Jang and Prasanna [JANG92e] state that their 2-set dominance counting
algorithm can be simulated by an RMESH using the switch simulation of [JANG92d].
However the simulation of [JANG92e] requires 16 RMESH processors for each
PARBUS processor simulated. Hence a 4N x 4N RMESH is needed for the simulation.
In section 4, we consider the ECDF search problem which is closely related to the 2-set
dominance counting problem and develop a constant time algorithm that requires only an
Figure 5 Row buses in a PARBUS
N x NRMESH.
The third geometry problem considered in [JANG92e] is the 3-dimensional maxima
problem. In this, we are given a set S of three tuples (points in three dimensional space)
and are required to find all points p e S such that there is no q e S with the property
that each coordinate of q is greater than the corresponding coordinate ofp (i.e., no q in S
dominates p). The algorithm suggested in [JANG92e] is a modification of their 2-set
dominance counting algorithm and is unnecessarily complicated. We observe that the
problem is trivially solved in constant time using the algorithm of Figure 9. The input N
points are in row 1 of the N x NPARBUS/RMESH/MRN. The algorithm of Figure 9 can
be used to solve, in constant time, the k-dimensional maxima problem for any k.
The remaining computational geometry problems we consider in this paper are the
smallest enclosing rectangle problem (section 3) and the problem of triangulating a set of
N planar points (section 5). While these problems have not been considered before for
the RMB architecture, parallel algorithms for other architectures have been developed.
For example, Miller and Stout [MILL89] show how to find the smallest enclosing
Figure 6 Column buses in a PARBUS
Figure 7 4 x 4 Polymorphic torus
i
r(5
<5
<5"?
5
N
W E
S
N
WO E
S
N
W S E
S
N
W E
S
N
W E
S
N
W E
S
N
W E
S
N
W ) E
S
N
W E
S
N
W E
S
Figure 8 Local Configurations allowed for a switch in MRN
rectangle of a set of N planar points in O(N1/2) on an N x N mesh and Wang and Tsin
[WANG87] develop an O(logN) time CREW PRAM algorithm to triangulate a set of N
planar points. Miller and Stout [MILL89] also develop optimal mesh algorithms for the
convex hull of N planar points and for several other computational geometry problems.
Mesh algorithms for some other computational geometry problem are considered in
[JEON90, LU85,86].
2. Convex Hull
In this section, we consider the problem of determining the Convex Hull of a set of
N planar points on N x N Reconfigurable Meshes with Buses (RMBs). While the
RMESH algorithms we develop can be run without modification on PARBUSs and
MRNs, improved performance results from the use of algorithms for subtasks such as
ranking and sorting taht are tailored to the PARBUS/MRN architecture.
Step 1: Broadcast the N points on row 1 to all rows using column buses.
Step 2: {Use the i'th row to determine if the i'th point of S is a non dominated point}
Processor [i,i] broadcasts its point to all processors in its row using a row bus,
1 i N
Processor [ij], 1 < i,j < N compares the coordinates of the point it received
in step 1 to those of the points it received in step 2. If each coordinate of the
step 1 point is larger than the corresponding coordinate of its step 2 point, then
PE[ij] sets its Tvariable to 0. Otherwise, Tis set to 1.
Using row bus splitting, PE[i,1] determines if there is a 0 in row i, 1 < i N.
If so, it sets its Uvariable to 0 (the i'th point is dominated). Otherwise, Uis set
to 1 (the i'th point is not dominated).
Step 3: {Route the results back to row 1}
Using row buses PE[i,1] sends its U value to PE[i,i]. Using column buses,
PE[i,i] sends the Uvalue just received to PE[1,i], 1 < i
Figure 9 Simple Constant time algorithm for 3-dimensional maxima
Let S be a set of N distinct points in a plane. Let E(S) be the set of extreme points in
S and let CH(S) denote an ordering of E such that CH(S) defines a convex polygon. This
convex polygon is the convex hull of S. The extreme points problem is that of finding
E(S) while in the convex hull problem, we are to find the ordered set CH(S). Our algo-
rithms to find E(S) and CH(S) make use of the following known results:
Lemma 1: [Theorem 3.8, [PREP85], p104] The line segment 1 defined by two
points is an edge of the convex hull iff all other points lie on I or to one side of it. D
Lemma 2: [[PREP85], p105] Let p1 and p2, respectively, be the lexicographically
lowest and the highest points of S. Let CH(S) be (p1, li, 12, ... lk,P2, u1, ..., uj). The
ordering is a counter clockwise ordering of the extreme points and is such that the inte-
rior of the defined convex polygon is on the left as one traverses the extreme points in
this order. The points li, ..., lk,p2 have the property that the polar angle made by each of
these points, the positive x-axis, and the preceding point of CH(S) is least. The points
-10-
u u2, ..., Uj have the property that the polar angle made by each point, the negative x-
axis, and the preceding point of CH(S) is least (see Figure 10). D
Figure 10 Convex hull of a set of points
2.1. N3 Processor Algorithm
Before developing the 0(1) N x N processor algorithm, we develop a simple 0(1)
N2 x N processor algorithm. An 0(1) N2 x N processor algorithm for E(S) or CH(S) is
easily obtained by using either Lemma 1 or 2. We elaborate on the algorithm based on
Lemma 1. The strategy employed by this algorithm is given in Figure 11. It assumes that
the N points of S are initially in the top row of the N2 x N RMESH/PARBUS/MRN.
The i'th row, 1 I i N2 of the RMESH/PARBUS/MRN is used to determine if points
and k where i = (j- 1)N + k of S form an edge of the convex hull. In case they do,
then and k are in E(S) andj immediately precedes or immediately follows k in CH(S).
The correctness of Figure 2 is a direct consequence of Lemma 1 and the fact
[PREP85, p100] that an ordering of the points of E by polar angle about an internal point
yields the convex hull. The complexity of the algorithm is readily seen to be 0(1). Note
that if only E(S) is to be computed then steps 9-11 may be omitted.
Step 1: Use column buses to broadcast the points of S from row 1 to all other rows.
Step 2: In row i = (j- )N + k, 1 < i < N2, points j and k are broadcast to all
processors using a row bus.
Step 3: Using the points and k received in step 2, each processor computes a, b, and c
such that ax + by + c = 0 defines the straight line through points and k.
Step 4: Let (u, v) be the coordinates of the point of S assigned to processor [e,J] in step
1. Processor [e, J] sets its flag to 1 if au + by + c > 0, -1 if
au + bv + c < 0, 0 if au + bv + c = 0 and (u,v) is on the line segment
between points and k of step 3 (this includes the cases when (u,v) is point or
k), 2 otherwise.
Step 5: Using row buses and row bus splitting, PE[i,1], i = (j- 1)N + k,
1 I i < N2, determines if there is aflag = 2 on row i. If so, (/, k) is not an
edge of the convex hull. If not, it determines if there is a 1 and later if there is a
-1. If both a 1 and -1 are present, (j,k) is not an edge of the convex hull.
Otherwise it is.
Figure 11 O(1) N2 x Nprocessor algorithm for E(S) and CH(S)
(continued on next page)
2.2. N2 Processor Algorithm
The convex hull of S can be computed in 0(1) time on an N x NRMESH using the
algorithm of Figure 12. The algorithm for an N x N PARBUS/MRN is similar. Step RO
can be done using the 0(1) sorting algorithm for N data items on an N x N RMESH
[NIGA92]. Step R1 is done using the 0(1) N3 processor algorithm of the preceding sec-
tion on each N x N1/2 sub RMESH. For R2, we use the i'th N x N1/2 sub RMESH to
determine which points of E, are also in E. This is done using the algorithm of Figure 13.
Step 6: IfPE[i,1], i = (j- 1)N + k, i < < N2, j > k detects that (j,k) is an edge of
the convex hull, then using a row bus it broadcasts a 1 to PEs [ij] and [i,k].
Step 7: PEs that receive a 1 in step 6 broadcast this to the PEs in row 1 of the same
column (note 0 or 4 PEs in each column receive a 1 in step 6; using column bus
splitting, all but one of these may be eliminated to avoid concurrent writes to a
column bus).
Step 8: Row 1 PEs now mark the points of S they contain as being in or out of E(S)
(note: a point is in E(S) if and only if it receives a 1 value in step 7).
Step 9: Using row bus splitting in row 1, any three extreme points are accumulated in
PE[1,1]. In case E(S) I = 2, the remainder of this step is omitted. The
centroid of these three points is computed. Since no three points of E can be
colinear, the centroid is an interior point of the convex hull.
Step 10: The centroid computed in step 9 is broadcast to all points of E(S) using a row
bus. Each of these points computes the polar angle made* by the point using
the centroid as the origin.
Step 11: The points of E(S) are sorted by polar angle using the 0(1)
RMESH/PARBUS?MRN sort algorithm of [NIGA92].
Figure 11 0(1) N2 x Nprocessor algorithm for E(S) and CH(S)
(continued from previous page)
Theorem 1: The extreme points algorithm of Figure 13 is correct.
Proof: To establish the correctness of Figure 13, we need to show that exactly those
N1/2
points of U Ek, that are not in E(S) are eliminated in steps T5 and T6. It is easy to see
k= 1
* An alternative to using the polar angle is discussed on p100 of [PREP85].
RO: Sort S by x-coordinate and within x by y-coordinate.
R1: Partition S into N1/2 subsets, Si, 1 < i N1/2 using the ordering just obtained.
Each subset is of size N1/2. Determine the convex hull CH, and extreme points
Ei of Si using anN x N1/2 sub RMESH for each i,1 I i I N12.
R2: Combine the extreme points Ei, 1 i N1/2, to obtain the extreme points E
of S.
R3: Obtain the convex hull of S from E.
Figure 12 N x NRMESH algorithm for convex hull
that no point eliminated in T5 or T6 can be in E as in T5 the eliminated points are not in
the convex hull of E, u Ej and in T6 they are not in the convex hull of u Ek. To see that
the points not eliminated are all in E(S), suppose that there is a non eliminated point
pe E, that is not in E(S). Let Q be the set of non eliminated points. Clearly, E(S) =
extreme points of Q and p is in the interior of the convex hull of Q (see Figure 16),
CH(Q). Without loss of generality, assume that I E(S) I > 2. Since p is in the interior of
CH(Q), there must be a q e Q such that q 4 Si and the line from p to q does not go
through the interior of CH(Si) and does not touch the boundary of CH(Si) except at p.
Without loss of generality, assume that q e Sj and q is to the right of p or vertically
above p. Ifp lies between the two tangents from q to Si, then p lies between the tangents
of Si and Sj and so is eliminated in step T5 (see Figure 17). Since p e Q,p is not elim-
inated in T5 and hence must be on a tangent from q to Si (Figure 18). Furthermore, ifp is
not a tangent point with respect to Si and Sj, then p lies between the two tangents of Si
and Sj and is eliminated in T5. So, we may assume p is one of the 3N1/2 2 points
considered in step T6.
First suppose that p is a point of the leftmost partition S1. Since p is in the interior
of CH(Q), part of the boundary of CH(Q) passes above p and part belowp. Let a be the
leftmost point in E1, b the first point in CH(Q) that the lower boundary reaches outside
of S1, and c the first point in CH(Q) that the upper boundary reaches outside of S1 (see
Tl: Let Rji denote the (j,i)'th N12 x N1/2 sub RMESH of the entire RMESH.
Move Ei and Ej into Rji such that each row of Rji has Ei (one point to a
processor) and each column has Ej (one point to a processor). This can be
accomplished by broadcasting row 1 along column buses to all rows and then
broadcasting the diagonal (of the N x N RMESH) processor data along row
buses to all columns.
T2: Each PE in the k'th column of Rji computes the slope of the line that joins the
k'th point of E, and the point of Ej it contains. These slopes form a tritonic
sequence (they may decrease, increase, decrease or may increase, decrease,
increase) as the points of Ej are in convex hull order.
T3: The minimum and maximum of the slopes in each column of Rji are found
using bus splitting. These are stored in row 1 of Rji.
T4: The maximum of the at most N1/2 minimums found in step T3 is determined
by forming the cross product of the minimums in the N processors of Rji. The
minimum of the maximums of step T3 is similarly found.
Figure 13 Substeps for step R2 (continued on next page)
Figure 19). If b = c, then p is not on the tangent from c to CH(S1) and so must have
been eliminated in step T5. So, b # c. Let d and e respectively, be the points in S, to
which c and b are connected in CH(S1). Note that it is possible that d = a or e = a (or
both). If ce Sj, then d and c must be tangent points with respect to S1 and Sj (as other-
wise, some points of S1 and/or Sj lie above the upper boundary of CH(S). Similarly, e
and b are tangent points with respect to S1 and Sk where b e Sk. It is clear thatp is not
an extreme point of {a, b, c, d, e, p}. Since {a, b, c, d, e,p} is a subset of the set used in
step T6, p must have been eliminated in this step.
Next, suppose that p is a point of SN1/2. The proof for this case is similar to that for
p e S1. Finally, suppose p Si, i 4 {1, N1/2}. Let a, b, c, dbe the first points of the
portions of the boundary of CH(Q) that are above and below p that are not in Si (Figure
The minimum and maximum of step T4 define the two tangents of Ei and Ej.
These are used to eliminate points of Ei that are now known to not be in E (see
Figure 14).
The tangents of T5 define up to 4 tangent points. Over all Rji, I < i < N12,
there are 3 N1/2 2 non eliminated tangent points plus non eliminated points
of Si. The extreme points of this set of points is computed. Tangent points of
Ei that are not in the set of extreme points just computed are eliminated (see
Figure 15).
The points of u Ei not eliminated in steps T5 and T6 define E.
Figure 13 Substeps for step R2 (continued from previous page)
Tangents = Broken lines
* = tangent points of E, and Ej
X = points of E, eliminated
Figure 14 Examples of extreme points eliminated in step T5
-16-
+*
El E2 E3
i= 2
= tangent points
X = extreme points of E2 eliminated in step T5
+ = extreme points of E2 eliminated in step T6
Figure 15 Example of extreme point elimination in step T6
CH(Q)
Figure 16 Example for correctness proof
20). Note that it is possible that a = b and c = d. Also, note that a, b, c, and dmust exist
as S = N1/2 for each (i.e., as no Sj is empty).
Suppose that a e Sj. There must be a tangent point with respect to Sj and Si that is
in the triangle defined by a, a and a (see Figure 20). This is so asp is a tangent point
with respect to Sj and the other end of the tangent must lie in the said triangle. Let this
-17-
tangents from q to S,
tangents betweenS, and Sj
Figure 17 Tangents
Figure 18p is on tangent from q
point be A. Similarly, there are tangent points B, C, D in the corresponding triangles
associated with b, c, and d. From the triangles in which A, B, C, D are located, we see
that p is contained in the convex hull of {A, B, C, D,p} and so must be eliminated in step
T6. Note that when a = b and c = d, there must be a tangent point, e, of E, that is on
-18-
a .
L p
b
e
<- S1 ->
Figure 19p e Si
the upper and lower part of the boundary of CH(S). Now, p is not an extreme point of {A,
B, C, D, e, p} (we need point e as it is possible that A = B, and C = D). D
One may verify that each of the steps T1 through T7 can be done in 0(1) time. For
step T6, a modified version of the N3 processor algorithm of the previous section is run
in each N x N1/2 sub RMESH. Since the number of edges to consider is less than
(3N1 2)2 = 9N, the algorithm is run at most nine times. In each run of the algorithm an
edge needs to compare with 3N1 2 2 points using N1/2 processors. This is done in
three passes.
For step R3, the points of E may be sorted by polar angle about an interior point as
in steps 10 and 11 of Figure 2.
-19-
:a ll .... .p
af---------" ^
a.--.---i ,,
S S I S, Sk
Figure 20p e Si, i 4 {1, N/2}
3. Smallest Enclosing Rectangle
It is well known [FREE75] that the smallest enclosing rectangle of a set of N planar
points has at least one side that is an extension of an edge of the convex hull. Hence, the
smallest enclosing rectangle may be found by first computing the convex hull; then deter-
mining for each convex hull edge, the smallest enclosing rectangle that has one side
which is an extension of this edge; and finally determining the smallest of these rectan-
gles. The algorithm is given in Figure 21. Its correctness is immediate and one readily
sees that its complexity is 0(1) on an N x NRMESH, PARBUS and MRN.
4. ECDF Search On An N x N RMESH
Let S be the set of N distinct planar points. Point p S dominates point q e S if and
only if each coordinate of p is greater than the corresponding coordinate of q. In the
ECDF search problem [STOJ86], we are to compute for each pe S, the number, D(p,S),
of points of S that are dominated by p. Stojmenovic [STOJ86] provides an N (N = S
processor, O(log2N) time hypercube algorithm for this problem.
On an N x N RMB, D(p,S) for each point p e S, may be computed using a stra-
tegy similar to that used in [JANG92e] to solve the 2-set dominance counting problem on
a PARBUS in 0(1) time. A high level description of the strategy is given in Figure 22.
For simplicity in presentation, we assume that no two points of S have the same x- or the
-20-
Step 1: Find the convex hull of the N points.
Step 2: [Construct convex hull edges]
(a) The convex hull points are broadcast to all rows using column buses. These
are saved in variable R of each processor.
(b)PE[i,i] broadcasts its R value using a row bus to the S variable of all
processors on row i, 1 < i < p, p = number of convex hull points.
(c) PE[i,i + 1] broadcasts its R value using a row bus to the T variable of all
procesors on row i, 1 < i < p 1. For i = p, PE[p,1] does this. {Note: Now
each PE in row i contains the same edge of the convex hull in its S and T
variables}
Step 3: [Determine area of minimum rectangle for each edge]
(a) Using its R, S, and Tvalues, each PE computes the perpendicular distance D
between point R and the straight line defined by the points S and T. Since, the
R's are in convex hull order, the D values in a row form a tritonic sequence
whose minimum, h, can be found in 0(1) time using row bus splitting. h is the
minimum height of the rectangle for the row edge.
Figure 21 Minimum enclosing rectangle (continued on next page)
same y-coordinates.
Step 1 can be done by sorting the N points in 0(1) time by x-coordinate [NIGA92].
The point in PE[1,j] belongs to partition Xu where u = L -1 1+ 1. Step 2 is simi-
N1/2
larly accomplished by another sort (this time by y-coordinate). Following this each point
knows which X and which Y partition it belongs to. The computation of DX(p) is done
using an N x N1/2 submesh for the points in each X partition. The i'th such submesh is
used for Xi. For this, each p e Xi uses an N1/2 x N1/2 partition of the N x N1/2 sub-
mesh. In processor k of row 1 of this N12 x N1/2 partition, a variable T is set to 1 iffp
-21 -
Step 3: (b) Let P be the perpendicular through the middle of the edge defined by points
S and T. Each processor computes the perpendicular distance of its point R
from the infinite line P (use negative distances for points on one side of P and
positive distances for points on the other side). These distances form a
quadratonic sequence and its maximum, dmax, and minimum, dmin, can be
found in 0(1) time using row bus splitting.
(c) The minimum area, A, of the rectangle for row i is h *(dmax dmin). Let
this be stored in the A value of PE[i,1].
Step 4 [Determine overall minimum rectangle]
Compute the minimum of the A's of PEs [i,1], 1 < i < p. This is done by
forming the cross product of the A's in a p x p sub array and comparing the
two A's in each PE.
Figure 21 Minimum enclosing rectangle (continued from previous page)
dominates the k'th point of Xi. Tis set to zero otherwise. The T's in each N1/2 x N1/2
partition may be summed in 0(1) time using the RMESH ranking algorithm of
[JENQ91a]. The result is DX(p). The computation of DX(p) is best done after Step 1 of
Figure 22 as at this time the points of S are in the needed order. DY(p) may be computed
in a similar way following Step 2. However when setting the value of T, we note that Tis
to be set to 1 iffp dominates both the k'th point of Yj and the k'th point is not in the same
Xpartition asp.
Let Mij = I Su To compute Mij, we first sort the points of S by their Y
u*
partition index and within each Y partition, they are sorted by the X partition index. Fol-*
lowing this, each processor in row 1 determines if the X partition number of the point in
the processor to its left is less than that of its own point. If so, then using its own column
index and the Y partition number of its point, it can compute Rij = Suj, where i and
u*
j are, respectively, the X and Y partition numbers of its point (Ri = q (j- 1)N1/2*
where q is the processor's column index). Finally, Mij = Riv. Before describing
v
-22-
Step 1: Partition S into N1 2 sets Xi, 1 i N1 2 such that Xi = N1 2
1 I i < N1/2, and no point in Xi has a larger x-coordinate than any of the
points in Xi+1, 1 i < N12.
Step 2: Partition S into N1 2 sets Yj, 1 j N1/2 such that | Yj | = N12,
1 j < N1 2, and no point in Yj has a larger y-coordinate than any of the
points in Yj+1, 1 j < N12.
Step 3: Let Si = Xi n Yj (see Figure 23). For each p e Si, it is the case that
D(p,S) = # of points dominated byp in (Yj Sij) + # of points dominated
by p inXi + S,,I = DY(p) + DX(p) + I S,, .
u< v
Compute D(p,S) using this equation.
Figure 22 Strategy to compute D(p,S)
X1 X2
Figure 23 Partitioning of the points of S
how to do this final summation, we provide, in Figure 24, the steps in the overall ECDF
algorithm.
_ i i *i
S,,
-23 -
Step 1: Each point determines its X partition. This is done by sorting the points by x-
coordinate.
Step 2: Compute DX(p) for each point. This is done using an N1/2 x N1/2 block for
each e S.
Step 3: Each point determines its Y partition. For this, sort all points by y-coordinate.
Step 4: Compute DY(p) as in step 2.
Step 5: Sort S by Y partition and within Y partition by X partition. Each p e S
determines its Ri value where i andj are such that e Xi andp e Yj.
Step 6: Each point determines Mi = Y Ri.
v
Step 7: Each point computes D(p,S) = DY(p) + DX(p) + Mij(p).
Figure 24 RMESH algorithm to compute D(p,S)
Now, let us consider the computation of My = Riv. For each i, the N'12 Mij
v
values (1 < j < N1l2) are computed using the i'th N x N1/2 submesh. Initially, we have
Riv stored in the R variable of the PEs in the v'th N1/2 x N1/2 submesh of the i'th
N x N1 2 submesh (Figure 25(a)). The steps required to compute Myi are given in Figure
26. Its correctness is easily verified and its complexity is 0(1).
5. Triangulation
In this section, we develop an 0(1) time algorithm to triangulate a set S of N planar
points. The algorithm is for an N x N RMESH, an N x N PARBUS and an N x N
MRN. For simplicity, we assume that the N points have distinct x-coordinates and that no
four are cocircular. Our algorithm is easily modified to work when these assumption do
-24-
N1/2
N1/2 R N112 'N1/2
N3/4
N
R,2 i2
RI i
N1/2
al ...al a2...a2 aN1/4...
I N1/4 N1/4 .. N1/4
(a) i'th N x N12 submesh (b) N3/4 x N1/2 submesh
Figure 25
not hold. The overall strategy is given in Figure 27. We begin, in step 1, by partitioning
the N points into N2/3 sets, Si, each of size N1 3. For this, we assume N is a power of 3.
The partitioning is done by x-coordinate. Each Si contains points that are to the left of
those in Si+i, 1 < i < N2/3. This partitioning is possible because of our assumption
that the x-coordinates are distinct. To accomplish the partitioning, the N points are sorted
by x-coordinate using the 0(1) sorting algorithm of [NIGA92].
In step 2, the Voronoi diagram of each Si is computed. For this, each point P e Si
computes its voronoi polygon which defines all points in the plane that are closer to p
than to any of the other points in Si. As noted in [PREP85], for any two points p and q,
the set of points that are closer top than to q is the half plane containing p that is defined
by the perpendicular bisector of the line joining p and q. The Voronoi polygon, V(p),
(i.e., the polygon whose interior is the set of points closer top than to any other point in
Si) is defined by the intersection of the N1/3 1 half planes obtained by considering all
q e Si {p}. V(p) is comprised of portions of at most N1/3 1 perpendicular bisectors.
Consider the five points {a, b, c, d, e} of Figure 28 and suppose we are determining
which portion of the perpendicular bisector p3 of ab is a boundary of V(a). The perpen-
dicular bisector of ac intersects p3 atf Since points in the half plane above the perpen-
dicular bisector of ac are closer to c than to a, only the portion of p3 with x-coordinate
2 fx (f.x is the x-coordinate of f) may contribute to V(a). Similarly, since the
-25-
Step 1: [Decompose R's using the radix N1 4]
PE[1, 1] of each of the N12 x N1 2 submeshes computes a and b such that
R = a*N14 + b and 0 b < N1 4. Since I Y, = N12, each Ri,, N12.
Hence, 0 a < N14.
Step 2: [Prefix sum the a's]
(a) The i'th N x N12 submesh is partitioned into N1/4 N3/4 x N1/2
submeshes. In each of these, the N1/4 a values contained are routed to the row
1 processors such that the k'th group of N1/4 such processors contain the k'th a
value (Figure 15(b)).
(b) In each N3/4 x N1/2 submesh, the unary representation of the a's is
computed. For this, the r'th processor in an N1/4 sets its D value to 1 ifr is less
than or equal to its a value. Otherwise, it sets D to zero.
Figure 26 Computation of Mij (continued on next page)
perpendicular bisector of ad intersects up at g, only the portion of up with x-coordinate
2 g.x may contribute to V(a). Finally, since the perpendicular bisector of ae intersects
up at h and since points below (or to the right) of this bisector are closer to e than to a,
only the portion of p3 with x-coordinate < h. x can contribute to V(a). So, by looking at
the three other perpendicular bisectors, we see that only the portion of up that has x-
coordinate 2 mx = max {f.x, g.x} and < m.n = h.x is a boundary of V(a). Note that
if mx < mn, then up does not contribute to V(a).
This strategy for step 2 of Figure 17 may be implemented on an RMESH or
PARBUS. First, the points of Si are broadcast to all rows of the N x N1/3 submesh
using column buses. Next, we compute V(p) for each point p e Si using N2/3 x N1/3
partitions of the N x N1/3 partition. Thej'th such partition (from the top) is used to com-
pute V(p) for thej'th point p e Si. For this, the N2/3 x N1/3 partition is further parti-
tioned into N1/3 N1/3 x N1/3 partitions as in Figure 29. The k'th N1/3 x N1/3, Ak, is
used to determine the portion of the perpendicular bisector ofppk (where pk is the k'th
point of Si) that contributes to V(p), 1 < k N13 ,pk # p. For this, the points andpk
-26-
Step 2: (c) The one's in row 1 of the N3/4 x N1/2 submesh are prefix summed using
the RMESH ranking algorithm of [JENQ91a]. Let the overall sum for each
group of N14 a's be stored in variable A of the [1,1] PE of the N3/4 x N/4
submesh.
(d) [Prefix sum the A's]
We now have N1/4 A values to be prefix summed. Each is in the range [0,
N1l2]. The A's are decomposed using radix N1/4 such that A = c*N1/4 + d
where 0 < d < N 4. The N1/4 c's may be prefix summed using an
N1 2 x N1 2 submesh by routing them to row 1 of any N1 2 x N1 2 submesh,
computing their unary representation (as in (b) above), and ranking (as in (c)
above). The d's are prefix summed similarly. By adding together corresponding
pairs of prefix sums of c's and d's, the prefix sums of the A's are obtained.
(e) [Obtain prefix sum of a's] The prefix sum of the a's is obtained by adding
together the prefix sum of the a's in each N1/4 group (as computed in (c)) and
appropriate prefix sum of A (as computed in (d)).
Step 3: [Prefix sum the b's] This is similar to Step 2.
Step 4: [Obtain Mij] Mij is the sum of the a and b prefix sums computed in step 3 and
step 4 for the j'th (from the bottom) N1 2 x N1 2 submesh of the i'th
N x N1/2 submesh.
Figure 26 Computation of Mi (continued from previous page)
are broadcast to all processors Ak. Each processor then computes the perpendicular
bisector up3 ofppk as well as that ofpq where q is the point initially in the processor. The
processors initially containing p orpk are excluded. The intersection between p3 and the
bisector ofpq is obtained. Let the x-coordinate of this intersection be y. The processor
now sets itself to < y or 2 y depending on which portion of p3 has not been eliminated.
In case p3 is parallel to the bisector ofpq and on the same side of p, then the processor
-27-
Step 1: Partition S into N2/3 sets, Si, each containing N1/3 points. The partitioning is
done such that all points in Si are to the left of those in Si +1, 1 i < N2 3
Step 2: Using an N x N1/3 submesh for each Si, compute the Voronoi diagram for Si,
1 i < N2/3
Step 3: Compute the straight-line dual of each of the N2/3 Voronoi diagrams of step 2.
Step 4: Partition the region not in the union of the convex hulls of the Si's into basic
polygons and triangulate these.
Figure 27 Triangulating N planar points
cX
d
-b
a
e
Figure 28 Determining portion of V(a) contributed by ua
-28-
sets itself to 2 oo, if p3 is farther from p than the bisector of pq and to < -o otherwise.
Following this, the maximum mx of the 2 values and the minimum mn of the < values
is found using all processors of Ak. If mn
tributing endpoints together with the associated vertices p and pk are saved in processor
[1,1] of Ak.
N'/3
N1/3
N1/3
Ak N1/3
N2/3
N1/3
Figure 29 N1/3 x N1/3 partitions
The purpose of computing the straight line dual in step 3 is to obtain the triangula-
tion of the regions defined by the convex hull of each of the Si's. Theorem 5.11 of
[PREP85] states that the straight line dual of the Voronoi diagram of S is a triangulation
of S (Strictly speaking, this is true only if no four points of S are cocircular. However,
when four or more points are cocircular, the completion of the triangulation is easy once
the dual has been obtained). The straight line dual of the Voronoi diagram of Si is easy to
obtain. Suppose that (u,v) is an edge of this diagram. (u,v) is a portion of some perpen-
dicular bisector. Let the points of Si associated with this bisector be a and b. Then, (a,b)
is an edge of the straight line dual. Since, in the computation of the Voronoi diagram
(step 3 of Figure 17), we associate with each bisector the two points of Si that generate
the perpendicular bisector, the edges of the dual are easily generated.
When implementing steps 2 and 3, it is best to combine them so that when we detect
mn < mx (in step 2), the edge (p,pk) is generated providedp.x > pk.x (this avoids the
generation of two copies of each edge of the dual). Since the number of edges in the tri-
angulation of Si is at most 3 N1/3 6, these edges may be routed to the N1 3 row 1
-29-
Figure 30 Partitioning region exterior to convex hulls
processors in 0(1) time. Each processor in row 1 gets atmost 3 edges.
Following step 3, the regions enclosed by the convex hulls of the Si's, has been tri-
angualted (Figure 30) and we need to triangulate the region outside. For this, we use the
technique of Wang and Tsin [WANG87]. The exterior region is partitioned into 'simple
polygons' (Figure 30) which are then triangulated independently. The algorithm of
[WANG87] to partition the external region is given in Figure 31. This has been modified
to account for the fact that we have N2/3 Si's while Wang and Tsin have only N1/2
Step 1 is easily done in 0(1) time using an N x N1/3 submesh for each Si using the
N3 processor algorithm of Figure 2. Step 2 can be done in 0(1) time using an
N 3x N 3submesh for each S,j < i. The least slope Uij,j < i can be found by first
finding the least slope Uij in each group of N1 3 Uij's. This leaves us with at most N1 3
candidates from which the least slope one can be found. The computation is confined to
one N x N1/3 submesh for each Si. The leftmost and rightmost points of Si are easily
identified in 0(1) time. This may be done by using an N 13 x N1 3 submesh for each Si
and comparing each point of Si against each other point or by using the convex hull of Si
and comparing adjacent points in the hull. Hence, step 4 is also easy to do in 0(1) time.
Wang and Tsin [WANG87] have shown that the algorithm of Figure 32 correctly
identifies the exterior polygon to which each point of the convex hull of the Si's belongs.
In this algorithm, /i and ri, respectively, denote the leftmost and rightmost points of
CH(i); a, and bi denote the end points of Ui with bi being a point of CH(Si) and ai
being a point of CH(j) for some < i; and B(u) denotes the exterior polygon to which
point u belongs. The portion of the upper convex hull of Si that is comprised of the hull
points from bi to 1i (counterclockwise) defines a unique exterior polygon (see Figure 30).
-30-
Step 1: Compute the convex hull CH(i) of each Si, 1 I i N2/3
Step 2: For each Si, determine the upper, Uij, and lower, Lij, tangents of Si with
respect to Sj, 1 < j < i < N23.
Step 3: For each Si, determine the upper tangent Ui that has the least slope among all
Uij, j < i and the lower tangent L, that has maximum slope among all Lij,
j < i.
Step 4: For each Si, 1 < i < N2/3 determine the line Mi that joins the rightmost point
of CH(i 1) and the leftmost point of CH(i).
Step 5: Identify the exterior polygons formed by the Ui's, Li's, Mi's, and the
boundaries of the convex hull of the Si's. There are exactly 2N2/3 2 such
polygons.
Figure 31 WANG's Algorithm [WANG87] to partition the exterior of the CH(i)'s
into polygons
This polygon is called Pi. While it is possible for a point u to belong to more than one
Pi, B(u) is defined to be the largest such that u is a point of P. Note that points on the
lower hull of the Si's are on polygons bounded from above by the Mi's. These are
labeled Q's in Figure 30. B(u) may be similarly defined for these points.
To implement the algorithm of Figure 32, it is best to identify the ai, bi, ri, li ver-
tices associated with each Si during the computation of ui and 1i in step 3 of Figure 31
and during the computation of Mi in step 4 of Figure 31. Step 1 of Figure 32 is easily
performed in 0(1) time using an N x N1/3 submesh for each Si. For step 2, we note that
there is exactly one bi associate with each Si, i > 1. The corresponding ai can be stored
along with it when Ui is computed in Figure 31. This results in a pair (ai, bi) stored in a
single processor. To compute B(ai), we need to accumulate in the N x N1/3 submesh
associated with Si, all (aj, bj) pairs forj > i. This can be done by having the PE that
-31 -
{Find B(u) for points in the upper hull of the Si's}
Step 1: For all points u that are between their bi and 1i when moving along CH(i)
counterclockwise from bi (bi excluded, /i included), set B(u) = i.
Step 2: Ifuisanai, thenB(u) = leastjsuchthat aj.x < ai.x < bj.x.
Step 3: If u is a bi, then B(u) is computed similar to the step 2 computation of B(ai).
Step 4: Foru = ri, setB(u) = i + 1.
Step 5: Let b -ri denote the segment of the upper hull of CH(i), 1 I i < N2/3 with
endpoints bi and ri. Sort bi, ri, and all aj's in bi---ri into ascending order by
x-coordinate.
Step 6: Let bi = ai,, ai,, ., ai = ri be the sorted sequence ofay's in bi--ri. For
each u aiJ, 1 j t in bi-ri, setB(u) = B(aij) iff aij_1.x < u.x
aij.x.
{Find B(u) for points in the lower hull of the Si's}
[Similar to steps 1-6]
Figure 32 Algorithm to determine B(u)
contains (ai, bi) route the pair to the row i processor on its column (using a column bus)
and then broadcasting these pairs along row buses to the N x N1/3 submeshes that need
them. The least that satisties the inequality of step 2 is now easily found in 0(1) time (
in each N x N1/3submesh). Step 3 is similar to step 2 and step 4 is trivially done in 0(1)
time.
The segment bi-ri of the upper hull of CH(i) is easily identified from CH(i) in
0(1) time for each Si. All points on this segment that are aj's have been identified during
-32-
step 3 of Figure 31. The sort of step 5 can therefore be done in 0(1) time using an
N2/3 x N1/3 submesh of the N x N1/3 submesh that corresponds to each Si. Following
the sort, each point of b---ri that is not an aj can determine its u value in 0(1) time
using a row of N1/3 processors. For this, the sorted ai's are broadcast down column
buses of each N x N1/3 submesh and the unique j satisfying the inequality of step 6 is
found using row bus splitting. Once the B(u)'s have been computed, the convex hull
points of all the Si's are sorted by their B(u) values. This brings points in the same
polygon next to each other. The sort is accomplished in 0(1) time using the algorithm of
[NIGA92]. To triangulate each polygon, it is desirable to have the polygon points ordered
by the Si's from which they came and within Si, ordered in convex hull order. This can
be incorporated into the sort by B(u). Each polygon can now be triangulated using an
N x di (where di is the size of the polygon) submesh. This triangulation is fairly
straightforward as each polygon is comprised of two monotone segments. The steps are
given in [WANG87] for a PRAM and are easily performed on an RMESH or PARBUS
of size N x di (N 2 di) in 0(1) time.
6. Conclusions
We have developed constant time RMB algorithms for the convex hull, the smallest
enclosing rectangle, ECDF searching, and triangulation problems. All of our algorithms
are for the case of N planar points and all work on N x N RMBs. The algorithms apply
to all the three models (RMESH/PARBUS/MRN).
7. References
[BENA91] Y. Ben-Asher, D. Peleg, R. Ramaswami, and A. Schuster, "The power of
reconfiguration," Journal of Parallel and Distributed Computing, 13,
139-153, 1991.
[FREE75] H. Freeman and R. Shapira, "Determining the minimal-area encasing rec-
tangle foran arbitrary closed curve," Communications ofACM, 18, 409-
413, 1975.
[JANG92a] J. Jang and V. Prasanna, "An optimal sorting algorithm on reconfigurable
meshes," Proceddings 6th International Parallel Processing Symposium,
IEEE Computer Society, Los Alamitos, CA, 130-137, March 1992.
-33 -
[JANG92b]
[JANG92c]
[JANG92d]
[JANG92e]
[JENQ91a]
[JENQ91b]
J. Jang, and V. K. Prasanna, "A fast sorting algorithm on higher dimension
reconfigurable mesh," Proceedings 26th Conference on Information Sci-
ences and Systems, 1992.
J. Jang, H. Park, and V. K. Prasanna, "A fast algorithm for computing his-
togram on reconfigurable mesh," Department of EE-Systems, Technical
Report IRIS-290, University of Southern California, LosAngeles, Feb
1992.
J. Jang, H. Park, and V. K. Prasanna, "An optimal multiplication algo-
rithm on reconfigurable mesh," Department of EE-Systems, Technical
Report IRIS-291,University of Southern California, LosAngeles, March
1992.
J. Jang and V. K. Prasanna, "Efficient Parallel Algorithms for Some
Geometric Problems on Reconfigurable Mesh," Proceedings of 1992
International Conference on Parallel Processing, CRC Press, Boca Raton,
FL, 127-130, Aug 1992.
J. Jenq and S. Sahni, "Reconfigurable mesh algorithms for image shrink-
ing, expanding, clustering, and template matching," Proceedings 5th
International Parallel Processing Symposium, IEEE Computer Society
Press, Los Alamitos, CA, 208-215, 1991.
J. Jenq and S. Sahni, "Reconfigurable mesh algorithms for the Hough
transform," Proc. 1991 International Conference on Parallel Processing,
CRC Press, Boca Raton, FL, 34-41, 1991.
[JENQ91c] J. Jenq and S. Sahni, "Reconfigurable mesh algorithms for the area and
perimeter of image components," Proc. 1991 International Conference on
Parallel Processing, CRC Press, Boca Raton, FL, 280-281, 1991.
[JEON90] C.S. Jeong, and D.T. Lee, "Parallel Geometric Algorithms on Mesh-
connected Computers," Algl' ilhill/i a, 5, 155-177, 1990.
[LI89a] H. Li and M. Maresca, "Polymorphic-torus architecture for computer
vision," IEEE Trans. on Pattern & Machine Intelligence, 11, 3, 133-143,
1989.
[LI89b] H. Li and M. Maresca, "Polymorphic-torus network," IEEE Trans. on
Computers, C-38, 9, 1345-1351, 1989.
-34-
[LU85] M. Lu and P. Varman, "Solving geometric proximity problems on mesh-
connected computers," Proceedings 1985 Workshop on Computer Archi-
tecture, Pattern Analysis, Image Database Management, 248-255, 1985.
[LU86] M. Lu, "Constructing the Voronoi diagram on a Mesh-Connected Com-
puter," Proceedings 1986 International Conference on Parallel Process-
ing, The Pennsylvania State University Press, University Park, PA, 806-
809, 1986.
[MILL87b] R. Miller, V. K. Prasanna Kumar, D. Reisis and Q. Stout, "Parallel Com-
putation on Reconfigurable meshes," Department of EE-systems, Techni-
cal Report IRIS#229, University of Southern California, LosAngeles,
1987.
[MILL88] R. Miller, and Q. F. Stout, "Efficient Parallel Convex Hull Algorithms,"
IEEE Transactions on Computers, c-37, 12, 1605-1618, Dec 1988.
[MILL89] R. Miller, and Q.F. Stout, "Mesh Computer Algorithms for Computational
Geometry," IEEE Transaction on Computers, 38, 3, 321-340, 1989.
[MILL91a] R. Miller, V. K. Prasanna Kumar, D. Reisis and Q. Stout, "Efficient paral-
lel algorithms for intermediate level vision analysis on the reconfigurable
mesh", Parallel Architectures and Algorithms for Image Understanding,
Viktor K. Prasanna ed., 185-207, Academic Press, NewYork, 1991
[MILL91b] R. Miller, V. K. Prasanna Kumar, D. Reisis and Q. Stout, "Image process-
ing on reconfigurable meshes", From Pixels to Features II, H. Burkhardt
ed., Elsevier Science Publishing, Amsterdam, 1991.
[NIGA92] M. Nigam, and S. Sahni, "Sorting n Numbers on n x n Reconfigurable
Meshes with Buses, CIS, University of Florida, Technical Report TR-92-
04, 1992
[PREP85] F.P. Preparata and M.I. Shamos, Computational Geometry: An Introduc-
tion, Springer-Verlag, New York, 1985.
[REIS92] D.I. Reisis, "An Efficient Convex Hull Computation on the
Reconfigurable Mesh," Proceedings 1992 International Parallel Process-
ing Symposium, IEEE Computer Society Press, Los Alamitos, CA, 142-
145, 1992.
-35 -
[WANG87]
[WANG90a]
[WANG90b]
[WEEM87]
[WEEM89]
C.A. Wang, and Y.H. Tsin, "An O(logn) time parallel algorithm for tri-
angulating a set of points in the plane," Information Processing Letters,
25, 1, 15-47, 1988.
B. Wang and G. Chen, "Constant time algorithms for the transitive closure
and some related graph problems on processor arrays with reconfigurable
bus systems," IEEE Trans. on Parallel and Distributed Systems, 1, 4,
500-507, 1990.
B. Wang, G. Chen, and F. Lin, "Constant time sorting on a processor array
with a reconfigurable bus system," Information Processing Letters, 34, 4,
187-190, 1990.
C. C. Weems, S. P. Levitan, A. R. Hanson, E. M. Riseman, J. G. Nash,
and D. B. Sheu, "The image understanding architecture," COINS Techni-
cal Report 87-76, University of Massachusetts at Amherst, 1987.
C. C. Weems, S. P. Levitan, A. R. Hanson, E. M. Riseman, J. G. Nash,
and D. B. Sheu, "The image understanding architecture, International
Journal of Computer Vision, 2, 251-282, 1989. ~
*
* |