Finding A Minimum Cost Acceptable Path in Parallel

Theodore Johnson Panos E. Livadas

t, 1I ..' i.. 11.cis.ufl.edu pel@cis.ufl.edu

University of Florida, Dept. of CIS

Gainesville, FL 32611-2024

Abstract

We consider the problem of finding a minimum cost acceptable path on a toroidal grid graph, where each horizontal and each vertical edge have the same orientation. An acceptable path is a closed path that makes a complete horizontal and vertical circuit. We exploit the structure of this graph to develop efficient parallel algorithms for a message passing computer. Given p processors and an m by n toroidal graph, our algorithm will find the minimum cost acceptable path in O(mn log(m)/p) steps if p = O(mn/((m + n) log(mn/(m + n)))), which is an optimal speedup. We also show that the algorithm will send O(p^2(m + n)) messages. The solution of this problem has applications in surface reconstruction [6, 5, 9].

1 Introduction

The task of finding minimum cost paths in parallel is a well studied problem. Parallel single source

shortest path algorithms include those by Deo, Pang, and Lord [2], Quinn and Yoo [11], and Paige and

Kruskal [10]. Parallel all-pairs shortest path algorithms have been proposed based on solving the single

source problem at each source [1, 8] and on matrix multiplication [4, 7, 13, 14, 10]. While Paige and Kruskal's

algorithms [10] achieve optimal speedups under certain conditions, the proposed parallel shortest path

algorithms are generally either limited in parallelism, or suboptimal. Efficient and optimal algorithms

can be found by restricting the problem domain. For example, Eckstein [3], and Reghbati and Corneil

[12] use breadth-first search to construct efficient parallel algorithms for the single source problem when

all edges have the same weight.

In this paper, we will examine a particular type of shortest path problem, that of finding the minimum

cost acceptable path in a toroidal graph. This problem has been studied by Keppel [6], Fuchs, Kedem,

and Uselton [5], and by Livadas [9] in the context of surface reconstruction from a set of points on two

planes. There, it is shown that the best surface can be found in O(mn log(m)) time, where the toroidal

graph that the surface maps to is of dimensions m by n. We make use of the structure of the graph to

find an optimal parallel algorithm, suitable for implementation on a message passing parallel computer.

University of Florida Dept. of CIS Technical Report 92-010, available via anonymous ftp at cis.ufl.edu:cis/tech-reports/tr92/tr92-010.ps.Z

1.1 Definition of the problem

We review the notation from [5] and [9]. Let m and n be two positive integers. We will denote by $\mathcal{M}$ and $\mathcal{N}$ the sets of all non-negative integers bounded above by $(m - 1)$ and $(n - 1)$, respectively. In addition, we will denote by $\mathbb{R}^+$ the set of positive real numbers. We will search for paths on a particular type of toroidal graph, a p2-toroidal graph, which we define to be a toroidal directed and weighted graph, $G = (V, E, w)$, where the sets V, E, and the function w are defined as follows:

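1. The set of vertices, V, can be enumerated in such a way that

$$V = \{ v \in G : v = v_{i,j},\ \forall i \in \mathcal{M},\ \forall j \in \mathcal{N} \}$$

2. The set of edges, E, consists of the horizontal edges $e^{\rightarrow}_{i,j}$ and the vertical edges $e^{\downarrow}_{i,j}$, and is defined as follows

$$E = \{ e^{\rightarrow}_{i,j} : e^{\rightarrow}_{i,j} = v_{i,j}\, v_{i,(j+1) \bmod n},\ \forall i \in \mathcal{M},\ \forall j \in \mathcal{N} \} \cup \{ e^{\downarrow}_{i,j} : e^{\downarrow}_{i,j} = v_{i,j}\, v_{(i+1) \bmod m,\, j},\ \forall i \in \mathcal{M},\ \forall j \in \mathcal{N} \}$$

3. The function w is a weight function (i.e., $w : E \to \mathbb{R}^+$) and we write $w^{\rightarrow}_{i,j}$ and $w^{\downarrow}_{i,j}$ to denote the images $w(e^{\rightarrow}_{i,j})$ and $w(e^{\downarrow}_{i,j})$, respectively.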

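Let i be an integer such that $i \in \mathcal{M}$ and consider the vertex $v_{i,0} \in V$. We define an acceptable path at i, $\pi$, to be a closed path with initial vertex $v_{i,0}$ that satisfies the following two properties, where $e^{\rightarrow}_{s,k}$ and $e^{\downarrow}_{s,k}$ denote the horizontal and the vertical edge leaving $v_{s,k}$:

1. $\forall k \in \mathcal{N}\ \exists s \in \mathcal{M}$ s.t. $\big( (e^{\rightarrow}_{s,k} \in \pi) \land (e^{\rightarrow}_{s_1,k} \in \pi \land e^{\rightarrow}_{s_2,k} \in \pi \Rightarrow s_1 = s_2) \big)$

2. $\forall s \in \mathcal{M}\ \exists k \in \mathcal{N}$ s.t. $\big( (e^{\downarrow}_{s,k} \in \pi) \land (e^{\downarrow}_{s,k_1} \in \pi \land e^{\downarrow}_{s,k_2} \in \pi \Rightarrow k_1 = k_2) \big)$

To say it differently, each acceptable path contains exactly one horizontal edge between any two adjacent columns and exactly one vertical edge between any two adjacent rows.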

Now let P(i) denote the set of all acceptable paths at i. It is easy to see that if $\pi \in P(i)$, the length of $\pi$ is equal to $(m + n)$. An acceptable path is either a vertex-simple cycle, or consists of two vertex-simple cycles that share one vertex. Figure 1 illustrates a p2-toroidal graph (m = 5, n = 6), and the two types of acceptable paths.

We define the cost of a path $\pi$ to be the sum of the weights of the edges in $\pi$. We define the minimum cost path at i, denoted by p(i), to be the one among all $\pi \in P(i)$ whose cost is minimal. Finally, let $p_G$ be the path among all p(i)'s that yields the minimal cost.
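The definitions above can be made concrete with a small sketch. It assumes (this representation is not from the paper) that the weights are stored in two m-by-n arrays, `w_right[i][j]` for the horizontal edge leaving $v_{i,j}$ and `w_down[i][j]` for the vertical edge leaving $v_{i,j}$, and that an acceptable path at i is written as a string of m + n moves containing exactly n `R` (right) and m `D` (down) moves. The brute-force search is only meant to illustrate the definition of the minimum cost acceptable path on tiny graphs.

```python
from itertools import combinations

def path_cost(w_right, w_down, i, moves):
    """Cost of the closed path that starts at v[i][0] and follows `moves` on the torus."""
    m, n = len(w_right), len(w_right[0])
    # An acceptable path uses one horizontal edge per column and one vertical edge per row.
    assert moves.count("R") == n and moves.count("D") == m
    r, c, cost = i, 0, 0
    for mv in moves:
        if mv == "R":
            cost += w_right[r][c]
            c = (c + 1) % n          # wrap around horizontally
        else:
            cost += w_down[r][c]
            r = (r + 1) % m          # wrap around vertically
    assert (r, c) == (i, 0)          # the path is closed
    return cost

def brute_force_min(w_right, w_down):
    """Try every acceptable path at every start row; exponential, for illustration only."""
    m, n = len(w_right), len(w_right[0])
    best = None
    for i in range(m):
        for rights in combinations(range(m + n), n):
            moves = ["D"] * (m + n)
            for t in rights:
                moves[t] = "R"
            cost = path_cost(w_right, w_down, i, moves)
            if best is None or cost < best[0]:
                best = (cost, i, "".join(moves))
    return best

# Hypothetical 2 x 2 example weights.
w_right = [[1, 5], [2, 1]]
w_down = [[4, 1], [1, 3]]
```

With these example weights, the path `RDRD` at i = 0 uses four edges of weight 1, so it is a minimum cost acceptable path.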

Figure 1: A p2-toroidal graph and two acceptable paths at i = 2

2 Mapping the Toroidal Graph to a Planar Graph

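By cutting G open and gluing the two ends of G together we obtain a new planar graph $G' = (V', E', w')$, as illustrated in Figure 2. We define $\mathcal{M}' = \{0, 1, 2, \ldots, 2m\}$ and $\mathcal{N}' = \{0, 1, 2, \ldots, n\}$, so that

$$V' = \{ v' \in G' : v' = v'_{i,j},\ \forall i \in \mathcal{M}',\ \forall j \in \mathcal{N}' \}$$

$$E' = \{ e'^{\rightarrow}_{i,j} : e'^{\rightarrow}_{i,j} = v'_{i,j}\, v'_{i,j+1},\ i \in \mathcal{M}',\ 0 \le j < n \} \cup \{ e'^{\downarrow}_{i,j} : e'^{\downarrow}_{i,j} = v'_{i,j}\, v'_{i+1,j},\ 0 \le i < 2m,\ j \in \mathcal{N}' \}$$

Furthermore, the weight function $w' : E' \to \mathbb{R}^+$ is defined by $w'(e'^{\rightarrow}_{i,j}) = w(e^{\rightarrow}_{i \bmod m,\, j \bmod n})$ and $w'(e'^{\downarrow}_{i,j}) = w(e^{\downarrow}_{i \bmod m,\, j \bmod n})$ for each i and j.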

Let i be an integer variable with $i \in \mathcal{M}$; then the simple path in $G'$ defined by

$$v'_{i,0} = v'_{i_1,j_1},\ v'_{i_2,j_2},\ \ldots,\ v'_{i_{m+n+1},\,j_{m+n+1}} = v'_{(m+i),n}$$

has as a preimage in G the path

$$v_{i,0} = v_{i_1 \bmod m,\ j_1 \bmod n},\ v_{i_2 \bmod m,\ j_2 \bmod n},\ \ldots,\ v_{i_{m+n+1} \bmod m,\ j_{m+n+1} \bmod n} = v_{i,0} \qquad (1)$$

which is an acceptable path at i in G.

Hence, there is a natural one-to-one and onto mapping from the set of simple paths from $v'_{i,0}$ to $v'_{(m+i),n}$ in $G'$ to the set of acceptable paths in G which start and end at $v_{i,0}$, $\forall i \in \mathcal{M}$. Our problem is therefore equivalent to the following:


Figure 2: Mapping the p2-toroidal graph of Figure 1 to a planar graph. The images of the two acceptable

paths in Figure 1 (i) and (ii) are shown in (i) and (ii), respectively.

Find the minimum cost path $\pi'(i)$ among all simple paths with initial vertex at $v'_{i,0}$ and terminal vertex at $v'_{(m+i),n}$, $\forall i \in \mathcal{M}$. The minimum cost path $p'_G$ of $G'$ is the one among all $\pi'(i)$'s ($i \in \mathcal{M}$) of minimum cost, and the required path $p_G$ is the preimage of $p'_G$ and can be obtained via the mapping defined in equation (1).
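The mapping between the toroidal and the planar graph can be sketched as follows. This is an illustration under assumed conventions, not the authors' code: the toroidal weights are held in two m-by-n arrays (`w_right` and `w_down`, hypothetical names), the planar graph has rows 0..2m and columns 0..n, and a planar path is a list of (row, column) pairs.

```python
def unroll(w_right, w_down):
    """Build the planar weights w' from the toroidal weights w by reducing indices mod m and mod n."""
    m, n = len(w_right), len(w_right[0])
    # Horizontal edges leave columns 0..n-1 in every planar row 0..2m.
    planar_right = [[w_right[i % m][j] for j in range(n)] for i in range(2 * m + 1)]
    # Vertical edges leave rows 0..2m-1 in every planar column 0..n (column n wraps to column 0).
    planar_down = [[w_down[i % m][j % n] for j in range(n + 1)] for i in range(2 * m)]
    return planar_right, planar_down

def preimage(planar_path, m, n):
    """Map a planar path back to the torus, as in equation (1)."""
    return [(i % m, j % n) for (i, j) in planar_path]

# A planar path from v'(1,0) to v'(m+1,n) for m = 2, n = 3 (hypothetical example).
m, n = 2, 3
planar_path = [(1, 0), (1, 1), (2, 1), (2, 2), (3, 2), (3, 3)]
torus_path = preimage(planar_path, m, n)
```

The planar path has m + n + 1 vertices, and its preimage starts and ends at the same toroidal vertex, so it is a closed path at i = 1.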

In the remainder of this paper, we will work in the planar graph, and we will drop the primes that

distinguish between the toroidal and the planar graph.

3 Non-crossing Paths

Let p(i) and p(j) be two minimal cost acceptable paths at $v_{i,0}$ and $v_{j,0}$, respectively. Assuming that $i < j$, we say that p(i) does not cross p(j) if and only if, whenever an edge $e_{s,k}$ (horizontal or vertical) belongs to p(i), then for all $r > 0$ there is no edge of the form $e_{s-r,k}$ in p(j). In other words, p(i) lies entirely "above" p(j) in the sense that they do not "cross", but they may still have common vertices and edges. In the planar image, no two minimal cost paths p(i) and p(j) cross; for if they did, we could splice the paths and come up with a shorter path for one of p(i) and p(j).

For each $k \in \mathcal{N}$ we define two integers $min_{i,k}$ and $max_{j,k}$ as follows


Figure 3: Determining the subgraph of G spanned by V(0, m).

$$min_{i,k} = \min\{ s \in \mathcal{M} \ \text{s.t.}\ (e^{\rightarrow}_{s,k} \lor e^{\downarrow}_{s,k}) \in p(i) \}$$

$$max_{j,k} = \max\{ s \in \mathcal{M} \ \text{s.t.}\ (e^{\rightarrow}_{s,k} \lor e^{\downarrow}_{s,k}) \in p(j) \}$$

where $e^{\rightarrow}_{s,k}$ and $e^{\downarrow}_{s,k}$ denote the horizontal and the vertical edge leaving $v_{s,k}$. We next define $U_k(i,j) = \{ v_{s,k} \in V : min_{i,k} \le s \le max_{j,k} \}$ and $G_{i,j}$ to be the subgraph of G that is spanned by the set of vertices $V(i,j)$, where

$$V(i,j) = \bigcup_{k \in \mathcal{N}} U_k(i,j) \qquad (2)$$

According to our earlier discussions we can see that the determination of the path $p_G$ can be done as follows. First, we determine the path p(0) (Fig. 3(i)). Now, p(m) is a copy of p(0) which starts at $v_{m,0}$ and ends at $v_{(m+m),n}$ (Fig. 3(ii)). Hence, calculation of all other paths p(1), p(2), ..., p(m - 1) can be restricted to the subgraph of G that is obtained by "cutting off" the portion of the graph G lying above p(0) and below p(m). In view of equation (2), these paths may be found in the subgraph $G_{0,m}$ of G spanned by V(0, m) (Fig. 3(iii)).

Suppose that instead of calculating p(1) after calculating p(0), you calculate p(m/2). Now, the paths p(i), i = 1, ..., m/2 - 1 can be calculated in V(0, m/2), and the paths p(i), i = m/2 + 1, ..., m - 1 can be calculated in V(m/2, m). The regions V(0, m/2) and V(m/2, m) can also be divided into finer pieces recursively, resulting in the Serial Algorithm. For a path index s, we define W(s) to be the subgraph of G that the serial algorithm searches to find p(s). Fuchs et al. [5] show that this algorithm reduces the time to compute the minimum cost acceptable path from mT(m, n) to T(m, n) log(m), where T(m, n) is the time to solve the single source shortest path problem on a p2-toroidal graph.

Serial Algorithm

shortestpath(G) {
    /* find the shortest path in the 2m x n planar graph G */
    P0 = findonepath(G,0)
    G' = restrict(G,P0)
    {P1,...,Pm-1} = findallpaths(G',1,m-1)
    return(min(P0,...,Pm-1))
}

findallpaths(G,pathlo,pathhi) {
    /* here m is the index of the middle path, not the graph dimension */
    if(pathlo > pathhi) return(empty set)
    m = (pathlo+pathhi)/2
    Pm = findonepath(G,m)
    Plo = findallpaths(restrict(G above Pm),pathlo,m-1)
    Phi = findallpaths(restrict(G below Pm),m+1,pathhi)
    return(union(Plo,Pm,Phi))
}
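The divide-and-conquer scheme can be sketched as runnable code. The sketch below is one possible reading, under stated assumptions rather than the authors' implementation: toroidal weights are given as m-by-n arrays `w_right` and `w_down` (hypothetical names), a path is summarized by the first and last planar row it occupies in each column, and "restricting" the graph means limiting column k to the rows between the first row of the upper bounding path and the last row of the lower bounding path. The single-path search is a plain dynamic program over the restricted rows.

```python
import math

def unroll(w_right, w_down):
    """Planar copy of the torus: rows 0..2m, columns 0..n (weights taken mod m, mod n)."""
    m, n = len(w_right), len(w_right[0])
    pr = [[w_right[i % m][j] for j in range(n)] for i in range(2 * m + 1)]
    pd = [[w_down[i % m][j % n] for j in range(n + 1)] for i in range(2 * m)]
    return pr, pd

def find_one_path(pr, pd, s, m, n, top, bot):
    """Cheapest down/right path from (s,0) to (s+m,n) using rows top[k]..bot[k] in column k.
    Returns (cost, first row used per column, last row used per column)."""
    cost, back = {(s, 0): 0.0}, {}
    for k in range(n + 1):
        for r in range(max(s, top[k]), min(s + m, bot[k]) + 1):
            cands = []
            if (r - 1, k) in cost:      # arrive by a vertical edge
                cands.append((cost[(r - 1, k)] + pd[r - 1][k], (r - 1, k)))
            if (r, k - 1) in cost:      # arrive by a horizontal edge
                cands.append((cost[(r, k - 1)] + pr[r][k - 1], (r, k - 1)))
            if cands:
                cost[(r, k)], back[(r, k)] = min(cands)
    first, last = [s + m] * (n + 1), [s] * (n + 1)
    v = (s + m, n)
    while v is not None:                # walk back from the sink to the source
        r, k = v
        first[k], last[k] = min(first[k], r), max(last[k], r)
        v = back.get(v)
    return cost[(s + m, n)], first, last

def find_all_paths(pr, pd, m, n, lo, hi, upper, lower, out):
    """Find p(lo..hi) between the bounding paths `upper` and `lower`, middle path first."""
    if lo > hi:
        return
    mid = (lo + hi) // 2
    c, first, last = find_one_path(pr, pd, mid, m, n, upper[0], lower[1])
    out[mid] = c
    find_all_paths(pr, pd, m, n, lo, mid - 1, upper, (first, last), out)
    find_all_paths(pr, pd, m, n, mid + 1, hi, (first, last), lower, out)

def min_acceptable_cost(w_right, w_down):
    m, n = len(w_right), len(w_right[0])
    pr, pd = unroll(w_right, w_down)
    c0, f0, l0 = find_one_path(pr, pd, 0, m, n, [0] * (n + 1), [2 * m] * (n + 1))
    pm = ([r + m for r in f0], [r + m for r in l0])   # p(m) is p(0) shifted down by m rows
    out = {0: c0}
    find_all_paths(pr, pd, m, n, 1, m - 1, (f0, l0), pm, out)
    return min(out.values())
```

The sketch returns only the minimum cost; recovering the path itself would require keeping the back-pointers of the winning call.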

4 Finding Acceptable Paths in Parallel

In the serial algorithm, after path p((i + j)/2) is found in V(i, j), G can be further divided into V(i, (i + j)/2) and V((i + j)/2, j). These are independent subtasks, so after finding the path p((i + j)/2), we can find the paths p(i + 1), ..., p((i + j)/2 - 1) and p((i + j)/2 + 1), ..., p(j - 1) in parallel.

As with a parallel quicksort algorithm, this algorithm will be efficient only if we can find a single

shortest path efficiently in parallel. Given the structure of the graph G, we can accomplish this via a

dynamic programming algorithm.

Let us consider the problem of finding the shortest path from a node $v_{i_1,j_1}$ to another node $v_{i_2,j_2}$, with $i_1 < i_2$, $j_1 < j_2$, and $v_{i_1,j_1}, v_{i_2,j_2} \in W$, in some subgraph $W \subseteq G$. To simplify the presentation, we will normalize the graph coordinates by translating $v_{r,s}$ to $u_{r-i_1,\,s-j_1}$. Therefore the problem translates into finding the shortest path P(i, j) from $u_{0,0}$ to $u_{i,j}$, where $i = i_2 - i_1$ and $j = j_2 - j_1$. The edges in the graph W only head down or to the right, never up or to the left. Therefore, all paths to $u_{i,j}$ must travel through $u_{i-1,j}$ or $u_{i,j-1}$, so that the minimum length path from $u_{0,0}$ to $u_{i,j}$ must have as its penultimate node either $u_{i-1,j}$ or $u_{i,j-1}$. See Figure 4.

Figure 4: Dynamic programming calculation of the shortest path.

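Let C(i, j) be the cost of the minimum cost path from $u_{0,0}$ to $u_{i,j}$. Let:

$$w^{\downarrow}(i,j) = \begin{cases} w(e^{\downarrow}_{i_1+i-1,\ j_1+j}) & \text{if } v_{i_1+i-1,\ j_1+j} \in W \\ \infty & \text{otherwise} \end{cases}$$

$$w^{\rightarrow}(i,j) = \begin{cases} w(e^{\rightarrow}_{i_1+i,\ j_1+j-1}) & \text{if } v_{i_1+i,\ j_1+j-1} \in W \\ \infty & \text{otherwise} \end{cases}$$

that is, $w^{\downarrow}(i,j)$ and $w^{\rightarrow}(i,j)$ are the weights of the vertical and the horizontal edge entering $u_{i,j}$, or $\infty$ if that edge does not lie in W. Then, with C(0, 0) = 0, we can find the recurrence:

$$C(i,j) = \min\big( C(i-1,j) + w^{\downarrow}(i,j),\ C(i,j-1) + w^{\rightarrow}(i,j) \big)$$

Further, P(i, j) is either P(i - 1, j) followed by the vertical edge entering $u_{i,j}$, or P(i, j - 1) followed by the horizontal edge entering $u_{i,j}$, depending on which path has lower weight. Calculating a shortest path p(s) thus reduces to a dynamic programming problem, iterating across all $v_{i,j} \in W(s)$ such that i + j = d + s and all d between 0 and m + n.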
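The recurrence and its iteration order can be illustrated with a short sketch. The storage layout is an assumption made for the example: `w_down[i][j]` and `w_right[i][j]` hold the weights of the vertical and horizontal edges entering the normalized node u(i, j), with `math.inf` marking an edge that is absent or outside W. The sweep visits the nodes one anti-diagonal (fixed d = i + j) at a time, so every node on a diagonal depends only on the previous diagonal.

```python
import math

def dp_by_diagonals(w_down, w_right):
    """Return the table C, where C[i][j] is the minimum cost of a path from u(0,0) to u(i,j)."""
    rows, cols = len(w_down), len(w_down[0])
    C = [[math.inf] * cols for _ in range(rows)]
    C[0][0] = 0.0
    for d in range(1, rows + cols - 1):                  # anti-diagonal i + j = d
        for i in range(max(0, d - cols + 1), min(rows - 1, d) + 1):
            j = d - i
            via_up = C[i - 1][j] + w_down[i][j] if i > 0 else math.inf
            via_left = C[i][j - 1] + w_right[i][j] if j > 0 else math.inf
            C[i][j] = min(via_up, via_left)
    return C

# Hypothetical 2 x 2 block of nodes; entries for edges that do not exist are math.inf.
inf = math.inf
w_down = [[inf, inf], [3, 1]]
w_right = [[inf, 2], [inf, 5]]
C = dp_by_diagonals(w_down, w_right)
```

In this example the cheapest route to u(1, 1) goes right and then down, for a cost of 2 + 1 = 3. Because the nodes of one diagonal are independent of each other, the inner loop is the part that can be divided among processors.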

We can parallelize the dynamic programming calculation for a message passing parallel computer by

splitting up the iteration for a particular value of d. In order to minimize communication, the nodes

for which a processor is responsible should be contiguous. To balance the load, each processor should

be responsible for the same number of nodes along a diagonal. These two considerations lead to the

distribution of nodes among the processors shown in figure 5.

Let $S_d \subseteq W(s)$ be the set of nodes at Manhattan distance (i.e., $L_1$ norm) d from $v_{s,0}$, let $k_d = |S_d|$, and let $v_{q,r}$ be the node in $S_d$ such that $v_{q',r'} \in S_d \Rightarrow q' \ge q$. If there are p processors computing the paths, then processor i, $i = 0, \ldots, p - 1$, is responsible for computing the shortest paths to the nodes $(q + l, r - l)$, where $\lfloor k_d i/p \rfloor \le l < \lfloor k_d (i + 1)/p \rfloor$. The communication in this algorithm takes place across the dotted lines that separate the nodes for which each processor is responsible. If a processor q calculates the shortest path to node v, $(v, w) \in E$, and processor r calculates the shortest path to w, then q sends the length of the shortest path to v to r. Correspondingly, r waits until it is sent the length of the shortest path to v before calculating the length of the shortest path to w.

Figure 5: Allocation of nodes to processors
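The allocation rule can be written out as a small sketch. The function names are hypothetical, and the half-open index ranges are an assumption chosen so that every node of a diagonal is owned by exactly one processor.

```python
def diagonal_ranges(k_d, p):
    """Half-open index ranges [start, end) of a diagonal with k_d nodes, one range per processor."""
    return [(k_d * i // p, k_d * (i + 1) // p) for i in range(p)]

def nodes_for(i, p, q, r, k_d):
    """Nodes (q + l, r - l) of the diagonal that processor i is responsible for."""
    start, end = diagonal_ranges(k_d, p)[i]
    return [(q + l, r - l) for l in range(start, end)]

ranges = diagonal_ranges(10, 3)        # 10 nodes shared by 3 processors
middle = nodes_for(1, 3, 2, 9, 10)     # nodes of processor 1 when the diagonal starts at (2, 9)
```

Here the three processors receive 3, 3, and 4 consecutive nodes, so the block sizes differ by at most one. When a diagonal has fewer nodes than processors, some ranges are empty and those processors idle for that diagonal.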

After calculating the cost of p(s), the actual path can be found by working backwards from the sink. Once this path is calculated, it is distributed to all p processors that co-operated to calculate p(s). The path p(s) is used to divide W(s) into W(r) and W(t) by the processors that will calculate p(r) and p(t), respectively. The number of processors assigned to calculate p(r) and p(t) is proportional to the relative sizes (number of edges) of W(r) and W(t), respectively. The sizes of W(r) and W(t) can be calculated while p(s) is calculated.

These considerations lead to our parallel algorithm, listed below. Our parallel algorithm has two noteworthy features. First, all synchronization and communication can be performed using message passing, so that the algorithm can be implemented on available parallel computers. Second, when the graph is split up, the number of processors assigned to calculate paths above p(s) is proportional to the number of graph nodes above p(s). This load balancing is necessary because the graph can be split in a very unbalanced fashion.

Figure 6: Removing cut edges from further processing

We define $W_{lo}$ to be the subgraph in which the paths p(pathlo) through p(m - 1) will be found, and we define $W_{hi}$ similarly. Our load balancing mechanism is based on the sizes of $W_{lo}$ and $W_{hi}$, respectively. There is one problem: one of $W_{hi}$ and $W_{lo}$ might be very small, and contain sections where all p(i) that are found in the subgraph must contain the same path. In this case, the remaining work contained in the subgraph might be somewhat larger than is indicated by the size of the subgraph. To account for this problem, whenever there is a cut edge in the subgraph, its edges are joined together (see Figure 6). This condition occurs whenever there are two adjacent d in the subgraph such that $|S_d| = 1$, and can be detected and corrected for when the paths are split and distributed with only a constant time penalty. As a result, following [5], we can guarantee that the remaining work to find p(pathlo) through p(m - 1) is $|W_{lo}|\, l_{lo} \log(l_{lo})$, where $l_{lo} = m - \mathit{pathlo}$ is the number of paths in that range, and the remaining work to find p(m + 1) through p(pathhi) is $|W_{hi}|\, l_{hi} \log(l_{hi})$. If $p_{lo}$ processors are assigned to find the lower numbered paths and $p_{hi}$ processors are assigned to find the higher numbered paths, then the load will be balanced if $|W_{lo}|\, l_{lo} \log(l_{lo}) / p_{lo} = |W_{hi}|\, l_{hi} \log(l_{hi}) / p_{hi}$. Because the split is made at the middle path, $l_{lo}$ and $l_{hi}$ differ by at most one, so this condition reduces to $|W_{lo}|/p_{lo} = |W_{hi}|/p_{hi}$. Since $p_{lo} + p_{hi} = p = \mathit{prhi} - \mathit{prlo} + 1$, we find that to balance the load, $p_{lo} = p/(1 + |W_{hi}|/|W_{lo}|) = |W_{lo}|\, p/(|W_{lo}| + |W_{hi}|)$.
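The proportional split can be sketched in a few lines. The rounding and the clamping to at least one processor per side are assumptions added for the example; the text only gives the real-valued formula.

```python
def split_processors(p, size_lo, size_hi):
    """Split p >= 2 processors in proportion to the subgraph sizes |W_lo| and |W_hi|."""
    p_lo = (p * size_lo) // (size_lo + size_hi)   # |W_lo| p / (|W_lo| + |W_hi|), rounded down
    p_lo = max(1, min(p - 1, p_lo))               # assumption: each side keeps at least one processor
    return p_lo, p - p_lo

balanced = split_processors(8, 30, 10)
lopsided = split_processors(4, 1, 1000)
```

With 8 processors and subgraphs of 30 and 10 edges, the split is 6 and 2. In the lopsided case the formula alone would give the small side no processors, which is why the sketch clamps the result.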

Parallel Algorithm

shortestpath(G,p,m) {
    /* find the shortest path in the 2m x n planar graph G using p processors */
    P0 = findonepath(G; 1,p; 0)
    G' = restrict(G,P0)
    findallpaths(G'; 1,p; 1,m-1)
    return(min(P0,...,Pm-1))
}

findallpaths(G; prlo,prhi; pathlo,pathhi) {
    /* processors prlo..prhi find paths pathlo..pathhi; here m is the middle path index */
    m = (pathlo+pathhi)/2
    if(pathlo <= pathhi) {
        Pm = findonepath(G; prlo,prhi; m)
        if(prlo < prhi) {
            /* split the processors in proportion to |Wlo| and |Whi|;
               n_prlo is rounded so that 1 <= n_prlo <= prhi-prlo */
            n_prlo = (prhi-prlo+1)*|Wlo|/(|Wlo|+|Whi|)
            in parallel {
                findallpaths(restrict(G above Pm); prlo,prlo+n_prlo-1; pathlo,m-1)
                findallpaths(restrict(G below Pm); prlo+n_prlo,prhi; m+1,pathhi)
            }
        }
        else {
            findallpaths(restrict(G above Pm); prlo,prlo; pathlo,m-1)
            findallpaths(restrict(G below Pm); prlo,prlo; m+1,pathhi)
        }
    }
}

4.1 Analysis

Theorem 1 If p = O(mn/((m + n) log(mn/(m + n)))), then the execution time of the algorithm is O(mn log(m)/p). Further, the number of messages sent is bounded by O(p^2(m + n)).

Proof: For the purposes of the analysis, we will assume that every entry into another parallel recursive step is synchronized across all processors. This assumption is not required for the algorithm to execute correctly, and provides a worst-case bound. We will also assume that each individual message is large enough to carry information about a single node. That is, we do not count bit complexity, but we assume that each message is limited to a 'reasonable' size. Counting bit complexity multiplies the message passing cost by a factor of log(mn).

If p > 1 processors are assigned to calculate p(s) in W(s), then calculating the cost of p(s) requires O(max(m + n, |W(s)|/p)) steps and O(p(m + n)) messages. Determining the path of p(s) requires an additional O(m + n) steps and an additional O(m + n) messages. Distributing the path of p(s) requires O(p(m + n)) messages and O((m + n) log p) steps. So, the total cost of finding a single shortest path p(s) using p processors is O(max((m + n) log p, |W(s)|/p)) steps and O(p(m + n)) messages.

The first path, p(0), is calculated in the whole planar graph, which contains about 2mn nodes. The second path, p(m/2), is calculated in W(m/2), which contains about mn nodes. So, the cost to calculate the first two paths and subsequently divide the graph is O(max((m + n) log p, mn/p)) steps and O(p(m + n)) messages. The subsequent two paths, p(m/4) and p(3m/4), are calculated in parallel. Suppose $p_{m/4}$ processors are assigned to calculate p(m/4) and $p_{3m/4}$ are assigned to calculate p(3m/4), with $p_{m/4} + p_{3m/4} = p$. The number of steps required to calculate these paths is $O(\max((m + n) \log p_{m/4},\ (m + n) \log p_{3m/4},\ |W(m/4)|/p_{m/4},\ |W(3m/4)|/p_{3m/4}))$. Since the processor allocation is proportional to the subgraph size, the number of steps is bounded by O(max((m + n) log p, mn/p)). The number of messages sent is bounded by $O(p_{m/4}(m + n) + p_{3m/4}(m + n)) = O(p(m + n))$.

If there are k recursive steps before every processor is executing the dynamic programming algorithm serially, then at most O(k max((m + n) log p, mn/p)) steps are executed. Afterwards, there are log m - k recursive rounds of the serial algorithm, each of which requires O(mn/p) steps. There are at most O(log m) recursive steps in which a parallel shortest path algorithm is executed, so the algorithm will calculate all paths after O(max((m + n) k log p + (log m - k) mn/p, mn log(m)/p)) = O(max((m + n) log(m) log(p), mn log(m)/p)) steps. In addition, at most O((m + n) p^2) messages will be sent. To finish, the processors must agree on the lowest cost path. If all processors keep track of the lowest cost path that they have helped to calculate, this final step requires only O(log p) steps and O(p) messages. This step does not affect the complexity of the algorithm.

This algorithm has an optimal speedup if (m + n) log(m) log(p) ≤ mn log(m)/p, or if p = O(mn/((m + n) log(mn/(m + n)))). □
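The last step of the proof can be spelled out as a short derivation. The abbreviation $x = mn/(m+n)$ is introduced here only for the sketch, and logarithms are taken to any fixed base with $x$ large enough that $\log x \ge 1$.

```latex
% Optimal-speedup condition, divided through by (m+n) \log(m) and multiplied by p:
(m+n)\,\log(m)\,\log(p) \;\le\; \frac{mn\,\log(m)}{p}
\quad\Longleftrightarrow\quad
p \log p \;\le\; x, \qquad x = \frac{mn}{m+n}.

% Check that p = x / \log x satisfies the condition:
p \log p
  = \frac{x}{\log x}\,\bigl(\log x - \log\log x\bigr)
  = x \Bigl(1 - \frac{\log\log x}{\log x}\Bigr)
  \;\le\; x .
```

For a d by d graph, $x = d/2$, which gives the bound p = O(d/log(d)).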

5 Conclusions

We have parallelized a shortest-path problem whose structure allows efficient solutions. Our algorithm has an optimal speedup for p = O(d/log(d)) processors on a d by d p2-toroidal graph. Further, the algorithm is based on message passing, and does not require that the processors be synchronized, so that the algorithm can be practically implemented on current multiprocessors. The algorithm has applications in surface reconstruction [6, 5, 9].

References

[1] C.C. Chen. A distributed algorithm for shortest paths. IEEE Trans. Computers, C-31(9):898-899,

1982.

[2] N. Deo, C.Y. Pang, and R.E. Lord. Two parallel algorithms for shortest path problems. In Proceedings of the 1980 Int'l Conference on Parallel Processing, pages 244-253, 1980.

[3] D.M. Eckstein. Parallel Algorithms for Graph Theoretic Problems. PhD thesis, University of Illinois,

Dept. of Mathematics, 1977.

[4] A. Frieze and L. Rudolph. A parallel algorithm for all pairs shortest paths in a random graph. In

Proc. 22nd Allerton conf., pages 663-670, 1984.

[5] H. Fuchs, Z.M. Kedem, and S.P. Uselton. Optimal surface reconstruction from planar contours. Communications of the ACM, 20(10):693-702, 1977.

[6] E. Keppel. Approximating complex surfaces by triangulation of contour lines. IBM Journal of Research and Development, 19:2-11, 1975.

[7] L. Kucera. Parallel computation and conflicts in memory access. Inf. Proc. Letters, 14(2):93-96,

1982.

[8] G.D. Lakhani. An improved distribution algorithm for shortest paths problem. IEEE Transactions on Computers, C-33(9), 1984.

[9] Panos Livadas. A reconstruction of an unknown 3-D surface from a collection of its cross sections: An implementation. Int'l Journal of Computer Math, 26, 1989.

[10] R.C. Paige and C.P. Kruskal. Parallel algorithms for shortest path problems. In Int'l Conference on Parallel Processing, pages 14-20, 1985.

[11] M. Quinn and Y. Yoo. Data structures for the efficient solution of graph theoretic problems on

tightly-coupled computers. In Proceedings of the International Conference on Parallel Processing,

pages 431-438, 1984.

[12] E. Reghbati and D.G. Corneil. Parallel computations in graph theory. SIAM J. Computing, 7(2):230-236, 1978.

[13] J.H. Reif and J. Spirakis. The expected time complexity of parallel graph and digraph algorithms.

Technical Report TR-11-82, Aiken Computation Lab., Harvard University, 1982.

[14] C. Savage. Parallel Algorithms for Graph Theoretic Problems. PhD thesis, University of Illinois,

Dept. of Mathematics, 1977.