A Structured Approach to Parallel
Programming:
Methodology and Models
UF CISE Technical Report 98-023 *
Berna L. Massingill
University of Florida, P.O. Box 116120, Gainesville, FL 32611
blm@cise.ufl.edu
Abstract. Parallel programming continues to be difficult despite sub-
stantial and ongoing research aimed at making it tractable. Especially
dismaying is the gulf between theory and practical programming.
We propose a structured approach to developing parallel programs for
problems whose specifications are like those of sequential programs, such
that much of the work of development, reasoning, and testing and de-
bugging can be done using familiar sequential techniques and tools. The
approach takes the form of a simple model of parallel programming,
a methodology for transforming programs in this model into programs
for parallel machines based on the ideas of semantics-preserving trans-
formations and programming archetypes (patterns), and an underlying
operational model providing a unified framework for reasoning about
those transformations that are difficult or impossible to reason about
using sequential techniques. This combination of a relatively accessible
programming methodology and a sound theoretical framework to some
extent bridges the gulf between theory and practical programming. This
paper sketches our methodology and presents our programming model
and its supporting framework in some detail.
1 Introduction
Despite the past and ongoing efforts of many researchers, parallel programming
continues to be difficult, with a persistent and dismaying gulf between theory
and practical programming. We propose a structured approach to developing
parallel programs for the class of problems whose specifications are like those
usually given for sequential programs, in which the specification describes initial
states for which the program must terminate and the relation between initial and
final states. Our approach allows much of the work of development, reasoning,
and testing and debugging to be done using familiar sequential techniques and
tools; it takes the form of a simple model of parallel programming, a methodology
for transforming programs in this model into programs for parallel machines
based on the ideas of semantics-preserving transformations and programming
archetypes (patterns), and an underlying operational model providing a uni-
fied framework for reasoning about those transformations that are difficult or
impossible to reason about using sequential techniques.

* This work was supported by funding from the Air Force Office of Scientific Research
(AFOSR) and the National Science Foundation (NSF).
By combining a relatively accessible programming methodology and a sound
theoretical framework, our approach to some extent bridges the gap between the-
ory and practical programming. The transformations we propose are in many
cases formalized versions of what programmers and compilers typically do in
practice to "parallelize" sequential code, but we provide a framework for formally
proving their correctness (either by standard sequential techniques or by using
our operational model). Our operational model is sufficiently general to support
proofs of transformations between markedly different programming models (se-
quential, shared-memory, and distributed-memory with message-passing). It is
sufficiently abstract to permit a fair degree of rigor, but simple enough to be
relatively accessible, and applicable to a range of programming notations.
This paper describes our programming methodology and presents our pro-
gramming model and its supporting framework in some detail. It also sketches
briefly how the model applies in the context of particular programming nota-
tions.
2 Our programming model and methodology
Our programming model comprises a primary model and two subsidiary mod-
els, and is designed to support a programming methodology based on stepwise
refinement and the reuse where possible of the techniques and tools of sequential
programming. This section gives an overview of our model and methodology.
2.1 The arb model: parallel composition with sequential semantics
Our primary programming model, which we call the arb model, is simply the
standard sequential model (as defined by Dijkstra [14, 15], Gries [18], and others)
extended to include parallel compositions of groups of program elements whose
parallel composition is equivalent to their sequential composition. The name
(arb) is derived from UC [5] and is intended to connote that such
groups of program elements may be interleaved in any arbitrary fashion with-
out changing the result. We define a property we call arb-compatibility, and we
show that if a group of program elements is arb-compatible, their parallel com-
position is semantically equivalent to their sequential composition; we call such
compositions arb compositions. Since arb-model programs can be interpreted
as sequential programs, the extensive body of tools and techniques applicable
to sequential programs is applicable to them. In particular, their correctness
can be demonstrated formally by using sequential methods, they can be refined
by sequential semantics-preserving transformations, and they can be executed
sequentially for testing and debugging.
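As an informal illustration (ours, not part of the paper's formal development), consider two program elements that read and write disjoint sets of variables. The functions p1 and p2 below are hypothetical examples; the point is that composing them in either order yields the same final state, which is the essence of why arb compositions can be tested and debugged sequentially.

```python
def p1(state):
    # element P1: reads and writes only x
    state["x"] = state["x"] + 1

def p2(state):
    # element P2: reads and writes only y
    state["y"] = 2 * state["y"]

s_seq = {"x": 1, "y": 3}
p1(s_seq); p2(s_seq)           # execute as P1; P2

s_rev = {"x": 1, "y": 3}
p2(s_rev); p1(s_rev)           # execute as P2; P1

# Either order (indeed, any interleaving) yields the same final state.
assert s_seq == s_rev == {"x": 2, "y": 6}
```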
2.2 Transformations from the arb model to practical parallel
languages
Because the arb composition of arb-compatible elements can also be inter-
preted as parallel composition, arb-model programs can be executed as par-
allel programs. Such programs may not make effective use of typical parallel
architectures, however, so our methodology includes techniques for improving
their efficiency while maintaining correctness. We define two subsidiary pro-
gramming models that abstract key features of two classes of parallel archi-
tectures: the par model for shared-memory (single-address-space) architectures,
and the subset par model for distributed-memory (multiple-address-space) ar-
chitectures. We then develop semantics-preserving transformations to convert
arb-model programs into programs in one of these subsidiary models. Interme-
diate stages in this process are usually arb-model programs, so the transforma-
tions can make use of sequential refinement techniques, and the programs can be
executed sequentially. Finally, we indicate how the par model can be mapped
to practical programming languages for shared-memory architectures and the
subset par model to practical programming languages for distributed-memory
message-passing architectures. Together, these groups of transformations provide
a semantics-preserving path from the original arb-model program to a program
in a practical programming language. Figure 1 illustrates this overall scheme.
2.3 Supporting framework for proving transformations correct
Some of the transformations indicated in Figure 1 (those within the arb model)
can be proved correct using the techniques of sequential stepwise refinement
(as defined by Gries [18], Hoare [20], and others). Others (those between
our different programming models, or from one of our models to a practical
programming language) require a different approach. We therefore define an
operational model based on viewing programs as state-transition systems, give
definitions of our programming models in terms of this underlying operational
model, and use it to prove the correctness of those transformations for which
sequential techniques are inappropriate.
2.4 Programming archetypes
An additional important element of our approach, though not one that will be
addressed in this paper, is that we envision the transformation process just de-
scribed as being guided by parallel programming archetypes, by which we mean
abstractions that capture the commonality of classes of programs, much like
the design patterns [17] of the object-oriented world. We envision application
developers choosing from a range of archetypes, each representing a class of pro-
grams with common features and providing a class-specific parallelization strat-
egy (i.e., a pattern for the shared-memory or distributed-memory program to be
ultimately produced) together with a collection of class-specific transformations
and a code library of communication or other operations that encapsulate the
details of the parallel programs. Archetypes are described in more detail in [8]
and [30].

[Figure 1: a diagram relating the "arb" model to sequential programs, the "par"
model to programs for shared-memory architectures, and the subset "par" model
to programs for distributed-memory architectures.]

Fig. 1. Overview of programming models and transformation process. Solid-bordered
boxes indicate programs in the various models; arrows indicate semantics-preserving
transformations. A dashed arrow runs from the box denoting a sequential program to a
box denoting an arb-model program because it is sometimes appropriate and feasible
to derive an arb-model program from an existing sequential program (by replacing
sequential compositions of arb-compatible elements with arb compositions of the same
elements).
2.5 Program development using our methodology
We can then adopt the following approach to program development, with all
steps guided by an archetype's parallelization strategy and supported by
a collection of proved-correct transformations.
Development of initial program. The application developer begins by developing
a correct program using sequential constructors and parallel composition (||), but
ensuring that all groups of elements composed in parallel are arb-compatible.
We call such a program an arb-model program, and it can be interpreted as either
a sequential program or a parallel program, with identical meaning. Correctness
of this program can be established using techniques for establishing correctness
of a sequential program.
Sequential-to-sequential refinement. The developer then begins the process of
refining the program into one suitable for the target architecture. During the
initial stages of this process, the program is viewed as a sequential program
and operated on with sequential refinement techniques, which are well-defined
and well-understood. A collection of representative transformations (refinement
steps) is presented in [30]. (Appendix C sketches a few of them.) In refining
a sequential composition whose elements are arb-compatible, care is taken to
preserve their arb-compatibility. The result is a program that refines the original
program and can also be interpreted as either a sequential or a parallel program,
with identical meaning. The end product of this refinement should be a program
that can then be transformed into an efficient program in the par model (for
a shared-memory target) or the subset par model (for a distributed-memory
target).
Sequential-to-parallel refinement. The developer next transforms (refines) the
refined arb-model program into a par-model or subset-par-model program.

Translation for target platform. Finally, the developer translates the par-model
or subset-par-model program into the desired parallel language for execution on
the target platform. This step, like the others, is guided by semantics-preserving
rules for mapping one of our programming models into the constructs of a par-
ticular parallel language.
3 The arb model
As discussed in Section 2, the heart of our approach is identifying groups of
program elements that have the useful property that their parallel composition
is semantically equivalent to their sequential composition. We call such a group
of program elements arb-compatible.
In this section, we first present our operational model for parallel programs,
the model we will use for reasoning about programs and program transformations
that are not amenable to strictly sequential reasoning techniques. We then de-
fine a notion of arb-compatibility, such that the parallel composition of a group
of arb-compatible program elements is semantically equivalent to its sequential
composition. We then identify restrictions on groups of program elements that
are sufficient to guarantee their arb-compatibility, and we present some proper-
ties of parallel compositions of arb-compatible elements. Finally, we sketch how
these ideas apply in the context of programming notations and briefly discuss
executing arb-model programs sequentially and in parallel.

It is worth observing at this point that the ideas behind the programming
model are not tied to any particular programming notation but should apply
to any imperative programming notation. We present definitions and theorems
for our programming models in a notation based on that of Dijkstra's guarded-
command language, since it is a simple and compact notation that makes for
readable definitions and theorems. However, we present examples of applying
the definitions and theorems in a notation based on Fortran 90, in order to
take advantage of Fortran 90's wider range of convenient constructs (e.g., arrays
and DO loops) and to indicate how our ideas apply in the context of a practical
programming notation.
3.1 Overview of program semantics and operational model
We define programs in such a way that a program describes a state-transition
system, and show how to define program computations, sequential and parallel
composition, and program refinement in terms of this definition. In this paper
we present this material with a minimum of mathematical notation and only
brief sketches of most proofs; a more formal treatment of the material, including
more complete proofs, appears in [30].

Treating programs as state-transition systems is not a new approach; it has
been used in work such as Chandy and Misra [9], Lynch and Tuttle [24], Lam-
port [23], Manna and Pnueli [26], and Pnueli [34] to reason about both parallel
and sequential programs. The basic notions of a state-transition system (a
set of states together with a set of transitions between them, representable as a
directed graph with states for vertices and transitions for edges) are perhaps
more helpful in reasoning about parallel programs, particularly when program
specifications describe ongoing behavior (e.g., safety and progress properties)
rather than relations between initial and final states, but they are also applica-
ble to sequential programs. Our operational model builds on this basic view of
program execution, presented in a way specifically aimed at facilitating the stat-
ing and proving of the main theorems of this section (that for groups of program
elements meeting stated criteria, their parallel and sequential compositions are
semantically equivalent) and subsequent sections.
3.2 Definitions
Definition 3.1 (Program).
We define a program P as a 6-tuple (V, L, InitL, A, PV, PA), where

- V is a finite set of typed variables. V defines a state space in the state-
  transition system; that is, a state is given by the values of the variables in
  V. In our semantics, distinct program variables denote distinct atomic data
  objects; aliasing is not allowed.
- L ⊆ V represents the local variables of P. These variables are distinguished
  from the other variables of P in two ways: (i) The initial states of P are
  given in terms of their values, and (ii) they are invisible outside P; that is,
  they may not appear in a specification for P, and they may not be accessed
  by other programs composed with P, either in sequence or in parallel.
- InitL is an assignment of values to the variables of L, representing their
  initial values.
- A is a finite set of program actions. A program action describes a relation
  between states of its input variables (those variables in V that affect its
  behavior, either in the sense of determining from which states it can be
  executed or in the sense of determining the effects of its execution) and
  states of its output variables (those variables whose value can be affected by
  its execution). Thus, a program action is a triple (Ia, Oa, Ra) in which
  - Ia ⊆ V represents the input variables of a,
  - Oa ⊆ V represents the output variables of a, and
  - Ra is a relation between Ia-tuples and Oa-tuples.
- PV ⊆ V are protocol variables that can be modified only by protocol actions
  (elements of PA). (That is, if v is a protocol variable, and a = (Ia, Oa, Ra)
  is an action such that v ∈ Oa, then a must be a protocol action.) Such variables
  and actions are not needed in this section but are useful in defining the
  synchronization mechanisms of Section 4 and Section 5; the requirement
  that protocol variables be modified only by protocol actions simplifies the
  task of defining such mechanisms. Observe that variables in PV can include
  both local and non-local variables.
- PA ⊆ A are protocol actions. Only protocol actions may modify protocol
  variables. (Protocol actions may, however, modify non-protocol variables.)

A program action a = (Ia, Oa, Ra) defines a set of state transitions, each of
which we write in the form s -a-> s', as follows: s -a-> s' if the pair (i, o), where
i is a tuple representing the values of the variables in Ia in state s and o is a
tuple representing the values of the variables in Oa in state s', is an element of
relation Ra.

Observe that we can also define a program action based on its set of state
transitions, by inferring the required Ia, Oa, and Ra.
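The triple view of a program action can be made concrete. The following sketch is our own encoding (the names transitions and inc_x are ours, not the paper's): an action is represented as (Ia, Oa, Ra) with Ra a finite set of (input-tuple, output-tuple) pairs, and the function enumerates the state transitions s -a-> s' that the action defines.

```python
def transitions(action, state):
    """All successor states s' with s -a-> s' from the given state s."""
    Ia, Oa, Ra = action
    i = tuple(state[v] for v in Ia)       # values of input variables in s
    succs = []
    for iv, ov in sorted(Ra):             # each pair (i, o) in relation Ra
        if iv == i:
            s2 = dict(state)              # s' agrees with s off Oa
            for var, val in zip(Oa, ov):
                s2[var] = val
            succs.append(s2)
    return succs

# The action of "x := x + 1", defined only for states where x is 0 or 1:
inc_x = (("x",), ("x",), {((0,), (1,)), ((1,), (2,))})

assert transitions(inc_x, {"x": 0, "y": 9}) == [{"x": 1, "y": 9}]
assert transitions(inc_x, {"x": 5, "y": 9}) == []   # no transition from x = 5
```

Note that an empty result corresponds exactly to the action not being enabled in that state (Definition 3.3 below).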
Appendix A presents examples of defining the commands of a programming
notation (Dijkstra's guarded-command language [13, 15]) in terms of our model.
Definition 3.2 (Initial states).
For program P, s is an initial state of P if, in s, the local variables of P have
the values given in InitL.
Definition 3.3 (Enabled).
For action a and state s of program P, we say that a is enabled in s exactly
when there exists program state s' such that s -a-> s'.
Definition 3.4 (Computation).
If P = (V, L, InitL, A, PV, PA), a computation of P is a pair

C = (s0, (j : 1 ≤ j ≤ N : (aj, sj)))

in which

- s0 is an initial state of P.
- (j : 1 ≤ j ≤ N : (aj, sj)) is a sequence of pairs in which each aj is a
  program action of P, and for all j, s(j-1) -aj-> sj. We call these pairs the state
  transitions of C, and the sequence of actions aj the actions of C. N can be
  a non-negative integer or ∞. In the former case, we say that C is a finite or
  terminating computation with length N + 1 and final state sN. In the latter
  case, we say that C is an infinite or nonterminating computation.
- If C is infinite, the sequence (j : 1 ≤ j : (aj, sj)) satisfies the following
  fairness requirement: If, for some state sj and program action a, a is enabled
  in sj, then eventually either a occurs in C or a ceases to be enabled.
Definition 3.5 (Terminal state).
We say that state s of program P is a terminal state of P exactly when there
are no actions of P enabled in s.
Definition 3.6 (Maximal computation).
We say that a computation C of P is a maximal computation exactly when
either (i) C is infinite or (ii) C is finite and ends in a terminal state.
Definition 3.7 (Affects).
For predicate q and variable v ∈ V, we say that v affects q exactly when there
exist states s and s', identical except for the value of v, such that q.s ≠ q.s'. For
expression E and variable v ∈ V, we say that v affects E exactly when there
exists value k for E such that v affects the predicate (E = k).
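Definitions 3.3 through 3.6 can be exercised together in a small sketch. This encoding is ours (not the paper's): an action is modelled as a function from a state to its list of successor states, and a finite maximal computation is obtained by following enabled actions until a terminal state is reached.

```python
def enabled_successor(actions, state):
    """Some successor via an enabled action, or None if state is terminal."""
    for act in actions:
        succs = act(state)
        if succs:              # act is enabled in this state (Definition 3.3)
            return succs[0]
    return None                # terminal state (Definition 3.5)

def maximal_computation(actions, state, limit=100):
    """A finite maximal computation (Definition 3.6): run until terminal."""
    trace = [state]
    for _ in range(limit):
        nxt = enabled_successor(actions, state)
        if nxt is None:
            return trace       # finite, terminating computation (Def. 3.4)
        state = nxt
        trace.append(state)
    raise RuntimeError("no terminal state reached within limit")

# A one-action program that increments x until x = 3:
inc = lambda s: [{"x": s["x"] + 1}] if s["x"] < 3 else []

trace = maximal_computation([inc], {"x": 0})
assert trace[-1] == {"x": 3} and len(trace) == 4
```

The fairness requirement for infinite computations has no analogue in this finite sketch; it matters only when several actions stay enabled forever.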
3.3 Specifications and program refinement
The usual meaning of "program P is refined by program P'" is that program P'
meets any specification met by P. We will confine ourselves to specifications that
describe a program's behavior in terms of initial and final states, giving (i) a set
of initial states s such that the program is guaranteed to terminate if started in
s, and (ii) the relation, for terminating computations, between initial and final
states. An example of such a specification is a Hoare total-correctness triple [20].
In terms of our model, initial and final states correspond to assignments of values
to the program's variables; we make the additional restriction that specifications
do not mention a program's local variables L. We make this restriction because
otherwise program equivalence can depend on internal behavior (as reflected in
the values of local variables), which is not the intended meaning of equivalence.
We write P ⊑ P' to denote that P is refined by P'; if P ⊑ P' and P' ⊑ P, we
say that P and P' are equivalent, and we write P ≈ P'.
Definition 3.8 (Equivalence of computations).
For programs P1 and P2 and a set of typed variables V such that V ⊆ V1 and
V ⊆ V2, and for every v in V, v has the same type in all three sets (V, V1, and
V2), we say that computations C1 of P1 and C2 of P2 are equivalent with respect
to V exactly when:

- For every v in V, the value of v in the initial state of C1 is the same as its
  value in the initial state of C2.
- Either (i) both C1 and C2 are infinite, or (ii) both are finite, and for every
  v in V, the value of v in the final state of C1 is the same as its value in the
  final state of C2.

We can now give a sufficient condition for showing that P1 ⊑ P2 in our semantics.
Theorem 3.9 (Refinement in terms of equivalent computations).
For P1 and P2 with (V1 \ L1) ⊆ (V2 \ L2) (where \ denotes set difference), P1 ⊑ P2
when for every maximal computation C2 of P2 there is a maximal computation
C1 of P1 such that C1 is equivalent to C2 with respect to (V1 \ L1).

Proof of Theorem 3.9.
This follows immediately from Definition 3.8, the usual definition of refinement,
and our restriction that program specifications not mention local variables.
3.4 Program composition
We now present definitions of sequential and parallel composition in terms of
our model. First, we need some restrictions to ensure that the programs to be
composed are compatible; that is, that it makes sense to compose them:

Definition 3.10 (Composability of programs).
We say that a set of programs P1, ..., PN can be composed exactly when

- any variable that appears in more than one program has the same type in
  all the programs in which it appears (and if it is a protocol variable in one
  program, it is a protocol variable in all programs in which it appears),
- any action that appears in more than one program is defined in the same
  way in all the programs in which it appears, and
- different programs do not have local variables in common.
Sequential composition
The usual meaning of sequential composition is this: A maximal computation
of (P1; P2) is a maximal computation C1 of P1 followed (if C1 is finite) by a
maximal computation C2 of P2, with the obvious generalization to more than
two programs. We can give a definition with this meaning in terms of our model
by introducing additional local variables En1, ..., EnN that ensure that things
happen in the proper sequence, as follows: Actions from program Pj can execute
only when Enj is true. En1 is set to true at the start of the computation, and
then as each Pj terminates it sets Enj to false and Enj+1 to true, thus ensuring
the desired behavior.
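The En-variable construction can be sketched concretely. The encoding below is ours, not the paper's (the names seq_compose and run are ours, and the EnP flag with its initial action aI is elided for brevity): each component is a function from a state to a list of successor states, its actions are guarded by its En flag, and a transition action hands control to the next component when the current one is terminal.

```python
def seq_compose(components):
    """Actions of (P1; ...; PN): component actions guarded by En flags,
    plus transition actions aTj that pass control onward."""
    acts = []
    for j, comp in enumerate(components):
        # component action, enabled only while En[j] holds
        acts.append(lambda s, j=j, c=comp: c(s) if s["En"][j] else [])

        # aTj: when Pj is terminal and En[j] holds, clear En[j] and
        # (if there is a next component) set En[j+1]
        def trans(s, j=j, c=comp):
            if s["En"][j] and not c(s):
                en = list(s["En"])
                en[j] = False
                if j + 1 < len(en):
                    en[j + 1] = True
                return [{**s, "En": tuple(en)}]
            return []
        acts.append(trans)
    return acts

def run(acts, state):
    """Execute enabled actions until a terminal state is reached."""
    while True:
        succ = next((ss[0] for a in acts if (ss := a(state))), None)
        if succ is None:
            return state
        state = succ

# P1 increments x until x = 2; P2 then doubles y once.
p1 = lambda s: [{**s, "x": s["x"] + 1}] if s["x"] < 2 else []
p2 = lambda s: [{**s, "y": 2 * s["y"]}] if s["y"] == 1 else []

final = run(seq_compose([p1, p2]), {"x": 0, "y": 1, "En": (True, False)})
assert final == {"x": 2, "y": 2, "En": (False, False)}
```

No action of p2 can fire until the transition action for p1 has cleared En[0] and set En[1], which is exactly the sequencing the definition below makes precise.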
D. ;'. i',. 3.11 (Sequential composition).
If programs PI,...,PVN, with Pj = (Vj, L, InitL, Aj,PV, PAj), can be com-
posed (Definition 3.10), we define their sequential composition (Pi;... ; PN)
(V, L, InitL, A, PA, PV) thus:
- V = Vi ... U VN UL.
- L = L U ... U LN U {Enp,Eni,..., EnN}, where Enp,Eni,..., EnN are
distinct Boolean variables not otherwise occurring in V:
Enp is true in the initial state of the sequential composition and false there-
after.
For all j, Enj is true during (and only .li, i;-_ the part of the computation
corresponding to execution of Pj.
InitL is defined thus: The initial value of Enp is true. For all j, the initial
value of Enj is false, and the initial values of variables in Lj are those given
by InitLj.
A consists of the following i- i, of actions:
Actions corresponding to actions in Aj, for some j: For a E Aj, we define
a' identical to a except that a' is enabled only when Enj = true.
Actions that accomplish the transitions between components of the com-
position:
Initial action , takes i initial state s, with Enp = true, to a state
s' identical except that Enp = false and Enl = true. s' is thus an initial
state of PI.
For j with 1 < j < N, action aTr takes ;,!- terminal state s of Py,
with Enj = true, to a state s' identical except that Enj = false and
Enj+i = true. s' is thus an initial state of Pj+I.
1 i ,1 I action aT, takes ;i!! terminal state s of PN, with EnN = true, to
a state s' identical except that EnN = false. s' is thus a terminal state
of the sequential composition.
PV= PVIU...UPVN.
PA contains exactly those actions a' derived (as described above) from the
actions a of PAI U ... U PAN.
Parallel composition

The usual meaning of parallel composition is this: A computation of (P1 || P2)
defines two threads of control, one each for P1 and P2. Initiating the composition
corresponds to starting both threads; execution of the composition corresponds
to an interleaving of actions from both components; and the composition is
understood to terminate when both components have terminated. We can give a
definition with this meaning in terms of our model by introducing additional local
variables that ensure that the composition terminates when all of its components
terminate, as follows: As for sequential composition, we introduce additional
local variables En1, ..., EnN such that actions from program Pj can execute only
when Enj is true. For parallel composition, however, all of the Enj's are set to
true at the start of the computation, so computation is an interleaving of actions
from the Pj's. As each Pj terminates, it sets the corresponding Enj to false;
when all are false, the composition has terminated. Observe that the definitions
of parallel and sequential composition are almost identical; this greatly facilitates
the proofs of Lemma 3.17 and Lemma 3.18.
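This interleaving view can also be sketched in our encoding (names like par_run are ours, not the paper's): all En flags start true, an enabled component is chosen at each step, and a component's flag is cleared when it becomes terminal. For the two independent components below, every choice of interleaving reaches the same final state, previewing Theorem 3.15.

```python
import random

def par_run(components, state, rng):
    """Interleave enabled actions; En[j] is cleared when Pj terminates."""
    en = [True] * len(components)
    while any(en):
        j = rng.choice([k for k in range(len(en)) if en[k]])
        succs = components[j](state)
        if succs:
            state = succs[0]   # an action of Pj fires
        else:
            en[j] = False      # transition action aTj: Pj has terminated
    return state

# Two components touching disjoint variables:
p1 = lambda s: [{**s, "x": s["x"] + 1}] if s["x"] < 3 else []
p2 = lambda s: [{**s, "y": 2 * s["y"]}] if s["y"] < 8 else []

# Try twenty different interleavings (one per random seed):
finals = {tuple(sorted(par_run([p1, p2], {"x": 0, "y": 1},
                               random.Random(seed)).items()))
          for seed in range(20)}
assert len(finals) == 1        # every interleaving gives the same result
```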
Definition 3.12 (Parallel composition).
If programs P1, ..., PN, with Pj = (Vj, Lj, InitLj, Aj, PVj, PAj), can be com-
posed (Definition 3.10), we define their parallel composition (P1 || ... || PN) =
(V, L, InitL, A, PV, PA) thus:

- V = V1 ∪ ... ∪ VN ∪ L.
- L = L1 ∪ ... ∪ LN ∪ {EnP, En1, ..., EnN}, where EnP, En1, ..., EnN are
  distinct Boolean variables not otherwise occurring in V:
  - EnP is true in the initial state of the parallel composition and false there-
    after.
  - For all j, Enj is true until the part of the composition corresponding to Pj
    has terminated.
- InitL is defined thus: The initial value of EnP is true. For all j, the initial
  value of Enj is false, and the initial values of variables in Lj are those given
  by InitLj.
- A consists of the following types of actions:
  - Actions corresponding to actions in Aj, for some j: For a ∈ Aj, we define
    a' identical to a except that a' is enabled only when Enj is true.
  - Actions that correspond to the initiation and termination of the compo-
    nents of the composition:
    - Initial action aI takes any initial state s, with EnP = true, to a state
      s' identical except that EnP = false and Enj = true for all j. s' is thus
      an initial state of Pj, for all j.
    - For j with 1 ≤ j ≤ N, action aTj takes any terminal state s of Pj,
      with Enj = true, to a state s' identical except that Enj = false. A
      terminating computation of P contains one execution of each aTj; after
      execution of aTj for all j, the resulting state s' is a terminal state of the
      parallel composition.
- PV = PV1 ∪ ... ∪ PVN.
- PA contains exactly those actions a' derived (as described above) from the
  actions a of PA1 ∪ ... ∪ PAN.
3.5 arb-compatibility
We now turn our attention to defining sufficient conditions for a group of pro-
grams P1, ..., PN to have the property we want, namely:

(P1 || ... || PN) ≈ (P1; ...; PN).

We first define a key property of pairs of program actions; we can then define
the desired condition and show that it guarantees the property of interest.
Definition 3.13 (Commutativity of actions).
Actions a and b of program P are said to commute exactly when the following
two conditions hold:

- Execution of b does not affect (in the sense of Definition 3.7) whether a is
  enabled, and vice versa.
- It is possible to reach s2 from s1 by first executing a and then executing b
  exactly when it is also possible to reach s2 from s1 by first executing b and
  then executing a, as illustrated by Figure 2. (Observe that
  if a and b are nondeterministic, there may be more than one such state s2.)

[Figure 2: a diamond-shaped transition diagram illustrating commuting actions
a and b.]

Fig. 2. Commutativity of actions a and b. Observe that a and b are nondeterministic,
but the graph has the property that if we can reach a state (s2 or s2') by executing
first a and then b, then we can reach the same state by first executing b and then a,
and vice versa.

(That is, a and b commute exactly when they have the diamond property [10].)
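Over a finite state space, the diamond half of this definition can be checked by brute force. The sketch below is our own illustration (the actions a, b, c and the function commute are ours); it tests only the reachability condition, so the enabledness condition of Definition 3.7 would need a separate check.

```python
from itertools import product

def succs_seq(first, second, state):
    """States reachable by executing `first` then `second` from state."""
    return {s2 for s1 in first(state) for s2 in second(s1)}

def commute(a, b, states):
    """Diamond check: a-then-b reaches exactly the states b-then-a does."""
    return all(succs_seq(a, b, s) == succs_seq(b, a, s) for s in states)

# States are (x, y) pairs; a updates only x, b updates only y.
a = lambda s: [(s[0] + 1, s[1])] if s[0] < 2 else []
b = lambda s: [(s[0], s[1] + 1)] if s[1] < 2 else []
# c reads and writes x, so it does not commute with a.
c = lambda s: [(2 * s[0], s[1])]

states = list(product(range(3), range(3)))
assert commute(a, b, states)
assert not commute(a, c, states)
```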
Definition 3.14 (arb-compatible).
Programs P1, ..., PN are arb-compatible exactly when they can be composed
(Definition 3.10) and any action in one program commutes (Definition 3.13)
with any action in another program.
Theorem 3.15 (Parallel ≈ sequential for arb-compatible programs).
If P1, ..., PN are arb-compatible, then

(P1 || ... || PN) ≈ (P1; ...; PN).

Proof of Theorem 3.15.
We write PP = (P1 || ... || PN) and PS = (P1; ...; PN). From Definition 3.11 and
Definition 3.12,

(VP = VS) ∧ (LP = LS) ∧ (InitLP = InitLS) ∧ (PVP = PVS) ∧ (PAP = PAS),

so we write PP = (V, L, InitL, AP, PV, PA) and PS = (V, L, InitL, AS, PV, PA).
We proceed as follows:

- We first show (Lemma 3.17) that for every maximal computation CS of PS
  there is a maximal computation CP of PP with CS equivalent to CP with
  respect to V \ L. From Theorem 3.9, this establishes that PP ⊑ PS.
- We then show (Lemma 3.18) the converse: that for every maximal computa-
  tion CP of PP there is a maximal computation CS of PS with CP equivalent
  to CS with respect to V \ L. From Theorem 3.9, this establishes that PS ⊑ PP.
- We then conclude that PP ≈ PS, as desired.
Lemma 3.16 (Reordering of computations).
Suppose that P1, ..., PN are arb-compatible and CP is a finite (not necessarily
maximal) computation of PP = (P1 || ... || PN) containing a successive pair of
transitions ((a, sn), (b, sn+1)) such that a and b commute. Then we can construct
a computation CP' of PP with the same initial and final states as CP, and the
same sequence of transitions, except that the pair ((a, sn), (b, sn+1)) has been
replaced by the pair ((b, sn'), (a, sn+1)).

Proof of Lemma 3.16.
This is an obvious consequence of the commutativity (Definition 3.13) of a and
b: If s(n-1) -a-> sn and sn -b-> sn+1, then there exists a state sn' such that
s(n-1) -b-> sn' and sn' -a-> sn+1, so we can construct a computation as described.
Lemma 3.17 (Sequential refines parallel).
For PP and PS defined as in Theorem 3.15, if CS is a maximal computation of
PS, there is a maximal computation CP of PP with CS equivalent to CP with
respect to V \ L.

Proof of Lemma 3.17.
The proof of this lemma is straightforward for finite computations: We have
defined parallel and sequential composition in such a way that any maximal
finite computation of the sequential composition maps to an equivalent maximal
computation of the parallel composition.

For nonterminating computations, we can similarly map a computation of
the sequential composition to an infinite sequence of transitions of the parallel
composition. However, the result may not be a computation of the parallel com-
position because it may violate the fairness requirement: If Pj fails to terminate,
no action of Pj+1 can occur, even though in the parallel composition there may
be actions of Pj+1 that are enabled. If this is the case, however, we can use the
principle behind Lemma 3.16 to transform the unfair sequence of transitions into
a fair one.
Lemma 3.18 (Parallel implies sequential).
For Pp and Ps defined as in Theorem 3.15, if Cp is a maximal computation of Pp, there is a maximal computation Cs of Ps such that Cs is equivalent to Cp with respect to V \ L.
Proof of Lemma 3.18.
For terminating computations, the proof is straightforward: Given a maximal computation of the parallel composition, we first apply Lemma 3.16 repeatedly to construct an equivalent (also maximal) computation of the parallel composition
in which, for j < k, all transitions corresponding to actions of Pj occur before
transitions corresponding to actions of Pk. As in the proof of Lemma 3.17, this
computation then maps to an equivalent maximal computation of the sequential
composition.
For nonterminating computations, we can once again use the principle behind
Lemma 3.16 to construct a sequence of transitions (of the parallel composition)
in which, for j < k, all transitions corresponding to actions of Pj occur be-
fore transitions corresponding to actions of Pk. We then map this sequence of
transitions to a computation of the sequential composition.
3.6 arb composition
For arb-compatible programs P1, ..., PN, then, we know that
(P1 || ... || PN) ~ (P1; ...; PN) .
To denote this parallel/sequential composition of arb-compatible elements, we write arb(P1, ..., PN), where
arb(P1, ..., PN) ~ (P1 || ... || PN)
or equivalently
arb(P1, ..., PN) ~ (P1; ...; PN) .
We refer to this notation as "arb composition," although it is not a true composition operator, since it is properly applied only to groups of elements that are arb-compatible. We regard it as a useful form of syntactic sugar that denotes not only the parallel/sequential composition of P1, ..., PN but also the fact that P1, ..., PN are arb-compatible. We also define an additional bit of syntactic sugar, seq(P1, ..., PN), such that
seq(P1, ..., PN) ~ (P1; ...; PN) .
(We use this notation to improve the readability of nestings of sequential and arb composition.)
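Informally, the equivalence can be observed in any host language; the following Python sketch (our own construction; the arb and seq helpers are illustrative, not the paper's notation) runs the same arb-compatible blocks both ways and reaches the same final state:

```python
import threading

def arb(*blocks):
    """Run arb-compatible blocks concurrently (one thread each).
    For arb-compatible blocks this is equivalent to running in order."""
    threads = [threading.Thread(target=b) for b in blocks]
    for t in threads: t.start()
    for t in threads: t.join()

def seq(*blocks):
    """Run blocks one after another."""
    for b in blocks:
        b()

state = {}

def p1():
    state["a"] = 1
    state["b"] = state["a"]

def p2():
    state["c"] = 2
    state["d"] = state["c"]

seq(p1, p2)
sequential_result = dict(state)

state.clear()
arb(p1, p2)                     # p1 and p2 touch disjoint variables
parallel_result = dict(state)

assert sequential_result == parallel_result == {"a": 1, "b": 1, "c": 2, "d": 2}
```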
arb composition has a number of useful properties. It is associative and
commutative (proofs given in [30]), and it allows refinement by parts, as the
following theorem states.
Theorem 3.19 (Refinement by parts of arb composition).
We can refine any component of an arb composition to obtain a refinement of the whole composition. That is, if P1, ..., PN are arb-compatible, and, for each j, Pj ⊑ P'j, and P'1, ..., P'N are arb-compatible, then
arb(P1, ..., PN) ⊑ arb(P'1, ..., P'N)
Proof of Theorem 3.19.
This follows from Theorem 3.15 and refinement by parts for sequential programs.
This theorem is the justification for our program-development strategy, in which we apply the techniques of sequential stepwise refinement to arb-model programs.
3.7 A simpler sufficient condition for arb-compatibility
The definition of arb-compatibility given in Definition 3.14 is the most general one that seems to give the desired properties (equivalence of parallel and sequential composition, and associativity and commutativity), but it may be difficult to apply in practice. We therefore give a more easily checked sufficient condition for programs P1, ..., PN to be arb-compatible.
Definition 3.20 (Variables read/written by P).
For program P and variable v, we say that v is read by P if it is an input variable for some action a of P, and we say that v is written by P if it is an output variable for some action a of P.
Theorem 3.21 (arb-compatibility and shared variables).
If programs P1, ..., PN can be composed (Definition 3.10), and for j ≠ k, no variable written by Pj is read or written by Pk, then P1, ..., PN are arb-compatible.
Proof of Theorem 3.21.
Given programs P1, ..., PN that satisfy the condition, it suffices to show that any two actions from distinct components Pj and Pk commute. The proof is straightforward; a detailed version appears in [30].
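The condition of Theorem 3.21 is mechanical to check once read/write sets are known; the following Python sketch (our own; it assumes the read and write sets of each component are supplied explicitly rather than extracted from program text) tests the condition pairwise:

```python
from itertools import combinations

def arb_compatible(components):
    """Check the sufficient condition of Theorem 3.21: for j != k, no
    variable written by Pj is read or written by Pk. Each component is
    given as a pair (reads, writes) of variable-name sets; computing
    these sets from program text is the separate problem discussed in
    the paper's Section 3.8."""
    for (r1, w1), (r2, w2) in combinations(components, 2):
        if w1 & (r2 | w2) or w2 & (r1 | w1):
            return False
    return True

# a=1; b=a  and  c=2; d=c : each component's writes are disjoint from
# the other's reads and writes.
assert arb_compatible([({"a"}, {"a", "b"}), ({"c"}, {"c", "d"})])
# a=1  and  b=a : the second component reads a, which the first writes.
assert not arb_compatible([(set(), {"a"}), ({"a"}, {"b"})])
```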
3.8 arb composition and programming notations
A key difficulty in applying our methodology for program development is in identifying groups of program elements that are known to be arb-compatible. The difficulty is exacerbated by the fact that many programming notations have a notion of program variable that is more difficult to work with than the notion we employ for our formal semantics. In our semantics, variables with distinct names address distinct data objects. In many programming notations, this need not be the case, and the difficulty of detecting situations in which variables with distinct names overlap (aliasing) complicates automatic program optimization and parallelization just as it complicates the application of our methodology.
Syntactic restrictions sufficient to guarantee arb-compatibility do not seem in general feasible. However, it is feasible to give semantics-based rules that identify, for program P, supersets of the variables read and written by P, and identifying such supersets is sufficient to permit application of Theorem 3.21. In [30] we discuss such rules for two representative programming notations, Dijkstra's guarded-command language [13,15] and Fortran 90 [22,1], and present examples of the use of these rules. Defining such rules is fairly straightforward for Dijkstra's guarded-command language, since it is a small and well-understood language. It is less straightforward for a large and complex language such as Fortran 90; giving a formal definition of its semantics is far from trivial. We observe, however, that the well-understood constructs of Dijkstra's guarded-command language have, when deterministic, analogous constructs in Fortran 90 (as in many other practical languages), and that suitably general results derived in Dijkstra's guarded-command language apply to Fortran 90 programs insofar as the Fortran 90 programs limit themselves to these analogous constructs.
Before presenting examples, we introduce a little additional notation so that
we can apply our extensions to the sequential programming model to a represen-
tative practical programming notation, Fortran 90. We do this in order to show
how our ideas apply in the context of a practical programming notation.
arb composition. For arb-compatible programs P1, ..., PN, we write their arb composition thus:
arb
  P1
  ...
  PN
end arb
seq composition. We define an analogous notation for sequential composition, using keywords seq and end seq, useful in improving the readability of nestings of sequential and arb composition.
arball. To allow us to express the arb composition of, for example, the iterations of a loop, we define an indexed form of arb composition, with syntax modeled after that of the FORALL construct of High Performance Fortran [19], as follows. This notation is syntactic sugar only, and all theorems that apply to arb composition apply to arball as well.
Definition 3.22 (arball).
If we have N index variables i1, ..., iN, with corresponding index ranges ij_start ≤ ij ≤ ij_end, and program block P such that P does not modify the value of any of the index variables, then we can define an arball composition as follows.
For each tuple (x1, ..., xN) in the cross product of the index ranges, we define a corresponding program block P(x1, ..., xN) by replacing index variables i1, ..., iN with corresponding values x1, ..., xN. If the resulting program blocks are arb-compatible, then we write their arb composition as follows:
arball (i1 = i1_start : i1_end, ..., iN = iN_start : iN_end)
  P(x1, ..., xN)
end arball
3.9 Examples of arb composition
Composition of sequential blocks. This example composes two sequences, the
first assigning to a and b and the second assigning to c and d.
arb
  seq
    a = 1 ; b = a
  end seq
  seq
    c = 2 ; d = c
  end seq
end arb
Composition of sequential blocks (arball). The following example composes ten
sequences, each assigning to one element of a and one element of b.
arball (i = 1:10)
seq
a(i) = i
b(i) = a(i)
end seq
end arball
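The expansion prescribed by Definition 3.22 can be mimicked directly; the following Python sketch (our own rendering of the arball example above) instantiates the loop body at each index value and executes the resulting blocks in an arbitrary order:

```python
# Expand arball (i = 1:10) { a(i) = i; b(i) = a(i) } into program
# blocks P(1), ..., P(10) per Definition 3.22 and run them in an
# arbitrary order; because the blocks are arb-compatible, the order
# does not affect the final state.
import random

N = 10
a = [None] * (N + 1)   # 1-based indexing, as in the Fortran-style example
b = [None] * (N + 1)

def make_block(x):
    def block():       # P(x): the loop body with index i replaced by value x
        a[x] = x
        b[x] = a[x]
    return block

blocks = [make_block(x) for x in range(1, N + 1)]
random.shuffle(blocks)  # any execution order is acceptable
for blk in blocks:
    blk()

assert a[1:] == list(range(1, N + 1))
assert b[1:] == list(range(1, N + 1))
```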
Invalid composition. The following example is not a valid arb composition; the
two assignments are not arb-compatible.
arb
  a = 1
  b = a
end arb
3.10 Execution of arb-model programs
Since for arb-compatible program elements, their arb composition is semanti-
cally equivalent to their parallel composition and also to their sequential com-
position, programs written using sequential commands and constructors plus
(valid) arb composition can, as noted earlier, be executed either as sequential or
as parallel programs with identical results.1 In this section we sketch how to do
this in the context of practical programming languages; [30] presents additional
details and examples.
Sequential execution. A program in the arb model can be executed sequentially; such a program can be transformed into an equivalent program in the underlying sequential notation by replacing arb composition with sequential composition. For Fortran 90, this is done by removing arb and end arb and transforming arball into nested DO loops.
Parallel execution. A program in the arb model can be executed on a shared-memory parallel architecture given a language construct that implements general parallel composition as defined in Definition 3.12. Language constructs consistent with this form of composition include the par and parfor constructs of CC++ [7] and the PARALLEL DO and PARALLEL SECTIONS constructs of the OpenMP proposal [33].
4 The par model and shared-memory programs
As discussed in Section 1, once we have developed a program in our arb model, we can transform the program into one suitable for execution on a shared-memory architecture via what we call the par model, which is based on a structured form of parallel composition with barrier synchronization that we call par composition. In our methodology, we initially write down programs using arb composition and sequential constructs; after applying transformations
1 Programs that use arb to compose elements that are not arb-compatible cannot, of course, be guaranteed to have this property. As discussed in Section 3.6, we assume that the arb composition notation is applied only to groups of program elements that are arb-compatible; it is the responsibility of the programmer to ensure that this is the case.
such as those presented in Appendix C, we transform the results into par-model programs, which are then readily converted into programs for shared-memory architectures (by replacing par composition with parallel composition and our barrier synchronization construct with that provided by a selected parallel language or library). In this section we extend our model of parallel composition to include barrier synchronization, give key transformations for turning arb-model programs into programs using parallel composition with barrier synchronization, and briefly discuss executing such programs on shared-memory architectures.
4.1 Parallel composition with barrier synchronization
We first expand the definition of parallel composition given in Section 3 (Definition 3.12) to include barrier synchronization. Behind any synchronization mechanism is the notion of "suspending" a component of a parallel composition until some condition is met; that is, temporarily interrupting the normal flow of control in the component, and then resuming it when the condition is met. We model suspension as busy waiting, since this approach simplifies our definitions and proofs by making it unnecessary to distinguish between computations that terminate normally and computations that terminate in a deadlock situation: if suspension is modeled as a busy wait, deadlocked computations are infinite.
Specification of barrier synchronization. We first give a specification for barrier synchronization; that is, we define the expected behavior of a barrier command in the context of the parallel composition of programs P1, ..., PN. If iBj denotes the number of times Pj has initiated the barrier command, and cBj denotes the number of times Pj has completed the barrier command, then we require the following:
- For all j, iBj = cBj or iBj = cBj + 1. If iBj = cBj + 1, we say that Pj is suspended at the barrier. If iBj = cBj, we say that Pj is not suspended at the barrier.
- If Pj and Pk are both suspended at the barrier, or neither Pj nor Pk is suspended at the barrier, then iBj = iBk.
- If Pj is suspended at the barrier and Pk is not suspended at the barrier, iBj = iBk + 1.
- For any n, if every Pj initiates the barrier command n times, then eventually every Pj completes the barrier command n times:
(∀j :: (iBj = cBj + 1) ∧ (iBj = n)) ↝ (∀j :: (cBj = n))
We observe that this specification simply captures formally the usual meaning of barrier synchronization and is consistent with other formalizations, for example those of [2] and [36]. Most details of the specification were obtained from [38]; the overall method (in which initiations and completions of a command are considered separately) owes much to [27].
Barrier synchronization in our model. We define barrier synchronization by extending the definition of parallel composition given in Definition 3.12 and defining a new command, barrier. This combined definition implements a common approach to barrier synchronization based on keeping a count of processes waiting at the barrier, as in [2]. In the context of our model, we implement this approach using two protocol variables local to the parallel composition, a count Q of suspended components and a flag Arriving that indicates whether components are arriving at the barrier or leaving it. As components arrive at the barrier, we suspend them and increment Q. When Q equals the number of components, we set Arriving to false and allow components to leave the barrier. Components leave the barrier by unsuspending and decrementing Q. When Q equals 0, we reset Arriving to true, ready for the next use of the barrier.
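The count-and-flag protocol just described can be sketched in Python (our own rendering, not part of the paper; the class and method names are ours). The two loops mirror the arrive/release and leave/reset actions, with suspension modeled as busy waiting as in the text:

```python
import threading, time

class CountingBarrier:
    """Reusable barrier using a count Q of suspended components and a
    flag Arriving, following the protocol described in the text."""
    def __init__(self, n):
        self.n = n
        self.q = 0               # count of suspended components
        self.arriving = True     # components arriving vs. leaving
        self.lock = threading.Lock()

    def barrier(self):
        # Arrive/release phase.
        while True:
            with self.lock:
                if self.arriving:
                    if self.q < self.n - 1:
                        self.q += 1            # arrive: suspend
                        break
                    else:
                        self.arriving = False  # release: last arrival opens the barrier
                        return
            time.sleep(0)                      # busy wait
        # Leave/reset phase (only suspended components reach this point).
        while True:
            with self.lock:
                if not self.arriving:
                    if self.q > 1:
                        self.q -= 1            # leave
                    else:
                        self.q = 0             # reset: last leaver re-arms the barrier
                        self.arriving = True
                    return
            time.sleep(0)

# Two components exchange results across the barrier.
bar = CountingBarrier(2)
a = [0, 0]
b = [0, 0]

def worker(i):
    a[i] = i + 1
    bar.barrier()        # neither component proceeds until both arrive
    b[i] = a[1 - i]      # safely read the other component's result

ts = [threading.Thread(target=worker, args=(i,)) for i in range(2)]
for t in ts: t.start()
for t in ts: t.join()
assert b == [2, 1]
```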
Definition 4.1 (barrier).
We define program barrier = (V, L, InitL, A, PV, PA) as follows:
- V = L ∪ {Q, Arriving}.
- L = {En, Susp}, where En and Susp are Boolean variables.
- InitL = (true, false).
- A = {a_arrive, a_release, a_leave, a_reset, a_await}, where
  - a_arrive corresponds to a process's initiating the barrier command when fewer than N-1 other processes are suspended. The process should then suspend, so the action is defined by the set of state transitions s -> s' such that:
    - In s, En is true, Arriving is true, and Q < (N-1).
    - s' is s with En set to false, Susp set to true, and Q incremented by 1.
  - a_release corresponds to a process's initiating the barrier command when N-1 other processes are suspended. The process should then complete the command and enable the other processes to complete their barrier commands as well. The action is thus defined by the set of state transitions s -> s' such that:
    - In s, En is true, Arriving is true, and Q = (N-1).
    - s' is s with En set to false and Arriving set to false. Susp, which was initially false, is unchanged.
  - a_leave corresponds to a process's completing the barrier command when at least one other process has not completed its barrier command. The action is defined by the set of state transitions s -> s' such that:
    - In s, Susp is true, Arriving is false, and Q > 1.
    - s' is s with Susp set to false and Q decremented by 1.
  - a_reset corresponds to a process's completing the barrier command when all other processes have already done so. The action is defined by the set of state transitions s -> s' such that:
    - In s, Susp is true, Arriving is false, and Q = 1.
    - s' is s with Susp set to false, Arriving set to true, and Q set to 0.
  - a_await corresponds to a process's busy-waiting at the barrier. The action is defined by the set of state transitions s -> s' such that:
    - In s, Susp is true.
    - s' = s.
- PV = {Q, Arriving}.
- PA = A.
Definition 4.2 (Parallel composition with barrier synchronization).
We define parallel composition as in Section 3 (Definition 3.12), except that we add local protocol variables Arriving (of type Boolean) and Q (of type integer) with initial values true and 0 respectively.
Observe that this definition meets the specification given previously; a proof
can be constructed by formalizing the introductory discussion preceding the
definitions.
4.2 The par model
We now define a structured form of parallel composition with barrier synchronization. Previously we defined a notion of arb-compatibility and then defined arb composition as the parallel composition of arb-compatible components. Analogously, in this section we define a notion of par-compatibility and then define par composition as the parallel composition of par-compatible components. The idea behind par-compatibility is that the components match up with regard to their use of the barrier command; that is, they all execute the barrier command the same number of times and hence do not deadlock. Observe that while our definition is given in terms of restricted forms of the alternative (IF) and repetition (DO) constructs of Dijkstra's guarded-command language [13,15], it applies to any programming notation with equivalent constructs.
Definition 4.3 (arb-compatible, revisited).
Programs P1, ..., PN are arb-compatible exactly when (i) they meet the conditions for arb-compatibility given earlier (Definition 3.14), and (ii) for each j, Pj contains no free barriers, where program P is said to contain a free barrier exactly when it contains an instance of barrier not enclosed in a parallel composition.
Definition 4.4 (par-compatible).
We say programs P1, ..., PN are par-compatible exactly when one of the following is true:
- P1, ..., PN are arb-compatible.
- For each j,
  Pj = Qj; barrier; Rj
  where Q1, ..., QN are arb-compatible and R1, ..., RN are par-compatible.
- For each j,
  Pj = if bj -> Qj [] ¬bj -> skip fi
  where Q1, ..., QN are par-compatible, and for k ≠ j no variable that affects bj is written by Qk.
- For each j,
  Pj = if bj -> (Qj; barrier; Rj) [] ¬bj -> skip fi
  where Q1, ..., QN are arb-compatible, R1, ..., RN are par-compatible, and for k ≠ j no variable that affects bj is written by Qk.
- For each j,
  Pj = do bj -> (Qj; barrier; Rj; barrier) od
  where Q1, ..., QN are arb-compatible, R1, ..., RN are par-compatible, and for k ≠ j no variable that affects bj is written by Qk.
As with arb, we write par(P1, ..., PN) to denote the parallel composition (with barrier synchronization) of par-compatible elements P1, ..., PN. We also define a Fortran 90-compatible notation analogous to that for arb, and an indexed form parall analogous to arball.
4.3 Examples of par composition
Composition of sequential blocks (parall). The following example composes ten
sequences, each assigning to one element of a and one element of b. The bar-
rier is needed since otherwise the sequences being composed would not be par-
compatible.
parall (i = 1:10)
seq
a(i) = i
barrier
b(i) = a(11-i)
end seq
end parall
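A minimal sketch of this parall example in Python, using the standard library's threading.Barrier in place of the barrier command (the rendering is ours, not the paper's):

```python
import threading

N = 10
a = [0] * (N + 1)           # 1-based indexing, as in the example
b = [0] * (N + 1)
bar = threading.Barrier(N)  # all N components synchronize here

def component(i):
    a[i] = i
    bar.wait()              # the barrier command
    b[i] = a[11 - i]        # reads an element written by another component

threads = [threading.Thread(target=component, args=(i,)) for i in range(1, N + 1)]
for t in threads: t.start()
for t in threads: t.join()

# Without the barrier, b(i) might read a(11-i) before it is written.
assert b[1:] == [11 - i for i in range(1, N + 1)]
```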
Invalid composition. The following example is not a valid par composition; the
two sequences are not par-compatible.
par
seq
a = 1 ; barrier ; b = a
end seq
seq
c = 2
end seq
end par
4.4 Transforming arb-model programs into par-model programs
We now give theorems allowing us to transform programs in the arb model
into programs in the par model. The versions here are suitable if the eventual
goal is a program for a shared-memory architecture; versions more suitable for
distributed-memory architectures are presented in Appendix B.
Theorem 4.5 (Replacement of arb with par).
If P1, ..., PN are arb-compatible,
arb(P1, ..., PN) ⊑ par(P1, ..., PN)
Proof of Theorem 4.5.
Trivial.
Theorem 4.6 (Interchange of par and sequential composition).
If Q1, ..., QN are arb-compatible and R1, ..., RN are par-compatible, then
arb(Q1, ..., QN); par(R1, ..., RN)
⊑
par(
  (Q1; barrier; R1),
  ...,
  (QN; barrier; RN)
)
Proof of Theorem 4.6.
First observe that both sides of the refinement have the same set of non-local variables V \ L. We need to show that given any maximal computation C of the right-hand side of the refinement we can produce a maximal computation C' of the left-hand side such that C' is equivalent to C with respect to V \ L. This is straightforward: In any maximal computation of the right-hand side, from the definitions of sequential composition and barrier we know that we can partition the computation into (1) a segment consisting of maximal computations of the Qj's and initiations of the barrier command, one for each j, and (2) a segment consisting of completions of the barrier command, one for each j, and maximal computations of the Rj's. Segment (1) can readily be mapped to an equivalent maximal computation of arb(Q1, ..., QN) by removing the barrier-initiation actions. Segment (2) can readily be mapped to an equivalent maximal computation of par(R1, ..., RN) by removing the first barrier-completion action for each j. We observe that this approach works even for nonterminating computations: If the right-hand side does not terminate, then either at least one Qj does not terminate, or par(R1, ..., RN) does not terminate, and in either case the analogous computation of the left-hand side also does not terminate. The right-hand side cannot fail to terminate because of deadlock at the first barrier, because if all the Qj's terminate, the immediately following executions of barrier terminate as well (from the specification of barrier synchronization).
Theorem 4.7 (Interchange of par and IF, part 1).
If Q1, ..., QN are par-compatible, and for all j no variable that affects b is written by Qj, then
if b -> par(Q1, ..., QN) [] ¬b -> skip fi
⊑
par(
  if b -> Q1 [] ¬b -> skip fi,
  ...,
  if b -> QN [] ¬b -> skip fi
)
Proof of Theorem 4.7.
Again observe that both sides of the refinement have the same set of non-local variables V \ L. As before, a proof can be constructed by considering all maximal computations of the right-hand side and showing that for each such computation C we can produce a maximal computation C' of the left-hand side such that C' is equivalent to C with respect to V \ L. Here, such a proof uses the fact that the value of b is not changed by Qj for any j. Since no barriers are introduced in this transformation, we do not introduce additional possibilities for deadlock.
Theorem 4.8 (Interchange of par and IF, part 2).
If Q1, ..., QN are arb-compatible, R1, ..., RN are par-compatible, and for all j no variable that affects b is written by Qj, then
if b -> (arb(Q1, ..., QN); par(R1, ..., RN)) [] ¬b -> skip fi
⊑
par(
  if b -> (Q1; barrier; R1) [] ¬b -> skip fi,
  ...,
  if b -> (QN; barrier; RN) [] ¬b -> skip fi
)
Proof of Theorem 4.8.
Again observe that both sides of the refinement have the same set of non-local variables V \ L. As before, a proof can be constructed by considering all maximal computations of the right-hand side and showing that for each such computation C we can produce a maximal computation C' of the left-hand side such that C' is equivalent to C with respect to V \ L. The barrier introduced in the transformation cannot deadlock, for reasons similar to those for the transformation of Theorem 4.6.
Theorem 4.9 (Interchange of par and DO).
If Q1, ..., QN are arb-compatible, R1, ..., RN are par-compatible, and for all j no variable that affects b is written by Qj, then
do b -> (arb(Q1, ..., QN); par(R1, ..., RN)) od
⊑
par(
  do b -> (Q1; barrier; R1; barrier) od,
  ...,
  do b -> (QN; barrier; RN; barrier) od
)
Proof of Theorem 4.9.
First observe that both sides of the refinement have the same set of non-local variables V \ L. As before, a proof can be constructed by considering all maximal computations of the right-hand side and showing that for each such computation C we can produce a maximal computation C' of the left-hand side such that C' is equivalent to C with respect to V \ L. The proof makes use of the restrictions on when variables that affect b can be written. For terminating computations, the proof can be constructed using the standard unrolling of the repetition command (as in [18] or [15]) together with Theorem 4.6 and Theorem 4.8. For nonterminating computations, the proof must consider two classes of computations: those that fail to terminate because an iteration of one of the loops fails to terminate, and those that fail to terminate because one of the loops iterates forever. In both cases, however, the computation can be mapped onto an infinite (and therefore, in our model, equivalent) computation of the left-hand side.
Example of applying the transformations. Let P be the following program:
do while (x < 100)
  arb
    a = a * 2
    b = b + 1
  end arb
  par
    x = max(a, b)
    skip
  end par
end do
Then P is refined by the following:
par
  do while (x < 100)
    a = a * 2 ; barrier ; x = max(a, b) ; barrier
  end do
  do while (x < 100)
    b = b + 1 ; barrier ; skip ; barrier
  end do
end par
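Assuming the barriers behave as specified in Section 4.1, the refinement can be checked on this concrete instance; the following Python sketch (our own, with threading.Barrier standing in for the barrier command) runs both versions from the same initial state and compares the results:

```python
import threading

def sequential(a, b, x):
    # The original arb-model program P, read sequentially.
    while x < 100:
        a = a * 2
        b = b + 1
        x = max(a, b)
    return a, b, x

def parallel(a0, b0, x0):
    # The refined par-model program: two loops synchronized by barriers.
    s = {"a": a0, "b": b0, "x": x0}
    bar = threading.Barrier(2)

    def p1():
        while s["x"] < 100:
            s["a"] *= 2
            bar.wait()                      # first barrier
            s["x"] = max(s["a"], s["b"])
            bar.wait()                      # second barrier: x is stable
                                            # before the next loop test

    def p2():
        while s["x"] < 100:
            s["b"] += 1
            bar.wait()
            # skip
            bar.wait()

    ts = [threading.Thread(target=p) for p in (p1, p2)]
    for t in ts: t.start()
    for t in ts: t.join()
    return s["a"], s["b"], s["x"]

# Both versions agree on the final state (initial values are ours).
assert sequential(1, 0, 0) == parallel(1, 0, 0) == (128, 7, 128)
```

Both components re-evaluate the loop condition only after the second barrier, so they always agree on the number of iterations, which is exactly why the second barrier in each loop body is needed.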
Additional examples of applying these transformations are given in [30].
4.5 Executing par-model programs
It is clear that par composition as described in this section is implemented by general parallel composition (as described in Section 3.10) plus a barrier synchronization that meets the specification of Section 4.1. Thus, we can transform a program in the par model into an equivalent program in any language with constructs that implement parallel composition and barrier synchronization in a way consistent with our definitions (which in turn are consistent with the usual meaning of parallel composition with barrier synchronization). Examples of such constructs are the PARALLEL DO, PARALLEL SECTIONS, and BARRIER constructs of the OpenMP proposal [33]. Examples of conversions are given in [30].
5 The subset par model and distributed-memory
programs
As discussed in Section 1, once we have developed a program in our arb model, we can transform the program into one suitable for execution on a distributed-memory message-passing architecture via what we call the subset par model, which is a restricted form of the par model discussed in Section 4. In our methodology, we apply a succession of transformations to an arb-model program to produce a program in the subset par model and then transform the result into a program for a distributed-memory message-passing architecture. In this section we extend our model of parallel composition to include message-passing operations, define a restricted subset of the par model that corresponds more directly to distributed-memory architectures, discuss transforming programs in the resulting subset par model into programs using parallel composition with message-passing, and finally discuss executing such programs on distributed-memory message-passing architectures.
5.1 Parallel composition with message-passing
We first expand the definition of parallel composition given in Section 3 to include
message-passing.
Specification of message-passing. We define message-passing for P1, ..., PN composed in parallel in a way compatible with single-sender single-receiver channels with infinite slack (i.e., infinite capacity). Every message operation (send or receive) specifies a sender and a receiver, and while a receive operation suspends if there is no message to receive, a send operation never suspends. Messages are received in the order in which they are sent and are not received before they are sent. That is, if we let nSj,k denote the number of send operations from Pj to Pk performed, iRj,k denote the number of receive operations from Pj to Pk initiated, and cRj,k denote the number of such receive operations completed, then we can write the desired specification as follows:
- iRj,k = cRj,k or iRj,k = cRj,k + 1 for all j, k.
- Messages are not received before they are sent: nSj,k ≥ cRj,k for all j, k.
- Messages are received in the order in which they are sent: The n-th message received by Pk from Pj is identical with the n-th message sent from Pj to Pk.
- If n messages are sent from Pj to Pk, and Pk initiates n receive operations for messages from Pj, then all will complete:
((nSj,k ≥ n) ∧ (iRj,k = n)) ↝ (cRj,k = n)
We observe that this specification, like the one for barrier synchronization in Section 4, simply captures formally the usual meaning of this type of message passing, and is consistent with other formalizations, for example those of [2] and [36]. The terminology ("slack") and the overall method (in which initiations and completions of a command are considered separately) are based on [27].
Message-passing in our model. Like many other implementations of message-passing, for example that of [2], our definition represents channels as queues: We define for each ordered pair (Pj, Pk) a queue Cj,k whose elements represent messages in transit from Pj to Pk. Message sends are then represented as enqueue operations and message receives as (possibly suspending) dequeue operations. Elements of Cj,k take the form of pairs (Tag, Value). Just as we did in Section 4, we model suspension as busy waiting.
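A single-threaded sketch of this queue representation in Python (our own illustration; recv here returns None where the model would busy-wait, to keep the sketch sequential):

```python
from collections import deque

class Channels:
    """Single-sender single-receiver channels with infinite slack,
    represented as one FIFO queue per ordered pair (Pj, Pk)."""
    def __init__(self, n):
        self.c = {(j, k): deque() for j in range(n) for k in range(n)}

    def send(self, sender, rcvr, tag, value):
        # Sending never suspends: enqueue (Tag, Value).
        self.c[(sender, rcvr)].append((tag, value))

    def recv(self, sndr, receiver):
        # Dequeue if nonempty; on an empty channel the full model would
        # busy-wait, which we signal here by returning None.
        q = self.c[(sndr, receiver)]
        if not q:
            return None          # "suspended": nothing to receive yet
        return q.popleft()

ch = Channels(2)
assert ch.recv(0, 1) is None             # receive before send: suspends
ch.send(0, 1, "data", 42)
ch.send(0, 1, "data", 43)
assert ch.recv(0, 1) == ("data", 42)     # FIFO: first sent, first received
assert ch.recv(0, 1) == ("data", 43)
```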
Definition 5.1 (send).
We define program send = (V, L, InitL, A, PV, PA) as follows:
- V = L ∪ {OutP1, ..., OutPN, Rcvr, Tag, Value}, where each OutPj ("out-port j") is a variable of type queue, Rcvr is an integer variable, Tag is a variable of type tag, and Value is a variable of type value. Variables OutP1, ..., OutPN are to be shared with the enclosing parallel composition, as described later, while variables Rcvr, Tag, Value are to be shared with the enclosing sequential composition. (I.e., it is assumed that send is composed in sequence with assignment statements that assign appropriate values to Rcvr, Tag, and Value.)
- L = {En}, where En is a Boolean variable.
- InitL = (true).
- A = {a_send}, where
  - a_send corresponds to a process's sending a message (Tag, Value) to process P_Rcvr. The action is defined by the set of state transitions s -> s' such that:
    - In s, En is true.
    - s' is s with En set to false and (Tag, Value) enqueued (appended) to OutP_Rcvr.
- PV = {OutP1, ..., OutPN}.
- PA = A.
Definition 5.2 (recv).
We define program recv = (V, L, InitL, A, PV, PA) as follows:
- V = L ∪ {InP1, ..., InPN, Sndr, Tag, Value}, where each InPj ("in-port j") is a variable of type queue, Sndr is an integer variable, Tag is a variable of type tag, and Value is a variable of type value. Variables InP1, ..., InPN are to be shared with the enclosing parallel composition, as described later, while variables Sndr, Tag, Value are to be shared with the enclosing sequential composition, similarly to the analogous variables of send.
- L = {En}, where En is a Boolean variable.
- InitL = (true).
- A = {a_recv, a_await}, where
  - a_recv corresponds to a process's receiving a message (Tag, Value) from process P_Sndr. The action is defined by the set of state transitions s -> s' such that:
    - In s, En is true and InP_Sndr is not empty.
    - s' is s with En set to false and (Tag, Value) and InP_Sndr set to the values resulting from dequeueing an element from InP_Sndr.
  - a_await corresponds to a process's waiting for a message from process P_Sndr. The action is defined by the set of state transitions s -> s' such that:
    - In s, En is true and InP_Sndr is empty.
    - s' = s.
- PV = {InP1, ..., InPN}.
- PA = A.
Definition 5.3 (Parallel composition with message-passing).
We define parallel composition as in Section 3 (Definition 3.12), except that we add local protocol variables Cj,k (of type queue), one for each ordered pair (Pj, Pk), with initial values of "empty," and we perform the following additional modifications on the component programs Pj:
- We replace variables OutP1,..., OutPN in Vj with Cj,1,..., Cj,N, and we make the same replacement in actions a derived from asnd.
- We replace variables InP1,..., InPN in Vj with C1,j,..., CN,j, and we make the same replacement in actions a derived from arcv and await.
Observe that this definition clearly meets the specification given earlier.
5.2 The subset par model
We define the subset par model such that a computation of a program in this
model may be thought of as consisting of an alternating sequence of (i) blocks of
computation in which each component operates independently on its local data,
and (ii) blocks of computation in which values are copied between components,
separated by barrier synchronizations, as illustrated by Figure 3. We refer to a
Fig. 3. A computation of a subset-par-model program. Shaded vertical bars represent
computations of processes, arrows represent copying of data between processes, and
dashed horizontal lines represent barrier synchronization.
block of the first kind as a local-computation section and to a block of the
second kind (together with the preceding and succeeding barrier synchroniza-
tions) as a data-exchange operation.
That is, a program in the subset par model is a composition par(P1,..., PN),
where P1,..., PN are subset-par-compatible as defined by the following.
Definition 5.4 (Subset par-compatibility).
P1,..., PN are subset-par-compatible exactly when (i) P1,..., PN are par-compatible, (ii) the variables V of the composition (excluding the protocol variables representing message channels) are partitioned into disjoint subsets W1,..., WN, and (iii) exactly one of the following holds:
- P1,..., PN are arb-compatible and each Pj reads and writes only variables in Wj.
- For each j,
  Pj = Qj ; barrier ; Q'j ; barrier ; Rj
  where
  - Q1,..., QN are arb-compatible.
  - Each Qj reads and writes only variables in Wj.
  - Each Q'j is an arb-compatible set of assignment statements xk := xj such that xj is an element of Wj and xk is an element of Wk for some k (possibly k = j).
  - R1,..., RN are subset-par-compatible.
- For each j, bj ∈ Wj and
  Pj = if bj -> Qj [] -bj -> skip fi
  where Q1,..., QN are subset-par-compatible.
- For each j, bj ∈ Wj and
  Pj = do bj -> Qj od
  where Q1,..., QN are subset-par-compatible.
5.3 Example of subset par composition
The following example computes the maximum of four elements using recursive
doubling:
integer a(4), part(2), part_copy(2), m(2)
arb
  part(1) = max(a(1), a(2))
  part(2) = max(a(3), a(4))
end arb
arb
  part_copy(1) = part(2)
  part_copy(2) = part(1)
end arb
arb
  m(1) = max(part(1), part_copy(1))
  m(2) = max(part_copy(2), part(2))
end arb
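The three arb phases above can be rendered sequentially in Python (a sketch of ours, not from the paper; since the statements within each phase are arb-compatible, executing them in textual order gives the same result as parallel execution):

```python
def recursive_doubling_max(a):
    """Maximum of a[0..3] via recursive doubling, mirroring the three arb phases."""
    assert len(a) == 4
    # Phase 1: local partial maxima (each component reads only its own pair).
    part = [max(a[0], a[1]), max(a[2], a[3])]
    # Phase 2: data exchange (each component copies the other's partial result).
    part_copy = [part[1], part[0]]
    # Phase 3: combine local and copied partial results.
    m = [max(part[0], part_copy[0]), max(part_copy[1], part[1])]
    # Copy consistency: both components now hold the global maximum.
    assert m[0] == m[1]
    return m[0]
```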
5.4 Transforming subset-par-model programs into programs with
message-passing
We can transform a program in the subset par model into a program for a
distributed-memory-message-passing architecture by mapping each component
Pj onto a process j and making the following additional changes:
- Map each element Wj of the partition of V to the address space for process j.
- Convert each data-exchange operation (consisting of a set of
(barrier; Q'j; barrier) sequences, one for each component Pj) into a
collection of message-passing operations, in which each assignment xj := xk
is transformed into a pair of message-passing commands: a send command
in k (specifying Rcvr = j) and a recv command in j (specifying Sndr = k).
- Optionally, for any pair (Pj, Pk) of processes, concatenate all the messages
sent from Pj to Pk as part of a data-exchange operation into a single message,
replacing the collection of (send, receive) pairs from Pj to Pk with a single
(send, receive) pair.
Such a program refines the original program: Each send-receive pair of opera-
tions produces the same result as the assignment statement from which it was
derived (as discussed in [21] and [27]), and the arb-compatibility of the assign-
ments ensures that these pairs can be executed in any order without changing the
result. Replacing barrier synchronization with the weaker pairwise synchroniza-
tion implied by these pairs of message-passing operations also preserves program
correctness; we can construct a proof of this claim by using the techniques of
Section 3 and our definitions of barrier synchronization and message-passing. A
similar theorem and its proof are given in [29].
Example. If P is the recursive-doubling example program of Section 5.3, P is
refined by the following subset-par-model program P' with variables partitioned
into
- W1 = {a(1:2), part(1), part_copy(1), m(1)} and
- W2 = {a(3:4), part(2), part_copy(2), m(2)}:
arb
  seq
    part(1) = max(a(1), a(2))
    barrier ; part_copy(1) = part(2) ; barrier
    m(1) = max(part(1), part_copy(1))
  end seq
  seq
    part(2) = max(a(3), a(4))
    barrier ; part_copy(2) = part(1) ; barrier
    m(2) = max(part_copy(2), part(2))
  end seq
end arb
which is in turn refined by the following message-passing program P'':
arb
  seq
    part(1) = max(a(1), a(2))
    send ("integer", part(1)) to (P2)
    recv (type, part_copy(1)) from (P2)
    m(1) = max(part(1), part_copy(1))
  end seq
  seq
    part(2) = max(a(3), a(4))
    send ("integer", part(2)) to (P1)
    recv (type, part_copy(2)) from (P1)
    m(2) = max(part(2), part_copy(2))
  end seq
end arb
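A minimal Python emulation of P'' (ours, not from the paper) uses one thread per component and one queue per channel direction; each thread does its local computation, sends its partial result, then blocks on a receive:

```python
import threading
import queue

def make_process(my_a, out_q, in_q, result, idx):
    """One component: local max, send it, receive the other's, combine."""
    def body():
        part = max(my_a)                   # local-computation section
        out_q.put(("integer", part))       # send to the other process
        _tag, part_copy = in_q.get()       # recv (blocks until a message arrives)
        result[idx] = max(part, part_copy)
    return threading.Thread(target=body)

a = [3, 7, 2, 9]
c12, c21 = queue.Queue(), queue.Queue()    # channels, one per direction
m = [None, None]
t1 = make_process(a[0:2], c12, c21, m, 0)  # P1 owns a(1:2)
t2 = make_process(a[2:4], c21, c12, m, 1)  # P2 owns a(3:4)
t1.start(); t2.start(); t1.join(); t2.join()
```

Because each thread sends before it receives, the exchange cannot deadlock, mirroring the barrier-free pairwise synchronization discussed above.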
5.5 Executing subset-par-model programs
We can use the transformation of the preceding section to transform programs
in the subset par model into programs in any language that supports multiple-
address-space parallel composition with single-sender-single-receiver message-
passing. Examples include Fortran M [16] (which supports multiple-address-
space parallel composition via process blocks and single-sender-single-receiver
message-passing via channels) and MPI [31] (which assumes execution in an
environment of multiple-address-space parallel composition and supports single-
sender-single-receiver message-passing via tagged point-to-point sends and re-
ceives).
Example. Program P" from Section 5.4 can be implemented by the following
Fortran M program:
program main
integer a(4)
inport (integer) inp(2)
outport (integer) outp(2)
channel (outp(1), inp(2))
channel (outp(2), inp(1))
processes
process call P(a(1:2), inp(1), outp(1))
process call P(a(3:4), inp(2), outp(2))
end processes
end
process P(a, inp, outp)
integer a(2)
inport (integer) inp
outport (integer) outp
integer part, part_copy, m
part = max(a(1), a(2))
send (outp) part
receive (inp) part_copy
m = max(part, part_copy)
end process
6 Related work
Program development via stepwise refinement. Other researchers, for example
Back [3,4] and Martin [28], have addressed stepwise refinement for parallel pro-
grams. Our work is somewhat simpler than many approaches because we deal
only with specifications that can be stated in terms of initial and final states,
rather than also addressing ongoing program behavior (e.g., safety and progress
properties).
Operational models. Our operational model is based on defining programs as
state-transition systems, as in the work of Chandy and Misra [9], Lynch and
Tuttle [24], Lamport [23], Manna and Pnueli [26], and Pnueli [34]. Our model is
designed to be as simple as possible while retaining enough generality to support
all aspects of our programming model.
Parallel programming models. Programming models similar in spirit to ours have
been proposed by Valiant [39] and Thornley [37]; our model differs in that we
provide a more explicit supporting theoretical framework and in the use we make
of archetypes.
Automatic parallelization of sequential programs. Our work is in many respects
complementary to efforts to develop parallelizing compilers, for example Fortran
D [12]. The focus of such work is on the automatic detection of exploitable
parallelism, while our work addresses how to exploit parallelism once it is known
to exist. Our theoretical framework could be used to prove not only manually-
applied transformations but also those applied by parallelizing compilers.
Programming skeletons and patterns. Our work is also in some respects com-
plementary to work exploring the use of programming skeletons and patterns
in parallel computing, for example that of Cole [11] and Brinch Hansen [6]. We
also make use of abstractions that capture exploitable commonalities among pro-
grams, but we use these abstractions to guide a program development method-
ology based on program transformations.
7 Conclusions
We believe that our operational model, presented in Section 3, forms a suitable
framework for reasoning about program correctness and transformations, par-
ticularly transformations between our different programming models. Proofs of
the theorems of Section 3, sketched here and presented in detail in [30], demon-
strate that this model can be used as the basis for rigorous and detailed proofs.
Our programming model, which is based on identifying groups of program ele-
ments whose sequential composition and parallel composition are semantically
equivalent, together with the collection of transformations presented in [30] for
converting programs in this model to programs for typical parallel architectures,
provides a framework for program development that permits much of the work
to be done with well-understood and familiar sequential tools and techniques. A
discussion of how our approach can simplify the task of producing correct par-
allel applications is outside the scope of this paper, but [30] presents examples
of its use in developing example and real-world applications with good results.
Much more could be done, particularly in exploring automated support
for the transformations we describe and in identifying additional useful
transformations, but the results so far are promising, and we believe that the
work as a whole constitutes an effective unified theory/practice framework for
parallel application development.
Acknowledgments
Thanks go to Mani Chandy for his guidance and support of the work on which
this paper is based, and to Eric Van de Velde for the book [40] that was an early
inspiration for this work.
References
1. J. C. Adams, W. S. Brainerd, J. T. Martin, B. T. Smith, and J. L. Wagener.
Fortran 90 Handbook: Complete ANSI/ISO Reference. Intertext Publications :
McGraw-Hill Book Company, 1992.
2. G. R. Andrews. Concurrent Programming: Principles and Practice. The Ben-
jamin/Cummings Publishing Company, Inc., 1991.
3. R. J. R. Back. Refinement calculus, part II: Parallel and reactive programs. In Step-
wise Refinement of Distributed Systems: Models, Formalisms, Correctness, volume
430 of Lecture Notes in Computer Science, pages 67-93. Springer-Verlag, 1990.
4. R. J. R. Back and J. von Wright. Refinement calculus, part I: Sequential non-
deterministic programs. In Stepwise Refinement of Distributed Systems: Models,
Formalisms, Correctness, volume 430 of Lecture Notes in Computer Science, pages
42-66. Springer-Verlag, 1990.
5. R. Bagrodia, K. M. Chandy, and M. Dhagat. UC: a set-based language for data-
parallel programming. Journal of Parallel and Distributed Computing, 28(2):186-
201, 1995.
6. P. Brinch Hansen. Model programs for computational science: A programming
methodology for multicomputers. Concurrency: Practice and Experience, 5(5):407-
423, 1993.
7. K. M. Chandy and C. Kesselman. CC++: A declarative concurrent object-oriented
programming notation. In Research Directions in Concurrent Object-Oriented Pro-
gramming. MIT Press, 1993.
8. K. M. Chandy and B. L. Massingill. Parallel program archetypes. Technical report,
California Institute of Technology, 1997.
9. K. M. Chandy and J. Misra. Parallel Program Design: A Foundation. Addison-
Wesley, 1989.
10. A. Church and J. B. Rosser. Some properties of conversion. Transactions of the
American Mathematical Society, 39:472-482, 1936.
11. M. I. Cole. Algorithmic Skeletons: Structured Management of Parallel Computa-
tion. MIT Press, 1989.
12. K. D. Cooper, M. W. Hall, R. T. Hood, K. Kennedy, K. S. McKinley, J. M. Mellor-
Crummey, L. Torczon, and S. K. Warren. The ParaScope parallel programming
environment. Proceedings of the IEEE, 82(2):244-263, 1993.
13. E. W. Dijkstra. Guarded commands, nondeterminacy, and formal derivation of
programs. Communications of the ACM, 18(8):453-457, 1975.
14. E. W. Dijkstra. A Discipline of Programming. Prentice-Hall, Inc., 1976.
15. E. W. Dijkstra and C. S. Scholten. Predicate Calculus and Program Semantics.
Springer-Verlag, 1990.
16. I. T. Foster and K. M. Chandy. FORTRAN M: A language for modular parallel
programming. Journal of Parallel and Distributed Computing, 26(1):24-35, 1995.
17. E. Gamma, R. Helm, R. Johnson, and J. Vlissides. Design Patterns: Elements of
Reusable Object-Oriented Software. Addison-Wesley, 1995.
18. D. Gries. The Science of Programming. Springer-Verlag, 1981.
19. High Performance Fortran Forum. High Performance Fortran language specifica-
tion, version 1.0. Scientific Programming, 2(1-2):1-170, 1993.
20. C. A. R. Hoare. An axiomatic basis for computer programming. Communications
of the ACM, 12(10):576-583, 1969.
21. C. A. R. Hoare. Communicating Sequential Processes. Prentice-Hall, 1985.
22. International Standards Organization. ISO/IEC 1539:1991 (E), Fortran 90, 1991.
23. L. Lamport. A temporal logic of actions. ACM Transactions on Programming
Languages and Systems, 16(3):872-923, 1994.
24. N. A. Lynch and M. R. Tuttle. Hierarchical correctness proofs for distributed
algorithms. In Proceedings of the 6th Annual ACM Symposium on Principles of
Distributed Computing, 1987.
25. B. J. MacLennan. Functional Programming: Practice and Theory. Addison-Wesley,
1990.
26. Z. Manna and A. Pnueli. Completing the temporal picture. Theoretical Computer
Science, 83(1):97-130, 1991.
27. A. J. Martin. An axiomatic definition of synchronization primitives. Acta Infor-
matica, 16:219-235, 1981.
28. A. J. Martin. Compiling communicating processes into delay-insensitive VLSI
circuits. Distributed Computing, 1(4):226-234, 1986.
29. B. L. Massingill. Experiments with program parallelization using archetypes and
stepwise refinement. Technical report, University of Florida,
1998.
30. B. L. Massingill. A structured approach to parallel programming. Technical Report
CS-TR-98-04, California Institute of Technology, 1998. Ph.D. thesis. Available as
ftp://ftp.cs.caltech.edu/tr/cs-tr-98-04.ps.Z.
31. Message Passing Interface Forum. MPI: A message-passing interface standard.
International Journal of Supercomputer Applications and High Performance Com-
puting, 8(3-4), 1994.
32. J. M. Morris. Piecewise data refinement. In E. W. Dijkstra, editor, Formal Devel-
opment of Programs and Proofs. Addison-Wesley Publishing Company, Inc., 1990.
33. OpenMP Partners. The OpenMP standard for shared-memory parallel directives,
1998. http://www.openmp.org.
34. A. Pnueli. The temporal semantics of concurrent programs. Theoretical Computer
Science, 13:45-60, 1981.
35. P. A. G. Sivilotti. A verified integration of imperative parallel programming
paradigms in an object-oriented language. Technical Report CS-TR-93-21, Cal-
ifornia Institute of Technology, 1993.
36. P. A. G. Sivilotti. Reliable synchronization primitives for Java threads. Technical
report, California Institute of Technology, 1996.
37. J. Thornley. A parallel programming model with sequential semantics. Technical
report, California Institute of Technology, 1996.
38. J. Thornley and K. M. Chandy. Barriers: Specification. Unpublished document.
39. L. G. Valiant. A bridging model for parallel computation. Communications of the
ACM, 33(8):103-111, 1990.
40. E. F. Van de Velde. Concurrent Scientific Computing. Springer-Verlag, 1994.
A Some commands of Dijkstra's guarded-command
language in our model
This section sketches definitions in our model for some of the commands of Dijk-
stra's guarded-command language [13, 15]. Definitions of additional constructors
appear in [30].
Definition A.1 (skip).
We define program skip = (V, L, InitL, A, PV, PA) as follows:
- V = L.
- L = {En}, where En is a Boolean variable.
- InitL = (true).
- A = {a}, where
  Ia = {En}
  Oa = {En}
  Ra = {((true), (false))}
- PV = {}.
- PA = {}.
Definition A.2 (Assignment).
We define program P = (V, L, InitL, A, PV, PA) for (y := E) as follows:
- V = {v1,..., vN} ∪ {y} ∪ L, where {v1,..., vN} = {v : affects.(v, E) : v}.
- L = {EnP}, where EnP is a Boolean variable not otherwise occurring in V.
- InitL = (true).
- A = {a}, where
  Ia = {EnP} ∪ {v1,..., vN}
  Oa = {EnP, y}
  Ra = {x1,..., xN :: ((true, x1,..., xN), (false, E.(x1,..., xN)))}
  and x1,..., xN is an assignment of values to the variables v1,..., vN.
- PV = {}.
- PA = {}.
Definition A.3 (Abort).
We define program abort = (V, L, InitL, A, PV, PA) as follows:
- V = L.
- L = {En}, where En is a Boolean variable.
- InitL = (true).
- A = {a}, where
  Ia = {En}
  Oa = {}
  Ra = {((true), ())}
- PV = {}.
- PA = {}.
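The definitions above can be sketched operationally in Python (the representation is ours, not the paper's: a state is a dictionary, and an action is a partial function that returns the successor state, or None when the action is not enabled):

```python
# A program state maps variable names to values. The paper defines each
# action by its input variables Ia, output variables Oa, and transition
# relation Ra; here an action is simply a function on states.

def skip_action(state):
    # skip: enabled when En is true; sets En to false, changes nothing else.
    if not state.get("En"):
        return None                 # action not enabled in this state
    s = dict(state)
    s["En"] = False
    return s

def assign_action(y, expr):
    # (y := E): reads En and the variables affecting E; writes En and y.
    def act(state):
        if not state.get("En"):
            return None
        s = dict(state)
        s["En"] = False
        s[y] = expr(state)          # E is evaluated in the pre-state
        return s
    return act

s0 = {"En": True, "x": 3}
s1 = assign_action("y", lambda s: s["x"] + 1)(s0)
```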
B More about transforming arb-model programs into
par-model programs
This section presents versions of some of the theorems of Section 4.4 more
suitable for transforming programs for distributed-memory architectures. For
proofs of the theorems, refer to [30].
Lemma B.1 (Interchange of par and IF, part 1, with duplicated variables).
If Q1,..., QN and b are as for Theorem 4.7, and b1,..., bN are Boolean expres-
sions such that for j ≠ k no variable that affects bj is written by Qk, then the
following holds whenever both sides are started in a state in which bj = b for all
j:
if b -> par(Q1,...,QN) [] -b -> skip fi
⊑
par(
  if b1 -> Q1 [] -b1 -> skip fi,
  ...,
  if bN -> QN [] -bN -> skip fi
)
Proof of Lemma B.1.
This lemma follows from Theorem 4.7 and exploitation of copy consistency as
discussed in Section C.2.
Lemma B.2 (Interchange of par and IF, part 2, with duplicated variables).
If Q1,..., QN, R1,..., RN, and b are as for Theorem 4.8, and b1,..., bN are
Boolean expressions such that for j ≠ k no variable that affects bj is written by
Qk, then the following holds whenever both sides are started in a state in which
bj = b for all j:
if b -> (arb(Q1,...,QN); par(R1,...,RN)) [] -b -> skip fi
⊑
par(
  if b1 -> (Q1; barrier; R1) [] -b1 -> skip fi,
  ...,
  if bN -> (QN; barrier; RN) [] -bN -> skip fi
)
Proof of Lemma B.2.
Analogous to Lemma B.1.
Lemma B.3 (Interchange of par and DO, with duplicated variables).
If Q1,..., QN are arb-compatible, R1,..., RN are par-compatible, and for all
k ≠ j no variable that affects bj is written by Qk, and (∀j :: (bj = b)) is an
invariant of the loop
do b -> (arb(Q1,...,QN); par(R1,...,RN)) od
then the following holds whenever both sides are started in a state in which
bj = b for all j:
do b -> (arb(Q1,...,QN); par(R1,...,RN)) od
⊑
par(
  do b1 -> (Q1; barrier; R1; barrier) od,
  ...,
  do bN -> (QN; barrier; RN; barrier) od
)
Proof of Lemma B.3.
Analogous to Lemma B.1.
Example of applying these transformations. Let P be the following program:
x = max(a, b)
do while (x < 100)
  arb
    a = a * 2
    b = b + 1
  end arb
  par
    x = max(a, b)
    skip
  end par
end do
Then P is refined (using the data-duplication techniques of Section C.2) by the
following:
arb
  x1 = max(a, b)
  x2 = max(a, b)
end arb
do while (x1 < 100)
  arb
    a = a * 2
    b = b + 1
  end arb
  par
    x1 = max(a, b)
    x2 = max(a, b)
  end par
end do
which in turn is refined (using Lemma B.3) by the following:
arb
  x1 = max(a, b)
  x2 = max(a, b)
end arb
par
  do while (x1 < 100)
    a = a * 2 ; barrier ; x1 = max(a, b) ; barrier
  end do
  do while (x2 < 100)
    b = b + 1 ; barrier ; x2 = max(a, b) ; barrier
  end do
end par
which again in turn is refined by the following:
par
  seq
    x1 = max(a, b)
    barrier
    do while (x1 < 100)
      a = a * 2 ; barrier ; x1 = max(a, b) ; barrier
    end do
  end seq
  seq
    x2 = max(a, b)
    barrier
    do while (x2 < 100)
      b = b + 1 ; barrier ; x2 = max(a, b) ; barrier
    end do
  end seq
end par
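This final program can be run directly with threads and barriers; the following Python sketch (ours; run_parallel and run_sequential are illustrative names, not from the paper) executes the two seq components against a threading.Barrier and checks that the transformed program computes the same result as the original program P:

```python
import threading

def run_parallel(a0, b0):
    shared = {"a": a0, "b": b0}
    x_final = [None, None]
    bar = threading.Barrier(2)

    def component(idx, update):
        x = max(shared["a"], shared["b"])      # duplicated copy of x
        bar.wait()
        while x < 100:
            update()                           # local-computation section
            bar.wait()                         # end of computation phase
            x = max(shared["a"], shared["b"])  # re-establish copy consistency
            bar.wait()                         # end of data-exchange phase
        x_final[idx] = x

    def upd_a():
        shared["a"] *= 2
    def upd_b():
        shared["b"] += 1

    t1 = threading.Thread(target=component, args=(0, upd_a))
    t2 = threading.Thread(target=component, args=(1, upd_b))
    t1.start(); t2.start(); t1.join(); t2.join()
    assert x_final[0] == x_final[1]            # copies agree at termination
    return x_final[0]

def run_sequential(a, b):                      # the original program P
    x = max(a, b)
    while x < 100:
        a *= 2
        b += 1
        x = max(a, b)
    return x
```

Because the copies x1 and x2 stay equal at every barrier, both threads execute the same number of iterations, so the barrier counts always match and the program cannot deadlock.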
C Some example transformations
Section 2.5 sketches our program-development strategy. A key element of that
strategy, and one not discussed in detail in this paper, is the sequence of trans-
formations that convert the original arb-model program into one that can be
transformed into a program in the par or subset par model and thence into
a program for the target architecture. A collection of transformations useful in
this process appears in [30]; we summarize a few here.
C.1 Change of granularity
If the number of elements in an arb composition is large compared to the number
of processors available for execution, and the cost of creating a separate thread
for each element of the composition is relatively high, then we can improve the
efficiency of the program by reducing the number of threads required, that is,
by changing the granularity of the program.
We can change the granularity of an arb-model program by transforming
an arb composition of N elements into a combination of arb composition (of
fewer than N elements) and sequential composition, as described in the following
theorem.
Theorem C.1 (Change of granularity).
If P1,..., PN are arb-compatible, and we have integers j1, j2,..., jM such that
(1 ≤ j1) ∧ (j1 < j2) ∧ ... ∧ (jM < N), then
arb(P1,..., PN)
⊑
arb(
  seq(P1,..., Pj1),
  seq(Pj1+1,..., Pj2),
  ...,
  seq(PjM+1,..., PN)
)
Proof of Theorem C.1.
See [30].
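A sketch of the granularity change in Python (ours; chunk and the task list are illustrative, not from the paper): N arb-compatible components are grouped into M chunks, and each chunk is executed as a sequential composition:

```python
def chunk(tasks, cuts):
    """Group tasks[0..N) into sequential chunks at the given cut indices,
    mirroring arb(seq(P1..Pj1), seq(Pj1+1..Pj2), ...)."""
    bounds = [0] + list(cuts) + [len(tasks)]
    return [tasks[lo:hi] for lo, hi in zip(bounds, bounds[1:])]

# N = 8 arb-compatible components: each writes only its own slot.
result = [0] * 8
tasks = [lambda i=i: result.__setitem__(i, i * i) for i in range(8)]

# Coarsen granularity: 8 components -> 3 chunks, each run sequentially.
for group in chunk(tasks, (3, 6)):
    for task in group:        # seq composition within a chunk
        task()
```

Because the components are arb-compatible, running each chunk sequentially (in any order) yields the same final state as running all eight in parallel.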
C.2 Data distribution and duplication
In order to transform a program in the arb model into a program suitable for
execution on a distributed-memory architecture, we must partition its variables
into distinct groups, each corresponding to an address space (and hence to a
process). Section 5 describes the characteristics such a partitioning should have
in order to permit execution on a distributed-memory architecture; in this sec-
tion we discuss only the mechanics of the partitioning, that is, transformations
that effect partitioning while preserving program correctness. These transforma-
tions fall into two categories: data distribution, in which variables of the original
program are mapped one-to-one onto variables of the transformed program; and
data duplication, in which the map is one-to-many, that is, in which some vari-
ables of the original program are duplicated in the transformed program.
Data distribution. The transformations required to effect data distribution
are in essence renamings of program variables, in which variables of the original
program are mapped one-to-one to variables of the transformed program. The
most typical use of data distribution is in partitioning non-atomic data objects
such as arrays. Each array is divided into local sections, one for each process,
and a one-to-one map is defined between the elements of the original array and
the elements of the (disjoint) union of the local sections. That such a renaming
operation does not change the meaning of the program is clear, although if
elements of the array are referenced via index variables, some care must be taken
to ensure that they (the index variables) are transformed in a way consistent with
the renaming/mapping.
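The array-distribution renaming can be sketched in Python (distribute and index_map are our own illustrative names): a global array is split into contiguous local sections together with a one-to-one map from global indices to (process, local-index) pairs:

```python
def distribute(a, num_procs):
    """Split array a into contiguous local sections, one per process, and
    return the sections plus a map from global index to (proc, local)."""
    n = len(a)
    size = (n + num_procs - 1) // num_procs   # section length (last may be short)
    sections = [a[p * size:(p + 1) * size] for p in range(num_procs)]
    index_map = {g: (g // size, g % size) for g in range(n)}
    return sections, index_map

a = [10, 20, 30, 40, 50, 60]
sections, index_map = distribute(a, 2)
# The renaming is one-to-one: every global element appears in exactly one
# local section, and index variables are translated through index_map.
p, i = index_map[4]
assert sections[p][i] == a[4]
```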
Data duplication. The transformations involved in data duplication are less
obviously semantics-preserving than those involved in data distribution. The
goal of such a transformation is to replace a single variable with multiple copies,
such that copy consistency is maintained when it matters. We use the term
(re-)establishing copy consistency to refer to (re-)establishing the property that
all of the copies have the same value (and that their value is the same as that
of the original variable at an analogous point in the computation). In the trans-
formed program, all copies have the same initial value as the initial value of the
original variable (thereby establishing copy consistency), and any reference to a
copy that changes its value is followed by program actions to assign the new
value to the other copies as well (thereby re-establishing copy consistency when
it is violated). Whenever copy consistency holds, a read reference to the origi-
nal variable can be transformed into a read reference to any one of the copies
without changing the meaning of the program.
We can accomplish such a transformation using the techniques of data refine-
ment, as described in [32]. We begin with the following data-refinement trans-
formation: Given program P with local variables L, duplicating variable w in L
means producing a program P' with variables
L' = (L \ {w}) ∪ {w(1),..., w(N)}
(where N is the number of copies desired and w(1),..., w(N) are the copies of
w), such that P ⊑ P'. It is simplest to think in terms of renaming w to w(1) and
then introducing variables w(2),..., w(N); it is then clear what it means for P'
(with variable w(1)) to meet the same specification as P (with variable w).
Using the techniques of data refinement, we can produce such a program P'
by defining the abstraction invariant
(∀j :: w(j) = w)
and transforming P as follows:
- Assign the same initial value to each copy w(j) in InitL' that was assigned
to w in InitL, and replace any assignment w := E in P with the multiple
assignment
w(1),..., w(N) := E(1),..., E(N)
where E(k) = E[w/w(j)] (j is arbitrary and can be different for different
values of k). Observe that the multiple assignment can be implemented as a
sequence of assignments, possibly using temporary variables if w affects E.
- Replace any other reference to w in P with a reference to w(j), where j is
arbitrary.
The first replacement rule ensures that the abstraction invariant holds after
each command; the second rule makes use of the invariant. In our informal
terminology, the abstraction invariant states that copy consistency holds, and the
two replacement rules respectively (re-)establish and exploit copy consistency.
Let P' be the result of applying these refinement rules to P. Then P ⊑ P'.
We do not give a detailed proof, but such a proof could be produced using the
rules of data refinement (as given in [32]) and structural induction on P.
For our purposes, however, P' as just defined may not be quite what we want,
since in some situations it would be advantageous to postpone re-establishing
copy consistency (e.g., if there are several duplicated variables, it might be ad-
vantageous to defer re-establishing copy consistency until all have been assigned
new values), if we can do so without losing the property that P ⊑ P'. We observe,
then, that
(w(1),..., w(N) := E(1),..., E(N)); Q
⊑
(w(k) := E(k)); Q; (w(1),..., w(k-1), w(k+1),..., w(N) := w(k),..., w(k))
as long as for all j ≠ k, w(j) is not among the variables read or written by Q.
The argument for the correctness of this claim is similar to that used to prove
Theorem 3.21 in Section 3.7.
We can thus give the following replacement rules for duplicating variable w
in an arb-model program:
- Replace w := E with
arb(w(1) := E[w/w(1)],..., w(N) := E[w/w(N)]).
- If w is not written by any of P1,..., PN, replace arb(P1,..., PN) with
arb(P1[w/w(1)],..., PN[w/w(N)]).
- If w is written by Pk but neither read nor written by any other Pj, replace
arb(P1,..., PN) with
arb(P1,..., Pk[w/w(k)],..., PN);
arb(w(1) := w(k),..., w(k-1) := w(k), w(k+1) := w(k),..., w(N) := w(k)).
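The first replacement rule can be illustrated in Python (a sketch with our own names duplicate_assign and copy_consistent): each copy w(j) is assigned E with w replaced by that copy, after which copy consistency holds again:

```python
def duplicate_assign(copies, expr):
    """Rule for w := E with w duplicated: each copy w(j) is assigned
    E with w replaced by that copy's own value. The assignments are
    arb-compatible because copy j reads and writes only copy j."""
    copies[:] = [expr(w_j) for w_j in copies]

def copy_consistent(copies):
    # The abstraction invariant: all copies hold the same value.
    return all(c == copies[0] for c in copies)

w = [5, 5]                                    # w(1), w(2), initially consistent
duplicate_assign(w, lambda wj: wj * 2 + 1)    # w := 2*w + 1
assert copy_consistent(w)                     # invariant re-established
```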