An Optimal Algorithm for the Construction of the System Dependence Graph

Panos E. Livadas

Theodore Johnson

Computer and Information Sciences Department

University of Florida

Gainesville, FL 32611

ABSTRACT

Program slicing can be used to aid in a variety of software maintenance activities including code understanding, code testing, debugging, and program reengineering. Program slicing (as well as other program analysis functions including forward slicing) can be efficiently performed on an internal program representation called a system dependence graph (SDG). The construction of the SDG depends primarily on the calculation of the transitive dependences, which in turn depends on the calculation of the data dependences. In

this paper we demonstrate the correctness and the optimality of our method of calculating the transitive data

dependence. Furthermore, this method requires neither the (explicit) calculation of the GMOD and GREF

sets nor the construction of a linkage grammar and the corresponding subordinate characteristic graphs of

the linkage grammar's nonterminals. Additionally, a beneficial side effect of this method is that it provides

us with a new method for performing interprocedural, flow-sensitive data flow analysis.

1. Introduction

Software maintenance is an expensive, demanding, and ongoing process. Lientz and Swanson

[Lie80] have reported that large organizations devoted 50% of their total programming effort to

the maintenance of existing systems. Boehm [Boe75] has estimated that one US Air Force sys-

tem cost $30 per instruction to develop and $4,000 per instruction to maintain over its lifetime.

These figures are perhaps exceptional; but, on the average, maintenance costs seem to be between

two and four times higher than development costs for large embedded systems. Furthermore,

according to internal conversations with our industrial affiliates, approximately 60% of their maintainers' time is spent looking at code. We therefore aim to reduce maintenance costs by

developing tools that can assist in activities that focus on understanding code.

Program slicing provides one such useful tool for the maintenance programmer. Let P be a

program, let p be a point in P, and let v be a variable of P that is either defined or used at p. A

static slice (or simply a slice) of P relative to the slicing criterion <p, v> is defined as the set of all

statements and predicates of P that might affect the value of the variable v at the point p. This

definition is less general than the one given in [Wei84]; but, it is sufficient [Hor90]. Program

slices could be used in a variety of ways to aid in several software engineering activities. Weiser

[Wei82] has shown that programmers use slices when debugging. Program slicing provides a

meaningful way to decompose a large program into smaller components and can therefore aid in

program understanding; and, can also be used in code reusability. Dicing, a method based on

static slicing, can be used to aid in debugging by allowing certain program bugs to be automati-

cally located [Lyl87]. Horwitz [Hor88] has used the concepts of slicing in integrating program

variants and Badger [Bad88] has demonstrated how slicing can be used for automatic parallelization. Furthermore, a number of metrics based

on program slicing have been proposed [Wei82] which include coverage, component overlap,

functional clustering, parallelism, and tightness.


Weiser's slicer was based on a flow-graph representation of Simple_D programs. Otten-

stein et al [Ott84], showed that an intraprocedural slice could be found in linear time by travers-

ing a suitable graph representation of the program which they referred to as the program depen-

dence graph (PDG). Horwitz et al [Hor90], have introduced algorithms to construct interpro-

cedural slices by extending the program dependence graph to a supergraph of the PDG which is

referred to as the system dependence graph (SDG). This extension also captures the calling con-

text of the procedures that was lacking in the method proposed by Weiser; and, it also permits

slicing to be performed even if a program contains calls to unknown procedures, provided that

transitive dependences are known.

Informally, the SDG is a labeled, directed, multigraph where each vertex represents a pro-

gram construct such as declarations, assignment statements, and control predicates. Edges

represent several kinds of dependence among the vertices which can be distinguished by the

labels attached to them.

We noted in [Liv90] that the SDG provides us with a suitable form of internal program

representation (IPR) that could be employed in the development of an integrated software maintenance environment, that is, an environment that integrates a number of tools that, alone or in conjunction with one another, would aid in various program understanding and maintenance tasks.

Realizing the versatility of this IPR, we embarked on the development of a prototype that accepts programs written in a subset of ANSI C [Liv94a], including macro support^1, and generates an SDG [Croll94]. We have also implemented a number of tools such as a static slicer, a

dicer, a forward slicer among others that can utilize this SDG [Liv94]. We should also note that

the prototype incorporates the methods and algorithms discussed in this paper.

Although our prototype accepts C programs that may span multiple files and nearly every C construct, in this paper we restrict our grammar to a subset of ANSI C defined as follows.

First, declarations of local, global, and static scalar variables are supported. Second, the distinc-

tion is made between the two methods of parameter passing: pass-by-value and pass-by-

reference. The same notation is employed as in C so that the type of parameter passing can be

determined. However, pointer operations are restricted to those that constitute "pass-by-reference

parameters"; i.e., if x and y are pointer variables, we permit assignments of these variables

such as *x = 4 and *y = *x (where * denotes dereferencing of the pointer variable); but, general pointer assignments such as x = y are not allowed. Third, any number of

return statements are permitted to appear anywhere in a procedure and can contain expressions

that may include variables and are modeled after the return statements in the language C.

Fourth, we distinguish between functions that return values as opposed to those that do not.

Fifth, all C constructs are "handled" except goto, break, continue, and long jumps.

The main contribution of this paper is the presentation of a new method that permits one to solve all procedures, including the construction of the SDG, in a bottom-up fashion so that only one copy of a procedure dependence graph is required for all call sites^2. There is no need to build an attribute grammar or calculate the corresponding subordinate graphs to determine the transitive dependences, even in the presence of recursion. Actual-out nodes that are deemed N-nodes are identified as such during the SDG construction and dependence calculation, which makes the explicit calculation of the GMOD and GREF sets of all procedures unnecessary. Hence, as we show, our algorithm is not only conceptually simpler than other algorithms but is also optimal.

1 The macro preprocessor is discussed in [Liv94b]

2 Notwithstanding the fact that in the case of aliasing phenomena each aliasing pattern gives rise to a dis-

tinct procedure dependence graph.


The remaining part of the paper is organized as follows. Section 2 describes the system

dependence graph associated with our grammar. In section 3 we present the interprocedural slicing algorithm, whereas in section 4 we provide a classification of the formal parameters. Section 5 describes our algorithm and in section 6 we prove its correctness and optimality. Finally, the handling of aliasing in the presence of recursion is discussed in section 7, our prototype is briefly presented in section 8, related work is presented in section 9, and the paper concludes with section 10, where our current and future work is discussed.

2. The Program Dependence Graph and the System Dependence Graph

This section briefly describes the program dependence graph and the intraprocedural slicing algorithm, continues with a discussion of the system dependence graph, and concludes by presenting the transformations that are employed in the presence of global and static variables and aliasing.

We should note that the terminology and notation used in the sequel is the same as that in [Liv94]

unless otherwise noted.

2.1. Program Dependence Graph

The program dependence graph (PDG) for a program P with no procedures, denoted by $G_P$, is a labeled, directed multigraph. Each node represents a program construct such as a declaration, an assignment statement, or a control predicate; there is also a special node called the entry node.

Edges represent several kinds of dependences among the nodes, which are distinguished by the labels attached to them. Specifically, three kinds of dependence are distinguished^3: control, data flow, and declaration, each of which is briefly discussed below. In particular, let $v_1$ and $v_2$ be two nodes of $G_P$.

If the execution of $v_2$ is determined by the predicate represented by $v_1$ at the time of execution, then $v_2$ is control dependent on $v_1$. In this case, we write

$$v_1 \xrightarrow{cd} v_2$$

We note that every component of P that is not subordinate to any control predicate is control dependent on the entry node. Given the constructs^4 of the grammar under consideration, control dependences reflect the program's nesting structure.

We say that $v_2$ is data flow dependent on $v_1$ if and only if $v_2$ uses a variable a that $v_1$ defines^5, and there exists an execution path from $v_1$ to $v_2$ along which a is not redefined. This relationship is denoted by

$$v_1 \xrightarrow{dd} v_2$$

Declaration dependences are considered special kinds of data flow dependences that exist from a node $v_1$ corresponding to the declaration of a variable to each of the nodes $v_2$ corresponding to that variable's subsequent definitions; and, we write:

$$v_1 \xrightarrow{de} v_2$$

3 In reality there is a further dependence edge due to the return statement. We will defer discussion of

this edge to the next section.

4 The handling of a return statement requires the introduction of additional edges that are discussed in

the following section.

5 We say that v1 defines a variable a if and only if execution of v1 causes the memory location represented by a to be written (e.g., if v1 is a = b + 3, then a is defined at v1). A variable a is said to be used at a statement v2 if and only if execution of v2 causes the memory location represented by a to be read. For example, b is used at statement v1.


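Let $s_0$ and $w$ be two nodes in $G_P$. An intraslice-path from $w$ to $s_0$ is a path on $G_P$, denoted by $S_{w,s_0}$, every edge of which is a control, data flow, or declaration dependence edge:

$$\forall v_i, v_j, w, s_0 \in G_P :\; e_{ij} = (v_i, v_j) \in S_{w,s_0} \Rightarrow (v_i \xrightarrow{cd} v_j) \lor (v_i \xrightarrow{dd} v_j) \lor (v_i \xrightarrow{de} v_j)$$

Now let $s_0$ be a node of $G_P$ that defines a variable v. The slice^6 of $G_P$ with respect to $s_0$ (denoted by $G_P/s_0$) is the subgraph of $G_P$ that consists of those nodes $w$ from which $s_0$ is intraslice-path reachable. Hence, the nodes of the slice are defined as follows:

$$V(G_P/s_0) = \{\, w \in V(G_P) : \exists\, S_{w,s_0} \,\}$$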
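As an illustration only, the slice definition above can be read as backward reachability over control, data flow, and declaration edges. The following Python sketch assumes a hypothetical edge-list encoding (source, target, label); the function name `intra_slice` and the small example graph are ours and are not taken from the prototype.

```python
from collections import defaultdict

def intra_slice(edges, s0, kinds=("cd", "dd", "de")):
    """Return the nodes w from which s0 is reachable along edges of the given kinds."""
    # Index predecessors so that edges can be followed backwards.
    preds = defaultdict(list)
    for src, dst, kind in edges:
        if kind in kinds:
            preds[dst].append(src)
    seen = {s0}
    stack = [s0]
    while stack:
        node = stack.pop()
        for p in preds[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

# Hypothetical PDG for:  i = 0; j = 5; while (i < 10) { i = i + 1; }
edges = [
    ("entry", "i = 0", "cd"),
    ("entry", "j = 5", "cd"),
    ("entry", "while (i < 10)", "cd"),
    ("while (i < 10)", "i = i + 1", "cd"),
    ("i = 0", "while (i < 10)", "dd"),
    ("i = 0", "i = i + 1", "dd"),
    ("i = i + 1", "while (i < 10)", "dd"),
    ("i = i + 1", "i = i + 1", "dd"),
]

slice_nodes = intra_slice(edges, "i = i + 1")
```

In this toy graph the slice at `i = i + 1` contains the entry node, the loop predicate, and both definitions of `i`, but not `j = 5`, since no dependence path leads from that node to the slicing point.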

2.2. System Dependence Graph

Our discussion now moves to slicing on a program which consists of a collection of one or more

procedures and their associated parameters. To address this problem, the program dependence

graph is extended to what is called a system dependence graph (SDG). An SDG for a program P

consists of a PDG that models the main program M and a collection of L procedure dependence

graphs that model the program's K procedures Fk for each non-negative integer k such that7

0
the introduction of an additional set of nodes and an additional set of edges. Each of these sets is

discussed in turn below. When a call to a function $F_k$ is encountered, a call-site node is created that is denoted by $cs_j(F_k)$ ($j$ is a nonnegative integer employed to enumerate the static calls to $F_k$). Then, for each actual parameter, a node, the actual_in node, is created, and an additional node, the actual_out node, is created for each actual parameter that is passed by reference. The sets of actual_in and actual_out nodes corresponding to the j-th call to $F_k$ are denoted by $a\_in_{i,j}(F_k)$ and $a\_out_{l,j}(F_k)$, where $i$ ranges over the parameters of $F_k$ and $l$ over those parameters that are passed by reference. By definition, all such nodes are control dependent on the call-site node. In symbols,

$$\big[\forall j\; \forall k\; \forall v \in (a\_in_{i,j}(F_k) \cup a\_out_{l,j}(F_k))\big]\; \big(cs_j(F_k) \xrightarrow{cd} v\big)$$

Secondly, at the time that the first static call to a function $F_k$ is encountered, an additional node, called the entry node and denoted by $en(F_k)$, is created. Moreover, two additional sets of nodes, referred to as the formal-in nodes ($f\_in_i(F_k)$) and the formal-out nodes ($f\_out_l(F_k)$), are built. For all $j$ and fixed $k$, the set $a\_in_{i,j}(F_k)$ is isomorphic to $f\_in_i(F_k)$, whereas the set $a\_out_{l,j}(F_k)$ is isomorphic to $f\_out_l(F_k)$. In addition:

$$\big[\forall k\; \forall v \in (f\_in_i(F_k) \cup f\_out_l(F_k))\big]\; \big(en(F_k) \xrightarrow{cd} v\big)$$

At this point, we present additional types of edges that will enable us to build the system depen-

dence graph.

By definition, for each k and each j, the vertex $en(F_k)$ is adjacent to $cs_j(F_k)$. Each such edge, which is incident from a call-site node and incident to an entry node, is referred to as a call edge. Notice that $\mathrm{indegree}(en(F_k)) = j_k$, where $j_k$ is the number of call-sites corresponding to the function $F_k$. Hence, for each k

$$\forall j\; \big(cs_j(F_k) \xrightarrow{call} en(F_k)\big)$$

6 The definition that we use in this paper is the following. A static slice of a program P at a program point p relative to a variable v, that is either defined or used at p, is the set of all statements and predicates of P that might affect the value of v.

7 We will see shortly that in the presence of aliasing, K

A parameter-in edge is an edge from an actual-in node to its corresponding formal-in node. Similarly, a parameter-out edge is an edge from a formal-out node to its corresponding actual-out node. In symbols we have,

$$\forall j\; \Big[ \forall i\; \big(a\_in_{i,j}(F_k) \xrightarrow{pi} f\_in_i(F_k)\big) \;\land\; \forall l\; \big(f\_out_l(F_k) \xrightarrow{po} a\_out_{l,j}(F_k)\big) \Big]$$

A transitive dependence edge exists from an actual-in node to an actual-out node if the formal-out node corresponding to the latter node is intraslice-path reachable^8 from the formal-in node corresponding to the former node. Note that these edges may exist only between actual-in nodes and actual-out nodes. In other words, for each fixed k and each call-site j

$$\big(a\_in_{i,j}(F_k) \xrightarrow{tr} a\_out_{l,j}(F_k)\big) \iff \big(f\_in_i(F_k) \in V(G_{F_k}/f\_out_l(F_k))\big)$$

where $G_{F_k}$ denotes the procedure dependence graph of the function $F_k$.
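The definition of transitive dependence can be sketched as follows: for each formal-out node, collect the formal-in nodes that lie in its intraprocedural slice. This is a minimal Python sketch under our own assumptions; the node names, the (source, target) edge encoding, and the example procedure `void f(int *a, int *b) { *a = *b; }` are hypothetical and serve only to illustrate the definition.

```python
from collections import defaultdict

def backward_reach(edges, target):
    """Nodes from which target is reachable along the given dependence edges."""
    preds = defaultdict(list)
    for src, dst in edges:
        preds[dst].append(src)
    seen = {target}
    stack = [target]
    while stack:
        node = stack.pop()
        for p in preds[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def transitive_pairs(edges, formal_ins, formal_outs):
    """Pairs (f_in, f_out) such that f_in lies in the slice taken at f_out."""
    pairs = set()
    for f_out in formal_outs:
        reach = backward_reach(edges, f_out)
        pairs.update((f_in, f_out) for f_in in formal_ins if f_in in reach)
    return pairs

# Hypothetical procedure dependence graph of: void f(int *a, int *b) { *a = *b; }
pdg_f = [
    ("en", "f_in_a"), ("en", "f_in_b"), ("en", "*a = *b"),   # control edges
    ("en", "f_out_a"), ("en", "f_out_b"),
    ("f_in_b", "*a = *b"),       # *b is used by the assignment
    ("*a = *b", "f_out_a"),      # the assignment defines *a
    ("f_in_b", "f_out_b"),       # *b is never modified
]

pairs_f = transitive_pairs(pdg_f, ["f_in_a", "f_in_b"], ["f_out_a", "f_out_b"])
```

Each resulting pair would be copied to every call-site of the procedure as a transitive dependence edge between the corresponding actual-in and actual-out nodes. Note that the incoming value of `*a` is overwritten, so no pair starts at `f_in_a`.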

C functions may or may not return a value to the call-site. In the former case, the returned

value may be data dependent on one or more of the actual parameters. If that is the case, then

these parameters should be included in the slice. Therefore, we define a new edge, the affect-

return edge, that indicates this parameter-returned value dependence. Such an edge, if it exists, is

by definition incident from the actual-in node corresponding to the actual parameter that

influences the returned value and incident to the function's call-site node. In symbols,

$$\forall i :\; \text{the value returned by } F_k \text{ is data dependent on } a\_in_{i,j}(F_k) \;\Rightarrow\; \big(a\_in_{i,j}(F_k) \xrightarrow{ar} cs_j(F_k)\big)$$

Two new types of edges, an intraprocedural edge (return-control edge) and an interpro-

cedural edge (return-link edge) are needed to properly handle the return statements. We

would like to note here that our method for determining control dependence is based on a syntax-directed method (hence we do not handle constructs such as gotos). If another, more precise method were used (such as the one described in [Aho86]), there would be no need to include return-control edges in the SDG.

The return-control edge indicates the dependence between the return statement of a procedure and the statements following it that will not be executed when the procedure exits on that return statement. In other words, a return-control dependence exists between a return node $v_r$ and another node $v_i$ of $G_{F_k}$ if and only if execution of the return statement corresponding to the former node excludes execution of the statement corresponding to the latter node. This relationship can be defined as follows^9:

$$\forall v_r, v_i \in G_{F_k} :\; (v_r \text{ excludes } v_i) \iff (v_r \xrightarrow{rc} v_i)$$

A return-link edge^10 connects a return node to the corresponding function call-site. Specifically,

$$\forall v_r \in G_{F_k}\; \forall j\; \big(v_r \xrightarrow{rl} cs_j(F_k)\big)$$

8 The definition of reachability in the presence of procedures is more general than the one presented in the

previous section and is given below.

9 We note that the definition of return-control dependence does not coincide with the control dependence

defined elsewhere. Our definition combined with the grammar that we employ makes computation of control

dependence unnecessary.

10 It should be noted that the return value could also be modeled as a parameter. We have chosen to use

return-link edges in this paper.



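Given now that our grammar allows call-by-reference parameters, return statements, and functions that may return values, we can define the summary information, $\sigma$, at a call-site $cs_j(F_k)$ to be the union of three types of dependences: transitive dependences, affect-return dependences, and return-link dependences. Therefore,

$$\sigma = \bigcup_{i,l} \{\, e_{tr} = (a\_in_{i,j}(F_k), a\_out_{l,j}(F_k)) : a\_in_{i,j}(F_k) \xrightarrow{tr} a\_out_{l,j}(F_k) \,\}
\;\cup\; \bigcup_{i} \{\, e_{ar} = (a\_in_{i,j}(F_k), cs_j(F_k)) : a\_in_{i,j}(F_k) \xrightarrow{ar} cs_j(F_k) \,\}
\;\cup\; \bigcup_{v} \{\, e_{rl} = (v, cs_j(F_k)) : (v \in G_{F_k}) \land (v \xrightarrow{rl} cs_j(F_k)) \,\}$$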

Figure 1 presents the system dependence graph of the program shown in Table 1. Declaration

edges are not shown to keep the graph from becoming "busier" than it already is.

    void main()
    {
        int sum;
        int i;
        i = 0;
        while (i < 10) {
            i = i + 1;
            sum = CalcSum(sum);
        }
        i = i;
        sum = sum;
    }

    int CalcSum(int s)
    {
        Inc(&s);
        s = s + 9;
        return s;
    }

    void Inc(int *x)
    {
        *x = *x + 1;
    }

Table 1. A sample program.

[Figure 1 edge legend: Control, Flow, Call, Affect-Return, Return-Control, Return-Link, Parameter-In, Parameter-Out, Transitive]

Figure 1. The program dependence graph corresponding to the program in Table 1.


At this time, we should further note that the nodes of the SDG (in the figures) are shown to be

"resolved" at the statement level. In actuality, the SDG is "resolved" at the token level. Using

a parse tree representation as the basis for our SDG allows more precise slices to be calculated

[Liv94]. In the sequel, by the term SDG we will denote a parse-tree-based SDG unless otherwise

noted. For the purposes of simplicity, the figures in this paper are shown to be resolved at the

statement level.

As we indicated earlier, determination of the transitive dependences of a procedure $F_k$ is accomplished by determining, for each $f\_out_l(F_k)$, all the formal-in nodes $f\_in_i(F_k)$ from which the former node is intraslice-path reachable. But the structure of the procedure dependence graph $G_{F_k}$ is different from the one presented in Section 2.1, because it may contain a number of call-site nodes $cs_j(F_m)$, together with their associated summary information, one for each static call that is made from $F_k$ to a function $F_m$. Hence, the definition of an intraslice-path given in Section 2.1 is extended to a path, denoted by $S'_{w,s_0}$, that also includes the transitive dependence and affect-return edges associated with the call-sites $cs_j(F_m)$ as well as the return-control edges. Hence,

$$\forall v_i, v_j, w, s_0 \in G_{F_k} :\; e_{ij} = (v_i, v_j) \in S'_{w,s_0} \Rightarrow (v_i \xrightarrow{cd} v_j) \lor (v_i \xrightarrow{dd} v_j) \lor (v_i \xrightarrow{de} v_j) \lor (v_i \xrightarrow{tr} v_j) \lor (v_i \xrightarrow{ar} v_j) \lor (v_i \xrightarrow{rc} v_j)$$

We will say that $s_0$ is intraslice-path reachable from $w$ if and only if there exists an intraslice-path from $w$ to $s_0$. Hence, an intraprocedural slice is defined via

$$V(G_{F_k}/s_0) = \{\, w \in V(G_{F_k}) : \exists\, S'_{w,s_0} \,\}$$

The algorithm that performs this task is similar to the one described in the previous section with

the additional requirement that the affect-return and return-control edges must be taken into

account.

2.3. Global Variables, Static Variables, and Aliasing

Source programs which contain global or static variables, or in which aliasing is present, need to

be transformed before they can be represented by an SDG. The following sections describe these

conversions.

2.3.1. Global Variables

The handling of global variables is based on the method suggested by [Hor90]. Specifically, globals are handled by introducing them as additional pass-by-reference parameters to the procedures that use or define them. All procedures that call, directly or indirectly, a procedure that uses or defines global variables are modified to include those global variables as pass-by-reference parameters. The call-sites are also modified to include the new parameters.

This however is an incomplete solution. Because of possible naming conflicts, the global vari-

ables may need to be renamed. Consider the program in Table 2. In procedure Inc, there is a

naming conflict. Although Inc does not directly use the global variable g, Inc calls function

IncGlobal which does. Adding an additional parameter g to procedure Inc would create an

obvious naming conflict. Naming conflicts can arise when a formal parameter or a local variable shares the same name with a global variable.

The solution is to rename the global variables to avoid this conflict. A simple approach to choos-

ing unique names would be to simply append an "illegal" character to the end of a global vari-

able. For example, the global variable g can be renamed g+. Note that this renaming can be

done on the SDG; the source program need not be altered.


    int g;

    void main(void)
    {
        int i = 4;
        i = Inc(i);
    }

    int Inc(int g)
    {
        IncGlobal();
        return g+1;
    }

    void IncGlobal(void)
    {
        g = g + 1;
    }

Table 2. Illustration of global naming conflicts.

2.3.2. Static Variables

Static variables in C are essentially global variables with limited visibility. These variables exist

across invocations of the procedure in which they are declared. They can be handled in the same

manner as "regular" global variables except special attention must be paid to avoid naming

conflicts; there may be several static variables with the same name among modules, procedures,

or even within the same procedure. As in the renaming of global variables, a simple approach to

choosing unique names would be to append an "illegal" character to the end of the static vari-

able. Additionally, the name of the procedure in which it is declared is also appended. This will

remove naming conflicts between procedures. To avoid naming conflicts within the same procedure, the scoping level of the static variable is also appended. For example, the static variable s in the procedure Add, declared in the first scope of the procedure, would be renamed s+Add1.
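The renaming schemes of Sections 2.3.1 and 2.3.2 can be sketched as two small helpers. This is only an illustration: the function names are hypothetical, and we assume that "+" is the "illegal" character used and that the parts are concatenated in the order suggested by the examples g+ and s+Add1.

```python
ILLEGAL = "+"  # assumed: any character that cannot occur in a C identifier

def rename_global(name):
    """Internal name of a global variable, e.g. g -> g+."""
    return name + ILLEGAL

def rename_static(name, procedure, scope_level):
    """Internal name of a static variable declared in a given procedure and scope."""
    return f"{name}{ILLEGAL}{procedure}{scope_level}"

renamed_g = rename_global("g")
renamed_s = rename_static("s", "Add", 1)
```

Because the separator cannot appear in a source identifier, the generated names cannot collide with any formal parameter or local variable of the program.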

2.3.3. Aliasing

Our previous discussions have not included the problem of aliasing. The reason is that when aliasing phenomena occur during a call to a procedure, they are resolved (during the SDG construction) via a transformation to an alias-free procedure. We first note that when a global variable g is encountered in the body of a function alias that is invoked via the call^11 alias(&x1, &x2, ..., &xn), with function header alias(*y1, *y2, ..., *yn), then internally it is assumed that the call was alias(&x1, &x2, ..., &xn, &g+) and that the function's header was alias(*y1, *y2, ..., *yn, *g+); actual as well as formal nodes are adjusted accordingly. Hence, from that point on we may assume that there are neither global nor static variables and that each function call is of the form alias(&x1, &x2, ..., &xn) with function header alias(*y1, *y2, ..., *yn). Now, given our grammar, aliasing can occur if and only if a call of the form alias(&x1, &x2, ..., &xn) is made with $x_i = x_j$, where $i \neq j$ and $1 \le i, j \le n$; in that case $y_i$ and $y_j$ are aliases.

The transformation to an alias-free procedure dependence graph is simple. When a procedure must be solved, a tag is attached to it. A tag of a procedure with n parameters is a mapping from $\times_{i=1}^{n} \{i\}$ to $N^n$ that indicates the aliasing pattern for that particular call. The mapping is straightforward: if $y_{i_1}, y_{i_2}, \ldots, y_{i_r}$ (with $i_1 < i_2 < \cdots < i_r$) are aliases, the positions $i_2, \ldots, i_r$ are mapped to the value $i_1$. For example, if no aliasing is present, the mapping is the identity on $\times_{i=1}^{n} \{i\}$; whereas if $y_2$ and $y_4$ are aliases, the mapping is given by

$$tag(1,2,3,4,5,\ldots,n-1,n) = (1,2,3,2,5,\ldots,n-1,n)$$

which indicates that the second and fourth parameters are aliases. When aliasing is detected at the call-site to a function $F_k$, the call-site is tagged; a new entry node is created and tagged as described^12; the (alias-free) abstract syntax tree representing alias is copied so it is rooted at

11 The types of the formal parameters are omitted.

12 In our implementation both call-site and entry nodes are identified via Fk. tag.


this new entry node; and, data dependence analysis is performed by "identifying" the sets of variables that are aliased. We note here that the possible number of alias configurations for a procedure with n passed-by-reference parameters is $2^n - n$.
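The tag of a call can be sketched as follows: each parameter position is mapped to the first position that carries the same actual argument. This is a minimal Python illustration; the function name `alias_tag` and the representation of actual arguments as variable names are our assumptions.

```python
def alias_tag(actuals):
    """Map each 1-based parameter position to the first position holding the same actual."""
    first_position = {}
    tag = []
    for position, variable in enumerate(actuals, start=1):
        # setdefault records the position only on the first occurrence of the variable.
        first_position.setdefault(variable, position)
        tag.append(first_position[variable])
    return tuple(tag)

# Hypothetical calls: alias(&a, &b, &c, &d, &e) and alias(&a, &b, &c, &b, &e)
tag_no_alias = alias_tag(["a", "b", "c", "d", "e"])
tag_2_and_4 = alias_tag(["a", "b", "c", "b", "e"])
```

The first call yields the identity mapping, and the second maps position 4 to position 2, matching the example given in the text. Two call-sites with equal tags can share the same alias-free procedure dependence graph.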

3. The Interprocedural Slicing Algorithm

The interprocedural slicing algorithm is based on the algorithm suggested in [Hor90].

Modifications are necessary given the additional constructs introduced in the grammar. The algorithm finds the slice relative to a node $s_0$ of a program $G_P$ in two phases. During the first phase, a set of nodes $U_1$ of $G_P$ is captured with the property that $u \in U_1$ if and only if $s_0$ is Phase 1 reachable from $u$. Phase 1 reachability is equivalent to the property that there is a path from $u$ to $s_0$ consisting of any of the following types of edges: control dependence, data dependence, declaration dependence, return-control, parameter-in, transitive dependence, affect-return, and/or call. In the second phase, we capture an additional set of nodes $U_2$ of $G_P$ with the property that $w \in U_2$ if and only if there exists a node $u \in U_1$ such that $u$ is Phase 2 reachable from $w$. Phase 2 reachability is equivalent to the property that there is a path from $w$ to $u$ consisting of any of the following types of edges: control dependence, data dependence, declaration dependence, return-control, parameter-out, transitive dependence, affect-return, and/or return-link edges. Finally, the vertices of the interprocedural slice are defined as the union of the nodes visited in both phases. In symbols,

$$V(G_P/s_0) = U_1 \cup U_2$$

In general, at each phase all indicated edges are followed recursively backwards as they were

when intraprocedural slicing was performed.
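The two phases can be sketched as two backward traversals that differ only in the edge labels they are allowed to follow. The Python sketch below is an illustration under our own assumptions: the edge labels, the node names, and the small hand-built SDG (a procedure `Inc` called once on `a` and once on `b`) are hypothetical, and the special treatment of return-control edges is not modeled.

```python
from collections import defaultdict

PHASE1 = {"cd", "dd", "de", "rc", "p_in", "tr", "ar", "call"}
PHASE2 = {"cd", "dd", "de", "rc", "p_out", "tr", "ar", "rl"}

def backward_closure(edges, start_nodes, kinds):
    """All nodes that reach some start node using only edges labeled in kinds."""
    preds = defaultdict(list)
    for src, dst, kind in edges:
        if kind in kinds:
            preds[dst].append(src)
    seen = set(start_nodes)
    stack = list(start_nodes)
    while stack:
        node = stack.pop()
        for p in preds[node]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def interprocedural_slice(edges, s0):
    u1 = backward_closure(edges, {s0}, PHASE1)   # does not descend into callees
    u2 = backward_closure(edges, u1, PHASE2)     # descends, never re-ascends
    return u1 | u2

# main: a = 1; b = 2; Inc(&a); Inc(&b); use(b)     Inc(int *x): *x = *x + 1
sdg = [
    ("main", "a = 1", "cd"), ("main", "b = 2", "cd"), ("main", "cs1", "cd"),
    ("main", "cs2", "cd"), ("main", "use(b)", "cd"),
    ("cs1", "a_in1", "cd"), ("cs1", "a_out1", "cd"),
    ("cs2", "a_in2", "cd"), ("cs2", "a_out2", "cd"),
    ("a = 1", "a_in1", "dd"), ("b = 2", "a_in2", "dd"), ("a_out2", "use(b)", "dd"),
    ("a_in1", "a_out1", "tr"), ("a_in2", "a_out2", "tr"),
    ("cs1", "en_Inc", "call"), ("cs2", "en_Inc", "call"),
    ("a_in1", "f_in", "p_in"), ("a_in2", "f_in", "p_in"),
    ("f_out", "a_out1", "p_out"), ("f_out", "a_out2", "p_out"),
    ("en_Inc", "f_in", "cd"), ("en_Inc", "body", "cd"), ("en_Inc", "f_out", "cd"),
    ("f_in", "body", "dd"), ("body", "f_out", "dd"),
]

slice_at_use = interprocedural_slice(sdg, "use(b)")
```

In this example the slice at `use(b)` contains the body of `Inc` and the second call-site, but neither `a = 1` nor the first call-site. Phase 2 enters `Inc` through the parameter-out edge but cannot leave it through parameter-in or call edges, which is how the calling context is preserved.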

Finally, the slicing algorithm is modified to handle the return-control edges. This

modification is based on the observation that the slicer must recognize when a return-control edge

is being traversed. The node at the end of the return-control edge (a return statement) is marked

as being in the slice. The slicer now "short circuits" to the control predicate^13 of the return statement and slicing continues as normal.

We should note that special care must be taken when a call to a procedure F yields aliasing and a slice is requested at a statement $s_0$ that is internal to the body of F. As described in [Hor90], assuming that the total number of aliasing patterns is $m > 0$, let $s_0^i$ represent the instance of $s_0$ in the i-th procedure dependence graph associated with F. Then the slice is given by

$$\bigcup_{i=1}^{m} \{\, \text{slice at } s_0^i \,\}$$

4. Enhancing Slicing Accuracy

There are a number of instances in which an actual-out node should not exist as when a passed-

by-reference parameter is not modified. In this case, the presence of its actual-out node could

adversely affect the precision of an interprocedural slice. A method is described in [Hor90] to

detect such phenomena that is based on the calculation of the GMOD and GREF sets^14 (via the

method proposed in [Ban79]) for each procedure Fk. We have determined that calculating these

sets is not necessary under our method since all information required for that determination is

13 The short-circuiting is done by simply following the control dependence edge and continuing the slic-

ing algorithm from its origin.

14 For a procedure P, the set GMOD(P) is defined as the set of variables that might be modified by P itself or by a procedure (transitively) called from P. The set GREF(P) is defined as the set of variables that might be referenced by P itself or by a procedure (transitively) called from P [Hor90].


contained in the procedure's dependence graph. Furthermore, as we will show in the next section,

we will derive this information during construction of the SDG.

In particular, we will consider four cases of a formal-out node that corresponds to a pass-

by-reference parameter. The first case occurs when the variable is passed to the procedure and is

never modified (i.e., there is no execution path where the variable is defined). The second occurs

when the variable is passed and is always modified (i.e., the variable is defined on every execu-

tion path). The third occurs when the variable is passed and may sometimes be modified. For the

purposes of slicing, the second and third cases can be combined. However, by differentiating

between the second and third case, we are able to use that information for other related applica-

tions such as calculating reaching definitions. The fourth case, the unknown case, is the initial

condition before the nodes are classified. This case may also exist while the dependence for

recursive procedures are being calculated. Note that this case will not exist after the calculation

has been completed. The unknown case will be discussed in the next section.

We can summarize as follows. Let A, S, N, and U denote the sets of formal-out nodes of $F_k$ that are always modified, sometimes modified, never modified, and unknown, respectively. Then a node $f\_out_l(F_k)$ is classified as follows:

1. $(f\_out_l(F_k) \in A) \iff \neg\big(f\_in_l(F_k) \xrightarrow{dd} f\_out_l(F_k)\big)$,

2. $(f\_out_l(F_k) \in S) \iff \big(f\_in_l(F_k) \xrightarrow{dd} f\_out_l(F_k)\big) \land \big(\mathrm{indegree}_{dd}(f\_out_l(F_k)) > 1\big)$,

3. $(f\_out_l(F_k) \in N) \iff \big(f\_in_l(F_k) \xrightarrow{dd} f\_out_l(F_k)\big) \land \big(\mathrm{indegree}_{dd}(f\_out_l(F_k)) = 1\big)$,

4. $(f\_out_l(F_k) \in U) \iff \big(\mathrm{indegree}_{dd}(f\_out_l(F_k)) = 0\big)$,

where $\mathrm{indegree}_{dd}(v)$ denotes the indegree of the node v relative to the data flow dependence edges.
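The four cases can be sketched as a small classifier over the data flow predecessors of a formal-out node. This is our reading of the classification, not the prototype's code: the function name and the representation of predecessors as a set of node names are hypothetical, and we assume that the unknown case (no incoming data flow edge yet) is tested first so that it is not confused with the always-modified case.

```python
def classify_formal_out(dd_predecessors, own_formal_in):
    """Classify a formal-out node as 'A', 'S', 'N', or 'U'.

    dd_predecessors: set of nodes with a data flow edge into the formal-out node.
    own_formal_in:   the formal-in node of the same pass-by-reference parameter.
    """
    if not dd_predecessors:
        return "U"                      # nothing computed yet
    if own_formal_in not in dd_predecessors:
        return "A"                      # incoming value is killed on every path
    if len(dd_predecessors) == 1:
        return "N"                      # only the incoming value reaches the exit
    return "S"                          # incoming value and some definition both reach

# Hypothetical parameter x with formal-in node "f_in_x"
never = classify_formal_out({"f_in_x"}, "f_in_x")
always = classify_formal_out({"*x = 0"}, "f_in_x")
sometimes = classify_formal_out({"f_in_x", "*x = 0"}, "f_in_x")
unknown = classify_formal_out(set(), "f_in_x")
```

For slicing purposes only the N result matters, since it indicates that the corresponding actual-out node need not exist. The A and S results remain distinguishable for applications such as reaching definitions.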

5. Building the System Dependence Graph

Let $F_k$ be a procedure of a program P. We will say that $F_k$ is solved if and only if all of its data dependences and control dependences have been computed. We will say that the procedure has been summarized if and only if all of its summary dependences have been calculated. On the other hand,

determination of the summary information of Fk requires that the procedure be solved. The

method that is proposed in [Hor90] for the calculation of the transitive dependence distinguishes

between grammars that do not support recursion and those that do. In the former case, the solu-

tion proposed is via the use of a separate copy of a procedure dependence graph for each call-site.

In the latter case, the solution requires the construction of an attribute grammar and the calcula-

tion of the corresponding subordinate characteristic graphs of the linkage grammar's nonterminals

to determine the transitive dependence. Furthermore, in either case the GMOD and GREF sets

must be calculated before solution of the dependence is initiated.

In this section we describe a method that permits one to solve all procedures, including the construction of the SDG, in a bottom-up fashion so that only one copy of a procedure dependence graph is required for all call sites^15. The advantages of our algorithm are the following: it is conceptually simpler; there is no need to build an attribute grammar or calculate the corresponding subordinate graphs to determine the transitive dependences; and actual-out nodes that are deemed N-nodes are identified as such during the SDG construction and dependence calculation, which makes the explicit calculation of the GMOD and GREF sets of all procedures unnecessary. Finally, to repeat, the algorithm operates on a parse-tree-based SDG that yields smaller slices.

15 Notwithstanding the fact that in the case of aliasing phenomena each aliasing pattern gives rise to a dis-

tinct procedure dependence graph.


The Algorithm

In [Liv94], we presented an algorithm that, in the absence of recursion, correctly computes the transitive dependences in a terminal procedure, and we used it to compute both the data dependences and the transitive dependences of an SDG in a single pass. Moreover, our method does not need to "know" whether a procedure Fk is terminal (i.e., does not contain static calls to any procedure) or not. If Fk is terminal, then it can be solved with no interruptions. If it is not, then upon encountering a call to a procedure F', calculation of the dependences of Fk is suspended; the partial solution of Fk obtained up to this point (denoted by $\sigma \circ F_k$) is preserved; and dependence calculation is initiated at the called procedure F'. This process continues until a procedure F'' is encountered that is either terminal or has already been solved. In the former case, the terminal procedure is solved. At any rate, the summary information $\sigma F''$ of F'' is "reflected" back onto its corresponding calling site in the form of edges, and we write $\rho\sigma F''$ to indicate this operation of reflection. Calculation of the dependences of the calling procedure is then resumed. It should be noted that once a procedure has been summarized, there is no reason to descend into it again; subsequent calls to a summarized procedure need only have the summary information edges reflected (i.e., copied) to the call-site.
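The suspend, descend, and reflect discipline described above can be sketched for the recursion-free case as follows. This is only an illustration under simplifying assumptions: a summary is modeled as a set of opaque facts rather than as SDG edges, and the names `solve_program`, `calls`, and `local` are hypothetical and do not come from [Liv94].

```python
from typing import Dict, List, Set, Tuple


def solve_program(calls: Dict[str, List[str]],
                  local: Dict[str, Set[str]],
                  main: str) -> Tuple[Dict[str, Set[str]], List[str]]:
    """Solve an acyclic call graph depth-first, summarizing each procedure once.

    calls[p] lists the procedures called by p in program order (assumed input).
    local[p] stands in for the dependences found inside p itself.
    """
    summary: Dict[str, Set[str]] = {}
    order: List[str] = []  # order in which procedures become summarized

    def solve(proc: str) -> None:
        partial = set(local[proc])        # partial solution of proc so far
        for callee in calls[proc]:
            if callee not in summary:     # unsummarized: suspend proc, descend
                solve(callee)
            partial |= summary[callee]    # reflect the callee's summary
        summary[proc] = partial           # proc is now solved and summarized
        order.append(proc)

    solve(main)
    return summary, order


calls = {"M": ["A", "B"], "A": ["B", "D"], "B": [], "D": []}
local = {"M": {"m"}, "A": {"a"}, "B": {"b"}, "D": {"d"}}
summary, order = solve_program(calls, local, "M")
```

In this toy run B is descended into only once, from A; the later call from M reuses the stored summary, which mirrors the remark that a summarized procedure is never re-entered.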

The algorithm just described does not work well when recursive procedures are present. The reason is that in the absence of recursion, it is guaranteed that a terminal procedure will be encountered that can be completely solved, and its summary information can be reflected to its caller. In the case of recursion, even if we process a procedure in its entirety, the summary information obtained may be incomplete; therefore, a number of dependences may not be found. To counter this problem, the algorithm described above was modified as follows.

The extended call sequence graph (ECSG) is employed to detect when a recursive procedure has been encountered and also to keep track of the set of procedures (as in the case of mutual recursion) that must be iterated over. An ECSG, Q, is a dynamic multilist based on the CSG, of the form $Q = \{\omega_{i,j} : 0 \le i \le n_j,\ 0 \le j \le m\}$. The backbone of the ECSG is nothing more than the CSG itself, defined by $\{\omega_{0,j} : 0 \le j \le m\}$. Associated with each node $\omega_{0,j}$ in the backbone there is a list of procedures $\{\omega_{i,j} : 1 \le i \le n_j\}$, the iterate list, rooted at $h = \omega_{0,j}$. By definition, if an iterate list is not empty, then no procedure appears in that list more than once; and, as we will see, these are the procedures over which iteration must take place. As an example, Figure 3 shows three possible ECSGs; the backbone in each case is the "horizontal" list (consisting of procedures M, A, B, and C). Furthermore, in (i), all iterate lists are empty; whereas in (ii) and (iii), there is a non-empty iterate list with root nodes C and A, respectively.

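[Figure 3: three ECSGs over the backbone M, A, B, C: (i) all iterate lists empty; (ii) a non-empty iterate list rooted at C; (iii) a non-empty iterate list rooted at A.]

Figure 3. Extended Call Sequence Graph.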

The insert operation on the ECSG differs from that of the CSG. Specifically, whenever a call to an unsolved procedure U is encountered during the solution of a procedure V, a search in "column-major" order is performed on the ECSG, Q, starting from $\omega_{0,0}$, to determine whether a node labeled with that procedure's name (U) occurs in Q.

If no such node is found, the new procedure is inserted at the tail of the backbone; the iterate list of this procedure is set to empty; and calculation proceeds as normal by preserving the partial solution of V and descending into U.

On the other hand, if such a node is found, then recursion has been detected (see footnote 16). In that case, we modify the ECSG as follows. First, the iterate list rooted at $h = \omega_{0,j}$ is expanded by copying into it its root as well as all the procedures that correspond to the nodes of Q satisfying $\{\omega_{i,k} : 0 \le i \le n_k,\ j < k \le m\}$. Second, the iterate lists rooted at the backbone nodes $\omega_{0,k}$, $j < k \le m$, are deleted. Furthermore, the fact that the procedure U was found in the ECSG suggests that either we have only partially descended into it or we have completed a first pass through it; therefore, instead of descending into procedure U (see footnote 17), we reflect the partial summary of U onto the corresponding call site in V and resume the solution of V. For example, starting from the ECSG of Figure 3(ii), a call from C to procedure A would yield the ECSG of Figure 3(iii).
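The insert operation can be sketched with a small data structure. The class below is an illustration only: the names `ECSG`, `call`, and `_column_of` are hypothetical, and the merge rule (the root, every later backbone node, and the members of their iterate lists are collected under the root) is one reading of the description above, checked against the Figure 3(ii) to 3(iii) example.

```python
from typing import Dict, List, Optional


class ECSG:
    """Backbone of procedure names plus one iterate list per backbone node."""

    def __init__(self, main: str) -> None:
        self.backbone: List[str] = [main]
        self.iterate: Dict[str, List[str]] = {main: []}

    def _column_of(self, proc: str) -> Optional[int]:
        # Column-major search: a backbone node, then its iterate list.
        for j, name in enumerate(self.backbone):
            if name == proc or proc in self.iterate[name]:
                return j
        return None

    def call(self, callee: str) -> bool:
        """Record a call to an unsolved procedure.

        Returns True if the caller should descend into the callee, and
        False if recursion was detected (reflect the partial summary instead).
        """
        col = self._column_of(callee)
        if col is None:
            self.backbone.append(callee)
            self.iterate[callee] = []
            return True
        root = self.backbone[col]
        merged = self.iterate[root]
        for name in self.backbone[col:]:
            members = [name] + ([] if name == root else self.iterate[name])
            for member in members:
                if member not in merged:   # no procedure appears twice
                    merged.append(member)
            if name != root:
                self.iterate[name] = []    # later iterate lists are deleted
        return False


g = ECSG("M")
descents = [g.call(p) for p in ("A", "B", "C")]  # backbone M, A, B, C
self_call = g.call("C")                          # C calls C: Figure 3(ii)
after_self_call = list(g.iterate["C"])
call_to_a = g.call("A")                          # C calls A: Figure 3(iii)
```

After the last call the iterate list rooted at A holds A, B, and C, and the list that was rooted at C is empty.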

Finally, a procedure V is deleted from the backbone, if and only if, the entire procedure has

been processed. At the same time an intra-slice is performed and the summary information that is

obtained is reflected to its (known) call sites. It is important to note that if the procedure deleted

is not a terminal procedure, its summary will be partial (i.e., incomplete). Moreover, if the iterate

list rooted at V is not empty, then this is a signal that iteration should be performed over the pro-

cedures in the iterate list.

Initially, the summary information calculated for a procedure in the iterate list is incom-

plete. We term this incomplete information a partial summary. The main concept of the iteration

algorithm is that as the algorithm iterates over each procedure in the iterate list, this partial information is reflected onto the call-sites, where it is in turn used in the calculation of subsequent partial summaries. Eventually, when no new dependences are found, the partial summary becomes a complete summary.

An iteration over a procedure is defined as follows. We descend into the procedure and calculate the dependences as normal, except that as call-sites are encountered, only the (possibly partial) summary information is reflected onto the call-site; no descents are made from the procedure. When the procedure has been processed, the summary information is calculated and reflected to all known (encountered) call-sites of the procedure. It should be noted here that the correct calculation of dependences requires that when partial summaries are involved (in which the effects on some actual-out variables are unknown), the reaching definitions for those actual-out variables are killed. Of course, the classification of the actual-out nodes will change as the partial summary becomes more complete.

This iteration is performed over the set of procedures contained within the iterate list until no changes to the calculated dependences of the set are found. At this point, the procedures in the iterate list are solved.

The algorithm just described is presented in a procedural language in Table 3. The SDG for program Prog is computed by calling solve_program. This procedure initializes the ECSG and the procedure solutions and then calls solve_procedure on the main procedure. In solve_procedure, each line of the procedure P is solved, using the algorithm described in [Liv94]. Whenever a call to an unsummarized procedure Q is encountered, the partial solution of P is saved and solve_procedure is executed on Q. If P and Q do not call each other, then

16 Although recursion has been detected, the extent (the procedures involved in the recursion) has not yet

been determined. As the extent of recursion is determined, the root may change.

17 Additional descents will be made at the time of iteration.


algorithm solve_program(Prog) {
    ECSG = { Main }
    set all partial procedure solutions to empty
    solve_procedure(Main)
}

solve_procedure(P) {
    for every line l in P
        if l contains a procedure call to unsummarized procedure Q
            if Q does not appear on ECSG
                save the partial solution of P
                push Q onto ECSG
                solve_procedure(Q)
            else
                n = ECSG_POS(Q)
                m = ECSG_POS(P)
                for every procedure R such that n <= ECSG_POS(R) <= m,
                        or R is on the iterate list of T, n < ECSG_POS(T) <= m
                    place R on the iterate list of Q
                delete all iterate lists for procedure T, n < ECSG_POS(T) <= m
                process l using the existing partial summary for Q
        else
            process l
    if the iterate list of P is not empty
        compute_recursion(iterate_list(ECSG_POS(P)))
    remove P and its iterate list from ECSG
}

compute_recursion(procedure_list) {
    let changed = procedure_list
    while changed is not empty
        let new_changed = empty
        for every P in changed
            solve(P)
            if the summary information of P changed
                add every Q in procedure_list that calls P to new_changed
        changed = new_changed
}

Table 3. The algorithm.

when solve_procedure on Q terminates, Q will be summarized and the solution of P can proceed. Otherwise, the recursive chain involving P and Q will be detected when Q (or a procedure that Q calls) makes a call to P, which is on the backbone of the ECSG. The iterate lists on the ECSG are updated, making use of the ECSG_POS procedure, which returns the index of a procedure on the ECSG backbone. When a recursion is detected, the called procedure cannot be solved immediately, so whatever partial summary exists is used. If all lines in P have been solved, and P has a non-empty iterate list on the ECSG, then P is a member of a set of recursive procedures that must be solved using the compute_recursion procedure.

The compute_recursion procedure accepts an iterate list as input, and repeatedly solves the procedures on the list until no new transitive dependences are found. The transitive dependences in a procedure can change only if the transitive dependences in a procedure that it calls have changed. Therefore, we only need to solve procedures that call procedures whose summary information was changed in a previous step.
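The fixed-point loop can be sketched as follows. This is a toy model, not the implementation: summaries are sets of opaque facts, "solving" a procedure means uniting its own facts with the current (possibly partial) summaries of its callees, and the names `compute_recursion`, `calls`, and `local` are hypothetical. Callees outside the iterate list are assumed to have been folded into `local` already.

```python
from typing import Dict, List, Set


def compute_recursion(procs: List[str],
                      calls: Dict[str, List[str]],
                      local: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Iterate over a set of mutually recursive procedures to a fixed point."""
    summary: Dict[str, Set[str]] = {p: set() for p in procs}  # partial summaries
    changed: List[str] = list(procs)
    while changed:
        new_changed: List[str] = []
        for p in changed:
            # One iteration over p: reflect summaries, never descend.
            solved = set(local[p])
            for q in calls[p]:
                solved |= summary[q]
            if solved != summary[p]:
                summary[p] = solved
                # Only procedures that call p can be affected by the change.
                for caller in procs:
                    if p in calls[caller] and caller not in new_changed:
                        new_changed.append(caller)
        changed = new_changed
    return summary


# The recursive set of the worked example: A calls B, B calls C, C calls C and A.
calls = {"A": ["B"], "B": ["C"], "C": ["C", "A"]}
local = {"A": {"a"}, "B": {"b"}, "C": {"c"}}
result = compute_recursion(["A", "B", "C"], calls, local)
```

Because the summaries only grow and are bounded, the loop terminates; at the fixed point every procedure in the cycle carries the facts of all three.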

We illustrate the algorithm in detail by considering the call sequence in Table 4.

    M -> A B
    A -> B D
    B -> C E
    C -> C A
    D -> ε
    E -> F
    F -> ε

Table 4. Program Abstraction.


We begin by descending into the main procedure (procedure M). The first procedure encountered is the unsolved procedure A, which does not exist in the backbone; hence, $\sigma \circ M$ is saved, we insert A into the backbone, and we descend into it. When a call to the unsolved procedure B is encountered, the backbone is searched, but the procedure is not found. So B is inserted there, the partial solution $\sigma \circ A$ is preserved, and we descend into B. Similarly, when we encounter the call to the unsolved procedure C, we suspend the solution of B, save $\sigma \circ B$, insert C into the backbone, and descend into it. Figure 3(i) illustrates the status of the ECSG (merely the backbone) up to this point, whereas (6.e) indicates the solution steps so far.

$\sigma \circ M \rightarrow \sigma \circ A \rightarrow \sigma \circ B \rightarrow \sigma \circ C$    (6.e)

During the solution of C a call to C is encountered; C exists in the backbone, so we reflect the partial summary onto the call-site (in this case the partial summary is empty), $\sigma^2 \circ C = \sigma \circ C \cup \rho\sigma \circ C$; then the set of nodes of the ECSG from C to the end of the list is copied and appended as an iterate list at C (Figure 3(ii)). Processing of C continues until the call to procedure A is encountered. A search of the ECSG reveals that A exists; hence, the partial summary $\sigma \circ A$ (in this case also empty) is reflected onto its call-site in C, i.e., $\sigma^3 \circ C = \sigma^2 \circ C \cup \rho\sigma \circ A$; and the iterate list rooted at A is updated (Figure 3(iii)). Since we did not descend, processing of C continues until its end is encountered, at which point it is marked as having been visited (see footnote 18), its (partial) summary information $\rho\sigma^3 \circ C$ is calculated, and this summary information is reflected onto the call site in B. This yields $\sigma^2 \circ B = \sigma \circ B \cup \rho\sigma^3 \circ C$. C is then deleted from the backbone and processing returns to procedure B, as indicated by the tail of the backbone (Figure 4(i)). B now calls E, which in turn calls F, as indicated by Figure 4(ii).

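[Figure 4: three successive states (i), (ii), and (iii) of the ECSG; in each state the iterate list rooted at A holds A, B, and C.]

Figure 4. Extended Call Sequence Graph.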

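In symbols, (6.e) has yielded (6.f):

$\sigma \circ M \rightarrow \sigma \circ A \rightarrow \sigma^2 \circ B \rightarrow \sigma \circ E \rightarrow \sigma \circ F$    (6.f)

Now F, being terminal, is solved and its summary can be reflected (i.e., $\sigma^2 \circ E = \sigma \circ E \cup \rho\sigma F$); and now E can be solved. Its summary is reflected via $\sigma^3 \circ B = \sigma^2 \circ B \cup \rho\sigma E$.

The partial summary of B is calculated and reflected into A; in symbols, $\sigma^2 \circ A = \sigma \circ A \cup \rho\sigma^3 \circ B$. Notice that as a consequence of the previous steps, (6.f) has become

$\sigma \circ M \rightarrow \sigma^2 \circ A$    (6.g)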

18 If recursion is not present, marking the procedure denotes the procedure as solved. A marked pro-

cedure will not be descended into, only its summary information need be reflected. In the case of recursion,

marking the function only denotes that the procedure has, at best, a partial solution. However, the mark

prevents the procedure from being descended into until the iteration stage.


Figure 4(iii) shows the state of the graph up to this point.

Now, during processing of A, a call to D is encountered. The solution of A is suspended and

we descend into D (Figure 5(i)).

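[Figure 5: two states of the ECSG: (i) D appended to the backbone after A; (ii) D deleted from the backbone. In both states the iterate list rooted at A holds A, B, and C.]

Figure 5. Extended Call Sequence Graph.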

But D, being terminal, is solved; its node is deleted from the backbone (Figure 5(ii)); and its summary is calculated and reflected onto its corresponding call site in A via $\sigma^3 \circ A = \sigma^2 \circ A \cup \rho\sigma D$. Processing resumes with procedure A. When the end of A is reached, A is marked as visited and deleted from the backbone. But since the iterate list rooted at A is not empty, iteration over the union of the procedures of that iterate list is necessary; in symbols, $\iota(A \cup B \cup C)$ ($\iota$ denotes the iteration operation). Notice that when the iteration has been completed, the complete summary of all procedures in the iterate list will have been obtained. In other words,

$\iota(A \cup B \cup C) \rightarrow \sigma A \cup \sigma B \cup \sigma C$

Once the recursion is solved, we return to finish procedure M. After calling A, M calls procedure B. However, since procedure B has already been solved, its summary is simply reflected and we are finished:

$\sigma^2 \circ M = \sigma \circ M \cup \rho\sigma A$

$\sigma^3 \circ M = \sigma^2 \circ M \cup \rho\sigma B$

$\sigma^3 \circ M \rightarrow \sigma M$

6. Correctness and Optimality

In this section, we demonstrate some properties of our algorithm, and use them to show both that

the algorithm is correct, and that it is optimal.

6.1. Recursive Biconnected Components

Let $G_P = (V_P, E_P)$ be the directed graph of calling dependences of a program P. The vertex set $V_P$ is the set of procedures in the program. If procedures $A, B \in P$ and A calls procedure B, then we add the edge (A,B) to $E_P$. A recursive biconnected component of the graph $G_P$, denoted by $G_r$, is a maximal set of vertices $V' \subseteq V_P$ such that there is a path between every pair of vertices v and u in V'. The transitive dependences of the procedures in a biconnected component must be calculated together, because every procedure in the component can call every other, directly or indirectly. Furthermore, if A is in the recursive biconnected component $G_r$, and $(A,B) \in E_P$ with $B \notin G_r$, then the correct computation of the transitive dependences in $G_r$ depends on the transitive dependences of B. Therefore, B should be solved before any procedure in $G_r$ is solved.


The algorithm that computes the extended call sequence graph collects each recursive biconnected component into an iterate list (i.e., a set of vertices $\omega_{i,j}$, $i = 1..n_j$). Since the vertices toward the right-hand side of the ECSG are solved first (i.e., if ECSG_POS(A) < ECSG_POS(B), then B is solved before A), all procedures outside $G_r$ that are called by a procedure in $G_r$ have been solved when the iteration starts on $G_r$.
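For comparison, the components that the iterate lists collect can also be computed directly from the call graph. The sketch below uses Tarjan's strongly connected components algorithm, which is a standard technique swapped in for illustration and is not the ECSG mechanism of this paper; it treats a recursive biconnected component as a strongly connected set that contains at least one cycle. The function name and the dictionary encoding of the call graph are assumptions; the graph itself is the call structure of the worked example.

```python
from typing import Dict, List, Set


def recursive_components(calls: Dict[str, List[str]]) -> List[List[str]]:
    """Return the strongly connected sets of procedures that contain a cycle."""
    index: Dict[str, int] = {}
    low: Dict[str, int] = {}
    stack: List[str] = []
    on_stack: Set[str] = set()
    result: List[List[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in calls[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            comp: List[str] = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            # Keep the component only if it is actually recursive.
            if len(comp) > 1 or v in calls[v]:
                result.append(sorted(comp))

    for v in calls:
        if v not in index:
            visit(v)
    return result


calls = {"M": ["A", "B"], "A": ["B", "D"], "B": ["C", "E"],
         "C": ["C", "A"], "D": [], "E": ["F"], "F": []}
components = recursive_components(calls)
```

On this graph the only recursive component is {A, B, C}, the same set that ends up on the iterate list rooted at A in the walkthrough.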

6.2. Transitive Edge Depth

By the previous argument, we can restrict our attention to the calculation of the transitive dependences in the procedures of a recursive biconnected component $G_r$. Let x and y be two parameters of a procedure A. We need a definition of a "natural" path in a program that defines a transitive dependence. We say that there is a transitive dependence of actual-out node y on actual-in node x if there is a path in $S_i$, where:

Vv,,v,,w,soeGFk 3: e,=(v,,vj)eS= ,eS v (v, ) v ( v, --vj) v (v, ->v,) v (v, -v) v (v, ---v,)r

In this case, we say that the actual-out node is interslice reachable from the actual-in node.

If there is a transitive edge e from the actual-in node x to the actual-out node y, then one can infer that y is interslice reachable from x (where we distinguish between edges incident to Sometimes and Always nodes). Moreover, given that a transitive edge can be generated by more than one interslice path, we define $P^*(e)$ to be the set of all such paths. For each path $P \in P^*(e)$ we define its length, denoted by len(P), to be the number of procedures (or equivalently call sites) that are encountered along P, including the initial procedure and all recursive copies. Moreover, we define the recursive depth of transitive edge e, denoted by r(e), to be

$r(e) = \min_{P \in P^*(e)} len(P)$.

We give one more definition. Let e be a transitive edge in procedure A of the recursive biconnected component $G_r$. The edge e will be found and inserted into the summary of A at some iterative step i when $G_r$ is solved. We define the iterative depth of an edge e, denoted by i(e), to be the iteration at which e is added to the graph. We may now state and prove the following theorem.

Theorem: For any transitive edge $e \in A$ with $A \in G_r$, the recursive and iterative depths of e are equal. In symbols, r(e) = i(e).

Proof: We prove the theorem by induction on r(e). If r(e) = 1, then the procedure is terminal and therefore it can be solved. Hence, the transitive dependence e of y on x will be found in the first iteration. Then e will be added on the first iteration and consequently i(e) = 1.

Assume now that r(e) = i(e) whenever r(e) < k, for some k > 1. If now r(e) = k, then there is a path $P \in P^*(e)$ such that len(P) = k. Therefore, if P is restricted to procedure A, there is an intra-slice path that leads from x to y, and all of the transitive dependence edges e' in P are such that r(e') < k, with at least one transitive edge $e_0$ such that $r(e_0) = k-1$. The induction hypothesis tells us that all of these transitive edges in A are in place by the kth iteration, and that one of them was added on the (k-1)st iteration. Therefore e is added to the SDG on the kth iteration, and i(e) = r(e) = k. □

6.3. Correctness

The correctness of the transitive dependence calculation follows if the algorithm terminates, and if all and only those transitive dependences that exist in the program are added as edges in the SDG. It is easy to see that the algorithm terminates, because only a finite number of transitive edges can be added. It is also easy to see, by an inductive argument, that only those edges that exist in the program are added to the SDG.


In the base case, the calculation of dependences within a single procedure is correct [Liv94], so only those edges e such that r(e) = 1 are added in the first iteration. In the kth iteration, a transitive edge e = x → y will be added if in A the path from x to y crosses a transitive dependence edge e' such that i(e') = r(e') = k-1. But by our inductive hypothesis, an edge with iterative depth less than k is added to the SDG only if it exists in the program. Therefore, since the algorithm for calculating dependences in a single procedure is correct, only those edges e such that r(e) = k are added on the kth iteration.

Next, we need to show that all transitive edges are added to the SDG. Let e be a transitive edge. Then e has a recursive depth, say r(e) = k. Let P be a path for e such that len(P) = k. Therefore the calling procedure A makes a call to procedure B with parameters w and z such that there is a transitive edge e' = w → z and r(e') = k-1. Continuing this way, we can see that in P there are transitive edges with recursive depths 1 through k-1. Since i(e) = r(e), an edge was added to the SDG on each of iterations 1 through k-1. Therefore, if at step l no transitive edge is added, there is no transitive edge with recursive depth l+1 or greater.

6.4. Optimality

A worst case analysis of the number of iterations required to solve a recursive biconnected component $G_r$ produces a very pessimistic result, because one can construct examples in which only a single transitive edge is added at every iteration. If param is the maximum number of parameters in any procedure in $G_r$, then up to $O(|G_r| \cdot param^2)$ transitive edges can be added, so solving $G_r$ might require $O(|G_r| \cdot param^2)$ iterations. Consider the example in Table 5:

A(w,x,y,z) {
    B(z,w,x,y)
}

B(w,x,y,z) {
    C(w,x,y,z)
}

C(w,x,y,z) {
    D(w,x,y,z)
}

D(w,x,y,z) {
    if (w>0)
        A(w,x,y,z)
    else
        y = z
}

Table 5. A program with a large number of transitive dependences.

The program in Table 5 contains 48 dependences, since in each of the procedures A, B, C, and D there is a dependence between each pair of parameters. The transitive dependences are shown in Figure 6, and each is labeled with the number of iterations required to find it. To simplify the figure, we use the parameter name to represent both the formal-in and the formal-out nodes. In procedure D, the edges w → y and z → y are added on the first iteration, because w and z directly affect y. These transitive dependences propagate through the recursive biconnected component (the match between the parameters of the calling and the called procedures is indicated by dashed lines). The last dependence to be determined is e = w → x in A, after 20 iterations.
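How a summary edge moves across a call with permuted arguments can be made concrete with a small sketch. The function `reflect` and the pair encoding of edges are hypothetical; the example maps two summary edges of B through the call B(z,w,x,y) made in A, where B's formals are (w,x,y,z). It shows only the renaming step, not the full dependence calculation.

```python
from typing import List, Set, Tuple

Edge = Tuple[str, str]  # (source parameter, target parameter)


def reflect(summary: Set[Edge], formals: List[str], actuals: List[str]) -> Set[Edge]:
    """Rename the endpoints of callee summary edges from formals to actuals."""
    bind = dict(zip(formals, actuals))
    return {(bind[src], bind[dst]) for src, dst in summary}


# Two summary edges of B, expressed over B's own formals.
b_summary = {("z", "y"), ("w", "y")}

# A calls B(z, w, x, y): formal w receives z, x receives w, y receives x, z receives y.
at_call_site = reflect(b_summary, ["w", "x", "y", "z"], ["z", "w", "x", "y"])
```

Each trip around the cycle A, B, C, D rotates the parameter names once more, which is one way to see why the dependences of Table 5 surface only gradually.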

Recall the theorem that we just proved, namely that i(e) = r(e). Since 20 iterations are required to solve the set of procedures, there is an edge with iterative depth 20 and therefore an edge e with recursive depth 20. Since e has recursive depth 20, any algorithm that finds e must search through 20 instances of the procedures. We can now state the following theorem:


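[Figure 6: the parameters w, x, y, z of procedures A, B, C, and D, with each transitive dependence edge labeled by the iteration at which it is found; dashed lines match the parameters of calling and called procedures.]

Figure 6. Transitive dependences for the program in Table 5.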

Theorem: The algorithm for finding transitive edges in recursive biconnected components is optimal with respect to the longest chain of procedures that must be solved.

Proof: Suppose that the algorithm requires k iterations. Then there is a transitive edge e in the recursive biconnected component with recursive depth k. Therefore, any algorithm which finds e must search through k procedure instances. □

We note that the algorithm in Table 3 can be optimized somewhat to reduce the number of itera-

tions and the space overhead. The algorithm actually used in our implementation is listed in

Table 7. The idea is to collapse some iterations by propagating changes within an iterative step.

We delayed the presentation of the actual algorithm because the algorithm in Table 3 is easier to

analyze, and has the same behavior.

compute_recursion(procedure_list) {
    mark all procedures on procedure_list
    while there is a marked procedure on procedure_list
        for every P in procedure_list
            if P is marked
                unmark P
                solve(P)
                if the summary information of P changed
                    mark every Q in procedure_list that calls P
}

Table 7. The modified compute_recursion procedure.
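A runnable sketch of the marking variant follows, under the same toy model used for illustration elsewhere (summaries as sets of opaque facts, hypothetical names). It assumes, as the pseudocode needs for termination, that a procedure is unmarked when it is solved; the `solves` counter is an addition for observing how much work is done.

```python
from typing import Dict, List, Set, Tuple


def compute_recursion_marked(procs: List[str],
                             calls: Dict[str, List[str]],
                             local: Dict[str, Set[str]]
                             ) -> Tuple[Dict[str, Set[str]], int]:
    """Marking variant: changes propagate within a single pass over procs."""
    summary: Dict[str, Set[str]] = {p: set() for p in procs}
    marked: Set[str] = set(procs)
    solves = 0
    while marked:
        for p in procs:
            if p not in marked:
                continue
            marked.discard(p)              # unmark before solving
            solves += 1
            solved = set(local[p])
            for q in calls[p]:
                solved |= summary[q]       # reflect current partial summaries
            if solved != summary[p]:
                summary[p] = solved
                # Callers of p must be solved again, possibly in this pass.
                marked.update(c for c in procs if p in calls[c])
    return summary, solves


calls = {"A": ["B"], "B": ["C"], "C": ["C", "A"]}
local = {"A": {"a"}, "B": {"b"}, "C": {"c"}}
marked_result, solves = compute_recursion_marked(["A", "B", "C"], calls, local)
```

The result is the same fixed point as with the two-list formulation of Table 3; only the scheduling of the solve steps differs.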

6.4.1. Recursive Depth

The optimality of our algorithm shows that any algorithm for computing an SDG can require $O(|G_r| \cdot param^2)$ iterations when confronted with a recursive biconnected component. Intuitively, however, only a few iterations will be required in practice, because a well written program will contain only a few procedures in each recursive biconnected component, and the procedures will add most of their transitive edges on the first iteration. Therefore, the number of iterations should be proportional to the number of procedures in the component.

Let us recall the example in Table 5. It is difficult to determine just what computation the pro-

gram carries out, because so many calls must be examined to determine where the information

flows. A more typical example of a recursive procedure is shown in Table 8. This program has a

recursive depth of 2, and it is not difficult to trace the execution of the recursive procedure. In

general, a recursive biconnected component that contains edges with a large recursive depth will

be more difficult to read than a recursive biconnected component that only contains transitive

edges with a small recursive depth.

We propose a new software metric: the recursive depth. The recursive depth of a recursive biconnected component is the largest iterative depth of any transitive edge in the component, and the recursive depth of a program is the largest recursive depth of any recursive biconnected component in the program. A large recursive depth indicates a program that is difficult to understand.

void main()
{
    int x,y;

    R(&x,&y);

}

void R(int *x, int *y)
{
    if (*y == 0)
        *x = *x + 1;
    else if (*y == 1) {
        *y = *y + *x;
        R(x,y);
        *x = *x + 1;
    }
    else {
        *x = *x - 1;
        *y = *y - 1;
        R(x,y);
    }
}

Table 8. A sample program.

7. Recursion and Aliasing

If aliasing is present in a recursive procedure, the possibility exists that several alias configurations may be "spawned" as a result of the dependence calculation. Consider the procedure fragment in Table 9. This procedure (when called without aliased parameters) will "spawn" three distinct aliased configurations: R.(1,1,3), R.(1,2,2), and R.(1,1,1). This does not present a problem, since each alias configuration gives rise to a distinctly named function. In this case the algorithm will iterate over the set of four procedures (the non-aliased configuration as well as the aliased ones).

void R(int *x, int *y, int *z)
{
    if (---)
        ---
    else if (---)
        R(x,x,z);
    ---;
    R(x,y,y);
}

Table 9. A procedure fragment.
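The set of configurations spawned by the two recursive calls of Table 9 can be enumerated mechanically. In the sketch below a configuration is written as in the text: position k holds the index of the first parameter that aliases parameter k, so (1,2,3) is the non-aliased configuration. The function names and the encoding of a call site as a tuple of zero-based argument positions are assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

Config = Tuple[int, ...]


def canonical(classes: Iterable[int]) -> Config:
    """Label each position with the 1-based index of the first position in its class."""
    first = {}
    out: List[int] = []
    for pos, cls in enumerate(classes, start=1):
        first.setdefault(cls, pos)
        out.append(first[cls])
    return tuple(out)


def spawned_configurations(arity: int, call_sites: List[Tuple[int, ...]]) -> Set[Config]:
    """Close the non-aliased configuration under the recursive call sites."""
    start: Config = tuple(range(1, arity + 1))
    seen: Set[Config] = {start}
    work: List[Config] = [start]
    while work:
        cfg = work.pop()
        for site in call_sites:
            # The k-th formal of the callee is bound to argument number site[k].
            nxt = canonical(cfg[i] for i in site)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return seen


# R(x,x,z) passes arguments (0,0,2); R(x,y,y) passes arguments (0,1,1).
configs = spawned_configurations(3, [(0, 0, 2), (0, 1, 1)])
```

The closure contains the non-aliased configuration plus the three aliased ones named in the text, i.e. the four procedures over which the algorithm iterates.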


8. An Integrated Software Maintenance Environment

As we indicated in the introduction we have developed a prototype that integrates a number of

tools. Three of those, reaching definitions calculation, forward slice, and dice, are briefly dis-

cussed below.

The SDG can be used to find reaching definitions and/or du-pairs even when they cross procedure boundaries. Finding reaching definitions can be thought of as a restricted form of slicing; i.e., computing a slice of only one "iteration" backwards. Intuitively, each flow edge is followed backward from the target node and the nodes that are reached are marked as being in the reaching definition set. This works for the intraprocedural case. For the interprocedural case, we would like to identify definitions that span one or more procedure boundaries. For this to occur, we must take into account both the passing of variables by reference and their subsequent "return", and the actual return statement mechanism. The interprocedural algorithm is similar to the slicing algorithm in that it requires two phases. The screen dump from our Ghinsu tool in Figure 7 illustrates the result of finding the reaching definitions relative to the statement *y=*y+*x. Notice that in this way we are also able to detect data anomalies. For example, we can see that the declaration of y "reaches" the use of y in the statement *y=*y+*x; therefore, an uninitialized variable is detected.

[Figure 7: screen capture of the Ghinsu tool with the sample program loaded.]

Figure 7. The Ghinsu tool. The reaching definitions relative to the statement *y=*y+*x are shown.

Whereas a slice relative to a particular variable in a particular statement is the set of all

statements that may affect the value of the variable, a forward slice will capture the potential

effect of changing a variable at a selected statement [Hor90].

Like slicing, forward slicing is accomplished in two phases. The first phase consists of a

traversal of a particular set of edges starting at a selected node. In the second phase, traversal of a

different set of edges is applied to each node visited during the first phase. The union of the

nodes visited in both phases is the interprocedural slice. In general, all edges are (recursively)


followed forward. The edges followed in the first phase are control, data flow, declaration,

return-control, parameter-in, transitive, affect-return, and call edges. The second phase follows

the control, data flow, declaration, return-control, parameter-out, transitive, affect-return, and

return-link edges. These are the same sets of edges followed in slicing, but in the forward direc-

tion. Note that in forward slicing, there is no need for the "short circuit" operation when follow-

ing return-control edges.

The implementation of the dicing algorithm is straightforward. The dice is computed in two

phases as in calculation of a slice. The only difference is that the action of the slicing algorithm is

reversed. Instead of marking nodes as being contained in the slice, the encountered nodes are

marked as not being in the slice.
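Read this way, a dice is a slice from which the nodes encountered by a second slice have been unmarked. The sketch below illustrates that reading with a simplified single-phase backward traversal over unlabeled edges; the two-phase edge selection of the real slicer is omitted, and all names are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def backward_slice(edges: List[Tuple[str, str]], criterion: str) -> Set[str]:
    """All nodes from which the criterion is reachable (simplified, single phase)."""
    preds: Dict[str, Set[str]] = {}
    for src, dst in edges:
        preds.setdefault(dst, set()).add(src)
    seen = {criterion}
    stack = [criterion]
    while stack:
        node = stack.pop()
        for p in preds.get(node, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def dice(edges: List[Tuple[str, str]], keep: str, remove: str) -> Set[str]:
    """Mark the slice on keep, then unmark every node encountered slicing on remove."""
    marked = backward_slice(edges, keep)
    marked -= backward_slice(edges, remove)
    return marked


edges = [("a", "c"), ("b", "c"), ("b", "d")]
result = dice(edges, keep="c", remove="d")
```

Here b contributes to both c and d, so it is removed, leaving only the nodes that affect c without also affecting d.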

9. Related Work

Weiser [Wei84] has built slicers for FORTRAN and for an abstract data language called Simple-D. His slices were based on a flow-graph representation of programs. As far as we know, no operational slicers for C have been built. In addition, Weiser's method does not produce an optimum slice across procedure calls because it cannot keep track of the calling context of a called procedure. Methods for more precise interprocedural slicing have been developed by Horwitz [Hor88], where parameters are passed by value-result. This is an extension of the program dependence graph presented in [Fer87]. However, it models only a simple language that supports scalar variables, assignment statements, conditional statements, and while loops.

The dependence graph developed by Horwitz differentiates between loop-independent and

loop-carried flow dependency edges. Our method treats these as a single type of edge -- the data

flow edge -- which simplifies construction of the program dependence graph.

Our method of calculating interprocedural dependences does not use a linkage grammar as used in Horwitz's algorithm [Hor90]. Our algorithm is conceptually much simpler. The linkage grammar utilized by Horwitz includes one nonterminal and one production for each procedure in the system. The attributes in the linkage grammar correspond to the input and output parameters of the procedures. After constructing the linkage grammar, the algorithm determines the procedures which do not call any other procedure, calculates their transitive dependences, and reflects them to other procedures. Our method descends into the called procedures in the order of their calls in the program. When a called procedure does not call any other procedure, its transitive dependences are reflected onto the procedures which called it. Recursion is handled by a method of iteration over the recursive procedures. The called procedure always returns to the correct address in the calling procedure. This completely eliminates the use of a linkage grammar and the construction of subordinate characteristic graphs, which makes our algorithm more efficient.

Harrold et al. [Har89] calculate interprocedural data dependences in the context of interprocedural data flow testing. Their algorithm requires an invocation ordering of the procedures. Additionally, when recursive procedures are present, processing may visit each node p times, where p is the number of procedures in the program. In contrast, we do not need to calculate an invocation ordering. Also, we need to iterate over only the recursive procedures, not the entire program.

Hwang et al. [Hwa88] have suggested a technique for slicing recursive procedures that
constructs a sequence of slices of the system, where each slice in the sequence permits
essentially one additional level of recursion, until a fixed point is reached. However, this
algorithm handles only self-recursive procedures and has no mechanism for handling mutually
recursive procedures.

10. Current and Future Work

We have produced a prototype that builds an SDG for virtually every construct of ANSI C
(except long jumps), with the restriction that only single-level pointers are allowed; in that
case, the analysis induced by the presence of the pointers is detected and handled. For
arbitrary pointers, our prototype permits interprocedural analysis by inline expanding each
function. We are currently developing methods to summarize procedures in the case of
arbitrary-level pointers in the context of the method that we employ to determine the
du-chains. Once this is accomplished, we will be able to build the SDG for such programs.

11. References

[Aho74] A.V. Aho, J.E. Hopcroft, and J.D. Ullman. "The Design and Analysis of Computer
Algorithms", Addison-Wesley, Reading, MA.

[Aho86] A.V. Aho, R. Sethi, and J.D. Ullman. "Compilers: Principles, Techniques and Tools",

Addison-Wesley, Reading, MA.

[Bad88] L. Badger and M. Weiser. "Minimizing Communications for Synchronizing Parallel

Dataflow Programs", In Proceedings of the 1988 International Conference on Parallel Process-

ing, Penn State University Press, PA.

[Ban79] J.P. Banning. "An Efficient Way to Find the Side Effects of Procedure Calls and the
Aliases of Variables", In Conference Record of the Sixth ACM Symposium on Principles of
Programming Languages (San Antonio, Tex., Jan. 29-31, 1979). ACM, New York, 1979.

[Boe75] B.W. Boehm. "The High Cost of Software", In Practical Strategies for Developing Large
Software Systems, E. Horowitz (ed.), Addison-Wesley, Reading, MA.

[Cal88] D. Callahan. "The Program Summary Graph and Flow-Sensitive Interprocedural Data

Flow Analysis", In Proceedings of the SIGPLAN 1988 Conference on Programming Language

Design and Implementation, Atlanta, Georgia, June 22-24, 1988.

[Fer87] J. Ferrante, K. Ottenstein, and J. Warren. "The Program Dependence Graph and its Use

in Optimization", ACM TOPLAS, July 1987.

[Har89] M.J. Harrold and M.L. Soffa. "Selecting Data for Integration Testing".

[Hor88] S. Horwitz, J. Prins, and T. Reps. "Integrating Non-interfering Versions of Programs",
In Proceedings of the 15th ACM Symposium on Principles of Programming Languages, ACM Press,
New York.

[Hor89] S. Horwitz, J. Prins, and T. Reps. "Integrating Non-interfering Versions of Programs",
ACM TOPLAS, July 1989.

[Hor90] S. Horwitz, T. Reps, and D. Binkley. "Interprocedural Slicing Using Dependence

Graphs", ACM TOPLAS, January 1990.

[Hwa88] J.C. Hwang, M.W. Du, C.R. Chou. "Finding Program Slices for Recursive Pro-

cedures", In Proceedings of the IEEE COMPSAC 88, IEEE Computer Society, 1988.

[Kas80] U. Kastens. "Ordered Attribute Grammars", Acta Informatica 13, 3, 1980.

[Ker88] B.W. Kernighan and D.M. Ritchie. "The C Programming Language (ANSI C)", 2nd
Edition, Prentice Hall, Englewood Cliffs, New Jersey.

[Leu87] H.K.N. Leung and H.K. Reghbati. "Comments on Program Slicing", IEEE Transactions
on Software Engineering, Vol. SE-13, No. 12, December 1987.

[Liv94] P.E. Livadas and S. Croll. "A New Algorithm for the Calculation of Transitive
Dependences", Journal of Software Maintenance, Vol. 6, 1994, pp. 100-127.

[Lyl86] J.R. Lyle and M. Weiser. "Experiments in Slicing-Based Debugging Aids", In Elliot
Soloway and Sitharama Iyengar, editors, Empirical Studies of Programmers, Ablex Publishing
Corporation, Norwood, New Jersey, 1986.

[Lyl87] J.R. Lyle and M. Weiser. "Automatic Program Bug Location by Program Slicing", In

Proceedings of the 2nd International Conference on Computers and Applications, June 1987.

[Ott84] K.J. Ottenstein and L.M. Ottenstein. "The Program Dependence Graph in a Software

Development Environment", In Proceedings of the ACM SIGSOFT/SIGPLAN Software

Engineering Symposium on Practical Software Development Environments (Pittsburgh, Pa.,

April 23-25, 1984). ACM SIGPLAN Notices 19,5, May 1984.

[Par86] G. Parikh. "Handbook of Software Maintenance", Wiley-Interscience, New York, New
York, 1986.

[Reps88] T. Reps and W. Yang. "The Semantics of Program Slicing", TR-777, Computer
Sciences Dept., University of Wisconsin, Madison, June 1988.

[Reps89] T. Reps and T. Bricker. "Illustrating Interference in Interfering Versions of
Programs", TR-827, Computer Sciences Dept., University of Wisconsin, Madison, March 1989.

[Wei81] M. Weiser. "Program Slicing", In Proceedings of the Fifth International Conference on

Software Engineering, San Diego, CA, March 1981.

[Wei82] M. Weiser. "Programmers Use Slices When Debugging", CACM July 1982.

[Wei84] M. Weiser. "Program Slicing", IEEE Transactions on Software Engineering, July 1984.

[Yang89] W. Yang, S. Horwitz, and T. Reps. "Detecting Program Components With Equivalent

Behaviors", TR-840, Computer Sciences Dept., University of Wisconsin, Madison, June 1989.