Citation
Context-dependent flow-sensitive interprocedural dataflow analysis and its application to slicing and parallelization

Material Information

Title:
Context-dependent flow-sensitive interprocedural dataflow analysis and its application to slicing and parallelization
Creator:
Johmann, Kurt, 1955-
Publication Date:
1992
Language:
English
Physical Description:
vi, 91 leaves : ill. ; 29 cm.

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Data lines ( jstor )
Experimental results ( jstor )
Linear programming ( jstor )
Logical givens ( jstor )
Logical theorems ( jstor )
Mathematical independent variables ( jstor )
Mathematical procedures ( jstor )
Mathematical variables ( jstor )
Programming languages ( jstor )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1992.
Bibliography:
Includes bibliographical references (leaves 89-90).
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
by Kurt Johmann.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
027817217 ( ALEPH )
AJG6071 ( NOTIS )
26576154 ( OCLC )











CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL
DATAFLOW ANALYSIS AND ITS APPLICATION
TO SLICING AND PARALLELIZATION














By
KURT JOHMANN


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA

1992















ACKNOWLEDGEMENTS


I would like to express my appreciation and gratitude to my chairman and

advisor, Dr. Stephen S. Yau, for his careful guidance and generous support during

this study. I also would like to express my appreciation and gratitude to my previous

advisor, Dr. Sying-Syang Liu. Without their supervision and counsel, this work would

not have been possible. To Dr. Paul Fishwick, Dr. Richard Newman-Wolfe, and Dr.

Mark Yang, members of the supervisory committee, go my thankfulness for their

service.

Finally, I want to thank the Software Engineering Research Center (SERC) for

providing financial support during this study.














TABLE OF CONTENTS




ACKNOWLEDGEMENTS

ABSTRACT

CHAPTERS

1 INTRODUCTION

1.1 Interprocedural Dataflow Analysis
1.2 Slicing and Logical Ripple Effect
1.3 Parallelization
1.4 Literature Review
1.5 Outline in Brief

2 THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD

2.1 Constructing the Flowgraph
2.2 Interprocedural Forward-Flow-Or Analysis
2.2.1 The Dataflow Equations
2.2.2 Element Recoding for Aliases
2.2.3 Implicit Definitions Due to Calls
2.3 Interprocedural Forward-Flow-And Analysis
2.4 Interprocedural Backward-Flow Analysis
2.5 Complexity of Our Interprocedural Analysis Method
2.6 Experimental Results

3 INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect
3.2 The Logical Ripple Effect Algorithm
3.3 A Prototype Demonstrates the Algorithm
3.4 The Slicing Algorithm

4 INTERPROCEDURAL PARALLELIZATION

4.1 Loop-Carried Data Dependence
4.2 The Parallelization Algorithm

5 CONCLUSIONS AND FUTURE RESEARCH

5.1 Summary of Main Results
5.2 Directions for Future Research

REFERENCES

BIOGRAPHICAL SKETCH














Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL
DATAFLOW ANALYSIS AND ITS APPLICATION
TO SLICING AND PARALLELIZATION

By

Kurt Johmann

May 1992

Chairman: Dr. Stephen S. Yau
Major Department: Computer and Information Sciences

Interprocedural dataflow analysis is important in compiler optimization, au-

tomatic vectorization and parallelization, program revalidation, dataflow anomaly

detection, and software tools that make a program more understandable by show-

ing data dependencies. These applications require the solution of dataflow problems

such as reaching definitions, live variables, available expressions, and definition-use

chains. When solving these problems interprocedurally, the context of each call must

be taken into account.

In this dissertation we present a method to solve this kind of dataflow problem

precisely. The method consists of special dataflow equations that are solved for a

program flowgraph. Regarding calling context, separate sets, called entry and body

sets, are maintained at each node in the flowgraph. The entry set contains calling-

context effects that enter a procedure. The body set contains effects that result

from statements in the procedure. By isolating calling-context effects in the entry

set, a call's nonkilled calling context is preserved by means of a simple intersection

operation done at the return node for the call.








Slicing determines program pieces that can affect a value. Logical ripple effect

determines program pieces that can be affected by a value. Both slicing and logical

ripple effect are useful for software maintenance. The problems of slicing and logical

ripple effect are inverses of each other, and a solution of either problem can be inverted

to solve the other. Precise interprocedural logical ripple effect analysis is complicated

by the fact that an element may be in the ripple effect by virtue of one or more specific

execution paths. In this dissertation we present an algorithm that builds a precise

logical ripple effect or slice piece by piece, taking into account the possible execution

paths. The algorithm makes use of our interprocedural dataflow analysis method,

and this method is also used in an algorithm given in this dissertation for identifying

loops that can be parallelized.














CHAPTER 1
INTRODUCTION

1.1 Interprocedural Dataflow Analysis

Dataflow analysis refers to a class of problems that ask about the relationships

that exist along a program's possible execution paths, between such program ele-

ments as variables, constants, and expressions [2, 10]. When dataflow analysis is

done for a program by treating its individual procedures as being independent of

each other, regardless of the calls made, this is known as intraprocedural analysis.

For intraprocedural analysis, assumptions must be made about the effects of calls.

By contrast, interprocedural analysis replaces assumptions with specific information

about the effects of each call. This information can be gathered by either flow-

sensitive [3, 6, 9, 17, 19, 21] or flow-insensitive [4, 7, 18] analysis. When answering

a dataflow question, a flow-sensitive analysis will take into account the flow paths

within procedures, whereas a flow-insensitive analysis ignores these flow paths. The

flow paths are the possible execution paths. Flow-sensitive analysis typically provides

more precise information, but at greater cost.

Flow-sensitive interprocedural dataflow analysis has two major problems that

make it significantly harder than intraprocedural analysis. First, in intraprocedural

analysis, it is assumed that any path in the flowgraph is a possible execution path.

By contrast, for interprocedural analysis, it is useful to assume that the possible

execution paths conform to the rule that once a procedure is entered by a call, the

flow returns to that call upon return. Thus, the set of possible execution paths will

typically be a proper subset of the paths in the program flowgraph. This problem








will be referred to as the calling-context problem. Second, call-by-reference formal

parameters typically cause alias relationships between actual and formal parameters

that are valid only for certain calls and apply only to those passes through the called

procedure that originate from those calls that establish the specific alias relationship.

There are many applications for a flow-sensitive interprocedural dataflow anal-

ysis method that solves the two major problems, assuming that the costs of the

method are not too high. Some of the well-known dataflow problems that can be

precisely solved by such a method are reaching definitions, live variables, the related

problems of definition-use and use-definition chains, and available expressions. Ap-

plications that require the solution of one or more of these dataflow problems include

compiler optimization, automatic vectorization and parallelization of program code,

program revalidation, dataflow anomaly detection, and software tools that show data

dependencies.

In this dissertation we present a new method for flow-sensitive interprocedural

dataflow analysis that solves the two major problems, and does so at a comparatively

low cost [13]. The method consists of special dataflow equations that are solved for

a program flowgraph. In deference to calling context, separate sets, called entry and

body sets, are maintained at each node in the flowgraph. The entry set contains

calling-context effects that enter a procedure. The body set contains effects that

result from statements in the procedure. By isolating calling-context effects in the

entry set, a call's nonkilled calling context is preserved by means of a simple

intersection operation done at the return node for the call. The main advantages of

our method are its low complexity and the fact that the presence of recursion does

not affect the precision of the result.

The language model assumed for Chapter 2 allows global variables, but the

visibility of each formal parameter is limited to the single procedure that declares








it. Thus, with the exception of a call and its indirect reference, each formal pa-

rameter can only be referenced inside a single procedure. Examples of programming

languages that fit this model are C and FORTRAN. This restriction on the visibility

of formal parameters is imposed for the sake of the discussions of element recoding

in Sections 2.2.2 and 2.3, of implicit definitions in Section 2.2.3, and of worst-case

complexity in Section 2.5. Our method can also be used for the alternative language

model that allows each formal parameter to have visibility in more than a single

procedure, but this is considered only briefly at the end of Section 2.5.

1.2 Slicing and Logical Ripple Effect

Given an actual or hypothetical variable v at program point p, determine all

program pieces that can possibly be affected by the value of v at p. This is the logical

ripple effect problem. Given v and p, determine all program pieces that can possibly

affect the value of v at p. This is the slicing problem. Each problem is the inverse

of the other, so a solution for one, once inverted, is a solution for the other.

Logical ripple effect is useful for helping a programmer to understand how a

program change, either actual or hypothetical, will impact that program. Making

program changes as part of routine maintenance often introduces new errors into the

changed program. Such errors typically result because the programmer overlooked

some part of the logical ripple effect for that change. Showing the programmer the

actual logical ripple effect of a program change helps to avoid such mistakes.

Slicing is primarily useful for program fault localization [23]. If a variable v at

point p is known to have a wrong value, then a slice on v at p will narrow the search

for the cause of the error to that part of the program that can truly affect v at p.

Thus, the fault is localized. The more precise the slice, the more localized the cause

of the error, saving programmer time.








In this dissertation we are concerned only with static logical ripple effect and

slicing [11, 12, 16, 24] where the ripple effect or slice is determined from dataflow

analysis of the program text. The alternative approach is dynamic logical ripple

effect and slicing [1, 14] where the ripple effect or slice is determined by actually

executing the program. Whenever we speak of execution paths in Chapter 3, we

always mean possible execution paths as determined by dataflow analysis.

Precise interprocedural logical ripple effect analysis is complicated by the fact

that a definition may be added to the ripple effect because of one or more specific

execution paths. To determine in turn the ripple effect of that added definition,

that definition should be constrained to those execution paths that are the possible

continuations of the execution paths along which that definition was itself affected

and thereby added to the ripple effect. We refer to this as the execution-path problem.

In particular, it is those call instances made in an execution path P that have

not been returned to in P that cause the difficulty. This is because of the rule that a

called procedure returns to its most recent caller. This means that any continuation

of the execution path P must first return to those unreturned calls in P before returns

can possibly be made to call instances that precede P. An example will illustrate the

problem.


procedure main        procedure B           procedure A
begin                 begin                 begin
1: f ← 7              6: y ← f + 5          7: call B
2: call A             end                   8: x ← y
3: z ← x                                    end
4: f ← 1
5: call B
end



For the example, assume that all variables are global, and that the problem is to

determine the logical ripple effect for the definition of variable f at line 4. The call








to procedure B at line 5 allows the definition of f at line 4 to affect the definition of

y at line 6, and the return of procedure B would be to the call at line 5 by which the

definition of y at line 6 was affected. The end result is that the ripple effect should

include only line 6. However, assume that the execution-path problem is ignored and

all returns are possible when the ripple effect is computed. For the same problem, the

call at line 5 allows the definition of f at line 4 to affect the definition of y at line 6.

Then the definition of y at line 6 affects the definition of x at line 8 by procedure B

returning to the call at line 7 in addition to the call at line 5. Then the definition of

x at line 8 affects the definition of z at line 3 by procedure A returning to the call at

line 2. The end result is a ripple effect that includes lines 3, 6, and 8, but only line 6

should be in the ripple effect.
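
The difference between the two results can be reproduced with a small simulation of the example program. The following is a minimal sketch only, and not the algorithm of Chapter 3: it walks the program statement by statement, carrying the set of affected variables and a stack of unreturned calls. The names `PROGRAM`, `BODY`, `after`, and `ripple` are hypothetical, and the sketch assumes a program without recursion and without branches, as in the example.

```python
# Statements of the example program, keyed by line number.
PROGRAM = {
    1: ("assign", "f", ()),       # f <- 7
    2: ("call", "A"),
    3: ("assign", "z", ("x",)),   # z <- x
    4: ("assign", "f", ()),       # f <- 1
    5: ("call", "B"),
    6: ("assign", "y", ("f",)),   # y <- f + 5
    7: ("call", "B"),
    8: ("assign", "x", ("y",)),   # x <- y
}
BODY = {"main": [1, 2, 3, 4, 5], "B": [6], "A": [7, 8]}
PROC_OF = {ln: proc for proc, lines in BODY.items() for ln in lines}


def call_sites(proc):
    """Line numbers of all calls to `proc`."""
    return [ln for ln, stmt in PROGRAM.items() if stmt == ("call", proc)]


def after(line, affected_vars, stack, match_returns):
    """Yield the states that can follow the completion of `line`."""
    lines = BODY[PROC_OF[line]]
    pos = lines.index(line)
    if pos + 1 < len(lines):
        yield lines[pos + 1], affected_vars, stack
    elif match_returns and stack:
        # Return only to the most recent unreturned call.
        yield from after(stack[-1], affected_vars, stack[:-1], match_returns)
    else:
        # Calling context unknown or ignored: return to every call site.
        for site in call_sites(PROC_OF[line]):
            yield from after(site, affected_vars, (), match_returns)


def ripple(start_line, match_returns):
    """Lines whose definitions are affected by the definition at start_line."""
    source_var = PROGRAM[start_line][1]
    result, seen = set(), set()
    work = list(after(start_line, frozenset({source_var}), (), match_returns))
    while work:
        state = work.pop()
        if state in seen:
            continue
        seen.add(state)
        line, affected_vars, stack = state
        stmt = PROGRAM[line]
        if stmt[0] == "call":
            new_stack = stack + (line,) if match_returns else ()
            work.append((BODY[stmt[1]][0], affected_vars, new_stack))
            continue
        _, target, uses = stmt
        if line == start_line or any(u in affected_vars for u in uses):
            if line != start_line:
                result.add(line)
            affected_vars = affected_vars | {target}
        else:
            affected_vars = affected_vars - {target}   # definition is killed
        work.extend(after(line, affected_vars, stack, match_returns))
    return result


print(sorted(ripple(4, match_returns=True)))    # [6]
print(sorted(ripple(4, match_returns=False)))   # [3, 6, 8]
```

With `match_returns=True` procedure B returns only to the call at line 5, so only line 6 is reported. With `match_returns=False` the return to the call at line 7 is also followed, which adds lines 8 and 3.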

Although there are a number of papers on logical ripple effect and slicing

[11, 12, 16, 24], there appears to be only one [11] that addresses the problems of

precise interprocedural logical ripple effect and slicing, and presents a method for it.

Weiser [24] was the first to propose an interprocedural slicing method, but his method

ignores the execution-path problem and thereby suffers from the resulting loss of precision.

Horwitz et al. [11] address the problem of precise interprocedural slicing, and present

a method to construct a system dependence graph from which slices can be extracted.

In this dissertation we present an algorithm that builds the logical ripple effect

piece by piece, and takes into account the restrictions on execution-path continuation

that are imposed by the preceding execution paths up to the point by which the given

program piece is affected and thereby included in the ripple effect. In general, the

algorithm computes a precise logical ripple effect, but some overestimation is possible,

meaning that the computed logical ripple effect may be larger than the true one. An

inverse form of the algorithm is presented for the slicing problem. The languages

that our algorithm will work for include many of the common procedural languages

such as C, Pascal, Ada, and Fortran.








1.3 Parallelization

Automatic conversion of a sequential program into a parallel program is often

referred to as parallelization. Parallelization problems are typically concerned with

the conversion of sequential loops into parallel code. In this dissertation, the specific

problem considered is the identification of loops in a program that can be parallelized,

including those loops that contain calls. A flow-sensitive interprocedural dataflow

analysis method has specific applicability to the problem of parallelizing loops that

contain calls, because such a method can supply the precise data-dependency infor-

mation that would be necessary for the parallelization analysis.

The parallelization of a loop would mean that each iteration of the loop can

be executed independently of the other iterations of the loop. In theory, this would

mean that each single iteration, or each arbitrary block of iterations, can be assigned

to a separate processor in a parallel machine. In any parallelization tool, the decision

as to which loops can actually be parallelized, and how, is influenced by the specific

architecture of the parallel machine, by the programming language being parallelized,

and by the loop transformations that are available to convert sequential loop code into

functionally equivalent sequential code that is more parallelizable.

However, none of the architecture, language,

and loop-transformation issues will be considered here. Instead, the problem will be

considered solely from the standpoint of data dependence.
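
As an illustration of iteration independence from the standpoint of data dependence, the following minimal sketch runs two loop bodies in forward and in reverse iteration order. The loop bodies and the helper `run` are hypothetical examples, not taken from the dissertation. A loop whose result changes when the iteration order changes cannot have independent iterations. The converse does not hold in general, so this is a demonstration rather than a test for parallelizability.

```python
def run(body, n, order):
    """Execute body(a, i) for each i in the given iteration order."""
    a = [0] * n
    for i in order:
        body(a, i)
    return a


def independent(a, i):
    # Each iteration writes a[i] and reads nothing written by another iteration.
    a[i] = 2 * i


def carried(a, i):
    # Iteration i reads a[i - 1], which is written by iteration i - 1.
    a[i] = a[i - 1] + 1 if i > 0 else 1


n = 6
forward = list(range(n))
backward = list(reversed(range(n)))

print(run(independent, n, forward) == run(independent, n, backward))  # True
print(run(carried, n, forward))    # [1, 2, 3, 4, 5, 6]
print(run(carried, n, backward))   # [1, 1, 1, 1, 1, 1]
```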

After a brief review of the basics regarding data dependence and parallelization,

an algorithm is given that identifies loops in a program that can be parallelized, and

this algorithm uses our interprocedural dataflow analysis method as an integral part.

The potential value of parallelization is clear. On the one hand, parallel machines

are becoming more common, and on the other hand, a great number of sequential

programs already exist, some of which can benefit from the greater processing power

that parallelization would offer.








1.4 Literature Review

Different methods have been offered for solving various flow-sensitive interpro-

cedural dataflow analysis problems. Sharir and Pnueli [21] present a method they

name call-strings. The essential idea of their method is to accumulate for each ele-

ment a history of the calls traversed by that element as it flows through the program

flowgraph. The call history associated with an element is used whenever that element

is at a return point. The element can only cross back to those calls in its call history.

Thus, the call-strings approach provides a solution to the calling-context problem.

However, the disadvantage of this approach is the time and space needed to maintain

a call history for each element at each flowgraph node.

Let $l$ be the program size. We assume that the number of elements will be

a linear function of $l$. The worst-case number of total set operations required by

the call-strings approach would be greater by a factor of $l$ when compared to our

method. This is because for each union or intersection of two sets of elements, if the

same element is in both sets, then a union operation must also be done for the two

associated call histories so as to get the new call history to be associated with that

element at the node for which the set operation is being done. A further disadvantage

of the call-strings approach is the need to include the associated call histories when

set stability is tested to determine termination for the iterative algorithm used to

solve the dataflow equations.

Myers [17] offers a solution to the calling-context problem that is essentially the

same as call-strings. Allen [3] presents a different method for interprocedural dataflow

analysis. The method analyzes each procedure completely, in reverse invocation

order. The first procedures to be analyzed would be those that make no calls, then

the procedures that only call these procedures would be analyzed, and so on. Once a

procedure is analyzed, its effects can be incorporated into those procedures that call








it, when they in turn are analyzed. The obvious drawback of this method is that it

cannot be used to analyze recursive calls.
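
The ordering step of such a method, and the reason recursion defeats it, can be sketched as a topological sort of the call graph. This is a minimal illustration of the ordering only, and not Allen's analysis itself. The function name and the two call graphs are hypothetical. The sketch uses `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import CycleError, TopologicalSorter


def reverse_invocation_order(calls):
    """Order procedures so that every callee precedes its callers.

    `calls` maps each procedure name to the set of procedures it calls.
    Returns None when the call graph is cyclic, that is, when the
    program contains recursion and no such order exists.
    """
    try:
        return list(TopologicalSorter(calls).static_order())
    except CycleError:
        return None


acyclic = {"main": {"A", "B"}, "A": {"B"}, "B": set()}
recursive = {"main": {"even"}, "even": {"odd"}, "odd": {"even"}}

print(reverse_invocation_order(acyclic))     # ['B', 'A', 'main']
print(reverse_invocation_order(recursive))   # None
```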

Rosen [19] presents a complex method for interprocedural dataflow analysis

that is limited to solving the problems of variable modification, preservation, and use.

These dataflow problems do not require a solution of the calling-context problem.

Callahan [6] has proposed the program summary graph to solve the interproce-

dural dataflow problems of kill and use, where kill determines all definite kills that

result from a procedure call, and use determines all variables that may be used as a

result of a procedure call before being redefined.

As part of the determination of edges in the program summary graph, intrapro-

cedural reaching-definitions analysis must be done for each procedure. Simplifying

Callahan's space complexity analysis, we get $O(v_g l)$ as the worst-case size of the

program summary graph, where $v_g$ is the number of global variables in the program

plus the average number of actual parameters per call, and $l$ is the program size. One

limitation of Callahan's method is that it does not correctly handle multiple aliases

that result when the same variable is used multiple times as an actual parameter

in the same call and the corresponding formal parameters are call-by-reference. By

contrast, our method, using element recoding where all the aliases are encoded in a

single element, will correctly handle the multiple aliases problem.

Callahan's method offers no solution to the calling-context problem, and could

not be used to determine, for example, interprocedural reaching definitions. However,

Harrold and Soffa [9] have extended his method so that interprocedural reaching

definitions can be determined. They use an interprocedural flowgraph, denoted IFG,

that is very similar to the program summary graph. The IFG has inter-reaching edges

that are determined by solving Callahan's kill problem. They recommend using his

method, so their method inherits Callahan's space and time complexity, as well as

its limitation with regard to multiple aliases.








Before the IFG can be used, it must be decorated with the results of intrapro-

cedural analysis done twice for each procedure to determine both reaching definitions

and upwardly exposed uses. Then an algorithm is used to propagate the upwardly

exposed uses throughout the IFG. This algorithm has worst-case time complexity

of $O(n^2)$ where n is the number of nodes in the IFG. Their graph will have the

same number of nodes as for Callahan's graph, meaning worst-case graph size will be

$O(v_g l)$. Substituting $v_g l$ for n, we get a worst-case time complexity of $O(v_g^2 l^2)$. As

the size of our flowgraph is proportional to the size of the program, the worst-case

time complexity for solving our equations is only $O(l^2)$.

Weiser [24] was the first to propose an interprocedural slicing method, but his

method ignores the execution-path problem and thereby suffers from the resulting loss

of precision. Horwitz et al. [11] have presented a method to compute the more precise

slice described in Section 1.2. However, they use a more restricted definition

of a slice. Their slice is all statements and predicates that may affect a variable

v at program point p, such that v is defined or used at point p. Their method

consists of constructing a specialized graph called a system dependence graph. Nodes

in this graph represent program pieces such as statements, and the edges in the

graph represent control or data dependencies. Edges representing transitive data

dependencies that are due to procedure calls are computed by first modeling each

procedure and its calls with an attribute grammar called a linkage grammar, and then

solving the grammar so as to determine the transitive data dependencies represented

by it. Once the system dependence graph is complete, any slice based on an actual

definition or use occurring at any point p in the program can be extracted from the

graph. A major weakness of their method is that it does not allow a hypothetical

use to be the starting point of the slice.

The complexity of constructing the system dependence graph is given as

$O(G \cdot X^2 \cdot D^2)$ where G is the total number of procedures and calls in the program, X is the








total number of global variables in the program plus a term that can be considered

a constant, and D is a linear function of X. Once the system dependence graph

is complete, any particular slice that is wanted can be extracted from the graph at

complexity O(n) where n is the size of the graph. The size of the graph is roughly

quadratic in program size, being bounded by $O(P \cdot (V + E) + T \cdot X)$ where P is

the number of procedures, V is the largest number of predicates and definitions in a

single procedure, E is the largest number of edges in a procedure dependence graph,

T is the number of calls in the program, and X is the number of global variables. In

their paper, much is made of the fact that once the graph is complete, any slice on

an actual definition or use can be extracted from the graph at O(n) cost where n is

the size of the graph. However, the number of actual definition and use occurrences

in a program is proportional to the program size $l$. Therefore, any method that can

compute a slice at cost $O(Z)$ for some Z, can generate all the slices contained in their

graph at cost $O(Z \cdot l)$, spool the slices to disk, and recover them at cost $O(1)$.

Although there are many papers on slicing, it seems that only Horwitz et al.

[11] discuss clearly the problem of the more precise interprocedural slice, and present

a method to compute it, as well as providing complexity analysis. Our research on

slicing is only concerned with computing the more precise slice, so Horwitz et al. is

the principal reference.

Zima and Chapman [25] is the principal reference used to study the issues and

methods of parallelization. Their book distills the work found in scores of papers

and dissertations, and is an excellent survey of parallelization. Interprocedural par-

allelization is specifically considered by Burke and Cytron [5], and by Triolet et al.

[22].








1.5 Outline in Brief

This introductory chapter ends with a brief synopsis of the remaining chapters.

Chapter 2 presents in detail our interprocedural dataflow analysis method. The chap-

ter ends with a brief description of the prototypes that were built to demonstrate the

method, along with some of the experimental results obtained from these prototypes.

Chapter 3 begins with a representation scheme for continuation paths for the inter-

procedural logical ripple effect problem and then presents our interprocedural logical

ripple effect algorithm. A prototype that was built to demonstrate this algorithm is

briefly described and experimental results are presented. An inversion of the logical

ripple effect algorithm is then presented as a solution to the interprocedural slicing

problem. Chapter 4 begins with an explanation of loop-carried data dependence and

its relevance to parallelization, and concludes with an algorithm that identifies loops

that can be parallelized, including loops that contain calls. Chapter 5 summarizes

the major results of the dissertation, and suggests directions for future research.














CHAPTER 2
THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD

2.1 Constructing the Flowgraph

This section discusses the flowgraph and its relationship to dataflow equations.

After the discussion, rules are given for constructing the specific flowgraph required

by our interprocedural analysis method. Note that the required flowgraph is con-

ventional and the rules to be given relate only to the representation of calls and

procedures in the flowgraph.

A flowgraph is a directed graph that represents the possible flow paths of a

program. The nodes of a flowgraph correspond to basic blocks in the program. A

basic block is a sequence of program code that is always executed together in the

same order. The directed edges of a flowgraph represent possible transfers of control.

Figures 2.1 and 2.3 each represent a flowgraph.

Dataflow problems are often formulated as a set of equations that relate the four

sets, IN, OUT, GEN, and KILL, that are associated with each node in the flow-

graph. For any node and its block, the GEN set represents the elements generated

by that block. The KILL set represents those elements that cannot flow through the

block, because they would be killed by the block. The IN set represents the valid

elements at the start of the block, and the OUT set represents the valid elements at

the end of the block.

Dataflow problems are typically either forward-flow or backward-flow. For

forward-flow, the IN set of a node is computed as the confluence of the OUT sets

of the predecessor nodes, and the OUT set is a function of the node's IN, GEN,








and KILL sets. For backward-flow, the OUT set of a node is computed as the con-

fluence of the IN sets of the successor nodes, and the IN set is a function of the

node's OUT, GEN, and KILL sets. The predecessors of any node n are those nodes

that have an out-edge directed to node n. The successors of node n are those nodes

that have an in-edge directed from node n. The confluence operator will almost in-

variably be either set union or set intersection, depending on the problem. Thus, a

dataflow problem may be classified as being either forward-flow-or, forward-flow-and,

backward-flow-or, or backward-flow-and, where "or" refers to set union and "and"

refers to set intersection.
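
For reference, the general shape of these equations can be written out as follows. This is a sketch under an assumption: the text above states only that the OUT set (or the IN set, for backward-flow) is a function of the other three sets, and the particular form GEN ∪ (IN − KILL) used here is the conventional textbook one, not necessarily the form used by the equations of Section 2.2.1. The symbol ⊔ stands for the confluence operator, which is ∪ for an "or" problem and ∩ for an "and" problem.

```latex
% Forward-flow, for each node n with predecessor set pred(n):
\begin{align*}
\mathit{IN}[n]  &= \bigsqcup_{p \,\in\, \mathit{pred}(n)} \mathit{OUT}[p] \\
\mathit{OUT}[n] &= \mathit{GEN}[n] \cup \bigl(\mathit{IN}[n] - \mathit{KILL}[n]\bigr)
\end{align*}

% Backward-flow, for each node n with successor set succ(n):
\begin{align*}
\mathit{OUT}[n] &= \bigsqcup_{s \,\in\, \mathit{succ}(n)} \mathit{IN}[s] \\
\mathit{IN}[n]  &= \mathit{GEN}[n] \cup \bigl(\mathit{OUT}[n] - \mathit{KILL}[n]\bigr)
\end{align*}
```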

Once the dataflow equations have been defined for a particular problem, and

the rules established for creating the GEN and KILL sets, the equations can then

be solved for a specific program or procedure and its representative flowgraph. To

solve the equations, the iterative algorithm can be used. The iterative algorithm has

the advantage that it will work for any flowgraph.

The iterative algorithm repeatedly computes the IN and OUT sets for all nodes

until all sets have stabilized and ceased to change. Recomputation of a node is

necessary whenever an outside set that it depends on changes. For forward-flow

problems, a node must be recomputed if the OUT set of a predecessor node changes.

For backward-flow problems, a node must be recomputed if the IN set of a successor

node changes. Typically, an evaluation strategy will determine the actual order in

which nodes are recomputed.
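
The iterative algorithm for a forward-flow-or problem can be sketched as a worklist loop. This is a minimal intraprocedural illustration, assuming the conventional form OUT = GEN ∪ (IN − KILL); it does not include the entry and body sets of our interprocedural method. The function name, the first-in first-out evaluation strategy, and the four-node example flowgraph are hypothetical.

```python
from collections import deque


def solve_forward_or(nodes, edges, gen, kill):
    """Iterate IN/OUT for a forward-flow-or problem until the sets stabilize."""
    preds = {n: [] for n in nodes}
    succs = {n: [] for n in nodes}
    for src, dst in edges:
        succs[src].append(dst)
        preds[dst].append(src)

    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    worklist = deque(nodes)          # evaluation strategy: first in, first out
    while worklist:
        n = worklist.popleft()
        # Confluence: union of the OUT sets of all predecessors.
        in_sets[n] = set().union(*(out_sets[p] for p in preds[n]))
        new_out = gen[n] | (in_sets[n] - kill[n])
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            # A changed OUT set forces recomputation of every successor.
            for s in succs[n]:
                if s not in worklist:
                    worklist.append(s)
    return in_sets, out_sets


# Reaching definitions for: node 1 defines x (d1); node 2 is a loop header;
# node 3 redefines x (d2) and branches back to node 2; node 4 is the loop exit.
nodes = [1, 2, 3, 4]
edges = [(1, 2), (2, 3), (3, 2), (2, 4)]
gen = {1: {"d1"}, 2: set(), 3: {"d2"}, 4: set()}
kill = {1: {"d2"}, 2: set(), 3: {"d1"}, 4: set()}

in_sets, out_sets = solve_forward_or(nodes, edges, gen, kill)
print(sorted(in_sets[4]))    # ['d1', 'd2']
print(sorted(out_sets[3]))   # ['d2']
```

Node 2 is recomputed after the OUT set of node 3 changes, which is how definition d2 reaches the loop exit.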

The flowgraph required by our interprocedural analysis method is conventional,

with special nodes and edges as follows. For each procedure in the program, assign

an entry node and an exit node. These nodes have no associated blocks of program

code.

The entry node has a single out-edge and as many in-edges as there are calls

to that procedure in the program. The exit node has as many in-edges as there are








nodes for that procedure whose blocks terminate with a return action. The exit node

has as many out-edges as there are calls to that procedure in the program. For every

in-edge of the entry node, there is a corresponding out-edge of the exit node.

For the purpose of constructing the flowgraph, calls must be classified as either

known or unknown. A known call is where the flowgraph for the called procedure

will be a part of the total flowgraph being constructed. An unknown call is where

the flowgraph of the called procedure will not be a part of the total flowgraph being

constructed. Unknown calls are common and will occur for two reasons. First, the

called procedure may be a compiler-library procedure for which source code is not

available. Second, the called procedure may be a separately compiled user procedure

for which the source code is not available.

For any unknown call made within the program, if summary information of its

interprocedural effects is not available, then conservative assumptions about its effects

will have to be made. The actual summary information needed, and the assumptions

made in its absence, will depend on the particular dataflow problem. The summary

information, if present, would be used when constructing the GEN and KILL sets

for any node whose block contains an unknown call.

For any known call made within the program, there will be two nodes in the

flowgraph for that call. One node is the call node. The call node represents a basic

block that ends with the known call. The other node is the return node. The return

node has an empty associated block.

The call node will have two out-edges. One edge will be directed to the entry

node of the called procedure. The other out-edge will be directed to the return node

for that call. The return node will have two in-edges. One edge is the directed edge

from the call node. The other in-edge is directed from the called procedure's exit

node.








In all, each known call results in two nodes and three distinct edges. One edge

connects the call node to its return node. A second edge connects the call node to

the called procedure's entry node. A third edge connects the called procedure's exit

node to the return node.
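As a small sketch of the construction just described, the following adds the two nodes' three edges for each known call. The helper name `add_known_call` and the node numbers are hypothetical and chosen only for illustration.

```python
def add_known_call(edges, call_node, return_node, entry_node, exit_node):
    """Add the three edges that one known call contributes to the flowgraph."""
    edges.add((call_node, return_node))   # call node -> its return node
    edges.add((call_node, entry_node))    # call node -> called procedure's entry node
    edges.add((exit_node, return_node))   # called procedure's exit node -> return node


edges = set()
# Two calls of the same procedure, whose entry node is 8 and exit node is 10.
add_known_call(edges, call_node=3, return_node=4, entry_node=8, exit_node=10)
add_known_call(edges, call_node=5, return_node=6, entry_node=8, exit_node=10)

entry_in_edges = [e for e in edges if e[1] == 8]
exit_out_edges = [e for e in edges if e[0] == 10]
print(len(edges), len(entry_in_edges), len(exit_out_edges))  # 6 2 2
```

The counts agree with the earlier statement that the entry node has one in-edge, and the exit node one out-edge, per call of the procedure.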

In constructing the flowgraph, a special problem arises if the programming lan-

guage allows procedure-valued variables, such as the function pointers of C that when

dereferenced result in a call of the function that is pointed at. The problem is to

identify what are the possible procedure values when the procedure-valued variable

invokes a call. Assuming this information is available from a separate analysis, the

flowgraph can be constructed accordingly. For example, if the procedure-valued vari-

able can have three different values when the call in question is invoked and each

value is a procedure whose flowgraph will be part of the total flowgraph, then three

known calls would be constructed in parallel with a common predecessor node for

the three call nodes and a common successor node for the three return nodes.

A procedure-valued variable is in essence a pointer. Note that the problem of

determining what a pointer is or may be pointing at when that pointer is dereferenced,

can itself be formulated as a dataflow problem, and in particular as a forward-flow-or

dataflow problem. If necessary, an initial version of the flowgraph could be con-

structed that treats all calls invoked by procedure-valued variables as unknown calls,

followed by a solving of the dataflow problem for determining possible pointer values

whenever a pointer is dereferenced, followed by amendments to the flowgraph using

the pointer-value information.

Dataflow analysis makes a simplifying, conservative assumption about the cor-

respondence between paths in the flowgraph and possible execution paths in the pro-

gram. Let a path be a sequence of flowgraph nodes such that in the sequence node

n follows node m only if n is a successor of m in the flowgraph. For intraprocedural








analysis, the assumption made is that any path in the flowgraph is a possible execu-

tion path. That this assumption may not be true for a particular program should be

obvious. However, the problem of determining the possible execution paths for an

arbitrary program is known to be undecidable. The simplifying assumption that we

use for interprocedural analysis is the same as that used for intraprocedural analy-

sis, but with the added proviso that for any path that is a possible execution path,

any subsequence of return nodes must inversely match, if present, the immediately

preceding subsequence of call nodes. A return node matches a call node if and only

if the return node is the call node's successor in the flowgraph.
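One way to read the matching proviso is as a stack discipline, sketched below. The function `is_valid_path`, the `return_of` mapping, and the node numbers are assumptions made for this illustration; a return node with no pending call node on the path is accepted, reflecting the "if present" qualification.

```python
def is_valid_path(path, return_of):
    """Check that return nodes inversely match the preceding call nodes.

    return_of maps each call node to its return node (its flowgraph successor).
    """
    return_nodes = set(return_of.values())
    pending = []                      # call nodes not yet matched by a return node
    for node in path:
        if node in return_of:
            pending.append(node)
        elif node in return_nodes:
            # With no pending call node there is nothing to match against.
            if pending and return_of[pending.pop()] != node:
                return False
    return True


return_of = {3: 4, 5: 6}              # hypothetical call node -> return node
print(is_valid_path([2, 3, 8, 9, 10, 4, 7], return_of))  # True
print(is_valid_path([2, 3, 8, 9, 10, 6, 7], return_of))  # False
```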

2.2 Interprocedural Forward-Flow-Or Analysis

This section begins with our basic approach to solving the calling-context prob-

lem. The dataflow equations for forward-flow-or analysis are then given and their

correctness is shown. As a part of our interprocedural analysis method, the tech-

nique of element recoding is presented as a way to deal with the aliases that result

from call-by-reference formal parameters. For some dataflow problems, implicit defi-

nitions due to calls require explicit treatment, and this is discussed last.

If certain problems, such as reaching definitions, are to be solved for a program

by flow-sensitive interprocedural analysis, then the calling context of each procedure

call must be preserved. In general, preserving calling context means that the dataflow

effects of an individual call should include those effects that survive the call and were

introduced into the called procedure by the call itself, but not those effects introduced

into the called procedure by all the other calls to it that may exist elsewhere in the

program. We refer to the need to preserve calling context as the calling-context

problem.

Our solution to the calling-context problem, and the essential difference between

our dataflow equations and conventional dataflow equations, is to divide every

IN set and every OUT set into two sets called an entry set and a body set. The reason








for having two sets is that the calling-context effects that enter a procedure from the

different calls can be collected and isolated in the separate entry set. This entry set

can then have effects in it killed by statements in the body of the procedure, but no

additions are made to this entry set by body statements. Instead, any additions of

effects due to body statements are made to the separate body set. This body set

will also have effects killed in the normal manner, as for the entry set. Because the

body set is kept free of calling-context effects, it is empty at the entry node. By

contrast, the entry set is at its largest at the entry node and will either stay the same

size as it progresses through the procedure's body nodes, or become smaller because

of kills. By intersecting the calling context at a call node with the entry set at the

exit node of the called procedure, the result is that subset of the calling context that

has reached the exit node and therefore will reach the return node for that call. By

"reach" we mean that there exists a path in the flowgraph along which the element

is not killed or blocked.

2.2.1 The Dataflow Equations

The dataflow equations that define the entry and body sets at every node are

now given. The equations are divided into three groups. The first group computes

the sets for entry nodes. The second group computes the sets for return nodes. The

third group computes the sets for all other nodes. In the equations, B denotes a

body set and E denotes an entry set. Two conditions, C1 and C2, appear in the

equations. C1 means that x will cross the interprocedural boundary from call node p

into the called procedure. C2 means that x can cross the interprocedural boundary

from exit node q into return node n. $\overline{C_i}$ means not $C_i$. For each node n, pred(n)

means the set of predecessors of n. The RECODE set used in Group I is explained

in Section 2.2.2. The GEN set used in Group I, and the GEN and KILL sets used

in Group II, are explained in Section 2.2.3.







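For any node n.

\[ IN[n] = E_{in}[n] \cup B_{in}[n] \]

\[ OUT[n] = E_{out}[n] \cup B_{out}[n] \]

Group I: n is an entry node.

\[ B_{in}[n] = \emptyset \]

\[ E_{in}[n] = \bigcup_{p \in pred(n)} \{x \mid x \in OUT[p] \land C_1\} \]

\[ B_{out}[n] = GEN[n] \]

\[ E_{out}[n] = E_{in}[n] \cup RECODE[n] \]

Group II: n is a return node, p is the associated call node and q is the exit node of
the called procedure.

\[ B_{in}[n] = \{x \mid (x \in B_{out}[p] \land (\overline{C_1} \lor (C_1 \land C_2 \land x \in E_{out}[q]))) \lor (x \in B_{out}[q] \land C_2)\} \]

\[ E_{in}[n] = \{x \in E_{out}[p] \mid \overline{C_1} \lor (C_1 \land C_2 \land x \in E_{out}[q])\} \]

\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]

\[ E_{out}[n] = E_{in}[n] - KILL[n] \]

Group III: n is not an entry or return node.

\[ B_{in}[n] = \bigcup_{p \in pred(n)} B_{out}[p] \]

\[ E_{in}[n] = \bigcup_{p \in pred(n)} E_{out}[p] \]

\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]

\[ E_{out}[n] = E_{in}[n] - KILL[n] \]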

The equations assume that the GEN and KILL sets for each call node will

include only those effects for that call that occur prior to the entry of the called

procedure. This requirement is necessary because the OUT set of the call node is

used by the entry-node equation that constructs the entry set of the called procedure.

Referring to conditions C1 and C2, the rules for deciding whether an effect

crosses a particular interprocedural boundary will depend on two primary factors,

namely the dataflow problem and the programming language. For example, for the

reaching-definitions problem and a language such as FORTRAN, any definition of a

global variable, and any definition of a variable that is used as an actual parameter

whose corresponding formal parameter is call-by-reference, will cross. As a rule, an

effect that crosses into a procedure because it might be killed, will also cross back to

the return node if it reaches the exit node of the called procedure.

Table 2.1 shows the result of solving the equations for the flowgraph of Fig-

ure 2.1. By "solving" we mean that, in effect, the iterative algorithm has been used

and all the sets are stable. The dataflow problem is reaching definitions, and variable

w is local while variables x, y, and z are global. Reaching definitions is the problem

of finding all definitions of a variable that reach a particular use of that variable,

for all variables and uses in the program. In Figure 2.1, nodes 1 and 8 are entry

nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and

6 are return nodes. Alongside each node is its basic block. Each defined variable is

superscripted with an identifier that is the set element used in Table 2.1 to represent

that definition.
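The solution in Table 2.1 can be reproduced with a short sketch of the three equation groups. Several details here are assumptions: definitions 1, 2, and 5 are taken to be the definitions of w and x in main and of x in f (inferred from the table, not stated in the text), `crosses` stands in for both C1 and C2, RECODE is empty because f has no parameters, and the round-robin evaluation order is an arbitrary choice.

```python
VAR_OF = {1: "w", 2: "x", 3: "y", 4: "z", 5: "x"}   # definition id -> variable
GLOBALS = {"x", "y", "z"}                            # w is local to main
PRED = {2: [1], 3: [2], 5: [2], 7: [4, 6], 9: [8], 10: [9]}
CALLERS = {1: [], 8: [3, 5]}            # entry node -> predecessor call nodes
RETURNS = {4: (3, 10), 6: (5, 10)}      # return node -> (call node p, exit node q)
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}
KILL = {2: {5}, 9: {2}}                 # the two definitions of x kill each other


def crosses(d):
    """Stands in for both C1 and C2: definitions of globals cross."""
    return VAR_OF[d] in GLOBALS


def solve():
    nodes = range(1, 11)
    E_in, E_out, B_in, B_out = ({n: set() for n in nodes} for _ in range(4))
    changed = True
    while changed:
        changed = False
        for n in nodes:
            gen, kill = GEN.get(n, set()), KILL.get(n, set())
            if n in CALLERS:                       # Group I: entry node
                b_in = set()
                e_in = {x for p in CALLERS[n]
                        for x in E_out[p] | B_out[p] if crosses(x)}
                b_out = set(gen)
                e_out = set(e_in)                  # RECODE[n] is empty here
            else:
                if n in RETURNS:                   # Group II: return node
                    p, q = RETURNS[n]

                    def survives(x):
                        return not crosses(x) or x in E_out[q]

                    b_in = ({x for x in B_out[p] if survives(x)}
                            | {x for x in B_out[q] if crosses(x)})
                    e_in = {x for x in E_out[p] if survives(x)}
                else:                              # Group III: all other nodes
                    b_in = set().union(*(B_out[p] for p in PRED[n]))
                    e_in = set().union(*(E_out[p] for p in PRED[n]))
                b_out = (b_in - kill) | gen
                e_out = e_in - kill
            new = (e_in, e_out, b_in, b_out)
            if new != (E_in[n], E_out[n], B_in[n], B_out[n]):
                E_in[n], E_out[n], B_in[n], B_out[n] = new
                changed = True
    return E_in, E_out, B_in, B_out


E_in, E_out, B_in, B_out = solve()
for n in range(1, 11):
    print(n, sorted(E_in[n]), sorted(E_out[n]), sorted(B_in[n]), sorted(B_out[n]))
```

Note how definition 2 of x reaches both calls but is killed inside f, so it is absent from the body sets at return nodes 4 and 6, while definition 5 from f's body set crosses back to both.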

The correctness of the equations can be seen from the following observations.

For a procedure, the entry-node entry set is constructed as the union of all calling-

context effects that can enter the procedure from its calls. Within the procedure

body, effects in the entry set can be killed, but not added to. For effects in the entry









procedure main              procedure f()
begin                       begin
   w = 5                       x = 10
   x = 10                   end
   if (w > x)
      z = 10
      call f()
   else
      y = 5
      call f()
end

[Flowgraph drawing not reproduced. Recoverable call-node blocks: "z^4 = 10; call f()" and "y^3 = 5; call f()".]

Figure 2.1. A reaching-definitions example.









Table 2.1. Solution of forward-flow-or equations for Figure 2.1.

Node   E_in        E_out       B_in         B_out
1      ∅           ∅           ∅            ∅
2      ∅           ∅           ∅            {1,2}
3      ∅           ∅           {1,2}        {1,2,4}
4      ∅           ∅           {1,4,5}      {1,4,5}
5      ∅           ∅           {1,2}        {1,2,3}
6      ∅           ∅           {1,3,5}      {1,3,5}
7      ∅           ∅           {1,3,4,5}    {1,3,4,5}
8      {2,3,4}     {2,3,4}     ∅            ∅
9      {2,3,4}     {3,4}       ∅            {5}
10     {3,4}       {3,4}       {5}          {5}


set that reach a call at a call node, those effects that survive the call are recovered
in the entry set constructed by the $E_{in}[n]$ equation for the successor return node n.
To see that this is true, observe the following. If an entry-set effect that reaches
the call cannot enter the called procedure, then it cannot be killed within the called
procedure, so the effect should be added to the return-node entry set without further
conditions, and this is done by the selection criterion $(x \in E_{out}[p] \land \overline{C_1})$ in the $E_{in}[n]$
equation for the return node. If, on the other hand, an entry-set effect reaches the
call and does enter the called procedure, and therefore may be killed by it, then this
effect should be added to the return-node entry set only if it reached the entry set of
the called procedure's exit node and the effect can cross back into the caller. This is
done by the selection criterion $(x \in E_{out}[p] \land C_1 \land C_2 \land x \in E_{out}[q])$ in the $E_{in}[n]$
equation for the return node.
From the equations for the entry set, we see that for any procedure z, the
entry set at z's exit node will, as the equations are solved, eventually contain all
calling-context effects that entered z and reached its exit node. This characteristic
of the exit-node entry set is the requirement placed upon it when it is used in the








$E_{in}[n]$ equation for the return node, so this requirement is satisfied and the entry-set

equations are correct.

For any procedure, the $B_{in}$ set is always empty at the entry node, so the B set

is free of calling-context effects. Within the procedure body, GEN and KILL sets

are used to update the body set as it propagates along the various nodes. For effects

in the body set that reach a call at a call node, those effects that survive the call are

recovered in the body set constructed by the Bin[n] equation for the successor return

node n. If a body-set effect that reaches the call cannot enter the called procedure,

then it cannot be killed within the called procedure, so it should be added to the

return-node body set without further conditions, and this is done by the selection

criterion $(x \in B_{out}[p] \land \overline{C_1})$ in the $B_{in}[n]$ equation for the return node. If, on the

other hand, a body-set effect reaches the call and will enter the called procedure,

and therefore may be killed by it, then this effect should be added to the return-

node body set only if it reached the entry set of the called procedure's exit node

and the effect can cross back into the caller. This is done by the selection criterion

$(x \in B_{out}[p] \land C_1 \land C_2 \land x \in E_{out}[q])$ in the $B_{in}[n]$ equation for the return node. In

addition, all crossable effects that result from the call, and that are independent of

calling context, should also be added to the return-node body set, and this is done by

the selection criterion $(x \in B_{out}[q] \land C_2)$ in the $B_{in}[n]$ equation for the return node.

From the equations for the body set, we see that for any procedure z, the body

set at z's exit node is free of calling-context effects and will, as the equations are

solved, eventually contain all body effects that reached the exit node, including those

body effects resulting from calls made within z. This characteristic of the exit-node

body set is the requirement placed upon it when it is used in the Bin[n] equation

for the return node, so this requirement is satisfied. The other requirement of this

return-node equation is that the exit-node entry set contains all calling-context effects








for the procedure that reach the exit node. This requirement has already been shown

to be satisfied, so we conclude that the body-set equations are correct.

2.2.2 Element Recoding for Aliases

The RECODE set for the entry node has its elements added to the $E_{in}$ set for

that node. The idea of the RECODE set is that certain elements in the OUT set of a

predecessor call node, irrespective of their ability to cross the interprocedural bound-

ary when parameters are ignored, should nevertheless be carried over into the entry

set of the called procedure as calling-context effects because of an alias relationship

established by the call, between an actual parameter and a formal call-by-reference

parameter. Any element that enters a procedure because of such an alias relationship

between parameters should be recoded to reflect this alias relationship.

A recoded element represents both the base element, which is the element as

it would be if there were no alias relationship, and the non-empty alias relationship.

Element recoding has two purposes. First, it allows the recoded element within

the called procedure to be killed correctly through its alias relationship. Second, it

allows the recoded element within the called procedure to be correctly associated

with specific references to those aliases that are in the alias relationship.

Element recoding never involves a change of the base element, but only a change

of the associated alias relationship, which would be the set of formal parameters to

which the base element is, in effect, aliased. Because of element recoding, in effect a

new element is generated, hence the separate RECODE set.

Figure 2.2 presents an algorithm for generating the entry-node input sets $E_{in}$

and RECODE, for a forward-flow-or dataflow problem, for the assumed language

model in which the visibility of each formal parameter is limited to the single proce-

dure that declares it. For each element in the OUT[c] set, the algorithm generates

at most one element for inclusion in the entry-node input sets. The algorithm is








unambiguous, except for line 10. The "can be affected by" test at line 10 is a gener-

alization. The details of this test will depend on the specific dataflow problem being

solved. For example, if the dataflow problem is reaching definitions, then each base

element w represents a specific definition of some variable z. If the actual parame-

ter p being tested by the algorithm is the variable z, and the corresponding formal

parameter is call-by-reference, then the definition that w represents can be used or

killed through that formal parameter, so w can be affected by that actual parameter

z, and the "affected by" test is therefore satisfied. The $p \in OA$ test at line 10 covers

the situation where an actual parameter p that is aliased to the formal f is itself a

formal parameter that is effectively aliased to w. In this case f is established as a

new effective alias for w, by transitivity of the alias relationship.

Referring to the algorithm, there is no carry over of the old alias relationship

into the new alias relationship. The old alias relationship is represented by the OA

set, and the new alias relationship is represented by the NA set. That this no-

carry-over of the old alias relationship is correct, follows from the assumed language

model. The aliases of element recoding are formal parameters, and the model states

that each formal parameter is visible in only one procedure. This means there is no

need to carry the old alias relationship into a different procedure, because the aliases

cannot be referenced outside the single procedure in which the old alias relationship

is active. Note that recursive calls are no exception to this no-carry-over rule, because

a recursive call will cancel any alias relationship established for a base element by

any prior call of the procedure.
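The element-recoding algorithm of Figure 2.2 can be sketched as follows for the reaching-definitions case. The representation of an element as a pair (base element, frozen set of aliases), the function and parameter names, and the two-call example are assumptions made for this illustration; the "can be affected by" test is passed in as a function because it depends on the dataflow problem.

```python
def recode_entry_sets(call_nodes, OUT, bindings, affected_by, can_cross):
    """Build E_in[e] and RECODE[e] for an entry node e.

    call_nodes  : predecessor call nodes c of e
    OUT[c]      : set of elements (w, OA), with OA a frozenset of formal names
    bindings[c] : (actual p, call-by-reference formal f) pairs at call node c
    affected_by : problem-specific test "w can be affected by p"
    can_cross   : whether base element w crosses the interprocedural boundary
    """
    E_in, RECODE = set(), set()
    for c in call_nodes:
        for w, OA in OUT[c]:
            NA = set()                            # new alias relationship
            for p, f in bindings[c]:
                if affected_by(w, p) or p in OA:  # the test at line 10
                    NA.add(f)
            if NA:
                RECODE.add((w, frozenset(NA)))    # old aliases are not carried over
            elif can_cross(w):
                E_in.add((w, frozenset()))
    return E_in, RECODE


# Hypothetical example: definitions d1 and d2 of global g reach calls c1 and c2;
# only c1 passes g to the call-by-reference formal f.
var_of = {"d1": "g", "d2": "g"}
OUT = {"c1": {("d1", frozenset())}, "c2": {("d2", frozenset())}}
bindings = {"c1": [("g", "f")], "c2": []}
E_in, RECODE = recode_entry_sets(
    ["c1", "c2"], OUT, bindings,
    affected_by=lambda w, p: var_of[w] == p,
    can_cross=lambda w: True,
)
print(E_in, RECODE)
```

In this example d1 enters the procedure recoded as (d1, {f}), while d2 enters unaliased, so a later definition of f can kill the former without touching the latter.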

In general, the fact that crossing elements are recoded when $NA \neq \emptyset$, and

unrecoded when $NA = \emptyset$ and $OA \neq \emptyset$, places an added burden on the return-node

equations to recognize an element that should be recovered from the exit-node entry

set, necessitating, in effect, additional rules to cover this possibility. After an element

is recovered, it would also be necessary to restore the alias relationship, if any, that


















e is an entry node.
This algorithm constructs the E_in[e] and RECODE[e] sets.
begin
1     E_in[e] ← ∅
2     RECODE[e] ← ∅
3     for each predecessor call node c of entry node e
4        for each element x ∈ OUT[c]
5           let w be the base element of x
6           let OA be the set of aliases, if any, associated with w, forming x
7           let NA be the set of new aliases
8           NA ← ∅
9           for each actual parameter p at call node c that is aliased
               to a call-by-reference formal parameter f
10             if (w can be affected by p) ∨ (p ∈ OA)
11                NA ← NA ∪ {f}
               fi
            end for
12          if NA ≠ ∅
13             RECODE[e] ← RECODE[e] ∪ {(w, NA)}
14          else if w can cross the interprocedural boundary
15             E_in[e] ← E_in[e] ∪ {w}
            fi
         end for
      end for
end


Figure 2.2. Element-recoding algorithm for forward-flow-or dataflow problems.








it had prior to the call. This recognition and restoration problem is perhaps most

easily solved by associating with each call node two additional sets, one for body-

set elements and another for entry-set elements, where each set consists of ordered

pairs. These sets would be determined whenever the entry-node entry set of the

called procedure is computed.

The first element of each ordered pair is a crossing element x as it exists in the

$B_{out}$ or $E_{out}$ set at the call node, and the second element is element y which is that

element effectively generated from element x by the element-recoding algorithm of

Figure 2.2 at either line 13 or line 15. If all crossing elements for the call are included

in these additional sets, then the return-node equations can use these sets instead

of the $B_{out}[p]$ and $E_{out}[p]$ sets to recognize elements to be recovered from the exit-

node entry set. Recognition and restoration would be done by trying to match the

exit-node entry-set element against the second element of an ordered pair from the

appropriate additional set at the call node, and then, if there is a match, restoring

the original element by using the first element of the matched pair. For example, if

x is a crossing element in the $B_{out}$ set of a call node, and y is the generated element,

then (x, y) would be an ordered pair in the additional set for body-set elements.

When the $B_{in}$ set for the return node is computed, if y is in the exit-node entry set

then it will match the ordered pair (x, y), and element x will be added to the $B_{in}$

set.
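The recognition-and-restoration step can be sketched as a lookup over the ordered pairs kept at the call node. The function name `recover_from_exit` and the element values below are hypothetical; the check of condition C2 is omitted to keep the sketch short.

```python
def recover_from_exit(pairs, exit_entry_set):
    """Return the original elements x whose generated element y reached the exit.

    pairs          : set of ordered pairs (x, y) stored at the call node
    exit_entry_set : entry set at the called procedure's exit node
    """
    return {x for (x, y) in pairs if y in exit_entry_set}


# (base element, aliases) pairs; d1 was recoded on entry, d7 was unrecoded.
body_pairs = {
    (("d1", frozenset()), ("d1", frozenset({"f"}))),
    (("d7", frozenset({"a"})), ("d7", frozenset())),
}
exit_entry_set = {("d1", frozenset({"f"}))}   # d7 was killed inside the callee
print(recover_from_exit(body_pairs, exit_entry_set))
```

Only d1 is restored, and it is restored in the form it had at the call node, that is, without the alias to f.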

As an example of why element recoding is necessary, consider the following.

Suppose there are two different calls to the same procedure, and different definitions

of global variable g reach each call. At one of the calls, g is also used as an actual

parameter and the corresponding formal parameter is call-by-reference. The problem

now is what to kill from the entry set whenever that formal parameter is defined in

the called procedure. If the individual elements representing the different definitions

of g do not somehow identify how they are related to this formal parameter, then








the only choice is to kill all of them or none of them, and neither of these choices is

correct in this case, as the only definitions of g that should be killed are those that

entered the procedure from the call where g is aliased to the call-by-reference formal

parameter.

2.2.3 Implicit Definitions Due to Calls

A call with parameters typically has implicit definitions associated with it.

For example, if a formal parameter is call-by-reference, then each actual parameter

aliased to that formal parameter is implicitly defined at each definition of the formal

parameter. If a formal parameter is call-by-value-result, then that formal parameter is

implicitly defined each time the called procedure is entered, and the actual parameter

at the call is implicitly defined upon return from the call. From the standpoint

of solving a dataflow problem such as reaching definitions, all implicit definitions

due to calls should be determined, and elements generated at the appropriate nodes

to represent these implicit definitions. The remainder of this section discusses the

generation of implicit definitions and the determination of what reaches them for the

specific problem of reaching definitions.

We assume that a formal parameter may be either call-by-reference, call-by-

value, call-by-value-result, or call-by-result. For the reaching-definitions problem,

before the iterative algorithm can be used to solve the dataflow equations, all GEN

sets must be prepared.

For each point p in the program where a call-by-reference formal parameter is

defined, add to the GEN set of the node for point p an implicit definition of each

actual-parameter variable that is aliased to that formal parameter in a call. Each

added implicit-definition element must be a recoded element that includes the alias

relationship for that actual parameter. For example, suppose a procedure named A

has two call-by-reference formal parameters, x and y, and inside A at point p there is

a definition of x, and there are three calls of procedure A in the program. The first call








aliases variable v to x. The second call aliases variable v to both x and y. The third

call aliases variable w to x. Thus, at point p there would be three implicit-definition

elements generated, namely (v, {x}), (v, {x, y}), and (w, {x}). As an example of

what this element notation means, for the (v, {x}) element the v represents the

implicit definition of variable v that occurs at point p, and the x represents the

formal parameter that variable v is aliased to. As a special requirement for these

implicit-definition elements, for the $B_{out}$ set at the exit node of procedure A, the

(v, {x}) element, if it reaches this set, can only cross from this set to the return node
of the first call. Similarly, the (v, {x, y}) element can only cross to the return node

of the second call, and the (w, {x}) element can only cross to the return node of the

third call.

The crossing restrictions in the preceding example are due to a rule, now given.

Let A denote a procedure containing a definition at point p of a call-by-reference

formal parameter x, let (t, {x}) be the implicit-definition element generated at point p

for some specific call c of A that aliases actual-parameter variable t to x, and let m be

the exit node of A. If $(t, \{x\}) \in B_{out}[m]$, then (t, {x}) can only cross from $B_{out}[m]$ to

the return node of call c, and as (t, {x}) crosses, it must be recoded as t by having its

alias relationship nullified. This crossing-restriction rule is necessary because element

(t, {x}) is both a body effect, because it is generated inside the called procedure, and
a calling-context effect, because it is the result of a specific call of that procedure.

This dual quality requires the special treatment that the rule provides. Nullifying

the alias relationship as the element crosses to the return node is both good practice

in general for this element, and a necessity if call c is a recursive call of A. As an

example, assume that call c is a recursive call of A, and that variable t is a global

variable. If (t, {x}) reaches the $B_{out}[m]$ set, the rule states that this element can only

cross to the return node of call c, and that it be recoded as t. Assuming that this

t element then reaches from this return node to the $B_{out}[m]$ set, t can then cross








to any return node that has an in-edge from m. Although both the (t, {x}) and t

elements refer to the same implicit definition of variable t occurring at point p, the

two elements are not the same, and the crossing-restriction rule applies only to an

element that is identical to the element generated at point p, which is (t, {x}).
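The crossing-restriction rule can be sketched as a filter applied to the exit-node body set when computing what crosses to one particular return node. The function name, the `origin_call` mapping from each implicit-definition element to the call that produced it, and the call labels are assumptions introduced for this illustration.

```python
def cross_to_return(exit_body_out, return_call, origin_call, can_cross):
    """Select the elements of B_out[m] that may cross to the return node of one call.

    origin_call maps each implicit-definition element (t, aliases), as generated
    at point p, to the specific call c it was generated for.
    """
    crossed = set()
    for elem in exit_body_out:
        base, _aliases = elem
        if elem in origin_call:
            if origin_call[elem] == return_call:
                crossed.add((base, frozenset()))   # recode as t: aliases nullified
        elif can_cross(elem):
            crossed.add(elem)                      # ordinary body effect
    return crossed


# The three-call example from the text, plus one ordinary element u.
origin_call = {
    ("v", frozenset({"x"})): "call1",
    ("v", frozenset({"x", "y"})): "call2",
    ("w", frozenset({"x"})): "call3",
}
exit_body_out = set(origin_call) | {("u", frozenset())}
print(cross_to_return(exit_body_out, "call2", origin_call, lambda e: True))
```

Because the restriction applies only to elements identical to those generated at point p, the unaliased element produced by the crossing is afterwards treated like any other body effect.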

The implicit definitions of actual-parameter variables form the most important

category of implicit definitions that are due to call-by-reference formal parameters.

However, there is also a second, less-important category. At each explicit definition

of a variable t at point p inside A, such that variable t is also used in a call of A as an

actual parameter aliased to a call-by-reference formal parameter x, there is an

implicit definition of formal parameter x at point p. The implicit-definition element

generated at point p would be (x, {t}), meaning a definition of variable x at point

p, aliased to variable t. However, assuming a formal parameter cannot be defined or

used outside the procedure for which it is declared, it follows that there is no need

for a crossing-restriction rule for these elements, because they cannot cross to any

return node.

Normally, a definition of a variable kills all other definitions of that variable.

However, the implicit definitions due to call-by-reference formal parameters have no

associated kills. Instead, the following rule suffices. For each call-by-reference formal

parameter x declared for procedure A, if all calls of A alias the same actual-parameter

variable t to x, then each explicit definition inside A of either variable t or x will kill

all definitions of variable t and all definitions of variable x. Otherwise, if not all calls of A

alias the same actual-parameter variable t to x, then each explicit definition

inside A of either variable t or x will kill only the definitions of that variable and

those recoded elements that are aliased to that variable.
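The kill rule above can be sketched as a function that selects the elements killed by one explicit definition. This is an illustrative reading of the rule for a single call-by-reference formal parameter; the function name `kill_set`, its parameters, and the sample elements are hypothetical.

```python
def kill_set(defined_var, elements, var_of, formal, actuals_per_call):
    """Elements killed by an explicit definition of defined_var inside A.

    formal           : a call-by-reference formal parameter x of A
    actuals_per_call : the actual variable aliased to x at each call of A
    elements         : (base definition, frozenset of aliases) pairs
    """
    same_actual = len(set(actuals_per_call)) == 1
    if same_actual and defined_var in {formal, actuals_per_call[0]}:
        # Every call aliases the same t to x: t and x are killed together.
        targets = {formal, actuals_per_call[0]}
    else:
        # Otherwise only the defined variable and elements aliased to it.
        targets = {defined_var}
    return {(w, al) for (w, al) in elements
            if var_of[w] in targets or al & targets}


var_of = {"d1": "g", "d2": "g", "d3": "h"}
elements = {("d1", frozenset({"f"})), ("d2", frozenset()), ("d3", frozenset())}

# Calls alias different actuals (g and h) to f: a definition of f kills only
# the element recoded with alias f.
print(kill_set("f", elements, var_of, "f", ["g", "h"]))
# All calls alias g to f: a definition of f kills every definition of g.
print(kill_set("f", elements, var_of, "f", ["g", "g"]))
```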

The entry-node GEN set will be used to hold all implicit definitions of formal

parameters that occur upon procedure entry. Thus, for each entry node, for each








formal parameter of the represented procedure that is call-by-value or call-by-value-

result, add to the GEN set of that entry node an element that represents an implicit

definition of that formal parameter occurring at that entry node.

The return-node GEN set will be used to hold all implicit definitions of actual

parameters that may occur upon return from the called procedure. Thus, for each

return node, for each actual parameter of the associated call whose corresponding

formal parameter is call-by-result or call-by-value-result, add to the GEN set of

that return node an element that represents an implicit definition of that actual

parameter occurring at that return node. The return-node KILL set should represent

all elements that will be killed by these implicit definitions of actual parameters.
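The two GEN-set rules above depend only on the parameter-passing mode, which the following sketch makes explicit. The function names, the mode strings, and the tuple used to represent an implicit definition are assumptions made for illustration.

```python
def entry_node_gen(formals):
    """Implicit definitions of formals on procedure entry.

    formals: (name, mode) pairs for the formal parameters of the procedure.
    """
    return {("implicit-def", name) for name, mode in formals
            if mode in ("value", "value-result")}


def return_node_gen(bindings):
    """Implicit definitions of actuals upon return from a call.

    bindings: (actual variable, mode of its corresponding formal) pairs.
    """
    return {("implicit-def", actual) for actual, mode in bindings
            if mode in ("result", "value-result")}


formals = [("a", "value"), ("b", "reference"), ("c", "value-result"), ("d", "result")]
bindings = [("p", "value"), ("q", "reference"), ("r", "value-result"), ("s", "result")]
print(sorted(entry_node_gen(formals)))
print(sorted(return_node_gen(bindings)))
```

Call-by-reference parameters appear in neither set because their implicit definitions are generated at the points where the formal parameter is defined, as described earlier.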

With the GEN sets ready, the iterative algorithm can proceed. Once the iterative algorithm has ended, a follow-on step is done in two parts.

a) Examine the $B_{out}$ set for each exit node. For each definition d in this set of a formal parameter p, where p is call-by-result or call-by-value-result, d reaches the implicit use of this formal parameter by those implicit definitions of actual parameters, found at the various return nodes, whose corresponding formal parameter is p. The element representing d can be added to the $B_{in}$ sets of those return nodes in a way that reflects the reach.

b) Examine the OUT set of each call node. For each definition d in this set of a variable that is used as an actual parameter in the call, where the corresponding formal parameter is call-by-value or call-by-value-result, d reaches the implicit use of the defined variable by the implicit definition of the corresponding formal parameter found at the entry node of the called procedure. The element representing d can be added to the $E_{in}$ set of that entry node in a way that reflects the reach.

2.3 Interprocedural Forward-Flow-And Analysis

This section gives the dataflow equations used by our interprocedural analysis

method for forward-flow-and problems. The difference between these equations and

the equations for forward-flow-or is explained.








For forward-flow-and problems, some changes are needed to the dataflow equa-
tions given in Section 2.2.1. Of course, the confluence operator must be changed from
union to intersection. However, it is still necessary to construct the entry-node entry
set as the union of all crossing effects from the predecessor-node sets, so that calling
context can be properly recovered at the return nodes. At the same time, the entry
set must always be constructed as the intersection of predecessor-node sets, if the
entry set is to be a part of the IN and OUT sets. These conflicting requirements for
the entry-node entry set can be resolved by maintaining two separate entry sets at
each node. The revised dataflow equations follow. The two conditions, C1 and C2,
are explained in Section 2.2.1.
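For any node n.

\[ IN[n] = E^{(2)}_{in}[n] \cup B_{in}[n] \]

\[ OUT[n] = E^{(2)}_{out}[n] \cup B_{out}[n] \]

Group I: n is an entry node.

\[ B_{in}[n] = \emptyset \]

\[ E^{(1)}_{in}[n] = \bigcup_{p \in pred(n)} \{x \mid x \in (E^{(1)}_{out}[p] \cup B_{out}[p]) \land C_1\} \]

\[ E^{(2)}_{in}[n] = \bigcap_{p \in pred(n)} \{x \mid x \in OUT[p] \land C_1\} \]

\[ B_{out}[n] = GEN[n] \]

\[ E^{(1)}_{out}[n] = E^{(1)}_{in}[n] \cup RECODE^{(1)}[n] \cup RECODE^{(2)}[n] \]

\[ E^{(2)}_{out}[n] = E^{(2)}_{in}[n] \cup RECODE^{(2)}[n] \]

Group II: n is a return node, p is the associated call node and q is the exit node of
the called procedure.

\[ B_{in}[n] = \{x \mid (x \in B_{out}[p] \land (\overline{C_1} \lor (C_1 \land C_2 \land x \in E^{(1)}_{out}[q]))) \lor (x \in B_{out}[q] \land C_2)\} \]

\[ E^{(i)}_{in}[n] = \{x \in E^{(i)}_{out}[p] \mid \overline{C_1} \lor (C_1 \land C_2 \land x \in E^{(1)}_{out}[q])\}; \quad i = 1, 2. \]

\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]

\[ E^{(i)}_{out}[n] = E^{(i)}_{in}[n] - KILL[n]; \quad i = 1, 2. \]

Group III: n is not an entry or return node.

\[ B_{in}[n] = \bigcap_{p \in pred(n)} B_{out}[p] \]

\[ E^{(i)}_{in}[n] = \bigcap_{p \in pred(n)} E^{(i)}_{out}[p]; \quad i = 1, 2. \]

\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]

\[ E^{(i)}_{out}[n] = E^{(i)}_{in}[n] - KILL[n]; \quad i = 1, 2. \]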

The entry set E(1) is the set used to recover calling context, and the entry set
E(2) is the set that is a component of the IN and OUT sets. The RECODE sets
appearing in the entry-node equations represent recoded elements as explained in
Section 2.2.2. The RECODE(1) set will just be the union of the recoded elements
generated from each predecessor call node c, using the algorithm of Figure 2.2 and
drawing from the $E^{(1)}_{out}[c]$ and $B_{out}[c]$ sets at line 4 instead of the OUT[c] set.
Similarly, the RECODE(2) set could just be the intersection of the recoded
elements from each predecessor call node c, drawing from the OUT[c] set at line 4.
However, doing this may cause the unnecessary loss of recoded elements when the
same underlying base element w is found in each OUT[c] set. To avoid such loss, an
improved rule states that if the same base element w is found in each OUT[c] set,
and there is one or more non-empty alias relationships for that w occurring at one or
more predecessor nodes c, then a single recoded element for that w that encodes all
of these alias relationships would be generated into the RECODE(2) set, otherwise
no recoded element for that w would be generated into the RECODE(2) set. For








example, suppose there are three predecessor call nodes c for a given entry node, and
the same base element w is found in each OUT[c] set, and at one c there is an empty alias
relationship, at the second c there is an alias relationship to formal parameter x, and
at the third c there is an alias relationship to formal parameter y. For this example,
the single recoded element would be (w, {x, y}), and this recoded element can be
killed directly through w, or indirectly through x or through y. Note that the
complete kill of this recoded element at any kill point, even though the kill may have
been made through an alias that was not established at each c, is nevertheless correct.
The intersection confluence operator associated with RECODE(2) implicitly requires
that for base element w to pass a kill point, it must be on every call path past that
kill point, which is not the case when w is killed from at least one call path, which
happens when that w is killed through an alias that was established by at least one
of the c. If the specific dataflow problem being solved allows the base element to be
used through one of its effective aliases, then a flag could be associated with each alias
in the recoded elements of RECODE(2), and this flag could indicate whether or not
the alias was established at each c. In the case of the example, the recoded element
with flags would be $(w, \{x_{not}, y_{not}\})$. Only a use of the base element through an
alias established at each c would be a use through an alias that occurs on every call
path, and this kind of use would be the all-paths use that is implicitly required by
the specific dataflow problem by virtue of it being forward-flow-and.
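The improved rule for RECODE(2), including the optional per-alias flag, can be sketched in Python. This is an illustration only: the function name `merge_recode2`, the tuple-and-dictionary representation of a recoded element, and the boolean flag are assumptions made here and are not taken from the algorithm of Figure 2.2. The caller is assumed to have already confirmed that the base element is in every OUT[c] set.

```python
def merge_recode2(base, aliases_per_call):
    """Merge the alias relationships of one base element over all predecessor call nodes.

    aliases_per_call holds one set of formal-parameter aliases per call node c
    (an empty set means the base element reaches c without being aliased).
    Returns None when no call node establishes an alias, otherwise
    (base, {alias: True if the alias was established at every c}).
    """
    alias_sets = [set(a) for a in aliases_per_call]
    all_aliases = set().union(*alias_sets)
    if not all_aliases:
        return None  # no non-empty alias relationship, so no element is generated
    on_every_path = set.intersection(*alias_sets)
    return base, {a: a in on_every_path for a in sorted(all_aliases)}


# The example from the text: three call nodes with alias sets {}, {x}, {y}.
element = merge_recode2("w", [set(), {"x"}, {"y"}])
print(element)
```

For the example in the text both flags come out false, since neither x nor y is established at every call node, which corresponds to the "not" subscripts on x and y.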

With the exception of the confluence operator and the two different entry sets,

the equations for forward-flow-and are the same as for forward-flow-or, and are like-

wise correct. Set E(2) fulfills the requirement for the IN and OUT sets by consistently

using the intersection confluence operator for its construction, just as B does. The

equations for the E(1) and E(2) sets only differ at the entry node, and there the only

difference is the confluence operator, and the way the RECODE sets are built. As

set intersection is the confluence operator for E(2), and set union for E(1), and the









Table 2.2. Solution of forward-flow-and equations for Figure 2.3.

Node   $E^{(1)}_{in}$   $E^{(1)}_{out}$   $E^{(2)}_{in}$   $E^{(2)}_{out}$   $B_{in}$    $B_{out}$
1      ∅                ∅                 ∅                ∅                 ∅           ∅
2      ∅                ∅                 ∅                ∅                 ∅           {1,2}
3      ∅                ∅                 ∅                ∅                 {1,2}       {1,2,4}
4      ∅                ∅                 ∅                ∅                 {1,4,5}     {1,4,5}
5      ∅                ∅                 ∅                ∅                 {1,2}       {1,2,3}
6      ∅                ∅                 ∅                ∅                 {1,3,5}     {1,3,5}
7      ∅                ∅                 ∅                ∅                 {1,5}       {1,5}
8      {2,3,4}          {2,3,4}           {2}              {2}               ∅           ∅
9      {2,3,4}          {3,4}             {2}              ∅                 ∅           {5}
10     {3,4}            {3,4}             ∅                ∅                 {5}         {5}


RECODE(2) set is added to both E(1) and E(2), it follows that E(2) will be a subset
of E(1) at every node. Thus, E(1) can be used to recover calling context for E(2).
Set E(1) also serves to recover calling context for both E(1) and B, because E(1) is
built at the entry node from these two sets, and the use of union as the confluence
operator guarantees that all calling-context effects will be collected.
Table 2.2 shows the result of solving the equations for the flowgraph of Fig-
ure 2.3. By "solving" we mean that, in effect, the iterative algorithm has been used
and all the sets are stable. The dataflow problem is available expressions, and vari-
able w is local while variables x, y, and z are global. Available expressions is the
problem of determining whether the use of an expression is always reached by some
prior use of that expression, for certain expressions in the program. In Figure 2.3,
nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call
nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block.
Each expression is superscripted with an identifier that is the set element used in
Table 2.2 to represent that expression.










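    procedure main
    begin
      y = w + 1
      z = x + 1
      if (e)
        a = z + 1
        call f()
      else
        a = y + 1
        call f()
    end

    procedure f()
    begin
      x = z + 2
    end

[Flowgraph drawing not recovered. Legible basic-block labels, with the superscript
element identifier in brackets: y = w + 1 [1]; z = x + 1 [2]; node 3: a = z + 1 [4],
call f(); node 5: a = y + 1 [3], call f().]
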

Figure 2.3. An available-expressions example.
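The solution in Table 2.2 can be reproduced with a small iterative solver. The following Python sketch hand-encodes the flowgraph of Figure 2.3 and applies the Group I, II, and III equations until all sets are stable. Several things here are assumptions of the sketch rather than statements from the text: the dictionary encoding of the flowgraph, the GEN and KILL sets derived by reading the figure, the reading of condition C1 as "the expression uses no variable local to the caller" and of C2 as always true for this example, and the treatment of an entry node with no predecessors as having empty entry sets. Both RECODE sets are empty because procedure f has no parameters.

```python
NODES = range(1, 11)
ENTRY = {1, 8}
RETURN = {4: (3, 10), 6: (5, 10)}          # return node -> (call node p, exit node q)
PRED = {1: [], 2: [1], 3: [2], 5: [2], 7: [4, 6], 8: [3, 5], 9: [8], 10: [9]}
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}  # expression identifiers 1..5
KILL = {2: {3, 4, 5}, 9: {2}}              # y and z assigned in node 2, x in node 9
SETS = ("B_in", "B_out", "E1_in", "E1_out", "E2_in", "E2_out")


def c1(x):
    return x != 1    # assumed C1: w + 1 uses local w, so it cannot cross into f


def c2(x):
    return True      # assumed C2: every element may cross from exit to return node


def meet(sets):
    sets = list(sets)
    return set.intersection(*sets) if sets else set()


def step(n, S):
    gen, kill = GEN.get(n, set()), KILL.get(n, set())
    r = {}
    if n in ENTRY:   # Group I, with empty RECODE sets
        r["B_in"] = set()
        r["E1_in"] = {x for p in PRED[n]
                      for x in S[p]["E1_out"] | S[p]["B_out"] if c1(x)}
        r["E2_in"] = meet({x for x in S[p]["E2_out"] | S[p]["B_out"] if c1(x)}
                          for p in PRED[n])
        r["B_out"] = set(gen)
        r["E1_out"], r["E2_out"] = set(r["E1_in"]), set(r["E2_in"])
        return r
    if n in RETURN:  # Group II: calling context is recovered through E1_out[q]
        p, q = RETURN[n]

        def back(x):
            return not c1(x) or (c2(x) and x in S[q]["E1_out"])

        r["B_in"] = ({x for x in S[p]["B_out"] if back(x)}
                     | {x for x in S[q]["B_out"] if c2(x)})
        r["E1_in"] = {x for x in S[p]["E1_out"] if back(x)}
        r["E2_in"] = {x for x in S[p]["E2_out"] if back(x)}
    else:            # Group III: intersection over predecessors
        for k in ("B", "E1", "E2"):
            r[k + "_in"] = meet(S[p][k + "_out"] for p in PRED[n])
    r["B_out"] = (r["B_in"] - kill) | gen
    r["E1_out"], r["E2_out"] = r["E1_in"] - kill, r["E2_in"] - kill
    return r


def solve():
    S = {n: {k: set() for k in SETS} for n in NODES}   # all sets start empty
    changed = True
    while changed:
        changed = False
        for n in NODES:
            new = step(n, S)
            if new != S[n]:
                S[n], changed = new, True
    return S


S = solve()
for n in NODES:
    print(n, [sorted(S[n][k]) for k in ("E1_in", "E1_out", "E2_in", "E2_out", "B_in", "B_out")])
```

Under these assumptions the printed rows agree with Table 2.2. In particular, expression 2 is in the E(2) entry set at node 8 because it is available at both call nodes, while expressions 3 and 4 appear only in E(1), where they serve to recover calling context at return nodes 6 and 4.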








2.4 Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. How-

ever, the same flowgraph is used for both forward-flow and backward-flow problems.

To convert the equations for forward-flow-or to backward-flow-or, or for forward-

flow-and to backward-flow-and, the transformation is mechanical and straightfor-

ward. The same equations are used, but various words and phrases are everywhere

changed to reflect the reverse flow. For example, "pred(n)" for predecessors becomes

"succ(n)" for successors, "out" subscripts become "in" subscripts and "in" subscripts

become "out" subscripts, IN becomes OUT and OUT becomes IN, "call node" be-

comes "return node" and "return node" becomes "call node", "entry node" becomes

"exit node" and "exit node" becomes "entry node". For backward flow, the nodes

requiring special equations are the exit node and call node, and not the entry node

and return node as for the forward-flow problems.

2.5 Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed lan-

guage model in which the visibility of each formal parameter is limited to the single

procedure that declares it, we consider the solution of the dataflow equations for only

one element at a time. Let n be the number of flowgraph nodes. Let the elementary

operation measured by the complexity be the computation of the dataflow equations

once at a single, average flowgraph node, for a single element. Only the presence or

absence of the single element within a particular body or entry set need be repre-

sented, and this requires no more than a single bit of storage for each set referenced

by the equations. Thus, computing the dataflow equations once at an average node,

for a single element, will consist of a small number of integer operations, assuming

that the average in and out-degree of the flowgraph nodes is bounded by a small

constant, which will always be the case for flowgraphs generated from real programs,








and also assuming that the length of recoded elements will be small. Referring to
the algorithm of Figure 2.2, the length of a recoded element is 1 + |NA|, and |NA|
is bounded from above by the number of call-by-reference formal parameters of the
given procedure. As a rule, this upper bound will be small.

We next consider the total number of node visits required to solve the dataflow

equations for a single element. Prior to solving the equations, all body and entry sets

are initialized to empty, at complexity O(n). The empty sets represent the absence of

the element. Note that each set has only two states: either the element is present, or

it is absent. Assuming a forward-flow problem, each time the equations are computed

for a node, if any of the out sets have changed from their previous state, then the

equations will be computed for all successor nodes. The forward-flow-or equations

have only two out sets per node, and the forward-flow-and equations have three. It

follows that repeated computation of the equations for a single node will cause the

successor nodes to be marked for computation at most two or three times, depending

on the equations being used. Given that the average number of successor nodes is

bounded by a small constant, it follows that the total number of node visits required

to solve the dataflow equations for a single element will be bounded from above by

$k_1 n$ where $k_1$ is a constant, giving a worst-case complexity of O(n) for solving the

dataflow equations for a single element.

The worst-case complexity of solving the dataflow equations for m total ele-

ments will therefore be O(mn). Let b be the number of base elements for the program

being analyzed, and let r be the number of recoded elements, giving m = b + r. As

an example, for the reaching-definitions dataflow problem the base elements will be

all the definitions in the program. We assume that for the kind of dataflow problems

our method is meant to solve, the number of base elements will be a linear function

of the program size, and therefore proportional to n. Let constant k2 be an upper

bound of b/n. We also assume the universe of real, useful programs, written by








programmers to solve practical problems. To determine an upper bound for r, let k

be the maximum number of formal parameters for a single procedure. That k is a

constant independent of program size should be obvious.

Given k and the algorithm of Figure 2.2, and allowing all possible combinations
of the formal parameters of any single procedure, the maximum number of recoded
elements for any single procedure and base element is
$k_3 = \sum_{i=1}^{k} \binom{k}{i} = 2^k - 1$.
Note that $k_3$ is a constant, albeit an enormous constant. The maximum number
of recoded elements for any single procedure will therefore be $k_3 b$. In the assumed
language model, each formal parameter is visible in only one procedure, and this
means each recoded element is confined to a single procedure when the dataflow
equations are solved. Therefore, the total number of node visits required to solve
the dataflow equations for all the recoded elements will be bounded from above by
$\sum_{i=1}^{j} k_1 s_i k_3 b$, where j is the number of procedures in the flowgraph, and $s_i$ is the
number of flowgraph nodes in the ith procedure. Because $b \le k_2 n$, this upper bound
can be rewritten as $\sum_{i=1}^{j} k_1 k_2 k_3 n s_i$. Ignoring constants and given that
$\sum_{i=1}^{j} s_i = n$ and $\sum_{i=1}^{j} n s_i = n^2$,
the worst-case complexity of our method for the assumed language model is $O(n^2)$,
and the elementary operation measured by the complexity is a small number of integer
operations assuming that the average recoded-element length is small.
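The counting argument above can be checked numerically. The Python sketch below enumerates the non-empty combinations of k formal parameters to confirm the closed form for $k_3$, and evaluates the node-visit bound for a small set of procedure sizes. The function names, the formal-parameter names, the procedure sizes, and the choice of 1 for the constants $k_1$ and $k_2$ are all hypothetical values picked for illustration.

```python
from itertools import combinations
from math import comb


def max_recoded_per_base(k):
    """k_3: the number of non-empty combinations of k formal parameters."""
    return sum(comb(k, i) for i in range(1, k + 1))


def visit_bound(proc_sizes, k, k1=1, k2=1):
    """Sum over procedures of k1 * s_i * k3 * b, with b replaced by its bound k2 * n."""
    n = sum(proc_sizes)
    k3 = max_recoded_per_base(k)
    return sum(k1 * s * k3 * (k2 * n) for s in proc_sizes)


k = 4
formals = [f"p{i}" for i in range(k)]   # hypothetical formal-parameter names
subsets = [c for size in range(1, k + 1) for c in combinations(formals, size)]
print(len(subsets), max_recoded_per_base(k), 2**k - 1)

sizes = [10, 20, 30]                    # hypothetical procedure sizes, n = 60
print(visit_bound(sizes, k), max_recoded_per_base(k) * sum(sizes) ** 2)
```

The two numbers on the last line are equal, which reflects the step from $\sum_{i=1}^{j} k_1 k_2 k_3 n s_i$ to a constant multiple of $n^2$.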

For a program from the assumed universe of programs, the likelihood of a large

complexity constant due to element recoding is very low, for the following reason.

In order to increase the number of recoded elements for a given base element and
procedure, the given base element must, in effect, be repeatedly aliased to different
combinations of formal parameters in the given procedure. The algorithm of Figure 2.2
generates at most a single recoded element for each element in the OUT set,
so to increase the number of recoded elements as stated, there must be multiple calls

to the same procedure, and in these different calls the same base element must be

aliased to different formal-parameter combinations. To assess the likelihood of this








requirement being met, consider that for any given program from the assumed uni-

verse, the type and purpose of a variable determines how that variable is used in that

program, and each variable used in a program by necessity has a purpose. Given a

number of different calls to the same procedure, and given that a variable appears as

one or more of the actual parameters in each of the calls, then as a rule we expect

that variable to always occupy the same parameter positions in those calls because

there is always a close correspondence between parameter position and the purpose of

the variable that occupies that position. Note that by "variable" we mean a variable

and any aliases it may have, including formal-parameter aliases. A variable and its

aliases are interchangeable and share the same purpose because by definition they

reference the same data.

It might be argued that a language such as C has procedures that have a

variable number of arguments, such as printf and scanf, for which the same variable

could easily occupy different actual-parameter positions in different calls. This is

true, but such library procedures are best treated as unknown calls, and there is

no element recoding for unknown calls. For the needs of element recoding in the

rare case of a user-written procedure with a variable number of arguments, a single

formal parameter could stand for the variable portion of the formal parameters, and

conservative assumptions could be made whenever that single formal is, in effect,

referenced. Aside from mentioning this, we do not consider such user-written variable-

argument procedures further.

For a dataflow problem such as reaching definitions, the base element can only

be affected by a single variable. For such a dataflow problem, the purposefulness of

variables makes it very unlikely that an increase in the number of recoded elements

for a given procedure and base element can even begin, let alone be sustained. How-

ever, such an increase would be more likely for a dataflow problem where the base








element can be affected by several different variables. An example would be avail-

able expressions, because each base element could be affected by as many different

variables as compose the expression represented by that base element.

In light of the preceding argument regarding the purposefulness of variables,

for the reaching-definitions and similar dataflow problems, we expect the maximum

number of recoded elements for any given procedure and base element in the majority
of the programs in the assumed universe, to be one, and a little higher than one for
the remaining programs in that universe. Given the algorithm of Figure 2.2, we also
expect the average length of each recoded element to be slightly more than two, given
the preceding expectation that there will be a very small maximum number of recoded
elements for any given procedure and base element, and assuming that most base
elements when aliased by a call will be aliased to only a single formal parameter, and
only occasionally aliased to more than one. Note that this expected average length
of the recoded elements is consistent with the claim that the elementary operation

measured by the worst-case complexity of our method is a small number of integer

operations.

It may be noticed that the complexity of $O(n^2)$ for our interprocedural analysis

method is the same as the known worst-case complexity for intraprocedural dataflow

analysis, assuming there are no restrictions on the flowgraph. This fact makes it

unlikely that it would be possible to improve on our method in terms of complexity,

without resorting to flowgraph restrictions. However, although the complexities are

the same, this does not mean interprocedural dataflow analysis will now take roughly

the same time as intraprocedural dataflow analysis. The following inequality should

make this clear: $\sum_{i=1}^{j} s_i^2 < n^2$, given that j > 1 is the number of procedures in the
flowgraph, $s_i$ is the number of flowgraph nodes in the ith procedure, and $\sum_{i=1}^{j} s_i = n$.
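A quick numerical illustration of the gap between the two sides of this inequality follows, as a Python sketch. The procedure sizes are hypothetical: 100 procedures of 42 nodes each, chosen only to be of the same order as the smallest inputs described later in Section 2.6.

```python
def intra_vs_inter(proc_sizes):
    """Return (sum of s_i squared, n squared) for the given procedure sizes."""
    n = sum(proc_sizes)
    return sum(s * s for s in proc_sizes), n * n


intra, inter = intra_vs_inter([42] * 100)
print(intra, inter, inter // intra)
```

With equal-sized procedures the ratio of the two quantities is simply the number of procedures, so the shared $O(n^2)$ bound hides a large difference in the size of the work being bounded.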

Besides the language model that is assumed for this chapter, an alternative

model allows each formal parameter to have visibility in more than a single procedure.








Examples of programming languages that fit this alternative model are Pascal and

Ada, which allow nested procedures. Element recoding can be used for this alternative

model, but unless precision is compromised, the worst-case complexity for solving the

equations will be exponential, because the number of recoded elements could grow
exponentially assuming that alias information is compounded when a recoded element
is recoded. The exponential complexity of tracking aliases due to calls was first

considered by Myers [17], and more recently by Landi and Ryder [15]. In practice, the

cost of precise element recoding for the alternative language model may be acceptable

for the assumed universe of programs, and for the same reason given previously

regarding the purposefulness of variables. However, we do not consider the alternative

model further.

2.6 Experimental Results

There are experimental data for our interprocedural analysis method. Specif-

ically, two different prototypes have been constructed, and they both solve the

reaching-definitions dataflow problem using our method. Both prototypes accept

C-language programs as the input to be dataflow analyzed. For simplicity, these pro-

totypes impose some restrictions on the input, such as requiring that all variables be

represented by single identifiers, thereby excluding variables that have more than one

component, such as structure and union variables. In addition, there is no logic in

the prototypes to determine what pointers are pointing at, so pointer dereferencing

is essentially ignored. The prototypes do not accept pre-processor commands, so the

input programs must be post-preprocessor.

Both prototypes, named prototype 1 and prototype 2, use the same code to

parse the input program and construct the flowgraph. However, they differ in how

they implement our analysis method. Prototype 1 prepares a single bit-vector format

containing all the definitions in the input program, and then solves the dataflow

equations once for the program flowgraph. Prototype 2 uses a single integer as the








bit vector and solves the dataflow equations for the program flowgraph as many

times as there are base elements. For the reaching-definitions dataflow problem, the

definitions in the program are the base elements. We call the approach used by

prototype 2 one-base-element-at-a-time, and the approach used by prototype 1 is

all-at-once.

It might be expected that prototype 2 would be many times slower than prototype 1,
because of the big difference in bit-vector sizes, but this is not the case. For
prototype 1, calculations using varied test results show that $V \times S_1 \approx D$, where V
is the average number of visits per flowgraph node made to solve the dataflow equations,
$S_1$ is the integer size of the bit vector for prototype 1, and D is the number
of definitions in the input program. This relationship for prototype 1 means that
prototype 2 should run at roughly the same speed as prototype 1, because solving
the dataflow equations for a single element will require an average of roughly one
visit per flowgraph node and the application of the dataflow equations to a vector
of size one. Note that the total amount of work prototype 1 must do per flowgraph
node to solve the equations is proportional to the product $V \times S_1 \approx D$, and the total
amount of work prototype 2 must do per flowgraph node to solve the equations for
the D base elements is proportional to the product
$V \times S_2 \times D \approx 1 \times 1 \times D = D$,
where $S_2$ is the integer size of the bit vector for prototype 2.
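The two approaches can be contrasted with a small Python sketch. It is deliberately simplified and is not the prototypes' code: it solves intraprocedural reaching definitions only, with no calls and no element recoding, and it uses a Python integer as the bit vector. The flowgraph, the function names, and the definition numbering are hypothetical. The point it illustrates is that solving once over a D-bit vector and solving D times over a one-bit vector produce the same sets.

```python
def solve(nodes, pred, gen, kill):
    """Iterative forward-flow-or solution; each set is an int used as a bit vector."""
    succ = {n: [] for n in nodes}
    for n in nodes:
        for p in pred[n]:
            succ[p].append(n)
    out = {n: 0 for n in nodes}
    work = list(nodes)
    while work:
        n = work.pop()
        in_n = 0
        for p in pred[n]:
            in_n |= out[p]                      # union over predecessors
        new_out = (in_n & ~kill[n]) | gen[n]
        if new_out != out[n]:
            out[n] = new_out
            work.extend(succ[n])                # successors must be recomputed
    return out


def all_at_once(nodes, pred, gen, kill):
    return solve(nodes, pred, gen, kill)


def one_base_element_at_a_time(nodes, pred, gen, kill, num_defs):
    out = {n: 0 for n in nodes}
    for d in range(num_defs):
        bit = 1 << d                            # keep only definition d
        part = solve(nodes, pred,
                     {n: gen[n] & bit for n in nodes},
                     {n: kill[n] & bit for n in nodes})
        for n in nodes:
            out[n] |= part[n]
    return out


# Hypothetical flowgraph: d0 and d1 define x, d2 defines y; node 3 loops back to node 0.
nodes = [0, 1, 2, 3]
pred = {0: [3], 1: [0], 2: [0], 3: [1, 2]}
gen = {0: 0b001, 1: 0b010, 2: 0b100, 3: 0}
kill = {0: 0b010, 1: 0b001, 2: 0, 3: 0}
a = all_at_once(nodes, pred, gen, kill)
b = one_base_element_at_a_time(nodes, pred, gen, kill, 3)
print(a == b, {n: bin(v) for n, v in a.items()})
```

Because the union, kill, and gen operations act on each bit independently, the per-element solutions can simply be combined afterwards, and each per-element run needs only a single bit per set.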

Experimental results have supported the expectation of similar speeds for the

two prototypes. When deciding on the design of a practical tool, this finding is

important and decisively tips the scales in favor of the one-base-element-at-a-time

approach used by prototype 2. For both prototypes, the bit space needed for set

storage is nks, where n is the number of flowgraph nodes, k is the average number of

sets per node, and s = max(average set bit-size for any solving of the equations). Note

that for prototype 1 there is only one solving of the equations, and for prototype 2

there are as many solvings of the equations as there are base elements. The primary reason










Table 2.3. Typical experimental results for the two prototypes.

defs     defs global   calls   nodes    prototype 1   prototype 2
2126     30%           521     4191     49s           1m21s
2026     60%           472     3948     55s           2m22s
4109     30%           924     7537     4m18s         4m38s
4223     60%           916     7723     4m57s         8m19s
6115     30%           1325    11185    N/A           10m0s
6091     60%           1411    11288    N/A           18m18s
8200     30%           1832    14799    N/A           17m44s
8054     60%           1726    14641    N/A           30m2s
10299    30%           2164    18434    N/A           23m55s
10016    60%           2356    18587    N/A           45m8s


the approach used by prototype 2 is preferable when compared with the all-at-once

approach used by prototype 1, is the likelihood of a greatly reduced s value. For

example, without element recoding, the s value is 1 for prototype 2, and D for

prototype 1. Allowing element recoding, the s value for the prototype-2 approach

will be 1 + max(average number of recoded elements per procedure for any solving

of the equations). Here we assume that the best way to add element recoding to

prototype 2 would be, for each solving of the equations, to solve the equations for

both a single base element and all recoded elements generated from that base element.

Table 2.3 presents typical experimental results for the two prototypes. Each

table row represents a different input program. The input programs were randomly

generated by a separate program generator. The generated input programs are syn-

tactically correct and compile without error, but have meaningless executions. Each

input program in Table 2.3 has 100 procedures. Only prototype 1 currently has

element-recoding logic, so the input programs do not have call parameters and the

table data do not reflect element-recoding costs. Measuring element-recoding costs

for randomly generated programs would be somewhat meaningless anyway, since the

purposefulness-of-variables principle would be violated.







Referring to the columns of Table 2.3, "defs" is the total number of definitions in

the input program, "defs global" is the percentage that define global variables, "calls"

is the number of known calls, "nodes" is the number of flowgraph nodes, "prototype 1"

is the total CPU usage time in minutes and seconds required by prototype 1 to

completely solve the reaching-definitions dataflow problem for the input program

and generate a report of all the reaches, and "prototype 2" is the same thing for

prototype 2. The hardware used was rated at roughly 23 MIPS. The large space

requirements of prototype 1 prevented running it for the larger input programs in

the table.















CHAPTER 3
INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect

This section lays the theoretical basis for our algorithm. The problem of inter-

procedural logical ripple effect is examined from the perspective of execution paths

and their possible continuations. First, general definitions are given, followed by three

assumptions and a definition of the Allow and Transform sets, followed by Lemma 1,

Theorems 1 through 4, and a discussion of the potential for overestimation inherent

in the Allow set.

A variable is defined at each point in a program where it is assigned a value. A

definition is assumed to have the general form of "$v \leftarrow$ expression", where v is the

variable being defined and "$\leftarrow$" is an assignment operator that assigns the value of

expression to v. If the expression includes variables, then these variables are termed

the use variables of the definition. In general, a use is any instance of a variable that

is having its value used at the point where the variable occurs.

A procedure contains a definition if the statement that makes the definition is

in the body of the procedure. Similarly, a procedure contains a call if the statement

that makes the call is in the body of the procedure. The body of a procedure is those

statements that are defined as belonging to the procedure.

Frequent reference is made in this chapter to a procedure containing a state-

ment, or containing a call, or containing a flowgraph node. For languages that allow

nested procedures, such as Pascal and Ada, note that procedure nesting in these

languages is a mechanism for controlling variable scope, and not a mechanism for








sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume

that at most only a single procedure contains any given statement, call, or flowgraph

node.

Let d and dd be two definitions, possibly the same, in the same program. Let dd

have a use-variable v, let vdd be that use-variable instance, and let d define v. Given a

possible execution path between definition d and vdd, along which the definition of v

that d represents would be propagated, such a path is referred to as a definition-clear

path between d and vdd with respect to v. Definition d can only be propagated along

an execution path to the end of that path if either definition d itself or an element

that represents definition d exists at the beginning of that path, and there is no

redefinition of v along that path. Definition d is said to affect definition dd if there

is a definition-clear path between d and vdd with respect to v. Similarly, definition d

affects use u if u is an instance of v, and there is a definition-clear path between d

and u with respect to v. For convenience, v will not be explicitly mentioned when

it is understood. Note that whenever we speak of an execution path between two

points, we always mean that the execution path begins at the first point and ends at

the second point. For example, an execution path between d and dd begins at the

program point where d occurs and ends at the program point where vdd occurs. For

convenience, we assume that dd and vdd occupy the same program point.
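The "affects" relation can be illustrated with a breadth-first search for a definition-clear path. The Python sketch below is a simplification made for illustration: it works on a single flowgraph with one definition per node and ignores calls and returns, so it does not model the backward-flow restrictions developed later in this section. The function name, the graph encoding, and the example graphs are hypothetical.

```python
from collections import deque


def affects(succ, defined_at, d_node, dd_node, v):
    """True if some definition-clear path with respect to v runs from d_node to dd_node."""
    seen = set()
    queue = deque(succ.get(d_node, []))
    while queue:
        n = queue.popleft()
        if n == dd_node:
            return True          # the use at dd is read before dd's own definition
        if n in seen or v in defined_at.get(n, set()):
            continue             # already explored, or v is redefined at this node
        seen.add(n)
        queue.extend(succ.get(n, []))
    return False


# Node 0 is d (defines v), node 1 redefines v, node 2 is dd (uses v, defines a).
defined_at = {0: {"v"}, 1: {"v"}, 2: {"a"}}
with_bypass = {0: [1, 2], 1: [2], 2: []}     # edge 0 -> 2 avoids the redefinition
without_bypass = {0: [1], 1: [2], 2: []}     # every path passes through node 1
print(affects(with_bypass, defined_at, 0, 2, "v"))
print(affects(without_bypass, defined_at, 0, 2, "v"))
```

The search starts from the successors of d rather than from d itself, so the case where d and dd are the same definition is handled by a path that loops back to d.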

Assumption 1. A called procedure, if it returns, always returns to its most recent

caller. A procedure that returns, always returns to the most recent unreturned call.

Assumption 2. A call has no influence on the execution paths taken inside the

called procedure.

Assumption 3. There are no recursive calls.

Assumption 1 reflects the behavior of all the procedural languages that we know

of. Regarding Assumption 2, our algorithm may in fact overestimate the logical ripple

effect because of both Assumption 2 and the unstated but standard assumption of








intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible

execution paths. However, these two assumptions are unavoidable because determin-

ing all the truly possible execution paths in an arbitrary program is known to be an

undecidable problem. Regarding Assumption 3, making this assumption improves

the precision of our algorithm because this assumption removes a potential cause of

overestimation. The consequence of using our algorithm for a program with recursive

calls is discussed at the end of Section 3.2.

To determine what a definition affects when it is constrained by ripple effect,

it is useful to introduce two concepts: backward flow and forward flow. Given an

execution path, whenever the execution path returns from a procedure to a call, this

is termed backward flow. All other parts of the execution path may be termed forward

flow. Note that the possibilities for backward flow are constrained by Assumption 1,

and therefore constrained by the relevant execution paths that lead up to the point

of the return in question.

Regarding a given execution path, those call instances within that execution

path that have yet to be returned to within that path, called unreturned calls, are

the parts of the path that constrain backward flow. Note that this constraint is a

positive constraint, since a call cannot be returned to unless that call exists as an

unreturned call in at least one relevant execution path.

Definition 1. Two sets, Allow and Transform, will be used to represent the

backward-flow restrictions associated with a particular definition d. Let p be the pro-

gram point where definition d occurs. The elements in both sets are calls. The Allow

set identifies only the calls to which the execution path continuing on from point p

may make an unmatched return, until the backward-flow restrictions represented

by this Allow set are effectively cancelled by the interaction between the execution-

path continuation and the Transform set, explained shortly. An unmatched return is

a return made during the execution-path continuation to a call instance that precedes








the beginning of that execution-path continuation. The call instance is necessarily an

unreturned call, as otherwise it could not be returned to. $|Allow| \le$ the total number

of different calls represented in the program text. We define $Allow = \emptyset$ to mean there

are no backward-flow restrictions for d. The Transform set identifies only the calls

to which the execution path continuing on from point p may make an unmatched

return to, and upon this unmatched return, the execution-path continuation is no

longer constrained by the Allow and Transform sets associated with d. The following

relationships hold. $Transform \subseteq Allow$. If $Allow \ne \emptyset$ then $Transform \ne \emptyset$.

Note that minimizing backward-flow restrictions must be done whenever the

possible execution paths allow it, because otherwise the computed logical ripple

effect, which is the whole purpose of this formal-analysis section, may be missing

pieces that belong in it but were not added to it because backward-flow restrictions

were retained that are not valid for all the possible execution paths involved.

Lemma 1. For any execution path P between two program points p and q, if P

includes two or more call instances made in P that have not been returned to in P,

then for these unreturned calls, $c_i$ calls the procedure containing $c_{i+1}$, where $c_i$ is the

ith unreturned call, in execution order, made in P.

Proof. Assume that the next unreturned call c_{i+1} is not contained in the procedure

that was called by c_i. Let X be the procedure called by c_i, and let Y be the

procedure that contains c_{i+1}. The execution path in P between making the call c_i

and making the call c_{i+1} must include a path out of procedure X and into procedure

Y so that the call c_{i+1} can be made. A path out of procedure X can occur in only

two ways. Either X returns to a call, or X itself makes a call. If X returns to a call,

then by Assumption 1, c_i would be returned to, contradicting the given that c_i has

not been returned to. This means X must make a call to get to Y. Let c be the call

contained in X that is the last call contained in X on the execution path in P taken

from X to Y so as to make the call c_{i+1}. If X makes the call c, and c has not been

returned to in P, then c would precede c_{i+1} as an unreturned call following c_i,

contradicting the given that c_{i+1} is the next unreturned call in execution order after c_i.

If c has been returned to in P, then all calls occurring on the execution path between

the call c and the return to c must have been returned to according to Assumption 1.

This would mean c_{i+1} has been returned to, contradicting the given that c_{i+1} has

not been returned to. Thus, it is true that c_i calls the procedure containing c_{i+1}, as

assuming otherwise leads to contradictions. □
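
As an illustration only, the chain property stated by Lemma 1 can be checked on a concrete path with a small Python sketch. The event encoding (pairs of "call" or "return" with a call-site name), the maps `callee` and `container`, and the example program are all assumptions made for this sketch and are not part of the formal analysis.

```python
def unreturned_calls(path):
    """Return the call sites made in `path` that are never returned to, in execution order."""
    pending = []
    for kind, site in path:
        if kind == "call":
            pending.append(site)
        elif pending:
            # Assumption 1: a return always matches the most recent pending call.
            pending.pop()
        # A return with nothing pending is an unmatched return; it is ignored here.
    return pending


def satisfies_lemma1(path, callee, container):
    """Check that each unreturned call c_i calls the procedure containing c_{i+1}."""
    calls = unreturned_calls(path)
    return all(callee[a] == container[b] for a, b in zip(calls, calls[1:]))


# Hypothetical program: main contains c1 (calls X); X contains c2 (calls Y) and c3 (calls Z).
callee = {"c1": "X", "c2": "Y", "c3": "Z"}
container = {"c1": "main", "c2": "X", "c3": "X"}
path = [("call", "c1"), ("call", "c2"), ("return", "c2"), ("call", "c3")]
```

Here the unreturned calls of the path are c1 and c3, and c1 calls X, the procedure that contains c3, as the lemma requires.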

Definitions for Theorems 1 through 4. Let d and dd be the two definitions

previously defined. Let A and T be the Allow and Transform sets associated with d.

Let P be a single execution path between d and dd, and along which d can affect dd,

subject to the constraints on P imposed by A and T. P will consist of a sequence of

calls and returns, if any, in the order they are made. Any instance of a call made in

P that is not returned to in P, is an unreturned call in P.

K is defined for P if and only if P contains an unmatched return, meaning a

return to a call instance that precedes the beginning of P, to a call ∈ T. K is that

part of P that follows the first unmatched return to a call ∈ T. Thus, K represents

the continuation of P after the unmatched return. Any instance of a call made in K

that is not returned to in K, is an unreturned call in K.
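
The definition of K can be made concrete with a short Python sketch. It assumes, purely for illustration, that a path is a list of ("call", site) and ("return", site) events, where the site of a return names the call being returned to; the function name `continuation_k` is hypothetical.

```python
def continuation_k(path, transform):
    """Return K, the part of `path` after the first unmatched return to a call
    in `transform`, or None when K is not defined for the path."""
    pending = 0  # number of calls made in the path that are still unreturned
    for i, (kind, site) in enumerate(path):
        if kind == "call":
            pending += 1
        elif pending > 0:
            pending -= 1          # matched return to a call made earlier in the path
        elif site in transform:
            return path[i + 1:]   # unmatched return to a call in T: K begins after it
    return None


path = [("call", "c5"), ("return", "c5"), ("return", "c2"), ("call", "c6")]
```

With T = {c2}, K is the single event ("call", "c6"); with a T that does not contain c2, K is not defined and the sketch returns None.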

Referring to each of the four theorems in turn, let AA and TT be the Allow

and Transform sets for dd given all the paths P that meet the requirements of P as

stated by that theorem. Let AAp and TTp be the Allow and Transform sets for dd

given a single path P that meets the requirements of P as stated by that theorem.

The four theorems that follow each define AA and TT. Note that for any given P,

A, and T, one of the four theorems will apply.

Theorem 1. If (1) A = ∅, and P has no unreturned calls, or (2) A ≠ ∅, K is

defined for P, and K has no unreturned calls, then AA ← ∅ and TT ← ∅.








Proof. For case (1), d is free of backward-flow restrictions and d has affected

dd without making an unreturned call, therefore dd will be free of backward-flow

restrictions, giving AA ← ∅ and TT ← ∅. For case (2), as soon as path P makes

an unmatched return r to a call ∈ T, then by Definition 1 what d can affect is no

longer constrained by A and T, and this freedom from constraint by A and T passes

by transitivity to dd because d affects dd.

When K is defined for P, the unmatched return r in P that immediately pre-

cedes the beginning of K, means that any unreturned calls in P are also in K. This

is because all call instances within P are more recent than the call instance that

matches the unmatched return r. Thus, by Assumption 1 all call instances in P

preceding the return r must be returned to in P before r can occur. Therefore, P has

no unreturned calls because K has no unreturned calls. Thus, dd is free of backward-flow

restrictions since A, T, and P contribute nothing in the way of constraint, giving

AA ← ∅ and TT ← ∅. □

Theorem 2. If (1) A = ∅, and P has at least one unreturned call, or (2) A ≠ ∅,

K is defined for P, and K has at least one unreturned call, then AA ← ⋃_{all such P}

{the unreturned calls of P}, and TT ← ⋃_{all such P} {the first unreturned call in P}.

Proof. For case (1), A and T contribute nothing in the way of constraint

to AAp and TTp. Because d affects dd along path P which contains unreturned

calls, by Assumption 1 those unreturned calls must be returned to first before any

other unreturned calls can be made from the execution-path continuation point of dd

onward. Hence, AAp ← {the unreturned calls of P}. Because d had no backward-flow

restrictions, it follows that once all the unreturned calls of P are returned to

by the execution-path continuation, then that continuation would no longer have

any backward-flow restrictions. Because of Assumption 3 and Lemma 1, all the

unreturned calls of P are returned to when the sequentially first unreturned call in

P is returned to. Hence, TTp ← {the first unreturned call in P}. For case (2), as








shown in the proof of Theorem 1 case (2), A and T contribute nothing to AAp and

TTp when K is defined for P. Thus, this case (2) is effectively the same as case

(1), because the A and T sets contribute nothing and an unreturned call in K is an
unreturned call in P. Therefore, AAp ← {the unreturned calls of P} and TTp ←

{the first unreturned call in P}.

From Definition 1 and the general definitions of AA, TT, AAp, and TTp, it

follows that AA ← ⋃_{all such P} AAp and TT ← ⋃_{all such P} TTp. Thus, AA ←

⋃_{all such P} {the unreturned calls of P}, and TT ← ⋃_{all such P} {the first unreturned
call in P}. □

Theorem 3. If A ≠ ∅, K is not defined for P, and P has no unreturned calls,

then AA ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively

begins with a call ∈ T and ends with a call of the procedure containing dd, such that

each unreturned call in this possible execution path is in A)}, and TT ← AA ∩ T.

Proof. Note that only one procedure contains dd. Because K is not defined for

P, it follows that P was constrained in its entirety by A, never making an unmatched

return to a call ∈ T. Because P has no unreturned calls, d can only affect dd along

P by making one or more unmatched returns to calls ∈ (A − T), unless d and dd are

in the same procedure.

A, in effect, represents possible execution paths with unreturned calls by which

d was affected. However, once given P, the path P may eliminate some of the

paths from A as being possible, and return to some of the unreturned calls in A.

Thus, although P contributes nothing directly to AA, it may narrow the unreturned

execution-path possibilities that A can contribute to AA. AA as defined for this

theorem, captures all execution paths in A that begin with a call ∈ T and end with

a call of the procedure that contains dd. Given Assumption 3, it should be obvious

that these are all the possible paths in A that are unreturned after P. Note that

if d and dd are in the same procedure, then AA = A and TT = T. Assume that








d and dd are in different procedures. Any call ∈ A that is not part of at least one

path in A that makes a call of the procedure containing dd, must be excluded from

AA because P requires a path in A that passes through the procedure containing

dd, because otherwise P could not make a return to the procedure containing dd.

Any call ∈ A that is on a path in A between the procedure containing dd and the

procedure containing d, must be excluded from AA because the procedure containing

dd has been returned to by P. The definition of AA for this theorem satisfies these

two exclusions.

That TT ← AA ∩ T follows from Definition 1 requiring TT ⊆ AA, and from

the definition of AA for this theorem. □

Theorem 4. If A ≠ ∅, K is not defined for P, P has at least one unreturned

call, and the first unreturned call in P is contained in procedure X, then S1 ←

⋃_{all such P given X} {the unreturned calls of P}, and S2 ← {x | x ∈ A ∧ (x is part
of a possible execution path that inclusively begins with a call ∈ T and ends with

a call of the procedure X, such that each unreturned call in this possible execution

path is in A)}, AA ← S1 ∪ S2, and TT ← S2 ∩ T.

Proof. S1 follows from Definition 1 and the proof of Theorem 2. S2 follows from

Theorem 3, where the specific "procedure containing dd" in the expression for AA in

Theorem 3 has been replaced by the equally specific "procedure X".

To establish that the union operation of AA, combining S1 and S2, does not thereby

represent spurious paths in AA, it is only necessary to show that the paths represented in

S1 never cross with the paths represented in S2. Two paths cross if each path makes

an unreturned call to the same procedure. All paths in S2 end with an unreturned

call of procedure X. All paths in S1 begin with an unreturned call contained in

procedure X. Assume that both S1 and S2 include an unreturned call to the same

procedure. As all paths in S2 lead to procedure X, this means there exists an execution

path that originates in procedure X and eventually calls procedure X. Thus,














Figure 3.1. An example call structure that does not allow overestimation.

the execution path represents recursion, and this is contradicted by Assumption 3.

Therefore, the paths represented in S1 never cross with the paths represented in S2.

The first unreturned call in P is not added to TT because the path P is an

extension of the unreturned paths represented in S2. That TT ← S2 ∩ T follows from

Definition 1 requiring TT ⊆ AA, and from the definition of AA for this theorem. □
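
The case analysis behind the four theorems can be summarized by a Python sketch that, for one path P, reports which theorem applies and, for Theorems 1 and 2, the per-path sets AAp and TTp. The path encoding (a list of ("call", site) and ("return", site) events) and all function names are assumptions made for illustration; the sets for Theorems 3 and 4 depend on the call structure of the program and are not computed here.

```python
def unreturned(path):
    """Calls made in `path` that are never returned to, in execution order."""
    pending = []
    for kind, site in path:
        if kind == "call":
            pending.append(site)
        elif pending:
            pending.pop()  # Assumption 1: a return matches the most recent call
    return pending


def k_of(path, transform):
    """Part of `path` after the first unmatched return to a call in T, else None."""
    depth = 0
    for i, (kind, site) in enumerate(path):
        if kind == "call":
            depth += 1
        elif depth:
            depth -= 1
        elif site in transform:
            return path[i + 1:]
    return None


def applicable_theorem(path, allow, transform):
    """Return 1, 2, 3, or 4 according to the conditions of Theorems 1 through 4."""
    k = None if not allow else k_of(path, transform)
    if not allow or k is not None:
        tail = path if not allow else k
        return 2 if unreturned(tail) else 1
    return 4 if unreturned(path) else 3


def per_path_sets(path):
    """AAp and TTp for Theorems 1 and 2: all unreturned calls, and the first one."""
    calls = unreturned(path)
    return set(calls), set(calls[:1])
```

For example, with A = {c1, c2} and T = {c1}, a path consisting of an unmatched return to c1 falls under Theorem 1, whereas an unmatched return to c2 followed by a new call c7 falls under Theorem 4.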

The four theorems given above will be used to build the algorithm given in

the next section. In effect, a given Allow set represents possible execution paths

with unreturned calls by which the definition associated with that Allow set was

affected. Inversely, the Allow set identifies, in effect, those continuation paths that

can make unmatched returns. However, missing from the Allow set is the information

needed to enforce an ordering of the unmatched returns that the continuation path

may make. To a large extent, this missing information is unnecessary because of

Lemma 1. Typically, the call structure of the program itself enforces the ordering of

the unmatched returns. Figure 3.1 is an example. Assume d affects dd, giving an

Allow set of {c1,c2} for dd. Given a continuation path from dd, it is not possible

for c1 to be returned to before c2, so the correct ordering of unmatched returns is

enforced by the program itself. However, there are cases where the missing ordering

information can result in a continuation path taking unwanted shortcuts.

Figure 3.2 gives an example of a call structure that allows the continuation path

from dd to make an unwanted shortcut when given the right circumstances. Assume

d affects dd along the paths c1-c2 and c3-c4, giving an Allow set of {c1,c2,c3,c4} for

dd. Assume the continuation path is r2-c5-r3, where r2 and r3 are unmatched returns










[Diagram not reproduced. The recoverable labels are the calls c1, c3, c2, and c4.]

Figure 3.2. An example call structure that allows overestimation.


to calls c2 and c3. The unmatched return r3 should not be allowed to happen before
an unmatched return r4, but this unmatched-return ordering will not be enforced
by the Allow set defined in this dissertation, so the assumed continuation path is
possible. By virtue of such a spurious continuation path, dd may be able to affect
a definition or use that it would not otherwise be able to affect, assuming dd were
confined to only legitimate continuation paths. In practical terms, this means that
the computed logical ripple effect that consists of affected definitions and uses may
in fact be an overestimate because of spurious continuation paths. Although the
Allow set does permit spurious continuation paths under the right circumstances, of
which Figure 3.2, and the assumed paths by which d affected dd, are the simplest
example, we feel that these circumstances, along with spurious paths that affect
what would otherwise be unaffected, will not occur often enough in real programs to
undermine the general usefulness of the Allow set in constraining backward flow and
permitting computation of a precise or semiprecise logical ripple effect.
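
The overestimation described above can be illustrated with a simplified Python sketch. It models only the sequence of unmatched returns made by a continuation path (the intervening call c5 is omitted), and it compares a pure membership test against the Allow set with a hypothetical stricter test that remembers the two call chains c1-c2 and c3-c4. The stricter test is not part of the algorithm in this dissertation; it is shown only to make the lost ordering information visible.

```python
def allowed_by_allow_set(returns, allow):
    """Membership-only test: every returned-to call must be in the Allow set."""
    return all(r in allow for r in returns)


def allowed_by_call_chains(returns, chains):
    """Hypothetical stricter test: the returns must unwind one recorded chain of
    unreturned calls, most recent call first."""
    returns = list(returns)
    return any(list(reversed(chain))[:len(returns)] == returns for chain in chains)


chains = [["c1", "c2"], ["c3", "c4"]]   # the assumed paths by which d affected dd
allow = {"c1", "c2", "c3", "c4"}        # the resulting Allow set for dd

spurious = ["c2", "c3"]     # unmatched returns r2 then r3, as in the text
legitimate = ["c4", "c3"]   # unmatched returns r4 then r3
```

The membership test accepts both sequences, while the chain-based test accepts only the legitimate one; this difference is the source of the possible overestimate.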








3.2 The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logi-

cal ripple effect. After a brief overview of the algorithm, the dataflow analysis method

used by the algorithm is discussed. Then, two important properties of the dataflow

sets are detailed, followed by three rules that are used to impose backward-flow re-

strictions on the dataflow analysis that is done. Last are proofs that the algorithm

is correct.

The algorithm to compute logical ripple effect is shown in Figure 3.3. Each

statement in the algorithm is numbered on the left. For convenience, algorithm

statements will be referred to as lines. For example, a reference to line 28 means the

statement at 28 that actually is printed on several lines. Comments in the algorithm

begin with --. ⊥ and ⊤ are just two different, fixed, arbitrary values.

In general, the algorithm works as follows. A definition d and its associated

Allow and Transform sets are popped from the stack (line 7), and then the reaching-

definitions dataflow problem is solved for this definition d, imposing any backward-

flow restrictions represented by the Allow and Transform sets (line 8). Reaching

definitions for a single definition is the problem of finding all uses and definitions

affected by the definition. The definition d that was dataflow analyzed, and any

uses affected by it, are included in the ripple effect (lines 9 to 11). Each affected

definition will have its Allow and Transform sets determined in accordance with

Theorems 1 through 4 (lines 22 to 46). A check is then made to see if the affected

definition and its restriction sets, Allow and Transform, should be added to the stack

for dataflow analysis or not (lines 47 to 52). The algorithm ends when the stack is

empty. Although the algorithm shows a single definition b being added to the stack at

line 5, any number of different b can actually be added, along with empty restriction

sets for each b.














Compute the logical ripple effect for a hypothetical or actual definition b
Input: a program flowgraph ready for dataflow analysis
Output: the logical ripple effect in RIPPLE
begin
1    RIPPLE ← ∅
2    for each definition dd in the program
3        FINdd ← ⊥
     end for
4    stack ← ∅
5    push (b, ∅, ∅) onto stack
6    while stack ≠ ∅ do
7        pop stack into (d, ALLOW, TRANSFORM)
8        Solve the reaching-definitions dataflow equations for the single definition d,
         using Rules 1, 2, and 3.
9        RIPPLE ← RIPPLE ∪ {d}
10       for each use u in the program that is affected by either d1 or d2
11           RIPPLE ← RIPPLE ∪ {u}
         end for
12       ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13       for each call node n in the flowgraph
14           if d1 ∈ Bout[n] and d1 crossed from this call into the called procedure
15               ROOT1 ← ROOT1 ∪ {the call node n}
             fi
16           if d1 ∈ Eout[n] and d1 crossed from this call into the called procedure
17               LINK1 ← LINK1 ∪ {the call node n}
             fi
18           if d2 ∈ Bout[n] and d2 crossed from this call into the called procedure
19               ROOT2 ← ROOT2 ∪ {the call node n}
             fi
20           if d2 ∈ Eout[n] and d2 crossed from this call into the called procedure
21               LINK2 ← LINK2 ∪ {the call node n}
             fi
         end for


Figure 3.3. The logical ripple effect algorithm.









22       for each definition dd in the program that is affected by either d1 or d2
             -- determine Allow and Transform for dd by Theorem 1
23           if d2 ∈ Bin[node where dd occurs]
24               PATHS ← ∅, TRANS ← ∅
25               call Analyze
             else
                 -- determine Allow and Transform for dd by Theorem 2
26               if d2 ∈ Ein[node where dd occurs]
27                   PATHS ← ∅
28                   PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure
                             that contains dd ∨ x calls a procedure that contains
                             a call c ∈ (PATHS ∩ LINK2))}
29                   TRANS ← ROOT2 ∩ PATHS
30                   call Analyze
                 fi
                 -- determine Allow and Transform for dd by Theorem 3
31               if d1 ∈ Bin[node where dd occurs]
32                   PATHS ← ∅
33                   PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd
                             ∨ x calls a procedure that contains a call c ∈ PATHS)}
34                   TRANS ← TRANSFORM ∩ PATHS
35                   call Analyze
                 fi
                 -- determine Allow and Transform for dd by Theorem 4
36               if d1 ∈ Ein[node where dd occurs]
37                   for each procedure X that contains a call ∈ ROOT1
38                       RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39                       PP ← ∅
40                       PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                              begins with a call ∈ RT1 and ends with a call of the
                              procedure that contains dd, such that each call
                              in this path is in (RT1 ∪ LINK1))}
41                       if PP ≠ ∅
42                           PATHS ← ∅
43                           PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls
                                     a procedure that contains a call c ∈ PATHS)}
44                           TRANS ← TRANSFORM ∩ PATHS
45                           PATHS ← PATHS ∪ PP
46                           call Analyze
         end statements: fi, end for, fi, fi, end for, od
end


Figure 3.3. continued.









Procedure Analyze
begin
     -- avoid repetition of dd dataflow analysis if possible
47   if FINdd ≠ ⊤ ∧ (PATHS = ∅
         ∨ (true for all saved pairs for dd: PATHS ⊈ P ∨ TRANS ⊈ T))
48       if PATHS = ∅
49           FINdd ← ⊤
50           push (dd, ∅, ∅) onto stack
         else
51           save PATHS and TRANS as the pair P × T for dd
52           push (dd, PATHS, TRANS) onto stack
         fi
     fi
end


Figure 3.3. continued.
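
As a reading aid, the Analyze procedure of lines 47 to 52 might be sketched in Python as follows. The dictionaries `fin` and `saved`, the list used as the stack, and the string values standing in for the two arbitrary FIN values are assumptions of this sketch, not part of Figure 3.3.

```python
BOTTOM, TOP = "bottom", "top"  # stand-ins for the two arbitrary FIN values


def analyze(dd, paths, trans, fin, saved, stack):
    """Queue definition dd for dataflow analysis unless an earlier analysis covers it."""
    paths, trans = frozenset(paths), frozenset(trans)
    if fin.get(dd, BOTTOM) == TOP:
        return  # dd was already queued with no restrictions (line 47)
    # True when some saved pair contains both restriction sets (negation of line 47).
    subsumed = any(paths <= p and trans <= t for p, t in saved.get(dd, []))
    if not paths:
        fin[dd] = TOP                                      # line 49
        stack.append((dd, frozenset(), frozenset()))       # line 50
    elif not subsumed:
        saved.setdefault(dd, []).append((paths, trans))    # line 51
        stack.append((dd, paths, trans))                   # line 52


fin, saved, stack = {}, {}, []
analyze("dd", {"c1", "c2"}, {"c1"}, fin, saved, stack)  # queued with restrictions
analyze("dd", {"c1"}, {"c1"}, fin, saved, stack)        # subsumed, not queued
analyze("dd", set(), set(), fin, saved, stack)          # queued without restrictions
analyze("dd", {"c9"}, {"c9"}, fin, saved, stack)        # skipped, FIN is already TOP
```

After these four calls the stack holds two entries for dd: one with the restriction sets {c1, c2} and {c1}, and one with empty restriction sets.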

The dataflow equations referred to in line 8 are shown in Figure 3.4. These

equations are copied from Chapter 2, which presents a method for context-dependent

flow-sensitive interprocedural dataflow analysis. The method consists of solving,

using the standard iterative algorithm, the dataflow equations shown in Figure 3.4,

for the program flowgraph required by the equations. The method in Chapter 2

includes a solution to the problems of parameter aliasing and implicit definitions,

which are part of the interprocedural reaching-definition problem. We assume that the

full method of Chapter 2 would be used, but we do not discuss these side issues in

this chapter as they are not directly relevant to the algorithm. Note that there are

other methods for context-dependent flow-sensitive interprocedural dataflow analysis

[3, 9, 17, 21], but the method of Chapter 2 has precision and efficiency advantages
over the other methods cited.

Referring to the dataflow equations of Figure 3.4, four sets are computed for

each flowgraph node: two body sets, Bin and Bout, and two entry sets, Ein and Eout.

All body and entry sets are initially empty. As the equations will be solved for only

a single definition d, the GEN set for the node where d occurs, i.e. the node whose










For any node n.

    IN[n] = Ein[n] ∪ Bin[n]

    OUT[n] = Eout[n] ∪ Bout[n]

Group I: n is an entry node.

    Bin[n] = ∅

    Ein[n] = ⋃_{p ∈ pred(n)} {x | x ∈ OUT[p] ∧ C1}

    Bout[n] = GEN[n]

    Eout[n] = Ein[n] ∪ RECODE[n]

Group II: n is a return node, p is the associated call node and q is the exit node of
the called procedure.

    Bin[n] = {x | (x ∈ Bout[p] ∧ (¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout[q]))) ∨ (x ∈ Bout[q] ∧ C2)}

    Ein[n] = {x ∈ Eout[p] | ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout[q])}

    Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]

    Eout[n] = Ein[n] − KILL[n]

Group III: n is not an entry or return node.

    Bin[n] = ⋃_{p ∈ pred(n)} Bout[p]

    Ein[n] = ⋃_{p ∈ pred(n)} Eout[p]

    Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]

    Eout[n] = Ein[n] − KILL[n]


Figure 3.4. Dataflow equations for the reaching-definitions problem.








associated block of program code contains the definition d, will contain an element

representing d, and all the other GEN sets will be empty. The node where d occurs

is the natural starting point for the iterative algorithm, which will recompute the body

and entry sets for the nodes until stability is attained and the sets cease to change, at

which point the equations have been solved. Once solved, an element is in the entry

set or body set at a particular node depending on how that element was propagated

to that node. The same element may be in both sets at the same node. Properties 1

and 2 listed below, summarize those implications of set membership that are used by

the algorithm. The properties follow directly from the dataflow equations.
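
To make the iteration to stability concrete, the following Python sketch solves only the body-set equations for nodes that are neither entry nor return nodes (Group III of Figure 3.4), for a single generated element. It is a deliberately reduced illustration: the call, entry, and return crossings, the entry sets, and conditions C1 and C2 are omitted, and the node names and helper name are hypothetical.

```python
def solve_body_sets(nodes, preds, gen, kill):
    """Iterate Bin[n] = union of Bout[p] and Bout[n] = (Bin[n] - KILL[n]) | GEN[n]
    until no set changes."""
    b_in = {n: set() for n in nodes}
    b_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = set().union(*(b_out[p] for p in preds.get(n, ())))
            new_out = (new_in - kill.get(n, set())) | gen.get(n, set())
            if new_in != b_in[n] or new_out != b_out[n]:
                b_in[n], b_out[n] = new_in, new_out
                changed = True
    return b_in, b_out


# Hypothetical flowgraph: 1 -> 2, 2 -> 3, 2 -> 4; d is generated at node 1 and killed at node 3.
nodes = [1, 2, 3, 4]
preds = {2: [1], 3: [2], 4: [2]}
b_in, b_out = solve_body_sets(nodes, preds, gen={1: {"d"}}, kill={3: {"d"}})
```

In this example the element d reaches nodes 2, 3, and 4, but it does not leave node 3 because node 3 kills it.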

Property 1. For any node n, an element is in the Ein[n] set or Eout[n] set if and

only if that element entered the procedure that contains node n from a call node,

and there is a definition-clear path from that call node to node n. Thus, membership

in the entry set of node n implies that the element can propagate to node n by an

execution path that makes at least one unreturned call between the point where the

element is generated and the point where node n occurs.

Property 2. For any node n, an element is in the Bin[n] set or Bout[n] set if

and only if that element was generated in the same procedure that contains node n,

or that element entered the procedure that contains node n from an exit-node Bout

set. There must also be a definition-clear path to node n from either the element's

generation node or from the exit node. If the element entered from an exit-node

Bout set, then Property 2 applies recursively to the element in that Bout set. Thus,

membership in the body set of node n implies that the element can propagate to

node n by an execution path between the point where the element is generated and

the point where node n occurs that does not include any unreturned calls.

The three rules referred to in line 8 are listed below. Rule 1 applies before

the dataflow equations are solved. Rules 2 and 3 apply as the equations are being








solved. The rules impose the backward-flow restrictions represented by the ALLOW

and TRANSFORM sets in line 7.

Rule 1. If ALLOW = ∅ then element d2 is generated at the node where definition

d occurs, otherwise d1 is the generated element, meaning the element in the GEN

set. Both d1 and d2 are base elements that represent the same definition d. Both

elements are identical in terms of when they appear in any given KILL set. The

only difference between them is that d1 and d2 are treated differently by Rules 2 and

3 below.

If the ALLOW set is empty, then by Definition 1 there should be no backward-

flow restrictions on d. Rule 1 accomplishes this requirement, as d2 is immune to

backward-flow restrictions which are imposed by Rule 2.

Rule 2. Let n be a return node, p be the associated call node, and q be the

exit node of the called procedure. Each time the Bin[n] equation is computed, if

d1 ∈ Bout[q], then d1 cannot cross from Bout[q] into the Bin[n] set if p ∉ ALLOW.

In the dataflow equations, the crossing of an element from an exit-node body

set to a return node is the only action in the equations that represents, in effect, an

unmatched return to a call instance that was made in an execution path leading up

to the program point where definition d occurs, which is the starting point of the

reaching-definition analysis done for d. Thus, Rule 2 covers all cases in which an

unmatched return occurs. Rule 2 restricts unmatched returns to those call instances

that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW

set as given by Definition 1.

Rule 3. Let n be a return node, p be the associated call node, and q be the

exit node of the called procedure. Each time the Bin[n] equation is computed, if

d1 ∈ Bout[q], and, by C2 and Rule 2, d1 can cross from Bout[q] into the Bin[n] set,

and p ∈ TRANSFORM, then as this d1 element crosses from Bout[q] into the Bin[n]








set, the element is changed to d2. In effect, d1 is transformed into d2, and the return

node n becomes a generation node for the d2 element.

As already mentioned, in the dataflow equations the crossing of an element

from an exit-node body set to a return node is the only action in the equations that

represents, in effect, an unmatched return to a call instance that was made in an

execution path leading up to the program point where definition d occurs, which

is the starting point of the reaching-definition analysis done for d. Thus, Rule 3

covers all cases in which an unmatched return occurs. The requirement by Rule 3

that the returned-to call be in the TRANSFORM set satisfies Definition 1 as to

when backward-flow restrictions can be ignored. Rule 3 replaces element d1, which is

subject to the backward-flow restrictions, with element d2, which is free of backward-

flow restrictions, at the return point and thereby satisfies Definition 1 regarding

removal of backward-flow restrictions on the execution-path continuation, since d2

now represents the continuation instead of d1.
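
The effect of Rules 2 and 3 on the crossing from an exit-node body set to a return node can be sketched in Python as below. The sketch assumes that condition C2 already holds for every element considered, represents the two base elements by the strings "d1" and "d2", and uses a hypothetical function name; it covers only the (x ∈ Bout[q] ∧ C2) part of the Bin equation for a return node.

```python
def cross_exit_to_return(exit_body_out, p, allow, transform):
    """Elements that cross from Bout[q] into Bin[n], where p is the call node
    associated with return node n."""
    crossed = set()
    for elem in exit_body_out:
        if elem == "d1":
            if p not in allow:
                continue  # Rule 2: d1 may only return to calls in ALLOW
            # Rule 3: returning to a call in TRANSFORM turns d1 into d2
            crossed.add("d2" if p in transform else "d1")
        else:
            crossed.add(elem)  # d2 is not subject to backward-flow restrictions
    return crossed
```

For example, with ALLOW = {c1, c2} and TRANSFORM = {c1}, element d1 crosses to the return node of c2 unchanged, crosses to the return node of c1 as d2, and does not cross to the return node of any other call.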

Lemma 2. The algorithm computes at lines 23 to 46 the restriction sets for an

affected definition in accordance with Theorems 1 through 4.

Proof. We first establish the properties of the LINK and ROOT sets computed

at lines 12 to 21. Let p be the node, if any, where d1 is generated. Let q be any node

where d2 is generated, i.e. those return nodes where d1 is transformed into d2, or for

ALLOW = ∅ the node where definition d occurs.

The tests at lines 14 and 18 make use of Property 2: if an element is in the Bout

set of a call node n, then there exists a definition-clear path between the node where

the element is generated and node n, and the path has no unreturned calls. The call

at node n would be the first unreturned call on that path by just extending the path

to the entry node of the called procedure. Therefore, the ROOT1 set represents all

calls that are the first unreturned call on at least one definition-clear path between

node p and some other node in the flowgraph. The ROOT2 set represents all calls








that are the first unreturned call on at least one definition-clear path between node

q and some other node in the flowgraph.

The tests of lines 16 and 20 make use of Property 1: if an element is in the Eout

set of a call node n, then there exists a definition-clear path between the node where

the element is generated and node n, and the path includes the unreturned call that

called the procedure containing node n. The call at node n would be at least the

second unreturned call on that path by just extending the path to the entry node

of the called procedure. Therefore, the LINK1 set represents all calls that are an

unreturned call but not the first unreturned call on at least one definition-clear path

between node p and some other node in the flowgraph. The LINK2 set represents

all calls that are an unreturned call but not the first unreturned call on at least one

definition-clear path between node q and some other node in the flowgraph.
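
The construction of the ROOT and LINK sets at lines 13 to 21 might be sketched in Python as follows. The predicate `crossed`, which reports whether an element crossed from a call node into the called procedure, is a hypothetical stand-in for information that the dataflow solution would supply, as are the example sets.

```python
def root_and_link(call_nodes, b_out, e_out, crossed, elem):
    """ROOT collects calls reached through the body set (first unreturned call);
    LINK collects calls reached through the entry set (a later unreturned call)."""
    root = {n for n in call_nodes if elem in b_out[n] and crossed(elem, n)}
    link = {n for n in call_nodes if elem in e_out[n] and crossed(elem, n)}
    return root, link


# Hypothetical solution: d1 is in Bout of n1 and n3, in Eout of n2, and does not cross at n3.
call_nodes = ["n1", "n2", "n3"]
b_out = {"n1": {"d1"}, "n2": set(), "n3": {"d1"}}
e_out = {"n1": set(), "n2": {"d1"}, "n3": set()}
root1, link1 = root_and_link(call_nodes, b_out, e_out, lambda e, n: n != "n3", "d1")
```

The same function applied to element d2 would yield ROOT2 and LINK2.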

The test at line 23 checks for the application of Theorem 1. If d2 ∈ Bin[node

where dd occurs], then by Property 2 there exists a definition-clear path P between

d and dd that has no unreturned calls, and somewhere along P, d2 is generated,

meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of

Theorem 1, and line 24 sets PATHS and TRANS to empty in accordance with the

theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line 26 checks for the application of Theorem 2. If d2 ∈ Ein[node

where dd occurs], then by Property 1 there exists at least one definition-clear path

P between d and dd that has at least one unreturned call, and somewhere along P,

d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies

the conditions of Theorem 2. Only the d2 element satisfies the theorem, so it follows

that all paths P for the theorem will have to be constructed from the ROOT2 and

LINK2 sets exclusively.

Referring to Theorem 2, line 28 computes the AA set, and line 29 computes

TT. For line 28, the PATHS set is defined in terms of itself. This recursive reference








means that each time a call is added to the PATHS set, the condition containing

the recursive reference must be reevaluated, because additional calls may thereby be

added to PATHS. Recursive references are similarly used in lines 33 and 43. What

line 28 does is extract from all the calls that element d2 crossed, just those calls that

are on a path to dd. This is done by building the paths backwards, beginning with

those calls that call the procedure containing dd. By Lemma 1, any path between d

and dd consisting of unreturned calls can be found by proceeding in reverse order from

dd and selecting those calls that call a procedure containing a call already selected.

Backward path building and Lemma 1 are similarly used in lines 33 and 43. By the

properties of the ROOT and LINK sets, the paths constructed by line 28 will be

definition-clear. Notice that a particular call may be in both the ROOT2 and LINK2

sets, but if a call is only in the ROOT2 set, then it cannot be used as the basis for

extending further backwards any path, because by Property 1, d2 does not propagate

from the entry node of the procedure that contains that call, to the call node for that

call. This is the reason for the (PATHS ∩ LINK2) requirement in line 28. Once the

PATHS set is computed, line 29 computes TRANS in accordance with the theorem.
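
The backward path building of lines 28 and 29 amounts to a fixed-point computation, which the following Python sketch illustrates. The maps `callee` (the procedure called by each call) and `container` (the procedure containing each call), the function name, and the example program are assumptions made for this sketch.

```python
def build_paths(root, link, callee, container, target_proc):
    """Select the calls in ROOT2 or LINK2 that lie on a path of unreturned calls
    ending with a call of `target_proc`, the procedure that contains dd."""
    root, link = set(root), set(link)
    candidates = root | link
    paths = set()
    changed = True
    while changed:
        changed = False
        # Only calls in LINK2 may be used to extend a path further backwards.
        extendable = {container[c] for c in paths & link}
        for x in candidates - paths:
            if callee[x] == target_proc or callee[x] in extendable:
                paths.add(x)
                changed = True
    return paths


# Hypothetical program: main contains c1 (calls A) and c3 (calls C); A contains c2 (calls B).
callee = {"c1": "A", "c2": "B", "c3": "C"}
container = {"c1": "main", "c2": "A", "c3": "main"}
root2, link2 = {"c1", "c3"}, {"c2"}
paths = build_paths(root2, link2, callee, container, target_proc="B")
trans = root2 & paths  # line 29
```

With dd in procedure B, the call c2 is selected first, then c1 because it calls the procedure containing c2, while c3 is never selected.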

The test at line 31 checks for the application of Theorem 3. If d1 ∈ Bin[node

where dd occurs], then by Property 2 there exists a definition-clear path P between

d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and P

does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the

element, meaning K is not defined for P. This satisfies the conditions of Theorem 3.

Referring to Theorem 3, line 33 computes the AA set, and line 34 computes TT.

What line 33 does is extract from ALLOW all paths that end with a call of the

procedure containing dd. Although Theorem 3 states that the path begins with a call

∈ TRANSFORM, line 33 does not require a check for this because TRANSFORM is

a subset of ALLOW and those first unreturned calls in TRANSFORM that are on a

path in ALLOW to dd, will unavoidably be picked up as the paths are built backwards








from dd. Thus, the PATHS set is computed in accordance with Theorem 3, followed

by line 34 that computes the TRANS set in accordance with the theorem.

The test at line 36 checks for the application of Theorem 4. If d1 ∈ Ein[node

where dd occurs], then by Property 1 there exists at least one definition-clear path P

between d and dd that has at least one unreturned call. It also follows that ALLOW

≠ ∅ and P does not make an unmatched return to a call ∈ TRANSFORM, because

d1 is the element. This satisfies the conditions of the theorem. Only the d1 element

satisfies the theorem, so it follows that all paths P for the theorem will have to be

constructed from the ROOT1 and LINK1 sets exclusively. Referring to Theorem 4,

line 40 computes the S1 set, line 43 computes the S2 set, line 44 computes the TT set,

and line 45 computes the AA set. The reason for the test at line 41 is that although

there exists at least one path P satisfying the theorem, there may not be any paths

P that begin in the specific procedure X. It can be seen that lines 37 to 46 compute

in accordance with the theorem. □

Lemma 3. Let Ai and Ti be one pair of Allow and Transform sets associated

with a definition d, and let Aj and Tj be a different pair of Allow and Transform sets

associated with the same definition d. Assume Ai ≠ ∅ and Aj ≠ ∅. If Aj ⊆ Ai and

Tj ⊆ Ti, then dataflow analyzing d with the pair Aj and Tj cannot add anything to

the ripple effect that is not added by dataflow analyzing d with the pair Ai and Ti.

Proof. By inspection of Rules 1, 2, and 3, it can be seen that removing some

of the calls from Ai or Ti cannot make d affect anything that it does not affect with

Ai and Ti as they were. Also, by inspection of lines 23 to 46, the determination of

the Allow and Transform sets for any definition dd affected by d, cannot be made to

include calls when Aj and Tj are the restriction sets for d, that would not be included

when Ai and Ti are the restriction sets for d. □

Lemma 4. Let A and T be Allow and Transform sets associated with a definition

d, and let X and Y be a different pair of Allow and Transform sets associated with








the same definition d. If A = ∅, then dataflow analyzing d with X and Y cannot add

anything to the ripple effect that is not added by dataflow analyzing d with A and

T.

Proof. By Rule 1, d will be represented by d2 and have no restrictions on its

backward flow. Thus, d will affect everything that it is possible for it to affect. If d

is dataflow analyzed with X and Y, then any calls found in the ROOT1, ROOT2,

LINK1, or LINK2 sets will also be found in the ROOT2 or LINK2 sets when d is

dataflow analyzed with A and T. These sets determine the restriction sets associated

with a definition dd affected by d. It follows that any dataflow path allowed for a dd

affected by d using X and Y, will also be allowed for a dd affected by d using A and

T. □

Theorem 5. Given Definition 1 and Theorems 1 through 4, the algorithm will

correctly compute the logical ripple effect.

Proof. As shown by Lemma 2, for any affected definition dd, the Allow and

Transform sets to be associated with dd are computed in accordance with Theorems 1

to 4. By Lemma 4, if Theorem 1 applies to an affected definition (line 23), then there

is no need to check if any other theorem also applies, because additional dataflow

analysis resulting from the other theorems cannot contribute to the ripple effect.

However, if Theorem 1 does not apply, then the definition must be dataflow analyzed

separately in turn for each theorem that does apply. This is done by the sequence of

three if statements at lines 26, 31, and 36. Thus, the control logic in lines 23 to 46 is

safe.

The Analyze procedure (lines 47 to 52) prepares a definition and its restriction

sets for dataflow analysis by adding them to the stack (lines 50 and 52). Once a

definition will be dataflow analyzed with no restrictions (line 50) it will not be analyzed

again (line 47). By Lemma 4, this is safe. Assuming FIN_dd ≠ T and PATHS ≠ ∅, the

test at line 47 will not prepare a definition for dataflow analysis if both restriction sets








are subsets of any pair of restriction sets used previously to analyze that definition.

This follows from Lemma 3. Thus, the Analyze procedure is safe.

The correctness of the dataflow equations (line 8) is established in Chapter 2,

and the correctness of the three rules for imposing backward-flow restrictions (line 8)

has already been discussed. Regarding the correctness of having no backward-flow

restrictions for the initial definition (line 5), let p be the program point where b occurs.

For execution to attain point p, any possible execution path between the program's

execution starting point and point p can be assumed to have occurred. Thus, there

should be no restrictions on the backward-flow possibilities of b, because there were

no constraints imposed by the ripple effect on how point p was initially attained. □

Programs with recursive calls can be processed by our algorithm, but there may

be some overestimation of the logical ripple effect because of the recursive calls. The

dataflow equations (line 8) are not the problem, as they work for recursive programs.

Instead, the problem is with the Allow set and its representation of execution paths.

If a cyclic execution path is represented in the Allow set, then when the Allow set is

used to restrict backward flow by Rule 2, it may be possible for an element moving

through the program flowgraph to take a shortcut on its unmatched returns and avoid

having to make unmatched returns along the complete cycle before a program point

can be attained. This shortcut may permit the element to affect something that it

should not be able to affect, possibly adding to the ripple effect beyond what should

be there.

3.3 A Prototype Demonstrates the Algorithm

This section first considers the complexity of our interprocedural logical ripple

effect algorithm. A prototype that demonstrates the algorithm is then described, and

test results presented.








Let n be the number of nodes in the flowgraph of the input program. For a

programming language such as C, solving the dataflow equations for a single defi-

nition, which is what line 8 does, has worst-case complexity of O(n). Let k be the

number of known calls in the input program. Considering line 47, a definition may be

dataflow analyzed repeatedly as long as the associated restriction sets are not subsets

of any previous pair of restriction sets used to dataflow analyze that definition. The

number of different restriction sets possible such that no set is a subset of another

set, is clearly a number that will grow exponentially with k. Thus, the worst-case

complexity of our logical ripple effect algorithm is exponential, where the exponent

is some function of k. However, for the typical input program, the actual number of

non-subset restriction sets that can be generated by our algorithm for a given defini-

tion, will be severely constrained by a combination of Lemma 1, Theorems 1 through

4, and the typical program call structure that is characterized by shallow call depth.
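
The repetition test at line 47 can be illustrated with a short Python sketch. It is only a sketch under stated assumptions, not the prototype's code: the names `should_analyze` and `analyze`, the `state` dictionary, and the modeling of restriction sets as frozensets of call identifiers are all hypothetical choices made for this example.

```python
def should_analyze(fin, paths, trans, saved_pairs):
    """Line 47: decide whether a definition needs another dataflow pass."""
    if fin:            # already analyzed with no restrictions (Lemma 4)
        return False
    if not paths:      # an empty Allow set means an unrestricted analysis
        return True
    # Skip the new pair if some saved pair already covers it (Lemma 3).
    return all(not (paths <= p and trans <= t) for p, t in saved_pairs)


def analyze(dd, paths, trans, state, stack):
    """Lines 47 to 52: push dd with its restriction sets when warranted."""
    fin = state["fin"].get(dd, False)
    saved = state["saved"].setdefault(dd, [])
    if not should_analyze(fin, paths, trans, saved):
        return
    if not paths:
        state["fin"][dd] = True
        stack.append((dd, frozenset(), frozenset()))
    else:
        saved.append((paths, trans))
        stack.append((dd, paths, trans))
```

Because only pairs that are not covered by an earlier pair are saved, the saved pairs for a definition can grow only while genuinely new call combinations keep appearing, which is the source of the exponential worst case discussed above.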

A prototype that demonstrates our logical ripple effect algorithm has been built.

The prototype accepts as input C programs that satisfy certain constraints, such as

having only single-identifier variable names. Given an input program, the prototype

then requires that one or more definitions be identified as the starting point of the

ripple effect. For purposes of comparison, besides using our algorithm to compute

a precise logical ripple effect, the prototype also computes an overestimate of the

logical ripple effect. The overestimate is computed by simply ignoring the execution-

path problem, i.e. there are no backward-flow restrictions when the overestimate is

computed. The worst-case complexity of computing the overestimate for C programs

is only O(nd) where n is the number of flowgraph nodes and d is the number of

definitions in the overestimated ripple effect. This complexity follows from the O(n)

complexity of solving the dataflow equations for a single definition, and the fact that

the equations will have to be solved d times.










Table 3.1. Experimental results for the prototype.


global  defs  defs global  depth  nodes    RSo   RSp  reduction  time_o  time_p
    50  2420           7%  2/213   3939   2275   936      53.4%      5s      3s
   100  2291          15%  2/188   3776   4151  2449      41.0%     17s     13s
   200  2294          30%  2/188   3662   5594  3718      33.5%     40s     32s
   300  2370          45%  2/231   3962   5897  2607      55.8%    1m5s     27s
    50  2225           7%  3/202   3717   1222   633      40.3%      3s      2s
   100  2333          15%  3/229   3864   4139  1867      54.9%     17s      7s
   200  2211          30%  3/231   3760   4884  2688      45.0%     39s     28s
   300  2236          45%  3/205   3737   5308  3505      34.0%     59s     38s
    50  2320           7%  4/227   3912   1822  1067      35.1%      5s      3s
   100  2211          15%  4/228   3673   4329  1525      64.8%     18s      7s
   200  2223          30%  4/227   3705   5019  1918      61.8%     37s     16s
   300  2214          45%  4/214   3648   5922  4740      20.0%    1m9s   1m36s
   100  4354           7%  2/372   6858   4317  2201      40.0%     19s     10s
   200  4467          15%  2/368   7068   8844  6457      27.0%   1m17s   1m12s
   400  4261          30%  2/388   6851   9653  2976      69.2%   2m29s     49s
   600  4289          45%  2/340   6784  10590  6840      35.4%    4m8s   3m56s
   100  4314           7%  3/432   6781   1993   631      52.5%      8s      2s
   200  4268          15%  3/395   6876   5795  3236      35.5%     51s     54s
   400  4223          30%  3/393   6735   9240  7307      20.9%   2m26s   4m21s
   600  4248          45%  3/433   6868   9772  6453      30.6%   3m56s   4m50s
   100  4252           7%  4/455   6961   2756  1120      42.6%     14s      5s
   200  4276          15%  4/440   6858   7781  5752      26.1%   1m10s   2m35s
   400  4228          30%  4/391   6681   9838  8290      15.7%   2m45s   9m20s
   600  4112          45%  4/462   6802  10017  9192       8.2%   4m24s  39m55s


Table 3.1 presents test results for the prototype. Each row details relevant char-
acteristics of an input program, and presents the resulting averages of ten different
tests of that input program, where each test computed the ripple effect started by a
single, randomly chosen definition of a global variable.
The input programs of Table 3.1 were randomly generated by a separate pro-
gram generator. The generated input programs are syntactically correct and compile
without error, but have meaningless executions. Each input program of Table 3.1 has
100 procedures, and exactly the number of global variables listed. Within each input








program, each global variable is defined and used at least once. The call structure of

each input program was determined randomly by the generator, with the constraint

that there be no recursion in the input program, and the given maximum call depth

not be exceeded by any call in the input program. All calls in the generated input

program are known calls, and approximately 1/(max + 1) of the calls will be at each

possible depth from zero to max, where max is the given maximum call depth.

Referring to the columns of Table 3.1, "global" is the number of global variables

in the input program, "defs" is the number of definitions in the input program, "defs

global" is the percentage of the definitions that define a global variable, "depth" is

the maximum call depth followed by the total number of calls in the input program,

"nodes" is the number of nodes in the flowgraph, "RSo" is the average size of the

overestimated ripple effect for the ten test cases where size is the total number of

definitions and uses in the ripple effect, "RSp" is the average size of the precise

ripple effect, "reduction" is the average percentage reduction for the ten test cases

of the size of the overestimated ripple effect when it is replaced by the precise ripple

effect, "time_o" is the average CPU usage time for each test case to compute the

overestimated ripple effect, and "time_p" is the average CPU usage time for each test

case to compute the precise ripple effect. The hardware used was rated at roughly

24 MIPS. As an example of the time notation used in Table 3.1, time 1m36s would

be read as 1 minute, 36 seconds.

Although the worst-case complexity of our algorithm for precise logical ripple

effect is exponential, the data of Table 3.1 indicates that the expected complexity for

a wide range of input programs, given a programming language such as C, is approxi-

mated by O(nd). This follows from the O(nd) worst-case complexity of computing the

overestimate, and the typical closeness of time_o and time_p for each row in Table 3.1.

However, the last row of Table 3.1 is instructive, because it shows that regardless of

what the expected complexity might be, there will always be specific input programs








and starting points that require time greatly exceeding the time required to compute

the overestimate. In practice, if the computation of the precise logical ripple effect

is taking too long, then this computation can be abandoned and the overestimate

computed and used in its place. Note that our algorithm can very easily compute

the overestimate by simply modifying Rule 1 so that element d2 is always generated

in place of element d1, thereby avoiding all backward-flow restrictions.
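
To make the comparison of the two time columns concrete, the following Python sketch converts the time notation of Table 3.1 to seconds and computes the ratio of the precise time to the overestimate time for three rows. The helper name `to_seconds` and the row labels are assumptions introduced for this illustration; the time values are copied from the table.

```python
import re


def to_seconds(t):
    """Convert Table 3.1 time notation such as '1m36s' or '59s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time: {t!r}")
    return int(m.group(1) or 0) * 60 + int(m.group(2))


# (overestimate time, precise time) pairs copied from Table 3.1
rows = {
    "50 globals, depth 2": ("5s", "3s"),
    "300 globals, depth 4": ("1m9s", "1m36s"),
    "600 globals, depth 4": ("4m24s", "39m55s"),
}
for label, (t_o, t_p) in rows.items():
    ratio = to_seconds(t_p) / to_seconds(t_o)
    print(f"{label}: precise/overestimate = {ratio:.2f}")
# prints 0.60, 1.39, and 9.07 for the three rows
```

For the last row the precise computation takes roughly nine times as long as the overestimate, which is the kind of case where falling back to the overestimate may be preferable.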

3.4 The Slicing Algorithm

This section presents the inverse form of the precise interprocedural logical

ripple effect algorithm, and the inverse form of the associated dataflow equations and

backward-flow restriction rules. Our algorithm for precise interprocedural slicing is

shown in Figure 3.5. The complexity and expected performance of this algorithm

is the same as for the precise interprocedural logical ripple effect algorithm given

previously.

For logical ripple effect, the dataflow problem solved at line 8 was reaching

definitions for a single definition. For slicing, which is the inverse problem, the

dataflow problem solved at line 8 will be reaching uses for a single use. In reaching

definitions, the definition flows in the direction of the arcs in the flowgraph, and is

killed by definitions of the same variable, and affects uses of the same variable and

any definitions directly dependent on an affected use. In reaching uses, the use flows

in the reverse direction of the arcs in the flowgraph, and is killed by definitions of the

same variable, and affects definitions of the same variable and any uses that directly

determine an affected definition. This reverse flow in the flowgraph means that the

dataflow equations solved at line 8 for the slicing algorithm must be an inverted form

of the dataflow equations that are used for the logical ripple effect algorithm. These

inverted dataflow equations are shown in Figure 3.6. The inverted rules that the

slicing algorithm uses for backward-flow restriction are given below. Notice that the

ALLOW and TRANSFORM sets will contain returns instead of calls.
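
The reverse flow of reaching uses can be seen in a much simplified, intraprocedural Python sketch. It assumes a single procedure with no calls, so it keeps one set per node instead of separate B and E sets and applies no backward-flow restrictions; the function name `reaching_uses` and the dictionary-based flowgraph are hypothetical.

```python
def reaching_uses(succ, gen, kill):
    """Iteratively solve a backward reaching-uses problem for one procedure.

    succ: node -> list of successor nodes
    gen:  node -> set of uses generated at the node
    kill: node -> set of uses killed by a definition at the node
    """
    nodes = list(succ)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # OUT is gathered from the IN sets of the successors (reverse flow).
            new_out = set()
            for s in succ[n]:
                new_out |= in_sets[s]
            new_in = (new_out - kill[n]) | gen[n]
            if new_out != out_sets[n] or new_in != in_sets[n]:
                out_sets[n], in_sets[n] = new_out, new_in
                changed = True
    return in_sets, out_sets


# Nodes 0 and 1 both define x; node 3 contains the use "u" of x.
succ = {0: [1], 1: [2], 2: [3], 3: []}
gen = {0: set(), 1: set(), 2: set(), 3: {"u"}}
kill = {0: {"u"}, 1: {"u"}, 2: set(), 3: set()}
in_sets, out_sets = reaching_uses(succ, gen, kill)
print("u" in out_sets[1])  # True: the use reaches the definition at node 1
print("u" in out_sets[0])  # False: the definition at node 1 kills the use
```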














Compute the slice for a hypothetical or actual use b
Input: a program flowgraph ready for dataflow analysis
Output: the slice in SLICE
begin
1    SLICE ← ∅
2    for each use uu in the program
3        FIN_uu ← F
     end for
4    stack ← ∅
5    push (b, ∅, ∅) onto stack
6    while stack ≠ ∅ do
7        pop stack into (u, ALLOW, TRANSFORM)
8        Solve the reaching-uses dataflow equations for the single use u,
         using Rules 1, 2, and 3.
9        SLICE ← SLICE ∪ {u}
10       for each definition d in the program that is affected by either u1 or u2
11           SLICE ← SLICE ∪ {d}
         end for
12       ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13       for each return node n in the flowgraph
14           if u1 ∈ B_in[n] ∧ u1 crossed from this return into the returned-from procedure
15               ROOT1 ← ROOT1 ∪ {the return node n}
             fi
16           if u1 ∈ E_in[n] ∧ u1 crossed from this return into the returned-from procedure
17               LINK1 ← LINK1 ∪ {the return node n}
             fi
18           if u2 ∈ B_in[n] ∧ u2 crossed from this return into the returned-from procedure
19               ROOT2 ← ROOT2 ∪ {the return node n}
             fi
20           if u2 ∈ E_in[n] ∧ u2 crossed from this return into the returned-from procedure
21               LINK2 ← LINK2 ∪ {the return node n}
             fi
         end for


Figure 3.5. The slicing algorithm.









22       for each use uu in the program that is affected by either u1 or u2
             determine Allow and Transform for uu by Theorem 1
23           if u2 ∈ B_out[node where uu occurs]
24               PATHS ← ∅, TRANS ← ∅
25               call Analyze
             else
                 determine Allow and Transform for uu by Theorem 2
26               if u2 ∈ E_out[node where uu occurs]
27                   PATHS ← ∅
28                   PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x returns from the
                         procedure that contains uu ∨ x returns from a procedure
                         that contains a return r ∈ (PATHS ∩ LINK2))}
29                   TRANS ← ROOT2 ∩ PATHS
30                   call Analyze
                 fi
                 determine Allow and Transform for uu by Theorem 3
31               if u1 ∈ B_out[node where uu occurs]
32                   PATHS ← ∅
33                   PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that
                         contains uu ∨ x returns from a procedure that contains
                         a return r ∈ PATHS)}
34                   TRANS ← TRANSFORM ∩ PATHS
35                   call Analyze
                 fi
                 determine Allow and Transform for uu by Theorem 4
36               if u1 ∈ E_out[node where uu occurs]
37                   for each procedure X that contains a return ∈ ROOT1
38                       RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39                       PP ← ∅
40                       PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                             begins with a return ∈ RT1 and ends with a return from
                             the procedure that contains uu, such that each return
                             in this path is in (RT1 ∪ LINK1))}
41                       if PP ≠ ∅
42                           PATHS ← ∅
43                           PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X
                                 ∨ x returns from a procedure that contains
                                 a return r ∈ PATHS)}
44                           TRANS ← TRANSFORM ∩ PATHS
45                           PATHS ← PATHS ∪ PP
46                           call Analyze
         end statements: fi, end for, fi, fi, end for, od
end


Figure 3.5. continued.









Procedure Analyze
begin
     avoid repetition of uu dataflow analysis if possible
47   if FIN_uu ≠ T ∧ (PATHS = ∅
         ∨ (true for all saved pairs for uu: PATHS ⊈ P ∨ TRANS ⊈ T))
48       if PATHS = ∅
49           FIN_uu ← T
50           push (uu, ∅, ∅) onto stack
         else
51           save PATHS and TRANS as the pair P × T for uu
52           push (uu, PATHS, TRANS) onto stack
         fi
     fi
end


Figure 3.5. continued.

Rule 1. If ALLOW = ∅ then element u2 is generated at the node where use u

occurs, otherwise u1 is the generated element.

Rule 2. Let n be a call node, p be the associated return node, and q be the entry

node of the returned-from procedure. Each time the B_out[n] equation is computed, if

u1 ∈ B_in[q], then u1 cannot cross from B_in[q] into the B_out[n] set if p ∉ ALLOW.

Rule 3. Let n be a call node, p be the associated return node, and q be the entry

node of the returned-from procedure. Each time the B_out[n] equation is computed,

if u1 ∈ B_in[q], and, by C2 and Rule 2, u1 can cross from B_in[q] into the B_out[n] set,

and p ∈ TRANSFORM, then as this u1 element crosses from B_in[q] into the B_out[n]

set, the element is changed to u2. In effect, u1 is transformed into u2, and the call

node n becomes a generation node for the u2 element.
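
The three rules can be read as a small decision procedure. The Python sketch below is one possible encoding, offered only as an illustration: elements are modeled as hypothetical `(use, kind)` tuples with kind 1 or 2, the function names are invented for the example, and condition C2 of the dataflow equations is assumed to have been checked already.

```python
def generate(use, allow):
    """Rule 1: choose the element generated at the node where the use occurs."""
    return (use, 2) if not allow else (use, 1)


def cross_entry_to_call(element, return_node, allow, transform):
    """Rules 2 and 3 for an element crossing from B_in[q] into B_out[n].

    return_node is the return node p associated with call node n.
    Returns the element that arrives in B_out[n], or None if it is blocked.
    """
    use, kind = element
    if kind == 2:                   # the rules restrict only u1 elements
        return element
    if return_node not in allow:    # Rule 2: u1 may not cross
        return None
    if return_node in transform:    # Rule 3: u1 is changed to u2
        return (use, 2)
    return element
```

For example, with ALLOW = {p1, p2} and TRANSFORM = {p1}, a u1 element crossing at p1 arrives as u2, the same element crossing at p2 arrives unchanged, and at any other return node it is blocked.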

As the usefulness of slicing is primarily for program fault localization, it may

be desirable to modify the algorithm so that those uses in control predicates whose

subordinate statements have at least one use or definition already in the slice, are

themselves added to the slice and propagated in turn. An example of a control pred-

icate is the condition tested by an if statement. By subordinate statements is meant










For any node n.

$OUT[n] = E_{out}[n] \cup B_{out}[n]$

$IN[n] = E_{in}[n] \cup B_{in}[n]$

Group I: n is an exit node.

$B_{out}[n] = \emptyset$

$E_{out}[n] = \bigcup_{p \in succ(n)} \{x \mid x \in IN[p] \land C_1\}$

$B_{in}[n] = GEN[n]$

$E_{in}[n] = E_{out}[n] \cup RECODE[n]$

Group II: n is a call node, p is the associated return node and q is the entry node of
the returned-from procedure.

$B_{out}[n] = \{x \mid (x \in B_{in}[p] \land (\overline{C_1} \lor (C_1 \land C_2 \land x \in E_{in}[q]))) \lor (x \in B_{in}[q] \land C_2)\}$

$E_{out}[n] = \{x \in E_{in}[p] \mid \overline{C_1} \lor (C_1 \land C_2 \land x \in E_{in}[q])\}$

$B_{in}[n] = (B_{out}[n] - KILL[n]) \cup GEN[n]$

$E_{in}[n] = E_{out}[n] - KILL[n]$

Group III: n is not an exit or call node.

$B_{out}[n] = \bigcup_{p \in succ(n)} B_{in}[p]$

$E_{out}[n] = \bigcup_{p \in succ(n)} E_{in}[p]$

$B_{in}[n] = (B_{out}[n] - KILL[n]) \cup GEN[n]$

$E_{in}[n] = E_{out}[n] - KILL[n]$


Figure 3.6. Dataflow equations for the reaching-uses problem.







those statements whose execution is decided by the control predicate. Including these

control-predicate uses in the slice is advantageous because the cause of a program

error may actually be in a control predicate that is not deciding correctly when to

execute its subordinate statements. Ferrante et al. [8] present a method to precisely

determine the control predicates for each statement.














CHAPTER 4
INTERPROCEDURAL PARALLELIZATION

4.1 Loop-Carried Data Dependence

This section explains loop-carried data dependence and its relevance to paral-
lelization. When a definition of a variable reaches a use of that variable, then a data

dependence exists such that the use depends on the definition. An example of data
dependence can be seen in Figure 4.1. The use of A(I) at line 3, and the use of A(I)
at line 4, both depend on the definition of A(I) at line 2. However, when considering

whether or not a loop can be parallelized, there is a special kind of data dependence
called loop-carried data dependence [25]. A data dependence is loop carried if the
value set by a definition inside the loop during loop iteration i can be used by a use
of that variable inside the loop during loop iteration j, where i ≠ j. Note that i ≠ j
is specified instead of the more restrictive and natural seeming i < j, because if the
loop is parallelized then the ordering of the loop iterations cannot be assumed.

The relationship between loop-carried data dependence and parallelization is
straightforward. If there is at least one loop-carried data dependence, then the loop

cannot be parallelized, otherwise the loop can be parallelized. Loop parallelization


1 DO I = 1,N
2 A(I)= B(I) C(I) + D
3 B(I)= C(I) / D + A(I)
4 IF C(I) < 0 THEN C(I) = A(I) B(I) FI
END DO


Figure 4.1. An example loop.








would mean that the ordering of the different iterations of the loop is unimportant,

whereas a loop-carried dependence means the opposite. If there are no loop-carried

data dependencies then there is no requirement that the iterations be ordered a

certain way. However, whenever a loop is parallelized, there should be a following,

added, serial step that sets the iteration variables, such as the I in Figure 4.1, to

whatever their values would be for the last iteration of the loop, assuming the loop

had not been parallelized. This added step would be necessary, assuming the iteration

variables of a loop are visible outside the loop and can therefore be referenced after

the loop completes. Iteration variables are those variables that are incremented or

decremented a constant value for each loop iteration. The recognition of iteration

variables is language-dependent.
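
The claim that iteration order is unimportant exactly when no dependence is loop carried can be checked on two toy loops. The Python sketch below is only an illustration with hypothetical loop bodies (they are not the loop of Figure 4.1): each body is run once in increasing and once in decreasing iteration order, and the resulting arrays are compared.

```python
def run(body, n, order):
    """Execute body(a, b, i) for each i in the given iteration order."""
    a = [0] * (n + 1)
    b = list(range(n + 1))
    for i in order:
        body(a, b, i)
    return a


def independent(a, b, i):
    a[i] = b[i] + 1          # uses only values belonging to iteration i


def carried(a, b, i):
    a[i] = a[i - 1] + b[i]   # uses the value defined in iteration i - 1


n = 5
forward = range(1, n + 1)
backward = range(n, 0, -1)
print(run(independent, n, forward) == run(independent, n, backward))  # True
print(run(carried, n, forward) == run(carried, n, backward))          # False
```

The second loop gives different results under the two orders because the definition of a[i - 1] in one iteration is used in another, which is a loop-carried data dependence.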

Regarding data dependence and arrays, there are several efficient tests available

that determine if a data dependence is possible between a particular definition and

use of an array. The tests are the separability test, the gcd test, and the Banerjee

test. Details of these three tests can be found in [25]. The number theory behind the

tests is linear diophantine equations. A linear diophantine equation can be formed

from the array subscripts of the definition and use in question. For example, in

Figure 4.2 we want to know if A(3 * I - 5) and A(6 * I) can ever refer to the

same array element. The linear diophantine equation that relates these two array

references would be 3x - 6y = 5. The question now becomes does this equation have

any integer solutions given the boundary conditions 30 ≤ x, y ≤ 100. If there is at

least one integer solution, then there would be a data dependence, otherwise there is

no data dependence, as is the case with Figure 4.2.
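
For the example equation, the gcd test alone settles the question. The following Python sketch shows the divisibility check together with an exhaustive check over the loop bounds; the function names are assumptions made for this illustration, and the coefficients are assumed not to be both zero.

```python
from math import gcd


def gcd_test(a, b, c):
    """True if a*x - b*y = c may have an integer solution (bounds ignored)."""
    return c % gcd(a, b) == 0


def bounded_solution_exists(a, b, c, lo, hi):
    """Exhaustive check of a*x - b*y = c for lo <= x, y <= hi."""
    return any(a * x - b * y == c
               for x in range(lo, hi + 1)
               for y in range(lo, hi + 1))


print(gcd_test(3, 6, 5))                          # False
print(bounded_solution_exists(3, 6, 5, 30, 100))  # False
```

Since gcd(3, 6) = 3 does not divide 5, the equation has no integer solution at all, so the bounds never need to be considered. When the gcd test passes, a dependence is still only possible, and a test that takes the bounds into account, such as the Banerjee test, may rule it out.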

For the discussion that follows, we define the term loop body. The loop body

of any loop L will be all statements in the program that can possibly be executed

during the iterations of loop L. Calls are allowed in a loop, so a single loop body

could conceivably include the statements of many different procedures. For example,









DO I = 30,100
   A(3 * I - 5) = ...

   ... A(6 * I)
END DO


Figure 4.2. A loop with array references.

if a loop contains a call of procedure A, and procedure A contains a call of procedure

B, then the loop body would include all the statements of procedures A and B. In

Figure 4.1, the loop body is the four statements at lines 1 through 4.

With respect to the program flowgraph, the loop body is all flowgraph nodes

that may be traversed during the iterations of the loop. Let LB be the set of flowgraph

nodes that are in the loop body of loop L. Let n be the first node in the loop body

that is traversed during each iteration of the loop. The identification of node n is

language-dependent. Within the loop body of L, let definition d be a definition of

a non-array variable v, and let use u be a use of the variable v that is reached by

definition d. Let d be the node in the loop body where definition d occurs, and let

u be the node in the loop body where the use u occurs. To avoid the complications

posed by special cases, we assume that d, n, and u are separate and distinct nodes.

Although use u depends on definition d because definition d reaches use u,

this data dependence can prevent parallelization of loop L only if the dependence is

loop carried. Let P be a sequence of flowgraph nodes drawn from LB, such that P

represents a possible execution path along which definition d can reach use u. For

definition d to be loop-carried to use u along path P, the three nodes, d, n, and u,

must be in P, and in that order, because only the traversal of node n represents the

transition to a different iteration of the loop. If v is an array, then we assume that

definition d and use u may refer to different array elements during the same iteration.

For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must








be assumed to show a loop-carried data dependence when v is an array, whereas this

path P does not show a loop-carried data dependence if definition d and use u always

refer to the same storage location during any iteration, as we assume is the case when

v is a non-array, because in any iteration that follows such a path P, the value used

at use u is always the value defined at definition d in that same iteration.

4.2 The Parallelization Algorithm

This section presents in Figure 4.3 an algorithm that identifies loops that can be

parallelized, including loops that contain calls. The algorithm uses our interprocedu-

ral dataflow analysis method as an integral step to determine data dependencies. The

loops that can be parallelized are those loops that are not marked by the algorithm

as inhibited.

The algorithm has three distinct steps. First, the reaching-definitions dataflow

problem is solved for the input program by using our interprocedural dataflow analysis

method. Second, the quality of the reaching-definition information computed by the

first step is possibly improved in the case of array references by using the separability,

gcd, and Banerjee tests. Third, individual d, u pairs that represent data dependence

are examined for loop-carried data dependence.

At line 7, the definitions and uses of iteration variables are excluded from testing

for loop-carried data dependence, because for any iteration the iteration variables will

have constant values that can be precomputed if loop L is parallelized. The test at

line 8 is a necessary condition for the P-test procedure to return a T, which is tested

for at line 9. The test at line 8 is done as an economy measure to avoid, when

possible, the more costly P-test.

Procedure P-test uses a straightforward algorithm that begins with node d and

then spreads out examining successors, successors of successors, and so on, until either

there are no more acceptable nodes to examine, in which case F is returned, or all the

requirements for path P have been met, in which case T is returned. The successors













a d, u pair is a definition d that reaches a use u
x is the dataflow element that represents the definition d
v is the variable referenced by definition d and use u

to avoid complications, n ≠ d ≠ u is assumed
n is the first node traversed during each loop L iteration
d is the node whose basic block contains definition d
u is the node whose basic block contains use u

LB is the set of nodes in the loop body of loop L
IV is the set of definitions of iteration variables for loop L

begin
     step 1, determine reaching definitions for the input program
1    use our method to solve the reaching-definitions dataflow problem

     step 2, improve the reaching-definition information for array references
2    for all d, u pairs in the program, such that v is an array
3        use the separability, gcd, and Banerjee tests as applicable
4        if definition d and use u can never reference the same element
5            mark the d, u pair as non-reaching
         fi
     end for

     step 3, identify d, u pairs that inhibit parallelization
6    for each loop L in the program
7        for each reaching d, u pair such that d, u ∈ LB and definition d ∉ IV
8            if x ∈ B_out[n]
9                if P-test(x, n, d, u, L, LB) = T
10                   mark L parallelization as inhibited by the d, u pair
                 fi
             fi
         end for
     end for
end


Figure 4.3. The parallelization algorithm.

















procedure P-test(x, n, d, u, L, LB)
is there a loop-carried data dependence from definition d to use u thru node n
return T if yes, F if no
begin
     part1, is there a path from d to n along which x is found
11   if v is an array
12       DONE ← {d}
     else
13       DONE ← {d, u}
     fi
14   NEXT ← {d}
15   until NEXT = ∅
16       remove a node from NEXT, denote it p
17       for each successor node s of node p, such that s ∉ DONE
18           DONE ← DONE ∪ {s}
19           if s ∉ LB
                ∨ s is an entry node
                ∨ x ∉ B_out[s]
20               ignore s
21           else if s = n
22               goto part2
             else
23               NEXT ← NEXT ∪ {s}
             fi
         end for
     end until
24   return F


Figure 4.3. continued.


















part2:
     part2, is there a path from n to u along which x is found
25   if v is an array
26       DONE ← {n}
     else
27       DONE ← {n, d}
     fi
28   NEXT ← {n}
29   until NEXT = ∅
30       remove a node from NEXT, denote it p
31       for each successor node s of node p, such that s ∉ DONE
32           DONE ← DONE ∪ {s}
33           if s ∉ LB
                ∨ s is an exit node
                ∨ (s is contained in the same procedure that contains L ∧ x ∉ B_out[s])
                ∨ (s is not contained in the same procedure that contains L ∧ x ∉ E_out[s])
34               ignore s
35           else if s = u
36               return T
             else
37               NEXT ← NEXT ∪ {s}
             fi
         end for
     end until
38   return F
end


Figure 4.3. continued.








of a node are examined because normally a successor node is assumed to represent a

possible continuation of the execution path from the point of the predecessor node.

Exceptions in the algorithm involving entry and exit nodes are explained shortly.

Note that P-test only determines whether a satisfactory path P exists or not; it does

not determine what path P is in terms of an actual node sequence, as there may be

many such satisfactory paths P. Lines 13 and 27 are active when v is not an array.

In this case, a path P that includes d, u, n, d, u, in that order, is not allowed, and

this is prevented by marking the unwanted node u at line 13, and the unwanted node

d at line 27.

The test of x ∉ B_out[s] at line 19 satisfies the requirement that the definition

d can reach along the path P. A similar test is made at line 33. At line 19, only

the B set is checked because there are no descents into called procedures, as per

the rejection of entry nodes at line 19. Entry nodes are rejected at line 19 because

any path from d to n will not leave unreturned calls, because n is an outermost node

relative to the loop body, and the path is confined to the loop body. As the successors

of each call node are an entry node and a return node, it is only necessary to check

the out set of the return node to know whether the element x survived the call or

not, and this is effectively done by the x ∉ B_out[s] test already mentioned. At line 33,

exit nodes are rejected because any path from n to u will not make a return without

first making the call. This follows from the fact, already mentioned, that node n is

an outermost node relative to the loop body, and the path is confined to the loop

body. As the return node can always be added to the path P from the call node,

there is no need to add it from the exit node, hence the rejection of the exit node.

For part1 and part2 in procedure P-test, each flowgraph node may appear only

once in the NEXT set, hence the complexity of the P-test procedure is O(n) where n

is the number of flowgraph nodes. For the entire algorithm, step 3 dominates, so the







complexity is O(lpn) where l is the number of loops in the program, p is the number

of d,u pairs in the program, and n is the number of flowgraph nodes.
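
The two-part search of procedure P-test can be sketched in Python as follows. This is a simplified illustration rather than a transcription of Figure 4.3: the function name `p_test`, the dictionary-based flowgraph, and the `reaches` callback are assumptions, and the callback stands in for the B and E set membership tests at lines 19 and 33.

```python
def p_test(succ, loop_body, entry_nodes, exit_nodes, reaches, n, d, u, is_array):
    """Return True if definition node d can reach use node u through node n.

    succ:    node -> iterable of successor nodes
    reaches: reaches(s) is True when element x is in the relevant out set of s
    """
    def search(start, goal, blocked, rejected):
        done = {start} | blocked          # the DONE set
        todo = [start]                    # the NEXT set
        while todo:
            p = todo.pop()
            for s in succ.get(p, ()):
                if s in done:
                    continue
                done.add(s)
                if s not in loop_body or s in rejected or not reaches(s):
                    continue              # ignore s
                if s == goal:
                    return True
                todo.append(s)
        return False

    # part1: path from d to n; for a non-array the path must avoid u.
    if not search(d, n, set() if is_array else {u}, entry_nodes):
        return False
    # part2: path from n to u; for a non-array the path must avoid d.
    return search(n, u, set() if is_array else {d}, exit_nodes)


# Loop body n -> d -> u -> n, with an exit arc from n to "out".
succ = {"n": ["d", "out"], "d": ["u"], "u": ["n"]}
body = {"n", "d", "u"}
print(p_test(succ, body, set(), set(), lambda s: True, "n", "d", "u", False))  # False
print(p_test(succ, body, set(), set(), lambda s: True, "n", "d", "u", True))   # True
```

In the example the use follows the definition within the same iteration, so no loop-carried dependence is reported for a non-array variable, while for an array the path d, u, n, d, u must be assumed to carry a dependence. Each node enters the work list at most once per part, matching the bound stated above.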















CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

5.1 Summary of Main Results

The first part of this work presented a new method for context-dependent, flow-

sensitive interprocedural dataflow analysis. The method was shown to produce a

precise, low-cost solution for such fundamental and important problems as reaching

definitions and available expressions, regardless of the actual call structure of the

program being analyzed. By using a separate set to isolate calling-context effects,

and another set to accumulate body effects, the calling-context problem has been

reduced to the problem of solving the dataflow equations that compute the different

sets. These equations can be solved by the iterative algorithm. As part of our

method, the interprocedural kill effects of call-by-reference formal parameters are

correctly handled by the equations-compatible technique of element recoding.

The importance of our interprocedural analysis method lies in the fact that

a number of different applications depend on the solution of fundamental dataflow

problems such as reaching definitions, live variables, definition-use and use-definition

chains, and available expressions. Program revalidation, dataflow anomaly detection,

compiler optimization, automatic vectorization and parallelization, and software tools

that make a program more understandable by revealing data dependencies, are some

of the applications that may benefit by using our method.

The second part of this work presented new algorithms for precise interprocedu-

ral logical ripple effect and slicing. The algorithms use our interprocedural dataflow

analysis method, and add a control mechanism by which, in effect, execution-path








history can affect execution-path continuation as the ripple effect or slice is built

piece by piece.

The importance of our algorithms for precise interprocedural logical ripple effect

and slicing lies in their applicability to the areas of software maintenance and debug-

ging. A precise interprocedural logical ripple effect can be used to show a programmer

the consequences of program changes, thereby reducing errors and maintenance cost.

Similarly, a precise interprocedural slice can localize program faults, thereby saving

programmer effort and debugging cost.

The third part of this work presented an algorithm that identifies loops that

can be parallelized, including loops that contain calls. The algorithm makes use of

our interprocedural dataflow analysis method to determine data dependencies, and

then the algorithm examines the data dependencies within each loop and determines

if any of these data dependencies are loop-carried, in which case parallelization of the

loop is inhibited. The algorithm has potential use in parallelization tools.

5.2 Directions for Future Research

There are several topics of possible future research related to our method for

interprocedural dataflow analysis. Regarding solving the equations, besides the

iterative algorithm there are elimination algorithms [20] that have better complexity.

Further studies are needed to determine to what extent these other algorithms can

be used to solve the equations. Another topic regards the dataflow problems that can

be solved by our method, as the actual universe of solvable problems remains to be

determined. We have only mentioned a few of the better known problems. For some

dataflow problems, it may be that our method can be used after suitable modification

to adapt it to the special needs of the problem.
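
As a point of reference for the iterative algorithm mentioned above, the following sketch solves the conventional forward-flow-or equations (IN is the union of the predecessors' OUT sets, and OUT is IN minus KILL, united with GEN) by repeated passes until no set changes. It is only an illustration under simplifying assumptions: it is intraprocedural, it does not split the sets into entry and body sets, and the function name solve_forward_or and the four-node flowgraph are hypothetical.

```python
def solve_forward_or(nodes, preds, gen, kill):
    """Iterate the forward-flow-or equations to a fixed point.

    nodes: list of node identifiers
    preds: dict mapping each node to a list of its predecessors
    gen, kill: dicts mapping each node to its GEN and KILL sets
    """
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set(gen[n]) for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # Confluence operator: set union over all predecessors.
            new_in = set()
            for p in preds[n]:
                new_in |= out_sets[p]
            new_out = (new_in - kill[n]) | gen[n]
            if new_in != in_sets[n] or new_out != out_sets[n]:
                in_sets[n], out_sets[n] = new_in, new_out
                changed = True
    return in_sets, out_sets


# Reaching definitions on a hypothetical flowgraph:
#   node 1: d1 defines x      node 2: d2 defines y
#   node 3: d3 defines x      node 4: join point
#   edges: 1->2, 2->3, 2->4, 3->4
nodes = [1, 2, 3, 4]
preds = {1: [], 2: [1], 3: [2], 4: [2, 3]}
gen = {1: {"d1"}, 2: {"d2"}, 3: {"d3"}, 4: set()}
kill = {1: {"d3"}, 2: set(), 3: {"d1"}, 4: set()}

in_sets, out_sets = solve_forward_or(nodes, preds, gen, kill)
print(sorted(in_sets[4]))
```

At the join point both definitions of x reach, because one path passes through node 3 and the other bypasses it. An elimination algorithm would compute the same sets by reducing the flowgraph rather than by repeated passes over it.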

Regarding possible future research related to our algorithms for precise interprocedural

logical ripple effect and slicing, the algorithms may overestimate, either when

recursive calls are present or because the Allow set lacks the information needed

to enforce the ordering of unmatched returns. One area of future research would

therefore be to investigate the possibility of modifying Definition 1, Theorems 1

through 4, and the algorithms, so as to remove the possibility of such overestimation.














REFERENCES


[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the
SIGPLAN 90 Conference on Programming Language Design and Implementation.
ACM SIGPLAN Notices, 25, 6 (June 1990), 246-256.
[2] Aho, A., Sethi, R., and Ullman, J. Compilers: Principles, Techniques, and Tools.
Addison-Wesley, Reading, MA (1986).
[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress
1974, North Holland, Amsterdam (1974), 398-402.
[4] Banning, J. An efficient way to find the side effects of procedure calls and the
aliases of variables. Conference Record of the 6th ACM Symposium on Principles
of Programming Languages, ACM, New York (Jan. 1979), 29-41.
[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and
parallelization. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction,
162-175.
[6] Callahan, D. The program summary graph and flow-sensitive interprocedural
data flow analysis. Proceedings of the SIGPLAN 88 Conference on Programming
Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July
1988), 47-56.
[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time.
Proceedings of the SIGPLAN 88 Conference on Programming Language Design
and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 57-66.
[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph
and its use in optimization. ACM Transactions on Programming Languages and
Systems, 9, 2 (1987), 319-349.
[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use
dependencies. Proceedings of the IEEE Computer Society 1990 Int'l Conference
on Computer Languages, New Orleans, LA (March 1990).
[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New
York (1977).
[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence
graphs. ACM Transactions on Programming Languages and Systems, 12, 1 (Jan.
1990), 26-60.
[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures.
Proceedings of the IEEE COMPSAC 88 (Oct. 1988), 220-227.








[13] Johmann, K., Liu, S., and Yau, S. Dataflow Equations for Context-Dependent
Flow-Sensitive Interprocedural Analysis. SERC-TR-45-F, Department of Computer
and Information Sciences, University of Florida, Gainesville (Jan. 1991).
[14] Korel, B., and Laski, J. Dynamic program slicing. Information Processing
Letters, 29, 3 (Oct. 1988), 155-163.
[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification.
Conference Record of the 18th ACM Symposium on Principles of Programming
Languages, ACM, New York (1991), 93-103.
[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions
on Software Engineering, SE-13, 12 (Dec. 1987), 1370-1371.
[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference
Record of the 8th ACM Symposium on Principles of Programming Languages,
ACM, New York (1981), 219-230.
[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental
results. Software-Practice and Experience, 19, 2 (1989), 149-169.
[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM, 26,
2 (April 1979), 322-344.
[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM
Computing Surveys, 18, 3 (Sep. 1986), 277-316.
[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis.
Muchnick, S., and Jones, N. Eds. Program Flow Analysis: Theory and
Applications, Prentice-Hall, Englewood Cliffs, NJ (1981), 189-232.
[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call statements.
Proceedings of the SIGPLAN 86 Symposium on Compiler Construction,
176-185.
[23] Weiser, M. Programmers use slices when debugging. Communications of the
ACM, 25, 7 (July 1982), 446-452.
[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering, SE-10,
4 (July 1984), 352-357.
[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers.
Addison-Wesley, Reading, MA (1990).













BIOGRAPHICAL SKETCH


Kurt Johmann was born in Elizabeth, New Jersey, on November 16, 1955. In

1978 he received a B.A. in computer science from Rutgers University in New Jersey.

Following graduation, he worked for a shipping company, Sea-Land Service Inc., as

a programmer and systems analyst. In 1985 he left Sea-Land and did PC work for

three years. Following this, he entered the graduate program of the Computer and

Information Sciences Department at the University of Florida in the Fall of 1988.

He received an M.S. in computer science, December 1989, and entered the Ph.D.

program. Anticipating graduation, he hopes to find a job in academia.







I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.





Stephen S. Yau, Chairman
Professor of Computer and
Information Sciences



I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.





Richard Newman-Wolfe, Cochairman
Assistant Professor of
Computer and Information Sciences



I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.





Paul Fishwick
Associate Professor of
Computer and Information Sciences







I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.





Mark Yang
Professor of Statistics




This dissertation was submitted to the Graduate Faculty of the College
of Engineering and to the Graduate School and was accepted as partial fulfillment
of the requirements for the degree of Doctor of Philosophy.




May, 1992
Winfred M. Phillips
Dean, College of Engineering





Madelyn M. Lockhart
Dean, Graduate School










































Full Text

PAGE 1

&217(;7'(3(1'(17 )/2:6(16,7,9( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 $1' ,76 $33/,&$7,21 72 6/,&,1* $1' 3$5$//(/,=$7,21 %\ .857 -2+0$11 $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ 81,9(56,7< 2) )/25IIO5 /,%5$5,(6

PAGE 2

$&.12:/('*(0(176 ZRXOG OLNH WR H[SUHVV P\ DSSUHFLDWLRQ DQG JUDWLWXGH WR P\ FKDLUPDQ DQG DGYLVRU 'U 6WHSKHQ 6
PAGE 3



PAGE 4

'LUHFWLRQV IRU )XWXUH 5HVHDUFK 5()(5(1&(6 %,2*5$3+,&$/ 6.(7&+ LY

PAGE 5

$EVWUDFW RI 'LVVHUWDWLRQ 3UHVHQWHG WR WKH *UDGXDWH 6FKRRO RI WKH 8QLYHUVLW\ RI )ORULGD LQ 3DUWLDO )XOILOOPHQW RI WKH 5HTXLUHPHQWV IRU WKH 'HJUHH RI 'RFWRU RI 3KLORVRSK\ &217(;7'(3(1'(17 )/2:6(16,7,9( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 $1' ,76 $33/,&$7,21 72 6/,&,1* $1' 3$5$//(/,=$7,21 %\ .XUW -RKPDQQ 0D\ &KDLUPDQ 'U 6WHSKHQ 6
PAGE 6



PAGE 7

&+$37(5 ,1752'8&7,21 ,QWHUSURFHGXUDO 'DWDIORZ $QDO\VLV 'DWDIORZ DQDO\VLV UHIHUV WR D FODVV RI SUREOHPV WKDW DVN DERXW WKH UHODWLRQVKLSV WKDW H[LVW DORQJ D SURJUDPfV SRVVLEOH H[HFXWLRQ SDWKV EHWZHHQ VXFK SURJUDP HOHn

PAGE 8

ZLOO EH UHIHUUHG WR DV WKH FDOOLQJFRQWH[W SUREOHP 6HFRQG FDOOE\UHIHUHQFH IRUPDO SDUDPHWHUV W\SLFDOO\ FDXVH DOLDV UHODWLRQVKLSV EHWZHHQ DFWXDO DQG IRUPDO SDUDPHWHUV WKDW DUH YDOLG RQO\ IRU FHUWDLQ FDOOV DQG DSSO\ RQO\ WR WKRVH SDVVHV WKURXJK WKH FDOOHG SURFHGXUH WKDW RULJLQDWH IURP WKRVH FDOOV WKDW HVWDEOLVK WKH VSHFLILF DOLDV UHODWLRQVKLS 7KHUH DUH PDQ\ DSSOLFDWLRQV IRU D IORZVHQVLWLYH LQWHUSURFHGXUDO GDWDIORZ DQDOn \VLV PHWKRG WKDW VROYHV WKH WZR PDMRU SUREOHPV DVVXPLQJ WKDW WKH FRVWV RI WKH PHWKRG DUH QRW WRR KLJK 6RPH RI WKH ZHOONQRZQ GDWDIORZ SUREOHPV WKDW FDQ EH SUHFLVHO\ VROYHG E\ VXFK D PHWKRG DUH UHDFKLQJ GHILQLWLRQV OLYH YDULDEOHV WKH UHODWHG SUREOHPV RI GHILQLWLRQXVH DQG XVHGHILQLWLRQ FKDLQV DQG DYDLODEOH H[SUHVVLRQV $SnfV QRQNLOOHG FDOOLQJ FRQWH[W LV SUHVHUYHG E\ PHDQV RI D VLPSOH LQWHUn VHFWLRQ RSHUDWLRQ GRQH DW WKH UHWXUQ QRGH IRU WKH FDOO 7KH PDLQ DGYDQWDJH RI RXU PHWKRG LV LWV ORZ FRPSOH[LW\ DQG WKH IDFW WKDW WKH SUHVHQFH RI UHFXUVLRQ GRHV QRW DIIHFW WKH SUHFLVHQHVV RI WKH UHVXOW 7KH ODQJXDJH PRGHO DVVXPHG IRU &KDSWHU DOORZV JOREDO YDULDEOHV EXW WKH YLVLELOLW\ RI HDFK IRUPDO SDUDPHWHU LV OLPLWHG WR WKH VLQJOH SURFHGXUH WKDW GHFODUHV

PAGE 9

LW 7KXV ZLWK WKH H[FHSWLRQ RI D FDOO DQG LWV LQGLUHFW UHIHUHQFH HDFK IRUPDO SDn

PAGE 10

m \ f§ I FDOO % FDOO $ HQG [ \ ] f§ [ I m FDOO % HQG HQG )RU WKH H[DPSOH DVVXPH WKDW DOO YDULDEOHV DUH JOREDO DQG WKDW WKH SUREOHP LV WR GHWHUPLQH WKH ORJLFDO ULSSOH HIIHFW IRU WKH GHILQLWLRQ RI YDULDEOH DW OLQH 7KH FDOO

PAGE 11



PAGE 12

n

PAGE 13

/LWHUDWXUH 5HYLHZ 'LIIHUHQW PHWKRGV KDYH EHHQ RIIHUHG IRU VROYLQJ YDULRXV IORZVHQVLWLYH LQWHUSURn FHGXUDO GDWDIORZ DQDO\VLV SUREOHPV 6KDULU DQG 3QXHOL >@ SUHVHQW D PHWKRG WKH\ QDPH FDOOVWULQJV 7KH HVVHQWLDO LGHD RI WKHLU PHWKRG LV WR DFFXPXODWH IRU HDFK HOHn

PAGE 14

LW ZKHQ WKH\ LQ WXUQ DUH DQDO\]HG 7KH REYLRXV GUDZEDFN RI WKLV PHWKRG LV WKDW LW FDQQRW EH XVHG WR DQDO\]H UHFXUVLYH FDOOV 5RVHQ >@ SUHVHQWV D FRPSOH[ PHWKRG IRU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV WKDW LV OLPLWHG WR VROYLQJ WKH SUREOHPV RI YDULDEOH PRGLILFDWLRQ SUHVHUYDWLRQ DQG XVH 7KHVH GDWDIORZ SUREOHPV GR QRW UHTXLUH D VROXWLRQ RI WKH FDOOLQJFRQWH[W SUREOHP &DOODKDQ >@ KDV SURSRVHG WKH SURJUDP VXPPDU\ JUDSK WR VROYH WKH LQWHUSURFHn GXUDO GDWDIORZ SUREOHPV RI NLOO DQG XVH ZKHUH NLOO GHWHUPLQHV DOO GHILQLWH NLOOV WKDW UHVXOW IURP D SURFHGXUH FDOO DQG XVH GHWHUPLQHV DOO YDULDEOHV WKDW PD\ EH XVHG DV D UHVXOW RI D SURFHGXUH FDOO EHIRUH EHLQJ UHGHILQHG $V SDUW RI WKH GHWHUPLQDWLRQ RI HGJHV LQ WKH SURJUDP VXPPDU\ JUDSK LQWUDSURn FHGXUDO UHDFKLQJGHILQLWLRQV DQDO\VLV PXVW EH GRQH IRU HDFK SURFHGXUH 6LPSOLI\LQJ &DOODKDQfV VSDFH FRPSOH[LW\ DQDO\VLV ZH JHW YJDOf DV WKH ZRUVWFDVH VL]H RI WKH SURJUDP VXPPDU\ JUDSK ZKHUH YJD LV WKH QXPEHU RI JOREDO YDULDEOHV LQ WKH SURJUDP SOXV WKH DYHUDJH QXPEHU RI DFWXDO SDUDPHWHUV SHU FDOO DQG LV WKH SURJUDP VL]H 2QH OLPLWDWLRQ RI &DOODKDQfV PHWKRG LV WKDW LW GRHV QRW FRUUHFWO\ KDQGOH PXOWLSOH DOLDVHV WKDW UHVXOW ZKHQ WKH VDPH YDULDEOH LV XVHG PXOWLSOH WLPHV DV DQ DFWXDO SDUDPHWHU LQ WKH VDPH FDOO DQG WKH FRUUHVSRQGLQJ IRUPDO SDUDPHWHUV DUH FDOOE\UHIHUHQFH %\ FRQWUDVW RXU PHWKRG XVLQJ HOHPHQW UHFRGLQJ ZKHUH DOO WKH DOLDVHV DUH HQFRGHG LQ D VLQJOH HOHPHQW ZLOO FRUUHFWO\ KDQGOH WKH PXOWLSOH DOLDVHV SUREOHP &DOODKDQfV PHWKRG RIIHUV QR VROXWLRQ WR WKH FDOOLQJFRQWH[W SUREOHP DQG FRXOG QRW EH XVHG WR GHWHUPLQH IRU H[DPSOH LQWHUSURFHGXUDO UHDFKLQJ GHILQLWLRQV +RZHYHU +DUUROG DQG 6RIID >@ KDYH H[WHQGHG KLV PHWKRG VR WKDW LQWHUSURFHGXUDO UHDFKLQJ GHILQLWLRQV FDQ EH GHWHUPLQHG 7KH\ XVH DQ LQWHUSURFHGXUDO IORZJUDSK GHQRWHG ,)* WKDW LV YHU\ VLPLODU WR WKH SURJUDP VXPPDU\ JUDSK 7KH ,)* KDV LQWHUUHDFKLQJ HGJHV WKDW DUH GHWHUPLQHG E\ VROYLQJ &DOODKDQfV NLOO SUREOHP 7KH\ UHFRPPHQG XVLQJ KLV PHWKRG VR WKHLU PHWKRG LQKHULWV &DOODKDQfV VSDFH DQG WLPH 
FRPSOH[LW\ DV ZHOO DV LWV OLPLWDWLRQ ZLWK UHJDUG WR PXOWLSOH DOLDVHV

PAGE 15

%HIRUH WKH ,)* FDQ EH XVHG LW PXVW EH GHFRUDWHG ZLWK WKH UHVXOWV RI LQWUDSURn FHGXUDO DQDO\VLV GRQH WZLFH IRU HDFK SURFHGXUH WR GHWHUPLQH ERWK UHDFKLQJ GHILQLWLRQV DQG XSZDUGO\ H[SRVHG XVHV 7KHQ DQ DOJRULWKP LV XVHG WR SURSDJDWH WKH XSZDUGO\ H[SRVHG XVHV WKURXJKRXW WKH ,)* 7KLV DOJRULWKP KDV ZRUVWFDVH WLPH FRPSOH[LW\ RI Qf ZKHUH Q LV WKH QXPEHU RI QRGHV LQ WKH ,)* 7KHLU JUDSK ZLOO KDYH WKH VDPH QXPEHU RI QRGHV DV IRU &DOODKDQfV JUDSK PHDQLQJ ZRUVWFDVH JUDSK VL]H ZLOO EH YJDOf 6XEVWLWXWLQJ YJDO IRU Q ZH JHW D ZRUVWFDVH WLPH FRPSOH[LW\ RI Y-f $V WKH VL]H RI RXU IORZJUDSK LV SURSRUWLRQDO WR WKH VL]H RI WKH SURJUDP WKH ZRUVWFDVH WLPH FRPSOH[LW\ IRU VROYLQJ RXU HTXDWLRQV LV RQO\ Off ; ‘ 'f ZKHUH LV WKH WRWDO QXPEHU RI SURFHGXUHV DQG FDOOV LQ WKH SURJUDP ; LV WKH

PAGE 16

WRWDO QXPEHU RI JOREDO YDULDEOHV LQ WKH SURJUDP SOXV D WHUP WKDW FDQ EH FRQVLGHUHG D FRQVWDQW DQG LV D OLQHDU IXQFWLRQ RI ; 2QFH WKH V\VWHP GHSHQGHQFH JUDSK LV FRPSOHWH DQ\ SDUWLFXODU VOLFH WKDW LV ZDQWHG FDQ EH H[WUDFWHG IURP WKH JUDSK DW FRPSOH[LW\ Qf ZKHUH Q LV WKH VL]H RI WKH JUDSK 7KH VL]H RI WKH JUDSK LV URXJKO\ TXDGUDWLF ZLWK SURJUDP VL]H EHLQJ ERXQGHG E\ 3 f 9 I (f 7 f ;f ZKHUH 3 LV WKH QXPEHU RI SURFHGXUHV 9 LV WKH ODUJHVW QXPEHU RI SUHGLFDWHV DQG GHILQLWLRQV LQ D VLQJOH SURFHGXUH ( LV WKH ODUJHVW QXPEHU RI HGJHV LQ D SURFHGXUH GHSHQGHQFH JUDSK 7 LV WKH QXPEHU RI FDOOV LQ WKH SURJUDP DQG ; LV WKH QXPEHU RI JOREDO YDULDEOHV ,Q WKHLU SDSHU PXFK LV PDGH RI WKH IDFW WKDW RQFH WKH JUDSK LV FRPSOHWH DQ\ VOLFH RQ DQ DFWXDO GHILQLWLRQ RU XVH FDQ EH H[WUDFWHG IURP WKH JUDSK DW Qf FRVW ZKHUH Q LV WKH VL]H RI WKH JUDSK +RZHYHU WKH QXPEHU RI DFWXDO GHILQLWLRQ DQG XVH RFFXUUHQFHV LQ D SURJUDP LV SURSRUWLRQDO WR WKH SURJUDP VL]H / 7KHUHIRUH DQ\ PHWKRG WKDW FDQ FRPSXWH D VOLFH DW FRVW 2=f IRU VRPH = FDQ JHQHUDWH DOO WKH VOLFHV FRQWDLQHG LQ WKHLU JUDSK DW FRVW = f /f VSRRO WKH VOLFHV WR GLVN DQG UHFRYHU WKHP DW FRVW fn DOOHOL]DWLRQ LV VSHFLILFDOO\ FRQVLGHUHG E\ %XUNH DQG &\WURQ >@ DQG E\ 7ULROHW HW DO >@

PAGE 17

2XWOLQH LQ %ULHI 7KLV LQWURGXFWRU\ FKDSWHU HQGV ZLWK D EULHI V\QRSVLV RI WKH UHPDLQLQJ FKDSWHUV &KDSWHU SUHVHQWV LQ GHWDLO RXU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV PHWKRG 7KH FKDSn WHU HQGV ZLWK D EULHI GHVFULSWLRQ RI WKH SURWRW\SHV WKDW ZHUH EXLOW WR GHPRQVWUDWH WKH PHWKRG DORQJ ZLWK VRPH RI WKH H[SHULPHQWDO UHVXOWV REWDLQHG IURP WKHVH SURWRW\SHV &KDSWHU EHJLQV ZLWK D UHSUHVHQWDWLRQ VFKHPH IRU FRQWLQXDWLRQ SDWKV IRU WKH LQWHUn

PAGE 18

&+$37(5 7+( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 0(7+2' &RQVWUXFWLQJ WKH )ORZJUDSK 7KLV VHFWLRQ GLVFXVVHV WKH IORZJUDSK DQG LWV UHODWLRQVKLS WR GDWDIORZ HTXDWLRQV $IWHU WKH GLVFXVVLRQ UXOHV DUH JLYHQ IRU FRQVWUXFWLQJ WKH VSHFLILF IORZJUDSK UHTXLUHG E\ RXU LQWHUSURFHGXUDO DQDO\VLV PHWKRG 1RWH WKDW WKH UHTXLUHG IORZJUDSK LV FRQnfV ,1 *(1

PAGE 19

DQG .,// VHWV )RU EDFNZDUGIORZ WKH 287 VHW RI D QRGH LV FRPSXWHG DV WKH FRQn IOXHQFH RI WKH ,1 VHWV RI WKH VXFFHVVRU QRGHV DQG WKH ,1 VHW LV D IXQFWLRQ RI WKH QRGHfV 287 *(1 DQG .,// VHWV 7KH SUHGHFHVVRUV RI DQ\ QRGH Q DUH WKRVH QRGHV WKDW KDYH DQ RXWHGJH GLUHFWHG WR QRGH Q 7KH VXFFHVVRUV RI QRGH Q DUH WKRVH QRGHV WKDW KDYH DQ LQHGJH GLUHFWHG IURP QRGH Q 7KH FRQIOXHQFH RSHUDWRU ZLOO DOPRVW LQn YDULDEO\ EH HLWKHU VHW XQLRQ RU VHW LQWHUVHFWLRQ GHSHQGLQJ RQ WKH SUREOHP 7KXV D GDWDIORZ SUREOHP PD\ EH FODVVLILHG DV EHLQJ HLWKHU IRUZDUGIORZRU IRUZDUGIORZDQG EDFNZDUGIORZRU RU EDFNZDUGIORZDQG ZKHUH fRUf UHIHUV WR VHW XQLRQ DQG fDQGf

PAGE 20

fV H[LW QRGH

PAGE 21

,Q DOO HDFK NQRZQ FDOO UHVXOWV LQ WZR QRGHV DQG WKUHH GLVWLQFW HGJHV 2QH HGJH FRQQHFWV WKH FDOO QRGH WR LWV UHWXUQ QRGH $ VHFRQG HGJH FRQQHFWV WKH FDOO QRGH WR WKH FDOOHG SURFHGXUHfV HQWU\ QRGH $ WKLUG HGJH FRQQHFWV WKH FDOOHG SURFHGXUHfV H[LW QRGH WR WKH UHWXUQ QRGH ,Q FRQVWUXFWLQJ WKH IORZJUDSK D VSHFLDO SUREOHP DULVHV LI WKH SURJUDPPLQJ ODQn JXDJH DOORZV SURFHGXUHYDOXHG YDULDEOHV VXFK DV WKH IXQFWLRQ SRLQWHUV RI & WKDW ZKHQ GHUHIHUHQFHG UHVXOW LQ D FDOO RI WKH IXQFWLRQ WKDW LV SRLQWHG DW 7KH SUREOHP LV WR LGHQWLI\ ZKDW DUH WKH SRVVLEOH SURFHGXUH YDOXHV ZKHQ WKH SURFHGXUHYDOXHG YDULDEOH LQYRNHV D FDOO $VVXPLQJ WKLV LQIRUPDWLRQ LV DYDLODEOH IURP D VHSDUDWH DQDO\VLV WKH IORZJUDSK FDQ EH FRQVWUXFWHG DFFRUGLQJO\ )RU H[DPSOH LI WKH SURFHGXUHYDOXHG YDULnn VWUXFWHG WKDW WUHDWV DOO FDOOV LQYRNHG E\ SURFHGXUHYDOXHG YDULDEOHV DV XQNQRZQ FDOOV IROORZHG E\ D VROYLQJ RI WKH GDWDIORZ SUREOHP IRU GHWHUPLQLQJ SRVVLEOH SRLQWHU YDOXHV ZKHQHYHU D SRLQWHU LV GHUHIHUHQFHG IROORZHG E\ DPHQGPHQWV WR WKH IORZJUDSK XVLQJ WKH SRLQWHUYDOXH LQIRUPDWLRQ 'DWDIORZ DQDO\VLV PDNHV D VLPSOLI\LQJ FRQVHUYDWLYH DVVXPSWLRQ DERXW WKH FRUn UHVSRQGHQFH EHWZHHQ SDWKV LQ WKH IORZJUDSK DQG SRVVLEOH H[HFXWLRQ SDWKV LQ WKH SURn JUDP /HW D SDWK EH D VHTXHQFH RI IORZJUDSK QRGHV VXFK WKDW LQ WKH VHTXHQFH QRGH Q IROORZV QRGH P RQO\ LI Q LV D VXFFHVVRU RI P LQ WKH IORZJUDSK )RU LQWUDSURFHGXUDO

PAGE 22

DQDO\VLV WKH DVVXPSWLRQ PDGH LV WKDW DQ\ SDWK LQ WKH IORZJUDSK LV D SRVVLEOH H[HFXn WLRQ SDWK 7KDW WKLV DVVXPSWLRQ PD\ QRW EH WUXH IRU D SDUWLFXODU SURJUDP VKRXOG EH REYLRXV +RZHYHU WKH SUREOHP RI GHWHUPLQLQJ WKH SRVVLEOH H[HFXWLRQ SDWKV IRU DQ DUELWUDU\ SURJUDP LV NQRZQ WR EH XQGHFLGDEOH 7KH VLPSOLI\LQJ DVVXPSWLRQ WKDW ZH XVH IRU LQWHUSURFHGXUDO DQDO\VLV LV WKH VDPH DV WKDW XVHG IRU LQWUDSURFHGXUDO DQDO\n VLV EXW ZLWK WKH DGGHG SURYLVR WKDW IRU DQ\ SDWK WKDW LV D SRVVLEOH H[HFXWLRQ SDWK DQ\ VXEVHTXHQFH RI UHWXUQ QRGHV PXVW LQYHUVHO\ PDWFK LI SUHVHQW WKH LPPHGLDWHO\ SUHFHGLQJ VXEVHTXHQFH RI FDOO QRGHV $ UHWXUQ QRGH PDWFKHV D FDOO QRGH LI DQG RQO\ LI WKH UHWXUQ QRGH LV WKH FDOO QRGHfV VXFFHVVRU LQ WKH IORZJUDSK ,QWHUSURFHGXUDO )RUZDUG)ORZ2U $QDO\VLV 7KLV VHFWLRQ EHJLQV ZLWK RXU EDVLF DSSURDFK WR VROYLQJ WKH FDOOLQJFRQWH[W SUREn OHP 7KH GDWDIORZ HTXDWLRQV IRU IRUZDUGIORZRU DQDO\VLV DUH WKHQ JLYHQ DQG WKHLU FRUUHFWQHVV LV VKRZQ $V D SDUW RI RXU LQWHUSURFHGXUDO DQDO\VLV PHWKRG WKH WHFKn QLTXH RI HOHPHQW UHFRGLQJ LV SUHVHQWHG DV D ZD\ WR GHDO ZLWK WKH DOLDVHV WKDW UHVXOW IURP FDOOE\UHIHUHQFH IRUPDO SDUDPHWHUV )RU VRPH GDWDIORZ SUREOHPV LPSOLFLW GHILnf§DQG WKH HVVHQWLDO GLIIHUHQFH EHn WZHHQ RXU GDWDIORZ HTXDWLRQV DQG FRQYHQWLRQDO GDWDIORZ HTXDWLRQVf§LV WR GLYLGH HYHU\ ,1 VHW DQG HYHU\ 287 VHW LQWR WZR VHWV FDOOHG DQ HQWU\ VHW DQG D ERG\ VHW 7KH UHDVRQ

PAGE 23

fV ERG\ QRGHV RU EHFRPH VPDOOHU EHFDXVH RI NLOOV %\ LQWHUVHFWLQJ WKH FDOOLQJ FRQWH[W DW D FDOO QRGH ZLWK WKH HQWU\ VHW DW WKH H[LW QRGH RI WKH FDOOHG SURFHGXUH WKH UHVXOW LV WKDW VXEVHW RI WKH FDOOLQJ FRQWH[W WKDW KDV UHDFKHG WKH H[LW QRGH DQG WKHUHIRUH ZLOO UHDFK WKH UHWXUQ QRGH IRU WKDW FDOO %\ fUHDFKff PHDQV WKH VHW RI SUHGHFHVVRUV RI Q 7KH 5(&2'( VHW XVHG LQ *URXS LV H[SODLQHG LQ 6HFWLRQ 7KH *(1 VHW XVHG LQ *URXS DQG WKH *(1 DQG .,// VHWV XVHG LQ *URXS ,, DUH H[SODLQHG LQ 6HFWLRQ

PAGE 24

)RU DQ\ QRGH Q ,1>Q@ (LQ>Q@ 8 %WQ>Q? 287>Q@ (RXW>Q? 8 %RXW>Q@ *URXS Q LV DQ HQWU\ QRGH %LQ^Q@ (LQ>Q? ^[ [ 287>S@ $ &L` S SUHGQf %RXW>Q` *(1>Q? (RXW >Q@ (LQ>Q?8 5(&2'(>Q? *URXS ,, Q LV D UHWXUQ QRGH S LV WKH DVVRFLDWHG FDOO QRGH DQG T LV WKH H[LW QRGH RI WKH FDOOHG SURFHGXUH %LQ>Q? ^[ D %RXW>S@ $ &L 9 &[ $ & $ [ e eRXW>"@fff 9 [ %RXW>T@ $ &f` (LQ>Q? ^[ (RXW>S` &L 9 &L $ & $ [ (RXW>T@f` %RXW>Q@ P>Q@ .,//>Q@f 8 *(1>Q` (RXW>Q@ (WQ>Q? ,,//>Q? *URXS ,,, Q LV QRW DQ HQWU\ RU UHWXUQ QRGH %LQ 0 ?%RXW>S@ S SUHGQf (LQ>Q@ 8 (RXW>S@ S SUHGQf %XW>Q@ %WQ>Q@ .,//>Q@f 8 *(1>Q?

PAGE 25

n XUH %\ fVROYLQJf

PAGE 26

SURFHGXUH PDLQ EHJLQ Z [ LIZ [f ] SURFHGXUH If EHJLQ [ HQG FDOO If HOVH \ FDOO If )LJXUH $ UHDFKLQJGHILQLWLRQV H[DPSOH

PAGE 27

7DEOH 6ROXWLRQ RI IRUZDUGIORZRU HTXDWLRQV IRU )LJXUH 1RGH (^Q (XW %LQ %RXW ^` ^` ^ f ^` ^ ` f f ^ ` ^ ` ^ ` ^ ` ^ ` ^ ` ^ ` ^` ^` f ^ ` ^` ^` VHW WKDW UHDFK D FDOO DW D FDOO QRGH WKRVH HIIHFWV WKDW VXUYLYH WKH FDOO DUH UHFRYHUHG LQ WKH HQWU\ VHW FRQVWUXFWHG E\ WKH (LQ>Q? HTXDWLRQ IRU WKH VXFFHVVRU UHWXUQ QRGH Q 7R VHH WKDW WKLV LV WUXH REVHUYH WKH IROORZLQJ ,I DQ HQWU\VHW HIIHFW WKDW UHDFKHV WKH FDOO FDQQRW HQWHU WKH FDOOHG SURFHGXUH WKHQ LW FDQQRW EH NLOOHG ZLWKLQ WKH FDOOHG SURFHGXUH VR WKH HIIHFW VKRXOG EH DGGHG WR WKH UHWXUQQRGH HQWU\ VHW ZLWKRXW IXUWKHU FRQGLWLRQV DQG WKLV LV GRQH E\ WKH VHOHFWLRQ FULWHULRQ [ e (RXW>S@ $ &?f LQ WKH HTXDWLRQ IRU WKH UHWXUQ QRGH ,I RQ WKH RWKHU KDQG DQ HQWU\VHW HIIHFW UHDFKHV WKH FDOO DQG GRHV HQWHU WKH FDOOHG SURFHGXUH DQG WKHUHIRUH PD\ EH NLOOHG E\ LW WKHQ WKLV HIIHFW VKRXOG EH DGGHG WR WKH UHWXUQQRGH HQWU\ VHW RQO\ LI LW UHDFKHG WKH HQWU\ VHW RI WKH FDOOHG SURFHGXUHfV H[LW QRGH DQG WKH HIIHFW FDQ FURVV EDFN LQWR WKH FDOOHU 7KLV LV GRQH E\ WKH VHOHFWLRQ FULWHULRQ [ e (RXW>S@ $ &? $ & $ [ e (RXW>T@f LQ WKH (LQ>Q? HTXDWLRQ IRU WKH UHWXUQ QRGH )URP WKH HTXDWLRQV IRU WKH HQWU\ VHW ZH VHH WKDW IRU DQ\ SURFHGXUH ] WKH HQWU\ VHW DW ]fV H[LW QRGH ZLOO DV WKH HTXDWLRQV DUH VROYHG HYHQWXDOO\ FRQWDLQ DOO FDOOLQJFRQWH[W HIIHFWV WKDW HQWHUHG ] DQG UHDFKHG LWV H[LW QRGH 7KLV FKDUDFWHULVWLF RI WKH H[LWQRGH HQWU\ VHW LV WKH UHTXLUHPHQW SODFHG XSRQ LW ZKHQ LW LV XVHG LQ WKH

PAGE 28

(LQ>Q? HTXDWLRQ IRU WKH UHWXUQ QRGH VR WKLV UHTXLUHPHQW LV VDWLVILHG DQG WKH HQWU\VHW HTXDWLRQV DUH FRUUHFW )RU DQ\ SURFHGXUH WKH %LQ VHW LV DOZD\V HPSW\ DW WKH HQWU\ QRGH VR WKH % VHW LV IUHH RI FDOOLQJFRQWH[W HIIHFWV :LWKLQ WKH SURFHGXUH ERG\ *(1 DQG .,// VHWV DUH XVHG WR XSGDWH WKH ERG\ VHW DV LW SURSDJDWHV DORQJ WKH YDULRXV QRGHV )RU HIIHFWV LQ WKH ERG\ VHW WKDW UHDFK D FDOO DW D FDOO QRGH WKRVH HIIHFWV WKDW VXUYLYH WKH FDOO DUH UHFRYHUHG LQ WKH ERG\ VHW FRQVWUXFWHG E\ WKH "f>Q@ HTXDWLRQ IRU WKH VXFFHVVRU UHWXUQ QRGH Q ,I D ERG\VHW HIIHFW WKDW UHDFKHV WKH FDOO FDQQRW HQWHU WKH FDOOHG SURFHGXUH WKHQ LW FDQQRW EH NLOOHG ZLWKLQ WKH FDOOHG SURFHGXUH VR LW VKRXOG EH DGGHG WR WKH UHWXUQQRGH ERG\ VHW ZLWKRXW IXUWKHU FRQGLWLRQV DQG WKLV LV GRQH E\ WKH VHOHFWLRQ FULWHULRQ [ %RXW>S@ $ &Mf LQ WKH f>Q@ HTXDWLRQ IRU WKH UHWXUQ QRGH ,I RQ WKH RWKHU KDQG D ERG\VHW HIIHFW UHDFKHV WKH FDOO DQG ZLOO HQWHU WKH FDOOHG SURFHGXUH DQG WKHUHIRUH PD\ EH NLOOHG E\ LW WKHQ WKLV HIIHFW VKRXOG EH DGGHG WR WKH UHWXUQ QRGH ERG\ VHW RQO\ LI LW UHDFKHG WKH HQWU\ VHW RI WKH FDOOHG SURFHGXUHfV H[LW QRGH DQG WKH HIIHFW FDQ FURVV EDFN LQWR WKH FDOOHU 7KLV LV GRQH E\ WKH VHOHFWLRQ FULWHULRQ [ f %R[OW>S@ $ &L $ & $ [ f (RXW>T@f LQ WKH ="P>Q@ HTXDWLRQ IRU WKH UHWXUQ QRGH ,Q DGGLWLRQ DOO FURVVDEOH HIIHFWV WKDW UHVXOW IURP WKH FDOO DQG WKDW DUH LQGHSHQGHQW RI FDOOLQJ FRQWH[W VKRXOG DOVR EH DGGHG WR WKH UHWXUQQRGH ERG\ VHW DQG WKLV LV GRQH E\ WKH VHOHFWLRQ FULWHULRQ [ f %RXW>T@ $ &f LQ WKH P>Q@ HTXDWLRQ IRU WKH UHWXUQ QRGH )URP WKH HTXDWLRQV IRU WKH ERG\ VHW ZH VHH WKDW IRU DQ\ SURFHGXUH ] WKH ERG\ VHW DW ]fV H[LW QRGH LV IUHH RI FDOOLQJFRQWH[W HIIHFWV DQG ZLOO DV WKH HTXDWLRQV DUH VROYHG HYHQWXDOO\ FRQWDLQ DOO ERG\ HIIHFWV WKDW UHDFKHG WKH H[LW QRGH LQFOXGLQJ WKRVH ERG\ HIIHFWV UHVXOWLQJ IURP FDOOV PDGH ZLWKLQ ] 7KLV FKDUDFWHULVWLF RI WKH H[LWQRGH ERG\ VHW LV WKH UHTXLUHPHQW SODFHG XSRQ LW ZKHQ LW LV XVHG LQ WKH eQ>Q@ HTXDWLRQ IRU WKH UHWXUQ QRGH VR WKLV UHTXLUHPHQW LV VDWLVILHG 7KH RWKHU UHTXLUHPHQW 
RI WKLV UHWXUQQRGH HTXDWLRQ LV WKDW WKH H[LWQRGH HQWU\ VHW FRQWDLQV DOO FDOOLQJFRQWH[W HIIHFWV

PAGE 29

IRU WKH SURFHGXUH WKDW UHDFK WKH H[LW QRGH 7KLV UHTXLUHPHQW KDV DOUHDG\ EHHQ VKRZQ WR EH VDWLVILHG VR ZH FRQFOXGH WKDW WKH ERG\VHW HTXDWLRQV DUH FRUUHFW (OHPHQW 5HFRGLQJ IRU $OLDVHV 7KH 5(&2'( VHW IRU WKH HQWU\ QRGH KDV LWV HOHPHQWV DGGHG WR WKH )Q VHW IRU WKDW QRGH 7KH LGHD RI WKH 5(&2'( VHW LV WKDW FHUWDLQ HOHPHQWV LQ WKH 287 VHW RI D SUHGHFHVVRU FDOO QRGH LUUHVSHFWLYH RI WKHLU DELOLW\ WR FURVV WKH LQWHUSURFHGXUDO ERXQGnf DQG 5(&2'( IRU D IRUZDUGIORZRU GDWDIORZ SUREOHP IRU WKH DVVXPHG ODQJXDJH PRGHO LQ ZKLFK WKH YLVLELOLW\ RI HDFK IRUPDO SDUDPHWHU LV OLPLWHG WR WKH VLQJOH SURFHn GXUH WKDW GHFODUHV LW )RU HDFK HOHPHQW LQ WKH 287>F@ VHW WKH DOJRULWKP JHQHUDWHV DW PRVW RQH HOHPHQW IRU LQFOXVLRQ LQ WKH HQWU\QRGH LQSXW VHWV 7KH DOJRULWKP LV

PAGE 30

XQDPELJXRXV H[FHSW IRU OLQH 7KH fFDQ EH DIIHFWHG E\f WHVW DW OLQH LV D JHQHUn DOL]DWLRQ 7KH GHWDLOV RI WKLV WHVW ZLOO GHSHQG RQ WKH VSHFLILF GDWDIORZ SUREOHP EHLQJ VROYHG )RU H[DPSOH LI WKH GDWDIORZ SUREOHP LV UHDFKLQJ GHILQLWLRQV WKHQ HDFK EDVH HOHPHQW Z UHSUHVHQWV D VSHFLILF GHILQLWLRQ RI VRPH YDULDEOH ] ,I WKH DFWXDO SDUDPHn WHU S EHLQJ WHVWHG E\ WKH DOJRULWKP LV WKH YDULDEOH ] DQG WKH FRUUHVSRQGLQJ IRUPDO SDUDPHWHU LV FDOOE\UHIHUHQFH WKHQ WKH GHILQLWLRQ WKDW Z UHSUHVHQWV FDQ EH XVHG RU NLOOHG WKURXJK WKDW IRUPDO SDUDPHWHU VR Z FDQ EH DIIHFWHG E\ WKDW DFWXDO SDUDPHWHU ] DQG WKH fDIIHFWHG E\f WHVW LV WKHUHIRUH VDWLVILHG 7KH S f

PAGE 31

f§ H LV DQ HQWU\ QRGH f§ 7KLV DOJRULWKP FRQVWUXFWV WKH (^Q>H@ DQG 5(&2'(>H@ VHWV EHJLQ (LQ > H @ f§ 5(&2'(>H@ IRU HDFK SUHGHFHVVRU FDOO QRGH F RI HQWU\ QRGH H IRU HDFK HOHPHQW [ e 287>F@ OHW Z EH WKH EDVH HOHPHQW RI [ OHW 2$ EH WKH VHW RI DOLDVHV LI DQ\ DVVRFLDWHG ZLWK Z IRUPLQJ [ OHW 1$ EH WKH VHW RI QHZ DOLDVHV 1$ Y IRU HDFK DFWXDO SDUDPHWHU S DW FDOO QRGH F WKDW LV DOLDVHG WR D FDOOE\UHIHUHQFH IRUPDO SDUDPHWHU LI Z FDQ EH DIIHFWHG E\ Sf 9 S f 2$f 1$ 1$8^I@ IL HQG IRU LI 1$ 5(&2'(>H` 5(&2'(>H@ 8 ^Z1$f` HOVH LI Z FDQ FURVV WKH LQWHUSURFHGXUDO ERXQGDU\ (LQ>H@ (LQ>H@ 8 ^X` IL HQG IRU HQG IRU HQG )LJXUH (OHPHQWUHFRGLQJ DOJRULWKP IRU IRUZDUGIORZRU GDWDIORZ SUREOHPV

PAGE 32

f VHWV WR UHFRJQL]H HOHPHQWV WR EH UHFRYHUHG IURP WKH H[LW QRGH HQWU\ VHW 5HFRJQLWLRQ DQG UHVWRUDWLRQ ZRXOG EH GRQH E\ WU\LQJ WR PDWFK WKH H[LWQRGH HQWU\VHW HOHPHQW DJDLQVW WKH VHFRQG HOHPHQW RI DQ RUGHUHG SDLU IURP WKH DSSURSULDWH DGGLWLRQDO VHW DW WKH FDOO QRGH DQG WKHQ LI WKHUH LV D PDWFK UHVWRULQJ WKH RULJLQDO HOHPHQW E\ XVLQJ WKH ILUVW HOHPHQW RI WKH PDWFKHG SDLU )RU H[DPSOH LI D LV D FURVVLQJ HOHPHQW LQ WKH %RXW VHW RI D FDOO QRGH DQG \ LV WKH JHQHUDWHG HOHPHQW WKHQ [ \f ZRXOG EH DQ RUGHUHG SDLU LQ WKH DGGLWLRQDO VHW IRU ERG\VHW HOHPHQWV :KHQ WKH VHW IRU WKH UHWXUQ QRGH LV FRPSXWHG LI \ LV LQ WKH H[LWQRGH HQWU\ VHW WKHQ LW ZLOO PDWFK WKH RUGHUHG SDLU [ \f DQG HOHPHQW [ ZLOO EH DGGHG WR WKH %^Q VHW $V DQ H[DPSOH RI ZK\ HOHPHQW UHFRGLQJ LV QHFHVVDU\ FRQVLGHU WKH IROORZLQJ 6XSSRVH WKHUH DUH WZR GLIIHUHQW FDOOV WR WKH VDPH SURFHGXUH DQG GLIIHUHQW GHILQLWLRQV RI JOREDO YDULDEOH J UHDFK HDFK FDOO $W RQH RI WKH FDOOV J LV DOVR XVHG DV DQ DFWXDO SDUDPHWHU DQG WKH FRUUHVSRQGLQJ IRUPDO SDUDPHWHU LV FDOOE\UHIHUHQFH 7KH SUREOHP QRZ LV ZKDW WR NLOO IURP WKH HQWU\ VHW ZKHQHYHU WKDW IRUPDO SDUDPHWHU LV GHILQHG LQ WKH FDOOHG SURFHGXUH ,I WKH LQGLYLGXDO HOHPHQWV UHSUHVHQWLQJ WKH GLIIHUHQW GHILQLWLRQV RI J GR QRW VRPHKRZ LGHQWLI\ KRZ WKH\ DUH UHODWHG WR WKLV IRUPDO SDUDPHWHU WKHQ

PAGE 33

n

PAGE 34

DOLDVHV YDULDEOH Y WR [ 7KH VHFRQG FDOO DOLDVHV YDULDEOH Y WR ERWK [ DQG \ 7KH WKLUG FDOO DOLDVHV YDULDEOH Z WR [ 7KXV DW SRLQW S WKHUH ZRXOG EH WKUHH LPSOLFLWGHILQLWLRQ HOHPHQWV JHQHUDWHG QDPHO\ X ^[`f Z ^UH \`f DQG LQ ^D`f $V DQ H[DPSOH RI ZKDW WKLV HOHPHQW QRWDWLRQ PHDQV IRU WKH X^[`f HOHPHQW WKH Y UHSUHVHQWV WKH LPSOLFLW GHILQLWLRQ RI YDULDEOH Y WKDW RFFXUV DW SRLQW S DQG WKH [ UHSUHVHQWV WKH IRUPDO SDUDPHWHU WKDW YDULDEOH Y LV DOLDVHG WR $V D VSHFLDO UHTXLUHPHQW IRU WKHVH LPSOLFLWGHILQLWLRQ HOHPHQWV IRU WKH %RXW VHW DW WKH H[LW QRGH RI SURFHGXUH $ WKH Y ^[`f HOHPHQW LI LW UHDFKHV WKLV VHW FDQ RQO\ FURVV IURP WKLV VHW WR WKH UHWXUQ QRGH RI WKH ILUVW FDOO 6LPLODUO\ WKH X ^[S`f HOHPHQW FDQ RQO\ FURVV WR WKH UHWXUQ QRGH RI WKH VHFRQG FDOO DQG WKH Z ^[`f HOHPHQW FDQ RQO\ FURVV WR WKH UHWXUQ QRGH RI WKH WKLUG FDOO 7KH FURVVLQJ UHVWULFWLRQV LQ WKH SUHFHGLQJ H[DPSOH DUH GXH WR D UXOH QRZ JLYHQ /HW $ GHQRWH D SURFHGXUH FRQWDLQLQJ D GHILQLWLRQ DW SRLQW S RI D FDOOE\UHIHUHQFH IRUPDO SDUDPHWHU [ W ^[`f LV WKH LPSOLFLWGHILQLWLRQ HOHPHQW JHQHUDWHG DW SRLQW S IRU VRPH VSHFLILF FDOO F RI $ WKDW DOLDVHV DFWXDOSDUDPHWHU YDULDEOH W WR [ DQG P LV WKH H[LW QRGH RI $ ,I W ^[`f %RXW>P? WKHQ W ^[`f FDQ RQO\ FURVV IURP %RXW>P? WR WKH UHWXUQ QRGH RI FDOO F DQG DV W ^[`f FURVVHV LW PXVW EH UHFRGHG DV W E\ KDYLQJ LWV DOLDV UHODWLRQVKLS QXOOLILHG 7KLV FURVVLQJUHVWULFWLRQ UXOH LV QHFHVVDU\ EHFDXVH HOHPHQW I ^[`f`f UHDFKHV WKH %RXW>P? VHW WKH UXOH VWDWHV WKDW WKLV HOHPHQW FDQ RQO\ FURVV WR WKH UHWXUQ QRGH RI FDOO F DQG WKDW LW EH UHFRGHG DV W $VVXPLQJ WKDW WKLV W HOHPHQW WKHQ UHDFKHV IURP WKLV UHWXUQ QRGH WR WKH %RXW>P? VHW W FDQ WKHQ FURVV

PAGE 35

WR DQ\ UHWXUQ QRGH WKDW KDV DQ LQHGJH IURP P $OWKRXJK ERWK WKH ^[`f DQG W HOHPHQWV UHIHU WR WKH VDPH LPSOLFLW GHILQLWLRQ RI YDULDEOH W RFFXUULQJ DW SRLQW S WKH WZR HOHPHQWV DUH QRW WKH VDPH DQG WKH FURVVLQJUHVWULFWLRQ UXOH DSSOLHV RQO\ WR DQ HOHPHQW WKDW LV LGHQWLFDO WR WKH HOHPHQW JHQHUDWHG DW SRLQW S ZKLFK LV W ^[`f`f

PAGE 36

n DWLYH DOJRULWKP LV HQGHG D IROORZRQ VWHS LV GRQH Df ([DPLQH WKH %RXW VHW IRU HDFK H[LW QRGH )RU HDFK GHILQLWLRQ G LQ WKLV VHW RI D IRUPDO SDUDPHWHU S DQG S LV FDOOE\n UHVXOW RU FDOOE\YDOXHUHVXOW WKHQ G UHDFKHV WKH LPSOLFLW XVH RI WKLV IRUPDO SDUDPHWHU E\ WKRVH LPSOLFLW GHILQLWLRQV RI DFWXDO SDUDPHWHUV IRXQG DW WKH YDULRXV UHWXUQ QRGHV ZKRVH FRUUHVSRQGLQJ IRUPDO SDUDPHWHU LV S 7KH HOHPHQW UHSUHVHQWLQJ G FDQ EH DGGHG WR WKH %LQ VHWV RI WKRVH UHWXUQ QRGHV LQ D ZD\ WKDW UHIOHFWV WKH UHDFK Ef

PAGE 37

)RU IRUZDUGIORZDQG SUREOHPV VRPH FKDQJHV DUH QHHGHG WR WKH GDWDIORZ HTXDnOW>S@ 8 %RXW>Sff $ &O` S f SUHGQf "0 S_ ^[?[H287>S@$&` S f SUHGQf %RXW>Q@ *(1>Q? (O?>Q@ ,9@ 8 5(&2'(Z>Q? 8 5(&2'(Z>Q@ (O?>Q@ eMQf>Q@ 8 5(&2'(A>Q@ *URXS ,, Q LV D UHWXUQ QRGH S LV WKH DVVRFLDWHG FDOO QRGH DQG T LV WKH H[LW QRGH RI WKH FDOOHG SURFHGXUH %LQ>Q? ^[ [ %RXW>S@ $ &M 9 &M $ & $ r f O>"@fff 9 [ f %RXW>T@ $ &f`

PAGE 38

": ^[ f (bW>S@ &c 9 ^&[ $ & $ [ f (c}W>T@f`L %RX>Q@ Q>Q@ D/>Q@f 8 *e:>Q@ 0 A>Q@ .,//>Q?L *URXS ,,, Q LV QRW DQ HQWU\ RU UHWXUQ QRGH %WQ>Q@ S_ %RXW>S@ S ( SUHG^Qf rZ Q HeP L S e SUHGQf %RXW>Q@ eff>m@ .,//>Q?f 8 *(1>Q` (eW>Q@ (ccOf` VHW 7R DYRLG VXFK ORVV DQ LPSURYHG UXOH VWDWHV WKDW LI WKH VDPH EDVH HOHPHQW Z LV IRXQG LQ HDFK 287>F@ VHW DQG WKHUH LV RQH RU PRUH QRQHPSW\ DOLDV UHODWLRQVKLSV IRU WKDW Z RFFXUULQJ DW RQH RU PRUH SUHGHFHVVRU QRGHV F WKHQ D VLQJOH UHFRGHG HOHPHQW IRU WKDW Z WKDW HQFRGHV DOO RI WKHVH DOLDV UHODWLRQVKLSV ZRXOG EH JHQHUDWHG LQWR WKH 5(&2'(A VHW RWKHUZLVH QR UHFRGHG HOHPHQW IRU WKDW Z ZRXOG EH JHQHUDWHG LQWR WKH 5(&2'(A VHW )RU

PAGE 39

H[DPSOH VXSSRVH F KDV WKUHH GLIIHUHQW YDOXHV IRU D JLYHQ HQWU\ QRGH DQG WKH VDPH EDVH HOHPHQW Z LV IRXQG LQ HDFK 287>F@ VHW DQG DW RQH F WKHUH LV DQ HPSW\ DOLDV UHODWLRQVKLS DW WKH VHFRQG F WKHUH LV DQ DOLDV UHODWLRQVKLS WR IRUPDO SDUDPHWHU [ DQG DW WKH WKLUG F WKHUH LV DQ DOLDV UHODWLRQVKLS WR IRUPDO SDUDPHWHU \ )RU WKLV H[DPSOH WKH VLQJOH UHFRGHG HOHPHQW ZRXOG EH Z ^[L`fff 2QO\ D XVH RI WKH EDVH HOHPHQW WKURXJK DQ DOLDV HVWDEOLVKHG DW HDFK F ZRXOG EH D XVH WKURXJK DQ DOLDV WKDW RFFXUV RQ HYHU\ FDOO SDWK DQG WKLV NLQG RI XVH ZRXOG EH WKH DOOSDWKV XVH WKDW LV LPSOLFLWO\ UHTXLUHG E\ WKH VSHFLILF GDWDIORZ SUREOHP E\ YLUWXH RI LW EHLQJ IRUZDUGIORZDQG :LWK WKH H[FHSWLRQ RI WKH FRQIOXHQFH RSHUDWRU DQG WKH WZR GLIIHUHQW HQWU\ VHWV WKH HTXDWLRQV IRU IRUZDUGIORZDQG DUH WKH VDPH DV IRU IRUZDUGIORZRU DQG DUH OLNHn ZLVH FRUUHFW 6HW (A IXOILOOV WKH UHTXLUHPHQW IRU WKH ,1 DQG 287 VHWV E\ FRQVLVWHQWO\ XVLQJ WKH LQWHUVHFWLRQ FRQIOXHQFH RSHUDWRU IRU LWV FRQVWUXFWLRQ MXVW DV % GRHV 7KH HTXDWLRQV IRU WKH (A DQG (A VHWV RQO\ GLIIHU DW WKH HQWU\ QRGH DQG WKHUH WKH RQO\ GLIIHUHQFH LV WKH FRQIOXHQFH RSHUDWRU DQG WKH ZD\ WKH 5(&2'( VHWV DUH EXLOW $V VHW LQWHUVHFWLRQ LV WKH FRQIOXHQFH RSHUDWRU IRU (A? DQG VHW XQLRQ IRU (A? DQG WKH

PAGE 40

7DEOH 6ROXWLRQ RI IRUZDUGIORZDQG HTXDWLRQV IRU )LJXUH 1RGH LQ nRXW me} S: nRXW %LQ %RXW ^` ^` ^ ` ^ ` ^` ^` ^` ^ ` ^ ` ^` ` ^ ` ^ ` ^` ^` ^ ` ^ ` ^` ^` ^` ^ ` ^` ^` 5(&2'(A VHW LV DGGHG WR ERWK DQG (A? LW IROORZV WKDW (A ZLOO EH D VXEVHW RI (DW HYHU\ QRGH 7KXV (A FDQ EH XVHG WR UHFRYHU FDOOLQJ FRQWH[W IRU (A? 6HW (: DOVR VHUYHV WR UHFRYHU FDOOLQJ FRQWH[W IRU ERWK (A DQG % EHFDXVH (A LV EXLOW DW WKH HQWU\ QRGH IURP WKHVH WZR VHWV DQG WKH XVH RI XQLRQ DV WKH FRQIOXHQFH RSHUDWRU JXDUDQWHHV WKDW DOO FDOOLQJFRQWH[W HIIHFWV ZLOO EH FROOHFWHG 7DEOH VKRZV WKH UHVXOW RI VROYLQJ WKH HTXDWLRQV IRU WKH IORZJUDSK RI )LJn XUH %\ fVROYLQJf ZH PHDQ WKDW LQ HIIHFW WKH LWHUDWLYH DOJRULWKP KDV EHHQ XVHG DQG DOO WKH VHWV DUH VWDEOH 7KH GDWDIORZ SUREOHP LV DYDLODEOH H[SUHVVLRQV DQG YDULn


[Program listing not recoverable: a procedure main and a procedure f, with assignments involving y, w, z, x and a, a test if (e), and a call of f.]
Figure […]. An available-expressions example


Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. However, the same flowgraph is used for both forward-flow and backward-flow problems. To convert the equations for forward-flow-or to backward-flow-or, or for forward-flow-and to backward-flow-and, the transformation is mechanical and straightforward. The same equations are used, but various words and phrases are everywhere changed to reflect the reverse flow. For example, "pred(n)" for predecessors becomes "succ(n)" for successors; "out" subscripts become "in" subscripts, and "in" subscripts become "out" subscripts; IN becomes OUT, and OUT becomes IN; "call node" becomes "return node", and "return node" becomes "call node"; "entry node" becomes "exit node", and "exit node" becomes "entry node". For backward flow, the nodes requiring special equations are the exit node and call node, and not the entry node and return node as for the forward-flow problems.

Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed language model, in which the visibility of each formal parameter is limited to the single procedure that declares it, we consider the solution of the dataflow equations for only one element at a time. Let n be the number of flowgraph nodes. Let the elementary operation measured by the complexity be the computation of the dataflow equations once at a single, average flowgraph node for a single element. Only the presence or absence of the single element within a particular body or entry set need be represented, and this requires no more than a single bit of storage for each set referenced by the equations. Thus, computing the dataflow equations once at an average node for a single element will consist of a small number of integer operations, assuming that the average in- and out-degree of the flowgraph nodes is bounded by a small constant, which will always be the case for flowgraphs generated from real programs,


and also assuming that the length of recoded elements will be small. Referring to the algorithm of Figure […], the length of a recoded element is |NA|, and |NA| is bounded from above by the number of call-by-reference formal parameters of the given procedure. As a rule, this upper bound will be small.

We next consider the total number of node visits required to solve the dataflow equations for a single element. Prior to solving the equations, all body and entry sets are initialized to empty […] complexity O(n[…]) for solving the dataflow equations for a single element.

The worst-case complexity of solving the dataflow equations for m total elements will therefore be O(mn[…]). Let b be the number of base elements for the program being analyzed, and let r be the number of recoded elements, giving m = b + r. As an example, for the reaching-definitions dataflow problem, the base elements will be all the definitions in the program. We assume that for the kind of dataflow problems our method is meant to solve, the number of base elements will be a linear function of the program size, and therefore proportional to n. Let constant […] be an upper bound of b/n. We also assume the universe of real, useful programs written by


programmers to solve practical problems.

To determine an upper bound for r, let k be the maximum number of formal parameters for a single procedure. That k is a constant independent of program size should be obvious. Given k and the algorithm of Figure […], and allowing all possible combinations of the formal parameters of any single procedure, the maximum number of recoded elements for any single procedure and base element is […]. Note that […] is a constant, albeit an enormous constant. The maximum number of recoded elements for any single procedure will therefore be this constant multiplied by b.

In the assumed language model, each formal parameter is visible in only one procedure, and this means each recoded element is confined to a single procedure when the dataflow equations are solved. Therefore, the total number of node visits required to solve the dataflow equations for all the recoded elements will be bounded from above by […], where j is the number of procedures in the flowgraph, and s_i is the number of flowgraph nodes in the i-th procedure. This upper bound can be rewritten as […]. Ignoring constants, and given that […], the worst-case complexity of our method for the assumed language model is O(n[…]).

[…] Figure […] generates at most a single recoded element for each element in the OUT set, so to increase the number of recoded elements as stated, there must be multiple calls to the same procedure, and in these different calls the same base element must be aliased to different formal-parameter combinations. To assess the likelihood of this


requirement being met, consider that for any given program from the assumed universe […] "variable" […] However, such an increase would be more likely for a dataflow problem where the base


element can be affected by several different variables. An example would be avail[…] […]th procedure and

[…]ically, two different prototypes have been constructed, and they both solve the reaching-definitions dataflow problem using our method. Both prototypes accept C-language programs as the input to be dataflow analyzed. For simplicity, these pro[…]


bit vector and solves the dataflow equations for the program flowgraph as many times as there are base elements. For the reaching-definitions dataflow problem, the definitions in the program are the base elements. We call the approach used by prototype […] one-base-element-at-a-time, and the approach used by prototype […] is all-at-once.

It might be expected that prototype […] would be many times slower than prototype […] because of the big difference in bit-vector sizes, but this is not the case. For prototype […], calculations using varied test results show that V × S […], where V is the average number of visits per flowgraph node made to solve the dataflow equa[…] and the total amount of work prototype […] must do per flowgraph node to solve the equations for the base elements is proportional to the product V × S × D […]. Note that for prototype […] there is only one solving of the equations, and for prototype […] there are as many solvings of the equations as base elements. The primary reason


the approach used by prototype […] is preferable when compared with the all-at-once approach used by prototype […] is the likelihood of a greatly reduced s value. For example, without element recoding, the s value is […] for prototype […] and […] for prototype […]. Allowing element recoding, the s value for the prototype […] approach will be […] max(average number of recoded elements per procedure for any solving of the equations). Here we assume that the best way to add element recoding to prototype […] would be, for each solving of the equations, to solve the equations for both a single base element and all recoded elements generated from that base element.

Table […]. Typical experimental results for the two prototypes.
[Table body not recoverable. Columns: defs, defs global, calls, nodes, and one CPU-time column per prototype; the later rows show NA for one of the prototypes.]

Table […] presents typical experimental results for the two prototypes. Each table row represents a different input program. The input programs were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program in Table […] has […] procedures. Only prototype […] currently has element-recoding logic, so the input programs do not have call parameters, and the table data do not reflect element-recoding costs. Measuring element-recoding costs for randomly generated programs would be somewhat meaningless anyway, since the purposefulness-of-variables principle would be violated.


Referring to the columns of Table […]: "defs" is the total number of definitions in the input program; "defs global" is the percentage that define global variables; "calls" is the number of known calls; "nodes" is the number of flowgraph nodes; "prototype […]" is the total CPU usage time, in minutes and seconds, required by prototype […] to completely solve the reaching-definitions dataflow problem for the input program and generate a report of all the reaches; and "prototype […]" is the same thing for prototype […]. The hardware used was rated at roughly […] MIPS. The large space requirements of prototype […] prevented running it for the larger input programs in the table.
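The contrast between the two approaches can be sketched in a few lines. The following is only an illustration, not either prototype: it is intraprocedural, has no element recoding, and the function names, the four-node flowgraph, and the GEN/KILL values are all hypothetical. It solves the same small reaching-definitions problem once with whole bit vectors (all-at-once) and once per definition with a single bit per set (one-base-element-at-a-time).

```python
def solve_all_at_once(preds, gen, kill, n_nodes):
    """All-at-once: one fixed-point iteration over whole bit vectors.

    Bit i of each integer stands for base element (definition) i.
    """
    out = [0] * n_nodes
    changed = True
    while changed:
        changed = False
        for n in range(n_nodes):
            in_n = 0
            for p in preds[n]:
                in_n |= out[p]                 # confluence operator: union
            new_out = gen[n] | (in_n & ~kill[n])
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return out


def solve_one_at_a_time(preds, gen, kill, n_nodes, n_elems):
    """One-base-element-at-a-time: one solving per element, one bit per set."""
    out = [0] * n_nodes
    for e in range(n_elems):
        bit = 1 << e
        present = [False] * n_nodes            # is element e in OUT[n]?
        changed = True
        while changed:
            changed = False
            for n in range(n_nodes):
                in_n = any(present[p] for p in preds[n])
                new = bool(gen[n] & bit) or (in_n and not (kill[n] & bit))
                if new != present[n]:
                    present[n] = new
                    changed = True
        for n in range(n_nodes):
            if present[n]:
                out[n] |= bit
    return out


# Hypothetical flowgraph: 0 -> 1, 1 -> 2, 2 -> 1 (a loop), 1 -> 3.
# Definitions 0 and 1 define the same variable; definition 2 defines another.
PREDS = {0: [], 1: [0, 2], 2: [1], 3: [1]}
GEN = [0b001, 0b000, 0b010, 0b100]
KILL = [0b010, 0b000, 0b001, 0b000]

ALL_AT_ONCE = solve_all_at_once(PREDS, GEN, KILL, 4)
ONE_AT_A_TIME = solve_one_at_a_time(PREDS, GEN, KILL, 4, 3)
```

Both functions reach the same OUT sets; the difference is only in how much state each solving carries, which is the space-versus-repetition trade-off discussed above.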


"v ← expression", where v is the variable being defined, and […] [state]ment, or containing a call, or containing a flowgraph node. For languages that allow nested procedures, such as Pascal and Ada, note that procedure nesting in these languages is a mechanism for controlling variable scope, and not a mechanism for


sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume that at most a single procedure contains any given statement, call, or flowgraph node.

Let d and dd be two definitions, possibly the same, in the same program. Let dd have a use-variable v, let v be that use-variable instance, and let d define v. Given a possible execution path between definition d and v


intraprocedural dataflow analysis, that all paths in a procedure flowgraph are possible execution paths. However, these two assumptions are unavoidable because determin[…] [pro]gram point where definition d occurs. The elements in both sets are calls. The Allow set identifies only the calls to which the execution path continuing on from point p may make an unmatched return, until the backward-flow restrictions represented by this Allow set are effectively cancelled by the interaction between the execution-path continuation and the Transform set (explained shortly). An unmatched return is a return made during the execution-path continuation to a call instance that precedes


(which is the whole purpose of this formal-analysis section) may be missing pieces that belong in it but were not added to it, because backward-flow restrictions were retained that are not valid for all the possible execution paths involved.

Lemma […]. For any execution path P between two program points p and q, if P includes two or more call instances made in P that have not been returned to in P, then for these unreturned calls, c_i calls the procedure containing c_{i+1}, where c_i is the i-th unreturned call in execution order made in P.

Proof. Assume that the next unreturned call c_{i+1} is not contained in the procedure that was called by c_i.


returned to in P, then c would precede c_{i+1} as an unreturned call following c_i, contradicting the given that c_{i+1} is the next unreturned call in execution order after c_i. If c has been returned to in P, then all calls occurring on the execution path between the call c and the return to c must have been returned to, according to Assumption […]. This would mean c_{i+1} has been returned to, contradicting the given that c_{i+1} has not been returned to. Thus, it is true that c_i calls the procedure containing c_{i+1}, as assuming otherwise leads to contradictions. □

[…] (meaning a return to a call instance that precedes the beginning of P) to a call ∈ […] (1) A = ∅ and P has no unreturned calls, or (2) A ≠ ∅, […] is defined for P, and […] has no unreturned calls, then AA ← ∅ and TT ← ∅.


Proof. For case (1), d is free of backward-flow restrictions, and d has affected dd without making an unreturned call; therefore dd will be free of backward-flow restrictions, giving AA ← ∅ and TT ← ∅. For case (2), as soon as path P makes an unmatched return r to a call ∈ T, then by Definition […] what d can affect is no longer constrained by A and T, and this freedom from constraint by A and T passes by transitivity to dd, because d affects dd. When […] is defined for P, the unmatched return r in P that immediately precedes the beginning of […] means that any unreturned calls in P are also in […]. This is because all call instances within P are more recent than the call instance that matches the unmatched return r. Thus, by Assumption […], all call instances in P preceding the return r must be returned to in P before r can occur. Therefore, P has no unreturned calls because […] has no unreturned calls. Thus, dd is free of backward-flow restrictions, since A, T, and P contribute nothing in the way of constraint, giving AA ← ∅ and TT ← ∅. □

Theorem […]. If (1) A = ∅ and P has at least one unreturned call, or (2) A ≠ ∅, […] is defined for P, and […] has at least one unreturned call, then AA ← ∪ (over all such P) {the unreturned calls of P}, and TT ← ∪ (over all such P) {the first unreturned call in P}.

Proof. For case (1), A and T contribute nothing in the way of constraint to AA_P and TT_P. Because d affects dd along path P, which contains unreturned calls, by Assumption […] those unreturned calls must be returned to first, before any other unreturned calls can be made from the execution-path continuation point of dd onward. Hence, AA_P ← {the unreturned calls of P}. Because d had no backward-flow restrictions, it follows that once all the unreturned calls of P are returned to by the execution-path continuation, then that continuation would no longer have any backward-flow restrictions. Because of Assumption […] and Lemma […], all the unreturned calls of P are returned to when the sequentially first unreturned call in P is returned to. Hence, TT_P ← {the first unreturned call in P}. For case (2), as


shown in the proof of Theorem […], case (2), A and T contribute nothing to AA_P and TT_P when […] is defined for P. Thus, this case (2) is effectively the same as case (1), because the A and T sets contribute nothing, and an unreturned call in […] is an unreturned call in P. Therefore, AA_P ← {the unreturned calls of P}, and TT_P ← {the first unreturned call in P}. From Definition […] and the general definitions of AA, TT, AA_P, and TT_P, it follows that AA ← ∪ (over all such P) AA_P, and TT ← ∪ (over all such P) TT_P. Thus, AA ← ∪ (over all such P) {the unreturned calls of P}, and TT ← ∪ (over all such P) {the first unreturned call in P}. □

Theorem […]. If A ≠ ∅, […] is not defined for P, and P has no unreturned calls, then AA ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure containing dd, such that each unreturned call in this possible execution path is in A)}, and TT ← AA ∩ T.

Proof. Note that only one procedure contains dd. Because […] is not defined for P, it follows that P was constrained in its entirety by A, never making an unmatched return to a call ∈ T. Because P has no unreturned calls, d can only affect dd along P by making one or more unmatched returns to calls ∈ (A − T), unless d and dd are in the same procedure. A in effect represents possible execution paths with unreturned calls by which d was affected. However, once given P, the path P may eliminate some of the paths from A as being possible, and return to some of the unreturned calls in A. Thus, although P contributes nothing directly to AA, it may narrow the unreturned execution-path possibilities that A can contribute to AA. AA as defined for this theorem captures all execution paths in A that begin with a call ∈ T and end with a call of the procedure that contains dd. Given Assumption […], it should be obvious that these are all the possible paths in A that are unreturned after P. Note that if d and dd are in the same procedure, then AA = A and TT = T. Assume that


d and dd are in different procedures. Any call ∈ A that is not part of at least one path in A that makes a call of the procedure containing dd must be excluded from AA, because P requires a path in A that passes through the procedure containing dd; otherwise P could not make a return to the procedure containing dd. Any call ∈ A that is on a path in A between the procedure containing dd and the procedure containing d must be excluded from AA, because the procedure containing dd has been returned to by P. The definition of AA for this theorem satisfies these two exclusions. That TT ← AA ∩ T follows from Definition […] requiring TT ⊆ AA, and from the definition of AA for this theorem. □

Theorem […]. If A ≠ ∅, […] is not defined for P, P has at least one unreturned call, and the first unreturned call in P is contained in procedure X, then S1 ← ∪ (over all such P, given X) {the unreturned calls of P}, and S2 ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure X, such that each unreturned call in this possible execution path is in A)}, AA ← S1 ∪ S2, and TT ← S2 ∩ T.

Proof. S1 follows from Definition […] and the proof of Theorem […]. S2 follows from Theorem […], where the specific "procedure containing dd" in the expression for AA in Theorem […] has been replaced by the equally specific "procedure X".

To show that the union operation of AA combining S1 and S2 does not thereby represent spurious paths in AA, it is only necessary to show that the paths represented in S1 never cross with the paths represented in S2. Two paths cross if each path makes an unreturned call to the same procedure. All paths in S2 end with an unreturned call of procedure X. All paths in S1 begin with an unreturned call contained in procedure X. Assume that both S1 and S2 include an unreturned call to the same procedure. As all paths in S2 lead to procedure X, this means there exists an execution path that originates in procedure X and eventually calls procedure X. Thus,


the execution path represents recursion, and this is contradicted by Assumption […]. Therefore, the paths represented in S1 never cross with the paths represented in S2.

The first unreturned call in P is not added to TT, because the path P is an extension of the unreturned paths represented in S2. That TT ← S2 ∩ T follows from Definition […] requiring TT ⊆ AA, and from the definition of AA for this theorem. □

Figure […]. An example call structure that does not allow overestimation

[…] for dd. Assume the continuation path is r[…] c[…] r[…], where r[…] and r[…] are unmatched returns
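The notions of unreturned call and unmatched return used throughout this section can be made concrete with a small sketch. It assumes, as the analysis does, that calls are returned to in last-in, first-out order. The event encoding, the function name `classify_path`, and the call labels are hypothetical and are not taken from the text.

```python
def classify_path(path):
    """Split the events of an execution path P into unmatched returns and
    unreturned calls.

    `path` is a list of ("call", c) and ("return", c) events, where c names
    the call instance being made or returned to.
    """
    open_calls = []          # calls made in P and not yet returned to in P
    unmatched_returns = []   # returns to call instances that precede P
    for kind, call in path:
        if kind == "call":
            open_calls.append(call)
        elif kind == "return":
            if open_calls:
                if open_calls[-1] != call:
                    raise ValueError("return does not match the most recent call")
                open_calls.pop()
            else:
                unmatched_returns.append(call)
        else:
            raise ValueError("unknown event kind: %r" % (kind,))
    return unmatched_returns, open_calls


# Hypothetical path: returns to a call c0 made before P began, then makes
# c1, makes and returns to c2, and finally makes c3.
PATH = [("return", "c0"), ("call", "c1"), ("call", "c2"),
        ("return", "c2"), ("call", "c3")]
UNMATCHED, UNRETURNED = classify_path(PATH)

# Per-path contribution in the case where d carries no restrictions:
# every unreturned call of P, and the first unreturned call of P.
AA_P = set(UNRETURNED)
TT_P = {UNRETURNED[0]} if UNRETURNED else set()
```

In this example c1 and c3 remain unreturned, and c1 is the sequentially first unreturned call, so returning to c1 implies that c3 has already been returned to.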





The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logical ripple effect. After a brief overview of the algorithm, the dataflow analysis method used by the algorithm is discussed. Then two important properties of the dataflow sets are detailed, followed by three rules that are used to impose backward-flow restrictions on the dataflow analysis that is done. Last are proofs that the algorithm is correct.

The algorithm to compute logical ripple effect is shown in Figure […]. Each statement in the algorithm is numbered on the left. For convenience, algorithm statements will be referred to as lines. For example, a reference to line […] means the statement at […] that actually is printed on several lines. Comments in the algorithm begin with --. ⊥ and ⊤ are just two different, fixed, arbitrary values.

In general, the algorithm works as follows. A definition d and its associated Allow and Transform sets are popped from the stack (line […]), and then the reaching-definitions dataflow problem is solved for this definition d, imposing any backward-flow restrictions represented by the Allow and Transform sets (line […]). Reaching definitions for a single definition is the problem of finding all uses and definitions affected by the definition. The definition d that was dataflow analyzed, and any uses affected by it, are included in the ripple effect (lines […] to […]). Each affected definition will have its Allow and Transform sets determined in accordance with Theorems […] through […] (lines […] to […]). A check is then made to see if the affected definition and its restriction sets, Allow and Transform, should be added to the stack for dataflow analysis or not (lines […] to […]). The algorithm ends when the stack is empty. Although the algorithm shows a single definition b being added to the stack at line […], any number of different b can actually be added, along with empty restriction sets for each b.
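The stack-driven control flow described above can be sketched as follows. This is only the skeleton: it omits the Allow and Transform restriction sets entirely, so it corresponds to an unrestricted ripple effect rather than the precise one, and it replaces the dataflow solving step with two precomputed lookup tables. The names `ripple_effect`, `affected_defs`, and `affected_uses`, and the example data, are hypothetical.

```python
def ripple_effect(start, affected_defs, affected_uses):
    """Collect every definition and use transitively affected by `start`.

    affected_defs[d] and affected_uses[d] stand in for the result of
    solving reaching definitions for the single definition d.
    """
    ripple = set()
    analyzed = set()          # definitions already dataflow analyzed
    stack = [start]
    while stack:              # the algorithm ends when the stack is empty
        d = stack.pop()
        if d in analyzed:
            continue
        analyzed.add(d)
        ripple.add(d)                               # the definition itself
        ripple.update(affected_uses.get(d, ()))     # uses affected by d
        for dd in affected_defs.get(d, ()):         # affected definitions
            if dd not in analyzed:
                stack.append(dd)
    return ripple


# Hypothetical program: def_a affects def_b; def_b affects def_c and def_a.
AFFECTED_DEFS = {"def_a": ["def_b"], "def_b": ["def_c", "def_a"], "def_c": []}
AFFECTED_USES = {"def_a": ["use_1"], "def_b": ["use_2"], "def_c": []}
RIPPLE = ripple_effect("def_a", AFFECTED_DEFS, AFFECTED_USES)
```

In the full algorithm each stack entry also carries its restriction sets, and the same definition may be analyzed more than once when it is reached with different restrictions.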


-- Compute the logical ripple effect for a hypothetical or actual definition b
-- Input: a program flowgraph ready for dataflow analysis
-- Output: the logical ripple effect in RIPPLE
begin
  RIPPLE ← ∅
  for each definition dd in the program
    FIN_dd ← ⊥
  end for
  stack ← ∅
  push (b, ∅, ∅) onto stack
  while stack ≠ ∅ do
    pop stack into (d, ALLOW, TRANSFORM)
    Solve the reaching-definitions dataflow equations for the single definition d, using Rules […] and […]
    RIPPLE ← RIPPLE ∪ {d}
    for each use u in the program that is affected by either d1 or d2
      RIPPLE ← RIPPLE ∪ {u}
    end for
    ROOT1 ← ∅
    LINK1 ← ∅
    ROOT2 ← ∅
    LINK2 ← ∅
    for each call node n in the flowgraph
      if d1 ∈ Bout[n] and d1 crossed from this call into the called procedure
        ROOT1 ← ROOT1 ∪ {the call node n}
      fi
      if d1 ∈ Eout[n] and d1 crossed from this call into the called procedure
        LINK1 ← LINK1 ∪ {the call node n}
      fi
      if d2 ∈ Bout[n] and d2 crossed from this call into the called procedure
        ROOT2 ← ROOT2 ∪ {the call node n}
      fi
      if d2 ∈ Eout[n] and d2 crossed from this call into the called procedure
        LINK2 ← LINK2 ∪ {the call node n}
      fi
    end for

Figure […]. The logical ripple effect algorithm


    for each definition dd in the program that is affected by either d1 or d2
      -- determine Allow and Transform for dd by Theorem […]
      if d2 ∈ Bin[node where dd occurs]
        PATHS ← ∅
        TRANS ← ∅
        call Analyze
      else
        -- determine Allow and Transform for dd by Theorem […]
        if d2 ∈ Ein[node where dd occurs]
          PATHS ← ∅
          PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ (PATHS ∩ LINK2))}
          TRANS ← ROOT2 ∩ PATHS
          call Analyze
        fi
        -- determine Allow and Transform for dd by Theorem […]
        if d1 ∈ Bin[node where dd occurs]
          PATHS ← ∅
          PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ PATHS)}
          TRANS ← TRANSFORM ∩ PATHS
          call Analyze
        fi
        -- determine Allow and Transform for dd by Theorem […]
        if d1 ∈ Ein[node where dd occurs]
          for each procedure X that contains a call ∈ ROOT1
            RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
            PP ← ∅
            PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ x is on a path that inclusively begins with a call ∈ RT1 and ends with a call of the procedure that contains dd, such that each call in this path is in (RT1 ∪ LINK1)}
            if PP ≠ ∅
              PATHS ← ∅
              PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls a procedure that contains a call c ∈ PATHS)}
              TRANS ← TRANSFORM ∩ PATHS
              PATHS ← PATHS ∪ PP
              call Analyze
            fi
          end for
        fi
      fi
    end for
  od
end

Figure […], continued


Procedure Analyze
begin
  -- avoid repetition of dd dataflow analysis if possible
  if FIN_dd ≠ ⊤ ∧ (PATHS = ∅ ∨ (true for all saved pairs for dd: (PATHS ⊈ P ∨ TRANS ⊈ T)))
    if PATHS = ∅
      FIN_dd ← ⊤
      push (dd, ∅, ∅) onto stack
    else
      save PATHS and TRANS as the pair P × T for dd
      push (dd, PATHS, TRANS) onto stack
    fi
  fi
end

Figure […], continued

The dataflow equations referred to in line […] are shown in Figure […]. These equations are copied from Chapter […], which presents a method for context-dependent flow-sensitive interprocedural dataflow analysis. The method consists of solving, using the standard iterative algorithm, […] i.e., the node whose


For any node n:
  IN[n] = Ein[n] ∪ Bin[n]
  OUT[n] = Eout[n] ∪ Bout[n]

Group I: n is an entry node.
  Bin[n] = ∅
  Ein[n] = ∪ (over p ∈ pred(n)) {x | x ∈ OUT[p] ∧ C[…]}
  Bout[n] = GEN[n]
  Eout[n] = Ein[n] ∪ RECODE[n]

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.
  Bin[n] = {x | (x ∈ Bout[p] ∧ (C[…] ∨ (C[…] ∧ C[…] ∧ x ∈ Eout[q]))) ∨ (x ∈ Bout[q] ∧ C[…])}
  Ein[n] = {x | x ∈ Eout[p] ∧ (C[…] ∨ (C[…] ∧ C[…] ∧ x ∈ Eout[q]))}
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout[n] = Ein[n] − KILL[n]

Group III: n is not an entry or return node.
  Bin[n] = ∪ (over p ∈ pred(n)) Bout[p]
  Ein[n] = ∪ (over p ∈ pred(n)) Eout[p]
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout[n] = Ein[n] − KILL[n]

Figure […]. Dataflow equations for the reaching-definitions problem
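As a small illustration of the simplest group of equations, the sketch below evaluates the Group III equations (ordinary nodes) once at a single node, together with the IN and OUT equations that hold for any node. Only the union, difference, GEN, and KILL structure is taken from the figure; the function name `update_ordinary_node`, the dictionary representation, and the example sets are hypothetical, and the entry-node and return-node groups with their crossing conditions are not modeled.

```python
def update_ordinary_node(n, preds, b_out, e_out, gen, kill):
    """Evaluate the Group III equations once at node n.

    Body (B) and entry (E) sets are kept separate; only body sets
    receive GEN[n], and both lose KILL[n].
    """
    b_in = set().union(*(b_out[p] for p in preds[n]))
    e_in = set().union(*(e_out[p] for p in preds[n]))
    new_b_out = (b_in - kill[n]) | gen[n]
    new_e_out = e_in - kill[n]
    in_n = e_in | b_in                 # IN[n]  = Ein[n]  ∪ Bin[n]
    out_n = new_e_out | new_b_out      # OUT[n] = Eout[n] ∪ Bout[n]
    return b_in, e_in, new_b_out, new_e_out, in_n, out_n


# Hypothetical node 2 with predecessors 0 and 1.
PREDS = {2: [0, 1]}
B_OUT = {0: {"da"}, 1: {"db"}}
E_OUT = {0: {"dc"}, 1: set()}
GEN = {2: {"dd"}}
KILL = {2: {"da", "dc"}}

RESULT = update_ordinary_node(2, PREDS, B_OUT, E_OUT, GEN, KILL)
```

A full solver would repeat this update over all nodes, with the special equations at entry and return nodes, until no set changes.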


associated block of program code contains the definition d […]


[…] ∈ Bout[q], then d1 cannot cross from Bout[q] into the Bin[n] set if p […]





[…] ∈ Bin[node where dd occurs], then by Property […] there exists a definition-clear path P between d and dd that has no unreturned calls, and somewhere along P, d2 is generated, meaning either ALLOW = ∅ or […] is defined for P. This satisfies the conditions of Theorem […], and line […] sets PATHS and TRANS to empty in accordance with the theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line […] checks for the application of Theorem […]. If d2 ∈


[…] requirement in line […]. Once the PATHS set is computed, line […] computes TRANS in accordance with the theorem.

The test at line […] checks for the application of Theorem […]. If d1 ∈ Bin[node where dd occurs], then by Property […] there exists a definition-clear path P between d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and P does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the element, meaning […] is not defined for P. This satisfies the conditions of Theorem […]. Referring to Theorem […], line […] computes the AA set, and line […] computes TT. What line […] does is extract from ALLOW all paths that end with a call of the procedure containing dd. Although Theorem […] states that the path begins with a call ∈ TRANSFORM, line […] does not require a check for this, because TRANSFORM is a subset of ALLOW, and those first unreturned calls in TRANSFORM that are on a path in ALLOW to dd will unavoidably be picked up as the paths are built backwards


from dd. Thus, the PATHS set is computed in accordance with Theorem […], followed by line […] that computes the TRANS set in accordance with the theorem.

The test at line […] checks for the application of Theorem […]. If d1 ∈ Ein[node where dd occurs], then by Property […] there exists at least one definition-clear path P between d and dd that has at least one unreturned call. It also follows that ALLOW […] □

Lemma […]. Let A_i and T_i be one pair of Allow and Transform sets associated with a definition d, and let A_j and T_j be a different pair of Allow and Transform sets associated with the same definition d. Assume A_i ≠ ∅ and A_j ≠ ∅. If A_j ⊆ A_i and T_j ⊆ T_i, then dataflow analyzing d with the pair A_j and T_j cannot add anything to the ripple effect that is not added by dataflow analyzing d with the pair A_i and T_i.

Proof. By inspection of Rules […] and […], it can be seen that removing some of the calls from A_i or T_i cannot make d affect anything that it does not affect with A_i and T_i as they were. Also, by inspection of lines […] to […], the determination of the Allow and Transform sets for any definition dd affected by d cannot be made to include calls, when A_j and T_j are the restriction sets for d, that would not be included when A_i and T_i are the restriction sets for d. □

Lemma […]. Let A and T be Allow and Transform sets associated with a definition d, and let X and Y be a different pair of Allow and Transform sets associated with


□

Theorem […]. Given Definition […] and Theorems […] through […], the algorithm will correctly compute the logical ripple effect.

Proof. As shown by Lemma […], for any affected definition dd, the Allow and Transform sets to be associated with dd are computed in accordance with Theorems […] to […]. By Lemma […], if Theorem […] applies to an affected definition (line […]), then there is no need to check if any other theorem also applies, because additional dataflow analysis resulting from the other theorems cannot contribute to the ripple effect. However, if Theorem […] does not apply, then the definition must be dataflow analyzed separately, in turn, for each theorem that does apply. This is done by the sequence of three if statements at lines […], […], and […]. Thus, the control logic in lines […] to […] is safe.

The Analyze procedure (lines […] to […]) prepares a definition and its restriction sets for dataflow analysis by adding them to the stack (lines […] and […]). Once a definition is to be dataflow analyzed with no restrictions (line […]), it will not be analyzed again (line […]). By Lemma […], this is safe. Assuming FIN_dd ≠ ⊤ and PATHS ≠ ∅, the test at line […] will not prepare a definition for dataflow analysis if both restriction sets


are subsets of any pair of restriction sets used previously to analyze that definition. This follows from Lemma […]. Thus, the Analyze procedure is safe.

The correctness of the dataflow equations (line […]) is established in Chapter […], and the correctness of the three rules for imposing backward-flow restrictions (line […]) has already been discussed. Regarding the correctness of having no backward-flow restrictions for the initial definition (line […]), let p be the program point where b occurs. For execution to attain point p, any possible execution path between the program's execution starting point and point p can be assumed to have occurred. Thus, there should be no restrictions on the backward-flow possibilities of b, because there were no constraints imposed by the ripple effect on how point p was initially attained. □

Programs with recursive calls can be processed by our algorithm, but there may be some overestimation of the logical ripple effect because of the recursive calls. The dataflow equations (line […])


Let n be the number of nodes in the flowgraph of the input program. For a programming language such as C, solving the dataflow equations for a single definition, which is what line […] does, has worst-case complexity of O(n[…]) […] where n is the number of flowgraph nodes and d is the number of definitions in the overestimated ripple effect. This complexity follows from the O(n[…]) complexity of solving the dataflow equations for a single definition, and the fact that the equations will have to be solved d times.


Table […]. Experimental results for the prototype.
[Table body not recoverable. Columns: globals, defs, defs global, depth, nodes, RS, RSP, reduction, time, timep.]

Table […] presents test results for the prototype. Each row details relevant characteristics of an input program, and presents the resulting averages of ten different tests of that input program, where each test computed the ripple effect started by a single, randomly chosen definition of a global variable.

The input programs of Table […] were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program of Table […] has […] procedures and exactly the number of global variables listed. Within each input


program, each global variable is defined and used at least once. The call structure of each input program was determined randomly by the generator, with the constraint that there be no recursion in the input program, and that the given maximum call depth not be exceeded by any call in the input program. All calls in the generated input program are known calls, and approximately 1/(max + 1) of the calls will be at each possible depth from zero to max, where max is the given maximum call depth.

Referring to the columns of Table […]: "globals" is the number of global variables in the input program; "defs" is the number of definitions in the input program; "defs global" is the percentage of the definitions that define a global variable; "depth" is the maximum call depth, followed by the total number of calls in the input program; "nodes" is the number of nodes in the flowgraph; "RS" is the average size of the overestimated ripple effect for the ten test cases, where size is the total number of definitions and uses in the ripple effect; "RSP" is the average size of the precise ripple effect; "reduction" is the average percentage reduction, for the ten test cases, of the size of the overestimated ripple effect when it is replaced by the precise ripple effect; "time" is the average CPU usage time for each test case to compute the overestimated ripple effect; and "timep" is the average CPU usage time for each test case to compute the precise ripple effect. The hardware used was rated at roughly […] MIPS. As an example of the time notation used in Table […], a time of 1m[…]s would be read as 1 minute, […] seconds.

Although the worst-case complexity of our algorithm for precise logical ripple effect is exponential, the data of Table […] indicate that the expected complexity for a wide range of input programs, given a programming language such as C, is approximated by O(n[…]d). This follows from the O(n[…]d) worst-case complexity of computing the overestimate, and the typical closeness of time and timep for each row in Table […]. However, the last row of Table […] is instructive because it shows that, regardless of what the expected complexity might be, there will always be specific input programs

PAGE 77



PAGE 78

[Figure: The slicing algorithm. It computes the slice for a hypothetical or actual
use b. Input: a program flowgraph ready for dataflow analysis. Output: the slice in
SLICE. The figure includes Procedure Analyze, which avoids repetition of dataflow
analysis where possible. The pseudocode body is illegible in this copy.]
Rule 1: If ALLOW = ∅ then element u1 is generated at the node where use u occurs;
otherwise u2 is the generated element.
Rule 2: Let n be a call node, p be the associated return node, and q be the entry
node of the returned-from procedure. Each time the Bout[n] equation is computed, if
u2 ∈ Bin[q], then u2 cannot cross from Bin[q] into the Bout[n] set if p ∉ ALLOW.
Rule 3: Let n be a call node, p be the associated return node, and q be the entry
node of the returned-from procedure. Each time the Bout[n] equation is computed, if
u2 ∈ Bin[q], and by C[illegible] and Rule 2, u2 can cross from Bin[q] into the Bout[n]
set, and p ∈ TRANSFORM, then as this u2 element crosses from Bin[q] into the Bout[n]
set, the element is changed to u1. In effect, u2 is transformed into u1, and the call
node n becomes a generation node for the u1 element.
As the usefulness of slicing is primarily for program fault localization, it may
be desirable to modify the algorithm so that those uses in control predicates whose
subordinate statements have at least one use or definition already in the slice are
themselves added to the slice and propagated in turn. An example of a control
predicate is the condition tested by an if statement. By subordinate statements is
meant those statements whose execution is decided by the control predicate. Including
these control-predicate uses in the slice is advantageous because the cause of a
program error may actually be in a control predicate that is not deciding correctly
when to execute its subordinate statements. Ferrante et al. [8] present a method to
precisely determine the control predicates for each statement.
[Figure: Dataflow equations for the reaching-uses problem]

CHAPTER 4
INTERPROCEDURAL PARALLELIZATION
4.1 Loop-Carried Data Dependence
This section explains loop-carried data dependence and its relevance to
parallelization. When a definition of a variable reaches a use of that variable, then
a data dependence exists such that the use depends on the definition. An example of
data dependence can be seen in Figure [illegible]. The use of A(I) at line [illegible]
and the use of A(I) at line [illegible] both depend on the definition of A(I) [...]
[Figure: An example loop]
[...] would mean that the ordering of the different iterations of the loop is
unimportant, whereas a loop-carried dependence means the opposite. If there are no
loop-carried data dependencies, then there is no requirement that the iterations be
ordered a certain way. However, whenever a loop is parallelized, there should be a
following added serial step that sets the iteration variables, such as the I in Figure
[illegible], to whatever their values would be for the last iteration of the loop,
assuming the loop had not been parallelized. This added step would be necessary,
assuming the iteration variables of a loop are visible outside the loop and can
therefore be referenced after the loop completes. Iteration variables are those
variables that are incremented or decremented a constant value for each loop
iteration. The recognition of iteration variables is language-dependent.
Regarding data dependence and arrays, there are several efficient tests available
that determine if a data dependence is possible between a particular definition and
use of an array. The tests are the separability test, the gcd test, and the Banerjee
test. Details of these three tests can be found in [illegible]. The number theory
behind the tests is linear diophantine equations. A linear diophantine equation can be
formed from the array subscripts of the definition and use in question. For example,
in Figure [illegible], we want to know if the two references to array A, whose
subscripts are linear in I, can ever refer to the same array element. The linear
diophantine equation that relates these two array references would be [...]
[Figure: a DO loop containing the two references to array A; contents illegible]
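The gcd test named above can be sketched in a few lines. This is a minimal illustration, assuming one-dimensional subscripts of the form a*i + b for the definition and c*j + d for the use; the function name `gcd_test` and its parameters are hypothetical and do not come from the dissertation. The test only asks whether the linear diophantine equation a*i - c*j = d - b has any integer solution, so it ignores loop bounds and can report a possible dependence that never occurs in practice.

```python
from math import gcd

def gcd_test(a, b, c, d):
    """Can A[a*i + b] and A[c*j + d] name the same element for some integers i, j?

    The equation a*i - c*j = d - b has an integer solution exactly when
    gcd(a, c) divides d - b. Loop bounds are ignored (assumption of this sketch).
    """
    g = gcd(a, c)
    if g == 0:
        # Both subscripts are constants: they match only if the constants are equal.
        return b == d
    return (d - b) % g == 0

# Hypothetical subscripts, not taken from the dissertation's figure.
even_vs_odd = gcd_test(2, 0, 2, 1)    # A[2*i] versus A[2*j + 1]
even_vs_even = gcd_test(2, 0, 4, 2)   # A[2*i] versus A[4*j + 2]
```

When the test returns False, the definition and use can never touch the same element, so the pair can be marked as nonreaching; a True result only means a dependence is not ruled out.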

[Figure: The parallelization algorithm. The recoverable content is summarized here;
the pseudocode body of procedure Ptest is illegible in this copy.]
Notation: a d-u pair is a definition d that reaches a use u; x is the dataflow element
that represents the definition d; v is the variable referenced by definition d and use
u; n is the first node traversed during each iteration of loop L; d is the node whose
basic block contains definition d; u is the node whose basic block contains use u; LB
is the set of nodes in the loop body of loop L; IV is the set of definitions of
iteration variables for loop L. To avoid complications, n ≠ d ≠ u is assumed.
Step 1: determine reaching definitions for the input program, using our method to
solve the reaching-definitions dataflow problem.
Step 2: improve the reaching-definition information for array references. For all d-u
pairs in the program such that v is an array, use the separability, gcd, and Banerjee
tests, as applicable; if definition d and use u can never reference the same element,
mark the d-u pair as nonreaching.
Step 3: identify d-u pairs that inhibit parallelization. For each loop L in the
program, and for each reaching d-u pair such that d, u ∈ LB and definition d ∉ IV, if
x ∈ Bout[n] and Ptest(x, n, d, u, L, LB) = T, mark L parallelization as inhibited by
the d-u pair.
Procedure Ptest(x, n, d, u, L, LB): is there a loop-carried data dependence from
definition d to use u through node n? It returns T if yes, F if no. Part 1 asks
whether there is a path from d to n along which x is found. Part 2 asks whether there
is a path from n to u along which x is found.
[...] of a node are examined because normally a successor node is assumed to represent
a possible continuation of the execution path from the point of the predecessor node.
Exceptions in the algorithm involving entry and exit nodes are explained shortly. Note
that Ptest only determines whether a satisfactory path P exists or not; it does not
determine what path P is in terms of an actual node sequence, as there may be many
such satisfactory paths P.
Lines [illegible] and [illegible] are active when v is not an array. In this case,
a path P that includes d, u, n, d, u in that order is not allowed, and this is
prevented by marking the unwanted node u at line [illegible] and the unwanted node d
at line [illegible].
The test of x ∈ Bout[s] at line [illegible] satisfies the requirement that the
definition d can reach along the path P. A similar test is made at line [illegible].
At line [illegible] only the B set is checked because there are no descents into
called procedures, as per the rejection of entry nodes at line [illegible]. Entry
nodes are rejected at line [illegible] because any path from d to n will not leave
unreturned calls, because n is an outermost node relative to the loop body, and the
path is confined to the loop body. As the successors of each call node are an entry
node and a return node, it is only necessary to check the out set of the return node
to know whether the element x survived the call or not, and this is effectively done
by the x ∈ [...]
[...] where n is the number of flowgraph nodes. For the entire algorithm, step 3
dominates, so the complexity is O(lpn), where l is the number of loops in the program,
p is the number of d-u pairs in the program, and n is the number of flowgraph nodes.
[...] interprocedural logical ripple effect and slicing. The algorithms use our
interprocedural dataflow analysis method, and add a control mechanism by which, in
effect, execution-path history can affect execution-path continuation as the ripple
effect or slice is built piece by piece. The importance of our algorithms for precise
interprocedural logical ripple effect and slicing lies in their applicability to the
areas of software maintenance and debugging [...]
[...] interprocedural logical ripple effect and slicing, because the algorithms may
overestimate when recursive calls are present, or because the Allow set lacks the
information needed to enforce the ordering of unmatched returns, one area of future
research would be to investigate the possibility of modifying Definition [illegible],
Theorems [illegible] through [illegible], and the algorithms, so as to remove the
possibility of such overestimation.
REFERENCES
[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the SIGPLAN
Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices
(June 1990).
[2] Aho, A., Sethi, R., and Ullman, J. Compilers: Principles, Techniques, and Tools.
Addison-Wesley, Reading, MA (1986).
[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress,
North Holland, Amsterdam (1974).
[4] Banning, J. An efficient way to find the side effects of procedure calls and the
aliases of variables. Conference Record of the 6th ACM Symposium on Principles of
Programming Languages, ACM, New York (1979).
[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and
parallelization. Proceedings of the SIGPLAN Symposium on Compiler Construction (1986).
[6] Callahan, D. The program summary graph and flow-sensitive interprocedural data
flow analysis. Proceedings of the SIGPLAN Conference on Programming Language Design
and Implementation, ACM SIGPLAN Notices (July 1988).
[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time.
Proceedings of the SIGPLAN Conference on Programming Language Design and
Implementation, ACM SIGPLAN Notices (July 1988).
[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph and its
use in optimization. ACM Transactions on Programming Languages and Systems (1987).
[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use
dependencies. Proceedings of the IEEE Computer Society Int'l Conference on Computer
Languages, New Orleans, LA (March 1990).
[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New York
(1977).
[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence
graphs. ACM Transactions on Programming Languages and Systems (Jan. 1990).
[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures.
Proceedings of the IEEE COMPSAC (Oct. 1988).
[13] Johmann, K., Liu, S., and [...]
[14] Korel, B., and Laski, J. Dynamic program slicing. Information Processing Letters
(Oct. 1988).
[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification.
Conference Record of the 18th ACM Symposium on Principles of Programming Languages,
ACM, New York (1991).
[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions on
Software Engineering SE-13 (Dec. 1987).
[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference
Record of the 8th ACM Symposium on Principles of Programming Languages, ACM, New York
(1981).
[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental
results. Software: Practice and Experience (1989).
[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM (April
1979).
[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM
Computing Surveys (Sep. 1986).
[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis.
In Muchnick, S., and Jones, N. (Eds.), Program Flow Analysis: Theory and Applications.
Prentice-Hall, Englewood Cliffs, NJ (1981).
[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call
statements. Proceedings of the SIGPLAN Symposium on Compiler Construction (1986).
[23] Weiser, M. Programmers use slices when debugging. Communications of the ACM (July
1982).
[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering SE-10 (July
1984).
[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers.
Addison-Wesley, Reading, MA (1990).
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope and
quality, as a dissertation for the degree of Doctor of Philosophy.
Stephen S. [...]


CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL
DATAFLOW ANALYSIS AND ITS APPLICATION
TO SLICING AND PARALLELIZATION
By
KURT JOHMANN
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1992

ACKNOWLEDGEMENTS
I would like to express my appreciation and gratitude to my chairman and
advisor, Dr. Stephen S. Yau, for his careful guidance and generous support during
this study. I also would like to express my appreciation and gratitude to my previous
advisor, Dr. Sying-Syang Liu. Without their supervision and counsel, this work would
not have been possible. To Dr. Paul Fishwick, Dr. Richard Newman-Wolfe, and Dr.
Mark Yang, members of the supervisory committee, go my thankfulness for their
service.
Finally, I want to thank the Software Engineering Research Center (SERC) for
providing financial support during this study.

TABLE OF CONTENTS
ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
1.1 Interprocedural Dataflow Analysis
1.2 Slicing and Logical Ripple Effect
1.3 Parallelization
1.4 Literature Review
1.5 Outline in Brief
2 THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD
2.1 Constructing the Flowgraph
2.2 Interprocedural Forward-Flow-Or Analysis
2.2.1 The Dataflow Equations
2.2.2 Element Recoding for Aliases
2.2.3 Implicit Definitions Due to Calls
2.3 Interprocedural Forward-Flow-And Analysis
2.4 Interprocedural Backward-Flow Analysis
2.5 Complexity of Our Interprocedural Analysis Method
2.6 Experimental Results
3 INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT
3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect
3.2 The Logical Ripple Effect Algorithm
3.3 A Prototype Demonstrates the Algorithm
3.4 The Slicing Algorithm
4 INTERPROCEDURAL PARALLELIZATION
4.1 Loop-Carried Data Dependence
4.2 The Parallelization Algorithm
5 CONCLUSIONS AND FUTURE RESEARCH
5.1 Summary of Main Results
5.2 Directions for Future Research
REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL
DATAFLOW ANALYSIS AND ITS APPLICATION
TO SLICING AND PARALLELIZATION
By
Kurt Johmann
May 1992
Chairman: Dr. Stephen S. Yau
Major Department: Computer and Information Sciences
Interprocedural dataflow analysis is important in compiler optimization,
automatic vectorization and parallelization, program revalidation, dataflow anomaly
detection, and software tools that make a program more understandable by showing
data dependencies. These applications require the solution of dataflow problems
such as reaching definitions, live variables, available expressions, and definition-use
chains. When solving these problems interprocedurally, the context of each call must
be taken into account.
In this dissertation we present a method to solve this kind of dataflow problem
precisely. The method consists of special dataflow equations that are solved for a
program flowgraph. Regarding calling context, separate sets, called entry and body
sets, are maintained at each node in the flowgraph. The entry set contains calling-
context effects that enter a procedure. The body set contains effects that result
from statements in the procedure. By isolating calling-context effects in the entry
set, a call’s nonkilled calling context is preserved by means of a simple intersection
operation done at the return node for the call.

Slicing determines program pieces that can affect a value. Logical ripple effect
determines program pieces that can be affected by a value. Both slicing and logical
ripple effect are useful for software maintenance. The problems of slicing and logical
ripple effect are inverses of each other, and a solution of either problem can be inverted
to solve the other. Precise interprocedural logical ripple effect analysis is complicated
by the fact that an element may be in the ripple effect by virtue of one or more specific
execution paths. In this dissertation we present an algorithm that builds a precise
logical ripple effect or slice piece by piece, taking into account the possible execution
paths. The algorithm makes use of our interprocedural dataflow analysis method,
and this method is also used in an algorithm given in this dissertation for identifying
loops that can be parallelized.

CHAPTER 1
INTRODUCTION
1.1 Interprocedural Dataflow Analysis
Dataflow analysis refers to a class of problems that ask about the relationships
that exist along a program’s possible execution paths, between such program
elements as variables, constants, and expressions [2, 10]. When dataflow analysis is
done for a program by treating its individual procedures as being independent of
each other, regardless of the calls made, this is known as intraprocedural analysis.
For intraprocedural analysis, assumptions must be made about the effects of calls.
By contrast, interprocedural analysis replaces assumptions with specific information
about the effects of each call. This information can be gathered by either flow-
sensitive [3, 6, 9, 17, 19, 21] or flow-insensitive [4, 7, 18] analysis. When answering
a dataflow question, a flow-sensitive analysis will take into account the flow paths
within procedures, whereas a flow-insensitive analysis ignores these flow paths. The
flow paths are the possible execution paths. Flow-sensitive analysis typically provides
more precise information, but at greater cost.
Flow-sensitive interprocedural dataflow analysis has two major problems that
make it significantly harder than intraprocedural analysis. First, in intraprocedural
analysis, it is assumed that any path in the flowgraph is a possible execution path.
By contrast, for interprocedural analysis, it is useful to assume that the possible
execution paths conform to the rule that once a procedure is entered by a call, the
flow returns to that call upon return. Thus, the set of possible execution paths will
typically be a proper subset of the paths in the program flowgraph. This problem
will be referred to as the calling-context problem. Second, call-by-reference formal
parameters typically cause alias relationships between actual and formal parameters
that are valid only for certain calls and apply only to those passes through the called
procedure that originate from those calls that establish the specific alias relationship.
There are many applications for a flow-sensitive interprocedural dataflow
analysis method that solves the two major problems, assuming that the costs of the
method are not too high. Some of the well-known dataflow problems that can be
precisely solved by such a method are reaching definitions, live variables, the related
problems of definition-use and use-definition chains, and available expressions.
Applications that require the solution of one or more of these dataflow problems include
compiler optimization, automatic vectorization and parallelization of program code,
program revalidation, dataflow anomaly detection, and software tools that show data
dependencies.
In this dissertation we present a new method for flow-sensitive interprocedural
dataflow analysis that solves the two major problems, and does so at a comparatively
low cost [13]. The method consists of special dataflow equations that are solved for
a program flowgraph. In deference to calling context, separate sets, called entry and
body sets, are maintained at each node in the flowgraph. The entry set contains
calling-context effects that enter a procedure. The body set contains effects that
result from statements in the procedure. By isolating calling-context effects in the
entry set, a call’s nonkilled calling context is preserved by means of a simple
intersection operation done at the return node for the call. The main advantage of our
method is its low complexity, and the fact that the presence of recursion does not
affect the preciseness of the result.
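The intersection step can be pictured with a small sketch. It is a simplified illustration, not the dataflow equations of Chapter 2: the function name `sets_at_return`, its four parameters, and the element names are assumptions made for this example, and element recoding for aliases is ignored. The sketch assumes that the elements held at a call are copied into the called procedure's entry set, so that whatever remains in the entry set at the procedure's exit was not killed on some path through the procedure.

```python
def sets_at_return(call_entry, call_body, exit_entry, exit_body):
    """Illustrative only: entry and body sets arriving at one call's return node.

    call_entry, call_body: sets held at the call node in the caller.
    exit_entry, exit_body: sets held at the exit node of the called procedure.
    """
    # The exit's entry set mixes surviving context from *every* call of the
    # procedure. Intersecting with what this call held keeps only this call's
    # own nonkilled context.
    ret_entry = call_entry & exit_entry
    # Effects produced by statements inside the called procedure flow back
    # to every caller unconditionally.
    ret_body = (call_body & exit_entry) | exit_body
    return ret_entry, ret_body

# Hypothetical elements: d2 is killed inside the callee, d9 belongs to another call.
ret_entry, ret_body = sets_at_return(
    call_entry={"c1"},
    call_body={"d1", "d2"},
    exit_entry={"c1", "d1", "d9"},
    exit_body={"d3"},
)
```

In this example d9 reached the exit from a different call site, and the intersection keeps it from leaking into this caller.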
The language model assumed for Chapter 2 allows global variables, but the
visibility of each formal parameter is limited to the single procedure that declares
it. Thus, with the exception of a call and its indirect reference, each formal
parameter can only be referenced inside a single procedure. Examples of programming
languages that fit this model are C and FORTRAN. This restriction on the visibility
of formal parameters is imposed for the sake of the discussions of element recoding
in Sections 2.2.2 and 2.3, of implicit definitions in Section 2.2.3, and of worst-case
complexity in Section 2.5. Our method can also be used for the alternative language
model that allows each formal parameter to have visibility in more than a single
procedure, but this is considered only briefly at the end of Section 2.5.
1.2 Slicing and Logical Ripple Effect
Given an actual or hypothetical variable v at program point p, determine all
program pieces that can possibly be affected by the value of v at p. This is the logical
ripple effect problem. Given v and p, determine all program pieces that can possibly
affect the value of v at p. This is the slicing problem. The two problems are
inverses of each other: a solution for one of them, once inverted, is a solution
for the other.
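The inverse relationship can be illustrated on a toy "affects" graph. This is a sketch under simplifying assumptions: the graph, the node names, and the helper functions `reachable` and `invert` are hypothetical, and plain graph reachability stands in for the dataflow analysis that the dissertation actually uses.

```python
from collections import defaultdict

def reachable(graph, start):
    """Return every node reachable from `start` by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def invert(graph):
    """Reverse every edge of the graph."""
    inverted = defaultdict(set)
    for src, dsts in graph.items():
        for dst in dsts:
            inverted[dst].add(src)
    return dict(inverted)

# Hypothetical relation: an edge a -> b means "a can affect b".
affects = {"a": {"b", "c"}, "b": {"d"}, "c": {"d"}, "e": {"d"}}

ripple_of_a = reachable(affects, "a")         # what a can affect
slice_of_d = reachable(invert(affects), "d")  # what can affect d
```

The same traversal answers both questions; only the direction of the edges changes.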
Logical ripple effect is useful for helping a programmer to understand how a
program change, either actual or hypothetical, will impact that program. Making
program changes as part of routine maintenance often introduces new errors into the
changed program. Such errors typically result because the programmer overlooked
some part of the logical ripple effect for that change. By showing a programmer what
the logical ripple effect actually is for a program change, mistakes can be avoided.
Slicing is primarily useful for program fault localization [23]. If a variable v at
point p is known to have a wrong value, then a slice on v at p will narrow the search
for the cause of the error to that part of the program that can truly affect v at p.
Thus, the fault is localized. The more precise the slice, the more localized the cause
of the error, saving programmer time.

In this dissertation we are concerned only with static logical ripple effect and
slicing [11, 12, 16, 24] where the ripple effect or slice is determined from dataflow
analysis of the program text. The alternative approach is dynamic logical ripple
effect and slicing [1, 14] where the ripple effect or slice is determined by actually
executing the program. Whenever we speak of execution paths in Chapter 3, we
always mean possible execution paths as determined by dataflow analysis.
Precise interprocedural logical ripple effect analysis is complicated by the fact
that a definition may be added to the ripple effect because of one or more specific
execution paths. To determine in turn the ripple effect of that added definition,
that definition should be constrained to those execution paths that are the possible
continuations of the execution paths along which that definition was itself affected
and thereby added to the ripple effect. We refer to this as the execution-path problem.
In particular, it is those call instances made in an execution path P that have
not been returned to in P that cause the difficulty. This is because of the rule that a
called procedure returns to its most recent caller. This means that any continuation
of the execution path P must first return to those unreturned calls in P before returns
can possibly be made to call instances that precede P. An example will illustrate the
problem.
procedure main
begin
1: f ← 7
2: call A
3: z ← x
4: f ← 1
5: call B
end

procedure B
begin
6: y ← f + 5
end

procedure A
begin
7: call B
8: x ← y
end
For the example, assume that all variables are global, and that the problem is to
determine the logical ripple effect for the definition of variable f at line 4. The call
to procedure B at line 5 allows the definition of f at line 4 to affect the definition of
y at line 6, and the return of procedure B would be to the call at line 5 by which the
definition of y at line 6 was affected. The end result is that the ripple effect should
include only line 6. However, assume that the execution-path problem is ignored and
all returns are possible when the ripple effect is computed. For the same problem, the
call at line 5 allows the definition of f at line 4 to affect the definition of y at line 6.
Then the definition of y at line 6 affects the definition of x at line 8 by procedure B
returning to the call at line 7 in addition to the call at line 5. Then the definition of
x at line 8 affects the definition of z at line 3 by procedure A returning to the call at
line 2. The end result is a ripple effect that includes lines 3, 6, and 8, but only line 6
should be in the ripple effect.
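The two outcomes described for this example can be reproduced with a brute-force path exploration. The sketch below is only an illustration of the execution-path problem, not the algorithm of Chapter 3: the program encoding (`PROGRAM`, `BODY`), the function `ripple`, and the flag `match_returns` are assumptions made for this example, and the explicit call stack would not terminate on recursive programs.

```python
# line -> ("assign", (target, used variables)) or ("call", callee)
PROGRAM = {
    1: ("assign", ("f", [])),
    2: ("call", "A"),
    3: ("assign", ("z", ["x"])),
    4: ("assign", ("f", [])),
    5: ("call", "B"),
    6: ("assign", ("y", ["f"])),
    7: ("call", "B"),
    8: ("assign", ("x", ["y"])),
}
BODY = {"main": [1, 2, 3, 4, 5], "B": [6], "A": [7, 8]}

def ripple(start_line, var, match_returns):
    """Lines whose definitions are affected by the definition of var at start_line."""
    proc_of = {line: p for p, lines in BODY.items() for line in lines}
    call_sites = {p: [l for l, (k, t) in PROGRAM.items() if k == "call" and t == p]
                  for p in BODY}

    def after(line):
        # Next line in the same procedure, or None at the procedure's end.
        lines = BODY[proc_of[line]]
        i = lines.index(line)
        return lines[i + 1] if i + 1 < len(lines) else None

    affected, seen = set(), set()
    work = [(after(start_line), proc_of[start_line], frozenset([var]), ())]
    while work:
        line, proc, tainted, stack = work.pop()
        if (line, proc, tainted, stack) in seen:
            continue
        seen.add((line, proc, tainted, stack))
        if line is None:  # procedure exit: decide where the return may go
            if match_returns and stack:
                sites, rest = [stack[-1]], stack[:-1]  # most recent caller only
            else:
                sites, rest = call_sites[proc], ()     # any call site
            for site in sites:
                work.append((after(site), proc_of[site], tainted, rest))
            continue
        kind, payload = PROGRAM[line]
        if kind == "call":
            new_stack = stack + (line,) if match_returns else ()
            work.append((BODY[payload][0], payload, tainted, new_stack))
        else:
            target, uses = payload
            if tainted & set(uses):
                affected.add(line)
                tainted = tainted | {target}
            else:
                tainted = tainted - {target}  # redefined from unaffected values
            work.append((after(line), proc, tainted, stack))
    return affected

precise = ripple(4, "f", match_returns=True)
overestimate = ripple(4, "f", match_returns=False)
```

With returns matched to their calls the result contains only line 6; when any return is allowed, lines 3 and 8 are added as well, which is the overestimate described above.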
Although there are a number of papers on logical ripple effect and slicing
[11, 12, 16, 24], there appears to be only one [11] that addresses the problem of
precise interprocedural logical ripple effect and slicing and presents a method for it.
Weiser [24] was the first to propose an interprocedural slicing method, but that method
ignores the execution-path problem and therefore suffers a loss of precision.
Horwitz et al. [11] address the problem of precise interprocedural slicing, and present
a method to construct a system dependence graph from which slices can be extracted.
In this dissertation we present an algorithm that builds the logical ripple effect
piece by piece, and takes into account the restrictions on execution-path continuation
that are imposed by the preceding execution paths up to the point by which the given
program piece is affected and thereby included in the ripple effect. In general, the
algorithm computes a precise logical ripple effect, but some overestimation is possible,
meaning that the computed logical ripple effect may be larger than the true one. An
inverse form of the algorithm is presented for the slicing problem. The languages
that our algorithm will work for include many of the common procedural languages
such as C, Pascal, Ada, and Fortran.

1.3 Parallelization
Automatic conversion of a sequential program into a parallel program is often
referred to as parallelization. Parallelization problems are typically concerned with
the conversion of sequential loops into parallel code. In this dissertation, the specific
problem considered is the identification of loops in a program that can be parallelized,
including those loops that contain calls. A flow-sensitive interprocedural dataflow
analysis method has specific applicability to the problem of parallelizing loops that
contain calls, because such a method can supply the precise data-dependency
information that would be necessary for the parallelization analysis.
The parallelization of a loop would mean that each iteration of the loop can
be executed independently of the other iterations of the loop. In theory, this would
mean that each single iteration, or each arbitrary block of iterations, can be assigned
to a separate processor in a parallel machine. The specific architecture of a particular
parallel machine, the programming language to be parallelized, and the various loop
transformations that can convert sequential loop code into functionally equivalent
sequential code that is more parallelizable will all influence the determination in any
parallelization tool as to which loops can actually be parallelized, and how they
would be parallelized. However, none of these architecture, language, and
loop-transformation issues will be considered here. Instead, the problem will be
considered solely from the standpoint of data dependence.
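What independence of iterations means from the standpoint of data dependence can be shown with two toy loop bodies. The sketch is an illustration under stated assumptions: the helper `run` and the bodies `independent` and `carried` are hypothetical, and running the iterations in reverse order merely stands in for the arbitrary ordering that parallel execution would allow.

```python
def run(body, n, order):
    """Execute `body` once per index in `order` and return the resulting array."""
    a = [0] * n
    b = list(range(n))
    for i in order:
        body(a, b, i)
    return a

def independent(a, b, i):
    # Each iteration reads and writes only its own elements.
    a[i] = b[i] * 2

def carried(a, b, i):
    # Iteration i uses the value defined by iteration i - 1.
    if i > 0:
        a[i] = a[i - 1] + 1

n = 5
forward = list(range(n))
backward = forward[::-1]

same_when_independent = run(independent, n, forward) == run(independent, n, backward)
same_when_carried = run(carried, n, forward) == run(carried, n, backward)
```

Only the first body gives the same array under both orderings, so only that loop could have its iterations assigned to separate processors without further analysis.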
After a brief review of the basics regarding data dependence and parallelization,
an algorithm is given that identifies loops in a program that can be parallelized, and
this algorithm uses our interprocedural dataflow analysis method as an integral part.
The potential value of parallelization is clear. On the one hand, parallel machines
are becoming more common, and on the other hand, a great number of sequential
programs already exist, some of which can benefit from the greater processing power
that parallelization would offer.

1.4 Literature Review
Different methods have been offered for solving various flow-sensitive
interprocedural dataflow analysis problems. Sharir and Pnueli [21] present a method they
name call-strings. The essential idea of their method is to accumulate for each
element a history of the calls traversed by that element as it flows through the program
flowgraph. The call history associated with an element is used whenever that element
is at a return point. The element can only cross back to those calls in its call history.
Thus, the call-strings approach provides a solution to the calling-context problem.
However, the disadvantage of this approach is the time and space needed to maintain
a call history for each element at each flowgraph node.
Let $l$ be the program size. We assume that the number of elements will be
a linear function of $l$. The worst-case number of total set operations required by
the call-strings approach would be greater by a factor of $l$ when compared to our
method. This is because for each union or intersection of two sets of elements, if the
same element is in both sets, then a union operation must also be done for the two
associated call histories so as to get the new call history to be associated with that
element at the node for which the set operation is being done. A further disadvantage
of the call-strings approach is the need to include the associated call histories when
set stability is tested to determine termination for the iterative algorithm used to
solve the dataflow equations.
Myers [17] offers a solution to the calling-context problem that is essentially the
same as call-strings. Allen [3] presents a different method for interprocedural dataflow
analysis. The method analyzes each procedure completely, in reverse invocation
order. The first procedures to be analyzed would be those that make no calls, then
the procedures that only call these procedures would be analyzed, and so on. Once a
procedure is analyzed, its effects can be incorporated into those procedures that call
it, when they in turn are analyzed. The obvious drawback of this method is that it
cannot be used to analyze recursive calls.
Rosen [19] presents a complex method for interprocedural dataflow analysis
that is limited to solving the problems of variable modification, preservation, and use.
These dataflow problems do not require a solution of the calling-context problem.
Callahan [6] has proposed the program summary graph to solve the interprocedural
dataflow problems of kill and use, where kill determines all definite kills that
result from a procedure call, and use determines all variables that may be used as a
result of a procedure call before being redefined.
As part of the determination of edges in the program summary graph, intraprocedural
reaching-definitions analysis must be done for each procedure. Simplifying
Callahan’s space complexity analysis, we get $O(v_{ga} l)$ as the worst-case size of the
program summary graph, where $v_{ga}$ is the number of global variables in the program
plus the average number of actual parameters per call, and $l$ is the program size. One
limitation of Callahan’s method is that it does not correctly handle multiple aliases
that result when the same variable is used multiple times as an actual parameter
in the same call and the corresponding formal parameters are call-by-reference. By
contrast, our method, using element recoding where all the aliases are encoded in a
single element, will correctly handle the multiple aliases problem.
Callahan’s method offers no solution to the calling-context problem, and could
not be used to determine, for example, interprocedural reaching definitions. However,
Harrold and Soffa [9] have extended his method so that interprocedural reaching
definitions can be determined. They use an interprocedural flowgraph, denoted IFG,
that is very similar to the program summary graph. The IFG has inter-reaching edges
that are determined by solving Callahan’s kill problem. They recommend using his
method, so their method inherits Callahan’s space and time complexity, as well as
its limitation with regard to multiple aliases.

Before the IFG can be used, it must be decorated with the results of intraprocedural
analysis done twice for each procedure to determine both reaching definitions
and upwardly exposed uses. Then an algorithm is used to propagate the upwardly
exposed uses throughout the IFG. This algorithm has worst-case time complexity
of $O(n^2)$ where $n$ is the number of nodes in the IFG. Their graph will have the
same number of nodes as for Callahan’s graph, meaning worst-case graph size will be
$O(v_{ga} l)$. Substituting $v_{ga} l$ for $n$, we get a worst-case time complexity of $O(v_{ga}^2 l^2)$. As
the size of our flowgraph is proportional to the size of the program, the worst-case
time complexity for solving our equations is only $O(l^2)$.
Weiser [24] was the first to propose an interprocedural slicing method that
ignores the execution-path problem and thereby suffers from the resulting loss of
precision. Horwitz et al. [11] have presented a method to compute the more precise
slice explained in the Introduction. However, they use a more restricted definition
of a slice. Their slice is all statements and predicates that may affect a variable
v at program point p, such that v is defined or used at point p. Their method
consists of constructing a specialized graph called a system dependence graph. Nodes
in this graph represent program pieces such as statements, and the edges in the
graph represent control or data dependencies. Edges representing transitive data
dependencies that are due to procedure calls are computed by first modeling each
procedure and its calls with an attribute grammar called a linkage grammar, and then
solving the grammar so as to determine the transitive data dependencies represented
by it. Once the system dependence graph is complete, any slice based on an actual
definition or use occurring at any point p in the program can be extracted from the
graph. A major weakness of their method is that it does not allow a hypothetical
use to be the starting point of the slice.
The complexity of constructing the system dependence graph is given as
$O(G \cdot X^2 \cdot D^2)$ where $G$ is the total number of procedures and calls in the program, $X$ is the
total number of global variables in the program plus a term that can be considered
a constant, and $D$ is a linear function of $X$. Once the system dependence graph
is complete, any particular slice that is wanted can be extracted from the graph at
complexity $O(n)$ where $n$ is the size of the graph. The size of the graph is roughly
quadratic with program size, being bounded by $O(P \cdot (V + E) + T \cdot X)$ where $P$ is
the number of procedures, $V$ is the largest number of predicates and definitions in a
single procedure, $E$ is the largest number of edges in a procedure dependence graph,
$T$ is the number of calls in the program, and $X$ is the number of global variables. In
their paper, much is made of the fact that once the graph is complete, any slice on
an actual definition or use can be extracted from the graph at $O(n)$ cost where $n$ is
the size of the graph. However, the number of actual definition and use occurrences
in a program is proportional to the program size $L$. Therefore, any method that can
compute a slice at cost $O(Z)$ for some $Z$, can generate all the slices contained in their
graph at cost $O(Z \cdot L)$, spool the slices to disk, and recover them at cost $O(1)$.
Although there are many papers on slicing, it seems that only Horwitz et al.
[11] discuss clearly the problem of the more precise interprocedural slice, and present
a method to compute it, as well as providing complexity analysis. Our research on
slicing is only concerned with computing the more precise slice, so Horwitz et al. is
the principal reference.
Zima and Chapman [25] is the principal reference used to study the issues and
methods of parallelization. Their book distills the work found in scores of papers
and dissertations, and is an excellent survey of parallelization. Interprocedural parallelization
is specifically considered by Burke and Cytron [5], and by Triolet et al.
[22].

1.5 Outline in Brief
This introductory chapter ends with a brief synopsis of the remaining chapters.
Chapter 2 presents in detail our interprocedural dataflow analysis method. The chapter
ends with a brief description of the prototypes that were built to demonstrate the
method, along with some of the experimental results obtained from these prototypes.
Chapter 3 begins with a representation scheme for continuation paths for the interprocedural
logical ripple effect problem and then presents our interprocedural logical
ripple effect algorithm. A prototype that was built to demonstrate this algorithm is
briefly described and experimental results are presented. An inversion of the logical
ripple effect algorithm is then presented as a solution to the interprocedural slicing
problem. Chapter 4 begins with an explanation of loop-carried data dependence and
its relevance to parallelization, and concludes with an algorithm that identifies loops
that can be parallelized, including loops that contain calls. Chapter 5 summarizes
the major results of the dissertation, and suggests directions for future research.

CHAPTER 2
THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD
2.1 Constructing the Flowgraph
This section discusses the flowgraph and its relationship to dataflow equations.
After the discussion, rules are given for constructing the specific flowgraph required
by our interprocedural analysis method. Note that the required flowgraph is conventional
and the rules to be given relate only to the representation of calls and
procedures in the flowgraph.
A flowgraph is a directed graph that represents the possible flow paths of a
program. The nodes of a flowgraph correspond to basic blocks in the program. A
basic block is a sequence of program code that is always executed together in the
same order. The directed edges of a flowgraph represent possible transfers of control.
Figures 2.1 and 2.3 each represent a flowgraph.
Dataflow problems are often formulated as a set of equations that relate the four
sets, IN, OUT, GEN, and KILL, that are associated with each node in the flowgraph.
For any node and its block, the GEN set represents the elements generated
by that block. The KILL set represents those elements that cannot flow through the
block, because they would be killed by the block. The IN set represents the valid
elements at the start of the block, and the OUT set represents the valid elements at
the end of the block.
Dataflow problems are typically either forward-flow or backward-flow. For
forward-flow, the IN set of a node is computed as the confluence of the OUT sets
of the predecessor nodes, and the OUT set is a function of the node’s IN, GEN,
and KILL sets. For backward-flow, the OUT set of a node is computed as the confluence
of the IN sets of the successor nodes, and the IN set is a function of the
node’s OUT, GEN, and KILL sets. The predecessors of any node n are those nodes
that have an out-edge directed to node n. The successors of node n are those nodes
that have an in-edge directed from node n. The confluence operator will almost invariably
be either set union or set intersection, depending on the problem. Thus, a
dataflow problem may be classified as being either forward-flow-or, forward-flow-and,
backward-flow-or, or backward-flow-and, where “or” refers to set union and “and”
refers to set intersection.
Once the dataflow equations have been defined for a particular problem, and
the rules established for creating the GEN and KILL sets, the equations can then
be solved for a specific program or procedure and its representative flowgraph. To
solve the equations, the iterative algorithm can be used. The iterative algorithm has
the advantage that it will work for any flowgraph.
The iterative algorithm repeatedly computes the IN and OUT sets for all nodes
until all sets have stabilized and ceased to change. Recomputation of a node is
necessary whenever an outside set that it depends on changes. For forward-flow
problems, a node must be recomputed if the OUT set of a predecessor node changes.
For backward-flow problems, a node must be recomputed if the IN set of a successor
node changes. Typically, an evaluation strategy will determine the actual order in
which nodes are recomputed.
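As an illustration only, the iterative algorithm for a conventional (intraprocedural) forward-flow-or problem might be sketched as follows. The function name `solve_forward_or`, the FIFO worklist used as the evaluation strategy, and the three-node flowgraph are all assumptions made for this sketch; they are not taken from the prototypes described later.

```python
from collections import deque

def solve_forward_or(nodes, preds, gen, kill):
    """Iteratively solve IN/OUT for a forward-flow-or problem (confluence = union)."""
    # Derive successor lists from the predecessor lists.
    succs = {n: [] for n in nodes}
    for n in nodes:
        for p in preds[n]:
            succs[p].append(n)
    in_sets = {n: set() for n in nodes}
    out_sets = {n: set() for n in nodes}
    worklist = deque(nodes)  # assumed evaluation strategy: simple FIFO worklist
    while worklist:
        n = worklist.popleft()
        # IN is the confluence (union) of the OUT sets of the predecessors.
        in_sets[n] = set().union(*(out_sets[p] for p in preds[n]))
        new_out = (in_sets[n] - kill[n]) | gen[n]
        if new_out != out_sets[n]:
            out_sets[n] = new_out
            # A changed OUT set forces recomputation of every successor.
            for s in succs[n]:
                if s not in worklist:
                    worklist.append(s)
    return in_sets, out_sets

# Hypothetical flowgraph: a -> b -> c, with a back edge c -> b.
nodes = ["a", "b", "c"]
preds = {"a": [], "b": ["a", "c"], "c": ["b"]}
gen = {"a": {"d1"}, "b": {"d2"}, "c": {"d3"}}
kill = {"a": set(), "b": {"d1"}, "c": set()}
in_sets, out_sets = solve_forward_or(nodes, preds, gen, kill)
```

The loop stops when no OUT set changes, which is the stability condition described above. Because of the back edge, node b is recomputed after node c changes.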
The flowgraph required by our interprocedural analysis method is conventional,
with special nodes and edges as follows. For each procedure in the program, assign
an entry node and an exit node. These nodes have no associated blocks of program
code.
The entry node has a single out-edge and as many in-edges as there are calls
to that procedure in the program. The exit node has as many in-edges as there are
nodes for that procedure whose blocks terminate with a return action. The exit node
has as many out-edges as there are calls to that procedure in the program. For every
in-edge of the entry node, there is a corresponding out-edge of the exit node.
For the purpose of constructing the flowgraph, calls must be classified as either
known or unknown. A known call is where the flowgraph for the called procedure
will be a part of the total flowgraph being constructed. An unknown call is where
the flowgraph of the called procedure will not be a part of the total flowgraph being
constructed. Unknown calls are common and will occur for two reasons. First, the
called procedure may be a compiler-library procedure for which source code is not
available. Second, the called procedure may be a separately compiled user procedure
for which the source code is not available.
For any unknown call made within the program, if summary information of its
interprocedural effects is not available, then conservative assumptions about its effects
will have to be made. The actual summary information needed, and the assumptions
made in its absence, will depend on the particular dataflow problem. The summary
information, if present, would be used when constructing the GEN and KILL sets
for any node whose block contains an unknown call.
For any known call made within the program, there will be two nodes in the
flowgraph for that call. One node is the call node. The call node represents a basic
block that ends with the known call. The other node is the return node. The return
node has an empty associated block.
The call node will have two out-edges. One edge will be directed to the entry
node of the called procedure. The other out-edge will be directed to the return node
for that call. The return node will have two in-edges. One edge is the directed edge
from the call node. The other in-edge is directed from the called procedure’s exit
node.

In all, each known call results in two nodes and three distinct edges. One edge
connects the call node to its return node. A second edge connects the call node to
the called procedure’s entry node. A third edge connects the called procedure’s exit
node to the return node.
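The construction rules for procedures and known calls can be sketched in a few lines. The `Flowgraph` class, its method names, and the tuple-based node names below are hypothetical and serve only to make the edge bookkeeping concrete.

```python
from collections import defaultdict

class Flowgraph:
    """Minimal flowgraph holding only the special nodes and edges for calls."""

    def __init__(self):
        self.succs = defaultdict(set)
        self.preds = defaultdict(set)
        self.kind = {}

    def add_node(self, name, kind="block"):
        self.kind[name] = kind
        return name

    def add_edge(self, src, dst):
        self.succs[src].add(dst)
        self.preds[dst].add(src)

    def add_procedure(self, proc):
        # Each procedure gets an entry node and an exit node with no code block.
        return (self.add_node(("entry", proc), "entry"),
                self.add_node(("exit", proc), "exit"))

    def add_known_call(self, site, callee):
        # Each known call yields two nodes and three distinct edges.
        call = self.add_node(("call", site), "call")
        ret = self.add_node(("return", site), "return")
        self.add_edge(call, ret)                  # call node -> its return node
        self.add_edge(call, ("entry", callee))    # call node -> callee entry node
        self.add_edge(("exit", callee), ret)      # callee exit node -> return node
        return call, ret

g = Flowgraph()
f_entry, f_exit = g.add_procedure("f")
call_a, ret_a = g.add_known_call("site_a", "f")
call_b, ret_b = g.add_known_call("site_b", "f")
```

With two calls to `f`, the entry node of `f` has two in-edges and its exit node has two out-edges, one pair per call, as the rules require.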
In constructing the flowgraph, a special problem arises if the programming language
allows procedure-valued variables, such as the function pointers of C that when
dereferenced result in a call of the function that is pointed at. The problem is to
identify what are the possible procedure values when the procedure-valued variable
invokes a call. Assuming this information is available from a separate analysis, the
flowgraph can be constructed accordingly. For example, if the procedure-valued variable
can have three different values when the call in question is invoked and each
value is a procedure whose flowgraph will be part of the total flowgraph, then three
known calls would be constructed in parallel with a common predecessor node for
the three call nodes and a common successor node for the three return nodes.
A procedure-valued variable is in essence a pointer. Note that the problem of
determining what a pointer is or may be pointing at when that pointer is dereferenced,
can itself be formulated as a dataflow problem, and in particular as a forward-flow-or
dataflow problem. If necessary, an initial version of the flowgraph could be constructed
that treats all calls invoked by procedure-valued variables as unknown calls,
followed by a solving of the dataflow problem for determining possible pointer values
whenever a pointer is dereferenced, followed by amendments to the flowgraph using
the pointer-value information.
Dataflow analysis makes a simplifying, conservative assumption about the correspondence
between paths in the flowgraph and possible execution paths in the program.
Let a path be a sequence of flowgraph nodes such that in the sequence node
n follows node m only if n is a successor of m in the flowgraph. For intraprocedural
analysis, the assumption made is that any path in the flowgraph is a possible execution
path. That this assumption may not be true for a particular program should be
obvious. However, the problem of determining the possible execution paths for an
arbitrary program is known to be undecidable. The simplifying assumption that we
use for interprocedural analysis is the same as that used for intraprocedural analysis,
but with the added proviso that for any path that is a possible execution path,
any subsequence of return nodes must inversely match, if present, the immediately
preceding subsequence of call nodes. A return node matches a call node if and only
if the return node is the call node’s successor in the flowgraph.
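One way to read the matching proviso is as a stack discipline: each return node must match the most recent unmatched call node on the path, if there is one. The sketch below follows that reading. The function name, the `kind` and `return_of` tables, and the node numbers are hypothetical, and the function assumes its input is already a path in the flowgraph.

```python
def is_possible_path(path, kind, return_of):
    """Check that return nodes inversely match the preceding call nodes."""
    pending_calls = []  # call nodes on the path not yet matched by a return node
    for node in path:
        if kind.get(node) == "call":
            pending_calls.append(node)
        elif kind.get(node) == "return" and pending_calls:
            # The return node must be the successor of the most recent call node.
            if return_of[pending_calls.pop()] != node:
                return False
        # A return node with no preceding call node on the path is unconstrained.
    return True

# Hypothetical numbering: call nodes 3 and 5 with return nodes 4 and 6,
# both calling a procedure whose nodes are 8 (entry), 9, and 10 (exit).
kind = {3: "call", 5: "call", 4: "return", 6: "return"}
return_of = {3: 4, 5: 6}

matched = is_possible_path([2, 3, 8, 9, 10, 4, 7], kind, return_of)
mismatched = is_possible_path([2, 3, 8, 9, 10, 6, 7], kind, return_of)
no_prior_call = is_possible_path([8, 9, 10, 6, 7], kind, return_of)
```

The second path enters the procedure from call node 3 but leaves through the return node of call node 5, so it is rejected. The third path begins inside the called procedure, so no call node constrains its return.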
2.2 Interprocedural Forward-Flow-Or Analysis
This section begins with our basic approach to solving the calling-context problem.
The dataflow equations for forward-flow-or analysis are then given and their
correctness is shown. As a part of our interprocedural analysis method, the technique
of element recoding is presented as a way to deal with the aliases that result
from call-by-reference formal parameters. For some dataflow problems, implicit definitions
due to calls require explicit treatment, and this is discussed last.
If certain problems, such as reaching definitions, are to be solved for a program
by flow-sensitive interprocedural analysis, then the calling context of each procedure
call must be preserved. In general, preserving calling context means that the dataflow
effects of an individual call should include those effects that survive the call and were
introduced into the called procedure by the call itself, but not those effects introduced
into the called procedure by all the other calls to it that may exist elsewhere in the
program. We refer to the need to preserve calling context as the calling-context
problem.
Our solution to the calling-context problem, and the essential difference between
our dataflow equations and conventional dataflow equations, is to divide every
IN set and every OUT set into two sets called an entry set and a body set. The reason
for having two sets is that the calling-context effects that enter a procedure from the
different calls can be collected and isolated in the separate entry set. This entry set
can then have effects in it killed by statements in the body of the procedure, but no
additions are made to this entry set by body statements. Instead, any additions of
effects due to body statements are made to the separate body set. This body set
will also have effects killed in the normal manner, as for the entry set. Because the
body set is kept free of calling-context effects, it is empty at the entry node. By
contrast, the entry set is at its largest at the entry node and will either stay the same
size as it progresses through the procedure’s body nodes, or become smaller because
of kills. By intersecting the calling context at a call node with the entry set at the
exit node of the called procedure, the result is that subset of the calling context that
has reached the exit node and therefore will reach the return node for that call. By
“reach” we mean that there exists a path in the flowgraph along which the element
is not killed or blocked.
2.2.1 The Dataflow Equations
The dataflow equations that define the entry and body sets at every node are
now given. The equations are divided into three groups. The first group computes
the sets for entry nodes. The second group computes the sets for return nodes. The
third group computes the sets for all other nodes. In the equations, $B$ denotes a
body set and $E$ denotes an entry set. Two conditions, $C_1$ and $C_2$, appear in the
equations. $C_1$ means that $x$ will cross the interprocedural boundary from call node $p$
into the called procedure. $C_2$ means that $x$ can cross the interprocedural boundary
from exit node $q$ into return node $n$. $\overline{C_1}$ means not $C_1$. For each node $n$, $pred(n)$
means the set of predecessors of $n$. The RECODE set used in Group I is explained
in Section 2.2.2. The GEN set used in Group I, and the GEN and KILL sets used
in Group II, are explained in Section 2.2.3.

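For any node $n$:

\[ IN[n] = E_{in}[n] \cup B_{in}[n] \]
\[ OUT[n] = E_{out}[n] \cup B_{out}[n] \]

Group I: $n$ is an entry node.

\[ B_{in}[n] = \emptyset \]
\[ E_{in}[n] = \bigcup_{p \in pred(n)} \{ x \mid x \in OUT[p] \wedge C_1 \} \]
\[ B_{out}[n] = GEN[n] \]
\[ E_{out}[n] = E_{in}[n] \cup RECODE[n] \]

Group II: $n$ is a return node, $p$ is the associated call node and $q$ is the exit node of
the called procedure.

\[ B_{in}[n] = \{ x \mid (x \in B_{out}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2) \} \]
\[ E_{in}[n] = \{ x \in E_{out}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]) \} \]
\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]
\[ E_{out}[n] = E_{in}[n] - KILL[n] \]

Group III: $n$ is not an entry or return node.

\[ B_{in}[n] = \bigcup_{p \in pred(n)} B_{out}[p] \]
\[ E_{in}[n] = \bigcup_{p \in pred(n)} E_{out}[p] \]
\[ B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n] \]
\[ E_{out}[n] = E_{in}[n] - KILL[n] \]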
The equations assume that the GEN and KILL sets for each call node will
include only those effects for that call that occur prior to the entry of the called
procedure. This requirement is necessary because the OUT set of the call node is
used by the entry-node equation that constructs the entry set of the called procedure.
Referring to conditions $C_1$ and $C_2$, the rules for deciding whether an effect
crosses a particular interprocedural boundary will depend on two primary factors,
namely the dataflow problem and the programming language. For example, for the
reaching-definitions problem and a language such as FORTRAN, any definition of a
global variable, and any definition of a variable that is used as an actual parameter
whose corresponding formal parameter is call-by-reference, will cross. As a rule, an
effect that crosses into a procedure because it might be killed, will also cross back to
the return node if it reaches the exit node of the called procedure.
Table 2.1 shows the result of solving the equations for the flowgraph of Figure 2.1.
By “solving” we mean that, in effect, the iterative algorithm has been used
and all the sets are stable. The dataflow problem is reaching definitions, and variable
w is local while variables x, y, and z are global. Reaching definitions is the problem
of finding all definitions of a variable that reach a particular use of that variable,
for all variables and uses in the program. In Figure 2.1, nodes 1 and 8 are entry
nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and
6 are return nodes. Alongside each node is its basic block. Each defined variable is
superscripted with an identifier that is the set element used in Table 2.1 to represent
that definition.
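The three groups of equations can be checked against Table 2.1 with a short round-robin solver. The sketch below is not the prototype described in this chapter. The edge list, the numbering of the definitions (1 for w, 2 for x in main, 3 for y, 4 for z, 5 for x in f), and the KILL sets are a reading of Figure 2.1 and Table 2.1 and should be treated as assumptions. RECODE and the entry-node and return-node GEN and KILL sets are empty in this example because f has no parameters.

```python
# Assumed reading of Figure 2.1: predecessors of each flowgraph node.
PREDS = {1: [], 2: [1], 3: [2], 4: [3, 10], 5: [2], 6: [5, 10],
         7: [4, 6], 8: [3, 5], 9: [8], 10: [9]}
ENTRY_NODES = {1, 8}
RETURN_NODES = {4: (3, 10), 6: (5, 10)}  # return node -> (call node p, exit node q)
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}
KILL = {2: {5}, 9: {2}}                  # definitions 2 and 5 both define x
GLOBAL_DEFS = {2, 3, 4, 5}               # definitions of the globals x, y, z

def c1(x):
    """C1: x crosses from a call node into the called procedure."""
    return x in GLOBAL_DEFS

def c2(x):
    """C2: x crosses from an exit node back to a return node."""
    return x in GLOBAL_DEFS

def solve():
    nodes = sorted(PREDS)
    e_in, e_out, b_in, b_out = ({n: set() for n in nodes} for _ in range(4))
    changed = True
    while changed:                       # iterate until every set is stable
        changed = False
        for n in nodes:
            gen, kill = GEN.get(n, set()), KILL.get(n, set())
            if n in ENTRY_NODES:         # Group I
                new_b_in = set()
                new_e_in = {x for p in PREDS[n]
                            for x in e_out[p] | b_out[p] if c1(x)}
                new_b_out = set(gen)
                new_e_out = set(new_e_in)   # RECODE[n] is empty here
            elif n in RETURN_NODES:      # Group II
                p, q = RETURN_NODES[n]

                def survives(x):
                    return not c1(x) or (c2(x) and x in e_out[q])

                new_b_in = ({x for x in b_out[p] if survives(x)}
                            | {x for x in b_out[q] if c2(x)})
                new_e_in = {x for x in e_out[p] if survives(x)}
                new_b_out = (new_b_in - kill) | gen
                new_e_out = new_e_in - kill
            else:                        # Group III
                new_b_in = set().union(*(b_out[p] for p in PREDS[n]))
                new_e_in = set().union(*(e_out[p] for p in PREDS[n]))
                new_b_out = (new_b_in - kill) | gen
                new_e_out = new_e_in - kill
            new = (new_e_in, new_e_out, new_b_in, new_b_out)
            if new != (e_in[n], e_out[n], b_in[n], b_out[n]):
                e_in[n], e_out[n], b_in[n], b_out[n] = new
                changed = True
    return e_in, e_out, b_in, b_out

e_in, e_out, b_in, b_out = solve()
```

Under these assumptions, definition 2 enters f from both calls and is killed at node 9, so it is absent from both return nodes. Definition 4 is recovered only at return node 4, and definition 3 only at return node 6, which is the calling-context behavior the entry set is meant to preserve.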
procedure main
begin
    w^1 = 5
    x^2 = 10
    if (w > x)
        z^4 = 10
        call f()
    else
        y^3 = 5
        call f()

procedure f()
begin
    x^5 = 10
end

Figure 2.1. A reaching-definitions example.

Table 2.1. Solution of forward-flow-or equations for Figure 2.1.

Node   E_in        E_out       B_in           B_out
1      ∅           ∅           ∅              ∅
2      ∅           ∅           ∅              {1, 2}
3      ∅           ∅           {1, 2}         {1, 2, 4}
4      ∅           ∅           {1, 4, 5}      {1, 4, 5}
5      ∅           ∅           {1, 2}         {1, 2, 3}
6      ∅           ∅           {1, 3, 5}      {1, 3, 5}
7      ∅           ∅           {1, 3, 4, 5}   {1, 3, 4, 5}
8      {2, 3, 4}   {2, 3, 4}   ∅              ∅
9      {2, 3, 4}   {3, 4}      ∅              {5}
10     {3, 4}      {3, 4}      {5}            {5}

The correctness of the equations can be seen from the following observations.
For a procedure, the entry-node entry set is constructed as the union of all calling-context
effects that can enter the procedure from its calls. Within the procedure
body, effects in the entry set can be killed, but not added to. For effects in the entry
set that reach a call at a call node, those effects that survive the call are recovered
in the entry set constructed by the $E_{in}[n]$ equation for the successor return node $n$.
To see that this is true, observe the following. If an entry-set effect that reaches
the call cannot enter the called procedure, then it cannot be killed within the called
procedure, so the effect should be added to the return-node entry set without further
conditions, and this is done by the selection criterion $(x \in E_{out}[p] \wedge \overline{C_1})$ in the
$E_{in}[n]$ equation for the return node. If, on the other hand, an entry-set effect reaches the
call and does enter the called procedure, and therefore may be killed by it, then this
effect should be added to the return-node entry set only if it reached the entry set of
the called procedure’s exit node and the effect can cross back into the caller. This is
done by the selection criterion $(x \in E_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $E_{in}[n]$
equation for the return node.
From the equations for the entry set, we see that for any procedure z, the
entry set at z’s exit node will, as the equations are solved, eventually contain all
calling-context effects that entered z and reached its exit node. This characteristic
of the exit-node entry set is the requirement placed upon it when it is used in the
$E_{in}[n]$ equation for the return node, so this requirement is satisfied and the entry-set
equations are correct.
For any procedure, the Bin set is always empty at the entry node, so the B set
is free of calling-context effects. Within the procedure body, GEN and KILL sets
are used to update the body set as it propagates along the various nodes. For effects
in the body set that reach a call at a call node, those effects that survive the call are
recovered in the body set constructed by the $B_{in}[n]$ equation for the successor return
node $n$. If a body-set effect that reaches the call cannot enter the called procedure,
then it cannot be killed within the called procedure, so it should be added to the
return-node body set without further conditions, and this is done by the selection
criterion $(x \in B_{out}[p] \wedge \overline{C_1})$ in the $B_{in}[n]$ equation for the return node. If, on the
other hand, a body-set effect reaches the call and will enter the called procedure,
and therefore may be killed by it, then this effect should be added to the return-node
body set only if it reached the entry set of the called procedure’s exit node
and the effect can cross back into the caller. This is done by the selection criterion
$(x \in B_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $B_{in}[n]$ equation for the return node. In
addition, all crossable effects that result from the call, and that are independent of
calling context, should also be added to the return-node body set, and this is done by
the selection criterion $(x \in B_{out}[q] \wedge C_2)$ in the $B_{in}[n]$ equation for the return node.
From the equations for the body set, we see that for any procedure z, the body
set at z’s exit node is free of calling-context effects and will, as the equations are
solved, eventually contain all body effects that reached the exit node, including those
body effects resulting from calls made within z. This characteristic of the exit-node
body set is the requirement placed upon it when it is used in the $B_{in}[n]$ equation
for the return node, so this requirement is satisfied. The other requirement of this
return-node equation is that the exit-node entry set contains all calling-context effects
for the procedure that reach the exit node. This requirement has already been shown
to be satisfied, so we conclude that the body-set equations are correct.
2.2.2 Element Recoding for Aliases
The RECODE set for the entry node has its elements added to the $E_{in}$ set for
that node. The idea of the RECODE set is that certain elements in the OUT set of a
predecessor call node, irrespective of their ability to cross the interprocedural boundary
when parameters are ignored, should nevertheless be carried over into the entry
set of the called procedure as calling-context effects because of an alias relationship
established by the call, between an actual parameter and a formal call-by-reference
parameter. Any element that enters a procedure because of such an alias relationship
between parameters should be recoded to reflect this alias relationship.
A recoded element represents both the base element, which is the element as
it would be if there were no alias relationship, and the non-empty alias relationship.
Element recoding has two purposes. First, it allows the recoded element within
the called procedure to be killed correctly through its alias relationship. Second, it
allows the recoded element within the called procedure to be correctly associated
with specific references to those aliases that are in the alias relationship.
Element recoding never involves a change of the base element, but only a change
of the associated alias relationship, which would be the set of formal parameters to
which the base element is, in effect, aliased. Because of element recoding, in effect a
new element is generated, hence the separate RECODE set.
Figure 2.2 presents an algorithm for generating the entry-node input sets $E_{in}$
and RECODE, for a forward-flow-or dataflow problem, for the assumed language
model in which the visibility of each formal parameter is limited to the single procedure
that declares it. For each element in the $OUT[c]$ set, the algorithm generates
at most one element for inclusion in the entry-node input sets. The algorithm is
unambiguous, except for line 10. The “can be affected by” test at line 10 is a generalization.
The details of this test will depend on the specific dataflow problem being
solved. For example, if the dataflow problem is reaching definitions, then each base
element w represents a specific definition of some variable z. If the actual parameter
p being tested by the algorithm is the variable z, and the corresponding formal
parameter is call-by-reference, then the definition that w represents can be used or
killed through that formal parameter, so w can be affected by that actual parameter
z, and the “affected by” test is therefore satisfied. The $p \in OA$ test at line 10 covers
the situation where an actual parameter p that is aliased to the formal f is itself a
formal parameter that is effectively aliased to w. In this case f is established as a
new effective alias for w, by transitivity of the alias relationship.
Referring to the algorithm, there is no carry over of the old alias relationship
into the new alias relationship. The old alias relationship is represented by the OA
set, and the new alias relationship is represented by the NA set. That this no-
carry-over of the old alias relationship is correct, follows from the assumed language
model. The aliases of element recoding are formal parameters, and the model states
that each formal parameter is visible in only one procedure. This means there is no
need to carry the old alias relationship into a different procedure, because the aliases
cannot be referenced outside the single procedure in which the old alias relationship
is active. Note that recursive calls are no exception to this no-carry-over rule, because
a recursive call will cancel any alias relationship established for a base element by
any prior call of the procedure.
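The element-recoding algorithm of Figure 2.2 can be sketched compactly in Python. Here an element is modeled as a pair of a base element and a frozenset of alias names, which is only one possible encoding. The function name `entry_sets`, the `affected_by` and `can_cross` callbacks, and the example definitions, variables, and call sites are all hypothetical.

```python
def entry_sets(call_outs, by_ref_bindings, affected_by, can_cross):
    """Build E_in[e] and RECODE[e] for an entry node e from its call nodes.

    call_outs:        call node -> OUT set of (base element, old alias set) pairs
    by_ref_bindings:  call node -> list of (actual parameter, formal parameter)
                      pairs for the call-by-reference formals only
    """
    e_in, recode = set(), set()
    for c, out in call_outs.items():
        for w, oa in out:  # x = (base element w, old alias relationship OA)
            # The new alias set NA is built from scratch; OA is not carried over.
            na = frozenset(f for p, f in by_ref_bindings[c]
                           if affected_by(w, p) or p in oa)
            if na:
                recode.add((w, na))            # recoded element
            elif can_cross(w):
                e_in.add((w, frozenset()))     # unrecoded base element
    return e_in, recode

# Hypothetical reaching-definitions setting: each definition names its variable.
DEFINES = {"d1": "a", "d2": "g", "d3": "b"}
GLOBALS = {"g"}
call_outs = {
    "call_1": {("d1", frozenset()), ("d2", frozenset())},
    "call_2": {("d3", frozenset({"s"}))},   # d3 is already aliased to formal s
}
# call_1 passes variable a, and call_2 passes formal s, to the by-reference formal r.
by_ref_bindings = {"call_1": [("a", "r")], "call_2": [("s", "r")]}

e_in, recode = entry_sets(
    call_outs, by_ref_bindings,
    affected_by=lambda w, p: DEFINES[w] == p,
    can_cross=lambda w: DEFINES[w] in GLOBALS,
)
```

In this sketch d1 is recoded with alias r because its variable is the actual parameter, d3 is recoded with alias r by transitivity through its old alias s, and d2 crosses unrecoded because it defines a global. The old alias s does not appear in the new alias set, in line with the no-carry-over rule.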
    -- e is an entry node.
    -- This algorithm constructs the E_in[e] and RECODE[e] sets.
    begin
 1    E_in[e] ← ∅
 2    RECODE[e] ← ∅
 3    for each predecessor call node c of entry node e
 4      for each element x ∈ OUT[c]
 5        let w be the base element of x
 6        let OA be the set of aliases, if any, associated with w, forming x
 7        let NA be the set of new aliases
 8        NA ← ∅
 9        for each actual parameter p at call node c that is aliased
            to a call-by-reference formal parameter f
10          if (w can be affected by p) ∨ (p ∈ OA)
11            NA ← NA ∪ {f}
            fi
          end for
12        if NA ≠ ∅
13          RECODE[e] ← RECODE[e] ∪ {(w, NA)}
14        else if w can cross the interprocedural boundary
15          E_in[e] ← E_in[e] ∪ {w}
          fi
        end for
      end for
    end

Figure 2.2. Element-recoding algorithm for forward-flow-or dataflow problems.

In general, the fact that crossing elements are recoded when $NA \neq \emptyset$, and
unrecoded when $NA = \emptyset$ and $OA \neq \emptyset$, places an added burden on the return-node
equations to recognize an element that should be recovered from the exit-node entry
set, necessitating, in effect, additional rules to cover this possibility. After an element
is recovered, it would also be necessary to restore the alias relationship, if any, that
it had prior to the call. This recognition and restoration problem is perhaps most
easily solved by associating with each call node two additional sets, one for body-set
elements and another for entry-set elements, where each set consists of ordered
pairs. These sets would be determined whenever the entry-node entry set of the
called procedure is computed.
The first element of each ordered pair is a crossing element x as it exists in the
Bout or Eout set at the call node, and the second element is element y which is that
element effectively generated from element x by the element-recoding algorithm of
Figure 2.2 at either line 13 or line 15. If all crossing elements for the call are included
in these additional sets, then the return-node equations can use these sets instead
of the Bout[p] and Eout[p] sets to recognize elements to be recovered from the exit-node
entry set. Recognition and restoration would be done by trying to match the
exit-node entry-set element against the second element of an ordered pair from the
appropriate additional set at the call node, and then, if there is a match, restoring
the original element by using the first element of the matched pair. For example, if
x is a crossing element in the Bout set of a call node, and y is the generated element,
then (x, y) would be an ordered pair in the additional set for body-set elements.
When the set for the return node is computed, if y is in the exit-node entry set
then it will match the ordered pair (x, y), and element x will be added to the Bin
set.
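The recoding step of Figure 2.2, together with the ordered pairs just described, can be sketched in Python. This is a minimal illustration and not the prototypes' code: the element representation (a base name paired with a frozen set of aliases), the function names, and the `affected_by` and `can_cross` callbacks are assumptions made for the example. For brevity it handles a single call node and keeps one set of pairs instead of separate body-set and entry-set pair sets.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# An element is (base, aliases); aliases is empty for an unrecoded element.
Element = Tuple[str, FrozenSet[str]]


def recode_at_entry(
    out_c: Iterable[Element],
    bindings: List[Tuple[str, str]],
    affected_by: Callable[[str, str], bool],
    can_cross: Callable[[str], bool],
):
    """Contribution of one call node c to Ein[e] and RECODE[e].

    bindings holds (actual p, call-by-reference formal f) pairs for the call.
    Returns (e_in, recode, pairs), where pairs records (x, y) for recovery.
    """
    e_in: Set[Element] = set()
    recode: Set[Element] = set()
    pairs: Set[Tuple[Element, Element]] = set()
    for x in out_c:
        w, old_aliases = x
        # Lines 8-11: collect formals newly aliased to the base element.
        new_aliases = frozenset(
            f for p, f in bindings if affected_by(w, p) or p in old_aliases
        )
        if new_aliases:                # lines 12-13
            y = (w, new_aliases)
            recode.add(y)
            pairs.add((x, y))
        elif can_cross(w):             # lines 14-15
            y = (w, frozenset())
            e_in.add(y)
            pairs.add((x, y))
    return e_in, recode, pairs


def recover(exit_entry_set: Set[Element], pairs) -> Set[Element]:
    """Restore the original x for each generated y that survived the call."""
    return {x for x, y in pairs if y in exit_entry_set}
```

In this sketch a definition of global g that reaches a call passing g by reference to formal a enters the callee as (d, {a}); if a is then defined in the callee, that recoded element is absent from the exit-node entry set and `recover` does not restore it.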
As an example of why element recoding is necessary, consider the following.
Suppose there are two different calls to the same procedure, and different definitions
of global variable g reach each call. At one of the calls, g is also used as an actual
parameter and the corresponding formal parameter is call-by-reference. The problem
now is what to kill from the entry set whenever that formal parameter is defined in
the called procedure. If the individual elements representing the different definitions
of g do not somehow identify how they are related to this formal parameter, then
the only choice is to kill all of them or none of them, and neither of these choices is
correct in this case, as the only definitions of g that should be killed are those that
entered the procedure from the call where g is aliased to the call-by-reference formal
parameter.
2.2.3 Implicit Definitions Due to Calls
A call with parameters typically has implicit definitions associated with it.
For example, if a formal parameter is call-by-reference, then each actual parameter
aliased to that formal parameter is implicitly defined at each definition of the formal
parameter. If a formal parameter is call-by-value-result, then that formal parameter is
implicitly defined each time the called procedure is entered, and the actual parameter
at the call is implicitly defined upon return from the call. From the standpoint
of solving a dataflow problem such as reaching definitions, all implicit definitions
due to calls should be determined, and elements generated at the appropriate nodes
to represent these implicit definitions. The remainder of this section discusses the
generation of implicit definitions and the determination of what reaches them for the
specific problem of reaching definitions.
We assume that a formal parameter may be either call-by-reference, call-by-value,
call-by-value-result, or call-by-result. For the reaching-definitions problem,
before the iterative algorithm can be used to solve the dataflow equations, all GEN
sets must be prepared.
For each point p in the program where a call-by-reference formal parameter is
defined, add to the GEN set of the node for point p an implicit definition of each
actual-parameter variable that is aliased to that formal parameter in a call. Each
added implicit-definition element must be a recoded element that includes the alias
relationship for that actual parameter. For example, suppose a procedure named A
has two call-by-reference formal parameters, x and y, and inside A at point p there is
a definition of x, and there are three calls of procedure A in the program. The first call
aliases variable v to x. The second call aliases variable v to both x and y. The third
call aliases variable w to x. Thus, at point p there would be three implicit-definition
elements generated, namely (v, {x}), (v, {x, y}), and (w, {x}). As an example of
what this element notation means, for the (v, {x}) element the v represents the
implicit definition of variable v that occurs at point p, and the x represents the
formal parameter that variable v is aliased to. As a special requirement for these
implicit-definition elements, for the Bout set at the exit node of procedure A, the
(v, {x}) element, if it reaches this set, can only cross from this set to the return node
of the first call. Similarly, the (v, {x, y}) element can only cross to the return node
of the second call, and the (w, {x}) element can only cross to the return node of the
third call.
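The generation of these implicit-definition elements can be sketched as follows. The function name and the dictionary encoding of each call are assumptions of the example; in particular, the variable `u` bound to y in the first and third calls is hypothetical, since the text does not say which variable those calls pass for y. Each result carries the index of the call that produced it, which is the information the crossing restriction needs.

```python
from typing import Dict, FrozenSet, List, Tuple


def implicit_definitions(
    defined_formal: str,
    calls: List[Dict[str, str]],
) -> List[Tuple[int, Tuple[str, FrozenSet[str]]]]:
    """Implicit-definition elements for one definition of a by-reference formal.

    calls[i] maps each call-by-reference formal of the procedure to the
    actual-parameter variable that call i aliases to it.
    """
    elements = []
    for index, binding in enumerate(calls):
        actual = binding.get(defined_formal)
        if actual is None:
            continue
        # The recoded element lists every formal this call aliases to the actual.
        aliases = frozenset(f for f, a in binding.items() if a == actual)
        elements.append((index, (actual, aliases)))
    return elements


# The three calls of procedure A from the text ("u" is a hypothetical filler).
calls_of_A = [
    {"x": "v", "y": "u"},
    {"x": "v", "y": "v"},
    {"x": "w", "y": "u"},
]
elements_at_p = implicit_definitions("x", calls_of_A)
```

Here `elements_at_p` pairs call 0 with (v, {x}), call 1 with (v, {x, y}), and call 2 with (w, {x}).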
The crossing restrictions in the preceding example are due to a rule, now given.
Let A denote a procedure containing a definition at point p of a call-by-reference
formal parameter x, let (t, {x}) be the implicit-definition element generated at point p
for some specific call c of A that aliases actual-parameter variable t to x, and let m be
the exit node of A. If (t, {x}) ∈ Bout[m], then (t, {x}) can only cross from Bout[m] to
the return node of call c, and as (t, {x}) crosses, it must be recoded as t by having its
alias relationship nullified. This crossing-restriction rule is necessary because element
(t, {x}) is both a body effect, because it is generated inside the called procedure, and
a calling-context effect, because it is the result of a specific call of that procedure.
This dual quality requires the special treatment that the rule provides. Nullifying
the alias relationship as the element crosses to the return node is both good practice
in general for this element, and a necessity if call c is a recursive call of A. As an
example, assume that call c is a recursive call of A, and that variable t is a global
variable. If (t, {x}) reaches the Bout[m] set, the rule states that this element can only
cross to the return node of call c, and that it be recoded as t. Assuming that this
t element then reaches from this return node to the Bout[m] set, t can then cross
to any return node that has an in-edge from m. Although both the (t, {x}) and t
elements refer to the same implicit definition of variable t occurring at point p, the
two elements are not the same, and the crossing-restriction rule applies only to an
element that is identical to the element generated at point p, which is (t, {x}).
The implicit definitions of actual-parameter variables are the most important
category of implicit definitions that are due to call-by-reference formal parameters.
However, there is also a second, less-important category. At each explicit definition
of a variable t at point p inside A, where variable t is also used in a call of A as an
actual parameter aliased to a call-by-reference formal parameter x, there is an
implicit definition of formal parameter x at point p. The implicit-definition element
generated at point p would be (x, {t}), meaning a definition of variable x at point
p, aliased to variable t. However, assuming a formal parameter cannot be defined or
used outside the procedure for which it is declared, it follows that there is no need
for a crossing-restriction rule for these elements, because they cannot cross to any
return node.
Normally, a definition of a variable kills all other definitions of that variable.
However, the implicit definitions due to call-by-reference formal parameters have no
associated kills. Instead, the following rule suffices. For each call-by-reference formal
parameter x declared for procedure A, if all calls of A alias the same actual-parameter
variable t to x, then each explicit definition inside A of either variable t or x will kill
all definitions of variable t and all definitions of variable x. Otherwise, if not all calls
of A alias the same actual-parameter variable t to x, then each explicit definition
inside A of either variable t or x will kill only the definitions of that variable and
those recoded elements that are aliased to that variable.
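The kill rule can be illustrated with a small Python sketch. The triple encoding of an element (definition id, defined variable, formal-parameter aliases), the function name, and the `actuals_of` table are assumptions of the example and do not come from the prototypes.

```python
from typing import Dict, FrozenSet, Set, Tuple

# (definition id, defined variable, formal-parameter aliases)
Element = Tuple[str, str, FrozenSet[str]]


def killed_by_definition(
    defined_var: str,
    elements: Set[Element],
    actuals_of: Dict[str, Set[str]],
) -> Set[Element]:
    """Elements killed by an explicit definition of defined_var inside A.

    actuals_of maps each call-by-reference formal of A to the set of
    actual-parameter variables that the calls of A alias to it.
    """
    same_storage = {defined_var}
    for formal, actuals in actuals_of.items():
        # Every call binds the same actual t to this formal: t and the
        # formal are treated as one variable for killing purposes.
        if len(actuals) == 1:
            (t,) = actuals
            if defined_var in (formal, t):
                same_storage |= {formal, t}
    # Kill definitions of those variables and recoded elements aliased to them.
    return {
        e for e in elements
        if e[1] in same_storage or (e[2] & same_storage)
    }
```

When two calls bind different globals to x, a definition of x in this sketch kills only the definitions of x and the recoded elements whose alias set contains x, leaving the unrecoded definitions that entered from other calls intact.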
The entry-node GEN set will be used to hold all implicit definitions of formal
parameters that occur upon procedure entry. Thus, for each entry node, for each
formal parameter of the represented procedure that is call-by-value or call-by-value-
result, add to the GEN set of that entry node an element that represents an implicit
definition of that formal parameter occurring at that entry node.
The return-node GEN set will be used to hold all implicit definitions of actual
parameters that may occur upon return from the called procedure. Thus, for each
return node, for each actual parameter of the associated call whose corresponding
formal parameter is call-by-result or call-by-value-result, add to the GEN set of
that return node an element that represents an implicit definition of that actual
parameter occurring at that return node. The return-node KILL set should represent
all elements that will be killed by these implicit definitions of actual parameters.
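A minimal sketch of how the entry-node and return-node GEN sets depend on the parameter-passing mode is given below. The mode constants, function names, and the use of bare variable names to stand for implicit-definition elements are assumptions of the example.

```python
from typing import Dict, Set

BY_REFERENCE = "reference"
BY_VALUE = "value"
BY_VALUE_RESULT = "value-result"
BY_RESULT = "result"


def entry_node_gen(formal_modes: Dict[str, str]) -> Set[str]:
    """Formals implicitly defined each time the procedure is entered."""
    return {
        formal for formal, mode in formal_modes.items()
        if mode in (BY_VALUE, BY_VALUE_RESULT)
    }


def return_node_gen(
    formal_modes: Dict[str, str],
    actual_of: Dict[str, str],
) -> Set[str]:
    """Actual-parameter variables implicitly defined upon return from a call.

    actual_of maps a formal to the actual-parameter variable at this call.
    """
    return {
        actual_of[formal] for formal, mode in formal_modes.items()
        if mode in (BY_RESULT, BY_VALUE_RESULT) and formal in actual_of
    }
```

For a hypothetical procedure with formals a (value), b (value-result), c (result), and d (reference), `entry_node_gen` yields {a, b}, and `return_node_gen` yields the actuals bound to b and c.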
With the GEN sets ready, the iterative algorithm can proceed. Once the iterative
algorithm has ended, a follow-on step is done: a) Examine the Bout set for each
exit node. For each definition d in this set of a formal parameter p, where p is
call-by-result or call-by-value-result, d reaches the implicit use of this formal parameter
by those implicit definitions of actual parameters found at the various return nodes
whose corresponding formal parameter is p. The element representing d can be added
to the Bin sets of those return nodes in a way that reflects the reach. b) Examine
the OUT set of each call node. For each definition d in this set of a variable that
is used as an actual parameter in the call, where the corresponding formal parameter
is call-by-value or call-by-value-result, d reaches the implicit use of the defined
variable by the implicit definition of the corresponding formal parameter found at
the entry node of the called procedure. The element representing d can be added to
the Ein set of that entry node in a way that reflects the reach.
2.3 Interprocedural Forward-Flow-And Analysis
This section gives the dataflow equations used by our interprocedural analysis
method for forward-flow-and problems. The difference between these equations and
the equations for forward-flow-or is explained.
For forward-flow-and problems, some changes are needed to the dataflow equations
given in Section 2.2.1. Of course, the confluence operator must be changed from
union to intersection. However, it is still necessary to construct the entry-node entry
set as the union of all crossing effects from the predecessor-node sets, so that calling
context can be properly recovered at the return nodes. At the same time, the entry
set must always be constructed as the intersection of predecessor-node sets, if the
entry set is to be a part of the IN and OUT sets. These conflicting requirements for
the entry-node entry set can be resolved by maintaining two separate entry sets at
each node. The revised dataflow equations follow. The two conditions, $C_1$ and $C_2$,
are explained in Section 2.2.1.
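For any node n:
\[
\begin{aligned}
IN[n] &= E^{(2)}_{in}[n] \cup B_{in}[n] \\
OUT[n] &= E^{(2)}_{out}[n] \cup B_{out}[n]
\end{aligned}
\]
Group I: n is an entry node.
\[
\begin{aligned}
B_{in}[n] &= \emptyset \\
E^{(1)}_{in}[n] &= \bigcup_{p \in pred(n)} \{x \mid x \in (E^{(1)}_{out}[p] \cup B_{out}[p]) \wedge C_1\} \\
E^{(2)}_{in}[n] &= \bigcap_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\} \\
B_{out}[n] &= GEN[n] \\
E^{(1)}_{out}[n] &= E^{(1)}_{in}[n] \cup RECODE^{(1)}[n] \cup RECODE^{(2)}[n] \\
E^{(2)}_{out}[n] &= E^{(2)}_{in}[n] \cup RECODE^{(2)}[n]
\end{aligned}
\]
Group II: n is a return node, p is the associated call node and q is the exit node of
the called procedure.
\[
\begin{aligned}
B_{in}[n] &= \{x \mid (x \in B_{out}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\} \\
E^{(i)}_{in}[n] &= \{x \in E^{(i)}_{out}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q])\}, \quad i = 1, 2 \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E^{(i)}_{out}[n] &= E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2
\end{aligned}
\]
Group III: n is not an entry or return node.
\[
\begin{aligned}
B_{in}[n] &= \bigcap_{p \in pred(n)} B_{out}[p] \\
E^{(i)}_{in}[n] &= \bigcap_{p \in pred(n)} E^{(i)}_{out}[p], \quad i = 1, 2 \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E^{(i)}_{out}[n] &= E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2
\end{aligned}
\]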
The entry set $E^{(1)}$ is the set used to recover calling context, and the entry set
$E^{(2)}$ is the set that is a component of the IN and OUT sets. The RECODE sets
appearing in the entry-node equations represent recoded elements as explained in
Section 2.2.2. The $RECODE^{(1)}$ set will just be the union of the recoded elements
generated from each predecessor call node c, using the algorithm of Figure 2.2 and
drawing from the $E^{(1)}_{out}[c]$ and $B_{out}[c]$ sets at line 4 instead of the OUT[c] set.
Similarly, the $RECODE^{(2)}$ set could just be the intersection of the recoded
elements from each predecessor call node c, drawing from the OUT[c] set at line 4.
However, doing this may cause the unnecessary loss of recoded elements when the
same underlying base element w is found in each OUT[c] set. To avoid such loss, an
improved rule states that if the same base element w is found in each OUT[c] set,
and there is one or more non-empty alias relationships for that w occurring at one or
more predecessor nodes c, then a single recoded element for that w that encodes all
of these alias relationships would be generated into the $RECODE^{(2)}$ set; otherwise
no recoded element for that w would be generated into the $RECODE^{(2)}$ set. For
example, suppose c has three different values for a given entry node, and the same
base element w is found in each OUT[c] set, and at one c there is an empty alias
relationship, at the second c there is an alias relationship to formal parameter x, and
at the third c there is an alias relationship to formal parameter y. For this example,
the single recoded element would be (w, {x, y}), and this recoded element can either
be killed directly through w, or indirectly through x, or through y. Note that the
complete kill of this recoded element at any kill point, even though the kill may have
been made through an alias that was not established at each c, is nevertheless correct.
The intersection confluence operator associated with $RECODE^{(2)}$ implicitly requires
that for base element w to pass a kill point, it must be on every call path past that
kill point, which is not the case when w is killed from at least one call path, which
happens when that w is killed through an alias that was established by at least one
of the c. If the specific dataflow problem being solved allows the base element to be
used through one of its effective aliases, then a flag could be associated with each alias
in the recoded elements of $RECODE^{(2)}$, and this flag could indicate whether or not
the alias was established at each c. In the case of the example, the recoded element
with flags would be $(w, \{x_{not}, y_{not}\})$. Only a use of the base element through an
alias established at each c would be a use through an alias that occurs on every call
path, and this kind of use would be the all-paths use that is implicitly required by
the specific dataflow problem by virtue of it being forward-flow-and.
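The improved rule for the second RECODE set, including the per-alias flag, can be sketched as follows. The function name and the input encoding (one dictionary per predecessor call node, mapping each base element found in OUT[c] to the aliases established for it at that call) are assumptions of the example.

```python
from typing import Dict, FrozenSet, List


def build_recode2(
    per_call: List[Dict[str, FrozenSet[str]]],
) -> Dict[str, Dict[str, bool]]:
    """Merge alias relationships for base elements present at every call.

    per_call[i][w] is the alias set established for base element w at the
    i-th predecessor call node (empty if w crosses unaliased there).
    The result maps w to {alias: established_at_every_call}.
    """
    if not per_call:
        return {}
    # Base elements found in every OUT[c] set.
    common = set(per_call[0])
    for aliases_at_c in per_call[1:]:
        common &= set(aliases_at_c)
    recode2: Dict[str, Dict[str, bool]] = {}
    for w in sorted(common):
        merged = set()
        for aliases_at_c in per_call:
            merged |= aliases_at_c[w]
        if merged:  # at least one non-empty alias relationship
            recode2[w] = {
                alias: all(alias in d[w] for d in per_call)
                for alias in sorted(merged)
            }
    return recode2
```

For the three-call example in the text, the sketch produces a single entry for w in which both x and y are flagged False, corresponding to the element written with "not" flags above.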
With the exception of the confluence operator and the two different entry sets,
the equations for forward-flow-and are the same as for forward-flow-or, and are likewise
correct. Set $E^{(2)}$ fulfills the requirement for the IN and OUT sets by consistently
using the intersection confluence operator for its construction, just as B does. The
equations for the $E^{(1)}$ and $E^{(2)}$ sets only differ at the entry node, and there the only
difference is the confluence operator, and the way the RECODE sets are built. As
set intersection is the confluence operator for $E^{(2)}$ and set union for $E^{(1)}$, and the
$RECODE^{(2)}$ set is added to both $E^{(1)}$ and $E^{(2)}$, it follows that $E^{(2)}$ will be a subset
of $E^{(1)}$ at every node. Thus, $E^{(1)}$ can be used to recover calling context for $E^{(2)}$.
Set $E^{(1)}$ also serves to recover calling context for both $E^{(1)}$ and B, because $E^{(1)}$ is
built at the entry node from these two sets, and the use of union as the confluence
operator guarantees that all calling-context effects will be collected.

Table 2.2. Solution of forward-flow-and equations for Figure 2.3.
Node  E^(1)_in   E^(1)_out  E^(2)_in  E^(2)_out  Bin        Bout
1     ∅          ∅          ∅         ∅          ∅          ∅
2     ∅          ∅          ∅         ∅          ∅          {1,2}
3     ∅          ∅          ∅         ∅          {1,2}      {1,2,4}
4     ∅          ∅          ∅         ∅          {1,4,5}    {1,4,5}
5     ∅          ∅          ∅         ∅          {1,2}      {1,2,3}
6     ∅          ∅          ∅         ∅          {1,3,5}    {1,3,5}
7     ∅          ∅          ∅         ∅          {1,5}      {1,5}
8     {2,3,4}    {2,3,4}    {2}       {2}        ∅          ∅
9     {2,3,4}    {3,4}      {2}       ∅          ∅          {5}
10    {3,4}      {3,4}      ∅         ∅          {5}        {5}

Table 2.2 shows the result of solving the equations for the flowgraph of Figure 2.3.
By “solving” we mean that, in effect, the iterative algorithm has been used
and all the sets are stable. The dataflow problem is available expressions, and variable
w is local while variables x, y, and z are global. Available expressions is the
problem of determining whether the use of an expression is always reached by some
prior use of that expression, for certain expressions in the program. In Figure 2.3,
nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call
nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block.
Each expression is superscripted with an identifier that is the set element used in
Table 2.2 to represent that expression.

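[Figure 2.3: flowgraph with a basic block alongside each node. Text fragments that survive from the figure: procedure main; procedure f(); y = w + 1; z = x + 1; x = z + 2; if (e); a = z + 1; call f().]
Figure 2.3. An available-expressions example.
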
2.4 Interprocedural Backward-Flow Analysis
Backward-flow problems are basically forward-flow problems in reverse. However,
the same flowgraph is used for both forward-flow and backward-flow problems.
To convert the equations for forward-flow-or to backward-flow-or, or for forward-flow-and
to backward-flow-and, the transformation is mechanical and straightforward.
The same equations are used, but various words and phrases are everywhere
changed to reflect the reverse flow. For example, “pred(n)” for predecessors becomes
“succ(n)” for successors, “out” subscripts become “in” subscripts and “in” subscripts
become “out” subscripts, IN becomes OUT and OUT becomes IN, “call node” becomes
“return node” and “return node” becomes “call node”, “entry node” becomes
“exit node” and “exit node” becomes “entry node”. For backward flow, the nodes
requiring special equations are the exit node and call node, and not the entry node
and return node as for the forward-flow problems.
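One way to picture this renaming is sketched below. Note that the text keeps the same flowgraph and renames terms inside the equations; the sketch instead takes the alternative view of reversing the edges and swapping node kinds so that a forward-flow solver could be reused. That view, the function name, and the string labels for node kinds are all assumptions of the example.

```python
from typing import Dict, Hashable, List, Tuple

SWAPPED_KIND = {
    "entry": "exit",
    "exit": "entry",
    "call": "return",
    "return": "call",
}


def reverse_flowgraph(
    succ: Dict[Hashable, List[Hashable]],
    kind: Dict[Hashable, str],
) -> Tuple[Dict[Hashable, List[Hashable]], Dict[Hashable, str]]:
    """Reverse every edge and swap the special node kinds."""
    reversed_succ: Dict[Hashable, List[Hashable]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            # An edge n -> t becomes t -> n.
            reversed_succ.setdefault(t, []).append(n)
    reversed_kind = {n: SWAPPED_KIND.get(k, k) for n, k in kind.items()}
    return reversed_succ, reversed_kind
```

Under this reading, the predecessors of a node in the reversed graph are its successors in the original, and the sets a forward solver labels "in" correspond to the backward problem's "out" sets.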
2.5 Complexity of Our Interprocedural Analysis Method
To determine the worst-case complexity of our method for the assumed language
model in which the visibility of each formal parameter is limited to the single
procedure that declares it, we consider the solution of the dataflow equations for only
one element at a time. Let n be the number of flowgraph nodes. Let the elementary
operation measured by the complexity be the computation of the dataflow equations
once at a single, average flowgraph node, for a single element. Only the presence or
absence of the single element within a particular body or entry set need be represented,
and this requires no more than a single bit of storage for each set referenced
by the equations. Thus, computing the dataflow equations once at an average node,
for a single element, will consist of a small number of integer operations, assuming
that the average in and out-degree of the flowgraph nodes is bounded by a small
constant, which will always be the case for flowgraphs generated from real programs,
and also assuming that the length of recoded elements will be small. Referring to
the algorithm of Figure 2.2, the length of a recoded element is 1 + |NA|, and |NA|
is bounded from above by the number of call-by-reference formal parameters of the
given procedure. As a rule, this upper bound will be small.
We next consider the total number of node visits required to solve the dataflow
equations for a single element. Prior to solving the equations, all body and entry sets
are initialized to empty, at complexity $O(n)$. The empty sets represent the absence of
the element. Note that each set has only two states: either the element is present, or
it is absent. Assuming a forward-flow problem, each time the equations are computed
for a node, if any of the out sets have changed from their previous state, then the
equations will be computed for all successor nodes. The forward-flow-or equations
have only two out sets per node, and the forward-flow-and equations have three. It
follows that repeated computation of the equations for a single node will cause the
successor nodes to be marked for computation at most two or three times, depending
on the equations being used. Given that the average number of successor nodes is
bounded by a small constant, it follows that the total number of node visits required
to solve the dataflow equations for a single element will be bounded from above by
$k_1 n$ where $k_1$ is a constant, giving a worst-case complexity of $O(n)$ for solving the
dataflow equations for a single element.
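The visit-count argument can be illustrated with a deliberately simplified worklist solver for one element. Unlike the equations in this chapter, the sketch keeps a single out bit per node (no separate body and entry sets) and treats the flowgraph as one procedure; these simplifications, the function name, and the `gen` and `kill` node sets are assumptions of the example. Because each out bit can only change from absent to present, a node's successors are re-queued at most once per out bit.

```python
from collections import deque
from typing import Dict, Hashable, List, Set, Tuple


def solve_single_element(
    succ: Dict[Hashable, List[Hashable]],
    gen: Set[Hashable],
    kill: Set[Hashable],
) -> Tuple[Dict[Hashable, bool], int]:
    """Forward-flow-or solution for one element; returns (out bits, visits).

    succ must list every node as a key. gen and kill are the nodes that
    generate or kill the element.
    """
    pred: Dict[Hashable, List[Hashable]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for t in targets:
            pred[t].append(n)
    out = {n: False for n in succ}      # element absent everywhere
    work = deque(succ)                  # visit every node once initially
    queued = set(succ)
    visits = 0
    while work:
        n = work.popleft()
        queued.discard(n)
        visits += 1
        in_bit = any(out[p] for p in pred[n])
        new_out = n in gen or (in_bit and n not in kill)
        if new_out != out[n]:
            out[n] = new_out
            # The out bit changed, so mark successors for recomputation.
            for s in succ[n]:
                if s not in queued:
                    queued.add(s)
                    work.append(s)
    return out, visits
```

In this simplified setting the number of visits is at most the number of nodes plus the number of edges, which is linear in n when the out-degree is bounded by a constant.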
The worst-case complexity of solving the dataflow equations for m total elements
will therefore be $O(mn)$. Let b be the number of base elements for the program
being analyzed, and let r be the number of recoded elements, giving $m = b + r$. As
an example, for the reaching-definitions dataflow problem the base elements will be
all the definitions in the program. We assume that for the kind of dataflow problems
our method is meant to solve, the number of base elements will be a linear function
of the program size, and therefore proportional to n. Let constant $k_2$ be an upper
bound of $b/n$. We also assume the universe of real, useful programs, written by
programmers to solve practical problems. To determine an upper bound for r, let k
be the maximum number of formal parameters for a single procedure. That k is a
constant independent of program size should be obvious.
Given k and the algorithm of Figure 2.2, and allowing all possible combinations
of the formal parameters of any single procedure, the maximum number of recoded
elements for any single procedure and base element is $k_3 = \sum_{i=1}^{k} \binom{k}{i} = 2^k - 1$.
Note that $k_3$ is a constant, albeit an enormous constant. The maximum number
of recoded elements for any single procedure will therefore be $k_3 b$. In the assumed
language model, each formal parameter is visible in only one procedure, and this
means each recoded element is confined to a single procedure when the dataflow
equations are solved. Therefore, the total number of node visits required to solve
the dataflow equations for all the recoded elements will be bounded from above by
$\sum_{i=1}^{j} k_1 s_i k_3 b$ where j is the number of procedures in the flowgraph, and $s_i$ is the
number of flowgraph nodes in the ith procedure. This upper bound can be rewritten
as $\sum_{i=1}^{j} k_1 k_2 k_3 n s_i$. Ignoring constants and given that $\sum_{i=1}^{j} s_i = n$ and $\sum_{i=1}^{j} n s_i = n^2$,
the worst-case complexity of our method for the assumed language model is $O(n^2)$,
and the elementary operation measured by the complexity is a small number of integer
operations assuming that the average recoded-element length is small.
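The bound can be written out as a single chain of steps. This only restates the argument above with the same constants $k_1$, $k_2$, and $k_3$; the closing numeric instance with k = 3 is an illustrative assumption, not a figure from the experiments.

```latex
\begin{align*}
\text{visits for recoded elements}
  &\le \sum_{i=1}^{j} k_1 s_i \, k_3 b
   \le \sum_{i=1}^{j} k_1 k_2 k_3 \, n \, s_i
   && (b \le k_2 n) \\
  &= k_1 k_2 k_3 \, n \sum_{i=1}^{j} s_i
   = k_1 k_2 k_3 \, n^2
   && \Bigl(\textstyle\sum_{i=1}^{j} s_i = n\Bigr) \\
\text{visits for base elements}
  &\le k_1 n \, b \le k_1 k_2 \, n^2
\end{align*}
% Example: with k = 3 formal parameters,
% k_3 = \binom{3}{1} + \binom{3}{2} + \binom{3}{3} = 3 + 3 + 1 = 7 = 2^3 - 1.
```

Both contributions are quadratic in n, so their sum is as well.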
For a program from the assumed universe of programs, the likelihood of a large
complexity constant due to element recoding is very low, for the following reason.
In order to increase the number of recoded elements for a given base element and
procedure, the given base element must, in effect, be repeatedly aliased to different
combinations of formal parameters in the given procedure. The algorithm of Figure 2.2
generates at most a single recoded element for each element in the OUT set,
so to increase the number of recoded elements as stated, there must be multiple calls
to the same procedure, and in these different calls the same base element must be
aliased to different formal-parameter combinations. To assess the likelihood of this
requirement being met, consider that for any given program from the assumed universe,
the type and purpose of a variable determine how that variable is used in that
program, and each variable used in a program by necessity has a purpose. Given a
number of different calls to the same procedure, and given that a variable appears as
one or more of the actual parameters in each of the calls, then as a rule we expect
that variable to always occupy the same parameter positions in those calls because
there is always a close correspondence between parameter position and the purpose of
the variable that occupies that position. Note that by “variable” we mean a variable
and any aliases it may have, including formal-parameter aliases. A variable and its
aliases are interchangeable and share the same purpose because by definition they
reference the same data.
It might be argued that a language such as C has procedures that have a
variable number of arguments, such as printf and scanf, for which the same variable
could easily occupy different actual-parameter positions in different calls. This is
true, but such library procedures are best treated as unknown calls, and there is
no element recoding for unknown calls. For the needs of element recoding in the
rare case of a user-written procedure with a variable number of arguments, a single
formal parameter could stand for the variable portion of the formal parameters, and
conservative assumptions could be made whenever that single formal is, in effect,
referenced. Aside from mentioning this, we do not consider such user-written variable-
argument procedures further.
For a dataflow problem such as reaching definitions, the base element can only
be affected by a single variable. For such a dataflow problem, the purposefulness of
variables makes it very unlikely that an increase in the number of recoded elements
for a given procedure and base element can even begin, let alone be sustained. However,
such an increase would be more likely for a dataflow problem where the base
element can be affected by several different variables. An example would be available
expressions, because each base element could be affected by as many different
variables as compose the expression represented by that base element.
In light of the preceding argument regarding the purposefulness of variables,
for the reaching-definitions and similar dataflow problems, we expect the maximum
number of recoded elements for any given procedure and base element in the majority
of the programs in the assumed universe, to be one, and a little higher than one for
the remaining programs in that universe. Given the algorithm of Figure 2.2, we also
expect the average length of each recoded element to be slightly more than two, given
the preceding expectation that there will be a very small maximum number of recoded
elements for any given procedure and base element, and assuming that most base
elements when aliased by a call will be aliased to only a single formal parameter, and
only occasionally aliased to more than one. Note that this expected average length
of the recoded elements is consistent with the claim that the elementary operation
measured by the worst-case complexity of our method is a small number of integer
operations.
It may be noticed that the complexity of $O(n^2)$ for our interprocedural analysis
method is the same as the known worst-case complexity for intraprocedural dataflow
analysis, assuming there are no restrictions on the flowgraph. This fact makes it
unlikely that it would be possible to improve on our method in terms of complexity,
without resorting to flowgraph restrictions. However, although the complexities are
the same, this does not mean interprocedural dataflow analysis will now take roughly
the same time as intraprocedural dataflow analysis. The following inequality should
make this clear: $\sum_{i=1}^{j} s_i^2 \le n^2$, given that j is the number of procedures in the
flowgraph, $s_i$ is the number of flowgraph nodes in the ith procedure, and $\sum_{i=1}^{j} s_i = n$.
Besides the language model that is assumed for this chapter, an alternative
model allows each formal parameter to have visibility in more than a single procedure.
Examples of programming languages that fit this alternative model are Pascal and
Ada, which allow nested procedures. Element recoding can be used for this alternative
model, but unless precision is compromised, the worst-case complexity for solving the
equations will be exponential, because the number of recoded elements could grow
exponentially assuming that alias information is compounded when a recoded element
is recoded. The exponential complexity of tracking aliases due to calls was first
considered by Myers [17], and more recently by Landi and Ryder [15]. In practice, the
cost of precise element recoding for the alternative language model may be acceptable
for the assumed universe of programs, and for the same reason given previously
regarding the purposefulness of variables. However, we do not consider the alternative
model further.
2.6 Experimental Results
There are experimental data for our interprocedural analysis method. Specifically,
two different prototypes have been constructed, and they both solve the
reaching-definitions dataflow problem using our method. Both prototypes accept
C-language programs as the input to be dataflow analyzed. For simplicity, these prototypes
impose some restrictions on the input, such as requiring that all variables be
represented by single identifiers, thereby excluding variables that have more than one
component, such as structure and union variables. In addition, there is no logic in
the prototypes to determine what pointers are pointing at, so pointer dereferencing
is essentially ignored. The prototypes do not accept pre-processor commands, so the
input programs must be post-preprocessor.
Both prototypes, named prototype 1 and prototype 2, use the same code to
parse the input program and construct the flowgraph. However, they differ in how
they implement our analysis method. Prototype 1 prepares a single bit-vector format
containing all the definitions in the input program, and then solves the dataflow
equations once for the program flowgraph. Prototype 2 uses a single integer as the
bit vector and solves the dataflow equations for the program flowgraph as many
times as there are base elements. For the reaching-definitions dataflow problem, the
definitions in the program are the base elements. We call the approach used by
prototype 2 one-base-element-at-a-time, and the approach used by prototype 1 is
all-at-once.
It might be expected that prototype 2 would be many times slower than prototype 1,
because of the big difference in bit-vector sizes, but this is not the case. For
prototype 1, calculations using varied test results show that $V \times S_1 \approx D$, where V
is the average number of visits per flowgraph node made to solve the dataflow equations,
$S_1$ is the integer size of the bit vector for prototype 1, and D is the number
of definitions in the input program. This relationship for prototype 1 means that
prototype 2 should run at roughly the same speed as prototype 1, because solving
the dataflow equations for a single element will require an average of roughly one
visit per flowgraph node and the application of the dataflow equations to a vector
of size one. Note that the total amount of work prototype 1 must do per flowgraph
node to solve the equations is proportional to the product $V \times S_1 \approx D$, and the total
amount of work prototype 2 must do per flowgraph node to solve the equations for
the D base elements is proportional to the product $V \times S_2 \times D \approx 1 \times 1 \times D \approx D$,
where $S_2$ is the integer size of the bit vector for prototype 2.
Experimental results have supported the expectation of similar speeds for the
two prototypes. When deciding on the design of a practical tool, this finding is
important and decisively tips the scales in favor of the one-base-element-at-a-time
approach used by prototype 2. For both prototypes, the bit space needed for set
storage is $nks$, where $n$ is the number of flowgraph nodes, $k$ is the average number of
sets per node, and $s$ = max(average set bit-size for any solving of the equations). Note
that for prototype 1 there is only one solving of the equations, and for prototype 2
there are as many solvings of the equations as there are base elements. The primary reason
the approach used by prototype 2 is preferable to the all-at-once
approach used by prototype 1 is the likelihood of a greatly reduced $s$ value. For
example, without element recoding, the $s$ value is 1 for prototype 2, and $D$ for
prototype 1. Allowing element recoding, the $s$ value for the prototype-2 approach
will be 1 + max(average number of recoded elements per procedure for any solving
of the equations). Here we assume that the best way to add element recoding to
prototype 2 would be, for each solving of the equations, to solve the equations for
both a single base element and all recoded elements generated from that base element.

Table 2.3. Typical experimental results for the two prototypes.

   defs   defs global   calls   nodes   prototype 1   prototype 2
   2126       30%         521    4191       49s          1m21s
   2026       60%         472    3948       55s          2m22s
   4109       30%         924    7537      4m18s         4m38s
   4223       60%         916    7723      4m57s         8m19s
   6115       30%        1325   11185       N/A         10m0s
   6091       60%        1411   11288       N/A         18m18s
   8200       30%        1832   14799       N/A         17m44s
   8054       60%        1726   14641       N/A         30m2s
  10299       30%        2164   18434       N/A         23m55s
  10016       60%        2356   18587       N/A         45m8s

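As a rough worked example of the storage estimate, the sketch below evaluates the product of n, k, and s for the first input program of Table 2.3, without element recoding. The value k = 4 is an assumption made only for illustration (one body-in, body-out, entry-in, and entry-out set per node), and the function name is hypothetical.

```python
def set_storage_bits(n, k, s):
    """Bit space for set storage: n nodes, k sets per node, s bits per set."""
    return n * k * s


NODES = 4191   # "nodes" column, first row of Table 2.3
DEFS = 2126    # "defs" column, first row of Table 2.3
K = 4          # assumed number of sets per node (illustrative only)

# Without element recoding, s is D for prototype 1 and 1 for prototype 2.
prototype1_bits = set_storage_bits(NODES, K, DEFS)
prototype2_bits = set_storage_bits(NODES, K, 1)
```

Under these assumptions the two estimates differ by a factor of D, whatever value is chosen for k.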
Table 2.3 presents typical experimental results for the two prototypes. Each
table row represents a different input program. The input programs were randomly
generated by a separate program generator. The generated input programs are syntactically
correct and compile without error, but have meaningless executions. Each
input program in Table 2.3 has 100 procedures. Only prototype 1 currently has
element-recoding logic, so the input programs do not have call parameters and the
table data do not reflect element-recoding costs. Measuring element-recoding costs
for randomly generated programs would be somewhat meaningless anyway, since the
purposefulness-of-variables principle would be violated.

Referring to the columns of Table 2.3, “defs” is the total number of definitions in
the input program, “defs global” is the percentage that define global variables, “calls”
is the number of known calls, “nodes” is the number of flowgraph nodes, “prototype 1”
is the total CPU usage time in minutes and seconds required by prototype 1 to
completely solve the reaching-definitions dataflow problem for the input program
and generate a report of all the reaches, and “prototype 2” is the same thing for
prototype 2. The hardware used was rated at roughly 23 MIPS. The large space
requirements of prototype 1 prevented running it for the larger input programs in
the table.

CHAPTER 3
INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT
3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect
This section lays the theoretical basis for our algorithm. The problem of interprocedural
logical ripple effect is examined from the perspective of execution paths
and their possible continuations. First, general definitions are given, followed by three
assumptions and a definition of the Allow and Transform sets, followed by Lemma 1,
Theorems 1 through 4, and a discussion of the potential for overestimation inherent
in the Allow set.
A variable is defined at each point in a program where it is assigned a value. A
definition is assumed to have the general form of “$v \leftarrow$ expression”, where v is the
variable being defined and “$\leftarrow$” is an assignment operator that assigns the value of
expression to v. If the expression includes variables, then these variables are termed
the use variables of the definition. In general, a use is any instance of a variable that
is having its value used at the point where the variable occurs.
A procedure contains a definition if the statement that makes the definition is
in the body of the procedure. Similarly, a procedure contains a call if the statement
that makes the call is in the body of the procedure. The body of a procedure is those
statements that are defined as belonging to the procedure.
Frequent reference is made in this chapter to a procedure containing a statement,
or containing a call, or containing a flowgraph node. For languages that allow
nested procedures, such as Pascal and Ada, note that procedure nesting in these
languages is a mechanism for controlling variable scope, and not a mechanism for
sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume
that at most only a single procedure contains any given statement, call, or flowgraph
node.
Let d and dd be two definitions, possibly the same, in the same program. Let dd
have a use-variable v, let $v_{dd}$ be that use-variable instance, and let d define v. Given a
possible execution path between definition d and $v_{dd}$, along which the definition of v
that d represents would be propagated, such a path is referred to as a definition-clear
path between d and $v_{dd}$ with respect to v. Definition d can only be propagated along
an execution path to the end of that path if either definition d itself or an element
that represents definition d exists at the beginning of that path, and there is no
redefinition of v along that path. Definition d is said to affect definition dd if there
is a definition-clear path between d and $v_{dd}$ with respect to v. Similarly, definition d
affects use u if u is an instance of v, and there is a definition-clear path between d
and u with respect to v. For convenience, v will not be explicitly mentioned when
it is understood. Note that whenever we speak of an execution path between two
points, we always mean that the execution path begins at the first point and ends at
the second point. For example, an execution path between d and dd begins at the
program point where d occurs and ends at the program point where $v_{dd}$ occurs. For
convenience, we assume that dd and $v_{dd}$ occupy the same program point.
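The notion of a definition-clear path can be illustrated with a small search over a single flowgraph. The sketch below is a simplification under stated assumptions: it is intraprocedural only, each node defines at most one variable, and the names `affects`, `succ`, and `defines` are hypothetical. It reports whether some path from the node of d to the node of dd avoids every intermediate redefinition of v.

```python
from collections import deque


def affects(succ, defines, start, target, var):
    """Return True if a path of one or more edges leads from start to target
    with no redefinition of var at any node strictly between them."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt == target:
                return True               # reached the use in dd
            if nxt in seen or defines.get(nxt) == var:
                continue                  # already visited, or var is redefined here
            seen.add(nxt)
            queue.append(nxt)
    return False


# Hypothetical flowgraph: d defines v; node a redefines v; node b does not.
SUCC = {"d": ["a", "b"], "a": ["dd"], "b": ["dd"], "dd": []}
DEFINES = {"d": "v", "a": "v", "b": None, "dd": "w"}
```

With these tables, `affects(SUCC, DEFINES, "d", "dd", "v")` is true because the path through b is definition-clear. If b also redefined v, no definition-clear path would remain.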
Assumption 1. A called procedure, if it returns, always returns to its most recent
caller. A procedure that returns, always returns to the most recent unreturned call.
Assumption 2. A call has no influence on the execution paths taken inside the
called procedure.
Assumption 3. There are no recursive calls.
Assumption 1 reflects the behavior of all the procedural languages that we know
of. Regarding Assumption 2, our algorithm may in fact overestimate the logical ripple
effect because of both Assumption 2 and the unstated but standard assumption of
intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible
execution paths. However, these two assumptions are unavoidable because determining
all the truly possible execution paths in an arbitrary program is known to be an
undecidable problem. Regarding Assumption 3, making this assumption improves
the precision of our algorithm because this assumption removes a potential cause of
overestimation. The consequence of using our algorithm for a program with recursive
calls is discussed at the end of Section 3.2.
To determine what a definition affects when it is constrained by ripple effect,
it is useful to introduce two concepts: backward flow and forward flow. Given an
execution path, whenever the execution path returns from a procedure to a call, this
is termed backward flow. All other parts of the execution path may be termed forward
flow. Note that the possibilities for backward flow are constrained by Assumption 1,
and therefore constrained by the relevant execution paths that lead up to the point
of the return in question.
Regarding a given execution path, those call instances within that execution
path that have yet to be returned to within that path, called unreturned calls, are
the parts of the path that constrain backward flow. Note that this constraint is a
positive constraint, since a call cannot be returned to unless that call exists as an
unreturned call in at least one relevant execution path.
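A minimal sketch of this bookkeeping follows, assuming an execution path is given as a list of call and return events (a representation chosen here for illustration, as is the function name). By Assumption 1 a return always matches the most recent unreturned call, so a stack suffices. A return that finds the stack empty is a return to a call instance that precedes the path.

```python
def scan_path(events):
    """Return (unreturned calls in execution order, number of returns to
    call instances that precede the beginning of the path)."""
    pending = []                  # calls made in the path and not yet returned to
    returns_to_earlier_calls = 0
    for kind, call in events:
        if kind == "call":
            pending.append(call)
        elif pending:
            pending.pop()         # Assumption 1: return to the most recent call
        else:
            returns_to_earlier_calls += 1
    return pending, returns_to_earlier_calls


# Hypothetical path: c2 is made and returned to; c1 and c3 remain unreturned.
PATH = [("call", "c1"), ("call", "c2"), ("return", "c2"), ("call", "c3")]
```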
Definition 1. Two sets, Allow and Transform, will be used to represent the
backward-flow restrictions associated with a particular definition d. Let p be the
program point where definition d occurs. The elements in both sets are calls. The Allow
set identifies only the calls to which the execution path continuing on from point p
may make an unmatched return, until the backward-flow restrictions represented
by this Allow set are effectively cancelled by the interaction between the
execution-path continuation and the Transform set, explained shortly. An unmatched return is
a return made during the execution-path continuation to a call instance that precedes
the beginning of that execution-path continuation. The call instance is necessarily an
unreturned call, as otherwise it could not be returned to. $|\mathit{Allow}| \le$ the total number
of different calls represented in the program text. We define $\mathit{Allow} = \emptyset$ to mean there
are no backward-flow restrictions for d. The Transform set identifies only the calls
to which the execution path continuing on from point p may make an unmatched
return such that, upon this unmatched return, the execution-path continuation is no
longer constrained by the Allow and Transform sets associated with d. The following
relationships hold: $\mathit{Transform} \subseteq \mathit{Allow}$, and if $\mathit{Allow} \ne \emptyset$ then $\mathit{Transform} \ne \emptyset$.
Note that minimizing backward-flow restrictions must be done whenever the
possible execution paths allow it, because otherwise the computed logical ripple
effect—which is the whole purpose of this formal-analysis section—may be missing
pieces that belong in it but were not added to it because backward-flow restrictions
were retained that are not valid for all the possible execution paths involved.
Lemma 1. For any execution path P between two program points p and q, if P
includes two or more call instances made in P that have not been returned to in P,
then for these unreturned calls, $c_i$ calls the procedure containing $c_{i+1}$, where $c_i$ is the
$i$th unreturned call, in execution order, made in P.
Proof. Assume that the next unreturned call $c_{i+1}$ is not contained in the procedure
that was called by $c_i$. Let X be the procedure called by $c_i$, and let Y be the
procedure that contains $c_{i+1}$. The execution path in P between making the call $c_i$
and making the call $c_{i+1}$ must include a path out of procedure X and into procedure
Y so that the call $c_{i+1}$ can be made. A path out of procedure X can occur in only
two ways. Either X returns to a call, or X itself makes a call. If X returns to a call,
then by Assumption 1, $c_i$ would be returned to, contradicting the given that $c_i$ has
not been returned to. This means X must make a call to get to Y. Let c be the call
contained in X that is the last call contained in X on the execution path in P taken
from X to Y so as to make the call $c_{i+1}$. If X makes the call c, and c has not been
returned to in P, then c would precede $c_{i+1}$ as an unreturned call following $c_i$,
contradicting the given that $c_{i+1}$ is the next unreturned call in execution order after $c_i$.
If c has been returned to in P, then all calls occurring on the execution path between
the call c and the return to c must have been returned to according to Assumption 1.
This would mean $c_{i+1}$ has been returned to, contradicting the given that $c_{i+1}$ has
not been returned to. Thus, it is true that $c_i$ calls the procedure containing $c_{i+1}$, as
assuming otherwise leads to contradictions. □
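Lemma 1 can be checked mechanically on a sample path. The sketch below assumes a hypothetical call table that records, for each call, the procedure containing it and the procedure it calls. The path representation and function names are likewise illustrative.

```python
# Hypothetical call table: call -> (procedure containing the call, procedure called).
CALLS = {"c1": ("main", "A"), "c2": ("A", "B"), "c3": ("A", "C"), "c4": ("C", "D")}


def unreturned(events):
    """Unreturned calls of a path, in execution order (stack discipline of Assumption 1)."""
    pending = []
    for kind, call in events:
        if kind == "call":
            pending.append(call)
        elif pending:
            pending.pop()
    return pending


def lemma1_holds(unret, calls=CALLS):
    """Each unreturned call must call the procedure containing the next unreturned call."""
    return all(calls[a][1] == calls[b][0] for a, b in zip(unret, unret[1:]))


# main calls A (c1), A calls B (c2) and B returns, A calls C (c3), C calls D (c4).
PATH = [("call", "c1"), ("call", "c2"), ("return", "c2"), ("call", "c3"), ("call", "c4")]
```

For this path the unreturned calls are c1, c3, c4, and each one calls the procedure that contains the next. A sequence such as c1, c4 could not arise from a real path, and the check rejects it.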
Definitions for Theorems 1 through 4. Let d and dd be the two definitions
previously defined. Let A and T be the Allow and Transform sets associated with d.
Let P be a single execution path between d and dd, and along which d can affect dd,
subject to the constraints on P imposed by A and T. P will consist of a sequence of
calls and returns, if any, in the order they are made. Any instance of a call made in
P that is not returned to in P is an unreturned call in P.
K is defined for P if and only if P contains an unmatched return (meaning a
return to a call instance that precedes the beginning of P) to a call $\in T$. K is that
part of P that follows the first unmatched return to a call $\in T$. Thus, K represents
the continuation of P after the unmatched return. Any instance of a call made in K
that is not returned to in K is an unreturned call in K.
Referring to each of the four theorems in turn, let AA and TT be the Allow
and Transform sets for dd given all the paths P that meet the requirements of P as
stated by that theorem. Let $AA_P$ and $TT_P$ be the Allow and Transform sets for dd
given a single path P that meets the requirements of P as stated by that theorem.
The four theorems that follow each define AA and TT. Note that for any given P,
A, and T, one of the four theorems will apply.
Theorem 1. If (1) $A = \emptyset$, and P has no unreturned calls, or (2) $A \ne \emptyset$, K is
defined for P, and K has no unreturned calls, then $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$.
Proof. For case (1), d is free of backward-flow restrictions and d has affected
dd without making an unreturned call, therefore dd will be free of backward-flow
restrictions, giving $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$. For case (2), as soon as path P makes
an unmatched return r to a call $\in T$, then by Definition 1 what d can affect is no
longer constrained by A and T, and this freedom from constraint by A and T passes
by transitivity to dd because d affects dd.
When K is defined for P, the unmatched return r in P that immediately precedes
the beginning of K means that any unreturned calls in P are also in K. This
is because all call instances within P are more recent than the call instance that
matches the unmatched return r. Thus, by Assumption 1 all call instances in P
preceding the return r must be returned to in P before r can occur. Therefore, P has
no unreturned calls because K has no unreturned calls. Thus, dd is free of backward-flow
restrictions since A, T, and P contribute nothing in the way of constraint, giving
$AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$. □
Theorem 2. If (1) $A = \emptyset$, and P has at least one unreturned call, or (2) $A \ne \emptyset$,
K is defined for P, and K has at least one unreturned call, then $AA \leftarrow \bigcup_{\text{all such } P}$
{the unreturned calls of P}, and $TT \leftarrow \bigcup_{\text{all such } P}$ {the first unreturned call in P}.
Proof. For case (1), A and T contribute nothing in the way of constraint
to $AA_P$ and $TT_P$. Because d affects dd along path P which contains unreturned
calls, by Assumption 1 those unreturned calls must be returned to first before any
other unreturned calls can be made from the execution-path continuation point of dd
onward. Hence, $AA_P \leftarrow$ {the unreturned calls of P}. Because d had no backward-flow
restrictions, it follows that once all the unreturned calls of P are returned to
by the execution-path continuation, then that continuation would no longer have
any backward-flow restrictions. Because of Assumption 3 and Lemma 1, all the
unreturned calls of P are returned to when the sequentially first unreturned call in
P is returned to. Hence, $TT_P \leftarrow$ {the first unreturned call in P}. For case (2), as
shown in the proof of Theorem 1 case (2), A and T contribute nothing to $AA_P$ and
$TT_P$ when K is defined for P. Thus, this case (2) is effectively the same as case
(1), because the A and T sets contribute nothing and an unreturned call in K is an
unreturned call in P. Therefore, $AA_P \leftarrow$ {the unreturned calls of P} and $TT_P \leftarrow$
{the first unreturned call in P}.
From Definition 1 and the general definitions of AA, TT, $AA_P$, and $TT_P$, it
follows that $AA \leftarrow \bigcup_{\text{all such } P} AA_P$ and $TT \leftarrow \bigcup_{\text{all such } P} TT_P$. Thus, $AA \leftarrow
\bigcup_{\text{all such } P}$ {the unreturned calls of P}, and $TT \leftarrow \bigcup_{\text{all such } P}$ {the first unreturned
call in P}. □
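The set construction of Theorem 2 can be sketched as follows, assuming each qualifying path P has already been reduced to its list of unreturned calls in execution order. The function name and the input representation are hypothetical.

```python
def theorem2_sets(unreturned_per_path):
    """Given, for each path P meeting the requirements of Theorem 2, the
    unreturned calls of P in execution order, return (AA, TT)."""
    aa, tt = set(), set()
    for unret in unreturned_per_path:
        if not unret:
            raise ValueError("Theorem 2 requires at least one unreturned call per path")
        aa |= set(unret)      # AA_P: all unreturned calls of P
        tt.add(unret[0])      # TT_P: the first unreturned call in P
    return aa, tt


# Two hypothetical paths from d to dd, with unreturned calls c1-c2 and c3-c4.
AA, TT = theorem2_sets([["c1", "c2"], ["c3", "c4"]])
```

Here AA contains all four calls, while TT contains only c1 and c3, the calls whose return ends the restrictions.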
Theorem 3. If $A \ne \emptyset$, K is not defined for P, and P has no unreturned calls,
then $AA \leftarrow \{x \mid x \in A \;\wedge$ (x is part of a possible execution path that inclusively
begins with a call $\in T$ and ends with a call of the procedure containing dd, such that
each unreturned call in this possible execution path is in A)$\}$, and $TT \leftarrow AA \cap T$.
Proof. Note that only one procedure contains dd. Because K is not defined for
P, it follows that P was constrained in its entirety by A, never making an unmatched
return to a call $\in T$. Because P has no unreturned calls, d can only affect dd along
P by making one or more unmatched returns to calls $\in (A - T)$, unless d and dd are
in the same procedure.
A, in effect, represents possible execution paths with unreturned calls by which
d was affected. However, once given P, the path P may eliminate some of the
paths from A as being possible, and return to some of the unreturned calls in A.
Thus, although P contributes nothing directly to AA, it may narrow the unreturned
execution-path possibilities that A can contribute to AA. AA as defined for this
theorem captures all execution paths in A that begin with a call $\in T$ and end with
a call of the procedure that contains dd. Given Assumption 3, it should be obvious
that these are all the possible paths in A that are unreturned after P. Note that
if d and dd are in the same procedure, then $AA = A$ and $TT = T$. Assume that
d and dd are in different procedures. Any call $\in A$ that is not part of at least one
path in A that makes a call of the procedure containing dd, must be excluded from
AA because P requires a path in A that passes through the procedure containing
dd, because otherwise P could not make a return to the procedure containing dd.
Any call $\in A$ that is on a path in A between the procedure containing dd and the
procedure containing d, must be excluded from AA because the procedure containing
dd has been returned to by P. The definition of AA for this theorem satisfies these
two exclusions.
That $TT \leftarrow AA \cap T$ follows from Definition 1 requiring $TT \subseteq AA$, and from
the definition of AA for this theorem. □
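One way to compute a set of this shape is a backward closure: start with the calls in A that call the procedure containing dd, then repeatedly add calls in A that call a procedure containing an already selected call. The sketch below is illustrative only. The call table and names are hypothetical, and it does not separately check that each chain begins with a call in T.

```python
# Hypothetical call table: call -> (procedure containing the call, procedure called).
CALLS = {
    "c1": ("main", "P1"),
    "c2": ("P1", "P2"),
    "c3": ("P2", "P3"),
}


def paths_closure(allow, target_proc, calls):
    """Calls in `allow` that lie on a call chain ending with a call of target_proc."""
    paths = set()
    changed = True
    while changed:
        changed = False
        containing = {calls[c][0] for c in paths}   # procedures holding selected calls
        for x in allow - paths:
            callee = calls[x][1]
            if callee == target_proc or callee in containing:
                paths.add(x)
                changed = True
    return paths


# d is in P3 with A = {c1, c2, c3} and T = {c1}; dd is in P1.
A, T = {"c1", "c2", "c3"}, {"c1"}
AA = paths_closure(A, "P1", CALLS)
TT = AA & T
```

In this example c2 and c3 lie between the procedure containing dd and the procedure containing d, so they are excluded and only c1 remains.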
Theorem 4. If $A \ne \emptyset$, K is not defined for P, P has at least one unreturned
call, and the first unreturned call in P is contained in procedure X, then $S_1 \leftarrow
\bigcup_{\text{all such } P \text{ given } X}$ {the unreturned calls of P}, and $S_2 \leftarrow \{x \mid x \in A \;\wedge$ (x is part
of a possible execution path that inclusively begins with a call $\in T$ and ends with
a call of the procedure X, such that each unreturned call in this possible execution
path is in A)$\}$, $AA \leftarrow S_1 \cup S_2$, and $TT \leftarrow S_2 \cap T$.
Proof. $S_1$ follows from Definition 1 and the proof of Theorem 2. $S_2$ follows from
Theorem 3, where the specific “procedure containing dd” in the expression for AA in
Theorem 3 has been replaced by the equally specific “procedure X”.
To show that the union operation of AA, combining $S_1$ and $S_2$, does not thereby
represent spurious paths in AA, it is only necessary to show that the paths represented in
$S_1$ never cross with the paths represented in $S_2$. Two paths cross if each path makes
an unreturned call to the same procedure. All paths in $S_2$ end with an unreturned
call of procedure X. All paths in $S_1$ begin with an unreturned call contained in
procedure X. Assume that both $S_1$ and $S_2$ include an unreturned call to the same
procedure. As all paths in $S_2$ lead to procedure X, this means there exists an
execution path that originates in procedure X and eventually calls procedure X. Thus,
the execution path represents recursion, and this is contradicted by Assumption 3.
Therefore, the paths represented in $S_1$ never cross with the paths represented in $S_2$.
The first unreturned call in P is not added to TT because the path P is an
extension of the unreturned paths represented in $S_2$. That $TT \leftarrow S_2 \cap T$ follows from
Definition 1 requiring $TT \subseteq AA$, and from the definition of AA for this theorem. □
Figure 3.1. An example call structure that does not allow overestimation.
The four theorems given above will be used to build the algorithm given in
the next section. In effect, a given Allow set represents possible execution paths
with unreturned calls by which the definition associated with that Allow set was
affected. Inversely, the Allow set identifies, in effect, those continuation paths that
can make unmatched returns. However, missing from the Allow set is the information
needed to enforce an ordering of the unmatched returns that the continuation path
may make. To a large extent, this missing information is unnecessary because of
Lemma 1. Typically, the call structure of the program itself enforces the ordering of
the unmatched returns. Figure 3.1 is an example. Assume d affects dd, giving an
Allow set of {c1, c2} for dd. Given a continuation path from dd, it is not possible
for c1 to be returned to before c2, so the correct ordering of unmatched returns is
enforced by the program itself. However, there are cases where the missing ordering
information can result in a continuation path taking unwanted shortcuts.
Figure 3.2. An example call structure that allows overestimation.
Figure 3.2 gives an example of a call structure that allows the continuation path
from dd to make an unwanted shortcut when given the right circumstances. Assume
d affects dd along the paths c1-c2 and c3-c4, giving an Allow set of {c1, c2, c3, c4} for
dd. Assume the continuation path is r2-c5-r3, where r2 and r3 are unmatched returns
to calls c2 and c3. The unmatched return r3 should not be allowed to happen before
an unmatched return r4, but this unmatched-return ordering will not be enforced
by the Allow set defined in this dissertation, so the assumed continuation path is
possible. By virtue of such a spurious continuation path, dd may be able to affect
a definition or use that it would not otherwise be able to affect, assuming dd were
confined to only legitimate continuation paths. In practical terms, this means that
the computed logical ripple effect that consists of affected definitions and uses may
in fact be an overestimate because of spurious continuation paths. Although the
Allow set does permit spurious continuation paths under the right circumstances, of
which Figure 3.2, and the assumed paths by which d affected dd, are the simplest
example, we feel that these circumstances, along with spurious paths that affect
what would otherwise be unaffected, will not occur often enough in real programs to
undermine the general usefulness of the Allow set in constraining backward flow and
permitting computation of a precise or semiprecise logical ripple effect.

3.2 The Logical Ripple Effect Algorithm
This section presents an algorithm for computing a precise interprocedural logical
ripple effect. After a brief overview of the algorithm, the dataflow analysis method
used by the algorithm is discussed. Then, two important properties of the dataflow
sets are detailed, followed by three rules that are used to impose backward-flow
restrictions on the dataflow analysis that is done. Last are proofs that the algorithm
is correct.
The algorithm to compute logical ripple effect is shown in Figure 3.3. Each
statement in the algorithm is numbered on the left. For convenience, algorithm
statements will be referred to as lines. For example, a reference to line 28 means the
statement at 28 that actually is printed on several lines. Comments in the algorithm
begin with --. ⊥ and ⊤ are just two different, fixed, arbitrary values.
In general, the algorithm works as follows. A definition d and its associated
Allow and Transform sets are popped from the stack (line 7), and then the reaching-
definitions dataflow problem is solved for this definition d, imposing any backward-
flow restrictions represented by the Allow and Transform sets (line 8). Reaching
definitions for a single definition is the problem of finding all uses and definitions
affected by the definition. The definition d that was dataflow analyzed, and any
uses affected by it, are included in the ripple effect (lines 9 to 11). Each affected
definition will have its Allow and Transform sets determined in accordance with
Theorems 1 through 4 (lines 22 to 46). A check is then made to see if the affected
definition and its restriction sets, Allow and Transform, should be added to the stack
for dataflow analysis or not (lines 47 to 52). The algorithm ends when the stack is
empty. Although the algorithm shows a single definition b being added to the stack at
line 5, any number of different b can actually be added, along with empty restriction
sets for each b.

    -- Compute the logical ripple effect for a hypothetical or actual definition b
    -- Input: a program flowgraph ready for dataflow analysis
    -- Output: the logical ripple effect in RIPPLE
    begin
 1    RIPPLE ← ∅
 2    for each definition dd in the program
 3      FIN_dd ← ⊥
      end for
 4    stack ← ∅
 5    push (b, ∅, ∅) onto stack
 6    while stack ≠ ∅ do
 7      pop stack into (d, ALLOW, TRANSFORM)
 8      Solve the reaching-definitions dataflow equations for the single definition d,
        using Rules 1, 2, and 3.
 9      RIPPLE ← RIPPLE ∪ {d}
10      for each use u in the program that is affected by either d1 or d2
11        RIPPLE ← RIPPLE ∪ {u}
        end for
12      ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13      for each call node n in the flowgraph
14        if d1 ∈ B_out[n] and d1 crossed from this call into the called procedure
15          ROOT1 ← ROOT1 ∪ {the call node n}
          fi
16        if d1 ∈ E_out[n] and d1 crossed from this call into the called procedure
17          LINK1 ← LINK1 ∪ {the call node n}
          fi
18        if d2 ∈ B_out[n] and d2 crossed from this call into the called procedure
19          ROOT2 ← ROOT2 ∪ {the call node n}
          fi
20        if d2 ∈ E_out[n] and d2 crossed from this call into the called procedure
21          LINK2 ← LINK2 ∪ {the call node n}
          fi
        end for
Figure 3.3. The logical ripple effect algorithm.

22      for each definition dd in the program that is affected by either d1 or d2
          -- determine Allow and Transform for dd by Theorem 1
23        if d2 ∈ B_in[node where dd occurs]
24          PATHS ← ∅, TRANS ← ∅
25          call Analyze
          else
            -- determine Allow and Transform for dd by Theorem 2
26          if d2 ∈ E_in[node where dd occurs]
27            PATHS ← ∅
28            PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure
                       that contains dd ∨ x calls a procedure that contains
                       a call c ∈ (PATHS ∩ LINK2))}
29            TRANS ← ROOT2 ∩ PATHS
30            call Analyze
            fi
            -- determine Allow and Transform for dd by Theorem 3
31          if d1 ∈ B_in[node where dd occurs]
32            PATHS ← ∅
33            PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd
                       ∨ x calls a procedure that contains a call c ∈ PATHS)}
34            TRANS ← TRANSFORM ∩ PATHS
35            call Analyze
            fi
            -- determine Allow and Transform for dd by Theorem 4
36          if d1 ∈ E_in[node where dd occurs]
37            for each procedure X that contains a call ∈ ROOT1
38              RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39              PP ← ∅
40              PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                      begins with a call ∈ RT1 and ends with a call of the
                      procedure that contains dd, such that each call
                      in this path is in (RT1 ∪ LINK1))}
41              if PP ≠ ∅
42                PATHS ← ∅
43                PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls
                           a procedure that contains a call c ∈ PATHS)}
44                TRANS ← TRANSFORM ∩ PATHS
45                PATHS ← PATHS ∪ PP
46                call Analyze
      end statements: fi, end for, fi, fi, end for, od
    end
Figure 3.3. - continued.

    Procedure Analyze
    begin
      -- avoid repetition of dd dataflow analysis if possible
47    if FIN_dd ≠ ⊤ ∧ (PATHS = ∅
           ∨ (true for all saved pairs for dd: PATHS ⊈ P ∨ TRANS ⊈ T))
48      if PATHS = ∅
49        FIN_dd ← ⊤
50        push (dd, ∅, ∅) onto stack
        else
51        save PATHS and TRANS as the pair P × T for dd
52        push (dd, PATHS, TRANS) onto stack
        fi
      fi
    end
Figure 3.3. - continued.
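The check made by procedure Analyze can be sketched as follows. This is a reading of lines 47 to 52 under one assumption that should be stated plainly: the comparison in line 47 is taken to mean "is not a subset of", so a new pair is analyzed only if no saved pair for dd already covers it. The class and method names are hypothetical.

```python
BOTTOM, TOP = "bottom", "top"    # two different, fixed, arbitrary values


class Worklist:
    def __init__(self):
        self.fin = {}      # dd -> TOP once dd has been pushed with empty sets
        self.saved = {}    # dd -> list of (P, T) pairs already pushed for dd
        self.stack = []

    def analyze(self, dd, paths, trans):
        """Push (dd, paths, trans) unless that analysis would repeat earlier work."""
        if self.fin.get(dd, BOTTOM) == TOP:
            return False                       # dd was already analyzed unrestricted
        if not paths:                          # lines 48 to 50
            self.fin[dd] = TOP
            self.stack.append((dd, frozenset(), frozenset()))
            return True
        pairs = self.saved.setdefault(dd, [])
        # Assumed meaning of line 47: no saved pair covers both PATHS and TRANS.
        if all(not paths <= p or not trans <= t for p, t in pairs):
            pairs.append((frozenset(paths), frozenset(trans)))   # line 51
            self.stack.append((dd, frozenset(paths), frozenset(trans)))  # line 52
            return True
        return False
```

A repeated request with the same restriction sets is rejected, and once dd has been pushed with empty sets every later request for dd is rejected.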
The dataflow equations referred to in line 8 are shown in Figure 3.4. These
equations are copied from Chapter 2, which presents a method for context-dependent
flow-sensitive interprocedural dataflow analysis. The method consists of solving the
dataflow equations shown in Figure 3.4, using the standard iterative algorithm,
for the program flowgraph required by the equations. The method in Chapter 2
includes a solution to the problems of parameter aliasing and implicit definitions,
which are part of the interprocedural reaching-definition problem. We assume that the
full method of Chapter 2 would be used, but we do not discuss these side issues in
this chapter as they are not directly relevant to the algorithm. Note that there are
other methods for context-dependent flow-sensitive interprocedural dataflow analysis
[3, 9, 17, 21], but the method of Chapter 2 has precision and efficiency advantages
over the other methods cited.
Referring to the dataflow equations of Figure 3.4, four sets are computed for
each flowgraph node: two body sets, $B_{in}$ and $B_{out}$, and two entry sets, $E_{in}$ and $E_{out}$.
All body and entry sets are initially empty. As the equations will be solved for only
a single definition d, the GEN set for the node where d occurs (i.e., the node whose
associated block of program code contains the definition d) will contain an element
representing d, and all the other GEN sets will be empty. The node where d occurs
is the natural starting point for the iterative algorithm, which will recompute the body
and entry sets for the nodes until stability is attained and the sets cease to change, at
which point the equations have been solved. Once solved, an element is in the entry
set or body set at a particular node depending on how that element was propagated
to that node. The same element may be in both sets at the same node. Properties 1
and 2, listed below, summarize those implications of set membership that are used by
the algorithm. The properties follow directly from the dataflow equations.

For any node n.
$IN[n] = E_{in}[n] \cup B_{in}[n]$
$OUT[n] = E_{out}[n] \cup B_{out}[n]$
Group I: n is an entry node.
$B_{in}[n] = \emptyset$
$E_{in}[n] = \bigcup_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\}$
$B_{out}[n] = GEN[n]$
$E_{out}[n] = E_{in}[n] \cup RECODE[n]$
Group II: n is a return node, p is the associated call node and q is the exit node of
the called procedure.
$B_{in}[n] = \{x \mid (x \in B_{out}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\}$
$E_{in}[n] = \{x \in E_{out}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q])\}$
$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$
$E_{out}[n] = E_{in}[n] - KILL[n]$
Group III: n is not an entry or return node.
$B_{in}[n] = \bigcup_{p \in pred(n)} B_{out}[p]$
$E_{in}[n] = \bigcup_{p \in pred(n)} E_{out}[p]$
$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$
$E_{out}[n] = E_{in}[n] - KILL[n]$
Figure 3.4. Dataflow equations for the reaching-definitions problem.
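The Group III equations, which apply to every node that is neither an entry nor a return node, can be sketched directly with Python sets. The function name and the dictionary-based representation are assumptions made for illustration. The Group I and Group II equations are not covered, because they depend on the conditions defined in Chapter 2.

```python
def group3(node, preds, b_out, e_out, gen, kill):
    """One application of the Group III equations at `node`.
    Returns (B_in, E_in, B_out, E_out) for that node."""
    b_in = set().union(*(b_out[p] for p in preds[node]))
    e_in = set().union(*(e_out[p] for p in preds[node]))
    new_b_out = (b_in - kill[node]) | gen[node]
    new_e_out = e_in - kill[node]
    return b_in, e_in, new_b_out, new_e_out


# Hypothetical node n with two predecessors. Element "d" arrives in a body set
# from p1 and in an entry set from p2; element "e" is killed at n.
PREDS = {"n": ["p1", "p2"]}
B_OUT = {"p1": {"d"}, "p2": set()}
E_OUT = {"p1": set(), "p2": {"d", "e"}}
GEN = {"n": set()}
KILL = {"n": {"e"}}
RESULT = group3("n", PREDS, B_OUT, E_OUT, GEN, KILL)
```

In this example the element d ends up in both the body set and the entry set of node n, which illustrates that the same element may be in both sets at the same node.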
Property 1. For any node n, an element is in the $E_{in}[n]$ set or $E_{out}[n]$ set if and
only if that element entered the procedure that contains node n from a call node,
and there is a definition-clear path from that call node to node n. Thus, membership
in the entry set of node n implies that the element can propagate to node n by an
execution path that makes at least one unreturned call between the point where the
element is generated and the point where node n occurs.
Property 2. For any node n, an element is in the $B_{in}[n]$ set or $B_{out}[n]$ set if
and only if that element was generated in the same procedure that contains node n,
or that element entered the procedure that contains node n from an exit-node $B_{out}$
set. There must also be a definition-clear path to node n from either the element’s
generation node or from the exit node. If the element entered from an exit-node
$B_{out}$ set, then Property 2 applies recursively to the element in that $B_{out}$ set. Thus,
membership in the body set of node n implies that the element can propagate to
node n by an execution path between the point where the element is generated and
the point where node n occurs that does not include any unreturned calls.
The three rules referred to in line 8 are listed below. Rule 1 applies before
the dataflow equations are solved. Rules 2 and 3 apply as the equations are being

61
solved. The rules impose the backward-flow restrictions represented by the ALLOW
and TRANSFORM sets in line 7.
Rule 1. If ALLOW = ∅ then element d2 is generated at the node where definition
d occurs, otherwise d1 is the generated element, meaning the element in the GEN
set. Both d1 and d2 are base elements that represent the same definition d. Both
elements are identical in terms of when they appear in any given KILL set. The
only difference between them is that d1 and d2 are treated differently by Rules 2 and
3 below.
If the ALLOW set is empty, then by Definition 1 there should be no backward-
flow restrictions on d. Rule 1 accomplishes this requirement, as d2 is immune to
backward-flow restrictions which are imposed by Rule 2.
Rule 2. Let n be a return node, p be the associated call node, and q be the
exit node of the called procedure. Each time the Bin[n] equation is computed, if
d1 ∈ Bout[q], then d1 cannot cross from Bout[q] into the Bin[n] set if p ∉ ALLOW.
In the dataflow equations, the crossing of an element from an exit-node body
set to a return node is the only action in the equations that represents, in effect, an
unmatched return to a call instance that was made in an execution path leading up
to the program point where definition d occurs, which is the starting point of the
reaching-definition analysis done for d. Thus, Rule 2 covers all cases in which an
unmatched return occurs. Rule 2 restricts unmatched returns to those call instances
that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW
set as given by Definition 1.
Rule 3. Let n be a return node, p be the associated call node, and q be the
exit node of the called procedure. Each time the Bin[n] equation is computed, if
d1 ∈ Bout[q], and, by C2 and Rule 2, d1 can cross from Bout[q] into the Bin[n] set,
and p ∈ TRANSFORM, then as this d1 element crosses from Bout[q] into the Bin[n]
set, the element is changed to d2. In effect, d1 is transformed into d2, and the return
node n becomes a generation node for the d2 element.
As already mentioned, in the dataflow equations the crossing of an element
from an exit-node body set to a return node is the only action in the equations that
represents, in effect, an unmatched return to a call instance that was made in an
execution path leading up to the program point where definition d occurs, which
is the starting point of the reaching-definition analysis done for d. Thus, Rule 3
covers all cases in which an unmatched return occurs. The requirement by Rule 3
that the returned-to call be in the TRANSFORM set satisfies Definition 1 as to
when backward-flow restrictions can be ignored. Rule 3 replaces element d1, which is
subject to the backward-flow restrictions, with element d2, which is free of backward-
flow restrictions, at the return point and thereby satisfies Definition 1 regarding
removal of backward-flow restrictions on the execution-path continuation, since d2
now represents the continuation instead of d1.
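The combined effect of Rules 1, 2, and 3 on a single element can be illustrated
with a short sketch. The Python below is an illustration only, under stated
assumptions: the function names, the string tags "d1" and "d2", and the
representation of ALLOW and TRANSFORM as sets of call-node identifiers are
hypothetical and are not part of the algorithm as presented. Condition C2 of the
dataflow equations is assumed to be checked elsewhere.

```python
def generated_element(allow):
    """Rule 1: choose which base element represents definition d.

    `allow` is the ALLOW set, modeled here (an assumption) as a Python set.
    """
    return "d2" if not allow else "d1"


def cross_return(element, call_node, allow, transform):
    """Rules 2 and 3: an element in the exit-node body set tries to cross to
    the return node that is paired with `call_node`.

    Returns the element that enters the return node's body set, or None if
    the crossing is blocked. Condition C2 is assumed to hold already.
    """
    if element == "d2":
        return "d2"      # d2 is immune to backward-flow restrictions
    if call_node not in allow:
        return None      # Rule 2: this unmatched return is not permitted
    if call_node in transform:
        return "d2"      # Rule 3: restrictions are lifted from here on
    return "d1"          # permitted, but still restricted
```

For example, with ALLOW = {c1, c2} and TRANSFORM = {c1}, a d1 element returning
to c1 continues as d2, a d1 element returning to c2 continues as d1, and a d1
element returning to any other call is blocked.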
Lemma 2. The algorithm computes at lines 23 to 46 the restriction sets for an
affected definition in accordance with Theorems 1 through 4.
Proof. We first establish the properties of the LINK and ROOT sets computed
at lines 12 to 21. Let p be the node, if any, where d1 is generated. Let q be any node
where d2 is generated, i.e. those return nodes where d1 is transformed into d2, or for
ALLOW = ∅ the node where definition d occurs.
The tests at lines 14 and 18 make use of Property 2: if an element is in the Bout
set of a call node n, then there exists a definition-clear path between the node where
the element is generated and node n, and the path has no unreturned calls. The call
at node n would be the first unreturned call on that path by just extending the path
to the entry node of the called procedure. Therefore, the ROOT1 set represents all
calls that are the first unreturned call on at least one definition-clear path between
node p and some other node in the flowgraph. The ROOT2 set represents all calls
that are the first unreturned call on at least one definition-clear path between node
q and some other node in the flowgraph.
The tests of lines 16 and 20 make use of Property 1: if an element is in the Eout
set of a call node n, then there exists a definition-clear path between the node where
the element is generated and node n, and the path includes the unreturned call that
called the procedure containing node n. The call at node n would be at least the
second unreturned call on that path by just extending the path to the entry node
of the called procedure. Therefore, the LINK1 set represents all calls that are an
unreturned call but not the first unreturned call on at least one definition-clear path
between node p and some other node in the flowgraph. The LINK2 set represents
all calls that are an unreturned call but not the first unreturned call on at least one
definition-clear path between node q and some other node in the flowgraph.
The test at line 23 checks for the application of Theorem 1. If d2 ∈ Bin[node
where dd occurs], then by Property 2 there exists a definition-clear path P between
d and dd that has no unreturned calls, and somewhere along P, d2 is generated,
meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of
Theorem 1, and line 24 sets PATHS and TRANS to empty in accordance with the
theorem. PATHS and TRANS are the Allow and Transform sets for dd.
The test at line 26 checks for the application of Theorem 2. If d2 ∈ Ein[node
where dd occurs], then by Property 1 there exists at least one definition-clear path
P between d and dd that has at least one unreturned call, and somewhere along P,
d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies
the conditions of Theorem 2. Only the d2 element satisfies the theorem, so it follows
that all paths P for the theorem will have to be constructed from the ROOT2 and
LINK2 sets exclusively.
Referring to Theorem 2, line 28 computes the AA set, and line 29 computes
TT. For line 28, the PATHS set is defined in terms of itself. This recursive reference
means that each time a call is added to the PATHS set, the condition containing
the recursive reference must be reevaluated, because additional calls may thereby be
added to PATHS. Recursive references are similarly used in lines 33 and 43. What
line 28 does is extract from all the calls that element d2 crossed, just those calls that
are on a path to dd. This is done by building the paths backwards, beginning with
those calls that call the procedure containing dd. By Lemma 1, any path between d
and dd consisting of unreturned calls can be found by proceeding in reverse order from
dd and selecting those calls that call a procedure containing a call already selected.
Backward path building and Lemma 1 are similarly used in lines 33 and 43. By the
properties of the ROOT and LINK sets, the paths constructed by line 28 will be
definition-clear. Notice that a particular call may be in both the ROOT2 and LINK2
sets, but if a call is only in the ROOT2 set, then it cannot be used as the basis for
extending any path further backwards, because by Property 1, d2 does not propagate
from the entry node of the procedure that contains that call, to the call node for that
call. This is the reason for the (PATHS ∩ LINK2) requirement in line 28. Once the
PATHS set is computed, line 29 computes TRANS in accordance with the theorem.
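The recursive definition of PATHS at line 28 amounts to a fixed-point
computation. The following Python sketch shows one way such a computation could
be carried out. It is a sketch under assumptions: the Call record, the function
name build_paths, and the modeling of ROOT2 and LINK2 as sets of such records are
introduced here for illustration only.

```python
from collections import namedtuple

# Hypothetical call record: `caller` is the procedure that contains the call,
# `callee` is the procedure that the call invokes.
Call = namedtuple("Call", ["name", "caller", "callee"])


def build_paths(root2, link2, target_proc):
    """Build PATHS backwards from the procedure that contains dd.

    A call is selected if it is in ROOT2 or LINK2 and either calls
    `target_proc` directly, or calls a procedure containing an already
    selected call that is also in LINK2.
    """
    candidates = root2 | link2
    paths = set()
    changed = True
    while changed:
        changed = False
        # Only selected calls that are in LINK2 allow further backward extension.
        extendable = {c.caller for c in paths & link2}
        for c in candidates - paths:
            if c.callee == target_proc or c.callee in extendable:
                paths.add(c)
                changed = True
    return paths
```

Given the result, TRANS would then be obtained as ROOT2 ∩ PATHS, as line 29
does. Note that a call found only in ROOT2 is selected when it calls the target
procedure but never makes its own containing procedure extendable.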
The test at line 31 checks for the application of Theorem 3. If d1 ∈ Bin[node
where dd occurs], then by Property 2 there exists a definition-clear path P between
d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and P
does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the
element, meaning K is not defined for P. This satisfies the conditions of Theorem 3.
Referring to Theorem 3, line 33 computes the AA set, and line 34 computes TT.
What line 33 does is extract from ALLOW all paths that end with a call of the
procedure containing dd. Although Theorem 3 states that the path begin with a call
∈ TRANSFORM, line 33 does not require a check for this because TRANSFORM is
a subset of ALLOW and those first unreturned calls in TRANSFORM that are on a
path in ALLOW to dd will unavoidably be picked up as the paths are built backwards
from dd. Thus, the PATHS set is computed in accordance with Theorem 3, followed
by line 34 that computes the TRANS set in accordance with the theorem.
The test at line 36 checks for the application of Theorem 4. If d1 ∈ Ein[node
where dd occurs], then by Property 1 there exists at least one definition-clear path P
between d and dd that has at least one unreturned call. It also follows that ALLOW
≠ ∅ and P does not make an unmatched return to a call ∈ TRANSFORM, because
d1 is the element. This satisfies the conditions of the theorem. Only the d1 element
satisfies the theorem, so it follows that all paths P for the theorem will have to be
constructed from the ROOT1 and LINK1 sets exclusively. Referring to Theorem 4,
line 40 computes the S1 set, line 43 computes the S2 set, line 44 computes the TT set,
and line 45 computes the AA set. The reason for the test at line 41 is that although
there exists at least one path P satisfying the theorem, there may not be any paths
P that begin in the specific procedure X. It can be seen that lines 37 to 46 compute
in accordance with the theorem. □
Lemma 3. Let Ai and Ti be one pair of Allow and Transform sets associated
with a definition d, and let Aj and Tj be a different pair of Allow and Transform sets
associated with the same definition d. Assume Ai ≠ ∅ and Aj ≠ ∅. If Aj ⊆ Ai and
Tj ⊆ Ti, then dataflow analyzing d with the pair Aj and Tj cannot add anything to
the ripple effect that is not added by dataflow analyzing d with the pair Ai and Ti.
Proof. By inspection of Rules 1, 2, and 3, it can be seen that removing some
of the calls from Ai or Ti cannot make d affect anything that it does not affect with
Ai and Ti as they were. Also, by inspection of lines 23 to 46, the determination of
the Allow and Transform sets for any definition dd affected by d cannot be made to
include calls when Aj and Tj are the restriction sets for d that would not be included
when Ai and Ti are the restriction sets for d. □
Lemma 4. Let A and T be Allow and Transform sets associated with a definition
d, and let X and Y be a different pair of Allow and Transform sets associated with
the same definition d. If A = ∅, then dataflow analyzing d with X and Y cannot add
anything to the ripple effect that is not added by dataflow analyzing d with A and
T.
Proof. By Rule 1, d will be represented by d2 and have no restrictions on its
backward flow. Thus, d will affect everything that it is possible for it to affect. If d
is dataflow analyzed with X and Y, then any calls found in the ROOT1, ROOT2,
LINK1, or LINK2 sets will also be found in the ROOT2 or LINK2 sets when d is
dataflow analyzed with A and T. These sets determine the restriction sets associated
with a definition dd affected by d. It follows that any dataflow path allowed for a dd
affected by d using X and Y, will also be allowed for a dd affected by d using A and
T. □
Theorem 5. Given Definition 1 and Theorems 1 through 4, the algorithm will
correctly compute the logical ripple effect.
Proof. As shown by Lemma 2, for any affected definition dd, the Allow and
Transform sets to be associated with dd are computed in accordance with Theorems 1
to 4. By Lemma 4, if Theorem 1 applies to an affected definition (line 23), then there
is no need to check if any other theorem also applies, because additional dataflow
analysis resulting from the other theorems cannot contribute to the ripple effect.
However, if Theorem 1 does not apply, then the definition must be dataflow analyzed
separately in turn for each theorem that does apply. This is done by the sequence of
three if statements at lines 26, 31, and 36. Thus, the control logic in lines 23 to 46 is
safe.
The Analyze procedure (lines 47 to 52) prepares a definition and its restriction
sets for dataflow analysis by adding them to the stack (lines 50 and 52). Once a
definition will be dataflow analyzed with no restrictions (line 50) it will not be analyzed
again (line 47). By Lemma 4, this is safe. Assuming FINdd ≠ ⊤ and PATHS ≠ ∅, the
test at line 47 will not prepare a definition for dataflow analysis if both restriction sets
are subsets of any pair of restriction sets used previously to analyze that definition.
This follows from Lemma 3. Thus, the Analyze procedure is safe.
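The test at line 47 can be restated compactly. The sketch below mirrors that
test in Python and is illustrative only: the function name should_analyze, the
boolean flag standing in for the FIN marker, and the list of saved (P, T) pairs
are assumptions made for this example.

```python
def should_analyze(finished, paths, trans, saved_pairs):
    """Decide whether a definition needs another round of dataflow analysis.

    finished:    True once the definition has been analyzed without restrictions
    paths:       candidate Allow set (a Python set)
    trans:       candidate Transform set (a Python set)
    saved_pairs: list of (P, T) pairs already used for this definition
    """
    if finished:
        return False   # an unrestricted analysis subsumes all others (Lemma 4)
    if not paths:
        return True    # empty Allow set: analyze once with no restrictions
    # By Lemma 3, skip the analysis if some saved pair covers both sets.
    return all(not paths <= p or not trans <= t for p, t in saved_pairs)
```

The subset operator `<=` on Python sets plays the role of the subset test on
restriction sets; a candidate pair is rejected only when both of its sets are
subsets of the corresponding sets in a single saved pair.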
The correctness of the dataflow equations (line 8) is established in Chapter 2,
and the correctness of the three rules for imposing backward-flow restrictions (line 8)
has already been discussed. Regarding the correctness of having no backward-flow
restrictions for the initial definition (line 5), let p be the program point where b occurs.
For execution to attain point p, any possible execution path between the program’s
execution starting point and point p can be assumed to have occurred. Thus, there
should be no restrictions on the backward-flow possibilities of b, because there were
no constraints imposed by the ripple effect on how point p was initially attained. □
Programs with recursive calls can be processed by our algorithm, but there may
be some overestimation of the logical ripple effect because of the recursive calls. The
dataflow equations (line 8) are not the problem, as they work for recursive programs.
Instead, the problem is with the Allow set and its representation of execution paths.
If a cyclic execution path is represented in the Allow set, then when the Allow set is
used to restrict backward flow by Rule 2, it may be possible for an element moving
through the program flowgraph to take a shortcut on its unmatched returns and avoid
having to make unmatched returns along the complete cycle before a program point
can be attained. This shortcut may permit the element to affect something that it
should not be able to affect, possibly adding to the ripple effect beyond what should
be there.
3.3 A Prototype Demonstrates the Algorithm
This section first considers the complexity of our interprocedural logical ripple
effect algorithm. A prototype that demonstrates the algorithm is then described, and
test results presented.
Let n be the number of nodes in the flowgraph of the input program. For a
programming language such as C, solving the dataflow equations for a single
definition, which is what line 8 does, has worst-case complexity of O(n). Let k be the
number of known calls in the input program. Considering line 47, a definition may be
dataflow analyzed repeatedly as long as the associated restriction sets are not subsets
of any previous pair of restriction sets used to dataflow analyze that definition. The
number of different restriction sets possible such that no set is a subset of another
set, is clearly a number that will grow exponentially with k. Thus, the worst-case
complexity of our logical ripple effect algorithm is exponential, where the exponent
is some function of k. However, for the typical input program, the actual number of
non-subset restriction sets that can be generated by our algorithm for a given
definition will be severely constrained by a combination of Lemma 1, Theorems 1 through
4, and the typical program call structure that is characterized by shallow call depth.
A prototype that demonstrates our logical ripple effect algorithm has been built.
The prototype accepts as input C programs that satisfy certain constraints, such as
having only single-identifier variable names. Given an input program, the prototype
then requires that one or more definitions be identified as the starting point of the
ripple effect. For purposes of comparison, besides using our algorithm to compute
a precise logical ripple effect, the prototype also computes an overestimate of the
logical ripple effect. The overestimate is computed by simply ignoring the execution-
path problem, i.e. there are no backward-flow restrictions when the overestimate is
computed. The worst-case complexity of computing the overestimate for C programs
is only O(nd) where n is the number of flowgraph nodes and d is the number of
definitions in the overestimated ripple effect. This complexity follows from the O(n)
complexity of solving the dataflow equations for a single definition, and the fact that
the equations will have to be solved d times.
Table 3.1. Experimental results for the prototype.
globals  defs  defs global  depth  nodes  RSo    RSp   reduction  timeo   timep
50       2420  7%           2/213  3939   2275   936   53.4%      5s      3s
100      2291  15%          2/188  3776   4151   2449  41.0%      17s     13s
200      2294  30%          2/188  3662   5594   3718  33.5%      40s     32s
300      2370  45%          2/231  3962   5897   2607  55.8%      1m5s    27s
50       2225  7%           3/202  3717   1222   633   40.3%      3s      2s
100      2333  15%          3/229  3864   4139   1867  54.9%      17s     7s
200      2211  30%          3/231  3760   4884   2688  45.0%      39s     28s
300      2236  45%          3/205  3737   5308   3505  34.0%      59s     38s
50       2320  7%           4/227  3912   1822   1067  35.1%      5s      3s
100      2211  15%          4/228  3673   4329   1525  64.8%      18s     7s
200      2223  30%          4/227  3705   5019   1918  61.8%      37s     16s
300      2214  45%          4/214  3648   5922   4740  20.0%      1m9s    1m36s
100      4354  7%           2/372  6858   4317   2201  40.0%      19s     10s
200      4467  15%          2/368  7068   8844   6457  27.0%      1m17s   1m12s
400      4261  30%          2/388  6851   9653   2976  69.2%      2m29s   49s
600      4289  45%          2/340  6784   10590  6840  35.4%      4m8s    3m56s
100      4314  7%           3/432  6781   1993   631   52.5%      8s      2s
200      4268  15%          3/395  6876   5795   3236  35.5%      51s     54s
400      4223  30%          3/393  6735   9240   7307  20.9%      2m26s   4m21s
600      4248  45%          3/433  6868   9772   6453  30.6%      3m56s   4m50s
100      4252  7%           4/455  6961   2756   1120  42.6%      14s     5s
200      4276  15%          4/440  6858   7781   5752  26.1%      1m10s   2m35s
400      4228  30%          4/391  6681   9838   8290  15.7%      2m45s   9m20s
600      4112  45%          4/462  6802   10017  9192  8.2%       4m24s   39m55s
Table 3.1 presents test results for the prototype. Each row details relevant
characteristics of an input program, and presents the resulting averages of ten different
tests of that input program, where each test computed the ripple effect started by a
single, randomly chosen definition of a global variable.
The input programs of Table 3.1 were randomly generated by a separate program
generator. The generated input programs are syntactically correct and compile
without error, but have meaningless executions. Each input program of Table 3.1 has
100 procedures, and exactly the number of global variables listed. Within each input
program, each global variable is defined and used at least once. The call structure of
each input program was determined randomly by the generator, with the constraint
that there be no recursion in the input program, and the given maximum call depth
not be exceeded by any call in the input program. All calls in the generated input
program are known calls, and approximately 1/(max + 1) of the calls will be at each
possible depth from zero to max, where max is the given maximum call depth.
Referring to the columns of Table 3.1, “globals” is the number of global variables
in the input program, “defs” is the number of definitions in the input program, “defs
global” is the percentage of the definitions that define a global variable, “depth” is
the maximum call depth followed by the total number of calls in the input program,
“nodes” is the number of nodes in the flowgraph, “RSo” is the average size of the
overestimated ripple effect for the ten test cases where size is the total number of
definitions and uses in the ripple effect, “RSp” is the average size of the precise
ripple effect, “reduction” is the average percentage reduction for the ten test cases
of the size of the overestimated ripple effect when it is replaced by the precise ripple
effect, “timeo” is the average CPU usage time for each test case to compute the
overestimated ripple effect, and “timep” is the average CPU usage time for each test
case to compute the precise ripple effect. The hardware used was rated at roughly
24 MIPS. As an example of the time notation used in Table 3.1, time 1m36s would
be read as 1 minute, 36 seconds.
Although the worst-case complexity of our algorithm for precise logical ripple
effect is exponential, the data of Table 3.1 indicates that the expected complexity for
a wide range of input programs, given a programming language such as C, is
approximated by O(nd). This follows from the O(nd) worst-case complexity of computing the
overestimate, and the typical closeness of timeo and timep for each row in Table 3.1.
However, the last row of Table 3.1 is instructive, because it shows that regardless of
what the expected complexity might be, there will always be specific input programs
and starting points that require time greatly exceeding the time required to compute
the overestimate. In practice, if the computation of the precise logical ripple effect
is taking too long, then this computation can be abandoned and the overestimate
computed and used in its place. Note that our algorithm can very easily compute
the overestimate by simply modifying Rule 1 so that element d2 is always generated
in place of element d1, thereby avoiding all backward-flow restrictions.
3.4 The Slicing Algorithm
This section presents the inverse form of the precise interprocedural logical
ripple effect algorithm, and the inverse form of the associated dataflow equations and
backward-flow restriction rules. Our algorithm for precise interprocedural slicing is
shown in Figure 3.5. The complexity and expected performance of this algorithm
are the same as for the precise interprocedural logical ripple effect algorithm given
previously.
For logical ripple effect, the dataflow problem solved at line 8 was reaching
definitions for a single definition. For slicing, which is the inverse problem, the
dataflow problem solved at line 8 will be reaching uses for a single use. In reaching
definitions, the definition flows in the direction of the arcs in the flowgraph, and is
killed by definitions of the same variable, and affects uses of the same variable and
any definitions directly dependent on an affected use. In reaching uses, the use flows
in the reverse direction of the arcs in the flowgraph, and is killed by definitions of the
same variable, and affects definitions of the same variable and any uses that directly
determine an affected definition. This reverse flow in the flowgraph means that the
dataflow equations solved at line 8 for the slicing algorithm must be an inverted form
of the dataflow equations that are used for the logical ripple effect algorithm. These
inverted dataflow equations are shown in Figure 3.6. The inverted rules that the
slicing algorithm uses for backward-flow restriction are given below. Notice that the
ALLOW and TRANSFORM sets will contain returns instead of calls.
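The reverse flow of a use can be seen in a much-simplified setting. The Python
sketch below handles only a single procedure: it ignores calls, returns, and the
backward-flow restriction rules, and so is not the reaching-uses equations of
this chapter. The function name reaching_use and the dictionary encoding of the
flowgraph are assumptions made for this illustration.

```python
def reaching_use(preds, defs, use_node, var):
    """Return the nodes whose definition of `var` is reached by the use.

    preds: dict mapping each node to a list of its predecessor nodes
    defs:  dict mapping a node to the variable it defines (absent if none)
    The use flows against the direction of the flowgraph arcs and is
    killed at the first definition of `var` met on each reverse path.
    """
    affected = set()
    visited = set()
    worklist = list(preds.get(use_node, []))
    while worklist:
        node = worklist.pop()
        if node in visited:
            continue
        visited.add(node)
        if defs.get(node) == var:
            affected.add(node)   # the use reaches this definition and stops
            continue
        worklist.extend(preds.get(node, []))
    return affected
```

For a flowgraph in which node 1 defines x, node 2 defines y, node 3 branches to
node 4 (which redefines x) or directly to the join node 5, and node 6 uses x,
the use at node 6 reaches the definitions at nodes 1 and 4.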

    -- Compute the slice for a hypothetical or actual use b
    -- Input: a program flowgraph ready for dataflow analysis
    -- Output: the slice in SLICE
    begin
1     SLICE ← ∅
2     for each use uu in the program
3       FINuu ← ⊥
      end for
4     stack ← ∅
5     push (b, ∅, ∅) onto stack
6     while stack ≠ ∅ do
7       pop stack into (u, ALLOW, TRANSFORM)
8       Solve the reaching-uses dataflow equations for the single use u,
        using Rules 1, 2, and 3.
9       SLICE ← SLICE ∪ {u}
10      for each definition d in the program that is affected by either u1 or u2
11        SLICE ← SLICE ∪ {d}
        end for
12      ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13      for each return node n in the flowgraph
14        if u1 ∈ Bin[n] ∧ u1 crossed from this return into the returned-from procedure
15          ROOT1 ← ROOT1 ∪ {the return node n}
          fi
16        if u1 ∈ Ein[n] ∧ u1 crossed from this return into the returned-from procedure
17          LINK1 ← LINK1 ∪ {the return node n}
          fi
18        if u2 ∈ Bin[n] ∧ u2 crossed from this return into the returned-from procedure
19          ROOT2 ← ROOT2 ∪ {the return node n}
          fi
20        if u2 ∈ Ein[n] ∧ u2 crossed from this return into the returned-from procedure
21          LINK2 ← LINK2 ∪ {the return node n}
          fi
        end for
Figure 3.5. The slicing algorithm.

22      for each use uu in the program that is affected by either u1 or u2
          -- determine Allow and Transform for uu by Theorem 1
23        if u2 ∈ Bout[node where uu occurs]
24          PATHS ← ∅, TRANS ← ∅
25          call Analyze
          else
            -- determine Allow and Transform for uu by Theorem 2
26          if u2 ∈ Eout[node where uu occurs]
27            PATHS ← ∅
28            PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x returns from the
                  procedure that contains uu ∨ x returns from a procedure
                  that contains a return r ∈ (PATHS ∩ LINK2))}
29            TRANS ← ROOT2 ∩ PATHS
30            call Analyze
            fi
            -- determine Allow and Transform for uu by Theorem 3
31          if u1 ∈ Bout[node where uu occurs]
32            PATHS ← ∅
33            PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that
                  contains uu ∨ x returns from a procedure that contains
                  a return r ∈ PATHS)}
34            TRANS ← TRANSFORM ∩ PATHS
35            call Analyze
            fi
            -- determine Allow and Transform for uu by Theorem 4
36          if u1 ∈ Eout[node where uu occurs]
37            for each procedure X that contains a return ∈ ROOT1
38              RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39              PP ← ∅
40              PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                    begins with a return ∈ RT1 and ends with a return from
                    the procedure that contains uu, such that each return
                    in this path is in (RT1 ∪ LINK1))}
41              if PP ≠ ∅
42                PATHS ← ∅
43                PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X
                      ∨ x returns from a procedure that contains
                      a return r ∈ PATHS)}
44                TRANS ← TRANSFORM ∩ PATHS
45                PATHS ← PATHS ∪ PP
46                call Analyze
                fi
              end for
            fi
          fi
        end for
      od
    end
Figure 3.5. - continued.

    Procedure Analyze
    begin
      -- avoid repetition of uu dataflow analysis if possible
47    if FINuu ≠ ⊤ ∧ (PATHS = ∅
          ∨ (true for all saved pairs for uu: PATHS ⊈ P ∨ TRANS ⊈ T))
48      if PATHS = ∅
49        FINuu ← ⊤
50        push (uu, ∅, ∅) onto stack
        else
51        save PATHS and TRANS as the pair P × T for uu
52        push (uu, PATHS, TRANS) onto stack
        fi
      fi
    end
Figure 3.5. - continued.
Rule 1. If ALLOW = ∅ then element u2 is generated at the node where use u
occurs, otherwise u1 is the generated element.
Rule 2. Let n be a call node, p be the associated return node, and q be the entry
node of the returned-from procedure. Each time the Bout[n] equation is computed, if
u1 ∈ Bin[q], then u1 cannot cross from Bin[q] into the Bout[n] set if p ∉ ALLOW.
Rule 3. Let n be a call node, p be the associated return node, and q be the entry
node of the returned-from procedure. Each time the Bout[n] equation is computed,
if u1 ∈ Bin[q], and, by C2 and Rule 2, u1 can cross from Bin[q] into the Bout[n] set,
and p ∈ TRANSFORM, then as this u1 element crosses from Bin[q] into the Bout[n]
set, the element is changed to u2. In effect, u1 is transformed into u2, and the call
node n becomes a generation node for the u2 element.
As the usefulness of slicing is primarily for program fault localization, it may
be desirable to modify the algorithm so that those uses in control predicates whose
subordinate statements have at least one use or definition already in the slice, are
themselves added to the slice and propagated in turn. An example of a control
predicate is the condition tested by an if statement. By subordinate statements is meant
those statements whose execution is decided by the control predicate. Including these
control-predicate uses in the slice is advantageous because the cause of a program
error may actually be in a control predicate that is not deciding correctly when to
execute its subordinate statements. Ferrante et al. [8] present a method to precisely
determine the control predicates for each statement.

For any node n.
    OUT[n] = Eout[n] ∪ Bout[n]
    IN[n] = Ein[n] ∪ Bin[n]
Group I: n is an exit node.
    Bout[n] = ∅
    Eout[n] = ⋃_{p ∈ succ(n)} {x | x ∈ IN[p] ∧ C1}
    Bin[n] = GEN[n]
    Ein[n] = Eout[n] ∪ RECODE[n]
Group II: n is a call node, p is the associated return node and q is the entry node of
the returned-from procedure.
    Bout[n] = {x | (x ∈ Bin[p] ∧ (¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Ein[q]))) ∨ (x ∈ Bin[q] ∧ C2)}
    Eout[n] = {x ∈ Ein[p] | ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Ein[q])}
    Bin[n] = (Bout[n] − KILL[n]) ∪ GEN[n]
    Ein[n] = Eout[n] − KILL[n]
Group III: n is not an exit or call node.
    Bout[n] = ⋃_{p ∈ succ(n)} Bin[p]
    Eout[n] = ⋃_{p ∈ succ(n)} Ein[p]
    Bin[n] = (Bout[n] − KILL[n]) ∪ GEN[n]
    Ein[n] = Eout[n] − KILL[n]
Figure 3.6. Dataflow equations for the reaching-uses problem.

CHAPTER 4
INTERPROCEDURAL PARALLELIZATION
4.1 Loop-Carried Data Dependence
This section explains loop-carried data dependence and its relevance to
parallelization. When a definition of a variable reaches a use of that variable, then a data
dependence exists such that the use depends on the definition. An example of data
dependence can be seen in Figure 4.1. The use of A(I) at line 3, and the use of A(I)
at line 4, both depend on the definition of A(I) at line 2. However, when considering
whether or not a loop can be parallelized, there is a special kind of data dependence
called loop-carried data dependence [25]. A data dependence is loop carried if the
value set by a definition inside the loop during loop iteration i can be used by a use
of that variable inside the loop during loop iteration j, where i ≠ j. Note that i ≠ j
is specified instead of the more restrictive and natural-seeming i < j, because if the
loop is parallelized then the ordering of the loop iterations cannot be assumed.
    1  DO I = 1,N
    2     A(I) = B(I) * C(I) + D
    3     B(I) = C(I) / D + A(I)
    4     IF C(I) < 0 THEN C(I) = A(I) * B(I) FI
       END DO
Figure 4.1. An example loop.
The relationship between loop-carried data dependence and parallelization is
straightforward. If there is at least one loop-carried data dependence, then the loop
cannot be parallelized, otherwise the loop can be parallelized. Loop parallelization
would mean that the ordering of the different iterations of the loop is unimportant,
whereas a loop-carried dependence means the opposite. If there are no loop-carried
data dependencies then there is no requirement that the iterations be ordered a
certain way. However, whenever a loop is parallelized, there should be a following,
added, serial step that sets the iteration variables, such as the I in Figure 4.1, to
whatever their values would be for the last iteration of the loop, assuming the loop
had not been parallelized. This added step would be necessary, assuming the iteration
variables of a loop are visible outside the loop and can therefore be referenced after
the loop completes. Iteration variables are those variables that are incremented or
decremented by a constant value for each loop iteration. The recognition of iteration
variables is language-dependent.
Regarding data dependence and arrays, there are several efficient tests available
that determine if a data dependence is possible between a particular definition and
use of an array. The tests are the separability test, the gcd test, and the Banerjee
test. Details of these three tests can be found in [25]. The number theory behind the
tests is linear diophantine equations. A linear diophantine equation can be formed
from the array subscripts of the definition and use in question. For example, in
Figure 4.2 we want to know if A(3 * I - 5) and A(6 * I) can ever refer to the
same array element. The linear diophantine equation that relates these two array
references would be 3x − 6y = 5. The question now becomes does this equation have
any integer solutions given the boundary conditions 30 ≤ x, y ≤ 100. If there is at
least one integer solution, then there would be a data dependence, otherwise there is
no data dependence, as is the case with Figure 4.2.
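The reasoning for Figure 4.2 can be checked with a few lines of code. The Python
sketch below applies the divisibility condition behind the gcd test to an
equation of the form a*x − b*y = c, and adds an exhaustive search over the loop
bounds as a cross-check. The function names are assumptions made for this
illustration, the exhaustive search is not one of the three tests named above,
and a and b are assumed not to both be zero.

```python
from math import gcd


def gcd_test(a, b, c):
    """Return True if a*x - b*y = c can have integer solutions.

    Loop bounds are ignored; a and b must not both be zero.
    """
    return c % gcd(a, b) == 0


def has_bounded_solution(a, b, c, lo, hi):
    """Exhaustively search for integers x, y in [lo, hi] with a*x - b*y = c."""
    return any(a * x - b * y == c
               for x in range(lo, hi + 1)
               for y in range(lo, hi + 1))
```

For the equation 3x − 6y = 5, the greatest common divisor of 3 and 6 is 3, which
does not divide 5, so gcd_test(3, 6, 5) is false and no data dependence is
possible regardless of the bounds. When the gcd test succeeds, the bounds still
have to be considered, which is the role of a test such as the Banerjee test.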
       DO I = 30,100
          A(3 * I - 5) = ...
          ... = A(6 * I)
       END DO
Figure 4.2. A loop with array references.
For the discussion that follows, we define the term loop body. The loop body
of any loop L will be all statements in the program that can possibly be executed
during the iterations of loop L. Calls are allowed in a loop, so a single loop body
could conceivably include the statements of many different procedures. For example,
if a loop contains a call of procedure A, and procedure A contains a call of procedure
B, then the loop body would include all the statements of procedures A and B. In
Figure 4.1, the loop body is the four statements at lines 1 through 4.
With respect to the program flowgraph, the loop body is all flowgraph nodes
that may be traversed during the iterations of the loop. Let LB be the set of flowgraph
nodes that are in the loop body of loop L. Let n be the first node in the loop body
that is traversed during each iteration of the loop. The identification of node n is
language-dependent. Within the loop body of L, let definition d be a definition of
a non-array variable v, and let use u be a use of the variable v that is reached by
definition d. Let node d be the node in the loop body where definition d occurs, and let
node u be the node in the loop body where the use u occurs. To avoid the complications
posed by special cases, we assume that d, n, and u are separate and distinct nodes.
Although use u depends on definition d because definition d reaches use u,
this data dependence can prevent parallelization of loop L only if the dependence is
loop-carried. Let P be a sequence of flowgraph nodes drawn from LB, such that P
represents a possible execution path along which definition d can reach use u. For
definition d to be loop-carried to use u along path P, the three nodes, d, n, and u,
must be in P, and in that order, because only the traversal of node n represents the
transition to a different iteration of the loop. If v is an array, then we assume that
definition d and use u may refer to different array elements during the same iteration.
For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must
be assumed to show a loop-carried data dependence when v is an array, whereas this
path P does not show a loop-carried data dependence if definition d and use u always
refer to the same storage location during any iteration, as we assume is the case when
v is a non-array, because in any iteration that follows such a path P, the value used
at use u is always the value defined at definition d in that same iteration.
4.2 The Parallelization Algorithm
This section presents in Figure 4.3 an algorithm that identifies loops that can be
parallelized, including loops that contain calls. The algorithm uses our interprocedural
dataflow analysis method as an integral step to determine data dependencies. The
loops that can be parallelized are those loops that are not marked by the algorithm
as inhibited.
The algorithm has three distinct steps. First, the reaching-definitions dataflow
problem is solved for the input program by using our interprocedural dataflow analysis
method. Second, the quality of the reaching-definition information computed by the
first step is possibly improved in the case of array references by using the separability,
gcd, and Banerjee tests. Third, individual d,u pairs that represent data dependence
are examined for loop-carried data dependence.
At line 7, the definitions and uses of iteration variables are excluded from testing
for loop-carried data dependence, because for any iteration the iteration variables will
have constant values that can be precomputed if loop L is parallelized. The test at
line 8 is a necessary condition for the P-test procedure to return a T, which is tested
for at line 9. The test at line 8 is done as an economy measure to avoid, when
possible, the more costly P-test.
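The filtering order of step 3 (lines 6 through 10 of Figure 4.3) can be sketched in Python as follows. This is a sketch under assumptions, not the implementation: loops are assumed to be dictionaries with hypothetical keys "name", "n", "body", and "iv"; each reaching d,u pair is assumed to be a tuple (x, d, u); `b_out` is assumed to map a node to the elements of its Bout set; and the path test is passed in as a callable so that the sketch stays self-contained.

```python
def find_inhibited_loops(loops, pairs, b_out, p_test):
    """Return {loop name: [(d, u), ...]} for loops with a loop-carried pair.

    loops  : list of dicts with keys "name", "n", "body", "iv" (assumed)
    pairs  : list of (x, d, u) tuples for the reaching d,u pairs
    b_out  : dict mapping a node to the set of elements in its Bout set
    p_test : callable(x, n, d, u, loop) -> bool, the costly path test
    """
    inhibited = {}
    for loop in loops:
        n, body, iv = loop["n"], loop["body"], loop["iv"]
        for x, d, u in pairs:
            # Line 7: both nodes must lie in the loop body, and
            # definitions of iteration variables are excluded.
            if d not in body or u not in body or x in iv:
                continue
            # Line 8: cheap necessary condition, checked first.
            if x not in b_out.get(n, set()):
                continue
            # Line 9: the costly path test runs only for survivors.
            if p_test(x, n, d, u, loop):
                inhibited.setdefault(loop["name"], []).append((d, u))
    return inhibited
```

Ordering the checks this way means the path test is called only for pairs that already pass the cheaper set-membership checks, which is the economy measure described above.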
Procedure P-test uses a straightforward algorithm that begins with node d and
then spreads out examining successors, successors of successors, and so on, until either
there are no more acceptable nodes to examine, in which case F is returned, or all the
requirements for path P have been met, in which case T is returned. The successors

     -- a d, u pair is a definition d that reaches a use u
     -- x is the dataflow element that represents the definition d
     -- v is the variable referenced by definition d and use u
     -- to avoid complications, n ≠ d ≠ u is assumed
     -- n is the first node traversed during each loop L iteration
     -- d is the node whose basic block contains definition d
     -- u is the node whose basic block contains use u
     -- LB is the set of nodes in the loop body of loop L
     -- IV is the set of definitions of iteration variables for loop L
     begin
     -- step 1, determine reaching definitions for the input program
 1   use our method to solve the reaching-definitions dataflow problem
     -- step 2, improve the reaching-definition information for array references
 2   for all d, u pairs in the program, such that v is an array
 3     use the separability, gcd, and Banerjee tests as applicable
 4     if definition d and use u can never reference the same element
 5       mark the d, u pair as non-reaching
       fi
     end for
     -- step 3, identify d, u pairs that inhibit parallelization
 6   for each loop L in the program
 7     for each reaching d, u pair such that d, u ∈ LB and definition d ∉ IV
 8       if x ∈ Bout[n]
 9         if P-test(x, n, d, u, L, LB) = T
10           mark L parallelization as inhibited by the d, u pair
           fi
         fi
       end for
     end for
     end
Figure 4.3. The parallelization algorithm.

     procedure P-test(x, n, d, u, L, LB)
     -- is there a loop-carried data dependence from definition d to use u thru node n
     -- return T if yes, F if no
     begin
     -- part1, is there a path from d to n along which x is found
11   if v is an array
12     DONE ← {d}
     else
13     DONE ← {d, u}
     fi
14   NEXT ← {d}
15   until NEXT = ∅
16     remove a node from NEXT, denote it p
17     for each successor node s of node p, such that s ∉ DONE
18       DONE ← DONE ∪ {s}
19       if s ∉ LB
            ∨ s is an entry node
            ∨ x ∉ Bout[s]
20         ignore s
21       else if s = n
22         goto part2
         else
23         NEXT ← NEXT ∪ {s}
         fi
       end for
     end until
24   return F
Figure 4.3. - continued.

     part2:
     -- part2, is there a path from n to u along which x is found
25   if v is an array
26     DONE ← {n}
     else
27     DONE ← {n, d}
     fi
28   NEXT ← {n}
29   until NEXT = ∅
30     remove a node from NEXT, denote it p
31     for each successor node s of node p, such that s ∉ DONE
32       DONE ← DONE ∪ {s}
33       if s ∉ LB
            ∨ s is an exit node
            ∨ (s is contained in the same procedure that contains L ∧ x ∉ Bout[s])
            ∨ (s is not contained in the same procedure that contains L ∧ x ∉ Eout[s])
34         ignore s
35       else if s = u
36         return T
         else
37         NEXT ← NEXT ∪ {s}
         fi
       end for
     end until
38   return F
     end
Figure 4.3. - continued.

of a node are examined because normally a successor node is assumed to represent a
possible continuation of the execution path from the point of the predecessor node.
Exceptions in the algorithm involving entry and exit nodes are explained shortly.
Note that P-test only determines whether a satisfactory path P exists or not; it does
not determine what path P is in terms of an actual node sequence, as there may be
many such satisfactory paths P. Lines 13 and 27 are active when v is not an array.
In this case, a path P that includes d, u, n, d, u, in that order, is not allowed, and
this is prevented by marking the unwanted node u at line 13, and the unwanted node
d at line 27.
The test of x ∉ Bout[s] at line 19 satisfies the requirement that the definition
d can reach along the path P. A similar test is made at line 33. At line 19, only
the B set is checked because there are no descents into called procedures, as per
the rejection of entry nodes at line 19. Entry nodes are rejected at line 19 because
any path from d to n will not leave unreturned calls, because n is an outermost node
relative to the loop body, and the path is confined to the loop body. As the successors
of each call node are an entry node and a return node, it is only necessary to check
the out set of the return node to know whether the element x survived the call or
not, and this is effectively done by the x ∉ Bout[s] test already mentioned. At line 33,
exit nodes are rejected because any path from n to u will not make a return without
first making the call. This follows from the fact, already mentioned, that node n is
an outermost node relative to the loop body, and the path is confined to the loop
body. As the return node can always be added to the path P from the call node,
there is no need to add it from the exit node, hence the rejection of the exit node.
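The two searches of procedure P-test can be sketched in Python as a pair of worklist traversals. This is a minimal sketch under assumptions, not the implementation: the flowgraph is assumed to be a dictionary `succ` from a node to its successor nodes, `same_proc` is assumed to be the set of nodes in the procedure that contains loop L, `b_out` maps a node to its Bout set, and `other_out` is a hypothetical stand-in for the out set that line 33 consults for nodes outside that procedure. All parameter names are illustrative.

```python
from collections import deque


def p_test(x, n, d, u, succ, loop_body, is_array,
           entry_nodes, exit_nodes, same_proc, b_out, other_out):
    """Return True if some path d -> n -> u in the loop body carries x."""

    def search(start, goal, pre_done, reject):
        # Worklist search; each node enters the worklist at most once.
        done = set(pre_done)
        work = deque([start])
        while work:
            p = work.popleft()
            for s in succ.get(p, ()):
                if s in done:
                    continue
                done.add(s)
                if s not in loop_body or reject(s):
                    continue  # "ignore s"
                if s == goal:
                    return True
                work.append(s)
        return False

    # part1: path from d to n. Entry nodes are rejected, and only
    # the Bout set is checked (line 19).
    def reject_part1(s):
        return s in entry_nodes or x not in b_out.get(s, set())

    # For a non-array v, node u is pre-marked as done (line 13).
    done1 = {d} if is_array else {d, u}
    if not search(d, n, done1, reject_part1):
        return False

    # part2: path from n to u. Exit nodes are rejected, and the set
    # that is checked depends on the procedure containing s (line 33).
    def reject_part2(s):
        if s in exit_nodes:
            return True
        out = b_out if s in same_proc else other_out
        return x not in out.get(s, set())

    # For a non-array v, node d is pre-marked as done (line 27).
    done2 = {n} if is_array else {n, d}
    return search(n, u, done2, reject_part2)
```

For a three-node loop n, d, u with a back edge from u to n, this sketch returns True when v is treated as an array and False when v is a non-array, which corresponds to the d, u, n, d, u path discussed above.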
For part1 and part2 in procedure P-test, each flowgraph node may appear only
once in the NEXT set, hence the complexity of the P-test procedure is O(n) where n
is the number of flowgraph nodes. For the entire algorithm, step 3 dominates, so the
complexity is O(lpn) where l is the number of loops in the program, p is the number
of d,u pairs in the program, and n is the number of flowgraph nodes.

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH
5.1 Summary of Main Results
The first part of this work presented a new method for context-dependent, flow-
sensitive interprocedural dataflow analysis. The method was shown to produce a
precise, low-cost solution for such fundamental and important problems as reaching
definitions and available expressions, regardless of the actual call structure of the
program being analyzed. By using a separate set to isolate calling-context effects,
and another set to accumulate body effects, the calling-context problem has been
reduced to the problem of solving the dataflow equations that compute the different
sets. These equations can be solved by the iterative algorithm. As part of our
method, the interprocedural kill effects of call-by-reference formal parameters are
correctly handled by the equations-compatible technique of element recoding.
The importance of our interprocedural analysis method lies in the fact that
a number of different applications depend on the solution of fundamental dataflow
problems such as reaching definitions, live variables, definition-use and use-definition
chains, and available expressions. Program revalidation, dataflow anomaly detection,
compiler optimization, automatic vectorization and parallelization, and software tools
that make a program more understandable by revealing data dependencies, are some
of the applications that may benefit by using our method.
The second part of this work presented new algorithms for precise interprocedural
logical ripple effect and slicing. The algorithms use our interprocedural dataflow
analysis method, and add a control mechanism by which, in effect, execution-path
history can affect execution-path continuation as the ripple effect or slice is built
piece by piece.
The importance of our algorithms for precise interprocedural logical ripple effect
and slicing lies in their applicability to the areas of software maintenance and debugging.
A precise interprocedural logical ripple effect can be used to show a programmer
the consequences of program changes, thereby reducing errors and maintenance cost.
Similarly, a precise interprocedural slice can localize program faults, thereby saving
programmer effort and debugging cost.
The third part of this work presented an algorithm that identifies loops that
can be parallelized, including loops that contain calls. The algorithm makes use of
our interprocedural dataflow analysis method to determine data dependencies, and
then the algorithm examines the data dependencies within each loop and determines
if any of these data dependencies are loop-carried, in which case parallelization of the
loop is inhibited. The algorithm has potential use in parallelization tools.
5.2 Directions for Future Research
There are several topics of possible future research related to our method for
interprocedural dataflow analysis. Regarding solving the equations, besides the iterative
algorithm there are elimination algorithms [20] that have better complexity.
Further studies are needed to determine to what extent these other algorithms can
be used to solve the equations. Another topic regards the dataflow problems that can
be solved by our method, as the actual universe of solvable problems remains to be
determined. We have only mentioned a few of the better known problems. For some
dataflow problems, it may be that our method can be used after suitable modification
to adapt it to the special needs of the problem.
Regarding possible future research related to our algorithms for precise interprocedural
logical ripple effect and slicing, because the algorithms may overestimate
when recursive calls are present, or because the Allow set lacks the information needed
to enforce the ordering of unmatched returns, one area of future research would be
to investigate the possibility of modifying Definition 1, Theorems 1 through 4, and
the algorithms, so as to remove the possibility of such overestimation.

REFERENCES
[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the SIGPLAN
90 Conference on Programming Language Design and Implementation.
ACM SIGPLAN Notices, 25, 6 (June 1990), 246-256.
[2] Aho, A., Sethi, R., and Ullman, J. Compilers, Principles, Techniques and Tools.
Addison-Wesley, Reading, MA (1986).
[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress
1974, North Holland, Amsterdam (1974), 398-402.
[4] Banning, J. An efficient way to find the side effects of procedure calls and the
aliases of variables. Conference Record of the 6th ACM Symposium on Principles
of Programming Languages, ACM, New York (Jan. 1979), 29-41.
[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and parallelization.
Proceedings of the SIGPLAN 86 Symposium on Compiler Construction,
162-175.
[6] Callahan, D. The program summary graph and flow-sensitive interprocedural
data flow analysis. Proceedings of the SIGPLAN 88 Conference on Programming
Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July
1988), 47-56.
[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time.
Proceedings of the SIGPLAN 88 Conference on Programming Language Design
and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 57-66.
[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph
and its use in optimization. ACM Transactions on Programming Languages and
Systems, 9, 2 (1987), 319-349.
[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use
dependencies. Proceedings of the IEEE Computer Society 1990 Int’l Conference
on Computer Languages, New Orleans, LA (March 1990).
[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New
York (1977).
[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence
graphs. ACM Transactions on Programming Languages and Systems, 12, 1 (Jan.
1990), 26-60.
[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures.
Proceedings of the IEEE COMPSAC 88 (Oct. 1988), 220-227.
[13] Johmann, K., Liu, S., and Yau, S. Dataflow Equations for Context-Dependent
Flow-Sensitive Interprocedural Analysis. SERC-TR-45-F, Department of Computer
and Information Sciences, University of Florida, Gainesville (Jan. 1991).
[14] Korel, B., and Laski, J. Dynamic program slicing. Information Processing Letters,
29, 3 (Oct. 1988), 155-163.
[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification.
Conference Record of the 18th ACM Symposium on Principles of Programming
Languages, ACM, New York (1991), 93-103.
[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions
on Software Engineering, SE-13, 12 (Dec. 1987), 1370-1371.
[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference
Record of the 8th ACM Symposium on Principles of Programming Languages,
ACM, New York (1981), 219-230.
[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental
results. Software—Practice and Experience, 19, 2 (1989), 149-169.
[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM, 26,
2 (April 1979), 322-344.
[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM
Computing Surveys, 18, 3 (Sep. 1986), 277-316.
[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis.
Muchnick, S., and Jones, N. Eds. Program Flow Analysis: Theory and Applications,
Prentice-Hall, Englewood Cliffs, NJ (1981), 189-232.
[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call statements.
Proceedings of the SIGPLAN 86 Symposium on Compiler Construction,
176-185.
[23] Weiser, M. Programmers use slices when debugging. Communications of the
ACM, 25, 7 (July 1982), 446-452.
[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering, SE-10,
4 (July 1984), 352-357.
[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers.
Addison-Wesley, Reading, MA (1990).

BIOGRAPHICAL SKETCH
Kurt Johmann was born in Elizabeth, New Jersey, on November 16, 1955. In
1978 he received a B.A. in computer science from Rutgers University in New Jersey.
Following graduation, he worked for a shipping company, Sea-Land Service Inc., as
a programmer and systems analyst. In 1985 he left Sea-Land and did PC work for
three years. Following this, he entered the graduate program of the Computer and
Information Sciences Department at the University of Florida in the Fall of 1988.
He received an M.S. in computer science, December 1989, and entered the Ph.D.
program. Anticipating graduation, he hopes to find a job in academia.

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Stephen S. Yau, Chairman
Professor of Computer and
Information Sciences
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Richard Newman-Wolfe, Cochairman
Assistant Professor of
Computer and Information Sciences
I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
Paul Fishwick
Associate Professor of
Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to
acceptable standards of scholarly presentation and is fully adequate, in scope
and quality, as a dissertation for the degree of Doctor of Philosophy.
This dissertation was submitted to the Graduate Faculty of the College
of Engineering and to the Graduate School and was accepted as partial fulfillment
of the requirements for the degree of Doctor of Philosophy.
May, 1992
Winfred M. Phillips
Dean, College of Engineering
Madelyn M. Lockhart
Dean, Graduate School
