CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL DATAFLOW ANALYSIS AND ITS APPLICATION TO SLICING AND PARALLELIZATION

By

KURT JOHMANN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 1992

ACKNOWLEDGEMENTS

I would like to express my appreciation and gratitude to my chairman and advisor, Dr. Stephen S. Yau, for his careful guidance and generous support during this study. I also would like to express my appreciation and gratitude to my previous advisor, Dr. Sying-Syang Liu. Without their supervision and counsel, this work would not have been possible. To Dr. Paul Fishwick, Dr. Richard Newman-Wolfe, and Dr. Mark Yang, members of the supervisory committee, go my thankfulness for their service. Finally, I want to thank the Software Engineering Research Center (SERC) for providing financial support during this study.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
CHAPTERS
1 INTRODUCTION
  1.1 Interprocedural Dataflow Analysis
  1.2 Slicing and Logical Ripple Effect
  1.3 Parallelization
  1.4 Literature Review
  1.5 Outline in Brief
2 THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD
  2.1 Constructing the Flowgraph
  2.2 Interprocedural Forward-Flow-Or Analysis
    2.2.1 The Dataflow Equations
    2.2.2 Element Recoding for Aliases
    2.2.3 Implicit Definitions Due to Calls
  2.3 Interprocedural Forward-Flow-And Analysis
  2.4 Interprocedural Backward-Flow Analysis
  2.5 Complexity of Our Interprocedural Analysis Method
  2.6 Experimental Results
3 INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT
  3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect
  3.2 The Logical Ripple Effect Algorithm
  3.3 A Prototype Demonstrates the Algorithm
  3.4 The Slicing Algorithm
4 INTERPROCEDURAL PARALLELIZATION
  4.1 Loop-Carried Data Dependence
  4.2 The Parallelization Algorithm
5 CONCLUSIONS AND FUTURE RESEARCH
  5.1 Summary of Main Results
  5.2 Directions for Future Research
REFERENCES
BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL DATAFLOW ANALYSIS AND ITS APPLICATION TO SLICING AND PARALLELIZATION

By Kurt Johmann

May 1992

Chairman: Dr. Stephen S. Yau
Major Department: Computer and Information Sciences

Interprocedural dataflow analysis is important in compiler optimization, automatic vectorization and parallelization, program revalidation, dataflow anomaly detection, and software tools that make a program more understandable by showing data dependencies. These applications require the solution of dataflow problems such as reaching definitions, live variables, available expressions, and definition-use chains. When solving these problems interprocedurally, the context of each call must be taken into account. In this dissertation we present a method to solve this kind of dataflow problem precisely. The method consists of special dataflow equations that are solved for a program flowgraph. Regarding calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure.
The body set contains effects that result from statements in the procedure. By isolating calling-context effects in the entry set, a call's non-killed calling context is preserved by means of a simple intersection operation done at the return node for the call.

Slicing determines program pieces that can affect a value. Logical ripple effect determines program pieces that can be affected by a value. Both slicing and logical ripple effect are useful for software maintenance. The problems of slicing and logical ripple effect are inverses of each other, and a solution of either problem can be inverted to solve the other. Precise interprocedural logical ripple effect analysis is complicated by the fact that an element may be in the ripple effect by virtue of one or more specific execution paths. In this dissertation we present an algorithm that builds a precise logical ripple effect or slice piece by piece, taking into account the possible execution paths. The algorithm makes use of our interprocedural dataflow analysis method, and this method is also used in an algorithm given in this dissertation for identifying loops that can be parallelized.

CHAPTER 1
INTRODUCTION

1.1 Interprocedural Dataflow Analysis

Dataflow analysis refers to a class of problems that ask about the relationships that exist along a program's possible execution paths, between such program elements as variables, constants, and expressions [2, 10]. When dataflow analysis is done for a program by treating its individual procedures as being independent of each other, regardless of the calls made, this is known as intraprocedural analysis. For intraprocedural analysis, assumptions must be made about the effects of calls. By contrast, interprocedural analysis replaces assumptions with specific information about the effects of each call. This information can be gathered by either flow-sensitive [3, 6, 9, 17, 19, 21] or flow-insensitive [4, 7, 18] analysis.
When answering a dataflow question, a flow-sensitive analysis will take into account the flow paths within procedures, whereas a flow-insensitive analysis ignores these flow paths. The flow paths are the possible execution paths. Flow-sensitive analysis typically provides more precise information, but at greater cost.

Flow-sensitive interprocedural dataflow analysis has two major problems that make it significantly harder than intraprocedural analysis. First, in intraprocedural analysis, it is assumed that any path in the flowgraph is a possible execution path. By contrast, for interprocedural analysis, it is useful to assume that the possible execution paths conform to the rule that once a procedure is entered by a call, the flow returns to that call upon return. Thus, the set of possible execution paths will typically be a proper subset of the paths in the program flowgraph. This problem will be referred to as the calling-context problem. Second, call-by-reference formal parameters typically cause alias relationships between actual and formal parameters that are valid only for certain calls and apply only to those passes through the called procedure that originate from those calls that establish the specific alias relationship.

There are many applications for a flow-sensitive interprocedural dataflow analysis method that solves the two major problems, assuming that the costs of the method are not too high. Some of the well-known dataflow problems that can be precisely solved by such a method are reaching definitions, live variables, the related problems of definition-use and use-definition chains, and available expressions. Applications that require the solution of one or more of these dataflow problems include compiler optimization, automatic vectorization and parallelization of program code, program revalidation, dataflow anomaly detection, and software tools that show data dependencies.
In this dissertation we present a new method for flow-sensitive interprocedural dataflow analysis that solves the two major problems, and does so at a comparatively low cost [13]. The method consists of special dataflow equations that are solved for a program flowgraph. To handle calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure. The body set contains effects that result from statements in the procedure. By isolating calling-context effects in the entry set, a call's non-killed calling context is preserved by means of a simple intersection operation done at the return node for the call. The main advantages of our method are its low complexity and the fact that the presence of recursion does not affect the preciseness of the result.

The language model assumed for Chapter 2 allows global variables, but the visibility of each formal parameter is limited to the single procedure that declares it. Thus, with the exception of a call and its indirect reference, each formal parameter can only be referenced inside a single procedure. Examples of programming languages that fit this model are C and FORTRAN. This restriction on the visibility of formal parameters is imposed for the sake of the discussions of element recoding in Sections 2.2.2 and 2.3, of implicit definitions in Section 2.2.3, and of worst-case complexity in Section 2.5. Our method can also be used for the alternative language model that allows each formal parameter to have visibility in more than a single procedure, but this is considered only briefly at the end of Section 2.5.

1.2 Slicing and Logical Ripple Effect

Given an actual or hypothetical variable v at program point p, determine all program pieces that can possibly be affected by the value of v at p. This is the logical ripple effect problem.
Given v and p, determine all program pieces that can possibly affect the value of v at p. This is the slicing problem. For these two problems, each problem is the inverse of the other, and a solution for one of these problems, once inverted, would be a solution for the other problem. Logical ripple effect is useful for helping a programmer to understand how a program change, either actual or hypothetical, will impact that program. Making program changes as part of routine maintenance often introduces new errors into the changed program. Such errors typically result because the programmer overlooked some part of the logical ripple effect for that change. By showing a programmer what the logical ripple effect actually is for a program change, mistakes can be avoided. Slicing is primarily useful for program fault localization [23]. If a variable v at point p is known to have a wrong value, then a slice on v at p will narrow the search for the cause of the error to that part of the program that can truly affect v at p. Thus, the fault is localized. The more precise the slice, the more localized the cause of the error, saving programmer time. In this dissertation we are concerned only with static logical ripple effect and slicing [11, 12, 16, 24] where the ripple effect or slice is determined from dataflow analysis of the program text. The alternative approach is dynamic logical ripple effect and slicing [1, 14] where the ripple effect or slice is determined by actually executing the program. Whenever we speak of execution paths in Chapter 3, we always mean possible execution paths as determined by dataflow analysis. Precise interprocedural logical ripple effect analysis is complicated by the fact that a definition may be added to the ripple effect because of one or more specific execution paths. 
To determine in turn the ripple effect of that added definition, that definition should be constrained to those execution paths that are the possible continuations of the execution paths along which that definition was itself affected and thereby added to the ripple effect. We refer to this as the execution-path problem. In particular, it is those call instances made in an execution path P that have not been returned to in P that cause the difficulty. This is because of the rule that a called procedure returns to its most recent caller. This means that any continuation of the execution path P must first return to those unreturned calls in P before returns can possibly be made to call instances that precede P. An example will illustrate the problem.

procedure main        procedure B          procedure A
begin                 begin                begin
1: f := 7             6: y := f + 5        7: call B
2: call A             end                  8: x := y
3: z := x                                  end
4: f := 1
5: call B
end

For the example, assume that all variables are global, and that the problem is to determine the logical ripple effect for the definition of variable f at line 4. The call to procedure B at line 5 allows the definition of f at line 4 to affect the definition of y at line 6, and the return of procedure B would be to the call at line 5 by which the definition of y at line 6 was affected. The end result is that the ripple effect should include only line 6. However, assume that the execution-path problem is ignored and all returns are possible when the ripple effect is computed. For the same problem, the call at line 5 allows the definition of f at line 4 to affect the definition of y at line 6. Then the definition of y at line 6 affects the definition of x at line 8 by procedure B returning to the call at line 7 in addition to the call at line 5. Then the definition of x at line 8 affects the definition of z at line 3 by procedure A returning to the call at line 2. The end result is a ripple effect that includes lines 3, 6, and 8, but only line 6 should be in the ripple effect.
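The difference between the two outcomes in the example above can be reproduced with a small sketch of ours (not the dissertation's algorithm). Dependence edges between the example's line numbers are labeled with the call a propagation enters through ("push") or the call it must return to ("pop"); a "pop" edge is valid only if it returns to the most recent unreturned call.

```python
# Dependence edges for the 8-line example; labels are hypothetical
# annotations of ours, derived from the narrative above.
EDGES = [
    (4, 6, ("push", 5)),  # f@4 reaches y@6 via the call to B at line 5
    (6, 8, ("pop", 7)),   # y@6 reaches x@8 only if B returns to the call at line 7
    (8, 3, ("pop", 2)),   # x@8 reaches z@3 only if A returns to the call at line 2
]

def ripple(start, match_calls):
    """Collect lines in the ripple effect of `start`. With match_calls
    True, a "pop" edge is taken only when it returns to the most recent
    unreturned call; with False, every return is allowed (the naive,
    imprecise analysis described above)."""
    reached = set()
    def dfs(node, stack):
        for src, dst, (kind, call) in EDGES:
            if src != node or dst in reached:
                continue
            if kind == "push":
                new_stack = stack + [call]
            elif match_calls and (not stack or stack[-1] != call):
                continue  # return would not go to this call: invalid path
            else:
                new_stack = stack[:-1]
            reached.add(dst)
            dfs(dst, new_stack)
    dfs(start, [])
    return reached

print(sorted(ripple(4, match_calls=True)))   # [6]: only line 6, as argued above
print(sorted(ripple(4, match_calls=False)))  # [3, 6, 8]: the overestimate
```

With call matching enforced, the propagation from line 6 is blocked because procedure B must return to the call at line 5, not line 7; ignoring the matching yields the overestimated ripple effect of lines 3, 6, and 8.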
Although there are a number of papers on logical ripple effect and slicing [11, 12, 16, 24], there appears to be only one [11] that addresses the problems of precise interprocedural logical ripple effect and slicing, and presents a method for it. Weiser [24] was the first to propose an interprocedural slicing method that ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] address the problem of precise interprocedural slicing, and present a method to construct a system dependence graph from which slices can be extracted.

In this dissertation we present an algorithm that builds the logical ripple effect piece by piece, and takes into account the restrictions on execution-path continuation that are imposed by the preceding execution paths up to the point by which the given program piece is affected and thereby included in the ripple effect. In general, the algorithm computes a precise logical ripple effect, but some overestimation is possible, meaning that the computed logical ripple effect may be larger than the true ripple effect. An inverse form of the algorithm is presented for the slicing problem. The languages that our algorithm will work for include many of the common procedural languages such as C, Pascal, Ada, and Fortran.

1.3 Parallelization

Automatic conversion of a sequential program into a parallel program is often referred to as parallelization. Parallelization problems are typically concerned with the conversion of sequential loops into parallel code. In this dissertation, the specific problem considered is the identification of loops in a program that can be parallelized, including those loops that contain calls. A flow-sensitive interprocedural dataflow analysis method has specific applicability to the problem of parallelizing loops that contain calls, because such a method can supply the precise data-dependency information that would be necessary for the parallelization analysis.
The parallelization of a loop would mean that each iteration of the loop can be executed independently of the other iterations of the loop. In theory, this would mean that each single iteration, or each arbitrary block of iterations, can be assigned to a separate processor in a parallel machine. The specific architecture of a particular parallel machine, the programming language to be parallelized, and the various loop transformations that can convert sequential loop code into functionally equivalent sequential code that is more parallelizable will all influence the determination in any parallelization tool as to what loops can actually be parallelized, and how they would be parallelized. However, none of the architecture, language, and loop-transformation issues will be considered here. Instead, the problem will be considered solely from the standpoint of data dependence. After a brief review of the basics regarding data dependence and parallelization, an algorithm is given that identifies loops in a program that can be parallelized, and this algorithm uses our interprocedural dataflow analysis method as an integral part.

The potential value of parallelization is clear. On the one hand, parallel machines are becoming more common, and on the other hand, a great number of sequential programs already exist, some of which can benefit from the greater processing power that parallelization would offer.

1.4 Literature Review

Different methods have been offered for solving various flow-sensitive interprocedural dataflow analysis problems. Sharir and Pnueli [21] present a method they name call-strings. The essential idea of their method is to accumulate for each element a history of the calls traversed by that element as it flows through the program flowgraph. The call history associated with an element is used whenever that element is at a return point. The element can only cross back to those calls in its call history.
Thus, the call-strings approach provides a solution to the calling-context problem. However, the disadvantage of this approach is the time and space needed to maintain a call history for each element at each flowgraph node. Let l be the program size. We assume that the number of elements will be a linear function of l. The worst-case number of total set operations required by the call-strings approach would be greater by a factor of l when compared to our method. This is because for each union or intersection of two sets of elements, if the same element is in both sets, then a union operation must also be done for the two associated call histories so as to get the new call history to be associated with that element at the node for which the set operation is being done. A further disadvantage of the call-strings approach is the need to include the associated call histories when set stability is tested to determine termination for the iterative algorithm used to solve the dataflow equations. Myers [17] offers a solution to the calling-context problem that is essentially the same as call-strings.

Allen [3] presents a different method for interprocedural dataflow analysis. The method analyzes each procedure completely, in reverse invocation order. The first procedures to be analyzed would be those that make no calls, then the procedures that only call these procedures would be analyzed, and so on. Once a procedure is analyzed, its effects can be incorporated into those procedures that call it, when they in turn are analyzed. The obvious drawback of this method is that it cannot be used to analyze recursive calls.

Rosen [19] presents a complex method for interprocedural dataflow analysis that is limited to solving the problems of variable modification, preservation, and use. These dataflow problems do not require a solution of the calling-context problem.
Callahan [6] has proposed the program summary graph to solve the interprocedural dataflow problems of kill and use, where kill determines all definite kills that result from a procedure call, and use determines all variables that may be used as a result of a procedure call before being redefined. As part of the determination of edges in the program summary graph, intraprocedural reaching-definitions analysis must be done for each procedure. Simplifying Callahan's space complexity analysis, we get O(v_ga · l) as the worst-case size of the program summary graph, where v_ga is the number of global variables in the program plus the average number of actual parameters per call, and l is the program size. One limitation of Callahan's method is that it does not correctly handle multiple aliases that result when the same variable is used multiple times as an actual parameter in the same call and the corresponding formal parameters are call-by-reference. By contrast, our method, using element recoding where all the aliases are encoded in a single element, will correctly handle the multiple-aliases problem. Callahan's method offers no solution to the calling-context problem, and could not be used to determine, for example, interprocedural reaching definitions.

However, Harrold and Soffa [9] have extended his method so that interprocedural reaching definitions can be determined. They use an interprocedural flowgraph, denoted IFG, that is very similar to the program summary graph. The IFG has interreaching edges that are determined by solving Callahan's kill problem. They recommend using his method, so their method inherits Callahan's space and time complexity, as well as its limitation with regard to multiple aliases. Before the IFG can be used, it must be decorated with the results of intraprocedural analysis done twice for each procedure to determine both reaching definitions and upwardly exposed uses.
Then an algorithm is used to propagate the upwardly exposed uses throughout the IFG. This algorithm has worst-case time complexity of O(n²), where n is the number of nodes in the IFG. Their graph will have the same number of nodes as Callahan's graph, meaning worst-case graph size will be O(v_ga · l). Substituting v_ga · l for n, we get a worst-case time complexity of O(v_ga² · l²). As the size of our flowgraph is proportional to the size of the program, the worst-case time complexity for solving our equations is only O(l²).

Weiser [24] was the first to propose an interprocedural slicing method that ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] have presented a method to compute the more precise slice explained in the Introduction. However, they use a more restricted definition of a slice. Their slice is all statements and predicates that may affect a variable v at program point p, such that v is defined or used at point p. Their method consists of constructing a specialized graph called a system dependence graph. Nodes in this graph represent program pieces such as statements, and the edges in the graph represent control or data dependencies. Edges representing transitive data dependencies that are due to procedure calls are computed by first modeling each procedure and its calls with an attribute grammar called a linkage grammar, and then solving the grammar so as to determine the transitive data dependencies represented by it. Once the system dependence graph is complete, any slice based on an actual definition or use occurring at any point p in the program can be extracted from the graph. A major weakness of their method is that it does not allow a hypothetical use to be the starting point of the slice.
The complexity of constructing the system dependence graph is given as O(G · X² · D²), where G is the total number of procedures and calls in the program, X is the total number of global variables in the program plus a term that can be considered a constant, and D is a linear function of X. Once the system dependence graph is complete, any particular slice that is wanted can be extracted from the graph at complexity O(n), where n is the size of the graph. The size of the graph is roughly quadratic with program size, being bounded by O(P · (V + E) + T · X), where P is the number of procedures, V is the largest number of predicates and definitions in a single procedure, E is the largest number of edges in a procedure dependence graph, T is the number of calls in the program, and X is the number of global variables.

In their paper, much is made of the fact that once the graph is complete, any slice on an actual definition or use can be extracted from the graph at O(n) cost, where n is the size of the graph. However, the number of actual definition and use occurrences in a program is proportional to the program size L. Therefore, any method that can compute a slice at cost O(Z) for some Z can generate all the slices contained in their graph at cost O(Z · L), spool the slices to disk, and recover them at cost O(L).

Although there are many papers on slicing, it seems that only Horwitz et al. [11] discuss clearly the problem of the more precise interprocedural slice, and present a method to compute it, as well as providing complexity analysis. Our research on slicing is only concerned with computing the more precise slice, so Horwitz et al. is the principal reference.

Zima and Chapman [25] is the principal reference used to study the issues and methods of parallelization. Their book distills the work found in scores of papers and dissertations, and is an excellent survey of parallelization. Interprocedural parallelization is specifically considered by Burke and Cytron [5], and by Triolet et al. [22].

1.5 Outline in Brief

This introductory chapter ends with a brief synopsis of the remaining chapters. Chapter 2 presents in detail our interprocedural dataflow analysis method. The chapter ends with a brief description of the prototypes that were built to demonstrate the method, along with some of the experimental results obtained from these prototypes. Chapter 3 begins with a representation scheme for continuation paths for the interprocedural logical ripple effect problem and then presents our interprocedural logical ripple effect algorithm. A prototype that was built to demonstrate this algorithm is briefly described and experimental results are presented. An inversion of the logical ripple effect algorithm is then presented as a solution to the interprocedural slicing problem. Chapter 4 begins with an explanation of loop-carried data dependence and its relevance to parallelization, and concludes with an algorithm that identifies loops that can be parallelized, including loops that contain calls. Chapter 5 summarizes the major results of the dissertation, and suggests directions for future research.

CHAPTER 2
THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD

2.1 Constructing the Flowgraph

This section discusses the flowgraph and its relationship to dataflow equations. After the discussion, rules are given for constructing the specific flowgraph required by our interprocedural analysis method. Note that the required flowgraph is conventional and the rules to be given relate only to the representation of calls and procedures in the flowgraph.

A flowgraph is a directed graph that represents the possible flow paths of a program. The nodes of a flowgraph correspond to basic blocks in the program. A basic block is a sequence of program code that is always executed together in the same order.
The directed edges of a flowgraph represent possible transfers of control. Figures 2.1 and 2.3 each represent a flowgraph.

Dataflow problems are often formulated as a set of equations that relate the four sets, IN, OUT, GEN, and KILL, that are associated with each node in the flowgraph. For any node and its block, the GEN set represents the elements generated by that block. The KILL set represents those elements that cannot flow through the block, because they would be killed by the block. The IN set represents the valid elements at the start of the block, and the OUT set represents the valid elements at the end of the block.

Dataflow problems are typically either forward-flow or backward-flow. For forward-flow, the IN set of a node is computed as the confluence of the OUT sets of the predecessor nodes, and the OUT set is a function of the node's IN, GEN, and KILL sets. For backward-flow, the OUT set of a node is computed as the confluence of the IN sets of the successor nodes, and the IN set is a function of the node's OUT, GEN, and KILL sets. The predecessors of any node n are those nodes that have an out-edge directed to node n. The successors of node n are those nodes that have an in-edge directed from node n. The confluence operator will almost invariably be either set union or set intersection, depending on the problem. Thus, a dataflow problem may be classified as being either forward-flow-or, forward-flow-and, backward-flow-or, or backward-flow-and, where "or" refers to set union and "and" refers to set intersection.

Once the dataflow equations have been defined for a particular problem, and the rules established for creating the GEN and KILL sets, the equations can then be solved for a specific program or procedure and its representative flowgraph. To solve the equations, the iterative algorithm can be used. The iterative algorithm has the advantage that it will work for any flowgraph.
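As an illustration of these definitions, a minimal iterative solver for a forward-flow-or problem such as reaching definitions might look like the following sketch (ours, not the dissertation's equations; node names and definition names are invented for the example):

```python
from collections import deque

def solve_forward_or(succs, gen, kill):
    """Iterative worklist solver for a forward-flow-or problem:
    IN[n] is the union of OUT[p] over predecessors p, and
    OUT[n] = GEN[n] | (IN[n] - KILL[n])."""
    nodes = list(succs)
    preds = {n: [] for n in nodes}
    for n in nodes:
        for s in succs[n]:
            preds[s].append(n)
    IN = {n: set() for n in nodes}
    OUT = {n: set(gen[n]) for n in nodes}   # IN sets start empty
    work = deque(nodes)
    while work:                             # iterate until all sets stabilize
        n = work.popleft()
        IN[n] = set().union(*(OUT[p] for p in preds[n])) if preds[n] else set()
        new_out = gen[n] | (IN[n] - kill[n])
        if new_out != OUT[n]:               # a change forces successors to be redone
            OUT[n] = new_out
            work.extend(succs[n])
    return IN, OUT

# Tiny flowgraph: A -> B -> C and A -> C; definition d1 is killed in B.
succs = {"A": ["B", "C"], "B": ["C"], "C": []}
gen  = {"A": {"d1"}, "B": {"d2"}, "C": set()}
kill = {"A": set(), "B": {"d1"}, "C": set()}
IN, OUT = solve_forward_or(succs, gen, kill)
print(sorted(IN["C"]))   # ['d1', 'd2']: d1 reaches C via A -> C, d2 via B
```

The recomputation rule matches the forward-flow case described in the text: a node is requeued whenever the OUT set of one of its predecessors changes.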
The iterative algorithm repeatedly computes the IN and OUT sets for all nodes until all sets have stabilized and ceased to change. Recomputation of a node is necessary whenever an outside set that it depends on changes. For forward-flow problems, a node must be recomputed if the OUT set of a predecessor node changes. For backward-flow problems, a node must be recomputed if the IN set of a successor node changes. Typically, an evaluation strategy will determine the actual order in which nodes are recomputed.

The flowgraph required by our interprocedural analysis method is conventional, with special nodes and edges as follows. For each procedure in the program, assign an entry node and an exit node. These nodes have no associated blocks of program code. The entry node has a single out-edge and as many in-edges as there are calls to that procedure in the program. The exit node has as many in-edges as there are nodes for that procedure whose blocks terminate with a return action. The exit node has as many out-edges as there are calls to that procedure in the program. For every in-edge of the entry node, there is a corresponding out-edge of the exit node.

For the purpose of constructing the flowgraph, calls must be classified as either known or unknown. A known call is one where the flowgraph for the called procedure will be a part of the total flowgraph being constructed. An unknown call is one where the flowgraph of the called procedure will not be a part of the total flowgraph being constructed. Unknown calls are common and will occur for two reasons. First, the called procedure may be a compiler-library procedure for which source code is not available. Second, the called procedure may be a separately compiled user procedure for which the source code is not available. For any unknown call made within the program, if summary information of its interprocedural effects is not available, then conservative assumptions about its effects will have to be made.
The actual summary information needed, and the assumptions made in its absence, will depend on the particular dataflow problem. The summary information, if present, would be used when constructing the GEN and KILL sets for any node whose block contains an unknown call.

For any known call made within the program, there will be two nodes in the flowgraph for that call. One node is the call node. The call node represents a basic block that ends with the known call. The other node is the return node. The return node has an empty associated block. The call node will have two out-edges. One edge will be directed to the entry node of the called procedure. The other out-edge will be directed to the return node for that call. The return node will have two in-edges. One edge is the directed edge from the call node. The other in-edge is directed from the called procedure's exit node. In all, each known call results in two nodes and three distinct edges. One edge connects the call node to its return node. A second edge connects the call node to the called procedure's entry node. A third edge connects the called procedure's exit node to the return node.

In constructing the flowgraph, a special problem arises if the programming language allows procedure-valued variables, such as the function pointers of C that, when dereferenced, result in a call of the function that is pointed at. The problem is to identify the possible procedure values when the procedure-valued variable invokes a call. Assuming this information is available from a separate analysis, the flowgraph can be constructed accordingly. For example, if the procedure-valued variable can have three different values when the call in question is invoked, and each value is a procedure whose flowgraph will be part of the total flowgraph, then three known calls would be constructed in parallel, with a common predecessor node for the three call nodes and a common successor node for the three return nodes.
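The two nodes and three edges per known call, described above, can be sketched directly (a toy representation of ours, with nodes as strings and the flowgraph as a set of directed edges; the node names are illustrative):

```python
def add_known_call(edges, call_node, return_node, entry_node, exit_node):
    """Wire up one known call: the call node reaches both its return
    node and the callee's entry node, and the callee's exit node
    reaches the return node."""
    edges.add((call_node, return_node))   # call node -> its return node
    edges.add((call_node, entry_node))    # call node -> callee's entry node
    edges.add((exit_node, return_node))   # callee's exit node -> return node

edges = set()
# Two known calls to the same procedure B (e.g. the calls at lines 5
# and 7 of the earlier example):
add_known_call(edges, "call5", "ret5", "entryB", "exitB")
add_known_call(edges, "call7", "ret7", "entryB", "exitB")
# B's entry node now has one in-edge per call, and its exit node one
# out-edge per call, pairing each entry in-edge with an exit out-edge.
print(sorted(edges))
```

Each call contributes exactly three edges, so the two calls above yield six edges in total, with the entry and exit nodes of B shared between them.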
A procedure-valued variable is in essence a pointer. Note that the problem of determining what a pointer is or may be pointing at when that pointer is dereferenced can itself be formulated as a data-flow problem, and in particular as a forward-flow-or data-flow problem. If necessary, an initial version of the flowgraph could be constructed that treats all calls invoked by procedure-valued variables as unknown calls, followed by a solving of the data-flow problem that determines the possible pointer values wherever a pointer is dereferenced, followed by amendments to the flowgraph using the pointer-value information. Data-flow analysis makes a simplifying, conservative assumption about the correspondence between paths in the flowgraph and possible execution paths in the program. Let a path be a sequence of flowgraph nodes such that in the sequence node n follows node m only if n is a successor of m in the flowgraph. For intraprocedural analysis, the assumption made is that any path in the flowgraph is a possible execution path. That this assumption may not be true for a particular program should be obvious. However, the problem of determining the possible execution paths for an arbitrary program is known to be undecidable. The simplifying assumption that we use for interprocedural analysis is the same as that used for intraprocedural analysis, but with the added proviso that along any path that is a possible execution path, any subsequence of return nodes must inversely match, if present, the immediately preceding subsequence of call nodes. A return node matches a call node if and only if the return node is the call node's successor in the flowgraph.

2.2 Interprocedural Forward-Flow-Or Analysis

This section begins with our basic approach to solving the calling-context problem. The data-flow equations for forward-flow-or analysis are then given and their correctness is shown.
As a part of our interprocedural analysis method, the technique of element recoding is presented as a way to deal with the aliases that result from call-by-reference formal parameters. For some data-flow problems, implicit definitions due to calls require explicit treatment, and this is discussed last. If certain problems, such as reaching definitions, are to be solved for a program by flow-sensitive interprocedural analysis, then the calling context of each procedure call must be preserved. In general, preserving calling context means that the data-flow effects of an individual call should include those effects that survive the call and were introduced into the called procedure by the call itself, but not those effects introduced into the called procedure by all the other calls to it that may exist elsewhere in the program. We refer to the need to preserve calling context as the calling-context problem. Our solution to the calling-context problem, and the essential difference between our data-flow equations and conventional data-flow equations, is to divide every IN set and every OUT set into two sets called an entry set and a body set. The reason for having two sets is that the calling-context effects that enter a procedure from the different calls can be collected and isolated in the separate entry set. This entry set can then have effects in it killed by statements in the body of the procedure, but no additions are made to this entry set by body statements. Instead, any additions of effects due to body statements are made to the separate body set. This body set will also have effects killed in the normal manner, as for the entry set. Because the body set is kept free of calling-context effects, it is empty at the entry node. By contrast, the entry set is at its largest at the entry node and will either stay the same size as it progresses through the procedure's body nodes, or become smaller because of kills.
By intersecting the calling context at a call node with the entry set at the exit node of the called procedure, the result is that subset of the calling context that has reached the exit node and therefore will reach the return node for that call. By "reach" we mean that there exists a path in the flowgraph along which the element is not killed or blocked.

2.2.1 The Data-Flow Equations

The data-flow equations that define the entry and body sets at every node are now given. The equations are divided into three groups. The first group computes the sets for entry nodes. The second group computes the sets for return nodes. The third group computes the sets for all other nodes. In the equations, B denotes a body set and E denotes an entry set. Two conditions, C1 and C2, appear in the equations. C1 means that x will cross the interprocedural boundary from call node p into the called procedure. C2 means that x can cross the interprocedural boundary from exit node q into return node n. ¬Ci means not Ci. For each node n, pred(n) means the set of predecessors of n. The RECODE set used in Group I is explained in Section 2.2.2. The GEN set used in Group I, and the GEN and KILL sets used in Group II, are explained in Section 2.2.3.

For any node n:
  IN[n] = Ein[n] ∪ Bin[n]
  OUT[n] = Eout[n] ∪ Bout[n]

Group I: n is an entry node.
  Bin[n] = ∅
  Ein[n] = ⋃_{p ∈ pred(n)} { x | x ∈ OUT[p] ∧ C1 }
  Bout[n] = GEN[n]
  Eout[n] = Ein[n] ∪ RECODE[n]

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.
  Bin[n] = { x | (x ∈ Bout[p] ∧ (¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout[q]))) ∨ (x ∈ Bout[q] ∧ C2) }
  Ein[n] = { x ∈ Eout[p] | ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout[q]) }
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout[n] = Ein[n] − KILL[n]

Group III: n is not an entry or return node.
  Bin[n] = ⋃_{p ∈ pred(n)} Bout[p]
  Ein[n] = ⋃_{p ∈ pred(n)} Eout[p]
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout[n] = Ein[n] − KILL[n]

The equations assume that the GEN and KILL sets for each call node will include only those effects for that call that occur prior to the entry of the called procedure. This requirement is necessary because the OUT set of the call node is used by the entry-node equation that constructs the entry set of the called procedure. Referring to conditions C1 and C2, the rules for deciding whether an effect crosses a particular interprocedural boundary depend on two primary factors, namely the data-flow problem and the programming language. For example, for the reaching-definitions problem and a language such as FORTRAN, any definition of a global variable, and any definition of a variable that is used as an actual parameter whose corresponding formal parameter is call-by-reference, will cross. As a rule, an effect that crosses into a procedure because it might be killed will also cross back to the return node if it reaches the exit node of the called procedure. Table 2.1 shows the result of solving the equations for the flowgraph of Figure 2.1. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The data-flow problem is reaching definitions, and variable w is local while variables x, y, and z are global. Reaching definitions is the problem of finding all definitions of a variable that reach a particular use of that variable, for all variables and uses in the program. In Figure 2.1, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block. Each defined variable is superscripted with an identifier that is the set element used in Table 2.1 to represent that definition. The correctness of the equations can be seen from the following observations.
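As an implementation aid, the three equation groups can be transcribed almost directly into a per-node update function. The sketch below is illustrative only: the node-kind tags, the crossing predicates c1 and c2, and the set representation are assumptions rather than this work's implementation, and the RECODE set is simply taken as given.

```python
def update_node(n, kind, preds, E_in, E_out, B_in, B_out,
                GEN, KILL, RECODE, c1, c2, call_of=None, exit_of=None):
    """One evaluation of the Group I/II/III equations at node n.

    kind is 'entry', 'return', or 'other'.  c1(x, p) and c2(x, q) are the
    interprocedural boundary-crossing predicates.  For a return node,
    call_of is the associated call node p and exit_of is the called
    procedure's exit node q.  OUT[p] is formed as E_out[p] | B_out[p].
    """
    if kind == 'entry':                                   # Group I
        B_in[n] = set()
        E_in[n] = {x for p in preds[n]
                     for x in (E_out[p] | B_out[p]) if c1(x, p)}
        B_out[n] = set(GEN[n])
        E_out[n] = E_in[n] | RECODE[n]
    elif kind == 'return':                                # Group II
        p, q = call_of, exit_of
        # An effect survives the call if it never entered (not c1), or it
        # entered, reached the exit-node entry set, and can cross back.
        B_in[n] = ({x for x in B_out[p]
                      if not c1(x, p) or (c2(x, q) and x in E_out[q])}
                   | {x for x in B_out[q] if c2(x, q)})
        E_in[n] = {x for x in E_out[p]
                     if not c1(x, p) or (c2(x, q) and x in E_out[q])}
        B_out[n] = (B_in[n] - KILL[n]) | GEN[n]
        E_out[n] = E_in[n] - KILL[n]
    else:                                                 # Group III
        B_in[n] = set().union(*(B_out[p] for p in preds[n]))
        E_in[n] = set().union(*(E_out[p] for p in preds[n]))
        B_out[n] = (B_in[n] - KILL[n]) | GEN[n]
        E_out[n] = E_in[n] - KILL[n]
```

Repeated application of this update within the iterative algorithm, until no set changes, yields the stable solution.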
For a procedure, the entry-node entry set is constructed as the union of all calling-context effects that can enter the procedure from its calls. Within the procedure body, effects in the entry set can be killed, but not added to.

  procedure main            procedure f()
  begin                     begin
    w^1 = 5                   x^5 = 10
    x^2 = 10                end
    if (w > x)
      z^4 = 10
      call f()
    else
      y^3 = 5
      call f()
  end

Figure 2.1. A reaching-definitions example.

Table 2.1. Solution of forward-flow-or equations for Figure 2.1.

  Node  Ein        Eout       Bin          Bout
  1     ∅          ∅          ∅            ∅
  2     ∅          ∅          ∅            {1,2}
  3     ∅          ∅          {1,2}        {1,2,4}
  4     ∅          ∅          {1,4,5}      {1,4,5}
  5     ∅          ∅          {1,2}        {1,2,3}
  6     ∅          ∅          {1,3,5}      {1,3,5}
  7     ∅          ∅          {1,3,4,5}    {1,3,4,5}
  8     {2,3,4}    {2,3,4}    ∅            ∅
  9     {2,3,4}    {3,4}      ∅            {5}
  10    {3,4}      {3,4}      {5}          {5}

For effects in the entry set that reach a call at a call node, those effects that survive the call are recovered in the entry set constructed by the Ein[n] equation for the successor return node n. To see that this is true, observe the following. If an entry-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so the effect should be added to the return-node entry set without further conditions, and this is done by the selection criterion (x ∈ Eout[p] ∧ ¬C1) in the Ein[n] equation for the return node. If, on the other hand, an entry-set effect reaches the call and does enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node entry set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion (x ∈ Eout[p] ∧ C1 ∧ C2 ∧ x ∈ Eout[q]) in the Ein[n] equation for the return node. From the equations for the entry set, we see that for any procedure z, the entry set at z's exit node will, as the equations are solved, eventually contain all calling-context effects that entered z and reached its exit node.
This characteristic of the exit-node entry set is the requirement placed upon it when it is used in the Ein[n] equation for the return node, so this requirement is satisfied and the entry-set equations are correct. For any procedure, the Bin set is always empty at the entry node, so the B set is free of calling-context effects. Within the procedure body, GEN and KILL sets are used to update the body set as it propagates along the various nodes. For effects in the body set that reach a call at a call node, those effects that survive the call are recovered in the body set constructed by the Bin[n] equation for the successor return node n. If a body-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so it should be added to the return-node body set without further conditions, and this is done by the selection criterion (x ∈ Bout[p] ∧ ¬C1) in the Bin[n] equation for the return node. If, on the other hand, a body-set effect reaches the call and will enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node body set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion (x ∈ Bout[p] ∧ C1 ∧ C2 ∧ x ∈ Eout[q]) in the Bin[n] equation for the return node. In addition, all crossable effects that result from the call, and that are independent of calling context, should also be added to the return-node body set, and this is done by the selection criterion (x ∈ Bout[q] ∧ C2) in the Bin[n] equation for the return node. From the equations for the body set, we see that for any procedure z, the body set at z's exit node is free of calling-context effects and will, as the equations are solved, eventually contain all body effects that reached the exit node, including those body effects resulting from calls made within z.
This characteristic of the exit-node body set is the requirement placed upon it when it is used in the Bin[n] equation for the return node, so this requirement is satisfied. The other requirement of this return-node equation is that the exit-node entry set contain all calling-context effects for the procedure that reach the exit node. This requirement has already been shown to be satisfied, so we conclude that the body-set equations are correct.

2.2.2 Element Recoding for Aliases

The RECODE set for the entry node has its elements added to the entry set leaving that node, by the Eout[n] equation of Group I. The idea of the RECODE set is that certain elements in the OUT set of a predecessor call node, irrespective of their ability to cross the interprocedural boundary when parameters are ignored, should nevertheless be carried over into the entry set of the called procedure as calling-context effects, because of an alias relationship, established by the call, between an actual parameter and a formal call-by-reference parameter. Any element that enters a procedure because of such an alias relationship between parameters should be recoded to reflect this alias relationship. A recoded element represents both the base element, which is the element as it would be if there were no alias relationship, and the nonempty alias relationship. Element recoding has two purposes. First, it allows the recoded element within the called procedure to be killed correctly through its alias relationship. Second, it allows the recoded element within the called procedure to be correctly associated with specific references to those aliases that are in the alias relationship. Element recoding never involves a change of the base element, but only a change of the associated alias relationship, which is the set of formal parameters to which the base element is, in effect, aliased. Because element recoding in effect generates a new element, the recoded elements are kept in the separate RECODE set.
Figure 2.2 presents an algorithm for generating the entry-node input sets Ein and RECODE, for a forward-flow-or data-flow problem, under the assumed language model in which the visibility of each formal parameter is limited to the single procedure that declares it. For each element in the OUT[c] set, the algorithm generates at most one element for inclusion in the entry-node input sets. The algorithm is unambiguous, except for line 10. The "can be affected by" test at line 10 is a generalization; the details of this test depend on the specific data-flow problem being solved. For example, if the data-flow problem is reaching definitions, then each base element w represents a specific definition of some variable z. If the actual parameter p being tested by the algorithm is the variable z, and the corresponding formal parameter is call-by-reference, then the definition that w represents can be used or killed through that formal parameter, so w can be affected by that actual parameter z, and the "affected by" test is therefore satisfied. The p ∈ OA test at line 10 covers the situation where an actual parameter p that is aliased to the formal f is itself a formal parameter that is effectively aliased to w. In this case f is established as a new effective alias for w, by transitivity of the alias relationship. Referring to the algorithm, there is no carry-over of the old alias relationship into the new alias relationship. The old alias relationship is represented by the OA set, and the new alias relationship is represented by the NA set. That this no-carry-over of the old alias relationship is correct follows from the assumed language model. The aliases of element recoding are formal parameters, and the model states that each formal parameter is visible in only one procedure.
This means there is no need to carry the old alias relationship into a different procedure, because the aliases cannot be referenced outside the single procedure in which the old alias relationship is active. Note that recursive calls are no exception to this no-carry-over rule, because a recursive call will cancel any alias relationship established for a base element by any prior call of the procedure. In general, the fact that crossing elements are recoded when NA ≠ ∅, and unrecoded when NA = ∅ and OA ≠ ∅, places an added burden on the return-node equations to recognize an element that should be recovered from the exit-node entry set, necessitating, in effect, additional rules to cover this possibility. After an element is recovered, it would also be necessary to restore the alias relationship, if any, that it had prior to the call.

e is an entry node. This algorithm constructs the Ein[e] and RECODE[e] sets.
begin
1    Ein[e] ← ∅
2    RECODE[e] ← ∅
3    for each predecessor call node c of entry node e
4        for each element x ∈ OUT[c]
5            let w be the base element of x
6            let OA be the set of aliases, if any, associated with w, forming x
7            let NA be the set of new aliases
8            NA ← ∅
9            for each actual parameter p at call node c that is aliased to a call-by-reference formal parameter f
10               if (w can be affected by p) ∨ (p ∈ OA)
11                   NA ← NA ∪ {f}
                 fi
             end for
12           if NA ≠ ∅
13               RECODE[e] ← RECODE[e] ∪ {(w, NA)}
14           else if w can cross the interprocedural boundary
15               Ein[e] ← Ein[e] ∪ {w}
             fi
         end for
     end for
end

Figure 2.2. Element-recoding algorithm for forward-flow-or data-flow problems.

This recognition and restoration problem is perhaps most easily solved by associating with each call node two additional sets, one for body-set elements and another for entry-set elements, where each set consists of ordered pairs. These sets would be determined whenever the entry-node entry set of the called procedure is computed.
The first element of each ordered pair is a crossing element x as it exists in the Bout or Eout set at the call node, and the second element is the element y that is effectively generated from element x by the element-recoding algorithm of Figure 2.2 at either line 13 or line 15. If all crossing elements for the call are included in these additional sets, then the return-node equations can use these sets instead of the Bout[p] and Eout[p] sets to recognize elements to be recovered from the exit-node entry set. Recognition and restoration would be done by trying to match the exit-node entry-set element against the second element of an ordered pair from the appropriate additional set at the call node, and then, if there is a match, restoring the original element by using the first element of the matched pair. For example, if x is a crossing element in the Bout set of a call node, and y is the generated element, then (x, y) would be an ordered pair in the additional set for body-set elements. When the Bin set for the return node is computed, if y is in the exit-node entry set then it will match the ordered pair (x, y), and element x will be added to the Bin set. As an example of why element recoding is necessary, consider the following. Suppose there are two different calls to the same procedure, and different definitions of global variable g reach each call. At one of the calls, g is also used as an actual parameter and the corresponding formal parameter is call-by-reference. The problem now is what to kill from the entry set whenever that formal parameter is defined in the called procedure.
If the individual elements representing the different definitions of g do not somehow identify how they are related to this formal parameter, then the only choice is to kill all of them or none of them, and neither of these choices is correct in this case, as the only definitions of g that should be killed are those that entered the procedure from the call where g is aliased to the call-by-reference formal parameter.

2.2.3 Implicit Definitions Due to Calls

A call with parameters typically has implicit definitions associated with it. For example, if a formal parameter is call-by-reference, then each actual parameter aliased to that formal parameter is implicitly defined at each definition of the formal parameter. If a formal parameter is call-by-value-result, then that formal parameter is implicitly defined each time the called procedure is entered, and the actual parameter at the call is implicitly defined upon return from the call. From the standpoint of solving a data-flow problem such as reaching definitions, all implicit definitions due to calls should be determined, and elements generated at the appropriate nodes to represent these implicit definitions. The remainder of this section discusses the generation of implicit definitions and the determination of what reaches them, for the specific problem of reaching definitions. We assume that a formal parameter may be call-by-reference, call-by-value, call-by-value-result, or call-by-result. For the reaching-definitions problem, before the iterative algorithm can be used to solve the data-flow equations, all GEN sets must be prepared. For each point p in the program where a call-by-reference formal parameter is defined, add to the GEN set of the node for point p an implicit definition of each actual-parameter variable that is aliased to that formal parameter in a call. Each added implicit-definition element must be a recoded element that includes the alias relationship for that actual parameter.
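This GEN-set preparation can be sketched as follows. The sketch is a minimal illustration under an assumed call-site representation (a mapping from each call-by-reference formal parameter to the actual-parameter variable bound to it at that call); the data structures and names are hypothetical.

```python
def implicit_defs_for_formal(defined_formal, call_sites):
    """Implicit-definition elements generated at a program point where a
    call-by-reference formal parameter is defined.

    call_sites: one dict per call of the procedure, mapping each
    call-by-reference formal parameter to the actual variable bound to it.
    Returns recoded elements (actual, frozenset of aliased formals).
    """
    elements = set()
    for bindings in call_sites:
        actuals = {}
        # Group this call's formals by the actual variable bound to them,
        # so an actual bound to several formals yields one alias set.
        for formal, actual in bindings.items():
            actuals.setdefault(actual, set()).add(formal)
        for actual, formals in actuals.items():
            # The actual is implicitly defined here only if it is aliased
            # to the formal parameter being defined at this point.
            if defined_formal in formals:
                elements.add((actual, frozenset(formals)))
    return elements
```

Each returned pair is a recoded element: the base element is the implicit definition of the actual-parameter variable, and the set component is its alias relationship.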
For example, suppose a procedure named A has two call-by-reference formal parameters, x and y, and inside A at point p there is a definition of x, and there are three calls of procedure A in the program. The first call aliases variable v to x. The second call aliases variable v to both x and y. The third call aliases variable w to x. Thus, at point p there would be three implicit-definition elements generated, namely (v, {x}), (v, {x, y}), and (w, {x}). As an example of what this element notation means, for the (v, {x}) element the v represents the implicit definition of variable v that occurs at point p, and the x represents the formal parameter that variable v is aliased to. As a special requirement for these implicit-definition elements, for the Bout set at the exit node of procedure A, the (v, {x}) element, if it reaches this set, can only cross from this set to the return node of the first call. Similarly, the (v, {x, y}) element can only cross to the return node of the second call, and the (w, {x}) element can only cross to the return node of the third call. The crossing restrictions in the preceding example are due to a rule, now given. Let A denote a procedure containing a definition at point p of a call-by-reference formal parameter x, let (t, {x}) be the implicit-definition element generated at point p for some specific call c of A that aliases actual-parameter variable t to x, and let m be the exit node of A. If (t, {x}) ∈ Bout[m], then (t, {x}) can only cross from Bout[m] to the return node of call c, and as (t, {x}) crosses, it must be recoded as t by having its alias relationship nullified. This crossing-restriction rule is necessary because element (t, {x}) is both a body effect, because it is generated inside the called procedure, and a calling-context effect, because it is the result of a specific call of that procedure. This dual quality requires the special treatment that the rule provides.
Nullifying the alias relationship as the element crosses to the return node is both good practice in general for this element, and a necessity if call c is a recursive call of A. As an example, assume that call c is a recursive call of A, and that variable t is a global variable. If (t, {x}) reaches the Bout[m] set, the rule states that this element can only cross to the return node of call c, and that it be recoded as t. Assuming that this t element then reaches from this return node to the Bout[m] set, t can then cross to any return node that has an in-edge from m. Although both the (t, {x}) and t elements refer to the same implicit definition of variable t occurring at point p, the two elements are not the same, and the crossing-restriction rule applies only to an element that is identical to the element generated at point p, which is (t, {x}). The implicit definitions of actual-parameter variables are the most important category of implicit definitions due to call-by-reference formal parameters. However, there is also a second, less-important category. At each explicit definition of a variable t at point p inside A, such that variable t is also used in a call of A as an actual parameter aliased to a call-by-reference formal parameter x, there is an implicit definition of formal parameter x at point p. The implicit-definition element generated at point p would be (x, {t}), meaning a definition of variable x at point p, aliased to variable t. However, assuming a formal parameter cannot be defined or used outside the procedure for which it is declared, it follows that there is no need for a crossing-restriction rule for these elements, because they cannot cross to any return node. Normally, a definition of a variable kills all other definitions of that variable. However, the implicit definitions due to call-by-reference formal parameters have no associated kills. Instead, the following rule suffices.
For each call-by-reference formal parameter x declared for procedure A, if all calls of A alias the same actual-parameter variable t to x, then each explicit definition inside A of either variable t or variable x will kill all definitions of variable t and all definitions of variable x. Otherwise, if all calls of A do not alias the same actual-parameter variable t to x, then each explicit definition inside A of either variable t or x will kill only the definitions of that variable and those recoded elements that are aliased to that variable. The entry-node GEN set will be used to hold all implicit definitions of formal parameters that occur upon procedure entry. Thus, for each entry node, for each formal parameter of the represented procedure that is call-by-value or call-by-value-result, add to the GEN set of that entry node an element that represents an implicit definition of that formal parameter occurring at that entry node. The return-node GEN set will be used to hold all implicit definitions of actual parameters that may occur upon return from the called procedure. Thus, for each return node, for each actual parameter of the associated call whose corresponding formal parameter is call-by-result or call-by-value-result, add to the GEN set of that return node an element that represents an implicit definition of that actual parameter occurring at that return node. The return-node KILL set should represent all elements that will be killed by these implicit definitions of actual parameters. With the GEN sets ready, the iterative algorithm can proceed. Once the iterative algorithm has ended, a follow-on step is done: a) Examine the Bout set for each exit node. For each definition d in this set of a formal parameter p, where p is call-by-result or call-by-value-result, d reaches the implicit use of this formal parameter by those implicit definitions of actual parameters found at the various return nodes whose corresponding formal parameter is p.
The element representing d can be added to the Bin sets of those return nodes in a way that reflects the reach. b) Examine the OUT set of each call node. For each definition d in this set of a variable that is used as an actual parameter in the call, where the corresponding formal parameter is call-by-value or call-by-value-result, d reaches the implicit use of the defined variable by the implicit definition of the corresponding formal parameter found at the entry node of the called procedure. The element representing d can be added to the Ein set of that entry node in a way that reflects the reach.

2.3 Interprocedural Forward-Flow-And Analysis

This section gives the data-flow equations used by our interprocedural analysis method for forward-flow-and problems. The difference between these equations and the equations for forward-flow-or is explained. For forward-flow-and problems, some changes are needed to the data-flow equations given in Section 2.2.1. Of course, the confluence operator must be changed from union to intersection. However, it is still necessary to construct the entry-node entry set as the union of all crossing effects from the predecessor-node sets, so that calling context can be properly recovered at the return nodes. At the same time, the entry set must always be constructed as the intersection of predecessor-node sets, if the entry set is to be a part of the IN and OUT sets. These conflicting requirements for the entry-node entry set can be resolved by maintaining two separate entry sets at each node. The revised data-flow equations follow. The two conditions, C1 and C2, are explained in Section 2.2.1.

For any node n:
  IN[n] = Ein(2)[n] ∪ Bin[n]
  OUT[n] = Eout(2)[n] ∪ Bout[n]

Group I: n is an entry node.
  Bin[n] = ∅
  Ein(1)[n] = ⋃_{p ∈ pred(n)} { x | x ∈ (Eout(1)[p] ∪ Bout[p]) ∧ C1 }
  Ein(2)[n] = ⋂_{p ∈ pred(n)} { x | x ∈ OUT[p] ∧ C1 }
  Bout[n] = GEN[n]
  Eout(1)[n] = Ein(1)[n] ∪ RECODE(1)[n] ∪ RECODE(2)[n]
  Eout(2)[n] = Ein(2)[n] ∪ RECODE(2)[n]

Group II: n is a return node, p is the associated call node and q is the exit node of the called procedure.
  Bin[n] = { x | (x ∈ Bout[p] ∧ (¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout(1)[q]))) ∨ (x ∈ Bout[q] ∧ C2) }
  Ein(i)[n] = { x ∈ Eout(i)[p] | ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout(1)[q]) }; i = 1, 2.
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout(i)[n] = Ein(i)[n] − KILL[n]; i = 1, 2.

Group III: n is not an entry or return node.
  Bin[n] = ⋂_{p ∈ pred(n)} Bout[p]
  Ein(i)[n] = ⋂_{p ∈ pred(n)} Eout(i)[p]; i = 1, 2.
  Bout[n] = (Bin[n] − KILL[n]) ∪ GEN[n]
  Eout(i)[n] = Ein(i)[n] − KILL[n]; i = 1, 2.

The entry set E(1) is the set used to recover calling context, and the entry set E(2) is the set that is a component of the IN and OUT sets. The RECODE sets appearing in the entry-node equations represent recoded elements as explained in Section 2.2.2. The RECODE(1) set is just the union of the recoded elements generated from each predecessor call node c, using the algorithm of Figure 2.2 and drawing from the Eout(1)[c] and Bout[c] sets at line 4 instead of the OUT[c] set. Similarly, the RECODE(2) set could just be the intersection of the recoded elements from each predecessor call node c, drawing from the OUT[c] set at line 4. However, doing this may cause the unnecessary loss of recoded elements when the same underlying base element w is found in each OUT[c] set. To avoid such loss, an improved rule states that if the same base element w is found in each OUT[c] set, and there are one or more nonempty alias relationships for that w occurring at one or more predecessor nodes c, then a single recoded element for that w, encoding all of these alias relationships, is generated into the RECODE(2) set; otherwise no recoded element for that w is generated into the RECODE(2) set.
For example, suppose c has three different values for a given entry node, the same base element w is found in each OUT[c] set, and at one c there is an empty alias relationship, at the second c there is an alias relationship to formal parameter x, and at the third c there is an alias relationship to formal parameter y. For this example, the single recoded element would be (w, {x, y}), and this recoded element can be killed either directly through w, or indirectly through x or through y. Note that the complete kill of this recoded element at any kill point, even though the kill may have been made through an alias that was not established at each c, is nevertheless correct. The intersection confluence operator associated with RECODE(2) implicitly requires that for base element w to pass a kill point, it must be on every call path past that kill point, which is not the case when w is killed from at least one call path, which happens when that w is killed through an alias that was established by at least one of the c. If the specific data-flow problem being solved allows the base element to be used through one of its effective aliases, then a flag could be associated with each alias in the recoded elements of RECODE(2), and this flag could indicate whether or not the alias was established at each c. In the case of the example, the recoded element with flags would be (w, {x_not, y_not}). Only a use of the base element through an alias established at each c would be a use through an alias that occurs on every call path, and this kind of use would be the all-paths use that is implicitly required by the specific data-flow problem by virtue of it being forward-flow-and. With the exception of the confluence operator and the two different entry sets, the equations for forward-flow-and are the same as for forward-flow-or, and are likewise correct.
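The improved RECODE(2) rule, with its per-alias flags, can be sketched as a small merging function. This is a hedged illustration: representing a recoded element as a pair (base, set of (alias, established-at-each-c) pairs) is an assumption made for this sketch, not the notation of this work.

```python
def merge_recode2(base, alias_sets):
    """Improved RECODE(2) rule: one recoded element per base element,
    encoding every alias relationship seen at the predecessor call nodes.

    alias_sets: for each predecessor call node c, the (possibly empty)
    set of formal parameters the base element is aliased to at that c.
    Each alias carries a flag telling whether it was established at
    every c, as needed to recognize all-paths uses.
    """
    all_aliases = set().union(*alias_sets)
    if not all_aliases:
        return None  # no nonempty alias relationship at any c
    return (base, {(a, all(a in s for s in alias_sets)) for a in all_aliases})
```

For the three-call example above, the sketch yields a single element whose x and y aliases are both flagged as not established at each c.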
Set E(2) fulfills the requirement for the IN and OUT sets by consistently using the intersection confluence operator for its construction, just as B does. The equations for the E(1) and E(2) sets differ only at the entry node, and there the only difference is the confluence operator and the way the RECODE sets are built. As set intersection is the confluence operator for E(2), and set union for E(1), and the RECODE(2) set is added to both E(1) and E(2), it follows that E(2) will be a subset of E(1) at every node. Thus, E(1) can be used to recover calling context for E(2). Set E(1) also serves to recover calling context for both E(1) and B, because E(1) is built at the entry node from these two sets, and the use of union as the confluence operator guarantees that all calling-context effects will be collected. Table 2.2 shows the result of solving the equations for the flowgraph of Figure 2.3. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The data-flow problem is available expressions, and variable w is local while variables x, y, and z are global. Available expressions is the problem of determining whether the use of an expression is always reached by some prior use of that expression, for certain expressions in the program. In Figure 2.3, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block. Each expression is superscripted with an identifier that is the set element used in Table 2.2 to represent that expression.

Table 2.2. Solution of forward-flow-and equations for Figure 2.3.

  Node  Ein(1)     Eout(1)    Ein(2)   Eout(2)  Bin        Bout
  1     ∅          ∅          ∅        ∅        ∅          ∅
  2     ∅          ∅          ∅        ∅        ∅          {1,2}
  3     ∅          ∅          ∅        ∅        {1,2}      {1,2,4}
  4     ∅          ∅          ∅        ∅        {1,4,5}    {1,4,5}
  5     ∅          ∅          ∅        ∅        {1,2}      {1,2,3}
  6     ∅          ∅          ∅        ∅        {1,3,5}    {1,3,5}
  7     ∅          ∅          ∅        ∅        {1,5}      {1,5}
  8     {2,3,4}    {2,3,4}    {2}      {2}      ∅          ∅
  9     {2,3,4}    {3,4}      {2}      ∅        ∅          {5}
  10    {3,4}      {3,4}      ∅        ∅        {5}        {5}
[Figure 2.3. An available-expressions example: procedure main defines y, z, and a with superscripted expressions and calls procedure f, whose body is x = z + 2; the superscripts are the set elements of Table 2.2.]

2.4 Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. However, the same flowgraph is used for both forward-flow and backward-flow problems. To convert the equations for forward-flow-or to backward-flow-or, or for forward-flow-and to backward-flow-and, the transformation is mechanical and straightforward. The same equations are used, but various words and phrases are everywhere changed to reflect the reverse flow. For example, "pred(n)" for predecessors becomes "succ(n)" for successors, "out" subscripts become "in" subscripts and "in" subscripts become "out" subscripts, IN becomes OUT and OUT becomes IN, "call node" becomes "return node" and "return node" becomes "call node", "entry node" becomes "exit node" and "exit node" becomes "entry node". For backward flow, the nodes requiring special equations are the exit node and call node, and not the entry node and return node as for the forward-flow problems.

2.5 Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed language model, in which the visibility of each formal parameter is limited to the single procedure that declares it, we consider the solution of the dataflow equations for only one element at a time. Let n be the number of flowgraph nodes. Let the elementary operation measured by the complexity be the computation of the dataflow equations once at a single, average flowgraph node, for a single element. Only the presence or absence of the single element within a particular body or entry set need be represented, and this requires no more than a single bit of storage for each set referenced by the equations.
Thus, computing the dataflow equations once at an average node, for a single element, will consist of a small number of integer operations, assuming that the average in-degree and out-degree of the flowgraph nodes are bounded by a small constant, which will always be the case for flowgraphs generated from real programs, and also assuming that the length of recorded elements will be small. Referring to the algorithm of Figure 2.2, the length of a recorded element is 1 + |NA|, and |NA| is bounded from above by the number of call-by-reference formal parameters of the given procedure. As a rule, this upper bound will be small. We next consider the total number of node visits required to solve the dataflow equations for a single element. Prior to solving the equations, all body and entry sets are initialized to empty, at complexity O(n). The empty sets represent the absence of the element. Note that each set has only two states: either the element is present, or it is absent. Assuming a forward-flow problem, each time the equations are computed for a node, if any of the out sets have changed from their previous state, then the equations will be computed for all successor nodes. The forward-flow-or equations have only two out sets per node, and the forward-flow-and equations have three. It follows that repeated computation of the equations for a single node will cause the successor nodes to be marked for computation at most two or three times, depending on the equations being used. Given that the average number of successor nodes is bounded by a small constant, it follows that the total number of node visits required to solve the dataflow equations for a single element will be bounded from above by k1·n, where k1 is a constant, giving a worst-case complexity of O(n) for solving the dataflow equations for a single element. The worst-case complexity of solving the dataflow equations for m total elements will therefore be O(mn).
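The single-element solving process described above can be sketched as a worklist iteration in which each IN and OUT set is one bit per node. This is a minimal illustration, not the dissertation's implementation; the tiny gen/kill flowgraph is hypothetical.

```python
from collections import deque

# Solving a forward-flow-or problem for a single element: each IN/OUT set is
# one bit per node, and a node's successors are revisited only when its OUT
# bit changes, which bounds the total node visits linearly in n.

def solve_one_element(succ, gen, kill, nodes):
    IN = {n: False for n in nodes}
    OUT = {n: False for n in nodes}
    visits = 0
    work = deque(nodes)
    while work:
        n = work.popleft()
        visits += 1
        IN[n] = any(OUT[p] for p in nodes if n in succ[p])  # "or" confluence
        new = gen[n] or (IN[n] and not kill[n])
        if new != OUT[n]:
            OUT[n] = new
            work.extend(succ[n])        # only changed nodes propagate
    return OUT, visits

succ = {1: [2], 2: [3], 3: []}          # a three-node straight-line flowgraph
gen  = {1: True, 2: False, 3: False}    # the element is generated at node 1
kill = {1: False, 2: False, 3: False}
out, visits = solve_one_element(succ, gen, kill, [1, 2, 3])
assert out == {1: True, 2: True, 3: True}
assert visits <= 3 * len(succ)          # O(n) visits for one element
```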
Let b be the number of base elements for the program being analyzed, and let r be the number of recorded elements, giving m = b + r. As an example, for the reaching-definitions dataflow problem the base elements will be all the definitions in the program. We assume that for the kind of dataflow problems our method is meant to solve, the number of base elements will be a linear function of the program size, and therefore proportional to n. Let constant k2 be an upper bound of b/n. We also assume the universe of real, useful programs, written by programmers to solve practical problems. To determine an upper bound for r, let k be the maximum number of formal parameters for a single procedure. That k is a constant independent of program size should be obvious. Given k and the algorithm of Figure 2.2, and allowing all possible combinations of the formal parameters of any single procedure, the maximum number of recorded elements for any single procedure and base element is k3 = Σ_{i=1}^{k} C(k, i) = 2^k − 1. Note that k3 is a constant, albeit potentially an enormous one. The maximum number of recorded elements for any single procedure will therefore be k3·b. In the assumed language model, each formal parameter is visible in only one procedure, and this means each recorded element is confined to a single procedure when the dataflow equations are solved. Therefore, the total number of node visits required to solve the dataflow equations for all the recorded elements will be bounded from above by Σ_{i=1}^{j} k1·s_i·k3·b, where j is the number of procedures in the flowgraph, and s_i is the number of flowgraph nodes in the ith procedure. This upper bound can be rewritten as Σ_{i=1}^{j} k1·k2·k3·n·s_i. Ignoring constants, and given that Σ_{i=1}^{j} s_i = n and therefore Σ_{i=1}^{j} n·s_i = n², the worst-case complexity of our method for the assumed language model is O(n²), and the elementary operation measured by the complexity is a small number of integer operations, assuming that the average recorded-element length is small.
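The count k3 is just the number of nonempty subsets of the k formal parameters, which a one-line check confirms:

```python
from math import comb

# k3 = sum over i of C(k, i), i = 1..k, equals 2^k - 1: the number of
# nonempty formal-parameter combinations a recorded element can carry.
for k in range(1, 10):
    assert sum(comb(k, i) for i in range(1, k + 1)) == 2**k - 1
```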
For a program from the assumed universe of programs, the likelihood of a large complexity constant due to element recoding is very low, for the following reason. In order to increase the number of recorded elements for a given base element and procedure, the given base element must, in effect, be repeatedly aliased to different combinations of formal parameters in the given procedure. The algorithm of Figure 2.2 generates at most a single recorded element for each element in the OUT set, so to increase the number of recorded elements as stated, there must be multiple calls to the same procedure, and in these different calls the same base element must be aliased to different formal-parameter combinations. To assess the likelihood of this requirement being met, consider that for any given program from the assumed universe, the type and purpose of a variable determines how that variable is used in that program, and each variable used in a program by necessity has a purpose. Given a number of different calls to the same procedure, and given that a variable appears as one or more of the actual parameters in each of the calls, then as a rule we expect that variable to always occupy the same parameter positions in those calls, because there is always a close correspondence between parameter position and the purpose of the variable that occupies that position. Note that by "variable" we mean a variable and any aliases it may have, including formal-parameter aliases. A variable and its aliases are interchangeable and share the same purpose because by definition they reference the same data. It might be argued that a language such as C has procedures that take a variable number of arguments, such as printf and scanf, for which the same variable could easily occupy different actual-parameter positions in different calls. This is true, but such library procedures are best treated as unknown calls, and there is no element recoding for unknown calls.
For the needs of element recoding in the rare case of a user-written procedure with a variable number of arguments, a single formal parameter could stand for the variable portion of the formal parameters, and conservative assumptions could be made whenever that single formal is, in effect, referenced. Aside from mentioning this, we do not consider such user-written variable-argument procedures further. For a dataflow problem such as reaching definitions, the base element can only be affected by a single variable. For such a dataflow problem, the purposefulness of variables makes it very unlikely that an increase in the number of recorded elements for a given procedure and base element can even begin, let alone be sustained. However, such an increase would be more likely for a dataflow problem where the base element can be affected by several different variables. An example would be available expressions, because each base element could be affected by as many different variables as compose the expression represented by that base element. In light of the preceding argument regarding the purposefulness of variables, for the reaching-definitions and similar dataflow problems, we expect the maximum number of recorded elements for any given procedure and base element to be one in the majority of the programs in the assumed universe, and a little higher than one for the remaining programs in that universe. Given the algorithm of Figure 2.2, we also expect the average length of each recorded element to be slightly more than two, given the preceding expectation of a very small maximum number of recorded elements for any given procedure and base element, and assuming that most base elements, when aliased by a call, will be aliased to only a single formal parameter, and only occasionally to more than one.
Note that this expected average length of the recorded elements is consistent with the claim that the elementary operation measured by the worst-case complexity of our method is a small number of integer operations. It may be noticed that the complexity of O(n²) for our interprocedural analysis method is the same as the known worst-case complexity for intraprocedural dataflow analysis, assuming there are no restrictions on the flowgraph. This fact makes it unlikely that it would be possible to improve on our method in terms of complexity without resorting to flowgraph restrictions. However, although the complexities are the same, this does not mean interprocedural dataflow analysis will now take roughly the same time as intraprocedural dataflow analysis. The following inequality should make this clear: Σ_{i=1}^{j} s_i² < n², given that j is the number of procedures in the flowgraph, s_i is the number of flowgraph nodes in the ith procedure, and Σ_{i=1}^{j} s_i = n. Besides the language model that is assumed for this chapter, an alternative model allows each formal parameter to have visibility in more than a single procedure. Examples of programming languages that fit this alternative model are Pascal and Ada, which allow nested procedures. Element recoding can be used for this alternative model, but unless precision is compromised, the worst-case complexity for solving the equations will be exponential, because the number of recorded elements could grow exponentially, assuming that alias information is compounded when a recorded element is recorded. The exponential complexity of tracking aliases due to calls was first considered by Myers [17], and more recently by Landi and Ryder [15]. In practice, the cost of precise element recoding for the alternative language model may be acceptable for the assumed universe of programs, for the same reason given previously regarding the purposefulness of variables. However, we do not consider the alternative model further.
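The inequality above can be checked with made-up procedure sizes; the point is that confining recorded elements to single procedures replaces n² with the much smaller sum of squared procedure sizes.

```python
# Hypothetical procedure sizes s_i; the sum of squares is well below n^2,
# which is why interprocedural solving is cheaper in practice than the
# shared O(n^2) bound suggests.
sizes = [50, 30, 20]          # s_i: flowgraph nodes per procedure (made up)
n = sum(sizes)                # 100 nodes total
assert sum(s * s for s in sizes) == 3800
assert sum(s * s for s in sizes) < n * n   # 3800 < 10000
```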
2.6 Experimental Results

There are experimental data for our interprocedural analysis method. Specifically, two different prototypes have been constructed, and they both solve the reaching-definitions dataflow problem using our method. Both prototypes accept C-language programs as the input to be dataflow analyzed. For simplicity, these prototypes impose some restrictions on the input, such as requiring that all variables be represented by single identifiers, thereby excluding variables that have more than one component, such as structure and union variables. In addition, there is no logic in the prototypes to determine what pointers are pointing at, so pointer dereferencing is essentially ignored. The prototypes do not accept preprocessor commands, so the input programs must be post-preprocessor.

Both prototypes, named prototype 1 and prototype 2, use the same code to parse the input program and construct the flowgraph. However, they differ in how they implement our analysis method. Prototype 1 prepares a single bit-vector format containing all the definitions in the input program, and then solves the dataflow equations once for the program flowgraph. Prototype 2 uses a single integer as the bit vector and solves the dataflow equations for the program flowgraph as many times as there are base elements. For the reaching-definitions dataflow problem, the definitions in the program are the base elements. We call the approach used by prototype 2 one-base-element-at-a-time, and the approach used by prototype 1 all-at-once.

It might be expected that prototype 2 would be many times slower than prototype 1, because of the big difference in bit-vector sizes, but this is not the case. For prototype 1, calculations using varied test results show that V × S1 ≈ D, where V is the average number of visits per flowgraph node made to solve the dataflow equations, S1 is the integer size of the bit vector for prototype 1, and D is the number of definitions in the input program.
This relationship for prototype 1 means that prototype 2 should run at roughly the same speed as prototype 1, because solving the dataflow equations for a single element will require an average of roughly one visit per flowgraph node and the application of the dataflow equations to a vector of size one. Note that the total amount of work prototype 1 must do per flowgraph node to solve the equations is proportional to the product V × S1 ≈ D, and the total amount of work prototype 2 must do per flowgraph node to solve the equations for the D base elements is proportional to the product V × S2 × D ≈ 1 × 1 × D = D, where S2 is the integer size of the bit vector for prototype 2. Experimental results have supported the expectation of similar speeds for the two prototypes. When deciding on the design of a practical tool, this finding is important and decisively tips the scales in favor of the one-base-element-at-a-time approach used by prototype 2.

For both prototypes, the bit space needed for set storage is nks, where n is the number of flowgraph nodes, k is the average number of sets per node, and s = max(average set bit-size for any solving of the equations). Note that for prototype 1 there is only one solving of the equations, and for prototype 2 there are as many solvings of the equations as base elements.

Table 2.3. Typical experimental results for the two prototypes.

defs  | defs global | calls | nodes | prototype 1 | prototype 2
2126  | 30%         | 521   | 4191  | 49s         | 1m21s
2026  | 60%         | 472   | 3948  | 55s         | 2m22s
4109  | 30%         | 924   | 7537  | 4m18s       | 4m38s
4223  | 60%         | 916   | 7723  | 4m57s       | 8m19s
6115  | 30%         | 1325  | 11185 | N/A         | 10m0s
6091  | 60%         | 1411  | 11288 | N/A         | 18m18s
8200  | 30%         | 1832  | 14799 | N/A         | 17m44s
8054  | 60%         | 1726  | 14641 | N/A         | 30m2s
10299 | 30%         | 2164  | 18434 | N/A         | 23m55s
10016 | 60%         | 2356  | 18587 | N/A         | 45m8s

The primary reason the approach used by prototype 2 is preferable, when compared with the all-at-once approach used by prototype 1, is the likelihood of a greatly reduced s value.
For example, without element recoding, the s value is 1 for prototype 2, and D for prototype 1. Allowing element recoding, the s value for the prototype-2 approach will be 1 + max(average number of recorded elements per procedure for any solving of the equations). Here we assume that the best way to add element recoding to prototype 2 would be, for each solving of the equations, to solve the equations for both a single base element and all recorded elements generated from that base element.

Table 2.3 presents typical experimental results for the two prototypes. Each table row represents a different input program. The input programs were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program in Table 2.3 has 100 procedures. Only prototype 1 currently has element-recoding logic, so the input programs do not have call parameters and the table data do not reflect element-recoding costs. Measuring element-recoding costs for randomly generated programs would be somewhat meaningless anyway, since the purposefulness-of-variables principle would be violated.

Referring to the columns of Table 2.3, "defs" is the total number of definitions in the input program, "defs global" is the percentage that define global variables, "calls" is the number of known calls, "nodes" is the number of flowgraph nodes, "prototype 1" is the total CPU time in minutes and seconds required by prototype 1 to completely solve the reaching-definitions dataflow problem for the input program and generate a report of all the reaches, and "prototype 2" is the same thing for prototype 2. The hardware used was rated at roughly 23 MIPS. The large space requirements of prototype 1 prevented running it for the larger input programs in the table.
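The space gap behind prototype 1's failures on the larger inputs follows from the nks formula. A rough sketch using the first table row; the value of k is made up for illustration, since the text does not give one.

```python
# Space comparison from the storage formula n * k * s. n and D come from the
# first row of Table 2.3; k (average sets per node) is a hypothetical value.
n, k, D = 4191, 6, 2126     # flowgraph nodes, sets per node, definitions
bits_proto1 = n * k * D     # all-at-once: s = D bits per set
bits_proto2 = n * k * 1     # one-base-element-at-a-time: s = 1 bit per set
assert bits_proto1 == D * bits_proto2   # prototype 1 needs D times the bits
```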
CHAPTER 3
INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect

This section lays the theoretical basis for our algorithm. The problem of interprocedural logical ripple effect is examined from the perspective of execution paths and their possible continuations. First, general definitions are given, followed by three assumptions and a definition of the Allow and Transform sets, followed by Lemma 1, Theorems 1 through 4, and a discussion of the potential for overestimation inherent in the Allow set.

A variable is defined at each point in a program where it is assigned a value. A definition is assumed to have the general form of "v ← expression", where v is the variable being defined and "←" is an assignment operator that assigns the value of expression to v. If the expression includes variables, then these variables are termed the use variables of the definition. In general, a use is any instance of a variable that is having its value used at the point where the variable occurs. A procedure contains a definition if the statement that makes the definition is in the body of the procedure. Similarly, a procedure contains a call if the statement that makes the call is in the body of the procedure. The body of a procedure is those statements that are defined as belonging to the procedure.

Frequent reference is made in this chapter to a procedure containing a statement, or containing a call, or containing a flowgraph node. For languages that allow nested procedures, such as Pascal and Ada, note that procedure nesting in these languages is a mechanism for controlling variable scope, and not a mechanism for sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume that at most one procedure contains any given statement, call, or flowgraph node. Let d and dd be two definitions, possibly the same, in the same program.
Let dd have a use variable v, let v_dd be that use-variable instance, and let d define v. Given a possible execution path between definition d and v_dd, along which the definition of v that d represents would be propagated, such a path is referred to as a definition-clear path between d and v_dd with respect to v. Definition d can only be propagated along an execution path to the end of that path if either definition d itself, or an element that represents definition d, exists at the beginning of that path, and there is no redefinition of v along that path. Definition d is said to affect definition dd if there is a definition-clear path between d and v_dd with respect to v. Similarly, definition d affects use u if u is an instance of v, and there is a definition-clear path between d and u with respect to v. For convenience, v will not be explicitly mentioned when it is understood. Note that whenever we speak of an execution path between two points, we always mean that the execution path begins at the first point and ends at the second point. For example, an execution path between d and dd begins at the program point where d occurs and ends at the program point where v_dd occurs. For convenience, we assume that dd and v_dd occupy the same program point.

Assumption 1. A called procedure, if it returns, always returns to its most recent caller. A procedure that returns, always returns to the most recent unreturned call.

Assumption 2. A call has no influence on the execution paths taken inside the called procedure.

Assumption 3. There are no recursive calls.

Assumption 1 reflects the behavior of all the procedural languages that we know of. Regarding Assumption 2, our algorithm may in fact overestimate the logical ripple effect because of both Assumption 2 and the unstated but standard assumption of intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible execution paths.
However, these two assumptions are unavoidable because determining all the truly possible execution paths in an arbitrary program is known to be an undecidable problem. Regarding Assumption 3, making this assumption improves the precision of our algorithm because it removes a potential cause of overestimation. The consequence of using our algorithm for a program with recursive calls is discussed at the end of Section 3.2.

To determine what a definition affects when it is constrained by ripple effect, it is useful to introduce two concepts: backward flow and forward flow. Given an execution path, whenever the execution path returns from a procedure to a call, this is termed backward flow. All other parts of the execution path may be termed forward flow. Note that the possibilities for backward flow are constrained by Assumption 1, and therefore constrained by the relevant execution paths that lead up to the point of the return in question. Regarding a given execution path, those call instances within that execution path that have yet to be returned to within that path, called unreturned calls, are the parts of the path that constrain backward flow. Note that this constraint is a positive constraint, since a call cannot be returned to unless that call exists as an unreturned call in at least one relevant execution path.

Definition 1. Two sets, Allow and Transform, will be used to represent the backward-flow restrictions associated with a particular definition d. Let p be the program point where definition d occurs. The elements in both sets are calls. The Allow set identifies only the calls to which the execution path continuing on from point p may make an unmatched return, until the backward-flow restrictions represented by this Allow set are effectively cancelled by the interaction between the execution-path continuation and the Transform set, explained shortly.
An unmatched return is a return made during the execution-path continuation to a call instance that precedes the beginning of that execution-path continuation. The call instance is necessarily an unreturned call, as otherwise it could not be returned to. |Allow| ≤ the total number of different calls represented in the program text. We define Allow = ∅ to mean there are no backward-flow restrictions for d. The Transform set identifies only the calls to which the execution path continuing on from point p may make an unmatched return, and upon this unmatched return, the execution-path continuation is no longer constrained by the Allow and Transform sets associated with d. The following relationships hold: Transform ⊆ Allow, and if Allow ≠ ∅ then Transform ≠ ∅.

Note that minimizing backward-flow restrictions must be done whenever the possible execution paths allow it, because otherwise the computed logical ripple effect (which is the whole purpose of this formal-analysis section) may be missing pieces that belong in it but were not added to it, because backward-flow restrictions were retained that are not valid for all the possible execution paths involved.

Lemma 1. For any execution path P between two program points p and q, if P includes two or more call instances made in P that have not been returned to in P, then for these unreturned calls, c_i calls the procedure containing c_{i+1}, where c_i is the ith unreturned call, in execution order, made in P.

Proof. Assume that the next unreturned call c_{i+1} is not contained in the procedure that was called by c_i. Let X be the procedure called by c_i, and let Y be the procedure that contains c_{i+1}. The execution path in P between making the call c_i and making the call c_{i+1} must include a path out of procedure X and into procedure Y so that the call c_{i+1} can be made. A path out of procedure X can occur in only two ways: either X returns to a call, or X itself makes a call.
If X returns to a call, then by Assumption 1, c_i would be returned to, contradicting the given that c_i has not been returned to. This means X must make a call to get to Y. Let c be the call contained in X that is the last call contained in X on the execution path in P taken from X to Y so as to make the call c_{i+1}. If X makes the call c, and c has not been returned to in P, then c would precede c_{i+1} as an unreturned call following c_i, contradicting the given that c_{i+1} is the next unreturned call in execution order after c_i. If c has been returned to in P, then all calls occurring on the execution path between the call c and the return to c must have been returned to, according to Assumption 1. This would mean c_{i+1} has been returned to, contradicting the given that c_{i+1} has not been returned to. Thus, it is true that c_i calls the procedure containing c_{i+1}, as assuming otherwise leads to contradictions. □

Definitions for Theorems 1 through 4. Let d and dd be the two definitions previously defined. Let A and T be the Allow and Transform sets associated with d. Let P be a single execution path between d and dd, along which d can affect dd, subject to the constraints on P imposed by A and T. P will consist of a sequence of calls and returns, if any, in the order they are made. Any instance of a call made in P that is not returned to in P is an unreturned call in P. K is defined for P if and only if P contains an unmatched return (meaning a return to a call instance that precedes the beginning of P) to a call ∈ T. K is that part of P that follows the first unmatched return to a call ∈ T. Thus, K represents the continuation of P after the unmatched return. Any instance of a call made in K that is not returned to in K is an unreturned call in K. Referring to each of the four theorems in turn, let AA and TT be the Allow and Transform sets for dd given all the paths P that meet the requirements of P as stated by that theorem.
Let AA_P and TT_P be the Allow and Transform sets for dd given a single path P that meets the requirements of P as stated by that theorem. The four theorems that follow each define AA and TT. Note that for any given P, A, and T, one of the four theorems will apply.

Theorem 1. If (1) A = ∅, and P has no unreturned calls, or (2) A ≠ ∅, K is defined for P, and K has no unreturned calls, then AA ← ∅ and TT ← ∅.

Proof. For case (1), d is free of backward-flow restrictions and d has affected dd without making an unreturned call, therefore dd will be free of backward-flow restrictions, giving AA ← ∅ and TT ← ∅. For case (2), as soon as path P makes an unmatched return r to a call ∈ T, then by Definition 1 what d can affect is no longer constrained by A and T, and this freedom from constraint by A and T passes by transitivity to dd because d affects dd. When K is defined for P, the unmatched return r in P that immediately precedes the beginning of K means that any unreturned calls in P are also in K. This is because all call instances within P are more recent than the call instance that matches the unmatched return r. Thus, by Assumption 1, all call instances in P preceding the return r must be returned to in P before r can occur. Therefore, P has no unreturned calls because K has no unreturned calls. Thus, dd is free of backward-flow restrictions since A, T, and P contribute nothing in the way of constraint, giving AA ← ∅ and TT ← ∅. □

Theorem 2. If (1) A = ∅, and P has at least one unreturned call, or (2) A ≠ ∅, K is defined for P, and K has at least one unreturned call, then AA ← ∪_{all such P} {the unreturned calls of P}, and TT ← ∪_{all such P} {the first unreturned call in P}.

Proof. For case (1), A and T contribute nothing in the way of constraint to AA_P and TT_P.
Because d affects dd along path P, which contains unreturned calls, by Assumption 1 those unreturned calls must be returned to first before any other unreturned calls can be made from the execution-path continuation point of dd onward. Hence, AA_P ← {the unreturned calls of P}. Because d had no backward-flow restrictions, it follows that once all the unreturned calls of P are returned to by the execution-path continuation, that continuation no longer has any backward-flow restrictions. Because of Assumption 3 and Lemma 1, all the unreturned calls of P are returned to when the sequentially first unreturned call in P is returned to. Hence, TT_P ← {the first unreturned call in P}. For case (2), as shown in the proof of Theorem 1 case (2), A and T contribute nothing to AA_P and TT_P when K is defined for P. Thus, this case (2) is effectively the same as case (1), because the A and T sets contribute nothing and an unreturned call in K is an unreturned call in P. Therefore, AA_P ← {the unreturned calls of P} and TT_P ← {the first unreturned call in P}. From Definition 1 and the general definitions of AA, TT, AA_P, and TT_P, it follows that AA ← ∪_{all such P} AA_P and TT ← ∪_{all such P} TT_P. Thus, AA ← ∪_{all such P} {the unreturned calls of P}, and TT ← ∪_{all such P} {the first unreturned call in P}. □

Theorem 3. If A ≠ ∅, K is not defined for P, and P has no unreturned calls, then AA ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure containing dd, such that each unreturned call in this possible execution path is in A)}, and TT ← AA ∩ T.

Proof. Note that only one procedure contains dd. Because K is not defined for P, it follows that P was constrained in its entirety by A, never making an unmatched return to a call ∈ T. Because P has no unreturned calls, d can only affect dd along P by making one or more unmatched returns to calls ∈ (A − T), unless d and dd are in the same procedure.
A, in effect, represents possible execution paths with unreturned calls by which d was affected. However, once given P, the path P may eliminate some of the paths from A as being possible, and return to some of the unreturned calls in A. Thus, although P contributes nothing directly to AA, it may narrow the unreturned execution-path possibilities that A can contribute to AA. AA as defined for this theorem captures all execution paths in A that begin with a call ∈ T and end with a call of the procedure that contains dd. Given Assumption 3, it should be obvious that these are all the possible paths in A that are unreturned after P. Note that if d and dd are in the same procedure, then AA = A and TT = T. Assume that d and dd are in different procedures. Any call ∈ A that is not part of at least one path in A that makes a call of the procedure containing dd must be excluded from AA, because P requires a path in A that passes through the procedure containing dd, as otherwise P could not make a return to the procedure containing dd. Any call ∈ A that is on a path in A between the procedure containing dd and the procedure containing d must be excluded from AA, because the procedure containing dd has been returned to by P. The definition of AA for this theorem satisfies these two exclusions. That TT ← AA ∩ T follows from Definition 1 requiring TT ⊆ AA, and from the definition of AA for this theorem. □

Theorem 4. If A ≠ ∅, K is not defined for P, P has at least one unreturned call, and the first unreturned call in P is contained in procedure X, then S1 ← ∪_{all such P given X} {the unreturned calls of P}, S2 ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure X, such that each unreturned call in this possible execution path is in A)}, AA ← S1 ∪ S2, and TT ← S2 ∩ T.

Proof. S1 follows from Definition 1 and the proof of Theorem 2.
S2 follows from Theorem 3, where the specific "procedure containing dd" in the expression for AA in Theorem 3 has been replaced by the equally specific "procedure X". To show that the union operation forming AA, combining S1 and S2, does not thereby represent spurious paths in AA, it is only necessary to show that the paths represented in S1 never cross the paths represented in S2. Two paths cross if each path makes an unreturned call to the same procedure. All paths in S2 end with an unreturned call of procedure X. All paths in S1 begin with an unreturned call contained in procedure X. Assume that both S1 and S2 include an unreturned call to the same procedure. As all paths in S2 lead to procedure X, this means there exists an execution path that originates in procedure X and eventually calls procedure X. Thus, the execution path represents recursion, and this is contradicted by Assumption 3. Therefore, the paths represented in S1 never cross the paths represented in S2. The first unreturned call in P is not added to TT because the path P is an extension of the unreturned paths represented in S2. That TT ← S2 ∩ T follows from Definition 1, which requires TT ⊆ AA, and from the definition of AA for this theorem. □

Figure 3.1. An example call structure that does not allow overestimation.

The four theorems given above will be used to build the algorithm given in the next section. In effect, a given Allow set represents possible execution paths with unreturned calls by which the definition associated with that Allow set was affected. Inversely, the Allow set identifies, in effect, those continuation paths that can make unmatched returns. However, missing from the Allow set is the information needed to enforce an ordering of the unmatched returns that the continuation path may make. To a large extent, this missing information is unnecessary because of Lemma 1. Typically, the call structure of the program itself enforces the ordering of the unmatched returns.
Figure 3.1 is an example. Assume d affects dd, giving an Allow set of {c1,c2} for dd. Given a continuation path from dd, it is not possible for c1 to be returned to before c2, so the correct ordering of unmatched returns is enforced by the program itself. However, there are cases where the missing ordering information can result in a continuation path taking unwanted shortcuts. Figure 3.2 gives an example of a call structure that allows the continuation path from dd to make an unwanted shortcut under the right circumstances. Assume d affects dd along the paths c1-c2 and c3-c4, giving an Allow set of {c1,c2,c3,c4} for dd. Assume the continuation path is r2-c5-r3, where r2 and r3 are unmatched returns to calls c2 and c3.

Figure 3.2. An example call structure that allows overestimation.

The unmatched return r3 should not be allowed to happen before an unmatched return r4, but this unmatched-return ordering will not be enforced by the Allow set defined in this dissertation, so the assumed continuation path is possible. By virtue of such a spurious continuation path, dd may be able to affect a definition or use that it would not otherwise be able to affect if dd were confined to only legitimate continuation paths. In practical terms, this means that the computed logical ripple effect, which consists of affected definitions and uses, may in fact be an overestimate because of spurious continuation paths. Although the Allow set does permit spurious continuation paths under the right circumstances, of which Figure 3.2, together with the assumed paths by which d affected dd, is the simplest example, we feel that these circumstances, along with spurious paths that affect what would otherwise be unaffected, will not occur often enough in real programs to undermine the general usefulness of the Allow set in constraining backward flow and permitting computation of a precise or semi-precise logical ripple effect.
3.2 The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logical ripple effect. After a brief overview of the algorithm, the dataflow analysis method used by the algorithm is discussed. Then, two important properties of the dataflow sets are detailed, followed by three rules that are used to impose backward-flow restrictions on the dataflow analysis that is done. Last are proofs that the algorithm is correct.

The algorithm to compute logical ripple effect is shown in Figure 3.3. Each statement in the algorithm is numbered on the left. For convenience, algorithm statements will be referred to as lines. For example, a reference to line 28 means the statement at 28 that actually is printed on several lines. Comments in the algorithm begin with . ⊥ and ⊤ are just two different, fixed, arbitrary values.

In general, the algorithm works as follows. A definition d and its associated Allow and Transform sets are popped from the stack (line 7), and then the reaching-definitions dataflow problem is solved for this definition d, imposing any backward-flow restrictions represented by the Allow and Transform sets (line 8). Reaching definitions for a single definition is the problem of finding all uses and definitions affected by the definition. The definition d that was dataflow analyzed, and any uses affected by it, are included in the ripple effect (lines 9 to 11). Each affected definition will have its Allow and Transform sets determined in accordance with Theorems 1 through 4 (lines 22 to 46). A check is then made to see whether the affected definition and its restriction sets, Allow and Transform, should be added to the stack for dataflow analysis (lines 47 to 52). The algorithm ends when the stack is empty. Although the algorithm shows a single definition b being added to the stack at line 5, any number of different b can actually be added, along with empty restriction sets for each b.
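The driver loop just described can be sketched in Python. This is only an illustrative sketch, not the prototype's implementation: solve_reaching_defs, affected_uses, and restriction_sets_for are hypothetical stand-ins for the dataflow solve (line 8), the affected-use scan (lines 10 to 11), and the per-theorem construction of restriction sets (lines 22 to 46).

```python
# Sketch of the stack-driven loop of the ripple-effect algorithm (lines 1-11),
# including the repetition check of procedure Analyze (lines 47-52).
# solve_reaching_defs, affected_uses, and restriction_sets_for are
# hypothetical callables standing in for parts of Figure 3.3.

def ripple_effect(b, solve_reaching_defs, affected_uses, restriction_sets_for):
    ripple = set()
    fin = {}           # FIN_dd: True once dd has been analyzed with no restrictions
    saved_pairs = {}   # previously used (PATHS, TRANS) pairs, per definition
    stack = [(b, frozenset(), frozenset())]   # (definition, ALLOW, TRANSFORM)
    while stack:
        d, allow, transform = stack.pop()
        solution = solve_reaching_defs(d, allow, transform)   # line 8
        ripple.add(d)
        ripple.update(affected_uses(solution))                # lines 9-11
        for dd, paths, trans in restriction_sets_for(solution):   # lines 22-46
            if fin.get(dd):                       # line 47: already done unrestricted
                continue
            pairs = saved_pairs.setdefault(dd, [])
            if not paths:                         # lines 48-50
                fin[dd] = True
                stack.append((dd, frozenset(), frozenset()))
            elif all(not (paths <= p and trans <= t) for p, t in pairs):
                pairs.append((paths, trans))      # lines 51-52
                stack.append((dd, paths, trans))
    return ripple
```

The subset test in the elif mirrors line 47: a pair is re-analyzed only if it is not contained in some previously used pair.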
Compute the logical ripple effect for a hypothetical or actual definition b
Input: a program flowgraph ready for dataflow analysis
Output: the logical ripple effect in RIPPLE
begin
1   RIPPLE ← ∅
2   for each definition dd in the program
3       FIN_dd ← ⊥
    end for
4   stack ← ∅
5   push (b, ∅, ∅) onto stack
6   while stack ≠ ∅ do
7       pop stack into (d, ALLOW, TRANSFORM)
8       Solve the reaching-definitions dataflow equations for the single definition d, using Rules 1, 2, and 3.
9       RIPPLE ← RIPPLE ∪ {d}
10      for each use u in the program that is affected by either d1 or d2
11          RIPPLE ← RIPPLE ∪ {u}
        end for
12      ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13      for each call node n in the flowgraph
14          if d1 ∈ B_out[n] and d1 crossed from this call into the called procedure
15              ROOT1 ← ROOT1 ∪ {the call node n} fi
16          if d1 ∈ E_out[n] and d1 crossed from this call into the called procedure
17              LINK1 ← LINK1 ∪ {the call node n} fi
18          if d2 ∈ B_out[n] and d2 crossed from this call into the called procedure
19              ROOT2 ← ROOT2 ∪ {the call node n} fi
20          if d2 ∈ E_out[n] and d2 crossed from this call into the called procedure
21              LINK2 ← LINK2 ∪ {the call node n} fi
        end for

Figure 3.3. The logical ripple effect algorithm.
22      for each definition dd in the program that is affected by either d1 or d2
            determine Allow and Transform for dd by Theorem 1
23          if d2 ∈ B_in[node where dd occurs]
24              PATHS ← ∅, TRANS ← ∅
25              call Analyze
            else
                determine Allow and Transform for dd by Theorem 2
26              if d2 ∈ E_in[node where dd occurs]
27                  PATHS ← ∅
28                  PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ (PATHS ∩ LINK2))}
29                  TRANS ← ROOT2 ∩ PATHS
30                  call Analyze fi
                determine Allow and Transform for dd by Theorem 3
31              if d1 ∈ B_in[node where dd occurs]
32                  PATHS ← ∅
33                  PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ PATHS)}
34                  TRANS ← TRANSFORM ∩ PATHS
35                  call Analyze fi
                determine Allow and Transform for dd by Theorem 4
36              if d1 ∈ E_in[node where dd occurs]
37                  for each procedure X that contains a call ∈ ROOT1
38                      RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39                      PP ← ∅
40                      PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively begins with a call ∈ RT1 and ends with a call of the procedure that contains dd, such that each call in this path is in (RT1 ∪ LINK1))}
41                      if PP ≠ ∅
42                          PATHS ← ∅
43                          PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls a procedure that contains a call c ∈ PATHS)}
44                          TRANS ← TRANSFORM ∩ PATHS
45                          PATHS ← PATHS ∪ PP
46                          call Analyze
        end statements: fi, end for, fi, fi, end for, od
end

Figure 3.3. continued.

Procedure Analyze
begin
        avoid repetition of dd dataflow analysis if possible
47      if FIN_dd ≠ ⊤ ∧ (PATHS = ∅ ∨ (true for all saved pairs for dd: PATHS ⊈ P ∨ TRANS ⊈ T))
48          if PATHS = ∅
49              FIN_dd ← ⊤
50              push (dd, ∅, ∅) onto stack
            else
51              save PATHS and TRANS as the pair P × T for dd
52              push (dd, PATHS, TRANS) onto stack fi fi
end

Figure 3.3. continued.

The dataflow equations referred to in line 8 are shown in Figure 3.4. These equations are copied from Chapter 2, which presents a method for context-dependent flow-sensitive interprocedural dataflow analysis.
The method consists of solving, using the standard iterative algorithm, the dataflow equations shown in Figure 3.4, for the program flowgraph required by the equations. The method in Chapter 2 includes a solution to the problems of parameter aliasing and implicit definitions, which are part of the interprocedural reaching-definitions problem. We assume that the full method of Chapter 2 would be used, but we do not discuss these side issues in this chapter as they are not directly relevant to the algorithm. Note that there are other methods for context-dependent flow-sensitive interprocedural dataflow analysis [3, 9, 17, 21], but the method of Chapter 2 has precision and efficiency advantages over the other methods cited.

For any node n:
    IN[n] = E_in[n] ∪ B_in[n]
    OUT[n] = E_out[n] ∪ B_out[n]

Group I: n is an entry node.
    B_in[n] = ∅
    E_in[n] = ∪_{p ∈ pred(n)} {x | x ∈ OUT[p] ∧ C1}
    B_out[n] = GEN[n]
    E_out[n] = E_in[n] ∪ RECODE[n]

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.
    B_in[n] = {x | (x ∈ B_out[p] ∧ (¬C1 ∨ (C1 ∧ C2 ∧ x ∈ E_out[q]))) ∨ (x ∈ B_out[q] ∧ C2)}
    E_in[n] = {x ∈ E_out[p] | ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ E_out[q])}
    B_out[n] = (B_in[n] − KILL[n]) ∪ GEN[n]
    E_out[n] = E_in[n] − KILL[n]

Group III: n is not an entry or return node.
    B_in[n] = ∪_{p ∈ pred(n)} B_out[p]
    E_in[n] = ∪_{p ∈ pred(n)} E_out[p]
    B_out[n] = (B_in[n] − KILL[n]) ∪ GEN[n]
    E_out[n] = E_in[n] − KILL[n]

Figure 3.4. Dataflow equations for the reaching-definitions problem.

Referring to the dataflow equations of Figure 3.4, four sets are computed for each flowgraph node: two body sets, B_in and B_out, and two entry sets, E_in and E_out. All body and entry sets are initially empty. As the equations will be solved for only a single definition d, the GEN set for the node where d occurs, i.e. the node whose
The node where d occurs is the natural starting point for the iterative algorithm, that will recompute the body and entry sets for the nodes until stability is attained and the sets cease to change, at which point the equations have been solved. Once solved, an element is in the entry set or body set at a particular node depending on how that element was propagated to that node. The same element may be in both sets at the same node. Properties 1 and 2 listed below, summarize those implications of set membership that are used by the algorithm. The properties follow directly from the dataflow equations. Property 1. For any node n, an element is in the Ei,[n] set or E,,t[n] set if and only if that element entered the procedure that contains node n from a call node, and there is a definitionclear path from that call node to node n. Thus, membership in the entry set of node n implies that the element can propagate to node n by an execution path that makes at least one unreturned call between the point where the element is generated and the point where node n occurs. Property 2. For any node n, an element is in the Bi,[n] set or Bo,,t[n] set if and only if that element was generated in the same procedure that contains node n, or that element entered the procedure that contains node n from an exitnode B,,t set. There must also be a definitionclear path to node n from either the element's generation node or from the exit node. If the element entered from an exitnode Bot set, then Property 2 applies recursively to the element in that B,,t set. Thus, membership in the body set of node n implies that the element can propagate to node n by an execution path between the point where the element is generated and the point where node n occurs that does not include any unreturned calls. The three rules referred to in line 8 are listed below. Rule 1 applies before the dataflow equations are solved. Rules 2 and 3 apply as the equations are being solved. 
The rules impose the backward-flow restrictions represented by the ALLOW and TRANSFORM sets in line 7.

Rule 1. If ALLOW = ∅ then element d2 is generated at the node where definition d occurs; otherwise d1 is the generated element, meaning the element in the GEN set.

Both d1 and d2 are base elements that represent the same definition d. Both elements are identical in terms of when they appear in any given KILL set. The only difference between them is that d1 and d2 are treated differently by Rules 2 and 3 below. If the ALLOW set is empty, then by Definition 1 there should be no backward-flow restrictions on d. Rule 1 accomplishes this requirement, as d2 is immune to the backward-flow restrictions which are imposed by Rule 2.

Rule 2. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the B_in[n] equation is computed, if d1 ∈ B_out[q], then d1 cannot cross from B_out[q] into the B_in[n] set if p ∉ ALLOW.

In the dataflow equations, the crossing of an element from an exit-node body set to a return node is the only action in the equations that represents, in effect, an unmatched return to a call instance that was made in an execution path leading up to the program point where definition d occurs, which is the starting point of the reaching-definitions analysis done for d. Thus, Rule 2 covers all cases in which an unmatched return occurs. Rule 2 restricts unmatched returns to those call instances that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW set as given by Definition 1.

Rule 3. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the B_in[n] equation is computed, if d1 ∈ B_out[q], and, by C2 and Rule 2, d1 can cross from B_out[q] into the B_in[n] set, and p ∈ TRANSFORM, then as this d1 element crosses from B_out[q] into the B_in[n] set, the element is changed to d2.
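Rules 2 and 3 can be pictured together as a single decision made whenever the equations would carry element d1 or d2 across from an exit-node body set into a return node (assuming the equations' own crossing condition C2 already holds). The helper below is hypothetical and purely illustrative; the real checks are embedded in the B_in[n] equation of Group II.

```python
# Sketch of the decision at an unmatched return: what, if anything, crosses
# from B_out[q] into B_in[n], where n is the return node of call node p.
# Returns the element that crosses, or None if the crossing is blocked.

D1, D2 = "d1", "d2"

def cross_unmatched_return(element, p, allow, transform):
    if element == D2:
        return D2      # d2 carries no backward-flow restrictions
    if p not in allow:
        return None    # Rule 2: this unmatched return is not represented in ALLOW
    if p in transform:
        return D2      # Rule 3: restrictions are removed at this return point
    return D1          # crossing allowed, element remains restricted
```

So a d1 element either is stopped, crosses unchanged, or crosses and sheds its restrictions by becoming d2, exactly the three outcomes the two rules describe.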
In effect, d1 is transformed into d2, and the return node n becomes a generation node for the d2 element. As already mentioned, in the dataflow equations the crossing of an element from an exit-node body set to a return node is the only action in the equations that represents, in effect, an unmatched return to a call instance that was made in an execution path leading up to the program point where definition d occurs, which is the starting point of the reaching-definitions analysis done for d. Thus, Rule 3 covers all cases in which an unmatched return occurs. The requirement by Rule 3 that the returned-to call be in the TRANSFORM set satisfies Definition 1 as to when backward-flow restrictions can be ignored. Rule 3 replaces element d1, which is subject to the backward-flow restrictions, with element d2, which is free of backward-flow restrictions, at the return point, and thereby satisfies Definition 1 regarding removal of backward-flow restrictions on the execution-path continuation, since d2 now represents the continuation instead of d1.

Lemma 2. The algorithm computes at lines 23 to 46 the restriction sets for an affected definition in accordance with Theorems 1 through 4.

Proof. We first establish the properties of the LINK and ROOT sets computed at lines 12 to 21. Let p be the node, if any, where d1 is generated. Let q be any node where d2 is generated, i.e. those return nodes where d1 is transformed into d2, or, for ALLOW = ∅, the node where definition d occurs. The tests at lines 14 and 18 make use of Property 2: if an element is in the B_out set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n, and the path has no unreturned calls. The call at node n would be the first unreturned call on that path by just extending the path to the entry node of the called procedure.
Therefore, the ROOT1 set represents all calls that are the first unreturned call on at least one definition-clear path between node p and some other node in the flowgraph. The ROOT2 set represents all calls that are the first unreturned call on at least one definition-clear path between node q and some other node in the flowgraph.

The tests at lines 16 and 20 make use of Property 1: if an element is in the E_out set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n, and the path includes the unreturned call that called the procedure containing node n. The call at node n would be at least the second unreturned call on that path by just extending the path to the entry node of the called procedure. Therefore, the LINK1 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node p and some other node in the flowgraph. The LINK2 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node q and some other node in the flowgraph.

The test at line 23 checks for the application of Theorem 1. If d2 ∈ B_in[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls, and somewhere along P, d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 1, and line 24 sets PATHS and TRANS to empty in accordance with the theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line 26 checks for the application of Theorem 2. If d2 ∈ E_in[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call, and somewhere along P, d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 2.
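The recursively defined PATHS sets used at lines 28, 33, and 43 of Figure 3.3, discussed next, can be evaluated by iterating to a fixed point, building paths backwards from dd. The sketch below is illustrative only: candidates plays the role of (ROOT2 ∪ LINK2) or ALLOW, extendable plays the role of LINK2 (or all of ALLOW at line 33), and calls_proc and containing_proc are hypothetical accessors giving the procedure a call invokes and the procedure that contains the call.

```python
# Fixed-point evaluation of a recursively defined PATHS set: a call joins
# PATHS if it calls the target procedure, or calls a procedure containing
# an already-selected extendable call. Iterate until nothing more is added.

def build_paths(candidates, extendable, target_proc, calls_proc, containing_proc):
    paths = set()
    changed = True
    while changed:
        changed = False
        # procedures from which the current fringe can be extended backwards
        reachable = {target_proc} | {containing_proc(c) for c in paths
                                     if c in extendable}
        for c in candidates:
            if c not in paths and calls_proc(c) in reachable:
                paths.add(c)
                changed = True
    return paths
```

This mirrors the backward path building justified by Lemma 1: start from the calls of the procedure containing dd and repeatedly select calls of procedures that contain an already-selected call.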
Only the d2 element satisfies the theorem, so it follows that all paths P for the theorem will have to be constructed from the ROOT2 and LINK2 sets exclusively. Referring to Theorem 2, line 28 computes the AA set, and line 29 computes TT. At line 28, the PATHS set is defined in terms of itself. This recursive reference means that each time a call is added to the PATHS set, the condition containing the recursive reference must be reevaluated, because additional calls may thereby be added to PATHS. Recursive references are similarly used in lines 33 and 43. What line 28 does is extract, from all the calls that element d2 crossed, just those calls that are on a path to dd. This is done by building the paths backwards, beginning with those calls that call the procedure containing dd. By Lemma 1, any path between d and dd consisting of unreturned calls can be found by proceeding in reverse order from dd and selecting those calls that call a procedure containing a call already selected. Backward path building and Lemma 1 are similarly used in lines 33 and 43. By the properties of the ROOT and LINK sets, the paths constructed by line 28 will be definition-clear. Notice that a particular call may be in both the ROOT2 and LINK2 sets, but if a call is only in the ROOT2 set, then it cannot be used as the basis for extending any path further backwards, because by Property 1, d2 does not propagate from the entry node of the procedure that contains that call to the call node for that call. This is the reason for the (PATHS ∩ LINK2) requirement in line 28. Once the PATHS set is computed, line 29 computes TRANS in accordance with the theorem.

The test at line 31 checks for the application of Theorem 3. If d1 ∈ B_in[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls.
It also follows that ALLOW ≠ ∅ and P does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the element, meaning K is not defined for P. This satisfies the conditions of Theorem 3. Referring to Theorem 3, line 33 computes the AA set, and line 34 computes TT. What line 33 does is extract from ALLOW all paths that end with a call of the procedure containing dd. Although Theorem 3 states that the path begins with a call ∈ TRANSFORM, line 33 does not require a check for this, because TRANSFORM is a subset of ALLOW, and those first unreturned calls in TRANSFORM that are on a path in ALLOW to dd will unavoidably be picked up as the paths are built backwards from dd. Thus, the PATHS set is computed in accordance with Theorem 3, followed by line 34 that computes the TRANS set in accordance with the theorem.

The test at line 36 checks for the application of Theorem 4. If d1 ∈ E_in[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call. It also follows that ALLOW ≠ ∅ and P does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the element. This satisfies the conditions of the theorem. Only the d1 element satisfies the theorem, so it follows that all paths P for the theorem will have to be constructed from the ROOT1 and LINK1 sets exclusively. Referring to Theorem 4, line 40 computes the S1 set, line 43 computes the S2 set, line 44 computes the TT set, and line 45 computes the AA set. The reason for the test at line 41 is that although there exists at least one path P satisfying the theorem, there may not be any paths P that begin in the specific procedure X. It can be seen that lines 37 to 46 compute in accordance with the theorem. □

Lemma 3. Let A_i and T_i be one pair of Allow and Transform sets associated with a definition d, and let A_j and T_j be a different pair of Allow and Transform sets associated with the same definition d. Assume A_i ≠ ∅ and A_j ≠ ∅.
If A_j ⊆ A_i and T_j ⊆ T_i, then dataflow analyzing d with the pair A_j and T_j cannot add anything to the ripple effect that is not added by dataflow analyzing d with the pair A_i and T_i.

Proof. By inspection of Rules 1, 2, and 3, it can be seen that removing some of the calls from A_i or T_i cannot make d affect anything that it does not affect with A_i and T_i as they were. Also, by inspection of lines 23 to 46, the determination of the Allow and Transform sets for any definition dd affected by d cannot be made to include calls, when A_j and T_j are the restriction sets for d, that would not be included when A_i and T_i are the restriction sets for d. □

Lemma 4. Let A and T be Allow and Transform sets associated with a definition d, and let X and Y be a different pair of Allow and Transform sets associated with the same definition d. If A = ∅, then dataflow analyzing d with X and Y cannot add anything to the ripple effect that is not added by dataflow analyzing d with A and T.

Proof. By Rule 1, d will be represented by d2 and have no restrictions on its backward flow. Thus, d will affect everything that it is possible for it to affect. If d is dataflow analyzed with X and Y, then any calls found in the ROOT1, ROOT2, LINK1, or LINK2 sets will also be found in the ROOT2 or LINK2 sets when d is dataflow analyzed with A and T. These sets determine the restriction sets associated with a definition dd affected by d. It follows that any dataflow path allowed for a dd affected by d using X and Y will also be allowed for a dd affected by d using A and T. □

Theorem 5. Given Definition 1 and Theorems 1 through 4, the algorithm will correctly compute the logical ripple effect.

Proof. As shown by Lemma 2, for any affected definition dd, the Allow and Transform sets to be associated with dd are computed in accordance with Theorems 1 to 4.
By Lemma 4, if Theorem 1 applies to an affected definition (line 23), then there is no need to check whether any other theorem also applies, because additional dataflow analysis resulting from the other theorems cannot contribute to the ripple effect. However, if Theorem 1 does not apply, then the definition must be dataflow analyzed separately in turn for each theorem that does apply. This is done by the sequence of three if statements at lines 26, 31, and 36. Thus, the control logic in lines 23 to 46 is safe.

The Analyze procedure (lines 47 to 52) prepares a definition and its restriction sets for dataflow analysis by adding them to the stack (lines 50 and 52). Once a definition will be dataflow analyzed with no restrictions (line 50), it will not be analyzed again (line 47). By Lemma 4, this is safe. Assuming FIN_dd ≠ ⊤ and PATHS ≠ ∅, the test at line 47 will not prepare a definition for dataflow analysis if both restriction sets are subsets of any pair of restriction sets used previously to analyze that definition. This follows from Lemma 3. Thus, the Analyze procedure is safe.

The correctness of the dataflow equations (line 8) is established in Chapter 2, and the correctness of the three rules for imposing backward-flow restrictions (line 8) has already been discussed. Regarding the correctness of having no backward-flow restrictions for the initial definition (line 5), let p be the program point where b occurs. For execution to attain point p, any possible execution path between the program's execution starting point and point p can be assumed to have occurred. Thus, there should be no restrictions on the backward-flow possibilities of b, because there were no constraints imposed by the ripple effect on how point p was initially attained. □

Programs with recursive calls can be processed by our algorithm, but there may be some overestimation of the logical ripple effect because of the recursive calls.
The dataflow equations (line 8) are not the problem, as they work for recursive programs. Instead, the problem is with the Allow set and its representation of execution paths. If a cyclic execution path is represented in the Allow set, then when the Allow set is used to restrict backward flow by Rule 2, it may be possible for an element moving through the program flowgraph to take a shortcut on its unmatched returns, avoiding having to make unmatched returns along the complete cycle before a program point can be attained. This shortcut may permit the element to affect something that it should not be able to affect, possibly adding to the ripple effect beyond what should be there.

3.3 A Prototype Demonstrates the Algorithm

This section first considers the complexity of our interprocedural logical ripple effect algorithm. A prototype that demonstrates the algorithm is then described, and test results presented.

Let n be the number of nodes in the flowgraph of the input program. For a programming language such as C, solving the dataflow equations for a single definition, which is what line 8 does, has worst-case complexity of O(n). Let k be the number of known calls in the input program. Considering line 47, a definition may be dataflow analyzed repeatedly as long as the associated restriction sets are not subsets of any previous pair of restriction sets used to dataflow analyze that definition. The number of different restriction sets possible such that no set is a subset of another set is a number that grows exponentially with k. Thus, the worst-case complexity of our logical ripple effect algorithm is exponential, where the exponent is some function of k.
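To give one concrete sense of this growth (a standard combinatorial bound, not part of the original argument): a family of restriction sets drawn from the k known calls, in which no set is a subset of another, is an antichain in the subset lattice, and by Sperner's theorem the largest such family has size

```latex
\binom{k}{\lfloor k/2 \rfloor} \;\approx\; \frac{2^{k}}{\sqrt{\pi k / 2}},
```

which is exponential in k, consistent with the worst-case claim above.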
However, for the typical input program, the actual number of non-subset restriction sets that can be generated by our algorithm for a given definition will be severely constrained by a combination of Lemma 1, Theorems 1 through 4, and the typical program call structure that is characterized by shallow call depth.

A prototype that demonstrates our logical ripple effect algorithm has been built. The prototype accepts as input C programs that satisfy certain constraints, such as having only single-identifier variable names. Given an input program, the prototype then requires that one or more definitions be identified as the starting point of the ripple effect. For purposes of comparison, besides using our algorithm to compute a precise logical ripple effect, the prototype also computes an overestimate of the logical ripple effect. The overestimate is computed by simply ignoring the execution-path problem, i.e. there are no backward-flow restrictions when the overestimate is computed. The worst-case complexity of computing the overestimate for C programs is only O(nd), where n is the number of flowgraph nodes and d is the number of definitions in the overestimated ripple effect. This complexity follows from the O(n) complexity of solving the dataflow equations for a single definition, and the fact that the equations will have to be solved d times.

Table 3.1. Experimental results for the prototype.
global  defs  defs global  depth  nodes   RSo    RSp   reduction  time_o  time_p
  50    2420      7%       2/213  3939   2275    936     53.4%       5s      3s
 100    2291     15%       2/188  3776   4151   2449     41.0%      17s     13s
 200    2294     30%       2/188  3662   5594   3718     33.5%      40s     32s
 300    2370     45%       2/231  3962   5897   2607     55.8%     1m5s     27s
  50    2225      7%       3/202  3717   1222    633     40.3%       3s      2s
 100    2333     15%       3/229  3864   4139   1867     54.9%      17s      7s
 200    2211     30%       3/231  3760   4884   2688     45.0%      39s     28s
 300    2236     45%       3/205  3737   5308   3505     34.0%      59s     38s
  50    2320      7%       4/227  3912   1822   1067     35.1%       5s      3s
 100    2211     15%       4/228  3673   4329   1525     64.8%      18s      7s
 200    2223     30%       4/227  3705   5019   1918     61.8%      37s     16s
 300    2214     45%       4/214  3648   5922   4740     20.0%     1m9s   1m36s
 100    4354      7%       2/372  6858   4317   2201     40.0%      19s     10s
 200    4467     15%       2/368  7068   8844   6457     27.0%    1m17s   1m12s
 400    4261     30%       2/388  6851   9653   2976     69.2%    2m29s     49s
 600    4289     45%       2/340  6784  10590   6840     35.4%     4m8s   3m56s
 100    4314      7%       3/432  6781   1993    631     52.5%       8s      2s
 200    4268     15%       3/395  6876   5795   3236     35.5%      51s     54s
 400    4223     30%       3/393  6735   9240   7307     20.9%    2m26s   4m21s
 600    4248     45%       3/433  6868   9772   6453     30.6%    3m56s   4m50s
 100    4252      7%       4/455  6961   2756   1120     42.6%      14s      5s
 200    4276     15%       4/440  6858   7781   5752     26.1%    1m10s   2m35s
 400    4228     30%       4/391  6681   9838   8290     15.7%    2m45s   9m20s
 600    4112     45%       4/462  6802  10017   9192      8.2%    4m24s  39m55s

Table 3.1 presents test results for the prototype. Each row details relevant characteristics of an input program, and presents the resulting averages of ten different tests of that input program, where each test computed the ripple effect started by a single, randomly chosen definition of a global variable. The input programs of Table 3.1 were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program of Table 3.1 has 100 procedures, and exactly the number of global variables listed. Within each input program, each global variable is defined and used at least once.
The call structure of each input program was determined randomly by the generator, with the constraints that there be no recursion in the input program, and that the given maximum call depth not be exceeded by any call in the input program. All calls in the generated input program are known calls, and approximately 1/(max + 1) of the calls will be at each possible depth from zero to max, where max is the given maximum call depth.

Referring to the columns of Table 3.1, "global" is the number of global variables in the input program, "defs" is the number of definitions in the input program, "defs global" is the percentage of the definitions that define a global variable, "depth" is the maximum call depth followed by the total number of calls in the input program, "nodes" is the number of nodes in the flowgraph, "RSo" is the average size of the overestimated ripple effect for the ten test cases, where size is the total number of definitions and uses in the ripple effect, "RSp" is the average size of the precise ripple effect, "reduction" is the average percentage reduction for the ten test cases of the size of the overestimated ripple effect when it is replaced by the precise ripple effect, "time_o" is the average CPU usage time for each test case to compute the overestimated ripple effect, and "time_p" is the average CPU usage time for each test case to compute the precise ripple effect. The hardware used was rated at roughly 24 MIPS. As an example of the time notation used in Table 3.1, the time 1m36s would be read as 1 minute, 36 seconds.

Although the worst-case complexity of our algorithm for precise logical ripple effect is exponential, the data of Table 3.1 indicates that the expected complexity for a wide range of input programs, given a programming language such as C, is approximated by O(nd). This follows from the O(nd) worst-case complexity of computing the overestimate, and the typical closeness of time_o and time_p for each row in Table 3.1.
However, the last row of Table 3.1 is instructive, because it shows that regardless of what the expected complexity might be, there will always be specific input programs and starting points that require time greatly exceeding the time needed to compute the overestimate. In practice, if the computation of the precise logical ripple effect is taking too long, it can be abandoned and the overestimate computed and used in its place. Note that our algorithm can easily compute the overestimate by simply modifying Rule 1 so that element d2 is always generated in place of element d1, thereby avoiding all backward-flow restrictions.

3.4 The Slicing Algorithm

This section presents the inverse form of the precise interprocedural logical ripple effect algorithm, along with the inverse forms of the associated dataflow equations and backward-flow restriction rules. Our algorithm for precise interprocedural slicing is shown in Figure 3.5. The complexity and expected performance of this algorithm are the same as for the precise interprocedural logical ripple effect algorithm given previously.

For logical ripple effect, the dataflow problem solved at line 8 was reaching definitions for a single definition. For slicing, which is the inverse problem, the dataflow problem solved at line 8 is reaching uses for a single use. In reaching definitions, a definition flows in the direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects uses of the same variable and any definitions directly dependent on an affected use. In reaching uses, a use flows in the reverse direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects definitions of the same variable and any uses that directly determine an affected definition.
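As a concrete illustration of this description, consider the hypothetical four-line fragment below (the variable names and values are ours, not from the text). Slicing on the use of x in the last line flows backward, passes over the unrelated definition of z, is stopped at the definition of x, and then picks up the use of y that directly determines that definition.

```python
# A tiny hypothetical fragment annotated with slice membership for a slice
# taken on the use of x in the final line.
y = 3           # in the slice: this value of y directly determines x below
x = y + 1       # in the slice: the definition of x reached by the backward flow
z = 2           # not in the slice: x is neither defined nor used here
result = x * 2  # the use of x that starts the slice
assert result == 8
```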
This reverse flow in the flowgraph means that the dataflow equations solved at line 8 of the slicing algorithm must be an inverted form of the dataflow equations used by the logical ripple effect algorithm. These inverted dataflow equations are shown in Figure 3.6. The inverted rules that the slicing algorithm uses for backward-flow restriction are given below. Notice that the ALLOW and TRANSFORM sets will contain returns instead of calls.

Compute the slice for a hypothetical or actual use b
Input: a program flowgraph ready for dataflow analysis
Output: the slice in SLICE
begin
1   SLICE ← ∅
2   for each use uu in the program
3       FIN_uu ← F
    end for
4   stack ← ∅
5   push (b, ∅, ∅) onto stack
6   while stack ≠ ∅ do
7       pop stack into (u, ALLOW, TRANSFORM)
8       Solve the reaching-uses dataflow equations for the single use u, using Rules 1, 2, and 3.
9       SLICE ← SLICE ∪ {u}
10      for each definition d in the program that is affected by either u1 or u2
11          SLICE ← SLICE ∪ {d}
        end for
12      ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13      for each return node n in the flowgraph
14          if u1 ∈ B_in[n] ∧ u1 crossed from this return into the returned-from procedure
15              ROOT1 ← ROOT1 ∪ {the return node n} fi
16          if u1 ∈ E_in[n] ∧ u1 crossed from this return into the returned-from procedure
17              LINK1 ← LINK1 ∪ {the return node n} fi
18          if u2 ∈ B_in[n] ∧ u2 crossed from this return into the returned-from procedure
19              ROOT2 ← ROOT2 ∪ {the return node n} fi
20          if u2 ∈ E_in[n] ∧ u2 crossed from this return into the returned-from procedure
21              LINK2 ← LINK2 ∪ {the return node n} fi
        end for

Figure 3.5. The slicing algorithm.
22      for each use uu in the program that is affected by either u1 or u2
            determine ALLOW and TRANSFORM for uu by Theorem 1
23          if u2 ∈ B_out[node where uu occurs]
24              PATHS ← ∅, TRANS ← ∅
25              call Analyze
            else
            determine ALLOW and TRANSFORM for uu by Theorem 2
26          if u2 ∈ E_out[node where uu occurs]
27              PATHS ← ∅
28              PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x returns from the procedure that contains uu ∨ x returns from a procedure that contains a return r ∈ (PATHS ∩ LINK2))}
29              TRANS ← ROOT2 ∩ PATHS
30              call Analyze fi
            determine ALLOW and TRANSFORM for uu by Theorem 3
31          if u1 ∈ B_out[node where uu occurs]
32              PATHS ← ∅
33              PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that contains uu ∨ x returns from a procedure that contains a return r ∈ PATHS)}
34              TRANS ← TRANSFORM ∩ PATHS
35              call Analyze fi
            determine ALLOW and TRANSFORM for uu by Theorem 4
36          if u1 ∈ E_out[node where uu occurs]
37              for each procedure X that contains a return ∈ ROOT1
38                  RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39                  PP ← ∅
40                  PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively begins with a return ∈ RT1 and ends with a return from the procedure that contains uu, such that each return in this path is in (RT1 ∪ LINK1))}
41                  if PP ≠ ∅
42                      PATHS ← ∅
43                      PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X ∨ x returns from a procedure that contains a return r ∈ PATHS)}
44                      TRANS ← TRANSFORM ∩ PATHS
45                      PATHS ← PATHS ∪ PP
46                      call Analyze
        end statements: fi, end for, fi, fi, end for, od
end

Figure 3.5. continued.

Procedure Analyze
begin
    avoid repetition of uu dataflow analysis if possible
47  if FIN_uu ≠ T ∧ (PATHS = ∅ ∨ (true for all saved pairs for uu: PATHS ≠ P ∨ TRANS ≠ T))
48      if PATHS = ∅
49          FIN_uu ← T
50          push (uu, ∅, ∅) onto stack
        else
51          save PATHS and TRANS as the pair P × T for uu
52          push (uu, PATHS, TRANS) onto stack fi fi
end

Figure 3.5. continued.

Rule 1. If ALLOW = ∅ then element u2 is generated at the node where use u occurs; otherwise u1 is the generated element.

Rule 2.
Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the B_out[n] equation is computed, if u1 ∈ B_in[q], then u1 cannot cross from B_in[q] into the B_out[n] set if p ∉ ALLOW.

Rule 3. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the B_out[n] equation is computed, if u1 ∈ B_in[q], and, by C2 and Rule 2, u1 can cross from B_in[q] into the B_out[n] set, and p ∈ TRANSFORM, then as this u1 element crosses from B_in[q] into the B_out[n] set, the element is changed to u2. In effect, u1 is transformed into u2, and the call node n becomes a generation node for the u2 element.

As the usefulness of slicing is primarily for program fault localization, it may be desirable to modify the algorithm so that those uses in control predicates whose subordinate statements have at least one use or definition already in the slice are themselves added to the slice and propagated in turn. An example of a control predicate is the condition tested by an if statement. By subordinate statements is meant those statements whose execution is decided by the control predicate.

For any node n:
    OUT[n] = E_out[n] ∪ B_out[n]
    IN[n] = E_in[n] ∪ B_in[n]

Group I: n is an exit node.
    B_out[n] = ∅
    E_out[n] = ∪ over p ∈ succ(n) of {x | x ∈ IN[p] ∧ C}
    B_in[n] = GEN[n]
    E_in[n] = E_out[n] ∪ RECODE[n]

Group II: n is a call node, p is the associated return node, and q is the entry node of the returned-from procedure.
    B_out[n] = {x | (x ∈ B_in[p] ∧ (C ∨ (C1 ∧ C2 ∧ x ∈ E_in[q]))) ∨ (x ∈ B_in[q] ∧ C2)}
    E_out[n] = {x ∈ E_in[p] | C ∨ (C1 ∧ C2 ∧ x ∈ E_in[q])}
    B_in[n] = (B_out[n] − KILL[n]) ∪ GEN[n]
    E_in[n] = E_out[n] − KILL[n]

Group III: n is not an exit or call node.
    B_out[n] = ∪ over p ∈ succ(n) of B_in[p]
    E_out[n] = ∪ over p ∈ succ(n) of E_in[p]
    B_in[n] = (B_out[n] − KILL[n]) ∪ GEN[n]
    E_in[n] = E_out[n] − KILL[n]

Figure 3.6. Dataflow equations for the reaching-uses problem.
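For ordinary (Group III) nodes, the B equations of Figure 3.6 reduce to B_out[n] being the union of B_in[p] over the successors p of n, with B_in[n] = (B_out[n] − KILL[n]) ∪ GEN[n]. The following Python sketch solves just these two equations iteratively on a three-node chain; the node names and the restriction to Group III nodes are simplifications of ours, not part of the dissertation's prototype.

```python
def reaching_uses_b(succ, gen, kill):
    """Iterate the Group III B equations of Figure 3.6 to a fixed point:
       B_out[n] = union of B_in[p] for p in succ(n)   (uses flow backward)
       B_in[n]  = (B_out[n] - KILL[n]) | GEN[n]
    """
    b_in = {n: set() for n in succ}
    b_out = {n: set() for n in succ}
    changed = True
    while changed:  # re-evaluate every node until nothing moves
        changed = False
        for n in succ:
            new_out = set().union(*[b_in[p] for p in succ[n]]) if succ[n] else set()
            new_in = (new_out - kill[n]) | gen[n]
            if (new_out, new_in) != (b_out[n], b_in[n]):
                b_out[n], b_in[n] = new_out, new_in
                changed = True
    return b_in

# Flowgraph n1 -> n2 -> n3, with a use of variable v generated at n3.
succ = {"n1": ["n2"], "n2": ["n3"], "n3": []}
gen = {"n1": set(), "n2": set(), "n3": {"u_v"}}
kill = {"n1": set(), "n2": set(), "n3": set()}
assert "u_v" in reaching_uses_b(succ, gen, kill)["n1"]  # use reaches back to n1

# A definition of v at n2 kills the use on its way back through n2.
kill["n2"] = {"u_v"}
assert "u_v" not in reaching_uses_b(succ, gen, kill)["n1"]
```

This is the same iterative fixed-point scheme the text proposes for the full equation set; the E equations and the call-node (Group II) cases are omitted here for brevity.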
Including these control-predicate uses in the slice is advantageous because the cause of a program error may actually be in a control predicate that is not deciding correctly when to execute its subordinate statements. Ferrante et al. [8] present a method to precisely determine the control predicates for each statement.

CHAPTER 4
INTERPROCEDURAL PARALLELIZATION

4.1 Loop-Carried Data Dependence

This section explains loop-carried data dependence and its relevance to parallelization. When a definition of a variable reaches a use of that variable, a data dependence exists such that the use depends on the definition. An example of data dependence can be seen in Figure 4.1. The use of A(I) at line 3 and the use of A(I) at line 4 both depend on the definition of A(I) at line 2.

However, when considering whether or not a loop can be parallelized, there is a special kind of data dependence called loop-carried data dependence [25]. A data dependence is loop-carried if the value set by a definition inside the loop during loop iteration i can be used by a use of that variable inside the loop during loop iteration j, where i ≠ j. Note that i ≠ j is specified instead of the more restrictive and natural-seeming i < j, because if the loop is parallelized then the ordering of the loop iterations cannot be assumed.

1   DO I = 1,N
2       A(I) = B(I) * C(I) + D
3       B(I) = C(I) / D + A(I)
4       IF C(I) < 0 THEN C(I) = A(I) * B(I) FI
    END DO

Figure 4.1. An example loop.

The relationship between loop-carried data dependence and parallelization is straightforward. If there is at least one loop-carried data dependence, then the loop cannot be parallelized; otherwise the loop can be parallelized. Loop parallelization would mean that the ordering of the different iterations of the loop is unimportant, whereas a loop-carried dependence means the opposite. If there are no loop-carried data dependencies, then there is no requirement that the iterations be ordered a certain way.
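The claim that a loop with no loop-carried data dependence may run its iterations in any order can be checked empirically for Figure 4.1. The Python simulation below runs the loop body forward and in reverse iteration order and compares the results; the multiplication operators and the sample data are our assumptions, and an empirical check of this kind illustrates rather than replaces the symbolic dependence tests.

```python
# Simulate the loop of Figure 4.1 with a caller-chosen iteration order.
def run_loop(order, B0, C0, D):
    A = [0.0] * len(B0)
    B = list(B0)
    C = list(C0)
    for i in order:                 # each i is one loop iteration
        A[i] = B[i] * C[i] + D      # line 2
        B[i] = C[i] / D + A[i]      # line 3
        if C[i] < 0:                # line 4
            C[i] = A[i] * B[i]
    return A, B, C

B0 = [1.0, -2.0, 3.0, -4.0]
C0 = [-1.0, 2.0, -3.0, 4.0]
forward = run_loop(range(4), B0, C0, 2.0)
backward = run_loop(reversed(range(4)), B0, C0, 2.0)
assert forward == backward  # no iteration-order effect: no loop-carried dependence observed
```

Each iteration touches only index I, so no value crosses from one iteration to another and the orderings agree, which is exactly the condition under which parallelization is safe.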
However, whenever a loop is parallelized, there should be a following, added, serial step that sets the iteration variables, such as the I in Figure 4.1, to whatever their values would be for the last iteration of the loop, had the loop not been parallelized. This added step is necessary when the iteration variables of a loop are visible outside the loop and can therefore be referenced after the loop completes. Iteration variables are those variables that are incremented or decremented by a constant value on each loop iteration. The recognition of iteration variables is language-dependent.

Regarding data dependence and arrays, there are several efficient tests available that determine whether a data dependence is possible between a particular definition and use of an array. The tests are the separability test, the gcd test, and the Banerjee test. Details of these three tests can be found in [25]. The number theory behind the tests is that of linear diophantine equations. A linear diophantine equation can be formed from the array subscripts of the definition and use in question. For example, in Figure 4.2 we want to know whether A(3*I - 5) and A(6*I) can ever refer to the same array element. The linear diophantine equation that relates these two array references is 3x - 6y = 5. The question then becomes whether this equation has any integer solutions given the boundary conditions 30 ≤ x, y ≤ 100. If there is at least one integer solution, then there would be a data dependence; otherwise there is no data dependence, as is the case in Figure 4.2.

    DO I = 30,100
        A(3*I - 5) = ...
        ... = A(6*I)
    END DO

Figure 4.2. A loop with array references.

For the discussion that follows, we define the term loop body. The loop body of any loop L is all statements in the program that can possibly be executed during the iterations of loop L. Calls are allowed in a loop, so a single loop body could conceivably include the statements of many different procedures. For example,
if a loop contains a call of procedure A, and procedure A contains a call of procedure B, then the loop body would include all the statements of procedures A and B. In Figure 4.1, the loop body is the four statements at lines 1 through 4. With respect to the program flowgraph, the loop body is all flowgraph nodes that may be traversed during the iterations of the loop.

Let LB be the set of flowgraph nodes that are in the loop body of loop L. Let n be the first node in the loop body that is traversed during each iteration of the loop. The identification of node n is language-dependent. Within the loop body of L, let definition d be a definition of a non-array variable v, and let use u be a use of the variable v that is reached by definition d. Let d also denote the node in the loop body where definition d occurs, and let u also denote the node in the loop body where use u occurs. To avoid the complications posed by special cases, we assume that d, n, and u are separate and distinct nodes.

Although use u depends on definition d because definition d reaches use u, this data dependence can prevent parallelization of loop L only if the dependence is loop-carried. Let P be a sequence of flowgraph nodes drawn from LB, such that P represents a possible execution path along which definition d can reach use u. For definition d to be loop-carried to use u along path P, the three nodes d, n, and u must appear in P, in that order, because only the traversal of node n represents the transition to a different iteration of the loop. If v is an array, then we assume that definition d and use u may refer to different array elements during the same iteration.
For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must be assumed to show a loop-carried data dependence when v is an array. The same path P does not show a loop-carried data dependence when v is a non-array, because we assume that definition d and use u then always refer to the same storage location during any iteration: in any iteration that follows such a path P, the value used at use u is always the value defined at definition d in that same iteration.

4.2 The Parallelization Algorithm

This section presents in Figure 4.3 an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm uses our interprocedural dataflow analysis method as an integral step to determine data dependencies. The loops that can be parallelized are those loops that are not marked by the algorithm as inhibited.

The algorithm has three distinct steps. First, the reaching-definitions dataflow problem is solved for the input program by using our interprocedural dataflow analysis method. Second, the quality of the reaching-definitions information computed by the first step is possibly improved, in the case of array references, by using the separability, gcd, and Banerjee tests. Third, individual d, u pairs that represent data dependence are examined for loop-carried data dependence. At line 7, the definitions and uses of iteration variables are excluded from testing for loop-carried data dependence, because for any iteration the iteration variables will have constant values that can be precomputed if loop L is parallelized. The test at line 8 is a necessary condition for the Ptest procedure to return T, which is tested for at line 9. The test at line 8 is done as an economy measure to avoid, when possible, the more costly Ptest.
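The gcd test applied in the second step can be sketched directly from the diophantine formulation of Figure 4.2: a linear diophantine equation a*x + b*y = c has integer solutions if and only if gcd(a, b) divides c. The helper name below is ours; the separability and Banerjee tests, which add subscript and bounds reasoning, are described in [25].

```python
# A minimal sketch of the gcd dependence test for array subscripts.
from math import gcd

def gcd_test(a, b, c):
    """True if a*x + b*y = c can have integer solutions, i.e. a data
    dependence between the two array references is possible."""
    return c % gcd(a, b) == 0

# Figure 4.2: A(3*I - 5) versus A(6*I) gives 3x - 6y = 5.
# gcd(3, 6) = 3 does not divide 5, so no dependence is possible.
assert gcd_test(3, -6, 5) is False

# By contrast, A(2*I) versus A(4*I + 6) gives 2x - 4y = 6;
# gcd(2, 4) = 2 divides 6, so a dependence cannot be ruled out.
assert gcd_test(2, -4, 6) is True
```

Note that the gcd test is only a filter: passing it means a dependence is possible, and the boundary conditions (30 ≤ x, y ≤ 100 in Figure 4.2) must still be checked, for example by the Banerjee test.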
Procedure Ptest uses a straightforward algorithm that begins with node d and then spreads out, examining successors, successors of successors, and so on, until either there are no more acceptable nodes to examine, in which case F is returned, or all the requirements for path P have been met, in which case T is returned. The successors

a d, u pair is a definition d that reaches a use u
x is the dataflow element that represents the definition d
v is the variable referenced by definition d and use u
to avoid complications, n ≠ d ≠ u is assumed
n is the first node traversed during each loop L iteration
d is the node whose basic block contains definition d
u is the node whose basic block contains use u
LB is the set of nodes in the loop body of loop L
IV is the set of definitions of iteration variables for loop L

begin
    step 1, determine reaching definitions for the input program
1   use our method to solve the reaching-definitions dataflow problem
    step 2, improve the reaching-definitions information for array references
2   for all d, u pairs in the program, such that v is an array
3       use the separability, gcd, and Banerjee tests as applicable
4       if definition d and use u can never reference the same element
5           mark the d, u pair as non-reaching fi
    end for
    step 3, identify d, u pairs that inhibit parallelization
6   for each loop L in the program
7       for each reaching d, u pair such that d, u ∈ LB and definition d ∉ IV
8           if x ∈ B_out[n]
9               if Ptest(x, n, d, u, L, LB) = T
10                  mark L parallelization as inhibited by the d, u pair fi fi
        end for
    end for
end

Figure 4.3. The parallelization algorithm.
procedure Ptest(x, n, d, u, L, LB)
    is there a loop-carried data dependence from definition d to use u through node n
    return T if yes, F if no
begin
    part1, is there a path from d to n along which x is found
11  if v is an array
12      DONE ← {d}
    else
13      DONE ← {d, u} fi
14  NEXT ← {d}
15  until NEXT = ∅
16      remove a node from NEXT, denote it p
17      for each successor node s of node p, such that s ∉ DONE
18          DONE ← DONE ∪ {s}
19          if s ∉ LB ∨ s is an entry node ∨ x ∉ B_out[s]
20              ignore s
21          else if s = n
22              goto part2
            else
23              NEXT ← NEXT ∪ {s} fi
        end for
    end until
24  return F

Figure 4.3. continued.

part2: part2, is there a path from n to u along which x is found
25  if v is an array
26      DONE ← {n}
    else
27      DONE ← {n, d} fi
28  NEXT ← {n}
29  until NEXT = ∅
30      remove a node from NEXT, denote it p
31      for each successor node s of node p, such that s ∉ DONE
32          DONE ← DONE ∪ {s}
33          if s ∉ LB ∨ s is an exit node ∨ (s is contained in the same procedure that contains L ∧ x ∉ B_out[s]) ∨ (s is not contained in the same procedure that contains L ∧ x ∉ E_out[s])
34              ignore s
35          else if s = u
36              return T
            else
37              NEXT ← NEXT ∪ {s} fi
        end for
    end until
38  return F
end

Figure 4.3. continued.
At line 19, only the B set is checked because there are no descents into called procedures, as per the rejection of entry nodes at line 19. Entry nodes are rejected at line 19 because any path from d to n will not leave unreturned calls, because n is an outermost node relative to the loop body, and the path is confined to the loop body. As the successors of each call node are an entry node and a return node, it is only necessary to check the out set of the return node to know whether the element x survived the call or not, and this is effectively done by the x V Bot[s] test already mentioned. At line 33, exit nodes are rejected because any path from n to u will not make a return without first making the call. This follows from the fact, already mentioned, that node n is an outermost node relative to the loop body, and the path is confined to the loop body. As the return node can always be added to the path P from the call node, there is no need to add it from the exit node, hence the rejection of the exit node. For partly and part2 in procedure Ptest, each flowgraph node may appear only once in the NEXT set, hence the complexity of the Ptest procedure is O(n) where n is the number of flowgraph nodes. For the entire algorithm, step3 dominates, so the 85 complexity is O(lpn) where I is the number of loops in the program, p is the number of d,u pairs in the program, and n is the number of flowgraph nodes. CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH 5.1 Summary of Main Results The first part of this work presented a new method for contextdependent, flow sensitive interprocedural dataflow analysis. The method was shown to produce a precise, lowcost solution for such fundamental and important problems as reaching definitions and available expressions, regardless of the actual call structure of the program being analyzed. 
By using a separate set to isolate calling-context effects, and another set to accumulate body effects, the calling-context problem has been reduced to the problem of solving the dataflow equations that compute the different sets. These equations can be solved by the iterative algorithm. As part of our method, the interprocedural kill effects of call-by-reference formal parameters are correctly handled by the equations-compatible technique of element recoding.

The importance of our interprocedural analysis method lies in the fact that a number of different applications depend on the solution of fundamental dataflow problems such as reaching definitions, live variables, definition-use and use-definition chains, and available expressions. Program revalidation, dataflow anomaly detection, compiler optimization, automatic vectorization and parallelization, and software tools that make a program more understandable by revealing data dependencies are some of the applications that may benefit from our method.

The second part of this work presented new algorithms for precise interprocedural logical ripple effect and slicing. The algorithms use our interprocedural dataflow analysis method, and add a control mechanism by which, in effect, execution-path history can affect execution-path continuation as the ripple effect or slice is built piece by piece. The importance of our algorithms for precise interprocedural logical ripple effect and slicing lies in their applicability to the areas of software maintenance and debugging. A precise interprocedural logical ripple effect can show a programmer the consequences of program changes, thereby reducing errors and maintenance cost. Similarly, a precise interprocedural slice can localize program faults, thereby saving programmer effort and debugging cost.

The third part of this work presented an algorithm that identifies loops that can be parallelized, including loops that contain calls.
The algorithm makes use of our interprocedural dataflow analysis method to determine data dependencies, and then examines the data dependencies within each loop to determine whether any of them are loop-carried, in which case parallelization of the loop is inhibited. The algorithm has potential use in parallelization tools.

5.2 Directions for Future Research

There are several topics of possible future research related to our method for interprocedural dataflow analysis. Regarding solving the equations, besides the iterative algorithm there are elimination algorithms [20] that have better complexity. Further studies are needed to determine to what extent these other algorithms can be used to solve the equations. Another topic regards the dataflow problems that can be solved by our method, as the actual universe of solvable problems remains to be determined. We have mentioned only a few of the better-known problems. For some dataflow problems, it may be that our method can be used after suitable modification to adapt it to the special needs of the problem.

Regarding possible future research related to our algorithms for precise interprocedural logical ripple effect and slicing: because the algorithms may overestimate when recursive calls are present, or because the Allow set lacks the information needed to enforce the ordering of unmatched returns, one area of future research would be to investigate the possibility of modifying Definition 1, Theorems 1 through 4, and the algorithms, so as to remove the possibility of such overestimation.

REFERENCES

[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the SIGPLAN 90 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 25, 6 (June 1990), 246-256.

[2] Aho, A., Sethi, R., and Ullman, J. Compilers: Principles, Techniques and Tools. Addison-Wesley, Reading, MA (1986).

[3] Allen, F. Interprocedural data flow analysis.
Proceedings of the IFIP Congress 1974, North-Holland, Amsterdam (1974), 398-402.

[4] Banning, J. An efficient way to find the side effects of procedure calls and the aliases of variables. Conference Record of the 6th ACM Symposium on Principles of Programming Languages, ACM, New York (Jan. 1979), 29-41.

[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and parallelization. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 162-175.

[6] Callahan, D. The program summary graph and flow-sensitive interprocedural data flow analysis. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 47-56.

[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 57-66.

[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9, 2 (1987), 319-349.

[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use dependencies. Proceedings of the IEEE Computer Society 1990 Int'l Conference on Computer Languages, New Orleans, LA (March 1990).

[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New York (1977).

[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12, 1 (Jan. 1990), 26-60.

[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures. Proceedings of the IEEE COMPSAC 88 (Oct. 1988), 220-227.

[13] Johmann, K., Liu, S., and Yau, S. Dataflow Equations for Context-Dependent Flow-Sensitive Interprocedural Analysis. SERC-TR-45-F, Department of Computer and Information Sciences, University of Florida, Gainesville (Jan. 1991).

[14] Korel, B., and Laski, J.
Dynamic program slicing. Information Processing Letters, 29, 3 (Oct. 1988), 155-163.

[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification. Conference Record of the 18th ACM Symposium on Principles of Programming Languages, ACM, New York (1991), 93-103.

[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions on Software Engineering, SE-13, 12 (Dec. 1987), 1370-1371.

[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference Record of the 8th ACM Symposium on Principles of Programming Languages, ACM, New York (1981), 219-230.

[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental results. Software-Practice and Experience, 19, 2 (1989), 149-169.

[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM, 26, 2 (April 1979), 322-344.

[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM Computing Surveys, 18, 3 (Sep. 1986), 277-316.

[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis. In Muchnick, S., and Jones, N., Eds., Program Flow Analysis: Theory and Applications, Prentice-Hall, Englewood Cliffs, NJ (1981), 189-232.

[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call statements. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 176-185.

[23] Weiser, M. Programmers use slices when debugging. Communications of the ACM, 25, 7 (July 1982), 446-452.

[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering, SE-10, 4 (July 1984), 352-357.

[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers. Addison-Wesley, Reading, MA (1990).

BIOGRAPHICAL SKETCH

Kurt Johmann was born in Elizabeth, New Jersey, on November 16, 1955. In 1978 he received a B.A. in computer science from Rutgers University in New Jersey. Following graduation, he worked for a shipping company, Sea-Land Service Inc., as a programmer and systems analyst.
In 1985 he left Sea-Land and did PC work for three years. Following this, he entered the graduate program of the Computer and Information Sciences Department at the University of Florida in the Fall of 1988. He received an M.S. in computer science in December 1989, and entered the Ph.D. program. Anticipating graduation, he hopes to find a job in academia.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stephen S. Yau, Chairman
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Richard Newman-Wolfe, Cochairman
Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Paul Fishwick
Associate Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Mark Yang
Professor of Statistics

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy. May, 1992

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School