- Permanent Link:
- http://ufdc.ufl.edu/AA00003273/00001
## Material Information
- Title:
- Context-dependent flow-sensitive interprocedural dataflow analysis and its application to slicing and parallelization
- Creator:
- Johmann, Kurt, 1955-
- Publication Date:
- 1992
- Language:
- English
- Physical Description:
- vi, 91 leaves : ill. ; 29 cm.
## Subjects
- Subjects / Keywords:
- Algorithms ( jstor )
- Data lines ( jstor )
- Experimental results ( jstor )
- Linear programming ( jstor )
- Logical givens ( jstor )
- Logical theorems ( jstor )
- Mathematical independent variables ( jstor )
- Mathematical procedures ( jstor )
- Mathematical variables ( jstor )
- Programming languages ( jstor )
- Genre:
- bibliography ( marcgt )
- theses ( marcgt )
- non-fiction ( marcgt )
## Notes
- Thesis:
- Thesis (Ph. D.)--University of Florida, 1992.
- Bibliography:
- Includes bibliographical references (leaves 89-90).
- General Note:
- Typescript.
- General Note:
- Vita.
- Statement of Responsibility:
- by Kurt Johmann.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Resource Identifier:
- 027817217 ( ALEPH )
- AJG6071 ( NOTIS )
- 26576154 ( OCLC )
## Full Text

CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL DATAFLOW ANALYSIS AND ITS APPLICATION TO SLICING AND PARALLELIZATION

By

KURT JOHMANN

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

1992

ACKNOWLEDGEMENTS

I would like to express my appreciation and gratitude to my chairman and advisor, Dr. Stephen S. Yau, for his careful guidance and generous support during this study. I also would like to express my appreciation and gratitude to my previous advisor, Dr. Sying-Syang Liu. Without their supervision and counsel, this work would not have been possible. To Dr. Paul Fishwick, Dr. Richard Newman-Wolfe, and Dr. Mark Yang, members of the supervisory committee, go my thankfulness for their service. Finally, I want to thank the Software Engineering Research Center (SERC) for providing financial support during this study.

TABLE OF CONTENTS

- ACKNOWLEDGEMENTS
- ABSTRACT
- CHAPTERS
  - 1 INTRODUCTION
    - 1.1 Interprocedural Dataflow Analysis
    - 1.2 Slicing and Logical Ripple Effect
    - 1.3 Parallelization
    - 1.4 Literature Review
    - 1.5 Outline in Brief
  - 2 THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD
    - 2.1 Constructing the Flowgraph
    - 2.2 Interprocedural Forward-Flow-Or Analysis
      - 2.2.1 The Dataflow Equations
      - 2.2.2 Element Recoding for Aliases
      - 2.2.3 Implicit Definitions Due to Calls
    - 2.3 Interprocedural Forward-Flow-And Analysis
    - 2.4 Interprocedural Backward-Flow Analysis
    - 2.5 Complexity of Our Interprocedural Analysis Method
    - 2.6 Experimental Results
  - 3 INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT
    - 3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect
    - 3.2 The Logical Ripple Effect Algorithm
    - 3.3 A Prototype Demonstrates the Algorithm
    - 3.4 The Slicing Algorithm
  - 4 INTERPROCEDURAL PARALLELIZATION
    - 4.1 Loop-Carried Data Dependence
    - 4.2 The Parallelization Algorithm
  - 5 CONCLUSIONS AND FUTURE RESEARCH
    - 5.1 Summary of Main Results
    - 5.2 Directions for Future Research
- REFERENCES
- BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL DATAFLOW ANALYSIS AND ITS APPLICATION TO SLICING AND PARALLELIZATION

By Kurt Johmann

May 1992

Chairman: Dr. Stephen S. Yau

Major Department: Computer and Information Sciences

Interprocedural dataflow analysis is important in compiler optimization, automatic vectorization and parallelization, program revalidation, dataflow anomaly detection, and software tools that make a program more understandable by showing data dependencies. These applications require the solution of dataflow problems such as reaching definitions, live variables, available expressions, and definition-use chains. When solving these problems interprocedurally, the context of each call must be taken into account. In this dissertation we present a method to solve this kind of dataflow problem precisely. The method consists of special dataflow equations that are solved for a program flowgraph. Regarding calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure.
The body set contains effects that result from statements in the procedure. By isolating calling-context effects in the entry set, a call's nonkilled calling context is preserved by means of a simple intersection operation done at the return node for the call.

Slicing determines program pieces that can affect a value. Logical ripple effect determines program pieces that can be affected by a value. Both slicing and logical ripple effect are useful for software maintenance. The problems of slicing and logical ripple effect are inverses of each other, and a solution of either problem can be inverted to solve the other. Precise interprocedural logical ripple effect analysis is complicated by the fact that an element may be in the ripple effect by virtue of one or more specific execution paths. In this dissertation we present an algorithm that builds a precise logical ripple effect or slice piece by piece, taking into account the possible execution paths. The algorithm makes use of our interprocedural dataflow analysis method. This method is also used in an algorithm, given in this dissertation, for identifying loops that can be parallelized.

CHAPTER 1 INTRODUCTION

1.1 Interprocedural Dataflow Analysis

Dataflow analysis refers to a class of problems that ask about the relationships that exist along a program's possible execution paths, between such program elements as variables, constants, and expressions [2, 10]. When dataflow analysis is done for a program by treating its individual procedures as being independent of each other, regardless of the calls made, this is known as intraprocedural analysis. For intraprocedural analysis, assumptions must be made about the effects of calls. By contrast, interprocedural analysis replaces assumptions with specific information about the effects of each call. This information can be gathered by either flow-sensitive [3, 6, 9, 17, 19, 21] or flow-insensitive [4, 7, 18] analysis.
When answering a dataflow question, a flow-sensitive analysis takes into account the flow paths within procedures, whereas a flow-insensitive analysis ignores these flow paths. The flow paths are the possible execution paths. Flow-sensitive analysis typically provides more precise information, but at greater cost.

Flow-sensitive interprocedural dataflow analysis has two major problems that make it significantly harder than intraprocedural analysis. First, in intraprocedural analysis, it is assumed that any path in the flowgraph is a possible execution path. By contrast, for interprocedural analysis, it is useful to assume that the possible execution paths conform to the rule that once a procedure is entered by a call, the flow returns to that call upon return. Thus, the set of possible execution paths will typically be a proper subset of the paths in the program flowgraph. This problem will be referred to as the calling-context problem. Second, call-by-reference formal parameters typically cause alias relationships between actual and formal parameters. These relationships are valid only for certain calls. They apply only to those passes through the called procedure that originate from the calls that establish the specific alias relationship.

There are many applications for a flow-sensitive interprocedural dataflow analysis method that solves the two major problems, assuming that the costs of the method are not too high. Some of the well-known dataflow problems that such a method can solve precisely are reaching definitions, live variables, the related problems of definition-use and use-definition chains, and available expressions. Applications that require the solution of one or more of these dataflow problems include compiler optimization, automatic vectorization and parallelization of program code, program revalidation, dataflow anomaly detection, and software tools that show data dependencies.
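The abstract describes the entry/body split and the intersection at the return node. A deliberately simplified sketch can illustrate it. The sketch assumes that every fact valid at a call site is passed into the callee as calling context, for example because all variables are global. It also assumes that the sets reaching the callee's exit are already known. The function name `return_node_set` and its parameters are hypothetical. The one-line formula is our own simplification, not the dissertation's dataflow equations from Section 2.2.1. It shows only why intersecting at the return node stops one caller's context from leaking to another caller.

```python
def return_node_set(call_context, entry_at_exit, body_at_exit):
    """Facts valid at a return node (simplified, hypothetical formulation).

    call_context:  calling-context facts that this particular call passes in
    entry_at_exit: calling-context facts from *all* callers that reach the
                   callee's exit without being killed (its entry set there)
    body_at_exit:  facts generated by the callee's own statements that reach its exit
    """
    # Intersecting with this call's own context drops facts that entered the
    # callee from other call sites, but keeps this call's nonkilled facts.
    return (entry_at_exit & call_context) | body_at_exit


# Procedure P is called from c1 (context {"d1"}) and from c2 (context {"d2"}).
# P kills neither fact and itself generates "d3".
entry_at_exit = {"d1", "d2"}
body_at_exit = {"d3"}
after_c1 = return_node_set({"d1"}, entry_at_exit, body_at_exit)
after_c2 = return_node_set({"d2"}, entry_at_exit, body_at_exit)

# If P instead redefined the variable of d1, d1 would not reach P's exit.
after_c1_killed = return_node_set({"d1"}, {"d2"}, body_at_exit)
```

Without the intersection, both return nodes would receive `{"d1", "d2", "d3"}`. That spurious mixing of contexts is exactly what the calling-context problem refers to.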
In this dissertation we present a new method for flow-sensitive interprocedural dataflow analysis that solves the two major problems, and does so at a comparatively low cost [13]. The method consists of special dataflow equations that are solved for a program flowgraph. To account for calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure. The body set contains effects that result from statements in the procedure. By isolating calling-context effects in the entry set, a call's nonkilled calling context is preserved by means of a simple intersection operation done at the return node for the call. The main advantages of our method are its low complexity and the fact that the presence of recursion does not affect the precision of the result.

The language model assumed for Chapter 2 allows global variables, but the visibility of each formal parameter is limited to the single procedure that declares it. Thus, with the exception of a call and its indirect reference, each formal parameter can only be referenced inside a single procedure. Examples of programming languages that fit this model are C and FORTRAN. This restriction on the visibility of formal parameters is imposed for the sake of the discussions of element recoding in Sections 2.2.2 and 2.3, of implicit definitions in Section 2.2.3, and of worst-case complexity in Section 2.5. Our method can also be used for the alternative language model, which allows each formal parameter to be visible in more than a single procedure, but this model is considered only briefly at the end of Section 2.5.

1.2 Slicing and Logical Ripple Effect

Given an actual or hypothetical variable v at program point p, determine all program pieces that can possibly be affected by the value of v at p. This is the logical ripple effect problem.
Given v and p, determine all program pieces that can possibly affect the value of v at p. This is the slicing problem. The two problems are inverses of each other: a solution for one, once inverted, is a solution for the other.

Logical ripple effect helps a programmer understand how a program change, either actual or hypothetical, will impact that program. Making program changes as part of routine maintenance often introduces new errors into the changed program. Such errors typically result because the programmer overlooked some part of the logical ripple effect for that change. Showing a programmer the actual logical ripple effect of a program change can prevent such mistakes.

Slicing is primarily useful for program fault localization [23]. If a variable v at point p is known to have a wrong value, then a slice on v at p narrows the search for the cause of the error to that part of the program that can truly affect v at p. Thus, the fault is localized. The more precise the slice, the more localized the cause of the error, and the more programmer time is saved.

In this dissertation we are concerned only with static logical ripple effect and slicing [11, 12, 16, 24], where the ripple effect or slice is determined from dataflow analysis of the program text. The alternative approach is dynamic logical ripple effect and slicing [1, 14], where the ripple effect or slice is determined by actually executing the program. Whenever we speak of execution paths in Chapter 3, we always mean possible execution paths as determined by dataflow analysis.

Precise interprocedural logical ripple effect analysis is complicated by the fact that a definition may be added to the ripple effect because of one or more specific execution paths.
To determine in turn the ripple effect of that added definition, the definition should be constrained to certain execution paths. These are the possible continuations of the execution paths along which the definition was itself affected and thereby added to the ripple effect. We refer to this as the execution-path problem. The difficulty is caused specifically by those call instances made in an execution path P that have not been returned to in P. A called procedure returns to its most recent caller. Therefore, any continuation of P must first return to the unreturned calls in P before it can return to call instances that precede P. An example will illustrate the problem.

```
procedure main
begin
1:  f ← 7
2:  call A
3:  z ← x
4:  f ← 1
5:  call B
end

procedure B
begin
6:  y ← f + 5
end

procedure A
begin
7:  call B
8:  x ← y
end
```

For the example, assume that all variables are global, and that the problem is to determine the logical ripple effect for the definition of variable f at line 4. The call to procedure B at line 5 allows the definition of f at line 4 to affect the definition of y at line 6. Procedure B then returns to the call at line 5, the call by which the definition of y at line 6 was affected. The end result is that the ripple effect should include only line 6.

However, assume that the execution-path problem is ignored and all returns are possible when the ripple effect is computed. For the same problem, the call at line 5 allows the definition of f at line 4 to affect the definition of y at line 6. Then the definition of y at line 6 affects the definition of x at line 8, because procedure B is allowed to return to the call at line 7 in addition to the call at line 5. Then the definition of x at line 8 affects the definition of z at line 3 by procedure A returning to the call at line 2. The end result is a ripple effect that includes lines 3, 6, and 8, but only line 6 should be in the ripple effect.
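The following brute-force sketch makes the example concrete. It explores the example's execution paths from the point just after line 4 and records every definition that uses an affected value. It is not the dissertation's ripple-effect algorithm from Chapter 3. The statement encoding and the names `PROGRAM`, `call_sites`, and `ripple` are hypothetical. Affected values are tracked simply as a set of variable names. The sketch assumes a nonrecursive program, so the explicit call stack stays bounded. With `match_returns=True`, a procedure returns only to its most recent caller. With `match_returns=False`, the execution-path problem is ignored and any call site may be returned to.

```python
from collections import deque

# Hypothetical encoding of the example:
# (line, "assign", target, used_vars) or (line, "call", callee)
PROGRAM = {
    "main": [(1, "assign", "f", ()), (2, "call", "A"), (3, "assign", "z", ("x",)),
             (4, "assign", "f", ()), (5, "call", "B")],
    "B": [(6, "assign", "y", ("f",))],
    "A": [(7, "call", "B"), (8, "assign", "x", ("y",))],
}


def call_sites(callee):
    """All (procedure, index) positions holding a call to `callee`."""
    return [(p, i) for p, body in PROGRAM.items()
            for i, stmt in enumerate(body) if stmt[1] == "call" and stmt[2] == callee]


def ripple(start_proc, start_index, var, match_returns):
    """Lines affected by `var`, defined just before statement index `start_index`."""
    affected = set()
    start = (start_proc, start_index, (), frozenset({var}))
    seen, work = {start}, deque([start])
    while work:
        proc, i, stack, tainted = work.popleft()
        body = PROGRAM[proc]
        successors = []
        if i == len(body):  # procedure exit
            if match_returns:
                if stack:  # return only to the most recent unreturned call
                    ret_proc, ret_i = stack[-1]
                    successors.append((ret_proc, ret_i, stack[:-1], tainted))
            else:  # ignore calling context: any call site may be returned to
                successors += [(p, j + 1, (), tainted) for p, j in call_sites(proc)]
        elif body[i][1] == "call":
            new_stack = stack + ((proc, i + 1),) if match_returns else ()
            successors.append((body[i][2], 0, new_stack, tainted))
        else:
            line, _, target, uses = body[i]
            if tainted & set(uses):
                affected.add(line)  # this definition uses an affected value
                tainted = tainted | {target}
            else:
                tainted = tainted - {target}  # redefined by an unaffected value
            successors.append((proc, i + 1, stack, tainted))
        for s in successors:
            if s not in seen:
                seen.add(s)
                work.append(s)
    return affected
```

Here `ripple("main", 4, "f", True)` yields `{6}`. `ripple("main", 4, "f", False)` yields `{3, 6, 8}`, the imprecise result described above.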
Although there are a number of papers on logical ripple effect and slicing [11, 12, 16, 24], there appears to be only one [11] that addresses the problems of precise interprocedural logical ripple effect and slicing and presents a method for solving them. Weiser [24] was the first to propose an interprocedural slicing method. His method ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] address the problem of precise interprocedural slicing, and present a method to construct a system dependence graph from which slices can be extracted.

In this dissertation we present an algorithm that builds the logical ripple effect piece by piece. A program piece is included in the ripple effect when it is affected along some execution path. The algorithm respects the restrictions that this preceding execution path imposes on how execution can continue. In general, the algorithm computes a precise logical ripple effect, but some overestimation is possible, meaning that the computed logical ripple effect may be larger than the actual one. An inverse form of the algorithm is presented for the slicing problem. The languages that our algorithm will work for include many of the common procedural languages such as C, Pascal, Ada, and Fortran.

1.3 Parallelization

Automatic conversion of a sequential program into a parallel program is often referred to as parallelization. Parallelization problems are typically concerned with the conversion of sequential loops into parallel code. In this dissertation, the specific problem considered is the identification of loops in a program that can be parallelized, including those loops that contain calls. A flow-sensitive interprocedural dataflow analysis method is specifically applicable to the problem of parallelizing loops that contain calls, because such a method can supply the precise data-dependency information that the parallelization analysis requires.
Parallelizing a loop means that each iteration of the loop can be executed independently of the other iterations. In theory, each single iteration, or each arbitrary block of iterations, could then be assigned to a separate processor in a parallel machine. Several factors influence which loops a parallelization tool can actually parallelize, and how. These include the architecture of the target parallel machine and the programming language being parallelized. They also include the loop transformations available for converting sequential loop code into functionally equivalent sequential code that is more parallelizable. However, none of these architecture, language, and loop-transformation issues will be considered here. Instead, the problem will be considered solely from the standpoint of data dependence. After a brief review of the basics of data dependence and parallelization, an algorithm is given that identifies loops in a program that can be parallelized. This algorithm uses our interprocedural dataflow analysis method as an integral part.

The potential value of parallelization is clear. On the one hand, parallel machines are becoming more common, and on the other hand, a great number of sequential programs already exist, some of which can benefit from the greater processing power that parallelization would offer.

1.4 Literature Review

Different methods have been offered for solving various flow-sensitive interprocedural dataflow analysis problems. Sharir and Pnueli [21] present a method they name call-strings. The essential idea of their method is to accumulate for each element a history of the calls traversed by that element as it flows through the program flowgraph. The call history associated with an element is used whenever that element is at a return point. The element can only cross back to those calls in its call history.
Thus, the call-strings approach provides a solution to the calling-context problem. However, the disadvantage of this approach is the time and space needed to maintain a call history for each element at each flowgraph node. Let $l$ be the program size. We assume that the number of elements will be a linear function of $l$. Compared with our method, the call-strings approach would require a worst-case total number of set operations that is greater by a factor of $l$. The extra cost arises at each union or intersection of two sets of elements. If the same element is in both sets, the two associated call histories must also be unioned. This produces the new call history for that element at the node where the set operation is done. A further disadvantage of the call-strings approach is that set-stability tests must also include the associated call histories. These tests determine termination of the iterative algorithm used to solve the dataflow equations. Myers [17] offers a solution to the calling-context problem that is essentially the same as call-strings.

Allen [3] presents a different method for interprocedural dataflow analysis. The method analyzes each procedure completely, in reverse invocation order. The first procedures to be analyzed would be those that make no calls, then the procedures that only call these procedures would be analyzed, and so on. Once a procedure is analyzed, its effects can be incorporated into those procedures that call it, when they in turn are analyzed. The obvious drawback of this method is that it cannot be used to analyze recursive calls.

Rosen [19] presents a complex method for interprocedural dataflow analysis that is limited to solving the problems of variable modification, preservation, and use. These dataflow problems do not require a solution of the calling-context problem.
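The call-strings idea of Sharir and Pnueli [21], described above, can be sketched as follows. This is our own simplified encoding, not their formulation. Each element maps to a set of call strings: tuples of hypothetical call-site labels, with the most recent call last. A call appends its label. A return to a call site keeps only the histories ending in that label, and pops the label from them. `merge` shows the extra per-element union of histories that the text identifies as the approach's added cost. Elements generated inside the callee are left out here; in a full treatment they would be tagged with the current context.

```python
def through_call(facts, call_site):
    """Entering a callee: append call_site to every call history."""
    return {e: {cs + (call_site,) for cs in histories} for e, histories in facts.items()}


def through_return(facts, call_site):
    """Returning to call_site: an element crosses back only with histories that
    end in call_site, and that call is removed from those histories."""
    out = {}
    for e, histories in facts.items():
        kept = {cs[:-1] for cs in histories if cs and cs[-1] == call_site}
        if kept:
            out[e] = kept
    return out


def merge(a, b):
    """Confluence (union) of two tagged sets; a shared element's histories are unioned too."""
    out = {e: set(histories) for e, histories in a.items()}
    for e, histories in b.items():
        out.setdefault(e, set()).update(histories)
    return out


# Element d1 enters procedure P from call c1, and d2 from call c2 (hypothetical labels).
at_exit = merge(through_call({"d1": {()}}, "c1"), through_call({"d2": {()}}, "c2"))
back_at_c1 = through_return(at_exit, "c1")
back_at_c2 = through_return(at_exit, "c2")
```

Only `d1` crosses back to `c1`, and only `d2` to `c2`. The price is that every confluence and every stability test now handles these histories as well as the elements themselves.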
Callahan [6] has proposed the program summary graph to solve the interprocedural dataflow problems of kill and use. Kill determines all definite kills that result from a procedure call. Use determines all variables that may be used as a result of a procedure call before being redefined. As part of the determination of edges in the program summary graph, intraprocedural reaching-definitions analysis must be done for each procedure. Simplifying Callahan's space complexity analysis, we get $O(v_{ga} \cdot l)$ as the worst-case size of the program summary graph. Here $v_{ga}$ is the number of global variables in the program plus the average number of actual parameters per call, and $l$ is the program size. One limitation of Callahan's method concerns multiple aliases. These arise when the same variable is used several times as an actual parameter in the same call and the corresponding formal parameters are call-by-reference. Callahan's method does not handle them correctly. By contrast, our method uses element recoding, in which all the aliases are encoded in a single element, and so handles the multiple aliases problem correctly.

Callahan's method offers no solution to the calling-context problem, and could not be used to determine, for example, interprocedural reaching definitions. However, Harrold and Soffa [9] have extended his method so that interprocedural reaching definitions can be determined. They use an interprocedural flowgraph, denoted IFG, that is very similar to the program summary graph. The IFG has inter-reaching edges that are determined by solving Callahan's kill problem. They recommend using his method to solve that problem. Their method therefore inherits Callahan's space and time complexity, as well as its limitation with regard to multiple aliases. Before the IFG can be used, it must be decorated with the results of intraprocedural analysis. This analysis is done twice for each procedure, to determine both reaching definitions and upwardly exposed uses.
Then an algorithm is used to propagate the upwardly exposed uses throughout the IFG. This algorithm has worst-case time complexity $O(n^2)$, where $n$ is the number of nodes in the IFG. Their graph has the same number of nodes as Callahan's graph, so its worst-case size is $O(v_{ga} \cdot l)$. Substituting $v_{ga} \cdot l$ for $n$, we get a worst-case time complexity of $O(v_{ga}^2 \cdot l^2)$. As the size of our flowgraph is proportional to the size of the program, the worst-case time complexity for solving our equations is only $O(l^2)$.

Weiser [24] was the first to propose an interprocedural slicing method. His method ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] have presented a method to compute the more precise slice explained in the Introduction. However, they use a more restricted definition of a slice: their slice is all statements and predicates that may affect a variable v at program point p, such that v is defined or used at point p. Their method consists of constructing a specialized graph called a system dependence graph. Nodes in this graph represent program pieces such as statements, and the edges represent control or data dependencies. Some edges represent transitive data dependencies that are due to procedure calls. These edges are computed in two steps. First, each procedure and its calls are modeled with an attribute grammar called a linkage grammar. Then the grammar is solved to determine the transitive data dependencies it represents. Once the system dependence graph is complete, any slice based on an actual definition or use occurring at any point p in the program can be extracted from the graph. A major weakness of their method is that it does not allow a hypothetical use to be the starting point of the slice.
The complexity of constructing the system dependence graph is given as $O(G \cdot X^2 \cdot D^2)$. Here $G$ is the total number of procedures and calls in the program. $X$ is the total number of global variables in the program plus a term that can be considered a constant. $D$ is a linear function of $X$. Once the system dependence graph is complete, any particular slice can be extracted from it at complexity $O(n)$, where $n$ is the size of the graph. The size of the graph is roughly quadratic in program size, being bounded by $O(P \cdot (V + E) + T \cdot X)$. In this bound, $P$ is the number of procedures and $V$ is the largest number of predicates and definitions in a single procedure. $E$ is the largest number of edges in a procedure dependence graph, $T$ is the number of calls in the program, and $X$ is the number of global variables.

Their paper emphasizes that, once the graph is complete, any slice on an actual definition or use can be extracted from it at $O(n)$ cost, where $n$ is the size of the graph. However, the number of actual definition and use occurrences in a program is proportional to the program size $L$. Therefore, any method that can compute a slice at cost $O(Z)$ for some $Z$ can generate all the slices contained in their graph at cost $O(Z \cdot L)$. It can then spool the slices to disk and recover them at cost $O(1)$.

Although there are many papers on slicing, it seems that only Horwitz et al. [11] clearly discuss the problem of the more precise interprocedural slice, present a method to compute it, and provide complexity analysis. Our research on slicing is only concerned with computing the more precise slice, so Horwitz et al. is the principal reference.

Zima and Chapman [25] is the principal reference used to study the issues and methods of parallelization. Their book distills the work found in scores of papers and dissertations, and is an excellent survey of parallelization.
Interprocedural parallelization is specifically considered by Burke and Cytron [5], and by Triolet et al. [22].

1.5 Outline in Brief

This introductory chapter ends with a brief synopsis of the remaining chapters. Chapter 2 presents in detail our interprocedural dataflow analysis method. The chapter ends with a brief description of the prototypes that were built to demonstrate the method, along with some of the experimental results obtained from these prototypes. Chapter 3 begins with a representation scheme for continuation paths for the interprocedural logical ripple effect problem and then presents our interprocedural logical ripple effect algorithm. A prototype that was built to demonstrate this algorithm is briefly described and experimental results are presented. An inversion of the logical ripple effect algorithm is then presented as a solution to the interprocedural slicing problem. Chapter 4 begins with an explanation of loop-carried data dependence and its relevance to parallelization, and concludes with an algorithm that identifies loops that can be parallelized, including loops that contain calls. Chapter 5 summarizes the major results of the dissertation, and suggests directions for future research.

CHAPTER 2 THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD

2.1 Constructing the Flowgraph

This section discusses the flowgraph and its relationship to dataflow equations. After the discussion, rules are given for constructing the specific flowgraph required by our interprocedural analysis method. Note that the required flowgraph is conventional and the rules to be given relate only to the representation of calls and procedures in the flowgraph.

A flowgraph is a directed graph that represents the possible flow paths of a program. The nodes of a flowgraph correspond to basic blocks in the program. A basic block is a sequence of program code that is always executed together in the same order.
The directed edges of a flowgraph represent possible transfers of control. Figures 2.1 and 2.3 each represent a flowgraph.

Dataflow problems are often formulated as a set of equations that relate the four sets, IN, OUT, GEN, and KILL, that are associated with each node in the flowgraph. For any node and its block, the GEN set represents the elements generated by that block. The KILL set represents those elements that cannot flow through the block, because they would be killed by the block. The IN set represents the valid elements at the start of the block, and the OUT set represents the valid elements at the end of the block.

Dataflow problems are typically either forward-flow or backward-flow. For forward-flow, the IN set of a node is computed as the confluence of the OUT sets of the predecessor nodes, and the OUT set is a function of the node's IN, GEN, and KILL sets. For backward-flow, the OUT set of a node is computed as the confluence of the IN sets of the successor nodes, and the IN set is a function of the node's OUT, GEN, and KILL sets. The predecessors of any node n are those nodes that have an out-edge directed to node n. The successors of node n are those nodes that have an in-edge directed from node n. The confluence operator will almost invariably be either set union or set intersection, depending on the problem. Thus, a dataflow problem may be classified as being either forward-flow-or, forward-flow-and, backward-flow-or, or backward-flow-and, where "or" refers to set union and "and" refers to set intersection.

Once the dataflow equations have been defined for a particular problem, and the rules established for creating the GEN and KILL sets, the equations can then be solved for a specific program or procedure and its representative flowgraph. To solve the equations, the iterative algorithm can be used. The iterative algorithm has the advantage that it works for any flowgraph.
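For a forward-flow-or problem such as intraprocedural reaching definitions, the conventional textbook form of the equations is

$$\mathrm{IN}[n] = \bigcup_{p \in \mathrm{pred}(n)} \mathrm{OUT}[p], \qquad \mathrm{OUT}[n] = \mathrm{GEN}[n] \cup \bigl(\mathrm{IN}[n] - \mathrm{KILL}[n]\bigr).$$

Backward-flow problems swap the roles of IN and OUT and of predecessors and successors. "And" problems use intersection as the confluence operator. These are the standard intraprocedural equations, not the interprocedural equations of Section 2.2.1, which add entry and body sets. The sketch below solves them with a worklist variant of the iterative algorithm. The function name `solve_forward_or` and the three-node example are hypothetical.

```python
from collections import deque


def solve_forward_or(nodes, preds, succs, gen, kill):
    """Worklist form of the iterative algorithm for a forward-flow-or problem.

    IN[n]  = union of OUT[p] over all predecessors p of n
    OUT[n] = GEN[n] | (IN[n] - KILL[n])
    All sets start empty and grow until no OUT set changes.
    """
    IN = {n: set() for n in nodes}
    OUT = {n: set() for n in nodes}
    work = deque(nodes)  # every node is computed at least once
    queued = set(nodes)
    while work:
        n = work.popleft()
        queued.discard(n)
        IN[n] = set().union(*(OUT[p] for p in preds[n]))
        new_out = gen[n] | (IN[n] - kill[n])
        if new_out != OUT[n]:
            OUT[n] = new_out
            # A changed OUT set means every successor must be recomputed.
            for s in succs[n]:
                if s not in queued:
                    work.append(s)
                    queued.add(s)
    return IN, OUT


# Hypothetical example: node 1 defines x (d1), node 2 defines y (d2),
# node 3 redefines x (d3); edges 1->2, 2->3, and a loop back edge 3->2.
succs = {1: [2], 2: [3], 3: [2]}
preds = {1: [], 2: [1, 3], 3: [2]}
gen = {1: {"d1"}, 2: {"d2"}, 3: {"d3"}}
kill = {1: {"d3"}, 2: set(), 3: {"d1"}}
IN, OUT = solve_forward_or([1, 2, 3], preds, succs, gen, kill)
```

The back edge lets d2 and d3 reach node 2 along with d1, so `IN[2]` is `{"d1", "d2", "d3"}`. Node 3 kills d1, so `OUT[3]` is `{"d2", "d3"}`.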
The iterative algorithm repeatedly computes the IN and OUT sets for all nodes until all sets have stabilized and ceased to change. Recomputation of a node is necessary whenever an outside set that it depends on changes. For forward-flow problems, a node must be recomputed if the OUT set of a predecessor node changes. For backward-flow problems, a node must be recomputed if the IN set of a successor node changes. Typically, an evaluation strategy will determine the actual order in which nodes are recomputed.

The flowgraph required by our interprocedural analysis method is conventional, with special nodes and edges as follows. For each procedure in the program, assign an entry node and an exit node. These nodes have no associated blocks of program code. The entry node has a single out-edge and as many in-edges as there are calls to that procedure in the program. The exit node has as many in-edges as there are nodes for that procedure whose blocks terminate with a return action, and as many out-edges as there are calls to that procedure in the program. For every in-edge of the entry node, there is a corresponding out-edge of the exit node.

For the purpose of constructing the flowgraph, calls must be classified as either known or unknown. A call is known if the flowgraph of the called procedure will be part of the total flowgraph being constructed, and unknown otherwise. Unknown calls are common and occur for two reasons. First, the called procedure may be a compiler-library procedure for which source code is not available. Second, the called procedure may be a separately compiled user procedure for which the source code is not available. For any unknown call made within the program, if summary information of its interprocedural effects is not available, then conservative assumptions about its effects will have to be made.
The summary information needed, and the assumptions made in its absence, depend on the particular dataflow problem. The summary information, if present, is used when constructing the GEN and KILL sets for any node whose block contains an unknown call. For any known call made within the program, there are two nodes in the flowgraph for that call. One node is the call node, which represents a basic block that ends with the known call. The other node is the return node, which has an empty associated block. The call node has two out-edges: one is directed to the entry node of the called procedure, and the other is directed to the return node for that call. The return node has two in-edges: one is the edge from the call node, and the other is directed from the called procedure's exit node. In all, each known call results in two nodes and three distinct edges. One edge connects the call node to its return node. A second edge connects the call node to the called procedure's entry node. A third edge connects the called procedure's exit node to the return node. In constructing the flowgraph, a special problem arises if the programming language allows procedure-valued variables, such as C function pointers, whose dereference results in a call of the function pointed at. The problem is to identify the possible procedure values when the procedure-valued variable invokes a call. Assuming this information is available from a separate analysis, the flowgraph can be constructed accordingly. For example, if the procedure-valued variable can have three different values when the call in question is invoked, and each value is a procedure whose flowgraph will be part of the total flowgraph, then three known calls are constructed in parallel, with a common predecessor node for the three call nodes and a common successor node for the three return nodes.
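The construction rules for known calls, and the parallel construction for a call through a procedure-valued variable, can be sketched in Python as below. This is an illustrative sketch only: nodes are plain strings, entry and exit nodes are named `entry:<proc>` and `exit:<proc>`, and the class and function names are hypothetical.

```python
from typing import Dict, List, Set


class Flowgraph:
    """Directed flowgraph whose nodes are plain strings (hypothetical encoding)."""

    def __init__(self) -> None:
        self.succ: Dict[str, Set[str]] = {}

    def add_edge(self, src: str, dst: str) -> None:
        self.succ.setdefault(src, set()).add(dst)
        self.succ.setdefault(dst, set())

    def preds(self, node: str) -> Set[str]:
        return {m for m, out in self.succ.items() if node in out}


def add_known_call(g: Flowgraph, call: str, ret: str, callee: str) -> None:
    """One known call: a call node, a return node, and three edges."""
    g.add_edge(call, "entry:" + callee)   # call node -> callee's entry node
    g.add_edge(call, ret)                 # call node -> its own return node
    g.add_edge("exit:" + callee, ret)     # callee's exit node -> return node


def add_indirect_call(g: Flowgraph, before: str, after: str,
                      site: str, targets: List[str]) -> None:
    """Call through a procedure-valued variable whose possible values are
    `targets`: one known call per target, built in parallel between a
    common predecessor `before` and a common successor `after`."""
    for t in targets:
        call, ret = f"{site}.call.{t}", f"{site}.ret.{t}"
        g.add_edge(before, call)
        add_known_call(g, call, ret, t)
        g.add_edge(ret, after)
```

Wiring the two calls of main to f in Figure 2.1 this way gives f's entry node two in-edges (from call nodes 3 and 5), its exit node two out-edges (to return nodes 4 and 6), and each return node exactly two in-edges, as required above.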
A procedure-valued variable is in essence a pointer. Note that the problem of determining what a pointer is or may be pointing at when that pointer is dereferenced can itself be formulated as a dataflow problem, in particular as a forward-flow-or dataflow problem. If necessary, an initial version of the flowgraph could be constructed that treats all calls invoked by procedure-valued variables as unknown calls; the dataflow problem of determining possible pointer values at each dereference would then be solved, and the flowgraph amended using the pointer-value information. Dataflow analysis makes a simplifying, conservative assumption about the correspondence between paths in the flowgraph and possible execution paths in the program. Let a path be a sequence of flowgraph nodes such that in the sequence node n follows node m only if n is a successor of m in the flowgraph. For intraprocedural analysis, the assumption made is that any path in the flowgraph is a possible execution path. This assumption need not hold for a particular program; however, the problem of determining the possible execution paths for an arbitrary program is known to be undecidable. The simplifying assumption that we use for interprocedural analysis is the same as that used for intraprocedural analysis, but with the added proviso that, for any path that is a possible execution path, any subsequence of return nodes must match, in reverse order, the immediately preceding subsequence of call nodes. A return node matches a call node if and only if the return node is the call node's successor in the flowgraph.

2.2 Interprocedural Forward-Flow-Or Analysis

This section begins with our basic approach to solving the calling-context problem. The dataflow equations for forward-flow-or analysis are then given and their correctness is shown.
As a part of our interprocedural analysis method, the technique of element recoding is presented as a way to deal with the aliases that result from call-by-reference formal parameters. For some dataflow problems, implicit definitions due to calls require explicit treatment, and this is discussed last. If certain problems, such as reaching definitions, are to be solved for a program by flow-sensitive interprocedural analysis, then the calling context of each procedure call must be preserved. In general, preserving calling context means that the dataflow effects of an individual call should include those effects that survive the call and were introduced into the called procedure by the call itself, but not those effects introduced into the called procedure by all the other calls to it that may exist elsewhere in the program. We refer to the need to preserve calling context as the calling-context problem. Our solution to the calling-context problem, and the essential difference between our dataflow equations and conventional dataflow equations, is to divide every IN set and every OUT set into two sets called an entry set and a body set. The reason for having two sets is that the calling-context effects that enter a procedure from the different calls can be collected and isolated in the separate entry set. Effects in this entry set can then be killed by statements in the body of the procedure, but body statements add nothing to it. Instead, any effects added by body statements go into the separate body set. Effects in the body set are killed in the same way as those in the entry set. Because the body set is kept free of calling-context effects, it is empty at the entry node. By contrast, the entry set is at its largest at the entry node and either stays the same size as it progresses through the procedure's body nodes or becomes smaller because of kills.
By intersecting the calling context at a call node with the entry set at the exit node of the called procedure, we obtain the subset of the calling context that has reached the exit node and will therefore reach the return node for that call. By "reach" we mean that there exists a path in the flowgraph along which the element is not killed or blocked.

2.2.1 The Dataflow Equations

The dataflow equations that define the entry and body sets at every node are now given. The equations are divided into three groups. The first group computes the sets for entry nodes, the second group computes the sets for return nodes, and the third group computes the sets for all other nodes. In the equations, B denotes a body set and E denotes an entry set. Two conditions, $C_1$ and $C_2$, appear in the equations. $C_1$ means that x will cross the interprocedural boundary from call node p into the called procedure. $C_2$ means that x can cross the interprocedural boundary from exit node q into return node n. $\overline{C}_i$ means not $C_i$. For each node n, pred(n) denotes the set of predecessors of n. The RECODE set used in Group I is explained in Section 2.2.2. The GEN set used in Group I, and the GEN and KILL sets used in Group II, are explained in Section 2.2.3.

For any node n:

$$IN[n] = E_{in}[n] \cup B_{in}[n] \qquad OUT[n] = E_{out}[n] \cup B_{out}[n]$$

Group I: n is an entry node.

$$B_{in}[n] = \emptyset$$
$$E_{in}[n] = \bigcup_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\}$$
$$B_{out}[n] = GEN[n]$$
$$E_{out}[n] = E_{in}[n] \cup RECODE[n]$$

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.

$$B_{in}[n] = \{x \mid (x \in B_{out}[p] \wedge (\overline{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\}$$
$$E_{in}[n] = \{x \in E_{out}[p] \mid \overline{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q])\}$$
$$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$$
$$E_{out}[n] = E_{in}[n] - KILL[n]$$

Group III: n is not an entry or return node.
$$B_{in}[n] = \bigcup_{p \in pred(n)} B_{out}[p]$$
$$E_{in}[n] = \bigcup_{p \in pred(n)} E_{out}[p]$$
$$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$$
$$E_{out}[n] = E_{in}[n] - KILL[n]$$

The equations assume that the GEN and KILL sets for each call node include only those effects of that call that occur prior to the entry of the called procedure. This requirement is necessary because the OUT set of the call node is used by the entry-node equation that constructs the entry set of the called procedure. Regarding conditions $C_1$ and $C_2$, the rules for deciding whether an effect crosses a particular interprocedural boundary depend on two primary factors, namely the dataflow problem and the programming language. For example, for the reaching-definitions problem and a language such as FORTRAN, any definition of a global variable, and any definition of a variable that is used as an actual parameter whose corresponding formal parameter is call-by-reference, will cross. As a rule, an effect that crosses into a procedure because it might be killed there will also cross back to the return node if it reaches the exit node of the called procedure. Table 2.1 shows the result of solving the equations for the flowgraph of Figure 2.1. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is reaching definitions; variable w is local, while variables x, y, and z are global. Reaching definitions is the problem of finding all definitions of a variable that reach a particular use of that variable, for all variables and uses in the program. In Figure 2.1, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block. Each defined variable is superscripted with an identifier that is the set element used in Table 2.1 to represent that definition. The correctness of the equations can be seen from the following observations.
For a procedure, the entry-node entry set is constructed as the union of all calling-context effects that can enter the procedure from its calls. Within the procedure body, effects in the entry set can be killed, but no effects are added to it. For effects in the entry set that reach a call at a call node, those effects that survive the call are recovered in the entry set constructed by the $E_{in}[n]$ equation for the successor return node n. To see that this is true, observe the following. If an entry-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so the effect should be added to the return-node entry set without further conditions; this is done by the selection criterion $(x \in E_{out}[p] \wedge \overline{C}_1)$ in the $E_{in}[n]$ equation for the return node. If, on the other hand, an entry-set effect reaches the call and does enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node entry set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in E_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $E_{in}[n]$ equation for the return node. From the equations for the entry set, we see that for any procedure z, the entry set at z's exit node will, as the equations are solved, eventually contain all calling-context effects that entered z and reached its exit node.

Figure 2.1. A reaching-definitions example. [Flowgraph figure. Blocks shown alongside the nodes: procedure main: $w^1 = 5$; $x^2 = 10$; if (w > x) then ($z^4 = 10$; call f()) else ($y^3 = 5$; call f()). Procedure f: $x^5 = 10$.]

Table 2.1. Solution of forward-flow-or equations for Figure 2.1.

| Node | $E_{in}$ | $E_{out}$ | $B_{in}$ | $B_{out}$ |
|---|---|---|---|---|
| 1 | ∅ | ∅ | ∅ | ∅ |
| 2 | ∅ | ∅ | ∅ | {1, 2} |
| 3 | ∅ | ∅ | {1, 2} | {1, 2, 4} |
| 4 | ∅ | ∅ | {1, 4, 5} | {1, 4, 5} |
| 5 | ∅ | ∅ | {1, 2} | {1, 2, 3} |
| 6 | ∅ | ∅ | {1, 3, 5} | {1, 3, 5} |
| 7 | ∅ | ∅ | {1, 3, 4, 5} | {1, 3, 4, 5} |
| 8 | {2, 3, 4} | {2, 3, 4} | ∅ | ∅ |
| 9 | {2, 3, 4} | {3, 4} | ∅ | {5} |
| 10 | {3, 4} | {3, 4} | {5} | {5} |
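To make Groups I to III concrete, the following Python sketch iterates the equations of Section 2.2.1 on the example of Figure 2.1 until all sets are stable. It rests on several assumptions that are not in the original: the flowgraph edges are encoded from the node roles described above (1→2, 2→3, 2→5, 4→7, 6→7, 8→9, 9→10, plus call edges 3→8 and 5→8, with return nodes 4 and 6 handled through their call node and f's exit node 10), definitions 1 to 5 are taken to define w, x, y, z and x as in Table 2.1, both $C_1$ and $C_2$ are assumed to reduce to "the definition is of a global variable", and RECODE is empty because f has no parameters. All names are hypothetical. Run to a fixed point, it reproduces the rows of Table 2.1.

```python
from typing import Dict, FrozenSet, Tuple

S = FrozenSet[int]
EMPTY: S = frozenset()

# Assumed encoding of Figure 2.1: definitions 1..5 define w, x, y, z, x;
# w is local to main, while x, y, z are global.
VAR = {1: "w", 2: "x", 3: "y", 4: "z", 5: "x"}
GLOBALS = {"x", "y", "z"}
ENTRY_NODES = {1, 8}
RETURN_NODES = {4: (3, 10), 6: (5, 10)}  # return node n -> (call node p, exit q)
PREDS = {2: [1], 3: [2], 5: [2], 7: [4, 6], 8: [3, 5], 9: [8], 10: [9]}
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}


def gen(n: int) -> S:
    return frozenset(GEN.get(n, ()))


def kill(n: int) -> S:
    # A definition kills every other definition of the same variable.
    defined = {VAR[d] for d in gen(n)}
    return frozenset(d for d in VAR if VAR[d] in defined) - gen(n)


def crosses(x: int) -> bool:
    # Assumption: C1 and C2 both mean "x defines a global variable".
    return VAR[x] in GLOBALS


def survives_call(x: int, e_out_exit: S) -> bool:
    # not C1, or (C1 and C2 and x in E_out[q])
    return not crosses(x) or x in e_out_exit


def solve() -> Dict[int, Tuple[S, S, S, S]]:
    """Round-robin iteration of Groups I-III until every set is stable.
    Returns node -> (E_in, E_out, B_in, B_out)."""
    sets = {n: (EMPTY, EMPTY, EMPTY, EMPTY) for n in range(1, 11)}
    changed = True
    while changed:
        changed = False
        for n in sets:
            if n in ENTRY_NODES:                       # Group I
                ps = PREDS.get(n, [])
                # OUT[p] = E_out[p] | B_out[p]; keep elements satisfying C1
                e_in = frozenset(x for p in ps
                                 for x in (sets[p][1] | sets[p][3]) if crosses(x))
                b_in = EMPTY
                e_out, b_out = e_in, gen(n)            # RECODE[n] is empty here
            else:
                if n in RETURN_NODES:                  # Group II
                    p, q = RETURN_NODES[n]
                    eq = sets[q][1]                    # E_out of callee's exit
                    e_in = frozenset(x for x in sets[p][1] if survives_call(x, eq))
                    b_in = (frozenset(x for x in sets[p][3] if survives_call(x, eq))
                            | frozenset(x for x in sets[q][3] if crosses(x)))
                else:                                  # Group III
                    ps = PREDS.get(n, [])
                    e_in = frozenset().union(*(sets[p][1] for p in ps))
                    b_in = frozenset().union(*(sets[p][3] for p in ps))
                e_out = e_in - kill(n)
                b_out = (b_in - kill(n)) | gen(n)
            new = (e_in, e_out, b_in, b_out)
            if new != sets[n]:
                sets[n] = new
                changed = True
    return sets
```

Note how return node 4 recovers definition 4 of z (it reaches $E_{out}[10]$) but not definition 2 of x, which f kills; the body-set effect 5 arrives through the $(x \in B_{out}[q] \wedge C_2)$ criterion.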
This characteristic of the exit-node entry set is the requirement placed upon it when it is used in the $E_{in}[n]$ equation for the return node, so this requirement is satisfied and the entry-set equations are correct. For any procedure, the $B_{in}$ set is always empty at the entry node, so the body set is free of calling-context effects. Within the procedure body, GEN and KILL sets are used to update the body set as it propagates along the various nodes. For effects in the body set that reach a call at a call node, those effects that survive the call are recovered in the body set constructed by the $B_{in}[n]$ equation for the successor return node n. If a body-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so it should be added to the return-node body set without further conditions; this is done by the selection criterion $(x \in B_{out}[p] \wedge \overline{C}_1)$ in the $B_{in}[n]$ equation for the return node. If, on the other hand, a body-set effect reaches the call and will enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node body set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in B_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $B_{in}[n]$ equation for the return node. In addition, all crossable effects that result from the call, and that are independent of calling context, should also be added to the return-node body set; this is done by the selection criterion $(x \in B_{out}[q] \wedge C_2)$ in the $B_{in}[n]$ equation for the return node. From the equations for the body set, we see that for any procedure z, the body set at z's exit node is free of calling-context effects and will, as the equations are solved, eventually contain all body effects that reached the exit node, including those body effects resulting from calls made within z.
This characteristic of the exit-node body set is the requirement placed upon it when it is used in the $B_{in}[n]$ equation for the return node, so this requirement is satisfied. The other requirement of this return-node equation is that the exit-node entry set contains all calling-context effects for the procedure that reach the exit node. This requirement has already been shown to be satisfied, so we conclude that the body-set equations are correct.

2.2.2 Element Recoding for Aliases

The elements of the RECODE set for an entry node are added to the entry set of that node (see the $E_{out}[n]$ equation of Group I). The idea of the RECODE set is that certain elements in the OUT set of a predecessor call node, irrespective of their ability to cross the interprocedural boundary when parameters are ignored, should nevertheless be carried over into the entry set of the called procedure as calling-context effects, because the call establishes an alias relationship between an actual parameter and a call-by-reference formal parameter. Any element that enters a procedure because of such an alias relationship between parameters should be recoded to reflect this alias relationship. A recoded element represents both the base element, which is the element as it would be if there were no alias relationship, and the non-empty alias relationship. Element recoding has two purposes. First, it allows the recoded element within the called procedure to be killed correctly through its alias relationship. Second, it allows the recoded element within the called procedure to be correctly associated with specific references to those aliases that are in the alias relationship. Element recoding never changes the base element, only the associated alias relationship, which is the set of formal parameters to which the base element is, in effect, aliased. Because element recoding in effect generates a new element, a separate RECODE set is used.
Figure 2.2 presents an algorithm for generating the entry-node input sets $E_{in}$ and RECODE for a forward-flow-or dataflow problem, under the assumed language model in which the visibility of each formal parameter is limited to the single procedure that declares it. For each element in the OUT[c] set, the algorithm generates at most one element for inclusion in the entry-node input sets. The algorithm is unambiguous except for line 10. The "can be affected by" test at line 10 is a generalization whose details depend on the specific dataflow problem being solved. For example, if the dataflow problem is reaching definitions, then each base element w represents a specific definition of some variable z. If the actual parameter p being tested by the algorithm is the variable z, and the corresponding formal parameter is call-by-reference, then the definition that w represents can be used or killed through that formal parameter, so w can be affected by that actual parameter z, and the "affected by" test is satisfied. The $p \in OA$ test at line 10 covers the situation where an actual parameter p that is aliased to the formal f is itself a formal parameter that is effectively aliased to w. In this case f is established as a new effective alias for w, by transitivity of the alias relationship. In the algorithm, the old alias relationship is not carried over into the new alias relationship. The old alias relationship is represented by the OA set, and the new alias relationship by the NA set. That this no-carry-over of the old alias relationship is correct follows from the assumed language model: the aliases of element recoding are formal parameters, and the model states that each formal parameter is visible in only one procedure.
This means there is no need to carry the old alias relationship into a different procedure, because the aliases cannot be referenced outside the single procedure in which the old alias relationship is active. Recursive calls are no exception to this no-carry-over rule, because a recursive call cancels any alias relationship established for a base element by any prior call of the procedure. In general, the fact that crossing elements are recoded when $NA \neq \emptyset$, and unrecoded when $NA = \emptyset$ and $OA \neq \emptyset$, places an added burden on the return-node equations: they must recognize an element that should be recovered from the exit-node entry set, which in effect requires additional rules to cover this possibility. After an element is recovered, it would also be necessary to restore the alias relationship, if any, that it had prior to the call. This recognition and restoration problem is perhaps most easily solved by associating with each call node two additional sets, one for body-set elements and another for entry-set elements, where each set consists of ordered pairs. These sets would be determined whenever the entry-node entry set of the called procedure is computed.

Figure 2.2. Element-recoding algorithm for forward-flow-or dataflow problems. Here e is an entry node; the algorithm constructs the $E_{in}[e]$ and $RECODE[e]$ sets.

    begin
     1  E_in[e] ← ∅
     2  RECODE[e] ← ∅
     3  for each predecessor call node c of entry node e
     4      for each element x ∈ OUT[c]
     5          let w be the base element of x
     6          let OA be the set of aliases, if any, associated with w, forming x
     7          let NA be the set of new aliases
     8          NA ← ∅
     9          for each actual parameter p at call node c that is aliased to a call-by-reference formal parameter f
    10              if (w can be affected by p) ∨ (p ∈ OA)
    11                  NA ← NA ∪ {f}
                    fi
                end for
    12          if NA ≠ ∅
    13              RECODE[e] ← RECODE[e] ∪ {(w, NA)}
    14          else if w can cross the interprocedural boundary
    15              E_in[e] ← E_in[e] ∪ {w}
                fi
            end for
        end for
    end
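The element-recoding algorithm of Figure 2.2 translates fairly directly into Python. In this sketch, which is illustrative rather than the author's code, an element is a pair of a base element and a frozen set of aliases (empty when unrecoded), and the problem-specific tests at lines 10 and 14 are passed in as the hypothetical callables `affected_by` and `can_cross`; comments map statements to the numbered lines of the figure.

```python
from typing import Callable, FrozenSet, Iterable, List, Set, Tuple

# A (possibly recoded) element: (base element, aliases).
# An empty alias set means the element is unrecoded.
Element = Tuple[str, FrozenSet[str]]
# One predecessor call node c: (OUT[c], [(actual p, by-reference formal f), ...])
CallInfo = Tuple[Set[Element], List[Tuple[str, str]]]


def recode_entry(calls: Iterable[CallInfo],
                 affected_by: Callable[[str, str], bool],
                 can_cross: Callable[[str], bool]
                 ) -> Tuple[Set[str], Set[Element]]:
    """Sketch of Figure 2.2 for one entry node e; returns (E_in[e], RECODE[e])."""
    e_in: Set[str] = set()                                  # line 1
    recode: Set[Element] = set()                            # line 2
    for out_c, by_ref in calls:                             # line 3
        for w, old_aliases in out_c:                        # lines 4-6
            new_aliases: Set[str] = set()                   # lines 7-8
            for p, f in by_ref:                             # line 9
                if affected_by(w, p) or p in old_aliases:   # line 10
                    new_aliases.add(f)                      # line 11
            if new_aliases:                                 # line 12
                recode.add((w, frozenset(new_aliases)))     # line 13
            elif can_cross(w):                              # line 14
                e_in.add(w)                                 # line 15
    return e_in, recode
```

Anticipating the global-variable g example given shortly, and encoding base elements hypothetically as "g:1" and "g:2" (definitions 1 and 2 of g): a call that passes g by reference to formal r turns "g:2" into the recoded element ("g:2", {r}), while "g:1", reaching a call without that alias, enters unrecoded. At a later call that passes r itself to formal s, the p ∈ OA test re-aliases the element to ("g:2", {s}) without carrying r over.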
The first element of each ordered pair is a crossing element x as it exists in the $B_{out}$ or $E_{out}$ set at the call node, and the second element is the element y that the element-recoding algorithm of Figure 2.2 effectively generates from x at either line 13 or line 15. If all crossing elements for the call are included in these additional sets, then the return-node equations can use these sets instead of the $B_{out}[p]$ and $E_{out}[p]$ sets to recognize elements to be recovered from the exit-node entry set. Recognition and restoration would be done by matching the exit-node entry-set element against the second element of an ordered pair from the appropriate additional set at the call node and then, if there is a match, restoring the original element from the first element of the matched pair. For example, if x is a crossing element in the $B_{out}$ set of a call node, and y is the generated element, then (x, y) would be an ordered pair in the additional set for body-set elements. When the $B_{in}$ set for the return node is computed, if y is in the exit-node entry set, then it will match the ordered pair (x, y), and element x will be added to the $B_{in}$ set. As an example of why element recoding is necessary, consider the following. Suppose there are two different calls to the same procedure, and different definitions of global variable g reach each call. At one of the calls, g is also used as an actual parameter, and the corresponding formal parameter is call-by-reference. The problem now is what to kill from the entry set whenever that formal parameter is defined in the called procedure.
If the individual elements representing the different definitions of g do not somehow identify how they are related to this formal parameter, then the only choice is to kill all of them or none of them, and neither choice is correct here: the only definitions of g that should be killed are those that entered the procedure from the call where g is aliased to the call-by-reference formal parameter.

2.2.3 Implicit Definitions Due to Calls

A call with parameters typically has implicit definitions associated with it. For example, if a formal parameter is call-by-reference, then each actual parameter aliased to that formal parameter is implicitly defined at each definition of the formal parameter. If a formal parameter is call-by-value-result, then that formal parameter is implicitly defined each time the called procedure is entered, and the actual parameter at the call is implicitly defined upon return from the call. For solving a dataflow problem such as reaching definitions, all implicit definitions due to calls should be determined, and elements generated at the appropriate nodes to represent these implicit definitions. The remainder of this section discusses the generation of implicit definitions, and the determination of what reaches them, for the specific problem of reaching definitions. We assume that a formal parameter may be call-by-reference, call-by-value, call-by-value-result, or call-by-result. For the reaching-definitions problem, all GEN sets must be prepared before the iterative algorithm can be used to solve the dataflow equations. For each point p in the program where a call-by-reference formal parameter is defined, add to the GEN set of the node for point p an implicit definition of each actual-parameter variable that is aliased to that formal parameter in a call. Each added implicit-definition element must be a recoded element that includes the alias relationship for that actual parameter.
For example, suppose a procedure named A has two call-by-reference formal parameters, x and y; inside A at point p there is a definition of x; and there are three calls of procedure A in the program. The first call aliases variable v to x. The second call aliases variable v to both x and y. The third call aliases variable w to x. Thus, at point p three implicit-definition elements would be generated, namely (v, {x}), (v, {x, y}), and (w, {x}). As an example of what this element notation means, in the (v, {x}) element the v represents the implicit definition of variable v that occurs at point p, and the x represents the formal parameter that variable v is aliased to. As a special requirement for these implicit-definition elements, if the (v, {x}) element reaches the $B_{out}$ set at the exit node of procedure A, it can only cross from this set to the return node of the first call. Similarly, the (v, {x, y}) element can only cross to the return node of the second call, and the (w, {x}) element can only cross to the return node of the third call. The crossing restrictions in the preceding example are due to a rule, now given. Let A denote a procedure containing a definition at point p of a call-by-reference formal parameter x, let (t, {x}) be the implicit-definition element generated at point p for some specific call c of A that aliases actual-parameter variable t to x, and let m be the exit node of A. If $(t, \{x\}) \in B_{out}[m]$, then (t, {x}) can only cross from $B_{out}[m]$ to the return node of call c, and as (t, {x}) crosses, it must be recoded as t by having its alias relationship nullified. This crossing-restriction rule is necessary because element (t, {x}) is both a body effect, because it is generated inside the called procedure, and a calling-context effect, because it is the result of a specific call of that procedure. This dual quality requires the special treatment that the rule provides.
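The generation of implicit-definition elements and the crossing-restriction rule can be sketched as follows, using the procedure A example above. The sketch is illustrative only: each call of A is described by a hypothetical mapping from A's call-by-reference formals to actual variables (c1, c2 and c3 are made-up call names, and u is a made-up stand-in for the unspecified actual bound to y in the first and third calls), and elements are (variable, alias set) pairs.

```python
from typing import Dict, FrozenSet, Optional, Set, Tuple

# (variable, formals it is aliased to); an empty alias set = nullified element
Element = Tuple[str, FrozenSet[str]]


def implicit_defs(defined_formal: str,
                  calls: Dict[str, Dict[str, str]]) -> Dict[Element, Set[str]]:
    """Implicit-definition elements generated at a point p inside A where the
    call-by-reference formal `defined_formal` is defined. `calls` maps each
    call of A to its {by-reference formal: actual variable} bindings.
    Returns each generated element with the call(s) that produced it."""
    elems: Dict[Element, Set[str]] = {}
    for call, binding in calls.items():
        actual = binding.get(defined_formal)
        if actual is None:
            continue
        # every formal of this call that is bound to the same actual variable
        aliases = frozenset(f for f, a in binding.items() if a == actual)
        elems.setdefault((actual, aliases), set()).add(call)
    return elems


def cross_from_exit(x: Element, call: str,
                    generated: Dict[Element, Set[str]]) -> Optional[Element]:
    """Crossing-restriction rule for an element x in B_out of A's exit node,
    tested against the return node of `call`. A generated implicit-definition
    element crosses only to the return node of its own call, recoded with its
    alias relationship nullified. Any other element is returned unchanged
    (its ordinary C2 test is not modeled in this sketch)."""
    if x in generated:
        return (x[0], frozenset()) if call in generated[x] else None
    return x
```

Because the nullified element (v, ∅) is not identical to any generated element, `cross_from_exit` lets it cross to any return node, which is the behavior the text describes next for the recursive case.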
Nullifying the alias relationship as the element crosses to the return node is good practice in general for this element, and a necessity if call c is a recursive call of A. As an example, assume that call c is a recursive call of A, and that variable t is a global variable. If (t, {x}) reaches the $B_{out}[m]$ set, the rule states that this element can only cross to the return node of call c, and that it be recoded as t. Assuming that this t element then reaches the $B_{out}[m]$ set from this return node, t can then cross to any return node that has an in-edge from m. Although both the (t, {x}) and t elements refer to the same implicit definition of variable t occurring at point p, the two elements are not the same, and the crossing-restriction rule applies only to an element that is identical to the element generated at point p, which is (t, {x}). The implicit definitions of actual-parameter variables are the most important category of implicit definitions due to call-by-reference formal parameters. However, there is also a second, less important category. At each explicit definition of a variable t at point p inside A, such that variable t is also used in a call of A as an actual parameter aliased to a call-by-reference formal parameter x, there is an implicit definition of formal parameter x at point p. The implicit-definition element generated at point p would be (x, {t}), meaning a definition of variable x at point p, aliased to variable t. However, assuming a formal parameter cannot be defined or used outside the procedure in which it is declared, there is no need for a crossing-restriction rule for these elements, because they cannot cross to any return node. Normally, a definition of a variable kills all other definitions of that variable. However, the implicit definitions due to call-by-reference formal parameters have no associated kills. Instead, the following rule suffices.
For each call-by-reference formal parameter x declared for procedure A, if all calls of A alias the same actual-parameter variable t to x, then each explicit definition inside A of either variable t or x kills all definitions of variable t and all definitions of variable x. Otherwise, if the calls of A do not all alias the same actual-parameter variable t to x, then each explicit definition inside A of either variable t or x kills only the definitions of that variable and those recoded elements that are aliased to that variable. The entry-node GEN set is used to hold all implicit definitions of formal parameters that occur upon procedure entry. Thus, for each entry node, for each formal parameter of the represented procedure that is call-by-value or call-by-value-result, add to the GEN set of that entry node an element that represents an implicit definition of that formal parameter occurring at that entry node. The return-node GEN set is used to hold all implicit definitions of actual parameters that may occur upon return from the called procedure. Thus, for each return node, for each actual parameter of the associated call whose corresponding formal parameter is call-by-result or call-by-value-result, add to the GEN set of that return node an element that represents an implicit definition of that actual parameter occurring at that return node. The return-node KILL set should represent all elements that will be killed by these implicit definitions of actual parameters. With the GEN sets ready, the iterative algorithm can proceed. Once the iterative algorithm has finished, a follow-on step is done: a) Examine the $B_{out}$ set for each exit node. For each definition d in this set of a formal parameter p, where p is call-by-result or call-by-value-result, d reaches the implicit use of this formal parameter by those implicit definitions of actual parameters found at the various return nodes whose corresponding formal parameter is p.
The element representing d can be added to the $B_{in}$ sets of those return nodes in a way that reflects the reach. b) Examine the OUT set of each call node. For each definition d in this set of a variable that is used as an actual parameter in the call, where the corresponding formal parameter is call-by-value or call-by-value-result, d reaches the implicit use of the defined variable by the implicit definition of the corresponding formal parameter found at the entry node of the called procedure. The element representing d can be added to the $E_{in}$ set of that entry node in a way that reflects the reach.

2.3 Interprocedural Forward-Flow-And Analysis

This section gives the dataflow equations used by our interprocedural analysis method for forward-flow-and problems, and explains how they differ from the equations for forward-flow-or. For forward-flow-and problems, some changes are needed to the dataflow equations given in Section 2.2.1. Of course, the confluence operator must be changed from union to intersection. However, it is still necessary to construct the entry-node entry set as the union of all crossing effects from the predecessor-node sets, so that calling context can be properly recovered at the return nodes. At the same time, the entry set must be constructed as the intersection of predecessor-node sets if it is to be a part of the IN and OUT sets. These conflicting requirements for the entry-node entry set can be resolved by maintaining two separate entry sets at each node. The revised dataflow equations follow. The two conditions, $C_1$ and $C_2$, are explained in Section 2.2.1.

For any node n:

$$IN[n] = E^{(2)}_{in}[n] \cup B_{in}[n] \qquad OUT[n] = E^{(2)}_{out}[n] \cup B_{out}[n]$$

Group I: n is an entry node.
$$B_{in}[n] = \emptyset$$
$$E^{(1)}_{in}[n] = \bigcup_{p \in pred(n)} \{x \mid x \in (E^{(1)}_{out}[p] \cup B_{out}[p]) \wedge C_1\}$$
$$E^{(2)}_{in}[n] = \bigcap_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\}$$
$$B_{out}[n] = GEN[n]$$
$$E^{(1)}_{out}[n] = E^{(1)}_{in}[n] \cup RECODE^{(1)}[n] \cup RECODE^{(2)}[n]$$
$$E^{(2)}_{out}[n] = E^{(2)}_{in}[n] \cup RECODE^{(2)}[n]$$

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.

$$B_{in}[n] = \{x \mid (x \in B_{out}[p] \wedge (\overline{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\}$$
$$E^{(i)}_{in}[n] = \{x \in E^{(i)}_{out}[p] \mid \overline{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q])\}, \quad i = 1, 2$$
$$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$$
$$E^{(i)}_{out}[n] = E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2$$

Group III: n is not an entry or return node.

$$B_{in}[n] = \bigcap_{p \in pred(n)} B_{out}[p]$$
$$E^{(i)}_{in}[n] = \bigcap_{p \in pred(n)} E^{(i)}_{out}[p], \quad i = 1, 2$$
$$B_{out}[n] = (B_{in}[n] - KILL[n]) \cup GEN[n]$$
$$E^{(i)}_{out}[n] = E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2$$

The entry set $E^{(1)}$ is the set used to recover calling context, and the entry set $E^{(2)}$ is the set that is a component of the IN and OUT sets. The RECODE sets appearing in the entry-node equations represent recoded elements as explained in Section 2.2.2. The $RECODE^{(1)}$ set is simply the union of the recoded elements generated from each predecessor call node c, using the algorithm of Figure 2.2 and drawing from the $E^{(1)}_{out}[c]$ and $B_{out}[c]$ sets at line 4 instead of the OUT[c] set. Similarly, the $RECODE^{(2)}$ set could simply be the intersection of the recoded elements from each predecessor call node c, drawing from the OUT[c] set at line 4. However, doing this may cause the unnecessary loss of recoded elements when the same underlying base element w is found in each OUT[c] set. To avoid such loss, an improved rule states that if the same base element w is found in each OUT[c] set, and there are one or more non-empty alias relationships for that w at one or more predecessor nodes c, then a single recoded element for that w that encodes all of these alias relationships is generated into the $RECODE^{(2)}$ set; otherwise, no recoded element for that w is generated into the $RECODE^{(2)}$ set.
For example, suppose a given entry node has three predecessor call nodes c, the same base element w is found in each $\mathrm{OUT}[c]$ set, and at the first c there is an empty alias relationship, at the second c there is an alias relationship to formal parameter x, and at the third c there is an alias relationship to formal parameter y. For this example, the single recoded element would be $(w, \{x, y\})$, and this recoded element can be killed either directly through w, or indirectly through x or through y. Note that the complete kill of this recoded element at any kill point is nevertheless correct, even though the kill may have been made through an alias that was not established at each c. The intersection confluence operator associated with $\mathrm{RECODE}^{(2)}$ implicitly requires that, for base element w to pass a kill point, it must be on every call path past that kill point. This is not the case when w is killed on at least one call path, which happens when w is killed through an alias that was established by at least one of the c.

If the specific dataflow problem being solved allows the base element to be used through one of its effective aliases, then a flag could be associated with each alias in the recoded elements of $\mathrm{RECODE}^{(2)}$, indicating whether or not the alias was established at each c. In the example, the recoded element with flags would be $(w, \{x_{not}, y_{not}\})$. Only a use of the base element through an alias established at each c is a use through an alias that occurs on every call path, and this kind of use is the all-paths use that the specific dataflow problem implicitly requires by virtue of being forward-flow-and.

With the exception of the confluence operator and the two different entry sets, the equations for forward-flow-and are the same as for forward-flow-or, and are likewise correct.
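The improved $\mathrm{RECODE}^{(2)}$ rule, together with the optional alias flags, can be sketched as follows. This is a minimal illustration, not the algorithm of Figure 2.2: we assume (hypothetically) that each predecessor call node c is given as a dictionary mapping each base element w found in $\mathrm{OUT}[c]$ to the set of formal parameters that w is aliased to at c, with an empty set standing for an empty alias relationship. The function name `recode2` is ours, and a flag value of `False` plays the role of the "not" marking in the text.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical input: one dict per predecessor call node c, mapping each
# base element w in OUT[c] to the formal parameters aliased to w at c.
CallAliases = Dict[str, FrozenSet[str]]
# A recoded element: the base element plus, for each alias, a flag that is
# True only if that alias was established at every predecessor call node.
Recoded = Tuple[str, Dict[str, bool]]


def recode2(calls: List[CallAliases], w: str) -> Optional[Recoded]:
    """Sketch of the improved RECODE(2) rule for a single base element w."""
    if not calls or any(w not in c for c in calls):
        return None  # w must be found in every OUT[c]
    all_aliases = frozenset().union(*(c[w] for c in calls))
    if not all_aliases:
        return None  # no non-empty alias relationship at any c
    # One recoded element encoding every alias seen at any c.
    flags = {a: all(a in c[w] for c in calls) for a in sorted(all_aliases)}
    return (w, flags)


# The three-call example: empty alias, alias to x, alias to y.
example = [{"w": frozenset()}, {"w": frozenset({"x"})}, {"w": frozenset({"y"})}]
print(recode2(example, "w"))  # ('w', {'x': False, 'y': False})
```

A kill through w, x, or y would remove this whole element, as the text argues; only an alias whose flag is `True` would count as an all-paths use.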
Set $E^{(2)}$ fulfills the requirement for the IN and OUT sets by consistently using the intersection confluence operator for its construction, just as B does. The equations for the $E^{(1)}$ and $E^{(2)}$ sets differ only at the entry node, and there the only differences are the confluence operator and the way the RECODE sets are built. As set intersection is the confluence operator for $E^{(2)}$ and set union for $E^{(1)}$, and the $\mathrm{RECODE}^{(2)}$ set is added to both $E^{(1)}$ and $E^{(2)}$, it follows that $E^{(2)}$ will be a subset of $E^{(1)}$ at every node. Thus, $E^{(1)}$ can be used to recover calling context for $E^{(2)}$. Set $E^{(1)}$ also serves to recover calling context for both $E^{(1)}$ and B, because $E^{(1)}$ is built at the entry node from these two sets, and the use of union as the confluence operator guarantees that all calling-context effects will be collected.

Table 2.2. Solution of forward-flow-and equations for Figure 2.3.

| Node | $E^{(1)}_{\mathrm{in}}$ | $E^{(1)}_{\mathrm{out}}$ | $E^{(2)}_{\mathrm{in}}$ | $E^{(2)}_{\mathrm{out}}$ | $B_{\mathrm{in}}$ | $B_{\mathrm{out}}$ |
|---|---|---|---|---|---|---|
| 1 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ |
| 2 | ∅ | ∅ | ∅ | ∅ | ∅ | {1, 2} |
| 3 | ∅ | ∅ | ∅ | ∅ | {1, 2} | {1, 2, 4} |
| 4 | ∅ | ∅ | ∅ | ∅ | {1, 4, 5} | {1, 4, 5} |
| 5 | ∅ | ∅ | ∅ | ∅ | {1, 2} | {1, 2, 3} |
| 6 | ∅ | ∅ | ∅ | ∅ | {1, 3, 5} | {1, 3, 5} |
| 7 | ∅ | ∅ | ∅ | ∅ | {1, 5} | {1, 5} |
| 8 | {2, 3, 4} | {2, 3, 4} | {2} | {2} | ∅ | ∅ |
| 9 | {2, 3, 4} | {3, 4} | {2} | ∅ | ∅ | {5} |
| 10 | {3, 4} | {3, 4} | ∅ | ∅ | {5} | {5} |

Table 2.2 shows the result of solving the equations for the flowgraph of Figure 2.3. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is available expressions, and variable w is local while variables x, y, and z are global. Available expressions is the problem of determining, for certain expressions in the program, whether each use of an expression is always reached by some prior use of that expression. In Figure 2.3, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block.
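The solution in Table 2.2 can be reproduced mechanically with a small round-robin solver for the forward-flow-and equations. This is a sketch under assumptions that are ours: the flowgraph of Figure 2.3 is hand-encoded, elements 1 to 5 stand for the expressions w + 1, x + 1, y + 1, z + 1, and z + 2, and the RECODE sets are empty because the calls pass no parameters. Section 2.2.1 defines $C_1$ and $C_2$ precisely; here we only assume, through the hypothetical set `GLOBAL_ONLY`, that both hold exactly for the expressions that mention no local variable, which is enough for this example.

```python
# Hypothetical hand encoding of the flowgraph of Figure 2.3.
NODES = range(1, 11)
ENTRY_PREDS = {1: [], 8: [3, 5]}       # entry node -> predecessor call nodes
RETURNS = {4: (3, 10), 6: (5, 10)}     # return node -> (call node p, exit node q)
PREDS = {2: [1], 3: [2], 5: [2], 7: [4, 6], 9: [8], 10: [9]}
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}
KILL = {2: {3, 4, 5}, 9: {2}}          # y = ... kills y+1; z = ... kills z+1, z+2; x = ... kills x+1
GLOBAL_ONLY = {2, 3, 4, 5}             # assumed stand-in for C1 and C2


def c1(x):
    return x in GLOBAL_ONLY


c2 = c1


def crossing(s):
    return {x for x in s if c1(x)}


def meet(sets):
    """Intersection confluence; no predecessors yields the empty set."""
    sets = list(sets)
    return set.intersection(*sets) if sets else set()


def survives_call(x, exit_out1):
    """Return-node condition: not C1, or (C1 and C2 and x in E_out^(1)[q])."""
    return not c1(x) or (c1(x) and c2(x) and x in exit_out1)


def solve():
    names = ("Ein1", "Eout1", "Ein2", "Eout2", "Bin", "Bout")
    S = {name: {n: set() for n in NODES} for name in names}  # all sets start empty
    changed = True
    while changed:
        changed = False
        for n in NODES:
            gen, kill = GEN.get(n, set()), KILL.get(n, set())
            if n in ENTRY_PREDS:                      # Group I
                ps = ENTRY_PREDS[n]
                b_in = set()
                e_in1 = set().union(*(crossing(S["Eout1"][p] | S["Bout"][p]) for p in ps))
                e_in2 = meet(crossing(S["Eout2"][p] | S["Bout"][p]) for p in ps)  # OUT[p]
            elif n in RETURNS:                        # Group II
                p, q = RETURNS[n]
                ctx = S["Eout1"][q]                   # E^(1) recovers calling context
                b_in = ({x for x in S["Bout"][p] if survives_call(x, ctx)}
                        | {x for x in S["Bout"][q] if c2(x)})
                e_in1 = {x for x in S["Eout1"][p] if survives_call(x, ctx)}
                e_in2 = {x for x in S["Eout2"][p] if survives_call(x, ctx)}
            else:                                     # Group III
                ps = PREDS[n]
                b_in = meet(S["Bout"][p] for p in ps)
                e_in1 = meet(S["Eout1"][p] for p in ps)
                e_in2 = meet(S["Eout2"][p] for p in ps)
            if n in ENTRY_PREDS:
                b_out, e_out1, e_out2 = set(gen), set(e_in1), set(e_in2)  # RECODE sets empty
            else:
                b_out = (b_in - kill) | gen
                e_out1, e_out2 = e_in1 - kill, e_in2 - kill
            new = dict(zip(names, (e_in1, e_out1, e_in2, e_out2, b_in, b_out)))
            for name, value in new.items():
                if S[name][n] != value:
                    S[name][n] = value
                    changed = True
    return S


table = solve()
print(sorted(table["Bin"][4]), sorted(table["Bout"][7]))  # [1, 4, 5] [1, 5]
```

Because this particular flowgraph has no loops, the round-robin iteration stabilizes after three passes, and every row matches Table 2.2.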
In Figure 2.3, each expression is superscripted with an identifier that is the set element used in Table 2.2 to represent that expression.

Figure 2.3. An available-expressions example.

- procedure main: entry node 1; node 2: y = w + 1¹, z = x + 1², if (e); node 3 (then branch): a = z + 1⁴, call f(); return node 4; node 5 (else branch): a = y + 1³, call f(); return node 6; exit node 7.
- procedure f(): entry node 8; node 9: x = z + 2⁵; exit node 10.

2.4 Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. However, the same flowgraph is used for both forward-flow and backward-flow problems. Converting the equations for forward-flow-or to backward-flow-or, or for forward-flow-and to backward-flow-and, is mechanical and straightforward. The same equations are used, but various words and phrases are changed everywhere to reflect the reverse flow. For example, "pred(n)" for predecessors becomes "succ(n)" for successors; "out" subscripts become "in" subscripts and "in" subscripts become "out" subscripts; IN becomes OUT and OUT becomes IN; "call node" becomes "return node" and "return node" becomes "call node"; and "entry node" becomes "exit node" and "exit node" becomes "entry node". For backward flow, the nodes requiring special equations are the exit node and the call node, and not the entry node and the return node as for forward-flow problems.

2.5 Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed language model, in which the visibility of each formal parameter is limited to the single procedure that declares it, we consider the solution of the dataflow equations for only one element at a time. Let n be the number of flowgraph nodes. Let the elementary operation measured by the complexity be the computation of the dataflow equations once at a single, average flowgraph node, for a single element.
Only the presence or absence of the single element within a particular body or entry set need be represented, and this requires no more than a single bit of storage for each set referenced by the equations. Thus, computing the dataflow equations once at an average node, for a single element, will consist of a small number of integer operations. This assumes that the average in- and out-degree of the flowgraph nodes is bounded by a small constant, which will always be the case for flowgraphs generated from real programs, and also that the length of recoded elements will be small. Referring to the algorithm of Figure 2.2, the length of a recoded element is $1 + |NA|$, and $|NA|$ is bounded from above by the number of call-by-reference formal parameters of the given procedure. As a rule, this upper bound will be small.

We next consider the total number of node visits required to solve the dataflow equations for a single element. Prior to solving the equations, all body and entry sets are initialized to empty, at complexity O(n). The empty sets represent the absence of the element. Note that each set has only two states: either the element is present, or it is absent. Assuming a forward-flow problem, each time the equations are computed for a node, if any of the out sets have changed from their previous state, then the equations will be computed for all successor nodes. Starting from empty, each out set can change state at most once, from absent to present. The forward-flow-or equations have only two out sets per node, and the forward-flow-and equations have three. It follows that repeated computation of the equations for a single node will cause the successor nodes to be marked for computation at most two or three times, depending on the equations being used.
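The visit-count argument can be illustrated with a deliberately simplified sketch: an intraprocedural forward-flow-or problem for one element, with every set reduced to a single bit and only one out bit per node (the equations of this chapter keep two or three out sets per node). The flowgraph, the `gen` and `kill` node sets, and the function name are hypothetical. Because an out bit can only change once, a node's successors are re-marked at most once, so the number of visits is at most n plus the number of edges.

```python
from collections import deque


def solve_one_element(succ, gen, kill):
    """Forward-flow-or for a single element: every set is one bit (a bool).

    succ maps each node to its successor list; gen and kill are the nodes
    that generate or kill the element. Returns the OUT bits and the number
    of node visits made by the worklist.
    """
    preds = {n: [] for n in succ}
    for n, ss in succ.items():
        for s in ss:
            preds[s].append(n)
    out = {n: False for n in succ}      # initialized to empty (element absent)
    work = deque(succ)                  # every node is visited at least once
    visits = 0
    while work:
        n = work.popleft()
        visits += 1
        in_bit = any(out[p] for p in preds[n])          # union confluence
        new_out = n in gen or (in_bit and n not in kill)
        if new_out != out[n]:           # monotone: only False -> True
            out[n] = new_out
            work.extend(succ[n])        # re-mark the successors
    return out, visits


# Hypothetical 5-node flowgraph with a loop 2 -> 3 -> 2.
succ = {1: [2], 2: [3], 3: [2, 4], 4: [5], 5: []}
out, visits = solve_one_element(succ, gen={1}, kill={4})
edges = sum(len(ss) for ss in succ.values())
print(out, visits, visits <= len(succ) + edges)
```

Here the element reaches nodes 1 to 3, is killed at node 4, and the worklist makes 9 visits against a bound of 10.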
Given that the average number of successor nodes is bounded by a small constant, it follows that the total number of node visits required to solve the dataflow equations for a single element will be bounded from above by $k_1 n$, where $k_1$ is a constant, giving a worst-case complexity of O(n) for solving the dataflow equations for a single element.

The worst-case complexity of solving the dataflow equations for m total elements will therefore be O(mn). Let b be the number of base elements for the program being analyzed, and let r be the number of recoded elements, giving m = b + r. As an example, for the reaching-definitions dataflow problem the base elements will be all the definitions in the program. We assume that, for the kind of dataflow problems our method is meant to solve, the number of base elements will be a linear function of the program size, and therefore proportional to n. Let constant $k_2$ be an upper bound of b/n. We also assume the universe of real, useful programs, written by programmers to solve practical problems. To determine an upper bound for r, let k be the maximum number of formal parameters for a single procedure. That k is a constant independent of program size should be obvious. Given k and the algorithm of Figure 2.2, and allowing all possible combinations of the formal parameters of any single procedure, the maximum number of recoded elements for any single procedure and base element is

$$k_3 = \sum_{i=1}^{k} \binom{k}{i} = 2^k - 1.$$

Note that $k_3$ is a constant, albeit an enormous one. The maximum number of recoded elements for any single procedure will therefore be $k_3 b$. In the assumed language model, each formal parameter is visible in only one procedure, and this means each recoded element is confined to a single procedure when the dataflow equations are solved.
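The count $k_3 = 2^k - 1$ can be confirmed by brute-force enumeration. This sketch simply lists every non-empty combination of k formal parameters, which is the worst case assumed in the text; the function name and the formal-parameter names are hypothetical.

```python
from itertools import combinations
from math import comb


def count_recoded_bound(formals):
    """Number of non-empty combinations of one procedure's formal parameters,
    i.e. the worst-case count k3 of recoded elements per procedure and base element."""
    return sum(1 for i in range(1, len(formals) + 1)
               for _ in combinations(formals, i))


for k in range(1, 6):
    formals = [f"p{i}" for i in range(k)]             # hypothetical formal names
    by_binomials = sum(comb(k, i) for i in range(1, k + 1))
    print(k, count_recoded_bound(formals), by_binomials, 2 ** k - 1)
```

The three counts agree for every k, and the doubling per added parameter shows why $k_3$ is called an enormous constant.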
Therefore, the total number of node visits required to solve the dataflow equations for all the recoded elements will be bounded from above by $\sum_{i=1}^{j} k_1 s_i k_3 b$, where j is the number of procedures in the flowgraph, and $s_i$ is the number of flowgraph nodes in the ith procedure. Since $b \le k_2 n$, this upper bound is at most $\sum_{i=1}^{j} k_1 k_2 k_3 n s_i$. Ignoring constants, and given that $\sum_{i=1}^{j} s_i = n$ and therefore $\sum_{i=1}^{j} n s_i = n^2$, the worst-case complexity of our method for the assumed language model is $O(n^2)$, and the elementary operation measured by the complexity is a small number of integer operations, assuming that the average recoded-element length is small.

For a program from the assumed universe of programs, the likelihood of a large complexity constant due to element recoding is very low, for the following reason. In order to increase the number of recoded elements for a given base element and procedure, the given base element must, in effect, be repeatedly aliased to different combinations of formal parameters in the given procedure. The algorithm of Figure 2.2 generates at most a single recoded element for each element in the OUT set, so to increase the number of recoded elements as stated, there must be multiple calls to the same procedure, and in these different calls the same base element must be aliased to different formal-parameter combinations. To assess the likelihood of this requirement being met, consider that for any given program from the assumed universe, the type and purpose of a variable determine how that variable is used in that program, and each variable used in a program by necessity has a purpose.
If there are a number of different calls to the same procedure, and a variable appears as one or more of the actual parameters in each of the calls, then as a rule we expect that variable to always occupy the same parameter positions in those calls, because there is always a close correspondence between a parameter position and the purpose of the variable that occupies that position. Note that by "variable" we mean a variable and any aliases it may have, including formal-parameter aliases. A variable and its aliases are interchangeable and share the same purpose because by definition they reference the same data.

It might be argued that a language such as C has procedures that take a variable number of arguments, such as printf and scanf, for which the same variable could easily occupy different actual-parameter positions in different calls. This is true, but such library procedures are best treated as unknown calls, and there is no element recoding for unknown calls. For the needs of element recoding in the rare case of a user-written procedure with a variable number of arguments, a single formal parameter could stand for the variable portion of the formal parameters, and conservative assumptions could be made whenever that single formal is, in effect, referenced. Aside from mentioning this, we do not consider such user-written variable-argument procedures further.

For a dataflow problem such as reaching definitions, the base element can only be affected by a single variable. For such a dataflow problem, the purposefulness of variables makes it very unlikely that an increase in the number of recoded elements for a given procedure and base element can even begin, let alone be sustained. However, such an increase would be more likely for a dataflow problem where the base element can be affected by several different variables.
An example would be available expressions, because each base element could be affected by as many different variables as compose the expression represented by that base element.

In light of the preceding argument regarding the purposefulness of variables, for the reaching-definitions and similar dataflow problems, we expect the maximum number of recoded elements for any given procedure and base element to be one in the majority of the programs in the assumed universe, and a little higher than one for the remaining programs in that universe. Given the algorithm of Figure 2.2, we also expect the average length of each recoded element to be slightly more than two. This follows from the preceding expectation of a very small maximum number of recoded elements for any given procedure and base element, and from the assumption that most base elements, when aliased by a call, will be aliased to only a single formal parameter, and only occasionally to more than one. Note that this expected average length of the recoded elements is consistent with the claim that the elementary operation measured by the worst-case complexity of our method is a small number of integer operations.

It may be noticed that the complexity of $O(n^2)$ for our interprocedural analysis method is the same as the known worst-case complexity for intraprocedural dataflow analysis, assuming there are no restrictions on the flowgraph. This fact makes it unlikely that our method could be improved on in terms of complexity without resorting to flowgraph restrictions. However, although the complexities are the same, this does not mean interprocedural dataflow analysis will now take roughly the same time as intraprocedural dataflow analysis. The following inequality, which is strict whenever j > 1, should make this clear:

$$\sum_{i=1}^{j} s_i^2 < n^2,$$

given that j is the number of procedures in the flowgraph, $s_i$ is the number of flowgraph nodes in the ith procedure, and $\sum_{i=1}^{j} s_i = n$.
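A quick numeric illustration of the inequality, using made-up procedure sizes rather than data from this study: the ratio of $n^2$ (the interprocedural bound) to $\sum_i s_i^2$ (the sum of per-procedure intraprocedural bounds) equals j when all procedures have the same size, and shrinks toward 1 when one procedure dominates. The function name is hypothetical.

```python
def cost_ratio(sizes):
    """Ratio of n**2 to sum(s_i**2), where sizes lists the s_i and n = sum(sizes)."""
    n = sum(sizes)
    return n * n / sum(s * s for s in sizes)


print(cost_ratio([40] * 100))           # 100.0: j equal procedures give a ratio of j
print(cost_ratio([3000] + [10] * 100))  # about 1.78: one large procedure dominates
```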
Besides the language model that is assumed for this chapter, an alternative model allows each formal parameter to have visibility in more than a single procedure. Examples of programming languages that fit this alternative model are Pascal and Ada, which allow nested procedures. Element recoding can be used for this alternative model, but unless precision is compromised, the worst-case complexity for solving the equations will be exponential, because the number of recoded elements could grow exponentially, assuming that alias information is compounded when a recoded element is recoded. The exponential complexity of tracking aliases due to calls was first considered by Myers [17], and more recently by Landi and Ryder [15]. In practice, the cost of precise element recoding for the alternative language model may be acceptable for the assumed universe of programs, for the same reason given previously regarding the purposefulness of variables. However, we do not consider the alternative model further.

2.6 Experimental Results

Experimental data are available for our interprocedural analysis method. Specifically, two different prototypes have been constructed, and both solve the reaching-definitions dataflow problem using our method. Both prototypes accept C-language programs as the input to be dataflow analyzed. For simplicity, these prototypes impose some restrictions on the input, such as requiring that all variables be represented by single identifiers, thereby excluding variables that have more than one component, such as structure and union variables. In addition, there is no logic in the prototypes to determine what pointers are pointing at, so pointer dereferencing is essentially ignored. The prototypes do not accept preprocessor commands, so the input programs must already have been run through the preprocessor. Both prototypes, named prototype 1 and prototype 2, use the same code to parse the input program and construct the flowgraph.
However, they differ in how they implement our analysis method. Prototype 1 uses a single bit-vector format containing all the definitions in the input program, and then solves the dataflow equations once for the program flowgraph. Prototype 2 uses a single integer as the bit vector and solves the dataflow equations for the program flowgraph as many times as there are base elements. For the reaching-definitions dataflow problem, the definitions in the program are the base elements. We call the approach used by prototype 2 one-base-element-at-a-time, and the approach used by prototype 1 all-at-once.

It might be expected that prototype 2 would be many times slower than prototype 1, because of the big difference in bit-vector sizes, but this is not the case. For prototype 1, calculations using varied test results show that $V \times S_1 \approx D$, where V is the average number of visits per flowgraph node made to solve the dataflow equations, $S_1$ is the integer size of the bit vector for prototype 1, and D is the number of definitions in the input program. This relationship for prototype 1 means that prototype 2 should run at roughly the same speed as prototype 1, because solving the dataflow equations for a single element will require an average of roughly one visit per flowgraph node and the application of the dataflow equations to a vector of size one. Note that the total amount of work prototype 1 must do per flowgraph node to solve the equations is proportional to the product $V \times S_1 \approx D$, and the total amount of work prototype 2 must do per flowgraph node to solve the equations for the D base elements is proportional to the product $V \times S_2 \times D \approx 1 \times 1 \times D = D$, where $S_2$ is the integer size of the bit vector for prototype 2. Experimental results have supported the expectation of similar speeds for the two prototypes.
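The two strategies can be contrasted with a small reaching-definitions sketch. It is a simplification (intraprocedural, no calls, no recoding) and makes no claim about the prototypes' actual code or timings: the all-at-once version packs every definition into one Python integer used as a bit vector, as prototype 1 does with its single bit-vector format, while the one-at-a-time version solves once per definition with a one-bit set, as prototype 2 does. The flowgraph, the definition numbering, and the function names are hypothetical.

```python
def reaching_all_at_once(nodes, preds, gen, kill):
    """All-at-once: one bit vector (a Python int) holds every definition."""
    out = {n: 0 for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            in_vec = 0
            for p in preds[n]:
                in_vec |= out[p]                      # union confluence
            new = gen[n] | (in_vec & ~kill[n])
            if new != out[n]:
                out[n], changed = new, True
    return out


def reaching_one_at_a_time(nodes, preds, gen, kill, num_defs):
    """One-base-element-at-a-time: one solve per definition, 1-bit sets."""
    report = {n: 0 for n in nodes}
    for d in range(num_defs):
        bit = 1 << d
        o = {n: False for n in nodes}
        changed = True
        while changed:
            changed = False
            for n in nodes:
                reached = any(o[p] for p in preds[n])
                new = bool(gen[n] & bit) or (reached and not (kill[n] & bit))
                if new != o[n]:
                    o[n], changed = new, True
        for n in nodes:
            if o[n]:
                report[n] |= bit                      # collect the reaches
    return report


# d0: x = ... at node 1, d1: y = ... at node 2, d2: x = ... at node 3;
# loop 2 -> 3 -> 2, and node 4 follows node 2.
nodes = [1, 2, 3, 4]
preds = {1: [], 2: [1, 3], 3: [2], 4: [2]}
gen = {1: 0b001, 2: 0b010, 3: 0b100, 4: 0}
kill = {1: 0b100, 2: 0, 3: 0b001, 4: 0}
together = reaching_all_at_once(nodes, preds, gen, kill)
separately = reaching_one_at_a_time(nodes, preds, gen, kill, 3)
print(together == separately, together)  # True {1: 1, 2: 7, 3: 6, 4: 7}
```

Both versions compute the same reaches; they differ only in whether the equations are applied once to a D-bit vector or D times to a one-bit set.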
The finding that the two prototypes run at similar speeds is important when deciding on the design of a practical tool, and it decisively tips the scales in favor of the one-base-element-at-a-time approach used by prototype 2. For both prototypes, the bit space needed for set storage is nks, where n is the number of flowgraph nodes, k is the average number of sets per node, and s is the maximum, over all solvings of the equations, of the average set size in bits. Note that for prototype 1 there is only one solving of the equations, and for prototype 2 there are as many solvings of the equations as base elements. The primary reason the approach used by prototype 2 is preferable to the all-at-once approach used by prototype 1 is the likelihood of a greatly reduced s value. For example, without element recoding, the s value is 1 for prototype 2, and D for prototype 1. Allowing element recoding, the s value for the prototype-2 approach will be 1 plus the maximum, over all solvings of the equations, of the average number of recoded elements per procedure. Here we assume that the best way to add element recoding to prototype 2 would be, for each solving of the equations, to solve the equations for both a single base element and all recoded elements generated from that base element.

Table 2.3 presents typical experimental results for the two prototypes. Each table row represents a different input program. The input programs were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions.

Table 2.3. Typical experimental results for the two prototypes.

| defs | defs global | calls | nodes | prototype 1 | prototype 2 |
|---|---|---|---|---|---|
| 2126 | 30% | 521 | 4191 | 49s | 1m21s |
| 2026 | 60% | 472 | 3948 | 55s | 2m22s |
| 4109 | 30% | 924 | 7537 | 4m18s | 4m38s |
| 4223 | 60% | 916 | 7723 | 4m57s | 8m19s |
| 6115 | 30% | 1325 | 11185 | N/A | 10m0s |
| 6091 | 60% | 1411 | 11288 | N/A | 18m18s |
| 8200 | 30% | 1832 | 14799 | N/A | 17m44s |
| 8054 | 60% | 1726 | 14641 | N/A | 30m2s |
| 10299 | 30% | 2164 | 18434 | N/A | 23m55s |
| 10016 | 60% | 2356 | 18587 | N/A | 45m8s |
Each input program in Table 2.3 has 100 procedures. Only prototype 1 currently has element-recoding logic, so the input programs do not have call parameters and the table data do not reflect element-recoding costs. Measuring element-recoding costs for randomly generated programs would be somewhat meaningless anyway, since the purposefulness-of-variables principle would be violated.

Referring to the columns of Table 2.3, "defs" is the total number of definitions in the input program, "defs global" is the percentage of those definitions that define global variables, "calls" is the number of known calls, "nodes" is the number of flowgraph nodes, "prototype 1" is the total CPU time, in minutes and seconds, required by prototype 1 to completely solve the reaching-definitions dataflow problem for the input program and generate a report of all the reaches, and "prototype 2" is the same measurement for prototype 2. The hardware used was rated at roughly 23 MIPS. The large space requirements of prototype 1 prevented running it on the larger input programs in the table, which are marked N/A.

CHAPTER 3 INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect

This section lays the theoretical basis for our algorithm. The problem of interprocedural logical ripple effect is examined from the perspective of execution paths and their possible continuations. First, general definitions are given, followed by three assumptions and a definition of the Allow and Transform sets, followed by Lemma 1, Theorems 1 through 4, and a discussion of the potential for overestimation inherent in the Allow set.

A variable is defined at each point in a program where it is assigned a value. A definition is assumed to have the general form "v ← expression", where v is the variable being defined and "←" is an assignment operator that assigns the value of expression to v.
If the expression includes variables, then these variables are termed the use variables of the definition. In general, a use is any instance of a variable whose value is used at the point where the variable occurs. A procedure contains a definition if the statement that makes the definition is in the body of the procedure. Similarly, a procedure contains a call if the statement that makes the call is in the body of the procedure. The body of a procedure is the set of statements that are defined as belonging to the procedure.

Frequent reference is made in this chapter to a procedure containing a statement, a call, or a flowgraph node. For languages that allow nested procedures, such as Pascal and Ada, note that procedure nesting in these languages is a mechanism for controlling variable scope, and not a mechanism for sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume that at most one procedure contains any given statement, call, or flowgraph node.

Let d and dd be two definitions, possibly the same, in the same program. Let dd have a use variable v, let $v_{dd}$ be that use-variable instance, and let d define v. A possible execution path between definition d and $v_{dd}$, along which the definition of v that d represents would be propagated, is referred to as a definition-clear path between d and $v_{dd}$ with respect to v. Definition d can only be propagated along an execution path to the end of that path if either definition d itself or an element that represents definition d exists at the beginning of that path, and there is no redefinition of v along that path. Definition d is said to affect definition dd if there is a definition-clear path between d and $v_{dd}$ with respect to v. Similarly, definition d affects use u if u is an instance of v, and there is a definition-clear path between d and u with respect to v. For convenience, v will not be explicitly mentioned when it is understood.
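The definition-clear condition can be stated as a small check over one concrete path. This is only an illustration with a hypothetical encoding: a path is a list of program points, each given as the set of variables defined at that point, with d at the first point and the use $v_{dd}$ (which shares a program point with dd) at the last. The function names are ours.

```python
from typing import List, Set

# Hypothetical encoding: each program point lists the variables it defines.
Path = List[Set[str]]


def is_definition_clear(path: Path, v: str) -> bool:
    """True if no point strictly between d (path[0]) and the use (path[-1])
    redefines v. The last point is excluded because the use in dd happens
    before dd's own assignment, as in v = v + 1."""
    return all(v not in defs for defs in path[1:-1])


def affects(path: Path, v: str) -> bool:
    """d affects the use at the end of the path if d defines v and the
    path is definition-clear with respect to v."""
    return v in path[0] and is_definition_clear(path, v)


# d: v = 1; then x = v; then dd: y = v + 2
print(affects([{"v"}, {"x"}, {"y"}], "v"))   # True
# the same path with v redefined in the middle
print(affects([{"v"}, {"v"}, {"y"}], "v"))   # False
```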
Note that whenever we speak of an execution path between two points, we always mean that the execution path begins at the first point and ends at the second point. For example, an execution path between d and dd begins at the program point where d occurs and ends at the program point where $v_{dd}$ occurs. For convenience, we assume that dd and $v_{dd}$ occupy the same program point.

Assumption 1. A called procedure, if it returns, always returns to its most recent caller. A procedure that returns always returns to the most recent unreturned call.

Assumption 2. A call has no influence on the execution paths taken inside the called procedure.

Assumption 3. There are no recursive calls.

Assumption 1 reflects the behavior of all the procedural languages that we know of. Regarding Assumption 2, our algorithm may in fact overestimate the logical ripple effect because of both Assumption 2 and the unstated but standard assumption of intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible execution paths. However, these two assumptions are unavoidable, because determining all the truly possible execution paths in an arbitrary program is known to be an undecidable problem. Regarding Assumption 3, making this assumption improves the precision of our algorithm because it removes a potential cause of overestimation. The consequence of using our algorithm for a program with recursive calls is discussed at the end of Section 3.2.

To determine what a definition affects when it is constrained by ripple effect, it is useful to introduce two concepts: backward flow and forward flow. Given an execution path, whenever the execution path returns from a procedure to a call, this is termed backward flow. All other parts of the execution path may be termed forward flow.
Note that the possibilities for backward flow are constrained by Assumption 1, and therefore constrained by the relevant execution paths that lead up to the point of the return in question. Regarding a given execution path, those call instances within that execution path that have yet to be returned to within that path, called unreturned calls, are the parts of the path that constrain backward flow. Note that this constraint is a positive constraint, since a call cannot be returned to unless that call exists as an unreturned call in at least one relevant execution path.

Definition 1. Two sets, Allow and Transform, will be used to represent the backward-flow restrictions associated with a particular definition d. Let p be the program point where definition d occurs. The elements in both sets are calls. The Allow set identifies only the calls to which the execution path continuing on from point p may make an unmatched return, until the backward-flow restrictions represented by this Allow set are effectively cancelled by the interaction between the execution-path continuation and the Transform set, explained shortly. An unmatched return is a return made during the execution-path continuation to a call instance that precedes the beginning of that execution-path continuation. The call instance is necessarily an unreturned call, as otherwise it could not be returned to. $|Allow| \le$ the total number of different calls represented in the program text. We define $Allow = \emptyset$ to mean there are no backward-flow restrictions for d. The Transform set identifies only the calls to which the execution path continuing on from point p may make an unmatched return such that, upon this unmatched return, the execution-path continuation is no longer constrained by the Allow and Transform sets associated with d. The following relationships hold: $Transform \subseteq Allow$, and if $Allow \neq \emptyset$ then $Transform \neq \emptyset$.
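Unreturned calls, unmatched returns, and the restriction expressed by Definition 1 can be made concrete with a sketch. The encoding is ours and hypothetical: a path continuation is a list of call and return events, the calls that were already unreturned when the continuation began are supplied explicitly as `context` (outermost first), and the function names `walk` and `respects` are illustrative only. Returns follow Assumption 1, so the calls behave as a stack.

```python
from typing import List, Optional, Set, Tuple

# ("call", c) makes call c; ("return", None) returns to the most recent
# unreturned call, as Assumption 1 requires.
Event = Tuple[str, Optional[str]]


def walk(path: List[Event], context: List[str]):
    """Return (unreturned calls made in the path, unmatched returns).

    context lists the calls already unreturned when the path began,
    outermost first; an unmatched return goes to one of these.
    """
    stack: List[str] = []          # calls made inside the path
    outer = list(context)          # calls made before the path began
    unmatched: List[str] = []
    for kind, call in path:
        if kind == "call":
            stack.append(call)
        elif stack:
            stack.pop()            # matched return within the path
        elif outer:
            unmatched.append(outer.pop())
        else:
            raise ValueError("return with no unreturned call to go to")
    return stack, unmatched


def respects(path: List[Event], context: List[str],
             allow: Set[str], transform: Set[str]) -> bool:
    """Every unmatched return must go to a call in Allow, until the first
    unmatched return to a call in Transform lifts the restrictions.
    An empty Allow set means there are no backward-flow restrictions."""
    if not allow:
        return True
    _, unmatched = walk(path, context)
    for c in unmatched:
        if c not in allow:
            return False
        if c in transform:
            return True            # no longer constrained from here on
    return True


# d lies in a procedure reached through calls c1 then c2; the continuation
# calls c3 and returns from it, returns to c2 (unmatched), then calls c4.
path = [("call", "c3"), ("return", None), ("return", None), ("call", "c4")]
print(walk(path, ["c1", "c2"]))                     # (['c4'], ['c2'])
print(respects(path, ["c1", "c2"], {"c2"}, {"c2"}))  # True
print(respects(path, ["c1", "c2"], {"c1"}, {"c1"}))  # False
```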
Note that backward-flow restrictions must be minimized whenever the possible execution paths allow it. Otherwise, the computed logical ripple effect (which is the whole purpose of this formal-analysis section) may be missing pieces that belong in it but were not added to it because backward-flow restrictions were retained that are not valid for all the possible execution paths involved.

Lemma 1. For any execution path P between two program points p and q, if P includes two or more call instances made in P that have not been returned to in P, then for these unreturned calls, $c_i$ calls the procedure containing $c_{i+1}$, where $c_i$ is the ith unreturned call, in execution order, made in P.

Proof. Assume that the next unreturned call $c_{i+1}$ is not contained in the procedure that was called by $c_i$. Let X be the procedure called by $c_i$, and let Y be the procedure that contains $c_{i+1}$. The execution path in P between making the call $c_i$ and making the call $c_{i+1}$ must include a path out of procedure X and into procedure Y so that the call $c_{i+1}$ can be made. A path out of procedure X can occur in only two ways: either X returns to a call, or X itself makes a call. If X returns to a call, then by Assumption 1, $c_i$ would be returned to, contradicting the given that $c_i$ has not been returned to. This means X must make a call to get to Y. Let c be the call contained in X that is the last call contained in X on the execution path in P taken from X to Y so as to make the call $c_{i+1}$. If X makes the call c, and c has not been returned to in P, then c would precede $c_{i+1}$ as an unreturned call following $c_i$, contradicting the given that $c_{i+1}$ is the next unreturned call in execution order after $c_i$. If c has been returned to in P, then all calls occurring on the execution path between the call c and the return to c must have been returned to, according to Assumption 1. This would mean $c_{i+1}$ has been returned to, contradicting the given that $c_{i+1}$ has not been returned to.
Thus, it is true that $c_i$ calls the procedure containing $c_{i+1}$, as assuming otherwise leads to contradictions. □

Definitions for Theorems 1 through 4. Let d and dd be the two definitions previously defined. Let A and T be the Allow and Transform sets associated with d. Let P be a single execution path between d and dd along which d can affect dd, subject to the constraints on P imposed by A and T. P will consist of a sequence of calls and returns, if any, in the order they are made. Any instance of a call made in P that is not returned to in P is an unreturned call in P. K is defined for P if and only if P contains an unmatched return (meaning a return to a call instance that precedes the beginning of P) to a call ∈ T. K is that part of P that follows the first unmatched return to a call ∈ T. Thus, K represents the continuation of P after the unmatched return. Any instance of a call made in K that is not returned to in K is an unreturned call in K.

Referring to each of the four theorems in turn, let AA and TT be the Allow and Transform sets for dd given all the paths P that meet the requirements of P as stated by that theorem. Let $AA_P$ and $TT_P$ be the Allow and Transform sets for dd given a single path P that meets the requirements of P as stated by that theorem. The four theorems that follow each define AA and TT. Note that for any given P, A, and T, one of the four theorems will apply.

Theorem 1. If (1) $A = \emptyset$ and P has no unreturned calls, or (2) $A \neq \emptyset$, K is defined for P, and K has no unreturned calls, then $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$.

Proof. For case (1), d is free of backward-flow restrictions and d has affected dd without making an unreturned call; therefore dd will be free of backward-flow restrictions, giving $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$.
For case (2), as soon as path P makes an unmatched return r to a call ∈ T, then by Definition 1 what d can affect is no longer constrained by A and T, and this freedom from constraint by A and T passes by transitivity to dd because d affects dd. When K is defined for P, the unmatched return r in P that immediately precedes the beginning of K means that any unreturned calls in P are also in K. This is because all call instances within P are more recent than the call instance that matches the unmatched return r. Thus, by Assumption 1, all call instances in P preceding the return r must be returned to in P before r can occur. Therefore, P has no unreturned calls because K has no unreturned calls. Thus, dd is free of backward-flow restrictions, since A, T, and P contribute nothing in the way of constraint, giving $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$. □

Theorem 2. If (1) $A = \emptyset$ and P has at least one unreturned call, or (2) $A \neq \emptyset$, K is defined for P, and K has at least one unreturned call, then $AA \leftarrow \bigcup_{\text{all such } P} \{\text{the unreturned calls of } P\}$ and $TT \leftarrow \bigcup_{\text{all such } P} \{\text{the first unreturned call in } P\}$.

Proof. For case (1), A and T contribute nothing in the way of constraint to $AA_P$ and $TT_P$. Because d affects dd along path P, which contains unreturned calls, by Assumption 1 those unreturned calls must be returned to first, before any other unreturned calls can be made from the execution-path continuation point of dd onward. Hence, $AA_P \leftarrow \{\text{the unreturned calls of } P\}$. Because d had no backward-flow restrictions, it follows that once all the unreturned calls of P are returned to by the execution-path continuation, that continuation would no longer have any backward-flow restrictions. Because of Assumption 3 and Lemma 1, all the unreturned calls of P are returned to when the sequentially first unreturned call in P is returned to. Hence, $TT_P \leftarrow \{\text{the first unreturned call in } P\}$.
For case (2), as shown in the proof of Theorem 1 case (2), A and T contribute nothing to AAₚ and TTₚ when K is defined for P. Thus, case (2) is effectively the same as case (1), because the A and T sets contribute nothing and an unreturned call in K is an unreturned call in P. Therefore, AAₚ ← {the unreturned calls of P} and TTₚ ← {the first unreturned call in P}. From Definition 1 and the general definitions of AA, TT, AAₚ, and TTₚ, it follows that AA ← ⋃_{all such P} AAₚ and TT ← ⋃_{all such P} TTₚ. Thus, AA ← ⋃_{all such P} {the unreturned calls of P}, and TT ← ⋃_{all such P} {the first unreturned call in P}. □

Theorem 3. If A ≠ ∅, K is not defined for P, and P has no unreturned calls, then AA ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure containing dd, such that each unreturned call in this possible execution path is in A)}, and TT ← AA ∩ T.

Proof. Note that only one procedure contains dd. Because K is not defined for P, it follows that P was constrained in its entirety by A, never making an unmatched return to a call ∈ T. Because P has no unreturned calls, d can only affect dd along P by making one or more unmatched returns to calls ∈ (A − T), unless d and dd are in the same procedure. A, in effect, represents possible execution paths with unreturned calls by which d was affected. However, once P is given, the path P may eliminate some of the paths in A as being possible, and may return to some of the unreturned calls in A. Thus, although P contributes nothing directly to AA, it may narrow the unreturned execution-path possibilities that A can contribute to AA. AA, as defined for this theorem, captures all execution paths in A that begin with a call ∈ T and end with a call of the procedure that contains dd. Given Assumption 3, these are all the possible paths in A that are unreturned after P. Note that if d and dd are in the same procedure, then AA = A and TT = T.
Assume that d and dd are in different procedures. Any call ∈ A that is not part of at least one path in A that makes a call of the procedure containing dd must be excluded from AA, because P requires a path in A that passes through the procedure containing dd; otherwise P could not make a return to the procedure containing dd. Any call ∈ A that is on a path in A between the procedure containing dd and the procedure containing d must be excluded from AA, because the procedure containing dd has been returned to by P. The definition of AA for this theorem satisfies these two exclusions. That TT ← AA ∩ T follows from Definition 1 requiring TT ⊆ AA, and from the definition of AA for this theorem. □

Theorem 4. If A ≠ ∅, K is not defined for P, P has at least one unreturned call, and the first unreturned call in P is contained in procedure X, then S₁ ← ⋃_{all such P given X} {the unreturned calls of P}, S₂ ← {x | x ∈ A ∧ (x is part of a possible execution path that inclusively begins with a call ∈ T and ends with a call of the procedure X, such that each unreturned call in this possible execution path is in A)}, AA ← S₁ ∪ S₂, and TT ← S₂ ∩ T.

Proof. S₁ follows from Definition 1 and the proof of Theorem 2. S₂ follows from Theorem 3, where the specific "procedure containing dd" in the expression for AA in Theorem 3 has been replaced by the equally specific "procedure X". To show that the union operation of AA, combining S₁ and S₂, does not thereby represent spurious paths in AA, it is only necessary to show that the paths represented in S₁ never cross the paths represented in S₂. Two paths cross if each path makes an unreturned call to the same procedure. All paths in S₂ end with an unreturned call of procedure X. All paths in S₁ begin with an unreturned call contained in procedure X. Assume that both S₁ and S₂ include an unreturned call to the same procedure.
As all paths in S₂ lead to procedure X, this means there exists an execution path that originates in procedure X and eventually calls procedure X. Thus, the execution path represents recursion, and this is contradicted by Assumption 3. Therefore, the paths represented in S₁ never cross the paths represented in S₂. The first unreturned call in P is not added to TT because the path P is an extension of the unreturned paths represented in S₂. That TT ← S₂ ∩ T follows from Definition 1 requiring TT ⊆ AA, and from the definition of AA for this theorem. □

The four theorems given above will be used to build the algorithm given in the next section. In effect, a given Allow set represents possible execution paths with unreturned calls by which the definition associated with that Allow set was affected. Conversely, the Allow set identifies, in effect, those continuation paths that can make unmatched returns. However, the Allow set lacks the information needed to enforce an ordering of the unmatched returns that the continuation path may make. To a large extent, this missing information is unnecessary because of Lemma 1: typically, the call structure of the program itself enforces the ordering of the unmatched returns. Figure 3.1 is an example. Assume d affects dd, giving an Allow set of {c1, c2} for dd. Given a continuation path from dd, it is not possible for c1 to be returned to before c2, so the correct ordering of unmatched returns is enforced by the program itself.

Figure 3.1. An example call structure that does not allow overestimation.

However, there are cases where the missing ordering information can result in a continuation path taking unwanted shortcuts. Figure 3.2 gives an example of a call structure that allows the continuation path from dd to make an unwanted shortcut under the right circumstances. Assume d affects dd along the paths c1-c2 and c3-c4, giving an Allow set of {c1, c2, c3, c4} for dd.
Figure 3.2. An example call structure that allows overestimation.

Assume the continuation path is r2-c5-r3, where r2 and r3 are unmatched returns to calls c2 and c3. The unmatched return r3 should not be allowed to happen before an unmatched return r4 to call c4, but this unmatched-return ordering is not enforced by the Allow set defined in this dissertation, so the assumed continuation path is possible. By virtue of such a spurious continuation path, dd may be able to affect a definition or use that it could not otherwise affect if dd were confined to legitimate continuation paths. In practical terms, this means that the computed logical ripple effect, which consists of affected definitions and uses, may in fact be an overestimate because of spurious continuation paths. Although the Allow set does permit spurious continuation paths under the right circumstances (Figure 3.2, together with the assumed paths by which d affected dd, is the simplest example), we feel that these circumstances, along with spurious paths that affect what would otherwise be unaffected, will not occur often enough in real programs to undermine the general usefulness of the Allow set in constraining backward flow and permitting computation of a precise or semiprecise logical ripple effect.

3.2 The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logical ripple effect. After a brief overview of the algorithm, the dataflow analysis method used by the algorithm is discussed. Then, two important properties of the dataflow sets are detailed, followed by three rules that are used to impose backward-flow restrictions on the dataflow analysis. Last are proofs that the algorithm is correct. The algorithm to compute the logical ripple effect is shown in Figure 3.3. Each statement in the algorithm is numbered on the left. For convenience, algorithm statements will be referred to as lines.
For example, a reference to line 28 means the statement numbered 28, which is actually printed on several lines. Comments in the algorithm begin with --. ⊥ and ⊤ are just two different, fixed, arbitrary values.

In general, the algorithm works as follows. A definition d and its associated Allow and Transform sets are popped from the stack (line 7), and then the reaching-definitions dataflow problem is solved for this definition d, imposing any backward-flow restrictions represented by the Allow and Transform sets (line 8). Reaching definitions for a single definition is the problem of finding all uses and definitions affected by that definition. The definition d that was dataflow analyzed, and any uses affected by it, are included in the ripple effect (lines 9 to 11). Each affected definition has its Allow and Transform sets determined in accordance with Theorems 1 through 4 (lines 22 to 46). A check is then made to see whether the affected definition and its restriction sets, Allow and Transform, should be added to the stack for dataflow analysis (lines 47 to 52). The algorithm ends when the stack is empty. Although the algorithm shows a single definition b being added to the stack at line 5, any number of different b can actually be added, each with empty restriction sets.

Compute the logical ripple effect for a hypothetical or actual definition b
Input: a program flowgraph ready for dataflow analysis
Output: the logical ripple effect in RIPPLE
begin
1    RIPPLE ← ∅
2    for each definition dd in the program
3      FIN_dd ← ⊥
     end for
4    stack ← ∅
5    push (b, ∅, ∅) onto stack
6    while stack ≠ ∅ do
7      pop stack into (d, ALLOW, TRANSFORM)
8      Solve the reaching-definitions dataflow equations for the single definition d, using Rules 1, 2, and 3.
9      RIPPLE ← RIPPLE ∪ {d}
10     for each use u in the program that is affected by either d₁ or d₂
11       RIPPLE ← RIPPLE ∪ {u}
       end for
12     ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13     for each call node n in the flowgraph
14       if d₁ ∈ B_out[n] and d₁ crossed from this call into the called procedure
15         ROOT1 ← ROOT1 ∪ {the call node n} fi
16       if d₁ ∈ E_out[n] and d₁ crossed from this call into the called procedure
17         LINK1 ← LINK1 ∪ {the call node n} fi
18       if d₂ ∈ B_out[n] and d₂ crossed from this call into the called procedure
19         ROOT2 ← ROOT2 ∪ {the call node n} fi
20       if d₂ ∈ E_out[n] and d₂ crossed from this call into the called procedure
21         LINK2 ← LINK2 ∪ {the call node n} fi
       end for
22     for each definition dd in the program that is affected by either d₁ or d₂
         -- determine Allow and Transform for dd by Theorem 1
23       if d₂ ∈ B_in[node where dd occurs]
24         PATHS ← ∅, TRANS ← ∅
25         call Analyze
         else
           -- determine Allow and Transform for dd by Theorem 2
26         if d₂ ∈ E_in[node where dd occurs]
27           PATHS ← ∅
28           PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ (PATHS ∩ LINK2))}
29           TRANS ← ROOT2 ∩ PATHS
30           call Analyze fi
           -- determine Allow and Transform for dd by Theorem 3
31         if d₁ ∈ B_in[node where dd occurs]
32           PATHS ← ∅
33           PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ PATHS)}
34           TRANS ← TRANSFORM ∩ PATHS
35           call Analyze fi
           -- determine Allow and Transform for dd by Theorem 4
36         if d₁ ∈ E_in[node where dd occurs]
37           for each procedure X that contains a call ∈ ROOT1
38             RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39             PP ← ∅
40             PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively begins with a call ∈ RT1 and ends with a call of the procedure that contains dd, such that each call in this path is in (RT1 ∪ LINK1))}
41             if PP ≠ ∅
42               PATHS ← ∅
43               PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls a procedure that contains a call c ∈ PATHS)}
44               TRANS ← TRANSFORM ∩ PATHS
45               PATHS ← PATHS ∪ PP
46               call Analyze
               fi
             end for
           fi
         fi
       end for
     od
end

Procedure Analyze
begin
     -- avoid repetition of dd dataflow analysis if possible
47   if FIN_dd ≠ ⊤ ∧ (PATHS = ∅ ∨ (for all saved pairs P × T for dd: PATHS ⊈ P ∨ TRANS ⊈ T))
48     if PATHS = ∅
49       FIN_dd ← ⊤
50       push (dd, ∅, ∅) onto stack
       else
51       save PATHS and TRANS as the pair P × T for dd
52       push (dd, PATHS, TRANS) onto stack
       fi
     fi
end

Figure 3.3. The logical ripple effect algorithm.

The dataflow equations referred to in line 8 are shown in Figure 3.4. These equations are copied from Chapter 2, which presents a method for context-dependent flow-sensitive interprocedural dataflow analysis. The method consists of solving, with the standard iterative algorithm, the dataflow equations shown in Figure 3.4 for the program flowgraph required by the equations. The method in Chapter 2 includes a solution to the problems of parameter aliasing and implicit definitions, which are part of the interprocedural reaching-definitions problem. We assume that the full method of Chapter 2 would be used, but we do not discuss these side issues in this chapter as they are not directly relevant to the algorithm. Note that there are other methods for context-dependent flow-sensitive interprocedural dataflow analysis [3, 9, 17, 21], but the method of Chapter 2 has precision and efficiency advantages over the other methods cited.

Referring to the dataflow equations of Figure 3.4, four sets are computed for each flowgraph node: two body sets, B_in and B_out, and two entry sets, E_in and E_out. All body and entry sets are initially empty. As the equations will be solved for only a single definition d, the GEN set for the node where d occurs (i.e., the node whose associated block of program code contains the definition d) will contain an element representing d, and all the other GEN sets will be empty. The node where d occurs is the natural starting point for the iterative algorithm, which recomputes the body and entry sets for the nodes until stability is attained and the sets cease to change, at which point the equations have been solved.

$$
\begin{aligned}
&\text{For any node } n:\\
&\quad \mathrm{IN}[n] = E_{\mathrm{in}}[n] \cup B_{\mathrm{in}}[n]\\
&\quad \mathrm{OUT}[n] = E_{\mathrm{out}}[n] \cup B_{\mathrm{out}}[n]\\[4pt]
&\text{Group I: } n \text{ is an entry node.}\\
&\quad B_{\mathrm{in}}[n] = \emptyset\\
&\quad E_{\mathrm{in}}[n] = \bigcup_{p \in \mathrm{pred}(n)} \{x \mid x \in \mathrm{OUT}[p] \wedge C_1\}\\
&\quad B_{\mathrm{out}}[n] = \mathrm{GEN}[n]\\
&\quad E_{\mathrm{out}}[n] = E_{\mathrm{in}}[n] \cup \mathrm{RECODE}[n]\\[4pt]
&\text{Group II: } n \text{ is a return node, } p \text{ is the associated call node, and } q \text{ is the exit node of the called procedure.}\\
&\quad B_{\mathrm{in}}[n] = \{x \mid (x \in B_{\mathrm{out}}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{\mathrm{out}}[q]))) \vee (x \in B_{\mathrm{out}}[q] \wedge C_2)\}\\
&\quad E_{\mathrm{in}}[n] = \{x \in E_{\mathrm{out}}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{\mathrm{out}}[q])\}\\
&\quad B_{\mathrm{out}}[n] = (B_{\mathrm{in}}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]\\
&\quad E_{\mathrm{out}}[n] = E_{\mathrm{in}}[n] - \mathrm{KILL}[n]\\[4pt]
&\text{Group III: } n \text{ is not an entry or return node.}\\
&\quad B_{\mathrm{in}}[n] = \bigcup_{p \in \mathrm{pred}(n)} B_{\mathrm{out}}[p]\\
&\quad E_{\mathrm{in}}[n] = \bigcup_{p \in \mathrm{pred}(n)} E_{\mathrm{out}}[p]\\
&\quad B_{\mathrm{out}}[n] = (B_{\mathrm{in}}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]\\
&\quad E_{\mathrm{out}}[n] = E_{\mathrm{in}}[n] - \mathrm{KILL}[n]
\end{aligned}
$$

Figure 3.4. Dataflow equations for the reaching-definitions problem.

Once the equations are solved, an element is in the entry set or body set at a particular node depending on how that element was propagated to that node. The same element may be in both sets at the same node. Properties 1 and 2, listed below, summarize those implications of set membership that are used by the algorithm. The properties follow directly from the dataflow equations.

Property 1. For any node n, an element is in the E_in[n] set or E_out[n] set if and only if that element entered the procedure that contains node n from a call node, and there is a definition-clear path from that call node to node n. Thus, membership in the entry set of node n implies that the element can propagate to node n by an execution path that makes at least one unreturned call between the point where the element is generated and the point where node n occurs.

Property 2.
For any node n, an element is in the B_in[n] set or B_out[n] set if and only if that element was generated in the same procedure that contains node n, or that element entered the procedure that contains node n from an exit-node B_out set. There must also be a definition-clear path to node n from either the element's generation node or the exit node. If the element entered from an exit-node B_out set, then Property 2 applies recursively to the element in that B_out set. Thus, membership in the body set of node n implies that the element can propagate to node n by an execution path, between the point where the element is generated and the point where node n occurs, that does not include any unreturned calls.

The three rules referred to in line 8 are listed below. Rule 1 applies before the dataflow equations are solved. Rules 2 and 3 apply as the equations are being solved. The rules impose the backward-flow restrictions represented by the ALLOW and TRANSFORM sets in line 7.

Rule 1. If ALLOW = ∅, then element d₂ is generated at the node where definition d occurs; otherwise d₁ is the generated element, meaning the element in the GEN set. Both d₁ and d₂ are base elements that represent the same definition d. Both elements are identical in terms of when they appear in any given KILL set. The only difference between them is that d₁ and d₂ are treated differently by Rules 2 and 3 below. If the ALLOW set is empty, then by Definition 1 there should be no backward-flow restrictions on d. Rule 1 accomplishes this requirement, as d₂ is immune to the backward-flow restrictions imposed by Rule 2.

Rule 2. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the B_in[n] equation is computed, if d₁ ∈ B_out[q], then d₁ cannot cross from B_out[q] into the B_in[n] set if p ∉ ALLOW.
In the dataflow equations, the crossing of an element from an exit-node body set to a return node is the only action that represents, in effect, an unmatched return to a call instance that was made in an execution path leading up to the program point where definition d occurs, which is the starting point of the reaching-definitions analysis done for d. Thus, Rule 2 covers all cases in which an unmatched return occurs. Rule 2 restricts unmatched returns to those call instances that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW set as given by Definition 1.

Rule 3. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the B_in[n] equation is computed, if d₁ ∈ B_out[q], and, by C₂ and Rule 2, d₁ can cross from B_out[q] into the B_in[n] set, and p ∈ TRANSFORM, then as this d₁ element crosses from B_out[q] into the B_in[n] set, the element is changed to d₂. In effect, d₁ is transformed into d₂, and the return node n becomes a generation node for the d₂ element. As already mentioned, the crossing of an element from an exit-node body set to a return node is the only action in the dataflow equations that represents, in effect, an unmatched return to a call instance made in an execution path leading up to the program point where definition d occurs. Thus, Rule 3 covers all cases in which an unmatched return occurs. The requirement by Rule 3 that the returned-to call be in the TRANSFORM set satisfies Definition 1 as to when backward-flow restrictions can be ignored.
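Rules 1 to 3 amount to a small decision made each time an element would cross from an exit node's B_out set into a return node's B_in set. The following is a minimal sketch of that decision only, not of the full equation solver; the function names are hypothetical, and condition C₂ of Figure 3.4 is passed in as a boolean because its definition belongs to Chapter 2.

```python
from typing import Optional, Set

D1, D2 = "d1", "d2"  # the two base elements that both represent definition d


def generated_element(allow: Set[str]) -> str:
    """Rule 1: generate d2 (no backward-flow restrictions) if ALLOW is empty, else d1."""
    return D2 if not allow else D1


def cross_exit_to_return(element: str, call_site: str, c2_holds: bool,
                         allow: Set[str], transform: Set[str]) -> Optional[str]:
    """Element entering B_in[n] from B_out[q] at the return node n of call site p.

    Returns None when the element may not cross at all.
    """
    if not c2_holds:
        return None  # the B_out[q] term of the Group II equation requires C2
    if element == D2:
        return D2  # d2 is immune to Rules 2 and 3
    if call_site not in allow:
        return None  # Rule 2: unmatched return to a call outside ALLOW
    if call_site in transform:
        return D2  # Rule 3: d1 becomes d2; n acts as a generation node for d2
    return D1  # allowed unmatched return; restrictions still apply
```

Keeping d₁ and d₂ as distinct elements, rather than attaching the restriction sets to a single element, mirrors Rule 1's statement that the two differ only in how Rules 2 and 3 treat them.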
Rule 3 replaces element d₁, which is subject to the backward-flow restrictions, with element d₂, which is free of backward-flow restrictions, at the return point, and thereby satisfies Definition 1 regarding removal of backward-flow restrictions on the execution-path continuation, since d₂ now represents the continuation instead of d₁.

Lemma 2. The algorithm computes at lines 23 to 46 the restriction sets for an affected definition in accordance with Theorems 1 through 4.

Proof. We first establish the properties of the LINK and ROOT sets computed at lines 12 to 21. Let p be the node, if any, where d₁ is generated. Let q be any node where d₂ is generated, i.e., those return nodes where d₁ is transformed into d₂, or, for ALLOW = ∅, the node where definition d occurs. The tests at lines 14 and 18 make use of Property 2: if an element is in the B_out set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n, and the path has no unreturned calls. The call at node n would be the first unreturned call on that path if the path were extended to the entry node of the called procedure. Therefore, the ROOT1 set represents all calls that are the first unreturned call on at least one definition-clear path between node p and some other node in the flowgraph. The ROOT2 set represents all calls that are the first unreturned call on at least one definition-clear path between node q and some other node in the flowgraph. The tests at lines 16 and 20 make use of Property 1: if an element is in the E_out set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n, and the path includes the unreturned call that called the procedure containing node n. The call at node n would be at least the second unreturned call on that path if the path were extended to the entry node of the called procedure.
Therefore, the LINK1 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node p and some other node in the flowgraph. The LINK2 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node q and some other node in the flowgraph.

The test at line 23 checks for the application of Theorem 1. If d₂ ∈ B_in[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls, and somewhere along P, d₂ is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 1, and line 24 sets PATHS and TRANS to empty in accordance with the theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line 26 checks for the application of Theorem 2. If d₂ ∈ E_in[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call, and somewhere along P, d₂ is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 2. Only the d₂ element satisfies the theorem, so all paths P for the theorem have to be constructed exclusively from the ROOT2 and LINK2 sets. Referring to Theorem 2, line 28 computes the AA set, and line 29 computes TT. In line 28, the PATHS set is defined in terms of itself. This recursive reference means that each time a call is added to the PATHS set, the condition containing the recursive reference must be reevaluated, because additional calls may thereby be added to PATHS. Recursive references are similarly used in lines 33 and 43. What line 28 does is extract, from all the calls that element d₂ crossed, just those calls that are on a path to dd. This is done by building the paths backwards, beginning with those calls that call the procedure containing dd.
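The recursive set definitions in lines 28 and 33 can be evaluated as a least fixed point: start from the empty set and keep adding candidate calls whose callee is the procedure containing dd, or contains an already selected call that may be extended, until nothing changes. The sketch below assumes a hypothetical program representation in which `callee` maps each call site to the procedure it calls and `container` maps each call site to the procedure that contains it; the function name is also illustrative.

```python
from typing import Dict, Set


def backward_paths(candidates: Set[str], callee: Dict[str, str],
                   container: Dict[str, str], target_proc: str,
                   extendable: Set[str]) -> Set[str]:
    """Least fixed point of
        PATHS = {x in candidates | callee[x] == target_proc
                 or callee[x] contains a call c in (PATHS & extendable)}.

    Line 33: candidates = extendable = ALLOW.
    Line 28: candidates = ROOT2 | LINK2, extendable = LINK2.
    """
    paths: Set[str] = set()
    changed = True
    while changed:  # reevaluate the recursive condition until PATHS is stable
        changed = False
        # procedures that a newly selected call may lead into
        reached = {container[c] for c in paths & extendable} | {target_proc}
        for x in candidates - paths:
            if callee[x] in reached:
                paths.add(x)
                changed = True
    return paths
```

For instance, if main contains c1 calling f, f contains c2 calling g, and g contains dd, the loop selects c2 first and then c1. If c1 were not in the extendable set (a call found only in ROOT2, in the terms of line 28), then c2 could not be extended backwards and only c2 would be selected. TRANS then follows as TRANSFORM ∩ PATHS (line 34) or ROOT2 ∩ PATHS (line 29).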
By Lemma 1, any path between d and dd consisting of unreturned calls can be found by proceeding in reverse order from dd and selecting those calls that call a procedure containing a call already selected. Backward path building and Lemma 1 are similarly used in lines 33 and 43. By the properties of the ROOT and LINK sets, the paths constructed by line 28 will be definition-clear. Notice that a particular call may be in both the ROOT2 and LINK2 sets, but if a call is only in the ROOT2 set, then it cannot be used as the basis for extending any path further backwards, because by Property 1, d₂ does not propagate from the entry node of the procedure that contains that call to the call node for that call. This is the reason for the (PATHS ∩ LINK2) requirement in line 28. Once the PATHS set is computed, line 29 computes TRANS in accordance with the theorem.

The test at line 31 checks for the application of Theorem 3. If d₁ ∈ B_in[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d₁ is the element, meaning K is not defined for P. This satisfies the conditions of Theorem 3. Referring to Theorem 3, line 33 computes the AA set, and line 34 computes TT. What line 33 does is extract from ALLOW all paths that end with a call of the procedure containing dd. Although Theorem 3 states that the path begins with a call ∈ TRANSFORM, line 33 does not require a check for this, because TRANSFORM is a subset of ALLOW, and those first unreturned calls in TRANSFORM that are on a path in ALLOW to dd will unavoidably be picked up as the paths are built backwards from dd. Thus, the PATHS set is computed in accordance with Theorem 3, followed by line 34, which computes the TRANS set in accordance with the theorem.

The test at line 36 checks for the application of Theorem 4.
If d₁ ∈ E_in[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d₁ is the element. This satisfies the conditions of the theorem. Only the d₁ element satisfies the theorem, so all paths P for the theorem have to be constructed exclusively from the ROOT1 and LINK1 sets. Referring to Theorem 4, line 40 computes the S₁ set, line 43 computes the S₂ set, line 44 computes the TT set, and line 45 computes the AA set. The reason for the test at line 41 is that although there exists at least one path P satisfying the theorem, there may not be any paths P that begin in the specific procedure X. It can be seen that lines 37 to 46 compute in accordance with the theorem. □

Lemma 3. Let Aᵢ and Tᵢ be one pair of Allow and Transform sets associated with a definition d, and let Aⱼ and Tⱼ be a different pair of Allow and Transform sets associated with the same definition d. Assume Aᵢ ≠ ∅ and Aⱼ ≠ ∅. If Aⱼ ⊆ Aᵢ and Tⱼ ⊆ Tᵢ, then dataflow analyzing d with the pair Aⱼ and Tⱼ cannot add anything to the ripple effect that is not added by dataflow analyzing d with the pair Aᵢ and Tᵢ.

Proof. By inspection of Rules 1, 2, and 3, it can be seen that removing some of the calls from Aᵢ or Tᵢ cannot make d affect anything that it does not affect with Aᵢ and Tᵢ as they were. Also, by inspection of lines 23 to 46, when Aⱼ and Tⱼ are the restriction sets for d, the Allow and Transform sets determined for any definition dd affected by d cannot include calls that would not be included when Aᵢ and Tᵢ are the restriction sets for d. □

Lemma 4. Let A and T be Allow and Transform sets associated with a definition d, and let X and Y be a different pair of Allow and Transform sets associated with the same definition d.
If A = ∅, then dataflow analyzing d with X and Y cannot add anything to the ripple effect that is not added by dataflow analyzing d with A and T.

Proof. By Rule 1, d will be represented by d₂ and have no restrictions on its backward flow. Thus, d will affect everything that it is possible for it to affect. If d is dataflow analyzed with X and Y, then any calls found in the ROOT1, ROOT2, LINK1, or LINK2 sets will also be found in the ROOT2 or LINK2 sets when d is dataflow analyzed with A and T. These sets determine the restriction sets associated with a definition dd affected by d. It follows that any dataflow path allowed for a dd affected by d using X and Y will also be allowed for a dd affected by d using A and T. □

Theorem 5. Given Definition 1 and Theorems 1 through 4, the algorithm will correctly compute the logical ripple effect.

Proof. As shown by Lemma 2, for any affected definition dd, the Allow and Transform sets to be associated with dd are computed in accordance with Theorems 1 to 4. By Lemma 4, if Theorem 1 applies to an affected definition (line 23), then there is no need to check whether any other theorem also applies, because additional dataflow analysis resulting from the other theorems cannot contribute to the ripple effect. However, if Theorem 1 does not apply, then the definition must be dataflow analyzed separately, in turn, for each theorem that does apply. This is done by the sequence of three if statements at lines 26, 31, and 36. Thus, the control logic in lines 23 to 46 is safe. The Analyze procedure (lines 47 to 52) prepares a definition and its restriction sets for dataflow analysis by adding them to the stack (lines 50 and 52). Once a definition is scheduled for dataflow analysis with no restrictions (line 50), it will not be analyzed again (line 47). By Lemma 4, this is safe.
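The pruning performed by procedure Analyze (lines 47 to 52), which the remainder of this argument relies on, can be sketched as follows. The class and attribute names are hypothetical; the FIN flag is modeled as membership in a set of definitions whose FIN value is ⊤, and the saved pairs P × T are kept as a list of frozenset pairs per definition.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[FrozenSet[str], FrozenSet[str]]  # a saved (P, T) pair


class Analyzer:
    """Sketch of procedure Analyze; `stack` is the algorithm's work stack."""

    def __init__(self) -> None:
        self.stack: List[Tuple[str, FrozenSet[str], FrozenSet[str]]] = []
        self.fin_top: Set[str] = set()  # definitions with FIN_dd = top
        self.saved: Dict[str, List[Pair]] = {}  # saved pairs P x T per definition

    def analyze(self, dd: str, paths: Set[str], trans: Set[str]) -> bool:
        """Push (dd, PATHS, TRANS) unless doing so cannot enlarge the ripple effect."""
        if dd in self.fin_top:
            return False  # line 47: dd already scheduled without restrictions (Lemma 4)
        p, t = frozenset(paths), frozenset(trans)
        if not p:
            self.fin_top.add(dd)  # line 49
            self.stack.append((dd, p, t))  # line 50
            return True
        for sp, st in self.saved.get(dd, []):
            if p <= sp and t <= st:
                return False  # line 47: subsumed by a saved pair (Lemma 3)
        self.saved.setdefault(dd, []).append((p, t))  # line 51
        self.stack.append((dd, p, t))  # line 52
        return True
```

A pair is rejected only when both of its sets are subsets of the corresponding sets of one saved pair, which is the condition under which Lemma 3 guarantees that nothing new can be added to the ripple effect.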
Assuming FIN_dd ≠ ⊤ and PATHS ≠ ∅, the test at line 47 will not prepare a definition for dataflow analysis if both restriction sets are subsets of the corresponding sets of some pair of restriction sets used previously to analyze that definition. This follows from Lemma 3. Thus, the Analyze procedure is safe. The correctness of the dataflow equations (line 8) is established in Chapter 2, and the correctness of the three rules for imposing backward-flow restrictions (line 8) has already been discussed. Regarding the correctness of having no backward-flow restrictions for the initial definition (line 5), let p be the program point where b occurs. For execution to attain point p, any possible execution path between the program's execution starting point and point p can be assumed to have occurred. Thus, there should be no restrictions on the backward-flow possibilities of b, because there were no constraints imposed by the ripple effect on how point p was initially attained. □

Programs with recursive calls can be processed by our algorithm, but there may be some overestimation of the logical ripple effect because of the recursive calls. The dataflow equations (line 8) are not the problem, as they work for recursive programs. Instead, the problem is with the Allow set and its representation of execution paths. If a cyclic execution path is represented in the Allow set, then when the Allow set is used to restrict backward flow by Rule 2, it may be possible for an element moving through the program flowgraph to take a shortcut on its unmatched returns and avoid having to make unmatched returns along the complete cycle before a program point can be attained. This shortcut may permit the element to affect something that it should not be able to affect, possibly adding to the ripple effect beyond what should be there.

3.3 A Prototype Demonstrates the Algorithm

This section first considers the complexity of our interprocedural logical ripple effect algorithm.
A prototype that demonstrates the algorithm is then described, and test results are presented.

Let n be the number of nodes in the flowgraph of the input program. For a programming language such as C, solving the dataflow equations for a single definition, which is what line 8 does, has worst-case complexity of O(n). Let k be the number of known calls in the input program. Considering line 47, a definition may be dataflow analyzed repeatedly as long as the associated restriction sets are not subsets of any previous pair of restriction sets used to dataflow analyze that definition. The number of different restriction sets possible such that no set is a subset of another grows exponentially with k (for Allow sets alone, Sperner's theorem puts the largest such family of subsets of k calls at C(k, ⌊k/2⌋) members). Thus, the worst-case complexity of our logical ripple effect algorithm is exponential, where the exponent is some function of k. However, for the typical input program, the actual number of non-subset restriction sets that our algorithm can generate for a given definition will be severely constrained by a combination of Lemma 1, Theorems 1 through 4, and the typical program call structure, which is characterized by shallow call depth.

A prototype that demonstrates our logical ripple effect algorithm has been built. The prototype accepts as input C programs that satisfy certain constraints, such as having only single-identifier variable names. Given an input program, the prototype requires that one or more definitions be identified as the starting point of the ripple effect. For purposes of comparison, besides using our algorithm to compute a precise logical ripple effect, the prototype also computes an overestimate of the logical ripple effect. The overestimate is computed by simply ignoring the execution-path problem, i.e., there are no backward-flow restrictions when the overestimate is computed.
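Without backward-flow restrictions, each definition needs to be dataflow analyzed at most once, so the overestimate reduces to a plain worklist over definitions. A minimal sketch, assuming a hypothetical `solve(d)` callback that stands in for solving the reaching-definitions equations for the single definition d and returns the affected definitions and affected uses; the prototype's actual code is not described in the text.

```python
from collections import deque
from typing import Callable, Set, Tuple

SolveFn = Callable[[str], Tuple[Set[str], Set[str]]]  # d -> (affected defs, affected uses)


def overestimate(start: str, solve: SolveFn) -> Set[str]:
    """Ripple effect computed with no backward-flow restrictions.

    Every definition reached is solved exactly once, so `solve` runs once per
    definition in the overestimated ripple effect.
    """
    ripple: Set[str] = set()
    seen = {start}
    work = deque([start])
    while work:
        d = work.popleft()
        ripple.add(d)
        defs, uses = solve(d)
        ripple |= uses
        for dd in defs - seen:  # schedule each newly affected definition once
            seen.add(dd)
            work.append(dd)
    return ripple
```

Because `solve` is invoked once for each of the d definitions in the result, an O(n) solver gives the O(nd) bound stated below.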
The worst-case complexity of computing the overestimate for C programs is only O(nd), where n is the number of flowgraph nodes and d is the number of definitions in the overestimated ripple effect. This complexity follows from the O(n) complexity of solving the dataflow equations for a single definition and the fact that the equations will have to be solved d times.

Table 3.1. Experimental results for the prototype.

| global | defs | defs global | depth | nodes | RS_o | RS_p | reduction | time_o | time_p |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 2420 | 7% | 2/213 | 3939 | 2275 | 936 | 53.4% | 5s | 3s |
| 100 | 2291 | 15% | 2/188 | 3776 | 4151 | 2449 | 41.0% | 17s | 13s |
| 200 | 2294 | 30% | 2/188 | 3662 | 5594 | 3718 | 33.5% | 40s | 32s |
| 300 | 2370 | 45% | 2/231 | 3962 | 5897 | 2607 | 55.8% | 1m5s | 27s |
| 50 | 2225 | 7% | 3/202 | 3717 | 1222 | 633 | 40.3% | 3s | 2s |
| 100 | 2333 | 15% | 3/229 | 3864 | 4139 | 1867 | 54.9% | 17s | 7s |
| 200 | 2211 | 30% | 3/231 | 3760 | 4884 | 2688 | 45.0% | 39s | 28s |
| 300 | 2236 | 45% | 3/205 | 3737 | 5308 | 3505 | 34.0% | 59s | 38s |
| 50 | 2320 | 7% | 4/227 | 3912 | 1822 | 1067 | 35.1% | 5s | 3s |
| 100 | 2211 | 15% | 4/228 | 3673 | 4329 | 1525 | 64.8% | 18s | 7s |
| 200 | 2223 | 30% | 4/227 | 3705 | 5019 | 1918 | 61.8% | 37s | 16s |
| 300 | 2214 | 45% | 4/214 | 3648 | 5922 | 4740 | 20.0% | 1m9s | 1m36s |
| 100 | 4354 | 7% | 2/372 | 6858 | 4317 | 2201 | 40.0% | 19s | 10s |
| 200 | 4467 | 15% | 2/368 | 7068 | 8844 | 6457 | 27.0% | 1m17s | 1m12s |
| 400 | 4261 | 30% | 2/388 | 6851 | 9653 | 2976 | 69.2% | 2m29s | 49s |
| 600 | 4289 | 45% | 2/340 | 6784 | 10590 | 6840 | 35.4% | 4m8s | 3m56s |
| 100 | 4314 | 7% | 3/432 | 6781 | 1993 | 631 | 52.5% | 8s | 2s |
| 200 | 4268 | 15% | 3/395 | 6876 | 5795 | 3236 | 35.5% | 51s | 54s |
| 400 | 4223 | 30% | 3/393 | 6735 | 9240 | 7307 | 20.9% | 2m26s | 4m21s |
| 600 | 4248 | 45% | 3/433 | 6868 | 9772 | 6453 | 30.6% | 3m56s | 4m50s |
| 100 | 4252 | 7% | 4/455 | 6961 | 2756 | 1120 | 42.6% | 14s | 5s |
| 200 | 4276 | 15% | 4/440 | 6858 | 7781 | 5752 | 26.1% | 1m10s | 2m35s |
| 400 | 4228 | 30% | 4/391 | 6681 | 9838 | 8290 | 15.7% | 2m45s | 9m20s |
| 600 | 4112 | 45% | 4/462 | 6802 | 10017 | 9192 | 8.2% | 4m24s | 39m55s |

Table 3.1 presents test results for the prototype. Each row details relevant characteristics of an input program and presents the resulting averages of ten different tests of that input program, where each test computed the ripple effect started by a single, randomly chosen definition of a global variable.
The input programs of Table 3.1 were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program of Table 3.1 has 100 procedures and exactly the number of global variables listed. Within each input program, each global variable is defined and used at least once. The call structure of each input program was determined randomly by the generator, with the constraints that there be no recursion in the input program and that no call in the input program exceed the given maximum call depth. All calls in the generated input programs are known calls, and approximately 1/(max + 1) of the calls will be at each possible depth from zero to max, where max is the given maximum call depth.

Referring to the columns of Table 3.1: "global" is the number of global variables in the input program; "defs" is the number of definitions in the input program; "defs global" is the percentage of the definitions that define a global variable; "depth" is the maximum call depth followed by the total number of calls in the input program; "nodes" is the number of nodes in the flowgraph; "RS_o" is the average size of the overestimated ripple effect for the ten test cases, where size is the total number of definitions and uses in the ripple effect; "RS_p" is the average size of the precise ripple effect; "reduction" is the average percentage reduction, over the ten test cases, in the size of the overestimated ripple effect when it is replaced by the precise ripple effect; "time_o" is the average CPU time per test case to compute the overestimated ripple effect; and "time_p" is the average CPU time per test case to compute the precise ripple effect. The hardware used was rated at roughly 24 MIPS. As an example of the time notation used in Table 3.1, 1m36s is read as 1 minute, 36 seconds.
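The repetition test at line 47 of procedure Analyze is the pruning step on which the exponential bound turns. A minimal Python sketch of that test follows, assuming restriction sets are represented as frozensets of call identifiers (return identifiers in the slicing version); `should_analyze` and its parameter names are illustrative, not taken from the original.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[FrozenSet[str], FrozenSet[str]]


def should_analyze(fin: bool, paths: FrozenSet[str], trans: FrozenSet[str],
                   saved: List[Pair]) -> bool:
    """Line-47-style test: is (paths, trans) worth a new dataflow analysis?

    fin   -- FIN for this definition: True once it was analyzed unrestricted
    paths -- candidate PATHS (Allow) set; empty means no backward-flow restriction
    trans -- candidate TRANS (Transform) set
    saved -- pairs (P, T) already used to analyze this definition
    """
    if fin:
        return False      # an unrestricted analysis already covers every case
    if not paths:
        return True       # the unrestricted analysis has not been done yet
    # Skip only when some saved pair contains both sets: PATHS ⊆ P and TRANS ⊆ T.
    return all(not (paths <= p and trans <= t) for p, t in saved)
```

The exponential worst case is visible in this representation: by Sperner's theorem, as many as C(k, ⌊k/2⌋) subsets of k calls can be pairwise incomparable, and none of them would be pruned by the subset test.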
Although the worst-case complexity of our algorithm for precise logical ripple effect is exponential, the data of Table 3.1 suggest that the expected complexity for a wide range of input programs, given a programming language such as C, is approximated by O(nd). This follows from the O(nd) worst-case complexity of computing the overestimate and the typical closeness of time_o and time_p for each row in Table 3.1. However, the last row of Table 3.1 is instructive: it shows that, regardless of what the expected complexity might be, there will always be specific input programs and starting points that require time greatly exceeding the time required to compute the overestimate (39m55s versus 4m24s in that row). In practice, if the computation of the precise logical ripple effect is taking too long, it can be abandoned and the overestimate computed and used in its place. Note that our algorithm can easily compute the overestimate by modifying Rule 1 so that element d2 is always generated in place of element d1, thereby avoiding all backward-flow restrictions.

3.4 The Slicing Algorithm

This section presents the inverse form of the precise interprocedural logical ripple effect algorithm, and the inverse form of the associated dataflow equations and backward-flow restriction rules. Our algorithm for precise interprocedural slicing is shown in Figure 3.5. The complexity and expected performance of this algorithm are the same as for the precise interprocedural logical ripple effect algorithm given previously. For logical ripple effect, the dataflow problem solved at line 8 was reaching definitions for a single definition. For slicing, which is the inverse problem, the dataflow problem solved at line 8 is reaching uses for a single use.
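To make the inversion concrete, here is a minimal intraprocedural Python sketch of reaching uses for a single use: the use is generated at its node, flows against the flowgraph arcs, is killed by definitions of the same variable, and affects the definitions of that variable it meets. It deliberately omits calls, the E/B set split, and Rules 1 to 3; the graph encoding (`succ`, `defs`) and the function name are assumptions made for illustration.

```python
from typing import Dict, Hashable, List, Set


def reaching_use(succ: Dict[Hashable, List[Hashable]],
                 defs: Dict[Hashable, Set[str]],
                 use_node: Hashable, var: str) -> Set[Hashable]:
    """Nodes whose definition of `var` may supply the value read at `use_node`.

    succ     -- node -> successor nodes of a single-procedure flowgraph
    defs     -- node -> variables defined at that node
    use_node -- node containing the single use u (read before any write there)
    var      -- the variable read by u
    """
    nodes = set(succ) | {s for ss in succ.values() for s in ss}
    in_ = {m: False for m in nodes}     # u reaches the entry of m (backward flow)
    out = {m: False for m in nodes}     # u reaches the exit of m
    changed = True
    while changed:                      # round-robin iteration to a fixed point
        changed = False
        for m in nodes:
            o = any(in_[s] for s in succ.get(m, ()))           # OUT = union of successor INs
            i = m == use_node or (o and var not in defs.get(m, set()))  # GEN, or survive KILL
            if (o, i) != (out[m], in_[m]):
                out[m], in_[m] = o, i
                changed = True
    # u "affects" each definition of var that it meets while flowing backward
    return {m for m in nodes if out[m] and var in defs.get(m, set())}
```

For example, with nodes 1 (`x = 0`), 2 (a branch to 3 or 4), 3 (`x = 5`), and 4 (`y = x`), the use of x at node 4 reaches backward to the definitions at nodes 1 and 3.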
In reaching definitions, the definition flows in the direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects uses of the same variable and any definitions directly dependent on an affected use. In reaching uses, the use flows in the reverse direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects definitions of the same variable and any uses that directly determine an affected definition. Because of this reverse flow, the dataflow equations solved at line 8 of the slicing algorithm must be an inverted form of the dataflow equations used by the logical ripple effect algorithm. These inverted dataflow equations are shown in Figure 3.6. The inverted rules that the slicing algorithm uses for backward-flow restriction are given below. Notice that the ALLOW and TRANSFORM sets will contain returns instead of calls.

```
Compute the slice for a hypothetical or actual use b
Input: a program flowgraph ready for dataflow analysis
Output: the slice in SLICE
begin
1    SLICE ← ∅
2    for each use uu in the program
3        FIN_uu ← F
     end for
4    stack ← ∅
5    push (b, ∅, ∅) onto stack
6    while stack ≠ ∅ do
7        pop stack into (u, ALLOW, TRANSFORM)
8        solve the reaching-uses dataflow equations for the single use u,
         using Rules 1, 2, and 3
9        SLICE ← SLICE ∪ {u}
10       for each definition d in the program that is affected by either u1 or u2
11           SLICE ← SLICE ∪ {d}
         end for
12       ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13       for each return node n in the flowgraph
14           if u1 ∈ B_in[n] ∧ u1 crossed from this return into the returned-from procedure
15               ROOT1 ← ROOT1 ∪ {the return node n} fi
16           if u1 ∈ E_in[n] ∧ u1 crossed from this return into the returned-from procedure
17               LINK1 ← LINK1 ∪ {the return node n} fi
18           if u2 ∈ B_in[n] ∧ u2 crossed from this return into the returned-from procedure
19               ROOT2 ← ROOT2 ∪ {the return node n} fi
20           if u2 ∈ E_in[n] ∧ u2 crossed from this return into the returned-from procedure
21               LINK2 ← LINK2 ∪ {the return node n} fi
         end for
22       for each use uu in the program that is affected by either u1 or u2
             determine Allow and Transform for uu by Theorem 1
23           if u2 ∈ B_out[node where uu occurs]
24               PATHS ← ∅, TRANS ← ∅
25               call Analyze
             else
                 determine Allow and Transform for uu by Theorem 2
26               if u2 ∈ E_out[node where uu occurs]
27                   PATHS ← ∅
28                   PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x returns from the procedure
                         that contains uu ∨ x returns from a procedure that contains
                         a return r ∈ (PATHS ∩ LINK2))}
29                   TRANS ← ROOT2 ∩ PATHS
30                   call Analyze
                 fi
                 determine Allow and Transform for uu by Theorem 3
31               if u1 ∈ B_out[node where uu occurs]
32                   PATHS ← ∅
33                   PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that contains uu
                         ∨ x returns from a procedure that contains a return r ∈ PATHS)}
34                   TRANS ← TRANSFORM ∩ PATHS
35                   call Analyze
                 fi
                 determine Allow and Transform for uu by Theorem 4
36               if u1 ∈ E_out[node where uu occurs]
37                   for each procedure X that contains a return ∈ ROOT1
38                       RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39                       PP ← ∅
40                       PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                             begins with a return ∈ RT1 and ends with a return from the
                             procedure that contains uu, such that each return in this
                             path is in (RT1 ∪ LINK1))}
41                       if PP ≠ ∅
42                           PATHS ← ∅
43                           PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X
                                 ∨ x returns from a procedure that contains a return r ∈ PATHS)}
44                           TRANS ← TRANSFORM ∩ PATHS
45                           PATHS ← PATHS ∪ PP
46                           call Analyze
                         fi
                     end for
                 fi
             fi
         end for
     od
end

Procedure Analyze
begin
     avoid repetition of uu dataflow analysis if possible
47   if FIN_uu ≠ T ∧ (PATHS = ∅ ∨ (true for all saved pairs P × T for uu:
         PATHS ⊈ P ∨ TRANS ⊈ T))
48       if PATHS = ∅
49           FIN_uu ← T
50           push (uu, ∅, ∅) onto stack
         else
51           save PATHS and TRANS as the pair P × T for uu
52           push (uu, PATHS, TRANS) onto stack
         fi
     fi
end
```

Figure 3.5. The slicing algorithm.

Rule 1. If ALLOW = ∅, then element u2 is generated at the node where use u occurs; otherwise u1 is the generated element.

Rule 2. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the B_out[n] equation is computed, if u1 ∈ B_in[q], then u1 cannot cross from B_in[q] into the B_out[n] set if p ∉ ALLOW.

Rule 3. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the B_out[n] equation is computed, if u1 ∈ B_in[q], and, by C2 and Rule 2, u1 can cross from B_in[q] into the B_out[n] set, and p ∈ TRANSFORM, then as this u1 element crosses from B_in[q] into the B_out[n] set, the element is changed to u2. In effect, u1 is transformed into u2, and the call node n becomes a generation node for the u2 element.

For any node n:

$$OUT[n] = E_{out}[n] \cup B_{out}[n], \qquad IN[n] = E_{in}[n] \cup B_{in}[n]$$

Group I: n is an exit node.

$$\begin{aligned}
B_{out}[n] &= \emptyset \\
E_{out}[n] &= \bigcup_{p \in succ(n)} \{\, x \mid x \in IN[p] \wedge C \,\} \\
B_{in}[n] &= GEN[n] \\
E_{in}[n] &= E_{out}[n] \cup RECODE[n]
\end{aligned}$$

Group II: n is a call node, p is the associated return node, and q is the entry node of the returned-from procedure.

$$\begin{aligned}
B_{out}[n] &= \{\, x \mid (x \in B_{in}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{in}[q]))) \vee (x \in B_{in}[q] \wedge C_2) \,\} \\
E_{out}[n] &= \{\, x \in E_{in}[p] \mid \neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{in}[q]) \,\} \\
B_{in}[n] &= (B_{out}[n] - KILL[n]) \cup GEN[n] \\
E_{in}[n] &= E_{out}[n] - KILL[n]
\end{aligned}$$

Group III: n is not an exit or call node.

$$\begin{aligned}
B_{out}[n] &= \bigcup_{p \in succ(n)} B_{in}[p] \\
E_{out}[n] &= \bigcup_{p \in succ(n)} E_{in}[p] \\
B_{in}[n] &= (B_{out}[n] - KILL[n]) \cup GEN[n] \\
E_{in}[n] &= E_{out}[n] - KILL[n]
\end{aligned}$$

Figure 3.6. Dataflow equations for the reaching-uses problem.

As the usefulness of slicing is primarily for program fault localization, it may be desirable to modify the algorithm so that those uses in control predicates whose subordinate statements have at least one use or definition already in the slice are themselves added to the slice and propagated in turn. An example of a control predicate is the condition tested by an if statement. By subordinate statements is meant those statements whose execution is decided by the control predicate. Including these control-predicate uses in the slice is advantageous because the cause of a program error may actually be in a control predicate that is not deciding correctly when to execute its subordinate statements. Ferrante et al. [8] present a method to precisely determine the control predicates for each statement.

CHAPTER 4 INTERPROCEDURAL PARALLELIZATION

4.1 Loop-Carried Data Dependence

This section explains loop-carried data dependence and its relevance to parallelization. When a definition of a variable reaches a use of that variable, a data dependence exists such that the use depends on the definition. An example of data dependence can be seen in Figure 4.1: the use of A(I) at line 3 and the use of A(I) at line 4 both depend on the definition of A(I) at line 2. However, when considering whether or not a loop can be parallelized, there is a special kind of data dependence called loop-carried data dependence [25]. A data dependence is loop carried if the value set by a definition inside the loop during loop iteration i can be used by a use of that variable inside the loop during loop iteration j, where i ≠ j.
Note that i ≠ j is specified instead of the more restrictive and natural-seeming i < j because, if the loop is parallelized, the ordering of the loop iterations cannot be assumed. The relationship between loop-carried data dependence and parallelization is straightforward: if there is at least one loop-carried data dependence, then the loop cannot be parallelized; otherwise the loop can be parallelized. Loop parallelization would mean that the ordering of the different iterations of the loop is unimportant, whereas a loop-carried dependence means the opposite. If there are no loop-carried data dependencies, then there is no requirement that the iterations be ordered a certain way.

```
1  DO I = 1,N
2     A(I) = B(I) * C(I) + D
3     B(I) = C(I) / D + A(I)
4     IF C(I) < 0 THEN C(I) = A(I) * B(I) FI
   END DO
```

Figure 4.1. An example loop.

However, whenever a loop is parallelized, there should be an added serial step afterward that sets the iteration variables, such as I in Figure 4.1, to whatever their values would be for the last iteration of the loop had it not been parallelized. This added step is necessary if the iteration variables of a loop are visible outside the loop and can therefore be referenced after the loop completes. Iteration variables are those variables that are incremented or decremented by a constant value for each loop iteration. The recognition of iteration variables is language-dependent.

Regarding data dependence and arrays, there are several efficient tests available that determine whether a data dependence is possible between a particular definition and use of an array: the separability test, the gcd test, and the Banerjee test. Details of these three tests can be found in [25]. The number theory behind the tests is that of linear diophantine equations. A linear diophantine equation can be formed from the array subscripts of the definition and use in question.
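The gcd test can be sketched in a few lines. The Python below assumes each subscript has the form a*I + const with integer coefficients, so the dependence question reduces to a*x - b*y = c; `gcd_test` is the classical test, while `bounded_solution_exists` is a hypothetical brute-force helper included only to show the loop bounds that the gcd test ignores. Neither function is the separability test or the Banerjee test.

```python
from math import gcd


def gcd_test(a: int, b: int, c: int) -> bool:
    """Classical gcd test for the dependence equation a*x - b*y = c.

    Returns False when no integer solution exists at all, which proves the two
    references independent. True only means a dependence is possible: the test
    ignores loop bounds. Assumes a and b are not both zero.
    """
    return c % gcd(a, b) == 0


def bounded_solution_exists(a: int, b: int, c: int, lo: int, hi: int) -> bool:
    """Hypothetical brute-force check: integers lo <= x, y <= hi with a*x - b*y == c?

    Assumes b > 0; for each x, y is determined exactly when b divides a*x - c.
    """
    return any((a * x - c) % b == 0 and lo <= (a * x - c) // b <= hi
               for x in range(lo, hi + 1))
```

For the subscripts 3*I-5 and 6*I of Figure 4.2, the equation is 3x - 6y = 5, and `gcd_test(3, 6, 5)` returns False because gcd(3, 6) = 3 does not divide 5. Conversely, x - y = 200 passes the gcd test but has no solution with 30 <= x, y <= 100, which is the kind of case bound-aware tests are meant to catch.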
For example, in Figure 4.2 we want to know whether A(3*I-5) and A(6*I) can ever refer to the same array element. The linear diophantine equation that relates these two array references is 3x − 6y = 5. The question now becomes whether this equation has any integer solutions given the boundary conditions 30 ≤ x, y ≤ 100. If there is at least one integer solution, then there is a data dependence; otherwise there is no data dependence. The latter is the case with Figure 4.2: because gcd(3, 6) = 3 does not divide 5, the equation has no integer solutions at all.

```
DO I = 30,100
   A(3*I-5) = ...
   ... A(6*I)
END DO
```

Figure 4.2. A loop with array references.

For the discussion that follows, we define the term loop body. The loop body of any loop L is all statements in the program that can possibly be executed during the iterations of loop L. Calls are allowed in a loop, so a single loop body could conceivably include the statements of many different procedures. For example, if a loop contains a call of procedure A, and procedure A contains a call of procedure B, then the loop body would include all the statements of procedures A and B. In Figure 4.1, the loop body is the four statements at lines 1 through 4.

With respect to the program flowgraph, the loop body is all flowgraph nodes that may be traversed during the iterations of the loop. Let LB be the set of flowgraph nodes that are in the loop body of loop L. Let n be the first node in the loop body that is traversed during each iteration of the loop. The identification of node n is language-dependent. Within the loop body of L, let definition d be a definition of a non-array variable v, and let use u be a use of the variable v that is reached by definition d. Let d also denote the node in the loop body where definition d occurs, and let u denote the node in the loop body where use u occurs. To avoid the complications posed by special cases, we assume that d, n, and u are separate and distinct nodes.
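The path condition developed next can be prototyped as two breadth-first searches, one from node d to node n and one from node n to node u. The Python sketch below is a simplified, intraprocedural illustration, not a transcription of procedure P-test in Figure 4.3: it ignores calls (so there are no entry or exit nodes and no B/E split), and it takes as a hypothetical input the set `alive_out` of nodes whose OUT set still contains the element x for definition d. For a non-array v, the second search avoids node d, which excludes paths of the form d, u, n, d, u.

```python
from collections import deque
from typing import Dict, List, Set


def loop_carried(succ: Dict[int, List[int]], lb: Set[int], alive_out: Set[int],
                 n: int, d: int, u: int, is_array: bool) -> bool:
    """Is there a path d -> ... -> n -> ... -> u inside LB along which x survives?

    succ      -- node -> successor nodes (single procedure, no calls)
    lb        -- nodes of the loop body LB
    alive_out -- hypothetical input: nodes whose OUT set contains element x
    n, d, u   -- loop-head node, definition node, use node (assumed distinct)
    is_array  -- for a non-array v, passing d again would mean the value used
                 comes from the same iteration, so the n -> u search avoids d
    """
    def search(start: int, goal: int, avoid: Set[int]) -> bool:
        seen = {start} | avoid
        frontier = deque([start])
        while frontier:
            p = frontier.popleft()
            for s in succ.get(p, ()):
                if s in seen or s not in lb:
                    continue              # already visited, or outside the loop body
                seen.add(s)
                if s == goal:
                    return True           # x reaches the entry of the goal node
                if s in alive_out:
                    frontier.append(s)    # x survives s, keep extending the path
        return False

    if n not in alive_out:                # x must survive the loop head to cross iterations
        return False
    return search(d, n, set()) and search(n, u, set() if is_array else {d})
```

On a loop whose head is node 1, with definition node 2 and use node 3 on the single path 1, 2, 3 and a back arc from 3 to 1, the sketch reports a loop-carried dependence only when v is an array; if a second branch from node 1 bypasses node 2 on the way to node 3, it reports one for a non-array v as well.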
Although use u depends on definition d because definition d reaches use u, this data dependence can prevent parallelization of loop L only if the dependence is loop carried. Let P be a sequence of flowgraph nodes drawn from LB, such that P represents a possible execution path along which definition d can reach use u. For definition d to be loop-carried to use u along path P, the three nodes d, n, and u must be in P, in that order, because only the traversal of node n represents the transition to a different iteration of the loop. If v is an array, then we assume that definition d and use u may refer to different array elements during the same iteration. For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must be assumed to show a loop-carried data dependence when v is an array. However, this path P does not show a loop-carried data dependence if definition d and use u always refer to the same storage location during any iteration, as we assume is the case when v is a non-array, because in any iteration that follows such a path P, the value used at use u is always the value defined at definition d in that same iteration.

4.2 The Parallelization Algorithm

This section presents in Figure 4.3 an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm uses our interprocedural dataflow analysis method as an integral step to determine data dependencies. The loops that can be parallelized are those loops that the algorithm does not mark as inhibited. The algorithm has three distinct steps. First, the reaching-definitions dataflow problem is solved for the input program by using our interprocedural dataflow analysis method. Second, the quality of the reaching-definition information computed by the first step is possibly improved in the case of array references by using the separability, gcd, and Banerjee tests.
Third, individual d, u pairs that represent data dependence are examined for loop-carried data dependence.

```
a d, u pair is a definition d that reaches a use u
x is the dataflow element that represents the definition d
v is the variable referenced by definition d and use u
to avoid complications, n ≠ d ≠ u is assumed
n is the first node traversed during each loop L iteration
d is the node whose basic block contains definition d
u is the node whose basic block contains use u
LB is the set of nodes in the loop body of loop L
IV is the set of definitions of iteration variables for loop L

begin
     step 1, determine reaching definitions for the input program
1    use our method to solve the reaching-definitions dataflow problem
     step 2, improve the reaching-definition information for array references
2    for all d, u pairs in the program, such that v is an array
3        use the separability, gcd, and Banerjee tests as applicable
4        if definition d and use u can never reference the same element
5            mark the d, u pair as non-reaching
         fi
     end for
     step 3, identify d, u pairs that inhibit parallelization
6    for each loop L in the program
7        for each reaching d, u pair such that d, u ∈ LB and definition d ∉ IV
8            if x ∈ B_out[n]
9                if P-test(x, n, d, u, L, LB) = T
10                   mark L parallelization as inhibited by the d, u pair
                 fi
             fi
         end for
     end for
end

procedure P-test(x, n, d, u, L, LB)
     is there a loop-carried data dependence from definition d to use u thru node n
     return T if yes, F if no
begin
     part1, is there a path from d to n along which x is found
11   if v is an array
12       DONE ← {d}
     else
13       DONE ← {d, u}
     fi
14   NEXT ← {d}
15   until NEXT = ∅
16       remove a node from NEXT, denote it p
17       for each successor node s of node p, such that s ∉ DONE
18           DONE ← DONE ∪ {s}
19           if s ∉ LB ∨ s is an entry node ∨ x ∉ B_out[s]
20               ignore s
21           else if s = n
22               goto part2
             else
23               NEXT ← NEXT ∪ {s}
             fi
         end for
     end until
24   return F
part2: is there a path from n to u along which x is found
25   if v is an array
26       DONE ← {n}
     else
27       DONE ← {n, d}
     fi
28   NEXT ← {n}
29   until NEXT = ∅
30       remove a node from NEXT, denote it p
31       for each successor node s of node p, such that s ∉ DONE
32           DONE ← DONE ∪ {s}
33           if s ∉ LB ∨ s is an exit node
                 ∨ (s is contained in the same procedure that contains L ∧ x ∉ B_out[s])
                 ∨ (s is not contained in the same procedure that contains L ∧ x ∉ E_out[s])
34               ignore s
35           else if s = u
36               return T
             else
37               NEXT ← NEXT ∪ {s}
             fi
         end for
     end until
38   return F
end
```

Figure 4.3. The parallelization algorithm.

At line 7, the definitions and uses of iteration variables are excluded from testing for loop-carried data dependence, because for any iteration the iteration variables will have constant values that can be precomputed if loop L is parallelized. The test at line 8 is a necessary condition for the P-test procedure to return T, which is tested for at line 9; the test at line 8 is done as an economy measure to avoid, when possible, the more costly P-test. Procedure P-test uses a straightforward algorithm that begins with node d and then spreads out, examining successors, successors of successors, and so on, until either there are no more acceptable nodes to examine, in which case F is returned, or all the requirements for path P have been met, in which case T is returned. The successors of a node are examined because a successor node is normally assumed to represent a possible continuation of the execution path from the point of the predecessor node. Exceptions in the algorithm involving entry and exit nodes are explained shortly. Note that P-test only determines whether a satisfactory path P exists; it does not determine what path P is in terms of an actual node sequence, as there may be many such satisfactory paths P. Lines 13 and 27 are active when v is not an array.
In this case, a path P that includes d, u, n, d, u, in that order, is not allowed, and this is prevented by marking the unwanted node u at line 13 and the unwanted node d at line 27. The test of x ∉ B_out[s] at line 19 enforces the requirement that definition d reaches along the path P. A similar test is made at line 33. At line 19, only the B set is checked, because there are no descents into called procedures, as per the rejection of entry nodes at line 19. Entry nodes are rejected at line 19 because n is an outermost node relative to the loop body and the path is confined to the loop body, so any path from d to n leaves no unreturned calls. As the successors of each call node are an entry node and a return node, it is only necessary to check the out set of the return node to know whether the element x survived the call, and this is effectively done by the x ∉ B_out[s] test already mentioned. At line 33, exit nodes are rejected because any path from n to u will not make a return without first making the call. This follows from the fact, already mentioned, that node n is an outermost node relative to the loop body and the path is confined to the loop body. As the return node can always be added to the path P from the call node, there is no need to add it from the exit node, hence the rejection of the exit node. For part1 and part2 in procedure P-test, each flowgraph node may appear only once in the NEXT set; hence the complexity of the P-test procedure is O(n), where n is the number of flowgraph nodes. For the entire algorithm, step 3 dominates, so the complexity is O(lpn), where l is the number of loops in the program, p is the number of d, u pairs in the program, and n is the number of flowgraph nodes.

CHAPTER 5 CONCLUSIONS AND FUTURE RESEARCH

5.1 Summary of Main Results

The first part of this work presented a new method for context-dependent, flow-sensitive interprocedural dataflow analysis.
The method was shown to produce a precise, low-cost solution for such fundamental and important problems as reaching definitions and available expressions, regardless of the actual call structure of the program being analyzed. By using a separate set to isolate calling-context effects and another set to accumulate body effects, the calling-context problem has been reduced to the problem of solving the dataflow equations that compute the different sets. These equations can be solved by the iterative algorithm. As part of our method, the interprocedural kill effects of call-by-reference formal parameters are correctly handled by the equations-compatible technique of element recoding.

The importance of our interprocedural analysis method lies in the fact that a number of different applications depend on the solution of fundamental dataflow problems such as reaching definitions, live variables, definition-use and use-definition chains, and available expressions. Program revalidation, dataflow anomaly detection, compiler optimization, automatic vectorization and parallelization, and software tools that make a program more understandable by revealing data dependencies are some of the applications that may benefit from our method.

The second part of this work presented new algorithms for precise interprocedural logical ripple effect and slicing. The algorithms use our interprocedural dataflow analysis method and add a control mechanism by which, in effect, execution-path history can affect execution-path continuation as the ripple effect or slice is built piece by piece. The importance of our algorithms for precise interprocedural logical ripple effect and slicing lies in their applicability to the areas of software maintenance and debugging. A precise interprocedural logical ripple effect can be used to show a programmer the consequences of program changes, thereby reducing errors and maintenance cost.
Similarly, a precise interprocedural slice can localize program faults, thereby saving programmer effort and debugging cost.

The third part of this work presented an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm makes use of our interprocedural dataflow analysis method to determine data dependencies, and then examines the data dependencies within each loop and determines whether any of them are loop-carried, in which case parallelization of the loop is inhibited. The algorithm has potential use in parallelization tools.

5.2 Directions for Future Research

There are several topics of possible future research related to our method for interprocedural dataflow analysis. Regarding solving the equations, besides the iterative algorithm there are elimination algorithms [20] that have better complexity. Further studies are needed to determine to what extent these other algorithms can be used to solve the equations. Another topic regards the dataflow problems that can be solved by our method, as the actual universe of solvable problems remains to be determined. We have only mentioned a few of the better known problems. For some dataflow problems, it may be that our method can be used after suitable modification to adapt it to the special needs of the problem.

Regarding possible future research related to our algorithms for precise interprocedural logical ripple effect and slicing: the algorithms may overestimate when recursive calls are present, because the Allow set lacks the information needed to enforce the ordering of unmatched returns. One area of future research would therefore be to investigate the possibility of modifying Definition 1, Theorems 1 through 4, and the algorithms so as to remove the possibility of such overestimation.

REFERENCES

[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the SIGPLAN 90 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 25, 6 (June 1990), 246-256.

[2] Aho, A., Sethi, R., and Ullman, J. Compilers, Principles, Techniques and Tools. Addison-Wesley, Reading, MA (1986).

[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress 1974, North Holland, Amsterdam (1974), 398-402.

[4] Banning, J. An efficient way to find the side effects of procedure calls and the aliases of variables. Conference Record of the 6th ACM Symposium on Principles of Programming Languages, ACM, New York (Jan. 1979), 29-41.

[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and parallelization. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 162-175.

[6] Callahan, D. The program summary graph and flow-sensitive interprocedural data flow analysis. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 47-56.

[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 57-66.

[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9, 2 (1987), 319-349.

[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use dependencies. Proceedings of the IEEE Computer Society 1990 Int'l Conference on Computer Languages, New Orleans, LA (March 1990).

[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New York (1977).

[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12, 1 (Jan. 1990), 26-60.

[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures. Proceedings of the IEEE COMPSAC 88 (Oct. 1988), 220-227.

[13] Johmann, K., Liu, S., and Yau, S. Dataflow Equations for Context-Dependent Flow-Sensitive Interprocedural Analysis. SERC-TR-45-F, Department of Computer and Information Sciences, University of Florida, Gainesville (Jan. 1991).

[14] Korel, B., and Laski, J. Dynamic program slicing. Information Processing Letters, 29, 3 (Oct. 1988), 155-163.

[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification. Conference Record of the 18th ACM Symposium on Principles of Programming Languages, ACM, New York (1991), 93-103.

[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions on Software Engineering, SE-13, 12 (Dec. 1987), 1370-1371.

[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference Record of the 8th ACM Symposium on Principles of Programming Languages, ACM, New York (1981), 219-230.

[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental results. Software-Practice and Experience, 19, 2 (1989), 149-169.

[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM, 26, 2 (April 1979), 322-344.

[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM Computing Surveys, 18, 3 (Sep. 1986), 277-316.

[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis. In Muchnick, S., and Jones, N., Eds., Program Flow Analysis: Theory and Applications, Prentice-Hall, Englewood Cliffs, NJ (1981), 189-232.

[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call statements. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 176-185.

[23] Weiser, M. Programmers use slices when debugging. Communications of the ACM, 25, 7 (July 1982), 446-452.

[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering, SE-10, 4 (July 1984), 352-357.

[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers. Addison-Wesley, Reading, MA (1990).

BIOGRAPHICAL SKETCH

Kurt Johmann was born in Elizabeth, New Jersey, on November 16, 1955. In 1978 he received a B.A. in computer science from Rutgers University in New Jersey. Following graduation, he worked for a shipping company, Sea-Land Service Inc., as a programmer and systems analyst. In 1985 he left Sea-Land and did PC work for three years. Following this, he entered the graduate program of the Computer and Information Sciences Department at the University of Florida in the fall of 1988. He received an M.S. in computer science in December 1989 and entered the Ph.D. program. Anticipating graduation, he hopes to find a job in academia.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stephen S. Yau, Chairman
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Richard Newman-Wolfe, Cochairman
Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Paul Fishwick
Associate Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Mark Yang
Professor of Statistics

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May, 1992

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School

Full Text |

PAGE 1 &217(;7'(3(1'(17 )/2:6(16,7,9( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 $1' ,76 $33/,&$7,21 72 6/,&,1* $1' 3$5$//(/,=$7,21 %\ .857 -2+0$11 $ ',66(57$7,21 35(6(17(' 72 7+( *5$'8$7( 6&+22/ 2) 7+( 81,9(56,7< 2) )/25,'$ ,1 3$57,$/ )8/),//0(17 2) 7+( 5(48,5(0(176 )25 7+( '(*5(( 2) '2&725 2) 3+,/2623+< 81,9(56,7< 2) )/25,'$ 81,9(56,7< 2) )/25IIO5 /,%5$5,(6 PAGE 2 $&.12:/('*(0(176 ZRXOG OLNH WR H[SUHVV P\ DSSUHFLDWLRQ DQG JUDWLWXGH WR P\ FKDLUPDQ DQG DGYLVRU 'U 6WHSKHQ 6 PAGE 3 7$%/( 2) &217(176 $&.12:/('*(0(176 LL $%675$&7 Y &+$37(56 ,1752'8&7,21 ,QWHUSURFHGXUDO 'DWDIORZ $QDO\VLV 6OLFLQJ DQG /RJLFDO 5LSSOH (IIHFW 3DUDOOHOL]DWLRQ /LWHUDWXUH 5HYLHZ 2XWOLQH LQ %ULHI 7+( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 0(7+2' &RQVWUXFWLQJ WKH )ORZJUDSK ,QWHUSURFHGXUDO )RUZDUG)ORZ2U $QDO\VLV 7KH 'DWDIORZ (TXDWLRQV (OHPHQW 5HFRGLQJ IRU $OLDVHV ,PSOLFLW 'HILQLWLRQV 'XH WR &DOOV ,QWHUSURFHGXUDO )RUZDUG)ORZ$QG $QDO\VLV ,QWHUSURFHGXUDO %DFNZDUG)ORZ $QDO\VLV &RPSOH[LW\ RI 2XU ,QWHUSURFHGXUDO $QDO\VLV 0HWKRG ([SHULPHQWDO 5HVXOWV ,17(5352&('85$/ 6/,&,1* $1' /2*,&$/ 5,33/( ())(&7 5HSUHVHQWLQJ &RQWLQXDWLRQ 3DWKV IRU ,QWHUSURFHGXUDO /RJLFDO 5LSSOH (IIHFW 7KH /RJLFDO 5LSSOH (IIHFW $OJRULWKP $ 3URWRW\SH 'HPRQVWUDWHV WKH $OJRULWKP 7KH 6OLFLQJ $OJRULWKP ,17(5352&('85$/ 3$5$//(/,=$7,21 /RRS&DUULHG 'DWD 'HSHQGHQFH 7KH 3DUDOOHOL]DWLRQ $OJRULWKP &21&/86,216 $1' )8785( 5(6($5&+ 6XPPDU\ RI 0DLQ 5HVXOWV LLL PAGE 4 'LUHFWLRQV IRU )XWXUH 5HVHDUFK 5()(5(1&(6 %,2*5$3+,&$/ 6.(7&+ LY PAGE 5 $EVWUDFW RI 'LVVHUWDWLRQ 3UHVHQWHG WR WKH *UDGXDWH 6FKRRO RI WKH 8QLYHUVLW\ RI )ORULGD LQ 3DUWLDO )XOILOOPHQW RI WKH 5HTXLUHPHQWV IRU WKH 'HJUHH RI 'RFWRU RI 3KLORVRSK\ &217(;7'(3(1'(17 )/2:6(16,7,9( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 $1' ,76 $33/,&$7,21 72 6/,&,1* $1' 3$5$//(/,=$7,21 %\ .XUW -RKPDQQ 0D\ &KDLUPDQ 'U 6WHSKHQ 6 PAGE 6 6OLFLQJ GHWHUPLQHV SURJUDP SLHFHV WKDW FDQ DIIHFW D YDOXH /RJLFDO ULSSOH HIIHFW GHWHUPLQHV SURJUDP SLHFHV WKDW FDQ EH DIIHFWHG E\ D YDOXH %RWK VOLFLQJ DQG ORJLFDO ULSSOH HIIHFW DUH XVHIXO IRU VRIWZDUH 
PDLQWHQDQFH 7KH SUREOHPV RI VOLFLQJ DQG ORJLFDO ULSSOH HIIHFW DUH LQYHUVHV RI HDFK RWKHU DQG D VROXWLRQ RI HLWKHU SUREOHP FDQ EH LQYHUWHG WR VROYH WKH RWKHU 3UHFLVH LQWHUSURFHGXUDO ORJLFDO ULSSOH HIIHFW DQDO\VLV LV FRPSOLFDWHG E\ WKH IDFW WKDW DQ HOHPHQW PD\ EH LQ WKH ULSSOH HIIHFW E\ YLUWXH RI RQH RU PRUH VSHFLILF H[HFXWLRQ SDWKV ,Q WKLV GLVVHUWDWLRQ ZH SUHVHQW DQ DOJRULWKP WKDW EXLOGV D SUHFLVH ORJLFDO ULSSOH HIIHFW RU VOLFH SLHFH E\ SLHFH WDNLQJ LQWR DFFRXQW WKH SRVVLEOH H[HFXWLRQ SDWKV 7KH DOJRULWKP PDNHV XVH RI RXU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV PHWKRG DQG WKLV PHWKRG LV DOVR XVHG LQ DQ DOJRULWKP JLYHQ LQ WKLV GLVVHUWDWLRQ IRU LGHQWLI\LQJ ORRSV WKDW FDQ EH SDUDOOHOL]HG YL PAGE 7 &+$37(5 ,1752'8&7,21 ,QWHUSURFHGXUDO 'DWDIORZ $QDO\VLV 'DWDIORZ DQDO\VLV UHIHUV WR D FODVV RI SUREOHPV WKDW DVN DERXW WKH UHODWLRQVKLSV WKDW H[LVW DORQJ D SURJUDPfV SRVVLEOH H[HFXWLRQ SDWKV EHWZHHQ VXFK SURJUDP HOHn PHQWV DV YDULDEOHV FRQVWDQWV DQG H[SUHVVLRQV > @ :KHQ GDWDIORZ DQDO\VLV LV GRQH IRU D SURJUDP E\ WUHDWLQJ LWV LQGLYLGXDO SURFHGXUHV DV EHLQJ LQGHSHQGHQW RI HDFK RWKHU UHJDUGOHVV RI WKH FDOOV PDGH WKLV LV NQRZQ DV LQWUDSURFHGXUDO DQDO\VLV )RU LQWUDSURFHGXUDO DQDO\VLV DVVXPSWLRQV PXVW EH PDGH DERXW WKH HIIHFWV RI FDOOV %\ FRQWUDVW LQWHUSURFHGXUDO DQDO\VLV UHSODFHV DVVXPSWLRQV ZLWK VSHFLILF LQIRUPDWLRQ DERXW WKH HIIHFWV RI HDFK FDOO 7KLV LQIRUPDWLRQ FDQ EH JDWKHUHG E\ HLWKHU IORZ VHQVLWLYH > @ RU IORZLQVHQVLWLYH > @ DQDO\VLV :KHQ DQVZHULQJ D GDWDIORZ TXHVWLRQ D IORZVHQVLWLYH DQDO\VLV ZLOO WDNH LQWR DFFRXQW WKH IORZ SDWKV ZLWKLQ SURFHGXUHV ZKHUHDV D IORZLQVHQVLWLYH DQDO\VLV LJQRUHV WKHVH IORZ SDWKV 7KH IORZ SDWKV DUH WKH SRVVLEOH H[HFXWLRQ SDWKV )ORZVHQVLWLYH DQDO\VLV W\SLFDOO\ SURYLGHV PRUH SUHFLVH LQIRUPDWLRQ EXW DW JUHDWHU FRVW )ORZVHQVLWLYH LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV KDV WZR PDMRU SUREOHPV WKDW PDNH LW VLJQLILFDQWO\ KDUGHU WKDQ LQWUDSURFHGXUDO DQDO\VLV )LUVW LQ LQWUDSURFHGXUDO DQDO\VLV LW LV DVVXPHG WKDW DQ\ SDWK LQ WKH IORZJUDSK LV D SRVVLEOH H[HFXWLRQ SDWK %\ 
FRQWUDVW IRU LQWHUSURFHGXUDO DQDO\VLV LW LV XVHIXO WR DVVXPH WKDW WKH SRVVLEOH H[HFXWLRQ SDWKV FRQIRUP WR WKH UXOH WKDW RQFH D SURFHGXUH LV HQWHUHG E\ D FDOO WKH IORZ UHWXUQV WR WKDW FDOO XSRQ UHWXUQ 7KXV WKH VHW RI SRVVLEOH H[HFXWLRQ SDWKV ZLOO W\SLFDOO\ EH D SURSHU VXEVHW RI WKH SDWKV LQ WKH SURJUDP IORZJUDSK 7KLV SUREOHP PAGE 8 ZLOO EH UHIHUUHG WR DV WKH FDOOLQJFRQWH[W SUREOHP 6HFRQG FDOOE\UHIHUHQFH IRUPDO SDUDPHWHUV W\SLFDOO\ FDXVH DOLDV UHODWLRQVKLSV EHWZHHQ DFWXDO DQG IRUPDO SDUDPHWHUV WKDW DUH YDOLG RQO\ IRU FHUWDLQ FDOOV DQG DSSO\ RQO\ WR WKRVH SDVVHV WKURXJK WKH FDOOHG SURFHGXUH WKDW RULJLQDWH IURP WKRVH FDOOV WKDW HVWDEOLVK WKH VSHFLILF DOLDV UHODWLRQVKLS 7KHUH DUH PDQ\ DSSOLFDWLRQV IRU D IORZVHQVLWLYH LQWHUSURFHGXUDO GDWDIORZ DQDOn \VLV PHWKRG WKDW VROYHV WKH WZR PDMRU SUREOHPV DVVXPLQJ WKDW WKH FRVWV RI WKH PHWKRG DUH QRW WRR KLJK 6RPH RI WKH ZHOONQRZQ GDWDIORZ SUREOHPV WKDW FDQ EH SUHFLVHO\ VROYHG E\ VXFK D PHWKRG DUH UHDFKLQJ GHILQLWLRQV OLYH YDULDEOHV WKH UHODWHG SUREOHPV RI GHILQLWLRQXVH DQG XVHGHILQLWLRQ FKDLQV DQG DYDLODEOH H[SUHVVLRQV $Sn SOLFDWLRQV WKDW UHTXLUH WKH VROXWLRQ RI RQH RU PRUH RI WKHVH GDWDIORZ SUREOHPV LQFOXGH FRPSLOHU RSWLPL]DWLRQ DXWRPDWLF YHFWRUL]DWLRQ DQG SDUDOOHOL]DWLRQ RI SURJUDP FRGH SURJUDP UHYDOLGDWLRQ GDWDIORZ DQRPDO\ GHWHFWLRQ DQG VRIWZDUH WRROV WKDW VKRZ GDWD GHSHQGHQFLHV ,Q WKLV GLVVHUWDWLRQ ZH SUHVHQW D QHZ PHWKRG IRU IORZVHQVLWLYH LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV WKDW VROYHV WKH WZR PDMRU SUREOHPV DQG GRHV VR DW D FRPSDUDWLYHO\ ORZ FRVW >@ 7KH PHWKRG FRQVLVWV RI VSHFLDO GDWDIORZ HTXDWLRQV WKDW DUH VROYHG IRU D SURJUDP IORZJUDSK ,Q GHIHUHQFH WR FDOOLQJ FRQWH[W VHSDUDWH VHWV FDOOHG HQWU\ DQG ERG\ VHWV DUH PDLQWDLQHG DW HDFK QRGH LQ WKH IORZJUDSK 7KH HQWU\ VHW FRQWDLQV FDOOLQJFRQWH[W HIIHFWV WKDW HQWHU D SURFHGXUH 7KH ERG\ VHW FRQWDLQV HIIHFWV WKDW UHVXOW IURP VWDWHPHQWV LQ WKH SURFHGXUH %\ LVRODWLQJ FDOOLQJFRQWH[W HIIHFWV LQ WKH HQWU\ VHW D FDOOfV QRQNLOOHG FDOOLQJ FRQWH[W LV SUHVHUYHG E\ PHDQV RI D VLPSOH LQWHUn 
VHFWLRQ RSHUDWLRQ GRQH DW WKH UHWXUQ QRGH IRU WKH FDOO 7KH PDLQ DGYDQWDJH RI RXU PHWKRG LV LWV ORZ FRPSOH[LW\ DQG WKH IDFW WKDW WKH SUHVHQFH RI UHFXUVLRQ GRHV QRW DIIHFW WKH SUHFLVHQHVV RI WKH UHVXOW 7KH ODQJXDJH PRGHO DVVXPHG IRU &KDSWHU DOORZV JOREDO YDULDEOHV EXW WKH YLVLELOLW\ RI HDFK IRUPDO SDUDPHWHU LV OLPLWHG WR WKH VLQJOH SURFHGXUH WKDW GHFODUHV PAGE 9 LW 7KXV ZLWK WKH H[FHSWLRQ RI D FDOO DQG LWV LQGLUHFW UHIHUHQFH HDFK IRUPDO SDn UDPHWHU FDQ RQO\ EH UHIHUHQFHG LQVLGH D VLQJOH SURFHGXUH ([DPSOHV RI SURJUDPPLQJ ODQJXDJHV WKDW ILW WKLV PRGHO DUH & DQG )2575$1 7KLV UHVWULFWLRQ RQ WKH YLVLELOLW\ RI IRUPDO SDUDPHWHUV LV LPSRVHG IRU WKH VDNH RI WKH GLVFXVVLRQV RI HOHPHQW UHFRGLQJ LQ 6HFWLRQV DQG RI LPSOLFLW GHILQLWLRQV LQ 6HFWLRQ DQG RI ZRUVWFDVH FRPSOH[LW\ LQ 6HFWLRQ 2XU PHWKRG FDQ DOVR EH XVHG IRU WKH DOWHUQDWLYH ODQJXDJH PRGHO WKDW DOORZV HDFK IRUPDO SDUDPHWHU WR KDYH YLVLELOLW\ LQ PRUH WKDQ D VLQJOH SURFHGXUH EXW WKLV LV FRQVLGHUHG RQO\ EULHIO\ DW WKH HQG RI 6HFWLRQ 6OLFLQJ DQG /RJLFDO 5LSSOH (IIHFW *LYHQ DQ DFWXDO RU K\SRWKHWLFDO YDULDEOH Y DW SURJUDP SRLQW S GHWHUPLQH DOO SURJUDP SLHFHV WKDW FDQ SRVVLEO\ EH DIIHFWHG E\ WKH YDOXH RI Y DW S 7KLV LV WKH ORJLFDO ULSSOH HIIHFW SUREOHP *LYHQ Y DQG S GHWHUPLQH DOO SURJUDP SLHFHV WKDW FDQ SRVVLEO\ DIIHFW WKH YDOXH RI Y DW S 7KLV LV WKH VOLFLQJ SUREOHP )RU WKHVH WZR SUREOHPV HDFK SUREOHP LV WKH LQYHUVH RI WKH RWKHU DQG D VROXWLRQ IRU RQH RI WKHVH SUREOHPV RQFH LQYHUWHG ZRXOG EH D VROXWLRQ IRU WKH RWKHU SUREOHP /RJLFDO ULSSOH HIIHFW LV XVHIXO IRU KHOSLQJ D SURJUDPPHU WR XQGHUVWDQG KRZ D SURJUDP FKDQJH HLWKHU DFWXDO RU K\SRWKHWLFDO ZLOO LPSDFW WKDW SURJUDP 0DNLQJ SURJUDP FKDQJHV DV SDUW RI URXWLQH PDLQWHQDQFH RIWHQ LQWURGXFHV QHZ HUURUV LQWR WKH FKDQJHG SURJUDP 6XFK HUURUV W\SLFDOO\ UHVXOW EHFDXVH WKH SURJUDPPHU RYHUORRNHG VRPH SDUW RI WKH ORJLFDO ULSSOH HIIHFW IRU WKDW FKDQJH %\ VKRZLQJ D SURJUDPPHU ZKDW WKH ORJLFDO ULSSOH HIIHFW DFWXDOO\ LV IRU D SURJUDP FKDQJH PLVWDNHV FDQ EH DYRLGHG 6OLFLQJ LV SULPDULO\ XVHIXO IRU 
SURJUDP IDXOW ORFDOL]DWLRQ >@ ,I D YDULDEOH Y DW SRLQW S LV NQRZQ WR KDYH D ZURQJ YDOXH WKHQ D VOLFH RQ Y DW S ZLOO QDUURZ WKH VHDUFK IRU WKH FDXVH RI WKH HUURU WR WKDW SDUW RI WKH SURJUDP WKDW FDQ WUXO\ DIIHFW Y DW S 7KXV WKH IDXOW LV ORFDOL]HG 7KH PRUH SUHFLVH WKH VOLFH WKH PRUH ORFDOL]HG WKH FDXVH RI WKH HUURU VDYLQJ SURJUDPPHU WLPH PAGE 10 ,Q WKLV GLVVHUWDWLRQ ZH DUH FRQFHUQHG RQO\ ZLWK VWDWLF ORJLFDO ULSSOH HIIHFW DQG VOLFLQJ > @ ZKHUH WKH ULSSOH HIIHFW RU VOLFH LV GHWHUPLQHG IURP GDWDIORZ DQDO\VLV RI WKH SURJUDP WH[W 7KH DOWHUQDWLYH DSSURDFK LV G\QDPLF ORJLFDO ULSSOH HIIHFW DQG VOLFLQJ > @ ZKHUH WKH ULSSOH HIIHFW RU VOLFH LV GHWHUPLQHG E\ DFWXDOO\ H[HFXWLQJ WKH SURJUDP :KHQHYHU ZH VSHDN RI H[HFXWLRQ SDWKV LQ &KDSWHU ZH DOZD\V PHDQ SRVVLEOH H[HFXWLRQ SDWKV DV GHWHUPLQHG E\ GDWDIORZ DQDO\VLV 3UHFLVH LQWHUSURFHGXUDO ORJLFDO ULSSOH HIIHFW DQDO\VLV LV FRPSOLFDWHG E\ WKH IDFW WKDW D GHILQLWLRQ PD\ EH DGGHG WR WKH ULSSOH HIIHFW EHFDXVH RI RQH RU PRUH VSHFLILF H[HFXWLRQ SDWKV 7R GHWHUPLQH LQ WXUQ WKH ULSSOH HIIHFW RI WKDW DGGHG GHILQLWLRQ WKDW GHILQLWLRQ VKRXOG EH FRQVWUDLQHG WR WKRVH H[HFXWLRQ SDWKV WKDW DUH WKH SRVVLEOH FRQWLQXDWLRQV RI WKH H[HFXWLRQ SDWKV DORQJ ZKLFK WKDW GHILQLWLRQ ZDV LWVHOI DIIHFWHG DQG WKHUHE\ DGGHG WR WKH ULSSOH HIIHFW :H UHIHU WR WKLV DV WKH H[HFXWLRQSDWK SUREOHP ,Q SDUWLFXODU LW LV WKRVH FDOO LQVWDQFHV PDGH LQ DQ H[HFXWLRQ SDWK 3 WKDW KDYH QRW EHHQ UHWXUQHG WR LQ 3 WKDW FDXVH WKH GLIILFXOW\ 7KLV LV EHFDXVH RI WKH UXOH WKDW D FDOOHG SURFHGXUH UHWXUQV WR LWV PRVW UHFHQW FDOOHU 7KLV PHDQV WKDW DQ\ FRQWLQXDWLRQ RI WKH H[HFXWLRQ SDWK 3 PXVW ILUVW UHWXUQ WR WKRVH XQUHWXUQHG FDOOV LQ 3 EHIRUH UHWXUQV FDQ SRVVLEO\ EH PDGH WR FDOO LQVWDQFHV WKDW SUHFHGH 3 $Q H[DPSOH ZLOO LOOXVWUDWH WKH SUREOHP SURFHGXUH PDLQ SURFHGXUH % SURFHGXUH $ EHJLQ EHJLQ EHJLQ I m \ fÂ§ I FDOO % FDOO $ HQG [ \ ] fÂ§ [ I m FDOO % HQG HQG )RU WKH H[DPSOH DVVXPH WKDW DOO YDULDEOHV DUH JOREDO DQG WKDW WKH SUREOHP LV WR GHWHUPLQH WKH ORJLFDO ULSSOH HIIHFW IRU WKH GHILQLWLRQ RI 
YDULDEOH DW OLQH 7KH FDOO PAGE 11 WR SURFHGXUH % DW OLQH DOORZV WKH GHILQLWLRQ RI DW OLQH WR DIIHFW WKH GHILQLWLRQ RI \ DW OLQH DQG WKH UHWXUQ RI SURFHGXUH % ZRXOG EH WR WKH FDOO DW OLQH E\ ZKLFK WKH GHILQLWLRQ RI \ DW OLQH ZDV DIIHFWHG 7KH HQG UHVXOW LV WKDW WKH ULSSOH HIIHFW VKRXOG LQFOXGH RQO\ OLQH +RZHYHU DVVXPH WKDW WKH H[HFXWLRQSDWK SUREOHP LV LJQRUHG DQG DOO UHWXUQV DUH SRVVLEOH ZKHQ WKH ULSSOH HIIHFW LV FRPSXWHG )RU WKH VDPH SUREOHP WKH FDOO DW OLQH DOORZV WKH GHILQLWLRQ RI DW OLQH WR DIIHFW WKH GHILQLWLRQ RI \ DW OLQH 7KHQ WKH GHILQLWLRQ RI \ DW OLQH DIIHFWV WKH GHILQLWLRQ RI [ DW OLQH E\ SURFHGXUH % UHWXUQLQJ WR WKH FDOO DW OLQH LQ DGGLWLRQ WR WKH FDOO DW OLQH 7KHQ WKH GHILQLWLRQ RI [ DW OLQH DIIHFWV WKH GHILQLWLRQ RI ] DW OLQH E\ SURFHGXUH $ UHWXUQLQJ WR WKH FDOO DW OLQH 7KH HQG UHVXOW LV D ULSSOH HIIHFW WKDW LQFOXGHV OLQHV DQG EXW RQO\ OLQH VKRXOG EH LQ WKH ULSSOH HIIHFW $OWKRXJK WKHUH DUH D QXPEHU RI SDSHUV RQ ORJLFDO ULSSOH HIIHFW DQG VOLFLQJ > @ WKHUH DSSHDUV WR EH RQO\ RQH >@ WKDW DGGUHVVHV WKH SUREOHPV RI SUHFLVH LQWHUSURFHGXUDO ORJLFDO ULSSOH HIIHFW DQG VOLFLQJ DQG SUHVHQWV D PHWKRG IRU LW :HLVHU >@ ZDV WKH ILUVW WR SURSRVH DQ LQWHUSURFHGXUDO VOLFLQJ PHWKRG WKDW LJQRUHV WKH H[HFXWLRQSDWK SUREOHP DQG WKHUHE\ VXIIHUV IURP WKH UHVXOWLQJ ORVV RI SUHFLVLRQ +RUZLW] HW DO >@ DGGUHVV WKH SUREOHP RI SUHFLVH LQWHUSURFHGXUDO VOLFLQJ DQG SUHVHQW D PHWKRG WR FRQVWUXFW D V\VWHP GHSHQGHQFH JUDSK IURP ZKLFK VOLFHV FDQ EH H[WUDFWHG ,Q WKLV GLVVHUWDWLRQ ZH SUHVHQW DQ DOJRULWKP WKDW EXLOGV WKH ORJLFDO ULSSOH HIIHFW SLHFH E\ SLHFH DQG WDNHV LQWR DFFRXQW WKH UHVWULFWLRQV RQ H[HFXWLRQSDWK FRQWLQXDWLRQ WKDW DUH LPSRVHG E\ WKH SUHFHGLQJ H[HFXWLRQ SDWKV XS WR WKH SRLQW E\ ZKLFK WKH JLYHQ SURJUDP SLHFH LV DIIHFWHG DQG WKHUHE\ LQFOXGHG LQ WKH ULSSOH HIIHFW ,Q JHQHUDO WKH DOJRULWKP FRPSXWHV D SUHFLVH ORJLFDO ULSSOH HIIHFW EXW VRPH RYHUHVWLPDWLRQ LV SRVVLEOH PHDQLQJ WKDW WKH FRPSXWHG ORJLFDO ULSSOH HIIHFW PD\ EH ODUJHU WKDQ LW DFWXDOO\ LV $Q LQYHUVH IRUP RI WKH DOJRULWKP LV 
SUHVHQWHG IRU WKH VOLFLQJ SUREOHP 7KH ODQJXDJHV WKDW RXU DOJRULWKP ZLOO ZRUN IRU LQFOXGH PDQ\ RI WKH FRPPRQ SURFHGXUDO ODQJXDJHV VXFK DV & 3DVFDO $GD DQG )RUWUDQ PAGE 12 3DUDOOHOL]DWLRQ $XWRPDWLF FRQYHUVLRQ RI D VHTXHQWLDO SURJUDP LQWR D SDUDOOHO SURJUDP LV RIWHQ UHIHUUHG WR DV SDUDOOHOL]DWLRQ 3DUDOOHOL]DWLRQ SUREOHPV DUH W\SLFDOO\ FRQFHUQHG ZLWK WKH FRQYHUVLRQ RI VHTXHQWLDO ORRSV LQWR SDUDOOHO FRGH ,Q WKLV GLVVHUWDWLRQ WKH VSHFLILF SUREOHP FRQVLGHUHG LV WKH LGHQWLILFDWLRQ RI ORRSV LQ D SURJUDP WKDW FDQ EH SDUDOOHOL]HG LQFOXGLQJ WKRVH ORRSV WKDW FRQWDLQ FDOOV $ IORZVHQVLWLYH LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV PHWKRG KDV VSHFLILF DSSOLFDELOLW\ WR WKH SUREOHP RI SDUDOOHOL]LQJ ORRSV WKDW FRQWDLQ FDOOV EHFDXVH VXFK D PHWKRG FDQ VXSSO\ WKH SUHFLVH GDWDGHSHQGHQF\ LQIRUn PDWLRQ WKDW ZRXOG EH QHFHVVDU\ IRU WKH SDUDOOHOL]DWLRQ DQDO\VLV 7KH SDUDOOHOL]DWLRQ RI D ORRS ZRXOG PHDQ WKDW HDFK LWHUDWLRQ RI WKH ORRS FDQ EH H[HFXWHG LQGHSHQGHQWO\ RI WKH RWKHU LWHUDWLRQV RI WKH ORRS ,Q WKHRU\ WKLV ZRXOG PHDQ WKDW HDFK VLQJOH LWHUDWLRQ RU HDFK DUELWUDU\ EORFN RI LWHUDWLRQV FDQ EH DVVLJQHG WR D VHSDUDWH SURFHVVRU LQ D SDUDOOHO PDFKLQH 7KH VSHFLILF DUFKLWHFWXUH RI D SDUWLFXODU SDUDOOHO PDFKLQH DV ZHOO DV WKH SURJUDPPLQJ ODQJXDJH WR EH SDUDOOHOL]HG DV ZHOO DV WKH YDULRXV ORRS WUDQVIRUPDWLRQV WKDW DUH SRVVLEOH WR FRQYHUW VHTXHQWLDO ORRS FRGH LQWR IXQFWLRQDOO\ HTXLYDOHQW VHTXHQWLDO FRGH WKDW LV PRUH SDUDOOHOL]DEOH ZLOO LQIOXHQFH WKH GHWHUPLQDWLRQ LQ DQ\ SDUDOOHOL]DWLRQ WRRO DV WR ZKDW ORRSV FDQ DFWXDOO\ EH SDUDOOHOL]HG DQG KRZ WKH\ ZRXOG EH SDUDOOHOL]HG +RZHYHU QRQH RI WKH DUFKLWHFWXUH ODQJXDJH DQG ORRSWUDQVIRUPDWLRQ LVVXHV ZLOO EH FRQVLGHUHG KHUH ,QVWHDG WKH SUREOHP ZLOO EH FRQVLGHUHG VROHO\ IURP WKH VWDQGSRLQW RI GDWD GHSHQGHQFH $IWHU D EULHI UHYLHZ RI WKH EDVLFV UHJDUGLQJ GDWD GHSHQGHQFH DQG SDUDOOHOL]DWLRQ DQ DOJRULWKP LV JLYHQ WKDW LGHQWLILHV ORRSV LQ D SURJUDP WKDW FDQ EH SDUDOOHOL]HG DQG WKLV DOJRULWKP XVHV RXU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV PHWKRG DV DQ LQWHJUDO SDUW 7KH SRWHQWLDO 
YDOXH RI SDUDOOHOL]DWLRQ LV FOHDU 2Q WKH RQH KDQG SDUDOOHO PDFKLQHV DUH EHFRPLQJ PRUH FRPPRQ DQG RQ WKH RWKHU KDQG D JUHDW QXPEHU RI VHTXHQWLDO SURJUDPV DOUHDG\ H[LVW VRPH RI ZKLFK FDQ EHQHILW IURP WKH JUHDWHU SURFHVVLQJ SRZHU WKDW SDUDOOHOL]DWLRQ ZRXOG RIIHU PAGE 13 /LWHUDWXUH 5HYLHZ 'LIIHUHQW PHWKRGV KDYH EHHQ RIIHUHG IRU VROYLQJ YDULRXV IORZVHQVLWLYH LQWHUSURn FHGXUDO GDWDIORZ DQDO\VLV SUREOHPV 6KDULU DQG 3QXHOL >@ SUHVHQW D PHWKRG WKH\ QDPH FDOOVWULQJV 7KH HVVHQWLDO LGHD RI WKHLU PHWKRG LV WR DFFXPXODWH IRU HDFK HOHn PHQW D KLVWRU\ RI WKH FDOOV WUDYHUVHG E\ WKDW HOHPHQW DV LW IORZV WKURXJK WKH SURJUDP IORZJUDSK 7KH FDOO KLVWRU\ DVVRFLDWHG ZLWK DQ HOHPHQW LV XVHG ZKHQHYHU WKDW HOHPHQW LV DW D UHWXUQ SRLQW 7KH HOHPHQW FDQ RQO\ FURVV EDFN WR WKRVH FDOOV LQ LWV FDOO KLVWRU\ 7KXV WKH FDOOVWULQJV DSSURDFK SURYLGHV D VROXWLRQ WR WKH FDOOLQJFRQWH[W SUREOHP +RZHYHU WKH GLVDGYDQWDJH RI WKLV DSSURDFK LV WKH WLPH DQG VSDFH QHHGHG WR PDLQWDLQ D FDOO KLVWRU\ IRU HDFK HOHPHQW DW HDFK IORZJUDSK QRGH /HW O EH WKH SURJUDP VL]H :H DVVXPH WKDW WKH QXPEHU RI HOHPHQWV ZLOO EH D OLQHDU IXQFWLRQ RI O 7KH ZRUVWFDVH QXPEHU RI WRWDO VHW RSHUDWLRQV UHTXLUHG E\ WKH FDOOVWULQJV DSSURDFK ZRXOG EH JUHDWHU E\ D IDFWRU RI ZKHQ FRPSDUHG WR RXU PHWKRG 7KLV LV EHFDXVH IRU HDFK XQLRQ RU LQWHUVHFWLRQ RI WZR VHWV RI HOHPHQWV LI WKH VDPH HOHPHQW LV LQ ERWK VHWV WKHQ D XQLRQ RSHUDWLRQ PXVW DOVR EH GRQH IRU WKH WZR DVVRFLDWHG FDOO KLVWRULHV VR DV WR JHW WKH QHZ FDOO KLVWRU\ WR EH DVVRFLDWHG ZLWK WKDW HOHPHQW DW WKH QRGH IRU ZKLFK WKH VHW RSHUDWLRQ LV EHLQJ GRQH $ IXUWKHU GLVDGYDQWDJH RI WKH FDOOVWULQJV DSSURDFK LV WKH QHHG WR LQFOXGH WKH DVVRFLDWHG FDOO KLVWRULHV ZKHQ VHW VWDELOLW\ LV WHVWHG WR GHWHUPLQH WHUPLQDWLRQ IRU WKH LWHUDWLYH DOJRULWKP XVHG WR VROYH WKH GDWDIORZ HTXDWLRQV 0\HUV >@ RIIHUV D VROXWLRQ WR WKH FDOOLQJFRQWH[W SUREOHP WKDW LV HVVHQWLDOO\ WKH VDPH DV FDOOVWULQJV $OOHQ >@ SUHVHQWV D GLIIHUHQW PHWKRG IRU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV 7KH PHWKRG DQDO\]HV HDFK SURFHGXUH FRPSOHWHO\ LQ 
UHYHUVH LQYRFDWLRQ RUGHU 7KH ILUVW SURFHGXUHV WR EH DQDO\]HG ZRXOG EH WKRVH WKDW PDNH QR FDOOV WKHQ WKH SURFHGXUHV WKDW RQO\ FDOO WKHVH SURFHGXUHV ZRXOG EH DQDO\]HG DQG VR RQ 2QFH D SURFHGXUH LV DQDO\]HG LWV HIIHFWV FDQ EH LQFRUSRUDWHG LQWR WKRVH SURFHGXUHV WKDW FDOO PAGE 14 LW ZKHQ WKH\ LQ WXUQ DUH DQDO\]HG 7KH REYLRXV GUDZEDFN RI WKLV PHWKRG LV WKDW LW FDQQRW EH XVHG WR DQDO\]H UHFXUVLYH FDOOV 5RVHQ >@ SUHVHQWV D FRPSOH[ PHWKRG IRU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV WKDW LV OLPLWHG WR VROYLQJ WKH SUREOHPV RI YDULDEOH PRGLILFDWLRQ SUHVHUYDWLRQ DQG XVH 7KHVH GDWDIORZ SUREOHPV GR QRW UHTXLUH D VROXWLRQ RI WKH FDOOLQJFRQWH[W SUREOHP &DOODKDQ >@ KDV SURSRVHG WKH SURJUDP VXPPDU\ JUDSK WR VROYH WKH LQWHUSURFHn GXUDO GDWDIORZ SUREOHPV RI NLOO DQG XVH ZKHUH NLOO GHWHUPLQHV DOO GHILQLWH NLOOV WKDW UHVXOW IURP D SURFHGXUH FDOO DQG XVH GHWHUPLQHV DOO YDULDEOHV WKDW PD\ EH XVHG DV D UHVXOW RI D SURFHGXUH FDOO EHIRUH EHLQJ UHGHILQHG $V SDUW RI WKH GHWHUPLQDWLRQ RI HGJHV LQ WKH SURJUDP VXPPDU\ JUDSK LQWUDSURn FHGXUDO UHDFKLQJGHILQLWLRQV DQDO\VLV PXVW EH GRQH IRU HDFK SURFHGXUH 6LPSOLI\LQJ &DOODKDQfV VSDFH FRPSOH[LW\ DQDO\VLV ZH JHW YJDOf DV WKH ZRUVWFDVH VL]H RI WKH SURJUDP VXPPDU\ JUDSK ZKHUH YJD LV WKH QXPEHU RI JOREDO YDULDEOHV LQ WKH SURJUDP SOXV WKH DYHUDJH QXPEHU RI DFWXDO SDUDPHWHUV SHU FDOO DQG LV WKH SURJUDP VL]H 2QH OLPLWDWLRQ RI &DOODKDQfV PHWKRG LV WKDW LW GRHV QRW FRUUHFWO\ KDQGOH PXOWLSOH DOLDVHV WKDW UHVXOW ZKHQ WKH VDPH YDULDEOH LV XVHG PXOWLSOH WLPHV DV DQ DFWXDO SDUDPHWHU LQ WKH VDPH FDOO DQG WKH FRUUHVSRQGLQJ IRUPDO SDUDPHWHUV DUH FDOOE\UHIHUHQFH %\ FRQWUDVW RXU PHWKRG XVLQJ HOHPHQW UHFRGLQJ ZKHUH DOO WKH DOLDVHV DUH HQFRGHG LQ D VLQJOH HOHPHQW ZLOO FRUUHFWO\ KDQGOH WKH PXOWLSOH DOLDVHV SUREOHP &DOODKDQfV PHWKRG RIIHUV QR VROXWLRQ WR WKH FDOOLQJFRQWH[W SUREOHP DQG FRXOG QRW EH XVHG WR GHWHUPLQH IRU H[DPSOH LQWHUSURFHGXUDO UHDFKLQJ GHILQLWLRQV +RZHYHU +DUUROG DQG 6RIID >@ KDYH H[WHQGHG KLV PHWKRG VR WKDW LQWHUSURFHGXUDO UHDFKLQJ GHILQLWLRQV FDQ EH GHWHUPLQHG 
7KH\ XVH DQ LQWHUSURFHGXUDO IORZJUDSK GHQRWHG ,)* WKDW LV YHU\ VLPLODU WR WKH SURJUDP VXPPDU\ JUDSK 7KH ,)* KDV LQWHUUHDFKLQJ HGJHV WKDW DUH GHWHUPLQHG E\ VROYLQJ &DOODKDQfV NLOO SUREOHP 7KH\ UHFRPPHQG XVLQJ KLV PHWKRG VR WKHLU PHWKRG LQKHULWV &DOODKDQfV VSDFH DQG WLPH FRPSOH[LW\ DV ZHOO DV LWV OLPLWDWLRQ ZLWK UHJDUG WR PXOWLSOH DOLDVHV PAGE 15 %HIRUH WKH ,)* FDQ EH XVHG LW PXVW EH GHFRUDWHG ZLWK WKH UHVXOWV RI LQWUDSURn FHGXUDO DQDO\VLV GRQH WZLFH IRU HDFK SURFHGXUH WR GHWHUPLQH ERWK UHDFKLQJ GHILQLWLRQV DQG XSZDUGO\ H[SRVHG XVHV 7KHQ DQ DOJRULWKP LV XVHG WR SURSDJDWH WKH XSZDUGO\ H[SRVHG XVHV WKURXJKRXW WKH ,)* 7KLV DOJRULWKP KDV ZRUVWFDVH WLPH FRPSOH[LW\ RI Qf ZKHUH Q LV WKH QXPEHU RI QRGHV LQ WKH ,)* 7KHLU JUDSK ZLOO KDYH WKH VDPH QXPEHU RI QRGHV DV IRU &DOODKDQfV JUDSK PHDQLQJ ZRUVWFDVH JUDSK VL]H ZLOO EH YJDOf 6XEVWLWXWLQJ YJDO IRU Q ZH JHW D ZRUVWFDVH WLPH FRPSOH[LW\ RI Y-f $V WKH VL]H RI RXU IORZJUDSK LV SURSRUWLRQDO WR WKH VL]H RI WKH SURJUDP WKH ZRUVWFDVH WLPH FRPSOH[LW\ IRU VROYLQJ RXU HTXDWLRQV LV RQO\ Of :HLVHU >@ ZDV WKH ILUVW WR SURSRVH DQ LQWHUSURFHGXUDO VOLFLQJ PHWKRG WKDW LJQRUHV WKH H[HFXWLRQSDWK SUREOHP DQG WKHUHE\ VXIIHUV IURP WKH UHVXOWLQJ ORVV RI SUHFLVLRQ +RUZLW] HW DO >@ KDYH SUHVHQWHG D PHWKRG WR FRPSXWH WKH PRUH SUHFLVH VOLFH H[SODLQHG LQ WKH ,QWURGXFWLRQ +RZHYHU WKH\ XVH D PRUH UHVWULFWHG GHILQLWLRQ RI D VOLFH 7KHLU VOLFH LV DOO VWDWHPHQWV DQG SUHGLFDWHV WKDW PD\ DIIHFW D YDULDEOH Y DW SURJUDP SRLQW S VXFK WKDW Y LV GHILQHG RU XVHG DW SRLQW S 7KHLU PHWKRG FRQVLVWV RI FRQVWUXFWLQJ D VSHFLDOL]HG JUDSK FDOOHG D V\VWHP GHSHQGHQFH JUDSK 1RGHV LQ WKLV JUDSK UHSUHVHQW SURJUDP SLHFHV VXFK DV VWDWHPHQWV DQG WKH HGJHV LQ WKH JUDSK UHSUHVHQW FRQWURO RU GDWD GHSHQGHQFLHV (GJHV UHSUHVHQWLQJ WUDQVLWLYH GDWD GHSHQGHQFLHV WKDW DUH GXH WR SURFHGXUH FDOOV DUH FRPSXWHG E\ ILUVW PRGHOLQJ HDFK SURFHGXUH DQG LWV FDOOV ZLWK DQ DWWULEXWH JUDPPDU FDOOHG D OLQNDJH JUDPPDU DQG WKHQ VROYLQJ WKH JUDPPDU VR DV WR GHWHUPLQH WKH WUDQVLWLYH GDWD GHSHQGHQFLHV UHSUHVHQWHG 
E\ LW 2QFH WKH V\VWHP GHSHQGHQFH JUDSK LV FRPSOHWH DQ\ VOLFH EDVHG RQ DQ DFWXDO GHILQLWLRQ RU XVH RFFXUULQJ DW DQ\ SRLQW S LQ WKH SURJUDP FDQ EH H[WUDFWHG IURP WKH JUDSK $ PDMRU ZHDNQHVV RI WKHLU PHWKRG LV WKDW LW GRHV QRW DOORZ D K\SRWKHWLFDO XVH WR EH WKH VWDUWLQJ SRLQW RI WKH VOLFH 7KH FRPSOH[LW\ RI FRQVWUXFWLQJ WKH V\VWHP GHSHQGHQFH JUDSK LV JLYHQ DV f ; Â‘ 'f ZKHUH LV WKH WRWDO QXPEHU RI SURFHGXUHV DQG FDOOV LQ WKH SURJUDP ; LV WKH PAGE 16 WRWDO QXPEHU RI JOREDO YDULDEOHV LQ WKH SURJUDP SOXV D WHUP WKDW FDQ EH FRQVLGHUHG D FRQVWDQW DQG LV D OLQHDU IXQFWLRQ RI ; 2QFH WKH V\VWHP GHSHQGHQFH JUDSK LV FRPSOHWH DQ\ SDUWLFXODU VOLFH WKDW LV ZDQWHG FDQ EH H[WUDFWHG IURP WKH JUDSK DW FRPSOH[LW\ Qf ZKHUH Q LV WKH VL]H RI WKH JUDSK 7KH VL]H RI WKH JUDSK LV URXJKO\ TXDGUDWLF ZLWK SURJUDP VL]H EHLQJ ERXQGHG E\ 3 f 9 I (f 7 f ;f ZKHUH 3 LV WKH QXPEHU RI SURFHGXUHV 9 LV WKH ODUJHVW QXPEHU RI SUHGLFDWHV DQG GHILQLWLRQV LQ D VLQJOH SURFHGXUH ( LV WKH ODUJHVW QXPEHU RI HGJHV LQ D SURFHGXUH GHSHQGHQFH JUDSK 7 LV WKH QXPEHU RI FDOOV LQ WKH SURJUDP DQG ; LV WKH QXPEHU RI JOREDO YDULDEOHV ,Q WKHLU SDSHU PXFK LV PDGH RI WKH IDFW WKDW RQFH WKH JUDSK LV FRPSOHWH DQ\ VOLFH RQ DQ DFWXDO GHILQLWLRQ RU XVH FDQ EH H[WUDFWHG IURP WKH JUDSK DW Qf FRVW ZKHUH Q LV WKH VL]H RI WKH JUDSK +RZHYHU WKH QXPEHU RI DFWXDO GHILQLWLRQ DQG XVH RFFXUUHQFHV LQ D SURJUDP LV SURSRUWLRQDO WR WKH SURJUDP VL]H / 7KHUHIRUH DQ\ PHWKRG WKDW FDQ FRPSXWH D VOLFH DW FRVW 2=f IRU VRPH = FDQ JHQHUDWH DOO WKH VOLFHV FRQWDLQHG LQ WKHLU JUDSK DW FRVW = f /f VSRRO WKH VOLFHV WR GLVN DQG UHFRYHU WKHP DW FRVW f $OWKRXJK WKHUH DUH PDQ\ SDSHUV RQ VOLFLQJ LW VHHPV WKDW RQO\ +RUZLW] HW DO >@ GLVFXVV FOHDUO\ WKH SUREOHP RI WKH PRUH SUHFLVH LQWHUSURFHGXUDO VOLFH DQG SUHVHQW D PHWKRG WR FRPSXWH LW DV ZHOO DV SURYLGLQJ FRPSOH[LW\ DQDO\VLV 2XU UHVHDUFK RQ VOLFLQJ LV RQO\ FRQFHUQHG ZLWK FRPSXWLQJ WKH PRUH SUHFLVH VOLFH VR +RUZLW] HW DO LV WKH SULQFLSDO UHIHUHQFH =LPD DQG &KDSPDQ >@ LV WKH SULQFLSDO UHIHUHQFH XVHG WR VWXG\ WKH LVVXHV 
DQG PHWKRGV RI SDUDOOHOL]DWLRQ 7KHLU ERRN GLVWLOOV WKH ZRUN IRXQG LQ VFRUHV RI SDSHUV DQG GLVVHUWDWLRQV DQG LV DQ H[FHOOHQW VXUYH\ RI SDUDOOHOL]DWLRQ ,QWHUSURFHGXUDO SDUn DOOHOL]DWLRQ LV VSHFLILFDOO\ FRQVLGHUHG E\ %XUNH DQG &\WURQ >@ DQG E\ 7ULROHW HW DO >@ PAGE 17 2XWOLQH LQ %ULHI 7KLV LQWURGXFWRU\ FKDSWHU HQGV ZLWK D EULHI V\QRSVLV RI WKH UHPDLQLQJ FKDSWHUV &KDSWHU SUHVHQWV LQ GHWDLO RXU LQWHUSURFHGXUDO GDWDIORZ DQDO\VLV PHWKRG 7KH FKDSn WHU HQGV ZLWK D EULHI GHVFULSWLRQ RI WKH SURWRW\SHV WKDW ZHUH EXLOW WR GHPRQVWUDWH WKH PHWKRG DORQJ ZLWK VRPH RI WKH H[SHULPHQWDO UHVXOWV REWDLQHG IURP WKHVH SURWRW\SHV &KDSWHU EHJLQV ZLWK D UHSUHVHQWDWLRQ VFKHPH IRU FRQWLQXDWLRQ SDWKV IRU WKH LQWHUn SURFHGXUDO ORJLFDO ULSSOH HIIHFW SUREOHP DQG WKHQ SUHVHQWV RXU LQWHUSURFHGXUDO ORJLFDO ULSSOH HIIHFW DOJRULWKP $ SURWRW\SH WKDW ZDV EXLOW WR GHPRQVWUDWH WKLV DOJRULWKP LV EULHIO\ GHVFULEHG DQG H[SHULPHQWDO UHVXOWV DUH SUHVHQWHG $Q LQYHUVLRQ RI WKH ORJLFDO ULSSOH HIIHFW DOJRULWKP LV WKHQ SUHVHQWHG DV D VROXWLRQ WR WKH LQWHUSURFHGXUDO VOLFLQJ SUREOHP &KDSWHU EHJLQV ZLWK DQ H[SODQDWLRQ RI ORRSFDUULHG GDWD GHSHQGHQFH DQG LWV UHOHYDQFH WR SDUDOOHOL]DWLRQ DQG FRQFOXGHV ZLWK DQ DOJRULWKP WKDW LGHQWLILHV ORRSV WKDW FDQ EH SDUDOOHOL]HG LQFOXGLQJ ORRSV WKDW FRQWDLQ FDOOV &KDSWHU VXPPDUL]HV WKH PDMRU UHVXOWV RI WKH GLVVHUWDWLRQ DQG VXJJHVWV GLUHFWLRQV IRU IXWXUH UHVHDUFK PAGE 18 &+$37(5 7+( ,17(5352&('85$/ '$7$)/2: $1$/<6,6 0(7+2' &RQVWUXFWLQJ WKH )ORZJUDSK 7KLV VHFWLRQ GLVFXVVHV WKH IORZJUDSK DQG LWV UHODWLRQVKLS WR GDWDIORZ HTXDWLRQV $IWHU WKH GLVFXVVLRQ UXOHV DUH JLYHQ IRU FRQVWUXFWLQJ WKH VSHFLILF IORZJUDSK UHTXLUHG E\ RXU LQWHUSURFHGXUDO DQDO\VLV PHWKRG 1RWH WKDW WKH UHTXLUHG IORZJUDSK LV FRQn YHQWLRQDO DQG WKH UXOHV WR EH JLYHQ UHODWH RQO\ WR WKH UHSUHVHQWDWLRQ RI FDOOV DQG SURFHGXUHV LQ WKH IORZJUDSK $ IORZJUDSK LV D GLUHFWHG JUDSK WKDW UHSUHVHQWV WKH SRVVLEOH IORZ SDWKV RI D SURJUDP 7KH QRGHV RI D IORZJUDSK FRUUHVSRQG WR EDVLF EORFNV LQ WKH SURJUDP $ EDVLF EORFN LV D VHTXHQFH RI SURJUDP 
FRGH WKDW LV DOZD\V H[HFXWHG WRJHWKHU LQ WKH VDPH RUGHU 7KH GLUHFWHG HGJHV RI D IORZJUDSK UHSUHVHQW SRVVLEOH WUDQVIHUV RI FRQWURO )LJXUHV DQG HDFK UHSUHVHQW D IORZJUDSK 'DWDIORZ SUREOHPV DUH RIWHQ IRUPXODWHG DV D VHW RI HTXDWLRQV WKDW UHODWH WKH IRXU VHWV ,1 287 *(1 DQG .,// WKDW DUH DVVRFLDWHG ZLWK HDFK QRGH LQ WKH IORZ JUDSK )RU DQ\ QRGH DQG LWV EORFN WKH *(1 VHW UHSUHVHQWV WKH HOHPHQWV JHQHUDWHG E\ WKDW EORFN 7KH .,// VHW UHSUHVHQWV WKRVH HOHPHQWV WKDW FDQQRW IORZ WKURXJK WKH EORFN EHFDXVH WKH\ ZRXOG EH NLOOHG E\ WKH EORFN 7KH ,1 VHW UHSUHVHQWV WKH YDOLG HOHPHQWV DW WKH VWDUW RI WKH EORFN DQG WKH 287 VHW UHSUHVHQWV WKH YDOLG HOHPHQWV DW WKH HQG RI WKH EORFN 'DWDIORZ SUREOHPV DUH W\SLFDOO\ HLWKHU IRUZDUGIORZ RU EDFNZDUGIORZ )RU IRUZDUGIORZ WKH ,1 VHW RI D QRGH LV FRPSXWHG DV WKH FRQIOXHQFH RI WKH 287 VHWV RI WKH SUHGHFHVVRU QRGHV DQG WKH 287 VHW LV D IXQFWLRQ RI WKH QRGHfV ,1 *(1 PAGE 19 DQG .,// VHWV )RU EDFNZDUGIORZ WKH 287 VHW RI D QRGH LV FRPSXWHG DV WKH FRQn IOXHQFH RI WKH ,1 VHWV RI WKH VXFFHVVRU QRGHV DQG WKH ,1 VHW LV D IXQFWLRQ RI WKH QRGHfV 287 *(1 DQG .,// VHWV 7KH SUHGHFHVVRUV RI DQ\ QRGH Q DUH WKRVH QRGHV WKDW KDYH DQ RXWHGJH GLUHFWHG WR QRGH Q 7KH VXFFHVVRUV RI QRGH Q DUH WKRVH QRGHV WKDW KDYH DQ LQHGJH GLUHFWHG IURP QRGH Q 7KH FRQIOXHQFH RSHUDWRU ZLOO DOPRVW LQn YDULDEO\ EH HLWKHU VHW XQLRQ RU VHW LQWHUVHFWLRQ GHSHQGLQJ RQ WKH SUREOHP 7KXV D GDWDIORZ SUREOHP PD\ EH FODVVLILHG DV EHLQJ HLWKHU IRUZDUGIORZRU IRUZDUGIORZDQG EDFNZDUGIORZRU RU EDFNZDUGIORZDQG ZKHUH fRUf UHIHUV WR VHW XQLRQ DQG fDQGf UHIHUV WR VHW LQWHUVHFWLRQ 2QFH WKH GDWDIORZ HTXDWLRQV KDYH EHHQ GHILQHG IRU D SDUWLFXODU SUREOHP DQG WKH UXOHV HVWDEOLVKHG IRU FUHDWLQJ WKH *(1 DQG .,// VHWV WKH HTXDWLRQV FDQ WKHQ EH VROYHG IRU D VSHFLILF SURJUDP RU SURFHGXUH DQG LWV UHSUHVHQWDWLYH IORZJUDSK 7R VROYH WKH HTXDWLRQV WKH LWHUDWLYH DOJRULWKP FDQ EH XVHG 7KH LWHUDWLYH DOJRULWKP KDV WKH DGYDQWDJH WKDW LW ZLOO ZRUN IRU DQ\ IORZJUDSK 7KH LWHUDWLYH DOJRULWKP UHSHDWHGO\ FRPSXWHV WKH ,1 DQG 287 VHWV 
IRU DOO QRGHV XQWLO DOO VHWV KDYH VWDELOL]HG DQG FHDVHG WR FKDQJH 5HFRPSXWDWLRQ RI D QRGH LV QHFHVVDU\ ZKHQHYHU DQ RXWVLGH VHW WKDW LW GHSHQGV RQ FKDQJHV )RU IRUZDUGIORZ SUREOHPV D QRGH PXVW EH UHFRPSXWHG LI WKH 287 VHW RI D SUHGHFHVVRU QRGH FKDQJHV )RU EDFNZDUGIORZ SUREOHPV D QRGH PXVW EH UHFRPSXWHG LI WKH ,1 VHW RI D VXFFHVVRU QRGH FKDQJHV 7\SLFDOO\ DQ HYDOXDWLRQ VWUDWHJ\ ZLOO GHWHUPLQH WKH DFWXDO RUGHU LQ ZKLFK QRGHV DUH UHFRPSXWHG 7KH IORZJUDSK UHTXLUHG E\ RXU LQWHUSURFHGXUDO DQDO\VLV PHWKRG LV FRQYHQWLRQDO ZLWK VSHFLDO QRGHV DQG HGJHV DV IROORZV )RU HDFK SURFHGXUH LQ WKH SURJUDP DVVLJQ DQ HQWU\ QRGH DQG DQ H[LW QRGH 7KHVH QRGHV KDYH QR DVVRFLDWHG EORFNV RI SURJUDP FRGH 7KH HQWU\ QRGH KDV D VLQJOH RXWHGJH DQG DV PDQ\ LQHGJHV DV WKHUH DUH FDOOV WR WKDW SURFHGXUH LQ WKH SURJUDP 7KH H[LW QRGH KDV DV PDQ\ LQHGJHV DV WKHUH DUH PAGE 20 QRGHV IRU WKDW SURFHGXUH ZKRVH EORFNV WHUPLQDWH ZLWK D UHWXUQ DFWLRQ 7KH H[LW QRGH KDV DV PDQ\ RXWHGJHV DV WKHUH DUH FDOOV WR WKDW SURFHGXUH LQ WKH SURJUDP )RU HYHU\ LQHGJH RI WKH HQWU\ QRGH WKHUH LV D FRUUHVSRQGLQJ RXWHGJH RI WKH H[LW QRGH )RU WKH SXUSRVH RI FRQVWUXFWLQJ WKH IORZJUDSK FDOOV PXVW EH FODVVLILHG DV HLWKHU NQRZQ RU XQNQRZQ $ NQRZQ FDOO LV ZKHUH WKH IORZJUDSK IRU WKH FDOOHG SURFHGXUH ZLOO EH D SDUW RI WKH WRWDO IORZJUDSK EHLQJ FRQVWUXFWHG $Q XQNQRZQ FDOO LV ZKHUH WKH IORZJUDSK RI WKH FDOOHG SURFHGXUH ZLOO QRW EH D SDUW RI WKH WRWDO IORZJUDSK EHLQJ FRQVWUXFWHG 8QNQRZQ FDOOV DUH FRPPRQ DQG ZLOO RFFXU IRU WZR UHDVRQV )LUVW WKH FDOOHG SURFHGXUH PD\ EH D FRPSLOHUOLEUDU\ SURFHGXUH IRU ZKLFK VRXUFH FRGH LV QRW DYDLODEOH 6HFRQG WKH FDOOHG SURFHGXUH PD\ EH D VHSDUDWHO\ FRPSLOHG XVHU SURFHGXUH IRU ZKLFK WKH VRXUFH FRGH LV QRW DYDLODEOH )RU DQ\ XQNQRZQ FDOO PDGH ZLWKLQ WKH SURJUDP LI VXPPDU\ LQIRUPDWLRQ RI LWV LQWHUSURFHGXUDO HIIHFWV LV QRW DYDLODEOH WKHQ FRQVHUYDWLYH DVVXPSWLRQV DERXW LWV HIIHFWV ZLOO KDYH WR EH PDGH 7KH DFWXDO VXPPDU\ LQIRUPDWLRQ QHHGHG DQG WKH DVVXPSWLRQV PDGH LQ LWV DEVHQFH ZLOO GHSHQG RQ WKH SDUWLFXODU GDWDIORZ 
SUREOHP 7KH VXPPDU\ LQIRUPDWLRQ LI SUHVHQW ZRXOG EH XVHG ZKHQ FRQVWUXFWLQJ WKH *(1 DQG .,// VHWV IRU DQ\ QRGH ZKRVH EORFN FRQWDLQV DQ XQNQRZQ FDOO )RU DQ\ NQRZQ FDOO PDGH ZLWKLQ WKH SURJUDP WKHUH ZLOO EH WZR QRGHV LQ WKH IORZJUDSK IRU WKDW FDOO 2QH QRGH LV WKH FDOO QRGH 7KH FDOO QRGH UHSUHVHQWV D EDVLF EORFN WKDW HQGV ZLWK WKH NQRZQ FDOO 7KH RWKHU QRGH LV WKH UHWXUQ QRGH 7KH UHWXUQ QRGH KDV DQ HPSW\ DVVRFLDWHG EORFN 7KH FDOO QRGH ZLOO KDYH WZR RXWHGJHV 2QH HGJH ZLOO EH GLUHFWHG WR WKH HQWU\ QRGH RI WKH FDOOHG SURFHGXUH 7KH RWKHU RXWHGJH ZLOO EH GLUHFWHG WR WKH UHWXUQ QRGH IRU WKDW FDOO 7KH UHWXUQ QRGH ZLOO KDYH WZR LQHGJHV 2QH HGJH LV WKH GLUHFWHG HGJH IURP WKH FDOO QRGH 7KH RWKHU LQHGJH LV GLUHFWHG IURP WKH FDOOHG SURFHGXUHfV H[LW QRGH PAGE 21 ,Q DOO HDFK NQRZQ FDOO UHVXOWV LQ WZR QRGHV DQG WKUHH GLVWLQFW HGJHV 2QH HGJH FRQQHFWV WKH FDOO QRGH WR LWV UHWXUQ QRGH $ VHFRQG HGJH FRQQHFWV WKH FDOO QRGH WR WKH FDOOHG SURFHGXUHfV HQWU\ QRGH $ WKLUG HGJH FRQQHFWV WKH FDOOHG SURFHGXUHfV H[LW QRGH WR WKH UHWXUQ QRGH ,Q FRQVWUXFWLQJ WKH IORZJUDSK D VSHFLDO SUREOHP DULVHV LI WKH SURJUDPPLQJ ODQn JXDJH DOORZV SURFHGXUHYDOXHG YDULDEOHV VXFK DV WKH IXQFWLRQ SRLQWHUV RI & WKDW ZKHQ GHUHIHUHQFHG UHVXOW LQ D FDOO RI WKH IXQFWLRQ WKDW LV SRLQWHG DW 7KH SUREOHP LV WR LGHQWLI\ ZKDW DUH WKH SRVVLEOH SURFHGXUH YDOXHV ZKHQ WKH SURFHGXUHYDOXHG YDULDEOH LQYRNHV D FDOO $VVXPLQJ WKLV LQIRUPDWLRQ LV DYDLODEOH IURP D VHSDUDWH DQDO\VLV WKH IORZJUDSK FDQ EH FRQVWUXFWHG DFFRUGLQJO\ )RU H[DPSOH LI WKH SURFHGXUHYDOXHG YDULn DEOH FDQ KDYH WKUHH GLIIHUHQW YDOXHV ZKHQ WKH FDOO LQ TXHVWLRQ LV LQYRNHG DQG HDFK YDOXH LV D SURFHGXUH ZKRVH IORZJUDSK ZLOO EH SDUW RI WKH WRWDO IORZJUDSK WKHQ WKUHH NQRZQ FDOOV ZRXOG EH FRQVWUXFWHG LQ SDUDOOHO ZLWK D FRPPRQ SUHGHFHVVRU QRGH IRU WKH WKUHH FDOO QRGHV DQG D FRPPRQ VXFFHVVRU QRGH IRU WKH WKUHH UHWXUQ QRGHV $ SURFHGXUHYDOXHG YDULDEOH LV LQ HVVHQFH D SRLQWHU 1RWH WKDW WKH SUREOHP RI GHWHUPLQLQJ ZKDW D SRLQWHU LV RU PD\ EH SRLQWLQJ DW ZKHQ WKDW SRLQWHU LV 
dereferenced can itself be formulated as a dataflow problem, and in particular as a forward-flow-or dataflow problem. If necessary, an initial version of the flowgraph could be constructed that treats all calls invoked by procedure-valued variables as unknown calls. The dataflow problem of determining the possible pointer values wherever a pointer is dereferenced would then be solved, and the flowgraph would be amended using the pointer-value information.

Dataflow analysis makes a simplifying, conservative assumption about the correspondence between paths in the flowgraph and possible execution paths in the program. Let a path be a sequence of flowgraph nodes such that, in the sequence, node $n$ follows node $m$ only if $n$ is a successor of $m$ in the flowgraph. For intraprocedural analysis, the assumption made is that any path in the flowgraph is a possible execution path. That this assumption may not be true for a particular program should be obvious. However, the problem of determining the possible execution paths for an arbitrary program is known to be undecidable. The simplifying assumption that we use for interprocedural analysis is the same as that used for intraprocedural analysis, with an added proviso. For any path that is a possible execution path, any subsequence of return nodes must, if present, inversely match the immediately preceding subsequence of call nodes. A return node matches a call node if and only if the return node is the call node's successor in the flowgraph.

2.2 Interprocedural Forward-Flow-Or Analysis

This section begins with our basic approach to solving the calling-context problem. The dataflow equations for forward-flow-or analysis are then given, and their correctness is shown. As a part of our interprocedural analysis method, the technique of element recoding is presented as a way to deal with the aliases that result from call-by-reference formal parameters. For some dataflow problems, implicit definitions due to calls require explicit
treatment, and this is discussed last.

If certain problems, such as reaching definitions, are to be solved for a program by flow-sensitive interprocedural analysis, then the calling context of each procedure call must be preserved. In general, preserving calling context means that the dataflow effects of an individual call should include those effects that survive the call and were introduced into the called procedure by the call itself. They should not include the effects introduced into the called procedure by all the other calls to it that may exist elsewhere in the program. We refer to the need to preserve calling context as the calling-context problem.

Our solution to the calling-context problem, and the essential difference between our dataflow equations and conventional dataflow equations, is to divide every IN set and every OUT set into two sets, called an entry set and a body set. The reason for having two sets is that the calling-context effects that enter a procedure from the different calls can be collected and isolated in the separate entry set. This entry set can then have effects in it killed by statements in the body of the procedure, but no additions are made to this entry set by body statements. Instead, any additions of effects due to body statements are made to the separate body set. This body set will also have effects killed in the normal manner, as for the entry set. Because the body set is kept free of calling-context effects, it is empty at the entry node. By contrast, the entry set is at its largest at the entry node. It will either stay the same size as it progresses through the procedure's body nodes or become smaller because of kills. Intersecting the calling context at a call node with the entry set at the exit node of the called procedure yields the subset of the calling context that has reached the exit node and therefore will reach the return node for that call. By "reach" we mean that there exists a path in the flowgraph along which the element is not
killed or blocked.

2.2.1 The Dataflow Equations

The dataflow equations that define the entry and body sets at every node are now given. The equations are divided into three groups. The first group computes the sets for entry nodes. The second group computes the sets for return nodes. The third group computes the sets for all other nodes. In the equations, $B$ denotes a body set and $E$ denotes an entry set. Two conditions, $C_1$ and $C_2$, appear in the equations. $C_1$ means that $x$ will cross the interprocedural boundary from call node $p$ into the called procedure. $C_2$ means that $x$ can cross the interprocedural boundary from exit node $q$ into return node $n$. $\neg C$ means not $C$. For each node $n$, $\mathrm{pred}(n)$ means the set of predecessors of $n$. The RECODE set used in Group I is explained in Section 2.2.2. The GEN set used in Group I and the GEN and KILL sets used in Group II are explained in Section 2.2.3.

For any node $n$:

$$\mathrm{IN}[n] = E_{in}[n] \cup B_{in}[n], \qquad \mathrm{OUT}[n] = E_{out}[n] \cup B_{out}[n]$$

Group I ($n$ is an entry node):

$$B_{in}[n] = \emptyset$$
$$E_{in}[n] = \bigcup_{p \in \mathrm{pred}(n)} \{\, x \mid x \in \mathrm{OUT}[p] \wedge C_1 \,\}$$
$$B_{out}[n] = \mathrm{GEN}[n]$$
$$E_{out}[n] = E_{in}[n] \cup \mathrm{RECODE}[n]$$

Group II ($n$ is a return node, $p$ is the associated call node, and $q$ is the exit node of the called procedure):

$$B_{in}[n] = \{\, x \mid (x \in B_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2) \,\}$$
$$E_{in}[n] = \{\, x \mid x \in E_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q])) \,\}$$
$$B_{out}[n] = (B_{in}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]$$
$$E_{out}[n] = E_{in}[n] - \mathrm{KILL}[n]$$

Group III ($n$ is not an entry or return node):

$$B_{in}[n] = \bigcup_{p \in \mathrm{pred}(n)} B_{out}[p]$$
$$E_{in}[n] = \bigcup_{p \in \mathrm{pred}(n)} E_{out}[p]$$
$$B_{out}[n] = (B_{in}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]$$
$$E_{out}[n] = E_{in}[n] - \mathrm{KILL}[n]$$

The equations assume that the GEN and KILL sets for each call node will include only those effects for that call that occur prior to the entry of the called procedure. This requirement is necessary because the OUT set of the call node is used by the entry-node equation that constructs the entry set of the called procedure. Referring to conditions $C_1$
and $C_2$, the rules for deciding whether an effect crosses a particular interprocedural boundary will depend on two primary factors, namely the dataflow problem and the programming language. For example, take the reaching-definitions problem and a language such as FORTRAN. Any definition of a global variable will cross, and so will any definition of a variable that is used as an actual parameter whose corresponding formal parameter is call-by-reference. As a rule, an effect that crosses into a procedure because it might be killed will also cross back to the return node if it reaches the exit node of the called procedure.

The table "Solution of forward-flow-or equations" shows the result of solving the equations for the flowgraph of the reaching-definitions example figure. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is reaching definitions, and variable $w$ is local while variables $x$, $y$, and $z$ are global. Reaching definitions is the problem of finding all definitions of a variable that reach a particular use of that variable, for all variables and uses in the program. In the figure, two nodes are entry nodes, two are exit nodes, two are call nodes, and two are return nodes. Alongside each node is its basic block. Each defined variable is superscripted with an identifier that is the set element used in the table to represent that definition.

Figure: A reaching-definitions example.

Table: Solution of forward-flow-or equations for the reaching-definitions example (columns: Node, $E_{in}$, $E_{out}$, $B_{in}$, $B_{out}$).

The correctness of the equations can be seen from the following observations. For a procedure, the entry-node entry set is constructed as the union of all calling-context effects that can enter the procedure from its calls. Within the procedure body, effects in the entry set can be killed but not added to. For effects in the entry set that reach a call at a call node, those effects that
survive the call are recovered in the entry set constructed by the $E_{in}[n]$ equation for the successor return node $n$. To see that this is true, observe the following. Suppose an entry-set effect that reaches the call cannot enter the called procedure. Then it cannot be killed within the called procedure, so the effect should be added to the return-node entry set without further conditions. This is done by the selection criterion $(x \in E_{out}[p] \wedge \neg C_1)$ in the equation for the return node. Suppose, on the other hand, that an entry-set effect reaches the call and does enter the called procedure, and therefore may be killed by it. Then this effect should be added to the return-node entry set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in E_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $E_{in}[n]$ equation for the return node. From the equations for the entry set, we see that, for any procedure $z$, the entry set at $z$'s exit node will, as the equations are solved, eventually contain all calling-context effects that entered $z$ and reached its exit node. This characteristic of the exit-node entry set is the requirement placed upon it when it is used in the $E_{in}[n]$
equation for the return node, so this requirement is satisfied, and the entry-set equations are correct.

For any procedure, the $B_{in}$ set is always empty at the entry node, so the $B$ set is free of calling-context effects. Within the procedure body, GEN and KILL sets are used to update the body set as it propagates along the various nodes. For effects in the body set that reach a call at a call node, those effects that survive the call are recovered in the body set constructed by the $B_{in}[n]$ equation for the successor return node $n$. Suppose a body-set effect that reaches the call cannot enter the called procedure. Then it cannot be killed within the called procedure, so it should be added to the return-node body set without further conditions. This is done by the selection criterion $(x \in B_{out}[p] \wedge \neg C_1)$ in the $B_{in}[n]$ equation for the return node. Suppose, on the other hand, that a body-set effect reaches the call and will enter the called procedure, and therefore may be killed by it. Then this effect should be added to the return-node body set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in B_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $B_{in}[n]$ equation for the return node. In addition, all crossable effects that result from the call and that are independent of calling context should also be added to the return-node body set. This is done by the selection criterion $(x \in B_{out}[q] \wedge C_2)$ in the $B_{in}[n]$ equation for the return node. From the equations for the body set, we see that, for any procedure $z$, the body set at $z$'s exit node is free of calling-context effects. As the equations are solved, it will eventually contain all body effects that reached the exit node, including those body effects resulting from calls made within $z$. This characteristic of the exit-node body set is the requirement placed upon it when it is used in the $B_{in}[n]$ equation for the return node, so this requirement is satisfied. The other requirement of this
return-node equation is that the exit-node entry set contains all calling-context effects for the procedure that reach the exit node. This requirement has already been shown to be satisfied, so we conclude that the body-set equations are correct.

2.2.2 Element Recoding for Aliases

The RECODE set for the entry node has its elements added to the $E_{out}$ set for that node. The idea of the RECODE set is as follows. Certain elements in the OUT set of a predecessor call node should be carried over into the entry set of the called procedure as calling-context effects, irrespective of their ability to cross the interprocedural boundary when parameters are ignored. They are carried over because of an alias relationship, established by the call, between an actual parameter and a formal call-by-reference parameter. Any element that enters a procedure because of such an alias relationship between parameters should be recoded to reflect this alias relationship. A recoded element represents both the base element, which is the element as it would be if there were no alias relationship, and the nonempty alias relationship.

Element recoding has two purposes. First, it allows the recoded element within the called procedure to be killed correctly through its alias relationship. Second, it allows the recoded element within the called procedure to be correctly associated with specific references to those aliases that are in the alias relationship. Element recoding never involves a change of the base element, but only a change of the associated alias relationship. That relationship is the set of formal parameters to which the base element is, in effect, aliased. Because element recoding in effect generates a new element, recoded elements are kept in the separate RECODE set.

The figure "Element-recoding algorithm for forward-flow-or dataflow problems" presents an algorithm for generating the entry-node input sets ($E_{in}$ and RECODE) for a forward-flow-or dataflow problem. It assumes the language model in which the visibility of each formal parameter is limited to the single procedure that declares it. For each element in the $\mathrm{OUT}[c]$ set,
the algorithm generates at most one element for inclusion in the entry-node input sets. The algorithm is unambiguous except for its "can be affected by" test, which is a generalization. The details of this test will depend on the specific dataflow problem being solved. For example, if the dataflow problem is reaching definitions, then each base element $w$ represents a specific definition of some variable $z$. Suppose the actual parameter $p$ being tested by the algorithm is the variable $z$, and the corresponding formal parameter is call-by-reference. Then the definition that $w$ represents can be used or killed through that formal parameter, so $w$ can be affected by that actual parameter $z$, and the "affected by" test is satisfied. The $p \in OA$ test covers the situation where an actual parameter $p$ that is aliased to the formal parameter $f$ is itself a formal parameter that is effectively aliased to $w$. In this case, $f$ is established as a new effective alias for $w$ by transitivity of the alias relationship.

Referring to the algorithm, there is no carry-over of the old alias relationship into the new alias relationship. The old alias relationship is represented by the $OA$ set, and the new alias relationship is represented by the $NA$ set. That this absence of carry-over is correct follows from the assumed language model. The aliases of element recoding are formal parameters, and the model states that each formal parameter is visible in only one procedure. This means there is no need to carry the old alias relationship into a different procedure, because the aliases cannot be referenced outside the single procedure in which the old alias relationship is active. Note that recursive calls are no exception to this no-carry-over rule, because a recursive call will cancel any alias relationship established for a base element by any prior call of the procedure.

In general, crossing elements are recoded when $NA \neq \emptyset$ and unrecoded when $NA = \emptyset$ and $OA \neq \emptyset$. This places an added burden on the
return-node equations to recognize an element that should be recovered from the exit-node entry set, necessitating, in effect, additional rules to cover this possibility. After an element is recovered, it would also be necessary to restore the alias relationship, if any, that it had prior to the call.

Figure: Element-recoding algorithm for forward-flow-or dataflow problems.

    -- e is an entry node
    -- this algorithm constructs the E_in[e] and RECODE[e] sets
    begin
      E_in[e] ← ∅
      RECODE[e] ← ∅
      for each predecessor call node c of entry node e
        for each element x ∈ OUT[c]
          let w be the base element of x
          let OA be the set of aliases, if any, associated with w forming x
          let NA be the set of new aliases
          NA ← ∅
          for each actual parameter p at call node c that is aliased to a call-by-reference formal parameter f
            if (w can be affected by p) ∨ (p ∈ OA)
              NA ← NA ∪ {f}
            fi
          end for
          if NA ≠ ∅
            RECODE[e] ← RECODE[e] ∪ {(w, NA)}
          else if w can cross the interprocedural boundary
            E_in[e] ← E_in[e] ∪ {w}
          fi
        end for
      end for
    end

This recognition and restoration problem is perhaps most easily solved by associating with each call node two additional sets, one for body-set elements and another for entry-set elements, where each set consists of ordered pairs. These sets would be determined whenever the entry-node entry set of the called procedure is computed. The first element of each ordered pair is a crossing element $x$ as it exists in the $B_{out}$ or $E_{out}$ set at the call node. The second element is element $y$, the element effectively generated from element $x$ by the element-recoding algorithm, either in its update of RECODE[e] or in its update of $E_{in}[e]$. If all crossing elements for the call are included in these additional sets, then the return-node equations can use these sets instead of the $B_{out}[p]$ and $E_{out}[p]$ sets to recognize elements to be recovered from the exit-node entry set. Recognition and restoration would be done by trying to match the exit-node entry-set element against the second element of an ordered pair from the appropriate additional set at the
call node and then, if there is a match, restoring the original element by using the first element of the matched pair. For example, if $x$ is a crossing element in the $B_{out}$ set of a call node and $y$ is the generated element, then $(x, y)$ would be an ordered pair in the additional set for body-set elements. When the $B_{in}$ set for the return node is computed, if $y$ is in the exit-node entry set, then it will match the ordered pair $(x, y)$, and element $x$ will be added to the $B_{in}$ set.

As an example of why element recoding is necessary, consider the following. Suppose there are two different calls to the same procedure, and different definitions of global variable $g$ reach each call. At one of the calls, $g$ is also used as an actual parameter, and the corresponding formal parameter is call-by-reference. The problem now is what to kill from the entry set whenever that formal parameter is defined in the called procedure. Suppose the individual elements representing the different definitions of $g$ do not somehow identify how they are related to this formal parameter. Then the only choice is to kill all of them or none of them, and neither choice is correct in this case. The only definitions of $g$ that should be killed are those that entered the procedure from the call where $g$ is aliased to the call-by-reference formal parameter.

2.2.3 Implicit Definitions Due to Calls

A call with parameters typically has implicit definitions associated with it. For example, if a formal parameter is call-by-reference, then each actual parameter aliased to that formal parameter is implicitly defined at each definition of the formal parameter. If a formal parameter is call-by-value-result, then that formal parameter is implicitly defined each time the called procedure is entered, and the actual parameter at the call is implicitly defined upon return from the call. From the standpoint of solving a dataflow problem such as reaching definitions, all implicit definitions due to calls should be determined, and elements generated at the appropriate
nodes to represent these implicit definitions. The remainder of this section discusses the generation of implicit definitions, and the determination of what reaches them, for the specific problem of reaching definitions.

We assume that a formal parameter may be call-by-reference, call-by-value, call-by-value-result, or call-by-result. For the reaching-definitions problem, before the iterative algorithm can be used to solve the dataflow equations, all GEN sets must be prepared. Consider each point $p$ in the program where a call-by-reference formal parameter is defined. Add to the GEN set of the node for point $p$ an implicit definition of each actual-parameter variable that is aliased to that formal parameter in a call. Each added implicit-definition element must be a recoded element that includes the alias relationship for that actual parameter.

For example, suppose a procedure named $A$ has two call-by-reference formal parameters, $x$ and $y$. Inside $A$ at point $p$ there is a definition of $x$, and there are three calls of procedure $A$ in the program. The first call aliases variable $v$ to $x$. The second call aliases variable $v$ to both $x$ and $y$. The third call aliases variable $w$ to $x$. Thus, at point $p$ there would be three implicit-definition elements generated, namely $(v, \{x\})$, $(v, \{x, y\})$, and $(w, \{x\})$. As an example of what this element notation means, in the $(v, \{x\})$ element, the $v$ represents the implicit definition of variable $v$ that occurs at point $p$. The $x$ represents the formal parameter that variable $v$ is aliased to.

These implicit-definition elements carry a special requirement at the $B_{out}$ set of the exit node of procedure $A$. If the $(v, \{x\})$ element reaches this set, it can only cross from this set to the return node of the first call. Similarly, the $(v, \{x, y\})$ element can only cross to the return node of the second call, and the $(w, \{x\})$ element can only cross to the return node of the third call.

The crossing restrictions in the preceding example are due to a rule now given. Let $A$ denote a procedure containing a
definition at point $p$ of a call-by-reference formal parameter $x$. Let $(t, \{x\})$ be the implicit-definition element generated at point $p$ for some specific call $c$ of $A$ that aliases actual-parameter variable $t$ to $x$, and let $m$ be the exit node of $A$. If $(t, \{x\}) \in B_{out}[m]$, then $(t, \{x\})$ can only cross from $B_{out}[m]$ to the return node of call $c$. As $(t, \{x\})$ crosses, it must be recoded as $t$ by having its alias relationship nullified. This crossing-restriction rule is necessary because element $(t, \{x\})$ has a dual quality. It is a body effect, because it is generated inside the called procedure, and it is a calling-context effect, because it is the result of a specific call of that procedure. This dual quality requires the special treatment that the rule provides. Nullifying the alias relationship as the element crosses to the return node is good practice in general for this element, and it is a necessity if call $c$ is a recursive call of $A$. As an example, assume that call $c$ is a recursive call of $A$ and that variable $t$ is a global variable. If $(t, \{x\})$ reaches the $B_{out}[m]$ set, the rule states that this element can only cross to the return node of call $c$ and that it must be recoded as $t$. Assuming that this $t$ element then reaches from this return node to the $B_{out}[m]$
set, $t$ can then cross to any return node that has an in-edge from $m$. Both the $(t, \{x\})$ and $t$ elements refer to the same implicit definition of variable $t$ occurring at point $p$, but the two elements are not the same. The crossing-restriction rule applies only to an element that is identical to the element generated at point $p$, which is $(t, \{x\})$.

The implicit definitions of actual-parameter variables are the most important category of implicit definitions that are due to call-by-reference formal parameters. However, there is also a second, less important category. Consider each explicit definition of a variable $t$ at point $p$ inside $A$, where variable $t$ is also used in a call of $A$ as an actual parameter aliased to a call-by-reference formal parameter $x$. At each such definition there is an implicit definition of formal parameter $x$ at point $p$. The implicit-definition element generated at point $p$ would be $(x, \{t\})$, meaning a definition of variable $x$ at point $p$ aliased to variable $t$. However, assume that a formal parameter cannot be defined or used outside the procedure for which it is declared. It then follows that these elements cannot cross to any return node, so no crossing-restriction rule is needed for them.

Normally, a definition of a variable kills all other definitions of that variable. However, the implicit definitions due to call-by-reference formal parameters have no associated kills. Instead, the following rule suffices. Let $x$ be any call-by-reference formal parameter declared for procedure $A$. If all calls of $A$ alias the same actual-parameter variable $t$ to $x$, then each explicit definition inside $A$ of either variable $t$ or $x$ will kill all definitions of variable $t$ and all definitions of variable $x$. Otherwise, if the calls of $A$ do not all alias the same actual-parameter variable $t$ to $x$, then each explicit definition inside $A$ of either variable $t$ or $x$ will kill only two things: the definitions of that variable, and those recoded elements that are aliased to that variable.

The entry-node GEN set will be used to hold all implicit
definitions of formal parameters that occur upon procedure entry. Thus, for each entry node, consider each formal parameter of the represented procedure that is call-by-value or call-by-value-result. Add to the GEN set of that entry node an element that represents an implicit definition of that formal parameter occurring at that entry node. The return-node GEN set will be used to hold all implicit definitions of actual parameters that may occur upon return from the called procedure. Thus, for each return node, consider each actual parameter of the associated call whose corresponding formal parameter is call-by-result or call-by-value-result. Add to the GEN set of that return node an element that represents an implicit definition of that actual parameter occurring at that return node. The return-node KILL set should represent all elements that will be killed by these implicit definitions of actual parameters.

With the GEN sets ready, the iterative algorithm can proceed. Once the iterative algorithm has ended, a follow-on step is done:

(a) Examine the $B_{out}$ set for each exit node. Consider each definition $d$ in this set of a formal parameter $p$, where $p$ is call-by-result or call-by-value-result. Then $d$ reaches the implicit use of this formal parameter by those implicit definitions of actual parameters, found at the various return nodes, whose corresponding formal parameter is $p$. The element representing $d$ can be added to the $B_{in}$ sets of those return nodes in a way that reflects the reach.

(b) Examine the OUT set of each call node. Consider each definition $d$ in this set of a variable that is used as an actual parameter in the call, where the corresponding formal parameter is call-by-value or call-by-value-result. Then $d$ reaches the implicit use of the defined variable by the implicit definition of the corresponding formal parameter found at the entry node of the called procedure. The element representing $d$ can be added to the $E_{in}$ set of that entry node in a way that reflects the reach.

2.3 Interprocedural Forward-Flow-And Analysis

This
section gives the dataflow equations used by our interprocedural analysis method for forward-flow-and problems. The difference between these equations and the equations for forward-flow-or is explained.

For forward-flow-and problems, some changes are needed to the dataflow equations given in Section 2.2.1. Of course, the confluence operator must be changed from union to intersection. However, it is still necessary to construct the entry-node entry set as the union of all crossing effects from the predecessor-node sets, so that calling context can be properly recovered at the return nodes. At the same time, the entry set must always be constructed as the intersection of predecessor-node sets if the entry set is to be a part of the IN and OUT sets. These conflicting requirements for the entry-node entry set can be resolved by maintaining two separate entry sets at each node. The revised dataflow equations follow. The two conditions $C_1$ and $C_2$ are explained in Section 2.2.1.

For any node $n$:

$$\mathrm{IN}[n] = E^{\cap}_{in}[n] \cup B_{in}[n], \qquad \mathrm{OUT}[n] = E^{\cap}_{out}[n] \cup B_{out}[n]$$

Group I ($n$ is an entry node):

$$B_{in}[n] = \emptyset$$
$$E^{\cup}_{in}[n] = \bigcup_{p \in \mathrm{pred}(n)} \{\, x \mid x \in (E^{\cup}_{out}[p] \cup B_{out}[p]) \wedge C_1 \,\}$$
$$E^{\cap}_{in}[n] = \bigcap_{p \in \mathrm{pred}(n)} \{\, x \mid x \in \mathrm{OUT}[p] \wedge C_1 \,\}$$
$$B_{out}[n] = \mathrm{GEN}[n]$$
$$E^{\cup}_{out}[n] = E^{\cup}_{in}[n] \cup \mathrm{RECODE}^{\cup}[n] \cup \mathrm{RECODE}^{\cap}[n]$$
$$E^{\cap}_{out}[n] = E^{\cap}_{in}[n] \cup \mathrm{RECODE}^{\cap}[n]$$

Group II ($n$ is a return node, $p$ is the associated call node, and $q$ is the exit node of the called procedure):

$$B_{in}[n] = \{\, x \mid (x \in B_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E^{\cup}_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2) \,\}$$
$$E^{\cup}_{in}[n] = \{\, x \mid x \in E^{\cup}_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E^{\cup}_{out}[q])) \,\}$$
$$E^{\cap}_{in}[n] = \{\, x \mid x \in E^{\cap}_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_2 \wedge x \in E^{\cup}_{out}[q])) \,\}$$
$$B_{out}[n] = (B_{in}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]$$
$$E^{\cup}_{out}[n] = E^{\cup}_{in}[n] - \mathrm{KILL}[n]$$
$$E^{\cap}_{out}[n] = E^{\cap}_{in}[n] - \mathrm{KILL}[n]$$

Group III ($n$ is not an entry or return node):

$$B_{in}[n] = \bigcap_{p \in \mathrm{pred}(n)} B_{out}[p]$$
$$E^{\cup}_{in}[n] = \bigcap_{p \in \mathrm{pred}(n)} E^{\cup}_{out}[p]$$
$$E^{\cap}_{in}[n] = \bigcap_{p \in \mathrm{pred}(n)} E^{\cap}_{out}[p]$$
$$B_{out}[n] = (B_{in}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n]$$
$$E^{\cup}_{out}[n] = E^{\cup}_{in}[n] - \mathrm{KILL}[n]$$
$$E^{\cap}_{out}[n] = E^{\cap}_{in}[n] - \mathrm{KILL}[n]$$

The entry set $E^{\cup}$ is the set used to recover calling context, and the entry set $E^{\cap}$ is the set that is a component of the IN and OUT sets. The RECODE sets appearing in the entry-node equations represent recoded elements, as explained in Section 2.2.2. The $\mathrm{RECODE}^{\cup}$ set will just be the union of the recoded elements generated from each predecessor call node $c$ by the element-recoding algorithm, drawing from the $E^{\cup}_{out}[c]$ and $B_{out}[c]$ sets instead of the $\mathrm{OUT}[c]$ set. Similarly, the $\mathrm{RECODE}^{\cap}$ set could just be the intersection of the recoded elements from each predecessor call node $c$, drawing from the $\mathrm{OUT}[c]$ set. However, doing this may cause the unnecessary loss of recoded elements when the same underlying base element $w$ is found in each $\mathrm{OUT}[c]$ set.

To avoid such loss, an improved rule is used. Suppose the same base element $w$ is found in each $\mathrm{OUT}[c]$ set, and there are one or more nonempty alias relationships for that $w$ occurring at one or more predecessor nodes $c$. Then a single recoded element for that $w$ that encodes all of these alias relationships would be generated into the $\mathrm{RECODE}^{\cap}$ set. Otherwise, no recoded element for that $w$ would be generated into the $\mathrm{RECODE}^{\cap}$ set. For example, suppose $c$ has three different values for a given entry node, and the same base element $w$ is found in each $\mathrm{OUT}[c]$ set. At one $c$ there is an empty alias relationship, at the second $c$ there is an alias relationship to formal parameter $x$, and at the third $c$ there is an alias relationship to formal parameter $y$. For this example, the single recoded element would be $(w, \{x, y\})$. This recoded element can be killed either directly through $w$ or indirectly through $x$ or through $y$. Note that the complete kill of
this recoded element at any kill point, even though the kill may have been made through an alias that was not established at each $c$, is nevertheless correct. The intersection confluence operator associated with $\mathrm{RECODE}^{\cap}$ implicitly requires that, for base element $w$ to pass a kill point, it must be on every call path past that kill point. This is not the case when $w$ is killed on at least one call path, which happens when that $w$ is killed through an alias that was established by at least one of the $c$. The specific dataflow problem being solved may allow the base element to be used through one of its effective aliases. In that case, a flag could be associated with each alias in the recoded elements of $\mathrm{RECODE}^{\cap}$, indicating whether or not the alias was established at each $c$. In the case of the example, the recoded element with flags would be $(w, \{(x, \text{not}), (y, \text{not})\})$. Only a use of the base element through an alias established at each $c$ would be a use through an alias that occurs on every call path. This kind of use would be the all-paths use that is implicitly required by the specific dataflow problem by virtue of its being forward-flow-and.

With the exception of the confluence operator and the two different entry sets, the equations for forward-flow-and are the same as for forward-flow-or and are likewise correct. Set $E^{\cap}$ fulfills the requirement for the IN and OUT sets by consistently using the intersection confluence operator for its construction, just as $B$ does. The equations for the $E^{\cup}$ and $E^{\cap}$ sets differ only at the entry node, and there the only differences are the confluence operator and the way the RECODE sets are built.

Table: Solution of forward-flow-and equations for the available-expressions example.

At the entry node, set intersection is the confluence operator for $E^{\cap}$ and set union is the confluence operator for $E^{\cup}$. Because of this, and because the $\mathrm{RECODE}^{\cap}$ set is added to both $E^{\cup}$ and $E^{\cap}$,
it follows that $E^{\cap}$ will be a subset of $E^{\cup}$ at every node. Thus $E^{\cup}$ can be used to recover calling context for $E^{\cap}$. Set $E^{\cup}$ also serves to recover calling context for both $E^{\cup}$ and $B$. This is because $E^{\cup}$ is built at the entry node from these two sets, and the use of union as the confluence operator guarantees that all calling-context effects will be collected.

The table "Solution of forward-flow-and equations" shows the result of solving the equations for the flowgraph of the available-expressions example figure. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is available expressions, and variable $w$ is local while variables $x$, $y$, and $z$ are global. Available expressions is the problem of determining, for certain expressions in the program, whether each use of an expression is always reached by some prior use of that expression. In the figure, two nodes are entry nodes, two are exit nodes, two are call nodes, and two are return nodes. Alongside each node is its basic block. Each expression is superscripted with an identifier that is the set element used in the table to represent that expression.

Figure: An available-expressions example.

2.4 Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. However, the same flowgraph is used for both forward-flow and backward-flow problems. The transformation from the forward-flow-or equations to backward-flow-or, or from forward-flow-and to backward-flow-and, is mechanical and straightforward. The same equations are used, but various words and phrases are changed everywhere to reflect the reverse flow:

- "pred(n)" for predecessors becomes "succ(n)" for successors.
- "out" subscripts become "in" subscripts, and "in" subscripts become "out" subscripts.
- IN becomes OUT, and OUT becomes IN.
- "call node" becomes "return node", and "return node" becomes "call node".
- "entry node" becomes "exit node", and "exit node" becomes "entry node".
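The forward-flow-or equations of Section 2.2.1 (Groups I to III) can be exercised on a small reaching-definitions example. The Python sketch below is a hypothetical illustration, not the dissertation's implementation. It makes three assumptions: there is no call-by-reference aliasing (so every RECODE set is empty), a single predicate `crosses` stands in for both $C_1$ and $C_2$, and elements are plain definition labels. The `Node` class, the `solve` function, and the example program are all invented for illustration. A backward-flow variant would follow from the renaming just described; only the forward-flow-or case is sketched here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                                  # "entry", "return", or "other"
    preds: list = field(default_factory=list)  # names of predecessor nodes
    gen: frozenset = frozenset()
    kill: frozenset = frozenset()
    call: str = ""   # return nodes only: the associated call node p
    exit: str = ""   # return nodes only: exit node q of the called procedure


def solve(nodes, crosses):
    """Round-robin iteration of Groups I-III until every E and B set is stable.

    Hypothetical simplifications: RECODE sets are empty (no aliasing), and
    crosses(x) plays the role of both C1 and C2.
    """
    E_in = {n: set() for n in nodes}
    E_out = {n: set() for n in nodes}
    B_in = {n: set() for n in nodes}
    B_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for name, nd in nodes.items():
            if nd.kind == "entry":             # Group I
                b_in = set()
                e_in = {x for p in nd.preds
                        for x in (E_out[p] | B_out[p]) if crosses(x)}
                b_out = set(nd.gen)
                e_out = set(e_in)              # E_in ∪ RECODE, with RECODE empty here
            elif nd.kind == "return":          # Group II
                p, q = nd.call, nd.exit
                # With C1 = C2, ¬C1 ∨ (C1 ∧ C2 ∧ x ∈ Eout[q]) reduces to the test below
                b_in = ({x for x in B_out[p] if not crosses(x) or x in E_out[q]}
                        | {x for x in B_out[q] if crosses(x)})
                e_in = {x for x in E_out[p] if not crosses(x) or x in E_out[q]}
                b_out = (b_in - nd.kill) | nd.gen
                e_out = e_in - nd.kill
            else:                              # Group III
                b_in = set().union(*(B_out[p] for p in nd.preds))
                e_in = set().union(*(E_out[p] for p in nd.preds))
                b_out = (b_in - nd.kill) | nd.gen
                e_out = e_in - nd.kill
            if b_out != B_out[name] or e_out != E_out[name]:
                changed = True
            B_in[name], E_in[name] = b_in, e_in
            B_out[name], E_out[name] = b_out, e_out
    return {n: E_in[n] | B_in[n] for n in nodes}   # IN[n] = Ein[n] ∪ Bin[n]


# Hypothetical program: main executes "w=0 (d0); g=1 (d1); call f()", then
# "g=2 (d2); call f()"; procedure f executes "h=3 (d3)". w is local; g, h global.
var_of = {"d0": "w", "d1": "g", "d2": "g", "d3": "h"}
fs = frozenset
nodes = {
    "main.entry": Node("entry"),
    "c1": Node("other", ["main.entry"], fs({"d0", "d1"}), fs({"d0", "d1", "d2"})),
    "f.entry": Node("entry", ["c1", "c2"]),
    "f.body": Node("other", ["f.entry"], fs({"d3"}), fs({"d3"})),
    "f.exit": Node("other", ["f.body"]),
    "r1": Node("return", ["c1", "f.exit"], call="c1", exit="f.exit"),
    "c2": Node("other", ["r1"], fs({"d2"}), fs({"d1", "d2"})),
    "r2": Node("return", ["c2", "f.exit"], call="c2", exit="f.exit"),
    "main.exit": Node("other", ["r2"]),
}
IN = solve(nodes, crosses=lambda d: var_of[d] in {"g", "h"})
print(sorted(IN["r1"]), sorted(IN["r2"]))  # prints: ['d0', 'd1', 'd3'] ['d0', 'd2', 'd3']
```

Definition d2 enters f from the second call and reaches the entry set at f's exit node. It is nevertheless not recovered at return node r1, because the return-node equations only recover calling-context effects that were present at r1's own call node, c1. A single conventional IN/OUT set at f's exit node would instead have carried d2 to both return nodes.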
For backward flow, the nodes requiring special equations are the exit node and the call node, not the entry node and the return node as in the forward-flow problems.

2.5 Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed language model, in which the visibility of each formal parameter is limited to the single procedure that declares it, we consider the solution of the dataflow equations for only one element at a time. Let n be the number of flowgraph nodes. Let the elementary operation measured by the complexity be the computation of the dataflow equations once, at a single average flowgraph node, for a single element. Only the presence or absence of the single element within a particular body or entry set need be represented, and this requires no more than a single bit of storage for each set referenced by the equations. Thus computing the dataflow equations once at an average node for a single element will consist of a small number of integer operations, assuming that the average in-degree and out-degree of the flowgraph nodes are bounded by a small constant (which will always be the case for flowgraphs generated from real programs), and also assuming that the length of recoded elements will be small. Referring to the algorithm of Figure ?, the length of a recoded element depends on the number of aliases it carries, and that number is bounded from above by the number of call-by-reference formal parameters of the given procedure. As a rule, this upper bound will be small.

We next consider the total number of node visits required to solve the dataflow equations for a single element. Prior to solving the equations, all body and entry sets are initialized to empty, at complexity $O(n)$. The empty sets represent the absence of the element. Note that each set has only two states: either the element is present or it is absent. Assuming a forward-flow problem, each time the equations are computed for a node, if any of the out sets have changed from their previous state, then the equations
will be computed for all successor nodes. The forward-flow-or equations have only two out sets per node, and the forward-flow-and equations have three. It follows that repeated computation of the equations for a single node will cause the successor nodes to be marked for computation at most two or three times, depending on the equations being used. Given that the average number of successor nodes is bounded by a small constant, it follows that the total number of node visits required to solve the dataflow equations for a single element will be bounded from above by $k_1 n$, where $k_1$ is a constant, giving a worst-case complexity of $O(n)$ for solving the dataflow equations for a single element. The worst-case complexity of solving the dataflow equations for m total elements will therefore be $O(mn)$.

Let b be the number of base elements for the program being analyzed, and let r be the number of recoded elements, giving $m = b + r$. As an example, for the reaching-definitions dataflow problem, the base elements will be all the definitions in the program. We assume that, for the kind of dataflow problems our method is meant to solve, the number of base elements will be a linear function of the program size and therefore proportional to n. Let constant $k_2$ be an upper bound of $b/n$. We also assume the universe of real, useful programs written by programmers to solve practical problems.

To determine an upper bound for r, let k be the maximum number of formal parameters for a single procedure. That k is a constant independent of program size should be obvious. Given k and the algorithm of Figure ?, and allowing all possible combinations of the formal parameters of any single procedure, the maximum number of recoded elements for any single procedure and base element is

$$\binom{k}{1} + \binom{k}{2} + \cdots + \binom{k}{k} = 2^k - 1 = k_3.$$

Note that $k_3$ is a constant, albeit an enormous constant. The maximum number of recoded elements for any single procedure will therefore be $k_3 b$. In the assumed language model, each formal parameter is visible in only one procedure, and
this means each recoded element is confined to a single procedure when the dataflow equations are solved. Therefore, the total number of node visits required to solve the dataflow equations for all the recoded elements will be bounded from above by

$$\sum_{i=1}^{j} k_1 s_i k_3 b,$$

where j is the number of procedures in the flowgraph and $s_i$ is the number of flowgraph nodes in the ith procedure. Since $b \le k_2 n$, this upper bound can be rewritten as

$$\sum_{i=1}^{j} k_1 k_2 k_3 n s_i.$$

Ignoring constants, and given that $\sum_{i=1}^{j} s_i = n$ and $\sum_{i=1}^{j} n s_i = n^2$, the worst-case complexity of our method for the assumed language model is $O(n^2)$, and the elementary operation measured by the complexity is a small number of integer operations, assuming that the average recoded-element length is small.

For a program from the assumed universe of programs, the likelihood of a large complexity constant due to element recoding is very low, for the following reason. In order to increase the number of recoded elements for a given base element and procedure, the given base element must in effect be repeatedly aliased to different combinations of formal parameters in the given procedure. The algorithm of Figure ? generates at most a single recoded element for each element in the OUT set, so to increase the number of recoded elements as stated, there must be multiple calls to the same procedure, and in these different calls the same base element must be aliased to different formal-parameter combinations. To assess the likelihood of this requirement being met, consider that for any given program from the assumed universe, the type and purpose of a variable determine how that variable is used in that program, and each variable used in a program by necessity has a purpose. Given a number of different calls to the same procedure, and given that a variable appears as one or more of the actual parameters in each of the calls, then as a rule we expect that variable always to occupy the same parameter positions in those calls, because there is always a close correspondence between parameter
position and the purpose of the variable that occupies that position. Note that by "variable" we mean a variable and any aliases it may have, including formal-parameter aliases. A variable and its aliases are interchangeable and share the same purpose, because by definition they reference the same data.

It might be argued that a language such as C has procedures that take a variable number of arguments, such as printf and scanf, for which the same variable could easily occupy different actual-parameter positions in different calls. This is true, but such library procedures are best treated as unknown calls, and there is no element recoding for unknown calls. For the needs of element recoding in the rare case of a user-written procedure with a variable number of arguments, a single formal parameter could stand for the variable portion of the formal parameters, and conservative assumptions could be made whenever that single formal is in effect referenced. Aside from mentioning this, we do not consider such user-written variable-argument procedures further.

For a dataflow problem such as reaching definitions, the base element can be affected by only a single variable. For such a dataflow problem, the purposefulness of variables makes it very unlikely that an increase in the number of recoded elements for a given procedure and base element can even begin, let alone be sustained. However, such an increase would be more likely for a dataflow problem where the base element can be affected by several different variables. An example would be available expressions, because each base element could be affected by as many different variables as compose the expression represented by that base element. In light of the preceding argument regarding the purposefulness of variables, for the reaching-definitions and similar dataflow problems we expect the maximum number of recoded elements for any given procedure and base element to be one in the majority of the programs in the assumed universe, and a
little higher than one for the remaining programs in that universe. Given the algorithm of Figure ?, we also expect the average length of each recoded element to be slightly more than two, given the preceding expectation that there will be a very small maximum number of recoded elements for any given procedure and base element, and assuming that most base elements, when aliased by a call, will be aliased to only a single formal parameter and only occasionally to more than one. Note that this expected average length of the recoded elements is consistent with the claim that the elementary operation measured by the worst-case complexity of our method is a small number of integer operations.

It may be noticed that the complexity of $O(n^2)$ for our interprocedural analysis method is the same as the known worst-case complexity for intraprocedural dataflow analysis, assuming there are no restrictions on the flowgraph. This fact makes it unlikely that our method could be improved on in terms of complexity without resorting to flowgraph restrictions. However, although the complexities are the same, this does not mean interprocedural dataflow analysis will now take roughly the same time as intraprocedural dataflow analysis. The following inequality should make this clear:

$$\sum_{i=1}^{j} s_i^2 < n^2,$$

given that j is the number of procedures in the flowgraph, $s_i$ is the number of flowgraph nodes in the ith procedure, and […]

Examples of programming languages that fit this alternative model are Pascal and Ada, which allow nested procedures. Element recoding can be used for this alternative model, but unless precision is compromised, the worst-case complexity for solving the equations will be exponential, because the number of recoded elements could grow exponentially, assuming that alias information is compounded when a recoded element is recoded. The exponential complexity of tracking aliases due to calls was first considered by Myers [?] and more recently by Landi and Ryder [?]. In practice, the cost of precise element recoding for the alternative language model may be acceptable for the assumed universe of programs, for the same reason given previously regarding the purposefulness of variables. However, we do not consider the alternative model further.

2.6 Experimental Results

There are experimental data for our interprocedural analysis method. Specifically, two different prototypes have been constructed, and both solve the reaching-definitions dataflow problem using our method. Both prototypes accept C-language programs as the input to be dataflow analyzed. For simplicity, these prototypes impose some restrictions on the input, such as requiring that all variables be represented by single identifiers, thereby excluding variables that have more than one component, such as structure and union variables. In addition, there is no logic in the prototypes to determine what pointers are pointing at, so pointer dereferencing is essentially ignored. The prototypes do not accept preprocessor commands, so the input programs must already have been preprocessed.

Both prototypes use the same code to parse the input program and construct the flowgraph. However, they differ in how they implement our analysis method. One prototype prepares a single bit-vector format containing all the
definitions in the input program and then solves the dataflow equations once for the program flowgraph. The other prototype uses a single integer as the bit vector and solves the dataflow equations for the program flowgraph as many times as there are base elements. For the reaching-definitions dataflow problem, the definitions in the program are the base elements. We call the approach used by the second prototype one-base-element-at-a-time and the approach used by the first all-at-once; accordingly, we refer to them as the one-at-a-time prototype and the all-at-once prototype.

It might be expected that the one-at-a-time prototype would be many times slower than the all-at-once prototype because of the big difference in bit-vector sizes, but this is not the case. For the all-at-once prototype, calculations using varied test results show that

$$V \times S_1 \approx D,$$

where V is the average number of visits per flowgraph node made to solve the dataflow equations, $S_1$ is the integer size of the bit vector for the all-at-once prototype, and D is the number of definitions in the input program. This relationship means that the one-at-a-time prototype should run at roughly the same speed as the all-at-once prototype, because solving the dataflow equations for a single element will require an average of roughly one visit per flowgraph node and the application of the dataflow equations to a vector of size one. Note that the total amount of work the all-at-once prototype must do per flowgraph node to solve the equations is proportional to the product $V \times S_1$ ($\approx D$), and the total amount of work the one-at-a-time prototype must do per flowgraph node to solve the equations for the D base elements is proportional to the product

$$V \times S_2 \times D \approx 1 \times 1 \times D = D,$$

where $S_2$ is the integer size of the bit vector for the one-at-a-time prototype. Experimental results have supported the expectation of similar speeds for the two prototypes. When deciding on the design of a practical tool, this finding is important and decisively tips the scales in favor of the one-base-element-at-a-time approach.

For both prototypes, the bit space needed for set storage is nks, where n is the number of flowgraph nodes, k is the average number of sets per node, and s = max(average set bit-size for any solving of the equations). Note that for the all-at-once prototype there is only one solving of the equations, while for the one-at-a-time prototype there are as many solvings of the equations as base elements. The primary reason the one-at-a-time approach is preferable to the all-at-once approach is the likelihood of a greatly reduced s value. For example, without element recoding, the s value is 1 for the one-at-a-time prototype and D for the all-at-once prototype. Allowing element recoding, the s value for the one-at-a-time approach will be 1 + max(average number of recoded elements per procedure for any solving of the equations). Here we assume that the best way to add element recoding to the one-at-a-time prototype would be, for each solving of the equations, to solve the equations for both a single base element and all recoded elements generated from that base element.

Table ?. Typical experimental results for the two prototypes. Columns: defs; defs global (%); calls; nodes; all-at-once prototype (CPU time); one-at-a-time prototype (CPU time). The numeric entries are not recoverable; the all-at-once column reads NA for the larger input programs.

Table ? presents typical experimental results for the two prototypes. Each table row represents a different input program. The input programs were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input
program in Table ? has ? procedures. Only the all-at-once prototype currently has element-recoding logic, so the input programs do not have call parameters, and the table data do not reflect element-recoding costs. Measuring element-recoding costs for randomly generated programs would be somewhat meaningless anyway, since the purposefulness-of-variables principle would be violated.

Referring to the columns of Table ?: "defs" is the total number of definitions in the input program; "defs global" is the percentage of those definitions that define global variables; "calls" is the number of known calls; "nodes" is the number of flowgraph nodes; and the two prototype columns give the total CPU usage time, in minutes and seconds, required by each prototype to completely solve the reaching-definitions dataflow problem for the input program and generate a report of all the reaches. The hardware used was rated at roughly ? MIPS. The large space requirements of the all-at-once prototype prevented running it for the larger input programs in the table.

CHAPTER 3
INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect

This section lays the theoretical basis for our algorithm. The problem of interprocedural logical ripple effect is examined from the perspective of execution paths and their possible continuations. First, general definitions are given, followed by three assumptions and a definition of the Allow and Transform sets, followed by a lemma, four theorems, and a discussion of the potential for overestimation inherent in the Allow set.

A variable is defined at each point in a program where it is assigned a value. A definition is assumed to have the general form "v ← expression", where v is the variable being defined and ← is an assignment operator that assigns the value of expression to v. If the expression includes variables, then these variables are termed the use variables of the definition. In general, a use is any instance of a variable that is
having its value used at the point where the variable occurs. A procedure contains a definition if the statement that makes the definition is in the body of the procedure. Similarly, a procedure contains a call if the statement that makes the call is in the body of the procedure. The body of a procedure is those statements that are defined as belonging to the procedure. Frequent reference is made in this chapter to a procedure containing a statement, a call, or a flowgraph node. For languages that allow nested procedures, such as Pascal and Ada, note that procedure nesting in these languages is a mechanism for controlling variable scope, not a mechanism for sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume that at most a single procedure contains any given statement, call, or flowgraph node.

Let d and dd be two definitions, possibly the same, in the same program. Let dd have a use variable v, let $v_{dd}$ be that use-variable instance, and let d define v. Given a possible execution path between definition d and $v_{dd}$ along which the definition of v that d represents would be propagated, such a path is referred to as a definition-clear path between d and $v_{dd}$ with respect to v. Definition d can be propagated along an execution path to the end of that path only if either definition d itself or an element that represents definition d exists at the beginning of that path, and there is no redefinition of v along that path. Definition d is said to affect definition dd if there is a definition-clear path between d and $v_{dd}$ with respect to v. Similarly, definition d affects use u if u is an instance of v and there is a definition-clear path between d and u with respect to v. For convenience, v will not be explicitly mentioned when it is understood. Note that whenever we speak of an execution path between two points, we always mean that the execution path begins at the first point and ends at the second point. For example, an execution path between
d and dd begins at the program point where d occurs and ends at the program point where $v_{dd}$ occurs. For convenience, we assume that dd and $v_{dd}$ occupy the same program point.

Assumption. A called procedure, if it returns, always returns to its most recent caller. That is, a procedure that returns always returns to the most recent unreturned call.

Assumption. A call has no influence on the execution paths taken inside the called procedure.

Assumption. There are no recursive calls.

The first assumption reflects the behavior of all the procedural languages that we know of. Regarding the second assumption, our algorithm may in fact overestimate the logical ripple effect because of both this assumption and the unstated but standard assumption of intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible execution paths. However, these two assumptions are unavoidable, because determining all the truly possible execution paths in an arbitrary program is known to be an undecidable problem. Regarding the third assumption, making it improves the precision of our algorithm, because it removes a potential cause of overestimation. The consequence of using our algorithm for a program with recursive calls is discussed at the end of Section ?.

To determine what a definition affects when it is constrained by ripple effect, it is useful to introduce two concepts: backward flow and forward flow. Given an execution path, whenever the execution path returns from a procedure to a call, this is termed backward flow. All other parts of the execution path may be termed forward flow. Note that the possibilities for backward flow are constrained by the first assumption, and therefore by the relevant execution paths that lead up to the point of the return in question. Regarding a given execution path, those call instances within that execution path that have yet to be returned to within that path, called unreturned calls, are the parts of the path that constrain backward flow. Note that this constraint is a
positive constraint, since a call cannot be returned to unless that call exists as an unreturned call in at least one relevant execution path.

Definition. Two sets, Allow and Transform, will be used to represent the backward-flow restrictions associated with a particular definition d. Let p be the program point where definition d occurs. The elements in both sets are calls. The Allow set identifies only the calls to which the execution path continuing on from point p may make an unmatched return, until the backward-flow restrictions represented by this Allow set are effectively cancelled by the interaction between the execution-path continuation and the Transform set, explained shortly. An unmatched return is a return, made during the execution-path continuation, to a call instance that precedes the beginning of that execution-path continuation. The call instance is necessarily an unreturned call, as otherwise it could not be returned to. $|\mathrm{Allow}|$ is at most the total number of different calls represented in the program text. We define $\mathrm{Allow} = \emptyset$ to mean there are no backward-flow restrictions for d. The Transform set identifies only the calls to which the execution path continuing on from point p may make an unmatched return and upon which unmatched return the execution-path continuation is no longer constrained by the Allow and Transform sets associated with d. The following relationships hold: $\mathrm{Transform} \subseteq \mathrm{Allow}$, and if $\mathrm{Allow} \neq \emptyset$ then $\mathrm{Transform} \neq \emptyset$.

Note that backward-flow restrictions must be minimized whenever the possible execution paths allow it, because otherwise the computed logical ripple effect (which is the whole purpose of this formal-analysis section) may be missing pieces that belong in it but were not added to it because backward-flow restrictions were retained that are not valid for all the possible execution paths involved.

Lemma. For any execution path P between two program points p and q, if P includes two or more call instances made in P that have not been returned to in P, then
for these unreturned calls, $c_i$ calls the procedure containing $c_{i+1}$, where $c_i$ is the ith unreturned call, in execution order, made in P.

Proof. Assume that the next unreturned call $c_{i+1}$ is not contained in the procedure that was called by $c_i$. Let X be the procedure called by $c_i$, and let Y be the procedure that contains $c_{i+1}$. The execution path in P between making the call $c_i$ and making the call $c_{i+1}$ must include a path out of procedure X and into procedure Y so that the call $c_{i+1}$ can be made. A path out of procedure X can occur in only two ways: either X returns to a call, or X itself makes a call. If X returns to a call, then by the first assumption $c_i$ would be returned to, contradicting the given that $c_i$ has not been returned to. This means X must make a call to get to Y. Let c be the call contained in X that is the last call contained in X on the execution path in P taken from X to Y so as to make the call $c_{i+1}$. If X makes the call c and c has not been returned to in P, then c would precede $c_{i+1}$ as an unreturned call following $c_i$, contradicting the given that $c_{i+1}$ is the next unreturned call in execution order after $c_i$. If c has been returned to in P, then all calls occurring on the execution path between the call c and the return to c must have been returned to, according to the first assumption. This would mean $c_{i+1}$ has been returned to, contradicting the given that $c_{i+1}$ has not been returned to. Thus $c_i$ calls the procedure containing $c_{i+1}$, as assuming otherwise leads to contradictions. □

Definitions for the four theorems. Let d and dd be the two definitions previously defined. Let A and T be the Allow and Transform sets associated with d. Let P be a single execution path between d and dd along which d can affect dd, subject to the constraints on P imposed by A and T. P will consist of a sequence of calls and returns, if any, in the order they are made. Any instance of a call made in P that is not returned to in P is an unreturned call in P. P′ is defined for P if and only if P contains an unmatched return (meaning
a return to a call instance that precedes the beginning of P) to a call $\in T$. P′ is that part of P that follows the first unmatched return to a call $\in T$. Thus P′ represents the continuation of P after the unmatched return. Any instance of a call made in P′ that is not returned to in P′ is an unreturned call in P′.

Referring to each of the four theorems in turn, let AA and TT be the Allow and Transform sets for dd given all the paths P that meet the requirements stated by that theorem. Let $AA_P$ and $TT_P$ be the Allow and Transform sets for dd given a single path P that meets the requirements stated by that theorem. The four theorems that follow each define AA and TT. Note that for any given P, A, and T, one of the four theorems will apply.

Theorem. If (1) $A = \emptyset$ and P has no unreturned calls, or (2) $A \neq \emptyset$, P′ is defined for P, and P′ has no unreturned calls, then $AA = \emptyset$ and $TT = \emptyset$.

Proof. For case (1), d is free of backward-flow restrictions, and d has affected dd without making an unreturned call; therefore dd will be free of backward-flow restrictions, giving $AA = \emptyset$ and $TT = \emptyset$. For case (2), as soon as path P makes an unmatched return r to a call $\in T$, then by the Definition what d can affect is no longer constrained by A and T, and this freedom from constraint by A and T passes by transitivity to dd because d affects dd. When P′ is defined for P, the unmatched return r in P that immediately precedes the beginning of P′ means that any unreturned calls in P are also in P′. This is because all call instances within P are more recent than the call instance that matches the unmatched return r. Thus, by the first assumption, all call instances in P preceding the return r must be returned to in P before r can occur. Therefore P has no unreturned calls, because P′ has no unreturned calls. Thus dd is free of backward-flow restrictions, since A, T, and P contribute nothing in the way of constraint, giving $AA = \emptyset$ and $TT = \emptyset$. □

Theorem. If (1) $A = \emptyset$ and P has at least one unreturned call, or (2) $A \neq \emptyset$, P′ is defined for P, and P′ has at least one unreturned call, then

$$AA = \bigcup_{\text{all such } P} \{\text{the unreturned calls of } P\} \quad\text{and}\quad TT = \bigcup_{\text{all such } P} \{\text{the first unreturned call in } P\}.$$

Proof. For case (1), A and T contribute nothing in the way of constraint to $AA_P$ and $TT_P$. Because d affects dd along path P, which contains unreturned calls, by the first assumption those unreturned calls must be returned to before the continuation from dd onward can return to any other unreturned call. Hence $AA_P = \{\text{the unreturned calls of } P\}$. Because d had no backward-flow restrictions, it follows that once all the unreturned calls of P are returned to by the execution-path continuation, that continuation would no longer have any backward-flow restrictions. Because of the first assumption and the lemma, all the unreturned calls of P are returned to when the sequentially first unreturned call in P is returned to. Hence $TT_P = \{\text{the first unreturned call in } P\}$. For case (2), as shown in the proof of the first theorem, case (2), A and T contribute nothing to $AA_P$ and $TT_P$ when P′ is defined for P. Thus this case (2) is effectively the same as case (1), because the A and T sets contribute nothing and an unreturned call in P′ is an unreturned call in P. Therefore $AA_P = \{\text{the unreturned calls of } P\}$ and $TT_P = \{\text{the first unreturned call in } P\}$. From the Definition and the general definitions of AA, TT, $AA_P$, and $TT_P$, it follows that $AA = \bigcup_{\text{all such } P} AA_P$ and $TT = \bigcup_{\text{all such } P} TT_P$. Thus $AA = \bigcup_{\text{all such } P} \{\text{the unreturned calls of } P\}$ and $TT = \bigcup_{\text{all such } P} \{\text{the first unreturned call in } P\}$. □

Theorem. If $A \neq \emptyset$, P′ is not defined for P, and P has no unreturned calls, then

$$AA = \{\, x \mid x \in A \wedge x \text{ is part of a possible execution path that inclusively begins with a call} \in T \text{ and ends with a call of the procedure containing } dd \text{, such that each unreturned call in this possible execution path is in } A \,\}$$

and $TT = AA \cap T$.

Proof. Note that only one procedure contains dd. Because P′ is not defined for P, it follows that P was constrained in its entirety by A, never making an unmatched return to a call $\in T$. Because P has no unreturned calls, d can only affect
dd along P by making one or more unmatched returns to calls $\in A - T$, unless d and dd are in the same procedure. A in effect represents possible execution paths with unreturned calls by which d was affected. However, once P is given, the path P may eliminate some of the paths from A as being possible, and may return to some of the unreturned calls in A. Thus, although P contributes nothing directly to AA, it may narrow the unreturned execution-path possibilities that A can contribute to AA. AA as defined for this theorem captures all execution paths in A that begin with a call $\in T$ and end with a call of the procedure that contains dd. Given the first assumption, it should be obvious that these are all the possible paths in A that are unreturned after P. Note that if d and dd are in the same procedure, then $AA = A$ and $TT = T$. Assume that d and dd are in different procedures. Any call $\in A$ that is not part of at least one path in A that makes a call of the procedure containing dd must be excluded from AA, because P requires a path in A that passes through the procedure containing dd; otherwise P could not make a return to the procedure containing dd. Any call $\in A$ that is on a path in A between the procedure containing dd and the procedure containing d must be excluded from AA, because the procedure containing dd has been returned to by P. The definition of AA for this theorem satisfies these two exclusions. That $TT = AA \cap T$ follows from the Definition, which requires $TT \subseteq AA$, and from the definition of AA for this theorem. □

Theorem. If $A \neq \emptyset$, P′ is not defined for P, P has at least one unreturned call, and the first unreturned call in P is contained in procedure X, then, with

$$S_1 = \bigcup_{\text{all such } P \text{ given } X} \{\text{the unreturned calls of } P\}$$

and

$$S_2 = \{\, x \mid x \in A \wedge x \text{ is part of a possible execution path that inclusively begins with a call} \in T \text{ and ends with a call of the procedure } X \text{, such that each unreturned call in this possible execution path is in } A \,\},$$

$AA = S_1 \cup S_2$ and $TT = S_2 \cap T$.

Proof. $S_1$ follows from the Definition and the proof of the second theorem. $S_2$ follows from the third theorem, where the specific "procedure containing dd" in the expression for AA in the third theorem has been replaced by the equally specific "procedure X". To show that the union operation of AA combining $S_1$ and $S_2$ does not thereby represent spurious paths in AA, it is only necessary to show that the paths represented in $S_1$ never cross with the paths represented in $S_2$. Two paths cross if each path makes an unreturned call to the same procedure. All paths in $S_2$ end with an unreturned call of procedure X. All paths in $S_1$ begin with an unreturned call contained in procedure X. Assume that both $S_1$ and $S_2$ include an unreturned call to the same procedure. As all paths in $S_2$ lead to procedure X, this means there exists an execution path that originates in procedure X and eventually calls procedure X. Thus the execution path represents recursion, and this is contradicted by the third assumption. Therefore the paths represented in $S_1$ never cross with the paths represented in $S_2$. The first unreturned call in P is not added to TT because the path P is an extension of the unreturned paths represented in $S_2$. That $TT = S_2 \cap T$ follows from the Definition, which requires $TT \subseteq AA$, and from the definition of AA for this theorem. □

Figure ?. An example call structure that does not allow overestimation.

The four theorems given above will be used to build the algorithm given in the next section. In effect, a given Allow set represents possible execution paths with unreturned calls by which the definition associated with that Allow set was affected. Inversely, the Allow set identifies, in effect, those continuation paths that can make unmatched returns. However, missing from the Allow set is the information needed to enforce an ordering of the unmatched returns that the continuation path may make. To a large extent, this missing information is unnecessary because of the lemma. Typically, the call structure of the program itself enforces the ordering of the unmatched
returns. Figure [?] is an example. Assume d affects dd, giving an Allow set of {c₁, c[?]} for dd. Given a continuation path from dd, it is not possible for c₁ to be returned to before c[?], so the correct ordering of unmatched returns is enforced by the program itself. However, there are cases where the missing ordering information can let a continuation path take unwanted shortcuts. Figure [?] gives an example of a call structure that, under the right circumstances, allows the continuation path from dd to make an unwanted shortcut. Assume d affects dd along the paths c₁c[?] and c[?]c[?], giving an Allow set of {c₁, c[?], c[?], c[?]} for dd. Assume the continuation path is r[?]c[?]r[?], where r[?] and r[?] are unmatched returns to calls c[?] and c[?]. The unmatched return r[?] should not be allowed to happen before an unmatched return r[?]. However, the Allow set defined in this dissertation does not enforce this unmatched-return ordering, so the assumed continuation path is possible. By virtue of such a spurious continuation path, dd may be able to affect a definition or use that it could not affect if it were confined to legitimate continuation paths. In practical terms, the computed logical ripple effect, which consists of affected definitions and uses, may in fact be an overestimate because of spurious continuation paths.

Figure [?]. An example call structure that allows overestimation

The Allow set does permit spurious continuation paths under the right circumstances. Figure [?], together with the assumed paths by which d affected dd, is the simplest example. Even so, we feel that these circumstances, together with spurious paths that affect what would otherwise be unaffected, will not occur often enough in real programs to undermine the general usefulness of the Allow set. That usefulness lies in constraining backward flow and permitting computation of a precise or semiprecise logical ripple effect.

The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logical ripple effect. After a brief overview of the algorithm, the dataflow analysis method used by the algorithm is discussed. Then two important properties of the dataflow sets are detailed. They are followed by three rules that are used to impose backward-flow restrictions on the dataflow analysis. Last are proofs that the algorithm is correct.

The algorithm to compute logical ripple effect is shown in Figure [?]. Each statement in the algorithm is numbered on the left. For convenience, algorithm statements will be referred to as lines. For example, a reference to line [?] means the statement at [?], which is actually printed on several lines. Comments in the algorithm begin with "--". ⊥ and ⊤ are just two different fixed, arbitrary values.

In general, the algorithm works as follows:

1. A definition d and its associated Allow and Transform sets are popped from the stack (line [?]).
2. The reaching-definitions dataflow problem is solved for this definition d, imposing any backward-flow restrictions represented by the Allow and Transform sets (line [?]). Reaching definitions for a single definition is the problem of finding all uses and definitions affected by that definition.
3. The definition d that was dataflow analyzed, and any uses affected by it, are included in the ripple effect (lines [?] to [?]).
4. Each affected definition has its Allow and Transform sets determined in accordance with Theorems [?] through [?] (lines [?] to [?]).
5. A check is then made to see whether the affected definition and its restriction sets, Allow and Transform, should be added to the stack for dataflow analysis (lines [?] to [?]).

The algorithm ends when the stack is empty. The algorithm shows a single definition b being added to the stack at line [?]. In practice, any number of different b can be added, each with empty restriction sets.

    -- Compute the logical ripple effect for a hypothetical or actual definition b
    -- Input: a program flowgraph ready for dataflow analysis
    -- Output: the logical ripple effect in RIPPLE
    begin
      RIPPLE ← ∅
      for each definition dd in the program
        FIN_dd ← ⊥
      end for
      stack ← ∅
      push (b, ∅, ∅) onto stack
      while stack ≠ ∅ do
        pop stack into (d, ALLOW, TRANSFORM)
        Solve the reaching-definitions dataflow equations for the single definition d using Rules 1, 2, and 3
        RIPPLE ← RIPPLE ∪ {d}
        for each use u in the program that is affected by either d₁ or d₂
          RIPPLE ← RIPPLE ∪ {u}
        end for
        ROOT₁ ← ∅; LINK₁ ← ∅; ROOT₂ ← ∅; LINK₂ ← ∅
        for each call node n in the flowgraph
          if d₁ ∈ Bout[n] and d₁ crossed from this call into the called procedure
            ROOT₁ ← ROOT₁ ∪ {the call node n}
          fi
          if d₁ ∈ Eout[n] and d₁ crossed from this call into the called procedure
            LINK₁ ← LINK₁ ∪ {the call node n}
          fi
          if d₂ ∈ Bout[n] and d₂ crossed from this call into the called procedure
            ROOT₂ ← ROOT₂ ∪ {the call node n}
          fi
          if d₂ ∈ Eout[n] and d₂ crossed from this call into the called procedure
            LINK₂ ← LINK₂ ∪ {the call node n}
          fi
        end for
        for each definition dd in the program that is affected by either d₁ or d₂
          -- determine Allow and Transform for dd by Theorem [?]
          if d₂ ∈ Bin[node where dd occurs]
            PATHS ← ∅
            TRANS ← ∅
            call Analyze
          else
            -- determine Allow and Transform for dd by Theorem [?]
            if d₂ ∈ Ein[node where dd occurs]
              PATHS ← ∅
              PATHS ← {x | x ∈ (ROOT₂ ∪ LINK₂) ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ (PATHS ∩ LINK₂))}
              TRANS ← ROOT₂ ∩ PATHS
              call Analyze
            fi
            -- determine Allow and Transform for dd by Theorem [?]
            if d₁ ∈ Bin[node where dd occurs]
              PATHS ← ∅
              PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd ∨ x calls a procedure that contains a call c ∈ PATHS)}
              TRANS ← TRANSFORM ∩ PATHS
              call Analyze
            fi
            -- determine Allow and Transform for dd by Theorem [?]
            if d₁ ∈ Ein[node where dd occurs]
              for each procedure X that contains a call ∈ ROOT₁
                RT ← {x | x ∈ ROOT₁ ∧ x is contained in procedure X}
                PP ← ∅
                PP ← {x | x ∈ (RT ∪ LINK₁) ∧ x is on a path that inclusively begins with a call ∈ RT and ends with a call of the procedure that contains dd, such that each call in this path is in (RT ∪ LINK₁)}
                if PP ≠ ∅
                  PATHS ← ∅
                  PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls a procedure that contains a call c ∈ PATHS)}
                  TRANS ← TRANSFORM ∩ PATHS
                  PATHS ← PATHS ∪ PP
                  call Analyze
                fi
              end for
            fi
          fi
        end for
      od
    end

    Procedure Analyze
    begin
      -- avoid repetition of dd dataflow analysis if possible
      if FIN_dd ≠ ⊤ ∧ (PATHS = ∅ ∨ true for all saved pairs (P, T) for dd: (PATHS ⊄ P ∨ TRANS ⊄ T))
        if PATHS = ∅
          FIN_dd ← ⊤
          push (dd, ∅, ∅) onto stack
        else
          save PATHS and TRANS as the pair (P, T) for dd
          push (dd, PATHS, TRANS) onto stack
        fi
      fi
    end

Figure [?]. The logical ripple effect algorithm

The dataflow equations referred to in line [?] are shown in Figure [?]. These equations are copied from Chapter 2, which presents a method for context-dependent flow-sensitive interprocedural dataflow analysis. The method consists of solving, with the standard iterative algorithm, the dataflow equations shown in Figure [?] for the program flowgraph required by the equations. The method in Chapter 2 includes a solution to the problems of parameter aliasing and implicit definitions that are part of the interprocedural reaching-definitions problem. We assume that the full method of Chapter 2 would be used. We do not discuss these side issues in this chapter, as they are not directly relevant to the algorithm. Note that there are other methods for context-dependent flow-sensitive interprocedural dataflow analysis [?], but the method of Chapter 2 has precision and efficiency advantages over the other methods cited.

Referring to the dataflow equations of Figure [?], four sets are computed for each flowgraph node: two body sets, Bin and Bout, and two entry sets, Ein and Eout. All body and entry sets are initially empty. As the equations will be solved for only a single
definition d, the GEN set for the node where d occurs will contain an element representing d. That node is the one whose associated block of program code contains the definition d. All the other GEN sets will be empty. The node where d occurs is the natural starting point for the iterative algorithm. The algorithm recomputes the body and entry sets for the nodes until stability is attained and the sets cease to change, at which point the equations have been solved.

For any node n:

$$IN[n] = E_{in}[n] \cup B_{in}[n], \qquad OUT[n] = E_{out}[n] \cup B_{out}[n]$$

Group I: n is an entry node.

$$\begin{aligned}
B_{in}[n] &= \emptyset \\
E_{in}[n] &= \bigcup_{p \in pred(n)} \{\, x \mid x \in OUT[p] \wedge C_1 \,\} \\
B_{out}[n] &= GEN[n] \\
E_{out}[n] &= E_{in}[n] \cup RECODE[n]
\end{aligned}$$

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.

$$\begin{aligned}
B_{in}[n] &= \{\, x \mid (x \in B_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_{?} \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_1) \,\} \\
E_{in}[n] &= \{\, x \mid x \in E_{out}[p] \wedge (\neg C_1 \vee (C_1 \wedge C_{?} \wedge x \in E_{out}[q])) \,\} \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}$$

Group III: n is not an entry or return node.

$$\begin{aligned}
B_{in}[n] &= \bigcup_{p \in pred(n)} B_{out}[p] \\
E_{in}[n] &= \bigcup_{p \in pred(n)} E_{out}[p] \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}$$

Figure [?]. Dataflow equations for the reaching-definitions problem

Once the equations are solved, an element is in the entry set or the body set at a particular node, depending on how that element was propagated to that node. The same element may be in both sets at the same node. Properties 1 and 2, listed below, summarize the implications of set membership that are used by the algorithm. The properties follow directly from the dataflow equations.

Property 1. For any node n, an element is in the Ein[n] set or Eout[n]
set if and only if that element entered the procedure that contains node n from a call node and there is a definition-clear path from that call node to node n. Thus, membership in the entry set of node n implies the following: the element can propagate to node n by an execution path that makes at least one unreturned call between the point where the element is generated and the point where node n occurs.

Property 2. For any node n, an element is in the Bin[n] set or Bout[n] set if and only if one of two conditions holds. Either that element was generated in the same procedure that contains node n, or it entered the procedure that contains node n from an exit-node Bout set. There must also be a definition-clear path to node n from either the element's generation node or the exit node. If the element entered from an exit-node Bout set, then Property 2 applies recursively to the element in that Bout set. Thus, membership in the body set of node n implies the following: the element can propagate to node n by an execution path that does not include any unreturned calls, between the point where the element is generated and the point where node n occurs.

The three rules referred to in line [?] are listed below. Rule 1 applies before the dataflow equations are solved; Rules 2 and 3 apply as the equations are being solved. The rules impose the backward-flow restrictions represented by the ALLOW and TRANSFORM sets in line [?].

Rule 1. If ALLOW = ∅, then element d₂ is generated at the node where definition d occurs; otherwise d₁ is the generated element, meaning the element in the GEN set. Both d₁ and d₂ are base elements that represent the same definition d. The two elements are identical in terms of when they appear in any given KILL set. The only difference between them is that d₁
and d₂ are treated differently by Rules 2 and 3 below. If the ALLOW set is empty, then by Definition [?] there should be no backward-flow restrictions on d. Rule 1 meets this requirement, because d₂ is immune to the backward-flow restrictions imposed by Rule 2.

Rule 2. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the Bin[n] equation is computed, if d₁ ∈ Bout[q], then d₁ cannot cross from Bout[q] into the Bin[n] set if p ∉ ALLOW. In the dataflow equations, one action alone represents, in effect, an unmatched return: the crossing of an element from an exit-node body set to a return node. That return is to a call instance made in an execution path leading up to the program point where definition d occurs. That point is the starting point of the reaching-definitions analysis done for d. Thus, Rule 2 covers all cases in which an unmatched return occurs. Rule 2 restricts unmatched returns to those call instances that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW set as given by Definition [?].

Rule 3. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the Bin[n] equation is computed, three conditions are checked: d₁ ∈ Bout[q]; d₁ can cross from Bout[q] into the Bin[n] set by C₁ and Rule 2; and p ∈ TRANSFORM. If all three hold, then as this d₁ element crosses from Bout[q] into the Bin[n] set, the element is changed to d₂. In effect, d₁
is transformed into d₂, and the return node n becomes a generation node for the d₂ element. As already noted, crossing from an exit-node body set to a return node is the only action in the dataflow equations that represents an unmatched return. Such a return is to a call instance made in an execution path leading up to the program point where definition d occurs. Thus, Rule 3 covers all cases in which an unmatched return occurs. Rule 3 requires that the returned-to call be in the TRANSFORM set, which satisfies Definition [?] as to when backward-flow restrictions can be ignored. At the return point, Rule 3 replaces element d₁, which is subject to the backward-flow restrictions, with element d₂, which is free of them. It thereby satisfies Definition [?] regarding removal of backward-flow restrictions on the execution-path continuation, since d₂ now represents the continuation instead of d₁.

Lemma [?]. The algorithm computes at lines [?] to [?] the restriction sets for an affected definition in accordance with Theorems [?] through [?].

Proof. We first establish the properties of the LINK and ROOT sets computed at lines [?] to [?]. Let p be the node, if any, where d₁
is generated. Let q be any node where d₂ is generated. Such a node is either a return node where d₁ is transformed into d₂ or, if ALLOW = ∅, the node where definition d occurs.

The tests at lines [?] and [?] make use of Property 2: if an element is in the Bout set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n, and the path has no unreturned calls. Extending that path to the entry node of the called procedure makes the call at node n the first unreturned call on it. Therefore, the ROOT₁ set represents all calls that are the first unreturned call on at least one definition-clear path between node p and some other node in the flowgraph. The ROOT₂ set represents all calls that are the first unreturned call on at least one definition-clear path between node q and some other node in the flowgraph.

The tests at lines [?] and [?] make use of Property 1: if an element is in the Eout set of a call node n, then there exists a definition-clear path between the node where the element is generated and node n. That path includes the unreturned call that called the procedure containing node n. Extending the path to the entry node of the called procedure makes the call at node n at least the second unreturned call on it. Therefore, the LINK₁ set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node p and some other node in the flowgraph. The LINK₂
set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path between node q and some other node in the flowgraph.

The test at line [?] checks for the application of Theorem [?]. If d₂ ∈ Bin[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls. Somewhere along P, d₂ is generated, meaning either ALLOW = ∅ or [?] is defined for P. This satisfies the conditions of Theorem [?], and line [?] sets PATHS and TRANS to empty in accordance with the theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line [?] checks for the application of Theorem [?]. If d₂ ∈ Ein[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call. Somewhere along P, d₂ is generated, meaning either ALLOW = ∅ or [?] is defined for P. This satisfies the conditions of Theorem [?]. Only the d₂ element satisfies the theorem, so all paths P for the theorem must be constructed exclusively from the ROOT₂ and LINK₂ sets. Referring to Theorem [?], line [?] computes the AA set and line [?] computes TT. In line [?], the PATHS set is defined in terms of itself. Because of this recursive reference, the condition containing it must be reevaluated each time a call is added to PATHS, since further calls may then qualify. Recursive references are similarly used in lines [?] and [?]. Line [?] extracts, from all the calls that element d₂ crossed, just those calls that are on a path to dd. It does this by building the paths backwards, beginning with those calls that call the procedure containing dd. By Lemma [?], any path between d and dd consisting of unreturned calls can be found by proceeding in reverse order from dd and selecting those calls that call a procedure containing a call already selected. Backward path building and Lemma [?] are similarly used in lines [?] and [?]. By the properties of the ROOT₂ and LINK₂
sets, the paths constructed by line [?] will be definition-clear. Notice that a particular call may be in both the ROOT₂ and LINK₂ sets. However, a call that is only in the ROOT₂ set cannot be used as the basis for extending any path further backwards. The reason is that, by Property 1, d₂ does not propagate from the entry node of the procedure that contains that call to the call node for that call. This is the reason for the (PATHS ∩ LINK₂) requirement in line [?]. Once the PATHS set is computed, line [?] computes TRANS in accordance with the theorem.

The test at line [?] checks for the application of Theorem [?]. If d₁ ∈ Bin[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d₁ is the element, meaning [?] is not defined for P. This satisfies the conditions of Theorem [?]. Referring to Theorem [?], line [?] computes the AA set and line [?] computes TT. Line [?] extracts from ALLOW all paths that end with a call of the procedure containing dd. Theorem [?] states that the path begins with a call ∈ TRANSFORM, yet line [?] does not require a check for this. TRANSFORM is a subset of ALLOW, so the first unreturned calls in TRANSFORM that lie on a path in ALLOW to dd will unavoidably be picked up as the paths are built backwards from dd. Thus, the PATHS set is computed in accordance with Theorem [?]. Line [?] then computes the TRANS set in accordance with the theorem.

The test at line [?] checks for the application of Theorem [?]. If d₁ ∈ Ein[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d₁ is the element. This satisfies the conditions of the theorem. Only the d₁
element satisfies the theorem, so all paths P for the theorem must be constructed exclusively from the ROOT₁ and LINK₁ sets. Referring to Theorem [?], line [?] computes the S₁ set, line [?] computes the S₂ set, line [?] computes the TT set, and line [?] computes the AA set. The test at line [?] is needed because, although at least one path P satisfies the theorem, there may be no paths P that begin in the specific procedure X. It can be seen that lines [?] to [?] compute in accordance with the theorem. □

Lemma [?]. Let A and T be one pair of Allow and Transform sets associated with a definition d, and let Aⱼ and Tⱼ be a different pair of Allow and Transform sets associated with the same definition d. Assume A ≠ ∅ and Aⱼ ≠ ∅. If Aⱼ ⊆ A and Tⱼ ⊆ T, then dataflow analyzing d with the pair Aⱼ and Tⱼ cannot add anything to the ripple effect that is not added by dataflow analyzing d with the pair A and T.

Proof. By inspection of Rules 2 and 3, removing some of the calls from A or T cannot make d affect anything that it does not affect with A and T as they were. Also consider, by inspection of lines [?] to [?], the Allow and Transform sets determined for any definition dd affected by d. With Aⱼ and Tⱼ as the restriction sets for d, those sets cannot include calls that would be excluded with A and T as the restriction sets for d. □

Lemma [?]. Let A and T be Allow and Transform sets associated with a definition d, and let X and Y be a different pair of Allow and Transform sets associated with the same definition d. If A = ∅, then dataflow analyzing d with X and Y cannot add anything to the ripple effect that is not added by dataflow analyzing d with A and T.

Proof. By Rule 1, d will be represented by d₂ and have no restrictions on its backward flow. Thus, d will affect everything that it is possible for it to affect. Suppose d is dataflow analyzed with X and Y. Then any calls found in the ROOT₁, ROOT₂, LINK₁, or LINK₂ sets will also be found in the ROOT₂ or LINK₂
sets when d is dataflow analyzed with A and T. These sets determine the restriction sets associated with a definition dd affected by d. It follows that any dataflow path allowed for a dd affected by d using X and Y will also be allowed for a dd affected by d using A and T. □

Theorem [?]. Given Definition [?] and Theorems [?] through [?], the algorithm will correctly compute the logical ripple effect.

Proof. As shown by Lemma [?], for any affected definition dd, the Allow and Transform sets to be associated with dd are computed in accordance with Theorems [?] to [?]. By Lemma [?], if Theorem [?] applies to an affected definition (line [?]), then there is no need to check whether any other theorem also applies. Additional dataflow analysis resulting from the other theorems cannot contribute to the ripple effect. However, if Theorem [?] does not apply, then the definition must be dataflow analyzed separately, in turn, for each theorem that does apply. This is done by the sequence of three if statements at lines [?], [?], and [?]. Thus, the control logic in lines [?] to [?] is safe.

The Analyze procedure (lines [?] to [?]) prepares a definition and its restriction sets for dataflow analysis by adding them to the stack (lines [?] and [?]). Once a definition is scheduled for dataflow analysis with no restrictions (line [?]), it will not be analyzed again (line [?]). By Lemma [?], this is safe. Assuming FIN_dd ≠ ⊤ and PATHS ≠ ∅, the test at line [?] skips a definition if both restriction sets are subsets of some pair of restriction sets used previously to analyze that definition. This follows from Lemma [?]. Thus, the Analyze procedure is safe.

The correctness of the dataflow equations (line [?]) is established in Chapter 2, and the correctness of the three rules for imposing backward-flow restrictions (line [?]) has already been discussed. It remains to justify having no backward-flow restrictions for the initial definition (line [?]). Let p be the program point where b occurs. For execution to attain point p, any possible execution path between the program's execution starting point
and point p can be assumed to have occurred. Thus, there should be no restrictions on the backward-flow possibilities of b, because the ripple effect imposed no constraints on how point p was initially attained. □

Programs with recursive calls can be processed by our algorithm, but the recursive calls may cause some overestimation of the logical ripple effect. The dataflow equations (line [?]) are not the problem, as they work for recursive programs. Instead, the problem is with the Allow set and its representation of execution paths. Suppose a cyclic execution path is represented in the Allow set. When that Allow set is used to restrict backward flow by Rule 2, an element moving through the program flowgraph may be able to take a shortcut on its unmatched returns. It then avoids making unmatched returns along the complete cycle before attaining a program point. This shortcut may permit the element to affect something that it should not be able to affect, possibly adding to the ripple effect beyond what should be there.

A Prototype Demonstrates the Algorithm

This section first considers the complexity of our interprocedural logical ripple effect algorithm. A prototype that demonstrates the algorithm is then described, and test results are presented.

Let n be the number of nodes in the flowgraph of the input program. For a programming language such as C, solving the dataflow equations for a single definition, which is what line [?] does, has a worst-case complexity of O(n). Let k be the number of known calls in the input program. Considering line [?], a definition may be dataflow analyzed repeatedly as long as the associated restriction sets are not subsets of any previous pair of restriction sets used to dataflow analyze that definition. The number of different restriction sets possible such that no set is a subset of another grows exponentially with k. For instance, a set of k calls already has C(k, ⌊k/2⌋) subsets, none of which contains another. Thus, the worst-case complexity of our logical ripple effect algorithm is
exponential, where the exponent is some function of k. However, for the typical input program, the number of nonsubset restriction sets that our algorithm can generate for a given definition is severely constrained. The constraints are Lemma [?], Theorems [?] through [?], and the typical program call structure, which is characterized by shallow call depth.

A prototype that demonstrates our logical ripple effect algorithm has been built. The prototype accepts as input C programs that satisfy certain constraints, such as having only single-identifier variable names. Given an input program, the prototype then requires that one or more definitions be identified as the starting point of the ripple effect. For purposes of comparison, the prototype computes two results. It uses our algorithm to compute a precise logical ripple effect, and it also computes an overestimate of the logical ripple effect. The overestimate is computed by simply ignoring the execution-path problem; that is, there are no backward-flow restrictions when the overestimate is computed. The worst-case complexity of computing the overestimate for C programs is only O(nd), where n is the number of flowgraph nodes and d is the number of definitions in the overestimated ripple effect. This complexity follows from the O(n) complexity of solving the dataflow equations for a single definition and the fact that the equations will have to be solved d times.

Table [?]. Experimental results for the prototype. Columns: globals, defs, defs global, depth, nodes, RS, RSP, reduction, time, timeP.

Table [?] presents test results for the prototype. Each row details relevant characteristics of an input program and presents the resulting averages of ten different tests of that input program. Each test computed the ripple effect started by a single randomly chosen
definition of a global variable. The input programs of Table [?] were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program of Table [?] has [?] procedures and exactly the number of global variables listed. Within each input program, each global variable is defined and used at least once. The call structure of each input program was determined randomly by the generator, subject to two constraints. There must be no recursion in the input program, and no call may exceed the given maximum call depth. All calls in the generated input programs are known calls. Approximately 1/(max + 1) of the calls are at each possible depth from zero to max, where max is the given maximum call depth.

The columns of Table [?] are as follows:

- "globals" is the number of global variables in the input program.
- "defs" is the number of definitions in the input program.
- "defs global" is the percentage of the definitions that define a global variable.
- "depth" is the maximum call depth, followed by the total number of calls in the input program.
- "nodes" is the number of nodes in the flowgraph.
- "RS" is the average size of the overestimated ripple effect for the ten test cases, where size is the total number of definitions and uses in the ripple effect.
- "RSP" is the average size of the precise ripple effect.
- "reduction" is the average percentage reduction, over the ten test cases, in the size of the ripple effect when the overestimate is replaced by the precise ripple effect.
- "time" is the average CPU usage time per test case to compute the overestimated ripple effect.
- "timeP" is the average CPU usage time per test case to compute the precise ripple effect.

The hardware used was rated at roughly [?] MIPS. As an example of the time notation used in Table [?], a time of 1m[?]s would be read as 1 minute [?] seconds.

Although the worst-case complexity of our algorithm for precise logical ripple
effect is exponential, the data of Table [?] indicate that its expected complexity is approximated by O(nd). This holds for a wide range of input programs in a programming language such as C. This follows from the O(nd) worst-case complexity of computing the overestimate and the typical closeness of time and timeP in each row of Table [?]. However, the last row of Table [?] is instructive. It shows that, regardless of what the expected complexity might be, there will always be specific input programs and starting points that require time greatly exceeding the time required to compute the overestimate. In practice, if the computation of the precise logical ripple effect is taking too long, it can be abandoned and the overestimate computed and used in its place. Note that our algorithm can easily compute the overestimate by modifying Rule 1 so that element d₂ is always generated in place of element d₁. Doing so avoids all backward-flow restrictions.

The Slicing Algorithm

This section presents the inverse form of the precise interprocedural logical ripple effect algorithm. It also gives the inverse form of the associated dataflow equations and backward-flow restriction rules. Our algorithm for precise interprocedural slicing is shown in Figure [?]. Its complexity and expected performance are the same as for the precise interprocedural logical ripple effect algorithm given previously. For logical ripple effect, the dataflow problem solved at line [?] was reaching definitions for a single definition. For slicing, which is the inverse problem, the dataflow problem solved at line [?] is reaching uses for a single use. In reaching definitions, the definition flows in the direction of the arcs in the flowgraph and is killed by definitions of the same variable. It affects uses of the same variable and any definitions directly dependent on an affected use. In reaching uses, the use flows in the reverse direction of the arcs in the flowgraph and is killed by definitions of the
same variable. It affects definitions of the same variable and any uses that directly determine an affected definition. Because of this reverse flow in the flowgraph, the dataflow equations solved at line [?] for the slicing algorithm must be an inverted form of the equations used for the logical ripple effect algorithm. These inverted dataflow equations are shown in Figure [?]. The inverted rules that the slicing algorithm uses for backward-flow restriction are given below. Notice that the ALLOW and TRANSFORM sets will contain returns instead of calls.

    -- Compute the slice for a hypothetical or actual use b
    -- Input: a program flowgraph ready for dataflow analysis
    -- Output: the slice in SLICE
    begin
      SLICE ← ∅
      for each use uu in the program
        FIN_uu ← ⊥
      end for
      stack ← ∅
      push (b, ∅, ∅) onto stack
      while stack ≠ ∅ do
        pop stack into (u, ALLOW, TRANSFORM)
        Solve the reaching-uses dataflow equations for the single use u using Rules 1, 2, and 3
        SLICE ← SLICE ∪ {u}
        for each definition d in the program that is affected by either u₁ or u₂
          SLICE ← SLICE ∪ {d}
        end for
        ROOT₁ ← ∅; LINK₁ ← ∅; ROOT₂ ← ∅; LINK₂ ← ∅
        for each return node n in the flowgraph
          if u₁ ∈ Bin[n] ∧ u₁ crossed from this return into the returned-from procedure
            ROOT₁ ← ROOT₁ ∪ {the return node n}
          fi
          if u₁ ∈ Ein[n] ∧ u₁ crossed from this return into the returned-from procedure
            LINK₁ ← LINK₁ ∪ {the return node n}
          fi
          if u₂ ∈ Bin[n] ∧ u₂ crossed from this return into the returned-from procedure
            ROOT₂ ← ROOT₂ ∪ {the return node n}
          fi
          if u₂ ∈ Ein[n] ∧ u₂ crossed from this return into the returned-from procedure
            LINK₂ ← LINK₂ ∪ {the return node n}
          fi
        end for
        for each use uu in the program that is affected by either u₁ or u₂
          -- determine Allow and Transform for uu by Theorem [?]
          if u₂ ∈ Bout[node where uu occurs]
            PATHS ← ∅
            TRANS ← ∅
            call Analyze
          else
            -- determine Allow and Transform for uu by Theorem [?]
            if u₂ ∈ Eout[node where uu occurs]
              PATHS ← ∅
              PATHS ← {x | x ∈ (ROOT₂ ∪ LINK₂) ∧ (x returns from the procedure that contains uu ∨ x returns from a procedure that contains a return r ∈ (PATHS ∩ LINK₂))}
              TRANS ← ROOT₂ ∩ PATHS
              call Analyze
            fi
            -- determine Allow and Transform for uu by Theorem [?]
            if u₁ ∈ Bout[node where uu occurs]
              PATHS ← ∅
              PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that contains uu ∨ x returns from a procedure that contains a return r ∈ PATHS)}
              TRANS ← TRANSFORM ∩ PATHS
              call Analyze
            fi
            -- determine Allow and Transform for uu by Theorem [?]
            if u₁ ∈ Eout[node where uu occurs]
              for each procedure X that contains a return ∈ ROOT₁
                RT ← {x | x ∈ ROOT₁ ∧ x is contained in procedure X}
                PP ← ∅
                PP ← {x | x ∈ (RT ∪ LINK₁) ∧ x is on a path that inclusively begins with a return ∈ RT and ends with a return from the procedure that contains uu, such that each return in this path is in (RT ∪ LINK₁)}
                if PP ≠ ∅
                  PATHS ← ∅
                  PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X ∨ x returns from a procedure that contains a return r ∈ PATHS)}
                  TRANS ← TRANSFORM ∩ PATHS
                  PATHS ← PATHS ∪ PP
                  call Analyze
                fi
              end for
            fi
          fi
        end for
      od
    end

    Procedure Analyze
    begin
      -- avoid repetition of uu dataflow analysis if possible
      if FIN_uu ≠ ⊤ ∧ (PATHS = ∅ ∨ true for all saved pairs (P, T) for uu: (PATHS ⊄ P ∨ TRANS ⊄ T))
        if PATHS = ∅
          FIN_uu ← ⊤
          push (uu, ∅, ∅) onto stack
        else
          save PATHS and TRANS as the pair (P, T) for uu
          push (uu, PATHS, TRANS) onto stack
        fi
      fi
    end

Figure [?]. The slicing algorithm

Rule 1. If ALLOW = ∅, then element u₂ is generated at the node where use u occurs; otherwise u₁ is the generated element.

Rule 2. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the Bout[n]
HTXDWLRQ LV FRPSXWHG LI LT f %LQ>Tf WKHQ WT FDQQRW FURVV IURP %^Q>T@ LQWR WKH %RXW>Q@ VHW LI S e $//2: 5XOH /HW Q EH D FDOO QRGH S EH WKH DVVRFLDWHG UHWXUQ QRGH DQG T EH WKH HQWU\ QRGH RI WKH UHWXUQHGIURP SURFHGXUH (DFK WLPH WKH %RXW>Q? HTXDWLRQ LV FRPSXWHG LI LT f %^Q>T@ DQG E\ &L DQG 5XOH UT FDQ FURVV IURP %^Q>T@ LQWR WKH %RXW>Q@ VHW DQG S ( 75$16)250 WKHQ DV WKLV WT HOHPHQW FURVVHV IURP %^Q>T@ LQWR WKH %RXW>Q? VHW WKH HOHPHQW LV FKDQJHG WR LÂ ,Q HIIHFW LT LV WUDQVIRUPHG LQWR 8 DQG WKH FDOO QRGH Q EHFRPHV D JHQHUDWLRQ QRGH IRU WKH X HOHPHQW $V WKH XVHIXOQHVV RI VOLFLQJ LV SULPDULO\ IRU SURJUDP IDXOW ORFDOL]DWLRQ LW PD\ EH GHVLUDEOH WR PRGLI\ WKH DOJRULWKP VR WKDW WKRVH XVHV LQ FRQWURO SUHGLFDWHV ZKRVH VXERUGLQDWH VWDWHPHQWV KDYH DW OHDVW RQH XVH RU GHILQLWLRQ DOUHDG\ LQ WKH VOLFH DUH WKHPVHOYHV DGGHG WR WKH VOLFH DQG SURSDJDWHG LQ WXUQ $Q H[DPSOH RI D FRQWURO SUHGn LFDWH LV WKH FRQGLWLRQ WHVWHG E\ DQ LI VWDWHPHQW %\ VXERUGLQDWH VWDWHPHQWV LV PHDQW PAGE 81 )RU DQ\ QRGH Q 287>Q@ (RXW>Q@ 8 %RXW>Q? ,1>Q@ (LQ>Q? 8 %LQ>Q? *URXS Q LV DQ H[LW QRGH %XW>Q@ (RXW:@ ^[ [ ,1>S@ $ &L` S f VXFFQf %LQ>Q@ *(1>Q@ (LQ>Q@ (RXW>Q@ 8 5(&2'(>Q@ *URXS ,, Q LV D FDOO QRGH S LV WKH DVVRFLDWHG UHWXUQ QRGH DQG T LV WKH HQWU\ QRGH RI WKH UHWXUQHGIURP SURFHGXUH %XW>Q? ^] [ f %LQ>S@ $ &L 9 &L $ & $ [ H eÂQ>J@fff 9 [ f %LQ>T@ $ &f` (RXW>Q@ ^[ (LQ>S@ &L 9 &L $ & $ [ ef>"@f` %Q>Q@ >%RXW>Q@ .,//>Q@f 8 *(1>Q@ (LQ>Q@ (RXW>Q? .,//>Q@ *URXS ,,, Q LV QRW DQ H[LW RU FDOO QRGH %XW>Qf %LQ>S@ S f VXFFQf (XW>Q@ (LQ>S@ S f VXFFQf %LQ>Q? %RXW>Q? fÂ§ .,//>Q@f 8 *(1>Q@ &QIQ@ (RXW>Q@ .,//>Q? 
)LJXUH 'DWDIORZ HTXDWLRQV IRU WKH UHDFKLQJXVHV SUREOHP PAGE 82 WKRVH VWDWHPHQWV ZKRVH H[HFXWLRQ LV GHFLGHG E\ WKH FRQWURO SUHGLFDWH ,QFOXGLQJ WKHVH FRQWUROSUHGLFDWH XVHV LQ WKH VOLFH LV DGYDQWDJHRXV EHFDXVH WKH FDXVH RI D SURJUDP HUURU PD\ DFWXDOO\ EH LQ D FRQWURO SUHGLFDWH WKDW LV QRW GHFLGLQJ FRUUHFWO\ ZKHQ WR H[HFXWH LWV VXERUGLQDWH VWDWHPHQWV )HUUDQWH HW DO >@ SUHVHQW D PHWKRG WR SUHFLVHO\ GHWHUPLQH WKH FRQWURO SUHGLFDWHV IRU HDFK VWDWHPHQW PAGE 83 &+$37(5 ,17(5352&('85$/ 3$5$//(/,=$7,21 /RRS&DUULHG 'DWD 'HSHQGHQFH 7KLV VHFWLRQ H[SODLQV ORRSFDUULHG GDWD GHSHQGHQFH DQG LWV UHOHYDQFH WR SDUDOn OHOL]DWLRQ :KHQ D GHILQLWLRQ RI D YDULDEOH UHDFKHV D XVH RI WKDW YDULDEOH WKHQ D GDWD GHSHQGHQFH H[LVWV VXFK WKDW WKH XVH GHSHQGV RQ WKH GHILQLWLRQ $Q H[DPSOH RI GDWD GHSHQGHQFH FDQ EH VHHQ LQ )LJXUH 7KH XVH RI $,f DW OLQH DQG WKH XVH RI $,f DW OLQH ERWK GHSHQG RQ WKH GHILQLWLRQ RI $,f DW OLQH +RZHYHU ZKHQ FRQVLGHULQJ ZKHWKHU RU QRW D ORRS FDQ EH SDUDOOHOL]HG WKHUH LV D VSHFLDO NLQG RI GDWD GHSHQGHQFH FDOOHG ORRSFDUULHG GDWD GHSHQGHQFH >@ $ GDWD GHSHQGHQFH LV ORRS FDUULHG LI WKH YDOXH VHW E\ D GHILQLWLRQ LQVLGH WKH ORRS GXULQJ ORRS LWHUDWLRQ L FDQ EH XVHG E\ D XVH RI WKDW YDULDEOH LQVLGH WKH ORRS GXULQJ ORRS LWHUDWLRQ M ZKHUH L A M 1RWH WKDW L A M LV VSHFLILHG LQVWHDG RI WKH PRUH UHVWULFWLYH DQG QDWXUDO VHHPLQJ L M EHFDXVH LI WKH ORRS LV SDUDOOHOL]HG WKHQ WKH RUGHULQJ RI WKH ORRS LWHUDWLRQV FDQQRW EH DVVXPHG 7KH UHODWLRQVKLS EHWZHHQ ORRSFDUULHG GDWD GHSHQGHQFH DQG SDUDOOHOL]DWLRQ LV VWUDLJKWIRUZDUG ,I WKHUH LV DW OHDVW RQH ORRSFDUULHG GDWD GHSHQGHQFH WKHQ WKH ORRS FDQQRW EH SDUDOOHOL]HG RWKHUZLVH WKH ORRS FDQ EH SDUDOOHOL]HG /RRS SDUDOOHOL]DWLRQ '2 1 $,f %,f r &,f %,f &,f $,f ,) &,f 7+(1 &,f $,f r %,f ), (1' '2 )LJXUH $Q H[DPSOH ORRS PAGE 84 ZRXOG PHDQ WKDW WKH RUGHULQJ RI WKH GLIIHUHQW LWHUDWLRQV RI WKH ORRS LV XQLPSRUWDQW ZKHUHDV D ORRSFDUULHG GHSHQGHQFH PHDQV WKH RSSRVLWH ,I WKHUH DUH QR ORRSFDUULHG GDWD GHSHQGHQFLHV WKHQ WKHUH LV QR UHTXLUHPHQW WKDW WKH LWHUDWLRQV EH 
ordered a certain way. However, whenever a loop is parallelized, there should be a following added serial step that sets the iteration variables, such as the I in Figure 4.1, to whatever their values would be for the last iteration of the loop, assuming the loop had not been parallelized. This added step would be necessary assuming the iteration variables of a loop are visible outside the loop and can therefore be referenced after the loop completes. Iteration variables are those variables that are incremented or decremented by a constant value for each loop iteration. The recognition of iteration variables is language-dependent.

Regarding data dependence and arrays, there are several efficient tests available that determine if a data dependence is possible between a particular definition and use of an array. The tests are the separability test, the gcd test, and the Banerjee test. Details of these three tests can be found in [?]. The number theory behind the tests is that of linear diophantine equations. A linear diophantine equation can be formed from the array subscripts of the definition and use in question. For example, in Figure 4.2, we want to know if the element of A that is defined and the element of A that is used can ever be the same array element. The linear diophantine equation that relates these two array references is obtained by equating their two subscript expressions, with x and y standing for the values of I in the two iterations involved. The question now becomes: does this equation have any integer solutions, given the boundary conditions on x and y? If there is at least one integer solution, then there would be a data dependence; otherwise, there is no data dependence, as is the case with Figure 4.2.

    DO I = ...
       A(... * ...) = A(... * I ...)
    END DO

Figure 4.2: A loop with array references

For the discussion that follows, we define the term loop body. The loop body of any loop L will be all statements in the program that can possibly be executed during the iterations of loop L. Calls are allowed in a loop, so a single loop body could conceivably include the statements of many different procedures. For example, if a loop contains a call of procedure A, and procedure A contains a call of
procedure B, then the loop body would include all the statements of procedures A and B. In Figure 4.1, the loop body is the four statements at lines 2 through 5. With respect to the program flowgraph, the loop body is all flowgraph nodes that may be traversed during the iterations of the loop. Let LB be the set of flowgraph nodes that are in the loop body of loop L. Let n be the first node in the loop body that is traversed during each iteration of the loop. The identification of node n is language-dependent.

Within the loop body of L, let definition d be a definition of a nonarray variable v, and let use u be a use of the variable v that is reached by definition d. We also write d for the node in the loop body where definition d occurs, and u for the node in the loop body where use u occurs. To avoid the complications posed by special cases, we assume that d, n, and u are separate and distinct nodes. Although use u depends on definition d because definition d reaches use u, this data dependence can prevent parallelization of loop L only if the dependence is loop carried. Let P be a sequence of flowgraph nodes drawn from LB such that P represents a possible execution path along which definition d can reach use u. For definition d to be loop-carried to use u along path P, the three nodes d, n, and u must be in P, and in that order, because only the traversal of node n represents the transition to a different iteration of the loop.

If v is an array, then we assume that definition d and use u may refer to different array elements during the same iteration. For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must be assumed to show a loop-carried data dependence when v is an array. By contrast, this path P does not show a loop-carried data dependence if definition d and use u always refer to the same storage location during any iteration, as we assume is the case when v is a nonarray, because in any iteration that follows such a path P, the value used at use u is always the value defined at
definition d in that same iteration.

4.2 The Parallelization Algorithm

This section presents, in Figure 4.3, an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm uses our interprocedural dataflow analysis method as an integral step to determine data dependencies. The loops that can be parallelized are those loops that are not marked by the algorithm as inhibited.

The algorithm has three distinct steps. First, the reaching-definitions dataflow problem is solved for the input program by using our interprocedural dataflow analysis method. Second, the quality of the reaching-definition information computed by the first step is possibly improved in the case of array references by using the separability, gcd, and Banerjee tests. Third, individual du pairs that represent data dependence are examined for loop-carried data dependence. In step 3, the definitions and uses of iteration variables are excluded from testing for loop-carried data dependence (the condition definition d ∉ IV), because for any iteration the iteration variables will have constant values that can be precomputed if loop L is parallelized. The test x ∈ Bout[n] in step 3 is a necessary condition for the Ptest procedure to return T, which is tested for immediately afterward. The x ∈ Bout[n] test is done as an economy measure to avoid, when possible, the more costly Ptest.

    -- a (d, u) pair is a definition d that reaches a use u
    -- x is the dataflow element that represents the definition d
    -- v is the variable referenced by definition d and use u
    -- to avoid complications, n ≠ d ≠ u is assumed
    -- n is the first node traversed during each loop L iteration
    -- d is the node whose basic block contains definition d
    -- u is the node whose basic block contains use u
    -- LB is the set of nodes in the loop body of loop L
    -- IV is the set of definitions of iteration variables for loop L
    begin
      -- step 1: determine reaching definitions for the input program
      use our method to solve the reaching-definitions dataflow problem
      -- step 2: improve the reaching-definition information for array references
      for all (d, u) pairs in the program such that v is an array
        use the separability, gcd, and Banerjee tests as applicable
        if definition d and use u can never reference the same element then
          mark the (d, u) pair as nonreaching
        fi
      end for
      -- step 3: identify (d, u) pairs that inhibit parallelization
      for each loop L in the program
        for each reaching (d, u) pair such that d, u ∈ LB and definition d ∉ IV
          if x ∈ Bout[n] then
            if Ptest(x, n, d, u, L, LB) = T then
              mark L parallelization as inhibited by the (d, u) pair
            fi
          fi
        end for
      end for
    end

Figure 4.3: The parallelization algorithm

    procedure Ptest(x, n, d, u, L, LB)
    -- is there a loop-carried data dependence from definition d to use u thru node n?
    -- return T if yes, F if no
    begin
      -- part1: is there a path from d to n along which x is found?
      if v is an array then DONE ← {d} else DONE ← {d, u} fi
      NEXT ← {d}
      until NEXT = ∅ do
        remove a node from NEXT; denote it p
        for each successor node s of node p such that s ∉ DONE
          DONE ← DONE ∪ {s}
          if s ∉ LB ∨ s is an entry node ∨ x ∉ Bout[s] then ignore s
          else if s = n then goto part2
          else NEXT ← NEXT ∪ {s}
          fi
        end for
      end until
      return F
    part2:
      -- part2: is there a path from n to u along which x is found?
      if v is an array then DONE ← {n} else DONE ← {n, d} fi
      NEXT ← {n}
      until NEXT = ∅ do
        remove a node from NEXT; denote it p
        for each successor node s of node p such that s ∉ DONE
          DONE ← DONE ∪ {s}
          if s ∉ LB ∨ s is an exit node
             ∨ (s is contained in the same procedure that contains L ∧ x ∉ Bout[s])
             ∨ (s is not contained in the same procedure that contains L ∧ x ∉ Eout[s])
          then ignore s
          else if s = u then return T
          else NEXT ← NEXT ∪ {s}
          fi
        end for
      end until
      return F
    end

Figure 4.3: The parallelization algorithm (continued)

Procedure Ptest uses a straightforward algorithm that begins with node d and then spreads out, examining successors, successors of successors, and so on, until either there are no more acceptable nodes to examine, in which case F is returned, or all the requirements for path P have been met, in which case T is returned. The successors of a node are examined because normally a successor node is assumed to represent a possible continuation of the execution path from the point of the predecessor node. Exceptions in the algorithm involving entry and exit nodes are explained shortly. Note that Ptest only determines whether a satisfactory path P exists or not; it does not determine what path P is in terms of an actual node sequence, as there may be many such satisfactory paths P.

The else branches that initialize DONE in part1 and part2 are active when v is not an array. In this case, a path P that includes d, u, n, d, u in that order is not allowed, and this is prevented by marking the unwanted node u as done in part1 and the unwanted node d as done in part2. The test of x ∈ Bout[s] in part1 satisfies the requirement that the definition d can reach along the path P. A similar test is made in part2. In part1, only the B set is checked because there are no descents into called procedures, as entry nodes are rejected. Entry nodes are rejected in part1 because any path from d to n will not leave unreturned calls, since n is an outermost node relative to the loop body and the path is confined to the loop body. As the successors of each call node are an entry node and a return node, it is only necessary to check the out set of the return node to know whether the element x survived the call or not, and this is effectively done by the x ∈ Bout[s] test already mentioned. In part2, exit nodes are rejected because any path from n to u will not make a return without first making the call. This follows from the fact, already mentioned, that node n is an outermost node relative to the loop body and the path is confined to the loop body. As the return node can always be added to the path P from the call node, there is no need to add it from the exit node; hence the rejection of the exit node.

For part1 and part2 in procedure Ptest, each flowgraph node may appear only once in the NEXT set; hence the complexity of the Ptest procedure is O(n), where n is the number of flowgraph nodes. For the entire algorithm, step 3 dominates, so the
complexity is O(lpn), where l is the number of loops in the program, p is the number of du pairs in the program, and n is the number of flowgraph nodes.

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

5.1 Summary of Main Results

The first part of this work presented a new method for context-dependent, flow-sensitive interprocedural dataflow analysis. The method was shown to produce a precise, low-cost solution for such fundamental and important problems as reaching definitions and available expressions, regardless of the actual call structure of the program being analyzed. By using a separate set to isolate calling-context effects and another set to accumulate body effects, the calling-context problem has been reduced to the problem of solving the dataflow equations that compute the different sets. These equations can be solved by the iterative algorithm. As part of our method, the interprocedural kill effects of call-by-reference formal parameters are correctly handled by the equations-compatible technique of element recoding.

The importance of our interprocedural analysis method lies in the fact that a number of different applications depend on the solution of fundamental dataflow problems such as reaching definitions, live variables, definition-use and use-definition chains, and available expressions. Program revalidation, dataflow anomaly detection, compiler optimization, automatic vectorization and parallelization, and software tools that make a program more understandable by revealing data dependencies are some of the applications that may benefit by using our method.

The second part of this work presented new algorithms for precise interprocedural logical ripple effect and slicing. The algorithms use our interprocedural dataflow analysis method and add a control mechanism by which, in effect, execution-path history can affect execution-path continuation as the ripple effect or slice is built piece by piece. The importance of our algorithms for precise interprocedural logical ripple effect
and slicing lies in their applicability to the areas of software maintenance and debugging. A precise interprocedural logical ripple effect can be used to show a programmer the consequences of program changes, thereby reducing errors and maintenance cost. Similarly, a precise interprocedural slice can localize program faults, thereby saving programmer effort and debugging cost.

The third part of this work presented an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm makes use of our interprocedural dataflow analysis method to determine data dependencies. It then examines the data dependencies within each loop and determines if any of these data dependencies are loop-carried, in which case parallelization of the loop is inhibited. The algorithm has potential use in parallelization tools.

5.2 Directions for Future Research

There are several topics of possible future research related to our method for interprocedural dataflow analysis. Regarding solving the equations, besides the iterative algorithm there are elimination algorithms [?] that have better complexity. Further studies are needed to determine to what extent these other algorithms can be used to solve the equations. Another topic regards the dataflow problems that can be solved by our method, as the actual universe of solvable problems remains to be determined. We have only mentioned a few of the better-known problems. For some dataflow problems, it may be that our method can be used after suitable modification to adapt it to the special needs of the problem.

Our algorithms for precise interprocedural logical ripple effect and slicing may overestimate, either when recursive calls are present or because the Allow set lacks the information needed to enforce the ordering of unmatched returns. One area of future research would therefore be to investigate the possibility of modifying the definition and theorems on which these algorithms are based, and the algorithms themselves, so as to remove the possibility of such overestimation.

REFERENCES

[1] Agrawal, H., and Horgan. Dynamic program slicing. Proceedings of the SIGPLAN Conference on Programming Language Design and Implementation, ACM SIGPLAN Notices (June).
[2] Aho, A., Sethi, R., and Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA.
[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress, North-Holland, Amsterdam.
[4] Banning. An efficient way to find the side effects of procedure calls and the aliases of variables. Conference Record of the ...th ACM Symposium on Principles of Programming Languages, ACM, New ...
...
[?] Johmann, Liu, S., and ...

BIOGRAPHICAL SKETCH

Kurt Johmann was born in Elizabeth, New Jersey, in November 1955. He received a BA in computer science from Rutgers University in New Jersey. Following graduation, he worked for a shipping company, Sea-Land Service, Inc., as a programmer and systems analyst. He later left Sea-Land and did PC work for three years. Following this, he entered the graduate program of the Computer and Information Sciences Department at the University of Florida in the fall. He received an MS in computer science in December and entered the PhD program. Anticipating graduation, he hopes to find a job in academia.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stephen S. Yau

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1992

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

CONTEXT-DEPENDENT FLOW-SENSITIVE INTERPROCEDURAL DATAFLOW ANALYSIS AND ITS APPLICATION TO SLICING AND PARALLELIZATION

By

Kurt Johmann

May 1992

Chairman: Dr. Stephen S. Yau
Major Department: Computer and Information Sciences

Interprocedural dataflow analysis is important in compiler optimization, automatic vectorization and parallelization, program revalidation, dataflow anomaly detection, and software tools that make a program more understandable by showing data dependencies. These applications require the solution of dataflow problems such as reaching definitions, live variables, available expressions, and definition-use chains. When solving these problems interprocedurally, the context of each call must be taken into account. In this dissertation we present a method to solve this kind of dataflow problem precisely. The method consists of special dataflow equations that are solved for a program flowgraph. Regarding calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure. The body set contains effects that result from statements in the procedure.
By isolating calling-context effects in the entry set, a call's nonkilled calling context is preserved by means of a simple intersection operation done at the return node for the call.

Slicing determines program pieces that can affect a value. Logical ripple effect determines program pieces that can be affected by a value. Both slicing and logical ripple effect are useful for software maintenance. The problems of slicing and logical ripple effect are inverses of each other, and a solution of either problem can be inverted to solve the other. Precise interprocedural logical ripple effect analysis is complicated by the fact that an element may be in the ripple effect by virtue of one or more specific execution paths. In this dissertation we present an algorithm that builds a precise logical ripple effect or slice piece by piece, taking into account the possible execution paths. The algorithm makes use of our interprocedural dataflow analysis method, and this method is also used in an algorithm given in this dissertation for identifying loops that can be parallelized.

CHAPTER 1
INTRODUCTION

1.1 Interprocedural Dataflow Analysis

Dataflow analysis refers to a class of problems that ask about the relationships that exist, along a program's possible execution paths, between such program elements as variables, constants, and expressions [2, 10]. When dataflow analysis is done for a program by treating its individual procedures as being independent of each other, regardless of the calls made, this is known as intraprocedural analysis. For intraprocedural analysis, assumptions must be made about the effects of calls. By contrast, interprocedural analysis replaces assumptions with specific information about the effects of each call. This information can be gathered by either flow-sensitive [3, 6, 9, 17, 19, 21] or flow-insensitive [4, 7, 18] analysis.
When answering a dataflow question, a flow-sensitive analysis will take into account the flow paths within procedures, whereas a flow-insensitive analysis ignores these flow paths. The flow paths are the possible execution paths. Flow-sensitive analysis typically provides more precise information, but at greater cost.

Flow-sensitive interprocedural dataflow analysis has two major problems that make it significantly harder than intraprocedural analysis. First, in intraprocedural analysis, it is assumed that any path in the flowgraph is a possible execution path. By contrast, for interprocedural analysis, it is useful to assume that the possible execution paths conform to the rule that once a procedure is entered by a call, the flow returns to that call upon return. Thus, the set of possible execution paths will typically be a proper subset of the paths in the program flowgraph. This problem will be referred to as the calling-context problem. Second, call-by-reference formal parameters typically cause alias relationships between actual and formal parameters that are valid only for certain calls and apply only to those passes through the called procedure that originate from those calls that establish the specific alias relationship.

There are many applications for a flow-sensitive interprocedural dataflow analysis method that solves the two major problems, assuming that the costs of the method are not too high. Some of the well-known dataflow problems that can be precisely solved by such a method are reaching definitions, live variables, the related problems of definition-use and use-definition chains, and available expressions. Applications that require the solution of one or more of these dataflow problems include compiler optimization, automatic vectorization and parallelization of program code, program revalidation, dataflow anomaly detection, and software tools that show data dependencies.
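The calling-context rule can be illustrated with a small check. The following Python sketch is not part of the method presented in this dissertation. It encodes a flowgraph path as a list of steps and assumes the path starts at the program's entry, so a return with no pending call is always invalid. It then uses a stack to test whether every return goes back to the most recent unreturned call. The names `is_realizable`, `c1`, `c2`, and `P` are invented for the example.

```python
from typing import List, Tuple

# A path step is (kind, label): kind is "call", "return", or "node".
# For "call" and "return" steps, label names the call site, so the step
# ("return", "c1") means control returns to call site c1.
Step = Tuple[str, str]


def is_realizable(path: List[Step]) -> bool:
    """Check the calling-context rule for a path that starts at the
    program's entry: each return must go back to the most recent call
    that has not yet been returned to."""
    pending: List[str] = []  # stack of unreturned call sites
    for kind, label in path:
        if kind == "call":
            pending.append(label)
        elif kind == "return":
            # An empty stack or a mismatched call site violates the rule.
            if not pending or pending.pop() != label:
                return False
    return True


# Procedure P is called from two sites, c1 and c2. Both paths below
# follow arcs of the flowgraph, but only the first is a possible
# execution path.
valid = [("node", "main_entry"), ("call", "c1"), ("node", "P_entry"),
         ("node", "P_exit"), ("return", "c1"), ("node", "after_c1")]
invalid = [("node", "main_entry"), ("call", "c1"), ("node", "P_entry"),
           ("node", "P_exit"), ("return", "c2"), ("node", "after_c2")]

print(is_realizable(valid))    # True
print(is_realizable(invalid))  # False
```

Checking paths one at a time like this does not scale, because the number of flowgraph paths can grow exponentially. The method developed here avoids explicit path tracking altogether. It instead keeps calling-context effects in separate entry sets, as described next.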
In this dissertation we present a new method for flow-sensitive interprocedural dataflow analysis that solves the two major problems, and does so at a comparatively low cost [13]. The method consists of special dataflow equations that are solved for a program flowgraph. In deference to calling context, separate sets, called entry and body sets, are maintained at each node in the flowgraph. The entry set contains calling-context effects that enter a procedure. The body set contains effects that result from statements in the procedure. By isolating calling-context effects in the entry set, a call's nonkilled calling context is preserved by means of a simple intersection operation done at the return node for the call. The main advantages of our method are its low complexity and the fact that the presence of recursion does not affect the precision of the result.

The language model assumed for Chapter 2 allows global variables, but the visibility of each formal parameter is limited to the single procedure that declares it. Thus, with the exception of a call and its indirect reference, each formal parameter can only be referenced inside a single procedure. Examples of programming languages that fit this model are C and FORTRAN. This restriction on the visibility of formal parameters is imposed for the sake of the discussions of element recoding in Sections 2.2.2 and 2.3, of implicit definitions in Section 2.2.3, and of worst-case complexity in Section 2.5. Our method can also be used for the alternative language model that allows each formal parameter to have visibility in more than a single procedure, but this is considered only briefly at the end of Section 2.5.

1.2 Slicing and Logical Ripple Effect

Given an actual or hypothetical variable v at program point p, determine all program pieces that can possibly be affected by the value of v at p. This is the logical ripple effect problem.
Given v and p, determine all program pieces that can possibly affect the value of v at p. This is the slicing problem. These two problems are inverses of each other: a solution for one of them, once inverted, would be a solution for the other.

Logical ripple effect is useful for helping a programmer understand how a program change, either actual or hypothetical, will impact that program. Making program changes as part of routine maintenance often introduces new errors into the changed program. Such errors typically result because the programmer overlooked some part of the logical ripple effect for that change. Showing a programmer the actual logical ripple effect of a program change can help avoid such mistakes.

Slicing is primarily useful for program fault localization [23]. If a variable v at point p is known to have a wrong value, then a slice on v at p will narrow the search for the cause of the error to that part of the program that can truly affect v at p. Thus, the fault is localized. The more precise the slice, the more localized the cause of the error, saving programmer time.

In this dissertation we are concerned only with static logical ripple effect and slicing [11, 12, 16, 24], where the ripple effect or slice is determined from dataflow analysis of the program text. The alternative approach is dynamic logical ripple effect and slicing [1, 14], where the ripple effect or slice is determined by actually executing the program. Whenever we speak of execution paths in Chapter 3, we always mean possible execution paths as determined by dataflow analysis.

Precise interprocedural logical ripple effect analysis is complicated by the fact that a definition may be added to the ripple effect because of one or more specific execution paths.
To determine in turn the ripple effect of that added definition, that definition should be constrained to those execution paths that are the possible continuations of the execution paths along which that definition was itself affected and thereby added to the ripple effect. We refer to this as the execution-path problem. In particular, the difficulty is caused by those call instances made in an execution path P that have not been returned to in P. This is because of the rule that a called procedure returns to its most recent caller: any continuation of the execution path P must first return to those unreturned calls in P before returns can possibly be made to call instances that precede P. An example will illustrate the problem.

procedure main
begin
    1: f ← 7
    2: call A
    3: z ← x
    4: f ← 1
    5: call B
end

procedure B
begin
    6: y ← f + 5
end

procedure A
begin
    7: call B
    8: x ← y
end

For the example, assume that all variables are global, and that the problem is to determine the logical ripple effect for the definition of variable f at line 4. The call to procedure B at line 5 allows the definition of f at line 4 to affect the definition of y at line 6, and procedure B would return to the call at line 5 by which the definition of y at line 6 was affected. The end result is that the ripple effect should include only line 6. However, assume that the execution-path problem is ignored and all returns are possible when the ripple effect is computed. For the same problem, the call at line 5 allows the definition of f at line 4 to affect the definition of y at line 6. Then the definition of y at line 6 affects the definition of x at line 8 by procedure B returning to the call at line 7 in addition to the call at line 5. Then the definition of x at line 8 affects the definition of z at line 3 by procedure A returning to the call at line 2.
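The contrast between the two computations can be reproduced mechanically. The Python sketch below is only an illustration, not the Chapter 3 algorithm: it exhaustively explores abstract execution paths of the example program, marks a definition as affected when it uses an affected variable (data dependence only), and either enforces the rule that a procedure returns to its most recent unreturned call or lets a return go to any call site of the procedure. The state encoding and the names `ripple` and `return_sites` are hypothetical, and with return matching enabled the search is guaranteed to terminate here only because the example has no recursion.

```python
# Example program: procedure bodies in statement order, labeled by line number.
PROCS = {"main": [1, 2, 3, 4, 5], "A": [7, 8], "B": [6]}
CALLS = {2: "A", 5: "B", 7: "B"}  # call line -> called procedure
# assignment line -> (defined variable, set of used variables)
ASSIGN = {1: ("f", set()), 3: ("z", {"x"}), 4: ("f", set()),
          6: ("y", {"f"}), 8: ("x", {"y"})}


def proc_of(line):
    return next(p for p, body in PROCS.items() if line in body)


def return_sites(callee):
    """Positions just after every call to `callee` (index len(body) = procedure end)."""
    sites = []
    for call, target in CALLS.items():
        if target == callee:
            proc = proc_of(call)
            sites.append((proc, PROCS[proc].index(call) + 1))
    return sites


def ripple(start_line, match_returns):
    """Lines whose definitions can be affected by the definition at start_line."""
    proc = proc_of(start_line)
    work = [(proc, PROCS[proc].index(start_line) + 1, (),
             frozenset({ASSIGN[start_line][0]}))]
    seen, affected = set(), set()
    while work:
        state = work.pop()
        if state in seen:
            continue
        seen.add(state)
        proc, idx, stack, tainted = state
        body = PROCS[proc]
        if idx == len(body):  # procedure end: return to a caller
            if stack:  # return to the most recent unreturned call
                (ret_proc, ret_idx), rest = stack[-1], stack[:-1]
                work.append((ret_proc, ret_idx, rest, tainted))
            else:  # no pending call recorded: any call site is a possible target
                work.extend((p, i, (), tainted) for p, i in return_sites(proc))
            continue
        line = body[idx]
        if line in CALLS:
            # The return position is remembered only when returns must match calls.
            pushed = stack + ((proc, idx + 1),) if match_returns else stack
            work.append((CALLS[line], 0, pushed, tainted))
            continue
        defined, used = ASSIGN[line]
        if used & tainted:
            affected.add(line)
            tainted = tainted | {defined}
        else:
            tainted = tainted - {defined}  # an unaffected redefinition kills the effect
        work.append((proc, idx + 1, stack, tainted))
    return affected


print(sorted(ripple(4, match_returns=True)))   # [6]
print(sorted(ripple(4, match_returns=False)))  # [3, 6, 8]
```

With return matching, only line 6 is affected; letting procedure B return to any of its call sites adds lines 8 and 3, which is the overestimate discussed in the example.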
The end result is a ripple effect that includes lines 3, 6, and 8, but only line 6 should be in the ripple effect.

Although there are a number of papers on logical ripple effect and slicing [11, 12, 16, 24], there appears to be only one [11] that addresses the problem of precise interprocedural logical ripple effect and slicing and presents a method for it. Weiser [24] was the first to propose an interprocedural slicing method that ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] address the problem of precise interprocedural slicing, and present a method to construct a system dependence graph from which slices can be extracted. In this dissertation we present an algorithm that builds the logical ripple effect piece by piece, and takes into account the restrictions on execution-path continuation that are imposed by the preceding execution paths up to the point by which the given program piece is affected and thereby included in the ripple effect. In general, the algorithm computes a precise logical ripple effect, but some overestimation is possible, meaning that the computed logical ripple effect may be larger than the actual one. An inverse form of the algorithm is presented for the slicing problem. The languages that our algorithm will work for include many of the common procedural languages, such as C, Pascal, Ada, and Fortran.

1.3 Parallelization

Automatic conversion of a sequential program into a parallel program is often referred to as parallelization. Parallelization problems are typically concerned with the conversion of sequential loops into parallel code. In this dissertation, the specific problem considered is the identification of loops in a program that can be parallelized, including those loops that contain calls.
A flow-sensitive interprocedural dataflow analysis method has specific applicability to the problem of parallelizing loops that contain calls, because such a method can supply the precise data-dependency information that would be necessary for the parallelization analysis. The parallelization of a loop would mean that each iteration of the loop can be executed independently of the other iterations of the loop. In theory, this would mean that each single iteration, or each arbitrary block of iterations, can be assigned to a separate processor in a parallel machine. In any parallelization tool, the decision as to which loops can actually be parallelized, and how they would be parallelized, will be influenced by the architecture of the particular parallel machine, by the programming language to be parallelized, and by the various loop transformations that can convert sequential loop code into functionally equivalent sequential code that is more parallelizable. However, none of these architecture, language, and loop-transformation issues will be considered here. Instead, the problem will be considered solely from the standpoint of data dependence. After a brief review of the basics regarding data dependence and parallelization, an algorithm is given that identifies loops in a program that can be parallelized, and this algorithm uses our interprocedural dataflow analysis method as an integral part. The potential value of parallelization is clear. On the one hand, parallel machines are becoming more common, and on the other hand, a great number of sequential programs already exist, some of which can benefit from the greater processing power that parallelization would offer.

1.4 Literature Review

Different methods have been offered for solving various flow-sensitive interprocedural dataflow analysis problems. Sharir and Pnueli [21] present a method they name call-strings.
The essential idea of their method is to accumulate for each element a history of the calls traversed by that element as it flows through the program flowgraph. The call history associated with an element is used whenever that element is at a return point. The element can only cross back to those calls in its call history. Thus, the call-strings approach provides a solution to the calling-context problem. However, the disadvantage of this approach is the time and space needed to maintain a call history for each element at each flowgraph node. Let $l$ be the program size. We assume that the number of elements will be a linear function of $l$. The worst-case number of total set operations required by the call-strings approach would be greater by a factor of $l$ when compared to our method. This is because for each union or intersection of two sets of elements, if the same element is in both sets, then a union operation must also be done for the two associated call histories so as to get the new call history to be associated with that element at the node for which the set operation is being done. A further disadvantage of the call-strings approach is the need to include the associated call histories when set stability is tested to determine termination for the iterative algorithm used to solve the dataflow equations. Myers [17] offers a solution to the calling-context problem that is essentially the same as call-strings.

Allen [3] presents a different method for interprocedural dataflow analysis. The method analyzes each procedure completely, in reverse invocation order. The first procedures to be analyzed would be those that make no calls, then the procedures that only call these procedures would be analyzed, and so on. Once a procedure is analyzed, its effects can be incorporated into those procedures that call it, when they in turn are analyzed. The obvious drawback of this method is that it cannot be used to analyze recursive calls.
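To make the extra bookkeeping of call histories concrete, here is a deliberately simplified Python model of the call-string idea as summarized above; it is not Sharir and Pnueli's actual formulation. Each element carries a set of call strings (tuples of call-site names), entering a procedure appends the call site, and an element may cross back only to a call that ends one of its strings. The names `tagged_union`, `enter`, and `leave` and the data layout are illustrative assumptions, and elements generated inside the called procedure are not modeled.

```python
from typing import Dict, FrozenSet, Tuple

CallString = Tuple[str, ...]
Tagged = Dict[str, FrozenSet[CallString]]  # element -> its set of call strings


def tagged_union(a: Tagged, b: Tagged) -> Tagged:
    """Set union; an element present in both sets also has its histories united."""
    out = dict(a)
    for elem, strings in b.items():
        out[elem] = out.get(elem, frozenset()) | strings
    return out


def enter(facts: Tagged, call: str) -> Tagged:
    """Flowing from a call node into the called procedure extends every call string."""
    return {e: frozenset(s + (call,) for s in ss) for e, ss in facts.items()}


def leave(facts: Tagged, call: str) -> Tagged:
    """At a return point, keep only strings whose most recent call is `call`."""
    out: Tagged = {}
    for e, ss in facts.items():
        back = frozenset(s[:-1] for s in ss if s and s[-1] == call)
        if back:
            out[e] = back
    return out


# d1 reaches the called procedure through call site c1, d2 through c2.
at_entry = tagged_union(enter({"d1": frozenset({()})}, "c1"),
                        enter({"d2": frozenset({()})}, "c2"))
print(leave(at_entry, "c1"))  # {'d1': frozenset({()})}
```

Every confluence on a shared element costs an extra union of histories, and the fixed-point test must compare histories as well as elements, which is the overhead the text attributes to the approach.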
Rosen [19] presents a complex method for interprocedural dataflow analysis that is limited to solving the problems of variable modification, preservation, and use. These dataflow problems do not require a solution of the calling-context problem.

Callahan [6] has proposed the program summary graph to solve the interprocedural dataflow problems of kill and use, where kill determines all definite kills that result from a procedure call, and use determines all variables that may be used as a result of a procedure call before being redefined. As part of the determination of edges in the program summary graph, intraprocedural reaching-definitions analysis must be done for each procedure. Simplifying Callahan's space complexity analysis, we get $O(v_{ga} \cdot l)$ as the worst-case size of the program summary graph, where $v_{ga}$ is the number of global variables in the program plus the average number of actual parameters per call, and $l$ is the program size. One limitation of Callahan's method is that it does not correctly handle multiple aliases that result when the same variable is used multiple times as an actual parameter in the same call and the corresponding formal parameters are call-by-reference. By contrast, our method, using element recoding where all the aliases are encoded in a single element, will correctly handle the multiple aliases problem.

Callahan's method offers no solution to the calling-context problem, and could not be used to determine, for example, interprocedural reaching definitions. However, Harrold and Soffa [9] have extended his method so that interprocedural reaching definitions can be determined. They use an interprocedural flowgraph, denoted IFG, that is very similar to the program summary graph. The IFG has inter-reaching edges that are determined by solving Callahan's kill problem. They recommend using his method, so their method inherits Callahan's space and time complexity, as well as its limitation with regard to multiple aliases.
Before the IFG can be used, it must be decorated with the results of intraprocedural analysis done twice for each procedure to determine both reaching definitions and upwardly exposed uses. Then an algorithm is used to propagate the upwardly exposed uses throughout the IFG. This algorithm has worst-case time complexity of $O(n^2)$, where $n$ is the number of nodes in the IFG. Their graph will have the same number of nodes as Callahan's graph, meaning worst-case graph size will be $O(v_{ga} \cdot l)$. Substituting $v_{ga} \cdot l$ for $n$, we get a worst-case time complexity of $O(v_{ga}^2 \cdot l^2)$. As the size of our flowgraph is proportional to the size of the program, the worst-case time complexity for solving our equations is only $O(l^2)$.

Weiser [24] was the first to propose an interprocedural slicing method that ignores the execution-path problem and thereby suffers from the resulting loss of precision. Horwitz et al. [11] have presented a method to compute the more precise slice explained in the Introduction. However, they use a more restricted definition of a slice. Their slice is all statements and predicates that may affect a variable v at program point p, such that v is defined or used at point p. Their method consists of constructing a specialized graph called a system dependence graph. Nodes in this graph represent program pieces such as statements, and the edges in the graph represent control or data dependencies. Edges representing transitive data dependencies that are due to procedure calls are computed by first modeling each procedure and its calls with an attribute grammar called a linkage grammar, and then solving the grammar so as to determine the transitive data dependencies represented by it. Once the system dependence graph is complete, any slice based on an actual definition or use occurring at any point p in the program can be extracted from the graph. A major weakness of their method is that it does not allow a hypothetical use to be the starting point of the slice.
The complexity of constructing the system dependence graph is given as $O(G \cdot X^2 \cdot D^2)$, where G is the total number of procedures and calls in the program, X is the total number of global variables in the program plus a term that can be considered a constant, and D is a linear function of X. Once the system dependence graph is complete, any particular slice that is wanted can be extracted from the graph at complexity $O(n)$, where n is the size of the graph. The size of the graph is roughly quadratic in program size, being bounded by $O(P \cdot (V + E) + T \cdot X)$, where P is the number of procedures, V is the largest number of predicates and definitions in a single procedure, E is the largest number of edges in a procedure dependence graph, T is the number of calls in the program, and X is the number of global variables.

In their paper, much is made of the fact that once the graph is complete, any slice on an actual definition or use can be extracted from the graph at $O(n)$ cost, where n is the size of the graph. However, the number of actual definition and use occurrences in a program is proportional to the program size L. Therefore, any method that can compute a slice at cost $O(Z)$ for some Z can generate all the slices contained in their graph at cost $O(Z \cdot L)$, spool the slices to disk, and recover them at cost $O(1)$.

Although there are many papers on slicing, it seems that only Horwitz et al. [11] clearly discuss the problem of the more precise interprocedural slice and present a method to compute it, as well as providing complexity analysis. Our research on slicing is only concerned with computing the more precise slice, so Horwitz et al. is the principal reference.

Zima and Chapman [25] is the principal reference used to study the issues and methods of parallelization. Their book distills the work found in scores of papers and dissertations, and is an excellent survey of parallelization.
Interprocedural parallelization is specifically considered by Burke and Cytron [5], and by Triolet et al. [22].

1.5 Outline in Brief

This introductory chapter ends with a brief synopsis of the remaining chapters. Chapter 2 presents in detail our interprocedural dataflow analysis method. The chapter ends with a brief description of the prototypes that were built to demonstrate the method, along with some of the experimental results obtained from these prototypes. Chapter 3 begins with a representation scheme for continuation paths for the interprocedural logical ripple effect problem and then presents our interprocedural logical ripple effect algorithm. A prototype that was built to demonstrate this algorithm is briefly described and experimental results are presented. An inversion of the logical ripple effect algorithm is then presented as a solution to the interprocedural slicing problem. Chapter 4 begins with an explanation of loop-carried data dependence and its relevance to parallelization, and concludes with an algorithm that identifies loops that can be parallelized, including loops that contain calls. Chapter 5 summarizes the major results of the dissertation, and suggests directions for future research.

CHAPTER 2
THE INTERPROCEDURAL DATAFLOW ANALYSIS METHOD

2.1 Constructing the Flowgraph

This section discusses the flowgraph and its relationship to dataflow equations. After the discussion, rules are given for constructing the specific flowgraph required by our interprocedural analysis method. Note that the required flowgraph is conventional and the rules to be given relate only to the representation of calls and procedures in the flowgraph.

A flowgraph is a directed graph that represents the possible flow paths of a program. The nodes of a flowgraph correspond to basic blocks in the program. A basic block is a sequence of program code that is always executed together in the same order.
The directed edges of a flowgraph represent possible transfers of control. Figures 2.1 and 2.3 each represent a flowgraph.

Dataflow problems are often formulated as a set of equations that relate the four sets, IN, OUT, GEN, and KILL, that are associated with each node in the flowgraph. For any node and its block, the GEN set represents the elements generated by that block. The KILL set represents those elements that cannot flow through the block, because they would be killed by the block. The IN set represents the valid elements at the start of the block, and the OUT set represents the valid elements at the end of the block. Dataflow problems are typically either forward-flow or backward-flow. For forward-flow, the IN set of a node is computed as the confluence of the OUT sets of the predecessor nodes, and the OUT set is a function of the node's IN, GEN, and KILL sets. For backward-flow, the OUT set of a node is computed as the confluence of the IN sets of the successor nodes, and the IN set is a function of the node's OUT, GEN, and KILL sets. The predecessors of any node n are those nodes that have an out-edge directed to node n. The successors of node n are those nodes that have an in-edge directed from node n. The confluence operator will almost invariably be either set union or set intersection, depending on the problem. Thus, a dataflow problem may be classified as being either forward-flow-or, forward-flow-and, backward-flow-or, or backward-flow-and, where "or" refers to set union and "and" refers to set intersection.

Once the dataflow equations have been defined for a particular problem, and the rules established for creating the GEN and KILL sets, the equations can then be solved for a specific program or procedure and its representative flowgraph. To solve the equations, the iterative algorithm can be used. The iterative algorithm has the advantage that it will work for any flowgraph.
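The four problem classes and the iterative algorithm can be illustrated with a single generic intraprocedural solver, sketched below in Python. It is a textbook-style illustration rather than code from this dissertation: the function name `solve`, the round-robin visiting order, and the conventions that nodes without predecessors (in the flow direction) start empty while "and" problems start from a caller-supplied universe are all assumptions. A backward problem is handled as a forward problem on the reversed flowgraph, with the roles of IN and OUT swapped.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def solve(nodes: List[int], edges: List[Tuple[int, int]],
          gen: Dict[int, Set[str]], kill: Dict[int, Set[str]],
          forward: bool = True, use_union: bool = True,
          universe: FrozenSet[str] = frozenset()):
    """Iterate until no set changes; return (IN, OUT) for every node.

    forward=False solves a backward-flow problem; use_union=False selects
    intersection ("and") instead of union ("or") as the confluence operator.
    """
    preds = {n: [a for a, b in edges if b == n] for n in nodes}
    succs = {n: [b for a, b in edges if a == n] for n in nodes}
    sources = preds if forward else succs  # backward = forward on reversed edges
    start = set() if use_union else set(universe)
    meet_set = {n: set() for n in nodes}           # confluence (IN if forward)
    transfer_set = {n: set(start) for n in nodes}  # transfer (OUT if forward)
    changed = True
    while changed:
        changed = False
        for n in nodes:
            incoming = [transfer_set[s] for s in sources[n]]
            if not incoming:
                new_meet = set()  # boundary node in the flow direction
            elif use_union:
                new_meet = set().union(*incoming)
            else:
                new_meet = set.intersection(*incoming)
            new_transfer = (new_meet - kill.get(n, set())) | gen.get(n, set())
            if (new_meet, new_transfer) != (meet_set[n], transfer_set[n]):
                meet_set[n], transfer_set[n] = new_meet, new_transfer
                changed = True
    if forward:
        return meet_set, transfer_set
    return transfer_set, meet_set  # backward: IN is the transfer result


# Reaching definitions (forward-flow-or) on a loop 1 -> 2 -> 3 -> 2,
# where d1 and d3 define the same variable and d2 defines another.
IN, OUT = solve([1, 2, 3], [(1, 2), (2, 3), (3, 2)],
                gen={1: {"d1"}, 2: {"d2"}, 3: {"d3"}},
                kill={1: {"d3"}, 3: {"d1"}})
print(sorted(IN[2]), sorted(OUT[3]))  # ['d1', 'd2', 'd3'] ['d2', 'd3']
```

The same function solves an available-expressions style problem with `use_union=False` and a universe, or live variables with `forward=False`; only the direction and the confluence operator change.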
The iterative algorithm repeatedly computes the IN and OUT sets for all nodes until all sets have stabilized and ceased to change. Recomputation of a node is necessary whenever an outside set that it depends on changes. For forward-flow problems, a node must be recomputed if the OUT set of a predecessor node changes. For backward-flow problems, a node must be recomputed if the IN set of a successor node changes. Typically, an evaluation strategy will determine the actual order in which nodes are recomputed.

The flowgraph required by our interprocedural analysis method is conventional, with special nodes and edges as follows. For each procedure in the program, assign an entry node and an exit node. These nodes have no associated blocks of program code. The entry node has a single out-edge and as many in-edges as there are calls to that procedure in the program. The exit node has as many in-edges as there are nodes for that procedure whose blocks terminate with a return action. The exit node has as many out-edges as there are calls to that procedure in the program. For every in-edge of the entry node, there is a corresponding out-edge of the exit node.

For the purpose of constructing the flowgraph, calls must be classified as either known or unknown. A known call is one where the flowgraph for the called procedure will be a part of the total flowgraph being constructed. An unknown call is one where the flowgraph of the called procedure will not be a part of the total flowgraph being constructed. Unknown calls are common and will occur for two reasons. First, the called procedure may be a compiler-library procedure for which source code is not available. Second, the called procedure may be a separately compiled user procedure for which the source code is not available. For any unknown call made within the program, if summary information of its interprocedural effects is not available, then conservative assumptions about its effects will have to be made.
The actual summary information needed, and the assumptions made in its absence, will depend on the particular dataflow problem. The summary information, if present, would be used when constructing the GEN and KILL sets for any node whose block contains an unknown call.

For any known call made within the program, there will be two nodes in the flowgraph for that call. One node is the call node. The call node represents a basic block that ends with the known call. The other node is the return node. The return node has an empty associated block. The call node will have two out-edges. One edge will be directed to the entry node of the called procedure. The other out-edge will be directed to the return node for that call. The return node will have two in-edges. One edge is the directed edge from the call node. The other in-edge is directed from the called procedure's exit node. In all, each known call results in two nodes and three distinct edges. One edge connects the call node to its return node. A second edge connects the call node to the called procedure's entry node. A third edge connects the called procedure's exit node to the return node.

In constructing the flowgraph, a special problem arises if the programming language allows procedure-valued variables, such as the function pointers of C that, when dereferenced, result in a call of the function that is pointed at. The problem is to identify the possible procedure values when the procedure-valued variable invokes a call. Assuming this information is available from a separate analysis, the flowgraph can be constructed accordingly. For example, if the procedure-valued variable can have three different values when the call in question is invoked and each value is a procedure whose flowgraph will be part of the total flowgraph, then three known calls would be constructed in parallel with a common predecessor node for the three call nodes and a common successor node for the three return nodes.
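As a rough illustration of these construction rules, the Python sketch below records flowgraph edges as pairs of node names. The naming scheme (`entry:P`, `exit:P`, `return:...`) and the helper names are hypothetical; for a procedure-valued call, the sketch assumes that the set of possible target procedures has already been computed by a separate analysis, as the text requires.

```python
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def add_procedure(edges: Set[Edge], proc: str, first: str, returning: List[str]) -> None:
    """Entry node with a single out-edge, plus an in-edge to the exit node
    from every node whose block ends with a return action."""
    edges.add((f"entry:{proc}", first))
    for node in returning:
        edges.add((node, f"exit:{proc}"))


def add_known_call(edges: Set[Edge], call_node: str, callee: str) -> str:
    """Add a return node and the three edges of one known call; return its name."""
    ret = f"return:{call_node}"
    edges.add((call_node, f"entry:{callee}"))  # call node -> callee entry node
    edges.add((call_node, ret))                # call node -> its own return node
    edges.add((f"exit:{callee}", ret))         # callee exit node -> return node
    return ret


def add_indirect_call(edges: Set[Edge], pred: str, succ: str, site: str,
                      targets: Iterable[str]) -> None:
    """Procedure-valued call: one known call per possible target, built in
    parallel between a common predecessor and a common successor node."""
    for target in targets:
        call_node = f"call:{site}:{target}"
        edges.add((pred, call_node))
        edges.add((add_known_call(edges, call_node, target), succ))


edges: Set[Edge] = set()
add_procedure(edges, "f", "f.1", ["f.1"])
add_known_call(edges, "main.1", "f")
add_known_call(edges, "main.2", "f")
# Every in-edge of f's entry node has a matching out-edge of f's exit node.
print(sorted(a for a, b in edges if b == "entry:f"))  # ['main.1', 'main.2']
print(sorted(b for a, b in edges if a == "exit:f"))   # ['return:main.1', 'return:main.2']
```

Keeping the return node separate from the call node is what later lets the dataflow equations treat effects that bypass the call differently from effects that come back through the called procedure's exit node.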
A procedure-valued variable is in essence a pointer. Note that the problem of determining what a pointer is or may be pointing at when that pointer is dereferenced can itself be formulated as a dataflow problem, and in particular as a forward-flow-or dataflow problem. If necessary, an initial version of the flowgraph could be constructed that treats all calls invoked by procedure-valued variables as unknown calls; the dataflow problem of determining the possible pointer values at each dereference would then be solved, and the flowgraph amended using the resulting pointer-value information.

Dataflow analysis makes a simplifying, conservative assumption about the correspondence between paths in the flowgraph and possible execution paths in the program. Let a path be a sequence of flowgraph nodes such that in the sequence node n follows node m only if n is a successor of m in the flowgraph. For intraprocedural analysis, the assumption made is that any path in the flowgraph is a possible execution path. That this assumption may not be true for a particular program should be obvious. However, the problem of determining the possible execution paths for an arbitrary program is known to be undecidable. The simplifying assumption that we use for interprocedural analysis is the same as that used for intraprocedural analysis, but with the added proviso that for any path that is a possible execution path, any subsequence of return nodes must inversely match, if present, the immediately preceding subsequence of call nodes. A return node matches a call node if and only if the return node is the call node's successor in the flowgraph.

2.2 Interprocedural Forward-Flow-Or Analysis

This section begins with our basic approach to solving the calling-context problem. The dataflow equations for forward-flow-or analysis are then given and their correctness is shown.
As a part of our interprocedural analysis method, the technique of element recoding is presented as a way to deal with the aliases that result from call-by-reference formal parameters. For some dataflow problems, implicit definitions due to calls require explicit treatment, and this is discussed last.

If certain problems, such as reaching definitions, are to be solved for a program by flow-sensitive interprocedural analysis, then the calling context of each procedure call must be preserved. In general, preserving calling context means that the dataflow effects of an individual call should include those effects that survive the call and were introduced into the called procedure by the call itself, but not those effects introduced into the called procedure by all the other calls to it that may exist elsewhere in the program. We refer to the need to preserve calling context as the calling-context problem.

Our solution to the calling-context problem, and the essential difference between our dataflow equations and conventional dataflow equations, is to divide every IN set and every OUT set into two sets, called an entry set and a body set. The reason for having two sets is that the calling-context effects that enter a procedure from the different calls can be collected and isolated in the separate entry set. This entry set can then have effects in it killed by statements in the body of the procedure, but no additions are made to this entry set by body statements. Instead, any additions of effects due to body statements are made to the separate body set. This body set will also have effects killed in the normal manner, as for the entry set. Because the body set is kept free of calling-context effects, it is empty at the entry node. By contrast, the entry set is at its largest at the entry node and will either stay the same size as it progresses through the procedure's body nodes, or become smaller because of kills.
Intersecting the calling context at a call node with the entry set at the exit node of the called procedure yields the subset of the calling context that has reached the exit node and will therefore reach the return node for that call. By "reach" we mean that there exists a path in the flowgraph along which the element is not killed or blocked.

2.2.1 The Dataflow Equations

The dataflow equations that define the entry and body sets at every node are now given. The equations are divided into three groups. The first group computes the sets for entry nodes. The second group computes the sets for return nodes. The third group computes the sets for all other nodes. In the equations, B denotes a body set and E denotes an entry set. Two conditions, $C_1$ and $C_2$, appear in the equations. $C_1$ means that x will cross the interprocedural boundary from call node p into the called procedure. $C_2$ means that x can cross the interprocedural boundary from exit node q into return node n. $\bar{C}_1$ means not $C_1$. For each node n, pred(n) means the set of predecessors of n. The RECODE set used in Group I is explained in Section 2.2.2. The GEN set used in Group I, and the GEN and KILL sets used in Group II, are explained in Section 2.2.3.

For any node n:

$$IN[n] = E_{in}[n] \cup B_{in}[n] \qquad OUT[n] = E_{out}[n] \cup B_{out}[n]$$

Group I: n is an entry node.

$$
\begin{aligned}
B_{in}[n] &= \emptyset \\
E_{in}[n] &= \bigcup_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\} \\
B_{out}[n] &= GEN[n] \\
E_{out}[n] &= E_{in}[n] \cup RECODE[n]
\end{aligned}
$$

Group II: n is a return node, p is the associated call node, and q is the exit node of the called procedure.

$$
\begin{aligned}
B_{in}[n] &= \{x \mid (x \in B_{out}[p] \wedge (\bar{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\} \\
E_{in}[n] &= \{x \in E_{out}[p] \mid \bar{C}_1 \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q])\} \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}
$$

Group III: n is any other node.

$$
\begin{aligned}
B_{in}[n] &= \bigcup_{p \in pred(n)} B_{out}[p] \\
E_{in}[n] &= \bigcup_{p \in pred(n)} E_{out}[p] \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}
$$

The equations assume that the GEN and KILL sets for each call node will include only those effects for that call that occur prior to the entry of the called procedure. This requirement is necessary because the OUT set of the call node is used by the entry-node equation that constructs the entry set of the called procedure. Referring to conditions $C_1$ and $C_2$, the rules for deciding whether an effect crosses a particular interprocedural boundary will depend on two primary factors, namely the dataflow problem and the programming language. For example, for the reaching-definitions problem and a language such as FORTRAN, any definition of a global variable, and any definition of a variable that is used as an actual parameter whose corresponding formal parameter is call-by-reference, will cross. As a rule, an effect that crosses into a procedure because it might be killed will also cross back to the return node if it reaches the exit node of the called procedure.

Table 2.1 shows the result of solving the equations for the flowgraph of Figure 2.1. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is reaching definitions, and variable w is local while variables x, y, and z are global. Reaching definitions is the problem of finding all definitions of a variable that reach a particular use of that variable, for all variables and uses in the program. In Figure 2.1, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block.
Each defined variable is superscripted with an identifier that is the set element used in Table 2.1 to represent that definition.

procedure main
begin
    w¹ = 5
    x² = 10
    if (w > x)
        z⁴ = 10
        call f()
    else
        y³ = 5
        call f()
end

procedure f()
begin
    x⁵ = 10
end

Figure 2.1. A reaching-definitions example.

Table 2.1. Solution of forward-flow-or equations for Figure 2.1.

| Node | $E_{in}$ | $E_{out}$ | $B_{in}$ | $B_{out}$ |
|---|---|---|---|---|
| 1 | ∅ | ∅ | ∅ | ∅ |
| 2 | ∅ | ∅ | ∅ | {1, 2} |
| 3 | ∅ | ∅ | {1, 2} | {1, 2, 4} |
| 4 | ∅ | ∅ | {1, 4, 5} | {1, 4, 5} |
| 5 | ∅ | ∅ | {1, 2} | {1, 2, 3} |
| 6 | ∅ | ∅ | {1, 3, 5} | {1, 3, 5} |
| 7 | ∅ | ∅ | {1, 3, 4, 5} | {1, 3, 4, 5} |
| 8 | {2, 3, 4} | {2, 3, 4} | ∅ | ∅ |
| 9 | {2, 3, 4} | {3, 4} | ∅ | {5} |
| 10 | {3, 4} | {3, 4} | {5} | {5} |

The correctness of the equations can be seen from the following observations. For a procedure, the entry-node entry set is constructed as the union of all calling-context effects that can enter the procedure from its calls. Within the procedure body, effects in the entry set can be killed, but not added to. For effects in the entry set that reach a call at a call node, those effects that survive the call are recovered in the entry set constructed by the $E_{in}[n]$ equation for the successor return node n. To see that this is true, observe the following. If an entry-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so the effect should be added to the return-node entry set without further conditions, and this is done by the selection criterion $(x \in E_{out}[p] \wedge \bar{C}_1)$ in the equation for the return node. If, on the other hand, an entry-set effect reaches the call and does enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node entry set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in E_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $E_{in}[n]$ equation for the return node.
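The equations can be checked mechanically against Table 2.1. The Python sketch below encodes the flowgraph of Figure 2.1 and iterates Groups I to III until every set is stable. The edge list is inferred from the node roles stated in the text (nodes 3 and 5 call f, whose entry and exit nodes are 8 and 10), the GEN and KILL sets are read off the superscripted definitions, and $C_1$ and $C_2$ are both modeled, as an assumption suited to this example only, as "the defined variable is global". RECODE and the call-induced GEN and KILL sets of Sections 2.2.2 and 2.2.3 are empty here; all names are illustrative.

```python
VAR = {1: "w", 2: "x", 3: "y", 4: "z", 5: "x"}  # element -> defined variable
GLOBALS = {"x", "y", "z"}                        # w is local to main


def c1(x):
    """C1: element x crosses from a call node into the called procedure."""
    return VAR[x] in GLOBALS


def c2(x):
    """C2: element x crosses from an exit node back to a return node."""
    return VAR[x] in GLOBALS


NODES = range(1, 11)
ENTRY_NODES = {1, 8}
EDGES = [(1, 2), (2, 3), (2, 5), (3, 4), (3, 8), (5, 6), (5, 8),
         (4, 7), (6, 7), (8, 9), (9, 10), (10, 4), (10, 6)]
RETURN_NODES = {4: (3, 10), 6: (5, 10)}  # return node n -> (call node p, exit node q)
GEN = {2: {1, 2}, 3: {4}, 5: {3}, 9: {5}}
KILL = {2: {5}, 9: {2}}
RECODE = {}  # no call-by-reference parameters in this example


def survives(x, q, e_out):
    """Selection criterion: not C1, or (C1 and C2 and x in E_out[q])."""
    return not c1(x) or (c1(x) and c2(x) and x in e_out[q])


def solve():
    pred = {n: [a for a, b in EDGES if b == n] for n in NODES}
    E_in, E_out, B_in, B_out = ({n: set() for n in NODES} for _ in range(4))
    changed = True
    while changed:
        changed = False
        for n in NODES:
            gen, kill = GEN.get(n, set()), KILL.get(n, set())
            if n in ENTRY_NODES:                     # Group I
                bi = set()
                ei = {x for p in pred[n] for x in E_out[p] | B_out[p] if c1(x)}
                bo, eo = set(gen), ei | RECODE.get(n, set())
            elif n in RETURN_NODES:                  # Group II
                p, q = RETURN_NODES[n]
                bi = ({x for x in B_out[p] if survives(x, q, E_out)}
                      | {x for x in B_out[q] if c2(x)})
                ei = {x for x in E_out[p] if survives(x, q, E_out)}
                bo, eo = (bi - kill) | gen, ei - kill
            else:                                    # Group III
                bi = set().union(*(B_out[p] for p in pred[n]))
                ei = set().union(*(E_out[p] for p in pred[n]))
                bo, eo = (bi - kill) | gen, ei - kill
            if (bi, ei, bo, eo) != (B_in[n], E_in[n], B_out[n], E_out[n]):
                B_in[n], E_in[n], B_out[n], E_out[n] = bi, ei, bo, eo
                changed = True
    return E_in, E_out, B_in, B_out


E_in, E_out, B_in, B_out = solve()
for n in NODES:
    print(n, sorted(E_in[n]), sorted(E_out[n]), sorted(B_in[n]), sorted(B_out[n]))
```

Each printed row agrees with the corresponding row of Table 2.1. Note how return node 4 keeps definition 4 of z (it is in the exit node's entry set) but drops definition 2 of x, which procedure f kills, while definition 5 arrives through the exit node's body set.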
From the equations for the entry set, we see that for any procedure z, the entry set at z's exit node will, as the equations are solved, eventually contain all calling-context effects that entered z and reached its exit node. This characteristic of the exit-node entry set is the requirement placed upon it when it is used in the $E_{in}[n]$ equation for the return node, so this requirement is satisfied and the entry-set equations are correct.

For any procedure, the $B_{in}$ set is always empty at the entry node, so the B set is free of calling-context effects. Within the procedure body, GEN and KILL sets are used to update the body set as it propagates along the various nodes. For effects in the body set that reach a call at a call node, those effects that survive the call are recovered in the body set constructed by the $B_{in}[n]$ equation for the successor return node n. If a body-set effect that reaches the call cannot enter the called procedure, then it cannot be killed within the called procedure, so it should be added to the return-node body set without further conditions, and this is done by the selection criterion $(x \in B_{out}[p] \wedge \bar{C}_1)$ in the $B_{in}[n]$ equation for the return node. If, on the other hand, a body-set effect reaches the call and will enter the called procedure, and therefore may be killed by it, then this effect should be added to the return-node body set only if it reached the entry set of the called procedure's exit node and the effect can cross back into the caller. This is done by the selection criterion $(x \in B_{out}[p] \wedge C_1 \wedge C_2 \wedge x \in E_{out}[q])$ in the $B_{in}[n]$ equation for the return node. In addition, all crossable effects that result from the call, and that are independent of calling context, should also be added to the return-node body set, and this is done by the selection criterion $(x \in B_{out}[q] \wedge C_2)$ in the $B_{in}[n]$ equation for the return node.
From the equations for the body set, we see that for any procedure $z$, the body set at $z$'s exit node is free of calling-context effects and will, as the equations are solved, eventually contain all body effects that reached the exit node, including those body effects resulting from calls made within $z$. This characteristic of the exit-node body set is the requirement placed upon it when it is used in the $B_{in}[n]$ equation for the return node, so this requirement is satisfied. The other requirement of this return-node equation is that the exit-node entry set contains all calling-context effects for the procedure that reach the exit node. This requirement has already been shown to be satisfied, so we conclude that the body-set equations are correct.

2.2.2 Element Recoding for Aliases

The RECODE set for the entry node has its elements added to the $E_{in}$ set for that node. The idea of the RECODE set is that certain elements in the OUT set of a predecessor call node, irrespective of their ability to cross the interprocedural boundary when parameters are ignored, should nevertheless be carried over into the entry set of the called procedure as calling-context effects because of an alias relationship, established by the call, between an actual parameter and a formal call-by-reference parameter. Any element that enters a procedure because of such an alias relationship between parameters should be recoded to reflect this alias relationship. A recoded element represents both the base element, which is the element as it would be if there were no alias relationship, and the non-empty alias relationship. Element recoding has two purposes. First, it allows the recoded element within the called procedure to be killed correctly through its alias relationship. Second, it allows the recoded element within the called procedure to be correctly associated with specific references to those aliases that are in the alias relationship.
Element recoding never involves a change of the base element, but only a change of the associated alias relationship, which is the set of formal parameters to which the base element is, in effect, aliased. Because element recoding in effect generates a new element, a separate RECODE set is used. Figure 2.2 presents an algorithm for generating the entry-node input sets $E_{in}$ and RECODE for a forward-flow-or dataflow problem, for the assumed language model in which the visibility of each formal parameter is limited to the single procedure that declares it. For each element in the $OUT[c]$ set, the algorithm generates at most one element for inclusion in the entry-node input sets. The algorithm is unambiguous, except for line 10. The "can be affected by" test at line 10 is a generalization whose details depend on the specific dataflow problem being solved. For example, if the dataflow problem is reaching definitions, then each base element $w$ represents a specific definition of some variable $z$. If the actual parameter $p$ being tested by the algorithm is the variable $z$, and the corresponding formal parameter is call-by-reference, then the definition that $w$ represents can be used or killed through that formal parameter, so $w$ can be affected by that actual parameter $z$, and the "affected by" test is therefore satisfied. The $p \in OA$ test at line 10 covers the situation where an actual parameter $p$ that is aliased to the formal $f$ is itself a formal parameter that is effectively aliased to $w$. In this case $f$ is established as a new effective alias for $w$, by transitivity of the alias relationship. Referring to the algorithm, there is no carry-over of the old alias relationship into the new alias relationship. The old alias relationship is represented by the $OA$ set, and the new alias relationship is represented by the $NA$ set. That this no-carry-over of the old alias relationship is correct follows from the assumed language model.
The aliases of element recoding are formal parameters, and the model states that each formal parameter is visible in only one procedure. This means there is no need to carry the old alias relationship into a different procedure, because the aliases cannot be referenced outside the single procedure in which the old alias relationship is active. Note that recursive calls are no exception to this no-carry-over rule, because a recursive call will cancel any alias relationship established for a base element by any prior call of the procedure.

In general, the fact that crossing elements are recoded when $NA \neq \emptyset$, and unrecoded when $NA = \emptyset$ and $OA \neq \emptyset$, places an added burden on the return-node equations to recognize an element that should be recovered from the exit-node entry set, necessitating, in effect, additional rules to cover this possibility. After an element is recovered, it would also be necessary to restore the alias relationship, if any, that it had prior to the call. This recognition and restoration problem is perhaps most easily solved by associating with each call node two additional sets, one for body-set elements and another for entry-set elements, where each set consists of ordered pairs.

    -- e is an entry node.
    -- This algorithm constructs the Ein[e] and RECODE[e] sets.
        begin
     1      Ein[e] ← ∅
     2      RECODE[e] ← ∅
     3      for each predecessor call node c of entry node e
     4          for each element x ∈ OUT[c]
     5              let w be the base element of x
     6              let OA be the set of aliases, if any, associated with w, forming x
     7              let NA be the set of new aliases
     8              NA ← ∅
     9              for each actual parameter p at call node c that is aliased to a call-by-reference formal parameter f
    10                  if (w can be affected by p) ∨ (p ∈ OA)
    11                      NA ← NA ∪ {f}
                        fi
                    end for
    12              if NA ≠ ∅
    13                  RECODE[e] ← RECODE[e] ∪ {(w, NA)}
    14              else if w can cross the interprocedural boundary
    15                  Ein[e] ← Ein[e] ∪ {w}
                    fi
                end for
            end for
        end

Figure 2.2. Element-recoding algorithm for forward-flow-or dataflow problems.
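Figure 2.2 can be transcribed into Python as a sketch, assuming elements are pairs `(base, aliases)` with an empty alias set for unrecoded elements, and with the problem-specific test of line 10 and the crossing test of line 14 passed in as functions. The sketch also records, at each call node, the ordered pairs $(x, y)$ proposed for recognition and restoration, although for brevity it keeps a single pair set per call node rather than separate body-set and entry-set collections. `CallNode`, `recode_entry`, `restore`, and the definition table are hypothetical names; the usage follows the two-call scenario of a global `g` that the text uses to motivate recoding, for reaching definitions.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, Hashable, List, Set, Tuple

# An element is (base, aliases); aliases is empty for an unrecoded element.
Element = Tuple[Hashable, FrozenSet[str]]


@dataclass
class CallNode:
    out: Set[Element]                   # OUT[c]
    ref_args: List[Tuple[str, str]]     # (actual p, call-by-reference formal f)
    # Ordered pairs (x, y): crossing element x and the element y generated from it.
    pairs: Set[Tuple[Element, Element]] = field(default_factory=set)


def recode_entry(
    calls: List[CallNode],
    affected_by: Callable[[Hashable, str], bool],  # "w can be affected by p" (line 10)
    can_cross: Callable[[Hashable], bool],         # boundary-crossing test (line 14)
) -> Tuple[Set[Element], Set[Element]]:
    """Build E_in[e] and RECODE[e] for an entry node e, following Figure 2.2."""
    e_in: Set[Element] = set()
    recode: Set[Element] = set()
    for c in calls:                      # line 3: predecessor call nodes
        for x in c.out:                  # line 4
            w, oa = x                    # lines 5-6: base element and old aliases
            # Lines 7-11: new aliases; OA itself is never carried over.
            na = frozenset(f for p, f in c.ref_args if affected_by(w, p) or p in oa)
            if na:                       # lines 12-13
                y = (w, na)
                recode.add(y)
            elif can_cross(w):           # lines 14-15
                y = (w, frozenset())
                e_in.add(y)
            else:
                continue                 # element does not enter the procedure
            c.pairs.add((x, y))
    return e_in, recode


def restore(exit_entry_set: Set[Element], c: CallNode) -> Set[Element]:
    """Recover the pre-call form x of every generated y that reached the exit node."""
    return {x for x, y in c.pairs if y in exit_entry_set}


# Definitions d1 and d2 of the global g reach two calls of the same procedure;
# at the first call, g is passed to the call-by-reference formal a.
def_of = {"d1": "g", "d2": "g"}  # hypothetical definition -> variable table


def affects(w: Hashable, p: str) -> bool:
    return def_of.get(w) == p


first = CallNode(out={("d1", frozenset())}, ref_args=[("g", "a")])
second = CallNode(out={("d2", frozenset())}, ref_args=[])
e_in, recode = recode_entry([first, second], affects, lambda w: True)
print(e_in)    # {('d2', frozenset())}
print(recode)  # {('d1', frozenset({'a'}))}
```

A definition of the formal `a` inside the procedure can now kill `('d1', {'a'})` without touching `d2`, which is exactly the distinction the unrecoded elements could not express.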
These sets would be determined whenever the entry-node entry set of the called procedure is computed. The first element of each ordered pair is a crossing element $x$ as it exists in the $B_{out}$ or $E_{out}$ set at the call node, and the second element is the element $y$ that is effectively generated from $x$ by the element-recoding algorithm of Figure 2.2 at either line 13 or line 15. If all crossing elements for the call are included in these additional sets, then the return-node equations can use these sets instead of the $B_{out}[p]$ and $E_{out}[p]$ sets to recognize elements to be recovered from the exit-node entry set. Recognition and restoration would be done by trying to match the exit-node entry-set element against the second element of an ordered pair from the appropriate additional set at the call node and then, if there is a match, restoring the original element by using the first element of the matched pair. For example, if $x$ is a crossing element in the $B_{out}$ set of a call node, and $y$ is the generated element, then $(x, y)$ would be an ordered pair in the additional set for body-set elements. When the $B_{in}$ set for the return node is computed, if $y$ is in the exit-node entry set, then it will match the ordered pair $(x, y)$, and element $x$ will be added to the $B_{in}$ set.

As an example of why element recoding is necessary, consider the following. Suppose there are two different calls to the same procedure, and different definitions of global variable $g$ reach each call. At one of the calls, $g$ is also used as an actual parameter and the corresponding formal parameter is call-by-reference. The problem now is what to kill from the entry set whenever that formal parameter is defined in the called procedure.
If the individual elements representing the different definitions of $g$ do not somehow identify how they are related to this formal parameter, then the only choice is to kill all of them or none of them, and neither choice is correct in this case, as the only definitions of $g$ that should be killed are those that entered the procedure from the call where $g$ is aliased to the call-by-reference formal parameter.

2.2.3 Implicit Definitions Due to Calls

A call with parameters typically has implicit definitions associated with it. For example, if a formal parameter is call-by-reference, then each actual parameter aliased to that formal parameter is implicitly defined at each definition of the formal parameter. If a formal parameter is call-by-value-result, then that formal parameter is implicitly defined each time the called procedure is entered, and the actual parameter at the call is implicitly defined upon return from the call. From the standpoint of solving a dataflow problem such as reaching definitions, all implicit definitions due to calls should be determined, and elements should be generated at the appropriate nodes to represent these implicit definitions. The remainder of this section discusses the generation of implicit definitions and the determination of what reaches them for the specific problem of reaching definitions.

We assume that a formal parameter may be call-by-reference, call-by-value, call-by-value-result, or call-by-result. For the reaching-definitions problem, all GEN sets must be prepared before the iterative algorithm can be used to solve the dataflow equations. For each point $p$ in the program where a call-by-reference formal parameter is defined, add to the GEN set of the node for point $p$ an implicit definition of each actual-parameter variable that is aliased to that formal parameter in a call. Each added implicit-definition element must be a recoded element that includes the alias relationship for that actual parameter.
For example, suppose a procedure named A has two call-by-reference formal parameters, $x$ and $y$, that inside A at point $p$ there is a definition of $x$, and that there are three calls of procedure A in the program. The first call aliases variable $v$ to $x$. The second call aliases variable $v$ to both $x$ and $y$. The third call aliases variable $w$ to $x$. Thus, at point $p$ there would be three implicit-definition elements generated, namely $(v, \{x\})$, $(v, \{x, y\})$, and $(w, \{x\})$. As an example of what this element notation means, in the $(v, \{x\})$ element the $v$ represents the implicit definition of variable $v$ that occurs at point $p$, and the $x$ represents the formal parameter that variable $v$ is aliased to. As a special requirement for these implicit-definition elements, for the $B_{out}$ set at the exit node of procedure A, the $(v, \{x\})$ element, if it reaches this set, can only cross from this set to the return node of the first call. Similarly, the $(v, \{x, y\})$ element can only cross to the return node of the second call, and the $(w, \{x\})$ element can only cross to the return node of the third call.

The crossing restrictions in the preceding example are due to the following rule. Let A denote a procedure containing a definition at point $p$ of a call-by-reference formal parameter $x$, let $(t, \{x\})$ be the implicit-definition element generated at point $p$ for some specific call $c$ of A that aliases actual-parameter variable $t$ to $x$, and let $m$ be the exit node of A. If $(t, \{x\}) \in B_{out}[m]$, then $(t, \{x\})$ can only cross from $B_{out}[m]$ to the return node of call $c$, and as $(t, \{x\})$ crosses, it must be recoded as $t$ by having its alias relationship nullified. This crossing-restriction rule is necessary because element $(t, \{x\})$ is both a body effect, because it is generated inside the called procedure, and a calling-context effect, because it is the result of a specific call of that procedure. This dual quality requires the special treatment that the rule provides.
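The generation of these implicit-definition elements and the crossing-restriction rule can be sketched in Python, replaying the three-call example for procedure A. Assumptions: an element is a pair `(variable, aliases)` as in the text's notation (a real implementation would also identify point $p$), the alias map per call is given, and elements not covered by the rule simply cross wherever a caller-supplied predicate allows, standing in for the full $B_{in}$ equation. Mapping each element to a *set* of producing calls is a small generalization for the case where two calls yield the same element. `implicit_defs` and `cross_from_exit` are hypothetical names.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Set, Tuple

Element = Tuple[Hashable, FrozenSet[str]]  # (variable, aliases)


def implicit_defs(formal: str, calls: Dict[str, Dict[str, Set[str]]]) -> Dict[Element, Set[str]]:
    """Implicit-definition elements generated where `formal` is defined.

    calls[c][actual] is the set of call-by-reference formals that `actual`
    is aliased to at call c.  Each element maps to the call(s) producing it."""
    origin: Dict[Element, Set[str]] = {}
    for call, aliasing in calls.items():
        for actual, formals in aliasing.items():
            if formal in formals:
                origin.setdefault((actual, frozenset(formals)), set()).add(call)
    return origin


def cross_from_exit(
    b_out_exit: Set[Element],
    origin: Dict[Element, Set[str]],
    return_of: Dict[str, str],             # call -> its return node
    can_cross: Callable[[Element], bool],  # stand-in for the ordinary B_in rules
) -> Dict[str, Set[Element]]:
    result: Dict[str, Set[Element]] = {r: set() for r in return_of.values()}
    for elem in b_out_exit:
        if elem in origin:
            # Crossing restriction: only to the producing call's return node,
            # recoded with its alias relationship nullified.
            base, _ = elem
            for call in origin[elem]:
                result[return_of[call]].add((base, frozenset()))
        elif can_cross(elem):
            for r in result:
                result[r].add(elem)
    return result


calls = {"c1": {"v": {"x"}}, "c2": {"v": {"x", "y"}}, "c3": {"w": {"x"}}}
origin = implicit_defs("x", calls)
crossed = cross_from_exit(set(origin), origin, {"c1": "r1", "c2": "r2", "c3": "r3"},
                          lambda e: True)
print(crossed["r3"])  # {('w', frozenset())}
```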
Nullifying the alias relationship as the element crosses to the return node is both good practice in general for this element, and a necessity if call $c$ is a recursive call of A. As an example, assume that call $c$ is a recursive call of A, and that variable $t$ is a global variable. If $(t, \{x\})$ reaches the $B_{out}[m]$ set, the rule states that this element can only cross to the return node of call $c$, and that it be recoded as $t$. Assuming that this $t$ element then reaches from this return node to the $B_{out}[m]$ set, $t$ can then cross to any return node that has an in-edge from $m$. Although both the $(t, \{x\})$ and $t$ elements refer to the same implicit definition of variable $t$ occurring at point $p$, the two elements are not the same, and the crossing-restriction rule applies only to an element that is identical to the element generated at point $p$, which is $(t, \{x\})$.

The implicit definitions of actual-parameter variables are the most important category of implicit definitions due to call-by-reference formal parameters. However, there is also a second, less important category. At each explicit definition of a variable $t$ at point $p$ inside A, such that variable $t$ is also used in a call of A as an actual parameter aliased to a call-by-reference formal parameter $x$, there is an implicit definition of formal parameter $x$ at point $p$. The implicit-definition element generated at point $p$ would be $(x, \{t\})$, meaning a definition of variable $x$ at point $p$, aliased to variable $t$. However, assuming a formal parameter cannot be defined or used outside the procedure for which it is declared, there is no need for a crossing-restriction rule for these elements, because they cannot cross to any return node.

Normally, a definition of a variable kills all other definitions of that variable. However, the implicit definitions due to call-by-reference formal parameters have no associated kills. Instead, the following rule suffices.
For each call-by-reference formal parameter $x$ declared for procedure A, if all calls of A alias the same actual-parameter variable $t$ to $x$, then each explicit definition inside A of either variable $t$ or $x$ will kill all definitions of variable $t$ and all definitions of variable $x$. Otherwise, if the calls of A do not all alias the same actual-parameter variable $t$ to $x$, then each explicit definition inside A of either variable $t$ or $x$ will kill only the definitions of that variable and those recoded elements that are aliased to that variable.

The entry-node GEN set will be used to hold all implicit definitions of formal parameters that occur upon procedure entry. Thus, for each entry node, for each formal parameter of the represented procedure that is call-by-value or call-by-value-result, add to the GEN set of that entry node an element that represents an implicit definition of that formal parameter occurring at that entry node. The return-node GEN set will be used to hold all implicit definitions of actual parameters that may occur upon return from the called procedure. Thus, for each return node, for each actual parameter of the associated call whose corresponding formal parameter is call-by-result or call-by-value-result, add to the GEN set of that return node an element that represents an implicit definition of that actual parameter occurring at that return node. The return-node KILL set should represent all elements that will be killed by these implicit definitions of actual parameters.

With the GEN sets ready, the iterative algorithm can proceed. Once the iterative algorithm has ended, a follow-on step is done: a) Examine the $B_{out}$ set for each exit node. For each definition $d$ in this set of a formal parameter $p$, where $p$ is call-by-result or call-by-value-result, $d$ reaches the implicit use of this formal parameter by those implicit definitions of actual parameters found at the various return nodes whose corresponding formal parameter is $p$.
The element representing $d$ can be added to the $B_{in}$ sets of those return nodes in a way that reflects the reach. b) Examine the OUT set of each call node. For each definition $d$ in this set of a variable that is used as an actual parameter in the call, where the corresponding formal parameter is call-by-value or call-by-value-result, $d$ reaches the implicit use of the defined variable by the implicit definition of the corresponding formal parameter found at the entry node of the called procedure. The element representing $d$ can be added to the $E_{in}$ set of that entry node in a way that reflects the reach.

2.3 Interprocedural Forward-Flow-And Analysis

This section gives the dataflow equations used by our interprocedural analysis method for forward-flow-and problems, and explains how they differ from the equations for forward-flow-or.

For forward-flow-and problems, some changes are needed to the dataflow equations given in Section 2.2.1. Of course, the confluence operator must be changed from union to intersection. However, it is still necessary to construct the entry-node entry set as the union of all crossing effects from the predecessor-node sets, so that calling context can be properly recovered at the return nodes. At the same time, the entry set must be constructed as the intersection of predecessor-node sets if the entry set is to be a part of the IN and OUT sets. These conflicting requirements for the entry-node entry set can be resolved by maintaining two separate entry sets at each node. The revised dataflow equations follow. The two conditions, $C_1$ and $C_2$, are explained in Section 2.2.1.

For any node $n$:

$$IN[n] = E^{(2)}_{in}[n] \cup B_{in}[n] \qquad OUT[n] = E^{(2)}_{out}[n] \cup B_{out}[n]$$

Group I: $n$ is an entry node.
$$
\begin{aligned}
B_{in}[n] &= \emptyset \\
E^{(1)}_{in}[n] &= \bigcup_{p \in pred(n)} \{\, x \mid x \in (E^{(1)}_{out}[p] \cup B_{out}[p]) \wedge C_1 \,\} \\
E^{(2)}_{in}[n] &= \bigcap_{p \in pred(n)} \{\, x \mid x \in OUT[p] \wedge C_1 \,\} \\
B_{out}[n] &= GEN[n] \\
E^{(1)}_{out}[n] &= E^{(1)}_{in}[n] \cup RECODE^{(1)}[n] \cup RECODE^{(2)}[n] \\
E^{(2)}_{out}[n] &= E^{(2)}_{in}[n] \cup RECODE^{(2)}[n]
\end{aligned}
$$

Group II: $n$ is a return node, $p$ is the associated call node, and $q$ is the exit node of the called procedure.

$$
\begin{aligned}
B_{in}[n] &= \{\, x \mid (x \in B_{out}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2) \,\} \\
E^{(i)}_{in}[n] &= \{\, x \in E^{(i)}_{out}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E^{(1)}_{out}[q]) \,\}, \quad i = 1, 2 \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E^{(i)}_{out}[n] &= E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2
\end{aligned}
$$

Group III: $n$ is not an entry or return node.

$$
\begin{aligned}
B_{in}[n] &= \bigcap_{p \in pred(n)} B_{out}[p] \\
E^{(i)}_{in}[n] &= \bigcap_{p \in pred(n)} E^{(i)}_{out}[p], \quad i = 1, 2 \\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n] \\
E^{(i)}_{out}[n] &= E^{(i)}_{in}[n] - KILL[n], \quad i = 1, 2
\end{aligned}
$$

The entry set $E^{(1)}$ is the set used to recover calling context, and the entry set $E^{(2)}$ is the set that is a component of the IN and OUT sets. The RECODE sets appearing in the entry-node equations represent recoded elements as explained in Section 2.2.2. The $RECODE^{(1)}$ set will just be the union of the recoded elements generated from each predecessor call node $c$, using the algorithm of Figure 2.2 and drawing from the $E^{(1)}_{out}[c]$ and $B_{out}[c]$ sets at line 4 instead of the $OUT[c]$ set. Similarly, the $RECODE^{(2)}$ set could just be the intersection of the recoded elements from each predecessor call node $c$, drawing from the $OUT[c]$ set at line 4. However, doing this may cause the unnecessary loss of recoded elements when the same underlying base element $w$ is found in each $OUT[c]$ set. To avoid such loss, an improved rule states that if the same base element $w$ is found in each $OUT[c]$ set, and there are one or more non-empty alias relationships for that $w$ occurring at one or more predecessor nodes $c$, then a single recoded element for that $w$ that encodes all of these alias relationships is generated into the $RECODE^{(2)}$ set; otherwise, no recoded element for that $w$ is generated into the $RECODE^{(2)}$ set.
For example, suppose $c$ takes three different values for a given entry node (that is, the entry node has three predecessor call nodes), the same base element $w$ is found in each $OUT[c]$ set, and at one $c$ there is an empty alias relationship, at the second $c$ there is an alias relationship to formal parameter $x$, and at the third $c$ there is an alias relationship to formal parameter $y$. For this example, the single recoded element would be $(w, \{x, y\})$, and this recoded element can be killed either directly through $w$, or indirectly through $x$ or through $y$. Note that the complete kill of this recoded element at any kill point is correct, even though the kill may have been made through an alias that was not established at each $c$. The intersection confluence operator associated with $RECODE^{(2)}$ implicitly requires that for base element $w$ to pass a kill point, it must be on every call path past that kill point, which is not the case when $w$ is killed on at least one call path, which happens when $w$ is killed through an alias that was established by at least one of the $c$. If the specific dataflow problem being solved allows the base element to be used through one of its effective aliases, then a flag could be associated with each alias in the recoded elements of $RECODE^{(2)}$, and this flag could indicate whether or not the alias was established at each $c$. In the case of the example, the recoded element with flags would be $(w, \{x_{not}, y_{not}\})$. Only a use of the base element through an alias established at each $c$ would be a use through an alias that occurs on every call path, and this kind of use would be the all-paths use that is implicitly required by the specific dataflow problem by virtue of its being forward-flow-and.

With the exception of the confluence operator and the two different entry sets, the equations for forward-flow-and are the same as for forward-flow-or, and are likewise correct.
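The improved rule for $RECODE^{(2)}$, together with the optional per-alias flag, can be sketched in Python. The sketch assumes the new-alias set $NA$ has already been computed for each base element at each predecessor call as in Figure 2.2; `recode2` and its input layout are illustrative assumptions. The second component of each result, the aliases established at every $c$, plays the role of the flags: an alias missing from it corresponds to a "not" flag.

```python
from typing import Dict, FrozenSet, Hashable, List, Tuple

Aliases = FrozenSet[str]


def recode2(per_call: List[Dict[Hashable, Aliases]]) -> Dict[Hashable, Tuple[Aliases, Aliases]]:
    """Improved RECODE^(2) rule for one entry node.

    per_call[i] maps each base element w found in OUT[c_i] to the new-alias
    set NA computed for it at call c_i (empty when there is none).  Returns
    w -> (all aliases, aliases established at every c)."""
    if not per_call:
        return {}
    common = set(per_call[0]).intersection(*per_call[1:])  # w in every OUT[c]
    result: Dict[Hashable, Tuple[Aliases, Aliases]] = {}
    for w in common:
        alias_sets = [d[w] for d in per_call]
        merged = frozenset().union(*alias_sets)
        if merged:  # at least one non-empty alias relationship
            result[w] = (merged, frozenset.intersection(*alias_sets))
    return result


# The three-call example: no alias at one c, x at another, y at the third.
example = recode2([{"w": frozenset()}, {"w": frozenset({"x"})}, {"w": frozenset({"y"})}])
merged, everywhere = example["w"]
print(sorted(merged), sorted(everywhere))  # ['x', 'y'] []
```

The result corresponds to $(w, \{x_{not}, y_{not}\})$: the element is killable through either alias, but neither alias supports an all-paths use.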
Set $E^{(2)}$ fulfills the requirement for the IN and OUT sets by consistently using the intersection confluence operator for its construction, just as $B$ does. The equations for the $E^{(1)}$ and $E^{(2)}$ sets differ only at the entry node, and there the only differences are the confluence operator and the way the RECODE sets are built. As set intersection is the confluence operator for $E^{(2)}$ and set union for $E^{(1)}$, and the $RECODE^{(2)}$ set is added to both $E^{(1)}$ and $E^{(2)}$, it follows that $E^{(2)}$ will be a subset of $E^{(1)}$ at every node. Thus, $E^{(1)}$ can be used to recover calling context for $E^{(2)}$. Set $E^{(1)}$ also serves to recover calling context for both $E^{(1)}$ itself and $B$, because $E^{(1)}$ is built at the entry node from these two sets, and the use of union as the confluence operator guarantees that all calling-context effects will be collected.

Table 2.2. Solution of forward-flow-and equations for Figure 2.3.

| Node | $E^{(1)}_{in}$ | $E^{(1)}_{out}$ | $E^{(2)}_{in}$ | $E^{(2)}_{out}$ | $B_{in}$ | $B_{out}$ |
|---|---|---|---|---|---|---|
| 1 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ |
| 2 | ∅ | ∅ | ∅ | ∅ | ∅ | {1, 2} |
| 3 | ∅ | ∅ | ∅ | ∅ | {1, 2} | {1, 2, 4} |
| 4 | ∅ | ∅ | ∅ | ∅ | {1, 4, 5} | {1, 4, 5} |
| 5 | ∅ | ∅ | ∅ | ∅ | {1, 2} | {1, 2, 3} |
| 6 | ∅ | ∅ | ∅ | ∅ | {1, 3, 5} | {1, 3, 5} |
| 7 | ∅ | ∅ | ∅ | ∅ | {1, 5} | {1, 5} |
| 8 | {2, 3, 4} | {2, 3, 4} | {2} | {2} | ∅ | ∅ |
| 9 | {2, 3, 4} | {3, 4} | {2} | ∅ | ∅ | {5} |
| 10 | {3, 4} | {3, 4} | ∅ | ∅ | {5} | {5} |

Table 2.2 shows the result of solving the equations for the flowgraph of Figure 2.3. By "solving" we mean that, in effect, the iterative algorithm has been used and all the sets are stable. The dataflow problem is available expressions, and variable $w$ is local while variables $x$, $y$, and $z$ are global. Available expressions is the problem of determining, for certain expressions in the program, whether each use of an expression is always reached by some prior use of that expression. In Figure 2.3, nodes 1 and 8 are entry nodes, nodes 7 and 10 are exit nodes, nodes 3 and 5 are call nodes, and nodes 4 and 6 are return nodes. Alongside each node is its basic block. Each expression is superscripted with an identifier that is the set element used in Table 2.2 to represent that expression.
    procedure main
    begin
        y = w + 1
        z = x + 1
        if (e)
            a = z + 1
            call f()

    procedure f()
    begin
        x = z + 2
    end

Figure 2.3. An available-expressions example.

2.4 Interprocedural Backward-Flow Analysis

Backward-flow problems are basically forward-flow problems in reverse. However, the same flowgraph is used for both forward-flow and backward-flow problems. The transformation that converts the equations for forward-flow-or to backward-flow-or, or those for forward-flow-and to backward-flow-and, is mechanical and straightforward. The same equations are used, but various words and phrases are changed everywhere to reflect the reverse flow. For example, "pred(n)" for predecessors becomes "succ(n)" for successors, "out" subscripts become "in" subscripts and "in" subscripts become "out" subscripts, IN becomes OUT and OUT becomes IN, "call node" becomes "return node" and "return node" becomes "call node", and "entry node" becomes "exit node" and "exit node" becomes "entry node". For backward flow, the nodes requiring special equations are the exit node and the call node, rather than the entry node and the return node as for the forward-flow problems.

2.5 Complexity of Our Interprocedural Analysis Method

To determine the worst-case complexity of our method for the assumed language model, in which the visibility of each formal parameter is limited to the single procedure that declares it, we consider the solution of the dataflow equations for only one element at a time. Let $n$ be the number of flowgraph nodes. Let the elementary operation measured by the complexity be the computation of the dataflow equations once at a single, average flowgraph node, for a single element. Only the presence or absence of the single element within a particular body or entry set need be represented, and this requires no more than a single bit of storage for each set referenced by the equations.
Thus, computing the dataflow equations once at an average node, for a single element, will consist of a small number of integer operations, assuming that the average in-degree and out-degree of the flowgraph nodes are bounded by a small constant, which will always be the case for flowgraphs generated from real programs, and also assuming that the length of recoded elements will be small. Referring to the algorithm of Figure 2.2, the length of a recoded element is $1 + |NA|$, and $|NA|$ is bounded from above by the number of call-by-reference formal parameters of the given procedure. As a rule, this upper bound will be small.

We next consider the total number of node visits required to solve the dataflow equations for a single element. Prior to solving the equations, all body and entry sets are initialized to empty, at complexity $O(n)$. The empty sets represent the absence of the element. Note that each set has only two states: either the element is present, or it is absent. Assuming a forward-flow problem, each time the equations are computed for a node, if any of the out sets have changed from their previous state, then the equations will be computed for all successor nodes. The forward-flow-or equations have only two out sets per node, and the forward-flow-and equations have three. It follows that repeated computation of the equations for a single node will cause the successor nodes to be marked for computation at most two or three times, depending on the equations being used. Given that the average number of successor nodes is bounded by a small constant, the total number of node visits required to solve the dataflow equations for a single element will be bounded from above by $k_1 n$, where $k_1$ is a constant, giving a worst-case complexity of $O(n)$ for solving the dataflow equations for a single element. The worst-case complexity of solving the dataflow equations for $m$ total elements will therefore be $O(mn)$.
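To make the visit-counting argument concrete, here is a Python sketch of a single-element worklist solver for an ordinary intraprocedural forward-flow-or gen/kill problem, with one boolean per node standing in for the one-bit sets. It deliberately omits the entry-node and return-node equations and the separate entry and body sets, so it illustrates only the counting step; `solve_one_element` and the example flowgraph are hypothetical. Because each node's out bit can only change from absent to present, a node re-enqueues its successors at most once after its first visit, so the number of visits is at most the number of nodes plus the number of edges.

```python
from collections import deque
from typing import Dict, Hashable, List, Tuple

Node = Hashable


def solve_one_element(
    succ: Dict[Node, List[Node]],
    gen: Dict[Node, bool],
    kill: Dict[Node, bool],
) -> Tuple[Dict[Node, bool], int]:
    """Return (OUT bit per node, number of node visits) for one element."""
    pred: Dict[Node, List[Node]] = {n: [] for n in succ}
    for n, targets in succ.items():
        for s in targets:
            pred[s].append(n)

    out = {n: False for n in succ}   # every set starts without the element
    work = deque(succ)               # compute every node at least once
    queued = set(succ)
    visits = 0
    while work:
        n = work.popleft()
        queued.discard(n)
        visits += 1
        in_bit = any(out[p] for p in pred[n])          # union confluence
        new_bit = (in_bit and not kill[n]) or gen[n]   # gen/kill transfer
        if new_bit != out[n]:                          # happens at most once
            out[n] = new_bit
            for s in succ[n]:
                if s not in queued:
                    work.append(s)
                    queued.add(s)
    return out, visits


# A loop 1 -> 2 -> 3 with a back edge 2 -> 1; node 2 generates the element.
graph = {1: [2], 2: [3, 1], 3: []}
out, visits = solve_one_element(graph, {1: False, 2: True, 3: False},
                                {1: False, 2: False, 3: False})
print(out, visits)  # {1: True, 2: True, 3: True} 5
```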
Let $b$ be the number of base elements for the program being analyzed, and let $r$ be the number of recoded elements, giving $m = b + r$. As an example, for the reaching-definitions dataflow problem the base elements will be all the definitions in the program. We assume that for the kind of dataflow problems our method is meant to solve, the number of base elements will be a linear function of the program size, and therefore proportional to $n$. Let constant $k_2$ be an upper bound of $b/n$. We also assume the universe of real, useful programs, written by programmers to solve practical problems. To determine an upper bound for $r$, let $k$ be the maximum number of formal parameters for a single procedure. That $k$ is a constant independent of program size should be obvious. Given $k$ and the algorithm of Figure 2.2, and allowing all possible combinations of the formal parameters of any single procedure, the maximum number of recoded elements for any single procedure and base element is

$$k_3 = \sum_{i=1}^{k} \binom{k}{i} = 2^k - 1.$$

Note that $k_3$ is a constant, albeit an enormous one. The maximum number of recoded elements for any single procedure will therefore be $k_3 b$. In the assumed language model, each formal parameter is visible in only one procedure, and this means each recoded element is confined to a single procedure when the dataflow equations are solved. Therefore, the total number of node visits required to solve the dataflow equations for all the recoded elements will be bounded from above by

$$\sum_{i=1}^{j} k_1 s_i k_3 b,$$

where $j$ is the number of procedures in the flowgraph, and $s_i$ is the number of flowgraph nodes in the $i$th procedure. Since $b \le k_2 n$, this upper bound is at most $\sum_{i=1}^{j} k_1 k_2 k_3 n s_i$. Ignoring constants, and given that $\sum_{i=1}^{j} s_i = n$ and therefore $\sum_{i=1}^{j} n s_i = n^2$, the worst-case complexity of our method for the assumed language model is $O(n^2)$, and the elementary operation measured by the complexity is a small number of integer operations, assuming that the average recoded-element length is small.
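The two counting steps in this bound can be checked mechanically with a short Python sketch: enumerating every non-empty subset of a procedure's formals gives the possible alias relationships $NA$, whose count matches $\sum_{i=1}^{k} \binom{k}{i} = 2^k - 1$, and the final summation step reduces to $n^2$ whenever the procedure sizes sum to $n$. The formal names and procedure sizes are hypothetical.

```python
from itertools import combinations
from math import comb


def alias_relationships(formals):
    """Every non-empty subset of a procedure's formals: the possible NA sets."""
    return [frozenset(c)
            for size in range(1, len(formals) + 1)
            for c in combinations(formals, size)]


for k in range(1, 6):
    formals = [f"f{i}" for i in range(k)]           # hypothetical formal names
    k3 = len(alias_relationships(formals))
    assert k3 == sum(comb(k, i) for i in range(1, k + 1)) == 2 ** k - 1
    print(k, k3)

# The final summation step: with sum(s_i) = n, sum(n * s_i) = n^2.
sizes = [12, 30, 58]                                # hypothetical s_i, j = 3
n = sum(sizes)
assert sum(n * s for s in sizes) == n ** 2
```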
For a program from the assumed universe of programs, the likelihood of a large complexity constant due to element recoding is very low, for the following reason. In order to increase the number of recoded elements for a given base element and procedure, the given base element must, in effect, be repeatedly aliased to different combinations of formal parameters in the given procedure. The algorithm of Figure 2.2 generates at most a single recoded element for each element in the OUT set, so to increase the number of recoded elements as stated, there must be multiple calls to the same procedure, and in these different calls the same base element must be aliased to different formal-parameter combinations. To assess the likelihood of this requirement being met, consider that for any given program from the assumed universe, the type and purpose of a variable determine how that variable is used in that program, and each variable used in a program by necessity has a purpose. Given a number of different calls to the same procedure, and given that a variable appears as one or more of the actual parameters in each of the calls, then as a rule we expect that variable to always occupy the same parameter positions in those calls, because there is always a close correspondence between parameter position and the purpose of the variable that occupies that position. Note that by "variable" we mean a variable and any aliases it may have, including formal-parameter aliases. A variable and its aliases are interchangeable and share the same purpose because by definition they reference the same data. It might be argued that a language such as C has procedures that take a variable number of arguments, such as printf and scanf, for which the same variable could easily occupy different actual-parameter positions in different calls. This is true, but such library procedures are best treated as unknown calls, and there is no element recoding for unknown calls.
For the needs of element recoding in the rare case of a user-written procedure with a variable number of arguments, a single formal parameter could stand for the variable portion of the formal parameters, and conservative assumptions could be made whenever that single formal is, in effect, referenced. Aside from mentioning this, we do not consider such user-written variable-argument procedures further.

For a dataflow problem such as reaching definitions, the base element can only be affected by a single variable. For such a dataflow problem, the purposefulness of variables makes it very unlikely that an increase in the number of recoded elements for a given procedure and base element can even begin, let alone be sustained. However, such an increase would be more likely for a dataflow problem where the base element can be affected by several different variables. An example would be available expressions, because each base element could be affected by as many different variables as compose the expression represented by that base element. In light of the preceding argument regarding the purposefulness of variables, for reaching definitions and similar dataflow problems, we expect the maximum number of recoded elements for any given procedure and base element to be one in the majority of the programs in the assumed universe, and a little higher than one for the remaining programs in that universe. Given the algorithm of Figure 2.2, we also expect the average length of each recoded element to be slightly more than two, given the preceding expectation that there will be a very small maximum number of recoded elements for any given procedure and base element, and assuming that most base elements, when aliased by a call, will be aliased to only a single formal parameter, and only occasionally to more than one.
Note that this expected average length of the recoded elements is consistent with the claim that the elementary operation measured by the worst-case complexity of our method is a small number of integer operations.

The $O(n^2)$ complexity of our interprocedural analysis method is the same as the known worst-case complexity of intraprocedural dataflow analysis, assuming there are no restrictions on the flowgraph. This makes it unlikely that our method can be improved on in terms of complexity without resorting to flowgraph restrictions. However, equal complexities do not mean that interprocedural dataflow analysis will now take roughly the same time as intraprocedural dataflow analysis. The following inequality makes this clear:

$$\sum_{i=1}^{j} s_i^2 \le n^2,$$

where $j$ is the number of procedures in the flowgraph, $s_i$ is the number of flowgraph nodes in the $i$th procedure, and $\sum_{i=1}^{j} s_i = n$.

Besides the language model assumed in this chapter, an alternative model allows each formal parameter to be visible in more than one procedure. Examples of programming languages that fit this alternative model are Pascal and Ada, which allow nested procedures. Element recoding can be used for this alternative model, but unless precision is compromised, the worst-case complexity of solving the equations will be exponential, because the number of recoded elements could grow exponentially, assuming that alias information is compounded when a recoded element is recoded. The exponential complexity of tracking aliases due to calls was first considered by Myers [17], and more recently by Landi and Ryder [15]. In practice, the cost of precise element recoding for the alternative language model may be acceptable for the assumed universe of programs, for the same reason given previously regarding the purposefulness of variables. However, we do not consider the alternative model further.
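One line of algebra shows why the inequality holds: squaring the total node count produces every per-procedure square plus cross terms that are never negative.

$$n^2 = \Big(\sum_{i=1}^{j} s_i\Big)^2 = \sum_{i=1}^{j} s_i^2 + 2\sum_{1 \le i < k \le j} s_i s_k \;\ge\; \sum_{i=1}^{j} s_i^2$$

As a purely hypothetical illustration (not a measured program), a flowgraph of $j = 100$ procedures with $s_i = 100$ nodes each has $n = 10^4$, so $n^2 = 10^8$ while $\sum_i s_i^2 = 100 \cdot 10^4 = 10^6$: the interprocedural worst-case bound is 100 times the sum of the per-procedure intraprocedural bounds. Equality holds only when at most one $s_i$ is nonzero, that is, when all nodes lie in a single procedure.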
2.6 Experimental Results

There are experimental data for our interprocedural analysis method. Specifically, two different prototypes have been constructed, and both solve the reaching-definitions dataflow problem using our method. Both prototypes accept C-language programs as the input to be dataflow analyzed. For simplicity, these prototypes impose some restrictions on the input, such as requiring that all variables be represented by single identifiers, thereby excluding variables that have more than one component, such as structure and union variables. In addition, there is no logic in the prototypes to determine what pointers are pointing at, so pointer dereferencing is essentially ignored. The prototypes do not accept preprocessor commands, so the input programs must already have been preprocessed. Both prototypes, named prototype 1 and prototype 2, use the same code to parse the input program and construct the flowgraph. However, they differ in how they implement our analysis method. Prototype 1 prepares a single bit-vector format containing all the definitions in the input program, and then solves the dataflow equations once for the program flowgraph. Prototype 2 uses a single integer as the bit vector and solves the dataflow equations for the program flowgraph as many times as there are base elements. For the reaching-definitions dataflow problem, the definitions in the program are the base elements. We call the approach used by prototype 2 one-base-element-at-a-time, and the approach used by prototype 1 all-at-once. It might be expected that prototype 2 would be many times slower than prototype 1 because of the large difference in bit-vector sizes, but this is not the case.
For prototype 1, calculations using varied test results show that $V \times S_1 \approx D$, where $V$ is the average number of visits per flowgraph node made to solve the dataflow equations, $S_1$ is the integer size of the bit vector for prototype 1, and $D$ is the number of definitions in the input program. This relationship for prototype 1 means that prototype 2 should run at roughly the same speed as prototype 1, because solving the dataflow equations for a single element will require an average of roughly one visit per flowgraph node and the application of the dataflow equations to a vector of size one. Note that the total amount of work prototype 1 must do per flowgraph node to solve the equations is proportional to the product $V \times S_1 \approx D$, and the total amount of work prototype 2 must do per flowgraph node to solve the equations for the $D$ base elements is proportional to the product $V \times S_2 \times D \approx 1 \times 1 \times D$, where $S_2$ is the integer size of the bit vector for prototype 2. Experimental results have supported the expectation of similar speeds for the two prototypes. When deciding on the design of a practical tool, this finding is important and decisively tips the scales in favor of the one-base-element-at-a-time approach used by prototype 2.

For both prototypes, the bit space needed for set storage is $nks$, where $n$ is the number of flowgraph nodes, $k$ is the average number of sets per node, and $s$ is the maximum, over all solvings of the equations, of the average set bit-size. Note that prototype 1 solves the equations only once, whereas prototype 2 solves them once per base element. The primary reason the approach used by prototype 2 is preferable to the all-at-once approach used by prototype 1 is the likelihood of a greatly reduced $s$ value. For example, without element recoding, the $s$ value is 1 for prototype 2 and $D$ for prototype 1. Allowing element recoding, the $s$ value for the prototype-2 approach will be $1 + \max(\text{average number of recoded elements per procedure for any solving of the equations})$. Here we assume that the best way to add element recoding to prototype 2 would be, for each solving of the equations, to solve the equations for both a single base element and all recoded elements generated from that base element.

Table 2.3. Typical experimental results for the two prototypes.

| defs | defs global | calls | nodes | prototype 1 | prototype 2 |
|---:|---:|---:|---:|---:|---:|
| 2126 | 30% | 521 | 4191 | 49s | 1m21s |
| 2026 | 60% | 472 | 3948 | 55s | 2m22s |
| 4109 | 30% | 924 | 7537 | 4m18s | 4m38s |
| 4223 | 60% | 916 | 7723 | 4m57s | 8m19s |
| 6115 | 30% | 1325 | 11185 | N/A | 10m0s |
| 6091 | 60% | 1411 | 11288 | N/A | 18m18s |
| 8200 | 30% | 1832 | 14799 | N/A | 17m44s |
| 8054 | 60% | 1726 | 14641 | N/A | 30m2s |
| 10299 | 30% | 2164 | 18434 | N/A | 23m55s |
| 10016 | 60% | 2356 | 18587 | N/A | 45m8s |

Table 2.3 presents typical experimental results for the two prototypes. Each table row represents a different input program. The input programs were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program in Table 2.3 has 100 procedures. Only prototype 1 currently has element-recoding logic, so the input programs do not have call parameters and the table data do not reflect element-recoding costs. Measuring element-recoding costs for randomly generated programs would be somewhat meaningless anyway, since the purposefulness-of-variables principle would be violated.
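The difference between the two prototype designs can be illustrated, under heavy simplification, on a single procedure with no calls, so there are no entry sets and no element recoding. The Python sketch below is hypothetical: the flowgraph `PREDS`, the definitions `DEFS`, and the functions `solve`, `all_at_once`, and `one_at_a_time` are illustrative names, not code from either prototype. It shows that one solve with a $D$-bit vector and $D$ solves with a 1-bit vector yield the same reaching sets.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical single-procedure flowgraph (node -> predecessors) with a loop 1 -> 2 -> 3 -> 1.
PREDS: Dict[int, List[int]] = {0: [], 1: [0, 3], 2: [1], 3: [2], 4: [1]}
# Definitions: node -> (definition id, variable defined).
DEFS: Dict[int, Tuple[int, str]] = {0: (0, "x"), 2: (1, "x"), 3: (2, "y")}


def solve(preds, gen, kill):
    """Round-robin iterative solver: IN[n] = union of OUT[p]; OUT[n] = (IN[n] - KILL[n]) | GEN[n].

    Sets are Python ints used as bit vectors; returns the IN vector of every node.
    """
    ins = {n: 0 for n in preds}
    out = {n: 0 for n in preds}
    changed = True
    while changed:
        changed = False
        for n in preds:
            in_n = 0
            for p in preds[n]:
                in_n |= out[p]
            ins[n] = in_n
            new_out = (in_n & ~kill[n]) | gen[n]
            if new_out != out[n]:
                out[n] = new_out
                changed = True
    return ins


def all_at_once(preds, defs) -> Dict[int, Set[int]]:
    """Prototype-1 style (sketch): one solve, bit i of every vector stands for definition i."""
    ids_by_var: Dict[str, List[int]] = {}
    for d, v in defs.values():
        ids_by_var.setdefault(v, []).append(d)
    gen = {n: 0 for n in preds}
    kill = {n: 0 for n in preds}
    for n, (d, v) in defs.items():
        gen[n] = 1 << d
        for other in ids_by_var[v]:
            if other != d:
                kill[n] |= 1 << other
    ins = solve(preds, gen, kill)
    all_ids = [d for d, _ in defs.values()]
    return {n: {d for d in all_ids if ins[n] >> d & 1} for n in preds}


def one_at_a_time(preds, defs) -> Dict[int, Set[int]]:
    """Prototype-2 style (sketch): one solve per base element, each with a 1-bit vector."""
    result: Dict[int, Set[int]] = {n: set() for n in preds}
    for home, (d, v) in defs.items():
        gen = {n: 1 if n == home else 0 for n in preds}
        # Any other node that defines the same variable kills the single tracked element.
        kill = {n: 1 if n in defs and n != home and defs[n][1] == v else 0 for n in preds}
        ins = solve(preds, gen, kill)
        for n in preds:
            if ins[n]:
                result[n].add(d)
    return result


if __name__ == "__main__":
    assert all_at_once(PREDS, DEFS) == one_at_a_time(PREDS, DEFS)
    print(all_at_once(PREDS, DEFS)[3])  # {1, 2}
```

The per-element loop keeps every set at one bit, which corresponds to the reduced $s$ value discussed above. The toy graph is far too small to say anything about timing; it only checks that the two strategies agree.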
Referring to the columns of Table 2.3, "defs" is the total number of definitions in the input program, "defs global" is the percentage of those definitions that define global variables, "calls" is the number of known calls, "nodes" is the number of flowgraph nodes, "prototype 1" is the total CPU time, in minutes and seconds, required by prototype 1 to completely solve the reaching-definitions dataflow problem for the input program and generate a report of all the reaches, and "prototype 2" is the same measurement for prototype 2. The hardware used was rated at roughly 23 MIPS. The large space requirements of prototype 1 prevented running it on the larger input programs in the table (the N/A entries).

CHAPTER 3
INTERPROCEDURAL SLICING AND LOGICAL RIPPLE EFFECT

3.1 Representing Continuation Paths for Interprocedural Logical Ripple Effect

This section lays the theoretical basis for our algorithm. The problem of interprocedural logical ripple effect is examined from the perspective of execution paths and their possible continuations. First, general definitions are given, followed by three assumptions and a definition of the Allow and Transform sets, followed by Lemma 1, Theorems 1 through 4, and a discussion of the potential for overestimation inherent in the Allow set.

A variable is defined at each point in a program where it is assigned a value. A definition is assumed to have the general form "$v \leftarrow expression$", where $v$ is the variable being defined and $\leftarrow$ is an assignment operator that assigns the value of $expression$ to $v$. If the expression includes variables, then these variables are termed the use variables of the definition. In general, a use is any instance of a variable whose value is used at the point where the variable occurs. A procedure contains a definition if the statement that makes the definition is in the body of the procedure. Similarly, a procedure contains a call if the statement that makes the call is in the body of the procedure.
The body of a procedure consists of those statements that are defined as belonging to the procedure. Frequent reference is made in this chapter to a procedure containing a statement, a call, or a flowgraph node. For languages that allow nested procedures, such as Pascal and Ada, note that procedure nesting in these languages is a mechanism for controlling variable scope, not a mechanism for sharing statements, calls, or flowgraph nodes. Throughout this chapter we assume that at most one procedure contains any given statement, call, or flowgraph node.

Let $d$ and $dd$ be two definitions, possibly the same, in the same program. Let $dd$ have a use variable $v$, let $v_{dd}$ be that use-variable instance, and let $d$ define $v$. A possible execution path between definition $d$ and $v_{dd}$, along which the definition of $v$ that $d$ represents would be propagated, is referred to as a definition-clear path between $d$ and $v_{dd}$ with respect to $v$. Definition $d$ can be propagated along an execution path to the end of that path only if either definition $d$ itself or an element that represents definition $d$ exists at the beginning of that path, and there is no redefinition of $v$ along that path. Definition $d$ is said to affect definition $dd$ if there is a definition-clear path between $d$ and $v_{dd}$ with respect to $v$. Similarly, definition $d$ affects use $u$ if $u$ is an instance of $v$ and there is a definition-clear path between $d$ and $u$ with respect to $v$. For convenience, $v$ will not be explicitly mentioned when it is understood. Note that whenever we speak of an execution path between two points, we always mean that the execution path begins at the first point and ends at the second point. For example, an execution path between $d$ and $dd$ begins at the program point where $d$ occurs and ends at the program point where $v_{dd}$ occurs. For convenience, we assume that $dd$ and $v_{dd}$ occupy the same program point.

Assumption 1.
A called procedure, if it returns, always returns to its most recent caller; that is, a procedure that returns always returns to the most recent unreturned call.

Assumption 2. A call has no influence on the execution paths taken inside the called procedure.

Assumption 3. There are no recursive calls.

Assumption 1 reflects the behavior of all the procedural languages that we know of. Regarding Assumption 2, our algorithm may in fact overestimate the logical ripple effect because of both Assumption 2 and the unstated but standard assumption of intraprocedural dataflow analysis that all paths in a procedure flowgraph are possible execution paths. However, these two assumptions are unavoidable, because determining all the truly possible execution paths in an arbitrary program is known to be an undecidable problem. Regarding Assumption 3, making this assumption improves the precision of our algorithm because it removes a potential cause of overestimation. The consequence of using our algorithm for a program with recursive calls is discussed at the end of Section 3.2.

To determine what a definition affects when it is constrained by ripple effect, it is useful to introduce two concepts: backward flow and forward flow. Given an execution path, whenever the execution path returns from a procedure to a call, this is termed backward flow. All other parts of the execution path may be termed forward flow. Note that the possibilities for backward flow are constrained by Assumption 1, and therefore by the relevant execution paths that lead up to the point of the return in question. Within a given execution path, those call instances that have yet to be returned to within that path, called unreturned calls, are the parts of the path that constrain backward flow.
Note that this constraint is a positive constraint, since a call cannot be returned to unless that call exists as an unreturned call in at least one relevant execution path.

Definition 1. Two sets, Allow and Transform, will be used to represent the backward-flow restrictions associated with a particular definition $d$. Let $p$ be the program point where definition $d$ occurs. The elements in both sets are calls. The Allow set identifies only the calls to which the execution path continuing on from point $p$ may make an unmatched return, until the backward-flow restrictions represented by this Allow set are effectively cancelled by the interaction between the execution-path continuation and the Transform set, explained shortly. An unmatched return is a return made during the execution-path continuation to a call instance that precedes the beginning of that execution-path continuation. The call instance is necessarily an unreturned call, as otherwise it could not be returned to. $|\mathit{Allow}| \le$ the total number of different calls represented in the program text. We define $\mathit{Allow} = \emptyset$ to mean there are no backward-flow restrictions for $d$. The Transform set identifies only those calls to which the execution path continuing on from point $p$ may make an unmatched return such that, upon this unmatched return, the execution-path continuation is no longer constrained by the Allow and Transform sets associated with $d$. The following relationships hold: $\mathit{Transform} \subseteq \mathit{Allow}$, and if $\mathit{Allow} \neq \emptyset$ then $\mathit{Transform} \neq \emptyset$.

Note that backward-flow restrictions must be minimized whenever the possible execution paths allow it. Otherwise the computed logical ripple effect, which is the whole purpose of this formal analysis, may be missing pieces that belong in it but were not added because backward-flow restrictions were retained that are not valid for all the possible execution paths involved.

Lemma 1.
For any execution path $P$ between two program points $p$ and $q$, if $P$ includes two or more call instances made in $P$ that have not been returned to in $P$, then for these unreturned calls, $c_i$ calls the procedure containing $c_{i+1}$, where $c_i$ is the $i$th unreturned call, in execution order, made in $P$.

Proof. Assume that the next unreturned call $c_{i+1}$ is not contained in the procedure that was called by $c_i$. Let $X$ be the procedure called by $c_i$, and let $Y$ be the procedure that contains $c_{i+1}$. The execution path in $P$ between making the call $c_i$ and making the call $c_{i+1}$ must include a path out of procedure $X$ and into procedure $Y$ so that the call $c_{i+1}$ can be made. A path out of procedure $X$ can occur in only two ways: either $X$ returns to a call, or $X$ itself makes a call. If $X$ returns to a call, then by Assumption 1, $c_i$ would be returned to, contradicting the given that $c_i$ has not been returned to. This means $X$ must make a call to get to $Y$. Let $c$ be the call contained in $X$ that is the last call contained in $X$ on the execution path in $P$ taken from $X$ to $Y$ so as to make the call $c_{i+1}$. If $X$ makes the call $c$ and $c$ has not been returned to in $P$, then $c$ would precede $c_{i+1}$ as an unreturned call following $c_i$, contradicting the given that $c_{i+1}$ is the next unreturned call in execution order after $c_i$. If $c$ has been returned to in $P$, then all calls occurring on the execution path between the call $c$ and the return to $c$ must have been returned to, according to Assumption 1. This would mean $c_{i+1}$ has been returned to, contradicting the given that $c_{i+1}$ has not been returned to. Thus, $c_i$ calls the procedure containing $c_{i+1}$, as assuming otherwise leads to contradictions. □

Definitions for Theorems 1 through 4. Let $d$ and $dd$ be the two definitions previously defined. Let $A$ and $T$ be the Allow and Transform sets associated with $d$. Let $P$ be a single execution path between $d$ and $dd$ along which $d$ can affect $dd$, subject to the constraints on $P$ imposed by $A$ and $T$.
$P$ will consist of a sequence of calls and returns, if any, in the order they are made. Any instance of a call made in $P$ that is not returned to in $P$ is an unreturned call in $P$. $K$ is defined for $P$ if and only if $P$ contains an unmatched return (meaning a return to a call instance that precedes the beginning of $P$) to a call $\in T$. $K$ is that part of $P$ that follows the first unmatched return to a call $\in T$. Thus, $K$ represents the continuation of $P$ after the unmatched return. Any instance of a call made in $K$ that is not returned to in $K$ is an unreturned call in $K$. Referring to each of the four theorems in turn, let $AA$ and $TT$ be the Allow and Transform sets for $dd$ given all the paths $P$ that meet the requirements of $P$ as stated by that theorem. Let $AA_P$ and $TT_P$ be the Allow and Transform sets for $dd$ given a single path $P$ that meets the requirements of $P$ as stated by that theorem. The four theorems that follow each define $AA$ and $TT$. Note that for any given $P$, $A$, and $T$, one of the four theorems will apply.

Theorem 1. If (1) $A = \emptyset$ and $P$ has no unreturned calls, or (2) $A \neq \emptyset$, $K$ is defined for $P$, and $K$ has no unreturned calls, then $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$.

Proof. For case (1), $d$ is free of backward-flow restrictions and $d$ has affected $dd$ without making an unreturned call; therefore $dd$ will be free of backward-flow restrictions, giving $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$. For case (2), as soon as path $P$ makes an unmatched return $r$ to a call $\in T$, then by Definition 1 what $d$ can affect is no longer constrained by $A$ and $T$, and this freedom from constraint by $A$ and $T$ passes by transitivity to $dd$ because $d$ affects $dd$. When $K$ is defined for $P$, the unmatched return $r$ in $P$ that immediately precedes the beginning of $K$ means that any unreturned calls in $P$ are also in $K$. This is because all call instances within $P$ are more recent than the call instance that matches the unmatched return $r$.
Thus, by Assumption 1, all call instances in $P$ preceding the return $r$ must be returned to in $P$ before $r$ can occur. Therefore, $P$ has no unreturned calls, because $K$ has no unreturned calls. Thus, $dd$ is free of backward-flow restrictions, since $A$, $T$, and $P$ contribute nothing in the way of constraint, giving $AA \leftarrow \emptyset$ and $TT \leftarrow \emptyset$. □

Theorem 2. If (1) $A = \emptyset$ and $P$ has at least one unreturned call, or (2) $A \neq \emptyset$, $K$ is defined for $P$, and $K$ has at least one unreturned call, then $AA \leftarrow \bigcup_{\text{all such } P} \{\text{the unreturned calls of } P\}$ and $TT \leftarrow \bigcup_{\text{all such } P} \{\text{the first unreturned call in } P\}$.

Proof. For case (1), $A$ and $T$ contribute nothing in the way of constraint to $AA_P$ and $TT_P$. Because $d$ affects $dd$ along path $P$, which contains unreturned calls, by Assumption 1 those unreturned calls must be returned to first, before any return can be made to other unreturned calls from the execution-path continuation point of $dd$ onward. Hence, $AA_P \leftarrow \{\text{the unreturned calls of } P\}$. Because $d$ had no backward-flow restrictions, it follows that once all the unreturned calls of $P$ are returned to by the execution-path continuation, that continuation no longer has any backward-flow restrictions. Because of Assumption 3 and Lemma 1, all the unreturned calls of $P$ are returned to when the sequentially first unreturned call in $P$ is returned to. Hence, $TT_P \leftarrow \{\text{the first unreturned call in } P\}$. For case (2), as shown in the proof of Theorem 1 case (2), $A$ and $T$ contribute nothing to $AA_P$ and $TT_P$ when $K$ is defined for $P$. Thus, case (2) is effectively the same as case (1), because the $A$ and $T$ sets contribute nothing and an unreturned call in $K$ is an unreturned call in $P$. Therefore, $AA_P \leftarrow \{\text{the unreturned calls of } P\}$ and $TT_P \leftarrow \{\text{the first unreturned call in } P\}$. From Definition 1 and the general definitions of $AA$, $TT$, $AA_P$, and $TT_P$, it follows that $AA \leftarrow \bigcup_{\text{all such } P} AA_P$ and $TT \leftarrow \bigcup_{\text{all such } P} TT_P$.
Thus, $AA \leftarrow \bigcup_{\text{all such } P} \{\text{the unreturned calls of } P\}$ and $TT \leftarrow \bigcup_{\text{all such } P} \{\text{the first unreturned call in } P\}$. □

Theorem 3. If $A \neq \emptyset$, $K$ is not defined for $P$, and $P$ has no unreturned calls, then $AA \leftarrow \{x \mid x \in A \wedge (x \text{ is part of a possible execution path that inclusively begins with a call } \in T \text{ and ends with a call of the procedure containing } dd\text{, such that each unreturned call in this possible execution path is in } A)\}$ and $TT \leftarrow AA \cap T$.

Proof. Note that only one procedure contains $dd$. Because $K$ is not defined for $P$, it follows that $P$ was constrained in its entirety by $A$, never making an unmatched return to a call $\in T$. Because $P$ has no unreturned calls, $d$ can only affect $dd$ along $P$ by making one or more unmatched returns to calls $\in (A - T)$, unless $d$ and $dd$ are in the same procedure. $A$, in effect, represents possible execution paths with unreturned calls by which $d$ was affected. However, given $P$, the path $P$ may eliminate some of the paths in $A$ as being possible, and may return to some of the unreturned calls in $A$. Thus, although $P$ contributes nothing directly to $AA$, it may narrow the unreturned execution-path possibilities that $A$ can contribute to $AA$. $AA$, as defined for this theorem, captures all execution paths in $A$ that begin with a call $\in T$ and end with a call of the procedure that contains $dd$. Given Assumption 3, it should be obvious that these are all the possible paths in $A$ that are unreturned after $P$. Note that if $d$ and $dd$ are in the same procedure, then $AA = A$ and $TT = T$. Assume that $d$ and $dd$ are in different procedures. Any call $\in A$ that is not part of at least one path in $A$ that makes a call of the procedure containing $dd$ must be excluded from $AA$, because $P$ requires a path in $A$ that passes through the procedure containing $dd$; otherwise $P$ could not make a return to the procedure containing $dd$.
Any call $\in A$ that is on a path in $A$ between the procedure containing $dd$ and the procedure containing $d$ must be excluded from $AA$, because the procedure containing $dd$ has been returned to by $P$. The definition of $AA$ for this theorem satisfies these two exclusions. That $TT \leftarrow AA \cap T$ follows from Definition 1 requiring $TT \subseteq AA$, and from the definition of $AA$ for this theorem. □

Theorem 4. If $A \neq \emptyset$, $K$ is not defined for $P$, $P$ has at least one unreturned call, and the first unreturned call in $P$ is contained in procedure $X$, then $S_1 \leftarrow \bigcup_{\text{all such } P \text{ given } X} \{\text{the unreturned calls of } P\}$, $S_2 \leftarrow \{x \mid x \in A \wedge (x \text{ is part of a possible execution path that inclusively begins with a call } \in T \text{ and ends with a call of the procedure } X\text{, such that each unreturned call in this possible execution path is in } A)\}$, $AA \leftarrow S_1 \cup S_2$, and $TT \leftarrow S_2 \cap T$.

Proof. $S_1$ follows from Definition 1 and the proof of Theorem 2. $S_2$ follows from Theorem 3, where the specific "procedure containing $dd$" in the expression for $AA$ in Theorem 3 has been replaced by the equally specific "procedure $X$". To show that the union operation of $AA$, combining $S_1$ and $S_2$, does not represent spurious paths in $AA$, it is only necessary to show that the paths represented in $S_1$ never cross the paths represented in $S_2$. Two paths cross if each path makes an unreturned call to the same procedure. All paths in $S_2$ end with an unreturned call of procedure $X$. All paths in $S_1$ begin with an unreturned call contained in procedure $X$. Assume that both $S_1$ and $S_2$ include an unreturned call to the same procedure. As all paths in $S_2$ lead to procedure $X$, this means there exists an execution path that originates in procedure $X$ and eventually calls procedure $X$. Thus, the execution path represents recursion, and this is contradicted by Assumption 3. Therefore, the paths represented in $S_1$ never cross the paths represented in $S_2$.

Figure 3.1. An example call structure that does not allow overestimation.
The first unreturned call in $P$ is not added to $TT$ because the path $P$ is an extension of the unreturned paths represented in $S_2$. That $TT \leftarrow S_2 \cap T$ follows from Definition 1 requiring $TT \subseteq AA$, and from the definition of $AA$ for this theorem. □

The four theorems given above will be used to build the algorithm given in the next section. In effect, a given Allow set represents possible execution paths with unreturned calls by which the definition associated with that Allow set was affected. Conversely, the Allow set identifies, in effect, those continuation paths that can make unmatched returns. However, the Allow set lacks the information needed to enforce an ordering of the unmatched returns that the continuation path may make. To a large extent, this missing information is unnecessary because of Lemma 1: typically, the call structure of the program itself enforces the ordering of the unmatched returns. Figure 3.1 is an example. Assume $d$ affects $dd$, giving an Allow set of {c1, c2} for $dd$. Given a continuation path from $dd$, it is not possible for c1 to be returned to before c2, so the correct ordering of unmatched returns is enforced by the program itself.

However, there are cases where the missing ordering information can result in a continuation path taking unwanted shortcuts. Figure 3.2 gives an example of a call structure that allows the continuation path from $dd$ to make an unwanted shortcut under the right circumstances. Assume $d$ affects $dd$ along the paths c1-c2 and c3-c4, giving an Allow set of {c1, c2, c3, c4} for $dd$. Assume the continuation path is r2-c5-r3, where r2 and r3 are unmatched returns to calls c2 and c3. The unmatched return r3 should not be allowed to happen before an unmatched return r4, but this unmatched-return ordering is not enforced by the Allow set defined in this dissertation, so the assumed continuation path is possible.

Figure 3.2. An example call structure that allows overestimation.
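The path bookkeeping behind Lemma 1 and case (1) of Theorems 1 and 2 can be made concrete with a stack, since Assumption 1 means every return goes to the most recent unreturned call. The Python sketch below is illustrative only: the call structure `CALLS`, the event encoding, and all function names are hypothetical; an execution path is modeled as a flat list of call and return events; and Theorems 3 and 4, which depend on the path possibilities recorded in $A$, are not modeled.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical call structure: call site -> (procedure containing it, procedure it calls).
CALLS: Dict[str, Tuple[str, str]] = {
    "c1": ("main", "A"),
    "c2": ("A", "B"),
    "c3": ("B", "C"),
}

Event = Tuple[str, Optional[str]]  # ("call", site) or ("return", None)


def unreturned_calls(path: List[Event]) -> Tuple[List[str], int]:
    """Unreturned calls of `path` in execution order, plus its count of unmatched returns.

    Under Assumption 1 a stack suffices: a return pops the most recent unreturned
    call, and a return on an empty stack is an unmatched return (to a call
    instance that precedes the beginning of the path).
    """
    stack: List[str] = []
    unmatched = 0
    for kind, site in path:
        if kind == "call":
            stack.append(site)
        elif stack:
            stack.pop()
        else:
            unmatched += 1
    return stack, unmatched


def satisfies_lemma1(chain: List[str]) -> bool:
    """Lemma 1: each unreturned call c_i calls the procedure containing c_{i+1}."""
    return all(CALLS[a][1] == CALLS[b][0] for a, b in zip(chain, chain[1:]))


def allow_transform_for_path(path: List[Event]) -> Tuple[Set[str], Set[str]]:
    """AA_P and TT_P for dd when d is unrestricted (A is empty).

    No unreturned calls: Theorem 1, no restrictions. Otherwise Theorem 2:
    AA_P holds every unreturned call and TT_P only the first one.
    """
    chain, _ = unreturned_calls(path)
    if not chain:
        return set(), set()
    return set(chain), {chain[0]}


def allow_transform(paths: List[List[Event]]) -> Tuple[Set[str], Set[str]]:
    """AA and TT as unions of the per-path sets over all qualifying paths."""
    aa: Set[str] = set()
    tt: Set[str] = set()
    for path in paths:
        aa_p, tt_p = allow_transform_for_path(path)
        aa |= aa_p
        tt |= tt_p
    return aa, tt


if __name__ == "__main__":
    p1 = [("call", "c1"), ("call", "c2"), ("return", None), ("call", "c2"), ("call", "c3")]
    print(unreturned_calls(p1))  # (['c1', 'c2', 'c3'], 0)
```

On the path c1, c2, return, c2, c3 the unreturned chain is c1, c2, c3, which satisfies Lemma 1 for this call structure, and Theorem 2 then gives $AA_P = \{c1, c2, c3\}$ and $TT_P = \{c1\}$.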
By virtue of a spurious continuation path such as r2-c5-r3 in Figure 3.2, $dd$ may be able to affect a definition or use that it would not otherwise be able to affect if $dd$ were confined to only legitimate continuation paths. In practical terms, this means that the computed logical ripple effect, consisting of affected definitions and uses, may be an overestimate because of spurious continuation paths. Although the Allow set does permit spurious continuation paths under the right circumstances (of which Figure 3.2, together with the assumed paths by which $d$ affected $dd$, is the simplest example), we believe that these circumstances, together with spurious paths that affect what would otherwise be unaffected, will not occur often enough in real programs to undermine the general usefulness of the Allow set in constraining backward flow and permitting computation of a precise or semiprecise logical ripple effect.

3.2 The Logical Ripple Effect Algorithm

This section presents an algorithm for computing a precise interprocedural logical ripple effect. After a brief overview of the algorithm, the dataflow analysis method used by the algorithm is discussed. Then, two important properties of the dataflow sets are detailed, followed by three rules that are used to impose backward-flow restrictions on the dataflow analysis. Last are proofs that the algorithm is correct.

The algorithm to compute logical ripple effect is shown in Figure 3.3. Each statement in the algorithm is numbered on the left. For convenience, algorithm statements will be referred to as lines. For example, a reference to line 28 means the statement numbered 28, which is actually printed on several lines. Comments in the algorithm begin with --. ⊥ and ⊤ are two different, fixed, arbitrary values. In general, the algorithm works as follows.
A definition $d$ and its associated Allow and Transform sets are popped from the stack (line 7), and then the reaching-definitions dataflow problem is solved for this definition $d$, imposing any backward-flow restrictions represented by the Allow and Transform sets (line 8). Reaching definitions for a single definition is the problem of finding all uses and definitions affected by that definition. The definition $d$ that was dataflow analyzed, and any uses affected by it, are included in the ripple effect (lines 9 to 11). Each affected definition has its Allow and Transform sets determined in accordance with Theorems 1 through 4 (lines 22 to 46). A check is then made to decide whether the affected definition and its restriction sets, Allow and Transform, should be added to the stack for dataflow analysis (lines 47 to 52). The algorithm ends when the stack is empty. Although the algorithm shows a single definition $b$ being added to the stack at line 5, any number of different $b$ can actually be added, each with empty restriction sets.

```
   -- Compute the logical ripple effect for a hypothetical or actual definition b
   -- Input: a program flowgraph ready for dataflow analysis
   -- Output: the logical ripple effect in RIPPLE
   begin
 1   RIPPLE ← ∅
 2   for each definition dd in the program
 3     FIN_dd ← ⊥
     end for
 4   stack ← ∅
 5   push (b, ∅, ∅) onto stack
 6   while stack ≠ ∅ do
 7     pop stack into (d, ALLOW, TRANSFORM)
 8     Solve the reaching-definitions dataflow equations for the single
       definition d, using Rules 1, 2, and 3.
 9     RIPPLE ← RIPPLE ∪ {d}
10     for each use u in the program that is affected by either d1 or d2
11       RIPPLE ← RIPPLE ∪ {u}
       end for
12     ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
13     for each call node n in the flowgraph
14       if d1 ∈ Bout[n] and d1 crossed from this call into the called procedure
15         ROOT1 ← ROOT1 ∪ {the call node n} fi
16       if d1 ∈ Eout[n] and d1 crossed from this call into the called procedure
17         LINK1 ← LINK1 ∪ {the call node n} fi
18       if d2 ∈ Bout[n] and d2 crossed from this call into the called procedure
19         ROOT2 ← ROOT2 ∪ {the call node n} fi
20       if d2 ∈ Eout[n] and d2 crossed from this call into the called procedure
21         LINK2 ← LINK2 ∪ {the call node n} fi
       end for
22     for each definition dd in the program that is affected by either d1 or d2
         -- determine Allow and Transform for dd by Theorem 1
23       if d2 ∈ Bin[node where dd occurs]
24         PATHS ← ∅, TRANS ← ∅
25         call Analyze
         else
           -- determine Allow and Transform for dd by Theorem 2
26         if d2 ∈ Ein[node where dd occurs]
27           PATHS ← ∅
28           PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x calls the procedure that
                       contains dd ∨ x calls a procedure that contains
                       a call c ∈ (PATHS ∩ LINK2))}
29           TRANS ← ROOT2 ∩ PATHS
30           call Analyze
           fi
           -- determine Allow and Transform for dd by Theorem 3
31         if d1 ∈ Bin[node where dd occurs]
32           PATHS ← ∅
33           PATHS ← {x | x ∈ ALLOW ∧ (x calls the procedure that contains dd ∨
                       x calls a procedure that contains a call c ∈ PATHS)}
34           TRANS ← TRANSFORM ∩ PATHS
35           call Analyze
           fi
           -- determine Allow and Transform for dd by Theorem 4
36         if d1 ∈ Ein[node where dd occurs]
37           for each procedure X that contains a call ∈ ROOT1
38             RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
39             PP ← ∅
40             PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively
                     begins with a call ∈ RT1 and ends with a call of the
                     procedure that contains dd, such that each call in
                     this path is in (RT1 ∪ LINK1))}
41             if PP ≠ ∅
42               PATHS ← ∅
43               PATHS ← {x | x ∈ ALLOW ∧ (x calls procedure X ∨ x calls a
                           procedure that contains a call c ∈ PATHS)}
44               TRANS ← TRANSFORM ∩ PATHS
45               PATHS ← PATHS ∪ PP
46               call Analyze
               fi
             end for
           fi
         fi
       end for
     od
   end
```

Figure 3.3. The logical ripple effect algorithm.

```
   Procedure Analyze
   begin
     -- avoid repetition of dd dataflow analysis if possible
47   if FIN_dd ≠ ⊤ ∧ (PATHS = ∅ ∨ (true for all saved pairs P × T for dd:
                                    PATHS ⊈ P ∨ TRANS ⊈ T))
48     if PATHS = ∅
49       FIN_dd ← ⊤
50       push (dd, ∅, ∅) onto stack
       else
51       save PATHS and TRANS as the pair P × T for dd
52       push (dd, PATHS, TRANS) onto stack
       fi
     fi
   end
```

Figure 3.3. (continued)

The dataflow equations referred to in line 8 are shown in Figure 3.4. These equations are copied from Chapter 2, which presents a method for context-dependent flow-sensitive interprocedural dataflow analysis. The method consists of solving the dataflow equations shown in Figure 3.4, using the standard iterative algorithm, for the program flowgraph required by the equations. The method in Chapter 2 includes a solution to the problems of parameter aliasing and implicit definitions, which are part of the interprocedural reaching-definitions problem. We assume that the full method of Chapter 2 would be used, but we do not discuss these side issues in this chapter, as they are not directly relevant to the algorithm. Note that there are other methods for context-dependent flow-sensitive interprocedural dataflow analysis [3, 9, 17, 21], but the method of Chapter 2 has precision and efficiency advantages over the other methods cited.

Referring to the dataflow equations of Figure 3.4, four sets are computed for each flowgraph node: two body sets, $B_{in}$ and $B_{out}$, and two entry sets, $E_{in}$ and $E_{out}$. All body and entry sets are initially empty. As the equations will be solved for only a single definition $d$, the GEN set for the node where $d$ occurs (i.e., the node whose associated block of program code contains the definition $d$) will contain an element representing $d$, and all the other GEN sets will be empty. The node where $d$ occurs is the natural starting point for the iterative algorithm, which recomputes the body and entry sets for the nodes until stability is attained and the sets cease to change, at which point the equations have been solved. Once the equations are solved, whether an element is in the entry set or the body set at a particular node depends on how that element was propagated to that node. The same element may be in both sets at the same node. Properties 1 and 2 listed below summarize those implications of set membership that are used by the algorithm. The properties follow directly from the dataflow equations.

For any node $n$:

$$IN[n] = E_{in}[n] \cup B_{in}[n], \qquad OUT[n] = E_{out}[n] \cup B_{out}[n]$$

Group I: $n$ is an entry node.

$$\begin{aligned}
B_{in}[n] &= \emptyset\\
E_{in}[n] &= \bigcup_{p \in pred(n)} \{x \mid x \in OUT[p] \wedge C_1\}\\
B_{out}[n] &= GEN[n]\\
E_{out}[n] &= E_{in}[n] \cup RECODE[n]
\end{aligned}$$

Group II: $n$ is a return node, $p$ is the associated call node, and $q$ is the exit node of the called procedure.

$$\begin{aligned}
B_{in}[n] &= \{x \mid (x \in B_{out}[p] \wedge (\overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q]))) \vee (x \in B_{out}[q] \wedge C_2)\}\\
E_{in}[n] &= \{x \in E_{out}[p] \mid \overline{C_1} \vee (C_1 \wedge C_2 \wedge x \in E_{out}[q])\}\\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n]\\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}$$

Group III: $n$ is not an entry or return node.

$$\begin{aligned}
B_{in}[n] &= \bigcup_{p \in pred(n)} B_{out}[p]\\
E_{in}[n] &= \bigcup_{p \in pred(n)} E_{out}[p]\\
B_{out}[n] &= (B_{in}[n] - KILL[n]) \cup GEN[n]\\
E_{out}[n] &= E_{in}[n] - KILL[n]
\end{aligned}$$

Figure 3.4. Dataflow equations for the reaching-definitions problem.

Property 1. For any node $n$, an element is in the $E_{in}[n]$ set or $E_{out}[n]$ set if and only if that element entered the procedure that contains node $n$ from a call node, and there is a definition-clear path from that call node to node $n$. Thus, membership in the entry set of node $n$ implies that the element can propagate to node $n$ by an execution path that makes at least one unreturned call between the point where the element is generated and the point where node $n$ occurs.

Property 2.
For any node n, an element is in the Bin[n] set or Bout[n] set if and only if that element was generated in the same procedure that contains node n, or that element entered the procedure that contains node n from an exit-node Bout set. There must also be a definition-clear path to node n from either the element's generation node or the exit node. If the element entered from an exit-node Bout set, then Property 2 applies recursively to the element in that Bout set. Thus, membership in the body set of node n implies that the element can propagate to node n by an execution path, from the point where the element is generated to the point where node n occurs, that does not include any unreturned calls.

The three rules referred to in line 8 are listed below. Rule 1 applies before the dataflow equations are solved; Rules 2 and 3 apply while the equations are being solved. The rules impose the backward-flow restrictions represented by the ALLOW and TRANSFORM sets in line 7.

Rule 1. If ALLOW = ∅, then element d2 is generated at the node where definition d occurs; otherwise d1 is the generated element, meaning the element in the GEN set. Both d1 and d2 are base elements that represent the same definition d. The two elements appear in exactly the same KILL sets; the only difference between them is that Rules 2 and 3 below treat them differently. If the ALLOW set is empty, then by Definition 1 there should be no backward-flow restrictions on d. Rule 1 meets this requirement, because d2 is immune to the backward-flow restrictions imposed by Rule 2.

Rule 2. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the Bin[n] equation is computed, if d1 ∈ Bout[q], then d1 cannot cross from Bout[q] into the Bin[n] set if p ∉ ALLOW.
In the dataflow equations, the crossing of an element from an exit-node body set into a return node is the only action that represents, in effect, an unmatched return to a call instance made on an execution path leading up to the program point where definition d occurs, which is the starting point of the reaching-definitions analysis done for d. Thus, Rule 2 covers all cases in which an unmatched return occurs. Rule 2 restricts unmatched returns to those call instances that are represented in the ALLOW set, thereby realizing the purpose of the ALLOW set as given by Definition 1.

Rule 3. Let n be a return node, p be the associated call node, and q be the exit node of the called procedure. Each time the Bin[n] equation is computed, if d1 ∈ Bout[q], and, by C2 and Rule 2, d1 can cross from Bout[q] into the Bin[n] set, and p ∈ TRANSFORM, then as this d1 element crosses from Bout[q] into the Bin[n] set, the element is changed to d2. In effect, d1 is transformed into d2, and the return node n becomes a generation node for the d2 element. Because, as noted above, this crossing is the only action in the equations that represents an unmatched return, Rule 3 also covers all cases in which an unmatched return occurs. The requirement of Rule 3 that the returned-to call be in the TRANSFORM set satisfies Definition 1 as to when backward-flow restrictions can be ignored.
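To make Rules 1 to 3 concrete, here is a minimal sketch of the decision they impose on a single element at one return node. It assumes that call nodes are identified by strings and that the outcome of condition C2 is supplied as a boolean; the function names and identifiers are hypothetical illustrations, not part of the algorithm as stated above.

```python
from typing import Optional, Set

D1, D2 = "d1", "d2"  # the two base elements that represent definition d


def generated_element(allow: Set[str]) -> str:
    """Rule 1: the element placed in the GEN set of the node where d occurs."""
    return D2 if not allow else D1


def cross_exit_to_return(elem: str, call: str, allow: Set[str],
                         transform: Set[str], c2: bool = True) -> Optional[str]:
    """Element that enters Bin[n] when `elem` is in Bout[q].

    n is a return node, `call` is its associated call node p, and q is the
    exit node of the called procedure. Returns None if no element crosses.
    """
    if not c2:               # the Group II equation already blocks the crossing
        return None
    if elem == D2:           # d2 is free of backward-flow restrictions
        return D2
    if call not in allow:    # Rule 2: unmatched return to a call outside ALLOW
        return None
    if call in transform:    # Rule 3: return node n now generates d2
        return D2
    return D1                # d1 crosses unchanged; restrictions still apply


allow, transform = {"p1", "p2"}, {"p2"}
print(generated_element(allow))                          # d1
print(cross_exit_to_return(D1, "p3", allow, transform))  # None
print(cross_exit_to_return(D1, "p1", allow, transform))  # d1
print(cross_exit_to_return(D1, "p2", allow, transform))  # d2
```

Only d1 is ever filtered or transformed. Once an element has become d2, every later crossing leaves it unchanged.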
By replacing element d1, which is subject to the backward-flow restrictions, with element d2, which is free of them, at the return point, Rule 3 satisfies Definition 1 regarding the removal of backward-flow restrictions on the continuation of the execution path, since d2 now represents that continuation instead of d1.

Lemma 2. The algorithm computes at lines 23 to 46 the restriction sets for an affected definition in accordance with Theorems 1 through 4.

Proof. We first establish the properties of the LINK and ROOT sets computed at lines 12 to 21. Let p be the node, if any, where d1 is generated. Let q be any node where d2 is generated, i.e., those return nodes where d1 is transformed into d2, or, for ALLOW = ∅, the node where definition d occurs. The tests at lines 14 and 18 make use of Property 2: if an element is in the Bout set of a call node n, then there exists a definition-clear path from the node where the element is generated to node n, and the path has no unreturned calls. Extending that path to the entry node of the called procedure makes the call at node n the first unreturned call on the path. Therefore, the ROOT1 set represents all calls that are the first unreturned call on at least one definition-clear path from node p to some other node in the flowgraph. The ROOT2 set represents all calls that are the first unreturned call on at least one definition-clear path from node q to some other node in the flowgraph. The tests at lines 16 and 20 make use of Property 1: if an element is in the Eout set of a call node n, then there exists a definition-clear path from the node where the element is generated to node n, and the path includes the unreturned call that called the procedure containing node n. Extending that path to the entry node of the called procedure makes the call at node n at least the second unreturned call on the path.
Therefore, the LINK1 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path from node p to some other node in the flowgraph. The LINK2 set represents all calls that are an unreturned call, but not the first unreturned call, on at least one definition-clear path from node q to some other node in the flowgraph.

The test at line 23 checks whether Theorem 1 applies. If d2 ∈ Bin[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls, and somewhere along P, d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 1, and line 24 sets PATHS and TRANS to empty in accordance with the theorem. PATHS and TRANS are the Allow and Transform sets for dd.

The test at line 26 checks whether Theorem 2 applies. If d2 ∈ Ein[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call, and somewhere along P, d2 is generated, meaning either ALLOW = ∅ or K is defined for P. This satisfies the conditions of Theorem 2. Only the d2 element satisfies the theorem, so all paths P for the theorem must be constructed exclusively from the ROOT2 and LINK2 sets. Referring to Theorem 2, line 28 computes the AA set, and line 29 computes TT. In line 28, the PATHS set is defined in terms of itself. Because of this recursive reference, each time a call is added to the PATHS set, the condition containing the recursive reference must be reevaluated, since additional calls may thereby be added to PATHS. Recursive references are used similarly in lines 33 and 43. Line 28 extracts, from all the calls that element d2 crossed, just those calls that are on a path to dd.
This is done by building the paths backwards, beginning with those calls that call the procedure containing dd. By Lemma 1, any path between d and dd consisting of unreturned calls can be found by proceeding in reverse order from dd and selecting those calls that call a procedure containing a call already selected. Backward path building and Lemma 1 are used similarly in lines 33 and 43. By the properties of the ROOT and LINK sets, the paths constructed by line 28 will be definition-clear. Notice that a particular call may be in both the ROOT2 and LINK2 sets. However, if a call is only in the ROOT2 set, then it cannot be used as the basis for extending any path further backwards, because, by Property 1, d2 does not propagate from the entry node of the procedure that contains that call to the call node for that call. This is the reason for the (PATHS ∩ LINK2) requirement in line 28. Once the PATHS set is computed, line 29 computes TRANS in accordance with the theorem.

The test at line 31 checks whether Theorem 3 applies. If d1 ∈ Bin[node where dd occurs], then by Property 2 there exists a definition-clear path P between d and dd that has no unreturned calls. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the element, meaning K is not defined for P. This satisfies the conditions of Theorem 3. Referring to Theorem 3, line 33 computes the AA set, and line 34 computes TT. Line 33 extracts from ALLOW all paths that end with a call of the procedure containing dd. Although Theorem 3 states that the path begins with a call ∈ TRANSFORM, line 33 does not need to check for this. TRANSFORM is a subset of ALLOW, so the first unreturned calls in TRANSFORM that are on a path in ALLOW to dd will unavoidably be picked up as the paths are built backwards from dd.
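The recursive set definition at line 28 is a least fixpoint, which can be computed by re-evaluating the condition until no call is added. The following is a minimal sketch under assumptions of our own: calls are strings, and the call graph is given by two hypothetical dictionaries mapping each call to the procedure it calls and the procedure that contains it. The same loop fits lines 33 and 43 if `candidates` and `extendable` are both taken as ALLOW.

```python
from typing import Dict, Set


def build_paths(candidates: Set[str], extendable: Set[str],
                callee: Dict[str, str], container: Dict[str, str],
                target_proc: str) -> Set[str]:
    """Least fixpoint of a recursive PATHS definition such as line 28.

    candidates   -- calls a path may use (ROOT2 ∪ LINK2 for line 28)
    extendable   -- calls through which a path may grow further backwards
                    (LINK2 for line 28)
    callee[x]    -- procedure called by call x
    container[x] -- procedure that contains call x
    target_proc  -- procedure that contains dd
    """
    paths: Set[str] = set()
    changed = True
    while changed:  # re-evaluate the recursive condition after additions
        changed = False
        # procedures that contain an already selected, extendable call
        procs = {container[c] for c in paths & extendable}
        for x in sorted(candidates - paths):
            if callee[x] == target_proc or callee[x] in procs:
                paths.add(x)
                changed = True
    return paths


# Hypothetical call structure: main calls A (c1), B (c3), and C (c4);
# A calls B (c2). dd lies in procedure B.
callee = {"c1": "A", "c2": "B", "c3": "B", "c4": "C"}
container = {"c1": "main", "c2": "A", "c3": "main", "c4": "main"}
root2, link2 = {"c1", "c3", "c4"}, {"c2"}
paths = build_paths(root2 | link2, link2, callee, container, "B")
print(sorted(paths))           # ['c1', 'c2', 'c3']
print(sorted(root2 & paths))   # TRANS as in line 29: ['c1', 'c3']
```

If c2 were in ROOT2 only, it could not be extended backwards, and c1 would not be selected. This mirrors the (PATHS ∩ LINK2) requirement.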
Accordingly, line 33 computes the PATHS set in accordance with Theorem 3, and line 34 then computes the TRANS set in accordance with the theorem.

The test at line 36 checks whether Theorem 4 applies. If d1 ∈ Ein[node where dd occurs], then by Property 1 there exists at least one definition-clear path P between d and dd that has at least one unreturned call. It also follows that ALLOW ≠ ∅ and that P does not make an unmatched return to a call ∈ TRANSFORM, because d1 is the element. This satisfies the conditions of the theorem. Only the d1 element satisfies the theorem, so all paths P for the theorem must be constructed exclusively from the ROOT1 and LINK1 sets. Referring to Theorem 4, line 40 computes the S1 set, line 43 computes the S2 set, line 44 computes the TT set, and line 45 computes the AA set. The test at line 41 is needed because, although there exists at least one path P satisfying the theorem, there may not be any paths P that begin in the specific procedure X. It can be seen that lines 37 to 46 compute in accordance with the theorem.

Lemma 3. Let Ai and Ti be one pair of Allow and Transform sets associated with a definition d, and let Aj and Tj be a different pair of Allow and Transform sets associated with the same definition d. Assume Ai ≠ ∅ and Aj ≠ ∅. If Aj ⊆ Ai and Tj ⊆ Ti, then dataflow analyzing d with the pair Aj and Tj cannot add anything to the ripple effect that is not added by dataflow analyzing d with the pair Ai and Ti.

Proof. By inspection of Rules 1, 2, and 3, removing some of the calls from Ai or Ti cannot make d affect anything that it does not affect with Ai and Ti as they were. Also, by inspection of lines 23 to 46, when Aj and Tj are the restriction sets for d, the Allow and Transform sets determined for any definition dd affected by d cannot include calls that would not be included when Ai and Ti are the restriction sets for d.
□

Lemma 4. Let A and T be Allow and Transform sets associated with a definition d, and let X and Y be a different pair of Allow and Transform sets associated with the same definition d. If A = ∅, then dataflow analyzing d with X and Y cannot add anything to the ripple effect that is not added by dataflow analyzing d with A and T.

Proof. By Rule 1, d will be represented by d2 and have no restrictions on its backward flow. Thus, d will affect everything that it is possible for it to affect. If d is dataflow analyzed with X and Y, then any calls found in the ROOT1, ROOT2, LINK1, or LINK2 sets will also be found in the ROOT2 or LINK2 sets when d is dataflow analyzed with A and T. These sets determine the restriction sets associated with a definition dd affected by d. It follows that any dataflow path allowed for a dd affected by d using X and Y will also be allowed for a dd affected by d using A and T. □

Theorem 5. Given Definition 1 and Theorems 1 through 4, the algorithm will correctly compute the logical ripple effect.

Proof. As shown by Lemma 2, for any affected definition dd, the Allow and Transform sets to be associated with dd are computed in accordance with Theorems 1 to 4. By Lemma 4, if Theorem 1 applies to an affected definition (line 23), then there is no need to check whether any other theorem also applies, because additional dataflow analysis resulting from the other theorems cannot contribute to the ripple effect. However, if Theorem 1 does not apply, then the definition must be dataflow analyzed separately for each theorem that does apply. This is done by the sequence of three if statements at lines 26, 31, and 36. Thus, the control logic in lines 23 to 46 is safe. The Analyze procedure (lines 47 to 52) prepares a definition and its restriction sets for dataflow analysis by pushing them onto the stack (lines 50 and 52). Once a definition has been scheduled for dataflow analysis with no restrictions (line 50), it will not be analyzed again (line 47).
By Lemma 4, this is safe. Assuming FIN_dd ≠ true and PATHS ≠ ∅, the test at line 47 will not prepare a definition for dataflow analysis if both of its restriction sets are subsets of some pair of restriction sets used previously to analyze that definition. This follows from Lemma 3. Thus, the Analyze procedure is safe. The correctness of the dataflow equations (line 8) is established in Chapter 2, and the correctness of the three rules for imposing backward-flow restrictions (line 8) has already been discussed. Regarding the correctness of having no backward-flow restrictions for the initial definition (line 5), let p be the program point where b occurs. For execution to reach point p, any possible execution path from the program's execution starting point to point p can be assumed to have occurred. Thus, there should be no restrictions on the backward-flow possibilities of b, because the ripple effect imposed no constraints on how point p was initially reached. □

Programs with recursive calls can be processed by our algorithm, but the recursive calls may cause some overestimation of the logical ripple effect. The dataflow equations (line 8) are not the problem, as they work for recursive programs. Instead, the problem lies with the Allow set and its representation of execution paths. If a cyclic execution path is represented in the Allow set, then when Rule 2 uses the Allow set to restrict backward flow, an element moving through the program flowgraph may be able to take a shortcut on its unmatched returns. It would then avoid making unmatched returns along the complete cycle before reaching a program point. This shortcut may permit the element to affect something that it should not be able to affect, possibly adding to the ripple effect beyond what should be there.

3.3 A Prototype Demonstrates the Algorithm

This section first considers the complexity of our interprocedural logical ripple effect algorithm.
A prototype that demonstrates the algorithm is then described, and test results are presented.

Let n be the number of nodes in the flowgraph of the input program. For a programming language such as C, solving the dataflow equations for a single definition, which is what line 8 does, has worst-case complexity O(n). Let k be the number of known calls in the input program. Considering line 47, a definition may be dataflow analyzed repeatedly as long as the associated restriction sets are not subsets of any previous pair of restriction sets used to dataflow analyze that definition. The number of distinct restriction sets possible such that no set is a subset of another grows exponentially with k. Thus, the worst-case complexity of our logical ripple effect algorithm is exponential, where the exponent is some function of k. However, for the typical input program, the actual number of non-subset restriction sets that our algorithm can generate for a given definition will be severely constrained by a combination of Lemma 1, Theorems 1 through 4, and the typical program call structure, which is characterized by shallow call depth.

A prototype that demonstrates our logical ripple effect algorithm has been built. The prototype accepts as input C programs that satisfy certain constraints, such as having only single-identifier variable names. Given an input program, the prototype then requires that one or more definitions be identified as the starting point of the ripple effect. For purposes of comparison, besides using our algorithm to compute a precise logical ripple effect, the prototype also computes an overestimate of the logical ripple effect. The overestimate is computed by simply ignoring the execution-path problem, i.e., no backward-flow restrictions are applied when the overestimate is computed.
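Returning to the complexity argument for line 47, the following minimal sketch models the Analyze procedure's subsumption test and shows why the number of analyses per definition can grow exponentially. The class and method names are hypothetical. The demonstration relies on Sperner's theorem: the largest family of pairwise non-nested subsets of a k-element set has C(k, ⌊k/2⌋) members. Considering only Allow sets with empty Transform sets, each of those subsets passes the test at line 47.

```python
from itertools import combinations
from math import comb
from typing import Dict, FrozenSet, List, Set, Tuple

Pair = Tuple[FrozenSet[str], FrozenSet[str]]


class Scheduler:
    """Sketch of the Analyze procedure (lines 47 to 52 of Figure 3.3)."""

    def __init__(self) -> None:
        self.fin: Set[str] = set()              # definitions with FIN = true
        self.saved: Dict[str, List[Pair]] = {}  # saved (P, T) pairs per definition
        self.stack: List[Tuple[str, FrozenSet[str], FrozenSet[str]]] = []

    def analyze(self, dd: str, paths: Set[str], trans: Set[str]) -> bool:
        """Push (dd, PATHS, TRANS) unless it is subsumed; True if pushed."""
        p, t = frozenset(paths), frozenset(trans)
        if dd in self.fin:                      # already analyzed unrestricted
            return False
        if not p:                               # lines 48 to 50
            self.fin.add(dd)
            self.stack.append((dd, frozenset(), frozenset()))
            return True
        # line 47 (Lemma 3): skip if a saved (P, T) has PATHS ⊆ P and TRANS ⊆ T
        if any(p <= P and t <= T for P, T in self.saved.get(dd, [])):
            return False
        self.saved.setdefault(dd, []).append((p, t))  # line 51
        self.stack.append((dd, p, t))                 # line 52
        return True


# All size-k/2 subsets of k calls are pairwise non-nested, so none is
# subsumed: one definition gets scheduled C(k, k/2) times.
k = 6
calls = [f"c{i}" for i in range(k)]
s = Scheduler()
pushed = sum(s.analyze("dd", set(a), set()) for a in combinations(calls, k // 2))
print(pushed, comb(k, k // 2))               # 20 20
print(s.analyze("dd", {"c0", "c1"}, set()))  # False: subsumed by {c0, c1, c2}
```

C(k, ⌊k/2⌋) grows roughly like 2^k/√k, which is consistent with the exponential worst case stated above. The shallow call structures discussed above keep real programs far from this bound.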
The worst-case complexity of computing the overestimate for C programs is only O(nd), where n is the number of flowgraph nodes and d is the number of definitions in the overestimated ripple effect. This complexity follows from the O(n) complexity of solving the dataflow equations for a single definition, and from the fact that the equations will have to be solved d times.

Table 3.1. Experimental results for the prototype.

| globals | defs | defs global | depth | nodes | RSo | RSp | reduction | timeo | timep |
|---:|---:|---:|:---:|---:|---:|---:|---:|---:|---:|
| 50 | 2420 | 7% | 2/213 | 3939 | 2275 | 936 | 53.4% | 5s | 3s |
| 100 | 2291 | 15% | 2/188 | 3776 | 4151 | 2449 | 41.0% | 17s | 13s |
| 200 | 2294 | 30% | 2/188 | 3662 | 5594 | 3718 | 33.5% | 40s | 32s |
| 300 | 2370 | 45% | 2/231 | 3962 | 5897 | 2607 | 55.8% | 1m5s | 27s |
| 50 | 2225 | 7% | 3/202 | 3717 | 1222 | 633 | 40.3% | 3s | 2s |
| 100 | 2333 | 15% | 3/229 | 3864 | 4139 | 1867 | 54.9% | 17s | 7s |
| 200 | 2211 | 30% | 3/231 | 3760 | 4884 | 2688 | 45.0% | 39s | 28s |
| 300 | 2236 | 45% | 3/205 | 3737 | 5308 | 3505 | 34.0% | 59s | 38s |
| 50 | 2320 | 7% | 4/227 | 3912 | 1822 | 1067 | 35.1% | 5s | 3s |
| 100 | 2211 | 15% | 4/228 | 3673 | 4329 | 1525 | 64.8% | 18s | 7s |
| 200 | 2223 | 30% | 4/227 | 3705 | 5019 | 1918 | 61.8% | 37s | 16s |
| 300 | 2214 | 45% | 4/214 | 3648 | 5922 | 4740 | 20.0% | 1m9s | 1m36s |
| 100 | 4354 | 7% | 2/372 | 6858 | 4317 | 2201 | 40.0% | 19s | 10s |
| 200 | 4467 | 15% | 2/368 | 7068 | 8844 | 6457 | 27.0% | 1m17s | 1m12s |
| 400 | 4261 | 30% | 2/388 | 6851 | 9653 | 2976 | 69.2% | 2m29s | 49s |
| 600 | 4289 | 45% | 2/340 | 6784 | 10590 | 6840 | 35.4% | 4m8s | 3m56s |
| 100 | 4314 | 7% | 3/432 | 6781 | 1993 | 631 | 52.5% | 8s | 2s |
| 200 | 4268 | 15% | 3/395 | 6876 | 5795 | 3236 | 35.5% | 51s | 54s |
| 400 | 4223 | 30% | 3/393 | 6735 | 9240 | 7307 | 20.9% | 2m26s | 4m21s |
| 600 | 4248 | 45% | 3/433 | 6868 | 9772 | 6453 | 30.6% | 3m56s | 4m50s |
| 100 | 4252 | 7% | 4/455 | 6961 | 2756 | 1120 | 42.6% | 14s | 5s |
| 200 | 4276 | 15% | 4/440 | 6858 | 7781 | 5752 | 26.1% | 1m10s | 2m35s |
| 400 | 4228 | 30% | 4/391 | 6681 | 9838 | 8290 | 15.7% | 2m45s | 9m20s |
| 600 | 4112 | 45% | 4/462 | 6802 | 10017 | 9192 | 8.2% | 4m24s | 39m55s |

Table 3.1 presents test results for the prototype. Each row details relevant characteristics of an input program and presents the averages of ten different tests of that input program, where each test computed the ripple effect started by a single, randomly chosen definition of a global variable.
The input programs of Table 3.1 were randomly generated by a separate program generator. The generated input programs are syntactically correct and compile without error, but have meaningless executions. Each input program of Table 3.1 has 100 procedures and exactly the number of global variables listed. Within each input program, each global variable is defined and used at least once. The generator determined the call structure of each input program randomly, subject to two constraints: there is no recursion in the input program, and no call exceeds the given maximum call depth. All calls in the generated input program are known calls, and approximately 1/(max + 1) of the calls will be at each possible depth from zero to max, where max is the given maximum call depth.

Referring to the columns of Table 3.1: “globals” is the number of global variables in the input program; “defs” is the number of definitions in the input program; “defs global” is the percentage of the definitions that define a global variable; “depth” is the maximum call depth followed by the total number of calls in the input program; “nodes” is the number of nodes in the flowgraph; “RSo” is the average size of the overestimated ripple effect for the ten test cases, where size is the total number of definitions and uses in the ripple effect; “RSp” is the average size of the precise ripple effect; “reduction” is the average, over the ten test cases, of the percentage reduction in size when the overestimated ripple effect is replaced by the precise ripple effect; “timeo” is the average CPU time per test case to compute the overestimated ripple effect; and “timep” is the average CPU time per test case to compute the precise ripple effect. The hardware used was rated at roughly 24 MIPS. As an example of the time notation used in Table 3.1, 1m36s is read as 1 minute, 36 seconds.
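The following small reading aid for Table 3.1 converts its time notation to seconds and shows why the “reduction” column cannot be recomputed from the RSo and RSp columns. The column is an average of ten per-test percentages, whereas a percentage computed from the averaged sizes is a different quantity. The helper names are hypothetical, and the numbers come from the first and last rows of the table.

```python
import re


def to_seconds(t: str) -> int:
    """Convert Table 3.1 time notation such as '1m36s' or '5s' to seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?(\d+)s", t)
    if m is None:
        raise ValueError(f"unrecognized time: {t!r}")
    return 60 * int(m.group(1) or 0) + int(m.group(2))


def ratio_of_averages_reduction(rs_o: float, rs_p: float) -> float:
    """Percentage reduction computed from averaged sizes, not per test."""
    return 100.0 * (rs_o - rs_p) / rs_o


# First row: RSo = 2275, RSp = 936; the table reports a 53.4% average reduction.
print(round(ratio_of_averages_reduction(2275, 936), 1))        # 58.9
# Last row: timeo = 4m24s, timep = 39m55s.
print(to_seconds("4m24s"), to_seconds("39m55s"))               # 264 2395
print(round(to_seconds("39m55s") / to_seconds("4m24s"), 1))    # 9.1
```

The last row's precise computation takes roughly nine times as long as the overestimate. This is the outlier discussed next.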
Although the worst-case complexity of our algorithm for the precise logical ripple effect is exponential, the data of Table 3.1 indicate that, for a wide range of input programs in a programming language such as C, the expected complexity is approximated by O(nd). This follows from the O(nd) worst-case complexity of computing the overestimate and from the typical closeness of timeo and timep in each row of Table 3.1. However, the last row of Table 3.1 is instructive. It shows that, regardless of what the expected complexity might be, there will always be specific input programs and starting points that require time greatly exceeding the time required to compute the overestimate. In practice, if the computation of the precise logical ripple effect is taking too long, this computation can be abandoned, and the overestimate can be computed and used in its place. Note that our algorithm can very easily compute the overestimate by simply modifying Rule 1 so that element d2 is always generated in place of element d1, thereby avoiding all backward-flow restrictions.

3.4 The Slicing Algorithm

This section presents the inverse form of the precise interprocedural logical ripple effect algorithm, together with the inverse forms of the associated dataflow equations and backward-flow restriction rules. Our algorithm for precise interprocedural slicing is shown in Figure 3.5. The complexity and expected performance of this algorithm are the same as those of the precise interprocedural logical ripple effect algorithm given previously. For the logical ripple effect, the dataflow problem solved at line 8 was reaching definitions for a single definition. For slicing, which is the inverse problem, the dataflow problem solved at line 8 will be reaching uses for a single use.
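The contrast between the two single-element problems can be sketched on body sets alone. The function below propagates one element with the Group III gen/kill equations, either along the arcs (reaching definitions) or against them (reaching uses). This is a deliberately simplified, intraprocedural sketch. It ignores entry sets, the call and return equations with C1 and C2, RECODE, and Rules 1 to 3, and the function name and example flowgraph are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple

Arc = Tuple[int, int]


def solve_single(arcs: Iterable[Arc], gen_node: int, kill_nodes: Set[int],
                 forward: bool = True) -> Set[int]:
    """Propagate one element through body sets (Group III equations only).

    forward=True  -- reaching definitions: flow follows the arcs; returns
                     {n | element in Bin[n]}.
    forward=False -- reaching uses: flow runs against the arcs; returns
                     {n | element in Bout[n]}.
    kill_nodes    -- nodes whose KILL set contains the element.
    """
    nxt: Dict[int, List[int]] = defaultdict(list)
    for a, b in arcs:
        if forward:
            nxt[a].append(b)
        else:
            nxt[b].append(a)
    reached: Set[int] = set()  # the element flows into these nodes
    passed = {gen_node}        # the element flows out of these nodes
    work = deque([gen_node])
    while work:
        n = work.popleft()
        for m in nxt[n]:
            reached.add(m)
            if m not in kill_nodes and m not in passed:
                passed.add(m)
                work.append(m)
    return reached


# Hypothetical flowgraph 1 -> 2 -> 3 -> 4 -> 5 with a loop arc 4 -> 2;
# variable x is defined at nodes 1 and 3 and used at nodes 2 and 5.
arcs = [(1, 2), (2, 3), (3, 4), (4, 2), (4, 5)]
defs_of_x = {1, 3}
print(sorted(solve_single(arcs, 1, defs_of_x)))                 # [2, 3]
print(sorted(solve_single(arcs, 5, defs_of_x, forward=False)))  # [3, 4]
```

The definition at node 1 reaches the use at node 2 but is killed at node 3. Conversely, the use at node 5 reaches back only to the definition at node 3, so only that definition joins its slice.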
In reaching definitions, the definition flows in the direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects uses of the same variable and any definitions directly dependent on an affected use. In reaching uses, the use flows in the reverse direction of the arcs in the flowgraph, is killed by definitions of the same variable, and affects definitions of the same variable and any uses that directly determine an affected definition. Because of this reverse flow, the dataflow equations solved at line 8 of the slicing algorithm must be an inverted form of the dataflow equations used by the logical ripple effect algorithm. These inverted dataflow equations are shown in Figure 3.6. The inverted rules that the slicing algorithm uses for backward-flow restriction are given below. Notice that the ALLOW and TRANSFORM sets will contain returns instead of calls.

         -- Compute the slice for a hypothetical or actual use b
         -- Input: a program flowgraph ready for dataflow analysis
         -- Output: the slice in SLICE
         begin
      1    SLICE ← ∅
      2    for each use uu in the program
      3      FIN_uu ← false
           end for
      4    stack ← ∅
      5    push (b, ∅, ∅) onto stack
      6    while stack ≠ ∅ do
      7      pop stack into (u, ALLOW, TRANSFORM)
      8      solve the reaching-uses dataflow equations for the single use u, using Rules 1, 2, and 3
      9      SLICE ← SLICE ∪ {u}
     10      for each definition d in the program that is affected by either u1 or u2
     11        SLICE ← SLICE ∪ {d}
             end for
     12      ROOT1 ← ∅, LINK1 ← ∅, ROOT2 ← ∅, LINK2 ← ∅
     13      for each return node n in the flowgraph
     14        if u1 ∈ Bin[n] ∧ u1 crossed from this return into the returned-from procedure
     15          ROOT1 ← ROOT1 ∪ {the return node n}
               fi
     16        if u1 ∈ Ein[n] ∧ u1 crossed from this return into the returned-from procedure
     17          LINK1 ← LINK1 ∪ {the return node n}
               fi
     18        if u2 ∈ Bin[n] ∧ u2 crossed from this return into the returned-from procedure
     19          ROOT2 ← ROOT2 ∪ {the return node n}
               fi
     20        if u2 ∈ Ein[n] ∧ u2 crossed from this return into the returned-from procedure
     21          LINK2 ← LINK2 ∪ {the return node n}
               fi
             end for
     22      for each use uu in the program that is affected by either u1 or u2
               -- determine Allow and Transform for uu by Theorem 1
     23        if u2 ∈ Bout[node where uu occurs]
     24          PATHS ← ∅, TRANS ← ∅
     25          call Analyze
               else
                 -- determine Allow and Transform for uu by Theorem 2
     26          if u2 ∈ Eout[node where uu occurs]
     27            PATHS ← ∅
     28            PATHS ← {x | x ∈ (ROOT2 ∪ LINK2) ∧ (x returns from the procedure that contains uu
                             ∨ x returns from a procedure that contains a return r ∈ (PATHS ∩ LINK2))}
     29            TRANS ← ROOT2 ∩ PATHS
     30            call Analyze
                 fi
                 -- determine Allow and Transform for uu by Theorem 3
     31          if u1 ∈ Bout[node where uu occurs]
     32            PATHS ← ∅
     33            PATHS ← {x | x ∈ ALLOW ∧ (x returns from the procedure that contains uu
                             ∨ x returns from a procedure that contains a return r ∈ PATHS)}
     34            TRANS ← TRANSFORM ∩ PATHS
     35            call Analyze
                 fi
                 -- determine Allow and Transform for uu by Theorem 4
     36          if u1 ∈ Eout[node where uu occurs]
     37            for each procedure X that contains a return ∈ ROOT1
     38              RT1 ← {x | x ∈ ROOT1 ∧ x is contained in procedure X}
     39              PP ← ∅
     40              PP ← {x | x ∈ (RT1 ∪ LINK1) ∧ (x is on a path that inclusively begins
                            with a return ∈ RT1 and ends with a return from the procedure that
                            contains uu, such that each return in this path is in (RT1 ∪ LINK1))}
     41              if PP ≠ ∅
     42                PATHS ← ∅
     43                PATHS ← {x | x ∈ ALLOW ∧ (x returns from procedure X
                                 ∨ x returns from a procedure that contains a return r ∈ PATHS)}
     44                TRANS ← TRANSFORM ∩ PATHS
     45                PATHS ← PATHS ∪ PP
     46                call Analyze
                     fi
                   end for
                 fi
               fi
             end for
           od
         end

         Procedure Analyze
         begin
           -- avoid repetition of uu dataflow analysis if possible
     47    if FIN_uu ≠ true ∧ (PATHS = ∅ ∨ (true for all saved pairs for uu: PATHS ⊄ P ∨ TRANS ⊄ T))
     48      if PATHS = ∅
     49        FIN_uu ← true
     50        push (uu, ∅, ∅) onto stack
             else
     51        save PATHS and TRANS as the pair (P, T) for uu
     52        push (uu, PATHS, TRANS) onto stack
             fi
           fi
         end

Figure 3.5. The slicing algorithm.

Rule 1. If ALLOW = ∅, then element u2 is generated at the node where use u occurs; otherwise u1 is the generated element.

Rule 2. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the Bout[n] equation is computed, if u1 ∈ Bin[q], then u1 cannot cross from Bin[q] into the Bout[n] set if p ∉ ALLOW.

Rule 3. Let n be a call node, p be the associated return node, and q be the entry node of the returned-from procedure. Each time the Bout[n] equation is computed, if u1 ∈ Bin[q], and, by C2 and Rule 2, u1 can cross from Bin[q] into the Bout[n] set, and p ∈ TRANSFORM, then as this u1 element crosses from Bin[q] into the Bout[n] set, the element is changed to u2. In effect, u1 is transformed into u2, and the call node n becomes a generation node for the u2 element.

Because slicing is useful primarily for program fault localization, it may be desirable to modify the algorithm as follows. A use in a control predicate whose subordinate statements have at least one use or definition already in the slice would itself be added to the slice and propagated in turn. An example of a control predicate is the condition tested by an if statement. Subordinate statements are those statements whose execution is decided by the control predicate. Including these control-predicate uses in the slice is advantageous because the cause of a program error may actually be a control predicate that does not decide correctly when to execute its subordinate statements. Ferrante et al. [8] present a method to precisely determine the control predicates for each statement.

For any node n:

$$
\mathrm{OUT}[n] = E_{out}[n] \cup B_{out}[n], \qquad \mathrm{IN}[n] = E_{in}[n] \cup B_{in}[n]
$$

Group I: n is an exit node.

$$
\begin{aligned}
B_{out}[n] &= \emptyset \\
E_{out}[n] &= \bigcup_{p \in \mathrm{succ}(n)} \{\, x \mid x \in \mathrm{IN}[p] \wedge C_1 \,\} \\
B_{in}[n] &= \mathrm{GEN}[n] \\
E_{in}[n] &= E_{out}[n] \cup \mathrm{RECODE}[n]
\end{aligned}
$$

Group II: n is a call node, p is the associated return node, and q is the entry node of the returned-from procedure.

$$
\begin{aligned}
B_{out}[n] &= \{\, x \mid (x \in B_{in}[p] \wedge (\lnot C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{in}[q]))) \vee (x \in B_{in}[q] \wedge C_2) \,\} \\
E_{out}[n] &= \{\, x \in E_{in}[p] \mid \lnot C_1 \vee (C_1 \wedge C_2 \wedge x \in E_{in}[q]) \,\} \\
B_{in}[n] &= (B_{out}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n] \\
E_{in}[n] &= E_{out}[n] - \mathrm{KILL}[n]
\end{aligned}
$$

Group III: n is not an exit or call node.

$$
\begin{aligned}
B_{out}[n] &= \bigcup_{p \in \mathrm{succ}(n)} B_{in}[p] \\
E_{out}[n] &= \bigcup_{p \in \mathrm{succ}(n)} E_{in}[p] \\
B_{in}[n] &= (B_{out}[n] - \mathrm{KILL}[n]) \cup \mathrm{GEN}[n] \\
E_{in}[n] &= E_{out}[n] - \mathrm{KILL}[n]
\end{aligned}
$$

Figure 3.6. Dataflow equations for the reaching-uses problem.

CHAPTER 4
INTERPROCEDURAL PARALLELIZATION

4.1 Loop-Carried Data Dependence

This section explains loop-carried data dependence and its relevance to parallelization. When a definition of a variable reaches a use of that variable, a data dependence exists such that the use depends on the definition. An example of data dependence can be seen in Figure 4.1. The use of A(I) at line 3 and the use of A(I) at line 4 both depend on the definition of A(I) at line 2. However, when considering whether or not a loop can be parallelized, there is a special kind of data dependence called loop-carried data dependence [25]. A data dependence is loop carried if the value set by a definition inside the loop during loop iteration i can be used by a use of that variable inside the loop during loop iteration j, where i ≠ j.
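For a single array whose subscripts are simple functions of the loop index over known bounds, the definition above can be checked by brute force. The sketch below is not one of the efficient tests discussed later in this section; it simply enumerates iteration pairs. Several things are assumptions for illustration: the bound N = 100 for Figure 4.1, the A(I+1) example, and the function name. The sketch also ignores control flow inside the loop, such as the conditional at line 4.

```python
from typing import Callable, Dict, Set


def loop_carried(def_index: Callable[[int], int],
                 use_index: Callable[[int], int], lo: int, hi: int) -> bool:
    """Brute force: can A(def_index(i)), written in iteration i, be read as
    A(use_index(j)) in an iteration j with i != j, for lo <= i, j <= hi?"""
    writers: Dict[int, Set[int]] = {}
    for i in range(lo, hi + 1):
        writers.setdefault(def_index(i), set()).add(i)
    for j in range(lo, hi + 1):
        if writers.get(use_index(j), set()) - {j}:  # some writer i != j
            return True
    return False


print(loop_carried(lambda i: i, lambda j: j, 1, 100))               # False: A(I), A(I)
print(loop_carried(lambda i: i + 1, lambda j: j, 1, 100))           # True: A(I+1), A(I)
print(loop_carried(lambda i: 3 * i - 5, lambda j: 6 * j, 30, 100))  # False
```

In the first case every element is written and read in the same iteration, which is why the A(I) dependences of Figure 4.1 do not by themselves inhibit parallelization.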
Note that i ≠ j is specified instead of the more restrictive and natural-seeming i < j, because if the loop is parallelized, the ordering of the loop iterations cannot be assumed. The relationship between loop-carried data dependence and parallelization is straightforward: if there is at least one loop-carried data dependence, then the loop cannot be parallelized; otherwise, the loop can be parallelized. Loop parallelization would mean that the ordering of the different iterations of the loop is unimportant, whereas a loop-carried dependence means the opposite. If there are no loop-carried data dependencies, then there is no requirement that the iterations be ordered a certain way.

    1  DO I = 1,N
    2    A(I) = B(I) * C(I) + D
    3    B(I) = C(I) / D + A(I)
    4    IF C(I) < 0 THEN C(I) = A(I) * B(I) FI
       END DO

Figure 4.1. An example loop.

However, whenever a loop is parallelized, there should be an added serial step following the loop. This step sets the iteration variables, such as I in Figure 4.1, to whatever their values would be for the last iteration of the loop had the loop not been parallelized. The step is necessary when the iteration variables of a loop are visible outside the loop and can therefore be referenced after the loop completes. Iteration variables are those variables that are incremented or decremented by a constant value in each loop iteration. The recognition of iteration variables is language-dependent.

Regarding data dependence and arrays, several efficient tests are available that determine whether a data dependence is possible between a particular definition and use of an array: the separability test, the gcd test, and the Banerjee test. Details of these three tests can be found in [25]. The number theory behind the tests is that of linear diophantine equations. A linear diophantine equation can be formed from the array subscripts of the definition and use in question.
For example, in Figure 4.2 we want to know whether A(3 * I - 5) and A(6 * I) can ever refer to the same array element. The linear Diophantine equation that relates these two array references is $3x - 6y = 5$. The question now becomes whether this equation has any integer solutions given the bounds $30 \le x, y \le 100$. If there is at least one integer solution, there is a data dependence; otherwise there is none. The latter is the case with Figure 4.2: since $\gcd(3, 6) = 3$ does not divide 5, the equation has no integer solutions at all.

```
DO I = 30,100
   A(3 * I - 5) = ...
   ... = A(6 * I)
END DO
```

Figure 4.2. A loop with array references.

For the discussion that follows, we define the term loop body. The loop body of a loop L is the set of all statements in the program that can possibly be executed during the iterations of loop L. Calls are allowed in a loop, so a single loop body can include the statements of many different procedures. For example, if a loop contains a call of procedure A, and procedure A contains a call of procedure B, then the loop body includes all the statements of procedures A and B. In Figure 4.1, the loop body is the four statements at lines 1 through 4. With respect to the program flowgraph, the loop body is the set of all flowgraph nodes that may be traversed during the iterations of the loop.

Let LB be the set of flowgraph nodes in the loop body of loop L. Let n be the first node in the loop body that is traversed during each iteration of the loop. The identification of node n is language-dependent. Within the loop body of L, let definition d be a definition of a non-array variable v, and let use u be a use of the variable v that is reached by definition d. We also write d for the node in the loop body where definition d occurs, and u for the node in the loop body where use u occurs. To avoid the complications posed by special cases, we assume that d, n, and u are distinct nodes.
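The loop-body definition amounts to a transitive closure over the calls made inside the loop. The Python sketch below assumes a hypothetical call-graph representation that is not part of the original: `loop_nodes` holds the loop's own flowgraph nodes, `loop_callees` the procedures called directly inside the loop, `callees` maps each procedure to the procedures it calls, and `proc_nodes` maps each procedure to its flowgraph nodes. Like the text's example of procedures A and B, it conservatively includes every node of each reachable procedure.

```python
from collections import deque
from typing import Dict, Iterable, Set

def loop_body(loop_nodes: Iterable[str],
              loop_callees: Iterable[str],
              callees: Dict[str, Set[str]],
              proc_nodes: Dict[str, Set[str]]) -> Set[str]:
    """Return LB for one loop: its own flowgraph nodes plus every node of
    every procedure reachable through calls made inside the loop."""
    body: Set[str] = set(loop_nodes)
    visited: Set[str] = set()
    work = deque(loop_callees)                # procedures called directly in the loop
    while work:
        proc = work.popleft()
        if proc in visited:                   # handles recursion and repeated calls
            continue
        visited.add(proc)
        body |= proc_nodes.get(proc, set())   # all nodes of the called procedure
        work.extend(callees.get(proc, set())) # procedures it calls in turn
    return body
```

With a loop that calls A, where A calls B, the result contains the loop's own nodes plus all nodes of A and B, and no nodes of any uncalled procedure. Tracking visited procedures makes the worklist terminate even for recursive calls.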
Although use u depends on definition d because definition d reaches use u, this data dependence can prevent parallelization of loop L only if the dependence is loop carried. Let P be a sequence of flowgraph nodes drawn from LB such that P represents a possible execution path along which definition d can reach use u. For definition d to be loop-carried to use u along path P, the three nodes d, n, and u must be in P, in that order, because only the traversal of node n represents the transition to a different iteration of the loop.

If v is an array, then we assume that definition d and use u may refer to different array elements during the same iteration. For this reason, a path P that includes the nodes d, u, n, d, u, in that order, must be assumed to show a loop-carried data dependence when v is an array. The same path does not show a loop-carried data dependence when definition d and use u always refer to the same storage location in any iteration, as we assume is the case when v is a non-array: in any iteration that follows such a path P, the value used at use u is always the value defined at definition d in that same iteration.

4.2 The Parallelization Algorithm

This section presents in Figure 4.3 an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm uses our interprocedural dataflow analysis method as an integral step to determine data dependencies. The loops that can be parallelized are those loops that the algorithm does not mark as inhibited. The algorithm has three distinct steps. First, the reaching-definitions dataflow problem is solved for the input program by using our interprocedural dataflow analysis method. Second, for array references, the reaching-definition information computed by the first step may be improved by using the separability, gcd, and Banerjee tests.
Third, individual d, u pairs that represent data dependence are examined for loop-carried data dependence.

```
-- a d, u pair is a definition d that reaches a use u
-- x is the dataflow element that represents the definition d
-- v is the variable referenced by definition d and use u
-- to avoid complications, n ≠ d ≠ u is assumed
-- n is the first node traversed during each loop L iteration
-- d is the node whose basic block contains definition d
-- u is the node whose basic block contains use u
-- LB is the set of nodes in the loop body of loop L
-- IV is the set of definitions of iteration variables for loop L
    begin
    -- step 1, determine reaching definitions for the input program
 1    use our method to solve the reaching-definitions dataflow problem
    -- step 2, improve the reaching-definition information for array references
 2    for all d, u pairs in the program, such that v is an array
 3      use the separability, gcd, and Banerjee tests as applicable
 4      if definition d and use u can never reference the same element
 5        mark the d, u pair as non-reaching
        fi
      end for
    -- step 3, identify d, u pairs that inhibit parallelization
 6    for each loop L in the program
 7      for each reaching d, u pair such that d, u ∈ LB and definition d ∉ IV
 8        if x ∈ B_out[n]
 9          if P-test(x, n, d, u, L, LB) = T
10            mark L parallelization as inhibited by the d, u pair
            fi
          fi
        end for
      end for
    end
```

Figure 4.3. The parallelization algorithm.

```
    procedure P-test(x, n, d, u, L, LB)
    -- is there a loop-carried data dependence from definition d to use u thru node n
    -- return T if yes, F if no
    begin
    -- part1, is there a path from d to n along which x is found
11    if v is an array
12      DONE ← {d}
      else
13      DONE ← {d, u}
      fi
14    NEXT ← {d}
15    until NEXT = ∅
16      remove a node from NEXT, denote it p
17      for each successor node s of node p, such that s ∉ DONE
18        DONE ← DONE ∪ {s}
19        if s ∉ LB ∨ s is an entry node ∨ x ∉ B_out[s]
20          ignore s
21        else if s = n
22          goto part2
          else
23          NEXT ← NEXT ∪ {s}
          fi
        end for
      end until
24    return F
```

Figure 4.3 (continued).

```
    part2:
    -- part2, is there a path from n to u along which x is found
25    if v is an array
26      DONE ← {n}
      else
27      DONE ← {n, d}
      fi
28    NEXT ← {n}
29    until NEXT = ∅
30      remove a node from NEXT, denote it p
31      for each successor node s of node p, such that s ∉ DONE
32        DONE ← DONE ∪ {s}
33        if s ∉ LB ∨ s is an exit node
             ∨ (s is contained in the same procedure that contains L ∧ x ∉ B_out[s])
             ∨ (s is not contained in the same procedure that contains L ∧ x ∉ E_out[s])
34          ignore s
35        else if s = u
36          return T
          else
37          NEXT ← NEXT ∪ {s}
          fi
        end for
      end until
38    return F
    end
```

Figure 4.3 (continued).

At line 7, the definitions and uses of iteration variables are excluded from testing for loop-carried data dependence, because in any iteration the iteration variables have constant values that can be precomputed if loop L is parallelized. The test at line 8 is a necessary condition for procedure P-test to return T, which is tested at line 9; it is done as an economy measure to avoid, when possible, the more costly P-test. Procedure P-test uses a straightforward algorithm that begins with node d and then spreads out, examining successors, successors of successors, and so on, until either there are no more acceptable nodes to examine, in which case F is returned, or all the requirements for path P have been met, in which case T is returned. The successors of a node are examined because a successor node is normally assumed to represent a possible continuation of the execution path from the point of the predecessor node. The exceptions involving entry and exit nodes are explained shortly. Note that P-test only determines whether a satisfactory path P exists; it does not determine an actual node sequence for P, as there may be many satisfactory paths. Lines 13 and 27 are active when v is not an array.
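The pseudocode of procedure P-test translates fairly directly into two worklist searches. The Python sketch below is an illustration only, not the author's implementation: the `Flowgraph` class and its field names, the `loop_proc` parameter (the name of the procedure that contains L, which is all P-test uses L for), and the `v_is_array` flag are assumptions introduced here. The `goto part2` of the pseudocode becomes an early return from the first search.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, Hashable, List, Set

Node = Hashable

@dataclass
class Flowgraph:
    """Hypothetical flowgraph representation used only for this sketch."""
    succ: Dict[Node, List[Node]]                                    # successor lists
    B_out: Dict[Node, Set[Hashable]]                                # body-effect out sets
    E_out: Dict[Node, Set[Hashable]] = field(default_factory=dict)  # calling-context out sets
    entry_nodes: Set[Node] = field(default_factory=set)
    exit_nodes: Set[Node] = field(default_factory=set)
    proc_of: Dict[Node, str] = field(default_factory=dict)          # node -> procedure name


def p_test(x: Hashable, n: Node, d: Node, u: Node, LB: Set[Node],
           g: Flowgraph, loop_proc: str, v_is_array: bool) -> bool:
    """True if some path d -> ... -> n -> ... -> u inside LB carries element x."""

    def search(start: Node, target: Node, done: Set[Node],
               reject: Callable[[Node], bool]) -> bool:
        # Worklist search; a node enters DONE, and hence NEXT, at most once.
        nxt = deque([start])
        while nxt:
            p = nxt.popleft()
            for s in g.succ.get(p, []):
                if s in done:
                    continue
                done.add(s)
                if reject(s):              # "ignore s"
                    continue
                if s == target:
                    return True
                nxt.append(s)
        return False

    # part1 (lines 11-24): d to n, never descending into a called procedure.
    def reject1(s: Node) -> bool:
        return (s not in LB or s in g.entry_nodes
                or x not in g.B_out.get(s, set()))

    done1 = {d} if v_is_array else {d, u}  # non-array: u may not precede n
    if not search(d, n, done1, reject1):
        return False

    # part2 (lines 25-38): n to u, never returning through an exit node.
    def reject2(s: Node) -> bool:
        if s not in LB or s in g.exit_nodes:
            return True
        sets = g.B_out if g.proc_of.get(s) == loop_proc else g.E_out
        return x not in sets.get(s, set())

    done2 = {n} if v_is_array else {n, d}  # non-array: d may not follow n
    return search(n, u, done2, reject2)
```

On a single-procedure loop whose body runs n, d, u in that order, the sketch returns False for a non-array v and True for an array v, matching the d, u, n, d, u discussion. A driver for step 3 would call `p_test` only for reaching pairs with d, u ∈ LB, definition d ∉ IV, and x ∈ B_out[n], mirroring lines 6 through 10.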
When v is not an array, a path P that includes d, u, n, d, u, in that order, is not allowed; this is prevented by placing the unwanted node u in DONE at line 13 and the unwanted node d in DONE at line 27. The test x ∉ B_out[s] at line 19 enforces the requirement that definition d be able to reach along path P. A similar test is made at line 33. At line 19, only the B set is checked because part1 never descends into called procedures, since entry nodes are rejected at line 19. Entry nodes can be rejected there because no path from d to n leaves a call unreturned: n is an outermost node relative to the loop body, and the path is confined to the loop body. As the successors of each call node are an entry node and a return node, it is only necessary to check the out set of the return node to know whether element x survived the call, and this is effectively done by the x ∉ B_out[s] test already mentioned. At line 33, exit nodes are rejected because no path from n to u makes a return without first making the corresponding call. This follows from the fact, already mentioned, that node n is an outermost node relative to the loop body and the path is confined to the loop body. As the return node can always be added to path P from the call node, there is no need to add it from the exit node; hence the exit node is rejected. In part1 and part2 of procedure P-test, each flowgraph node can enter the NEXT set at most once, so the complexity of the P-test procedure is O(n), where n is the number of flowgraph nodes. For the entire algorithm, step 3 dominates, so the complexity is O(lpn), where l is the number of loops in the program, p is the number of d, u pairs in the program, and n is the number of flowgraph nodes.

CHAPTER 5
CONCLUSIONS AND FUTURE RESEARCH

5.1 Summary of Main Results

The first part of this work presented a new method for context-dependent, flow-sensitive interprocedural dataflow analysis.
The method was shown to produce a precise, low-cost solution for such fundamental and important problems as reaching definitions and available expressions, regardless of the actual call structure of the program being analyzed. By using one set to isolate calling-context effects and another set to accumulate body effects, the calling-context problem has been reduced to the problem of solving the dataflow equations that compute the different sets. These equations can be solved by the iterative algorithm. As part of our method, the interprocedural kill effects of call-by-reference formal parameters are correctly handled by element recoding, a technique compatible with the equations. The importance of our interprocedural analysis method lies in the fact that a number of different applications depend on the solution of fundamental dataflow problems such as reaching definitions, live variables, definition-use and use-definition chains, and available expressions. Program revalidation, dataflow anomaly detection, compiler optimization, automatic vectorization and parallelization, and software tools that make a program more understandable by revealing data dependencies are some of the applications that may benefit from our method.

The second part of this work presented new algorithms for precise interprocedural logical ripple effect and slicing. The algorithms use our interprocedural dataflow analysis method and add a control mechanism by which, in effect, execution-path history can affect execution-path continuation as the ripple effect or slice is built piece by piece. The importance of our algorithms for precise interprocedural logical ripple effect and slicing lies in their applicability to software maintenance and debugging. A precise interprocedural logical ripple effect can be used to show a programmer the consequences of program changes, thereby reducing errors and maintenance cost.
Similarly, a precise interprocedural slice can localize program faults, thereby saving programmer effort and debugging cost.

The third part of this work presented an algorithm that identifies loops that can be parallelized, including loops that contain calls. The algorithm uses our interprocedural dataflow analysis method to determine data dependencies, then examines the data dependencies within each loop and determines whether any of them are loop-carried, in which case parallelization of the loop is inhibited. The algorithm has potential use in parallelization tools.

5.2 Directions for Future Research

There are several topics of possible future research related to our method for interprocedural dataflow analysis. Regarding solving the equations, besides the iterative algorithm there are elimination algorithms [20] that have better complexity. Further studies are needed to determine to what extent these other algorithms can be used to solve the equations. Another topic concerns the dataflow problems that can be solved by our method, as the actual universe of solvable problems remains to be determined; we have mentioned only a few of the better-known problems. For some dataflow problems, our method may be usable after suitable modification to adapt it to the special needs of the problem.

Regarding possible future research related to our algorithms for precise interprocedural logical ripple effect and slicing, because the algorithms may overestimate when recursive calls are present, or because the Allow set lacks the information needed to enforce the ordering of unmatched returns, one area of future research would be to investigate the possibility of modifying Definition 1, Theorems 1 through 4, and the algorithms so as to remove the possibility of such overestimation.

REFERENCES

[1] Agrawal, H., and Horgan, J. Dynamic program slicing. Proceedings of the SIGPLAN 90 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 25, 6 (June 1990), 246-256.
[2] Aho, A., Sethi, R., and Ullman, J. Compilers, Principles, Techniques and Tools. Addison-Wesley, Reading, MA (1986).
[3] Allen, F. Interprocedural data flow analysis. Proceedings of the IFIP Congress 1974, North Holland, Amsterdam (1974), 398-402.
[4] Banning, J. An efficient way to find the side effects of procedure calls and the aliases of variables. Conference Record of the 6th ACM Symposium on Principles of Programming Languages, ACM, New York (Jan. 1979), 29-41.
[5] Burke, M., and Cytron, R. Interprocedural dependence analysis and parallelization. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 162-175.
[6] Callahan, D. The program summary graph and flow-sensitive interprocedural data flow analysis. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 47-56.
[7] Cooper, K., and Kennedy, K. Interprocedural side-effect analysis in linear time. Proceedings of the SIGPLAN 88 Conference on Programming Language Design and Implementation. ACM SIGPLAN Notices, 23, 7 (July 1988), 57-66.
[8] Ferrante, J., Ottenstein, K., and Warren, J. The program dependence graph and its use in optimization. ACM Transactions on Programming Languages and Systems, 9, 2 (1987), 319-349.
[9] Harrold, M., and Soffa, M. Computation of interprocedural definition and use dependencies. Proceedings of the IEEE Computer Society 1990 Int'l Conference on Computer Languages, New Orleans, LA (March 1990).
[10] Hecht, M. Flow Analysis of Computer Programs. Elsevier North-Holland, New York (1977).
[11] Horwitz, S., Reps, T., and Binkley, D. Interprocedural slicing using dependence graphs. ACM Transactions on Programming Languages and Systems, 12, 1 (Jan. 1990), 26-60.
[12] Hwang, J., Du, M., and Chou, C. Finding program slices for recursive procedures. Proceedings of the IEEE COMPSAC 88 (Oct. 1988), 220-227.
[13] Johmann, K., Liu, S., and Yau, S. Dataflow Equations for Context-Dependent Flow-Sensitive Interprocedural Analysis. SERC-TR-45-F, Department of Computer and Information Sciences, University of Florida, Gainesville (Jan. 1991).
[14] Korel, B., and Laski, J. Dynamic program slicing. Information Processing Letters, 29, 3 (Oct. 1988), 155-163.
[15] Landi, W., and Ryder, B. Pointer-induced aliasing: a problem classification. Conference Record of the 18th ACM Symposium on Principles of Programming Languages, ACM, New York (1991), 93-103.
[16] Leung, H., and Reghbati, H. Comments on program slicing. IEEE Transactions on Software Engineering, SE-13, 12 (Dec. 1987), 1370-1371.
[17] Myers, E. A precise interprocedural data flow analysis algorithm. Conference Record of the 8th ACM Symposium on Principles of Programming Languages, ACM, New York (1981), 219-230.
[18] Richardson, S., and Ganapathi, M. Interprocedural optimization: experimental results. Software: Practice and Experience, 19, 2 (1989), 149-169.
[19] Rosen, B. Data flow analysis for procedural languages. Journal of the ACM, 26, 2 (April 1979), 322-344.
[20] Ryder, B., and Paull, M. Elimination algorithms for data flow analysis. ACM Computing Surveys, 18, 3 (Sep. 1986), 277-316.
[21] Sharir, M., and Pnueli, A. Two approaches to interprocedural data flow analysis. In Muchnick, S., and Jones, N., Eds. Program Flow Analysis: Theory and Applications, Prentice-Hall, Englewood Cliffs, NJ (1981), 189-232.
[22] Triolet, R., Irigoin, F., and Feautrier, P. Direct parallelization of call statements. Proceedings of the SIGPLAN 86 Symposium on Compiler Construction, 176-185.
[23] Weiser, M. Programmers use slices when debugging. Communications of the ACM, 25, 7 (July 1982), 446-452.
[24] Weiser, M. Program slicing. IEEE Transactions on Software Engineering, SE-10, 4 (July 1984), 352-357.
[25] Zima, H., and Chapman, B. Supercompilers for Parallel and Vector Computers. Addison-Wesley, Reading, MA (1990).

BIOGRAPHICAL SKETCH

Kurt Johmann was born in Elizabeth, New Jersey, on November 16, 1955. In 1978 he received a B.A. in computer science from Rutgers University in New Jersey. Following graduation, he worked for a shipping company, Sea-Land Service Inc., as a programmer and systems analyst. In 1985 he left Sea-Land and did PC work for three years. He then entered the graduate program of the Computer and Information Sciences Department at the University of Florida in the fall of 1988. He received an M.S. in computer science in December 1989 and entered the Ph.D. program. Anticipating graduation, he hopes to find a job in academia.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Stephen S. Yau, Chairman
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Richard Newman-Wolfe, Cochairman
Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Paul Fishwick
Associate Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.
This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May, 1992

Winfred M. Phillips
Dean, College of Engineering

Madelyn M. Lockhart
Dean, Graduate School

UNIVERSITY OF FLORIDA