A HIERARCHICAL METHOD OF PERFORMING GLOBAL OPTIMIZATIONS

By

LAURIE A. WHITE

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
Copyright 1990 by Laurie Ann White
ACKNOWLEDGEMENTS

I must first thank the wonderful teachers along the way who made this possible for me. From Dorothy Buckley, who had me going to the library in the 6th grade to find out more, through Diana Brantley and Sue Sturgill, who actually expected me to think in high school, and Glenn Kesler, who frightened me with all that I did not know in a logic tutorial at the University of Virginia (but then went on to help me learn much of it), to those professors at the University of Florida who gave me the background, motivation, and encouragement to undertake this dissertation.

I especially appreciate the work done by my committee chairman, Dr. Gerhard Ritter, and all of the members of my committee. Jeffie Woodham, the graduate secretary of the Computer and Information Sciences Department, made everything much easier by both knowing exactly what I had to do at every step along the way and being ready with a smile as she helped me through.

Special thanks go to Dr. Joseph N. Wilson, my cochairman. He spent literally hundreds of hours working with me in all phases of this project, keeping me excited when things were going well and encouraged when they were not. He was available to me at even the worst times, including a year of meetings at 5 p.m. Fridays.

Thanks go to my mother and my father, who raised me in a home where anything was possible. When things started to seem difficult, it was good to have that to fall back on.

And finally, my greatest thanks go to my husband Charles Engelke, who assures me that, although writing a dissertation is difficult, living with someone writing a dissertation is far worse. Even when I wasn't there for him, he was always there for me, challenging me to do my best, and then some.
TABLE OF CONTENTS

ACKNOWLEDGEMENTS  iii
LIST OF TABLES  vi
LIST OF FIGURES  vii
LIST OF SYMBOLS  viii
ABSTRACT  ix

CHAPTERS

1 INTRODUCTION  1

2 BACKGROUND MATERIAL  3
  2.1 Code Optimization  4
    2.1.1 Peephole Optimizations  4
    2.1.2 Global Data Flow Analysis  6
    2.1.3 Global Data Flow Optimizations  7
  2.2 Program Semantics  9
  2.3 Image Processing and the Image Algebra  10
  2.4 Expert Systems  13

3 LANGUAGE DEFINITION  14
  3.1 Language Syntax  15
  3.2 Variable Locations and States  16
  3.3 Complexity Measures  19
  3.4 Syntax of Substitution  20
  3.5 Semantics of the Language  26
  3.6 Semantics of State Variants  29
  3.7 Semantics of Substitution  31
  3.8 Sets and Uses  37
  3.9 Static Approximation of Sets and Uses  42
4 PRIMITIVE TRANSFORMATIONS  46
  4.1 Statement Transformations  47
  4.2 Primitive if Statement Transformations  58
  4.3 Primitive Loop Transformation  63

5 GLOBAL OPTIMIZATIONS  66
  5.1 Loop Joining  67
  5.2 Loop Interchange  68
  5.3 Code Motion  73
  5.4 Loop-Conditional Joining  76

6 A PROTOTYPE OPTIMIZER  80
  6.1 The System and Its Data Structures  80
  6.2 A Parameterized Timer  85
  6.3 A New Approximation of Always Separate  85
  6.4 Some Heuristic Programs Using the System  88
  6.5 A Large Example: The Histogram  92

7 CONCLUSIONS  94

REFERENCES  97

BIOGRAPHICAL SKETCH  100
LIST OF TABLES

Table                                              Page
3.1 Syntactic Notation                               15
3.2 Semantic Notation                                26
6.1 HOPS Equivalents for Language Constructs         82
6.2 When Two Index Types May Be Always Separate      88
LIST OF FIGURES

Figure                                                              Page
4.1 Statement Interchange Used to Move a Statement                    47
5.1 Loop Interchange May Change the Number of Loop Initializations    69
5.2 The Effects of Loop Interchange                                   70
5.3 Statement Interchanging During Loop Interchanging                 71
6.1 Some Symbolic Statement Times                                     86
6.2 A HOPS Program to Optimize the Histogram Program                  90
6.3 A Straightforward Implementation of the Histogram                 92
6.4 The Resulting Histogram Program                                   93
LIST OF SYMBOLS

Symbol        Meaning                              Page
x, y, z, u    Simple integer variable                15
a[s]          Array reference                        15
v             Integer variable                       15
m, n          Integer constant                       15
s             Integer expression                     15
⊕             Integer binary operator                15
α             Integer                                15
V             Set of integers                        15
b             Boolean expression                     16
@             Boolean binary operator                16
β             Truth value                            16
W             Set of truth values                    16
S             Statement                              16
σ             State                                  16
Σ             Set of states                          16
C             Location of a variable                 16
ℓ             Intermediate variable                  16
σ{α/ℓ}        State variant                          17
sep           Always separate                        18
c             Complexity                             19
S[s/x]        Textual substitution                   20
v<s/x>        Left-hand-side substitution            20
ᾱ             Constant with the value of α           26
ℐ             Meaning of an integer expression       26
W             Meaning of a Boolean expression        27
M             Meaning of a statement                 28
=σ            Equal in state σ                       28
=             Equal in all states                    28
Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

A HIERARCHICAL METHOD OF PERFORMING GLOBAL OPTIMIZATIONS

By Laurie A. White

May 1990

Chairman: Dr. Gerhard X. Ritter
Cochairman: Dr. Joseph N. Wilson
Major Department: Computer and Information Sciences

Program optimization has long been an important function of compilers. Traditionally, global optimizations have been accomplished by collecting large sets of data flow information items about the various statements in the program. This dissertation provides a new approach to global optimizations by introducing a set of provably correct code transformations which can be combined to perform most of the traditional global data flow code optimizations.

First, a small language which could be viewed as an intermediate code of an image processing language is defined syntactically and semantically. Next, a group of primitive source-to-source transformations on this language are described and the cases under which each transformation is valid are proved. Then, these primitive transformations are combined to yield global transformations such as code motion and copy propagation. A new result called loop-conditional joining is also developed from the primitive transformations. Finally, a prototype system using these techniques is developed. It enables a user to experiment with a variety of code transformations and provides some assistance with heuristics designed to improve code.
CHAPTER 1
INTRODUCTION

While straightforward implementations of the image algebra as a means of specifying image processing algorithms have been successful in providing people a uniform means of discussing these algorithms, these implementations have produced some programs which are highly inefficient in using machine resources. This study was motivated by the desire to find ways to optimize image algebra code. Unfortunately, most of the traditional optimization techniques were ill-suited to the gross inefficiencies introduced by direct translation of image algebra programs. Additionally, the proofs of correctness of global data flow optimizations are based on the flow of the program data, rather than the meaning of the program. This work presents a new approach to code optimization designed to solve both of these problems.

Traditionally, code optimization has been classified as either peephole, where small pieces of code could have relatively small changes applied, or global, where transformations could be made on a larger scale. Determining when global optimizations can be performed has previously been done by looking at the results of global data flow analysis. Rather than collect the large sets of data required for global data flow analysis, I collect the variables set and used by the execution of each statement. Using only this information, primitive transformations can be performed. These primitive transformations can be proven to be correct using denotational semantics. Previously, any proof of transformation correctness has been carried out by examining the program flow graph without regard to semantics. These provably correct transformations can then be combined to give many of the same global
transformations as those provided by global data flow analysis. Since the basic transformations of the system are so small, they can easily be rearranged and recombined for the task at hand. Thus, I have developed some new and previously unexploited transformations which are highly beneficial for image processing programs. I have developed a prototype optimizing system which implements all of the primitive transformations and a number of the global optimizations which can be built from them. This system allows a user to experiment with a variety of combinations of the techniques and demonstrates the power and flexibility of this approach.

The remainder of this dissertation is divided into six chapters. Chapter 2 provides a brief background in traditional code optimization techniques, semantic approaches to code optimization, and image processing. Chapter 3 provides a small language which could be viewed as a simple intermediate language. This chapter provides the syntactic and denotational semantic definition for the language, along with a discussion of the variables set and used by statements and some preliminary results about the language. The primitive transformations are presented in Chapter 4. A full proof of the correctness of each transformation is given along with its description. Some of the more beneficial global transformations derivable from these primitive transformations are presented in Chapter 5. Chapter 6 describes the prototype optimizer developed from these transformations. Along with describing the system and its basic operation, it discusses some of the possible combinations of the transformations and examines how this can improve the execution of a sample program. Finally, Chapter 7 presents conclusions and suggestions for further work.
CHAPTER 2
BACKGROUND MATERIAL

This dissertation combines work from several diverse areas of computer science. First, there is much work previously done in code optimization. Originally, most optimizations were done at the local or peephole level, to correct problems of one particular compiler or to fine tune for one particular architecture. In the early 1970s, there was great interest in more global optimizations. Both of these types of optimizations are discussed in Section 2.1.

Most of the work done in optimization is based on a graphical view of the program being optimized and computes a variety of information based on the structure of the program graph. Instead, I approach the problem from a semantic view and look at the meanings of statements, rather than their positions in the overall program. Background material on semantics is presented in Section 2.2. Much of Chapter 3 is devoted to presenting a denotational semantic background for the language discussed in this dissertation.

Although the language described here and the transformations are general purpose, this work has been motivated by the desire to improve the running time of image processing programs as implemented in the image algebra. A brief introduction to image processing and the image algebra is given in Section 2.3.

Finally, there is no way to guarantee that a single combination of transformations produces an optimal, or even an improved, program. Instead, heuristics must be developed to actually use these transformations to arrive at a new program with a
shorter execution time. This involves the use of expert system techniques as discussed in Section 2.4.

2.1 Code Optimization

Straightforward translation from a high-level language to machine code almost never produces code as good as that which a human machine language programmer could write for the same task. All of the optimizations discussed in this section are definite code improvements. Some of them also have relaxed rules for application if the compiler writer is willing to take the chance that the code may be degraded in certain instances. A code transformation is considered by Aho et al. [1] to be an optimization if it meets the following conditions:

a. The transformation preserves the meaning of the program.

b. The transformation speeds up the execution of the program. Some transformations may be undertaken to reduce the size of the code produced, but the primary emphasis is on speed.

c. The execution time saved by the transformation is at least as much as the time it takes to perform the transformation.

It is this third condition which has kept many of the transformations in Chapter 4 from being considered as optimizations before this time.

2.1.1 Peephole Optimizations

There are some optimizations of statements that can be made with little knowledge of the code surrounding the statements. These are known as peephole optimizations. Only a few instructions (those in the peephole) need to be examined at a time to apply these measures. Peephole optimizations are described below.

Removal of redundant stores and loads. The final step of one instruction may be a store of a value and the first step of the next instruction a load of the same value.
If this is the case (and both instructions are in the same block), the load instruction would not be necessary. (A load followed by a store would result in the removal of the store instruction.)

Removal of unreachable code. Although not all unreachable code can be determined by peephole optimization, some can be. All code following an unconditional branch and before the next labelled statement is unreachable and can be removed.

Flow-of-control optimizations. If the peephole used does not require the statements to be contiguous, jump sequences can be examined and optimized. A jump statement to another jump statement can be replaced by a jump to the final destination. This may result in the removal of the intermediate jump statement.

Algebraic simplification. There are many algebraic identities that may be exploited, but the most common ones (and therefore the most beneficial to optimize for) involve statements of the form x := x + 0, x := x * 1 and x := x * 0. These can be replaced with simple assignments or removed entirely.

Reduction in strength. Operations which are considered expensive, such as multiplication and computing squares, are replaced by equivalent operations using less expensive operators (computing a square may be replaced by a multiplication, and multiplication may be replaced by a shift operation).

Use of machine idioms. Different target machines may have different operations implemented. Using these special operations may improve the code.

Details on all of these can be found in Aho et al. [1]. Actual use of these techniques for the languages SIMPL and OS/360 FORTRAN H is reported by Lowry and Medlock [17] and Zelkowitz and Bail [30]. Although these steps seem simple compared to the optimizations that result from global data flow analysis, many redundant
statements can be produced by the compiler front end, which performs code generation for each statement individually without considering what code the surrounding statements have generated.

2.1.2 Global Data Flow Analysis

Global data flow analysis examines the definitions and uses of variables in a program. While there are other uses for data flow analysis, some of which are discussed by Muchnick and Jones [19], the most important is in performing global program optimizations, discussed in Section 2.1.3.

Data flow analysis is most commonly done using elimination methods. Allen presents most of the basic concepts of data flow analysis [2]. Other methods have been developed for determining the same information, but using different algorithms. A good overview of elimination methods of data flow analysis is provided by Ryder and Paull [26].

Data flow analysis is based on simple graph theory. It begins with a control flow graph of the program. From that graph, using one of the techniques discussed in the previous paragraph, one first identifies loops suitable for improvement in the code and then computes information for each statement. This information consists of the items discussed below.

Reaching definitions. For each statement, all of the possible definitions of every variable will be calculated. All of the possible definitions for a use of a particular variable in a statement are collected into a list known as the use-definition chain, or ud-chain.

Live variables. A variable is considered to be alive at a statement if it could be used somewhere in or after the statement.

Definition-use chain. For each definition of a variable, all of the possible uses of the definition will be listed. This is also known as the du-chain.
Available expressions. All of the expressions which have already definitely been computed, with no possible change, are calculated for each statement. This will allow redundant subexpression elimination to occur.

Copy statements. For each statement, all statements of the form a := b which have preceded it and have not had either a or b redefined are collected. These will be used in copy propagation.

2.1.3 Global Data Flow Optimizations

Peephole optimization works only with a few statements at a time. When the additional information provided by data flow analysis is known about a program, additional optimizations are possible. The two most important collections of these are the Allen-Cocke catalogue [3] and the Irvine catalogue [29]. These catalogues view the optimizations at a very high level, giving more of an idea of what can be done rather than how it is done. An overview of these catalogues, giving the traditional global data flow optimizations, is presented by Kennedy [16]. These optimizations are described below.

Redundant subexpression elimination. A subexpression, once it is calculated, may not need to be recomputed when it is used again. If there is no possible change to the variables in the subexpression between where it is originally computed and where it is recomputed, a new variable is created and the value of the subexpression is assigned to the new variable. Instead of recomputing the subexpression, the value of the new variable is used. This was first discussed by Cocke [7].

Copy propagation. A copy statement, of the form A := B, can be removed and all uses of A replaced with B if there are no definitions of A or B between the copy statement and the uses of A. This will not only eliminate extra copy statements and variables the programmer (or high-level language) may have produced, but will also
eliminate many of the extra copy statements produced by other optimizations. If B is a constant, this procedure is called constant folding. Constant values may be substituted into expressions wherever possible and the resulting peephole optimization may reduce entire expressions to constants.

Code motion. Statements in a loop that do not depend on the variables that may change in the loop can be moved outside of the loop. This eliminates multiple executions of a statement that only needs to be executed once.

Strength reduction of induction variables. Induction variables are variables that depend on the loop variable for their values. They are typically given values which are some linear function of the loop control variable. Rather than recompute this function every time the loop control variable changes, it can be computed once at the beginning of the loop and incremented each successive time through the loop. This will replace the computation of an expression with a simpler statement.

Elimination of induction variables. Induction variables may in some cases be replaced by the loop control variable which they depend on. This will remove a variable and possibly allow further optimizations.

Dead code elimination. If the du-chain of a statement contains no entries, the definition in the statement is never used, and therefore the statement can be eliminated.

Procedure integration. The body of a procedure can sometimes be substituted for the procedure call. This has the advantage of reducing procedure call overhead, which is very inefficient in some compilers. It may also allow other optimizations to occur and give more restricted ud-chains and du-chains.

Machine-dependent optimizations. If something is known about the target machine's organization, other optimizations to take advantage of the machine's features can be made. The most common machine-dependent optimizations are listed below.
Register allocation. Different machines have different numbers of registers and different types of special-purpose registers. They also have different register manipulation instructions, such as autoincrement. If the optimizer knows about the specifics of the registers, it can better allocate registers to avoid redundant store and load operations and to use these specialized instructions. There is also some optimization which can be done without knowing all of the details of a specific machine. This global machine-independent register allocation utilizes usage counts to determine which values should reside in a limited number of registers and is discussed by Chow [6].

Detection of parallelism. Any instruction which can be coded as a vector operation should be identified if the target machine is a vector machine. Methods to do this are discussed by Schneck [27].

A good overview of the rules for performing subexpression elimination, copy propagation, code motion, and strength reduction can be found in many introductory compiler texts [1,5]. Specific work in the implementation of copy propagation, dead code elimination, code motion, strength reduction and elimination of induction variables and register allocation has been done by Chow using U-Code (described in Section 2.3) as the intermediate language [6].

2.2 Program Semantics

While there has been previous work on the mathematical background of the correctness of program optimization (including an entire book devoted to the subject [28]), this has not included a formal notion of the semantics of the program being optimized. Rosen introduces a high-level approach to the problem, but does not use any sort of semantic definition of the language with which he is working [25]. All of
these works have been based on an informal notion of what programs mean. Cousot presents some of the earliest formal work in this area. He uses an operational semantic framework in which to perform program analyses. I assert that an operationally based framework does not yield the coherent hierarchical framework that denotational semantics provides. This view is shared by a number of others [12,9]. In addition, the current popularity of denotational definitions certainly makes optimization work based upon them more appealing.

Donzeau-Gouge has explored the application of denotational semantics to program optimization [10]. She demonstrates the applicability of this technique to such optimizations as constant propagation, common subexpression determination, and invariant determination, but does not discuss the elimination of these common subexpressions and invariant expressions. In addition, optimizations such as code motion, loop rolling and unrolling, etc., are not discussed.

2.3 Image Processing and the Image Algebra

Image processing has two main goals. First, images may be processed to enhance them for human use. Image processing is crucial in enhancing the images of planets sent back by space probes, for example. Second, images may be processed for machine interpretation. Current work in computer vision includes medical diagnosis, military target acquisition, robotics navigation, and face recognition for television rating services. Introductions to image processing in general can be found in a number of texts [4,13].

Digital images consist of some underlying system of discrete points, called pixels, short for picture elements, each of which has a corresponding value in the image, perhaps some brightness indicator or infrared reading for
the point, or even a vector of values. A common image, the black and white photograph from newspapers, has a 2-dimensional grid of pixels and gray levels indicating the relative brightness or darkness as the values. Some common image processing techniques include detecting edges, smoothing and sharpening, and locating features in an image.

The AFATL Image Algebra was developed to provide a standard mathematical environment for image processing. It can perform any gray level image-to-image transformation and has the advantage of having a formal mathematical basis. The most basic operand in the image algebra is the image. Images can have many different coordinate sets and values. A coordinate set (usually denoted X) must be a compact subset of R^n, with n most often being 2, to indicate 2-dimensional images. The image value set (usually denoted F) must be a groupoid. The most common image value sets are integers, natural numbers, real numbers and vectors of integers, natural, or real numbers. An F-valued image, a, on a set of image coordinates, X, is defined to be the graph of the function a : X -> F, or:

    a = {(x, a(x)) : x ∈ X, a(x) ∈ F}

The image algebra provides numerous types of image functions. Binary operations between images include +, -, *, ∧, and ∨. These functions will operate pointwise on two images with the same coordinate system. There are also elementary functions, such as the characteristic function (χ) and the sum (Σ) of an image. The characteristic function of an image will be a binary image (that is, an image consisting of just 0 and 1) which is 1 where the pixel meets certain requirements and 0 everywhere else. Thus the characteristic function χ
both with images and with other templates. These typically require subroutines to implement and are outside the scope of this research. A fuller introduction to the image algebra is presented by Ritter and Wilson [23].

Currently the image algebra is implemented as an extension to FORTRAN, and FORTRAN programs can be written which are converted to standard FORTRAN programs by the Image Algebra FORTRAN preprocessor. A description of Image Algebra FORTRAN is provided by Ritter et al. [24]. Work is underway to implement the image algebra with Image Algebra-C. This work was begun by Perry [22]. Because so much of the image algebra has the potential for vectorization and because of the interest in parallel architectures for image processing in general [11], the intermediate language U-Code [20] was enhanced to include vector instructions. (The addition of vector instructions to a language is discussed by Zosel [31].) The resulting V-Code serves as the intermediate language of Image Algebra-C.

While the image algebra provides powerful notation, these previous attempts at straightforward translation from image algebra programs to lower-level languages have led to highly inefficient code. (This is not just a shortcoming of the image algebra or these implementations. Inefficient translation of code has been a problem for almost as long as there has been translation of code.) Hence, optimizations are important to implement for the image algebra. Inspection of existing optimization techniques showed they were lacking for some of the high-level inefficiencies introduced by the image algebra. Several new or previously unexploited techniques, such as backward copy propagation and loop-conditional joining, are needed to better improve the code.
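The pointwise operations and elementary functions described in this section are straightforward to model directly. A minimal sketch in Python (the representation and function names are mine, not part of any image algebra implementation), treating an F-valued image on a coordinate set X as a mapping from coordinates to values:

```python
# Hypothetical model of image-algebra operands: an F-valued image on X
# is a mapping from coordinates to values, so pointwise operations are
# simple comprehensions over the shared coordinate set.

def pointwise(op, a, b):
    """Binary image-image operation (+, -, *, min, max, ...), applied
    pointwise; both images must share the same coordinate set X."""
    assert a.keys() == b.keys()
    return {x: op(a[x], b[x]) for x in a}

def characteristic(pred, a):
    """Characteristic function: a binary image that is 1 where the pixel
    value meets the requirement pred and 0 everywhere else."""
    return {x: 1 if pred(a[x]) else 0 for x in a}

def image_sum(a):
    """The elementary sum of an image: the sum of its pixel values."""
    return sum(a.values())

# A 2x2 grey-level image on X = {0,1} x {0,1}.
a = {(0, 0): 10, (0, 1): 200, (1, 0): 30, (1, 1): 250}
bright = characteristic(lambda v: v >= 100, a)  # 1 at the two bright pixels
total = image_sum(a)                            # 10 + 200 + 30 + 250 = 490
```

The pointwise maximum, for instance, realizes the ∨ operation as `pointwise(max, a, b)`.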
2.4 Expert Systems

Although this dissertation is not intended to contribute to the field of artificial intelligence, the techniques employed in building simple expert systems proved quite useful in the work presented in Section 6.4. Expert systems are programs which behave as a human expert would. They are particularly useful in situations where no algorithmic solution is possible. An overview of expert systems is presented by Hayes-Roth et al. [14]. For this particular expert system, the simpler reasoning techniques of MACSYMA, as discussed by Martin and Fateman [18], were sufficient.
CHAPTER 3
LANGUAGE DEFINITION

This chapter introduces the terminology used in this dissertation. A small language is defined and its denotational semantics are presented in the style of de Bakker [9]. The language provides both simple and indexed integer variables, integer and boolean expressions, the basic structured statements, an empty statement, and an assignment statement. This is intentionally a simple language. However, it suffices for the ideas presented here. If it were more complex, the proofs in Chapters 4 and 5 would be much more involved, with few, if any, benefits. This simplification of a language to make optimization easier is not without precedent. Rosen discusses movement of optimization decisions from compile time to design time [25]. In keeping with this desire for simplicity, the language is restricted to programs containing loops with bounds fixed at the time of loop entry and does not support subprograms. These language constructs, though amenable to the kind of treatment given other constructs presented here, introduce a level of complexity which would greatly complicate the proofs presented with little or no benefit to the image processing programs being considered.

The first section describes the syntax of the language. The second section discusses variable locations and the states that assign them meanings. Section 3.3 defines a basic complexity measure for expressions and statements in this language, which will be used by some proofs in later sections. The syntax of substitution is given in Section 3.4. Sections 3.5, 3.6 and 3.7 give the semantics of the language, state variants, and substitution. The way statements affect and are affected by the values
stored at locations is discussed in Section 3.8 and a static approximation of this is provided in Section 3.9.

3.1 Language Syntax

A brief description of the notation used in this language is given in Table 3.1.

Table 3.1. Syntactic Notation

Name    Description                                               Typical elements
Icon    Integer constants                                         m, n
Svar    Simple variables                                          x, y, z, u
Avar    Array variables                                           a
Ivar    Any integer variable (Svar ∪ Avar)                        v, w
Iexp    Integer expressions                                       s
Bexp    Boolean expressions                                       b
Stat    Statements                                                S
        Textual substitution into a member of Iexp (Bexp, Stat)   s[s1/y]
        Left-hand-side substitution into a member of Iexp         v<s1/y>

Variables in this language are members of the set Ivar and may be either simple integer-valued variables (members of the set Svar: x, y, x1, etc.) or integer-expression-indexed integer arrays (members of the set Avar: a, a1, etc.).

Definition 3.1.1 (Integer variables). v ::= x | a[s].

Integer expressions may consist of variables, constants, binary operations and conditional expressions. The set Icon will contain all integer constants (m, n, m1, etc.) while Iexp contains the integer expressions (s, s1, etc.). Actual integer values (α, α1, etc.) are members of the set V.

Definition 3.1.2 (Integer expressions). s ::= v | m | s1 ⊕ s2 | if b then s1 else s2 fi (where ⊕ is any binary operator).
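Definitions 3.1.1 and 3.1.2 determine an abstract syntax that is convenient to carry around as a tree. A sketch in Python (the class names and representation are mine; the dissertation defines no implementation at this point):

```python
from dataclasses import dataclass
from typing import Union

# Abstract syntax of Definitions 3.1.1 and 3.1.2:
#   v ::= x | a[s]
#   s ::= v | m | s1 (+) s2 | if b then s1 else s2 fi

@dataclass
class Svar:          # simple integer variable x
    name: str

@dataclass
class Avar:          # array reference a[s], indexed by an integer expression
    array: str
    index: 'Iexp'

@dataclass
class Icon:          # integer constant m
    value: int

@dataclass
class BinOp:         # s1 (+) s2, for any binary operator
    op: str
    left: 'Iexp'
    right: 'Iexp'

@dataclass
class Cond:          # if b then s1 else s2 fi (Bexp is elided in this sketch)
    test: object
    then: 'Iexp'
    other: 'Iexp'

Iexp = Union[Svar, Avar, Icon, BinOp, Cond]

# The expression a[x + 1] * 2:
e = BinOp('*', Avar('a', BinOp('+', Svar('x'), Icon(1))), Icon(2))
```

A tree like `e` is what the substitution operators of Section 3.4 and the transformations of Chapter 4 would walk over.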
Boolean expressions may consist of the boolean constants true and false, relational operations, negation, implication, and subrange inclusion. Boolean expressions (b, b1, etc.) are in the set Bexp and map to truth values (β, β1, etc.) in the set of truth values W.

Definition 3.1.3 (Boolean expressions). b ::= true | false | s1 @ s2 | ¬b | s in (s1 .. s2) (where @ is any relational operator).

Statements (members of the set Stat: S, S1, etc.) consist of assignment and empty statements, along with the standard structured constructs of concatenation, selection and iteration. A program in this language will be the same as a statement.

Definition 3.1.4 (Statements). S ::= v := s | S1; S2 | if b then S1 else S2 fi | | for x := s1 to s2 do S od (where x does not appear anywhere else).

This language provides no boolean binary operators. If and is needed, the statement if b1 and b2 then S1 else S2 fi can be replaced with if b1 then if b2 then S1 else S2 fi else S2 fi. Similar replacement is possible for the or operator. This provides conditional evaluation of the and statement, where if the first clause, b1, is not true, the second clause will not even be evaluated. A number of high-level languages, such as C and Modula-2, have similar conditional evaluation.

3.2 Variable Locations and States

The set LocV contains all possible variable locations. A state is a function mapping locations of variables into integer values. The set of all states is Σ. The location of a variable in a state σ, C(v)(σ), is an intermediate variable (ℓ, ℓ1,
PAGE 26
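The grammar of Definitions 3.1.1 through 3.1.4 can be modeled directly as a small abstract syntax. The following Python sketch is illustrative only; the class and field names are hypothetical, not part of the dissertation's notation:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Svar:                 # simple variable x
    name: str

@dataclass(frozen=True)
class Aref:                 # array reference a[s]
    array: str
    index: "Iexp"

@dataclass(frozen=True)
class Icon:                 # integer constant m
    value: int

@dataclass(frozen=True)
class Binop:                # s1 (+) s2, for any binary operator (+)
    op: str
    left: "Iexp"
    right: "Iexp"

@dataclass(frozen=True)
class CondExp:              # if b then s1 else s2 fi (b left abstract here)
    cond: object
    then_: "Iexp"
    else_: "Iexp"

Iexp = Union[Svar, Aref, Icon, Binop, CondExp]

# a[x + 1], built bottom-up from the grammar
expr = Aref("a", Binop("+", Svar("x"), Icon(1)))
assert expr.index.left == Svar("x")
```

Frozen dataclasses give the structural equality that the syntactic definitions below (substitution, complexity) implicitly rely on.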
etc.). For simple integer variables, the location of the variable is simply the variable itself. With array variables, the location of the variable is given by the array and an integer, the meaning of the index in that particular state. (Definition 3.5.1 gives the meaning of expressions in a given state, R.)

Definition 3.2.1 (Location of a variable)

    L(v)(σ) = x               if v = x ∈ Svar
    L(v)(σ) = <a, R(s)(σ)>    if v = a[s] ∈ Avar

For any program, the domain of σ, a subset of LocV containing all of the intermediate variables of the program assigned values by σ, is assumed to exist. Since there is no need to declare variables in a program, there will be no error handling due to undeclared variables or out-of-bound indices. While these are important considerations in practice, they are not important to the goals of this research.

A state, σ', that assigns the same value to all but one location as another state, σ, is known as a state variant. State variants will be important in defining the meaning of statements.

Definition 3.2.2 (Variants of a state) For each σ ∈ Σ and α ∈ V we write σ{α/ξ} for the element of Σ which satisfies, for each ξ' ∈ LocV:

    σ{α/ξ}(ξ') = α        if ξ' = ξ
    σ{α/ξ}(ξ') = σ(ξ')    otherwise

Aliasing (two or more names for the same memory location) can cause special problems when determining if a transformation is valid. A transformation which may seem to preserve the meaning of a statement (such as interchanging x := 7; y := 8) may in fact change the meaning if there is aliasing (in this case, if x and y refer to the same memory location). Although this language is simple and does not include many of the features which introduce aliases (such as pointers and variable parameters),
because arrays are allowed, it is possible that a[s1] and a[s2] refer to the same element of the array a. To avoid possible aliasing problems, the concept of two variables being always separate is used.

Definition 3.2.3 (Always separate) If v1 and v2 ∈ Ivar, then we say v1 and v2 are always separate, written sep(v1, v2), if there is no σ ∈ Σ with L(v1)(σ) = L(v2)(σ). We say v is always separate from a set of variables, I, written sep(v, I), if, for each vi ∈ I, sep(v, vi).

Changing a variable's value may also change its location, as may happen with a[a[x]] when both x and a[x] have the same value. Variables of this form are the exception to many of the following transformations and are said to be self-referencing.

Definition 3.2.4 (Strictly non-self-referencing variables) A variable v is strictly non-self-referencing if L(v)(σ{α/L(v)(σ)}) = L(v)(σ), for any state σ and any integer α.

Any simple variable is strictly non-self-referencing, as is any indexed variable whose subscript does not include a reference to the array being indexed.

Another form of variable interdependence which may cause trouble occurs when changing the value of one variable may change the location of another. The variable whose location is changed is said to be location-dependent on the variable whose value changes.

Definition 3.2.5 (Location-independent) A variable v is location-independent of a variable w if L(v)(σ) = L(v)(σ{α/L(w)(σ)}) for all integers α and all states σ.
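The asymmetry these definitions allow (x is location-independent of a[x], but a[x] is not location-independent of x, as noted below) can be seen concretely by modeling states as dictionaries keyed by locations. This is an illustrative sketch; the representation is an assumption made for the example, not the dissertation's:

```python
# A location is either a simple variable's name or an (array, index-value)
# pair.  For brevity an array reference is written ('a', 'x'), meaning a[x]
# with a simple-variable index.

def loc(v, sigma):
    """L(v)(sigma): the location of v in state sigma (Definition 3.2.1)."""
    if isinstance(v, str):          # simple variable
        return v
    array, index_var = v            # array reference a[x]
    return (array, sigma[index_var])

sigma = {'x': 2, ('a', 2): 7, ('a', 5): 9}

assert loc('x', sigma) == 'x'
assert loc(('a', 'x'), sigma) == ('a', 2)

# Changing x's value moves the location of a[x] ...
sigma2 = {**sigma, 'x': 5}
assert loc(('a', 'x'), sigma2) == ('a', 5)   # a[x] is location-dependent on x

# ... but changing a[x]'s value never moves the location of x.
sigma3 = {**sigma, ('a', 2): 100}
assert loc('x', sigma3) == 'x'               # x is location-independent of a[x]
```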
Any simple variable is location-independent of any other variable. An array reference is location-independent of another variable if the other variable is not in the expression indexing the array. It should be noted that location independence is not a symmetric relation. The variable x is location-independent of a[x], but a[x] is not location-independent of x.

3.3 Complexity Measures

For the sake of inductive proofs, a structural complexity function c is introduced for the language elements. It is necessary that any combination of elements have a complexity greater than that of the elements comprising it.

Definition 3.3.1 (Complexity of Iexp)

    c(x) = 1
    c(m) = 1
    c(a[s]) = 1 + c(s)
    c(s1 ⊕ s2) = 1 + c(s1) + c(s2)
    c(if b then s1 else s2 fi) = 1 + c(b) + c(s1) + c(s2)

Definition 3.3.2 (Complexity of Bexp)

    c(true) = 1
    c(false) = 1
    c(s1 ⊛ s2) = 1 + c(s1) + c(s2)
    c(¬b) = 1 + c(b)
    c(s in (s1 ... s2)) = 1 + c(s) + c(s1) + c(s2)

Definition 3.3.3 (Complexity of Stat)

    c(v := s) = 1 + c(v) + c(s)
    c(S1; S2) = c(S1) + c(S2)
    c(if b then S1 else S2 fi) = 1 + c(b) + c(S1) + c(S2)
    c(D) = 1
    c(for x := s1 to s2 do S od) = 1 + c(s1) + c(s2) + c(S)

3.4 Syntax of Substitution

Substitution of expressions for simple variables is important in defining the semantics of loops. It will also play a part in statement interchange and absorption. There are two kinds of substitution. Square brackets ([ ]) are used to denote textual substitution into an expression or statement, and angle brackets (⟨ ⟩) denote substitution into the left-hand side of an assignment statement.

Substitution for array variables is not defined here for a variety of reasons. The definition of substitution for an array variable traditionally includes conditional expressions, which would greatly complicate the proofs in Chapter 4. Substitution for array variables in statements is even more complicated. This added complexity would provide little new functionality to the language, since loop control variables must be simple variables. (I am not alone in this exclusion of substitution for array variables; de Bakker knowingly omits an explanation of S[v1/v2] as well [9].)

Definition 3.4.1 (Substitution of expressions into Iexp)

    x[s/y] = s    if y = x
    x[s/y] = x    otherwise
    a[s1][s/y] = a[s1[s/y]]
    m[s/y] = m
    (s1 ⊕ s2)[s/y] = (s1[s/y] ⊕ s2[s/y])
    (if b then s1 else s2 fi)[s/y] = (if b[s/y] then s1[s/y] else s2[s/y] fi)

Definition 3.4.2 (Substitution of expressions into Bexp)

    true[s/y] = true
    false[s/y] = false
    (s1 ⊛ s2)[s/y] = (s1[s/y] ⊛ s2[s/y])
    (¬b)[s/y] = ¬(b[s/y])
    (s in (s1 ... s2))[s/y] = (s[s/y] in (s1[s/y] ... s2[s/y]))

Left-hand-side substitution will not substitute for a simple variable, but will substitute in the indexing expression of an array reference. Thus a[x]⟨s/x⟩ = a[s] and x[s/x] = s, but x⟨s/x⟩ = x.

Definition 3.4.3 (Left-hand-side substitution)

    x⟨s/y⟩ = x
    a[s1]⟨s/y⟩ = a[s1[s/y]]

Substituting into a statement involves substituting for all variables in the statement, much like substituting in an expression. The difference here is that left-hand-side substitution is performed on the left-hand side of assignment statements and textual substitution is done everywhere else.

Definition 3.4.4 (Substitution of expressions into Stat)

    (v := s1)[s/y] = v⟨s/y⟩ := s1[s/y]
    (S1; S2)[s/y] = S1[s/y]; S2[s/y]
    (if b then S1 else S2 fi)[s/y] = if b[s/y] then S1[s/y] else S2[s/y] fi
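Definitions 3.4.1 and 3.4.3 can be rendered executably. In this hypothetical sketch expressions are tuples, e.g. ('aref', 'a', ('var', 'x')) for a[x]; the tuple representation and function names are assumptions made for illustration, and the tests reproduce the examples in the text (a[x]⟨s/x⟩ = a[s], x[s/x] = s, x⟨s/x⟩ = x):

```python
# Expressions as tuples:
#   ('var', x) | ('const', m) | ('bin', op, s1, s2) | ('aref', a, s)

def subst(e, s, y):
    """Textual substitution e[s/y] for a simple variable y (Def. 3.4.1)."""
    tag = e[0]
    if tag == 'var':
        return s if e[1] == y else e
    if tag == 'const':
        return e
    if tag == 'bin':
        return ('bin', e[1], subst(e[2], s, y), subst(e[3], s, y))
    if tag == 'aref':
        return ('aref', e[1], subst(e[2], s, y))
    raise ValueError(tag)

def lhs_subst(v, s, y):
    """Left-hand-side substitution v<s/y>: never replaces a simple variable,
    but does substitute inside an array index (Def. 3.4.3)."""
    if v[0] == 'var':
        return v
    return ('aref', v[1], subst(v[2], s, y))

x, s_ = ('var', 'x'), ('var', 's')
assert subst(x, s_, 'x') == s_                                      # x[s/x] = s
assert lhs_subst(x, s_, 'x') == x                                   # x<s/x> = x
assert lhs_subst(('aref', 'a', x), s_, 'x') == ('aref', 'a', s_)    # a[x]<s/x> = a[s]
```

Substitution into statements (Definition 3.4.4) would then apply lhs_subst to assignment targets and subst everywhere else.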
    (D)[s/y] = D
    (for x := s1 to s2 do S od)[s/y] = (for x := s1[s/y] to s2[s/y] do S[s/y] od)

A statement with a pair of substitutions (i.e., S[x/y][n/x]) may simplify to a statement with a single substitution (i.e., S[n/y]) in some cases. This result will be used in other proofs in later chapters (most notably the proof of loop joining in Theorem 5.1.1) and is presented here as an example of a complete inductive proof in this language. In later inductive proofs, only the basis cases will be proven, because of the length of the proofs and the fact that there is very little of interest to be found in the inductive steps. Since statements use both integer and boolean expressions, this result must first be shown for expressions. It is provided in the lemma below. This lemma refers to ivar, the set of all variables in a statement or expression. A fuller definition of ivar is given in Section 3.9.

Lemma 3.4.1

    ⊨ s[n/y] = s[x/y][n/x]   provided x ∉ ivar(s), and
    ⊨ b[n/y] = b[x/y][n/x]   provided x ∉ ivar(b)

Proof: By simultaneous induction on the complexity of s and b.

Basis: c(s) = 1 and c(b) = 1

Case 1: s = y, x ≠ y

    y[n/y] = n               (Def. of subst. into expr. [3.4.1])
           = x[n/x]          (Def. of subst. into expr. [3.4.1])
           = (y[x/y])[n/x]   (Def. of subst. into expr. [3.4.1])

Case 2: s = z, z ≠ y

    z[n/y] = z               (Def. of subst. into expr. [3.4.1])
           = z[n/x]          (Def. of subst. into expr. [3.4.1])
           = (z[x/y])[n/x]   (Def. of subst. into expr. [3.4.1])
Case 3: s = m

    m[n/y] = m               (Def. of subst. into expr. [3.4.1])
           = m[n/x]          (Def. of subst. into expr. [3.4.1])
           = (m[x/y])[n/x]   (Def. of subst. into expr. [3.4.1])

Case 4: b = true

    true[n/y] = true               (Def. of subst. into expr. [3.4.2])
              = true[n/x]          (Def. of subst. into expr. [3.4.2])
              = (true[x/y])[n/x]   (Def. of subst. into expr. [3.4.2])

Case 5: b = false

    false[n/y] = false               (Def. of subst. into expr. [3.4.2])
               = false[n/x]          (Def. of subst. into expr. [3.4.2])
               = (false[x/y])[n/x]   (Def. of subst. into expr. [3.4.2])

Induction step: Assume that s[n/y] = s[x/y][n/x] and b[n/y] = b[x/y][n/x] whenever c(s) ≤ k and c(b) ≤ k (k > 0). Show that the result holds when c(s) = k + 1 and c(b) = k + 1.

Case 1: s = a[s1]

    a[s1][n/y] = a[s1[n/y]]        (Def. of subst. into expr. [3.4.1])
               = a[s1[x/y][n/x]]   (Induction hypothesis)
               = a[s1[x/y]][n/x]   (Def. of subst. into expr. [3.4.1])
               = a[s1][x/y][n/x]   (Def. of subst. into expr. [3.4.1])

Case 2: s = s1 ⊕ s2

    (s1 ⊕ s2)[n/y] = (s1[n/y] ⊕ s2[n/y])             (Def. of subst. into expr. [3.4.1])
                   = (s1[x/y][n/x] ⊕ s2[x/y][n/x])   (Induction hypothesis)
                   = (s1[x/y] ⊕ s2[x/y])[n/x]        (Def. of subst. into expr. [3.4.1])
                   = (s1 ⊕ s2)[x/y][n/x]             (Def. of subst. into expr. [3.4.1])

Case 3: s = if b then s1 else s2 fi

    (if b then s1 else s2 fi)[n/y]
        = if b[n/y] then s1[n/y] else s2[n/y] fi                    (Def. of subst. into expr. [3.4.1])
        = if b[x/y][n/x] then s1[x/y][n/x] else s2[x/y][n/x] fi     (Induction hypothesis)
        = (if b[x/y] then s1[x/y] else s2[x/y] fi)[n/x]             (Def. of subst. into expr. [3.4.1])
        = (if b then s1 else s2 fi)[x/y][n/x]                       (Def. of subst. into expr. [3.4.1])
Case 4: b = s1 ⊛ s2

    (s1 ⊛ s2)[n/y] = (s1[n/y] ⊛ s2[n/y])             (Def. of subst. into expr. [3.4.2])
                   = (s1[x/y][n/x] ⊛ s2[x/y][n/x])   (Induction hypothesis)
                   = (s1[x/y] ⊛ s2[x/y])[n/x]        (Def. of subst. into expr. [3.4.2])
                   = (s1 ⊛ s2)[x/y][n/x]             (Def. of subst. into expr. [3.4.2])

Case 5: b = ¬b1

    (¬b1)[n/y] = ¬(b1[n/y])          (Def. of subst. into expr. [3.4.2])
               = ¬(b1[x/y][n/x])     (Induction hypothesis)
               = (¬(b1[x/y]))[n/x]   (Def. of subst. into expr. [3.4.2])
               = (¬b1)[x/y][n/x]     (Def. of subst. into expr. [3.4.2])

Case 6: b = (s in (s1 ... s2))

    (s in (s1 ... s2))[n/y]
        = (s[n/y] in (s1[n/y] ... s2[n/y]))                    (Def. of subst. into expr. [3.4.2])
        = (s[x/y][n/x] in (s1[x/y][n/x] ... s2[x/y][n/x]))     (Induction hypothesis)
        = (s[x/y] in (s1[x/y] ... s2[x/y]))[n/x]               (Def. of subst. into expr. [3.4.2])
        = (s in (s1 ... s2))[x/y][n/x]                         (Def. of subst. into expr. [3.4.2])

With the preliminary result proved above, the following lemma can now be proved.

Lemma 3.4.2

    ⊨ S[n/y] = (S[x/y])[n/x]   provided x ∉ ivar(S)

Proof: By mathematical induction on the complexity of S.

Basis: c(S) = 1

Case 1: S = D

    D[n/y] = D             (Def. of subst. into stat. [3.4.4])
           = D[n/x]        (Def. of subst. into stat. [3.4.4])
           = D[x/y][n/x]   (Def. of subst. into stat. [3.4.4])
Case 2: S = z := s (z ≠ x, since x ∉ ivar(S))

    (z := s)[n/y] = z⟨n/y⟩ := s[n/y]            (Def. of subst. into stat. [3.4.4])
                  = z := s[n/y]                  (Def. of subst. into l.h.s. [3.4.3])
                  = z := s[x/y][n/x]             (Previous result [3.4.1])
                  = z⟨n/x⟩ := s[x/y][n/x]        (Def. of subst. into l.h.s. [3.4.3])
                  = z⟨x/y⟩⟨n/x⟩ := s[x/y][n/x]   (Def. of subst. into l.h.s. [3.4.3])
                  = (z⟨x/y⟩ := s[x/y])[n/x]      (Def. of subst. into stat. [3.4.4])
                  = (z := s)[x/y][n/x]           (Def. of subst. into stat. [3.4.4])

Case 3: S = a[s1] := s

    (a[s1] := s)[n/y] = a[s1]⟨n/y⟩ := s[n/y]            (Def. of subst. into stat. [3.4.4])
                      = a[s1[n/y]] := s[n/y]             (Def. of subst. into l.h.s. [3.4.3])
                      = a[s1[x/y][n/x]] := s[x/y][n/x]   (Previous result [3.4.1])
                      = a[s1[x/y]]⟨n/x⟩ := s[x/y][n/x]   (Def. of subst. into l.h.s. [3.4.3])
                      = (a[s1[x/y]] := s[x/y])[n/x]      (Def. of subst. into stat. [3.4.4])
                      = (a[s1]⟨x/y⟩ := s[x/y])[n/x]      (Def. of subst. into l.h.s. [3.4.3])
                      = (a[s1] := s)[x/y][n/x]           (Def. of subst. into stat. [3.4.4])

Induction step: Assume that S[n/y] = S[x/y][n/x] whenever c(S) ≤ k. Show it is true when c(S) = k + 1.

Case 1: S = S1; S2

    (S1; S2)[n/y] = S1[n/y]; S2[n/y]             (Def. of subst. into stat. [3.4.4])
                  = S1[x/y][n/x]; S2[x/y][n/x]   (Induction hypothesis)
                  = (S1[x/y]; S2[x/y])[n/x]      (Def. of subst. into stat. [3.4.4])
                  = (S1; S2)[x/y][n/x]           (Def. of subst. into stat. [3.4.4])

Case 2: S = if b then S1 else S2 fi

    (if b then S1 else S2 fi)[n/y]
        = if b[n/y] then S1[n/y] else S2[n/y] fi                    (Def. of subst. into stat. [3.4.4])
        = if b[x/y][n/x] then S1[x/y][n/x] else S2[x/y][n/x] fi     (Induction hypothesis)
        = (if b[x/y] then S1[x/y] else S2[x/y] fi)[n/x]             (Def. of subst. into stat. [3.4.4])
        = (if b then S1 else S2 fi)[x/y][n/x]                       (Def. of subst. into stat. [3.4.4])
Table 3.2. Semantic Notation

    Name     Description                                        Typical elements
    V        Integers                                           α
    W        Truth values (T and F)                             β
    Σ        States (functions from LocV to V)                  σ
    LocV     Intermediate variables (Svar ∪ (Avar × V))         ξ
    L        Location of Ivar (L: Ivar → (Σ → LocV))
    R        Value of Iexp (R: Iexp → (Σ → V))
    W        Value of Bexp (W: Bexp → (Σ → W))
    M        Value of Stat (M: Stat → (Σ → Σ))
    σ{α/ξ}   Variant of a state
    ᾱ        Constant representing the value of the integer α

Case 3: S = for x := s1 to s2 do S od

    (for x := s1 to s2 do S od)[n/y]
        = for x := s1[n/y] to s2[n/y] do S[n/y] od                  (Def. of subst. into stat. [3.4.4])
        = for x := s1[x/y][n/x] to s2[x/y][n/x] do S[x/y][n/x] od   (Induction hypothesis)
        = (for x := s1[x/y] to s2[x/y] do S[x/y] od)[n/x]           (Def. of subst. into stat. [3.4.4])
        = (for x := s1 to s2 do S od)[x/y][n/x]                     (Def. of subst. into stat. [3.4.4])

3.5 Semantics of the Language

Once the syntax of the language, states, and substitution is defined, the semantics follow. The notation used in defining the semantics of this language is given in Table 3.2. The meanings of integer and boolean expressions and statements are fairly straightforward. A bar over an integer's value, ᾱ, will be used to indicate a constant with the value of that integer.

Definition 3.5.1 (Semantics of Iexp)

    R(v)(σ) = σ(L(v)(σ))
    R(m)(σ) = α, where α is the mathematical constant associated with m
    R(s1 ⊕ s2)(σ) = R(s1)(σ) ⊕ R(s2)(σ)
    R(if b then s1 else s2 fi)(σ) = if W(b)(σ) then R(s1)(σ) else R(s2)(σ) fi
It is further required that all expressions evaluate without error. While there could be special cases for expressions which cannot be evaluated (such as x/0), this would only complicate the presentation without providing greater understanding of the correctness proofs here. Thus it is assumed no semantic errors will occur during program execution, and no attempt is made to provide meanings for erroneous states or conditions.

Definition 3.5.2 (Semantics of Bexp)

    W(true)(σ) = T
    W(false)(σ) = F
    W(s1 ⊛ s2)(σ) = R(s1)(σ) ⊛ R(s2)(σ)
    W(¬b)(σ) = ¬W(b)(σ)
    W(s in (s1 ... s2))(σ) = R(s)(σ) ≥ R(s1)(σ) and R(s)(σ) ≤ R(s2)(σ)

The semantics of statements is also unsurprising. The meaning of a statement in a state σ is just a variant of σ. So, for any statement, M(S)(σ) = σ{α1/ξ1} ... {αn/ξn}, where the ξi are the locations of the variables assigned by the statement. Empty statements have no effect on the state in which they are executed. Assignment statements result in a variant of the state in which they are executed, substituting the value of the right-hand side of the assignment for the location of the left-hand side of the assignment.

The for statement has probably the most interesting semantic definition. If the value of the upper bound is greater than or equal to the value of the lower bound in the state in which the for statement is being executed, then the statement in the loop will be executed at least once. The for statement then has the meaning of the same for statement, with one fewer iteration of the loop, followed by the statement in the loop body, with the original value of the upper bound substituted for the loop
control variable. This has the effect of making any assignment to the loop control variable legal, but meaningless. For example, the statement

    for i := 1 to 10 do x := 100; sum := sum + i od

has the same meaning as the statement

    for i := 1 to 9 do x := 100; sum := sum + i od; x := 100; sum := sum + 10.

If the upper bound's value is less than the lower bound's value, then the for statement has the same meaning as the empty statement. Notice that since the loop bounds are fixed at the time of entrance to the loop, it is impossible to have infinite looping.

Definition 3.5.3 (Semantics of Stat)

    M(v := s)(σ) = σ{R(s)(σ)/L(v)(σ)}
    M(S1; S2)(σ) = M(S2)(M(S1)(σ))
    M(if b then S1 else S2 fi)(σ) = if W(b)(σ) then M(S1)(σ) else M(S2)(σ) fi
    M(D)(σ) = σ
    M(for x := s1 to s2 do S od)(σ) =
        if R(s2)(σ) ≥ R(s1)(σ)
        then M(for x := s1 to s2 − 1 do S od; S[ᾱ/x])(σ), where α = R(s2)(σ)
        else M(D)(σ) fi

Two statements are equal in a state if they have the same meaning in that state. They are equal if they have the same meaning in all states.

Definition 3.5.4 (Equality of statements) Two statements S1 and S2 are equal in a state σ, written S1 =σ S2, if M(S1)(σ) = M(S2)(σ). Two statements S1 and S2 are equal, written S1 = S2, if, for all states σ ∈ Σ, S1 =σ S2. A transformation of a statement S1 into a statement S2 is valid in state σ if S1 =σ S2.
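Because Definition 3.5.3 evaluates both bounds once, on entry, the loop body runs a fixed number of times and infinite looping is impossible. The following Python sketch of the statement semantics is illustrative: it handles simple variables only, and it assigns the loop control variable directly each iteration rather than substituting the constant ᾱ into the body, a simplification of the definition above.

```python
def assign(x, e):                     # v := s, for a simple variable v
    return lambda sig: {**sig, x: e(sig)}

def seq(*stmts):                      # S1; S2  (M(S1; S2) = M(S2) after M(S1))
    def run(sig):
        for st in stmts:
            sig = st(sig)
        return sig
    return run

def for_loop(x, e1, e2, body):        # for x := s1 to s2 do S od
    def run(sig):
        lo, hi = e1(sig), e2(sig)     # bounds are fixed at loop entry
        for i in range(lo, hi + 1):   # zero iterations when hi < lo
            sig = body({**sig, x: i})
        return sig
    return run

# for i := 1 to 10 do x := 100; sum := sum + i od
prog = for_loop('i', lambda s: 1, lambda s: 10,
                seq(assign('x', lambda s: 100),
                    assign('sum', lambda s: s['sum'] + s['i'])))
out = prog({'sum': 0})
assert out['sum'] == 55 and out['x'] == 100
```

An empty loop confirms the else branch of the definition: for_loop('i', lambda s: 5, lambda s: 1, ...) returns its input state's user variables unchanged.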
A statement is said to be nullable if and only if it has no effect on any state. Nullable statements are of the form x := x, or some combination of nullable statements, such as if b then x := x else for y := s1 to s2 do y := y; z := z od fi.

Definition 3.5.5 (Nullable statements) A statement S is nullable iff S = D.

3.6 Semantics of State Variants

Once the semantics of statements is established, some obvious results about state variants and their meanings can be proved. First, state variants can be interchanged when they refer to different locations.

Lemma 3.6.1 (Interchange of state variants)

    (σ{α1/ξ1}){α2/ξ2} = (σ{α2/ξ2}){α1/ξ1}
    Provided: ξ1 ≠ ξ2

Proof: It must be shown that (σ{α1/ξ1}){α2/ξ2}(ξ) = (σ{α2/ξ2}){α1/ξ1}(ξ) for every ξ ∈ LocV. There are three cases.

Case 1: ξ = ξ1

    (σ{α1/ξ1}){α2/ξ2}(ξ) = σ{α1/ξ1}(ξ)            (Def. of state variant [3.2.2])
                          = α1                      (Def. of state variant [3.2.2])
                          = (σ{α2/ξ2}){α1/ξ1}(ξ)    (Def. of state variant [3.2.2])

Case 2: ξ = ξ2

    (σ{α1/ξ1}){α2/ξ2}(ξ) = α2                      (Def. of state variant [3.2.2])
                          = σ{α2/ξ2}(ξ)             (Def. of state variant [3.2.2])
                          = (σ{α2/ξ2}){α1/ξ1}(ξ)    (Def. of state variant [3.2.2])
Case 3: ξ ≠ ξ1 and ξ ≠ ξ2

    (σ{α1/ξ1}){α2/ξ2}(ξ) = σ{α1/ξ1}(ξ)            (Def. of state variant [3.2.2])
                          = σ(ξ)                    (Def. of state variant [3.2.2])
                          = σ{α2/ξ2}(ξ)             (Def. of state variant [3.2.2])
                          = (σ{α2/ξ2}){α1/ξ1}(ξ)    (Def. of state variant [3.2.2])

The value of a variable in a state variant follows directly from the definition of state variants.

Lemma 3.6.2 (Semantics of state variants) If the variable v is strictly non-self-referencing and v is not location-dependent on x, then

    R(v)(σ{α/L(x)(σ)}) = α         if L(v)(σ) = L(x)(σ)
    R(v)(σ{α/L(x)(σ)}) = R(v)(σ)   otherwise

Proof: This must be shown for all v ∈ Ivar.

Case 1: L(v)(σ) = L(x)(σ)

    R(v)(σ{α/L(x)(σ)}) = σ{α/L(x)(σ)}(L(v)(σ{α/L(x)(σ)}))   (Def. of R [3.5.1])
                       = σ{α/L(x)(σ)}(L(v)(σ))              (v strictly non-self-referencing and not location-dependent on x)
                       = σ{α/L(x)(σ)}(L(x)(σ))
                       = α                                   (Def. of state variant [3.2.2])

Case 2: L(v)(σ) ≠ L(x)(σ)

    R(v)(σ{α/L(x)(σ)}) = σ{α/L(x)(σ)}(L(v)(σ))   (as in Case 1)
                       = σ(L(v)(σ))               (Def. of state variant [3.2.2])
                       = R(v)(σ)                  (Def. of R [3.5.1])
Lemma 3.6.3 (Redundant state variants)

    σ{σ(ξ)/ξ} = σ

for any ξ ∈ LocV; in particular, σ{R(x)(σ)/L(x)(σ)} = σ for any x ∈ Svar.

Proof: For each ξ' ∈ LocV, if ξ' = ξ then σ{σ(ξ)/ξ}(ξ') = σ(ξ) = σ(ξ'), and otherwise σ{σ(ξ)/ξ}(ξ') = σ(ξ') (Def. of state variant [3.2.2]).

3.7 Semantics of Substitution

Syntactic substitution corresponds to a semantic operation on states. For integer and boolean
expressions, the value of the expression s (or b) after syntactic substitution for some variable in a state is the same as the value of the expression in the state with the same semantic substitution made for the variable.

Lemma 3.7.1 (Semantics of substitution into expressions)

    R(s[s1/y])(σ) = R(s)(σ{R(s1)(σ)/L(y)(σ)})
    W(b[s1/y])(σ) = W(b)(σ{R(s1)(σ)/L(y)(σ)})

Proof: By simultaneous induction on the complexity of s and b. The basis cases follow directly from the definitions of substitution [3.4.1, 3.4.2], the semantics of expressions [3.5.1, 3.5.2], and state variants [3.2.2].
Substituting into locations is a bit more complicated. In this case, only variables can be substituted (since it makes no sense to discuss the location of an expression). Additionally, there are three possible cases. First, the original variable may be an indexed variable. (Recall that only simple, nonindexed variables can be substituted for, so no indexed variable will be replaced with another indexed variable.) Second, if the original variable is simple, there are two cases: the old location can be identical to it or always separate from it. Additionally, left-hand-side substitution has semantics very similar to full textual substitution into array variables.

Lemma 3.7.2 (Semantics of substitution into locations)

    L(v[v1/y])(σ) = L(v1)(σ)                      if v ∈ Svar, v = y
    L(v[v1/y])(σ) = L(v)(σ)                       if v ∈ Svar, v ≠ y
    L(v[v1/y])(σ) = L(v)(σ{R(v1)(σ)/L(y)(σ)})     if v ∈ Avar
    L(v⟨v1/y⟩)(σ) = L(v)(σ)                       if v ∈ Svar
    L(v⟨v1/y⟩)(σ) = L(v)(σ{R(v1)(σ)/L(y)(σ)})     if v ∈ Avar

Proof: There are five possible cases (the first three for regular substitution and the last two for left-hand-side substitution).

Case 1: v = y

    L(y[v1/y])(σ) = L(v1)(σ)   (Def. of subst. into expr. [3.4.1])

Case 2: v ∈ Svar, v ≠ y

    L(v[v1/y])(σ) = L(v)(σ)    (Def. of subst. into expr. [3.4.1])

Case 3: v = a[s] ∈ Avar

    L(a[s][v1/y])(σ) = L(a[s[v1/y]])(σ)                  (Def. of subst. into expr. [3.4.1])
                     = <a, R(s[v1/y])(σ)>                 (Def. of location of a variable [3.2.1])
                     = <a, R(s)(σ{R(v1)(σ)/L(y)(σ)})>     (Sem. of substitution [3.7.1])
                     = L(a[s])(σ{R(v1)(σ)/L(y)(σ)})       (Def. of location of a variable [3.2.1])
Case 4: v ∈ Svar

    L(v⟨v1/y⟩)(σ) = L(v)(σ)   (Def. of left-hand-side subst. [3.4.3]; Def. of location of Svar [3.2.1])

Case 5: v = a[s] ∈ Avar

    L(a[s]⟨v1/y⟩)(σ) = L(a[s[v1/y]])(σ)               (Def. of left-hand-side subst. [3.4.3])
                     = L(a[s])(σ{R(v1)(σ)/L(y)(σ)})    (Same arguments as Case 3, above)

Defining substitution into statements also involves moving the substitution from the statement into the state. However, since the meaning of a statement results not in an integer or boolean value but in another state, simply evaluating the meaning of the statement at a modified state, one with the new value substituted for the old, could possibly result in an incorrect new state, one which retains the same modification. For example, the statement (x := y)[3/y] evaluated in σ may seem to have the same result as evaluating x := y in σ{3/y}. Both M((x := y)[3/y])(σ) and M(x := y)(σ{3/y}) map x to 3. But M((x := y)[3/y])(σ) maps y to whatever value it has in σ, while M(x := y)(σ{3/y}) maps y to 3. The modification must therefore be undone after the statement is evaluated, except where the statement itself overwrites the modified location.

Lemma 3.7.3 (Semantics of substitution into statements)

    M((v := s)[s1/y])(σ) = M(v := s)(σ{R(s1)(σ)/L(y)(σ)})                       if v = y
    M((v := s)[s1/y])(σ) = (M(v := s)(σ{R(s1)(σ)/L(y)(σ)})){R(y)(σ)/L(y)(σ)}    otherwise

Proof: There are three possible cases for the assignment.
Case 1: v = y

    M((y := s)[s1/y])(σ)
        = M(y⟨s1/y⟩ := s[s1/y])(σ)                                    (Def. of substitution [3.4.4])
        = M(y := s[s1/y])(σ)                                          (Def. of l.h.s. subst. [3.4.3])
        = σ{R(s[s1/y])(σ)/L(y)(σ)}                                    (Def. of v := s [3.5.3])
        = σ{R(s)(σ{R(s1)(σ)/L(y)(σ)})/L(y)(σ)}                        (Sem. of substitution [3.7.1])
        = (σ{R(s1)(σ)/L(y)(σ)}){R(s)(σ{R(s1)(σ)/L(y)(σ)})/L(y)(σ)}    (Def. of state variant [3.2.2])
        = M(y := s)(σ{R(s1)(σ)/L(y)(σ)})                              (Def. of v := s [3.5.3])

Case 2: v ∈ Svar, v ≠ y

    M((v := s)[s1/y])(σ)
        = M(v := s[s1/y])(σ)                                          (Defs. [3.4.4], [3.4.3])
        = σ{R(s)(σ{R(s1)(σ)/L(y)(σ)})/L(v)(σ)}                        (Def. of v := s [3.5.3]; Sem. of substitution [3.7.1])
        = ((σ{R(s1)(σ)/L(y)(σ)}){R(s)(σ{R(s1)(σ)/L(y)(σ)})/L(v)(σ)})
          {R(y)(σ)/L(y)(σ)}                                           (Interchange of state variants [3.6.1]; Def. of state variant [3.2.2])
        = (M(v := s)(σ{R(s1)(σ)/L(y)(σ)})){R(y)(σ)/L(y)(σ)}           (Def. of v := s [3.5.3])

Case 3: v = a[s2] ∈ Avar

    M((a[s2] := s)[s1/y])(σ)
        = M(a[s2[s1/y]] := s[s1/y])(σ)                                (Defs. [3.4.4], [3.4.3])
        = σ{R(s[s1/y])(σ)/L(a[s2[s1/y]])(σ)}                          (Def. of v := s [3.5.3])
        = σ{R(s)(σ{R(s1)(σ)/L(y)(σ)})/L(a[s2])(σ{R(s1)(σ)/L(y)(σ)})}  (Sem. of substitution [3.7.1]; Sem. of subst. into locations [3.7.2])
        = (M(a[s2] := s)(σ{R(s1)(σ)/L(y)(σ)})){R(y)(σ)/L(y)(σ)}       (Def. of v := s [3.5.3]; Interchange of state variants [3.6.1])
Although substitution into assignment statements uses a special left-hand-side substitution, in some cases (most particularly, copy propagation) full textual variable substitution, rather than left-hand-side substitution, occurs in the left-hand side of assignment statements. This strong substitution will result in one of two cases, depending on whether or not the original variable is identical to the old variable.

Lemma 3.7.4 (Semantics of strong substitution into statements) If v2 is location-independent of v1, then

    M(v1[v2/x] := s[v2/x])(σ) = M(v2 := s)(σ{R(v2)(σ)/L(x)(σ)})                       if v1 ∈ Svar, v1 = x
    M(v1[v2/x] := s[v2/x])(σ) = (M(v1 := s)(σ{R(v2)(σ)/L(x)(σ)})){R(x)(σ)/L(x)(σ)}    otherwise

Proof: There are three possible cases for L(v1) and L(x).

Case 1: v1 ∈ Svar, v1 = x

    M(v1[v2/x] := s[v2/x])(σ)
        = M(v2 := s[v2/x])(σ)                        (Def. of subst. into expr. [3.4.1], since x[v2/x] = v2)
        = σ{R(s[v2/x])(σ)/L(v2)(σ)}                  (Def. of v := s [3.5.3])
        = σ{R(s)(σ{R(v2)(σ)/L(x)(σ)})/L(v2)(σ)}      (Sem. of substitution [3.7.1])

Cases 2 and 3 (v1 ∈ Svar with v1 ≠ x, and v1 ∈ Avar) proceed similarly, concluding with
    = ((σ{R(v2)(σ)/L(x)(σ)}){R(s)(σ{R(v2)(σ)/L(x)(σ)})/L(v1)(σ{R(v2)(σ)/L(x)(σ)})})
      {R(x)(σ)/L(x)(σ)}                                      (Interchange of state variants [3.6.1])
    = (M(v1 := s)(σ{R(v2)(σ)/L(x)(σ)})){R(x)(σ)/L(x)(σ)}     (Def. of v := s [3.5.3])

3.8 Sets and Uses

In examining statements for possible transformations, it is often necessary to see which variables they manipulate. Statements can manipulate variables in two different ways: they may simply reference a variable's value without changing it, or they may actually give a variable a new value. The first will be referred to as a
use of the variable and the second a set of the variable. A definition of using and setting variables is given below.

Definition 3.8.1 (Setting and using) Let Φ ∈ Σ → Σ and x ∈ Ivar.

a) Φ sets x whenever, for some σ, Φ(σ)(L(x)(σ)) ≠ σ(L(x)(σ)).

b) Φ uses x whenever, for some σ and α, Φ(σ{α/L(x)(σ)}) ≠ (Φ(σ)){α/L(x)(σ)}.

The sets of all variables set and used by Φ will be denoted sets(Φ) and uses(Φ) respectively. While expressions do not set variables' values, they certainly do use the values of variables, so the definition of uses is extended to the functions R and W in the following definition.

Definition 3.8.2 (Using) R(s) uses x whenever, for some σ and α, R(s)(σ{α/L(x)(σ)}) ≠ R(s)(σ). Similarly, W(b) uses x whenever, for some σ and α, W(b)(σ{α/L(x)(σ)}) ≠ W(b)(σ).

An expression does not use a variable that is always separate from every variable appearing in it.
Proof: By induction on the complexity of s and b. Since the operations are assumed to preserve their meaning in any variant, only the basis steps are shown here.

Case 1: s = y, y ≠ x

    R(y)(σ{α/x}) = R(y)(σ)

Case 2: s = m

    R(m)(σ{α/x}) = R(m)(σ)

Case 3: b = true

    W(true)(σ{α/x}) = T = W(true)(σ)

Case 4: b = false

    W(false)(σ{α/x}) = F = W(false)(σ)

The locations and values referenced by an assignment are unaffected by a statement whose sets are disjoint from the assignment's uses:

Lemma 3.8.2 (Empty intersections of sets and uses) If sets(M(S)) ∩ uses(M(v := s)) = ∅ and v := s is not nullable, then

    L(v)(σ) = L(v)(M(S)(σ)) and
    R(s)(σ) = R(s)(M(S)(σ)).
For the proof of this, assume M(S)(σ) = σ{α1/ξ1} ... {αn/ξn}, where each ξi ∈ sets(M(S)).

Proof of the first part (the case v ∈ Svar is immediate; let v = a[s1]):

    L(a[s1])(σ) = <a, R(s1)(σ)>          (Def. of location of a variable [3.2.1])
                = <a, R(s1)(M(S)(σ))>    (second part, below)
                = L(a[s1])(M(S)(σ))      (Def. of location of a variable [3.2.1])

Proof of the second part:

    R(s)(σ) = R(s)(σ{α1/ξ1} ... {αn/ξn})   (since R(s) uses none of the ξi [3.8.1, 3.8.2])
            = R(s)(M(S)(σ))

The value of a boolean expression is likewise unaffected:

Lemma 3.8.3 (Empty intersections of sets and uses) If sets(M(S)) ∩ uses(W(b)) = ∅, then W(b)(σ) = W(b)(M(S)(σ)).
The proof of this lemma is similar to that of Lemma 3.8.2. If the sets of an assignment statement and the uses of an arbitrary statement have an empty intersection, the meaning of the arbitrary statement is the same, whether or not the assignment statement has been executed. As with all cases in the definition of the meaning of a statement in a modified state, the modification must be undone after evaluating the statement.

Lemma 3.8.4 (Empty intersections of sets and uses) If sets(M(v := s)) ∩ uses(M(S)) = ∅ and v := s is not nullable, then

    M(S)(σ{R(s)(σ)/L(v)(σ)}) = (M(S)(σ)){R(s)(σ)/L(v)(σ)}.

This follows directly from the definition of setting and using [3.8.1].
assigned to x, whereas, if S2 is executed first, 15 is assigned to y. This is expressed formally in the following lemma.

Lemma 3.8.5 (Empty intersections of sets and uses) If sets(M(S1)) ∩ uses(M(S2)) = ∅, then, given σ where

    M(S1)(σ) = σ{α11/ξ11} ... {α1m/ξ1m} and
    M(S2)(σ) = σ{α21/ξ21} ... {α2n/ξ2n},

it follows that

    M(S2)(M(S1)(σ)) = (M(S1)(σ)){α21/ξ21} ... {α2n/ξ2n}.

3.9 Static Approximations of Sets and Uses

The sets uses(M(S)) and sets(M(S)) are defined semantically and cannot be computed directly; the syntactic sets ivar and livar, defined below, approximate them.
Definition 3.9.1 (ivar) The set of all integer variables occurring in s, b, or S is denoted by ivar(s), ivar(b), or ivar(S), respectively, and consists of the following:

    ivar(x) = {x},
    ivar(a[s]) = {a} ∪ ivar(s),
    ivar(m) = ∅,
    ivar(s1 ⊕ s2) = ivar(s1) ∪ ivar(s2),
    ivar(if b then s1 else s2 fi) = ivar(b) ∪ ivar(s1) ∪ ivar(s2),
    ivar(true) = ∅,
    ivar(false) = ∅,
    ivar(s1 ⊛ s2) = ivar(s1) ∪ ivar(s2),
    ivar(¬b) = ivar(b),
    ivar(s in (s1 ... s2)) = ivar(s) ∪ ivar(s1) ∪ ivar(s2),
    ivar(v := s) = ivar(v) ∪ ivar(s),
    ivar(S1; S2) = ivar(S1) ∪ ivar(S2),
    ivar(if b then S1 else S2 fi) = ivar(b) ∪ ivar(S1) ∪ ivar(S2),
    ivar(D) = ∅, and
    ivar(for x := s1 to s2 do S od) = ivar(s1) ∪ ivar(s2) ∪ ivar(S).

Lemma 3.9.1 If x ∉ ivar(S) then x ∉ uses(M(S)).

This proof follows by induction on the complexity of S.
The function livar, defined only for statements, describes the set of variables on the left-hand side of assignments and can be used statically in place of sets.

Definition 3.9.2 (livar) The set of all integer variables on the left-hand side of an assignment in S is denoted by livar(S) and consists of the following:

    livar(x := s) = {x},
    livar(a[s1] := s) = {a},
    livar(S1; S2) = livar(S1) ∪ livar(S2),
    livar(if b then S1 else S2 fi) = livar(S1) ∪ livar(S2),
    livar(D) = ∅, and
    livar(for x := s1 to s2 do S od) = livar(S).

Lemma 3.9.2 If x ∉ livar(S) then x ∉ sets(M(S)).

This proof follows by induction on the complexity of S.

While these definitions provide a close approximation of the simple variables in sets and uses, they tend to be overly broad with array references, adding the entire array to the set instead of only the element being accessed. For example, ivar(x := 3 * w + z) is {x, w, z}, but ivar(a[x] := 3 * w + z) is {a, x, w, z}. Since array references are so important and prevalent in image processing, a finer approximation of sets and uses is employed in the actual implementation of these transformations. This approximation is discussed in detail in Section 6.3.

For simple variables, the sets uses(M(S)) and ivar(S) (as well as the sets livar(S) and sets(M(S))) may appear to be equivalent, and very often do refer to the same set. They are not, however, always identical. As a counter-example, consider the
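Definitions 3.9.1 and 3.9.2 are directly computable. The sketch below uses a hypothetical tuple representation, ('assign', v, s) and ('seq', S1, S2) for statements with expressions as tuples, and reproduces the a[x] := 3 * w + z example from the text:

```python
def ivar_e(e):
    """ivar for expressions (Definition 3.9.1)."""
    tag = e[0]
    if tag == 'var':   return {e[1]}
    if tag == 'const': return set()
    if tag == 'bin':   return ivar_e(e[2]) | ivar_e(e[3])
    if tag == 'aref':  return {e[1]} | ivar_e(e[2])   # whole array joins the set
    raise ValueError(tag)

def ivar_s(S):
    """ivar for statements (assignments and sequences only, for brevity)."""
    if S[0] == 'assign':
        return ivar_e(S[1]) | ivar_e(S[2])
    return ivar_s(S[1]) | ivar_s(S[2])                # ('seq', S1, S2)

def livar(S):
    """livar (Definition 3.9.2): {x} for x := s, {a} for a[s] := s."""
    if S[0] == 'assign':
        return {S[1][1]}    # position 1 holds the variable or array name
    return livar(S[1]) | livar(S[2])

rhs = ('bin', '+', ('bin', '*', ('const', 3), ('var', 'w')), ('var', 'z'))
assert ivar_s(('assign', ('var', 'x'), rhs)) == {'x', 'w', 'z'}
# the entire array a joins the set, not just the referenced element:
assert ivar_s(('assign', ('aref', 'a', ('var', 'x')), rhs)) == {'a', 'x', 'w', 'z'}
assert livar(('assign', ('aref', 'a', ('var', 'x')), rhs)) == {'a'}
```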
statement y := x − x. Here, the set ivar(S) is {x, y}, those variables which appear in the statement. But the set uses(M(S)) is {y}. It does not include x because M(S)(σ{α/x}) = (M(S)(σ)){α/x} for all σ ∈ Σ and all α ∈ V. Using ivar as an approximation of uses (and livar as an approximation of sets) will often cause no problem, but it may prevent some transformations from taking place. For instance, if ivar(S) is employed instead of uses(M(S)) in the conditions for statement interchange (in Theorem 4.1.1), the statements x := 7; y := x − x are not interchangeable.

The sets ivar and livar usually give a reasonable approximation of always separate.

Lemma 3.9.3 (Static approximation of Always Separate) If ivar(S1) and livar(S2) are disjoint, the elements of uses(M(S1)) and sets(M(S2)) are always separate.

Proof: Let ivar(S1) ∩ livar(S2) = ∅, and suppose there is a v with v ∈ uses(M(S1)) and v ∈ sets(M(S2)).

By Lemma 3.9.1, since v ∈ uses(M(S1)), v ∈ ivar(S1).
By Lemma 3.9.2, since v ∈ sets(M(S2)), v ∈ livar(S2).

Therefore, v ∈ ivar(S1) ∩ livar(S2). By contradiction, there can be no v such that v ∈ uses(M(S1)) and v ∈ sets(M(S2)), so the elements of the two sets are always separate.
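The counter-example above can be checked mechanically. Definition 3.8.1(b) says M(S) uses x iff M(S)(σ{α/L(x)(σ)}) ≠ (M(S)(σ)){α/L(x)(σ)} for some σ and α; for simple variables this is testable by brute force over a small family of states. An illustrative sketch (states as dicts, function names assumed):

```python
def M(sig):                               # M(y := x - x)
    return {**sig, 'y': sig['x'] - sig['x']}

def uses(Phi, x, states, values):
    """Brute-force test of Definition 3.8.1(b) for a simple variable x."""
    for sig in states:
        for a in values:
            if Phi({**sig, x: a}) != {**Phi(sig), x: a}:
                return True
    return False

states = [{'x': i, 'y': j} for i in range(-2, 3) for j in range(-2, 3)]
vals = range(-2, 3)
assert not uses(M, 'x', states, vals)     # x is not used, though x is in ivar(S)
assert uses(M, 'y', states, vals)         # y is used: its old value is overwritten
```

The search only refutes "uses" over the states it visits, so it demonstrates, rather than proves, that uses(M(S)) excludes x.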
CHAPTER 4
PRIMITIVE TRANSFORMATIONS

This chapter focuses on the minimal transformations that can be performed on programs in the language defined in Chapter 3. These transformations will be combined in Chapter 5 to give global optimizations. Since they are less complex than the global optimizations, primitive transformations are more easily proved correct. The proof of the correctness of each of the minimal transformations is provided here as well. The basic code transformations are:

• statement interchange (Theorem 4.1.1), with two variations on interchange:
  • interchange with substitution (Theorem 4.1.4)
  • interchange with backward substitution (Theorem 4.1.5)
• statement compression (Theorem 4.1.6)
• movement of statements into (and out of) if statements (Theorem 4.2.1), and a variation that uses substitution (Theorem 4.2.2)
• if then else statement splitting (Theorem 4.2.3)
• if then else statement simplification (Theorem 4.2.5)
• loop rolling and unrolling (Theorems 4.3.1 and 4.3.2)
To move the statement Sk in front of the statement S1, it must be interchanged with all of the statements S1, ..., Sk-1:

    [Figure: the sequence S1; S2; ...; Sk-1; Sk is rewritten one interchange at a time until Sk precedes S1.]

Figure 4.1. Statement Interchange Used to Move a Statement

4.1 Statement Transformations

The most basic of the smaller transformations is statement interchange. Two adjacent statements may, in some situations, be interchanged with no ultimate change in meaning to the entire program. In some computer architectures this may result in a shorter running time, but that is not the goal of statement interchange. Instead, it will support rearranging code so that other, more powerful optimizations may be performed. Statement interchange is often used to move one statement to the beginning (or end) of a group of other statements, as shown in Figure 4.1. In code motion, a standard loop optimization technique that removes invariant statements from loops, statement interchanges first move the statement being removed from the loop to the beginning of that loop. Additionally, many of the other transformations discussed in this section require that statements be interchangeable in order for the transformation to take place.

Definition 4.1.1 (Interchangeability) Two statements S1 and S2 are said to be interchangeable in state σ if M(S1; S2)(σ) = M(S2; S1)(σ).
While there will always be pathological cases in which two seemingly conflicting statements will still be interchangeable, a set of sufficient conditions for statement interchange is given in Theorem 4.1.1.

Theorem 4.1.1 (Conditions for statement interchange) Two statements S1 and S2 are interchangeable if any of the following conditions are true:

a) S1 = S2;

b) for all y ∈ sets(M(S1)) and all x ∈ uses(M(S2)), sep(y, x), and for all y ∈ sets(M(S2)) and all x ∈ uses(M(S1)), sep(y, x);

c) S1 = v := f(v), S2 = v := g(v), v is strictly non-self-referencing, and f(g(v)) = g(f(v)).

Proof: The proof of part a) follows directly from the definition of equal statements [3.5.4]. The proof of part b) is in Theorem 4.1.2. The proof of part c) is in Theorem 4.1.3.

Clearly, if two statements are identical, their order does not matter. It is when the statements are different that statement interchange becomes interesting. First, two statements can be interchanged if the locations set by each are not used by the other. For example, a[x] := 3 * y can be interchanged with z := w − y, because a, which is set by the first, is not used in computing w − y; and z, which is set by the second, is not used in computing 3 * y.
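Combining Theorem 4.1.1(b) with the static approximation of Lemma 3.9.3 gives a conservative, computable interchange test: it suffices that ivar of each statement be disjoint from livar of the other. A sketch, with the ivar/livar sets supplied directly as Python sets (the function name is hypothetical):

```python
def may_interchange(s1_ivar, s1_livar, s2_ivar, s2_livar):
    """Conservative test in the spirit of Theorem 4.1.1(b) via Lemma 3.9.3:
    safe to swap S1; S2 when neither statement can touch what the other sets."""
    return s1_ivar.isdisjoint(s2_livar) and s2_ivar.isdisjoint(s1_livar)

# a[x] := 3 * y  and  z := w - y  (the example above): interchangeable
assert may_interchange({'a', 'x', 'y'}, {'a'}, {'z', 'w', 'y'}, {'z'})

# x := 7  and  y := x - x: blocked, since ivar over-approximates uses
assert not may_interchange({'x'}, {'x'}, {'y', 'x'}, {'y'})
```

The second assertion is the over-approximation discussed in Section 3.9: the pair is semantically interchangeable, but the static test cannot see that x − x never really uses x.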
Theorem 4.1.2 (Statement interchange)

    ⊨ S1; S2 = S2; S1
    Provided: sets(M(S1)) ∩ uses(M(S2)) = ∅ and sets(M(S2)) ∩ uses(M(S1)) = ∅

Proof: Assume that

    M(S1)(σ) = σ{α11/ξ11} ... {α1m/ξ1m} and
    M(S2)(σ) = σ{α21/ξ21} ... {α2n/ξ2n}.

Then

    M(S1; S2)(σ) = M(S2)(M(S1)(σ))                                      (Def. of S1; S2 [3.5.3])
                 = (M(S1)(σ)){α21/ξ21} ... {α2n/ξ2n}                     (Empty intersections of sets and uses [3.8.5])
                 = σ{α11/ξ11} ... {α1m/ξ1m}{α21/ξ21} ... {α2n/ξ2n}
                 = (M(S2)(σ)){α11/ξ11} ... {α1m/ξ1m}                     (Interchange of state variants [3.6.1]; [3.8.5])
                 = M(S1)(M(S2)(σ))                                       (Empty intersections of sets and uses [3.8.5])
                 = M(S2; S1)(σ)                                          (Def. of S1; S2 [3.5.3])

Second, two assignments to the same variable may be interchanged when the assigned functions commute.
50 Theorem 4.1.3 (Interchange of assignments of functions) |= v := f(v)]v :=g(v) = v := g(v)\ v := f(v) Provided: v is strictly non-self-referencing and f{g(v)) = g(f{v)) Proof: This can be proved by manipulating the meaning of v := f(v); v := g(v). M(v := f(v); v := sr(n))()(
same substitution that would have been made by the assignment statement. This interchange with substitution is discussed in the following theorem.

Theorem 4.1.4 (Interchange with substitution)
|= x := s; S ≡ S[s/x]; x := s
Provided: uses(M(x := s)) ∩ sets(M(S)) = ∅

Proof: This can be proved by manipulation of the meaning of x := s; S. If x := s is nullable, the result is obviously true (x := x; S ≡ S[x/x]; x := x), so the proof assumes x := s is not nullable. As a preliminary result, notice that

R(s)((M(S)(σ{R(s)(σ)/L(x)(σ)})){R(x)(σ)/L(x)(σ)}) = R(s)(σ),

since S sets no location that x := s uses.
The main derivation then expands M(x := s; S)(σ), substitutes s for x throughout S (legitimate because S sets neither x nor any variable used in s), and interchanges the resulting state variants (Interchange of state variants [3.6.1], Sem. of state variants [3.6.3]):

M(x := s; S)(σ)
= M(S)(σ{R(s)(σ)/L(x)(σ)})   (Def. of v := s [3.5.3])
= (M(S[s/x])(σ)){R(s)(σ)/L(x)(σ)}   (Def. of subst. into stat. [3.4.4])
= M(S[s/x]; x := s)(σ)   (Def. of S1; S2 [3.5.3])

Interchange with substitution moves an assignment statement backward over a statement that uses its target. The reverse is also possible: an assignment whose right-hand side is a variable can be moved forward, with that variable substituted backward into the preceding statement.

Theorem 4.1.5 (Interchange with backward substitution)
|= v1 := s; v2 := x ≡ v2 := x; v1[v2/x] := s[v2/x]; x := v2
Provided:
1) uses(M(v1 := s)) ∩ sets(M(v2 := x)) = ∅,
2) v2 is location-independent of v1,
3) v2 is strictly non-self-referencing.

Proof: This can be proved by manipulation of the meaning of v1 := s; v2 := x. There are two separate cases: either v1 = x or it is not. Again, the cases in which either v1 := s or v2 := x is nullable follow directly and are not considered in the proof.

Case 1: v1 = x. Here the right-hand side reduces to v2 := x; v2 := s[v2/x]; x := v2. Expanding M(v1 := s; v2 := x)(σ) by the definition of v := s [3.5.3] and the semantics of state variants [3.6.3], the left-hand side stores R(s)(σ) into L(x)(σ) and then copies that value into L(v2)(σ). Since v2 is strictly non-self-referencing and location-independent of v1, the same state is produced by first copying the old value of x into v2, then performing the substituted assignment v2 := s[v2/x], and finally restoring x from v2.

Case 2: v1 ≠ x. The derivation is similar: both sides are expanded into nested state variants, which are then interchanged (Interchange of state variants [3.6.1]); the provisos guarantee that each variant stores the same value into the same location on both sides.
The derivation in Case 2 concludes:

= M(v1[v2/x] := s[v2/x]; x := v2)(M(v2 := x)(σ))   (Def. of v := s [3.5.3])
= M(v2 := x; v1[v2/x] := s[v2/x]; x := v2)(σ)   (Def. of S1; S2 [3.5.3])

Compression is used to remove statements from the code when they are no longer needed. A pair of statements will compress down to the second statement if the meaning of the second is the same as the meaning of the pair. (This may be combined with statement interchange to provide compression to the first statement.)

Definition 4.1.2 (Compressible statements) A pair of statements S1; S2 is compressible (to S2) in state σ if M(S1; S2)(σ) = M(S2)(σ).

The simplest cases for statement compression follow from the conditions below. More complex compression can take place by first applying the other transformations (such as absorption into if statements) to create these conditions. This will be discussed in more detail in Chapter 5.

Theorem 4.1.6 (Compression of statements) The following are sufficient for the compression of S1; S2:
a) S1 = v := s1 and S2 = v := s2, when sets(v := s1) ∩ uses(v := s2) = ∅.
b) S1 is nullable.

Proof: The proof of part b) follows directly from the definition of nullable statements. The proof of part a) is provided below.

M(v := s1; v := s2)(σ)
= M(v := s2)(M(v := s1)(σ))   (Def. of S1; S2 [3.5.3])
= (M(v := s1)(σ)){R(s2)(M(v := s1)(σ))/L(v)(M(v := s1)(σ))}   (Def. of v := s [3.5.3])
= (σ{R(s1)(σ)/L(v)(σ)}){R(s2)(σ)/L(v)(σ)}   (Def. of empty sets and uses [3.8.3])
= σ{R(s2)(σ)/L(v)(σ)}   (Sem. of state variants [3.6.3])
= M(v := s2)(σ)   (Def. of v := s [3.5.3])

4.2 If Statement Transformations

A statement adjacent to an if statement can be absorbed into both branches of the if statement; read in the other direction, a statement common to both branches can be extracted.

Lemma 4.2.1 (If statement absorption and extraction)
|= S; if b then S1 else S2 fi ≡ if b then S;S1 else S;S2 fi
Provided: sets(M(S)) ∩ uses(b) = ∅
and
|= if b then S1 else S2 fi; S ≡ if b then S1;S else S2;S fi

Proof of the first part:

M(S; if b then S1 else S2 fi)(σ)
= M(if b then S1 else S2 fi)(M(S)(σ))   (Def. of S1; S2 [3.5.3])
= if W(b)(M(S)(σ)) then M(S1)(M(S)(σ)) else M(S2)(M(S)(σ)) fi   (Def. of if [3.5.3])
= if W(b)(σ) then M(S; S1)(σ) else M(S; S2)(σ) fi   (Def. of empty sets and uses [3.8.3])
= M(if b then S;S1 else S;S2 fi)(σ)   (Def. of if [3.5.3])

The proof of the second part is similar; no proviso is needed, because in both forms S executes after b has been evaluated.

An assignment statement may be absorbed into an if statement even when it sets a variable that the condition uses, by substituting its right-hand side into the condition:

|= x := s; if b then S1 else S2 fi ≡ if b[s/x] then x := s; S1 else x := s; S2 fi

The proof again manipulates the meaning of the compound statement; the crucial step replaces W(b)(M(x := s)(σ)) with

= if W(b[s/x])(σ) then ...   (Def. of subst. into exp. [3.4.1])

Conditional expressions permit familiar definitions such as

max(m, n) = if m > n then m else n fi
min(m, n) = if m < n then m else n fi

Definition 4.2.2 (if then statement) if b then S1 fi = if b then S1 else D fi

Since later optimizations only work with if then statements (as opposed to if then else statements), it may be necessary to transform an if then else statement into two if then statements. This is done in the following theorem.

Theorem 4.2.3 (Splitting if then else statements)
|= if b then S1 else S2 fi ≡ if b then S1 fi; if ¬b then S2 fi
Provided: sets(M(S1)) ∩ uses(b) = ∅.
Proof: Expand the left-hand side by the definition of if [3.5.2] and of D [3.5.3], and nest a second conditional on W(¬b)(σ); the added branches are vacuously true, since W(b)(σ) and W(¬b)(σ) cannot both hold:

M(if b then S1 else S2 fi)(σ)
= if W(¬b)(σ) then M(S2)(M(if b then S1 else D fi)(σ))
  else M(if b then S1 else D fi)(σ) fi   (Vacuously true; Def. of empty sets and uses [3.8.3])
= M(if ¬b then S2 else D fi)(M(if b then S1 else D fi)(σ))   (Def. of if [3.5.3])
= M(if ¬b then S2 fi)(M(if b then S1 fi)(σ))   (Def. of if then fi [4.2.2])
= M(if b then S1 fi; if ¬b then S2 fi)(σ)   (Def. of S1; S2 [3.5.3])

The proviso is needed because in the split form S1 executes before ¬b is evaluated; since S1 sets nothing that b uses, the value of ¬b is unchanged.
Corollary 4.2.4 (Splitting if then else statements)
|= if b then S1 else S2 fi ≡ if ¬b then S2 fi; if b then S1 fi
Provided: sets(M(S2)) ∩ uses(b) = ∅

This is proved in the same way as Theorem 4.2.3.

Finally, some if statements can be simplified to single simpler statements. If the truth value of a condition is the same in all states, an if statement can be simplified to either the then or else clause. If both the then and the else clauses of an if statement contain nullable statements, then the if statement can become the empty statement. This shows an interesting consequence of the fact that no effects of errors are considered during the evaluation of expressions. Consider the statement if x/0 = 4 then D else D fi. In an actual implementation, reducing this to simply D would change the meaning of the code vastly: the first case would cause a run-time error and the second would have no such error. The third clause, combined with if statement extraction (Lemma 4.2.1), allows the simplification of statements of the form if b then S else S fi.

Theorem 4.2.5 (if simplification)
|= (if b then S1 else S2 fi) ≡ S1
Provided: W(b)(σ) = T for all σ ∈ Σ
|= (if b then S1 else S2 fi) ≡ S2
Provided: W(b)(σ) = F for all σ ∈ Σ
|= (if b then S1 else S2 fi) ≡ D
Provided: S1 and S2 are nullable
Proof of the first part:

M(if b then S1 else S2 fi)(σ)
= if W(b)(σ) then M(S1)(σ) else M(S2)(σ) fi   (Def. of if [3.5.3])
= M(S1)(σ)   (Given)

Proof of the second part:

M(if b then S1 else S2 fi)(σ)
= if W(b)(σ) then M(S1)(σ) else M(S2)(σ) fi   (Def. of if [3.5.3])
= M(S2)(σ)   (Given)

The proof of the third part follows from the definition of nullable statements: both branches leave σ unchanged, so the if statement has the meaning of the empty statement D.
4.3 Loop Transformations

A for loop whose lower bound exceeds its upper bound has no effect.

Theorem 4.3.1 (Loop elimination)
M(for x := s1 to s2 do S od)(σ) = σ, if R(s1)(σ) > R(s2)(σ)

This follows directly from the definition of the semantics of for statements (Definition 3.5.3).

Theorem 4.3.2 (Loop rolling and unrolling)
|= M(for x := s1 to s2 do S od)(σ) = M(for x := s1 to s2 - 1 do S od; S[R(s2)(σ)/x])(σ)
Provided: R(s1)(σ) ≤ R(s2)(σ)

Proof: By mathematical induction on the number of times through the loop, R(s2)(σ) - R(s1)(σ) + 1.

Base case: R(s2)(σ) - R(s1)(σ) + 1 = 1. The loop body executes exactly once, with R(s2)(σ) substituted for x, and the loop for x := s1 to s2 - 1 is eliminated (Theorem 4.3.1), so both sides equal M(S[R(s2)(σ)/x])(σ).

Induction step: Assume the result is true in states where R(s2)(σ) - R(s1)(σ) + 1 = k (k > 0).
Show it is true when R(s2)(σ) - R(s1)(σ) + 1 = k + 1.

M(for x := s1 to s2 do S od)(σ)
= M(S[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 do S od)(σ)   (Def. of for [3.5.3])
= M(S[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 - 1 do S od; S[R(s2)(σ)/x])(σ)   (Induction hypothesis)
= M(for x := s1 to s2 - 1 do S od; S[R(s2)(σ)/x])(σ)   (Def. of for [3.5.3])
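Theorem 4.3.2 can be illustrated by executing both forms of a concrete loop. The Python sketch below (illustration only; a dictionary plays the role of the state σ) uses inclusive bounds, matching the for ... to ... do semantics:

```python
def run_rolled(s1, s2, body, state):
    # for x := s1 to s2 do S od  (inclusive bounds)
    for x in range(s1, s2 + 1):
        body(x, state)
    return state

def run_unrolled_last(s1, s2, body, state):
    # for x := s1 to s2-1 do S od; S[s2/x]  -- valid when s1 <= s2
    for x in range(s1, s2):
        body(x, state)
    body(s2, state)
    return state

def body(x, state):
    # A sample loop body S: accumulate the control variable.
    state["sum"] = state["sum"] + x
```

Both runs leave the state with the same final contents, as the theorem requires.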
CHAPTER 5
GLOBAL OPTIMIZATIONS

The basic transformations discussed in Chapter 4 can be combined to give traditional data flow analysis optimizations and some new optimizations as well. Most of these transformations can be viewed as unrolling one or more loops, interchanging the resulting unrolled statements, possibly with some statement elimination, restructuring, or simplification (such as statement compression or if then else simplification), and rerolling of the code into a loop again. Loop joining (Section 5.1) unrolls the statements in two adjacent loops, rearranges them, and rerolls them into a single loop. Similarly, loop interchanging (Section 5.2) requires no elimination or restructuring of the unrolled statements — the statements in a pair of nested loops are simply unrolled, rearranged, and then rerolled. In code motion (Section 5.3), after the loops are unrolled, the copies of the statement being removed from the loop are moved to the beginning of the unrolled statements. These statements are then compressed to a single statement and the remaining statements are rerolled, giving a single statement followed by a loop. Finally, loop-conditional joining (Section 5.4) unrolls the loop, simplifies the conditional statement which makes up the body of the loop (and in doing so may remove some of the conditional statements), and then rerolls the remaining statements, which no longer have the original conditional clause.

Besides describing each complex transformation in terms of the minimal transformations, it is necessary to determine when the complex transformations can occur and when they will improve the code. In all but the smallest of programs, these
transformations must be attempted in some order. Heuristic methods for combining these transformations and the transformations presented in the previous chapter are presented in the context of a prototype optimizer in Section 6.4.

5.1 Loop Joining

Loop joining, that is, the combining of two (or more, by repeated application) loops that are executed over a similar range of loop boundaries, is another transformation which provides its greatest benefits in rearranging the grouping of statements. Loop joining has some benefits in eliminating the initialization of the second loop and consolidating the loop control variable increment and examination costs of two loops. Loop joining (and the reverse operation of loop splitting) can be part of more spectacular benefits when they are used to rearrange code so that other optimizations, most notably loop-conditional joining, can occur.

Theorem 5.1.1 (Loop Joining)
|= M(for x := s1 to s2 do S1 od; for y := s1 to s2 do S2 od)(σ)
 = M(for x := s1 to s2 do S1; S2[x/y] od)(σ)
Provided:
sets(S1) ∩ uses(s1) = sets(S1) ∩ uses(s2) = ∅
for all y1, y2 with R(s1)(σ) ≤ y1 < y2 ≤ R(s2)(σ), S2[y1/y] is interchangeable with S1[y2/x]

Proof: By induction on the number of times through the loops, R(s2)(σ) - R(s1)(σ) + 1.

Base case: R(s2)(σ) - R(s1)(σ) + 1 = 1. Each loop unrolls to a single body instance (Theorem 4.3.2, Theorem 4.3.1), so the pair of loops has the meaning M(S1[R(s1)(σ)/x]; S2[R(s1)(σ)/y])(σ)
= M(for x := s1 to s2 do S1; S2[x/y] od)(σ)   (Loop rolling [4.3.2])

Induction step: Assume the result holds in states where R(s2)(σ) - R(s1)(σ) + 1 = k (k > 0). Show it holds in states where R(s2)(σ) - R(s1)(σ) + 1 = k + 1. Unroll the last iteration of each loop (Theorem 4.3.2), interchange the unrolled iteration of the second loop backward over S1[R(s2)(σ)/x] (permitted by the interchangeability proviso), apply the induction hypothesis to the shortened loops, and reroll (Theorem 4.3.2):

M(for x := s1 to s2 do S1 od; for y := s1 to s2 do S2 od)(σ)
= M(for x := s1 to s2 - 1 do S1 od; S1[R(s2)(σ)/x];
    for y := s1 to s2 - 1 do S2 od; S2[R(s2)(σ)/y])(σ)   (Loop unrolling [4.3.2])
= M(for x := s1 to s2 - 1 do S1 od; for y := s1 to s2 - 1 do S2 od;
    S1[R(s2)(σ)/x]; S2[R(s2)(σ)/y])(σ)   (Statement interchange)
= M(for x := s1 to s2 - 1 do S1; S2[x/y] od;
    (S1; S2[x/y])[R(s2)(σ)/x])(σ)   (Induction hypothesis; Def. of subst. [3.4.4])
= M(for x := s1 to s2 do S1; S2[x/y] od)(σ)   (Loop rolling [4.3.2])
5.2 Loop Interchange

for x1 := 1 to 1000 do             for x2 := 1 to 2 do
  for x2 := 1 to 2 do                for x1 := 1 to 1000 do
    S                                  S
  od                                 od
od                                 od

Outer loop initialization     1    Outer loop initialization     1
Inner loop initialization  1000    Inner loop initialization     2
Outer loop condition check 1001    Outer loop condition check    3
Inner loop condition check 3000    Inner loop condition check 2002

Figure 5.1. Loop Interchange May Change the Number of Loop Initializations

While this may seem to be reason enough for loop interchanging in extreme conditions such as the one in Figure 5.1, it is usually insignificant when compared to the real power of loop interchanging. The major advantage of loop interchange is that it allows other optimizing manipulations involving loops, most notably loop-conditional joining, but to a lesser degree loop splitting, to occur. A conditional statement may refer to the loop control variable not of its immediately encompassing for statement, but rather of the outer for statement. This occurs in many cases in the image algebra, and will appear in an example discussed in Chapter 6. The loop interchange is crucial in allowing more powerful transformations to take place.

Figure 5.2 shows a nested loop before and after loop interchange, with the loops unrolled to show the exact statement ordering. While it is fairly straightforward to see the difference in the code before and after loop interchange, it is awkward to express this difference mathematically. Transition from row-major to column-major involves moving statements over large groups of statements. Figure 5.3 shows some of this movement. To transform row-major to column-major, the statement S[n1/x2][m1+1/x1] must be interchanged with all of the statements above it except S[n1/x2][m1/x1] (that is, all of the statements with a lower row number and a higher column number).
The code segments:

(in row-major form)                (in column-major form)
for x1 := m1 to m2 do              for x2 := n1 to n2 do
  for x2 := n1 to n2 do              for x1 := m1 to m2 do
    S                                  S
  od                                 od
od                                 od

are equivalent to the following (assuming m1 ≤ m2 and n1 ≤ n2):

S[n1/x2][m1/x1];                   S[m1/x1][n1/x2];
S[n1+1/x2][m1/x1];                 S[m1+1/x1][n1/x2];
S[n1+2/x2][m1/x1];                 S[m1+2/x1][n1/x2];
...                                ...
S[n2/x2][m1/x1];                   S[m2/x1][n1/x2];
S[n1/x2][m1+1/x1];                 S[m1/x1][n1+1/x2];
...                                ...
S[n2/x2][m1+1/x1];                 S[m2/x1][n1+1/x2];
...                                ...
S[n2/x2][m2/x1];                   S[m2/x1][n2/x2];

Figure 5.2. The Effects of Loop Interchange
[Figure 5.3 pairs the row-major statement sequence with its column-major rearrangement, connecting each statement to its new position after the interchanges.]

Figure 5.3. Statement Interchanging During Loop Interchanging

Then, the statement S[n1/x2][m1+2/x1] must be interchanged with all of the statements above it except the first two (again, all statements except those with a lower row number and a lower column number). A statement in row i and column j must be interchanged with statements in all previous rows which have a higher column number.

As a preliminary result to loop interchange, notice that if there is no body to a loop, the loop has no meaning.

Lemma 5.2.1 (Loop simplification) for x := s1 to s2 do D od ≡ D

The proof of this follows directly from loop unrolling. With this result, the proof of the validity of loop interchange is fairly straightforward.
Theorem 5.2.2 (Loop Interchange)
|= M(for x1 := s1 to s2 do for x2 := s3 to s4 do S od od)(σ)
 = M(for x2 := s3 to s4 do for x1 := s1 to s2 do S od od)(σ)
Provided:
sets(S) ∩ uses(s1) = sets(S) ∩ uses(s2) = sets(S) ∩ uses(s3) = sets(S) ∩ uses(s4) = ∅
x1 ∉ ivar(s3), x1 ∉ ivar(s4)
for all y1, y2, y3, y4 with R(s1)(σ) ≤ y1 < y2 ≤ R(s2)(σ) and R(s3)(σ) ≤ y3, y4 ≤ R(s4)(σ), S[y3/x2][y2/x1] is interchangeable with S[y4/x2][y1/x1]

Proof: By induction on the number of times through the outer loop, R(s2)(σ) - R(s1)(σ) + 1.

Base case: R(s2)(σ) - R(s1)(σ) + 1 = 0. Both sides reduce to D by loop elimination (Theorem 4.3.1) and loop simplification (Lemma 5.2.1).

Induction step: Assume the result in states where R(s2)(σ) - R(s1)(σ) + 1 = k (k ≥ 0). Show it is true in states where R(s2)(σ) - R(s1)(σ) + 1 = k + 1.

M(for x1 := s1 to s2 do for x2 := s3 to s4 do S od od)(σ)
= M(for x1 := s1 to s2 - 1 do for x2 := s3 to s4 do S od od;
    (for x2 := s3 to s4 do S od)[R(s2)(σ)/x1])(σ)   (Loop unrolling [4.3.2])
= M(for x2 := s3 to s4 do for x1 := s1 to s2 - 1 do S od od;
    (for x2 := s3 to s4 do S od)[R(s2)(σ)/x1])(σ)   (Induction hypothesis)
= M(for x2 := s3 to s4 do for x1 := s1 to s2 - 1 do S od od;
    for x2 := s3[R(s2)(σ)/x1] to s4[R(s2)(σ)/x1] do S[R(s2)(σ)/x1] od)(σ)   (Def. of subst. into stat. [3.4.4])
= M(for x2 := s3 to s4 do for x1 := s1 to s2 - 1 do S od od;
    for x2 := s3 to s4 do S[R(s2)(σ)/x1] od)(σ)   (Substitution simplification [5.3.2])
= M(for x2 := s3 to s4 do
    for x1 := s1 to s2 - 1 do S od; S[R(s2)(σ)/x1][x2/x2] od)(σ)   (Loop joining [5.1.1])
= M(for x2 := s3 to s4 do
    for x1 := s1 to s2 - 1 do S od; S[R(s2)(σ)/x1] od)(σ)   (Def. of subst. [3.4.4])
= M(for x2 := s3 to s4 do for x1 := s1 to s2 do S od od)(σ)   (Loop rolling [4.3.2])

5.3 Code Motion

Code motion is a traditional code transformation which removes loop invariant statements from a loop. It can be viewed as a series of smaller transformations. First the statement to be removed is interchanged with the other statements in the loop until it becomes the first statement. (If this cannot be done, the statement cannot be removed.) Then the statements in the loop are unrolled. The statements which are invariant (since the loop may have been unrolled more than once, there may be more than one copy of the invariant statement) are moved to the first positions by more statement interchange, leaving the other statements in the same order as before. The remaining statements are rolled into the loop again and the invariant statement is compressed so there is only one copy of it. Since all of these operations have already been shown to yield code with the same meaning as the original code, the overall operation must also result in equivalent code.
Theorem 5.3.1 (Code Motion)
|= M(for x := s1 to s2 do S1; S2 od)(σ) = M(S1; for x := s1 to s2 do S2 od)(σ)
Provided:
sets(S1) ∩ uses(s1) = ∅
sets(S1) ∩ uses(s2) = ∅
R(s1)(σ) ≤ R(s2)(σ)
x ∉ ivar(S1)
S1 is interchangeable with S2[k/x] for k = m ... n - 1
S1; S1 is compressible

In proving this theorem, the following lemmas are very useful.

Lemma 5.3.2 If x ∉ ivar(s) then s[s1/x] = s. Similarly, if x ∉ ivar(b) then b[s1/x] = b.

Lemma 5.3.3 If x ∉ ivar(S) then S[s1/x] = S.

The proofs of both of these lemmas follow by mathematical induction on the complexity of s, b, and S. The proof of code motion then follows by induction on the number of times through the loop, R(s2)(σ) - R(s1)(σ) + 1.

Base case: R(s2)(σ) - R(s1)(σ) + 1 = 1.

M(for x := s1 to s2 do S1; S2 od)(σ)
= M((S1; S2)[R(s1)(σ)/x])(σ)   (Loop unrolling [4.3.2], loop elimination [4.3.1])
= M(S1; S2[R(s1)(σ)/x])(σ)   (Lemma 5.3.3)
= M(S2[R(s1)(σ)/x])(M(S1)(σ))   (Def. of S1; S2 [3.5.3])
= M(S2[R(s1)(σ)/x]; D)(M(S1)(σ))   (Def. of D [3.5.3])
= M(S2[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to R(s2)(σ) do S2 od)(M(S1)(σ))   (Loop elimination [4.3.1])
= M(for x := s1 to s2 do S2 od)(M(S1)(σ))   (Loop rolling [4.3.2])
= M(S1; for x := s1 to s2 do S2 od)(σ)   (Def. of S1; S2 [3.5.3])

Induction step: Assume the result in states where R(s2)(σ) - R(s1)(σ) + 1 = k (k ≥ 1). Show it is true in states where R(s2)(σ) - R(s1)(σ) + 1 = k + 1.

M(for x := s1 to s2 do S1; S2 od)(σ)
= M(S1; S2[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 do S1; S2 od)(σ)   (Loop unrolling, Lemma 5.3.3)
= M(S1; S2[R(s1)(σ)/x]; S1; for x := R(s1)(σ) + 1 to s2 do S2 od)(σ)   (Induction hypothesis)
= M(S1; S1; S2[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 do S2 od)(σ)   (Statement interchange)
= M(S1; S2[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 do S2 od)(σ)   (Compression [4.1.6])
= M(S1; S2[R(s1)(σ)/x];
    for x := R(s1)(M(S1)(σ)) + 1 to R(s2)(M(S1)(σ)) do S2 od)(σ)   (Def. of empty sets and uses [3.8.3])
= M(S1; for x := s1 to s2 do S2 od)(σ)   (Loop rolling [4.3.2])

5.4 Loop-Conditional Joining

Loop-conditional joining removes a conditional statement that forms the body of a loop when the condition compares the loop control variable against a bound; the loop bounds are adjusted instead.
Theorem 5.4.1 (Loop-Conditional Joining)
|= for x := s1 to s2 do if x > s3 then S fi od ≡ for x := max(s1, s3) to s2 do S od
and
|= for x := s1 to s2 do if x < s3 then S fi od ≡ for x := s1 to min(s2, s3) do S od
Provided:
sets(S) ∩ uses(s3) = ∅
x ∉ uses(s3)

Proof: Proof of the first part by induction on the number of times through the loop, R(s2)(σ) - R(s1)(σ) + 1.

Base case: R(s2)(σ) - R(s1)(σ) + 1 = 0.

M(for x := s1 to s2 do if x > s3 then S fi od)(σ)
= M(D)(σ)   (Loop unrolling [4.3.1])
= M(for x := max(s1, s3) to s2 do S od)(σ)   (Loop rolling [4.3.1])

Induction step: Assume that for x := s1 to s2 do if x > s3 then S fi od ≡ for x := max(s1, s3) to s2 do S od in states where R(s2)(σ) - R(s1)(σ) + 1 = k (k ≥ 0). Show it is true in states where R(s2)(σ) - R(s1)(σ) + 1 = k + 1.

M(for x := s1 to s2 do if x > s3 then S fi od)(σ)
= M((if x > s3 then S fi)[R(s1)(σ)/x];
    for x := R(s1)(σ) + 1 to s2 do if x > s3 then S fi od)(σ)   (Def. of for [3.5.3])
= M(if R(s1)(σ) > s3[R(s1)(σ)/x] then S[R(s1)(σ)/x] fi;
    for x := R(s1)(σ) + 1 to s2 do if x > s3 then S fi od)(σ)   (Def. of subst. into stat. [3.4.4])
= M(if R(s1)(σ) > s3 then S[R(s1)(σ)/x] fi;
    for x := R(s1)(σ) + 1 to s2 do if x > s3 then S fi od)(σ)   (Def. of subst. into exp. [3.4.1])
= M(if R(s1)(σ) > R(s3)(σ) then S[R(s1)(σ)/x] fi;
    for x := R(s1)(σ) + 1 to s2 do if x > s3 then S fi od)(σ)   (Sem. of R [3.5.1])
= M(for x := R(s1)(σ) + 1 to s2 do if x > s3 then S fi od)
    (if W(R(s1)(σ) > R(s3)(σ))(σ) then M(S[R(s1)(σ)/x])(σ) else M(D)(σ) fi)   (Def. of if [3.5.3])
= M(for x := max(R(s1)(σ) + 1, s3) to R(s2)(σ) do S od)
    (if W(R(s1)(σ) > R(s3)(σ))(σ) then M(S[R(s1)(σ)/x])(σ) else M(D)(σ) fi)   (Induction hypothesis)
= if R(s1)(σ) > R(s3)(σ)
    then M(for x := max(R(s1)(σ) + 1, s3) to R(s2)(σ) do S od)(M(S[R(s1)(σ)/x])(σ))
    else M(for x := max(s1 + 1, s3) to s2 do S od)(σ) fi   (Meaning of if)
= if R(s1)(σ) > R(s3)(σ)
    then M(S[R(s1)(σ)/x]; for x := R(s1)(σ) + 1 to s2 do S od)(σ)
    else M(for x := max(s1, s3) to s2 do S od)(σ) fi   (Def. of S1; S2 [3.5.3])
= if R(s1)(σ) > R(s3)(σ)
    then M(for x := s1 to s2 do S od)(σ)
    else M(for x := max(s1, s3) to s2 do S od)(σ) fi   (Loop rolling [4.3.2])
= M(for x := max(s1, s3) to s2 do S od)(σ)   (Meaning of if)

The proof of the second part is similar.
CHAPTER 6
A PROTOTYPE OPTIMIZER

The previous chapters provide the necessary denotational semantic background for performing code optimizations. This background certainly provides the provable correctness of the transformations and related optimizations. Still, it remains to be shown that these transformations can be implemented in a reasonable fashion. This research has revealed that they can indeed be implemented to give the basic tools for a person to use to perform a variety of potential optimizations and evaluate the results. This chapter will describe the Heuristic Optimizing Prototype System (HOPS), a LISP system that provides all of the primitive and global optimizations discussed in Chapters 4 and 5.

This chapter gives a brief description of HOPS, starting with an overview of the system in Section 6.1. In developing the system, the need for a timer for the code and a better approximation of when two sets of variables are always separate became apparent. The timer which resulted is discussed in Section 6.2, and the refinement of the static approximation for always separate is presented in Section 6.3. Once the basic transformations were implemented, it was necessary to write programs to combine them in an attempt to optimize code. A discussion of these heuristic programs is in Section 6.4. Finally, a large example of HOPS in action, optimizing a program to do image histograms, is presented in Section 6.5.

6.1 The System and Its Data Structures

HOPS is an interactive LISP system, implemented in XLISP running under UNIX, on a Sun 3/280 and a Gould Powernode 9080. It has functions that perform each of
the primitive transformations as described in Chapter 4 and each of the optimizing transformations as described in Chapter 5. It also provides functions to aid a human optimizer, including statement and expression construction, extraction of parts of statements and expressions, statement and expression validity checks, peephole optimizations, and, probably most importantly, statement timing. The timer is discussed in more detail in Section 6.2, and the other functions are described later in this section.

Simple variables and constants in this language are LISP atoms; indexed variables, expressions, and statements are lists. Variables must be declared in HOPS programs, unlike programs in the language described in Chapter 3. This can be done with the function MakeVariable. Statements and expressions are stored in prefix form. The format for storing each of these is given in Table 6.1.

In order to simplify some testing, abstract statements and expressions are allowed. Thus, if the user is interested in examining if statement transformations, the user need not enter complete statements for each clause, but may instead enter abstract statements for these unimportant clauses. For example, a user interested in loop-conditional joining may not want to consider the statement in the then clause of the conditional being joined. In this case, an abstract statement could be used. Similarly, someone trying to explore if statement simplification as described in Theorem 4.2.5 could use an abstract boolean expression rather than some real expression. Abstract values are assumed to set and use no variables. These abstract values are atoms and must be declared by the user using the functions MakeAbstractStatement, MakeAbstractBoolean, and MakeAbstractExpression.

The functions in HOPS each take a single statement as a parameter and return the statement after execution of the transformation, if the transformation is valid.
Table 6.1. HOPS Equivalents for Language Constructs

Language Construct                 HOPS Equivalent
Integer Variables
  x                                x
  a[s]                             (a s)
Integer Expressions
  m                                m
  s1 op s2                         (op s1 s2)    legal ops are +, -, *, /
  if b then s1 else s2 fi          (if b s1 s2)
Boolean Expressions
  true                             true
  false                            false
  s1 op s2                         (op s1 s2)    legal ops are <, >, <=, >=, =, <>
  not b                            (not b)
  s in s1...s2                     (in s (s1 s2))
Statements
  v := s                           (:= v s)
  S1; S2                           (S1 S2)
  if b then S1 else S2 fi          (if b S1 S2)
  D                                ()
  for x := s1 to s2 do S od        (for x s1 s2 S)
There are a number of functions allowing the user to extract single statements from structured statements, making it easier to send the single statement expected here as a parameter. These are relatively straightforward and are omitted from the present discussion. The important transforming functions include the following.

Interchange. This function will interchange the first two statements in a compound statement.

InterchangeWithSubstitution. This function interchanges the first two statements of a compound list. The first statement must be an assignment statement, and the value assigned to the left-hand side is substituted for the left-hand side in the second statement.

InterchangeWithBackSubstitution. This function performs interchange with backward substitution as described in Theorem 4.1.5.

LoopUnroll. This function unrolls a statement from the back of a for statement. Currently, this is only done if the loop bounds are constants; there is no attempt made to evaluate variable loop bounds.

AbsorbIn. This function will attempt to absorb statements into an if statement from either before or after the if statement. If only absorption of statements following the if statement is desired, AbsorbIn1 can be used, while AbsorbIn2 only attempts to absorb statements from before the if statement.

ExtractOut. This function removes statements from both branches of an if statement to either before or after the statement. ExtractOut1 and ExtractOut2 will move statements to after or before the if statement, respectively.

AbsorbWithSubstitution. This function absorbs assignment statements before if statements into the if statement, substituting the right-hand side for the left-hand side in the if statement condition as necessary.
IfSplit. This function divides an if then else statement into two if then statements.

LoopInterchange. This function will interchange the boundaries of nested loops.

LoopJoin. This function will join a pair of for loops with the same range.

LoopSplit. This function will split a single for loop into a pair of for loops with the same range.

MoveCode. This function will remove loop invariant statements to before for loops.

LCJoin. This function will convert a loop statement with a nested conditional statement into a loop statement with altered bounds.

StatementSimplify. This function performs a variety of peephole optimizations.

Each of the functions of HOPS is nondestructive. This enables a user to assign the value of a statement to a variable and then to try a variety of transformations on the variable, being sure of always starting with the same statement. A typical call sequence might then be:

(setq testprog '(for x 1 10
  ( (:= y 1)
    (:= x (+ y x)) ) ))            ; Give the test program a value.
(time testprog)                    ; Compute the original time it takes.
(setq try1 (MoveCode testprog))    ; Attempt code motion.
(time try1)                        ; Compute the time the transformed program takes.
(setq try2 (LCJoin testprog))      ; Attempt loop-conditional joining.
(time try2)                        ; Compute the time the transformed program takes.
Notice that the user tried to perform loop-conditional joining when it was not possible (since there is no conditional). HOPS recognizes this and will not alter the statement, so the time for the original statement and the statement after the attempted loop-conditional joining will be the same.

6.2 A Parameterized Timer

The functions given in the previous section provide the user with the ability to attempt a variety of transformations, but there is nothing in HOPS which will tell the user when one transformation provides improved code. Instead, HOPS provides a pair of timers, one which returns a symbolic time and the other which simply returns a count of time units, which the user can then use to determine the benefit (or harm) of the transformation. Since many systems differ in the amount of time it may take to perform operations, these timers are based on a set of constants, any of which may be changed by the user to better represent the relative speeds of the user's actual system. An optimizing system based on HOPS could then use the symbolic time to determine whether or not to apply a particular transformation. Additionally, there are two functions to return the time of boolean and integer operators, so that different operators may be assigned different times (as is so often the case in actual systems). When the bounds of a loop can be statically evaluated, the actual number of iterations in the loop will be computed, but if the loop bounds are expressions, another default will give the number of times through the loop. Figure 6.1 shows the symbolic times of both the original and altered statements in the example at the end of the previous section.

6.3 A New Approximation of Always Separate

Using ivar and livar as the static approximation of sets and uses in Section 3.9 proved inadequate in HOPS.
Original statement:

(for x 1 10
  ( (:= y 1)
    (:= x (+ y x)) ) )

Its symbolic time:

(+ ForStatementTime
   (* (+ 1 (- 10 1))
      (+ CompoundStatementTime
         (+ AssignmentStatementTime ConstantTime)
         (+ AssignmentStatementTime
            (+ BinaryFunctionTime VariableTime VariableTime)))))

Statement transformed by MoveCode:

( (:= y 1)
  (for x 1 10
    ( (:= x (+ y x)) ) ) )

Its symbolic time:

(+ CompoundStatementTime
   (+ AssignmentStatementTime ConstantTime)
   (+ ForStatementTime
      (* (+ 1 (- 10 1))
         (+ AssignmentStatementTime
            (+ BinaryFunctionTime VariableTime VariableTime)))))

Figure 6.1. Some Symbolic Statement Times
While the approximation was fine for simple variables, it was far too broad to say that the entire array was set when in fact only one element of it may have been set. Too many image processing functions work on arrays, usually going through them in some specific order each time. Because of this, a new representation of intermediate variables was used in HOPS. Simple variables are still stored as simple variables in the intermediate form. Array variables are now stored as both the array and some index information. This array index information is stored in one of five formats, depending on how the array reference appears in the program. These index types are described below.

Constants. If the array is indexed by a constant or constant expression (and thus the reference is of the form a[m]), the index will be stored as the constant m.

Variables. If the array is indexed by a variable (and thus the reference is of the form a[x]), the index will be stored as the variable x.

Expressions. If the array is indexed by a variable expression (and thus the reference is of the form a[s]), the index will be stored as the expression s (where expressions are stored as discussed in Section 6.1).

Constant subranges. If the array reference is in a for loop with constant bounds and is indexed by the loop control variable, the index will be stored as a subrange of the constants in the form (lower-bound upper-bound).

Variable subranges. If the array reference is in a for loop with nonconstant bounds or is indexed by a function of the loop control variable, the index will be stored as a subrange of the expressions in the form (lower-bound-expression upper-bound-expression).

As before, any two simple variables are always separate if they have different names, and any simple variable is always separate from any array variable. When two intermediate variables are array references referring to the same array, it is necessary to check the index information. In some cases, it may be possible to determine from
the indices that the array references are always separate. Table 6.2 tells in which cases, based on the index values, array references may be always separate.

Table 6.2. When Two Index Types May Be Always Separate

                      Type of second index
Type of first index   Constant  Variable  Expression  Constant Subrange  Variable Subrange
Constant              Maybe     Never     Never       Maybe              Never
Variable              Never     Never     Maybe       Never              Never
Expression            Never     Maybe     Maybe       Never              Never
Constant Subrange     Maybe     Never     Never       Maybe              Never
Variable Subrange     Never     Never     Never       Never              Maybe

In the cases in Table 6.2 marked Maybe, HOPS will check the values of the two indices to determine if they can positively be declared to be always separate. If, for example, the first index were a constant and the second were a constant subrange, HOPS would only have to check whether the constant fell into the subrange. If it did not, the two array references could safely be declared always separate. The spaces marked Never do not mean that there is no way for the references to be always separate, only that HOPS cannot determine statically that they are. Certainly, two arrays referenced by variables may be references to different locations, but with no information about the values of the indexing expressions, HOPS cannot conclude this.

6.4 Some Heuristic Programs Using the System

A user could certainly work with these transformations to transform code in their raw form, but they are at the level of assembly language programming. In order to assist the user, some functions have been added to attempt larger scale transformations. Thus HOPS contains some heuristics for code optimizations. These functions are described in this section.
Probably the simplest of these heuristics is DoAllProp, which will perform forward copy propagation wherever possible. Forward copy propagation is done by interchanging with substitution all statements of the form x := v or x := m, moving them as far through the code as possible. It may also involve absorption into if statements. The function PropagateAndSimplify combines this with StatementSimplify to perform both peephole simplifications and forward copy statement propagation concurrently. Backward copy propagation, possible as a result of Interchange with Backward Substitution (Lemma 4.1.5), can be done with the function BackPropagate. Only assignment statements with an array reference on the left-hand side will be propagated backward.

Statement compression helps RemoveDeadVars with the removal of dead statements, those whose results are no longer needed by the rest of the program. Statement compression alone does not provide dead variable elimination. In order to determine which variables are live at any point in a program, there must be some notion of the output variables of that program. All previous results have considered states to be equal if they agree in the values assigned to all variables, not just a group of output variables. It is conceivable to discuss equality of states restricted to a set of variables, but doing so is outside the scope of this research.

There are also functions to optimize the different structures, such as CompoundOptimize and ForOptimize. It is here that various ways of combining the primitive and global transformations can be explored. I conjecture that the problem of determining exactly which set of transformations will best improve the running time of a given piece of code for a given set of timer parameters is undecidable. However, work has been done to devise some general rules for applying the traditional global data flow transformations [6]. This work has been adapted for use by HOPS.
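As an illustration of what forward copy propagation accomplishes, here is a minimal Python sketch over a toy statement list. It is not HOPS code; the representation of statements as (target, operand-list) pairs and the function name are assumptions made for the example:

```python
# Illustrative sketch of forward copy propagation: a copy statement
# x := v lets later uses of x be rewritten to use v directly, provided
# neither x nor v has been reassigned in between.

def propagate_copies(stmts):
    """stmts: list of (target, operands) pairs, where operands is a
    list of variable names; a one-operand statement is a copy.
    Returns an equivalent list with active copies substituted."""
    env = {}              # active copies: target -> source
    out = []
    for target, operands in stmts:
        # Substitute any active copy into this statement's operands.
        new_ops = [env.get(op, op) for op in operands]
        # Kill copies invalidated by this assignment (either side).
        env = {t: s for t, s in env.items()
               if t != target and s != target}
        if len(new_ops) == 1:         # this statement is itself a copy
            env[target] = new_ops[0]
        out.append((target, new_ops))
    return out
```

For example, after x := v, a later y := x + z becomes y := v + z; if v is reassigned first, the copy is killed and no substitution happens.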
(def ForOptimize (lambda (Stat LiveVars)
  (prog (InnerStat NewStat SecondStat)
    ; Simplify the entire statement and eliminate it if possible
    (setq Stat (StatementSimplify Stat))
    (cond ((not (IsFor Stat)) (return Stat)))
    ; Then, do backward propagation (since CompoundOptimize won't
    ; know the compound is nested in an IF)
    (setq InnerStat (GetForStat Stat))
    (setq InnerStat (BackPropagate InnerStat (GetLCV Stat)))
    ; Now optimize the inner statements
    (setq InnerStat (CompoundOptimize InnerStat LiveVars))
    ; Restructure the statement
    (setq NewStat (MakeFor (GetLCV Stat)
                           (GetLowBound Stat)
                           (GetHighBound Stat)
                           InnerStat))
    ; Attempt code motion from the loop
    (setq NewStat (ForceMoveCode NewStat))
    ; Attempt loop-conditional joining
    (setq NewStat (ForceLCJoin NewStat))
    (return (StatementSimplify NewStat)))))

Figure 6.2. A HOPS Program to Optimize the Histogram Program
A version of ForOptimize is in Figure 6.2. It begins by simplifying the entire statement with peephole optimizations. If this results in a statement which is not a loop (either because the statement in the for loop is nullable, or because the loop boundaries are computable and the lower bound is greater than or equal to the upper bound), the optimization of the for loop stops there. This initial step is time consuming and may not be needed for some for statements, but it proved necessary in the simplification of the histogram program in the next section.

Next, before doing global transformations on the nested statement, backward copy propagation is done. Only statements with array references indexed by the loop control variable are propagated backward. Then CompoundOptimize is employed to perform global transformations, such as elimination of dead variables and statement compression, on the nested statement. Once the nested statement is improved, code motion and loop-conditional joining are attempted. Finally, peephole simplifications are repeated. This entire plan of attack may constitute overkill for some for statements, but it usually provides improved code whenever improvements are possible. It is still up to the user to compare the times of the code with and without the improvements to determine whether they were actually beneficial.

There are also a variety of functions provided to extract statements of interest and greater potential for optimization from a compound statement, along with the simplifying procedures available with HOPS. The first of these is ExtractForStat, which will find the first for statement in a compound statement. For statements are of special interest because the amount of looping in image processing is so great. SplitCompoundAroundIf will extract an if statement which uses a particular variable (passed as a parameter) in its condition. This is particularly useful in loop-conditional joining, and there is a function, ForceLCJoin, which uses this to attempt a variety of changes in the code to ultimately perform loop-conditional joining.
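The effect of loop-conditional joining can be seen in miniature in the following Python sketch (hypothetical, not generated by HOPS): when the only work in a loop body is guarded by an equality test against the loop control variable, the test can be folded into the loop bounds, so the guard is never evaluated at run time:

```python
# Illustrative before/after for loop-conditional joining.  The function
# names and the callback-style "body" parameter are invented here.

def guarded_loop(lo, hi, v, body):
    """Before joining: the guard is tested on every iteration."""
    for i in range(lo, hi + 1):
        if i == v:
            body(i)

def joined_loop(lo, hi, v, body):
    """After joining: the bounds are narrowed so the guard always
    holds; the loop runs at most once here."""
    for i in range(max(lo, v), min(hi, v) + 1):
        body(i)
```

Both versions invoke the body for exactly the same values of i: once if lo <= v <= hi, and never otherwise. This is precisely the shape of the transformation HOPS applies to the histogram program in the next section.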
While none of these heuristics is altogether complicated (and they all leave it to the user to determine whether a transformation is indeed beneficial), as a group they show how extensible the basic functions of HOPS are and indicate some of its potential to become a truly intelligent code optimizer.

6.5 A Large Example: The Histogram

As an example of the abilities of HOPS, even in its present form, consider a program to compute the histogram of an image. Determining the histogram of gray levels in an image is important for a variety of techniques, such as enhancement by histogram equalization over a wider range of gray levels and image segmentation [13]. To determine the histogram h of an image a over the gray levels min-gray-level to max-gray-level in the image algebra, the following expression is used:

    for i in min-gray-level to max-gray-level do
        h_i ← Σ(χ_i(a))
    end for

    (for i LowGray HighGray
      ( (:= s 0)
        (for j LowPixel HighPixel
          (if (= i (a j))
              (:= s (+ s 1))
              (:= s (+ s 0)) ) )
        (:= (h i) s) ) )

    Figure 6.3. A Straightforward Implementation of the Histogram
A straightforward implementation of this algorithm is shown in Figure 6.3. As stated, it is extremely inefficient. It requires extra space to hold each of the new characteristic function images. It also requires looping over the size of the original image 2n times (where n is the number of gray levels): once per gray level to compute the characteristic function and once to sum it. While standard optimizations may improve this code, they cannot reduce the amount of looping.

The HOPS program ForOptimize, discussed in the previous section and presented in Figure 6.2, can be used to optimize this histogram program. The resulting code is in Figure 6.4. (Since LowGray, HighGray, LowPixel, and HighPixel are all constants, they were replaced by their corresponding values during the simplification.)

    (for i 0 255
      (:= (h i) 0) )
    (for j 0 4095
      (for i (max 0 (a j)) (min 255 (a j))
        (:= (h i) (+ (h i) 1)) ) )

    Figure 6.4. The Resulting Histogram Program

HOPS has demonstrated the potential to be an important assistant to human code optimizers. It is still fairly primitive and relies strongly on the user, both to direct its search and to determine when a transformation is beneficial. It has not yet been incorporated into any of the image algebra programming languages, and this would seem to be a logical next step. Still, its success at this level indicates the potential of approaching global optimization as a collection of primitive transformations.
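To see that the two programs compute the same histogram, the following Python rendering (an illustration, not HOPS output; the function names are invented) expresses Figures 6.3 and 6.4 directly:

```python
# Python renderings of the two histogram programs.  The naive version
# scans the whole image once per gray level; the optimized version
# makes a single pass, with an inner loop whose joined bounds
# (max low (a j)) .. (min 255 (a j)) admit at most one iteration.

def histogram_naive(a, low_gray=0, high_gray=255):
    """Figure 6.3: one full scan of the image per gray level."""
    h = [0] * (high_gray - low_gray + 1)
    for i in range(low_gray, high_gray + 1):
        s = 0
        for pixel in a:
            if pixel == i:
                s += 1
        h[i - low_gray] = s
    return h

def histogram_optimized(a, low_gray=0, high_gray=255):
    """Figure 6.4: a single pass over the image."""
    h = [0] * (high_gray - low_gray + 1)
    for pixel in a:
        for i in range(max(low_gray, pixel), min(high_gray, pixel) + 1):
            h[i - low_gray] += 1
    return h
```

On an image of p pixels and n gray levels the naive version does Θ(np) work while the optimized version does Θ(n + p), which is the improvement the loop-conditional joining in ForOptimize delivers.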
CHAPTER 7
CONCLUSIONS

This dissertation presents a new approach to global optimizations. Rather than collect a variety of information about each statement and perform large transformations when the conditions are correct, I collect a small amount of information (only the sets and uses for the statement) and perform small transformations, which can then be combined to form large optimizations. By performing optimization in smaller pieces, it is easier to show that each of the pieces is correct.

I have described a small language and have given its denotational definition. With this definition, I have proven that each of the primitive transformations preserves the meaning of the statement. These primitive transformations can be combined to give global optimizations. These optimizations include some of the traditional global data flow optimizations, such as code motion and copy propagation, along with some previously unexploited optimizations such as loop-conditional joining and backward copy propagation.

The Heuristic Optimizing Prototype System implements all of these primitive transformations and global optimizations. It allows a user to experiment with a variety of optimizing strategies. It also has functions developed to assist the user. These functions will attempt to rearrange the program so that some of the more beneficial optimizations (such as code motion and loop-conditional joining) can occur. The HOPS timer is configurable, enabling the user to adjust timer parameters to best represent the system being optimized.
There are three areas in which this dissertation would most logically be extended. The first is in the design of the language used for the proofs. As it stands, the language is not Turing equivalent. While Turing equivalence would be desirable, it is not necessary for image algebra programs, and it would have greatly complicated the proofs presented here. The language could be given the power of a Turing machine by adding either loops which are not bounded at the time of entrance to the loop or subprograms. If unbounded loops were introduced, the possibility of errors could no longer be pushed aside, because infinite loops would be a real possibility. Subprograms would need some sort of parameter-passing mechanism defined, and side effects of subprograms might increase the amount of aliasing in the language as well. These extensions, although powerful, would extend this project beyond what is necessary for image processing and would greatly increase the complexity of the proofs given.

The second area where this project could be extended is in the heuristics, provided with the HOPS system, for applying the transformations. Determining when transformations that seem to degrade a piece of code might actually leave it in a position to be improved greatly is a fascinating problem, albeit one outside the scope of this dissertation. On the surface this appears to be a classical application for expert systems. It would be interesting to improve the HOPS system itself so that it could actually be used for working code optimization or, alternatively, as a teaching tool for students. This would involve some work on the interface, additional heuristic programs, and improvements to the timer, such as recognizing when loops are vectorizable and permitting different guesses for the default number of loop iterations for different loops.

Finally, optimization in architectures other than traditional von Neumann architectures is a rapidly growing field.
There are quite a few special architectures available for image processing [11]. Many of the optimizations applied here are also
applicable to intermediate code for multiprocessor machines such as the Connection Machine [15]. Copy propagation, loop interchange, loop joining, and loop splitting are all optimization techniques applicable to vector or concurrent computers discussed by Padua and Wolfe [21].
REFERENCES

[1] Alfred Aho, Ravi Sethi, and Jeffrey Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley, Reading, MA, second edition, 1986.

[2] Frances E. Allen. Control flow analysis. ACM SIGPLAN Notices, 5(7):1-19, July 1970.

[3] Frances E. Allen and John Cocke. A catalogue of optimizing transformations. In Randall Rustin, editor, Design and Optimization of Compilers, pages 1-30. Prentice-Hall, Englewood Cliffs, NJ, 1972.

[4] Dana H. Ballard and Christopher M. Brown. Computer Vision. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[5] William A. Barrett, Rodney M. Bates, David A. Gustafson, and John D. Couch. Compiler Construction: Theory and Practice. Science Research Associates, Chicago, second edition, 1986.

[6] F. Chow. A Portable Machine-Independent Global Optimizer. PhD thesis, Stanford University, Computer Science Laboratory, Stanford, CA, 1983.

[7] John Cocke. Global common subexpression elimination. ACM SIGPLAN Notices, 5(7):20-25, July 1970.

[8] Patrick Cousot. Semantic foundations of program analysis. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 10, pages 303-342. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[9] Jaco de Bakker. Mathematical Theory of Program Correctness. Prentice-Hall, Englewood Cliffs, NJ, 1980.

[10] Veronique Donzeau-Gouge. Denotational definition of properties of program computations. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 11, pages 343-379. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[11] M.J.B. Duff and S. Levialdi, editors. Languages and Architectures for Image Processing. Academic Press, New York, 1981.

[12] Carlo Ghezzi and Mehdi Jazayeri. Programming Language Concepts. John Wiley and Sons, Inc., New York, second edition, 1987.

[13] Rafael C. Gonzalez and Paul Wintz. Digital Image Processing. Addison-Wesley, Reading, MA, second edition, May 1987.
[14] Frederick Hayes-Roth, Donald A. Waterman, and Douglas B. Lenat, editors. Building Expert Systems. Addison-Wesley, Reading, MA, 1983.

[15] W. Daniel Hillis. The Connection Machine. The MIT Press, Cambridge, MA, 1985.

[16] Ken Kennedy. A survey of data flow analysis techniques. In S.S. Muchnick and N.D. Jones, editors, Program Flow Analysis: Theory and Applications, chapter 1, pages 5-54. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[17] Edward S. Lowry and C. W. Medlock. Object code optimization. Communications of the ACM, 12(1):13-22, January 1969.

[18] W.A. Martin and R.J. Fateman. The MACSYMA system. In Proceedings of the Second Symposium on Symbolic and Algebraic Manipulation, pages 59-75, Los Angeles, 1971.

[19] S.S. Muchnick and N.D. Jones, editors. Program Flow Analysis: Theory and Applications. Prentice-Hall, Englewood Cliffs, NJ, 1981.

[20] P. Nye. S-1 U-Code: an intermediate language for Fortran and Pascal. Project document pail-8, Computer System Lab, Stanford University, Stanford, CA, October 1981.

[21] David A. Padua and Michael J. Wolfe. Advanced compiler optimizations for supercomputers. Communications of the ACM, 29(12):1184-1201, December 1986.

[22] Kurt Perry. IAC: Image Algebra C. Master's thesis, University of Florida, Gainesville, FL, August 1987.

[23] G.X. Ritter and J.N. Wilson. The Image Algebra in a nutshell. In Proceedings of the First International Conference on Computer Vision, pages 641-645, London, England, June 1987.

[24] G.X. Ritter, J.N. Wilson, and J.L. Davidson. Image Algebra: An overview. Technical Report TR-88-05, Center for Computer Vision Research, University of Florida, Gainesville, FL, 1988.

[25] B.K. Rosen. High level data flow analysis. Communications of the ACM, 20(10):712-724, October 1977.

[26] Barbara G. Ryder and Marvin C. Paull. Elimination algorithms for data flow analysis. ACM Computing Surveys, 18(3):277-316, September 1986.

[27] Paul B. Schneck. Movement of implicit parallel and vector expressions out of program loops. ACM SIGPLAN Notices, 10(3):103-106, March 1975.

[28] M. Shaefer. A Mathematical Theory of Global Program Optimization. Prentice-Hall, Englewood Cliffs, NJ, 1973.

[29] T.A. Standish, D.C. Harriman, D.F. Kibler, and J.M. Neighbors. The Irvine program transformation catalog. Technical Report 161, University of California at Irvine, Department of Information and Computer Science, January 1976.
[30] M.V. Zelkowitz and W.G. Bail. Optimization of structured programs. Software: Practice and Experience, 4(1):51-57, January 1974.

[31] Mary E. Zosel. A modest proposal for vector extensions to ALGOL. ACM SIGPLAN Notices, 10(3):62-71, March 1975.
BIOGRAPHICAL SKETCH

Ms. White received the degree of BA in economics from the University of Virginia in 1979 and MS in computer and information sciences from the University of Florida in 1985. She served in the Transportation Corps of the United States Army for three years and was stationed at Camp Casey, Korea, and several locations in the United States. While at the University of Florida, Ms. White received several awards for excellence in teaching. After graduation she plans to work for Armstrong State College in Savannah, Georgia, as an assistant professor in the Department of Mathematics and Computer Science. She is married to Charles Engelke.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

    Gerhard X. Ritter, Chairman
    Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

    Joseph N. Wilson, Cochairman
    Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

    Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

    Manuel Bermudez
    Assistant Professor of Computer and Information Sciences
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.