AUTOMATIC DIFFERENTIATION FOR FIRST AND SECOND DERIVATIVE FOR SOLVING NONLINEAR PROGRAMMING PROBLEMS

By

YING LIN

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2011

© 2011 Ying Lin

To my parents

ACKNOWLEDGMENTS

I would like to express my deep appreciation to my advisor, Dr. Anil V. Rao, for his constant encouragement and support throughout this project. Especially, I thank him for the technical guidance that made the accomplishment of my work possible. His clear thinking, understanding and enthusiasm towards science all influence me to work harder. I would also like to thank my committee member, Dr. Warren Dixon, for his patience and the time he spent on my thesis; his hard work also inspires me to work harder and dig deeper in my thesis work. I thank all my friends who helped me a great deal, whether in my thesis work or through their mental and consistent support throughout the work. I am especially indebted to Michael Patterson, who spent a lot of time providing a better understanding of my work. Finally, I would like to express my deep appreciation to my family, in particular my parents; their care, love and motivation made me hold on until the last minute.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
   Motivation
   Literature Review
   Overview of the Thesis
   Thesis Outline

2 BACKGROUND
   Differentiation
   Jacobian Matrix
   Hessian Matrix
   Ways of Implementing AD
   Different Modes
      Forward Mode
      Reverse Mode
   Chain Rule

3 ALGORITHM
   Class, Variable, Member Functions, Properties
   Inner Derivative
   Outer Derivative
   Class Constructor Function
   Unary Function
   Constant Variable Function
   Binary Function
      Addition and Subtraction
      Element-Wise Operation
      Matrix Operation
      Logic Functions
   Constant Functions
   Shaping Function
   Array Referencing and Assignment Functions
   Composite Function

4 EXAMPLES OF OPTIMAL CONTROL PROBLEM
   Optimization
      Optimization Problem
      Nonlinear Programming
   Example 1
      Problem Statement
      Result
   Example 2
      Problem Statement
      Result
   Example 3
      Problem Statement
      Result
   Example 4
      Problem Statement
      Result

5 CONCLUSION AND FUTURE WORK

LIST OF REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 The first and second patterns of different unary operations

LIST OF FIGURES

2-1 The curve showing the changing of the slope
2-2 The computation procedure of the forward mode
3-1 Relationship between different AD components
3-2 Visualized second derivative of a matrix
3-3 Results of the sparse function
3-4 [caption not recovered]
3-5 All the input cases for the plus function
3-6 [caption not recovered]
3-7 [caption not recovered]
3-8 Different cases when x and y are both objects
3-9 [caption not recovered]
3-10 Results of the matrix operation
3-11 Different input possibilities for the constant function
4-1 Result of the arrowhead function
4-2 Result of example two
4-3 Comparison of three different derivative methods
4-4 Result of example three
4-5 Comparison of three different derivative modes
4-6 Result of the example with the hessian and gradient given by the AD tool
4-7 Result with the hessian and gradient values generated by the quasi-Newton approximation
4-8 Comparison of methods using AD and using finite differencing

ABSTRACT

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

AUTOMATIC DIFFERENTIATION FOR FIRST AND SECOND DERIVATIVE FOR SOLVING NONLINEAR PROGRAMMING PROBLEMS

By

Ying Lin

August 2011

Chair: Anil V. Rao
Major: Mechanical Engineering

In this work a new algorithm is proposed for automatic differentiation. The new algorithm uses object-oriented MATLAB to generate multi-dimensional derivatives, combining the forward mode with operator overloading to realize the first and second derivatives of a function. The automatic differentiation developed in this thesis can be applied to more complex functions in a more efficient and precise manner than traditional differentiation. In order to illustrate the features of this tool, an example of the arrowhead function is presented. The second example is a simple optimization problem with only two states. The last two nonlinear programming problems are more complex and sensitive, with more than three states. By providing the gradient values for the objective and constraint functions and the hessian value for the Lagrangian equation, those examples are solved more effectively and precisely.
Compared to traditional differentiation, which is tedious and error prone, automatic differentiation saves time in deriving optimization results with machine precision.

CHAPTER 1
INTRODUCTION

Motivation

Getting the derivatives of a function plays an important role in different areas that need numerical computation. There exist different ways of computing derivatives, such as hand coding, finite differencing, symbolic differentiation, and automatic differentiation, which is a relatively new technology for generating derivative values.

(1) Hand coding: Hand coding means computing derivatives by hand. This procedure is tedious, error prone and time consuming.

(2) Finite differencing: One of the numerical methods for calculating a derivative is finite differencing, which has the general form

f'(x) ≈ (f(x + h) − f(x)) / h

Once the difference is divided by the step size h, we get the difference quotient that approximates the derivative; hence numerical calculation of the derivative is preferred over hand differentiation. The Taylor series is an extension of finite differencing: a function that is infinitely differentiable in a neighborhood of a point a can be expressed using the Taylor series expansion

f(x) = f(a) + f'(a)(x − a) + f''(a)(x − a)^2/2! + ...

By using the higher order terms to approximate a function f(x), we can improve the precision. However, even small errors in the higher order derivative values can cause large round-off errors in the entire approximation result.

(3) Symbolic differentiation: Symbolic differentiation is the process of taking the derivative of a function represented as a string with respect to its variables; the first derivative of a closed-form function is returned as another closed-form expression. This approach is not efficient. Especially when the problem is very complicated and has a large number of variables, the expressions of the function derivatives become very long.
(4) Automatic differentiation: Automatic differentiation is a tool that relies on a computer program to execute the elementary operations, such as binary functions, applying the chain rule to the composition of those elementary operations over and over again and yielding the final derivative value. Automatic differentiation has become an important tool for quick computation and error reduction in computing derivative values.

Automatic differentiation finds a great deal of application in solving optimal control problems. Usually, an optimal control problem can be converted into a nonlinear programming problem (NLP). The NLP requires the gradient of the functions in order to find the minimum or maximum point, which gives the point of optimality. If the user does not provide the gradient, the solver generates the gradient using a finite difference approximation, which has poor precision. Since most of the functions in practical optimization problems have complex derivatives, it is very difficult for users to provide the gradient of each function. Hence, the automatic differentiation tool plays an important role in providing efficient gradient values. Apart from optimization problems, automatic differentiation can also be used in sensitivity analysis, which is widely used in mathematics, physics, engineering and other areas.

Literature Review

In 1982, Philip Wolfe developed a procedure in his paper [1] that tests a program used for computing the derivatives of a given function and locates the errors in the program. He also mentioned that calculating the gradient is 1.5 times as expensive as calculating the function. Later, Andreas Griewank stated in his paper [2] that the upper bound on the ratio of the expense of calculating the gradient to that of calculating the function should be 5. He also introduced implementations of automatic differentiation. The automatic differentiation tool ADIFOR was developed in the paper [3] by Christian Bischof.
ADIFOR stands for automatic differentiation in FORTRAN, and it is a source transformation tool written in Fortran 77. This tool was capable of use in real-time experiments; it was more accurate than the divided difference approximation and also reduced the time used for computing derivatives. An implementation of ADIFOR is discussed in [4]. That paper uses the automatic differentiation tool ADIFOR for solving systems of nonlinear equations and for parameter identification problems. Magana Jimenez [5] used ADIFOR 2.0 to provide efficient Lie derivatives and Jacobians to calculate the control law and evaluate the stability of the control system. Meanwhile, the author also developed a user-friendly package called MAFC, which links MATLAB 5.3, ADIFOR 2.0 and FORTRAN. This package is useful for analyzing closed-loop control systems. The paper [6] applied automatic differentiation to solving nonlinear programming problems. It also presents a heuristic approach to reduce the memory occupied in processing the graphs used for solving large-scale problems, and discusses a method to decide the sparsity pattern of the hessian matrix. The work [7] implemented automatic differentiation in optimal control problems: the original problems were reduced to nonlinear programming problems, and this transformation was realized by using a Runge-Kutta formulation. In 1991, Andreas Griewank introduced the automatic differentiation tool ADOL-C [8]. It is a forward mode tool, written in C++ and implemented by using operator overloading. This tool could compute not only the first order derivative but also higher order derivatives; hence it was widely used for automatic differentiation in areas requiring numerical computation. In [9], Kyung Wook Jee implemented the automatic differentiation tool ADOL-C to solve the Intensity Modulated Radiation Therapy (IMRT) problem.
These problems have multiple objectives, and they are solved using a multi-criteria approach. In order to make ADOL-C more efficient, David Juedes and Andreas Griewank [10] implemented the reverse mode evaluation in parallel on each node along with the forward mode evaluation. The paper by Andreas Kowarz [11] develops a technique that improves the application of operator-overloading-based automatic differentiation and allows higher serial performance of ADOL-C. This paper also developed an activity tracking technique that allows a compact internal function representation and simplifies automatic differentiation. There are other automatic differentiation tools, such as ADMAT. In 1998, Arun Verma introduced an automatic differentiation tool that can compute derivatives accurately and quickly [12]. This tool used object-oriented MATLAB and could compute derivatives of up to second order. One year later, he and Thomas F. Coleman developed ADMIT-1 [13]. This new tool made the computation of sparse Jacobian and hessian matrices possible. Also, ADMAT can be used as a plug-in tool with ADMIT-1 to compute the derivative matrices for nonlinear optimization. In 2006, Christian H. Bischof developed the source transformation tool ADiMat for MATLAB with an embedded macro language [14]. This macro language helped in generating more efficient derivatives. Also, David M. Gay [15] developed the RAD package, which is a semiautomatic differentiation tool that computes a function and its gradient individually; the gradients are then assembled individually, using less memory than normal automatic differentiation. The paper [16] introduces an AD tool implemented by cross-country elimination, which is called OpenAD, and applies this tool to construct unambiguous computational graphs. Cross-country elimination is different from the forward and reverse modes in that it requires the computational graph of the function. The earlier tool EliAD also implements cross-country elimination, but it has a restriction on the number of inputs in its code.
OpenAD is an extension of EliAD. Phipps [17] used the automatic differentiation package ADMC++ to provide the derivatives for computing the periodic orbits in a system of hybrid differential algebraic equations. ADMC++ was developed based on the MATLAB language and can provide not only derivatives but also Taylor series coefficients. By combining an automatic differentiation tool with COSY INFINITY, Pavel Snopok [18] simulated the Tevatron accelerator and optimized the arcs to the same strength. Tadjouddine [19] studied computing Jacobians by using a vertex elimination algorithm. This algorithm reorders the statements of the Jacobian code, and not only reduces the memory but also reduces the flops.

Overview of the Thesis

This work has developed an automatic differentiation tool named Hes, which stands for hessian matrix. In particular, Hes focuses on operator overloading and the forward mode. The aim of the work is to introduce the algorithm of this automatic differentiation tool. At the same time, this work studies three nonlinear programming problems; from the results of those examples, the AD tool proves to be efficient in solving nonlinear programming problems.

Thesis Outline

The outline of this thesis is as follows. An introduction to the research motivation for automatic differentiation (AD) is given in Chapter 1. Chapter 1 also reviews previous work, such as different AD tools and their implementations, along with how they relate to optimal control. Chapter 2 introduces the basic mathematics and the chain rule used in automatic differentiation. It also gives a brief introduction to the two ways of implementing automatic differentiation, namely source transformation and operator overloading; the two different modes of AD are described there as well. Chapter 3 first presents the class name, member variables and member functions used in automatic differentiation, and also introduces the different types of functions along with their functionality.
In Chapter 4, a brief study of optimization and of how it is related to nonlinear programming is presented. This chapter also shows the formulation of the optimization, of the nonlinear programming, and of the Lagrangian that is used to solve the nonlinear programming problems. Four examples are given in Chapter 4, organized from the simplest one with the fewest variables to the most complicated one with more variables. Chapter 5 concludes the whole thesis and the results shown in the example implementations, and suggests directions for future work.

CHAPTER 2
BACKGROUND

The main purpose of this chapter is to study some basic mathematical concepts which are closely related to automatic differentiation. Meanwhile, it introduces two different modes, the forward mode and the reverse mode, and two AD implementation methods, operator overloading and source transformation. A brief description of the chain rule is also presented in this chapter.

Differentiation

In mathematics, differentiation is the process of computing the derivative of a function. For a function y = f(x), the output y is called the dependent variable and the input x is called the independent variable. The derivative measures the rate of change of the output y with respect to the independent variable x. It can be represented as follows:
The capital letter, A, B and C are the different points on the curve line. The curve has different rates of change at A, B and C. PAGE 17 17 Figure 2 1 T he curve showing the changing of the slop e From the graph above, the curve changes at different rates. At A, the rate of change of the function is zero. The function goes fro m increasing to decreasing at A, therefore A is a loca l maximum point. The a utomatic d ifferentiation Algorithm developed in this work relies on determining the maximum or minimum values of most user supplied functions. Jacobia n Matrix In calculus, Jacobian Matrix is a matrix that one vector taking partial d erivative with respect to another vector. Assuming y is a vector with m variables, x is a vector with n variables. So the Jacobean matrix for y is denoted as (2 3) PAGE 18 18 Suppose m=1, t hat is, y is a scalar. Then the Jacobian matrix is a column vector. If x is a scalar, and y is a vector, the Jacobian matrix will be a row vector. If both x and y are scalars, the result will be a scalar. Hessian Matrix In mathematics, Hessian Matrix is th e matrix of one function taking second partial derivative s with respect to the entire variables. Suppose one function f with m variable s, and is written in the form of f Actually, h essian matrix is the Jacobia n m atrix taking derivative with re spect to all the variables In this case, output is a scalar, Jacobian matrix will be a column vector. Then, h essian Matrix is a square matrix (2 4 ) Where i and j are indices indicating row and column number and D i s the derivative operation. Here the Hessian matrix will become as (2 5) Ways of I mplementing AD There are two ways of implementing automatic differentiation which transform the function from computing function into the function that generates value and derivative. Those two ways are operator overloading and source transformation. 
Operator overloading initially overloads the elementar y operations of the function in an array The derivatives of the elementary function are given in those functions. By taking the derivative of composite function with respect to the independent variable, PAGE 19 19 overall derivative is generated. It is hard to apply the reverse mode in this method. The r everse mode is explained in Chapter 3. Source transformation is a way that read s in a program, determines which statement need s to be derivative and carries out the source code for computing the derivative. It augmen ts the original source code of the function with instruction for calculating the derivatives. The source transformation can be applied by reverse mode as given in [20] Different Modes Forward Mode Forward mode propag ate s derivative s with respect to the independent variables through the chain rule. In the forward mode, it breaks the complex function into the elementary functions The program gives the derivatives of those elementary functions. By applying some operatio n rules, such as product rule, power rule, Quotient rul e, reciprocal rule and so on, forward mode propagate s the derivative of the function. For example, h(x) = f (g(x ))+ m (n(x)), G(x) n(x) is the function with the variable x, f and m is the function with v ariable g(x), n(x). Therefore, f (g(x)) is functional with the variable x. (2 6) (2 7) (2 8) So the total derivat ive of the equation should collect each element value and derivative then compile them in to the equation above. PAGE 20 20 Figure 2 2 S hows the computation procedure of forward mode The second derivative of this function is In order to get the second derivative of the function, we need to first compute the second derivative of by applying product rule and then substitute each value into the equation above respectively. 
Reverse Mode Reverse Mode follows the opposite direction as the forward mode does in computing the derivative That is to say i t generates derivative s of the underlying function PAGE 21 21 through the opposite direction of the chain rule. For example f (g(x)), the chain rule to calculat e the first derivative of this function i s In reverse mode, it will first get the value o f then serving as a seed. In order to illustrate the way reverse mode works, we use the foll owing example Suppose one composition function (2 9) (2 10) (2 11) (2 12) It uses to re present So it propagates derivative s from most outside of the function to the most inner of the function. And the final derivative of the function is taken with respect to the intermediate quantity. Chain R ule The chain rule is a mathematical formula that is used to compute the derivative of a composite function w ith more than two functions. For example, a function f maps a vector x with n dimensions int o the vector y with m dimensions T he dimension of this function is If f is a simp le function with variable x, it can be written as f (x) where x is an argument with n entries Normally, the first derivative of f(x) is But we have create d a new term that can be used as the stop target for the whole computation. AD stops evaluating the derivative of the function when it encounters the situation of tak ing the derivative s of independent variables with respect to the independent variables For PAGE 22 22 example, in the following equation, the first derivative of f(x) is the product of the outer derivative and the inner derivative (2 11) For the first element of the right side of the equation, f is a vector with length m. 
W hen the derivative is taken derivative with respect to independent variable x, the output dimension will be increased That is because if one of the elements in f takes the derivative with respect to n x, it will return a row vector with length n Stacking up the total number of the m row vectors will result in the total dimension of m b y n So th e dimension for the first element in the equation above is If f is a composite function of g(x), where g(x) is the argument of f an d x is the independent variable, and f can be written as f (g(x)). The first derivative of function f with respect to x can be written as (2 12) Suppose f maps from to the dimension for f(x ) is The dimension of is Therefore, the total dimension for the first deriva ti ve is The dimension of the first derivative equals the dimension of the function times the number of the input. Generally, dimension of output times the dimension of input. Then the cha in rule is applied to the second derivative. For the first simple function, the second derivative is literally taking the derivative of the target function twice. Based on PAGE 23 23 the first derivative, the second derivative is taking the derivative of the first de rivative with respect to the independent variables. It can be written as (2 13) As provided in the first derivative, the dimensi on of first derivative is Suppose the first d erivative is a function h (x) and t he second derivative of the original function should be the first derivative of h (x) Applying the general rule for the dimensions as is the dimension of function f times the dimensio n of the independent variables: For the composite function problem f(g(x)), the second derivative of the function can be written as (2 14) Because is zero, only the first two terms left. 
Then the dimension of the second derivative equals the dimension of the function times the dimension of the independent variable squared or In general, the product rule is applied in both first derivative and second derivative s PAGE 24 24 CHAPTER 3 ALGORITHM In this chapter, different AD algorithm co mponents, such as binary functions, unary functions, operation functions and shaping functions are described Consider again the function f (x): the total dimension of this function is When evaluating this function, automatic dif ferentiation will overload the different AD algorithm components Class, Variable, Member F unctions, Properties Class : In this work, the class name is Hes, which stands for hessian. Variable : 1) Input variable: In this work, the input variable could eithe r be a double precision variable or an object of class. 2) Output variable : The o utput variable c an only be an object of the class Hes. Member functions : Class Constructor function, Composite function, Unary function, Binary function, Constant Variable fun ction, Constant function, Reshaping function, Array Reference& Assignment function. F igure 3 1 shows how these member functions work together with each other and propagate the overall output values Suppose t he input is a complex function, which contains b oth unary function s and binary functions. In the first step user has supplied one input variabl e, and the input variable can be any type of variable. 
The constructor function will first evaluate the input variable, create one class object, and return the object's properties as well. The program calls the relevant unary and binary functions to provide the derivative. Meanwhile, the shaping function and the array reference and assignment function assist the whole computation process when necessary. At the end of the computation, the composite function collects all the results from the unary functions, binary functions and constructor function, and returns the final derivative value of the user-supplied equation.
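The interaction sketched above can be made concrete with a minimal forward-mode value class. The thesis implements this in object-oriented MATLAB as the Hes class, which also carries second-derivative information; the Python sketch below is a simplified, illustrative analogue (all names are invented for this example) that overloads only addition and multiplication and propagates a value together with a directional first derivative.

```python
# A minimal forward-mode AD value class, illustrating the propagation
# scheme described above. The thesis's Hes class is object-oriented MATLAB
# and also tracks second derivatives (s_derivative, s_nz, ...); this sketch
# tracks only a value and a first derivative.
class Dual:
    def __init__(self, value, derivative=0.0):
        self.value = value            # function value at the point
        self.derivative = derivative  # directional first derivative

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # sum rule: (u + v)' = u' + v'
        return Dual(self.value + other.value,
                    self.derivative + other.derivative)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # product rule: (u * v)' = u' * v + u * v'
        return Dual(self.value * other.value,
                    self.derivative * other.value + self.value * other.derivative)
    __rmul__ = __mul__

# Seed x with derivative 1 (the inner derivative dx/dx), then evaluate
# f(x) = x*x + 3*x; overloading propagates f'(x) = 2x + 3 automatically.
x = Dual(2.0, 1.0)
y = x * x + 3 * x
print(y.value, y.derivative)  # 10.0 7.0
```

Note that the constant 3 is promoted to an object with zero derivative, mirroring how the constructor function wraps a plain double-precision input.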
For the second derivative, the dimension is n x n x n; the second derivative is a three-dimensional array, which can be visualized as a cube, as shown in Figure 3-2, with the length of each side equal to one of the dimensions of the array. In the figure, suppose O is the starting point, and I, J, and K represent the three dimensions of the derivative. Only when i = j = k, that is, along the main diagonal of the cube, will the value of the second derivative be nonzero.

Figure 3-2. Visualized second derivative of a matrix

In automatic differentiation, all the first and second derivatives are stored sparsely. The sparse function is used to create a sparse matrix for derivatives or to transform an existing matrix into a sparse matrix. There are three different ways to use the sparse function:

1) sparse(A). The sparse function acts like a converter, directly converting a matrix A into a sparse matrix. The example below creates one diagonal matrix and then uses the sparse function to reconstruct it. The result is shown in Figure 3-3; the two numbers in each parenthesis are the row and column indices at which the elements are nonzero.

Figure 3-3. Results of the sparse function
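A rough Python analogue of this conversion, using scipy.sparse in place of MATLAB's sparse (an assumption made for illustration; the thesis works entirely in MATLAB):

```python
import numpy as np
from scipy.sparse import coo_matrix

# Create a diagonal matrix, then convert it to sparse form
A = np.diag([4.0, 5.0, 6.0])
S = coo_matrix(A)

# Only the nonzero elements are kept, each tagged with its (row, column)
# index; the dimensions of the matrix itself do not change
entries = list(zip(S.row.tolist(), S.col.tolist(), S.data.tolist()))
```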
In the example of Figure 3-3, the sparse function squeezes out all the zero elements, leaving only the nonzero elements; the dimension of the matrix does not change. The numbers in the parentheses show the row and column where each nonzero element is located.

2) sparse(m,n). Here, m and n are scalars indicating the size of the matrix. This call creates an all-zero sparse matrix of size m x n.

3) sparse(i,j,s,m,n). s is the vector that stores all the nonzero elements, and the vectors i and j indicate the positions of the nonzero elements. The final size of the sparse matrix created by the sparse function is m x n.

Since a sparse matrix can only be stored in two-dimensional form, and the second derivative is a multidimensional array as discussed before, it is necessary to convert this 3-D array into a two-dimensional concatenation of n x n matrices. There are then n stacked 2-D matrices that consist of the second derivatives of each element.

Outer Derivative

The outer derivative is as shown in the chain-rule section and is provided by the unary or binary functions. For non-constant functions, the outer derivative gives nonzero values. According to the chain rule, the derivative of the function is the derivative of the outer function times the derivative of the inner function. For a non-scalar independent variable, the inner derivative is a sparse matrix. The outer derivative can be a scalar, a vector, or a matrix. To make the derivatives dimensionally compatible, the outer derivative and the inner derivative can be reshaped into vectors. When zero elements are multiplied with any variable, the result is always zero; that is why only the nonzero values are extracted from the inner derivative. In this way, the function saves the time that would otherwise be spent computing zero results. The find function is used to locate the nonzero elements in a matrix and returns the row and column indices at which the nonzero elements are stored; the find function returns a vector as output.

Class Constructor Function

The class constructor function of class Hes is used to create objects of the class Hes. It is stored under the path hes/@hes. The class object stores the specifics of the class as a structure data type; it contains six fields: value, derivative, s_nz, s_derivative, nderivs, and s_nderivs. Suppose x is an object of class Hes. The syntax for accessing the properties of x is x.value, x.derivative, and x.s_derivative, whose values are the value of x, the first derivative of x, and the second derivative of x, respectively.
These properties are all defined in the class constructor function. The input of the class constructor function can be an object of the Hes class or a double-precision variable; the output of the function is an object of class Hes. Because the input data type is ambiguous, the constructor function needs to check whether the input is an object of the class. If the input is not of class Hes, then the output value component equals the value of the input, and the other properties of the output are empty matrices. If the input is an object of the class, the constructor function uses the dot operation to get the output value. Again, suppose x is a Hes class object; the command x.value can be used to get the value stored in the object.

If x is a scalar, then the first derivative with respect to x will be 1. If x is a matrix, the function needs to take the derivative with respect to the entire set of variables, and each element will result in a row vector. For example, assume the dimension of the matrix x is M x N and the number of variables is p. For each element in the matrix, the first derivative will be a 1 x p row vector with only one nonzero element. Each of these row vectors is then reshaped into the columns of a diagonal matrix.

The second derivative behaves differently. In the class constructor function, if the input variable is a scalar, the result of the second derivative is the same as that of the first derivative. In the general form of the second derivative, the inner first derivative is a square identity matrix. When the derivative is taken with respect to an independent variable, each row vector in the matrix becomes a square matrix. For example, x is a matrix that can be written as in Eq. (3-1). If the first derivative of this matrix is as in Eq. (3-2), then the second derivative of this matrix is as in Eq. (3-3). In the second derivative matrix, each element is the derivative of the first derivative with respect to all of the independent variables.
For example, taking the derivative of one such row vector with respect to the independent variable x again, it becomes a Hessian matrix, as shown in Eq. (3-4). In the Hessian matrix, there is only one nonzero element, located in the first row and first column; the values of the other elements are zero, since they are derived with respect to other variables. The second derivative matrix is a large matrix with four matrix blocks; each block is composed of the second partial derivatives of one element of the original matrix with respect to the independent variables. In order to generate the second derivative matrix, a loop is used to assign the values to the elements whose row and column indices are the same. The constructor function takes the product of the dimensions of the first derivative as the number of rows in the second derivative matrix; the number of columns in the second derivative matrix is the number of variables.

Unary Function

The automatic differentiation class Hes provides unary functions, such as sin(x), cos(x), and power functions, which take only one Hes-class input x and return an object of class Hes. The most common unary functions are the trigonometric functions. Take f(x) = sin(x) as an example, where x is an independent variable. The first derivative of this function, cos(x), can be regarded as the outer derivative of the function, with dx/dx as the inner derivative. In all unary functions, the number of independent variables determines the output size of the inner derivative. This unary function is defined in MATLAB as:

function y = sin(x)
where x is the input and y is the output. This function defines all the properties of y, including the value, first derivative, second derivative, and number of derivatives. The first derivative component of y is the outer derivative cos(x), and the second derivative component is the outer second derivative -sin(x). The inner derivative is not evaluated in the unary function but is defined in the constructor function. After defining all the properties of y, y is a structure. In automatic differentiation, there are only two types of input: double-precision variables and class objects. As a result, the unary function has to call the function class to convert y into an object of the class:

y = class(y, 'hes');

Next, a randomized matrix x is used as an input to test the unary function and inspect the output. Figure 3-4 shows the value of x. After evaluating x in the Hes constructor function, the function returns a class object y. The figure shows the value of the first derivative, its dimensions, and the second derivative. The difference between the dimensions of the first derivative and the second derivative, and how the dimensions change from the original matrix to the first derivative and then to the second derivative, are shown clearly in Figure 3-4. Since the function y = sin(x) needs to be evaluated, the program overloads the unary function sin(x). The results shown in the figure demonstrate the dimensions of the first and second derivatives. The total number of derivatives does not change, because a unary function has only one input. Since the first and second derivative matrices are enlarged compared to the original matrix, the allocation of the nonzero values in the first and second derivatives is also shown in the result.

Figure 3-4. Result of the function sin with a random input value
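The propagation performed by the overloaded sin can be illustrated with a minimal scalar forward-mode sketch in Python (the class and function names are invented for illustration; this is not the thesis's MATLAB Hes class). It carries a value, a first derivative, and a second derivative, and applies the outer derivatives cos and -sin:

```python
import math

class Hes2:
    """Scalar carrier of a value, a first derivative, and a second derivative."""
    def __init__(self, value, d=0.0, dd=0.0):
        self.value, self.d, self.dd = value, d, dd

def hes_sin(x):
    # Outer derivatives of sin: cos(x) first, -sin(x) second; the inner
    # derivatives x.d and x.dd come from the constructor's seeding.
    dv, ddv = math.cos(x.value), -math.sin(x.value)
    return Hes2(math.sin(x.value),
                dv * x.d,                      # first-order chain rule
                ddv * x.d * x.d + dv * x.dd)   # second-order chain rule

x = Hes2(0.5, d=1.0, dd=0.0)   # seed: dx/dx = 1, d2x/dx2 = 0
y = hes_sin(x)
```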
Constant Variable Function

When evaluating a constant input, the program automatically calls the hesconstant function. hesconstant evaluates the constant value and returns an object of the class Hes. In the constant function, the input can be an object of the class or a double-precision variable. If the input is an object of the class, then the input stores all the properties of the class, including value, derivative, s_nz, s_derivative, nderivs, and s_nderivs, and the output is a deep copy of the input: all the property values are the same. If the input is a double-precision variable, then the value of the input equals the output value component. Because the input is a constant, the first and second derivatives of the output are all-zero sparse matrices. Since a double-precision input does not have the properties of the class, the remaining derivative-related properties of the output will be empty matrices.

Binary Function

The binary function is a function with two input variables. There are several binary functions in automatic differentiation, such as the plus and multiplication functions. Table 3-1 summarizes the gradient and Hessian patterns of the binary functions.

Table 3-1. The first and second derivative patterns of different binary operations
Equation    First derivative    Second derivative

Addition and Subtraction

Addition and subtraction functions in AD are used to add or subtract two variables and take the derivative of the result. They take in two input variables and return one object of class Hes. For matrices, addition or subtraction is performed on the corresponding entries element by element. The input variables should have identical sizes, or one of them should be a scalar. Addition and subtraction have the following form in MATLAB:

z = function_name(x,y)

where x and y are the two inputs to the function function_name and z is the output of the function. For this function, there are three different cases: only x is an object of the class; only y is an object of the class; or both x and y are objects of the class.
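The three dispatch cases can be sketched with a simplified scalar class in Python (names invented for illustration; the thesis's MATLAB plus also handles matrices and second derivatives):

```python
class HesScalar:
    """Minimal scalar carrier of a value and a first derivative."""
    def __init__(self, value, d=0.0):
        self.value, self.d = value, d

def hes_plus(x, y):
    x_obj, y_obj = isinstance(x, HesScalar), isinstance(y, HesScalar)
    if x_obj and not y_obj:
        # case 1: only x carries derivative information
        return HesScalar(x.value + y, x.d)
    if y_obj and not x_obj:
        # case 2: the situation reversed from case 1
        return HesScalar(x + y.value, y.d)
    # case 3: both are class objects; values and derivatives both add
    return HesScalar(x.value + y.value, x.d + y.d)

z = hes_plus(HesScalar(2.0, d=1.0), 3.0)
```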
An example of the possible inputs for plus() is shown in Figure 3-5 below.

Figure 3-5. All the input cases for the plus function

Case 1: x is an object of the class and y is a double-precision variable. In this case, x has the properties value, derivative, s_nz, s_derivative, nderivs, and s_nderivs. As y is a double-precision variable, the function does not evaluate a derivative at y. The variable x could be a constant or an equation. If x is a constant, the constant function will be called to evaluate all of its properties and then return the property values to the output. The value component of the output of plus(x,y) is the sum of the value components of x and y. The derivative of the output z is found by applying the chain rule recursively until the evaluation reaches the derivative of the innermost function, which is taken with respect to the independent variable. Since y is not an object of the class, the binary function only takes the derivative of x; thus, the derivative of the output is the derivative of x. The number of independent variables in x determines the number of derivatives of z. When x is a constant variable, the constant function will be overloaded; it will generate the first and second derivatives of x and then pass their values to the plus function.

Now consider the case in which x is an equation. First, when x is a unary expression, such as a trigonometric function, it will first call the unary function to pass the outer derivative to the composite function; then the composite function computes the final derivative of x. Second, when x is a binary expression, it will first call the binary function. As a binary expression is composed of two or more unary expressions, it has to call the relevant unary functions to supply the derivative values until it arrives at the innermost derivative.

Case 2: If y is an object of class Hes and x is a double-precision variable, then the situation is the reverse of Case 1.
Now the plus function does not have to evaluate x, but it has to define y's properties and pass them to the output z.

Case 3: If both x and y are objects of the class, the output value component depends on both x's and y's value components: z.value = x.value + y.value. In this case, x or y could be a scalar or a matrix. Suppose x is a scalar and y is a matrix. The first derivative of x would be a row vector, while the first derivative of y will be a multidimensional matrix. When adding the value components of x and y, each element of y's value component is added to x; the output size will be the same as that of y. When adding the two derivatives, the function repmat is needed. repmat is a function that takes a matrix as an input block and returns a larger matrix with a specified number of row and column copies of that block. The input block can be an array or a matrix, and the numbers of rows and columns of blocks to be placed in the output matrix are given as inputs to the function. The repmat function converts the derivative component of x into the same size as the derivative component of y. Once the dimensions of x and y are compatible, x and y can be safely added together. In the following example, one random matrix was created and the repmat function was used to create a two-by-three tiling of copies of the input matrix. After using the repmat function, the result is a larger matrix with repeated blocks.

Figure 3-6. Result of the function repmat with a random matrix input

If x and y are both matrices, the derivative component of the output z is the sum of the directional derivative components of x and y.
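The tiling behavior of repmat can be mimicked in Python with numpy.tile (an analogue for illustration only):

```python
import numpy as np

# A small block, tiled two-by-three as in the repmat example
block = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
tiled = np.tile(block, (2, 3))   # 2 block-rows, 3 block-columns of copies

# In the AD code, the same trick expands a scalar's derivative so its
# dimensions match the derivative of a matrix before the two are added.
```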
When the function adds the derivatives of x and y, it must check whether x and y have the same independent variables. If their derivatives are taken with respect to different independent variables of different sizes, the dimensions of the x and y derivative components will differ, and the derivative of the output z cannot be formed, because when z takes a derivative, it can only do so with respect to one variable at a time. For plus, both sides of the + sign should have the same dimension. If the derivatives of x and y are taken with respect to the same variable, the sizes of both sides of the + sign agree. Thus, the number of derivatives is the number of variables with respect to which x or y is differentiated; the number of variables for the second derivative is the same.

Element-Wise Operation

Element-wise operations perform computation on the corresponding elements of two matrices, so that each entry of the output depends only on the corresponding entries of the inputs. In element-wise multiplication, in order to be operationally compatible, both x and y must have the same size, except when one of them is a scalar. When x or y is a scalar, the scalar multiplies each element in the matrix without changing the dimension of the matrix. Figure 3-7 illustrates the form of the element-wise operation: take two variable inputs and return one output. In Figure 3-7, there are three cases for the function times: x is a double-precision variable and y is an object of the class; x is an object of the class and y is a double-precision variable; or x and y are both class objects. MATLAB's intrinsic class-test function is used to determine whether x or y is an object of the class.

Figure 3-7. Different possibilities of inputs for the function times

Case 1: When x is a double-precision variable and y is an object of the class, the value component of the output z is x times the value component of y. The size of the derivative is determined by the computation, and it should be the dimension of y times the number of independent variables.
The first derivative of z is x times the first derivative of y, where x and y could each be a scalar or a matrix. If x is a double-precision scalar, the output first derivative equals x times each element in y's first derivative matrix. The same principle is applied when computing the second derivative of the output. If y is a scalar, taking the derivative of y with respect to the independent variables results in a row vector: supposing the independent variable vector has n entries, the first derivative of y is a 1 x n row vector. For the second derivative, the result will be a square matrix of size n x n. Since x is not a scalar and the first derivative of y is a row vector, x(:) is used to convert x into a column vector so that the dimensions are compatible. The second derivative of the output is then x(:) times the second derivative of y. Much like calculating the first derivative, algebraic multiplication is applied to compute the second derivative of z.

If both x and y are matrices, as in Eqs. (3-5) and (3-6), the first derivative of y is a directional derivative: the derivative of y is the rate of change of y along a certain direction. Therefore, the function uses the command repmat(x(:), 1, number of derivatives) to convert x into [x(:) x(:) ... x(:)], which has the same dimensions as the first derivative of y. Supposing n represents the number of derivatives, the total number of copies of x(:) is n, and x(:) is a column vector whose length is the product of the dimensions of x. For the second derivative, the same method as above is used, except that the tiling dimensions given to repmat change accordingly.

Case 2: When x is an object of the class and y is a double-precision variable, the methods from Case 1 are used with the roles of x and y reversed.
Case 3: When x and y are both class objects, the output value will be generated from the product of the value components of x and y. The derivative of the output follows the product rule and can be written as:

z.derivative = x.derivative .* y.value + x.value .* y.derivative (3-7)

z.s_derivative = x.s_derivative .* y.value + 2 .* x.derivative .* y.derivative + x.value .* y.s_derivative (3-8)

If x is a scalar, then the first derivative of z can be divided into two terms: term1 = x.derivative .* y.value and term2 = x.value .* y.derivative. Suppose the independent variable is an n-valued entry. For term1, the first derivative of x is a row vector of size n and is multiplied with the corresponding elements of y. For term2, the value component of x is a scalar; whether the first derivative of y is a matrix or a scalar, the first derivative of each element in y will be multiplied by x. The result of the computation will be the same size as the first derivative of y. If y is a scalar, then the methods used in Case 1 are duplicated with the roles of x and y reversed.

If x and y are both matrices, then the sizes of x and y need to be the same for an element-wise operation. Suppose the dimensions of x and y are m x n and the size of the independent variable vector is p. The dimensions of term one above will be m x n x p. The direction of the first derivative is taken along the columns; likewise, tiling the column vector y(:) follows the same direction as the derivative direction. After being reshaped into a column vector, y is duplicated into a matrix with the number of columns equal to the size of the derivative of x before the elements are computed. A similar method is used to deal with term 2.

The equation for the second derivative is Eq. (3-8). The principle of taking the second derivative is similar to the principle of taking the first derivative. The only difference between the two computations is that there is an extra term in the computation of the second derivative, which is 2 .* x.derivative .* y.derivative. The size of the extra term is m x n x p; however, the dimensions of the other terms in the equation are all m x n x p x p.
In order to continue the whole computation, the sizes of all the terms should be compatible. The binary function therefore uses the find function to extract the nonzero elements in the second term and return their locations; find returns the nonzero elements as a vector. After the nonzero values are extracted from the matrix, according to their original locations in the matrix, a sparse matrix is used to reorganize those values into the same size as the first and third terms without changing the locations of the nonzero values. All of the properties are stored in the output z. Due to the features of operator overloading, the input can only be a double-precision variable or an object of the class Hes, and the function needs to return the class-object data type; therefore, at the end of the code, the class function is utilized to convert z into an object of the class.

Figure 3-8 gives the three cases when x and y are both class objects: 1) x is a scalar and y is a matrix; 2) x is a matrix and y is a scalar; 3) x and y are both matrices. In these three cases, the program carries out different methods to compute the derivative of the output z; in cases two and three, the equations for computing the first derivative value of z are different.

Figure 3-8. Different cases when x and y are both objects

Matrix Operation

There are some matrix operations in automatic differentiation, such as mtimes, mrdivide, and mpower. Different from element-wise operations, a matrix operation is a linear-algebraic operation on matrices:

z(i,k) = sum over j of x(i,j) * y(j,k) (3-9)

Equation 3-9 is the definition of matrix multiplication: the elements in a row of x are multiplied with the associated elements in the corresponding column of y.
As indicated in the equation, the number of columns of x should be equal to the number of rows of y. The following shows the function mtimes expression in MATLAB:

function z = mtimes(input1, input2)

If x or y is a scalar, then the operation process is the same as element-wise multiplication. The three cases of x and y as inputs are shown below in Figure 3-9; in particular, there are also three sub-cases when x and y are both matrices.

Figure 3-9. Different possibilities of inputs for the function mtimes

If x is a double-precision variable and y is an object of the class, then the first derivative of the matrix multiplication is x times the first derivative of y. Suppose x is a matrix with dimensions m x n and y is a matrix with dimensions n x k, where q is the number of independent variables; the dimension of y's first derivative is then (n*k) x q. In order to be compatible, the derivative of y should be reshaped: the first derivative of y is a matrix that contains blocks, each block a row vector with only one nonzero entry, and it is converted into another matrix whose number of rows is the same as the number of columns of x. The second derivative of the matrix multiplication is x times the second derivative of y: each element in x is multiplied with the corresponding elements in the second derivative of y.

If x is an object of class Hes and y is a double-precision variable, then the value component of the output z is determined by the product of the value components of x and y. The first derivative of z is z.derivative = x.derivative*y. Suppose x and y have the same dimensions as in case one. The dimensions of the first derivative of x are (m*n) x q; however, the dimensions of y are n x k. The dimensions of the two matrices do not fulfill the requirement of matrix multiplication.
The function therefore utilizes the regroup function to reshape x into a matrix with compatible dimensions. The function regroup divides the derivative into groups and concatenates those groups into one matrix.

If the variables x and y are both class objects, then both x and y have the class properties: value, derivative, s_nz, s_derivative, nderivs, s_nderivs. The value component of the output z is the product of the value components of x and y. The first and second derivatives of the output z can be written as:

z.derivative = x.derivative*y.value + x.value*y.derivative (3-10)

z.s_derivative = x.s_derivative*y.value + 2*(x.derivative*y.derivative) + x.value*y.s_derivative (3-11)

In this case, when computing the first derivative, the function utilizes the results from case 1 and case 2 to compute the first and second terms by applying the relevant rules, such as the chain rule and product rule. When computing the second derivative, the way to calculate the first and third terms comes from the first and second cases. When calculating the second term, the dimensions of x and y are increased by n, where n is the number of independent variables. First, the dimensions of the first derivatives of x and y are converted into compatible sizes. After the product of the first derivatives is obtained, the nonzero elements are extracted from the product with their row and column indices. Then a sparse function is used to create a matrix with the same dimensions as the first or third term. If the matrix has p stacks of a matrix, the regroup function is used to reshape it. Finally, the code

z_sderivative = sterm1 + 2*sterm2 + sterm3;

is used to add the three terms together.
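The matrix product rule that generates these terms, d(XY) = dX*Y + X*dY, can be checked numerically in Python (the matrices below, parameterized by a scalar t, are invented for illustration):

```python
import numpy as np

def X(t):
    return np.array([[t, 2.0], [0.0, t * t]])

def dX(t):
    return np.array([[1.0, 0.0], [0.0, 2.0 * t]])

def Y(t):
    return np.array([[np.sin(t), 1.0], [t, 3.0]])

def dY(t):
    return np.array([[np.cos(t), 0.0], [1.0, 0.0]])

t, h = 0.7, 1e-6
analytic = dX(t) @ Y(t) + X(t) @ dY(t)   # product rule for mtimes
numeric = (X(t + h) @ Y(t + h) - X(t - h) @ Y(t - h)) / (2.0 * h)
```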
For example, if y is a class object with independent variable x, Figure 3-10 shows the result of the multiplication of the two matrices. It can be seen that, after multiplication, the dimensions of the first and second derivatives change as shown in the figure.

Figure 3-10. Results of the matrix operation

Logic Functions

There are multiple logic functions in the AD program, such as ge.m, eq.m, and gt.m. These functions are used to compare each element in two inputs and return a logic value of 1 or 0. Whether the two input variables are scalars or matrices, the output will be a variable with the same size as the inputs; each element in the output will be the comparison result of the corresponding elements in the two inputs. However, the function needs to know whether the inputs are objects of class Hes or double-precision variables. If an input is an object of the class, a dot reference is required to get the value component of the input.

Constant Functions

The constant functions are used to create a constant value for the output from a given dimension. They can take in multiple inputs and return a class object; the inputs indicate the final dimensions of the output. There are some constant functions in AD, such as zeros.m, ones.m, and hesconstant.m. Functions such as ones and zeros can have different numbers of inputs. When the input is a scalar, the function creates a square matrix with the size provided by the input. When the input is a vector, the constant function returns a multidimensional matrix; the size of each dimension is stored in the input vector, and the first element in the vector indicates the size of the first dimension. For these types of functions, the input can have variable data types. The function can have only one input, which can be a scalar or a vector, or it can have multiple inputs. All of these cases are shown in Figure 3-11.
Figure 3-11. Different input possibilities for the constant function

The constant function uses the MATLAB built-in function nargin to return the number of inputs to the function; when there is only one input, nargin = 1. The variable varargin contains all the optional inputs of the function ones. For example, the input in ones.m is the size indicator, which is different from other functions:

function z = ones(varargin)

Therefore, z.value = ones(input.value), where input is an object of the class. For multiple-input situations, varargin cannot be directly used as the size indicator: the function ones cannot extract the values stored in varargin directly, since it is a cell array. In order to get the values stored in varargin, a loop is used to assign every value to a variable vector. Suppose z = ones(d), where d is a vector containing the dimensions [m, n, o, p, ...]. The output matrix z is a constant matrix in which all the elements are one. The derivative of this ones matrix is zero. The size of the first derivative is the product of the size of the input and the number of independent variables; the size of the second derivative matrix is the same as that of the first derivative.

Shaping Function

The shaping functions are used to change the size of a matrix and return the final matrix after reshaping, which is an object of class Hes. There are some shaping functions defined in AD, such as ctranspose.m, horzcat.m, repmat.m, reshape.m, vertcat.m, and transpose.m. For example, the reshape function is used to return a matrix with a desired size. In the reshape function, there must be at least two input variables: one is the original matrix, and the other is the size of the matrix after reshaping. The general form of the function in MATLAB is:

z = reshape(x, varargin);

varargin indicates the size vector and cannot be zero.
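MATLAB's varargin corresponds roughly to Python's *args; a sketch of a ones-like constant function that accepts either one size vector, one scalar, or several scalar sizes (the function name is invented for illustration):

```python
import numpy as np

def ones_like_constant(*args):
    """Accept ones(n), ones(m, n, ...), or ones([m, n, ...]),
    mirroring how the overloaded constant functions use varargin."""
    if len(args) == 1 and np.ndim(args[0]) == 1:   # a size vector
        shape = tuple(int(s) for s in args[0])
    elif len(args) == 1:                           # a scalar: square matrix
        shape = (int(args[0]), int(args[0]))
    else:                                          # several scalar sizes
        shape = tuple(int(a) for a in args)
    value = np.ones(shape)
    derivative = np.zeros(shape)  # the derivative of a constant is zero
    return value, derivative

v, d = ones_like_constant(2, 3)
```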
Since the value of each element of x does not change after reshaping, the first derivative of the output z equals the first derivative of x; however, the location of each value in the matrix changes after reshaping. The dimensions of the first and second derivatives of the output z are the same as the dimensions of the first and second derivatives of x.

Array Referencing and Assignment Functions

In addition to the functions above, the AD tool supports array referencing and assignment. The array referencing and assignment functions take in three inputs and return one output, which is an object of class Hes. The overloaded versions of these functions are subsasgn.m and subsref.m. subsasgn is the function that uses a subscript to assign the value of an input to an output:

x(i) = y (3-12)

The equation above explains how subsasgn works. In MATLAB, the function is written as follows:

function x = subsasgn(x,s,y)

Usually, the function subsasgn has three inputs: the variable itself, the indicator, and y. The variable s is a structure that stores the indices of the value and the type of referencing. For automatic differentiation, s should be a 1-by-1 structure, so that there are no multiple indices. At the same time, there is only one type of value assignment, which is '()'. MATLAB has its own intrinsic string tester, strcmp, which is used to compare two strings; if the referencing is of the type '()', the function can continue evaluating. It is very important to prevent the indices from exceeding the size of the provider x. The output of the function subsasgn is x, where x is an object of the class. The function defines all the properties of x, including value, derivative, s_nz, s_derivative, nderivs, and s_nderivs. The function subsasgn operates only on the referenced part of x. For example, if the index is i, then the value component of y describes the value component of x(i).
The first and second derivatives of the output y are equivalent to the derivatives of the value of x(i). The value of x does not change because of the assignment.

Composite Function

The composite function combines the outer and inner derivatives and returns the final result of the overall derivative by applying the chain rule, like a compiler. The form of the function in MATLAB is written as:

function y = compositeDerivative(x,y,outerDerivative1,outerDerivative2)

The first two inputs of this function are the class objects x and y. The last two inputs are the first and second outer derivatives of the function, respectively; the unary function provides the first and second outer derivatives to the composite function. For y(x) = f(u(x)), where u is the inner function, the first derivative is f'(u)*u'(x). The second derivative of y(x) is

y''(x) = f''(u)*(u'(x))^2 + f'(u)*u''(x) (3-13)

The composite function utilizes this relationship between the outer derivative and the inner derivative to propagate the derivative of the function. In the composite function, y is the output, and the type of y, which is returned by the composite function, is an object of the class; so y's six properties need to be defined in the composite function. The outer derivative is passed to the composite function, and its dimension is the product of the dimensions of x and y. The dimension of the inner derivative is the square of the dimension of the independent variable x. As discussed before, the first derivative is a diagonal matrix, and the second derivative lies along the diagonal of a cube. In order to speed up the computation, only the nonzero elements in the chain-rule equation need to be handled: a matrix with many zero elements, stored densely, would occupy a lot of storage space.
In order to reduce the storage size and speed up the calculation, all the derivatives returned are stored sparsely. A sparse matrix is a two-dimensional matrix, most of whose elements are zero, stored with the zero elements removed. To store the nonzero values of a sparse matrix, one first extracts the nonzero elements together with their indices; the matrix is then reconstructed using the sparse function.

CHAPTER 4
EXAMPLES OF OPTIMAL CONTROL PROBLEMS

This chapter begins with a brief introduction to nonlinear programming problems and presents the NLP solver used in the following examples. The main focus of the chapter is then on applying automatic differentiation to four examples. The first example is a test case intended to verify whether the results from the AD tool are correct. The following three examples are organized from the simplest to the most complicated. These examples use the MATLAB built-in function fmincon to find the optimum of the cost function. This work chooses the interior-point algorithm, which accepts a user-supplied gradient and Hessian. Through these examples, one can see how the choice of derivative method affects the speed of propagating the result and in what kinds of problems AD shows its clearest advantages.

Optimization

Optimization is the selection of a best element from an allowed set so as to minimize or maximize a function, usually called the objective function.

Optimization Problem

Generally, the optimization problem has the following form.

Given: a function f(x) from a set A to the real numbers.
Sought: an element x0 in A that minimizes f(x), that is, f(x0) <= f(x) for all x in A; or that maximizes f(x), that is, f(x0) >= f(x) for all x in A.

Here f(x) is the objective function, and the solution x0 is called the optimal solution.

Nonlinear Programming

Optimization problems with nonlinear objective functions or constraints can be transformed into nonlinear programming problems.
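The extract-nonzeros-then-rebuild storage described above can be sketched as follows, mirroring MATLAB's find/sparse pair. Python is used for illustration and the helper names are hypothetical:

```python
# A sketch of the extract-nonzeros-then-rebuild sparse storage described
# above, mirroring MATLAB's find/sparse pair (Python for illustration;
# the helper names are hypothetical).

def to_sparse(dense):
    """Keep only the nonzero entries together with their indices."""
    return {(i, j): v
            for i, row in enumerate(dense)
            for j, v in enumerate(row) if v != 0.0}

def to_dense(triplets, nrows, ncols):
    """Rebuild the full matrix from the stored (index, value) pairs."""
    dense = [[0.0] * ncols for _ in range(nrows)]
    for (i, j), v in triplets.items():
        dense[i][j] = v
    return dense

jac = [[2.0, 0.0, 0.0],
       [0.0, 0.0, 3.0]]
nz = to_sparse(jac)   # only 2 of the 6 entries are actually stored
```

Rebuilding with to_dense(nz, 2, 3) recovers jac exactly; for large, mostly zero Jacobians and Hessians, storing only the nonzeros saves both memory and the time spent multiplying zeros in the chain rule.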
Nonlinear programming is a method of solving optimization problems subject to inequality and equality constraints. Usually the nonlinear programming problem has the following form.

Minimize (or maximize) f(x)
subject to the equality constraint E(x), the upper bound constraint G(x), and the lower bound constraint L(x).

Here f(x) is the cost function, the target function whose optimal value we seek; it may also be called the goal function. With these constraints we can state the conditions for minimization or maximization of the cost function. The Lagrangian function can then be written as

L(x, λ) = f(x) + λᵀ C(x), (4-1)

where f(x) is the cost function, C(x) is the constraint function, and λ is the Lagrange multiplier. The second derivative of the Lagrangian function with respect to the states is

∇²ₓ L(x, λ) = ∇² f(x) + Σᵢ λᵢ ∇² Cᵢ(x). (4-2)

Fmincon is the NLP solver used in this work. In fmincon, the interior-point algorithm is used. The interior-point algorithm requires the Hessian matrix of the Lagrangian function and the first derivatives of the objective function and constraint function, and it can accept a user-supplied Hessian value. Otherwise, the Hessian value is provided by a dense quasi-Newton approximation, a limited-memory large-scale quasi-Newton approximation, or finite differences.

Example 1

Problem Statement

This problem, called the arrowhead function, is used to test the Jacobian and Hessian generated by automatic differentiation. The function (Equation 4-3) maps from Rⁿ to Rⁿ, where x is the variable with n entries. The Jacobian of this function has a special form: its nonzero entries lie only on the diagonal, in the first row, and in the first column, giving the matrix its arrowhead shape. In this example, automatic differentiation provides the first derivative of f(x). Consider a randomized vector x of size n-by-1 as input to the vector function above, with n = 6.

Result

From the result shown in Figure 4-1, the Jacobian is a 6-by-6 sparse matrix. The full matrix of the first derivative of the function is shown at the end.
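A forward-mode, operator-overloading computation of such an arrowhead-pattern Jacobian can be sketched as follows. Python is used for illustration, and since the arrowhead formula itself is not reproduced here, the function below is a hypothetical stand-in chosen only for its sparsity pattern, not the thesis's Equation 4-3:

```python
# A forward-mode, operator-overloading sketch producing an arrowhead
# Jacobian pattern (Python for illustration; the thesis tool is MATLAB).
# The function "arrow" is a hypothetical stand-in chosen only for its
# sparsity pattern:  f1 = sum_i xi^2,  fi = x1^2 + xi^2  for i >= 2.

class Dual:
    """A scalar value carrying its gradient with respect to all inputs."""
    def __init__(self, v, g):
        self.v, self.g = v, g
    def __add__(self, o):
        return Dual(self.v + o.v, [a + b for a, b in zip(self.g, o.g)])
    def __mul__(self, o):
        return Dual(self.v * o.v,
                    [self.v * b + o.v * a for a, b in zip(self.g, o.g)])

def jacobian(f, x):
    """Seed each input with a unit gradient and collect the output rows."""
    n = len(x)
    seeds = [Dual(x[i], [1.0 if j == i else 0.0 for j in range(n)])
             for i in range(n)]
    return [out.g for out in f(seeds)]

def arrow(x):
    sq = [xi * xi for xi in x]
    f1 = sq[0]
    for s in sq[1:]:
        f1 = f1 + s
    return [f1] + [sq[0] + s for s in sq[1:]]

J = jacobian(arrow, [1.0, 2.0, 3.0])
# Nonzeros sit on the first row, first column, and diagonal: the arrowhead.
```

For the input [1, 2, 3] the rows come out as [2, 4, 6], [2, 4, 0], and [2, 0, 6]: zero everywhere except the first row, first column, and diagonal, matching the arrowhead structure described above.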
In this matrix, most of the elements are zero except along the arrowhead direction, which stores the nonzero first-derivative values. This matches the characteristic structure of the arrowhead function, which means the first derivative generated by the AD tool is correct.

Figure 4-1. Result of the arrowhead function

Example 2

Problem Statement

Consider a minimization problem with three decision variables x1, x2, x3, one objective function, and one equality constraint function. In this problem, the objective function is to be minimized by finding suitable values of the decision variables. To obtain the optimum, the first derivatives of the objective and constraint functions must be computed with respect to all the variables, together with the second derivative of the Lagrangian function. There are three variables in this problem and only one output. Denoting the output by y, the Jacobian matrix is the 1-by-3 row vector

[∂y/∂x1, ∂y/∂x2, ∂y/∂x3], (4-4)

and the Hessian matrix is the symmetric 3-by-3 matrix with entries ∂²y/∂xᵢ∂xⱼ, i, j = 1, 2, 3. (4-5)

As introduced in the previous chapter, the Lagrangian function is

L(x, λ) = f(x) + λᵀ C(x). (4-6)

Substituting the objective and constraint functions of this problem gives the Lagrangian of Equation 4-7. The gradient value is the derivative of the cost function with respect to the variables (Equation 4-8), and the Hessian value that the user or fmincon must supply is the second derivative of the Lagrangian with respect to the states (Equation 4-9). In general, λ has an equality part and an inequality part; since only an equality constraint function is present in this problem, λ has only the equality nonlinear value.

Result

The initial values of the three variables x1, x2, x3 are given as [100; 100; 100]. The algorithm uses the constraint function and the Lagrangian formula; if the user does not provide the derivatives, fmincon computes them itself. Figure 4-2 shows the result produced by MATLAB. This is a simple problem, with only three states.
And it can easily find the state value and the lambda value from computation of the Lagrange formula even by hand That s why MATLAB does not need to give the iterations. The values of three states are [1,1,1], and the equal nonlinear value for the Lagrange multiplier is 1. The function value at this point is 1.5. PAGE 59 59 Figure 4 2 R esult of the example two Figure 4 3 is the comparison of three different derivative methods used in NLP solver to propagate the overall results. Those three different derivative methods are: 1) Finite differencing provides the first derivative value of the objective function and constraint function. It also provides the second d erivative value of the Lagrangian function. 2) Finite differencing provides the first derivative value of the objective function. AD tool provides the second derivative value of the Lagrangian function. 3) AD tool provides the first derivative value of the objec tive function and the constraint function. It also provides the second derivative value of the Lagrangian function. PAGE 60 60 Here in F igure 4 3, the overall time of the first two methods CPU takes to get the final optimization is similar to each other. However, th e third method, AD provide both the first and second derivatives to the relative functions, is much faster. Figure 4 3 Comparison of three different derivative methods Example 3 P roblem S tatement M in S ubject to the constraints W here equations one and two are linear constraints, equation three, four, five and six are nonlinear constraints. So in the fmincon function we need to set up A= [ 1, 1] b= 0.5; lb= 1. In this problem, the Lagrange formula is PAGE 61 61 (4 7) In this problem, we use three different derivative methods. 1) Finite differencing provides the f irst derivative value of the objective function and constraint function. It also provides the second derivative value of the Lagrangian function. 
2) Finite differencing provides the first derivative value of the objective function. The AD tool provides the second derivative value of the Lagrangian function. 3) The AD tool provides the first derivative values of the objective function and the constraint function. It also provides the second derivative value of the Lagrangian function.

Result

The final value of the function is 2, and the final values of the states are [1; 1]. Figure 4-4 shows the iterations produced by MATLAB's fmincon when AD provides both the first and second derivative values. In this problem, fmincon went through 14 steps to converge to the correct answer. The first-order optimality measure shows that the feasible solution is very close to the optimal point.

Figure 4-4. Result of example three

Table 4-1 below compares the results of all three cases. Although the case in which the AD tool provides only the first derivative uses the fewest steps, it consumes more CPU time to reach the final result. The first and third derivative methods consume similar amounts of CPU time: the first method, in which finite differencing provides all the first and second derivatives, is 0.03 seconds faster than the case in which the AD tool provides them all. Since this difference is tiny, it can almost be ignored. In the table, the values of the function, the variables, and the Lagrange multiplier are the same for all three methods, which means that using the AD tool in this optimization problem yields the correct result.
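The accuracy difference between finite differencing and exact (AD-style) derivatives that motivates these comparisons can be sketched as follows. Python is used for illustration, and the test function is a hypothetical example, not one from this chapter:

```python
import math

# A sketch of why exact (AD-style) derivatives can beat finite differencing
# in accuracy: a forward difference carries truncation error of order h.
# The test function f is a hypothetical example, not one from the thesis.

def f(x):
    return math.exp(x) * math.sin(x)

def exact_df(x):
    # Closed-form derivative (what AD would deliver to machine precision).
    return math.exp(x) * (math.sin(x) + math.cos(x))

def forward_diff(fun, x, h):
    # One extra function evaluation per variable, accurate only to O(h).
    return (fun(x + h) - fun(x)) / h

x0 = 1.0
fd_error = abs(forward_diff(f, x0, 1e-6) - exact_df(x0))
# fd_error is small but nonzero; shrinking h reduces the truncation error
# until round-off takes over, whereas the exact derivative has no step
# size to tune at all.
```

This step-size trade-off is one reason the AD-supplied derivatives in the examples above can reach the optimizer's tolerance in fewer, better-conditioned iterations.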
Figure 4 5 Comparison of three different derivative modes PAGE 63 63 Example 4 Problem S tatement Consider the following NLP problem [21] Minimize Subject to This problem has five different va riables: x 1, x 2 x 3 x 4 x 5 ; one objective function, which is ; two equality constraints: and ; two inequality constraints: and In this problem, substituting objective function and all the constraint functions in the Lagrangian function the function will be (4 8) R esult The first result is what the AD tool supply those two values The second r esult is what the program supply the hessian and gradient of the related values using the finite difference. In the first result, the total nu mber of iteration steps is thirty five, compared to the second result, whose total num ber of iteration steps is fo rty five. This comparison PAGE 64 64 obviously reveals that, by using the automatic differentiation, optimality process reduces about twenty two percent of the total steps of iterates. Figure 4 6 R esult of the example with giving the hessian and gradient by AD t ool PAGE 65 65 Figure 4 7 R esult with the hessian and gradient value generated by the quasi Newton approximation PAGE 66 66 This table below shows all the results of three different derivative methods. It is apparently seen that, with providing the first and second derivat ives by AD tool, fmincon uses less iteration steps than the other derivative methods. The CPU time used to find the optimal result is 1.370038s, 0.9611894s and 1.378906s respectively. Therefore, if AD provides the first and second derivative values to the NLP solver, it will save more than 25 percent time as compared to the normal way. Other parameters of these three derivative methods are similar to each other. This shows the efficiency of AD tool in solving the complicated problems. 
The table below shows the results of the three derivative methods. It is apparent that, when the AD tool provides the first and second derivatives, fmincon uses fewer iteration steps than with the other derivative methods. The CPU times used to find the optimal result are 1.370038 s, 0.9611894 s, and 1.378906 s, respectively. Therefore, if AD provides the first and second derivative values to the NLP solver, it saves more than 25 percent of the time compared with the standard approach. The other parameters of the three derivative methods are similar to one another. This shows the efficiency of the AD tool in solving complicated problems.

Figure 4-8. Comparison of methods using AD and using finite differencing

CHAPTER 6
CONCLUSION AND FUTURE WORK

This work presents a new automatic differentiation tool called Hes, a forward-mode operator-overloading tool. In this work, we employ different algorithms of AD. Compared to hand computation or finite differencing, automatic differentiation is more efficient and less error prone. In the examples chapter, we applied automatic differentiation to several specific problems. The gradient and Hessian generated by automatic differentiation were found to be more accurate and time saving than those generated by finite differencing or the quasi-Newton approximation method. This comparison shows that automatic differentiation can not only provide accurate gradient values but also works well for higher-order derivatives. This work uses only the operator overloading method to generate the derivative of a formula, implemented in object-oriented MATLAB. In the future, forward mode could be combined with reverse mode, or operator overloading with source transformation, to make the program run more efficiently.

LIST OF REFERENCES

[1] Wolfe, P., "Checking the Calculation of Gradients," ACM Transactions on Mathematical Software, Vol. 8, No. 4, 1982, pp. 337-343.
[2] Griewank, A., "On Automatic Differentiation," 1994.
[3] Bischof, C., Carle, A., and Corliss, G., "ADIFOR: Automatic Differentiation in a Source Translator Environment."
[4] Rosemblun, M.L., "Automatic Differentiation: Overview and Application to Systems of Parameterized Nonlinear Equations," 2009.
[6] Fateh, H., "Automatic Differentiation for Large Scale Nonlinear Programming Problems," 1995.
[7] Ernesto G. Birgin, and Yuri G.
Evtushenko, "Automatic Differentiation for Optimal Control Problems," 1998.
[8] Griewank, A., Juedes, D., and Srinivasan, J., "ADOL-C: A Package for the Automatic Differentiation of Algorithms Written in C/C++," Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Ill., 1991.
[9] Jee, K., McShan, D.L., and Fraass, B.A., "Implementation of Automatic Differentiation Tools for Multicriteria IMRT Optimization," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 225-234.
[10] Juedes, D., and Griewank, A., "Implementing Automatic Differentiation Efficiently," 2008.
[11] Kowarz, A., "Advanced Concepts for Automatic Differentiation Based on Operator Overloading," 2008.
[12] Verma, A., "ADMAT: Automatic Differentiation in MATLAB Using Object Oriented Methods," 1998.
[13] Coleman, T.F., and Verma, A., "ADMIT-1: Automatic Differentiation and MATLAB Interface Toolbox," 1999.
[14] Bischof, C.H., Bucker, H.M., and Vehreschild, A., "A Macro Language for Derivative Definition in ADiMat," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 181-188.
[15] Gay, D.M., "Semiautomatic Differentiation for Efficient Gradient Computations," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 147-158.
[16] Utke, J., "Flattening Basic Blocks," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 121-134.
[17] Phipps, E., Casey, R., and Guckenheimer, J., "Periodic Orbits of Hybrid Systems and Parameter Estimation via AD," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 211-224.
[18] Snopok, P., Johnstone, C., and Berz, M., "Simulation and Optimization of the Tevatron Accelerator," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 199-210.
[19] Tadjouddine, M., Bodman, F., and Pryce, J.D., "Improving the Performance of the Vertex Elimination Algorithm for Derivative Calculation," Lecture Notes in Computational Science and Engineering, Vol. 50, 2006, pp. 111-120.
[20] Kharche, R.V., and Forth, S.A., "Source Transformation for MATLAB Automatic Differentiation," Lecture Notes in Computer Science, No. 3994, 2006, pp. 558-565.
[21] Gill, P.E., Murray, W., and Saunders, M.A., "SNOPT: An SQP Algorithm for Large-Scale Constrained Optimization," SIAM Journal on Optimization, Vol. 12, No. 4, 2002.

BIOGRAPHICAL SKETCH

Ying Lin was born in Wen Lin, Zhejiang, China in 1987. She received her bachelor's degree in mechanical engineering from China Agricultural University, Beijing, China in June 2009. She was awarded the 3rd Prize of Excellent Study of University for Three Consecutive Years by China Agricultural University. She received her Master of Science degree with thesis in the Department of Mechanical and Aerospace Engineering at the University of Florida in the summer of 2011 under the supervision of Dr. Anil V. Rao, co-advised by Dr. Warren E. Dixon. Her interests lie in the field of optimal control and control systems.