## Citation

- Permanent Link:
- https://ufdc.ufl.edu/UFE0004885/00001

## Material Information

- Title:
- Affine image registration using minimum spanning tree entropies
- Creator:
- Kumthekar, Aditee (*Dissertant*)
- Rangarajan, Anand (*Thesis advisor*)
- Place of Publication:
- Gainesville, Fla.
- Publisher:
- University of Florida
- Publication Date:
- 2004
- Copyright Date:
- 2004
- Language:
- English
## Subjects

- Subjects / Keywords:
- Coordinate transformations ( jstor )
- Density estimation ( jstor )
- Entropy ( jstor )
- Histograms ( jstor )
- Image rotation ( jstor )
- Images ( jstor )
- Images of transformations ( jstor )
- Mathematical optima ( jstor )
- Pixels ( jstor )
- Timber ( jstor )
- Computer and Information Science and Engineering thesis, M.S.
- Image analysis
- Dissertations, Academic -- UF -- Computer and Information Science and Engineering
- Image processing -- Digital techniques
## Notes

- Abstract:
- Environmental factors, the settings of the image acquisition device, and the material properties of an object all directly affect the formation of an image. Registration involves aligning two images obtained under different conditions, a task for which several information-theoretic methods exist. Entropy estimation is an important step in these methods and traditionally requires estimating the density function of the images. This thesis instead estimates the entropy directly, without a preliminary density estimation step. This is achieved by exploiting a powerful theorem which relates the asymptotic length of the minimum spanning tree formed from the samples to the Renyi entropy of the density underlying the samples. Since the density estimation step is avoided, so is the careful tuning of the several parameters that most density estimation methods require. Moreover, since the spanning tree method estimates entropy in high dimensions, feature information can be included in the registration process. Affine registration of face images of different subjects, with different gestures and taken under different lighting conditions, is demonstrated.
- Subject:
- affine, entropy, image, minimum, registration, Renyi, spanning, tree
- General Note:
- Title from title page of source document.
- General Note:
- Document formatted into pages; contains 91 pages.
- General Note:
- Includes vita.
- Thesis:
- Thesis (M.S.)--University of Florida, 2004.
- Bibliography:
- Includes bibliographical references.
- General Note:
- Text (Electronic thesis) in PDF format.
## Record Information- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright Kumthekar, Aditee. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Embargo Date:
- 4/30/2004
AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By ADITEE KUMTHEKAR

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA 2004

Copyright 2004 by Aditee Kumthekar

This thesis is dedicated to my family, whose constant and unconditional support has made it possible.

ACKNOWLEDGMENTS

I would like to thank Dr. Anand Rangarajan for his excellent guidance, for the innumerable hours that he spent patiently resolving my difficulties, and for his constant encouragement, which was the driving force behind this thesis. I would also like to extend my thanks to Jie Zhang, who made me see certain aspects of image registration clearly.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

1 INTRODUCTION

2 BACKGROUND
  2.1 Entropy and Mutual Information
    2.1.1 Joint and Conditional Entropy
    2.1.2 Relative Entropy and Mutual Information
    2.1.3 Differential Entropy
    2.1.4 Renyi Entropy
  2.2 Histogram Method for Estimating Entropy
  2.3 Minimum Spanning Trees
  2.4 The Problem of Registration
  2.5 Image Transformations
  2.6 Independent Component Analysis

3 THEORY OF MINIMUM SPANNING TREE ENTROPIES
  3.1 Entropy Estimation for Registration
  3.2 History
  3.3 Minimum Spanning Trees for Entropy Estimation
  3.4 Measure of Registration

4 ALGORITHM DESIGN
  4.1 General Registration Algorithm
  4.2 Design of the Algorithm
    4.2.1 Preprocessing by Histograms
    4.2.2 Features
    4.2.3 Search Strategy
    4.2.4 Minimum Spanning Tree Algorithm
  4.3 Implementation Details
    4.3.1 Hardware and Software Details
    4.3.2 Free Parameters
  4.4 Results
    4.4.1 Same Image Registered with the Same Image
    4.4.2 Registration of Same Person with Different Expressions
    4.4.3 Yet Another Registration of Same Person with Different Expressions
    4.4.4 Registration Result of Images of Two Different People
    4.4.5 Yet Another Registration Result of Images of Two Different People
    4.4.6 Comparison with Histogram Based Method
    4.4.7 Convergence of Renyi Entropy with Number of Samples as Alpha Changes
    4.4.8 Noise Trials

5 DISCUSSION

APPENDIX: ADDITIONAL INFORMATION

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 The ICA values
4-2 Free parameter table
4-3 Noise results over rotation for histogram
4-4 Noise results over rotation for MST
4-5 Noise results over rotation and scale for MST
4-6 Noise results over rotation and scale for MST

LIST OF FIGURES

1-1 Two registered images with different intensity distributions
2-1 Original graph
2-2 MST by Kruskal's algorithm
4-1 Block diagram for image registration
4-2 Flow chart for estimating the parameters of the transform
4-3 Effect on registration of varying the number of histogram bins (8, 16 and 32 bins from top left to bottom right; the maximum should be at 0)
4-4 ICA feature images
4-5 The faces used to compare registration for different sizes of ICA blocks
4-6 Variation of MI vs. rotation angle for ICA feature dimensions of 4, 16 and 36 features; maximum located at 0
4-7 Variation of MI vs. rotation angle for ICA feature dimensions of 4, 16 and 36 features
4-8 Behavior of MST entropy in 1D and 2D search space
4-9 Error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0 and 10 degrees
4-10 Joint entropy vs. rotation angle for 100, 300, 500 and 700 samples picked from the overlap area, alpha = 0.55
4-11 Joint entropy vs. rotation angle for 900 and 1100 samples picked from the overlap area, alpha = 0.55
4-12 Joint entropy vs. rotation angle for 100, 300, 500 and 700 samples picked from the overlap area, alpha = 0.85
4-13 Joint entropy vs. rotation angle for 900 and 1100 samples picked from the overlap area, alpha = 0.85
4-14 From top left to bottom right: 1. face image 2. same face with applied transformation 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-15 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-16 From top left to bottom right: 1. face image 2. face obtained after applying the optimal transformation to the unregistered image shown in figure 4-14 3. overlap of the two registered images 4. edge image of the overlapped registered images
4-17 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-18 From top left to bottom right: 1. face image 2. unregistered face image of the same person with glasses 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-19 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-20 From top left to bottom right: 1. face image 2. registered face image of the same person with glasses 3. overlap of the two registered images 4. edge image of the overlapped registered images
4-21 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-22 From top left to bottom right: 1. face image of a happy person 2. unregistered face image of the same person with a sad expression 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-23 From top left to bottom right: 1. face image 2. registered face image of the same person with a sad expression 3. overlap of the two registered images 4. edge image of the overlapped registered images
4-24 From top left to bottom right: 1. a face image 2. face image of a different person 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-25 From top left to bottom right: 1. face image 2. registered face image of a different person 3. overlap of the two registered images 4. edge image of the overlapped registered images
4-26 From top left to bottom right: 1. a face image 2. face image of a different person 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-27 From top left to bottom right: 1. face image 2. registered face image of a different person 3. overlap of the two registered images 4. edge image of the overlapped registered images
4-28 From top left to bottom right: 1. a face image 2. face image of a different person 3. overlap of the two unregistered images 4. edge image of the overlapped unregistered images
4-29 From top left to bottom right: 1. face image 2. registered face image of a different person 3. overlap of the two images registered using the histogram 4. edge image of the overlapped registered images
4-30 From left to right: 1. face image of a person 2. registered face image of the same person
4-31 Image 2 as registered by the histogram
4-32 Plot of MI vs. rotation angle when registered by histogram; maximum should be at -5 degrees
4-33 Image 2 as registered by the MST
4-34 Plot of MI vs. rotation angle as registered by MST; correct answer is at -5 degrees
4-35 MST Renyi joint entropy vs. the number of samples, from top left to bottom right, for alpha = 0.4, 0.45, 0.5 and 0.55
4-36 MST Renyi joint entropy vs. the number of samples, from top left to bottom right, for alpha = 0.6, 0.65, 0.7 and 0.75
4-37 MST Renyi joint entropy vs. the number of samples, from top left to bottom right, for alpha = 0.8, 0.85, 0.9 and 0.95
Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By Aditee Kumthekar

May 2004

Chair: Anand Rangarajan
Major Department: Computer and Information Science and Engineering

Environmental factors, the settings of the image acquisition device, and the material properties of an object all directly affect the formation of an image. Registration involves aligning two images obtained under different conditions, a task for which several information-theoretic methods exist. Entropy estimation is an important step in these methods and traditionally requires estimating the density function of the images. This thesis instead estimates the entropy directly, without a preliminary density estimation step. This is achieved by exploiting a powerful theorem which relates the asymptotic length of the minimum spanning tree formed from the samples to the Renyi entropy of the density underlying the samples. Since the density estimation step is avoided, so is the careful tuning of the several parameters that most density estimation methods require. Moreover, since the spanning tree method estimates entropy in high dimensions, feature information can be included in the registration process. Affine registration of face images of different subjects, with different gestures and taken under different lighting conditions, is demonstrated.

CHAPTER 1
INTRODUCTION

The entire field of computer vision is motivated by the problem of designing a system with the capabilities of human vision. When we look around, our eyes and our brain immediately analyze the relationships between the different objects in our immediate environment. We usually do not have to think long to interpret what we see. It comes naturally to us.
However, we have not been able to emulate this amazing system of perception even 40 years after the inception of the field of computer vision. Why is this so? Because a very complex relationship exists between the physical factors and the way an image is formed. Such factors include the reflectivity and absorbency of the illuminated surface, the intensity and direction of illumination, and environmental factors such as the presence of dust and speckle in the atmosphere, rain, etc. Moreover, the mode in which an image is acquired (for example, a camera) and the parameters that calibrate the instrument (for example, the shutter speed of the camera) contribute to the image formation. Viola and Wells [ ] have represented this event of image formation as an equation:

v(T(x)) = F(u(x), q),    (1.1)

where T(x) is the transformation function which determines the orientation of the image; u(x) is the function which describes the surface of the object; q represents the other parameters that affect image formation, such as lighting and environment; v() is the function that formulates the pixel intensities; and F(u(x), q) is the function which incorporates the various parameters mentioned above that play a role in image formation. Modelling this function is not a trivial task, as a vast number of parameters is involved. Many vision-related systems attempt functionalities such as object recognition (identifying objects against the background region), indexing of image databases (the efficient retrieval of images similar to a query image), and object modelling (the mathematical representation, coding, and synthetic generation of an image from a model). Many such applications require that the images under consideration be aligned before further processing can be performed on them. Apart from these applications, which require alignment as an initial preprocessing step, there is a plethora of applications which rely on accurate image alignment.
One such application is "anomaly detection" in medical imaging, which refers to detecting any changes in a particular organ over a period of time, either due to disease or due to treatment, e.g., detecting whether the post-operative MRI of a patient shows the removal of the tumor that was seen in the pre-operative scan. In order to do this, we require accurate alignment of the images. Image registration is precisely this task of spatially aligning two images so that corresponding pixels in the images refer to the same portion or area of the scene. Alternatively, image registration can be thought of as finding the transformation which aligns the two images accurately. You might now wonder why this is not a simple task. Let us see why. We saw that a wide array of factors contributes to image formation. The challenge of registration lies in matching two images taken at different times, under different conditions, and from different viewpoints, possibly with different image acquisition techniques. Recently, the problem of multimodality-based image registration, which refers to registering images taken under different lighting conditions, has gained importance. At the core, registration methods can be viewed as a combination of certain choices about the following parameters, namely [ ]: Feature space: the parameters of the image used to perform registration. In some techniques, the raw image intensities are used directly; these are called intensity-based methods. There are also feature-based methods, which extract some information from the raw image (termed features) and then use these features to perform registration. The features can be extracted via image segmentation or by using an information-theory based technique. Search space: the set of transformations applied to the image during the registration process.
The spatial transformations run the gamut of rigid, affine, projective, polynomial, non-rigid (morphing), etc. These are explained later. Search strategy: the technique used to search the entire set of transformations. In an exhaustive search, the entire space is scanned. In a hierarchical search, the set of transformations is searched using a coarse-to-fine hierarchy. Finally, there are techniques that use local gradient information of the image similarity measure to determine a sequence of transformations which progressively minimizes the measure. Similarity metric: the criterion used for judging the quality of the achieved registration. The similarity metric is a figure of merit used to rank-order the various spatial transformations. One might wonder: what is this similarity metric anyway? Can't we simply compare the intensities at corresponding pixel locations to verify whether the images have been registered? This is not so straightforward. In figure 1-1, we show two registered images A and B. Clearly, the intensities in the two images at the same location do not correspond. There have been many recent efforts focused on information-theory based methods. The information-theory based criterion usually provides a registration objective function which has to be minimized (or maximized) over a range. One such objective function is the mutual information objective function [ ], which was used to perform multimodality-based image registration. Mutual information is a measure of the amount of overlapping information content between two images.
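As a preview of how such a criterion behaves, mutual information between two images can be estimated from a joint intensity histogram. The sketch below is illustrative only: the two tiny "images" are made-up data, quantized to four intensity levels, and the second is the first with a shifted intensity map, mimicking the situation in figure 1-1 where aligned images have mismatched intensities.

```python
import math
from collections import Counter

# Two tiny "images" quantized to four intensity levels (made-up data).
# img_b is img_a under a one-to-one intensity remapping, so the pixels
# disagree everywhere even though the images are perfectly aligned.
img_a = [0, 0, 1, 1, 2, 2, 3, 3]
img_b = [1, 1, 2, 2, 3, 3, 0, 0]

n = len(img_a)
joint = Counter(zip(img_a, img_b))   # joint intensity histogram
pa = Counter(img_a)                  # marginal histogram of image A
pb = Counter(img_b)                  # marginal histogram of image B

# I(A; B) = sum_{a,b} p(a,b) * log2( p(a,b) / (p(a) p(b)) )
mi = sum(
    (c / n) * math.log2((c / n) / ((pa[a] / n) * (pb[b] / n)))
    for (a, b), c in joint.items()
)
```

Because the intensity correspondence is one-to-one, `mi` here equals the full marginal entropy (2 bits), even though a direct pixel-by-pixel intensity comparison would report total disagreement; this is exactly why mutual information suits multimodality registration.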
An interesting metaphor for intensity-based and information-theory based similarity measures is that the intensity-based technique can be viewed as "What We See Is All We Get (WWSIAWG)", while the information-theory based criteria can be viewed as "What We See Is What We Mean (WWSIWWM)".

Figure 1-1: Two registered images with different intensity distributions (Image A and Image B)

Mutual information techniques have now become widespread in registration, especially in medical imaging. The mutual information between any two images can be computed either from the raw image intensities or from feature vectors extracted from each image. Rarely do we see registration approaches that combine intensities and features. This thesis attempts such a combination. Histograms are commonly used to estimate the entropy (information content), which is then used to calculate the mutual information. When feature-based methods are used, the dimensionality of the feature space increases. Histogram methods suffer from the "curse of dimensionality" and thus cannot easily be used to estimate the entropy of a high-dimensional feature set. The histogram-based methods are also extremely sensitive to the number of bins. Apart from this, histograms estimate the discrete Shannon entropy while, as we explain later, we need to estimate the continuous differential entropy.(1) This calls for the use of non-parametric methods to estimate the entropy in a high-dimensional feature space. One such technique is to use spanning graphs to estimate entropy. The early work in this area was done by Steele [ ], who showed how the length of the minimum spanning tree (MST) of a complete graph built from the Euclidean distances between the feature vectors is directly proportional to the integral of an appropriately defined power of the probability density defined on the feature space. Later, the group at the University of Michigan, Alfred Hero et al.
[ ], showed how the aforementioned relation between the length of the spanning tree and the integral of the power of the density can be used to estimate the Renyi entropy, which in turn can be used to perform registration. The search space of affine transformations is one such space, which includes the six parameters of rotation, scale, shear and translations in R^6. The affine transformation is applied globally to the entire image. In this thesis, we have closely followed Hero's work and have successfully shown how the MST-based approach can be used to perform registration over the six parameters of the affine transformation. We have computed an information-theory based criterion similar to the conventional mutual information using the Renyi entropy, as proposed in [ ], and then minimized it over an affine search space. Also, we have combined the use of intensities and features in the same information-theory based registration technique. The Renyi entropy is estimated from the high-dimensional feature space using the minimal spanning tree based method.

(1) Information on entropy, its types, and mutual information is presented in Chapter 2.

After this concise introduction, Chapter 2 covers the background material, Chapter 3 covers the theory, Chapter 4 covers the implementation details and, finally, the last chapter covers the results, conclusions and future work.

CHAPTER 2
BACKGROUND

2.1 Entropy and Mutual Information

From a broad perspective, the entropy of a random variable is defined as a measure of the uncertainty of the variable. Alternatively, it can be defined as the amount of information required to represent the variable. Thus, the more improbable a random variable, the greater its entropy. Shannon [ ] derived a measure of entropy for discrete variables. Mathematically, the entropy H(X) of a random variable X is defined as

H(X) = -\sum_{x \in X} p(x) \log p(x).    (2.1)

Normally, the logarithm is taken to base 2, in which case the entropy is expressed in bits. Thus, entropy can also be expressed as the length of the code required to represent the random variable. If g(x) are the actual values that a variable X can assume, then the expected value of g(X) is written as

E_p g(X) = \sum_{x \in X} g(x) p(x).    (2.2)

Thus, we can write the entropy of a random variable as

H(X) = E \log \frac{1}{p(X)}.    (2.3)

This leads to the conclusion that H(X) \ge 0, since 0 \le p(x) \le 1 implies that

\log \frac{1}{p(x)} \ge 0.    (2.4)

2.1.1 Joint and Conditional Entropy

Joint and conditional entropy extend the concept of entropy to a pair of distinct random variables. Two random variables are said to be statistically independent when their joint density is the product of their marginal densities. Thus, when the two random variables are independent, knowledge of one has no effect on knowledge of the other. On the other hand, when the variables are totally dependent, that is, if Y = f(X), then Y is not random when X is known. The mathematical representation of the joint entropy is

H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)    (2.5)

and the conditional entropy of one variable given the other is defined as

H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log p(y|x)    (2.6)
       = -E_{p(x,y)} \log p(Y|X).    (2.7)

By the chain rule we get

H(X, Y) = H(X) + H(Y|X).    (2.8)

Two variables are said to be independent if

H(Y|X) = H(Y),    (2.9)

that is, knowledge of X has no effect on knowledge of Y. Then we have

H(X, Y) = H(X) + H(Y).    (2.10)

2.1.2 Relative Entropy and Mutual Information

Relative entropy is a measure of the distance between two probability distributions. To explain it better, suppose there is a random variable with probability distribution p whose entropy is H(p). Since we do not know the exact distribution p, we assume that it is q, with entropy H(q). The relative entropy D(p||q) is the difference in the number of entropy bits of p and q.
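The joint, conditional and chain-rule identities above can be checked numerically on a toy joint distribution. The sketch below is illustrative only; the joint probability table is made up for the example.

```python
import math

# Toy joint distribution p(x, y) over X in {0, 1}, Y in {0, 1} (made-up values).
p_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

def H(dist):
    """Shannon entropy (base 2) of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Marginal p(x), then H(Y|X) = sum_x p(x) * H(Y | X = x).
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
h_y_given_x = sum(
    p_x[x] * H([p_xy[(x, y)] / p_x[x] for y in (0, 1)]) for x in (0, 1)
)

h_xy = H(p_xy.values())
# Chain rule (Eq. 2.8): H(X, Y) = H(X) + H(Y|X)
assert abs(h_xy - (H(p_x.values()) + h_y_given_x)) < 1e-12
```

Since X and Y are correlated in this table, H(Y|X) < H(Y), so the independence case of Eq. 2.10 does not hold here; the chain rule of Eq. 2.8 holds regardless.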
It is also called the Kullback-Leibler distance and is mathematically represented as

D(p||q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.    (2.11)

Mutual information, I(X; Y), is defined as the amount of information that one random variable carries about another. It is the relative entropy between the joint distribution and the product of the marginal distributions of the two random variables, expressed as

I(X; Y) = D(p(x, y) \| p(x)p(y))    (2.12)
        = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x)p(y)}.    (2.13)

It can also be shown that

I(X; Y) = H(X) + H(Y) - H(X, Y),    (2.14)
I(X; Y) = I(Y; X) \quad \text{and} \quad I(X; Y) \ge 0.    (2.15)

Additional properties of mutual information are given in the book by Cover and Thomas [ ].

2.1.3 Differential Entropy

While entropy is a measure of uncertainty in the discrete domain, differential entropy is a measure of the uncertainty of a variable in the continuous domain. It is defined as

h(X) = -E[\log(p(X))]    (2.16)
     = -\int_{-\infty}^{\infty} p(x) \log(p(x)) \, dx.    (2.17)

The major difference between entropy and differential entropy is that differential entropy can take negative values. Thus, there is no direct relationship between code length and differential entropy, since code length cannot be negative.

2.1.4 Renyi Entropy

The entropy described in Section 2.1.3 is the differential Shannon entropy. The Renyi entropy is a generalized version of the Shannon entropy. The Renyi entropy h_\alpha(X) of order \alpha is defined as

h_\alpha(X) = \frac{1}{1 - \alpha} \log \int f^\alpha(x) \, dx,    (2.18)

for -\infty < \alpha < \infty, \alpha \ne 1. The discrete version of the Renyi entropy is defined as

H_\alpha(X) = \frac{1}{1 - \alpha} \log \sum_{x \in X} p^\alpha(x).    (2.19)

As \alpha \to 1, we obtain the Shannon entropy function from the Renyi entropy function:

h(X) = h_1(X) = -\int p(x) \log(p(x)) \, dx.    (2.20)

In the discrete domain, as \alpha \to 1, H_\alpha(X) = H(X). Viola and Wells [ ] have stated that the task of registering two images is equivalent to maximizing the mutual information between the two images.

2.2 Histogram Method for Estimating Entropy

Histogram-based density estimation divides the range of the data into bins and then computes the normalized frequency of the data placed in the bins.
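This binning-and-normalizing step, followed by plugging the bin frequencies into the entropy formula, can be sketched for one-dimensional data. The example below is illustrative only: it assumes Gaussian samples, uses Sturges' rule for the bin count, and adds the log of the bin width to convert the discrete plug-in estimate into a differential entropy estimate.

```python
import math
import random

random.seed(0)
n = 1000
samples = [random.gauss(0.0, 1.0) for _ in range(n)]

# Sturges' rule for the number of bins: M = 1 + 3.322 * log10(n).
M = 1 + math.ceil(3.322 * math.log10(n))

lo, hi = min(samples), max(samples)
width = (hi - lo) / M
counts = [0] * M
for s in samples:
    i = min(int((s - lo) / width), M - 1)   # clamp the max sample into the last bin
    counts[i] += 1

# Plug-in estimate of the differential entropy (nats):
# h ~= -sum f_i log f_i + log(bin width)
probs = [c / n for c in counts]
h_est = -sum(p * math.log(p) for p in probs if p > 0) + math.log(width)

# True differential entropy of N(0, 1) is 0.5 * log(2*pi*e) ~= 1.4189 nats.
true_h = 0.5 * math.log(2 * math.pi * math.e)
```

With 1000 samples, `h_est` lands near `true_h`, but the estimate depends visibly on the bin count, which is the sensitivity to binning that motivates the MST-based estimator used in this thesis.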
The normalized frequency counts sum to one. A one-dimensional histogram of data with n samples and b bins is

f(a_i) = \frac{n_i}{n},    (2.21)

where a_i denotes bin number i, n_i is the number of samples falling in that bin, and 1 \le i \le b. For multivariate data distributed across d dimensions, histogram-based density estimation is generally performed by constructing M bins along each of the d dimensions. If the number of bins is large, the histogram fails to capture the shape of the probability density function: the variance is high while the bias is reduced. Conversely, as the number of bins decreases, the estimate becomes oversmoothed: the variance narrows but the bias increases. Algorithms that determine a suitable bin count for the histogram include Sturges' rule [ ], which states that for n samples, M = 1 + 3.322 \log_{10} n. Some other approaches predict the bin size from a priori knowledge of the data. In this thesis, we have chosen the bin size statistically. Histogramming suffers from the curse of dimensionality. That is, if the number of data dimensions is d, with M bins in each direction, then the total number of bins is M^d. Such exponential growth incurs drawbacks. First, since M^d is large for higher-dimensional data, more data points are needed to populate the histogram; otherwise, the result is a sparsely populated histogram with numerous empty bins. Second, data in higher dimensions tend to be correlated and thus project onto selected dimensions. This makes the histogram-based method of density estimation unsuitable for higher-dimensional data.

2.3 Minimum Spanning Trees

Before describing the theory behind the use of minimum spanning trees for registration in Chapter 3, let us define a minimum spanning tree. Minimum spanning trees (MSTs) are used in solving many real-world problems. For example, consider the case of a network with V nodes and E undirected connections between nodes.
This can be represented as a connected, undirected graph G = (V, E) containing V vertices and E edges. Now suppose that all the edges are weighted, i.e., for each edge (u, v) c E we have an associated weight w(u, v). A weight can be used to represent real world quantities such as cost of a wire, propagation delay etc between two nodes in a network. A spanning tree is defined as a .,. 1*- li.- graph that connects all the vertices. A minimum spanning tree is a spanning tree with the minimum weight. Suppose we represent the spanning tree as T C E, which connects all the vertices, and whose total length is w(T), then the minimum spanning tree is defined as, min(w(T)) = w(u, v). (2.22) (u,v) CT Generic MST algorithm. The book by Cormen et al. [ ] gives a supported analysis of minimum spanning tree algorithms. The MST algorithm falls in the category of greedy algorithms. Greedy Algorithms are algorithms that make the best choice at each decision making step. In other words, at every step, greedy algorithms make the locally optimum choice and hope that it leads to a globally optimum solution. The greedy MST algorithm builds the tree step-by- step, incorporating the edge that causes minimum increase in the total weight at each step, without adding any cycles to the tree. Suppose there is a connected, undirected graph G = (V, E) with the weight function w : E R. While finding the minimum spanning tree for graph G, the algorithm manages at each step an edge-set S which is some subset of the MST. At each step, edge (u, v) is added to subset S such that it does not violate the MST property of S. This makes SU(u, v) a subset of the Minimum Spanning Tree. The edge which is added at each step is termed a "safe edge". The algorithm can be written as GENERIC-MST(G, w) 1. S--0 2. whileS does not form a spanning tree 3. do find a safe-edge (u,v) which can added in S 4. S -SU(u, v) 5. 
There are two popular algorithms for computing the minimum spanning tree, Prim's algorithm and Kruskal's algorithm (refer to [ ]). We have used Kruskal's algorithm to find the minimum spanning tree. Its description follows.

Kruskal's algorithm for MST. Kruskal's algorithm is an instance of the generic MST algorithm described in the preceding sub-section. In Kruskal's algorithm the set S, which is a subset of the minimum spanning tree, is a forest. At each step, Kruskal's algorithm selects as the safe edge the minimum-weight edge that connects two trees of the forest. Initially, the edges are sorted in non-decreasing order of their weights. At each step, one finds the minimum-weight edge in the graph not already present in the minimum spanning tree that connects two trees of the forest, and adds it. This process is repeated until all the vertices are included in the tree. The algorithm can be written as

MST-KRUSKAL(G, w)
1.  S <- empty set
2.  for each vertex v in V[G]
3.      do MAKE-SET(v)
4.  sort the edges of E into non-decreasing order by weight w
5.  for each edge (u, v) in E, in order of non-decreasing weight
6.      do if FIND-SET(u) != FIND-SET(v)
7.          then S <- S union {(u, v)}
8.               UNION(u, v)
9.  return S

In steps 1-3, the subset S is initialized to the empty set and |V| forests, each with a single vertex, are created. Step 4 sorts the edge set E into non-decreasing order of weight. In steps 5-8, an edge (u, v) is found such that endpoint u belongs to one forest and endpoint v belongs to another; this edge is incorporated into the subset S. The algorithm stops when all vertices are included in the tree. The original graph is shown in Figure 2-1, and an illustration of how Kruskal's algorithm works is shown in Figure 2-2.

Figure 2-1: Original graph
Figure 2-2: MST by Kruskal's algorithm

A more formal proof of why Kruskal's algorithm works [ ] is as follows. Let G be an undirected, connected graph, let t be the minimum spanning tree produced by Kruskal's algorithm, and let t' be an actual minimum spanning tree.
One shows that the trees t and t' have the same cost. Let E(t) and E(t') be the edge sets of the two trees, each containing n - 1 edges, given that the total number of vertices in the graph is n.

PROOF: Case 1: If E(t) = E(t'), then the lengths of t and t' are equal. Thus Kruskal's algorithm generates the minimum spanning tree in this case.

Case 2: E(t) != E(t'). Assume that there is an edge p such that p is in E(t) and p is not in E(t'). If we include p in t', then t' will have n edges and, thus, a unique cycle with edges p, e_1, e_2, ..., e_k. Of these k + 1 edges there will be at least one edge e_j not in tree t. The weight of e_j must be greater than or equal to the weight of p because, if its weight were less than that of p, the greedy Kruskal's algorithm would have included it while forming the tree t. If any such cycle-forming edge is removed from tree t', a different tree t'' is formed whose cost is less than or equal to the cost of tree t', because weight(e_j) >= weight(p). Thus, tree t'' is a minimum-cost tree. By repeating the above-mentioned transformation, we can transform tree t' into tree t. Thus, the tree generated by Kruskal's algorithm is a minimum spanning tree. The time complexity of Kruskal's minimum spanning tree algorithm is O(|E| log |E|).

Prim's spanning tree. Unlike in Kruskal's algorithm, in Prim's algorithm the subset S is always a single tree and never a forest. The algorithm is given by

MST-PRIM(G, w, r)    G: graph, w: weights, r: root
1.  Q <- V[G]
2.  for each u in Q
3.      do key[u] <- infinity    (key[v]: minimum weight of any edge connecting vertex v to the tree)
4.  key[r] <- 0
5.  pi[r] <- NIL    (pi[v]: name of the parent of v in the tree)
6.  while Q is not empty
7.      do u <- EXTRACT-MIN(Q)
8.         for each v adjacent to u: if v in Q and w(u, v) < key[v]
9.             then pi[v] <- u
10.                 key[v] <- w(u, v)

The time complexity of this algorithm is O(|E| + |V| log |V|). More efficient algorithms for computing minimum spanning trees are given in Chazelle [ ] and Pettie and Ramachandran [ ].
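The pseudocode above can be made concrete; the following is a minimal Python sketch of MST-KRUSKAL with a simple union-find (the edge-list representation and the example graph are illustrative choices, not from the thesis):

```python
def kruskal_mst(n, edges):
    """MST of a graph with vertices 0..n-1; edges is a list of (w, u, v)."""
    parent = list(range(n))

    def find(x):                      # FIND-SET with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0.0, []
    for w, u, v in sorted(edges):     # non-decreasing weight (step 4)
        ru, rv = find(u), find(v)
        if ru != rv:                  # safe edge: joins two different trees
            parent[ru] = rv           # UNION
            tree.append((u, v))
            total += w
    return total, tree

# Example: the MST of this 4-vertex graph has weight 1 + 2 + 4 = 7.
weight, tree = kruskal_mst(4, [(1, 0, 1), (2, 1, 2), (3, 0, 2), (4, 2, 3), (5, 1, 3)])
```

Prim's algorithm would return the same total weight; Kruskal's is convenient here because the edges are processed in sorted order exactly as in step 4 of the pseudocode.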
Some algorithms [ ] compute the minimum spanning tree dynamically, as vertices are added or removed.

2.4 The Problem of Registration

Image registration can be defined as the problem of finding the best transformation that, when applied to one image, aligns it with another image. Brown [ ] and Pluim et al. [ ] surveyed image registration methods, which can be classified both by the algorithms actually used to perform registration and by the applications of the registration techniques. Based on application, the methods can be classified as registration of images with different modalities, registration of different anatomical objects, and so on. The problem of registering images with different modalities is called multimodality registration. This problem involves registering two images taken with different sensors or under different lighting conditions. An example of multimodality registration is the registration of MRI and CT images. In this thesis, we have registered camera images of the faces of different people, taken under different lighting conditions and with different expressions. Based on the methods used for registration, the techniques are classified by the preprocessing of the images, the type of transformation applied, the interpolation method used, the search strategy used, the image features used, and the measure of registration.

Mutual information for registration. As measures of registration, several statistical criteria are used, the majority of which involve entropic measures to estimate how well the images are aligned. These measures involve estimating the entropies of the images and formulating different criteria based on them. Mutual information is one such measure, defined as

I(X, Y) = H(X) - H(X|Y)    (2.23)
I(X, Y) = H(X) + H(Y) - H(X, Y).    (2.24)
Some methods use normalized mutual information to register the images, which is computed as

NMI(X, Y) = (H(X) + H(Y)) / H(X, Y).    (2.25)

Observe that H(X|Y) is a measure based on the conditional probability p(x|y), i.e., of a particular value x in X given that the value in Y is y. Thus, Equation (2.23) can be interpreted as the uncertainty about X minus the uncertainty about X when Y is known. It can also be stated that I(X, Y) = I(Y, X). Thus, when the images are aligned, the amount of information the images have about each other is maximized. As a result, registration involves finding the transformation that maximizes the mutual information. Another view of mutual information is obtained from Equation (2.24), which describes it as the sum of the marginal entropies minus the joint entropy. One might argue that, as the images are transformed, the marginal entropies of the images remain constant and the only factor that changes is the joint entropy; from this it would seem that maximizing the mutual information is equivalent to minimizing the joint entropy of the two images. But this is not true, because when computing mutual information the entropies are calculated from the overlap region. The marginal entropies of the images thus change along with the joint entropy, as the number of pixels in the overlap region changes with a given transformation. In such a scenario, if the overlap region is very small, the joint histogram will show a sharp peak corresponding to the overlap of one image with background samples from the second image, resulting in a low joint entropy. This gives an incorrect answer, which ironically favors the separation of the two images. By adding the marginal entropy terms in Equation (2.24), a penalty is introduced, since the marginal entropies will also be low when the images are misaligned. This makes mutual information less sensitive to the problem of background.
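Equation (2.24) can be evaluated directly from a joint histogram; the following is a small NumPy sketch (the bin count and the synthetic test signals are illustrative choices, not from the thesis):

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), Eq. (2.24), from a joint histogram."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def entropy(p):
        p = p[p > 0]                 # 0 log 0 = 0 by convention
        return -(p * np.log(p)).sum()

    return entropy(px) + entropy(py) - entropy(pxy.ravel())

rng = np.random.default_rng(0)
x = rng.normal(size=2000)
noise = rng.normal(size=2000)
# A variable shares maximal information with itself (I(X;X) = H(X)),
# and very little with an independent signal.
mi_self = mutual_information(x, x)
mi_indep = mutual_information(x, noise)
```

Against itself, the measure equals the marginal entropy H(X), its maximum; against independent noise it drops to nearly zero, which is the behavior a registration search exploits.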
2.5 Image Transformations

An image transformation is a function that maps the locations of points in one image to different locations in another image. Transformations can be global or local. A global transformation applies the same mapping to the entire image, while a local transformation consists of a collection of several mapping functions, each of which is specific to a particular region of the image. Apart from these broad classes, image transformations can be classified as follows.

Rigid: involves translations and rotations alone.
Affine: involves translations, rotations, scale and shear.
Perspective: mainly used to map a 3D image to a 2D image by projecting the data.
Non-rigid: the mapping involves morphing and distortions of the image.

In this study, we have used the affine transformation and thus will elaborate more on that.

Affine transformations. Affine transformations involve rotation, scale, shear and translation. The composite transformation matrix is decomposed into scale, rotation, shear, rotation (as suggested in the paper [ ]1) and translation. The composite matrix can be written (rows separated by semicolons) as

T = [a b 0; c d 0; e f 1],

whose upper-left 2x2 block can be further decomposed as

[a b; c d] = [2^s 0; 0 2^s] [cos(theta) sin(theta); -sin(theta) cos(theta)] [2^t 0; 0 2^(-t)] [cos(phi) sin(phi); -sin(phi) cos(phi)],

where s is the scale parameter, t is the shear parameter, theta and phi are the rotation angles, and e and f are the translations in the x and y directions. We thus perform our search over six degrees of freedom: two rotations, one scale, one shear and two translations. An interesting property of this affine parameterization is that it does not allow reflections, and it gives a clean decomposition of the composite affine matrix that allows the user to set the different parameters individually.

2.6 Independent Component Analysis

Registration of two images can be performed by using the pixel intensity values, or by extracting features from the images.
Features can be extracted simply by passing the image through a set of filters, each of which detects one or more features of interest. Independent component analysis (ICA) is one such technique for extracting features. The motivation for the technique of independent component analysis was to solve the "cocktail party problem", which can be posed as follows.

1 The paper suggests decomposing into scale, rotation and two shears. However, it also mentions that the second shear can be replaced by a second rotation.

Suppose there are two people speaking simultaneously in a room, and we have two microphones to capture their voices. The outputs of the two microphones can be written mathematically as

x1(t) = a11 s1 + a12 s2    (2.26)
x2(t) = a21 s1 + a22 s2    (2.27)

where x represents the output of a microphone, t is time, the s represent the sources, and the a represent the weights with which the source signals are mixed. ICA attempts to find the original sources of the signals (i.e., the speakers in the above case) from the mixed signal input (i.e., from the microphones). ICA thus involves recovering the independent source signals from only the sensor observations, which are unknown weighted linear mixtures of the signals from independent sources. Thus, ICA searches for a linear unmixing matrix that, when multiplied by the observed multivariate mixed data, gives statistically independent components of the data.

Definition of independent components. Two variables are said to be statistically independent when neither variable carries information about the other. The joint probability of two independent variables is the product of the marginal probabilities of the two variables: two random variables y1 and y2 are considered statistically independent when p(y1, y2) = p(y1) p(y2). For n such variables, the joint density must be a product of the n marginal densities of the variables. Independent variables are also uncorrelated in nature.
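The factorization p(y1, y2) = p(y1) p(y2) can be checked empirically; here is a short NumPy sketch with two independent discrete variables (the alphabet sizes and sample count are arbitrary choices, not from the thesis):

```python
import numpy as np

rng = np.random.default_rng(0)
y1 = rng.integers(0, 4, size=200_000)     # independent discrete variable, 4 symbols
y2 = rng.integers(0, 3, size=200_000)     # independent discrete variable, 3 symbols

# Empirical joint and marginal probabilities.
joint = np.zeros((4, 3))
np.add.at(joint, (y1, y2), 1)
p_joint = joint / joint.sum()
p1, p2 = p_joint.sum(axis=1), p_joint.sum(axis=0)

# For independent variables the joint factorizes into the outer product
# of the marginals, up to sampling noise.
gap = np.abs(p_joint - np.outer(p1, p2)).max()
```

With dependent variables (for example, y2 derived from y1) the gap would be large, which is exactly the deviation that entropic measures of dependence detect.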
Also, the independent components should be as non-Gaussian as possible. The reason behind this is that the Gaussian distribution is symmetric in nature, which makes it impossible to find the directions of the components of the mixing matrix. The obvious question that now arises is: what are the different ways of measuring independence?

Measuring independence. Non-Gaussianity, and thus the independence of a variable, can be measured by using an information-theoretic quantity called negative entropy, also called negentropy. Previously we stated that the differential entropy of a random variable x is given by the formula

h(x) = - Integral p(x) log p(x) dx.    (2.28)

Gaussian variables have the largest entropy among all variables of equal variance. On the other hand, variables that have sharp peaks in their probability distribution functions (less Gaussian) have less entropy. Based on this observation, negentropy is defined as

J(y) = H(y_gauss) - H(y),    (2.29)

where y_gauss is a Gaussian variable with the same variance as y. Thus, the negentropy is always positive, except for Gaussian variables, for which the negentropy is zero.

The FASTICA algorithm. We have used the FASTICA package (available at http://www.cis.hut.fi/projects/ica/fastica/, accessed March 2004) [ ] to find the ICA of images. This package implements the FastICA algorithm, which constructs a neural network whose weights are updated based on a learning rule. Roughly, the algorithm is as follows:
1. Negentropy is used as a measure of the non-Gaussianity.
2. The cost function is an approximation of the negentropy.
3. Initially, random weights are assigned as the starting point.
4. The value of the weights for the subsequent iteration is estimated by the Newton-Raphson method.
5. The above step is repeated until convergence.
Assuming that the images are a mixture of signals from statistically independent sources, we can use ICA to extract features from the images.

CHAPTER 3
THEORY OF MINIMUM SPANNING TREE ENTROPIES

In this chapter, we consider minimum spanning trees for registration.
3.1 Entropy Estimation for Registration

In the previous chapter it was shown that mutual information requires the estimation of entropy, which is calculated by estimating the probability density of the underlying data. For image registration, it might be necessary to estimate discrete instead of continuous densities. There are two broad approaches to estimating the probability density: parametric methods and non-parametric methods. Both methods require the estimation of parameters. However, non-parametric methods estimate only width parameters, like the bin size in histograms or the window size in Parzen-window density estimation as described in [ ], while parametric methods estimate more complicated parameters which describe the underlying density, such as the mean, variance, etc. An overview of methods for density estimation can be found in the book by Christopher Bishop [ ]. To elaborate a little:

Parametric methods: assume a density model for the underlying distribution of the data. These methods then attempt to find the parameters of the particular model which fit the data optimally. One of the drawbacks of this approach is that the assumed density model might not be true. Common parametric density estimation techniques are Hidden Markov Models (HMMs), Bayesian networks, etc. HMMs model the underlying distribution with different states and transitions from one state to another governed by the underlying probabilities, as described in [ ]. Bayesian networks model a causal system in which the occurrence of an event at one node can trigger several changes in the connected nodes, as described in depth in [ ]. Both of the above-mentioned approaches have to model complex graphs, which are defined by a large set of parameters.

Non-parametric methods: These methods make no assumption about the density model which underlies the data, and let the data decide the nature of its distribution.
Histogram methods and Parzen windows are examples of such methods. They require estimation of a width parameter, or smoothing factor (the number of bins in a histogram, or the Parzen window size "sigma"), which determines how well the density estimate "fits" the data.

Semi-parametric methods: Bishop [ ] advances a class of density estimators that tries to combine the best features of parametric and non-parametric methods. Techniques such as mixture models fall in this category. These techniques assume Gaussian component densities for the data and thus have to estimate Gaussian-like parameters. Since both parametric and semi-parametric methods assume a specific underlying distribution, they have to estimate several distribution-specific parameters such as the mean, variance and covariance (mu, sigma, Sigma). In contrast, non-parametric methods estimate a smoothing parameter, which is usually some kind of width parameter. Image registration requires determining the pose parameters in a particular transformation space. If parametric or semi-parametric methods are used for density estimation in registration, then the density model parameters need to be estimated in addition to the pose parameters. In contrast, non-parametric methods require the estimation of only the width (smoothing) parameters and the pose parameters. Thus, non-parametric density estimators are generally preferred over parametric or semi-parametric methods for image registration. Traditional entropy estimation, either discrete or continuous, requires a priori density estimation. However, the approach taken in this study avoids the entire density estimation step: the Renyi entropy is estimated directly using samples from the image, via the relationship between the asymptotic length of the minimum spanning tree and the Renyi entropy. The Renyi entropy and its properties are discussed in Section 2.1.4.
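As an illustration of the width parameter in non-parametric estimation, a Parzen-window density estimate with a Gaussian kernel can be written in a few lines of NumPy (the kernel width sigma = 0.3 and the test data are arbitrary choices, not from the thesis):

```python
import numpy as np

def parzen_density(x_eval, samples, sigma):
    """Parzen-window estimate: p(x) = (1/n) sum_i K_sigma(x - x_i), Gaussian K."""
    diff = x_eval[:, None] - samples[None, :]
    kernel = np.exp(-0.5 * (diff / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
    return kernel.mean(axis=1)

rng = np.random.default_rng(0)
samples = rng.normal(size=2000)            # data drawn from a standard normal
grid = np.linspace(-4, 4, 81)
p = parzen_density(grid, samples, sigma=0.3)

# The estimate should integrate to ~1; sigma plays the role of the
# smoothing (width) parameter discussed above.
area = p.sum() * (grid[1] - grid[0])
```

Too small a sigma gives a spiky estimate and too large a sigma an oversmoothed one, mirroring the bin-size trade-off in histograms.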
The following section discusses the chronology of the development of the theory underlying entropy estimation by minimum spanning trees.

3.2 History

In the late 1950s, an interesting viewpoint on estimating entropy using graph algorithms came into existence. It began in 1959, when Beardwood, Halton and Hammersley [ ] first analyzed the travelling salesman problem (TSP) and stated that the TSP tour length of a set of multi-dimensional feature vectors is related to the entropy. Later, in 1988, J. M. Steele [ ] followed their work and stated that if a fully connected graph is constructed from a multi-dimensional feature space, such that the weight of an edge between any two points is the weighted Euclidean distance between the points, then the asymptotic length of a minimum spanning tree constructed from such a graph is related to the underlying probability density. Later, the group at the University of Michigan led by Dr. Alfred Hero showed how the asymptotic length of the minimum spanning tree is directly related to the Renyi entropy (refer to Section 2.1.4) [ ]. They further pointed out how this relationship between entropy and minimum spanning trees can be extended to a variety of graphs, like k-nearest-neighbor graphs or Steiner trees [ ]. The practical applications of such graphs are emphasized in [ ], with image registration being one such application. Minimum spanning trees and the Renyi entropy were initially discussed in Sections 2.3 and 2.1.4 respectively; some relevant portions are reiterated here as and when needed. Suppose there are n independent multivariate points in a d-dimensional space, x_i in R^d, and that there is a graph on these points with vertex set V = {x_1, x_2, ..., x_n} and edge set E = {(x_i, x_j) : 1 <= i < j <= n}, where each edge e in the edge set E carries a weight gamma(|e|).
The task of finding a minimum spanning tree involves finding, among all trees that span all the vertices in the set V, the tree with the minimum weight:

M(x_1, x_2, ..., x_n) = min_T Sum_{e in T} gamma(|e|),    (3.1)

where T ranges over the spanning trees of the point set in R^d and gamma is the weighting function that associates weights with the edges of the minimum spanning tree, defined as

gamma(x) = x^gamma, where 0 < gamma < d.    (3.2)

Let the points in the feature space R^d, d >= 2, be independently drawn from a distribution mu, with the weighting function satisfying gamma(x) ~ x^gamma and 0 < gamma < d, and let f be the density of the continuous part of mu. With c(gamma, d) a positive constant depending on gamma and d, Steele [ ] proved the following relationship between the length of a minimum spanning tree and a power of the density function f:

lim_{n -> inf} n^(-(d-gamma)/d) M(x_1, x_2, ..., x_n) = c(gamma, d) Integral f(x)^((d-gamma)/d) dx.    (3.3)

The next section shows how Equation (3.3), after a little algebraic manipulation, presents a clear link between the minimum spanning tree and the Renyi entropy.

3.3 Minimum Spanning Trees for Entropy Estimation

We discussed the alpha-order Renyi entropy, or simply the Renyi entropy, in Section 2.1.4 and represented it mathematically in Equation (2.18) as

h_alpha(X) = 1/(1-alpha) log Integral f^alpha(x) dx.    (3.4)

The relationship between the density function and the length of the minimum spanning tree is specified in Equation (3.3). If we rename the length of the minimum spanning tree in Equation (3.3) as

M(x_1, x_2, ..., x_n) = L_gamma(X_n) = min_T Sum_{e in T} |e|^gamma,    (3.5)

where |e| = ||x_i - x_j|| denotes the Euclidean distance between points i and j in the d-dimensional feature space, we can rewrite Equation (3.3) as

lim_{n -> inf} n^(-(d-gamma)/d) L_gamma(X_n) = c(gamma, d) Integral f(x)^((d-gamma)/d) dx.    (3.6)

Now, taking the log on both sides and substituting alpha = (d - gamma)/d, we obtain

log lim_{n -> inf} n^(-alpha) L_gamma(X_n) = log c(alpha, d) + log Integral f^alpha(x) dx.    (3.7)

Note that the constant c, which was a function of gamma and d, now becomes a function of alpha and d due to the substitution of alpha.
Dividing each side of the above equation by 1 - alpha yields

1/(1-alpha) log Integral f^alpha(x) dx = 1/(1-alpha) (log lim_{n -> inf} n^(-alpha) L_gamma(X_n) - log c(alpha, d)).    (3.8)

From Equation (2.18), the left-hand side of the preceding equation reduces to h_alpha(x):

h_alpha(x) = 1/(1-alpha) (log lim_{n -> inf} n^(-alpha) L_gamma(X_n) - log c(alpha, d)).    (3.9)

The short derivation above shows how the minimum spanning tree length converges to the Renyi entropy. Equation (3.9) is the fundamental result from which we estimate the entropy, approximately, in this thesis. The estimation is approximate because the constant c (which depends on alpha and the dimension d) is unknown and cannot be computed analytically. The strength of Equation (3.9) lies in the fact that the length of the minimum spanning tree can be easily computed in polynomial time, O(|E| log |E|) (refer to [ ]). Observe that Equation (3.9) is a very clean equation which does not require the estimation of any smoothing parameter that has to be chosen carefully, e.g., the number of bins in a histogram or the "sigma" in Parzen-window entropy estimation. Of course, we have to compute the length of the minimum spanning tree, but that is straightforward. For the number of bins in the histogram and for the Parzen window width, on the other hand, there is no explicit criterion for selecting the smoothing (width) parameter. An explanation of the rate of convergence of the MST entropy follows. The rate of convergence of the algorithm depends mostly on the number of samples that we choose to compute the MST. A paper by Hero et al. ("Convergence rates of minimal graphs with random vertices" by A. O. Hero, J. A. Costa and B. Ma, submitted to IEEE Transactions on Information Theory in March 2003) formulated the rates of convergence of the minimum spanning tree for piece-wise linear densities and then extended the proposition to a more general class of densities. We next discuss a proposition for the rate of convergence of minimum spanning trees.
We assume that the number of dimensions d >= 2, that 1 <= gamma <= d - 1, and that X_1, X_2, ..., X_n are independent sample points in the space [0, 1]^d with density f with support S contained in [0, 1]^d.

Proposition: Marginal block density. The density f has m^d levels. For the quasi-additive Euclidean functional L_gamma of order gamma, the following bound exists:

| E[L_gamma(X_1, X_2, ..., X_n)] / n^((d-gamma)/d) - c(gamma, d) Integral f(x)^((d-gamma)/d) dx | <= O((n m^(-d))^(-gamma/d)).    (3.10)

In Equation (3.10), the term L_gamma denotes the length of the minimum spanning tree whose edges have as weight the Euclidean distance between the points raised to the power gamma, as described in (3.5). Equation (3.10) means that the difference between the expected value of the length of the minimum spanning tree normalized by n^((d-gamma)/d) and the product of the constant and the integral of the density function is asymptotically smaller than (n m^(-d))^(-gamma/d). For example, if d = 2, then the upper bound on the difference mentioned above, which is the error, is O((n m^(-2))^(-1/2)). That is, as the number of samples n increases, the bound becomes tighter, i.e., the error reduces as the number of samples increases. Observe from Equation (3.10) that, as the dimensionality of the feature space increases, the rate of convergence decreases, and thus more samples will be needed for convergence. The preceding proposition is important in the context of image registration, as images have fixed levels of color values and thus can be considered equivalent to the piece-wise density considered above, where there are a total of m^d levels in the d-dimensional space. Interested readers should refer to the same paper for a generalization of the above result to a more general class of densities. The paper by Hero et al. ("Convergence rates of minimal graphs with random vertices" by A. O. Hero, J. A. Costa and B.
Ma, submitted to IEEE Transactions on Information Theory in March 2003) mentions that the aforementioned convergence criteria can also be applied to graph constructions like the Steiner tree, the minimal matching graph or the TSP tour, along with their power-weighted variants. However, the minimum spanning tree is chosen based on the paper [ ], which recommends the MST as the best method to estimate entropy due to its polynomial runtime. There exist many well-known algorithms for implementing the MST, which further supports the use of the minimum spanning tree as the entropy estimator.

3.4 Measure of Registration

Since Viola and Wells [ ] and Collignon et al. [ ] successfully used the Shannon mutual information to perform registration, it has been used as a measure of registration in many systems. In this study, minimum spanning trees as described in the preceding section are used to estimate the approximate Renyi entropies of the random variables. These entropies are further used to compute the measure of registration. The definition of the Renyi mutual information was first formulated by Alfred Renyi in the book [ ] as

I_alpha(X) = 1/(alpha-1) log Integral_{R^d} f_X^alpha(x) / (Prod_i f_{X_i}(x_i))^(alpha-1) dx.    (3.11)

Unfortunately, the Renyi mutual information cannot be easily computed. Thus, instead of the Renyi mutual information, we use a similar measure, which is simply the sum of the marginal entropies minus the joint entropy, analogous to the Shannon mutual information. This measure was proposed by Hild et al. in the paper [ ] as

I_alpha(X_1, ..., X_N) = Sum_{i=1}^{N} H_alpha(X_i) - H_alpha(X_1, ..., X_N).    (3.12)

In the case of two images, the above measure is the sum of the two Renyi marginal entropies minus the Renyi joint entropy:

H_alpha(x_1) + H_alpha(x_2) - H_alpha(x_1, x_2) = 1/(1-alpha) (log(L_1 / n^alpha) + log(L_2 / n^alpha) - log(L_12 / n^alpha)),    (3.13)

where L_gamma denotes the length of a minimum spanning tree constructed from a power-weighted graph, computed as L_gamma = min_T Sum_{e in T} |e|^gamma.
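A self-contained NumPy sketch of the estimator of Equation (3.9) and the measure of Equation (3.13), using Prim's algorithm on a dense distance matrix; the unknown constants c are dropped here, so the values are meaningful only for comparison across poses (the synthetic feature data are illustrative, not from the thesis):

```python
import numpy as np

def mst_length(D):
    """Prim's algorithm on a dense symmetric distance matrix; total MST weight."""
    n = D.shape[0]
    in_tree = np.zeros(n, dtype=bool)
    in_tree[0] = True
    best, total = D[0].copy(), 0.0      # cheapest edge from tree to each vertex
    for _ in range(n - 1):
        j = int(np.argmin(np.where(in_tree, np.inf, best)))
        total += best[j]
        in_tree[j] = True
        best = np.minimum(best, D[j])
    return total

def mst_entropy(X, gamma=1.0):
    """Renyi entropy per Eq. (3.9), dropping the unknown log(c) term."""
    n, dim = X.shape
    alpha = (dim - gamma) / dim
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)) ** gamma
    return np.log(mst_length(D) / n ** alpha) / (1 - alpha)

def registration_measure(X1, X2, gamma=1.0):
    """Eq. (3.13): marginal MST entropies minus the joint MST entropy."""
    joint = np.hstack([X1, X2])   # per-sample concatenation of the feature sets
    return mst_entropy(X1, gamma) + mst_entropy(X2, gamma) - mst_entropy(joint, gamma)

rng = np.random.default_rng(0)
F1 = rng.uniform(size=(150, 2))
aligned = F1 + 0.01 * rng.normal(size=F1.shape)     # strongly dependent features
unrelated = rng.uniform(size=(150, 2))              # independent features
m_aligned = registration_measure(F1, aligned)
m_unrelated = registration_measure(F1, unrelated)
```

On these synthetic features, the measure comes out larger for the strongly dependent pair than for the unrelated pair, which is the behavior the registration search exploits.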
This measure is very similar to the Renyi mutual information, as both are positive and both tend to zero when the joint probability is equal to the product of the marginal probabilities (when the variables are independent, as mentioned in the paper [ ]). However, the main reason why this measure is preferred over the Renyi mutual information is that it can be calculated using the MST entropy algorithm. Thus, our algorithm maximizes this measure to perform registration. Also, maximizing the mutual information is similar to minimizing the joint entropy of the images when the search space is narrow, as the marginal entropies remain more or less constant in a narrow search space. Referring to Equation (3.12), we see that, in a narrow search space where the marginal entropies are more or less constant, maximizing the measure maximizes the negative of the joint entropy; in other words, the joint entropy is minimized. In some cases (especially when the results are based on one parameter, like rotation) we have opted to minimize the joint entropy instead of maximizing the mutual information.

CHAPTER 4
ALGORITHM DESIGN

This chapter first describes the overall design of a generalized registration technique, followed by the various design decisions made while designing this algorithm, a brief description of the algorithm, and the implementation details.

4.1 General Registration Algorithm

The process of image registration can be represented procedurally as shown in Figure 4-1. The first step preprocesses the images so that they become appropriate for registration. Preprocessing involves performing operations on the images like noise removal, smoothing, cropping, etc. Human intervention can occur, depending on the type of images considered for registration. The next step is a quick, rough estimation of the transformation parameters, obtained by scanning the search space at low resolution.
Finally, when we have the coarse parameters for the transformation, the region around these rough parameters is scanned in order to get an accurate estimate. This multi-resolution approach helps as it speeds up the registration process. Alternatively, if the time requirements for estimating the measure (that we discussed in Section 3.4) are less stringent, then the initial rough estimation step can be omitted altogether and the search space searched at fine resolution.

Figure 4-1: Block diagram for image registration
Figure 4-2: Flow-chart for estimating the parameters of the transform

The flow chart for the block used to find the optimal transformation (either in the ballpark estimation or the fine tuning) is as shown in Figure 4-2. We apply the transformation to only one of the two images (the transforming image), while the other image is unchanged (the reference image). At each step of the algorithm, the entropic measure between the two images is calculated. If the measure has reached the maximum (or minimum, depending on its nature) in the space, then the parameters of the transform are declared to be the optimal parameters. If not, then the next transform to be applied to the transforming image is computed and applied, and the steps are repeated until the optimum value of the measure is found in the search space.

4.2 Design of the Algorithm

The design of a registration algorithm involves a sequence of decisions that need to be taken at each step, as discussed in this section. As is obvious from the previous discussion, the crux of this algorithm is the estimation of the continuous Renyi entropy. This estimation can be done by using the non-parametric, minimum-spanning-tree-based Renyi measure of Equation (3.8), which is expressed as

1/(1-alpha) log Integral f^alpha(x) dx = 1/(1-alpha) (log lim_{n -> inf} n^(-alpha) L_gamma(X_n) - log c(alpha, d)).    (4.1)
From this equation, the estimation of entropy is approximate because of the presence of the constant term c, which depends only on alpha and the dimension d. The first step is therefore to estimate the parameters by performing a rough image registration using histograms.

4.2.1 Preprocessing by Histograms

The main reason for using histograms is that they are computationally efficient. This allows us to get a ballpark estimate very quickly. The entropies (both joint and marginal) are estimated from the overlap area of the two images. The reason for estimating the joint entropy from the overlap area is obvious. However, we also estimate the marginal entropies from the overlap area because, as the images are transformed, the marginal entropy of the overlap area also changes dynamically with every transformation. The important parameter which decides how well the histogram performs is the bin size, which is chosen based on statistical analysis. Figure 4-3 shows empirical results of the histogram analysis when the number of bins is varied.

Figure 4-3: Effect on registration of varying the number of histogram bins (the number of bins is 8, 16 and 32, from top left to bottom right; the maximum should be present at 0).

Table 4-1: The ICA values
-0.2845   0.2875  -0.1523   0.1468
 0.3425  -0.5310  -0.4016   0.5821
 0.4048   0.0869  -0.4269  -0.0605
 0.0597   0.0197   0.0251   0.0777

As we can see, as the number of bins decreases (the bin size is large), the histogram is noisy and there are many peaks around the desired maximum. However, if the number of bins increases, the peak disappears, as the data are over-fitted.
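The over-fitting effect described above can be reproduced numerically; the following NumPy sketch estimates the mutual information of two independent variables with a coarse and a fine binning (the sample size and bin counts are arbitrary choices, not from the thesis):

```python
import numpy as np

def histogram_mi(x, y, bins):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) from a joint histogram, Eq. (2.24)."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def ent(p):
        p = p[p > 0]
        return -(p * np.log(p)).sum()

    return ent(px) + ent(py) - ent(pxy.ravel())

rng = np.random.default_rng(0)
x = rng.uniform(size=500)
y = rng.uniform(size=500)        # independent of x: the true MI is zero

mi_coarse = histogram_mi(x, y, bins=4)    # few bins: estimate stays near zero
mi_fine = histogram_mi(x, y, bins=64)     # many bins: over-fitting inflates MI
```

With too many bins for the available samples, the sparsely populated joint histogram makes even independent variables appear strongly related, which is why the MI-vs-angle curve loses its peak.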
After histogram analysis, we extract features from the images and use these feature images to perform registration as follows.

4.2.2 Features

Image features are extracted based on a local neighborhood of each pixel, called tags [ ]. The basic idea of tag-based methods is to account for the local topology surrounding each pixel. Around each pixel in the image, an n x n matrix is extracted and vectorized to obtain a vector of length n^2 for every pixel. This is equivalent to quantizing the image so that it reflects local topology, due to which the extracted features become more invariant to changes in individual pixel intensities. Once the tags are extracted from the image, the decision that lay ahead was how to extract features from them. We opted for statistics-based features since they extract features based on the information content of the image. As mentioned in Section 2.6, the ICA algorithm attempts to find statistically independent components of the image. We have used the FASTICA [ ] algorithm to compute the ICA of the images. Figure 4-4 shows the feature images as computed by ICA; the actual values of the ICA components can be seen in Table 4-1.

Figure 4-4: ICA feature images

Figure 4-5: The faces used to compare registration for different sizes of ICA blocks

Figure 4-6: Figure showing the variations in MI vs. the rotation angle for various ICA feature dimensions 1. 4 features 2. 16 features 3. 36 features. Maximum located at 0.

The first three rows of Table 4-1 are responsible for the edge images of the original image, while the fourth ICA component simply smooths the image. One point here is how many ICA features to extract, or how large the dimensionality of the feature space should be. There are several interesting observations in this case. Figure 4-6 plots the mutual information vs.
the rotation angle for various sizes of ICA blocks. The images used for registration are shown in Figure 4-5. From Figure 4-6 we can see that registration worsens if we pick too high a feature dimension.

Figure 4-7: Figure showing the variations in MI vs. the rotation angle for various ICA feature dimensions 1. 4 features 2. 16 features 3. 36 features

Another interesting observation is seen when we pick the faces shown in Figure 4-30: we get the MI vs. rotation plots shown in Figure 4-7, from which we see that an acceptable registration result is obtained when we use 16 ICA features. Thus the number of ICA features to use relies heavily on the type of image to register. Once features are extracted from the image, the next step is to fine-tune the registration by estimating the entropy using minimum spanning trees.

4.2.3 Search Strategy

Our prototype MST matching algorithm uses a brute force search that scans the entire transformation space to perform registration. One of the reasons for doing this is the way the MST entropies behave in the search space. The behavior is shown in Figure 4-8, from which we can see that the MST entropy is not smooth and has several local maxima. Thus, standard search strategies such as gradient descent cannot be used, as they rely on a smooth objective function.

Figure 4-8: Behavior of MST entropy in 1D and 2D search space

Thus, rather than be overwhelmed by the design of an efficient search algorithm for finding the global maximum of a non-smooth objective function, we used a brute force search, which is guaranteed to find the global maximum over the grid.

4.2.4 Minimum Spanning Tree Algorithm

This is the final step of the registration.
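The core computation of this step, the entropy estimate of equation (4.1), can be sketched as follows. This is an illustrative sketch rather than the thesis implementation (which used Boost's Kruskal via MEX); it uses SciPy's minimum spanning tree on a dense distance matrix and drops the additive constant log c(a, d), which does not affect the location of the optimum.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def mst_renyi_entropy(points, alpha=0.5):
    """Renyi entropy estimate from n sample points in R^d, up to the
    constant term of Eq. (4.1): (1/(1-alpha)) * log(L(X_n) / n^alpha),
    where L(X_n) is the total Euclidean MST length."""
    n = len(points)
    dists = squareform(pdist(points))            # dense pairwise distances
    length = minimum_spanning_tree(dists).sum()  # total MST edge weight
    return np.log(length / n ** alpha) / (1.0 - alpha)

# A tightly clustered sample has lower entropy than a spread-out one.
rng = np.random.default_rng(2)
tight = rng.normal(scale=0.1, size=(500, 2))
spread = rng.normal(scale=5.0, size=(500, 2))
assert mst_renyi_entropy(tight) < mst_renyi_entropy(spread)
```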
Ideally, we would use the entire overlap area of the two images to compute the minimum spanning tree and then estimate the entropy. From equation (3.8), we see that the entropy estimate is sensitive to the number of samples used to compute the minimum spanning tree, so we need to keep this number constant during each computation. We therefore choose a fixed number of sample points randomly from the overlap area of the images and use only those to estimate the entropy.

Minimum spanning tree computation. Assume that we pick n sample points each time to compute the minimum spanning tree. The points lie in some d-dimensional space, where the weight of the edge between two points is the Euclidean distance between them. Initially, we assume that there is an edge between each point and all the other (n - 1) points, i.e., that the graph is fully connected. In such a scenario, with n points and on the order of n^2 edges in Euclidean space, the classical MST algorithm has complexity O(n^2 log n). As this is a polynomial complexity algorithm, the search time increases tremendously as n increases. One of the papers [ ] mentions resolving this issue by reducing the number of edges. This is done by placing a sphere around each point in the d-dimensional hyperspace; only the edges connecting a given point to the neighbors that lie within its sphere are retained for analysis. The radius of the sphere is calculated with the following formula:

radius_p = mean_p * (max / mean_mean) * (RFactor / QFactor)    (4.2)

where radius_p denotes the radius at point p, mean_p denotes the mean of the distances connecting point p to all other points, mean_mean denotes the mean of these per-point means, max denotes the maximum distance between any two points in the d-dimensional space, QFactor denotes the quantization factor, and RFactor denotes the factor which decides the magnitude of the radius. We varied RFactor over a range of values and chose the one which gave the radius with the fewest edges and minimum error.
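The truncation of equation (4.2) can be sketched as below. The exact grouping of the factors in (4.2) is a reconstruction from the surrounding definitions, keeping an edge only when it fits within both endpoints' spheres is one possible convention, and `r_factor` and `q_factor` stand in for RFactor and QFactor. With a generous radius the truncated graph retains every MST edge, so the MST is unchanged while far fewer edges need to be examined.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def truncated_mst_length(points, r_factor=1.0, q_factor=2.0):
    """MST length after discarding edges longer than a per-point radius
    that grows with the point's mean distance to the rest of the sample
    (an Eq. 4.2-style truncation)."""
    d = squareform(pdist(points))
    mean_p = d.sum(axis=1) / (len(points) - 1)   # mean distance per point
    radius = mean_p * (d.max() / mean_p.mean()) * r_factor / q_factor
    # Keep an edge only if it lies within both endpoints' spheres;
    # a zero entry means the edge has been removed from the graph.
    keep = d <= np.minimum(radius[:, None], radius[None, :])
    return minimum_spanning_tree(np.where(keep, d, 0.0)).sum()

rng = np.random.default_rng(6)
pts = rng.random((200, 2))
full = minimum_spanning_tree(squareform(pdist(pts))).sum()
# For uniform points the MST edges are short, so none are truncated.
assert abs(truncated_mst_length(pts) - full) < 1e-6
```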
The plots are shown in Figure 4-9. As a result, every point has a sphere of a different radius, determined primarily by how close the point is to the other points in the hyperspace.

Deciding the number of samples. Figures 4-10 and 4-11 illustrate that when the value of a is kept at about 0.55, 500 samples seem sufficient for registration; however, a clean plot with an explicit maximum is obtained at about 1100 samples. Similar plots for a = 0.85 are shown in Figures 4-12 and 4-13, which illustrate that a very good result is obtained at 1100 samples. Evidently, more samples are better for accuracy; fewer samples are picked when speed is desired. The minimum spanning tree was computed using the Kruskal's algorithm implementation provided by the Boost Graph Library (Available at: http://www.boost.org/libs/graph/doc/table_of_contents.html, Accessed on: March 2004). The pseudo-code for this implementation can be found in the APPENDIX.

4.3 Implementation Details

4.3.1 Hardware and Software Details

CPU: Pentium IV, 2.8 GHz, 2 GB RAM.
OS: Linux.
Languages: Matlab and C++. All the code was written in Matlab. The program used the Kruskal algorithm provided by the Boost Graph Library (Available at: http://www.boost.org/libs/graph/doc/table_of_contents.html, Accessed on: March 2004), which is written in C++. The interface between the Matlab and the C++ code is done using the Matlab MEX Function Interface.
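For reference, Kruskal's algorithm itself (which the Boost Graph Library implementation provides) reduces to sorting the edges and growing a forest with a union-find structure. A minimal Python sketch, standing in for the C++/Boost code:

```python
import math

def kruskal_mst_length(points):
    """Total Euclidean MST length via Kruskal's algorithm: sort all edges
    by weight, then greedily keep each edge that joins two different
    components, tracked with a union-find structure."""
    n = len(points)
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    parent = list(range(n))

    def find(x):                       # root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, used = 0.0, 0
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                   # the edge joins two components
            parent[ri] = rj
            total += w
            used += 1
            if used == n - 1:          # spanning tree is complete
                break
    return total

# The MST of the unit square's four corners uses three unit-length sides.
square = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
assert abs(kruskal_mst_length(square) - 3.0) < 1e-9
```

Sorting dominates, giving the O(n^2 log n) complexity quoted above for a fully connected graph; the edge truncation of equation (4.2) reduces the number of edges entering the sort.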
Figure 4-9: Plots showing the error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0 and 10 degrees

Figure 4-10: Figure showing the variations in joint entropy versus the rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.55: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples

Figure 4-11: Figure showing the variations in joint entropy versus the rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.55: 1. 900 samples 2. 1100 samples

Figure 4-12: Figure showing the variations in joint entropy versus the rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.85: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples

Figure 4-13: Figure showing the variations in joint entropy versus the rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.85: 1. 900 samples 2. 1100 samples

Table 4-2: Free parameter table
a: the a in the Renyi entropy. Value: 0.5. Reason: the value suggested by Hero et al. in paper [ ].
BlockSize: size of the block around each pixel, i.e., the number of ICA features. Value: 2 or 4. Reason: based on the analysis in Section 4.2.2 and Figures 4-6 and 4-7.
radius: radius of the sphere around each point in feature space, used to truncate edges. Value: 5. Reason: based on the analysis shown in Figure 4-9.
numBins: number of bins for the histogram search. Value: 16. Reason: based on the analysis shown in Figure 4-3.
numberSamples: number of random samples used to compute the Renyi entropy. Value: 500 or 1000. Reason: picked based on whether speed or accuracy is desired (Section 4.2.4).
searchSpace: affine parameters over which to search. Value: rotations [-20, 20], scale [-0.5, 0.5], shear [-0.5, 0.5], translations [-5, 5] in each direction. Reason: based on the time required to cover the space.

Time: the time taken for the entire algorithm to run on 64 x 64 images, with histogram preprocessing followed by final registration using the MST entropy with 1000 samples, is approximately 8 hours.

4.3.2 Free Parameters

The free parameters are tabulated in Table 4-2.

4.4 Results

This section summarizes the different results obtained from the registration algorithm and shows that registration by estimating entropy using the MST produces acceptable results. All the face images are obtained from the Yale face database (Available at: http://cvc.yale.edu/projects/yalefaces/yalefaces.html, Accessed on: March 2004).

4.4.1 Same Image Registered with the Same Image

This result shows how the MST changes as the images are registered. As both images considered are the same image in this case, the visual change in the MST from the unregistered case to the perfectly registered case is very apparent. The parameter values for the following tests:
a = 0.5
Samples = 500
Number of bins of histogram = 16
Block size around pixel = 2 x 2
Figure 4-14 displays the original unregistered images and their overlap images. Figure 4-15 shows the minimum spanning tree of the two unregistered images.
For the purpose of visualization, the feature space in this representation is the intensity of both images. Figure 4-16 shows how the images are registered, and the MST of the two registered images is shown in Figure 4-17. Observe how the points collapse onto one line as the images become registered.

4.4.2 Registration of Same Person with Different Expressions

This section presents the registration results for images of the same person with different facial expressions. The parameters chosen are:
a = 0.5
Samples = 500
Number of bins of histogram = 16
Block size around pixel = 2 x 2

Figure 4-14: Figure showing, from top left to bottom right, 1. Face image 2. Same face with applied transformation 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images

Figure 4-15: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features

Figure 4-16: Figure showing, from top left to bottom right, 1. Face image 2. Face obtained after applying the optimal transformation to the unregistered image shown in Figure 4-14 3. Overlap of the two registered images 4. Edge image of the overlapped registered images

Figure 4-17: Minimum spanning tree of the registered images, considering only the intensities of the two images as features

Figure 4-18: Figure showing, from top left to bottom right, 1. Face image 2. Face image of the same person with glasses, unregistered 3. Overlap of the two unregistered images 4.
Edge image of the overlapped unregistered images

The images before registration are shown in Figure 4-18, and the MST for the unregistered case is in Figure 4-19. After performing registration, the results are shown in Figure 4-20, and the MST for the registered case is in Figure 4-21. Looking at the images of the minimum spanning trees, we see that when the two images are unregistered the points are scattered all over the place. However, when the images are registered, a cluster of points forms along the 45-degree line in the MST space, reducing the length of the MST.

4.4.3 Yet Another Registration of Same Person with Different Expressions

This section shows the registration results for another pair of images of the same person with different expressions. The parameters chosen are:
a = 0.5
Samples = 500
Number of bins of histogram = 16
Block size around pixel = 2 x 2

Figure 4-19: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features

Figure 4-20: Figure showing, from top left to bottom right, 1. Face image 2. Face image of the same person with glasses, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images

Figure 4-21: Minimum spanning tree of the registered images, considering only the intensities of the two images as features

The images before registration are shown in Figure 4-22, and the registration results are shown in Figure 4-23. From the previous two results with images of the same person and different expressions, we can see that the algorithm successfully performs registration.
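The shrinking of the MST under registration can be illustrated numerically: sample joint intensity pairs (I1(p), I2(p)) at common pixel locations and compare MST lengths. This is an illustrative sketch (a circular shift stands in for a misregistration); when the images agree, the pairs fall on the 45-degree line and the tree collapses.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def joint_mst_length(img_a, img_b, n_samples=300, seed=0):
    """Euclidean MST length over sampled joint-intensity pairs
    (I_a(p), I_b(p)) taken at the same pixel locations p."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(img_a.size, size=n_samples, replace=False)
    pts = np.column_stack([img_a.ravel()[idx], img_b.ravel()[idx]])
    return minimum_spanning_tree(squareform(pdist(pts))).sum()

rng = np.random.default_rng(3)
img = rng.random((64, 64))
misaligned = np.roll(img, 5, axis=1)   # stand-in for a misregistration

# Registered pairs lie on the 45-degree line, so the MST is much shorter.
assert joint_mst_length(img, img) < joint_mst_length(img, misaligned)
```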
An important point to note here is that the image size was 64 x 49 (3136 pixels), of which only 500 were used to estimate the entropy for MST-based registration, unlike histograms, which need all the samples in the area in order to estimate the entropy. Plots of the convergence of the Renyi a entropy versus the number of samples are shown later in this section.

4.4.4 Registration Result of Images of Two Different People

We next discuss registration results for images of two different people. The parameters chosen are:
a = 0.5
Samples = 500
Number of bins of histogram = 16
Block size around pixel = 4 x 4

Figure 4-22: Figure showing, from top left to bottom right, 1. Face image of a happy person 2. Face image of the same person with a sad expression, unregistered 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images

Figure 4-23: Figure showing, from top left to bottom right, 1. Face image 2. Face image of the same person with a sad expression, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images

Figure 4-24: Figure showing, from top left to bottom right, 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images

The initial state of the unregistered images is shown in Figure 4-24, and the registration results are shown in Figure 4-25. The above result shows that though the method does register the two images, the result is not accurate for dissimilar images, and there is definitely scope for improvement. In the next result, therefore, we changed some parameters and obtained some improvement.
4.4.5 Yet Another Registration Result of Images of Two Different People

Here are the registration results for yet another pair of images of different people. The parameters chosen are:
a = 0.9999
Samples = 1000
Number of bins of histogram = 16
Block size around pixel = 4 x 4
We also scaled the search range for scale, to perform an even finer search.

Figure 4-25: Figure showing, from top left to bottom right, 1. Face image 2. Face image of a different person, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images

Figure 4-26: Figure showing, from top left to bottom right, 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images

The initial state of the unregistered images is shown in Figure 4-26, and the registration results are shown in Figure 4-27. This result is a trifle better than the previous one.

4.4.6 Comparison with Histogram Based Method

For this comparison, we selected two sets of images of two different people. In one case, we registered the faces considered in Section 4.4.5 over all six parameters, namely rotation, scale, shear and the two translations. The initial state of the unregistered images is shown in Figure 4-28, and the registration results are shown in Figure 4-29.

Figure 4-27: Figure showing, from top left to bottom right, 1. Face image 2. Face image of a different person, registered 3. Overlap of the two registered images 4.
Edge image of the overlapped registered images

Figure 4-28: Figure showing, from top left to bottom right, 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images

Figure 4-29: Figure showing, from top left to bottom right, 1. Face image 2. Face image of a different person, registered 3. Overlap of the two images registered using the histogram 4. Edge image of the overlapped registered images

Figure 4-30: Figure showing, from left to right, 1. Face image of a person, with light from the left 2. Face image of the same person, rotated, with light from the right

Figure 4-31: Figure showing image 2 as registered by the histogram

We see from Figure 4-29 that the histogram method performs as well as the MST when the intensity differences between the two images are not pronounced. We then performed the comparison on two images in which the lighting in one image was from the left side while in the other it was from the right side, and compared the results over rotation only. The images considered are shown in Figure 4-30. The second image is rotated by 5 degrees, so perfect registration finds the right answer at -5 degrees. The results with the histogram are shown in Figure 4-31, and the plot of MI vs. rotation is shown in Figure 4-32. Notice how the maximum MI occurs at 9 degrees instead of -5 degrees, which is a clear error. The results with the MST are shown in Figure 4-33, and the plot of MI vs. rotation is shown in Figure 4-34. Notice how the maximum MI occurs at -5 degrees, as expected.

Figure 4-32: Plot of MI vs. rotation angle when registered by histogram. Maximum should be at -5 degrees.
Figure 4-33: Figure showing image 2 as registered by the MST

Figure 4-34: Plot of MI vs. rotation angle as registered by MST. Correct answer is at -5 degrees.

Thus we conclude that for images with large intensity variations, registration using the minimum spanning tree definitely outperforms registration by histogram.

4.4.7 Convergence of Renyi Entropy with Number of Samples as Alpha Changes

Recall from the discussion in the background and theory chapter that we need to estimate the Renyi a entropy, which converges to the Shannon differential entropy as a approaches 1. The question here is what value of a to choose such that the estimated Renyi entropy converges quickly with few samples. This result documents the plots of the Renyi entropy computed using minimum spanning trees vs. the number of samples as we vary a. The result is shown in Figures 4-35, 4-36 and 4-37.

Figure 4-35: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.4 2. alpha = 0.45 3. alpha = 0.5 4. alpha = 0.55

Figure 4-36: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.6 2. alpha = 0.65 3. alpha = 0.7 4. alpha = 0.75

Figure 4-37: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.8 2. alpha = 0.85 3. alpha = 0.9 4.
alpha = 0.95

Table 4-3: Noise results over rotation for histogram
Noise Variance   Rotation Error
0.01             0
0.02             0
0.03             0
0.04             0
0.05             1
0.06             1
0.07             1
0.08             1
0.09             1
0.1              1

The graphs in Figures 4-35, 4-36 and 4-37 show that the behavior of the Renyi entropy is very erratic for all values of alpha and in fact does not seem to converge quickly.

4.4.8 Noise Trials

Noise in the image may affect the registration results. This section compares the histogram technique to the MST technique when Gaussian noise of varying variance is added to the images. The images were registered over rotation alone and over rotation and scale, and the errors of the two techniques were compared.

Case 1: Registration over rotation only. Range of rotation: -10 to 10 degrees. Correct answer at: 0 degrees.

The error of the histogram method for various levels of added noise is tabulated in Table 4-3. As seen there, the histogram performs well for low noise but gives an error of 1 degree for a noise variance greater than 0.05. The error of the MST method is shown in Table 4-4. From this table, we can see that the MST performs well in the low-noise case and again introduces error (in fact larger than the histogram's) as the variance of the added noise increases.

Table 4-4: Noise results over rotation for MST
Noise Variance   Rotation Error
0.01             0
0.02             0
0.03             0
0.04             0
0.05             0
0.06             3
0.07             0
0.08             3
0.09             0
0.1              2

Case 2: Registration over rotation and scale. Rotation range: -10 to 10 degrees. Scale range: -0.5 to 0.5. The correct answer expected for both rotation and scale is at 0.

Initially, some trials were made by simply adding noise to the images and then registering the raw images. With this option, however, the MST approach performed badly and gave an error even in low-noise cases with a variance of 0.01. The histogram method clearly outshone the MST method in this case.
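The denoise-then-register preprocessing used in the following trials (Gaussian noise followed by Wiener smoothing) can be sketched with SciPy's adaptive Wiener filter. The ramp test image and the 5 x 5 window are illustrative choices, and only the interior is compared in order to avoid the filter's border effects.

```python
import numpy as np
from scipy.signal import wiener

rng = np.random.default_rng(4)
clean = np.tile(np.linspace(0.0, 1.0, 64), (64, 1))   # smooth ramp image
noisy = clean + rng.normal(scale=np.sqrt(0.05), size=clean.shape)

# Adaptive Wiener smoothing over a 5x5 window before registration.
smoothed = wiener(noisy, mysize=5)

# Smoothing should bring the noisy image closer to the clean one
# (compared on the interior, away from filter border effects).
core = (slice(5, -5), slice(5, -5))
mse = lambda a, b: float(np.mean((a - b) ** 2))
assert mse(smoothed[core], clean[core]) < mse(noisy[core], clean[core])
```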
We therefore later used a smoothing filter to smooth the images before registering them with the MST and the histogram. When a smoothing Wiener filter was applied to the images after adding noise, the histogram method consistently gave zero error for variances from 0.01 to 0.08. Results when the MST was used are tabulated in Table 4-5, which shows how the MST error worsens dramatically in a two-parameter search space.

Table 4-5: Noise results over rotation and scale for MST (1000 samples)
Noise Variance   Rotation Error   Scale Error
0.01             0                0
0.02             0                0
0.03             2                0
0.04             4                0

Table 4-6: Noise results over rotation and scale for MST (1500 samples)
Noise Variance   Rotation Error   Scale Error
0.01             0                0
0.02             0                0
0.03             2                0
0.04             2                0
0.05             1                0.1
0.06             0                0
0.07             2                0
0.08             2                0

In the result shown in Table 4-5, 1000 samples from the overlap area of the images were used to compute the MST length. Another set of results, in which 1500 samples were used, is tabulated in Table 4-6. From Table 4-6 we can see that the accuracy of registration by MST decreases progressively, as an error is consistently introduced. This result illustrates the behavior of the MST entropy when Gaussian noise is added to the image and the image is then smoothed with a particular filter before registration. There are different types of noise and different filters that can be used to remove the noise; the relationship between the accuracy of MST-based registration and these choices needs to be studied further.

CHAPTER 5
DISCUSSION

The estimation of entropy has applications in a variety of areas, image registration being one of them. Recently, information-theoretic measures such as mutual information have become very popular in image registration. Estimating the mutual information requires the estimation of entropy. This thesis used minimum spanning tree algorithms to estimate the entropy from a finite set of samples. The MST can be computed easily in polynomial time.
One of the good attributes of MST-based entropy estimation is that no further parameter tuning is required. However, several important issues need to be dealt with when applying these entropies to registration. The MST entropy does not approximate the Shannon entropy; instead, it approximates the Renyi entropy seen in equation (3.3). Thus it cannot estimate the Shannon mutual information, which is widely used in registration. In fact, it cannot even estimate the original Renyi mutual information that we saw in (3.11). Instead, we have used an information-theoretic measure, the difference between the sum of the marginal entropies and the joint entropy, as was suggested in paper [ ]. This measure has properties similar to those of the Renyi mutual information. There are several open issues that remain to be resolved. We would like to undertake a systematic comparison with other established density and entropy estimators such as Parzen windows [ ]. Parzen windows estimate both the Shannon entropy and the Renyi entropy, and can thus also be used to estimate the Shannon mutual information. The Parzen window density estimate can be computed from the following equation:

f(x) = (1/N) Σ_s (1/(2πσ²)^{d/2}) exp(-||x - x_s||² / (2σ²))    (5.1)

This equation can be further used to estimate the Renyi a entropy mentioned in equation (2.18). Note that the Parzen window density estimator can compute the entropy for a > 1, whereas for the MST the condition on a is that it must be less than 1. Thus all the parameters have to be formulated carefully in order to compare these estimators. Apart from this, even within the regime of MST entropies used for registration, several important changes can be made to the method proposed in this thesis. One is that, in this thesis, we have simply scanned the entire space to find the maximum (refer back to Section 4.2.3).
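The Parzen-window comparison suggested above can be sketched for the 1-D case: plug the Gaussian kernel density of equation (5.1) into a resubstitution estimate of the Renyi entropy. The kernel width sigma and the use of a = 2 here are illustrative choices (a > 1 being exactly the regime the MST estimator cannot reach).

```python
import numpy as np

def parzen_density(x, samples, sigma=0.25):
    """Gaussian Parzen window estimate of a 1-D density, as in Eq. (5.1)."""
    k = np.exp(-((x[:, None] - samples[None, :]) ** 2) / (2 * sigma ** 2))
    return k.mean(axis=1) / np.sqrt(2 * np.pi * sigma ** 2)

def parzen_renyi_entropy(samples, alpha=2.0, sigma=0.25):
    """Plug-in Renyi entropy (1/(1-alpha)) * log E_f[f^(alpha-1)], with
    the expectation replaced by an average over the data. Unlike the
    MST estimator, values alpha > 1 are allowed here."""
    f_hat = parzen_density(samples, samples, sigma)
    return np.log(np.mean(f_hat ** (alpha - 1))) / (1.0 - alpha)

# A narrow distribution has lower entropy than a wide one.
rng = np.random.default_rng(5)
narrow = rng.normal(scale=0.5, size=400)
wide = rng.normal(scale=3.0, size=400)
assert parzen_renyi_entropy(narrow) < parzen_renyi_entropy(wide)
```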
Clearly, a well-optimized search algorithm over such a space would drastically cut the time required. Another area of improvement is feature selection. At present we have used ICA features, based on the rationale that ICA returns statistically independent features, but a detailed study of features could help improve the registration results. There are several other methods of feature selection, such as principal component analysis and image processing filters like Gabor filters [ ]. However, the noise analysis in Section 4.4.8 suggests that when noise is present the images require smoothing, as such methods rely heavily on taking derivatives of the images. Thus the presence of noise can in fact have a negative effect on registration when derivative-based features are used, vis-à-vis intensity-based registration. An interesting extension to registration using MST entropies could be pursued if, instead of using the Euclidean distance among feature points, we projected the points into a different space (e.g., a Hilbert space) and then used the distance in that space to compute the MST length. Whether such an extension improves registration is another open area of research. We also noticed in Section 4.4.8 that the histogram method is less sensitive to noise, so registration using a high-dimensional histogram is also a possibility; to do that, however, innovative ways have to be devised to compute histograms in a multi-dimensional space. In conclusion, in this thesis we use MST entropies to register two face images at a time, searching over the space of affine transformations. This technique could also be expanded to register several images at once. To bring this about, however, a good search algorithm is a necessity. It also requires appropriate selection of features in order to extract the most relevant information.
For example, the features used to register face images may not be suitable for medical images. Also, MST entropies can be used for non-rigid image registration, provided the method is made extremely computationally efficient. Thus, this thesis bolsters the case that MST entropies can be used for registration, by implementing a registration algorithm that searches over the affine transformation space.

APPENDIX
ADDITIONAL INFORMATION

This section covers the pseudocode of the algorithm.

Pseudo-code. The program consists of the following main functions:
FunctionRegister: the main function, which calls all the subfunctions.
FunctionRoughRegister: performs the rough image registration using intensities and histograms for entropy estimation.
FunctionExtractICAFeatures: extracts the ICA features from the image.
FunctionMSTRegister: performs the fine registration using the ICA features and the minimum spanning tree for estimating the entropy.
and the following supplementary functions:
FunctionAffineTransform: performs the affine transformation of the input image according to the given parameters.
FunctionJointEntropyByHistogram: computes the joint histogram and accordingly estimates the Shannon joint entropy.
FunctionMarginalEntropyByHistogram: computes the histogram and accordingly estimates the Shannon marginal entropy.
FunctionEntropyByMST: estimates the Renyi entropy (joint and marginal) using the minimum spanning tree.

Here are the main functions explained in depth:

FunctionRegister
Input:
Image1, Image2: the images.
BlockSize: size of the square block around each pixel (i.e., 2 x 2, 3 x 3, etc.).
Alpha: the a factor in the Renyi entropy.
NumBins: the number of bins used to compute the histogram.
Output:
AffineParameters: the affine transformation parameters which, when applied to the second image, give a registered image.
  The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
  MaxMeasure: The value of the mutual-information-like measure attained at the parameters above.
Algorithm:
  // Perform rough registration of the images using intensities + histogram.
  RoughAffineParameters = FunctionRoughRegister(Image1, Image2, NumBins);
  // Extract ICA features from the input images.
  [icaFeatures1, icaFeatures2, ICAFeatures] = FunctionExtractICAFeatures(Image1, Image2, BlockSize);
  // Perform fine registration of the images using the ICA features + MST.
  AffineParameters = FunctionMSTRegister(ICAFeatures, Alpha, RoughAffineParameters);
EndFunction

FunctionRoughRegister
Input:
  Image1, Image2: The images.
  NumBins: The number of bins used to compute the histogram.
Output:
  RoughAffineParameters: The affine transformation parameters which, when applied to the second image, give a roughly registered image. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
Algorithm:
  // Image1 is kept fixed while Image2 is transformed over the search space.
  while scanning all six parameters of the affine search space at a coarse resolution do
    tempImg = FunctionAffineTransform(currentTransformValues);
    jointEntropy = FunctionJointEntropyByHistogram(overlap areas of Image1 and tempImg);
    marginalEntropy1 = FunctionMarginalEntropyByHistogram(overlap area of Image1);
    marginalEntropy2 = FunctionMarginalEntropyByHistogram(overlap area of tempImg);
    currentMI = marginalEntropy1 + marginalEntropy2 - jointEntropy;
    if (currentMI > maxMI)
      maxMI = currentMI;
      RoughAffineParameters = current affine parameters;
    endIf
  endWhile
  return RoughAffineParameters;
EndFunction

FunctionExtractICAFeatures
Input:
  Image1, Image2: The images.
  BlockSize: Size of the square block around each pixel (i.e., 2x2, 3x3, etc.).
Output:
  icaFeatures1: The ICA features of Image1.
  icaFeatures2: The ICA features of Image2.
  ICAFeatures: The ICA features of the two images combined.
Algorithm:
  // Get the block neighborhood of each pixel in both images.
  BlockInput1 = GetBlockNeighborhoodImage(Image1, BlockSize);
  BlockInput2 = GetBlockNeighborhoodImage(Image2, BlockSize);
  // Compute the ICA features. We have used the fastICA package [ ] to do so.
  // The package is freely available for use.
  icaFeatures1 = FASTICA(BlockInput1);
  icaFeatures2 = FASTICA(BlockInput2);
  // Now combine the ICA features.
  ICAFeatures = icaFeatures1 U icaFeatures2;
  return icaFeatures1, icaFeatures2, ICAFeatures;
EndFunction

FunctionMSTRegister
Input:
  icaFeatures1: The ICA features of Image1.
  icaFeatures2: The ICA features of Image2.
  ICAFeatures: The ICA features of the two images combined.
  Alpha: The alpha used to compute the Renyi entropy.
  RoughAffineParameters: The ballpark estimate from the histograms, around which the search is performed.
Output:
  FinalAffineParameters: The affine transformation parameters which, when applied to the second image, give a registered image. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
  MaxMeasure: The value of the mutual-information-like measure when the images are aligned.
Algorithm:
  // Image1 is kept fixed while Image2 is transformed over the search space.
  while scanning all six parameters of the affine space around the ballpark estimate RoughAffineParameters at a fine resolution do
    // First pick random samples from the joint as well as the individual feature spaces.
    (sampleJoint, sample1, sample2) = pickRandomSamples(ICAFeatures, icaFeatures1, icaFeatures2);
    tempImg = FunctionAffineTransform(currentTransformValues);
    jointEntropy = FunctionEntropyByMST(sampleJoint);
    marginalEntropy1 = FunctionEntropyByMST(sample1);
    marginalEntropy2 = FunctionEntropyByMST(sample2);
    CurrentRegistrationMeasure = marginalEntropy1 + marginalEntropy2 - jointEntropy;
    if (CurrentRegistrationMeasure > maxMeasure)
      maxMeasure = CurrentRegistrationMeasure;
      FinalAffineParameters = current affine parameters;
    endIf
  endWhile
  return FinalAffineParameters, maxMeasure;
EndFunction

Now the pseudocode of the supplementary functions:

FunctionAffineTransform
Input:
  Image: The image to transform.
  TransformParameters: Parameters by which the image is to be transformed.
Output:
  TransformedImage: The transformed image after applying the transformation.
Algorithm:
  // The transform parameters are two rotation angles (theta and phi), scale,
  // shear, and translations (xT and yT). The 2x2 linear part is composed as
  //   [a b]   [2^scale    0   ] [cos(theta)  sin(theta)] [2^shear     0     ] [cos(phi)  sin(phi)]
  //   [c d] = [   0    2^scale] [-sin(theta) cos(theta)] [   0    2^(-shear)] [-sin(phi) cos(phi)]
  // and the full 3x3 matrix, with the translation in the last row, is
  //                                [ a  b  0]
  //   affineTransformationMatrix = [ c  d  0]
  //                                [xT yT  1]
  // Now call MATLAB's image transform routine, which transforms the given image
  // according to the given affine transformation matrix and fills the empty areas
  // according to the interpolation method specified.
  TransformedImage = FunctionMATLABImTransform(Image, affineTransformationMatrix, interpolation);
  return TransformedImage;
EndFunction

FunctionJointEntropyByHistogram
Input:
  Image1, Image2: The images.
  NumBins: The number of bins used to compute the histogram.
Output:
  jointEntropy: The joint Shannon entropy of the two images, computed by the histogram method.
Algorithm:
  interval = Range / NumBins;
  // Compute the two-dimensional normalized joint histogram of the two images.
  jointHist = FunctionJointHistogram(Image1, Image2, interval);
  jointEntropy = 0;
  for i = 1 to NumBins
    for j = 1 to NumBins
      jointEntropy = jointEntropy - jointHist(i,j) * log(jointHist(i,j));
    endFor
  endFor
  return jointEntropy;
EndFunction

FunctionMarginalEntropyByHistogram
Input:
  Image: The image.
  NumBins: The number of bins used to compute the histogram.
Output:
  marginalEntropy: The marginal Shannon entropy of the image, computed by the histogram method.
Algorithm:
  interval = Range / NumBins;
  // Compute the one-dimensional normalized histogram of the image.
  marginalHist = FunctionMarginalHistogram(Image, interval);
  marginalEntropy = 0;
  for i = 1 to NumBins
    marginalEntropy = marginalEntropy - marginalHist(i) * log(marginalHist(i));
  endFor
  return marginalEntropy;
EndFunction

FunctionEntropyByMST
Input:
  featureMatrix: The feature matrix whose entropy is to be calculated. The size of the matrix is N x d, where N is the number of sample points and d is their dimension (d >= 2).
  Alpha: The alpha parameter used in computing the Renyi entropy.
Output:
  Entropy: The Renyi entropy of the feature matrix, computed by the minimum spanning tree method.
Algorithm:
  /* In this function we compute the approximate Renyi entropy based on formula 3.8.
     We do not know the factor C(alpha, d), and thus the estimate is approximate. */
  [N, d] = size(featureMatrix);
  /* Compute the minimum spanning tree and its length. We use the Kruskal algorithm
     provided in the Boost Graph Library (available at
     http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). */
  lengthMST = FunctionBOOSTKruskal(featureMatrix);
  gamma = d * (1 - Alpha);  // equivalently, Alpha = (d - gamma) / d
  Entropy = (1 / (1 - Alpha)) * log(lengthMST / N^Alpha);
  return Entropy;
EndFunction

REFERENCES

[1] Y. AMIT AND D. GEMAN, Shape quantization and recognition with randomized trees, Neural Computation, 9 (1997), pp. 1545-1588.
[2] J. BEARDWOOD, J. H. HALTON, AND J. M. HAMMERSLEY, The shortest path through many points, in Proceedings of the Cambridge Philosophical Society, vol. 55, 1959, pp. 299-327.
[3] C.
BISHOP, Neural networks for pattern recognition, Oxford University Press, Oxford, England, 1995.
[4] L. G. BROWN, A survey of image registration techniques, ACM Computing Surveys, 24 (1992), pp. 325-376.
[5] B. CHAZELLE, A minimum spanning tree algorithm with inverse-Ackermann type complexity, Journal of the ACM, 47 (2000), pp. 1028-1047.
[6] A. COLLIGNON, D. VANDERMEULEN, P. SUETENS, AND G. MARCHAL, 3D multi-modality medical image registration using feature space clustering, in Computer Vision, Virtual Reality and Robotics in Medicine, N. Ayache, ed., vol. 905 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 195-204.
[7] T. CORMEN, C. LEISERSON, AND R. RIVEST, Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2001.
[8] T. COVER AND J. THOMAS, Elements of information theory, John Wiley and Sons, New York, NY, USA, 1991.
[9] R. DUDA AND P. HART, Pattern classification and scene analysis, John Wiley and Sons, New York, NY, 1973.
[10] D. EPPSTEIN, Dynamic Euclidean minimum spanning trees and extrema of binary functions, Discrete & Computational Geometry, 13 (1995), pp. 111-122.
[11] S. GOLD, A. RANGARAJAN, C. P. LU, S. PAPPU, AND E. MJOLSNESS, New algorithms for 2-D and 3-D point matching: pose estimation and correspondence, Pattern Recognition, 31 (1998), pp. 1019-1031.
[12] A. O. HERO, B. MA, O. MICHEL, AND J. GORMAN, Applications of entropic spanning graphs, IEEE Signal Processing Magazine (Special Issue on Mathematics in Imaging), 19 (2002), pp. 85-95.
[13] A. O. HERO AND O. MICHEL, Asymptotic theory of greedy approximations to minimal k-point random graphs, IEEE Transactions on Information Theory, 45 (1999), pp. 1921-1938.
[14] K. E. HILD II, D. ERDOGMUS, AND J. PRINCIPE, Blind source separation using Renyi's mutual information, IEEE Signal Processing Letters, 8 (2002), pp. 174-176.
[15] E. HOROWITZ, S. SAHNI, AND S. RAJASEKARAN, Fundamentals of computer algorithms, W. H. Freeman, San Francisco, CA, 1998.
[16] A. HYVARINEN AND E. OJA, Independent component analysis: A tutorial, Neural Networks, 13(4-5) (2000), pp. 411-430.
[17] B. MA, A. HERO, J. GORMAN, AND O. MICHEL, Image registration with minimum spanning tree algorithm, in International Conference on Image Processing, vol. 1, 2000, pp. 481-484.
[18] R. E. NEAPOLITAN, Learning Bayesian networks, Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.
[19] H. NEEMUCHWALA, A. HERO, AND P. CARSON, Image registration using entropic graph-matching criteria, in Proceedings of the Asilomar Conference on Signals and Systems, vol. 1, 2002, pp. 134-138.
[20] S. PETTIE AND V. RAMACHANDRAN, An optimal minimum spanning tree algorithm, in Automata, Languages and Programming, 2000, pp. 49-60.
[21] J. P. W. PLUIM, J. B. A. MAINTZ, AND M. A. VIERGEVER, Mutual-information-based registration of medical images: A survey, IEEE Transactions on Medical Imaging, 22 (2003), pp. 986-1004.
[22] L. RABINER, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77 (1989), pp. 257-286.
[23] A. RENYI, Probability theory, North-Holland Publishing Company, Amsterdam, Netherlands, 1970.
[24] C. E. SHANNON, A mathematical theory of communication, The Bell System Technical Journal, 27 (1948), pp. 379-423 and 623-656.
[25] J. M. STEELE, Growth rates of Euclidean minimum spanning trees with power weighted edges, The Annals of Probability, 16 (1988), pp. 1767-1787.
[26] H. A. STURGES, The choice of a class interval, Journal of the American Statistical Association, 21 (1926), pp. 65-66.
[27] B. C. VEMURI, J. LIU, AND J. L. MARROQUIN, Robust multimodal image registration using local frequency representations, vol. 2082 of Lecture Notes in Computer Science, 2001, pp. 176-182.
[28] P. VIOLA AND W. M. WELLS III, Alignment by maximization of mutual information, in Fifth International Conference on Computer Vision (ICCV), 1995, pp. 16-23.
BIOGRAPHICAL SKETCH

I graduated with a bachelor of engineering degree from the Government College of Engineering, Pune, India. I started my master of science program at the University of Florida in August 2002 and I am planning to graduate in a month! My academic interests lie in computer vision and image processing, and this thesis was a great learning experience for me. Apart from this, I enjoy painting and badminton.


AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By

ADITEE KUMTHEKAR

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2004

Copyright 2004 by Aditee Kumthekar

This thesis is dedicated to my family, whose constant and unconditional support has made it possible.

ACKNOWLEDGMENTS

I would like to thank Dr. Anand Rangarajan for his excellent guidance, for the innumerable hours that he spent patiently resolving my difficulties, and for his constant encouragement, which was the driving force behind this thesis. I would also like to extend my thanks to Jie Zhang, who made me see certain aspects of image registration clearly.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 BACKGROUND
  2.1 Entropy and Mutual Information
    2.1.1 Joint and Conditional Entropy
    2.1.2 Relative Entropy and Mutual Information
    2.1.3 Differential Entropy
    2.1.4 Renyi Entropy
  2.2 Histogram Method for Estimating Entropy
  2.3 Minimum Spanning Trees
  2.4 The Problem of Registration
  2.5 Image Transformations
  2.6 Independent Component Analysis
3 THEORY OF MINIMUM SPANNING TREE ENTROPIES
  3.1 Entropy Estimation for Registration
  3.2 History
  3.3 Minimum Spanning Trees for Entropy Estimation
  3.4 Measure of Registration
4 ALGORITHM DESIGN
  4.1 General Registration Algorithm
  4.2 Design of the Algorithm
    4.2.1 Preprocessing by Histograms
    4.2.2 Features
    4.2.3 Search Strategy
    4.2.4 Minimum Spanning Tree Algorithm
  4.3 Implementation Details
    4.3.1 Hardware and Software Details
    4.3.2 Free Parameters
  4.4 Results
    4.4.1 Same Image Registered with the Same Image
    4.4.2 Registration of Same Person with Different Expressions
    4.4.3 Yet Another Registration of Same Person with Different Expressions
    4.4.4 Registration Result of Images of Two Different People
    4.4.5 Yet Another Registration Result of Images of Two Different People
    4.4.6 Comparison with Histogram Based Method
    4.4.7 Convergence of Renyi Entropy with Number of Samples as Alpha Changes
    4.4.8 Noise Trials
5 DISCUSSION
APPENDIX ADDITIONAL INFORMATION
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

4-1 The ICA values
4-2 Free parameter table
4-3 Noise results over rotation for histogram
4-4 Noise results over rotation for MST
4-5 Noise results over rotation and scale for MST
4-6 Noise results over rotation and scale for MST

LIST OF FIGURES

1-1 Two registered images with different intensity distributions
2-1 Original graph
2-2 MST by Kruskal's algorithm
4-1 Block diagram for image registration
4-2 Flow-chart for estimating the parameters of the transform
4-3 Effect on registration of varying the number of histogram bins (numbers of bins are 8, 16 and 32 from top left to bottom right; maximum should be present at 0)
4-4 ICA feature images
4-5 The faces used to compare registration for different sizes of ICA blocks
4-6 Figure showing the variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features 2. 16 features 3. 36 features; maximum located at 0
4-7 Figure showing the variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features 2. 16 features 3. 36 features
4-8 Behavior of MST entropy in 1D and 2D search space
4-9 Plots showing error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0 and 10 degrees
4-10 Figure showing the variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.55: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples
4-11 Figure showing the variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.55: 1. 900 samples 2. 1100 samples
4-12 Figure showing the variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.85: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples
4-13 Figure showing the variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.85: 1. 900 samples 2. 1100 samples
4-14 Figure showing, from top left to bottom right: 1. Face image 2. Same face with applied transformation 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-15 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-16 Figure showing, from top left to bottom right: 1. Face image 2. Face obtained after applying the optimal transformation to the unregistered image shown in figure 4-14 3. Overlap of the two registered images 4. Edge image of the overlapped registered images
4-17 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-18 Figure showing, from top left to bottom right: 1. Face image 2. Face image of the same person with glasses, which is unregistered 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-19 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-20 Figure showing, from top left to bottom right: 1. Face image 2. Face image of the same person with glasses, which is registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images
4-21 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-22 Figure showing, from top left to bottom right: 1. Face image of a happy person 2.
Face image of the same person with sad expression, which is unregistered 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-23 Figure showing, from top left to bottom right: 1. Face image 2. Face image of the same person with sad expression, which is registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images
4-24 Figure showing, from top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-25 Figure showing, from top left to bottom right: 1. Face image 2. Face image of a different person, which is registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images
4-26 Figure showing, from top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-27 Figure showing, from top left to bottom right: 1. Face image 2. Face image of a different person, which is registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images
4-28 Figure showing, from top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images
4-29 Figure showing, from top left to bottom right: 1. Face image 2. Face image of a different person, which is registered 3. Overlap of the two images registered using the histogram 4. Edge image of the overlapped registered images
4-30 Figure showing, from left to right: 1. Face image of a person 2.
Face image of the same person, which is registered
4-31 Figure showing image 2 as registered by the histogram
4-32 Plot of MI vs. rotation angle when registered by histogram; maximum should be at -5 degrees
4-33 Figure showing image 2 as registered by the MST
4-34 Plot of MI vs. rotation angle as registered by MST; correct answer is at -5 degrees
4-35 Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the samples for 1. alpha = 0.4 2. alpha = 0.45 3. alpha = 0.5 4. alpha = 0.55
4-36 Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the samples for 1. alpha = 0.6 2. alpha = 0.65 3. alpha = 0.7 4. alpha = 0.75
4-37 Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the samples for 1. alpha = 0.8 2. alpha = 0.85 3. alpha = 0.9 4. alpha = 0.95

Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By Aditee Kumthekar

May 2004

Chair: Anand Rangarajan
Major Department: Computer and Information Science and Engineering

The various environmental factors, the settings of the image acquisition device, and the material properties of an object directly affect the formation of an image. Registration of two images involves aligning the two images obtained under different conditions, which is based on several information-theoretic methods. Entropy estimation is an important step in these methods, which traditionally requires estimation of the density function of the images. This thesis, however, directly estimates the entropy without a priori density estimation.
This is achieved by exploiting a powerful theorem which relates the asymptotic length of the minimum spanning tree formed from the samples to the Renyi entropy of the density function underlying the samples. Since the density estimation step is avoided, the careful tuning of several density estimation parameters, which most density estimation methods require, is avoided. Also, since the spanning tree method estimates the entropy in high dimensions, this allows us to include feature information in the registration process. Affine registration of face images of different subjects, with different gestures, and taken under different lighting conditions is demonstrated.

CHAPTER 1
INTRODUCTION

The entire field of computer vision is motivated by the problem of designing a system which has the capabilities of human vision. When we look around, our eyes and our brain immediately analyze the relationships between the different objects in our immediate environment. We usually do not have to think long to interpret what we see; it comes naturally to us. However, we have not been able to emulate this amazing system of perception even 40 years after the inception of the field of computer vision. Why is this so? It is because a very complex relationship exists between the physical factors and the way an image is formed. Such factors include the reflectivity and absorbency of the illuminated surface, the intensity and direction of illumination, and environmental factors like the presence of dust and speckle in the atmosphere, rain, etc. Moreover, the mode in which an image is acquired (for example, a camera) and the parameters that calibrate the instrument (for example, the shutter speed of the camera) contribute to the image formation.
Viola and Wells [28] have represented this event of image formation by the equation

    v(T(x)) = F(u(x), q),                                                  (1.1)

where T(x) is the transformation function which determines the orientation of the image; u(x) is the function which describes the surface of the object; q represents the other parameters that affect image formation, such as lighting and the environment; v() is the function that formulates the pixel intensities; and F(u(x), q) is the function which incorporates the various parameters mentioned above which play a role in the image formation. Modelling the aforementioned function is not a trivial task, as there are a vast number of parameters involved. Many vision-related systems attempt functionalities such as object recognition (identifying the objects from the background region), indexing of image databases (the efficient retrieval of images similar to a query image), and object modelling (the mathematical representation, coding, and synthetic generation of an image from a model). Many such applications require that the images under consideration be aligned before further processing can be performed on them. Apart from these applications, which require alignment as an initial preprocessing step, there is a plethora of applications which rely on accurate image alignment. One such application is "anomaly detection" in medical imaging, which refers to detecting any changes in a particular organ over a period of time, either due to disease or due to treatment, e.g., detecting whether the post-operative MRI of a patient shows the removal of the tumor that was seen in the pre-operative scan. In order to do this, we require accurate alignment of the images. Image registration is precisely this task of spatially aligning two images so that the corresponding pixels in the images refer to the same portion or area in the image.
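To make the task concrete, here is a toy sketch of registration as a search over transformations (my own illustration, not the thesis algorithm: it searches only integer translations and scores them with a histogram-based mutual information, a measure discussed later):

```python
import numpy as np

def mutual_information(a, b, bins=8):
    """MI from a normalized joint histogram: H(A) + H(B) - H(A, B), in bits."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    def h(p):
        p = p[p > 0]
        return -np.sum(p * np.log2(p))
    return h(px) + h(py) - h(pxy.ravel())

rng = np.random.default_rng(0)
image1 = rng.random((64, 64))
image2 = np.roll(image1, shift=3, axis=1)   # "acquire" image2 as a 3-pixel shift

# Exhaustive search over horizontal shifts: the MI score peaks at the true shift.
shifts = range(-5, 6)
scores = [mutual_information(image1, np.roll(image2, -s, axis=1)) for s in shifts]
best = list(shifts)[int(np.argmax(scores))]
```

The score peaks at the shift that undoes the misalignment; the thesis replaces this one-parameter search with a six-parameter affine search and, in its fine stage, an MST-based entropy estimate.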
Alternatively, you can also think of image registration as finding the transformation which aligns the two images accurately. You might now wonder why this is not a simple task. We saw that there is a wide array of factors that contribute to image formation. The challenge of registration lies in matching two images when they are taken at different times, under different conditions, and from different viewpoints, possibly with different image acquisition techniques. Recently, the problem of multimodality-based image registration, which refers to registering images under different lighting conditions, has gained importance.

At the core, registration methods can be viewed as a combination of certain choices about the following parameters, namely [4]:

Feature space: The feature space refers to the aspects of the image used to perform registration. In some techniques, the raw image intensities are used directly; these are called intensity-based methods. There are also feature-based methods, which extract some information from the raw image (termed features) and then use these features to perform registration. The features can be extracted via image segmentation or by using information-theoretic techniques.

Search space: This is the set of transformations applied to the image during the registration process. The spatial transformations run the gamut of rigid, affine, projective, polynomial, non-rigid (morphing), etc. These are explained later.

Search strategy: This term denotes the technique used to search the set of transformations. In an exhaustive search, the entire space is scanned.
In a hierarchical search, the set of transformations is searched using a coarse-to-fine hierarchy. Finally, there are techniques that use local gradient information of an image similarity measure to determine a sequence of transformations which progressively minimizes the measure.

Similarity metric: This is the criterion used for judging the quality of the achieved registration. The similarity metric is a figure of merit used to rank-order the various spatial transformations.

One might wonder what this similarity metric is anyway. Can't we simply compare the intensities at the corresponding pixel locations to verify that the images have been registered? It is not so straightforward. In figure 1-1 we show two registered images A and B. Clearly, the intensities in the two images at the same location do not correspond.

Figure 1-1: Two registered images with different intensity distributions

There have been many recent efforts focused on information-theoretic methods. An information-theoretic criterion usually provides us with a registration objective function which has to be minimized (or maximized) over a range. One such objective function is the mutual information objective function [28], which was used to perform multimodality-based image registration. Mutual information measures the amount of overlapping information content between two images. An interesting metaphor for intensity-based and information-theory-based similarity measures is that the intensity-based technique can be viewed as "What We See Is All We Get (WWSIAWG)", while the information-theory-based criteria can be viewed as "What We See Is What We Mean (WWSIWWM)". Mutual information techniques have now become widespread in registration, especially in medical imaging.

The mutual information between any two images can be computed either from the raw image intensities or from feature vectors extracted from each image. Rarely do we see registration approaches that combine intensity and features; this thesis attempts such a combination. Histograms are commonly used to estimate the entropy (information content), which is then used to calculate the mutual information. When feature-based methods are used, the dimensionality of the feature space increases. Histogram methods suffer from the "curse of dimensionality" and thus cannot easily be used to estimate the entropy of a high-dimensional feature set. Also, the
In this thesis, we have closely followed Hero's work and have successfully shown how the MST-based approach can be used to perform registration over the six parameters of the affine transformation. We have computed an information-theoretic criterion similar to the conventional mutual information using the Rényi entropy, as proposed in [14], and then minimized it over the affine search space. Also, we have combined the use of intensities and features in the same information-theoretic registration technique. The Rényi entropy is estimated from the high-dimensional feature space using the minimum spanning tree-based method.

¹ Information on entropy, its types, and mutual information is presented in Chapter 2.

After this concise introduction, Chapter 2 covers the background material, Chapter 3 covers the theory, Chapter 4 covers the implementation details and, finally, the last chapter covers the results, conclusions and future work.

CHAPTER 2
BACKGROUND

2.1 Entropy and Mutual Information

From a broad perspective, the entropy of a random variable is defined as a measure of the uncertainty of the random variable. Alternatively, it can also be defined as the amount of information required to represent the variable. Thus, the more improbable a random variable, the greater is its entropy. Shannon [24] derived a measure of entropy for discrete variables. Mathematically, the entropy H(X) of a random variable X is defined as

H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).   (2.1)

Normally the logarithm is taken to base 2, in which case the entropy is expressed in bits. Thus, entropy can also be expressed as the length of the code required to represent the random variable.
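Equation (2.1) is straightforward to evaluate for any discrete distribution. As a minimal sketch (the function name is ours, not from the thesis):

```python
import math

def shannon_entropy(p, base=2.0):
    """Discrete Shannon entropy H(X) = -sum_x p(x) log p(x); base 2 gives bits."""
    return -sum(px * math.log(px, base) for px in p if px > 0.0)

# A fair coin carries exactly one bit of uncertainty;
# a uniform four-way choice carries two bits.
print(shannon_entropy([0.5, 0.5]))    # 1.0
print(shannon_entropy([0.25] * 4))    # 2.0
```

The `if px > 0.0` guard implements the usual convention 0 log 0 = 0, so distributions with impossible outcomes are handled without special-casing.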
If g(x) are the actual values that a variable X can assume, then the expected value of g(X) is written as

E_p[g(X)] = \sum_{x \in \mathcal{X}} g(x) p(x).   (2.2)

Thus, we can write the entropy of a random variable as

H(X) = E_p\left[\log \frac{1}{p(X)}\right].   (2.3)

This leads to the conclusion that H(X) \ge 0, since

0 \le p(x) \le 1 \text{ implies } \log \frac{1}{p(x)} \ge 0.   (2.4)

2.1.1 Joint and Conditional Entropy

Joint and conditional entropy extend the concept of entropy to a pair of distinct random variables. Two random variables are said to be statistically independent when their joint density is the product of their marginal densities. Thus, when two random variables are independent, knowledge of one has no effect on knowledge of the other. On the other hand, when the variables are totally dependent, that is, if Y = f(X), then Y is not random when X is known. The mathematical representation of the joint entropy is

H(X,Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log p(x,y)   (2.5)

and the conditional entropy of one variable given the other is defined as

H(Y|X) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y|x) \log p(y|x)   (2.6)
       = -E_{p(x,y)}[\log p(Y|X)].   (2.7)

By the chain rule we get

H(X,Y) = H(X) + H(Y|X).   (2.8)

Two variables are independent if

H(Y|X) = H(Y),   (2.9)

that is, knowledge of X has no effect on knowledge of Y. Thus we have

H(X,Y) = H(X) + H(Y).   (2.10)

2.1.2 Relative Entropy and Mutual Information

The relative entropy is a measure of the distance between two probability distributions. To explain further, suppose there is a random variable with probability distribution p whose entropy is H(p). Since we do not know the exact distribution p, we assume that it is q, with entropy H(q).
The relative entropy D(p||q) is the difference in the number of entropy bits of p and q. It is also called the Kullback-Leibler distance and is mathematically represented as

D(p||q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.   (2.11)

Mutual information I(X;Y) is defined as the amount of information that one random variable contains about another. It is the relative entropy between the joint distribution and the product of the marginal distributions of the two random variables, expressed as

I(X;Y) = D(p(x,y) || p(x)p(y))   (2.12)
       = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)p(y)}.   (2.13)

It can also be shown that

I(X;Y) = H(X) + H(Y) - H(X,Y) = I(Y;X)   (2.14)

and that

I(X;Y) \ge 0.   (2.15)

Additional properties of mutual information are given in the book by Cover and Thomas [8].

2.1.3 Differential Entropy

While entropy is a measure of uncertainty in the discrete domain, differential entropy is a measure of uncertainty of a variable in the continuous domain. It is defined as

h(X) = -E_x[\log p(X)]   (2.16)
     = -\int_{-\infty}^{\infty} p(x) \log p(x)\, dx.   (2.17)

The major difference between entropy and differential entropy is that differential entropy can take negative values. Thus, there is no direct relationship between code length and differential entropy, since code length cannot be negative.

2.1.4 Rényi Entropy

The entropy described in Section 2.1.3 is the differential Shannon entropy. The Rényi entropy is a generalized version of the Shannon entropy. The Rényi entropy h_\alpha(X) of order \alpha is defined as

h_\alpha(X) = \frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx,   (2.18)

for -\infty < \alpha < \infty, \alpha \neq 1.
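The convergence of the Rényi entropy to the Shannon entropy as the order approaches 1 is easy to check numerically for a discrete distribution. A minimal sketch (not the thesis code; function names are ours):

```python
import math

def renyi_entropy(p, alpha):
    """Discrete Renyi entropy of order alpha (alpha != 1), in bits."""
    return math.log2(sum(px ** alpha for px in p)) / (1.0 - alpha)

def shannon_entropy(p):
    return -sum(px * math.log2(px) for px in p if px > 0.0)

p = [0.5, 0.25, 0.125, 0.125]
# As alpha approaches 1 (from either side), the Renyi entropy
# approaches the Shannon entropy of the same distribution.
for alpha in (0.5, 0.9, 0.99, 1.01, 1.1):
    print(alpha, renyi_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))    # Shannon: 1.75
```

The Rényi entropy is non-increasing in the order, so the printed values bracket the Shannon value from above (orders below 1) and below (orders above 1).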
The discrete version of the Rényi entropy is defined as

H_\alpha(X) = \frac{1}{1-\alpha} \log \sum_{i=1}^{N} p^\alpha(x_i).   (2.19)

When \alpha \to 1, we obtain the Shannon entropy from the Rényi entropy as follows:

h(X) = h_1(X) = -\int p(x) \log p(x)\, dx.   (2.20)

In the discrete domain, as \alpha \to 1, H_\alpha(X) \to H(X). Viola and Wells [28] have stated that the task of registering two images is equivalent to maximizing the mutual information between the two images.

2.2 Histogram Method for Estimating Entropy

Histogram-based density estimation divides the range of the data into bins and then computes the normalized frequency of the data placed in the bins. The normalized frequency counts sum to one. A one-dimensional histogram of data with n samples and b bins is

f(a_i) = \frac{\mathrm{count}(a_i)}{n}, where a_i denotes bin number i and 0 < i \le b.   (2.21)

For multivariate data distributed across d dimensions, histogram-based density estimation is generally performed by constructing M bins along each of the d dimensions. If the number of bins is too large, the histogram becomes spiky and fails to capture the shape of the probability distribution function: the variance is large and the bias is reduced. Conversely, as the number of bins decreases, the estimate becomes overly smooth: the variance narrows and the bias increases. Algorithms that determine a suitable bin count include Sturges' rule [26], which states that for n samples, M = 1 + 3.322 log_10 n. Some other approaches predict the bin size from a priori knowledge of the data. In this thesis, we have chosen the bin size statistically. Histogramming suffers from the curse of dimensionality: if the number of data dimensions is d, with M bins in each direction, the total number of bins is M^d. Such exponential growth in the number of bins incurs drawbacks.
Firstly, since M^d is large for higher-dimensional data, more data points are needed to populate the histogram; otherwise the result is a sparsely populated histogram with numerous empty bins. Secondly, data in higher dimensions tends to be correlated and thus projects itself into selected dimensions. This makes histogram-based density estimation unsuitable for higher-dimensional data.

2.3 Minimum Spanning Trees

Before describing the theory behind the use of minimum spanning trees for registration in Chapter 3, let us define a minimum spanning tree. Minimum spanning trees (MSTs) are used in solving many real-world problems. For example, consider a network with V nodes and E undirected connections between nodes. This can be represented as a connected, undirected graph G = (V, E) containing V vertices and E edges. Now suppose that all the edges are weighted, i.e., for each edge (u, v) \in E we have an associated weight w(u, v). A weight can be used to represent real-world quantities such as the cost of a wire or the propagation delay between two nodes in a network. A spanning tree is defined as an acyclic subgraph that connects all the vertices. A minimum spanning tree is a spanning tree with minimum total weight. If we represent a spanning tree as T \subseteq E, connecting all the vertices with total length w(T), then the minimum spanning tree is defined by

\min_T w(T) = \min_T \sum_{(u,v) \in T} w(u, v).   (2.22)

Generic MST algorithm. The book by Cormen et al. [7] gives a detailed analysis of minimum spanning tree algorithms. The MST algorithm falls into the category of greedy algorithms, which make the best choice at each decision-making step. In other words, at every step, greedy algorithms make the locally optimal choice and hope that it leads to a globally optimal solution.
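As a concrete instance of such a greedy strategy, Kruskal's variant of the MST algorithm (described below) can be sketched in a few lines of Python, using a simple union-find structure (a minimal illustration, not the thesis implementation):

```python
def kruskal_mst(n, edges):
    """Kruskal's greedy MST: edges are (weight, u, v) tuples over vertices
    0..n-1; returns the chosen edge list and the total tree weight."""
    parent = list(range(n))

    def find(x):                      # FIND-SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0.0
    for w, u, v in sorted(edges):     # non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                  # a "safe edge": joins two components
            parent[ru] = rv           # UNION
            mst.append((u, v))
            total += w
    return mst, total

# Toy 4-vertex graph: the MST keeps the edges of weight 1, 2 and 3.
edges = [(1, 0, 1), (2, 1, 2), (3, 2, 3), (4, 0, 3), (5, 0, 2)]
print(kruskal_mst(4, edges))   # ([(0, 1), (1, 2), (2, 3)], 6.0)
```

At each step the cheapest edge that joins two different components is accepted, which is exactly the locally optimal ("greedy") choice; for MSTs this local choice provably leads to a globally optimal tree.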
The greedy MST algorithm builds the tree step by step, incorporating at each step the edge that causes the minimum increase in the total weight without adding any cycle to the tree. Suppose there is a connected, undirected graph G = (V, E) with weight function w : E \to \mathbb{R}. While finding the minimum spanning tree for graph G, the algorithm maintains at each step an edge set S which is a subset of some MST. At each step, an edge (u, v) is added to S such that it does not violate this property, making S \cup \{(u, v)\} again a subset of a minimum spanning tree. The edge added at each step is termed a "safe edge". The algorithm can be written as

GENERIC-MST(G, w)
1. S <- empty set
2. while S does not form a spanning tree
3.     do find a safe edge (u, v) that can be added to S
4.        S <- S \cup \{(u, v)\}
5. return S

There are two popular algorithms for computing the minimum spanning tree: Prim's algorithm and Kruskal's algorithm (refer to [7]). We have used Kruskal's algorithm to find the minimum spanning tree. Its description follows.

Kruskal's algorithm for MST. Kruskal's algorithm is an extension of the generic MST algorithm described above. In Kruskal's algorithm, the set S, which is a subset of the minimum spanning tree, is a forest. At each step, Kruskal's algorithm finds the safe edge to be added as the minimum-weight edge that connects two trees of the forest. Initially the edges are sorted in non-decreasing order of their weights. At each step, one finds the minimum-weight edge not already present in the minimum spanning tree that connects two trees of the forest. This process is repeated until all the vertices are included in the tree. The algorithm can be written as

MST-KRUSKAL(G, w)
1. S <- empty set
2. for each vertex v \in V[G]
3.     do MAKE-SET(v)
4. sort the edges of E by non-decreasing weight w
5.
for each edge (u, v) \in E, taken in non-decreasing order of weight
6.     do if FIND-SET(u) != FIND-SET(v)
7.        then S <- S \cup \{(u, v)\}
8.             UNION(u, v)
9. return S

In steps 1-3, the subset S is initialized to the empty set and |V| single-vertex trees are created. Step 4 sorts the edge set E in non-decreasing order of weight. In steps 5-8, an edge (u, v) is found such that endpoint u belongs to one tree and endpoint v belongs to another; this edge is incorporated into the subset S. The algorithm stops when all vertices are included in the tree. An illustration of how Kruskal's algorithm works is shown in Figure 2-2.

Figure 2-1: Original graph
Figure 2-2: MST by Kruskal's algorithm

A more formal proof of why Kruskal's algorithm works [15] is as follows. Let G be an undirected, connected graph, let t be the minimum spanning tree produced by Kruskal's algorithm, and let t' be an actual minimum spanning tree. One shows that trees t and t' have the same cost. Let E(t) and E(t') be the edge sets of the two trees, each containing n-1 edges, given that the total number of vertices in the graph is n.

PROOF. Case 1: If E(t) = E(t'), then the costs of t and t' are equal. Thus Kruskal's algorithm generates a minimum spanning tree in this case.
Case 2: E(t) != E(t'). Assume that there is an edge p such that p \in E(t) and p \notin E(t').
If we include p in t', then t' will have n edges and thus a unique cycle with edges p, e_1, e_2, ..., e_k. Of these k+1 edges there will be at least one edge e_j not in tree t. The weight of e_j must be greater than or equal to the weight of p because, if its weight were less than that of p, the greedy Kruskal's algorithm would have included it while forming tree t. If any such cycle-forming edge is removed from tree t', a different tree t" is formed whose cost is less than or equal to the cost of tree t', because weight(e_j) \ge weight(p). Thus, tree t" is also a minimum-cost tree. By repeating the above transformation, we can transform tree t' into tree t. Thus, the tree generated by Kruskal's algorithm is a minimum spanning tree. The time complexity of Kruskal's minimum spanning tree algorithm is O(|E| log |E|).

Prim's algorithm. Unlike Kruskal's algorithm, in Prim's algorithm the subset S is always a single tree and never a forest. The algorithm is given by

MST-PRIM(G, w, r)    (G: graph, w: weights, r: root)
1. Q <- V[G]
2. for each u \in Q
3.     do key[u] <- infinity      (key[v]: minimum weight of any edge connecting vertex v to the tree)
4. key[r] <- 0
5. pi[r] <- NIL                   (pi[v]: parent of v in the tree)
6. while Q != empty
7.     do u <- EXTRACT-MIN(Q)
8.        for each v adjacent to u with v \in Q and w(u, v) < key[v]
9.            then pi[v] <- u
10.               key[v] <- w(u, v)

The time complexity of this algorithm is O(|E| + |V| log |V|). More efficient algorithms for computing minimum spanning trees are given by Chazelle [5] and Pettie and Ramachandran [20]. Some algorithms [10] maintain the minimum spanning tree dynamically as vertices are added or removed.

2.4 The Problem of Registration

Image registration can be defined as the problem of finding the best transformation that, when applied to one image, aligns it with another image. Brown [4] and Pluim et
al. [21] surveyed image registration methods, which can be classified both by the algorithms actually used to perform registration and by the applications of the registration techniques. Based on the application, the methods can be classified as registration of images with different modalities, registration of different anatomical objects, and so on. The problem of registering images with different modalities is called multimodality registration. This problem involves registering two images taken with different sensors or under different lighting conditions; an example is the registration of MRI and CT images. In this thesis, we have registered face images of different people, obtained with a camera, under different lighting conditions and with different expressions. Based on the methods used for registration, the techniques are classified by the preprocessing of the images, the type of transformations applied to the images, the interpolation method used, the search strategy used, the image features used, and the measure of registration.

Mutual information for registration. Based on the measure of registration, several statistical criteria are used. The majority of these methods involve entropic measures to estimate how well the images have been aligned. These entropic measures involve estimating the entropies of the images and formulating different measures based on them. Mutual information is one such measure, defined as

I(X;Y) = H(X) - H(X|Y)   (2.23)
I(X;Y) = H(X) + H(Y) - H(X,Y).   (2.24)

Some methods use normalized mutual information to register the images, computed as

NMI(X;Y) = \frac{H(X) + H(Y)}{H(X,Y)}.   (2.25)

Observe that H(X|Y) is a measure of the conditional probability p(x|y), i.e.
of obtaining a particular value x of X given that the value of Y is y. Thus, Equation (2.23) can be interpreted as the uncertainty about X minus the uncertainty about X when Y is known. It can also be stated that I(X;Y) = I(Y;X). Thus, when the images are aligned, the amount of information the images contain about each other is maximized. As a result, registration involves finding the transformation that maximizes the mutual information. Another view of mutual information is obtained from Equation (2.24), which describes mutual information as the sum of the marginal entropies minus the joint entropy. One might argue that, as the images are transformed, the marginal entropies of the images remain constant and the only factor that changes is the joint entropy. From this it would seem that maximizing the mutual information is equivalent to minimizing the joint entropy of the two images. But this is not true, as the entropies are calculated from the overlap areas when computing mutual information. The marginal entropies of the images thus change with the joint entropy, as the number of pixels in the overlap area changes with a given transformation. In such a scenario, if the overlap region is very small, then the joint histogram will show a sharp peak corresponding to the overlap of one image with the background samples from the second image, resulting in a low joint entropy. This would give an incorrect answer, which ironically favors the separation of the two images. By adding the marginal entropy terms in Equation (2.24), a penalty is introduced, since the marginal entropies will also be low when the images are misaligned. This makes mutual information less sensitive to the background problem.

2.5 Image Transformations

An image transformation is a function that maps the locations of points in one image to different locations in another image.
Transformations can be global or local. A global transformation applies the same mapping to the entire image, while a local transformation consists of a collection of several mapping functions, each specific to a particular region of the image. Apart from these broad classes, image transformations can be classified as:

Rigid: involves translations and rotations alone.
Affine: involves translations, rotations, scale and shear.
Perspective: mainly used to map a 3D image to a 2D image by projecting the data.
Non-rigid: the mapping involves morphing and distortions of the image.

In this study we have used the affine transformation and thus will elaborate on it.

Affine transformations. Affine transformations involve rotation, scale, shear and translation. The composite transformation matrix is decomposed into scale, rotation, shear, rotation, as suggested in [11],¹ and translation. This decomposition can be written as

T = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{pmatrix},

the composite matrix, whose upper 2x2 block can be further decomposed as

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2^s & 0 \\ 0 & 2^s \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 2^t & 0 \\ 0 & 2^{-t} \end{pmatrix} \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix},

where s is the scale parameter, t is the shear parameter, \theta and \phi are the rotation angles, and e and f are the translations in the x and y directions. We thus perform our search over six degrees of freedom: two rotations, one scale, one shear and two translations. An interesting property of this affine parameterization is that it does not allow reflections, and it gives a clean decomposition of the composite affine matrix which allows the user to set the different parameters individually.

¹ The paper suggests decomposing into scale, rotation and two shears. However, it also mentions that the second shear can be replaced by a second rotation.

2.6 Independent Component Analysis

Registration of two images can be performed using the pixel intensity values, or by extracting features from the images.
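Whether raw intensities or extracted features are used, the search itself runs over the six-parameter affine space of Section 2.5. The decomposition given there can be sketched numerically; a hedged illustration (pure-Python 2x2 products, helper names ours, with diag(2^t, 2^{-t}) taken as the area-preserving shear factor):

```python
import math

def rot(theta):
    """2x2 rotation matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(A, B):
    """2x2 matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def affine_2x2(s, theta, t, phi):
    """Compose scale . rotation . shear . rotation:
    2^s I . R(theta) . diag(2^t, 2^-t) . R(phi)."""
    scale = [[2.0 ** s, 0.0], [0.0, 2.0 ** s]]
    shear = [[2.0 ** t, 0.0], [0.0, 2.0 ** -t]]
    return matmul(matmul(matmul(scale, rot(theta)), shear), rot(phi))

# With all parameters zero the transformation is the identity. The
# determinant is always 4^s > 0, so reflections cannot occur, matching
# the property claimed for this parameterization.
A = affine_2x2(0.0, 0.0, 0.0, 0.0)
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
print(A, det)   # [[1.0, 0.0], [0.0, 1.0]] 1.0
```

The determinant check makes the no-reflection property concrete: each rotation factor has determinant 1, the shear factor has determinant 2^t 2^{-t} = 1, and the isotropic scale contributes 4^s, which is always positive.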
Features can be extracted simply by passing the image through a set of filters, each of which detects one or more features of interest. Independent component analysis (ICA) is one such technique for extracting features. The motivation for ICA was to solve the "cocktail party problem", which can be posed as follows. Suppose there are two people speaking simultaneously in a room, and we have two microphones to capture their voices. The outputs from the two microphones can be written mathematically as

x_1(t) = a_{11} s_1 + a_{12} s_2   (2.26)
x_2(t) = a_{21} s_1 + a_{22} s_2   (2.27)

where x represents the output from a microphone, t is time, s represents the sources, and a represents the weights assigned to the source signals. ICA attempts to find the original sources of the signals (the speakers in the above case) from the mixed signal input (the microphones). ICA thus involves recovering independent source signals from only the sensor observations, which are unknown weighted linear mixtures of the signals from the independent sources. That is, ICA searches for a linear unmixing matrix that, when multiplied by the original multivariate mixed data, gives statistically independent components of the data.

Definition of independent components. Two variables are said to be statistically independent when neither variable carries information about the other. The joint probability of two independent variables is the product of their marginal probabilities. Thus, two random variables y_1 and y_2 are considered statistically independent when p(y_1, y_2) = p(y_1) p(y_2).
For n such variables, the joint density must be the product of the n marginal densities. Independent variables are also uncorrelated. Moreover, useful independent components are as non-Gaussian as possible: the Gaussian distribution is symmetric, which makes it impossible to identify the directions of the components of the mixing matrix. The obvious question that arises is: what are the different ways of measuring independence?

Measuring independence. Non-Gaussianity, and thus the independence of a variable, can be measured using an information-theoretic metric called negative entropy, also called negentropy. Previously we stated that the differential entropy of a random variable x is given by the formula

h(x) = -\int p(x) \log p(x)\, dx.   (2.28)

Gaussian variables have the largest entropy among all variables of equal variance. On the other hand, variables that have "spikes" in their probability distribution functions (less Gaussian) have lower entropy. Based on this observation, negentropy is defined as

J(y) = H(y_{gauss}) - H(y).   (2.29)

Thus, the negentropy is always positive, except for Gaussian variables, for which it is zero.

The FastICA algorithm. We have used the FastICA package (available at http://www.cis.hut.fi/projects/ica/fastica/ , accessed March 2004) [16] to find the ICA of images. This package implements the FastICA algorithm, which constructs a neural network whose weights are updated by a learning rule. Roughly, the algorithm is as follows:
1. Negentropy is used as a measure of non-Gaussianity.
2. The cost function is an approximation of the negentropy.
3. Initially, random weights are assigned as the starting point.
4. The weights for the subsequent iteration are then estimated by the Newton-Raphson method.
5.
The above step is repeated until convergence.

Assuming that the images are mixtures of signals from statistically independent sources, we can use ICA to extract features from the images.

CHAPTER 3
THEORY OF MINIMUM SPANNING TREE ENTROPIES

In this chapter, we consider minimum spanning trees for registration.

3.1 Entropy Estimation for Registration

In the previous chapter it was shown that mutual information requires the estimation of entropy, which is calculated by estimating the probability density of the underlying data. For image registration, it might be necessary to estimate discrete instead of continuous densities. There are two broad approaches to estimating a probability density: parametric methods and non-parametric methods. Both require the estimation of parameters. However, non-parametric methods estimate only width parameters, such as the bin size in histograms or the window size in Parzen-window density estimation [9], while parametric methods estimate more complicated parameters which describe the underlying density, such as the mean and variance. An overview of methods for density estimation can be found in the book by Christopher Bishop [3]. To elaborate:

Parametric methods: assume a density model for the underlying distribution of the data. These methods then attempt to find the parameters of the particular model which fit the data optimally. One drawback of this approach is that the assumed density model might not be correct. Common parametric density estimation techniques include hidden Markov models (HMMs) and Bayesian networks. HMMs model the underlying distribution with a set of states and probabilistic transitions between them, as described in [22].
Bayesian networks model a causal system in which the occurrence of an event at one node can trigger several changes at the connected nodes, as described in depth in [18]. Both of the above approaches have to model complex graphs, which are defined by large sets of parameters.

Non-parametric methods: make no assumption about the density model underlying the data, letting the data determine the nature of its distribution. Histogram methods and Parzen windows are examples of such methods. Such methods require estimation of a width parameter or smoothing factor (the number of bins in histograms, the Parzen window size) which determines how well the density estimate fits the data.

Semi-parametric methods: Bishop [3] advances a class of density estimation techniques that tries to combine the best features of parametric and non-parametric methods. Techniques such as mixture models fall into this category. These techniques assume Gaussian component densities for the data and thus have to estimate Gaussian-like parameters.

Since both parametric and semi-parametric methods assume a specific underlying distribution, they have to estimate several distribution-specific parameters such as the mean, variance and covariance. In contrast, non-parametric methods estimate a smoothing parameter, which is usually some kind of width parameter. Image registration requires determining the pose parameters in a particular transformation space. If parametric or semi-parametric methods are used for density estimation in registration, then the density model parameters need to be estimated in addition to the pose parameters.
In contrast, non-parametric methods require the estimation of only the width (smoothing) parameters and the pose parameters. Thus, non-parametric density estimators are generally preferred over parametric or semi-parametric methods for image registration. Traditional entropy estimation, either discrete or continuous, requires a priori density estimation. However, the approach taken in this study bypasses the density estimation step entirely. The Rényi entropy is estimated directly from the image samples, via the relationship between the asymptotic length of the minimum spanning tree and the Rényi entropy. The Rényi entropy and its properties are discussed in Section 2.1.4. The following section discusses the chronology of the development of the theory underlying entropy estimation by minimum spanning trees.

3.2 History

In the late 1950s, an interesting viewpoint on estimating entropy using graph algorithms came into existence. It began in 1959 when Beardwood, Halton and Hammersley [2] first analyzed the travelling salesman problem (TSP) and stated that the TSP tour length of a set of multi-dimensional feature vectors is related to the entropy. Later, in 1988, J. M. Steele [25] followed their work and stated that if a fully connected graph is constructed from a multi-dimensional feature space such that the weight of an edge between any two points is the weighted Euclidean distance between the points, then the asymptotic length of a minimum spanning tree constructed from such a graph is related to the underlying probability density. Later, the group at the University of Michigan led by Alfred Hero showed how the asymptotic length of the minimum spanning tree is directly related to the Rényi entropy (refer to Section 2.1.4) [17].
They further pointed out how this relationship between entropy and minimum spanning trees can be extended to a variety of graphs, such as k-nearest-neighbor graphs and Steiner trees [13]. The practical applications of such graphs are emphasized in [12], with image registration being one such application. Minimum spanning trees and the Rényi entropy were initially discussed in Sections 2.3 and 2.1.4, respectively; some relevant portions are reiterated here as needed.

Suppose there are n independent multivariate points in a d-dimensional space, x_i \in \mathbb{R}^d, and that there is a graph on these points with vertices V = \{x_1, x_2, \ldots, x_n\} and edges E = \{(x_i, x_j) : 1 \le i < j \le n\}, where each edge e in the edge set E carries a weight \lambda(|e|). The task of finding a minimum spanning tree involves finding the tree with minimum weight among all trees that span all the vertices in V:

M(x_1, x_2, \ldots, x_n) = \min_T \sum_{e \in T} \lambda(|e|),   (3.1)

where the minimum is taken over all spanning trees T of the vertex set, and \lambda is the weighting function that associates weights with the edges of the minimum spanning tree, which can be defined as

\lambda(x) = x^r, where 0 < r < d.   (3.2)

Let the points in the feature space \mathbb{R}^d, d \ge 2, be independently distributed, with \lambda satisfying \lambda(x) \sim x^r for 0 < r < d, let f be the density of the continuous part of the distribution, and let c(r, d) be a positive constant depending on r and d. Then Steele [25] proved the following relationship between the length of a minimum spanning tree and a power of the density function f:

\lim_{n \to \infty} n^{-(d-r)/d} M(X_1, X_2, \ldots, X_n) = c(r, d) \int_{\mathbb{R}^d} f^{(d-r)/d}(x)\, dx.   (3.3)

The next section shows how Equation (3.3), after a few algebraic manipulations, presents a clear link between the minimum spanning tree and the Rényi entropy.

3.3 Minimum Spanning
Trees for Entropy Estimation

We discussed the order-\alpha Rényi entropy, or simply the Rényi entropy, in Section 2.1.4 and represented it mathematically in Equation (2.18) as

h_\alpha(X) = \frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx.   (3.4)

The relationship between the density function and the length of the minimum spanning tree is specified in Equation (3.3). If we rename the length of the minimum spanning tree in Equation (3.3) as

M(X_1, X_2, \ldots, X_n) = L(X_n) = \min_T \sum_{e_{ij} \in T} e_{ij}^r,   (3.5)

where e_{ij} = \|x_i - x_j\| denotes the Euclidean distance between points i and j in the d-dimensional feature space, we can rewrite Equation (3.3) as

\lim_{n \to \infty} n^{-(d-r)/d} L(X_n) = c(r, d) \int_{\mathbb{R}^d} f^{(d-r)/d}(x)\, dx.   (3.6)

Now, taking the log on both sides and substituting \alpha = (d-r)/d, we obtain

\log \lim_{n \to \infty} n^{-\alpha} L(X_n) = \log c(\alpha, d) + \log \int_{\mathbb{R}^d} f^\alpha(x)\, dx.   (3.7)

Note that the constant c, which was a function of r and d, now becomes a function of \alpha and d due to the substitution. Dividing each side by 1 - \alpha yields

\frac{1}{1-\alpha} \log \int_{\mathbb{R}^d} f^\alpha(x)\, dx = \frac{1}{1-\alpha} \left( \log \lim_{n \to \infty} n^{-\alpha} L(X_n) - \log c(\alpha, d) \right).   (3.8)

From Equation (2.18), the left-hand side of the preceding equation can be reduced, giving

h_\alpha(X) = \frac{1}{1-\alpha} \left( \log \lim_{n \to \infty} n^{-\alpha} L(X_n) - \log c(\alpha, d) \right).   (3.9)

The short derivation above shows how the minimum spanning tree length converges to the Rényi entropy. Equation (3.9) is the fundamental result from which we estimate the entropy, approximately, in this thesis. The estimation is approximate because the constant c (which depends on \alpha and the dimension d) is unknown and cannot be computed analytically. The strength of Equation (3.9) lies in the fact that the length of the minimum spanning tree can easily be computed in polynomial time, O(|E| log |E|) (refer to [7]).
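Up to the unknown additive constant, Equation (3.9) translates directly into code: build the power-weighted complete graph over the samples, take the MST length, and normalize. A sketch with our own helper names (the log c(\alpha, d) term is simply dropped, so only entropy differences are meaningful):

```python
import math, random

def mst_length(points, r=1.0):
    """Total MST length with edge weights ||x_i - x_j||^r, via O(n^2) Prim."""
    def dist(a, b):
        return math.dist(a, b) ** r
    n = len(points)
    key = {j: dist(points[0], points[j]) for j in range(1, n)}
    total = 0.0
    while key:
        j = min(key, key=key.get)     # cheapest vertex to attach next
        total += key.pop(j)
        for k in key:                 # relax remaining keys through j
            key[k] = min(key[k], dist(points[j], points[k]))
    return total

def renyi_entropy_mst(points, d, r=1.0):
    """Equation (3.9) with the unknown log c(alpha, d) term dropped:
    alpha = (d - r)/d,  h_alpha ~ log(L(X_n) / n^alpha) / (1 - alpha)."""
    n, alpha = len(points), (d - r) / d
    return math.log(mst_length(points, r) / n ** alpha) / (1.0 - alpha)

# Samples spread over the unit square yield a longer MST, and hence a
# higher entropy estimate, than the same samples shrunk into a corner.
random.seed(0)
wide = [(random.random(), random.random()) for _ in range(400)]
narrow = [(0.1 * x, 0.1 * y) for x, y in wide]
print(renyi_entropy_mst(wide, d=2) > renyi_entropy_mst(narrow, d=2))   # True
```

Because the dropped constant is the same for any two point sets of the same size and dimension, comparisons and minimization over pose parameters, which is all registration requires, are unaffected by the missing term.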
Observe that Equation (3.9) is a very clean equation which does not require the estimation of any smoothing parameter that has to be chosen carefully, e.g., the number of bins in a histogram or the "sigma" in the Parzen window estimation of entropy. Of course, we have to compute the length of the minimum spanning tree, but this is straightforward. For the number of bins in the histogram and for the Parzen window width estimator, on the other hand, there is no explicit criterion for selecting the smoothing (width) parameter.

An explanation of the rate of convergence of the MST entropy follows. The rate of convergence depends mostly on the number of samples we choose to compute the MST. A paper by Hero et al. ("Convergence rates of minimal graphs with random vertices" by A. O. Hero, J. A. Costa and B. Ma, submitted to IEEE Transactions on Information Theory, March 2003) formulated the rates of convergence of the minimum spanning tree for piecewise-constant densities and then extended the proposition to a more general class of densities. We next discuss a proposition for the rate of convergence of minimum spanning trees. We assume that the number of dimensions d \ge 2, that 1 \le r \le d-1, and that X_1, X_2, \ldots, X_n are independent sample points in the space [0,1]^d with density f with support S \subseteq [0,1]^d.

Proposition (marginal block density). The density f has m^d levels. For the quasi-additive Euclidean functional L_r of order r, the following bound exists:

    \Big| E[L_r(X_1, X_2, \ldots, X_n)]/n^{(d-r)/d} - c(r,d) \int_S f^{(d-r)/d}(x)\, dx \Big| \le O\big((n m^{-d})^{-1/d}\big).    (3.10)

In Equation (3.10), the term L_r denotes the length of the minimum spanning tree whose edge weights are the Euclidean distances between the points raised to the power r, as described in (3.5).
Equation (3.10) means that the difference between the expected length of the minimum spanning tree normalized by n^{(d-r)/d} and the product of the constant c and the integral of the density function is asymptotically bounded by (n m^{-d})^{-1/d}. For example, if d = 2, the upper bound on this difference, which is the error, is O((n m^{-2})^{-1/2}). That is, as the number of samples n increases, the bound becomes tighter, so the error decreases as the number of samples increases. Observe from Equation (3.10) that, as the dimensionality of the feature space increases, the rate of convergence decreases, and thus more samples are needed for convergence.

The preceding proposition is important in the context of image registration, since images have a fixed number of color levels and can therefore be regarded as equivalent to the piecewise-constant density considered above, with a total of m^d levels in the d-dimensional space. Interested readers should refer to the paper by Hero et al. cited above for a generalization of this result to a broader class of densities. The same paper mentions that the aforementioned convergence criteria can also be applied to graph constructions such as the Steiner tree, the minimal matching graph, and the TSP tour, along with their power-weighted variants. However, the minimum spanning tree was chosen based on the paper [12], which recommends the MST as the best method to estimate entropy due to its polynomial run time.
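For concreteness, the O(|E| log |E|) cost quoted earlier is the edge-sorting cost in, for example, Kruskal's algorithm (the implementation this thesis obtains from the Boost Graph Library). A minimal self-contained sketch with a path-compressing union-find:

```python
def kruskal_mst(n, edges):
    """Kruskal's algorithm: sort edges by weight, then greedily add
    each edge that joins two different components (union-find).
    `edges` is a list of (weight, u, v) with 0 <= u, v < n.
    Returns (total_weight, list_of_tree_edges)."""
    parent = list(range(n))

    def find(x):                      # find root, compressing the path
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    total, tree = 0.0, []
    for w, u, v in sorted(edges):     # O(|E| log |E|) dominates
        ru, rv = find(u), find(v)
        if ru != rv:                  # edge connects two components
            parent[ru] = rv
            total += w
            tree.append((u, v))
    return total, tree
```

For instance, on a 4-vertex graph with edges weighted 1.0, 2.0, 3.0 and 0.5, the tree keeps the three cheapest edges that avoid a cycle.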
There exist many well-known algorithms for computing the MST, which supports the use of the minimum spanning tree as the entropy estimator.

3.4 Measure of Registration

Since Viola and Wells [28] and Collignon et al. [6] successfully used the Shannon mutual information to perform registration, it has been used as a registration measure in many systems. In this study, the minimum spanning trees described in the previous section are used to estimate the approximate Rényi entropy of the random variables. This entropy is then used to compute the measure of registration. The Rényi mutual information was first formulated by Alfréd Rényi in the book [23] as

    I_\alpha(x) = \frac{1}{\alpha-1} \log \int_{-\infty}^{\infty} \frac{f_X^\alpha(x)}{\big(\prod_{i=1}^{N} f_{X_i}(x_i)\big)^{\alpha-1}}\, dx.    (3.11)

Unfortunately, the Rényi mutual information cannot be easily computed. Thus, instead of the Rényi mutual information, we use a similar measure which is simply the sum of the marginal entropies minus the joint entropy, analogous to the Shannon mutual information. This measure was proposed by Hild et al. in the paper [14] as

    \sum_{i=1}^{N} H_\alpha(x_i) - H_\alpha(x) = \frac{1}{\alpha-1} \log \frac{\int_{-\infty}^{\infty} f_X^\alpha(x)\, dx}{\prod_{i=1}^{N} \int_{-\infty}^{\infty} f_{X_i}^\alpha(x_i)\, dx_i}.    (3.12)

In the case of two images, this measure is the sum of the two Rényi marginal entropies minus the Rényi joint entropy,

    H_\alpha(x_1) + H_\alpha(x_2) - H_\alpha(x) = \frac{1}{1-\alpha}\big(\log L_{x_1} n^{-\alpha} + \log L_{x_2} n^{-\alpha} - \log L_{x_{12}} n^{-\alpha}\big),    (3.13)

where L_x denotes the length of a minimum spanning tree constructed from a power-weighted graph, computed as L_x = \min_T \sum_{e \in T} e_x^r. The preceding measure is very similar to the Rényi mutual information, as both are positive and both tend to zero when the joint probability is equal to the product of the marginal probabilities (that is, when the variables are independent, as mentioned in the paper [14]).
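A sketch of the measure in Equation (3.13) for two intensity lists follows, with the unknown constants c dropped and the edge-weight exponent fixed at r = 1 for simplicity, so the values are only meaningful for comparing candidate transformations, not as absolute entropies. The marginals are embedded as 2-D points on a line so that one MST routine serves both the marginal and joint estimates.

```python
import math

def mst_length(points):
    """Euclidean MST length (Prim's algorithm, O(n^2))."""
    n = len(points)
    in_tree = [False] * n
    best = [float("inf")] * n
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        total += best[u]
        for v in range(n):
            if not in_tree[v]:
                best[v] = min(best[v], math.dist(points[u], points[v]))
    return total

def registration_measure(a, b, alpha=0.5):
    """Sum of marginal MST entropies minus the joint MST entropy,
    as in Eq. (3.13); the constants c are dropped."""
    n = len(a)
    h = lambda pts: math.log(mst_length(pts) / n ** alpha) / (1.0 - alpha)
    h1 = h([(x, 0.0) for x in a])    # marginal of image 1, on a line
    h2 = h([(x, 0.0) for x in b])    # marginal of image 2, on a line
    h12 = h(list(zip(a, b)))         # joint, in the 2-D feature space
    return h1 + h2 - h12
```

When the two intensity lists are near-identical, the joint samples collapse onto the diagonal, the joint MST shortens, and the measure rises, which is exactly the behavior the registration search exploits.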
However, the main reason this measure is preferred over the actual Rényi mutual information is that it can be calculated using the MST entropy algorithm. Thus, our algorithm maximizes this measure to perform registration. Also, maximizing the mutual information is similar to minimizing the joint entropy of the images when the search space is narrow, since the marginal entropies remain more or less constant in a narrow search space. Referring to Equation (3.12), we see that in a narrow space, with the marginal entropies more or less constant, maximizing the measure maximizes the negative of the joint entropy; in other words, the joint entropy is minimized. In some cases (especially when the results are based on one parameter, such as rotation), we have opted to minimize the joint entropy instead of maximizing the mutual information.

CHAPTER 4
ALGORITHM DESIGN

This chapter first describes the overall design of a generalized registration technique, followed by the various design decisions made while designing this algorithm, a brief algorithm description, and then implementation details.

4.1 General Registration Algorithm

The process of image registration can be represented procedurally as shown in Figure 4-1. The first step preprocesses the images so that they become appropriate for registration. Preprocessing involves performing operations on the images such as noise removal, smoothing, and cropping. Human intervention can occur depending on the type of images considered for registration. The next step is a quick, coarse estimation of the transformation parameters, obtained by scanning the search space at low resolution. Finally, once we have the coarse parameters for the transformation, the region around these rough parameters is scanned in order to get an accurate estimate. This multi-resolution approach helps as it speeds up the registration process.
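The coarse-to-fine scan just described can be sketched as follows; `measure` stands in for whichever registration measure (histogram or MST based) is being maximized over a single transformation parameter such as the rotation angle.

```python
def coarse_to_fine_search(measure, lo=-20.0, hi=20.0,
                          coarse_step=5.0, fine_step=0.5):
    """Scan [lo, hi] at coarse resolution, then rescan a window
    around the coarse optimum at fine resolution. `measure` maps a
    parameter value to the registration measure; larger is better."""
    def grid(a, b, step):
        vals, x = [], a
        while x <= b + 1e-9:
            vals.append(x)
            x += step
        return vals

    best_coarse = max(grid(lo, hi, coarse_step), key=measure)
    window_lo = max(lo, best_coarse - coarse_step)   # refine around the
    window_hi = min(hi, best_coarse + coarse_step)   # coarse optimum
    return max(grid(window_lo, window_hi, fine_step), key=measure)
```

With a well-behaved measure peaking at 3.5 degrees, for example, the coarse pass lands near 5 and the fine pass recovers 3.5.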
Alternatively, if the time requirements for estimating the measure (discussed in Section 3.4) are less stringent, the initial rough estimation step is omitted altogether and the search space is searched at fine resolution.

Figure 4-1: Block diagram for image registration

Figure 4-2: Flow chart for estimating the parameters of the transform

The flow chart for the block used to find the optimal transformation (either in the ballpark estimation or in the fine tuning) is shown in Figure 4-2. We apply the transformation to only one of the two images (the transforming image), while the other image is unchanged (the reference image). At each step of the algorithm, the entropic measure between the two images is calculated. If the measure has reached its maximum (or minimum, depending on its nature) in the space, then the parameters of the transform are declared the optimal parameters. If not, the next transform to be applied to the transforming image is computed and applied, and these steps are repeated until the optimum value of the measure is found in the search space.

4.2 Design of the Algorithm

The design of a registration algorithm involves a sequence of decisions that need to be taken at each step, as discussed in this section. As is obvious from the previous discussion, the crux of this algorithm is the estimation of the Rényi entropy, which is continuous.
This estimation can be done using the non-parametric Rényi measure based on the minimum spanning trees, following Equation (3.8), which is expressed as

    \frac{1}{1-\alpha} \log\Big(\int_{R^d} f^\alpha(x)\, dx\Big) = \frac{1}{1-\alpha}\Big(\log \lim_{n \to \infty} n^{-\alpha} L(X_n) - \log(c(\alpha, d))\Big).    (4.1)

From this equation, the estimation of the entropy is approximate because of the presence of the constant term c, which depends only on \alpha and the dimension d. The first step, therefore, is estimating the parameters by performing a rough image registration using histograms.

4.2.1 Preprocessing by Histograms

The main reason for using histograms is that they are computationally efficient. This allows us to get a ballpark estimate very quickly. The entropies (both joint and marginal) are estimated from the overlap area of the two images. The reason for estimating the joint entropy from the overlap area is obvious. However, we also estimate the marginal entropies from the overlap area because, as the images are transformed, the marginal entropy of the overlap area also changes dynamically with every transformation. The important parameter that decides how well the histogram performs is the bin size, which is chosen based on statistical analysis. Figure 4-3 shows empirical results of the histogram analysis as the number of bins is varied.

Figure 4-3: Effect on registration of varying the number of histogram bins (the number of bins is 8, 16 and 32 from top left to bottom right; the maximum should be present at 0)
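A minimal sketch of such a histogram estimate follows, here computing the plug-in mutual information of two intensity lists; the number of bins is precisely the smoothing parameter under discussion.

```python
import math
from collections import Counter

def histogram_mi(a, b, bins=16):
    """Plug-in mutual information from a joint histogram.
    `a` and `b` are equal-length sequences of intensities in [0, 1);
    `bins` is the smoothing parameter discussed above."""
    n = len(a)
    qa = [min(int(x * bins), bins - 1) for x in a]   # quantize image 1
    qb = [min(int(x * bins), bins - 1) for x in b]   # quantize image 2
    pa, pb = Counter(qa), Counter(qb)                # marginal counts
    pab = Counter(zip(qa, qb))                       # joint counts
    mi = 0.0
    for (i, j), c in pab.items():
        p = c / n
        # p_ij / (p_i p_j) written in terms of raw counts
        mi += p * math.log(p * n * n / (pa[i] * pb[j]))
    return mi
```

Identical inputs drive the estimate toward the entropy of the quantized image (about log of the bin count), while unrelated inputs drive it toward zero, which is what makes it usable as a coarse registration measure.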
Table 4-1: The ICA values

    -0.2845   0.2875  -0.1523   0.1468
     0.3425  -0.5310  -0.4016   0.5821
     0.4048   0.0869  -0.4269  -0.0605
     0.0597   0.0197   0.0251   0.0777

As we can see, when the number of bins decreases (the bin size is large), the histogram is noisy and there are many peaks around the desired maximum. However, if the number of bins increases, the peak disappears as the data is overfitted. After the histogram analysis, we extract features from the images and use these feature images to perform registration, as follows.

4.2.2 Features

Image features are extracted based on a local neighborhood of each pixel, called tags [1]. The basic idea of tag-based methods is to account for the local topology surrounding each pixel. Around each pixel in the image, an n x n matrix is extracted and vectorized, to obtain a vector of length n^2 for every pixel. This is equivalent to quantizing the image so that it reflects local topology, due to which the extracted features become more invariant to changes in individual pixel intensities.

Once the tags are extracted from the image, the next decision is how to extract features. We opted for statistics-based features, since they extract features based on the information content of the image. As mentioned in Section (2.6), the ICA algorithm attempts to find statistically independent components of the image. We have used the FastICA [16] algorithm to compute the ICA of the images. Figure 4-4 shows the feature images as computed by ICA. The actual values of the ICA components can be seen in Table 4-1. The first three rows of Table 4-1 are responsible for the edge images of the original image, while the fourth ICA component simply smooths the image.

Figure 4-4: ICA feature images

Figure 4-5: The faces used to compare registration for different sizes of ICA blocks

Figure 4-6: Variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features, 2. 16 features, 3. 36 features. Maximum located at 0.

One question here is how many ICA features to extract, or how large the dimensionality of the feature space should be. There are several interesting observations in this case. Figure 4-6 shows the mutual information vs. the rotation angle for various sizes of ICA blocks; the images used for registration are shown in Figure 4-5. From Figure 4-6 we can see that the registration worsens if we pick too high a feature dimension.

Figure 4-7: Variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features, 2. 16 features, 3. 36 features

Another interesting observation is seen when we pick the faces shown in Figure 4-30: we then get the MI vs. rotation plots shown in Figure 4-7, from which we see that an acceptable registration result is obtained when we use 16 ICA features. Thus, the number of ICA features to use relies heavily on the type of image being registered. Once features are extracted from the image, the next step is to fine-tune the registration by estimating the entropy using the minimum spanning trees.

4.2.3 Search Strategy

Our prototype MST matching algorithm uses a brute-force search that scans the entire transformation space to perform registration. One of the reasons for not using a more sophisticated search is the way the MST entropies behave in the search space. This behavior is shown in Figure 4-8, from which we can see that the MST entropy is not smooth and has several local maxima. Thus, standard search strategies such as gradient descent cannot be used, as they rely on a smooth objective function.
Figure 4-8: Behavior of the MST entropy in 1-D and 2-D search spaces

Thus, in order not to be overwhelmed by the design of an efficient search algorithm for finding the global maximum of a non-smooth objective function, we used a brute-force search, which is guaranteed to find the global maximum over the scanned grid.

4.2.4 Minimum Spanning Tree Algorithm

This is the final step of the registration. Ideally, we would use the entire overlap area of the two images to compute the minimum spanning tree and then estimate the entropy. From Equation (3.8), we see that the entropy estimate is sensitive to the number of samples used to compute the minimum spanning tree, and thus we need to keep this number constant during each computation. So, we choose a fixed number of sample points randomly from the overlap area of the images and use only those to estimate the entropy.

Minimum spanning tree computation. Assume that we pick n sample points each time to compute the minimum spanning tree. We now have points in some d-dimensional space, where the weight of the edge between two points is the Euclidean distance between them. Also, we initially assume that there is an edge between each point and all the other (n-1) points in the space, i.e., that the graph is fully connected. In such a scenario, with n points and n(n-1)/2 edges in the Euclidean space, the classical MST algorithm has complexity of order O(n^2 log n). Although this is polynomial, the search time increases tremendously as n increases. One of the papers [19] mentions resolving this issue by reducing the number of edges. This is done by placing a sphere around each point in the d-dimensional hyperspace.
Only the edges connecting a given point to the neighbors that lie within its sphere are retained for analysis. The radius of the sphere is calculated with the following formula:

    radius_p = mean_p + \frac{max - mean_p}{QFactor} \cdot RFactor,    (4.2)

where radius_p denotes the radius at point p, mean_p denotes the mean of the distances connecting point p to all other points, max denotes the maximum distance between any two points in the d-dimensional space, QFactor denotes the quantization factor, and RFactor denotes the factor that decides the magnitude of the radius. We varied the RFactor over a range of radius values and chose the one that gave the radius with the fewest edges and the smallest error; the plots are shown in Figure 4-9. As a result, every point has a sphere of a different radius, determined primarily by how close the point is to the other points in the hyperspace.

Deciding the number of samples. Figures 4-10 and 4-11 illustrate that when the value of \alpha is kept at about 0.55, 500 samples seem sufficient for registration; however, a nice plot with an explicit maximum is obtained at about 1100 samples. Similar plots for an \alpha value of 0.85 are shown in Figures 4-12 and 4-13, which illustrate that a very good result is obtained at 1100 samples. Thus, evidently, more samples are better for accuracy, and fewer samples are picked when speed is desired. The minimum spanning tree was computed using the Kruskal's algorithm implementation provided by the Boost Graph Library (available at http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). The pseudocode for this implementation can be found in the Appendix.

4.3 Implementation Details

4.3.1 Hardware and Software Details

CPU: Pentium IV, 2.8 GHz, 2 GB RAM. OS: Linux. Languages: Matlab and C++.

All the code was written in Matlab.
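Returning to the edge truncation above, a minimal sketch of the radius-based pruning follows, assuming the reading of Equation (4.2) given in the text; `q_factor` and `r_factor` are the QFactor and RFactor free parameters named there.

```python
import math

def prune_edges(points, q_factor=4.0, r_factor=1.0):
    """Keep, for each point p, only edges to neighbors within
    radius_p = mean_p + (max_dist - mean_p) / q_factor * r_factor
    (Eq. 4.2). Returns a list of (weight, i, j) edges with i < j."""
    n = len(points)
    dist = [[math.dist(points[i], points[j]) for j in range(n)]
            for i in range(n)]
    max_dist = max(dist[i][j] for i in range(n) for j in range(i + 1, n))
    edges = []
    for i in range(n):
        mean_i = sum(dist[i][j] for j in range(n) if j != i) / (n - 1)
        radius_i = mean_i + (max_dist - mean_i) / q_factor * r_factor
        for j in range(i + 1, n):
            if dist[i][j] <= radius_i:       # neighbor lies in the sphere
                edges.append((dist[i][j], i, j))
    return edges
```

On a point set containing one far outlier, the pruning discards the long edges to the outlier while keeping the short local ones, which is the intended reduction of the fully connected graph.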
The program used the Kruskal's algorithm implementation provided by the Boost Graph Library (available at http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004), which is written in C++. The interface between the Matlab and the C++ code is implemented using the Matlab MEX function interface.

Figure 4-9: Plots showing the error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0 and 10 degrees

Figure 4-10: Variations in the joint entropy versus the rotation angle with a varying number of samples picked from the overlap area, when alpha is 0.55: 1. 100 samples, 2. 300 samples, 3. 500 samples, 4. 700 samples

Figure 4-11: Variations in the joint entropy versus the rotation angle with a varying number of samples picked from the overlap area, when alpha is 0.55: 1. 900 samples, 2. 1100 samples

Figure 4-12: Variations in the joint entropy versus the rotation angle with a varying number of samples picked from the overlap area, when alpha is 0.85: 1. 100 samples, 2. 300 samples, 3. 500 samples, 4.
700 samples

Figure 4-13: Variations in the joint entropy versus the rotation angle with a varying number of samples picked from the overlap area, when alpha is 0.85: 1. 900 samples, 2. 1100 samples

Table 4-2: Free parameter table

    Parameter     | Meaning                                                               | Value        | Reason
    alpha         | order of the Rényi entropy                                            | 0.5          | value suggested by Hero et al. in the paper [19]
    BlockSize     | size of the block around each pixel, i.e., the number of ICA features | 2 or 4       | based on the analysis in Section 4.2.2, Figures 4-6 and 4-7
    radius        | radius of the sphere around each point in feature space, used to truncate the edges | 5 | based on the analysis shown in Figure 4-9
    numBins       | number of bins for the histogram search                               | 16           | based on the analysis shown in Figure 4-3
    numberSamples | number of random samples used to compute the Rényi entropy            | 500 or 1000  | picked based on whether speed or accuracy is desired (Section 4.2.4)
    searchSpace   | affine parameters over which to search                                | rotations [-20, 20], scale [-0.5, 0.5], shear [-0.5, 0.5], translations [-5, 5] in each direction | based on the time required to cover the space

Time: The time taken for the entire algorithm to run on 64 x 64 images, with histogram preprocessing followed by final registration using the MST entropy with 1000 samples, is approximately 8 hours.

4.3.2 Free Parameters

The free parameters are tabulated in Table 4-2.

4.4 Results

This section summarizes the different results obtained from the registration algorithm and shows that registration by estimating the entropy using the MST produces acceptable results. All the face images are obtained from the Yale face database (available at:
As b oth the images considered are the same image in this case, the visual represen tation of the c hange in the MST from the unregistered case to one in the p erfectly registered case is v ery apparen t. The parameters v alues for the follo wing tests: = 0.5 Samples = 500 Num b er of bins of Histogram = 16 Blo c k size around pixel = 2 x 2 The gure 4{14 displa ys the original unregistered image and their o v erlap images. The Figure 4{15 sho ws the minim um spanning tree of the t w o unregistered images. The feature space in this represen tation is the in tensit y of b oth the images for the purp ose of visualization. The gure 4{16 sho ws the ho w the images are registered. The MST of the t w o registered images is sho wn in gure 4{17 Observ e ho w nicely the p oin ts collapse in one line as the images b ecome registered. 4.4.2 Registration of Same P erson with Dieren t Expressions In this section, here are the registration results for the images of the same p erson in dieren t facial expressions. The parameters c hosen are: = 0.5 Samples = 500 Num b er of bins of Histogram = 16 Blo c k size around pixel = 2*2 PAGE 58 46 Image 1 Image 2 Overlap of Unregistered Images Overlap of Unregistered Edge Images Figure 4{14: Figure sho wing the from top left to b ottom righ t 1. F ace image 2. Same face with applied transformation 3. Ov erlap of the t w o unregistered images 4. Edge image of the o v erlapp ed unregistered images 50 0 50 100 150 200 250 300 0 50 100 150 200 250 300 DoneDtwoMinimum Spanning Tree for Unregistered Images Figure 4{15: Minim um spanning tree of the unregistered images considering only the in tensities of the t w o images as features PAGE 59 47 Image 1 Registered Image 2 Overlap of Registered Images Overlap of Registered Edge Images Figure 4{16: Figure sho wing the from top left to b ottom righ t 1. F ace image 2. F ace obtained after applying the optimal transformation to the unregistered image sho wn in the gure 4{14 3. 
Overlap of the two registered images, 4. Edge image of the overlapped registered images

Figure 4-17: Minimum spanning tree of the registered images, considering only the intensities of the two images as features

Figure 4-18: From top left to bottom right: 1. Face image, 2. Unregistered face image of the same person with glasses, 3. Overlap of the two unregistered images, 4. Edge image of the overlapped unregistered images

The images before registration are shown in Figure 4-18, and the MST for the unregistered case is in Figure 4-19. After performing registration, the results are shown in Figure 4-20, and the MST for the registered case is in Figure 4-21. Looking at the images of the minimum spanning trees, we see that when the two images are unregistered, the points are scattered all over the place. However, when the images are registered, a cluster of points forms along the 45-degree line in the MST space, reducing the length of the MST.

4.4.3 Yet Another Registration of the Same Person with Different Expressions

In this section, we show the registration results for another pair of images of the same person with different expressions. The parameters chosen are: alpha = 0.5, samples = 500,

Figure 4-19: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features

Figure 4-20: From top left to bottom right: 1. Face image, 2. Registered face image of the same person with glasses, 3. Overlap of the two registered images, 4.
Edge image of the overlapped registered images

Figure 4-21: Minimum spanning tree of the registered images, considering only the intensities of the two images as features

number of histogram bins = 16, block size around each pixel = 2 x 2.

The images before registration are shown in Figure 4-22. After performing registration, the results are shown in Figure 4-23. From the previous two results with images of the same person and different expressions, we can see that the algorithm successfully performs registration. An important point to note here is that the image size was 64 x 49 (3136 pixels), of which only 500 samples were used to estimate the entropy in order to perform registration using the MST, unlike histograms, which need all the samples in the area in order to estimate the entropy. Plots of the convergence of the Rényi entropy versus the number of samples are shown later in this section.

4.4.4 Registration Results for Images of Two Different People

We next discuss registration results for images of two different people. The parameters chosen are: alpha = 0.5,

Figure 4-22: From top left to bottom right: 1. Face image of a happy person, 2. Unregistered face image of the same person with a sad expression, 3. Overlap of the two unregistered images, 4. Edge image of the overlapped unregistered images

Figure 4-23: From top left to bottom right: 1. Face image, 2. Registered face image of the same person with a sad expression, 3. Overlap of the two registered images, 4.
Edge image of the overlapped registered images

Figure 4-24: From top left to bottom right: 1. A face image, 2. Face image of a different person, 3. Overlap of the two unregistered images, 4. Edge image of the overlapped unregistered images

samples = 500, number of histogram bins = 16, block size around each pixel = 4 x 4.

The initial state of the unregistered images is shown in Figure 4-24. After performing registration, the results are shown in Figure 4-25. This result shows that although the algorithm does register the two images, the result is not accurate for dissimilar images, and there is definitely scope for improvement. In the next result, therefore, I changed some parameters and obtained some improvement.

4.4.5 Yet Another Registration Result for Images of Two Different People

Next are the registration results for yet another pair of images of different people. The parameters chosen are: alpha = 0.9999, samples = 1000,

Figure 4-25: From top left to bottom right: 1. Face image, 2. Registered face image of a different person, 3. Overlap of the two registered images, 4. Edge image of the overlapped registered images

number of histogram bins = 16, block size around each pixel = 4 x 4. I also scaled the search range for the scale parameter to perform an even finer search.

The initial state of the unregistered images is shown in Figure 4-26. After performing registration, the results are shown in Figure 4-27. This result is a trifle better than the previous one.

4.4.6 Comparison with the Histogram-Based Method

In order to perform this comparison, we selected two sets of images of two different people.
In one case, we registered the faces considered in Section 4.4.5 over all six parameters, namely rotation, scale, shear and the translations. The initial state of the unregistered images is shown in Figure 4-28. After performing registration, the results are shown in Figure 4-29.

Figure 4-26: From top left to bottom right: 1. A face image, 2. Face image of a different person, 3. Overlap of the two unregistered images, 4. Edge image of the overlapped unregistered images

Figure 4-27: From top left to bottom right: 1. Face image, 2. Registered face image of a different person, 3. Overlap of the two registered images, 4. Edge image of the overlapped registered images

Figure 4-28: From top left to bottom right: 1. A face image, 2. Face image of a different person, 3. Overlap of the two unregistered images, 4. Edge image of the overlapped unregistered images

Figure 4-29: From top left to bottom right: 1. Face image, 2. Registered face image of a different person, 3. Overlap of the two images registered using the histogram, 4. Edge image of the overlapped registered images

Figure 4-30: From left to right: 1. Face image of a person with light from the left, 2.
Face image of the same person, which is to be registered.

Figure 4-31: Figure showing image 2 as registered by the histogram

We see from Figure 4-29 that the histogram method performs as well as the MST when the intensity differences between the two images are not pronounced. We then performed the comparison on two images in which the lighting in one image came from the left side while in the other it came from the right, and compared the results over rotation only. The images considered are shown in Figure 4-30. The second image is rotated by 5 degrees, so perfect registration finds the right answer at -5 degrees. The results with the histogram are shown in Figure 4-31, and the plot of MI vs. rotation is shown in Figure 4-32. Notice how the maximum MI occurs at 9 degrees instead of -5 degrees, which is a clear error. The results with the MST are shown in Figure 4-33, and the plot of MI vs. rotation is shown in Figure 4-34. Notice how the maximum MI occurs at -5 degrees, as expected.

Figure 4-32: Plot of MI vs. rotation angle when registered by histogram. The maximum should be at -5 degrees.

Figure 4-33: Figure showing image 2 as registered by the MST

Figure 4-34: Plot of MI vs. rotation angle as registered by MST. The correct answer is at -5 degrees.

Figure 4-35: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.4 2. alpha = 0.45 3. alpha = 0.5 4.
alpha = 0.55

Thus we conclude that on images with large intensity variations, registration using a minimum spanning tree definitely outperforms registration by the histogram.

4.4.7 Convergence of the Renyi Entropy with the Number of Samples as Alpha Changes

Recall from the discussion in the background and theory chapter that we need to estimate the Renyi entropy, which converges to the Shannon differential entropy as alpha approaches 1. The question here is what value of alpha to choose such that the estimated Renyi entropy converges quickly, i.e., with fewer samples. This experiment documents plots of the Renyi entropy computed using minimum spanning trees vs. the number of samples as we vary alpha. The results are shown in Figures 4-35, 4-36, and 4-37.

Figure 4-36: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.6 2. alpha = 0.65 3. alpha = 0.7 4. alpha = 0.75

Figure 4-37: Figure showing, from top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.8 2. alpha = 0.85 3. alpha = 0.9 4. alpha = 0.95

Table 4-3: Noise results over rotation for histogram

Noise Variance   Rotation Error
0.01             0
0.02             0
0.03             0
0.04             0
0.05             1
0.06             1
0.07             1
0.08             1
0.09             1
0.1              1

The graphs in Figures 4-35, 4-36, and 4-37 show that the behavior of the Renyi entropy is very erratic for all values of alpha and in fact does not seem to converge quickly.

4.4.8 Noise Trials

Noise in the image may affect the registration results.
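Before turning to the noise trials, the quantity tracked in the convergence plots above, the MST-based Rényi entropy estimate, can be sketched as follows. This is a minimal pure-NumPy sketch (Prim's algorithm on the complete Euclidean graph), not the thesis implementation; it ignores the unknown constant C(gamma, d), so the estimate is correct only up to an additive constant, and the function names are mine.

```python
import numpy as np

def mst_length(points):
    """Total edge length of the Euclidean MST via Prim's algorithm, O(N^2)."""
    n = len(points)
    in_tree = np.zeros(n, dtype=bool)
    best = np.full(n, np.inf)  # cheapest known edge from the tree to each vertex
    best[0] = 0.0
    total = 0.0
    for _ in range(n):
        i = int(np.argmin(np.where(in_tree, np.inf, best)))  # closest non-tree vertex
        total += best[i]
        in_tree[i] = True
        d = np.linalg.norm(points - points[i], axis=1)
        best = np.minimum(best, d)  # relax edges from the newly added vertex
    return total

def renyi_entropy_mst(points, alpha):
    """H_alpha ~ (1/(1-alpha)) * log(L_MST / N^alpha), up to an additive constant."""
    n = len(points)
    return np.log(mst_length(points) / n**alpha) / (1.0 - alpha)

# A 4x4 unit grid has MST length 15; more spread-out samples give a larger entropy.
grid = np.array([[x, y] for x in range(4) for y in range(4)], dtype=float)
print(mst_length(grid))  # -> 15.0
```

The convergence experiment above amounts to calling `renyi_entropy_mst` on growing sample sets for each alpha and plotting the resulting values.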
This section compares the histogram technique to the MST technique when Gaussian noise of varying variance is added to the images. The images were then registered over rotation alone and over rotation and scale, and the errors of the two techniques were compared.

Case 1: Registration over rotation only

Range of rotation: -10 to 10 degrees
Correct answer at: 0 degrees

The error of the histogram method for various amounts of added noise is tabulated in Table 4-3. As seen in Table 4-3, the histogram performs well for low noise but gives an error of 1 degree for noise variances of 0.05 and above. The error of the MST method is shown in Table 4-4. From this table, we can see that the MST performs well in the low-noise case but again introduces error (in fact larger error than the histogram) as the variance of the added noise increases.

Table 4-4: Noise results over rotation for MST

Noise Variance   Rotation Error
0.01             0
0.02             0
0.03             0
0.04             0
0.05             0
0.06             3
0.07             0
0.08             3
0.09             0
0.1              2

Case 2: Registration over rotation and scale

Rotation range: -10 to 10 degrees
Scale range: -0.5 to 0.5
The correct answer for both rotation and scale is at 0.

Initially, some trials were made by simply adding noise to the images and then registering the raw images. With this option, however, the MST approach performed badly and gave an error even in low-noise cases with a variance of 0.01. The histogram method clearly outshone the MST method in this case. We therefore applied a smoothing filter to the images before registering them with the MST and the histogram. When a smoothing Wiener filter was applied to the images after adding noise, the histogram method consistently gave ZERO error for variances from 0.01 to 0.08. The results when the MST was used are tabulated in Table 4-5. This table shows how the MST error worsens dramatically in a two-parameter search space.
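The noise-injection and pre-smoothing steps of these trials can be sketched as follows. This is a NumPy sketch under stated assumptions: a 3x3 mean filter stands in for the Wiener filter actually used in the trials, and the function names are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(img, variance):
    """Add zero-mean Gaussian noise of the given variance, as in the noise trials."""
    return img + rng.normal(0.0, np.sqrt(variance), img.shape)

def smooth(img):
    """3x3 mean filter (a simple stand-in for the Wiener filter used in the thesis)."""
    pad = np.pad(img, 1, mode="edge")
    h, w = img.shape
    # Sum the nine shifted copies of the padded image, then average.
    return sum(pad[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0

img = np.ones((32, 32))
noisy = add_gaussian_noise(img, 0.05)
# Mean filtering shrinks the noise variance (by roughly the filter size, 9x here).
print(noisy.var() > smooth(noisy).var())  # -> True
```

Each trial then registers the smoothed noisy images and records the rotation and scale errors reported in the tables.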
Table 4-5: Noise results over rotation and scale for MST (1000 samples)

Noise Variance   Rotation Error   Scale Error
0.01             0                0
0.02             0                0
0.03             2                0
0.04             4                0

Table 4-6: Noise results over rotation and scale for MST (1500 samples)

Noise Variance   Rotation Error   Scale Error
0.01             0                0
0.02             0                0
0.03             2                0
0.04             2                0
0.05             1                0.1
0.06             0                0
0.07             2                0
0.08             2                0

For the results shown in Table 4-5, 1000 samples from the overlap area of the images were used to compute the MST length. Another set of results, using 1500 samples to compute the MST, is tabulated in Table 4-6. From Table 4-6 we can see that the accuracy of registration by MST decreases progressively, as an error is consistently introduced. This result illustrates the behavior of the MST entropy when Gaussian noise is added to the image and the image is then smoothed with a particular filter before registering. There are different types of noise, and different filters that can be used to remove the noise; the relationship between the accuracy of MST registration and these choices needs to be studied further.

CHAPTER 5
DISCUSSION

The estimation of entropy has applications in a variety of areas, image registration being one of them. Recently, information-theoretic measures such as mutual information have become very popular in image registration. Estimating the mutual information requires the estimation of entropy. This thesis used minimum spanning tree algorithms to estimate the entropy from a finite set of samples. The MST can be computed easily in polynomial time. One of the good attributes of MST-based entropy estimation is that no further parameter tuning is required. However, when applying these entropies to perform registration, several important issues need to be dealt with. The MST entropy does not approximate the Shannon entropy; instead it approximates the Renyi entropy seen in equation (3.3).
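The relationship between the two entropies is easy to check numerically for a discrete distribution: the Rényi entropy H_alpha = (1/(1-alpha)) log(sum p_i^alpha) approaches the Shannon entropy as alpha approaches 1. A small sketch (the function names are mine):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Discrete Renyi entropy H_alpha(p) = log(sum p_i^alpha) / (1 - alpha)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

def shannon_entropy(p):
    """Discrete Shannon entropy, the alpha -> 1 limit of the Renyi entropy."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

p = [0.5, 0.25, 0.125, 0.125]
for alpha in (0.5, 0.9, 0.999999):
    print(round(renyi_entropy(p, alpha), 4))  # decreases toward the Shannon value
print(round(shannon_entropy(p), 4))
```

For the MST estimator, however, alpha must stay strictly below 1, which is why the thesis works with the Rényi quantity directly rather than its Shannon limit.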
Thus it cannot estimate the Shannon mutual information, which is widely used in registration. In fact, it cannot even estimate the original Renyi mutual information that we saw in (3.11). Instead we have used an information-theoretic measure, the difference between the sum of the marginal entropies and the joint entropy, as suggested in [14]. This measure has properties similar to those of the Renyi mutual information. There are several open issues that need to be resolved here. We would like to undertake a systematic comparison with other established density and entropy estimators such as Parzen windows [9]. Parzen windows estimate both the Shannon entropy as well as the Renyi entropy; thus they can also be used to estimate the Shannon mutual information. The Parzen-window density function can be computed from the following equation,

f_n(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left( -\frac{\| x - x_i \|^2}{2\sigma^2} \right).   (5.1)

This equation can be further used to estimate the Renyi entropy mentioned in equation (2.18). Note that the Parzen-window density estimator can compute the entropy for alpha > 1, whereas for the MST the condition on alpha is that it must be less than 1. Thus all the parameters have to be formulated carefully in order to compare these density estimators. Apart from this, even within the regime of MST entropies used for registration, several important changes can be made to the method proposed in this thesis. One is that, in this thesis, we have simply scanned the entire space to find the maximum (refer back to Section 4.2.3). Clearly, a well-optimized algorithm for searching over such a space would drastically cut the time required. Also, one area of improvement is in feature selection.
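The Parzen-window comparison suggested above is straightforward to prototype. Equation (5.1) can be sketched as follows (a NumPy sketch with an isotropic Gaussian kernel; the function name is mine, and sigma is a free bandwidth parameter that such estimators require tuning of):

```python
import numpy as np

def parzen_density(x, samples, sigma):
    """Gaussian Parzen-window estimate f_n(x) of equation (5.1)."""
    samples = np.atleast_2d(samples)     # N x d matrix of sample points
    d = samples.shape[1]
    sq = np.sum((samples - x) ** 2, axis=1)
    norm = (2.0 * np.pi * sigma**2) ** (d / 2.0)
    return float(np.mean(np.exp(-sq / (2.0 * sigma**2)) / norm))

# One sample at the origin: the estimate is just a unit Gaussian bump.
print(round(parzen_density(np.array([0.0]), np.array([[0.0]]), 1.0), 4))  # -> 0.3989
```

Plugging this density into the Rényi (or Shannon) entropy formula gives the plug-in estimator against which the MST estimator would be compared; the need to tune sigma is exactly the parameter-tuning burden the MST approach avoids.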
Right now, we have used ICA features based on the rationale that ICA returns statistically independent features, but a detailed study of the features could help improve the registration results. There are several other methods of feature selection, such as principal component analysis and the use of image-processing filters like Gabor filters [27]. However, the noise analysis in Section 4.4.8 suggests that when noise is present the images require smoothing, since these methods rely heavily on taking derivatives of the images. Thus the presence of noise can in fact have a negative effect on the registration when derivative-based features are used, vis-à-vis intensity-based registration. An interesting extension to registration using MST entropies can be pursued if, instead of using the Euclidean distance among the feature points, we project the points into a different space (e.g., a Hilbert space) and then use the distance in that space to compute the MST length. Whether such an extension improves the registration is another open area of research. In fact, we also noticed in Section 4.4.8 that the histogram method is less sensitive to noise; thus registration using a high-dimensional histogram is also a possibility. In order to do that, however, innovative ways have to be devised to compute histograms in a multi-dimensional space. In conclusion, in this thesis we use MST entropies to register two face images at a time, searching over the space of affine transformations. This technique can also be expanded to register several images at once. To bring that about, however, a good search algorithm is a necessity. It also requires appropriate selection of features in order to extract the most relevant information. For example, the features used to register face images may not be suitable for medical images.
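The Hilbert-space idea mentioned above amounts to replacing the Euclidean edge length with the distance between kernel feature maps, ||Phi(x) - Phi(y)||^2 = k(x,x) + k(y,y) - 2 k(x,y). A minimal sketch with a Gaussian kernel (an assumed choice; nothing in the thesis prescribes a particular kernel, and the function names are mine):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel k(x, y)."""
    return float(np.exp(-np.sum((np.asarray(x) - np.asarray(y)) ** 2) / (2 * sigma**2)))

def rkhs_distance(x, y, k=gaussian_kernel):
    """Distance between the feature maps of x and y in the RKHS induced by kernel k."""
    sq = k(x, x) + k(y, y) - 2.0 * k(x, y)
    return float(np.sqrt(max(sq, 0.0)))  # clamp tiny negative values from rounding

# These distances could replace Euclidean edge lengths when building the MST.
print(round(rkhs_distance([0.0, 0.0], [3.0, 4.0]), 4))  # -> 1.4142
```

Note that with a bounded kernel such as this one the distance saturates for far-apart points, which is precisely the kind of change whose effect on MST-based registration remains to be studied.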
Also, MST entropies can be used for non-rigid image registration, provided the method is made extremely computationally efficient. Thus, by implementing a registration algorithm that searches over the affine transformation space, this thesis bolsters the claim that MST entropies can be used for registration.

APPENDIX
ADDITIONAL INFORMATION

This section covers the pseudocode of the algorithm.

Pseudo-code. The program consists of the following main functions:

FunctionRegister: The main function, which calls all the subfunctions.
FunctionRoughRegister: Performs the rough image registration, using intensities and histograms for entropy estimation.
FunctionExtractICAFeatures: Extracts the ICA features from the images.
FunctionMSTRegister: Performs the fine registration, using the ICA features and the minimum spanning tree for entropy estimation.

and the following supplementary functions:

FunctionAffineTransform: Performs the affine transformation of the input image according to the given parameters.
FunctionJointEntropyByHistogram: Computes the joint histogram and accordingly estimates the Shannon joint entropy.
FunctionMarginalEntropyByHistogram: Computes the histogram and accordingly estimates the Shannon marginal entropy.
FunctionEntropyByMST: Estimates the Renyi entropy (joint and marginal) using the minimum spanning tree.

Here are the main functions explained in depth:

FunctionRegister
Input:
  Image1, Image2: The images.
  BlockSize: Size of the square block around each pixel (i.e., 2x2, 3x3, etc.).
  Alpha: The alpha factor in the Renyi entropy.
The parameters of ane are t w o rotations, scale, shear and translations in the x and y direction MaxMe asur e : The Mutual Information lik e measure when the for the parameters ab o v e. Algorithm:// P erform rough image registration of the images using in tensit y + histogram RoughAneP arameters = F unctionRoughRegister ( Image1 Image2 NumBins ); // Extract ICA features from Input Images [icaF eatures1, icaF eatures2, ICAF eatures] = F unctionExtractICAF eatures ( Image1 Image2 Blo ckSize ); // P erform ne image registration of the images using ICA F eatures, + histogram AneP arameters = F unctionMSTRegister (ICAF eatures, A lpha RoughAneP arameters); EndF unction F unctionRoughRegister Input : Image1 Image2 : The Images. NumBins : The n um b er of bins used to compute the histogram. Output : R oughAnePar ameters : The ane transformation parameters whic h when applied to the second image giv e a roughly registered images. The parameters of ane are t w o rotations, scale, shear and translations in the x and y direction PAGE 81 69 Algorithm:// The Image1 is k ept xed while Image2 is transformed o v er the searc h space. Ov er all the six parameters of the ane searc h space using a rough resolution to scan the space do tempImg = F unctionAneT ransfo rm (curren tT ransformV alues); join tEn trop y = F unctionJointEntrop yByHistogram (o v erlap areas of the Image1 and tempImg); marginalEn trop y1 = F unctionMa rginalEntrop yByHistogram (o v erlap area of Image1); marginalEn trop y2 = F unctionMa rginalEntrop yByHistogram (o v erlap area of Image2); curren tMI = marginalEn trop y1 + marginalEn trop y2 join tEn trop y; if ( curren tMI > maxMI ) maxMI = curren tMI; RoughAneP arameters = curren t ane parameters; endIfendWhilereturn RoughAneP arameters; EndF unction F unctionExtractICAF eatures Input : Image1 Image2 : The Images. Blo ckSize : Size of the square blo c k around eac h pixel(i.e. 2 x 2 3 x 3 etc). 
Output:
  icaFeatures1: The ICA features of Image1.
  icaFeatures2: The ICA features of Image2.
  ICAFeatures: The combined ICA features of the two images.

Algorithm:
  // Get the block neighborhood of each pixel in both images.
  BlockInput1 = GetBlockNeighborhoodImage(Image1, BlockSize);
  BlockInput2 = GetBlockNeighborhoodImage(Image2, BlockSize);
  // Compute the ICA features. We have used the freely available fastICA package [16] to do so.
  icaFeatures1 = FASTICA(BlockInput1);
  icaFeatures2 = FASTICA(BlockInput2);
  // Now combine the ICA features
  ICAFeatures = icaFeatures1 U icaFeatures2;
  return icaFeatures1, icaFeatures2, ICAFeatures;
EndFunction

FunctionMSTRegister
Input:
  icaFeatures1: The ICA features of Image1.
  icaFeatures2: The ICA features of Image2.
  ICAFeatures: The combined ICA features of the images.
  Alpha: The alpha used to compute the Renyi entropy.
  RoughAffineParameters: The ballpark estimate from the histograms, around which the search is performed.
Output:
  FinalAffineParameters: The affine transformation parameters which, when applied to the second image, give a registered image. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
  MaxMI: The value of the mutual-information-like measure when the images are fully aligned.

Algorithm:
  // Image1 is kept fixed while Image2 is transformed over the search space.
  while scanning all six affine parameters around the ballpark estimate RoughAffineParameters at a fine resolution do
    // First pick random samples from the joint as well as the individual feature spaces.
    (sampleJoint, sample1, sample2) = pickRandomSamples(ICAFeatures, icaFeatures1, icaFeatures2);
    tempImg = FunctionAffineTransform(currentTransformValues);
    jointEntropy = FunctionEntropyByMST(sampleJoint);
    marginalEntropy1 = FunctionEntropyByMST(sample1);
    marginalEntropy2 = FunctionEntropyByMST(sample2);
    currentRegistrationMeasure = marginalEntropy1 + marginalEntropy2 - jointEntropy;
    if (currentRegistrationMeasure > maxMeasure)
      maxMeasure = currentRegistrationMeasure;
      FinalAffineParameters = current affine parameters;
    endIf
  endWhile
  return FinalAffineParameters, maxMeasure;
EndFunction

Now the pseudocode of the supplementary functions:

FunctionAffineTransform
Input:
  Image: The image to transform.
  TransformParameters: The parameters by which the image is to be transformed.
Output:
  TransformedImage: The transformed image after applying the transformation.

Algorithm:
  // The transform parameters are 2 rotation angles (phi and psi), scale, shear,
  // and translations (xT and yT):
  //
  //   [ a  b ]   [ 2^scale    0     ] [ cos(phi)  -sin(phi) ] [ 2^shear     0       ] [ cos(psi)  -sin(psi) ]
  //   [ c  d ] = [   0     2^scale  ] [ sin(phi)   cos(phi) ] [   0      2^(-shear) ] [ sin(psi)   cos(psi) ]
  //
  //   affineTransformMatrix = [ a   b   0 ]
  //                           [ c   d   0 ]
  //                           [ xT  yT  1 ]
  //
  // Now call MATLAB's image transform routine, which transforms the given image according
  // to the given affine transform matrix and fills the empty areas according to the
  // specified interpolation method.
  TransformedImage = FunctionMATLABImTransform(Image, affineTransformMatrix, interpolation);
  return TransformedImage;
EndFunction

FunctionJointEntropyByHistogram
Input:
  Image1, Image2: The images.
  NumBins: The number of bins used to compute the histogram.
Output:
join tHist = F unctionJoin tHistogram(Image1, Image2, in terv al); join tEn trop y = 0; for i = 1 to NumBins for j = 1 to NumBins join tEn trop y = -1 join tHist(i,j) log join tHist(i,j); endF or endF or return jointEntr opy ; EndF unction F unctionMarginalEn trop yByHistogram Input : Image : The Image. NumBins : The n um b er of bins used to compute the histogram. Output : mar ginalEntr opy : Marginal Shannon En trop y of the image computed b y the histogram metho d Algorithm:in terv al = Range/NumBins; //Compute the one dimensional normalized histogram image. marginalHist = F unctionMarginalHistogram(Image, in terv al); mar ginalEntr opy = 0; PAGE 86 74 for i = 1 to NumBins mar ginalEntr opy = -1 marginalHist(i,j) log marginalHist(i,j); endF or return mar ginalEntr opy ; EndF unction F unctionEn trop yByMST Input : fe atur eMatrix : The feature matrix whose en trop y is to b e calculated. The size of the matrix is N d. where N :Num b er of sample p oin ts and d :Their dimension d 2 A lpha : The Alpha P arameter that is used in computing the Ren yi En trop y Output : Entr opy : Ren yi En trop y of the feature matrix computed b y the Minimal Spanning T ree metho d. Algorithm:/* In this function w e attempt to compute the appro ximate Ren yi En trop y based on the form ula 3.8 W e do not kno w the factor C ( ; d ) and th us it is appro ximate. */ [N d] = size(featureMatrix); /* Compute the minimal spanning tree and its length. W e use the Krusk al Algorithm pro vided in the "Bo ost Graph Library" (Av ailable at: http://www.boost.org/libs/graph/doc/table of contents.html Accessed on: Marc h 2004) */ lengthMST = F unctionBOOSTKrusk al(featureMatrix); gamma = (d Alpha) /d; En trop y = 1 1 Al pha log l eng thM S T N Alpha ; PAGE 87 75 return En trop y; EndF unction PAGE 88 REFERENCES [1] Y. Amit and D. Geman Shap e quantization and r e c o gnition with r andomize d tr e es Neural Computation, 9 (1997), pp. 1545{1588. [2] D. Beard w ood, H. J. Hal ton, and J. M. 
Hammersley, The shortest path through many points, in Proceedings of the Cambridge Philosophical Society, vol. 55, 1959, pp. 299-327.

[3] C. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, Oxford, England, 1995.

[4] L. G. Brown, A survey of image registration techniques, Computing Surveys, 24 (1992), pp. 325-376.

[5] B. Chazelle, A minimum spanning tree algorithm with inverse-Ackermann type complexity, Journal of the ACM, 47 (2000), pp. 1028-1047.

[6] A. Collignon, D. Vandermeulen, P. Suetens, and G. Marchal, 3D multi-modality medical image registration using feature space clustering, in Computer Vision, Virtual Reality and Robotics in Medicine, N. Ayache, ed., vol. 905 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 195-204.

[7] T. Cormen, C. Leiserson, and R. Rivest, Introduction to Algorithms, MIT Press, Cambridge, MA, USA, 2001.

[8] T. Cover and J. Thomas, Elements of Information Theory, John Wiley and Sons, New York, NY, USA, 1991.

[9] R. Duda and P. Hart, Pattern Classification and Scene Analysis, John Wiley and Sons, New York, NY, 1973.

[10] D. Eppstein, Dynamic Euclidean minimum spanning trees and extrema of binary functions, Discrete & Computational Geometry, 13 (1995), pp. 111-122.

[11] S. Gold, A. Rangarajan, C. P. Lu, S. Pappu, and E. Mjolsness, New algorithms for 2-D and 3-D point matching: pose estimation and correspondence, Pattern Recognition, 31 (1998), pp. 1019-1031.

[12] A. O. Hero, B. Ma, O. Michel, and J. Gorman, Applications of entropic spanning graphs, IEEE Signal Processing Magazine (Special Issue on Mathematics in Imaging), 19 (2002), pp. 85-95.

[13] A. O. Hero and O. Michel, Asymptotic theory of greedy approximations to minimal k-point random graphs, IEEE Transactions on Information Theory, 45 (1999), pp. 1921-1938.

[14] K. E. Hild II, D. Erdogmus, and J.
Principe, Blind source separation using Renyi's mutual information, IEEE Signal Processing Letters, 8 (2002), pp. 174-176.

[15] E. Horowitz, S. Sahni, and S. Rajasekaran, Fundamentals of Computer Algorithms, W. H. Freeman, San Francisco, CA, 1998.

[16] A. Hyvärinen and E. Oja, Independent component analysis: a tutorial, Neural Networks, 13(4-5) (2000), pp. 411-430.

[17] B. Ma, A. Hero, J. Gorman, and O. Michel, Image registration with minimum spanning tree algorithm, in International Conference on Image Processing, vol. 1, 2000, pp. 481-484.

[18] R. E. Neapolitan, Learning Bayesian Networks, Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.

[19] H. Neemuchwala, A. Hero, and P. Carson, Image registration using entropic graph-matching criteria, in Proceedings of the Asilomar Conference on Signals and Systems, vol. 1, 2002, pp. 134-138.

[20] S. Pettie and V. Ramachandran, An optimal minimum spanning tree algorithm, in Automata, Languages and Programming, 2000, pp. 49-60.

[21] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging, 22 (2003), pp. 986-1004.

[22] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77 (Feb. 1989), pp. 257-286.

[23] A. Rényi, Probability Theory, North-Holland Publishing Company, Amsterdam, Netherlands, 1970.

[24] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, 27 (1948), pp. 379-423 and 623-656.

[25] J. M. Steele, Growth rates of Euclidean minimum spanning trees with power weighted edges, The Annals of Probability, 16 (1988), pp. 1767-1787.

[26] H. A. Sturges, The choice of a class interval, Journal of the American Statistical Association, 21 (1926), pp. 65-66.

[27] B. C. Vemuri, J. Liu, and J. L.
Marroquín, Robust multimodal image registration using local frequency representations, vol. 2082 of Lecture Notes in Computer Science, 2001, pp. 176-182.

[28] P. Viola and W. M. Wells III, Alignment by maximization of mutual information, in Fifth International Conference on Computer Vision (ICCV), 1995, pp. 16-23.

BIOGRAPHICAL SKETCH

I graduated with a bachelor of engineering degree from the Government College of Engineering, Pune, India. I started my master of science program at the University of Florida in August 2002, and I am planning to graduate in a month! My academic interests lie in computer vision and image processing, and this thesis was a great learning experience for me. Apart from this, I enjoy painting and badminton.