



AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By

ADITEE KUMTHEKAR

A THESIS PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE

UNIVERSITY OF FLORIDA

2004


Copyright 2004 by Aditee Kumthekar


This thesis is dedicated to my family, whose constant and unconditional support has made it possible.


ACKNOWLEDGMENTS

I would like to thank Dr. Anand Rangarajan for his excellent guidance, for the innumerable hours he spent patiently resolving my difficulties, and for his constant encouragement, which was the driving force behind this thesis. I would also like to extend my thanks to Jie Zhang, who helped me see certain aspects of image registration clearly.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION

2 BACKGROUND
  2.1 Entropy and Mutual Information
    2.1.1 Joint and Conditional Entropy
    2.1.2 Relative Entropy and Mutual Information
    2.1.3 Differential Entropy
    2.1.4 Rényi Entropy
  2.2 Histogram Method for Estimating Entropy
  2.3 Minimum Spanning Trees
  2.4 The Problem of Registration
  2.5 Image Transformations
  2.6 Independent Component Analysis

3 THEORY OF MINIMUM SPANNING TREE ENTROPIES
  3.1 Entropy Estimation for Registration
  3.2 History
  3.3 Minimum Spanning Trees for Entropy Estimation
  3.4 Measure of Registration

4 ALGORITHM DESIGN
  4.1 General Registration Algorithm
  4.2 Design of the Algorithm
    4.2.1 Preprocessing by Histograms
    4.2.2 Features
    4.2.3 Search Strategy
    4.2.4 Minimum Spanning Tree Algorithm
  4.3 Implementation Details
    4.3.1 Hardware and Software Details
    4.3.2 Free Parameters
  4.4 Results
    4.4.1 Same Image Registered with the Same Image
    4.4.2 Registration of the Same Person with Different Expressions
    4.4.3 Yet Another Registration of the Same Person with Different Expressions
    4.4.4 Registration Result of Images of Two Different People
    4.4.5 Yet Another Registration Result of Images of Two Different People
    4.4.6 Comparison with the Histogram-Based Method
    4.4.7 Convergence of Rényi Entropy with Number of Samples as Alpha Changes
    4.4.8 Noise Trials

5 DISCUSSION

APPENDIX: ADDITIONAL INFORMATION
REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

4-1 The ICA values
4-2 Free parameter table
4-3 Noise results over rotation for histogram
4-4 Noise results over rotation for MST
4-5 Noise results over rotation and scale for MST
4-6 Noise results over rotation and scale for MST


LIST OF FIGURES

1-1 Two registered images with different intensity distributions
2-1 Original graph
2-2 MST by Kruskal's algorithm
4-1 Block diagram for image registration
4-2 Flow chart for estimating the parameters of the transform
4-3 Effect on registration of varying the number of histogram bins (8, 16, and 32 bins from top left to bottom right; the maximum should be at 0)
4-4 ICA feature images
4-5 The faces used to compare registration for different sizes of ICA blocks
4-6 Variation in MI vs. rotation angle for various ICA feature dimensions (1. 4 features, 2. 16 features, 3. 36 features); maximum located at 0
4-7 Variation in MI vs. rotation angle for various ICA feature dimensions (1. 4 features, 2. 16 features, 3. 36 features)
4-8 Behavior of MST entropy in 1D and 2D search space
4-9 Plots of the error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0, and 10 degrees
4-10 Variation in joint entropy vs. rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.55 (1. 100 samples, 2. 300 samples, 3. 500 samples, 4. 700 samples)
4-11 Variation in joint entropy vs. rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.55 (1. 900 samples, 2. 1100 samples)
4-12 Variation in joint entropy vs. rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.85 (1. 100 samples, 2. 300 samples, 3. 500 samples, 4. 700 samples)
4-13 Variation in joint entropy vs. rotation angle with varying numbers of samples picked from the overlap area when alpha is 0.85 (1. 900 samples, 2. 1100 samples)
4-14 From top left to bottom right: 1. face image, 2. same face with applied transformation, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-15 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-16 From top left to bottom right: 1. face image, 2. face obtained after applying the optimal transformation to the unregistered image shown in Figure 4-14, 3. overlap of the two registered images, 4. edge image of the overlapped registered images
4-17 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-18 From top left to bottom right: 1. face image, 2. unregistered face image of the same person with glasses, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-19 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features
4-20 From top left to bottom right: 1. face image, 2. registered face image of the same person with glasses, 3. overlap of the two registered images, 4. edge image of the overlapped registered images
4-21 Minimum spanning tree of the registered images considering only the intensities of the two images as features
4-22 From top left to bottom right: 1. face image of a happy person, 2. unregistered face image of the same person with a sad expression, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-23 From top left to bottom right: 1. face image, 2. registered face image of the same person with a sad expression, 3. overlap of the two registered images, 4. edge image of the overlapped registered images
4-24 From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-25 From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two registered images, 4. edge image of the overlapped registered images
4-26 From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-27 From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two registered images, 4. edge image of the overlapped registered images
4-28 From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images
4-29 From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two images registered using the histogram, 4. edge image of the overlapped registered images
4-30 From left to right: 1. face image of a person, 2. registered face image of the same person
4-31 Image 2 as registered by the histogram
4-32 Plot of MI vs. rotation angle when registered by histogram; the maximum should be at -5 degrees
4-33 Image 2 as registered by the MST
4-34 Plot of MI vs. rotation angle as registered by MST; the correct answer is at -5 degrees
4-35 From top left to bottom right, MST Rényi joint entropy vs. the samples for 1. alpha = 0.4, 2. alpha = 0.45, 3. alpha = 0.5, 4. alpha = 0.55
4-36 From top left to bottom right, MST Rényi joint entropy vs. the samples for 1. alpha = 0.6, 2. alpha = 0.65, 3. alpha = 0.7, 4. alpha = 0.75
4-37 From top left to bottom right, MST Rényi joint entropy vs. the samples for 1. alpha = 0.8, 2. alpha = 0.85, 3. alpha = 0.9, 4. alpha = 0.95


Abstract of Thesis Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Master of Science

AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE ENTROPIES

By Aditee Kumthekar

May 2004

Chair: Anand Rangarajan
Major Department: Computer and Information Science and Engineering

Environmental factors, the settings of the image acquisition device, and the material properties of an object all directly affect the formation of an image. Registration aligns two images obtained under different conditions, and several information-theoretic methods exist for this task. Entropy estimation is an important step in these methods, and it traditionally requires estimating the density function of the images. This thesis, however, estimates the entropy directly, without a prior density estimation step. This is achieved by exploiting a powerful theorem that relates the asymptotic length of the minimum spanning tree formed from the samples to the Rényi entropy of the density function underlying the samples. Since the density estimation step is avoided, so is the careful tuning of the several parameters that most density estimation methods require. Also, since the spanning tree method estimates entropy in high dimensions, feature information can be included in the registration process. Affine registration of face images of different subjects, with different gestures, and taken under different lighting conditions is demonstrated.


CHAPTER 1
INTRODUCTION

The entire field of computer vision is motivated by the problem of designing a system with the capabilities of human vision. When we look around, our eyes and our brain immediately analyze the relationships between the different objects in our immediate environment. We usually do not have to think long to interpret what we see; it comes naturally to us. However, we have not been able to emulate this amazing system of perception even forty years after the inception of the field of computer vision. Why is this so? It is because a very complex relationship exists between the physical factors and the way an image is formed. Such factors include the reflectivity and absorbency of the illuminated surface, the intensity and direction of the illumination, and environmental factors like the presence of dust and speckle in the atmosphere, rain, etc. In addition, the mode in which an image is acquired (for example, a camera) and the parameters that calibrate the instrument (for example, the camera's shutter speed) contribute to the image formation. Viola and Wells [28] have represented image formation as the equation

v(T(x)) = F(u(x), q),    (1.1)

where T(x) is the transformation function which determines the orientation of the image; u(x) is the function which describes the surface of the object; q represents the other parameters that affect image formation, such as lighting and environment; v() is the function that formulates the pixel intensities; and F(u(x), q) is the function which incorporates the various parameters mentioned above that play a role in the image formation.


Modelling the aforementioned function is not a trivial task, as there are a vast number of parameters involved. Many vision-related systems attempt to provide functionalities such as object recognition (identifying objects against the background region), indexing of image databases (the efficient retrieval of images similar to a query image), and object modelling (the mathematical representation, coding, and synthetic generation of an image from a model). Many such applications require that the images under consideration be aligned before further processing can be performed on them. Apart from these applications, which require alignment as an initial preprocessing step, there is a plethora of applications which rely on accurate image alignment. One such application is "anomaly detection" in medical imaging, which refers to detecting any changes in a particular organ over a period of time, due either to disease or to treatment, e.g., verifying that the post-operative MRI of a patient shows the removal of the tumor that was seen in the pre-operative scan. This requires accurate alignment of the images.

Image registration is precisely this task of spatially aligning two images so that corresponding pixels in the images refer to the same portion or area of the scene. Alternatively, you could think of image registration as finding the transformation which accurately aligns the two images.

You might now wonder why this is not a simple task. We saw that a wide array of factors contributes to image formation. The challenge of registration lies in matching two images taken at different times, under different conditions, from different viewpoints, and probably with different image acquisition techniques. Recently, the problem of multimodality image registration, which refers to registering images acquired under different lighting conditions, has gained importance.


At their core, registration methods can be viewed as a combination of choices about the following components [4]:

Feature space: The feature space refers to the attributes of the image used to perform registration. Some techniques use the raw image intensities directly; these are called intensity-based methods. There are also feature-based methods, which extract some information from the raw image (termed features) and then use these features to perform registration. The features can be extracted via image segmentation or by using information-theoretic techniques.

Search space: This is the set of transformations applied to the image during the registration process. The spatial transformations run the gamut of rigid, affine, projective, polynomial, non-rigid (morphing), etc. These are explained later.

Search strategy: This term denotes the technique used to search the set of transformations. In an exhaustive search, the entire space is scanned. In a hierarchical search, the set of transformations is searched using a coarse-to-fine hierarchy. Finally, there are techniques that use local gradient information of the image similarity measure to determine a sequence of transformations which progressively minimizes the measure.

Similarity metric: This is the criterion used for judging the quality of the achieved registration. The similarity metric is a figure of merit used to rank order the various spatial transformations.

One might wonder what this similarity metric is anyway. Can't we simply compare the intensities at corresponding pixel locations to verify that the images have been registered? It is not so straightforward. In Figure 1-1 we show two registered images, A and B. Clearly, the intensities in the two images at the same location do not correspond.

There have been many recent efforts focused on information-theoretic methods. An information-theoretic criterion usually provides a registration objective function which has to be minimized (or maximized) over a range. One such objective function is the mutual information objective function [28], which was used to perform multimodality image registration. Mutual information measures the amount of overlapping information content between two images.


[Figure 1-1: Two registered images with different intensity distributions]

An interesting metaphor for intensity-based and information-theory-based similarity measures is that the intensity-based technique can be viewed as "What We See Is All We Get (WWSIAWG)," while the information-theory-based criteria can be viewed as "What We See Is What We Mean (WWSIWWM)." Mutual information techniques have now become widespread in registration, especially in medical imaging. The mutual information between any two images can be computed either from the raw image intensities or from feature vectors extracted from each image. Rarely do we see registration approaches that combine intensity and features. This thesis attempts such a combination.

Histograms are commonly used to estimate the entropy (information content), which is then used to calculate the mutual information. When feature-based methods are used, the dimensionality of the feature space increases. Histogram methods suffer from the "curse of dimensionality" and thus cannot easily be used to estimate the entropy of a high-dimensional feature set.


Also, histogram-based methods are extremely sensitive to the number of bins. Moreover, histograms estimate the discrete Shannon entropy while, as we explain later, we need to estimate the continuous differential entropy.¹ This calls for a non-parametric method to estimate the entropy in a high-dimensional feature space. One such technique is to use spanning graphs to estimate entropy. The early work in this area was done by Steele [25], who showed how the length of the minimum spanning tree (MST) of a complete graph, built from the Euclidean distances between the feature vectors, is directly proportional to the integral of an appropriately defined power of the probability density defined on the feature space. Later, the group at the University of Michigan led by Alfred Hero et al. [12, 19] showed how the aforementioned relation between the length of the spanning tree and the integral of the power of the density can be used to estimate the Rényi entropy, which in turn can be used to perform registration.

The search space of affine transformations includes six parameters: two rotations, one scale, one shear, and two translations in the plane. The affine transformation is applied globally to the entire image. In this thesis, we have closely followed Hero's work and have successfully shown how the MST-based approach can be used to perform registration over the six parameters of the affine transformation. We have computed an information-theoretic criterion, similar to the conventional mutual information, using the Rényi entropy as proposed in [14], and then optimized it over the affine search space. Also, we have combined the use of intensities and features in the same information-theoretic registration technique. The Rényi entropy is estimated from the high-dimensional feature space using the minimum spanning tree-based method.

¹ Information on entropy, its types, and mutual information is presented in Chapter 2.


6 from the high dimensional features space using the minimal spanning tree-based metho d. After this concise in tro duction, Chapter 2 co v ers the bac kground material, Chapter 3 co v ers the theory Chapter 4 co v ers the implemen tation details and nally the last c hapter co v ers the results, conclusions and future w ork.


CHAPTER 2
BACKGROUND

2.1 Entropy and Mutual Information

From a broad perspective, the entropy of a random variable is defined as a measure of the uncertainty of the random variable. Alternatively, it can be defined as the amount of information required to represent the variable. Thus, the more improbable the values a random variable takes, the greater its entropy. Shannon [24] derived a measure of entropy for discrete variables. Mathematically, the entropy H(X) of a random variable X is defined as

H(X) = -\sum_{x \in \mathcal{X}} p(x) \log p(x).    (2.1)

Normally the logarithm is taken to base 2, in which case the entropy is expressed in bits. Thus, entropy can also be interpreted as the length of the code needed to represent the random variable. If g is a function of the values that the variable X can assume, then the expected value of g(X) is written as

E_p[g(X)] = \sum_{x \in \mathcal{X}} g(x) p(x).    (2.2)

Thus, we can write the entropy of a random variable as

H(X) = E_p[\log(1/p(X))].    (2.3)

This leads to the conclusion that H(X) \geq 0, since

0 \leq p(x) \leq 1 implies \log(1/p(x)) \geq 0.    (2.4)
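To make Equations (2.1)-(2.3) concrete, here is a minimal Python sketch (not part of the original thesis implementation) that computes the entropy of a discrete distribution directly from its probability mass function; the coin distributions are made-up examples.

    import numpy as np

    def shannon_entropy(p, base=2.0):
        """H(X) = -sum_x p(x) log p(x) for a discrete pmf.

        Outcomes with zero probability contribute nothing,
        by the convention 0 log 0 = 0.
        """
        p = np.asarray(p, dtype=float)
        p = p[p > 0]                          # drop empty outcomes
        return -np.sum(p * np.log(p)) / np.log(base)

    # A fair coin carries exactly one bit of uncertainty ...
    print(shannon_entropy([0.5, 0.5]))        # 1.0
    # ... while a heavily biased coin carries far less.
    print(shannon_entropy([0.99, 0.01]))      # ~0.081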


2.1.1 Joint and Conditional Entropy

Joint and conditional entropy extend the concept of entropy to a pair of random variables. Two random variables are said to be statistically independent when their joint density is the product of their marginal densities. Thus, when two random variables are independent, knowledge of one has no effect on knowledge of the other. On the other hand, when the variables are totally dependent, that is, if Y = f(X), then Y is not random when X is known. The mathematical representation of the joint entropy is

H(X, Y) = -\sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log p(x, y),    (2.5)

and the conditional entropy of one variable given the other is defined as

H(Y | X) = -\sum_{x \in \mathcal{X}} p(x) \sum_{y \in \mathcal{Y}} p(y | x) \log p(y | x)    (2.6)
         = -E_{p(x,y)}[\log p(Y | X)].    (2.7)

By the chain rule we get

H(X, Y) = H(X) + H(Y | X).    (2.8)

Two variables are independent if

H(Y | X) = H(Y),    (2.9)

that is, knowledge of X has no effect on knowledge of Y. Thus we have

H(X, Y) = H(X) + H(Y).    (2.10)
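The chain rule (2.8) is easy to verify numerically. The following sketch uses an invented joint pmf to compute the joint, marginal, and conditional entropies; none of these numbers come from the thesis experiments.

    import numpy as np

    def H(p):
        """Shannon entropy (bits) of a pmf given as an array of any shape."""
        p = np.asarray(p, dtype=float).ravel()
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    # A made-up joint pmf p(x, y); rows index x, columns index y.
    pxy = np.array([[0.25, 0.25],
                    [0.40, 0.10]])

    Hxy = H(pxy)              # joint entropy H(X, Y), Eq. (2.5)
    Hx = H(pxy.sum(axis=1))   # marginal entropy H(X)
    Hy_given_x = Hxy - Hx     # conditional entropy via the chain rule (2.8)
    print(Hxy, Hx, Hy_given_x)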


2.1.2 Relative Entropy and Mutual Information

The relative entropy is a measure of the distance between two probability distributions. To explain: suppose there is a random variable with probability distribution p whose entropy is H(p). Since we do not know the exact distribution p, we assume that it is q, with entropy H(q). The relative entropy D(p || q) measures the difference in entropy bits between p and q. It is also called the Kullback-Leibler distance and is mathematically represented as

D(p || q) = \sum_{x \in \mathcal{X}} p(x) \log \frac{p(x)}{q(x)}.    (2.11)

Mutual information I(X; Y) is defined as the amount of information that one random variable has about the other. It is the relative entropy between the joint distribution and the product of the marginal distributions of the two random variables, expressed as

I(X; Y) = D(p(x, y) || p(x) p(y))    (2.12)
        = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.    (2.13)

It can also be shown that

I(X; Y) = H(X) + H(Y) - H(X, Y)    (2.14)

and that

I(X; Y) = I(Y; X), with I(X; Y) \geq 0.    (2.15)

Additional properties of mutual information are given in the book by Cover and Thomas [8].
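As a sketch of Equations (2.11)-(2.14), the code below computes the mutual information of the same kind of toy joint pmf as the Kullback-Leibler distance between the joint and the product of the marginals; the pmf is again invented for illustration.

    import numpy as np

    def kl(p, q):
        """Relative entropy D(p || q) of Eq. (2.11), in bits."""
        p = np.asarray(p, dtype=float).ravel()
        q = np.asarray(q, dtype=float).ravel()
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

    pxy = np.array([[0.25, 0.25],          # invented joint pmf p(x, y)
                    [0.40, 0.10]])
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    # Mutual information as D(p(x,y) || p(x)p(y)), Eq. (2.12); it equals
    # H(X) + H(Y) - H(X, Y) of Eq. (2.14).
    print(kl(pxy, np.outer(px, py)))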


2.1.3 Differential Entropy

While entropy is a measure of uncertainty in the discrete domain, differential entropy is a measure of the uncertainty of a variable in the continuous domain. It is defined as

h(X) = -E_x[\log p(X)]    (2.16)
     = -\int_{-\infty}^{\infty} p(x) \log p(x) \, dx.    (2.17)

The major difference between entropy and differential entropy is that differential entropy can take negative values. Thus, there is no direct relationship between code length and differential entropy, since code length cannot be negative.

2.1.4 Rényi Entropy

The entropy described in Section 2.1.3 is the differential Shannon entropy. The Rényi entropy is a generalized version of the Shannon entropy. The Rényi entropy h_\alpha(X) of order \alpha is defined as

h_\alpha(X) = \frac{1}{1 - \alpha} \log \int f^\alpha(x) \, dx,    (2.18)

for \alpha > 0, \alpha \neq 1. The discrete version of the Rényi entropy is defined as

H_\alpha(X) = \frac{1}{1 - \alpha} \log \sum_{i=1}^{N} p^\alpha(x_i).    (2.19)

As \alpha \to 1, we recover the Shannon entropy from the Rényi entropy:

h_1(X) = -\int p(x) \log p(x) \, dx.    (2.20)

In the discrete domain, as \alpha \to 1, H_\alpha(X) \to H(X). Viola and Wells [28] have stated that the task of registering two images is equivalent to maximizing the mutual information between the two images.

2.2 Histogram Method for Estimating Entropy

Histogram-based density estimation divides the range of the data into bins and then computes the normalized frequency of the data falling in each bin. The normalized frequency counts sum to one. A one-dimensional histogram of data with n samples and b bins is

f(a_i) = \frac{\mathrm{count}_{a_i}}{n}, where a_i denotes bin number i and 0 < i \leq b.    (2.21)
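The following sketch ties Equations (2.19) and (2.21) together: it estimates the discrete Rényi entropy of a sample from its normalized histogram. The Gaussian sample and the bin count are arbitrary illustrative choices, not settings from the thesis.

    import numpy as np

    def renyi_entropy_hist(samples, alpha, bins=32):
        """Discrete Renyi entropy, Eq. (2.19), computed from the
        normalized histogram density estimate of Eq. (2.21)."""
        counts, _ = np.histogram(samples, bins=bins)
        p = counts[counts > 0] / counts.sum()
        if np.isclose(alpha, 1.0):           # Shannon limit, Eq. (2.20)
            return -np.sum(p * np.log(p))
        return np.log(np.sum(p ** alpha)) / (1.0 - alpha)

    x = np.random.default_rng(0).normal(size=5000)
    for a in (0.5, 0.9, 0.99, 1.0):
        print(a, renyi_entropy_hist(x, a))
    # As alpha -> 1, the Renyi estimate approaches the Shannon estimate.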


For multivariate data distributed across d dimensions, histogram-based density estimation is generally performed by constructing M bins along each of the d dimensions. If the number of bins is too large, the histogram becomes spiky and fails to capture the shape of the probability distribution function: the variance is high and the bias is low. Conversely, as the number of bins decreases, the variance narrows and the bias increases, and the estimated distribution becomes overly smooth. Algorithms that determine an appropriate bin count include Sturges' rule [26], which states that for n samples, M = 1 + 3.322 log_10 n. Other approaches predict the bin size from a priori knowledge of the data. In this thesis, we have chosen the bin size empirically.

Histogramming suffers from the curse of dimensionality. That is, if the number of data dimensions is d, with M bins in each direction, then the total number of bins is M^d. Such exponential growth incurs drawbacks. First, since M^d is large for higher-dimensional data, many more data points are needed to populate the histogram; otherwise the result is a sparsely populated histogram with numerous empty bins. Second, data in higher dimensions tend to be correlated and thus project onto selected dimensions. This makes histogram-based density estimation unsuitable for higher-dimensional data.
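The bin explosion is easy to see numerically. In the sketch below (sample size and bin count are arbitrary), a fixed budget of samples occupies a vanishing fraction of the M^d bins as d grows, which is exactly the sparsity problem described above.

    import numpy as np

    rng = np.random.default_rng(0)
    n, M = 1000, 8                   # samples, and bins per axis

    for d in (1, 2, 3, 4, 6):
        x = rng.random((n, d))       # uniform data on [0, 1]^d
        idx = np.floor(x * M).astype(int)          # per-axis bin index
        occupied = len({tuple(row) for row in idx})
        print(f"d={d}: {occupied} of {M ** d} bins occupied")
    # Already at d=4 the 4096 bins dwarf the 1000 samples, so most bins
    # are empty and the histogram density estimate becomes unusable.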


2.3 Minimum Spanning Trees

Before describing the theory behind the use of minimum spanning trees for registration in Chapter 3, let us define a minimum spanning tree. Minimum spanning trees (MSTs) are used in solving many real-world problems. For example, consider a network of nodes with undirected connections between them. This can be represented as a connected, undirected graph G = (V, E) containing |V| vertices and |E| edges. Now suppose that all the edges are weighted, i.e., for each edge (u, v) \in E we have an associated weight w(u, v). A weight can represent a real-world quantity such as the cost of a wire or the propagation delay between two nodes in a network. A spanning tree is defined as an acyclic subgraph that connects all the vertices. A minimum spanning tree is a spanning tree of minimum total weight. If we denote a spanning tree by T \subseteq E, with total weight w(T), then the minimum spanning tree minimizes

w(T) = \sum_{(u,v) \in T} w(u, v).    (2.22)

Generic MST algorithm. The book by Cormen et al. [7] gives a thorough analysis of minimum spanning tree algorithms. The MST algorithm falls into the category of greedy algorithms, which make the best choice at each decision-making step. In other words, at every step, greedy algorithms make the locally optimal choice and hope that it leads to a globally optimal solution. The greedy MST algorithm builds the tree step by step, incorporating at each step the edge that causes the minimum increase in total weight without adding any cycle to the tree.

Suppose there is a connected, undirected graph G = (V, E) with weight function w: E -> R. While finding the minimum spanning tree for graph G, the algorithm maintains at each step an edge set S which is a subset of some MST. At each step, an edge (u, v) is added to S such that it does not violate the MST property of S, making S \cup {(u, v)} still a subset of a minimum spanning tree. The edge added at each step is termed a "safe edge." The algorithm can be written as

GENERIC-MST(G, w)
1. S <- {}
2. while S does not form a spanning tree
3.     do find a safe edge (u, v) which can be added to S
4.        S <- S \cup {(u, v)}
5. return S


There are two popular algorithms for computing the minimum spanning tree: Prim's algorithm and Kruskal's algorithm [7]. We have used Kruskal's algorithm to find the minimum spanning tree. Its description follows.

Kruskal's algorithm for MST. Kruskal's algorithm is an extension of the generic MST algorithm described above. In Kruskal's algorithm, the set S, a subset of the minimum spanning tree, is a forest. At each step, Kruskal's algorithm finds the safe edge as the minimum-weight edge that connects two trees of the forest. Initially, the edges are sorted in non-decreasing order of their weights. At each step, the minimum-weight edge not already present in the minimum spanning tree that connects two trees of the forest is added. This process is repeated until all the vertices are included in the tree. The algorithm can be written as

MST-KRUSKAL(G, w)
1. S <- {}
2. for each vertex v \in V[G]
3.     do MAKE-SET(v)
4. sort the edges of E by non-decreasing weight w
5. for each edge (u, v) \in E, in order of non-decreasing weight
6.     do if FIND-SET(u) != FIND-SET(v)
7.        then S <- S \cup {(u, v)}
8.             UNION(u, v)
9. return S

In steps 1-3, the subset S is initialized to the empty set and |V| single-vertex trees are created. Step 4 sorts the edge set E in non-decreasing order of weight. In steps 5-8, an edge (u, v) is found such that endpoint u belongs to one tree and endpoint v belongs to another; this edge is incorporated into the subset S. The algorithm stops when all vertices are included in the tree.
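For reference, here is a compact executable version of MST-KRUSKAL in Python (a sketch, not the thesis implementation), with MAKE-SET/FIND-SET/UNION realized by a standard union-find array with path compression; the example graph is invented.

    def kruskal_mst(n, edges):
        """Kruskal's algorithm over vertices 0..n-1.
        `edges` is a list of (weight, u, v) tuples; returns MST edges."""
        parent = list(range(n))

        def find(x):                       # FIND-SET with path compression
            while parent[x] != x:
                parent[x] = parent[parent[x]]
                x = parent[x]
            return x

        mst = []
        for w, u, v in sorted(edges):      # non-decreasing weight order
            ru, rv = find(u), find(v)
            if ru != rv:                   # safe edge: joins two trees
                parent[ru] = rv            # UNION
                mst.append((u, v, w))
        return mst

    edges = [(1, 0, 1), (4, 0, 2), (3, 1, 2), (2, 1, 3), (5, 2, 3)]
    print(kruskal_mst(4, edges))  # [(0, 1, 1), (1, 3, 2), (1, 2, 3)]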


[Figure 2-1: Original graph]

An illustration of how Kruskal's algorithm works is shown in Figure 2-2.

[Figure 2-2: MST by Kruskal's algorithm]

A more formal proof of why Kruskal's algorithm works [15] is as follows. Let G be an undirected, connected graph, let t be the spanning tree produced by Kruskal's algorithm, and let t' be an actual minimum spanning tree. One shows that the trees t and t' have the same cost. Let E(t) and E(t') be the edge sets of the two trees, where each set contains n - 1 edges, given that the total number of vertices in the graph is n.

PROOF: Case 1: If E(t) = E(t'), then the costs of t and t' are equal, and Kruskal's algorithm generates a minimum spanning tree.

Case 2: E(t) != E(t'). Assume that there is an edge p such that p \in E(t) and p \notin E(t'). If we include p in t', then t' will have n edges and thus a unique cycle with edges p, e_1, e_2, ..., e_k. Of these k + 1 edges, there must be at least one edge e_j not in tree t. The weight of e_j must be greater than or equal to the weight of p because, if its weight were less than that of p, the greedy Kruskal's algorithm would have included it while forming tree t. If such a cycle edge is removed from t', a different tree t'' is formed whose cost is less than or equal to the cost of t', because weight(e_j) >= weight(p). Thus, t'' is a minimum-cost tree. By repeating the above transformation, we can transform tree t' into tree t. Thus, the tree generated by Kruskal's algorithm is a minimum spanning tree.

The time complexity of Kruskal's minimum spanning tree algorithm is O(|E| log |E|).


Prim's algorithm. Unlike Kruskal's algorithm, in Prim's algorithm the subset S is always a single tree and never a forest. The algorithm is given by

MST-PRIM(G, w, r)    (G: graph, w: weights, r: root)
1. Q <- V[G]
2. for each u \in Q
3.     do key[u] <- \infty     (key[v]: minimum weight of any edge connecting vertex v to the tree)
4. key[r] <- 0
5. pi[r] <- NIL     (pi[v]: the parent of v in the tree)
6. while Q != {}
7.     do u <- EXTRACT-MIN(Q)
8.        for each v \in Adj[u] such that v \in Q and w(u, v) < key[v]
9.            do pi[v] <- u
10.              key[v] <- w(u, v)

The time complexity of this algorithm is O(|E| + |V| log |V|). More efficient algorithms for computing minimum spanning trees are given in Chazelle [5] and Pettie and Ramachandran [20]. Some algorithms [10] maintain the minimum spanning tree dynamically as vertices are added or removed.


2.4 The Problem of Registration

Image registration can be defined as the problem of finding the best transformation that, when applied to one image, aligns it with another image. Brown [4] and Pluim et al. [21] surveyed image registration methods, which can be classified both by the algorithms actually used to perform registration and by the applications of the registration techniques. Based on application, the methods can be classified as registration of images with different modalities, registration of different anatomical objects, etc. The problem of registering images with different modalities is called multimodality registration. It involves registering two images taken with different sensors or under different lighting conditions; an example is the registration of MRI and CT images. In this thesis, we have registered face images, obtained via a camera, of different people, under different lighting conditions and with different expressions.


Based on the methods used for registration, the techniques can be classified by the preprocessing applied to the images, the type of transformation applied, the interpolation method used, the search strategy, the image features used, and the measure of registration.

Mutual information for registration. As the measure of registration, several statistical criteria are used, the majority of which involve entropic measures to estimate how well the images have been aligned. These entropic measures involve estimating the entropies of the images and formulating different measures based on them. Mutual information is one such measure, defined as

I(X; Y) = H(X) - H(X | Y)    (2.23)
I(X; Y) = H(X) + H(Y) - H(X, Y).    (2.24)

Some methods use the normalized mutual information to register the images, computed as

NMI(X; Y) = \frac{H(X) + H(Y)}{H(X, Y)}.    (2.25)

Observe that H(X | Y) is a measure of the conditional probability p(x | y), i.e., of a particular value x in X given that the value in Y is y. Thus, Equation (2.23) can be interpreted as the uncertainty about X minus the uncertainty about X when Y is known. It can also be stated that I(X; Y) = I(Y; X). When the images are aligned, the amount of information the images have about each other is maximized. As a result, registration involves finding the transformation that maximizes the mutual information.

Another view of mutual information is obtained from Equation (2.24), which describes mutual information as the sum of the marginal entropies minus the joint entropy. One might argue that, as the images are transformed, the marginal entropies of the images remain constant and the only factor that changes is the joint entropy. From this it would seem that maximizing the mutual information is equivalent to minimizing the joint entropy of the two images. But this is not true, as the entropies are calculated over the overlap region when computing mutual information. The marginal entropies of the images thus change along with the joint entropy, since the number of pixels in the overlap area changes with a given transformation. In such a scenario, if the overlap region is very small, the joint histogram will be a sharp peak corresponding to the overlap of one image with the background samples of the second image, resulting in a low joint entropy. This gives an incorrect answer which, ironically, favors the separation of the two images. Adding the marginal entropy terms in Equation (2.24) introduces a penalty, since the marginal entropies will also be low when the images are misaligned. This makes mutual information less sensitive to the background problem.


2.5 Image Transformations

An image transformation is a function that maps the locations of points in one image to different locations in another image. Transformations can be global or local. A global transformation applies the same mapping to the entire image, while a local transformation consists of a collection of several mapping functions, each specific to a particular region of the image. Apart from these broad classes, image transformations can be classified as follows:

Rigid: involves translations and rotations alone.
Affine: involves translations, rotations, scale, and shear.
Perspective: is mainly used to map a 3D image to a 2D image by projecting the data.
Non-rigid: involves morphing and distortions of the image.

In this study we have used the affine transformation and thus will elaborate on it.


Affine transformations. Affine transformations involve rotation, scale, shear, and translation. Following [11],¹ the composite transformation matrix is decomposed into scale, rotation, shear, rotation, and translation. The composite matrix, acting on row vectors [x y 1], is

T = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{pmatrix},

whose upper-left block can be further decomposed as

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2^s & 0 \\ 0 & 2^s \end{pmatrix} \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 2^t & 0 \\ 0 & 2^{-t} \end{pmatrix} \begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix},

where s is the scale parameter, t is the shear parameter, \theta and \phi are the rotation angles, and e and f are the translations in the x and y directions. We thus perform our search over six degrees of freedom: two rotations, one scale, one shear, and two translations. An interesting property of this affine parameterization is that it does not allow reflections, and it gives a clean decomposition of the composite affine matrix which allows the user to set the different parameters individually.

¹ The paper suggests decomposing into scale, rotation, and two shears; however, it also mentions that the second shear can be replaced by a second rotation.
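The decomposition above maps directly into code. The sketch below (a hypothetical helper, not the thesis code) assembles the composite matrix from the six parameters (s, theta, t, phi, e, f), assuming the unit-determinant diag(2^t, 2^{-t}) form of the shear block as written above; the sample parameter values are arbitrary.

    import numpy as np

    def affine_matrix(s, theta, t, phi, e, f):
        """Compose A = Scale(s) R(theta) Shear(t) R(phi) and append the
        translation (e, f), following the decomposition given above."""
        def rot(a):
            return np.array([[np.cos(a), -np.sin(a)],
                             [np.sin(a),  np.cos(a)]])
        scale = np.diag([2.0 ** s, 2.0 ** s])    # uniform scale
        shear = np.diag([2.0 ** t, 2.0 ** -t])   # det-1 stretch ("shear")
        A = scale @ rot(theta) @ shear @ rot(phi)
        T = np.eye(3)                 # row-vector convention [x y 1] T,
        T[:2, :2] = A                 # with the translation in the
        T[2, :2] = [e, f]             # bottom row, as in the matrix above
        return T

    # Pure 10-degree rotation, no scale or shear, shifted by (5, -3) pixels.
    print(affine_matrix(0.0, np.radians(10), 0.0, 0.0, 5.0, -3.0))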


2.6 Independent Component Analysis

Registration of two images can be performed by using the pixel intensity values or by extracting features from the images. Features can be extracted simply by passing the image through a set of filters, each of which detects one or more features of interest. Independent component analysis (ICA) is one such technique for extracting features. The original motivation for ICA was to solve the "cocktail party problem," which can be posed as follows: suppose there are two people speaking simultaneously in a room, and we have two microphones to capture their voices. The outputs of the two microphones can be written mathematically as

x_1(t) = a_{11} s_1 + a_{12} s_2    (2.26)
x_2(t) = a_{21} s_1 + a_{22} s_2,    (2.27)

where x represents the output of a microphone, t is time, s represents the sources, and a represents the weights assigned to the source signals. ICA attempts to recover the original sources of the signals (the speakers, in the above case) from the mixed signal input (the microphones). ICA thus involves recovering independent source signals from only the sensor observations, which are unknown weighted linear mixtures of the signals from the independent sources. ICA searches for a linear unmixing matrix that, when multiplied by the multivariate mixed data, gives statistically independent components of the data.

Definition of independent components. Two variables are said to be statistically independent when neither variable carries information about the other. The joint probability of two independent variables is the product of their marginal probabilities. Thus, two random variables y_1 and y_2 are considered statistically independent when p(y_1, y_2) = p(y_1) p(y_2). For n such variables, the joint density must be a product of the n marginal densities. Independent variables are also uncorrelated. Moreover, useful independent components are as non-Gaussian as possible: the Gaussian distribution is symmetric, which makes it impossible to identify the directions of the components of the mixing matrix. The obvious question that now arises is: what are the different ways of measuring independence?


Measuring independence. Non-Gaussianity, and thus the independence of a variable, can be measured using an information-theoretic quantity called negative entropy, or negentropy. We stated previously that the differential entropy of a random variable x is given by the formula

h(x) = -\int p(x) \log p(x) \, dx.    (2.28)

Gaussian variables have the largest entropy among all variables of equal variance. On the other hand, variables that have "spikes" in their probability distribution functions (less Gaussian) have less entropy. Based on this observation, negentropy is defined as

J(y) = H(y_{gauss}) - H(y).    (2.29)

Thus, negentropy is always positive, except for Gaussian variables, for which it is zero.

The FASTICA algorithm. We have used the FASTICA package (available at http://www.cis.hut.fi/projects/ica/fastica/, accessed March 2004) [16] to find the ICA of images. This package implements the FASTICA algorithm, which constructs a neural network whose weights are updated based on a learning rule. Roughly, the algorithm is as follows:

1. Negentropy is used as the measure of non-Gaussianity.
2. The cost function is an approximation of the negentropy.
3. Random weights are assigned at the starting point.
4. The weights for the subsequent iteration are estimated by the Newton-Raphson method.
5. The above step is repeated until convergence.

Assuming that the images are mixtures of signals from statistically independent sources, we can use ICA to extract features from the images, as sketched below.
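As an illustration of ICA-based feature extraction, the sketch below uses scikit-learn's FastICA as a stand-in for the FASTICA MATLAB package used in this thesis; the patch size, component count, and random test image are all arbitrary assumptions made for the example.

    import numpy as np
    from sklearn.decomposition import FastICA

    def ica_features(image, patch=8, n_components=16, seed=0):
        """Learn ICA basis filters from non-overlapping image patches and
        return the per-patch independent components as features."""
        h, w = image.shape
        rows = [image[i:i + patch, j:j + patch].ravel()
                for i in range(0, h - patch + 1, patch)
                for j in range(0, w - patch + 1, patch)]
        X = np.stack(rows)                     # one row per patch
        ica = FastICA(n_components=n_components, random_state=seed)
        S = ica.fit_transform(X)               # independent components
        return S, ica.components_              # features, basis filters

    image = np.random.default_rng(0).random((64, 64))   # stand-in image
    features, filters = ica_features(image)
    print(features.shape, filters.shape)       # (64, 16) (16, 64)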


CHAPTER 3
THEORY OF MINIMUM SPANNING TREE ENTROPIES

In this chapter, we consider minimum spanning trees for registration.

3.1 Entropy Estimation for Registration

In the previous chapter it was shown that mutual information requires the estimation of entropies, which are calculated by estimating the probability density of the underlying data. For image registration, it may be necessary to estimate continuous rather than discrete densities. There are two broad approaches to estimating a probability density: parametric methods and non-parametric methods. Both require the estimation of parameters. However, non-parametric methods estimate only width parameters, like the bin size in histograms or the window size in Parzen window density estimation [9], while parametric methods estimate more complicated parameters which describe the underlying density, such as the mean, variance, etc. An overview of density estimation methods can be found in the book by Christopher Bishop [3]. To elaborate:

Parametric methods: These assume a density model for the underlying distribution of the data and then attempt to find the parameters of that model which optimally fit the data. One drawback of this approach is that the assumed density model might not be correct. Common parametric density estimation techniques include Hidden Markov Models (HMMs) and Bayesian networks. HMMs model the underlying distribution with states and probabilistic transitions from one state to another, as described in [22]. Bayesian networks model a causal system in which the occurrence of an event at one node can trigger changes in the connected nodes, as described in depth in [18]. Both of these approaches have to model complex graphs, which are defined by a large set of parameters.

Non-parametric methods: These make no assumption about the density model underlying the data, letting the data decide the nature of its distribution. Histogram methods and Parzen windows are examples of such methods. They require the estimation of a width parameter or smoothing factor (the number of bins in histograms, the Parzen window width) which determines how well the density estimate "fits" the data.


Semi-parametric methods: Bishop [3] advances a class of density estimation techniques that tries to combine the best features of parametric and non-parametric methods. Techniques such as mixture models fall into this category. These techniques assume Gaussian component densities for the data and thus have to estimate Gaussian-like parameters.

Since both the parametric and semi-parametric methods assume a specific underlying distribution, they have to estimate several distribution-specific parameters, such as the means, variances, and covariances. In contrast, non-parametric methods estimate a smoothing parameter, which is usually some kind of width parameter. Image registration requires determining the pose parameters in a particular transformation space. If parametric or semi-parametric methods are used for density estimation in registration, then the density model parameters need to be estimated in addition to the pose parameters. In contrast, non-parametric methods require the estimation of only the width (smoothing) parameters and the pose parameters. Thus, non-parametric density estimators are generally preferred over parametric or semi-parametric methods for image registration.

Traditional entropy estimation, whether discrete or continuous, requires a prior density estimation step. The approach taken in this study, however, bypasses density estimation entirely. The Rényi entropy is estimated directly from the image samples, via the relationship between the asymptotic length of the minimum spanning tree and the Rényi entropy. The Rényi entropy and its properties were discussed in Section 2.1.4. The following section traces the chronology of the development of the theory underlying entropy estimation by minimum spanning trees.


3.2 History

In the late 1950s, an interesting viewpoint for estimating entropy using graph algorithms came into existence. It began in 1959 when Beardwood, Halton, and Hammersley [2] first analyzed the travelling salesman problem (TSP) and stated that the TSP tour length of a set of multi-dimensional feature vectors is related to the entropy. Later, in 1988, J. M. Steele [25] followed their work and stated that if a fully connected graph is constructed from a multi-dimensional feature space, such that the weight of an edge between any two points is the weighted Euclidean distance between the points, then the asymptotic length of a minimum spanning tree constructed from such a graph is related to the underlying probability density. Later, the group at the University of Michigan led by Alfred Hero showed how the asymptotic length of the minimum spanning tree is directly proportional to the Rényi entropy (see Section 2.1.4) [17]. They further pointed out how this relationship between entropy and minimum spanning trees extends to a variety of graphs, like k-nearest-neighbor graphs and Steiner trees [13]. The practical applications of such graphs are emphasized in [12], image registration being one of them.

Minimum spanning trees and the Rényi entropy were discussed in Sections 2.3 and 2.1.4 respectively; the relevant portions are reiterated here as needed. Suppose there are n independent multivariate points in a d-dimensional space, x_i \in R^d, and that there is a graph on these points with vertices V = {x_1, x_2, ..., x_n} and edges E = {(x_i, x_j): 1 \leq i < j \leq n}, where each edge e in the edge set E carries a weight \psi(|e|). The task of finding a minimum spanning tree involves finding, among all trees spanning the vertex set V, the tree with minimum weight

M(x_1, x_2, ..., x_n) = \min_T \sum_{e \in T} \psi(|e|),    (3.1)

where T ranges over the spanning trees of the complete graph on the points in R^d, and \psi is the weighting function that associates weights with the edges of the minimum spanning tree, defined as

\psi(x) = x^r, where 0 < r < d.    (3.2)


Let the points in the feature space R^d, d \geq 2, be independently and identically distributed, let the function \psi satisfy \psi(x) ~ x^r with 0 < r < d, let f be the density of the continuous part of the distribution, and let c(r, d) be a positive constant depending on r and d. Then Steele [25] proved the following relationship between the length of a minimum spanning tree and a power of the density function f:

\lim_{n \to \infty} n^{-\frac{d-r}{d}} M(X_1, X_2, ..., X_n) = c(r, d) \int_{R^d} f(x)^{\frac{d-r}{d}} \, dx.    (3.3)

The next section shows how Equation (3.3), after a little algebraic manipulation, presents a clear link between the minimum spanning tree and the Rényi entropy.

3.3 Minimum Spanning Trees for Entropy Estimation

We discussed the alpha-Rényi entropy, or simply the Rényi entropy, in Section 2.1.4 and represented it mathematically in Equation (2.18) as

h_\alpha(X) = \frac{1}{1 - \alpha} \log \int f^\alpha(x) \, dx.    (3.4)

The relationship between the density function and the length of the minimum spanning tree is specified in Equation (3.3). If we rename the length of the minimum spanning tree in Equation (3.3) as

M(X_1, X_2, ..., X_n) = L(X_n) = \min_T \sum_{e \in T} e_{ij}^r,    (3.5)

where e_{ij} = ||x_i - x_j|| denotes the Euclidean distance between points i and j in the d-dimensional feature space, we can rewrite Equation (3.3) as

\lim_{n \to \infty} n^{-(d-r)/d} L(X_n) = c(r, d) \int_{R^d} f(x)^{\frac{d-r}{d}} \, dx.    (3.6)


Now, taking the logarithm on both sides and substituting \alpha = (d - r)/d, we obtain

\log \lim_{n \to \infty} \frac{L(X_n)}{n^\alpha} = \log c(\alpha, d) + \log \int_{R^d} f^\alpha(x) \, dx.    (3.7)

Note that the constant c, which was a function of r and d, now becomes a function of \alpha and d due to the substitution. Dividing each side by 1 - \alpha yields

\frac{1}{1 - \alpha} \log \int_{R^d} f^\alpha(x) \, dx = \frac{1}{1 - \alpha} \left( \log \lim_{n \to \infty} \frac{L(X_n)}{n^\alpha} - \log c(\alpha, d) \right).    (3.8)

From Equation (2.18), the left-hand side of the preceding equation is the Rényi entropy, so

h_\alpha(x) = \frac{1}{1 - \alpha} \left( \log \lim_{n \to \infty} \frac{L(X_n)}{n^\alpha} - \log c(\alpha, d) \right).    (3.9)

The short derivation above shows how the minimum spanning tree length converges to the Rényi entropy. Equation (3.9) is the fundamental result from which we estimate the entropy, approximately, in this thesis. The estimate is approximate because the constant c (which depends on \alpha and the dimension d) is unknown and cannot be computed analytically. The strength of Equation (3.9) lies in the fact that the length of the minimum spanning tree can be easily computed in polynomial time, O(|E| log |E|) [7]. Observe that Equation (3.9) is a very clean equation which does not require the estimation of any smoothing parameter that must be chosen carefully, e.g., the number of bins in a histogram or the "sigma" in Parzen window entropy estimation. Of course, we do have to compute the length of the minimum spanning tree, but that is straightforward. For the histogram bin count and the Parzen window width, on the other hand, there is no explicit criterion for selecting the smoothing (width) parameter.
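Equation (3.9) translates into a very short estimator. The sketch below (using SciPy's MST routine rather than the thesis's own Kruskal implementation) drops the unknown constant log c(alpha, d), so the value is the Rényi entropy only up to an additive constant; this is harmless when comparing candidate transformations, since the constant does not depend on the data.

    import numpy as np
    from scipy.spatial.distance import pdist, squareform
    from scipy.sparse.csgraph import minimum_spanning_tree

    def mst_renyi_entropy(X, alpha):
        """Estimate h_alpha of the density underlying the sample X
        (n points, d dimensions) via Eq. (3.9), omitting log c(alpha, d)."""
        n, d = X.shape
        r = d * (1.0 - alpha)             # edge-weight power, alpha = (d-r)/d
        D = squareform(pdist(X))          # pairwise Euclidean distances
        # The MST is the same tree under any monotone re-weighting, so we
        # build it on plain distances and raise edge weights to r afterward.
        # (Coincident sample points would yield zero-weight edges that the
        # sparse routine drops; continuous data avoids this.)
        L = np.sum(minimum_spanning_tree(D).data ** r)   # L(X_n), Eq. (3.5)
        return (np.log(L) - alpha * np.log(n)) / (1.0 - alpha)

    X = np.random.default_rng(0).normal(size=(500, 2))
    print(mst_renyi_entropy(X, alpha=0.5))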


An explanation of the rate of convergence of the MST entropy follows. The rate of convergence depends mostly on the number of samples that we choose for computing the MST. A paper by Alfred Hero et al. ("Convergence rates of minimal graphs with random vertices" by A. O. Hero, J. A. Costa, and B. Ma, submitted to IEEE Transactions on Information Theory, March 2003) formulated the rates of convergence of the minimum spanning tree for piecewise-constant densities and then extended the proposition to a more general class of densities. We next discuss a proposition on the rate of convergence of minimum spanning trees. We assume that the number of dimensions is d \geq 2, that 1 \leq r \leq d - 1, and that X_1, X_2, ..., X_n are independent sample points in the space [0, 1]^d with density f having support S \subseteq [0, 1]^d.

Proposition (marginal block density): Suppose the density f has m^d levels. For the quasi-additive Euclidean functional L_r of order r, the following bound exists:

\left| E[L_r(X_1, X_2, ..., X_n)] / n^{(d-r)/d} - c(r, d) \int_S f^{(d-r)/d}(x) \, dx \right| \leq O\left((n m^d)^{-1/d}\right).    (3.10)

In Equation (3.10), the term L_r denotes the length of the minimum spanning tree whose edges are weighted by the Euclidean distance between the points raised to the power r, as described in Equation (3.5). Equation (3.10) says that the difference between the expected value of the length of the minimum spanning tree, normalized by n^{(d-r)/d}, and the product of the constant c and the integral of the power of the density is asymptotically bounded by (n m^d)^{-1/d}. For example, if d = 2, the upper bound on this difference, which is the error, is O((n m^2)^{-1/2}). That is, as the number of samples n increases, the bound becomes tighter, and the error decreases. Observe from Equation (3.10) that, as the dimensionality of the feature space increases, the rate of convergence decreases, and thus more samples are needed for convergence.


Interested readers should refer to the Hero, Costa, and Ma paper cited above for a generalization of this result to a broader class of densities. The same paper mentions that the aforementioned convergence criteria also apply to graph constructions such as the Steiner tree, the minimal matching graph, and the TSP tour, along with their power-weighted variants. However, the minimum spanning tree is chosen here based on the paper [12], which recommends the MST as the best method for estimating entropy due to its polynomial run time. The existence of many well-known algorithms for computing the MST further supports its use as the entropy estimator.

3.4 Measure of Registration

Since Viola and Wells [28] and Collignon et al. [6] successfully used the Shannon mutual information to perform registration, it has been used as a registration measure in many systems. In this study, the minimum spanning trees described in the preceding section are used to estimate the approximate Renyi entropy of the random variables, and this entropy is in turn used to compute the measure of registration. The Renyi mutual information was first formulated by Alfred Renyi in the book [23] as

$$I_{\alpha}(x) = \frac{1}{\alpha - 1} \log \int_{-\infty}^{\infty} f_X(x) \left( \frac{f_X(x)}{\prod_{i=1}^{N} f_{X_i}(x_i)} \right)^{\alpha - 1} dx. \qquad (3.11)$$


Unfortunately, the Renyi mutual information cannot be easily computed. Thus, instead of the Renyi mutual information, we use a similar measure which is simply the sum of the marginal entropies minus the joint entropy, analogous to the Shannon mutual information. This measure was proposed by Hild et al. in the paper [14] as

$$\sum_{i=1}^{N} H_{\alpha}(x_i) - H_{\alpha}(x) = \frac{1}{\alpha - 1} \log \frac{\int_{-\infty}^{\infty} f_X^{\alpha}(x)\, dx}{\prod_{i=1}^{N} \int_{-\infty}^{\infty} f_{X_i}^{\alpha}(x_i)\, dx_i}. \qquad (3.12)$$

In the case of two images, this measure is the sum of the two marginal Renyi entropies minus the joint Renyi entropy:

$$H_{\alpha}(x_1) + H_{\alpha}(x_2) - H_{\alpha}(x_{12}) = \frac{1}{1 - \alpha} \left( \log \frac{L_{x_1}}{n^{\alpha}} + \log \frac{L_{x_2}}{n^{\alpha}} - \log \frac{L_{x_{12}}}{n^{\alpha}} \right), \qquad (3.13)$$

where $L_x$ denotes the length of a minimum spanning tree constructed from a power-weighted graph, computed as $L_x = \min_T \sum_{e \in T} \|e\|^{r}$. The preceding measure is very similar to the Renyi mutual information, since both are positive and both tend to zero when the joint probability equals the product of the marginal probabilities (i.e., when the variables are independent, as mentioned in the paper [14]). However, the main reason this measure is preferred over the actual Renyi mutual information is that it can be calculated using the MST entropy algorithm. Thus, our algorithm maximizes this measure to perform registration. Also, maximizing the mutual information is similar to minimizing the joint entropy of the images when the search space is narrow, since the marginal entropies remain more or less constant over a narrow search space: referring to Equation (3.12), with the marginal entropies roughly constant, maximizing the measure amounts to maximizing the negative of the joint entropy, in other words, minimizing the joint entropy. In some cases (especially when the results are based on a single parameter, like rotation) we have opted to minimize the joint entropy instead of maximizing the mutual information.
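
As a sketch of how this measure is evaluated in practice, the following reuses the mst_renyi_entropy() helper sketched in the previous section (again dropping the constant terms involving $c(\alpha, d)$, as the thesis does throughout):

```python
import numpy as np

def registration_measure(feat1, feat2, alpha=0.5):
    """Sum of the marginal MST entropies minus the joint MST entropy, per
    Equation (3.13); feat1 and feat2 are (n, d) feature samples drawn from
    corresponding locations in the overlap area of the two images."""
    joint = np.hstack([feat1, feat2])    # samples in the joint feature space
    return (mst_renyi_entropy(feat1, alpha)
            + mst_renyi_entropy(feat2, alpha)
            - mst_renyi_entropy(joint, alpha))
```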


CHAPTER 4
ALGORITHM DESIGN

This chapter first describes the overall design of a generalized registration technique, followed by the various design decisions made while designing this algorithm, a brief algorithm description, and then implementation details.

4.1 General Registration Algorithm

The process of image registration can be represented procedurally as shown in Figure 4-1. The first step preprocesses the images so that they become suitable for registration. Preprocessing involves operations such as noise removal, smoothing, and cropping. Human intervention may occur depending on the type of images considered for registration. The next step is a quick estimation of the approximate transformation parameters by scanning the search space at low resolution. Finally, once we have the coarse parameters for the transformation, the region around these rough parameters is scanned in order to get an accurate estimate. This multi-resolution approach helps because it speeds up the registration process. Alternatively, if the time requirements for evaluating the measure (discussed in Section 3.4) are less stringent, the initial rough estimation step can be omitted altogether and the search space searched at fine resolution.

[Figure 4-1: Block diagram for image registration.]
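
A minimal sketch of this coarse-to-fine pipeline follows (the helper names preprocess(), search(), and fine_grid_around() are assumptions for illustration, not the thesis's Matlab code; the search() routine itself is sketched after Figure 4-2 below):

```python
def register(reference, transforming, coarse_grid, fine_grid_around,
             coarse_measure, fine_measure):
    """Coarse histogram pass over a low-resolution grid, then a fine MST
    pass over a grid centered on the coarse answer."""
    reference, transforming = preprocess(reference), preprocess(transforming)
    rough, _ = search(reference, transforming, coarse_grid, coarse_measure)
    return search(reference, transforming, fine_grid_around(rough), fine_measure)
```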


[Figure 4-2: Flow chart for estimating the parameters of the transform.]

The flow chart for the block used to find the optimal transformation (either in the ballpark estimation or in the fine tuning) is shown in Figure 4-2. We apply the transformation to only one of the two images (the transforming image), while the other image is left unchanged (the reference image). At each step of the algorithm, the entropic measure between the two images is calculated. If the measure has reached its maximum (or minimum, depending on its nature) in the space, then the parameters of the transform are declared to be the optimal parameters. If not, the next transform to be applied is computed and applied to the transforming image, and the steps are repeated until the optimal value of the measure is found in the search space.
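
The loop of Figure 4-2 amounts to a maximization over a grid of transform parameters; a minimal sketch, assuming hypothetical helpers affine_transform() and an entropic measure() evaluated on the overlap area (for measures that are minimized, such as the joint entropy alone, the comparison is simply reversed):

```python
def search(reference, transforming, parameter_grid, measure):
    """Scan the grid and keep the parameters with the best entropic measure
    between the fixed reference and the warped transforming image."""
    best_params, best_value = None, float("-inf")
    for params in parameter_grid:
        warped = affine_transform(transforming, params)
        value = measure(reference, warped)   # evaluated on the overlap area
        if value > best_value:               # maximize the MI-like measure
            best_value, best_params = value, params
    return best_params, best_value
```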


4.2 Design of the Algorithm

Designing a registration algorithm involves a sequence of decisions that need to be taken at each step, as discussed in this section. What becomes obvious from the previous discussion is that the crux of this algorithm is the estimation of the Renyi entropy of a continuous random variable. This estimation can be done by using the non-parametric, minimum-spanning-tree-based Renyi measure of Equation (3.8), which is expressed as

$$\frac{1}{1-\alpha} \log\left(\int_{\mathbb{R}^d} f^{\alpha}(x)\, dx\right) = \frac{1}{1-\alpha}\left(\log \lim_{n\to\infty} \frac{L(X_n)}{n^{\alpha}} - \log c(\alpha, d)\right). \qquad (4.1)$$

From this equation, the estimation of the entropy is approximate because of the presence of the constant term $c$, which depends only on $\alpha$ and the dimension $d$. The first step, therefore, is estimating the parameters by performing rough image registration using histograms.

4.2.1 Preprocessing by Histograms

The main reason for using histograms is that they are computationally efficient, which allows us to get a ballpark estimate very quickly. The entropies (both joint and marginal) are estimated from the overlap area of the two images. The reason for estimating the joint entropy from the overlap area is obvious. However, we also estimate the marginal entropies from the overlap area because, as the images are transformed, the marginal entropy of the overlap area also changes dynamically with every transformation. The important parameter that decides how well the histogram performs is the bin size, which is chosen based on statistical analysis. Figure 4-3 shows empirical results of the histogram analysis as the number of bins is varied.
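
A sketch of this histogram step, assuming NumPy: the Shannon marginal and joint entropies are computed from a normalized joint histogram of the overlap-area intensities and combined into the mutual information used for the coarse search (num_bins is the free parameter examined in Figure 4-3):

```python
import numpy as np

def histogram_mi(overlap1, overlap2, num_bins=16):
    """Shannon mutual information of two equal-shape overlap-area images."""
    joint, _, _ = np.histogram2d(overlap1.ravel(), overlap2.ravel(),
                                 bins=num_bins)
    joint /= joint.sum()                           # normalized joint histogram
    p1, p2 = joint.sum(axis=1), joint.sum(axis=0)  # marginals of the overlap
    H = lambda p: -np.sum(p[p > 0] * np.log(p[p > 0]))
    return H(p1) + H(p2) - H(joint.ravel())
```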


[Figure 4-3: Effect on registration of varying the number of histogram bins; each panel plots mutual information vs. angle for 8, 16, and 32 bins, from top left to bottom right. The maximum should be present at 0.]


As we can see, when the number of bins decreases (i.e., the bin size is large), the histogram is noisy and there are many peaks around the desired maximum. However, if the number of bins increases too far, the peak disappears because the data is overfitted. After the histogram analysis, we extract features from the image and use these feature images to perform registration, as follows.

4.2.2 Features

Image features are extracted based on a local neighborhood of each pixel, called tags [1]. The basic idea of tag-based methods is to account for the local topology surrounding each pixel. Around each pixel in the image, an n x n matrix is extracted and vectorized, to obtain a vector of length n^2 for every pixel. This is equivalent to quantizing the image such that it reflects local topology, due to which the extracted features become more invariant to changes in individual pixel intensity.

Once the tags are extracted from the image, the decision that lay ahead was how to extract features from them. We opted for statistics-based features, since they extract features based on the information content of the image. As mentioned in Section 2.6, the ICA algorithm attempts to find statistically independent components of the image. We have used the FastICA [16] algorithm to compute the ICA of the images. Figure 4-4 shows the feature images as computed by ICA; the actual values of the ICA components are listed in Table 4-1. The first three rows of Table 4-1 are responsible for the edge images of the original image, while the fourth ICA component simply smooths the image.

Table 4-1: The ICA values

    -0.2845    0.2875   -0.1523    0.1468
     0.3425   -0.5310   -0.4016    0.5821
     0.4048    0.0869   -0.4269   -0.0605
     0.0597    0.0197    0.0251    0.0777
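
A sketch of the tag-extraction step described above, assuming NumPy (a plain loop is used for clarity):

```python
import numpy as np

def extract_tags(image, n=2):
    """Vectorize the n x n neighborhood of every pixel into a length-n^2
    tag; returns an array of shape (num_tags, n * n)."""
    H, W = image.shape
    tags = [image[i:i + n, j:j + n].ravel()
            for i in range(H - n + 1)
            for j in range(W - n + 1)]
    return np.asarray(tags, dtype=float)
```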


[Figure 4-4: ICA feature images (ICA Image 1 through ICA Image 4).]
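
A sketch of the ICA step, using scikit-learn's FastICA as a stand-in for the FASTICA package [16] used in the thesis (an assumption for illustration): the tag vectors are decomposed into statistically independent components, whose activations become the per-pixel features:

```python
from sklearn.decomposition import FastICA

def ica_features(tags, n_components=4):
    """tags: (num_tags, n * n) array from extract_tags(); returns a
    (num_tags, n_components) array of independent component activations."""
    ica = FastICA(n_components=n_components)
    return ica.fit_transform(tags)
```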


[Figure 4-5: The faces used to compare registration for different sizes of ICA blocks.]

[Figure 4-6: Variation of MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features; 2. 16 features; 3. 36 features. The maximum is located at 0.]

One point here is how many ICA features to extract, i.e., what the dimensionality of the feature space should be. There are several interesting observations in this case. Figure 4-6 plots the mutual information vs. the rotation angle for various sizes of ICA blocks; the images being registered are shown in Figure 4-5. From Figure 4-6 we can see that the registration worsens if we pick too high a feature dimension.


[Figure 4-7: Variation of MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features; 2. 16 features; 3. 36 features.]

Another interesting observation arises when we pick the faces shown in Figure 4-30: we then get the MI vs. rotation plots shown in Figure 4-7, from which we see that an acceptable registration result is obtained when we use 16 ICA features. Thus, the number of ICA features to use depends heavily on the type of image being registered. Once features are extracted from the image, the next step is to fine-tune the registration by estimating the entropy using minimum spanning trees.

4.2.3 Search Strategy

Our prototype MST matching algorithm uses a brute-force search that scans the entire transformation space to perform registration. One of the reasons for not using a more sophisticated search strategy is the way the MST entropies behave in the search space. This behavior is shown in Figure 4-8, from which we can see that the MST entropy is not smooth and has several local maxima. Thus, standard search strategies like gradient descent cannot be used, as they rely on a smooth objective function.


[Figure 4-8: Behavior of the MST entropy in 1D and 2D search spaces (MST entropies over rotation, and over rotation and scale).]


Thus, rather than be overwhelmed by the design of an efficient search algorithm for finding the global maximum of a non-smooth objective function, we used a brute-force search, which is guaranteed to find the global maximum.

4.2.4 Minimum Spanning Tree Algorithm

This is the final step of the registration. Ideally, we would use the entire overlap area of the two images to compute the minimum spanning tree and then estimate the entropy. From Equation (3.8), we see that the entropy estimate is sensitive to the number of samples used to compute the minimum spanning tree, and thus we need to keep this number constant during each computation. So, we choose a fixed number of sample points randomly from the overlap area of the images and use only those to estimate the entropy.

Minimum spanning tree computation. Assume that we pick n sample points each time we compute the minimum spanning tree. We now have points in some d-dimensional space, where the weight of the edge between two points is the Euclidean distance between them. Initially, we assume that there is an edge between each point and all the other (n-1) points, i.e., that the graph is fully connected. In such a scenario we have n points and on the order of n^2 edges in the Euclidean space, and the classical MST algorithm has complexity of order O(n^2 log n). Although this is polynomial, the search time increases rapidly as n increases. One of the papers [19] mentions resolving this issue by reducing the number of edges. This is done by placing a sphere around each point in the d-dimensional hyperspace; only the edges connecting a given point to the neighbors lying within its sphere are retained for analysis. The radius of the sphere is calculated by the following formula:

$$\mathrm{radius}_p = \mathrm{mean}_p + \frac{\mathrm{max} - \mathrm{mean}_p}{\mathrm{QFactor}} \cdot \mathrm{RFactor}, \qquad (4.2)$$

where $\mathrm{radius}_p$ denotes the radius at point $p$; $\mathrm{mean}_p$ denotes the mean of the distances connecting point $p$ to all other points; $\mathrm{max}$ denotes the maximum distance between any two points in the $d$-dimensional space; QFactor denotes the quantization factor; and RFactor denotes the factor which decides the magnitude of the radius.
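
A sketch of this pruning rule, assuming NumPy and one plausible reading of the reconstructed Equation (4.2), namely that QFactor divides the spread term and RFactor scales it; each point then keeps only the edges to neighbors lying within its own radius:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def pruning_radii(points, q_factor, r_factor):
    """Per-point truncation radius of Equation (4.2); points is (n, d)."""
    dists = squareform(pdist(points))               # pairwise distances
    mean_p = dists.sum(axis=1) / (len(points) - 1)  # mean distance per point
    max_d = dists.max()                             # largest pairwise distance
    return mean_p + (max_d - mean_p) / q_factor * r_factor
```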


We varied RFactor over a range of specific radius values and chose the one which gave the radius with the fewest edges and least error; the plots are shown in Figure 4-9. As a result, every point has a sphere of a different radius, determined primarily by how close the point is to the other points in the hyperspace.

Deciding the number of samples. Figures 4-10 and 4-11 illustrate that when the value of alpha is kept at about 0.55, 500 samples seem sufficient for registration; however, a nicer plot with an explicit maximum is obtained at about 1100 samples. Similar plots for an alpha value of 0.85 are shown in Figures 4-12 and 4-13, which illustrate that a very good result is obtained at 1100 samples. Evidently, more samples are better for accuracy; fewer samples are picked when speed is desired.

The minimum spanning tree was computed using the Kruskal's algorithm implementation provided by the Boost Graph Library (available at: http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). The pseudocode for this implementation can be found in the APPENDIX.

4.3 Implementation Details

4.3.1 Hardware and Software Details

CPU: Pentium IV, 2.8 GHz, 2 GB RAM.
OS: Linux.
Languages: Matlab and C++.


All the code was written in Matlab. The program used the Kruskal's algorithm implementation provided by the Boost Graph Library (available at: http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004), which is written in C++. The interface between the Matlab and the C++ code was implemented using the Matlab MEX function interface.

[Figure 4-9: Plots showing the error between the actual MST length and the truncated MST length over various radius factors, for rotations of -10, 0, and 10 degrees.]


[Figure 4-10: Variation of the joint entropy vs. the rotation angle for a varying number of samples picked from the overlap area, with alpha = 0.55: 1. 100 samples; 2. 300 samples; 3. 500 samples; 4. 700 samples.]

[Figure 4-11: Variation of the joint entropy vs. the rotation angle for a varying number of samples picked from the overlap area, with alpha = 0.55: 1. 900 samples; 2. 1100 samples.]


[Figure 4-12: Variation of the joint entropy vs. the rotation angle for a varying number of samples picked from the overlap area, with alpha = 0.85: 1. 100 samples; 2. 300 samples; 3. 500 samples; 4. 700 samples.]

[Figure 4-13: Variation of the joint entropy vs. the rotation angle for a varying number of samples picked from the overlap area, with alpha = 0.85: 1. 900 samples; 2. 1100 samples.]


Table 4-2: Free parameter table

Parameter      | Meaning                                                                | Value        | Reason
alpha          | the alpha in the Renyi entropy                                         | 0.5          | the value suggested by Hero et al. in paper [19]
BlockSize      | size of the block around each pixel, i.e., the number of ICA features  | 2 or 4       | based on the analysis in Section 4.2.2 (Figures 4-6, 4-7)
radius         | radius of the sphere around each feature-space point used to truncate edges | 5       | based on the analysis shown in Figure 4-9
numBins        | number of bins for the histogram search                                | 16           | based on the analysis shown in Figure 4-3
numberSamples  | number of random samples used to compute the Renyi entropy             | 500 or 1000  | picked based on whether speed or accuracy is desired (Section 4.2.4)
searchSpace    | affine parameters over which to search                                 | rotations [-20, 20], scale [-0.5, 0.5], shear [-0.5, 0.5], translations [-5, 5] in each direction | based on the time required to cover the space

Time: the time taken for the entire algorithm to run on 64 x 64 images, with histogram preprocessing followed by final registration using the MST entropy with 1000 samples, is approximately 8 hours.

4.3.2 Free Parameters

The free parameters are tabulated in Table 4-2.

4.4 Results

This section summarizes the different results obtained from the registration algorithm, and shows that registration by estimating the entropy using the MST produces acceptable results. All the face images were obtained from the Yale face database (available at: http://cvc.yale.edu/projects/yalefaces/yalefaces.html, accessed March 2004).


4.4.1 Same Image Registered with the Same Image

This result shows how the MST changes as the images become registered. Since both images considered in this case are the same image, the visual change in the MST from the unregistered case to the perfectly registered case is very apparent. The parameter values for the following tests:

alpha = 0.5
Samples = 500
Number of histogram bins = 16
Block size around pixel = 2 x 2

Figure 4-14 displays the original unregistered images and their overlap images. Figure 4-15 shows the minimum spanning tree of the two unregistered images; the feature space in this representation is the intensity of both images, for the purpose of visualization. Figure 4-16 shows how the images are registered, and the MST of the two registered images is shown in Figure 4-17. Observe how nicely the points collapse onto one line as the images become registered.

4.4.2 Registration of the Same Person with Different Expressions

This section gives the registration results for images of the same person with different facial expressions. The parameters chosen are:

alpha = 0.5
Samples = 500
Number of histogram bins = 16
Block size around pixel = 2 x 2


[Figure 4-14: From top left to bottom right: 1. face image; 2. the same face with an applied transformation; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

[Figure 4-15: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features.]


[Figure 4-16: From top left to bottom right: 1. face image; 2. face obtained after applying the optimal transformation to the unregistered image shown in Figure 4-14; 3. overlap of the two registered images; 4. edge image of the overlapped registered images.]

[Figure 4-17: Minimum spanning tree of the registered images, considering only the intensities of the two images as features.]


[Figure 4-18: From top left to bottom right: 1. face image; 2. unregistered face image of the same person with glasses; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

The images before registration are shown in Figure 4-18, and the MST for the unregistered case in Figure 4-19. After performing registration, the results are shown in Figure 4-20, with the MST for the registered case in Figure 4-21. Looking at the images of the minimum spanning trees, we see that when the two images are unregistered, the points are scattered all over the place; when the images are registered, a cluster of points forms along the 45-degree line in the MST space, reducing the length of the MST.

4.4.3 Yet Another Registration of the Same Person with Different Expressions

In this section, we show the registration results for another pair of images of the same person with different expressions. The parameters chosen are:

alpha = 0.5
Samples = 500
Number of histogram bins = 16
Block size around pixel = 2 x 2


[Figure 4-19: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features.]

[Figure 4-20: From top left to bottom right: 1. face image; 2. registered face image of the same person with glasses; 3. overlap of the two registered images; 4. edge image of the overlapped registered images.]


[Figure 4-21: Minimum spanning tree of the registered images, considering only the intensities of the two images as features.]

The images before registration are shown in Figure 4-22; after performing registration, the results are shown in Figure 4-23. From the previous two results with images of the same person and different expressions, we can see that the algorithm successfully performs registration. An important point to note here is that the image size was 64 x 49 (3136 pixels), out of which only 500 samples were used to estimate the entropy with the MST, unlike histograms, which need all the samples in the area in order to estimate the entropy. Plots of the convergence of the Renyi entropy versus the number of samples are shown later in this section.

4.4.4 Registration Result of Images of Two Different People

We next discuss registration results for images of two different people. The parameters chosen are:

alpha = 0.5
Samples = 500
Number of histogram bins = 16
Block size around pixel = 4 x 4


[Figure 4-22: From top left to bottom right: 1. face image of a happy person; 2. unregistered face image of the same person with a sad expression; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

[Figure 4-23: From top left to bottom right: 1. face image; 2. registered face image of the same person with a sad expression; 3. overlap of the two registered images; 4. edge image of the overlapped registered images.]


[Figure 4-24: From top left to bottom right: 1. a face image; 2. face image of a different person; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

The initial state of the unregistered images is shown in Figure 4-24; after performing registration, the results are shown in Figure 4-25. This result shows that although the algorithm does register the two images, the result is not accurate for dissimilar images, and there is definitely scope for improvement. In the next result, therefore, I changed some parameters and obtained some improvement.

4.4.5 Yet Another Registration Result of Images of Two Different People

Next are the registration results for yet another pair of images of two different people. The parameters chosen are:

alpha = 0.9999
Samples = 1000
Number of histogram bins = 16
Block size around pixel = 4 x 4


[Figure 4-25: From top left to bottom right: 1. face image; 2. registered face image of a different person; 3. overlap of the two registered images; 4. edge image of the overlapped registered images.]

I also rescaled the search range for the scale parameter to perform an even finer search. The initial state of the unregistered images is shown in Figure 4-26; after performing registration, the results are shown in Figure 4-27. This result is a trifle better than the previous one.

4.4.6 Comparison with the Histogram-Based Method

In order to do this comparison, we selected two pairs of images of two different people. In one case, we performed registration over all six affine parameters (rotation, scale, shear, and translations) for the faces considered in Section 4.4.5. The initial state of the unregistered images is shown in Figure 4-28; after performing registration, the results are shown in Figure 4-29.


[Figure 4-26: From top left to bottom right: 1. a face image; 2. face image of a different person; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

[Figure 4-27: From top left to bottom right: 1. face image; 2. registered face image of a different person; 3. overlap of the two registered images; 4. edge image of the overlapped registered images.]


[Figure 4-28: From top left to bottom right: 1. a face image; 2. face image of a different person; 3. overlap of the two unregistered images; 4. edge image of the overlapped unregistered images.]

[Figure 4-29: From top left to bottom right: 1. face image; 2. registered face image of a different person; 3. overlap of the two images registered using the histogram; 4. edge image of the overlapped registered images.]


[Figure 4-30: From left to right: 1. face image of a person with light from the left; 2. rotated face image of the same person with light from the right.]

[Figure 4-31: Image 2 as registered by the histogram.]

We see from Figure 4-29 that the histogram method performs as well as the MST when the intensity differences between the two images are not pronounced. We then performed the comparison on two images in which the lighting came from the left side in one image and from the right side in the other, and compared the results over rotation only. The images considered are shown in Figure 4-30; the second image is rotated by 5 degrees, so perfect registration finds the right answer at -5 degrees. The results with the histogram are shown in Figure 4-31, and the plot of the MI vs. rotation in Figure 4-32. Notice how the maximum MI occurs at 9 degrees instead of -5 degrees, which is a clear error. The results with the MST are shown in Figure 4-33, and the plot of the MI vs. rotation in Figure 4-34. Notice how the maximum MI occurs at -5 degrees, as expected.


[Figure 4-32: Plot of MI vs. rotation angle when registered by the histogram; the maximum should be at -5 degrees.]

[Figure 4-33: Image 2 as registered by the MST.]

[Figure 4-34: Plot of MI vs. rotation angle as registered by the MST; the correct answer is at -5 degrees.]


Thus we conclude that, for images with large intensity variations, registration using a minimum spanning tree definitely outperforms registration by the histogram.

4.4.7 Convergence of the Renyi Entropy with the Number of Samples as Alpha Changes

Recall from the discussion in the background and theory chapters that we estimate the Renyi entropy, which converges to the Shannon differential entropy as alpha approaches 1. The question here is what value of alpha to choose such that the estimated Renyi entropy converges quickly, using fewer samples. This result documents the plots of the Renyi entropy computed using minimum spanning trees vs. the number of samples as we vary alpha. The results are shown in Figures 4-35, 4-36, and 4-37.

[Figure 4-35: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.4; 2. alpha = 0.45; 3. alpha = 0.5; 4. alpha = 0.55.]


[Figure 4-36: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.6; 2. alpha = 0.65; 3. alpha = 0.7; 4. alpha = 0.75.]


[Figure 4-37: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.8; 2. alpha = 0.85; 3. alpha = 0.9; 4. alpha = 0.95.]


The graphs in Figures 4-35, 4-36, and 4-37 show that the behavior of the Renyi entropy is very erratic for all values of alpha, and in fact it does not seem to converge quickly.

4.4.8 Noise Trials

Noise in the images may affect the registration results. This section compares the histogram technique to the MST technique when Gaussian noise of varying variance is added to the images. The images were then registered over rotation alone, and over rotation and scale, and the errors of the two techniques were compared.

Case 1: registration over rotation only. Range of rotation: -10 to 10 degrees; correct answer at 0 degrees. The errors for the histogram method at various noise levels are tabulated in Table 4-3.

Table 4-3: Noise results over rotation for histogram

Noise Variance | Rotation Error
0.01 | 0
0.02 | 0
0.03 | 0
0.04 | 0
0.05 | 1
0.06 | 1
0.07 | 1
0.08 | 1
0.09 | 1
0.1  | 1

As seen in Table 4-3, the histogram performs well for low noise but gives an error of 1 degree for noise variances of 0.05 and above. The errors for the MST method are shown in Table 4-4.


Table 4-4: Noise results over rotation for MST

Noise Variance | Rotation Error
0.01 | 0
0.02 | 0
0.03 | 0
0.04 | 0
0.05 | 0
0.06 | 3
0.07 | 0
0.08 | 3
0.09 | 0
0.1  | 2

From this table, we can see that the MST also performs well in the low-noise cases, but introduces errors (in fact larger ones than the histogram) as the variance of the added noise increases.

Case 2: registration over rotation and scale. Rotation range: -10 to 10 degrees; scale range: -0.5 to 0.5. The correct answer for both rotation and scale is at 0. Initially, some trials were made by simply adding noise to the images and then registering the raw images. With this option, however, the MST approach performed badly and gave errors even in the low-noise case with a variance of 0.01; the histogram method clearly outshone the MST method here. So we subsequently applied a smoothing filter to the images before registering them with the MST and the histogram. When a smoothing Wiener filter was applied to the images after adding noise, the histogram method gave ZERO error consistently for variances from 0.01 to 0.08. The results when the MST was used are tabulated in Table 4-5, which shows how the MST error worsens dramatically in a two-parameter search space.


Table 4-5: Noise results over rotation and scale for MST

Noise Variance | Rotation Error | Scale Error
0.01 | 0 | 0
0.02 | 0 | 0
0.03 | 2 | 0
0.04 | 4 | 0

Table 4-6: Noise results over rotation and scale for MST

Noise Variance | Rotation Error | Scale Error
0.01 | 0 | 0
0.02 | 0 | 0
0.03 | 2 | 0
0.04 | 2 | 0
0.05 | 1 | 0.1
0.06 | 0 | 0
0.07 | 2 | 0
0.08 | 2 | 0

In the result shown in Table 4-5, 1000 samples from the overlap area of the images were used to compute the MST length. Another set of results, obtained using 1500 samples to compute the MST, is tabulated in Table 4-6. From Table 4-6 we can see that the accuracy of registration by MST degrades progressively, as an error is consistently introduced. This result illustrates the behavior of the MST entropy when Gaussian noise is added to the image and the image is then smoothed with one particular filter before registering. There are different types of noise, and different filters that can be used to remove the noise; the relationship between the accuracy of MST-based registration and these different types of noise and filters needs to be studied further.


CHAPTER 5
DISCUSSION

The estimation of entropy has applications in a variety of areas, image registration being one such application. Recently, information-theoretic measures such as mutual information have become very popular in image registration, and estimating the mutual information requires the estimation of entropy. This thesis used minimum spanning tree algorithms to estimate the entropy from a finite set of samples. The MST can be computed easily in polynomial time, and one of the good attributes of MST-based entropy estimation is that no further parameter tuning is required. However, when applying these entropies to perform registration, several important issues need to be dealt with. The MST entropy does not approximate the Shannon entropy; instead it approximates the Renyi entropy seen in Equation (3.3). Thus it cannot estimate the Shannon mutual information, which is widely used in registration. In fact, it cannot even estimate the original Renyi mutual information that we saw in (3.11). Instead, we have used an information-theoretic measure which is the difference between the sum of the marginal entropies and the joint entropy, as suggested in paper [14]. This measure has properties similar to those of the Renyi mutual information.

There are several open issues that remain to be resolved. We would like to undertake a systematic comparison with other established density and entropy estimators, such as Parzen windows [9]. Parzen windows estimate both the Shannon entropy and the Renyi entropy, so they can also be used to estimate the Shannon mutual information.


The density function estimated by Parzen windows can be computed from the following equation:

$$f_n(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right). \qquad (5.1)$$

This density can then be used to estimate the Renyi entropy of Equation (2.18). Note that the Parzen window density estimator can compute the entropy for $\alpha > 1$, whereas for the MST the condition on alpha is that it must be less than 1. Thus, all the parameters have to be formulated carefully in order to do a comparison between these density estimators.
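
For concreteness, a minimal sketch of Equation (5.1), assuming NumPy, together with a plug-in ("resubstitution") Renyi entropy estimate evaluated at the sample points; the resubstitution shortcut is an assumption here for illustration, not a method taken from the thesis:

```python
import numpy as np
from scipy.spatial.distance import cdist

def parzen_density(x, samples, sigma):
    """Isotropic-Gaussian Parzen estimate of Equation (5.1);
    x is (m, d), samples is (N, d)."""
    N, d = samples.shape
    sq = cdist(x, samples, "sqeuclidean")
    norm = (2.0 * np.pi * sigma ** 2) ** (d / 2.0)
    return np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1) / (N * norm)

def parzen_renyi_entropy(samples, sigma, alpha):
    """H_alpha ~ log(mean(f^(alpha - 1))) / (1 - alpha) by resubstitution;
    unlike the MST estimator, this also works for alpha > 1."""
    f = parzen_density(samples, samples, sigma)
    return np.log(np.mean(f ** (alpha - 1.0))) / (1.0 - alpha)
```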


Apart from this, even within the regime of MST entropies used for registration, several important changes can be made to the method proposed in this thesis. One is that, in this thesis, we have simply scanned the entire space to find the maximum (refer back to Section 4.2.3); clearly, a well-optimized search algorithm over such a space would be beneficial in drastically cutting the time required. Another area of improvement is feature selection. Right now, we have used ICA features, based on the rationale that ICA returns statistically independent features, but a detailed study of the features could help improve the registration results. There are several other methods of feature selection, such as principal component analysis and image processing filters like Gabor filters [27]. However, the noise analysis that we did in Section 4.4.8 suggests that, in the presence of noise, such methods require smoothing of the images, since they rely heavily on taking derivatives of the images. Thus, the presence of noise can in fact have a negative effect on registration when derivative-based features are used, vis-a-vis intensity-based registration.

An interesting extension of registration using MST entropies could be pursued if, instead of using the Euclidean distance among the feature points, we projected the points into a different space (e.g., a Hilbert space) and then used distances in that space to compute the MST length. Whether such an extension improves the registration is another open area of research. We also noticed in Section 4.4.8 that the histogram method is less sensitive to noise; thus, registration using a high-dimensional histogram is also a possibility. However, in order to do that, innovative ways have to be devised to compute histograms in a multi-dimensional space.

In conclusion, in this thesis we use MST entropies to register two face images at a time, searching over the space of affine transformations. This technique could also be expanded to register several images at once; to bring this to reality, however, a good search algorithm is a necessity. It also requires an appropriate selection of features in order to extract the most relevant information; for example, the features used to register face images may not be suitable for medical images. MST entropies could also be used for non-rigid image registration, provided the method is made extremely computationally efficient. Thus, this thesis bolsters the claim that MST entropies can be used for registration, by implementing a registration algorithm that searches over the affine transformation space.


APPENDIX
ADDITIONAL INFORMATION

This section covers the pseudocode of the algorithm.

Pseudocode. The program consists of the following main functions:

FunctionRegister: the main function, which calls all the subfunctions.
FunctionRoughRegister: performs the rough image registration, using intensity and histograms for entropy estimation.
FunctionExtractICAFeatures: extracts the ICA features from the images.
FunctionMSTRegister: performs the fine registration, using the ICA features and the minimum spanning tree for estimating the entropy;

and the following supplementary functions:

FunctionAffineTransform: performs the affine transformation of the input image according to the given parameters.
FunctionJointEntropyByHistogram: computes the joint histogram and accordingly estimates the Shannon joint entropy.
FunctionMarginalEntropyByHistogram: computes the histogram and accordingly estimates the Shannon marginal entropy.
FunctionEntropyByMST: estimates the Renyi entropy (joint and marginal) using the minimum spanning tree.

Here are the main functions explained in depth:


FunctionRegister
Input:
  Image1, Image2: the images.
  BlockSize: size of the square block around each pixel (i.e., 2 x 2, 3 x 3, etc.).
  Alpha: the alpha factor in the Renyi entropy.
  NumBins: the number of bins used to compute the histogram.
Output:
  AffineParameters: the affine transformation parameters which, when applied to the second image, give registered images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
  MaxMeasure: the mutual-information-like measure attained at the parameters above.
Algorithm:
  // Perform rough image registration of the images using intensity + histogram
  RoughAffineParameters = FunctionRoughRegister(Image1, Image2, NumBins);
  // Extract ICA features from the input images
  [icaFeatures1, icaFeatures2, ICAFeatures] = FunctionExtractICAFeatures(Image1, Image2, BlockSize);
  // Perform fine image registration of the images using ICA features + MST
  AffineParameters = FunctionMSTRegister(ICAFeatures, Alpha, RoughAffineParameters);
EndFunction

FunctionRoughRegister
Input:
  Image1, Image2: the images.
  NumBins: the number of bins used to compute the histogram.
Output:
  RoughAffineParameters: the affine transformation parameters which, when applied to the second image, give roughly registered images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.


Algorithm:
  // Image1 is kept fixed while Image2 is transformed over the search space.
  Over all six parameters of the affine search space, using a rough resolution to scan the space, do
    tempImg = FunctionAffineTransform(currentTransformValues);
    jointEntropy = FunctionJointEntropyByHistogram(overlap areas of Image1 and tempImg);
    marginalEntropy1 = FunctionMarginalEntropyByHistogram(overlap area of Image1);
    marginalEntropy2 = FunctionMarginalEntropyByHistogram(overlap area of tempImg);
    currentMI = marginalEntropy1 + marginalEntropy2 - jointEntropy;
    if (currentMI > maxMI)
      maxMI = currentMI;
      RoughAffineParameters = current affine parameters;
    endIf
  endWhile
  return RoughAffineParameters;
EndFunction

FunctionExtractICAFeatures
Input:
  Image1, Image2: the images.
  BlockSize: size of the square block around each pixel (i.e., 2 x 2, 3 x 3, etc.).
Output:
  icaFeatures1: the ICA features of Image1.
  icaFeatures2: the ICA features of Image2.
  ICAFeatures: the ICA features of the two images combined.


Algorithm:
  // Get the block neighborhood of each pixel in both images.
  BlockInput1 = GetBlockNeighborhoodImage(Image1, BlockSize);
  BlockInput2 = GetBlockNeighborhoodImage(Image2, BlockSize);
  // Compute the ICA features. We have used the FastICA package [16] to do so;
  // the package is freely available for use.
  icaFeatures1 = FASTICA(BlockInput1);
  icaFeatures2 = FASTICA(BlockInput2);
  // Now combine the ICA features
  ICAFeatures = icaFeatures1 U icaFeatures2;
  return icaFeatures1, icaFeatures2, ICAFeatures;
EndFunction

FunctionMSTRegister
Input:
  icaFeatures1: the ICA features of Image1.
  icaFeatures2: the ICA features of Image2.
  ICAFeatures: the ICA features of the images combined.
  Alpha: the alpha used to compute the Renyi entropy.
  RoughAffineParameters: the ballpark estimate from the histograms, around which the search is performed.
Output:
  FinalAffineParameters: the affine transformation parameters which, when applied to the second image, give registered images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
  MaxMI: the value of the mutual-information-like measure when the images are fully aligned.


Algorithm:
  // Image1 is kept fixed while Image2 is transformed over the search space.
  Over all six parameters of the affine search space, around the ballpark estimate RoughAffineParameters, using a fine resolution to scan the space, do
    // First pick random samples from the joint as well as the individual feature spaces.
    (sampleJoint, sample1, sample2) = pickRandomSamples(ICAFeatures, icaFeatures1, icaFeatures2);
    tempImg = FunctionAffineTransform(currentTransformValues);
    jointEntropy = FunctionEntropyByMST(sampleJoint);
    marginalEntropy1 = FunctionEntropyByMST(sample1);
    marginalEntropy2 = FunctionEntropyByMST(sample2);
    currentRegistrationMeasure = marginalEntropy1 + marginalEntropy2 - jointEntropy;
    if (currentRegistrationMeasure > maxMeasure)
      maxMeasure = currentRegistrationMeasure;
      FinalAffineParameters = current affine parameters;
    endIf
  endWhile
  return FinalAffineParameters, maxMeasure;
EndFunction

Now the pseudocode of the supplementary functions:


FunctionAffineTransform
Input:
  Image: the image to transform.
  TransformParameters: the parameters by which the image is to be transformed.
Output:
  TransformedImage: the transformed image after applying the transformation.
Algorithm:
  // The transform parameters are two rotation angles (theta and phi), scale, shear, and translations (xT and yT):

$$\begin{bmatrix} a & b \\ c & d \end{bmatrix} = \begin{bmatrix} 2^{scale} & 0 \\ 0 & 2^{scale} \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \begin{bmatrix} 2^{shear} & 0 \\ 0 & 2^{-shear} \end{bmatrix} \begin{bmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{bmatrix}$$

$$\text{affineTransformationMatrix} = \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ xT & yT & 1 \end{bmatrix};$$

  // Now call MATLAB's image transform routine, which transforms the given image
  // according to the given affine transform matrix and fills the empty areas
  // according to the specified interpolation method.
  TransformedImage = FunctionMATLABImTransform(Image, affineTransformationMatrix, interpolation);
  return TransformedImage;
EndFunction
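
For concreteness, a Python sketch of this affine parameterization, assuming NumPy; the $\mathrm{diag}(2^{shear}, 2^{-shear})$ reading of the shear matrix and the signs of the rotation matrices are reconstructions from the garbled source. Note also that it stores the translation in the last column of a homogeneous matrix, whereas the MATLAB imtransform convention above places it in the last row:

```python
import numpy as np

def affine_matrix(theta, phi, scale, shear, tx, ty):
    """Compose a 3 x 3 homogeneous affine matrix from two rotations,
    a log2 scale, a log2 shear, and translations."""
    rot = lambda a: np.array([[np.cos(a), -np.sin(a)],
                              [np.sin(a),  np.cos(a)]])
    stretch = np.diag([2.0 ** shear, 2.0 ** -shear])  # unit-determinant shear
    A = (2.0 ** scale) * rot(theta) @ stretch @ rot(phi)
    M = np.eye(3)
    M[:2, :2] = A
    M[:2, 2] = (tx, ty)                               # translation column
    return M
```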


FunctionJointEntropyByHistogram
Input:
  Image1, Image2: the images.
  NumBins: the number of bins used to compute the histogram.
Output:
  jointEntropy: the joint Shannon entropy of the two images, computed by the histogram method.
Algorithm:
  interval = Range / NumBins;
  // Compute the two-dimensional normalized joint histogram of the two images.
  jointHist = FunctionJointHistogram(Image1, Image2, interval);
  jointEntropy = 0;
  for i = 1 to NumBins
    for j = 1 to NumBins
      jointEntropy = jointEntropy - jointHist(i, j) * log(jointHist(i, j));
    endFor
  endFor
  return jointEntropy;
EndFunction

FunctionMarginalEntropyByHistogram
Input:
  Image: the image.
  NumBins: the number of bins used to compute the histogram.
Output:
  marginalEntropy: the marginal Shannon entropy of the image, computed by the histogram method.
Algorithm:


  interval = Range / NumBins;
  // Compute the one-dimensional normalized histogram of the image.
  marginalHist = FunctionMarginalHistogram(Image, interval);
  marginalEntropy = 0;
  for i = 1 to NumBins
    marginalEntropy = marginalEntropy - marginalHist(i) * log(marginalHist(i));
  endFor
  return marginalEntropy;
EndFunction

FunctionEntropyByMST
Input:
  featureMatrix: the feature matrix whose entropy is to be calculated. The size of the matrix is N x d, where N is the number of sample points and d is their dimension, d >= 2.
  Alpha: the alpha parameter used in computing the Renyi entropy.
Output:
  Entropy: the Renyi entropy of the feature matrix, computed by the minimum spanning tree method.
Algorithm:
  /* In this function we compute the approximate Renyi entropy based on formula (3.8).
     We do not know the constant c(Alpha, d), and thus the estimate is approximate. */
  [N, d] = size(featureMatrix);
  /* Compute the minimum spanning tree and its length. We use the Kruskal algorithm
     provided in the Boost Graph Library (available at:
     http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). */
  lengthMST = FunctionBOOSTKruskal(featureMatrix);
  // The edge power r and Alpha are related by Alpha = (d - r)/d.
  Entropy = (1 / (1 - Alpha)) * log(lengthMST / N^Alpha);


  return Entropy;
EndFunction


REFERENCES

[1] Y. Amit and D. Geman, Shape quantization and recognition with randomized trees, Neural Computation, 9 (1997), pp. 1545-1588.
[2] J. Beardwood, J. H. Halton, and J. M. Hammersley, The shortest path through many points, in Proceedings of the Cambridge Philosophical Society, vol. 55, 1959, pp. 299-327.
[3] C. Bishop, Neural networks for pattern recognition, Oxford University Press, Oxford, England, 1995.
[4] L. G. Brown, A survey of image registration techniques, Computing Surveys, 24 (1992), pp. 325-376.
[5] B. Chazelle, A minimum spanning tree algorithm with inverse-Ackermann type complexity, Journal of the ACM, 47 (2000), pp. 1028-1047.
[6] A. Collignon, D. Vandermeulen, P. Suetens, and G. Marchal, 3D multi-modality medical image registration using feature space clustering, in Computer Vision, Virtual Reality and Robotics in Medicine, N. Ayache, ed., vol. 905 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 195-204.
[7] T. Cormen, C. Leiserson, and R. Rivest, Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2001.
[8] T. Cover and J. Thomas, Elements of information theory, John Wiley and Sons, New York, NY, USA, 1991.
[9] R. Duda and P. Hart, Pattern classification and scene analysis, John Wiley and Sons, New York, NY, 1973.
[10] D. Eppstein, Dynamic Euclidean minimum spanning trees and extrema of binary functions, Discrete & Computational Geometry, 13 (1995), pp. 111-122.
[11] S. Gold, A. Rangarajan, C. P. Lu, S. Pappu, and E. Mjolsness, New algorithms for 2-D and 3-D point matching: pose estimation and correspondence, Pattern Recognition, 31 (1998), pp. 1019-1031.
[12] A. O. Hero, B. Ma, O. Michel, and J. Gorman, Applications of entropic spanning graphs, IEEE Signal Processing Magazine (Special Issue on Mathematics in Imaging), 19 (2002), pp. 85-95.


[13] A. O. Hero and O. Michel, Asymptotic theory of greedy approximations to minimal k-point random graphs, IEEE Transactions on Information Theory, 45 (1999), pp. 1921-1938.
[14] K. E. Hild II, D. Erdogmus, and J. Principe, Blind source separation using Renyi's mutual information, IEEE Signal Processing Letters, 8 (2002), pp. 174-176.
[15] E. Horowitz, S. Sahni, and S. Rajasekaran, Fundamentals of computer algorithms, W. H. Freeman, San Francisco, CA, 1998.
[16] A. Hyvärinen and E. Oja, Independent component analysis: a tutorial, Neural Networks, 13(4-5) (2000), pp. 411-430.
[17] B. Ma, A. Hero, J. Gorman, and O. Michel, Image registration with minimum spanning tree algorithm, in International Conference on Image Processing, vol. 1, 2000, pp. 481-484.
[18] R. E. Neapolitan, Learning Bayesian networks, Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.
[19] H. Neemuchwala, A. Hero, and P. Carson, Image registration using entropic graph-matching criteria, in Proceedings of the Asilomar Conference on Signals and Systems, vol. 1, 2002, pp. 134-138.
[20] S. Pettie and V. Ramachandran, An optimal minimum spanning tree algorithm, in Automata, Languages and Programming, 2000, pp. 49-60.
[21] J. P. W. Pluim, J. B. A. Maintz, and M. A. Viergever, Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging, 22 (2003), pp. 986-1004.
[22] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proceedings of the IEEE, 77 (Feb 1989), pp. 257-286.
[23] A. Renyi, Probability theory, North-Holland Publishing Company, Amsterdam, Netherlands, 1970.
[24] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal, 27 (1948), pp. 379-423 and 623-656.
[25] J. M. Steele, Growth rates of Euclidean minimum spanning trees with power weighted edges, The Annals of Probability, 16 (1988), pp. 1767-1787.
[26] H. A. Sturges, The choice of a class interval, Journal of the American Statistical Association, 21 (1926), pp. 65-66.


[27] B. C. Vemuri, J. Liu, and J. L. Marroquin, Robust multimodal image registration using local frequency representations, vol. 2082 of Lecture Notes in Computer Science, 2001, pp. 176-182.
[28] P. Viola and W. M. Wells III, Alignment by maximization of mutual information, in Fifth International Conference on Computer Vision (ICCV), 1995, pp. 16-23.


BIOGRAPHICAL SKETCH

I graduated with a bachelor of engineering degree from the Government College of Engineering, Pune, India. I started my master of science program at the University of Florida in August 2002, and I am planning to graduate in a month! My academic interests lie in computer vision and image processing, and this thesis was a great learning experience for me. Apart from this, I enjoy painting and badminton.





LIST OF FIGURES
Figure  page

1-1 Two registered images with different intensity distributions  4

2-1 Original graph  14

2-2 MST by Kruskal's algorithm  14

4-1 Block diagram for image registration  30

4-2 Flow-chart for estimating the parameters of the transform  31

4-3 Effect on registration of varying the number of histogram bins (numbers of bins are 8, 16 and 32 from top left to bottom right; the maximum should be present at 0)  33

4-4 ICA feature images  35

4-5 The faces used to compare registration for different sizes of ICA blocks  36

4-6 Variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features 2. 16 features 3. 36 features. Maximum located at 0.  36

4-7 Variations in MI vs. the rotation angle for various ICA feature dimensions: 1. 4 features 2. 16 features 3. 36 features  37

4-8 Behavior of MST entropy in 1D and 2D search space  38

4-9 Plots showing the error between the actual MST length and the truncated MST length over various radius factors and over rotations of -10, 0 and 10 degrees  41

4-10 Variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.55: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples  42

4-11 Variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.55: 1. 900 samples 2. 1100 samples  42

4-12 Variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.85: 1. 100 samples 2. 300 samples 3. 500 samples 4. 700 samples  43

4-13 Variations in joint entropy versus the rotation angle with varying number of samples picked from the overlap area when alpha is 0.85: 1. 900 samples 2. 1100 samples  43

4-14 From top left to bottom right: 1. Face image 2. Same face with applied transformation 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  46

4-15 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features  46

4-16 From top left to bottom right: 1. Face image 2. Face obtained after applying the optimal transformation to the unregistered image shown in figure 4-14 3. Overlap of the two registered images 4. Edge image of the overlapped registered images  47

4-17 Minimum spanning tree of the registered images considering only the intensities of the two images as features  47

4-18 From top left to bottom right: 1. Face image 2. Face image of the same person with glasses, unregistered 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  48

4-19 Minimum spanning tree of the unregistered images considering only the intensities of the two images as features  49

4-20 From top left to bottom right: 1. Face image 2. Face image of the same person with glasses, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images  49

4-21 Minimum spanning tree of the registered images considering only the intensities of the two images as features  50

4-22 From top left to bottom right: 1. Face image of a happy person 2. Face image of the same person with a sad expression, unregistered 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  51

4-23 From top left to bottom right: 1. Face image 2. Face image of the same person with a sad expression, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images  51

4-24 From top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  52

4-25 From top left to bottom right: 1. Face image 2. Face image of a different person, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images  53

4-26 From top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  54

4-27 From top left to bottom right: 1. Face image 2. Face image of a different person, registered 3. Overlap of the two registered images 4. Edge image of the overlapped registered images  54

4-28 From top left to bottom right: 1. A face image 2. Face image of a different person 3. Overlap of the two unregistered images 4. Edge image of the overlapped unregistered images  55

4-29 From top left to bottom right: 1. Face image 2. Face image of a different person, registered 3. Overlap of the two images registered using the histogram 4. Edge image of the overlapped registered images  55

4-30 From left to right: 1. Face image of a person 2. Face image of the same person, registered  56

4-31 Image 2 as registered by the histogram  56

4-32 Plot of MI vs. rotation angle when registered by histogram; the maximum should be at -5 degrees  57

4-33 Image 2 as registered by the MST  57

4-34 Plot of MI vs. rotation angle as registered by MST; the correct answer is at -5 degrees  57

4-35 From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.4 2. alpha = 0.45 3. alpha = 0.5 4. alpha = 0.55  58

4-36 From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.6 2. alpha = 0.65 3. alpha = 0.7 4. alpha = 0.75  59

4-37 From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.8 2. alpha = 0.85 3. alpha = 0.9 4. alpha = 0.95  60















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

AFFINE IMAGE REGISTRATION USING MINIMUM SPANNING TREE
ENTROPIES

By

Aditee Kumthekar

May 2004

Chair: Anand Rangarajan
Major Department: Computer and Information Science and Engineering

The various environmental factors, settings of the image acquisition device

and the material properties of an object directly affect the formation of an image.

Registration of two images involves aligning the two images obtained under

different conditions which is based on several information-theoretic methods.

Entropy estimation is an important step in these methods which traditionally

requires estimation of the density function of the images. This thesis however

directly estimates the entropy without a priori density estimation. This

is achieved by exploiting a powerful theorem which relates the asymptotic length

of the minimum spanning tree formed from the samples and the Renyi entropy

of the density function underlying the samples. Since the density estimation step

is avoided, the careful tuning of several density estimation parameters, which is

required by most density estimation methods, is avoided. Also, since the spanning

tree method estimates the entropy in high dimensions, this allows us to include

feature information into the registration process. Affine registration of face images

of different subjects with different gestures and taken under different lighting

conditions is demonstrated.















CHAPTER 1
INTRODUCTION

The entire field of computer vision is motivated to solve the problem of

designing a system which has the capabilities of human vision. When we look

around, our eyes and our brain immediately analyze the relationship between

the different objects in our immediate environment. We usually do not have to

think long to interpret what we see. It comes naturally to us. However, even 40 years after the inception of the field of computer vision, we have not been able to emulate this amazing system of perception. Why is this so? This is because a very complex relationship exists between the physical factors and the way an image is

formed. Such factors include reflectivity and absorbency of the illuminated surface,

the intensity and the direction of illumination, the environmental factors like the

presence of dust and speckle in the atmosphere, rain etc. Not only this, but the mode in which an image is acquired (for example, a camera) and the parameters that calibrate the instrument (for example, the shutter speed of the camera) also contribute to the image formation. Viola and Wells [ ] have represented this event of image formation as the equation


v(T(x)) = F(u(x), q), (1.1)


where T(x) is the transformation function which determines the orientation of the

image; u(x) is the function which describes the surface of the object; q represents

the other parameters that affect image formation such as lighting, environment

etc; v() is the function that formulates the pixel intensities and F(u(x), q) is the

function which incorporates the various parameters mentioned above which play a









role in the image formation. Modelling the aforementioned function is not a trivial

task as there are a vast number of parameters involved.

Many vision-related systems attempt functionalities such as object recognition (identifying the objects from the background region), indexing of image databases (the efficient retrieval of images similar to the query image) and object modelling (the mathematical representation, coding and synthetic generation of an image from a model). Many such applications require that the images under

consideration be aligned before further processing can be performed on them.

Apart from these applications which require alignment as an initial preprocessing

step, there are a plethora of applications which rely on accurate image alignment.

One such application is "anomaly dil 1 I i ii" in medical imaging, which refers to

detecting any changes in a particular organ over a period of time either due to

disease or to due to treatment, e.g., detecting if the post-operative MRI of patient

shows the removal of the tumor that was seen in the pre-operative scan of the

patient. In order to do this, we require accurate alignment of the images.

Image registration is precisely this task of spatially aligning two images so

that the corresponding pixels in the images refer to the same portion or area in

the image. Alternatively, you could also think of image registration as finding the

transformation which aligns the two images accurately. You might now wonder why this is not a simple task. Let us see why. We saw that there was a wide array of

factors that contribute to the image formation. The challenge of registration lies

in matching two images when they are taken at different times, under different

conditions and using different viewpoints probably with different image acquisition

techniques. Recently, the problem of multimodality based image registration,

which refers to registering images under different lighting conditions, has gained

importance.









At the core, registration methods can be viewed as a combination of certain

choices about the following parameters, namely [ ]

Feature space: Feature space refers to the parameters of the image used to
perform registration. In some techniques, the raw image intensities are used
directly. These are called intensity-based methods. There are also feature-based
methods which extract some information from the raw image (termed
features) and then use these features to perform registration. The features
can be extracted via image segmentation or by using an information-theory
based technique.

Search space: This is the set of transformations applied to the image
during the registration process. The spatial transformations run the gamut
of rigid, affine, projective, polynomial, non-rigid (morphing) etc. These are
explained later.

Search strategy: This term denotes the technique used to search the entire
set of transformations. In the case of exhaustive search, the entire space
is scanned. In the case of hierarchical search, the set of transformations is
searched using a coarse-to-fine hierarchy. Finally, there are techniques that
use local gradient information of the image similarity measure to determine a
sequence of transformations which progressively minimizes the measure.

Similarity metric: This is the criterion used for judging the quality of
the achieved registration. The similarity metric is a figure of merit used to
rank order the various spatial transformations.

One might wonder what is this similarity metric anyway? Can't we simply

compare the intensities at the corresponding pixel locations to verify if the images

have been registered? This is not so straightforward. In figure 1-1, we show two

registered images A and B. Clearly, the intensities in the two images at the same

location do not correspond.

There have been many recent efforts focused on information theory based

methods. The information-theory based criterion usually provides us with

some registration objective function which has to be minimized (or maximized)

over a range. One such objective function is the mutual information objective

function [ ] which was used to perform multimodality based image registration.

Mutual information is a measure of the amount of overlapping information content between two images. An interesting metaphor for intensity-



















Itnig"A


[muing B


Figure 1-1: Two registered images with different intensity distributions


based and information-theory based similarity measures is that the intensity-

based technique can be viewed as "What We See Is All We Get (WWSIAWG)" while the information-theory based criteria can be viewed as "What We See Is What We Mean (WWSIWWM)". Mutual information techniques have now become

widespread in registration especially in medical imaging.

The Mutual Information between any two images can be computed either from

the raw image intensity or from feature vectors extracted from each image. Rarely

do we see registration approaches that combine intensity and features. This thesis

attempts such a combination.

Histograms are commonly used to estimate the entropy (information

content) which is then used to calculate the mutual information. When feature-

based methods are used, the dimensionality of the feature space increases.

Histogram methods suffer from the "curse of dimensionality" and thus cannot

be easily used to estimate the entropy of a high dimensional feature set. Also the









histogram based methods are extremely sensitive to the number of bins. Apart

from this the histograms estimate the discrete Shannon Entropy while, as we

explain later, we need to estimate the continuous differential entropy. 1 This

calls for the use of non-parametric methods to estimate the entropy in a high

dimensional feature space.

One such technique is to use spanning graphs to estimate entropy. The early work in this area was done by Steele [ ], who showed how the length of the minimum spanning tree (MST) of a complete graph constructed from the Euclidean

distance between the feature vectors is directly proportional to the integral of

an appropriately defined power of the probability density defined on the feature

space. Later, the group at the University of Michigan, Alfred Hero et al. [ ], showed

how the aforementioned relation between the length of the spanning tree and the

integral of the power of the density can be used to estimate the Renyi entropy

which in turn can be used to perform registration.

The search space of affine transformations is one such space which includes

the six parameters of rotations, scale, shear and translations in R^6. The affine

transformation is applied globally to the entire image.

In this thesis, we have closely followed Hero's work and have successfully

shown how the MST-based approach can be used to perform registration over

six parameters of the affine transformation. We have computed an information-

theory based criterion similar to the conventional mutual information using the

Renyi entropy as proposed in paper [ ] and then minimized it over an affine search

space. Also, we have combined the use of intensities and features in the same

information theory based registration technique. The Renyi entropy is estimated



1 Information on the entropy, its types and Mutual Information is presented in
Chapter 2.








from the high dimensional feature space using the minimum spanning tree based

method.

After this concise introduction, Chapter 2 covers the background material,

Chapter 3 covers the theory, Chapter 4 covers the implementation details and

finally, the last chapter covers the results, conclusions and future work.















CHAPTER 2
BACKGROUND

2.1 Entropy and Mutual Information

From a broad perspective, the entropy of a random variable is defined as the

measure of uncertainty of a random variable. Alternatively, it can also be defined

as the amount of information required to represent the variable. Thus, the more

improbable a random variable, the greater is its entropy. Shannon [ ] derived a

measure of entropy for discrete variables. Mathematically, the entropy H(X) of a

random variable X is defined as


H(X) = -\sum_{x \in X} p(x) \log p(x).   (2.1)

Normally, the logarithm is taken to the base 2, in which case the entropy is

expressed in terms of bits. Thus, entropy can also be expressed as the length of the

code to represent the random variable. If g(x) are the actual values that a variable

X can assume, then the expected value of g(X) is written as


E_p g(X) = \sum_{x \in X} g(x) p(x).   (2.2)

Thus, we can write the entropy of a random variable as


H(X) = E_p \log \frac{1}{p(X)}.   (2.3)

This leads to a conclusion that


H(X) \ge 0, since 0 \le p(x) \le 1 implies that \log \frac{1}{p(x)} \ge 0.   (2.4)









2.1.1 Joint and Conditional Entropy

Joint and conditional entropy extend the concept of entropy to a pair of

distinct random variables. Two random variables are said to be statistically

independent when their joint density is the product of their marginal densities.

Thus, when the two random variables are independent, the knowledge of one does

not have any effect on the knowledge of the other variable. On the other hand,

when the variables are totally dependent, that is, if Y = f(X) then, Y is not

random when X is known.

The mathematical representation of joint entropy is


H(X, Y) = -\sum_{x \in X} \sum_{y \in Y} p(x, y) \log p(x, y)   (2.5)

and the conditional entropy of a variable given the other variable is defined as


H(Y|X) = -\sum_{x \in X} p(x) \sum_{y \in Y} p(y|x) \log p(y|x)   (2.6)
        = -E_{p(x,y)} \log p(Y|X).   (2.7)


By the chain rule we get


H(X, Y) = H(X) + H(Y|X). (2.8)


Two variables are said to be independent if


H(Y|X) = H(Y) (2.9)


that is knowledge of X has no effect on the knowledge of Y. Thus we have


H(X, Y) = H(X) + H(Y). (2.10)


2.1.2 Relative Entropy and Mutual Information

The relative entropy is the measure of the distance between the two

probability distributions. To explain it better, suppose there a random variable









with probability distribution p and whose entropy is H(p). Since we don't know

the exact distribution of p we assume that it is q with entropy H(q). The relative

entropy D(p||q) is the difference in the number of entropy bits of p and q.

It is also called the Kullback-Leibler distance and is mathematically

represented as

D(p||q) = \sum_{x \in X} p(x) \log \frac{p(x)}{q(x)}.   (2.11)

Mutual information (I(X; Y)) is defined as the amount of information that one
Mutual information (I(X; Y)) is defined as the amount of information that one

random variable has about other. It is the relative entropy between the joint and

the product distributions of the two random variables, expressed as


I(X; Y) = D(p(x, y) || p(x)p(y))   (2.12)
        = \sum_{x \in X} \sum_{y \in Y} p(x, y) \log \frac{p(x, y)}{p(x) p(y)}.   (2.13)

It can also be shown that


I(X; Y) = H(X) + H(Y) - H(X, Y)   (2.14)

I(Y; X) = I(X; Y) \ge 0.   (2.15)


Additional properties of mutual information are given in the book by Cover

and Thomas [ ].

2.1.3 Differential Entropy

While entropy is a measure of uncertainty in the discrete domain, differential entropy is a measure of uncertainty of a variable in the continuous domain. It is

defined as


h(X) = -E[\log(p(X))]   (2.16)
     = -\int_{-\infty}^{\infty} p(x) \log(p(x)) \, dx.   (2.17)









The major difference between entropy and differential entropy is that the

differential entropy can take negative values. Thus, there is no direct relationship

between code length and differential entropy since code length cannot be negative.

2.1.4 Renyi Entropy

The entropy described in Section 2.1.3 is the differential Shannon entropy. Renyi

entropy is a generalized version of the Shannon entropy. The Renyi entropy h_\alpha(X) of order \alpha is defined as

h_\alpha(X) = \frac{1}{1 - \alpha} \log \int f^\alpha(x) \, dx,   (2.18)

for -\infty < \alpha < \infty, \alpha \ne 1.

The discrete version of the Renyi entropy is defined as

H_\alpha(X) = \frac{1}{1 - \alpha} \log \sum_x p^\alpha(x).   (2.19)

When \alpha \to 1 we obtain the Shannon entropy function from the Renyi entropy function as follows:

h(X) = \lim_{\alpha \to 1} h_\alpha(X) = -\int p(x) \log(p(x)) \, dx.   (2.20)

In the discrete domain, as \alpha \to 1, H_\alpha(X) \to H(X).
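As a quick numerical check of this limit (our illustration, not part of the thesis), the following Python snippet computes the discrete Renyi entropy for several values of alpha approaching 1 and compares the result against the Shannon entropy; the distribution p is an arbitrary example.

import numpy as np

def renyi_entropy(p, alpha):
    # Discrete Renyi entropy: H_alpha(X) = log(sum_x p(x)^alpha) / (1 - alpha), in bits.
    p = np.asarray(p, dtype=float)
    return np.log2(np.sum(p ** alpha)) / (1.0 - alpha)

def shannon_entropy(p):
    # Discrete Shannon entropy in bits.
    p = np.asarray(p, dtype=float)
    return -np.sum(p * np.log2(p))

p = [0.5, 0.25, 0.125, 0.125]             # example distribution
for alpha in (0.5, 0.9, 0.99, 0.999):     # alpha -> 1
    print(alpha, renyi_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))     # the Renyi values approach 1.75 bits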

Viola and Wells [ ] have stated that the task of registering two images is

equivalent to maximizing the mutual information between the two images.

2.2 Histogram method for Estimating Entropy

Histogram based density estimation divides the range of the data into bins

and then computes the normalized frequency of the data placed in the bins. The

normalized frequency counts sum up to one. A one dimensional histogram of data

with n samples and b bins is

f(a_i) = \frac{n_i}{n},   (2.21)

where a_i denotes bin number i, n_i is the number of samples falling in that bin, and 0 \le i < b.









For multivariate data distributed across d dimensions, histogram-based density estimation is generally performed by constructing M bins along each of the d dimensions. If the number of bins is too large, the histogram becomes spiky and fails to capture the shape of the probability distribution function: the variance is high and the bias is reduced. Conversely, as the number of bins decreases, the estimate over-smooths the distribution: the variance narrows and the bias increases. Algorithms that determine the correct bin size of the histogram include Sturges' algorithm [ ], which states that for n samples, M = 1 + 3.322 \log_{10} n. Some other approaches predict bin size from a priori knowledge of the data. In our thesis, we have chosen the bin size statistically.
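To make the procedure concrete, the following is a minimal Python sketch (our illustration, not the implementation used in this thesis) of histogram-based entropy estimation with the number of bins chosen by the Sturges formula quoted above; the sample data is arbitrary.

import numpy as np

def histogram_entropy(samples):
    # Estimate the Shannon entropy (in bits) of 1-D samples via a histogram.
    n = len(samples)
    m = int(1 + 3.322 * np.log10(n))      # Sturges' choice of bin count
    counts, _ = np.histogram(samples, bins=m)
    f = counts[counts > 0] / n            # normalized frequencies; empty bins dropped
    return -np.sum(f * np.log2(f))

samples = np.random.default_rng(0).normal(size=1000)
print(histogram_entropy(samples))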

Histogramming suffers from the curse of dimensionality. That is, if the number of data dimensions is denoted by d, with M bins in each direction, then the total number of bins will be M^d. Such exponential growth in the number of bins incurs drawbacks. Firstly, as M^d grows large in higher dimensional data, more data points are needed to populate the histogram; otherwise the result is a sparsely populated histogram with numerous empty bins. Secondly, data in higher dimensions tends to be correlated and thus projects itself into selected dimensions. This makes the histogram-based method of density estimation unsuitable for higher dimensional data.
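A tiny illustration of this growth (our example, not from the thesis): with M = 16 bins per axis, the total bin count M^d quickly outstrips any realistic number of samples.

for d in (1, 2, 4, 8):
    print(d, 16 ** d)   # 16, 256, 65536, 4294967296 bins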

2.3 Minimum Spanning Trees

Before describing the theory behind the use of Minimum Spanning Trees

for registration later in chapter 3, let us define a minimum spanning tree.

Minimum spanning trees (MSTs) are used in solving many real world problems.

For example, consider a case of a network with V nodes with E undirected

connections between nodes. This can be represented as a connected, undirected

graph G = (V, E) containing V vertices and E edges. Now suppose that all the

edges are weighted, i.e., for each edge (u, v) \in E we have an associated weight









w(u, v). A weight can be used to represent real world quantities such as the cost of a wire, propagation delay etc. between two nodes in a network. A spanning tree is defined as an acyclic subgraph that connects all the vertices. A minimum spanning tree is a spanning tree with the minimum weight. Suppose we represent the spanning tree as T \subseteq E, which connects all the vertices and whose total length is w(T); then the minimum spanning tree is defined as

\min_T w(T) = \min_T \sum_{(u,v) \in T} w(u, v).   (2.22)

Generic MST algorithm. The book by Cormen et al. [ ] gives a detailed analysis of minimum spanning tree algorithms. The MST algorithm falls in the category of greedy algorithms. Greedy algorithms are algorithms that make the best choice at each decision making step. In other words, at every step, greedy algorithms make the locally optimum choice and hope that it leads to a globally optimum solution. The greedy MST algorithm builds the tree step-by-step, incorporating the edge that causes the minimum increase in the total weight at each step, without adding any cycles to the tree. Suppose there is a connected, undirected graph G = (V, E) with the weight function w : E \to R. While finding the minimum spanning tree for graph G, the algorithm maintains at each step an edge-set S which is some subset of the MST. At each step, an edge (u, v) is added to the subset S such that it does not violate the MST property of S. This makes S \cup \{(u, v)\} a subset of the minimum spanning tree. The edge which is added at each step is termed a "safe edge". The algorithm can be written as

GENERIC-MST(G, w)
1. S ← ∅
2. while S does not form a spanning tree
3.     do find a safe edge (u, v) which can be added to S
4.        S ← S ∪ {(u, v)}
5. return S



There are two popular algorithms for computing the Minimum Spanning Tree,

Prim's algorithm and Kruskal's algorithm (refer [ ]). We have used Kruskal's algorithm to find the minimum spanning tree. Its description follows.

Kruskal's algorithm for MST. Kruskal's algorithm is an extension of the generic MST algorithm described in the preceding sub-section. In Kruskal's algorithm the set S, which is a subset of the minimum spanning tree, is a forest. At each step, Kruskal's algorithm finds the safe edge to be added as the edge with the minimum weight that connects two forests. Initially, the edges are sorted in non-decreasing order of their weights. At each step, one finds the minimum-weight edge in the graph not already present in the minimum spanning tree and connects two forests together. This process is repeated until all the vertices are included in the graph. The algorithm can be written as,
MST-KRUSKAL(G, w)
1. S ← ∅
2. for each vertex v ∈ V[G]
3.     do MAKE-SET(v)
4. sort the edges of E by non-decreasing weight w
5. for each edge (u, v) ∈ E, in order of non-decreasing weight
6.     do if FIND-SET(u) ≠ FIND-SET(v)
7.         then S ← S ∪ {(u, v)}
8.              UNION(u, v)
9. return S

In steps 1-3, the subset S is initialized to the empty set and |V| forests, each with a single vertex, are created. Step 4 sorts the edge set E in non-decreasing order of weight. In steps 5-8, an edge (u, v) is found such that the endpoint u belongs to one forest and the endpoint v belongs to another forest. This edge is incorporated in























Figure 2-1: Original graph


the subset S. The algorithm stops when all vertices are included in the tree. An

illustration of how Kruskal's algorithm works is shown in figure 2-2.




Figure 2-2: MST by Kruskal's algorithm



A more formal proof of why Kruskal's algorithm works [ ] is as follows. Let G be an undirected, connected graph, let t be the spanning tree produced by Kruskal's algorithm and let t' be an actual minimum spanning tree. One shows that the trees t and t' have the same cost. Let E(t) and E(t') be the edge sets of the two trees, each containing n - 1 edges, provided that the total number of vertices in the graph is n.

PROOF:

Case 1: If E(t) = E(t'), then the costs of t and t' are equal. Thus Kruskal's algorithm generates the minimum spanning tree in this case.

Case 2: E(t) ≠ E(t'). Assume that there is an edge p such that p ∈ E(t) and p ∉ E(t'). If we include p in t', then t' will have n edges and, thus, a unique cycle with edges p, e_1, e_2, ..., e_k. Of these k + 1 edges there will be at least one edge e_j not in tree t. The weight of e_j must be greater than or equal to the weight of p because, if its weight were less than that of p, then the greedy Kruskal's algorithm would have included it while forming the tree t. If such an edge on the cycle is removed from t', a different tree t'' is formed whose cost is less than or equal to the cost of tree t', because weight(e_j) ≥ weight(p). Thus, tree t'' is a minimum cost tree. By repeating the above transformation, we can transform tree t' into tree t. Thus, the tree generated by Kruskal's algorithm is a minimum spanning tree. The time complexity of Kruskal's minimum spanning tree algorithm is O(|E| log |E|).
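A compact Python rendering of Kruskal's algorithm with a union-find structure is given below; it is a sketch consistent with the pseudocode above, not the implementation used in this thesis, and the edge list is an arbitrary small example.

def kruskal_mst(num_vertices, edges):
    # edges: list of (weight, u, v) tuples. Returns (total_weight, mst_edges).
    parent = list(range(num_vertices))

    def find(x):                        # FIND-SET with path compression
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst, total = [], 0.0
    for w, u, v in sorted(edges):       # non-decreasing weight order
        ru, rv = find(u), find(v)
        if ru != rv:                    # safe edge: it joins two forests
            parent[ru] = rv             # UNION
            mst.append((u, v))
            total += w
    return total, mst

edges = [(6, 0, 1), (4, 0, 2), (5, 1, 2), (5, 1, 3), (5, 2, 3)]
print(kruskal_mst(4, edges))            # (14.0, [(0, 2), (1, 2), (1, 3)])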

Prim's spanning tree. Unlike Kruskal's algorithm, in Prim's algorithm the subset S is always a tree and never a forest. The algorithm is given by

MST-PRIM(G, w, r)    (G: graph, w: weights, r: root)
1. Q ← V[G]
2. for each u ∈ Q
3.     do key[u] ← ∞    (key[v] is the minimum weight of any edge connecting vertex v to the tree)
4. key[r] ← 0
5. π[r] ← NIL    (π[v] is the parent of v in the tree)
6. while Q ≠ ∅
7.     do u ← EXTRACT-MIN(Q)
8.        for each v ∈ Adj[u] with v ∈ Q and w(u, v) < key[v]
9.            do π[v] ← u
10.              key[v] ← w(u, v)

The time complexity of this algorithm is O(|E| + |V| log |V|).
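For comparison, a heap-based Python sketch of Prim's algorithm follows (assuming an adjacency-list representation; again our illustration, not the thesis code). Using a binary heap for EXTRACT-MIN gives O(|E| log |V|) behavior.

import heapq

def prim_mst(adj, root=0):
    # adj: dict mapping vertex -> list of (weight, neighbor). Returns MST weight.
    visited = {root}
    heap = list(adj[root])               # candidate edges leaving the tree
    heapq.heapify(heap)
    total = 0.0
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)       # EXTRACT-MIN over crossing edges
        if v in visited:
            continue                     # stale entry; v was reached already
        visited.add(v)
        total += w
        for edge in adj[v]:
            if edge[1] not in visited:
                heapq.heappush(heap, edge)
    return total

adj = {0: [(6, 1), (4, 2)], 1: [(6, 0), (5, 2), (5, 3)],
       2: [(4, 0), (5, 1), (5, 3)], 3: [(5, 1), (5, 2)]}
print(prim_mst(adj))                     # 14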

More efficient algorithms for computing the minimum spanning trees are given

in Chazelle [ ] and Pettie and Ramachandran [ ]. Some algorithms [ ] compute

the minimum spanning tree dynamically, as vertices are added or removed.

2.4 The Problem of Registration

Image registration can be defined as the problem of finding the best

transformation that, when applied to one image, aligns the image with the another

image. Brown [ ] and Pluim et al. [ ] surveyed image registration methods, which can be classified by the various algorithms actually used to perform registration and by the applications of the registration techniques.

Based on the application of the registration techniques, the methods can be classified as registration of images with different modalities, registration of different anatomical objects etc. The problem of registration of images with different modalities is called multimodality registration. This problem involves registering two images taken via different sensors or under different lighting conditions. An example of multimodality registration is the registration of MRI and CT images. In our thesis, we have registered face images of different people, obtained via a camera under different lighting conditions and with different expressions.

Based on the methods used for registration, the techniques are classified by the preprocessing of the images, the type of transformations applied to the images,









the interpolation method used, the search strategy used, the different features of

the image used and the measure of registration.

Mutual information for registration. Based on the measure of registration,

several statistical criteria are used. In these criteria, majority of the methods

involve entropic measures to estimate how the images have been aligned. These

entropic measures involve estimating the entropy of the images and formulating

different measures based on them. Mutual information is one such measure, which

is defined as


I(X, Y) = H(X) - H(X|Y)   (2.23)

I(X, Y) = H(X) + H(Y) - H(X, Y).   (2.24)

Some methods use normalized mutual information to register the images, which is computed as

NMI(X, Y) = \frac{H(X) + H(Y)}{H(X, Y)}.   (2.25)

Observe that H(X|Y) is a measure of the conditional probability p(x|y), i.e., of a
Observe that H(X|Y) is a measure of the conditional probability p(xly) i.e. of a

particular value x in Xgiven the value in Y is y. Thus, the Equation (2.23) can be

interpreted as the measure of the uncertainty about X minus the uncertainty about

X when Y is known. It can also be stated that I(X, Y) = I(Y, X). Thus, when the

images are aligned, the amount of information the images have about each other

is maximized. As a result, registration involves finding the transformation that

maximizes the mutual information.
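As an illustration of these formulas (a sketch using the histogram estimator of Section 2.2 rather than the MST estimator developed in Chapter 3), the mutual information and the normalized mutual information of two equal-size images can be computed from their joint histogram as follows; the bin count is an arbitrary choice.

import numpy as np

def mi_and_nmi(img1, img2, bins=32):
    # Histogram estimates of I(X, Y) and NMI(X, Y) for two equal-size images.
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pxy = joint / joint.sum()               # joint probabilities
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)

    def H(p):                               # Shannon entropy of a distribution
        p = p[p > 0]
        return -np.sum(p * np.log2(p))

    mi = H(px) + H(py) - H(pxy)             # Equation (2.24)
    nmi = (H(px) + H(py)) / H(pxy)          # Equation (2.25)
    return mi, nmi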

Another view of mutual information is obtained from equation (2.24), which

describes mutual information as the sum of marginal entropies minus the joint

entropy. One might argue that, as the images are transformed, the marginal

entropies of the images remain constant and the only factor that changes is the

joint entropy. From this it would seem that, maximizing the mutual information

is equivalent to minimizing the joint entropy of the two images. But this is not









true, as the entropies are calculated from the overlap areas while computing

mutual information. The marginal entropies of the images thus, change with the

joint entropy, as the number of pixels in the overlap area change with a given

transformation. In such a scenario, if the overlap region is very small, then the

joint histogram will be a sharp peak corresponding to the overlap of the one image

and the background samples from second image, resulting a low joint entropy. This

will give an incorrect answer, which ironically favors the separation of the two

images. By adding the marginal entropies term in Equation (2.24), a p. n.iii, is

added since the marginal entropies will also be low when the images are misaligned.

This makes mutual information less sensitive to the problem of background.

2.5 Image Transformations

An image transformation is a function that maps the location of points in

one image to a different location in another image. The transformations can be

global or local. Global transformations mean applying the same mapping for the

entire image while a local transformation consists of a collection of several mapping

functions, each of which is specific to a particular region in the image. Apart from

these broad classes, image transformations can be classified as,

Rigid: involves translations and rotations alone.

Affine: involves translations, rotations, scale and shear.

Perspective: is mainly used to map a 3D image to a 2D image by projecting
the data.

Non-Rigid: the mapping involves morphing and distortions of an image.

In this study, we have used the affine transformation and thus will elaborate

more on that.

Affine transformations. Affine transformations involve rotation, scale, shear and translation. The composite transformation matrix is decomposed into scale, rotation, shear and rotation, as suggested in the paper [ ]1, and translation. This decomposition can be written as

T = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{pmatrix},

which is the composite matrix, whose 2 \times 2 linear part can be further decomposed as

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2^s & 0 \\ 0 & 2^s \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 2^t & 0 \\ 0 & 2^{-t} \end{pmatrix} \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix},

where s is the scale parameter, t is the shear parameter, \theta and \phi are the rotation angles, and e and f are the translations in the x and y directions.

We thus perform our search over six degrees of freedom: two rotations, one scale, one shear and two translations. An interesting property of this affine transformation is that it does not allow reflections and that it gives a nice decomposition of the composite affine matrix, which allows the user to individually set different parameters.
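A small numpy sketch of this composition (our code, not the thesis implementation; parameter names follow the matrix above, and the translation is placed in the third row so a point transforms as [x, y, 1] T) is:

import numpy as np

def affine_matrix(s, theta, t, phi, e, f):
    # Linear part: 2^s * R(theta) * diag(2^t, 2^-t) * R(phi); translation (e, f).
    def rot(a):
        return np.array([[np.cos(a), np.sin(a)],
                         [-np.sin(a), np.cos(a)]])
    A = (2.0 ** s) * rot(theta) @ np.diag([2.0 ** t, 2.0 ** -t]) @ rot(phi)
    T = np.eye(3)
    T[:2, :2] = A
    T[2, :2] = [e, f]
    return T

# All six parameters zero gives the identity transformation:
print(np.allclose(affine_matrix(0, 0, 0, 0, 0, 0), np.eye(3)))   # True

Because the determinant of the linear part is 2^{2s} > 0, no choice of parameters produces a reflection, matching the property noted above.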

2.6 Independent Component Analysis

Registration of two images can be performed by using the pixel intensity

values, or by extracting features from image. Features can be extracted simply

by passing the image through a set of filters, each of which detects one or more

features of interest. Independent Component Analysis is one such technique for

extracting features.

The motivation for the technique of Independent Component Analysis was to

solve the "cocktail party problem". This problem can be posed as follows:



1 The paper suggested decomposing into scale, rotation and two shears. However, it also mentions that the second shear can be replaced by a second rotation.









Suppose there are two people speaking simultaneously in a room, and we have

two microphones to capture their voices. The output from the two microphones can

written mathematically as


xi (t) = allsi + a12S2 (2.26)

X2(t) a2 1 a22S2 (2.27)


where x represents the output from a microphone, t is the time, s represent the

sources and a represents the weights that are assigned to the signals from the

microphones. ICA attempts to find the original sources of the signals (i.e. the

speakers in the above case) from a mixed signal input (i.e. from microphones).

ICA thus, involves recovering independent sources of signals from only the sensor

observations, which are the unknown weighted linear mixtures of the signals from

independent sources. Thus, ICA searches for a linear mixing matrix that, when

multiplied by the original multivariate mixed data, gives statistically independent

components of the data.

Definition of independent components. Two variables are said to be

statistically independent when no variable has information about the other

variable. The joint probability of two independent variables is the product of

the marginal probabilities of the two variables. Thus, two random variables y_1 and y_2 are considered to be statistically independent when p(y_1, y_2) = p(y_1) p(y_2). For

n such variables, the joint density must be a product of n marginal densities of the

variables. Independent variables are also uncorrelated in nature. Also, independent

variables are as non-Gaussian as possible. The reason behind this is that the

Gaussian distribution is symmetric in nature, which makes it impossible to find the

direction of the components of the mixing matrix.

Now the obvious question arises is what are the different ways of measuring

independence.









Measuring independence. Non-Gaussianity and, thus, the independence of a

variable, can be measured by using an information theory metric called negative

entropy, also called negentropy. Previously we stated that the differential entropy of a random variable x is given by the formula

h(x) = -\int p(x) \log p(x) \, dx.   (2.28)


Gaussian variables have the largest entropy among all variables of equal variance.

On the other hand, the variables that have "spikes" in their probability distribution

functions (less Gaussian) have less entropy. Based on this observation, negentropy

is defined as


J(y) = H(y_{gauss}) - H(y).   (2.29)

Thus, the negentropy is always positive, except for Gaussian variables, for which the negentropy is zero.

The FASTICA algorithm. We have used the FASTICA package (available at http://www.cis.hut.fi/projects/ica/fastica/, accessed March 2004) [ ] to find the ICA of images. This package implements the FASTICA algorithm, which constructs a neural network whose weights are updated based on a learning rule. Roughly, the algorithm is as follows:
1. Negentropy is used as a measure of the non-gaussianity.
2. The cost function is an approximation of the negentropy.
3. Initially random weights are assigned at the starting point.
4. Then the value of the weights in the subsequent iteration is estimated by the
Newton-Raphson method.
5. The above step is repeated until convergence.

Assuming that the images are a mixture of signals from statistically

independent sources, we can use ICA to extract features from the images.
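The thesis uses the FASTICA MATLAB package; as a rough modern stand-in (our assumption, not what was actually run), scikit-learn's FastICA can extract independent components from image patches. The 8x8 patch setup below is hypothetical.

import numpy as np
from sklearn.decomposition import FastICA

# Treat each 8x8 patch of an image as one observation (a hypothetical setup).
rng = np.random.default_rng(0)
image = rng.random((64, 64))
patches = np.array([image[i:i + 8, j:j + 8].ravel()
                    for i in range(0, 64, 8) for j in range(0, 64, 8)])

ica = FastICA(n_components=16, random_state=0)
features = ica.fit_transform(patches)    # independent components per patch
print(features.shape)                    # (64, 16)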















CHAPTER 3
THEORY OF MINIMUM SPANNING TREE ENTROPIES

In this chapter, we consider minimum spanning trees for registration.

3.1 Entropy Estimation for Registration

In the previous section it was shown that mutual information requires the estimation of entropy, which is calculated by estimating the probability density of the underlying data. For image registration, it might be necessary to estimate continuous instead of discrete densities. There are two broad approaches to estimate the

probability density: parametric methods and non-parametric methods. Both

methods require the estimation of parameters. However, non-parametric methods

estimate only the width parameters like the bin size in histograms, window size in

Parzen window density estimation described in [ ], etc., while parametric methods

estimate more complicated parameters which describe the underlying density such

as mean, variance etc. An overview of methods for density estimation can be found

in the book by Christopher Bishop [ ]. To elaborate a little bit:
Parametric methods: assume a density model for the underlying distribution
of the data. These methods then attempt to find the parameters of the particular
model which fit optimally to the data. One of the drawbacks of this approach
is that the assumption of the underlying density model might not be true. The
common parametric density estimation techniques are Hidden Markov
Models (HMMs), Bayesian networks etc. The HMMs model the underlying
distribution by having different states and the transitions from one state to
another due to the underlying probability and bias, as described in [ ]. Also,
Bayesian networks model a causal system in which the occurrence of
an event at one node can trigger several changes in the connecting nodes, as
described in depth in [ ]. Both the above mentioned approaches have to
model complex graphs, which are defined by a large set of parameters.
Non-parametric methods: These methods make no assumption about the
density model which underlies the data, and let the data decide the nature
of its distribution. Histogram methods and Parzen windows are examples of
such methods. Such methods require estimation of the width parameter
or smoothing factor, which is the number of bins in histograms or the Parzen
window size "σ", which determines how well the density estimation "fits" the
data.
Semi-parametric methods: Bishop [ ] advances a class of density estimation
that tries to combine the best features of parametric and non-parametric
methods. Techniques such as mixture models fall in this category. These
techniques assume Gaussian component densities of the data and thus have to
estimate Gaussian-like parameters.

Since both the parametric and semi-parametric methods assume a specific

underlying distribution, they have to estimate several distribution specific

parameters such as the mean, variance and covariance (\mu, \sigma, \Sigma). In contrast,

non-parametric methods estimate a smoothing parameter which is usually some

kind of a width parameter. Image registration requires determining the pose

parameters in a particular transformation space. If parametric or semi-parametric

methods are used for density estimation in registration, then the density model

parameters need to be estimated apart from the pose parameters. In contrast, non-

parametric methods require the estimation of width (smoothing) parameters and

the pose parameters only. Thus, non-parametric density estimators are generally

preferred over parametric or semi-parametric methods for image registration.

Traditional entropy estimation, either discrete or continuous, requires a priori density estimation. However, the approach taken in this study avoids the density estimation step entirely. The Renyi entropy is estimated directly using the samples from the image, via the relationship between the asymptotic length of the minimum spanning tree and the Renyi entropy. The Renyi entropy and its properties are discussed in Section (2.1.4). The following section discusses the chronology of the development of the theory underlying entropy estimation by a minimum spanning tree.

3.2 History

In the late 50's, an interesting viewpoint for estimating entropy using graph

algorithms came into existence. It began in 1959 when Beardwood, Halton and









Hammersley [ ] first analyzed the travelling salesman problem (TSP) and stated that the TSP length of a set of multi-dimensional feature vectors is related to the entropy. Later, in 1988, J. M. Steele [ ] followed their work and stated that if a fully connected graph is constructed from a multi-dimensional feature space such that the weight of an edge between any two points is the weighted Euclidean distance between the points, then the asymptotic length of a minimum spanning tree constructed from such a graph is related to the underlying probability density. Later, the group at the University of Michigan led by Dr. Alfred Hero showed how the asymptotic length of the minimum spanning tree is directly proportional to the Renyi entropy (refer to Section 2.1.4) [ ]. They further pointed out how this relationship between entropy and minimum spanning trees can be expanded to a variety of graphs like k-nearest neighbor graphs or Steiner trees [ ]. The practical applications of such graphs are emphasized in [ ], with image registration being one such application.

Minimum spanning trees and Renyi entropy were initially discussed in Section

(2.3) and (2.1.4) respectively. Some relevant portions are reiterated here as

and when needed. Suppose there are n independent multivariate points in a d-dimensional space, x_i \in R^d, and that there is a graph on these points with vertices V = \{x_1, x_2, \ldots, x_n\} and edges E = \{(x_i, x_j); 1 \le i < j \le n\}, where each edge e in the edge set E carries a weight \gamma(|e|). The task of finding a minimum spanning tree involves finding the tree with the minimum weight among all the trees which span all the vertices in the set V:

M(x_1, x_2, \ldots, x_n) = \min_T \sum_{e \in T} \gamma(|e|),   (3.1)

where T ranges over all spanning trees whose vertex set includes all points in the space R^d and \gamma is the weighting function that associates weights with the edges of the









minimum spanning tree, which can be defined as

\gamma(x) = x^\gamma, where 0 < \gamma < d.   (3.2)

Let the points in the feature space R^d, d \ge 2, have an independent random distribution \mu with the \gamma function satisfying the criteria \gamma(x) \sim x^\gamma and 0 < \gamma < d, and let f be the density of the continuous part of \mu with c(\gamma, d) a positive constant depending upon \gamma and d. Then Steele [ ] proved the relationship between the length of a minimum spanning tree and a power of the density function f as

\lim_{n \to \infty} n^{-(d-\gamma)/d} M(X_1, X_2, \ldots, X_n) = c(\gamma, d) \int f(x)^{(d-\gamma)/d} \, dx.   (3.3)

The next section shows how equation (3.3), after a little algebraic manipulation, presents a clear link between the minimum spanning tree and the

Renyi entropy.

3.3 Minimum Spanning Trees for Entropy Estimation

We discussed the alpha Renyi entropy, or simply the Renyi entropy, in Section (2.1.4) and represented it mathematically in Equation (2.18) as

h_\alpha(X) = \frac{1}{1 - \alpha} \log \int f^\alpha(x) \, dx.   (3.4)

The relationship between the density function and the length of the minimum spanning tree is specified in Equation (3.3). If we rename the length of the minimum spanning tree in Equation (3.3) as

M(X_1, X_2, \ldots, X_n) = L_\gamma(X_n) = \min_T \sum_{e \in T} |e|^\gamma,   (3.5)

where |e| = \|x_i - x_j\| denotes the Euclidean distance between points i and j in a d-dimensional feature space, we can rewrite Equation (3.3) as

\lim_{n \to \infty} n^{-(d-\gamma)/d} L_\gamma(X_n) = c(\gamma, d) \int f(x)^{(d-\gamma)/d} \, dx.   (3.6)









Now taking the log on both sides and substituting \alpha = (d - \gamma)/d, we obtain

\log \lim_{n \to \infty} n^{-\alpha} L_\gamma(X_n) = \log(c(\alpha, d)) + \log\left(\int f^\alpha(x) \, dx\right).   (3.7)

Note that the constant c, which was a function of \gamma and d, now becomes a function of \alpha and d due to the substitution of \alpha. Dividing the above equation by 1 - \alpha on each side yields

\frac{1}{1 - \alpha} \log\left(\int f^\alpha(x) \, dx\right) = \frac{1}{1 - \alpha} \left(\log \lim_{n \to \infty} n^{-\alpha} L_\gamma(X_n) - \log(c(\alpha, d))\right).   (3.8)

From Equation (2.18), the left hand side of the preceding equation can be reduced to

h_\alpha(x) = \frac{1}{1 - \alpha} \left(\log \lim_{n \to \infty} n^{-\alpha} L_\gamma(X_n) - \log(c(\alpha, d))\right).   (3.9)

The short derivation above shows how the minimum spanning tree length

converges to the Renyi entropy. The above Equation (3.9) is the fundamental

result from which we estimate the entropy approximately in the thesis. It is the

approximate estimation of the entropy as the constant c (which depends on a and

the dimensions d) is unknown and cannot be computed analytically. The strength

of Equation (3.9) lies in the fact that the length of the minimum spanning trees can

be easily computed in polynomial time specified by O(|E| log |E|) (refer [ ]).

Observe that Equation (3.9) is a very clean equation which doesn't require the

estimation of any smoothing parameter which has to be chosen carefully, e.g., the number of bins in the histogram or the "sigma" in the Parzen window estimation of entropy. Of course, we have to compute the length of the minimum spanning tree, which is straightforward. For the number of bins in the histogram and the Parzen window width estimator, on the other hand, there is no explicit criterion for selecting the smoothing (width) parameter.
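A minimal Python sketch of the estimator in Equation (3.9), using scipy's minimum spanning tree routine and simply dropping the unknown constant c(alpha, d) — so the returned value is the entropy only up to an additive constant — is given below; the sample data is arbitrary.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

def mst_renyi_entropy(X, gamma=1.0):
    # X: (n, d) samples. Returns the estimate of h_alpha from Equation (3.9)
    # with alpha = (d - gamma)/d, up to the additive term involving c(alpha, d).
    n, d = X.shape
    alpha = (d - gamma) / d
    W = squareform(pdist(X) ** gamma)            # power-weighted edge lengths
    L = minimum_spanning_tree(W).sum()           # MST length L_gamma
    return np.log(L / n ** alpha) / (1.0 - alpha)

X = np.random.default_rng(0).random((500, 2))
print(mst_renyi_entropy(X))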

The explanation of the rate of convergence of MST entropy follows. The rate

of convergence of the algorithm depends mostly on the number of samples that we









choose to compute the MST. A very nice paper by Alfred Hero et al. ("Convergence rates of minimal graphs with random vertices" by A. O. Hero, J. A. Costa and B. Ma, submitted to IEEE Transactions on Information Theory in March 2003) formulated the rates of convergence of the minimum spanning tree for piece-wise linear densities and then extended the proposition to a more general class of densities. We next discuss a proposition for the rate of convergence of minimum spanning trees. We assume that the number of dimensions d \ge 2, that 1 \le \gamma \le d - 1, and that X_1, X_2, \ldots, X_n are independent sample points in the space [0, 1]^d with density f with support S \subseteq [0, 1]^d.

Proposition: Marginal block density. The density f has md levels. For the

quasi-additive Euclidean functional L, of order 7, the following bound exists,


E[L,(XI, X2, ., X,)]/n( c(Q, d) f(d -7 d(x)dx < O((nm- d)-1d).

(3.10)

In Equation (3.10), the term L_\gamma denotes the length of the minimum spanning tree whose edge weights are the Euclidean distances between the points raised to the power \gamma, as described in Equation (3.5).

Equation (3.10) means that the difference between the expected length of the minimum spanning tree normalized by n^{(d-\gamma)/d} and the product of the constant and the integral of the density function is asymptotically less than (n m^{-d})^{-\gamma/d}. For example, if d = 2 and \gamma = 1, the upper bound on this difference, which is the error, is O((n m^{-2})^{-1/2}). That is, as the number of samples n increases, the bound becomes tighter, so the error reduces as the number of samples increases. Observe from Equation (3.10) that, as the dimensionality of the feature space increases, the rate of convergence decreases, and thus more samples are needed for convergence.

The preceding proposition is important in the context of image registration, as images have a fixed number of intensity levels and can thus be considered equivalent to the piecewise density considered above, with a total of m^d levels in the d-dimensional space. Interested readers should refer to the same paper ("Convergence rates of minimal graphs with random vertices", submitted to IEEE Transactions on Information Theory in March 2003) for a generalization of the above result to a more general class of densities.

The paper by Hero et al. also mentions that the aforementioned convergence criteria can be applied to other graph constructions such as the Steiner tree, the minimal matching graph and the TSP tour, along with their power-weighted variants. However, the minimum spanning tree is chosen based on the paper [ ], which recommends the MST as the best method to estimate entropy due to its polynomial runtime. There exist many well known algorithms for computing the MST, which supports the use of the minimum spanning tree as the entropy estimator.

3.4 Measure of Registration

Since Viola and Wells [28] and Collignon et al. [6] successfully used the Shannon mutual information to perform registration, it has been used as a measure of registration in many systems. In this study, the minimum spanning trees described in the preceding section are used to estimate the approximate Renyi entropy of the random variables. This entropy is then used to compute the measure of registration.

The definition of the Renyi mutual information was first formulated by Alfred Renyi in the book [23] as

I_\alpha(X) = \frac{1}{\alpha - 1} \log \int_{\mathbb{R}^N} f_X(x) \left( \frac{f_X(x)}{\prod_{i=1}^{N} f_{X_i}(x_i)} \right)^{\alpha - 1} dx.   (3.11)









Unfortunately, the Renyi mutual information cannot be easily computed. Thus, instead of the Renyi mutual information, we use a similar measure which is simply the sum of the marginal entropies minus the joint entropy, analogous to the Shannon mutual information. This measure was proposed by Hild et al. in the paper [14] as

\hat{I}_\alpha(X_1, \ldots, X_N) = \sum_{i=1}^{N} h_\alpha(X_i) - h_\alpha(X_1, \ldots, X_N).   (3.12)

In the case of two images, the above measure is the sum of the two Renyi marginal entropies minus the Renyi joint entropy:

h_\alpha(X_1) + h_\alpha(X_2) - h_\alpha(X_1, X_2) = \frac{1}{1-\alpha} \left( \log \frac{L_1}{n^\alpha} + \log \frac{L_2}{n^\alpha} - \log \frac{L_{12}}{n^\alpha} \right),   (3.13)

where L_1, L_2 and L_{12} denote the lengths of the minimum spanning trees constructed from the power-weighted graphs of the two marginal sample sets and the joint sample set respectively, each computed as L_\gamma = \min_T \sum_{e \in T} \|e\|^\gamma as in Equation (3.5).

The preceding measure is very similar to the Renyi mutual information: both are nonnegative and both tend to zero when the joint probability equals the product of the marginal probabilities, i.e., when the variables are independent, as mentioned in the paper [14]. However, the main reason the measure is preferred over the actual Renyi mutual information is that it can be calculated using the MST entropy estimator. Thus, our algorithm maximizes this measure to perform registration.
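As an illustrative sketch (again Python, building on the renyi_entropy_mst function given after Equation (3.9); the names are assumptions, not the thesis's Matlab code), the measure of Equation (3.13) is computed from three MST lengths:

import numpy as np

def registration_measure(feat1, feat2, alpha):
    # feat1, feat2: n x d feature matrices sampled from the overlap area
    joint = np.hstack([feat1, feat2])           # n x 2d joint feature space
    return (renyi_entropy_mst(feat1, alpha)
            + renyi_entropy_mst(feat2, alpha)
            - renyi_entropy_mst(joint, alpha))

Since the joint space has dimension 2d, the dropped constants log c(\alpha, d) and log c(\alpha, 2d) do not cancel exactly; the measure is therefore useful for ranking candidate transforms rather than as an absolute mutual information value.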

Also, maximizing the mutual information is similar to minimizing the joint entropy of the images when the search space is narrow. This is because the marginal entropies remain more or less constant in a narrow search space. Referring to Equation (3.12), we see that in a narrow space, since the marginal entropies are roughly constant, maximizing the measure maximizes the negative of the joint entropy; in other words, the joint entropy is minimized. In some cases (especially when the results are based on one parameter, such as rotation), we have opted to minimize the joint entropy instead of maximizing the mutual information.















CHAPTER 4
ALGORITHM DESIGN

This chapter first describes the overall design of a generalized registration technique, followed by the design decisions made at each step of this algorithm, a brief description of the algorithm, and the implementation details.

4.1 General Registration Algorithm

The process of image registration can be represented procedurally as shown in Figure 4-1. The first step preprocesses the images so that they become appropriate for registration. Preprocessing involves operations on the images such as noise removal, smoothing and cropping. Human intervention may be needed depending on the type of images considered for registration. The next step is a quick, coarse estimation of the transformation parameters obtained by scanning the search space at low resolution. Finally, once we have the coarse transformation parameters, the region around them is scanned in order to get an accurate estimate. This multi-resolution approach speeds up the registration process. Alternatively, if the time requirements for estimating the measure (discussed in Section 3.4) are less stringent, the initial rough estimation step is omitted altogether and the search space is scanned at fine resolution.

Figure 4-1: Block diagram for image registration






































Figure 4-2: Flow chart for estimating the parameters of the transform

The flow chart for the block used to find the optimal transformation (either in the ballpark estimation or in the fine tuning) is shown in Figure 4-2.

We apply the transformation to only one of the two images (the transforming image); the other image is unchanged (the reference image). At each step of the algorithm, the entropic measure between the two images is calculated. If the measure has reached its maximum (or minimum, depending on its nature) in the space, the parameters of the transform are declared the optimal parameters. If not, the next transform to be applied to the transforming image is computed and applied, and the steps are repeated until the optimum value of the measure is found in the search space.

4.2 Design of the Algorithm

Design of a registration algorithm involves a sequence of decisions that need to be taken at each step, as discussed in this section. What becomes obvious from the previous discussion is that the crux of this algorithm is the estimation of the (continuous) Renyi entropy. This estimation can be done using the non-parametric, minimum-spanning-tree-based Renyi measure of Equation (3.8), which is expressed as

\frac{1}{1-\alpha} \log \int f^\alpha(x)\, dx = \frac{1}{1-\alpha} \left( \log \lim_{n \to \infty} n^{-\alpha} L_\gamma(\mathcal{X}_n) - \log c(\alpha, d) \right).   (4.1)

From this equation, the estimate of the entropy is approximate because of the presence of the constant term c, which depends only on \alpha and the dimension d. The first step, however, is estimating the parameters by performing a rough image registration using histograms.

4.2.1 Preprocessing by Histograms

The main reason for using histograms is that they are computationally efficient, which allows us to get a ballpark estimate very quickly. The entropies (both joint and marginal) are estimated from the overlap area of the two images. The reason for estimating the joint entropy from the overlap area is obvious. However, we also estimate the marginal entropies from the overlap area because, as the images are transformed, the marginal entropy of the overlap area also changes with every transformation.
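A compact sketch of this estimation (Python with NumPy for illustration; the thesis code is in Matlab) computes the joint histogram of the overlap region once and derives both marginals from it:

import numpy as np

def entropies_from_overlap(overlap1, overlap2, num_bins=16):
    # overlap1, overlap2: equal-size arrays of intensities from the overlap area
    joint, _, _ = np.histogram2d(overlap1.ravel(), overlap2.ravel(), bins=num_bins)
    p12 = joint / joint.sum()
    p1, p2 = p12.sum(axis=1), p12.sum(axis=0)    # marginals of the overlap

    def shannon(p):
        p = p[p > 0]                              # ignore empty bins
        return -np.sum(p * np.log(p))

    return shannon(p1), shannon(p2), shannon(p12.ravel())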

The important parameter which decides how well the histogram performs is the bin size, which is chosen based on statistical analysis. Figure 4-3 shows empirical results of the histogram analysis as the number of bins is varied.






















Figure 4-3: Effect on registration of varying the number of histogram bins (the numbers of bins are 8, 16 and 32 from top left to bottom right; the maximum should be present at 0).









Table 4-1: The ICA values

-0.2845 0.2875 -0.1523 0.1468
0.3425 -0.5310 -0.4016 0.5821
0.4048 0.0869 -0.4269 -0.0605
0.0597 0.0197 0.0251 0.0777


As we can see, when the number of bins is small (the bin size is large), the histogram is noisy and there are many peaks around the desired maximum. However, if the number of bins increases too far, the peak disappears as the data is over-fitted.

After the histogram analysis, we extract features from the images and use these feature images to perform registration, as follows.

4.2.2 Features

Image features are extracted based on a local neighborhood of each pixel, called tags [1]. The basic idea of tag-based methods is to account for the local topology surrounding each pixel. Around each pixel in the image, an n x n matrix is extracted and vectorized to obtain a vector of length n^2 for every pixel. This is equivalent to quantizing the image so that it reflects the local topology, due to which the extracted features become more invariant to changes in individual pixel intensities.
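A sketch of the tag extraction (Python for illustration; only interior pixels are handled, and the sliding-window loop is kept deliberately simple):

import numpy as np

def extract_tags(img, n):
    # Vectorize the n x n neighborhood of every pixel into a row of length n^2
    H, W = img.shape
    return np.asarray([img[r:r + n, c:c + n].ravel()
                       for r in range(H - n + 1)
                       for c in range(W - n + 1)], dtype=float)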

Once the tags are extracted from the image, the next decision is how to extract features from them. We opted for statistics-based features, since they extract features based on the information content of the image. As mentioned in Section 2.6, the ICA algorithm attempts to find statistically independent components of the image. We have used the FastICA [16] algorithm to compute the ICA of the images.
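Sketched in Python, using scikit-learn's FastICA in place of the original FastICA package (an illustrative substitution, with extract_tags as sketched above):

from sklearn.decomposition import FastICA

def ica_feature_images(img, block_size, n_components):
    tags = extract_tags(img, block_size)              # pixels x block_size^2
    ica = FastICA(n_components=n_components, random_state=0)
    # Each column is one ICA feature; reshaping a column back to the
    # (cropped) image grid gives one feature image like those in Figure 4-4.
    return ica.fit_transform(tags)                    # pixels x n_components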

Figure 4-4 shows the feature images as computed by ICA. The actual values of the ICA components can be seen in Table 4-1.


























Figure 4-4: ICA feature images
























Figure 4-5: The faces used to compare registration for different sizes of ICA blocks



Figure 4-6: Variations in MI vs. the rotation angle for various ICA feature dimensions (1. 4 features, 2. 16 features, 3. 36 features); the maximum is located at 0.



The first three rows of Table 4-1 are responsible for the edge images of the original image, while the fourth ICA component simply smooths the image.

One question here is how many ICA features to extract, i.e., what the dimensionality of the feature space should be. There are several interesting observations in this case. Figure 4-6 plots the mutual information vs. the rotation angle for various sizes of ICA blocks; the images being registered are shown in Figure 4-5. From Figure 4-6 we can see that the registration worsens if we pick too high a feature dimension.











Figure 4-7: Variations in MI vs. the rotation angle for various ICA feature dimensions (1. 4 features, 2. 16 features, 3. 36 features).



Another interesting observation is seen when we pick the faces shown in Figure 4-30: we then get the MI vs. rotation plots shown in Figure 4-7. From Figure 4-7, we see that an acceptable registration result is obtained when we use 16 ICA features. Thus the number of ICA features to use depends heavily on the type of image being registered.

Once features are extracted from the image, the next step is to fine tune the

registration by estimating the entropy using the minimum spanning trees.

4.2.3 Search Strategy

Our prototype MST matching algorithm uses a brute-force search that scans the entire transformation space to perform registration. One of the reasons for this choice is the way the MST entropies behave in the search space. The behavior is shown in Figure 4-8, from which we can see that the MST entropy is not smooth and has several local maxima. Thus, standard search strategies such as gradient descent cannot be used, as they rely on a smooth objective function.












































Figure 4-8: Behavior of the MST entropy in 1D (entropy vs. rotation angle) and 2D (entropy vs. rotation and scale) search spaces.









Thus, rather than be overwhelmed by the design of an efficient search algorithm for finding the global maximum of a non-smooth objective function, we used a brute-force search, which is guaranteed to find the global maximum over the sampled grid.
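A sketch of the brute-force scan (Python for illustration; transform_and_sample, which warps the second image, finds the overlap and draws a fixed number of feature samples, is an assumed helper and not shown):

import itertools
import numpy as np

def brute_force_search(transform_and_sample, grids, alpha=0.5):
    # grids: one array of candidate values per affine parameter
    best_measure, best_params = -np.inf, None
    for params in itertools.product(*grids):
        feat1, feat2 = transform_and_sample(params)
        m = registration_measure(feat1, feat2, alpha)
        if m > best_measure:
            best_measure, best_params = m, params
    return best_params, best_measure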

4.2.4 Minimum Spanning Tree Algorithm

This is the final step of the registration. Ideally, we would use the entire overlap area of the two images to compute the minimum spanning tree and then estimate the entropy. From Equation (3.8), we see that the entropy estimate is sensitive to the number of samples used to compute the minimum spanning tree, and thus we need to keep this number constant during each computation. So we choose a fixed number of sample points randomly from the overlap area of the images and use only those to estimate the entropy.

Minimum spanning tree computation. Assume that we pick n sample points each time to compute the minimum spanning tree. We have points in some d-dimensional space, where the weight of the edge between two points is the Euclidean distance between them. Initially we assume that there is an edge between each point and all the other (n - 1) points, i.e., that the graph is fully connected.

In such a scenario we have n points and O(n^2) edges in the Euclidean space, so the classical MST algorithm has complexity of order O(n^2 log n). Even though this is polynomial, the search time increases tremendously as n increases. One of the papers [ ] resolves this issue by reducing the number of edges. This is done by placing a sphere around each point in the d-dimensional hyperspace; only the edges connecting the given point to the neighbors that lie within its sphere are retained. The radius of the sphere is calculated from the following formula:

radius_p = mean_p - \frac{max - mean_p}{QFactor} \cdot RFactor   (4.2)









where radius_p denotes the radius at point p, mean_p denotes the mean of the distances that connect point p to all other points, max denotes the maximum distance between any two points in the d-dimensional space, QFactor denotes the quantization factor, and RFactor denotes the factor which decides the magnitude of the radius.

We varied RFactor over a set of values and chose the one which gave the radius with the fewest edges and minimum error; the plots are shown in Figure 4-9. As a result, every point has a sphere of a different radius, which is determined primarily by how close the point is to the other points in the hyperspace.
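A sketch of the pruning (Python for illustration; note that my reading of the garbled radius formula in Equation (4.2) is itself an assumption, as is the rule, chosen here, that an edge must lie within both endpoints' spheres):

import numpy as np
from scipy.spatial.distance import pdist, squareform

def truncate_edges(samples, q_factor, r_factor):
    dist = squareform(pdist(samples))                # dense Euclidean distances
    mean_p = dist.sum(axis=1) / (len(samples) - 1)   # mean distance per point
    radius = mean_p - (dist.max() - mean_p) / q_factor * r_factor  # Eq. (4.2)
    keep = dist <= np.minimum(radius[:, None], radius[None, :])
    return np.where(keep, dist, 0.0)                 # zero weight = edge removed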

Deciding the number of samples. Figures 4-10 and 4-11 illustrate that when the value of \alpha is kept at about 0.55, 500 samples seem sufficient for registration; however, a nicer plot with an explicit maximum is obtained at about 1100 samples. Similar plots for \alpha = 0.85 are shown in Figures 4-12 and 4-13, which illustrate that a very good result is obtained at 1100 samples. Evidently, more samples are better for accuracy; fewer samples are picked when speed is desired.

The minimum spanning tree was computed using the Kruskal's algorithm implementation provided by the Boost Graph Library (available at http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). The pseudo-code for this implementation can be found in the Appendix.

4.3 Implementation Details

4.3.1 Hardware and Software Details
CPU: Pentium IV, 2.8 GHz, 2 GB RAM.

OS: Linux.

Languages: Matlab and C++.













All the code was written in Matlab. The program used the Kruskal algorithm provided by the Boost Graph Library (available at http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004), which is written in C++. The interface between the Matlab and C++ code uses the Matlab MEX function interface.


Figure 4-9: Plots showing the error between the actual MST length and the truncated MST length over various radius factors (RFactor) and over rotations of -10, 0 and 10 degrees.



















Figure 4-10: Variations in joint entropy vs. the rotation angle with a varying number of samples picked from the overlap area, for alpha = 0.55 (1. 100 samples, 2. 300 samples, 3. 500 samples, 4. 700 samples).

Figure 4-11: Variations in joint entropy vs. the rotation angle with a varying number of samples picked from the overlap area, for alpha = 0.55 (1. 900 samples, 2. 1100 samples).


















Figure 4-12: Variations in joint entropy vs. the rotation angle with a varying number of samples picked from the overlap area, for alpha = 0.85 (1. 100 samples, 2. 300 samples, 3. 500 samples, 4. 700 samples).

Figure 4-13: Variations in joint entropy vs. the rotation angle with a varying number of samples picked from the overlap area, for alpha = 0.85 (1. 900 samples, 2. 1100 samples).









Table 4-2: Free parameter table

Parameter       Meaning                                      Value            Reason
alpha           The alpha in the Renyi entropy               0.5              The value suggested by Hero et al. in paper [ ].
BlockSize       Size of the block around each pixel,         2 or 4           Based on the analysis in Section 4.2.2 and Figures 4-6 and 4-7.
                i.e., the number of ICA features
radius          Radius of the sphere around each point in    5                Based on the analysis shown in Figure 4-9.
                feature space, used to truncate edges
numBins         Number of bins for the histogram search      16               Based on the analysis shown in Figure 4-3.
numberSamples   Number of random samples used to compute     500 or 1000      Picked based on whether speed or accuracy is
                the Renyi entropy                                             desired; see Section 4.2.4.
searchSpace     Affine parameters over which to search       rotations        Based on the time required to cover the space.
                                                             [-20, 20],
                                                             scale [-0.5, 0.5],
                                                             shear [-0.5, 0.5],
                                                             translations [-5, 5]
                                                             in each direction


Time: the entire algorithm, for 64 x 64 images with histogram preprocessing followed by final registration using the MST entropy with 1000 samples, takes approximately 8 hours to run.

4.3.2 Free Parameters

The free parameters are tabulated in Table 4-2.

4.4 Results

This section summarizes the different results obtained from the registration algorithm and shows that registration with MST-based entropy estimation produces acceptable results. All the face images are from the Yale face database (available at http://cvc.yale.edu/projects/yalefaces/yalefaces.html, accessed March 2004).

4.4.1 Same Image Registered with the Same Image

This result shows how the MST changes as the images become registered. As the two images considered here are the same image, the visual change in the MST from the unregistered case to the perfectly registered case is very apparent.

The parameter values for the following tests are:

alpha = 0.5
Samples = 500
Number of bins of the histogram = 16
Block size around each pixel = 2 x 2

Figure 4-14 displays the original unregistered images and their overlap images. Figure 4-15 shows the minimum spanning tree of the two unregistered images; the feature space in this representation is the intensity of both images, for the purpose of visualization. Figure 4-16 shows how the images are registered.

The MST of the two registered images is shown in Figure 4-17. Observe how nicely the points collapse onto one line as the images become registered.

4.4.2 Registration of Same Person with Different Expressions

In this section, we present the registration results for images of the same person with different facial expressions. The parameters chosen are:

alpha = 0.5
Samples = 500
Number of bins of the histogram = 16
Block size around each pixel = 2 x 2













Figure 4-14: From top left to bottom right: 1. face image, 2. the same face with an applied transformation, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.

Figure 4-15: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features.













Figure 4-16: From top left to bottom right: 1. face image, 2. the face obtained after applying the optimal transformation to the unregistered image shown in Figure 4-14, 3. overlap of the two registered images, 4. edge image of the overlapped registered images.

Figure 4-17: Minimum spanning tree of the registered images, considering only the intensities of the two images as features.


Figure 4-18: From top left to bottom right: 1. face image, 2. unregistered face image of the same person with glasses, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.


The images before registration are shown in Figure 4-18 and the MST for the unregistered case is in Figure 4-19. After performing the registration, the results are shown in Figure 4-20 and the MST for the registered case is in Figure 4-21.

Looking at the images of the minimum spanning trees, we see that when the two images are unregistered the points are scattered all over the place. When the images are registered, however, a cluster of points forms along the 45-degree line in the MST space, reducing the length of the MST.

4.4.3 Yet Another Registration of Same Person with Different Expressions

In this section, we show the registration results for another pair of images of the same person with different expressions. The parameters chosen are:

alpha = 0.5
Samples = 500










Figure 4-19: Minimum spanning tree of the unregistered images, considering only the intensities of the two images as features.


Figure 4-20: From top left to bottom right: 1. face image, 2. registered face image of the same person with glasses, 3. overlap of the two registered images, 4. edge image of the overlapped registered images.










Figure 4-21: Minimum spanning tree of the registered images, considering only the intensities of the two images as features.



Number of bins of the histogram = 16
Block size around each pixel = 2 x 2

The images before registration are shown in Figure 4-22. After performing the registration, the results are shown in Figure 4-23.

From the previous two results with images of the same person and different expressions, we can see that the algorithm successfully performs registration. An important point to note here is that the image size was 64 x 49 (3136 pixels), out of which only 500 samples were used to estimate the entropy with the MST, unlike histograms, which need all the samples in the area in order to estimate the entropy. Plots of the convergence of the Renyi alpha entropy versus the number of samples are shown later in this section.

4.4.4 Registration Result of Images of two Different People

We next discuss registration results for images of two different people. The parameters chosen are:

alpha = 0.5











Figure 4-22: From top left to bottom right: 1. face image of a happy person, 2. unregistered face image of the same person with a sad expression, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.

Figure 4-23: From top left to bottom right: 1. face image, 2. registered face image of the same person with a sad expression, 3. overlap of the two registered images, 4. edge image of the overlapped registered images.










Figure 4-24: From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.


Samples = 500
Number of bins of the histogram = 16
Block size around each pixel = 4 x 4

The initial state of the unregistered images is shown in Figure 4-24. After performing the registration, the results are shown in Figure 4-25.

The above result shows that, although the algorithm does register the two images, the result is not accurate for dissimilar images, and there is definitely scope for improvement. In the next result, therefore, I changed some parameters and obtained some improvement.

4.4.5 Yet Another Registration Result of Images of Two Different People

Next are the registration results for yet another pair of images of two different people. The parameters chosen are:

alpha = 0.9999
Samples = 1000


Figure 4-25: From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two registered images, 4. edge image of the overlapped registered images.


Number of bins of the histogram = 16
Block size around each pixel = 4 x 4

I also refined the search range for scale to perform an even finer search. The initial state of the unregistered images is shown in Figure 4-26. After performing the registration, the results are shown in Figure 4-27. The above result is a trifle better than the previous one.

4.4.6 Comparison with Histogram Based Method

For this comparison, we selected two pairs of images of two different people. In one case, we registered the faces considered in Section 4.4.5 over all six parameters, namely the two rotations, scale, shear, and the two translations. The initial state of the unregistered images is shown in Figure 4-28. After performing the registration, the results are shown in Figure 4-29.












Figure 4-26: From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.

Figure 4-27: From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two registered images, 4. edge image of the overlapped registered images.











Figure 4-28: From top left to bottom right: 1. a face image, 2. face image of a different person, 3. overlap of the two unregistered images, 4. edge image of the overlapped unregistered images.

Figure 4-29: From top left to bottom right: 1. face image, 2. registered face image of a different person, 3. overlap of the two images registered using the histogram method, 4. edge image of the overlapped registered images.









Figure 4-30: From left to right: 1. face image of a person with light from the left, 2. rotated face image of the same person with light from the right.

Figure 4-31: Image 2 as registered by the histogram method.


We see from Figure 4-29 that the histogram method performs as well as the MST when the intensity differences between the two images are not that pronounced.

We then performed the comparison on two images in which the lighting was from the left side in one image and from the right side in the other, and compared the results over rotation only. The images considered are shown in Figure 4-30. The second image is rotated by 5 degrees, so a perfect registration finds the answer at -5 degrees. The results with the histogram are shown in Figure 4-31, and the plot of the MI vs. the rotation is shown in Figure 4-32. Notice how the maximum MI occurs at 9 degrees instead of -5 degrees, which is a clear error.

The results with the MST are shown in Figure 4-33, and the plot of the MI vs. the rotation is shown in Figure 4-34. Notice how the maximum MI occurs at -5 degrees, as expected.
















Figure 4-32: Plot of MI vs. rotation angle when registered by the histogram method; the maximum should be at -5 degrees.

Figure 4-33: Image 2 as registered by the MST method.

Figure 4-34: Plot of MI vs. rotation angle as registered by the MST; the correct answer is at -5 degrees.












Figure 4-35: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.4, 2. alpha = 0.45, 3. alpha = 0.5, 4. alpha = 0.55.



Thus we conclude that, in images with large intensity variations, registration using a minimum spanning tree definitely outperforms registration by histogram.

4.4.7 Convergence of Renyi Entropy with Number of Samples as Alpha Changes

Recall from the discussion in the background and theory chapters that we need to estimate the alpha Renyi entropy, which converges to the Shannon differential entropy as alpha approaches 1. The question here is what value of alpha to choose so that the estimated Renyi entropy converges quickly, i.e., with fewer samples. This result documents the plots of the Renyi entropy computed using the minimum spanning trees vs. the number of samples as we vary alpha. The results are shown in Figures 4-35, 4-36 and 4-37.
































Figure 4-36: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.6, 2. alpha = 0.65, 3. alpha = 0.7, 4. alpha = 0.75.











































Figure 4-37: From top left to bottom right, MST Renyi joint entropy vs. the number of samples for 1. alpha = 0.8, 2. alpha = 0.85, 3. alpha = 0.9, 4. alpha = 0.95.









Table 4-3: Noise results over rotation for histogram

Noise Variance Rotation Error
0.01 0
0.02 0
0.03 0
0.04 0
0.05 1
0.06 1
0.07 1
0.08 1
0.09 1
0.1 1


The graphs in Figures 4-35, 4-36 and 4-37 show that the behavior of the Renyi entropy estimate is very erratic for all values of alpha and, in fact, does not seem to converge quickly.

4.4.8 Noise Trials

Noise in the images may affect the registration results. This section compares the histogram technique to the MST technique when Gaussian noise of varying variance is added to the images. The images were registered over rotation alone and over rotation and scale, and the errors of the two techniques were compared.

Case 1: registration over rotation only.
Range of rotation: -10 to 10 degrees.
Correct answer at: 0 degrees.

The error of the histogram method for various amounts of added noise is tabulated in Table 4-3. As seen in Table 4-3, the histogram performs well for low noise but gives an error of 1 degree for a noise variance of 0.05 or more.

The error of the MST method for added noise is shown in Table 4-4. From this table, we can see that the MST performs well in the low-noise cases but again








Table 4-4: Noise results over rotation for MST

Noise Variance Rotation Error
0.01 0
0.02 0
0.03 0
0.04 0
0.05 0
0.06 3
0.07 0
0.08 3
0.09 0
0.1 2


introduces errors (in fact larger ones than the histogram) as the variance of the added noise increases.

Case 2: registration over rotation and scale.
Rotation range: -10 to 10 degrees.
Scale range: -0.5 to 0.5.
The correct answer for both rotation and scale is at 0.

Initially, some trials were made by just adding noise to the images and then registering the raw images. With this option, however, the MST approach performed badly and gave an error even in low-noise cases with a variance of 0.01; the histogram method clearly outshone the MST method in this case. So we later applied a smoothing filter to the images before registering them with the MST and the histogram.

When a smoothing Wiener filter was applied to the images after adding the noise, the histogram method gave zero error consistently for variances from 0.01 to 0.08.
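For reference, the noise-plus-smoothing step can be sketched as follows (Python with SciPy's Wiener filter; the window size is an illustrative assumption):

import numpy as np
from scipy.signal import wiener

def add_noise_and_smooth(img, variance, window=3):
    noisy = img + np.random.normal(0.0, np.sqrt(variance), img.shape)
    return wiener(noisy, mysize=window)   # adaptive Wiener smoothing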

The results when the MST was used are tabulated in Table 4-5. This table shows how the MST error worsens dramatically in a two-parameter search space.









Table 4-5: Noise results over rotation and scale for MST

Noise Variance Rotation Error Scale Error
0.01 0 0
0.02 0 0
0.03 2 0
0.04 4 0


Table 4-6: Noise results over rotation and scale for MST (1500 samples)

Noise Variance Rotation Error Scale Error
0.01 0 0
0.02 0 0
0.03 2 0
0.04 2 0
0.05 1 0.1
0.06 0 0
0.07 2 0
0.08 2 0


In the result shown in Table 4-5, 1000 samples from the overlap area of the images were used to compute the MST length. Another set of results, with 1500 samples used to compute the MST, is tabulated in Table 4-6. From Table 4-6 we can see that the accuracy of registration by MST decreases progressively, as an error is consistently introduced.

This result illustrates the behavior of the MST entropy when Gaussian noise is added to the image and the image is then smoothed with a particular filter before registering. There are different types of noise, and different filters that can be used to remove the noise; the relationship between the accuracy of registration by MST and these types of noise and filters needs to be studied further.















CHAPTER 5
DISCUSSION

The estimation of entropy has applications in a variety of areas, image registration being one of them. Recently, information theoretic measures such as mutual information have become very popular in image registration, and estimating the mutual information requires the estimation of entropy.

This thesis used minimum spanning tree algorithms to estimate the entropy from a finite set of samples. The MST can be computed easily in polynomial time, and one of the good attributes of MST-based entropy estimation is that no further parameter tuning is required.

However, when applying these entropies to perform registration, several important issues need to be dealt with. The MST entropy does not approximate the Shannon entropy; instead it approximates the Renyi entropy, as derived from Equation (3.3). Thus it cannot estimate the Shannon mutual information which is widely used in registration. In fact, it cannot even estimate the original Renyi mutual information that we saw in Equation (3.11). Instead, we have used an information theoretic measure which is the difference between the sum of the marginal entropies and the joint entropy, as was suggested in the paper [14]. This measure has properties similar to those of the Renyi mutual information.

There are several open issues that remain to be resolved. We would like to undertake a systematic comparison with other established density and entropy estimators, such as Parzen windows [ ]. Parzen windows estimate both the Shannon entropy and the Renyi entropy, and thus can also be used to estimate the Shannon mutual information. The Parzen window density estimate is computed from the following equation:

\hat{f}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\left( -\frac{\|x - x_i\|^2}{2\sigma^2} \right).   (5.1)

This equation can be further used to estimate the alpha Renyi entropy of Equation (2.18). Note that the Parzen window density estimator can compute the entropy for alpha > 1, whereas for the MST the condition on alpha is that it must be less than 1. Thus all the parameters have to be formulated carefully in order to compare these density estimators.
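As a sketch of this comparison candidate (Python for illustration; the resubstitution estimator below, which averages the estimated density raised to the power alpha - 1 over the samples themselves, is one standard way to turn Equation (5.1) into a Renyi entropy estimate and is an assumption rather than a method from this thesis):

import numpy as np
from scipy.spatial.distance import cdist

def renyi_entropy_parzen(samples, alpha, sigma):
    n, d = samples.shape
    sq = cdist(samples, samples, 'sqeuclidean')
    f_hat = (np.exp(-sq / (2.0 * sigma ** 2)).sum(axis=1)
             / (n * (2.0 * np.pi * sigma ** 2) ** (d / 2.0)))   # Eq. (5.1)
    # E_f[ f^(alpha-1) ] approximates the integral of f^alpha; valid for alpha > 1 too
    return np.log(np.mean(f_hat ** (alpha - 1.0))) / (1.0 - alpha)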

Apart from this, even within the regime of MST entropies used for registration, several important improvements can be made to the method proposed in this thesis. One is the search: in this thesis we have simply scanned the entire space to find the maximum (refer back to Section 4.2.3). Clearly, a well-optimized search algorithm over such a space would be beneficial, cutting the required time drastically.

Another area of improvement is feature selection. At present, we have used ICA features, based on the rationale that ICA returns statistically independent features, but a detailed study of the features could help improve the registration results. There are several other methods of feature selection, such as principal component analysis and image processing filters like Gabor filters [ ]. However, the noise analysis that we did in Section 4.4.8 suggests that when noise is present, the images require smoothing, as these methods rely heavily on taking derivatives of the images. Thus the presence of noise can in fact have a negative effect on registration when derivative-based features are used, vis-a-vis intensity-based registration.

An interesting extension of registration using MST entropies could be pursued by projecting the feature points into a different space (e.g., a Hilbert space) and then using the distance in that space, instead of the Euclidean distance, to compute the MST length. Whether such an extension improves the registration is another open area of research.

We also noticed in Section 4.4.8 that the histogram method is less sensitive to noise. Thus registration using a high-dimensional histogram is also a possibility; however, in order to do that, innovative ways have to be devised to compute histograms in a multi-dimensional space.

In conclusion, in this thesis we used MST entropies to register two face images at a time, searching over the space of affine transformations. This technique can also be expanded to register several images at once; to bring this into reality, however, a good search algorithm is a necessity, as is an appropriate selection of features in order to extract the most relevant information. For example, the features used to register face images may not be suitable for medical images. MST entropies could also be used for non-rigid image registration, provided the computation is made extremely efficient.

Thus, this thesis bolsters the claim that MST entropies can be used for registration, by implementing a registration algorithm that searches over the affine transformation space.















APPENDIX
ADDITIONAL INFORMATION

This section covers the pseudocode of the algorithm.
Pseudo-code. The program consists of the following main functions:
FunctionRegister: This function is the main function which calls all the subfunctions.
FunctionRoughRegister: This function performs the rough image registration using intensity and histograms for entropy estimation.
FunctionExtractICAFeatures: This function extracts the ICA features from the images.
FunctionMSTRegister: This function performs the fine registration using the ICA features and the minimum spanning tree for estimating the entropy.

and the following supplementary functions:
FunctionAffineTransform: This function performs the affine transformation of the input image according to the given parameters.
FunctionJointEntropyByHistogram: This function computes the joint histogram and accordingly estimates the Shannon joint entropy.
FunctionMarginalEntropyByHistogram: This function computes the histogram and accordingly estimates the Shannon marginal entropy.
FunctionEntropyByMST: This function estimates the Renyi entropy (joint and marginal) using the minimum spanning tree.

Here are the main functions explained in depth:



FunctionRegister

Input:
Image1, Image2: The images.
BlockSize: Size of the square block around each pixel (i.e., 2 x 2, 3 x 3, etc.).
Alpha: The alpha factor in the Renyi entropy.
NumBins: The number of bins used to compute the histogram.

Output:
AffineParameters: The affine transformation parameters which, when applied to the second image, give a registered pair of images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
MaxMeasure: The mutual-information-like measure for the parameters above.

Algorithm:
// Perform rough image registration of the images using intensity + histogram
RoughAffineParameters = FunctionRoughRegister(Image1, Image2, NumBins);
// Extract ICA features from the input images
[icaFeatures1, icaFeatures2, ICAFeatures] = FunctionExtractICAFeatures(Image1, Image2, BlockSize);
// Perform fine image registration of the images using the ICA features + MST
AffineParameters = FunctionMSTRegister(ICAFeatures, Alpha, RoughAffineParameters);
EndFunction


FunctionRoughRegister

Input:
Image1, Image2: The images.
NumBins: The number of bins used to compute the histogram.

Output:
RoughAffineParameters: The affine transformation parameters which, when applied to the second image, give roughly registered images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.

Algorithm:
// Image1 is kept fixed while Image2 is transformed over the search space.
Over all six parameters of the affine search space, using a rough resolution to scan the space, do
  tempImg = FunctionAffineTransform(currentTransformValues);
  jointEntropy = FunctionJointEntropyByHistogram(overlap areas of Image1 and tempImg);
  marginalEntropy1 = FunctionMarginalEntropyByHistogram(overlap area of Image1);
  marginalEntropy2 = FunctionMarginalEntropyByHistogram(overlap area of tempImg);
  currentMI = marginalEntropy1 + marginalEntropy2 - jointEntropy;
  if( currentMI > maxMI )
    maxMI = currentMI;
    RoughAffineParameters = current affine parameters;
  endIf
endLoop
return RoughAffineParameters;
EndFunction



FunctionExtractICAFeatures

Input:
Image1, Image2: The images.
BlockSize: Size of the square block around each pixel (i.e., 2 x 2, 3 x 3, etc.).

Output:
icaFeatures1: The ICA features of Image1.
icaFeatures2: The ICA features of Image2.
ICAFeatures: The ICA features of the two images combined.

Algorithm:
// Get the block neighborhood of each pixel in both the images.
BlockInput1 = GetBlockNeighborhoodImage(Image1, BlockSize);
BlockInput2 = GetBlockNeighborhoodImage(Image2, BlockSize);
// Compute the ICA features. We have used the FastICA package [16] to do so.
// The package is freely available for use.
icaFeatures1 = FASTICA(BlockInput1);
icaFeatures2 = FASTICA(BlockInput2);
// Now combine the ICA features
ICAFeatures = icaFeatures1 U icaFeatures2;
return icaFeatures1, icaFeatures2, ICAFeatures;
EndFunction



FunctionMSTRegister

Input:
icaFeatures1: The ICA features of Image1.
icaFeatures2: The ICA features of Image2.
ICAFeatures: The ICA features of the images combined.
Alpha: The alpha used to compute the Renyi entropy.
RoughAffineParameters: The ballpark estimate from the histograms, around which the search is performed.

Output:
FinalAffineParameters: The affine transformation parameters which, when applied to the second image, give registered images. The affine parameters are two rotations, scale, shear, and translations in the x and y directions.
MaxMeasure: The value of the mutual-information-like measure when the images are aligned.

Algorithm:
// Image1 is kept fixed while Image2 is transformed over the search space.
Over all six parameters of the affine search, around the ballpark estimate RoughAffineParameters, using a fine resolution to scan the space, do
  tempImg = FunctionAffineTransform(currentTransformValues);
  // Pick random samples from the joint as well as the individual feature spaces.
  (sampleJoint, sample1, sample2) = pickRandomSamples(ICAFeatures, icaFeatures1, icaFeatures2);
  jointEntropy = FunctionEntropyByMST(sampleJoint);
  marginalEntropy1 = FunctionEntropyByMST(sample1);
  marginalEntropy2 = FunctionEntropyByMST(sample2);
  currentRegistrationMeasure = marginalEntropy1 + marginalEntropy2 - jointEntropy;
  if( currentRegistrationMeasure > maxMeasure )
    maxMeasure = currentRegistrationMeasure;
    FinalAffineParameters = current affine parameters;
  endIf
endLoop
return FinalAffineParameters, maxMeasure;
EndFunction



Now the pseudo code of the supplementary functions is:



FunctionAffineTransform

Input:
Image: The image to transform.
TransformParameters: Parameters by which the image is to be transformed.

Output:
TransformedImage: The transformed image after applying the transformation.

Algorithm:
// The transform parameters are two rotation angles (\theta and \phi), scale, shear,
// and translations (xT and yT).

\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2^{scale} & 0 \\ 0 & 2^{scale} \end{pmatrix} \begin{pmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{pmatrix} \begin{pmatrix} 2^{shear} & 0 \\ 0 & 2^{-shear} \end{pmatrix} \begin{pmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{pmatrix}

affineTransformationMatrix = \begin{pmatrix} a & b & 0 \\ c & d & 0 \\ xT & yT & 1 \end{pmatrix}

// Now call MATLAB's image transform routine, which transforms the given image
// according to the given affine transform matrix and fills the empty areas
// according to the interpolation method specified.
TransformedImage = FunctionMATLABImTransform(Image, affineTransformationMatrix, interpolation);
return TransformedImage;
EndFunction
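A numeric sketch of the same construction (Python for illustration; the signs of the rotation blocks are assumed, since they were lost in the original layout):

import numpy as np

def affine_matrix(theta, phi, scale, shear, xT, yT):
    def rot(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, s], [-s, c]])
    # Rotation--log2-shear--rotation decomposition with isotropic log2 scale
    A = (2.0 ** scale) * rot(theta) @ np.diag([2.0 ** shear, 2.0 ** -shear]) @ rot(phi)
    M = np.eye(3)
    M[:2, :2] = A
    M[2, :2] = [xT, yT]       # row-vector (Matlab imtransform) convention
    return M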



FunctionJointEntropyByHistogram

Input:
Image1, Image2: The images.
NumBins: The number of bins used to compute the histogram.

Output:
jointEntropy: Joint Shannon entropy of the two images computed by the histogram method.

Algorithm:
interval = Range / NumBins;
// Compute the two dimensional normalized joint histogram of the two images.
jointHist = FunctionJointHistogram(Image1, Image2, interval);
jointEntropy = 0;
for i = 1 to NumBins
  for j = 1 to NumBins
    jointEntropy = jointEntropy - jointHist(i,j) * log jointHist(i,j);
  endFor
endFor
return jointEntropy;
EndFunction



FunctionMarginalEntropyByHistogram

Input:
Image: The image.
NumBins: The number of bins used to compute the histogram.

Output:
marginalEntropy: Marginal Shannon entropy of the image computed by the histogram method.

Algorithm:
interval = Range / NumBins;
// Compute the one dimensional normalized histogram of the image.
marginalHist = FunctionMarginalHistogram(Image, interval);
marginalEntropy = 0;
for i = 1 to NumBins
  marginalEntropy = marginalEntropy - marginalHist(i) * log marginalHist(i);
endFor
return marginalEntropy;
EndFunction



FunctionEntropyByMST

Input:
featureMatrix: The feature matrix whose entropy is to be calculated. The size of the matrix is N x d, where N is the number of sample points and d is their dimension, d >= 2.
Alpha: The alpha parameter used in computing the Renyi entropy.

Output:
Entropy: Renyi entropy of the feature matrix computed by the minimum spanning tree method.

Algorithm:
/* In this function we compute the approximate Renyi entropy based on the formula in Equation (3.8). We do not know the factor c(alpha, d), and thus the estimate is approximate. */
[N, d] = size(featureMatrix);
/* Compute the minimum spanning tree and its length. We use the Kruskal algorithm provided in the Boost Graph Library (available at http://www.boost.org/libs/graph/doc/table_of_contents.html, accessed March 2004). */
lengthMST = FunctionBOOSTKruskal(featureMatrix);
gamma = d * (1 - Alpha);   // since Alpha = (d - gamma) / d
Entropy = (1 / (1 - Alpha)) * log( lengthMST / N^Alpha );
return Entropy;
EndFunction















REFERENCES


[1] Y. AMIT AND D. GEMAN, Shape quantization and recognition with
randomized trees, Neural Computation, 9 (1997), pp. 1545-1588.

[2] J. BEARDWOOD, J. H. HALTON, AND J. M. HAMMERSLEY, The shortest path through many points, in Proceedings of the Cambridge Philosophical Society, vol. 55, 1959, pp. 299-327.

[3] C. BISHOP, Neural networks for pattern recognition, Oxford University Press,
Oxford, England, 1995.

[4] L. G. BROWN, A survey of image registration techniques, Computing Surveys,
24 (1992), pp. 325-376.

[5] B. CHAZELLE, A minimum spanning tree algorithm with inverse-Ackermann type complexity, Journal of the ACM, 47 (2000), pp. 1028-1047.

[6] A. COLLIGNON, D. VANDERMEULEN, P. SUETENS, AND G. MARCHAL, 3D multi-modality medical image registration using feature space clustering, in Computer Vision, Virtual Reality and Robotics in Medicine, N. Ayache, ed., vol. 905 of Lecture Notes in Computer Science, Springer-Verlag, 1995, pp. 195-204.

[7] T. CORMEN, C. LEISERSON, AND R. RIVEST, Introduction to algorithms, MIT Press, Cambridge, MA, USA, 2001.

[8] T. COVER AND J. THOMAS, Elements of information theory, John Wiley and
Sons, New York, NY, USA, 1991.

[9] R. DUDA AND P. HART, Pattern classification and scene analysis, John Wiley and Sons, New York, NY, 1973.

[10] D. EPPSTEIN, Dynamic Euclidean minimum spanning trees and extrema of binary functions, Discrete & Computational Geometry, 13 (1995), pp. 111-122.

[11] S. GOLD, A. RANGARAJAN, C. P. LU, S. PAPPU, AND E. MJOLSNESS, New algorithms for 2-D and 3-D point matching: pose estimation and correspondence, Pattern Recognition, 31 (1998), pp. 1019-1031.

[12] A. O. HERO, B. MA, O. MICHEL, AND J. GORMAN, Applications of entropic spanning graphs, IEEE Signal Processing Magazine (Special Issue on Mathematics in Imaging), 19 (2002), pp. 85-95.









[13] A. O. HERO AND O. MICHEL, Asymptotic theory of greedy approximations to minimal k-point random graphs, IEEE Transactions on Information Theory, 45 (1999), pp. 1921-1938.

[14] K. E. HILD II, D. ERDOGMUS, AND J. PRINCIPE, Blind source separation using Renyi's mutual information, IEEE Signal Processing Letters, 8 (2002), pp. 174-176.

[15] E. HOROWITZ, S. SAHNI, AND S. RAJASEKARAN, Fundamentals of computer
algorithms, W. H. Freeman, San Francisco, CA, 1998.

[16] A. HYVARINEN AND E. OJA, Independent component analysis: A tutorial,
Neural Networks, 13(4-5) (2000), pp. 411-430.

[17] B. MA, A. HERO, J. GORMAN, AND O. MICHEL, Image registration with minimum spanning tree algorithm, in International Conference on Image Processing, vol. 1, 2000, pp. 481-484.

[18] R. E. NEAPOLITAN, Learning Bayesian networks, Pearson Prentice Hall, Upper Saddle River, NJ, USA, 2004.

[19] H. NEEMUCHWALA, A. HERO, AND P. CARSON, Image registration using entropic graph-matching criteria, in Proceedings of the Asilomar Conference on Signals and Systems, vol. 1, 2002, pp. 134-138.

[20] S. PETTIE AND V. RAMACHANDRAN, An optimal minimum spanning tree algorithm, in Automata, Languages and Programming, 2000, pp. 49-60.

[21] J. P. W. PLUIM, J. B. A. MAINTZ, AND M. A. VIERGEVER, Mutual-information-based registration of medical images: a survey, IEEE Transactions on Medical Imaging, 22 (2003), pp. 986-1004.

[22] L. RABINER, A tutorial on hidden markov models and selected applications in
speech recognition, Proceedings of the IEEE, 77 (Feb 1989), pp. 257-286.

[23] A. RENYI, Probability theory, North-Holland Publishing Company, Amsterdam, Netherlands, 1970.

[24] C. E. SHANNON, A mathematical theory of communication, The Bell system
technical journal, 27 (1948), pp. 379-423 and 623-656.

[25] J. M. STEELE, Growth rates of Euclidean minimum spanning trees with power weighted edges, The Annals of Probability, 16 (1988), pp. 1767-1787.

[26] H. A. STURGES, The choice of class interval, Journal of the American
Statistical Association, 21 (1926), pp. 65-66.








[27] B. C. VEMURI, J. LIU, AND J. L. MARROQUIN, Robust multimodal image registration using local frequency representations, vol. 2082 of Lecture Notes in Computer Science, 2001, pp. 176-182.

[28] P. VIOLA AND W. M. WELLS III, Alignment by maximization of mutual
information, in Fifth Intl. Conf. Computer Vision (ICCV), 1995, pp. 16-23.















BIOGRAPHICAL SKETCH

I graduated with a bachelor of engineering degree from the Government College of Engineering, Pune, India. I started my master of science program at the University of Florida in August 2002 and I am planning to graduate in a month!

My academic interests lie in computer vision and image processing and this thesis

was a great learning experience for me. Apart from this, I enjoy painting and

badminton.