Information Theoretic Measures and Their Applications to Image Registration and Segmentation

Copyright 2006


Fei Wang

For my wife, Lin, and my parents.


I would like to first thank my advisor, Dr. Baba C. Vemuri, for everything he has

done for me during my doctoral study. This dissertation would not have taken shape

without his invaluable input. Dr. Vemuri introduced me to the field of medical image

analysis. His insight and experience have guided me throughout my research during

which time he provided numerous invaluable suggestions. It was a great pleasure

for me to conduct this dissertation under his supervision. I would also like to thank

Dr. Anand Rangarajan, Dr. Sartaj Sahni, Dr. Arunava Banerjee, and Dr. Tan Wong for their

willingness to serve on my committee. In addition, special thanks go to Dr. Jorg Peters

for attending my PhD oral examination.

My doctoral research has been a happy collaboration with many people. Dr. Vemuri was involved throughout the entire process; Dr. Rangarajan guided much of the groupwise point-set registration work; and Dr. Ilona Schmalfuss and Dr. Stephan Eisenschenk kindly provided the data for hippocampal segmentation and taught me what little I know of neuroscience. I have also benefitted from Dr. Thomas E. Davis's guidance when I first joined the lab. I would also like to thank Dr. Banerjee for stimulating debates, and Dr. Jeffrey Ho for his professional advice and philosophical discussions. Thanks also go to Drs. Murali Rao and Yunmei Chen, my co-authors on a set of papers that introduced the concept of entropy based on probability distributions and several properties of the same.

Needless to say, I am grateful for the support of my colleagues and friends at the

Computer and Information Science and Engineering Department at the University of

Florida. Dr. Zhizhou Wang, Dr. Jundong Liu, Dr. Tim Mcgraw, Dr. Eric Spellman, Bing

Jian, Santhosh Kodipaka, Nicholas Lord, Neeti Vohra, Angelos Barmpoutis, Seniha Esen

Yuksel, Ozlem Subakan, Ritwik Kumar, Evren Ozarslan, Ajit Rajwade, Adrian Peter, Dr.

Jie Zhang and Dr. Hongyu Guo all deserve thanks.

And finally, most importantly, I thank my family. I thank my mother and father

for everything and my brother, too. And of course I thank my dearest Lin for her

understanding and love during the past few years. Their support and encouragement are

my source of strength.

This research was supported in part by the grants NIH RO1-NS42075 and NIH

R01-NS046812. I would also like to acknowledge travel support (for attending various

conferences to present research papers) from the IEEE Computer Society, the Department

of Computer and Information Science and Engineering and the College of Engineering of

the University of Florida.


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

1 INTRODUCTION
    1.1 Image and Point-set Registration
        1.1.1 Image Registration
        1.1.2 Groupwise Point-sets Registration
    1.2 Image Segmentation
    1.3 Outline of Remainder

    2.1 Shannon Entropy and Related Measures
    2.2 Cumulative Residual Entropy: A New Measure of Information
    2.3 Properties of CRE
        2.3.1 CRE and Empirical CRE
        2.3.2 Robustness of CRE

    3.1 Related Work
    3.2 Multimodal Image Registration using CCRE
        3.2.1 Transformation Model for Non-rigid Motion
        3.2.2 Measure Optimization
        3.2.3 Computation of P(i > λ, k; μ) and p(i, k; μ)
        3.2.4 Algorithm Summary
    3.3 Implementation Results
        3.3.1 Synthetic Motion Experiments: convergence speed; registration accuracy; noise immunity; partial overlap
        3.3.2 Real Data Experiments

    4.1 Previous Work
    4.2 Divergence Measures
        4.2.1 Jensen-Shannon Divergence
        4.2.2 CDF-JS Divergence
    4.3 Methodology
        4.3.1 Energy Function for Groupwise Point-sets Registration
        4.3.2 JS Divergence in a Hypothesis Testing Framework
        4.3.3 Unbiasedness Property of the Divergence Measures
        4.3.4 Estimating JS and its Derivative: finite mixture models; optimizing the JS divergence
        4.3.5 Estimating CDF-JS and its Derivative: optimizing the CDF-JS divergence
    4.4 Experiment Results
        4.4.1 JS Divergence Results: alignment results; atlas construction results
        4.4.2 CDF-JS Divergence Results

    5.1 Related Work
    5.2 Registration+Segmentation Model
        5.2.1 Gradient Flows
        5.2.2 Algorithm Summary
    5.3 Results

    6.1 Contributions of the Dissertation
    6.2 Image and Point-sets Registration
        6.2.1 Non-rigid Image Registration
        6.2.2 Groupwise Point-sets Registration
    6.3 Image Segmentation

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

3-1 Comparison of the registration results between CCRE and MI for a fixed synthetic deformation field

3-2 Comparison of total time taken to achieve registration by CCRE and MI

3-3 Comparison of the value S of several brain structures for CCRE and MI

5-1 Statistics of the error in estimated non-rigid deformation

LIST OF FIGURES

1-1 Illustration of groupwise registration of corpus callosum point-sets manually extracted from the outer contours of the brain images

3-1 CCRE, MI and NMI traces plotted for the misaligned MR & CT image pair

3-2 Comparison of convergence speed between CCRE and MI

3-3 Plot demonstrating the change of mean deformation error for CCRE and MI registration results with time

3-4 Results of application of our algorithm to synthetic data (see text for details)

3-5 Registration results of MR T1 and T2 image slice with large non-overlap

3-6 Registration results of different subjects of MR & CT brain data with real non-rigid motion (see text for details)

4-1 Illustration of corpus callosum point-sets represented as density functions

4-2 Results of rigid registration in the noiseless case; 'o' and '+' indicate the model and scene points respectively

4-3 Non-rigid registration of the corpus callosum point-sets

4-4 Experiment results on seven 2D corpus callosum point-sets

4-5 Robustness to outliers in the presence of large noise

4-6 Robustness test on 3D swan data

4-7 Atlas construction from four 3D hippocampal point-sets

5-1 Model illustration

5-2 Illustration of the various terms in the evolution of the level set function

5-3 Results of application of our algorithm to synthetic data

5-4 Results of application of our algorithm to a pair of slices from human brain MRIs

5-5 Corpus callosum segmentation on a pair of corresponding slices from distinct subjects

5-6 Hippocampal segmentation using our algorithm on a pair of brain scans from distinct subjects

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Fei Wang

August 2006

Chair: Baba C. Vemuri
Major Department: Computer and Information Science and Engineering

Information theory has played a fundamental role in many fields of science and

engineering, including computer vision and medical imaging. In this dissertation, we present information theoretic measures that address several important problems in medical imaging, namely image registration, point-set registration, and image segmentation.

To measure the information content in a random variable, we first present a novel

measure based on its cumulative distribution that is dubbed Cumulative Residual Entropy

(CRE). This measure parallels the well-known Shannon entropy but has the following

advantages: (1) it is more general than the Shannon entropy as its definition is valid

in the discrete and continuous domains, (2) it possesses more general mathematical

properties and (3) it can be easily computed from sample data and these computations

asymptotically converge to the true values. Based on CRE, we define the cross-CRE

(CCRE) between two random variables, and apply it to solve the image alignment

problem for parameterized transformations. The key strengths of the CCRE over using

the now popular Mutual Information (based on Shannon's entropy) between images being

aligned are that the former has significantly larger tolerance to noise and a much larger

convergence range over the field of parameterized transformations.

Jensen-Shannon (JS) divergence has long been known as a measure of cohesion

between multiple probability densities. Similar to the idea of defining an entropy measure

based on distributions, we derive a JS divergence based on probability distributions and dub it the CDF-JS divergence. We then apply the JS and the CDF-JS divergence to

the groupwise point-set registration problem, which involves simultaneously registering

multiple shapes (represented as point-sets) for constructing an atlas. Estimating a

meaningful average or mean shape from a set of shapes represented by unlabeled point-

sets is a challenging problem, since this usually involves solving for point correspondence

under a non-rigid motion setting. The novel and robust algorithm we propose avoids the

correspondence problem by minimizing the CDF-JS/JS divergence between the point-sets

represented as probability distribution/density functions. The cost functions are fully

symmetric with no bias toward any of the given shapes to be registered and whose mean

is being sought. We empirically show that CDF-JS is more robust to noise and outliers

than JS divergence. Our algorithm can be especially useful for creating atlases of various

shapes present in images as well as for simultaneously registering 3D range data sets

without having to establish any correspondence.

In the context of image segmentation, we developed a novel model-based seg-

mentation technique that involves segmenting the novel 3D image data by non-rigidly

registering an atlas to it. The key contribution here to the solution of this problem is that

we present a novel variational formulation of the registration assisted image segmenta-

tion task, which leads to solving a coupled set of nonlinear PDEs that are solved using

efficient numerical schemes. Our segmentation algorithm is a departure from earlier

methods in that we have a unified variational principle wherein non-rigid registration and

segmentation are simultaneously achieved; unlike previous solutions to this problem, our

algorithm can accommodate image pairs with very distinct intensity distributions.


In 1948, motivated by the problem of efficiently transmitting information over

a noisy communication channel, Claude Shannon introduced a revolutionary new

probabilistic way of thinking about communication and simultaneously created the first

truly mathematical theory of entropy. His ideas created a sensation and were rapidly

developed to create the field of information theory, which employs probability and

ergodic theory to study the statistical characteristics of data and communication systems.

Since then, information theory has played a fundamental role in many fields of science

and engineering including computer vision and medical imaging. In this dissertation, we

endeavor to develop novel information theoretic methods with applications to medical

image analysis.

We examine two applications in particular, image (point-set) registration and

image segmentation. In the first of these applications, we follow a promising avenue of

work in using a probability density or distribution function as the signature of a given

"object" (image or point-set). Then by optimizing certain information theoretic measures

between these functions, we achieve the desired registration. In the segmentation

application, we consider an atlas based approach, in which segmentation and registration

are simultaneously achieved by solving a novel variational principle.

1.1 Image and Point-set Registration

We start with the image registration problem and then move on to the point-set registration problem.


1.1.1 Image Registration

The image registration problem is defined as follows: given a pair of images I1(x, y) and I2(x', y'), where (x', y') = T(x, y) and T is the matrix corresponding to the unknown parameterized transformation to be determined, define a match metric M(I1(x, y), I2(x', y')) and optimize M over all T.
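To make this definition concrete, here is a toy sketch (with entirely hypothetical helper names; the dissertation's own methods use information theoretic metrics rather than SSD): a brute-force search over integer translations T for the minimizer of a sum-of-squared-differences match metric M.

```python
import numpy as np

def ssd(a, b):
    """Sum-of-squared-differences match metric M(I1, I2)."""
    return float(np.sum((a - b) ** 2))

def best_translation(i1, i2, max_shift=3):
    """Brute-force search for the integer translation T = (tx, ty) minimizing M."""
    best, argbest = np.inf, (0, 0)
    for tx in range(-max_shift, max_shift + 1):
        for ty in range(-max_shift, max_shift + 1):
            shifted = np.roll(np.roll(i2, tx, axis=0), ty, axis=1)
            score = ssd(i1, shifted)
            if score < best:
                best, argbest = score, (tx, ty)
    return argbest

rng = np.random.default_rng(0)
i1 = rng.random((16, 16))
i2 = np.roll(np.roll(i1, -2, axis=0), 1, axis=1)   # i2 is a shifted copy of i1
print(best_translation(i1, i2))                     # recovers (2, -1)
```

Real registration replaces the exhaustive search with continuous optimization over a parameterized family of transformations, as described next.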

The fundamental characteristic of any image registration technique is the type of

spatial transformation or mapping used to properly overlay two images. The transforma-

tion can be classified into global and local transformations. A global transformation is

given by a single equation which maps the entire image. Local transformations map the

image differently depending on the spatial location and are thus more difficult to express

succinctly. The most common global transformations are rigid, affine and projective transformations.


A transformation is called rigid if the distance between points in the image being

transformed is preserved. A rigid transformation can be expressed as

u(x, y) = (cos(θ) x - sin(θ) y + dx) - x
v(x, y) = (sin(θ) x + cos(θ) y + dy) - y        (1-1)

where u(x, y) and v(x, y) denote the displacement at point (x, y) along the X and Y directions, θ is the rotation angle, and (dx, dy) is the translation vector.

A transformation is called affine when any straight line in the first image is mapped

onto a straight line in the second image with parallelism being preserved. In 2D, the

affine transformation can be expressed as

u(x, y) = (a11 x + a12 y + dx) - x
v(x, y) = (a21 x + a22 y + dy) - y        (1-2)

where (a11, a12; a21, a22) denotes an arbitrary real-valued 2x2 matrix. The scaling transformation, which has a transformation matrix of (s1, 0; 0, s2), and the shearing transformation, which has a matrix (1, s3; 0, 1), are two examples of affine transformations, where s1, s2 and s3 are positive real

A more interesting case, in general, is that of a planar surface in motion viewed

through a pinhole camera. This motion can be described as a 2D projective transforma-

tion of the plane.

u(x, y) = (a0 + a1 x + a2 y) / (a6 x + a7 y + 1) - x
v(x, y) = (a3 + a4 x + a5 y) / (a6 x + a7 y + 1) - y        (1-3)

where a0, ..., a7 are the global parameters.

When a global transformation does not adequately explain the relationship of a

pair of input images, a local transformation may be necessary. Registering an image

pair obtained at different times with some portion of the body experiencing growth,

or registering two images from different patients, fall into this local transformation

registration category. A motion field is usually used to describe the spatially varying transformation in the local registration problem.
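The rigid and affine displacement fields above can be sketched directly in Python (a minimal illustration with hypothetical helper names, not code from the dissertation):

```python
import math

def rigid_displacement(x, y, theta, dx, dy):
    """Displacement (u, v) at (x, y): rotate by theta, then translate by (dx, dy)."""
    u = (math.cos(theta) * x - math.sin(theta) * y + dx) - x
    v = (math.sin(theta) * x + math.cos(theta) * y + dy) - y
    return u, v

def affine_displacement(x, y, a, d):
    """Displacement under an affine map with 2x2 matrix a and translation d."""
    u = (a[0][0] * x + a[0][1] * y + d[0]) - x
    v = (a[1][0] * x + a[1][1] * y + d[1]) - y
    return u, v

# The identity transformation yields zero displacement everywhere.
print(rigid_displacement(1.0, 2.0, 0.0, 0.0, 0.0))   # (0.0, 0.0)
```

A local (non-rigid) transformation would instead store one such (u, v) pair per pixel, forming the motion field mentioned above.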

1.1.2 Groupwise Point-sets Registration

Point-set representations of image data, e.g., feature points, are commonly used

in many applications and the problem of registering them frequently arises in a variety

of these application domains. Extensive studies on the point set registration and related

problems can be found in a rich literature covering both theoretical and practical issues

relating to computer vision and pattern recognition.

Given N point-sets, denoted by {X^p, p ∈ {1, ..., N}}, each point-set X^p consists of points {x_i^p ∈ R^D, i ∈ {1, ..., n_p}}, where n_p is the number of points contained in point-set X^p. The task of multiple point pattern matching or point-set registration is

either to establish a consistent point-to-point correspondence between these point-sets or

to recover the spatial transformation which yields the best alignment. For example, we are

given a group of corpus callosum point-sets from the brain image scan, which is shown

in the left column of Figure 1-1. All the point-sets are registered simultaneously to the

point-sets shown in the right column in a symmetric manner, meaning that the registration

result is not biased toward any of the original point-sets. We will discuss these issues in

greater detail in Chapter 4.






Figure 1-1: Illustration of groupwise registration of corpus callosum point-sets manually
extracted from the outer contours of the brain images.

1.2 Image Segmentation

Image segmentation plays a crucial role in many medical imaging applications by

automating or facilitating the delineation of anatomical structures. The segmentation of

structure from 2D and 3D images is an important first step in analyzing medical data. For

example, it is necessary to segment the brain in an MR image, before it can be rendered in

3D for visualization purposes. Segmentation can also be used to automatically detect the

head and abdomen of a fetus from an ultrasound image. The boundaries can then be used

to get quantitative estimates of organ sizes and provide aid in any necessary diagnoses.

Another important application is registration. It may be easier, or at least less error prone

to segment objects in multiple images prior to registration. This is especially true in

images from different modalities such as CT and MRI.

Image-guided surgery is another important application that needs image segmen-

tation. Recent advances in technology have made it possible to acquire images of the

patient while the surgery is in-progress. The goal is then to segment relevant regions of

interest and overlay them on an image of the patient to help guide the surgeon.


Segmentation is therefore a very important task in medical imaging. However,

manual segmentation is not only a tedious and time-consuming process, but is also inaccurate. Segmentation by experts has been shown to vary by up to 20%. It is therefore desirable to use algorithms that are accurate and require as little user interaction as possible.

1.3 Outline of Remainder

In the next chapter, we rigorously define a novel measure of information in a random

variable based on its cumulative distribution that we dub as cumulative residual entropy

(CRE). We also connect the measure to the mean residual life function in reliability

engineering. Thereafter follows the chapter on using the measure for multimodal image

registration. In Chapter 4, we present a simultaneous groupwise point-sets registration

and atlas construction algorithm, in which we minimize the proposed divergence

measures between point sets represented as probability densities or distributions. Based

on these new measures, we propose a novel variational principle in Chapter 5 for

solving the registration assisted image segmentation problem. Lastly, we end with some

concluding points and thoughts for future work.


2.1 Shannon Entropy and Related Measures

The concept of entropy is central to the field of information theory and was origi-

nally introduced by Shannon in his seminal paper [1] in the context of communication

theory. The entropy Shannon proposed is a measure of uncertainty in a discrete distribution based on the Boltzmann entropy of classical statistical mechanics. The Shannon

Entropy of a discrete distribution F is defined by

H(F) = -Σ_i p_i log p_i        (2-1)

Since then, this concept and variants thereof have been extensively utilized in numerous

applications of science and engineering. To date, among the applications that have benefited most widely are financial analysis [2], data compression [3], statistics [4], and information theory [5].

This measure of uncertainty has many important properties which agree with

our intuitive notion of randomness. We mention three: (1) It is always positive. (2) It

vanishes if and only if the random variable corresponds to a certain event. (3) Entropy is increased by the addition of

an independent component, and decreased by conditioning. However, extension of this

notion to continuous distribution poses some challenges. A straightforward extension of

the discrete case to continuous distributions F with density f is called the differential entropy:

H(F) = -∫ f(x) log f(x) dx        (2-2)

However, this definition raises the following concerns. 1) First, it is defined in terms of the density of the random variable, which in general may or may not exist, e.g., for cases when the cumulative distribution function (cdf) is not differentiable. It would not

be possible to define the entropy of a random variable for which the density function

is undefined; 2) Secondly, the Shannon entropy of a discrete distribution is always

positive, while the differential entropy of a continuous variable may take any value

on the extended real line; 3) Shannon entropy computed from samples of a random

variable lacks the property of convergence to the differential entropy, i.e. even when the

sample size goes to infinity, the Shannon entropy estimated from these samples will not

converge to differential entropy [5]. The consequence of which is that it is impossible,

in general, to approximate the differential entropy of a continuous variable using the

entropy of empirical distributions; 4) Consider the following situation: Suppose X and

Y are two discrete random variables representing the height of a group of people, with

X taking on values {5.1,5.2, 5.3, 5.4, 5.5}, each with a probability 1/5 and Y taking on

values {5.1, 5.2, 5.3, 5.4, 7.5} (with Yao Ming in this group) again each with probability

1/5. The information content measured in these two random variables using Shannon

entropy is the same, i.e., Shannon entropy does not bring out any differences between

these two cases. However, if the two random variables represented the winning chances

in a basketball game, the information content in the two random variables would be

considered as being dramatically different. Nevertheless Shannon entropy fails to make

any distinction whatsoever between them. For additional discussion on some of these

issues the reader is referred to [6].
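Concern (2) is easy to see concretely: for a Uniform(0, a) density, the differential entropy evaluates to log a, which is negative whenever a < 1. A one-line sketch (the helper name is hypothetical):

```python
import math

def uniform_differential_entropy(a):
    """h(F) = -integral over [0, a] of (1/a) log(1/a) dx = log(a) for Uniform(0, a)."""
    return math.log(a)

print(uniform_differential_entropy(0.5) < 0)   # True: the "entropy" is negative
```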

In this work we propose an alternative measure of uncertainty in a random variable

X and call it the Cumulative Residual Entropy (CRE) of X. The main objective of our

study is to extend Shannon entropy to random variables with continuous distributions.

The concept we proposed overcomes the problems mentioned above, while retaining

many of the important properties of Shannon entropy. For instance, both are decreased

by conditioning, while increased by independent addition. They both obey the data

processing inequality, etc. However, the differential entropy does not have the following

important properties of CRE.

1. CRE has consistent definitions in both the continuous and discrete domains;

2. CRE is always non-negative;

3. CRE can be easily computed from sample data and these computations asymptoti-

cally converge to the true values.

The basic idea in our definition is to replace the density function with the cumulative

distribution in Shannon's definition (2-1). The distribution function is more regular than

the density function, because the density is computed as the derivative of the distribution.

Moreover, in practice what is of interest and/or measurable is the distribution function.

For example, if the random variable is the life span of a machine, then the event of

interest is not whether the life span equals t, but rather whether the life span exceeds

t. Our definition also preserves the well established principle that the logarithm of

the probability of an event should represent the information content in the event. The

discussions about the properties of CRE in the next few sections, we trust, are convincing

enough for further development of the concept of CRE.

2.2 Cumulative Residual Entropy: A New Measure of Information

In this section, we define an alternate measure of uncertainty in a random variable

and then derive some properties about this new measurement. We do not delve into the

proofs but refer the reader to a more comprehensive mathematical treatment in [7].

Definition: Let X be a random vector in R^N, X = (X_1, X_2, ..., X_N), and let F(λ) := P(|X| > λ) be the cumulative residual distribution, where λ = (λ_1, ..., λ_N) and |X| > λ means |X_i| > λ_i. F(λ) is also called the survival function in the Reliability Engineering literature. We define the cumulative residual entropy (CRE) of X by

S(X) = -∫_{R_+^N} F(λ) log F(λ) dλ        (2-3)

where R_+^N = {x ∈ R^N : x_i ≥ 0}.

CRE can be related to the well-known concept of mean residual life function in

Reliability Engineering which is defined as:

m_F(t) = E(X - t | X > t) = ∫_t^∞ F(x) dx / F(t)        (2-4)

The function m_F(t) is of fundamental importance in Reliability Engineering and is often used to measure departure from exponentiality. CRE can be shown to be the expectation of m_F [8], i.e.,

S(X) = E(m_F(X))        (2-5)

Now we give a few examples.

Example 1: (CRE of the uniform distribution)

Consider a uniform distribution on [0, a] with the density function

p(x) = 1/a for 0 ≤ x ≤ a, and 0 otherwise.        (2-6)

Then its CRE is computed as follows:

S(X) = -∫ P(|X| > x) log P(|X| > x) dx
     = -∫_0^a (1 - x/a) log(1 - x/a) dx
     = a/4        (2-7)

Example 2: (CRE of the exponential distribution)

The exponential distribution with mean 1/λ has the density function

p(x) = λ e^{-λx}        (2-8)

Correspondingly, the CRE of the exponential distribution is

S(X) = -∫_0^∞ e^{-λt} log(e^{-λt}) dt = λ ∫_0^∞ t e^{-λt} dt = 1/λ        (2-9)

Example 3: (CRE of the Gaussian Distribution)

The Gaussian probability density function is

p(x) = (1/(√(2π) σ)) exp[-(x - m)^2 / (2σ^2)]        (2-10)

where m is the mean and σ^2 is the variance. The survival function is

F(x) = erfc((x - m)/σ)        (2-11)

where erfc is the complementary error function

erfc(x) = (1/√(2π)) ∫_x^∞ exp(-t^2/2) dt.

Then the CRE of the Gaussian distribution is

S(X) = -∫ erfc((x - m)/σ) log[erfc((x - m)/σ)] dx        (2-12)
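Returning to the height example of Section 2.1, a short numerical sketch (with hypothetical helper names; the CRE is computed from the piecewise-constant survival function of a nonnegative discrete variable) confirms that the two height variables have identical Shannon entropies while their CREs differ markedly:

```python
import math

def shannon_entropy(pmf):
    """Shannon entropy of a discrete pmf given as {value: probability}."""
    return -sum(p * math.log(p) for p in pmf.values())

def cre_discrete(pmf):
    """CRE of a nonnegative discrete RV: -sum over intervals of F log F times length."""
    xs = sorted(pmf)
    s, tail = 0.0, 1.0
    for a, b in zip(xs, xs[1:]):
        tail -= pmf[a]                    # survival probability F on [a, b)
        if tail > 0.0:
            s -= (b - a) * tail * math.log(tail)
    return s

X = {5.1 + 0.1 * i: 0.2 for i in range(5)}    # heights 5.1 ... 5.5
Y = {5.1 + 0.1 * i: 0.2 for i in range(4)}    # heights 5.1 ... 5.4 ...
Y[7.5] = 0.2                                  # ... plus one 7.5 outlier

print(shannon_entropy(X), shannon_entropy(Y))  # both equal log 5 ≈ 1.609
print(cre_discrete(X), cre_discrete(Y))        # ≈ 0.117 versus ≈ 0.761
```

Shannon entropy sees only the probabilities, which are identical; CRE also sees where the probability mass sits on the real line, so the outlier changes it dramatically.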

We now state important properties that are related to the application of CRE

to image registration. For a complete list of properties, we refer the readers to a more

comprehensive treatment in [7].

2.3 Properties of CRE

The traditional Shannon entropy of a sum of independent variables is larger than that

of either. We have analogously the following theorem:

Theorem 1 For any non-negative and independent variables X and Y,

max(S(X), S(Y)) ≤ S(X + Y)

Proof: For a proof, see [7].

Similar to the case of Shannon's entropy, if X and Y are independent random variables, then S(X, Y) = E(|X|) S(Y) + E(|Y|) S(X). More generally,

Proposition 1 If the X_i are independent, then

S(X) = Σ_i Π_{j≠i} E(|X_j|) S(X_i)

For a proof, see [7].

Conditional entropy is a fundamental concept in information theory. We now define

the concept of conditioning in the context of CRE.

Definition: Given random vectors X and Y ∈ R^N, we define the conditional CRE S(X|Y) by

S(X|Y) = -∫ P(|X| > x | Y) log P(|X| > x | Y) dx        (2-13)

As in the Shannon entropy case, conditioning reduces CRE.

Proposition 2 For any X and Y

E[S(X|Y)] ≤ S(X)        (2-14)

Equality holds iff X is independent of Y.

2.3.1 CRE and Empirical CRE

The next theorem shows one of the salient features of CRE. In the discrete case, Shannon

entropy is always non-negative, and equals zero if and only if the random variable is a

certain event. However, this is not valid for the Shannon entropy in the continuous case

as defined in Eqn. 2-2. In contrast, in this regard CRE does not differentiate between

discrete and continuous cases, as shown by the following theorem:

Theorem 2 S(X) ≥ 0, and equality holds if and only if P[|X| = λ] = 1 for some vector λ, i.e., |X_i| = λ_i with probability 1.

Shannon entropy computed from samples of a random variable lacks the property

of convergence to the differential entropy (see Eqn. 2-2 for a definition). In contrast, the

CRE, S(x) computed from the samples converges to the continuous counterpart. This is

summarized in the following theorem.

Proposition 3 (Weak Convergence) Let the random vectors X_k converge in distribution to the random vector X, by which we mean

lim_{k→∞} E[φ(X_k)] = E[φ(X)]        (2-15)

for all bounded continuous functions φ on R^N. If all the X_k are bounded in L^p for some p > N, then

lim_{k→∞} S(X_k) = S(X)        (2-16)

Proof: Refer to [7] for the proof.

This is a powerful property: as a consequence of it, the CRE computed from samples of a random variable converges to the true CRE of that random variable. Note that the X_k can be samples of a continuous random variable.
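A minimal numerical check of this convergence (a sketch with hypothetical helper names, assuming nonnegative scalar samples): the empirical CRE of Uniform(0, 1) samples should approach the closed-form value a/4 = 0.25 for Uniform(0, a).

```python
import numpy as np

def empirical_cre(samples):
    """Empirical CRE of nonnegative scalar samples: -integral of F log F,
    with F the empirical survival function (piecewise constant between order statistics)."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = 1.0 - np.arange(1, n) / n      # survival probability on [x_i, x_{i+1})
    gaps = np.diff(x)
    return float(-np.sum(gaps * tail * np.log(tail)))

rng = np.random.default_rng(0)
est = empirical_cre(rng.uniform(0.0, 1.0, size=100_000))
print(round(est, 3))                       # close to the true value a/4 = 0.25
```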

2.3.2 Robustness of CRE

We now investigate the robustness (or the lack thereof) of differential entropy and

prove that while differential entropy is not robust with respect to small perturbations,

CRE on the contrary is quite robust. This property plays a key role in demonstrating the

noise immunity of CCRE over MI depicted in the experiments in the next chapter.

Theorem 3 Let X be a discrete random variable taking values (x_1, x_2, ..., x_N) with probabilities

P(X = x_i) = p_i, 1 ≤ i ≤ N        (2-17)

so that X has Shannon entropy H(X) = -Σ_i p_i log p_i. Let Y_ε have a density f_ε and be independent of X; Z_ε = X + Y_ε is no longer discrete and has a density. Let X be as in (2-17) and Y_ε as above, and suppose Y_ε → 0 in probability. Then

h(X + Y_ε) → -∞        (2-18)

Theorem 4 For X and Y_ε as defined in Theorem 3,

lim S(X + Y_ε) = S(X)        (2-19)

Proof: This is a direct consequence of the Proposition 3.

Theorems 3 and 4 are very important properties, as they prove that the CRE is robust to noise, which is not the case for differential entropy. Intuitively, the robustness of CRE may be attributed to the use of a CDF as opposed to a PDF in its definition, i.e., an integral formulation as opposed to a differential formulation, and it is well known that the former is more robust than the latter.
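The stability promised by Theorem 4 can be checked numerically (a self-contained sketch; all helper and variable names are hypothetical): adding vanishing uniform noise to samples of a discrete variable leaves the empirical CRE near the CRE of the discrete variable, which is approximately 0.909 for the uniform distribution on {0, 1, 2, 3}.

```python
import numpy as np

def empirical_cre(samples):
    """Empirical CRE: -integral of F log F with the empirical survival function F."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = len(x)
    tail = 1.0 - np.arange(1, n) / n
    gaps = np.diff(x)
    return float(-np.sum(gaps * tail * np.log(tail)))

rng = np.random.default_rng(1)
x = rng.integers(0, 4, size=100_000).astype(float)    # discrete X, uniform on {0,1,2,3}
for eps in (0.1, 0.01, 0.001):
    noisy = x + rng.uniform(0.0, eps, size=x.size)    # Z_eps = X + Y_eps
    print(eps, round(empirical_cre(noisy), 3))        # stays near S(X) ≈ 0.909
```

A histogram-based differential entropy estimate of the same noisy samples would, in contrast, diverge toward minus infinity as the noise shrinks, in line with Theorem 3.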


Matching two or more images under varying conditions (illumination, pose, acquisition parameters, etc.) is ubiquitous in computer vision, medical imaging, geographical information systems, and related fields. In the past several years, information theoretic

measures have been very widely used in defining cost functions to be optimized in

achieving a match. An example problem common to all the aforementioned areas is the

image registration problem. In the following, we will review the literature on existing

computational algorithms that have been reported for achieving multimodality image

registration, with the focus on the non-rigid registration methods. We will point out their

limitations and hence motivate the need for a new and efficient computational algorithm

for achieving our goal.

3.1 Related Work

Non-rigid image registration methods in the literature to date may be classified into feature-based and "direct" methods. Most feature-based methods are limited to determining the registration at the feature locations and require an interpolation at other locations. If, however, the transformation/registration between the images is a global transformation, e.g., rigid, affine, etc., then there is no need for an interpolation step. In the non-rigid case, however, interpolation is required. Also, the accuracy of the registration depends on the accuracy of the feature detector.

Several feature-based methods involve detecting surfaces, landmarks [9, 10, 11, 12], edges, ridges, etc. Most of these assume a known correspondence, with the exception of the work in Chui et al. [9], Jian and Vemuri [13], Wang et al. [14] and Guo et al. [15].

Work reported in Irani and Anandan [16] uses the energy (squared magnitude) in the

directional derivative image as a representation scheme for matching achieved using the

SSD cost function. Recently, Liu et al. [17] reported the use of local frequency in a robust

statistical framework using the integral squared error, a.k.a. L2E. The primary advantage of L2E over other robust estimators in the literature is that it has no tuning parameters. The idea of using local phase was also exploited by Mellor and Brady [18], who used

mutual information (MI) to match local-phase representation of images and estimated

the non-rigid registration between them. However, robustness to significant non-overlap

in the field of view (FOV) of the scanners was not addressed. For more on feature-based

methods, we refer the reader to the recent survey by Zitova and Flusser [19].

In the context of "direct" methods, the primary matching techniques for intra-

modality registration involve the use of normalized cross-correlation, modified SSD,

and (normalized) mutual information (MI). Ruiz-Alzola et al.[20] presented a unified

framework for non-rigid registration of scalar, vector and tensor data based on template

matching. For scalar images, the cost function is the extension of modified SSD using a

different definition of inner products. However this model can only be used on images

from the same modality as it assumes similar intensity values between images. In [21,

22], a level-set based image registration algorithm was introduced that was designed to

non-rigidly register two 3D volumes from the same modality of imaging. This algorithm

was computationally efficient and was used to achieve atlas-based segmentation. Direct

methods based on the optical-flow estimation form a large class for solving the non-rigid

registration problem. Hellier et al.[23] proposed a registration method based on a dense

robust 3-D estimation of the optical flow with a piecewise parametric description of

the deformation field. Their algorithm is unsuitable for multi-modal image registration

due to the brightness constancy assumption. Variants of optical flow-based registration

that accommodate varying illumination may be used for inter-modality registration

and we refer the reader to [24, 25] for such methods. Guimond et al., [26] reported a

multi-modal brain warping technique that uses Thirion's Demons algorithm [27] with

an adaptive intensity correction. The technique however was not tested for robustness

with respect to significant non-overlap in the FOVs. More recently, Cuzol et al. [28]

introduced a new non-rigid image registration technique which basically involves a

Helmholtz decomposition of the flow field, which is then embedded into the brightness

constancy model of optical flow. The Helmholtz decomposition allows one to compute

large displacements when the data contains such displacements. This technique is

an innovation on accommodating for large displacements and not one that allows for

intermodality non-rigid registration. For more on intra-modality methods, we refer the

reader to the comprehensive surveys [29, 19].

A popular framework for "direct" methods is based on the information theoretic

measures [30], among them, mutual information (MI) pioneered by Viola and Wells [31]

and Collignon et al., [32] and modified in Studholme et al., [33] has been effective in

the application of image registration. Reported registration experiments in these works

are quite impressive for the case of rigid motion. The problem of being able to handle

non-rigid deformations in the MI framework is a very active area of research and some

recent papers reporting results on this problem are [18, 34, 35, 36, 37, 38, 39, 40, 41,

42]. In [34], Mattes et al., and in [35], Rueckert et al., presented mutual information

based schemes for matching multi-modal image pairs using B-Splines to represent the

deformation field on a regular grid. Guetter [43] recently incorporated a learned joint

intensity distribution into the mutual information formulation, in which the registration

is achieved by simultaneously minimizing the KL divergence between the observed

and learned intensity distributions and maximizing the mutual information between

the reference and alignment images. Recently, D'Agostino et al., [44] presented an

information theoretic approach wherein tissue class probabilities of each image being

registered are used to match over the space of transformations using a divergence measure

between the ideal case (where tissue class labels between images at corresponding voxels

are similar) and actual joint class distributions of both images. This work expects a

segmentation of either one of the images being registered. Computational efficiency and

accuracy (in the event of significant non-overlaps) are issues of concern in most if not all

the MI-based non-rigid registration methods.

Finally, some registration methods under the direct approach are inspired by

models from mechanics, either from elasticity [45, 46], or fluid mechanics [47, 48].

Fluid mechanics-based models accommodate large deformations, but are computationally expensive. Christensen [49] recently developed an interesting version

of these methods, where the direct deformation field and the inverse deformation field

are jointly estimated to guarantee the symmetry of the deformation with respect to

permutation of input images. A more general and mathematically rigorous treatment

of the non-rigid registration which subsumes the fluid-flow methods was presented in

Trouve [50]. All these methods however are primarily applicable to intra-modality and

not inter-modality registration.

3.2 Multimodal Image Registration using CCRE

Based on CRE, the cross-CRE (CCRE) between two random variables was defined and applied to solve the image alignment problem, which is stated as: Given a pair of images I_1(x) and I_2(x'), where x' = T(x) and T is the unknown parameterized transformation to be determined, define a match metric M(I_1(x), I_2(x')) and maximize/minimize M over all T. The class of transformations can

be rigid, affine, projective or non-rigid transformations. Several matching criteria have

been proposed in the past, some of which were reviewed earlier. Amongst them, mutual

information is very popular and is defined as follows for continuous random variables X and Y:

MI(X, Y) = h(X) + h(Y) − h(X, Y)    (3-1)

where h(X) is the differential entropy of the random variable X, given by h(X) = −∫ p(x) ln p(x) dx, where p(x) is the probability density function and can be estimated
from the image data using any of the parametric and nonparametric methods. The reason

for defining MI in terms of differential entropy as opposed to Shannon entropy is to

facilitate the optimization of MI with respect to the registration parameters using any

of the gradient based optimization methods. Note that MI defined using Shannon's entropy in discrete form will not converge to the continuous case defined here, due to the fact that Shannon's entropy does not converge to the differential entropy (see [5]).
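For reference, the discrete form of MI is easily computed from a joint intensity histogram. The sketch below (our own illustration, not the dissertation's implementation; the 32-bin histogram is an arbitrary choice) uses the equivalent form MI = Σ p(a,b) log[p(a,b)/(p(a)p(b))]:

```python
import numpy as np

def mutual_information(img_a, img_b, bins=32):
    """Discrete MI from a joint histogram:
    MI = sum_{a,b} p(a,b) * log( p(a,b) / (p(a) * p(b)) )."""
    joint, _, _ = np.histogram2d(np.ravel(img_a), np.ravel(img_b), bins=bins)
    p_ab = joint / joint.sum()
    p_a = p_ab.sum(axis=1, keepdims=True)   # marginal of image A
    p_b = p_ab.sum(axis=0, keepdims=True)   # marginal of image B
    nz = p_ab > 0                           # p(a,b) = 0 terms contribute nothing
    return float((p_ab[nz] * np.log(p_ab[nz] / (p_a * p_b)[nz])).sum())
```

For identical images this reduces to the (discrete) entropy of the image; for statistically independent images it tends to zero.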

We now define the cross-CRE (CCRE) using the CRE defined in Eqn. 2-3:

C(X, Y) = S(X) − E[S(X/Y)]    (3-2)

We will use this quantity as a matching criterion in the image alignment problem.

More specifically, let I_T(x) be a test image we want to register to a reference image I_R(x). The transformation g(x; μ) describes the deformation from V_T to V_R, where V_T and V_R are the continuous domains on which I_T and I_R are defined, and μ is the set of transformation parameters to be determined. We pose the task of image registration as an optimization problem. To align the reference image I_R(x) with the transformed test image I_T(g(x; μ)), we seek the set of transformation parameters μ that maximizes C(I_T, I_R) over the space of smooth transformations, i.e.,

μ̂ = arg max_μ C(I_T ∘ g(x; μ), I_R)    (3-3)

The computation of CCRE requires estimates of the marginal and joint probability distributions of the intensity values of the reference and test images. We denote by p(l, k; μ) the joint probability of (I_T ∘ g(x; μ), I_R). Let p_T(l; μ) and p_R(k) represent the marginal probabilities for the test and reference images respectively, and let L_T and L_R be the discrete sets of intensities associated with the test image and reference image respectively. Then, we can rewrite C(I_T ∘ g(x; μ), I_R) as follows:

C(I_T ∘ g(x; μ), I_R) = S(I_T) − E[S(I_T ∘ g(x; μ)/I_R)]
  = −Σ_{λ∈L_T} [∫_λ^∞ p_T(l; μ) dl] log[∫_λ^∞ p_T(l; μ) dl]
    + Σ_{k∈L_R} p_R(k) Σ_{λ∈L_T} [∫_λ^∞ (p(l, k; μ)/p_R(k)) dl] log[∫_λ^∞ (p(l, k; μ)/p_R(k)) dl]    (3-4)

Let P(i > λ; μ) = ∫_λ^∞ p_T(l; μ) dl and P(i > λ, k; μ) = ∫_λ^∞ p(l, k; μ) dl. Using the fact that p_T(l; μ) = Σ_{k∈L_R} p(l, k; μ), we have P(i > λ; μ) = Σ_{k∈L_R} P(i > λ, k; μ). Eqn. (3-4) can be further simplified, which leads to

C(I_T ∘ g(x; μ), I_R)
  = −Σ_{λ∈L_T} P(i > λ; μ) log P(i > λ; μ) + Σ_{k∈L_R} Σ_{λ∈L_T} P(i > λ, k; μ) log[P(i > λ, k; μ)/p_R(k)]    (3-5)
  = Σ_{λ∈L_T} Σ_{k∈L_R} P(i > λ, k; μ) log[ P(i > λ, k; μ) / (p_R(k) P(i > λ; μ)) ]    (3-6)
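Eqn. (3-6) maps directly onto code once the joint distribution has been binned. The sketch below (our own illustration of the formula, not the dissertation's Parzen-window implementation; it starts from an arbitrary joint histogram) builds the joint survival function P(i > λ, k) by a reversed cumulative sum and evaluates the double sum:

```python
import numpy as np

def ccre_from_joint(p_lk):
    """Evaluate Eqn. (3-6):
    C = sum_{lam,k} P(i>lam, k) * log( P(i>lam, k) / (p_R(k) * P(i>lam)) ),
    where P(i>lam, k) = sum_{l > lam} p(l, k) for joint pmf p over (l, k) bins."""
    p_lk = np.asarray(p_lk, dtype=float)
    p_lk = p_lk / p_lk.sum()                 # joint pmf p(l, k)
    p_r = p_lk.sum(axis=0)                   # marginal p_R(k)
    # Inclusive survival sum_{l >= lam} p(l, k), then shift for strict ">".
    surv_incl = np.flip(np.cumsum(np.flip(p_lk, axis=0), axis=0), axis=0)
    surv_lk = np.vstack([surv_incl[1:], np.zeros((1, p_lk.shape[1]))])
    surv_l = surv_lk.sum(axis=1)             # P(i > lam)
    denom = p_r[None, :] * surv_l[:, None]
    nz = (surv_lk > 0) & (denom > 0)         # zero terms contribute nothing
    return float((surv_lk[nz] * np.log(surv_lk[nz] / denom[nz])).sum())
```

For a product (independent) joint distribution the measure is zero, and it grows with the statistical dependence between the two intensity maps, mirroring the behavior of MI.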

To illustrate the difference between CCRE and the now popular information theoretic cost functions such as MI and NMI, we plot these functions against a parameter of the transformation, for illustrative purposes the rotation angle. The image pair used here is an MR & CT image pair that was originally aligned; the MR and CT data intensities range from 0-255, with means 55.6 and 60.6 respectively. The cost functions are computed over the rotation angle that was applied to the CT image to misalign it with respect to the MR image. In each plot of Figure 3-1, the X-axis shows the 3D rotation angle about the Z axis, while the Y-axis shows the values of CCRE, MI and NMI computed from the misaligned (by a rotation) image pairs. The second row shows a zoomed-in view of the plots over a smaller region, so as to get a detailed view of the cost functions. The following observations are made from these plots:


Figure 3-1: CCRE, MI and NMI traces plotted for the misaligned MR & CT image pair, where the misalignment is generated by a rotation of the CT image. First row: over the range −40° to 40°. Second row: zoomed-in view between −0.5° and 0.5°, where the arrows in the first row signify the position. Note that all three cost functions are implemented with trilinear interpolation. Third row: the three cost functions implemented with partial volume interpolation [32].

1. Similar to MI and NMI, the maximum of CCRE occurs at 0° of rotation, which confirms that our new information measure needs to be maximized in order to find the optimum transformation between two misaligned images.

2. CCRE shows a much larger range of values than MI & NMI. This feature plays an important role in the numerical optimization since it leads to a more stable numerical implementation by avoiding cancellation, round-off, etc., that often plague arithmetic operations with smaller numerical values.

3. Upon closer inspection, we observe that CCRE is much smoother than MI and NMI for the MR & CT data pair, which verifies that CCRE is more regular than the other information theoretic measures.

3.2.1 Transformation Model for Non-rigid Motion

We model the non-rigid deformation field between two 3D image pairs using a cubic B-spline basis in 3D. B-splines have a number of desirable properties for use in modeling the deformation field: (1) splines provide inherent control of smoothness (degree of continuity); (2) B-splines are separable in multiple dimensions, which provides computational efficiency. Another feature of B-splines that is useful in a non-rigid registration system is "local control": changing the location of a single control point modifies only a local neighborhood of that control point.

The basic idea of the cubic B-spline deformation is to deform an object by manipulating an underlying mesh of control points. The deformation g is defined on a sparse regular control point grid. In the 3D case, the deformation at any point x = [x, y, z]^T in the test image can be interpolated as a linear combination of cubic B-spline kernels,

g(x) = Σ_j δ_j β^(3)(x/Δμ − j)    (3-7)

where β^(3)(x) = β^(3)(x) β^(3)(y) β^(3)(z) is the separable cubic B-spline kernel and Δμ is the spacing of the control grid. The δ_j are the B-spline expansion coefficients computed from the sample values of the image. For the implementation details, we refer the reader to Forsey [51] and Mattes [34].
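A one-dimensional analogue of Eqn. (3-7) makes the "local control" property concrete. The sketch below (our own minimal illustration, not the implementation of [34, 51]; the 3D case applies the same kernel separably in x, y and z) evaluates the deformation as a weighted sum over the four nearest control points:

```python
import numpy as np

def cubic_bspline(u):
    """Cubic B-spline kernel beta^(3), supported on [-2, 2]."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + u ** 3 / 2.0
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0

def deform_1d(x, coeffs, spacing):
    """g(x) = sum_j delta_j * beta^(3)(x / spacing - j), 1D analogue of Eqn. (3-7)."""
    t = x / spacing
    g = 0.0
    # Only the 4 control points with |t - j| < 2 contribute ("local control").
    lo = max(0, int(np.floor(t)) - 1)
    hi = min(len(coeffs), int(np.floor(t)) + 3)
    for j in range(lo, hi):
        g += coeffs[j] * cubic_bspline(t - j)
    return g
```

Because the cubic B-splines form a partition of unity, a constant coefficient field reproduces a constant deformation, and moving one distant control point leaves g unchanged at x.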

3.2.2 Measure Optimization

Calculation of the gradient of the energy function is necessary for its efficient and robust maximization. The gradient of CCRE is given as

∇C = [∂C/∂μ_1, ..., ∂C/∂μ_n]^T    (3-8)

Each component of the gradient can be found by differentiating Eqn. (3-4) with respect to a transformation parameter. We consider the two terms in Eqn. (3-4) separately when computing the derivative. For the first term in Eqn. (3-4), we have,

∂S(I_T)/∂μ = −∂/∂μ Σ_{λ∈L_T} [∫_λ^∞ p_T(l; μ) dl] log[∫_λ^∞ p_T(l; μ) dl]
  = −Σ_{λ∈L_T} [log P(i > λ; μ) + 1] ∂P(i > λ; μ)/∂μ

where P(i > λ; μ) = ∫_λ^∞ p_T(l; μ) dl, and

∂P(i > λ; μ)/∂μ = ∫_λ^∞ ∂p_T(l; μ)/∂μ dl
The derivative of the second term is given by,

∂E[S(I_T ∘ g(x; μ)/I_R)]/∂μ = −Σ_{k∈L_R} Σ_{λ∈L_T} [log (P(i > λ, k; μ)/p_R(k)) + 1] ∂P(i > λ, k; μ)/∂μ

where P(i > λ, k; μ) = ∫_λ^∞ p(l, k; μ) dl, and

∂P(i > λ, k; μ)/∂μ = ∫_λ^∞ ∂p(l, k; μ)/∂μ dl
Combining the derivatives of the two terms, and using the fact that

∂p_T(l; μ)/∂μ = Σ_{k∈L_R} ∂p(l, k; μ)/∂μ

we have the analytic gradient of CCRE:

∂C(I_T ∘ g(x; μ), I_R)/∂μ
  = −Σ_{λ∈L_T} Σ_{k∈L_R} [log P(i > λ; μ) + 1] ∂P(i > λ, k; μ)/∂μ
    + Σ_{λ∈L_T} Σ_{k∈L_R} [log (P(i > λ, k; μ)/p_R(k)) + 1] ∂P(i > λ, k; μ)/∂μ
  = Σ_{λ∈L_T} Σ_{k∈L_R} log[ P(i > λ, k; μ) / (p_R(k) P(i > λ; μ)) ] ∂P(i > λ, k; μ)/∂μ

note that in the derivation we use the fact that P(i > λ; μ) = Σ_{k∈L_R} P(i > λ, k; μ).

Comparing the expressions for CCRE and the derivative of CCRE,

C(I_T ∘ g(x; μ), I_R) = Σ_{λ∈L_T} Σ_{k∈L_R} log[ P(i > λ, k; μ) / (p_R(k) P(i > λ; μ)) ] × P(i > λ, k; μ)
∂C(I_T ∘ g(x; μ), I_R)/∂μ = Σ_{λ∈L_T} Σ_{k∈L_R} log[ P(i > λ, k; μ) / (p_R(k) P(i > λ; μ)) ] × ∂P(i > λ, k; μ)/∂μ    (3-15)

we note that the two formulas in (3-15) are similar to each other and share the common term log[P(i > λ, k; μ)/(p_R(k) P(i > λ; μ))]. From a computational viewpoint, this is quite beneficial, since the common term not only saves memory space but also makes the calculation of the gradient more efficient. From the formulation, we can also see that the calculation of CCRE and of its derivative requires a method to estimate P(i > λ, k; μ) and ∂P(i > λ, k; μ)/∂μ. We address the computation of these terms in the next subsection.

3.2.3 Computation of P(i > λ, k; μ) and ∂P(i > λ, k; μ)/∂μ

We use the Parzen window technique to estimate the cumulative distribution function and its derivative. The calculation of P(i > λ, k; μ) requires estimates of the cumulative probability distributions of the intensity values of the reference and test images. Let β^(0) be a zero-order spline Parzen window (a centered unit pulse) and β^(3) be a cubic spline Parzen window; the smoothed joint probability of (I_R, I_T ∘ g) is given by

p(l, k; μ) = α Σ_x β^(0)(k − (I_R(x) − f_R^0)/Δb_R) β^(3)(l − (I_T(g(x; μ)) − f_T^0)/Δb_T)    (3-16)

where α is a normalization factor that ensures Σ p(l, k) = 1, and I_R(x) and I_T(g(x; μ)) are samples of the reference and interpolated test images respectively, each normalized by its minimum intensity value, f_R^0 or f_T^0, and by the intensity range of each bin, Δb_R or Δb_T.

Since P(i > λ, k; μ) = ∫_λ^∞ p(l, k; μ) dl, we have the following:

P(i > λ, k; μ) = ∫_λ^∞ p(l, k; μ) dl
  = α Σ_x β^(0)(k − (I_R(x) − f_R^0)/Δb_R) ∫_λ^∞ β^(3)(l − (I_T(g(x; μ)) − f_T^0)/Δb_T) dl
  = α Σ_x β^(0)(k − (I_R(x) − f_R^0)/Δb_R) Φ(λ − (I_T(g(x; μ)) − f_T^0)/Δb_T)    (3-17)

where Φ(v) = ∫_v^∞ β^(3)(u) du is the cumulative residual function of the cubic spline kernel, defined as

Φ(v) = 1.0                              for v < −2
       1.0 − (v + 2)^4/24               for −2 ≤ v < −1
       1/2 − 2v/3 + v^3/3 + v^4/8       for −1 ≤ v < 0
       1/2 − 2v/3 + v^3/3 − v^4/8       for 0 ≤ v < 1
       (v − 2)^4/24                     for 1 ≤ v < 2
       0                                for v ≥ 2    (3-18)
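The closed form of Φ can be cross-checked by integrating β^(3) numerically. The sketch below (our own verification aid; the two middle branches are written compactly via sign(v)) implements both the kernel and its cumulative residual function:

```python
import numpy as np

def beta3(u):
    """Cubic B-spline kernel beta^(3)(u), supported on [-2, 2]."""
    u = abs(u)
    if u < 1.0:
        return 2.0 / 3.0 - u * u + u ** 3 / 2.0
    if u < 2.0:
        return (2.0 - u) ** 3 / 6.0
    return 0.0

def phi(v):
    """Closed-form Phi(v) = integral of beta^(3)(u) du from v to infinity."""
    if v < -2.0:
        return 1.0
    if v < -1.0:
        return 1.0 - (v + 2.0) ** 4 / 24.0
    if v < 1.0:  # the two middle branches differ only in the sign of v^4/8
        return 0.5 - 2.0 * v / 3.0 + v ** 3 / 3.0 - np.sign(v) * v ** 4 / 8.0
    if v < 2.0:
        return (v - 2.0) ** 4 / 24.0
    return 0.0
```

Evaluating Φ at the break points gives Φ(−2) = 1, Φ(0) = 1/2 and Φ(2) = 0, as required of the survival function of a symmetric unit-mass kernel.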

Noting that dΦ(u)/du = −β^(3)(u), we can take the derivative of Eqn. 3-17 with respect to μ to get

∂P(i > λ, k; μ)/∂μ = (α/Δb_T) Σ_x β^(0)(k − (I_R(x) − f_R^0)/Δb_R) β^(3)(λ − (I_T(g(x; μ)) − f_T^0)/Δb_T)
  × (∂I_T(t)/∂t |_{t=g(x;μ)}) (∂g(x; μ)/∂μ)    (3-19)

where ∂I_T(t)/∂t is the image gradient.

3.2.4 Algorithm Summary

The registration algorithm can be summarized as follows:

1. For the current deformation field, interpolate the test image to obtain I_T ∘ g(x; μ). Calculate P(i > λ, k; μ) and ∂P(i > λ, k; μ)/∂μ using Eqn. (3-17) and Eqn. (3-19).

2. Compute P(i > λ; μ) as Σ_{k∈L_R} P(i > λ, k; μ), which is used to calculate the common term in both CCRE and the gradient of CCRE, i.e., log[P(i > λ, k; μ)/(p_R(k) P(i > λ; μ))].

3. Compute the energy function and its gradient using the formulas given in Eqn. (3-15); we can then use the quasi-Newton method to numerically solve the optimization problem.

4. Update the deformation field g(x; μ). Stop the registration process if the difference in consecutive iterates is less than ε = 0.01, a pre-chosen tolerance; otherwise go to Step 1.
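Steps 1-4 amount to a gradient-based ascent on the CCRE objective. The skeleton below is our own schematic of that loop (the dissertation uses a quasi-Newton method; plain gradient ascent is shown here for brevity, and ccre_and_gradient is a hypothetical callback standing in for Eqns. (3-15), (3-17) and (3-19)):

```python
import numpy as np

def register(mu0, ccre_and_gradient, step=0.1, tol=1e-6, max_iter=500):
    """Gradient-ascent skeleton of Steps 1-4. `ccre_and_gradient(mu)` is
    assumed to return (C, dC/dmu); CCRE is maximized, so we step uphill."""
    mu = np.asarray(mu0, dtype=float)
    for _ in range(max_iter):
        c, grad = ccre_and_gradient(mu)        # Steps 1-3: energy and gradient
        mu_new = mu + step * np.asarray(grad)  # ascend the objective
        if np.linalg.norm(mu_new - mu) < tol:  # Step 4: stop on a small update
            return mu_new
        mu = mu_new
    return mu
```

With a callback returning the value and gradient of any smooth concave objective, the loop converges to its maximizer; swapping in a quasi-Newton update (e.g., BFGS) changes only how the step is computed.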

3.3 Implementation Results

In this section, we present the results of applying our non-rigid registration algorithm to several data sets. The results are presented for synthetic as well as real data. The first set of experiments was done with synthetic motion. We show the advantage of using the CCRE measure in comparison to other information theoretic registration methods: CCRE is not only more robust, but also converges faster than the others. We begin by applying CCRE to register image pairs for which the ground truth was available.

3.3.1 Synthetic Motion Experiments

In this section, we demonstrate the robustness property of CCRE and make a case for its use over mutual information in the alignment problem. The case will be made via experiments depicting faster convergence speed and superior performance under noisy inputs in matching image pairs misaligned by a synthetic non-rigid motion. Additionally, we will depict a larger capture range over MI-based methods in the estimation of the motion parameters.

The data we use for this experiment are corresponding slices from an MR T1 and T2 image pair obtained from the BrainWeb site at the Montreal Neurological Institute [52]; the two images are originally aligned with each other. They are defined on a 1mm isotropic voxel grid in the Talairach space, with dimension (256 × 256). We then apply a known non-rigid transformation to the T2 image, and the goal is to recover this deformation by applying our registration method. The mutual information scheme which we compare against was originally reported in the literature [34, 53], in which the explicit gradient forms are presented, thus allowing for the application of gradient based optimization methods.

Convergence speed

In order to compare the convergence speed of CCRE versus MI, we design the experiment as follows: with the MR T1 & T2 image pair as our data, we choose the MR T1 image as the source; the target image was obtained by applying a known smooth non-rigid transformation that was procedurally generated. Notice the significant difference between the intensity profiles of the source and target images. For comparison purposes, we use the same gradient descent optimization scheme, let the two registration methods run for the same amount of time, and show the registration results visually and quantitatively.



Figure 3-2: Upper left, MR T1 image as source image; upper right, deformed MR T2 image as target image; lower left and right, results of the estimated transformations using CCRE and MI applied to the source, respectively. Both algorithms ran for 30 seconds using the same gradient descent technique.

The source and target image pair along with the results of estimated transformation

using CCRE and MI applied to the source are shown in Figure 3-2. As evident visually,

we observe that the result generated by CCRE is more similar in shape with the target

image than the one produced by MI.

Quantitative assessment of the accuracy of the registration is presented subsequently in Figure 3-3, where we plot the change in the mean deformation error (MDE) obtained for the CCRE-based algorithm and the MI-based algorithm respectively. MDE is defined as d_m = (1/|R|) Σ_{x_i∈R} ‖g_0(x_i) − g(x_i)‖, where g_0(x_i) and g(x_i) are the ground truth and estimated displacements respectively at voxel x_i, ‖·‖ denotes the Euclidean norm, and |R| is the volume (number of voxels) of the region of interest. In both cases the mean deformation error decreases with time, but the solid line decreases faster than the dotted line. For example, it takes about 5 minutes for MI to reach an error level below 1.2, while CCRE requires only about half of that time. This


Figure 3-3: Plot demonstrating the change of Mean Deformation Error for CCRE and
MI registration results with time. Solid line shows the MDE for CCRE registration result,
while dotted line illustrates the MDE for MI result.

empirically validates the faster convergence speed of the CCRE-based algorithm over the MI-based algorithm.

Registration accuracy

Using the same experimental setting as in the previous experiment, we present the registration error of our algorithm in the estimated non-rigid deformation field as an indicator of the accuracy of the estimated deformations. Figure 3-4 depicts the results

Figure 3-4: Results of application of our algorithm to synthetic data (see text for details).

obtained for this image pair, which is organized as follows, from left to right: the first row depicts the source image with the target image segmentation superposed to depict the amount of misalignment, the registered source image obtained using our algorithm superposed with the target segmentation, followed by the target image; the second row depicts the ground truth deformation field which we used to generate the target image from the MR T2 image, the estimated non-rigid deformation field, followed by the histogram of the estimated magnitude error. Note that the error distribution is mostly concentrated in the small error range, indicating the accuracy of our method. As a measure of accuracy of our method, we also estimated the average, μ, and the standard deviation, σ, of the error in the estimated non-rigid deformation field. The error was estimated as the angle between the ground truth and estimated displacement vectors. The average and standard deviation are 1.5139 and 4.3211 (in degrees) respectively, which is quite accurate.

Noise immunity

In the next experiment, we compare the robustness of the two methods (CCRE, MI) in the presence of noise. Again selecting the MR T1 image slice from the previous experiment as our source image, we generate the target image by applying a fixed smooth synthetic deformation field. We conduct this experiment by varying the amount of Gaussian noise added, and then for each instance of the added noise, we register the two images using the two techniques. We expect both schemes to fail at some level of noise ("failed" here means that the optimization algorithm diverged). By comparing the noise magnitude at the failure point, we can show the degree to which these methods are noise tolerant. The numerical schemes we used to implement these registrations are all based on the BFGS quasi-Newton algorithm.

The mean magnitude of the synthetic motion is 4.37 pixels, with a standard deviation of 1.8852. Table 3-1 shows the registration results for the two schemes. From the table, we observe that MI fails when the standard deviation of the noise is increased to 40, while CCRE is tolerant until 66, a significant difference when compared to MI.

Table 3-1: Comparison of the registration results between CCRE and MI for a fixed synthetic deformation field.

       CCRE                  MI
σ      MDE     Std. Dev.     MDE     Std. Dev.
10     1.0816  0.9345        1.3884  1.4538
19     1.1381  1.1702        1.4871  1.5052
30     1.1975  1.3484        1.5204  1.5615
40     1.3373  1.6609        FAIL
60     1.3791  1.9072

This experiment conclusively depicts that CCRE has more noise immunity than MI when dealing with non-rigid motion.

Partial overlap

Figure 3-5 depicts an example of registration of the MR T1 and T2 data sets with large non-overlap. The left image of the figure depicts the MR T1 brain scan as the source image, and the right image shows the MR T2 data as the target. Note that the FOVs of the data sets are significantly non-overlapping. The non-overlap was simulated by cutting 66% of the MR T1 (source) image. The middle column depicts the transformed source image along with an edge map of the target (deformed MR T2 image) superimposed on the transformed source. As is evident, the registration is visually quite accurate.

Figure 3-5: Registration results of an MR T1 and T2 image slice with large non-overlap. (left) MR T1 source image before registration; (right) deformed T2 target image; (middle) the transformed MR image superimposed with the edge map from the target image.

3.3.2 Real Data Experiments

In this section, we present the performance of our method on a series of CT & MR data sets containing real non-rigid misalignments. For the purpose of comparison, we also apply traditional MI, implemented as presented in Mattes et al. [34], to these same data sets. The CT image is of size (512, 512, 120) while the MR image size is (512, 512, 142), and the voxel dimensions are (0.46, 0.46, 1.5) mm and (0.68, 0.68, 1.05) mm for CT and MR respectively. The registration was performed on reduced volumes (210 × 210 × 120) with the control knots placed every 16 × 16 × 16 voxels. The program was written in the C++ programming language, and all experiments were run on a 2.6 GHz Pentium PC.

Table 3-2: Comparison of total time taken to achieve registration by CCRE and MI.

Data set       1     2     3      4      5      6      7      8
CCRE time (s)  4827  3452  4345   4038   3910   4510   5470   3721
MI time (s)    9235  6344  10122  17812  12157  11782  13157  10057

We have used a set of eight volumes of CT data, and the task was to register these eight volumes to the MR data, chosen as the target image for all registrations, using both the CCRE and MI algorithms. Note that all CT & MR volumes are from different subjects and thus contain real non-rigid motion. The parameters used with both algorithms were identical. For both algorithms, the optimization of the cost functions was halted when improvements of at least 0.0001 in the cost function could not be detected.

The times required for registering all data sets with our algorithm as well as with the MI method are given in Table 3-2. The table shows that, on average, our CCRE algorithm is about 2.5 times faster than the traditional MI approach for this set of experiments. For brevity,

we only show one registration result, in Figure 3-6. Here, one slice of the volume is shown in the first row, with the source CT image on the left and the reference image on the right. The middle image shows the transformed CT image slice superimposed with the edge map from the target image. In the second row, the source image superimposed with the edge map from the target image is shown on the left, while shown in the middle and on the right are the surfaces reconstructed from the transformed source using the CCRE method and from the target MR image respectively. From this figure, we can see that the source and target images depict considerable non-rigid changes in shape; nevertheless, our method was able to register these two images quite accurately. To validate the conformity of the two reconstructed surfaces, we randomly sample 30 points from the surface of the transformed source (obtained using CCRE) and then estimate the distances of these points to the surface of the target MR volume. The average of these distances is about 0.47mm, which indicates very good agreement between the two surfaces. The resemblance of the reconstructed shapes from the transformed source to the target indicates that our CCRE algorithm succeeded in matching the source CT volume to the target MR image.

Figure 3-6: Registration results for different subjects of MR & CT brain data with real non-rigid motion (see text for details).

The accuracy of the information theoretic algorithm for non-rigid registration problems was assessed quantitatively by means of a region-based segmentation task [54]. ROIs (whole brain, eyes) were segmented automatically in the eight CT data sets used as source images, and binary masks were created. The deformation fields between the CT and MR volumes were computed and used to project the masks from each of the CT volumes to the MR volume. Contours were manually drawn on a few slices chosen at random in the MR volume (four slices/volume). Manual contours on MR and contours obtained automatically were then compared using an accepted similarity index, defined as two times the number of pixels in the intersection of the contours divided by the sum of the number of pixels within each contour [41]. This index varies between zero (complete disagreement) and one (complete agreement) and is sensitive to both displacement and differences in size and shape. Table 3-3 lists mean values of the similarity index for each structure. It is customarily accepted that a value of the similarity index above 0.80 indicates very good agreement between contours. Our results are well above this value. For comparison purposes, we also computed the same index for the MI method. We can conclude from the table that CCRE achieves better registration accuracy than MI for the task of non-rigid registration of real multi-modal images.
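The similarity index described above is the Dice coefficient over binary masks; a minimal sketch of it (our own illustration):

```python
import numpy as np

def similarity_index(mask_a, mask_b):
    """S = 2 * |A intersect B| / (|A| + |B|); 0 = disjoint, 1 = identical."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```

Two masks of equal size that overlap on two thirds of their pixels, for example, score 2/3; identical masks score exactly 1.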

Table 3-3: Comparison of the similarity index S of several brain structures for CCRE and MI.

Volume              1      2      3      4      5      6      7      8
CCRE  Whole Brain   0.987  0.996  0.974  0.962  0.975  0.967  0.988  0.981
      Left Eye      0.925  0.935  0.925  0.907  0.875  0.890  0.834  0.871
      Right Eye     0.840  0.940  0.891  0.872  0.851  0.829  0.910  0.921
MI    Whole Brain   0.986  0.981  0.976  0.96   0.950  0.961  0.942  0.952
      Left Eye      0.911  0.893  0.904  0.791  0.853  0.810  0.851  0.853
      Right Eye     0.854  0.917  0.889  0.814  0.849  0.844  0.897  0.854


Matching point patterns is ubiquitous in many fields of engineering and science, e.g., medical imaging, sports science, archaeology, and others. Point sets are widely used in computer vision to represent boundary points of shapes contained in images, or any other salient features of objects contained in images. Given two or more images represented using the salient features contained therein, more often than not one is interested in matching these (feature) point patterns to determine a linear or a nonlinear transformation between the coordinates of the feature point sets. Such transformations capture the changes in the pattern geometry characterized by the given feature point sets.

The primary technical challenge in using point-set representations of shapes is the correspondence problem. Typically, correspondences can be estimated once the point-sets are properly aligned with appropriate spatial transformations. If the objects at hand are deformable, the adequate transformation would obviously be a non-rigid spatial mapping. Solving for non-rigid deformations between point-sets with unknown correspondence is a hard problem. In fact, many current methods only attempt to solve for an affine transformation for the alignment [55]. Furthermore, we also encounter the bias problem in groupwise point-set registration: if one arbitrarily chooses any one of the given data sets as a reference, the estimated registration transformation will be biased toward this chosen reference, and it would be desirable to avoid such a bias. The question that arises is: how do we align all the point-sets in a symmetric manner so that there is no bias toward any particular point-set?

To overcome these aforementioned problems, we present a novel approach to simultaneously register multiple point-sets and construct the atlas. The idea is to model each point-set by a kernel probability density or distribution, and then quantify the distance between these probability densities or distributions using information-theoretic measures. Figure 4-1 illustrates this idea: the right column of the figure shows the density function corresponding to each corpus callosum point-set shown on the left. The distance is

Figure 4-1: Illustration of corpus callosum point-sets represented as density functions.

optimized over a space of coordinate transformations, yielding the desired registrations. Once all the point-sets are deformed into the same shape, the distance measure between the resulting distributions is minimized, since all the distributions are then identical to each other. We impose regularization on each deformation field to prevent over-deforming of the point-sets (e.g., all the point-sets collapsing onto a single data point).

The rest of the chapter is organized as follows: we begin by reviewing the related literature, followed by a description of the divergence measures we use to quantify the distance between densities or distributions. We then present the details of our energy function and an empirical way of estimating the cost functions and their derivatives. Finally, we show experimental results at the end of this chapter.

4.1 Previous Work

Extensive studies on atlas construction for deformable shapes can be found in the literature, covering both theoretical and practical issues relating to computer vision and pattern recognition. According to the shape representation used, these methods can be classified into two distinct categories: methods dealing with shapes represented by feature point-sets, and everything else, including methods in which shapes are represented by curves and surfaces of the shape boundary, parameterized either intrinsically or extrinsically (e.g., using point locations and spline coefficients).

The work presented in [56] is a representative method using an intrinsic curve parameterization to analyze deformable shapes. Shapes are represented as elements of infinite-dimensional spaces, and their pairwise differences are quantified by the lengths of geodesics connecting them in these spaces; the intrinsic mean (Karcher mean) can be computed as the point on the manifold (of shapes) that minimizes the sum of squared geodesic distances from this unknown point to each individual shape on the manifold. However, the method is limited to closed curves and has not been extended to 3D surface shapes. For methods using intrinsic curve or surface representations [56, 57, 58], further statistical analysis on these representations is much more difficult than analysis on point representations, but the reward may be higher due to the use of an intrinsic higher-order representation.

Among the methods using point-set parameterizations, the idea of using non-rigid spatial mapping functions, specifically thin-plate splines [59, 60, 61], to analyze deformable shapes has been widely adopted. Bookstein's work [59] successfully initiated the research efforts on the use of thin-plate splines to model the deformation of shapes. This method is landmark-based; it avoids the correspondence problem, since the placement of corresponding points is driven by the visual perception of experts, but it suffers from the typical problems besetting landmark methods, e.g., inconsistency. Several significant articles on robust and non-rigid point-set matching have been published by Rangarajan and collaborators [62, 60, 63] using thin-plate splines. In their recent work [60], they extend their approach to the construction of a mean shape from a set of unlabeled shapes represented by unlabeled point-sets. The main strength of their work is the ability to jointly determine the correspondences and the non-rigid transformation between each point-set and the emerging mean shape using deterministic annealing and soft-assign. However, in their work, the stability of the registration result is not guaranteed in the presence of outliers, and hence a good stopping criterion is required. Unlike their approach, we do not need to first solve a correspondence problem in order to subsequently solve a non-rigid registration problem.

The active shape model proposed in [64] used points to represent deformable shapes. This work pioneered the effort of building point distribution models to understand deformable shapes [64, 65]. Objects are represented by carefully defined landmark points, and the variation of shapes is modeled using principal component analysis. These landmark points are acquired through a more or less manual landmarking process in which an expert goes through all the samples to mark corresponding points on each sample; it is a rather tedious process and its accuracy is limited. In recent work [66], the authors attempt to overcome this limitation by automatically solving for the correspondences in a non-rigid setting. The resulting algorithm is very similar to the earlier work in [58] and is restricted to curves. The work in [55] also uses 2D points to learn shape statistics, quite similar to the active shape model method except that more attention is paid to the process of generating sample point-sets from the shape. Unlike our method, the transformations between curves are limited to rigid mappings, and the process is not

There are several papers in the point-set alignment literature which bear a close relation to the research reported here. For instance, Tsin and Kanade [67] proposed a kernel correlation based point-set registration approach in which the cost function is proportional to the correlation of two kernel density estimates. It is similar to our work since we too model each of the point-sets by a kernel density function, quantify the (dis)similarity between them using an information-theoretic measure, and then optimize the (dis)similarity function over a space of coordinate transformations, yielding the desired transformation. The difference lies in the fact that the divergence measures used in our work are considerably more general than the information-theoretic measure used in [67], and can be easily extended to multiple point-sets. More recently, in [68], Glaunes et al. convert the point matching problem into an image matching problem by treating points as delta functions; they then "lift" these delta functions and diffeomorphically match them. The main drawback of this technique is that it requires a 3D spatial integral which must be computed numerically, which we avoid due to the empirical computation of the divergence measures. We will show in the experimental results that our method, when applied to match point-sets, achieves very good performance in terms of both robustness and accuracy.

4.2 Divergence Measures

In probability theory and information theory, divergence measures generally stand for measures that quantify the "distance" between probability distributions. Given multiple distributions, a divergence serves as a measure of cohesion among them. Since we are dealing with groupwise point-sets, which will be represented as multiple probability densities/distributions, we focus on divergence measures between multiple distributions.

4.2.1 Jensen-Shannon Divergence

Many information and divergence measures exist in the literature on information theory and statistics. The most famous among them is the Kullback-Leibler (KL) divergence. The KL divergence (also known as the relative entropy) between two densities p and q is defined as

D_KL(p || q) = ∫ p(x) log [ p(x) / q(x) ] dx

It is convex in p, non-negative (though not necessarily finite), and is zero if and only if p = q. In information theory, it has an interpretation in terms of the length of encoded messages from a source which emits symbols according to a probability density function. While the familiar Shannon entropy gives a lower bound on the average length per symbol a code can achieve, the KL-divergence between p and q gives the penalty (in length per symbol) incurred by encoding a source with density p under the assumption that it really has density q; this penalty is commonly called redundancy.

To illustrate this, consider the Morse code, designed to send messages in English. The Morse code encodes the letter "E" with a single dot and the letter "Q" with a sequence of four dots and dashes. Because "E" is used frequently in English and "Q" seldom, this makes for efficient transmission. However, if one wanted to use the Morse code to send messages in Chinese pinyin, which might use "Q" more frequently, one would find the code less efficient. If we assume, counterfactually, that the Morse code is optimal for English, this difference in efficiency is the redundancy.

Notice that the KL divergence is not symmetric; a popular way to symmetrize it is

J(p, q) = (1/2) [ D_KL(p || q) + D_KL(q || p) ]

which is called the J-divergence. The Jensen-Shannon (JS) divergence, first introduced in [69], serves as a measure of cohesion between multiple probability distributions. It has been used by some researchers as a dissimilarity measure for image registration and retrieval applications [70, 71] with very good results. It has many desirable properties; to name a few: 1) the square root of the JS-divergence (when its parameter is fixed at 0.5) is a metric [72]; 2) the JS-divergence relates to other information-theoretic functionals, such as the relative entropy or the Kullback divergence, and hence shares their mathematical properties as well as their intuitive appeal; 3) the distributions compared using the JS-divergence can be weighted, which allows one to take into account the different sizes of the point-set samples from which the probability distributions are computed; 4) the JS-divergence also allows a different number of cluster centers in each point-set; there is no requirement that the cluster centers be in correspondence, as is required by Chui et al. [73]. Given n probability density functions p_i, i ∈ {1, ..., n}, the JS-divergence of the p_i is defined by

JS_π(p_1, p_2, ..., p_n) = H( Σ_i π_i p_i ) − Σ_i π_i H(p_i)    (4-1)

where π = {π_1, π_2, ..., π_n | π_i > 0, Σ_i π_i = 1} are the weights of the probability density functions p_i and H(p_i) is the Shannon entropy. The two terms on the right hand side of Equation (4-1) are the entropy of p := Σ_i π_i p_i (the π-convex combination of the p_i's) and the same convex combination of the respective entropies. We can show that the JS-divergence can be derived from the KL divergence:

JS_α(p_1, p_2) = α KL(p_1, α p_1 + (1 − α) p_2) + (1 − α) KL(p_2, α p_1 + (1 − α) p_2)    (4-2)

where α ∈ (0, 1) is a fixed parameter; we will also consider its straightforward generalization to n distributions.
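As a sanity check of Equation (4-1), a short sketch computing the JS-divergence for discrete densities; the example distributions and weights below are illustrative:

```python
import numpy as np

def shannon_entropy(p):
    """H(p) = -sum p log p over nonzero entries (natural log)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def js_divergence(dists, weights):
    """JS_pi(p_1..p_n) = H(sum_i pi_i p_i) - sum_i pi_i H(p_i), Eqn (4-1)."""
    dists = [np.asarray(p, dtype=float) for p in dists]
    mixture = sum(w * p for w, p in zip(weights, dists))
    return shannon_entropy(mixture) - sum(
        w * shannon_entropy(p) for w, p in zip(weights, dists))

p1 = np.array([0.5, 0.5, 0.0])
p2 = np.array([0.0, 0.5, 0.5])
d = js_divergence([p1, p2], [0.5, 0.5])
print(d)  # 0.5 * ln 2 ≈ 0.3466; the divergence is zero when p1 == p2
```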

4.2.2 CDF-JS Divergence

In Chapter 2, we defined an entropy measure based on the probability distribution instead of the density function. The distribution function is more regular because it is defined in an integral form, unlike the density function, which is the derivative of the distribution. The definition of the Cumulative Residual Entropy (CRE) also preserves the well-established principle that the logarithm of the probability of an event should represent the information content in the event, and CRE was shown to be more immune to noise and outliers. Based on this idea, we can define a KL-divergence measure between Cumulative Distribution Functions (CDFs).

Definition: Let Pr(X_1 > x) and Pr(X_2 > x) be the cumulative residual distributions of two random variables X_1 and X_2 respectively. We define the CDF-KL divergence by

KL_CD(P_1, P_2) = ∫ Pr(X_1 > x) ln [ Pr(X_1 > x) / Pr(X_2 > x) ] dx    (4-3)

Following the same relationship between the Jensen-Shannon divergence and the KL divergence, we can derive the so-called CDF-JS divergence (denoted J) from the definition of the CDF-KL divergence; the result is given in the following theorem.

Theorem 5 Given n probability distributions P_i, i ∈ {1, ..., n}, the CDF-JS divergence of the P_i is given by

J(P_1, P_2, ..., P_n) = S( Σ_i π_i P_i ) − Σ_i π_i S(P_i)    (4-4)

where π = {π_1, π_2, ..., π_n | π_i > 0, Σ_i π_i = 1} are the weights of the probability distributions P_i and S is the Cumulative Residual Entropy defined in Eqn. (2-3) of Chapter 2.
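A minimal empirical sketch of the ingredients of Theorem 5, assuming nonnegative samples and a uniform evaluation grid; the sample data, grid, and helper names are illustrative, and the survival functions are estimated from finite samples rather than computed in closed form:

```python
import numpy as np

def cre(samples, grid):
    """Empirical Cumulative Residual Entropy S(X) = -∫ Pr(X > x) ln Pr(X > x) dx,
    approximated on a uniform grid (assumes nonnegative samples)."""
    samples = np.asarray(samples, dtype=float)
    surv = np.array([(samples > x).mean() for x in grid])
    dx = grid[1] - grid[0]
    m = (surv > 0) & (surv < 1)   # -t ln t vanishes at t = 0 and t = 1
    return -np.sum(surv[m] * np.log(surv[m])) * dx

def cdf_js(sample_sets, grid, weights=None):
    """Empirical CDF-JS, Eqn (4-4): CRE of the weighted-average survival
    function minus the weighted average of the individual CREs."""
    n = len(sample_sets)
    w = np.full(n, 1.0 / n) if weights is None else np.asarray(weights, float)
    survs = [np.array([(np.asarray(s) > x).mean() for x in grid])
             for s in sample_sets]
    dx = grid[1] - grid[0]
    def s_of(surv):
        m = (surv > 0) & (surv < 1)
        return -np.sum(surv[m] * np.log(surv[m])) * dx
    mix = sum(wi * sv for wi, sv in zip(w, survs))
    return s_of(mix) - sum(wi * s_of(sv) for wi, sv in zip(w, survs))

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 1, 2000)
x2 = rng.uniform(0, 1, 2000)
grid = np.linspace(0, 1.5, 301)
print(cdf_js([x1, x1], grid))  # 0: identical sample sets
print(cdf_js([x1, x2], grid))  # small but nonnegative: same distribution
```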

Proof: Without loss of generality, we prove the two random variable case, for which the CDF-JS can be written as follows:

J(P_1, P_2)
  = α KL_CD(P_1, P) + (1 − α) KL_CD(P_2, P)
  = α ∫ Pr(X_1 > x) ln [ Pr(X_1 > x) / Pr(X > x) ] dx + (1 − α) ∫ Pr(X_2 > x) ln [ Pr(X_2 > x) / Pr(X > x) ] dx
  = α ∫ Pr(X_1 > x) ln Pr(X_1 > x) dx + (1 − α) ∫ Pr(X_2 > x) ln Pr(X_2 > x) dx
    − ∫ [ α Pr(X_1 > x) + (1 − α) Pr(X_2 > x) ] ln Pr(X > x) dx
  = −α S(P_1) − (1 − α) S(P_2) − ∫ [ α Pr(X_1 > x) + (1 − α) Pr(X_2 > x) ] ln Pr(X > x) dx

where P is the distribution function corresponding to the density function p = α p_1 + (1 − α) p_2, the convex combination of the two probability densities, so that

Pr(X > x) = ∫_x^∞ [ α p_1(t) + (1 − α) p_2(t) ] dt = α Pr(X_1 > x) + (1 − α) Pr(X_2 > x)

Consequently, the CDF-JS divergence for two random variables can be rewritten as

J(P_1, P_2) = −∫ Pr(X > x) ln Pr(X > x) dx − α S(P_1) − (1 − α) S(P_2)
            = S(P) − α S(P_1) − (1 − α) S(P_2)    ∎

4.3 Methodology
In this section, we present the details of the proposed simultaneous atlas construction and
non-rigid registration method. Note that atlas construction normally requires the task of





non-rigid registration following which an atlas is constructed from the registered data.

However, in our work, the atlas emerges as a byproduct of the non-rigid registration. The

basic idea is to model each point set by a probability density or distribution function, then

quantify the distance between these functions using an information-theoretic measure.

The distance measure is optimized over a space of coordinate transformations yielding the

desired transformations. We will begin by presenting the energy function for solving the

groupwise point-sets registration problem.

4.3.1 Energy Function for Groupwise Point-sets Registration

We use the following notation: the data point-sets are denoted by {X^p, p ∈ {1, ..., N}}. Each point-set X^p consists of points {x_i^p ∈ R^D, i ∈ {1, ..., n_p}}. The atlas point-set is denoted by Z. Assume that each point-set X^p is related to Z via a function f^p, and let μ^p be the set of transformation parameters associated with f^p. To compute the mean shape from these point-sets and register them to the emerging mean shape, we need to recover these transformation parameters. This can be modeled as an optimization problem whose objective function is the JS-divergence or CDF-JS divergence between the distributions of the deformed point-sets, represented as P_i = p(f^i(X^i)). The atlas construction problem can now be formulated as
min_{f^1, ..., f^N}  D(P_1, P_2, ..., P_N) + λ Σ_{i=1}^N || L f^i ||^2    (4-8)

In (4-8), D is the divergence measure for multiple distributions, for which we propose to use either the JS divergence or the CDF-JS divergence. The weight parameter λ is a positive constant, and the operator L determines the kind of regularization imposed. For example, L could correspond to a thin-plate spline, a Gaussian radial basis function, etc. Each choice of L is in turn related to a kernel and a metric of the deformation from and to Z.

Following the approach in [73], we choose the thin-plate spline (TPS) to represent the non-rigid deformation. Given n control points x_1, ..., x_n in R^d, a general non-rigid mapping f : R^d → R^d represented by a thin-plate spline can be written analytically as f(x) = WU(x) + Ax + t. Here Ax + t is the linear part of f; the nonlinear part is determined by a d × n matrix W, and U(x) is an n × 1 vector consisting of n basis functions U_i(x) = U(x, x_i) = U(||x − x_i||), where U(r) is the kernel function of the thin-plate spline. For example, if the dimension is 2 (d = 2) and the regularization functional is defined on the second derivatives of f, we have U(r) = (1/8π) r^2 ln(r). Therefore, the cost function for non-rigid registration can be formulated as an energy functional in a regularization framework, where the regularization term in Equation (4-8) is governed by the bending energy of the thin-plate spline warping and can be explicitly given by trace(WKW^T), where K = (K_ij), K_ij = U(p_i, p_j), describes the internal structure of the control point-sets. Note that the linear part can be obtained by an initial affine registration; an optimization can then be performed to find the parameters W.
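Assuming the 2D kernel U(r) = (1/8π) r² ln r described above, the kernel matrix K over the control points and the bending energy trace(WKW^T) can be sketched as follows; the control points and the (identity) warp coefficients are illustrative:

```python
import numpy as np

def tps_kernel_2d(r):
    """2D thin-plate spline kernel U(r) = (1/8π) r^2 ln r, with U(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = (r[nz] ** 2) * np.log(r[nz]) / (8.0 * np.pi)
    return out

def tps_kernel_matrix(ctrl):
    """K_{ij} = U(||x_i - x_j||) over the control points; the bending energy
    of a warp with nonlinear coefficients W (d x n) is trace(W K W^T)."""
    diff = ctrl[:, None, :] - ctrl[None, :, :]
    return tps_kernel_2d(np.linalg.norm(diff, axis=2))

ctrl = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
K = tps_kernel_matrix(ctrl)
W = np.zeros((2, 4))          # zero nonlinear part: a purely affine warp
print(np.trace(W @ K @ W.T))  # 0.0 -- affine warps incur no bending energy
```

Note that K is symmetric by construction, which is what makes trace(WKW^T) a well-defined quadratic penalty on the nonlinear coefficients.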

4.3.2 JS Divergence in a Hypothesis Testing Framework

In this section we show that the Jensen-Shannon divergence can be interpreted in the framework of statistical hypothesis testing. To see this, we construct a likelihood ratio between i.i.d. samples drawn from a mixture Σ_a π_a p_a and i.i.d. samples drawn from a heterogeneous collection of densities (p_1, p_2, ..., p_N), with the samples being indexed by the specific member distribution in the family from which they are drawn. Assume that n_1 samples are drawn from p_1, n_2 from p_2, etc. Let the total number of pooled samples be defined as M := Σ_{a=1}^N n_a. The likelihood ratio then is

Λ = [ Π_{k=1}^M Σ_{a=1}^N π_a p_a(x_k) ] / [ Π_{a=1}^N Π_{k_a=1}^{n_a} p_a(x_{k_a}) ]    (4-9)

where {x_k} is the pooled data of all the samples {x_{k_a}, k_a ∈ {1, ..., n_a}, a ∈ {1, ..., N}}. In contrast to the typical statistical test relative to a threshold, we seek the maximum of the likelihood ratio in Eqn. (4-9). The following theorem shows the relationship between the Jensen-Shannon divergence and the likelihood ratio.

Theorem 6 Given N probability density functions p_a, a ∈ {1, ..., N}, maximizing the likelihood ratio in Eqn. (4-9) is equivalent to minimizing the Jensen-Shannon divergence between the N probability densities p_a, a ∈ {1, ..., N}.

Proof: The proof follows by taking the logarithm of the likelihood ratio; using the weak law of large numbers, we can then show that the normalized log-likelihood ratio is the negative of the Jensen-Shannon divergence. ∎

We seek to maximize the probability that the samples are drawn from the mixture rather than from separate members of the family (p_1, p_2, ..., p_N). In the context of groupwise matching of point-sets, this makes eminent sense, since maximizing the above ratio is tantamount to increasing the chance that all of the observed point-sets are warped versions of the same underlying warped and pooled data model. The notion of the pooled data model is defined as follows: in our process of groupwise registration, the warping does not have a fixed target data set. Instead, the warping is between the input data sets and an evolving target which we call the pooled model. The target evolves to a fully registered pooled data set at the end of the optimization process. The pooled model then consists of input data sets which have undergone groupwise matching and are now fully registered with each other. The connection to the JS-divergence arises from the fact that the negative logarithm of the above ratio (Eqn. 4-9), normalized by M, asymptotically converges to the JS-divergence when the samples are assumed to be drawn from the mixture Σ_a π_a p_a.
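Theorem 6 can be checked numerically: for discrete densities, the normalized negative log-likelihood ratio of Eqn. (4-9) should approach the JS-divergence as the sample size grows. The densities, weights, and sample size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two discrete densities over 3 symbols, with equal weights pi_a = 1/2.
p = [np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.3, 0.6])]
pi = np.array([0.5, 0.5])
n = 200_000  # samples drawn from each density; M = 2n pooled samples
samples = [rng.choice(3, size=n, p=pa) for pa in p]

mix = pi[0] * p[0] + pi[1] * p[1]
M = 2 * n
# -(1/M) log Lambda, Eqn (4-9): mixture log-likelihood of the pooled data
# minus the per-source log-likelihood of each sample under its own density.
neg_log_ratio = -(sum(np.log(mix[s]).sum() for s in samples)
                  - sum(np.log(pa[s]).sum() for pa, s in zip(p, samples))) / M

H = lambda q: -np.sum(q * np.log(q))
js = H(mix) - pi @ np.array([H(pa) for pa in p])
print(neg_log_ratio, js)  # the two agree up to Monte Carlo error
```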

4.3.3 Unbiasedness Property of the Divergence Measures

Typically we are required to construct an atlas from a very large number of point-sets, and this process can take a long time since the computational complexity grows polynomially with the number of point-sets (N) to be registered. However, the following hierarchical method significantly reduces the computational cost.

Assume that we are given N point-sets from which to construct the atlas. We can divide the N point-sets into m subsets (generally m < N) and construct m atlases, one from each subset, using our algorithm; in the process, all the point-sets within each subset are registered. We can then either construct a single atlas from these m atlas point-sets, or further divide the m atlas point-sets into even smaller subsets and follow the same process until a single atlas is constructed. The remaining question is whether the atlas thus obtained is biased. The following theorem leads us to the answer.

Theorem 7 Given N probability distributions P_a, a ∈ {1, ..., N}, each having weight π_a in the JS divergence or CDF-JS divergence, suppose we divide the N distributions into m subsets such that the i-th subset contains n_i distributions P_a, a ∈ {k_1^(i), k_2^(i), ..., k_{n_i}^(i)}, with Σ_i n_i = N. Let S_i be the convex combination of all the distributions in the i-th subset, with weights π_{k_j^(i)} / β_i, where β_i = Σ_j π_{k_j^(i)}, i.e., S_i = (1/β_i) Σ_{j=1}^{n_i} π_{k_j^(i)} P_{k_j^(i)}. We then have the following relationship between the JS divergence of the P_a's and the divergence of the S_i's:

D_π(P_1, P_2, ..., P_N) − D_β(S_1, S_2, ..., S_m) = Σ_{i=1}^m β_i D_{π^(i)/β_i}( P_{k_1^(i)}, P_{k_2^(i)}, ..., P_{k_{n_i}^(i)} )    (4-10)

Proof: The relationship in Eqn. (4-10) follows from simple algebraic manipulation. ∎
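The identity of Eqn. (4-10) can be verified numerically for discrete distributions; the distributions, weights, and grouping below are illustrative:

```python
import numpy as np

H = lambda q: -np.sum(q[q > 0] * np.log(q[q > 0]))  # Shannon entropy

def js(dists, w):
    """JS divergence with weights w, Eqn (4-1)."""
    mix = sum(wi * d for wi, d in zip(w, dists))
    return H(mix) - sum(wi * H(d) for wi, d in zip(w, dists))

# Four distributions, grouped into two subsets of two (Theorem 7, m = 2).
P = [np.array(q) for q in ([.6, .3, .1], [.2, .5, .3],
                           [.1, .1, .8], [.3, .3, .4])]
pi = np.array([.1, .2, .3, .4])
groups = [[0, 1], [2, 3]]

beta = [pi[g].sum() for g in groups]                      # subset weights
S = [sum(pi[a] * P[a] for a in g) / b                     # subset mixtures
     for g, b in zip(groups, beta)]

lhs = js(P, pi) - js(S, beta)
rhs = sum(b * js([P[a] for a in g], pi[g] / b)
          for g, b in zip(groups, beta))
print(lhs, rhs)  # equal: the grouping identity of Eqn (4-10)
```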

In our registration algorithm, all the point-sets are represented as probability distributions, and the atlas thus constructed can be considered a convex combination of these distributions. Therefore, we can treat the P_a's and S_i's as the distributions corresponding to the point-sets and to the atlases constructed from the subsets, respectively, and from Theorem 7 we know that the relationship in Eqn. (4-10) holds between the JS divergence of the P_a's and that of the S_i's. Notice that the right hand side of Eqn. (4-10) is the weighted sum of the JS/CDF-JS divergences of the distributions within the subsets, which are minimized at each step of the hierarchical method we proposed. Intuitively, if these point-sets are aligned properly, the corresponding distribution functions should be statistically similar, so the divergences within all the subsets should be zero or very close to zero, which means the right hand side of Eqn. (4-10) vanishes. Consequently, the JS/CDF-JS divergence of the P_a's and the divergence of the S_i's are equal to each other; minimizing the divergence of all the resultant atlas point-sets is therefore equivalent to minimizing the divergence of the original point-sets, implying that there is no bias toward any particular partitioning of the point-sets.

Having introduced the cost function and the transformation model, the task now is to design an efficient way to estimate the empirical divergence measures between multiple densities or distributions, and to derive the analytic gradient of the estimated divergence in order to reach the optimal solution efficiently. We design two completely different approaches for estimating the JS divergence and the CDF-JS divergence: a finite mixture model for the JS divergence and the Parzen window technique for the CDF-JS divergence, the details of which are introduced next.

4.3.4 Estimating JS and its Derivative

Finite mixture models

Considering a point-set as a collection of Dirac delta functions, it is natural to use a finite mixture model as the representation of a point-set. The most frequently used mixture model, a Gaussian mixture [74], is defined as a convex combination of Gaussian component densities.

To model each point-set as a Gaussian mixture, we define a set of cluster centers, one for each point-set, to serve as the Gaussian mixture centers. Since the feature point-sets are usually highly structured, we can expect them to cluster well. Furthermore, we can greatly improve the efficiency of the algorithm by using a limited number of clusters. Note that we can choose the cluster centers to be the point-set itself if the point-set is small. The cluster center point-sets are denoted by {V^p, p ∈ {1, ..., N}}. Each point-set V^p consists of points {v_a^p ∈ R^D, a ∈ {1, ..., K^p}}. Note that there are K^p points in each V^p, and the number of clusters for each point-set may be different (in our implementation, the number of clusters was usually chosen to be proportional to the size of the point-set). The cluster centers are estimated by a clustering process over the original sample points x_i^p, which needs to be done only once, before the process of joint atlas estimation and point-set registration. In our implementation, we utilize the deterministic annealing (DA) procedure, with its proven benefit of robustness in clustering [75]. We begin by specifying the density function of each point-set:

p^p(x) = Σ_{a=1}^{K^p} α_a^p p(x | v_a^p)    (4-11)

In Equation (4-11), the occupancy probability, which is different for each data point-set, is denoted by α^p. The component density p(x | v_a^p) is

p(x | v_a^p) = 1 / ( (2π)^{D/2} |Σ_a^p|^{1/2} ) exp( −(x − v_a^p)^T (Σ_a^p)^{−1} (x − v_a^p) / 2 )    (4-12)

The probability of the point-set X^p coming from this mixture is then

Pr(X^p | V^p, α^p) = Π_{i=1}^{n_p} p^p(x_i^p) = Π_{i=1}^{n_p} Σ_{a=1}^{K^p} α_a^p p(x_i^p | v_a^p)    (4-13)
Later, we set the occupancy probabilities to be uniform and make the covariance matrices Σ_a^p proportional to the identity matrix in order to simplify the atlas estimation. For simplicity, we choose π_i = 1/N, ∀ i ∈ {1, 2, ..., N}. Let Q_i^p := Σ_{a=1}^{K^p} α_a^p p(f^j(x_i^j) | f^p(v_a^p)) be a mixture model containing the component densities p(f^j(x_i^j) | f^p(v_a^p)), where

p(f^j(x_i^j) | f^p(v_a^p)) = 1 / ( (2π)^{D/2} |Σ_a|^{1/2} ) exp( −(f^j(x_i^j) − f^p(v_a^p))^T Σ_a^{−1} (f^j(x_i^j) − f^p(v_a^p)) / 2 )    (4-14)

and {Σ_a, a ∈ {1, ..., K}} is the set of cluster covariance matrices. For the sake of simplicity and ease of implementation, we assume that the occupancy probabilities are uniform (α_a = 1/K) and that the covariance matrices Σ_a are isotropic, diagonal, and identical (Σ_a = σ^2 I_D). Having specified the density function of the data, we can then rewrite Equation (4-8) as follows:

JS_π(P_1, P_2, ..., P_N)
  = (1/N) { [ H( Σ_i P_i / N ) − H(P_1) ] + [ H( Σ_i P_i / N ) − H(P_2) ] + ... + [ H( Σ_i P_i / N ) − H(P_N) ] }    (4-15)

For each term in this equation, we can estimate the entropy difference using the weak law of large numbers:

H( Σ_i P_i / N ) − H(P_1)
  ≈ −(1/n_1) Σ_{i=1}^{n_1} log [ (Q_i^1 + Q_i^2 + ... + Q_i^N) / N ] + (1/n_1) Σ_{i=1}^{n_1} log Q_i^1
  = (1/n_1) Σ_{i=1}^{n_1} log [ N Q_i^1 / (Q_i^1 + Q_i^2 + ... + Q_i^N) ]

Combining these terms, we have

JS(P_1, P_2, ..., P_N)
  ≈ (1/N) { (1/n_1) Σ_{i=1}^{n_1} log [ N Q_i^1 / (Q_i^1 + ... + Q_i^N) ]
          + (1/n_2) Σ_{i=1}^{n_2} log [ N Q_i^2 / (Q_i^1 + ... + Q_i^N) ]
          + ...
          + (1/n_N) Σ_{i=1}^{n_N} log [ N Q_i^N / (Q_i^1 + ... + Q_i^N) ] }    (4-16)

Optimizing the JS divergence

Computation of the gradient of the energy function is necessary in the minimization process when employing a gradient-based scheme. If this can be done in analytic form, it leads to an efficient optimization method. We now present the analytic form of the gradient of the JS-divergence (our cost function):

∇JS = [ ∂JS/∂μ^1, ∂JS/∂μ^2, ..., ∂JS/∂μ^N ]    (4-17)

Each component of the gradient may be found by differentiating Eqn. (4-16) with respect to the transformation parameters. To compute this gradient, let us first calculate the derivative of Q_i^p with respect to μ^l, where F_{ia} := f^j(x_i^j) − f^p(v_a^p):

∂Q_i^p/∂μ^l = 1 / ( (2π)^{D/2} σ^D K^p ) Σ_a exp( −||F_{ia}||^2 / (2σ^2) ) (F_{ia}^T / σ^2) ×
  { −∂f^j(x_i^j)/∂μ^l                                if l = j, l ≠ p
    ∂f^p(v_a^p)/∂μ^l                                 if l = p, l ≠ j
    ∂f^p(v_a^p)/∂μ^l − ∂f^j(x_i^j)/∂μ^l              if l = p = j }    (4-18)

Based on this, it is straightforward to derive the gradient of the JS-divergence with respect to the transformation parameters μ^l. Writing Eqn. (4-16) as a sum of log terms, we obtain

∂JS/∂μ^l = (1/N) Σ_{j=1}^{N} (1/n_j) Σ_{i=1}^{n_j} [ (1/Q_i^j) ∂Q_i^j/∂μ^l − ( Σ_{p=1}^{N} ∂Q_i^p/∂μ^l ) / ( Q_i^1 + Q_i^2 + ... + Q_i^N ) ]    (4-19)
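When implementing analytic gradients such as Eqns. (4-17)-(4-19), it is prudent to verify them against central finite differences before using them in the optimizer. A generic sketch of such a check; the toy cost below merely mimics the log-sum-exp structure of the JS estimate, it is not the registration cost itself:

```python
import numpy as np

def check_gradient(f, grad_f, x, eps=1e-6, tol=1e-4):
    """Compare an analytic gradient against central finite differences --
    a standard sanity check when hand-deriving gradients like Eqn (4-19)."""
    g = grad_f(x)
    num = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        num[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return np.max(np.abs(g - num)) < tol

# Toy cost with a log-sum-exp structure and its known analytic gradient.
f = lambda x: np.log(np.sum(np.exp(x)))
grad_f = lambda x: np.exp(x) / np.sum(np.exp(x))
print(check_gradient(f, grad_f, np.array([0.1, -0.3, 0.7])))  # True
```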

4.3.5 Estimating CDF-JS and its Derivative

We use the Parzen window technique to estimate the cumulative distribution functions and their derivatives, since the calculation of the CDF-JS divergence requires the cumulative probability distribution of each point-set. Without loss of generality, we only discuss the derivation for the 2D case, which extends easily to 3D. Each point x_i^a with coordinates [x_i^a, y_i^a] in the point-set X^a is transformed by the function f^a into f^a(x_i^a, μ^a) = [f_x^a(x_i^a, μ^a), f_y^a(x_i^a, μ^a)]. Let β^(3) be a cubic spline Parzen window. The smoothed probability density function p^a(l, k; μ^a) of the point-set X^a, a ∈ {1, ..., N}, is given by

p^a(l, k; μ^a) = (1/λ_a) Σ_i β^(3)( l − (f_x^a(x_i^a, μ^a) − x_0) / Δb_x ) β^(3)( k − (f_y^a(x_i^a, μ^a) − y_0) / Δb_y )    (4-20)

where λ_a is a normalization factor that ensures ∫∫ p^a(l, k) dl dk = 1, [l, k] are the coordinate values along the X and Y axes respectively, the transformed point coordinates [f_x^a(x_i^a, μ^a), f_y^a(x_i^a, μ^a)] are normalized by the minimum coordinate values x_0, y_0, and Δb_x, Δb_y are the ranges of each bin. From the density function, we can calculate the cumulative residual distribution function by the formula P^a(l > λ, k > γ; μ^a) = ∫_λ^∞ ∫_γ^∞ p^a(l, k; μ^a) dl dk, where λ ∈ L_x, γ ∈ L_y, and L_x, L_y are discrete sets of coordinate values along the X and Y axes respectively. In further detail,

P^a(λ, γ; μ^a) = (1/λ_a) Σ_i Φ( λ − (f_x^a(x_i^a, μ^a) − x_0) / Δb_x ) Φ( γ − (f_y^a(x_i^a, μ^a) − y_0) / Δb_y )

where Φ(·) is the cumulative residual function of the cubic spline kernel, Φ(t) = ∫_t^∞ β^(3)(s) ds. Note that dΦ(t)/dt = −β^(3)(t).
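A sketch of the cubic spline Parzen window β^(3) and its cumulative residual function Φ used in Eqn. (4-20) and below; the standard cubic B-spline with support [−2, 2] is assumed here, and Φ is evaluated by simple numerical quadrature rather than in closed form:

```python
import numpy as np

def bspline3(t):
    """Cubic B-spline Parzen window beta^(3): support [-2, 2], integral 1."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a <= 1, 2.0 / 3.0 - a**2 + a**3 / 2.0,
           np.where(a <= 2, (2.0 - a)**3 / 6.0, 0.0))

def survival_of_kernel(t):
    """Phi(t) = ∫_t^∞ beta^(3)(s) ds, by trapezoidal quadrature on [t, 2];
    note d Phi/dt = -beta^(3)(t)."""
    if t >= 2.0:
        return 0.0
    s = np.linspace(t, 2.0, 2001)
    v = bspline3(s)
    return float(np.sum((v[:-1] + v[1:]) / 2.0 * np.diff(s)))

print(survival_of_kernel(-2.0))  # ≈ 1.0: the kernel integrates to one
print(survival_of_kernel(0.0))   # ≈ 0.5: the kernel is symmetric
```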
Having specified the distribution function of the data, we can rewrite Eqn. (4-8) as follows (for simplicity, we choose π_a = 1/N, ∀ a ∈ {1, 2, ..., N}):

J(P^1, P^2, ..., P^N) = S( Σ_a π_a P^a ) − Σ_a π_a S(P^a)
  = −Σ_λ Σ_γ P log P + (1/N) Σ_a Σ_λ Σ_γ P^a log P^a    (4-21)

where P is the cumulative residual distribution function corresponding to the density (1/N) Σ_a p^a(l, k; μ^a), which can be expressed as

P(λ, γ) = (1/N) Σ_{a=1}^{N} (1/λ_a) Σ_i Φ( λ − (f_x^a(x_i^a, μ^a) − x_0) / Δb_x ) Φ( γ − (f_y^a(x_i^a, μ^a) − y_0) / Δb_y )    (4-22)

Optimizing the CDF-JS divergence

We now present the analytic form of the gradient of the CDF-JS divergence (our cost function):

∇J = [ ∂J/∂μ^1, ∂J/∂μ^2, ..., ∂J/∂μ^N ]    (4-23)

Each component of the gradient may be found by differentiating Eqn. (4-21) with respect to the transformation parameters. It can easily be shown that ∂P(λ, γ)/∂μ^a = (1/N) ∂P^a(λ, γ; μ^a)/∂μ^a. Based on these facts, it is straightforward to derive the gradient of the CDF-JS divergence with respect to the transformation parameters μ^a:

∂J/∂μ^a = −(1/N) Σ_λ Σ_γ [ 1 + log P ] ∂P^a(λ, γ; μ^a)/∂μ^a + (1/N) Σ_λ Σ_γ [ 1 + log P^a ] ∂P^a(λ, γ; μ^a)/∂μ^a    (4-24)

As a byproduct of the groupwise registration using the CDF-JS divergence, we obtain the atlas of the given population of data sets, simply by substituting the estimated transformation functions f^a into the formula for the atlas, P = Σ_a π_a P^a. Note that our algorithm can also be applied to yield a biased registration in situations that demand such a solution; this is achieved by fixing one of the data sets (say, the reference) and estimating the transformation from it to the novel scene data. We will present experimental results on point-set alignment between two given point-sets, as well as atlas construction from multiple point-sets, in the next section.

4.4 Experimental Results

We now present experimental results using the JS divergence and the CDF-JS divergence for point-set registration. The results are demonstrated on both synthetic and real data sets.

4.4.1 JS Divergence Results

To demonstrate the robustness and accuracy of our algorithm, we first show alignment results obtained by applying the JS-divergence to the point-set matching problem; we then present the atlas construction results in the second part.

Alignment results

First, to test the validity of our approach, we perform a set of exact rigid registration

experiments on both synthetic and real data sets without noise and outliers. Some

examples are shown in Figure 4-2. The top row shows the registration result for a 2D real

range data set of a road (which was also used in Tsin and Kanade's experiments [67]).

The figure depicts the real data and the registered (using rigid motion). Top left frame

contains two unregistered point sets superposed on each other. Top right frame contains

the same point sets after registration using our algorithm. A 3D helix example is

presented in the second row (with the same arrangement as the top row). We also tested

our method against the KC method [67] and the ICP method. As expected, our method and the KC method exhibit a much wider convergence basin/range than ICP, and both achieve very high accuracy in the noiseless case.

We also applied our algorithm to non-rigidly register medical datasets (2D point-sets).

Figure 4-3 depicts some results of our registration method applied to a set of 2D corpus

callosum slices with feature points manually extracted by human experts. Registration

result is shown in the left column, and the warping of a 2D grid under the recovered motion is shown in the middle column. Our non-rigid alignment performs well in the

Figure 4-2: Results of rigid registration in noiseless case. 'o' and '+' indicate the model
and scene points respectively.

presence of noise and outliers (Figure 4-3 right column). For the purpose of comparison,

we also tested the TPS-RPM program provided in [62] on this data set, and found that

TPS-RPM can correctly register the pair without outliers (Figure 4-3 top left) but failed

to match the corrupted pair (Figure 4-3 top right).

Initial Setup


0.4 0.6 0.8 1
After registration

0.4 0.6 0.8 1

Original point set Initial Setup
0.2 0.2 0 ++ +++14
0 1 : t 0.1 ; :

0.4 0.6 0.8 1 0.4 0.6 0.8 1
Deformed point set After registration

0.2 4 ;0. 0.2 0 6

4 0 6 08 1 02 0
0.4 0.6 0.8 1 0.2 0.4 0.6 0.8 1

Figure 4-3: Non-rigid registration of the corpus callosum data. Left column: two manually segmented corpus callosum slices before and after registration; Middle column: warping of the 2D grid using the recovered motion; Top right: same slices with one corrupted by noise and outliers, before and after registration.

Atlas construction results

In this section, we begin with a simple but demonstrative example of our algorithm for 2D atlas estimation. The structure of interest in this experiment is the corpus callosum as it appears in MR brain images. Constructing an atlas for the corpus callosum and subsequently analyzing the individual shape variation from "normal" anatomy has been regarded as potentially valuable for the study of brain diseases such as agenesis of the corpus callosum (ACC) and fetal alcohol syndrome (FAS).


Figure 4-4: Experiment results on seven 2D corpus callosum point sets. The first two rows and the left image in the third row show the deformation of each point-set to the atlas, superimposed with the initial point-set (shown in 'o') and the deformed point-set (shown in '*'). Middle image in the third row: The estimated atlas is shown superimposed over all the point-sets. Right: The estimated atlas is shown superimposed over all the deformed point-sets.

We manually extracted points on the outer contour of the corpus callosum from seven normal subjects (as shown in Figure 4-4, indicated by 'o'). The recovered deformations between each point-set and the mean shape are superimposed in the first two rows of Figure 4-4. The resulting atlas (mean point-set) is shown in the third row of Figure 4-4, superimposed over all the point-sets. As we described earlier, all these results are computed simultaneously and automatically. This example clearly demonstrates that our joint matching and atlas construction algorithm can simultaneously align multiple shapes (modeled by sample point-sets) and compute a meaningful atlas/mean shape.

4.4.2 CDF-JS Divergence Results

First, to see how the CDF-JS method behaves in the presence of noise and outliers, we designed the following procedure to generate a corrupted template point set from a model set. For a model set with n points, we control the degree of corruption by (1) discarding a subset of size (1 - \rho)n from the model point set, (2) applying a rigid transformation (R, t) to the template, (3) perturbing the points of the template with noise (of strength \epsilon), and (4) adding (\tau - \rho)n spurious, uniformly distributed points to the template. Thus, after corruption, a template point set will have a total of \tau n points, of which only \rho n correspond to points in the model set. Since ICP is known to be sensitive to outliers, we

only compare our method with the more robust Jensen-Shannon divergence method in terms of sensitivity to noise and outliers. The comparison is done via a set of 2D experiments. At each of several noise levels and outlier strengths, we generate five models and six corrupted templates from each model, for a total of 30 pairs at each noise and outlier strength setting. For each pair, we use our algorithm and the JS method to estimate the known rigid transformation which was partially responsible for the corruption. The results show that when the noise level is low, both JS and CDF-JS have strong resistance to outliers. However, we observe that when the noise level is high, the CDF-JS method exhibits stronger resistance to outliers than the JS method, as shown in Figure 4-5, which confirms that CDF-JS is indeed more robust in the presence of high noise and outlier levels. A 3D example is also presented in Figure 4-6.
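The corruption procedure described above can be sketched as follows (a 2D sketch; the function name and the default parameter values are ours, chosen purely for illustration):

```python
import numpy as np

def corrupt_template(model, rho=0.7, tau=1.0, eps=0.01,
                     theta=0.1, t=(0.05, -0.02), seed=0):
    """Corrupt a 2D model point set: keep rho*n points, apply a rigid
    transform (R, t), add noise of strength eps, then pad with
    (tau - rho)*n uniform outliers, for tau*n points in total."""
    rng = np.random.default_rng(seed)
    n = len(model)
    keep = rng.choice(n, size=int(rho * n), replace=False)   # step (1)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    template = model[keep] @ R.T + np.asarray(t)             # step (2)
    template += eps * rng.standard_normal(template.shape)    # step (3)
    lo, hi = model.min(axis=0), model.max(axis=0)
    outliers = rng.uniform(lo, hi, size=(int((tau - rho) * n), 2))
    return np.vstack([template, outliers])                   # step (4)
```

Only the first int(rho * n) rows correspond to model points; the registration algorithm is, of course, never told which rows those are.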

Next, we present groupwise registration results on 3D hippocampal point-sets. Four 3D point-sets were extracted from epilepsy patients with left anterior temporal lobe foci identified with EEG. An interactive segmentation tool was used to segment the hippocampus from the 3D brain MRI scans of four subjects. The point-sets differ in shape, containing 450, 421, 376 and 307 points respectively. In the first



Figure 4-5: Robustness to outliers in the presence of large noise. RMS errors in the estimated rigid transform (translation, left; rotation, right) vs. proportion of outliers ((\tau - \rho)/\rho) for both the JS and CDF-JS methods.


Figure 4-6: Robustness test on 3D swan data. 'o' and '+' indicate the model and scene
points respectively. Note that the scene point-set is corrupted by noise and outliers.

four images of Figure 4-7, the recovered nonrigid deformation between each hippocampal point-set and the atlas is shown, along with a superimposition on all of the original data sets. In the second row of Figure 4-7, we also show the scatter plot of the original point-sets along with all the point-sets after the non-rigid warping. An examination of the two scatter plots clearly shows the efficacy of our recovered non-rigid warping. Note that validation of what an atlas shape ought to be in the real data case is a difficult problem, and we relegate its resolution to a future paper.


Figure 4-7: Atlas construction from four 3D hippocampal point sets. The first row and the left image in the second row show the deformation of each point-set to the atlas (represented as cluster centers), superimposed with the initial point-set (shown in green 'o') and the deformed point-set (shown in red '+'). Middle image in the second row: Scatter plot of the original four hippocampal point-sets. Right: Scatter plot of all the warped point-sets.



In medical imaging applications, segmentation can be a daunting task due to possibly large inhomogeneities in image intensities across an image, e.g., in MR images. These

inhomogeneities combined with volume averaging during the imaging and possible lack

of precisely defined shape boundaries for certain anatomical structures complicates the

segmentation problem immensely. One possible solution for such situations is atlas-based

segmentation. The atlas once constructed can be used as a template and can be registered

non-rigidly to the image being segmented (henceforth called a target image) thereby

achieving the desired segmentation. Many of the methods that achieve atlas-based

segmentation are based on a two stage process involving, (i) estimating the non-rigid

deformation field between the atlas image and the target image and then, (ii) applying the

estimated deformation field to the desired shape/atlas to achieve the segmentation of the

corresponding structures in the target image. In this chapter, we develop a novel

technique that will simultaneously achieve the non-rigid registration and segmentation.

There is a vast body of literature for the tasks of registration and segmentation

independently; however, methods that combine them into one algorithm are few and far between. In the following, we will briefly review the few existing methods that attempt to

achieve simultaneous registration and segmentation.

5.1 Related Work

In one of the earliest attempts at joint registration & segmentation, Bansal et al. [76] developed a min-max entropy framework to rigidly register & segment portal and CT data

sets. In [77], Yezzi et al., present a variational principle for achieving simultaneous

registration and segmentation, however, the registration part is limited to rigid motions. A

similar limitation applies to the technique presented by Noble et al. in [78]. A variational principle in a level-set based formulation was presented by Paragios et al. [79] for

segmentation and registration of cardiac MRI data. Their formulation was again limited

to rigid motion and the experiments were limited to 2D images. In Fischl et al., [80], a

Bayesian method is presented that simultaneously estimates a linear registration and the

segmentation of a novel image. Note that linear registration does not involve non-rigid

deformations. The case of joint registration and segmentation with non-rigid registration

has not been addressed adequately in literature with the exception of the recent work

reported in Soatto and Yezzi [81] and Vemuri et al. [82]. However, these methods only work with image pairs that are from the same modality or whose intensity profiles are not too disparate.

In this chapter, we present a unified variational principle that will simultaneously register the atlas shape (contour/surface) to the novel brain image and segment the desired shape (contour/surface) in the novel image. In this work, the atlas serves as a prior in the segmentation process, and the registration of this prior to the novel brain scan assists in segmenting it. Another key feature/strength of our proposed registration+segmentation scheme is that it accommodates image pairs having very distinct intensity distributions, as in multimodality data sets. More details on this are presented in Section 5.2.


5.2 Registration+Segmentation Model

We now present our formulation of the joint registration & segmentation model. Let I_1 be the atlas image containing the atlas shape C, I_2 the novel image that needs to be segmented, and v the vector field from I_2 to I_1 (i.e., the transformation is centered in I_2) defining the non-rigid deformation between the two images. The variational principle describing our formulation of the registration assisted segmentation problem is given by:

\min E(v, C) = \mathrm{Seg}(I_2, C) + \mathrm{dist}(v(C), C) + \mathrm{Reg}(I_1, I_2, v). \qquad (5-1)



Figure 5-1: Model Illustration

where the first term denotes the segmentation functional; C is the boundary contour (surface in 3D) of the desired anatomical shape in I_2. The second term measures the distance between the transformed atlas v(C) and the current segmentation C in the novel brain image, i.e., the target image, and the third term denotes the non-rigid registration functional between the two images. Our joint registration & segmentation model is illustrated in Figure 5-1.

For the segmentation functional, we use a piecewise constant Mumford-Shah model,

which is one of the well-known variational models for image segmentation, wherein it is

assumed that the image to be segmented can be modeled by piece-wise constant regions,

as was done in [54]. This assumption simplifies our presentation but our model itself can

be easily extended to the piecewise smooth regions case. Additionally, since we are only

interested in segmenting a desired anatomical shape (e.g., the hippocampus, the corpus

callosum, etc.), we will only be concerned with a binary segmentation i.e., two classes

namely, voxels inside the desired shape and those that are outside it. These assumptions

can be easily relaxed if necessary but at the cost of making the energy functional more

complicated and hence computationally more challenging. The segmentation functional

takes the following form:

\mathrm{Seg}(I_2, C) = \int_\Omega (I_2 - u)^2 \, dx + \alpha \oint_C ds \qquad (5-2)

where \Omega is the image domain and \alpha is a regularization parameter; u = u_i if x \in C_{in}, and u = u_o if x \in C_{out}, where C_{in} and C_{out} denote the regions inside and outside of the curve C representing the desired shape boundaries in I_2.
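As an illustration, the two-region piecewise-constant energy of Eqn. (5-2) can be evaluated for a candidate binary segmentation as follows. This is a sketch of our own: the curve length is crudely approximated by counting inside/outside pixel transitions, and all names are hypothetical:

```python
import numpy as np

def seg_energy(img, mask, alpha=1.0):
    """Two-region piecewise-constant segmentation energy of an image.

    mask is boolean: True inside the curve C, False outside."""
    u_in = img[mask].mean()                       # optimal u_i given C
    u_out = img[~mask].mean()                     # optimal u_o given C
    data = (((img - u_in) ** 2)[mask].sum()
            + ((img - u_out) ** 2)[~mask].sum())
    m = mask.astype(float)
    # crude curve-length proxy: horizontal + vertical boundary edges
    length = (np.abs(np.diff(m, axis=0)).sum()
              + np.abs(np.diff(m, axis=1)).sum())
    return data + alpha * length
```

Minimizing this over masks (in the continuous setting, over the curve C and the constants u_i, u_o) yields the Chan-Vese style segmentation that the rest of the chapter builds on.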

For the non-rigid registration term in the energy function, we use the information theoretic criterion, cross cumulative residual entropy (CCRE), which we introduced in Chapter 2. CCRE was shown to outperform mutual information based registration in terms of noise immunity and convergence range, motivating us to pick this criterion over the MI-based cost function. The new registration functional is defined by

\mathrm{Reg}(I_1, I_2, v) = -\mathcal{C}\left( I_1(v(x)), I_2(x) \right) + \mu \int_\Omega \|\nabla v(x)\|^2 \, dx \qquad (5-3)

where the cross-CRE \mathcal{C}(I_1, I_2) is given by

\mathcal{C}(I_1, I_2) = \mathcal{E}(I_1) - E\left[ \mathcal{E}(I_1 | I_2) \right] \qquad (5-4)

with \mathcal{E}(X) = -\int_{\mathbb{R}^+} P(|X| > \lambda) \log P(|X| > \lambda) \, d\lambda and \mathbb{R}^+ = \{ x \in \mathbb{R} : x \ge 0 \}. Here v(x) is as before, \mu is the regularization parameter, and \|\cdot\| denotes the Frobenius norm. Using a B-spline representation of the non-rigid deformation, one need only compute this field at the control points of the B-splines and interpolate elsewhere, thus accruing computational advantages. Using this representation, we have derived analytic expressions for the gradient of the energy with respect to the registration parameters. This in turn makes our optimization more robust and efficient.
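The cumulative residual entropy \mathcal{E}(\cdot) that underlies Eqn. (5-4) can be approximated numerically from a sample. The following is an empirical sketch of ours (the function name and the discretization scheme are our choices, not the dissertation's implementation):

```python
import numpy as np

def cre(samples, n_bins=512):
    """Empirical cumulative residual entropy:
    -integral of P(|X| > lam) * log P(|X| > lam) d lam."""
    x = np.abs(np.asarray(samples, dtype=float))
    lams = np.linspace(0.0, x.max(), n_bins)
    # empirical survival function P(|X| > lam) on the lambda grid
    surv = (x[None, :] > lams[:, None]).mean(axis=1)
    surv = surv[surv > 0]            # 0 log 0 = 0 by convention
    dlam = lams[1] - lams[0]
    return -np.sum(surv * np.log(surv)) * dlam
```

For a uniform sample on [0, 1] this converges to 1/4. Unlike the Shannon entropy, it is computed from the distribution function itself rather than a density estimate, which is the source of CCRE's robustness to noise.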

In order for the registration and the segmentation terms to "talk" to each other, we need a connection term, which is given by

\mathrm{dist}(v(C), C) = \int_R \phi_{v(C)}(x) \, dx \qquad (5-5)

where R is the region enclosed by C, and \phi_{v(C)}(x) is the embedding signed distance function of the contour v(C), which can be used to measure the distance between v(C) and C. The level-set function \phi : \mathbb{R}^2 \to \mathbb{R} is chosen so that its zero level-set corresponds to the transformed template curve v(C). Letting E_{dist} = \mathrm{dist}(v(C), C), one can show that \partial E_{dist} / \partial C = \phi_{v(C)}(C) \mathbf{N}, where \mathbf{N} is the normal to C. The corresponding curve evolution equation given by gradient descent is then

\frac{\partial C}{\partial t} = -\phi_{v(C)}(C) \mathbf{N} \qquad (5-6)

Not only does the signed distance function representation make it easier for us to convert the curve evolution problem to the level-set framework, it also facilitates the matching of the evolving curve C and the transformed template curve v(C), and yet does not rely on a parametric specification of either C or the transformed template curve. Note that since \mathrm{dist}(v(C), C) is a function of the unknown registration v and the unknown segmentation C, it plays the crucial role of connecting the registration and the segmentation terms.

Combining these three functionals together, we get the following variational principle for the simultaneous registration+segmentation problem:

\min E(C, v, u_o, u_i) = \int_\Omega (I_2 - u)^2 \, dx + \alpha_1 \oint_C ds + \alpha_2 \, \mathrm{dist}(v(C), C) - \alpha_3 \mathcal{C}\left( I_1(v(x)), I_2(x) \right) + \alpha_4 \int_\Omega \|\nabla v(x)\|^2 \, dx. \qquad (5-7)

The \alpha_i are weights controlling the contribution of each term to the overall energy function; they can be treated as unknown constants and either set empirically or estimated during the optimization process. This energy function is quite distinct from those used in methods existing in the literature because it achieves a Mumford-Shah type of segmentation in an active contour framework jointly with non-rigid registration and shape distance terms. We are now ready to discuss the level-set formulation of the energy function in the following section.


5.2.1 Gradient Flows

The level set method has been used extensively for implementing curve evolution based segmentation, primarily due to its many advantages over competing approaches. These include the ability to elegantly handle changes in the topology of the curve (splits and merges), the ability to deal with the formation of cusps and corners, which are extremely common in curve evolution, and the numerical stability and efficiency afforded by its implementation. For our model, where the equation for the unknown curve C is coupled with the equations for v(x), u_o and u_i, it is convenient for us to use the level set approach as proposed in [54].

Taking the variation of E(.) with respect to C and writing down the gradient descent leads to the following curve evolution equation:

\frac{\partial C}{\partial t} = \left[ -(I_2 - u_i)^2 + (I_2 - u_o)^2 + \alpha_1 \kappa - \alpha_2 \phi_{v(C)}(C) \right] \mathbf{N} \qquad (5-8)

Note that equation (5-6) is used in the derivation. Equation (5-8) in the level-set framework is given by:

\frac{\partial \phi}{\partial t} = \left[ -(I_2 - u_i)^2 + (I_2 - u_o)^2 + \alpha_1 \kappa - \alpha_2 \phi_{v(C)}(C) \right] |\nabla \phi| \qquad (5-9)

where u_i and u_o are the mean values inside and outside of the curve C in the image I_2, and \kappa is the curvature of the level sets of \phi. To drive the curve towards the template's level-set function \phi_{v(C)} more efficiently, rather than just having the zero level-sets match, we can add another term \phi(C) into the level-set evolution equation, giving us

\frac{\partial \phi}{\partial t} = \left[ -(I_2 - u_i)^2 + (I_2 - u_o)^2 + \alpha_1 \kappa - \alpha_2 \left( \phi_{v(C)}(C) - \phi(C) \right) \right] |\nabla \phi|. \qquad (5-10)

As illustrated in Figure 5-2, the two parameters \alpha_1 and \alpha_2 are used to balance the influence of the shape distance model and the region-based model. Since \phi(C) = 0 at any location on the curve by the definition of the level-set function \phi, this added term does not affect the curve evolution equation [83].
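A minimal discrete sketch of an Eqn. (5-9)-style update, decoupled from the registration, is given below. The array layout (phi > 0 inside), the finite-difference curvature, and the parameter values are our own illustrative choices:

```python
import numpy as np

def curvature(phi):
    """kappa = div(grad phi / |grad phi|) via central differences."""
    gy, gx = np.gradient(phi)
    norm = np.sqrt(gx ** 2 + gy ** 2) + 1e-8
    dny_dy, _ = np.gradient(gy / norm)
    _, dnx_dx = np.gradient(gx / norm)
    return dnx_dx + dny_dy

def level_set_step(phi, img, phi_atlas, a1=0.1, a2=0.5, dt=0.2):
    """One explicit Euler update of an Eqn (5-9)-style flow.

    phi > 0 is taken as the inside of the evolving curve; phi_atlas is
    the signed distance embedding of the transformed atlas curve v(C)."""
    inside = phi > 0
    u_i = img[inside].mean() if inside.any() else img.mean()
    u_o = img[~inside].mean() if (~inside).any() else img.mean()
    gy, gx = np.gradient(phi)
    grad_norm = np.sqrt(gx ** 2 + gy ** 2)
    force = (-(img - u_i) ** 2 + (img - u_o) ** 2
             + a1 * curvature(phi) - a2 * phi_atlas)
    return phi + dt * force * grad_norm
```

On a synthetic piecewise-constant image (a bright disk), iterating this update drives the zero level-set toward the disk boundary; a practical implementation would also reinitialize phi to a signed distance function periodically.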
As mentioned before, we use a B-spline basis to represent the displacement vector field v(x, p), where p denotes the transformation parameters of the B-spline basis. The gradient of the energy with respect to p is

\frac{\partial E}{\partial p} = \alpha_2 \frac{\partial}{\partial p} \int_R \phi_{v(C)}(x) \, dx - \alpha_3 \frac{\partial \mathcal{C}\left( I_1(v(x)), I_2(x) \right)}{\partial p} + \alpha_4 \frac{\partial}{\partial p} \int_\Omega \|\nabla v(x)\|^2 \, dx \qquad (5-11)



Figure 5-2: Illustration of the various terms in the evolution of the level set function \phi. To update \phi, we combine the standard region based update term and the level set function corresponding to the shape distance term.

The first term of equation (5-11) can be rewritten as follows:

\frac{\partial}{\partial p} \int_R \phi_{v(C)}(x) \, dx = \int_R \frac{\partial \phi_{v(C)}(x)}{\partial p} \, dx = \int_R \nabla \phi_{v(C)} \cdot \frac{\partial v(x, p)}{\partial p} \, dx \qquad (5-12)

where \nabla \phi_{v(C)} \cdot \frac{\partial v(x, p)}{\partial p} is the directional derivative of \phi_{v(C)} in the direction \frac{\partial v(x, p)}{\partial p}. The second term of Eqn. (5-11) was derived in Eqn. (3-15) of Chapter 3. We simply state the result here without the derivation for the sake of brevity:

\frac{\partial \mathcal{C}\left( I_2, I_1 \circ v(x; p) \right)}{\partial p} = \sum_{\lambda \in I_1} \sum_{k \in I_2} \log \frac{P(i > \lambda, k; p)}{p_{I_2}(k) \, P(i > \lambda; p)} \, \frac{\partial P(i > \lambda, k; p)}{\partial p} \qquad (5-13)

where P(i > \lambda, k; p) and P(i > \lambda; p) are the joint and marginal cumulative residual distributions respectively, and p_{I_2}(k) is the density function of image I_2. The last term of Eqn. (5-11) leads to

\frac{\partial}{\partial p} \int_\Omega \|\nabla v(x)\|^2 \, dx = 2 \int_\Omega \nabla v \cdot \frac{\partial \nabla v}{\partial p} \, dx \qquad (5-14)

where both the matrices \nabla v and \frac{\partial \nabla v}{\partial p} are vectorized before the dot product is computed.

Substituting equations (5-12), (5-13) and (5-14) back into equation (5-11), we get the analytical gradient of our energy function with respect to the B-spline transformation parameters p. We then solve for the stationary point of this nonlinear equation numerically using a quasi-Newton method.

5.2.2 Algorithm Summary

Given the atlas image I_1 and the unknown subject's brain scan I_2, we want the segmentation result C in I_2. Initialize the curve C in I_2 with the atlas shape, and set the initial displacement field to zero.

1. For fixed C, update the deformation field using a gradient-based numerical method for one step.

2. For the fixed deformation field v, evolve \phi in I_2 and thereby update C as the zero level-set of \phi.

3. Stop the registration process if the difference in consecutive iterates is less than \epsilon = 0.01, a pre-chosen tolerance; else go to Step 1.
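The three steps above can be mirrored in a generic alternating-minimization skeleton. Here it is exercised on a toy coupled objective E(v, c) = (v - c)^2 + (c - 2)^2 with closed-form per-variable updates; the toy problem and all names are ours, purely illustrative:

```python
def alternate(step_v, step_c, v0, c0, tol=0.01, max_iter=1000):
    """Alternate Step 1 (update v with C fixed) and Step 2 (update C
    with v fixed) until Step 3's test on consecutive iterates passes."""
    v, c = v0, c0
    for _ in range(max_iter):
        v_new = step_v(v, c)                          # Step 1
        c_new = step_c(v_new, c)                      # Step 2
        if abs(v_new - v) + abs(c_new - c) < tol:     # Step 3
            return v_new, c_new
        v, c = v_new, c_new
    return v, c

# exact coordinate minimizers of E(v, c) = (v - c)**2 + (c - 2)**2
step_v = lambda v, c: c                # argmin over v of (v - c)**2
step_c = lambda v, c: (v + 2.0) / 2.0  # argmin over c of the full E
```

Each sub-step decreases the joint energy, so the iterates converge to the coupled minimizer, mirroring how the deformation update and the level-set evolution cooperate in the full algorithm.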

5.3 Results

In this section, we present several example results from an application of our algorithm. The results are presented for synthetic as well as real data. The first three experiments were performed in 2D, while the fourth was performed in 3D. Note that the image pairs used in all these experiments have significantly different intensity profiles, unlike any of the previous methods reported in the literature for joint registration and segmentation. The synthetic data example contains a pair of MR T1 and T2 weighted images from the MNI brainweb site [52]. They were originally aligned with each other. We use the MR T1 image as the source image, and the target image was generated from the MR T2 image by applying a known non-rigid transformation that was procedurally generated using kernel-based spline representations (cubic B-spline). The possible values of the deformation in each direction vary from -15 to 15 pixels. In this

Figure 5-3: Results of application of our algorithm to synthetic data (see text for details).

case, we present the error in the estimated non-rigid deformation field, using our

algorithm, as an indicator of the accuracy of estimated deformations.

Figure 5-3 depicts the results obtained for this image pair. With the MR T1 image as the

source image, the target was obtained by applying a synthetically generated non-rigid

deformation field to the MR T2 image. Notice the significant difference between the

intensity profiles of the source and target images. Figure 5-3 is organized as follows, from

left to right: the first row depicts the source image with the atlas-segmentation superposed

in red, the registered source image which is obtained using our algorithm followed by the

target image with the unregistered atlas-segmentation superposed to depict the amount of

mis-alignment; second row depicts ground truth deformation field which we used to

generate the target image from the MR T2 image, followed by the estimated non-rigid

deformation field and finally the segmented target. The registration+segmentation results are quite accurate upon visual inspection. As a quantitative measure of accuracy, we estimated the average, \mu, and the standard deviation, \sigma, of the error in the estimated non-rigid deformation field. The error was estimated as the angle between the ground truth and estimated displacement vectors. The average and standard deviation are 1.5139 and 4.3211 degrees respectively, which is quite accurate.

Table 5-1 depicts statistics of the error in the estimated non-rigid deformation when compared to the ground truth. For the mean ground truth deformation (magnitude of the displacement vector) in Column 1 of each row, 5 distinct deformation fields with this mean are generated and applied to the target image of the given source-target pair to synthesize 5 pairs of distinct data sets. These pairs (one at a time) are input to our algorithm, and the mean (\mu) of the mean deformation error (MDE) is computed over the five pairs and reported in Column 2 of the table. MDE is defined as

\mathrm{MDE} = \frac{1}{\mathrm{card}(R)} \sum_{x_i \in R} \| v_0(x_i) - v(x_i) \|

where v_0(x_i) and v(x_i) are the ground truth and estimated displacements respectively at voxel x_i, \| \cdot \| denotes the Euclidean norm, and R is the region of interest. Column 3 depicts the standard deviation of the MDE for the five pairs of data in each row.

Table 5-1: Statistics of the error in estimated non-rigid deformation.

Mean deformation   \mu of MDE   \sigma of MDE
2.4                0.5822       0.0464
3.3                0.6344       0.0923
4.5                0.7629       0.0253
5.5                0.7812       0.0714

As evident, the mean and the standard deviation of the error are reasonably small, indicating the accuracy of our joint registration + segmentation algorithm. Note that this testing was done on a total of 20 image pairs (40 images), as there are 5 pairs of images per row.
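The two error measures used above (the MDE and the per-voxel angular error) can be written down directly. This is a small sketch of ours for displacement fields stored as arrays of shape (..., d); the function names are hypothetical:

```python
import numpy as np

def mean_deformation_error(v_true, v_est):
    """MDE: average Euclidean norm of the displacement error."""
    return np.linalg.norm(v_true - v_est, axis=-1).mean()

def angular_error_deg(v_true, v_est):
    """Per-voxel angle (degrees) between true and estimated displacements."""
    dot = (v_true * v_est).sum(axis=-1)
    denom = (np.linalg.norm(v_true, axis=-1)
             * np.linalg.norm(v_est, axis=-1))
    # clip guards arccos against round-off just outside [-1, 1]
    return np.degrees(np.arccos(np.clip(dot / denom, -1.0, 1.0)))
```

Restricting either measure to a region of interest amounts to masking the arrays before averaging.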

For the first real data experiment, we selected two image slices from two different

modalities of brain scans. The two slices depict considerable changes in shape of the

ventricles, the region of interest in the data sets. One of the two slices was arbitrarily

selected as the source and segmentation of the ventricle in the source was achieved using

an active contour model. The goal was then to automatically find the ventricle in the

target image using our algorithm given the input data along with the segmented ventricles


Figure 5-4: Results of application of our algorithm to a pair of slices from human brain
MRIs (see text for details).

in the source image. Figure 5-4 is organized as follows, from left to right: the first row

depicts the source image with the atlas-segmentation superposed in black followed by the

target image with the unregistered atlas-segmentation superposed to depict the amount of

mis-alignment; second row depicts the estimated non-rigid vector field and finally the

segmented target. As evident from Figure 5-4, the accuracy of the achieved registration+segmentation is visually very good. Note that the non-rigid deformation

between the two images in these two examples is quite large and our method was able to

simultaneously register and segment the target data sets quite accurately.

The second real data example is obtained from two brain MRIs of different subjects and

modalities, the segmentation of the cerebellum in the source image is given. We selected

two "corresponding" slices from these volume data sets to conduct the experiment. Note

that even though the number of slices for the two data sets are the same, the slices may

not correspond to each other from an anatomical point of view. However, for the purposes

of illustration of our algorithm, this is not very critical. We use the corresponding slice of

the 3D segmentation of the source as our atlas-segmentation. The results of an application


Figure 5-5: Corpus Callosum segmentation on a pair of corresponding slices from dis-
tinct subjects.

of our algorithm are organized as before in Figure 5-5. Once again, as evident, the visual

quality of the segmentation and registration are very high.

Finally we present a 3D real data experiment. In this experiment, the input is a pair of 3D

brain scans with the segmentation of the hippocampus in one of the two images (labeled

the atlas image) being obtained using the well known PCA on the several training data

sets. Each data set contains 19 slices of size 256x256. The goal was then to automatically

find the hippocampus in the target image given the input. Figure 5-6 depicts the results

obtained for this image pair. From left to right, the first image shows the given (atlas)

hippocampus surface followed by one cross-section of this surface overlaid on the source

image slice; the third image shows the segmented hippocampus surface from the target

image using our algorithm and finally the cross-section of the segmented surface overlaid

on the target image slice. To validate the accuracy of the segmentation result, we

randomly sampled 120 points from the segmented surface and computed the average

distance from these points to the ground truth hand segmented hippocampal surface in the

target image. The hand segmentation was performed by an expert neuroanatomist. The

Figure 5-6: Hippocampal segmentation using our algorithm on a pair of brain scans from
distinct subjects. (see text for details)

average and standard deviation of the error in the aforementioned distance in estimated hippocampal shape are 0.8190 and 0.5121 voxels respectively, which is very accurate.


6.1 Contributions of the Dissertation

We have introduced a variety of information theoretic measures and showed various applications. The novel information measures we presented in this dissertation include:

- an entropy defined on distributions, the Cumulative Residual Entropy (CRE)
- the Cross-Cumulative Residual Entropy (CCRE)
- a CDF-based Kullback-Leibler (KL) divergence
- a CDF-based Jensen-Shannon (JS) divergence

We demonstrated their applications to the following medical image analysis problems:

- non-rigid image registration
- simultaneous groupwise point-sets registration and atlas construction
- atlas-based image segmentation

Our contributions to each of these topics are summarized in the following sections.

6.2 Image and Point-sets Registration

6.2.1 Non-rigid Image Registration

For non-rigid image registration, we presented a novel way to register multi-modal datasets based on a matching criterion called cross cumulative residual entropy (CCRE) [84] to measure the similarity between two images. The matching measure is defined based on a new information measure, namely cumulative residual entropy (CRE), which is defined using probability distributions instead of probability densities; therefore CCRE is valid in both discrete and continuous domains. Furthermore, CCRE also inherits the robustness property of the CRE measure. In [84], we presented results of rigid and affine registration under a variety of noise levels and showed significantly superior performance over MI-based methods.

The cross-CRE between the two images to be registered is maximized over the space of smooth and unknown non-rigid transformations, represented by tri-cubic B-splines placed on a regular grid. The analytic gradient of this matching measure is then derived to achieve efficient and accurate non-rigid registration. It turns out that the gradient of the CCRE has a formulation similar to that of the cost function, which greatly saves memory space in the optimization process. The matching criterion is optimized using a quasi-Newton method to recover the transformation parameters.

The key strengths of our proposed non-rigid registration scheme are demonstrated through the registration of synthetic as well as real data sets from multi-modality (MR T1 and T2 weighted, MR & CT) imaging sources. It is shown that CCRE not only can accommodate images to be registered of varying contrast+brightness, but is also robust in the presence of noise. CCRE converges faster when compared with other information theory-based registration methods. Finally, we also showed that CCRE is well suited for situations where the source and the target images have FOVs with large non-overlapping regions (which is quite common in practice). Comparisons were made between CCRE and traditional MI [34, 51], which is defined using the Shannon entropy. All the experiments depicted significantly better performance of CCRE over the MI-based methods currently used in the literature.

Our future work will focus on extending the transformation model to one that permits spatial adaptation of the transformation's compliance, which will allow us to reduce the number of degrees of freedom in the overall transformation. Validation of non-rigid registration on real data, with the aid of segmentations and landmarks obtained manually from a group of trained anatomists, is the goal of our ongoing work.

6.2.2 Groupwise Point-sets Registration

We presented a novel and robust algorithm for the groupwise non-rigid registration of multiple unlabeled point-sets with no bias toward any of the given point-sets. To quantify the divergence between multiple probability distributions estimated from the given point-sets, we proposed several divergence measures, the first of which is the Jensen-Shannon divergence. Since it lacks robustness, we developed a novel measure based on the cumulative distribution functions of the point-sets, which we dub the CDF-JS divergence. The measure parallels the well-known Jensen-Shannon divergence (defined for probability density functions) but is more regular than the JS divergence since its definition is based on CDFs as opposed to density functions. As a consequence, CDF-JS is more immune to noise and statistically more robust than JS.
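For reference, the density-based JS divergence at the heart of this comparison can be computed for two discretized densities as follows. This is a generic sketch with equal weights, not the dissertation's point-set implementation:

```python
import numpy as np

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete densities,
    with equal weights: JS(p, q) = 0.5*KL(p || m) + 0.5*KL(q || m), m = (p+q)/2."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0          # the convention 0 * log 0 = 0
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Symmetric, non-negative, and zero exactly when the densities coincide:
hist_a = [0.1, 0.4, 0.4, 0.1]
hist_b = [0.25, 0.25, 0.25, 0.25]
print(js_divergence(hist_a, hist_b))   # positive
print(js_divergence(hist_a, hist_a))   # 0.0
```

In the groupwise setting the same quantity is evaluated over the densities (or, for CDF-JS, the distribution functions) estimated from each point-set, with the mixture m playing the role of the emerging atlas.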

Our proposed methods do not require any knowledge of correspondence between the input point-sets, and therefore these point-sets need not have the same cardinality. One other salient feature of our proposed algorithms is that we obtain a probabilistic atlas as a byproduct of the registration process. Our algorithm can be especially useful for creating atlases of various shapes present in images as well as for simultaneously (rigidly or non-rigidly) registering 3D range data sets without having to establish any correspondence.

Our future work will focus on using maximum likelihood estimation (MLE) to automatically determine the weighting coefficients in the divergence measures and the smoothing term. We are also attempting to extend our techniques to diffeomorphic point-set registration.


6.3 Image Segmentation

For image segmentation, we presented a novel variational formulation of the joint (non-rigid) registration and segmentation problem which requires the solution of a coupled set of nonlinear PDEs that are solved using efficient numerical schemes. Our work is a departure from earlier methods in that we presented a unified variational principle wherein non-rigid registration and segmentation are simultaneously achieved. Unlike earlier methods presented in the literature, a key feature of our algorithm is that it can accommodate image pairs having distinct intensity distributions. We presented several examples on synthetic (twenty) and real (three) data sets along with quantitative accuracy estimates of the registration in the synthetic data case. The accuracy evident in these experiments is quite satisfactory. Our future efforts will focus on adapting our algorithm and software for clinical use.


REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, 1948.

[2] W. F. Sharpe, Investments. London: Prentice Hall, 1985.

[3] D. Salomon, Data Compression. New York: Springer, 1998.

[4] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.

[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley.

[6] G. Jumarie, Relative Information. New York: Springer, 1990.

[7] M. Rao, Y. Chen, B. C. Vemuri, and F. Wang, "Cumulative residual entropy: a new
measure of information," IEEE Transactions on Information Theory, vol. 50, no. 6,
pp. 1220-1228, June 2004.

[8] M. Asadi and Y. Zohrevand, "On the dynamic cumulative residual entropy,"
Unpublished Manuscript, 2006.

[9] H. Chui, L. Win, R. Schultz, J. Duncan, and A. Rangarajan, "A unified non-rigid
feature registration method for brain mapping," Medical Image Analysis, vol. 7,
no. 2, pp. 112-130, 2003.

[10] N. Paragios, M. Rousson, and V. Ramesh, "Non-rigid registration using distance
functions," Comput. Vis. Image Underst., vol. 89, no. 2-3, pp. 142-165, 2003.

[11] M. A. Audette, K. Siddiqi, F. P. Ferrie, and T. M. Peters, "An integrated
range-sensing, segmentation and registration framework for the characterization of
intra-surgical brain deformations in image-guided surgery," Comput. Vis. Image
Underst., vol. 89, no. 2-3, pp. 226-251, 2003.

[12] A. Leow, P. M. Thompson, H. Protas, and S.-C. Huang, "Brain warping with
implicit representations." in International Symposium on Biomedical Imaging, 2004,
pp. 603-606.

[13] B. Jian and B. C. Vemuri, "A robust algorithm for point set registration using
mixture of gaussians." in IEEE International Conference on Computer Vision, 2005,
pp. 1246-1251.

[14] F. Wang, B. C. Vemuri, A. Rangarajan, I. M. Schmalfuss, and S. J. Eisenschenk,
"Simultaneous nonrigid registration of multiple point sets and atlas construction," in
European Conference on Computer Vision, 2006, pp. 551-563.

[15] H. Guo, A. Rangarajan, and S. Joshi, "A new joint clustering and diffeomorphism estimation algorithm for non-rigid shape matching," in IEEE Computer Vision and Pattern Recognition, 2004, pp. 16-22.

[16] M. Irani and P. Anandan, "Robust Multi-sensor Image Alignment," in International
Conference on Computer Vision, Bombay, India, 1998, pp. 959-965.

[17] J. Liu, B. C. Vemuri, and J. L. Marroquin, "Local frequency representations for
robust multimodal image registration," IEEE Transactions on Medical Imaging,
vol. 21, no. 5, pp. 462-469, 2002.

[18] M. Mellor and M. Brady, "Non-rigid multimodal image registration using local
phase," in Medical Image Computing and Computer-Assisted Intervention,
Saint-Malo, France, Sep 2004, pp. 789-796.

[19] B. Zitova and J. Flusser, "Image registration methods: a survey." Image Vision
Comput., vol. 21, no. 11, pp. 977-1000, 2003.

[20] J. Ruiz-Alzola, C.-F. Westin, S. K. Warfield, A. Nabavi, and R. Kikinis, "Nonrigid
registration of 3d scalar vector and tensor medical data," in ThirdInternational
Conference on Medical Image Computing and Computer-Assisted Intervention,
A. M. DiGioia and S. Delp, Eds., Pittsburgh, October 11-14 2000, pp. 541-550.

[21] L. Marroquin, B. Vemuri, S. Botello, F. Calderon, and A. Fernandez-Bouzas, "An
accurate and efficient bayesian method for automatic segmentation of brain mri," in
IEEE Transactions on Medical Imaging, 2002, pp. 934-945.

[22] B. C. Vemuri, J. Ye, Y. Chen, and C. M. Leonard, "A level-set based approach to
image registration," in IEEE Workshop on Mathematical Methods in Biomedical
Image Analysis, 2000, pp. 86-93.

[23] P. Hellier, C. Barillot, E. Mémin, and P. Pérez, "Hierarchical estimation of a dense
deformation field for 3d robust registration," IEEE Transactions on Medical
Imaging, vol. 20, no. 5, pp. 388-402, May 2001.

[24] R. Szeliski and J. Coughlan, "Spline-based image registration," Int. J. Comput.
Vision, vol. 22, no. 3, pp. 199-218, March 1997.

[25] S. H. Lai and M. Fang, "Robust and efficient image alignment with spatially-varying
illumination models," in IEEE Conference on Computer Vision and Pattern
Recognition, 1999, pp. II: 167-172.

[26] A. Guimond, A. Roche, N. Ayache, and J. Meunier, "Three-Dimensional Multimodal Brain Warping Using the Demons Algorithm and Adaptive Intensity Corrections," IEEE Transactions on Medical Imaging, vol. 20, no. 1, pp. 58-69.

[27] J.-P. Thirion, "Image matching as a diffusion process: an analogy with maxwell's
demons," Medical Image Analysis, vol. 2, no. 3, pp. 243-260, 1998.

[28] A. Cuzol, P. Hellier, and E. Memin, "A novel parametric method for non-rigid image
registration," in Proc. Information Processing in Medical Imaging (IPMI'05), ser.
LNCS, G. Christensen and M. Sonka, Eds., no. 3565, Glenwood Springs,
Colorado, USA, July 2005, pp. 456-467.

[29] A. W. Toga and P. M. Thompson, "The role of image registration in brain mapping,"
Image Vision Comput., vol. 19, no. 1-2, pp. 3-24, 2001.

[30] E. D'Agostino, F. Maes, D. Vandermeulen, and P. Suetens, "Non-rigid
atlas-to-image registration by minimization of class-conditional image entropy." in
Medical Image Computing and Computer-Assisted Intervention, 2004, pp. 745-753.

[31] P. A. Viola and W. M. Wells, "Alignment by maximization of mutual information,"
in IEEE International Conference on Computer Vision, MIT, Cambridge, 1995.

[32] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens, and G. Marchal,
"Automated multimodality image registration based on information theory," Proc.
Information Processing in Medical Imaging, pp. 263-274, 1995.

[33] C. Studholme, D. Hill, and D. J. Hawkes, "Automated 3D registration of MR and
CT images in the head," Medical Image Analysis, vol. 1, no. 2, pp. 163-175, 1996.

[34] D. Mattes, D. R. Haynor, H. Vesselle, T. K. Lewellen, and W. Eubank, "Pet-ct image
registration in the chest using free-form deformations." IEEE Transactions on
Medical Imaging, vol. 22, no. 1, pp. 120-128, 2003.

[35] D. Rueckert, A. F. Frangi, and J. A. Schnabel, "Automatic construction of 3d
statistical deformation models of the brain using non-rigid registration." IEEE
Transactions on Medical Imaging, vol. 22, no. 8, pp. 1014-1025, 2003.

[36] G. Hermosillo, C. Chefd'hotel, and O. Faugeras, "Variational methods for multimodal image matching," Int. J. Comput. Vision, vol. 50, no. 3, pp. 329-343.

[37] D. Rueckert, L. I. Sonoda, C. Hayes, D. L. G. Hill, M. O. Leach, and D. J. Hawkes,
"Nonrigid registration using free-form deformations: Application to breast mr
images," IEEE Transactions on Medical Imaging, vol. 18, no. 8, pp. 712-721,
August 1999.

[38] M. E. Leventon and W. E. L. Grimson, "Multimodal volume registration using joint
intensity distributions," in Medical Image Computing and Computer-Assisted
Intervention (MICCAI), Cambridge, MA, 1998, pp. 1057-1066.

[39] T. Gaens, F. Maes, D. Vandermeulen, and P. Suetens, "Non-rigid multimodal image
registration using mutual information," in Proc. Conference on Medical Image
Computing and Compter-Assisted Intervention (MICCAI), 1998, pp. 1099-1106.

[40] D. Loeckx, F. Maes, D. Vandermeulen, and P. Suetens, "Nonrigid image registration
using free-form deformations with a local rigidity constraint." in Medical Image
Computing and Computer-Assisted Intervention, 2004, pp. 639-646.

[41] G. K. Rohde, A. Aldroubi, and B. M. Dawant, "The adaptive bases algorithm for
intensity based nonrigid image registration." IEEE Transactions on Medical
Imaging, vol. 22, no. 11, pp. 1470-1479, 2003.

[42] V. Duay, P.-F. D'Haese, R. Li, and B. M. Dawant, "Non-rigid registration algorithm
with spatially varying stiffness properties." in International Symposium on
Biomedical Imaging, 2004, pp. 408-411.

[43] C. Guetter, C. Xu, F. Sauer, and J. Hornegger, "Learning based non-rigid
multi-modal image registration using kullback-leibler divergence." in Medical Image
Computing and Computer-Assisted Intervention, 2005, pp. 255-262.

[44] E. D'Agostino, F. Maes, D. Vandermeulen, and P. Suetens, "An information
theoretic approach for non-rigid image registration using voxel class probabilities,"
Medical Image Analysis, vol. 10, no. 3, pp. 413-431, 2006.

[45] C. Davatzikos, "Spatial transformation and registration of brain images using
elastically deformable models," Comput. Vis. Image Underst., vol. 66, no. 2, pp.
207-222, 1997.

[46] J. C. Gee, M. Reivich, and R. Bajcsy, "Elastically deforming 3d atlas to match
anatomical brain images," J. Comput. Assist. Tomogr., vol. 17, no. 2, pp. 225-236.

[47] M. Bro-Nielsen and C. Gramkow, "Fast fluid registration of medical images," in
Proc. of the 4th International Conference on Visualization in Biomedical
Computing. London, UK: Springer-Verlag, 1996, pp. 267-276.

[48] G. E. Christensen, R. D. Rabbitt, and M. I. Miller, "Deformable templates using
large deformation kinematics," IEEE Transactions On Image Processing, vol. 5,
no. 10, pp. 1435-1447, October 1996.

[49] X. Geng, D. Kumar, and G. E. Christensen, "Transitive inverse-consistent manifold
registration," in Proc. Information Processing in Medical Imaging, 2005.

[50] A. Trouvé, "Diffeomorphisms groups and pattern matching in image analysis," Int. J. Comput. Vision, vol. 28, no. 3, pp. 213-221, 1998.

[51] D. R. Forsey and R. H. Bartels, "Hierarchical b-spline refinement," Computer
Graphics, vol. 22, no. 4, pp. 205-212, 1988.

[52] C. Cocosco, V. Kollokian, R.-S. Kwan, and A. Evans, "Brainweb: online interface to
a 3-d mri simulated brain database," 1997, last accessed: July 2005. [Online].

[53] P. Thevenaz and M. Unser, "Optimization of mutual information for multiresolution
image registration," IEEE Transactions on Image Processing, vol. 9, no. 12, pp.
2083-2099, December 2000.

[54] T. Chan and L. Vese, "An active contour model without edges," in Intl. Conf. on
Scale-space Theories in Computer Vision, 1999, pp. 266-277.

[55] N. Duta, A. K. Jain, and M.-P. Dubuisson-Jolly, "Automatic construction of 2d shape
models," IEEE Transactions Pattern Anal. Mach. Intell., vol. 23, no. 5, pp. 433-446,

[56] E. Klassen, A. Srivastava, W. Mio, and S. H. Joshi, "Analysis of planar shapes using
geodesic paths on shape spaces." IEEE Transactions Pattern Anal. Mach. Intell.,
vol. 26, no. 3, pp. 372-383, 2003.

[57] T. B. Sebastian, P. N. Klein, B. B. Kimia, and J. J. Crisco, "Constructing 2d curve
atlases," in IEEE Workshop on Mathematical Methods in Biomedical Image
Analysis, Washington, DC, USA, 2000, pp. 70-77.

[58] H. Tagare, "Shape-based nonrigid correspondence with application to heart motion
analysis." IEEE Transactions on Medical Imaging, vol. 18, no. 7, pp. 570-579, 1999.

[59] F. L. Bookstein, "Principal warps: Thin-plate splines and the decomposition of
deformations." IEEE Transactions Pattern Anal. Mach. Intell., vol. 11, no. 6, pp.
567-585, 1989.

[60] H. Chui, A. Rangarajan, J. Zhang, and C. M. Leonard, "Unsupervised learning of an
atlas from unlabeled point-sets." IEEE Transactions Pattern Anal. Mach. Intell.,
vol. 26, no. 2, pp. 160-172, 2004.

[61] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using
shape contexts," IEEE Transactions Pattern Anal. Mach. Intell., vol. 24, no. 4, pp.
509-522, 2002.

[62] H. Chui and A. Rangarajan, "A new algorithm for non-rigid point matching." in
IEEE Computer Vision and Pattern Recognition, 2000, pp. 2044-2051.

[63] H. Guo, A. Rangarajan, S. Joshi, and L. Younes, "Non-rigid registration of shapes
via diffeomorphic point matching." in International Symposium on Biomedical
Imaging, 2004, pp. 924-927.

[64] T. F. Cootes, C. J. Taylor, D. H. Cooper, and J. Graham, "Active shape models: their
training and application," Comput. Vis. Image Underst., vol. 61, no. 1, pp. 38-59,

[65] Y. Wang and L. H. Staib, "Boundary finding with prior shape and smoothness
models," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22,
no. 7, pp. 738-743, 2000.

[66] A. Hill, C. J. Taylor, and A. D. Brett, "A framework for automatic landmark
identification using a new method of nonrigid correspondence." IEEE Transactions
Pattern Anal. Mach. Intell., vol. 22, no. 3, pp. 241-251, 2000.

[67] Y. Tsin and T. Kanade, "A correlation-based approach to robust point set
registration." in European Conference on Computer Vision, 2004, pp. 558-569.

[68] J. Glaunès, A. Trouvé, and L. Younes, "Diffeomorphic matching of distributions: A
new approach for unlabelled point-sets and sub-manifolds matching." in IEEE
Computer Vision and Pattern Recognition, 2004, pp. 712-718.

[69] J. Lin, "Divergence measures based on the shannon entropy," IEEE Transactions
Information Theory, vol. 37, pp. 145-151, 1991.

[70] A. Hero, B. Ma, O. Michel, and J. Gorman, "Applications of entropic spanning graphs," IEEE Signal Processing Magazine, vol. 19, pp. 85-95, 2002.

[71] Y. He, A. Ben-Hamza, and H. Krim, "A generalized divergence measure for robust
image registration," IEEE Transactions Signal Processing, vol. 51, pp. 1211-1220,

[72] D. M. Endres and J. E. Schindelin, "A new metric for probability distributions,"
IEEE Transactions Information Theory, vol. 49, pp. 1858-60, 2003.

[73] H. Chui and A. Rangarajan, "A new point matching algorithm for non-rigid
registration," Computer Vision and Image Understanding (CVIU), vol. 89, pp.
114-141, 2003.

[74] G. McLachlan and K. Basford, Mixture Models: Inference and Applications to Clustering. New York: Marcel Dekker, 1988.

[75] A. L. Yuille, P. Stolorz, and J. Utans, "Statistical physics, mixtures of distributions,
and the em algorithm," Neural Comput., vol. 6, no. 2, pp. 334-340, 1994.

[76] R. Bansal, L. Staib, Z. Chen, A. Rangarajan, J. Knisely, R. Nath, and J. Duncan.,
"Entropy-based, multiple-portal-to-3d ct registration for prostate radiotherapy using
iteratively estimated segmentation," in Medical Image Computing and
Computer-Assisted Intervention, 1999, pp. 567-578.

[77] A. Yezzi, L. Zollei, and T. Kapur, "A variational framework for joint segmentation
and registration," in IEEE Workshop on Mathematical Methods in Biomedical Image
Analysis, 2001, pp. 388-400.

[78] P. Wyatt and J. Noble, "Mrf-map joint segmentation and registration," in Medical
Image Computing and Computer-Assisted Intervention, 2002, pp. 580-587.

[79] N. Paragios, M. Rousson, and V. Ramesh, "Knowledge-based registration &
segmentation of the left ventricle: A level set approach." in WACV, 2002, pp. 37-42.

[80] B. Fischl, D. Salat, E. Busa, M. Albert et al., "Whole brain segmentation: Automated labeling of neuroanatomical structures in the human brain," Neuron, vol. 33, pp. 341-355, 2002.

[81] S. Soatto and A. J. Yezzi, "Deformotion: Deforming motion, shape average and the
joint registration and segmentation of images," in European Conference on
Computer Vision, 2002, pp. 32-57.

[82] B. C. Vemuri, Y. Chen, and Z. Wang, "Registration assisted image smoothing and
segmentation," in European Conference on Computer Vision, 2002, pp. 546-559.

[83] T. Zhang and D. Freedman, "Tracking objects using density matching and shape priors," in IEEE International Conference on Computer Vision, 2003.

[84] F. Wang, B. C. Vemuri, M. Rao, and Y. Chen, "A new & robust information theoretic
measure and its application to image alignment." in Proc. Information Processing in
Medical Imaging, 2003, pp. 388-400.


BIOGRAPHICAL SKETCH

Fei Wang was born in Yan Cheng, JiangSu, P. R. China. He received his Bachelor of Science degree from the University of Science and Technology of China in 2001. He earned his Master of Science and Doctor of Philosophy degrees from the University of Florida in December 2002 and August 2006, respectively. His research interests include medical imaging, computer vision, pattern recognition, computer graphics and shape modeling.

I would like to first thank my advisor, Dr. Baba C. Vemuri, for everything he has done for me during my doctoral study. This dissertation would not have taken shape without his invaluable input. Dr. Vemuri introduced me to the field of medical image analysis. His insight and experience have guided me throughout my research, during which time he provided numerous invaluable suggestions. It was a great pleasure for me to conduct this dissertation under his supervision. I would also like to thank Dr. Anand Rangarajan, Dr. Sartaj Sahni, Dr. Arunava Banerjee and Dr. Tan Wong for their willingness to serve on my committee. In addition, special thanks go to Dr. Jorg Peters for attending my PhD oral examination.

My doctoral research has been a happy cooperation with many people. Dr. Vemuri has been involved with the whole process; Dr. Rangarajan has guided me a great deal in the groupwise point-set registration part; Dr. Ilona Schmalfuss and Dr. Stephan Eisenschenk have kindly provided the data for hippocampal segmentation and taught me what little I know of neuroscience. I have also benefitted from Dr. Thomas E. Davis's guidance when I first joined the lab. I would also like to thank Dr. Banerjee for stimulating debates, and Dr. Jeffrey Ho for his professional advice and philosophical discussions. Thanks also go to Drs. Murali Rao and Yunmei Chen, who were my co-authors on a set of papers that introduced the concept of entropy based on probability distributions and several properties of the same. Needless to say, I am grateful for the support of my colleagues and friends at the Computer and Information Science and Engineering Department at the University of Florida: Dr. Zhizhou Wang, Dr. Jundong Liu, Dr. Tim McGraw, Dr. Eric Spellman, Bing Jian, Santhosh Kodipaka, Nicholas Lord, Neeti Vohra, Angelos Barmpoutis, Seniha Esen




TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
1.1 Image and Point-set Registration
1.1.1 Image Registration
1.1.2 Groupwise Point-sets Registration
1.2 Image Segmentation
1.3 Outline of Remainder

2 ENTROPY AND RELATED MEASURES
2.1 Shannon Entropy and Related Measures
2.2 Cumulative Residual Entropy: A New Measure of Information
2.3 Properties of CRE
2.3.1 CRE and Empirical CRE
2.3.2 Robustness of CRE

3 APPLICATIONS TO MULTIMODALITY IMAGE REGISTRATION
3.1 Related Work
3.2 Multimodal Image Registration using CCRE
3.2.1 Transformation Model for Non-rigid Motion
3.2.2 Measure Optimization
3.2.3 Computation of P and its Derivative
3.2.4 Algorithm Summary
3.3 Implementation Results
3.3.1 Synthetic Motion Experiments

4 DIVERGENCE MEASURES FOR GROUPWISE POINT-SETS REGISTRATION
4.1 Previous Work
4.2 Divergence Measures
4.2.1 Jensen-Shannon Divergence
4.2.2 CDF-JS Divergence
4.3 Methodology
4.3.1 Energy Function for Groupwise Point-sets Registration
4.3.2 JS Divergence in a Hypothesis Testing Framework
4.3.3 Unbiasness Property of the Divergence Measures
4.3.4 Estimating JS and its Derivative
4.3.5 Estimating CDF-JS and its Derivative
4.4 Experiment Results
4.4.1 JS Divergence Results
4.4.2 CDF-JS Divergence Results

5 APPLICATIONS TO IMAGE SEGMENTATION
5.1 Related Work
5.2 Registration+Segmentation Model
5.2.1 Gradient Flows
5.2.2 Algorithm Summary
5.3 Results

6 CONCLUSIONS AND FUTURE WORK
6.1 Contributions of the Dissertation
6.2 Image and Point-sets Registration
6.2.1 Non-rigid Image Registration
6.2.2 Groupwise Point-sets Registration
6.3 Image Segmentation

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

3 Comparison of the registration results between CCRE and MI for a fixed synthetic deformation field.
3 Comparison of total time taken to achieve registration by the CCRE with MI.
3 Comparison of the value S of several brain structures for CCRE and MI.
5 Statistics of the error in estimated non-rigid deformation.


LIST OF FIGURES

1 Illustration of groupwise registration of corpus callosum point-sets manually extracted from the outer contours of the brain images.
3 CCRE, MI and NMI traces plotted for the misaligned MR & CT image pair.
3 Comparison of convergence speed between CCRE and MI.
3 Plot demonstrating the change of Mean Deformation Error for CCRE and MI registration results with time.
3 Results of application of our algorithm to synthetic data (see text for details).
3 Registration results of MR T1 and T2 image slice with large non-overlap.
3 Registration results of different subjects of MR & CT brain data with real non-rigid motion (see text for details).
4 Illustration of corpus callosum point-sets represented as density functions.
4 Results of rigid registration in noiseless case. 'o' and '+' indicate the model and scene points respectively.
4 Non-rigid registration of the corpus callosum point sets.
4 Experiment results on seven 2D corpus callosum point sets.
4 Robustness to outliers in the presence of large noise.
4 Robustness test on 3D swan data.
4 Atlas construction from four 3D hippocampal point sets.
5 Model illustration.
5 Illustration of the various terms in the evolution of the level set function.
5 Results of application of our algorithm to synthetic data.
5 Results of application of our algorithm to a pair of slices from human brain MRIs.
5 Corpus callosum segmentation on a pair of corresponding slices from distinct subjects.


Information theory has played a fundamental role in many fields of science and engineering including computer vision and medical imaging. In this dissertation, various information theoretic measures that are used in achieving the goal of solving several important problems in medical imaging, namely image registration, point-set registration and image segmentation, are presented.

To measure the information content in a random variable, we first present a novel measure based on its cumulative distribution that is dubbed Cumulative Residual Entropy (CRE). This measure parallels the well-known Shannon entropy but has the following advantages: (1) it is more general than the Shannon entropy as its definition is valid in the discrete and continuous domains, (2) it possesses more general mathematical properties and (3) it can be easily computed from sample data and these computations asymptotically converge to the true values. Based on CRE, we define the cross-CRE (CCRE) between two random variables, and apply it to solve the image alignment problem for parameterized transformations. The key strengths of the CCRE over using the now popular Mutual Information (based on Shannon's entropy) between images being


Jensen-Shannon (JS) divergence has long been known as a measure of cohesion between multiple probability densities. Similar to the idea of defining an entropy measure based on distributions, we derived a JS divergence based on probability distributions and dub it the CDF-JS divergence. We then apply the JS and the CDF-JS divergence to the groupwise point-set registration problem, which involves simultaneously registering multiple shapes (represented as point-sets) for constructing an atlas. Estimating a meaningful average or mean shape from a set of shapes represented by unlabeled point-sets is a challenging problem, since this usually involves solving for point correspondence under a non-rigid motion setting. The novel and robust algorithm we propose avoids the correspondence problem by minimizing the CDF-JS/JS divergence between the point-sets represented as probability distribution/density functions. The cost functions are fully symmetric with no bias toward any of the given shapes to be registered and whose mean is being sought. We empirically show that CDF-JS is more robust to noise and outliers than JS divergence. Our algorithm can be especially useful for creating atlases of various shapes present in images as well as for simultaneously registering 3D range data sets without having to establish any correspondence.

In the context of image segmentation, we developed a novel model-based segmentation technique that involves segmenting the novel 3D image data by non-rigidly registering an atlas to it. The key contribution here to the solution of this problem is that we present a novel variational formulation of the registration assisted image segmentation task, which leads to solving a coupled set of nonlinear PDEs that are solved using efficient numerical schemes. Our segmentation algorithm is a departure from earlier methods in that we have a unified variational principle wherein non-rigid registration and segmentation are simultaneously achieved; unlike previous solutions to this problem, our algorithm can accommodate image pairs with very distinct intensity distributions.


In 1948, motivated by the problem of efficiently transmitting information over a noisy communication channel, Claude Shannon introduced a revolutionary new probabilistic way of thinking about communication and simultaneously created the first truly mathematical theory of entropy. His ideas created a sensation and were rapidly developed to create the field of information theory, which employs probability and ergodic theory to study the statistical characteristics of data and communication systems. Since then, information theory has played a fundamental role in many fields of science and engineering including computer vision and medical imaging. In this dissertation, we endeavor to develop novel information theoretic methods with application to medical image analysis.

We examine two applications in particular, image (point-set) registration and image segmentation. In the first of these applications, we follow a promising avenue of work in using a probability density or distribution function as the signature of a given object (image or point-set). Then by optimizing certain information theoretic measures between these functions, we achieve the desired registration. In the segmentation application, we consider an atlas based approach, in which segmentation and registration are simultaneously achieved by solving a novel variational principle.


to the unknown parameterized transformation to be determined, define a match metric M(I1(x, y), I2(x', y')) and optimize M over all T.

The fundamental characteristic of any image registration technique is the type of spatial transformation or mapping used to properly overlay two images. The transformation can be classified into global and local transformations. A global transformation is given by a single equation which maps the entire image. Local transformations map the image differently depending on the spatial location and are thus more difficult to express succinctly. The most common global transformations are rigid, affine and projective transformations.

A transformation is called rigid if the distance between points in the image being transformed is preserved. A rigid transformation can be expressed as

u(x, y) = x cos θ − y sin θ + dx − x,
v(x, y) = x sin θ + y cos θ + dy − y,

where u(x, y) and v(x, y) denote the displacement at point (x, y) along the X and Y directions, θ is the rotation angle, and (dx, dy) the translation vector.

A transformation is called affine when any straight line in the first image is mapped onto a straight line in the second image with parallelism being preserved. In 2D, the affine transformation can be expressed as

u(x, y) = a11 x + a12 y + dx − x,
v(x, y) = a21 x + a22 y + dy − y,

where (a11, a12; a21, a22) denotes an arbitrary real-valued matrix. The scaling transformation, which has the transformation matrix (s1, 0; 0, s2), and the shearing transformation, which has the matrix (1, s3; 0, 1), are two examples of affine transformations, where s1, s2 and s3 are positive real numbers.
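A quick numerical check of the rigid case (a sketch of our own; the angle and translation values below are arbitrary) confirms that distances between points are preserved:

```python
import math

def rigid_displacement(x, y, theta, dx, dy):
    """Displacement (u, v) at point (x, y) under a rigid transform with
    rotation angle theta and translation vector (dx, dy), following the
    u(x, y), v(x, y) convention in the text."""
    u = x * math.cos(theta) - y * math.sin(theta) + dx - x
    v = x * math.sin(theta) + y * math.cos(theta) + dy - y
    return u, v

def apply_rigid(p, theta, dx, dy):
    u, v = rigid_displacement(p[0], p[1], theta, dx, dy)
    return (p[0] + u, p[1] + v)

p1, p2 = (1.0, 2.0), (4.0, 6.0)
q1 = apply_rigid(p1, 0.3, 2.0, -1.0)
q2 = apply_rigid(p2, 0.3, 2.0, -1.0)
dist = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
print(dist(p1, p2), dist(q1, q2))   # both approximately 5.0
```

An affine transform would generally change this distance while still keeping straight lines straight and parallel lines parallel.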


A more interesting case, in general, is that of a planar surface in motion viewed through a pinhole camera. This motion can be described as a 2D projective transformation of the plane:

x' = (a0 x + a1 y + a2) / (a6 x + a7 y + 1),
y' = (a3 x + a4 y + a5) / (a6 x + a7 y + 1),

where a0, ..., a7 are the global parameters.

When a global transformation does not adequately explain the relationship of a pair of input images, a local transformation may be necessary. Registering an image pair obtained at different times with some portion of the body experiencing growth, or registering two images from different patients, falls into this local transformation registration category. A motion field is usually used to describe the change in the local transformation problem.

1.1.2 Groupwise Point-sets Registration

Given N point-sets, denoted by {Xp, p ∈ {1, ..., N}}, each point-set Xp consists of points {xpi ∈ R^D, i ∈ {1, ..., np}}, where np is the number of points contained in point-set Xp. The task of multiple point pattern matching or point-set registration is either to establish a consistent point-to-point correspondence between these point-sets or to recover the spatial transformation which yields the best alignment. For example, we are given a group of corpus callosum point-sets from the brain image scans, shown in the left column of Figure 1. All the point-sets are registered simultaneously to the point-sets shown in the right column in a symmetric manner, meaning that the registration


result is not biased toward any of the original point-sets. We will discuss these issues in greater detail in Chapter 4.

Figure 1: Illustration of groupwise registration of corpus callosum point-sets manually extracted from the outer contours of the brain images.
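The eight-parameter projective map defined in Section 1.1.1 can be sketched as follows (parameter layout a0..a7 as in the text; the particular values are arbitrary). Unlike the affine case, parallelism is not preserved under this map, but collinearity is:

```python
def projective_warp(x, y, a):
    """2D projective transformation with global parameters a0..a7:
    x' = (a0*x + a1*y + a2) / (a6*x + a7*y + 1),
    y' = (a3*x + a4*y + a5) / (a6*x + a7*y + 1)."""
    denom = a[6] * x + a[7] * y + 1.0
    return ((a[0] * x + a[1] * y + a[2]) / denom,
            (a[3] * x + a[4] * y + a[5]) / denom)

a = [1.1, 0.2, 0.5, -0.1, 0.9, 1.0, 0.05, 0.02]
# Three collinear points stay collinear after warping (cross product ~ 0):
pts = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
warped = [projective_warp(x, y, a) for x, y in pts]
(x0, y0), (x1, y1), (x2, y2) = warped
cross = (x1 - x0) * (y2 - y0) - (y1 - y0) * (x2 - x0)
print(abs(cross) < 1e-9)   # True
```

With a6 = a7 = 0 the denominator is 1 and the map reduces to the affine case, which is why the projective family subsumes the global transforms discussed above.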


to segment objects in multiple images prior to registration. This is especially true in images from different modalities such as CT and MRI.

Image-guided surgery is another important application that needs image segmentation. Recent advances in technology have made it possible to acquire images of the patient while the surgery is in progress. The goal is then to segment relevant regions of interest and overlay them on an image of the patient to help guide the surgeon in his/her work.

Segmentation is therefore a very important task in medical imaging. However, manual segmentation is not only a tedious and time consuming process, but is also inaccurate. Segmentation by experts has been shown to be variable by up to 20%. It is therefore desirable to use algorithms that are accurate and require as little user interaction as possible.


Entropy was introduced by Shannon [1] in the context of communication theory. The entropy Shannon proposed is a measure of uncertainty in a discrete distribution, based on the Boltzmann entropy of classical statistical mechanics. The Shannon entropy of a discrete distribution F with probabilities p_i is defined by

H(F) = -Sum_i p_i log p_i.

Since then, this concept and variants thereof have been extensively utilized in numerous applications of science and engineering. To date, some of the most widely benefiting applications have been in financial analysis [2], data compression [3], statistics [4], and information theory [5].

This measure of uncertainty has many important properties which agree with our intuitive notion of randomness. We mention three: (1) it is always positive; (2) it vanishes if and only if the event is certain; (3) entropy is increased by the addition of an independent component, and decreased by conditioning. However, extending this notion to continuous distributions poses some challenges. A straightforward extension of the discrete case to continuous distributions F with density f, called the differential entropy, reads

h(f) = -Integral f(x) log f(x) dx.

However, this definition raises the following concerns. 1) First of all, it is defined based on the density of the random variable, which in general may or may not exist, e.g., for


cases when the cumulative distribution function (cdf) is not differentiable. It would not be possible to define the entropy of a random variable for which the density function is undefined. 2) Secondly, the Shannon entropy of a discrete distribution is always positive, while the differential entropy of a continuous variable may take any value on the extended real line. 3) Shannon entropy computed from samples of a random variable lacks the property of convergence to the differential entropy; i.e., even when the sample size goes to infinity, the Shannon entropy estimated from these samples will not converge to the differential entropy [5]. The consequence is that it is impossible, in general, to approximate the differential entropy of a continuous variable using the entropy of empirical distributions. 4) Consider the following situation: suppose X and Y are two discrete random variables representing the heights of a group of people, with X taking on values {5.1, 5.2, 5.3, 5.4, 5.5}, each with probability 1/5, and Y taking on values {5.1, 5.2, 5.3, 5.4, 7.5} (with Yao Ming in this group), again each with probability 1/5. The information content measured in these two random variables using Shannon entropy is the same, i.e., Shannon entropy does not bring out any differences between these two cases. However, if the two random variables represented the winning chances in a basketball game, the information content in the two random variables would be considered dramatically different. Nevertheless, Shannon entropy fails to make any distinction whatsoever between them. For additional discussion on some of these issues the reader is referred to [6].

In this work we propose an alternative measure of uncertainty in a random variable X and call it the Cumulative Residual Entropy (CRE) of X. The main objective of our study is to extend Shannon entropy to random variables with continuous distributions. The concept we propose overcomes the problems mentioned above, while retaining many of the important properties of Shannon entropy. For instance, both are decreased by conditioning and increased by independent addition. They both obey the data


processing inequality, etc. However, the differential entropy does not have the following important properties of CRE:

1. CRE has consistent definitions in both the continuous and discrete domains;
2. CRE is always non-negative;
3. CRE can be easily computed from sample data, and these computations asymptotically converge to the true values.

The basic idea in our definition is to replace the density function with the cumulative distribution in Shannon's definition (Eqn. 2). The distribution function is more regular than the density function, because the density is computed as the derivative of the distribution. Moreover, in practice what is of interest and/or measurable is often the distribution function. For example, if the random variable is the lifespan of a machine, then the event of interest is not whether the lifespan equals t, but rather whether the lifespan exceeds t. Our definition also preserves the well-established principle that the logarithm of the probability of an event should represent the information content in the event. The discussions about the properties of CRE in the next few sections, we trust, are convincing enough for further development of the concept of CRE. Formally, the CRE of a random vector X in R^N is defined as [7]:

CRE(X) = -Integral_{R^N_+} P(|X| > lambda) log P(|X| > lambda) d lambda,

where R^N_+ = {x in R^N ; x_i >= 0} and |X| > lambda means |X_i| > lambda_i for each i.
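To make the height example above concrete, the short pure-Python sketch below computes the Shannon entropy and the (discrete) CRE of the two height distributions. The function names and the piecewise-constant integration of the survival function are illustrative choices, not code from this dissertation; for a discrete non-negative variable the CRE integral reduces to a finite sum over the gaps between sorted values.

```python
import math

def shannon_entropy(probs):
    # Shannon entropy: -sum p log p over a discrete distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

def discrete_cre(values, probs):
    # CRE of a discrete non-negative variable:
    # -integral of P(X > t) log P(X > t) dt, where the survival
    # function P(X > t) is piecewise constant between sorted values.
    pairs = sorted(zip(values, probs))
    cre, tail = 0.0, 1.0
    for (v, p), (v_next, _) in zip(pairs, pairs[1:]):
        tail -= p  # P(X > v) on the gap (v, v_next)
        if tail > 0:
            cre -= tail * math.log(tail) * (v_next - v)
    return cre

heights_x = [5.1, 5.2, 5.3, 5.4, 5.5]
heights_y = [5.1, 5.2, 5.3, 5.4, 7.5]  # the Yao Ming case
uniform = [0.2] * 5
```

Both distributions have the same Shannon entropy (log 5), but the CRE of the second is several times larger, reflecting its much longer residual tail.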


CRE can be related to the well-known concept of the mean residual life function in Reliability Engineering, which is defined as:

m_F(t) = [Integral_t^inf P(X > x) dx] / P(X > t).   (2)

The m_F(t) is of fundamental importance in Reliability Engineering and is often used to measure departure from exponentiality. CRE can be shown to be the expectation of m_F(t) [8], i.e.,

CRE(X) = E[m_F(X)].

Now we give a few examples. Consider a general uniform distribution with the density function

f(x) = 1/a for 0 <= x <= a, and 0 otherwise.

Then its CRE is computed as follows:

CRE(X) = -Integral_0^a (1 - x/a) log(1 - x/a) dx = a/4.

The exponential distribution with mean 1/lambda has the density function

f(x) = lambda e^{-lambda x}, x >= 0.

Correspondingly, the CRE of the exponential distribution is

CRE(X) = -Integral_0^inf e^{-lambda x} log(e^{-lambda x}) dx = 1/lambda.


The Gaussian probability density function is

f(x) = (1 / sqrt(2 pi sigma^2)) exp(-(x - m)^2 / (2 sigma^2)),

where m is the mean and sigma^2 is the variance. The cumulative distribution function is

F(x) = 1 - (1/2) erfc((x - m) / (sqrt(2) sigma)),

where erfc is the complementary error function:

erfc(x) = (2 / sqrt(pi)) Integral_x^inf e^{-t^2} dt.

Correspondingly, the CRE of the Gaussian is

CRE(X) = -Integral (1/2) erfc((x - m) / (sqrt(2) sigma)) log[(1/2) erfc((x - m) / (sqrt(2) sigma))] dx.

We will now state some important properties that are related to the application of CRE to image registration. For a complete list of properties, we refer the reader to the more comprehensive treatment in [7].
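The closed forms above (a/4 for the uniform and 1/lambda for the exponential) are easy to check numerically. The following sketch, with an estimator of my own naming that integrates the empirical survival function (not the dissertation's implementation), recovers both values from samples:

```python
import math
import random

def empirical_cre(samples):
    # Estimate CRE by integrating -P(X > t) log P(X > t) dt using the
    # empirical (piecewise-constant) survival function of the samples.
    xs = sorted(samples)
    n = len(xs)
    cre = 0.0
    for i in range(n - 1):
        p = 1.0 - (i + 1) / n  # empirical P(X > xs[i])
        if p > 0:
            cre -= p * math.log(p) * (xs[i + 1] - xs[i])
    return cre

random.seed(0)
a, lam = 2.0, 2.0
u = [a * random.random() for _ in range(50000)]      # uniform on [0, a]
e = [random.expovariate(lam) for _ in range(50000)]  # exponential, mean 1/lam
```

With 50,000 samples both estimates land close to the closed-form values a/4 = 0.5 and 1/lambda = 0.5, illustrating the sample-convergence property discussed below.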


Similar to the case of Shannon's entropy, if X and Y are independent random variables, then

CRE(X, Y) = E[|X|] CRE(Y) + E[|Y|] CRE(X).

More generally, an analogous decomposition holds for any finite collection of independent random variables [7].

Conditional entropy is a fundamental concept in information theory. We now define the concept of conditioning in the context of CRE:

CRE(X | Y = y) = -Integral P(|X| > lambda | Y = y) log P(|X| > lambda | Y = y) d lambda.

As in the Shannon entropy case, conditioning reduces CRE:

E[CRE(X | Y)] <= CRE(X).   (2)

Recall that Shannon entropy treats the discrete and continuous cases inconsistently (Eqn. 2). In contrast, in this regard CRE does not differentiate between discrete and continuous cases, as shown by the following theorem:


Shannon entropy computed from samples of a random variable lacks the property of convergence to the differential entropy (see Eqn. 2 for a definition). In contrast, the CRE computed from the samples converges to its continuous counterpart. This is summarized in the following theorem; we refer the reader to [7] for the proof.

This is a powerful property and, as a consequence of it, we can compute the CRE of a random variable from samples, and the computed value converges to the true CRE of the random variable. Note that the X_k can be samples of a continuous random variable.


Let X be as in Theorem (2) and Y_n as above. Suppose Y_n -> 0 in probability. Then CRE(X + Y_n) -> CRE(X). Theorems (3) and (4) are very important properties, as they prove that the CRE is robust to noise, which is not the case for differential entropy. Intuitively, the robustness of CRE may be attributed to the use of the CDF as opposed to the PDF in its definition, i.e., an integral formulation as opposed to a differential formulation, and it is well known that the former is more robust compared to the latter.
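The robustness claim can be probed with a small numerical experiment: estimate CRE from samples before and after adding low-variance Gaussian noise. The estimator below is the same piecewise-constant survival-function construction used for the closed-form checks earlier; the noise level 0.01 is an arbitrary choice for illustration.

```python
import math
import random

def empirical_cre(samples):
    # -integral of P(X > t) log P(X > t) dt with the empirical survival function.
    xs = sorted(samples)
    n = len(xs)
    cre = 0.0
    for i in range(n - 1):
        p = 1.0 - (i + 1) / n
        if p > 0:
            cre -= p * math.log(p) * (xs[i + 1] - xs[i])
    return cre

random.seed(1)
clean = [random.random() for _ in range(20000)]       # CRE of U[0,1] is 1/4
noisy = [x + random.gauss(0.0, 0.01) for x in clean]  # small perturbation
```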


Matching two or more images under varying conditions (illumination, pose, acquisition parameters, etc.) is ubiquitous in computer vision, medical imaging, geographical information systems, etc. In the past several years, information-theoretic measures have been very widely used in defining cost functions to be optimized in achieving a match. An example problem common to all the aforementioned areas is the image registration problem. In the following, we will review the literature on existing computational algorithms that have been reported for achieving multimodality image registration, with the focus on non-rigid registration methods. We will point out their limitations and hence motivate the need for a new and efficient computational algorithm for achieving our goal.

Several feature-based methods involve detecting surfaces, landmarks [9, 10, 11, 12], edges, ridges, etc. Most of these assume a known correspondence, with the exception of the work in Chui et al. [9], Jian and Vemuri [13], Wang et al. [14] and Guo et al. [15]. Work reported in Irani and Anandan [16] uses the energy (squared magnitude) in the directional derivative image as a representation scheme for matching, achieved using the


SSD cost function. Recently, Liu et al. [17] reported the use of local frequency in a robust statistical framework using the integral squared error, a.k.a. L2E. The primary advantage of L2E over other robust estimators in the literature is that there are no tuning parameters in it. The idea of using local phase was also exploited by Mellor and Brady [18], who used mutual information (MI) to match local-phase representations of images and estimated the non-rigid registration between them. However, robustness to significant non-overlap in the field of view (FOV) of the scanners was not addressed. For more on feature-based methods, we refer the reader to the recent survey by Zitova and Flusser [19].

In the context of direct methods, the primary matching techniques for intra-modality registration involve the use of normalized cross-correlation, modified SSD, and (normalized) mutual information (MI). Ruiz-Alzola et al. [20] presented a unified framework for non-rigid registration of scalar, vector and tensor data based on template matching. For scalar images, the cost function is the extension of modified SSD using a different definition of inner products. However, this model can only be used on images from the same modality, as it assumes similar intensity values between images. In [21, 22], a level-set based image registration algorithm was introduced that was designed to non-rigidly register two 3D volumes from the same imaging modality. This algorithm was computationally efficient and was used to achieve atlas-based segmentation. Direct methods based on optical-flow estimation form a large class for solving the non-rigid registration problem. Hellier et al. [23] proposed a registration method based on a dense robust 3-D estimation of the optical flow with a piecewise parametric description of the deformation field. Their algorithm is unsuitable for multi-modal image registration due to the brightness constancy assumption. Variants of optical flow-based registration that accommodate varying illumination may be used for inter-modality registration, and we refer the reader to [24, 25] for such methods. Guimond et al. [26] reported a multi-modal brain warping technique that uses Thirion's Demons algorithm [27] with an adaptive intensity correction. The technique however was not tested for robustness


with respect to significant non-overlap in the FOVs. More recently, Cuzol et al. [28] introduced a new non-rigid image registration technique which basically involves a Helmholtz decomposition of the flow field, which is then embedded into the brightness constancy model of optical flow. The Helmholtz decomposition allows one to compute large displacements when the data contains such displacements. This technique is an innovation in accommodating large displacements and not one that allows for inter-modality non-rigid registration. For more on intra-modality methods, we refer the reader to the comprehensive surveys [29, 19].

A popular framework for direct methods is based on information-theoretic measures [30]; among them, mutual information (MI), pioneered by Viola and Wells [31] and Collignon et al. [32] and modified in Studholme et al. [33], has been effective in the application of image registration. Reported registration experiments in these works are quite impressive for the case of rigid motion. The problem of being able to handle non-rigid deformations in the MI framework is a very active area of research, and some recent papers reporting results on this problem are [18, 34, 35, 36, 37, 38, 39, 40, 41, 42]. In [34], Mattes et al., and in [35], Rueckert et al., presented mutual information based schemes for matching multi-modal image pairs using B-splines to represent the deformation field on a regular grid. Guetter [43] recently incorporated a learned joint intensity distribution into the mutual information formulation, in which the registration is achieved by simultaneously minimizing the KL divergence between the observed and learned intensity distributions and maximizing the mutual information between the reference and alignment images. Recently, D'Agostino et al. [44] presented an information-theoretic approach wherein tissue class probabilities of each image being registered are used to match over the space of transformations, using a divergence measure between the ideal case (where tissue class labels between images at corresponding voxels are similar) and the actual joint class distributions of both images. This work expects a segmentation of either one of the images being registered. Computational efficiency and


accuracy (in the event of significant non-overlaps) are issues of concern in most if not all the MI-based non-rigid registration methods.

Finally, some registration methods under the direct approach are inspired by models from mechanics, either from elasticity [45, 46] or fluid mechanics [47, 48]. Fluid mechanics-based models accommodate large deformations, but are largely computationally expensive. Christensen [49] recently developed an interesting version of these methods, where the direct deformation field and the inverse deformation field are jointly estimated to guarantee the symmetry of the deformation with respect to permutation of the input images. A more general and mathematically rigorous treatment of non-rigid registration which subsumes the fluid-flow methods was presented in Trouve [50]. All these methods however are primarily applicable to intra-modality and not inter-modality registration.

The mutual information between two random variables X and Y can be written as

MI(X, Y) = h(X) + h(Y) - h(X, Y),

where h(X) is the differential entropy of the random variable X and is given by h(X) = -Integral p(x) ln p(x) dx, where p(x) is the probability density function and can be estimated from the image data using any of the parametric and nonparametric methods. The reason


for defining MI in terms of differential entropy as opposed to Shannon entropy is to facilitate the optimization of MI with respect to the registration parameters using any of the gradient-based optimization methods. Note that MI defined using Shannon's entropy in discrete form will not converge to the continuous case defined here, due to the fact that Shannon's entropy does not converge to the differential entropy (see [5]).

We now define the cross-CRE (CCRE) using the CRE defined in Eqn. 2:

C(X, Y) = CRE(X) - E[CRE(X | Y)].

We will use this quantity as a matching criterion in the image alignment problem. More specifically, let I_T(x) be a test image we want to register to a reference image I_R(x). The transformation g(x; mu) describes the deformation from V_T to V_R, where V_T and V_R are the continuous domains on which I_T and I_R are defined, and mu is the set of transformation parameters to be determined. We pose the task of image registration as an optimization problem: to align the reference image I_R(x) with the transformed test image I_T(g(x; mu)), we seek the set of transformation parameters mu that maximizes C(I_T, I_R) over the space of smooth transformations, i.e.,

mu* = arg max_mu C(I_T(g(x; mu)), I_R(x)).

The computation of CCRE requires estimates of the marginal and joint probability distributions of the intensity values of the reference and test images. We denote by p(l, k; mu) the joint probability of (I_T(g(x; mu)), I_R(x)). Let p_T(l; mu) and p_R(k) represent the marginal probabilities for the test and reference images respectively, and let L_T and L_R be the discrete sets of intensities associated with the test image and reference image respectively. Then, we can rewrite the CCRE C(I_T(g(x; mu)), I_R) as follows:

C(I_T(g(x; mu)), I_R) = CRE(I_T(g(x; mu))) - E[CRE(I_T(g(x; mu)) | I_R)]
  = -Sum_{l in L_T} [Integral_l^inf p_T(lambda; mu) d lambda] log [Integral_l^inf p_T(lambda; mu) d lambda]
    + Sum_{k in L_R} Sum_{l in L_T} [Integral_l^inf p(lambda, k; mu) d lambda] log [Integral_l^inf p(lambda, k; mu) d lambda / p_R(k)].
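In the discrete setting, CCRE can be evaluated directly from a joint intensity histogram via tail (survival) probabilities. The sketch below is a toy version under simplifying assumptions (joint probabilities already estimated, unit bin widths, no Parzen smoothing); the function name and histogram layout are illustrative, not the dissertation's implementation.

```python
import math

def ccre(joint):
    # joint[l][k]: joint probability of test-intensity bin l and
    # reference-intensity bin k. Computes the simplified discrete form
    #   sum_{k,l} P(i>l,k) log [ P(i>l,k) / (p_R(k) P(i>l)) ].
    n_l, n_k = len(joint), len(joint[0])
    p_r = [sum(joint[l][k] for l in range(n_l)) for k in range(n_k)]
    c = 0.0
    for l in range(n_l):
        # tail probabilities over test intensities strictly above bin l
        tail_lk = [sum(joint[m][k] for m in range(l + 1, n_l))
                   for k in range(n_k)]
        tail_l = sum(tail_lk)
        for k in range(n_k):
            if tail_lk[k] > 0 and p_r[k] > 0 and tail_l > 0:
                c += tail_lk[k] * math.log(tail_lk[k] / (p_r[k] * tail_l))
    return c

independent = [[0.25, 0.25], [0.25, 0.25]]  # intensities unrelated
dependent = [[0.5, 0.0], [0.0, 0.5]]        # intensities perfectly coupled
```

As expected of a matching criterion, an independent joint histogram scores zero while a perfectly dependent (diagonal) one scores strictly positive.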


Let P(i>l; mu) = Integral_l^inf p_T(lambda; mu) d lambda and P(i>l, k; mu) = Integral_l^inf p(lambda, k; mu) d lambda. Using the fact that p_T(l; mu) = Sum_{k in L_R} p(l, k; mu), we have P(i>l; mu) = Sum_{k in L_R} P(i>l, k; mu). Eqn. (3) can be further simplified, which leads to

C = Sum_{k in L_R} Sum_{l in L_T} P(i>l, k; mu) log [P(i>l, k; mu) / (p_R(k) P(i>l; mu))].

To illustrate the difference between CCRE and the now popular information-theoretic cost functions such as MI and NMI, we plot these functions against a parameter of the transformation, for illustrative purposes, say the rotation. The image pair we used here consists of MR and CT images that were originally aligned; the MR and CT intensities range from 0 to 255, with means 55.6 and 60.6 respectively. The cost functions are computed over the rotation angle that was applied to the CT image to misalign it with respect to the MR image. In each plot of Figure 3, the X-axis shows the 3D rotation angle about the Z axis, while the Y-axis shows the values of CCRE, MI and NMI computed from the misaligned (by a rotation) image pairs. The second row shows a zoomed-in view of the plots over a smaller region, so as to get a detailed view of the cost functions. The following observations are made from this plot:


Figure 3: CCRE, MI and NMI traces plotted for the misaligned MR & CT image pair, where the misalignment is generated by a rotation of the CT image. First row: over the range -40 to 40 degrees. Second row: zoomed-in view between -0.5 and 0.5 degrees, where the arrows in the first row signify the position. Note that all three cost functions are implemented with tri-linear interpolation. Third row: the three cost functions implemented with partial volume interpolation [32].

1. Similar to MI and NMI, the maximum of CCRE occurs at 0 degrees of rotation, which confirms that our new information measure needs to be maximized in order to find the optimum transformation between two misaligned images.

2. CCRE exhibits a much larger range of values than MI and NMI. This feature plays an important role in the numerical optimization, since it leads to a more stable numerical implementation by avoiding cancellation, roundoff, etc., which often plague arithmetic operations with small numerical values.


3. Upon closer inspection, we observe that CCRE is much smoother than MI and NMI for the MR & CT data pair, which verifies that CCRE is more regular than the other information-theoretic measures.

The basic idea of the cubic B-spline deformation is to deform an object by manipulating an underlying mesh of control points mu_i. The deformation g is defined by a sparse regular control-point grid. In the 3D case, the deformation at any point x = [x, y, z]^T in the test image can be interpolated with a linear combination of cubic B-spline convolution kernels:

g(x; mu) = Sum_j mu_j beta^(3)(x/Delta - j),

where beta^(3)(x) = beta^(3)(x) beta^(3)(y) beta^(3)(z) is the separable 3D kernel and Delta is the spacing of the control grid. The mu_j are the B-spline expansion coefficients computed from the sample values of the image. For the implementation details, we refer the reader to Forsey [51] and Mattes [34].
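A 1D analogue of this free-form deformation model can be sketched as follows; the kernel is the standard cubic B-spline, while the control-grid layout and function names are illustrative assumptions rather than the dissertation's code:

```python
def bspline3(t):
    # Cubic B-spline convolution kernel beta^(3), supported on [-2, 2].
    t = abs(t)
    if t < 1.0:
        return 2.0 / 3.0 - t * t + t ** 3 / 2.0
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0

def displacement_1d(x, coeffs, spacing):
    # Displacement at x as a linear combination of kernels attached to a
    # regular control grid with control points at j * spacing.
    return sum(c * bspline3(x / spacing - j) for j, c in enumerate(coeffs))
```

The cubic B-spline kernels form a partition of unity, so setting all coefficients equal produces a constant displacement field; this is a handy sanity check for an implementation.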


Each component of the gradient can be found by differentiating Eqn. (3) with respect to a transformation parameter mu. We consider the two terms in Eqn. (3) separately when computing the derivative. For the first term in Eqn. (3), we have

d/d mu [ -Sum_{l in L_T} P(i>l; mu) log P(i>l; mu) ] = -Sum_{l in L_T} [log P(i>l; mu) + 1] dP(i>l; mu)/d mu,

where P(i>l; mu) = Integral_l^inf p_T(lambda; mu) d lambda. The derivative of the second term is given by

d/d mu [ Sum_{k in L_R} p_R(k) Sum_{l in L_T} (P(i>l, k; mu)/p_R(k)) log (P(i>l, k; mu)/p_R(k)) ] = Sum_{k in L_R} Sum_{l in L_T} [log (P(i>l, k; mu)/p_R(k)) + 1] dP(i>l, k; mu)/d mu,

where P(i>l, k; mu) = Integral_l^inf p(lambda, k; mu) d lambda. Combining the derivatives of the two terms, and using the fact that

dP(i>l; mu)/d mu = Sum_{k in L_R} dP(i>l, k; mu)/d mu,


we have the analytic gradient of CCRE:

dC/d mu = Sum_{k in L_R} Sum_{l in L_T} log [P(i>l, k; mu) / (p_R(k) P(i>l; mu))] dP(i>l, k; mu)/d mu.

Comparing the expressions for CCRE and the derivative of CCRE, we note that the two formulas in (3) are similar to each other and share the common term log [P(i>l, k; mu) / (p_R(k) P(i>l; mu))].


images. Let beta^(0) be a zero-order spline Parzen window (a centered unit pulse) and beta^(3) be a cubic spline Parzen window. The smoothed joint probability of (I_R, I_T(g)) is given by

p(l, k; mu) = alpha Sum_x beta^(0)(k - (I_R(x) - f'_R)/Delta b_R) beta^(3)(l - (I_T(g(x; mu)) - f'_T)/Delta b_T),

where alpha is a normalization factor that ensures Sum p(l, k) = 1, and I_R(x) and I_T(g(x; mu)) are samples of the reference and interpolated test images respectively, normalized by the minimum intensity values, f'_R, f'_T, and the intensity range of each bin, Delta b_R, Delta b_T.

Since P(i>l, k; mu) = Integral_l^inf p(lambda, k; mu) d lambda, we have the following:

P(i>l, k; mu) = alpha Sum_x beta^(0)(k - (I_R(x) - f'_R)/Delta b_R) Phi(l - (I_T(g(x; mu)) - f'_T)/Delta b_T),

where Phi(.) is the cumulative residual function of the cubic spline kernel, defined as Phi(u) = Integral_u^inf beta^(3)(v) dv. It is the piecewise polynomial obtained by integrating the cubic spline kernel

beta^(3)(v) = 2/3 - v^2 + |v|^3/2 for 0 <= |v| < 1; (2 - |v|)^3/6 for 1 <= |v| < 2; 0 otherwise

over each of its intervals.


Note that d Phi(u)/du = -beta^(3)(u); we can then take the derivative of Eqn. (3) with respect to mu, which gives dP(i>l, k; mu)/d mu in terms of dI_T(t)/dt and dg(x; mu)/d mu, which are computed from Eqn. (3) and Eqn. (3) respectively.

With the analytic gradient of CCRE (Eqn. 3) in hand, we can then use the Quasi-Newton method to numerically solve the optimization problem.
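The Parzen-window density estimation step described above can be sketched as follows. This is a simplified illustration: intensities are assumed to be pre-scaled to bin coordinates, boundary handling is ignored, and the names are mine rather than the dissertation's.

```python
def bspline0(t):
    # Zero-order spline Parzen window: centered unit pulse.
    return 1.0 if -0.5 <= t < 0.5 else 0.0

def bspline3(t):
    # Cubic spline Parzen window, supported on [-2, 2].
    t = abs(t)
    if t < 1.0:
        return 2.0 / 3.0 - t * t + t ** 3 / 2.0
    if t < 2.0:
        return (2.0 - t) ** 3 / 6.0
    return 0.0

def smoothed_joint_hist(ref, test, n_bins):
    # Smoothed joint probability p(l, k): beta^(0) window on the reference
    # intensity, beta^(3) window on the interpolated test intensity; the
    # final division plays the role of the normalization factor alpha.
    p = [[0.0] * n_bins for _ in range(n_bins)]
    total = 0.0
    for r, t in zip(ref, test):
        for k in range(n_bins):
            w0 = bspline0(k - r)
            if w0 == 0.0:
                continue
            for l in range(n_bins):
                w = w0 * bspline3(l - t)
                p[l][k] += w
                total += w
    return [[v / total for v in row] for row in p]

ref = [0.2, 1.4, 2.6, 3.1, 1.9]   # toy reference intensities (bin units)
test = [0.5, 1.2, 2.8, 3.3, 2.0]  # toy warped test intensities (bin units)
p = smoothed_joint_hist(ref, test, 4)
```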


show that CCRE is not only more robust, but also converges faster than the others. We begin by applying CCRE to register image pairs for which the ground truth was available.

The data we use for this experiment are corresponding slices from an MR T1 and T2 image pair, obtained from the BrainWeb site at the Montreal Neurological Institute [52]. They are originally aligned with each other. The two images are defined on a 1mm isotropic voxel grid in the Talairach space, with dimension (256 x 256). We then apply a known non-rigid transformation to the T2 image, and the goal is to recover this deformation by applying our registration method. The mutual information scheme which we compare with was originally reported in [34, 53], in which the explicit gradient forms are presented, thus allowing for the application of gradient-based optimization methods.


Figure 3: Upper left: MR T1 image as source image. Upper right: deformed MR T2 image as target image. Lower left and right: results of the estimated transformations using CCRE and MI applied to the source, respectively. Both algorithms run for 30 seconds using the same gradient descent technique.

The source and target image pair, along with the results of the estimated transformations using CCRE and MI applied to the source, are shown in Figure 3. As is evident visually, we observe that the result generated by CCRE is more similar in shape to the target image than the one produced by MI.

A quantitative assessment of the accuracy of the registration is presented subsequently in Figure 3, where we plot the change of the mean deformation error (MDE) obtained for the CCRE-based algorithm and the MI-based algorithm respectively. MDE is defined as the average, over all voxels, of the magnitude of the difference between the ground-truth and estimated deformation vectors, d_m = (1/N) Sum_{i=1}^N ||d_gt(x_i) - d_est(x_i)||.


Figure 3: Plot demonstrating the change of the mean deformation error for the CCRE and MI registration results over time. The solid line shows the MDE for the CCRE registration result, while the dotted line illustrates the MDE for the MI result.

This empirically validates the faster convergence speed of the CCRE-based algorithm over the MI-based algorithm. Figure 3 depicts the results

Figure 3: Results of application of our algorithm to synthetic data (see text for details).

obtained for this image pair, which is organized as follows, from left to right: the first


row depicts the source image with the target image segmentation superposed to depict the amount of misalignment, the registered source image obtained using our algorithm superposed with the target segmentation, followed by the target image; the second row depicts the ground-truth deformation field which we used to generate the target image from the MR T2 image, the estimated non-rigid deformation field, followed by the histogram of the estimated magnitude error. Note that the error distribution is mostly concentrated in the small-error range, indicating the accuracy of our method. As a measure of accuracy of our method, we also estimated the average and the standard deviation of the error in the estimated non-rigid deformation field. The error was estimated as the angle between the ground-truth and estimated displacement vectors. The average and standard deviation are 1.5139 and 4.3211 (in degrees) respectively, which is quite accurate.

The mean magnitude of the synthetic motion is 4.37 pixels, with the standard deviation at 1.8852. Table 3 shows the registration results for the two schemes. From the table, we observe that MI fails when the standard deviation of the noise is increased to 40, while CCRE is tolerant until 66, a significant difference when compared to MI.


Table 3: Comparison of the registration results between CCRE and MI for a fixed synthetic deformation field.

Noise SD   CCRE (MDE +/- SD)    MI (MDE +/- SD)
           1.0816 +/- 0.9345    1.3884 +/- 1.4538
19         1.1381 +/- 1.1702    1.4871 +/- 1.5052
30         1.1975 +/- 1.3484    1.5204 +/- 1.5615
40         1.3791 +/- 1.9072    failed
66

This experiment conclusively depicts that CCRE has more noise immunity than MI when dealing with non-rigid motion.

Figure 3 depicts an example of the registration of the MR T1 and T2 data sets with large non-overlap. The left image of the figure depicts the MR T1 brain scan as the source image, and the right image shows the MR T2 data as the target. Note that the FOVs of the data sets are significantly non-overlapping. The non-overlap was simulated by cutting 66% of the MR T1 image (the source image). The middle column depicts the transformed source image along with an edge map of the target (deformed MR T2 image) superimposed on the transformed source. As is evident, the registration is visually quite accurate.

Figure 3: Registration results of MR T1 and T2 image slices with large non-overlap. (Left) MR T1 source image before registration; (right) deformed T2 target image; (middle) the transformed MR image superimposed with the edge map from the target image.


[34] to these same data sets. The CT image is of size (512, 512, 120) while the MR image size is (512, 512, 142), and the voxel dimensions are (0.46, 0.46, 1.5)mm and (0.68, 0.68, 1.05)mm for CT and MR respectively. The registration was performed on reduced volumes (210 x 210 x 120) with the control knots placed every 16 x 16 x 16 voxels. The program was written in the C++ programming language, and all experiments were run on a 2.6GHz Pentium PC.

Table 3: Comparison of the total time taken to achieve registration by CCRE and MI.

Volume          1     2     3      4      5      6      7      8
CCRE Time (s)   4827  3452  4345   4038   3910   4510   5470   3721
MI Time (s)     9235  6344  10122  17812  12157  11782  13157  10057

We used a set of eight CT volumes, and the task was to register these eight volumes to the MR data chosen as the target image for all registrations, using both the CCRE and MI algorithms. Note that all CT & MR volumes are from different subjects and thus contain real non-rigid motion. The parameters used with both algorithms were identical. For both algorithms, the optimization of the cost functions was halted when improvements of at least 0.0001 in the cost function could not be detected. The times required for registering all data sets with our algorithm as well as the MI method are given in Table 3. This table shows that, on average, our CCRE algorithm is about 2.5 times faster than the traditional MI approach for this set of experiments. For brevity, we only show one registration result in Figure 3. Here, one slice of the volume is shown in the first row, with the source CT image at left and the reference image at right. The middle image shows the transformed CT image slice superimposed with the edge map from the target image. On the second row, the source image superimposed with the edge map from the


target image is shown on the left, while shown in the middle and right are the surfaces reconstructed from the transformed source using the CCRE method and from the target MR image, respectively. From this figure, we can see that the source and target images depict considerable non-rigid changes in shape; nevertheless, our method was able to register these two images quite accurately. To validate the conformity of the two reconstructed surfaces, we randomly sample 30 points from the surface of the transformed source obtained using CCRE, and then estimate the distances of these points to the surface of the target MR volume. The average of these distances is about 0.47mm, which indicates a very good agreement between the two surfaces. The resemblance of the shapes reconstructed from the transformed source to the target indicates that our CCRE algorithm succeeded in matching the source CT volume to the target MR image.

Figure 3: Registration results for different subjects of MR & CT brain data with real non-rigid motion (see text for details).

The accuracy of the information-theoretic algorithm for non-rigid registration problems was assessed quantitatively by means of a region-based segmentation task [54]. ROIs (whole brain, eyes) were segmented automatically in the eight CT data sets


used as the source images, and binary masks were created. The deformation fields between the CT and MR volumes were computed and used to project the masks from each of the CT volumes to the MR volume. Contours were manually drawn on a few slices chosen at random in the MR volume (four slices/volume). Manual contours on MR and contours obtained automatically were then compared using an accepted similarity index, defined as two times the number of pixels in the intersection of the contours divided by the sum of the number of pixels within each contour [41]. This index varies between zero (complete disagreement) and one (complete agreement) and is sensitive to both displacement and differences in size and shape. Table 3 lists the mean values of the similarity index for each structure. It is customarily accepted that a value of the similarity index above 0.80 indicates very good agreement between contours. Our results are well above this value. For comparison purposes, we also computed the same index for the MI method. We can conclude from the table that CCRE achieves better registration accuracy than MI for the task of non-rigid registration of real multi-modal images.

Table 3: Comparison of the similarity index values for several brain structures for CCRE and MI.

Volume               1      2      3      4      5      6      7      8
CCRE  Whole Brain    0.987  0.996  0.974  0.962  0.975  0.967  0.988  0.981
      Left Eye       0.925  0.935  0.925  0.907  0.875  0.890  0.834  0.871
      Right Eye      0.840  0.940  0.891  0.872  0.851  0.829  0.910  0.921
MI    Whole Brain    0.986  0.981  0.976  0.96   0.950  0.961  0.942  0.952
      Left Eye       0.911  0.893  0.904  0.791  0.853  0.810  0.851  0.853
      Right Eye      0.854  0.917  0.889  0.814  0.849  0.844  0.897  0.854
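The similarity index used here is commonly known as the Dice coefficient; computing it from two binary masks is straightforward. The sketch below (illustrative names, masks flattened to 0/1 sequences) mirrors the definition of two times the intersection over the sum of the region sizes:

```python
def similarity_index(mask_a, mask_b):
    # S = 2 |A intersect B| / (|A| + |B|) for two binary masks.
    inter = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    size = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 2.0 * inter / size if size else 1.0

auto_mask = [1, 1, 1, 0, 0, 0]    # e.g., a projected CT mask (toy data)
manual_mask = [0, 1, 1, 1, 0, 0]  # e.g., a manually drawn contour (toy data)
```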


Matching point patterns is ubiquitous in many fields of engineering and science, e.g., medical imaging, sports science, archaeology, and others. Point-sets are widely used in computer vision to represent boundary points of shapes contained in images, or any other salient features of objects contained in images. Given two or more images represented using the salient features contained therein, more often than not one is interested in matching these (feature) point patterns to determine a linear or a nonlinear transformation between the coordinates of the feature point-sets. Such transformations capture the changes in the pattern geometry characterized by the given feature point-set.

The primary technical challenge in using point-set representations of shapes is the correspondence problem. Typically, correspondences can be estimated once the point-sets are properly aligned with appropriate spatial transformations. If the objects at hand are deformable, the adequate transformation would obviously be a non-rigid spatial mapping. Solving for non-rigid deformations between point-sets with unknown correspondence is a hard problem. In fact, many current methods only attempt to solve for an affine transformation for the alignment [55]. Furthermore, we also encounter the bias problem in groupwise point-set registration: if one arbitrarily chooses any one of the given data sets as a reference, the estimated registration transformation would be biased toward this chosen reference, and it would be desirable to avoid such a bias. The question that arises is: how do we align all the point-sets in a symmetric manner so that there is no bias toward any particular point-set?

To overcome these aforementioned problems, we present a novel approach to simultaneously register multiple point-sets and construct the atlas. The idea is to model each point-set by a kernel probability density or distribution, then quantify the distance


between these probability densities or distributions using information-theoretic measures. Figure 4 illustrates this idea, where the right column of the figure shows the density functions corresponding to the corpus callosum point-sets shown on the left. The distance is

Figure 4: Illustration of corpus callosum point-sets represented as density functions.

optimized over a space of coordinate transformations, yielding the desired registrations. It is obvious that once all the point-sets are deformed into the same shape, the distance measure between these distributions should be minimized, since all the distributions are then identical to each other. We impose regularization on each deformation field to prevent over-deforming of the point-sets (e.g., all the point-sets may deform into a single data point).

The rest of the chapter is organized as follows: we begin by reviewing the related literature, which is followed by a description of the divergence measures we use for quantifying the distance between densities or distributions. We then present the details of


our energy function, and the empirical way of estimating the cost functions and their derivatives. Finally, we show the experimental results at the end of this chapter.

The work presented in [56] is a representative method using an intrinsic curve parameterization to analyze deformable shapes. Shapes are represented as elements of infinite-dimensional spaces, and their pairwise differences are quantified using the lengths of geodesics connecting them on these spaces. The intrinsic mean (Karcher mean) can be computed as the point on the manifold (of shapes) which minimizes the sum of squared geodesic distances between this unknown point and each individual shape lying on the manifold. However, the curves are limited to closed curves, and the method has not been extended to 3D surface shapes. For methods using intrinsic curve or surface representations [56, 57, 58], further statistical analysis on these representations is much more difficult than analysis on the point representation, but the reward may be higher due to the use of an intrinsic higher-order representation.

Among the methods using point-set parameterization, the idea of using non-rigid spatial mapping functions, specifically thin-plate splines [59, 60, 61], to analyze deformable shapes has been widely adopted. Bookstein's work in [59] successfully initiated the research efforts on the usage of thin-plate splines to model the deformation of shapes. This method is landmark-based; it avoids the correspondence problem since


the placement of corresponding points is driven by the visual perception of experts; however, it suffers from the typical problems besetting landmark methods, e.g., inconsistency. Several significant articles on robust and non-rigid point-set matching have been published by Rangarajan and collaborators [62, 60, 63] using thin-plate splines. In their recent work [60], they attempt to extend their work to the construction of a mean shape from a set of unlabeled shapes which are represented by unlabeled point-sets. The main strength of their work is the ability to jointly determine the correspondences and the non-rigid transformation between each point-set and the emerging mean shape using deterministic annealing and soft-assign. However, in their work, the stability of the registration result is not guaranteed in the case of data with outliers, and hence a good stopping criterion is required. Unlike their approach, we do not need to first solve a correspondence problem in order to subsequently solve a non-rigid registration problem.

The active shape model proposed in [64] utilized points to represent deformable shapes. Their work pioneered the efforts in building point distribution models to understand deformable shapes [64, 65]. Objects are represented as carefully-defined landmark points, and the variation of shapes is modeled using principal component analysis. These landmark points are acquired through a more or less manual landmarking process, where an expert goes through all the samples to mark corresponding points on each sample. It is a rather tedious process and its accuracy is limited. In recent work [66], the authors attempt to overcome this limitation by automatically solving for the correspondences in a non-rigid setting. The resulting algorithm is very similar to the earlier work in [58] and is restricted to curves. The work in [55] also uses 2D points to learn shape statistics; it is quite similar to the active shape model method, except that more attention has been paid to the process of generating sample point-sets from the shape. Unlike our method, the transformations between curves are limited to rigid mappings, and the process is not symmetric.


There are several papers in the point-set alignment literature which bear a close relation to our research reported here. For instance, Tsin and Kanade [67] proposed a kernel correlation based point-set registration approach where the cost function is proportional to the correlation of two kernel density estimates. It is similar to our work, since we too model each of the point-sets by a kernel density function and then quantify the (dis)similarity between them using an information-theoretic measure, followed by an optimization of the (dis)similarity function over a space of coordinate transformations yielding the desired transformation. The difference lies in the fact that the divergence measures used in our work are a lot more general than the information-theoretic measure used in [67], and can be easily extended to multiple point-sets. More recently, in [68], Glaunes et al. convert the point matching problem into an image matching problem by treating points as delta functions. They then lift these delta functions and diffeomorphically match them. The main problem with this technique is that it needs a 3D spatial integral which must be numerically computed, while we do not need this due to the empirical computation of the divergence measures. We will show in the experimental results that our method, when applied to match point-sets, achieves very good performance in terms of both robustness and accuracy.
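For concreteness, the kind of divergence measure alluded to here, for example the weighted Jensen-Shannon divergence between discrete distributions (defined formally later in this chapter), can be computed as below; the function names are illustrative:

```python
import math

def shannon_entropy(p):
    # Shannon entropy of a discrete distribution.
    return -sum(x * math.log(x) for x in p if x > 0)

def js_divergence(dists, weights):
    # Weighted Jensen-Shannon divergence:
    #   H(sum_i w_i p_i) - sum_i w_i H(p_i).
    mix = [sum(w * p[j] for w, p in zip(weights, dists))
           for j in range(len(dists[0]))]
    return shannon_entropy(mix) - sum(
        w * shannon_entropy(p) for w, p in zip(weights, dists))

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
```

The divergence vanishes when the compared distributions coincide (as happens once all point-sets are deformed into the same shape) and is strictly positive otherwise.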


(KL) divergence. The KL divergence (also known as the relative entropy) between two densities p and q is defined as

D_KL(p || q) = ∫ p(x) log [ p(x) / q(x) ] dx.

To illustrate this, consider the Morse code, designed to send messages in English. The Morse code encodes the letter E with a single dot and the letter Q with a sequence of four dots and dashes. Because E is used frequently in English and Q seldom, this makes for efficient transmission. However, if one wanted to use the Morse code to send messages in Chinese pinyin, which might use Q more frequently, one would find the code less efficient. If we assume, counterfactually, that the Morse code is optimal for English, this difference in efficiency is the redundancy.

Notice that the KL divergence is not symmetric; a popular way to symmetrize it is

(1/2) ( D_KL(p || q) + D_KL(q || p) ).

The Jensen-Shannon (JS) divergence [69] serves as a measure of cohesion between multiple probability distributions. It has been used by some researchers as a dissimilarity measure for image registration and retrieval applications [70, 71] with very good results. It has many desirable properties; to name a few: 1) the square root of the JS-divergence (in the case when its parameter is fixed to 0.5) is a metric [72]; 2) the JS-divergence relates to other information-theoretic functionals, such


as the relative entropy or the Kullback divergence, and hence it shares their mathematical properties as well as their intuitive appeal; 3) the distributions compared using the JS-divergence can be weighted, which allows one to take into account the different sizes of the point-set samples from which the probability distributions are computed; 4) the JS-divergence measure also allows us to have a different number of cluster centers in each point-set. There is no requirement that the cluster centers be in correspondence, as is required by Chui et al. [73]. Given n probability density functions p_i, i ∈ {1, …, n}, the JS-divergence of the p_i is defined by

JS_π(p_1, …, p_n) = H( Σ_i π_i p_i ) − Σ_i π_i H(p_i)   (4)

where π = {π_1, π_2, …, π_n | π_i > 0, Σ π_i = 1} are the weights of the probability density functions p_i and H(p_i) is the Shannon entropy. The two terms on the right hand side of Equation (4) are the entropy of p := Σ_i π_i p_i (the π-convex combination of the p_i's) and the same convex combination of the respective entropies. We can show that the JS-divergence can be derived from the KL divergence:

JS_β(p, q) = β D_KL(p || r) + (1 − β) D_KL(q || r),  with r = β p + (1 − β) q,

where β ∈ (0, 1) is a fixed parameter; we will also consider its straightforward generalization to n distributions.

In Chapter 2, we defined an entropy measure which is based on the probability distribution instead of the density function. The distribution function is more regular because it is defined in an integral form, unlike the density function, which is the derivative of the distribution. The definition of Cumulative Residual Entropy also preserves the well-established principle that the logarithm of the probability of an event should represent the information content in the event. CRE is shown to be more immune to noise and


outliers. Based on this idea, we can define a KL-divergence measure between Cumulative Distribution Functions (CDFs),

D_CDF-KL(P_1 || P_2) = ∫ Pr(X_1 > x) ln [ Pr(X_1 > x) / Pr(X_2 > x) ] dx.   (4)

Following the same relationship between the Jensen-Shannon divergence and the KL divergence, we can derive the so-called CDF-JS divergence from the definition of the CDF-KL divergence (denoted as J), the result of which is shown in the following theorem; the derivation uses the definition of CRE in Eqn. (2) of Chapter 2.


β D_CDF-KL(P_1 || P) + (1 − β) D_CDF-KL(P_2 || P)
  = β ∫ Pr(X_1 > x) ln [ Pr(X_1 > x) / Pr(X > x) ] dx + (1 − β) ∫ Pr(X_2 > x) ln [ Pr(X_2 > x) / Pr(X > x) ] dx
  = β ∫ Pr(X_1 > x) ln Pr(X_1 > x) dx + (1 − β) ∫ Pr(X_2 > x) ln Pr(X_2 > x) dx − ∫ [ β Pr(X_1 > x) + (1 − β) Pr(X_2 > x) ] ln Pr(X > x) dx
  = −β E(P_1) − (1 − β) E(P_2) − ∫ [ β Pr(X_1 > x) + (1 − β) Pr(X_2 > x) ] ln Pr(X > x) dx   (4)

where P is the distribution function corresponding to the density function p = β p_1 + (1 − β) p_2, which is the convex combination of the two probability densities; therefore Pr(X > x) = β Pr(X_1 > x) + (1 − β) Pr(X_2 > x). Consequently, the CDF-JS divergence for two random variables can be rewritten as

J(P_1, P_2) = E(P) − β E(P_1) − (1 − β) E(P_2).
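The JS and CDF-JS quantities defined above are easy to check numerically. The sketch below is illustrative only (the function names `js`, `kl`, `cre`, and `cdf_js` are ours, not the dissertation's implementation): it computes the JS divergence in its entropy form, verifies the KL decomposition, and evaluates the CDF-JS divergence J = E(P) − βE(P_1) − (1−β)E(P_2) from empirical survival functions on a uniform grid.

```python
import numpy as np

def entropy(p):
    """Shannon entropy H(p) = -sum p log p (natural log), skipping zero bins."""
    p = np.asarray(p, float)
    nz = p > 0
    return -np.sum(p[nz] * np.log(p[nz]))

def kl(p, q):
    """KL divergence D(p||q) for discrete densities with q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log(p[nz] / q[nz]))

def js(p, q, beta=0.5):
    """JS_beta(p, q) = H(beta p + (1-beta) q) - beta H(p) - (1-beta) H(q)."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return entropy(beta * p + (1 - beta) * q) - beta * entropy(p) - (1 - beta) * entropy(q)

def cre(surv, dx):
    """Cumulative residual entropy -sum P(X>x) log P(X>x) dx on a uniform grid."""
    nz = surv > 0
    return -np.sum(surv[nz] * np.log(surv[nz])) * dx

def cdf_js(s1, s2, beta, grid):
    """CDF-JS: J = E(P) - beta E(P1) - (1-beta) E(P2), with P the survival
    function of the beta-mixture of the two samples' distributions."""
    s1, s2 = np.asarray(s1, float), np.asarray(s2, float)
    dx = grid[1] - grid[0]
    sv1 = np.array([np.mean(s1 > x) for x in grid])   # empirical P(X1 > x)
    sv2 = np.array([np.mean(s2 > x) for x in grid])   # empirical P(X2 > x)
    mix = beta * sv1 + (1 - beta) * sv2               # survival fn of the mixture
    return cre(mix, dx) - beta * cre(sv1, dx) - (1 - beta) * cre(sv2, dx)

p, q = np.array([0.6, 0.3, 0.1]), np.array([0.1, 0.3, 0.6])
r = 0.5 * p + 0.5 * q
grid = np.linspace(0.0, 6.0, 61)
jv = cdf_js(np.array([0.0, 1.0, 2.0, 3.0]), np.array([2.0, 3.0, 4.0, 5.0]), 0.5, grid)
```

Since −t log t is concave, the mixture's CRE dominates the convex combination of the individual CREs, so J is nonnegative and vanishes for identical distributions, matching the divergence interpretation.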


non-rigid registration, following which an atlas is constructed from the registered data. However, in our work, the atlas emerges as a byproduct of the non-rigid registration. The basic idea is to model each point set by a probability density or distribution function, and then quantify the distance between these functions using an information-theoretic measure. The distance measure is optimized over a space of coordinate transformations, yielding the desired transformations. We will begin by presenting the energy function for solving the groupwise point-sets registration problem:

E(f_1, …, f_N) = D(p_1, …, p_N) + λ Σ_{a=1}^{N} ||L f_a||².   (4)

In (4), D is the divergence measure for multiple distributions, which we propose to be either the JS divergence or the CDF-JS divergence. The weight parameter λ is a positive constant; the operator L determines the kind of regularization imposed. For example, L could correspond to a thin-plate spline, a Gaussian radial basis function, etc. Each choice of L is in turn related to a kernel and a metric of the deformation from and to Z.

Following the approach in [73], we choose the thin-plate spline (TPS) to represent the non-rigid deformation. Given n control points x_1, …, x_n in R^d, a general non-rigid


mapping f: R^d → R^d represented by a thin-plate spline can be written analytically as f(x) = W U(x) + A x + t. Here A x + t is the linear part of f. The nonlinear part is determined by a d×n matrix W, and U(x) is an n×1 vector consisting of n basis functions U_i(x) = U(x, x_i) = U(||x − x_i||), where U(r) is the kernel function of the thin-plate spline. For example, if the dimension is 2 (d = 2) and the regularization functional is defined on the second derivatives of f, we have U(r) = (1/(8π)) r² ln(r).

Therefore, the cost function for non-rigid registration can be formulated as an energy functional in a regularization framework, where the regularization term in Equation (4) is governed by the bending energy of the thin-plate spline warping and can be explicitly given by trace(W K W^T), where K = (K_ij), K_ij = U(p_i, p_j), describes the internal structure of the control point sets. Note that the linear part can be obtained by an initial affine registration; an optimization can then be performed to find the parameter W.

where x_k consists of the points {x^a_i, i ∈ {1, …, n_a}, a ∈ {1, …, N}}, which is the pooled data of all the samples. In contrast to the typical statistical test relative to a threshold, we seek the maximum of the likelihood ratio in Eqn. (4). The following theorem shows the relationship between the Jensen-Shannon divergence and the likelihood ratio.
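The TPS mapping f(x) = W U(x) + A x + t and the bending-energy regularizer trace(W K W^T) described above can be sketched numerically. The code below is illustrative only (our own function names, assuming the 2D kernel U(r) = r² ln(r)/(8π) stated above), not the dissertation's implementation:

```python
import numpy as np

def tps_kernel(r):
    """2D thin-plate spline kernel U(r) = r^2 log(r) / (8*pi), with U(0) = 0."""
    out = np.zeros_like(r)
    nz = r > 0
    out[nz] = (r[nz] ** 2) * np.log(r[nz]) / (8 * np.pi)
    return out

def tps_apply(x, ctrl, W, A, t):
    """Evaluate f(x) = W U(x) + A x + t for a batch of 2D points x, shape (m, 2).
    ctrl: (n, 2) control points; W: (2, n); A: (2, 2); t: (2,)."""
    # U(x): kernel of the distances from each query point to the n control points
    r = np.linalg.norm(x[:, None, :] - ctrl[None, :, :], axis=2)   # (m, n)
    return tps_kernel(r) @ W.T + x @ A.T + t                        # (m, 2)

def bending_energy(ctrl, W):
    """TPS regularizer trace(W K W^T), K_ij = U(||p_i - p_j||)."""
    r = np.linalg.norm(ctrl[:, None, :] - ctrl[None, :, :], axis=2)
    K = tps_kernel(r)
    return np.trace(W @ K @ W.T)

ctrl = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
W0 = np.zeros((2, 4))                      # zero warp -> purely affine map
y = tps_apply(np.array([[0.3, 0.7], [2.0, -1.0]]), ctrl, W0, np.eye(2), np.zeros(2))
```

With W = 0 the map reduces to its affine part (here the identity) and the bending energy vanishes, which is the sanity check used below.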


Maximizing the likelihood ratio in Eqn. (4) is equivalent to minimizing the Jensen-Shannon divergence between the N probability densities p_a, a ∈ {1, …, N}.

We seek to maximize the probability that the samples are drawn from the mixture rather than from separate members of the family (p_1, p_2, …, p_N). In the context of groupwise matching of point-sets, this makes eminent sense, since maximizing the above ratio is tantamount to increasing the chance that all of the observed point-sets are warped versions of the same underlying warped and pooled data model. The notion of the pooled data model is defined as follows. In our process of groupwise registration, the warping does not have a fixed target data set. Instead, the warping is between the input data sets and an evolving target which we call the pooled model. The target evolves to a fully registered pooled data set at the end of the optimization process. The pooled model then consists of input data sets which have undergone groupwise matching and are now fully registered with each other. The connection to the JS-divergence arises from the fact that the negative logarithm of the above ratio (Eqn. (4)) asymptotically converges to the JS-divergence when the samples are assumed to be drawn from the mixture Σ_a π_a p_a.

Assume that we are given N point-sets from which we are going to construct the atlas. We can then divide the N point-sets into m subsets (generally m ≪ N); therefore we can


construct m atlases from these subsets using our algorithms, and all the point-sets inside each subset are registered. Then we can either construct a single atlas from these m atlas point-sets, or we can further divide the m atlas point-sets into even smaller subsets and follow the same process until a single atlas is constructed. The remaining question is whether the atlas thus obtained is biased or not. The following theorem will lead us to the answer; it follows from Eqn. (4) by simple algebraic operations.

In our registration algorithm, all the point-sets are represented as probability distributions, and the atlas thus constructed can be considered a convex combination of these distributions. Therefore, we can treat the P_a's and S_i's as the distributions corresponding to the point-sets and the constructed atlases from the subsets, respectively. From Theorem 7, we then know that the relationship in Eqn. (4) holds between the JS divergence of the P_a's and the S_i's. Notice that the right hand side of Eqn. (4) consists of the JS/CDF-JS divergences of the distributions in all the subsets, which are minimized in each step of the hierarchical method we proposed. Intuitively, if these point sets are aligned properly,


the corresponding distribution functions should be statistically similar. Therefore the divergences of all the subsets should be zero or all very close to zero, which means the right hand side of Eqn. (4) is zero. Consequently, the JS/CDF-JS divergence of the P_a's and the divergence of the S_i's are equal to each other; therefore minimizing the JS/CDF-JS divergence of all the resultant atlas point-sets is equivalent to minimizing the divergence of the original point-sets, implying that there is no bias toward any particular partitioning of the point-sets.

Having introduced the cost function and the transformation model, the task now is to design an efficient way to estimate the empirical divergence measures between multiple densities or distributions, and to derive the analytic gradient of the estimated divergence in order to reach the optimal solution efficiently. We design two completely different approaches for estimating the JS divergence and the CDF-JS divergence. We use a finite mixture model for estimating the JS divergence, and the Parzen window technique for the CDF-JS divergence; the details of each will be introduced next.

A finite (Gaussian) mixture model [74] is defined as a convex combination of Gaussian component densities. To model each point-set as a Gaussian mixture, we define a set of cluster centers, one for each point-set, to serve as the Gaussian mixture centers. Since the feature point-sets are usually highly structured, we can expect them to cluster well. Furthermore, we can greatly improve the algorithm's efficiency by using a limited number of clusters. Note that we can choose the cluster centers to be the point-set itself if the size of the point-set is quite small. The cluster center point-sets are denoted by {V^p, p ∈ {1, …, N}}. Each point-set V^p consists of points {v^p_i ∈ R^D, i ∈ {1, …, K_p}}. Note that there are K_p points in each V^p,


and the number of clusters for each point-set may be different (in our implementation, the number of clusters was usually chosen to be proportional to the size of the point-set). The cluster centers are estimated using a clustering process over the original sample points x^p_i, and we only need to do this once, before the process of joint atlas estimation and point-set registration. In our implementation, we utilize a deterministic annealing (DA) procedure with its proven benefit of robustness in clustering [75]. We begin by specifying the density function of each point set.

In Equation (4), the occupancy probability, which is different for each data point-set, is denoted by π^p. The component densities p(x | v^p_a) are

p(x | v^p_a) = (2π)^{-D/2} |Σ_a|^{-1/2} exp( −(1/2) (x − v^p_a)^T Σ_a^{-1} (x − v^p_a) ).   (4)

The probability of the point set X^p coming from this mixture is then the product of the mixture densities over its points. Later, we set the occupancy probability to be uniform and make the covariance matrices Σ_a proportional to the identity matrix in order to simplify the atlas estimation procedure.

For simplicity, the component density of a transformed point is written as

p(x^j_i | v^p_a) = (2π)^{-D/2} |Σ_a|^{-1/2} exp( −(1/2) (f_j(x^j_i) − f_p(v^p_a))^T Σ_a^{-1} (f_j(x^j_i) − f_p(v^p_a)) ),   (4)

where {Σ_a, a ∈ {1, …, K}} is the set of cluster covariance matrices. For the sake of simplicity and ease of implementation, we assume that the occupancy probabilities are


uniform (π^p_a = 1/K_p) and rewrite Equation (4) accordingly. For each term in the equation, we can estimate the entropy using the weak law of large numbers: for n samples x_i drawn from a density p,

H(X) ≈ −(1/n) Σ_{i=1}^{n} log p(x_i).

The gradient of the JS divergence with respect to the transformation parameters μ = (μ_1, …, μ_N) is

∇JS = [ ∂JS/∂μ_1, ∂JS/∂μ_2, …, ∂JS/∂μ_N ].
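To make the estimation step concrete, here is an illustrative sketch (our own simplified routines, not the dissertation's code; uniform occupancies 1/K and isotropic covariances σ²I as assumed above) that builds a mixture density from cluster centers and estimates its Shannon entropy by the weak-law average −(1/n) Σ log p(x_i):

```python
import numpy as np

def mixture_density(x, centers, sigma):
    """Gaussian mixture with uniform occupancy 1/K and isotropic covariance sigma^2 I."""
    x = np.atleast_2d(np.asarray(x, float))
    D = centers.shape[1]
    d2 = np.sum((x[:, None, :] - centers[None, :, :]) ** 2, axis=2)   # (m, K)
    comp = np.exp(-0.5 * d2 / sigma**2) / ((2 * np.pi * sigma**2) ** (D / 2))
    return comp.mean(axis=1)                                          # uniform weights

def entropy_wlln(samples, density):
    """Weak-law-of-large-numbers estimate H(X) ~ -(1/n) sum log p(x_i)."""
    return -np.mean(np.log(density(samples)))

# Sanity check against a known value: with a single 2D center (K = 1) the mixture
# is a Gaussian whose differential entropy is log(2*pi*e*sigma^2).
rng = np.random.default_rng(0)
sigma = 0.5
centers = np.zeros((1, 2))
samples = rng.standard_normal((100_000, 2)) * sigma
est = entropy_wlln(samples, lambda s: mixture_density(s, centers, sigma))
true_h = np.log(2 * np.pi * np.e * sigma**2)
```

The same sample-average trick applied to the mixture of all warped point-sets gives the empirical estimate of each entropy term in the JS divergence.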


Each component of the gradient may be found by differentiating Eqn. (4) with respect to the transformation parameters. In order to compute this gradient, let us first calculate the derivative of the component density with respect to μ_l. By the chain rule, differentiating the Gaussian in Eqn. (4) produces the Gaussian itself multiplied by the factor (f_j(x^j_i) − f_p(v^p_a))^T Σ_a^{-1}, applied to ∂f_j(x^j_i)/∂μ_l (and, for the center term, to −∂f_p(v^p_a)/∂μ_l); summing these per-sample terms with weight 1/n yields each component of ∇JS.


where α is a normalization factor that ensures ∫∫ p(l, k) dl dk = 1, [l, k] are the coordinate values along the X and Y axes respectively, the transformed point coordinates [f_a(x_{a_i}; μ_a), f_a(y_{a_i}; μ_a)] are normalized by the minimum coordinate values x_0, y_0 and the range of each bin, Δb_X, Δb_Y. From the density function, we can calculate the cumulative residual distribution function by the formula P_a(l > λ, k > τ; μ_a) = ∫_λ^∞ ∫_τ^∞ p_a(l, k; μ_a) dl dk, where λ ∈ L_x, τ ∈ L_y, and L_x, L_y are discrete sets of coordinate values along the X and Y axes respectively. To express this in further detail, the required integrals of the cubic B-spline Parzen kernel β^(3)(u) can be evaluated in closed form. Having specified the distribution function of the data, we can then rewrite Eqn. (4) as follows (for simplicity, we choose π_a = 1/N),


where P is the cumulative residual distribution function for the density function (1/N) Σ_a p_a.

Each component of the gradient may be found by differentiating Eqn. (4) with respect to the transformation parameters. It can be easily shown that ∂P(λ, τ; μ_a)/∂μ_a is obtained by differentiating the binned density p_a(l, k; μ_a) under the double sum.
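The binned density and its cumulative residual (tail) table can be computed with two reversed cumulative sums; the small sketch below is illustrative only (our function name `survival_2d`, a plain histogram in place of the spline-based Parzen window used in the dissertation):

```python
import numpy as np

def survival_2d(points, bins=16):
    """Normalized 2D histogram p(l, k) and the tail table P(l >= bin_i, k >= bin_j),
    obtained by reversed cumulative sums along both axes."""
    h, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=bins)
    p = h / h.sum()
    # Reverse both axes, accumulate, reverse back: entry (i, j) sums p over all
    # bins with row >= i and column >= j.
    surv = np.cumsum(np.cumsum(p[::-1, ::-1], axis=0), axis=1)[::-1, ::-1]
    return p, surv

rng = np.random.default_rng(1)
pts = rng.random((2000, 2))
p, surv = survival_2d(pts)
```

The table `surv` is exactly the quantity whose −P log P integral enters the empirical CDF-JS divergence; it is monotonically nonincreasing along both axes, which the checks below confirm.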


experimental results on point-set alignment between two given point-sets, as well as atlas construction from multiple point-sets, in the next section.

The top row of Figure 4 shows the registration result for a 2D real range data set of a road (which was also used in Tsin and Kanade's experiments [67]). The figure depicts the real data and the registered result (using rigid motion). The top left frame contains two unregistered point sets superposed on each other; the top right frame contains the same point sets after registration using our algorithm. A 3D helix example is presented in the second row (with the same arrangement as the top row). We also tested our method against the KC method [67] and the ICP method; as expected, our method and the KC method exhibit a much wider convergence basin/range than ICP, and both achieve very high accuracy in the noiseless case.

We also applied our algorithm to non-rigidly register medical datasets (2D point-sets). Figure 4 depicts some results of our registration method applied to a set of 2D corpus callosum slices with feature points manually extracted by human experts. The registration result is shown in the left column, with the warping of a 2D grid under the recovered motion shown in the middle column. Our non-rigid alignment performs well in the


Figure 4: Results of rigid registration in the noiseless case. 'o' and '+' indicate the model and scene points respectively.

presence of noise and outliers (Figure 4, right column). For the purpose of comparison, we also tested the TPS-RPM program provided in [62] on this data set, and found that TPS-RPM can correctly register the pair without outliers (Figure 4, top left) but failed to match the corrupted pair (Figure 4, top right).

Figure 4: Non-rigid registration of the corpus callosum data. Left column: two manually segmented corpus callosum slices before and after registration; Middle column: warping of the 2D grid using the recovered motion; Top right: same slices with one corrupted by noise and outliers, before and after registration.


Figure 4: Experimental results on seven 2D corpus callosum point sets. The first two rows and the left image in the third row show the deformation of each point-set to the atlas, superimposed with the initial point set (shown as 'o') and the deformed point-set (shown as '*'). Middle image in the third row: the estimated atlas shown superimposed over all the point-sets. Right: the estimated atlas shown superimposed over all the deformed point-sets.

We manually extracted points on the outer contour of the corpus callosum from seven normal subjects (as shown in Figure 4, indicated by 'o'). The recovered deformations between each point-set and the mean shape are superimposed in the first two rows of Figure 4. The resulting atlas (mean point-set) is shown in the third row of Figure 4, superimposed over all the point-sets. As we described earlier, all these results are computed simultaneously and automatically. This example clearly demonstrates that our


joint matching and atlas construction algorithm can simultaneously align multiple shapes (modeled by sample point-sets) and compute a meaningful atlas/mean shape.

The results are shown in Figure 4, which confirm that CDF-JS is indeed more robust in the presence of high noise and outlier levels. A 3D example is also presented in Figure 4.

Next, we present groupwise registration results on 3D hippocampal point-sets. Four 3D point-sets were extracted from epilepsy patients with left anterior temporal lobe foci identified with EEG. An interactive segmentation tool was used to segment the hippocampus from the 3D brain MRI scans of the 4 subjects. The point-sets differ in shape, with the number of points being 450, 421, 376, and 307 in each point-set respectively. In the first


Figure 4: Robustness to outliers in the presence of large noise. Errors in the estimated rigid transform vs. proportion of outliers for both our method and the KC method.

Figure 4: Robustness test on 3D swan data. 'o' and '+' indicate the model and scene points respectively. Note that the scene point-set is corrupted by noise and outliers.

four images of Figure 4, the recovered nonrigid deformation between each hippocampal point-set and the atlas is shown, along with a superimposition on all of the original data sets. In the second row of Figure 4, we also show the scatter plot of the original point-sets along with all the point-sets after the non-rigid warping. An examination of the two scatter plots clearly shows the efficacy of our recovered non-rigid warping. Note that validation of what an atlas shape ought to be in the real data case is a difficult problem, and we relegate its resolution to a future paper.


Figure 4: Atlas construction from four 3D hippocampal point sets. The first row and the left image in the second row show the deformation of each point-set to the atlas (represented as cluster centers), superimposed with the initial point set (shown in green 'o') and the deformed point-set (shown in red '+'). Left image in the second row: scatter plot of the original four hippocampal point-sets. Right: scatter plot of all the warped point-sets.


In medical imaging applications, segmentation can be a daunting task due to possibly large inhomogeneities in image intensities across an image, e.g., in MR images. These inhomogeneities, combined with volume averaging during imaging and a possible lack of precisely defined shape boundaries for certain anatomical structures, complicate the segmentation problem immensely. One possible solution for such situations is atlas-based segmentation. The atlas, once constructed, can be used as a template and can be registered non-rigidly to the image being segmented (henceforth called the target image), thereby achieving the desired segmentation. Many of the methods that achieve atlas-based segmentation are based on a two stage process involving (i) estimating the non-rigid deformation field between the atlas image and the target image and then (ii) applying the estimated deformation field to the desired shape/atlas to achieve the segmentation of the corresponding structure(s) in the target image. In this chapter, we develop a novel technique that simultaneously achieves the non-rigid registration and segmentation. There is a vast body of literature for the tasks of registration and segmentation independently; however, methods that combine them into one algorithm are few and far between. In the following, we briefly review the few existing methods that attempt to achieve simultaneous registration and segmentation.

[76] developed a min-max entropy framework to rigidly register and segment portal and CT datasets. In [77], Yezzi et al. present a variational principle for achieving simultaneous registration and segmentation; however, the registration part is limited to rigid motions. A similar limitation applies to the technique presented by Noble et al. in [78]. A variational

principle was presented in [79] for segmentation and registration of cardiac MRI data. Their formulation was again limited to rigid motion, and the experiments were limited to 2D images. In Fischl et al. [80], a Bayesian method is presented that simultaneously estimates a linear registration and the segmentation of a novel image. Note that linear registration does not involve non-rigid deformations. The case of joint registration and segmentation with non-rigid registration has not been addressed adequately in the literature, with the exception of the recent work reported in Soatto and Yezzi [81] and Vemuri et al. [82]. However, these methods can only work with image pairs that are necessarily from the same modality, or whose intensity profiles are not too disparate.

In this paper, we present a unified variational principle that will simultaneously register the atlas shape (contour/surface) to the novel brain image and segment the desired shape (contour/surface) in the novel image. In this work, the atlas serves as a prior in the segmentation process, and the registration of this prior to the novel brain scan assists in segmenting it. Another key feature/strength of our proposed registration+segmentation scheme is that it accommodates image pairs having very distinct intensity distributions, as in multimodality datasets. More details on this are presented in Section 5.2.


Figure 5: Model illustration.

where the first term denotes the segmentation functional. ~C is the boundary contour (surface in 3D) of the desired anatomical shape in I_2. The second term measures the distance between the transformed atlas v(C) and the current segmentation ~C in the novel brain image, i.e., the target image, and the third term denotes the non-rigid registration functional between the two images. Our joint registration & segmentation model is illustrated in Figure 5.2.

For the segmentation functional, we use a piecewise constant Mumford-Shah model, which is one of the well-known variational models for image segmentation, wherein it is assumed that the image to be segmented can be modeled by piecewise constant regions, as was done in [54]. This assumption simplifies our presentation, but our model itself can be easily extended to the piecewise smooth case. Additionally, since we are only interested in segmenting a desired anatomical shape (e.g., the hippocampus, the corpus callosum, etc.), we will only be concerned with a binary segmentation, i.e., two classes, namely voxels inside the desired shape and those outside it. These assumptions can be easily relaxed if necessary, but at the cost of making the energy functional more complicated and hence computationally more challenging. The segmentation functional takes the following form:


E_seg(~C, u_i, u_o) = ∫_{~C_in} (I_2 − u_i)² dx + ∫_{~C_out} (I_2 − u_o)² dx + ν |~C|,

where Ω is the image domain and ν is a regularization parameter. u = u_i if x ∈ ~C_in and u = u_o if x ∈ ~C_out; ~C_in and ~C_out denote the regions inside and outside of the curve ~C representing the desired shape boundaries in I_2.

For the non-rigid registration term in the energy function, we use the information theoretic criterion, cross cumulative residual entropy (CCRE), which we introduced in Chapter 2. CCRE was shown to outperform mutual information based registration in the context of noise immunity and convergence range, motivating us to pick this criterion over the MI-based cost function. The new registration functional is defined by

E_reg(v) = −C( I_1(x), I_2(x + v(x)) ) + μ ∫_Ω ||Dv(x)||² dx,

where the cross-CRE C(I_1, I_2) is given by

C(I_1, I_2) = E(I_1) − E[ E(I_1 | I_2) ],

with E(I_1) = −∫_{R+} P(|I_1| > λ) log P(|I_1| > λ) dλ and R+ = {x ∈ R : x ≥ 0}. v(x) is as before, μ is the regularization parameter, and ||·|| denotes the Frobenius norm. Using a B-spline representation of the non-rigid deformation, one need only compute this field at the control points of the B-splines and interpolate elsewhere, thus accruing computational advantages. Using this representation, we have derived analytic expressions for the gradient of the energy with respect to the registration parameters. This in turn makes our optimization more robust and efficient.

In order for the registration and the segmentation terms to talk to each other, we need a connection term, and that is given by

dist(v(C), ~C) = ∫∫_R φ_{v(C)}(x) dx,

where R is the region enclosed by ~C, and φ_{v(C)}(x) is the embedding signed distance function of the contour v(C), which can be used to measure the distance between v(C) and ~C. The level-set function φ: R² → R is chosen so that its zero level-set corresponds to the


transformed template curve v(C). Let E_dist := dist(v(C), ~C); one can show that

∂E_dist/∂t = φ_{v(C)}(~C) N.   (5)

Not only does the signed distance function representation make it easier for us to convert the curve evolution problem to the level-set framework, it also facilitates the matching of the evolving curve ~C and the transformed template curve v(C), and yet does not rely on a parametric specification of either ~C or the transformed template curve. Note that since dist(v(C), ~C) is a function of the unknown registration v and the unknown segmentation ~C, it plays the crucial role of connecting the registration and the segmentation terms.

Combining these three functionals together, we get the following variational principle for the simultaneous registration+segmentation problem:
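Before turning to the combined functional, note that the CCRE term C(I_1, I_2) = E(I_1) − E[E(I_1 | I_2)] defined above can be estimated directly from a joint intensity histogram. The sketch below is a minimal illustration (unit-width bins, natural log; our own simplified routine, not the dissertation's B-spline implementation):

```python
import numpy as np

def ccre(img1, img2, bins=32):
    """Cross-CRE C(I1, I2) = E(I1) - E[E(I1 | I2)], estimated from a joint
    intensity histogram (unit bin width, natural log)."""
    joint, _, _ = np.histogram2d(img1.ravel(), img2.ravel(), bins=bins)
    pj = joint / joint.sum()            # p(I1 bin, I2 bin)
    p2 = pj.sum(axis=0)                 # marginal of I2

    def cre_from_pmf(p):
        surv = np.clip(1.0 - np.cumsum(p), 0.0, 1.0)    # P(I1 > bin)
        nz = surv > 1e-12
        return -np.sum(surv[nz] * np.log(surv[nz]))

    e1 = cre_from_pmf(pj.sum(axis=1))   # E(I1)
    cond = sum(p2[k] * cre_from_pmf(pj[:, k] / p2[k])   # E[E(I1 | I2)]
               for k in range(bins) if p2[k] > 0)
    return e1 - cond

rng = np.random.default_rng(2)
a = np.tile(np.arange(64.0), (64, 1))              # simple intensity ramp image
b = rng.permutation(a.ravel()).reshape(a.shape)    # same intensities, scrambled
c_self, c_scrambled = ccre(a, a), ccre(a, b)
```

When the second image determines the first (perfect alignment), the conditional CRE vanishes and CCRE reaches its maximum E(I_1); scrambling the second image drives CCRE toward zero, which is what makes it a usable matching criterion.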


and merges), the ability to deal with the formation of cusps and corners, which are extremely common in curve evolution, and the numerical stability and efficiency afforded in its implementation. For our model, where the equation for the unknown curve ~C is coupled with the equations for v(x), u_o, and u_i, it is convenient for us to use the level set approach as proposed in [54].

Taking the variation of E(·) with respect to ~C and writing down the gradient descent leads to the following curve evolution equation:

∂~C/∂t = [ −(I_2 − u_i)² + (I_2 − u_o)² + λ_1 κ + λ_2 φ_{v(C)}(~C) ] N.   (5)

Note that equation (5) is used in the derivation. Equation (5) in the level-set framework is given by:

∂φ/∂t = [ −(I_2 − u_i)² + (I_2 − u_o)² + λ_1 div(∇φ/|∇φ|) ] |∇φ|,

where u_i and u_o are the mean values inside and outside of the curve ~C in the image I_2. To drive the curve towards the template's level-set function φ_{v(C)}, we add the shape term:

∂φ/∂t = [ −(I_2 − u_i)² + (I_2 − u_o)² + λ_1 div(∇φ/|∇φ|) + λ_2 φ_{v(C)} ] |∇φ|.

As illustrated in Figure 5, the two parameters λ_1 and λ_2 are used to balance the influence of the shape distance model and the region-based model. Note that φ(~C) = 0 at any location of the curve by the definition of the level-set function; this added term does not affect the curve evolution equation [83].

As mentioned before, we use a B-spline basis to represent the displacement vector field v(x; μ), where μ denotes the transformation parameters of the B-spline basis, and

∂E_dist/∂μ = λ_2 (∂/∂μ) ∫∫_R φ_{v(C)}(x) dx.
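A minimal explicit-update sketch of the level-set evolution above can be written with finite differences. This is illustrative only (our own discretization and function names; the template level-set φ_{v(C)} is passed in as a precomputed array, and λ_1, λ_2 are arbitrary sample values):

```python
import numpy as np

def curvature(phi):
    """div(grad phi / |grad phi|) via central differences."""
    gy, gx = np.gradient(phi)
    mag = np.sqrt(gx**2 + gy**2) + 1e-8
    dyy, _ = np.gradient(gy / mag)      # d/dy of the normal's y-component
    _, dxx = np.gradient(gx / mag)      # d/dx of the normal's x-component
    return dxx + dyy

def level_set_step(phi, img, phi_template, lam1=0.2, lam2=0.1, dt=0.1):
    """One explicit Euler step of
    dphi/dt = [-(I-u_in)^2 + (I-u_out)^2 + lam1*curv + lam2*phi_template] |grad phi|,
    with u_in, u_out the mean intensities inside (phi > 0) and outside the curve."""
    inside = phi > 0
    u_in = img[inside].mean() if inside.any() else 0.0
    u_out = img[~inside].mean() if (~inside).any() else 0.0
    force = (-(img - u_in) ** 2 + (img - u_out) ** 2
             + lam1 * curvature(phi) + lam2 * phi_template)
    gy, gx = np.gradient(phi)
    return phi + dt * force * np.sqrt(gx**2 + gy**2)

# Toy run: a bright square on a dark background, initialized with a circle;
# here we assume the warped template agrees with the initialization.
img = np.zeros((32, 32)); img[8:24, 8:24] = 1.0
yy, xx = np.mgrid[0:32, 0:32]
phi = 10.0 - np.sqrt((xx - 16.0) ** 2 + (yy - 16.0) ** 2)
phi_template = phi.copy()
for _ in range(30):
    phi = level_set_step(phi, img, phi_template)
```

In the full algorithm this update alternates with the quasi-Newton update of the B-spline registration parameters, which re-warps φ_{v(C)} between curve-evolution steps.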


Figure 5: Illustration of the various terms in the evolution of the level set function. To update φ, we combine the standard region based update term S and the level set function corresponding to the shape distance term.

The first term of equation (5) can be rewritten using ∂φ_{v(C)}/∂μ; the needed derivative has been derived in Eqn. (3) of Chapter 3, and we simply state the result here without the derivations for the sake of brevity,

where P(i > λ, k; μ) and P(i > λ; μ) are the joint and marginal cumulative residual distributions respectively, and p_{I_2}(k) is the density function of image I_2. The last term of Eqn. (5) involves both the matrices ∇v and ∂v/∂μ.


Substituting equations (5), (5), and (5) respectively back into equation (5), we get the analytical gradient of our energy function with respect to the B-spline transformation parameters. We then solve for the stationary point of this nonlinear equation numerically using a quasi-Newton method.

[52]. They were originally aligned with each other. We use the MR T1 image as the source image, and the target image was generated from the MR T2 image by applying a known non-rigid transformation that was procedurally generated using kernel-based spline representations (cubic B-spline). The possible values in each direction of the deformation vary from -15 to 15 pixels. In this


Figure 5: Results of application of our algorithm to synthetic data (see text for details).

case, we present the error in the estimated non-rigid deformation field, using our algorithm, as an indicator of the accuracy of the estimated deformations.

Figure 5 depicts the results obtained for this image pair. With the MR T1 image as the source image, the target was obtained by applying a synthetically generated non-rigid deformation field to the MR T2 image. Notice the significant difference between the intensity profiles of the source and target images. Figure 5 is organized as follows, from left to right: the first row depicts the source image with the atlas-segmentation superposed in red, the registered source image obtained using our algorithm, followed by the target image with the unregistered atlas-segmentation superposed to depict the amount of misalignment; the second row depicts the ground truth deformation field used to generate the target image from the MR T2 image, followed by the estimated non-rigid deformation field, and finally the segmented target. As visually evident, the registration+segmentation results are quite accurate from a visual inspection point of view. As a measure of the accuracy of our method, we estimated the average, μ, and the standard deviation, σ, of the error in the estimated non-rigid deformation field. The error was estimated as the angle between the ground truth and estimated displacement vectors. The


Table 5 depicts statistics of the error in the estimated non-rigid deformation when compared to the ground truth. For the mean ground truth deformation (magnitude of the displacement vector) in Column 1 of each row, 5 distinct deformation fields with this mean were generated and applied to the target image of the given source-target pair to synthesize 5 pairs of distinct data sets. These pairs (one at a time) were input to our algorithm, and the mean (μ) of the mean deformation error (MDE) was computed over the five pairs and reported in Column 2 of the table. MDE is defined as the deformation error averaged over the entire field.

Table 5: Statistics of the error in estimated non-rigid deformation.

  Mean deformation (pixels)   μ of MDE   σ of MDE
  2.4                         0.5822     0.0464
  3.3                         0.6344     0.0923
  4.5                         0.7629     0.0253
  5.5                         0.7812     0.0714
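The angular error used above can be computed directly from the two displacement fields. An illustrative sketch (our own helper function, assuming fields stored as (H, W, 2) arrays, not the dissertation's code):

```python
import numpy as np

def mean_angular_error(v_true, v_est, eps=1e-12):
    """Mean angle (radians) between corresponding displacement vectors of two
    deformation fields stored as (..., 2) arrays."""
    vt = v_true.reshape(-1, v_true.shape[-1]).astype(float)
    ve = v_est.reshape(-1, v_est.shape[-1]).astype(float)
    cos = np.sum(vt * ve, axis=1) / (np.linalg.norm(vt, axis=1)
                                     * np.linalg.norm(ve, axis=1) + eps)
    return float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))

v1 = np.zeros((4, 4, 2)); v1[..., 0] = 1.0   # field pointing along x everywhere
v2 = np.zeros((4, 4, 2)); v2[..., 1] = 1.0   # field pointing along y everywhere
```

Averaging this per-voxel angle over the whole field gives one deformation-error number per synthesized pair, and the mean and standard deviation of that number across the 5 pairs correspond to the two reported columns.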


Figure 5: Results of application of our algorithm to a pair of slices from human brain MRIs (see text for details).

in the source image. Figure 5 is organized as follows, from left to right: the first row depicts the source image with the atlas-segmentation superposed in black, followed by the target image with the unregistered atlas-segmentation superposed to depict the amount of misalignment; the second row depicts the estimated non-rigid vector field and finally the segmented target. As evident from Figure 5, the accuracy of the achieved registration+segmentation is visually very good. Note that the non-rigid deformation between the two images in these two examples is quite large, and our method was able to simultaneously register and segment the target data sets quite accurately.

The second real data example is obtained from two brain MRIs of different subjects and modalities; the segmentation of the cerebellum in the source image is given. We selected two corresponding slices from these volume data sets to conduct the experiment. Note that even though the number of slices for the two data sets is the same, the slices may not correspond to each other from an anatomical point of view. However, for the purposes of illustration of our algorithm, this is not very critical. We use the corresponding slice of the 3D segmentation of the source as our atlas-segmentation. The results of an application


Figure 5: Corpus callosum segmentation on a pair of corresponding slices from distinct subjects.

of our algorithm are organized as before in Figure 5. Once again, as is evident, the visual quality of the segmentation and registration is very high.

Finally, we present a 3D real data experiment. In this experiment, the input is a pair of 3D brain scans, with the segmentation of the hippocampus in one of the two images (labeled the atlas image) obtained using the well known PCA on several training data sets. Each data set contains 19 slices of size 256x256. The goal was then to automatically find the hippocampus in the target image given the input. Figure 5 depicts the results obtained for this image pair. From left to right, the first image shows the given (atlas) hippocampus surface, followed by one cross-section of this surface overlaid on the source image slice; the third image shows the segmented hippocampus surface from the target image using our algorithm, and finally the cross-section of the segmented surface overlaid on the target image slice. To validate the accuracy of the segmentation result, we randomly sampled 120 points from the segmented surface and computed the average distance from these points to the ground truth hand-segmented hippocampal surface in the target image. The hand segmentation was performed by an expert neuroanatomist. The


Figure 5: Hippocampal segmentation using our algorithm on a pair of brain scans from distinct subjects (see text for details).

average and standard deviation of the error in the aforementioned distance for the estimated hippocampal shape are 0.8190 and 0.5121 (in voxels) respectively, which is very accurate.


We demonstrated their applications to the following medical image analysis problems. Our contributions to each of these topics are summarized in the following sections.

6.2.1 Non-rigid Image Registration

We introduced a new matching measure, the cross cumulative residual entropy (CCRE) [84], to measure the similarity between two images. The matching measure is defined based on a new information measure, namely cumulative residual entropy (CRE), which is defined using probability distributions instead of probability densities; therefore CCRE is valid for both discrete and continuous domains. Furthermore, CCRE also inherits the robustness property of the CRE measure. In [84], we presented results of rigid and affine registration under a variety of noise levels and showed significantly superior performance over MI-based methods.


The cross-CRE between the two images to be registered is maximized over the space of smooth and unknown non-rigid transformations, which is represented by tri-cubic B-splines placed on a regular grid. The analytic gradient of this matching measure is then derived to achieve efficient and accurate non-rigid registration. It turns out that the gradient of the CCRE has a formulation similar to the cost function, which greatly saves memory space in the optimization process. The matching criterion is optimized using a quasi-Newton method to recover the transformation parameters.

The key strengths of our proposed non-rigid registration scheme are demonstrated through the registration of synthetic as well as real data sets from multi-modality (MR T1 and T2 weighted, MR & CT) imaging sources. It is shown that our CCRE not only can accommodate images to be registered of varying contrast and brightness, but is also robust in the presence of noise. CCRE converges faster when compared with other information theory-based registration methods. Finally, we also showed that CCRE is well suited for situations where the source and the target images have FOVs with large non-overlapping regions (which is quite common in practice). Comparisons were made between CCRE and traditional MI [34, 51], which is defined using the Shannon entropy. All the experiments depicted significantly better performance of CCRE over the MI-based methods currently used in the literature.

Our future work will focus on extending the transformation model to one that permits spatial adaptation of the transformation's compliance, which will allow us to reduce the number of degrees of freedom in the overall transformation. Validation of non-rigid registration on real data, with the aid of segmentations and landmarks obtained manually from a group of trained anatomists, is the goal of our ongoing work.


sets, we proposed several divergence measures, the first of which is the Jensen-Shannon divergence. Since it lacks robustness, we developed a novel measure based on cumulative distribution functions that we dub the CDF-JS divergence. The measure parallels the well known Jensen-Shannon divergence (defined for probability density functions) but is more regular than the JS divergence, since its definition is based on CDFs as opposed to density functions. As a consequence, CDF-JS is more immune to noise and statistically more robust than the JS.

Our proposed methods do not require any knowledge of correspondence between the input point-sets, and therefore these point-sets need not have the same cardinality. One other salient feature of our proposed algorithms is that we get a probabilistic atlas as the byproduct of the registration process. Our algorithm can be especially useful for creating atlases of various shapes present in images, as well as for simultaneously (rigidly or non-rigidly) registering 3D range data sets without having to establish any correspondence.

Our future work will focus on using maximum likelihood estimation (MLE) to automatically determine the weighting coefficients in the divergence measures and the smoothing term; we are also attempting to extend our techniques to diffeomorphic point-set matching.




[1] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, pp. 379 and 623, 1948.

[2] W. F. Sharpe, Investments. London: Prentice Hall, 1985.

[3] D. Salomon, Data Compression. New York: Springer, 1998.

[4] S. Kullback, Information Theory and Statistics. New York: Wiley, 1959.

[5] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.

[6] G. Jumarie, Relative Information. New York: Springer, 1990.

[7] M. Rao, Y. Chen, B. C. Vemuri, and F. Wang, "Cumulative residual entropy, a new measure of information," IEEE Transactions on Information Theory, vol. 50, no. 6, pp. 1220, June 2004.

[8] M. Asadi and Y. Zohrevand, "On the dynamic cumulative residual entropy," Unpublished Manuscript, 2006.

[9] H. Chui, L. Win, R. Schultz, J. Duncan, and A. Rangarajan, "A unified non-rigid feature registration method for brain mapping," Medical Image Analysis, vol. 7, no. 2, pp. 112, 2003.

[10] N. Paragios, M. Rousson, and V. Ramesh, "Non-rigid registration using distance functions," Comput. Vis. Image Underst., vol. 89, no. 2-3, pp. 142, 2003.

[11] M. A. Audette, K. Siddiqi, F. P. Ferrie, and T. M. Peters, "An integrated range-sensing, segmentation and registration framework for the characterization of intra-surgical brain deformations in image-guided surgery," Comput. Vis. Image Underst., vol. 89, no. 2-3, pp. 226, 2003.

[12] A. Leow, P. M. Thompson, H. Protas, and S.-C. Huang, "Brain warping with implicit representations," in International Symposium on Biomedical Imaging, 2004, pp. 603.

[13] B. Jian and B. C. Vemuri, "A robust algorithm for point set registration using mixture of Gaussians," in IEEE International Conference on Computer Vision, 2005, pp. 1246.


[14] F.Wang,B.C.Vemuri,A.Rangarajan,I.M.Schmalfuss,andS.J.Eisenschenk,Simultaneousnonrigidregistrationofmultiplepointsetsandatlasconstruction,inEuropeanConferenceonComputerVision,2006,pp.551. [15] S.J.H.Guo,A.Rangarajan,Anewjointclusteringanddiffeomorphismestimationalgorithmfornon-rigidshapematching,inIEEEComputerVisionandPatternRecognition,2004,pp.16. [16] M.IraniandP.Anandan,RobustMulti-sensorImageAlignment,inInternationalConferenceonComputerVision,Bombay,India,1998,pp.959. [17] J.Liu,B.C.Vemuri,andJ.L.Marroquin,Localfrequencyrepresentationsforrobustmultimodalimageregistration,IEEETransactionsonMedicalImaging,vol.21,no.5,pp.462,2002. [18] M.MellorandM.Brady,Non-rigidmultimodalimageregistrationusinglocalphase,inMedicalImageComputingandComputer-AssistedIntervention,Saint-Malo,France,Sep2004,pp.789. [19] B.ZitovaandJ.Flusser,Imageregistrationmethods:asurvey.ImageVisionComput.,vol.21,no.11,pp.977,2003. [20] J.Ruiz-Alzola,C.-F.Westin,S.K.Wareld,A.Nabavi,andR.Kikinis,Nonrigidregistrationof3dscalarvectorandtensormedicaldata,inThirdInternationalConferenceonMedicalImageComputingandComputer-AssistedIntervention,A.M.DiGioiaandS.Delp,Eds.,Pittsburgh,October112000,pp.541. [21] L.Marroquin,B.Vemuri,S.Botello,F.Calderon,andA.Fernandez-Bouzas,Anaccurateandefcientbayesianmethodforautomaticsegmentationofbrainmri,inIEEETransactionsonMedicalImaging,2002,pp.934. [22] B.C.Vemuri,J.Ye,Y.Chen,andC.M.Leonard,Alevel-setbasedapproachtoimageregistration,inIEEEWorkshoponMathematicalMethodsinBiomedicalImageAnalysis,2000,pp.86. [23] P.Hellier,C.Barillot,E.Mmin,andP.Prez,Hierarchicalestimationofadensedeformationeldfor3drobustregistration,IEEETransactionsonMedicalImaging,vol.20,no.5,pp.388,May2001. [24] R.SzeliskiandJ.Coughlan,Spline-basedimageregistration,Int.J.Comput.Vision,vol.22,no.3,pp.199,March1997. [25] S.H.LaiandM.Fang,Robustandefcientimagealignmentwithspatially-varyingilluminationmodels,inIEEEConferenceonComputerVisionandPatternRecognition,1999,pp.II:167. 
[26] A.Guimond,A.Roche,N.Ayache,andJ.Menuier,Three-DimensionalMultimodalBrainWarpingUsingtheDemonsAlgorithmandAdaptiveIntensity


Corrections,IEEETransactionsonMedicalImaging,vol.20,no.1,pp.58,2001. [27] J.-P.Thirion,Imagematchingasadiffusionprocess:ananalogywithmaxwell'sdemons,MedicalImageAnalysis,vol.2,no.3,pp.243,1998. [28] A.Cuzol,P.Hellier,andE.Memin,Anovelparametricmethodfornon-rigidimageregistration,inProc.InformationProcessinginMedicalImaging(IPMI'05),ser.LNCS,G.ChristensenandM.Sonka,Eds.,no.3565,GlenwoodSpringes,Colorado,USA,July2005,pp.456. [29] A.W.TogaandP.M.Thompson,Theroleofimageregistrationinbrainmapping,ImageVisionComput.,vol.19,no.1-2,pp.3,2001. [30] E.D'Agostino,F.Maes,D.Vandermeulen,andP.Suetens,Non-rigidatlas-to-imageregistrationbyminimizationofclass-conditionalimageentropy.inMedicalImageComputingandComputer-AssistedIntervention,2004,pp.745. [31] P.A.ViolaandW.M.Wells,Alignmentbymaximizationofmutualinformation,inIEEEInternationalConferenceonComputerVision,MIT,Cambridge,1995. [32] A.Collignon,F.Maes,D.Delaere,D.Vandermeulen,P.Suetens,andG.Marchal,Automatedmultimodalityimageregistrationbasedoninformationtheory,Proc.InformationProcessinginMedicalImaging,pp.263,1995. [33] C.Studholme,D.Hill,andD.J.Hawkes,Automated3DregistrationofMRandCTimagesinthehead,MedicalImageAnalysis,vol.1,no.2,pp.163,1996. [34] D.Mattes,D.R.Haynor,H.Vesselle,T.K.Lewellen,andW.Eubank,Pet-ctimageregistrationinthechestusingfree-formdeformations.IEEETransactionsonMedicalImaging,vol.22,no.1,pp.120,2003. [35] D.Rueckert,A.F.Frangi,andJ.A.Schnabel,Automaticconstructionof3dstatisticaldeformationmodelsofthebrainusingnon-rigidregistration.IEEETransactionsonMedicalImaging,vol.22,no.8,pp.1014,2003. [36] G.Hermosillo,C.Chefd'hotel,andO.Faugeras,Variationalmethodsformultimodalimagematching,Int.J.Comput.Vision,vol.50,no.3,pp.329,2002. [37] D.Rueckert,L.I.Sonoda,C.Hayes,D.L.G.Hill,M.O.Leach,andD.J.Hawkes,Nonrigidregistrationusingfree-formdeformations:Applicationtobreastmrimages,IEEETransactionsonMedicalImaging,vol.18,no.8,pp.712,August1999. 
[38] M.E.LeventonandW.E.L.Grimson,Multimodalvolumeregistrationusingjointintensitydistributions,inMedicalImageComputingandComputer-AssistedIntervention(MICCAI),Cambridge,MA,1998,pp.1057.


[39] T.Gaens,F.Maes,D.Vandermeulen,andP.Suetens,Non-rigidmultimodalimageregistrationusingmutualinformation,inProc.ConferenceonMedicalImageComputingandCompter-AssistedIntervention(MICCAI),1998,pp.1099. [40] D.Loeckx,F.Maes,D.Vandermeulen,andP.Suetens,Nonrigidimageregistrationusingfree-formdeformationswithalocalrigidityconstraint.inMedicalImageComputingandComputer-AssistedIntervention,2004,pp.639. [41] G.K.Rohde,A.Aldroubi,andB.M.Dawant,Theadaptivebasesalgorithmforintensitybasednonrigidimageregistration.IEEETransactionsonMedicalImaging,vol.22,no.11,pp.1470,2003. [42] V.Duay,P.-F.D'Haese,R.Li,andB.M.Dawant,Non-rigidregistrationalgorithmwithspatiallyvaryingstiffnessproperties.inInternationalSymposiumonBiomedicalImaging,2004,pp.408. [43] C.Guetter,C.Xu,F.Sauer,andJ.Hornegger,Learningbasednon-rigidmulti-modalimageregistrationusingkullback-leiblerdivergence.inMedicalImageComputingandComputer-AssistedIntervention,2005,pp.255. [44] E.D'Agostino,F.Maes,D.Vandermeulen,andP.Suetens,Aninformationtheoreticapproachfornon-rigidimageregistrationusingvoxelclassprobabilities,MedicalImageAnalysis,vol.10,no.3,pp.413,2006. [45] C.Davatzikos,Spatialtransformationandregistrationofbrainimagesusingelasticallydeformablemodels,Comput.Vis.ImageUnderst.,vol.66,no.2,pp.207,1997. [46] J.C.Gee,M.Reivich,andR.Bajcsy,Elasticallydeforming3datlastomatchanatomicalbrainimages,J.Comput.Assist.Tomogr.,vol.17,no.2,pp.225,1993. [47] M.Bro-NielsenandC.Gramkow,Fastuidregistrationofmedicalimages,inProc.ofthe4thInternationalConferenceonVisualizationinBiomedicalComputing.London,UK:Springer-Verlag,1996,pp.267. [48] G.E.Christensen,R.D.Rabbitt,andM.I.Miller,Deformabletemplatesusinglargedeformationkinematics,IEEETransactionsOnImageProcessing,vol.5,no.10,pp.1435,October1996. [49] X.Geng,D.Kumar,andG.E.Christensen,Transitiveinverse-consistentmanifoldregistration.inProc.InformationProcessinginMedicalImaging,2005,pp.468. 
[50] A.Trouve,Diffeomorphismsgroupsandpatternmatchinginimageanalysis,Int.J.Comput.Vision,vol.28,no.3,pp.213,1998. [51] D.R.ForseyandR.H.Bartels,Hierarchicalb-splinerenement,ComputerGraphics,vol.22,no.4,pp.205,1988.


[52] C.Cocosco,V.Kollokian,R.-S.Kwan,andA.Evans,Brainweb:onlineinterfacetoa3-dmrisimulatedbraindatabase,1997,lastaccessed:July2005.[Online].Available: [53] P.ThevenazandM.Unser,Optimizationofmutualinformationformultiresolutionimageregistration,IEEETransactionsonImageProcessing,vol.9,no.12,pp.2083,December2000. [54] T.ChanandL.Vesse,Anactivecontourmodelwithoutedges,inIntl.Conf.onScale-spaceTheoriesinComputerVision,1999,pp.266. [55] N.Duta,A.K.Jain,andM.-P.Dubuisson-Jolly,Automaticconstructionof2dshapemodels,IEEETransactionsPatternAnal.Mach.Intell.,vol.23,no.5,pp.433,2001. [56] E.Klassen,A.Srivastava,W.Mio,andS.H.Joshi,Analysisofplanarshapesusinggeodesicpathsonshapespaces.IEEETransactionsPatternAnal.Mach.Intell.,vol.26,no.3,pp.372,2003. [57] T.B.Sebastian,P.N.Klein,B.B.Kimia,andJ.J.Crisco,Constructing2dcurveatlases,inIEEEWorkshoponMathematicalMethodsinBiomedicalImageAnalysis,Washington,DC,USA,2000,pp.70. [58] H.Tagare,Shape-basednonrigidcorrespondencewithapplicationtoheartmotionanalysis.IEEETransactionsonMedicalImaging,vol.18,no.7,pp.570,1999. [59] F.L.Bookstein,Principalwarps:Thin-platesplinesandthedecompositionofdeformations.IEEETransactionsPatternAnal.Mach.Intell.,vol.11,no.6,pp.567,1989. [60] H.Chui,A.Rangarajan,J.Zhang,andC.M.Leonard,Unsupervisedlearningofanatlasfromunlabeledpoint-sets.IEEETransactionsPatternAnal.Mach.Intell.,vol.26,no.2,pp.160,2004. [61] S.Belongie,J.Malik,andJ.Puzicha,Shapematchingandobjectrecognitionusingshapecontexts,IEEETransactionsPatternAnal.Mach.Intell.,vol.24,no.4,pp.509,2002. [62] H.ChuiandA.Rangarajan,Anewalgorithmfornon-rigidpointmatching.inIEEEComputerVisionandPatternRecognition,2000,pp.2044. [63] H.Guo,A.Rangarajan,S.Joshi,andL.Younes,Non-rigidregistrationofshapesviadiffeomorphicpointmatching.inInternationalSymposiumonBiomedicalImaging,2004,pp.924. [64] T.F.Cootes,C.J.Taylor,D.H.Cooper,andJ.Graham,Activeshapemodels:theirtrainingandapplication,Comput.Vis.ImageUnderst.,vol.61,no.1,pp.38,1995.


[65] Y.WangandL.H.Staib,Boundaryndingwithpriorshapeandsmoothnessmodels,IEEETransactionsonPatternAnalysisandMachineIntelligence,vol.22,no.7,pp.738,2000. [66] A.Hill,C.J.Taylor,andA.D.Brett,Aframeworkforautomaticlandmarkidenticationusinganewmethodofnonrigidcorrespondence.IEEETransactionsPatternAnal.Mach.Intell.,vol.22,no.3,pp.241,2000. [67] Y.TsinandT.Kanade,Acorrelation-basedapproachtorobustpointsetregistration.inEuropeanConferenceonComputerVision,2004,pp.558. [68] J.Glaunes,A.Trouve,andL.Younes,Diffeomorphicmatchingofdistributions:Anewapproachforunlabelledpoint-setsandsub-manifoldsmatching.inIEEEComputerVisionandPatternRecognition,2004,pp.712. [69] J.Lin,Divergencemeasuresbasedontheshannonentropy,IEEETransactionsInformationTheory,vol.37,pp.145,1991. [70] A.Hero,O.M.B.Ma,andJ.Gorman,Applicationsofentropicspanninggraphs,IEEETransactionsSignalProcessing,vol.19,pp.85,2002. [71] Y.He,A.Ben-Hamza,andH.Krim,Ageneralizeddivergencemeasureforrobustimageregistration,IEEETransactionsSignalProcessing,vol.51,pp.1211,2003. [72] D.M.EndresandJ.E.Schindelin,Anewmetricforprobabilitydistributions,IEEETransactionsInformationTheory,vol.49,pp.1858,2003. [73] H.ChuiandA.Rangarajan,Anewpointmatchingalgorithmfornon-rigidregistration,ComputerVisionandImageUnderstanding(CVIU),vol.89,pp.114,2003. [74] G.McLachlanandK.Basford,MixtureModel:InferenceandApplicationstoClustering.NewYork:MarcelDekker,1988. [75] A.L.Yuille,P.Stolorz,andJ.Utans,Statisticalphysics,mixturesofdistributions,andtheemalgorithm,NeuralComput.,vol.6,no.2,pp.334,1994. [76] R.Bansal,L.Staib,Z.Chen,A.Rangarajan,J.Knisely,R.Nath,andJ.Duncan.,Entropy-based,multiple-portal-to-3dctregistrationforprostateradiotherapyusingiterativelyestimatedsegmentation,inMedicalImageComputingandComputer-AssistedIntervention,1999,pp.567. [77] A.Yezzi,L.Zollei,andT.Kapur,Avariationalframeworkforjointsegmentationandregistration,inIEEEWorkshoponMathematicalMethodsinBiomedicalImageAnalysis,2001,pp.388. 
[78] P.WyattandJ.Noble,Mrf-mapjointsegmentationandregistration,inMedicalImageComputingandComputer-AssistedIntervention,2002,pp.580.


[79] N.Paragios,M.Rousson,andV.Ramesh,Knowledge-basedregistration&segmentationoftheleftventricle:Alevelsetapproach.inWACV,2002,pp.37. [80] B.Fischl,D.Salat,E.Buena,,Wholebrainsementation:Automatedlabelingoftheneuroanatomicalstructuresinthehumanbrain,inNeuron,vol.33,2002,pp.341. [81] S.SoattoandA.J.Yezzi,Deformotion:Deformingmotion,shapeaverageandthejointregistrationandsegmentationofimages,inEuropeanConferenceonComputerVision,2002,pp.32. [82] B.C.Vemuri,Y.Chen,andZ.Wang,Registrationassistedimagesmoothingandsegmentation,inEuropeanConferenceonComputerVision,2002,pp.546. [83] T.ZhangandD.Freedman,Trackingobjectsusingdensitymatchingandshapepriors.inIEEEInternationalConferenceonComputerVision,2003,pp.1056. [84] F.Wang,B.C.Vemuri,M.Rao,andY.Chen,Anew&robustinformationtheoreticmeasureanditsapplicationtoimagealignment.inProc.InformationProcessinginMedicalImaging,2003,pp.388.


Fei Wang was born in YanCheng, JiangSu, P.R. China. He received his Bachelor of Science degree from the University of Science and Technology of China, P.R. China, in 2001. He earned his Master of Science and Doctor of Philosophy degrees from the University of Florida in December 2002 and August 2006, respectively. His research interests include medical imaging, computer vision, pattern recognition, computer graphics and shape modeling.