Cumulative Residual Entropy, A New Measure of Information & its Application to Image Alignment *

B. C. Vemuri, F. E. Wang1, M. Rao2 and Y. Chen2

1Dept. of CISE,

University of Florida,

Abstract

In this paper we use the cumulative distribution of a random variable to define the information content in it and use it to develop a novel measure of information that parallels Shannon entropy, which we dub cumulative residual entropy (CRE). The salient features of CRE are: (1) it is more general than the Shannon entropy in that its definition is valid in both the continuous and discrete domains, (2) it possesses more general mathematical properties than the Shannon entropy, and (3) it can be easily computed from sample data, and these computations asymptotically converge to the true values. Based on CRE, we define the cross-CRE (CCRE) between two random variables and apply it to solve the uni- and multi-modal image alignment problem for parameterized (rigid, affine and projective) transformations. The key strengths of the CCRE over the now popular mutual information method (based on Shannon's entropy) are that the former has significantly larger noise immunity and a much larger convergence range over the field of parameterized transformations. We demonstrate these strengths via experiments on synthesized and real image data.

1. Introduction

The concept of entropy is central to the field of information theory and was originally introduced by Shannon in his seminal paper [16], in the context of communication theory. Since then, this concept and variants thereof have been extensively utilized in numerous applications of science and engineering. To date, one of the applications that has benefited most widely is data compression and transmission.

Shannon's definition of entropy originated in the discrete domain, and its continuous counterpart, called the differential entropy, is not a direct consequence of the definition in the discrete case. It is well known that the Shannon definition of entropy in the discrete case does not converge to the continuous definition [7]. Moreover, the definition in the discrete case, which states that the entropy $H(X)$ of a random variable $X$ is $H(X) = -\sum_x p(x)\log p(x)$, is based on the density $p(x)$ of the random variable, which in general may or may not exist [7]. Several alternative measures have been defined in the literature [13, 1, 8, 9] to overcome some of these drawbacks. In this regard, all of the methods either simply replace the summation with an integral or use the directed divergence from the uniform distribution. The use of directed divergence, i.e., comparing the uncertainty in a random variable to that in one which maximizes the Shannon entropy, namely the uniformly distributed random variable, seems to overcome the difficulty of leaping from the entropy definition in the discrete case to that of the continuous case. For more details, we refer the reader to [9]. However, this approach is not a direct solution to the problem, i.e., it uses a comparative/relative measure. In this paper, we present a new measure of information in a random variable that overcomes the aforementioned drawbacks of the Shannon entropy and has very general properties as a consequence. This new measure is a fundamental departure from all the existing measures of entropy in that it is based on the probability distribution of a random variable rather than its density function. We also present some interesting properties of this measure and then state some theorems which are proved elsewhere [5]. Following this, we define a new matching criterion based on our information theoretic measure for application to the image alignment problem and compare it to methods that use the Shannon entropy in defining a match measure.

*This research was in part funded by the NIH grant NS42075. Manuscript submitted to ICCV'03. Also available as a technical report, Dept. of CISE TR03-005.

2Dept. of Mathematics,

Gainesville, FL 32611

1.1 Previous Work on Image Alignment

In the context of the image alignment problem, information theoretic measures for comparing image pairs differing by an unknown coordinate transformation have been popular since the seminal works of Viola & Wells [20] and Collignon et al. [6]. There are numerous methods in the literature for solving the image alignment problem. Broadly speaking, these can be categorized as feature-based and direct methods. The former typically compute some distinguishing features and define a cost function whose optimization over the space of a known class of coordinate transforms leads to an optimal coordinate transformation. The latter set of methods involves defining a matching criterion directly on the intensity image pairs. We briefly review the direct methods and refer the reader to a recent survey [12] for others.

Sum of squared differences (SSD) has been a popular technique for image alignment [2, 18, 19, 10]. Variants of the original formulation have been able to cope with deviations from the image brightness constancy assumption [10]. Other matching criteria make use of statistical information in the image, e.g., the correlation ratio [14] and maximum likelihood criteria based on data sets that are pre-registered [11]. Image alignment is achieved by optimizing these criteria over a set of parameterized coordinate transformations. The statistical techniques can cope with image pairs that are not necessarily from the same imaging modality.

Another direct approach is based on the concept of maximizing mutual information (MI), defined using the Shannon entropy, reported in Viola and Wells [20], Collignon et al. [6] and Studholme et al. [17]. The MI between the source and target images that are to be aligned is maximized using a stochastic analog of the gradient descent method in [20], and using other optimization methods such as Powell's method in [6] and a multiresolution scheme in [17]. The registration experiments reported in these works are quite impressive for the case of rigid motion. In [17], Studholme et al. presented a normalized MI scheme for matching multi-modal image pairs misaligned by a rigid motion. Normalized MI was shown to be able to cope with image pairs not having the same field of view (FOV), an important practical problem. Most of the effort in the recent past has been spent on coping with non-rigid deformations between the source and target multi-modal data sets [15, 4].

2 Cumulative Residual Entropy: A new measure of information

In this section we define our new information theoretic measure and derive some properties/theorems. We do not delve into the proofs but refer the reader to a more comprehensive, as yet unpublished, mathematical technical report [5].

The key idea in our definition is to use the cumulative

distribution in place of the density function in Shannon's

definition of entropy. The distribution function is more reg-

ular because it is defined in an integral form unlike the den-

sity function, which is computed as the derivative of the

distribution. Moreover, in practice what is of interest and/or

measurable is the distribution function. For example, if the

random variable describes the life span of a light bulb, then

the event of interest is not whether the life span equals t,

but whether it exceeds t. Our definition also preserves the

well established principle that the logarithm of the proba-

bility of an event should represent the information content

in the event. We dub this measure cumulative residual

entropy, henceforth abbreviated CRE.

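Definition: Let $X$ be a random vector in $\mathbb{R}^N$. We define the CRE of $X$ by

$$\mathcal{E}(X) = -\int_{\mathbb{R}_+^N} P(|X| > \lambda)\,\log P(|X| > \lambda)\,d\lambda \tag{1}$$

where $X = (X_1, X_2, \ldots, X_N)$, $\lambda = (\lambda_1, \ldots, \lambda_N)$, $|X| > \lambda$ means $|X_i| > \lambda_i$ for each $i$, and $\mathbb{R}_+^N = \{x \in \mathbb{R}^N : x_i \ge 0\}$. CRE is easily computed for various distributions (in some cases numerically). For example, in the case of the uniform distribution

$$p(x) = \begin{cases} 1/a, & 0 \le x \le a \\ 0, & \text{otherwise} \end{cases}$$

the CRE computes to

$$\mathcal{E}(X) = -\int_0^a P(|X| > x)\,\log P(|X| > x)\,dx = -\int_0^a \left(1 - \frac{x}{a}\right)\log\left(1 - \frac{x}{a}\right)dx = \frac{a}{4}.$$

In the case of the exponential distribution with mean $1/\lambda$ and density function $p(x) = \lambda e^{-\lambda x}$, the CRE computes to

$$\mathcal{E}(X) = -\int_0^\infty e^{-\lambda t}\,\log e^{-\lambda t}\,dt = \int_0^\infty \lambda t\,e^{-\lambda t}\,dt = \frac{1}{\lambda}.$$

For the case of the Gaussian distribution, the expression for CRE involves the error function erf.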
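The two closed forms above (a/4 for the uniform distribution on [0, a] and 1/λ for the exponential distribution with rate λ) can be checked numerically. The following is a minimal sketch, not part of the original paper: the helper name `cre_1d`, the midpoint rule, and the values of `a` and `lam` are illustrative assumptions.

```python
import math

def cre_1d(survival, upper, n=100_000):
    """Midpoint-rule approximation of -integral_0^upper G(x) log G(x) dx,
    where G = survival is the function x -> P(X > x)."""
    h = upper / n
    total = 0.0
    for i in range(n):
        g = survival((i + 0.5) * h)
        if 0.0 < g < 1.0:              # G log G tends to 0 at G = 0 and G = 1
            total -= g * math.log(g) * h
    return total

a = 2.0      # width of the uniform distribution (illustrative value)
lam = 4.0    # rate of the exponential distribution (illustrative value)

cre_uniform = cre_1d(lambda x: 1.0 - x / a, upper=a)
# The exponential tail beyond 40/lam is negligible, so the integral is truncated there.
cre_exponential = cre_1d(lambda t: math.exp(-lam * t), upper=40.0 / lam)

print(cre_uniform, a / 4)            # numerical value next to the closed form a/4
print(cre_exponential, 1.0 / lam)    # numerical value next to the closed form 1/lam
```

Only the survival function P(X > x) enters the computation; no density estimate is needed, which is the point the definition is making.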

Proposition 1 $\mathcal{E}(X) < \infty$ if for all $i$ and some $p > N$, $E[|X_i|^p] < \infty$, where $E$ is the expectation operator.

Proposition 2 If the $X_i$ are independent, then

$$\mathcal{E}(X) = \sum_i \Bigl(\prod_{j \ne i} E(|X_j|)\Bigr)\,\mathcal{E}(X_i)$$
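Proposition 2 says that, for independent components, the CRE of the vector equals the sum over i of CRE(X_i) times the product of the means E|X_j| of the other components. As a hedged sanity check (not from the paper), the sketch below evaluates the N = 2 integral of the CRE definition for two independent exponential variables and compares it with that formula. The function names, the rates and the grid size are illustrative assumptions; the closed form CRE = 1/rate for an exponential variable is the one quoted in the text.

```python
import math

def cre_independent_pair(rate1, rate2, n=300):
    """Midpoint-rule value of the N = 2 CRE integral for independent exponentials.

    Independence makes the joint survival function factorise:
    P(X1 > a, X2 > b) = exp(-rate1 * a) * exp(-rate2 * b).
    """
    upper1, upper2 = 30.0 / rate1, 30.0 / rate2   # tails beyond this are negligible
    h1, h2 = upper1 / n, upper2 / n
    total = 0.0
    for i in range(n):
        a = (i + 0.5) * h1
        for j in range(n):
            b = (j + 0.5) * h2
            log_g = -(rate1 * a + rate2 * b)       # log of the joint survival function
            total -= math.exp(log_g) * log_g * h1 * h2
    return total

def cre_from_proposition_2(rate1, rate2):
    """Sum_i (prod_{j != i} E|X_j|) * CRE(X_i), with mean = CRE = 1/rate."""
    mean1, mean2 = 1.0 / rate1, 1.0 / rate2
    cre1, cre2 = 1.0 / rate1, 1.0 / rate2
    return mean2 * cre1 + mean1 * cre2

print(cre_independent_pair(1.0, 2.0), cre_from_proposition_2(1.0, 2.0))
```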

Proposition 3 (Weak Convergence). Let the random vectors $X_k$ converge in distribution to the random vector $X$; by this we mean

$$\lim_{k \to \infty} E[\varphi(X_k)] = E[\varphi(X)] \tag{5}$$

for all bounded continuous functions $\varphi$ on $\mathbb{R}^N$. If all the $X_k$ are bounded in $L^p$ for some $p > N$, then

$$\lim_{k \to \infty} \mathcal{E}(X_k) = \mathcal{E}(X) \tag{6}$$
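One practical reading of Proposition 3 is that the CRE of the empirical distribution of a growing sample approaches the CRE of the underlying distribution. The sketch below illustrates this for scalar, non-negative samples; it is an illustration under stated assumptions rather than the authors' code. The estimator `empirical_cre`, the uniform test distribution (true CRE 1/4) and the sample sizes are all choices made here.

```python
import math
import random

def empirical_cre(samples):
    """CRE of the empirical distribution of non-negative scalar samples.

    Between consecutive order statistics x_(i) and x_(i+1) the empirical
    survival function is constant and equal to 1 - i/n.
    """
    xs = sorted(samples)
    n = len(xs)
    total = 0.0
    for i in range(1, n):
        g = 1.0 - i / n
        total -= (xs[i] - xs[i - 1]) * g * math.log(g)
    return total

random.seed(0)                       # fixed seed so the run is repeatable
for n in (100, 1_000, 10_000):
    draws = [random.random() for _ in range(n)]   # uniform on [0, 1), true CRE = 1/4
    print(n, empirical_cre(draws))
```

The printed estimates should settle near 0.25 as n grows, with no binning or density estimation involved.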

Definition: Given random vectors $X$ and $Y \in \mathbb{R}^N$, we define the conditional CRE $\mathcal{E}(X|Y)$ by

$$\mathcal{E}(X|Y) = -\int_{\mathbb{R}_+^N} P(|X| > x \mid Y)\,\log P(|X| > x \mid Y)\,dx \tag{7}$$

Proposition 4 For any $X$ and $Y$,

$$E[\mathcal{E}(X|Y)] \le \mathcal{E}(X). \tag{8}$$

Equality holds iff $X$ is independent of $Y$. This is analogous to the Shannon entropy case. Essentially, it states that conditioning reduces CRE.

Definition: The continuous version of the Shannon entropy, called the differential entropy [7], $\mathcal{H}(X)$ of a random variable $X$ with density $f$ is defined as

$$\mathcal{H}(X) = -E[\log f] = -\int f(x)\,\log f(x)\,dx$$

The following proposition describes the relationship between CRE and the differential entropy: the CRE is bounded below by a constant multiple of the exponential of the differential entropy. This in turn influences the relationship between quantities derived from $\mathcal{E}(X)$ and $\mathcal{H}(X)$, such as cross-CRE (CCRE) and mutual information (MI) respectively. CCRE and MI will be used subsequently in solving the image alignment problem.

Proposition 5 Let $X \ge 0$ have density $f$. Then

$$\mathcal{E}(X) \ge C \exp(\mathcal{H}(X)), \qquad C = \exp\left(\int_0^1 \log(x|\log x|)\,dx\right). \tag{9}$$

Proof: Let $G(x) = P[X > x] = \int_x^\infty f(u)\,du$. Using the log-sum inequality [7] we have

$$\int_0^\infty f(x)\,\log\frac{f(x)}{G(x)|\log G(x)|}\,dx \;\ge\; \log\frac{1}{\int_0^\infty G(x)|\log G(x)|\,dx} \;=\; \log\frac{1}{\mathcal{E}(X)}. \tag{10}$$

The left hand side in (10) equals

$$-\mathcal{H}(X) - \int_0^\infty f(x)\,\log(G(x)|\log G(x)|)\,dx$$

so that

$$\mathcal{H}(X) + \int_0^\infty f(x)\,\log(G(x)|\log G(x)|)\,dx \;\le\; \log\mathcal{E}(X).$$

Finally, a change of variable gives

$$\int_0^\infty f(x)\,\log(G(x)|\log G(x)|)\,dx = \int_0^1 \log(x|\log x|)\,dx.$$

Using the above and exponentiating both sides of the preceding inequality, we get (9). □

Definition: The mutual information $I(X, Y)$ of two continuous random variables $X$ and $Y$ using Shannon entropy is defined as

$$I(X, Y) = \mathcal{H}(X) - E[\mathcal{H}(X|Y)] \tag{11}$$

This measure, for the discrete random variable case, is now widely employed in assessing the misalignment between a pair of uni-modality or multi-modality image data sets.

We now define a quantity called cross-CRE (CCRE), given by

$$C(X, Y) = \mathcal{E}(X) - E[\mathcal{E}(Y|X)] \tag{12}$$

Note that $I(X, Y)$ is symmetric but $C(X, Y)$ need not be. We define the symmetrized version of $C$ as

$$\bar{C}(X, Y) = \frac{1}{2}\Bigl\{\mathcal{E}(X) - E[\mathcal{E}(Y|X)] + \mathcal{E}(Y) - E[\mathcal{E}(X|Y)]\Bigr\} \tag{13}$$

From Proposition 4, we know that $\bar{C}$ is non-negative. In our experiments, we found that the non-symmetric CCRE given by $C$ was sufficient to yield the desired results. We empirically show the superior performance of CCRE over MI and normalized MI under low signal to noise ratio (SNR) conditions and also depict its larger capture range with regard to convergence to the optimal parameterized transformation.

2.1 Estimating Empirical CRE

In order to compute the CRE of an image, we use the histogram of the image to estimate $P(X > \lambda)$, where $X$ corresponds to the image intensity, which is considered as a random variable. Note that as a consequence of Proposition 3, empirical CRE computation based on the samples will converge in the limit to the true value. This is not the case for the Shannon entropy computed using histograms to estimate the probability density functions, as is usually done in the current literature. In the case of CRE, we have

$$\mathcal{E}(X) = -\int_0^\infty P(X > \lambda)\,\log P(X > \lambda)\,d\lambda = -\sum_{\lambda} P(X > \lambda)\,\log P(X > \lambda) \tag{14}$$

[Figure 1 plots: panels (a), (b) and (c), each plotted against rotation angle in degrees.]

Figure 1: Comparison of the magnitude of C and I over a range of rotations, for the pair of images shown in Figure 2. (a) Traditional MI, computed as H(f)+H(r)-H(f,r); (b) normalized MI, computed as (H(f)+H(r))/H(f,r); (c) CCRE.
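The two Shannon-based quantities named in the Figure 1 caption, H(f)+H(r)-H(f,r) and (H(f)+H(r))/H(f,r), can be computed from the marginal and joint intensity histograms of the two images. The following is a minimal sketch assuming images are given as flat, equal-length lists of integer intensities; the function names are illustrative and not taken from the paper.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Histogram-based Shannon entropy (natural log) of a list of hashable symbols."""
    n = len(symbols)
    return -sum(c / n * math.log(c / n) for c in Counter(symbols).values())

def mi_and_nmi(f, r):
    """Return (H(f)+H(r)-H(f,r), (H(f)+H(r))/H(f,r)) for two equal-length pixel lists.

    The joint entropy H(f,r) is taken over co-located intensity pairs. It is zero
    when both images are constant, in which case the ratio is undefined.
    """
    h_f = shannon_entropy(f)
    h_r = shannon_entropy(r)
    h_fr = shannon_entropy(list(zip(f, r)))
    return h_f + h_r - h_fr, (h_f + h_r) / h_fr

# Perfectly dependent intensities (inverted contrast) versus independent ones.
print(mi_and_nmi([0, 1, 2, 3], [3, 2, 1, 0]))
print(mi_and_nmi([0, 0, 1, 1], [0, 1, 0, 1]))
```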

Hence, using a histogram to compute the CRE is well defined and justified theoretically.

Note that estimating $\mathcal{E}(X|Y)$ is done using the joint histogram and then marginalizing it with respect to the conditioning variable.
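The histogram-based estimates described in this subsection can be sketched as follows: the CRE is the sum in Eq. (14) over integer intensity levels, and the expected conditional CRE is obtained from the joint histogram by computing the CRE of each column (one fixed value of the conditioning variable) and weighting it by that value's probability. This is a hedged illustration; the function names, the unit bin width and the toy data are assumptions made here.

```python
import math

def cre_from_counts(counts):
    """Eq. (14)-style sum: -sum_lam P(X > lam) log P(X > lam) over integer bins."""
    n = sum(counts)
    if n == 0:
        return 0.0
    total, above = 0.0, n
    for c in counts:
        above -= c                     # samples strictly above the current bin
        p = above / n
        if 0.0 < p < 1.0:
            total -= p * math.log(p)
    return total

def joint_histogram(x, y, levels):
    """joint[i][j] = number of positions where X = i and Y = j."""
    joint = [[0] * levels for _ in range(levels)]
    for xi, yi in zip(x, y):
        joint[xi][yi] += 1
    return joint

def expected_conditional_cre(joint):
    """E[CRE(X | Y)]: CRE of each column (fixed Y), weighted by P(Y = y)."""
    n = sum(map(sum, joint))
    total = 0.0
    for j in range(len(joint[0])):
        column = [row[j] for row in joint]
        weight = sum(column) / n
        if weight > 0.0:
            total += weight * cre_from_counts(column)
    return total

x = [0, 1, 2, 3, 0, 1, 2, 3]           # toy intensities, 4 levels
y = [0, 0, 1, 1, 2, 2, 3, 3]
joint = joint_histogram(x, y, levels=4)
marginal_x = [sum(row) for row in joint]
print(cre_from_counts(marginal_x), expected_conditional_cre(joint))
```

In this toy example the conditional value is smaller than the marginal one, in line with Proposition 4.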

3 The Alignment Problem

The alignment problem is defined as follows: given a pair of images $f(x, y)$ and $r(x', y')$, where $(x', y')^T = T\,(x, y)^T$ and $T$ is the matrix corresponding to the unknown parameterized transformation to be determined, define a match metric $M(f(x, y), r(x', y'))$ and maximize/minimize $M$ over all $T$. In our case, the matching criterion $M$ is defined by CCRE. The classes of transformations that we consider are rigid motions, affine motions and projective transformations.
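To make the formulation concrete, here is a deliberately small one-dimensional sketch: the "images" are short integer signals, the transformation class is circular shifts, and the match metric is the non-symmetric CCRE, C(X, Y) = CRE(X) - E[CRE(Y|X)], with X the transformed source and Y the target. Everything here is an illustrative assumption: the signals, the inverted-contrast "second modality", and in particular the exhaustive search, which stands in for the SQP optimizer used in the paper's experiments.

```python
import math
from collections import Counter, defaultdict

def cre(values, levels):
    """Histogram-based CRE of integer values in [0, levels)."""
    n = len(values)
    hist = Counter(values)
    total, above = 0.0, n
    for lam in range(levels):
        above -= hist.get(lam, 0)      # samples strictly greater than lam
        p = above / n
        if 0.0 < p < 1.0:
            total -= p * math.log(p)
    return total

def ccre(x, y, levels):
    """C(X, Y) = CRE(X) - E[CRE(Y | X)], the non-symmetric form given in the text."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)          # target values seen at each source intensity
    n = len(x)
    expected_conditional = sum(len(g) / n * cre(g, levels) for g in groups.values())
    return cre(x, levels) - expected_conditional

def circular_shift(signal, s):
    """Return the signal moved right by s positions, wrapping around."""
    s %= len(signal)
    return signal[-s:] + signal[:-s] if s else list(signal)

def best_shift(source, target, levels):
    """Exhaustive search over the transformation class: the shift maximizing CCRE."""
    return max(range(len(source)),
               key=lambda s: ccre(circular_shift(source, s), target, levels))

source = [0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 0]
target = [7 - v for v in circular_shift(source, 5)]   # shifted copy with inverted contrast
print(best_shift(source, target, levels=8))
```

At the correct shift the target is a deterministic function of the transformed source, so the conditional term vanishes and the metric attains its largest possible value, CRE(X), even though the two signals have different intensity mappings.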

To show the marked contrast in the range of values taken by C and I, we compute both for a given pair of registered images over a range of rigid motions applied to one image of the pair.

Note the significant difference in the range of values of C and I shown in Figure 1. As evident from the experiments described later, this characteristic of CCRE proves very useful in demonstrating a larger range of convergence and greater noise immunity, for a given optimization procedure, than the traditional MI defined using the Shannon entropy. This, we believe, is a significant strength of our approach to image alignment using CCRE.

Figure 2: Aligned (a) T1 weighted MR and (b) T2 weighted MR images used in the computation of CCRE, MI and NMI over the range of rotations.

4 Experiment Results

In this section we demonstrate alignment by maximization of CCRE for a variety of transformations. The performance of the CCRE was evaluated for each set. The first experiment (with 30 image pairs) was done for synthetic motions, where we compare the estimated alignment with the ground-truth alignment. The second experiment (two pairs of data sets) was done on real image pairs. In all of the following experiments, bilinear interpolation was used when needed for non-integral indexing into the image.
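Bilinear interpolation, mentioned above for non-integral indexing, can be sketched as follows. This is a generic textbook version, not the authors' implementation; the row-major list-of-lists image layout, the (x = column, y = row) convention and the value returned outside the image are assumptions made for the example.

```python
import math

def bilinear_sample(image, x, y, outside=0.0):
    """Intensity at real-valued (x, y); image is a list of rows, x = column, y = row."""
    height, width = len(image), len(image[0])
    if not (0.0 <= x <= width - 1 and 0.0 <= y <= height - 1):
        return outside                      # assumed policy for samples off the grid
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0                 # fractional offsets inside the pixel cell
    top = (1.0 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1.0 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1.0 - fy) * top + fy * bottom

image = [[0.0, 10.0],
         [20.0, 30.0]]
print(bilinear_sample(image, 0.5, 0.5))     # centre of the four pixels
```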

4.1 Synthetic Motion Experiments

In this section, we demonstrate the robustness of CCRE, thereby justifying its use over MI and NMI (normalized MI) in the alignment problem. This is demonstrated via experiments depicting superior performance in matching under noisy inputs and a larger capture range in the estimation of the motion parameters.

Figure 3: Registration example for rigid motion using our algorithm. Leftmost: the source image. Rightmost: the target image, obtained by applying a synthetic rigid motion to the source image. Both images are of size 240x320. Middle: overlay of the target edge map on the source image transformed by the rigid motion estimated using CCRE.

noise σ²   true motion      CCRE                          traditional MI               normalized MI
10         10  5.0  5.0     9.998   5.016   4.996         9.993  4.999  5.007          10.002  5.256    [illegible]
15                          9.998   5.077   [illegible]   0      6.003  -3.000         10.132  5.046    5.998
19                          9.998   5.006   5.001         FAIL                         0       -15.890  19.222
30                          9.998   5.256   [illegible]                                FAIL
59                          10.027  5.124   4.995
60                          0       -3.003  0
61                          FAIL

Table 1: Comparison of the registration results between CCRE and other MI algorithms for a fixed synthetic motion. Note that the image intensity range before adding noise is 0-255.

4.1.1 Rigid Motion

In order to compare the robustness property of CCRE ver-

sus traditional MI and NMI, we designed a series of exper-

iments as follows: with a 2D aerial image as the source, the

target image is obtained by applying a known rigid trans-

formation to the source image. The source and target image

pair along with the result of estimated transformation using

CCRE applied to the source with an overlay of the target

edge map are shown in Figure 3. The registration is quite

accurate as evident visually. Quantitative assessment of ac-

curacy of the registration is presented subsequently.

Next, we applied CCRE together with the other MI algorithms to estimate the motion parameters for 30 randomly generated rigid transformations. These are normally distributed around the values of (0, 5 pixels, 5 pixels), with standard deviations of (8, 3 pixels, 3 pixels) for rotation and translation in x and y respectively. Table 3 shows the statistics of the errors resulting from the three methods. In each cell, the leftmost value is the rotation angle (in degrees), while the right two values show the translations in the x and y directions. Out of the 30 trials, the traditional MI failed 3 times while CCRE and normalized MI each failed only once ("failed" here means that the optimization algorithm, sequential quadratic programming (SQP), primarily diverged). If we count only the cases that gave reasonable results, shown in the first (CCRE), second (traditional MI) and third (normalized MI) rows, CCRE and the traditional MI have comparable performance, all being very accurate. Thus, in terms of accuracy, CCRE and NMI are comparable and are both better than MI.

                      mean                        standard deviation
                      rotation  x      y          rotation  x      y
1 (CCRE)              0.057°    0.456  0.286      0.022°    0.236  0.079
2 (traditional MI)    0.165°    0.645  0.478      0.067°    0.271  0.204
3 (normalized MI)     0.122°    0.397  0.466      0.040°    0.093  0.077

Table 3: Comparison of estimation errors for rigid motion between CCRE, MI and normalized MI.

In the second experiment, we compare the robustness of the three methods (CCRE, MI and normalized MI) in the presence of noise. Again using the aerial image from the previous experiment as our source image, we generate the target image by applying a fixed synthetic motion. We conduct this experiment by varying the amount of Gaussian noise added and, for each instance of the added noise, registering the two images using the three techniques. We expect all schemes to fail at some level of noise. By comparing the noise magnitude at the failure point, we can show the degree to which these methods are tolerant. We choose the fixed motion to be a 10° rotation and a 5 pixel translation in both the x and y directions. The numerical schemes we used to implement these registrations are all based on the sequential quadratic programming (SQP) technique. Table 1 shows the registration results for the three schemes. From the table, we observe that the traditional MI fails when the variance of the noise is increased to 15. Normalized MI is slightly better and fails at 19, while CCRE is tolerant until 60, a significant difference when compared to the traditional MI and the normalized MI methods. This experiment shows that CCRE has more noise immunity than both traditional MI and normalized MI.

noise σ²   true motion   CCRE                           traditional MI                normalized MI
13         5   6   6     4.997   6.002        5.997     5.008  [illegible]  6.004     5.003  6.007    6.022
           5   7   7     4.995   7.004        7.012     0.087  6.988        7.018     5.384  7.995    6.541
           10  10  10    10.015  [illegible]  9.972     FAIL                          0      -18.748  -21.041
           20  10  10    20.002  [illegible]  9.990     FAIL                          FAIL
           30  13  13    30.002  12.990       12.998
           32  13  13    31.950  14.037       12.974
           35  14  14    19.840  1.119        -9.942

Table 2: Comparison of the convergence range of the rigid registration between CCRE and other MI schemes for fixed noise variance.

Figure 4: An affine motion estimation example of our algorithm. Leftmost: the source image, a T1 weighted MR image. Rightmost: the target image, obtained by applying a synthetic affine motion to the T2 weighted MR image. Both images are of size 256x256. Middle: overlay of the target edge map on the source transformed using the affine motion computed by CCRE.

Next, we fix the variance of the noise and vary the magnitude of the synthetic motion until all of the methods fail. With this experiment, we can compare the convergence range of each registration scheme. From Table 2, we find that the convergence ranges of traditional MI and normalized MI are estimated at (5°, 6, 6) and (9°, 10, 10) respectively, while our CCRE-based algorithm has a much larger capture range at (32°, 13, 13). It is evident from this experiment that the capture range for reaching the optimum is significantly larger for CCRE than for MI and NMI in the presence of noise. Note that in all the cases, the same numerical optimization scheme (SQP) was used.

4.1.2 Affine Motion

The affine motion experiment was designed as follows: in every experiment, we applied a known affine transformation to the target image shown in Figure 2. One example of a source and transformed target image pair is displayed in Figure 4.

For the purpose of comparison, we separate the affine motion into three parts: rotation, translation and scaling. Three sets of 10 randomized transformations have been used. They are normally distributed around the values of (5°, 1.0, 5 pixels), (7°, 1.0, 7 pixels) and (10, 1.0, 9 pixels) respectively, with standard deviations of 5°, 0.2 and 2 pixels for rotation, scale and translation respectively. For a quantitative assessment of the accuracy of the registration, we computed the mean and standard deviation of the errors for the six parameters of the affine motion. It should be noted that in all three sets of experiments, our CCRE method yielded superior performance over the other two methods. Out of the 30 trials, the traditional MI failed 6 times and the normalized MI 3 times, while CCRE failed only 2 times ("failed" here means that the results diverged).

     mean                         standard deviation
1    0.0020  0.0068  0.0732       0.0000  0.0005  0.0233
     0.0098  0.0029  0.0395       0.0011  0.0001  0.0017
2    0.0460  0.0163  0.3945       0.0155  0.0005  0.2200
     0.0231  0.0432  0.4743       0.0007  0.0130  0.2537
3    0.0078  0.0076  0.1260       0.0001  0.0001  0.0132
     0.0089  0.0069  0.1443       0.0001  0.0001  0.0149

Table 4: Comparison of estimation errors between CCRE and other MI-based methods in estimating the affine motion.

The second test on affine motion is similar to the one for rigid motion (Table 2): we registered the source and target images while varying the synthetic affine motion until the methods fail to find the motion. Each motion parameter is evaluated independently. Table 5 summarizes the results of applying our CCRE algorithm as well as the other MI schemes. The values shown are the maximum capture range (from zero) for each parameter in each algorithm. As evident, our algorithm has a significantly larger convergence range.

algorithm        Rotation   Translation   Scaling
CCRE             39°        30            3.2
Traditional MI   18°        15            2.2
Normalized MI    21°        14            2.6

Table 5: Convergence range of different algorithms for affine motion. Here we divide the affine motion into 3 parts. Each part is evaluated independently.

The last test for the affine motion is to vary the amount of Gaussian noise while fixing the synthetic affine motion. Table 6 gives the noise variance that causes each algorithm to fail. Again, observe the superior performance of CCRE over the other MI-based methods.

algorithm        noise variance (σ²)
CCRE             19
Traditional MI   6
Normalized MI    5

Table 6: Comparison of the registration results between CCRE and other MI-based methods for the fixed affine motion (1.4772, -0.2605, 5.0000, 0.2605, 1.4772, 5.0000) and varying noise levels.
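The six numbers quoted in the Table 6 caption are numerically consistent with a rotation of 10° combined with a uniform scale of 1.5 and a translation of 5 pixels in x and y. This decomposition is an observation made here and is not stated in the paper. The sketch below assumes the parameter order (a, b, tx, c, d, ty) for x' = a x + b y + tx, y' = c x + d y + ty; that ordering and the function names are assumptions as well.

```python
import math

def similarity_as_affine(scale, angle_deg, tx, ty):
    """Six affine parameters (a, b, tx, c, d, ty) for a scaled rotation plus translation."""
    theta = math.radians(angle_deg)
    a = scale * math.cos(theta)
    c = scale * math.sin(theta)
    return (a, -c, tx, c, a, ty)

def apply_affine(params, x, y):
    """Map (x, y) to (x', y') with x' = a x + b y + tx and y' = c x + d y + ty."""
    a, b, tx, c, d, ty = params
    return a * x + b * y + tx, c * x + d * y + ty

params = similarity_as_affine(scale=1.5, angle_deg=10.0, tx=5.0, ty=5.0)
print([round(p, 4) for p in params])    # compare with the values in the Table 6 caption
print(apply_affine(params, 0.0, 0.0))   # the origin moves to the translation (5.0, 5.0)
```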

4.2 Real Data Experiments

In this section, we demonstrate the algorithm's performance on a pair of aerial images taken over time. The transformation between the two images is assumed to be a projective transformation. Our data is approximated by a planar surface in motion viewed through a pinhole camera. This motion can be described as a 2D projective transformation:

$$u(x, y) = \frac{a_0 x + a_1 y + a_2}{a_6 x + a_7 y + 1}, \qquad v(x, y) = \frac{a_3 x + a_4 y + a_5}{a_6 x + a_7 y + 1} \tag{15}$$

This projective transformation requires us to estimate eight parameters for each image pair. For brevity, only one registration result is shown in Figure 5. Here, the source and target images are shown in the top row, and the lower left image is the overlay of the transformed source with the source edge map (showing the change in the source due to the applied transformation), while the lower right image shows the overlay with the target edge map, showing the registration. As evident, the registration is visually quite accurate.

Figure 5: Registration results for the projective transformation. Upper left, the source image; upper right, the target image; lower left, the transformed source overlaid with the source edge map; lower right, the transformed source overlaid with the target edge map.
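The eight-parameter projective model of Eq. (15) maps a pixel (x, y) to (u, v) through two ratios that share one denominator. A minimal sketch of that mapping follows; the function name, the tuple layout (a0, ..., a7) and the handling of a vanishing denominator are assumptions made for illustration.

```python
def projective_map(params, x, y):
    """Apply the 2D projective transformation with parameters (a0, ..., a7)."""
    a0, a1, a2, a3, a4, a5, a6, a7 = params
    w = a6 * x + a7 * y + 1.0           # shared denominator of u and v
    if w == 0.0:
        raise ValueError("point maps to infinity under this transformation")
    u = (a0 * x + a1 * y + a2) / w
    v = (a3 * x + a4 * y + a5) / w
    return u, v

identity = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0)
print(projective_map(identity, 12.0, 7.0))      # the identity leaves the point unchanged
```

With a6 = a7 = 0 the model reduces to an affine map, which makes the affine and rigid cases special instances of the same parameterization.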

5 Summary

In this paper, we presented a novel measure of information that we dub cumulative residual entropy (CRE). This measure has several advantages over the traditional Shannon entropy, whose definition is based on probability density functions, which are hard to estimate accurately. In contrast, CRE can be easily computed from the sample data, and these computations asymptotically converge to the true value. Unlike Shannon entropy, the same CRE definition is valid for both discrete and continuous domains.

We defined the cross-CRE, denoted by CCRE, applied it to estimate the parameterized misalignments between image pairs, and tested it on synthetic as well as real data sets from mono-modality (video) and multi-modality (MR T1 and T2 weighted) imaging sources. Comparisons were made between CCRE and traditional MI and normalized MI, both of which are defined using the Shannon entropy. Experiments showed significantly better performance of CCRE over the other MI-based methods currently used in the literature.

Acknowledgements

Authors would like to thank Dr. Wen Masters of ONR for

providing the Aerial images.

References

[1] J. Aczel and Z. Daroczy, On measures of information and their characterization, Academic Press, New York, 1975.

[2] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, "Performance of Optical Flow Techniques," Intl. J. Comput. Vision, 12(1):43-77, 1994.

[3] Simulated brain database, available online at: www.bic.mni.mcgill.ca/brainweb/

[4] C. Chefd'Hotel, G. Hermosillo and O. Faugeras, "A variational approach to multi-modal image matching," in IEEE Workshop on VLSM, pp. 21-28, 2001, Vancouver, BC, Canada.

[5] X, Y and Z, "Cumulative residual entropy, a new measure of information," Technical Report, Institute of Fundamental Theory, Department of Mathematics, October 2002.

[6] A. Collignon, F. Maes, D. Delaere, D. Vandermeulen, P. Suetens and G. Marchal, "Automated multimodality image registration using information theory," Proc. IPMI, Y. Bizais, Ed., pp. 263-274, 1995.

[7] Thomas M. Cover and Joy A. Thomas, Elements of Information Theory, John Wiley and Sons, 1991.

[8] B. Forte and W. Hughes, "The maximum entropy principle: a tool to define new entropies," Reports of Mathematical Physics, 26(2), pp. 227-238, 1988.

[9] J. N. Kapur, "On the basis of relationship between measures of entropy and directed divergence," Proc. of the National Acad. Sci., 58 A(3), 375-387.

[10] S. H. Lai and M. Fang, "Robust and efficient image alignment with spatially varying illumination models," in IEEE CVPR 1999, pp. 167-172.

[11] M. Leventon and W. E. L. Grimson, "Multi-modal volume registration using joint intensity distributions," in MICCAI 1999.

[12] J. B. Maintz and M. A. Viergever, "A Survey of Medical Image Registration," MedIA, Vol. 2, pp. 1-36, 1998.

[13] A. Renyi, "On measures of entropy and information," Selected Papers of Alfred Renyi, Vol. 2, 1961.

[14] A. Roche, G. Malandain, X. Pennec and N. Ayache, "The correlation ratio as new similarity metric for multi-modal image registration," in MICCAI'98.

[15] D. Rueckert, C. Hayes, C. Studholme, M. Leach and D. Hawkes, "Non-rigid registration of breast MRI using MI," in MICCAI'98.

[16] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423 and 623-656, July and October, 1948.

[17] C. Studholme, D. L. G. Hill and D. J. Hawkes, "An overlap invariant entropy measure of 3D medical image alignment," Pattern Recognition, Vol. 32, pp. 71-86, 1999.

[18] R. Szeliski and J. Coughlan, "Spline-based image registration," IJCV, 22(3), pp. 199-218, March/April 1997.

[19] B. C. Vemuri, S. Huang, S. Sahni, C. M. Leonard, C. Mohr, R. Gilmore and J. Fitzsimmons, "An efficient motion estimator with application to medical image registration," Medical Image Analysis, Oxford University Press, Vol. 2, No. 1, pp. 79-98, 1998.

[20] P. A. Viola and W. M. Wells, "Alignment by maximization of mutual information," in Fifth ICCV, MIT, Cambridge, MA, pp. 16-23, 1995.