Technical Report
Shape L'Ane Rouge: Sliding Wavelets for Indexing and Retrieval
Adrian Peter1, Anand Rangarajan2 and Jeffrey Ho2
1Dept. of ECE, 2Dept. of CISE, University of Florida, Gainesville, FL
Abstract
Shape representation and retrieval of stored shape models are becoming increasingly prominent in fields such as medical imaging, molecular biology and remote sensing. We present a novel framework that directly addresses the necessity for a rich and compressible shape representation, while simultaneously providing an accurate method to index stored shapes. The core idea is to represent pointset shapes as the square root of probability densities expanded in a wavelet basis. We then use this representation to develop a natural similarity metric that respects the geometry of these probability distributions, i.e. under the wavelet expansion, densities are points on a unit hypersphere and the distance between densities is given by the separating arc length. The process uses a linear assignment solver for nonrigid alignment between densities prior to matching; this has the connotation of "sliding" wavelet coefficients akin to the sliding block puzzle L'Ane Rouge. We illustrate the utility of this framework by matching shapes from the MPEG7 data set and provide comparisons to other similarity measures, such as Euclidean distance shape distributions.
1. Introduction
Today's scientific (and nonscientific) community generates information at a frantic pace, thus placing a paramount emphasis on developing flexible and robust systems for mining the data. Often, the goals are to classify test data, cluster similar groups or discover the closest match to an incoming query. The key enablers of these operations are the similarity metrics used for querying the data [ ]. In this paper we focus on similarity metrics for shape models, which are applicable to a variety of disciplines, e.g. medical imaging, remote sensing and robotics. Our framework introduces a new shape representation and then uses the natural geometry arising from this representation to derive a geodesic-distance similarity metric.
The present effort is motivated by a recent wavelet density estimation method [ ] that estimates $\sqrt{p(x)}$ and then obtains a bona fide density as $(\sqrt{p(x)})^2$. This has several advantages over estimating $p(x)$ directly, such as guaranteeing nonnegativity and imposing a simple constraint on the wavelet coefficients. This new density estimator uses a wavelet expansion of $\sqrt{p(x)}$, i.e.

$$\sqrt{p(x)} = \sum_{j_0,k} \alpha_{j_0,k}\,\phi_{j_0,k}(x) + \sum_{j \geq j_0,k} \beta_{j,k}\,\psi_{j,k}(x), \tag{1}$$

where $\alpha_{j_0,k}$ and $\beta_{j,k}$ are coefficients for the father $\phi(x)$ and mother $\psi(x)$ basis functions; the $j$ index represents the current scale level and the $k$ index the integer translation value. (Note: $\phi(x)$ and $\psi(x)$ are also referred to as the scaling and wavelet functions respectively.) For numerical implementation, the infinite expansion in (1) is truncated to a finite set of scale levels and we must also select a starting scale level $j_0$. The coefficients in (1) are estimated with a maximum likelihood objective function which is minimized using a modified Newton's method.
Expanding $\sqrt{p}$ in a wavelet basis serves as the springboard for our development of an efficient similarity metric between shapes. We will show that given pointset shapes, we can use this density estimation method to represent shapes as probability densities; a natural byproduct is that the densities visually resemble the shapes. (For the purposes of this paper, we consider only two-dimensional shapes but the theory and algorithmic procedures readily extend to higher dimensions.) All shapes in a given data set can be similarly represented. This representation has several useful properties: (1) the multiscale wavelet coefficients of the densities can be thresholded [ ] to reduce the storage requirements, (2) several different orthonormal bases can be used to estimate the densities, thus enhancing their descriptive capabilities, and (3) the compact nature of wavelets provides both spatial and frequency localization, enabling the densities to closely mimic shape features.
Based on this representation, the intuition for the similarity metric follows from considering the coefficients of the probability density $\{\alpha_{j_0,k}, \beta_{j,k}\}$ as the coordinates $c = [\ldots, \alpha_{j_0,k}, \ldots, \beta_{j,k}, \ldots]^T$ indexing the location of a density on a unit hypersphere; then the distance between two distributions $p_1$ and $p_2$, indexed by their coordinates $c_1$ and $c_2$ respectively, is given by

$$d(p_1, p_2) = \cos^{-1}(c_1^T c_2). \tag{2}$$
(The unit hypersphere comes about from the constraints on the coefficients in conjunction with a diagonal metric tensor, as discussed in Section 2.2.) We expand on these intuitive ideas to develop a matching procedure that casts density matching as a linear assignment problem. The linear assignment is used to handle nonrigid differences between shapes. It "warps" the densities while preserving their defining properties, e.g. unit integrability and nonnegativity. Since the densities closely resemble the shapes, we are in effect warping the shapes. It will be shown that this nonrigid alignment is necessary to obtain more accurate recognition. When one uses a Haar basis (box function) for the density expansion, the permutation of the wavelet coefficients due to the linear assignment visually looks like sliding blocks. Thus we have informally branded this process as Shape L'Ane Rouge after the French moniker for sliding block puzzles. Our method has several benefits, including:
* Shapes are not limited by topological constraints (such as the need to represent shapes as closed curves), eliminating extra effort often spent in developing parametrizations or other preprocessing.
* All of the intensive computations happen offline, e.g. the density estimation.
* For querying, the similarity metric computation between source and target shape is fast, satisfying the requirements for demanding indexing and retrieval applications.
* Use of wavelet representations enables flexibility in compression and storage.
1.1. Related Work
Existing work in shape modeling and matching spans a broad spectrum of representations and their corresponding metrics. There are several recent surveys, e.g. [ ], that succinctly describe shape representations such as unstructured pointsets or curves. They also detail the myriad similarity measures that provide a means by which to compare shapes under a common representation. The advances made by all these methods have been instrumental in enabling robust indexing and retrieval mechanisms.
Because we incorporate a linear assignment solver to handle nonrigid deformations, our method is situated in close proximity to techniques that use transportation and assignment problem formulations [ ] to obtain their distance measures. One such measure is the Earth Mover's Distance (EMD) [ ], a metric between general mass distributions of objects. Given two distributions $x$ and $y$, the goal becomes to find a matrix $f_{ij}$ that establishes a flow between all features $x_i$ and $y_j$ in $x$ and $y$. Feasible flows must satisfy row sum, column sum and total sum constraints. Obtaining the flow and subsequently the EMD is generally based on the solution to the transportation problem [ ]. Hence, one of the main differences between our approach and EMD is that we solve a matching problem in contrast to the transportation problem. The EMD also requires one to decide on the features as well as the appropriate weighting of each feature per object. For some applications these choices may already be readily apparent, but for most this requires an added level of effort and investigation. Our method simply works on the point sets that naturally arise either from sampling or preprocessing.
Our present method also falls in the same paradigm as a recently introduced shape analysis framework [ ] which uses geodesic distances on the manifold of Gaussian mixture models to establish a shape similarity metric. The authors represent shapes as mixture models and use the Fisher-Rao metric derived directly from the representation to obtain intrinsic distances on the manifold of parametric mixtures. Like their method, we also leverage the geometry that results directly from the shape representation. However, it is not feasible to use their metric for retrieval due to the computational demand of solving for geodesic distances on arbitrary, high-dimensional manifolds, a byproduct of using Gaussian mixtures. We, on the other hand, have a well-understood geometry with an easy-to-compute metric.
The remainder of this paper is organized as follows. In the next section we provide detailed discussions of our method: the representation of shapes as density functions expanded in a wavelet basis, the geometry that arises from this representation and the derivation of the similarity metric. We then follow with experimental verification of our method in Section 3. The indexing and retrieval accuracies are tested on a shape database consisting of 1400 shapes from the MPEG7 Core Experiment CE-Shape-1 [ ]. Our method is compared with another density-matching technique for retrieval, D2 shape distributions [ ], for which we compute four different similarity measures. We also compare our results with published recognition rates of other algorithms on the MPEG7 data. The last section concludes by summarizing our effort and proposing directions for future work.
2. Shape L'Ane Rouge
Our similarity metric, the geodesic distance on a unit hypersphere, is obtained directly from our representation of shapes as probability densities expanded in an orthonormal wavelet basis. This shape representation is detailed next, followed by a discussion on how this leads to the hypersphere geometry for the distributions. Afterwards, we illustrate the need for nonrigid alignment and how it can be accomplished on the space of distributions through a linear assignment formulation. It will turn out that the linear assignment process has to be regularized to improve matching
performance. To this end, we formulate a penalty term that restricts large movements of the wavelet bases.
2.1. From Shapes to Wavelet Densities
The idea of representing shapes as densities is usually brought to fruition in two ways: either the density is directly estimated from the shape's discrete samples [ ], or some other feature is first extracted from the shape and then the density is fit to these features [ ]; our method falls in line with the former. To our knowledge, this is the first time a wavelet density estimator has been used to directly represent shapes.
Many of the issues of estimating a bona fide density can be overcome by first estimating $\sqrt{p(\mathbf{x})}$ and then obtaining the desired density as $(\sqrt{p(\mathbf{x})})^2$ [ ]. For two-dimensional densities the wavelet expansion of the square root of the density is given by:

$$\sqrt{p(\mathbf{x})} = \sum_{j_0,\mathbf{k}} \alpha_{j_0,\mathbf{k}}\,\phi_{j_0,\mathbf{k}}(\mathbf{x}) + \sum_{j \geq j_0,\mathbf{k}}^{j_1} \sum_{w=1}^{3} \beta^{w}_{j,\mathbf{k}}\,\psi^{w}_{j,\mathbf{k}}(\mathbf{x}) \tag{3}$$

where $\mathbf{x} \in \mathbb{R}^2$, $j_1$ is some stopping scale level for the multiscale decomposition and $\mathbf{k} = (k_1, k_2) \in \mathbb{Z}^2$ is a multi-index that represents the spatial location of the basis. (The translation range of $\mathbf{k}$ can be computed from the span of the data and basis function support size.) The father and mother bases are tensor product combinations of their one-dimensional counterparts, i.e.

$$\begin{aligned}
\phi_{j_0,\mathbf{k}}(\mathbf{x}) &= 2^{j_0}\,\phi(2^{j_0}x_1 - k_1)\,\phi(2^{j_0}x_2 - k_2) \\
\psi^{1}_{j,\mathbf{k}}(\mathbf{x}) &= 2^{j}\,\phi(2^{j}x_1 - k_1)\,\psi(2^{j}x_2 - k_2) \\
\psi^{2}_{j,\mathbf{k}}(\mathbf{x}) &= 2^{j}\,\psi(2^{j}x_1 - k_1)\,\phi(2^{j}x_2 - k_2) \\
\psi^{3}_{j,\mathbf{k}}(\mathbf{x}) &= 2^{j}\,\psi(2^{j}x_1 - k_1)\,\psi(2^{j}x_2 - k_2).
\end{aligned} \tag{4}$$

The goal is to estimate the set of coefficients $\{\alpha_{j_0,\mathbf{k}}, \beta^{w}_{j,\mathbf{k}}\}$ and reconstruct the density using (3). An efficient maximum likelihood method to estimate them, with fast convergence, is discussed in [ ]. Due to the increased indexing notation for the two-dimensional wavelet expansion, we will typically resort to one-dimensional arguments, as in Section 1, with it being understood that all results directly translate to two dimensions. Under a wavelet expansion of $\sqrt{p(x)}$, the unit integrability requirement of all probability densities translates to a constraint on the wavelet coefficients

$$\int \left(\sqrt{p(x)}\right)^2 dx = \sum_{j_0,k} \alpha_{j_0,k}^2 + \sum_{j \geq j_0,k}^{j_1} \beta_{j,k}^2 = 1. \tag{5}$$
Recall that we are using only orthonormal bases such as Haar, Coiflets or Symlets. Figure 1 illustrates estimated densities for four point set shapes, using a single level wavelet decomposition. The points were extracted from the MPEG7 binary image data set. Notice how the compact nature of the bases does an excellent job in modeling the shape features. In the overhead views, it is readily apparent how closely the densities resemble the shapes. We feel this direct visual association of the density and the shape provides a clear advantage over trying to extract features and then fit the density to the features. Also, notice that the shapes exhibit a variety of topological properties like interior structures and disconnected components.

Figure 1. Example wavelet densities estimated from point sets of MPEG7 shapes. Top row: point sets, with cardinality from left to right: 4,948; 5,578; 7,773; 11,984. Second row: nadir view of the estimated densities using the following wavelet families (from left to right): Haar ($j_0 = 2$), Coiflet-4 ($j_0 = 1$), Symlet-10 ($j_0 = 0$) and Haar ($j_0 = 2$). Third row: perspective view. Notice how the wavelet densities accurately represent the shapes.
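As a rough illustration of constraint (5), the following Python sketch builds level-$j_0$ Haar scaling coefficients of $\sqrt{p}$ from a point set. It is a simplified stand-in, not the maximum likelihood estimator referenced above: because each Haar scaling function is constant on one square cell, a normalized histogram gives $\alpha_k = \sqrt{n_k / N}$, where $n_k$ is the number of samples in cell $k$ and $N$ the total. The function name `haar_sqrt_density_coeffs`, the field of view, and the choice of a translation range that exactly tiles it are assumptions made for this example.

```python
import numpy as np

def haar_sqrt_density_coeffs(points, j0=1, lo=-10.0, hi=10.0):
    """Histogram-style Haar scaling coefficients of sqrt(p) at level j0.

    Hypothetical helper: each cell has side 2**-j0, and the coefficient of
    the scaling function on cell k is sqrt(n_k / N).
    """
    side = 2.0 ** (-j0)
    n_cells = int(round((hi - lo) / side))
    edges = np.linspace(lo, hi, n_cells + 1)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1], bins=[edges, edges])
    return np.sqrt(counts / counts.sum())

# Synthetic point set standing in for a sampled shape.
rng = np.random.default_rng(0)
pts = rng.uniform(-3.0, 3.0, size=(5000, 2))

alpha = haar_sqrt_density_coeffs(pts)
sum_of_squares = float(np.sum(alpha ** 2))
print(alpha.shape, sum_of_squares)
```

The squared coefficients sum to one up to floating-point error, so the flattened coefficient array is a unit vector regardless of how many points the shape has.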
2.2. The Geometry of Wavelet Densities: Geodesic
Distances on the Hypersphere
Equation (5) showed that a natural byproduct of working with the square root of the density and expanding it in an orthonormal wavelet basis is a constraint on the basis coefficients: the sum of squared coefficient values must equal one. This immediately leads to the interpretation that the basis coefficients (which are unique to a particular density, since wavelets serve as a true basis for the space of continuous distributions) give the coordinates for a position on the unit hypersphere. The ordering of the coefficients in the coordinate vector can be taken in any arrangement but it must be consistent across all densities. The dimensionality of the hypersphere is determined by the cardinality of the set containing all the coefficients. The hypersphere geometry of the densities can be more rigorously justified when we analyze the $\sqrt{p(x)}$ representation under the theoretical basis of information geometry [ ]. In this context, the Fisher information matrix (FIM) serves as the metric tensor on the manifold of a parametric family of distributions. One of the algebraic forms of the FIM is given by

$$g_{uv} = 4 \int \frac{\partial \sqrt{p(x \mid \Theta)}}{\partial \theta^{u}}\, \frac{\partial \sqrt{p(x \mid \Theta)}}{\partial \theta^{v}}\, dx \tag{6}$$

where $\Theta = \{\theta^1, \ldots, \theta^m\}$ denotes the parameters of the distribution and $u$ and $v$ indicate the row and column index, i.e. for a family with $m$ parameters the FIM is $m \times m$. Under an orthonormal expansion of $\sqrt{p(x \mid \Theta)}$, the partial derivative with respect to a coefficient is simply the corresponding basis function, so Eq. (6) reduces to $g_{uv} = 4\delta_{uv}$, which when taken in conjunction with the constraint $\sum_i (\theta^i)^2 = 1$ gives the unit hypersphere geometry. Such is the case in our framework, where $\sqrt{p(x \mid \Theta)}$ has been expanded in an orthonormal wavelet basis with the coefficients of the expansion serving as the parameters of the density, i.e. $\Theta = \{\alpha_{j_0,k}, \beta_{j,k}\}$. Hence the orthonormal wavelet bases serve as eigenfunctions diagonalizing certain classes of parametric densities, with a particular class defined by regularity properties of the density which come about from the wavelet's order of vanishing moments [ ]. Two shapes represented as wavelet densities end up as two points on the hypersphere, see Figure 2. Since this is a unit hypersphere, with the wavelet coefficients of the two shapes playing the role of two unit vectors, the angle between these unit vectors immediately gives the geodesic distance between the shapes.

Figure 2. Hypersphere of densities. Unit integrability for densities requires $\sum_{j_0,k} \alpha_{j_0,k}^2 + \sum_{j \geq j_0,k}^{j_1} \beta_{j,k}^2 = 1$ and the Fisher information matrix is diagonalized when $\sqrt{p}$ is expanded in an orthonormal basis. This places the shapes represented by the densities on a unit hypersphere with coordinates given by the wavelet coefficients. The figure shows two densities (distinguished by the coefficient superscripts) on the hypersphere; their geodesic distance is the angle between the unit vectors.
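The geodesic distance of Eq. (2) is short enough to sketch directly. The Python snippet below assumes each shape is already summarized by a unit-norm coefficient vector stored in a consistent order; the function name `geodesic_distance` is hypothetical, and the clipping step is an implementation detail added here to guard against round-off pushing the inner product slightly outside $[-1, 1]$.

```python
import numpy as np

def geodesic_distance(c1, c2):
    """Arc length between two unit coefficient vectors on the hypersphere."""
    c1 = np.asarray(c1, dtype=float).ravel()
    c2 = np.asarray(c2, dtype=float).ravel()
    cos_angle = np.clip(np.dot(c1, c2), -1.0, 1.0)  # guard against round-off
    return float(np.arccos(cos_angle))

a = np.array([1.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0])

same = geodesic_distance(a, a)        # identical densities
orthogonal = geodesic_distance(a, b)  # no overlapping coefficients
print(same, orthogonal)
```

Identical coefficient vectors give a distance of zero, while vectors with no overlapping nonzero coefficients give $\pi/2$.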
It is also interesting to note that we can obtain this same inner product interpretation by working with a similarity measure directly between the densities, instead of analyzing the geometry implied by the coefficient constraints and the metric tensor. In particular, using the Hellinger divergence [ ] to calculate the distance between two densities $p_1$ and $p_2$ gives

$$\begin{aligned}
D_H(p_1, p_2) &= \int \left(\sqrt{p_1} - \sqrt{p_2}\right)^2 dx \\
&= 2 - 2 \int \sqrt{p_1 p_2}\, dx \\
&= 2 - 2 \Big[ \sum_{j_0,k} \alpha^{(1)}_{j_0,k}\, \alpha^{(2)}_{j_0,k} + \sum_{j \geq j_0,k}^{j_1} \beta^{(1)}_{j,k}\, \beta^{(2)}_{j,k} \Big]
\end{aligned} \tag{7}$$

where $\{\alpha^{(1)}, \beta^{(1)}\}$ and $\{\alpha^{(2)}, \beta^{(2)}\}$ are the wavelet parameters of $p_1$ and $p_2$ respectively. Notice that we can factor out the 2 and drop the constant term without affecting the metric qualities of the measure. This reduces (7) to an inner product between the coefficients of the densities, hence giving the same metric as the one we derived above by analyzing the geometry of the space of distributions. There are other notions of similarity measures between densities, such as the Kullback-Leibler divergence and Euclidean distance, but none of them operate on the square root of the density and they also do not provide a closed-form expression for the distance. We refer the reader to [ ] for a summary of other distance measures between densities.
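The link between the Hellinger divergence and the coefficient inner product can be checked numerically. The sketch below assumes an orthonormal basis, so that the integral of $(\sqrt{p_1} - \sqrt{p_2})^2$ equals the squared Euclidean distance between the coefficient vectors; the two random unit vectors are synthetic stand-ins for real wavelet coefficients.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two synthetic unit-norm coefficient vectors (stand-ins for real shapes).
c1 = rng.normal(size=16)
c1 /= np.linalg.norm(c1)
c2 = rng.normal(size=16)
c2 /= np.linalg.norm(c2)

# With an orthonormal basis the Hellinger integral is ||c1 - c2||^2.
hellinger = float(np.sum((c1 - c2) ** 2))
inner = float(np.dot(c1, c2))
geodesic = float(np.arccos(np.clip(inner, -1.0, 1.0)))

print(hellinger, 2.0 - 2.0 * inner, 2.0 - 2.0 * np.cos(geodesic))
```

All three printed values agree up to round-off, which is why ranking shapes by the Hellinger divergence and ranking them by the arc length of Eq. (2) give the same ordering.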
2.3. Sliding Wavelets
If our analysis ended with the previous section, we would be equipped with a very fast similarity metric. Given a pair of pointset shapes we would merely estimate the coefficients for the wavelet, square-root densities of each shape and then take their inner product to get a measure of their closeness to each other. However, this approach is somewhat naive in that it does not leverage the full mathematical formalisms that relate one shape to another. Following the Klein school of thought [ ], similarity between shapes is often considered after quotienting out some transformation group, typically the group of similarity transformations [ ]. The motivation for this is that without removing transformations we are not really analyzing effects that are intrinsic to the shapes. Nonrigid transformations are the most general, basically encompassing any continuous transformation. Practically, it is expected that most shapes from the same category should differ by smaller nonrigid warps compared to shapes from other arbitrary categories; hence correcting for this prior to evaluating the similarity metric should enhance its discriminability. In our framework, we could incorporate nonrigid alignment in one of two ways: perform nonrigid alignment of the point sets prior to fitting the wavelet density, or fit the density to the data and then adjust for nonrigid deformations by warping the densities. The former method usually involves adopting a spline-based model to represent the nonrigid transformation [ ] and can involve iterative optimization to solve for the spline parameters. Though these methods are able to model a large class of nonrigid deformations, they do not possess the computational efficiency needed for querying systems. Our method takes the second option of warping the densities, which we accomplish by locally translating wavelet coefficients.
We now give a simple example to illustrate how warping the densities by local translations can increase recognition. Suppose two shapes have been affine aligned and there only remains a nonrigid warp between the two. We model the nonrigid deformation, in the infinitesimal, as local translations. Figure 3 shows the estimated densities of two hypothetical shapes, see (a) and (b). The coefficients for the basis functions of each shape are indicated by a red bar. The density function shown in (a) differs from density (b) only by a translation. Notice that if we were to stack the coefficients in a vector (from bottom left to top right) for each density and perform an inner product between them, the resulting value would be zero. This leads to a high geodesic distance, $\cos^{-1}(0) = \pi/2$. However, if we simply slide the wavelet bases of one shape to correspond to locations on the other, our inner product would then yield a very high correlation, indicating the true similarity between the shapes. We must also be careful that whatever mechanism we use to translate the bases does not alter the values of their coefficients and compromise the properties of a bona fide density, i.e. (5) must hold to maintain unit integrability.

Figure 3. Local nonrigid effects and the need for linear assignment. (a) is the density $p_1$ of the first shape, with only scaling coefficients $c_1$ shown. (b) is the second shape with density $p_2$ and coefficients $c_2$. Locally the point sets only differed by a translation, which resulted in the densities differing by a translation. Without linear assignment the coefficient vectors of these would give an inner product of 0 and consequently a large geodesic distance on the hypersphere. Linear assignment can correctly recover the local translation and then the geodesic distance will be small, reflecting the true similarity between the shapes.

The most straightforward way to accommodate these objectives is to reformulate our similarity metric under the action of a permutation group on the ordering of the coefficients. These specific requirements can be addressed within a linear assignment construct [ ]; thus our deformation model can be interpreted as a "sliding grammar" wherein we only allow wavelets at each level $j$ to independently slide to get a good match. The independent sliding assumption at each level can be rigorously justified. The hypersphere nature of the manifold (discussed in the previous section) implies that the "probability density mass" corresponding to each wavelet is independent of the rest. Consequently, this allows us to independently slide each wavelet to get a best match. While this justifies the independence assumption, "deformation grammars" more complex than sliding could be considered, e.g. splitting coefficients. However, in this paper, we restricted ourselves to only sliding the wavelets, leaving more exotic rules for future research. Even though each wavelet is allowed to slide, we cannot allow the sliding wavelets to collide and end up at the same spatial location. This imposes a permutation constraint on the sliding wavelets, and the resulting deformation picture evokes the L'Ane Rouge puzzle, see Figure 4. Thus our new objective to minimize becomes
$$D(p_1, p_2; \pi) = 2 - 2 \Big[ \sum_{j_0,k} \alpha^{(1)}_{j_0,k}\, \alpha^{(2)}_{j_0,\pi(k)} + \sum_{j \geq j_0,k}^{j_1} \beta^{(1)}_{j,k}\, \beta^{(2)}_{j,\pi(k)} \Big] \tag{8}$$

where $\pi(k)$ is a permutation operator that takes as input the wavelet spatial index $k$ and returns a new index $k'$ at the same level. (Since the signs of all the wavelet coefficients can be reversed to get the same density, there is an overall sign symmetry, which is accounted for in the linear assignment algorithm by running it twice: once with the set of coefficients $\{\alpha_{j_0,k}, \beta_{j,k}\}$ and a second time with $\{-\alpha_{j_0,k}, -\beta_{j,k}\}$.) The space of possible permutations is large and hence this objective needs to be regularized to yield useful results. Otherwise, every source shape's coefficients could be reordered to be in the shape of the target; this is a detriment to recognition since any shape can essentially match another. To overcome this effect, we penalize large spatial movements by incorporating a cost based on the Euclidean distance between the centers of basis functions. This restricts large movements of the coefficients, forcing them to be only locally translated. Incorporating this penalty gives our final objective function

$$E(\pi) = D(p_1, p_2; \pi) + \lambda \Big[ \sum_{j_0,k} \| r(j_0,k) - r(\pi(j_0,k)) \|^2 + \sum_{j \geq j_0,k}^{j_1} \| r(j,k) - r(\pi(j,k)) \|^2 \Big] \tag{9}$$

where $r(j,k)$ is a location operator, essentially giving us the center of the wavelet basis at $(j,k)$. It takes two inputs, the level $j$ (this includes $j_0$) and the wavelet spatial index $k$, and returns a spatial location $r \in \mathbb{R}^2$. The basic idea here is that as the regularization parameter $\lambda$ is increased, the objective increasingly favors shorter wavelet sliding movements and hence smaller deformations. The optimal permutation $\pi^*$ can be obtained by setting up the cost matrix

$$C = -c_1 c_2^T + \lambda d \tag{10}$$

where $c_i$ is a vectorized representation of all the density wavelet coefficients for shape $i$ and the matrix $d$ contains pairwise distances between the wavelet basis locations. The outer-product term enters with a negative sign because minimizing (8) amounts to maximizing the permuted inner product. Figure 4 illustrates the effect of $\lambda$ on the linear assignment and hence the similarity metric.

3. Experiments
The presented technique was evaluated on the MPEG7 database [ ]. The original data set consists of 70 different
categories with 20 observations per category for a total of 1400 binary images. Each image consists of a single shape. One of the main strengths of our method is its accessibility and ease of use. The first part involves simply taking the data samples for each object and using them to estimate $\{\alpha_{j_0,k}, \beta_{j,k}\}$ for the wavelet expansion of $\sqrt{p}$. In the context of shape indexing this phase is completely offline, i.e. wavelet densities for the entire database can be estimated once, before the actual similarity computation takes place. Next, to compare two shapes, we first use the regularized linear assignment (9) to handle nonrigid effects and then use the closed-form distance on the unit hypersphere to obtain the similarity measure between them. We compare the performance of our method, Shape L'Ane Rouge, to D2 shape distributions [ ].

Figure 4. Effects of $\lambda$ on linear assignment. Top row: far left is the target shape and far right is the source. Second row: for small $\lambda$ the source shape is almost perfectly transformed to the target, while for large $\lambda$ the source shape retains its original shape; $\lambda$ values from left to right: 10, 250, 500, and 1000. Third row: the wavelet coefficient movements in row two (best viewed in color). The densities were estimated using the Haar family with $j_0 = 2$.
For the MPEG7 data, each shape was represented with a subset of points. There are no topology or equal pointset cardinality requirements amongst shapes, allowing shapes with richer features to be represented with a greater number of points, see Figure 1 for some examples. In this preliminary effort, we have focused on handling nonrigid effects. To this end, shapes within each category were affine aligned to a category reference shape. We used a recently introduced affine alignment algorithm that enables alignment of 2D pointset data without iterative optimization [ ]. Once the shapes were aligned, all of them were brought into a common field of view by placing them in a $[-10, 10] \times [-10, 10]$ coordinate system. This was done to control the range over which we estimate the densities. Next we estimated coefficients for the wavelet density of each shape using a Haar basis with $j_0 = 1$. Note that it is possible to use several other families, but the Haar basis is available in closed form and reduces the time required to estimate the densities (on average about 2 to 3 minutes per shape). It is worth mentioning that regardless of the number of points used to represent each shape, once the densities are estimated all of them will have the same number of wavelet coefficients. (Recall that the densities are all estimated in the same square coordinate system.) Per these specifications, each wavelet density was represented with 1,764 coefficients.
Once the densities are estimated for all the shapes, pairwise matching between densities only involves working with the wavelet coefficients of the densities. When matching two shapes, the wavelet density coefficients of each are used to create the cost matrix in Eq. (10). With this cost matrix, we can then use the linear assignment solver presented in [ ] to obtain the wavelet-coefficient rearrangements of the source shape with respect to the target. All of our experiments were conducted with multiple values of $\lambda$. For a shape pair, it typically takes less than 5 seconds to perform the linear assignment. Once the coefficients are reordered we can use Eq. (2) to obtain the geodesic distance between the shapes. In fact we experimented with three possible similarity measures that can be computed after the linear assignment: (1) the standard arc length geodesic distance after linear assignment, (2) the geodesic distance plus the total distance penalty incurred for sliding and (3) just the total sliding penalty. (Note: The last two metrics are not to be confused with the fact that the distance penalty is also used to regularize the sliding process, which is different from treating the total amount of movement as a metric.)
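The matching step described above can be sketched in Python for a single scale level. This is an illustrative approximation, not the authors' implementation: SciPy's `linear_sum_assignment` is swapped in for the solver cited in the text, the cost uses the negated outer product of the coefficients plus $\lambda$ times squared distances between basis centers, and the function name `slide_and_match` and the toy 4 x 4 grid are assumptions. The sign symmetry is handled by solving twice, once with the coefficients of the first shape negated.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def slide_and_match(c1, c2, centers, lam):
    """Regularized sliding for one level; returns (geodesic distance, permutation)."""
    diff = centers[:, None, :] - centers[None, :, :]
    d = np.sum(diff ** 2, axis=-1)  # squared distances between basis centers
    best = None
    for sign in (1.0, -1.0):  # overall sign symmetry of sqrt(p)
        cost = -sign * np.outer(c1, c2) + lam * d
        rows, cols = linear_sum_assignment(cost)
        total = float(cost[rows, cols].sum())
        inner = sign * float(np.dot(c1[rows], c2[cols]))
        if best is None or total < best[0]:
            best = (total, inner, cols)
    _, inner, perm = best
    return float(np.arccos(np.clip(inner, -1.0, 1.0))), perm

# Toy version of the translated-density example: a 4 x 4 grid of Haar cells.
side = 4
centers = np.array([(x, y) for y in range(side) for x in range(side)], dtype=float)
c1 = np.zeros(side * side)
c2 = np.zeros(side * side)
for i in range(3):
    c1[i * side + i] = 1.0      # cells (0,0), (1,1), (2,2)
    c2[i * side + i + 1] = 1.0  # same pattern shifted one cell in x
c1 /= np.linalg.norm(c1)
c2 /= np.linalg.norm(c2)

plain = float(np.arccos(np.clip(np.dot(c1, c2), -1.0, 1.0)))
slid, _ = slide_and_match(c1, c2, centers, lam=0.01)
stuck, _ = slide_and_match(c1, c2, centers, lam=10.0)
print(plain, slid, stuck)
```

Without sliding the two toy shapes are orthogonal and sit $\pi/2$ apart. A small $\lambda$ lets the coefficients slide one cell and the distance drops to nearly zero, while a large $\lambda$ pins every coefficient in place and the distance stays at $\pi/2$.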
We compared our method to D2 shape distributions as this is also a density-based shape retrieval metric. For each shape, a D2 shape distribution was created by taking 10,000 random pairwise distances between points on the shape. In [ ], the authors then use these distances to construct a 1D histogram for each shape; this serves as a unique shape signature. Instead of using histograms, we estimate a 1D wavelet density for each shape. Distance metrics between shapes can be obtained by using a variety of 1D density dissimilarity measures. In addition to the Hellinger divergence, Eq. (7), we computed three other measures:

* Bhattacharyya: $D(p_1, p_2) = 1 - \int \sqrt{p_1 p_2}\, dx$
* $\chi^2$: $D(p_1, p_2) = \int \frac{(p_1 - p_2)^2}{p_1 + p_2}\, dx$
* $L_2$: $D(p_1, p_2) = \left( \int (p_1 - p_2)^2\, dx \right)^{1/2}$

Figure 5 shows some example D2 shape distributions using the 1D wavelet density estimator; these distributions correspond to shapes shown in Figure 1.
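A minimal Python sketch of the D2 comparison follows. It uses plain normalized histograms in place of the 1D wavelet density estimator described above, so it illustrates the dissimilarity measures rather than reproducing the exact pipeline; the function names `d2_histogram` and `dissimilarities`, the bin count, and the shared distance range `d_max` are assumptions made for this example.

```python
import numpy as np

def d2_histogram(points, d_max, n_pairs=10_000, bins=64, seed=0):
    """Normalized histogram of random pairwise distances (D2 signature)."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(points), size=n_pairs)
    j = rng.integers(0, len(points), size=n_pairs)
    dist = np.linalg.norm(points[i] - points[j], axis=1)
    hist, edges = np.histogram(dist, bins=bins, range=(0.0, d_max), density=True)
    return hist, edges[1] - edges[0]

def dissimilarities(p1, p2, dx):
    """Four dissimilarities between two densities sampled on a shared grid."""
    bc = np.sum(np.sqrt(p1 * p2)) * dx        # integral of sqrt(p1 * p2)
    denom = p1 + p2
    safe = np.where(denom > 0, denom, 1.0)    # bins empty in both add zero
    return {
        "hellinger": 2.0 - 2.0 * bc,
        "bhattacharyya": 1.0 - bc,
        "chi2": float(np.sum((p1 - p2) ** 2 / safe) * dx),
        "l2": float(np.sqrt(np.sum((p1 - p2) ** 2) * dx)),
    }

# Two synthetic point sets: a circle outline and a filled square.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=2000)
circle = 5.0 * np.column_stack([np.cos(theta), np.sin(theta)])
square = rng.uniform(-5.0, 5.0, size=(2000, 2))

h_circle, dx = d2_histogram(circle, d_max=30.0)
h_square, _ = d2_histogram(square, d_max=30.0)
same = dissimilarities(h_circle, h_circle, dx)
diff = dissimilarities(h_circle, h_square, dx)
print(diff)
```

Because the Hellinger value is exactly twice the Bhattacharyya value for normalized densities, the two measures rank shapes identically.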
Performance on the MPEG7 is most commonly evalu
ated using the bullseye criterion [ ]. Each shape is used
Technical Report
Technical Report
Figure 5. Example D2 shape distributions using wavelet densities estimators..
These distributions correspond to shapes in Figure 1 from left to right. All densi
ties were estimated using a Symlet7, jo = 1.
Shape L'Ane Rouge D2 Shape Distributions
Metrics
Metrics A = 500 A = 1000
X2 59.3%
Geodesic w/ LA 81.7% 84.4% Helling
Hellinger 58.6%
Geodesic+ EDP 32.6% 18.5% Bhattar 58.6%
EDP 32.5% 18 Bhattacharyya 58.6%
EDP 32.5% 18.3% L2 56 %
L2 56.6%
Table 1. MPEG7 recognition rate. Our method Shape L'Ane Rouge out performs
D2 Shape Distributions [ ]. In our method the choice of A effects the recognition
rate. See text for explanation of metrics. (LAlinear assignment, EDPEuclidean
distance penalty).
as a query shape and the top 40 matches are retrieved from
all 1400 shapes (the test shape is not removed). For a single
query, the maximum number of correct retrievals is 20,
coinciding with the number of shapes in each category. Hence
there are a total of 28,000 possible matches, with the recognition
rate reflecting the number of correct matches divided by this
total. Table 1 lists the recognition rates using several density
similarity measures for both Shape L'Ane Rouge and D2
shape distributions. Shape L'Ane Rouge significantly
outperforms D2 shape distributions. This gives credence to the
idea of working with feature representations that mimic the
true visual properties of shapes: D2 shape distributions
represent objects using a 1D signature derived from the 2D
points, whereas Shape L'Ane Rouge represents shapes using
2D densities which are visually similar to the shapes. The
three different metrics computed for Shape L'Ane Rouge
illustrate how A impacts recognition performance. A judicious
choice for A can be made by optimizing over a training
set. The different metrics also show that the wavelet density
representation provides a rich set of features, as evidenced by
the fact that the geodesic distance (with linear assignment)
outperforms the metrics that include the Euclidean distance
penalty. Hence the sliding alone is not sufficient to
discriminate between shapes. (For high A, the total sliding penalty
dominates the second metric, giving similar performance to
the third.)
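The recognition rate described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the evaluation code used for Table 1: the function name recognition_rate, the toy features, and the toy labels are all hypothetical, and a precomputed pairwise distance matrix is assumed.

```python
def recognition_rate(dist, labels, top_k):
    """Pooled retrieval rate: correct matches / maximum possible correct matches."""
    n = len(labels)
    correct = 0
    possible = 0
    for q in range(n):
        # Rank all shapes by distance to the query; the query itself stays in.
        ranked = sorted(range(n), key=lambda j: dist[q][j])[:top_k]
        correct += sum(1 for j in ranked if labels[j] == labels[q])
        # A query can retrieve at most every member of its own category.
        category_size = sum(1 for lab in labels if lab == labels[q])
        possible += min(top_k, category_size)
    return correct / possible, possible


# Toy data (hypothetical): six shapes, two categories, one scalar feature each.
features = [0, 10, 20, 100, 110, 30]
labels = ["a", "a", "a", "b", "b", "b"]
dist = [[abs(x - y) for y in features] for x in features]

rate, possible = recognition_rate(dist, labels, top_k=3)
print(rate, possible)

# Denominator for the experiment in the text: 1400 queries x 20 per category.
mpeg7_possible = 1400 * 20
```

In the toy data the last shape lies among the features of the wrong category, so the rate falls below one. For the MPEG7 experiment the same denominator works out to 1400 x 20 = 28,000.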
Recently, methods based on hierarchical representations
[ ] have reported higher recognition rates than ours
on the MPEG7 data set. However, these methods work on
a simpler version of the problem than the one we have
addressed. They assume shapes are represented by their
boundary outlines and typically use fewer than 200 points for
the shapes. A hierarchical representation is used to capture
both global and local properties. These methods have
the drawback of requiring oriented boundary curves, whose
extraction can be a troublesome preprocessing procedure. They
also lose the descriptive power afforded by allowing arbitrary
shape topologies and unconstrained point set cardinalities.
The closest method, in terms of operating on unstructured
point sets and not restricting shape topology, is [ ], which
has a published recognition rate of 76.51% on the MPEG7
data set. Our results clearly show reasonable gains over this
method. We are still in the preliminary stages of exploiting
the full capabilities of the Shape L'Ane Rouge framework,
e.g. using multiscale representations to get more descriptive
attributes and experimenting with different wavelet
families. Since we already exceed that rate, we believe
these enhancements will improve our recognition rates
significantly without sacrificing our ease of use
and rich descriptive power.
4. Discussion
The development of robust and effective shape indexing
and retrieval mechanisms largely depends on the representation
model for the data and also the metrics used to distinguish
one observation from another. In this paper, we have
presented a novel shape representation scheme which gives
rise to a natural metric that comes directly from the
representation. Given an unstructured point set model of a shape,
our Shape L'Ane Rouge framework estimates $\sqrt{p}$, under
a wavelet expansion, directly from the point data and
recovers the probability density as $(\sqrt{p})^2$. As we illustrated,
these densities have a direct visual similarity to the original
shape. The unit integrability property of all densities
translates to a constraint on the wavelet coefficients, i.e. the
squared coefficients sum to one, see Eq. (5). Since the
densities are uniquely identified by their wavelet coefficients,
these are in effect the coordinates by which probability
densities are indexed on a unit hypersphere. Because the
densities represent the original shapes, intuitively the shapes
are also on the unit hypersphere. Arising from this
representation, we immediately gain a natural similarity measure
between shapes by computing the arc length between
probability densities on the unit hypersphere. Shape recognition
can be improved if we adjust for nonrigid differences
between a pair of shapes before computing a similarity
measure. Rather than do this in the original shape space, we
have introduced a novel way of deforming their wavelet
density representation through the use of penalized linear
assignment, allowing us to locally warp the density while
maintaining its defining integrability and positivity
properties.
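The two ideas summarized above, arc length on the unit hypersphere and sliding coefficients by penalized assignment, can be illustrated with a small sketch. It rests on several assumptions that are not taken from the paper: the names normalize, geodesic and slide are hypothetical, basis locations are 1D integers rather than 2D translation indices, the cost is assumed to be the negative inner product plus a weight times the squared distance each coefficient travels, and the assignment is solved by brute-force enumeration, which is only feasible for tiny examples. A practical system would use a polynomial-time linear assignment solver instead.

```python
import itertools
import math


def normalize(coeffs):
    """Scale coefficients so their squares sum to one (a point on the unit hypersphere)."""
    norm = math.sqrt(sum(c * c for c in coeffs))
    return [c / norm for c in coeffs]


def geodesic(a, b):
    """Arc length between two unit coefficient vectors: arccos of their inner product."""
    dot = sum(x * y for x, y in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))  # clamp against round-off


def slide(a, b, locations, weight):
    """Permute b to align with a under an ASSUMED penalized cost.

    cost = -<a, permuted b> + weight * sum of squared travel distances.
    Brute force over all permutations; illustration only.
    """
    n = len(a)
    best_perm, best_cost = None, float("inf")
    for perm in itertools.permutations(range(n)):
        # perm[i] is the index of the b-coefficient moved into slot i.
        inner = sum(a[i] * b[perm[i]] for i in range(n))
        travel = sum((locations[i] - locations[perm[i]]) ** 2 for i in range(n))
        cost = -inner + weight * travel
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return [b[j] for j in best_perm]


# Toy coefficient vectors (hypothetical): b is a shifted by one basis location.
locations = [0, 1, 2, 3]
a = normalize([3.0, 1.0, 0.0, 0.0])
b = normalize([0.0, 3.0, 1.0, 0.0])

before = geodesic(a, b)
slid = slide(a, b, locations, weight=0.01)
after = geodesic(a, slid)
print(before, after)
```

Because sliding only permutes coefficients, the squared coefficients still sum to one, so the warped vector remains a valid square-root density. With a large weight the travel penalty dominates and the coefficients stay in place.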
Our framework has several advantages over other contemporary
shape modeling and matching schemes, including:
• Each shape can have an arbitrary number of points
  without topological restrictions. This is in sharp contrast
  to methods that work only on shape silhouettes or
  are limited to only a few sample points. Hence, the
  cardinality of a shape point set is dictated by the number
  of points needed to accurately represent a shape's
  features, not by algorithmic limitations.
• Limited preprocessing is required since we directly
  take the shape points and estimate the density.
• The metric is in closed form, and when incorporating
  linear assignment our method is still computationally
  efficient enough for querying applications.
We are still in the preliminary stages of fleshing out the
capabilities of our Shape L'Ane Rouge technique. In the
immediate future, we plan to incorporate the use of
multiscale wavelet densities along with studying the effects of
multiple wavelet families. We anticipate these will provide
additional attributes for each shape which will further
increase shape discriminability and subsequently improve
recognition rates. We are also planning to investigate other
penalty terms for the linear assignment objective function
and better mechanisms for choosing A.
References
[1] J. Tangelder and R. Veltkamp, "A survey of content based 3d
shape retrieval methods," in Proceedings of the Shape
Modeling International 2004, vol. 00. IEEE Computer Society,
2004, pp. 145-156.
[2] Authors, "Maximum likelihood wavelet density estimation
with applications to image and shape matching," IEEE Trans.
Image Processing (in submission), 2007.
[3] D. Donoho, I. Johnstone, G. Kerkyacharian, and D. Picard,
"Density estimation by wavelet thresholding," Ann. Statist.,
vol. 24(2), pp. 508-539, 1996.
[4] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser, "The
Princeton shape benchmark," in Proceedings of the Shape
Modeling International 2004, vol. 00. IEEE Computer
Society, 2004, pp. 167-178.
[5] D. Luenberger, Linear and Nonlinear Programming.
Reading, MA: Addison-Wesley, 1984.
[6] Y. Rubner, C. Tomasi, and L. J. Guibas, "The earth mover's
distance as a metric for image retrieval," International
Journal of Computer Vision, vol. 40, no. 2, pp. 99-121, 2000.
[7] F. Hitchcock, "The distribution of a product from several
sources to numerous localities," J. Math. Phys., vol. 20, pp.
224-230, 1941.
[8] A. Peter and A. Rangarajan, "Shape matching using the
Fisher-Rao Riemannian metric: Unifying shape representation
and deformation," IEEE International Symposium on
Biomedical Imaging (ISBI), pp. 1164-1167, 2006.
[9] L. J. Latecki, R. Lakämper, and U. Eckhardt, "Shape
descriptors for nonrigid shapes with a single closed contour,"
in IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2000, pp. 424-429.
[10] R. Osada, T. Funkhouser, B. Chazelle, and D. Dobkin,
"Shape distributions," ACM Transactions on Graphics,
vol. 21, no. 4, pp. 807-832, 2002.
[11] F. Wang, B. Vemuri, A. Rangarajan, I. Schmalfuss, and
S. Eisenschenk, "Simultaneous nonrigid registration of
multiple point sets and atlas construction," in European
Conference on Computer Vision (ECCV), 2006, pp. 551-563.
[12] S. Penev and L. Dechevsky, "On nonnegative wavelet-based
density estimators," Journal of Nonparametric Statistics,
vol. 7, pp. 365-394.
[13] A. Srivastava, I. Jermyn, and S. Joshi, "Riemannian analysis
of probability density functions with applications in vision,"
in IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), 2007, pp. 1-8.
[14] S.-I. Amari and H. Nagaoka, Methods of Information
Geometry. American Mathematical Society, 2001.
[15] I. Daubechies, Ten Lectures on Wavelets, ser. CBMS-NSF
Reg. Conf. Series in Applied Math. SIAM, 1992.
[16] R. Beran, "Minimum Hellinger distance estimates for
parametric models," Annals of Statistics, vol. 5, no. 3,
pp. 445-463, 1977.
[17] F. Klein, "Vergleichende Betrachtungen über neuere
geometrische Forschungen," 1872.
[18] I. L. Dryden and K. V. Mardia, Statistical Shape Analysis.
Wiley, 1998.
[19] F. L. Bookstein, "Principal warps: Thin-plate splines and
the decomposition of deformations," IEEE Trans. Patt. Anal.
Mach. Intell., vol. 11, no. 6, pp. 567-585, June 1989.
[20] J. Ho, M. Yang, A. Rangarajan, and B. Vemuri, "A new affine
registration algorithm for matching 2d point sets," in
Proceedings of the Eighth IEEE Workshop on Applications of
Computer Vision, 2007, p. 25.
[21] R. Jonker and A. Volgenant, "A shortest augmenting path
algorithm for dense and sparse linear assignment problems,"
Computing, vol. 38, pp. 325-340, 1987.
[22] G. McNeill and S. Vijayakumar, "Hierarchical procrustes
matching for shape retrieval," in IEEE Conf. on Computer
Vision and Pattern Recognition (CVPR), 2006, pp. 885-894.
[23] P. Felzenszwalb and J. Schwartz, "Hierarchical matching of
deformable shapes," in IEEE Conf. on Computer Vision and
Pattern Recognition (CVPR), 2007, pp. 1-8.
[24] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and
object recognition using shape contexts," IEEE Trans. Patt.
Anal. Mach. Intell., vol. 24, no. 4, pp. 509-522, 2002.
