Irregular-Structure Tree Models for Image Interpretation

Material Information

Irregular-Structure Tree Models for Image Interpretation
TODOROVIC, SINISA ( Author, Primary )
Copyright Date:


Subjects / Keywords:
Approximation ( jstor )
Datasets ( jstor )
Graphics ( jstor )
Image classification ( jstor )
Inference ( jstor )
Learning ( jstor )
Logical givens ( jstor )
Pixels ( jstor )
Plant roots ( jstor )
Statistical models ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Sinisa Todorovic. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
Resource Identifier:
71303361 ( OCLC )


Full Text





ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Michael Nechyba for his wise and patient guidance of my research for this dissertation. As my [...] advisor, Dr. Nechyba has been directing but on no account confining my interests. I especially appreciate his readiness and expertise to help me solve numerous [...] issues. Most importantly, I am thankful for the friendship that we have [...] on this work.

Also, I thank my current advisor Dr. Dapeng Wu for [...] extra effort to help me finalize my PhD [...]. I am grateful for his invaluable pieces of advice in choosing my future research goals, as well as the practical concrete steps that he undertook to help me find a [...].

My thanks also go to Dr. Jian Li, who helped me a lot in the transition [...] in which I was [...] to change my advisor. Her research group provided a stimulating environment for me to endeavor investigating areas that are [...] the work [...] in this dissertation.

Also, I thank Dr. Antonio Arroyo, whose [...] lectures on machine intelligence have [...] me to do research in the field of machine learning. As the [...] of the Machine Intelligence Lab (MIL), Dr. Arroyo has created a warm, [...] and hard working [...] among the "[...]-ers." Thanks to him, I have decided to join the MIL, which has proved on numerous occasions to be the right decision. I thank all the members of the MIL for their friendship and support.

I thank Dr. Takeo Kanade and Dr. Andrew Kurdila for sharing their research [...] on the micro air vehicle (MAV) project with me. The multidisciplinary environment of this [...], in which I had a chance to collaborate with various researchers with [...] educational backgrounds, was a great experience for me.


TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

KEY TO ABBREVIATIONS

KEY TO SYMBOLS

ABSTRACT

1 INTRODUCTION

1.1 Part-Based Object Recognition
1.2 Probabilistic Framework
1.3 Tree-Structured Generative Models
1.4 Learning Tree Structure from Data is an NP-hard Problem
1.5 Our Approach to Image Interpretation
1.6 Contributions
1.7 Overview

2.1 Model Specification
2.2 Probabilistic Inference
2.3 Structured Variational Approximation
2.3.1 Optimization of Q(X|Z)
2.3.2 Optimization of Q(R'|Z)
2.3.3 Optimization of Q(Z)
2.4 Inference Algorithm and Bayesian Estimation
2.5 Learning Parameters of the Irregular Tree with Random Node Positions
2.6 Implementation Issues

3.1 Model Specification
3.2 Inference of the Irregular Tree with Fixed Node Positions
3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

4.1 Measuring Significance of Object Parts
4.2 Combining Object-Part Recognition Results

5 FEATURE EXTRACTION

5.1 Texture
5.1.1 Wavelet Transform
5.1.2 Wavelet Properties
5.1.3 Complex Wavelet Transform
5.1.4 Difference-of-Gaussian Texture Extraction
5.2 Color

6 EXPERIMENTS AND DISCUSSION

6.1 Unsupervised Image Segmentation Tests
6.2 Tests of Convergence
6.3 Image Classification Tests
6.4 Object-Part Recognition Strategy

7 CONCLUSION

7.1 Summary of Contributions
7.2 Opportunities for Future Work

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

Table

5-1 Coefficients of the filters used in the Q-shift DTCWT
6-1 Root-node distance error
6-2 Pixel segmentation error
6-3 Object detection error
6-4 Object recognition error
6-5 Pixel labeling error
6-6 Object recognition error for IQTV0
6-7 Pixel labeling error for IQTV0

LIST OF FIGURES

Figure

1-1 Variants of TSBNs
1-2 An irregular tree consists of a forest of subtrees
1-3 Bayesian estimation of the irregular tree
2-1 Two types of irregular trees
2-2 Pixel clustering using irregular trees
2-3 Irregular tree learned for the 4x4 image in (a)
2-4 Inference of the irregular tree given Y, R0, and Θ
3-1 [...] of candidate parents
3-2 Inference of the irregular tree with fixed node positions
3-3 Algorithm for learning the parameters of the irregular tree
4-1 For each subtree of ITV, representing an object in the 128 x 128 image
4-2 For each subtree of ITV, representing an object in the 256 x 256 image
5-1 Two levels of the DWT of a two-dimensional signal
5-2 The original image (left) and its two-scale dyadic DWT (right)
5-3 The Q-shift Dual-Tree CWT
5-4 The CWT is strongly oriented at angles ±15°, ±45°, ±75°
6-1 20 image classes in type I and II datasets
6-2 Image segmentation using ITV0
6-3 Image segmentation using ITV0: (top) dataset I images
6-4 Image segmentation by irregular trees learned using SVA
6-5 Image segmentation by irregular trees learned using SVA: (a) ITV0
6-6 Image segmentation using ITV
6-7 Comparison of inference algorithms
6-8 Typical convergence rate of the inference algorithm for ITV0 on the 128 x 128
6-9 Typical convergence rate of the inference algorithm for ITV0 on the 256 x 256
6-10 Percentage increase in log-likelihood
6-11 Comparison of classification results for various statistical models
6-12 MAP pixel labeling using different statistical models
6-13 ROC curves for the image in Fig. 6-12a with ITV0, TSBN, DRF and MRF
6-14 ROC curves for the image in Fig. 6-12a with ITV, ITV0, TSBN, and TSBNT
6-15 Comparison of two recognition strategies
6-16 Recognition results over dataset IV for IQTV0
6-17 Recognition results over dataset V for IQTV0
6-18 [...] using the part-object recognition strategy
B-1 Steps 2 and 5 in Fig. 3-2


KEY TO ABBREVIATIONS

The list shown below gives a description of the frequently used acronyms or abbreviations in this work.

B: blue channel of the RGB color space
G: green channel of the RGB color space
R: red channel of the RGB color space
IQTV: irregular tree with fixed node positions, and with observables present at all levels
IQTV0: irregular tree with fixed node positions, and with observables present only at the leaf level
ITV0: irregular tree where observables are present only at the leaf level
ITV: irregular tree where observables are present at all levels
g: normalized green channel
r: normalized red channel
CWT: Complex Wavelet Transform
DRF: Discriminative Random Field
DTCWT: Dual-Tree Complex Wavelet Transform
DWT: Discrete Wavelet Transform
EM: Expectation-Maximization algorithm
KL: Kullback-Leibler divergence
MAP: Maximum A Posteriori
MCMC: Markov Chain Monte Carlo method
ML: Maximum Likelihood
MPM: Maximum Posterior Marginal
MRF: Markov Random Field
NP: nondeterministic polynomial time
RGB: the color space that consists of red, green and blue color values
ROC: receiver operating characteristic
SVA: structured variational approximation inference algorithm
TSBN: tree-structured belief network
VA: variational approximation inference algorithm


KEY TO SYMBOLS

The list shown below gives a brief description of the [...] mathematical symbols defined in this work.

[...]: influence of observables Y on [...]
Bij: influence of the geometric [...] of the network on [...]
G: number of components in a Gaussian mixture
Hi: Shannon's entropy of node i
[...](Q, P): free energy
L: maximum number of levels in the irregular tree
[...]: set of image classes (i.e., object appearances)
[...]: conditional probability tables
[...]: approximate conditional probability tables, given Y and R0
R: positions of all nodes in the irregular tree
R': positions of non-leaf nodes in the irregular tree
R0: positions of leaf nodes in the irregular tree
V: set of all nodes in the irregular tree
V': set of all non-leaf nodes in the irregular tree
V0: set of all leaf nodes in the irregular tree
X: random vector of all [...]
Y: all observables
Z: connectivity random matrix
C: cost [...]
[...]: the set of [...] in the irregular tree with fixed node positions
Σij: covariance matrix of a relative child-parent displacement (ri - rj)
Θ: set of parameters that characterize an irregular tree
[...]: approximate covariance of ri, given that j is the parent of i, and given Y and [...]
[...]: approximate mean of ri, given that j is the parent of i, and given Y and [...]
ρ(i): [...] of an observable random vector in the image plane
ℓ: index of levels in the irregular tree
[...]: probability of a node i being the child of j
h: normalization constant
θ: set of parameters that characterize a Gaussian mixture
[...]ij: approximate probability of i being the child of j, given Y and [...]
[...]: [...] posterior that node i is labeled as image class k, given Y and R
xi: image-class of node i
[...]: image-class indicator if class k is assigned to node i
zij: connectivity indicator random variable between nodes i and j
dij: the mean of relative displacement ri - rj
ri: position of node i in the image
yρ(i): observable random vector

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

Sinisa Todorovic

Chair: Dapeng Wu
Major Department: Electrical and Computer Engineering

In this dissertation, we seek to accomplish the following related goals: (1) to [...] a unified framework to address localization, detection, and recognition of objects, as three sub-tasks of image-interpretation, and (2) to [...] a [...] and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image. The second [...] is to date an open problem in computer vision, eluding a satisfactory solution. For this purpose, we formulate object recognition as Bayesian estimation, whereby class labels with the maximum posterior probability are assigned to each pixel. To efficiently estimate the posterior distribution of image classes, we propose to model images with [...] models known as irregular trees.

The irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to [...] the optimal model structure, as well as the posterior distribution of image classes. We propose several inference algorithms as a solution to this NP-hard problem (nondeterministic polynomial time), which can be viewed as variants of the Expectation-Maximization (EM) algorithm.

After inference, the model [...] a forest of subtrees, each of which segments the image. That is, inference of model structure provides a solution to object localization and detection.

With respect to our second goal, we hypothesize that for a successful occluded-object recognition it is critical to explicitly analyze visible object parts. Irregular trees are convenient for such analysis because the treatment of object parts represents [...] a particular interpretation of the tree/subtree structure. We analyze the significance of irregular-tree nodes, representing object parts, with respect to recognition of an object as a whole. This significance is then exploited toward the ultimate object recognition.

Experimental results demonstrate that irregular trees more accurately model images than their fixed-structure counterparts, quad-trees. Also, the experiments reported herein show that our explicit treatment of object parts results in an improved recognition performance, as compared to the strategies in which object components are not explicitly accounted for.


CHAPTER 1
INTRODUCTION

Image interpretation is a difficult challenge that has long confronted the computer-vision community. A number of factors contribute to the complexity of this problem. The most critical is the inherent uncertainty in how the observed visual evidence in images should be attributed to infer object types and their relationships. In addition to video noise, there are various sources of this uncertainty, including variations in camera quality and position, wide-ranging illumination conditions, extreme scene diversity, and the randomness of object appearances, clutter and locations in scenes.

One of the critical hindrances to successful image interpretation is that objects may occlude each other in a complex scene. In the literature, the initial research on the interpretation of scenes with occlusions appeared in the early nineties. However, in the last decade a relatively small volume of related literature has been published. In fact, the majority of recently proposed vision systems are not directly aimed at solving the problem of occluded-object recognition; experiments on images with occlusions are reported as a side result only, to illustrate the versatility of those systems. This suggests that recognition of partially occluded objects is an open problem in computer vision, which motivates us to seek its solution in this dissertation.

In the initial work, local features (e.g., points, line and curve segments) are used to represent objects, allowing the unoccluded features to be matched with object features by computing a scalar measure of model fit [1,2,3]. The unmatched scene features are modeled as spurious features, and the unmatched object features indicate the occluded part of the object. The matching score is either the number of matched object features or the sum of a Gaussian-weighted matching error. The main limitation of these approaches is that they do not account for the spatial correlation among occlusions.
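
As a rough illustration of the two matching scores mentioned above, the following sketch counts the matched object features and also sums a Gaussian weight of each matching error. The function name `matching_scores`, the nearest-neighbor matching rule, the tolerance `tol`, and the width `sigma` are assumptions made here for illustration; they are not taken from the systems cited in [1,2,3].

```python
import math

def matching_scores(object_pts, scene_pts, tol=2.0, sigma=1.0):
    """Return (matched-feature count, Gaussian-weighted score).

    object_pts, scene_pts: lists of (x, y) feature locations.
    tol and sigma are hypothetical parameters of this sketch.
    """
    if not scene_pts:
        return 0, 0.0
    count = 0
    weighted = 0.0
    for ox, oy in object_pts:
        # Assumed rule: an object feature is matched to its nearest scene feature.
        nearest = min(math.hypot(ox - sx, oy - sy) for sx, sy in scene_pts)
        if nearest <= tol:
            count += 1
            # Gaussian weight of the matching error (1.0 for a perfect match).
            weighted += math.exp(-nearest ** 2 / (2.0 * sigma ** 2))
    return count, weighted
```

Object features that find no scene feature within `tol` contribute to neither score; in the terminology above they would be treated as the occluded part of the object. Note that each feature is scored independently, which is exactly the limitation stated above: the sketch carries no notion of spatial correlation among occlusions.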

Statistical approaches to occluded-object recognition have also been reported in the literature. For instance, Wells [4], and Ying and Castanon [5] propose probabilistic models to characterize scene features and the correspondence between scene and object features. The authors model both object-feature uncertainty and the probability that the object features are occluded in the scene. They introduce two statistical models for occlusion. One model assumes that each feature can be occluded independently of whether any other features are occluded, whereas the second model accounts for the spatial correlation to represent the extent of occlusion. The spatial correlation is computed using a Markov Random Field (MRF) model with a Gibbs distribution [6]. The main drawback of these systems is a prohibitive computational load; the run-time of these algorithms is exponential in the number of objects to be recognized.

Other related work exploits auxiliary information provided, for example, by image sequences or stereo views of the same scene [7,8,9,10,11,5], where occlusions are transitory. Since this information in general may not be available, and/or occlusions may remain permanent, in our approach we do not use the strategies of these systems.

A review of the related literature also suggests that the majority of vision systems are designed to deal with only one constrained vision task, such as image segmentation [10, 11, 5]. However, to conduct image interpretation, as is our goal, it is necessary to perform three related tasks: (1) localization, (2) detection (also called image segmentation), and (3) ultimate recognition of object appearances (also called image classification). Further, in many systems in which the three sub-tasks are addressed, this is not done in a unified manner. As a drawback, the architecture of such a system comprises a serial connection of separate modules, without any feedback on the accuracy of the ultimate recognition. Moreover, vision systems are typically designed to recognize only a specific instance of object classes appearing in the image (e.g., a face), which, in turn, is assumed dissimilar to other objects in the image. However, the assumption of uniqueness of the target class may not be appropriate in many settings. Also, the success of these systems usually depends on ad hoc fine-tuning of the feature-extraction methods and the system's parameters, optimized for that unique target class. With current demands to design systems capable of analyzing thousands of image classes simultaneously, it would be difficult to generalize the outlined approaches.


The small volume of published research addressing occlusions in images suggests that the problem is not fully examined. Also, the drawbacks of the above systems (namely, constrained goals and settings of operation, poor spatial modeling of occlusion, and prohibitive computational load) motivated us to conduct the research reported herein. Our motivation is that most object classes seem to be naturally described by a few characteristic parts or components and their geometrical relation. We hypothesize that it is not the percentage of occlusion that is critical for object recognition, but rather which object parts are occluded. Not all components of an object are equally important for its recognition, especially when that object is partially occluded. Given two similar objects in the image, the visible parts of one object may mislead the algorithm to recognize it as its counterpart. Therefore, careful consideration should be given to the analysis of detected visible object parts. One of the benefits of such analysis is the flexibility to develop various recognition strategies that weigh the information obtained from the detected object parts more judiciously. In the following section, we review some of the reported part-based object-recognition strategies.

1.1 Part-Based Object Recognition

Recently, there has been a flurry of research related to part-based object recognition. For example, Mohan et al. [12] use separate classifiers to detect heads, arms, and legs of people in an image, and a final classifier to decide whether a person is present. However, the approach requires object parts to be manually defined and separated for training the individual part classifiers. To build a system that is easily extensible to deal with different objects, it is important that the part selection procedure be automated. One approach in this direction is developed by Weber et al. [13,14]. The authors assume that an object is composed of parts and shape, where parts are image patches, which may be detected and characterized by appropriate detectors, and shape describes the geometry of the mutual position of the parts in a way that is invariant with respect to rigid and, possibly, affine transformations. The authors propose a joint probability density over part appearances and shape that models the object class. This framework is appealing in that it naturally allows for parts of different sizes and resolutions. However, due to computational issues, to learn the joint probability density, the authors heuristically choose a small number of parts for each object class, rendering the density unreliable in the case of large variations across [...].


Probabilistic detection of object parts has also been reported. For instance, Heisele et al. [15] propose to learn object components from a set of examples based on their discriminative power, and their robustness against pose and illumination changes. For this purpose, they use Support Vector Machines. Also, Felzenszwalb and Huttenlocher [16] represent an object by a collection of parts arranged in a deformable configuration. In their approach, the appearance of each part is modeled separately by Gaussian-mixture distributions, and the deformable configuration is represented by spring-like connections between pairs of parts. The main problem of the mentioned approaches is that they lack the analysis of object parts through scales. It is assumed that parts cannot contain other sub-parts, and that objects are unions of mutually exclusive components, which is hard to justify for more complex object classes.

To address the analysis of object parts through scales, Schneiderman and Kanade [17] propose a trainable multi-stage object detector composed of classifiers, each making a decision about whether to cease evaluation, labeling the input as non-object, or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image.

The aforementioned approaches are not suitable for recognition of a large number of object classes. As the number of classes increases, there is a combinatorial explosion of the number of their parts (i.e., image patches) that need to be evaluated by appropriate detectors.


In this dissertation, we seek a solution to the outlined problems. Our goal is to design a vision system that would analyze multiple object classes through their constituent, "meaningful" parts at a number of different resolutions. To this end, we resort to a probabilistic framework, as discussed in the following section.

1.2 Probabilistic Framework

We formulate image interpretation as inference of a posterior distribution over pixel random fields for a given image. Once the posterior distribution of image classes is inferred, each pixel can be labeled through Bayesian estimation (e.g., maximum a posteriori, MAP). Within this framework, it is necessary to specify the following:

1. The probability distribution of image classes over pixel random fields,

2. The inference algorithms for computing the posterior distribution of image classes,

3. Bayesian estimation for ultimate pixel labeling, that is, object recognition.
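
The third step can be made concrete with a small sketch. Assuming the posterior distribution of image classes has already been inferred and stored per pixel (the nested-list layout `posterior[row][col][k]` and the function name `map_labels` are assumptions of this illustration, not notation from this work), MAP labeling reduces to picking, for each pixel, the class with the largest posterior probability.

```python
def map_labels(posterior):
    """Assign to each pixel the class index with maximum posterior probability.

    posterior[r][c] is a list of class probabilities for pixel (r, c);
    this storage layout is a hypothetical choice for the sketch.
    """
    return [
        [max(range(len(probs)), key=probs.__getitem__) for probs in row]
        for row in posterior
    ]

# Example: a 1 x 2 image with two image classes.
labels = map_labels([[[0.7, 0.3], [0.2, 0.8]]])
```

The sketch covers only the final estimation step; the modeling and inference steps (items 1 and 2) are what determine the values in `posterior`.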

Our principal challenge lies in choosing a statistical model for specifying the probability distribution of image classes, since this choice conditions the formulation of inference and Bayesian estimation. A suitable model should be computationally manageable, and sufficiently expressive to represent a wide range of patterns in images. A review of the literature offers four broad classes of models [18]. The descriptive models are constructed based on statistical descriptions of image ensembles with variables only at one level (e.g., [19, 20]). The pseudo-descriptive models reduce the computational cost of descriptive models by imposing partial (or even linear) order among random variables (e.g., [21,22]). The generative models consist of observable and hidden variables, where hidden variables represent a finite number of bases generating an image (e.g., [23, 24]). The discriminative models directly encode the posterior distribution of hidden variables given observables (e.g., [25,26]).

The available models differ in structural complexity and difficulty of inference. At one end lie descriptive models, which build statistical descriptions of image ensembles only at the observable (i.e., pixel) level. Other modeling paradigms (i.e., generative, discriminative) impose varying levels of structure through the introduction of hidden variables. However, no principled formulation exists, as of yet, to suggest one approach superior to the others. Therefore, our choice of model is guided by the goal to interpret scenes with partially occluded, alike objects. We seek a model that offers a viable means of recognizing partially occluded objects through recognition of their visible constituent parts. Thus, a prospective model should allow for analysis of object parts towards recognition of objects as a whole.

To alleviate the computational complexity arising from the treatment of multiple object-parts of multiple objects in images, we seek a model that is capable of modeling both whole objects and their sub-parts in a unified manner. That is, a candidate model must be expressive enough to capture component-subcomponent relationships among regions in an image. To accomplish this, it is necessary to analyze pixel neighborhoods of varying size. The literature abounds with reports on successful applications of multiscale statistical models for this purpose [27,28,29,30,31,32]. Following these trends, we choose the irregular tree-structured belief network, or irregular tree for short. Our choice is directly driven by our image-interpretation strategy and goals, and appears better suited than alternative statistical approaches. Descriptive models lack the necessary structure for the component-subcomponent representation we seek to exploit. Discriminative approaches directly model the posterior distribution of hidden variables given observables. Consequently, they lose the convenience of assigning physical meaning to the statistical parameters of the model. In contrast, irregular trees can detect objects and their parts simultaneously, as discussed in the following chapters.

Before we continue to present our approach to image interpretation, we give a brief overview of tree-structured generative models in the following section.

1.3 Tree-Structured Generative Models

Recently, there has been a flurry of research in the field of tree-structured generative models, also known as tree-structured belief networks (TSBNs) [27,33,28,29,30,31,32]. The models provide a systematic way to describe random processes/fields and have extremely efficient and statistically optimal inference algorithms. Tree-structured belief networks are characterized by a fixed balanced tree structure of nodes representing hidden (latent) and observable random variables. We focus on TSBNs whose hidden variables take discrete values, though TSBNs can model even continuously valued Gaussian processes [34, 35]. The edges of TSBNs represent parent-child (Markovian) dependencies between neighboring layers of hidden variables, while hidden variables belonging to the same layer are conditionally independent, as depicted in Figure 1-1. Note that observables depend solely on their corresponding hidden variables. Observables are either present at the finest level only, or could be propagated up the tree, as dictated by the design choices related to image processing. TSBNs have efficient linear-time inference algorithms, of which, in the graphical-models literature, the best-known is belief propagation [36, 37, 38]. Cheng and Bouman [29] have used TSBNs for multiscale document segmentation; Kumar and Hebert [39] have employed TSBNs for segmentation of man-made structures in natural scene images; and Schneider et al. [40] have used TSBNs for simultaneous image denoising and segmentation. All the aforementioned examples demonstrate the powerful expressiveness of TSBNs and the efficiency of their inference algorithms, which is critically important for our purposes.
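
To indicate why inference in a TSBN is linear in the number of nodes, the sketch below implements the leaf-to-root (upward) half of sum-product belief propagation on a balanced quad-tree with observables at the finest level only. It is a minimal sketch under simplifying assumptions that are ours, not the cited authors': a single transition table `trans` shared by all edges, leaf likelihoods supplied directly as numbers, and no rescaling of messages (which a practical implementation would need to avoid numerical underflow).

```python
def upward_pass(leaf_lik, trans, prior):
    """Likelihood of the leaf observations under a balanced quad-tree.

    leaf_lik[r][c][k]: P(observation at leaf (r, c) | leaf class k),
                       given on a 2^L x 2^L grid.
    trans[p][k]:       P(child class k | parent class p), shared by all edges
                       (an assumption of this sketch).
    prior[k]:          P(root class k).
    """
    num_classes = len(prior)
    level = leaf_lik
    # Each pass merges 2 x 2 blocks of children into one parent message,
    # so every node of the tree is visited exactly once.
    while len(level) > 1:
        half = len(level) // 2
        parents = []
        for r in range(half):
            row = []
            for c in range(half):
                children = [level[2 * r + dr][2 * c + dc]
                            for dr in (0, 1) for dc in (0, 1)]
                message = []
                for p in range(num_classes):
                    prod = 1.0
                    for beta in children:
                        # Children are conditionally independent given the parent.
                        prod *= sum(trans[p][k] * beta[k]
                                    for k in range(num_classes))
                    message.append(prod)
                row.append(message)
            parents.append(row)
        level = parents
    root = level[0][0]
    return sum(prior[k] * root[k] for k in range(num_classes))
```

A complete belief-propagation scheme would follow this with a root-to-leaf pass to obtain the posterior of every hidden variable; the upward pass alone already shows the fixed 2 x 2 parent-child grouping that defines the regular quad-tree structure.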

In spite of these attractive properties, the fixed regular structure of nodes in the TSBN gives rise to "blocky" estimates. The pre-defined tree structure fails to adequately represent the immense variability in size and location of different objects and their subcomponents in images. In the literature, there are several approaches to alleviate this problem. Irving et al. [28] have proposed an overlapping tree model, where distinct nodes correspond to overlapping parts in the image. Li et al. [41] have discussed two-dimensional hierarchical models where nodes are dependent both at any particular layer through a Markov-mesh and across resolutions. In both approaches segmentation results are superior to those when standard TSBNs are used, because the descriptive component of the models is improved at increased computational cost. Ultimately, however, these approaches do not deal with the source of the blockiness, namely, the orderly structure of TSBNs.

Not until recently has the research on irregular structures been initiated. Konen et al. [42] have proposed a flexible neural mechanism for invariant pattern recognition based on correlated neuronal activity and the self-organization of dynamic links in neural networks. Also, Montanvert et al. [43], and Bertolino and Montanvert [44] have explored irregular multiscale tessellations that adapt to image content. We join these research efforts building on the work of Adams et al. [45], Adams [46], Storkey [47], and Storkey and Williams [48], by considering the irregular-structured tree belief network.

(a) (b)
Figure 1-1: Variants of TSBNs: (a) observables (black) at the lowest layer only; (b) ob-
servables (black) at all layers; white nodes represent hidden random variables, connected
in a balanced quad-tree structure.



Figure 1-2: An irregular tree consists of a forest of subtrees, each of which segments the
image into regions, marked by distinct shading; round- and square-shaped nodes indicate
hidden and observable variables, respectively; triangles indicate roots.

In the irregular tree, as in TSBNs, nodes represent random variables, and arcs between

them model causal (Markovian) dependence assumptions, as illustrated in Figure 1-2. The

irregular tree specifies probability distributions over both its structure and image classes.

It is this distribution over tree structures that mitigates the above cited problems with TSBNs.


1.4 Learning Tree Structure from Data is an NP-hard Problem

In order to fully characterize the irregular tree (and any graphical model, for that

matter), it is necessary to learn both the graph topology (structure) and the parameters

of transition probabilities between connected nodes from training data. Usually, for this

purpose, one maximizes the likelihood of the model over training data, while at the same

time minimizing the complexity of model structure. Current methods are successful at

learning both the structure and parameters from complete data. Unfortunately, when the

data are incomplete (i.e., some random variables are hidden), optimizing both the structure

and parameters becomes NP-hard, that is, at least as hard as any problem solvable in nondeterministic polynomial time [49,50].

The principal contribution of this dissertation is that we propose a solution to the

NP-hard problem of model-structure estimation. In our approach, we use a variant of the

Expectation-Maximization (EM) algorithm [51,52], to facilitate efficient search over a large

number of candidate structures. In particular, the EM procedure iteratively improves its

current choice of parameters by using the following two steps. In the Expectation step,

current parameters are used for computing the expected value of all the statistics needed to

evaluate the current structure. That is, the missing data (hidden variables) are completed

by their expected values. In the Maximization step, we replace current parameters with

those that maximize the likelihood over the complete data. This second step is essentially

equivalent to learning model structure and parameters from complete data, and, hence, can

be done efficiently [50, 38, 49].
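
The two EM steps described above can be illustrated on a far simpler problem than structure search. The following is a minimal sketch, assuming a one-dimensional mixture of two unit-variance Gaussians with unknown means and weights; the function name `em_two_gaussians` and its arguments are hypothetical and are not part of the dissertation's algorithm, which additionally searches over tree structures.

```python
import math


def em_two_gaussians(data, mu=(-1.0, 1.0), iters=50):
    """EM for a 1-D mixture of two unit-variance Gaussians (illustrative only)."""
    mu = list(mu)
    w = [0.5, 0.5]
    for _ in range(iters):
        # E-step: complete the hidden component memberships by their
        # expected values under the current parameters.
        resp = []
        for x in data:
            p = [w[k] * math.exp(-0.5 * (x - mu[k]) ** 2) for k in range(2)]
            s = p[0] + p[1]
            resp.append([p[0] / s, p[1] / s])
        # M-step: replace the parameters with those that maximize the
        # likelihood of the completed data.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / len(data)
            mu[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
    return mu, w
```

For two well-separated clusters, the estimated means settle near the cluster centers after a handful of iterations.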

In the incomplete-data case, a local change in structure of one part of the tree may

lead to a structure change in another part of the model. Thus, the available methods for

structure estimation evaluate all the neighbors (e.g., networks that differ by a few local

changes) of each candidate they visit [53]. The novel idea of our approach is to perform a

search for the best structure within EM. In each iteration step, our procedure attempts to

find a better network structure, by computing the expected statistics needed for evaluation

of alternative structures. In contrast to the available approaches, the EM-based structure
search makes significant progress in each iteration. As we show through experimental
validation, our procedure requires relatively few EM iterations to learn non-trivial tree
structures.


The outlined image modeling constitutes the core of our approach to image interpre-

tation, which is discussed in the following section.

1.5 Our Approach to Image Interpretation

We seek to accomplish the following related goals: (1) to find a unifying framework

to address localization, detection, and recognition of objects, as three sub-tasks of image-

interpretation, and (2) to find a computationally efficient and reliable solution to recognition

of multiple, partially occluded, alike objects in a given single image. For this purpose, we

formulate object recognition as the Bayesian estimation problem, where class labels are

assigned to pixels by minimizing the expected value of a suitably specified cost function.

This formulation requires efficient estimation of the posterior distribution of image classes

(i.e., objects), given an image. To this end, we resort to directed graphical models, known

as irregular trees [54,55,46,47,48,45]. As discussed in Section 1.3, the irregular tree specifies

probability distributions over both its structure and image classes. This means that, for

each image, it is necessary to infer the optimal model structure, as well as the posterior

distribution of image classes. By utilizing the Markov property of the irregular tree, we are

in a position to reduce computational complexity of the inference algorithm, and, thereby,

to efficiently solve our Bayesian estimation problem.

After inference, the model represents a forest of sub-trees, each of which segments the

image. More precisely, leaf nodes that are descendants down the subtree of a given root form

the image region characterized by that root, as depicted in Fig. 1-2. These segmented image

regions can be interpreted as distinct object appearances in the image. That is, inference

of irregular-tree structure provides a solution to localization and detection. Moreover, in

inference, we also derive the posterior distribution of image classes over leaf nodes. In order

to classify the segmented image regions as a whole, we perform majority voting over the

maximum a posteriori (MAP) classes of leaf nodes. In this fashion, we accomplish our first goal.
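
The region-level classification described above can be sketched as follows. This is a minimal illustration, assuming each leaf node carries a list of posterior class probabilities; the names `map_label` and `classify_region` are hypothetical.

```python
from collections import Counter


def map_label(posterior):
    """Index of the most probable class for one leaf node (its MAP class)."""
    return max(range(len(posterior)), key=lambda k: posterior[k])


def classify_region(leaf_posteriors):
    """Majority vote over the MAP classes of the leaves under one root."""
    votes = Counter(map_label(p) for p in leaf_posteriors)
    return votes.most_common(1)[0][0]
```

For example, three leaves with posteriors `[0.9, 0.1]`, `[0.4, 0.6]`, and `[0.7, 0.3]` yield two votes for class 0 and one for class 1, so the region is labeled as class 0.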


With respect to our second goal, we hypothesize that the critical factor in a successful

occluded-object recognition should be the analysis of visible object parts, which, as discussed
before, usually induces prohibitive computational cost. To account explicitly for

object parts at various scales, we utilize the Markovian property of irregular trees, which

lends itself as a natural solution. Since each root determines a subtree whose leaf nodes form

a detected object, we can assign physical meaning to roots as representing whole objects.

Also, each descendant of the root down the subtree can be interpreted as the root of another

subtree whose leaf nodes cover only a part of the object. Thus, roots' descendants can be

viewed as object parts at various scales. Therefore, within the irregular-tree framework, the

treatment of object parts represents merely a particular interpretation of the tree/subtree structure.


To reduce the complexity of interpreting all detected object sub-parts, we propose to
analyze the significance of object components (i.e., irregular-tree nodes) with respect to
recognition of objects as a whole. After Bayesian estimation of the irregular-tree structure
for a given image, we first find the set of most significant irregular-tree nodes. Then, these
selected significant nodes are treated as new roots of subtrees. Finally, we conduct MAP
classification and majority voting over the selected image regions, descending from the
selected significant nodes, as illustrated in Fig. 1-3.

1.6 Contributions

Below, we outline the main contributions of this dissertation.


optimize structure find "significant" nodes classify selected regions

Figure 1-3: Bayesian estimation of the irregular tree along with the analysis of significant
tree nodes constitute our approach to recognition of partially occluded, alike objects;
shading indicates the two distinct sub-trees under the two "significant" nodes.

We propose an EM-like algorithm for learning a graphical model, where both the model
structure and its distributions are learned simultaneously from given data. The algorithm

represents a stage-wise solution to the learning problem known to be NP-hard. While we

use the algorithm for learning irregular trees, its generalization to any generative model is


A critical part of this learning algorithm is inference of the posterior distribution of
image classes given the data. As is the case for many complex-structure models, exact
inference for irregular trees is intractable. To overcome this problem, we resort to a variational
approximation approach. We assume that there are averaging phenomena in irregular trees

that may render a given set of variables in the model approximately independent of the rest

of the network. Thereby, we derive the Structured Variational Approximation algorithm

that advances existing methods for inference.

In order to avoid variational approximation in inference, we propose two novel archi-

tectures and their inference algorithms within the irregular-tree framework. Being simpler,

these models allow for exact inference. Moreover, empirically, they exhibit higher accuracy

in modeling images than irregular-tree-like models proposed in prior work [45, 46, 47, 48].

Along with architectural novelties, we also introduce multi-layered data into the model,
an approach that has been extensively investigated in fixed-structure quad-trees [29,33].
The proposed quad-trees have proved rather successful for various applications including
image denoising, classification, and segmentation. Hence, it is important to develop a

similar formulation for irregular trees.

We develop a novel approach to object recognition, in which object parts are explicitly
analyzed in a computationally efficient manner. As a major theoretical contribution, we

define the measure of cognitive significance of object details. The measure provides for a

principled algorithm that combines detected object parts toward recognition of an object

as a whole.

Finally, we report results of experiments conducted on a wide variety of image datasets,

which characterize the proposed models and inference algorithms, and validate our approach

to image interpretation.

1.7 Overview

The remainder of the dissertation is organized as follows.

In Chapter 2, we specify two architectures of the irregular-tree model, and derive

inference algorithms for them. The architectures differ in the treatment of observable

random variables. We also discuss learning of the model parameters. Detailed derivation

of the inference algorithm is given in Appendix A.

Next, in Chapter 3, we specify yet another two architectures of the irregular-tree model,

for which it is possible to simplify the inference algorithm, as compared to that discussed

in Chapter 2. We deliberate the probabilistic inference and learning algorithms for the


Further, in Chapter 4, we propose a measure of significance of object parts. This

measure ranks object components with respect to the entropy over all image classes (i.e.,

objects). To incorporate the information of this analysis into the MAP classification, we

devise a greedy algorithm, which we refer to as object-part recognition.

The extraction of image features, which we use in our experiments, is thoroughly

discussed in Chapter 5. Then, in Chapter 6, we report performance results of different

irregular-tree architectures on a large number of challenging images with partially occluded,

alike objects.

Finally, in Chapter 7, we summarize the major contributions of the dissertation, and

conclude with remarks on future research.


2.1 Model Specification

Irregular trees are directed, acyclic graphs with two disjoint sets of nodes representing

hidden and observable random vectors. Graphically, we represent all hidden variables as

round-shaped nodes, connected via directed edges indicating Markovian dependencies, while

observables are denoted as rectangular-shaped nodes, connected only to their corresponding

hidden variables, as depicted in Fig. 2-1. Below, we first introduce nodes characterized by

hidden variables.

There are $V$ round-shaped nodes, organized in hierarchical levels, $V^\ell$, $\ell=\{0, 1, ..., L-1\}$,
where $V^0$ denotes the leaf level, and $V' \triangleq V \setminus V^0$. The number of round-shaped nodes is identical
to that of the corresponding quad-tree with $L$ levels, such that $|V^\ell| = |V^{\ell-1}|/4 = \dots = |V^0|/4^\ell$.
Connections are established under the constraint that a node at level $\ell$ can become a root,
or it can connect only to the nodes at the next level, $\ell+1$. The network connectivity is
represented by random matrix $Z$, where entry $z_{ij}$ is an indicator random variable, such
that $z_{ij}=1$ if $i \in V^\ell$ and $j \in \{0, V^{\ell+1}\}$ are connected. $Z$ contains an additional zero ("root")


(a) (b)

Figure 2-1: Two types of irregular trees: (a) observable variables present at the leaf level
only; (b) observable variables present at all levels; round- and square-shaped nodes indicate
hidden and observable random variables; triangles indicate roots; unconnected nodes in this
example belong to other subtrees; each subtree segments the image into regions marked by
distinct shading.

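column, where entries $z_{i0}=1$ if $i$ is a root. Since each node can have only one parent, a realization
of $Z$ can have at most one entry equal to 1 in each row. We define the distribution
over connectivity as

$$P(Z) \triangleq \prod_{\ell=0}^{L-1} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} \left[\gamma_{ij}\right]^{z_{ij}} \tag{2.1}$$

where $\gamma_{ij}$ is the probability of $i$ being the child of $j$, subject to $\sum_{j \in \{0, V^{\ell+1}\}} \gamma_{ij} = 1$.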
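
The level sizes and the connectivity prior of Eq. (2.1) can be sketched in a few lines. This is a minimal illustration under stated assumptions: nodes are identified by arbitrary hashable ids, the id `0` is reserved for the "root" slot, and each node stores a dictionary of connection probabilities over its candidate parents. The names `level_sizes` and `sample_parents` are hypothetical.

```python
import random


def level_sizes(num_leaves, num_levels):
    """|V^l| = |V^0| / 4**l for l = 0, ..., L-1 (node counts of a quad-tree)."""
    return [num_leaves // 4 ** l for l in range(num_levels)]


def sample_parents(gamma, rng=random):
    """Draw one realization of the connectivity: a single parent per node.

    gamma[i] maps each candidate parent j (0 means "i becomes a root") to the
    probability of i being the child of j; each row is assumed to sum to 1.
    """
    parents = {}
    for i, row in gamma.items():
        candidates = list(row)
        weights = [row[j] for j in candidates]
        parents[i] = rng.choices(candidates, weights=weights, k=1)[0]
    return parents
```

Because exactly one parent is drawn per node, every sampled realization respects the constraint that a row of $Z$ has at most one entry equal to 1.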
Further, each round-shaped node $i$ (see Fig. 2-1) is characterized by random position
$r_i$ in the image plane. The distribution of $r_i$ is conditioned on the position of its parent $r_j$ as

$$P(r_i | r_j, z_{ij}=1) \triangleq \frac{\exp\left(-\frac{1}{2}(r_i - r_j - d_{ij})^T \Sigma_{ij}^{-1} (r_i - r_j - d_{ij})\right)}{2\pi |\Sigma_{ij}|^{1/2}} \tag{2.2}$$

where $\Sigma_{ij}$ is a diagonal matrix that represents the order of magnitude of object size, and parameter
$d_{ij}$ is the mean of relative displacement $(r_i - r_j)$. Storkey and Williams [48] set $d_{ij}$
to zero, which favors undesirable positioning of children and parent nodes at the same locations.
Our experiments indicate that this may seriously degrade the image-modeling capabilities of
irregular trees; hence, some nonzero relative displacement $d_{ij}$ needs to be accounted
for. For roots $i$, we have $P(r_i | r_0, z_{i0}=1) \triangleq \exp\left(-\frac{1}{2}(r_i - d_i)^T \Sigma_i^{-1} (r_i - d_i)\right) / (2\pi |\Sigma_i|^{1/2})$. The
joint probability of $R \triangleq \{r_i | \forall i \in V\}$ is given by

$$P(R|Z) \triangleq \prod_{i,j \in V} \left[P(r_i | r_j, z_{ij})\right]^{z_{ij}} \tag{2.3}$$
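
To make the role of the displacement $d_{ij}$ concrete, the following sketch evaluates the logarithm of the two-dimensional Gaussian in Eq. (2.2) for a diagonal covariance. The function name and the tuple-based representation of positions are assumptions made for illustration only.

```python
import math


def log_position_density(r_i, r_j, d_ij, sigma_diag):
    """log P(r_i | r_j, z_ij = 1) for a 2-D Gaussian with diagonal covariance.

    The mean of r_i is r_j + d_ij; sigma_diag holds the two diagonal
    entries of Sigma_ij.
    """
    quad = 0.0
    for a in range(2):
        diff = r_i[a] - r_j[a] - d_ij[a]
        quad += diff * diff / sigma_diag[a]
    det = sigma_diag[0] * sigma_diag[1]
    return -0.5 * quad - math.log(2.0 * math.pi * math.sqrt(det))
```

With a nonzero $d_{ij}$, the density peaks when the child sits at the parent position shifted by $d_{ij}$, rather than on top of the parent.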

At the leaf level, $V^0$, we fix node positions $R^0$ to the locations of the finest-scale observables,
and then use $P(Z, R' | R^0)$ as the prior over positions and connectivity, where
$R^0 \triangleq \{r_i | \forall i \in V^0\}$, and $R' \triangleq \{r_i | \forall i \in V \setminus V^0\}$.

Next, each node $i$ is characterized by an image-class label $x_i$ and an image-class indicator
random variable $x_i^k$, such that $x_i^k=1$ if $x_i=k$, where $k$ is a label taking values in the finite
set $M$. Thus, we assume that the set $M$ of unknown image classes is finite. The label $k$ of
node $i$ is conditioned on image class $l$ of its parent $j$ and is given by conditional probability
tables $P_{ij}^{kl}$. For roots $i$, we have $P(x_i^k | x_0^l, z_{i0}=1) \triangleq P(x_i^k)$. Thus, the joint probability of
$X \triangleq \{x_i^k | i \in V, k \in M\}$ is given by

$$P(X|Z) = \prod_{i,j \in V} \prod_{k,l \in M} \left[P_{ij}^{kl}\right]^{x_i^k x_j^l z_{ij}} \tag{2.4}$$

Finally, we introduce nodes that are characterized by observable random vectors representing
image texture and color cues. Here, we make a distinction between two types of
irregular trees. The model where observables are present only at the leaf level is referred
to as $IT_{V^0}$; the model where observables are present at all levels is referred to as $IT_V$. To
clarify the difference between the two types of nodes in irregular trees, we index observables
with respect to their locations in the data structure (e.g., wavelet dyadic squares), while
hidden variables are indexed with respect to a node index in the graph. This generalizes
the correspondence between hidden and observable random variables of the position-encoding
dynamic trees [48]. We define the position of an observable, $\rho(i)$, to be equal to the center
of mass of the $i$-th dyadic square at level $\ell$ in the corresponding quad-tree with $L$ levels:

$$\rho(i) \triangleq [(n+0.5)2^\ell \;\; (m+0.5)2^\ell]^T, \quad \forall i \in V^\ell, \; \ell = \{0, ..., L-1\}, \; n, m = 1, 2, ... \tag{2.5}$$

where $n$ and $m$ denote the row and column of the dyadic square at scale $\ell$ (e.g., for wavelet
coefficients). Clearly, other application-dependent definitions of $\rho(i)$ are possible. Note
that while the $r$'s are random vectors, the $\rho$'s are deterministic values fixed at locations
where the corresponding observables are recorded in the image. Also, after fixing $R^0$ to the
locations of the finest-scale observables, we have $\forall i \in V^0$, $r_i = \rho(i)$. The definition, given by
Eq. (2.5), holds for $IT_{V^0}$, as well, for $\ell=0$.
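
The position of an observable in Eq. (2.5) is a simple function of the row, column, and level of its dyadic square. The sketch below is illustrative; the function name is hypothetical, and the origin of the row and column indices is left to the caller, since the indexing convention is an assumption here.

```python
def observable_position(n, m, level):
    """Center of the dyadic square in row n, column m at the given level,
    following the form of Eq. (2.5): ((n + 0.5) * 2**level, (m + 0.5) * 2**level)."""
    scale = 2 ** level
    return ((n + 0.5) * scale, (m + 0.5) * scale)
```

At level 0 the squares are single pixels, so consecutive positions are one pixel apart; at level 1 each square covers 2x2 pixels, so consecutive positions are two pixels apart.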

For both types of irregular trees, we assume that observables $Y \triangleq \{y_{\rho(i)} | \forall i \in V\}$ at locations
$\rho \triangleq \{\rho(i) | \forall i \in V\}$ are conditionally independent given the corresponding $x_i^k$:

$$P(Y | X, \rho) = \prod_{i \in V} \prod_{k \in M} \left[P(y_{\rho(i)} | x_i^k=1, \rho(i))\right]^{x_i^k} \tag{2.6}$$

where for $IT_{V^0}$, $V^0$ should be substituted for $V$. The likelihoods $P(y_{\rho(i)} | x_i^k=1, \rho(i))$ are
modeled as mixtures of Gaussians: $P(y_{\rho(i)} | x_i^k=1, \rho(i)) \triangleq \sum_{g=1}^{G_k} \pi_k(g) N(y_{\rho(i)}; \nu_k(g), \Sigma_k(g))$.
For large $G_k$, a Gaussian-mixture density can approximate any probability density [56].
In order to avoid the risk of overfitting the model, we assume that the parameters of the
Gaussian mixture are equal for all nodes. The Gaussian-mixture parameters can be grouped
in the set $\theta \triangleq \{G_k, \{\pi_k(g), \nu_k(g), \Sigma_k(g)\}_{g=1}^{G_k} \; \forall k \in M\}$.
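
The Gaussian-mixture likelihood of an observable can be sketched as below. For brevity, the sketch assumes a scalar observable, so that each covariance reduces to a variance; the dissertation's observables are vectors, and the function names are hypothetical.

```python
import math


def gaussian(y, mean, var):
    """Density of a scalar Gaussian N(y; mean, var)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_likelihood(y, weights, means, variances):
    """sum_g pi_k(g) * N(y; nu_k(g), Sigma_k(g)) for one image class k."""
    return sum(w * gaussian(y, m, v) for w, m, v in zip(weights, means, variances))
```

Since the mixture parameters are shared by all nodes, one such function per image class suffices to evaluate the likelihood at every observable.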

Speaking in generative terms, for a given set of $V$ nodes, first $P(Z)$ is defined using
Eq. (2.1) and $P(R|Z)$ using Eq. (2.3) to give us $P(Z, R)$. We then impose the condition of
fixing the leaf-level node positions to the locations of the finest-scale observables, $\rho^0 \subset \rho$,
to obtain $P(Z, R' | R^0 = \rho^0)$. Combining Eq. (2.4) and Eq. (2.6) with $P(Z, R' | R^0 = \rho^0)$ results
in the joint prior

$$P(Z, X, R', Y | R^0 = \rho^0) = P(Y | X, \rho) P(X | Z) P(Z, R' | R^0 = \rho^0) \tag{2.7}$$

which fully specifies the irregular tree. All the parameters of the joint prior can be grouped
in the set $\Theta \triangleq \{\gamma_{ij}, d_{ij}, \Sigma_{ij}, P_{ij}^{kl}, \theta\}$, $\forall i, j \in V$, $\forall k, l \in M$.

As depicted in Figure 2-1, an irregular tree is a directed graph. The formalism of the

graph-theoretic representation of irregular trees provides general algorithms for computing

marginal and conditional probabilities of interest, which is discussed in the following section.

2.2 Probabilistic Inference

Image interpretation, as discussed in Chapter 1, requires computation of posterior probabilities
of hidden random variables $Z$, $X$, and $R'$, given observables $Y$ and leaf-node positions
$R^0$. However, due to the complexity of irregular trees, the exact probabilistic inference
of $P(Z, X, R' | Y, R^0)$ is infeasible. Therefore, we resort to approximate inference methods,

which are divided into two broad classes: deterministic approximations and Monte-Carlo

methods [57, 58, 59, 60, 61].

Markov Chain Monte Carlo (MCMC) methods allow for sampling of the posterior
$P(Z, X, R' | Y, R^0)$, and the construction of a Markov chain whose equilibrium distribution
is the desired $P(Z, X, R' | Y, R^0)$. Below, we report an experiment for two datasets of 4x4
and 8x8 binary images, samples of which are depicted in Fig. 2-2a, where we learned
$P(Z, X, R' | Y, R^0)$ for $IT_{V^0}$ models through Gibbs sampling [62]. Observables $y_i$ were set
to binary pixel values; the number of image classes was set to $|M|=2$; the number of
components in the Gaussian mixture was set to $G=1$; and the maximum number of levels
in the model was set to $L=3$ and $L=4$ for 4x4 and 8x8 images, respectively. The initial
irregular-tree structure is a balanced quad-tree (TSBN), where the number of leaf-level
nodes is equal to the number of pixels. One iteration of Gibbs sampling consists of sampling
each variable, conditioned on the other variables in the irregular tree, until all the variables
are sampled. We iterated this procedure until our convergence criterion was met, namely,
when $|P_{t+1}(Z, X, R' | Y, R^0) - P_t(Z, X, R' | Y, R^0)| / P_t(Z, X, R' | Y, R^0) < \varepsilon$ for consecutive

(a) (b) (c)
Figure 2-2: Pixel clustering using irregular trees learned by Gibbs sampling: (a) sample
4x4 and 8x8 binary images; (b) clustered leaf-level pixels that have the same parent at
level 1; (c) clustered leaf-level pixels that have the same grandparent at level 2; clusters
are indicated by different shades of gray; the point in each group marks the position of the
parent node.

Figure 2-3: Irregular tree learned for the 4x4 image in (a), after 20,032 iterations of Gibbs
sampling; nodes are depicted in-line representing 4, 2 and 1 actual rows of the levels 0, 1
and 2, respectively; nodes are drawn as pie-charts representing $P(x_i^k = 1)$, $k \in \{0, 1\}$; note
that there are two root nodes for two distinct objects in the image.

iteration steps $t$, where $\varepsilon=0.1$ and $\varepsilon=1$ for 4x4 and 8x8 images, respectively. For the
dataset of 50 binary 4x4 images, on average more than 20,000 iteration steps were required
for convergence, while for the dataset of 50 binary 8x8 images, more than 100,000 iterations were

required. In Figs. 2-2b-c, we also illustrate the grouping of pixels in the learned irregular

trees, while in Fig. 2-3, we depict the irregular tree learned for the 4x4 image in Fig. 2-2a.

From the experimental results, we infer that irregular trees learned through Gibbs

sampling are capable of capturing important structural information about image regions

at various scales. Generally, however, in MCMC approaches, with increasing model complexity,
the choice of proposals in the Markov chain becomes hard, so that the equilibrium

distribution is reached very slowly [63, 57]. Hence, in order to achieve faster inference, we

resort to variational approximation, a specific type of deterministic approximation [59,64].

Variational approximation methods have been demonstrated to give good and significantly
faster results, when compared to Gibbs sampling [46]. The proposed approaches range from
a factorized approximating distribution over hidden variables [45] (a.k.a. mean field varia-
tional approximation) to more structured solutions [48], where dependencies among hidden
variables are enforced. The underlying assumption in those methods is that there are aver-
aging phenomena in irregular trees that may render a given set of variables approximately
independent of the rest of the network. Therefore, the resulting variational optimization of
irregular trees provides for principled solutions, while reducing computational complexity.
In the following section, we derive a novel Structured Variational Approximation (SVA)
algorithm for the irregular tree model defined in Section 2.1.
2.3 Structured Variational Approximation
In variational approximation, the intractable distribution $P(Z, X, R' | Y, R^0)$ is approximated
by a simpler distribution $Q(Z, X, R' | Y, R^0)$ closest to $P(Z, X, R' | Y, R^0)$. To simplify
notation, below, we omit the conditioning on $Y$ and $R^0$, and write $Q(Z, X, R')$. The novelty
of our approach is that we constrain the variational distribution to the form

$$Q(Z, X, R') \triangleq Q(Z) Q(X|Z) Q(R'|Z) \tag{2.8}$$

which enforces that both class-indicator variables X and position variables R' are statisti-
cally dependent on the tree connectivity Z. Since these dependencies are significant in the
prior, one should expect them to remain so in the posterior. Therefore, our formulation
appears to be more appropriate for approximating the true posterior than the mean-field
variational approximation Q(Z,X, R')=Q(Z)Q(X)Q(R') discussed by Adams et al. [45],
and the form Q(Z, X, R')=Q(Z)Q(X|Z)Q(R') proposed by Storkey and Williams [48]. We
define the approximating distributions as follows:

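$$Q(Z) \triangleq \prod_{\ell=0}^{L-1} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} \left[\xi_{ij}\right]^{z_{ij}} \tag{2.9}$$

$$Q(X|Z) \triangleq \prod_{i,j \in V} \prod_{k,l \in M} \left[Q_{ij}^{kl}\right]^{x_i^k x_j^l z_{ij}} \tag{2.10}$$

$$Q(R'|Z) \triangleq \prod_{i,j \in V'} \left[Q(r_i | z_{ij})\right]^{z_{ij}} = \prod_{i,j \in V'} \left[\frac{\exp\left(-\frac{1}{2}(r_i - \mu_{ij})^T \Omega_{ij}^{-1} (r_i - \mu_{ij})\right)}{2\pi |\Omega_{ij}|^{1/2}}\right]^{z_{ij}} \tag{2.11}$$

where parameters $\xi_{ij}$ correspond to the connection probabilities, and the $Q_{ij}^{kl}$ are analogous
to the $P_{ij}^{kl}$ conditional probability tables. For the parameters of $Q(R'|Z)$, note that
covariances $\Omega_{ij}$ and mean values $\mu_{ij}$ form the set of Gaussian parameters for a given node
$i \in V^\ell$ over its candidate parents $j \in V^{\ell+1}$. Which pair of parameters $(\mu_{ij}, \Omega_{ij})$ is used to
generate $r_i$ is conditioned on the given connection between $i$ and $j$, that is, the current
realization of $Z$. Furthermore, we assume that the $\Omega$'s are diagonal matrices, such that
node positions along the "x" and "y" image axes are uncorrelated. Also, for roots, suitable
forms of $Q$ functions are used, similar to the specifications given in Section 2.1.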

To find $Q(Z, X, R')$ closest to $P(Z, X, R' | Y, R^0)$, we resort to a standard optimization
method, where the Kullback-Leibler (KL) divergence between $Q(Z, X, R')$ and $P(Z, X, R' | Y, R^0)$
is minimized ([65], ch. 2, pp. 12-49, and ch. 16, pp. 482-509). The KL divergence is given by

$$KL(Q \| P) \triangleq \int dR' \sum_{Z,X} Q(Z, X, R') \log \frac{Q(Z, X, R')}{P(Z, X, R' | Y, R^0)} \tag{2.12}$$

It is well known that $KL(Q \| P)$ is non-negative for any two distributions $Q$ and $P$, and
$KL(Q \| P)=0$ if and only if $Q=P$; these properties are a direct corollary of Jensen's inequality
([65], ch. 2, pp. 12-49). As such, $KL(Q \| P)$ is bounded from below, and its global minimum over
unconstrained $Q$ is attained uniquely at $Q=P$. Since $Q$ is restricted to the form of Eq. (2.8),
minimization yields the member of this family that is closest to the true posterior.
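
The two properties of the KL divergence used above can be checked numerically on discrete distributions. The sketch below is illustrative only: it replaces the integral and sums of Eq. (2.12) by a single sum over a small discrete state space, and the function name is hypothetical.

```python
import math


def kl_divergence(q, p):
    """KL(Q||P) = sum_s q(s) * log(q(s) / p(s)) for discrete distributions.

    States with q(s) = 0 contribute nothing; p(s) is assumed positive
    wherever q(s) is positive.
    """
    return sum(qs * math.log(qs / ps) for qs, ps in zip(q, p) if qs > 0.0)
```

The divergence is zero when the two distributions coincide and strictly positive otherwise; note also that it is not symmetric in its arguments.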
By minimizing the KL divergence, we derive the update equations for estimating the

parameters of the variational distribution Q(Z,X, R'). Below, we summarize the final

derivation results. Detailed derivation steps are reported in Appendix A, where we also

provide the list of nomenclature. In the following equations, we use K to denote an arbitrary

normalization constant, the definition of which may change from equation to equation.

Parameters on the right-hand side of the update equations are assumed known, as learned

in the previous iteration step.

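2.3.1 Optimization of Q(X|Z)

$Q(X|Z)$ is fully characterized by parameters $Q_{ij}^{kl}$, which are updated as

$$Q_{ij}^{kl} = K P_{ij}^{kl} \lambda_i^k, \quad \forall i, j \in V, \; \forall k, l \in M, \tag{2.13}$$

where the auxiliary parameters $\lambda_i^k$ are computed as

$$\lambda_i^k = \begin{cases} P(y_{\rho(i)} | x_i^k, \rho(i)), & i \in V^0, \\ \prod_{c \in V} \left[\sum_{a \in M} P_{ci}^{ak} \lambda_c^a\right]^{\xi_{ci}}, & i \in V', \end{cases} \quad \forall k \in M, \tag{2.14a}$$

$$\lambda_i^k = P(y_{\rho(i)} | x_i^k, \rho(i)) \prod_{c \in V} \left[\sum_{a \in M} P_{ci}^{ak} \lambda_c^a\right]^{\xi_{ci}}, \quad \forall i \in V, \; \forall k \in M, \tag{2.14b}$$

where Eq. (2.14a) is derived for $IT_{V^0}$, and Eq. (2.14b) for $IT_V$. Since the $\xi_{ci}$ are non-zero
only for child-parent pairs, from Eq. (2.14), we note that the $\lambda$'s are computed for both models
by propagating the $\lambda$ messages of the corresponding children nodes upward. Thus, the $Q$'s, given
by Eq. (2.13), can be updated by making a single pass up the tree. Also, note that for leaf
nodes, $i \in V^0$, the $\xi_{ci}$ parameters are equal to 0 by definition, yielding $\lambda_i^k = P(y_{\rho(i)} | x_i^k, \rho(i))$
in Eq. (2.14b).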
Further, from Eqs. (2.9) and (2.10), we derive the update equation for the approximate
posterior probability $m_i^k$ that node $i$ is assigned to image class $k$, given $Y$ and $R^0$, as

$$m_i^k \triangleq \int dR' \sum_{Z,X} x_i^k \, Q(Z, X, R') = \sum_{j \in V'} \xi_{ij} \sum_{l \in M} Q_{ij}^{kl} \, m_j^l, \quad \forall i \in V, \; \forall k \in M. \tag{2.15}$$

Note that the $m_i^k$ can be computed by propagating image-class probabilities in a single pass
downward. This upward-downward propagation, specified by Eqs. (2.14) and (2.15), is very
reminiscent of belief propagation for TSBNs [36,31]. For the special case when $\xi_{ij}=1$ only
for one parent $j$, we obtain the standard $\lambda$-$\pi$ rules of Pearl's message passing scheme for
TSBNs.
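
The upward-downward propagation can be sketched on a toy tree as follows. This is a minimal illustration under several assumptions that are not taken from the dissertation: a single conditional probability table `cpt[k][l]` is shared by all node pairs, the normalization constant is taken to normalize over the child class `k` for each parent class `l`, a node connected to the root slot `0` combines its messages with a class prior, and nodes without observables use a likelihood of 1. All names are hypothetical.

```python
def normalize(values):
    total = sum(values)
    return [v / total for v in values]


def upward_downward(levels, xi, cpt, prior, lik):
    """levels: node ids per level, leaf level first.
    xi[i]: dict mapping candidate parent j (0 = root slot) to xi_ij.
    cpt[k][l]: P(child class k | parent class l); prior[k]: class prior for roots.
    lik[i][k]: likelihood of the observable of node i under class k.
    """
    num_classes = len(prior)
    classes = range(num_classes)
    # children[i] lists (child, xi_ci) pairs with non-zero connection probability.
    children = {}
    for c, row in xi.items():
        for parent, weight in row.items():
            if parent != 0 and weight > 0.0:
                children.setdefault(parent, []).append((c, weight))
    # Upward pass: lambda messages, in the spirit of Eq. (2.14b).
    lam = {}
    for level in levels:
        for i in level:
            lam[i] = []
            for k in classes:
                value = lik[i][k]
                for c, weight in children.get(i, []):
                    value *= sum(cpt[a][k] * lam[c][a] for a in classes) ** weight
                lam[i].append(value)
    # Downward pass: class posteriors m, in the spirit of Eq. (2.15).
    m = {}
    for level in reversed(levels):
        for i in level:
            m[i] = [0.0] * num_classes
            for parent, weight in xi[i].items():
                if weight == 0.0:
                    continue
                if parent == 0:  # root slot: combine the class prior with lambda
                    q = normalize([prior[k] * lam[i][k] for k in classes])
                    for k in classes:
                        m[i][k] += weight * q[k]
                else:
                    for l in classes:
                        q = normalize([cpt[k][l] * lam[i][k] for k in classes])
                        for k in classes:
                            m[i][k] += weight * q[k] * m[parent][l]
    return lam, m
```

When every node has exactly one parent with connection probability 1, the sketch reduces to exact belief propagation on a fixed tree, which is a convenient sanity check.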
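2.3.2 Optimization of Q(R'|Z)
$Q(R'|Z)$ is fully characterized by parameters $\mu_{ij}$ and $\Omega_{ij}$. The update equation for
$\mu_{ij}$, $\forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, is given by

$$\mu_{ij} = \Big[\sum_{p \in V'} \xi_{jp} \Sigma_{ij}^{-1} + \sum_{c \in V'} \xi_{ci} \Sigma_{ci}^{-1}\Big]^{-1} \Big[\sum_{p \in V'} \xi_{jp} \Sigma_{ij}^{-1} (\mu_{jp} + d_{ij}) + \sum_{c \in V'} \xi_{ci} \Sigma_{ci}^{-1} (\mu_{ci} - d_{ci})\Big] \tag{2.16}$$

where $c$ and $p$ denote children and grandparents of node $i$, respectively. Further, for all
node pairs $\forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, where $\xi_{ij} \neq 0$, $\Omega_{ij}$ is updated as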

Trz I T''r{?J}I 1+ E jp 1Tr{I QIj- } J +
pGV/ Tr1 (I [ zIQ ij + (2.17)
+( Tr{f^^l. 1' I\7

+ c iTr{E~ } 1+ ),Tr-
GyV / CL ( ^

where, once again, $c$ and $p$ denote children and grandparents of node $i$, respectively. Since
the $\Omega$'s and $\Sigma$'s are assumed diagonal, it is straightforward to derive the expressions for
the diagonal elements of the $\Omega$'s from Eq. (2.17). Note that both $\mu_{ij}$ and $\Omega_{ij}$ are updated
by summing over children and grandparents of $i$, and, therefore, must be iterated until
convergence.

2.3.3 Optimization of Q(Z)

$Q(Z)$ is fully characterized by connectivity probabilities $\xi_{ij}$, which are computed as

$$\xi_{ij} = K \gamma_{ij} \exp(A_{ij} - B_{ij}), \quad \forall \ell, \; \forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\} \tag{2.18}$$

where $A_{ij}$ represents the influence of observables $Y$, while $B_{ij}$ represents the contribution of
the geometric properties of the network to the connectivity distribution. These are defined
in Appendix A.
2.4 Inference Algorithm and Bayesian Estimation

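For the given set of parameters $\Theta$ characterizing the joint prior, observables $Y$, and
leaf-level node positions $R^0$, the standard Bayesian estimation of optimal $\hat{Z}$, $\hat{X}$, and $\hat{R}'$
requires minimizing the expectation of a cost function $C$:

$$(\hat{Z}, \hat{X}, \hat{R}') = \arg\min_{Z,X,R'} E\{C((Z, X, R'), (Z^*, X^*, R'^*)) \mid Y, R^0, \Theta\}, \tag{2.19}$$

where $C(\cdot)$ penalizes the discrepancy between the estimated configuration $(Z, X, R')$ and
the true one $(Z^*, X^*, R'^*)$. We propose the following cost function:

$$C((Z, X, R'), (Z^*, X^*, R'^*)) \triangleq \sum_{i,j \in V} [1 - \delta(z_{ij} - z_{ij}^*)] + \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - x_i^{k*})] + \sum_{i \in V'} [1 - \delta(r_i - r_i^*)], \tag{2.20}$$

where $*$ indicates true values, and $\delta(\cdot)$ is the Kronecker delta function. Using the variational
approximation $P(Z, X, R' | Y, R^0) \approx Q(Z) Q(X|Z) Q(R'|Z)$, from Eqs. (2.19) and (2.20), we derive

$$\hat{Z} = \arg\min_{Z} \sum_{Z} Q(Z) \sum_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [1 - \delta(z_{ij} - z_{ij}^*)], \tag{2.21}$$

$$\hat{X} = \arg\min_{X} \sum_{Z,X} Q(Z) Q(X|Z) \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - x_i^{k*})], \tag{2.22}$$

$$\hat{R}' = \arg\min_{R'} \int dR' \sum_{Z} Q(Z) Q(R'|Z) \sum_{i \in V'} [1 - \delta(r_i - r_i^*)]. \tag{2.23}$$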
Given the constraints on connections, discussed in Section 2.1, minimization in Eq. (2.21)
is equivalent to finding parents:

$$(\forall \ell)(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \quad \text{for } IT_{V^0}, \tag{2.24a}$$

$$(\forall \ell)(\forall i \in V^\ell) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \quad \text{for } IT_V, \tag{2.24b}$$

where $\xi_{ij}$ is given by Eq. (2.18); $\hat{Z}_{\cdot i}$ denotes the $i$-th column of $\hat{Z}$, and $\hat{Z}_{\cdot i} \neq 0$ indicates that
there is at least one non-zero element in column $\hat{Z}_{\cdot i}$; that is, $i$ has children, and thereby
is included in the tree structure. Note that due to the distribution over connections, after
estimation of $Z$, for a given image, some nodes may remain without children. To preserve
the generative property in $IT_{V^0}$, we impose an additional constraint on $Z$ that nodes above
the leaf level must have children in order to be able to connect to upper levels. On the other
hand, in $IT_V$, due to multi-layered observables, all nodes $V$ must be included in the tree
structure, even if they do not have children. The global solution to Eq. (2.24a) is an open
problem in many research areas. Therefore, for $IT_{V^0}$, we propose a stage-wise optimization,
where, as we move upwards, starting from the leaf level, $\ell = \{0, 1, ..., L-1\}$, we include in the
tree structure optimal parents at $V^{\ell+1}$ according to

$$(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \tag{2.25}$$

where $\hat{Z}_{\cdot i}$ denotes the $i$-th column of the estimated $\hat{Z}$, and $\hat{Z}_{\cdot i} \neq 0$ indicates that $i$ has already
been included in the tree structure when optimizing the previous level $V^{\ell-1}$.
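
The stage-wise selection of parents can be sketched as a single bottom-up sweep. The sketch assumes that the connection probabilities have already been computed and are stored per node as a dictionary over candidate parents, with the id `0` reserved for the root slot; the function name and the `require_children` flag are hypothetical.

```python
def select_parents(levels, xi, require_children=True):
    """Bottom-up choice of the most probable parent for each node.

    levels: node ids per level, leaf level first.
    xi[i]: dict mapping candidate parent j (0 = root slot) to xi_ij.
    require_children: if True (observables at the leaf level only), a node
    above the leaf level is kept only if some lower-level node has already
    chosen it as its parent; if False, every node is kept.
    """
    parent = {}
    has_child = set()
    for depth, level in enumerate(levels):
        for i in level:
            if require_children and depth > 0 and i not in has_child:
                continue  # node is excluded from the tree structure
            j = max(xi[i], key=xi[i].get)
            parent[i] = j
            if j != 0:
                has_child.add(j)
    return parent
```

Nodes whose best candidate is the root slot become roots of separate subtrees, which is how the forest of subtrees emerges.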

Next, from Eq. (2.22), the resulting Bayesian estimator of image-class labels, denoted
as $\hat{x}_i$, is

$$(\forall i \in V) \quad \hat{x}_i = \arg\max_{k \in M} m_i^k, \tag{2.26}$$

where the approximate posterior probability $m_i^k$ that image class $k$ is assigned to node $i$ is
given by Eq. (2.15).

Finally, from Eq. (2.23), optimal node positions are estimated as

$$(\forall \ell > 0)(\forall i \in V^\ell) \quad \hat{r}_i = \arg\max_{r_i} \sum_{Z} Q(r_i | Z) Q(Z) = \sum_{j \in \{0, V^{\ell+1}\}} \xi_{ij} \mu_{ij}, \tag{2.27}$$

where $\mu_{ij}$ and $\xi_{ij}$ are given by Eqs. (2.16) and (2.18), respectively.
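
The position estimate is a connection-weighted average of the per-parent mean positions. A minimal sketch, assuming both quantities are stored per node as dictionaries over candidate parents (the function name is hypothetical):

```python
def expected_position(xi_row, mu_row):
    """r_hat_i = sum_j xi_ij * mu_ij over candidate parents j (0 = root slot).

    xi_row[j] is the connection probability to parent j, and mu_row[j] is the
    (x, y) mean position of the node under that connection.
    """
    x = sum(xi_row[j] * mu_row[j][0] for j in xi_row)
    y = sum(xi_row[j] * mu_row[j][1] for j in xi_row)
    return (x, y)
```

When one connection dominates, the estimate approaches the mean position associated with that parent.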

The inference algorithm for irregular trees is summarized in Fig. 2-4. The specified

ordering of parameter updates for Q(Z), Q(X|Z), and Q(R'|Z) in Fig. 2-4, steps (4)-(10),

is arbitrary; theoretically, other orderings are equally valid.

2.5 Learning Parameters of the Irregular Tree with Random Node Positions

Variational inference presumes that model parameters $\Theta = \{\gamma_{ij}, d_{ij}, \Sigma_{ij}, P_{ij}^{kl}, \theta\}$, $\forall i, j \in V$,
$\forall k, l \in M$, and $V$, $L$, $M$, are available. These parameters can be learned off-line through
standard Maximum Likelihood (ML) optimization. Usually, for the ML optimization, it is
assumed that $N$ independently generated training images, with observables $\{Y^n\}_{n=1}^{N}$ and
latent variables $\{(Z^n, X^n, R'^n)\}_{n=1}^{N}$, are given. However, for multiscale generative models,
in general, neither the true image-class labels for nodes at higher levels nor their dynamic
connections are given. Therefore, configurations $\{(Z^n, X^n, R'^n)\}$ must be estimated from
the training images.

To this end, we propose an iterative learning procedure. In initialization, we first set

L= log2 0(V), where IVo is equal to the size of a given image. The number of image

classes $|M|$ is also assumed known. Next, due to a huge diversity of possible configurations
of objects in images, for each node $i \in V^\ell$, we initialize $\gamma_{ij}$ to be uniform over $i$'s candidate
parents $\forall j \in \{0, V^{\ell+1}\}$. Then, for all pairs $(i,j) \in V^\ell \times V^{\ell+1}$ at level $\ell$, we set $d_{ij} = \rho(i) - \rho(j)$;
namely, the $d_{ij}$ are initialized to the relative displacement of the centers of mass of the $i$-th
and $j$-th dyadic squares in the corresponding quad-tree with $L$ levels, specified in Eq. (2.5).
For roots $i$, we have $d_i = \rho(i)$. Also, we set diagonal elements of $\Sigma_{ij}$ to the diagonal elements

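Inference Algorithm
Assume that V, L, M, $\Theta$, $N_\varepsilon$, $\varepsilon$, and $\varepsilon_\mu$ are given.
(1) Initialization: $t = 0$; $t_{in} = 0$; $(\forall i,j \in V)(\forall k,l \in M)$ $\xi_{ij}(0) = \gamma_{ij}$; $Q_{ij}^{kl}(0) = P_{ij}^{kl}$; $\mu_{ij}(0)$ = "node locations in the corresponding quad-tree"; diagonal elements of $\Omega_{ij}(0)$ are set to the area of dyadic squares in the corresponding quad-tree;
(2) repeat Outer Loop
(3)   $t = t + 1$;
(4)   Compute in bottom-up pass for $\ell = 0, 1, \ldots, L-1$, $\forall i \in V^\ell$, $\forall k \in M$:
        $\lambda_i^k(t)$ given by Eq. (2.14); $Q_{ij}^{kl}(t)$ given by Eq. (2.13);
(5)   Compute in top-down pass for $\ell = L-1, L-2, \ldots, 0$, $\forall i \in V^\ell$, $\forall k \in M$:
        $m_i^k(t)$ given by Eq. (2.15);
(6)   repeat Inner Loop
(7)     $t_{in} = t_{in} + 1$;
(8)     Compute $\forall i,j \in V'$:
          $\mu_{ij}(t_{in})$ given by Eq. (2.16); $\Omega_{ij}(t_{in})$ given by Eq. (2.17);
(9)   until $|\mu_{ij}(t_{in}) - \mu_{ij}(t_{in}-1)| / \mu_{ij}(t_{in}-1) < \varepsilon_\mu$;
(10)  Compute $\forall i,j \in V'$:
        $\xi_{ij}(t)$ given by Eq. (2.18);
(11) until $|Q(Z,X,R';t) - Q(Z,X,R';t-1)| / Q(Z,X,R';t-1) < \varepsilon$ for $N_\varepsilon$ consecutive iteration steps;
(12) Estimation of Z: compute in bottom-up pass for $\ell = 0, 1, \ldots, L-1$,
       for $IT_{V^0}$: $(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t)$,
       for $IT_V$: $(\forall i \in V^\ell)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t)$;
(13) Estimation of X: compute $(\forall i \in V)$ $\hat{x}_i = \arg\max_{k \in M} m_i^k(t)$;
(14) Estimation of R': compute $(\forall \ell > 0)(\forall i \in V^\ell)$ $\hat{r}_i = \sum_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t)\,\mu_{ij}(t)$;

Figure 2-4: Inference of the irregular tree given Y, $R^0$, and $\Theta$; $t$ and $t_{in}$ are counters in the outer and inner loops, respectively; $N_\varepsilon$, $\varepsilon$, and $\varepsilon_\mu$ control the convergence criteria for the two loops.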

of the matrix $d_{ij} d_{ij}^T$. The number of components $G_k$ in the Gaussian mixture for each class k is set to $G_k = 3$, which we validated empirically. The other parameters of the Gaussian mixture, $\theta$, are estimated by using the EM algorithm [52, 56] on the hand-labeled training images. Finally, the conditional probability tables, $P_{ij}^{kl}$, are initialized to be uniform over possible image classes.

After initialization of $\Theta$, we run an iterative learning procedure, where in step t we conduct SVA inference of the irregular tree on the training images, as explained in the previous section. After inferring the posterior probability that class k is assigned to node i, $m_i^k$, given by Eq. (2.15), and the posterior connectivity probability, $\xi_{ij}$, given by Eq. (2.18), on all training images, $n = 1, \ldots, N$, we update only $P_{ij}^{kl}$ and $\gamma_{ij}$ as

P l (t+1l) -- k";n(t) ,(2.28)
,(t+1) = (E(t). (2.29)

Other parameters in $\Theta(t+1) = \{\gamma_{ij}(t+1), d_{ij}, \Sigma_{ij}, P_{ij}^{kl}(t+1), \theta\}$ are fixed to their initial values. In the next iteration step, we use $\Theta(t+1)$ for SVA inference of the irregular tree on

the training images. We assume that the learning algorithm converged when

|P (t+1) PM ()

where $\varepsilon > 0$ is a pre-specified parameter.

2.6 Implementation Issues

In this section, we list algorithm-related details that are necessary for the experimental

results, presented in Chapter 6, to be reproducible.

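First, direct implementation of Eq. (2.13) would result in numerical underflow. Therefore, we introduce the following scaling procedure:

    $\tilde{\lambda}_i^k \triangleq \lambda_i^k / S_i, \quad \forall i \in V, \ \forall k \in M$,    (2.30)
    $S_i \triangleq \sum_{k \in M} \lambda_i^k$.    (2.31)

Substituting the scaled $\tilde{\lambda}$'s into Eq. (2.13), we obtain

    $Q_{ij}^{kl} = \dfrac{P_{ij}^{kl}\,\tilde{\lambda}_i^k}{\sum_{a \in M} P_{ij}^{al}\,\tilde{\lambda}_i^a} = \dfrac{P_{ij}^{kl}\,\lambda_i^k}{\sum_{a \in M} P_{ij}^{al}\,\lambda_i^a}$.

In other words, computation of $Q_{ij}^{kl}$ does not change when the scaled $\tilde{\lambda}$'s are used.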
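The invariance claimed above can be checked numerically. The sketch below assumes that Eq. (2.13) has the ratio form $Q^{kl} = P^{kl}\lambda^k / \sum_a P^{al}\lambda^a$ for a fixed node pair (i, j) and that the scale factor is the sum of the messages over classes; both are readings of the garbled source rather than confirmed definitions, and the function name `q_update` and the toy values are hypothetical.

```python
def q_update(P, lam):
    """Assumed form of Eq. (2.13) for one (i, j): Q[k][l] = P[k][l]*lam[k] / sum_a P[a][l]*lam[a]."""
    n = len(lam)
    Q = [[0.0] * n for _ in range(n)]
    for l in range(n):
        denom = sum(P[a][l] * lam[a] for a in range(n))
        for k in range(n):
            Q[k][l] = P[k][l] * lam[k] / denom
    return Q


P = [[0.9, 0.2],
     [0.1, 0.8]]                      # P[k][l]: child class k given parent class l
lam = [3e-200, 1e-200]                # very small unscaled messages
S = sum(lam)                          # assumed scale factor S_i
lam_scaled = [v / S for v in lam]     # scaled messages, now of order 1

Q_raw = q_update(P, lam)
Q_scaled = q_update(P, lam_scaled)
```

Because the scale factor cancels between numerator and denominator, `Q_raw` and `Q_scaled` agree up to rounding, while the scaled messages stay well away from the underflow range as they are multiplied up the tree.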

Second, to reduce computational complexity, we consider, for each node i, only the 7x7

box encompassing parent nodes j that neighbor the parent of the corresponding quad-tree.

Consequently, the number of possible children nodes c of i is also limited. Our experiments

show that the omitted nodes, either children or parents, contribute negligibly to the update

equations. Thus, we limit overall computational cost as the number of nodes increases.


Finally, the convergence criterion of the inner loop, where $\mu_{ij}$ and $\Omega_{ij}$ are computed, is controlled by the parameter $\varepsilon_\mu$. When $\varepsilon_\mu = 0.01$, the average number of iteration steps, $t_{in}$, in the inner loop is from 3 to 5, depending on the image size, where the latter is obtained for 128x128 images. The convergence criterion of the outer loop is controlled by the parameters $N_\varepsilon$ and $\varepsilon$. Simplifications that we use in practice may lead to sub-optimal solutions of SVA. From our experience, though, the algorithm recovers from unstable stationary points for sufficiently large $N_\varepsilon$. In our experiments, we set $N_\varepsilon = 10$ and $\varepsilon = 0.01$.

After the inference algorithm (Fig. 2-4) has converged, we estimate the values of the hidden variables (Z, X, R') for a given image, thereby conducting image interpretation.


In the previous chapter, two architectures of the irregular tree are presented, which are

fully characterized by the following joint prior:

P(Z, X, R', YIRO po) =P(YIX, p)P(X|Z)P(Z, R'IRo po)

As discussed in Section 2.2, the inference of the posterior distribution $P(Z, X, R'|Y, R^0)$ is intractable, due to the complexity of the model. The node-position variables, R', are the main reason why inference must be approximate. On the other hand, the R' are very useful, because they constrain possible network configurations. In order to avoid approximate inference, in this chapter, we introduce yet another architecture of the irregular tree, where the R' are eliminated, and where the constraints on the tree structure are directly modeled in the distribution of connectivity Z.

3.1 Model Specification

Similar to the model specification in the previous chapter, we introduce two architectures: one with observables only at the leaf level, and the other with observables propagated to higher levels. The main difference from the architectures $IT_V$ and $IT_{V^0}$ is that node positions are identical to those of the quad-tree. Therefore, we refer to the architectures presented in this chapter as irregular quad trees $IQT_V$ and $IQT_{V^0}$.

The irregular quad tree is a directed acyclic graph with nodes in set V, organized in hierarchical levels, $V^\ell$, $\ell \in \{0, 1, \ldots, L\}$, where $V^0$ denotes the leaf level. The layout of nodes is identical to that of the quad-tree, modeling for example the dyadic pyramid of wavelet coefficients, such that the number of nodes at level $\ell$ can be computed as $|V^\ell| = |V^{\ell-1}|/4 = \ldots = |V^0|/4^\ell$. Unlike for position-encoding dynamic trees [48], we assume that nodes are fixed at locations of the corresponding quad-tree. Consequently, irregular model structure is achieved only through establishing arbitrary connections between nodes. Connections are established under the constraint that a node at level $\ell$ can become a root or it can connect only to the nodes at the next level, $\ell+1$. The network connectivity is represented by a random matrix, Z, where entry $z_{ij}$ is an indicator random variable, such that $z_{ij} = 1$ if $i \in V^\ell$ and $j \in V^{\ell+1}$ are connected. Z contains an additional zero ("root") column, where entries $z_{i0} = 1$ if i is a root node. Each node can have only one parent, or can be a root. Note that due to the distribution over connections, after estimation of Z for a given image, in $IQT_V$, some nodes may remain without children.
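To make the level sizes and the one-parent-or-root constraint concrete, the sketch below builds the regular quad-tree parent assignment for a small leaf grid and then perturbs it into an irregular structure. It is an illustration only: storing one row of Z as a list of parent indices, with `None` standing for the root column, is an assumed encoding, and all names are hypothetical.

```python
def level_sizes(num_leaves, num_levels):
    """|V^l| = |V^0| / 4^l for l = 0, ..., L."""
    return [num_leaves // 4 ** l for l in range(num_levels + 1)]


def quad_tree_parents(side):
    """Row-major parent index at the next level for every node of a side x side grid."""
    half = side // 2
    return [(r // 2) * half + (c // 2) for r in range(side) for c in range(side)]


sizes = level_sizes(16, 2)            # 4x4 leaf level, L = 2
parents = quad_tree_parents(4)        # regular quad-tree: four children per parent

# Irregular structure: each leaf keeps exactly one parent or becomes a root (None).
z = list(parents)
for i in (2, 3, 6, 7):                # move all children of parent 1 to parent 0
    z[i] = 0
z[15] = None                          # leaf 15 becomes a root (z_i0 = 1)

childless = [j for j in range(sizes[1]) if j not in z]
```

After the reassignment, parent 1 has no children, which is the situation described in the last sentence of the paragraph above.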

Each node i is characterized by an image-class random variable, $x_i$, which can take values in a finite class set C. Given Z, the label $x_i$ of node i is conditioned on $x_j$ of its parent j as $P(x_i|x_j, z_{ij}=1)$. The joint probability of image-class variables $X = \{x_i\}$, $\forall i \in V$, is given by

    $P(X|Z) = \prod_{\ell=0}^{L} \prod_{i \in V^\ell} P(x_i|x_j, z_{ij}=1)$,    (3.1)

where for roots we use priors $P(x_i)$. We assume that the conditional probability tables $P(x_i|x_j, z_{ij}=1)$ are equal for all the nodes at all levels, as in [33]. Such a unique conditional probability table is denoted as $\Phi$.

Next, we assume that observables $y_i$ are conditionally independent given the corresponding $x_i$:

    $P(Y|X) = \prod_{i \in V} P(y_i|x_i)$,    (3.2)

    $P(y_i|x_i = k) = \sum_{g=1}^{G_k} \pi_k(g)\, N(y_i; \nu_k(g), \Sigma_k(g))$,    (3.3)

where for $IQT_{V^0}$ we write $V^0$ instead of V in Eq. (3.2). $P(y_i|x_i = k)$, $k \in M$, is modeled as a mixture of Gaussians. The Gaussian-mixture parameters can be grouped in $\theta = \{\pi_k(g), \nu_k(g), \Sigma_k(g), G_k\}$, $\forall k \in M$.

Finally, we specify the connectivity distribution. In the previous chapter, it is defined as the prior $P(Z) = \prod_{i,j \in V} P(z_{ij}=1)$, and then the constraint on possible tree structures is imposed through introducing an additional set of random variables, namely, random node positions R. The main purpose of the R's is to provide a mechanism that favors connections between close nodes. That approach has two major disadvantages. First, the additional R variables render the exact inference of the dynamic tree intractable, enforcing the use of approximate inference methods (variational approximation). Second, the decision whether nodes i and j should be connected is not informed by the actual values of $x_i$ and $x_j$. To improve upon the model formulation of the previous chapter, we seek to eliminate the R's, and to incorporate the information on image-class labels and node positions in the connectivity distribution. We reason that connections between parents and children whose relative distance is small should be favored over those that are far apart. At the same time, we seek to establish a mechanism that groups nodes belonging to the same image class, and separates those assigned to different classes.

Let us first examine relative distances between nodes. Due to the symmetry of the node layout (equal to that of the quad-tree), we divide the set of all candidate parents j into classes of equidistance from child i, as depicted in Fig. 3-1. We specify that relative distances can take integer values $d_{ij} \in \{0, 1, 2, \ldots, d_i^{max}\}$, where $d_{i0} \triangleq 0$ if i is a root. Note that the $d_i^{max}$ values vary for different positions of i at one level, as well as for different levels to which i belongs.

Given X, we specify the conditional connectivity distribution as

    $P(Z|X) = \prod_{\ell=0}^{L-1} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} P(z_{ij}=1|x_i, x_j)$,    (3.4)

    $P(z_{ij}=1|x_i, x_j) = \begin{cases} p_i, & \text{if } i \text{ is a root}, \\ \kappa\, p_i (1-p_i)^{d_{ij}}, & \text{if } x_i = x_j, \\ \kappa\, p_i (1-p_i)^{d_i^{max} - d_{ij}}, & \text{if } x_i \neq x_j, \end{cases}$    (3.5)

    subject to $\sum_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|x_i, x_j) = 1$,    (3.6)

where $\kappa$ is a normalizing constant, and $p_i$ is the parameter of the geometric distribution. From Eq. (3.5), we observe that when $x_i = x_j$, $P(z_{ij}=1|x_i, x_j)$ decreases as $d_{ij}$ becomes larger, while when $x_i \neq x_j$, $P(z_{ij}=1|x_i, x_j)$ increases for greater distances $d_{ij}$. Hence, the form of $P(z_{ij}=1|x_i, x_j)$, given by Eq. (3.5), satisfies the aforementioned desirable properties. To avoid overfitting, we assume that $p_i$ is equal for all nodes i at the same level. The parameters of $P(Z|X)$ can be grouped in the parameter set $\Psi = \{p_i\}$, $\forall i \in V$.
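The behavior of Eq. (3.5) can be illustrated for a single child node. The sketch below assumes that the exponent for differently labeled parents is $d_i^{max} - d_{ij}$ (as the prose implies) and that $\kappa$ is chosen so that the root option and all candidate parents together sum to one, which is one way to satisfy Eq. (3.6); the function name and the toy inputs are hypothetical.

```python
def connection_probs(p, dists, same_class, d_max):
    """Return (P_root, [P_j]) for one child, following the assumed form of Eq. (3.5)."""
    weights = [
        p * (1.0 - p) ** (d if same else d_max - d)
        for d, same in zip(dists, same_class)
    ]
    kappa = (1.0 - p) / sum(weights)   # assumed normalization so that Eq. (3.6) holds
    return p, [kappa * w for w in weights]


# Four candidate parents: two share the child's class, two do not.
p_root, probs = connection_probs(
    p=0.2,
    dists=[0, 1, 1, 2],
    same_class=[True, True, False, False],
    d_max=2,
)
```

For same-class candidates the closer parent receives the larger probability, while for different-class candidates the farther parent does, matching the two properties stated after Eq. (3.6).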

[Figure 3-1 legend: node i; candidate-parent classes at relative distance $d_{ij} = 1$ and $d_{ij} = 2$.]

Figure 3-1: Classes of candidate parents j that are characterized by a unique relative distance $d_{ij}$ from child i.

The introduced parameters of the model can be grouped in the parameter set $\Theta = \{\Phi, \theta, \Psi\}$. In the next section we explain how to infer the best configuration of Z and X from the observed image data Y, provided that $\Theta$ is known.

3.2 Inference of the Irregular Tree with Fixed Node Positions

The standard Bayesian formulation of the inference problem consists in minimizing the expectation of some cost function C, given the data:

    $(\hat{Z}, \hat{X}) = \arg\min_{Z,X} E\{C((Z,X),(Z',X'))\,|\,Y, \Theta\}$,    (3.7)

where C penalizes the discrepancy between the estimated configuration (Z, X) and the true one (Z', X'). We propose the following cost function:

    $C((Z,X),(Z',X')) = C(X,X') + C(Z,Z')$    (3.8)

    $= \sum_{\ell=0}^{L-1} \sum_{i \in V^\ell} [1 - \delta(x_i - x_i')] + \sum_{\ell=0}^{L-1} \sum_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [1 - \delta(z_{ij} - z_{ij}')]$,    (3.9)

where the prime stands for true values, and $\delta(\cdot)$ is the Kronecker delta function. From Eq. (3.9), the resulting Bayesian estimator of X is

    $\forall i \in V, \quad \hat{x}_i = \arg\max_{x_i \in C} P(x_i|Z, Y)$.    (3.10)

Next, given the constraints on connections in the irregular tree, we derive that minimizing $E\{C(Z,Z')\,|\,Y,\Theta\}$ is equivalent to finding a set of optimal parents $\hat{j}$ such that

    $(\forall \ell)(\forall i \in V^\ell)(Z_{\cdot i} \neq 0)\quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|x_i, x_j)$, for $IQT_{V^0}$,    (3.11a)

    $(\forall \ell)(\forall i \in V^\ell)\quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|x_i, x_j)$, for $IQT_V$,    (3.11b)

where $Z_{\cdot i}$ is the i-th column of Z, and $Z_{\cdot i} \neq 0$ represents the event "node i has children", that is, "node i is included in the irregular-tree structure." The global solution to Eq. (3.11a) is an open problem in many research areas. We propose a stage-wise optimization, where, as we move upwards, starting from the leaf level, $\ell = 0, 1, \ldots, L$, we include in the tree structure optimal parents at $V^{\ell+1}$ according to

    $(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0)\quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|x_i, x_j)$,    (3.12)

where $\hat{Z}_{\cdot i} \neq 0$ denotes an estimate that i has already been included in the tree structure when optimizing the previous level.

By using the results in Eqs. (3.10) and (3.12), we specify the inference algorithm for the irregular quad tree, which is summarized in Fig. 3-2. In a recursive step t, we first assume that the estimate $Z(t-1)$ of the previous step $t-1$ is known and then derive the estimate $X(t)$ using Eq. (3.10); then, substituting $X(t)$ in Eq. (3.12), we derive the estimate $Z(t)$. We consider the algorithm converged if $P(Y, X|Z)$ does not vary by more than some threshold $\varepsilon$ for $N_\varepsilon$ consecutive iteration steps t. In our experiments, we set $\varepsilon = 0.01$ and $N_\varepsilon = 10$.

Steps 2 and 6 in the algorithm can be interpreted as inference of X given Y for a fixed-

structure tree. In particular, for Step 2, where the initial structure is the quad-tree, we can

use the standard inference on quad-trees, where, essentially, belief messages are propagated

in only two sweeps up and down the tree [33,29,31]. For Step 6, the irregular tree represents

a forest of subtrees, which also have fixed, though irregular, structure; therefore, we can

use the very same tree-inference algorithm for each of the subtrees. For completeness,

in Appendix B, we present the two-pass maximum posterior marginal estimation of X

proposed by Laferte et al. [33].
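The stage-wise choice in Eq. (3.12) amounts to a local argmax per node once the labels are fixed. The sketch below shows that choice for one child node, under the same assumptions about Eq. (3.5) as in its prose description (exponent $d_i^{max} - d_{ij}$ for differently labeled parents, and a normalizer that makes the root option and all candidates sum to one); the function name, the string "root" for the zero column, and the inputs are hypothetical.

```python
def best_parent(p, label_i, cand_labels, cand_dists, d_max):
    """Return "root" or the index of the most probable candidate parent of node i."""
    weights = []
    for label_j, d in zip(cand_labels, cand_dists):
        exponent = d if label_j == label_i else d_max - d
        weights.append(p * (1.0 - p) ** exponent)
    kappa = (1.0 - p) / sum(weights)          # assumed normalization, cf. Eq. (3.6)
    probs = [kappa * w for w in weights]
    j = max(range(len(probs)), key=probs.__getitem__)
    # Becoming a root has probability p; compare it with the best candidate parent.
    return "root" if p >= probs[j] else j


# Same child and candidates, two different values of the geometric parameter p.
choice_low_p = best_parent(0.2, 1, [1, 2, 1], [1, 0, 2], 2)
choice_high_p = best_parent(0.6, 1, [1, 2, 1], [1, 0, 2], 2)
```

With a small p the node attaches to the nearest same-class candidate, while with a large p it becomes a root; applying this choice level by level, bottom-up, gives one pass of the Z update in the inference loop.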

3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

Analogous to the learning algorithm discussed in the previous chapter, the parameters of the irregular tree with fixed node positions can be learned by using standard ML optimization. Here, we assume that N independently generated training images, with observables $\{Y^n\}$, $n = 1, \ldots, N$, are given. As explained before, the configurations of latent variables $\{(Z^n, X^n)\}$ must be estimated.

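Inference Algorithm
(1) $t = 0$; initialize irregular-tree structure $Z(0)$ to quad-tree;
(2) Compute $\forall i \in V$, $\hat{x}_i(0) = \arg\max_{x_i \in C} P(x_i|Z(0), Y)$;
(3) repeat
(4)   $t = t + 1$;
(5)   Compute in bottom-up pass for $\ell = 0, 1, \ldots, L$:
        for $IQT_{V^0}$: $(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|\hat{x}_i, \hat{x}_j)$;
        for $IQT_V$: $(\forall i \in V^\ell)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij}=1|\hat{x}_i, \hat{x}_j)$;
(6)   Compute $\forall i \in V$, $\hat{x}_i(t) = \arg\max_{x_i \in C} P(x_i|Z(t), Y)$;
(7)   $\hat{X} = X(t)$; $\hat{Z} = Z(t)$;
(8) until $|P(Y, X(t)|Z(t)) - P(Y, X(t-1)|Z(t-1))| / P(Y, X(t-1)|Z(t-1)) < \varepsilon$ for $N_\varepsilon$ consecutive iteration steps.

Figure 3-2: Inference of the irregular tree with fixed node positions, given observables Y and the model parameters $\Theta$.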

To this end, we propose an iterative learning procedure, where in step t we first assume that $\Theta(t) = \{\Phi(t), \theta(t), \Psi(t)\}$ is given and then conduct inference for each training image, $n = 1, \ldots, N$,

    $(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z,X} E\{C((Z,X),(Z',X'))\,|\,Y^n, \Theta(t)\}$,

as explained in Section 3.2. Once the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$ are found, we apply standard ML optimization to compute $\Theta(t+1)$.

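More specifically, suppose that, in the learning step t, realizations of random variables $(Y^n, X^n, Z^n)$ are given for $n = 1, \ldots, N$. Then the parameters of the Gaussian-mixture distributions in step $t+1$ can be computed using the standard EM algorithm [56]:

    $P(\omega_c(g)|y_i, x_i = c) = \dfrac{\pi_c(g)\, P(y_i|x_i = c, \omega_c(g))}{\sum_{g'=1}^{G_c} \pi_c(g')\, P(y_i|x_i = c, \omega_c(g'))}$,    (3.13)

    $\hat{\pi}_c(g) = \dfrac{1}{n_c} \sum_{i=1}^{n_c} P(\omega_c(g)|y_i, x_i = c)$,    (3.14)

    $\hat{\nu}_c(g) = \dfrac{\sum_{i=1}^{n_c} y_i\, P(\omega_c(g)|y_i, x_i = c)}{\sum_{i=1}^{n_c} P(\omega_c(g)|y_i, x_i = c)}$,    (3.15)

    $\hat{\Sigma}_c(g) = \dfrac{\sum_{i=1}^{n_c} (y_i - \hat{\nu}_c(g))(y_i - \hat{\nu}_c(g))^T P(\omega_c(g)|y_i, x_i = c)}{\sum_{i=1}^{n_c} P(\omega_c(g)|y_i, x_i = c)}$,    (3.16)

where $\omega_c(g)$ denotes the g-th mixture component of class c, and $n_c$ is the total number of all the nodes over the N training images that are classified as class c. To compute $P(\omega_c(g)|y_i, x_i = c)$ in Eq. (3.13), we use the Gaussian-mixture parameters from the previous learning step t. For all classes we set $G_c = 3$.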

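Next, we explain how to learn the parameters of the connectivity distribution, $\Psi(t+1) = \{p_i(t+1)\}_{i \in V}$, by using the ML principle:

    $\Psi(t+1) = \arg\max_{\Psi} \prod_{n=1}^{N} P(Z^n|X^n, \Psi)$.    (3.17)

Here, we consider two cases, for the $IQT_V$ and $IQT_{V^0}$ models. Recall that the parameters $p_i$ are equal for all nodes i at the same level $\ell$; we denote this common value by $p_\ell$. Given the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$ for each training image $n = 1, \ldots, N$, from Eqs. (3.5) and (3.17), we derive for $IQT_V$:

    $p_\ell(t+1) = \dfrac{N\,|V^\ell|}{\sum_{n=1}^{N} \sum_{i \in V^\ell} \left[1 + I(\hat{x}_i^n = \hat{x}_{\hat{j}}^n)\, d_{i\hat{j}}^n + I(\hat{x}_i^n \neq \hat{x}_{\hat{j}}^n)\,(d_i^{max} - d_{i\hat{j}}^n)\right]}$,    (3.18)

where $I(\cdot)$ is an indicator function, $\hat{j}$ is the estimated parent of node i, and $d_{i\hat{j}}^n$ denotes the relative distance assigned to the estimated connection $\hat{z}_{i\hat{j}}^n = 1$.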
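Eq. (3.18) can be read as the usual ML estimate of a geometric parameter: the number of nodes at the level divided by the total of one plus the effective exponent of each estimated connection. The sketch below implements that reading for one level; it is an illustration under that assumption, and the function name and tuple layout are hypothetical.

```python
def estimate_p(connections):
    """ML estimate of the geometric parameter for one level.

    connections: one (same_class, d, d_max) tuple per node at this level,
    pooled over all training images, describing its estimated parent link.
    """
    total = sum(1 + (d if same else d_max - d) for same, d, d_max in connections)
    return len(connections) / total


# Four hypothetical nodes pooled over the training set.
connections = [
    (True, 0, 2),    # same class, distance 0 -> contributes 1
    (True, 1, 2),    # same class, distance 1 -> contributes 2
    (False, 2, 2),   # different class at the maximum distance -> contributes 1
    (True, 0, 2),    # same class, distance 0 -> contributes 1
]
p_level = estimate_p(connections)
```

Levels whose nodes mostly attach to near, same-class parents (or far, different-class parents) yield a large estimate, which in turn sharpens the connectivity distribution in the next learning step.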

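For $IQT_{V^0}$, given the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$ for each training image $n = 1, \ldots, N$, we analyze the set of nodes $i \in V^\ell$ included in the corresponding irregular tree, i.e., $\hat{Z}_{\cdot i}^n \neq 0$. Thus, from Eqs. (3.5) and (3.17), we derive:

    $p_\ell(t+1) = \dfrac{\sum_{n=1}^{N} \sum_{i \in V^\ell} I(\hat{Z}_{\cdot i}^n \neq 0)}{\sum_{n=1}^{N} \sum_{i \in V^\ell} I(\hat{Z}_{\cdot i}^n \neq 0) \left[1 + I(\hat{x}_i^n = \hat{x}_{\hat{j}}^n)\, d_{i\hat{j}}^n + I(\hat{x}_i^n \neq \hat{x}_{\hat{j}}^n)\,(d_i^{max} - d_{i\hat{j}}^n)\right]}$,    (3.19)

where $I(\cdot)$ is an indicator function, $\hat{j}$ is the estimated parent of node i, and $d_{i\hat{j}}^n$ denotes the relative distance assigned to the estimated connection $\hat{z}_{i\hat{j}}^n = 1$.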

Finally, to learn the conditional probability table $\Phi$, we use the standard EM algorithm on fixed-structure trees, thoroughly discussed in [33]. Note that to obtain the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$ for each training image $n = 1, \ldots, N$ in the learning step t, we in fact have to conduct the MPM estimation, given in Appendix B in Fig. B. By using the already available $P(x_i, x_j|Y^n, \hat{z}_{ij}^n = 1)$ and $P(x_i|Y^n)$, obtained for each image n as in Fig. B, we derive

    $\Phi(t+1) = \dfrac{\sum_{n=1}^{N} \sum_{i \in V} P(x_i, x_j|Y^n, \hat{z}_{ij}^n = 1)}{\sum_{n=1}^{N} \sum_{i \in V} P(x_j|Y^n)}$.    (3.20)

The overall learning procedure is summarized in Fig. 3-3.

Learning Algorithm
(1) $t = 0$; initialize $\Theta(0) = \{\Phi(0), \theta(0), \Psi(0)\}$;
(2) Estimate for $n = 1, \ldots, N$:
      $(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z,X} E\{C((Z,X),(Z',X'))\,|\,Y^n, \Theta(0)\}$;
(3) repeat
(4)   $t = t + 1$;
(5)   Compute:
        $\theta(t)$ as in Eqs. (3.13)-(3.16);
        $p_\ell(t)$, for $IQT_V$ as in Eq. (3.18); for $IQT_{V^0}$ as in Eq. (3.19);
        $\Phi(t)$ as in Eq. (3.20);
(6)   Estimate for $n = 1, \ldots, N$:
        $(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z,X} E\{C((Z,X),(Z',X'))\,|\,Y^n, \Theta(t)\}$,
        using the inference algorithm in Fig. 3-2;
(7)   $\Theta^* = \Theta(t)$;
(8) until $(\forall n)$ $|P(Y^n, X^n|Z^n, \Theta^*) - P(Y^n, X^n|Z^n, \Theta(t-1))| / P(Y^n, X^n|Z^n, \Theta(t-1)) < \varepsilon$.

Figure 3-3: Algorithm for learning the parameters of the irregular tree; for notational simplicity, in Step (8) we do not indicate the different estimates of $(Z^n, X^n)$ for $\Theta^*$ and $\Theta(t-1)$.

Once $\Theta^*$ is learned, we can localize, detect and recognize objects in the image by running the inference algorithm presented in Fig. 3-2.


Inference of hidden variables (Z, X) can be viewed as building a forest of subtrees, each segmenting an image into arbitrary (not necessarily contiguous) regions, which we interpret as objects. Since each root determines a subtree, whose leaf nodes form a detected object, we assign physical meaning to roots by assuming they represent whole objects. Moreover, each descendant of the root can be viewed as the root of another subtree, whose leaf nodes cover only a part of the object. Hence, we say that roots' descendants represent object parts at various scales.

Strategies for recognizing detected objects naturally arise from a particular interpretation of the tree/sub-tree structure. Below, we make a distinction between two such strategies. The analysis of image regions under the roots leads to the whole-object recognition strategy, while the analysis of image regions determined by roots' descendants constitutes the object-part recognition strategy. For both approaches, final recognition is conducted by majority voting over MAP labels, $\hat{x}_i$, of leaf nodes [footnote 1].

The reason for analyzing smaller image regions than those under the roots stems from our hypothesis that the information of fine-scale object details may prove critical for the recognition of an object as a whole in scenes with occlusions. To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole.

Footnote 1: The literature offers various strategies that outperform majority-voting classification (e.g., multiscale Bayesian classification [29], and multiscale Viterbi classification [32]); however, they do not account explicitly for occlusions, and, as such, do not significantly outperform majority voting for scenes with occluded objects.

4.1 Measuring Significance of Object Parts

We hypothesize that the significance of object parts with respect to object recognition

depends on both local, innate object properties and global scene properties. While in-

nate properties represent characteristic object features, which differentiate one object from

another, global scene properties describe interdependencies of object parts in the overall

image composition. It is necessary to account for both local and global cues, as the most conspicuous object component need not be the most significant for that object's recognition in the presence of similar objects.

The analysis of innate object properties is handled through inference of the irregular tree, where, for a given image, we compute $P(x_i|Z, Y)$, $\forall i \in V$, as explained in Chapters 2 and 3. To account for the influence of global scene properties, for each node i, we compute Shannon's entropy over the set of image classes, M, as

    $(\forall i \in V)(Z_{\cdot i} \neq 0)\quad H_i \triangleq -\sum_{x_i \in M} P(x_i|Z, Y) \log P(x_i|Z, Y)$.    (4.1)

Since node i represents an object part, we define $H_i$ as a measure of significance of that object part. Note that a node with small entropy is characterized by a "peaky" distribution $P(x_i|Z, Y)$ with the maximum, say, at $x_i = k \in M$. This indicates that the error of classification will be small when i is labeled as class k. Recall that during inference, the belief message of i is propagated down the subtree in belief propagation [33], which is likely to render i's descendants with small entropies as well. Thus, the classification error of the whole region of leaf nodes under i is likely to be small, when compared to some other image region under, say, node j such that $H_j > H_i$. Consequently, i is more "significant" for recognition of class k than node j. In brief, the most significant object part has the smallest entropy over all nodes in a given sub-tree T:

    $i^* = \arg\min_{i \in T} H_i$.    (4.2)
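The significance measure is straightforward to compute once the class posteriors of the nodes in a subtree are available. The sketch below follows the prose reading that the most significant part is the node with the smallest Shannon entropy; the node names and posterior values are made up for illustration.

```python
import math


def entropy(posterior):
    """Shannon entropy of a class posterior P(x_i | Z, Y); zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


# Hypothetical posteriors of three nodes in one subtree T, over |M| = 3 classes.
posteriors = {
    "a": [0.90, 0.05, 0.05],          # peaky distribution: confident node
    "b": [0.40, 0.30, 0.30],
    "c": [1 / 3, 1 / 3, 1 / 3],       # uniform distribution: maximal entropy
}

H = {node: entropy(q) for node, q in posteriors.items()}
i_star = min(H, key=H.get)            # most significant part: smallest entropy
```

The uniform posterior attains the maximum entropy $\log |M|$, so such a node is ranked last.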

In Figs. 4-1 and 4-2, we illustrate the most significant object part under each root,

where entropy is computed over seven and six image classes, shown in Figs. 4-1(top) and

4-2(top), respectively. The experiment is conducted as explained in Chapter 2, using the

Figure 4-1: For each subtree of $IT_V$, representing an object in the 128 x 128 image, a node $i^*$ is found with the smallest entropy for $|M| = 6 + 1 = 7$ possible image classes (top row). Bright pixels are descendants of $i^*$ at the leaf level and indicate the object part represented by $i^*$.

irregular tree with random node positions, and observables at all levels ($IT_V$). Details on

computing observables Y in this experiment are explained in Chapter 5. Note that for

different scenes different object parts are established as the most significant with respect to

the entropy measure.

4.2 Combining Object-Part Recognition Results

Once nodes are ranked with respect to the entropy measure, we are in a position to

devise a criterion to optimally combine this information toward ultimate object recognition.

Herewith, we propose a simple greedy algorithm, which, nonetheless, shows remarkable

improvements in performance over the whole-object recognition approach.

Under each root, we first select the descendant node with the smallest entropy. Each

selected node determines a subtree, whose leaf nodes form an object part. Then, we conduct

majority voting over these selected image regions. In the second round, we select under

each root the descendant node with the smallest entropy, such that it does not belong to

any of the subtrees selected in the first round. Now, these nodes determine new subtrees,

whose leaf nodes form object parts that do not overlap with the selected image regions in

Figure 4-2: For each subtree of $IT_V$, representing an object in the 256 x 256 image, a node $i^*$ is found with the smallest entropy for $|M| = 5 + 1 = 6$ possible image classes (top row). Bright pixels are descendants of $i^*$ at the leaf level and indicate the object part represented by $i^*$; the images represent the same scene viewed from three different angles; the most significant object parts differ over various scenes.

the first round. Then, we conduct majority voting over the newly selected image regions.

This procedure is repeated until we exhaustively cover all the pixels in the image. This

stage-wise majority voting over non-overlapping image regions constitutes the final step in

the object-part recognition strategy (see Fig. 1-3).
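The greedy procedure can be sketched for a single root as follows. This is one possible interpretation, not the author's code: nodes are visited in order of increasing entropy, each vote is taken only over the leaf pixels that earlier rounds have not yet covered, and the winning label is assigned to those pixels. The function name, the dictionary layout, and the labels are hypothetical.

```python
from collections import Counter


def stagewise_vote(nodes, leaf_labels):
    """nodes: {node_id: (entropy, set of leaf ids)}; leaf_labels: {leaf id: MAP label}."""
    covered, result = set(), {}
    for node_id in sorted(nodes, key=lambda i: nodes[i][0]):   # smallest entropy first
        region = nodes[node_id][1] - covered                   # keep regions non-overlapping
        if not region:
            continue
        winner = Counter(leaf_labels[p] for p in region).most_common(1)[0][0]
        for p in region:
            result[p] = winner
        covered |= region
    return result


# One root ("whole") with a confident descendant ("part") covering half of its pixels.
nodes = {
    "part": (0.2, {0, 1, 2}),
    "whole": (0.9, {0, 1, 2, 3, 4, 5}),
}
leaf_labels = {0: "car", 1: "car", 2: "tree", 3: "tree", 4: "tree", 5: "car"}
result = stagewise_vote(nodes, leaf_labels)
```

In this toy case the confident part is labeled first, and the remaining pixels under the root are voted on separately in the second round.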


In Chapters 2 and 3, we have introduced four architectures of the irregular tree, referred to as $IT_V$, $IT_{V^0}$, $IQT_V$, and $IQT_{V^0}$. To compute the observable (feature) random vectors Y for these models, we account for both color and texture cues.

5.1 Texture

For the choice of texture-based features, we have considered several filtering, model-based and statistical methods for texture feature extraction. Our conclusion agrees with the comparative study of Randen and Husoy [66] that, for problems with many textures with subtle spectral differences, as in the case of our complex classes, it is reasonable to assume that the spectral decomposition by a filter bank yields consistently superior results over other texture analysis methods. Our experimental results also suggest that it is crucial to analyze both local and regional properties of texture. As such, we employ the wavelet transform, due to its inherent representation of texture at different scales and locations.

5.1.1 Wavelet Transform

Wavelet atom functions, being well localized both in space and frequency, retrieve

texture information quite successfully [67]. The conventional discrete wavelet transform

(DWT) may be regarded as equivalent to filtering the input signal with a bank of bandpass

filters, whose impulse responses are all given by scaled versions of a mother wavelet. The

scaling factor between adjacent filters is 2:1, leading to octave bandwidths and center fre-

quencies that are one octave apart. The octave-band DWT is most efficiently implemented

by the dyadic wavelet decomposition tree of Mallat [68], where wavelet coefficients of an image are obtained by convolving every row and column with impulse responses of lowpass and highpass filters, as shown in Figure 5-1. Practically, coefficients of one scale are obtained by convolving every second row and column from the previous finer scale. Thus, the filter output is a wavelet subimage that has four times fewer coefficients than the one at the

Figure 5-1: Two levels of the DWT of a two-dimensional signal.


Figure 5-2: The original image (left) and its two-scale dyadic DWT (right).

previous scale. The lowpass filter is denoted by $H_0$ and the highpass filter by $H_1$. The wavelet coefficients W carry the index L for lowpass output and H for highpass output.

Separable filtering of rows and columns produces four subimages at each level, which

can be arranged as shown in Figure 5-2. The same figure also illustrates well the directional

selectivity of the DWT, because WLH, WHL, and WHH bandpass subimages can select

horizontal, vertical and diagonal edges, respectively.
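One level of the separable decomposition described above can be sketched in a few lines. The example uses the two-tap Haar pair as the lowpass and highpass filters purely for brevity; this filter choice, the function names, and the toy image are assumptions, and the assignment of the four outputs to the $W_{LH}$ and $W_{HL}$ names depends on the row/column convention in use.

```python
def haar_1d(signal):
    """One level of Haar analysis: lowpass and highpass outputs, each downsampled by 2."""
    s = 2 ** -0.5
    low = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    high = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return low, high


def transpose(m):
    return [list(row) for row in zip(*m)]


def filter_columns(m):
    """Apply haar_1d to every column; return the (lowpass, highpass) subimages."""
    pairs = [haar_1d(col) for col in transpose(m)]
    return transpose([lo for lo, _ in pairs]), transpose([hi for _, hi in pairs])


def haar_2d(image):
    """Filter rows first, then columns, giving four subimages at the next coarser scale."""
    rows = [haar_1d(row) for row in image]
    row_low = [lo for lo, _ in rows]
    row_high = [hi for _, hi in rows]
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


image = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
subbands = haar_2d(image)

energy_in = sum(v * v for row in image for v in row)
energy_out = sum(v * v for band in subbands for row in band for v in row)
```

Each of the four subimages has one quarter of the input's coefficients, and because the Haar filters are orthonormal the total energy is preserved across the four outputs.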

5.1.2 Wavelet Properties

The following properties of the DWT have made wavelet-based image processing very

attractive in recent years [67,30,69]:

1. locality: each wavelet coefficient represents local image content in space and frequency,
because wavelets are well localized simultaneously in space and frequency

2. multi-resolution: the DWT represents an image at different scales of resolution in the space
domain (i.e., in the frequency domain); regions of analysis at one scale are divided up into
four smaller regions at the next finer scale (Fig. 5-2)

3. edge detector: edges of an image are represented by large wavelet coefficients at the
corresponding locations

4. energy compression: wavelet coefficients are large only if edges are present within the
support of the wavelet, which means that the majority of wavelet coefficients have
small values

5. decorrelation: wavelet coefficients are approximately decorrelated, since the scaled
and shifted wavelets form an orthonormal basis; dependencies among wavelet coefficients
are predominantly local

6. clustering: if a particular wavelet coefficient is large/small, then the adjacent coeffi-
cients are very likely to also be large/small

7. persistence: large/small values of wavelet coefficients tend to propagate through scales

8. non-Gaussian marginal: wavelet coefficients have peaky and long-tailed marginal dis-
tributions; due to the energy compression property only a few wavelet coefficients
have large values, therefore Gaussian distribution for an individual coefficient is a
poor statistical model

It is also important to introduce shortcomings of the DWT. Discrete wavelet decom-

positions suffer from two main problems, which hamper their use for many applications, as

follows [70]:

1. lack of shift invariance: small shifts in the input signal can cause major variations in
the energy distribution of wavelet coefficients

2. poor directional selectivity: for some applications horizontal, vertical and diagonal
selectivity is insufficient

When we analyze the Fourier spectrum of a signal, we expect the energy in each frequency bin to be invariant to any shifts of the input. Unfortunately, the DWT has the significant drawback that the energy distribution between various wavelet scales depends critically on the position of key features of the input signal, whereas ideally it should depend only on the features themselves. Therefore, the real DWT is unlikely to give consistent results when used in texture analysis.

Figure 5-3: The Q-shift Dual-Tree CWT.

In literature, there are several approaches proposed to overcome this problem (e.g.,

Discrete Wavelet Frames [67,71]), all increasing computational load with inevitable redun-

dancy in the wavelet domain. In our opinion, the Complex Wavelet Transform (CWT) offers

the best solution providing additional advantages, described in the following subsection.

5.1.3 Complex Wavelet Transform

The structure of the CWT is the same as in Figure 5-1, except that the CWT filters have complex coefficients and generate complex output. The output sampling rates are unchanged from the DWT, but each wavelet coefficient contains a real and an imaginary part; thus, a redundancy of 2:1 is introduced for one-dimensional signals. In our case, for two-dimensional signals, the redundancy becomes 4:1, because two adjacent quadrants of the spectrum are required to represent fully a real two-dimensional signal, adding an extra 2:1 factor. This is achieved by additional filtering with complex conjugates of either the row or column filters [70].

Despite its higher computational cost, we prefer the CWT over the DWT because of the following attractive properties. The CWT has been shown to possess near shift and rotational invariance, given suitably designed biorthogonal or orthogonal wavelet filters. We

Table 5-1: Coefficients of the filters used in the Q-shift DTCWT.

H13 (symmetric)    H19 (symmetric)    H6
-0.0017581         -0.0000706         [illegible]
 0                  0                  0
[illegible]         0.0013419         -0.08832942
-0.0468750         [illegible]         0.23389032
-0.0482422         -0.0071568          0.76027237
 0.2968750          0.0238560          0.58751830
[illegible]        [illegible]         0
 0.2968750         -0.0516881         -0.11430184
-0.0482422         -0.2997576          0

Figure 5-4: The CWT is strongly oriented at angles ±15°, ±45°, ±75°.

implement the Q-shift Dual-Tree CWT scheme, proposed by Kingsbury [72], as depicted in Figure 5-3. For clarity, the figure shows the CWT of a one-dimensional signal x only. The outputs of trees a and b can be viewed as the real and imaginary parts of complex wavelet coefficients, respectively. Thus, to compute the CWT, we implement two real DWTs (see Fig. 5-1), obtaining a wavelet frame with redundancy two. As for the DWT, lowpass and highpass filters are denoted with 0 and 1 in the index, respectively. Level 0 comprises odd-length filters $H_{0a}(z) = H_{0b}(z) = H_{13}(z)$ (13 taps) and $H_{1a}(z) = H_{1b}(z) = H_{19}(z)$ (19 taps). Levels above level 0 consist of even-length filters $H_{00a}(z) = z^{-1}H_6(z^{-1})$, $H_{01a}(z) = H_6(-z)$, $H_{00b}(z) = H_6(z)$, $H_{01b}(z) = z^{-1}H_6(-z^{-1})$, where the impulse responses of the filters $H_{13}$, $H_{19}$ and $H_6$ are given in Table 5-1.



Aside from being shift invariant, the CWT is superior to the DWT in terms of direc-

tional selectivity, too. A two-dimensional CWT produces six bandpass subimages (analo-

gous to the three subimages in the DWT) of complex coefficients at each level, which are

strongly oriented at angles of ±15°, ±45°, ±75°, as illustrated in Figure 5-4.

Another advantageous property of the CWT emerges in the presence of noise. The

phase and magnitude of the complex wavelet coefficients collaborate in a nontrivial way

to describe data [70]. The phase encodes the coherent (in space and scale) structure of

an image, which is resilient to noise, and the magnitude captures the strength of local

information that could be very susceptible to noise corruption. Hence, the phase of complex

wavelet coefficients might be used as a principal clue for image denoising. However, our

experimental results have shown that phase is not a good feature choice for sky/ground

modeling. Therefore, we consider only magnitudes.
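
To make the magnitude-only choice concrete, the following is a minimal sketch. It assumes that the outputs of trees a and b for one subband are already available as two equally long lists of real numbers; the names `tree_a`, `tree_b`, and both functions are hypothetical and do not come from the dissertation.

```python
import math

def magnitude_features(tree_a, tree_b):
    """Magnitudes of complex coefficients a + j*b (tree a = real part, tree b = imaginary part)."""
    if len(tree_a) != len(tree_b):
        raise ValueError("both trees must produce the same number of coefficients")
    return [math.hypot(a, b) for a, b in zip(tree_a, tree_b)]

def phase_features(tree_a, tree_b):
    """Phases of the same coefficients, shown only for comparison with the magnitudes."""
    return [math.atan2(b, a) for a, b in zip(tree_a, tree_b)]
```

Only the first function would be needed for the texture features described here; the second is included to show what is being discarded.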

In summary, for texture analysis in ITv and IQTv, we choose the complex wavelet

transform (CWT) applied to the intensity (gray-scale) image, due to its shift-invariant

representation of texture at different scales, orientations and locations.

5.1.4 Difference-of-Gaussian Texture Extraction

In ITvo and IQTvo, observables are present only at the leaf level. Therefore, for these

models, multiscale texture extraction is superfluous. Here, we compute the difference-of-

Gaussian function convolved with the image as

$D(x, y, k, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y)$, (5.1)

where x and y represent pixel coordinates, $G(x, y, \sigma) = \exp(-(x^2 + y^2)/2\sigma^2)/(2\pi\sigma^2)$, and

$I(x, y)$ is the intensity image. In addition to reduced computational complexity, as compared

to the CWT, the function D provides a close approximation to the scale-normalized

Laplacian of Gaussian, $\sigma^2 \nabla^2 G$, which has been shown to produce the most stable image

features across scales when compared to a range of other possible image functions, such as

the gradient and the Hessian [73,74]. We compute $D(x, y, k, \sigma)$ for three scales $k = \sqrt{2}, 2, \sqrt{8}$

and $\sigma = 2$.
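
The difference-of-Gaussian computation can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in the experiments: the image is a 2-D list of intensities, the Gaussian is sampled and truncated at three standard deviations, the kernel is normalized to sum to one (instead of using the analytic $1/(2\pi\sigma^2)$ factor), and border pixels are replicated. All function names are hypothetical.

```python
import math

def gaussian_kernel_1d(sigma):
    """Sampled 1-D Gaussian, truncated at 3*sigma and normalized to sum to 1."""
    radius = int(math.ceil(3.0 * sigma))
    weights = [math.exp(-(t * t) / (2.0 * sigma * sigma))
               for t in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def _clamp(v, upper):
    """Clamp an index to [0, upper]; this replicates border pixels."""
    return max(0, min(upper, v))

def gaussian_blur(image, sigma):
    """Separable Gaussian blur: filter along rows first, then along columns."""
    kernel = gaussian_kernel_1d(sigma)
    r = len(kernel) // 2
    h, w = len(image), len(image[0])
    rows = [[sum(kernel[t + r] * image[y][_clamp(x + t, w - 1)]
                 for t in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]
    return [[sum(kernel[t + r] * rows[_clamp(y + t, h - 1)][x]
                 for t in range(-r, r + 1)) for x in range(w)]
            for y in range(h)]

def difference_of_gaussian(image, k, sigma):
    """D = (G(k*sigma) - G(sigma)) * I, following Eq. (5.1)."""
    wide = gaussian_blur(image, k * sigma)
    narrow = gaussian_blur(image, sigma)
    return [[a - b for a, b in zip(row_w, row_n)]
            for row_w, row_n in zip(wide, narrow)]

def dog_features(image, sigma=2.0,
                 scales=(math.sqrt(2.0), 2.0, math.sqrt(8.0))):
    """One tuple of three DoG responses per pixel, one response for each k."""
    maps = [difference_of_gaussian(image, k, sigma) for k in scales]
    return [[tuple(m[y][x] for m in maps) for x in range(len(image[0]))]
            for y in range(len(image))]
```

Because both kernels sum to one, a constant image yields a zero response at every pixel, which is the expected behavior of a band-pass texture feature.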


5.2 Color

The color information in a video signal is usually encoded in the RGB color space. For

color features, in all models, we choose the generalized RGB color space: r=R/(R+G+B),

and g=G/(R+G+B), which effectively normalizes variations in brightness. For ITv and

IQTv, the y's of higher-level nodes are computed as the mean of the r's and g's of their

children nodes in the initial quad-tree structure. Each color observable is normalized to

have zero mean and unit variance over the dataset.
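
The color features can be sketched as below. The handling of a black pixel (R+G+B = 0) and of a constant observable is not specified in the text, so both fallbacks are assumptions, as are all function names.

```python
import math

def normalized_rg(R, G, B):
    """Generalized RGB: r = R/(R+G+B), g = G/(R+G+B)."""
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel is mapped to the neutral chromaticity.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / total, G / total)

def parent_color(children_rg):
    """Mean of the (r, g) pairs of the children of a higher-level node."""
    n = len(children_rg)
    return (sum(r for r, _ in children_rg) / n,
            sum(g for _, g in children_rg) / n)

def standardize(values):
    """Shift and scale one observable to zero mean and unit variance."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        # Assumption: a constant observable is mapped to all zeros.
        return [0.0] * n
    return [(v - mean) / std for v in values]
```

Scaling all three channels by the same factor leaves (r, g) unchanged, which is the brightness normalization referred to above.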

In summary, the y's are 8-dimensional vectors for ITv and IQTv, and 5-dimensional

vectors for ITvo and IQTvo.


We report experiments on image segmentation and classification for six sets of images.

Dataset I comprises fifty, 64x64, simple-scene images with object appearances of 20 distinct

objects shown in Fig. 6-1. Samples of dataset I are given in Figs. 6-2, 6-3, and 6-4. Dataset

II contains 120, 128x128, complex-scene images with partially occluded object appearances

of the same 20 distinct objects as for dataset I images. Examples of dataset II are shown

in Figs. 6-11, 6-12, 6-15. Note that objects appearing in datasets I and II are carefully

chosen to test if irregular trees are expressive enough to capture very small variations in

appearances of some classes (e.g., two different types of cans in Fig. 6-1), as well as to

encode large differences among some other classes (e.g., wiry-featured robot and books in

Fig. 6-1).

Next, dataset III contains fifty, 128x128, natural-scene images, samples of which are

shown in Figs. 6-5 and 6-6.

For dataset IV we choose sixty, 128 x 128 images from a database that is publicly

available at the Computer Vision Home Page. Dataset IV contains a video sequence of

two people approaching each other, who wear alike shirts, but different pants, as illustrated

in Fig. 6-16. The sequence is interesting because the most significant "object" parts for

differentiating between the two persons (i.e., pants) get occluded. Moreover, the images

represent scenes with clutter, where recognition of partially occluded, similar-in-appearance

people becomes harder. Together with the two persons, there are 12 possible image classes

appearing in dataset IV, as depicted in Fig. 6-16a. Here, each image is treated separately,

without making use of the fact that the background scene does not change in the video

sequence.


Further, dataset V consists of sixty, 256 x 256 images, typical samples of which are

shown in Figs. 6-17b. The images in dataset V represent the video sequence of a com-

plex scene, which is observed from different view points by moving a camera horizontally

clockwise. Together with the background, there are 6 possible image classes, as depicted in

Figs. 6-17a.

Finally, dataset VI consists of sixty, 256 x 256 natural-scene images, samples of which

are shown in Figs. 6-18. The images in dataset VI represent the video sequence of a row

of houses, which is observed from different view points. The houses are very similar in

appearance, so that the recognition task becomes very difficult, when details differentiating

one house from another are occluded. There are 8 possible image classes: 4 different houses,

sky, road, grass, and tree, as marked with different colors in Figs. 6-18.

All datasets are divided into training and test sets by random selection of images,

such that 2/3 are used for training and 1/3 for testing. Ground truth for each image is

determined through hand-labeling of pixels.
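
A random 2/3 to 1/3 split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding of the training-set size are assumptions made for illustration.

```python
import random

def split_train_test(items, train_fraction=2.0 / 3.0, seed=0):
    """Randomly split items into a training list and a test list."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

For the 120 images of dataset II this gives 80 training and 40 test images, and for a 60-image dataset it gives 40 and 20, which agrees with the test-set sizes quoted later in this chapter.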

6.1 Unsupervised Image Segmentation Tests

We first report experiments on unsupervised image segmentation using ITvo and ITv.

Irregular-tree based image segmentation is tested on datasets I and III, and conducted by

the algorithm given in Fig. 2-4. Since in unsupervised settings the parameters of the model

are not known, we initialize them as discussed in the initialization step of the learning

algorithm in Section 2.5. After Bayesian estimation of the irregular tree, each node defines

one image region composed of those leaf nodes (pixels) that are that node's descendants.

Results presented in Figs. 6-2, 6-3, 6-4, 6-5, and 6-6 suggest that irregular trees are able

to parse images into "meaningful" parts by assigning one subtree per "object" in the image.

Moreover, from Figs. 6-2 and 6-3, we also observe that irregular trees, inferred through SVA,

preserve structure for objects across images subject to translation, rotation and scaling. In

Fig. 6-2, note that the level-4 clustering for the larger-object scale in Fig. 6-2(top-right)

corresponds to the level-3 clustering for the smaller-object scale in Fig. 6-2(bottom-center).

In other words, as the object transitions through scales, the tree structure changes by

eliminating the lowest-level layer, while the higher-order structure remains intact.

We also note that the estimated positions of higher-level hidden variables in ITvo and

ITv are very close to the center of mass of object parts, as well as of whole objects. We

compute the error of estimated root-node positions r as the distance from the actual center

of mass $r_{CM}$ of hand-labeled objects, $d_{err} = \|r - r_{CM}\|$. Also, we compare our SVA inference

Figure 6-1: 20 image classes in type I and II datasets.


Figure 6-2: Image segmentation using
ITvo: (left) dataset I images; (center)
pixel clusters with the same parent at
level $\ell=3$; (right) pixel clusters with the
same parent at level $\ell=4$; points mark the
position of parent nodes. Irregular-tree
structure is preserved through scales.

Figure 6-3: Image segmentation using
ITvo: (top) dataset I images; (bottom)
pixel clusters with the same parent at
level 3. Irregular-tree structure is pre-
served over rotations.

algorithm with variational approximation (VA)1 proposed by Storkey and Williams [48].
The averaged error values over the given test images for VA and SVA are reported in
Table 6-1. We observe that the error, measured relative to the image size, decreases as the image size increases,
because in summing node positions over parent and children nodes, as in Eq. (2.16) and
Eq. (2.17), more statistically significant information contributes to the position estimates.
For example, $d_{err} = 6.18$ for SVA is only 4.8% of the dataset-III image size, whereas $d_{err} = 4.23$
for SVA is 6.6% of the dataset-I image size.
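
The root-node distance error and its size relative to the image can be sketched as follows. The binary-mask representation of a hand-labeled object and all function names are assumptions; image sizes of 64 and 128 pixels are taken from the descriptions of datasets I and III.

```python
import math

def center_of_mass(mask):
    """Mean (x, y) position of the nonzero pixels of a hand-labeled object mask."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    if not points:
        raise ValueError("mask contains no object pixels")
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)

def root_distance_error(root_position, mask):
    """Euclidean distance between a root-node position and the object's center of mass."""
    cx, cy = center_of_mass(mask)
    return math.hypot(root_position[0] - cx, root_position[1] - cy)

def relative_error_percent(d_err, image_size):
    """Distance error expressed as a percentage of the image side length."""
    return 100.0 * d_err / image_size
```

With the SVA values quoted above, `relative_error_percent(6.18, 128)` and `relative_error_percent(4.23, 64)` evaluate to roughly 4.8 and 6.6, respectively.
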
In Table 6-2, we report the percentage of erroneously grouped pixels, and, in Table 6-3,
we report the object detection error, when compared to ground truth, averaged over each
dataset. For estimating the object detection error, the following instances are counted as

1 Although the algorithm proposed by Storkey and Williams [48] is also structured varia-
tional approximation, to differentiate that method from ours, we slightly abuse the notation.

Figure 6-4: Image segmentation by irregular trees learned using SVA: (a)-(c) ITvo for
dataset I images; all pixels labeled with the same color are descendants of a unique root.

(a) (b) (c) (d)

Figure 6-5: Image segmentation by irregular trees learned using SVA: (a) ITyo for a dataset
III image; (b)-(d) ITv for dataset III images; all pixels labeled with the same color are
descendants of a unique root.

(c) (d)

Figure 6-6: Image segmentation using ITv: (a) a dataset III image; (b)-(d) pixel clusters
with the same parent at levels $\ell=3, 4, 5$, respectively; white regions represent pixels already
grouped by roots at the previous scale; points mark the position of parent nodes.

Table 6-1: Root-node distance error

model   dataset   VA     SVA
ITvo    I         6.32   4.61
ITvo    III       9.15   6.87
ITv     I         6.14   4.23
ITv     III       8.99   6.18


Table 6-2: Pixel segmentation error Table 6-3: Object detection error
datasets datasets

ITvo VA 7' 10%
SVA c 9',
ITv VA 7'. 11%
SVA 1' 7'.

ITvo VA 1: ',
SVA ,:' 10%
ITv VA ', 10%
SVA 2'. .'

error: (1) merging two distinct objects into one (i.e., failure to detect an object), and (2)

segmenting an object into sub-regions that are not actual object parts. On the other hand, if

an object is segmented into several "meaningful" sub-regions, verified by visual inspection,

this type of error is not included. Overall, we observe that SVA outperforms VA for image

segmentation using ITvo and ITv. Interestingly, the segmentation results for ITv models

are only slightly better than for ITvo models.

It should be emphasized that our experiments are carried out in an unsupervised setting

and, as such, cannot be equitably evaluated against supervised object recognition results

reported in the literature. Take, for instance, the segmentation in Fig. 6-5d, where two

boys dressed in white clothes (i.e., two similar-looking objects) are merged into one subtree.

Given the absence of prior knowledge, the ground-truth segmentation for this image is

arbitrary, and the resulting segmentation ambiguous; nevertheless, we still count it towards

the object-detection error percentages in Table 6-3.

Our claim that nodes at different levels of irregular trees represent object-parts at

various scales is supported by experimental evidence that the nodes segment the image into

"meaningful" object sub-components and position themselves at the center of mass of these


6.2 Tests of Convergence

In this section, we report on the convergence properties of the inference algorithms

for ITvo, ITv, IQTyo, and IQTv. First, we compare our SVA inference algorithm with

variational approximation (VA) [48]. In Fig. 6-7a-b, we illustrate the convergence rate

of computing $P(Z, X, R'|Y, R^0) \approx Q(Z, X, R')$ for SVA and VA, averaged over the given

datasets. Numbers above bars represent the mean number of iteration steps it takes for

the algorithm to converge. We consider the algorithm converged when

$|Q(Z,X,R';t+1) - Q(Z,X,R';t)| / Q(Z,X,R';t) < \varepsilon$, with $\varepsilon = 0.01$ (see Fig. 2-4, Step (11)).

Overall, SVA converges in the fewest iterations.

For example, the average number of iterations for SVA on dataset III is 25 and 23 for ITvo

and ITv, respectively, which takes approximately 6s and 5s on a Dual 2 GHz PowerPC G5.

Here, the processing time also includes image-feature extraction.

Figure 6-7: Comparison of inference algorithms on datasets I, III, and II: (a)-(b) convergence
rate averaged over the given datasets, for ITvo and ITv, respectively; (c)-(d) percentage
increase in log Q(Z,X,R') computed in SVA over log Q(Z,X,R') computed in VA, for ITvo and
ITv, respectively.
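
The stopping rule can be sketched as follows. The relative-change test mirrors the criterion stated above; requiring the test to hold for `n_consecutive` successive steps is an assumption about how the parameter N mentioned later in this section is used, and the function names are hypothetical.

```python
def relative_change(q_prev, q_next):
    """|Q(t+1) - Q(t)| / Q(t) for two successive values of Q."""
    return abs(q_next - q_prev) / abs(q_prev)

def iterations_to_converge(q_values, eps=0.01, n_consecutive=10):
    """Return the first step t (1-based) at which the relative change has been
    below eps for n_consecutive successive steps, or None if that never happens."""
    streak = 0
    for t in range(1, len(q_values)):
        if relative_change(q_values[t - 1], q_values[t]) < eps:
            streak += 1
            if streak >= n_consecutive:
                return t
        else:
            streak = 0  # a large change resets the count
    return None
```

Setting `n_consecutive=1` reduces the rule to the single-step test on the relative change of Q.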

For the same experiments, in Fig. 6-7c-d, we report the percentage increase in log Q(Z, X, R')

computed using our SVA over log Q(Z, X, R') obtained by VA. We note that SVA results

in larger approximate posteriors than VA. The larger log Q(Z, X, R') means that the assumed

form of the approximate posterior distribution $Q(Z,X,R') = Q(Z)Q(X|Z)Q(R'|Z)$

more accurately represents underlying stochastic processes in the image than VA.

Now, we compare the convergence of the inference algorithm for IQTyo with SVA and

VA for ITyo. For simplicity, we refer to the inference algorithm for the model IQTvo,

also, as IQTyo, slightly abusing the notation. The parameters that control the convergence




Figure 6-8: Typical convergence rate of the inference algorithm for ITvo on the 128 x 128
dataset IV image in Fig. 6-16b; SVA and VA inference algorithms are conducted for ITvo


Figure 6-9: Typical convergence rate of the inference algorithm for ITvo on the 256 x 256
dataset V image in Fig. 6-17b; SVA and VA inference algorithms are conducted for ITvo

Figure 6-10: Percentage increase in log-likelihood log P(Y|X) of IQTvo over log P(Y|X)
of ITvo, after 500 and 200 iteration steps for datasets IV and V, respectively.

criterion for the inference algorithms of the three models are N=10 and $\varepsilon$=0.01. Figs. 6-8

and 6-9 illustrate typical examples of the convergence rate. We observe that the inference

algorithm for IQTvo converges slightly more slowly than SVA and VA for ITvo. The average

number of iteration steps for IQTvo is approximately 160 and 230, which takes 6s and 17s

on a Dual 2 GHz PowerPC G5, for datasets IV and V, respectively.

The bar chart in Fig. 6-10 shows the percentage increase of $\log P_2$ over $\log P_1$, where $P_1 = P(Y|X)$ is

the likelihood of ITvo, and $P_2 = P(Y|X)$ that of IQTvo. We observe that P(Y|X) of IQTvo,

after the algorithm has converged, is larger than P(Y|X) of ITvo. The larger likelihood means

that the model structure and inferred distributions more accurately represent underlying

stochastic processes in the image.

6.3 Image Classification Tests

We compare classification performance of ITyo with that of the following statistical

models: (1) Markov Random Field (MRF) [6], (2) Discriminative Random Field (DRF) [25],

and (3) Tree-Structured Belief Network (TSBN) [33,29]. These models are representatives

of descriptive, discriminative and fixed-structure generative models, respectively. Below,

we briefly explain the models.

For MRFs, we assume that the label field P(X) is a homogeneous and isotropic MRF,

given by the generalized Ising model with only pairwise nonzero potentials [6]. The likelihoods

$P(y_i|x_i)$ are assumed conditionally independent given the labels. Thus, the posterior

energy function is given by

$U(X|Y) = -\sum_{i \in V^0} \log P(y_i|x_i) + \sum_{i \in V^0} \sum_{j \in N_i} V_2(x_i, x_j)$,

$V_2(x_i, x_j) = \beta_{MRF}$ if $x_i = x_j$, and $V_2(x_i, x_j) = -\beta_{MRF}$ if $x_i \neq x_j$,

where $N_i$ denotes the neighborhood of i, $P(y_i|x_i)$ is a G-component mixture of Gaussians

given by Eq. (2.6), and $\beta_{MRF}$ is the interaction parameter. Details on learning the model

parameters as well as on inference for a given image can be found in Stan Li's book [6].
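
For a small label image, the posterior energy of this MRF can be evaluated as in the sketch below. It rests on several assumptions: the data term is taken as the negative log-likelihood, the neighborhood is the 4-connected one, the per-pixel log-likelihoods are supplied in a precomputed table `log_lik` (a hypothetical name) instead of being evaluated from a Gaussian mixture, and the double sum visits every neighboring pair from both of its sites.

```python
def mrf_posterior_energy(labels, log_lik, beta):
    """Posterior energy U(X|Y) of a label image.

    labels[y][x]      -- integer class label of the pixel
    log_lik[y][x][c]  -- log P(y_i | x_i = c) for that pixel
    beta              -- interaction parameter of the pairwise potential
    """
    h, w = len(labels), len(labels[0])
    energy = 0.0
    for y in range(h):
        for x in range(w):
            label = labels[y][x]
            energy -= log_lik[y][x][label]          # data term
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    # V2 = +beta for equal labels, -beta for different labels
                    energy += beta if labels[ny][nx] == label else -beta
    return energy
```

Under this sign convention a negative `beta` lowers the energy of agreeing neighbors, so it favors smooth labelings when the energy is minimized.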

Next, the posterior energy function of the DRF is given by

$U(X|Y) = \sum_{i \in V^0} A_i(x_i, Y) + \sum_{i \in V^0} \sum_{j \in N_i} I_{ij}(x_i, x_j, Y)$,

where $A_i = \log \sigma(x_i W^T y_i)$ and $I_{ij} = \beta_{DRF}(K x_i x_j + (1-K)(2\sigma(x_i x_j V^T y_{ij}) - 1))$ are the unary

and pairwise potentials, respectively, and $\sigma(\cdot)$ denotes the logistic function. Since the above formulation deals only with binary

classification (i.e., $x_i \in \{-1, 1\}$), when estimating parameters $\{W, V, \beta_{DRF}, K\}$ for an object,

we treat that object as a positive example, and all other objects as negative examples

("one against all" strategy). For details on how to learn the model parameters, and how

to conduct inference for a given image, we refer the reader to the paper of Kumar and

Hebert [25].
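
The two DRF potentials can be sketched as follows. This is an illustration of the formulas only, not of the learning or inference procedures of [25]. The feature vectors `y_i` (per site) and `y_ij` (per pair of sites) are assumed to be plain lists of numbers, the logistic function is assumed for the squashing function, and all names are hypothetical.

```python
import math

def log_sigmoid(t):
    """Numerically stable log of the logistic function 1 / (1 + exp(-t))."""
    if t >= 0:
        return -math.log1p(math.exp(-t))
    return t - math.log1p(math.exp(t))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def association_potential(x_i, w, y_i):
    """Unary potential A_i = log sigma(x_i * w^T y_i), with x_i in {-1, +1}."""
    return log_sigmoid(x_i * dot(w, y_i))

def interaction_potential(x_i, x_j, v, y_ij, beta, K):
    """Pairwise potential beta * (K x_i x_j + (1 - K)(2 sigma(x_i x_j v^T y_ij) - 1))."""
    s = math.exp(log_sigmoid(x_i * x_j * dot(v, y_ij)))
    return beta * (K * x_i * x_j + (1.0 - K) * (2.0 * s - 1.0))
```

With K = 1 the pairwise term reduces to the data-independent Ising form `beta * x_i * x_j`; with K = 0 it depends only on the data through `y_ij`.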

Further, TSBNs or quad-trees are defined to have the same number of nodes V and

levels L as irregular trees. For both ITyo and TSBNs, we use the same image features.

When we operate on wavelets, which is a multiscale image feature, we in fact propagate

observables to higher levels. In this case, we refer to the counterpart of ITv as TSBNT. To

learn the parameters of TSBN or TSBNT, and to perform inference on a given image, we

use the algorithms thoroughly discussed by Laferte et al. [33].

Finally, irregular-tree based image classification is conducted by employing the inference

algorithms in Fig. 2-4 for ITvo and ITv, and the inference algorithms in Fig. 3-2

for IQTvo and IQTv. Since image classification represents a supervised machine learning

problem, it is necessary to first learn model parameters on training images. For this purpose,

we employ the learning algorithms discussed in Section 2.5 for ITvo and ITv, and the

learning algorithms discussed in Section 3.3 for IQTvo and IQTv.

After inference of MRF, DRF, TSBN, and the irregular tree, on a given image, for each

model, we conduct pixel labeling by using the MAP classifier. In Fig. 6-11, we illustrate

an example of pixel labeling for a dataset-II image. Here, we say that an image region is

correctly recognized as an object if the majority of MAP-classified pixel labels in that region

are equal to the true labeling of the object. For estimating the object-recognition error, the

following instances are counted as error: (1) merging two distinct objects into one, and (2)

swapping the identity of objects. The object-recognition error over all objects in 40 test

images in dataset II is summarized in Table 6-4. In each cell of Table 6-4, the first number

indicates the overall recognition error, while the number in parentheses indicates the ratio

of swapped-identity errors. For instance, for ITyo the overall recognition error is 9.,.' ., of

which ;7' of instances were caused by swapped-identity errors. Moreover, Table 6-5 shows

average pixel-labeling error.
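
The MAP pixel labeling and the majority rule for regions can be sketched as follows. Two assumptions are made: per-pixel posteriors are given as dictionaries mapping class names to probabilities, and "majority" is read as strictly more than half of the pixels in the region. The function names are hypothetical.

```python
def map_label(posterior):
    """MAP classifier: the class with the largest posterior probability."""
    return max(posterior, key=posterior.get)

def region_recognized(pixel_labels, true_label):
    """True if more than half of the MAP pixel labels in a region equal the true object label."""
    matches = sum(1 for label in pixel_labels if label == true_label)
    return matches > len(pixel_labels) / 2.0

def recognition_error(regions):
    """Fraction of regions not recognized; regions is a list of (pixel_labels, true_label) pairs."""
    failures = sum(1 for labels, truth in regions
                   if not region_recognized(labels, truth))
    return failures / len(regions)
```

This sketch only counts regions whose majority label is wrong; distinguishing merged-object from swapped-identity errors would additionally require the region-to-object correspondence, which is not modeled here.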

Next, we examine the receiver operating characteristic (ROC) of MRF, DRF, TSBN

and ITvo for a two-class recognition problem. From the set of image classes given in Fig. 6-1,

we choose "to-, --ii I and ,-- I. i-' ..... I: as the two possible classes in the following set

Table 6-4: Object recognition error

image type MRF DRF TSBN ITyo
dataset II 21.2' 12.5% 14 -' 9..'
'0.7 .) ( :'.) (72 .) (::7',)

Table 6-5: Pixel labeling error

image type MRF DRF TSBN ITyo
dataset II 1'. '. 12.::'. 16.1% 9.9'.

of experiments. The task is to label two-class-problem images containing "to-,--,ii Ii and

'.- ,i. 1. i-i .... I: objects, a typical example of which is shown in Fig. 6-12. Here, pixels

labeled as "toy-snail" are considered true positives, while pixels labeled as ....:" are

considered true negatives. In Fig. 6-13, we plot ROC curves for the two-class problem,

where we compare the performance of ITvo with those of MRF, DRF and TSBN. From

Fig. 6-13, we observe that image classification with ITyo is the most accurate, since its

ROC curve is the closest to the left-hand and top borders of the ROC space, as compared

to the ROC curves of the other models. Further, in Fig. 6-14, we plot ROC curves for the

same two-class problem, where we compare the performance of ITv, with those of ITvo,

TSBN, and TSBNT. From Fig. 6-14, we observe that image classification with ITv is the

most accurate, and that both ITyo and ITv outperform their fixed-structure counterparts
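
For reference, points of an ROC curve for a two-class pixel-labeling problem can be computed as in the sketch below. The text does not state how the curves in Figs. 6-13 and 6-14 were generated; sweeping a threshold over a per-pixel score for the positive class is an assumption, and the function name is hypothetical.

```python
def roc_points(scores, is_positive):
    """Return (false positive rate, true positive rate) pairs, one per distinct threshold.

    scores[i]      -- confidence that pixel i belongs to the positive class
    is_positive[i] -- True if pixel i is positive in the ground truth
    """
    n_pos = sum(1 for p in is_positive if p)
    n_neg = len(is_positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative pixel")
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and p)
        fp = sum(1 for s, p in zip(scores, is_positive) if s >= threshold and not p)
        points.append((fp / n_neg, tp / n_pos))
    return points
```

A curve that stays closer to the left-hand and top borders of the ROC space corresponds to a classifier that reaches a high true positive rate at a low false positive rate.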


From the results reported in Tables 6-4 and 6-5, as well as from Figs. 6-13 and 6-14,

we note that irregular trees outperform the other three models. However, recognition

performance of all the models suffers substantially when an image contains occlusions.

While for some applications the literature reports vision systems with impressively small

classification errors (e.g., 2.5% hand-written digit recognition error [75]), in the case of

(a) 256 x 256 (b) MRF (c) DRF (d) TSBN (e) ITvo

Figure 6-11: Comparison of classification results for various statistical models; pixels are
labeled with a color specific for each object; non-colored pixels are classified as background.


(a) 256 x 256 (b) MRF (c) DRF (d) TSBN (e) ITvo

Figure 6-12: MAP pixel labeling using different statistical models.


Figure 6-13: ROC curves for the image in Fig. 6-12a with ITvo, TSBN, DRF and MRF.

complex scenes this error is much higher [76, 77, 11, 5, 4]. To some extent, our results

could have been improved had we employed more discriminative image features and/or

more sophisticated classification algorithms than majority rule. However, none of these

will alleviate the fundamental problem of "traditional" recognition approaches: the lack of

explicit analysis of visible object parts. Thus, the poor classification performance of MRF,

DRF, and TSBN, reported in Tables 6-4 and 6-5, can be interpreted as follows. Accounting

for only pairwise potentials between adjacent nodes in MRF and DRF is not sufficient to

analyze complex configurations of objects in the scene. Also, the analysis of fixed-size pixel

neighborhoods at various scales in TSBN leads to "blocky" estimates, and consequently


Figure 6-14: ROC curves for the image in Fig. 6-12a with ITv, ITvo, TSBN, and TSBNT.

to poor classification performance. Therefore, we hypothesize that the main reason why

irregular trees outperform the other models is their capability to represent object details at

various scales, which in turn provides for explicit analysis of visible object parts. In other

words, we speculate that in the face of the occlusion problem, recognition of object parts is

critical and should condition recognition of the object as a whole.

To support our hypothesis, instead of employing more sophisticated image-feature-

extraction tools and better classification procedures than majority vote, we introduce a

more radical change to our recognition strategy.

6.4 Object-Part Recognition Strategy

Recall from Section 6.1 that irregular trees are capable of capturing component sub-

component structures at various scales, such that root nodes represent the center of mass

of distinct objects, while children nodes down the subtrees represent object parts. As such,

irregular trees provide a natural and seamless framework for identifying candidate image

regions as object parts, requiring no additional training for such identification. To uti-

lize this convenient property, we conduct the object-part recognition strategy presented in

Section 4.2.

We compare the performance of the whole-object and part-object recognition strategies.

The whole-object approach can be viewed as a benchmark strategy, in the sense that a

majority of existing vision systems does not explicitly analyze visible object parts at various

scales. In these systems, once the object is detected, the whole image region is identified

through MAP classification, as is done in the previous section.

In Fig. 6-15, we present classification results for ITyo, using the whole-object and

object-part recognition strategies on dataset-II images. In Fig. 6-15a, both strategies suc-

ceed in recognizing two different "Fluke" voltage-measuring instruments (see Fig. 6-1).

However, in Fig. 6-15b, the whole-object recognition strategy fails to make a distinction

between the objects, since the part that differentiates most one object from another is oc-

cluded, making it a difficult case for recognition even for a human interpreter. In the other

two images, we observe that the object-part recognition strategy is more successful than

the whole-object approach.

(a) (b) (c) (d)

Figure 6-15: Comparison of two recognition strategies on dataset II for ITyo: (top) 128 x
128 challenging images containing objects that are very similar in appearance; (middle)
classification using the whole-object recognition strategy; (bottom) classification using the
part-object recognition strategy; each recognized object in the image is marked with a
different color.

For estimating the object-recognition error of ITyo on dataset-II images, the following

instances are counted as error: (1) merging two distinct objects into one (i.e., object not

detected), and (2) swapping the identity of objects (i.e., object correctly detected but

misclassified as one of the objects in the class of known objects). The recognition error

averaged over all objects in 40 test images in dataset II is only -', an improvement of

nearly over the reported error of 9.1' in the previous section.

We also recorded the object-recognition error of IQTvo over all objects in 20 test

images of datasets IV, V, and VI, respectively. The results are summarized in Table 6-6.

In each cell of Table 6-6, the first number indicates the overall recognition error, while the

number in parentheses indicates the ratio of merged-object errors. For instance, for dataset

V and the whole-object strategy, the overall recognition error is 21.2' of which slightly

more than half ('.1i' .) were caused by merged-object errors. The results in Table 6-6 clearly

demonstrate significantly improved recognition performance, as well as reduction in false

Table 6-6: Object recognition error for IQTyo

strategy IV V VI
whole-object 11.i.' ( .) 21.2' ( .') 26.;' ( .)
object-part 3. :'. (100%) 8.7' (92'.) 12.5% (81%)

Table 6-7: Pixel labeling error for IQTyo

strategy IV V VI
whole-object 9..' 17.9' 16..:'
object-part 4.:;', 6.7' 8.;'

alarm and swapped-identity types of error for the object-part, as compared with the whole-

object approach. Also, Table 6-7 shows that the object-part strategy reduces pixel-labeling

error.


These results support our hypothesis that for successful recognition of partially occluded

objects it is critical to analyze visible object details at various scales.

(a) Cluttered scene containing 10 objects, each of which is marked with a different color; images of two
alike persons.

(b) Dataset IV: video sequence of two alike people walking in a cluttered scene.

(c) Classification using the whole-object recognition strategy.

(d) Classification using the part-object recognition strategy.

Figure 6-16: Recognition results over dataset IV for IQTvo.

(a) 6 image classes: 5 similar objects and background.

(b) 4 images of the same scene viewed from 4 different angles with objects shown in (a).

(c) The most significant object parts differ over various scenes; the majority-voting classification result is
indicated by the colored regions.

(d) Classification using the whole-object recognition strategy.

(e) Classification using the object-part recognition strategy.

Figure 6-17: Recognition results over dataset V for IQTvo.

Figure 6-18: Classification using the part-object recognition strategy; Recognition results
for dataset VI.








7.1 Summary of Contributions

In this dissertation, we have addressed detection and recognition of partially occluded,

alike objects in complex scenes, a problem that has so far eluded a satisfactory

solution. The experiments reported herein show that "traditional" approaches to object

recognition, where objects are first detected and then identified as a whole, yield poor performance

in complex settings. Therefore, we speculate that a careful analysis of visible,

fine-scale object details may prove critical for recognition. However, in general, the analysis

of multiple sub-parts of multiple objects gives rise to prohibitive computational complexity.

To overcome this problem, we have proposed to model images with irregular trees, which

provide a suitable framework for developing novel object-recognition strategies, in particular

object-part recognition. Here, object details at various scales are first detected through

tree-structure estimation; then, these object parts are analyzed as to which component of

an object is the most significant for recognition of that object; finally, information on cog-

nitive significance of each object part is combined toward the ultimate image classification.

Empirical evidence demonstrates that this explicit treatment of object parts results in an

improved recognition performance, as compared to the strategies where object components

are not explicitly accounted for.

In Chapter 2, we have proposed two architectures within the irregular-tree framework,

referred to as ITyo and ITv. For each architecture, we have developed an inference al-

gorithm. Gibbs sampling has been shown to be successful at finding trees that have high

posterior probability; however, at a great computational price, which renders the algorithm

impractical. Therefore, we have proposed Structured Variational Approximation (SVA)

for inference of ITyo and ITv, which relaxes poorly justified independence assumptions in

prior work. We have shown that SVA converges to larger posterior distributions, an order

of magnitude faster than competing algorithms. We have also demonstrated that ITyo and

ITv overcome the blocky segmentation problem of TSBNs, and that they possess certain

invariance to translation, rotation, and scaling transformations.

In Chapter 3, we have proposed another two architectures, referred to as IQTvo and

IQTv. In these models, we have constrained the node positions to be fixed, such that

only connections can control irregular tree structure. At the same time, we have made the

distribution of connections dependent on image classes. This formulation has allowed us to

avoid variational-approximation inference, and to develop the exact inference algorithm for

IQTvo and IQTv. We have shown that it converges slower than SVA; however, it yields

larger likelihood, which in general means that IQTyo represents underlying stochastic

processes in the image more accurately than ITvo.

In experiments on unsupervised image segmentation, we have shown the capability of

irregular trees to capture important component-subcomponent structures in images. Empir-

ical evidence demonstrates that root nodes represent the center of mass of distinct objects,

while children nodes down the subtrees represent object parts. As such, irregular trees

provide a natural and seamless framework for identifying candidate image regions as ob-

ject parts, requiring no additional training for such identification. In Chapter 4, we have

proposed to explicitly analyze the significance of object parts (i.e., tree nodes) with respect

to recognition of an object as a whole. We have defined entropy as a measure of such cognitive

significance. To avoid the costly approach of analyzing every detected object part,

we have devised a greedy algorithm, referred to as object-part recognition. The compari-

son of whole-object and part-object approaches indicates that the latter method generates

significantly better recognition performance and reduced pixel-labeling error.
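As a minimal sketch of the entropy measure mentioned above, the following Python snippet computes Shannon's entropy of a discrete distribution attached to each tree node and orders the nodes by it. The choice of distribution (here, a per-node class posterior), the base-2 logarithm, and the names `shannon_entropy` and `rank_nodes_by_entropy` are assumptions made for illustration only; the exact definition and the greedy selection rule belong to Chapter 4 and are not reproduced here.

```python
import math


def shannon_entropy(probs):
    """Entropy in bits of a discrete distribution.

    Zero-probability outcomes contribute nothing to the sum.
    """
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def rank_nodes_by_entropy(node_posteriors):
    """Return node ids ordered from lowest to highest entropy.

    node_posteriors: dict mapping a (hypothetical) node id to its
    class-posterior vector. Whether low or high entropy marks a
    significant part is a modeling decision not made here.
    """
    return sorted(node_posteriors,
                  key=lambda i: shannon_entropy(node_posteriors[i]))
```

For instance, a node whose posterior is split evenly between two classes has an entropy of one bit, while a node that is certain of its class has zero entropy.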

Ultimately, what allows us to overcome obstacles in analyzing scenes with occlusions

in a computationally efficient and intuitively appealing manner is the generative-model

framework we have proposed. This framework provides an explicit representation of objects

and their sub-parts at various scales, which, in turn, constitutes the key factor for improved

interpretation of scenes with partially occluded, alike objects.

7.2 Opportunities for Future Work

The analysis in the previous chapters suggests the following opportunities for future

work. One promising thrust of research would be to investigate relationships among descrip-

tive, generative and discriminative statistical models. We anticipate that these studies will

lead to a greater integration of the modeling paradigms, yielding richer and more advanced

classes of models. Here, the most critical issue is that of computationally manageable in-

ference. With recent advances in the area of belief propagation (e.g., Generalized Belief

Propagation [78]), the new algorithms may make it possible to solve real-world problems

that were previously computationally intractable.

Within the irregular-tree framework, it is possible to continue further investigation

toward replacing the current discrete-valued node variables with real-valued ones. Thereby,

a real-valued version of the irregular tree can be specified. Gaussians could be used as

a probability distribution to govern continuous random variables, represented by nodes,

due to their tractable properties. Such a model could then operate directly on real-valued

pixel data, improving the state-of-the-art techniques for solving various image-processing

problems, including super resolution, image enhancement, and compression.

Further, with respect to the measure of significance of irregular-tree nodes, one can

pursue investigation of more complex information-theoretic concepts than Shannon's entropy.

For example, we anticipate that joint entropy and mutual information may yield a more

efficient cognitive analysis, which in turn could eliminate the need for the greedy algorithm

discussed in Section 4.2.
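To make the suggested alternative concrete, here is a small, hedged Python sketch of mutual information between two discrete variables, computed from a joint probability table. How such a table would be obtained for a pair of tree nodes is not specified in this dissertation, so the input format and the function name `mutual_information` are purely illustrative assumptions.

```python
import math


def mutual_information(joint):
    """I(A;B) in bits from a joint probability table joint[a][b].

    The table is assumed to be non-negative and to sum to one.
    """
    pa = [sum(row) for row in joint]          # marginal of A
    pb = [sum(col) for col in zip(*joint)]    # marginal of B
    mi = 0.0
    for a, row in enumerate(joint):
        for b, p in enumerate(row):
            if p > 0.0:
                mi += p * math.log2(p / (pa[a] * pb[b]))
    return mi
```

Independent variables give zero mutual information, whereas two perfectly coupled binary variables give one bit.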

The analysis of object parts can be interpreted as integration of information from

multiple complementary and/or competitive sensors, each of which has only limited accu-

racy. As such, further research could be conducted on formulating the optimal strategy

for combining the pieces of information of object parts toward ultimate object recognition.

We anticipate that algorithms such as the adaptive boosting (AdaBoost) [79] and Support

Vector Machine [80] may prove useful for this purpose.

Another promising research topic is to incorporate available prior knowledge into the

proposed Bayesian estimation framework, where we have assumed that all classification

errors are equally costly. However, in many applications, some errors are more serious than

others. Cost-sensitive learning methods are needed to address this problem [81].
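The difference between equal-cost and cost-sensitive decisions can be illustrated with a short Python sketch of the minimum-expected-cost rule. This is a generic textbook rule, not the method of [81]; the cost matrix layout `cost[d][k]` and the name `bayes_decision` are assumptions introduced for this example.

```python
def bayes_decision(posterior, cost):
    """Return the decision with minimum expected cost.

    posterior[k]: posterior probability of true class k.
    cost[d][k]:   cost of deciding d when the truth is k
                  (hypothetical layout chosen for this sketch).
    """
    risks = [sum(c_dk * p_k for c_dk, p_k in zip(row, posterior))
             for row in cost]
    return min(range(len(risks)), key=risks.__getitem__)
```

With a zero-one cost matrix the rule reduces to picking the most probable class; with asymmetric costs the decision can move to a less probable class whose errors are cheaper.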

On a broader scale, the research reported in this dissertation can be viewed as solving

a more general machine learning problem, with experimental validation on images as data.

This problem concerns supervised learning from examples, where the goal is to learn a

function $X = f(Y)$ from $N$ training examples of the form $\{(Y_n, f(Y_n))\}_{n=1}^{N}$. Here, $X_n$

and $Y_n$ contain sub-components, the meaning of which differs for various applications. For

example, in computer vision, each $Y_n$ might be a vector of image pixel values, and each

$X_n$ might be a partition of that image into segments and an assignment of labels to each

segment. Most importantly, the components of $Y_n$ form a sequence (e.g., a sequence on

the 2D image lattice). Therefore, learning a classifier function $X = f(Y)$ represents the

sequential supervised learning problem [82]. Thus, in this dissertation, we have addressed

sequential supervised learning, the solutions of which can be readily applied to a wide range

of problems beyond computer vision, such as, for example, speech processing, where the

components of Y form a sequence in time.

APPENDIX A
DERIVATION OF VARIATIONAL APPROXIMATION

Preliminaries. Computation of $KL(Q\|P)$, given by Eq. (2.12), is intractable, because it depends on $P(Z,X,R'|Y,R^0)$. Note, though, that $Q(Z,X,R')$ does not depend on $P(Y|R^0)$ and $P(R^0)$. Consequently, by subtracting $\log P(Y|R^0)$ and $\log P(R^0)$ from $KL(Q\|P)$, we obtain a tractable criterion $J(Q,P)$, whose minimization with respect to $Q(Z,X,R')$ yields the same solution as minimization of $KL(Q\|P)$:

$$J(Q,P) \triangleq KL(Q\|P) - \log P(Y|R^0) - \log P(R^0) = \int_{R'} dR' \sum_{Z,X} Q(Z,X,R') \log\frac{Q(Z,X,R')}{P(Z,X,R,Y)}. \tag{A.1}$$
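The step from $KL(Q\|P)$ to $J(Q,P)$ can be spelled out as follows. This is a sketch under two assumptions that are implicit in the text: $R$ collects both $R'$ and $R^0$, and $Q(Z,X,R')$ is normalized, so the two log terms pass outside the integral and sum.

```latex
\begin{align*}
KL(Q\|P)
  &= \int_{R'} dR' \sum_{Z,X} Q(Z,X,R')
     \log\frac{Q(Z,X,R')}{P(Z,X,R'|Y,R^0)} \\
  &= \int_{R'} dR' \sum_{Z,X} Q(Z,X,R')
     \log\frac{Q(Z,X,R')\,P(Y|R^0)\,P(R^0)}{P(Z,X,R,Y)} \\
  &= J(Q,P) + \log P(Y|R^0) + \log P(R^0).
\end{align*}
```

Because the two subtracted terms do not depend on $Q$, they shift the criterion by a constant and leave its minimizer unchanged.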

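$J(Q,P)$ is known alternatively as Helmholtz free energy, Gibbs free energy, or free energy [59]. By minimizing $J(Q,P)$, we seek to compute parameters of approximate distributions $Q(Z)$, $Q(X|Z)$ and $Q(R'|Z)$. It is convenient, first, to reformulate Eq. (A.1) as $J(Q,P) = L_Z + L_X + L_R$. We define auxiliary $L_Z$, $L_X$, and $L_R$ as

$$L_Z \triangleq \sum_{Z} Q(Z)\log\frac{Q(Z)}{P(Z)},\qquad L_X \triangleq \sum_{Z,X} Q(Z)Q(X|Z)\log\frac{Q(X|Z)}{P(X|Z)\,P(Y|X,\rho)},\qquad L_R \triangleq \int_{R'} dR' \sum_{Z} Q(Z)Q(R'|Z)\log\frac{Q(R'|Z)}{P(R|Z)}.$$

To derive expressions for $L_Z$, $L_X$, and $L_R$, we first observe:

$$\langle z_{ij}\rangle = \xi_{ij},\quad \langle x_i^k\rangle = m_i^k,\quad \langle x_i^k x_j^l\rangle = Q_{ij}^{kl}\, m_j^l \;\Rightarrow\; m_i^k = \sum_{j\in V'}\xi_{ij}\sum_{l\in M} Q_{ij}^{kl}\, m_j^l,\quad \forall i\in V,\ \forall k\in M, \tag{A.2}$$

where $\langle\cdot\rangle$ denotes expectation with respect to $Q(Z,X,R')$. Consequently, from Eqs. (2.1), (2.9) and (A.2), we have

$$L_Z = \sum_{i,j\in V} \xi_{ij}\log[\xi_{ij}/\gamma_{ij}]. \tag{A.3}$$

Next, from Eqs. (2.4), (2.10) and (A.2), we derive

$$L_X = \sum_{i,j\in V}\sum_{k,l\in M} \xi_{ij}\, Q_{ij}^{kl}\, m_j^l \log[Q_{ij}^{kl}/P_{ij}^{kl}] - \sum_{i\in V}\sum_{k\in M} m_i^k \log P(y_{\rho(i)}|x_i^k,\rho(i)). \tag{A.4}$$

Note that for DT$_{V^0}$, $V$ in the second term is substituted with $V^0$. Finally, from Eqs. (2.3), (2.11) and (A.2), we get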

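$$L_R = \frac{1}{2}\sum_{i,j\in V'} \xi_{ij}\Big( \log\frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 2 + \mathrm{Tr}\big\{\Sigma_{ij}^{-1}\big\langle (r_i - r_j - d_{ij})(r_i - r_j - d_{ij})^T \big\rangle\big\} \Big). \tag{A.5}$$

Let us now consider the expectation in the last term:

$$\begin{aligned}
\langle (r_i - r_j - d_{ij})(r_i - r_j - d_{ij})^T\rangle
&= \langle (r_i - \mu_{ij} + \mu_{ij} - r_j - d_{ij})(r_i - \mu_{ij} + \mu_{ij} - r_j - d_{ij})^T\rangle \\
&= \Omega_{ij} + 2\langle (r_i - \mu_{ij})(\mu_{jp} - r_j - d_{ij} - \mu_{jp} + \mu_{ij})^T\rangle \\
&\quad + \langle (r_j - \mu_{jp} + d_{ij} + \mu_{jp} - \mu_{ij})(r_j - \mu_{jp} + d_{ij} + \mu_{jp} - \mu_{ij})^T\rangle \\
&= \Omega_{ij} + 2\langle (r_i - \mu_{ij})(\mu_{jp} - r_j)^T\rangle + \langle (r_j - \mu_{jp})(r_j - \mu_{jp})^T\rangle + (\mu_{ij} - \mu_{jp} - d_{ij})(\mu_{ij} - \mu_{jp} - d_{ij})^T \\
&= \Omega_{ij} + \sum_{p\in V'} \xi_{jp}\,(2\Psi_{ijp} + \Omega_{jp} + M_{ijp}),
\end{aligned} \tag{A.6}$$

where the definitions of auxiliary matrices $\Psi_{ijp} \triangleq \langle (r_i - \mu_{ij})(\mu_{jp} - r_j)^T\rangle$ and $M_{ijp} \triangleq (\mu_{ij} - \mu_{jp} - d_{ij})(\mu_{ij} - \mu_{jp} - d_{ij})^T$ are given in the second to the last derivation step above, and $i$-$j$-$p$ is a child-parent-grandparent triad. It follows from Eqs. (A.5) and (A.6) that

$$L_R = \frac{1}{2}\sum_{i,j\in V'} \xi_{ij}\Big( \log\frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 2 + \mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\} + \sum_{p\in V'} \xi_{jp}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}(2\Psi_{ijp} + \Omega_{jp} + M_{ijp})\} \Big). \tag{A.7}$$

In Eq. (A.7), the last expression left to compute is $\mathrm{Tr}\{\Sigma_{ij}^{-1}\Psi_{ijp}\}$. For this purpose, we apply the Cauchy-Schwarz inequality as follows:

$$\begin{aligned}
\big|\mathrm{Tr}\{\Sigma_{ij}^{-1}\Psi_{ijp}\}\big|
&= \big|\mathrm{Tr}\{\Sigma_{ij}^{-\frac{1}{2}}\Sigma_{ij}^{-\frac{1}{2}}\langle (r_i - \mu_{ij})(\mu_{jp} - r_j)^T\rangle\}\big|
 = \big|\mathrm{Tr}\{\langle (\Sigma_{ij}^{-\frac{1}{2}}(r_i - \mu_{ij}))(\Sigma_{ij}^{-\frac{1}{2}}(\mu_{jp} - r_j))^T\rangle\}\big| \\
&\le \mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\}^{\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{jp}\}^{\frac{1}{2}},
\end{aligned} \tag{A.8}$$

where we used the fact that the $\Sigma$'s and $\Omega$'s are diagonal matrices. Although the Cauchy-Schwarz inequality in general does not yield a tight upper bound, in our case it appears reasonable to assume that variables $r_i$ and $r_j$ (i.e., positions of object parts at different scales) are uncorrelated. Substituting Eq. (A.8) into Eq. (A.7), we finally derive the upper bound for $L_R$ as

$$\begin{aligned}
L_R \le \frac{1}{2}\sum_{i,j\in V'} \xi_{ij}\Big( &\log\frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 2 + \mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\} + \sum_{p\in V'} \xi_{jp}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}(\Omega_{jp} + M_{ijp})\} \\
&+ 2\sum_{p\in V'} \xi_{jp}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\}^{\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{jp}\}^{\frac{1}{2}} \Big).
\end{aligned} \tag{A.9}$$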

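Optimization of $Q(X|Z)$. $Q(X|Z)$ is fully characterized by parameters $Q_{ij}^{kl}$. From the definition of $L_X$, we have $\partial J(Q,P)/\partial Q_{ij}^{kl} = \partial L_X/\partial Q_{ij}^{kl}$. Due to parent-child dependencies in Eq. (A.2), it is necessary to iteratively differentiate $L_X$ with respect to $Q_{ij}^{kl}$ down the subtree of node $i$. For this purpose, we introduce three auxiliary terms $F_{ij}$, $G_i$, and $\lambda_i^k$, which facilitate computation, as shown below:

$$\begin{aligned}
F_{ij} &\triangleq \sum_{k,l\in M} \xi_{ij}\, Q_{ij}^{kl}\, m_j^l \log[Q_{ij}^{kl}/P_{ij}^{kl}], \\
G_i &\triangleq \sum_{d,c\in d(i)} F_{dc} - \Big\{\sum_{k\in M} m_i^k \log P(y_{\rho(i)}|x_i^k,\rho(i))\Big\}_{V^0}, \\
\frac{\partial L_X}{\partial Q_{ij}^{kl}} &= \frac{\partial F_{ij}}{\partial Q_{ij}^{kl}} + \frac{\partial G_i}{\partial m_i^k}\frac{\partial m_i^k}{\partial Q_{ij}^{kl}}, \\
\lambda_i^k &\triangleq \exp(-\partial G_i/\partial m_i^k),
\end{aligned} \tag{A.10}$$

where $\{\cdot\}_{V^0}$ denotes that the term is included in the expression for $G_i$ if $i$ is a leaf node for DT$_{V^0}$. For DT$_V$, the term in braces $\{\cdot\}$ is always included. This allows us to derive update equations for both models simultaneously. After finding the derivatives $\partial F_{ij}/\partial Q_{ij}^{kl} = \xi_{ij} m_j^l (\log[Q_{ij}^{kl}/P_{ij}^{kl}] + 1)$ and $\partial m_i^k/\partial Q_{ij}^{kl} = \xi_{ij} m_j^l$, and substituting these expressions in Eq. (A.10), we arrive at

$$\partial L_X/\partial Q_{ij}^{kl} = \xi_{ij}\, m_j^l \big(\log[Q_{ij}^{kl}/P_{ij}^{kl}] + 1 - \log\lambda_i^k\big). \tag{A.11}$$

Finally, optimizing Eq. (A.11) with the Lagrange multiplier that accounts for the constraint $\sum_{k\in M} Q_{ij}^{kl} = 1$ yields the desired update equation: $Q_{ij}^{kl} = \kappa P_{ij}^{kl}\lambda_i^k$, introduced in Eq. (2.13). To compute $\lambda_i^k$, we first find

$$\begin{aligned}
\frac{\partial G_i}{\partial m_i^k}
&= \sum_{c\in c(i)} \Big( \frac{\partial F_{ci}}{\partial m_i^k} + \sum_{a\in M} \frac{\partial G_c}{\partial m_c^a}\frac{\partial m_c^a}{\partial m_i^k} \Big) - \big\{\log P(y_{\rho(i)}|x_i^k,\rho(i))\big\}_{V^0} \\
&= \sum_{c\in c(i)} \sum_{a\in M} \xi_{ci}\, Q_{ci}^{ak} \Big( \log[Q_{ci}^{ak}/P_{ci}^{ak}] + \frac{\partial G_c}{\partial m_c^a} \Big) - \big\{\log P(y_{\rho(i)}|x_i^k,\rho(i))\big\}_{V^0},
\end{aligned} \tag{A.12}$$

and then substitute $Q_{ci}^{ak}$, given by Eq. (2.13), into Eq. (A.12), which results in $\lambda_i^k = \{P(y_{\rho(i)}|x_i^k,\rho(i))\}_{V^0} \prod_{c\in V} \big[\sum_{a\in M} P_{ci}^{ak}\lambda_c^a\big]^{\xi_{ci}}$, as introduced in Eq. (2.14).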
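The two update rules of this subsection can be sketched in Python as an upward recursion for the $\lambda$ terms followed by a normalization that yields $Q$. This is an illustrative sketch only: it assumes the product form of $\lambda_i^k$ referenced as Eq. (2.14) and the normalized form $Q \propto P\lambda$ referenced as Eq. (2.13), and all function names and the nested-dictionary layout of `xi`, `P`, and `lik` are hypothetical choices, not part of the dissertation.

```python
def upward_lambdas(order, xi, P, lik, K):
    """Compute lambda_i^k for every node, leaves first.

    order:        node ids listed so that every candidate child
                  appears before its candidate parents.
    xi[c][i]:     approximate probability that c is a child of i.
    P[c][i][a][k]: table entry for child class a given parent class k.
    lik[i][k]:    observation likelihood of node i under class k
                  (use 1.0 when the node carries no observable).
    """
    lam = {}
    for i in order:
        lam_i = []
        for k in range(K):
            val = lik[i][k]
            for c, parents in xi.items():
                if i in parents:  # c is a candidate child of i
                    s = sum(P[c][i][a][k] * lam[c][a] for a in range(K))
                    val *= s ** parents[i]
            lam_i.append(val)
        lam[i] = lam_i
    return lam


def update_q(c, i, P, lam, K):
    """Q[a][k] proportional to P[c][i][a][k] * lam[c][a], normalized over a."""
    Q = [[P[c][i][a][k] * lam[c][a] for k in range(K)] for a in range(K)]
    for k in range(K):
        z = sum(Q[a][k] for a in range(K))
        for a in range(K):
            Q[a][k] /= z
    return Q
```

In a deep tree the products of likelihoods can underflow, so a practical version would likely keep the $\lambda$ values in the log domain or rescale them per node.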

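Optimization of $Q(R'|Z)$. $Q(R'|Z)$ is fully characterized by parameters $\mu_{ij}$ and $\Omega_{ij}$. From the definition of $L_R$, we observe that $\partial J(Q)/\partial\Omega_{ij} = \partial L_R/\partial\Omega_{ij}$ and $\partial J(Q)/\partial\mu_{ij} = \partial L_R/\partial\mu_{ij}$. Since the $\Omega$'s are positive definite, from Eq. (A.9), it follows that

$$\begin{aligned}
\partial L_R/\partial\Omega_{ij} = 0.5\,\xi_{ij}\Big( &-\mathrm{Tr}\{\Omega_{ij}^{-1}\} + \mathrm{Tr}\{\Sigma_{ij}^{-1}\} + \sum_{c\in V'} \xi_{ci}\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\} \\
&+ \sum_{p\in V'} \xi_{jp}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\}^{-\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{jp}\}^{\frac{1}{2}} \\
&+ \sum_{c\in V'} \xi_{ci}\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\}\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\Omega_{ci}\}^{\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\Omega_{ij}\}^{-\frac{1}{2}} \Big).
\end{aligned} \tag{A.13}$$

From $\partial L_R/\partial\Omega_{ij} = 0$, it is straightforward to derive the update equation for $\Omega_{ij}$ given by Eq. (2.17).
Next, to optimize the $\mu_{ij}$ parameters, from (A.9), we compute

$$\begin{aligned}
\frac{\partial L_R}{\partial\mu_{ij}}
&= \frac{\partial}{\partial\mu_{ij}}\,\frac{1}{2}\sum_{i,j,p\in V'} \xi_{ij}\xi_{jp}\,(\mu_{ij} - \mu_{jp} - d_{ij})^T\Sigma_{ij}^{-1}(\mu_{ij} - \mu_{jp} - d_{ij}) \\
&= \sum_{c,p\in V'} \Big( \xi_{ij}\xi_{jp}\,\Sigma_{ij}^{-1}(\mu_{ij} - \mu_{jp} - d_{ij}) - \xi_{ci}\xi_{ij}\,\Sigma_{ci}^{-1}(\mu_{ci} - \mu_{ij} - d_{ci}) \Big).
\end{aligned} \tag{A.14}$$

Then, from $\partial L_R/\partial\mu_{ij} = 0$, it is straightforward to compute the update equation for $\mu_{ij}$ given by Eq. (2.16).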
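Optimization of $Q(Z)$. $Q(Z)$ is fully characterized by the parameters $\xi_{ij}$. From the definitions of $L_Z$, $L_X$, and $L_R$ we see that $\partial J(Q)/\partial\xi_{ij} = \partial(L_X + L_R + L_Z)/\partial\xi_{ij}$. Similar to the optimization of $Q_{ij}^{kl}$, we need to iteratively differentiate $L_X$ as follows:

$$\frac{\partial L_X}{\partial\xi_{ij}} = \frac{\partial F_{ij}}{\partial\xi_{ij}} + \sum_{k\in M} \frac{\partial G_i}{\partial m_i^k}\frac{\partial m_i^k}{\partial\xi_{ij}}, \tag{A.15}$$

where $F_{ij}$ and $G_i$ are defined as in Eq. (A.10). Substituting the derivatives $\partial G_i/\partial m_i^k = -\log\lambda_i^k$, $\partial F_{ij}/\partial\xi_{ij} = \sum_{k,l\in M} Q_{ij}^{kl} m_j^l \log[Q_{ij}^{kl}/P_{ij}^{kl}]$, and $\partial m_i^k/\partial\xi_{ij} = \sum_{l\in M} Q_{ij}^{kl} m_j^l$ into Eq. (A.15), we obtain

$$\frac{\partial L_X}{\partial\xi_{ij}} = \sum_{k,l\in M} Q_{ij}^{kl}\, m_j^l \log\frac{Q_{ij}^{kl}}{P_{ij}^{kl}\lambda_i^k} = -A_{ij}. \tag{A.16}$$

Next, we differentiate $L_R$, given by Eq. (A.9), with respect to $\xi_{ij}$ as

$$\begin{aligned}
\frac{\partial L_R}{\partial\xi_{ij}}
&= \frac{1}{2}\log\frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 1 + \frac{1}{2}\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\} \\
&\quad + \frac{1}{2}\sum_{p\in V'} \xi_{jp}\Big( \mathrm{Tr}\{\Sigma_{ij}^{-1}(\Omega_{jp} + M_{ijp})\} + 2\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{ij}\}^{\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ij}^{-1}\Omega_{jp}\}^{\frac{1}{2}} \Big) \\
&\quad + \frac{1}{2}\sum_{c\in V'} \xi_{ci}\Big( \mathrm{Tr}\{\Sigma_{ci}^{-1}(\Omega_{ij} + M_{cij})\} + 2\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\Omega_{ci}\}^{\frac{1}{2}}\,\mathrm{Tr}\{\Sigma_{ci}^{-1}\Omega_{ij}\}^{\frac{1}{2}} \Big) && \text{(A.17)} \\
&\triangleq B_{ij} - 1, && \text{(A.18)}
\end{aligned}$$

where indexes $c$, $j$ and $p$ denote children, parents and grandparents of node $i$, respectively. Further, from Eq. (A.3), we get

$$\partial L_Z/\partial\xi_{ij} = 1 + \log[\xi_{ij}/\gamma_{ij}]. \tag{A.19}$$

Finally, substituting Eqs. (A.16), (A.18) and (A.19) into $\partial J(Q)/\partial\xi_{ij} = 0$ and adding the Lagrange multiplier to account for the constraint $\sum_{j\in V'} \xi_{ij} = 1$, we solve for the update equation of $\xi_{ij}$ given by Eq. (2.18).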
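As a hedged illustration of the last step, setting the sum of the three derivatives plus a Lagrange multiplier to zero suggests a normalized exponential form, $\xi_{ij} \propto \gamma_{ij}\exp(A_{ij} - B_{ij})$. Eq. (2.18) itself is not reproduced in this appendix, so the form used below is an assumption inferred from the stationarity condition, and the function name `update_xi` and its dictionary arguments are hypothetical.

```python
import math


def update_xi(gamma_i, A_i, B_i):
    """Normalize gamma_ij * exp(A_ij - B_ij) over candidate parents j.

    gamma_i, A_i, B_i: dicts keyed by candidate-parent id j for one
    fixed child i. The maximum logit is subtracted before
    exponentiation to avoid overflow and underflow.
    """
    logits = {j: math.log(gamma_i[j]) + A_i[j] - B_i[j] for j in gamma_i}
    m = max(logits.values())
    weights = {j: math.exp(v - m) for j, v in logits.items()}
    z = sum(weights.values())
    return {j: w / z for j, w in weights.items()}
```

Working with logits rather than raw products keeps the normalization stable when $A_{ij}$ and $B_{ij}$ differ by large amounts across candidate parents.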


APPENDIX B
INFERENCE ON THE FIXED-STRUCTURE TREE

The inference algorithm for Maximum Posterior Marginal (MPM) estimation on the quad-tree is known to alleviate implementation issues related to underflow numerical error [33]. The whole procedure is summarized in Fig. B-1. The algorithm assumes that the tree structure is fixed and known. Therefore, in Fig. B-1, we simplify notation as $P(x_i|Z,Y) \rightarrow P(x_i|Y)$ and $P(x_i|x_j,Z) \rightarrow P(x_i|x_j)$. Also, we denote with $c(i)$ children of $i$, and with $d(i)$ the set of all the descendants down the tree of node $i$ including $i$ itself. Thus, $Y_{d(i)}$ denotes a set of all observables down the subtree whose root is $i$. Also, for computing $P(x_i|Y_{d(i)})$, in the bottom-up pass, $\propto$ means that equality holds up to a multiplicative constant that does not depend on $x_i$.

Two-pass MPM estimation on the tree

Preliminary downward pass: $\forall i \in V^{L-1}, V^{L-2}, \ldots, V^0$,
    $P(x_i) = \sum_{x_j} P(x_i|x_j)\,P(x_j)$,
Bottom-up pass:
  Initialize leaf nodes: $\forall i \in V^0$,
    $P(x_i|y_i) \propto P(y_i|x_i)\,P(x_i)$,
    $P(x_i,x_j|y_i) = P(x_i|x_j)\,P(x_j)\,P(x_i|y_i)/P(x_i)$,
  Compute upward: $\forall i \in V^1, V^2, \ldots, V^L$,
    $P(x_i|Y_{d(i)}) \propto P(x_i) \prod_{c\in c(i)} \sum_{x_c} \frac{P(x_c|Y_{d(c)})\,P(x_c|x_i)}{P(x_c)}$,
    $P(x_i,x_j|Y_{d(i)}) = P(x_i|x_j)\,P(x_j)\,P(x_i|Y_{d(i)})/P(x_i)$,
Top-down pass:
  Initialize root: $i \in V^L$,
    $P(x_i|Y) = P(x_i|Y_{d(i)})$,
    $\hat{x}_i = \arg\max_{x_i} P(x_i|Y)$,
  Compute downward: $\forall i \in V^{L-1}, V^{L-2}, \ldots, V^0$,
    $P(x_i|Y) = \sum_{x_j} \frac{P(x_i,x_j|Y_{d(i)})}{\sum_{x_i} P(x_i,x_j|Y_{d(i)})}\,P(x_j|Y)$,
    $\hat{x}_i = \arg\max_{x_i} P(x_i|Y)$.

Figure B-1: Steps 2 and 5 in Fig. 3-2: MPM estimation on the fixed-structure tree. Distributions $P(y_i|x_i)$ and $P(x_i|x_j)$ are assumed known.
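The two-pass procedure of Fig. B-1 can be sketched in plain Python as follows. This is a minimal illustration under several assumptions that are not stated in the figure: nodes are indexed so that every parent index is smaller than its children's (node 0 is the root), observables exist only at leaf nodes, all prior probabilities are strictly positive, and the names `mpm_two_pass`, `trans`, and `leaf_lik` are hypothetical.

```python
def normalize(v):
    """Scale a non-negative vector so that it sums to one."""
    s = sum(v)
    return [x / s for x in v]


def mpm_two_pass(parent, trans, root_prior, leaf_lik):
    """MPM estimation on a fixed tree.

    parent[i]:      index of the parent of node i (None for the root, node 0).
    trans[i][a][b]: P(x_i = a | x_parent = b) for every non-root node i.
    root_prior[a]:  P(x_root = a).
    leaf_lik[i][a]: P(y_i | x_i = a) for every leaf node i.
    Returns (posterior marginals, MPM labels).
    """
    n, K = len(parent), len(root_prior)
    children = [[] for _ in range(n)]
    for i in range(1, n):
        children[parent[i]].append(i)

    # Preliminary downward pass: prior marginals P(x_i).
    prior = [None] * n
    prior[0] = list(root_prior)
    for i in range(1, n):
        j = parent[i]
        prior[i] = [sum(trans[i][a][b] * prior[j][b] for b in range(K))
                    for a in range(K)]

    # Bottom-up pass: P(x_i | Y_d(i)), normalized at every node.
    up = [None] * n
    for i in reversed(range(n)):
        if not children[i]:
            up[i] = normalize([leaf_lik[i][a] * prior[i][a] for a in range(K)])
        else:
            msg = []
            for a in range(K):
                val = prior[i][a]
                for c in children[i]:
                    val *= sum(up[c][b] * trans[c][b][a] / prior[c][b]
                               for b in range(K))
                msg.append(val)
            up[i] = normalize(msg)

    # Top-down pass: full posterior marginals P(x_i | Y).
    post = [None] * n
    post[0] = up[0]
    for i in range(1, n):
        j = parent[i]
        # joint[a][b] = P(x_i = a, x_j = b | Y_d(i))
        joint = [[trans[i][a][b] * prior[j][b] * up[i][a] / prior[i][a]
                  for b in range(K)] for a in range(K)]
        col = [sum(joint[a][b] for a in range(K)) for b in range(K)]
        post[i] = [sum(joint[a][b] / col[b] * post[j][b] for b in range(K))
                   for a in range(K)]

    labels = [max(range(K), key=p.__getitem__) for p in post]
    return post, labels
```

Normalizing the upward messages at every node is what keeps the quantities in a safe numerical range, which is consistent with the underflow remark made at the start of this appendix.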

REFERENCES

[1] W. E. L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 4, pp. 469-482, 1987.

[2] S. Z. Der and R. Chellappa, "Probe-based automatic target recognition in infrared imagery," IEEE Trans. Image Processing, vol. 6, no. 1, pp. 92-102, 1997.

[3] P. C. Chung, E. L. Chen, and J. B. Wu, "A spatiotemporal neural network for recognizing partially occluded objects," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 1991-2000, 1998.

[4] W. M. Wells, "Statistical approaches to feature-based object recognition," Intl. J. Computer Vision, vol. 21, no. 1, pp. 63-98, 1997.

[5] Z. Ying and D. Castanon, "Partially occluded object recognition using statistical models," Intl. J. Computer Vision, vol. 49, no. 1, pp. 57-78, 2002.

[6] S. Z. Li, Markov random field modeling in image analysis, Springer-Verlag, Tokyo, Japan, 2nd edition, 2001.

[7] M. H. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 8, pp. 1073-1078, 2004.

[8] A. Mittal and L. S. Davis, "M2Tracker: a multi-view approach to segmenting and tracking people in a cluttered scene," Intl. J. Computer Vision, vol. 51, no. 3, pp. 189-203, 2003.

[9] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[10] F. Dell'Acqua and R. Fisher, "Reconstruction of planar surfaces behind occlusions in range images," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 569-575, 2002.

[11] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 2, pp. 264-271, IEEE, Inc.

[12] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 4, pp. 349-361, 2001.

[13] M. Weber, M. Welling, and P. Perona, "Towards automatic discovery of object categories," in Proc. IEEE Conf. Computer Vision Pattern Rec., Hilton Head Island, SC, 2000, vol. 2, pp. 101-108, IEEE, Inc.

[14] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th European Conf. Comp. Vision, Dublin, Ireland, 2000, vol. 1, pp. 18-32.

[15] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio, "Categorization by learning and combining object parts," in Advances in neural information processing systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., vol. 2, pp. 1239-1245. MIT Press, Cambridge, MA, 2002.

[16] P. F. Felzenszwalb and D. P. Huttenlocher, "Pictorial structures for object recognition," Intl. J. Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.

[17] H. Schneiderman and T. Kanade, "Object detection using the statistics of parts," Intl. J. Computer Vision, vol. 56, no. 3, pp. 151-177, 2004.

[18] S. C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 6, pp. 691-712, 2003.

[19] S. C. Zhu, Y. N. Wu, and D. B. Mumford, "Minimax entropy principle and its applications to texture modeling," Neural Computation, vol. 9, no. 8, pp. 1627-1660, 1997.

[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721-741, 1984.

[21] A. Efros and T. Leung, "Texture synthesis by non-parametric sampling," in Proc. Intl. Conf. Computer Vision, Kerkyra, Greece, 1999, vol. 2, pp. 1033-1038, IEEE, Inc.

[22] J. S. De Bonet and P. Viola, "Texture recognition using a non-parametric multi-scale statistical model," in Proc. IEEE Conf. Computer Vision Pattern Rec., Santa Barbara, CA, 1998, pp. 641-647, IEEE, Inc.

[23] M. J. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 828-836, 2003.

[24] J. Coughlan and A. Yuille, "Algorithms from statistical physics for generative models of images," Image and Vision Computing, vol. 21, no. 1, pp. 29-36, 2003.

[25] S. Kumar and M. Hebert, "Discriminative random fields: a discriminative framework for contextual interaction in classification," in Proc. IEEE Intl. Conf. Comp. Vision, Nice, France, 2003, vol. 2, pp. 1150-1157, IEEE, Inc.

[26] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Proc. Intl. Conf. Machine Learning, Williamstown, MA, 2001, pp. 282-289.

[27] C. A. Bouman and M. Shapiro, "A multiscale random field model for Bayesian image segmentation," IEEE Trans. Image Processing, vol. 3, no. 2, pp. 162-177, 1994.

[28] W. W. Irving, P. W. Fieguth, and A. S. Willsky, "An overlapping tree approach to multiscale stochastic modeling and estimation," IEEE Trans. Image Processing, vol. 6, no. 11, pp. 1517-1529, 1997.

[29] H. Cheng and C. A. Bouman, "Multiscale Bayesian segmentation using a trainable context model," IEEE Trans. Image Processing, vol. 10, no. 4, pp. 511-525, 2001.

[30] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using Hidden Markov Models," IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886-902, 1998.

[31] X. Feng, C. K. I. Williams, and S. N. Felderhof, "Combining belief networks and neural networks for scene segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 467-483, 2002.

[32] S. Todorovic and M. C. Nechyba, "Towards intelligent mission profiles of Micro Air Vehicles: multiscale Viterbi classification," in Proc. 8th European Conf. Computer Vision, Prague, Czech Republic, 2004, vol. 2, pp. 178-189.

[33] J.-M. Laferte, P. Perez, and F. Heitz, "Discrete Markov image modeling and inference on the quadtree," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 390-404, 2000.

[34] M. R. Luettgen and A. S. Willsky, "Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination," IEEE Trans. Image Processing, vol. 4, no. 2, pp. 194-207, 1995.

[35] P. L. Ainsleigh, N. Kehtarnavaz, and R. L. Streit, "Hidden Gauss-Markov models for signal classification," IEEE Trans. Signal Processing, vol. 50, no. 6, pp. 1355-1367, 2002.

[36] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, CA, 1988.

[37] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inform. Theory, vol. 49, no. 5, pp. 1120-1146, 2003.

[38] Brendan J. Frey, Graphical models for machine learning and digital communication, The MIT Press, Cambridge, MA, 1998.

[39] S. Kumar and M. Hebert, "Man-made structure detection in natural images using a causal multiscale random field," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 119-126, IEEE, Inc.

[40] M. K. Schneider, P. W. Fieguth, W. C. Karl, and A. S. Willsky, "Multiscale methods for the segmentation and reconstruction of signals and images," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 456-468, 2000.

[41] J. Li, R. M. Gray, and R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two-dimensional Hidden Markov Models," IEEE Trans. Inform. Theory, vol. 46, no. 5, pp. 1826-1841, 2000.

[42] W. K. Konen, T. Maurer, and C. von der Malsburg, "A fast dynamic link matching algorithm for invariant pattern recognition," Neural Networks, vol. 7, no. 7, pp. 1019-1030, 1994.

[43] A. Montanvert, P. Meer, and A. Rosenfeld, "Hierarchical image analysis using irregular tessellations," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 4, pp. 307-316, 1991.

[44] P. Bertolino and A. Montanvert, "Multiresolution segmentation using the irregular pyramid," in Proc. Intl. Conf. Image Processing, Lausanne, Switzerland, 1996, vol. 1, pp. 257-260, IEEE, Inc.

[45] N. J. Adams, A. J. Storkey, Z. Ghahramani, and C. K. I. Williams, "MFDTs: mean field dynamic trees," in 15th Intl. Conf. Pattern Rec., Barcelona, Spain, 2000, vol. 3, pp. 147-150, Intl. Assoc. Pattern Rec.

[46] N. J. Adams, Dynamic trees: a hierarchical probabilistic approach to image modeling, Ph.D. dissertation, Division of Informatics, Univ. of Edinburgh, Edinburgh, UK, 2001.

[47] A. J. Storkey, "Dynamic trees: a structured variational method giving efficient propagation rules," in Uncertainty in Artificial Intelligence, C. Boutilier and M. Goldszmidt, Eds., pp. 566-573. Morgan Kaufmann, San Francisco, CA, 2000.

[48] A. J. Storkey and C. K. I. Williams, "Image modeling with position-encoding dynamic trees," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 859-871, 2003.

[49] M. I. Jordan, Learning in graphical models (adaptive computation and machine learning), MIT press, Cambridge, MA, 1999.

[50] M. I. Jordan, "Graphical models," Statistical Science (Special issue on Bayesian statistics), vol. 19, pp. 140-155, 2004.

[51] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39, pp. 1-38, 1977.

[52] G. J. McLachlan and K. T. Thriyambakam, The EM algorithm and extensions, John Wiley & Sons, New York, NY.

[53] D. M. Chickering and D. Heckerman, "Efficient approximations for the marginal likelihood of incomplete data given a Bayesian network," in Proc. 12th Conf. Uncertainty Artificial Intell., Portland, OR, 1996, pp. 158-168, Assoc. Uncertainty Artificial Intell.

[54] S. Todorovic and M. C. Nechyba, "Interpretation of complex scenes using generative dynamic-structured models," in CD-ROM Proc. IEEE CVPR 2004, Workshop on Generative-Model Based Vision (GMBV), Washington, DC, 2004, IEEE, Inc.

[55] S. Todorovic and M. C. Nechyba, "Detection of artificial structures in natural-scene images using dynamic trees," in Proc. 17th Intl. Conf. Pattern Rec., Cambridge, UK, 2004, pp. 35-39, Intl. Assoc. Pattern Rec.

[56] M. Aitkin and D. B. Rubin, "Estimation and hypothesis testing in finite mixture models," J. Royal Stat. Soc., vol. B-47, no. 1, pp. 67-75, 1985.

[57] R. M. Neal, "Probabilistic inference using Markov Chain Monte Carlo methods," Tech. Rep. CRG-TR-93-1, Connectionist Research Group, Univ. of Toronto, 1993.

[58] D. A. Forsyth, J. Haddon, and S. Ioffe, "The joy of sampling," Intl. J. Computer Vision, vol. 41, no. 1-2, pp. 109-134, 2001.

[59] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.

[60] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge Univ. Press, Cambridge, UK, 2003.

[61] D. Barber and P. van de Laar, "Variational cumulant expansions for intractable distributions," J. Artificial Intell. Research, vol. 10, pp. 435-455, 1999.

[62] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, Cambridge, UK, 2003.

[63] D. J. C. MacKay, "Introduction to Monte Carlo methods," in Learning in graphical models (adaptive computation and machine learning), M. I. Jordan, Ed., pp. 175-204. MIT press, Cambridge, MA, 1999.

[64] T. S. Jaakkola, "Tutorial on variational approximation methods," in Adv. Mean Field Methods, M. Opper and D. Saad, Eds., pp. 129-161. MIT press, Cambridge, MA, 2000.

[65] T. M. Cover and J. A. Thomas, Elements of information theory, Wiley Interscience Press, New York, NY, 1991.

[66] Trygve Randen and Hakon Husoy, "Filtering for texture classification: a comparative study," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 4, pp. 291-310, 1999.

[67] S. G. Mallat, A wavelet tour of signal processing, Academic Press, San Diego, CA, 2nd edition.

[68] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674-693, 1989.

[69] Jerome M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.

[70] N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," J. Applied and Computational Harmonic Analysis, vol. 10, no. 3, pp. 234-253, 2001.

[71] Michael Unser, "Texture classification and segmentation using wavelet frames," IEEE Trans. Image Processing, vol. 4, no. 11, pp. 1549-1560, 1995.

[72] Nick Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Journal of Applied and Computational Harmonic Analysis, vol. 10, no. 3, pp. 234-253, 2001.


[73] T. Lindeberg, "Scale-space theory: a basic tool for analysing structures at different scales," J. Applied Statistics, vol. 21, no. 2, pp. 224-270, 1994.

[74] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intl. J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

[75] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 509-522, 2002.

[76] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[77] G. Jones III and B. Bhanu, "Recognition of articulated and occluded objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 7, pp. 603-613, 1999.

[78] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized belief propagation," in Advances in neural information processing systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., pp. 689-695. MIT Press, Cambridge, MA, 2001.

[79] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.

[80] V. N. Vapnik, Statistical learning theory, John Wiley & Sons, Inc., New York, NY, 1998.

[81] P. Domingos, "MetaCost: a general method for making classifiers cost-sensitive," in Proc. 5th Intl. Conf. Knowledge Discovery Data Mining, San Diego, CA, 1999, pp. 155-164, ACM Press.

[82] T. G. Dietterich, "Machine learning for sequential data: a review," in Lecture notes in computer science, T. Caelli, Ed., vol. 2396, pp. 15-30. Springer-Verlag, Berlin, Germany, 2002.


BIOGRAPHICAL SKETCH

Sinisa Todorovic was born in Belgrade, Serbia, in 1968. He graduated from Mathematical High School-Belgrade in 1987. He received his B.S. degree in electrical and computer engineering at the University of Belgrade, Serbia, in 1994. From 1994 until 2001, he worked as a software engineer in the communications industry. In fall 2001, Sinisa Todorovic enrolled in the master's degree program at the Department of Electrical and Computer Engineering, University of Florida, Gainesville. He became a member of the Center for Micro Air Vehicle Research, where he conducted research in statistical image modeling and multi-resolution signal processing. Sinisa Todorovic earned his master's degree (thesis option) in December 2002, after which he continued his studies toward a Ph.D. degree in the same Department. He received two certificates for outstanding academic achievement. He expects to graduate in May 2005.





ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Michael Nechyba for his wise and patient guidance of my research for this dissertation. As my former advisor, Dr. Nechyba has been directing but on no account confining my interests. I especially appreciate his readiness and expertise to help me solve numerous implementation issues. Most importantly, I am thankful for the friendship that we have developed collaborating on this work.

Also, I thank my current advisor Dr. Dapeng Wu for putting extra effort to help me finalize my PhD studies. I am grateful for his invaluable pieces of advice in choosing my future research goals, as well as for practical concrete steps that he undertook to help me find a job.

My thanks also go to Dr. Jian Li, who helped me a lot in the transition period in which I was supposed to change my advisor. Her research group provided for a stimulating environment for me to endeavor investigating areas that are beyond the work presented in this dissertation.

Also, I thank Dr. Antonio Arroyo, whose brilliant lectures on machine intelligence have inspired me to do research in the field of machine learning. As the director of the Machine Intelligence Lab (MIL), Dr. Arroyo has created a warm, friendly, and hardworking atmosphere among the "MIL-ers." Thanks to him, I have decided to join the MIL, which has proved on numerous occasions to be the right decision. I thank all the members of the MIL for their friendship and support.

I thank Dr. Takeo Kanade and Dr. Andrew Kurdila for sharing their research efforts on the micro air vehicle (MAV) project with me. The multidisciplinary environment of this project in which I had a chance to collaborate with various researchers with diverse educational backgrounds was a great experience for me.


TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
KEY TO SYMBOLS
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Part-Based Object Recognition
  1.2 Probabilistic Framework
  1.3 Tree-Structured Generative Models
  1.4 Learning Tree Structure from Data is an NP-hard Problem
  1.5 Our Approach to Image Interpretation
  1.6 Contributions
  1.7 Overview
2 IRREGULAR TREES WITH RANDOM NODE POSITIONS
  2.1 Model Specification
  2.2 Probabilistic Inference
  2.3 Structured Variational Approximation
    2.3.1 Optimization of $Q(X|Z)$
    2.3.2 Optimization of $Q(R'|Z)$
    2.3.3 Optimization of $Q(Z)$
  2.4 Inference Algorithm and Bayesian Estimation
  2.5 Learning Parameters of the Irregular Tree with Random Node Positions
  2.6 Implementation Issues
3 IRREGULAR TREES WITH FIXED NODE POSITIONS
  3.1 Model Specification
  3.2 Inference of the Irregular Tree with Fixed Node Positions
  3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions
4 COGNITIVE ANALYSIS OF OBJECT PARTS
  4.1 Measuring Significance of Object Parts
  4.2 Combining Object-Part Recognition Results
5 FEATURE EXTRACTION
  5.1 Texture
    5.1.1 Wavelet Transform
    5.1.2 Wavelet Properties
    5.1.3 Complex Wavelet Transform
    5.1.4 Difference-of-Gaussian Texture Extraction
  5.2 Color
6 EXPERIMENTS AND DISCUSSION
  6.1 Unsupervised Image Segmentation Tests
  6.2 Tests of Convergence
  6.3 Image Classification Tests
  6.4 Object-Part Recognition Strategy
7 CONCLUSION
  7.1 Summary of Contributions
  7.2 Opportunities for Future Work

APPENDIX

A DERIVATION OF VARIATIONAL APPROXIMATION
B INFERENCE ON THE FIXED-STRUCTURE TREE

REFERENCES
BIOGRAPHICAL SKETCH


LIST OF TABLES

5-1 Coefficients of the filters used in the Q-shift DTCWT.
6-1 Root-node distance error
6-2 Pixel segmentation error
6-3 Object detection error
6-4 Object recognition error
6-5 Pixel labeling error
6-6 Object recognition error for IQT$_{V^0}$
6-7 Pixel labeling error for IQT$_{V^0}$


LIST OF FIGURES

1-1 Variants of TSBNs
1-2 An irregular tree consists of a forest of subtrees
1-3 Bayesian estimation of the irregular tree
2-1 Two types of irregular trees
2-2 Pixel clustering using irregular trees
2-3 Irregular tree learned for the 4 × 4 image in (a)
2-4 Inference of the irregular tree given $Y$, $R^0$, and $\Theta$
3-1 Classes of candidate parents
3-2 Inference of the irregular tree with fixed node positions
3-3 Algorithm for learning the parameters of the irregular tree
4-1 For each subtree of IT$_V$, representing an object in the 128 × 128 image
4-2 For each subtree of IT$_V$, representing an object in the 256 × 256 image
5-1 Two levels of the DWT of a two-dimensional signal.
5-2 The original image (left) and its two-scale dyadic DWT (right).
5-3 The Q-shift Dual-Tree CWT.
5-4 The CWT is strongly oriented at angles ±15°, ±45°, ±75°
6-1 20 image classes in type I and II datasets.
6-2 Image segmentation using IT$_{V^0}$
6-3 Image segmentation using IT$_{V^0}$: (top) dataset I images
6-4 Image segmentation by irregular trees learned using SVA
6-5 Image segmentation by irregular trees learned using SVA: (a) IT$_{V^0}$
6-6 Image segmentation using IT$_V$
6-7 Comparison of inference algorithms
6-8 Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 128 × 128
6-9 Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 256 × 256
6-10 Percentage increase in log-likelihood
6-11 Comparison of classification results for various statistical models
6-12 MAP pixel labeling using different statistical models.
6-13 ROC curves for the image in Fig. 6-12a with IT$_{V^0}$, TSBN, DRF and MRF.
6-14 ROC curves for the image in Fig. 6-12a with IT$_V$, IT$_{V^0}$, TSBN, and TSBN
6-15 Comparison of two recognition strategies
6-16 Recognition results over dataset IV for IQT$_{V^0}$
6-17 Recognition results over dataset V for IQT$_{V^0}$
6-18 Classification using the part-object recognition strategy
B-1 Steps 2 and 5 in Fig. 3-2


KEY TO ABBREVIATIONS

The list shown below gives a description of the frequently used acronyms or abbreviations in this work. For each name, the page number corresponds to the place where the name is first used.

B: blue channel of the RGB color space (p. 43)
G: green channel of the RGB color space (p. 43)
R: red channel of the RGB color space (p. 43)
$IQT_V$: irregular tree with fixed node positions, and with observables present at all levels (p. 26)
$IQT_{V^0}$: irregular tree with fixed node positions, and with observables present only at the leaf level (p. 26)
$IT_{V^0}$: irregular tree where observables are present only at the leaf level (p. 13)
$IT_V$: irregular tree where observables are present at all levels (p. 13)
g: normalized green channel (p. 43)
r: normalized red channel (p. 43)
CWT: Complex Wavelet Transform (p. 40)
DRF: Discriminative Random Field (p. 51)
DTCWT: Dual-Tree Complex Wavelet Transform (p. 40)
DWT: Discrete Wavelet Transform (p. 37)
EM: Expectation-Maximization algorithm (p. 7)
KL: Kullback-Leibler divergence (p. 17)
MAP: Maximum A Posteriori (p. 3)
MCMC: Markov Chain Monte Carlo method (p. 15)
ML: Maximum Likelihood (p. 22)
MPM: Maximum Posterior Marginal (p. 69)
MRF: Markov Random Field (p. 2)
NP: nondeterministic polynomial time (p. 7)
RGB: the color space that consists of red, green and blue color values (p. 43)
ROC: receiver operating characteristic (p. 52)
SVA: structured variational approximation inference algorithm (p. 16)
TSBN: tree-structured belief network (p. 5)
VA: variational approximation inference algorithm (p. 16)


KEY TO SYMBOLS

The list shown below gives a brief description of the major mathematical symbols defined in this work. For each symbol, the page number corresponds to the place where the symbol is first used.

$A_{ij}$: influence of observables $Y$ on $\xi_{ij}$ (p. 20)
$B_{ij}$: influence of the geometric properties of the network on $\xi_{ij}$ (p. 20)
$G$: number of components in a Gaussian mixture (p. 15)
$H_i$: Shannon's entropy of node $i$ (p. 34)
$J(Q,P)$: free energy (p. 64)
$L$: maximum number of levels in the irregular tree (p. 13)
$M$: set of image classes (i.e., object appearances) (p. 13)
$P^{kl}_{ij}$: conditional probability tables (p. 13)
$Q^{kl}_{ij}$: approximate conditional probability tables, given $Y$ and $R^0$ (p. 18)
$R$: positions of all nodes in the irregular tree (p. 13)
$R'$: positions of non-leaf nodes in the irregular tree (p. 13)
$R^0$: positions of leaf nodes in the irregular tree (p. 13)
$V$: set of all nodes in the irregular tree (p. 13)
$V'$: set of all non-leaf nodes in the irregular tree (p. 13)
$V^0$: set of all leaf nodes in the irregular tree (p. 13)
$X$: random vector of all $x_i^k$ (p. 13)
$Y$: all observables (p. 13)
$Z$: connectivity random matrix (p. 13)
$C$: cost function (p. 20)
: the set of all parameters $\{p_i\}$ in the irregular tree with fixed node positions (p. 28)
$\Sigma_{ij}$: covariance matrix of a relative child-parent displacement $(r_i - r_j)$ (p. 13)
$\Theta$: set of parameters that characterize an irregular tree (p. 15)
$\Omega_{ij}$: approximate covariance of $r_i$, given that $j$ is the parent of $i$, and given $Y$ and $R^0$ (p. 18)
$\mu_{ij}$: approximate mean of $r_i$, given that $j$ is the parent of $i$, and given $Y$ and $R^0$ (p. 18)
$\rho(i)$: coordinate of an observable random vector in the image plane (p. 13)
$\ell$: index of levels in the irregular tree (p. 13)
$\gamma_{ij}$: probability of a node $i$ being the child of $j$ (p. 13)
: normalization constant (p. 18)
$\theta$: set of parameters that characterize a Gaussian mixture (p. 15)
$\xi_{ij}$: approximate probability of $i$ being the child of $j$, given $Y$ and $R^0$ (p. 18)
$m_i^k$: approximate posterior that node $i$ is labeled as image class $k$, given $Y$ and $R^0$ (p. 19)
$x_i$: image class of node $i$ (p. 13)
$x_i^k$: image-class indicator if class $k$ is assigned to node $i$ (p. 13)
$z_{ij}$: connectivity indicator random variable between nodes $i$ and $j$ (p. 13)
$d_{ij}$: the mean of relative displacement $r_i - r_j$ (p. 13)
$r_i$: position of node $i$ in the image plane (p. 13)
$y_{\rho(i)}$: observable random vector (p. 13)


Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

By
Sinisa Todorovic
May 2005

Chair: Dapeng Wu
Major Department: Electrical and Computer Engineering

In this dissertation, we seek to accomplish the following related goals: (1) to find a unifying framework to address localization, detection, and recognition of objects, as three sub-tasks of image interpretation, and (2) to find a computationally efficient and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image. The second problem remains open in computer vision and has so far eluded a satisfactory solution. For this purpose, we formulate object recognition as Bayesian estimation, whereby each pixel is assigned the class label with the maximum posterior probability. To efficiently estimate the posterior distribution of image classes, we propose to model images with graphical models known as irregular trees.

The irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to infer the optimal model structure, as well as the posterior distribution of image classes. We propose several inference algorithms as a solution to this NP-hard (nondeterministic polynomial time) problem, which can be viewed as variants of the Expectation-Maximization (EM) algorithm.

After inference, the model represents a forest of subtrees, each of which segments the image. That is, inference of model structure provides a solution to object localization and detection.


With respect to our second goal, we hypothesize that for successful occluded-object recognition it is critical to explicitly analyze visible object parts. Irregular trees are convenient for such analysis, because the treatment of object parts represents merely a particular interpretation of the tree/subtree structure. We analyze the significance of irregular-tree nodes, representing object parts, with respect to recognition of an object as a whole. This information is then exploited toward the ultimate object recognition.

Empirical results demonstrate that irregular trees model images more accurately than their fixed-structure counterparts, quad-trees. Also, the experiments reported herein show that our explicit treatment of object parts results in improved recognition performance, as compared to the strategies in which object components are not explicitly accounted for.


CHAPTER 1
INTRODUCTION

Image interpretation is a difficult challenge that has long been confronting the computer vision community. A number of factors contribute to the complexity of this problem. The most critical is inherent uncertainty in how the observed visual evidence in images should be attributed to infer object types and their relationships. In addition to video noise, there are various sources of this uncertainty, including variations in camera quality and position, wide-ranging illumination conditions, extreme scene diversity, and the randomness of object appearances, clutter and locations in scenes.

One of the critical hindrances to successful image interpretation is that objects may occlude each other in a complex scene. In the literature, the initial research on the interpretation of scenes with occlusions appeared in the early nineties. However, in the last decade a relatively small volume of related literature was published. In fact, the majority of the recently proposed vision systems are not directly aimed at solving the problem of occluded-object recognition; experiments on images with occlusions are reported as a side result only to illustrate the versatility of those systems. This suggests that recognition of partially occluded objects is an open problem in computer vision, which motivates us to seek its solution in this dissertation.

In the initial work, local features (e.g., points, line and curve segments) are used to represent objects, allowing the unoccluded features to be matched with object features, by computing a scalar measure of model fit [1, 2, 3]. The unmatched scene features are modeled as spurious features, and the unmatched object features indicate the occluded part of the object. The matching score is either the number of matched object features or the sum of a Gaussian-weighted matching error. The main limitation of these approaches is that they do not account for the spatial correlation among occlusions.

Statistical approaches to occluded-object recognition have also been reported in the literature. For instance, Wells [4], and Ying and Castanon [5] propose probabilistic models to characterize scene features and the correspondence between scene and object features. The authors model both object-feature uncertainty and the probability that the object features are occluded in the scene. They introduce two statistical models for occlusion. One model assumes that each feature can be occluded independently of whether any other features are occluded, whereas the second model accounts for the spatial correlation to represent the extent of occlusion. The spatial correlation is computed using a Markov Random Field (MRF) model with a Gibbs distribution [6]. The main drawback of these systems is a prohibitive computational load; the run-time of these algorithms is exponential in the number of objects to be recognized.

Other related work exploits auxiliary information provided, for example, by image sequences or stereo views of the same scene [7, 8, 9, 10, 11, 5], where occlusions are transitory. Since this information in general may not be available, and/or occlusions may remain permanent, in our approach we do not use the strategies of these systems.

A review of the related literature also suggests that the majority of vision systems are designed to deal with only one constrained vision task, such as, for example, image segmentation [10, 11, 5]. However, to conduct image interpretation, as is our goal, it is necessary to perform three related tasks: (1) localization, (2) detection (also called image segmentation), and (3) ultimate recognition of object appearances (also called image classification). Further, in many systems in which the three sub-tasks are addressed, this is not done in a unified manner. Here, as a drawback, the system's architecture comprises a serial connection of separate modules, without any feedback on the accuracy of the ultimate recognition. Moreover, vision systems are typically designed to recognize only a specific instance of object classes appearing in the image (e.g., face), which, in turn, is assumed dissimilar to other objects in the image. However, the assumption of uniqueness of the target class may not be appropriate in many settings. Also, the success of these systems usually depends on ad hoc fine-tuning of the feature-extraction methods and the system's parameters, optimized for that unique target class. With current demands to design systems capable of classifying thousands of image classes simultaneously, it would be difficult to generalize the outlined approaches.


The small volume of published research addressing occlusions in images suggests that the problem is not fully examined. Also, the drawbacks of the above systems (namely, constrained goals and settings of operation, poor spatial modeling of occlusion, and prohibitive computational load) motivated us to conduct the research reported herein. Our motivation is that most object classes seem to be naturally described by a few characteristic parts or components and their geometrical relation. We hypothesize that it is not the percentage of occlusion that is critical for object recognition, but rather which object parts are occluded. Not all components of an object are equally important for its recognition, especially when that object is partially occluded. Given two similar objects in the image, the visible parts of one object may mislead the algorithm to recognize it as its counterpart. Therefore, careful consideration should be given to the analysis of detected visible object parts. One of the benefits of such analysis is the flexibility to develop various recognition strategies that weigh the information obtained from the detected object parts more judiciously. In the following section, we review some of the reported part-based object-recognition strategies.

1.1 Part-Based Object Recognition

Recently, there has been a flurry of research related to part-based object recognition.
For example, Mohan et al. [12] use separate classifiers to detect heads, arms, and legs of people in an image, and a final classifier to decide whether a person is present. However, the approach requires object parts to be manually defined and separated for training the individual part classifiers. To build a system that is easily extensible to deal with different objects, it is important that the part selection procedure be automated. One approach in this direction is developed by Weber et al. [13, 14]. The authors assume that an object is composed of parts and shape, where parts are image patches, which may be detected and characterized by appropriate detectors, and shape describes the geometry of the mutual position of the parts in a way that is invariant with respect to rigid and, possibly, affine transformations. The authors propose a joint probability density over part appearances and shape that models the object class. This framework is appealing in that it naturally allows for parts of different sizes and resolutions. However, due to computational issues, to learn the joint probability density, the authors heuristically choose a small number of parts per object class, rendering the density unreliable in the case of large variations across images.

Probabilistic detection of object parts has also been reported. For instance, Heisele et al. [15] propose to learn object components from a set of examples based on their discriminative power, and their robustness against pose and illumination changes. For this purpose, they use Support Vector Machines. Also, Felzenszwalb and Huttenlocher [16] represent an object by a collection of parts arranged in a deformable configuration. In their approach, the appearance of each part is modeled separately by Gaussian-mixture distributions, and the deformable configuration is represented by spring-like connections between pairs of parts. The main problem of the mentioned approaches is that they lack the analysis of object parts through scales. It is assumed that parts cannot contain other sub-parts, and that objects are unions of mutually exclusive components, which is hard to justify for more complex object classes.

To address the analysis of object parts through scales, Schneiderman and Kanade [17] propose a trainable multi-stage object detector composed of classifiers, each making a decision about whether to cease evaluation, labeling the input as non-object, or to continue further evaluation. The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image.

The aforementioned approaches are not suitable for recognition of a large number of object classes. As the number of classes increases there is a combinatorial explosion of the number of their parts (i.e., image patches) that need to be evaluated by appropriate detectors.

In this dissertation, we seek a solution to the outlined problems. Our goal is to design a vision system that would analyze multiple object classes through their constituent, "meaningful" parts at a number of different resolutions. To this end, we resort to a probabilistic framework, as discussed in the following section.

1.2 Probabilistic Framework

We formulate image interpretation as inference of a posterior distribution over pixel random fields for a given image. Once the posterior distribution of image classes is inferred, each pixel can be labeled through Bayesian estimation (e.g., maximum a posteriori, MAP). Within this framework, it is necessary to specify the following:

1. The probability distribution of image classes over pixel random fields,
2. The inference algorithms for computing the posterior distribution of image classes,
3. Bayesian estimation for ultimate pixel labeling, that is, object recognition.

Our principal challenge lies in choosing a statistical model for specifying the probability distribution of image classes, since this choice conditions the formulation of inference and Bayesian estimation. A suitable model should be computationally manageable, and sufficiently expressive to represent a wide range of patterns in images. A review of the literature offers four broad classes of models [18]. The descriptive models are constructed based on statistical descriptions of image ensembles with variables only at one level (e.g., [19, 20]). The pseudo-descriptive models reduce the computational cost of descriptive models by imposing partial (or even linear) order among random variables (e.g., [21, 22]). The generative models consist of observable and hidden variables, where hidden variables represent a finite number of bases generating an image (e.g., [23, 24]). The discriminative models directly encode the posterior distribution of hidden variables given observables (e.g., [25, 26]).

The available models differ in structural complexity and difficulty of inference. At one end lie descriptive models, which build statistical descriptions of image ensembles only at the observable (i.e., pixel) level. Other modeling paradigms (i.e., generative, discriminative) impose varying levels of structure through the introduction of hidden variables. However, no principled formulation exists, as of yet, to suggest one approach superior to the others. Therefore, our choice of model is guided by the goal to interpret scenes with partially occluded, alike objects. We seek a model that offers a viable means of recognizing partially occluded objects through recognition of their visible constituent parts. Thus, a prospective model should allow for analysis of object parts towards recognition of objects as a whole.
To alleviate the computational complexity arising from the treatment of multiple object parts of multiple objects in images, we seek a model that is capable of modeling both whole objects and their sub-parts in a unified manner. That is, a candidate model must be expressive enough to capture component-subcomponent relationships among regions in an image. To accomplish this, it is necessary to analyze pixel neighborhoods of varying size. The literature abounds with reports on successful applications of multiscale statistical models for this purpose [27, 28, 29, 30, 31, 32]. Following these trends, we choose the irregular tree-structured belief network, or irregular tree for short. Our choice is directly driven by our image-interpretation strategy and goals, and appears better suited than alternative statistical approaches. Descriptive models lack the necessary structure for the component-subcomponent representation we seek to exploit. Discriminative approaches directly model the posterior distribution of hidden variables given observables. Consequently, they lose the convenience of assigning physical meaning to the statistical parameters of the model. In contrast, irregular trees can detect objects and their parts simultaneously, as discussed in the following chapters.

Before we continue to present our approach to image interpretation, we give a brief overview of tree-structured generative models in the following section.

1.3 Tree-Structured Generative Models

Recently, there has been a flurry of research in the field of tree-structured generative models, also known as tree-structured belief networks (TSBNs) [27, 33, 28, 29, 30, 31, 32]. The models provide a systematic way to describe random processes/fields and have extremely efficient and statistically optimal inference algorithms. Tree-structured belief networks are characterized by a fixed balanced tree structure of nodes representing hidden (latent) and observable random variables. We focus on TSBNs whose hidden variables take discrete values, though TSBNs can model even continuously valued Gaussian processes [34, 35]. The edges of TSBNs represent parent-child (Markovian) dependencies between neighboring layers of hidden variables, while hidden variables belonging to the same layer are conditionally independent, as depicted in Figure 1–1. Note that observables depend solely on their corresponding hidden variables. Observables are either present at the finest level only, or could be propagated up the tree, as dictated by the design choices related to image processing.
TSBNs have efficient linear-time inference algorithms, of which, in the graphical-models literature, the best-known is belief propagation [36, 37, 38]. Cheng and Bouman [29] have used TSBNs for multiscale document segmentation; Kumar and Hebert [39] have employed TSBNs for segmentation of man-made structures in natural scene images; and Schneider et al. [40] have used TSBNs for simultaneous image denoising and segmentation. All the aforementioned examples demonstrate the powerful expressiveness of TSBNs and the efficiency of their inference algorithms, which is critically important for our purposes.

In spite of these attractive properties, the fixed regular structure of nodes in the TSBN gives rise to "blocky" estimates. The pre-defined tree structure fails to adequately represent the immense variability in size and location of different objects and their subcomponents in images. In the literature, there are several approaches to alleviate this problem. Irving et al. [28] have proposed an overlapping tree model, where distinct nodes correspond to overlapping parts in the image. Li et al. [41] have discussed two-dimensional hierarchical models where nodes are dependent both at any particular layer through a Markov mesh and across resolutions. In both approaches segmentation results are superior to those when standard TSBNs are used, because the descriptive component of the models is improved at increased computational cost. Ultimately, however, these approaches do not deal with the source of the "blockiness", namely, the orderly structure of TSBNs.

Not until recently has the research on irregular structures been initiated. Konen et al. [42] have proposed a flexible neural mechanism for invariant pattern recognition based on correlated neuronal activity and the self-organization of dynamic links in neural networks. Also, Montanvert et al. [43], and Bertolino and Montanvert [44] have explored irregular multiscale tessellations that adapt to image content. We join these research efforts building on the work of Adams et al. [45], Adams [46], Storkey [47], and Storkey and Williams [48], by considering the irregular-structured tree belief network.

Figure 1–1: Variants of TSBNs: (a) observables (black) at the lowest layer only; (b) observables (black) at all layers; white nodes represent hidden random variables, connected in a balanced quad-tree structure.


Figure 1–2: An irregular tree consists of a forest of subtrees, each of which segments the image into regions, marked by distinct shading; round- and square-shaped nodes indicate hidden and observable variables, respectively; triangles indicate roots.

In the irregular tree, as in TSBNs, nodes represent random variables, and arcs between them model causal (Markovian) dependence assumptions, as illustrated in Figure 1–2. The irregular tree specifies probability distributions over both its structure and image classes. It is this distribution over tree structures that mitigates the above-cited problems with TSBNs.

1.4 Learning Tree Structure from Data is an NP-hard Problem

In order to fully characterize the irregular tree (and any graphical model, for that matter), it is necessary to learn both the graph topology (structure) and the parameters of transition probabilities between connected nodes from training data. Usually, for this purpose, one maximizes the likelihood of the model over training data, while at the same time minimizing the complexity of model structure. Current methods are successful at learning both the structure and parameters from complete data. Unfortunately, when the data are incomplete (i.e., some random variables are hidden), optimizing both the structure and parameters becomes NP-hard (nondeterministic polynomial time) [49, 50].

The principal contribution of this dissertation is that we propose a solution to the NP-hard problem of model-structure estimation. In our approach, we use a variant of the Expectation-Maximization (EM) algorithm [51, 52] to facilitate efficient search over a large number of candidate structures. In particular, the EM procedure iteratively improves its current choice of parameters by using the following two steps. In the Expectation step, current parameters are used for computing the expected value of all the statistics needed to evaluate the current structure. That is, the missing data (hidden variables) are completed by their expected values. In the Maximization step, we replace current parameters with those that maximize the likelihood over the complete data. This second step is essentially equivalent to learning model structure and parameters from complete data, and, hence, can be done efficiently [50, 38, 49].

In the incomplete-data case, a local change in structure of one part of the tree may lead to a structure change in another part of the model. Thus, the available methods for structure estimation evaluate all the neighbors (e.g., networks that differ by a few local changes) of each candidate they visit [53]. The novel idea of our approach is to perform a search for the best structure within EM. In each iteration step, our procedure attempts to find a better network structure, by computing the expected statistics needed for evaluation of alternative structures. In contrast to the available approaches, the EM-based structure search makes significant progress in each iteration. As we show through experimental validation, our procedure requires relatively few EM iterations to learn non-trivial tree structures.

The outlined image modeling constitutes the core of our approach to image interpretation, which is discussed in the following section.

1.5 Our Approach to Image Interpretation

We seek to accomplish the following related goals: (1) to find a unifying framework to address localization, detection, and recognition of objects, as three sub-tasks of image interpretation, and (2) to find a computationally efficient and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image. For this purpose, we formulate object recognition as a Bayesian estimation problem, where class labels are assigned to pixels by minimizing the expected value of a suitably specified cost function.
This formulation requires efficient estimation of the posterior distribution of image classes (i.e., objects), given an image. To this end, we resort to directed graphical models, known as irregular trees [54, 55, 46, 47, 48, 45]. As discussed in Section 1.3, the irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to infer the optimal model structure, as well as the posterior distribution of image classes. By utilizing the Markov property of the irregular tree, we are in a position to reduce computational complexity of the inference algorithm, and, thereby, to efficiently solve our Bayesian estimation problem.
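As a minimal sketch of the Bayesian estimation step described in this section, the following Python fragment picks, for one pixel, the label that minimizes the expected cost under a given posterior. The function names, the cost-matrix convention (`cost[k][l]` is the cost of deciding class `k` when the true class is `l`), and the numeric values are illustrative assumptions, not taken from the dissertation. Under a 0/1 cost the rule reduces to MAP labeling.

```python
def bayes_label(posterior, cost):
    """Return the class k minimizing the expected cost sum_l cost[k][l] * posterior[l].

    posterior[l] is the (already inferred) posterior probability of class l
    for one pixel; cost[k][l] is a hypothetical cost of deciding k when the
    true class is l.
    """
    n = len(posterior)
    risks = [sum(cost[k][l] * posterior[l] for l in range(n)) for k in range(n)]
    return min(range(n), key=risks.__getitem__)


def zero_one_cost(n):
    """0/1 cost: every misclassification costs 1, a correct decision costs 0."""
    return [[0.0 if k == l else 1.0 for l in range(n)] for k in range(n)]


# Illustrative posterior over three image classes for a single pixel.
posterior = [0.2, 0.5, 0.3]

# With the 0/1 cost, the minimizer is the MAP class.
map_label = bayes_label(posterior, zero_one_cost(3))

# An asymmetric cost (missing class 2 is ten times worse) can change the decision.
asymmetric = zero_one_cost(3)
asymmetric[0][2] = 10.0
asymmetric[1][2] = 10.0
cautious_label = bayes_label(posterior, asymmetric)
```

The second call illustrates why the cost function is left as a design choice: a different cost can yield a different label from the same posterior.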


After inference, the model represents a forest of sub-trees, each of which segments the image. More precisely, leaf nodes that are descendants down the subtree of a given root form the image region characterized by that root, as depicted in Fig. 1–2. These segmented image regions can be interpreted as distinct object appearances in the image. That is, inference of irregular-tree structure provides a solution to localization and detection. Moreover, in inference, we also derive the posterior distribution of image classes over leaf nodes. In order to classify the segmented image regions as a whole, we perform majority voting over the maximum a posteriori (MAP) classes of leaf nodes. In this fashion, we accomplish our first goal.

With respect to our second goal, we hypothesize that the critical factor in a successful occluded-object recognition should be the analysis of visible object parts, which, as discussed before, usually induces prohibitive computational cost. To account explicitly for object parts at various scales, we utilize the Markovian property of irregular trees, which lends itself as a natural solution. Since each root determines a subtree whose leaf nodes form a detected object, we can assign physical meaning to roots as representing whole objects. Also, each descendant of the root down the subtree can be interpreted as the root of another subtree whose leaf nodes cover only a part of the object. Thus, roots' descendants can be viewed as object parts at various scales. Therefore, within the irregular-tree framework, the treatment of object parts represents merely a particular interpretation of the tree/subtree structure.

To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole. After Bayesian estimation of the irregular-tree structure for a given image, we first find the set of most significant irregular-tree nodes. Then, these selected significant nodes are treated as new roots of subtrees. Finally, we conduct MAP classification and majority voting over the selected image regions, descending from the selected significant nodes, as illustrated in Fig. 1–3.

1.6 Contributions

Below, we outline the main contributions of this dissertation.


Figure 1–3: Bayesian estimation of the irregular tree along with the analysis of significant tree nodes constitute our approach to recognition of partially occluded, alike objects; shading indicates the two distinct sub-trees under the two "significant" nodes.

We propose an EM-like algorithm for learning a graphical model, where both model structure and its distributions are learned on given data simultaneously. The algorithm represents a stage-wise solution to the learning problem known to be NP-hard. While we use the algorithm for learning irregular trees, its generalization to any generative model is straightforward.

A critical part of this learning algorithm is inference of the posterior distribution of image classes on given data. As is the case for many complex-structure models, exact inference for irregular trees is intractable. To overcome this problem, we resort to a variational approximation approach. We assume that there are averaging phenomena in irregular trees that may render a given set of variables in the model approximately independent of the rest of the network. Thereby, we derive the Structured Variational Approximation algorithm that advances existing methods for inference.

In order to avoid variational approximation in inference, we propose two novel architectures and their inference algorithms within the irregular-tree framework. Being simpler, these models allow for exact inference. Moreover, empirically, they exhibit higher accuracy in modeling images than irregular-tree-like models proposed in prior work [45, 46, 47, 48].

Along with architectural novelties, we also introduce multi-layered data into the model, an approach that has been extensively investigated in fixed-structure quad-trees [29, 33]. The proposed quad-trees have proved rather successful for various applications including image denoising, classification, and segmentation. Hence, it is important to develop a similar formulation for irregular trees.


We develop a novel approach to object recognition, in which object parts are explicitly analyzed in a computationally efficient manner. As a major theoretical contribution, we define the measure of cognitive significance of object details. The measure provides for a principled algorithm that combines detected object parts toward recognition of an object as a whole.

Finally, we report results of experiments conducted on a wide variety of image datasets, which characterize the proposed models and inference algorithms, and validate our approach to image interpretation.

1.7 Overview

The remainder of the dissertation is organized as follows. In Chapter 2, we specify two architectures of the irregular-tree model, and derive inference algorithms for them. The architectures differ in the treatment of observable random variables. We also discuss learning of the model parameters. Detailed derivation of the inference algorithm is given in Appendix A.

Next, in Chapter 3, we specify yet another two architectures of the irregular-tree model, for which it is possible to simplify the inference algorithm, as compared to that discussed in Chapter 2. We discuss the probabilistic inference and learning algorithms for the models.

Further, in Chapter 4, we propose a measure of significance of object parts. This measure ranks object components with respect to the entropy over all image classes (i.e., objects). To incorporate the information of this analysis into the MAP classification, we devise a greedy algorithm, which we refer to as object-part recognition.

The extraction of image features, which we use in our experiments, is thoroughly discussed in Chapter 5. Then, in Chapter 6, we report performance results of different irregular-tree architectures on a large number of challenging images with partially occluded, alike objects.

Finally, in Chapter 7, we summarize the major contributions of the dissertation, and conclude with remarks on future research.


CHAPTER 2
IRREGULAR TREES WITH RANDOM NODE POSITIONS

2.1 Model Specification

Irregular trees are directed, acyclic graphs with two disjoint sets of nodes representing hidden and observable random vectors. Graphically, we represent all hidden variables as round-shaped nodes, connected via directed edges indicating Markovian dependencies, while observables are denoted as rectangular-shaped nodes, connected only to their corresponding hidden variables, as depicted in Fig. 2-1. Below, we first introduce nodes characterized by hidden variables.

Figure 2-1: Two types of irregular trees: (a) observable variables present at the leaf level only; (b) observable variables present at all levels; round- and square-shaped nodes indicate hidden and observable random variables; triangles indicate roots; unconnected nodes in this example belong to other subtrees; each subtree segments the image into regions marked by distinct shading.

There are $V$ round-shaped nodes, organized in hierarchical levels, $V^\ell$, $\ell = \{0, 1, \ldots, L-1\}$, where $V^0$ denotes the leaf level, and $V' \triangleq V \setminus V^0$. The number of round-shaped nodes is identical to that of the corresponding quad-tree with $L$ levels, such that $|V^\ell| = |V^{\ell-1}|/4 = \ldots = |V^0|/4^\ell$. Connections are established under the constraint that a node at level $\ell$ can become a root, or it can connect only to the nodes at the next level, $\ell+1$. The network connectivity is represented by random matrix $Z$, where entry $z_{ij}$ is an indicator random variable, such that $z_{ij} = 1$ if $i \in V^\ell$ and $j \in \{0, V^{\ell+1}\}$ are connected. $Z$ contains an additional zero ("root") column, where entries $z_{i0} = 1$ if $i$ is a root. Since each node can have only one parent, a realization of $Z$ can have at most one entry equal to 1 in each row. We define the distribution over connectivity as

$$P(Z) \triangleq \prod_{\ell=0}^{L-1} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [r_{ij}]^{z_{ij}}, \tag{2.1}$$

where $r_{ij}$ is the probability of $i$ being the child of $j$, subject to $\sum_{j \in \{0, V^{\ell+1}\}} r_{ij} = 1$.

Further, each round-shaped node $i$ (see Fig. 2-1) is characterized by random position $\mathbf{r}_i$ in the image plane. The distribution of $\mathbf{r}_i$ is conditioned on the position of its parent $\mathbf{r}_j$ as

$$P(\mathbf{r}_i \mid \mathbf{r}_j, z_{ij}=1) \triangleq \frac{1}{2\pi |\Sigma_{ij}|^{\frac12}} \exp\left(-\tfrac12 (\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})^T \Sigma_{ij}^{-1} (\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})\right), \tag{2.2}$$

where $\Sigma_{ij}$ is a diagonal matrix that represents the order of magnitude of object size, and parameter $\mathbf{d}_{ij}$ is the mean of relative displacement $(\mathbf{r}_i - \mathbf{r}_j)$. Storkey and Williams [48] set $\mathbf{d}_{ij}$ to zero, which favors undesirable positioning of children and parent nodes at the same locations. From our experiments, this may seriously degrade the image-modeling capabilities of irregular trees, and as such some nonzero relative displacement $\mathbf{d}_{ij}$ needs to be accounted for. For roots $i$, we have $P(\mathbf{r}_i \mid \mathbf{r}_0, z_{i0}=1) \triangleq \exp\left(-\tfrac12 (\mathbf{r}_i - \mathbf{d}_i)^T \Sigma_i^{-1} (\mathbf{r}_i - \mathbf{d}_i)\right) / (2\pi |\Sigma_i|^{\frac12})$. The joint probability of $R \triangleq \{\mathbf{r}_i \mid \forall i \in V\}$ is given by

$$P(R \mid Z) \triangleq \prod_{i,j \in V} \left[P(\mathbf{r}_i \mid \mathbf{r}_j, z_{ij})\right]^{z_{ij}}. \tag{2.3}$$

At the leaf level, $V^0$, we fix node positions $R^0$ to the locations of the finest-scale observables, and then use $P(Z, R' \mid R^0)$ as the prior over positions and connectivity, where $R^0 \triangleq \{\mathbf{r}_i \mid \forall i \in V^0\}$, and $R' \triangleq \{\mathbf{r}_i \mid \forall i \in V \setminus V^0\}$.

Next, each node $i$ is characterized by an image-class label $x_i$ and an image-class indicator random variable $x_i^k$, such that $x_i^k = 1$ if $x_i = k$, where $k$ is a label taking values in the finite set $M$. Thus, we assume that the set $M$ of unknown image classes is finite. The label $k$ of node $i$ is conditioned on image class $l$ of its parent $j$ and is given by conditional probability tables $P_{ij}^{kl}$. For roots $i$, we have $P(x_i^k \mid x_0^l, z_{i0}=1) \triangleq P(x_i^k)$. Thus, the joint probability of $X \triangleq \{x_i^k \mid i \in V, k \in M\}$ is given by

$$P(X \mid Z) = \prod_{i,j \in V} \prod_{k,l \in M} \left[P_{ij}^{kl}\right]^{x_i^k x_j^l z_{ij}}. \tag{2.4}$$
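To make the connectivity prior of Eq. (2.1) concrete, the following minimal sketch computes the quad-tree level sizes and evaluates $\log P(Z)$ for one realization of $Z$. It is an illustration only: the function names, the list-of-rows layout of $r_{ij}$, and the convention that column 0 is the root column are assumptions introduced here, and each child is assumed to select exactly one column, consistent with the constraint $\sum_j r_{ij} = 1$.

```python
import math
import random

def level_sizes(num_leaves, num_levels):
    """Nodes per level of the quad-tree layout: |V^l| = |V^0| / 4**l."""
    return [num_leaves // 4 ** level for level in range(num_levels)]

def sample_parent(r_row, rng):
    """Draw a column index for one child from its row of r_ij.

    Assumption of this sketch: column 0 is the "root" column and the
    remaining columns are the candidate parents at the next level.
    """
    u = rng.random()
    acc = 0.0
    for j, prob in enumerate(r_row):
        acc += prob
        if u < acc:
            return j
    return len(r_row) - 1  # guard against floating-point round-off

def log_prior_connectivity(parents, r):
    """log P(Z) of Eq. (2.1) when child i selects column parents[i]."""
    return sum(math.log(r[i][j]) for i, j in enumerate(parents))
```

For example, `level_sizes(16, 3)` describes a 4×4 image modeled with three levels, and a uniform row `[0.2] * 5` gives every child the same prior probability of becoming a root or of attaching to any of four candidate parents.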


Finally, we introduce nodes that are characterized by observable random vectors representing image texture and color cues. Here, we make a distinction between two types of irregular trees. The model where observables are present only at the leaf level is referred to as IT$_{V^0}$; the model where observables are present at all levels is referred to as IT$_V$. To clarify the difference between the two types of nodes in irregular trees, we index observables with respect to their locations in the data structure (e.g., wavelet dyadic squares), while hidden variables are indexed with respect to a node index in the graph. This generalizes correspondence between hidden and observable random variables of the position-encoding dynamic trees [48]. We define the position of an observable, $\boldsymbol{\rho}(i)$, to be equal to the center of mass of the $i$-th dyadic square at level $\ell$ in the corresponding quad-tree with $L$ levels:

$$\boldsymbol{\rho}(i) \triangleq \left[(n+0.5)2^\ell \;\; (m+0.5)2^\ell\right]^T, \quad \forall i \in V^\ell,\; \ell = \{0, \ldots, L-1\},\; n, m = 1, 2, \ldots \tag{2.5}$$

where $n$ and $m$ denote the row and column in the dyadic square at scale $\ell$ (e.g., for wavelet coefficients). Clearly, other application-dependent definitions of $\boldsymbol{\rho}(i)$ are possible. Note that while the $\mathbf{r}$'s are random vectors, the $\boldsymbol{\rho}$'s are deterministic values fixed at locations where the corresponding observables are recorded in the image. Also, after fixing $R^0$ to the locations of the finest-scale observables, we have $\forall i \in V^0$, $\mathbf{r}_i = \boldsymbol{\rho}(i)$. The definition, given by Eq. (2.5), holds for IT$_{V^0}$, as well, for $\ell = 0$.

For both types of irregular trees, we assume that observables $Y \triangleq \{\mathbf{y}_{\rho(i)} \mid \forall i \in V\}$ at locations $\boldsymbol{\rho} \triangleq \{\boldsymbol{\rho}(i) \mid \forall i \in V\}$ are conditionally independent given the corresponding $x_i^k$:

$$P(Y \mid X, \boldsymbol{\rho}) = \prod_{i \in V} \prod_{k \in M} P(\mathbf{y}_{\rho(i)} \mid x_i^k, \boldsymbol{\rho}(i))^{x_i^k}, \tag{2.6}$$

where for IT$_{V^0}$, $V^0$ should be substituted for $V$. The likelihoods $P(\mathbf{y}_{\rho(i)} \mid x_i^k = 1, \boldsymbol{\rho}(i))$ are modeled as mixtures of Gaussians: $P(\mathbf{y}_{\rho(i)} \mid x_i^k = 1, \boldsymbol{\rho}(i)) \triangleq \sum_{g=1}^{G_k} \pi_k(g)\, \mathcal{N}(\mathbf{y}_{\rho(i)}; \nu_k(g), \Xi_k(g))$. For large $G_k$, a Gaussian-mixture density can approximate any probability density [56].
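As an illustration of Eq. (2.5) and of the Gaussian-mixture likelihood, the sketch below computes the position of an observable and evaluates a mixture density. It assumes scalar observations and scalar variances purely for brevity (the model itself uses observable vectors), and the function names are hypothetical.

```python
import math

def rho(n, m, level):
    """Center of mass of the dyadic square in row n, column m at a level,
    following Eq. (2.5): ((n + 0.5) * 2**level, (m + 0.5) * 2**level)."""
    scale = 2 ** level
    return ((n + 0.5) * scale, (m + 0.5) * scale)

def gaussian_pdf(y, mean, var):
    """Density of a one-dimensional Gaussian N(y; mean, var)."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mixture_likelihood(y, weights, means, variances):
    """Mixture density sum_g pi(g) N(y; nu(g), var(g)) for one image class.

    Assumption of this sketch: y is a scalar feature, so each component
    has a scalar variance rather than a covariance matrix.
    """
    return sum(w * gaussian_pdf(y, mu, var)
               for w, mu, var in zip(weights, means, variances))
```

With a single component of weight 1, `mixture_likelihood` reduces to the ordinary Gaussian density, which is the setting $G = 1$.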
In order to avoid the risk of overfitting the model, we assume that the parameters of the Gaussian mixture are equal for all nodes. The Gaussian-mixture parameters can be grouped in the set $\theta \triangleq \{G_k, \{\pi_k(g), \nu_k(g), \Xi_k(g)\}_{g=1}^{G_k} \mid \forall k \in M\}$.

Speaking in generative terms, for a given set of $V$ nodes, first $P(Z)$ is defined using Eq. (2.1) and $P(R \mid Z)$ using Eq. (2.3) to give us $P(Z, R)$. We then impose the condition of
fixing the leaf-level node positions to the locations of the finest-scale observables, $\boldsymbol{\rho}^0$, to obtain $P(Z, R' \mid R^0 = \boldsymbol{\rho}^0)$. Combining Eq. (2.4) and Eq. (2.6) with $P(Z, R' \mid R^0 = \boldsymbol{\rho}^0)$ results in the joint prior

$$P(Z, X, R', Y \mid R^0 = \boldsymbol{\rho}^0) = P(Y \mid X, \boldsymbol{\rho})\, P(X \mid Z)\, P(Z, R' \mid R^0 = \boldsymbol{\rho}^0), \tag{2.7}$$

which fully specifies the irregular tree. All the parameters of the joint prior can be grouped in the set $\Theta \triangleq \{r_{ij}, \mathbf{d}_{ij}, \Sigma_{ij}, P_{ij}^{kl}, \theta\}$, $\forall i,j \in V$, $\forall k,l \in M$.

As depicted in Fig. 2-1, an irregular tree is a directed graph. The graph-theoretic representation of irregular trees provides general algorithms for computing marginal and conditional probabilities of interest, which are discussed in the following section.

2.2 Probabilistic Inference

Image interpretation, as discussed in Chapter 1, requires computation of posterior probabilities of hidden random variables $Z$, $X$, and $R'$, given observables $Y$ and leaf-node positions $R^0$. However, due to the complexity of irregular trees, the exact probabilistic inference of $P(Z, X, R' \mid Y, R^0)$ is infeasible. Therefore, we resort to approximate inference methods, which are divided into two broad classes: deterministic approximations and Monte-Carlo methods [57, 58, 59, 60, 61].

Markov Chain Monte Carlo (MCMC) methods sample the posterior $P(Z, X, R' \mid Y, R^0)$ by constructing a Markov chain whose equilibrium distribution is the desired $P(Z, X, R' \mid Y, R^0)$. Below, we report an experiment for two datasets of 4×4 and 8×8 binary images, samples of which are depicted in Fig. 2-2a, where we learned $P(Z, X, R' \mid Y, R^0)$ for IT$_{V^0}$ models through Gibbs sampling [62]. Observables $y_i$ were set to binary pixel values; the number of image classes was set to $|M| = 2$; the number of components in the Gaussian mixture was set to $G = 1$; and the maximum number of levels in the model was set to $L = 3$ and $L = 4$ for 4×4 and 8×8 images, respectively. The initial irregular-tree structure is a balanced quad-tree (TSBN), where the number of leaf-level nodes is equal to the number of pixels. One iteration of Gibbs sampling consists of sampling each variable, conditioned on the other variables in the irregular tree, until all the variables are sampled. We iterated this procedure until our convergence criterion was met, namely, when $|P_{t+1}(Z, X, R' \mid Y, R^0) - P_t(Z, X, R' \mid Y, R^0)| / P_t(Z, X, R' \mid Y, R^0) < \varepsilon$ for $N = 10$ successive iteration steps $t$, where $\varepsilon = 0.1$ and $\varepsilon = 1$ for 4×4 and 8×8 images, respectively. For the dataset of 50 binary 4×4 images, on average more than 20,000 iteration steps were required for convergence, while for 50 binary 8×8 images, more than 100,000 iterations were required. In Figs. 2-2b-c, we also illustrate the grouping of pixels in the learned irregular trees, while in Fig. 2-3, we depict the irregular tree learned for the 4×4 image in Fig. 2-2a.

Figure 2-2: Pixel clustering using irregular trees learned by Gibbs sampling: (a) sample 4×4 and 8×8 binary images; (b) clustered leaf-level pixels that have the same parent at level 1; (c) clustered leaf-level pixels that have the same grandparent at level 2; clusters are indicated by different shades of gray; the point in each group marks the position of the parent node.

Figure 2-3: Irregular tree learned for the 4×4 image in (a), after 20,032 iterations of Gibbs sampling; nodes are depicted in-line representing 4, 2 and 1 actual rows of the levels 0, 1 and 2, respectively; nodes are drawn as pie-charts representing $P(x_i^k = 1)$, $k \in \{0, 1\}$; note that there are two root nodes for two distinct objects in the image.

From the experimental results, we infer that irregular trees learned through Gibbs sampling are capable of capturing important structural information about image regions at various scales. Generally, however, in MCMC approaches, with increasing model complexity, the choice of proposals in the Markov chain becomes hard, so that the equilibrium distribution is reached very slowly [63, 57]. Hence, in order to achieve faster inference, we resort to variational approximation, a specific type of deterministic approximation [59, 64].
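The stopping rule used for the Gibbs sampler above (relative change of the posterior below a threshold for $N$ successive iterations) can be sketched as follows. This is a minimal illustration, not the code used in the experiments; the function name and the idea of keeping the monitored values in a list are assumptions, and the monitored quantity is assumed to be strictly positive.

```python
def converged(history, eps, n_successive):
    """Return True if the last n_successive relative changes are all below eps.

    history holds the monitored quantity (e.g., a posterior value) after
    each iteration, oldest first. Values are assumed strictly positive,
    so the relative change |b - a| / |a| is well defined.
    """
    if len(history) < n_successive + 1:
        return False  # not enough iterations observed yet
    recent = history[-(n_successive + 1):]
    return all(abs(b - a) / abs(a) < eps for a, b in zip(recent, recent[1:]))
```

In a sampling loop, one would append the current value to `history` after each sweep and stop once `converged(history, eps, 10)` holds.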


Variational approximation methods have been demonstrated to give good and significantly faster results, when compared to Gibbs sampling [46]. The proposed approaches range from a factorized approximating distribution over hidden variables [45] (a.k.a. mean-field variational approximation) to more structured solutions [48], where dependencies among hidden variables are enforced. The underlying assumption in those methods is that there are averaging phenomena in irregular trees that may render a given set of variables approximately independent of the rest of the network. Therefore, the resulting variational optimization of irregular trees provides for principled solutions, while reducing computational complexity. In the following section, we derive a novel Structured Variational Approximation (SVA) algorithm for the irregular tree model defined in Section 2.1.

2.3 Structured Variational Approximation

In variational approximation, the intractable distribution $P(Z, X, R' \mid Y, R^0)$ is approximated by a simpler distribution $Q(Z, X, R' \mid Y, R^0)$ closest to $P(Z, X, R' \mid Y, R^0)$. To simplify notation, below, we omit the conditioning on $Y$ and $R^0$, and write $Q(Z, X, R')$. The novelty of our approach is that we constrain the variational distribution to the form

$$Q(Z, X, R') \triangleq Q(Z)\, Q(X \mid Z)\, Q(R' \mid Z), \tag{2.8}$$

which enforces that both class-indicator variables $X$ and position variables $R'$ are statistically dependent on the tree connectivity $Z$. Since these dependencies are significant in the prior, one should expect them to remain so in the posterior. Therefore, our formulation appears to be more appropriate for approximating the true posterior than the mean-field variational approximation $Q(Z, X, R') = Q(Z)\, Q(X)\, Q(R')$ discussed by Adams et al. [45], and the form $Q(Z, X, R') = Q(Z)\, Q(X \mid Z)\, Q(R')$ proposed by Storkey and Williams [48]. We define the approximating distributions as follows:

$$Q(Z) \triangleq \prod_{\ell=0}^{L-1} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [\xi_{ij}]^{z_{ij}}, \tag{2.9}$$

$$Q(X \mid Z) \triangleq \prod_{i,j \in V} \prod_{k,l \in M} \left[Q_{ij}^{kl}\right]^{x_i^k x_j^l z_{ij}}, \tag{2.10}$$

$$Q(R' \mid Z) \triangleq \prod_{i,j \in V'} \left[Q(\mathbf{r}_i \mid z_{ij})\right]^{z_{ij}} = \prod_{i,j \in V'} \left[\frac{\exp\left(-\tfrac12 (\mathbf{r}_i - \boldsymbol{\mu}_{ij})^T \Omega_{ij}^{-1} (\mathbf{r}_i - \boldsymbol{\mu}_{ij})\right)}{2\pi |\Omega_{ij}|^{\frac12}}\right]^{z_{ij}}, \tag{2.11}$$


where parameters $\xi_{ij}$ correspond to the $r_{ij}$ connection probabilities, and the $Q_{ij}^{kl}$ are analogous to the $P_{ij}^{kl}$ conditional probability tables. For the parameters of $Q(R' \mid Z)$, note that covariances $\Omega_{ij}$ and mean values $\boldsymbol{\mu}_{ij}$ form the set of Gaussian parameters for a given node $i \in V^\ell$ over its candidate parents $j \in V^{\ell+1}$. Which pair of parameters $(\boldsymbol{\mu}_{ij}, \Omega_{ij})$ is used to generate $\mathbf{r}_i$ is conditioned on the given connection between $i$ and $j$, that is, the current realization of $Z$. Furthermore, we assume that the $\Omega$'s are diagonal matrices, such that node positions along the "x" and "y" image axes are uncorrelated. Also, for roots, suitable forms of $Q$ functions are used, similar to the specifications given in Section 2.1.

To find $Q(Z, X, R')$ closest to $P(Z, X, R' \mid Y, R^0)$ we resort to a standard optimization method, where Kullback-Leibler (KL) divergence between $Q(Z, X, R')$ and $P(Z, X, R' \mid Y, R^0)$ is minimized ([65], ch. 2, pp. 12-49, and ch. 16, pp. 482-509). The KL divergence is given by

$$KL(Q \| P) \triangleq \int_{R'} dR' \sum_{Z,X} Q(Z, X, R') \log \frac{Q(Z, X, R')}{P(Z, X, R' \mid Y, R^0)}. \tag{2.12}$$

It is well known that $KL(Q \| P)$ is non-negative for any two distributions $Q$ and $P$, and $KL(Q \| P) = 0$ if and only if $Q = P$; these properties are a direct corollary of Jensen's inequality ([65], ch. 2, pp. 12-49). As such, $KL(Q \| P)$ guarantees a global minimum, that is, a unique solution to $Q(Z, X, R')$.

By minimizing the KL divergence, we derive the update equations for estimating the parameters of the variational distribution $Q(Z, X, R')$. Below, we summarize the final derivation results. Detailed derivation steps are reported in Appendix A, where we also provide the list of nomenclature. In the following equations, we use $\kappa$ to denote an arbitrary normalization constant, the definition of which may change from equation to equation. Parameters on the right-hand side of the update equations are assumed known, as learned in the previous iteration step.

2.3.1 Optimization of $Q(X \mid Z)$

$Q(X \mid Z)$ is fully characterized by parameters $Q_{ij}^{kl}$, which are updated as

$$Q_{ij}^{kl} = \kappa P_{ij}^{kl} \lambda_i^k, \quad \forall i,j \in V,\; \forall k,l \in M, \tag{2.13}$$


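where the auxiliary parameters $\lambda_i^k$ are computed as

$$\lambda_i^k = \begin{cases} P(\mathbf{y}_{\rho(i)} \mid x_i^k, \boldsymbol{\rho}(i)), & i \in V^0, \\ \prod_{c \in V} \left[\sum_{a \in M} P_{ci}^{ak} \lambda_c^a\right]^{\xi_{ci}}, & i \in V', \end{cases} \tag{2.14a}$$

$$\lambda_i^k = P(\mathbf{y}_{\rho(i)} \mid x_i^k, \boldsymbol{\rho}(i)) \prod_{c \in V} \left[\sum_{a \in M} P_{ci}^{ak} \lambda_c^a\right]^{\xi_{ci}}, \quad \forall i \in V,\; \forall k \in M, \tag{2.14b}$$

where Eq. (2.14a) is derived for IT$_{V^0}$, and Eq. (2.14b) for IT$_V$. Since the $\xi_{ci}$ are non-zero only for child-parent pairs, from Eq. (2.14), we note that the $\lambda$'s are computed for both models by propagating the $\lambda$ messages of the corresponding children nodes upward. Thus, the $Q$'s, given by Eq. (2.13), can be updated by making a single pass up the tree. Also, note that for leaf nodes, $i \in V^0$, the $\xi_{ci}$ parameters are equal to 0 by definition, yielding $\lambda_i^k = P(\mathbf{y}_{\rho(i)} \mid x_i^k, \boldsymbol{\rho}(i))$ in Eq. (2.14b).

Further, from Eqs. (2.9) and (2.10), we derive the update equation for the approximate posterior probability $m_i^k$ that node $i$ is assigned to image class $k$, given $Y$ and $R^0$, as

$$m_i^k = \int_{R'} dR' \sum_{Z,X} x_i^k\, Q(Z, X, R') = \sum_{j \in V'} \xi_{ij} \sum_{l \in M} Q_{ij}^{kl} m_j^l, \quad \forall i \in V,\; \forall k \in M. \tag{2.15}$$

Note that the $m_i^k$ can be computed by propagating image-class probabilities in a single pass downward. This upward-downward propagation, specified by Eqs. (2.14) and (2.15), is very reminiscent of belief propagation for TSBNs [36, 31]. For the special case when $\xi_{ij} = 1$ only for one parent $j$, we obtain the standard $\lambda$-$\pi$ rules of Pearl's message passing scheme for TSBNs.

2.3.2 Optimization of $Q(R' \mid Z)$

$Q(R' \mid Z)$ is fully characterized by parameters $\boldsymbol{\mu}_{ij}$ and $\Omega_{ij}$. The update equation for $\boldsymbol{\mu}_{ij}$, $\forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, is given by

$$\boldsymbol{\mu}_{ij} = \left[\sum_{p \in V'} \xi_{jp} \Sigma_{ij}^{-1} + \sum_{c \in V'} \xi_{ci} \Sigma_{ci}^{-1}\right]^{-1} \left[\sum_{p \in V'} \xi_{jp} \Sigma_{ij}^{-1} (\boldsymbol{\mu}_{jp} + \mathbf{d}_{jp}) + \sum_{c \in V'} \xi_{ci} \Sigma_{ci}^{-1} (\boldsymbol{\mu}_{ci} - \mathbf{d}_{ij})\right], \tag{2.16}$$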


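where $c$ and $p$ denote children and grandparents of node $i$, respectively. Further, for all node pairs $\forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, where $\xi_{ij} \neq 0$, $\Omega_{ij}$ is updated as

$$\mathrm{Tr}\{\Omega_{ij}^{-1}\} = \mathrm{Tr}\{\Sigma_{ij}^{-1}\} \left(1 + \sum_{p \in V'} \xi_{jp} \left[\frac{\mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{jp}\}}{\mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\}}\right]^{\frac12}\right) + \sum_{c \in V'} \xi_{ci}\, \mathrm{Tr}\{\Sigma_{ci}^{-1}\} \left(1 + \left[\frac{\mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ci}\}}{\mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ij}\}}\right]^{\frac12}\right), \tag{2.17}$$

where, once again, $c$ and $p$ denote children and grandparents of node $i$, respectively. Since the $\Omega$'s and $\Sigma$'s are assumed diagonal, it is straightforward to derive the expressions for the diagonal elements of the $\Omega$'s from Eq. (2.17). Note that both $\boldsymbol{\mu}_{ij}$ and $\Omega_{ij}$ are updated by summing over children and grandparents of $i$, and, therefore, must be iterated until convergence.

2.3.3 Optimization of $Q(Z)$

$Q(Z)$ is fully characterized by connectivity probabilities $\xi_{ij}$, which are computed as

$$\xi_{ij} = \kappa\, r_{ij} \exp(A_{ij} - B_{ij}), \quad \forall \ell,\; \forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}, \tag{2.18}$$

where $A_{ij}$ represents the influence of observables $Y$, while $B_{ij}$ represents the contribution of the geometric properties of the network to the connectivity distribution. These are defined in Appendix A.

2.4 Inference Algorithm and Bayesian Estimation

For the given set of parameters $\Theta$ characterizing the joint prior, observables $Y$, and leaf-level node positions $R^0$, the standard Bayesian estimation of optimal $\hat{Z}$, $\hat{X}$, and $\hat{R}'$ requires minimizing the expectation of a cost function $\mathcal{C}$:

$$(\hat{Z}, \hat{X}, \hat{R}') = \arg\min_{Z, X, R'} E\{\mathcal{C}((Z, X, R'), (Z^*, X^*, R'^*)) \mid Y, R^0, \Theta\}, \tag{2.19}$$

where $\mathcal{C}(\cdot)$ penalizes the discrepancy between the estimated configuration $(Z, X, R')$ and the true one $(Z^*, X^*, R'^*)$. We propose the following cost function:

$$\mathcal{C}((Z, X, R'), (Z^*, X^*, R'^*)) \triangleq \sum_{i,j \in V} [1 - \delta(z_{ij} - z_{ij}^*)] + \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - x_i^{k*})] + \sum_{i \in V'} [1 - \delta(\mathbf{r}_i - \mathbf{r}_i^*)], \tag{2.20}$$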


where $*$ indicates true values, and $\delta(\cdot)$ is the Kronecker delta function. Using the variational approximation $P(Z, X, R' \mid Y, R^0) \approx Q(Z)\, Q(X \mid Z)\, Q(R' \mid Z)$, from Eqs. (2.19) and (2.20), we derive:

$$\hat{Z} = \arg\min_{Z} \sum_{Z} Q(Z) \sum_{\ell=0}^{L-1} \sum_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [1 - \delta(z_{ij} - z_{ij}^*)], \tag{2.21}$$

$$\hat{X} = \arg\min_{X} \sum_{Z,X} Q(Z)\, Q(X \mid Z) \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - x_i^{k*})], \tag{2.22}$$

$$\hat{R}' = \arg\min_{R'} \int_{R'} dR' \sum_{Z} Q(Z)\, Q(R' \mid Z) \sum_{i \in V'} [1 - \delta(\mathbf{r}_i - \mathbf{r}_i^*)]. \tag{2.23}$$

Given the constraints on connections, discussed in Section 2.1, minimization in Eq. (2.21) is equivalent to finding parents:

$$(\forall \ell)(\forall i \in V^\ell)(Z_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \quad \text{for IT}_{V^0}, \tag{2.24a}$$

$$(\forall \ell)(\forall i \in V^\ell) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \quad \text{for IT}_{V}, \tag{2.24b}$$

where $\xi_{ij}$ is given by Eq. (2.18); $Z_{\cdot i}$ denotes the $i$-th column of $Z$, and $Z_{\cdot i} \neq 0$ indicates that there is at least one non-zero element in column $Z_{\cdot i}$; that is, $i$ has children, and thereby is included in the tree structure. Note that due to the distribution over connections, after estimation of $Z$, for a given image, some nodes may remain without children. To preserve the generative property in IT$_{V^0}$, we impose an additional constraint on $Z$ that nodes above the leaf level must have children in order to be able to connect to upper levels. On the other hand, in IT$_V$, due to multi-layered observables, all nodes $V$ must be included in the tree structure, even if they do not have children. The global solution to Eq. (2.24a) is an open problem in many research areas. Therefore, for IT$_{V^0}$, we propose a stage-wise optimization, where, as we move upwards, starting from the leaf level $\ell = \{0, 1, \ldots, L-1\}$, we include in the tree structure optimal parents at $V^{\ell+1}$ according to

$$(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}, \tag{2.25}$$

where $\hat{Z}_{\cdot i}$ denotes the $i$-th column of the estimated $\hat{Z}$, and $\hat{Z}_{\cdot i} \neq 0$ indicates that $i$ has already been included in the tree structure when optimizing the previous level $V^\ell$.
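The stage-wise optimization of Eq. (2.25) can be illustrated with a short sketch. The data layout is an assumption made here for illustration: `xi[level][i]` is a dictionary mapping each candidate (a parent id at the next level, or the hypothetical marker `ROOT` for the root column) to the connection probability of node `i`; node ids are integers. Only nodes that were selected as parents at the previous level are processed, which mirrors the constraint imposed for the model with leaf-level observables.

```python
ROOT = "root"  # hypothetical marker for the zero ("root") column of Z

def stagewise_parents(xi, leaves):
    """Greedy bottom-up parent selection in the spirit of Eq. (2.25).

    xi[level][i] maps candidates (parent id or ROOT) to probabilities.
    Returns a dict {node: chosen parent or ROOT} for all included nodes.
    """
    included = set(leaves)      # every leaf belongs to the tree structure
    parents = {}
    for level_xi in xi:         # move upward, one level at a time
        next_included = set()
        for i in sorted(included):
            candidates = level_xi[i]
            j_hat = max(candidates, key=candidates.get)  # arg max over j
            parents[i] = j_hat
            if j_hat != ROOT:
                next_included.add(j_hat)  # the parent now has a child
        included = next_included
    return parents
```

A node at an upper level that no child selects never enters `included`, so it is left out of the estimated structure.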


Next, from Eq. (2.22), the resulting Bayesian estimator of image-class labels, denoted as $\hat{x}_i$, is

$$(\forall i \in V) \quad \hat{x}_i = \arg\max_{k \in M} m_i^k, \tag{2.26}$$

where the approximate posterior probability $m_i^k$ that image class $k$ is assigned to node $i$ is given by Eq. (2.15).

Finally, from Eq. (2.23), optimal node positions are estimated as

$$(\forall \ell > 0)(\forall i \in V^\ell) \quad \hat{\mathbf{r}}_i = \arg\max_{\mathbf{r}_i} \sum_{Z} Q(\mathbf{r}_i \mid Z)\, Q(Z) = \sum_{j \in \{0, V^{\ell+1}\}} \xi_{ij} \boldsymbol{\mu}_{ij}, \tag{2.27}$$

where $\boldsymbol{\mu}_{ij}$ and $\xi_{ij}$ are given by Eqs. (2.16) and (2.18), respectively.

The inference algorithm for irregular trees is summarized in Fig. 2-4. The specified ordering of parameter updates for $Q(Z)$, $Q(X \mid Z)$, and $Q(R' \mid Z)$ in Fig. 2-4, steps (4)-(10), is arbitrary; theoretically, other orderings are equally valid.

2.5 Learning Parameters of the Irregular Tree with Random Node Positions

Variational inference presumes that model parameters $\Theta := \{r_{ij}, \mathbf{d}_{ij}, \Sigma_{ij}, P_{ij}^{kl}, \theta\}$, $\forall i,j \in V$, $\forall k,l \in M$, and $V$, $L$, $M$, are available. These parameters can be learned off-line through standard Maximum Likelihood (ML) optimization. Usually, for the ML optimization, it is assumed that $N$ independently generated training images, with observables $\{Y^n\}_{n=1}^N$ and latent variables $\{(Z^n, X^n, R'^n)\}_{n=1}^N$, are given. However, for multiscale generative models, in general, neither the true image-class labels for nodes at higher levels nor their dynamic connections are given. Therefore, configurations $\{(\hat{Z}^n, \hat{X}^n, \hat{R}'^n)\}$ must be estimated from the training images.

To this end, we propose an iterative learning procedure. In initialization, we first set $L = \log_2(|V^0|)$, where $|V^0|$ is equal to the size of a given image. The number of image classes $|M|$ is also assumed known. Next, due to a huge diversity of possible configurations of objects in images, for each node $i \in V^\ell$, we initialize $r_{ij}$ to be uniform over $i$'s candidate parents $\forall j \in \{0, V^{\ell+1}\}$. Then, for all pairs $(i,j) \in V^\ell \times V^{\ell+1}$ at level $\ell$, we set $\mathbf{d}_{ij} = \boldsymbol{\rho}(i) - \boldsymbol{\rho}(j)$; namely, the $\mathbf{d}_{ij}$ are initialized to the relative displacement of the centers of mass of the $i$-th and $j$-th dyadic squares in the corresponding quad-tree with $L$ levels, specified in Eq. (2.5).
For roots $i$, we have $\mathbf{d}_i = \boldsymbol{\rho}(i)$. Also, we set diagonal elements of $\Sigma_{ij}$ to the diagonal elements of a matrix $\mathbf{d}_{ij} \mathbf{d}_{ij}^T$. The number of components $G_k$ in a Gaussian mixture for each class $k$ is set to $G_k = 3$, which is empirically validated to be appropriate. Other parameters of the Gaussian mixture, $\theta$, are estimated by using the EM algorithm [52, 56] on the hand-labeled training images. Finally, conditional probability tables $P_{ij}^{kl}$ are initialized to be uniform over possible image classes.

Inference Algorithm
Assume that $V$, $L$, $M$, $\Theta$, $N$, $\varepsilon$, and $\varepsilon_\mu$ are given.
(1) Initialization: $t = 0$; $t_{in} = 0$; $(\forall i,j \in V)(\forall k,l \in M)$ $\xi_{ij}(0) = r_{ij}$; $Q_{ij}^{kl}(0) = P_{ij}^{kl}$; $\boldsymbol{\mu}_{ij}(0)$ = "node locations in the corresponding quad-tree"; diagonal elements of $\Omega_{ij}(0)$ are set to the area of dyadic squares in the corresponding quad-tree;
(2) repeat Outer Loop
(3)   $t = t + 1$;
(4)   Compute in bottom-up pass for $\ell = 0, 1, \ldots, L-1$, $\forall i \in V^\ell$, $\forall k \in M$: $\lambda_i^k(t)$ given by Eq. (2.14); $Q_{ij}^{kl}(t)$ given by Eq. (2.13);
(5)   Compute in top-down pass for $\ell = L-1, L-2, \ldots, 0$, $\forall i \in V^\ell$, $\forall k \in M$: $m_i^k(t)$ given by Eq. (2.15);
(6)   repeat Inner Loop
(7)     $t_{in} = t_{in} + 1$;
(8)     Compute $\forall i,j \in V'$: $\boldsymbol{\mu}_{ij}(t_{in})$ given by Eq. (2.16); $\Omega_{ij}(t_{in})$ given by Eq. (2.17);
(9)   until $|\boldsymbol{\mu}_{ij}(t_{in}) - \boldsymbol{\mu}_{ij}(t_{in}-1)| / \boldsymbol{\mu}_{ij}(t_{in}-1) < \varepsilon_\mu$;
(10)  Compute $\forall i,j \in V'$: $\xi_{ij}(t)$ given by Eq. (2.18);
(11) until $|Q(Z, X, R'; t) - Q(Z, X, R'; t-1)| / Q(Z, X, R'; t-1) < \varepsilon$, for $N$ consecutive iteration steps;
(12) Estimation of $\hat{Z}$: compute in bottom-up pass for $\ell = 0, 1, \ldots, L-1$,
     for IT$_{V^0}$: $(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t)$,
     for IT$_V$: $(\forall i \in V^\ell)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t)$;
(13) Estimation of $\hat{X}$: compute $(\forall i \in V)$ $\hat{x}_i = \arg\max_{k \in M} m_i^k(t)$;
(14) Estimation of $\hat{R}'$: compute $(\forall \ell > 0)(\forall i \in V^\ell)$ $\hat{\mathbf{r}}_i = \sum_{j \in \{0, V^{\ell+1}\}} \xi_{ij}(t) \boldsymbol{\mu}_{ij}(t)$;

Figure 2-4: Inference of the irregular tree given $Y$, $R^0$, and $\Theta$; $t$ and $t_{in}$ are counters in the outer and inner loops, respectively; $N$, $\varepsilon$, and $\varepsilon_\mu$ control the convergence criteria for the two loops.
After initialization of $\Theta$, we run an iterative learning procedure, where in step $t$ we conduct SVA inference of the irregular tree on the training images, as explained in the previous section. After inference of the posterior probability that class $k$ is assigned to node $i$, $m_i^k$, given by Eq. (2.15), and posterior connectivity probability, $\xi_{ij}$, given by Eq. (2.18), on all training images, $n = 1, \ldots, N$, we update only $P_{ij}^{kl}$ and $r_{ij}$ as

$$P_{ij}^{kl}(t+1) = \frac{1}{N} \sum_{n=1}^{N} m_i^{k,n}(t), \tag{2.28}$$

$$r_{ij}(t+1) = \frac{1}{N} \sum_{n=1}^{N} \xi_{ij}^n(t). \tag{2.29}$$

Other parameters in $\Theta(t+1) = \{r_{ij}(t+1), \mathbf{d}_{ij}, \Sigma_{ij}, P_{ij}^{kl}(t+1), \theta\}$ are fixed to their initial values. In the next iteration step, we use $\Theta(t+1)$ for SVA inference of the irregular tree on the training images. We assume that the learning algorithm converged when $|P_{ij}^{kl}(t+1) - P_{ij}^{kl}(t)| / P_{ij}^{kl}(t) < \varepsilon$, where $\varepsilon > 0$ is a pre-specified parameter.

2.6 Implementation Issues

In this section, we list algorithm-related details that are necessary for the experimental results, presented in Chapter 6, to be reproducible.

First, direct implementation of Eq. (2.13) would result in numerical underflow. Therefore, we introduce the following scaling procedure:

$$\tilde{\lambda}_i^k \triangleq \frac{\lambda_i^k}{S_i}, \quad \forall i \in V,\; \forall k \in M, \tag{2.30}$$

$$S_i \triangleq \sum_{k \in M} \lambda_i^k. \tag{2.31}$$

Substituting the scaled $\tilde{\lambda}$'s into Eq. (2.13), we obtain

$$Q_{ij}^{kl} = \frac{P_{ij}^{kl} \lambda_i^k}{\sum_{a \in M} P_{ij}^{al} \lambda_i^a} = \frac{P_{ij}^{kl} \tilde{\lambda}_i^k}{\sum_{a \in M} P_{ij}^{al} \tilde{\lambda}_i^a}. \tag{2.32}$$

In other words, computation of $Q_{ij}^{kl}$ does not change when the scaled $\tilde{\lambda}$'s are used.

Second, to reduce computational complexity, we consider, for each node $i$, only the 7×7 box encompassing parent nodes $j$ that neighbor the parent of the corresponding quad-tree. Consequently, the number of possible children nodes $c$ of $i$ is also limited. Our experiments show that the omitted nodes, either children or parents, contribute negligibly to the update equations. Thus, we limit overall computational cost as the number of nodes increases.


Finally, the convergence criterion of the inner loop, where $\boldsymbol{\mu}_{ij}$ and $\Omega_{ij}$ are computed, is controlled by parameter $\varepsilon_\mu$. When $\varepsilon_\mu = 0.01$, the average number of iteration steps, $t_{in}$, in the inner loop, is from 3 to 5, depending on the image size, where the latter is obtained for 128×128 images. The convergence criterion of the outer loop is controlled by parameters $N$ and $\varepsilon$. Simplifications that we use in practice may lead to sub-optimal solutions of SVA. From our experience, though, the algorithm recovers from unstable stationary points for sufficiently large $N$. In our experiments, we set $N = 10$ and $\varepsilon = 0.01$.

After the inference algorithm (Fig. 2-4) has converged, we estimate the values of hidden variables $(\hat{Z}, \hat{X}, \hat{R}')$ for a given image, thereby conducting image interpretation.
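The scaling procedure of Eqs. (2.30)-(2.32) in Section 2.6 can be checked numerically with the following minimal sketch. It treats one node $i$ and one fixed parent class $l$, so the conditional probability table reduces to a single column; the function names and the list-based layout are assumptions introduced for illustration.

```python
def scale(lam):
    """Eqs. (2.30)-(2.31): divide each lambda by their sum over classes."""
    total = sum(lam)
    return [value / total for value in lam]

def update_q(p_column, lam):
    """Eq. (2.32) for a fixed parent class l.

    p_column[k] plays the role of P^{kl} and lam[k] the role of lambda^k;
    the result is normalized over the child classes k.
    """
    norm = sum(p * value for p, value in zip(p_column, lam))
    return [p * value / norm for p, value in zip(p_column, lam)]
```

Because the normalization in `update_q` divides out any common factor of the $\lambda$'s, calling it with `scale(lam)` or with `lam` gives the same table, while the scaled values stay in a numerically safe range as messages are multiplied up the tree.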


CHAPTER 3
IRREGULAR TREES WITH FIXED NODE POSITIONS

In the previous chapter, two architectures of the irregular tree are presented, which are fully characterized by the following joint prior:

$$P(Z, X, R', Y \mid R^0 = \boldsymbol{\rho}^0) = P(Y \mid X, \boldsymbol{\rho})\, P(X \mid Z)\, P(Z, R' \mid R^0 = \boldsymbol{\rho}^0).$$

As discussed in Section 2.2, the inference of the posterior distribution $P(Z, X, R' \mid Y, R^0)$ is intractable, due to the complexity of the model. The node position variables, $R'$, are the main reason that approximate inference must be conducted. On the other hand, the $R'$ are very useful, because they constrain possible network configurations. In order to avoid approximate inference, in this chapter, we introduce yet another architecture of the irregular tree, where the $R'$ are eliminated, and where the constraints on the tree structure are directly modeled in the distribution of connectivity $Z$.

3.1 Model Specification

Similar to the model specification in the previous chapter, we introduce two architectures: one with observables only at the leaf level, and the other with observables propagated to higher levels. The main difference from the architectures IT$_V$ and IT$_{V^0}$ is that node positions are identical to those of the quad-tree. Therefore, we refer to the architectures presented in this chapter as irregular quad trees IQT$_V$ and IQT$_{V^0}$.

The irregular quad tree is a directed acyclic graph with nodes in set $V$, organized in hierarchical levels, $V^\ell$, $\ell = \{0, 1, \ldots, L\}$, where $V^0$ denotes the leaf level. The layout of nodes is identical to that of the quad-tree, modeling for example the dyadic pyramid of wavelet coefficients, such that the number of nodes at level $\ell$ can be computed as $|V^\ell| = |V^{\ell-1}|/4 = \ldots = |V^0|/4^\ell$. Unlike for position-encoding dynamic trees [48], we assume that nodes are fixed at locations of the corresponding quad-tree. Consequently, irregular model structure is achieved only through establishing arbitrary connections between nodes. Connections are established under the constraint that a node at level $\ell$ can become a root
or it can connect only to the nodes at the next level, $\ell+1$. The network connectivity is represented by a random matrix, $Z$, where entry $z_{ij}$ is an indicator random variable, such that $z_{ij} = 1$ if $i \in V^\ell$ and $j \in V^{\ell+1}$ are connected. $Z$ contains an additional zero ("root") column, where entries $z_{i0} = 1$ if $i$ is a root node. Each node can have only one parent, or can be a root. Note that due to the distribution over connections, after estimation of $Z$, for a given image, in IQT$_V$, some nodes may remain without children.

Each node $i$ is characterized by an image-class random variable, $x_i$, which can take values in a finite class set $C$. Given $Z$, the label $x_i$ of node $i$ is conditioned on $x_j$ of its parent $j$ as $P(x_i \mid x_j, z_{ij}=1)$. The joint probability of image-class variables $X = \{x_i\}$, $\forall i \in V$, is given by

$$P(X \mid Z) = \prod_{\ell=0}^{L} \prod_{i \in V^\ell} P(x_i \mid x_j, z_{ij}=1), \tag{3.1}$$

where for roots we use priors $P(x_i)$. We assume that the conditional probability tables $P(x_i \mid x_j, z_{ij}=1)$ are equal for all the nodes at all levels, as in [33]. Such a unique conditional probability table is denoted as $\Phi$.

Next, we assume that observables $\mathbf{y}_i$ are conditionally independent given the corresponding $x_i$:

$$P(Y \mid X) = \prod_{i \in V} P(\mathbf{y}_i \mid x_i), \tag{3.2}$$

$$P(\mathbf{y}_i \mid x_i = k) = \sum_{g=1}^{G} \pi_k(g)\, \mathcal{N}(\mathbf{y}_i; \nu_k(g), \Xi_k(g)), \tag{3.3}$$

where for IQT$_{V^0}$ instead of $V$ we write $V^0$ in Eq. (3.2). $P(\mathbf{y}_i \mid x_i = k)$, $k \in M$, is modeled as a mixture of Gaussians. The Gaussian-mixture parameters can be grouped in $\theta = \{\pi_k(g), \nu_k(g), \Xi_k(g), G_k\}$, $\forall k \in M$.

Finally, we specify the connectivity distribution. In the previous chapter, it is defined as the prior $P(Z) = \prod_{i,j \in V} P(z_{ij} = 1)$, and then the constraint on possible tree structures is imposed through introducing an additional set of random variables, namely, random node positions $R$. The main purpose of the $R$'s is to provide a mechanism by which connections between close nodes are favored. That approach has two major disadvantages. First, the additional $R$ variables render the exact inference of the dynamic
tree intractable, enforcing the use of approximate inference methods (variational approximation). Second, the decision whether nodes $i$ and $j$ should be connected is not informed by the actual values of $x_i$ and $x_j$. To improve upon the model formulation of the previous chapter, we seek to eliminate the $R$'s, and to incorporate the information on image-class labels and node positions in the connectivity distribution. We reason that connections between parents and children whose relative distance is small should be favored over those that are far apart. At the same time, we seek to establish a mechanism that groups nodes belonging to the same image class, and separates those assigned to different classes.

Let us first examine relative distances between nodes. Due to symmetry of the node layout (equal to that of the quad-tree), we divide the set of all candidate parents $j$ into classes of equidistance from child $i$, as depicted in Fig. 3-1. We specify that relative distances can take integer values $d_{ij} = \{0, 1, 2, \ldots, d_i^{\max}\}$, where if $i$ is a root $d_{i0} \triangleq 0$. Note that $d_i^{\max}$ values vary for different positions of $i$ at one level, as well as for different levels to which $i$ belongs.

Given $X$, we specify the conditional connectivity distribution as

$$P(Z \mid X) = \prod_{\ell=0}^{L} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} P(z_{ij} = 1 \mid x_i, x_j), \tag{3.4}$$

$$P(z_{ij} = 1 \mid x_i, x_j) = \begin{cases} \kappa\, p_i, & i \text{ is a root}, \\ \kappa\, p_i (1 - p_i)^{d_{ij}}, & \text{if } x_i = x_j, \\ \kappa\, p_i (1 - p_i)^{d_i^{\max} - d_{ij}}, & \text{if } x_i \neq x_j, \end{cases} \tag{3.5}$$

subject to

$$\sum_{j \in \{0, V^{\ell+1}\}} P(z_{ij} = 1 \mid x_i, x_j) = 1, \tag{3.6}$$

where $\kappa$ is a normalizing constant, and $p_i$ is the parameter of the geometric distribution. From Eq. (3.5), we observe that when $x_i = x_j$, $P(z_{ij} = 1 \mid x_i, x_j)$ decreases as $d_{ij}$ becomes larger, while when $x_i \neq x_j$, $P(z_{ij} = 1 \mid x_i, x_j)$ increases for greater distances $d_{ij}$. Hence, the form of $P(z_{ij} = 1 \mid x_i, x_j)$, given by Eq. (3.5), satisfies the aforementioned desirable properties. To avoid overfitting, we assume that $p_i$ is equal for all nodes $i$ at the same level. The parameters of $P(Z \mid X)$ can be grouped in the parameter set $\Psi = \{p_i\}$, $\forall i \in V$.
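The behavior of Eq. (3.5) under the normalization of Eq. (3.6) can be illustrated with a small sketch. The function name and argument layout are assumptions made here: for one child, `same_class[n]` states whether candidate parent `n` carries the same label as the child, `dists[n]` is its integer relative distance, and the returned list holds the root probability at index 0 followed by the candidate-parent probabilities.

```python
def connection_probs(p, same_class, dists, d_max):
    """Normalized connection probabilities of one child, after Eqs. (3.5)-(3.6).

    Index 0 of the result is the probability of becoming a root; the
    remaining entries follow the order of the candidate parents.
    """
    weights = [p]  # root case: proportional to p
    for same, dist in zip(same_class, dists):
        exponent = dist if same else d_max - dist
        weights.append(p * (1.0 - p) ** exponent)
    total = sum(weights)  # dividing by the total plays the role of kappa
    return [w / total for w in weights]
```

With `p = 0.5` and two same-class candidates at distances 0 and 2, the nearer candidate receives the larger probability; if both candidates carry a different label from the child, the ordering is reversed, which is the grouping and separating effect described above.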


Figure 3-1: Classes of candidate parents $j$ that are characterized by a unique relative distance $d_{ij}$ from child $i$.

The introduced parameters of the model can be grouped in the parameter set $\Theta = \{\Phi, \theta, \Psi\}$. In the next section we explain how to infer the "best" configuration of $Z$ and $X$ from the observed image data $Y$, provided that $\Theta$ is known.

3.2 Inference of the Irregular Tree with Fixed Node Positions

The standard Bayesian formulation of the inference problem consists in minimizing the expectation of some cost function $\mathcal{C}$, given the data

$$(\hat{Z}, \hat{X}) = \arg\min_{Z, X} E\{\mathcal{C}((Z, X), (Z', X')) \mid Y, \Theta\}, \tag{3.7}$$

where $\mathcal{C}$ penalizes the discrepancy between the estimated configuration $(Z, X)$ and the true one $(Z', X')$. We propose the following cost function:

$$\mathcal{C}((Z, X), (Z', X')) = \mathcal{C}(X, X') + \mathcal{C}(Z, Z'), \tag{3.8}$$

$$= \sum_{\ell=0}^{L-1} \sum_{i \in V^\ell} [1 - \delta(x_i - x_i')] + \sum_{\ell=0}^{L-1} \sum_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [1 - \delta(z_{ij} - z_{ij}')], \tag{3.9}$$

where $'$ stands for true values, and $\delta(\cdot)$ is the Kronecker delta function. From Eq. (3.9), the resulting Bayesian estimator of $X$ is

$$\forall i \in V, \quad \hat{x}_i = \arg\max_{x_i \in C} P(x_i \mid Z, Y). \tag{3.10}$$

Next, given the constraints on connections in the irregular tree, we derive that minimizing $E\{\mathcal{C}(Z, Z') \mid Y, \Theta\}$ is equivalent to finding a set of optimal parents $\hat{j}$ such that

$$(\forall \ell)(\forall i \in V^\ell)(Z_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij} \mid x_i, x_j), \quad \text{for IQT}_{V^0}, \tag{3.11a}$$

$$(\forall \ell)(\forall i \in V^\ell) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij} \mid x_i, x_j), \quad \text{for IQT}_{V}, \tag{3.11b}$$


where $Z_{\cdot i}$ is the $i$-th column of $Z$, and $Z_{\cdot i} \neq 0$ represents the event "node $i$ has children", that is, "node $i$ is included in the irregular-tree structure." The global solution to Eq. (3.11a) is an open problem in many research areas. We propose a stage-wise optimization, where, as we move upwards, starting from the leaf level $\ell = \{0, 1, \ldots, L\}$, we include in the tree structure optimal parents at $V^{\ell+1}$ according to

$$(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij} = 1 \mid x_i, x_j), \tag{3.12}$$

where $\hat{Z}_{\cdot i} \neq 0$ denotes an estimate that $i$ has already been included in the tree structure when optimizing the previous level $V^\ell$.

By using the results in Eqs. (3.10) and (3.12), we specify the inference algorithm for the irregular quad tree, which is summarized in Fig. 3-2. In a recursive step $t$, we first assume that estimate $Z(t-1)$ of the previous step $t-1$ is known and then derive estimate $X(t)$ using Eq. (3.10); then, substituting $X(t)$ in Eq. (3.12) we derive estimate $Z(t)$. We consider the algorithm converged if $P(Y, X \mid Z)$ does not vary more than some threshold $\varepsilon$ for $N$ consecutive iteration steps $t$. In our experiments, we set $\varepsilon = 0.01$ and $N = 10$.

Steps 2 and 6 in the algorithm can be interpreted as inference of $\hat{X}$ given $Y$ for a fixed-structure tree. In particular, for Step 2, where the initial structure is the quad-tree, we can use the standard inference on quad-trees, where, essentially, belief messages are propagated in only two sweeps up and down the tree [33, 29, 31]. For Step 6, the irregular tree represents a forest of subtrees, which also have fixed, though irregular, structure; therefore, we can use the very same tree-inference algorithm for each of the subtrees. For completeness, in Appendix B, we present the two-pass maximum posterior marginal estimation of $X$ proposed by Laferte et al. [33].

3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

Analogous to the learning algorithm discussed in the previous chapter, the parameters of the irregular tree with fixed node positions can be learned by using the standard ML optimization. Here, we assume that $N$ independently generated training images, with observables $\{Y^n\}$, $n = 1, \ldots, N$, are given. As explained before, configurations of latent variables $\{(Z^n, X^n)\}$ must be estimated.


Inference Algorithm
  (1) $t = 0$; initialize irregular-tree structure $Z(0)$ to quad-tree;
  (2) Compute $\forall i \in V$, $x_i(0) = \arg\max_{x_i \in C} P(x_i \mid Z(0), Y)$;
  (3) repeat
  (4)   $t = t + 1$;
  (5)   Compute in bottom-up pass for $\ell = 0, 1, \ldots, L$
          for IQT$_{V^0}$: $(\forall i \in V^{\ell})(\hat{Z}_i \neq 0)$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij} = 1 \mid x_i, x_j)$;
          for IQT$_{V}$: $(\forall i \in V^{\ell})$ $\hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} P(z_{ij} = 1 \mid x_i, x_j)$;
  (6)   Compute $\forall i \in V$, $x_i(t) = \arg\max_{x_i \in C} P(x_i \mid Z(t), Y)$;
  (7)   $\hat{X} = X(t)$; $\hat{Z} = Z(t)$;
  (8) until $\left| \dfrac{P(Y, \hat{X} \mid \hat{Z}) - P(Y, X(t-1) \mid Z(t-1))}{P(Y, X(t-1) \mid Z(t-1))} \right| < \varepsilon$ for $N$ consecutive iteration steps.

Figure 3-2: Inference of the irregular tree with fixed node positions, given observables $Y$ and the model parameters.

To this end, we propose an iterative learning procedure, where in step $t$ we first assume that $\Theta(t) = \{\theta(t), \Omega(t), \Phi(t)\}$ is given and then conduct inference for each training image, $n = 1, \ldots, N$,

$$(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z, X} E\{ \mathcal{C}((Z, X), (Z', X')) \mid Y^n, \Theta(t) \},$$

as explained in Section 3.2. Once the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$ are found, we apply standard ML optimization to compute $\Theta(t+1)$.

More specifically, suppose that, in the learning step $t$, realizations of random variables $(Y^n, \hat{X}^n, \hat{Z}^n)$ are given for $n = 1, \ldots, N$. Then the parameters of the Gaussian-mixture distributions, in step $t+1$, can be computed using the standard EM algorithm [56]:

$$P(\pi_c(g) \mid y_i, x_i = c) = \frac{P(y_i \mid x_i = c, g)\, \pi_c(g)}{\sum_{g'=1}^{G_c} P(y_i \mid x_i = c, g')\, \pi_c(g')}, \tag{3.13}$$

$$\hat{\pi}_c(g) = \frac{1}{n_c} \sum_{i=1}^{n_c} P(\pi_c(g) \mid y_i, \hat{x}_i = c), \tag{3.14}$$

$$\hat{\mu}_c(g) = \frac{\sum_{i=1}^{n_c} y_i\, P(\pi_c(g) \mid y_i, \hat{x}_i = c)}{\sum_{i=1}^{n_c} P(\pi_c(g) \mid y_i, \hat{x}_i = c)}, \tag{3.15}$$

$$\hat{\Sigma}_c(g) = \frac{\sum_{i=1}^{n_c} (y_i - \hat{\mu}_c(g))(y_i - \hat{\mu}_c(g))^T\, P(\pi_c(g) \mid y_i, \hat{x}_i = c)}{\sum_{i=1}^{n_c} P(\pi_c(g) \mid y_i, \hat{x}_i = c)}, \tag{3.16}$$

where $\pi_c(g)$, $\mu_c(g)$ and $\Sigma_c(g)$ are the weight, mean and covariance of the $g$-th Gaussian component of class $c$, $P(y_i \mid x_i = c, g)$ is the likelihood of $y_i$ under that component, and $n_c$ is the total number of nodes over the $N$ training images that are classified as class $c$. To compute $P(\pi_c(g) \mid y_i, x_i = c)$ in Eq. (3.13), we use the Gaussian-mixture parameters from the previous learning step $t$. For all classes we set $G_c = 3$.
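As an illustration of the update in Eqs. (3.13)-(3.16), the following is a minimal sketch of one EM step for the mixture of a single class. It assumes scalar observations for brevity, so the covariance reduces to a variance; the function names and the variance floor are illustrative additions, not part of the original procedure.

```python
import math


def gaussian_pdf(y, mean, var):
    """Density of a scalar Gaussian with the given mean and variance."""
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def em_step(ys, weights, means, variances):
    """One EM update for the scalar Gaussian mixture of one class c.

    ys        observations of all nodes currently labeled as class c
    weights   mixture weights from the previous learning step
    means     component means from the previous learning step
    variances component variances from the previous learning step
    """
    n_components = len(weights)

    # E-step (cf. Eq. 3.13): responsibility of each component for each y_i,
    # evaluated with the parameters of the previous step.
    resp = []
    for y in ys:
        joint = [weights[g] * gaussian_pdf(y, means[g], variances[g])
                 for g in range(n_components)]
        total = sum(joint)
        resp.append([j / total for j in joint])

    # M-step (cf. Eqs. 3.14-3.16): responsibility-weighted averages.
    n_c = len(ys)
    new_weights, new_means, new_variances = [], [], []
    for g in range(n_components):
        mass = sum(r[g] for r in resp)
        mean = sum(r[g] * y for r, y in zip(resp, ys)) / mass
        var = sum(r[g] * (y - mean) ** 2 for r, y in zip(resp, ys)) / mass
        new_weights.append(mass / n_c)
        new_means.append(mean)
        new_variances.append(max(var, 1e-6))  # variance floor: an added safeguard
    return new_weights, new_means, new_variances
```

For vector-valued observations the same structure applies, with the squared deviation replaced by the outer product in Eq. (3.16).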


Next, we explain how to learn the parameters of the connectivity distribution, $\Omega(t+1) = \{p_i(t+1)\}_{i \in V}$, by using the ML principle:

$$\Omega(t+1) = \arg\max_{\Omega} \prod_{n=1}^{N} P(\hat{Z}^n \mid \hat{X}^n, \Omega). \tag{3.17}$$

Here, we consider two cases, for the IQT$_{V}$ and IQT$_{V^0}$ models. Recall that parameters $p_i$ are equal for all nodes $i$ at the same level $\ell$. Given the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$, for each training image $n = 1, \ldots, N$, from Eqs. (3.5) and (3.17), we derive for IQT$_{V}$:

$$\hat{p}(\ell) = \frac{N\, |V^{\ell}|}{\sum_{n=1}^{N} \sum_{i \in V^{\ell}} \left[ 1 + I(\hat{x}_i^n = \hat{x}_j^n)\, d_{ij}^n + I(\hat{x}_i^n \neq \hat{x}_j^n)\,(d_i^{\max} - d_{ij}^n) \right]}, \tag{3.18}$$

where $I(\cdot)$ is an indicator function, $j$ is the estimated parent of node $i$, and $d_{ij}^n$ denotes the relative distance assigned to the estimated connection $\hat{z}_{ij}^n = 1$.

For IQT$_{V^0}$, given the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$, for each training image $n = 1, \ldots, N$, we analyze only the set of nodes $i \in V^{\ell}$ included in the corresponding irregular tree, i.e., $\hat{Z}_i^n \neq 0$. Thus, from Eqs. (3.5) and (3.17), we derive:

$$\hat{p}_i(\ell) = \frac{\sum_{n=1}^{N} \sum_{i \in V^{\ell}} I(\hat{Z}_i^n \neq 0)}{\sum_{n=1}^{N} \sum_{i \in V^{\ell}} I(\hat{Z}_i^n \neq 0) \left[ 1 + I(\hat{x}_i^n = \hat{x}_j^n)\, d_{ij}^n + I(\hat{x}_i^n \neq \hat{x}_j^n)\,(d_i^{\max} - d_{ij}^n) \right]}, \tag{3.19}$$

with $I(\cdot)$, $j$ and $d_{ij}^n$ defined as in Eq. (3.18).

Finally, to learn the conditional probability table, $\Phi$, we use the standard EM algorithm on fixed-structure trees, thoroughly discussed in [33]. Note that to obtain the estimates $\{(\hat{Z}^n, \hat{X}^n)\}$, for each training image $n = 1, \ldots, N$, in the learning step $t$, we in fact have to conduct the MPM estimation, given in Appendix B in Fig. B. By using the already available $P(x_i, x_j \mid Y^n_{d(i)}, \hat{z}^n_{ij} = 1)$ and $P(x_i \mid Y^n_{d(i)})$, obtained for each image $n$ as in Fig. B, we derive

$$\hat{\Phi} = \frac{1}{N} \sum_{n=1}^{N} \frac{\sum_{i \in V} P(x_i, x_j \mid Y^n_{d(i)}, \hat{z}^n_{ij} = 1)}{\sum_{i \in V} P(x_j \mid Y^n_{d(i)})}. \tag{3.20}$$

The overall learning procedure is summarized in Fig. 3-3.


Learning Algorithm
  (1) $t = 0$; initialize $\Theta(0) = \{\theta(0), \Omega(0), \Phi(0)\}$;
  (2) Estimate for $n = 1, \ldots, N$: $(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z, X} E\{ \mathcal{C}((Z, X), (Z', X')) \mid Y^n, \Theta(0) \}$;
  (3) repeat
  (4)   $t = t + 1$;
  (5)   Compute:
          $\theta(t)$ as in Eqs. (3.13)-(3.16);
          $p(\ell; t)$, for IQT$_{V}$ as in Eq. (3.18); for IQT$_{V^0}$ as in Eq. (3.19);
          $\Phi(t)$, as in Eq. (3.20);
  (6)   Estimate for $n = 1, \ldots, N$: $(\hat{Z}^n, \hat{X}^n) = \arg\min_{Z, X} E\{ \mathcal{C}((Z, X), (Z', X')) \mid Y^n, \Theta(t) \}$, using the inference algorithm in Fig. 3-2;
  (7)   $\Theta^{*} = \Theta(t)$;
  (8) until $(\forall n)$ $\left| \dfrac{P(Y^n, \hat{X}^n \mid \hat{Z}^n, \Theta^{*}) - P(Y^n, \hat{X}^n \mid \hat{Z}^n, \Theta(t-1))}{P(Y^n, \hat{X}^n \mid \hat{Z}^n, \Theta(t-1))} \right| < \varepsilon$ for $N$ consecutive iteration steps.

Figure 3-3: Algorithm for learning the parameters of the irregular tree; for notational simplicity, in Step (8) we do not indicate the different estimates of $(\hat{Z}^n, \hat{X}^n)$ for $\Theta^{*}$ and $\Theta(t-1)$.

Once $\Theta^{*}$ is learned, we can localize, detect and recognize objects in the image by conducting the inference algorithm presented in Fig. 3-2.


CHAPTER 4
COGNITIVE ANALYSIS OF OBJECT PARTS

Inference of hidden variables $(\hat{Z}, \hat{X})$ can be viewed as building a forest of subtrees, each segmenting an image into arbitrary (not necessarily contiguous) regions, which we interpret as objects. Since each root determines a subtree, whose leaf nodes form a detected object, we assign physical meaning to roots by assuming they represent whole objects. Moreover, each descendant of the root can be viewed as the root of another subtree, whose leaf nodes cover only a part of the object. Hence, we say that roots' descendants represent object parts at various scales.

Strategies for recognizing detected objects naturally arise from a particular interpretation of the tree/sub-tree structure. Below, we make a distinction between two such strategies. The analysis of image regions under the roots leads to the whole-object recognition strategy, while the analysis of image regions determined by roots' descendants constitutes the object-part recognition strategy. For both approaches, final recognition is conducted by majority voting over MAP labels, $\hat{x}_i$, of leaf nodes.^1

The reason for analyzing smaller image regions than those under the roots stems from our hypothesis that the information of fine-scale object details may prove critical for the recognition of an object as a whole in scenes with occlusions. To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole.

^1 The literature offers various strategies that outperform majority-voting classification (e.g., multiscale Bayesian classification [29], and multiscale Viterbi classification [32]); however, they do not account explicitly for occlusions, and, as such, do not significantly outperform majority voting for scenes with occluded objects.


4.1 Measuring Significance of Object Parts

We hypothesize that the significance of object parts with respect to object recognition depends on both local, innate object properties and global scene properties. While innate properties represent characteristic object features, which differentiate one object from another, global scene properties describe interdependencies of object parts in the overall image composition. It is necessary to account for both local and global cues, as the most conspicuous object component need not necessarily be the most significant for that object's recognition in the presence of alike objects.

The analysis of innate object properties is handled through inference of the irregular tree, where, for a given image, we compute $P(x_i \mid \hat{Z}, Y)$, $\forall i \in V$, as explained in Chapters 2 and 3. To account for the influence of global scene properties, for each node $i$, we compute Shannon's entropy over the set of image classes, $M$, as

$$(\forall i \in V)(\hat{Z}_i \neq 0) \quad H_i = -\sum_{x_i \in M} P(x_i \mid \hat{Z}, Y) \log P(x_i \mid \hat{Z}, Y). \tag{4.1}$$

Since node $i$ represents an object part, we define $H_i$ as a measure of significance of that object part. Note that a node with small entropy is characterized by a "peaky" distribution $P(x_i \mid \hat{Z}, Y)$ with the maximum, say, at $x_i = k \in M$. This indicates that the error of classification will be small when $i$ is labeled as class $k$. Recall that during inference, the belief message of $i$ is propagated down the subtree in belief propagation [33], which is likely to render $i$'s descendants with small entropies, as well. Thus, the classification error of the whole region of leaf nodes under $i$ is likely to be small, when compared to some other image region under, say, node $j$ such that $H_j > H_i$. Consequently, $i$ is more "significant" for recognition of class $k$ than node $j$. In brief, the most significant object part has the smallest entropy over all nodes in a given sub-tree $\mathcal{T}$:

$$i^{*} = \arg\min_{i \in \mathcal{T}} H_i. \tag{4.2}$$

In Figs. 4-1 and 4-2, we illustrate the most significant object part under each root, where entropy is computed over seven and six image classes, shown in Figs. 4-1 (top) and 4-2 (top), respectively. The experiment is conducted as explained in Chapter 2, using the irregular tree with random node positions, and observables at all levels (IT$_{V}$). Details on computing observables $Y$ in this experiment are explained in Chapter 5. Note that for different scenes different object parts are established as the most significant with respect to the entropy measure.

Figure 4-1: For each subtree of IT$_{V}$, representing an object in the 128 x 128 image, a node $i^{*}$ is found with the smallest entropy for $|M| = 6 + 1 = 7$ possible image classes (top row). Bright pixels are descendants of $i^{*}$ at the leaf level and indicate the object part represented by $i^{*}$.

4.2 Combining Object-Part Recognition Results

Once nodes are ranked with respect to the entropy measure, we are in a position to devise a criterion to optimally combine this information toward ultimate object recognition. Herewith, we propose a simple greedy algorithm, which, nonetheless, shows marked improvements in performance over the whole-object recognition approach.

Under each root, we first select the descendant node with the smallest entropy. Each selected node determines a subtree, whose leaf nodes form an object part. Then, we conduct majority voting over these selected image regions. In the second round, we select under each root the descendant node with the smallest entropy, such that it does not belong to any of the subtrees selected in the first round. Now, these nodes determine new subtrees, whose leaf nodes form object parts that do not overlap with the image regions selected in the first round. Then, we conduct majority voting over the newly selected image regions. This procedure is repeated until we exhaustively cover all the pixels in the image. This stage-wise majority voting over non-overlapping image regions constitutes the final step in the object-part recognition strategy (see Fig. 1-3).

Figure 4-2: For each subtree of IT$_{V}$, representing an object in the 256 x 256 image, a node $i^{*}$ is found with the smallest entropy for $|M| = 5 + 1 = 6$ possible image classes (top row). Bright pixels are descendants of $i^{*}$ at the leaf level and indicate the object part represented by $i^{*}$; the images represent the same scene viewed from three different angles; the most significant object parts differ over various scenes.
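The entropy ranking and the round-by-round selection of non-overlapping object parts can be sketched as below. This is a hedged illustration, not the original implementation: the tree is assumed to be given as a dictionary `children` mapping a node to its list of children, `posteriors` maps a node to its class posterior, and all function names are hypothetical. How the votes of successive rounds are combined is not specified here, so the sketch only returns the selected nodes and their leaf regions.

```python
import math


def entropy(posterior):
    """Shannon entropy (natural logarithm) of a discrete class posterior."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)


def subtree(children, node):
    """All nodes of the subtree rooted at node, the node itself included."""
    nodes, stack = [], [node]
    while stack:
        n = stack.pop()
        nodes.append(n)
        stack.extend(children.get(n, ()))
    return nodes


def leaves(children, node):
    """Leaf nodes (pixels) that descend from node."""
    return {n for n in subtree(children, node) if not children.get(n)}


def staged_parts(children, root, posteriors):
    """Per round, pick the lowest-entropy descendant of root whose leaf
    region does not overlap any region selected in earlier rounds."""
    remaining = leaves(children, root)
    candidates = [n for n in subtree(children, root) if n != root]
    rounds = []
    while remaining:
        free = [n for n in candidates if leaves(children, n) <= remaining]
        if not free:          # e.g. the root has no descendants
            break
        best = min(free, key=lambda n: entropy(posteriors[n]))
        region = leaves(children, best)
        rounds.append((best, region))
        remaining -= region
    return rounds
```

Because every uncovered leaf is itself a candidate, each round covers at least one new pixel, so the loop ends once all leaves under the root are covered.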


CHAPTER 5
FEATURE EXTRACTION

In Chapters 2 and 3, we have introduced four architectures of the irregular tree, referred to as IT$_{V}$, IT$_{V^0}$, IQT$_{V}$, and IQT$_{V^0}$. To compute the observable (feature) random vectors $Y$ for these models, we account for both color and texture cues.

5.1 Texture

For the choice of texture-based features, we have considered several filtering, model-based and statistical methods for texture feature extraction. Our conclusion complies with the comparative study of Randen and Husoy [66] that for problems with many textures with subtle spectral differences, as in the case of our complex classes, it is reasonable to assume that the spectral decomposition by a filter bank yields consistently superior results over other texture analysis methods. Our experimental results also suggest that it is crucial to analyze both local and regional properties of texture. As such, we employ the wavelet transform, due to its inherent representation of texture at different scales and locations.

5.1.1 Wavelet Transform

Wavelet atom functions, being well localized both in space and frequency, retrieve texture information quite successfully [67]. The conventional discrete wavelet transform (DWT) may be regarded as equivalent to filtering the input signal with a bank of bandpass filters, whose impulse responses are all given by scaled versions of a mother wavelet. The scaling factor between adjacent filters is 2:1, leading to octave bandwidths and center frequencies that are one octave apart. The octave-band DWT is most efficiently implemented by the dyadic wavelet decomposition tree of Mallat [68], where wavelet coefficients of an image are obtained by convolving every row and column with impulse responses of lowpass and highpass filters, as shown in Figure 5-1. Practically, coefficients of one scale are obtained by convolving every second row and column from the previous finer scale. Thus, the filter output is a wavelet subimage that has four times fewer coefficients than the one at the previous scale. The lowpass filter is denoted with $H_0$ and the highpass filter with $H_1$. The wavelet coefficients $W$ carry the index L for lowpass output and H for highpass output.

Figure 5-1: Two levels of the DWT of a two-dimensional signal.

Figure 5-2: The original image (left) and its two-scale dyadic DWT (right).

Separable filtering of rows and columns produces four subimages at each level, which can be arranged as shown in Figure 5-2. The same figure also illustrates well the directional selectivity of the DWT, because the $W_{LH}$, $W_{HL}$, and $W_{HH}$ bandpass subimages can select horizontal, vertical and diagonal edges, respectively.
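To make the separable row and column filtering concrete, here is a minimal sketch of one decomposition level. It uses the two-tap Haar filter pair purely as a stand-in for the lowpass and highpass filters (an assumption made for brevity, not the filters used in this work), and it labels each subimage by the row filter followed by the column filter, which is likewise only a naming assumption.

```python
import math


def analysis_1d(signal):
    """One Haar analysis step: lowpass and highpass outputs,
    each downsampled by two (signal length assumed even)."""
    s = 1.0 / math.sqrt(2.0)
    low = [s * (signal[k] + signal[k + 1]) for k in range(0, len(signal) - 1, 2)]
    high = [s * (signal[k] - signal[k + 1]) for k in range(0, len(signal) - 1, 2)]
    return low, high


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


def dwt2_level(image):
    """Filter the rows, then the columns, of a square image with even side.

    Returns four subimages keyed by (row filter, column filter):
    'LL', 'LH', 'HL', 'HH'. Each has a quarter of the input coefficients.
    """
    rows_low, rows_high = zip(*(analysis_1d(row) for row in image))
    bands = {}
    for row_tag, half in (("L", rows_low), ("H", rows_high)):
        cols_low, cols_high = zip(*(analysis_1d(col) for col in transpose(half)))
        bands[row_tag + "L"] = transpose(cols_low)
        bands[row_tag + "H"] = transpose(cols_high)
    return bands
```

A second level is obtained by applying `dwt2_level` again to the `'LL'` subimage, which mirrors the dyadic decomposition tree described above.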


5.1.2 Wavelet Properties

The following properties of the DWT have made wavelet-based image processing very attractive in recent years [67, 30, 69]:

1. locality: each wavelet coefficient represents local image content in space and frequency, because wavelets are well localized simultaneously in space and frequency
2. multi-resolution: DWT represents an image at different scales of resolution in space domain (i.e., in frequency domain); regions of analysis at one scale are divided up into four smaller regions at the next finer scale (Fig. 5-2)
3. edge detector: edges of an image are represented by large wavelet coefficients at the corresponding locations
4. energy compression: wavelet coefficients are large only if edges are present within the support of the wavelet, which means that the majority of wavelet coefficients have small values
5. decorrelation: wavelet coefficients are approximately decorrelated, since the scaled and shifted wavelets form an orthonormal basis; dependencies among wavelet coefficients are predominantly local
6. clustering: if a particular wavelet coefficient is large/small, then the adjacent coefficients are very likely to also be large/small
7. persistence: large/small values of wavelet coefficients tend to propagate through scales
8. non-Gaussian marginal: wavelet coefficients have peaky and long-tailed marginal distributions; due to the energy compression property only a few wavelet coefficients have large values, therefore a Gaussian distribution for an individual coefficient is a poor statistical model

It is also important to introduce shortcomings of the DWT. Discrete wavelet decompositions suffer from two main problems, which hamper their use for many applications, as follows [70]:

1. lack of shift invariance: small shifts in the input signal can cause major variations in the energy distribution of wavelet coefficients
2. poor directional selectivity: for some applications horizontal, vertical and diagonal selectivity is insufficient

When we analyze the Fourier spectrum of a signal, we expect the energy in each frequency bin to be invariant to any shifts of the input. Unfortunately, the DWT has a significant drawback that the energy distribution between various wavelet scales depends critically on the position of key features of the input signal, whereas ideally dependence is on just the features themselves. Therefore, the real DWT is unlikely to give consistent results when used in texture analysis.

In the literature, there are several approaches proposed to overcome this problem (e.g., Discrete Wavelet Frames [67, 71]), all increasing computational load with inevitable redundancy in the wavelet domain. In our opinion, the Complex Wavelet Transform (CWT) offers the best solution, providing additional advantages, described in the following subsection.

5.1.3 Complex Wavelet Transform

The structure of the CWT is the same as in Figure 5-1, except that the CWT filters have complex coefficients and generate complex output. The output sampling rates are unchanged from the DWT, but each wavelet coefficient contains a real and an imaginary part; thus a redundancy of 2:1 for one-dimensional signals is introduced. In our case, for two-dimensional signals, the redundancy becomes 4:1, because two adjacent quadrants of the spectrum are required to represent fully a real two-dimensional signal, adding an extra 2:1 factor. This is achieved by additional filtering with complex conjugates of either the row or column filters [70].

Despite its higher computational cost, we prefer the CWT over the DWT because of the CWT's following attractive properties. The CWT is shown to possess approximate shift and rotational invariance, given suitably designed biorthogonal or orthogonal wavelet filters. We implement the Q-shift Dual-Tree CWT scheme, proposed by Kingsbury [72], as depicted in Figure 5-3. For clarity, the figure shows the CWT of only a one-dimensional signal $x$. The output of the trees $a$ and $b$ can be viewed as real and imaginary parts of complex wavelet coefficients, respectively. Thus, to compute the CWT, we implement two real DWTs (see Fig. 5-1), obtaining a wavelet frame with redundancy two. As for the DWT, here, lowpass and highpass filters are denoted with 0 and 1 in the index, respectively. Level 0 comprises odd-length filters $H_{0a}(z) = H_{0b}(z) = H_{13}(z)$ (13 taps) and $H_{1a}(z) = H_{1b}(z) = H_{19}(z)$ (19 taps). Levels above level 0 consist of even-length filters $H_{00a}(z) = z^{-1} H_6(z^{-1})$, $H_{01a}(z) = H_6(-z)$, $H_{00b}(z) = H_6(z)$, $H_{01b}(z) = z^{-1} H_6(-z^{-1})$, where the impulse responses of the filters $H_{13}$, $H_{19}$ and $H_6$ are given in Table 5-1.

Figure 5-3: The Q-shift Dual-Tree CWT.

Table 5-1: Coefficients of the filters used in the Q-shift DTCWT

  H13 (symmetric)   H19 (symmetric)   H6
  -0.0017581        -0.0000706         0.03616384
   0                 0                 0
   0.0222656         0.0013419        -0.08832942
  -0.0468750        -0.0018834         0.23389032
  -0.0482422        -0.0071568         0.76027237
   0.2968750         0.0238560         0.58751830
   0.5554688         0.0556431         0
   0.2968750        -0.0516881        -0.11430184
  -0.0482422        -0.2997576         0
   ...               0.5594308         0
                    -0.2997576
                     ...

Figure 5-4: The CWT is strongly oriented at angles $\pm 15^{\circ}$, $\pm 45^{\circ}$, $\pm 75^{\circ}$.


Aside from being shift invariant, the CWT is superior to the DWT in terms of directional selectivity, too. A two-dimensional CWT produces six bandpass subimages (analogous to the three subimages in the DWT) of complex coefficients at each level, which are strongly oriented at angles of $\pm 15^{\circ}$, $\pm 45^{\circ}$, $\pm 75^{\circ}$, as illustrated in Figure 5-4.

Another advantageous property of the CWT manifests itself in the presence of noise. The phase and magnitude of the complex wavelet coefficients collaborate in a nontrivial way to describe data [70]. The phase encodes the coherent (in space and scale) structure of an image, which is resilient to noise, and the magnitude captures the strength of local information that could be very susceptible to noise corruption. Hence, the phase of complex wavelet coefficients might be used as a principal clue for image denoising. However, our experimental results have shown that phase is not a good feature choice for sky/ground modeling. Therefore, we consider only magnitudes.

In summary, for texture analysis in IT$_{V}$ and IQT$_{V}$, we choose the complex wavelet transform (CWT) applied to the intensity (gray-scale) image, due to its shift-invariant representation of texture at different scales, orientations and locations.

5.1.4 Difference-of-Gaussian Texture Extraction

In IT$_{V^0}$ and IQT$_{V^0}$, observables are present only at the leaf level. Therefore, for these models, multiscale texture extraction is superfluous. Here, we compute the difference-of-Gaussian function convolved with the image as

$$D(x, y, k, \sigma) = (G(x, y, k\sigma) - G(x, y, \sigma)) * I(x, y), \tag{5.1}$$

where $x$ and $y$ represent pixel coordinates, $G(x, y, \sigma) \triangleq \exp(-(x^2 + y^2)/2\sigma^2)/2\pi\sigma^2$, and $I(x, y)$ is the intensity image. In addition to reduced computational complexity, as compared to the CWT, the function $D$ provides a close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2 \nabla^2 G$, which has been shown to produce the most stable image features across scales when compared to a range of other possible image functions, such as the gradient and the Hessian [73, 74]. We compute $D(x, y, k, \sigma)$ for three scales $k = \sqrt{2}, 2, \sqrt{8}$ and $\sigma = 2$.
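Eq. (5.1) can be illustrated with the short sketch below, written in plain Python so that it stays self-contained. Several choices are assumptions rather than part of the text: the kernels are truncated at three standard deviations and renormalized to sum to one, image borders are replicated, and the function names are hypothetical.

```python
import math


def gaussian_kernel(sigma):
    """Sampled 2-D Gaussian G(x, y, sigma), truncated at 3 sigma and
    renormalized so that its samples sum to one (an added assumption)."""
    radius = int(math.ceil(3.0 * sigma))
    kernel = [[math.exp(-(x * x + y * y) / (2.0 * sigma ** 2))
               / (2.0 * math.pi * sigma ** 2)
               for x in range(-radius, radius + 1)]
              for y in range(-radius, radius + 1)]
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]


def convolve(image, kernel):
    """2-D convolution with replicated borders. The Gaussian kernel is
    symmetric, so flipping it is unnecessary."""
    h, w, r = len(image), len(image[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in range(-r, r + 1):
                yy = min(max(y + dy, 0), h - 1)
                for dx in range(-r, r + 1):
                    xx = min(max(x + dx, 0), w - 1)
                    acc += kernel[dy + r][dx + r] * image[yy][xx]
            out[y][x] = acc
    return out


def difference_of_gaussian(image, k, sigma=2.0):
    """D = (G(k * sigma) - G(sigma)) * I, computed as two convolutions."""
    wide = convolve(image, gaussian_kernel(k * sigma))
    narrow = convolve(image, gaussian_kernel(sigma))
    return [[a - b for a, b in zip(rw, rn)] for rw, rn in zip(wide, narrow)]


SCALES = (math.sqrt(2.0), 2.0, math.sqrt(8.0))


def dog_features(image, sigma=2.0):
    """Three DoG response maps, one per scale k."""
    return [difference_of_gaussian(image, k, sigma) for k in SCALES]
```

With the renormalized kernels, a uniform image yields a zero response at every scale, so the features respond only to local intensity structure.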


5.2 Color

The color information in a video signal is usually encoded in the RGB color space. For color features, in all models, we choose the generalized RGB color space: $r = R/(R + G + B)$, and $g = G/(R + G + B)$, which effectively normalizes variations in brightness. For IT$_{V}$ and IQT$_{V}$, the $Y$'s of higher-level nodes are computed as the mean of the $r$'s and $g$'s of their children nodes of the initial quad-tree structure. Each color observable is normalized to have zero mean and unit variance over the dataset.

In summary, the $y$'s are 8-dimensional vectors for IT$_{V}$ and IQT$_{V}$ (six CWT magnitudes, one per oriented subimage, plus $r$ and $g$), and 5-dimensional vectors for IT$_{V^0}$ and IQT$_{V^0}$ (three difference-of-Gaussian responses plus $r$ and $g$).
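The color features can be sketched as follows. The handling of pure black pixels and the function names are assumptions introduced only for this illustration.

```python
import math


def generalized_rgb(R, G, B):
    """Brightness-normalized chromaticities r = R/(R+G+B), g = G/(R+G+B)."""
    total = R + G + B
    if total == 0:
        # Assumption: a black pixel carries no chromatic information,
        # so it is mapped to the neutral point.
        return (1.0 / 3.0, 1.0 / 3.0)
    return (R / total, G / total)


def parent_color(children_rg):
    """Color observable of a higher-level node: the mean of the (r, g)
    pairs of its children in the initial quad-tree."""
    n = len(children_rg)
    return (sum(r for r, _ in children_rg) / n,
            sum(g for _, g in children_rg) / n)


def standardize(values):
    """Scale one color observable to zero mean and unit variance over the
    dataset; a constant observable is mapped to zeros."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0.0:
        return [0.0] * n
    return [(v - mean) / std for v in values]
```

Scaling all three channels by the same factor leaves `generalized_rgb` unchanged, which is the brightness normalization referred to above.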


CHAPTER 6
EXPERIMENTS AND DISCUSSION

We report experiments on image segmentation and classification for six sets of images. Dataset I comprises fifty 64 x 64 simple-scene images with object appearances of 20 distinct objects shown in Fig. 6-1. Samples of dataset I are given in Figs. 6-2, 6-3, and 6-4. Dataset II contains 120 complex-scene images of size 128 x 128, with partially occluded object appearances of the same 20 distinct objects as for dataset I images. Examples of dataset II are shown in Figs. 6-11, 6-12, and 6-15. Note that objects appearing in datasets I and II are carefully chosen to test if irregular trees are expressive enough to capture very small variations in appearances of some classes (e.g., two different types of cans in Fig. 6-1), as well as to encode large differences among some other classes (e.g., wiry-featured robot and books in Fig. 6-1).

Next, dataset III contains fifty 128 x 128 natural-scene images, samples of which are shown in Figs. 6-5 and 6-6.

For dataset IV we choose sixty 128 x 128 images from a database that is publicly available at the Computer Vision Home Page. Dataset IV contains a video sequence of two people approaching each other, who wear alike shirts, but different pants, as illustrated in Fig. 6-16. The sequence is interesting, because the most significant "object" parts for differentiating between the two persons (i.e., pants) get occluded. Moreover, the images represent scenes with clutter, where recognition of partially occluded, similar-in-appearance people becomes harder. Together with the two persons, there are 12 possible image classes appearing in dataset IV, as depicted in Fig. 6-16a. Here, each image is treated separately, without making use of the fact that the background scene does not change in the video sequence.

Further, dataset V consists of sixty 256 x 256 images, typical samples of which are shown in Fig. 6-17b. The images in dataset V represent the video sequence of a complex scene, which is observed from different viewpoints by moving a camera horizontally clockwise. Together with the background, there are 6 possible image classes, as depicted in Fig. 6-17a.

Finally, dataset VI consists of sixty 256 x 256 natural-scene images, samples of which are shown in Fig. 6-18. The images in dataset VI represent the video sequence of a row of houses, which is observed from different viewpoints. The houses are very similar in appearance, so that the recognition task becomes very difficult when details differentiating one house from another are occluded. There are 8 possible image classes: 4 different houses, sky, road, grass, and tree, as marked with different colors in Fig. 6-18.

All datasets are divided into training and test sets by random selection of images, such that 2/3 are used for training and 1/3 for testing. Ground truth for each image is determined through hand-labeling of pixels.

6.1 Unsupervised Image Segmentation Tests

We first report experiments on unsupervised image segmentation using IT$_{V^0}$ and IT$_{V}$. Irregular-tree based image segmentation is tested on datasets I and III, and conducted by the algorithm given in Fig. 2-4. Since in unsupervised settings the parameters of the model are not known, we initialize them as discussed in the initialization step of the learning algorithm in Section 2.5. After Bayesian estimation of the irregular tree, each node defines one image region composed of those leaf nodes (pixels) that are that node's descendants. Results presented in Figs. 6-2, 6-3, 6-4, 6-5, and 6-6 suggest that irregular trees are able to parse images into "meaningful" parts by assigning one subtree per "object" in the image. Moreover, from Figs. 6-2 and 6-3, we also observe that irregular trees, inferred through SVA, preserve structure for objects across images subject to translation, rotation and scaling. In Fig. 6-2, note that the level-4 clustering for the larger-object scale in Fig. 6-2 (top-right) corresponds to the level-3 clustering for the smaller-object scale in Fig. 6-2 (bottom-center). In other words, as the object transitions through scales, the tree structure changes by eliminating the lowest-level layer, while the higher-order structure remains intact.
We also note that the estimated positions of higher-level hidden variables in IT$_{V^0}$ and IT$_{V}$ are very close to the center of mass of object parts, as well as of whole objects. We compute the error of estimated root-node positions $\hat{r}$ as the distance from the actual center of mass $r_{CM}$ of hand-labeled objects, $d_{err} = \|\hat{r} - r_{CM}\|$. Also, we compare our SVA inference algorithm with variational approximation (VA)^1 proposed by Storkey and Williams [48]. The averaged error values over the given test images for VA and SVA are reported in Table 6-1. We observe that the error, relative to the image size, decreases as the image size increases, because in summing node positions over parent and children nodes, as in Eq. (2.16) and Eq. (2.17), more statistically significant information contributes to the position estimates. For example, $d^{III}_{err} = 6.18$ for SVA is only 4.8% of the dataset-III image size, whereas $d^{I}_{err} = 4.23$ for SVA is 6.6% of the dataset-I image size.

^1 Although the algorithm proposed by Storkey and Williams [48] is also a structured variational approximation, to differentiate that method from ours, we slightly abuse the notation.

Figure 6-1: 20 image classes in type I and II datasets.

Figure 6-2: Image segmentation using IT$_{V^0}$: (left) dataset I images; (center) pixel clusters with the same parent at level $\ell = 3$; (right) pixel clusters with the same parent at level $\ell = 4$; points mark the position of parent nodes. Irregular-tree structure is preserved through scales.

Figure 6-3: Image segmentation using IT$_{V^0}$: (top) dataset I images; (bottom) pixel clusters with the same parent at level 3. Irregular-tree structure is preserved over rotations.

Figure 6-4: Image segmentation by irregular trees learned using SVA: (a)-(c) IT$_{V^0}$ for dataset I images; all pixels labeled with the same color are descendants of a unique root.

Figure 6-5: Image segmentation by irregular trees learned using SVA: (a) IT$_{V^0}$ for a dataset III image; (b)-(d) IT$_{V}$ for dataset III images; all pixels labeled with the same color are descendants of a unique root.

Figure 6-6: Image segmentation using IT$_{V}$: (a) a dataset III image; (b)-(d) pixel clusters with the same parent at levels $\ell = 3, 4, 5$, respectively; white regions represent pixels already grouped by roots at the previous scale; points mark the position of parent nodes.

Table 6-1: Root-node distance error

             IT_{V^0}          IT_V
  dataset    VA      SVA       VA      SVA
  I          6.32    4.61      6.14    4.23
  III        9.15    6.87      8.99    6.18

In Table 6-2, we report the percentage of erroneously grouped pixels, and, in Table 6-3, we report the object detection error, when compared to ground truth, averaged over each dataset. For estimating the object detection error, the following instances are counted as error: (1) merging two distinct objects into one (i.e., failure to detect an object), and (2) segmenting an object into sub-regions that are not actual object parts. On the other hand, if an object is segmented into several "meaningful" sub-regions, verified by visual inspection, this type of error is not included. Overall, we observe that SVA outperforms VA for image segmentation using IT$_{V^0}$ and IT$_{V}$. Interestingly, the segmentation results for IT$_{V}$ models are only slightly better than for IT$_{V^0}$ models.

Table 6-2: Pixel segmentation error

  model      inference   dataset I   dataset III
  IT_{V^0}   VA          7%          10%
             SVA         4%          9%
  IT_V       VA          7%          11%
             SVA         4%          7%

Table 6-3: Object detection error

  model      inference   dataset I   dataset III
  IT_{V^0}   VA          4%          13%
             SVA         3%          10%
  IT_V       VA          4%          10%
             SVA         2%          6%

It should be emphasized that our experiments are carried out in an unsupervised setting, and, as such, cannot be equitably evaluated against supervised object recognition results reported in the literature. Take, for instance, the segmentation in Fig. 6-5d, where two boys dressed in white clothes (i.e., two similar-looking objects) are merged into one subtree. Given the absence of prior knowledge, the ground-truth segmentation for this image is arbitrary, and the resulting segmentation ambiguous; nevertheless, we still count it towards the object-detection error percentages in Table 6-3.

Our claim that nodes at different levels of irregular trees represent object parts at various scales is supported by experimental evidence that the nodes segment the image into "meaningful" object sub-components and position themselves at the center of mass of these sub-parts.

6.2 Tests of Convergence

In this section, we report on the convergence properties of the inference algorithms for IT$_{V^0}$, IT$_{V}$, IQT$_{V^0}$, and IQT$_{V}$. First, we compare our SVA inference algorithm with variational approximation (VA) [48]. In Fig. 6-7a-b, we illustrate the convergence rate of computing $P(Z, X, R' \mid Y, R^0) \approx Q(Z, X, R')$ for SVA and VA, averaged over the given datasets. Numbers above bars represent the mean number of iteration steps it takes for the algorithm to converge. We consider the algorithm converged when $|Q(Z, X, R'; t+1) - Q(Z, X, R'; t)| / Q(Z, X, R'; t) < \varepsilon$ for $N$ consecutive iteration steps $t$, where $N = 10$ and $\varepsilon = 0.01$ (see Fig. 2-4, Step (11)). Overall, SVA converges in fewer iterations than VA. For example, the average number of iterations for SVA on dataset III is 25 and 23 for IT$_{V^0}$ and IT$_{V}$, respectively, which takes approximately 6 s and 5 s on a Dual 2 GHz PowerPC G5. Here, the processing time also includes image-feature extraction.

Figure 6-7: Comparison of inference algorithms: (a)-(b) convergence rate averaged over the given datasets, for IT$_{V^0}$ and IT$_{V}$, respectively; (c)-(d) percentage increase in $\log Q(Z, X, R')$ computed in SVA over $\log Q(Z, X, R')$ computed in VA, for IT$_{V^0}$ and IT$_{V}$, respectively.

For the same experiments, in Fig. 6-7c-d, we report the percentage increase in $\log Q(Z, X, R')$ computed using our SVA over $\log Q(Z, X, R')$ obtained by VA. We note that SVA results in larger approximate posteriors than VA. The larger $\log Q(Z, X, R')$ means that the assumed form of the approximate posterior distribution $Q(Z, X, R') = Q(Z)\, Q(X \mid Z)\, Q(R' \mid Z)$ more accurately represents underlying stochastic processes in the image than VA.

Now, we compare the convergence of the inference algorithm for IQT$_{V^0}$ with SVA and VA for IT$_{V^0}$. For simplicity, we also refer to the inference algorithm for the model IQT$_{V^0}$ as IQT$_{V^0}$, slightly abusing the notation. The parameters that control the convergence criterion for the inference algorithms of the three models are $N = 10$ and $\varepsilon = 0.01$. Figs. 6-8 and 6-9 illustrate typical examples of the convergence rate. We observe that the inference algorithm for IQT$_{V^0}$ converges slightly slower than SVA and VA for IT$_{V^0}$. The average number of iteration steps for IQT$_{V^0}$ is approximately 160 and 230, which takes 6 s and 17 s on a Dual 2 GHz PowerPC G5, for datasets IV and V, respectively.

Figure 6-8: Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 128 x 128 dataset IV image in Fig. 6-16b; SVA and VA inference algorithms are conducted for the IT$_{V^0}$ model.

Figure 6-9: Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 256 x 256 dataset V image in Fig. 6-17b; SVA and VA inference algorithms are conducted for the IT$_{V^0}$ model.

Figure 6-10: Percentage increase in log-likelihood $\log P(Y \mid X)$ of IQT$_{V^0}$ over $\log P(Y \mid X)$ of IT$_{V^0}$, after 500 and 200 iteration steps for datasets IV and V, respectively.
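The root-position error and its size relative to the image can be checked with a few lines of code. The sketch below assumes that "image size" means the side length in pixels, which is the reading that reproduces the 4.8% and 6.6% figures quoted for Table 6-1; the function names are illustrative.

```python
import math


def center_of_mass(pixels):
    """Mean (x, y) position of the hand-labeled pixels of one object."""
    n = len(pixels)
    return (sum(x for x, _ in pixels) / n, sum(y for _, y in pixels) / n)


def root_position_error(r_hat, pixels):
    """Euclidean distance between an estimated root position and the
    center of mass of the object's pixels."""
    cx, cy = center_of_mass(pixels)
    return math.hypot(r_hat[0] - cx, r_hat[1] - cy)


def relative_error_percent(d_err, image_side):
    """Position error as a percentage of the image side length."""
    return 100.0 * d_err / image_side


# SVA errors for the model with observables at all levels (Table 6-1).
print(round(relative_error_percent(6.18, 128), 1))  # dataset III, 128 x 128
print(round(relative_error_percent(4.23, 64), 1))   # dataset I, 64 x 64
```

The absolute error grows from dataset I to dataset III, while the error measured against the image side shrinks, which is the sense in which the position estimates improve with image size.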


53 Thebar-chartinFig. 6{10 showsthepercentage log P 1 log P 2 j log P 1 j ,where P 1 = P ( Y j X )is thelikelihoodofIT V 0 ,and P 2 = P ( Y j X )ofIQT V 0 .Weobservethat P ( Y j X )ofIQT V 0 afterthealgorithmconverged,islargerthan P ( Y j X )ofIT V 0 .Thelargerlikelihoodmeans thatthemodelstructureandinferreddistributionsmoreac curatelyrepresentunderlying stochasticprocessesintheimage. 6.3ImageClassicationTests WecompareclassicationperformanceofIT V 0 withthatofthefollowingstatistical models:(1)MarkovRandomField(MRF)[ 6 ],(2)DiscriminativeRandomField(DRF)[ 25 ], and(3)Tree-StructuredBeliefNetwork(TSBN)[ 33 29 ].Thesemodelsarerepresentatives ofdescriptive,discriminativeandxed-structuregenera tivemodels,respectively.Below, webrieryexplainthemodels. ForMRFs,weassumethatthelabeleld P ( X )isahomogeneousandisotropicMRF, givenbythegeneralizedIsingmodelwithonlypairwisenonz eropotentials[ 6 ].Thelikelihoods P ( y i j x i )areassumedconditionallyindependentgiventhelabels.T hus,theposterior energyfunctionisgivenby U ( X j Y )= X i 2 V 0 log P ( y i j x i )+ X i 2 V 0 X j 2N i V 2 ( x i ;x j ) ; V 2 ( x i ;x j )= 8><>: MRF ;ifx i = x j ; MRF ;ifx i 6 = x j : where N i denotestheneighborhoodof i P ( y i j x i )isa G -componentmixtureofGaussians givenbyEq.( 2.6 ),and V 2 istheinteractionparameter.Detailsonlearningthemodel parametersaswellasoninferenceforagivenimagecanbefou ndinStanLi'sbook[ 6 ]. Next,theposteriorenergyfunctionoftheDRFisgivenby U ( X j Y )= X i 2 V 0 A i ( x i ;Y )+ X i 2 V 0 X j 2N i I ij ( x i ;x j ;Y ) ; where A i =log ( x i W T y i )and I ij = DRF ( Kx i x j +(1 K )(2 ( x i x j V T y i ) 1))aretheunary andpairwisepotentials,respectively.Sincetheabovefor mulationdealsonlywithbinary classication(i.e. x i 2f 1 ; 1 g ),whenestimatingparameters f W;V; DRF ;K g foranobject,wetreatthatobjectasapositiveexample,andallothe robjectsasnegativeexamples


("one against all" strategy). For details on how to learn the model parameters, and how to conduct inference for a given image, we refer the reader to the paper of Kumar and Hebert [25].

Further, TSBNs, or quad-trees, are defined to have the same number of nodes $V$ and levels $L$ as irregular trees. For both IT$_{V^0}$ and TSBNs, we use the same image features. When we operate on wavelets, which are a multiscale image feature, we in fact propagate observables to higher levels. In this case, we refer to the counterpart of IT$_V$ as TSBN$^\uparrow$. To learn the parameters of TSBN or TSBN$^\uparrow$, and to perform inference on a given image, we use the algorithms thoroughly discussed by Laferté et al. [33].

Finally, irregular-tree based image classification is conducted by employing the inference algorithms in Fig. 2-4 for IT$_{V^0}$ and IT$_V$, and the inference algorithms in Fig. 3-2 for IQT$_{V^0}$ and IQT$_V$. Since image classification represents a supervised machine learning problem, it is necessary to first learn the model parameters on training images. For this purpose, we employ the learning algorithms discussed in Section 2.5 for IT$_{V^0}$ and IT$_V$, and the learning algorithms discussed in Section 3.3 for IQT$_{V^0}$ and IQT$_V$.

After inference of MRF, DRF, TSBN, and the irregular tree on a given image, for each model, we conduct pixel labeling by using the MAP classifier. In Fig. 6-11, we illustrate an example of pixel labeling for a dataset-II image. Here, we say that an image region is correctly recognized as an object if the majority of MAP-classified pixel labels in that region are equal to the true labeling of the object. For estimating the object-recognition error, the following instances are counted as errors: (1) merging two distinct objects into one, and (2) swapping the identity of objects. The object-recognition error over all objects in 40 test images in dataset II is summarized in Table 6-4. In each cell of Table 6-4, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of swapped-identity errors. For instance, for IT$_{V^0}$ the overall recognition error is 9.6%, of which 37% of instances were caused by swapped-identity errors. Moreover, Table 6-5 shows the average pixel-labeling error.
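The majority rule described above can be illustrated with a short Python sketch. The function names, the region representation (a true label paired with a list of MAP pixel labels), and the toy labels are hypothetical and serve only as an illustration; the sketch counts only swapped-identity errors and does not model merged-object errors.

```python
from collections import Counter


def recognize_region(pixel_labels):
    """Return the majority label among the MAP pixel labels of one region.

    Ties are resolved in favor of the label encountered first.
    """
    return Counter(pixel_labels).most_common(1)[0][0]


def recognition_error(regions):
    """Fraction of regions whose majority label differs from the true label.

    `regions` is a list of (true_label, [MAP pixel labels]) pairs.
    """
    wrong = sum(1 for true, labels in regions
                if recognize_region(labels) != true)
    return wrong / len(regions)


# Toy example with three regions; the second one is misrecognized.
regions = [
    ("snail", ["snail", "snail", "book", "snail"]),
    ("book", ["snail", "snail", "book"]),
    ("cup", ["cup", "cup", "cup"]),
]
error = recognition_error(regions)  # one region out of three is wrong
```

A region is thus counted as correct even when a minority of its pixels is mislabeled, which is why the object-recognition error and the pixel-labeling error are reported separately.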
Next, we examine the receiver operating characteristic (ROC) of MRF, DRF, TSBN and IT$_{V^0}$ for a two-class recognition problem. From the set of image classes given in Fig. 6-1, we choose "toy-snail" and "wavelets-book" as the two possible classes in the following set


of experiments. The task is to label two-class-problem images containing "toy-snail" and "wavelets-book" objects, a typical example of which is shown in Fig. 6-12. Here, pixels labeled as "toy-snail" are considered true positives, while pixels labeled as "book" are considered true negatives. In Fig. 6-13, we plot ROC curves for the two-class problem, where we compare the performance of IT$_{V^0}$ with those of MRF, DRF and TSBN. From Fig. 6-13, we observe that image classification with IT$_{V^0}$ is the most accurate, since its ROC curve is the closest to the left-hand and top borders of the ROC space, as compared to the ROC curves of the other models. Further, in Fig. 6-14, we plot ROC curves for the same two-class problem, where we compare the performance of IT$_V$ with those of IT$_{V^0}$, TSBN, and TSBN$^\uparrow$. From Fig. 6-14, we observe that image classification with IT$_V$ is the most accurate, and that both IT$_{V^0}$ and IT$_V$ outperform their fixed-structure counterparts TSBN and TSBN$^\uparrow$.

Table 6-4: Object recognition error (in parentheses: ratio of swapped-identity errors)

image type    MRF            DRF            TSBN           IT$_{V^0}$
dataset II    21.2% (67%)    12.5% (83%)    14.8% (72%)    9.6% (37%)

Table 6-5: Pixel labeling error

image type    MRF      DRF      TSBN     IT$_{V^0}$
dataset II    15.8%    12.3%    16.1%    9.9%

Figure 6-11: Comparison of classification results for various statistical models: (a) 256 x 256 image; (b) MRF; (c) DRF; (d) TSBN; (e) IT$_{V^0}$. Pixels are labeled with a color specific to each object; non-colored pixels are classified as background.

From the results reported in Tables 6-4 and 6-5, as well as from Figs. 6-13 and 6-14, we note that irregular trees outperform the other three models. However, the recognition performance of all the models suffers substantially when an image contains occlusions. While for some applications the literature reports vision systems with impressively small classification errors (e.g., 2.5% hand-written digit recognition error [75]), in the case of


complex scenes this error is much higher [76, 77, 11, 5, 4]. To some extent, our results could have been improved had we employed more discriminative image features and/or more sophisticated classification algorithms than majority rule. However, none of these will alleviate the fundamental problem of "traditional" recognition approaches: the lack of explicit analysis of visible object parts.

Figure 6-12: MAP pixel labeling using different statistical models: (a) 256 x 256 image; (b) MRF; (c) DRF; (d) TSBN; (e) IT$_{V^0}$.

Figure 6-13: ROC curves for the image in Fig. 6-12a with IT$_{V^0}$, TSBN, DRF and MRF.

Figure 6-14: ROC curves for the image in Fig. 6-12a with IT$_V$, IT$_{V^0}$, TSBN, and TSBN$^\uparrow$.

Thus, the poor classification performance of MRF, DRF, and TSBN, reported in Tables 6-4 and 6-5, can be interpreted as follows. Accounting for only pairwise potentials between adjacent nodes in MRF and DRF is not sufficient to analyze complex configurations of objects in the scene. Also, the analysis of fixed-size pixel neighborhoods at various scales in TSBN leads to "blocky" estimates, and consequently


to poor classification performance. Therefore, we hypothesize that the main reason why irregular trees outperform the other models is their capability to represent object details at various scales, which in turn provides for explicit analysis of visible object parts. In other words, we speculate that, in the face of the occlusion problem, recognition of object parts is critical and should condition recognition of the object as a whole.

To support our hypothesis, instead of applying more sophisticated image-feature extraction tools and better classification procedures than majority vote, we introduce a more radical change to our recognition strategy.

6.4 Object-Part Recognition Strategy

Recall from Section 6.1 that irregular trees are capable of capturing component-subcomponent structures at various scales, such that root nodes represent the center of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. To utilize this convenient property, we conduct the object-part recognition strategy presented in Section 4.2.

We compare the performance of the whole-object and part-object recognition strategies. The whole-object approach can be viewed as a benchmark strategy, in the sense that a majority of existing vision systems do not explicitly analyze visible object parts at various scales. In these systems, once the object is detected, the whole image region is identified through MAP classification, as is done in the previous section.

In Fig. 6-15, we present classification results for IT$_{V^0}$, using the whole-object and object-part recognition strategies on dataset-II images. In Fig. 6-15a, both strategies succeed in recognizing two different "Fluke" voltage-measuring instruments (see Fig. 6-1). However, in Fig. 6-15b, the whole-object recognition strategy fails to make a distinction between the objects, since the part that most differentiates one object from another is occluded, making it a difficult case for recognition even for a human interpreter. In the other two images, we observe that the object-part recognition strategy is more successful than the whole-object approach.


Figure 6-15: Comparison of two recognition strategies on dataset II for IT$_{V^0}$, panels (a)-(d): (top) 128 x 128 challenging images containing objects that are very similar in appearance; (middle) classification using the whole-object recognition strategy; (bottom) classification using the part-object recognition strategy; each recognized object in the image is marked with a different color.

For estimating the object-recognition error of IT$_{V^0}$ on dataset-II images, the following instances are counted as errors: (1) merging two distinct objects into one (i.e., object not detected), and (2) swapping the identity of objects (i.e., object correctly detected but misclassified as one of the objects in the class of known objects). The recognition error averaged over all objects in 40 test images in dataset II is only 5.8%, an improvement of nearly 40% over the reported error of 9.6% in the previous section.

We also recorded the object-recognition error of IQT$_{V^0}$ over all objects in 20 test images of datasets IV, V, and VI, respectively. The results are summarized in Table 6-6. In each cell of Table 6-6, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of merged-object errors. For instance, for dataset V and the whole-object strategy, the overall recognition error is 21.2%, of which slightly more than half (56%) were caused by merged-object errors. The results in Table 6-6 clearly demonstrate significantly improved recognition performance, as well as a reduction in false


alarm and swapped-identity types of error for the object-part, as compared with the whole-object approach. Also, Table 6-7 shows that the object-part strategy reduces pixel-labeling error.

Table 6-6: Object recognition error for IQT$_{V^0}$ (in parentheses: ratio of merged-object errors)

strategy        dataset IV      dataset V      dataset VI
whole-object    11.6% (85%)     21.2% (56%)    26.3% (44%)
object-part     3.3% (100%)     8.7% (92%)     12.5% (81%)

Table 6-7: Pixel labeling error for IQT$_{V^0}$

strategy        dataset IV    dataset V    dataset VI
whole-object    9.6%          17.9%        16.3%
object-part     4.3%          6.7%         8.3%

These results support our hypothesis that, for successful recognition of partially occluded objects, it is critical to analyze visible object details at various scales.
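The relative improvements implied by these error rates can be checked with a few lines of Python. The helper name `relative_reduction` is hypothetical, and the numbers are simply the overall recognition errors quoted in this section for IT$_{V^0}$ (9.6% versus 5.8%) and in Table 6-6 for IQT$_{V^0}$.

```python
def relative_reduction(before, after):
    """Relative decrease of an error rate, as a fraction of the original error."""
    return (before - after) / before


# IT_{V^0} on dataset II: whole-object 9.6% versus object-part 5.8%.
it_gain = relative_reduction(9.6, 5.8)  # about 0.396, i.e. nearly 40%

# IQT_{V^0}: (whole-object, object-part) recognition errors from Table 6-6.
table_6_6 = {"IV": (11.6, 3.3), "V": (21.2, 8.7), "VI": (26.3, 12.5)}
iqt_gain = {name: relative_reduction(whole, part)
            for name, (whole, part) in table_6_6.items()}

for name, gain in iqt_gain.items():
    print(f"dataset {name}: relative error reduction {gain:.1%}")
```

Under this measure, the object-part strategy more than halves the recognition error of the whole-object strategy on each of the three datasets.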


Figure 6-16: Recognition results over dataset IV for IQT$_{V^0}$: (a) cluttered scene containing 10 objects, each of which is marked with a different color; images of two alike persons; (b) dataset II: video sequence of two alike people walking in a cluttered scene; (c) classification using the whole-object recognition strategy; (d) classification using the part-object recognition strategy.


Figure 6-17: Recognition results over dataset V for IQT$_{V^0}$: (a) 6 image classes: 5 similar objects and background; (b) 4 images of the same scene viewed from 4 different angles with the objects shown in (a); (c) the most significant object parts differ over various scenes; the majority-voting classification result is indicated by the colored regions; (d) classification using the whole-object recognition strategy; (e) classification using the object-part recognition strategy.


Figure 6-18: Classification using the part-object recognition strategy; recognition results for dataset VI.


CHAPTER 7
CONCLUSION

7.1 Summary of Contributions

In this dissertation, we have addressed detection and recognition of partially occluded, alike objects in complex scenes, a problem that has eluded, as of yet, a satisfactory solution. The experiments reported herein show that "traditional" approaches to object recognition, where objects are first detected and then identified as a whole, yield poor performance in complex settings. Therefore, we speculate that a careful analysis of visible, fine-scale object details may prove critical for recognition. However, in general, the analysis of multiple sub-parts of multiple objects gives rise to prohibitive computational complexity. To overcome this problem, we have proposed to model images with irregular trees, which provide a suitable framework for developing novel object-recognition strategies, in particular, object-part recognition. Here, object details at various scales are first detected through tree-structure estimation; then, these object parts are analyzed as to which component of an object is the most significant for recognition of that object; finally, information on the cognitive significance of each object part is combined toward the ultimate image classification. Empirical evidence demonstrates that this explicit treatment of object parts results in improved recognition performance, as compared to the strategies where object components are not explicitly accounted for.

In Chapter 2, we have proposed two architectures within the irregular-tree framework, referred to as IT$_{V^0}$ and IT$_V$. For each architecture, we have developed an inference algorithm. Gibbs sampling has been shown to be successful at finding trees that have high posterior probability, however, at a great computational price, which renders the algorithm impractical. Therefore, we have proposed Structured Variational Approximation (SVA) for inference of IT$_{V^0}$ and IT$_V$, which relaxes poorly justified independence assumptions in prior work. We have shown that SVA converges to larger posterior distributions, an order of magnitude faster than competing algorithms. We have also demonstrated that IT$_{V^0}$ and


IT$_V$ overcome the blocky segmentation problem of TSBNs, and that they possess certain invariance to translation, rotation, and scaling transformations.

In Chapter 3, we have proposed another two architectures, referred to as IQT$_{V^0}$ and IQT$_V$. In these models, we have constrained the node positions to be fixed, such that only connections can control the irregular tree structure. At the same time, we have made the distribution of connections dependent on image classes. This formulation has allowed us to avoid variational-approximation inference, and to develop the exact inference algorithm for IQT$_{V^0}$ and IQT$_V$. We have shown that it converges more slowly than SVA; however, it yields larger likelihoods, which in general means that IQT$_{V^0}$ represents the underlying stochastic processes in the image more accurately than IT$_{V^0}$.

In experiments on unsupervised image segmentation, we have shown the capability of irregular trees to capture important component-subcomponent structures in images. Empirical evidence demonstrates that root nodes represent the center of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. In Chapter 4, we have proposed to explicitly analyze the significance of object parts (i.e., tree nodes) with respect to recognition of an object as a whole. We have defined entropy as a measure of such cognitive significance. To avoid the costly approach of analyzing every detected object part, we have devised a greedy algorithm, referred to as object-part recognition. The comparison of the whole-object and part-object approaches indicates that the latter method generates significantly better recognition performance and reduced pixel-labeling error.
Ultimately, what allows us to overcome obstacles in analyzing scenes with occlusions in a computationally efficient and intuitively appealing manner is the generative-model framework we have proposed. This framework provides an explicit representation of objects and their sub-parts at various scales, which, in turn, constitutes the key factor for improved interpretation of scenes with partially occluded, alike objects.


7.2 Opportunities for Future Work

The analysis in the previous chapters suggests the following opportunities for future work. One promising thrust of research would be to investigate relationships among descriptive, generative and discriminative statistical models. We anticipate that these studies will lead to a greater integration of the modeling paradigms, yielding richer and more advanced classes of models. Here, the most critical issue is that of computationally manageable inference. With recent advances in the area of belief propagation (e.g., Generalized Belief Propagation [78]), the new algorithms may make it possible to solve real-world problems that were previously computationally intractable.

Within the irregular-tree framework, it is possible to continue further investigation toward replacing the current discrete-valued node variables with real-valued ones. Thereby, a real-valued version of the irregular tree can be specified. Gaussians could be used as the probability distribution governing the continuous random variables represented by nodes, due to their tractable properties. Such a model could then operate directly on real-valued pixel data, improving the state-of-the-art techniques for solving various image-processing problems, including super-resolution, image enhancement, and compression.

Further, with respect to the measure of significance of irregular-tree nodes, one can pursue investigation of more complex information-theoretic concepts than Shannon's entropy. For example, we anticipate that joint entropy and mutual information may yield a more efficient cognitive analysis, which in turn could eliminate the need for the greedy algorithm discussed in Section 4.2.

The analysis of object parts can be interpreted as integration of information from multiple complementary and/or competitive sensors, each of which has only limited accuracy. As such, further research could be conducted on formulating the optimal strategy for combining the pieces of information of object parts toward ultimate object recognition. We anticipate that algorithms such as adaptive boosting (AdaBoost) [79] and the Support Vector Machine [80] may prove useful for this purpose.
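As a reminder of the information-theoretic quantities mentioned above, the following Python sketch computes Shannon's entropy and the mutual information of two discrete variables from a joint probability table. It is a generic textbook illustration under the assumption of a small, fully specified joint distribution; the function names are hypothetical, and the sketch is not the significance measure of Section 4.2.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution given as a list."""
    return -sum(q * math.log2(q) for q in p if q > 0)


def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(A,B), where joint[a][b] = P(A=a, B=b)."""
    p_a = [sum(row) for row in joint]            # marginal of A
    p_b = [sum(col) for col in zip(*joint)]      # marginal of B
    flat = [q for row in joint for q in row]     # joint as a flat list
    return entropy(p_a) + entropy(p_b) - entropy(flat)


# Two independent fair binary variables share no information ...
independent = [[0.25, 0.25], [0.25, 0.25]]
# ... while two perfectly coupled ones share one full bit.
coupled = [[0.5, 0.0], [0.0, 0.5]]

mi_independent = mutual_information(independent)
mi_coupled = mutual_information(coupled)
```

Unlike the entropy of a single node, mutual information quantifies how much two variables tell about each other, which is the property that would be exploited when analyzing object parts jointly.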
Another promising research topic is to incorporate available prior knowledge into the proposed Bayesian estimation framework, where we have assumed that all classification


errors are equally costly. However, in many applications, some errors are more serious than others. Cost-sensitive learning methods are needed to address this problem [81].

On a broader scale, the research reported in this dissertation can be viewed as solving a more general machine learning problem, with experimental validation on images as data. This problem concerns supervised learning from examples, where the goal is to learn a function $X = f(Y)$ from $N$ training examples of the form $\{(Y_n, f(Y_n))\}_{n=1}^{N}$. Here, $X_n$ and $Y_n$ contain sub-components, the meaning of which differs for various applications. For example, in computer vision, each $Y_n$ might be a vector of image pixel values, and each $X_n$ might be a partition of that image into segments and an assignment of labels to each segment. Most importantly, the components of $Y_n$ form a sequence (e.g., a sequence on the 2D image lattice). Therefore, learning a classifier function $X = f(Y)$ represents the sequential supervised learning problem [82]. Thus, in this dissertation, we have addressed sequential supervised learning, the solutions of which can be readily applied to a wide range of problems beyond computer vision, such as, for example, speech processing, where the components of $Y$ form a sequence in time.


APPENDIX A
DERIVATION OF VARIATIONAL APPROXIMATION

Preliminaries. Computation of $KL(Q \| P)$, given by Eq. (2.12), is intractable, because it depends on $P(Z, X, R' | Y, R^0)$. Note, though, that $Q(Z, X, R')$ does not depend on $P(Y | R^0)$ and $P(R^0)$. Consequently, by subtracting $\log P(Y | R^0)$ and $\log P(R^0)$ from $KL(Q \| P)$, we obtain a tractable criterion $J(Q, P)$, whose minimization with respect to $Q(Z, X, R')$ yields the same solution as minimization of $KL(Q \| P)$:

\[
J(Q,P) \triangleq KL(Q \| P) - \log P(Y | R^0) - \log P(R^0) = \int_{R'} dR' \sum_{Z,X} Q(Z,X,R') \log \frac{Q(Z,X,R')}{P(Z,X,R,Y)}. \tag{A.1}
\]

$J(Q,P)$ is known alternatively as Helmholtz free energy, Gibbs free energy, or free energy [59]. By minimizing $J(Q,P)$, we seek to compute the parameters of the approximate distributions $Q(Z)$, $Q(X|Z)$ and $Q(R'|Z)$. It is convenient, first, to reformulate Eq. (A.1) as $J(Q,P) = L_Z + L_X + L_R$. We define the auxiliary terms $L_Z$, $L_X$, and $L_R$ as

\[
L_Z \triangleq \sum_{Z} Q(Z) \log \frac{Q(Z)}{P(Z)}, \qquad
L_X \triangleq \sum_{Z,X} Q(Z) Q(X|Z) \log \frac{Q(X|Z)}{P(X|Z) P(Y|X,\rho)}, \qquad
L_R \triangleq \int_{R'} dR' \sum_{Z} Q(Z) Q(R'|Z) \log \frac{Q(R'|Z)}{P(R|Z)}.
\]

To derive expressions for $L_Z$, $L_X$, and $L_R$, we first observe:

\[
\langle z_{ij} \rangle = \xi_{ij}, \quad
\langle x_i^k \rangle = m_i^k, \quad
\langle x_i^k x_j^l \rangle = Q_{ij}^{kl} m_j^l
\;\Rightarrow\;
m_i^k = \sum_{j \in V} \xi_{ij} \sum_{l \in M} Q_{ij}^{kl} m_j^l, \quad \forall i \in V, \; \forall k \in M, \tag{A.2}
\]

where $\langle \cdot \rangle$ denotes expectation with respect to $Q(Z,X,R')$. Consequently, from Eqs. (2.1), (2.9) and (A.2), we have

\[
L_Z = \sum_{i,j \in V} \xi_{ij} \log \left[ \xi_{ij} / r_{ij} \right]. \tag{A.3}
\]

Next, from Eqs. (2.4), (2.10) and (A.2), we derive

\[
L_X = \sum_{i,j \in V} \sum_{k,l \in M} \xi_{ij} Q_{ij}^{kl} m_j^l \log \left[ Q_{ij}^{kl} / P_{ij}^{kl} \right] - \sum_{i \in V} \sum_{k \in M} m_i^k \log P(y_{\rho(i)} | x_i^k, \rho(i)). \tag{A.4}
\]
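The relation between the free energy and the KL divergence can be checked numerically on a toy problem. The following Python sketch assumes a single discrete hidden variable $h$ and one fixed observation $y$, so that the integral over positions and the sums over $Z$ and $X$ collapse into one sum over $h$; the function names and the numbers are hypothetical and serve only as an illustration of why minimizing $J$ and minimizing KL lead to the same $Q$.

```python
import math


def free_energy(q, joint):
    """J(Q,P) = sum_h Q(h) log[Q(h) / P(h, y)] for a fixed observation y."""
    return sum(qh * math.log(qh / ph) for qh, ph in zip(q, joint) if qh > 0)


def kl_to_posterior(q, joint):
    """KL(Q || P(h | y)), with the posterior P(h | y) = P(h, y) / P(y)."""
    evidence = sum(joint)
    return sum(qh * math.log(qh * evidence / ph)
               for qh, ph in zip(q, joint) if qh > 0)


joint = [0.03, 0.12, 0.05]        # toy values of P(h, y), h in {0, 1, 2}
evidence = sum(joint)             # P(y)
candidates = [[0.2, 0.5, 0.3], [0.6, 0.3, 0.1]]

# For every Q, the two criteria differ by the same constant log P(y).
gaps = [kl_to_posterior(q, joint) - free_energy(q, joint) for q in candidates]

# At the exact posterior, KL vanishes and J attains its minimum -log P(y).
posterior = [p / evidence for p in joint]
j_min = free_energy(posterior, joint)
```

Since the gap does not depend on $Q$, the free energy can be minimized without ever evaluating the intractable evidence term.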


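Note that for DT$_{V^0}$, $V$ in the second term is substituted with $V^0$. Finally, from Eqs. (2.3), (2.11) and (A.2), we get

\[
L_R = \frac{1}{2} \sum_{i,j \in V'} \xi_{ij} \left( \log \frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - \mathrm{Tr}\{\Omega_{ij}^{-1} \Omega_{ij}\} + \mathrm{Tr}\{\Sigma_{ij}^{-1} \langle (\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})(\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})^T \rangle\} \right). \tag{A.5}
\]

Let us now consider the expectation in the last term:

\[
\begin{aligned}
\langle (\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})(\mathbf{r}_i - \mathbf{r}_j - \mathbf{d}_{ij})^T \rangle
&= \langle (\mathbf{r}_i - \mu_{ij} + \mu_{ij} - \mathbf{r}_j - \mathbf{d}_{ij})(\mathbf{r}_i - \mu_{ij} + \mu_{ij} - \mathbf{r}_j - \mathbf{d}_{ij})^T \rangle \\
&= \Omega_{ij} + 2 \langle (\mathbf{r}_i - \mu_{ij})(\mu_{jp} - \mathbf{r}_j - \mathbf{d}_{ij} - \mu_{jp} + \mu_{ij})^T \rangle \\
&\quad + \langle (\mathbf{r}_j - \mu_{jp} + \mathbf{d}_{ij} + \mu_{jp} - \mu_{ij})(\mathbf{r}_j - \mu_{jp} + \mathbf{d}_{ij} + \mu_{jp} - \mu_{ij})^T \rangle \\
&= \Omega_{ij} + 2 \langle (\mathbf{r}_i - \mu_{ij})(\mu_{jp} - \mathbf{r}_j)^T \rangle
 + \langle (\mathbf{r}_j - \mu_{jp})(\mathbf{r}_j - \mu_{jp})^T + (\mu_{ij} - \mu_{jp} - \mathbf{d}_{ij})(\mu_{ij} - \mu_{jp} - \mathbf{d}_{ij})^T \rangle \\
&= \Omega_{ij} + \sum_{p \in V'} \xi_{jp} \left( 2 \Psi_{ijp} + \Omega_{jp} + M_{ijp} \right),
\end{aligned} \tag{A.6}
\]

where the definitions of the auxiliary matrices $\Psi_{ijp}$ and $M_{ijp}$ are given in the second to last derivation step above, and $i$-$j$-$p$ is a child-parent-grandparent triad. It follows from Eqs. (A.5) and (A.6) that

\[
L_R = \frac{1}{2} \sum_{i,j \in V'} \xi_{ij} \left( \log \frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 2 + \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\} + \sum_{p \in V'} \xi_{jp} \mathrm{Tr}\{\Sigma_{ij}^{-1} (2 \Psi_{ijp} + \Omega_{jp} + M_{ijp})\} \right). \tag{A.7}
\]

In Eq. (A.7), the last expression left to compute is $\mathrm{Tr}\{\Sigma_{ij}^{-1} \Psi_{ijp}\}$. For this purpose, we apply the Cauchy-Schwarz inequality as follows:

\[
\begin{aligned}
\mathrm{Tr}\{\Sigma_{ij}^{-1} \Psi_{ijp}\}
&= \mathrm{Tr}\{\Sigma_{ij}^{-\frac{1}{2}} \Sigma_{ij}^{-\frac{1}{2}} \langle (\mathbf{r}_i - \mu_{ij})(\mu_{jp} - \mathbf{r}_j)^T \rangle\}
 = \mathrm{Tr}\{\langle \Sigma_{ij}^{-\frac{1}{2}} (\mathbf{r}_i - \mu_{ij})(\mu_{jp} - \mathbf{r}_j)^T \Sigma_{ij}^{-\frac{1}{2}} \rangle\} \\
&\leq \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\}^{\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{jp}\}^{\frac{1}{2}},
\end{aligned} \tag{A.8}
\]

where we used the fact that the $\Sigma$'s and $\Omega$'s are diagonal matrices. Although the Cauchy-Schwarz inequality in general does not yield a tight upper bound, in our case it appears reasonable to assume that the variables $\mathbf{r}_i$ and $\mathbf{r}_j$ (i.e., positions of object parts at different scales) are uncorrelated. Substituting Eq. (A.8) into Eq. (A.7), we finally derive the upper bound for $L_R$ as

\[
L_R \leq \frac{1}{2} \sum_{i,j \in V'} \xi_{ij} \left( \log \frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 2 + \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\} + \sum_{p \in V'} \xi_{jp} \mathrm{Tr}\{\Sigma_{ij}^{-1} (\Omega_{jp} + M_{ijp})\} + 2 \sum_{p \in V'} \xi_{jp} \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\}^{\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{jp}\}^{\frac{1}{2}} \right). \tag{A.9}
\]

Optimization of $Q(X|Z)$. $Q(X|Z)$ is fully characterized by the parameters $Q_{ij}^{kl}$. From the definition of $L_X$, we have $\partial J(Q,P) / \partial Q_{ij}^{kl} = \partial L_X / \partial Q_{ij}^{kl}$. Due to the parent-child dependencies in Eq. (A.2), it is necessary to iteratively differentiate $L_X$ with respect to $Q_{ij}^{kl}$ down the subtree of node $i$. For this purpose, we introduce three auxiliary terms $F_{ij}$, $G_i$, and $\lambda_i^k$, which facilitate computation, as shown below:

\[
\begin{aligned}
F_{ij} &\triangleq \sum_{k,l \in M} \xi_{ij} Q_{ij}^{kl} m_j^l \log \left[ Q_{ij}^{kl} / P_{ij}^{kl} \right], \\
G_i &\triangleq \sum_{d,c \in d(i)} F_{dc} - \Big\{ \sum_{k \in M} m_i^k \log P(y_{\rho(i)} | x_i^k, \rho(i)) \Big\}_{V^0}, \\
\lambda_i^k &\triangleq \exp(-\partial G_i / \partial m_i^k), \\
\Rightarrow \; \frac{\partial L_X}{\partial Q_{ij}^{kl}} &= \frac{\partial F_{ij}}{\partial Q_{ij}^{kl}} + \frac{\partial G_i}{\partial m_i^k} \frac{\partial m_i^k}{\partial Q_{ij}^{kl}},
\end{aligned} \tag{A.10}
\]

where $\{\cdot\}_{V^0}$ denotes that the term is included in the expression for $G_i$ if $i$ is a leaf node for DT$_{V^0}$. For DT$_V$, the term in braces $\{\cdot\}$ is always included. This allows us to derive update equations for both models simultaneously. After finding the derivatives $\partial F_{ij} / \partial Q_{ij}^{kl} = \xi_{ij} m_j^l (\log [Q_{ij}^{kl} / P_{ij}^{kl}] + 1)$ and $\partial m_i^k / \partial Q_{ij}^{kl} = \xi_{ij} m_j^l$, and substituting these expressions in Eq. (A.10), we arrive at

\[
\partial L_X / \partial Q_{ij}^{kl} = \xi_{ij} m_j^l \left( \log \left[ Q_{ij}^{kl} / P_{ij}^{kl} \right] + 1 - \log \lambda_i^k \right). \tag{A.11}
\]

Finally, optimizing Eq. (A.11) with the Lagrange multiplier that accounts for the constraint $\sum_{k \in M} Q_{ij}^{kl} = 1$ yields the desired update equation: $Q_{ij}^{kl} = P_{ij}^{kl} \lambda_i^k / \sum_{a \in M} P_{ij}^{al} \lambda_i^a$, introduced in Eq. (2.13). To compute $\lambda_i^k$, we first find

\[
\begin{aligned}
\frac{\partial G_i}{\partial m_i^k}
&= \sum_{c \in c(i)} \Big( \frac{\partial F_{ci}}{\partial m_i^k} + \sum_{a \in M} \frac{\partial G_c}{\partial m_c^a} \frac{\partial m_c^a}{\partial m_i^k} \Big) - \left\{ \log P(y_{\rho(i)} | x_i^k, \rho(i)) \right\}_{V^0} \\
&= \sum_{c \in c(i)} \sum_{a \in M} \xi_{ci} Q_{ci}^{ak} \Big( \log \left[ Q_{ci}^{ak} / P_{ci}^{ak} \right] + \frac{\partial G_c}{\partial m_c^a} \Big) - \left\{ \log P(y_{\rho(i)} | x_i^k, \rho(i)) \right\}_{V^0},
\end{aligned} \tag{A.12}
\]

and then substitute $Q_{ij}^{kl}$, given by Eq. (2.13), into Eq. (A.12), which results in $\lambda_i^k = \left\{ P(y_{\rho(i)} | x_i^k, \rho(i)) \right\}_{V^0} \prod_{c \in V} \big[ \sum_{a \in M} P_{ci}^{ak} \lambda_c^a \big]^{\xi_{ci}}$, as introduced in Eq. (2.14).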
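The update of $Q_{ij}^{kl}$ for one child-parent pair can be sketched in Python as follows. This is a minimal illustration under the assumption that the update is $P_{ij}^{kl} \lambda_i^k$ renormalized over the child label $k$ for each parent label $l$, as required by the constraint $\sum_{k \in M} Q_{ij}^{kl} = 1$; the function name, the list-of-lists layout and the numbers are hypothetical.

```python
def update_q(P, lam):
    """Return Q[k][l] proportional to P[k][l] * lam[k], normalized over k.

    P[k][l] stands for the prior transition P_ij^{kl} (child label k,
    parent label l); lam[k] stands for lambda_i^k of the child node i.
    """
    M = len(lam)
    Q = [[0.0] * M for _ in range(M)]
    for l in range(M):
        # Normalizer for parent label l: sum over child labels a.
        norm = sum(P[a][l] * lam[a] for a in range(M))
        for k in range(M):
            Q[k][l] = P[k][l] * lam[k] / norm
    return Q


# Toy example with two labels: evidence below node i favors label 1.
P = [[0.9, 0.2],
     [0.1, 0.8]]
lam = [0.5, 2.0]
Q = update_q(P, lam)
column_sums = [sum(Q[k][l] for k in range(2)) for l in range(2)]
```

Each column of the result is a proper conditional distribution over the child label, so the prior transition is reweighted by the evidence collected below node $i$ and then renormalized.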


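Optimization of $Q(R'|Z)$. $Q(R'|Z)$ is fully characterized by the parameters $\mu_{ij}$ and $\Omega_{ij}$. From the definition of $L_R$, we observe that $\partial J(Q) / \partial \Omega_{ij} = \partial L_R / \partial \Omega_{ij}$ and $\partial J(Q) / \partial \mu_{ij} = \partial L_R / \partial \mu_{ij}$. Since the $\Omega$'s are positive definite, from Eq. (A.9), it follows that

\[
\begin{aligned}
\frac{\partial L_R}{\partial \Omega_{ij}} = 0.5 \, \xi_{ij} \Big(
& -\mathrm{Tr}\{\Omega_{ij}^{-1}\} + \mathrm{Tr}\{\Sigma_{ij}^{-1}\} + \sum_{c \in V'} \xi_{ci} \mathrm{Tr}\{\Sigma_{ci}^{-1}\} \\
& + \sum_{p \in V'} \xi_{jp} \mathrm{Tr}\{\Sigma_{ij}^{-1}\} \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\}^{-\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{jp}\}^{\frac{1}{2}} \\
& + \sum_{c \in V'} \xi_{ci} \mathrm{Tr}\{\Sigma_{ci}^{-1}\} \, \mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ij}\}^{-\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ci}\}^{\frac{1}{2}} \Big).
\end{aligned} \tag{A.13}
\]

From $\partial L_R / \partial \Omega_{ij} = 0$, it is straightforward to derive the update equation for $\Omega_{ij}$ given by Eq. (2.17).

Next, to optimize the $\mu_{ij}$ parameters, from Eq. (A.9), we compute

\[
\begin{aligned}
\frac{\partial L_R}{\partial \mu_{ij}}
&= \frac{\partial}{\partial \mu_{ij}} \frac{1}{2} \sum_{i,j,p \in V'} \xi_{ij} \xi_{jp} (\mu_{ij} - \mu_{jp} - \mathbf{d}_{ij})^T \Sigma_{ij}^{-1} (\mu_{ij} - \mu_{jp} - \mathbf{d}_{ij}) \\
&= \sum_{c,p \in V'} \left[ \xi_{ij} \xi_{jp} \Sigma_{ij}^{-1} (\mu_{ij} - \mu_{jp} - \mathbf{d}_{ij}) - \xi_{ci} \xi_{ij} \Sigma_{ci}^{-1} (\mu_{ci} - \mu_{ij} - \mathbf{d}_{ci}) \right].
\end{aligned} \tag{A.14}
\]

Then, from $\partial L_R / \partial \mu_{ij} = 0$, it is straightforward to compute the update equation for $\mu_{ij}$ given by Eq. (2.16).

Optimization of $Q(Z)$. $Q(Z)$ is fully characterized by the parameters $\xi_{ij}$. From the definitions of $L_Z$, $L_X$, and $L_R$, we see that $\partial J(Q) / \partial \xi_{ij} = \partial (L_X + L_R + L_Z) / \partial \xi_{ij}$. Similar to the optimization of $Q_{ij}^{kl}$, we need to iteratively differentiate $L_X$ as follows:

\[
\partial L_X / \partial \xi_{ij} = \partial F_{ij} / \partial \xi_{ij} + \sum_{k \in M} (\partial G_i / \partial m_i^k)(\partial m_i^k / \partial \xi_{ij}), \tag{A.15}
\]

where $F_{ij}$ and $G_i$ are defined as in Eq. (A.10). Substituting the derivatives $\partial G_i / \partial m_i^k = -\log \lambda_i^k$, $\partial F_{ij} / \partial \xi_{ij} = \sum_{k,l \in M} Q_{ij}^{kl} m_j^l \log [Q_{ij}^{kl} / P_{ij}^{kl}]$, and $\partial m_i^k / \partial \xi_{ij} = \sum_{l \in M} Q_{ij}^{kl} m_j^l$ into Eq. (A.15), we obtain

\[
\frac{\partial L_X}{\partial \xi_{ij}} = \sum_{k,l \in M} Q_{ij}^{kl} m_j^l \Big( \log \frac{Q_{ij}^{kl}}{P_{ij}^{kl}} - \log \lambda_i^k \Big) = -\sum_{k,l \in M} Q_{ij}^{kl} m_j^l \log \Big( \sum_{a \in M} P_{ij}^{al} \lambda_i^a \Big) = -A_{ij}. \tag{A.16}
\]

Next, we differentiate $L_R$, given by Eq. (A.9), with respect to $\xi_{ij}$ as

\[
\begin{aligned}
\frac{\partial L_R}{\partial \xi_{ij}}
&= \frac{1}{2} \log \frac{|\Sigma_{ij}|}{|\Omega_{ij}|} - 1 + \frac{1}{2} \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\} \\
&\quad + \frac{1}{2} \sum_{p \in V'} \xi_{jp} \Big( \mathrm{Tr}\{\Sigma_{ij}^{-1} (\Omega_{jp} + M_{ijp})\} + 2 \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{ij}\}^{\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ij}^{-1} \Omega_{jp}\}^{\frac{1}{2}} \Big) \\
&\quad + \frac{1}{2} \sum_{c \in V'} \xi_{ci} \Big( \mathrm{Tr}\{\Sigma_{ci}^{-1} (\Omega_{ij} + M_{cij})\} + 2 \, \mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ci}\}^{\frac{1}{2}} \, \mathrm{Tr}\{\Sigma_{ci}^{-1} \Omega_{ij}\}^{\frac{1}{2}} \Big)
\end{aligned} \tag{A.17}
\]
\[
= B_{ij} - 1, \tag{A.18}
\]

where the indexes $c$, $j$ and $p$ denote children, parents and grandparents of node $i$, respectively. Further, from Eq. (A.3), we get

\[
\partial L_Z / \partial \xi_{ij} = 1 + \log (\xi_{ij} / r_{ij}). \tag{A.19}
\]

Finally, substituting Eqs. (A.16), (A.18) and (A.19) into $\partial J(Q) / \partial \xi_{ij} = 0$ and adding the Lagrange multiplier to account for the constraint $\sum_{j \in V'} \xi_{ij} = 1$, we solve for the update equation of $\xi_{ij}$ given by Eq. (2.18).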
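The last step can be sketched numerically. Setting the sum of the three derivatives to zero under the normalization constraint leads to an update of the form $\xi_{ij} \propto r_{ij} \exp(A_{ij} - B_{ij})$, normalized over the candidate parents $j$; this form is our reading of the stationarity condition and is presumed, not confirmed, to coincide with Eq. (2.18). The Python sketch below assumes that $r_{ij} > 0$ and that $A_{ij}$ and $B_{ij}$ have already been computed; the function name and the numbers are hypothetical.

```python
import math


def update_xi(r, A, B):
    """Normalized update xi_j proportional to r_j * exp(A_j - B_j).

    r, A, B are lists indexed by the candidate parents j of one node i.
    The maximum score is subtracted before exponentiation to avoid
    overflow or underflow; this does not change the normalized result.
    """
    scores = [math.log(rj) + a - b for rj, a, b in zip(r, A, B)]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# With A = B, the prior connection probabilities are recovered.
r = [0.2, 0.3, 0.5]
xi_prior = update_xi(r, [1.0, 1.0, 1.0], [1.0, 1.0, 1.0])

# A parent with a large A (good label fit) and a small B (good position fit)
# attracts most of the connection probability.
xi = update_xi(r, [5.0, 1.0, 1.0], [0.5, 2.0, 2.0])
```

Working with log-scores rather than raw products is a design choice of the sketch: the terms $A_{ij}$ and $B_{ij}$ are sums over many labels and nodes and can differ by large amounts between candidate parents.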


APPENDIX B
INFERENCE ON THE FIXED-STRUCTURE TREE

The inference algorithm for Maximum Posterior Marginal (MPM) estimation on the quad-tree is known to alleviate implementation issues related to underflow numerical error [33]. The whole procedure is summarized in Fig. B-1. The algorithm assumes that the tree structure is fixed and known. Therefore, in Fig. B-1, we simplify notation as $P(x_i | Z, Y) \to P(x_i | Y)$ and $P(x_i | x_j, Z) \to P(x_i | x_j)$. Also, we denote with $c(i)$ the children of $i$, and with $d(i)$ the set of all the descendants down the tree of node $i$, including $i$ itself. Thus, $Y_{d(i)}$ denotes the set of all observables down the subtree whose root is $i$. Also, for computing $P(x_i | Y_{d(i)})$ in the bottom-up pass, $\propto$ means that equality holds up to a multiplicative constant that does not depend on $x_i$.


Two-pass MPM estimation on the tree

Preliminary downward pass: $\forall i \in V^{L-1}, V^{L-2}, \ldots, V^0$:
    $P(x_i) = \sum_{x_j} P(x_i | x_j) P(x_j)$.

Bottom-up pass:
  Initialize leaf nodes: $\forall i \in V^0$:
    $P(x_i | y_i) \propto P(y_i | x_i) P(x_i)$,
    $P(x_i, x_j | y_i) = P(x_i | x_j) P(x_j) P(x_i | y_i) / P(x_i)$.
  Compute upward: $\forall i \in V^1, V^2, \ldots, V^L$:
    $P(x_i | Y_{d(i)}) \propto P(x_i) \prod_{c \in c(i)} \sum_{x_c} \frac{P(x_c | Y_{d(c)}) P(x_c | x_i)}{P(x_c)}$,
    $P(x_i, x_j | Y_{d(i)}) = P(x_i | x_j) P(x_j) P(x_i | Y_{d(i)}) / P(x_i)$.

Top-down pass:
  Initialize root: $i \in V^L$:
    $P(x_i | Y) = P(x_i | Y_{d(i)})$,
    $\hat{x}_i = \arg\max_{x_i} P(x_i | Y)$.
  Compute downward: $\forall i \in V^{L-1}, V^{L-2}, \ldots, V^0$:
    $P(x_i | Y) = \sum_{x_j} \frac{P(x_i, x_j | Y_{d(i)})}{\sum_{x_i} P(x_i, x_j | Y_{d(i)})} P(x_j | Y)$,
    $\hat{x}_i = \arg\max_{x_i} P(x_i | Y)$.

Figure B-1: Steps 2 and 5 in Fig. 3-2: MPM estimation on the fixed-structure tree. Distributions $P(y_i | x_i)$ and $P(x_i | x_j)$ are assumed known.
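The two passes of Fig. B-1 can be sketched in Python as follows. This is a minimal illustration, not the implementation used in the experiments: it assumes a single transition table shared by all edges, observables only at the leaves, and nodes numbered so that every parent precedes its children. The function name `mpm`, the data layout and the toy numbers are hypothetical.

```python
def mpm(parent, prior_root, T, lik):
    """Two-pass MPM estimation on a fixed tree.

    parent[i]  : index of the parent of node i (None for the root)
    prior_root : prior_root[a] = P(x_root = a)
    T          : T[a][b] = P(x_i = a | x_j = b) for child i and parent j
    lik        : lik[i][a] = P(y_i | x_i = a) for every leaf i
    Returns the posterior marginals P(x_i | Y) and the MPM labels.
    """
    n, K = len(parent), len(prior_root)
    children = [[] for _ in range(n)]
    for i, p in enumerate(parent):
        if p is not None:
            children[p].append(i)

    def normalize(v):
        s = sum(v)
        return [x / s for x in v]

    # Preliminary downward pass: prior marginals P(x_i).
    prior = [None] * n
    for i in range(n):
        p = parent[i]
        prior[i] = list(prior_root) if p is None else [
            sum(T[a][b] * prior[p][b] for b in range(K)) for a in range(K)]

    # Bottom-up pass: up[i][a] = P(x_i = a | Y_d(i)), children first.
    up = [None] * n
    for i in reversed(range(n)):
        if not children[i]:
            up[i] = normalize([lik[i][a] * prior[i][a] for a in range(K)])
            continue
        msg = []
        for a in range(K):
            m = prior[i][a]
            for c in children[i]:
                m *= sum(up[c][xc] * T[xc][a] / prior[c][xc]
                         for xc in range(K))
            msg.append(m)
        up[i] = normalize(msg)

    # Top-down pass: post[i][a] = P(x_i = a | Y), parents first.
    post = [None] * n
    for i in range(n):
        p = parent[i]
        if p is None:
            post[i] = up[i]
            continue
        # pair[a][b] = P(x_i = a, x_j = b | Y_d(i))
        pair = [[T[a][b] * prior[p][b] * up[i][a] / prior[i][a]
                 for b in range(K)] for a in range(K)]
        col = [sum(pair[a][b] for a in range(K)) for b in range(K)]
        post[i] = [sum(pair[a][b] / col[b] * post[p][b] for b in range(K))
                   for a in range(K)]

    labels = [max(range(K), key=lambda a: q[a]) for q in post]
    return post, labels


# Toy tree: root 0 with two leaf children 1 and 2, two labels.
parent = [None, 0, 0]
prior_root = [0.6, 0.4]
T = [[0.8, 0.3], [0.2, 0.7]]
lik = {1: [0.9, 0.2], 2: [0.4, 0.5]}
post, labels = mpm(parent, prior_root, T, lik)
```

Normalizing the upward quantities at every node, instead of propagating raw likelihoods, keeps all intermediate values in a moderate range, which is the property that alleviates numerical underflow on deep trees.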


REFERENCES

[1] W. E. L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 4, pp. 469-482, 1987.

[2] S. Z. Der and R. Chellappa, "Probe-based automatic target recognition in infrared imagery," IEEE Trans. Image Processing, vol. 6, no. 1, pp. 92-102, 1997.

[3] P. C. Chung, E. L. Chen, and J. B. Wu, "A spatiotemporal neural network for recognizing partially occluded objects," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 1991-2000, 1998.

[4] W. M. Wells, "Statistical approaches to feature-based object recognition," Intl. J. Computer Vision, vol. 21, no. 1, pp. 63-98, 1997.

[5] Z. Ying and D. Castanon, "Partially occluded object recognition using statistical models," Intl. J. Computer Vision, vol. 49, no. 1, pp. 57-78, 2002.

[6] S. Z. Li, Markov random field modeling in image analysis, Springer-Verlag, Tokyo, Japan, 2nd edition, 2001.

[7] M. H. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 8, pp. 1073-1078, 2004.

[8] A. Mittal and L. S. Davis, "M2tracker: a multi-view approach to segmenting and tracking people in a cluttered scene," Intl. J. Computer Vision, vol. 51, no. 3, pp. 189-203, 2003.

[9] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[10] F. Dell'Acqua and R. Fisher, "Reconstruction of planar surfaces behind occlusions in range images," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 569-575, 2002.

[11] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 2, pp. 264-271, IEEE, Inc.

[12] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Analysis Machine Intelligence, vol. 23, no. 4, pp. 349-361, 2001.


[13] M. Weber, M. Welling, and P. Perona, "Towards automatic discovery of object categories," in Proc. IEEE Conf. Comp. Vision Pattern Rec., Hilton Head Island, SC, 2000, vol. 2, pp. 101-109, IEEE, Inc.

[14] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th European Conf. Comp. Vision, Dublin, Ireland, 2000, vol. 1, pp. 18-32, Springer.

[15] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio, "Categorization by learning and combining object parts," in Advances in neural information processing systems, 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., vol. 2, pp. 1239-1245. MIT Press, Cambridge, MA, 2002.

[16] P. F. Felzenszwalb and Daniel P. Huttenlocher, "Pictorial structures for object recognition," Intl. J. of Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.

[17] H. Schneiderman and T. Kanade, "Object detection using the statistics of parts," Intl. J. Computer Vision, vol. 56, no. 3, pp. 151-177, 2004.

[18] S. C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 6, pp. 691-712, 2003.

[19] S. C. Zhu, Y. N. Wu, and D. B. Mumford, "Minimax entropy principle and its applications to texture modeling," Neural Computation, vol. 9, no. 8, pp. 1627-1660, 1997.

[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721-741, 1984.

[21] A. Efros and T. Leung, "Texture synthesis by non-parametric sampling," in Proc. Intl. Conf. Computer Vision, Kerkyra, Greece, 1999, vol. 2, pp. 1033-1038, IEEE, Inc.

[22] J. S. De Bonet and P. Viola, "Texture recognition using a non-parametric multi-scale statistical model," in Proc. IEEE Conf. Computer Vision Pattern Rec., Santa Barbara, CA, 1998, pp. 641-647, IEEE, Inc.

[23] M. J. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 828-836, 2003.

[24] J. Coughlan and A. Yuille, "Algorithms from statistical physics for generative models of images," Image and Vision Computing, vol. 21, no. 1, pp. 29-36, 2003.
[25] S. Kumar and M. Hebert, "Discriminative random fields: a discriminative framework for contextual interaction in classification," in Proc. IEEE Intl. Conf. Comp. Vision, Nice, France, 2003, vol. 2, pp. 1150-1157, IEEE, Inc.

[26] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Proc. Intl. Conf. Machine Learning, Williams College, MA, 2001, pp. 282-289.

[27] C. A. Bouman and M. Shapiro, "A multiscale random field model for Bayesian image segmentation," IEEE Trans. Image Processing, vol. 3, no. 2, pp. 162-177, 1994.


[28] W. W. Irving, P. W. Fieguth, and A. S. Willsky, "An overlapping tree approach to multiscale stochastic modeling and estimation," IEEE Trans. Image Processing, vol. 6, no. 11, pp. 1517-1529, 1997.

[29] H. Cheng and C. A. Bouman, "Multiscale Bayesian segmentation using a trainable context model," IEEE Trans. Image Processing, vol. 10, no. 4, pp. 511-525, 2001.

[30] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using Hidden Markov Models," IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886-902, 1998.

[31] X. Feng, C. K. I. Williams, and S. N. Felderhof, "Combining belief networks and neural networks for scene segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 467-483, 2002.

[32] S. Todorovic and M. C. Nechyba, "Towards intelligent mission profiles of Micro Air Vehicles: multiscale Viterbi classification," in Proc. 8th European Conf. Computer Vision, Prague, Czech Republic, 2004, vol. 2, pp. 178-189, Springer.

[33] J.-M. Laferté, P. Pérez, and F. Heitz, "Discrete Markov image modeling and inference on the quadtree," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 390-404, 2000.

[34] M. R. Luettgen and A. S. Willsky, "Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination," IEEE Trans. Image Processing, vol. 4, no. 2, pp. 194-207, 1995.

[35] P. L. Ainsleigh, N. Kehtarnavaz, and R. L. Streit, "Hidden Gauss-Markov models for signal classification," IEEE Trans. Signal Processing, vol. 50, no. 6, pp. 1355-1367, 2002.

[36] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, CA, 1988.

[37] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inform. Theory, vol. 49, no. 5, pp. 1120-1146, 2003.

[38] Brendan J. Frey, Graphical models for machine learning and digital communication, The MIT Press, Cambridge, MA, 1998.

[39] S. Kumar and M. Hebert, "Man-made structure detection in natural images using a causal multiscale random field," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 119-126, IEEE, Inc.
[40] M. K. Schneider, P. W. Fieguth, W. C. Karl, and A. S. Willsky, "Multiscale methods for the segmentation and reconstruction of signals and images," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 456-468, 2000.

[41] J. Li, R. M. Gray, and R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two-dimensional Hidden Markov Models," IEEE Trans. Inform. Theory, vol. 46, no. 5, pp. 1826-1841, 2000.


77 [42]W.K.Konen,T.Maurer,andC.vonderMalsburg,\Afastdy namiclinkmatching algorithmforinvariantpatternrecognition," NeuralNetworks ,vol.7,no.6-7,pp. 1019{1030,1994. [43]A.Montanvert,P.Meer,andA.Roseneld,\Hierarchica limageanalysisusingirregular tessellations," IEEETrans.PatternAnal.MachineIntell. ,vol.13,no.4,pp.307{316, 1991. [44]P.BertolinoandA.Montanvert,\Multiresolutionsegm entationusingtheirregular pyramid,"in Proc.Intl.ConfImageProcessing ,Lausanne,Switzerland,1996,vol.1, pp.257{260,IEEE,Inc. [45]N.J.Adams,A.J.Storkey,Z.Ghahramani,andC.K.I.Wil liams,\MFDTs:mean elddynamictrees,"in Proc.15thIntl.Conf.PatternRec. ,Barcelona,Spain,2000, vol.3,pp.147{150,Intl.Assoc.PatternRec. [46]N.J.Adams, Dynamictrees:ahierarchicalprobabilisticapproachtoim agemodeling Ph.D.dissertation,DivisionofInformatics,Univ.ofEdin burgh,Edinburgh,UK,2001. [47]A.J.Storkey,\Dynamictrees:astructuredvariationa lmethodgivingecientpropagationrules,"in Uncertaintyinarticialintelligence ,C.BoutilierandM.Goldszmidt, Eds.,pp.566{573.MorganKauamnn,SanFrancisco,CA,2000 [48]A.J.StorkeyandC.K.I.Williams,\Imagemodelingwith position-encodingdynamic trees," IEEETrans.PatternAnal.MachineIntell. ,vol.25,no.7,pp.859{871,2003. [49]M.I.Jordan,Ed., Learningingraphicalmodels(adaptivecomputationandmac hine learning) ,MITpress,Cambridge,MA,1999. [50]M.I.Jordan,\Graphicalmodels," StatisticalScience(spec.issueonBayesianstatistics) ,vol.19,pp.140{155,2004. [51]A.P.Dempster,N.M.Laird,andD.B.Rubin,\Maximumlik elihoodfromincomplete dataviatheEMalgorithm," JournaloftheRoyalStatisticalSocietyB ,vol.39,pp. 1{39,1977. [52]G.J.McLachlanandK.T.Thriyambakam, TheEMalgorithmandextensions ,John Wiley&Sons,NewYork,NY,1996. [53]D.M.ChickeringandD.Heckerman,\Ecientapproximat ionsforthemarginallikelihoodofincompletedatagivenaBayesiannetwork,"in Proc.12thConf.Uncertainty ArticialIntelligence ,Portland,OR,1996,pp.158{168,Assoc.UncertaintyArti cial Intelligence. 
[54]S.TodorovicandM.C.Nechyba,\Interpretationofcomp lexscenesusinggenerative dynamic-structuredmodels,"in CD-ROMProc.IEEECVPR2004,Workshopon Generative-ModelBasedVision(GMBV) ,Washington,DC,2004,IEEE,Inc. [55]S.TodorovicandM.C.Nechyba,\Detectionofarticial structuresinnatural-scene imagesusingdynamictrees,"in Proc.17thIntl.Conf.PatternRec. ,Cambridge,UK, 2004,pp.35{39,Intl.Assoc.PatternRec.


78 [56]M.AitkinandD.B.Rubin,\Estimationandhypothesiste stinginnitemixture models," J.RoyalStat.Soc. ,vol.B-47,no.1,pp.67{75,1985. [57]R.M.Neal,\ProbabilisticinferenceusingMarkovChai nMonteCarlomethods,"Tech. Rep.CRG-TR-93-1,ConnectionistResearchGroup,Univ.ofT oronto,1993. [58]D.A.Forsyth,J.Haddon,andS.Ioe,\Thejoyofsamplin g," Intl.J.Computer Vision ,vol.41,no.1-2,pp.109{134,2001. [59]M.I.Jordan,Z.Ghahramani,T.S.Jaakkola,andL.K.Sau l,\Anintroduction tovariationalmethodsforgraphicalmodels," MachineLearning ,vol.37,no.2,pp. 183{233,1999. [60]D.J.C.MacKay, Informationtheory,inference,andlearningalgorithms ,Cambridge Univ.Press,Cambridge,UK,2003. [61]D.BarberandP.vandeLaar,\Variationalcumulantexpa ntionsforintractabledistributions," J.ArticialIntell.Research ,vol.10,pp.435{455,1999. [62]D.J.C.MacKay, Informationtheory,inference,andlearningalgorithms ,chapter29, pp.357{386,CambridgeUniversityPress,Cambridge,UK,20 03. [63]D.J.C.MacKay,\IntroductiontoMonteCarlomethods," in Learningingraphical models(adaptivecomputationandmachinelearning) ,M.I.Jordan,Ed.,pp.175{204. MITpress,Cambridge,MA,1999. [64]T.S.Jaakkola,\Tutorialonvariationalapproximatio nmethods,"in Adv.MeanField Methods ,M.OpperandD.Saad,Eds.,pp.129{161.MITpress,Cambridg e,MA,2000. [65]T.M.CoverandJ.A.Thomas, Elementsofinformationtheory ,WileyInterscience Press,NewYork,NY,1991. [66]TrygveRandenandHakonHusoy,\Filteringfortexturec lassication:acomparative study," IEEETrans.PatternAnal.MachineIntell. ,vol.21,no.4,pp.291{310,1999. [67]StephaneMallat, Awavelettourofsignalprocessing ,AcademicPress,SanDiego,CA, 2ndedition,2001. [68]StephaneG.Mallat,\Atheoryformultiresolutionsign aldecomposition:thewavelet representation," IEEETrans.PatternAnal.MachineIntell. ,vol.11,no.7,pp.674{ 693,1989. [69]JeromeM.Shapiro,\Embeddedimagecodingusingzerotr eesofwaveletcoecients," IEEETrans.onSignalProcessing ,vol.41,no.12,pp.3445{3462,1993. 
[70]N.G.Kingsbury,\Complexwaveletsforshiftinvariant analysisandlteringofsignals," J.AppliedComp.HarmonicAnalysis ,vol.10,no.3,pp.234{253,2001. [71]MichaelUnser,\Textureclassicationandsegmentati onusingwaveletframes," IEEE Trans.onImageProcessing ,vol.4,no.11,pp.1549{1560,1995. [72]NickKingsbury,\Complexwaveletsforshiftinvariant analysisandlteringofsignals," JournalofAppliedandComputationalHarmonicAnalysis ,vol.10,no.3,pp.234{253, 2001.


79 [73]T.Lindeberg,\Scale-spacetheory:abasictoolforana lysingstructuresatdierent scales," J.AppliedStatistics ,vol.21,no.2,pp.224{270,1994. [74]D.G.Lowe,\Distinctiveimagefeaturesfromscale-inv ariantkeypoints," Intl.J. ComputerVision ,vol.60,no.2,pp.91{110,2004. [75]S.Belongie,J.Malik,andJ.Puzicha,\Shapematchinga ndobjectrecognitionusing shapecontexts," IEEETrans.PatternAnal.MachineIntell. ,vol.24,no.4,pp.509{ 522,2002. [76]B.J.Frey,N.Jojic,andA.Kannan,\Learningappearanc eandtransparencymanifolds ofoccludedobjectsinlayers,"in Proc.IEEEConf.ComputerVisionPatternRec. Madison,WI,2003,vol.1,pp.45{52,IEEE,Inc. [77]G.JonesIIIandB.Bhanu,\Recognitionofarticulateda ndoccludedobjects," IEEE Trans.PatternAnal.MachineIntell. ,vol.21,no.7,pp.603{613,1999. [78]J.S.Yedidia,W.T.Freeman,andY.Weiss,\Generalized beliefpropagation,"in Advancesinneuralinformationprocessingsystems13 ,T.K.Leen,T.G.Dietterich, andV.Tresp,Eds.,pp.689{695.MITPress,Cambridge,MA,20 01. [79]Y.FreundandR.E.Schapire,\Adecision-theoreticgen eralizationofon-linelearning andanapplicationtoboosting," J.ComputerSystemSciences ,vol.55,no.1,pp. 119{139,1997. [80]V.N.Vapnik, Statisticallearningtheory ,JohnWiley&Sons,Inc.,NewYork,NY, 1998. [81]P.Domingos,\MetaCost:ageneralmethodformakingcla ssierscost-sensitive,"in Proc.15thIntl.Conf.KnowledgeDiscoveryDataMining ,SanDiego,CA,1999,pp. 155{164,ACMPress. [82]T.G.Dietterich,\Machinelearningforsequentialdat a:areview,"in Lecturenotes incomputerscience ,T.Caelli,Ed.,vol.2396,pp.15{30.Springer-Verlag,Hei delberg, Germany,2002.


BIOGRAPHICAL SKETCH

Sinisa Todorovic was born in Belgrade, Serbia, in 1968. He graduated from Mathematical High School–Belgrade in 1987. He received his B.S. degree in electrical and computer engineering at the University of Belgrade, Serbia, in 1994. From 1994 until 2001, he worked as a software engineer in the communications industry. In fall 2001, Sinisa Todorovic enrolled in the master's degree program at the Department of Electrical and Computer Engineering, University of Florida, Gainesville. He became a member of the Center for Micro Air Vehicle Research, where he conducted research in statistical image modeling and multi-resolution signal processing. Sinisa Todorovic earned his master's degree (M.S. thesis option) in December, 2002, after which he continued his studies toward a Ph.D. degree in the same Department. He received two certificates for outstanding academic accomplishment in 2002 and 2003. He expects to graduate in May, 2005.