## Citation

- Permanent Link: https://ufdc.ufl.edu/UFE0010089/00001
## Material Information

- Title: Irregular-Structure Tree Models for Image Interpretation
- Creator: TODOROVIC, SINISA (Author, Primary)
- Copyright Date: 2008
## Subjects

- Subjects / Keywords (jstor): Approximation, Datasets, Graphics, Image classification, Inference, Learning, Logical givens, Pixels, Plant roots, Statistical models
## Record Information

- Source Institution: University of Florida
- Holding Location: University of Florida
- Rights Management: Copyright Sinisa Todorovic. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Embargo Date: 5/1/2005
- Resource Identifier: 71303361 (OCLC)

## Full Text

IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

By

SINISA TODOROVIC

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Michael Nechyba for his wise and patient guidance of my research for this dissertation. As my former advisor, Dr. Nechyba directed, but on no account confined, my interests. I especially appreciate his readiness and expertise in helping me solve numerous research issues. Most importantly, I am thankful for the friendship that we have developed working together on this work.

Also, I thank my current advisor, Dr. Dapeng Wu, for his extra effort to help me finalize my PhD. I am grateful for his invaluable pieces of advice in choosing my future research goals, as well as for the practical, concrete steps that he undertook to help me find a position.

My thanks also go to Dr. Jian Li, who helped me a great deal in the transition period in which I had to change my advisor. Her research group provided a stimulating environment for me to investigate areas beyond the work reported in this dissertation.

Also, I thank Dr. Antonio Arroyo, whose inspiring lectures on machine intelligence encouraged me to do research in the field of machine learning. As the director of the Machine Intelligence Lab (MIL), Dr. Arroyo has created a warm, friendly, and hard-working atmosphere among the "MIL-ers." Thanks to him, I decided to join the MIL, which has proved on numerous occasions to be the right decision. I thank all the members of the MIL for their friendship and support.

I thank Dr. Takeo Kanade and Dr. Andrew Kurdila for sharing their research insights on the micro air vehicle (MAV) project with me. The multidisciplinary environment of this project, in which I had a chance to collaborate with researchers from diverse educational backgrounds, was a great experience for me.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
KEY TO ABBREVIATIONS
KEY TO SYMBOLS
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Part-Based Object Recognition
  1.2 Probabilistic Framework
  1.3 Tree-Structured Generative Models
  1.4 Learning Tree Structure from Data is an NP-hard Problem
  1.5 Our Approach to Image Interpretation
  1.6 Contributions
  1.7 Overview

2 IRREGULAR TREES WITH RANDOM NODE POSITIONS
  2.1 Model Specification
  2.2 Probabilistic Inference
  2.3 Structured Variational Approximation
    2.3.1 Optimization of Q(X|Z)
    2.3.2 Optimization of Q(R'|Z)
    2.3.3 Optimization of Q(Z)
  2.4 Inference Algorithm and Bayesian Estimation
  2.5 Learning Parameters of the Irregular Tree with Random Node Positions
  2.6 Implementation Issues

3 IRREGULAR TREES WITH FIXED NODE POSITIONS
  3.1 Model Specification
  3.2 Inference of the Irregular Tree with Fixed Node Positions
  3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

4 COGNITIVE ANALYSIS OF OBJECT PARTS
  4.1 Measuring Significance of Object Parts
  4.2 Combining Object-Part Recognition Results

5 FEATURE EXTRACTION
  5.1 Texture
    5.1.1 Wavelet Transform
    5.1.2 Wavelet Properties
    5.1.3 Complex Wavelet Transform
    5.1.4 Difference-of-Gaussian Texture Extraction
  5.2 Color

6 EXPERIMENTS AND DISCUSSION
  6.1 Unsupervised Image Segmentation Tests
  6.2 Tests of Convergence
  6.3 Image Classification Tests
  6.4 Object-Part Recognition Strategy

7 CONCLUSION
  7.1 Summary of Contributions
  7.2 Opportunities for Future Work

APPENDIX

A DERIVATION OF VARIATIONAL APPROXIMATION
B INFERENCE ON THE FIXED-STRUCTURE TREE

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

5-1 Coefficients of the filters used in the Q-shift DTCWT
6-1 Root-node distance error
6-2 Pixel segmentation error
6-3 Object detection error
6-4 Object recognition error
6-5 Pixel labeling error
6-6 Object recognition error for IQTvo
6-7 Pixel labeling error for IQTvo

LIST OF FIGURES

1-1 Variants of TSBNs
1-2 An irregular tree consists of a forest of subtrees
1-3 Bayesian estimation of the irregular tree
2-1 Two types of irregular trees
2-2 Pixel clustering using irregular trees
2-3 Irregular tree learned for the 4x4 image in (a)
2-4 Inference of the irregular tree given Y, R0, and Θ
3-1 Choice of candidate parents
3-2 Inference of the irregular tree with fixed node positions
3-3 Algorithm for learning the parameters of the irregular tree
4-1 For each subtree of ITv, representing an object in the 128 x 128 image
4-2 For each subtree of ITv, representing an object in the 256 x 256 image
5-1 Two levels of the DWT of a two-dimensional signal
5-2 The original image (left) and its two-scale dyadic DWT (right)
5-3 The Q-shift Dual-Tree CWT
5-4 The CWT is strongly oriented at angles 15, 45, 75
6-1 20 image classes in type I and II datasets
6-2 Image segmentation using ITvo
6-3 Image segmentation using ITvo: (top) dataset I images
6-4 Image segmentation by irregular trees learned using SVA
6-5 Image segmentation by irregular trees learned using SVA: (a) ITvo
6-6 Image segmentation using ITv
6-7 Comparison of inference algorithms
6-8 Typical convergence rate of the inference algorithm for ITvo on the 128 x 128 image
6-9 Typical convergence rate of the inference algorithm for ITvo on the 256 x 256 image
6-10 Percentage increase in log-likelihood
6-11 Comparison of classification results for various statistical models
6-12 MAP pixel labeling using different statistical models
6-13 ROC curves for the image in Fig. 6-12a with ITvo, TSBN, DRF and MRF
6-14 ROC curves for the image in Fig. 6-12a with ITv, ITvo, TSBN, and TSBNT
6-15 Comparison of two recognition strategies
6-16 Recognition results over dataset IV for IQTvo
6-17 Recognition results over dataset V for IQTvo
6-18 Classification using the part-object recognition strategy
B-1 Steps 2 and 5 in Fig. 3-2
KEY TO ABBREVIATIONS

The list shown below gives a description of the frequently used acronyms or abbreviations in this work.

B: blue channel of the RGB color space
G: green channel of the RGB color space
R: red channel of the RGB color space
IQTV: irregular tree with fixed node positions, and with observables present at all levels
IQTvo: irregular tree with fixed node positions, and with observables present only at the leaf level
ITvo: irregular tree where observables are present only at the leaf level
ITV: irregular tree where observables are present at all levels
g: normalized green channel
r: normalized red channel
CWT: Complex Wavelet Transform
DRF: Discriminative Random Field
DTCWT: Dual-Tree Complex Wavelet Transform
DWT: Discrete Wavelet Transform
EM: Expectation-Maximization algorithm
KL: Kullback-Leibler divergence
MAP: Maximum A Posteriori
MCMC: Markov Chain Monte Carlo method
ML: Maximum Likelihood
MPM: Maximum Posterior Marginal
MRF: Markov Random Field
NP: nondeterministic polynomial time
RGB: the color space that consists of red, green and blue color values
ROC: receiver operating characteristic
SVA: structured variational approximation inference algorithm
TSBN: tree-structured belief network
VA: variational approximation inference algorithm

KEY TO SYMBOLS

The list shown below gives a brief description of the major mathematical symbols defined in this work. (Symbols lost in the scan are marked "…".)

Aij: influence of observables Y on zij
Bij: influence of the geometric properties of the network on zij
G: number of components in a Gaussian mixture
Hi: Shannon's entropy of node i
F(Q, P): free energy
L: maximum number of levels in the irregular tree
…: set of image classes (i.e., object appearances)
…: conditional probability tables
…: approximate conditional probability tables, given Y and R0
R: positions of all nodes in the irregular tree
R': positions of non-leaf nodes in the irregular tree
R0: positions of leaf nodes in the irregular tree
V: set of all nodes in the irregular tree
V': set of all non-leaf nodes in the irregular tree
V0: set of all leaf nodes in the irregular tree
X: random vector of all image classes
Y: all observables
Z: connectivity random matrix
C: cost function
…: set of candidate parents in the irregular tree with fixed node positions
Σ: covariance matrix of a relative child-parent displacement (ri − rj)
Θ: set of parameters that characterize an irregular tree
…: approximate covariance of ri, given that j is the parent of i, and given Y and R0
…: approximate mean of ri, given that j is the parent of i, and given Y and R0
p(i): location of an observable random vector in the image plane
ℓ: index of levels in the irregular tree
…: probability of a node i being the child of j
h: normalization constant
θ: set of parameters that characterize a Gaussian mixture
…: approximate probability of i being the child of j, given Y and R0
…: posterior probability that node i is labeled as image class k, given Y and R0
xi: image class of node i
xik: image-class indicator if class k is assigned to node i
zij: connectivity indicator random variable between nodes i and j
dij: mean of the relative displacement ri − rj
ri: position of node i in the image
Yp(i): observable random vector

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

By

Sinisa Todorovic

Chair: Dapeng Wu
Major Department: Electrical and Computer Engineering

In this dissertation, we seek to accomplish the following related goals: (1) to propose a unified framework to address localization, detection, and recognition of objects as three sub-tasks of image interpretation, and (2) to propose a sufficiently general and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image. The second problem is to date an open problem in computer vision, eluding a satisfactory solution.

For this purpose, we formulate object recognition as Bayesian estimation, whereby class labels with the maximum posterior probability are assigned to each pixel. To efficiently estimate the posterior distribution of image classes, we propose to model images with generative statistical models known as irregular trees. The irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to infer the optimal model structure, as well as the posterior distribution of image classes. We propose several inference algorithms as a solution to this NP-hard (nondeterministic polynomial time) problem, which can be viewed as variants of the Expectation-Maximization (EM) algorithm. After inference, the model forms a forest of subtrees, each of which segments the image.
That is, inference of model structure provides a solution to object localization and detection.

With respect to our second goal, we hypothesize that for successful occluded-object recognition it is critical to explicitly analyze visible object parts. Irregular trees are convenient for such analysis, because the treatment of object parts represents merely a particular interpretation of the tree/subtree structure. We analyze the significance of irregular-tree nodes, representing object parts, with respect to recognition of an object as a whole. This significance is then exploited toward the ultimate object recognition.

Experimental results demonstrate that irregular trees model images more accurately than their fixed-structure counterparts, quad-trees. Also, the experiments reported herein show that our explicit treatment of object parts results in improved recognition performance, as compared to strategies in which object parts are not explicitly accounted for.

CHAPTER 1
INTRODUCTION

Image interpretation is a difficult challenge that has long been confronting the computer-vision community. A number of factors contribute to the complexity of this problem. The most critical is inherent uncertainty in how the observed visual evidence in images should be attributed to infer object types and their relationships. In addition to video noise, there are various sources of this uncertainty, including variations in camera quality and position, wide-ranging illumination conditions, extreme scene diversity, and the randomness of object appearances, clutter and locations in scenes.

One of the critical hindrances to successful image interpretation is that objects may occlude each other in a complex scene. In the literature, the initial research on the interpretation of scenes with occlusions appeared in the early nineties. However, in the last decade a relatively small volume of related literature has been published.
In fact, a majority of the recently proposed vision systems are not directly aimed at solving the problem of occluded-object recognition; experiments on images with occlusions are reported as a side result, only to illustrate the versatility of those systems. This suggests that recognition of partially occluded objects is an open problem in computer vision, which motivates us to seek its solution in this dissertation.

In the initial work, local features (e.g., points, line and curve segments) are used to represent objects, allowing the unoccluded features to be matched with object features by computing a scalar measure of model fit [1,2,3]. The unmatched scene features are modeled as spurious features, and the unmatched object features indicate the occluded part of the object. The matching score is either the number of matched object features or the sum of a Gaussian-weighted matching error. The main limitation of these approaches is that they do not account for the spatial correlation among occlusions.

Statistical approaches to occluded-object recognition have also been reported in the literature. For instance, Wells [4], and Ying and Castanon [5] propose probabilistic models to characterize scene features and the correspondence between scene and object features. The authors model both object-feature uncertainty and the probability that the object features are occluded in the scene. They introduce two statistical models for occlusion. One model assumes that each feature can be occluded independently of whether any other features are occluded, whereas the second model accounts for the spatial correlation to represent the extent of occlusion. The spatial correlation is computed using a Markov Random Field (MRF) model with a Gibbs distribution [6]. The main drawback of these systems is a prohibitive computational load; the run-time of these algorithms is exponential in the number of objects to be recognized.
Other related work exploits auxiliary information provided, for example, by image sequences or stereo views of the same scene [7,8,9,10,11,5], where occlusions are transitory. Since this information in general may not be available, and/or occlusions may remain permanent, in our approach we do not use the strategies of these systems.

A review of the related literature also suggests that the majority of vision systems are designed to deal with only one constrained vision task, such as, for example, image segmentation [10,11,5]. However, to conduct image interpretation, as is our goal, it is necessary to perform three related tasks: (1) localization, (2) detection (also called image segmentation), and (3) ultimate recognition of object appearances (also called image classification). Further, in many systems in which the three sub-tasks are addressed, this is not done in a unified manner. Here, as a drawback, the system's architecture comprises a serial connection of separate modules, without any feedback on the accuracy of the ultimate recognition. Moreover, vision systems are typically designed to recognize only a specific instance of object classes appearing in the image (e.g., a face), which, in turn, is assumed dissimilar to other objects in the image. However, the assumption of uniqueness of the target class may not be appropriate in many settings. Also, the success of these systems usually depends on ad hoc fine-tuning of the feature-extraction methods and system parameters, optimized for that unique target class. With current demands to design systems capable of handling thousands of image classes simultaneously, it would be difficult to generalize the outlined approaches.

The small volume of published research addressing occlusions in images suggests that the problem is not fully examined.
Also, the drawbacks of the above systems (namely, constrained goals and settings of operation, poor spatial modeling of occlusion, and prohibitive computational load) motivated us to conduct the research reported herein.

Our motivation is that most object classes seem to be naturally described by a few characteristic parts or components and their geometrical relation. We hypothesize that it is not the percentage of occlusion that is critical for object recognition, but rather which object parts are occluded. Not all components of an object are equally important for its recognition, especially when that object is partially occluded. Given two similar objects in the image, the visible parts of one object may mislead the algorithm to recognize it as its counterpart. Therefore, careful consideration should be given to the analysis of detected visible object parts. One of the benefits of such analysis is the flexibility to develop various recognition strategies that weigh the information obtained from the detected object parts more judiciously. In the following section, we review some of the reported part-based object-recognition strategies.

1.1 Part-Based Object Recognition

Recently, there has been a flurry of research related to part-based object recognition. For example, Mohan et al. [12] use separate classifiers to detect heads, arms, and legs of people in an image, and a final classifier to decide whether a person is present. However, the approach requires object parts to be manually defined and separated for training the individual part classifiers. To build a system that is easily extensible to deal with different objects, it is important that the part-selection procedure be automated. One approach in this direction is developed by Weber et al. [13,14].
The authors assume that an object is composed of parts and shape, where parts are image patches, which may be detected and characterized by appropriate detectors, and shape describes the geometry of the mutual position of the parts in a way that is invariant with respect to rigid and, possibly, affine transformations. The authors propose a joint probability density over part appearances and shape that models the object class. This framework is appealing in that it naturally allows for parts of different sizes and resolutions. However, due to computational issues, to learn the joint probability density, the authors heuristically choose a small number of parts per object class, rendering the density unreliable in the case of large variations across images.

Probabilistic detection of object parts has also been reported. For instance, Heisele et al. [15] propose to learn object components from a set of examples based on their discriminative power and their robustness against pose and illumination changes. For this purpose, they use Support Vector Machines. Also, Felzenszwalb and Huttenlocher [16] represent an object by a collection of parts arranged in a deformable configuration. In their approach, the appearance of each part is modeled separately by Gaussian-mixture distributions, and the deformable configuration is represented by spring-like connections between pairs of parts. The main problem of the mentioned approaches is that they lack the analysis of object parts through scales. It is assumed that parts cannot contain other sub-parts, and that objects are unions of mutually exclusive components, which is hard to justify for more complex object classes.

To address the analysis of object parts through scales, Schneiderman and Kanade [17] propose a trainable multi-stage object detector composed of classifiers, each making a decision about whether to cease evaluation, labeling the input as non-object, or to continue further evaluation.
The detector orders these stages of evaluation from a low-resolution to a high-resolution search of the image.

The aforementioned approaches are not suitable for recognition of a large number of object classes. As the number of classes increases, there is a combinatorial explosion in the number of their parts (i.e., image patches) that need to be evaluated by appropriate detectors. In this dissertation, we seek a solution to the outlined problems. Our goal is to design a vision system that would analyze multiple object classes through their constituent, "meaningful" parts at a number of different resolutions. To this end, we resort to a probabilistic framework, as discussed in the following section.

1.2 Probabilistic Framework

We formulate image interpretation as inference of a posterior distribution over pixel random fields for a given image. Once the posterior distribution of image classes is inferred, each pixel can be labeled through Bayesian estimation (e.g., maximum a posteriori, MAP). Within this framework, it is necessary to specify the following:

1. The probability distribution of image classes over pixel random fields,
2. The inference algorithms for computing the posterior distribution of image classes,
3. Bayesian estimation for ultimate pixel labeling, that is, object recognition.

Our principal challenge lies in choosing a statistical model for specifying the probability distribution of image classes, since this choice conditions the formulation of inference and Bayesian estimation. A suitable model should be computationally manageable, and sufficiently expressive to represent a wide range of patterns in images. A review of the literature offers four broad classes of models [18]. The descriptive models are constructed based on statistical descriptions of image ensembles with variables only at one level (e.g., [19,20]).
The pseudo-descriptive models reduce the computational cost of descriptive models by imposing partial (or even linear) order among random variables (e.g., [21,22]). The generative models consist of observable and hidden variables, where hidden variables represent a finite number of bases generating an image (e.g., [23,24]). The discriminative models directly encode the posterior distribution of hidden variables given observables (e.g., [25,26]).

The available models differ in structural complexity and difficulty of inference. At one end lie descriptive models, which build statistical descriptions of image ensembles only at the observable (i.e., pixel) level. Other modeling paradigms (i.e., generative, discriminative) impose varying levels of structure through the introduction of hidden variables. However, no principled formulation exists, as of yet, to suggest one approach superior to the others. Therefore, our choice of model is guided by the goal to interpret scenes with partially occluded, alike objects. We seek a model that offers a viable means of recognizing partially occluded objects through recognition of their visible constituent parts. Thus, a prospective model should allow for analysis of object parts towards recognition of objects as a whole. To alleviate the computational complexity arising from the treatment of multiple object parts of multiple objects in images, we seek a model that is capable of modeling both whole objects and their sub-parts in a unified manner. That is, a candidate model must be expressive enough to capture component-subcomponent relationships among regions in an image. To accomplish this, it is necessary to analyze pixel neighborhoods of varying size. The literature abounds with reports on successful applications of multiscale statistical models for this purpose [27,28,29,30,31,32]. Following these trends, we choose the irregular tree-structured belief network, or, for short, the irregular tree.
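The MAP labeling step of the framework in Section 1.2 reduces to a per-pixel argmax once the posterior over image classes is available. The sketch below is purely illustrative (it is not the dissertation's implementation, and the array layout is an assumption): given a posterior of shape (height, width, K classes), each pixel receives the class with the maximum posterior probability.

```python
import numpy as np

def map_labeling(posterior):
    """Assign each pixel the image class with maximum posterior probability.

    posterior: array of shape (H, W, K) holding P(class k | observables)
    per pixel. Returns an (H, W) array of class indices.
    """
    return np.argmax(posterior, axis=-1)

# Toy example: a 2x2 image and K = 3 hypothetical image classes.
post = np.array([[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]],
                 [[0.3, 0.3, 0.4], [0.2, 0.5, 0.3]]])
labels = map_labeling(post)
print(labels)  # [[0 1]
               #  [2 1]]
```

Other Bayesian estimators fit the same mold; for instance, the MPM criterion listed in the Key to Abbreviations also reduces to an argmax, but over per-node marginals rather than the joint posterior.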
Our choice is directly driven by our image-interpretation strategy and goals, and appears better suited than alternative statistical approaches. Descriptive models lack the necessary structure for the component-subcomponent representation we seek to exploit. Discriminative approaches directly model the posterior distribution of hidden variables given observables. Consequently, they lose the convenience of assigning physical meaning to the statistical parameters of the model. In contrast, irregular trees can detect objects and their parts simultaneously, as discussed in the following chapters. Before we continue to present our approach to image interpretation, we give a brief overview of tree-structured generative models in the following section.

1.3 Tree-Structured Generative Models

Recently, there has been a flurry of research in the field of tree-structured generative models, also known as tree-structured belief networks (TSBNs) [27,33,28,29,30,31,32]. These models provide a systematic way to describe random processes/fields and have extremely efficient and statistically optimal inference algorithms. Tree-structured belief networks are characterized by a fixed balanced tree structure of nodes representing hidden (latent) and observable random variables. We focus on TSBNs whose hidden variables take discrete values, though TSBNs can model even continuously valued Gaussian processes [34,35]. The edges of TSBNs represent parent-child (Markovian) dependencies between neighboring layers of hidden variables, while hidden variables belonging to the same layer are conditionally independent, as depicted in Figure 1-1. Note that observables depend solely on their corresponding hidden variables. Observables are either present at the finest level only, or can be propagated upward through the tree, as dictated by design choices related to image processing.
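The fixed balanced quad-tree structure described above admits simple index arithmetic, which is one reason TSBN inference is so efficient: a node at level ℓ covering grid cell (r, c) has its unique parent at level ℓ−1, cell (r//2, c//2), and exactly four children one level finer. A minimal sketch of this addressing scheme (an illustrative convention, with the root at level 0, not code from the dissertation):

```python
def parent(level, r, c):
    """Parent of quad-tree node (level, r, c); the root is level 0 at (0, 0)."""
    if level == 0:
        raise ValueError("the root has no parent")
    return (level - 1, r // 2, c // 2)

def children(level, r, c):
    """The four children of node (level, r, c), one level finer."""
    return [(level + 1, 2 * r + dr, 2 * c + dc)
            for dr in (0, 1) for dc in (0, 1)]

# In an L-level model of a 2^L x 2^L image, a node at level l covers a
# 2^(L-l) x 2^(L-l) block of pixels.
print(parent(2, 3, 1))    # (1, 1, 0)
print(children(1, 1, 0))  # [(2, 2, 0), (2, 2, 1), (2, 3, 0), (2, 3, 1)]
```

It is precisely this rigid parent assignment that the irregular tree relaxes: instead of a deterministic parent map, connectivity itself becomes a random variable.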
TSBNs have efficient linear-time inference algorithms, of which, in the graphical-models literature, the best known is belief propagation [36,37,38]. Cheng and Bouman [29] have used TSBNs for multiscale document segmentation; Kumar and Hebert [39] have employed TSBNs for segmentation of man-made structures in natural scene images; and Schneider et al. [40] have used TSBNs for simultaneous image denoising and segmentation. All the aforementioned examples demonstrate the powerful expressiveness of TSBNs and the efficiency of their inference algorithms, which is critically important for our purposes. In spite of these attractive properties, the fixed regular structure of nodes in the TSBN gives rise to "blocky" estimates. The pre-defined tree structure fails to adequately represent the immense variability in size and location of different objects and their subcomponents in images. In the literature, there are several approaches to alleviate this problem. Irving et al. [28] have proposed an overlapping tree model, where distinct nodes correspond to overlapping parts in the image. Li et al. [41] have discussed two-dimensional hierarchical models where nodes are dependent both at any particular layer, through a Markov mesh, and across resolutions. In both approaches segmentation results are superior to those obtained with standard TSBNs, because the descriptive component of the models is improved, at increased computational cost. Ultimately, however, these approaches do not deal with the source of the "blockiness," namely, the orderly structure of TSBNs. Not until recently has research on irregular structures been initiated. Konen et al. [42] have proposed a flexible neural mechanism for invariant pattern recognition based on correlated neuronal activity and the self-organization of dynamic links in neural networks. Also, Montanvert et al. [43], and Bertolino and Montanvert [44] have explored irregular multiscale tessellations that adapt to image content.
We join these research efforts building on the work of Adams et al. [45], Adams [46], Storkey [47], and Storkey and Williams [48], by considering the irregular-structured tree belief network.

Figure 1-1: Variants of TSBNs: (a) observables (black) at the lowest layer only; (b) observables (black) at all layers; white nodes represent hidden random variables, connected in a balanced quad-tree structure.

Figure 1-2: An irregular tree consists of a forest of subtrees, each of which segments the image into regions, marked by distinct shading; round- and square-shaped nodes indicate hidden and observable variables, respectively; triangles indicate roots.

In the irregular tree, as in TSBNs, nodes represent random variables, and arcs between them model causal (Markovian) dependence assumptions, as illustrated in Figure 1-2. The irregular tree specifies probability distributions over both its structure and image classes. It is this distribution over tree structures that mitigates the above-cited problems with TSBNs.

1.4 Learning Tree Structure from Data is an NP-hard Problem

In order to fully characterize the irregular tree (and any graphical model, for that matter), it is necessary to learn both the graph topology (structure) and the parameters of transition probabilities between connected nodes from training data. Usually, for this purpose, one maximizes the likelihood of the model over training data, while at the same time minimizing the complexity of the model structure. Current methods are successful at learning both the structure and parameters from complete data. Unfortunately, when the data are incomplete (i.e., some random variables are hidden), optimizing both the structure and parameters becomes NP-hard (nondeterministic polynomial time) [49,50]. The principal contribution of this dissertation is that we propose a solution to the NP-hard problem of model-structure estimation.
In our approach, we use a variant of the Expectation-Maximization (EM) algorithm [51,52] to facilitate an efficient search over a large number of candidate structures. In particular, the EM procedure iteratively improves its current choice of parameters by using the following two steps. In the Expectation step, current parameters are used for computing the expected value of all the statistics needed to evaluate the current structure. That is, the missing data (hidden variables) are completed by their expected values. In the Maximization step, we replace current parameters with those that maximize the likelihood over the complete data. This second step is essentially equivalent to learning model structure and parameters from complete data, and, hence, can be done efficiently [50,38,49]. In the incomplete-data case, a local change in structure of one part of the tree may lead to a structure change in another part of the model. Thus, the available methods for structure estimation evaluate all the neighbors (e.g., networks that differ by a few local changes) of each candidate they visit [53]. The novel idea of our approach is to perform the search for the best structure within EM. In each iteration step, our procedure attempts to find a better network structure by computing the expected statistics needed for evaluation of alternative structures. In contrast to the available approaches, the EM-based structure search makes significant progress in each iteration. As we show through experimental validation, our procedure requires relatively few EM iterations to learn non-trivial tree structures. The outlined image modeling constitutes the core of our approach to image interpretation, which is discussed in the following section.
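The two EM steps described above can be illustrated in the simplest possible setting. The sketch below (ours, not code from the dissertation) runs EM for a two-component 1-D Gaussian mixture with fixed, equal variances: the E-step completes the hidden component assignments with their expected values (responsibilities), and the M-step re-estimates the means from the completed data.

```python
import math

def em_gmm_1d(data, iters=50):
    """Toy EM for a two-component 1-D Gaussian mixture (equal, fixed variance).

    E-step: complete the hidden assignments with their expected values.
    M-step: re-estimate the means by maximizing the expected complete-data
    likelihood. Returns the two estimated means, sorted.
    """
    mu = [min(data), max(data)]  # crude initialization
    sigma2 = 1.0                 # fixed variance, for simplicity
    for _ in range(iters):
        # E-step: posterior responsibility of component 0 for each point
        resp = []
        for x in data:
            p0 = math.exp(-(x - mu[0]) ** 2 / (2 * sigma2))
            p1 = math.exp(-(x - mu[1]) ** 2 / (2 * sigma2))
            resp.append(p0 / (p0 + p1))
        # M-step: weighted means over the completed data
        w0 = sum(resp)
        w1 = len(data) - w0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / w0
        mu[1] = sum((1 - r) * x for r, x in zip(resp, data)) / w1
    return sorted(mu)
```

Our structure search differs in that the expected statistics evaluate alternative network structures, not just parameters, but the E/M alternation is the same.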
1.5 Our Approach to Image Interpretation

We seek to accomplish the following related goals: (1) to find a unifying framework to address localization, detection, and recognition of objects, as three sub-tasks of image interpretation, and (2) to find a computationally efficient and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image. For this purpose, we formulate object recognition as a Bayesian estimation problem, where class labels are assigned to pixels by minimizing the expected value of a suitably specified cost function. This formulation requires efficient estimation of the posterior distribution of image classes (i.e., objects), given an image. To this end, we resort to directed graphical models, known as irregular trees [54,55,46,47,48,45]. As discussed in Section 1.3, the irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to infer the optimal model structure, as well as the posterior distribution of image classes. By utilizing the Markov property of the irregular tree, we are in a position to reduce the computational complexity of the inference algorithm, and, thereby, to efficiently solve our Bayesian estimation problem. After inference, the model represents a forest of sub-trees, each of which segments the image. More precisely, leaf nodes that are descendants down the subtree of a given root form the image region characterized by that root, as depicted in Fig. 1-2. These segmented image regions can be interpreted as distinct object appearances in the image. That is, inference of irregular-tree structure provides a solution to localization and detection. Moreover, in inference, we also derive the posterior distribution of image classes over leaf nodes. In order to classify the segmented image regions as a whole, we perform majority voting over the maximum a posteriori (MAP) classes of leaf nodes.
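The final classification step just described, MAP labeling of leaf nodes followed by majority voting over a segmented region, can be sketched in a few lines; `map_label` and `classify_region` are hypothetical helper names, and class posteriors are given as plain dictionaries.

```python
from collections import Counter

def map_label(posterior):
    """MAP class of one leaf node, given its posterior {class: probability}."""
    return max(posterior, key=posterior.get)

def classify_region(leaf_posteriors):
    """Label a segmented region by majority vote over its leaves' MAP classes."""
    votes = Counter(map_label(p) for p in leaf_posteriors)
    return votes.most_common(1)[0][0]
```

For example, a region whose leaves mostly favor one class is labeled with that class even if a minority of leaves disagree.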
In this fashion, we accomplish our first goal. With respect to our second goal, we hypothesize that the critical factor in successful occluded-object recognition is the analysis of visible object parts, which, as discussed before, usually induces prohibitive computational cost. To account explicitly for object parts at various scales, we utilize the Markovian property of irregular trees, which lends itself as a natural solution. Since each root determines a subtree whose leaf nodes form a detected object, we can assign physical meaning to roots as representing whole objects. Also, each descendant of the root down the subtree can be interpreted as the root of another subtree whose leaf nodes cover only a part of the object. Thus, roots' descendants can be viewed as object parts at various scales. Therefore, within the irregular-tree framework, the treatment of object parts represents merely a particular interpretation of the tree/subtree structure. To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole. After Bayesian estimation of the irregular-tree structure for a given image, we first find the set of most significant irregular-tree nodes. Then, these selected significant nodes are treated as new roots of subtrees. Finally, we conduct MAP classification and majority voting over the selected image regions, descending from the selected significant nodes, as illustrated in Fig. 1-3.

1.6 Contributions

Below, we outline the main contributions of this dissertation.
Figure 1-3: Bayesian estimation of the irregular tree along with the analysis of significant tree nodes constitutes our approach to recognition of partially occluded, alike objects; the panels show the three steps: optimize structure, find "significant" nodes, and classify selected regions; shading indicates the two distinct sub-trees under the two "significant" nodes.

We propose an EM-like algorithm for learning a graphical model, where both the model structure and its distributions are learned from given data simultaneously. The algorithm represents a stage-wise solution to a learning problem known to be NP-hard. While we use the algorithm for learning irregular trees, its generalization to any generative model is straightforward. A critical part of this learning algorithm is inference of the posterior distribution of image classes on given data. As is the case for many complex-structure models, exact inference for irregular trees is intractable. To overcome this problem, we resort to the variational-approximation approach. We assume that there are averaging phenomena in irregular trees that may render a given set of variables in the model approximately independent of the rest of the network. Thereby, we derive the Structured Variational Approximation algorithm that advances existing methods for inference. In order to avoid variational approximation in inference, we propose two novel architectures and their inference algorithms within the irregular-tree framework. Being simpler, these models allow for exact inference. Moreover, empirically, they exhibit higher accuracy in modeling images than the irregular-tree-like models proposed in prior work [45,46,47,48]. Along with architectural novelties, we also introduce multi-layered data into the model, an approach that has been extensively investigated in fixed-structure quad-trees [29,33].
The proposed quad-trees have proved rather successful for various applications including image denoising, classification, and segmentation. Hence, it is important to develop a similar formulation for irregular trees. We develop a novel approach to object recognition, in which object parts are explicitly analyzed in a computationally efficient manner. As a major theoretical contribution, we define a measure of cognitive significance of object details. The measure provides for a principled algorithm that combines detected object parts toward recognition of an object as a whole. Finally, we report results of experiments conducted on a wide variety of image datasets, which characterize the proposed models and inference algorithms, and validate our approach to image interpretation.

1.7 Overview

The remainder of the dissertation is organized as follows. In Chapter 2, we specify two architectures of the irregular-tree model, and derive inference algorithms for them. The architectures differ in the treatment of observable random variables. We also discuss learning of the model parameters. A detailed derivation of the inference algorithm is given in Appendix A. Next, in Chapter 3, we specify yet another two architectures of the irregular-tree model, for which it is possible to simplify the inference algorithm, as compared to that discussed in Chapter 2. We deliberate the probabilistic inference and learning algorithms for these models. Further, in Chapter 4, we propose a measure of significance of object parts. This measure ranks object components with respect to the entropy over all image classes (i.e., objects). To incorporate the information of this analysis into the MAP classification, we devise a greedy algorithm, which we refer to as object-part recognition. The extraction of image features, which we use in our experiments, is thoroughly discussed in Chapter 5.
Then, in Chapter 6, we report performance results of different irregular-tree architectures on a large number of challenging images with partially occluded, alike objects. Finally, in Chapter 7, we summarize the major contributions of the dissertation, and conclude with remarks on future research.

CHAPTER 2
IRREGULAR TREES WITH RANDOM NODE POSITIONS

2.1 Model Specification

Irregular trees are directed acyclic graphs with two disjoint sets of nodes representing hidden and observable random vectors. Graphically, we represent all hidden variables as round-shaped nodes, connected via directed edges indicating Markovian dependencies, while observables are denoted as rectangular-shaped nodes, connected only to their corresponding hidden variables, as depicted in Fig. 2-1. Below, we first introduce nodes characterized by hidden variables. There are $|V|$ round-shaped nodes, organized in hierarchical levels, $\ell = \{0, 1, \dots, L-1\}$, where $V^0$ denotes the leaf level, and $V' \triangleq V \setminus V^0$. The number of round-shaped nodes is identical to that of the corresponding quad-tree with $L$ levels, such that $|V^\ell| = |V^{\ell-1}|/4 = \dots = |V^0|/4^\ell$. Connections are established under the constraint that a node at level $\ell$ can become a root, or it can connect only to the nodes at the next level, $\ell+1$. The network connectivity is represented by a random matrix $Z$, where entry $z_{ij}$ is an indicator random variable, such that $z_{ij}=1$ if $i \in V^\ell$ and $j \in \{0, V^{\ell+1}\}$ are connected. $Z$ contains an additional zero ("root") column, where entries $z_{i0}=1$ if $i$ is a root.

Figure 2-1: Two types of irregular trees: (a) observable variables present at the leaf level only; (b) observable variables present at all levels; round- and square-shaped nodes indicate hidden and observable random variables; triangles indicate roots; unconnected nodes in this example belong to other subtrees; each subtree segments the image into regions marked by distinct shading.
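The quad-tree node counts above, $|V^\ell| = |V^0|/4^\ell$, can be checked with a small helper (a sketch of ours, under the assumption that the leaf count fits a balanced quad-tree):

```python
def level_sizes(n_leaves, n_levels):
    """Number of nodes at each level of the corresponding quad-tree:
    |V^l| = |V^0| / 4**l, for l = 0, ..., L-1."""
    assert n_leaves % (4 ** (n_levels - 1)) == 0, "leaf count must fit a quad-tree"
    return [n_leaves // 4 ** lvl for lvl in range(n_levels)]
```

For an 8x8 image with L=4 (as in the Gibbs-sampling experiment of Section 2.2), the levels hold 64, 16, 4, and 1 nodes.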
Since each node can have only one parent, a realization of $Z$ can have at most one entry equal to 1 in each row. We define the distribution over connectivity as

$P(Z) \triangleq \prod_{\ell} \prod_{i \in V^\ell} \prod_{j \in \{0, V^{\ell+1}\}} [\gamma_{ij}]^{z_{ij}}$,   (2.1)

where $\gamma_{ij}$ is the probability of $i$ being the child of $j$, subject to $\sum_{j \in \{0, V^{\ell+1}\}} \gamma_{ij} = 1$. Further, each round-shaped node $i$ (see Fig. 2-1) is characterized by a random position $r_i$ in the image plane. The distribution of $r_i$ is conditioned on the position of its parent $r_j$ as

$P(r_i | r_j, z_{ij}=1) \triangleq \frac{1}{2\pi |\Sigma_{ij}|^{1/2}} \exp\left(-\tfrac{1}{2}(r_i - r_j - d_{ij})^T \Sigma_{ij}^{-1} (r_i - r_j - d_{ij})\right)$,   (2.2)

where $\Sigma_{ij}$ is a diagonal matrix that represents the order of magnitude of object size, and parameter $d_{ij}$ is the mean of the relative displacement $(r_i - r_j)$. Storkey and Williams [48] set $d_{ij}$ to zero, which favors undesirable positioning of children and parent nodes at the same locations. From our experiments, this may seriously degrade the image-modeling capabilities of irregular trees, and as such some nonzero relative displacement $d_{ij}$ needs to be accounted for. For roots $i$, we have $P(r_i | r_0, z_{i0}=1) \triangleq \exp(-\tfrac{1}{2}(r_i - d_i)^T \Sigma_i^{-1} (r_i - d_i)) / (2\pi |\Sigma_i|^{1/2})$. The joint probability of $R \triangleq \{r_i, \forall i \in V\}$ is given by

$P(R|Z) \triangleq \prod_{i,j} [P(r_i | r_j, z_{ij})]^{z_{ij}}$.   (2.3)

At the leaf level, $V^0$, we fix node positions $R^0$ to the locations of the finest-scale observables, and then use $P(Z, R' | R^0)$ as the prior over positions and connectivity, where $R^0 \triangleq \{r_i, \forall i \in V^0\}$, and $R' \triangleq \{r_i, \forall i \in V \setminus V^0\}$. Next, each node $i$ is characterized by an image-class label $x_i$ and an image-class indicator random variable $x_i^k$, such that $x_i^k = 1$ if $x_i = k$, where $k$ is a label taking values in the finite set $M$. Thus, we assume that the set $M$ of unknown image classes is finite. The label $k$ of node $i$ is conditioned on the image class $l$ of its parent $j$ and is given by the conditional probability tables $P_{ij}^{kl}$. For roots $i$, we have $P(x_i^k | z_{i0}=1) \triangleq P(x_i^k)$. Thus, the joint probability of $X \triangleq \{x_i^k, \forall i \in V, \forall k \in M\}$ is given by

$P(X|Z) = \prod_{i,j} \prod_{k,l \in M} [P_{ij}^{kl}]^{x_i^k x_j^l z_{ij}}$.   (2.4)

Finally, we introduce nodes that are characterized by observable random vectors representing image texture and color cues.
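Because each row of $Z$ selects exactly one parent (or the extra "root" outcome), Eq. (2.1) can be sampled row by row. A minimal sketch of ours, with `gamma` given as per-node dictionaries of candidate-parent probabilities (the names and data layout are assumptions, not the dissertation's):

```python
import random

def sample_connectivity(gamma, rng=None):
    """Draw one parent per node from P(Z) in Eq. (2.1).

    gamma: {node: {candidate_parent: probability}}, each row summing to 1;
    candidate parent 0 stands for the extra "root" column.
    Returns {node: chosen_parent}, i.e., the single 1-entry of each row of Z.
    """
    rng = rng or random.Random(0)
    z = {}
    for i, row in gamma.items():
        parents = list(row)
        weights = [row[j] for j in parents]
        z[i] = rng.choices(parents, weights=weights)[0]
    return z
```

Each sampled dictionary encodes a forest: nodes mapped to 0 are roots, and the remaining entries are child-to-parent edges.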
Here, we make a distinction between two types of irregular trees. The model where observables are present only at the leaf level is referred to as IT$_{V^0}$; the model where observables are present at all levels is referred to as IT$_V$. To clarify the difference between the two types of nodes in irregular trees, we index observables with respect to their locations in the data structure (e.g., wavelet dyadic squares), while hidden variables are indexed with respect to a node index in the graph. This generalizes the correspondence between hidden and observable random variables of the position-encoding dynamic trees [48]. We define the position of an observable, $p(i)$, to be equal to the center of mass of the $i$-th dyadic square at level $\ell$ in the corresponding quad-tree with $L$ levels:

$p(i) \triangleq [(n+0.5)2^\ell \;\; (m+0.5)2^\ell]^T, \quad \forall i \in V^\ell, \; \ell = \{0, \dots, L-1\}, \; n, m = 1, 2, \dots$   (2.5)

where $n$ and $m$ denote the row and column of the dyadic square at scale $\ell$ (e.g., for wavelet coefficients). Clearly, other application-dependent definitions of $p(i)$ are possible. Note that while the $r$'s are random vectors, the $p$'s are deterministic values fixed at the locations where the corresponding observables are recorded in the image. Also, after fixing $R^0$ to the locations of the finest-scale observables, we have $\forall i \in V^0$, $r_i = p(i)$. The definition, given by Eq. (2.5), holds for IT$_{V^0}$ as well, for $\ell = 0$. For both types of irregular trees, we assume that observables $Y \triangleq \{y_{p(i)}, \forall i \in V\}$ at locations $p \triangleq \{p(i), \forall i \in V\}$ are conditionally independent given the corresponding $x_i^k$:

$P(Y|X, p) = \prod_{i \in V} \prod_{k \in M} [P(y_{p(i)} | x_i^k = 1, p(i))]^{x_i^k}$,   (2.6)

where for IT$_{V^0}$, $V^0$ should be substituted for $V$. The likelihoods $P(y_{p(i)} | x_i^k = 1, p(i))$ are modeled as mixtures of Gaussians: $P(y_{p(i)} | x_i^k = 1, p(i)) \triangleq \sum_{g=1}^{G_k} \pi_k(g) \, \mathcal{N}(y_{p(i)}; \nu_k(g), \Sigma_k(g))$. For large $G_k$, a Gaussian-mixture density can approximate any probability density [56]. In order to avoid the risk of overfitting the model, we assume that the parameters of the Gaussian mixture are equal for all nodes.
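Eq. (2.5) is a one-line computation; the sketch below transcribes it directly (the function name is ours):

```python
def dyadic_center(level, n, m):
    """Center of mass of the (n, m)-th dyadic square at the given level,
    p(i) = [(n + 0.5) * 2**level, (m + 0.5) * 2**level]^T   (Eq. 2.5)."""
    return ((n + 0.5) * 2 ** level, (m + 0.5) * 2 ** level)
```

At the leaf level (level 0) the squares are single pixels, so centers are spaced one unit apart; at coarser levels the spacing doubles with each level, matching the quad-tree geometry.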
The Gaussian-mixture parameters can be grouped in the set $\theta \triangleq \{G_k, \{\pi_k(g), \nu_k(g), \Sigma_k(g)\}_{g=1}^{G_k}, \forall k \in M\}$. Speaking in generative terms, for a given set of $V$ nodes, first $P(Z)$ is defined using Eq. (2.1) and $P(R|Z)$ using Eq. (2.3), to give us $P(Z, R)$. We then impose the condition of fixing the leaf-level node positions to the locations of the finest-scale observables, $p^0 \subset p$, to obtain $P(Z, R' | R^0 = p^0)$. Combining Eq. (2.4) and Eq. (2.6) with $P(Z, R' | R^0 = p^0)$ results in the joint prior

$P(Z, X, R', Y | R^0 = p^0) = P(Y | X, p) \, P(X|Z) \, P(Z, R' | R^0 = p^0)$,   (2.7)

which fully specifies the irregular tree. All the parameters of the joint prior can be grouped in the set $\Theta \triangleq \{\gamma_{ij}, d_{ij}, \Sigma_{ij}, P_{ij}^{kl}, \theta\}$, $\forall i, j \in V$, $\forall k, l \in M$. As depicted in Figure 2-1, an irregular tree is a directed graph. The formalism of the graph-theoretic representation of irregular trees provides general algorithms for computing marginal and conditional probabilities of interest, which is discussed in the following section.

2.2 Probabilistic Inference

Image interpretation, as discussed in Chapter 1, requires computation of posterior probabilities of the hidden random variables $Z$, $X$, and $R'$, given observables $Y$ and leaf-node positions $R^0$. However, due to the complexity of irregular trees, exact probabilistic inference of $P(Z, X, R' | Y, R^0)$ is infeasible. Therefore, we resort to approximate inference methods, which are divided into two broad classes: deterministic approximations and Monte-Carlo methods [57,58,59,60,61]. Markov Chain Monte Carlo (MCMC) methods allow for sampling of the posterior $P(Z, X, R' | Y, R^0)$, and the construction of a Markov chain whose equilibrium distribution is the desired $P(Z, X, R' | Y, R^0)$. Below, we report an experiment on two datasets of 4x4 and 8x8 binary images, samples of which are depicted in Fig. 2-2a, where we learned $P(Z, X, R' | Y, R^0)$ for IT$_{V^0}$ models through Gibbs sampling [62].
Observables $y_i$ were set to binary pixel values; the number of image classes was set to $|M|=2$; the number of components in the Gaussian mixture was set to $G=1$; and the maximum number of levels in the model was set to $L=3$ and $L=4$ for 4x4 and 8x8 images, respectively. The initial irregular-tree structure is a balanced quad-tree (TSBN), where the number of leaf-level nodes is equal to the number of pixels. One iteration of Gibbs sampling consists of sampling each variable, conditioned on the other variables in the irregular tree, until all the variables are sampled. We iterated this procedure until our convergence criterion was met, namely, when $|P_{t+1}(Z, X, R' | Y, R^0) - P_t(Z, X, R' | Y, R^0)| / P_t(Z, X, R' | Y, R^0) < \epsilon$ over iteration steps $t$, where $\epsilon = 0.1$ and $\epsilon = 1$ for 4x4 and 8x8 images, respectively. For the dataset of 50 binary 4x4 images, on average more than 20,000 iteration steps were required for convergence, while for 50 binary 8x8 images, more than 100,000 iterations were required. In Figs. 2-2b-c, we also illustrate the grouping of pixels in the learned irregular trees, while in Fig. 2-3, we depict the irregular tree learned for the 4x4 image in Fig. 2-2a.

Figure 2-2: Pixel clustering using irregular trees learned by Gibbs sampling: (a) sample 4x4 and 8x8 binary images; (b) clustered leaf-level pixels that have the same parent at level 1; (c) clustered leaf-level pixels that have the same grandparent at level 2; clusters are indicated by different shades of gray; the point in each group marks the position of the parent node.

Figure 2-3: Irregular tree learned for the 4x4 image in Fig. 2-2a, after 20,032 iterations of Gibbs sampling; nodes are depicted in-line, representing 4, 2 and 1 actual rows of levels 0, 1 and 2, respectively; nodes are drawn as pie-charts representing $P(x_i^k = 1)$, $k \in \{0, 1\}$; note that there are two root nodes for the two distinct objects in the image.
From the experimental results, we infer that irregular trees learned through Gibbs sampling are capable of capturing important structural information about image regions at various scales. Generally, however, in MCMC approaches, with increasing model complexity, the choice of proposals in the Markov chain becomes hard, so that the equilibrium distribution is reached very slowly [63,57]. Hence, in order to achieve faster inference, we resort to variational approximation, a specific type of deterministic approximation [59,64]. Variational-approximation methods have been demonstrated to give good and significantly faster results, when compared to Gibbs sampling [46]. The proposed approaches range from a factorized approximating distribution over hidden variables [45] (a.k.a. mean-field variational approximation) to more structured solutions [48], where dependencies among hidden variables are enforced. The underlying assumption in those methods is that there are averaging phenomena in irregular trees that may render a given set of variables approximately independent of the rest of the network. Therefore, the resulting variational optimization of irregular trees provides for principled solutions, while reducing computational complexity. In the following section, we derive a novel Structured Variational Approximation (SVA) algorithm for the irregular-tree model defined in Section 2.1.

2.3 Structured Variational Approximation

In variational approximation, the intractable distribution $P(Z, X, R' | Y, R^0)$ is approximated by a simpler distribution $Q(Z, X, R' | Y, R^0)$ closest to $P(Z, X, R' | Y, R^0)$. To simplify notation, below, we omit the conditioning on $Y$ and $R^0$, and write $Q(Z, X, R')$. The novelty of our approach is that we constrain the variational distribution to the form

$Q(Z, X, R') \triangleq Q(Z) \, Q(X|Z) \, Q(R'|Z)$,   (2.8)

which enforces that both the class-indicator variables $X$ and the position variables $R'$ are statistically dependent on the tree connectivity $Z$.
Since these dependencies are significant in the prior, one should expect them to remain so in the posterior. Therefore, our formulation appears to be more appropriate for approximating the true posterior than the mean-field variational approximation $Q(Z, X, R') = Q(Z)Q(X)Q(R')$ discussed by Adams et al. [45], and the form $Q(Z, X, R') = Q(Z)Q(X|Z)Q(R')$ proposed by Storkey and Williams [48]. We define the approximating distributions as follows:

$Q(Z) \triangleq \prod_{\ell} \prod_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [\zeta_{ij}]^{z_{ij}}$,   (2.9)

$Q(X|Z) \triangleq \prod_{i,j} \prod_{k,l \in M} [Q_{ij}^{kl}]^{x_i^k x_j^l z_{ij}}$,   (2.10)

$Q(R'|Z) \triangleq \prod_{i,j} [Q(r_i | z_{ij})]^{z_{ij}} = \prod_{i,j} \left[ \frac{1}{2\pi |\Omega_{ij}|^{1/2}} \exp\left(-\tfrac{1}{2}(r_i - \mu_{ij})^T \Omega_{ij}^{-1} (r_i - \mu_{ij})\right) \right]^{z_{ij}}$,   (2.11)

where the parameters $\zeta_{ij}$ correspond to the connection probabilities, and the $Q_{ij}^{kl}$ are analogous to the $P_{ij}^{kl}$ conditional probability tables. For the parameters of $Q(R'|Z)$, note that the covariances $\Omega_{ij}$ and mean values $\mu_{ij}$ form the set of Gaussian parameters for a given node $i \in V^\ell$ over its candidate parents $j \in V^{\ell+1}$. Which pair of parameters $(\mu_{ij}, \Omega_{ij})$ is used to generate $r_i$ is conditioned on the given connection between $i$ and $j$, that is, the current realization of $Z$. Furthermore, we assume that the $\Omega$'s are diagonal matrices, such that node positions along the "x" and "y" image axes are uncorrelated. Also, for roots, suitable forms of the $Q$ functions are used, similar to the specifications given in Section 2.1. To find the $Q(Z, X, R')$ closest to $P(Z, X, R' | Y, R^0)$ we resort to a standard optimization method, where the Kullback-Leibler (KL) divergence between $Q(Z, X, R')$ and $P(Z, X, R' | Y, R^0)$ is minimized ( [65], ch. 2, pp. 12-49, and ch. 16, pp. 482-509). The KL divergence is given by

$\mathrm{KL}(Q \| P) \triangleq \sum_{Z,X} \int dR' \; Q(Z, X, R') \log \frac{Q(Z, X, R')}{P(Z, X, R' | Y, R^0)}$.   (2.12)

It is well known that $\mathrm{KL}(Q \| P)$ is non-negative for any two distributions $Q$ and $P$, and $\mathrm{KL}(Q \| P) = 0$ if and only if $Q = P$; these properties are a direct corollary of Jensen's inequality ( [65], ch. 2, pp. 12-49). As such, $\mathrm{KL}(Q \| P)$ guarantees a global minimum, that is, a unique solution for $Q(Z, X, R')$.
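For discrete distributions, the KL divergence of Eq. (2.12) reduces to a finite sum, and its two stated properties (non-negativity, and zero iff $Q = P$) are easy to check numerically. A minimal sketch of ours:

```python
import math

def kl_divergence(q, p):
    """KL(Q||P) = sum_x Q(x) log(Q(x)/P(x)) for discrete distributions
    given as dicts over the same support. Non-negative; zero iff Q == P.
    Terms with Q(x) = 0 contribute nothing (0 * log 0 = 0 by convention)."""
    return sum(qx * math.log(qx / p[x]) for x, qx in q.items() if qx > 0)
```

Note that KL is not symmetric: the variational objective minimizes KL(Q||P), which penalizes Q for placing mass where P has little.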
By minimizing the KL divergence, we derive the update equations for estimating the parameters of the variational distribution $Q(Z, X, R')$. Below, we summarize the final derivation results. Detailed derivation steps are reported in Appendix A, where we also provide the list of nomenclature. In the following equations, we use $K$ to denote an arbitrary normalization constant, the definition of which may change from equation to equation. Parameters on the right-hand side of the update equations are assumed known, as learned in the previous iteration step.

2.3.1 Optimization of Q(X|Z)

$Q(X|Z)$ is fully characterized by the parameters $Q_{ij}^{kl}$, which are updated as

$Q_{ij}^{kl} = K P_{ij}^{kl} \lambda_i^k, \quad \forall i, j \in V, \; \forall k, l \in M$,   (2.13)

where the auxiliary parameters $\lambda_i^k$ are computed as

$\lambda_i^k = P(y_{p(i)} | x_i^k, p(i)), \; \forall i \in V^0; \quad \lambda_i^k = \prod_{c \in V} \Big[ \sum_{a \in M} P_{ci}^{ak} \lambda_c^a \Big]^{\zeta_{ci}}, \; \forall i \in V', \; \forall k \in M$,   (2.14a)

$\lambda_i^k = P(y_{p(i)} | x_i^k, p(i)) \prod_{c \in V} \Big[ \sum_{a \in M} P_{ci}^{ak} \lambda_c^a \Big]^{\zeta_{ci}}, \quad \forall i \in V, \; \forall k \in M$,   (2.14b)

where Eq. (2.14a) is derived for IT$_{V^0}$, and Eq. (2.14b) for IT$_V$. Since the $\zeta_{ci}$ are non-zero only for child-parent pairs, from Eq. (2.14), we note that the $\lambda$'s are computed for both models by propagating the $\lambda$ messages of the corresponding children nodes upward. Thus, the $Q$'s, given by Eq. (2.13), can be updated by making a single pass up the tree. Also, note that for leaf nodes, $i \in V^0$, the $\zeta_{ci}$ parameters are equal to 0 by definition, yielding $\lambda_i^k = P(y_{p(i)} | x_i^k, p(i))$ in Eq. (2.14b). Further, from Eqs. (2.9) and (2.10), we derive the update equation for the approximate posterior probability $m_i^k$ that node $i$ is assigned to image class $k$, given $Y$ and $R^0$, as

$m_i^k = \sum_{Z,X} \int dR' \; x_i^k \, Q(Z, X, R') = \sum_{j \in V} \zeta_{ij} \sum_{l \in M} Q_{ij}^{kl} m_j^l, \quad \forall i \in V, \; \forall k \in M$.   (2.15)

Note that the $m_i^k$ can be computed by propagating image-class probabilities in a single pass downward. This upward-downward propagation, specified by Eqs. (2.14) and (2.15), is very reminiscent of belief propagation for TSBNs [36,31]. For the special case when $\zeta_{ij} = 1$ for only one parent $j$, we obtain the standard $\lambda$-$\pi$ rules of Pearl's message-passing scheme for TSBNs.
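The upward pass of Eqs. (2.13)-(2.14) can be sketched for the special case of hard connectivity ($\zeta_{ci} = 1$ for each actual child-parent edge), where it reduces to the familiar $\lambda$ recursion: each leaf's $\lambda$ is its data likelihood, and each parent multiplies, over its children, the transition-weighted sums of the child $\lambda$'s. All names below are ours, and for simplicity a single transition table is shared across edges:

```python
def upward_pass(children, likelihood, P, node_order, classes):
    """Compute lambda messages bottom-up, following the form of Eq. (2.14b)
    with hard connectivity (zeta_ci = 1 for each child-parent edge).

    children:   {node: [child, ...]}  (leaves map to [])
    likelihood: {node: {k: P(y_i | x_i = k)}}; 1.0 means "no observable"
    P:          {(a, k): transition probability of child class a given parent class k}
    node_order: nodes listed children-before-parents
    """
    lam = {}
    for i in node_order:
        lam[i] = {}
        for k in classes:
            msg = likelihood[i][k]
            for c in children[i]:
                # sum over the child's classes, weighted by the transition table
                msg *= sum(P[(a, k)] * lam[c][a] for a in classes)
            lam[i][k] = msg
    return lam
```

On a tiny two-leaf tree whose leaves both favor class 0, the root's $\lambda$ correctly favors class 0 as well.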
2.3.2 Optimization of Q(R'|Z)

$Q(R'|Z)$ is fully characterized by the parameters $\mu_{ij}$ and $\Omega_{ij}$. The update equation for $\mu_{ij}$, $\forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, is

$\mu_{ij} = \Big[ \sum_{p \in V} \zeta_{jp} \Sigma_{ij}^{-1} + \sum_{c \in V} \zeta_{ci} \Sigma_{ci}^{-1} \Big]^{-1} \Big[ \sum_{p \in V} \zeta_{jp} \Sigma_{ij}^{-1} (\mu_{jp} + d_{ij}) + \sum_{c \in V} \zeta_{ci} \Sigma_{ci}^{-1} (\mu_{ci} - d_{ci}) \Big]$,   (2.16)

where $c$ and $p$ denote children and grandparents of node $i$, respectively. Further, for all node pairs $(i,j) \in V^\ell \times \{0, V^{\ell+1}\}$, $\ell > 0$, the covariance $\Omega_{ij}$ is updated through an analogous fixed-point expression, Eq. (2.17), built from the traces $\mathrm{Tr}\{\Sigma^{-1}\Omega\}$ of the precision-weighted covariances associated with $i$'s children and grandparents; the full expression is derived in Appendix A. Since the $\Omega$'s and $\Sigma$'s are assumed diagonal, it is straightforward to derive the expressions for the diagonal elements of the $\Omega$'s from Eq. (2.17). Note that both $\mu_{ij}$ and $\Omega_{ij}$ are updated by summing over children and grandparents of $i$, and, therefore, must be iterated until convergence.

2.3.3 Optimization of Q(Z)

$Q(Z)$ is fully characterized by the connectivity probabilities $\zeta_{ij}$, which are computed as

$\zeta_{ij} = K \exp(A_{ij} + B_{ij}), \quad \forall \ell, \; \forall (i,j) \in V^\ell \times \{0, V^{\ell+1}\}$,   (2.18)

where $A_{ij}$ represents the influence of the observables $Y$, while $B_{ij}$ represents the contribution of the geometric properties of the network to the connectivity distribution. Both are defined in Appendix A.

2.4 Inference Algorithm and Bayesian Estimation

For the given set of parameters characterizing the joint prior, observables $Y$, and leaf-level node positions $R^0$, the standard Bayesian estimation of the optimal $Z$, $X$, and $R'$ requires minimizing the expectation of a cost function $C$:

$(\hat{Z}, \hat{X}, \hat{R}') = \arg\min_{Z,X,R'} E\{C((Z, X, R'), (Z^*, X^*, R'^*)) \mid Y, R^0, \Theta\}$,   (2.19)

where $C(\cdot)$ penalizes the discrepancy between the estimated configuration $(Z, X, R')$ and the true one $(Z^*, X^*, R'^*)$. We propose the following cost function:

$C((Z, X, R'), (Z^*, X^*, R'^*)) \triangleq \sum_{i,j \in V} [1 - \delta(z_{ij} - z_{ij}^*)] + \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - x_i^{k*})] + \sum_{i \in V'} [1 - \delta(r_i - r_i^*)]$,   (2.20)

where $*$ indicates true values, and $\delta(\cdot)$ is the Kronecker delta function. Using the variational approximation $P(Z, X, R' | Y, R^0) \approx Q(Z)Q(X|Z)Q(R'|Z)$, from Eqs.
(2.19) and (2.20), we derive:

$\hat{Z} = \arg\min_{\hat{Z}} \sum_Z Q(Z) \sum_{\ell} \sum_{(i,j) \in V^\ell \times \{0, V^{\ell+1}\}} [1 - \delta(z_{ij} - \hat{z}_{ij})]$,   (2.21)

$\hat{X} = \arg\min_{\hat{X}} \sum_{Z,X} Q(Z) Q(X|Z) \sum_{i \in V} \sum_{k \in M} [1 - \delta(x_i^k - \hat{x}_i^k)]$,   (2.22)

$\hat{R}' = \arg\min_{\hat{R}'} \int dR' \sum_Z Q(Z) Q(R'|Z) \sum_{i \in V'} [1 - \delta(r_i - \hat{r}_i)]$.   (2.23)

Given the constraints on connections, discussed in Section 2.1, minimization in Eq. (2.21) is equivalent to finding parents:

$(\forall \ell)(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \zeta_{ij}$, for IT$_{V^0}$,   (2.24a)

$(\forall \ell)(\forall i \in V^\ell) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \zeta_{ij}$, for IT$_V$,   (2.24b)

where $\zeta_{ij}$ is given by Eq. (2.18); $Z_{\cdot i}$ denotes the $i$-th column of $Z$, and $Z_{\cdot i} \neq 0$ indicates that there is at least one non-zero element in column $Z_{\cdot i}$; that is, $i$ has children, and thereby is included in the tree structure. Note that, due to the distribution over connections, after estimation of $Z$, for a given image, some nodes may remain without children. To preserve the generative property in IT$_{V^0}$, we impose an additional constraint on $Z$ that nodes above the leaf level must have children in order to be able to connect to upper levels. On the other hand, in IT$_V$, due to multi-layered observables, all nodes $V$ must be included in the tree structure, even if they do not have children. The global solution to Eq. (2.24a) is an open problem in many research areas. Therefore, for IT$_{V^0}$, we propose a stage-wise optimization, where, as we move upwards, starting from the leaf level, $\ell = \{0, 1, \dots, L-1\}$, we include in the tree structure optimal parents at $V^{\ell+1}$ according to

$(\forall i \in V^\ell)(\hat{Z}_{\cdot i} \neq 0) \quad \hat{j} = \arg\max_{j \in \{0, V^{\ell+1}\}} \zeta_{ij}$,   (2.25)

where $\hat{Z}_{\cdot i}$ denotes the $i$-th column of the estimated $\hat{Z}$, and $\hat{Z}_{\cdot i} \neq 0$ indicates that $i$ has already been included in the tree structure when optimizing the previous level. Next, from Eq. (2.22), the resulting Bayesian estimator of image-class labels, denoted as $\hat{x}_i$, is

$(\forall i \in V) \quad \hat{x}_i = \arg\max_{k \in M} m_i^k$,   (2.26)

where the approximate posterior probability $m_i^k$ that image class $k$ is assigned to node $i$ is given by Eq. (2.15). Finally, from Eq.
(2.23), the optimal node positions are estimated as

(∀ℓ>0)(∀i∈V^ℓ)  r̂_i = arg min_{r_i} E_Z{Q(r_i|Z)Q(Z)} = Σ_{j∈{0,V^(ℓ+1)}} μ_ij ζ_ij,  (2.27)

where μ_ij and ζ_ij are given by Eqs. (2.16) and (2.18), respectively. The inference algorithm for irregular trees is summarized in Fig. 2-4. The ordering of the parameter updates for Q(Z), Q(X|Z), and Q(R'|Z) specified in steps (4)-(10) of Fig. 2-4 is arbitrary; theoretically, other orderings are equally valid.

2.5 Learning Parameters of the Irregular Tree with Random Node Positions

Variational inference presumes that the model parameters Θ = {ζ_ij, d_ij, Σ_ij, P_ij^(kl), θ}, ∀i,j∈V, ∀k,l∈M, as well as V, L, and M, are available. These parameters can be learned off-line through standard Maximum Likelihood (ML) optimization. Usually, for ML optimization, it is assumed that N independently generated training images, with observables {Y^n} and latent variables {(Z^n, X^n, R'^n)}, n=1,...,N, are given. However, for multiscale generative models, in general, neither the true image-class labels of nodes at higher levels nor their dynamic connections are given. Therefore, the configurations {(Ẑ^n, X̂^n, R̂'^n)} must be estimated from the training images. To this end, we propose an iterative learning procedure.

In initialization, we first set L = log_4 |V^0|, where |V^0| is equal to the size of a given image. The number of image classes |M| is also assumed known. Next, due to the huge diversity of possible configurations of objects in images, for each node i∈V^ℓ, we initialize ζ_ij to be uniform over i's candidate parents j∈{0,V^(ℓ+1)}. Then, for all pairs (i,j)∈V^ℓ×V^(ℓ+1) at level ℓ, we set d_ij = p(i)−p(j); namely, the d_ij are initialized to the relative displacement of the centers of mass of the i-th and j-th dyadic squares in the corresponding quad-tree with L levels, specified in Eq. (2.5). For roots i, we have d_i0 = p(i). Also, we set the diagonal elements of Σ_ij to the diagonal elements

Inference Algorithm
Assume that V, L, M, Θ, N_s, ε, and ε_ρ are given.
(1) Initialization: t=0; t_in=0; (∀i,j∈V)(∀k,l∈M) Q_ij^(kl)(0)=P_ij^(kl); μ_ij(0) = node locations in the corresponding quad-tree; the diagonal elements of Ω_ij(0) are set to the area of the dyadic squares in the corresponding quad-tree;
(2) repeat Outer Loop
(3) t = t+1;
(4) Compute in a bottom-up pass, for ℓ=0,1,...,L−1, ∀i∈V^ℓ, ∀k∈M: λ_i^k(t) given by Eq. (2.14); Q_ij^(kl)(t) given by Eq. (2.13);
(5) Compute in a top-down pass, for ℓ=L−1, L−2,...,0, ∀i∈V^ℓ, ∀k∈M: m_i^k(t) given by Eq. (2.15);
(6) repeat Inner Loop
(7) t_in = t_in + 1;
(8) Compute ∀i,j∈V': μ_ij(t_in) given by Eq. (2.16); Ω_ij(t_in) given by Eq. (2.17);
(9) until |μ_ij(t_in) − μ_ij(t_in−1)| / |μ_ij(t_in−1)| < ε_ρ;
(10) Compute ∀i,j∈V': ζ_ij(t) given by Eq. (2.18);
(11) until |Q(Z,X,R'; t) − Q(Z,X,R'; t−1)| / Q(Z,X,R'; t−1) < ε;
(12) Estimation of Z: compute in a bottom-up pass, for ℓ=0,1,...,L−1: for ITvo: (∀i∈V^ℓ)(Ẑ._i≠0) ĵ = arg max_{j∈{0,V^(ℓ+1)}} ζ_ij(t); for ITv: (∀i∈V^ℓ) ĵ = arg max_{j∈{0,V^(ℓ+1)}} ζ_ij(t);
(13) Estimation of X: compute (∀i∈V) x̂_i = arg max_{k∈M} m_i^k(t);
(14) Estimation of R': compute (∀ℓ>0)(∀i∈V^ℓ) r̂_i = Σ_{j∈{0,V^(ℓ+1)}} μ_ij(t) ζ_ij(t);

Figure 2-4: Inference of the irregular tree given Y, R^0, and Θ; t and t_in are counters of the outer and inner loops, respectively; N_s, ε, and ε_ρ control the convergence criteria of the two loops.

of a matrix d_ij d_ij^T. The number of components G_k in the Gaussian mixture for each class k is set to G_k=3, which we have empirically validated to be appropriate. The other parameters of the Gaussian mixture, θ, are estimated by using the EM algorithm [52, 56] on the hand-labeled training images. Finally, the conditional probability tables, P_ij^(kl), are initialized to be uniform over the possible image classes. After initialization of Θ, we run an iterative learning procedure, where in step t we conduct SVA inference of the irregular tree on the training images, as explained in the previous section. After inferring the posterior probability m_i^k that class k is assigned to node i, given by Eq. (2.15), and the posterior connectivity probability, ζ_ij, given by Eq.
(2.18), on all training images, n = 1,...,N, we update only P_ij^(kl) and ζ_ij, by averaging the corresponding posterior estimates over the N training images:

P_ij^(kl)(t+1) = (1/N) Σ_{n=1}^{N} P_ij^(kl,n)(t),  (2.28)

ζ_ij(t+1) = (1/N) Σ_{n=1}^{N} ζ_ij^n(t),  (2.29)

where the superscript n denotes the estimate obtained for the n-th training image. The other parameters in Θ(t+1) = {ζ_ij(t+1), d_ij, Σ_ij, P_ij^(kl)(t+1), θ} are fixed to their initial values. In the next iteration step, we use Θ(t+1) for SVA inference of the irregular tree on the training images. We assume that the learning algorithm has converged when |P_ij^(kl)(t+1) − P_ij^(kl)(t)| / P_ij^(kl)(t) < ε, where ε > 0 is a pre-specified parameter.

2.6 Implementation Issues

In this section, we list algorithm-related details that are necessary for the experimental results, presented in Chapter 6, to be reproducible. First, a direct implementation of Eq. (2.13) would result in numerical underflow. Therefore, we introduce the following scaling procedure:

λ̃_i^k = λ_i^k / s_i,  ∀i∈V, ∀k∈M,  (2.30)

s_i = Σ_{k∈M} λ_i^k.  (2.31)

Substituting the scaled λ's into Eq. (2.13), we obtain

Q_ij^(kl) = P_ij^(kl) λ̃_i^k / Σ_{a∈M} P_ij^(al) λ̃_i^a = P_ij^(kl) λ_i^k / Σ_{a∈M} P_ij^(al) λ_i^a.

In other words, the computation of Q_ij^(kl) does not change when the scaled λ's are used. Second, to reduce computational complexity, we consider, for each node i, only the 7×7 box of candidate parent nodes j that neighbor the parent of i in the corresponding quad-tree. Consequently, the number of possible children c of i is also limited. Our experiments show that the omitted nodes, whether children or parents, contribute negligibly to the update equations. Thus, we bound the overall computational cost as the number of nodes increases. Finally, the convergence criterion of the inner loop, where the μ_ij and Ω_ij are computed, is controlled by the parameter ε_ρ. When ε_ρ = 0.01, the average number of iteration steps, t_in, in the inner loop is from 3 to 5, depending on the image size, where the latter figure is obtained for 128×128 images. The convergence criterion of the outer loop is controlled by the parameters N_s and ε. The simplifications that we use in practice may lead to sub-optimal solutions of SVA.
From our experience, though, the algorithm recovers from unstable stationary points for sufficiently large N_s. In our experiments, we set N_s = 10 and ε = 0.01. After the inference algorithm (Fig. 2-4) has converged, we estimate the values of the hidden variables (Z, X, R') for a given image, thereby conducting image interpretation.

CHAPTER 3
IRREGULAR TREES WITH FIXED NODE POSITIONS

In the previous chapter, two architectures of the irregular tree were presented, which are fully characterized by the following joint prior:

P(Z, X, R', Y | R^0) = P(Y|X) P(X|Z) P(Z, R'|R^0).

As discussed in Section 2.2, inference of the posterior distribution P(Z,X,R'|Y,R^0) is intractable, due to the complexity of the model. The node-position variables, R', are the main reason that inference must be approximate. On the other hand, the R' are very useful, because they constrain the possible network configurations. In order to avoid approximate inference, in this chapter we introduce yet another architecture of the irregular tree, in which the R' are eliminated and the constraints on the tree structure are modeled directly in the distribution of the connectivity Z.

3.1 Model Specification

Similar to the model specification in the previous chapter, we introduce two architectures: one with observables only at the leaf level, and the other with observables propagated to higher levels. The main difference from the architectures ITv and ITvo is that the node positions are identical to those of the quad-tree. Therefore, we refer to the architectures presented in this chapter as irregular quad-trees, IQTv and IQTvo. The irregular quad-tree is a directed acyclic graph with nodes in a set V, organized in hierarchical levels V^ℓ, ℓ = {0,1,...,L}, where V^0 denotes the leaf level. The layout of nodes is identical to that of the quad-tree, modeling, for example, the dyadic pyramid of wavelet coefficients, such that the number of nodes at level ℓ can be computed as |V^ℓ| = |V^(ℓ−1)|/4 = ... = |V^0|/4^ℓ.
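As a quick illustration, the level sizes |V^ℓ| follow directly from this quarter-per-level relation; a minimal sketch (the function name is ours, not part of the model):

```python
def quadtree_level_sizes(num_leaves, num_levels):
    """Return [|V^0|, |V^1|, ..., |V^L|] for a quad-tree whose leaf
    level V^0 has `num_leaves` nodes: |V^l| = |V^0| / 4^l."""
    return [num_leaves // 4 ** level for level in range(num_levels + 1)]
```

For example, a 16×16 leaf level (256 nodes) with L=4 yields [256, 64, 16, 4, 1], ending in a single root.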
Unlike position-encoding dynamic trees [48], we assume that nodes are fixed at the locations of the corresponding quad-tree. Consequently, irregular model structure is achieved only by establishing arbitrary connections between nodes. Connections are established under the constraint that a node at level ℓ can become a root, or can connect only to nodes at the next level, ℓ+1. The network connectivity is represented by a random matrix, Z, whose entry z_ij is an indicator random variable, such that z_ij=1 if i∈V^ℓ and j∈V^(ℓ+1) are connected. Z contains an additional zero ("root") column, with entries z_i0=1 if i is a root node. Each node can have only one parent, or can be a root. Note that, due to the distribution over connections, after estimation of Z for a given image, some nodes in IQTv may remain without children.

Each node i is characterized by an image-class random variable, x_i, which can take values in a finite class set M. Given Z, the label x_i of node i is conditioned on the label x_j of its parent j through P(x_i|x_j, z_ij=1). The joint probability of the image-class variables X={x_i}, ∀i∈V, is given by

P(X|Z) = Π_ℓ Π_{i∈V^ℓ} P(x_i | x_j, z_ij=1),  (3.1)

where for roots we use the priors P(x_i). We assume that the conditional probability tables P(x_i|x_j, z_ij=1) are equal for all nodes at all levels, as in [33]. This unique conditional probability table is denoted Φ. Next, we assume that the observables y_i are conditionally independent given the corresponding x_i:

P(Y|X) = Π_{i∈V} P(y_i|x_i),  (3.2)

P(y_i|x_i=k) = Σ_{g=1}^{G_k} π_k(g) N(y_i; ν_k(g), Σ_k(g)),  (3.3)

where for IQTvo we write V^0 instead of V in Eq. (3.2). P(y_i|x_i=k), k∈M, is modeled as a mixture of Gaussians, whose parameters can be grouped in θ = {π_k(g), ν_k(g), Σ_k(g), G_k}, ∀k∈M. Finally, we specify the connectivity distribution.
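Before doing so, the class-conditional likelihood of Eq. (3.3) can be made concrete. The sketch below assumes diagonal covariances; the function and variable names are ours, not part of the model:

```python
import math

def mixture_likelihood(y, weights, means, variances):
    """P(y | x=k) = sum_g pi_k(g) * N(y; nu_k(g), Sigma_k(g)) of Eq. (3.3),
    for one feature vector y and diagonal covariances (lists of variances)."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        density = 1.0
        # product of per-dimension univariate Gaussian densities
        for yd, md, vd in zip(y, mu, var):
            density *= math.exp(-0.5 * (yd - md) ** 2 / vd) / math.sqrt(2 * math.pi * vd)
        total += w * density
    return total
```

With a single component centered at the observation, this reduces to the usual Gaussian density value.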
In the previous chapter, the connectivity distribution is defined as the prior P(Z) = Π_{i,j} P(z_ij=1), and the constraint on possible tree structures is then imposed by introducing an additional set of random variables, namely, the random node positions R. The main purpose of the R's is to provide a mechanism by which connections between close nodes are favored. That approach has two major disadvantages. First, the additional R variables render exact inference of the dynamic tree intractable, enforcing the use of approximate inference methods (variational approximation). Second, the decision whether nodes i and j should be connected is not informed by the actual values of x_i and x_j. To improve upon the model formulation of the previous chapter, we seek to eliminate the R's and to incorporate the information on image-class labels and node positions directly into the connectivity distribution.

We reason that connections between parents and children whose relative distance is small should be favored over those that are far apart. At the same time, we seek to establish a mechanism that groups nodes belonging to the same image class and separates those assigned to different classes. Let us first examine relative distances between nodes. Due to the symmetry of the node layout (equal to that of the quad-tree), we divide the set of all candidate parents j into classes of equidistance from child i, as depicted in Fig. 3-1. We specify that relative distances can take integer values d_ij∈{0,1,2,...,d_i^max}, where d_ij=0 if i is a root. Note that the d_i^max values vary for different positions of i within one level, as well as across the levels to which i belongs. Given X, we specify the conditional connectivity distribution as

P(Z|X) = Π_{ℓ=0}^{L} Π_{(i,j)∈V^ℓ×{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j),  (3.4)

P(z_ij=1|x_i,x_j) = K^(-1) ρ_i,  if i is a root,
P(z_ij=1|x_i,x_j) = K^(-1) ρ_i (1−ρ_i)^(d_ij),  if x_i = x_j,
P(z_ij=1|x_i,x_j) = K^(-1) ρ_i (1−ρ_i)^(d_i^max − d_ij),  if x_i ≠ x_j,  (3.5)

subject to Σ_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j) = 1,  (3.6)

where K is a normalizing constant, and ρ_i is the parameter of the geometric distribution. From Eq.
(3.5), we observe that when x_i=x_j, P(z_ij=1|x_i,x_j) decreases as d_ij becomes larger, while when x_i≠x_j, P(z_ij=1|x_i,x_j) increases with the distance d_ij. Hence, the form of P(z_ij=1|x_i,x_j) given by Eq. (3.5) satisfies the aforementioned desirable properties. To avoid overfitting, we assume that ρ_i is equal for all nodes i at the same level. The parameters of P(Z|X) can be grouped in the parameter set Ψ = {ρ_i}, ∀i∈V.

Figure 3-1: Classes of candidate parents j that are characterized by a unique relative distance d_ij from child i.

The introduced model parameters can be grouped in the parameter set Θ = {Φ, θ, Ψ}. In the next section we explain how to infer the optimal configuration of Z and X from the observed image data Y, provided that Θ is known.

3.2 Inference of the Irregular Tree with Fixed Node Positions

The standard Bayesian formulation of the inference problem consists in minimizing the expectation of some cost function C, given the data:

(Ẑ, X̂) = arg min_{Z,X} E{C((Z,X), (Z*,X*)) | Y, Θ},  (3.7)

where C penalizes the discrepancy between the estimated configuration (Z,X) and the true one (Z*,X*). We propose the following cost function:

C((Z,X), (Z*,X*)) = C(X,X*) + C(Z,Z*)  (3.8)
 = Σ_{ℓ=0}^{L-1} Σ_{i∈V^ℓ} [1−δ(x_i−x*_i)] + Σ_{ℓ=0}^{L-1} Σ_{(i,j)∈V^ℓ×{0,V^(ℓ+1)}} [1−δ(z_ij−z*_ij)],  (3.9)

where * stands for true values, and δ(·) is the Kronecker delta function. From Eq. (3.9), the resulting Bayesian estimator of X is

(∀i∈V)  x̂_i = arg max_{x_i∈M} P(x_i | Z, Y).  (3.10)

Next, given the constraints on connections in the irregular tree, we derive that minimizing E{C(Z,Z*)|Y,Θ} is equivalent to finding a set of optimal parents ĵ such that

(∀ℓ)(∀i∈V^ℓ)(Ẑ._i≠0)  ĵ = arg max_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j),  for IQTvo,  (3.11a)
(∀ℓ)(∀i∈V^ℓ)  ĵ = arg max_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j),  for IQTv,  (3.11b)

where Z._i is the i-th column of Z, and Ẑ._i≠0 represents the event "node i has children," that is, "node i is included in the irregular-tree structure." The global solution to Eq.
(3.11a) is an open problem in many research areas. We propose a stage-wise optimization in which, moving upwards from the leaf level, ℓ = {0, 1,..., L}, we include in the tree structure optimal parents at V^(ℓ+1) according to

(∀i∈V^ℓ)(Ẑ._i≠0)  ĵ = arg max_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j),  (3.12)

where Ẑ._i≠0 denotes an estimate that i has already been included in the tree structure when optimizing the previous level. By using the results in Eqs. (3.10) and (3.12), we specify the inference algorithm for the irregular quad-tree, which is summarized in Fig. 3-2. In a recursive step t, we first assume that the estimate Ẑ(t−1) of the previous step t−1 is known and then derive the estimate X̂(t) using Eq. (3.10); then, substituting X̂(t) in Eq. (3.12), we derive the estimate Ẑ(t). We consider the algorithm to have converged if P(Y,X̂|Ẑ) does not vary by more than some threshold ε for N_s consecutive iteration steps t. In our experiments, we set ε = 0.01 and N_s = 10.

Steps 2 and 6 of the algorithm can be interpreted as inference of X given Y for a fixed-structure tree. In particular, for Step 2, where the initial structure is the quad-tree, we can use the standard inference on quad-trees, where, essentially, belief messages are propagated in only two sweeps up and down the tree [33,29,31]. For Step 6, the irregular tree represents a forest of subtrees, which also have a fixed, though irregular, structure; therefore, we can use the very same tree-inference algorithm for each of the subtrees. For completeness, in Appendix B, we present the two-pass maximum posterior marginal estimation of X proposed by Laferte et al. [33].

3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

Analogous to the learning algorithm discussed in the previous chapter, the parameters of the irregular tree with fixed node positions can be learned by using standard ML optimization. Here, we assume that N independently generated training images, with observables {Y^n}, n=1,...,N, are given.
As explained before, the configurations of the latent variables {(Z^n, X^n)} must be estimated.

Inference Algorithm
(1) t = 0; initialize the irregular-tree structure Z(0) to the quad-tree;
(2) Compute ∀i∈V, x̂_i(0) = arg max_{x_i∈M} P(x_i|Z(0), Y);
(3) repeat
(4) t = t + 1;
(5) Compute in a bottom-up pass, for ℓ=0,1,...,L: for IQTvo: (∀i∈V^ℓ)(Ẑ._i≠0) ĵ = arg max_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j); for IQTv: (∀i∈V^ℓ) ĵ = arg max_{j∈{0,V^(ℓ+1)}} P(z_ij=1|x_i,x_j);
(6) Compute ∀i∈V, x̂_i(t) = arg max_{x_i∈M} P(x_i|Z(t), Y);
(7) X̂ = X̂(t); Ẑ = Ẑ(t);
(8) until |P(Y,X̂(t)|Ẑ(t)) − P(Y,X̂(t−1)|Ẑ(t−1))| / P(Y,X̂(t−1)|Ẑ(t−1)) < ε for N_s consecutive iteration steps.

Figure 3-2: Inference of the irregular tree with fixed node positions, given the observables Y and the model parameters Θ.

To this end, we propose an iterative learning procedure, where in step t we first assume that Θ(t) = {Φ(t), θ(t), Ψ(t)} is given and then conduct inference for each training image, n = 1,...,N,

(Ẑ^n, X̂^n) = arg min_{Z,X} E{C((Z,X), (Z*,X*)) | Y^n, Θ(t)},

as explained in Section 3.2. Once the estimates {(Ẑ^n, X̂^n)} are found, we apply standard ML optimization to compute Θ(t+1). More specifically, suppose that, in learning step t, realizations of the random variables (Y^n, X̂^n, Ẑ^n) are given for n=1,...,N. Then the parameters of the Gaussian-mixture distributions, in step t+1, can be computed using the standard EM algorithm [56]:

π_c(g) = n_c(g) / n_c,  (3.13)

ν_c(g) = (1/n_c(g)) Σ_i y_i P(ω_c(g)|y_i, x̂_i=c),  (3.14)

Σ_c(g) = (1/n_c(g)) Σ_i (y_i − ν_c(g))(y_i − ν_c(g))^T P(ω_c(g)|y_i, x̂_i=c),  (3.15)

n_c(g) = Σ_i P(ω_c(g)|y_i, x̂_i=c),  (3.16)

where n_c is the total number of nodes over the N training images that are classified as class c. To compute P(ω_c(g)|y_i, x̂_i=c) in Eqs. (3.13)-(3.16), we use the Gaussian-mixture parameters from the previous learning step t. For all classes we set G_c=3. Next, we explain how to learn the parameters of the connectivity distribution, Ψ(t+1) = {ρ_i(t+1)}, i∈V, by using the ML principle:

Ψ(t+1) = arg max_Ψ Π_{n=1}^{N} P(Ẑ^n | X̂^n, Ψ).  (3.17)

Here, we consider two cases, for the IQTv and IQTvo models.
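In both cases, the estimate reduces to the ML estimate of a geometric-distribution parameter: one estimated parent connection per node contributes a count of 1 plus an effective distance term. A schematic sketch (function and variable names are ours, and the per-node effective distances are assumed precomputed from the estimated labels and parent connections):

```python
def geometric_rho_mle(effective_distances):
    """ML estimate of a level-wise geometric parameter rho, in the spirit
    of Eqs. (3.18)-(3.19): rho = n / sum_i (1 + k_i), where k_i is the
    effective distance term of node i's estimated parent connection."""
    n = len(effective_distances)  # number of contributing nodes
    return n / (n + sum(effective_distances))
```

When every node connects at effective distance zero, the estimate is rho = 1; larger distances pull rho toward zero, as expected for a geometric law.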
Recall that the parameters ρ_i are equal for all nodes i at the same level ℓ. Given the estimates {(Ẑ^n, X̂^n)} for each training image n=1,...,N, from Eqs. (3.5) and (3.17) we derive for IQTv:

ρ_ℓ(t+1) = N|V^ℓ| / Σ_{n=1}^{N} Σ_{i∈V^ℓ} [1 + I(x̂_i=x̂_ĵ) d̂_ij + I(x̂_i≠x̂_ĵ)(d_i^max − d̂_ij)],  (3.18)

where I(·) is an indicator function, ĵ is the estimated parent of node i, and d̂_ij denotes the relative distance assigned to the estimated connection ẑ_iĵ=1. For IQTvo, given the estimates {(Ẑ^n, X̂^n)} for each training image n=1,...,N, we consider only the nodes i∈V^ℓ that are included in the corresponding irregular tree, i.e., those with Ẑ._i≠0. Thus, from Eqs. (3.5) and (3.17), we derive:

ρ_ℓ(t+1) = Σ_{n=1}^{N} Σ_{i∈V^ℓ} I(Ẑ._i≠0) / Σ_{n=1}^{N} Σ_{i∈V^ℓ} I(Ẑ._i≠0)[1 + I(x̂_i=x̂_ĵ) d̂_ij + I(x̂_i≠x̂_ĵ)(d_i^max − d̂_ij)],  (3.19)

where, as before, I(·) is an indicator function, ĵ is the estimated parent of node i, and d̂_ij denotes the relative distance assigned to the estimated connection ẑ_iĵ=1. Finally, to learn the conditional probability table Φ, we use the standard EM algorithm on fixed-structure trees, thoroughly discussed in [33]. Note that to obtain the estimates {(Ẑ^n, X̂^n)} for each training image n=1,...,N, in learning step t, we in fact have to conduct the MPM estimation given in Appendix B, Fig. B. By using the already available P(x_i, x_j|Y^n, ẑ_ij=1) and P(x_j|Y^n), obtained for each image n as in Fig. B, we derive

Φ(t+1) = Σ_{n=1}^{N} Σ_{i∈V} P(x_i, x_j|Y^n, ẑ_ij=1) / Σ_{n=1}^{N} Σ_{i∈V} P(x_j|Y^n).  (3.20)

The overall learning procedure is summarized in Fig. 3-3.

Learning Algorithm
(1) t = 0; initialize Θ(0) = {Φ(0), θ(0), Ψ(0)};
(2) Estimate, for n=1,...,N, (Ẑ^n, X̂^n) = arg min_{Z,X} E{C((Z,X), (Z*,X*)) | Y^n, Θ(0)};
(3) repeat
(4) t = t + 1;
(5) Compute: θ(t) as in Eqs. (3.13)-(3.16); Ψ(t), for IQTv as in Eq. (3.18), for IQTvo as in Eq. (3.19); Φ(t), as in Eq. (3.20);
(6) Estimate, for n=1,...,N, (Ẑ^n, X̂^n) = arg min_{Z,X} E{C((Z,X), (Z*,X*)) | Y^n, Θ(t)}, using the inference algorithm in Fig.
3-2;
(7) Θ* = Θ(t);
(8) until (∀n) |P(Y^n, X̂^n|Ẑ^n, Θ*) − P(Y^n, X̂^n|Ẑ^n, Θ(t−1))| / P(Y^n, X̂^n|Ẑ^n, Θ(t−1)) < ε.

Figure 3-3: Algorithm for learning the parameters of the irregular tree; for notational simplicity, in Step (8) we do not indicate the different estimates of (Ẑ^n, X̂^n) for Θ* and Θ(t−1).

Once Θ* is learned, we can localize, detect, and recognize objects in an image by conducting the inference algorithm presented in Fig. 3-2.

CHAPTER 4
COGNITIVE ANALYSIS OF OBJECT PARTS

Inference of the hidden variables (Z, X) can be viewed as building a forest of subtrees, each segmenting an image into arbitrary (not necessarily contiguous) regions, which we interpret as objects. Since each root determines a subtree whose leaf nodes form a detected object, we assign physical meaning to roots by assuming that they represent whole objects. Moreover, each descendant of a root can be viewed as the root of another subtree, whose leaf nodes cover only a part of the object. Hence, we say that roots' descendants represent object parts at various scales.

Strategies for recognizing detected objects naturally arise from a particular interpretation of the tree/sub-tree structure. Below, we make a distinction between two such strategies. The analysis of the image regions under the roots leads to the whole-object recognition strategy, while the analysis of the image regions determined by roots' descendants constitutes the object-part recognition strategy. For both approaches, final recognition is conducted by majority voting over the MAP labels, x̂_i, of leaf nodes.¹ The reason for analyzing smaller image regions than those under the roots stems from our hypothesis that the information in fine-scale object details may prove critical for the recognition of an object as a whole in scenes with occlusions. To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance
of object components (i.e., irregular-tree nodes) with respect to the recognition of objects as a whole.

¹ The literature offers various strategies that outperform majority-voting classification (e.g., multiscale Bayesian classification [29] and multiscale Viterbi classification [32]); however, they do not account explicitly for occlusions and, as such, do not significantly outperform majority voting for scenes with occluded objects.

4.1 Measuring Significance of Object Parts

We hypothesize that the significance of object parts with respect to object recognition depends on both local, innate object properties and global scene properties. While innate properties represent characteristic object features, which differentiate one object from another, global scene properties describe the interdependencies of object parts in the overall image composition. It is necessary to account for both local and global cues, as the most conspicuous object component need not be the most significant for that object's recognition in the presence of similar objects. The analysis of innate object properties is handled through inference of the irregular tree, where, for a given image, we compute P(x_i|Z,Y), ∀i∈V, as explained in Chapters 2 and 3. To account for the influence of global scene properties, for each node i, we compute Shannon's entropy over the set of image classes, M, as

(∀i∈V)(Ẑ._i≠0)  H_i = − Σ_{x_i∈M} P(x_i|Z,Y) log P(x_i|Z,Y).  (4.1)

Since node i represents an object part, we define H_i as a measure of the significance of that object part. Note that a node with small entropy is characterized by a "peaky" distribution P(x_i|Z,Y), with the maximum at, say, x_i = k ∈ M. This indicates that the classification error will be small when i is labeled as class k. Recall that, during inference, the belief message of i is propagated down the subtree in belief propagation [33], which is likely to render i's descendants with small entropies as well.
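As a concrete illustration of Eq. (4.1), the entropy-based significance measure can be computed as follows (a minimal sketch; the names are ours):

```python
import math

def node_entropy(posterior):
    """Shannon entropy H_i of a node's class posterior P(x_i | Z, Y),
    as in Eq. (4.1); a peaky posterior yields low entropy, marking an
    object part as more significant for recognition."""
    return -sum(p * math.log(p) for p in posterior if p > 0.0)
```

A uniform posterior over |M| classes attains the maximum, log |M|, while a posterior concentrated on a single class gives entropy 0.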
Thus, the classification error over the whole region of leaf nodes under i is likely to be small, when compared to some other image region under, say, a node j with H_j > H_i. Consequently, i is more "significant" for the recognition of class k than node j. In brief, the most significant object part has the smallest entropy over all nodes in a given sub-tree T:

i* = arg min_{i∈T} H_i.  (4.2)

In Figs. 4-1 and 4-2, we illustrate the most significant object part under each root, where entropy is computed over seven and six image classes, shown in Figs. 4-1 (top) and 4-2 (top), respectively. The experiment is conducted as explained in Chapter 2, using the

Figure 4-1: For each subtree of ITv, representing an object in the 128×128 image, the node i* with the lowest entropy is found, for |M| = 6+1 = 7 possible image classes (top row). Bright pixels are descendants of i* at the leaf level and indicate the object part represented by i*.

irregular tree with random node positions and observables at all levels (ITv). Details on computing the observables Y in this experiment are given in Chapter 5. Note that for different scenes, different object parts are established as the most significant with respect to the entropy measure.

4.2 Combining Object-Part Recognition Results

Once nodes are ranked with respect to the entropy measure, we are in a position to devise a criterion for optimally combining this information toward ultimate object recognition. Herewith, we propose a simple greedy algorithm, which, nonetheless, shows remarkable improvements in performance over the whole-object recognition approach. Under each root, we first select the descendant node with the smallest entropy. Each selected node determines a subtree, whose leaf nodes form an object part. Then, we conduct majority voting over these selected image regions. In the second round, we select under each root the descendant node with the smallest entropy such that it does not belong to any of the subtrees selected in the first round. Now, these nodes determine new subtrees, whose leaf nodes form object parts that do not overlap with the selected image regions in
Now, these nodes determine new subtrees, whose leaf nodes form object parts that do not overlap with the selected image regions in Figure 4-2: For each subtree of ITy, representing an object in the 256 x 256 image, a node i* is found with the highest entropy for |M| = 5 +1 = 6 possible image classes (top row). Bright pixels are descendants of i* at the leaf level and indicate the object part represented by i*; the images represent the same scene viewed from three different angles; the most significant object parts differ over various scenes. the first round. Then, we conduct majority voting over the newly selected image regions. This procedure is repeated until we exhaustively cover all the pixels in the image. This stage-wise majority voting over non-overlapping image regions constitutes the final step in the object-part recognition strategy (see Fig. 1 3). CHAPTER 5 FEATURE EXTRACTION In C'lI 1.1. 'p 2 and 3, we have introduced four architectures of the irregular tree, referred to as ITv, ITvo, IQTv, and IQTvo. To compute the observable (feature) random vectors Y's for these models, we account for both color and texture cues. 5.1 Texture For the choice of texture-based features, we have considered several filtering, model- based and statistical methods for texture feature extraction. Our conclusion complies with the comparative study of Randen and Husoy [66] that for problems with many textures with subtle spectral differences, as in the case of our complex classes, it is reasonable to assume that the spectral decomposition by a filter bank yields consistently superior results over other texture mI '1-, -i, methods. Our experimental results also 1:_:_ -. that it is crucial to i, '1-,. both local as well as regional properties of texture. As such, we (n 11,. -,, the wavelet transform, due to its inherent representation of texture at different scales and locations. 
5.1.1 Wavelet Transform

Wavelet atom functions, being well localized both in space and frequency, retrieve texture information quite successfully [67]. The conventional discrete wavelet transform (DWT) may be regarded as equivalent to filtering the input signal with a bank of bandpass filters whose impulse responses are all given by scaled versions of a mother wavelet. The scaling factor between adjacent filters is 2:1, leading to octave bandwidths and center frequencies that are one octave apart. The octave-band DWT is most efficiently implemented by the dyadic wavelet decomposition tree of Mallat [68], where the wavelet coefficients of an image are obtained by convolving every row and column with the impulse responses of lowpass and highpass filters, as shown in Figure 5-1. In practice, the coefficients at one scale are obtained by convolving every second row and column from the previous, finer scale. Thus, the filter output is a wavelet subimage with four times fewer coefficients than the one at the previous scale. The lowpass filter is denoted H0 and the highpass filter H1. The wavelet coefficients W carry the index L for lowpass output and H for highpass output. Separable filtering of rows and columns produces four subimages at each level, which can be arranged as shown in Figure 5-2.

Figure 5-1: Two levels of the DWT of a two-dimensional signal.

Figure 5-2: The original image (left) and its two-scale dyadic DWT (right).

The same figure also illustrates the directional selectivity of the DWT: the WLH, WHL, and WHH bandpass subimages select horizontal, vertical, and diagonal edges, respectively.

5.1.2 Wavelet Properties

The following properties of the DWT have made wavelet-based image processing very attractive in recent years [67,30,69]:

1.
locality: each wavelet coefficient represents local image content in space and frequency, because wavelets are well localized simultaneously in space and frequency;
2. multi-resolution: the DWT represents an image at different scales of resolution in the space domain (i.e., different bands in the frequency domain); regions of analysis at one scale are divided up into four smaller regions at the next finer scale (Fig. 5-2);
3. edge detection: edges of an image are represented by large wavelet coefficients at the corresponding locations;
4. energy compression: wavelet coefficients are large only if edges are present within the support of the wavelet, which means that the majority of wavelet coefficients have small values;
5. decorrelation: wavelet coefficients are approximately decorrelated, since the scaled and shifted wavelets form an orthonormal basis; dependencies among wavelet coefficients are predominantly local;
6. clustering: if a particular wavelet coefficient is large/small, then the adjacent coefficients are very likely to also be large/small;
7. persistence: large/small values of wavelet coefficients tend to propagate through scales;
8. non-Gaussian marginals: wavelet coefficients have peaky and long-tailed marginal distributions; due to the energy-compression property, only a few wavelet coefficients have large values; therefore, a Gaussian distribution for an individual coefficient is a poor statistical model.

It is also important to note the shortcomings of the DWT. Discrete wavelet decompositions suffer from two main problems, which hamper their use in many applications [70]:

1. lack of shift invariance: small shifts in the input signal can cause major variations in the energy distribution of wavelet coefficients;
2. poor directional selectivity: for some applications, horizontal, vertical, and diagonal selectivity is insufficient.

When we analyze the Fourier spectrum of a signal, we expect the energy in each frequency bin to be invariant to any shifts of the input.
Unfortunately, the DWT has a significant drawback: the energy distribution between the various wavelet scales depends critically on the position of key features of the input signal, whereas ideally it would depend on just the features themselves. Therefore, the real DWT is unlikely to give consistent results when used in texture analysis.

Figure 5-3: The Q-shift Dual-Tree CWT.

In the literature, several approaches have been proposed to overcome this problem (e.g., Discrete Wavelet Frames [67,71]), all of which increase the computational load with inevitable redundancy in the wavelet domain. In our opinion, the Complex Wavelet Transform (CWT) offers the best solution, providing additional advantages, described in the following subsection.

5.1.3 Complex Wavelet Transform

The structure of the CWT is the same as in Figure 5-1, except that the CWT filters have complex coefficients and generate complex output. The output sampling rates are unchanged from the DWT, but each wavelet coefficient contains a real and an imaginary part, so a redundancy of 2:1 is introduced for one-dimensional signals. In our case, for two-dimensional signals, the redundancy becomes 4:1, because two adjacent quadrants of the spectrum are required to fully represent a real two-dimensional signal, adding an extra 2:1 factor. This is achieved by additional filtering with complex conjugates of either the row or column filters [70]. Despite its higher computational cost, we prefer the CWT over the DWT because of the CWT's attractive properties, described below. The CWT is shown to possess almost shift and rotational invariance, given suitably designed biorthogonal or orthogonal wavelet filters. We

Table 5-1: Coefficients of the filters used in the Q-shift DTCWT: H13 (symmetric, 13 taps), H19 (antisymmetric, 19 taps), and H6 (6 taps).
We implement the Q-shift Dual-Tree CWT scheme proposed by Kingsbury [72], as depicted in Figure 5-3. For clarity, the figure shows the CWT of a one-dimensional signal x only. The outputs of trees a and b can be viewed as the real and imaginary parts, respectively, of the complex wavelet coefficients. Thus, to compute the CWT, we implement two real DWTs (see Fig. 5-1), obtaining a wavelet frame with redundancy two. As for the DWT, lowpass and highpass filters are denoted with 0 and 1 in the index, respectively. Level 0 comprises the odd-length filters H0a(z) = H0b(z) = H13(z) (13 taps) and H1a(z) = H1b(z) = H19(z) (19 taps). Levels above level 0 consist of the even-length filters H00a(z) = z^{-1}H6(z^{-1}), H01a(z) = H6(-z), H00b(z) = H6(z), H01b(z) = -z^{-1}H6(-z^{-1}), where the impulse responses of the filters H13, H19, and H6 are given in Table 5-1.

Aside from being shift invariant, the CWT is superior to the DWT in terms of directional selectivity, too. A two-dimensional CWT produces six bandpass subimages (analogous to the three subimages in the DWT) of complex coefficients at each level, which are strongly oriented at angles of 15°, 45°, and 75°, as illustrated in Figure 5-4.

Figure 5-4: The CWT is strongly oriented at angles 15°, 45°, 75°.

Another advantageous property of the CWT emerges in the presence of noise. The phase and magnitude of the complex wavelet coefficients collaborate in a non-trivial way to describe the data [70]. The phase encodes the coherent (in space and scale) structure of an image, which is resilient to noise, while the magnitude captures the strength of local information, which can be very susceptible to noise corruption. Hence, the phase of complex wavelet coefficients might be used as a principal clue for image denoising.
However, our experimental results have shown that phase is not a good feature choice for sky/ground modeling. Therefore, we consider only magnitudes. In summary, for texture analysis in ITv and IQTv, we choose the complex wavelet transform (CWT) applied to the intensity (grayscale) image, due to its shift-invariant representation of texture at different scales, orientations, and locations.

5.1.4 Difference-of-Gaussian Texture Extraction

In ITvo and IQTvo, observables are present only at the leaf level. Therefore, for these models, multiscale texture extraction is superfluous. Here, we compute the difference-of-Gaussian function convolved with the image as

D(x, y, k, σ) = (G(x, y, kσ) − G(x, y, σ)) * I(x, y),   (5.1)

where x and y represent pixel coordinates, G(x, y, σ) = exp(−(x² + y²)/2σ²)/(2πσ²), and I(x, y) is the intensity image. In addition to reduced computational complexity, as compared to the CWT, the function D provides a close approximation to the scale-normalized Laplacian of Gaussian, σ²∇²G, which has been shown to produce the most stable image features across scales when compared to a range of other possible image functions, such as the gradient and the Hessian [73,74]. We compute D(x, y, k, σ) for three scales k = √2, 2, √8, and σ = 2.

5.2 Color

The color information in a video signal is typically encoded in the RGB color space. For color features, in all models, we choose the generalized RGB color space: r = R/(R+G+B) and g = G/(R+G+B), which effectively normalizes out variations in brightness. For ITv and IQTv, the y's of higher-level nodes are computed as the mean of the r's and g's of their children nodes in the initial tree structure. Each color observable is normalized to have zero mean and unit variance over the dataset. In summary, the y's are 8-dimensional vectors for ITv and IQTv, and 5-dimensional vectors for ITvo and IQTvo.

CHAPTER 6
EXPERIMENTS AND DISCUSSION

We report experiments on image segmentation and classification for six sets of images.
Dataset I comprises fifty 64x64 simple-scene images with object appearances of 20 distinct objects, shown in Fig. 6-1. Samples of dataset I are given in Figs. 6-2, 6-3, and 6-4. Dataset II contains 120 128x128 complex-scene images with partially occluded appearances of the same 20 distinct objects as in dataset I. Examples of dataset II are shown in Figs. 6-11, 6-12, and 6-15. Note that the objects appearing in datasets I and II are carefully chosen to test whether irregular trees are expressive enough to capture very small variations in the appearances of some classes (e.g., two different types of cans in Fig. 6-1), as well as to encode large differences among other classes (e.g., the wiry-featured robot and the books in Fig. 6-1). Next, dataset III contains fifty 128x128 natural-scene images, samples of which are shown in Figs. 6-5 and 6-6. For dataset IV, we choose sixty 128x128 images from a database that is publicly available at the Computer Vision Home Page. Dataset IV contains a video sequence of two people approaching each other, who wear alike shirts but different pants, as illustrated in Fig. 6-16. The sequence is interesting because the "object" parts most significant for differentiating between the two persons (i.e., the pants) get occluded. Moreover, the images represent scenes with clutter, where recognition of partially occluded, similar-in-appearance people becomes harder. Together with the two persons, there are 12 possible image classes appearing in dataset IV, as depicted in Fig. 6-16a. Here, each image is treated separately, without making use of the fact that the background scene does not change in the video sequence. Further, dataset V consists of sixty 256x256 images, typical samples of which are shown in Fig. 6-17b. The images in dataset V represent a video sequence of a complex scene, which is observed from different viewpoints by moving a camera horizontally clockwise.
Together with the background, there are 6 possible image classes, as depicted in Fig. 6-17a. Finally, dataset VI consists of sixty 256x256 natural-scene images, samples of which are shown in Fig. 6-18. The images in dataset VI represent a video sequence of a row of houses, observed from different viewpoints. The houses are very similar in appearance, so the recognition task becomes very difficult when the details differentiating one house from another are occluded. There are 8 possible image classes: 4 different houses, sky, road, grass, and tree, as marked with different colors in Fig. 6-18. All datasets are divided into training and test sets by random selection of images, such that 2/3 are used for training and 1/3 for testing. Ground truth for each image is determined through hand-labeling of pixels.

6.1 Unsupervised Image Segmentation Tests

We first report experiments on unsupervised image segmentation using ITvo and ITv. Irregular-tree-based image segmentation is tested on datasets I and III, and is conducted by the algorithm given in Fig. 2-4. Since in unsupervised settings the parameters of the model are not known, we initialize them as discussed in the initialization step of the learning algorithm in Section 2.5. After Bayesian estimation of the irregular tree, each node defines one image region composed of those leaf nodes (pixels) that are the node's descendants. The results presented in Figs. 6-2, 6-3, 6-4, 6-5, and 6-6 suggest that irregular trees are able to parse images into "meaningful" parts by assigning one subtree per "object" in the image. Moreover, from Figs. 6-2 and 6-3, we also observe that irregular trees inferred through SVA preserve structure for objects across images subject to translation, rotation, and scaling. In Fig. 6-2, note that the level-4 clustering for the larger-object scale in Fig. 6-2(top-right) corresponds to the level-3 clustering for the smaller-object scale in Fig. 6-2(bottom-center).
In other words, as the object transitions through scales, the tree structure changes by eliminating the lowest-level layer, while the higher-order structure remains intact. We also note that the estimated positions of higher-level hidden variables in ITvo and ITv are very close to the centers of mass of object parts, as well as of whole objects. We compute the error of an estimated root-node position r as the distance from the actual center of mass rCM of the hand-labeled object, d_err = ||r − rCM||. Also, we compare our SVA inference algorithm with the variational approximation (VA)¹ proposed by Storkey and Williams [48].

Figure 6-1: 20 image classes in type I and II datasets.

Figure 6-2: Image segmentation using ITvo: (left) dataset I images; (center) pixel clusters with the same parent at level l=3; (right) pixel clusters with the same parent at level l=4; points mark the positions of parent nodes. Irregular-tree structure is preserved through scales.

Figure 6-3: Image segmentation using ITvo: (top) dataset I images; (bottom) pixel clusters with the same parent at level l=3. Irregular-tree structure is preserved over rotations.

The error values averaged over the given test images for VA and SVA are reported in Table 6-1. We observe that the error significantly decreases as the image size increases, because in summing node positions over parent and children nodes, as in Eq. (2.16) and Eq. (2.17), more statistically significant information contributes to the position estimates. For example, d = 6.18 for SVA is only 4.8% of the dataset-III image size, whereas d = 4.23 for SVA is 6.6% of the dataset-I image size. In Table 6-2, we report the percentage of erroneously grouped pixels, and, in Table 6-3, we report the object detection error, when compared to ground truth, averaged over each dataset.
¹ Although the algorithm proposed by Storkey and Williams [48] is also a structured variational approximation, to differentiate that method from ours, we slightly abuse the notation.

Figure 6-4: Image segmentation by irregular trees learned using SVA: (a)-(c) ITvo for dataset I images; all pixels labeled with the same color are descendants of a unique root.

Figure 6-5: Image segmentation by irregular trees learned using SVA: (a) ITvo for a dataset III image; (b)-(d) ITv for dataset III images; all pixels labeled with the same color are descendants of a unique root.

Figure 6-6: Image segmentation using ITv: (a) a dataset III image; (b)-(d) pixel clusters with the same parent at levels l=3, 4, 5, respectively; white regions represent pixels already grouped by roots at the previous scale; points mark the positions of parent nodes.

Table 6-1: Root-node distance error

             ITvo            ITv
 dataset    VA     SVA      VA     SVA
 I          6.32   4.61     6.14   4.23
 III        9.15   6.87     8.99   6.18

Table 6-2: Pixel segmentation error

             ITvo            ITv
 dataset    VA     SVA      VA     SVA
 I          7%     –        7%     –
 III        10%    9%       11%    7%

Table 6-3: Object detection error

             ITvo            ITv
 dataset    VA     SVA      VA     SVA
 I          –      –        –      2%
 III        –      10%      10%    –

For estimating the object detection error, the following instances are counted as error: (1) merging two distinct objects into one (i.e., failure to detect an object), and (2) segmenting an object into sub-regions that are not actual object parts. On the other hand, if an object is segmented into several "meaningful" sub-regions, verified by visual inspection, this type of error is not counted. Overall, we observe that SVA outperforms VA for image segmentation using ITvo and ITv. Interestingly, the segmentation results for ITv models are only slightly better than for ITvo models. It should be emphasized that our experiments are carried out in an unsupervised manner and, as such, cannot be equitably evaluated against the supervised object recognition results reported in the literature.
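For concreteness, once inferred pixel groups have been put in correspondence with ground-truth regions, the pixel segmentation error of Table 6-2 reduces to a simple fraction; a hypothetical sketch (illustrative names, with the group-to-region matching assumed already established):

```python
def pixel_error(predicted, ground_truth):
    """Fraction of pixels whose inferred group label disagrees with ground truth."""
    assert len(predicted) == len(ground_truth)
    wrong = sum(1 for p, g in zip(predicted, ground_truth) if p != g)
    return wrong / len(ground_truth)

# 1 of 4 pixels mislabeled: 25% error
print(pixel_error([1, 1, 2, 2], [1, 1, 2, 1]))
```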
Take, for instance, the segmentation in Fig. 6-5d, where two boys dressed in white clothes (i.e., two similar-looking objects) are merged into one subtree. Given the absence of prior knowledge, the ground-truth segmentation for this image is arbitrary, and the resulting segmentation ambiguous; nevertheless, we still count it towards the object-detection error percentages in Table 6-3. Our claim that nodes at different levels of irregular trees represent object parts at various scales is supported by the experimental evidence that the nodes segment the image into "meaningful" object sub-components and position themselves at the centers of mass of these sub-parts.

6.2 Tests of Convergence

In this section, we report on the convergence properties of the inference algorithms for ITvo, ITv, IQTvo, and IQTv. First, we compare our SVA inference algorithm with the variational approximation (VA) [48]. In Figs. 6-7a-b, we illustrate the convergence rate of computing P(Z, X, R'|Y, R⁰) ∝ Q(Z, X, R') for SVA and VA, averaged over the given datasets. Numbers above the bars represent the mean number of iteration steps it takes for the algorithm to converge. We consider the algorithm converged when |Q(Z, X, R'; t+1) − Q(Z, X, R'; t)| / Q(Z, X, R'; t) ≤ ε = 0.01 (see Fig. 2-4, Step (11)).

Figure 6-7: Comparison of inference algorithms: (a)-(b) average convergence rate for ITvo and ITv, respectively; (c)-(d) percentage increase in log Q(Z, X, R') computed in SVA over log Q(Z, X, R') computed in VA, for ITvo and ITv, respectively.

Overall, SVA converges in the fewest number of iterations.
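The stopping rule above can be sketched as follows (a minimal illustration with hypothetical names, not the dissertation's implementation):

```python
def converged(q_prev, q_curr, eps=0.01):
    """Relative-change stopping rule: |Q(t+1) - Q(t)| / Q(t) <= eps."""
    return abs(q_curr - q_prev) / abs(q_prev) <= eps

print(converged(100.0, 90.0))   # 10% relative change: keep iterating
print(converged(100.0, 99.5))   # 0.5% relative change: stop
```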
For example, the average number of iterations for SVA on dataset III is 25 and 23 for ITvo and ITv, respectively, which takes approximately 6 s and 5 s on a Dual 2 GHz PowerPC G5. Here, the processing time also includes image-feature extraction. For the same experiments, in Figs. 6-7c-d, we report the percentage increase in log Q(Z, X, R') computed using our SVA over log Q(Z, X, R') obtained by VA. We note that SVA results in larger approximate posteriors than VA. The larger log Q(Z, X, R') means that the assumed form of the approximate posterior distribution Q(Z, X, R') = Q(Z)Q(X|Z)Q(R'|Z) more accurately represents the underlying stochastic processes in the image than VA.

Now, we compare the convergence of the inference algorithm for IQTvo with SVA and VA for ITvo. For simplicity, we refer to the inference algorithm for the model IQTvo also as IQTvo, slightly abusing the notation. The parameters that control the convergence criterion for the inference algorithms of the three models are N=10 and ε=0.01.

Figure 6-8: Typical convergence rates of the inference algorithms on the 128x128 dataset IV image in Fig. 6-16b; the SVA and VA inference algorithms are conducted for the ITvo model.

Figure 6-9: Typical convergence rates of the inference algorithms on the 256x256 dataset V image in Fig. 6-17b; the SVA and VA inference algorithms are conducted for the ITvo model.

Figure 6-10: Percentage increase in the log-likelihood log P(Y|X) of IQTvo over log P(Y|X) of ITvo, after 500 and 200 iteration steps for datasets IV and V, respectively.

Figs. 6-8 and 6-9 illustrate typical examples of the convergence rate.
We observe that the inference algorithm for IQTvo converges slightly slower than SVA and VA for ITvo. The average number of iteration steps for IQTvo is approximately 160 and 230, which takes 6 s and 17 s on a Dual 2 GHz PowerPC G5, for datasets IV and V, respectively. The bar chart in Fig. 6-10 shows the percentage increase in log P2 over log P1, where P1 = P(Y|X) is the likelihood of ITvo, and P2 = P(Y|X) that of IQTvo. We observe that P(Y|X) of IQTvo, after the algorithm has converged, is larger than P(Y|X) of ITvo. The larger likelihood means that the model structure and inferred distributions more accurately represent the underlying stochastic processes in the image.

6.3 Image Classification Tests

We compare the classification performance of ITvo with that of the following statistical models: (1) Markov Random Field (MRF) [6], (2) Discriminative Random Field (DRF) [25], and (3) Tree-Structured Belief Network (TSBN) [33,29]. These models are representatives of descriptive, discriminative, and fixed-structure generative models, respectively. Below, we briefly explain the models.

For MRFs, we assume that the label field P(X) is a homogeneous and isotropic MRF, given by the generalized Ising model with only pairwise nonzero potentials [6]. The likelihoods P(yi|xi) are assumed conditionally independent given the labels. Thus, the posterior energy function is given by

U(X|Y) = −Σ_{i∈V⁰} log P(yi|xi) + Σ_{i∈V⁰} Σ_{j∈Ni} V2(xi, xj),

V2(xi, xj) = −βMRF, if xi = xj;  +βMRF, if xi ≠ xj,

where Ni denotes the neighborhood of i, P(yi|xi) is a G-component mixture of Gaussians given by Eq. (2.6), and βMRF is the interaction parameter. Details on learning the model parameters, as well as on inference for a given image, can be found in Stan Li's book [6].

Next, the posterior energy function of the DRF is given by

U(X|Y) = Σ_{i∈V⁰} Ai(xi, Y) + Σ_{i∈V⁰} Σ_{j∈Ni} Iij(xi, xj, Y),

where Ai = log σ(xi W^T yi) and Iij = βDRF (K xi xj + (1−K)(2σ(xi xj V^T yij) − 1)) are the unary and pairwise potentials, respectively.
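A minimal sketch of these DRF potentials may help fix the notation (toy feature vectors and illustrative names; the actual feature maps and learned parameters are those of Kumar and Hebert [25]):

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def unary(x_i, w, y_i):
    """A_i = log sigma(x_i * w^T y_i), for labels x_i in {-1, +1}."""
    return math.log(sigmoid(x_i * sum(wk * yk for wk, yk in zip(w, y_i))))

def pairwise(x_i, x_j, v, y_ij, beta, kappa):
    """I_ij = beta * (K x_i x_j + (1 - K)(2 sigma(x_i x_j v^T y_ij) - 1))."""
    t = x_i * x_j * sum(vk * yk for vk, yk in zip(v, y_ij))
    return beta * (kappa * x_i * x_j + (1 - kappa) * (2 * sigmoid(t) - 1))

# For beta > 0 and 0 < K < 1, the interaction term is larger when the
# neighbouring labels agree (x_i * x_j = +1) than when they disagree.
same = pairwise(+1, +1, [0.5], [1.0], beta=1.0, kappa=0.5)
diff = pairwise(+1, -1, [0.5], [1.0], beta=1.0, kappa=0.5)
print(same > diff)
```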
Since the above formulation deals only with binary classification (i.e., xi ∈ {−1, +1}), when estimating the parameters {W, V, βDRF, K} for an object, we treat that object as a positive example and all other objects as negative examples (the "one against all" strategy). For details on how to learn the model parameters, and how to conduct inference for a given image, we refer the reader to the paper of Kumar and Hebert [25].

Further, TSBNs, or quad-trees, are defined to have the same number of nodes V and levels L as the irregular trees. For both ITvo and TSBNs, we use the same image features. When we operate on wavelets, which are a multiscale image feature, we in fact propagate observables to higher levels. In this case, we refer to the counterpart of ITv as TSBNT. To learn the parameters of a TSBN or TSBNT, and to perform inference on a given image, we use the algorithms thoroughly discussed by Laferte et al. [33].

Finally, irregular-tree-based image classification is conducted by deploying the inference algorithms in Fig. 2-4 for ITvo and ITv, and the inference algorithms in Fig. 3-2 for IQTvo and IQTv. Since image classification represents a supervised machine learning problem, it is necessary to first learn the model parameters on training images. For this purpose, we employ the learning algorithms discussed in Section 2.5 for ITvo and ITv, and the learning algorithms discussed in Section 3.3 for IQTvo and IQTv.

After inference of the MRF, DRF, TSBN, and the irregular tree on a given image, for each model, we conduct pixel labeling by using the MAP classifier. In Fig. 6-11, we illustrate an example of pixel labeling for a dataset-II image. Here, we say that an image region is correctly recognized as an object if the majority of MAP-classified pixel labels in that region are equal to the true labeling of the object.
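The majority-vote decision just described can be sketched as follows (hypothetical labels, not the dissertation's code):

```python
from collections import Counter

def region_recognized(map_labels, true_label):
    """A region counts as recognized if its most frequent MAP label matches truth."""
    winner, _ = Counter(map_labels).most_common(1)[0]
    return winner == true_label

print(region_recognized(["cup", "cup", "book", "cup"], "cup"))   # majority is "cup"
print(region_recognized(["cup", "book", "book"], "cup"))         # majority is "book"
```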
For estimating the object-recognition error, the following instances are counted as error: (1) merging two distinct objects into one, and (2) swapping the identities of objects. The object-recognition error over all objects in the 40 test images in dataset II is summarized in Table 6-4. In each cell of Table 6-4, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of swapped-identity errors. For instance, for ITvo the overall recognition error is 9.1%, of which 37% of instances were caused by swapped-identity errors. Moreover, Table 6-5 shows the average pixel-labeling error.

Table 6-4: Object recognition error

 image type    MRF         DRF         TSBN        ITvo
 dataset II    21.2% (–)   12.5% (–)   14% (72%)   9.1% (37%)

Table 6-5: Pixel labeling error

 image type    MRF    DRF    TSBN     ITvo
 dataset II    –      12%    16.1%    9.9%

Next, we examine the receiver operating characteristic (ROC) of the MRF, DRF, TSBN, and ITvo for a two-class recognition problem. From the set of image classes given in Fig. 6-1, we choose "toy-snail" and one other object as the two possible classes in the following set of experiments. The task is to label two-class-problem images containing these two objects, a typical example of which is shown in Fig. 6-12. Here, pixels labeled as "toy-snail" are considered true positives, while pixels labeled as the other class are considered true negatives. In Fig. 6-13, we plot ROC curves for the two-class problem, where we compare the performance of ITvo with those of the MRF, DRF, and TSBN. From Fig. 6-13, we observe that image classification with ITvo is the most accurate, since its ROC curve is the closest to the left-hand and top borders of the ROC space, as compared to the ROC curves of the other models. Further, in Fig. 6-14, we plot ROC curves for the same two-class problem, where we compare the performance of ITv with those of ITvo, TSBN, and TSBNT. From Fig.
6-14, we observe that image classification with ITv is the most accurate, and that both ITvo and ITv outperform their fixed-structure counterparts TSBN and TSBNT.

Figure 6-11: Comparison of classification results for various statistical models: (a) 256x256 image; (b) MRF; (c) DRF; (d) TSBN; (e) ITvo; pixels are labeled with a color specific to each object; non-colored pixels are classified as background.

Figure 6-12: MAP pixel labeling using different statistical models: (a) 256x256 image; (b) MRF; (c) DRF; (d) TSBN; (e) ITvo.

Figure 6-13: ROC curves for the image in Fig. 6-12a with ITvo, TSBN, DRF, and MRF.

From the results reported in Tables 6-4 and 6-5, as well as from Figs. 6-13 and 6-14, we note that irregular trees outperform the other three models. However, the recognition performance of all the models suffers substantially when an image contains occlusions. While for some applications the literature reports vision systems with impressively small classification errors (e.g., 2.5% hand-written digit recognition error [75]), in the case of complex scenes this error is much higher [76,77,11,5,4]. To some extent, our results could have been improved had we employed more discriminative image features and/or more sophisticated classification algorithms than the majority rule. However, none of these would alleviate the fundamental problem of "traditional" recognition approaches: the lack of explicit analysis of visible object parts. Thus, the poor classification performance of the MRF, DRF, and TSBN, reported in Tables 6-4 and 6-5, can be interpreted as follows. Accounting for only pairwise potentials between adjacent nodes in the MRF and DRF is not sufficient to analyze complex configurations of objects in the scene. Also, the analysis of fixed-size pixel neighborhoods at various scales in the TSBN leads to "blocky" estimates, and consequently
to poor classification performance. Therefore, we hypothesize that the main reason why irregular trees outperform the other models is their capability to represent object details at various scales, which in turn provides for explicit analysis of visible object parts. In other words, we speculate that in the face of the occlusion problem, recognition of object parts is critical and should condition recognition of the object as a whole. To support our hypothesis, instead of employing more sophisticated image-feature-extraction tools and better classification procedures than majority vote, we introduce a more radical change to our recognition strategy.

Figure 6-14: ROC curves for the image in Fig. 6-12a with ITv, ITvo, TSBN, and TSBNT.

6.4 Object-Part Recognition Strategy

Recall from Section 6.1 that irregular trees are capable of capturing component-subcomponent structures at various scales, such that root nodes represent the centers of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. To utilize this convenient property, we conduct the object-part recognition strategy presented in Section 4.2.

We compare the performance of the whole-object and object-part recognition strategies. The whole-object approach can be viewed as a benchmark strategy, in the sense that a majority of existing vision systems do not explicitly analyze visible object parts at various scales. In these systems, once the object is detected, the whole image region is identified through MAP classification, as is done in the previous section. In Fig.
6-15, we present classification results for ITvo, using the whole-object and object-part recognition strategies on dataset-II images. In Fig. 6-15a, both strategies succeed in recognizing two different "Fluke" voltage-measuring instruments (see Fig. 6-1). However, in Fig. 6-15b, the whole-object recognition strategy fails to make a distinction between the objects, since the part that most differentiates one object from the other is occluded, making it a difficult case for recognition even for a human interpreter. In the other two images, we observe that the object-part recognition strategy is more successful than the whole-object approach.

Figure 6-15: Comparison of two recognition strategies on dataset II for ITvo: (top) 128x128 challenging images containing objects that are very similar in appearance; (middle) classification using the whole-object recognition strategy; (bottom) classification using the object-part recognition strategy; each recognized object in the image is marked with a different color.

For estimating the object-recognition error of ITvo on dataset-II images, the following instances are counted as error: (1) merging two distinct objects into one (i.e., object not detected), and (2) swapping the identities of objects (i.e., object correctly detected but misclassified as one of the objects in the class of known objects). The recognition error averaged over all objects in the 40 test images in dataset II is significantly lower than the 9.1% error reported in the previous section. We also recorded the object-recognition error of IQTvo over all objects in the 20 test images of datasets IV, V, and VI, respectively. The results are summarized in Table 6-6. In each cell of Table 6-6, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of merged-object errors.
For instance, for dataset V and the whole-object strategy, the overall recognition error is 21.2%, of which slightly more than half were caused by merged-object errors. The results in Table 6-6 clearly demonstrate significantly improved recognition performance, as well as a reduction in the false-alarm and swapped-identity types of error for the object-part strategy, as compared with the whole-object approach. Also, Table 6-7 shows that the object-part strategy reduces the pixel-labeling error. These results support our hypothesis that for successful recognition of partially occluded objects it is critical to analyze visible object details at various scales.

Table 6-6: Object recognition error for IQTvo

 strategy        dataset IV    dataset V     dataset VI
 whole-object    11% (–)       21.2% (–)     26% (–)
 object-part     3% (100%)     8.7% (92%)    12.5% (81%)

Table 6-7: Pixel labeling error for IQTvo

 strategy        dataset IV    dataset V    dataset VI
 whole-object    9%            17.9%        16%
 object-part     4%            6.7%         8%

Figure 6-16: Recognition results over dataset IV for IQTvo: (a) cluttered scene containing 10 objects, each marked with a different color, and images of the two alike persons; (b) dataset IV: video sequence of two alike people walking in a cluttered scene; (c) classification using the whole-object recognition strategy; (d) classification using the object-part recognition strategy.

Figure 6-17: Recognition results over dataset V for IQTvo: (a) 6 image classes: 5 similar objects and background; (b) 4 images of the same scene viewed from 4 different angles with the objects shown in (a); (c) the most significant object parts differ over the various scenes; the majority-voting classification result is indicated by the colored regions; (d) classification using the whole-object recognition strategy; (e) classification using the object-part recognition strategy.

Figure 6-18: Recognition results for dataset VI; classification using the object-part recognition strategy.
CHAPTER 7
CONCLUSION

7.1 Summary of Contributions

In this dissertation, we have addressed the detection and recognition of partially occluded, alike objects in complex scenes, a problem that has, as of yet, eluded a satisfactory solution. The experiments reported herein show that "traditional" approaches to object recognition, where objects are first detected and then identified as a whole, yield poor performance in complex settings. Therefore, we speculate that a careful analysis of visible, fine-scale object details may prove critical for recognition. However, in general, the analysis of multiple sub-parts of multiple objects gives rise to prohibitive computational complexity. To overcome this problem, we have proposed to model images with irregular trees, which provide a suitable framework for developing novel object-recognition strategies, in particular, object-part recognition. Here, object details at various scales are first detected through tree-structure estimation; then, these object parts are analyzed as to which component of an object is the most significant for recognition of that object; finally, information on the cognitive significance of each object part is combined toward the ultimate image classification. Empirical evidence demonstrates that this explicit treatment of object parts results in improved recognition performance, as compared to strategies where object components are not explicitly accounted for.

In Chapter 2, we have proposed two architectures within the irregular-tree framework, referred to as ITvo and ITv. For each architecture, we have developed an inference algorithm. Gibbs sampling has been shown to be successful at finding trees that have high posterior probability, but at a great computational price, which renders the algorithm impractical.
Therefore, we have proposed Structured Variational Approximation (SVA) for inference of IT_V0 and IT_V, which relaxes poorly justified independence assumptions made in prior work. We have shown that SVA converges to solutions with larger posterior probability, an order of magnitude faster than competing algorithms. We have also demonstrated that IT_V0 and IT_V overcome the blocky-segmentation problem of TSBNs, and that they possess a certain invariance to translation, rotation, and scaling transformations.

In Chapter 3, we have proposed another two architectures, referred to as IQT_V0 and IQT_V. In these models, we have constrained the node positions to be fixed, so that only connections can control the irregular tree structure. At the same time, we have made the distribution of connections dependent on image classes. This formulation has allowed us to avoid variational-approximation inference, and to develop an exact inference algorithm for IQT_V0 and IQT_V. We have shown that it converges more slowly than SVA; however, it yields larger likelihood, which in general means that IQT_V0 represents the underlying stochastic processes in the image more accurately than IT_V0.

In experiments on unsupervised image segmentation, we have shown the capability of irregular trees to capture important component-subcomponent structures in images. Empirical evidence demonstrates that root nodes represent the centers of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. In Chapter 4, we have proposed to explicitly analyze the significance of object parts (i.e., tree nodes) with respect to recognition of an object as a whole. We have defined entropy as a measure of such cognitive significance. To avoid the costly approach of analyzing every detected object part, we have devised a greedy algorithm, referred to as object-part recognition. The comparison of whole-object and part-object approaches indicates that the latter method generates significantly better recognition performance and reduced pixel-labeling error.

Ultimately, what allows us to overcome obstacles in analyzing scenes with occlusions in a computationally efficient and intuitively appealing manner is the generative-model framework we have proposed. This framework provides an explicit representation of objects and their sub-parts at various scales, which, in turn, constitutes the key factor for improved interpretation of scenes with partially occluded, alike objects.

7.2 Opportunities for Future Work

The analysis in the previous chapters suggests the following opportunities for future work. One promising thrust of research would be to investigate relationships among descriptive, generative, and discriminative statistical models. We anticipate that these studies will lead to a greater integration of the modeling paradigms, yielding richer and more advanced classes of models. Here, the most critical issue is that of computationally manageable inference. With recent advances in the area of belief propagation (e.g., Generalized Belief Propagation [78]), new algorithms may make it possible to solve real-world problems that were previously computationally intractable.

Within the irregular-tree framework, it is possible to continue further investigation toward replacing the current discrete-valued node variables with real-valued ones. Thereby, a real-valued version of the irregular tree can be specified. Gaussians could be used as the probability distributions governing the continuous random variables represented by nodes, due to their tractable properties.
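The "tractable properties" of Gaussians can be illustrated with a one-dimensional conjugate update. This is a hedged sketch, not a model from this dissertation: the single child-parent pair, the displacement d, and all variances in the usage are invented. If a child position given its parent is Gaussian, r_i | r_j ~ N(r_j + d, s2), and the parent carries a Gaussian prior N(m, t2), then the parent's posterior after observing the child is again Gaussian, in closed form:

```python
def gaussian_parent_posterior(child, d, s2, m, t2):
    """Posterior of a parent position r_j given an observed child r_i,
    where r_i | r_j ~ N(r_j + d, s2) and r_j ~ N(m, t2).
    Standard conjugate-Gaussian update: a precision-weighted average
    of the prior mean and the displaced child observation."""
    prec = 1.0 / t2 + 1.0 / s2          # posterior precision
    var = 1.0 / prec                    # posterior variance
    mean = var * (m / t2 + (child - d) / s2)
    return mean, var
```

Because every message in such a tree stays Gaussian, inference reduces to propagating means and variances, which is why a real-valued irregular tree with Gaussian nodes would remain computationally manageable.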
Such a model could then operate directly on real-valued pixel data, improving the state-of-the-art techniques for solving various image-processing problems, including super-resolution, image enhancement, and compression.

Further, with respect to the measure of significance of irregular-tree nodes, one can pursue investigation of more complex information-theoretic concepts than Shannon's entropy. For example, we anticipate that joint entropy and mutual information may yield a more efficient cognitive analysis, which in turn could eliminate the need for the greedy algorithm discussed in Section 4.2.

The analysis of object parts can be interpreted as integration of information from multiple complementary and/or competitive sensors, each of which has only limited accuracy. As such, further research could be conducted on formulating the optimal strategy for combining the pieces of information on object parts toward ultimate object recognition. We anticipate that algorithms such as adaptive boosting (AdaBoost) [79] and the Support Vector Machine [80] may prove useful for this purpose.

Another promising research topic is to incorporate available prior knowledge into the proposed Bayesian estimation framework, where we have assumed that all classification errors are equally costly. However, in many applications, some errors are more serious than others. Cost-sensitive learning methods are needed to address this problem [81].

On a broader scale, the research reported in this dissertation can be viewed as solving a more general machine-learning problem, with experimental validation on images as data. This problem concerns supervised learning from examples, where the goal is to learn a function X = f(Y) from N training examples of the form {(Y_n, f(Y_n))}_{n=1}^{N}. Here, X_n and Y_n contain sub-components, the meaning of which differs for various applications.
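The entropy-based significance measure discussed earlier in this section (H_i in the Key to Symbols) can be sketched concretely. The snippet below is illustrative only: the part names and posterior distributions are invented placeholders, not results from this work. It ranks candidate object parts by the Shannon entropy of their posterior distributions over image classes; a low-entropy part is the kind of cognitively significant detail a greedy object-part strategy would examine first.

```python
import math

def shannon_entropy(probs):
    """H(p) = -sum p * log2(p), in bits; the 0*log(0) terms are dropped."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

def rank_parts_by_entropy(part_posteriors):
    """Sort (name, posterior) pairs so the most class-discriminative part,
    i.e. the one with the lowest-entropy posterior, comes first."""
    return sorted(part_posteriors, key=lambda kv: shannon_entropy(kv[1]))
```

Joint entropy and mutual information, the extensions anticipated above, would be computed analogously from joint posteriors over pairs of parts.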
For example, in computer vision, each Y_n might be a vector of image pixel values, and each X_n might be a partition of that image into segments and an assignment of labels to each segment. Most importantly, the components of Y_n form a sequence (e.g., a sequence on the 2-D image lattice). Therefore, learning a classifier function X = f(Y) represents the sequential supervised learning problem [82]. Thus, in this dissertation, we have addressed sequential supervised learning, the solutions of which can be readily applied to a wide range of problems beyond computer vision, such as, for example, speech processing, where the components of Y form a sequence in time.

APPENDIX A
DERIVATION OF VARIATIONAL APPROXIMATION

Preliminaries. Computation of KL(Q||P), given by Eq. (2.12), is intractable, because it depends on P(Z, X, R'|Y, R^0). Note, though, that Q(Z, X, R') does not depend on P(Y|R^0) and P(R^0). Consequently, by subtracting log P(Y|R^0) and log P(R^0) from KL(Q||P), we obtain a tractable criterion J(Q, P), whose minimization with respect to Q(Z, X, R') yields the same solution as minimization of KL(Q||P):

  J(Q,P) = KL(Q||P) - log P(Y|R^0) - log P(R^0) = Σ_{Z,X} ∫ dR' Q(Z,X,R') log [ Q(Z,X,R') / P(Z,X,R',Y,R^0) ].   (A.1)

J(Q, P) is known alternatively as Helmholtz free energy, Gibbs free energy, or simply free energy [59]. By minimizing J(Q, P), we seek to compute the parameters of the approximate distributions Q(Z), Q(X|Z), and Q(R'|Z). It is convenient, first, to reformulate Eq. (A.1) as J(Q,P) = L_Z + L_X + L_R. We define the auxiliary terms L_Z, L_X, and L_R as

  L_Z ≜ Σ_Z Q(Z) log [ Q(Z) / P(Z) ],
  L_X ≜ Σ_{Z,X} Q(Z) Q(X|Z) log [ Q(X|Z) / P(X,Y|Z) ],
  L_R ≜ Σ_Z ∫ dR' Q(Z) Q(R'|Z) log [ Q(R'|Z) / P(R'|Z,R^0) ].

To derive expressions for L_Z, L_X, and L_R, we first observe:

  ⟨z_ij⟩ = ζ_ij,  m_i^k ≜ ⟨x_i^k⟩ = Σ_{j∈V} ζ_ij Σ_{l∈M} Q_ij^{kl} m_j^l,  ∀i∈V, ∀k∈M,   (A.2)

where ⟨·⟩ denotes expectation with respect to Q(Z, X, R'). Consequently, from Eqs. (2.1), (2.9) and (A.2), we have

  L_Z = Σ_{i,j∈V} ζ_ij log [ ζ_ij / P(z_ij = 1) ].   (A.3)

Next, from Eqs.
(2.4), (2.10) and (A.2), we derive

  L_X = Σ_{i,j∈V} Σ_{k,l∈M} ζ_ij Q_ij^{kl} m_j^l log [ Q_ij^{kl} / P_ij^{kl} ] − Σ_{i∈V} Σ_{k∈M} m_i^k log P(y_{π(i)} | x_i^k, π(i)).   (A.4)

Note that for DT_V0, V in the second term is substituted with V^0. Finally, from Eqs. (2.3), (2.11) and (A.2), we get

  L_R = (1/2) Σ_{i,j∈V'} ζ_ij [ log ( |Σ_ij| / |Ω_ij| ) + Tr { Σ_ij^{-1} ⟨(r_i − r_j − d_ij)(r_i − r_j − d_ij)^T⟩ } ] + const.   (A.5)

Let us now consider the expectation in the last term. Writing r_i − r_j − d_ij = (r_i − μ_ij) + (μ_ij − μ_jp − d_ij) + (μ_jp − r_j), we obtain

  ⟨(r_i − r_j − d_ij)(r_i − r_j − d_ij)^T⟩ = Ω_ij + Σ_{p∈V'} ζ_jp ( 2Ψ_ijp + Ω_jp + M_ijp ),   (A.6)

where the auxiliary matrices are defined as Ψ_ijp ≜ ⟨(r_i − μ_ij)(μ_jp − r_j)^T⟩ and M_ijp ≜ (μ_ij − μ_jp − d_ij)(μ_ij − μ_jp − d_ij)^T, and i-j-p is a child-parent-grandparent triad. It follows from Eqs. (A.5) and (A.6) that

  L_R = (1/2) Σ_{i,j∈V'} ζ_ij [ log ( |Σ_ij| / |Ω_ij| ) + Tr { Σ_ij^{-1} Ω_ij } + Σ_{p∈V'} ζ_jp Tr { Σ_ij^{-1} ( 2Ψ_ijp + Ω_jp + M_ijp ) } ] + const.   (A.7)

In Eq. (A.7), the last expression left to compute is Tr{ Σ_ij^{-1} Ψ_ijp }. For this purpose, we apply the Cauchy-Schwarz inequality as follows:

  | Tr { Σ_ij^{-1} Ψ_ijp } | = | Tr { Σ_ij^{-1} ⟨(r_i − μ_ij)(μ_jp − r_j)^T⟩ } | ≤ [ Tr { Σ_ij^{-1} Ω_ij } · Tr { Σ_ij^{-1} Ω_jp } ]^{1/2},   (A.8)

where we used the fact that the Σ's and Ω's are diagonal matrices. Although the Cauchy-Schwarz inequality in general does not yield a tight upper bound, in our case it appears reasonable to assume that variables r_i and r_j (i.e., positions of object parts at different scales) are uncorrelated. Substituting Eq. (A.8) into Eq. (A.7), we finally derive the upper bound for L_R as

  L_R ≤ (1/2) Σ_{i,j∈V'} ζ_ij [ log ( |Σ_ij| / |Ω_ij| ) + Tr { Σ_ij^{-1} Ω_ij } + Σ_{p∈V'} ζ_jp Tr { Σ_ij^{-1} ( Ω_jp + M_ijp ) } + 2 Σ_{p∈V'} ζ_jp ( Tr { Σ_ij^{-1} Ω_ij } Tr { Σ_ij^{-1} Ω_jp } )^{1/2} ] + const.   (A.9)

Optimization of Q(X|Z). Q(X|Z) is fully characterized by the parameters Q_ij^{kl}. From the definition of L_X, we have ∂J(Q,P)/∂Q_ij^{kl} = ∂L_X/∂Q_ij^{kl}. Due to parent-child dependencies in Eq. (A.2), it is necessary to iteratively differentiate L_X with respect to Q_ij^{kl} down the subtree of node i.
For this purpose, we introduce three auxiliary terms, F_ij, G_i, and λ_i^k, which facilitate computation, as shown below:

  F_ij ≜ Σ_{k,l∈M} ζ_ij Q_ij^{kl} m_j^l log [ Q_ij^{kl} / P_ij^{kl} ],
  G_i ≜ Σ_{d∈V} Σ_{c∈d(i)} F_dc − Σ_{k∈M} m_i^k { log P(y_{π(i)} | x_i^k) }_{V0},
  λ_i^k ≜ exp ( −∂G_i/∂m_i^k ),   (A.10)

where {·}_{V0} denotes that the term is included in the expression for G_i only if i is a leaf node for DT_V0. For DT_V, the term in braces {·} is always included. This allows us to derive update equations for both models simultaneously. After finding the derivatives ∂F_ij/∂Q_ij^{kl} = ζ_ij m_j^l ( log [ Q_ij^{kl} / P_ij^{kl} ] + 1 ) and ∂m_i^k/∂Q_ij^{kl} = ζ_ij m_j^l, and substituting these expressions in Eq. (A.10), we arrive at

  ∂L_X/∂Q_ij^{kl} = ζ_ij m_j^l ( log [ Q_ij^{kl} / P_ij^{kl} ] + 1 − log λ_i^k ).   (A.11)

Finally, optimizing Eq. (A.11) with the Lagrange multiplier that accounts for the constraint Σ_{k∈M} Q_ij^{kl} = 1 yields the desired update equation, Q_ij^{kl} ∝ λ_i^k P_ij^{kl}, introduced in Eq. (2.13). To compute λ_i^k, we first find

  ∂G_i/∂m_i^k = Σ_{c∈c(i)} ( ∂F_ci/∂m_i^k + Σ_{a∈M} (∂G_c/∂m_c^a)(∂m_c^a/∂m_i^k) ) − { log P(y_{π(i)} | x_i^k) }_{V0}
              = Σ_{c∈c(i)} Σ_{a∈M} ζ_ci Q_ci^{ak} ( log [ Q_ci^{ak} / P_ci^{ak} ] − log λ_c^a ) − { log P(y_{π(i)} | x_i^k) }_{V0},   (A.12)

and then substitute Q_ci^{ak}, given by Eq. (2.13), into Eq. (A.12), which results in

  λ_i^k = { P(y_{π(i)} | x_i^k) }_{V0} Π_{c∈c(i)} [ Σ_{a∈M} λ_c^a P_ci^{ak} ]^{ζ_ci},

as introduced in Eq. (2.14).

Optimization of Q(R'|Z). Q(R'|Z) is fully characterized by the parameters μ_ij and Ω_ij. From the definition of L_R, we observe that ∂J(Q)/∂Ω_ij = ∂L_R/∂Ω_ij and ∂J(Q)/∂μ_ij = ∂L_R/∂μ_ij. Since the Ω's are positive definite, from Eq. (A.9), it follows that

  ∂L_R/∂Ω_ij = 0.5 ζ_ij ( −Ω_ij^{-1} + Σ_ij^{-1} + Σ_{p∈V'} ζ_jp [ Tr { Σ_ij^{-1} Ω_jp } / Tr { Σ_ij^{-1} Ω_ij } ]^{1/2} Σ_ij^{-1} + Σ_{c∈V'} ζ_ci ( Σ_ci^{-1} + [ Tr { Σ_ci^{-1} Ω_ci } / Tr { Σ_ci^{-1} Ω_ij } ]^{1/2} Σ_ci^{-1} ) ).   (A.13)

From ∂L_R/∂Ω_ij = 0, it is straightforward to derive the update equation for Ω_ij given by Eq. (2.17). Next, to optimize the μ_ij parameters, from Eq. (A.9) we compute

  ∂L_R/∂μ_ij = ζ_ij ( Σ_{p∈V'} ζ_jp Σ_ij^{-1} ( μ_ij − μ_jp − d_ij ) − Σ_{c∈V'} ζ_ci Σ_ci^{-1} ( μ_ci − μ_ij − d_ci ) ).   (A.14)

Then, from ∂L_R/∂μ_ij = 0, it is straightforward to compute the update equation for μ_ij given by Eq. (2.16).

Optimization of Q(Z). Q(Z) is fully characterized by the parameters ζ_ij.
From the definitions of L_Z, L_X, and L_R, we see that ∂J(Q)/∂ζ_ij = ∂(L_X + L_R + L_Z)/∂ζ_ij. Similar to the optimization of Q_ij^{kl}, we need to iteratively differentiate L_X as follows:

  ∂L_X/∂ζ_ij = ∂F_ij/∂ζ_ij + Σ_{k∈M} (∂G_i/∂m_i^k)(∂m_i^k/∂ζ_ij),   (A.15)

where F_ij and G_i are defined as in Eq. (A.10). Substituting the derivatives ∂G_i/∂m_i^k = −log λ_i^k, ∂F_ij/∂ζ_ij = Σ_{k,l∈M} Q_ij^{kl} m_j^l log [ Q_ij^{kl} / P_ij^{kl} ], and ∂m_i^k/∂ζ_ij = Σ_{l∈M} Q_ij^{kl} m_j^l into Eq. (A.15), we obtain

  ∂L_X/∂ζ_ij = Σ_{k,l∈M} Q_ij^{kl} m_j^l log [ Q_ij^{kl} / ( P_ij^{kl} λ_i^k ) ].   (A.16)

Next, we differentiate L_R, given by Eq. (A.9), with respect to ζ_ij as

  ∂L_R/∂ζ_ij = (1/2) [ log ( |Σ_ij| / |Ω_ij| ) + Tr { Σ_ij^{-1} Ω_ij } + Σ_{p∈V'} ζ_jp ( Tr { Σ_ij^{-1} ( Ω_jp + M_ijp ) } + 2 ( Tr { Σ_ij^{-1} Ω_ij } Tr { Σ_ij^{-1} Ω_jp } )^{1/2} ) + Σ_{c∈V'} ζ_ci ( Tr { Σ_ci^{-1} ( Ω_ij + M_cij ) } + 2 ( Tr { Σ_ci^{-1} Ω_ci } Tr { Σ_ci^{-1} Ω_ij } )^{1/2} ) ]   (A.17)
             ≜ B_ij,   (A.18)

where the indexes c, j, and p denote children, parents, and grandparents of node i, respectively. Further, from Eq. (A.3), we get

  ∂L_Z/∂ζ_ij = 1 + log [ ζ_ij / P(z_ij = 1) ].   (A.19)

Finally, substituting Eqs. (A.16), (A.18), and (A.19) into ∂J(Q)/∂ζ_ij = 0 and adding the Lagrange multiplier to account for the constraint Σ_{j∈V'} ζ_ij = 1, we solve for the update equation of ζ_ij given by Eq. (2.18).

APPENDIX B
INFERENCE ON THE FIXED-STRUCTURE TREE

The inference algorithm for Maximum Posterior Marginal (MPM) estimation on the quad-tree is known to alleviate implementation issues related to underflow numerical error [33]. The whole procedure is summarized in Fig. B-1. The algorithm assumes that the tree structure is fixed and known. Therefore, in Fig. B-1, we simplify notation as P(x_i|Z,Y) → P(x_i|Y) and P(x_i|x_j,Z) → P(x_i|x_j). Also, we denote with c(i) the children of i, and with d(i) the set of all descendants of node i down the tree, including i itself. Thus, Y_{d(i)} denotes the set of all observables down the subtree whose root is i. Also, for computing P(x_i|Y_{d(i)}) in the bottom-up pass, ∝ means that equality holds up to a multiplicative constant that does not depend on x_i.
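A minimal sketch of a two-pass (bottom-up, top-down) recursion on a fixed tree is given below. It is an illustrative reimplementation under simplifying assumptions, not the exact procedure of Fig. B-1: it uses a small hand-built tree, a single shared transition table, and invented numeric values in the usage; messages are normalized at every node to sidestep the underflow issue noted above. The MPM estimate at each node is then arg max over the returned posterior marginal.

```python
def upward_downward(children, parent, prior_root, trans, lik, K):
    """Exact posterior marginals P(x_i | Y) on a fixed tree.
      children[i]  : list of child node ids of node i
      parent[i]    : parent id of node i (None for the root)
      prior_root   : list of K root-prior values P(x_root)
      trans[xi][xj]: P(x_i = xi | parent state xj), shared by all edges
      lik[i]       : list of K likelihoods P(y_i | x_i) (all ones if no data)
    """
    root = next(i for i in children if parent[i] is None)
    beta = {}   # beta[i][xi] proportional to P(Y_d(i) | x_i), normalized
    msg = {}    # msg[i][xj]: upward message from i to its parent

    def up(i):  # bottom-up pass
        b = list(lik[i])
        for c in children[i]:
            up(c)
            for xi in range(K):
                b[xi] *= msg[c][xi]
        s = sum(b)
        beta[i] = [v / s for v in b]          # normalize against underflow
        if parent[i] is not None:
            msg[i] = [sum(trans[xi][xj] * beta[i][xi] for xi in range(K))
                      for xj in range(K)]

    def down(i):  # top-down pass
        if parent[i] is None:
            p = [prior_root[xi] * beta[i][xi] for xi in range(K)]
        else:
            j = parent[i]
            p = [beta[i][xi] * sum(trans[xi][xj] * post[j][xj] / msg[i][xj]
                                   for xj in range(K))
                 for xi in range(K)]
        s = sum(p)
        post[i] = [v / s for v in p]
        for c in children[i]:
            down(c)

    post = {}   # post[i][xi] = P(x_i | Y)
    up(root)
    down(root)
    return post
```

On a two-node chain with an observed child, the root posterior matches the hand-computed Bayes update, and the observed child's posterior collapses onto its evidence.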
Two-pass MPM estimation on the tree

1. Preliminary downward pass: ∀i ∈ V^{L-1}, V^{L-2}, ..., V^0:
     P(x_i) = Σ_{x_j} P(x_i|x_j) P(x_j).

2. Bottom-up pass. Initialize the leaf nodes: ∀i ∈ V^0:
     P(x_i|Y_{d(i)}) ∝ P(y_i|x_i) P(x_i),
     P(x_i, x_j|Y_{d(i)}) = P(x_i|x_j) P(x_j) P(x_i|Y_{d(i)}) / P(x_i);
   then compute upward, ∀i ∈ V^1, V^2, ..., V^L:
     P(x_i|Y_{d(i)}) ∝ P(x_i) Π_{c∈c(i)} Σ_{x_c} P(x_c|Y_{d(c)}) P(x_c|x_i) / P(x_c),
     P(x_i, x_j|Y_{d(i)}) = P(x_i|x_j) P(x_j) P(x_i|Y_{d(i)}) / P(x_i).

3. Top-down pass. Initialize the root: i ∈ V^L:
     P(x_i|Y) = P(x_i|Y_{d(i)}),  x̂_i = arg max_{x_i} P(x_i|Y);
   then compute downward, ∀i ∈ V^{L-1}, V^{L-2}, ..., V^0:
     P(x_i|Y) = Σ_{x_j} P(x_i, x_j|Y_{d(i)}) P(x_j|Y) / P(x_j|Y_{d(i)}),
     x̂_i = arg max_{x_i} P(x_i|Y).

Figure B-1: Steps 2 and 5 in Fig. 3-2: MPM estimation on the fixed-structure tree. Distributions P(y_i|x_i) and P(x_i|x_j) are assumed known.

REFERENCES

[1] W. E. L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 4, pp. 469-482, 1987.

[2] S. Z. Der and R. Chellappa, "Probe-based automatic target recognition in infrared imagery," IEEE Trans. Image Processing, vol. 6, no. 1, pp. 92-102, 1997.

[3] P. C. Chung, E. L. Chen, and J. B. Wu, "A spatiotemporal neural network for recognizing partially occluded objects," IEEE Trans. Signal Processing, vol. 46, no. 7, 1998.

[4] W. M. Wells III, "Statistical approaches to feature-based object recognition," Intl. J. Computer Vision, vol. 21, no. 1, pp. 63-98, 1997.

[5] Z. Ying and D. Castanon, "Partially occluded object recognition using statistical models," Intl. J. Computer Vision, vol. 49, no. 1, pp. 57-78, 2002.

[6] S. Z. Li, Markov random field modeling in image analysis, Springer-Verlag, Tokyo, Japan, 2nd edition, 2001.

[7] M. H. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 8, pp. 1073-1078, 2004.

[8] A. Mittal and L. S. Davis, "M2Tracker: a multi-view approach to segmenting and tracking people in a cluttered scene," Intl. J. Computer Vision, vol. 51, no. 3, pp. 189-203, 2003.

[9] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[10] F. Dell'Acqua and R. Fisher, "Reconstruction of planar surfaces behind occlusions in range images," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 569-575, 2002.

[11] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 2, pp. 264-271, IEEE, Inc.

[12] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Anal. Machine Intell., vol. 23, no. 4, pp. 349-361, 2001.

[13] M. Weber, M. Welling, and P. Perona, "Towards automatic discovery of object categories," in Proc. IEEE Conf. Computer Vision Pattern Rec., Hilton Head Island, SC, 2000, vol. 2, pp. 101-108, IEEE, Inc.

[14] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th European Conf. Computer Vision, Dublin, Ireland, 2000, vol. 1, pp. 18-32.

[15] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio, "Categorization by learning and combining object parts," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., vol. 2, MIT Press, Cambridge, MA, 2002.

[16] P. F. Felzenszwalb and D. P. Huttenlocher, "Pictorial structures for object recognition," Intl. J. Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.

[17] H. Schneiderman and T. Kanade, "Object detection using the statistics of parts," Intl. J. Computer Vision, vol. 56, no. 3, pp. 151-177, 2004.

[18] S. C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 6, pp. 691-712, 2003.

[19] S. C. Zhu, Y. N. Wu, and D. B. Mumford, "Minimax entropy principle and its applications to texture modeling," Neural Computation, vol. 9, no. 8, 1997.

[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721-741, 1984.

[21] A. Efros and T. Leung, "Texture synthesis by non-parametric sampling," in Proc. Intl. Conf. Computer Vision, Kerkyra, Greece, 1999, vol. 2, pp. 1033-1038, IEEE, Inc.
[22] J. S. De Bonet and P. Viola, "Texture recognition using a non-parametric multi-scale statistical model," in Proc. IEEE Conf. Computer Vision Pattern Rec., Santa Barbara, CA, 1998, pp. 641-647, IEEE, Inc.

[23] M. J. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, 2003.

[24] J. Coughlan and A. Yuille, "Algorithms from statistical physics for generative models of images," Image and Vision Computing, vol. 21, no. 1, pp. 29-36, 2003.

[25] S. Kumar and M. Hebert, "Discriminative random fields: a discriminative framework for contextual interaction in classification," in Proc. IEEE Intl. Conf. Comp. Vision, Nice, France, 2003, vol. 2, pp. 1150-1157, IEEE, Inc.

[26] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Intl. Conf. Machine Learning, Williamstown, MA, 2001, pp. 282-289.

[27] C. A. Bouman and M. Shapiro, "A multiscale random field model for Bayesian image segmentation," IEEE Trans. Image Processing, vol. 3, no. 2, pp. 162-177, 1994.

[28] W. W. Irving, P. W. Fieguth, and A. S. Willsky, "An overlapping tree approach to multiscale stochastic modeling and estimation," IEEE Trans. Image Processing, vol. 6, no. 11, pp. 1517-1529, 1997.

[29] H. Cheng and C. A. Bouman, "Multiscale Bayesian segmentation using a trainable context model," IEEE Trans. Image Processing, vol. 10, no. 4, pp. 511-525, 2001.

[30] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886-902, 1998.

[31] X. Feng, C. K. I. Williams, and S. N. Felderhof, "Combining belief networks and neural networks for scene segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 467-483, 2002.

[32] S. Todorovic and M. C. Nechyba, "Towards intelligent mission profiles of micro air vehicles: multiscale Viterbi classification," in Proc. European Conf. Computer Vision, Prague, Czech Republic, 2004, vol. 2.

[33] J.-M. Laferte, P. Perez, and F. Heitz, "Discrete Markov image modeling and inference on the quadtree," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 390-404, 2000.
[34] M. R. Luettgen and A. S. Willsky, "Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination," IEEE Trans. Image Processing, vol. 4, no. 2, pp. 194-207, 1995.

[35] P. L. Ainsleigh, N. Kehtarnavaz, and R. L. Streit, "Hidden Gauss-Markov models for signal classification," IEEE Trans. Signal Processing, vol. 50, no. 6, pp. 1355-1367, 2002.

[36] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, CA, 1988.

[37] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inform. Theory, vol. 49, no. 5, 2003.

[38] B. J. Frey, Graphical models for machine learning and digital communication, The MIT Press, Cambridge, MA, 1998.

[39] S. Kumar and M. Hebert, "Man-made structure detection in natural images using a causal multiscale random field," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 119-126, IEEE, Inc.

[40] M. K. Schneider, P. W. Fieguth, W. C. Karl, and A. S. Willsky, "Multiscale methods for the segmentation and reconstruction of signals and images," IEEE Trans. Image Processing, vol. 9, no. 3, 2000.

[41] J. Li, R. M. Gray, and R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two-dimensional hidden Markov models," IEEE Trans. Inform. Theory, vol. 46, no. 5, pp. 1826-1841, 2000.

[42] W. K. Konen, T. Maurer, and C. von der Malsburg, "A fast dynamic link matching algorithm for invariant pattern recognition," Neural Networks, vol. 7, no. 7, pp. 1019-1030, 1994.

[43] A. Montanvert, P. Meer, and A. Rosenfeld, "Hierarchical image analysis using irregular tessellations," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 4, pp. 307-316, 1991.

[44] P. Bertolino and A. Montanvert, "Multiresolution segmentation using the irregular pyramid," in Proc. Intl. Conf. Image Processing, Lausanne, Switzerland, 1996, vol. 1, pp. 257-260, IEEE, Inc.

[45] N. J. Adams, A. J. Storkey, Z. Ghahramani, and C. K. I. Williams, "MFDTs: mean field dynamic trees," in 15th Intl. Conf. Pattern Rec., Barcelona, Spain, 2000, vol. 3, pp. 147-150, Intl. Assoc. Pattern Rec.

[46] N. J. Adams, Dynamic trees: a hierarchical probabilistic approach to image modelling, Ph.D. dissertation, Division of Informatics, Univ. of Edinburgh, Edinburgh, UK, 2001.

[47] A. J. Storkey, "Dynamic trees: a structured variational method giving efficient propagation rules," in Uncertainty in Artificial Intelligence, C. Boutilier and M. Goldszmidt, Eds., pp. 566-573, Morgan Kaufmann, San Francisco, CA, 2000.

[48] A. J. Storkey and C. K. I. Williams, "Image modeling with position-encoding dynamic trees," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 859-871, 2003.

[49] M. I. Jordan, Ed., Learning in graphical models (adaptive computation and machine learning), MIT Press, Cambridge, MA, 1999.

[50] M. I. Jordan, "Graphical models," Statistical Science (Special issue on Bayesian statistics), vol. 19, pp. 140-155, 2004.

[51] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39, pp. 1-38, 1977.

[52] G. J. McLachlan and T. Krishnan, The EM algorithm and extensions, John Wiley & Sons, New York, NY, 1997.

[53] D. M. Chickering and D. Heckerman, "Efficient approximations for the marginal likelihood of incomplete data given a Bayesian network," in Proc. Conf. Uncertainty in Artificial Intelligence, Portland, OR, 1996, pp. 158-168, Assoc. Uncertainty Artificial Intelligence.

[54] S. Todorovic and M. C. Nechyba, "Interpretation of complex scenes using generative dynamic-structure models," in CD-ROM Proc. IEEE CVPR 2004, Workshop on Generative-Model Based Vision (GMBV), Washington, DC, 2004, IEEE, Inc.

[55] S. Todorovic and M. C. Nechyba, "Detection of artificial structures in natural-scene images using dynamic trees," in Proc. 17th Intl. Conf. Pattern Rec., Cambridge, UK, 2004, Intl. Assoc. Pattern Rec.

[56] M. Aitkin and D. B. Rubin, "Estimation and hypothesis testing in finite mixture models," J. Royal Statistical Soc., vol. B-47, no. 1, 1985.

[57] R. M. Neal, "Probabilistic inference using Markov chain Monte Carlo methods," Tech. Rep. CRG-TR-93-1, Connectionist Research Group, Univ. of Toronto, 1993.

[58] D. A. Forsyth, J. Haddon, and S. Ioffe, "The joy of sampling," Intl. J. Computer Vision, vol. 41, no. 1-2, pp. 109-134, 2001.

[59] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.

[60] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge Univ. Press, Cambridge, UK, 2003.

[61] D. Barber and P. van de Laar, "Variational cumulant expansions for intractable distributions," J. Artificial Intell. Research, vol. 10, pp. 435-455, 1999.

[62] D. J. C. MacKay, Information theory, inference, and learning algorithms, Cambridge University Press, Cambridge, UK, 2003.

[63] D. J. C. MacKay, "Introduction to Monte Carlo methods," in Learning in graphical models (adaptive computation and machine learning), M. I. Jordan, Ed., pp. 175-204, MIT Press, Cambridge, MA, 1999.

[64] T. S. Jaakkola, "Tutorial on variational approximation methods," in Advanced Mean Field Methods, M. Opper and D. Saad, Eds., pp. 129-161, MIT Press, Cambridge, MA, 2001.

[65] T. M. Cover and J. A. Thomas, Elements of information theory, Wiley-Interscience Press, New York, NY, 1991.

[66] T. Randen and J. H. Husoy, "Filtering for texture classification: a comparative study," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 4, pp. 291-310, 1999.

[67] S. Mallat, A wavelet tour of signal processing, Academic Press, San Diego, CA, 2nd edition, 1999.

[68] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674-693, 1989.

[69] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.

[70] N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," J. Applied Computational Harmonic Analysis, vol. 10, no. 3, 2001.

[71] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE Trans. Image Processing, vol. 4, no. 11, 1995.

[72] N. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Journal of Applied and Computational Harmonic Analysis, vol. 10, no. 3, pp. 234-253, 2001.

[73] T. Lindeberg, "Scale-space theory: a basic tool for analysing structures at different scales," J. Applied Statistics, vol. 21, no. 2, pp. 224-270, 1994.

[74] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intl. J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

[75] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 509-522, 2002.

[76] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[77] G. Jones III and B. Bhanu, "Recognition of articulated and occluded objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 7, 1999.

[78] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized belief propagation," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., pp. 689-695, MIT Press, Cambridge, MA, 2001.

[79] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Computer and System Sciences, vol. 55, no. 1, pp. 119-139, 1997.

[80] V. N. Vapnik, Statistical learning theory, John Wiley & Sons, Inc., New York, NY, 1998.

[81] P. Domingos, "MetaCost: a general method for making classifiers cost-sensitive," in Proc. 5th Intl. Conf. Knowledge Discovery and Data Mining, San Diego, CA, 1999, pp. 155-164, ACM Press.

[82] T. G. Dietterich, "Machine learning for sequential data: a review," in Lecture Notes in Computer Science, T. Caelli, Ed., vol. 2396, pp. 15-30, Springer-Verlag, Berlin, Germany, 2002.

BIOGRAPHICAL SKETCH

Sinisa Todorovic was born in Belgrade, Serbia, in 1968. He graduated from the Mathematical High School in Belgrade, and received his B.S. degree in electrical and computer engineering from the University of Belgrade, Serbia. From 1994, he worked as a software engineer in the communications industry. He then enrolled in the master's degree program at the Department of Electrical and Computer Engineering, University of Florida, Gainesville, and became a member of the Center for Micro Air Vehicle Research, where he conducted research in statistical image modeling and multi-resolution signal processing. Mr. Todorovic earned his master's degree (with thesis option) in December 2002, after which he continued his studies toward a Ph.D. degree in the same department. He received two certificates for outstanding academic achievement. He expects to graduate in May 2005.


IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

By

SINISA TODOROVIC

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to Dr. Michael Nechyba for his wise and patient guidance of my research for this dissertation. As my former advisor, Dr. Nechyba has been directing, but on no account confining, my interests. I especially appreciate his readiness and expertise to help me solve numerous implementation issues. Most importantly, I am thankful for the friendship that we have developed collaborating on this work.

Also, I thank my current advisor, Dr. Dapeng Wu, for putting extra effort into helping me finalize my Ph.D. studies. I am grateful for his invaluable pieces of advice in choosing my future research goals, as well as for the practical, concrete steps that he undertook to help me find a job.

My thanks also go to Dr. Jian Li, who helped me a lot in the transition period in which I was supposed to change my advisor. Her research group provided a stimulating environment for me to endeavor to investigate areas that are beyond the work presented in this dissertation.

Also, I thank Dr. Antonio Arroyo, whose brilliant lectures on machine intelligence have inspired me to do research in the field of machine learning. As the director of the Machine Intelligence Lab (MIL), Dr. Arroyo has created a warm, friendly, and hardworking atmosphere among the "MIL-ers." Thanks to him, I have decided to join the MIL, which has proved on numerous occasions to be the right decision. I thank all the members of the MIL for their friendship and support.

I thank Dr. Takeo Kanade and Dr. Andrew Kurdila for sharing their research efforts on the micro air vehicle (MAV) project with me. The multidisciplinary environment of this project, in which I had a chance to collaborate with various researchers with diverse educational backgrounds, was a great experience for me.
TABLE OF CONTENTS

ACKNOWLEDGMENTS ... ii
LIST OF TABLES ... v
LIST OF FIGURES ... vi
KEY TO ABBREVIATIONS ... viii
KEY TO SYMBOLS ... x
ABSTRACT ... xii

CHAPTER
1 INTRODUCTION ... 1
  1.1 Part-Based Object Recognition ... 3
  1.2 Probabilistic Framework ... 4
  1.3 Tree-Structured Generative Models ... 6
  1.4 Learning Tree Structure from Data is an NP-hard Problem ... 8
  1.5 Our Approach to Image Interpretation ... 9
  1.6 Contributions ... 10
  1.7 Overview ... 12
2 IRREGULAR TREES WITH RANDOM NODE POSITIONS ... 13
  2.1 Model Specification ... 13
  2.2 Probabilistic Inference ... 16
  2.3 Structured Variational Approximation ... 18
    2.3.1 Optimization of Q(X|Z) ... 19
    2.3.2 Optimization of Q(R'|Z) ... 20
    2.3.3 Optimization of Q(Z) ... 21
  2.4 Inference Algorithm and Bayesian Estimation ... 21
  2.5 Learning Parameters of the Irregular Tree with Random Node Positions ... 23
  2.6 Implementation Issues ... 25
3 IRREGULAR TREES WITH FIXED NODE POSITIONS ... 27
  3.1 Model Specification ... 27
  3.2 Inference of the Irregular Tree with Fixed Node Positions ... 30
  3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions ... 31
4 COGNITIVE ANALYSIS OF OBJECT PARTS ... 35
  4.1 Measuring Significance of Object Parts ... 36
  4.2 Combining Object-Part Recognition Results ... 37
5 FEATURE EXTRACTION ... 39
  5.1 Texture ... 39
    5.1.1 Wavelet Transform ... 39
    5.1.2 Wavelet Properties ... 41
    5.1.3 Complex Wavelet Transform ... 42
    5.1.4 Difference-of-Gaussian Texture Extraction ... 44
  5.2 Color ... 45
6 EXPERIMENTS AND DISCUSSION ... 46
  6.1 Unsupervised Image Segmentation Tests ... 47
  6.2 Tests of Convergence ... 50
  6.3 Image Classification Tests ... 53
  6.4 Object-Part Recognition Strategy ... 57
7 CONCLUSION ... 63
  7.1 Summary of Contributions ... 63
  7.2 Opportunities for Future Work ... 65

APPENDIX
A DERIVATION OF VARIATIONAL APPROXIMATION ... 67
B INFERENCE ON THE FIXED-STRUCTURE TREE ... 72

REFERENCES ... 74
BIOGRAPHICAL SKETCH ... 80

LIST OF TABLES

Table — page
5-1 Coefficients of the filters used in the Q-shift DTCWT ... 43
6-1 Root-node distance error ... 49
6-2 Pixel segmentation error ... 50
6-3 Object detection error ... 50
6-4 Object recognition error ... 55
6-5 Pixel labeling error ... 55
6-6 Object recognition error for IQT_V0 ... 59
6-7 Pixel labeling error for IQT_V0 ... 59

LIST OF FIGURES

Figure — page
1-1 Variants of TSBNs ... 7
1-2 An irregular tree consists of a forest of subtrees ... 8
1-3 Bayesian estimation of the irregular tree ... 11
2-1 Two types of irregular trees ... 13
2-2 Pixel clustering using irregular trees ... 17
2-3 Irregular tree learned for the 4×4 image in (a) ... 17
2-4 Inference of the irregular tree given Y, R^0, and the model parameters ... 24
3-1 Classes of candidate parents ... 30
3-2 Inference of the irregular tree with fixed node positions ... 32
3-3 Algorithm for learning the parameters of the irregular tree ... 34
4-1 For each subtree of IT_V, representing an object in the 128×128 image ... 37
4-2 For each subtree of IT_V, representing an object in the 256×256 image ... 38
5-1 Two levels of the DWT of a two-dimensional signal ... 40
5-2 The original image (left) and its two-scale dyadic DWT (right) ... 40
5-3 The Q-shift Dual-Tree CWT ... 42
5-4 The CWT is strongly oriented at angles ±15°, ±45°, ±75° ... 43
6-1 20 image classes in type I and II datasets ... 48
6-2 Image segmentation using IT_V0 ... 48
6-3 Image segmentation using IT_V0: (top) dataset I images ... 48
6-4 Image segmentation by irregular trees learned using SVA ... 49
6-5 Image segmentation by irregular trees learned using SVA: (a) IT_V0 ... 49
6-6 Image segmentation using IT_V ... 49
6-7 Comparison of inference algorithms ... 51
6-8 Typical convergence rate of the inference algorithm for IT_V0 on the 128×128 ... 52
6-9 Typical convergence rate of the inference algorithm for IT_V0 on the 256×256 ... 52
4{1ForeachsubtreeofIT V ,representinganobjectinthe128 128image ...37 4{2ForeachsubtreeofIT V ,representinganobjectinthe256 256image ...38 5{1TwolevelsoftheDWTofatwo-dimensionalsignal. .............40 5{2Theoriginalimage(left)anditstwo-scaledyadicDWT(r ight). .......40 5{3TheQ-shiftDual-TreeCWT. ..........................42 5{4TheCWTisstronglyorientedatangles 15 ; 45 ; 75 ..........43 6{120imageclassesintypeIandIIdatasets. ...................48 6{2Imagesegmentationusing IT V 0 .........................48 6{3ImagesegmentationusingIT V 0 :(top)datasetIimages ............48 6{4ImagesegmentationbyirregulartreeslearnedusingSVA ...........49 6{5ImagesegmentationbyirregulartreeslearnedusingSVA :(a)IT V 0 .....49 6{6ImagesegmentationusingIT V .........................49 6{7Comparisonofinferencealgorithms .......................51 6{8Typicalconvergencerateoftheinferencealgorithmfor IT V 0 onthe128 128 52 6{9Typicalconvergencerateoftheinferencealgorithmfor IT V 0 onthe256 256 52 vi PAGE 7 6{10Percentageincreaseinlog-likelihood ......................52 6{11Comparisonofclassicationresultsforvariousstati sticalmodels ......55 6{12MAPpixellabelingusingdierentstatisticalmodels. .............56 6{13ROCcurvesfortheimageinFig.6{12awithIT V 0 ,TSBN,DRFandMRF. 56 6{14ROCcurvesfortheimageinFig.6{12awithIT V ,IT V 0 ,TSBN,andTSBN 56 6{15Comparisonoftworecognitionstrategies ....................58 6{16RecognitionresultsoverdatasetIVforIQT V 0 ................60 6{17RecognitionresultsoverdatasetVforIQT V 0 .................61 6{18Classicationusingthepart-objectrecognitionstra tegy ...........62 B{1Steps2and5inFig.3{2 .............................73 vii PAGE 8 KEYTOABBREVIATIONS Thelistshownbelowgivesadescriptionofthefrequentlyus edacronymsorabbreviationsinthiswork.Foreachname,thepagenumbercorrespon dstotheplacewherethe nameisrstused.B :bluechanneloftheRGBcolorspace..................... ....43 G :greenchanneloftheRGBcolorspace.................... 
R: red channel of the RGB color space ..... 43
IQT_V: irregular tree with fixed node positions, and with observables present at all levels ..... 26
IQT_V0: irregular tree with fixed node positions, and with observables present only at the leaf level ..... 26
IT_V0: irregular tree where observables are present only at the leaf level ..... 13
IT_V: irregular tree where observables are present at all levels ..... 13
g: normalized green channel ..... 43
r: normalized red channel ..... 43
CWT: Complex Wavelet Transform ..... 40
DRF: Discriminative Random Field ..... 51
DTCWT: Dual-Tree Complex Wavelet Transform ..... 40
DWT: Discrete Wavelet Transform ..... 37
EM: Expectation-Maximization algorithm ..... 7
KL: Kullback-Leibler divergence ..... 17
MAP: Maximum A Posteriori ..... 3
MCMC: Markov Chain Monte Carlo method ..... 15
ML: Maximum Likelihood ..... 22
MPM: Maximum Posterior Marginal ..... 69
MRF: Markov Random Field ..... 2
NP: nondeterministic polynomial time ..... 7
RGB: the color space that consists of red, green and blue color values ..... 43
ROC: receiver operating characteristic ..... 52
SVA: structured variational approximation inference algorithm ..... 16
TSBN: tree-structured belief network ..... 5
VA: variational approximation inference algorithm ..... 16
KEY TO SYMBOLS

The list shown below gives a brief description of the major mathematical symbols defined in this work. For each symbol, the page number corresponds to the place where the symbol is first used.

A_ij: influence of observables Y on γ_ij ..... 20
B_ij: influence of the geometric properties of the network on γ_ij ..... 20
G: number of components in a Gaussian mixture ..... 15
H_i: Shannon's entropy of node i ..... 34
J(Q, P): free energy ..... 64
L: maximum number of levels in the irregular tree ..... 13
M: set of image classes (i.e., object appearances) ..... 13
P_ij^{kl}: conditional probability tables ..... 13
Q_ij^{kl}: approximate conditional probability tables, given Y and R_0 ..... 18
R: positions of all nodes in the irregular tree ..... 13
R^0: positions of non-leaf nodes in the irregular tree ..... 13
R_0: positions of leaf nodes in the irregular tree ..... 13
V: set of all nodes in the irregular tree ..... 13
V^0: set of all non-leaf nodes in the irregular tree ..... 13
V_0: set of all leaf nodes in the irregular tree ..... 13
X: random vector of all x_i^k ..... 13
Y: all observables ..... 13
Z: connectivity random matrix ..... 13
C: cost function ..... 20
Λ: the set of all parameters {p_i} in the irregular tree with fixed node positions ..... 28
Σ_ij: covariance matrix of a relative child-parent displacement (r_i − r_j) ..... 13
Θ: set of parameters that characterize an irregular tree ..... 15
Σ̃_ij: approximate covariance of r_i, given that j is the parent of i, and given Y and R_0 ..... 18
μ_ij: approximate mean of r_i, given that j is the parent of i, and given Y and R_0 ..... 18
ρ(i): coordinate of an observable random vector in the image plane ..... 13
ℓ: index of levels in the irregular tree ..... 13
r_ij: probability of a node i being the child of j ..... 13
ζ: normalization constant ..... 18
Φ: set of parameters that characterize a Gaussian mixture ..... 15
γ_ij: approximate probability of i being the child of j, given Y and R_0 ..... 18
m_i^k: approximate posterior that node i is labeled as image class k, given Y and R_0 ..... 19
x_i: image class of node i ..... 13
x_i^k: image-class indicator if class k is assigned to node i ..... 13
z_ij: connectivity indicator random variable between nodes i and j ..... 13
d_ij: the mean of relative displacement r_i − r_j ..... 13
r_i: position of node i in the image plane ..... 13
y_ρ(i): observable random vector ..... 13

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

IRREGULAR-STRUCTURE TREE MODELS FOR IMAGE INTERPRETATION

By

Sinisa Todorovic

May 2005

Chair: Dapeng Wu
Major Department: Electrical and Computer Engineering

In this dissertation, we seek to accomplish the following related goals: (1) to find a unifying framework to address localization, detection, and recognition of objects, as three sub-tasks of image interpretation, and (2) to find a computationally efficient and reliable solution to recognition of multiple, partially occluded, alike objects in a given single image.
The second problem is to date an open problem in computer vision, eluding a satisfactory solution. For this purpose, we formulate object recognition as Bayesian estimation, whereby class labels with the maximum posterior distribution are assigned to each pixel. To efficiently estimate the posterior distribution of image classes, we propose to model images with graphical models known as irregular trees.

The irregular tree specifies probability distributions over both its structure and image classes. This means that, for each image, it is necessary to infer the optimal model structure, as well as the posterior distribution of image classes. We propose several inference algorithms as a solution to this NP-hard (nondeterministic polynomial time) problem, which can be viewed as variants of the Expectation-Maximization (EM) algorithm.

After inference, the model represents a forest of subtrees, each of which segments the image. That is, inference of model structure provides a solution to object localization and detection.

With respect to our second goal, we hypothesize that for successful occluded-object recognition it is critical to explicitly analyze visible object parts. Irregular trees are convenient for such analysis, because the treatment of object parts represents merely a particular interpretation of the tree/subtree structure. We analyze the significance of irregular-tree nodes, representing object parts, with respect to recognition of an object as a whole. This information is then exploited toward the ultimate object recognition.

Empirical results demonstrate that irregular trees model images more accurately than their fixed-structure counterparts, quad-trees. Also, the experiments reported herein show that our explicit treatment of object parts results in improved recognition performance, as compared to strategies in which object components are not explicitly accounted for.
CHAPTER 1
INTRODUCTION

Image interpretation is a difficult challenge that has long been confronting the computer vision community. A number of factors contribute to the complexity of this problem. The most critical is the inherent uncertainty in how the observed visual evidence in images should be attributed to infer object types and their relationships. In addition to video noise, there are various sources of this uncertainty, including variations in camera quality and position, wide-ranging illumination conditions, extreme scene diversity, and the randomness of object appearances, clutter, and locations in scenes.

One of the critical hindrances to successful image interpretation is that objects may occlude each other in a complex scene. In the literature, the initial research on the interpretation of scenes with occlusions appeared in the early nineties. However, in the last decade a relatively small volume of the related literature was published. In fact, a majority of the recently proposed vision systems are not directly aimed at solving the problem of occluded-object recognition; experiments on images with occlusions are reported only as a side result, to illustrate the versatility of those systems. This suggests that recognition of partially occluded objects is an open problem in computer vision, which motivates us to seek its solution in this dissertation.

In the initial work, local features (e.g., points, line and curve segments) are used to represent objects, allowing the unoccluded features to be matched with object features by computing a scalar measure of model fit [1, 2, 3]. The unmatched scene features are modeled as spurious features, and the unmatched object features indicate the occluded part of the object. The matching score is either the number of matched object features or the sum of a Gaussian-weighted matching error. The main limitation of these approaches is that they do not account for the spatial correlation among occlusions.

Statistical approaches to occluded-object recognition have also been reported in the literature. For instance, Wells [4], and Ying and Castanon [5] propose probabilistic models to characterize scene features and the correspondence between scene and object features.
The authors model both object-feature uncertainty and the probability that the object features are occluded in the scene. They introduce two statistical models for occlusion. One model assumes that each feature can be occluded independently of whether any other features are occluded, whereas the second model accounts for the spatial correlation to represent the extent of occlusion. The spatial correlation is computed using a Markov Random Field (MRF) model with a Gibbs distribution [6]. The main drawback of these systems is a prohibitive computational load; the run-time of these algorithms is exponential in the number of objects to be recognized.

Other related work exploits auxiliary information provided, for example, by image sequences or stereo views of the same scene [7, 8, 9, 10, 11, 5], where occlusions are transitory. Since this information in general may not be available, and/or occlusions may remain permanent, in our approach we do not use the strategies of these systems.

A review of the related literature also suggests that the majority of vision systems are designed to deal with only one constrained vision task, such as, for example, image segmentation [10, 11, 5]. However, to conduct image interpretation, as is our goal, it is necessary to perform three related tasks: (1) localization, (2) detection (also called image segmentation), and (3) ultimate recognition of object appearances (also called image classification). Further, in many systems in which the three sub-tasks are addressed, this is not done in a unified manner. Here, as a drawback, the system's architecture comprises a serial connection of separate modules, without any feedback on the accuracy of the ultimate recognition.
Moreover, vision systems are typically designed to recognize only a specific instance of the object classes appearing in the image (e.g., a face), which, in turn, is assumed dissimilar to other objects in the image. However, the assumption of uniqueness of the target class may not be appropriate in many settings. Also, the success of these systems usually depends on ad hoc fine-tuning of the feature-extraction methods and the system's parameters, optimized for that unique target class. With current demands to design systems capable of classifying thousands of image classes simultaneously, it would be difficult to generalize the outlined approaches.

The small volume of published research addressing occlusions in images suggests that the problem is not fully examined. Also, the drawbacks of the above systems (namely, constrained goals and settings of operation, poor spatial modeling of occlusion, and prohibitive computational load) motivated us to conduct the research reported herein. Our motivation is that most object classes seem to be naturally described by a few characteristic parts or components and their geometrical relation. We hypothesize that it is not the percentage of occlusion that is critical for object recognition, but rather which object parts are occluded. Not all components of an object are equally important for its recognition, especially when that object is partially occluded. Given two similar objects in the image, the visible parts of one object may mislead the algorithm into recognizing it as its counterpart. Therefore, careful consideration should be given to the analysis of detected visible object parts. One of the benefits of such analysis is the flexibility to develop various recognition strategies that weigh the information obtained from the detected object parts more judiciously. In the following section, we review some of the reported part-based object-recognition strategies.

1.1 Part-Based Object Recognition

Recently, there has been a flurry of research related to part-based object recognition.
For example, Mohan et al. [12] use separate classifiers to detect heads, arms, and legs of people in an image, and a final classifier to decide whether a person is present. However, the approach requires object parts to be manually defined and separated for training the individual part classifiers. To build a system that is easily extensible to deal with different objects, it is important that the part-selection procedure be automated. One approach in this direction is developed by Weber et al. [13, 14]. The authors assume that an object is composed of parts and shape, where parts are image patches, which may be detected and characterized by appropriate detectors, and shape describes the geometry of the mutual position of the parts in a way that is invariant with respect to rigid and, possibly, affine transformations. The authors propose a joint probability density over part appearances and shape that models the object class. This framework is appealing in that it naturally allows for parts of different sizes and resolutions. However, due to computational issues, to learn the joint probability density the authors heuristically choose a small number of parts
ToaddresstheanalysisofobjectpartsthroughscalesSchne idermanandKanade[ 17 ] proposeatrainablemulti-stageobjectdetectorcomposedo fclassiers,eachmakingadecisionaboutwhethertoceaseevaluation,labelingtheinpu tasnon-object,ortocontinue furtherevaluation.Thedetectorordersthesestagesofeva luationfromalow-resolutionto ahigh-resolutionsearchoftheimage. Theaforementionedapproachesarenotsuitableforrecogni tionofalargenumberof objectclasses.Asthenumberofclassesincreasesthereisa combinatorialexplosionof thenumberoftheirparts(i.e.,imagepatches)thatneedtob eevaluatedbyappropriate detectors. Inthisdissertation,weseekasolutiontotheoutlinedprob lems.Ourgoalittodesigna visionsystemthatwouldanalyzemultipleobjectclassesth roughtheirconstituent,"meaningful"partsatanumberofdierentresolutions.Tothisen d,weresorttoaprobabilistic framework,asdiscussedinthefollowingsection. 1.2ProbabilisticFramework Weformulateimageinterpretationasinferenceofaposteri ordistributionoverpixel randomeldsforagivenimage.Oncetheposteriordistribut ionofimageclassesisinferred, PAGE 18 5 eachpixelcanbelabeledthroughBayesianestimation(e.g. maximumaposteriori {MAP). Withinthisframework,itisnecessarytospecifythefollow ing: 1.Theprobabilitydistributionofimageclassesoverpixel randomelds, 2.Theinferencealgorithmsforcomputingtheposteriordis tributionofimageclasses, 3.Bayesianestimationforultimatepixellabeling,thatis ,objectrecognition. Ourprincipalchallengeliesinchoosingastatisticalmode lforspecifyingtheprobability distributionofimageclasses,sincethischoicecondition stheformulationofinferenceand Bayesianestimation.Asuitablemodelshouldbecomputatio nallymanageable,andsucientlyexpressivetorepresentawiderangeofpatternsini mages.Areviewoftheliterature oersfourbroadclassesofmodels[ 18 ].Thedescriptivemodelsareconstructedbasedon statisticaldescriptionsofimageensembleswithvariable sonlyatonelevel(e.g.,[ 19 20 ]). 
Thepseudo-descriptivemodelsreducethecomputationalco stofdescriptivemodelsbyimposingpartial(orevenlinear)orderamongrandomvariable s(e.g.,[ 21 22 ]).Thegenerative modelsconsistofobservableandhiddenvariables,wherehi ddenvariablesrepresentanite numberofbasesgeneratinganimage(e.g.,[ 23 24 ]).Thediscriminativemodelsdirectly encodeposteriordistributionofhiddenvariablesgivenob servables(e.g.,[ 25 26 ]). Theavailablemodelsdierinstructuralcomplexityanddi cultyofinference.Atone endliedescriptivemodels,whichbuildstatisticaldescri ptionsofimageensemblesonlyat theobservable(i.e.,pixel)level.Othermodelingparadig ms(i.e.,generative,discriminative) imposevaryinglevelsofstructurethroughtheintroductio nofhiddenvariables.However, noprincipledformulationexists,asofyet,tosuggestonea pproachsuperiortotheothers. Therefore,ourchoiceofmodelisguidedbythegoaltointerp retsceneswithpartially occluded,alikeobjects.Weseekamodelthatoersaviablem eansofrecognizingpartially occludedobjectsthroughrecognitionoftheirvisiblecons tituentparts.Thus,aprospective modelshouldallowforanalysisofobjectpartstowardsreco gnitionofobjectsasawhole. 
To alleviate the computational complexity arising from the treatment of multiple object parts of multiple objects in images, we seek a model that is capable of modeling both whole objects and their sub-parts in a unified manner. That is, a candidate model must be expressive enough to capture component-subcomponent relationships among regions in an image. To accomplish this, it is necessary to analyze pixel neighborhoods of varying size. The literature abounds with reports on successful applications of multiscale statistical models for this purpose [27, 28, 29, 30, 31, 32]. Following these trends, we choose the irregular tree-structured belief network, or, for short, irregular tree. Our choice is directly driven by our image-interpretation strategy and goals, and appears better suited than alternative statistical approaches. Descriptive models lack the necessary structure for the component-subcomponent representation we seek to exploit. Discriminative approaches directly model the posterior distribution of hidden variables given observables; consequently, they lose the convenience of assigning physical meaning to the statistical parameters of the model. In contrast, irregular trees can detect objects and their parts simultaneously, as discussed in the following chapters.

Before we continue to present our approach to image interpretation, we give a brief overview of tree-structured generative models in the following section.
1.3 Tree-Structured Generative Models

Recently, there has been a flurry of research in the field of tree-structured generative models, also known as tree-structured belief networks (TSBNs) [27, 33, 28, 29, 30, 31, 32]. The models provide a systematic way to describe random processes/fields and have extremely efficient and statistically optimal inference algorithms. Tree-structured belief networks are characterized by a fixed balanced tree structure of nodes representing hidden (latent) and observable random variables. We focus on TSBNs whose hidden variables take discrete values, though TSBNs can model even continuously valued Gaussian processes [34, 35]. The edges of TSBNs represent parent-child (Markovian) dependencies between neighboring layers of hidden variables, while hidden variables belonging to the same layer are conditionally independent, as depicted in Figure 1-1. Note that observables depend solely on their corresponding hidden variables. Observables are either present at the finest level only, or could be propagated upward the tree, as dictated by the design choices related to image processing. TSBNs have efficient linear-time inference algorithms, of which, in the graphical-models literature, the best-known is belief propagation [36, 37, 38]. Cheng and Bouman [29] have used TSBNs for multiscale document segmentation; Kumar and Hebert [39] have employed TSBNs for segmentation of man-made structures in natural scene images; and Schneider et
Inspiteoftheseattractiveproperties,thexedregularst ructureofnodesintheTSBN givesriseto\blocky"estimates.Thepre-denedtreestruc turefailstoadequatelyrepresent theimmensevariabilityinsizeandlocationofdierentobj ectsandtheirsubcomponents inimages.Intheliterature,thereareseveralapproachest oalleviatethisproblem.Irving etal.[ 28 ]haveproposedanoverlappingtreemodel,wheredistinctno descorrespondto overlappingpartsintheimage.Lietal.[ 41 ]havediscussedtwo-dimensionalhierarchical modelswherenodesaredependentbothatanyparticularlaye rthroughaMarkov-mesh andacrossresolutions.Inbothapproachessegmentationre sultsaresuperiortothosewhen standardTSBNsareused,becausethedescriptivecomponent ofthemodelsisimprovedat increasedcomputationalcost.Ultimately,however,these approachesdonotdealwiththe sourceofthe\blockiness"{namely,theorderlystructureo fTSBNs. Notuntilrecentlyhastheresearchonirregularstructures beeninitiated.Konenet al.[ 42 ]haveproposedarexibleneuralmechanismforinvariantpat ternrecognitionbasedon correlatedneuronalactivityandtheself-organizationof dynamiclinksinneuralnetworks. Also,Montanvertetal.[ 43 ],andBertolinoandMontanvert[ 44 ]haveexploredirregular multiscaletessellationsthatadapttoimagecontent.Wejo intheseresearcheortsbuilding ontheworkofAdamsetal.[ 45 ],Adams[ 46 ],Storkey[ 47 ],andStorkeyandWilliams[ 48 ], byconsideringtheirregular-structuredtreebeliefnetwo rk. (a) (b) Figure1{1:VariantsofTSBNs:(a)observables(black)atth elowestlayeronly;(b)observables(black)atalllayers;whitenodesrepresenthidd enrandomvariables,connected inabalancedquad-treestructure. PAGE 21 8 Figure1{2:Anirregulartreeconsistsofaforestofsubtree s,eachofwhichsegmentsthe imageintoregions,markedbydistinctshading;round-ands quare-shapednodesindicate hiddenandobservablevariables,respectively;triangles indicateroots. 
In the irregular tree, as in TSBNs, nodes represent random variables, and arcs between them model causal (Markovian) dependence assumptions, as illustrated in Figure 1-2. The irregular tree specifies probability distributions over both its structure and image classes. It is this distribution over tree structures that mitigates the above-cited problems with TSBNs.

1.4 Learning Tree Structure from Data is an NP-hard Problem

In order to fully characterize the irregular tree (and any graphical model, for that matter), it is necessary to learn from training data both the graph topology (structure) and the parameters of transition probabilities between connected nodes. Usually, for this purpose, one maximizes the likelihood of the model over training data, while at the same time minimizing the complexity of model structure. Current methods are successful at learning both the structure and parameters from complete data. Unfortunately, when the data are incomplete (i.e., some random variables are hidden), optimizing both the structure and parameters becomes NP-hard (nondeterministic polynomial time) [49, 50].

The principal contribution of this dissertation is that we propose a solution to the NP-hard problem of model-structure estimation. In our approach, we use a variant of the Expectation-Maximization (EM) algorithm [51, 52] to facilitate efficient search over a large number of candidate structures. In particular, the EM procedure iteratively improves its current choice of parameters by using the following two steps. In the Expectation step, current parameters are used for computing the expected value of all the statistics needed to evaluate the current structure. That is, the missing data (hidden variables) are completed by their expected values. In the Maximization step, we replace current parameters with those that maximize the likelihood over the completed data. This second step is essentially
Intheincomplete-datacase,alocalchangeinstructureofo nepartofthetreemay leadtoastructurechangeinanotherpartofthemodel.Thus, theavailablemethodsfor structureestimationevaluatealltheneighbors(e.g.,net worksthatdierbyafewlocal changes)ofeachcandidatetheyvisit[ 53 ].Thenovelideaofourapproachistoperforma searchforthebeststructurewithinEM.Ineachiterationst ep,ourprocedureattemptsto ndabetternetworkstructure,bycomputingtheexpectedst atisticsneededforevaluation ofalternativestructures.Incontrasttotheavailableapp roaches,theEM-basedstructure searchmakesasignicantprogressineachiteration.Aswes howthroughexperimental validation,ourprocedurerequiresrelativelyfewEMitera tionstolearnnon-trivialtree structures. Theoutlinedimagemodelingconstitutesthecoreofourappr oachtoimageinterpretation,whichisdiscussedinthefollowingsection. 1.5OurApproachtoImageInterpretation Weseektoaccomplishthefollowingrelatedgoals:(1)tond aunifyingframework toaddresslocalization,detection,andrecognitionofobj ects,asthreesub-tasksofimageinterpretation,and(2)tondacomputationallyecientan dreliablesolutiontorecognition ofmultiple,partiallyoccluded,alikeobjectsinagivensi ngleimage.Forthispurpose,we formulateobjectrecognitionastheBayesianestimationpr oblem,whereclasslabelsare assignedtopixelsbyminimizingtheexpectedvalueofasuit ablyspeciedcostfunction. Thisformulationrequiresecientestimationoftheposter iordistributionofimageclasses (i.e.,objects),givenanimage.Tothisend,weresorttodir ectedgraphicalmodels,known as irregulartrees [ 54 55 46 47 48 45 ].AsdiscussedinSection 1.3 ,theirregulartreespecies probabilitydistributionsoverbothitsstructureandimag eclasses.Thismeansthat,for eachimage,itisnecessarytoinfertheoptimalmodelstruct ure,aswellastheposterior distributionofimageclasses.ByutilizingtheMarkovprop ertyoftheirregulartree,weare inapositiontoreducecomputationalcomplexityoftheinfe rencealgorithm,and,thereby, toecientlysolveourBayesianestimationproblem. 
After inference, the model represents a forest of subtrees, each of which segments the image. More precisely, leaf nodes that are descendants down the subtree of a given root form the image region characterized by that root, as depicted in Fig. 1-2. These segmented image regions can be interpreted as distinct object appearances in the image. That is, inference of irregular-tree structure provides a solution to localization and detection. Moreover, in inference, we also derive the posterior distribution of image classes over leaf nodes. In order to classify the segmented image regions as a whole, we perform majority voting over the maximum a posteriori (MAP) classes of leaf nodes. In this fashion, we accomplish our first goal.

With respect to our second goal, we hypothesize that the critical factor in successful occluded-object recognition should be the analysis of visible object parts, which, as discussed before, usually induces prohibitive computational cost. To account explicitly for object parts at various scales, we utilize the Markovian property of irregular trees, which lends itself as a natural solution. Since each root determines a subtree whose leaf nodes form a detected object, we can assign physical meaning to roots as representing whole objects. Also, each descendant of the root down the subtree can be interpreted as the root of another subtree whose leaf nodes cover only a part of the object. Thus, roots' descendants can be viewed as object parts at various scales. Therefore, within the irregular-tree framework, the treatment of object parts represents merely a particular interpretation of the tree/subtree structure.

To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole. After Bayesian estimation of the irregular-tree structure for a given image, we first find the set of most significant irregular-tree nodes. Then, these selected significant nodes are treated as new roots of subtrees. Finally, we conduct MAP classification and majority voting over the selected image regions, descending from the selected significant nodes, as illustrated in Fig.
1-3.

1.6 Contributions

Below, we outline the main contributions of this dissertation.

Figure 1-3: Bayesian estimation of the irregular tree, along with the analysis of significant tree nodes, constitutes our approach to recognition of partially occluded, alike objects; shading indicates the two distinct sub-trees under the two "significant" nodes.

We propose an EM-like algorithm for learning a graphical model, where both the model structure and its distributions are learned on given data simultaneously. The algorithm represents a stage-wise solution to a learning problem known to be NP-hard. While we use the algorithm for learning irregular trees, its generalization to any generative model is straightforward.

A critical part of this learning algorithm is inference of the posterior distribution of image classes on given data. As is the case for many complex-structure models, exact inference for irregular trees is intractable. To overcome this problem, we resort to a variational approximation approach. We assume that there are averaging phenomena in irregular trees that may render a given set of variables in the model approximately independent of the rest of the network. Thereby, we derive the Structured Variational Approximation algorithm, which advances existing methods for inference.

In order to avoid variational approximation in inference, we propose two novel architectures and their inference algorithms within the irregular-tree framework. Being simpler, these models allow for exact inference. Moreover, empirically, they exhibit higher accuracy in modeling images than the irregular-tree-like models proposed in prior work [45, 46, 47, 48]. Along with architectural novelties, we also introduce multi-layered data into the model, an approach that has been extensively investigated in fixed-structure quad-trees [29, 33]. The proposed quad-trees have proved rather successful for various applications including image denoising, classification, and segmentation. Hence, it is important to develop a similar formulation for irregular trees.
We develop a novel approach to object recognition, in which object parts are explicitly analyzed in a computationally efficient manner. As a major theoretical contribution, we define a measure of cognitive significance of object details. The measure provides for a principled algorithm that combines detected object parts toward recognition of an object as a whole.

Finally, we report results of experiments conducted on a wide variety of image datasets, which characterize the proposed models and inference algorithms, and validate our approach to image interpretation.

1.7 Overview

The remainder of the dissertation is organized as follows.

In Chapter 2, we specify two architectures of the irregular-tree model, and derive inference algorithms for them. The architectures differ in the treatment of observable random variables. We also discuss learning of the model parameters. A detailed derivation of the inference algorithm is given in Appendix A.

Next, in Chapter 3, we specify yet another two architectures of the irregular-tree model, for which it is possible to simplify the inference algorithm, as compared to that discussed in Chapter 2. We deliberate the probabilistic inference and learning algorithms for these models.

Further, in Chapter 4, we propose a measure of significance of object parts. This measure ranks object components with respect to the entropy over all image classes (i.e., objects). To incorporate the information of this analysis into the MAP classification, we devise a greedy algorithm, which we refer to as object-part recognition.

The extraction of image features, which we use in our experiments, is thoroughly discussed in Chapter 5. Then, in Chapter 6, we report performance results of different irregular-tree architectures on a large number of challenging images with partially occluded, alike objects.

Finally, in Chapter 7, we summarize the major contributions of the dissertation, and conclude with remarks on future research.
CHAPTER 2
IRREGULAR TREES WITH RANDOM NODE POSITIONS

2.1 Model Specification

Irregular trees are directed, acyclic graphs with two disjoint sets of nodes representing hidden and observable random vectors. Graphically, we represent all hidden variables as round-shaped nodes, connected via directed edges indicating Markovian dependencies, while observables are denoted as rectangular-shaped nodes, connected only to their corresponding hidden variables, as depicted in Fig. 2-1. Below, we first introduce the nodes characterized by hidden variables.

There are |V| round-shaped nodes, organized in hierarchical levels, V_\ell, \ell = \{0, 1, ..., L-1\}, where V_0 denotes the leaf level, and V^0 \triangleq V \setminus V_0. The number of round-shaped nodes is identical to that of the corresponding quad-tree with L levels, such that |V_\ell| = |V_{\ell-1}|/4 = ... = |V_0|/4^\ell. Connections are established under the constraint that a node at level \ell can become a root, or it can connect only to the nodes at the next level, \ell+1. The network connectivity is represented by the random matrix Z, whose entry z_{ij} is an indicator random variable, such that z_{ij} = 1 if i \in V_\ell and j \in \{0, V_{\ell+1}\} are connected.

Figure 2-1: Two types of irregular trees: (a) observable variables present at the leaf level only; (b) observable variables present at all levels; round- and square-shaped nodes indicate hidden and observable random variables; triangles indicate roots; unconnected nodes in this example belong to other subtrees; each subtree segments the image into regions marked by distinct shading.

Z contains an additional zero ("root")
2-1) is characterized by a random position r_i in the image plane. The distribution of r_i is conditioned on the position of its parent, r_j, as

P(r_i | r_j, z_ij = 1) ≜ (2π|Σ_ij|^{1/2})^{−1} exp(−(1/2)(r_i − r_j − d_ij)^T Σ_ij^{−1} (r_i − r_j − d_ij)),   (2.2)

where Σ_ij is a diagonal matrix that represents the order of magnitude of object size, and parameter d_ij is the mean of the relative displacement (r_i − r_j). Storkey and Williams [48] set d_ij to zero, which favors undesirable positioning of children and parent nodes at the same locations. From our experiments, this may seriously degrade the image-modeling capabilities of irregular trees, and as such some nonzero relative displacement d_ij needs to be accounted for. For roots i, we have P(r_i | r_0, z_i0 = 1) ≜ exp(−(1/2)(r_i − d_i)^T Σ_i^{−1} (r_i − d_i)) / (2π|Σ_i|^{1/2}). The joint probability of R ≜ {r_i | ∀i ∈ V} is given by

P(R | Z) ≜ ∏_{i,j∈V} [P(r_i | r_j, z_ij)]^{z_ij}.   (2.3)

At the leaf level, V^0, we fix the node positions R^0 to the locations of the finest-scale observables, and then use P(Z, R̄^0 | R^0) as the prior over positions and connectivity, where R^0 ≜ {r_i | ∀i ∈ V^0} and R̄^0 ≜ {r_i | ∀i ∈ V \ V^0}.

Next, each node i is characterized by an image-class label x_i and image-class indicator random variables x_i^k, such that x_i^k = 1 if x_i = k, where k is a label taking values in the finite set M. Thus, we assume that the set M of unknown image classes is finite. The label k of node i is conditioned on the image class l of its parent j, as given by the conditional probability tables P_ij^{kl}. For roots i, we have P(x_i^k | x_0^l, z_i0 = 1) ≜ P(x_i^k). Thus, the joint probability of X ≜ {x_i^k | i ∈ V, k ∈ M} is given by

P(X | Z) = ∏_{i,j∈V} ∏_{k,l∈M} [P_ij^{kl}]^{x_i^k x_j^l z_ij}.   (2.4)

Finally, we introduce nodes that are characterized by observable random vectors representing image texture and color cues. Here, we make a distinction between two types of irregular trees. The model where observables are present only at the leaf level is referred to as IT^{V0}; the model where observables are present at all levels is referred to as IT^V. To clarify the difference between the two types of nodes in irregular trees, we index observables
with respect to their locations in the data structure (e.g., wavelet dyadic squares), while hidden variables are indexed with respect to a node index in the graph. This generalizes the correspondence between hidden and observable random variables of the position-encoding dynamic trees [48]. We define the position of an observable, λ(i), to be equal to the center of mass of the i-th dyadic square at level ℓ in the corresponding quad-tree with L levels:

λ(i) ≜ [(n + 0.5)2^ℓ, (m + 0.5)2^ℓ]^T, ∀i ∈ V^ℓ, ℓ = {0, ..., L−1}, n, m = 1, 2, ...,   (2.5)

where n and m denote the row and column of the dyadic square at scale ℓ (e.g., for wavelet coefficients). Clearly, other application-dependent definitions of λ(i) are possible. Note that while the r's are random vectors, the λ's are deterministic values fixed at the locations where the corresponding observables are recorded in the image. Also, after fixing R^0 to the locations of the finest-scale observables, we have r_i = λ(i), ∀i ∈ V^0. The definition given by Eq. (2.5) holds for IT^{V0} as well, for ℓ = 0.

For both types of irregular trees, we assume that the observables Y ≜ {y(i) | ∀i ∈ V} at locations Λ ≜ {λ(i) | ∀i ∈ V} are conditionally independent given the corresponding x_i^k:

P(Y | X, Λ) = ∏_{i∈V} ∏_{k∈M} P(y(i) | x_i^k, λ(i))^{x_i^k},   (2.6)

where for IT^{V0}, V^0 should be substituted for V. The likelihoods P(y(i) | x_i^k = 1, λ(i)) are modeled as mixtures of Gaussians: P(y(i) | x_i^k = 1, λ(i)) ≜ Σ_{g=1}^{G_k} ω_k(g) N(y(i); μ_k(g), Σ_k(g)). For large G_k, a Gaussian-mixture density can approximate any probability density [56].
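As a concrete illustration of the class-conditional likelihood above, the following minimal sketch evaluates a one-dimensional Gaussian-mixture density. The function name and the scalar (rather than vector-valued) observables are our own simplifications for illustration, not the dissertation's implementation.

```python
import math

def gaussian_mixture_likelihood(y, weights, means, variances):
    """Evaluate a 1-D mixture-of-Gaussians density: the likelihood
    P(y | x_i^k = 1) with component weights omega_k(g), means mu_k(g),
    and (scalar) variances."""
    total = 0.0
    for w, mu, var in zip(weights, means, variances):
        total += w * math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)
    return total
```

With a single unit-variance component, the value at the mean is 1/√(2π); adding components simply mixes the densities with their weights.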
In order to avoid the risk of overfitting the model, we assume that the parameters of the Gaussian mixture are equal for all nodes. The Gaussian-mixture parameters can be grouped in the set θ ≜ {G_k, {ω_k(g), μ_k(g), Σ_k(g)}_{g=1}^{G_k} | ∀k ∈ M}.

Speaking in generative terms, for a given set of V nodes, first P(Z) is defined using Eq. (2.1) and P(R | Z) using Eq. (2.3), to give us P(Z, R). We then impose the condition of fixing the leaf-level node positions to the locations of the finest-scale observables, λ^0, to obtain P(Z, R̄^0 | R^0 = λ^0). Combining Eq. (2.4) and Eq. (2.6) with P(Z, R̄^0 | R^0 = λ^0) results in the joint prior

P(Z, X, R̄^0, Y | R^0 = λ^0) = P(Y | X, Λ) P(X | Z) P(Z, R̄^0 | R^0 = λ^0),   (2.7)

which fully specifies the irregular tree. All the parameters of the joint prior can be grouped in the set Θ ≜ {r_ij, d_ij, Σ_ij, P_ij^{kl}, θ}, ∀i, j ∈ V, ∀k, l ∈ M.

As depicted in Figure 2-1, an irregular tree is a directed graph. The formalism of the graph-theoretic representation of irregular trees provides general algorithms for computing marginal and conditional probabilities of interest, which is discussed in the following section.

2.2 Probabilistic Inference

Image interpretation, as discussed in Chapter 1, requires computation of the posterior probabilities of the hidden random variables Z, X, and R̄^0, given the observables Y and the leaf-node positions R^0. However, due to the complexity of irregular trees, the exact probabilistic inference of P(Z, X, R̄^0 | Y, R^0) is infeasible. Therefore, we resort to approximate inference methods, which are divided into two broad classes: deterministic approximations and Monte-Carlo methods [57, 58, 59, 60, 61].

Markov Chain Monte Carlo (MCMC) methods allow for sampling of the posterior P(Z, X, R̄^0 | Y, R^0), through the construction of a Markov chain whose equilibrium distribution is the desired P(Z, X, R̄^0 | Y, R^0). Below, we report an experiment for two datasets of 4×4 and 8×8 binary images, samples of which are depicted in Fig.
2-2a, where we learned P(Z, X, R̄^0 | Y, R^0) for IT^{V0} models through Gibbs sampling [62]. Observables y_i were set to binary pixel values; the number of image classes was set to |M| = 2; the number of components in the Gaussian mixture was set to G = 1; and the maximum number of levels in the model was set to L = 3 and L = 4 for 4×4 and 8×8 images, respectively. The initial irregular-tree structure is a balanced quad-tree (TSBN), where the number of leaf-level nodes is equal to the number of pixels. One iteration of Gibbs sampling consists of sampling each variable, conditioned on the other variables in the irregular tree, until all the variables are sampled. We iterated this procedure until our convergence criterion was met, namely, when |P^{t+1}(Z, X, R̄^0 | Y, R^0) − P^t(Z, X, R̄^0 | Y, R^0)| / P^t(Z, X, R̄^0 | Y, R^0) < ε for N = 10 successive

Figure 2-2: Pixel clustering using irregular trees learned by Gibbs sampling: (a) sample 4×4 and 8×8 binary images; (b) clustered leaf-level pixels that have the same parent at level 1; (c) clustered leaf-level pixels that have the same grandparent at level 2; clusters are indicated by different shades of gray; the point in each group marks the position of the parent node.

Figure 2-3: Irregular tree learned for the 4×4 image in (a), after 20,032 iterations of Gibbs sampling; nodes are depicted in-line, representing the 4, 2 and 1 actual rows of levels 0, 1 and 2, respectively; nodes are drawn as pie-charts representing P(x_i^k = 1), k ∈ {0, 1}; note that there are two root nodes for the two distinct objects in the image.

iteration steps t, where ε = 0.1 and ε = 1 for 4×4 and 8×8 images, respectively. For the dataset of 50 binary 4×4 images, on average more than 20,000 iteration steps were required for convergence, while for 50 binary 8×8 images, more than 100,000 iterations were required. In Figs. 2-2b-c, we also illustrate the grouping of pixels in the learned irregular trees, while in Fig. 2-3, we depict the irregular tree learned for the 4×4 image in Fig. 2-2a.
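The stopping rule used in this experiment — relative change of the tracked probability below a threshold for N successive steps — can be sketched as a small helper. The naming here is our own, not code from the dissertation.

```python
def converged(history, eps, n_steps):
    """True once the relative change |s_{t+1} - s_t| / s_t of the tracked
    score stays below eps for n_steps successive iteration steps."""
    if len(history) < n_steps + 1:
        return False
    recent = history[-(n_steps + 1):]
    return all(abs(b - a) / abs(a) < eps for a, b in zip(recent, recent[1:]))
```

In the Gibbs-sampling experiment above, history would hold the successive values of P(Z, X, R̄^0 | Y, R^0) at each sweep.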
From the experimental results, we infer that irregular trees learned through Gibbs sampling are capable of capturing important structural information about image regions at various scales. Generally, however, in MCMC approaches, with increasing model complexity the choice of proposals in the Markov chain becomes hard, so that the equilibrium distribution is reached very slowly [63, 57]. Hence, in order to achieve faster inference, we resort to variational approximation, a specific type of deterministic approximation [59, 64]. Variational approximation methods have been demonstrated to give good and significantly faster results when compared to Gibbs sampling [46]. The proposed approaches range from a fully factorized approximating distribution over hidden variables [45] (a.k.a. mean-field variational approximation) to more structured solutions [48], where dependencies among hidden variables are enforced. The underlying assumption in those methods is that there are averaging phenomena in irregular trees that may render a given set of variables approximately independent of the rest of the network. Therefore, the resulting variational optimization of irregular trees provides for principled solutions, while reducing computational complexity.
In the following section, we derive a novel Structured Variational Approximation (SVA) algorithm for the irregular-tree model defined in Section 2.1.

2.3 Structured Variational Approximation

In variational approximation, the intractable distribution P(Z, X, R̄^0 | Y, R^0) is approximated by a simpler distribution Q(Z, X, R̄^0 | Y, R^0) closest to P(Z, X, R̄^0 | Y, R^0). To simplify notation, below we omit the conditioning on Y and R^0, and write Q(Z, X, R̄^0). The novelty of our approach is that we constrain the variational distribution to the form

Q(Z, X, R̄^0) ≜ Q(Z) Q(X | Z) Q(R̄^0 | Z),   (2.8)

which enforces that both the class-indicator variables X and the position variables R̄^0 are statistically dependent on the tree connectivity Z. Since these dependencies are significant in the prior, one should expect them to remain so in the posterior. Therefore, our formulation appears to be more appropriate for approximating the true posterior than the mean-field variational approximation Q(Z, X, R̄^0) = Q(Z) Q(X) Q(R̄^0) discussed by Adams et al. [45], and the form Q(Z, X, R̄^0) = Q(Z) Q(X | Z) Q(R̄^0) proposed by Storkey and Williams [48]. We define the approximating distributions as follows:

Q(Z) ≜ ∏_{ℓ=0}^{L−1} ∏_{(i,j)∈V^ℓ×{0,V^{ℓ+1}}} [ζ_ij]^{z_ij},   (2.9)

Q(X | Z) ≜ ∏_{i,j∈V} ∏_{k,l∈M} [Q_ij^{kl}]^{x_i^k x_j^l z_ij},   (2.10)

Q(R̄^0 | Z) ≜ ∏_{i,j∈V̄^0} [Q(r_i | z_ij)]^{z_ij} = ∏_{i,j∈V̄^0} [exp(−(1/2)(r_i − μ_ij)^T Ω_ij^{−1} (r_i − μ_ij)) / (2π|Ω_ij|^{1/2})]^{z_ij},   (2.11)

where the parameters ζ_ij correspond to the r_ij connection probabilities, and the Q_ij^{kl} are analogous to the P_ij^{kl} conditional probability tables. For the parameters of Q(R̄^0 | Z), note that the covariances Ω_ij and mean values μ_ij form the set of Gaussian parameters for a given node i ∈ V^ℓ over its candidate parents j ∈ V^{ℓ+1}. Which pair of parameters (μ_ij, Ω_ij) is used to generate r_i is conditioned on the given connection between i and j, that is, on the current realization of Z. Furthermore, we assume that the Ω's are diagonal matrices, such that node positions along the "x" and "y" image axes are uncorrelated. Also, for roots, suitable forms of Q
functions are used, similar to the specifications given in Section 2.1.

To find the Q(Z, X, R̄^0) closest to P(Z, X, R̄^0 | Y, R^0), we resort to a standard optimization method, where the Kullback-Leibler (KL) divergence between Q(Z, X, R̄^0) and P(Z, X, R̄^0 | Y, R^0) is minimized ([65], ch. 2, pp. 12-49, and ch. 16, pp. 482-509). The KL divergence is given by

KL(Q ∥ P) ≜ ∫_{R̄^0} dR̄^0 Σ_{Z,X} Q(Z, X, R̄^0) log [Q(Z, X, R̄^0) / P(Z, X, R̄^0 | Y, R^0)].   (2.12)

It is well known that KL(Q ∥ P) is non-negative for any two distributions Q and P, and that KL(Q ∥ P) = 0 if and only if Q = P; these properties are a direct corollary of Jensen's inequality ([65], ch. 2, pp. 12-49). As such, KL(Q ∥ P) guarantees a global minimum, that is, a unique solution for Q(Z, X, R̄^0).

By minimizing the KL divergence, we derive the update equations for estimating the parameters of the variational distribution Q(Z, X, R̄^0). Below, we summarize the final derivation results. Detailed derivation steps are reported in Appendix A, where we also provide the list of nomenclature. In the following equations, we use κ to denote an arbitrary normalization constant, the definition of which may change from equation to equation. Parameters on the right-hand side of the update equations are assumed known, as learned in the previous iteration step.

2.3.1 Optimization of Q(X | Z)

Q(X | Z) is fully characterized by the parameters Q_ij^{kl}, which are updated as

Q_ij^{kl} = κ P_ij^{kl} λ_i^k, ∀i, j ∈ V, ∀k, l ∈ M,   (2.13)

where the auxiliary parameters λ_i^k are computed as

λ_i^k = P(y(i) | x_i^k, λ(i)), i ∈ V^0;  λ_i^k = ∏_{c∈V} [Σ_{a∈M} P_ci^{ak} λ_c^a]^{ζ_ci}, i ∈ V̄^0,   (2.14a)

λ_i^k = P(y(i) | x_i^k, λ(i)) ∏_{c∈V} [Σ_{a∈M} P_ci^{ak} λ_c^a]^{ζ_ci}, ∀i ∈ V, ∀k ∈ M,   (2.14b)

where Eq. (2.14a) is derived for IT^{V0}, and Eq. (2.14b) for IT^V. Since the ζ_ci are non-zero only for child-parent pairs, from Eq. (2.14) we note that the λ's are computed, for both models, by propagating the λ messages of the corresponding children nodes upward. Thus, the Q's, given by Eq. (2.13), can be updated by making a single pass up the tree. Also, note that for leaf nodes, i ∈ V^0, the ζ_ci parameters are equal to 0 by definition, yielding λ_i^k = P(y(i) | x_i^k, λ(i)) in Eq. (2.14b).
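For intuition, the bottom-up λ recursion can be sketched for the special case in which each node has a single fixed parent (ζ_ci = 1 for the chosen parent and 0 otherwise), where Eq. (2.14b) reduces to an ordinary upward sweep. This is a schematic NumPy sketch under our own naming conventions, with the per-node scaling of Eq. (2.30) applied at every node; it is not the dissertation's implementation.

```python
import numpy as np

def upward_pass(nodes_bottom_up, children, y_lik, P):
    """Bottom-up lambda recursion (Eq. 2.14-style) on a fixed tree.
    y_lik[i] is the vector P(y_i | x_i = k) over classes k;
    P[a, k] = P(child class a | parent class k).
    Returns the scaled messages lambda-tilde, one vector per node."""
    lam = {}
    for i in nodes_bottom_up:
        msg = np.array(y_lik[i], dtype=float)
        for c in children.get(i, []):
            msg *= P.T @ lam[c]      # sum_a P[a, k] * lambda_c[a], per class k
        lam[i] = msg / msg.sum()     # scaling to avoid numerical underflow
    return lam
```

For leaves, children.get(i, []) is empty, so λ_i reduces to the scaled data likelihood, matching the remark above.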
Further, from Eqs. (2.9) and (2.10), we derive the update equation for the approximate posterior probability m_i^k that node i is assigned to image class k, given Y and R^0, as

m_i^k = ∫_{R̄^0} dR̄^0 Σ_{Z,X} x_i^k Q(Z, X, R̄^0) = Σ_{j∈V̄^0} ζ_ij Σ_{l∈M} Q_ij^{kl} m_j^l, ∀i ∈ V, ∀k ∈ M.   (2.15)

Note that the m_i^k can be computed by propagating image-class probabilities in a single downward pass. This upward-downward propagation, specified by Eqs. (2.14) and (2.15), is very reminiscent of belief propagation for TSBNs [36, 31]. For the special case when ζ_ij = 1 for only one parent j, we obtain the standard λ-π rules of Pearl's message-passing scheme for TSBNs.

2.3.2 Optimization of Q(R̄^0 | Z)

Q(R̄^0 | Z) is fully characterized by the parameters μ_ij and Ω_ij. The update equation for μ_ij, ∀(i,j) ∈ V^ℓ×{0,V^{ℓ+1}}, ℓ > 0, is given by

μ_ij = [Σ_{p∈V̄^0} ζ_jp Σ_ij^{−1} + Σ_{c∈V̄^0} ζ_ci Σ_ci^{−1}]^{−1} [Σ_{p∈V̄^0} ζ_jp Σ_ij^{−1}(μ_jp + d_jp) + Σ_{c∈V̄^0} ζ_ci Σ_ci^{−1}(μ_ci − d_ij)],   (2.16)

where c and p denote the children and grandparents of node i, respectively. Further, for all node pairs (i,j) ∈ V^ℓ×{0,V^{ℓ+1}}, ℓ > 0, where ζ_ij ≠ 0, Ω_ij is updated as

Tr{Ω_ij^{−1}} = Tr{Σ_ij^{−1}} (1 + Σ_{p∈V̄^0} ζ_jp [Tr{Σ_ij^{−1} Ω_jp} / Tr{Σ_ij^{−1} Ω_ij}]^{1/2}) + Σ_{c∈V̄^0} ζ_ci Tr{Σ_ci^{−1}} (1 + [Tr{Σ_ci^{−1} Ω_ci} / Tr{Σ_ci^{−1} Ω_ij}]^{1/2}),   (2.17)

where, once again, c and p denote the children and grandparents of node i, respectively. Since the Ω's and Σ's are assumed diagonal, it is straightforward to derive the expressions for the diagonal elements of the Ω's from Eq. (2.17). Note that both μ_ij and Ω_ij are updated by summing over the children and grandparents of i and, therefore, must be iterated until convergence.

2.3.3 Optimization of Q(Z)

Q(Z) is fully characterized by the connectivity probabilities ζ_ij, which are computed as

ζ_ij = κ r_ij exp(A_ij − B_ij), ∀ℓ, ∀(i,j) ∈ V^ℓ×{0,V^{ℓ+1}},   (2.18)

where A_ij represents the influence of the observables Y, while B_ij represents the contribution of the geometric properties of the network to the connectivity distribution. These are defined in Appendix A.

2.4 Inference Algorithm and Bayesian Estimation

For the given set of parameters Θ characterizing the joint prior, observables Y, and leaf-level node positions R^0, the standard Bayesian estimation of the optimal Ẑ, X̂, and R̄̂^0 requires minimizing the expectation of a cost function C:

(Ẑ, X̂, R̄̂^0) = argmin_{Z,X,R̄^0} E{C((Z, X, R̄^0), (Z*, X*, R̄^{0*})) | Y, R^0, Θ},   (2.19)

where C(·) penalizes the discrepancy between the estimated configuration (Z, X, R̄^0) and the true one (Z*, X*, R̄^{0*}). We propose the following cost function:

C((Z, X, R̄^0), (Z*, X*, R̄^{0*})) ≜ Σ_{i,j∈V} [1 − δ(z_ij − z_ij*)] + Σ_{i∈V} Σ_{k∈M} [1 − δ(x_i^k − x_i^{k*})] + Σ_{i∈V̄^0} [1 − δ(r_i − r_i*)],   (2.20)

where * indicates true values, and δ(·) is the Kronecker delta function. Using the variational approximation P(Z, X, R̄^0 | Y, R^0) ≈ Q(Z) Q(X | Z) Q(R̄^0 | Z), from Eqs. (2.19) and (2.20), we derive:

Ẑ = argmin_Z Σ_Z Q(Z) Σ_{ℓ=0}^{L−1} Σ_{(i,j)∈V^ℓ×{0,V^{ℓ+1}}} [1 − δ(z_ij − z_ij*)],   (2.21)

X̂ = argmin_X Σ_{Z,X} Q(Z) Q(X | Z) Σ_{i∈V} Σ_{k∈M} [1 − δ(x_i^k − x_i^{k*})],   (2.22)

R̄̂^0 = argmin_{R̄^0} ∫_{R̄^0} dR̄^0 Σ_Z Q(Z) Q(R̄^0 | Z) Σ_{i∈V̄^0} [1 − δ(r_i − r_i*)].   (2.23)

Given the constraints on connections, discussed in Section 2.1, minimization in Eq. (2.21) is equivalent to finding parents:

(∀ℓ)(∀i ∈ V^ℓ)(Z_i ≠ 0), ĵ = argmax_{j∈{0,V^{ℓ+1}}} ζ_ij, for IT^{V0},   (2.24a)
(∀ℓ)(∀i ∈ V^ℓ), ĵ = argmax_{j∈{0,V^{ℓ+1}}} ζ_ij, for IT^V,   (2.24b)

where ζ_ij is given by Eq. (2.18); Z_i denotes the i-th column of Z, and Z_i ≠ 0 indicates that there is at least one non-zero element in column Z_i, that is, that i has children and is thereby included in the tree structure. Note that due to the distribution over connections, after estimation of Z, for a given image, some nodes may remain without children. To preserve the generative property in IT^{V0}, we impose an additional constraint on Z: nodes above the leaf level must have children in order to be able to connect to upper levels. On the other hand, in IT^V, due to the multi-layered observables, all nodes V must be included in the tree structure, even if they do not have children. The global solution to Eq. (2.24a) is an open problem in many research areas. Therefore, for IT^{V0}, we propose a stage-wise optimization,
where, as we move upwards, starting from the leaf level, ℓ = {0, 1, ..., L−1}, we include in the tree structure the optimal parents at V^{ℓ+1}, according to

(∀i ∈ V^ℓ)(Ẑ_i ≠ 0), ĵ = argmax_{j∈{0,V^{ℓ+1}}} ζ_ij,   (2.25)

where Ẑ_i denotes the i-th column of the estimated Ẑ, and Ẑ_i ≠ 0 indicates that i has already been included in the tree structure when optimizing the previous level V^{ℓ−1}.

Next, from Eq. (2.22), the resulting Bayesian estimator of the image-class labels, denoted as x̂_i, is

(∀i ∈ V) x̂_i = argmax_{k∈M} m_i^k,   (2.26)

where the approximate posterior probability m_i^k that image class k is assigned to node i is given by Eq. (2.15).

Finally, from Eq. (2.23), the optimal node positions are estimated as

(∀ℓ > 0)(∀i ∈ V^ℓ) r̂_i = argmax_{r_i} Σ_Z Q(r_i | Z) Q(Z) = Σ_{j∈{0,V^{ℓ+1}}} ζ_ij μ_ij,   (2.27)

where μ_ij and ζ_ij are given by Eqs. (2.16) and (2.18), respectively.

The inference algorithm for irregular trees is summarized in Fig. 2-4. The specified ordering of the parameter updates for Q(Z), Q(X | Z), and Q(R̄^0 | Z) in Fig. 2-4, steps (4)-(10), is arbitrary; theoretically, other orderings are equally valid.

2.5 Learning Parameters of the Irregular Tree with Random Node Positions

Variational inference presumes that the model parameters Θ ≜ {r_ij, d_ij, Σ_ij, P_ij^{kl}, θ}, ∀i, j ∈ V, ∀k, l ∈ M, and V, L, M, are available. These parameters can be learned off-line through standard Maximum Likelihood (ML) optimization. Usually, for the ML optimization, it is assumed that N independently generated training images, with observables {Y_n}_{n=1}^N and latent variables {(Z_n, X_n, R̄^0_n)}_{n=1}^N, are given. However, for multiscale generative models, in general, neither the true image-class labels for nodes at higher levels nor their dynamic connections are given. Therefore, the configurations {(Ẑ_n, X̂_n, R̄̂^0_n)} must be estimated from the training images.
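The stage-wise parent selection of Eqs. (2.24a) and (2.25) can be sketched as follows. Here zeta holds the learned connection probabilities ζ_ij, with the key j = None standing for the root option; the function and variable names are hypothetical, not the dissertation's code.

```python
def stagewise_parents(levels, zeta, leaf_level=0):
    """Stage-wise MAP parent selection (Eq. 2.25-style): moving up from the
    leaves, every node already included in the tree picks its most probable
    parent; zeta[(i, j)] approximates P(z_ij = 1), with j = None meaning
    'i becomes a root'."""
    in_tree = set(levels[leaf_level])            # leaf nodes are always included
    parent = {}
    for l in range(len(levels) - 1):
        candidates = [None] + list(levels[l + 1])
        for i in levels[l]:
            if i not in in_tree:                 # node attracted no children
                continue
            best = max(candidates, key=lambda c: zeta.get((i, c), 0.0))
            parent[i] = best
            if best is not None:
                in_tree.add(best)                # chosen parent joins the tree
    return parent
```

A higher-level node that attracts no children is simply left out of the returned structure, mirroring the IT^{V0} constraint discussed above.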
To this end, we propose an iterative learning procedure. In initialization, we first set L = log_2(|V^0|), where |V^0| is equal to the size of a given image. The number of image classes, |M|, is also assumed known. Next, due to the huge diversity of possible configurations of objects in images, for each node i ∈ V^ℓ, we initialize r_ij to be uniform over i's candidate parents, ∀j ∈ {0, V^{ℓ+1}}. Then, for all pairs (i,j) ∈ V^ℓ×V^{ℓ+1} at level ℓ, we set d_ij = λ(i) − λ(j); namely, the d_ij are initialized to the relative displacements of the centers of mass of the i-th and j-th dyadic squares in the corresponding quad-tree with L levels, specified in Eq. (2.5). For roots i, we have d_i = λ(i). Also, we set the diagonal elements of Σ_ij to the diagonal elements

Inference Algorithm
Assume that V, L, M, Θ, N, ε_in, and ε_out are given.
(1) Initialization: t = 0; t_in = 0; (∀i,j ∈ V)(∀k,l ∈ M) ζ_ij(0) = r_ij; Q_ij^{kl}(0) = P_ij^{kl}; μ_ij(0) = "node locations in the corresponding quad-tree"; the diagonal elements of Ω_ij(0) are set to the area of the dyadic squares in the corresponding quad-tree;
(2) repeat Outer Loop
(3) t = t + 1;
(4) Compute in a bottom-up pass, for ℓ = 0, 1, ..., L−1, ∀i ∈ V^ℓ, ∀k ∈ M: λ_i^k(t), given by Eq. (2.14); Q_ij^{kl}(t), given by Eq. (2.13);
(5) Compute in a top-down pass, for ℓ = L−1, L−2, ..., 0, ∀i ∈ V^ℓ, ∀k ∈ M: m_i^k(t), given by Eq. (2.15);
(6) repeat Inner Loop
(7) t_in = t_in + 1;
(8) Compute ∀i,j ∈ V̄^0: μ_ij(t_in), given by Eq. (2.16); Ω_ij(t_in), given by Eq. (2.17);
(9) until |μ_ij(t_in) − μ_ij(t_in − 1)| / μ_ij(t_in − 1) < ε_in;
(10) Compute ∀i,j ∈ V̄^0: ζ_ij(t), given by Eq. (2.18);
(11) until |Q(Z, X, R̄^0; t) − Q(Z, X, R̄^0; t−1)| / Q(Z, X, R̄^0; t−1) < ε_out, for N consecutive iteration steps;
(12) Estimation of Ẑ: compute in a bottom-up pass, for ℓ = 0, 1, ..., L−1:
for IT^{V0}: (∀i ∈ V^ℓ)(Ẑ_i ≠ 0) ĵ = argmax_{j∈{0,V^{ℓ+1}}} ζ_ij(t);
for IT^V: (∀i ∈ V^ℓ) ĵ = argmax_{j∈{0,V^{ℓ+1}}} ζ_ij(t);
(13) Estimation of X̂: compute (∀i ∈ V) x̂_i = argmax_{k∈M} m_i^k(t);
(14) Estimation of R̄̂^0: compute (∀ℓ > 0)(∀i ∈ V^ℓ) r̂_i = Σ_{j∈{0,V^{ℓ+1}}} ζ_ij(t) μ_ij(t);
Figure 2-4: Inference of the irregular tree, given Y, R^0, and Θ; t and t_in are counters in the outer and inner loops, respectively; N, ε_out, and ε_in control the convergence criteria for the two loops.

of the matrix d_ij d_ij^T. The number of components G_k in the Gaussian mixture of each class k is set to G_k = 3, which is empirically validated to be appropriate. The other parameters of the Gaussian mixture, θ, are estimated by using the EM algorithm [52, 56] on the hand-labeled training images. Finally, the conditional probability tables P_ij^{kl} are initialized to be uniform over the possible image classes.

After initialization of Θ, we run an iterative learning procedure, where in step t we conduct SVA inference of the irregular tree on the training images, as explained in the previous section. After inference of the posterior probability m_i^k that class k is assigned to node i, given by Eq. (2.15), and the posterior connectivity probability ζ_ij, given by Eq. (2.18), on all training images, n = 1, ..., N, we update only P_ij^{kl} and r_ij, as

P_ij^{kl}(t+1) = (1/N) Σ_{n=1}^N m_i^{k,n}(t),   (2.28)

r_ij(t+1) = (1/N) Σ_{n=1}^N ζ_ij^n(t).   (2.29)

The other parameters in Θ(t+1) = {r_ij(t+1), d_ij, Σ_ij, P_ij^{kl}(t+1), θ} are fixed to their initial values. In the next iteration step, we use Θ(t+1) for SVA inference of the irregular tree on the training images. We assume that the learning algorithm has converged when

|P_ij^{kl}(t+1) − P_ij^{kl}(t)| / P_ij^{kl}(t) < ε,

where ε > 0 is a pre-specified parameter.

2.6 Implementation Issues

In this section, we list algorithm-related details that are necessary for the experimental results, presented in Chapter 6, to be reproducible.

First, direct implementation of Eq. (2.13) would result in numerical underflow. Therefore, we introduce the following scaling procedure:

λ̃_i^k ≜ λ_i^k / S_i, ∀i ∈ V, ∀k ∈ M,   (2.30)

S_i ≜ Σ_{k∈M} λ_i^k.   (2.31)

Substituting the scaled λ̃'s into Eq. (2.13), we obtain

Q_ij^{kl} = P_ij^{kl} λ_i^k / Σ_{a∈M} P_ij^{al} λ_i^a = P_ij^{kl} λ̃_i^k / Σ_{a∈M} P_ij^{al} λ̃_i^a.   (2.32)

In other words, the computation of Q_ij^{kl} does not change when the scaled λ̃'s are used.
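The invariance in Eq. (2.32) — normalizing the Q table per parent class cancels any per-node rescaling of the λ's — is easy to verify numerically. The sketch below uses our own names and a two-class example; it is not the dissertation's code.

```python
import numpy as np

def q_table(P, lam):
    """Q[k, l] = P[k, l] * lam[k] / sum_a P[a, l] * lam[a]   (Eq. 2.32)."""
    Q = P * lam[:, None]                      # multiply row k by lam[k]
    return Q / Q.sum(axis=0, keepdims=True)   # normalize each column l
```

Scaling lam by any positive constant S_i leaves the result unchanged, which is why the λ̃'s of Eq. (2.30) can be substituted freely.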
Second, to reduce computational complexity, we consider, for each node i, only the 7×7 box encompassing the parent nodes j that neighbor the parent of i in the corresponding quad-tree. Consequently, the number of possible children nodes c of i is also limited. Our experiments show that the omitted nodes, whether children or parents, contribute negligibly to the update equations. Thus, we limit the overall computational cost as the number of nodes increases.

Finally, the convergence criterion of the inner loop, where the μ_ij and Ω_ij are computed, is controlled by the parameter ε_in. When ε_in = 0.01, the average number of iteration steps t_in in the inner loop is from 3 to 5, depending on the image size, where the latter is obtained for 128×128 images. The convergence criterion of the outer loop is controlled by the parameters N and ε_out. The simplifications that we use in practice may lead to sub-optimal solutions of SVA. From our experience, though, the algorithm recovers from unstable stationary points for sufficiently large N. In our experiments, we set N = 10 and ε_out = 0.01.

After the inference algorithm (Fig. 2-4) has converged, we estimate the values of the hidden variables (Ẑ, X̂, R̄̂^0) for a given image, thereby conducting image interpretation.
CHAPTER 3
IRREGULAR TREES WITH FIXED NODE POSITIONS

In the previous chapter, two architectures of the irregular tree are presented, which are fully characterized by the following joint prior:

P(Z, X, R̄^0, Y | R^0 = λ^0) = P(Y | X, Λ) P(X | Z) P(Z, R̄^0 | R^0 = λ^0).

As discussed in Section 2.2, the inference of the posterior distribution P(Z, X, R̄^0 | Y, R^0) is intractable, due to the complexity of the model. The node-position variables, R̄^0, are the main reason that approximate inference must be conducted. On the other hand, the R̄^0 are very useful, because they constrain the possible network configurations. In order to avoid approximate inference, in this chapter we introduce yet another architecture of the irregular tree, where the R̄^0 are eliminated, and where the constraints on the tree structure are directly modeled in the distribution of the connectivity Z.

3.1 Model Specification

Similar to the model specification in the previous chapter, we introduce two architectures: one with observables only at the leaf level, and the other with observables propagated to higher levels. The main difference from the architectures IT^V and IT^{V0} is that node positions are identical to those of the quad-tree. Therefore, we refer to the architectures presented in this chapter as irregular quad-trees, IQT^V and IQT^{V0}.

The irregular quad-tree is a directed acyclic graph with nodes in a set V, organized in hierarchical levels V^ℓ, ℓ = {0, 1, ..., L}, where V^0 denotes the leaf level. The layout of nodes is identical to that of the quad-tree, modeling for example the dyadic pyramid of wavelet coefficients, such that the number of nodes at level ℓ can be computed as |V^ℓ| = |V^{ℓ−1}|/4 = ... = |V^0|/4^ℓ. Unlike for position-encoding dynamic trees [48], we assume that nodes are fixed at the locations of the corresponding quad-tree. Consequently, irregular model structure is achieved only through establishing arbitrary connections between nodes. Connections are established under the constraint that a node at level ℓ can become a root
Connectionsareestablishedundertheconstraintthatanod eatlevel ` canbecomearoot 27 PAGE 41 28 oritcanconnectonlytothenodesatthenext ` +1level.Thenetworkconnectivityisrepresentedbyarandommatrix, Z ,whereentry z ij isanindicatorrandomvariable,suchthat z ij =1if i 2 V ` and j 2 V ` +1 areconnected. Z containsanadditionalzero(\root")column, whereentries z i 0 =1if i isarootnode.Eachnodecanhaveonlyoneparent,orcanbea root.Notethatduetothedistributionoverconnections,af terestimationof Z ,foragiven image,inIQT V ,somenodesmayremainwithoutchildren. Eachnode i ischaracterizedbyanimage-classrandomvariable, x i ,whichcantake valuesinaniteclassset C .Given Z ,thelabel x i ofnode i isconditionedon x j ofits parent j as P ( x i j x j ;z ij =1).Thejointprobabilityofimage-classvariables X = f x i g 8 i 2 V isgivenby P ( X j Z )= Q L` =0 Q i 2 V ` P ( x i j x j ;z ij =1) ; (3.1) whereforrootsweusepriors P ( x i ).Weassumethattheconditionalprobabilitytables P ( x i j x j ;z ij =1)areequalforallthenodesatalllevels,asin[ 33 ].Suchauniqueconditional probabilitytableisdenotedas. Next,weassumethatobservables y i areconditionallyindependentgiventhecorresponding x i : P ( Y j X )= Q i 2 V P ( y i j x i ) ; (3.2) P ( y i j x i = k )= P Gg =1 k ( g ) N ( y i ; k ( g ) ; k ( g )) ; (3.3) whereforIQT V 0 insteadof V wewrite V 0 inEq.( 3.2 ). 
P(y_i | x_i = k), k ∈ M, is thus modeled as a mixture of Gaussians. The Gaussian-mixture parameters can be grouped in θ = {ω_k(g), μ_k(g), Σ_k(g), G_k}, ∀k ∈ M.

Finally, we specify the connectivity distribution. In the previous chapter, it is defined as the prior P(Z) = ∏_{i,j∈V} P(z_ij = 1), and then the constraint on the possible tree structures is imposed through introducing an additional set of random variables, namely, the random node positions R. The main purpose of the R's is to provide the mechanism by which connections between close nodes are favored. That approach has two major disadvantages. First, the additional R variables render the exact inference of the dynamic tree intractable, enforcing the use of approximate inference methods (variational approximation). Second, the decision whether nodes i and j should be connected is not informed by the actual values of x_i and x_j. To improve upon the model formulation of the previous chapter, we seek to eliminate the R's, and to incorporate the information on image-class labels and node positions into the connectivity distribution. We reason that connections between parents and children whose relative distance is small should be favored over those that are far apart. At the same time, we seek to establish a mechanism that groups nodes belonging to the same image class, and separates those assigned to different classes.

Let us first examine the relative distances between nodes. Due to the symmetry of the node layout (equal to that of the quad-tree), we divide the set of all candidate parents j into classes of equidistance from the child i, as depicted in Fig. 3-1. We specify that relative distances can take the integer values d_ij = {0, 1, 2, ..., d_i^max}, where if i is a root, d_i0 ≜ 0. Note that the d_i^max values vary for different positions of i at one level, as well as for the different levels to which i belongs.
Given X, we specify the conditional connectivity distribution as

P(Z | X) = ∏_{ℓ=0}^{L} ∏_{(i,j)∈V^ℓ×{0,V^{ℓ+1}}} P(z_ij = 1 | x_i, x_j),   (3.4)

P(z_ij = 1 | x_i, x_j) = κ p_i, if i is a root; κ p_i (1 − p_i)^{d_ij}, if x_i = x_j; κ p_i (1 − p_i)^{d_i^max − d_ij}, if x_i ≠ x_j;   (3.5)

subject to Σ_{j∈{0,V^{ℓ+1}}} P(z_ij = 1 | x_i, x_j) = 1,   (3.6)

where κ is a normalizing constant, and p_i is the parameter of the geometric distribution. From Eq. (3.5), we observe that when x_i = x_j, P(z_ij = 1 | x_i, x_j) decreases as d_ij becomes larger, while when x_i ≠ x_j, P(z_ij = 1 | x_i, x_j) increases for greater distances d_ij. Hence, the form of P(z_ij = 1 | x_i, x_j), given by Eq. (3.5), satisfies the aforementioned desirable properties. To avoid overfitting, we assume that p_i is equal for all nodes i at the same level. The parameters of P(Z | X) can be grouped in the parameter set Π = {p_i}, ∀i ∈ V.

Figure 3-1: Classes of candidate parents j that are characterized by a unique relative distance d_ij from the child i.

The introduced parameters of the model can be grouped in the parameter set Θ = {Φ, Π, θ}. In the next section, we explain how to infer the "best" configuration of Z and X from the observed image data Y, provided that Θ is known.
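A minimal sketch of Eqs. (3.5)-(3.6), under our own naming: for one child i, it scores every candidate parent j by the geometric law and normalizes the scores.

```python
def connection_probs(distances, same_class, d_max, p):
    """Normalized P(z_ij = 1 | x_i, x_j) per Eqs. (3.5)-(3.6): probability
    decays geometrically in d_ij when the labels agree, and in
    (d_max - d_ij) when they differ. distances[j] = d_ij; same_class[j]
    says whether x_i == x_j for candidate parent j."""
    raw = {}
    for j, d in distances.items():
        exponent = d if same_class[j] else d_max - d
        raw[j] = p * (1.0 - p) ** exponent
    kappa = 1.0 / sum(raw.values())           # enforce Eq. (3.6)
    return {j: kappa * v for j, v in raw.items()}
```

With agreeing labels the nearest parent wins; with disagreeing labels the preference flips toward distant parents, implementing the grouping/separation behavior described above.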
3.2 Inference of the Irregular Tree with Fixed Node Positions

The standard Bayesian formulation of the inference problem consists in minimizing the expectation of some cost function C, given the data:

(Ẑ, X̂) = argmin_{Z,X} E{C((Z, X), (Z′, X′)) | Y, Θ},   (3.7)

where C penalizes the discrepancy between the estimated configuration (Z, X) and the true one (Z′, X′). We propose the following cost function:

C((Z, X), (Z′, X′)) = C(X, X′) + C(Z, Z′)   (3.8)
= Σ_{ℓ=0}^{L−1} Σ_{i∈V^ℓ} [1 − δ(x_i − x′_i)] + Σ_{ℓ=0}^{L−1} Σ_{(i,j)∈V^ℓ×{0,V^{ℓ+1}}} [1 − δ(z_ij − z′_ij)],   (3.9)

where ′ stands for true values, and δ(·) is the Kronecker delta function. From Eq. (3.9), the resulting Bayesian estimator of X is

∀i ∈ V, x̂_i = argmax_{x_i∈M} P(x_i | Z, Y).   (3.10)

Next, given the constraints on connections in the irregular tree, we derive that minimizing E{C(Z, Z′) | Y, Θ} is equivalent to finding a set of optimal parents ĵ, such that

(∀ℓ)(∀i ∈ V^ℓ)(Z_i ≠ 0), ĵ = argmax_{j∈{0,V^{ℓ+1}}} P(z_ij | x_i, x_j), for IQT^{V0},   (3.11a)
(∀ℓ)(∀i ∈ V^ℓ), ĵ = argmax_{j∈{0,V^{ℓ+1}}} P(z_ij | x_i, x_j), for IQT^V,   (3.11b)

where Z_i is the i-th column of Z, and Z_i ≠ 0 represents the event "node i has children," that is, "node i is included in the irregular-tree structure." The global solution to Eq. (3.11a) is an open problem in many research areas. We propose a stage-wise optimization, where, as we move upwards, starting from the leaf level, ℓ = {0, 1, ..., L}, we include in the tree structure the optimal parents at V^{ℓ+1}, according to

(∀i ∈ V^ℓ)(Ẑ_i ≠ 0), ĵ = argmax_{j∈{0,V^{ℓ+1}}} P(z_ij = 1 | x_i, x_j),   (3.12)

where Ẑ_i ≠ 0 denotes an estimate that i has already been included in the tree structure when optimizing the previous level V^{ℓ−1}.

By using the results in Eqs. (3.10) and (3.12), we specify the inference algorithm for the irregular quad-tree, which is summarized in Fig.
3-2. In a recursive step t, we first assume that the estimate Z(t−1) of the previous step t−1 is known, and then derive the estimate X(t) using Eq. (3.10); then, substituting X(t) into Eq. (3.12), we derive the estimate Z(t). We consider the algorithm converged if P(Y, X | Z) does not vary more than some threshold ε for N consecutive iteration steps t. In our experiments, we set ε = 0.01 and N = 10. Steps 2 and 6 in the algorithm can be interpreted as inference of X̂ given Y for a fixed-structure tree. In particular, for Step 2, where the initial structure is the quad-tree, we can use the standard inference on quad-trees, where, essentially, belief messages are propagated in only two sweeps up and down the tree [33, 29, 31]. For Step 6, the irregular tree represents a forest of subtrees, which also have a fixed, though irregular, structure; therefore, we can use the very same tree-inference algorithm for each of the subtrees. For completeness, in Appendix B, we present the two-pass maximum posterior marginal (MPM) estimation of X proposed by Laferte et al. [33].

3.3 Learning Parameters of the Irregular Tree with Fixed Node Positions

Analogous to the learning algorithm discussed in the previous chapter, the parameters of the irregular tree with fixed node positions can be learned by using standard ML optimization. Here, we assume that N independently generated training images, with observables {Y_n}, n = 1, ..., N, are given. As explained before, the configurations of the latent variables {(Z_n, X_n)} must be estimated.

Inference Algorithm
(1) t = 0; initialize the irregular-tree structure Z(0) to the quad-tree;
(2) Compute ∀i ∈ V: x_i(0) = argmax_{x_i∈M} P(x_i | Z(0), Y);
(3) repeat
(4) t = t + 1;
(5) Compute in a bottom-up pass, for ℓ = 0, 1, ..., L:
for IQT^{V0}: (∀i ∈ V^ℓ)(Ẑ_i ≠ 0) ĵ = argmax_{j∈{0,V^{ℓ+1}}} P(z_ij = 1 | x_i, x_j);
for IQT^V: (∀i ∈ V^ℓ) ĵ = argmax_{j∈{0,V^{ℓ+1}}} P(z_ij = 1 | x_i, x_j);
(6) Compute ∀i ∈ V: x_i(t) = argmax_{x_i∈M} P(x_i | Z(t), Y);
(7) X̂ = X(t); Ẑ = Z(t);
(8) until |P(Y, X̂ | Ẑ) − P(Y, X(t−1) | Z(t−1))| / P(Y, X(t−1) | Z(t−1)) < ε for N consecutive iteration steps.
Figure 3-2: Inference of the irregular tree with fixed node positions, given the observables $Y$ and the model parameters.

To this end, we propose an iterative learning procedure, where in step $t$ we first assume that the parameter set $\Theta(t)$ (comprising the Gaussian-mixture parameters, the connectivity parameters, and the conditional probability tables) is given, and then conduct inference for each training image, $n=1,\dots,N$:
\[
(\hat{Z}^n,\hat{X}^n) = \arg\min_{Z,X} E\{C((Z,X),(Z^0,X^0)) \mid Y^n, \Theta(t)\},
\]
as explained in Section 3.2. Once the estimates $\{(\hat{Z}^n,\hat{X}^n)\}$ are found, we apply standard ML optimization to compute $\Theta(t+1)$.

More specifically, suppose, in the learning step $t$, realizations of the random variables $(Y^n,\hat{X}^n,\hat{Z}^n)$ are given for $n=1,\dots,N$. Then the parameters of the Gaussian-mixture distributions, in step $t+1$, can be computed using the standard EM algorithm [56]:
\[
P(\omega_c(g)\mid y_i, x_i=c) = \frac{P(y_i\mid \omega_c(g), x_i=c)\,\pi_c(g)}{\sum_{g=1}^{G_c} P(y_i\mid \omega_c(g), x_i=c)\,\pi_c(g)}, \tag{3.13}
\]
\[
\hat{\pi}_c(g) = \frac{1}{n_c}\sum_{i=1}^{n_c} P(\omega_c(g)\mid y_i, \hat{x}_i=c), \tag{3.14}
\]
\[
\hat{\mu}_c(g) = \frac{\sum_{i=1}^{n_c} y_i\,P(\omega_c(g)\mid y_i,\hat{x}_i=c)}{\sum_{i=1}^{n_c} P(\omega_c(g)\mid y_i,\hat{x}_i=c)}, \tag{3.15}
\]
\[
\hat{\Sigma}_c(g) = \frac{\sum_{i=1}^{n_c} (y_i-\hat{\mu}_c(g))(y_i-\hat{\mu}_c(g))^T\,P(\omega_c(g)\mid y_i,\hat{x}_i=c)}{\sum_{i=1}^{n_c} P(\omega_c(g)\mid y_i,\hat{x}_i=c)}, \tag{3.16}
\]
where $\omega_c(g)$ denotes the $g$-th mixture component of class $c$, and $n_c$ is the total number of nodes over the $N$ training images that are classified as class $c$. To compute $P(\omega_c(g)\mid y_i,x_i=c)$ in Eq. (3.13), we use the Gaussian-mixture parameters from the previous learning step $t$. For all classes we set $G_c=3$.
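As a concrete, deliberately simplified illustration of Eqs. (3.13)-(3.16), the sketch below runs one EM iteration for a one-dimensional Gaussian mixture attached to a single class; the function and variable names are ours:

```python
import numpy as np

def em_step(y, pi, mu, sigma2):
    """One EM iteration mirroring Eqs. (3.13)-(3.16) for a 1-D mixture:
    E-step responsibilities, then M-step updates of the weights, means,
    and variances.  y: (n,) samples classified as class c;
    pi, mu, sigma2: (G,) current mixture parameters."""
    # E-step, Eq. (3.13): P(omega_c(g) | y_i, x_i = c)
    lik = np.exp(-0.5 * (y[:, None] - mu) ** 2 / sigma2) / np.sqrt(2 * np.pi * sigma2)
    resp = lik * pi
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step, Eqs. (3.14)-(3.16)
    nk = resp.sum(axis=0)
    pi_new = nk / len(y)                                    # Eq. (3.14)
    mu_new = (resp * y[:, None]).sum(axis=0) / nk           # Eq. (3.15)
    sigma2_new = (resp * (y[:, None] - mu_new) ** 2).sum(axis=0) / nk  # Eq. (3.16)
    return pi_new, mu_new, sigma2_new
```

Iterating `em_step` on samples drawn from two well-separated clusters drives the component means toward the cluster centers, as the standard EM theory guarantees.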
Next, we explain how to learn the parameters of the connectivity distribution, $\{p_i(t+1)\}_{i\in V}$, by using the ML principle:
\[
\{p_i(t+1)\} = \arg\max_{\{p_i\}} \prod_{n=1}^{N} P(\hat{Z}^n \mid \hat{X}^n, \{p_i(t)\}). \tag{3.17}
\]
Here, we consider the two cases of the IQT$_V$ and IQT$_{V^0}$ models. Recall that the parameters $p_i$ are equal for all nodes $i$ at the same level $\ell$. Given the estimates $\{(\hat{Z}^n,\hat{X}^n)\}$, for each training image $n=1,\dots,N$, from Eqs. (3.5) and (3.17), we derive for IQT$_V$:
\[
\hat{p}(\ell) = \frac{1}{N\,|V^\ell|}\sum_{n=1}^{N}\sum_{i\in V^\ell}\Big[1+\mathrm{I}(\hat{x}^n_i=\hat{x}^n_j)\,d^n_{ij}+\mathrm{I}(\hat{x}^n_i\neq\hat{x}^n_j)\,(d^{\max}_i-d^n_{ij})\Big], \tag{3.18}
\]
where $\mathrm{I}(\cdot)$ is an indicator function, $j$ is the estimated parent of node $i$, and $d^n_{ij}$ denotes the relative distance assigned to the estimated connection $\hat{z}^n_{ij}=1$.

For IQT$_{V^0}$, given the estimates $\{(\hat{Z}^n,\hat{X}^n)\}$, for each training image $n=1,\dots,N$, we analyze the set of nodes $i\in V^\ell$ included in the corresponding irregular tree, i.e., $\hat{Z}^n_i\neq 0$. Thus, from Eqs. (3.5) and (3.17), we derive:
\[
\hat{p}_i(\ell) = \frac{\displaystyle\sum_{n=1}^{N}\sum_{i\in V^\ell}\mathrm{I}(\hat{Z}^n_i\neq 0)\Big[1+\mathrm{I}(\hat{x}^n_i=\hat{x}^n_j)\,d^n_{ij}+\mathrm{I}(\hat{x}^n_i\neq\hat{x}^n_j)\,(d^{\max}_i-d^n_{ij})\Big]}{\displaystyle\sum_{n=1}^{N}\sum_{i\in V^\ell}\mathrm{I}(\hat{Z}^n_i\neq 0)}, \tag{3.19}
\]
with $\mathrm{I}(\cdot)$, $j$, and $d^n_{ij}$ defined as in Eq. (3.18).

Finally, to learn the conditional probability tables, we use the standard EM algorithm on fixed-structure trees, thoroughly discussed in [33]. Note that to obtain the estimates $\{(\hat{Z}^n,\hat{X}^n)\}$, for each training image $n=1,\dots,N$, in learning step $t$, we in fact have to conduct the MPM estimation given in Appendix B, Fig. B. By using the already available $P(x_i,x_j\mid Y^n_{d(i)},\hat{z}^n_{ij}=1)$ and $P(x_i\mid Y^n_{d(i)})$, obtained for each image $n$ as in Fig. B, we derive
\[
\hat{P}_{x_i\mid x_j} = \frac{1}{N}\sum_{n=1}^{N}\frac{\sum_{i\in V} P(x_i,x_j\mid Y^n_{d(i)},\hat{z}^n_{ij}=1)}{\sum_{i\in V} P(x_j\mid Y^n_{d(i)})}. \tag{3.20}
\]
The overall learning procedure is summarized in Fig.
3-3.

Learning Algorithm
(1) $t=0$; initialize $\Theta(0)$ (the Gaussian-mixture, connectivity, and conditional-probability-table parameters);
(2) Estimate, for $n=1,\dots,N$: $(\hat{Z}^n,\hat{X}^n)=\arg\min_{Z,X} E\{C((Z,X),(Z^0,X^0))\mid Y^n,\Theta(0)\}$;
(3) repeat
(4)   $t=t+1$;
(5)   Compute: the Gaussian-mixture parameters as in Eqs. (3.13)-(3.16);
      $p(\ell;t)$, for IQT$_V$ as in Eq. (3.18), for IQT$_{V^0}$ as in Eq. (3.19);
      the conditional probability tables as in Eq. (3.20);
(6)   Estimate, for $n=1,\dots,N$: $(\hat{Z}^n,\hat{X}^n)=\arg\min_{Z,X} E\{C((Z,X),(Z^0,X^0))\mid Y^n,\Theta(t)\}$, using the inference algorithm in Fig. 3-2;
(7)   $\hat{\Theta}=\Theta(t)$;
(8) until $(\forall n)\ \left|\dfrac{P(Y^n,\hat{X}^n\mid\hat{Z}^n,\hat{\Theta})-P(Y^n,\hat{X}^n\mid\hat{Z}^n,\Theta(t-1))}{P(Y^n,\hat{X}^n\mid\hat{Z}^n,\Theta(t-1))}\right|<\varepsilon$ for $N$ consecutive iteration steps.

Figure 3-3: Algorithm for learning the parameters of the irregular tree; for notational simplicity, in Step (8) we do not indicate the different estimates of $(\hat{Z}^n,\hat{X}^n)$ for $\hat{\Theta}$ and $\Theta(t-1)$.

Once $\hat{\Theta}$ is learned, we can localize, detect, and recognize objects in the image by conducting the inference algorithm presented in Fig. 3-2.

CHAPTER 4
COGNITIVE ANALYSIS OF OBJECT PARTS

Inference of the hidden variables $(\hat{Z},\hat{X})$ can be viewed as building a forest of subtrees, each segmenting an image into arbitrary (not necessarily contiguous) regions, which we interpret as objects. Since each root determines a subtree whose leaf nodes form a detected object, we assign physical meaning to roots by assuming they represent whole objects. Moreover, each descendant of a root can be viewed as the root of another subtree, whose leaf nodes cover only a part of the object. Hence, we say that roots' descendants represent object parts at various scales.

Strategies for recognizing detected objects naturally arise from a particular interpretation of the tree/sub-tree structure. Below, we make a distinction between two such strategies. The analysis of image regions under the roots leads to the whole-object recognition strategy, while the analysis of image regions determined by roots' descendants constitutes the object-part recognition strategy. For both approaches, final recognition is conducted by majority voting over the MAP labels, $\hat{x}_i$, of the leaf nodes.¹
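For both strategies, the last step is a majority vote over leaf-node MAP labels. A minimal sketch of that vote, with names of our own choosing:

```python
from collections import Counter

def majority_vote(map_labels):
    """Recognize one detected region by majority voting over the MAP labels
    x_hat_i of its leaf nodes (Counter breaks ties by first occurrence)."""
    return Counter(map_labels).most_common(1)[0][0]

def whole_object_recognition(regions):
    """Whole-object strategy: one vote per root, over all leaves under it.
    `regions` maps a root id to the list of its leaf-node MAP labels."""
    return {root: majority_vote(labels) for root, labels in regions.items()}
```

The object-part strategy of Chapter 4 applies the same vote, but to the non-overlapping regions selected by the entropy ranking described below.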
The reason for analyzing smaller image regions than those under the roots stems from our hypothesis that the information of fine-scale object details may prove critical for the recognition of an object as a whole in scenes with occlusions. To reduce the complexity of interpreting all detected object sub-parts, we propose to analyze the significance of object components (i.e., irregular-tree nodes) with respect to recognition of objects as a whole.

Footnote 1: The literature offers various strategies that outperform majority-voting classification (e.g., multiscale Bayesian classification [29] and multiscale Viterbi classification [32]); however, they do not account explicitly for occlusions and, as such, do not significantly outperform majority voting for scenes with occluded objects.

4.1 Measuring Significance of Object Parts

We hypothesize that the significance of object parts with respect to object recognition depends on both local, innate object properties and global scene properties. While innate properties represent characteristic object features, which differentiate one object from another, global scene properties describe interdependencies of object parts in the overall image composition. It is necessary to account for both local and global cues, as the most conspicuous object component need not necessarily be the most significant for that object's recognition in the presence of alike objects.
The analysis of innate object properties is handled through inference of the irregular tree, where, for a given image, we compute $P(x_i\mid\hat{Z},Y)$, $\forall i\in V$, as explained in Chapters 2 and 3. To account for the influence of global scene properties, for each node $i$, we compute Shannon's entropy over the set of image classes, $M$, as
\[
(\forall i\in V)(\hat{z}_i\neq 0)\quad H_i = -\sum_{x_i\in M} P(x_i\mid\hat{Z},Y)\log P(x_i\mid\hat{Z},Y). \tag{4.1}
\]
Since node $i$ represents an object part, we define $H_i$ as a measure of significance of that object part. Note that a node with small entropy is characterized by a "peaky" distribution $P(x_i\mid\hat{Z},Y)$ with the maximum at, say, $x_i=k\in M$. This indicates that the error of classification will be small when $i$ is labeled as class $k$. Recall that during inference, the belief message of $i$ is propagated down the subtree in belief propagation [33], which is likely to render $i$'s descendants with small entropies as well. Thus, the classification error of the whole region of leaf nodes under $i$ is likely to be small, when compared to some other image region under, say, node $j$ such that $H_j>H_i$. Consequently, $i$ is more "significant" for recognition of class $k$ than node $j$. In brief, the most significant object part has the smallest entropy over all nodes in a given sub-tree $\mathcal{T}$:
\[
i^{\ast} = \arg\min_{i\in\mathcal{T}} H_i. \tag{4.2}
\]
In Figs. 4-1 and 4-2, we illustrate the most significant object part under each root, where the entropy is computed over seven and six image classes, shown in Figs. 4-1 (top) and 4-2 (top), respectively. The experiment is conducted as explained in Chapter 2, using the irregular tree with random node positions and observables at all levels (IT$_V$). Details on computing the observables $Y$ in this experiment are explained in Chapter 5. Note that for different scenes, different object parts are established as the most significant with respect to the entropy measure.

Figure 4-1: For each subtree of IT$_V$, representing an object in the 128x128 image, the node $i$ with the lowest entropy is found for $|M|=6+1=7$ possible image classes (top row). Bright pixels are descendants of $i$ at the leaf level and indicate the object part represented by $i$.
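The entropy measure of Eq. (4.1) and the selection rule of Eq. (4.2) can be sketched in a few lines; the node posteriors are assumed given, and the names are ours:

```python
import math

def node_entropy(posterior):
    """Shannon entropy of P(x_i | Z_hat, Y) over the class set M, Eq. (4.1);
    the convention 0*log(0) = 0 is used."""
    return -sum(p * math.log(p) for p in posterior if p > 0)

def most_significant(subtree_posteriors):
    """Eq. (4.2): the most significant part of a subtree is the node whose
    posterior has the smallest entropy.  Input maps node id -> posterior."""
    return min(subtree_posteriors, key=lambda i: node_entropy(subtree_posteriors[i]))
```

A node with a peaky posterior (e.g., [0.9, 0.05, 0.05]) thus outranks one with a uniform posterior, whose entropy is the maximal $\log|M|$.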
4.2 Combining Object-Part Recognition Results

Once the nodes are ranked with respect to the entropy measure, we are in a position to devise a criterion to optimally combine this information toward ultimate object recognition. Herewith, we propose a simple greedy algorithm, which, nonetheless, shows remarkable improvements in performance over the whole-object recognition approach.

Under each root, we first select the descendant node with the smallest entropy. Each selected node determines a subtree, whose leaf nodes form an object part. Then, we conduct majority voting over these selected image regions. In the second round, we select under each root the descendant node with the smallest entropy such that it does not belong to any of the subtrees selected in the first round. Now, these nodes determine new subtrees, whose leaf nodes form object parts that do not overlap with the image regions selected in the first round. Then, we conduct majority voting over the newly selected image regions. This procedure is repeated until we exhaustively cover all the pixels in the image. This stage-wise majority voting over non-overlapping image regions constitutes the final step in the object-part recognition strategy (see Fig. 1-3).

Figure 4-2: For each subtree of IT$_V$, representing an object in the 256x256 image, the node $i$ with the lowest entropy is found for $|M|=5+1=6$ possible image classes (top row). Bright pixels are descendants of $i$ at the leaf level and indicate the object part represented by $i$; the images represent the same scene viewed from three different angles; the most significant object parts differ over the various scenes.

CHAPTER 5
FEATURE EXTRACTION

In Chapters 2 and 3, we have introduced four architectures of the irregular tree, referred to as IT$_V$, IT$_{V^0}$, IQT$_V$, and IQT$_{V^0}$. To compute the observable (feature) random vectors $Y$ for these models, we account for both color and texture cues.
5.1 Texture

For the choice of texture-based features, we have considered several filtering, model-based, and statistical methods for texture feature extraction. Our conclusion complies with the comparative study of Randen and Husoy [66]: for problems with many textures with subtle spectral differences, as in the case of our complex classes, it is reasonable to assume that spectral decomposition by a filter bank yields consistently superior results over other texture-analysis methods. Our experimental results also suggest that it is crucial to analyze both local as well as regional properties of texture. As such, we employ the wavelet transform, due to its inherent representation of texture at different scales and locations.

5.1.1 Wavelet Transform

Wavelet atom functions, being well localized both in space and frequency, retrieve texture information quite successfully [67]. The conventional discrete wavelet transform (DWT) may be regarded as equivalent to filtering the input signal with a bank of bandpass filters, whose impulse responses are all given by scaled versions of a mother wavelet. The scaling factor between adjacent filters is 2:1, leading to octave bandwidths and center frequencies that are one octave apart. The octave-band DWT is most efficiently implemented by the dyadic wavelet decomposition tree of Mallat [68], where the wavelet coefficients of an image are obtained by convolving every row and column with the impulse responses of lowpass and highpass filters, as shown in Figure 5-1.

Figure 5-1: Two levels of the DWT of a two-dimensional signal.
Figure 5-2: The original image (left) and its two-scale dyadic DWT (right).

Practically, the coefficients of one scale are obtained by convolving every second row and column from the previous, finer scale. Thus, the filter output is a wavelet subimage that has four times fewer coefficients than the one at the
previous scale. The lowpass filter is denoted $H_0$ and the highpass filter $H_1$. The wavelet coefficients $W$ carry the index L for lowpass output and H for highpass output. Separable filtering of rows and columns produces four subimages at each level, which can be arranged as shown in Figure 5-2. The same figure also illustrates the directional selectivity of the DWT, because the $W_{LH}$, $W_{HL}$, and $W_{HH}$ bandpass subimages can select horizontal, vertical, and diagonal edges, respectively.

5.1.2 Wavelet Properties

The following properties of the DWT have made wavelet-based image processing very attractive in recent years [67, 30, 69]:
1. locality: each wavelet coefficient represents local image content in space and frequency, because wavelets are well localized simultaneously in space and frequency
2. multi-resolution: the DWT represents an image at different scales of resolution in the space domain (i.e., in the frequency domain); regions of analysis at one scale are divided up into four smaller regions at the next, finer scale (Fig. 5-2)
3. edge detector: edges of an image are represented by large wavelet coefficients at the corresponding locations
4. energy compression: wavelet coefficients are large only if edges are present within the support of the wavelet, which means that the majority of wavelet coefficients have small values
5. decorrelation: wavelet coefficients are approximately decorrelated, since the scaled and shifted wavelets form an orthonormal basis; dependencies among wavelet coefficients are predominantly local
6. clustering: if a particular wavelet coefficient is large/small, then the adjacent coefficients are very likely to also be large/small
7. persistence: large/small values of wavelet coefficients tend to propagate through scales
8. non-Gaussian marginal: wavelet coefficients have peaky and long-tailed marginal distributions; due to the energy-compression property, only a few wavelet coefficients have large values, therefore a Gaussian distribution for an individual coefficient is a poor statistical model

It is also important to introduce the shortcomings of the DWT. Discrete wavelet decompositions suffer from two main problems, which hamper their use for many applications, as follows [70]:
1. lack of shift invariance: small shifts in the input signal
can cause major variations in the energy distribution of the wavelet coefficients
2. poor directional selectivity: for some applications, horizontal, vertical, and diagonal selectivity is insufficient

When we analyze the Fourier spectrum of a signal, we expect the energy in each frequency bin to be invariant to any shifts of the input. Unfortunately, the DWT has a significant drawback: the energy distribution between the various wavelet scales depends critically on the position of key features of the input signal, whereas ideally the dependence should be on just the features themselves. Therefore, the real DWT is unlikely to give consistent results when used in texture analysis.

Figure 5-3: The Q-shift Dual-Tree CWT.

In the literature, there are several approaches proposed to overcome this problem (e.g., Discrete Wavelet Frames [67, 71]), all increasing the computational load with inevitable redundancy in the wavelet domain. In our opinion, the Complex Wavelet Transform (CWT) offers the best solution, providing additional advantages, described in the following subsection.

5.1.3 Complex Wavelet Transform

The structure of the CWT is the same as in Figure 5-1, except that the CWT filters have complex coefficients and generate complex output. The output sampling rates are unchanged from the DWT, but each wavelet coefficient contains a real and an imaginary part, thus a redundancy of 2:1 for one-dimensional signals is introduced. In our case, for two-dimensional signals, the redundancy becomes 4:1, because two adjacent quadrants of the spectrum are required to represent fully a real two-dimensional signal, adding an extra 2:1 factor. This is achieved by additional filtering with complex conjugates of either the row or column filters [70].
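The filter-bank view of the dyadic DWT in Section 5.1.1 (two of which make up the dual-tree CWT) can be sketched in a few lines. The sketch below uses the Haar pair as a stand-in for the filters discussed in the text, and all names are ours:

```python
import numpy as np

def haar_dwt2(img):
    """One level of the separable 2-D DWT of Fig. 5-1, sketched with the
    Haar pair (lowpass H0 = [1, 1]/sqrt(2), highpass H1 = [1, -1]/sqrt(2)).
    Rows are filtered and decimated, then columns, yielding the four
    subimages W_LL, W_LH, W_HL, W_HH of Fig. 5-2."""
    a = np.asarray(img, dtype=float)
    s = np.sqrt(2.0)
    lo = (a[:, 0::2] + a[:, 1::2]) / s     # lowpass + decimate along rows
    hi = (a[:, 0::2] - a[:, 1::2]) / s     # highpass + decimate along rows
    ll = (lo[0::2, :] + lo[1::2, :]) / s   # then the same along columns
    lh = (lo[0::2, :] - lo[1::2, :]) / s
    hl = (hi[0::2, :] + hi[1::2, :]) / s
    hh = (hi[0::2, :] - hi[1::2, :]) / s
    return ll, lh, hl, hh
```

Since the Haar basis is orthonormal, the four subimages together preserve the energy of the input, consistent with the energy-compression and decorrelation properties listed above.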
Despite its higher computational cost, we prefer the CWT over the DWT because of the CWT's following attractive properties. The CWT is shown to possess almost shift and rotational invariance, given suitably designed biorthogonal or orthogonal wavelet filters. We implement the Q-shift Dual-Tree CWT scheme, proposed by Kingsbury [72], as depicted in Figure 5-3. The figure shows the CWT of only a one-dimensional signal $x$, for clarity. The outputs of the trees $a$ and $b$ can be viewed as the real and imaginary parts of the complex wavelet coefficients, respectively. Thus, to compute the CWT, we implement two real DWTs (see Fig. 5-1), obtaining a wavelet frame with redundancy two. As for the DWT, here, lowpass and highpass filters are denoted with 0 and 1 in the index, respectively. Level 0 comprises the odd-length filters $H_{0a}(z)=H_{0b}(z)=H_{13}(z)$ (13 taps) and $H_{1a}(z)=H_{1b}(z)=H_{19}(z)$ (19 taps). Levels above level 0 consist of the even-length filters $H_{00a}(z)=z^{-1}H_6(z^{-1})$, $H_{01a}(z)=H_6(z)$, $H_{00b}(z)=H_6(z)$, $H_{01b}(z)=z^{-1}H_6(z^{-1})$, where the impulse responses of the filters $H_{13}$, $H_{19}$, and $H_6$ are given in Table 5-1.

Table 5-1: Coefficients of the filters used in the Q-shift DT CWT

  H_13 (symmetric)   H_19 (symmetric)   H_6
  -0.0017581         -0.0000706          0.03616384
   0                  0                  0
   0.0222656          0.0013419         -0.08832942
  -0.0468750         -0.0018834          0.23389032
  -0.0482422         -0.0071568          0.76027237
   0.2968750          0.0238560          0.58751830
   0.5554688          0.0556431          0
   0.2968750         -0.0516881         -0.11430184
  -0.0482422         -0.2997576          0
   ...                0.5594308
                     -0.2997576
                      ...

Figure 5-4: The CWT is strongly oriented at angles of +/-15, +/-45, and +/-75 degrees.

Aside from being shift invariant, the CWT is superior to the DWT in terms of directional selectivity, too. A two-dimensional CWT produces six bandpass subimages (analogous to the three subimages in the DWT) of complex coefficients at each level, which are strongly oriented at angles of +/-15, +/-45, and +/-75 degrees, as illustrated in Figure 5-4.

Another advantageous property of the CWT emerges in the presence of noise. The phase and magnitude of the complex wavelet coefficients collaborate in a nontrivial way to describe the data [70]. The phase encodes the coherent (in space and scale) structure of
an image, which is resilient to noise, and the magnitude captures the strength of local information, which can be very susceptible to noise corruption. Hence, the phase of complex wavelet coefficients might be used as a principal cue for image denoising. However, our experimental results have shown that phase is not a good feature choice for sky/ground modeling. Therefore, we consider only magnitudes.

In summary, for texture analysis in IT$_V$ and IQT$_V$, we choose the complex wavelet transform (CWT) applied to the intensity (gray-scale) image, due to its shift-invariant representation of texture at different scales, orientations, and locations.

5.1.4 Difference-of-Gaussian Texture Extraction

In IT$_{V^0}$ and IQT$_{V^0}$, observables are present only at the leaf level. Therefore, for these models, multiscale texture extraction is superfluous. Here, we compute the difference-of-Gaussian function convolved with the image as
\[
D(x,y,k,\sigma) = \big(G(x,y,k\sigma) - G(x,y,\sigma)\big) * I(x,y), \tag{5.1}
\]
where $x$ and $y$ represent pixel coordinates, $G(x,y,\sigma)=\exp\!\big(-(x^2+y^2)/2\sigma^2\big)/2\pi\sigma^2$, and $I(x,y)$ is the intensity image. In addition to its reduced computational complexity, as compared to the CWT, the function $D$ provides a close approximation to the scale-normalized Laplacian of Gaussian, $\sigma^2\nabla^2 G$, which has been shown to produce the most stable image features across scales when compared to a range of other possible image functions, such as the gradient and the Hessian [73, 74]. We compute $D(x,y,k,\sigma)$ for the three scales $k=\sqrt{2},\,2,\,\sqrt{8}$, with $\sigma=2$.

5.2 Color

The color information in a video signal is usually encoded in the RGB color space. For color features, in all models, we choose the generalized RGB color space: $r=R/(R+G+B)$ and $g=G/(R+G+B)$, which effectively normalizes variations in brightness. For IT$_V$ and IQT$_V$, the $Y$'s of higher-level nodes are computed as the mean of the $r$'s and $g$'s of their children nodes in the initial quad-tree structure. Each color observable is normalized to have zero mean and unit variance over the data set.
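A minimal, numpy-only sketch of the difference-of-Gaussian feature of Eq. (5.1), with zero-padded borders and function names of our own choosing:

```python
import numpy as np

def _gauss_blur(img, sigma):
    """Separable convolution with a truncated, normalized Gaussian kernel
    (zero padding at the borders)."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()
    rows = np.apply_along_axis(np.convolve, 1, img, k, mode='same')
    return np.apply_along_axis(np.convolve, 0, rows, k, mode='same')

def dog_features(intensity, sigma=2.0, ks=(2 ** 0.5, 2.0, 8 ** 0.5)):
    """Difference-of-Gaussian responses of Eq. (5.1), one per scale factor k:
    D(x, y, k, sigma) = (G(x, y, k*sigma) - G(x, y, sigma)) * I(x, y),
    computed here for the scales k = sqrt(2), 2, sqrt(8) and sigma = 2."""
    a = np.asarray(intensity, dtype=float)
    base = _gauss_blur(a, sigma)
    return [_gauss_blur(a, k * sigma) - base for k in ks]
```

On a constant image, the interior response is zero (both blurs leave a constant unchanged away from the borders), so only genuine intensity variation produces non-trivial features.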
In summary, the $y$'s are 8-dimensional vectors for IT$_V$ and IQT$_V$, and 5-dimensional vectors for IT$_{V^0}$ and IQT$_{V^0}$.

CHAPTER 6
EXPERIMENTS AND DISCUSSION

We report experiments on image segmentation and classification for six sets of images. Dataset I comprises fifty 64x64 simple-scene images with object appearances of the 20 distinct objects shown in Fig. 6-1. Samples of dataset I are given in Figs. 6-2, 6-3, and 6-4. Dataset II contains 120 128x128 complex-scene images with partially occluded object appearances of the same 20 distinct objects as in the dataset I images. Examples of dataset II are shown in Figs. 6-11, 6-12, and 6-15. Note that the objects appearing in datasets I and II are carefully chosen to test whether irregular trees are expressive enough to capture very small variations in the appearances of some classes (e.g., the two different types of cans in Fig. 6-1), as well as to encode large differences among some other classes (e.g., the wiry-featured robot and the books in Fig. 6-1).

Next, dataset III contains fifty 128x128 natural-scene images, samples of which are shown in Figs. 6-5 and 6-6.

For dataset IV we choose sixty 128x128 images from a database that is publicly available at the Computer Vision Home Page. Dataset IV contains a video sequence of two people approaching each other, who wear alike shirts but different pants, as illustrated in Fig. 6-16. The sequence is interesting because the most significant "object" parts for differentiating between the two persons (i.e., the pants) get occluded. Moreover, the images represent scenes with clutter, where recognition of partially occluded, similar-in-appearance people becomes harder. Together with the two persons, there are 12 possible image classes appearing in dataset IV, as depicted in Fig. 6-16a. Here, each image is treated separately, without making use of the fact that the background scene does not change in the video sequence.

Further, dataset V consists of sixty 256x256 images, typical samples of which are shown in Fig. 6-17b. The images in dataset V represent a video sequence of a complex scene, which is observed from different viewpoints by moving a camera horizontally clockwise. Together with the background, there are 6 possible image classes, as depicted in Fig. 6-17a.
Finally, dataset VI consists of sixty 256x256 natural-scene images, samples of which are shown in Fig. 6-18. The images in dataset VI represent a video sequence of a row of houses, which is observed from different viewpoints. The houses are very similar in appearance, so that the recognition task becomes very difficult when the details differentiating one house from another are occluded. There are 8 possible image classes: 4 different houses, sky, road, grass, and tree, as marked with different colors in Fig. 6-18.

All datasets are divided into training and test sets by random selection of images, such that 2/3 are used for training and 1/3 for testing. Ground truth for each image is determined through hand-labeling of pixels.

6.1 Unsupervised Image Segmentation Tests

We first report experiments on unsupervised image segmentation using IT$_{V^0}$ and IT$_V$. Irregular-tree based image segmentation is tested on datasets I and III, and conducted by the algorithm given in Fig. 2-4. Since in unsupervised settings the parameters of the model are not known, we initialize them as discussed in the initialization step of the learning algorithm in Section 2.5. After Bayesian estimation of the irregular tree, each node defines one image region composed of those leaf nodes (pixels) that are that node's descendants. The results presented in Figs. 6-2, 6-3, 6-4, 6-5, and 6-6 suggest that irregular trees are able to parse images into "meaningful" parts by assigning one subtree per "object" in the image. Moreover, from Figs. 6-2 and 6-3, we also observe that irregular trees inferred through SVA preserve structure for objects across images subject to translation, rotation, and scaling. In Fig. 6-2, note that the level-4 clustering for the larger object scale in Fig. 6-2 (top-right) corresponds to the level-3 clustering for the smaller object scale in Fig. 6-2 (bottom-center). In other words, as the object transitions through scales, the tree structure changes by eliminating the lowest-level layer, while the higher-order structure remains intact.
We also note that the estimated positions of the higher-level hidden variables in IT$_{V^0}$ and IT$_V$ are very close to the centers of mass of object parts, as well as of whole objects. We compute the error of an estimated root-node position $\hat{r}$ as the distance from the actual center of mass $r_{CM}$ of the hand-labeled object, $d_{err}=\|\hat{r}-r_{CM}\|$. Also, we compare our SVA inference algorithm with the variational approximation (VA)¹ proposed by Storkey and Williams [48]. The average $d_{err}$ values over the given test images for VA and SVA are reported in Table 6-1. We observe that the error significantly decreases as the image size increases, because in summing node positions over parent and children nodes, as in Eq. (2.16) and Eq. (2.17), more statistically significant information contributes to the position estimates. For example, $d^{III}_{err}=6.18$ for SVA is only 4.8% of the dataset-III image size, whereas $d^{I}_{err}=4.23$ for SVA is 6.6% of the dataset-I image size.

Footnote 1: Although the algorithm proposed by Storkey and Williams [48] is also a structured variational approximation, to differentiate that method from ours, we slightly abuse the notation.

Figure 6-1: The 20 image classes in the type I and II datasets.
Figure 6-2: Image segmentation using IT$_{V^0}$: (left) dataset I images; (center) pixel clusters with the same parent at level $\ell=3$; (right) pixel clusters with the same parent at level $\ell=4$; points mark the positions of the parent nodes. Irregular-tree structure is preserved through scales.
Figure 6-3: Image segmentation using IT$_{V^0}$: (top) dataset I images; (bottom) pixel clusters with the same parent at level 3. Irregular-tree structure is preserved over rotations.
Figure 6-4: Image segmentation by irregular trees learned using SVA: (a)-(c) IT$_{V^0}$ for dataset I images; all pixels labeled with the same color are descendants of a unique root.

In Table 6-2, we report the percentage of erroneously grouped pixels, and, in Table 6-3, we report the object-detection error, when compared to ground truth, averaged over each dataset. For estimating the object-detection error, the following instances are counted as
errors: (1) merging two distinct objects into one (i.e., failure to detect an object), and (2) segmenting an object into sub-regions that are not actual object parts. On the other hand, if an object is segmented into several "meaningful" sub-regions, verified by visual inspection, this type of error is not included. Overall, we observe that SVA outperforms VA for image segmentation using IT$_{V^0}$ and IT$_V$. Interestingly, the segmentation results for the IT$_V$ models are only slightly better than for the IT$_{V^0}$ models.

It should be emphasized that our experiments are carried out in an unsupervised setting and, as such, cannot be equitably evaluated against supervised object-recognition results reported in the literature. Take, for instance, the segmentation in Fig. 6-5d, where two boys dressed in white clothes (i.e., two similar-looking objects) are merged into one subtree. Given the absence of prior knowledge, the ground-truth segmentation for this image is arbitrary, and the resulting segmentation ambiguous; nevertheless, we still count it toward the object-detection error percentages in Table 6-3.

Our claim that nodes at different levels of irregular trees represent object parts at various scales is supported by the experimental evidence that the nodes segment the image into "meaningful" object sub-components and position themselves at the centers of mass of these sub-parts.

Figure 6-5: Image segmentation by irregular trees learned using SVA: (a) IT$_{V^0}$ for a dataset III image; (b)-(d) IT$_V$ for dataset III images; all pixels labeled with the same color are descendants of a unique root.
Figure 6-6: Image segmentation using IT$_V$: (a) a dataset III image; (b)-(d) pixel clusters with the same parent at levels $\ell=3,4,5$, respectively; white regions represent pixels already grouped by roots at the previous scale; points mark the positions of the parent nodes.

Table 6-1: Root-node distance error

             IT_{V^0}         IT_V
  dataset   VA      SVA      VA      SVA
  I         6.32    4.61     6.14    4.23
  III       9.15    6.87     8.99    6.18

Table 6-2: Pixel segmentation error

  dataset:            I       III
  IT_{V^0}   VA       7%      10%
             SVA      4%      9%
  IT_V       VA       7%      11%
             SVA      4%      7%

Table 6-3: Object detection error

  dataset:            I       III
  IT_{V^0}   VA       4%      13%
             SVA      3%      10%
  IT_V       VA       4%      10%
             SVA      2%      6%
6.2 Tests of Convergence

In this section, we report on the convergence properties of the inference algorithms for IT$_{V^0}$, IT$_V$, IQT$_{V^0}$, and IQT$_V$. First, we compare our SVA inference algorithm with the variational approximation (VA) [48]. In Fig. 6-7a-b, we illustrate the convergence rate of computing $P(Z,X,R^0\mid Y,R^0)\approx Q(Z,X,R^0)$ for SVA and VA, averaged over the given datasets. The numbers above the bars represent the mean number of iteration steps it takes for the algorithm to converge. We consider the algorithm converged when $|Q(Z,X,R^0;t+1)-Q(Z,X,R^0;t)|/Q(Z,X,R^0;t)<\varepsilon$ for $N$ consecutive iteration steps $t$, where $N=10$ and $\varepsilon=0.01$ (see Fig. 2-4, Step (11)). Overall, SVA converges in the fewest number of iterations. For example, the average number of iterations for SVA on dataset III is 25 and 23 for IT$_{V^0}$ and IT$_V$, respectively, which takes approximately 6 s and 5 s on a dual 2 GHz PowerPC G5. Here, the processing time also includes image-feature extraction.

For the same experiments, in Fig. 6-7c-d, we report the percentage increase in $\log Q(Z,X,R^0)$ computed using our SVA over $\log Q(Z,X,R^0)$ obtained by VA. We note that SVA results in larger approximate posteriors than VA. The larger $\log Q(Z,X,R^0)$ means that the assumed form of the approximate posterior distribution, $Q(Z,X,R^0)=Q(Z)Q(X\mid Z)Q(R^0\mid Z)$, more accurately represents the underlying stochastic processes in the image than VA's.

Now, we compare the convergence of the inference algorithm for IQT$_{V^0}$ with SVA and VA for IT$_{V^0}$. For simplicity, we refer to the inference algorithm for the model IQT$_{V^0}$ also as IQT$_{V^0}$, slightly abusing the notation. The parameters that control the convergence

Figure 6-7: Comparison of inference algorithms: (a)-(b) convergence rate averaged over the given datasets; (c)-(d) percentage increase in $\log Q(Z,X,R^0)$ computed in SVA over $\log Q(Z,X,R^0)$ computed in VA.
Figure 6-8: Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 128x128 dataset IV image in Fig.
6-16b; the SVA and VA inference algorithms are conducted for the IT$_{V^0}$ model.

Figure 6-9: Typical convergence rate of the inference algorithm for IT$_{V^0}$ on the 256x256 dataset V image in Fig. 6-17b; the SVA and VA inference algorithms are conducted for the IT$_{V^0}$ model.
Figure 6-10: Percentage increase in the log-likelihood $\log P(Y\mid X)$ of IQT$_{V^0}$ over $\log P(Y\mid X)$ of IT$_{V^0}$, after 500 and 200 iteration steps for datasets IV and V, respectively.

criterion for the inference algorithms of the three models are $N=10$ and $\varepsilon=0.01$. Figs. 6-8 and 6-9 illustrate typical examples of the convergence rate. We observe that the inference algorithm for IQT$_{V^0}$ converges slightly more slowly than SVA and VA for IT$_{V^0}$. The average number of iteration steps for IQT$_{V^0}$ is approximately 160 and 230, which takes 6 s and 17 s on a dual 2 GHz PowerPC G5, for datasets IV and V, respectively.

The bar chart in Fig. 6-10 shows the percentage $\frac{\log P_2-\log P_1}{|\log P_1|}$, where $P_1=P(Y\mid X)$ is the likelihood of IT$_{V^0}$, and $P_2=P(Y\mid X)$ that of IQT$_{V^0}$. We observe that $P(Y\mid X)$ of IQT$_{V^0}$, after the algorithm has converged, is larger than $P(Y\mid X)$ of IT$_{V^0}$. The larger likelihood means that the model structure and the inferred distributions more accurately represent the underlying stochastic processes in the image.

6.3 Image Classification Tests

We compare the classification performance of IT$_{V^0}$ with that of the following statistical models: (1) the Markov Random Field (MRF) [6], (2) the Discriminative Random Field (DRF) [25], and (3) the Tree-Structured Belief Network (TSBN) [33, 29]. These models are representatives of descriptive, discriminative, and fixed-structure generative models, respectively. Below, we briefly explain the models.
For MRFs, we assume that the label field $P(X)$ is a homogeneous and isotropic MRF, given by the generalized Ising model with only pairwise nonzero potentials [6]. The likelihoods $P(y_i\mid x_i)$ are assumed conditionally independent given the labels. Thus, the posterior energy function is given by
\[
U(X\mid Y) = -\sum_{i\in V^0}\log P(y_i\mid x_i) + \sum_{i\in V^0}\sum_{j\in\mathcal{N}_i} V_2(x_i,x_j),
\]
\[
V_2(x_i,x_j) = \begin{cases} -\beta_{\mathrm{MRF}}, & \text{if } x_i=x_j,\\ +\beta_{\mathrm{MRF}}, & \text{if } x_i\neq x_j,\end{cases}
\]
where $\mathcal{N}_i$ denotes the neighborhood of $i$, $P(y_i\mid x_i)$ is a $G$-component mixture of Gaussians given by Eq. (2.6), and $V_2$ is the interaction potential with parameter $\beta_{\mathrm{MRF}}$. Details on learning the model parameters, as well as on inference for a given image, can be found in Stan Li's book [6].

Next, the posterior energy function of the DRF is given by
\[
U(X\mid Y) = \sum_{i\in V^0} A_i(x_i,Y) + \sum_{i\in V^0}\sum_{j\in\mathcal{N}_i} I_{ij}(x_i,x_j,Y),
\]
where $A_i=\log\sigma(x_iW^Ty_i)$ and $I_{ij}=\beta_{\mathrm{DRF}}\big(Kx_ix_j+(1-K)(2\sigma(x_ix_jV^Ty_i)-1)\big)$ are the unary and pairwise potentials, respectively, and $\sigma(\cdot)$ is the logistic sigmoid. Since the above formulation deals only with binary classification (i.e., $x_i\in\{-1,1\}$), when estimating the parameters $\{W,V,\beta_{\mathrm{DRF}},K\}$ for an object, we treat that object as a positive example and all other objects as negative examples (the "one against all" strategy). For details on how to learn the model parameters and how to conduct inference for a given image, we refer the reader to the paper of Kumar and Hebert [25].

Further, TSBNs, or quad-trees, are defined to have the same number of nodes $V$ and levels $L$ as the irregular trees. For both IT$_{V^0}$ and TSBNs, we use the same image features. When we operate on wavelets, which are a multiscale image feature, we in fact propagate observables to higher levels; in this case, we refer to the counterpart of IT$_V$ as TSBN*. To learn the parameters of TSBN or TSBN*, and to perform inference on a given image, we use the algorithms thoroughly discussed by Laferte et al. [33].

Finally, irregular-tree based image classification is conducted by employing the inference algorithms in Fig. 2-4 for IT$_{V^0}$ and IT$_V$, and the inference algorithms in Fig.
3-2 for IQT^{V0} and IQT^{V}. Since image classification represents a supervised machine-learning problem, it is necessary first to learn model parameters on training images. For this purpose, we employ the learning algorithms discussed in Section 2.5 for IT^{V0} and IT^{V}, and the learning algorithms discussed in Section 3.3 for IQT^{V0} and IQT^{V}.

After inference of MRF, DRF, TSBN, and the irregular tree on a given image, for each model we conduct pixel labeling by using the MAP classifier. In Fig. 6-11, we illustrate an example of pixel labeling for a dataset-II image. Here, we say that an image region is correctly recognized as an object if the majority of MAP-classified pixel labels in that region are equal to the true labeling of the object. For estimating the object-recognition error, the following instances are counted as error: (1) merging two distinct objects into one, and (2) swapping the identity of objects. The object-recognition error over all objects in 40 test images in dataset II is summarized in Table 6-4. In each cell of Table 6-4, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of swapped-identity errors. For instance, for IT^{V0} the overall recognition error is 9.6%, of which 37% of instances were caused by swapped-identity errors. Moreover, Table 6-5 shows the average pixel-labeling error.

Table 6-4: Object recognition error

image type | MRF | DRF | TSBN | IT^{V0}
dataset II | 21.2% (67%) | 12.5% (83%) | 14.8% (72%) | 9.6% (37%)

Table 6-5: Pixel labeling error

image type | MRF | DRF | TSBN | IT^{V0}
dataset II | 15.8% | 12.3% | 16.1% | 9.9%

Next, we examine the receiver operating characteristic (ROC) of MRF, DRF, TSBN, and IT^{V0} for a two-class recognition problem. From the set of image classes given in Fig. 6-1, we choose "toy-snail" and "wavelets-book" as the two possible classes in the following set of experiments. The task is to label two-class-problem images containing "toy-snail" and "wavelets-book" objects, a typical example of which is shown in Fig. 6-12. Here, pixels labeled as "toy-snail" are considered true positives, while pixels labeled as "book" are considered true negatives. In Fig.
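The majority-vote region decision described above reduces to counting MAP labels inside a region. A minimal sketch with hypothetical label lists (not the dissertation's data):

```python
from collections import Counter

def recognize_region(pixel_labels, true_label):
    """A region counts as correctly recognized if the majority of its
    MAP-classified pixel labels equals the region's true object label."""
    majority, _ = Counter(pixel_labels).most_common(1)[0]
    return majority == true_label

region = ["snail", "snail", "book", "snail"]  # MAP labels inside one region
print(recognize_region(region, "snail"))  # -> True
print(recognize_region(region, "book"))   # -> False
```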
6-13, we plot ROC curves for the two-class problem, where we compare the performance of IT^{V0} with those of MRF, DRF, and TSBN. From Fig. 6-13, we observe that image classification with IT^{V0} is the most accurate, since its ROC curve is the closest to the left-hand and top borders of the ROC space, as compared to the ROC curves of the other models. Further, in Fig. 6-14, we plot ROC curves for the same two-class problem, where we compare the performance of IT^{V} with those of IT^{V0}, TSBN, and TSBN*. From Fig. 6-14, we observe that image classification with IT^{V} is the most accurate, and that both IT^{V0} and IT^{V} outperform their fixed-structure counterparts TSBN and TSBN*.

From the results reported in Tables 6-4 and 6-5, as well as from Figs. 6-13 and 6-14, we note that irregular trees outperform the other three models. However, the recognition performance of all the models suffers substantially when an image contains occlusions. While for some applications the literature reports vision systems with impressively small classification errors (e.g., 2.5% hand-written digit recognition error [75]), in the case of

Figure 6-11: Comparison of classification results for various statistical models: (a) 256×256 image; (b) MRF; (c) DRF; (d) TSBN; (e) IT^{V0}. Pixels are labeled with a color specific for each object; non-colored pixels are classified as background.

Figure 6-12: MAP pixel labeling using different statistical models: (a) 256×256 image; (b) MRF; (c) DRF; (d) TSBN; (e) IT^{V0}.

Figure 6-13: ROC curves for the image in Fig. 6-12a with IT^{V0}, TSBN, DRF, and MRF.
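ROC curves like those in Figs. 6-13 and 6-14 are traced by sweeping a decision threshold over the per-pixel posterior of the positive class. A minimal sketch with made-up scores (not the models' actual outputs):

```python
def roc_points(scores, positives):
    """Return (false-positive rate, true-positive rate) pairs obtained by
    thresholding posterior scores for the positive class ("toy-snail")."""
    pts = []
    p = sum(positives)            # number of true-positive pixels
    n = len(positives) - p        # number of true-negative pixels
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, positives) if s >= t and y)
        fp = sum(1 for s, y in zip(scores, positives) if s >= t and not y)
        pts.append((fp / n, tp / p))
    return pts

scores = [0.9, 0.8, 0.4, 0.3]          # posterior of "toy-snail" per pixel
positives = [True, True, False, True]  # ground truth
print(roc_points(scores, positives))
```

A curve hugging the left and top borders of the ROC space, as IT^{V0}'s does, corresponds to high true-positive rates at low false-positive rates.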
complex scenes this error is much higher [76, 77, 11, 5, 4]. To some extent, our results could have been improved had we employed more discriminative image features and/or more sophisticated classification algorithms than the majority rule. However, none of these would alleviate the fundamental problem of "traditional" recognition approaches: the lack of explicit analysis of visible object parts. Thus, the poor classification performance of MRF, DRF, and TSBN, reported in Tables 6-4 and 6-5, can be interpreted as follows. Accounting for only pairwise potentials between adjacent nodes in MRF and DRF is not sufficient to analyze complex configurations of objects in the scene. Also, the analysis of fixed-size pixel neighborhoods at various scales in TSBN leads to "blocky" estimates, and consequently to poor classification performance. Therefore, we hypothesize that the main reason why irregular trees outperform the other models is their capability to represent object details at various scales, which in turn provides for explicit analysis of visible object parts. In other words, we speculate that in the face of the occlusion problem, recognition of object parts is critical and should condition recognition of the object as a whole.

Figure 6-14: ROC curves for the image in Fig. 6-12a with IT^{V}, IT^{V0}, TSBN, and TSBN*.

To support our hypothesis, instead of applying more sophisticated image-feature extraction tools and better classification procedures than the majority vote, we introduce a more radical change to our recognition strategy.

6.4 Object-Part Recognition Strategy

Recall from Section 6.1 that irregular trees are capable of capturing component-subcomponent structures at various scales, such that root nodes represent the center of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. To utilize this convenient property, we conduct the object-part recognition strategy presented in Section 4.2.

We compare the performance of the whole-object and part-object recognition strategies.
The whole-object approach can be viewed as a benchmark strategy, in the sense that a majority of existing vision systems do not explicitly analyze visible object parts at various scales. In these systems, once the object is detected, the whole image region is identified through MAP classification, as is done in the previous section.

In Fig. 6-15, we present classification results for IT^{V0}, using the whole-object and object-part recognition strategies on dataset-II images. In Fig. 6-15a, both strategies succeed in recognizing two different "Fluke" voltage-measuring instruments (see Fig. 6-1). However, in Fig. 6-15b, the whole-object recognition strategy fails to make a distinction between the objects, since the part that most differentiates one object from another is occluded, making it a difficult case for recognition even for a human interpreter. In the other two images, we observe that the object-part recognition strategy is more successful than the whole-object approach.

Figure 6-15: Comparison of two recognition strategies on dataset II for IT^{V0}: (top) 128×128 challenging images containing objects that are very similar in appearance; (middle) classification using the whole-object recognition strategy; (bottom) classification using the part-object recognition strategy; each recognized object in the image is marked with a different color.

For estimating the object-recognition error of IT^{V0} on dataset-II images, the following instances are counted as error: (1) merging two distinct objects into one (i.e., object not detected), and (2) swapping the identity of objects (i.e., object correctly detected but misclassified as one of the objects in the class of known objects). The recognition error averaged over all objects in 40 test images in dataset II is only 5.8%, an improvement of nearly 40% over the reported error of 9.6% in the previous section.
We also recorded the object-recognition error of IQT^{V0} over all objects in 20 test images of datasets IV, V, and VI, respectively. The results are summarized in Table 6-6. In each cell of Table 6-6, the first number indicates the overall recognition error, while the number in parentheses indicates the ratio of merged-object errors. For instance, for dataset V and the whole-object strategy, the overall recognition error is 21.2%, of which slightly more than half (56%) were caused by merged-object errors. The results in Table 6-6 clearly demonstrate significantly improved recognition performance, as well as a reduction in the false-alarm and swapped-identity types of error for the object-part strategy, as compared with the whole-object approach. Also, Table 6-7 shows that the object-part strategy reduces the pixel-labeling error.

Table 6-6: Object recognition error for IQT^{V0}

strategy | IV | V | VI
whole-object | 11.6% (85%) | 21.2% (56%) | 26.3% (44%)
object-part | 3.3% (100%) | 8.7% (92%) | 12.5% (81%)

Table 6-7: Pixel labeling error for IQT^{V0}

strategy | IV | V | VI
whole-object | 9.6% | 17.9% | 16.3%
object-part | 4.3% | 6.7% | 8.3%

These results support our hypothesis that for successful recognition of partially occluded objects it is critical to analyze visible object details at various scales.

Figure 6-16: Recognition results over dataset IV for IQT^{V0}: (a) cluttered scene containing 10 objects, each of which is marked with a different color; images of two alike persons; (b) dataset II: video sequence of two alike people walking in a cluttered scene; (c) classification using the whole-object recognition strategy; (d) classification using the part-object recognition strategy.

(a) 6 image classes: 5 similar objects and background. (b) 4 images of the same scene viewed from 4 different angles, with objects shown in (a). (c) The most significant object parts differ over various scenes; the majority-voting classification result is indicated by the colored regions. (d) Classification using the whole-object recognition strategy. (e) Classification using the object-part recognition strategy.
Figure 6-17: Recognition results over dataset V for IQT^{V0}.

Figure 6-18: Classification using the part-object recognition strategy; recognition results for dataset VI.

CHAPTER 7
CONCLUSION

7.1 Summary of Contributions

In this dissertation, we have addressed detection and recognition of partially occluded, alike objects in complex scenes, a problem that has eluded, as of yet, a satisfactory solution. The experiments reported herein show that "traditional" approaches to object recognition, where objects are first detected and then identified as a whole, yield poor performance in complex settings. Therefore, we speculate that a careful analysis of visible, fine-scale object details may prove critical for recognition. However, in general, the analysis of multiple sub-parts of multiple objects gives rise to prohibitive computational complexity. To overcome this problem, we have proposed to model images with irregular trees, which provide a suitable framework for developing novel object-recognition strategies, in particular, object-part recognition. Here, object details at various scales are first detected through tree-structure estimation; then, these object parts are analyzed as to which component of an object is the most significant for recognition of that object; finally, information on the cognitive significance of each object part is combined toward the ultimate image classification. Empirical evidence demonstrates that this explicit treatment of object parts results in improved recognition performance, as compared to strategies where object components are not explicitly accounted for.
In Chapter 2, we have proposed two architectures within the irregular-tree framework, referred to as IT^{V0} and IT^{V}. For each architecture, we have developed an inference algorithm. Gibbs sampling has been shown to be successful at finding trees that have high posterior probability, however, at a great computational price, which renders the algorithm impractical. Therefore, we have proposed Structured Variational Approximation (SVA) for inference of IT^{V0} and IT^{V}, which relaxes poorly justified independence assumptions in prior work. We have shown that SVA converges to larger posterior distributions, an order of magnitude faster than competing algorithms. We have also demonstrated that IT^{V0} and IT^{V} overcome the blocky segmentation problem of TSBNs, and that they possess a certain invariance to translation, rotation, and scaling transformations.

In Chapter 3, we have proposed another two architectures, referred to as IQT^{V0} and IQT^{V}. In these models, we have constrained the node positions to be fixed, such that only connections can control the irregular tree structure. At the same time, we have made the distribution of connections dependent on image classes. This formulation has allowed us to avoid variational-approximation inference, and to develop an exact inference algorithm for IQT^{V0} and IQT^{V}. We have shown that it converges slower than SVA; however, it yields larger likelihoods, which in general means that IQT^{V0} represents the underlying stochastic processes in the image more accurately than IT^{V0}.

In experiments on unsupervised image segmentation, we have shown the capability of irregular trees to capture important component-subcomponent structures in images. Empirical evidence demonstrates that root nodes represent the center of mass of distinct objects, while children nodes down the subtrees represent object parts. As such, irregular trees provide a natural and seamless framework for identifying candidate image regions as object parts, requiring no additional training for such identification. In Chapter 4, we have proposed to explicitly analyze the significance of object parts (i.e., tree nodes) with respect to recognition of an object as a whole. We have defined entropy
as a measure of such cognitive significance. To avoid the costly approach of analyzing every detected object part, we have devised a greedy algorithm, referred to as object-part recognition. The comparison of whole-object and part-object approaches indicates that the latter method generates significantly better recognition performance and reduced pixel-labeling error.

Ultimately, what allows us to overcome obstacles in analyzing scenes with occlusions in a computationally efficient and intuitively appealing manner is the generative-model framework we have proposed. This framework provides an explicit representation of objects and their sub-parts at various scales, which, in turn, constitutes the key factor for improved interpretation of scenes with partially occluded, alike objects.

7.2 Opportunities for Future Work

The analysis in the previous chapters suggests the following opportunities for future work. One promising thrust of research would be to investigate relationships among descriptive, generative, and discriminative statistical models. We anticipate that these studies will lead to a greater integration of the modeling paradigms, yielding richer and more advanced classes of models. Here, the most critical issue is that of computationally manageable inference. With recent advances in the area of belief propagation (e.g., Generalized Belief Propagation [78]), new algorithms may make it possible to solve real-world problems that were previously computationally intractable.

Within the irregular-tree framework, it is possible to continue further investigation toward replacing the current discrete-valued node variables with real-valued ones. Thereby, a real-valued version of the irregular tree can be specified. Gaussians could be used as the probability distribution governing the continuous random variables represented by nodes, due to their tractable properties. Such a model could then operate directly on real-valued pixel data, improving state-of-the-art techniques for solving various image-processing problems, including superresolution, image enhancement, and compression.
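The Shannon entropy recalled above, Chapter 4's measure of a part's cognitive significance, is computed from the part's label posterior. A sketch with hypothetical posteriors, where a peaked posterior (low entropy) marks a more discriminative part:

```python
import math

def shannon_entropy(posterior):
    """H = -sum_k p_k log2 p_k over an object part's label posterior."""
    return -sum(p * math.log2(p) for p in posterior if p > 0)

discriminative_part = [0.9, 0.05, 0.05]   # nearly certain -> low entropy
ambiguous_part = [1/3, 1/3, 1/3]          # uninformative -> maximal entropy
print(shannon_entropy(discriminative_part))
print(shannon_entropy(ambiguous_part))    # maximal: log2(3), about 1.585
```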
Further, with respect to the measure of significance of irregular-tree nodes, one can pursue investigation of more complex information-theoretic concepts than Shannon's entropy. For example, we anticipate that joint entropy and mutual information may yield a more efficient cognitive analysis, which in turn could eliminate the need for the greedy algorithm discussed in Section 4.2.

The analysis of object parts can be interpreted as integration of information from multiple complementary and/or competitive sensors, each of which has only limited accuracy. As such, further research could be conducted on formulating the optimal strategy for combining the pieces of information on object parts toward ultimate object recognition. We anticipate that algorithms such as adaptive boosting (AdaBoost) [79] and the Support Vector Machine [80] may prove useful for this purpose.

Another promising research topic is to incorporate available prior knowledge into the proposed Bayesian estimation framework, where we have assumed that all classification errors are equally costly. However, in many applications, some errors are more serious than others. Cost-sensitive learning methods are needed to address this problem [81].

On a broader scale, the research reported in this dissertation can be viewed as solving a more general machine learning problem, with experimental validation on images as data.
This problem concerns supervised learning from examples, where the goal is to learn a function X = f(Y) from N training examples of the form {(Y_n, f(Y_n))}_{n=1}^{N}. Here, X_n and Y_n contain sub-components, the meaning of which differs for various applications. For example, in computer vision, each Y_n might be a vector of image pixel values, and each X_n might be a partition of that image into segments and an assignment of labels to each segment. Most importantly, the components of Y_n form a sequence (e.g., a sequence on the 2D image lattice). Therefore, learning a classifier function X = f(Y) represents the sequential supervised learning problem [82]. Thus, in this dissertation, we have addressed sequential supervised learning, the solutions of which can be readily applied to a wide range of problems beyond computer vision, such as, for example, speech processing, where the components of Y form a sequence in time.

APPENDIX A
DERIVATION OF VARIATIONAL APPROXIMATION

Preliminaries. Computation of KL(Q||P), given by Eq. (2.12), is intractable, because it depends on P(Z, X, R^0 | Y, R_0). Note, though, that Q(Z, X, R^0) does not depend on P(Y | R_0) and P(R_0). Consequently, by subtracting log P(Y | R_0) and log P(R_0) from KL(Q||P), we obtain a tractable criterion J(Q, P), whose minimization with respect to Q(Z, X, R^0) yields the same solution as minimization of KL(Q||P):

  J(Q,P) := KL(Q||P) - log P(Y|R_0) - log P(R_0)
          = ∫ dR^0 Σ_{Z,X} Q(Z,X,R^0) log [ Q(Z,X,R^0) / P(Z,X,R,Y) ].   (A.1)

J(Q,P) is known alternatively as Helmholtz free energy, Gibbs free energy, or free energy [59]. By minimizing J(Q,P), we seek to compute the parameters of the approximate distributions Q(Z), Q(X|Z), and Q(R^0|Z). It is convenient, first, to reformulate Eq. (A.1) as J(Q,P) = L_Z + L_X + L_R. We define the auxiliary terms

  L_Z := Σ_Z Q(Z) log [ Q(Z) / P(Z) ],
  L_X := Σ_{Z,X} Q(Z) Q(X|Z) log [ Q(X|Z) / ( P(X|Z) P(Y|X,Θ) ) ],
  L_R := ∫ dR^0 Σ_Z Q(Z) Q(R^0|Z) log [ Q(R^0|Z) / P(R|Z) ].

To derive expressions for L_Z, L_X, and L_R, we first observe:

  <z_ij> = ζ_ij,  <x_i^k> = m_i^k,  <x_i^k x_j^l> = Q_ij^kl m_j^l
  ⇒  m_i^k = Σ_{j∈V} ζ_ij Σ_{l∈M} Q_ij^kl m_j^l,  ∀i∈V, ∀k∈M,   (A.2)

where <·> denotes expectation with respect to Q(Z, X, R^0). Consequently, from Eqs. (2.1), (2.9), and (A.2), we have

  L_Z = Σ_{i,j∈V} ζ_ij log [ ζ_ij / r_ij ].   (A.3)

Next, from Eqs. (2.4), (2.10), and (A.2), we derive

  L_X = Σ_{i,j∈V} Σ_{k,l∈M} ζ_ij Q_ij^kl m_j^l log [ Q_ij^kl / P_ij^kl ]
        - Σ_{i∈V} Σ_{k∈M} m_i^k log P(y(i) | x_i^k, θ(i)).   (A.4)

Note that for DT^{V0}, V in the second term is substituted with V^0. Finally, from Eqs. (2.3), (2.11), and (A.2), we get

  L_R = (1/2) Σ_{i,j∈V^0} ζ_ij [ log ( |Σ_ij| / |Ω_ij| ) - Tr{ Ω_ij^{-1} Ω_ij }
        + Tr{ Σ_ij^{-1} <(r_i - r_j - d_ij)(r_i - r_j - d_ij)^T> } ].   (A.5)

Let us now consider the expectation in the last term. Writing r_i - r_j - d_ij = (r_i - μ_ij) + (μ_ij - μ_jp - d_ij) + (μ_jp - r_j) for a child-parent-grandparent triad i-j-p, and taking expectations term by term, we obtain

  <(r_i - r_j - d_ij)(r_i - r_j - d_ij)^T> = Ω_ij + Σ_{p∈V^0} ζ_jp ( 2Λ_ijp + Ω_jp + M_ijp ),   (A.6)

where the auxiliary matrices are Λ_ijp := <(r_i - μ_ij)(μ_jp - r_j)^T> and M_ijp := (μ_ij - μ_jp - d_ij)(μ_ij - μ_jp - d_ij)^T. It follows from Eqs. (A.5) and (A.6) that

  L_R = (1/2) Σ_{i,j∈V^0} ζ_ij ( log ( |Σ_ij| / |Ω_ij| ) - 2 + Tr{ Σ_ij^{-1} Ω_ij }
        + Σ_{p∈V^0} ζ_jp Tr{ Σ_ij^{-1} ( 2Λ_ijp + Ω_jp + M_ijp ) } ).   (A.7)

In Eq. (A.7), the last expression left to compute is Tr{ Σ_ij^{-1} Λ_ijp }. For this purpose, we apply the Cauchy-Schwartz inequality as follows:

  Tr{ Σ_ij^{-1} Λ_ijp } = Tr{ Σ_ij^{-1/2} <(r_i - μ_ij)(μ_jp - r_j)^T> Σ_ij^{-1/2} }
                        ≤ Tr{ Σ_ij^{-1} Ω_ij }^{1/2} Tr{ Σ_ij^{-1} Ω_jp }^{1/2},   (A.8)

where we used the fact that the Σ's and Ω's are diagonal matrices. Although the Cauchy-Schwartz inequality in general does not yield a tight upper bound, in our case it appears reasonable to assume that the variables r_i and r_j (i.e., positions of object parts at different scales) are uncorrelated. Substituting Eq. (A.8) into Eq. (A.7), we finally derive the upper bound for L_R as

  L_R ≤ (1/2) Σ_{i,j∈V^0} ζ_ij [ log ( |Σ_ij| / |Ω_ij| ) - 2 + Tr{ Σ_ij^{-1} Ω_ij }
        + Σ_{p∈V^0} ζ_jp Tr{ Σ_ij^{-1} ( Ω_jp + M_ijp ) }
        + 2 Σ_{p∈V^0} ζ_jp Tr{ Σ_ij^{-1} Ω_ij }^{1/2} Tr{ Σ_ij^{-1} Ω_jp }^{1/2} ].   (A.9)

Optimization of Q(X|Z). Q(X|Z) is fully characterized by the parameters Q_ij^kl. From the definition of L_X, we have ∂J(Q,P)/∂Q_ij^kl = ∂L_X/∂Q_ij^kl. Due to the parent-child dependencies in Eq. (A.2), it is necessary to iteratively differentiate L_X with respect to Q_ij^kl down the subtree of node i. For this purpose, we introduce three auxiliary terms, F_ij, G_i, and λ_i^k, which facilitate computation:

  F_ij := Σ_{k,l∈M} ζ_ij Q_ij^kl m_j^l log [ Q_ij^kl / P_ij^kl ],
  G_i := Σ_{d,c∈d(i)} F_dc - { Σ_{k∈M} m_i^k log P(y(i) | x_i^k, θ(i)) }_{V^0},
  λ_i^k := exp( -∂G_i/∂m_i^k ),
  ⇒  ∂L_X/∂Q_ij^kl = ∂F_ij/∂Q_ij^kl + (∂G_i/∂m_i^k)(∂m_i^k/∂Q_ij^kl),   (A.10)

where {·}_{V^0} denotes that the term is included in the expression for G_i only if i is a leaf node, for DT^{V0}. For DT^{V}, the term in braces is always included. This allows us to derive update equations for both models simultaneously. After finding the derivatives ∂F_ij/∂Q_ij^kl = ζ_ij m_j^l ( log [ Q_ij^kl / P_ij^kl ] + 1 ) and ∂m_i^k/∂Q_ij^kl = ζ_ij m_j^l, and substituting these expressions in Eq. (A.10), we arrive at

  ∂L_X/∂Q_ij^kl = ζ_ij m_j^l ( log [ Q_ij^kl / P_ij^kl ] + 1 - log λ_i^k ).   (A.11)

Finally, optimizing Eq. (A.11) with the Lagrange multiplier that accounts for the constraint Σ_{k∈M} Q_ij^kl = 1 yields the desired update equation, Q_ij^kl ∝ P_ij^kl λ_i^k, introduced in Eq. (2.13). To compute λ_i^k, we first find

  ∂G_i/∂m_i^k = Σ_{c∈c(i)} [ ∂F_ci/∂m_i^k + Σ_{a∈M} (∂G_c/∂m_c^a)(∂m_c^a/∂m_i^k) ]
                - { log P(y(i) | x_i^k, θ(i)) }_{V^0}
              = Σ_{c∈c(i)} Σ_{a∈M} ζ_ci Q_ci^ak ( log [ Q_ci^ak / P_ci^ak ] - log λ_c^a )
                - { log P(y(i) | x_i^k, θ(i)) }_{V^0},   (A.12)

and then substitute Q_ij^kl, given by Eq. (2.13), into Eq. (A.12), which results in λ_i^k = { P(y(i) | x_i^k, θ(i)) }_{V^0} Π_{c∈V} [ Σ_{a∈M} P_ci^ak λ_c^a ]^{ζ_ci}, as introduced in Eq. (2.14).
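The closed-form update Q_ij^kl ∝ P_ij^kl λ_i^k amounts to reweighting each column of the prior table by the λ messages and renormalizing over the child label k. A minimal sketch of one such update with made-up tables (not learned CPTs from the dissertation):

```python
def update_Q(P_ij, lam_i):
    """Q_ij^{kl} proportional to P_ij^{kl} * lambda_i^k, normalized over the
    child label k so that each column l satisfies sum_k Q_ij^{kl} = 1."""
    M = len(lam_i)
    Q = [[P_ij[k][l] * lam_i[k] for l in range(M)] for k in range(M)]
    for l in range(M):
        z = sum(Q[k][l] for k in range(M))  # Lagrange-multiplier normalization
        for k in range(M):
            Q[k][l] /= z
    return Q

P_ij = [[0.7, 0.2], [0.3, 0.8]]  # hypothetical prior child-given-parent table
lam_i = [2.0, 1.0]               # hypothetical lambda messages from i's subtree
Q = update_Q(P_ij, lam_i)
print(Q)  # each column sums to 1
```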
Optimization of Q(R^0|Z). Q(R^0|Z) is fully characterized by the parameters μ_ij and Ω_ij. From the definition of L_R, we observe that ∂J(Q)/∂Ω_ij = ∂L_R/∂Ω_ij and ∂J(Q)/∂μ_ij = ∂L_R/∂μ_ij. Since the Ω's are positive definite, from Eq. (A.9) it follows that

  ∂L_R/∂Ω_ij = 0.5 ζ_ij [ -Tr{ Ω_ij^{-1} } + Tr{ Σ_ij^{-1} } + Σ_{c∈V^0} ζ_ci Tr{ Σ_ci^{-1} }
        + Σ_{p∈V^0} ζ_jp Tr{ Σ_ij^{-1} } Tr{ Σ_ij^{-1} Ω_ij }^{-1/2} Tr{ Σ_ij^{-1} Ω_jp }^{1/2}
        + Σ_{c∈V^0} ζ_ci Tr{ Σ_ci^{-1} } Tr{ Σ_ci^{-1} Ω_ij }^{-1/2} Tr{ Σ_ci^{-1} Ω_ci }^{1/2} ].   (A.13)

From ∂L_R/∂Ω_ij = 0, it is straightforward to derive the update equation for Ω_ij given by Eq. (2.17).

Next, to optimize the μ_ij parameters, from Eq. (A.9) we compute

  ∂L_R/∂μ_ij = ∂/∂μ_ij { (1/2) Σ_{i,j,p∈V^0} ζ_ij ζ_jp (μ_ij - μ_jp - d_ij)^T Σ_ij^{-1} (μ_ij - μ_jp - d_ij) }
             = Σ_{c,p∈V^0} [ ζ_ij ζ_jp Σ_ij^{-1} (μ_ij - μ_jp - d_ij) - ζ_ci ζ_ij Σ_ci^{-1} (μ_ci - μ_ij - d_ci) ].   (A.14)

Then, from ∂L_R/∂μ_ij = 0, it is straightforward to compute the update equation for μ_ij given by Eq. (2.16).

Optimization of Q(Z). Q(Z) is fully characterized by the parameters ζ_ij. From the definitions of L_Z, L_X, and L_R, we see that ∂J(Q)/∂ζ_ij = ∂(L_X + L_R + L_Z)/∂ζ_ij. Similar to the optimization of Q_ij^kl, we need to iteratively differentiate L_X as follows:

  ∂L_X/∂ζ_ij = ∂F_ij/∂ζ_ij + Σ_{k∈M} (∂G_i/∂m_i^k)(∂m_i^k/∂ζ_ij),   (A.15)

where F_ij and G_i are defined as in Eq. (A.10). Substituting the derivatives ∂G_i/∂m_i^k = -log λ_i^k, ∂F_ij/∂ζ_ij = Σ_{k,l∈M} Q_ij^kl m_j^l log [ Q_ij^kl / P_ij^kl ], and ∂m_i^k/∂ζ_ij = Σ_{l∈M} Q_ij^kl m_j^l into Eq. (A.15), we obtain

  ∂L_X/∂ζ_ij = Σ_{k,l∈M} Q_ij^kl m_j^l ( log [ Q_ij^kl / P_ij^kl ] - log λ_i^k )
             = - Σ_{k,l∈M} Q_ij^kl m_j^l log Σ_{a∈M} P_ij^al λ_i^a =: -A_ij.   (A.16)

Next, we differentiate L_R, given by Eq. (A.9), with respect to ζ_ij:

  ∂L_R/∂ζ_ij = (1/2) log ( |Σ_ij| / |Ω_ij| ) - 1 + (1/2) Tr{ Σ_ij^{-1} Ω_ij }
        + (1/2) Σ_{p∈V^0} ζ_jp [ Tr{ Σ_ij^{-1} ( Ω_jp + M_ijp ) } + 2 Tr{ Σ_ij^{-1} Ω_ij }^{1/2} Tr{ Σ_ij^{-1} Ω_jp }^{1/2} ]
        + (1/2) Σ_{c∈V^0} ζ_ci [ Tr{ Σ_ci^{-1} ( Ω_ij + M_cij ) } + 2 Tr{ Σ_ci^{-1} Ω_ci }^{1/2} Tr{ Σ_ci^{-1} Ω_ij }^{1/2} ]   (A.17)
        =: B_ij - 1,   (A.18)

where the indexes c, j, and p denote children, parents, and grandparents of node i, respectively.
Further, from Eq. (A.3), we get

  ∂L_Z/∂ζ_ij = 1 + log ( ζ_ij / r_ij ).   (A.19)

Finally, substituting Eqs. (A.16), (A.18), and (A.19) into ∂J(Q)/∂ζ_ij = 0 and adding the Lagrange multiplier to account for the constraint Σ_{j∈V^0} ζ_ij = 1, we solve for the update equation of ζ_ij given by Eq. (2.18).

APPENDIX B
INFERENCE ON THE FIXED-STRUCTURE TREE

The inference algorithm for Maximum Posterior Marginal (MPM) estimation on the quad-tree is known to alleviate implementation issues related to underflow numerical error [33]. The whole procedure is summarized in Fig. B-1. The algorithm assumes that the tree structure is fixed and known. Therefore, in Fig. B-1, we simplify notation as P(x_i | Z, Y) → P(x_i | Y) and P(x_i | x_j, Z) → P(x_i | x_j). Also, we denote with c(i) the children of i, and with d(i) the set of all descendants down the tree of node i, including i itself. Thus, Y_{d(i)} denotes the set of all observables down the subtree whose root is i. Also, for computing P(x_i | Y_{d(i)}) in the bottom-up pass, ∝ means that equality holds up to a multiplicative constant that does not depend on x_i.

Two-pass MPM estimation on the tree

Preliminary downward pass: ∀i ∈ V^{L-1}, V^{L-2}, ..., V^0:
  P(x_i) = Σ_{x_j} P(x_i | x_j) P(x_j).

Bottom-up pass:
  Initialize leaf nodes: ∀i ∈ V^0:
    P(x_i | y_i) ∝ P(y_i | x_i) P(x_i),
    P(x_i, x_j | y_i) = P(x_i | x_j) P(x_j) P(x_i | y_i) / P(x_i).
  Compute upward: ∀i ∈ V^1, V^2, ..., V^L:
    P(x_i | Y_{d(i)}) ∝ P(x_i) Π_{c∈c(i)} Σ_{x_c} P(x_c | Y_{d(c)}) P(x_c | x_i) / P(x_c),
    P(x_i, x_j | Y_{d(i)}) = P(x_i | x_j) P(x_j) P(x_i | Y_{d(i)}) / P(x_i).

Top-down pass:
  Initialize the root: i ∈ V^L:
    P(x_i | Y) = P(x_i | Y_{d(i)}),
    x̂_i = argmax_{x_i} P(x_i | Y).
  Compute downward: ∀i ∈ V^{L-1}, V^{L-2}, ..., V^0:
    P(x_i | Y) = Σ_{x_j} [ P(x_i, x_j | Y_{d(i)}) / Σ_{x_i} P(x_i, x_j | Y_{d(i)}) ] P(x_j | Y),
    x̂_i = argmax_{x_i} P(x_i | Y).

Figure B-1: Steps 2 and 5 in Fig. 3-2: MPM estimation on the fixed-structure tree. Distributions P(y_i | x_i) and P(x_i | x_j) are assumed known.
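The two-pass procedure in Fig. B-1 can be exercised on the smallest nontrivial tree, a root with two leaves. The sketch below implements the preliminary, bottom-up, and top-down passes for that case; all probability tables are made-up illustrative numbers, not values from the dissertation:

```python
import numpy as np

def mpm_two_level(prior_root, trans, leaf_liks):
    """Two-pass MPM on a root with independent leaf children.
    prior_root: P(x_root), shape (M,); trans[k, j] = P(x_leaf=k | x_root=j);
    leaf_liks: list of P(y_c | x_c) vectors, one per leaf."""
    # Preliminary downward pass: P(x_c) = sum_j P(x_c|x_j) P(x_j).
    prior_leaf = trans @ prior_root
    # Bottom-up pass: leaf posteriors P(x_c|y_c), then the root posterior
    # P(x_r|Y) prop. to P(x_r) * prod_c sum_{x_c} P(x_c|y_c) P(x_c|x_r) / P(x_c).
    post_root = prior_root.copy()
    betas = []
    for lik in leaf_liks:
        beta = lik * prior_leaf
        beta /= beta.sum()                      # P(x_c | y_c)
        betas.append(beta)
        post_root *= (trans / prior_leaf[:, None]).T @ beta
    post_root /= post_root.sum()
    # Top-down pass: P(x_c|Y) = sum_j P(x_c | x_j, y_c) P(x_j | Y).
    post_leaves = []
    for beta in betas:
        joint = trans * prior_root[None, :] * beta[:, None] / prior_leaf[:, None]
        cond = joint / joint.sum(axis=0, keepdims=True)
        post_leaves.append(cond @ post_root)
    return post_root, post_leaves

prior_root = np.array([0.5, 0.5])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])     # P(x_leaf | x_root)
leaf_liks = [np.array([0.8, 0.2]), np.array([0.7, 0.3])]
post_root, post_leaves = mpm_two_level(prior_root, trans, leaf_liks)
print(int(post_root.argmax()), [int(p.argmax()) for p in post_leaves])
```

In the dissertation's setting, M is the number of image classes and the same recursions run level by level over the quad-tree, with the normalizations guarding against numerical underflow.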
REFERENCES

[1] W. E. L. Grimson and T. Lozano-Perez, "Localizing overlapping parts by searching the interpretation tree," IEEE Trans. Pattern Anal. Machine Intell., vol. 9, no. 4, pp. 469-482, 1987.

[2] S. Z. Der and R. Chellappa, "Probe-based automatic target recognition in infrared imagery," IEEE Trans. Image Processing, vol. 6, no. 1, pp. 92-102, 1997.

[3] P. C. Chung, E. L. Chen, and J. B. Wu, "A spatiotemporal neural network for recognizing partially occluded objects," IEEE Trans. Signal Processing, vol. 46, no. 7, pp. 1991-2000, 1998.

[4] W. M. Wells, "Statistical approaches to feature-based object recognition," Intl. J. Computer Vision, vol. 21, no. 1, pp. 63-98, 1997.

[5] Z. Ying and D. Castanon, "Partially occluded object recognition using statistical models," Intl. J. Computer Vision, vol. 49, no. 1, pp. 57-78, 2002.

[6] S. Z. Li, Markov random field modeling in image analysis, Springer-Verlag, Tokyo, Japan, 2nd edition, 2001.

[7] M. H. Lin and C. Tomasi, "Surfaces with occlusions from layered stereo," IEEE Trans. Pattern Anal. Machine Intell., vol. 26, no. 8, pp. 1073-1078, 2004.

[8] A. Mittal and L. S. Davis, "M2tracker: a multi-view approach to segmenting and tracking people in a cluttered scene," Intl. J. Computer Vision, vol. 51, no. 3, pp. 189-203, 2003.

[9] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.

[10] F. Dell'Acqua and R. Fisher, "Reconstruction of planar surfaces behind occlusions in range images," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 569-575, 2002.

[11] R. Fergus, P. Perona, and A. Zisserman, "Object class recognition by unsupervised scale-invariant learning," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 2, pp. 264-271, IEEE, Inc.

[12] A. Mohan, C. Papageorgiou, and T. Poggio, "Example-based object detection in images by components," IEEE Trans. Pattern Analysis Machine Intelligence, vol. 23, no. 4, pp. 349-361, 2001.

[13] M. Weber, M. Welling, and P. Perona, "Towards automatic discovery of object categories," in Proc. IEEE Conf. Comp. Vision Pattern Rec., Hilton Head Island, SC, 2000, vol. 2, pp. 101-109, IEEE, Inc.

[14] M. Weber, M. Welling, and P. Perona, "Unsupervised learning of models for recognition," in Proc. 6th European Conf. Comp. Vision, Dublin, Ireland, 2000, vol. 1, pp. 18-32, Springer.

[15] B. Heisele, T. Serre, M. Pontil, T. Vetter, and T. Poggio, "Categorization by learning and combining object parts," in Advances in Neural Information Processing Systems 14, T. G. Dietterich, S. Becker, and Z. Ghahramani, Eds., vol. 2, pp. 1239-1245. MIT Press, Cambridge, MA, 2002.

[16] P. F. Felzenszwalb and D. P. Huttenlocher, "Pictorial structures for object recognition," Intl. J. of Computer Vision, vol. 61, no. 1, pp. 55-79, 2005.

[17] H. Schneiderman and T. Kanade, "Object detection using the statistics of parts," Intl. J. Computer Vision, vol. 56, no. 3, pp. 151-177, 2004.

[18] S. C. Zhu, "Statistical modeling and conceptualization of visual patterns," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 6, pp. 691-712, 2003.

[19] S. C. Zhu, Y. N. Wu, and D. B. Mumford, "Minimax entropy principle and its applications to texture modeling," Neural Computation, vol. 9, no. 8, pp. 1627-1660, 1997.

[20] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distribution and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. 6, no. 6, pp. 721-741, 1984.

[21] A. Efros and T. Leung, "Texture synthesis by non-parametric sampling," in Proc. Intl. Conf. Computer Vision, Kerkyra, Greece, 1999, vol. 2, pp. 1033-1038, IEEE, Inc.

[22] J. S. De Bonet and P. Viola, "Texture recognition using a non-parametric multi-scale statistical model," in Proc. IEEE Conf. Computer Vision Pattern Rec., Santa Barbara, CA, 1998, pp. 641-647, IEEE, Inc.

[23] M. J. Beal, N. Jojic, and H. Attias, "A graphical model for audiovisual object tracking," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 828-836, 2003.

[24] J. Coughlan and A. Yuille, "Algorithms from statistical physics for generative models of images," Image and Vision Computing, vol. 21, no. 1, pp. 29-36, 2003.
[25] S. Kumar and M. Hebert, "Discriminative random fields: a discriminative framework for contextual interaction in classification," in Proc. IEEE Intl. Conf. Comp. Vision, Nice, France, 2003, vol. 2, pp. 1150-1157, IEEE, Inc.

[26] J. Lafferty, A. McCallum, and F. Pereira, "Conditional random fields: probabilistic models for segmenting and labeling sequence data," in Proc. Intl. Conf. Machine Learning, Williams College, MA, 2001, pp. 282-289.

[27] C. A. Bouman and M. Shapiro, "A multiscale random field model for Bayesian image segmentation," IEEE Trans. Image Processing, vol. 3, no. 2, pp. 162-177, 1994.

[28] W. W. Irving, P. W. Fieguth, and A. S. Willsky, "An overlapping tree approach to multiscale stochastic modeling and estimation," IEEE Trans. Image Processing, vol. 6, no. 11, pp. 1517-1529, 1997.

[29] H. Cheng and C. A. Bouman, "Multiscale Bayesian segmentation using a trainable context model," IEEE Trans. Image Processing, vol. 10, no. 4, pp. 511-525, 2001.

[30] M. S. Crouse, R. D. Nowak, and R. G. Baraniuk, "Wavelet-based statistical signal processing using Hidden Markov Models," IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 886-902, 1998.

[31] X. Feng, C. K. I. Williams, and S. N. Felderhof, "Combining belief networks and neural networks for scene segmentation," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 467-483, 2002.

[32] S. Todorovic and M. C. Nechyba, "Towards intelligent mission profiles of Micro Air Vehicles: multiscale Viterbi classification," in Proc. 8th European Conf. Computer Vision, Prague, Czech Republic, 2004, vol. 2, pp. 178-189, Springer.

[33] J.-M. Laferté, P. Pérez, and F. Heitz, "Discrete Markov image modeling and inference on the quadtree," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 390-404, 2000.

[34] M. R. Luettgen and A. S. Willsky, "Likelihood calculation for a class of multiscale stochastic models, with application to texture discrimination," IEEE Trans. Image Processing, vol. 4, no. 2, pp. 194-207, 1995.

[35] P. L. Ainsleigh, N. Kehtarnavaz, and R. L. Streit, "Hidden Gauss-Markov models for signal classification," IEEE Trans. Signal Processing, vol. 50, no. 6, pp. 1355-1367, 2002.
[36] J. Pearl, Probabilistic reasoning in intelligent systems: networks of plausible inference, Morgan Kaufmann, San Mateo, CA, 1988.

[37] M. J. Wainwright, T. S. Jaakkola, and A. S. Willsky, "Tree-based reparameterization framework for analysis of sum-product and related algorithms," IEEE Trans. Inform. Theory, vol. 49, no. 5, pp. 1120-1146, 2003.

[38] B. J. Frey, Graphical models for machine learning and digital communication, The MIT Press, Cambridge, MA, 1998.

[39] S. Kumar and M. Hebert, "Man-made structure detection in natural images using a causal multiscale random field," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 119-126, IEEE, Inc.

[40] M. K. Schneider, P. W. Fieguth, W. C. Karl, and A. S. Willsky, "Multiscale methods for the segmentation and reconstruction of signals and images," IEEE Trans. Image Processing, vol. 9, no. 3, pp. 456-468, 2000.

[41] J. Li, R. M. Gray, and R. A. Olshen, "Multiresolution image classification by hierarchical modeling with two-dimensional Hidden Markov Models," IEEE Trans. Inform. Theory, vol. 46, no. 5, pp. 1826-1841, 2000.

[42] W. K. Konen, T. Maurer, and C. von der Malsburg, "A fast dynamic link matching algorithm for invariant pattern recognition," Neural Networks, vol. 7, no. 6-7, pp. 1019-1030, 1994.

[43] A. Montanvert, P. Meer, and A. Rosenfeld, "Hierarchical image analysis using irregular tessellations," IEEE Trans. Pattern Anal. Machine Intell., vol. 13, no. 4, pp. 307-316, 1991.

[44] P. Bertolino and A. Montanvert, "Multiresolution segmentation using the irregular pyramid," in Proc. Intl. Conf. Image Processing, Lausanne, Switzerland, 1996, vol. 1, pp. 257-260, IEEE, Inc.

[45] N. J. Adams, A. J. Storkey, Z. Ghahramani, and C. K. I. Williams, "MFDTs: mean field dynamic trees," in Proc. 15th Intl. Conf. Pattern Rec., Barcelona, Spain, 2000, vol. 3, pp. 147-150, Intl. Assoc. Pattern Rec.

[46] N. J. Adams, Dynamic trees: a hierarchical probabilistic approach to image modeling, Ph.D. dissertation, Division of Informatics, Univ. of Edinburgh, Edinburgh, UK, 2001.
[47] A. J. Storkey, "Dynamic trees: a structured variational method giving efficient propagation rules," in Uncertainty in Artificial Intelligence, C. Boutilier and M. Goldszmidt, Eds., pp. 566-573, Morgan Kaufmann, San Francisco, CA, 2000.
[48] A. J. Storkey and C. K. I. Williams, "Image modeling with position-encoding dynamic trees," IEEE Trans. Pattern Anal. Machine Intell., vol. 25, no. 7, pp. 859-871, 2003.
[49] M. I. Jordan, Ed., Learning in Graphical Models (Adaptive Computation and Machine Learning), MIT Press, Cambridge, MA, 1999.
[50] M. I. Jordan, "Graphical models," Statistical Science (spec. issue on Bayesian statistics), vol. 19, pp. 140-155, 2004.
[51] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," Journal of the Royal Statistical Society B, vol. 39, pp. 1-39, 1977.
[52] G. J. McLachlan and K. T. Thriyambakam, The EM Algorithm and Extensions, John Wiley & Sons, New York, NY, 1996.
[53] D. M. Chickering and D. Heckerman, "Efficient approximations for the marginal likelihood of incomplete data given a Bayesian network," in Proc. 12th Conf. Uncertainty Artificial Intelligence, Portland, OR, 1996, pp. 158-168, Assoc. Uncertainty Artificial Intelligence.
[54] S. Todorovic and M. C. Nechyba, "Interpretation of complex scenes using generative dynamic-structured models," in CD-ROM Proc. IEEE CVPR 2004, Workshop on Generative-Model Based Vision (GMBV), Washington, DC, 2004, IEEE, Inc.
[55] S. Todorovic and M. C. Nechyba, "Detection of artificial structures in natural-scene images using dynamic trees," in Proc. 17th Intl. Conf. Pattern Rec., Cambridge, UK, 2004, pp. 35-39, Intl. Assoc. Pattern Rec.
[56] M. Aitkin and D. B. Rubin, "Estimation and hypothesis testing in finite mixture models," J. Royal Stat. Soc., vol. B-47, no. 1, pp. 67-75, 1985.
[57] R. M. Neal, "Probabilistic inference using Markov chain Monte Carlo methods," Tech. Rep. CRG-TR-93-1, Connectionist Research Group, Univ. of Toronto, 1993.
[58] D. A. Forsyth, J. Haddon, and S. Ioffe, "The joy of sampling," Intl. J. Computer Vision, vol. 41, no. 1-2, pp. 109-134, 2001.
[59] M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.
[60] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge Univ. Press, Cambridge, UK, 2003.
[61] D. Barber and P. van de Laar, "Variational cumulant expansions for intractable distributions," J. Artificial Intell. Research, vol. 10, pp. 435-455, 1999.
[62] D. J. C. MacKay, Information Theory, Inference, and Learning Algorithms, chapter 29, pp. 357-386, Cambridge University Press, Cambridge, UK, 2003.
[63] D. J. C. MacKay, "Introduction to Monte Carlo methods," in Learning in Graphical Models (Adaptive Computation and Machine Learning), M. I. Jordan, Ed., pp. 175-204, MIT Press, Cambridge, MA, 1999.
[64] T. S. Jaakkola, "Tutorial on variational approximation methods," in Adv. Mean Field Methods, M. Opper and D. Saad, Eds., pp. 129-161, MIT Press, Cambridge, MA, 2000.
[65] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley Interscience Press, New York, NY, 1991.
[66] T. Randen and H. Husoy, "Filtering for texture classification: a comparative study," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 4, pp. 291-310, 1999.
[67] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San Diego, CA, 2nd edition, 2001.
[68] S. G. Mallat, "A theory for multiresolution signal decomposition: the wavelet representation," IEEE Trans. Pattern Anal. Machine Intell., vol. 11, no. 7, pp. 674-693, 1989.
[69] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Processing, vol. 41, no. 12, pp. 3445-3462, 1993.
[70] N. G. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," J. Applied Comp. Harmonic Analysis, vol. 10, no. 3, pp. 234-253, 2001.
[71] M. Unser, "Texture classification and segmentation using wavelet frames," IEEE Trans. Image Processing, vol. 4, no. 11, pp. 1549-1560, 1995.
[72] N. Kingsbury, "Complex wavelets for shift invariant analysis and filtering of signals," Journal of Applied and Computational Harmonic Analysis, vol. 10, no. 3, pp. 234-253, 2001.
[73] T. Lindeberg, "Scale-space theory: a basic tool for analysing structures at different scales," J. Applied Statistics, vol. 21, no. 2, pp. 224-270, 1994.
[74] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Intl. J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
[75] S. Belongie, J. Malik, and J. Puzicha, "Shape matching and object recognition using shape contexts," IEEE Trans. Pattern Anal. Machine Intell., vol. 24, no. 4, pp. 509-522, 2002.
[76] B. J. Frey, N. Jojic, and A. Kannan, "Learning appearance and transparency manifolds of occluded objects in layers," in Proc. IEEE Conf. Computer Vision Pattern Rec., Madison, WI, 2003, vol. 1, pp. 45-52, IEEE, Inc.
[77] G. Jones III and B. Bhanu, "Recognition of articulated and occluded objects," IEEE Trans. Pattern Anal. Machine Intell., vol. 21, no. 7, pp. 603-613, 1999.
[78] J. S. Yedidia, W. T. Freeman, and Y. Weiss, "Generalized belief propagation," in Advances in Neural Information Processing Systems 13, T. K. Leen, T. G. Dietterich, and V. Tresp, Eds., pp. 689-695, MIT Press, Cambridge, MA, 2001.
[79] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," J. Computer System Sciences, vol. 55, no. 1, pp. 119-139, 1997.
[80] V. N. Vapnik, Statistical Learning Theory, John Wiley & Sons, Inc., New York, NY, 1998.
[81] P. Domingos, "MetaCost: a general method for making classifiers cost-sensitive," in Proc. 5th Intl. Conf. Knowledge Discovery Data Mining, San Diego, CA, 1999, pp. 155-164, ACM Press.
[82] T. G. Dietterich, "Machine learning for sequential data: a review," in Lecture Notes in Computer Science, T. Caelli, Ed., vol. 2396, pp. 15-30, Springer-Verlag, Heidelberg, Germany, 2002.

BIOGRAPHICAL SKETCH

Sinisa Todorovic was born in Belgrade, Serbia, in 1968. He graduated from Mathematical High School - Belgrade in 1987. He received his B.S. degree in electrical and computer engineering at the University of Belgrade, Serbia, in 1994.
From 1994 until 2001, he worked as a software engineer in the communications industry. In fall 2001, Sinisa Todorovic enrolled in the master's degree program at the Department of Electrical and Computer Engineering, University of Florida, Gainesville. He became a member of the Center for Micro Air Vehicle Research, where he conducted research in statistical image modeling and multi-resolution signal processing. Sinisa Todorovic earned his master's degree (M.S. thesis option) in December 2002, after which he continued his studies toward a Ph.D. degree in the same department. He received two certificates for outstanding academic accomplishment in 2002 and 2003. He expects to graduate in May 2005.