
Citation 
 Permanent Link:
 http://ufdc.ufl.edu/AA00025708/00001
Material Information
 Title:
 Efficient computational methods for early vision problems
 Creator:
 Lai, ShangHong, 1964
 Publication Date:
 1995
 Language:
 English
 Physical Description:
 xiii, 188 leaves : ill. ; 29 cm.
Subjects
 Subjects / Keywords:
 Algorithms ( jstor )
Approximation ( jstor ) Capacitance ( jstor ) Datasets ( jstor ) Iterative solutions ( jstor ) Linear systems ( jstor ) Matrices ( jstor ) Matrix equations ( jstor ) Preconditioning ( jstor ) Wavelet analysis ( jstor ) Dissertations, Academic  Electrical and Computer Engineering  UF Electrical and Computer Engineering thesis, Ph. D
 Genre:
 bibliography ( marcgt )
theses ( marcgt ) nonfiction ( marcgt )
Notes
 Thesis:
 Thesis (Ph. D.)University of Florida, 1995.
 Bibliography:
 Includes bibliographical references (leaves 181187).
 Additional Physical Form:
 Also available online.
 General Note:
 Typescript.
 General Note:
 Vita.
 Statement of Responsibility:
 by ShangHong Lai.
Record Information
 Source Institution:
 University of Florida
 Holding Location:
 University of Florida
 Rights Management:
 Copyright [name of dissertation author]. Permission granted to the University of Florida to digitize, archive and distribute this item for nonprofit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
 Resource Identifier:
 022049735 ( ALEPH )
33876876 ( OCLC )

Downloads 
This item has the following downloads:

Full Text 
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1995
To my wife YihChou
and my son Patrick
ACKNOWLEDGMENTS
I am deeply grateful to my advisor and committee chairman, Dr. Baba C. Ve
muri, whose guidance, encouragement and support made the completion of this work
possible. I thank him for his academic as well as nonacademic advice and assistance
during my study at the University of Florida. Many ideas in this dissertation were
produced through the discussions with him. I would also like to thank Dr. Staudham
mer, Dr. Doty, Dr. Harris, Dr. Sigmon and Dr. Rao for serving in my supervisory
committee and advising me on various aspects of this dissertation. I sincerely thank
the National Science Foundation for supporting me financially to finish this work
under the grant number ECS9210648. Special thanks go to my parents, who never
hesitate to deliver their love, encouragement and support to enrich my life. Finally,
I would like to dedicate this dissertation to my beloved wife, YihChou, who shared
the happiness and struggle I experienced during the work of this dissertation, and to
my son, Patrick.
TABLE OF CONTENTS
ACKNOWLEDGMENTS ............................. iii
LIST OF TABLES ................................. vii
LIST OF FIGURES ................................ viii
ABSTRACT .................................... xii
CHAPTERS
1 INTRODUCTION ............................... 1
1.1 Formulations of Early Vision Problems ................... 2
1.2 Computational Algorithms ........................ 3
1.3 Main Contributions of this Thesis ....................... 6
1.4 Thesis Overview .............................. 8
2 PROBLEM FORMULATIONS ........................ 11
2.1 Deterministic Framework ......................... 11
2.2 Probabilistic Framework ......................... 14
2.3 Variational Formulations for Some Early Vision Problems ....... .18
2.3.1 Surface Reconstruction ...................... 19
2.3.2 Shape from Shading ....................... 20
2.3.3 Optical Flow Computation ...................... 24
3 GENERALIZED CAPACITANCE MATRIX THEOREMS ......... 28
3.1 Introduction .. ... .. ... .. ... .. .. ... .. . . . 28
3.2 Generalized Capacitance Matrix Theorems ................. 33
3.3 Discussion . . . .. . . . . . . . . . . . . 48
4 GENERALIZED CAPACITANCE MATRIX ALGORITHM ........ 50
4.1 Structure of the Algorithm ........................ 50
4.2 Choice of the Splitting .......................... 52
4.2.1 Dense Data Surface Reconstruction . . . . . . ... .. 55
4.2.2 Sparse Data Surface Reconstruction . . . . . . ... .. 60
4.3 Numerical Solution ..... ........................... .. 61
4.3.1 ADI Method for the Lyapunov Equation . . . . ... ..62
4.3.2 Modified BCG Method for the Capacitance Matrix Equation 72
4.4 Experimental Results . . . . . . . . . . . . ... .. 74
4.4.1 Surface Reconstruction . . . . . . . . . ... .. 75
4.4.2 Shape from Shading . . . . . . . . . . ... .. 77
5 ADAPTIVE PRECONDITIONING TECHNIQUE . . . . . .... .. 81
5.1 Introduction . . . . . . . . . . . . . . ... .. 81
5.2 Adaptive Preconditioning in a Wavelet Basis . . . . . .... .. 85
5.2.1 Preconditioning Technique . . . . . . . . ... .. 86
5.2.2 Regularization Filter . . . . . . . . . . ... .. 88
5.2.3 W avelet Basis . . . . . . . . . . . . ... .. 90
5.2.4 Adaptive Preconditioning via the Modulation in a Wavelet Basis 93
5.3 Adaptive Preconditioned Conjugate Gradient Algorithm ....... 96
5.3.1 Surface Reconstruction . . . . . . . . . ... .. 97
5.3.2 Shape from Shading . . . . . . . . . . ... .. 100
5.3.3 Optical Flow Computation . . . . . . . . ... .. 102
5.4 Experimental Results . . . . . . . . . . . . ... .. 102
5.4.1 Surface Reconstruction . . . . . . . . . ... .. 103
5.4.2 Shape from Shading . . . . . . . . . . ... .. 117
5.4.3 Optical Flow Computation . . . . . . . . ... .. 123
5.5 Summary and Discussion . . . . . . . . . . . ... .. 126
6 ROBUST OPTICAL FLOW COMPUTATION . . . . . . ... .. 128
6.1 Introduction . . . . . . . . . . . . . . ... .. 128
6.2 Modified gradientbased method . . . . . . . . . ... .. 132
6.3 SSDbased Regularization Method . . . . . . . . ... .. 137
6.3.1 Nonlinear Conjugate Gradient Algorithm . . . . ... ..139
6.3.2 Modified Search Direction Scheme . . . . . . ... .. 141
6.4 Experimental Results . . . . . . . . . . . . ... .. 142
6.4.1 Modified Gradientbased Method . . . . . . ... .. 143
6.4.2 SSDbased Regularization Method . . . . . . ... .. 146
6.5 Summary and Discussion . . . . . . . . . . . ... .. 150
7 RESEARCH DIRECTIONS . . . . . . . . . . . ... .. 152
7.1 Introduction . . . . . . . . . . . . . . ... .. 152
7.2 Previous Energy Minimization Methods . . . . . . . ... .. 154
7.2.1 Deterministic Optimization Methods . . . . . .... ..155
7.2.2 Stochastic Optimization Methods . . . . . . ... .. 160
7.3 Proposed Hybrid Search Algorithm . . . . . . . . ... .. 161
7.3.1 Elements of Genetic Algorithms . . . . . . . ... .. 164
7.3.2 Informed Genetic Algorithm . . . . . . . . ... .. 168
7.3.3 Preliminary Experimental Results . . . . . . ... .. 171
7.3.4 Discussion . . . . . . . . . . . . . ... .. 176
8 CONCLUSIONS . . . . . . . . . . . . . . ... .. 178
REFERENCES . . . . . . . . . . . . . . . . ... .. 181
BIOGRAPHICAL SKETCH . . . . . . . . . . . . ... .. 188
LIST OF TABLES
6.1 Summary of Square 2 results . . . . . . . . . .... .. 145
6.2 Summary of Diverging Tree results . . . . . . . . ... .. 146
6.3 Summary of Sinusoid 1 results . . . . . . . . . ... .. 148
6.4 Summary of Translating Tree results . . . . . . . .... .. 150
LIST OF FIGURES
4.1 The flow chart of the generalized capacitance matrix algorithm. . 53
4.2 (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain Q . . . . . . . ... .. 56
4.3 Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed sur
face using thinplate smoothness, obtained using the generalized ca
pacitance matrix algorithm . . . . . . . . . . .... .. 76
4.4 Surface reconstruction with prespecified discontinuities along a hori
zontal line; (a) original sparse data set with depth discontinuities along
the indicated horizontal line; reconstructed surfaces using a membrane
spline after (b) 1 iteration and (c) 10 iterations of ADI method; (d)
reconstructed surface using a thinplate spline after 16 iterations of
ADI m ethod . . . . . . . . . . . . . . ... ..77
4.5 Surface reconstruction with prespecified discontinuities along a diag
onal line; (a) the original sparse data set with depth discontinuities
along the indicated diagonal line, (b) reconstructed surface using a
membrane spline after 10 iterations of ADI method . . . ... .. 78
4.6 Shape from shading example, (a) Shaded sphere image (b) irregular
occluding boundary imbedded in a rectangle (c) recovered surface ori
entation, (d) reconstructed surface shape . . . . . . ... .. 80
5.1 The implementation of the wavelet transform via the QMF struc
ture from one level to the next, (a) decomposition (b) reconstruction,
adopted from Mallat[1989] . . . . . . . . . . ... .. 92
5.2 Division of the frequency domain in a wavelet basis . . . ... .. 92
5.3 The sparse data sets used for the surface reconstruction, (a) synthetic
data (b) synthetic symmetric data (c) real laser range data . . 104
5.4 (a), (d), (g) & (j) are the computed solutions using the APCG al
gorithm, (b), (e), (h) & (k) are the computed solutions using HCG
algorithm, (c), (f), (i) & (1) are the computed solutions using conju
gate gradient algorithm, after 1, 5, 10 and 20 iterations, respectively. 107
5.5 The computed solutions after (a) 10 iterations of APCG (b) 25 itera
tions of HCG (c) 30 iterations of CG using the same time . ... .. 108
5.6 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm in the surface reconstruction problem
with the synthetic sparse data set and (a) with controlledcontinuity
stabilizer (p = 1, r = 0.1) (b) with thinplate stabilizer . . ... .. 109
5.7 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with discontinuities specified in the controlledcontinuity stabilizer. 109
5.8 The computed solutions for the surface reconstruction with disconti
nuities using the APCG algorithm after (a) 10, (b) 20, (c) 40 and (d)
60 iterations, respectively . . . . . . . . . . .... .. 110
5.9 The convergence of Pentland's preconditioner in conjunction with the
conjugate gradient algorithm is worse than that of conjugate gradi
ent algorithm without preconditioning for the surface reconstruction
problem with the synthetic sparse data set using a membrane stabilizer. 111
5.10 Row profile of the wavelet transformed membrane matrix K, of size
64 x 64 along different rows, (a) row 3 (b) row 10 (c) row 30 (d) row 50.
This figure establishes the falsity of the diagonal dominance behavior
of K. claimed in [19] . . . . . . . . . . . . ... .. 112
5.11 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with sparse symmetric data set using a controlledcontinuity stabilizer. 113
5.12 The computed solutions using the adaptive preconditioned conjugate
gradient algorithm after (a) 10 iterations, (b) 40 iterations, (c) 100
iterations, in the third surface reconstruction experiment . . ... .. 114
5.13 (a) The convergence for the APCG, HCG and CG algorithms for the
surface reconstruction problem with the real laser range data set using
a controlledcontinuity stabilizer, (b) The computed solution after 10
APCG iterations . . . . . . . . . . . . . ... .. 116
5.14 Shaded images of bump with different light source directions . . .117
5.15 The convergence of the APCG, HCG, and CG algorithms for the
shapefromshading problem . . . . . . . . . . .... .. 118
5.16 Shaded images of bump after 100 iterations of APCG algorithm with
out using any boundary condition . . . . . . . . ... .. 118
5.17 Shaded images of bump after (a) & (b) 10 iterations, (c) & (d) 20
iterations, (e) & (f) 40 iterations of the APCG algorithm with the
occluding boundary condition . . . . . . . . . ... .. 120
5.18 The computed (a) gradient and (b) height after 40 iterations of the
APCG algorithm . . . . . . . . . . . . . .... .. 121
5.19 The convergence curves of the (a) energy, (b) p q error, (c) z error
for the CG, HCG and APCG algorithm . . . . . . ... .. 122
5.20 One frame from the (a) square image sequence and (b) rotating Rubic
cube image sequence . . . . . . . . . . . . ... .. 124
5.21 Computed flow of the (a) translating square and (b) rotating Rubic
cube . . . . . . . . . . . . . . . . . . .. 125
5.22 The convergence curves of the CG, HCG and APCG algorithms for
(a) translating square (b) rotating Rubic image sequence . . ... .. 126
6.1 The discrepancy between the gradient directions of the function oI0 and
I, at the ith location is small and thus the use of the gradient direction
as the search direction for the ith component is robust; while the the
jth component of the gradient direction is unreliable for a local search. 142
6.2 Frames from the (a) Square 2 and (b) Diverging Tree image sequences. 144
6.3 Computed optical flow of the (a) Square 2 and (b) Diverging Tree
image sequences after 40 iterations of the incomplete Cholesky pre
conditioned CG algorithm . . . . . . . . . . ... .. 145
6.4 (a) One frame from the Rotating Rubic Cube sequence (b) Computed
optical flow after 40 iterations of the incomplete Cholesky precondi
tioned CG algorithm . . . . . . . . . . . . ... .. 147
6.5 Frames from the (a) Sinusoid 1 and (b) Translating Tree image sequences. 147
6.6 Computed optical flow of the (a) Sinusoid 1 and (b) Translating Tree
image sequences, using the preconditioned nonlinear CG algorithm. 149
6.7 (a) One frame from the Hamburg Taxi sequence (b) Computed optical
flow after 200 iterations of the preconditioned nonlinear CG algorithm. 150
7.1 The crossover operation example . . . . . . . . ... .. 166
7.2 The sparse data set used for the surface reconstruction with disconti
nuity detection, (a) original surface (b) sparsely sampled data set. . 172
7.3 (a) Clique for the line process (x: line process, 0: nodal variable for
surface), (b) clique potentials for different line process configurations. 173
7.4 Surface reconstruction example: (a) the reconstructed surface (b) the
recovered discontinuity map ( solid line ) obtained after 3 generations
of the informed GA and imposed on the true discontinuity map (
dashed line ) . . . . . . . . . . . . . . ... .. 174
7.5 Surface reconstruction with noise example: (a) the noisy sparse data
set, (b) the reconstructed surface (c) the recovered discontinuity map
( solid line ) obtained after 5 generations of the informed GA and
imposed on the true discontinuity map ( dashed line ) . . ... .. 175
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
August 1995
Chairman: Dr. Baba C. Vemuri
Major Department: Electrical and Computer Engineering
Early vision has been one of the most fundamental research topic in computer
vision. The task of early vision is to recover physical properties of threedimensional
objects (such as shape) from twodimensional sensed data. Example tasks consist
of surface reconstruction, optical flow computation, shape from shading, shape from
stereo, edge detection, lightness problem, segmentation problem,... etc. Early vision
problems may be formulated either in a deterministic or a probabilistic framework.
A popular technique in the deterministic framework is based on the regularization
theory, and the probabilistic formulation exploits the Markov random field (MRF)
model to characterize the function being estimated. Both formulations lead to min
imization of equivalent energy functions. When discontinuity detection is involved
in the formulation, the energy functions to be minimized become nonconvex. In this
thesis, we propose efficient computational algorithms for minimizing these energy
functions with and without discontinuity detection.
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method for solving elliptic partial differential equations arising in early vision
and engineering problems.
2. A novel adaptive preconditioning algorithm using a wavelet transform for solv
ing illconditioned large sparse linear systems arising in regularized solutions of
illposed problems in early vision and other domains.
3. Two new, robust and efficient algorithms for accurate optical flow computation
namely, a modified gradientbased formulation and an SSDbased regularization
formulation.
4. A new hybrid (stochastic + deterministic) search algorithm involving an in
formed genetic algorithm (GA) and either one of the two aforementioned fast
numerical algorithms, for solving nonconvex optimization problems in early
vision.
The efficiency of the proposed algorithms are demonstrated through experi
ments on several early vision problems, including surface reconstruction, shape from
shading, and optical flow computation. In addition to computer vision applications,
the theory and algorithms developed in this thesis can also be applied to other ar
eas such as cartography/terrain mapping, reverse engineering, MPEG image sequence
compression, material science applications (velocity estimation of dry granular flows),
engineering applications and mathematical physics.
CHAPTER 1
INTRODUCTION
Early vision has been one of the most fundamental research topic in computer
vision. The primary goal of early vision is to recover the geometric or physical
properties of threedimensional objects from twodimensional sensed data. Typical
properties of objects are the shape ( depth or orientation ), texture, color, reflectance
and motion parameters. Example tasks of early vision include surface reconstruction,
optical flow computation, shape from shading, stereo matching, edge detection and
lightness problem [28, 41, 40, 38, 63, 6, 72]. The results of early vision can be used
for higher level tasks such as bin picking, object recognition, passive navigation and
motion tracking.
Early vision problems can be regarded as inverse problems. During the imaging
process, some information is lost since it projects threedimensional objects to two
dimensional images. Therefore, it is usually not possible to solve the inverse problems
from data information only. In fact, most early vision problems are illposed problems
[6]. A problem is said to be wellposed if its solution (a) exists, (b) is unique, and
(c) depends continuously on the input data [62]. We can transform an illposed
problem into a wellposed one by imposing additional constraints (such as smoothness
constraints) on the solution.
1.1 Formulations of Early Vision Problems
Early vision problems may be formulated either in a deterministic or a prob
abilistic framework. A popular technique in the deterministic framework is based
on regularization theory [82, 80] and leads to the minimization of energy function
als. The probabilistic formulation exploits the Markov random field (MRF) model to
characterize the function being estimated. We can relate the smoothness constraints
in the regularization theory to the prior distribution of the MRF model. Both the
regularization formulation and the MRF formulation of early vision problems lead
to the minimization of equivalent energy functions. We will give the deterministic
formulations of some early vision problems in the next chapter.
Discontinuities contain very crucial information in computer vision. There are
different treatments of discontinuities in early vision research. Early work on vi
sual reconstruction problems reported in literature ignores discontinuities altogether
[40, 28, 62], which yields undesirable smoothing over discontinuity locations. How
ever, more recently problem formulations allow for incorporation of prespecified dis
continuities [81, 75]. For both cases, the aforementioned energy function is convex
and quadratic for most early vision problems. Minimization of a convex quadratic
function can be achieved by solving the associated linear system. In this thesis, we
present several efficient and novel computational algorithms to minimize the energy
functions for early vision problems with prespecified discontinuities.
There have been many methods proposed to detect discontinuities in the under
lying function from the observed data. An elegant method to treat discontinuities
is via the introduction of a set of binary variables called the line process. Each line
process variable takes a value of 1 in the presence of discontinuities and zero other
wise. Line process has been included in both the deterministic [53, 8] and the MRF
[22] formulations. The resulting energy function involves real variables (for surface
or image) coupled with binary variables (for discontinuities) and is therefore a non
convex function. Both formulations lead to problems involving the minimization of
nonconvex functions.
1.2 Computational Algorithms
For the prespecified discontinuities case, numerous iterative methods have been
employed to minimize the energy functions namely, GaussSeidel [14], successive
overrelaxation (SOR), conjugate gradient (CG) [26], hierarchical conjugate gradi
ent (HCG) [75], and multigrid methods [79], the slowest being GaussSeidel, taking
O(N2) time to converge ( N(= n2) is the number of discretization ) [26] while, the
fastest being multigrid methods taking 0(N) time to converge for some regular class
of problems [29]. Although the multigrid techniques have been applied successfully
to a general class of problems, the proofs of convergence are restricted to a special
class of problems [29, 9, 91]. For these results ( posed as solution for PDE problems),
discontinuity in the solution is not considered. Therefore, inclusion of discontinuity
into the problem is bound to deteriorate the convergence speed of multigrid methods.
In addition, the construction of multigrid needs to be modified for nonrectangular
regions. The modification becomes quite complicated for implementation.
Direct solutions to the associate linear systems from minimization of the energy
functions are based on a theorem called the capacitance matrix theorem which was
stated and proved in Buzbee et al. [10]. However, this capacitance matrix theorem
is restricted to some early vision problems and on the design on numerical algo
rithms [45]. The direct solutions using the capacitance matrix theorem employ the
FourierToeplitz method in combination with LU decomposition [72], the Cholesky
decomposition [10] or the conjugate gradient [65] technique. The FourierToeplitz
method requires 0(N log N) time and the LU decomposition, Cholesky factoriza
tion or the conjugate gradient require 0(n3) = O(Nv/'N), thus making the overall
complexity O(N v/N).
In this thesis, we present two efficient algorithms for the minimization of the en
ergy functions in early vision. One is a generalized capacitance matrix algorithm and
the other is an adaptive preconditioning technique in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm. The development of the gener
alized capacitance matrix algorithm is based on the generalized capacitance matrix
theorems that we state and prove in this thesis. In our algorithm, we first transform
the linear system into a Lyapunov matrix equation or a cascade of two Lyapunov
matrix equations with an appropriate righthand side (RHS). This RHS is obtained
by solving an associated linear system with a capacitance matrix. The Lyapunov ma
trix equations are solved using the alternating direction implicit method (ADI) while
the solution to the capacitance matrix linear system is obtained using a modified
biconjugate gradient technique. We prove that the ADI method takes a constant
number of iterations with each iteration taking 0(N) time.
Our adaptive preconditioning technique is used in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm to minimize the energy function.
This adaptive preconditioner is constructed in a wavelet basis and adapted to the
spectral characteristics of the imposed smoothness constraint as well as the data
constraint in the energy function. The construction of the preconditioner makes
use of the division of frequency domain property in a wavelet basis. Previously,
there have been some preconditioners proposed in computer vision literature [75, 61,
92]. However, they are either not adapted to the problem [75, 92] or constructed
inappropriately [61]. We empirically show the superiority of our preconditioner over
other preconditioners on various early vision problems.
To solve the early vision problems with discontinuity detection, both regular
ization and MRF formulations lead to the minimization of a coupled (binaryreal)
nonconvex energy function. Numerous algorithms have been proposed to solve this
coupled nonconvex minimization problem. They can be categorized into the deter
ministic methods [8] [44] [21] and the stochastic methods [22] [51]. Most of the deter
ministic methods can only deal with simple constraints on the line process [8, 21, 31];
in addition, they can not find the global minimum solution. The stochastic optimiza
tion methods are primarily the simulated annealing (SA) type algorithms. Simulated
annealing is a general global optimization algorithm, but its convergence rate is very
slow, thus making it impractical for many applications.
In this thesis, we will briefly discuss existing methods for this coupled noncon
vex minimization problem and propose a novel hybrid search algorithm as a possible
solution toward solving this problem. This hybrid search algorithm which is a combi
nation of a stochastic and a deterministic search techniques. For the stochastic search,
we propose an informed genetic algorithm (GA) whereas for the deterministic search,
we employ either one of the two deterministic algorithms mentioned above. Some
promising preliminary experimental results of applying this algorithm on sparse data
surface reconstruction along with discontinuity detection problems are presented.
1.3 Main Contributions of this Thesis
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method is proposed for solving elliptic partial differential equations arising in
early vision and engineering problems. The algorithm is based on the develop
ment of the generalized capacitance matrix theorems which we state and prove
in this thesis. We make use of the ADI method as a fast elliptic solver in this
algorithm and prove that it takes a constant number of iterations to converge
within a given error tolerance with each iteration taking 0(N) time.
2. A novel adaptive preconditioning algorithm using a wavelet transform is devel
oped for solving illconditioned large systems of equations arising in regularized
solutions of illposed problems in early vision and other domains. Our pre
conditioner is constructed by approximating the spectral characteristics of the
imposed smoothness as well as data constraints. This spectral approximation is
achieved by using the division of frequency domain property in a wavelet basis.
By using this construction, our preconditioner is adapted to to the smoothness
constraint as well as data constraint.
3. Two robust algorithms are proposed for computing optical flow. The first is a
modified gradientbased regularization method, and the other is an SSDbased
regularization method. In our modified gradientbased method, we selectively
combine the image flow constraint and the contourbased flow constraint into
the data constraint in a regularization framework to amend the errors in the
image flow constraint caused by brightness discontinuities. Our SSDbased
regularization method uses the SSD (Sum of Squared Differences) measure as
the data constraint in a regularization framework. Experimental results for
these two algorithms are given to demonstrate their superior performance.
4. A novel hybrid (stochastic + deterministic) search algorithm is proposed to
solve the nonconvex optimization problems for early vision involving disconti
nuity detection. This hybrid search algorithm consists of an informed genetic
algorithm (GA) and either one of the two aforementioned numerical algorithms.
The informed GA is used for the line process only in the outer loop of our algo
rithm, while the deterministic algorithm is employed to solve the subproblems
of minimizing the energy function given a fixed line process configuration.
The efficiency of the proposed algorithms is demonstrated through experiments
on several early vision problems, including surface reconstruction, shape from shad
ing, and optical flow computation. In addition to computer vision applications, the
theory and algorithms developed in this thesis can also be applied to many other
areas, such as cartography/terrain mapping, reverse engineering, MPEG image se
quence compression, material science applications (velocity estimation of dry granular
flows), engineering applications and mathematical physics.
1.4 Thesis Overview
The rest of this thesis is organized as follows.
In chapter 2, the deterministic and probabilistic frameworks of the early vision
problems are briefly reviewed and the formulations of some early vision problems
namely, surface reconstruction, shape from shading and optical flow computation,
are given. The deterministic framework is primarily based on the regularization
formulation. For the probabilistic framework, we discuss the maximum a posteriori
(MAP) estimation in a coupled Markov Random Field (MRF) model.
In chapter 3, the generalized capacitance matrix theorems are stated with com
plete proof. These theorems are generalizations of using ShermanMorrisonWoodbury
formula [26] to solve linear systems and the capacitance matrix theorem proposed by
Buzbee et al. [10]. The structure of the our generalized capacitance matrix algorithm
can be obtained directly from these generalized theorems.
Chapter 4 gives the complete generalized capacitance matrix algorithm which
involves using the ADI method for solving the Lyapunov matrix equations and using
the modified biconjugate gradient (BCG) algorithm for solving the associate linear
system with a capacitance matrix. We consider the membrane smoothness, thinplate
smoothness or a linear combination of the two in our algorithm. We demonstrate the
efficiency of this algorithm by applying it to surface reconstruction and shape from
shading problems.
In chapter 5, the new adaptive preconditioning technique is presented as an
efficient solution to general early vision problems. Our adaptive preconditioner is
constructed in a wavelet basis based on the approximation to the spectral character
istics of the smoothness and data constraints. We empirically show the superiority of
this adaptive preconditioner over other preconditioners previously proposed in vision
literature through experiments on surface reconstruction, shape from shading and
optical flow computation.
Two new, robust and efficient algorithms for optical flow estimation are pre
sented in chapter 6. The first is a modified gradientbased algorithm which combines
the image flow constraint and the contourbased flow constraint into the data con
straint in a regularization framework to amend the errors in the image flow constraint
caused by brightness discontinuities. We employ the adaptive preconditioned conju
gate gradient algorithm, presented in chapter 5, to solve the linear system for this new
formulation. The second is an SSDbased regularization method that uses the SSD
measure as the data constraint in a regularization framework. The preconditioned
nonlinear conjugate gradient with a modified search direction scheme is developed to
minimize the resulting energy function. Experimental results for these two algorithms
are given to demonstrate their performance.
In chapter 7, we list several directions for conducting future research based on
the research reported in this thesis. For the early vision with discontinuity detec
tion problem, we discuss numerous existing techniques and propose a new hybrid
search algorithm as a possible approach toward solving this problem. By includ
ing line process into the optimization of the energy function, the early vision along
with discontinuity detection problem is formulated as the minimization of a coupled
(binaryreal) nonconvex energy function. Our hybrid search algorithm consists of a
stochastic and a deterministic search techniques. For the stochastic search, we pro
pose an informed genetic algorithm (GA) as a global minimizer for the binary line
process only, whereas for the deterministic search, we employ either one of the two
deterministic algorithms mentioned above to solve the minimization problem for each
fixed line process configuration inside the GA. Promising preliminary experimental
results of using this algorithm on the sparse data surface reconstruction problems are
given.
Finally, we summarize the results of this thesis in chapter 8.
CHAPTER 2
PROBLEM FORMULATIONS
There have been two classes of formulations proposed for the early vision prob
lems; one is the deterministic formulation, and the other is the probabilistic one. The
deterministic approach is based on the regularization theory proposed by Tikhonov
[82]. The regularization method introduces a stabilizing operator to wellpose the
original problem and leads to a variational principle. The most popular probabilistic
formulation to early vision problems involves the use of an MRF (Markov Random
Field) model. Although there is some correspondence between the aforementioned
stabilizing operator and the MRF model, the MRF model is known to be more ver
satile to model surfaces or images than the deterministic model [51] [74].
2.1 Deterministic Framework
Variational formulations for various early vision problems have been reported
in [62, 6]. These formulations make use of regularization theory. In a regularization
framework, generic smoothness assumptions are imposed on the solution space prior
to attempting any functional minimization. The smoothness constraints are well
characterized by a class of generalized multidimensional spline functionals [80]. The
formulations involve minimization of an energy functional E, which is the sum of the
energy contribution from the smoothness constraint (E8) and the data constraint (Ed)
i.e.,
Find u such that (u) = infv E H (v) (2.1)
where, R defines the linear admissible space of smooth functions defined on R2 and
the functional
(v) = TD(v) + AS(v) (2.2)
Where, A is called the regularization parameter that controls the contribution of the
smoothness term, S(v) : R is a functional on H that is a measure of smoothness
of an admissible function v(x, y), and 79(v) : H 3 R is a functional that measures
the discrepancy between the observation model and the observed data [80].
To solve the variational problem, a finite difference method [37] or a finite el
ement method [78] can be used to discretize the problem domain. The infinite
dimensional minimization problem in equation 2.1 can be reduced to a finitedimensional
minimization problem as follows:
minE(u) with E(u)= D(u,d)+ AS(u) (2.3)
where u E R?2 is the discretized surface, d E n"2 Xl is the data vector, and the
discretization mesh is n x n. In general, the observation model can be nonlinear,
e.g. the shape from shading problem with a nonlinear reflectance map function.
Let's denote this model by Kdu = d where the operator Kd is a matrix when the
model is linear, and choose the stabilizing ( or smoothing ) operator as the firstorder
derivative along x and y, then
D(u,d) = II Ku d 112 (2.4)
S(u) = luWKu (2.5)
2
For the choice of firstorder differential stabilizer, i.e. the membrane smoothness, the
operator K, is a Laplacian matrix.
The regularization method imposes the smoothness constraint over the entire
domain. It also smoothes out the discontinuities which is very undesirable. Incorpo
rating the discontinuity detection into the early vision problem, the energy function
to be minimized can be modified as [8]
E(u,h,v) = I Kdu d 12 + ((u+i,, uu,j)2(1 hi,j) + (u+i u,j)2(1 vi,))
i,j
+ Ey V(h, v) (2.6)
ij
where hij and vi,j are the horizontal and vertical line processes, respectively, which
are binary variables with a value 1 denoting a vertical or horizontal discontinuity
at that location and a 0 otherwise. The line energy V(h,v) is a term that may
be used to penalize the inclusion of discontinuities or to impose the smoothness of
the line process. Most of the deterministic formulations take very simple functions
for V(h, v) merely based on some heuristics; for example, Blake and Zisserman [8]
take V(h, v) = hi,j + vij, which has no interaction between individual line processes
variables.
The energy in equation 2.6 is a nonconvex function of real variables coupled
with boolean variables. It is difficult to solve a nonconvex minimization problem
since traditional gradient decent methods are easily trapped in local minima.
The deterministic formulation leads to satisfactory results for some early vision
problems, but this formulation is not general enough to include arbitrary constraints
for the line process. We will see in the next section that the probabilistic formu
lation using the MRF model is more flexible in this context than the deterministic
formulation.
2.2 Probabilistic Framework
MRF (Markov Random Field) is a generalization of Markov process in 1D to
2D. To define an MRF [22], we have to define a neighborhood system first. Let S be
a finite set of n x n sites, and G = {G1,j, (i,j) E S} be a neighborhood system for S
such that
(1) (i,j) V Gij, for all (i,j) E S.
(2) (i,j) E Gk,l, if and only if (k, 1) E G ,j, for all (i,j),(k ,l) E S.
Let F = { Fij, (i,j) E S} be a random field, which is a family of random variables
indexed by the twodimensional site (i,j) E S. Any possible sample realization
f = (f1i,i,... fn,n) is called a configuration of the field. Let f be the set of all
possible configurations, i.e. the sample space, and P be a probability measure on Q.
Then F is an MRF with respect to G if
(1) P(F = f) > 0, for all f E Q.
(2) P(Fi,j = fij I Fk, = fk,, (k, 1) # (i,j)) = P(Fij = fi, Fk,i = fkA,, (k, 1) E
Gjj).
It is not clear how to find a probability distribution function for an MRF directly
from the above definition. Fortunately, the HammersleyClifford theorem [22] states
that F is an MRF on S with respect to the neighborhood system G if and only if
the probability distribution is a Gibbs distribution with respect to G. Therefore, we
can construct an MRF by designing a Gibbs distribution function as the distribution
function of the MRF.
To give a definition of a Gibbs distribution, we have to define a "clique" first.
A subset C C S is a clique if every pair of distinct sites in C are neighbors. Let C
denote the set of all cliques.
A Gibbs distribution relative to {S, G} is a probability measure ir on f such
that
r(w) = expU()/T (2.7)
where Z is the normalizing constant ( also called partition function), T is a parameter,
and the energy function U(w) is of the form
Uw) = E Vc() (2.8)
CEC
The function Vc(w) is called the clique potential. Its value is determined by wij with
(i,j) E C.
For the sake of exposition, let's consider the surface reconstruction problem
here, but the discussion to follow should apply to most early vision problems with
minor modifications. A coupled MRF model may be constructed for the surface
reconstruction problem by defining an MRF, F, for surface and a boolean MRF, L,
for line process. The prior probability can be written as
1
P(F = f, L = 1) = expU(fl)/T (2.9)
Zi
U(f,l) = >Vc(f,1) (2.10)
c
The parameter T is assumed to be 1 for simplicity. The prior energy U(f, 1) is
similar to the last two terms of E(u, h, v) in equation 2.6, which are the smoothness
constraints on the surface and line process.
Let the observed data be denoted by d, then the problem to estimate f and 1
from d can be obtained by the MAP (maximum a posteriori) estimation. The MAP
estimate is obtained by using Bayes' rule and maximizing the posterior probability
P(F = f, L = 1 D = d). Assume the observation model is given by D = H(F) + N,
where N is zero mean white Gaussian random field with variance 0 for each variable
in n. After applying the Bayes rule, we can see that the MAP estimation is equivalent
to maximizing P(D = d F = f, L = 1)P(F = f, L = 1), and
P(D d F f, L 1) = (2a)"/2 expdHf)2 (2.11)
P(D = d F = f,L 1) = (2 a2)n2/2exp J2oI (2.11)
Combining equations 2.9 and 2.11, we can see that the MAP estimate can be obtained
by minimizing the following energy function
U(f, I d) = 1 I d H(f) 112 +U(f,l) (2.12)
1  d H(f) 12 + Vc(f Il) + Vc(1) (2.13)
a c c
Investigating equations 2.6 and 2.13, we can see that these two energy functions are
very similar. If we choose the parameters and the clique potential to make these
two energy functions identical, then, both the deterministic formulation and the
MRF formulation give rise to minimizing the same energy function. Therefore, the
correspondence between the two formulations is obvious.
The advantages of the probabilistic model over a deterministic model are two
fold:
1. The MRF model can include very general constraints on the line process, while
the deterministic model is restricted in the types of constraints considered.
The MRF model regards the imposed constraints as a prior probability, which is
determined by the clique potential. In fact, we can estimate the clique potential
functions for different object models instead of ad hoc or heuristic choices [54].
2. A probabilistic model can provide second covariancee) or higher order statistics
of the estimate in addition to the MAP estimate. The higher order statis
tical information is very useful in the context of dynamic vision [74], sensor
fusion, and integration of multiple modules [51]. This can not be achieved by
a deterministic model.
Therefore, it can be concluded that the MRF model is more versatile than the
deterministic models. The efficient computational algorithms proposed in this thesis
can be applied to solve the minimization problems for both the deterministic and
probabilistic formulations.
2.3 Variational Formulations for Some Early Vision Problems
From the previous two sections, we can see the link between the regulariza
tion and MRF formulations, which lead to the minimization of equivalent energy
functions. In this section, we give the deterministic formulation of some early vi
sion problems namely, surface reconstruction, shape from shading and optical flow
computation. The corresponding probabilistic formulation can be directly obtained
from the equivalence between these two formulations [51]. For ease of exposition, the
location of surface and orientation discontinuities (if any) are assumed to be known
in the below discussion. However, we will discuss the issue of discontinuity detection
in chapter 7.
Variational formulations for various early vision problems have been reported
in [79] and references therein. These formulations make use of the popular theory
of regularization. In a regularization framework, generic smoothness assumptions
are imposed on the solution space prior to attempting any functional minimization.
The smoothness constraints are well characterized by a class of generalized multi
dimensional spline functionals [80]. The formulations involve minimization of an
energy functional , which is the sum of the energy contribution from the smoothness
constraint (S) and the data constraint ('P) (see [80, 75] for details of this formulation).
2.3.1 Surface Reconstruction
For the surface reconstruction problem, we are faced with recovering surface
shape from either sparse or dense range data. For a first order stabilizer, the smooth
ness constraint is given by
S(v)= J (v + v2) dxdy, (2.14)
where v(x, y) is the admissible function and vx, vy its partial derivatives assumed to
be small. For the case of a second order stabilizer, we have the following smoothness
constraint
S(v) = J 2(x + 2v + V2) dxdy, (2.15)
The first and second order stabilizer can be combined to form the controlledcontinuity
stabilizers [80]. To the smoothness constraint, we add the data constraints in the
form of penalty terms. The following penalty term which measures the discrepancy
between the surface and data weighted by the uncertainty in the data may be used,
P(v) = ci (v(xi, y,)di,)2 (2.16)
Where, di are the depth data points specified in the domain 1 and ci are the un
certainty associated with the data. The goal is to find a u that minimizes the total
potential energy (v) = S(v) + P(v).
The numerical solution to the above problem is computed by discretizing the
functionals S(v) and P(v) using finite element techniques [75]. The resulting energy
function is a quadratic in x (a discretization of v) given by, E(x) = 1x'Kx xTb + c.
Where, K is a very large and sparse N x N matrix [79], with N being the number
of discretization nodes. The minimum x* of this energy function is found by solving
the large sparse linear system Kx = b. We present two very fast solutions to this
problem in this thesis. One is a generalized capacitance matrix algorithm and the
other is a novel adaptive preconditioning technique, which will be given in subsequent
chapters
2.3.2 Shape from Shading
Numerous solution methods using variational principle formulations of the shape
from shading (SFS) problem have been proposed. For a comprehensive set of papers
on this topic, we refer the reader to the book edited by Horn and Brooks [39] and the
work in [79, 72, 38, 76]. In this problem, it is required to recover the shape of surfaces
from image irradiance which depends on surface geometry and reflectance, scene
illuminance and imaging geometry. The image irradiance can be expressed directly as
a function of the surface orientation if illuminance, reflectance and imaging geometry
are assumed to be constant. Two deterministic formulations for SFS problem are
discussed in the following.
PDE formulation
The shape from shading problem is commonly posed as a nonlinear, first or
der partial differential equation in two unknowns, called the image irradiance equa
tion, E(x,y) = R(p,q) [39], where E(x,y) is the image irradiance at a point (x,y),
p = v., q = vy are first partial of the surface function v(x, y), and R(p, q) is the re
lation between surface orientation (p, q) and image irradiance E(x, y). To overcome
the ambiguity caused along occluding contours in the gradient space parameteriza
tion (p, q), a reparameterization of surface orientation in terms of a stereographic
mapping: f = 2ap, g = 2aq with a = 1/(1 + /1 + p2 + q2) is used. With this repa
rameterization, the overall energy function (sum of stabilizer and data energies) to
be minimized is expressed as
(fg) = A (f. +f) + (g + g)dxdy+ J [E(xy)R(fg)2dxdy. (2.17)
The Euler Lagrange equations which express the necessary condition for a minimum
are given by the system of coupled PDEs
Af = A[R(f,g) E(x,y)]Rf (2.18)
(2.18)
Ag = A[R(f,g)E(x,y)]R,
The above equations may be discretized and solved using the algorithm of Sim
chony et al. [72], which enforces integrability condition on p and q. The method
involves solving three different discretized Poisson equations namely, Kxj = bi,
Kx2 = b2, and K1x3 = b3, where K is the stiffness matrix obtained by discretizing
the Laplacian operator (on the LHS of the equation 2.18) with Dirichlet boundary
conditions and K1 is the stiffness matrix obtained by discretizing the Poisson equation
for the height from orientation problem i.e.,
AZ = p. + qY (2.19)
with Neumann boundary conditions. The vectors b, and b2 are the appropriate
RHSs obtained by discrete computation of the RHS of equation 2.18 at the current
estimated value of (f,g). b3 is the discrete form of the RHS of the above height from
orientation equation with the estimated (p, q) from equation 2.18. The structure of
the stiffness matrices K can be analyzed using the computational molecules [81]. In
this dissertation, we apply our generalized capacitance matrix algorithm to solve the
above discretized Poisson equations.
Energy minimization formulation
The other formulation for the SFS problem is to minimize the total energy func
tional containing the data constraint, the integrability constraint and the smoothness
constraint. The minimization is taken with respect to the surface orientation (p, q)
and the surface height Z altogether. The total energy functional (Z, p, q) is the sum
of the smoothness constraint and the penalty terms in this problem, i.e.,
(Z,p,q) = pdxdy+ tp)2+(Zq)2dxdy
// (p . 4(q "4q)xy 24 /Z p2Z )dd
+ JJE(xy) R(pq)]2dxdy.
The first term in the above equation is a weak smoothness constraint and the coeffi
cient A controls its contribution toward the final solution. The second term enforces
the integrability constraint, ensuring that the gradients (p, q) of the recovered surface
Z match those of a valid surface. Zx, Zy are the first partial of the surface Z(x, y).
The above energy can be discretized directly by using finite element techniques
however, the same set of discrete domain equations can be arrived at via the use of
finite difference approximations for this application [76]. The discrete energy is given
by
E(Zp)2 4 )2+ (pZ2+p,+q+ 220)
E= (E(xy)R(p,q))2+! P+(Z 2 % (2.20)
where the summations are over all the pixels in the image. In vector notation, the
above equation can be written as
E(u) = TKu bu+C (2.21)
Where, u = [zTpTqT] is a concatenation of the vector representation of the fields
Z, p and q. The minimum of the above quadratic is obtained by solving Ku = b with
Ku containing linear u terms and the constant and the nonlinear terms form the b
vector. This is an (N x N) sparse nonlinear system which is usually solved using
iterative methods. These iterative methods to solve the nonlinear system normally
have the following procedures in each iteration: (1) compute the b vector using
the current u, (2) update the solution u by treating b as a constant. In the past,
multigrid techniques have been employed in solving this nonlinear system [79] wherein
a hierarchy of problems at different resolutions needs to be explicitly constructed.
Szeliski [76] presented a fast shape from shading algorithm which employs hierarchical
basis for preconditioning the conjugate gradient algorithm used to solve the above
nonlinear system. Unlike the multigrid methods, this technique does not require
the explicit construction of a multiresolution pyramid of problems. In this thesis we
apply our adaptive preconditioning technique that leads to a faster convergence than
the one in Szeliski [76]. Our method makes use of the spectral characteristics of the
data constraint, the imposed smoothness constraint and the integrability constraint.
This spectral function is then used to modulate the frequency characteristics of the
chosen wavelet basis leading to the construction of the preconditioner.
2.3.3 Optical Flow Computation
A third early vision problem that we consider in this dissertation is the compu
tation of optical flow from an image sequence. Many techniques for computing optical
flow have been proposed in literature. These can be classified into gradientbased,
correlationbased, energybased, and phasebased methods [3]. Recently, Barron et
al., [3] have presented a comprehensive survey of various optical flow computation
techniques along with a quantitative comparison of them. We note that the survey did
not do any comparison of the computation time required by the various algorithms.
Two new and robust algorithms namely, a modified gradientbased algorithm
and an SSDbased regularization algorithm, for accurate optical flow computation
are presented in this dissertation. In the modified gradientbased algorithm, we
propose to selectively combine the image flow constraint and the contourbased flow
constraint into the data constraint in a regularization framework to amend the errors
in the image flow constraint caused by the brightness discontinuities. The image flow
constraint is disabled in the neighborhood of discontinuities, while the contourbased
flow constraint is active at discontinuity locations. The SSDbased regularization
algorithm uses the SSD measure as the data constraint in a regularization framework,
which leads to a nonlinear optimization problem. These two different regularization
formulations are described below. These two robust algorithms will be discussed in
details in chapter 6. In this section, we briefly describe our modified gradientbased
formulation.
Our modified gradientbased formulation modifies the Horn and Schunk for
mulation [40] to incorporate a separate condition along discontinuities in the image
intensity [35]. This modification drastically enhances the performance of our method
over other gradientbased methods because the optical flow constraint equation is in
valid along intensity discontinuities and causes large errors in the computed flow. We
get drastically better and more robust estimates of optical flow than any gradient
based optical flow technique reported in Barron et al. [3].
Our variational formulation of the optical flow problem involves minimizing a
combination of the optical flow constraint equation with a global smoothness term
and an additional constraint term along edges leading to
I a((7VEu)+Et)2+A(\\ Vu 11' + I1 Vv II )dx+ I3((V(V2G*E)*u)+(V2G*E)t)2dx
(2.22)
Where u(x,t) = (u(x,t),v(x,t)) is the velocity field to be estimated, E is the image
brightness function, and V2G is the Laplacian of Gaussian operator [35]. The first
term is the gradient constraint which is active away from discontinuities, the second
term is the smoothness constraint on the velocity field with A controlling the contri
bution of the smoothness term and the third term is the flow constraint along edges
which is disabled away from the edges. A discrete version of the above energy can
be written as
+(E + EyU + Et)2 + A (2 + u 2) + (U + 2) (2.23)
Where, U and F form the discretized flow vector, U,, uy, v. and :, represent the dis
cretized partial of the velocity field, E., Ey and Et correspond to the discretized
partial in the x, y and t directions respectively collectively obtained from the
first and the third terms of the equation 2.22. Note that E., Ey, and Et are the
discretized partial of the brightness function E when the location of differentiation
is away from discontinuities, and they represent the discretized partial of V2G E
when the differentials are taken at the discontinuity locations. The minimization of
this discretized energy leads to solving a linear system of equations Ku = b where
K contains the discretized differential (smoothness) operator along with E., Ey and
the b contains the terms corresponding to the optical flow constraint equation. The
gradient constraint in the above energy function implies the weighting (E +  on
graien cnstain i th aoveenegyfuntio iplis he eigtig (.,+ E.) on
(U, vU) at each discretized spatial location. This weighting is not reasonable since the
gradient constraints at locations of high gradient are not necessarily reliable. In fact,
they tend to be unreliable because the brightness gradients close to discontinuities are
normally higher than those away from discontinuities. Therefore, we eliminate this
weighting for each gradient constraint via normalization. To have a uniform contri
20__ 2 
bution at each pixel from the flow constraint, we normalize the entries E., Ey, E.,EY
2 2
in the matrix K with E, + Ey.
To solve the above quadratic and convex energy minimization problem, we em
ploy our adaptive preconditioned conjugate gradient algorithm presented in this
dissertation. This novel adaptive preconditioning technique is applicable to any
gradientbased optical flow computation method that uses a smoothness constraint
involving the 2D Laplacian operator. The efficiency as well as accuracy of our adap
tive preconditioning technique and new optical flow algorithms will be demonstrated
through experimental results on synthetic and real image sequences in chapters 5 and
6.
CHAPTER 3
GENERALIZED CAPACITANCE MATRIX THEOREMS
3.1 Introduction
There are numerous applications of the ShermanMorrisonWoodbury formula in
various fields [30] [85]. These applications can be classified into two categories. One is
to compute the inverse of a matrix that has lowrank difference w.r.t. another matrix
whose inverse is available or can be efficiently computed; the other is to solve a linear
system whose matrix differs from a wellstructured matrix by a lowrank matrix and
very efficient methods exist to solve the linear system with the wellstructured matrix.
The latter case is also called the capacitance matrix algorithm. This algorithm has
been widely used for solving linear systems arising from the discretization of the
boundary value problems, primarily for the elliptic partial differential equations. In
this chapter, we state and prove some very general theorems which are then applied
in developing a new capacitance matrix algorithm in the next chapter.
When A E Rx", U,V E Rnxp, and both the matrices A and (I + VTAU)
are nonsingular, the ShermanMorrisonWoodbury formula [26] gives an expression
for the inverse of (A + UVT) as
(A + UVT)1 = A1 A1U(I + VTALU)IVTA1. (3.1)
The above assumptions imply that the matrix (A + UVT) is also nonsingular. The
matrix (I + VTA'U) is called the capacitance matrix and is denoted by C. Note
that a similar formula can be obtained for the inverse of (A + UGVT), where the
matrix G E RPP is nonsingular [30].
Let B = A + UVT, with A, B, and C being nonsingular, then the linear
system Bx = b can be solved using the traditional capacitance matrix algorithm,
which involves the following steps:
1. Solve AR = b for x.
2. Compute W = A1U by solving AW = U column by column.
3. Form the capacitance matrix C = I + VTW.
4. Solve C/3 = VTR for /3.
5. Compute the solution x = R W/3.
This capacitance matrix algorithm is derived directly from the ShermanMorrison
Woodbury formula, and therefore, it can be applied only when A, B, and C are all
nonsingular, making it unsuitable for general linear system problems. In addition, the
choice of a wellstructured matrix A which is close to the original matrix is limited
to be nonsingular.
Buzbee et al. [10] proposed a fast algorithm to solve the Poisson equation (
a secondorder elliptic PDE ) on irregular regions. Their algorithm used a domain
imbedding technique to imbed the irregular region in a rectangular region and utilized
a fast Poisson solver ( on the rectangular region ) which is based on the capacitance
matrix algorithm. The discrete Poisson operator on the rectangular domain is the
Laplacian matrix, which is singular with rank n 1. Therefore, they presented a
theorem to generalize the traditional capacitance matrix algorithm to the case when
rank(A) = n 1 and B is nonsingular. This generalization is based on projecting the
righthandsides (RHSs) of the linear systems appearing in the first and second steps
onto the range of A through a unit vector. In [58], a generalized capacitance matrix
algorithm was presented for rankdeficient matrices A and B. This generalization
is accomplished by establishing the relationship between the solutions of the linear
systems with matrices A and B using their singular value decompositions. The
resulting algorithm involves the projection of the RHSs of the linear systems with
matrix A onto its range through the eigenvectors in the null space of AT. In addition,
two auxiliary matrices A and B, which are the rankaugmented versions of A and B,
are constructed in this algorithm to apply the ShermanMorrisonWoodbury formula
directly. Hence, this capacitance matrix algorithm requires the computation of all
the eigenvectors in the null spaces of A, AT, B and Bw.
The capacitance matrix algorithm has also been employed as a discrete anal
ogy to the potential theory for partial differential equations by Proskurowski and
Wildlund [65, 66], and others. However, the framework of their capacitance matrix
algorithms is limited to solving secondorder elliptic boundary value problems.
The other possibility to generalize the capacitance matrix algorithm for singular
A and B is to directly apply a generalized inverse form for (A + UVT) to solving
the linear system. Some works on the extension of the ShermanMorrisonWoodbury
formula in equation 3.1 have been reported in [11], [34], [67], and references therein.
However, these generalized inverse forms of (A + UVT) have been derived under
very restrictive assumptions. In [11], a generalized inverse form was given for the
case when U and V are both vectors, which means B is a rankone modification of
A. Although the generalized inverse of a rankp modification of A can be obtained by
using the rankone modification generalized inverse formula iteratively, the resulting
algorithm is very inefficient in terms of storage and computation when the size of
the matrix A is very large. Henderson and Searle [34] derived several generalized
inverse forms for (A + UVT) under the assumptions that range(UVT) C range(A)
and both the matrices A and UVT are symmetric. Recently, Riedel [67] presented a
generalized inverse form for (A + UVT) with a very restrictive assumption, which is
implied by the condition rank(B A) < dim(null(A)). This condition implies that
Riedel's result at best can give a generalized inverse for a rankq modification of A,
where q < dim(null(A)). This condition becomes very restrictive and impractical
when the dimension of the null space of A is small. Although the aforementioned
generalized inverse formulas can be used to design the corresponding capacitance
matrix algorithms, the associated assumptions restrict them from being applied for
general matrices A and B.
In this chapter, we present several theorems which we call generalized capaci
tance matrix theorems that lead to a generalized capacitance matrix algorithm which
can be applied to solve very general linear systems. In the main theorem, there are
no restrictions on the rank of the matrices A or B and it can be applied to any linear
system as long as it has a solution. In addition, with this generalized capacitance
matrix theorem, we have ample freedom to choose a wellstructured matrix A for de
signing more efficient capacitance matrix algorithms. Unlike in [58], no construction
of auxiliary rankaugmented matrices is required since we do not use the Sherman
MorrisonWoodbury formula in our generalization, instead our resulting algorithm
uses only the eigenvectors in null(AT)\null(BT) to project the RHSs of the linear
systems with matrix A onto range(A). To achieve the projection of the RHS onto
range(A), we subtract the RHS along the vectors uj, 1 < j < mA m, instead of
subtracting along the eigenvectors in the null space of A as in [58]. By using our
projection technique, the sparsity of the RHS can be preserved, while the sparsity of
the RHS is usually destroyed after subtracting along the eigenvectors as in [58]. We
take advantage of this sparsity property to obtain fast numerical algorithms in our
work. Unlike in [58], the capacitance matrix and the capacitance matrix equation
are explicitly given in our generalized capacitance theorems and the algorithm. By
explicitly specifying the capacitance matrix and the capacitance matrix equation, it
is possible to modify the RHSs of these equations in accordance with the singularities
of A and B. With the structure of the capacitance matrix equation given explicitly,
it is easy to design an efficient numerical algorithm to solve this equation. When
comparing the generalized inverse forms of (A + UVT) proposed in [34] with our
generalized capacitance matrix theorems, we observe that the generalized inverse for
mula of Henderson and Searle [34] is a degenerate result of our theorem 2, which has
a more relaxed assumption, i.e. null(A) C null(B) and range(B) C range(A), than
the assumption, range(B) C range(A) for symmetric A & B, made in [34]. When
applied to solve linear systems using the capacitance matrix approach with the matri
ces A and B satisfying the aforementioned assumption, both the generalized inverse
formula in [34] and our theorem 2 lead to the same algorithm.
We will state and prove the generalized capacitance matrix theorems for different
types of relationships between the matrices A and B subsequently. In the next
chapter, the generalized capacitance matrix algorithm will be constructed based on
the theorems presented in this chapter.
3.2 Generalized Capacitance Matrix Theorems
We categorize the generalized capacitance matrix theorems according to the
type of relationship between the matrices A and B. The first theorem is for the case
when null(B) C null(A) and range(A) C range(B), while the second theorem is
for the assumption null(A) C null(B) and range(B) C range(A). There are no
assumptions on A and B in the third theorem. The generalized capacitance matrix
theorem for the first case is stated as follows.
Theorem 1 Given B = A + UGVT, where A,B E Rnxn, G E Rpxp is nonsingular,
U = [ui,..., up] E Rnxp, V = [v1,...,Vp] E nxp, with both the sets {ul,...,up}
and {vi,... ,vp} containing linearly independent vectors. Assume null(B) C null(A)
and range(A) C range(B). Let rank(B) = n mB and rank(A) = n mA,
0 < mB <_ mA < n, then there exist two sets of orthogonal bases, {qi,... ,qmA}
and {q'1,... ,q'mA}, for null(AT) and null(A) respectively, such that null(BT) =
span{ qmAmB+, ... ,qa} and null(B) = span{ q'mAmB+,.. q,q'mA} Let qTiu. =
aj6(ij), ai j 0, for 1 < i,j < mA mB.
Let R be a solution to
MAMB qTb
A,=b E fU (3.2)
=1 q3Tu j,
where b E range(B). For 1 < i < p, let i7i be a solution to
mAmB qTut
= qjTu1
Ar/i= Ui quj (3.3)
and choose 1 ... r7aMA B to be linearly independent and the linear combinations of
q'l,...,q'mAmB Define the capacitance matrix C as
mAmB qTU
C=G1+VT77 E Ge Tu' (3.4)
j=1 q j~u
where 7 = [rl,.... ,7,], and ej is the jth standard basis vector in RP. Then, there
exists a unique solution 3 to the capacitance matrix equation
m lAmB qTb
C,3= VTx E Gej, (3.5)
and x ,q3 is a solution to the linear system Bx = b.
We will use the following proposition to prove this theorem.
Proposition 1 Under the assumptions in Theorem 1, we have vfq'j = uTq, = 0 for
1 < i < p and mA mB + 1 < j < mA In addition, VT[q'i ...q'mA,mI and
UT[qi ... qmAmB ] are of full rank.
Proof: From the assumption in Theorem 1, Bq'j = Aq'j = 0, mA mB + 1 < j <
mA. Since Bq', = (A + UGVT)q'j, we have UGVTqj = 0 for mA mB + 1 < j <
mA. From the assumption that U E nR"p, p < n, has full rank and G is nonsingular,
it is obvious that VTq'j = 0. Therefore, we have vTq'j = 0 for 1 < i < p and
mA mB + 1 < j < mA. Similarly, we can show that uTqj = 0, 1 < i < p and
mAmB+l
We use proof by contradiction to show that the matrix VT[q'i ... q'mAmB] is of
full rank. Let's assume this matrix is rank deficient, i.e., there exists a nonzero vector
aT = (ai,..., amAmB) such that VT[q'... q'A,n]a = 0. Let r = [q' ... q'MAM]a,
then r E null(A) and r is orthogonal to null(B). Postmultiplying both sides of the
equation B = A + UGVT by r yields Br = Ar + UGVTr. Since Ar = 0 and
VTr = 0, we obtain Br = 0. But, from the above, r is a nonzero vector orthogonal to
null(B), i.e. Br : 0, leading to a contradiction. Consequently, the assumption that
VT[q'i ... q'mAMBj is rank deficient is not true, implying that VT[q'i ... q'mAm]
is of full rank. Similarly, we can prove that UT[qi ... qmAmB] is of full rank. *
We are now equipped to prove Theorem 1. The proof is in four parts, and
we will show that (1) there exist two sets of orthogonal bases, {q1,...,qmA} and
{q',... ,q'mA }, for null(AT) and null(A) respectively, such that null(BT) = span{
qmAmB+1, *"',qmA} and null(B) = span{ q'mAMB+1'*.*' q'MA}, (2) there exist
solutions to equations 3.2 and 3.3, (3) the capacitance matrix C is nonsingular,
implying that equation 3.5 has a unique solution, and (4) B(x 17,3) = b.
Proof for Theorem 1:
(1) Since rank(A) = nmA and rank(B) = nmB, it is obvious from their singular
value decompositions that there exist mA orthogonal eigenvectors, 4b",... ,mA' for
A and mB orthogonal eigenvectors, f1,... 4,,i, for B such that null(A) = span{
01,"''", mA} and null(B) = span{ol,.. ., }. From the assumption null(B)
C null(A), we can write each 0j, i = 1,..., m, in terms of a linear combination of
the basis vectors 01,''... mA Thus, we have
: =T : (3.6)
where T E R'mBX A is of full rank. By using the Householder or Givens transforma
tion [26], we can obtain the factorization of T as follows:
T=[o T]Q, (3.7)
where 0 is an mB x (mA ms) zero matrix, T E RBXmB is nonsingular, and
Q e RmAXm is orthogonal. Define the vectors q'1,..., q'mA by the following relation
q'i
=Q (3.8)
q'MA 4OMA
We can see that {q'I,. q'MA} is a set of orthogonal bases for null(A) and its
subset {q'MAm,+1 ,..., q'mA} forms an orthogonal basis for null(B). Thus, the
existence of the basis {q'j,..., q'mA } which satisfies the aforementioned requirements
is proved. Similarly, we can prove that there exists an orthogonal basis {q, . qmA}
for null(AT) and null(BT) = span{qmAm,+,. ... qmA } is satisfied.
(2) From Proposition 1, we have uTqj = 0 for 1 < i < p and mA mB +1 < j <
mA. The assumption b E range(B) implies bTqj = 0 for mA mB + 1 < j < mA.
Combining these facts and the orthogonality of the eigenvectors qj, j = 1,..., ma, we
can see that the RHSs of equations 3.2 and 3.3 are orthogonal to qc for j = 1,..., mA,
thus, the RHSs are orthogonal to null(AT). Consequently, the RHSs of equations
3.2 and 3.3 are both in the range of A, which means there always exist solutions to
equations 3.2 and 3.3.
(3) To prove that C is nonsingular, we show that C/3 = 0 => (3 = 0. Let's
suppose C/3 = 0, then, from equation 3.4, we have
7 MAmB qTU/3
vTE Z G1e u G/3. (3.9)
j=1 qlu3
By definition, Bri/3 = Ar/3 + UVTf3. Substituting equations 3.3 and 3.9 into this
equation, we get Br/3 = 0. Thus, r7/3 E null(B). Since null(B) C null(A), we
have r7/3 E null(A), i.e. Ait/3 = 0. Substituting At7 by equation 3.3 and denoting
/3 = (/I,..., 3p), we obtain
p p mAmB qTu
=1 =l ?l qju,
(3.10)
After rearranging, equation 3.10 becomes
mAmB q J Um_ 3 P
qjul j=mAmB+l
luj = 0.
(3.11)
Since Ul,..., up are linearly independent, equation 3.11 imposes the following condi
tions on j3,
q~TU/3
Piu = ,J = l,...,mAmB,
#j = 0, j=mAmB+l,...,p.
(3.12)
(3.13)
Substituting these conditions into the RHS of equation 3.9, we obtain
MAB
I3 ^mE
=0.
(3.14)
VT[,l,..", 1 ,AMB]
For i = 1,...,mA mB, the RHS in equation 3.3 equals 0, thus, r77 E null(A).
From the assumption that h,. '' mA_7mB are chosen to be linear combinations of
ql,... ,q'mAmB and linearly independent, we can write [t7 ... mAr,MBI] in terms
of [q'l ... q'mAmB] as follows:
[7 ... TmAmBI = [q'I *... q'mAmBIT, (3.15)
where T E 3(mAmB)(mAmB) is nonsingular. By substituting equation 3.15 into
equation 3.14, we get
VT[q'I ... qjT : = 0. (3.16)
SPmA mEi
From Proposition 1, the matrix VT[q'li ... q'MAmBI is of full rank and is of size
p x (mA mB) with p > mA mB, which implies
T : =0. (3.17)
f PmAmB
Since T is nonsingular, this makes fli = 0, i = 1,..., mA mB. Combining this with
equation 3.13, we obtain 03 = 0. Thus concluding that C/3 = 0 implies /3 = 0,
therefore, C is nonsingular.
(4) Now, we will show that B(R r]0) = b. Substituting A + UGVT for B
and expanding, we have
B( 1f}) = AR Ay/3 + UGVTk UGVTf3. (3.18)
Rewriting VT7)3 using equations 3.4 and 3.5 and some algebraic manipulation, we
get
mAmB qTb mAmB qTU
VT^ 'VTjE Z  Gej G1j3 + E e,j/3. (3.19)
j=l qj uj .j=l qj"
Substituting for AR, At and VT1/3 from equations 3.2, 3.3 and 3.19 respectively,
equation 3.18 leads to B(x rq/3) = b. *
In theorem 1, we assume that the matrices U and V contain linearly independent
columns and G is nonsingular, which is equivalent to saying that U, V and G are all
of full rank, in addition, the condition qTuj = aii(i,j) must be satisfied for 1 < i,j <
mAmB. In fact, we can prove that given any two matrices A, BE Rn"n, there always
exist fullranked matrices U, V E Rn'P and G E RPXP such that UGVT = BA and
p = rank(B A). This can be verified by taking the singular value decomposition of
the matrix B A as UVZVT, where both U and V are orthogonal matrices and X is
a diagonal matrix with rank p = rank(B A). We can reduce the size of the matrix
X by retaining the p nonzero diagonal entries and taking out the remaining rows
and columns of zeros. This reduced matrix G E RP'P is nonsingular. In addition,
we take out the columns in V and V corresponding to the zero singular values in 7,
then the reduced matrices U, V E 3Zn'p are obtained. It is not difficult to see that
UGVT = B A and the matrices U, V and G are of full rank.
As for the condition qTuj = ai6(i,j), 1 < ij < mA mB, we can find such
a U by transformation. If {U, G, V} satisfies UGVT = B A and each is of full
rank, then the alternative set {UQ, Q1G, V} also satisfies these requirements when
the matrix Q 3Rpxp is nonsingular. From Proposition 1, UT[qi ... qmAmBI has
full rank. We can find a transformation matrix Q to make the upper (mA mB) x
(mA mB) block of the transformed matrix QTUT[qi ... qmlAm ] diagonal. Then,
the condition qfuj = ai6(i,j), 1 < i,j < 5MA mB, is satisfied.
There are infinite number of choices for the matrices U, V and G to satisfy the
aforementioned requirements. The above choice from the singular value decomposi
tion of B A is just a particular one that leads to a diagonal G matrix. This makes
the computation of G in equation 3.4 trivial. The choice of A, U, V and G is very
crucial for the efficiency of the numerical algorithms based on the capacitance matrix
algorithm to solve a linear system.
Now let's turn to the other case when null(A) C null(B) and range(B) C
range(A). Note that these two constraints are equivalent to each other when both
B and A are symmetric matrices as in [34].
Theorem 2 Given B = A + UGVT, where A,B E ?"nn, G E RpXp is nonsingular,
U = [ui,...,up] E R"Xp, V = [v1,...,Vp] E "nXp, with both the sets {ul,...,Up}
and {vi,... ,VpJ containing linearly independent vectors. Assume null(A) C null(B)
and range(B) C range(A). Let R be a solution to
(3.20)
where b E range(B). For 1 < i < p, let rh be a solution to
Ati = ui,
(3.21)
Define the capacitance matrix C as G + VT'q, where q = [q?,..., qp]. Then there
exist a solution (3 to the capacitance matrix equation
C63 = VTx,
(3.22)
and R rij3 is a solution to the linear system Bx = b.
We will use the following proposition to prove the above theorem.
Proposition 2 In Theorem 2, let {qi,... ,qmA} and {q',.., q,'mA} be the sets of or
thogonal eigenvectors in null(AT) and null(A) respectively, where mA = nrank(A).
Then, qfuj = q' Tvj =0, for i= 1,...,mA and j = 1,...,p.
Proof: From the assumption that null(A) C null(B), we can see that the eigen
vectors q'i, i = 1,... ,mA, are also in null(B). Since Bq', = Aq'i + UGVTq'i, we
obtain
UGVTq'i = 0,
(3.23)
for 1 < i p, and of full rank, consequently
GVTq', = 0. From the assumption that G is nonsingular, we have VTq, = 0.
Therefore, q'Tvj = 0 for i = 1,... ,mA and j = 1,... ,p.
Similarly, we obtain the equation VGTUTqi = 0 from the condition BTq, =
ATqj = 0 which is derived from the assumption range(B) C range(A). Using
the similar reasoning for this equation leads to the conclusion that qiTu = 0, i =
1,...,mAandj = l,...,p. m
We are now poised to prove Theorem 2.
Proof for Theorem 2 : The proof is presented in three parts namely, (1) we
will show that there exist solutions for equations 3.20 and 3.21, (2) we will establish
that there exist a solution for equation 3.22, and (3) show that B(x 1/3) = b.
(1) To show that there exist a solution for equations 3.20 and 3.21, we prove that
the RHSs of both equations are in the range of A. It is obvious that b E range(A)
for equation 3.20, since b E range(B) and range(B) C range(A). For equation 3.21,
from Proposition 2 we can see that ui, 1 < i < p, is orthogonal to null(AT), which
means ui E range(A) for i = 1,... ,p.
(2) As for equation 3.22, we need to show that VTR is in the range of C. We
use proof by contradiction in the following. Assume that VTR is not in the range of
C, then there exists a nonzero vector 7 E null(CT), i.e. CTy = 0, such that
7T(VyT) 5 0.
(3.24)
Using the generalized inverse 1 A+ of the matrix A, we can write R as follows.
mA
=A+b + aiq'i,
i=1
(3.25)
where mA = n rank(A), q'l,..., q',A are the orthogonal eigenvectors in null(A),
and a, E R, Vi. Then,
MA
VT5 = VTA+b + aiVTq'i.
(3.26)
From Proposition 2, we have VTq'i = 0, for i = 1,..., mA. Substituting into equation
3.26, we have VTR = VTA+b. Substituting for VTR in equation 3.24 yields
(A+TVy)Tb 5 0.
(3.27)
We can show that A+TVY E null(BT) as follows.
BT(A+TVy) = (A + UGVT)TA+TV7
(3.28)
(3.29)
= (ATA+Tv + VGTUTA+TV)
Since ATA+T I EI= qIqT, using a result of Proposition 2 namely, q'TV = 0
for i = 1,..., mA, we have
ATA+TV = V q'qV = V.
S=1 V = V.
A A 1
(3.30)
'Let the singular value decomposition of A be A = UEVT, where U and V are orthogonal
matrices and X = diag(O,...,O, m A+I,..., An). Then its generalized inverse A+ = VE+UT,
where 7+ = diag(O,..., i '... I) [26].
Substituting equation 3.30 into equation 3.29 yields
BT(A+TVy) = VGT((G1 + VTA+U)T^y (3.31)
Again, using Proposition 2, we can show that C = G1 + VTA+U. Consequently,
we obtain BT(A+TVy) = VGTCT7 = 0 since CTy = 0 by assumption. Thus,
A+Tv Y E null(BT). Combining this with the equation 3.27 leads to the conclusion
that b is not in range(B). This is a contradiction to the assumption b E range(B).
Therefore, the assumption that VTx is not in the range of C is untrue. Hence, we
can conclude that VTi E C. This proves that there exists a solution to equation
3.22.
(3) The proof for B(x rj3) = b is very similar to part (3) of the proof for
Theorem 1 and is omitted here. *
If we restrict the above constraints on the relation between B and A to null(B) =
null(A) and range(B) = range(A), then the capacitance matrix C is nonsingular,
i.e., the capacitance matrix equation has a unique solution. This result is stated in
the following corollary.
Corollary 1 In Theorem 2, if null(A) = null(B) and range(A) = range(B), then
the capacitance matrix C is nonsingular.
Proof: To prove that C is nonsingular, we show that C/3 = 0 =* 3 = 0. Suppose
C/3 = 0, then, from the definition of C, we have VTq/3 = G13, which can be
substituted into the equation Bi/t3 = Ai7f3 + UVTi/.3, along with the equation for
Ait from (3.21), to get Brj/3 = 0, i.e., r/3ft E null(B). Since null(A) = null(B), we
have Ar7/3 = 0. Using equation 3.21 for Atq, we get U/3 = 0. From the assumption on
the matrix U namely, u1,. . up are linearly independent, we conclude that /3 = 0.
Therefore, C is nonsingular. *
In the above theorems, we impose the constraints on the relationship between
B and A either via null(B) C null(A) and range(A) C range(B) or via null(A) C
null(B) and range(B) C range(A). Now we consider the general case, i.e., no con
straint on the relationship between B and A is imposed. The following theorem is
given for this general case and can be used for any matrices B and A in ?Inn as long
as there exists a solution for the associated capacitance matrix equation.
Theorem 3 Given B = A + UGVT, where B,A E RnXn, G E RPxp is nonsingular,
U = [ul,...,up] E fnxp, V = [vi,..., ,Vp] E Rnxp, with both the sets {ul,...,up}
and {vi,...,vp} containing linearly independent vectors. Let rank(A) = n mA,
dim(null(B) n null(A)) = m', and dim(null(BT) n null(AT)) = m, 0 < m',m <
mA < n, then there exist two sets of orthogonal basis, {qi, . qmA } and {q'l ..,M ,
for null(AT) and null(A) respectively, such that null(BT)nflnull(AT) = span{qMqAm+i,
..,qmA} and null(B) n null(A) = span{q'mAM,+l,... ,q'A}. Let rank(B) =
n MB, 0 < m',m < mB < n, then there exist mB m' vectors r'm+l,...,r'mB
such that the set {q' mAm'+1, ... q 'mA' r/m,'+1,. , r',B M} is an orthogonal basis for
null(B). Let qfuj = aij(i,j), ai, 0, for 1 < i,j < MA m.
Let R be a solution to
MIAM" oTb
AR=b E q= (3.32)
A b I qjTuUj,
j=l ?"
where b E range(B). For 1 < i < p, let ih be a solution to
mAm qTui.
Ai= ui E . uj, (3.33)
j=l qj uj
and choose .. .lrnAm to be linearly independent and linear combinations of q',...,
qmAm Define the capacitance matrix C as
mAM 1 ; qUT
C=G1+VT7 : Glej, T (3.34)
j=1 qj uj
where r7 = [771,...,1p], and ej is the jth standard basis vector in RP. Assume the
two sets of vectors {Ar'm,'+i,..., Ar'm,,} and {ul,..., up} are linearly independent,
then there exists a unique solution 3 to the capacitance matrix equation
mAm QTb
CO = VR G1e, (3.35)
C vT  qjTuj. ,
j=i l
and R r13 is a solution to the linear system Bx = b.
The proof of this theorem is omitted here since, it is similar to the proof of
Theorem 1. This proof will require the use of the fact that qTuj = 0, for mAm+l <
i < mA and 1 < j < p, and the matrix VT[q'I ... q'mAm] is of full rank.
In general, Theorems 1 and 2 can be regarded as the special cases for Theo
rem 3 with the exception of the additional assumption on the linear independence of
the two sets of vectors {Ar'm,'+i,..., Ar',,,} and {ul,..., up) made in Theorem 3 to
guarantee the uniqueness of the solution in the capacitance matrix equation. This as
sumption is automatically satisfied for Theorem 1 since the set {Ar'm'+i,..., Ar',m }
is empty and the vectors ul,..., up are linearly independent. However, there is no
such assumption in Theorem 2, which gives the existence of a solution to the capac
itance matrix equation instead of the uniqueness of the solution given in Theorem
3.
3.3 Discussion
In this chapter, we presented the generalized capacitance matrix theorems for
solving linear systems. Our generalized capacitance matrix theorems can be used for
very general matrices A and B, therefore the resulting algorithm can be applied to a
very broad class of problems and we have ample freedom to choose the matrix A ap
propriately for designing efficient numerical algorithms. The traditional capacitance
matrix algorithm [10, 65, 66] has been primarily applied to solve the secondorder
elliptic boundary value problems, whereas our generalized capacitance matrix algo
rithm can be used to solve anyorder elliptic boundary value problems. Also, we can
handle cases such as the shape from shading ( SFS ) problem on an irregular domain
with Neumann boundary conditions which specifically yields a singular K matrix.
This case can not be solved using the capacitance matrix technique of Buzbee et al.,
[10] as mentioned in [72] because, the capacitance matrix theorem of [10] requires a
49
nonsingular K. Our algorithm gives a solution to this problem (recovers a surface)
within a scale factor in height.
Previous works [11, 34, 67] on deriving the generalized inverse forms of (A +
UVT) were presented under restrictive assumptions. Although our generalized capac
itance matrix theorems are developed for solving general linear systems, it is possible
to derive the generalized inverse form of (A + UGVT) for more general cases based
on these theorems.
CHAPTER 4
GENERALIZED CAPACITANCE MATRIX ALGORITHM
4.1 Structure of the Algorithm
From the generalized capacitance matrix theorems presented in chapter 2, we
can generalize the standard capacitance matrix algorithm for general matrices A and
B as follows.
1. Solve equation 3.32 for it.
2. Solve equation 3.33 for q1, for i = 1,... ,p.
3. Form the capacitance matrix C from equation 3.34.
4. Solve the capacitance matrix equation (i.e. eq. 3.35) for f3.
5. The solution x = R Tq,.
This generalized capacitance matrix algorithm can always be applied to the cases
when null(B) C null(A) and range(A) C range(B) or when null(A) C null(B)
and range(B) C range(A), as stated in Theorem 1 and 2 respectively. However,
to apply this algorithm to more general cases, we need to make sure that a solution
exists to the capacitance matrix equation in step 4. Theorem 3 gives a sufficient
condition for the uniqueness of the solution to the capacitance matrix equation in
the general case.
Note that the step 2 of the algorithm requires the solutions 77,..., r7p in equation
3.33 with different RHSs. The computational cost to solve this equation p times is
very high when p is large. However, the vectors 7l, ... vp can be easily obtained
when the matrix A is chosen to be circulant Toeplitz and the vectors ui have a very
sparse set of nonzero entries. The circulant Toeplitz property of the matrix A makes
it shiftinvariant, which means AS(u) = S(Au) for all u E R" where S is a shift
operator [59]. A linear shiftinvariant operator can be completely characterized by its
impulse response, which is the solution of the corresponding linear system Au = e
where the RHS e is a unit vector. Thus, the solution rqi can be obtained via the
convolution of the impulse response for this shiftinvariant linear system with the
RHS in equation 3.33, which is sparse since the vectors ul,...,Up themselves are
sparse. The convolution of the impulse response with a sparse vector can be easily
obtained by a linear combination of the shifted impulse response vectors. When the
vectors vi, i = 1,... ,p, are also sparse, the computation of the matrix VT'i for
forming the capacitance matrix C can be further simplified by obtaining the (i,j)th
entry of VTq from the impulse response corresponding to the relative locations of
the nonzero components of vi and those of uI,..., umA,m and uj.
The direct computation of rq/3 in the final step of the generalized capacitance
matrix algorithm requires O(pn) operations, which are very costly when p, the rank
of difference between B and A, is large. Instead, we solve the following linear system
mAM qTUO
Ay=U/3 E 7u. Ui. (4.1)
j= q j
to get the solution y which has the same projection on the range of A as ir)3. As for
the projection of r/3 on the null space of A, it can be easily computed from the inner
product of 0 and ai = (a,,1,... ai,p), where aij = qT%, for 1 < i < mA m. The
coefficients ai can be chosen arbitrarily as long as the assumption that rn ... 7mA_m
are linearly independent and linear combinations of q',., q'mAm is satisfied. Thus,
r/3 can be obtained from the following equation.
mAm
r3 = y+ (a/3 )q (4.2)
i=1 q*qi j "
The flowchart of the generalized capacitance matrix algorithm is given in figure
4.1.
4.2 Choice of the Splitting
From the variational formulations for the early vision problems, we can see it
is necessary to solve linear systems of the form Kx = b, where the matrix K E
RNXN(N = n2) is the sum of the matrices Kd and AK, obtained from the data con
straint and smoothness constraint respectively. To apply the generalized capacitance
matrix algorithm to solve the linear systems, we have to choose a splitting scheme
for K at first. We discuss the nice choice of splitting K in this section.
Assume we split the matrix K into two components Ko and UVT. Note that the
matrices K and K0 can be replaced by the matrices B and A, respectively, used in
chapter 3. The notation K and K0 (popularly used in computational vision literature)
are used instead of B and A (widely used in the field of capacitance matrix methods)
Figure 4.1. The flow chart of the generalized capacitance matrix algorithm.
for the rest of this chapter. Using the generalized capacitance matrix algorithm to
solve the linear system Kx = b with the above splitting for K, we need to solve several
linear systems of the form K0x = b as shown in equations 3.2 and 3.3 respectively.
Especially, in equation 3.3, we have to solve it p times, where p is the rank of K K0.
Therefore, it is very crucial to choose a well structured K0 so as to efficiently compute
the solution in the capacitance matrix method.
In general, there are three criteria that dictate the choice of K0. The first is
the difference between K0 and K must be small in the sense that rank(K Ko) is
small. Secondly, Ko must possess some nice structure so as to have a fast numerical
solution to the associated linear system. Lastly, K0 should be translation invariant;
this property has the advantage of saving the computational cost in solving equation
3.3 for each ui, 0 < i < p 1, since there are only one or two nonzero components in
each ui. Therefore, we need to compute the solution h0 to a linear system consisting
of a matrix K0 and an RHS vector which is the projection of the unit vector onto
the range of K0. Because, ho is translation invariant, we can obtain the solution of
rq in equation 3.3 without having to solve any of the linear systems associated with
each ir.
For ease of exposition, we consider the splitting scheme for the surface recon
struction problem in the following. This splitting schemes can be extended to other
early vision problems.
The stabilizer discussed in the previous section imposes smoothness constraints
on the surface reconstruction problem leading to a Co surface i.e., the membrane
spline, a C1 surface namely, the thin plate spline, or a combination thereof called
the thinplatemembrane spline. As mentioned earlier, the numerical solution to the
surface reconstruction problem involves solving a linear system Kx = b. The matrix
K is large and sparse but not tightly banded. This is primarily due to the fact
that a vector has been used to represent a twodimensional array which leads to
representing neighboring interactions in twodimension by interactions between very
distant elements in onedimension. The matrix K is a sum of two matrices K8 and
Kd, with K, containing the membrane or thinplate molecules [81] defined on the
interior and boundary of Q. The Kd matrix is a diagonal matrix containing nonzero
weights.
By making use of the structure of the matrix K, we can split K into two com
ponents K0 and UVT, with K0 E RNxN and U,V E 3Np, such that K0 is close to
K, i.e., the rank of UVT is small, and there exists a fast algorithm to solve a linear
system with the matrix K0. Then, the solution to the linear system Kx = b can
be obtained by using the generalized capacitance matrix algorithm which involves
solving the wellstructured linear systems with the matrix K0 and the capacitance
matrix linear system.
The choice of Ko depends on the data compatibility matrix Kd and the smooth
ness constraint matrix K,. We have different choices of K0 for sparse (scattered) and
dense data cases. They are given in the following sections.
4.2.1 Dense Data Surface Reconstruction
We will first describe our choice of the matrix K0 and the equivalent Lyapunov
matrix equation for the dense data surface reconstruction with a membrane spline.
Then we will present the matrix K0 for the thinplatemembrane spline surface re
construction based on the construction scheme for the membrane problem. The thin
plate spline is a special case of thinplatemembrane spline and will not be discussed
separately. In addition, the selection of the matrices U and V will be discussed for
both cases.
Membrane spline for the dense data case
The data compatibility matrix Kd for the dense data case with uniform weight
ing is an identity matrix. Focusing on the surface reconstruction problem in a rectan
gular domain, the matrix K0 is chosen to contain both the membrane computational
molecules and the data constraint molecules [81] all over the domain which are
shown in figure 4.2, Using this choice of K0, we can transform the original linear
system Kx = b to an equivalent Lyapunov matrix equation.
+ 1
(a) (b)
Figure 4.2. (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain Q.
With the above choice, the matrix K0 can be written as follows.
Ko = AK m +K AKemK, + I, (4.3)
where
D I I
I *". "'.
Kmem = ". ". (4.4)
*** I
I I D
4 1 1
1 4 1
D = ". e rn. (4.5)
1 4 1
1 1 4
The matrix Kmem, can be regarded as the result obtained from the 5point ap
proximation to Laplacian operator on a doubly periodic imbedded rectangular do
main. From a computational molecule [81] point of view, only the membrane molecule
is used at every node of the doubly periodic rectangular domain. Therefore the molec
ular inhibitions due to depth discontinuities (if any) and imposed doubly periodic
smoothness are included in the matrix UVT. Given the matrices K and Ko, we can
show that the choice of the matrices U and V such that UVT = K K0 where both
matrices U and V are of full column rank is not unique. Our selection method for
the matrices U (= [ui,..., up]) and V (= [vl,..., vp]) is as follows. Let each column
vector ui, 1 < i < p, contain only two nonzero entries +1 and 1 located at the two
neighboring sites separated by the ith discontinuity. Then select the column vector
vi to be Au_, 1 < i < p. Thus the matrices U and V are of full column rank p
where p is the number of discontinuities. Our selection scheme has the advantage
that the column vectors of U and V are very sparse (only two nonzero entries in each
column), a structure that is very conductive to the design of an efficient numerical
algorithm. We will discuss more of this in section 4.3.
We can see that this particular choice of K0 makes it circulant Toeplitz and thus
translation invariant. In addition, it can be decomposed as the sum of two tensor
products of matrices A and I, i.e.,
Ko =AI+I A, (4.6)
with
+ 2A A A
A +2A A
A= .. .. .. Enxn,
A + 2A A
A A + 2A
and is the Kronecker product. By using this special structure of the matrix K0, we
can rewrite any linear system of the form K0z = f as the following Lyapunov matrix
equation
AZ + ZA = F, (4.7)
where Z and F are n x n matrices corresponding to their concatenated n2 x 1 vectors
z and f respectively. It should be noted that n2 = N, and that the matrix A in
equation 4.7 is circulant Toeplitz and symmetric positivedefinite. We can use the
ADI (Alternating Direction Implicit) method to solve this Lyapunov matrix equation
in O(N) operations. We will prove this 0(N) computational complexity of the ADI
method in section 4.3.
Thinplatemembrane spline for the dense data case
The construction of the matrix K0 for the thinplatemembrane spline prob
lem is based on the above construction for the membrane spline. Analogous to the
matrix Kmem which is a wellstructured matrix and close to the smoothness con
straint matrix K, constructed for the membrane problem, a matrix Ktp is used for
the smoothness constraint matrix discretized from the thinplate smoothness energy.
The differentiation of the membrane smoothness energy function corresponds to a
Laplacian operator A, while the corresponding operator for the thinplate smooth
ness energy is a biharmonic operator A2. Thus, we can extend the selection scheme
for the membrane problem to the thinplate problem by choose the matrix Ktp as
Ktp=KeM. (4.8)
The smoothness matrix for the thinplatemembrane spline is a convex combination
of the membrane and thinplate smoothness matrices. Let's define the matrix Ktpm
as a convex combination of Kmem and Ktp, i.e.,
Ktp, = aKtp + (1 o)Kmem, (4.9)
where 0 < a < 1. Similar to that of the membrane spline, the matrix Ko is chosen
to be AKtpm + I, which can be written as
Ko = AaKem + A(1 a)Kmm +I
= (aKmem + I)(bK.em + I), (4.10)
where
A(l a) + A2(1 )2 4Aa A(l a) A2(1 )24A (4.11)
a, = 2 2(411
When A > (1)2, a and b are always real and positive. Then the matrix K0 can be
factored into two real matrices (aK.em + I) and (bKmem + I), which are of the same
form as the matrix K0 for the membrane dense data problem. Thus, the solution to
a linear system with the matrix Ko can be obtained by solving two Lyapunov matrix
equations, which can be solved using the ADI method in O(N) time. However, when
the above condition is violated, a and b as well as the factoring matrices in equation
4.10 become complex. It is more complicated to use ADI method to solve the complex
Lyapunov matrix equations and analyze the computational complexity. Therefore,
we limit our discussion to the case when the aforementioned condition is satisfied for
the dense data surface reconstruction problem.
Using the above choice for the matrix K0, we can see that the matrices U and
V encode the inhibition of membrane and thinplate computational molecules due to
the presence of depth and orientation discontinuities (if any) and the imposed doubly
periodic smoothness over the imbedded rectangular domain.
4.2.2 Sparse Data Surface Reconstruction
The splitting of matrix K for the sparse data surface reconstruction problem is
discussed in this section. The only difference between the splitting schemes for the
sparse data problem and the dense data problem is the treatment of the data com
patibility matrix Kd. For the dense data case discussed above, the matrix Kd is an
identity matrix and is arranged into K0, which can be rewritten as a Kronecker prod
uct of two wellstructured matrices. For the sparse data case, the nonzero elements of
the matrix Kd are sparsely distributed along the diagonal, thus every nonzero entry
of the sparse diagonal matrix Kd is decomposed into the outer product of ul and v
and included into U and V as follows:
U = U1 ... Uk U1 ...u
V = Avi ... Av,, V1 ... V'k,
where the vectors ul,..., u V, vi,...,vk account for the molecule inhibition due
to depth and orientation discontinuities (if any) and the imposed doubly periodic
smoothness over the embedded rectangular domain. With the above choice of U and
V, each column of U and V is still very sparse, containing only one or two nonzero
entries. Our numerical algorithm will take advantage of this sparsity property.
Similar to the dense data case, the matrix K0 can be decomposed as the sum of
two Kronecker products of AA and I. Therefore, a linear system of the form Koz = f
can be written as the Lyapunov matrix equation AZ + ZA = F.
4.3 Numerical Solution
In this section, we discuss the convergence of ADI method applied to the Lya
punov matrix equation and show that the computational cost for solving this equation
is 0(N) by proving that the ADI method can converge in a constant number of it
erations independent of N. In addition, we present a modified biconjugate gradient
method that is used in solving the capacitance linear system.
4.3.1 ADI Method for the Lvapunov Equation
The ADI method for solving a Lyapunov matrix equation is described in [48].
For any Lyapunov matrix equation of the form AX + XA = B, the ADI method
involves the following steps in each iteration j = 1,2,..., J,
(A+pjI)X,_ = BXj_,(ApjI) (4.12)
Xj(A + pjI) = B (A pjI)Xj (4.13)
For our problem, note that the matrix A is circulant Toeplitz and symmetric positive
semidefinite as discussed in section 4.2. The special structure of matrix A helps in
advancing the ADI iterations very efficiently in each step of the technique. On the
lefthand side (LHS) of the equations 4.12 and 4.13 respectively, the matrix A + pjIl
is very close to being tridiagonal except for the nonzero entries at the corners namely,
the entries (1,n) and (n,1). These nonzero corner entries are caused due to the doubly
periodic boundary conditions discussed in the last section. In addition, A+pjI is SPD
for any positive parameter pj. Therefore, we can compute the LU decomposition of
this matrix and then use the forward and back substitution to compute the updated
solution. Due to the the special structure of the matrix A +pjlI, the solution update
requires only 0(N) time per iteration.
We now examine the convergence of the ADI method for the Lyapunov matrix
equation in our algorithm. Let the initial Xo = 0 (zero matrix), and AXt = Xt X*,
where X* is the true solution of the Lyapunov matrix equation, AO0, Ai,..., Ani, and
qo, q, ..., qn be the eigenvalues and the eigenvectors of A respectively. For this
symmetric matrix A, we have
A = QDQT, (4.14)
where Q = [qo q, ... qn,], D = diag[Ao, Ai,.., An1], and Ak = 4sin2( ')
for k = 0,..., n 1. We express the eigenvalue Ak in this form so as to have nonde
creasing eigenvalues with respect to the indices. The range of eigenvalues is between
0 and 4, and the zero eigenvalue (when k = 0) is simple. The orthogonal matrix
Q contains the discrete sine or cosine function in each column with increasing fre
quency corresponding to increasing eigenvalue, i.e., the index increases. For example,
the eigenvector corresponding to the simple zero eigenvalue is a flat function whose
frequency is 0.
By taking the tensor product of the eigenvectors, qk 0 qj, k, 1 = 0,..., n 1, we
can generate an orthogonal basis for R" and express the true solution X* in this
basis as follows:
ni n
X*= akL(qkq). (4.15)
k=O 1=0
Subtracting each of the equations 4.12 and 4.13 from the Lyapunov equation AX + XA
= B, and expressing X in the above tensor product basis, we have the following result
for the error after t iterations,
AXt ni' r tk A)(,p) aki(qk l. (4.16)
A ~=Z P (1+) ],+p
k=0 1=0 = Ak+ q)(.j
The function inside the bracket is the error reduction factor in the basis qk 0 qj.
Notice that the Lyapunov matrix equation has an infinite number of solutions due to
the singularity of the matrix A. The error reduction in each step is always less than
1 for eigenvectors with positive eigenvalues and positive parameters pj while it is
always 1 for the basis qo 0 qo (which is evident from equation 4.16 after substituting
k = 1 = 0 and Ao = 0). A solution to the Lyapunov matrix equation can be obtained
without reducing this factor at the component k = I = 0.
The classical ADI minimax parameter problem is to find the ADI parameters pj
to minimize the maximum of the error reduction function excluding the zero eigen
value, i.e., to minimize the function
t (Ak Pj) A, PJ
t = max I (=)(P) (4.17)
=t m ax f 'k'k+ Pj t+ Pji
4V(k,)\(,O) j= p Al + p
Applying the result from the classical ADI minimax analysis [48], leads to an 0(log N)
iterations for convergence [85]. This in turn yields an overall computational complex
ity of 0(N log N) for the ADI method since, 0(N) time is required per iteration in
the ADI method. An O(N log N) computational complexity is unacceptable for our
purposes (recall that we seek an 0(N) solution). Thus, in order to design an 0(N)
algorithm, it is essential to use the additional knowledge about the problem e.g.,
spectral characteristics of the differential operator in the PDE. The classical ADI
convergence analysis does not make use of any such knowledge. In this thesis, we re
formulate the ADI convergence analysis by incorporating the spectral characteristics
of the Laplacian operator. This is facilitated by making the following assumption.
Assumption 1 Let the projection of B, the RHS of the Lyapunov matrix equation, on
the basis qk 0 q1 be bkl. Assume there exist a constant a E 3+ and a Wm, > 0 such
that ,V(k,l)\(O,O), Ibkl < (k for some m > 0.5, and EkZE( .6k+2) > aw
 (k2 +12)m,,,2
For our Lyapunov matrix equation AX + XA = B, the relationship between the
solution X and the data B in the eigen basis of the matrix K0 is aki = jk+J'
Therefore, the assumption IbklI < (w2) IakM < J (Ak+k)"2 for some m >
0.5. Using the relation Ix < sin x < x for x E [0,7r/2], we have clk2 < Ak < c2k2
with c = 4 and C2 = 42. Thus, requiring that the solution satisfy the constraint
\akil < (k f2 for some m > 0.5.
(k2+J2)M+I
This assumption can be given the following spectral interpretation. The smaller
values k or 1 correspond to lower frequencies along vertical or horizontal direction, and
vice versa. Consequently, the assumption means that the overall frequency content
of the input data or the right hand side can be bounded by an increasing function
of 11w12 = (k2 + 12)0.5 of degree less than 1. This can be easily satisfied when the
overall frequency distribution of the input data is decreasing or uniform. This is true
in reality since, the low frequency content in signals is normally much higher than
the high frequency content.
We first define two matrices X' and AX't for use in the convergence analysis
of the ADI method. Let X' and AX't be the projections of the true solution X*
and the error in computed solution (in iteration t) AXt on the range of the matrix
K0 ( = AI + IA), respectively. Note that, X' and X* are almost the same
except that X' doesn't have projection at the basis qo qo. In the ADI literature
[48], the convergence criterion is normally defined as follows: The ADI method for
the Lyapunov matrix equation is said to have converged when the relative Frobenius
norm AXj is reduced to within a specified error tolerance c lying between 0 and
1, i.e., IIAX'tlf <: CIX'F. This means we don't need to reduce the error component
in the basis corresponding to the zero eigenvalue or in the null space of K0. The
problem is to find the number of iterations required in the ADI method and the
associated ADI parameter set to achieve IIAX'IF < :EllX'IIF.
We make use of this convergence criterion and assumption 1 to reformulate the
convergence theory of the ADI method as follows. The Frobenius norm IIAX'tlF can
be written as,
JJAXtJJ2a~lt (Ak p Pj(l pj)2
IIAX'II' = Ea4 ( ')2( )2 (4.18)
k 1 j=l Ak + pj Al "+p
where the double summation takes every k and 1 between 0 and n 1 except k =
1 = 0 (simultaneously). As a consequence of assumption 1 the ADI method for our
Lyapunov matrix equation can converge in a constant number of iterations. We state
and prove this convergence result in the following theorem.
Theorem 4 Given assumption 1 and an error tolerance e between 0 and 1, there exists
a constant integer t such that Vn, IIAX'tlIjF EIIX'IIF.
67
Proof: From the discussion on assumption 1, we have the following inequality for
IIX'IIF.
(4.19)
X2 2 > bkI )2 > 0aW2
' k IyAk+ A, 16 kI V 16 'n
Applying assumption 1 to equation 4.18, we have the following inequality
2 f.Akp., A(Pj )2
IIAX 'lII2 < W A Wmkl('k + PJ 2 ___q _
(kl +)m(A, )+)) A.+P. A,+p_)
21 1
 n ((k)2 + ()2)2m(Ak + A)2q()R(pp2,..
where q is a positive number between 0 and 2m + 1 and
R(PP2 ..1 tP = Pp)2(A p)2.
(A + A). Ak + pj A, + pj
(4.20)
Using the fact that Ak = 4sin2(1Y71) > 4(k)2 for k > 0 and substituting into the
above inequality, we have
2 {
j4 2q/t jm2 < m
42qn4m2k
1 ( p 2,...,Pt)
((k)2 + ()2)2m+2q)
Then we have,
IIAX',II < V. V.1 1 1
' 42qn4 2 " L((k)2 + (L)2)2m+2q T2
k I vn7 ^n'
} max R(pi,p2,...,pt).
V(k,l)\(O,O)
(4.22)
Note that the double summation on the RHS of equation 4.22 takes every integer k
and I between 0 and n 1 except k = 1 = 0 simultaneously. Let the set of all the
(4.21)
pair (k, 1) in this double summation be denoted by A. By using the split in the set
A, we have
1 1 n1 ni
EEf(k,l)1 = {f(O, 1)+f(1, O)+f(1, 1)+ f(k, O)+ / f(O, l)+ E f(k,l)},
k I k=2 1=2 (k,l)EB
(4.23)
where f(k,l) = 1!2+ and B = (k,l)\(l,l) k,l= 1,.. ,n 1. By using
((k)2+(L)22m+2q "
the fact that f(k, 1) is a strictly decreasing function of k and 1, we can bound the
terms with summation in equation 4.23 by the corresponding terms in an integration
form as follows.
f(k,1) < {f(0,1) + f(1,0) +f(l,1)1+ f )dx
k I n
4n f(0, y) dy + L 2 + y2)2m+ dxdy, (4.24)
where the integration domain S = {(x,y)I(x,y) E [0,1] x [0,1]\[0, ) x [0, )}. By
transforming the Cartesian coordinate to the polar coordinate, the term with double
integral in equation 4.24 can be bounded by the following integral,
j r2 1 rdrdO r ) 4m+22q 2m+l)
Sr4m+42q 4(2m+ 1 n 2
~ 7r n4m+22q (4.25)
< 4(2m + 1 q)
Substituting equation 4.25 into equation 4.24 and calculating the integrals in equation
4.24, we obtain the following inequality
E E 1 1
k n(( )2 +r
where f=2 + 2+2q + 4m+32q + 74(21q) is a constant.
Using equations 4.19, 4.22 and 4.26, the convergence criterion IIAXtIF < IEjX'IIF
can be met with by the following derived condition
max R(pp2,. ..P Pt) < ' n2q = Mn29, (4.27)
V(k,I)\(0,0) 0
where M is the constant defined as M := 24 Furthermore, we can bound the
function maxk,l\(o,o) R(pi,p2,. .. ,pt) by a function of Ak as follows:
max R(pl,...,pt) = max ( 1" Ak P )2)AtI p)2)
V(k,I)\(o,O) V(k,\o)\ (Ak + A). (= Ak + pj j=1 +pj
< max 1t Akp2
The maximum taken from the values of function R at the nonzero eigenvalues,can
be bounded by using the maximum of the same function in an interval containing all
the nonzero eigenvalues. Then the convergence can be reached by using the following
modified requirement
max 1 J( )2 < Mn2q. (4.28)
A=4sin2()
For any positive pj, the function I 1 (j )2 is always less than the righthand
side when x > r. Therefore, in equation 4.28, the interval (4sin2(*),4) can be
Mqn2
replaced by S = (4sin2(),min(4, ')). The convergence requirement in equation
Mn n2
4.28 can be made less stringent by using the inequality,
1 [xP,)2t1
max ,( )2 < ( (x P)2)(Max), (4.29)
rx X ES es X ++ i pj xES Xq
leading to the following modified requirement which when satisfied automatically
makes the inequality 4.28 true.
t Xp3 2 1 2 ()) =MI
max I ( ) < (Mn2)( _) = (Mn2)(4 sin2)) = (4.30)
xES j=x + Pj maxxEs n
This bound M' will approach a constant when n + oo. Thus, the problem becomes
one of finding the number of iterations t and the parameters pj such that the re
quirement in 4.30 is satisfied. This requirement is of the same form as the classical
ADI minimax problem which was solved by Jordan as quoted in [87]. We can use the
result in [87] to determine the number of iterations required for convergence. The
number of iterations needed to meet this requirement is
J = [ 7M2 k (4.31)
k' = (4.32)
v, + =v( (3 1
v= _(V+ ), (4.33)
2 V
where, [z] denotes the smallest integer larger than z and v is the expansion factor of
an interval (a, b) defined as . The expansion factor v of the interval of interest (S)
is fixed when n approaches infinity, i.e.
limv= 1 (4.34)
nlroo 47r2Ml/q.
In the classical ADI minimax formulation, v is the spectral radius of the eigenvalues
being considered in A, which is shown be of the order of n2 [85]. Since we use
a different formulation for convergence and impose the spectral assumption on the
rin(4,. =)
data, the resulting v becomes m n When n + oo, the ratio v approaches a
*n'
constant as given in equation 4.34 and M' also approaches a constant as mentioned
above. Therefore, we conclude that lirn.oo J is a constant, i.e. there is a constant t
such that IIAX'tIIF < eClX'DF is satisfied for all n. m
In [87], an optimal set of ADI parameters for the general ADI iterations was
given. We may use these parameters to solve the minimax problem in equation 4.30.
Although this set of ADI parameters is not the optimal set to meet our convergence
requirement, it provides us with a parameter set which can achieve the convergence
criterion in a constant number of iterations. Therefore, we use the result in [87] to
select a suboptimal ADI parameter set. Solving for the optimal ADI parameters in
our problem requires complicated analysis using transformation theory [87] and is
currently under investigation.
4.3.2 Modified BCG Method for the Capacitance Matrix Equation
In this section we present an algorithm to solve the capacitance matrix equa
tion defined earlier. The conjugategradient(CG) method is a very powerful iterative
scheme to solve the symmetric positive definite linear system. A natural generaliza
tion of the CGtype method to solve a general nonsymmetric linear system is called
the biconjugate gradient(BCG) method [46]. The linear system with the capacitance
matrix is a nonsymmetric and indefinite system. The size of the capacitance ma
trix is determined by the rank of the matrix UVT, which contains the difference
between K and K0. With the Ko proposed in section 4.2, the size of the associated
capacitance matrix is usually 0(n). The BCG method requires the computation of a
matrixvector product in each iteration. Since the capacitance matrix C is dense, this
matrixvector product takes 0(N) operations. In this section, we present a modified
biconjugate gradient method to solve this linear system by applying the wavelet trans
form technique to approximate the multiplication and reduce the computational cost
to 0(n). Since the BCG converges in O(n) iterations, the computational complexity
for the modified BCG method is 0(n2) = 0(N).
The BCG algorithm [46] for the nonsymmetric linear system CQ3 = f is as
follows.
0. Choose 30, and set q0 = ro = f C30; Choose io # 0, and set 4o = ro;
1. Compute = T D T N D
1. Compute ak rkrkl,ak = qk_1Cqk1, and ak = ak/ak,
2. Update 13k = /3k1 + akqk1;
3. Set rk = rk akCqk1, and rk = 41 akCTc4k1;
N TN N.
4. Compute pN = ik.Trk, and Pk = pk/ak,
5. Set qk = rk + pkqk1, and qk = rk + Pkcqk1;
6. If rk P 0 or rik 0, stop; else go to 1.
Like the CG, the operations and storage requirement per step in the BCG are con
stant. The convergence behavior is similar to the CG method except that the BCG
algorithm is susceptible to breakdowns and numerical instabilities. To be more spe
cific, division by 0 (or a number very close to 0) may occur in computing ak and
pk. These two different breakdowns may be avoided by using the lookahead Lanczos
algorithm or the quasiminimal residual(QMR) algorithm [19].
In the above algorithm, two matrixvector multiplications Cqk1 and CqkI are
required in each iteration. Since the matrix C is dense, the direct multiplication will
take 0(N) operations, which is not acceptable to our pursuit of an O(N) algorithm
for the Poisson equation. Thus, we propose to carry out the multiplication in a
wavelet basis to reduce the computational cost [1, 7]. With appropriate wavelet basis
[7], the decomposed matrix can be approximated to high accuracy by a sparse matrix.
After the approximation, the multiplication of this sparse matrix and the decomposed
vector can be reduced to 0(n) or O(nlog(n)) operations [7]. Consequently, the
modified BCG method is computationally efficient.
The wavelet transform of a matrix D and vector q can be written as follows:
D = DpT, =
where 0 is the wavelet transform matrix with each row corresponding to a wavelet
basis vector. If we use the orthogonal wavelet transform, then Dq = 0r(fTj).
Therefore, we compute matrixvector multiplication in the wavelet domain and then
transform back to the original domain by using the wavelet reconstruction (inverse
wavelet transform) to obtain the product Dq.
It is very crucial to know the structure of the capacitance matrix to achieve an
accurate and efficient approximation of the matrix. In equation 3.4, the capacitance
TT
matrix is composed of three matrices, i.e., the identity matrix I, E'=o1 ej+lqTU
and VTr. The first two matrices are extremely sparse, but the third is a dense
matrix. When computing the matrixvector product of the capacitance matrix and
a vector, we directly compute the separate products for the first two sparse matrices
while the aforementioned wavelet approximation scheme is used only for the matrix
VT1.
4.4 Experimental Results
We have successfully applied our generalized capacitance matrix algorithm to
a host of early vision problems which use the membrane and/or thinplate spline
to constrain the solution [85, 45]. These early vision problems include the surface
reconstruction, shape from shading, optical flow computation, lightness problem,
etc.[85, 45]. These problems when formulated as variational principles and dis
cretized, lead to solving one or more large sparse linear systems. In this section,
we present the results of applying our algorithm to the surface reconstruction and
shape from shading problems.
4.4.1 Surface Reconstruction
We implemented our generalized capacitance matrix algorithm on the sparse
data surface reconstruction with membrane and thinplate splines. We synthesized a
sparse range/elevation data set on a 64 x 64 grid. The input data set is very sparse
and contains 15 data points randomly scattered in the plane as shown in figure 4.3(a).
We apply our algorithm to this data to obtain a surface approximation.
In the first experiment, we reconstruct the surface to approximate this sparse
data set by using membrane and thinplate smoothness constraints. Figure 4.3(b)
shows the reconstructed surface using membrane smoothness after 10 ADI iterations.
We observe that only 10 ADI iterations are needed to attain an error tolerance of
105. The reconstructed surface using thinplate smoothness after 16 ADI iterations
is shown in figure 4.3(c). We can see the reconstructed surface using thinplate
smoothness is much smoother than that using membrane smoothness.
The second experiment is the surface reconstruction on the same data set with
prespecified discontinuities along a horizontal line, between the coordinates (1,32)
and (30,32). as shown in figure 4.4(a). We depict the computed surface with mem
brane smoothness obtained using our generalized capacitance matrix algorithm with
1 and 10 ADI iterations in figure 4.4(b) & (c) respectively. Figure 4.4(d) shows the
result of the same experiment with thinplate smoothness after 16 ADI iterations
in our generalized capacitance matrix algorithm. A similar experiment with depth
discontinuities along the diagonal line segment extending from the coordinate (1,63)
Ulip
(a)
(b) (c)
Figure 4.3. Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed surface using thin
plate smoothness, obtained using the generalized capacitance matrix algorithm.
to (63,1) was performed. The data along with the recovered surface using membrane
smoothness are shown in figure 4.5(a) & (b) respectively.
(a) (b)
(c) (d)
Figure 4.4. Surface reconstruction with prespecified discontinuities along a horizon
tal line; (a) original sparse data set with depth discontinuities along the indicated
horizontal line; reconstructed surfaces using a membrane spline after (b) 1 iteration
and (c) 10 iterations of ADI method; (d) reconstructed surface using a thinplate
spline after 16 iterations of ADI method.
4.4.2 Shape from Shading
In this problem, we tested our algorithm on a synthetically generated image of
a Lambertian sphere illuminated by a distant point source and viewed from the same
location as the point source. Figure 4.6(a) depicts the 64 x 64 original shaded image
(a) (b)
Figure 4.5. Surface reconstruction with prespecified discontinuities along a diagonal
line; (a) the original sparse data set with depth discontinuities along the indicated
diagonal line, (b) reconstructed surface using a membrane spline after 10 iterations
of ADI method.
with an 8 bit resolution in gray per pixel. The sphere is assumed to have Lambertian
reflectance properties with a constant albedo. Both depth and orientation data are
specified on an occluding contour depicted in figure 4.6(b). Note that we have to
solve three systems of equations namely Kxj = bl, Kx2 = b2 and K1x3 = b3 (see
section 2.3). The first two systems are for recovering the surface normals and the last
equation for recovering the depth from the surface orientation map. We start the ADI
iteration for the linear systems with the initial values of p = q = z = 0. The recovered
surface orientation is shown in figure 4.6(c) and the final surface shape obtained using
the orientation information is depicted in figure 4.6(d). For each computed (p, q) on
the RHS of equation 2.18, the ADI error tolerance was set to 10. The same tolerance
was used for the ADI iterations in the depth from orientation problem.
79
Although, we have not compared the computational time of existing solution
methods for the discretized Poisson equation with the computational performance
of our algorithm, we believe that our method will outperform most other existing
methods. Future research efforts will focus on empirically demonstrating this claim.
80
(a) (b)
(c) (d)
Figure 4.6. Shape from shading example, (a) Shaded sphere image (b) irregular
occluding boundary imbedded in a rectangle (c) recovered surface orientation, (d)
reconstructed surface shape.
CHAPTER 5
ADAPTIVE PRECONDITIONING TECHNIQUE
5.1 Introduction
The generalized capacitance matrix algorithm presented in chapter 4 is very
efficient to solve the linear systems arising from the early vision problems. However,
this technique can prove to be inefficient for dense data problems with nonuniform
weighting on the data constraints, such as the optical flow problem, since the size
of the associated ( dense ) capacitance matrix is too large. Although this problem
may be circumvented by incorporating the capacitance matrix technique as part of
an iterative scheme [72], however it was pointed out in [12] that this semidirect
numerical scheme only converges when the regularization parameter A is very large.
In this chapter, we present a novel adaptive preconditioning technique which when
used in conjunction with a conjugate gradient algorithm can solve the linear systems
very efficient. This adaptive preconditioning technique can be universally applied to
solve any early vision problem.
Numerical iterative methods have been popular for solving the linear systems in
early vision problems. Several multiresolution preconditioning techniques [75, 61, 92]
have been proposed to accelerate the convergence speed of the iterative methods in
computer vision literature. In this chapter, we will introduce a physicallybased adap
tive preconditioning technique which will outperform in computational efficiency
 all previously proposed preconditioning methods in vision literature namely, the
hierarchical basis preconditioned conjugate gradient of Szeliski [75] and the change
of basis methods of Yaou et al. [92] and Pentland [61]. In addition, it does not have
any restriction on the choice of the regularization parameter A for its convergence.
In [75, 76], Szeliski presented a preconditioning technique based on the use of
hierarchical basis functions. This technique is based on the multilevel splitting of the
finite element spaces presented in [93]. The hierarchical basis functions naturally lead
to a pyramidal representation of the unknown function being solved for. Empirical
convergence results were shown for preconditioned conjugate gradient in comparison
with the standard conjugate gradient method. A key piece of information that was
overlooked in Szeliski's work [75, 76] was that, in designing a preconditioner for the
conjugate gradient algorithm, no information about the problem being solved was
used i.e., once a basis set was chosen, the same preconditioner was used regardless
of the imposed smoothness or data constraints. Thus, the true power of a precondi
tioning transform was not exploited to its fullest.
More recently, Pentland [61] presented a technique for surface interpolation us
ing a change of basis to orthonormal wavelets as a preconditioning transform. He
uses the results of Beylkin et al., [7] which states that the application of (N x N)
matrices corresponding to any pseudodifferential and CalderonZygmund opera
tors to arbitrary vectors requires either 0(N) or O(N log N) operations based on
whether nonstandard or standard wavelet transform is used. The result of Beylkin
et al., primarily shows a reduction in the bandwidth of the matrices corresponding
to pseudodifferential operators in a wavelet basis and gives a technique whereby an
0(N) coefficients can be used to approximate the (N x N) operator, for a given
tolerance. This compression takes 0(N) time if the structure of the singularities of
the matrix are known a priori.
There are three issues to be noted with regards to Pentland's reported results.
Firstly, after a change of basis to orthonormal wavelets, the offdiagonal terms were
discarded in [61], under the claim that the transformed stiffness matrix was diagonally
dominant. This diagonal dominance behavior was depicted via the profiling of a row
of the transformed stiffness matrix at a high resolution. Upon close examination,
we found that the diagonal dominance property does not hold in general at lower
resolutions (see figure 5.10) and thus, discarding the offdiagonal terms is unjustified
and leads to a solution which may be far from the true solution. Secondly, Pentland's
preconditioner requires the computation of the diagonal entries in a wavelet trans
formed stiffness matrix. Although the stiffness matrix is banded, the computation of
these diagonal entries takes O(N log N) operations, which is too expensive for large
N. Thirdly, it is not known if the matrix (K + S) in his linear system (K + S)U = D
being solved for the interpolation problem would satisfy the conditions set forth in
Beylkin et al., [7] in order for their bandwidth reduction scheme to be applicable in
this case. Further, we found that his algorithm fails to converge in reasonable number
of iterations to the correct solution for very sparse data that is used in the examples
of this thesis.
Yaou and Chang [92] report a fast surface interpolation algorithm using the
multiresolution wavelet transform. This work is almost exactly similar to that of
Szeliski with the exception that Yaou and Chang use a wavelet basis instead of the
hierarchical basis and a Jacobi iteration [26] instead of the conjugate gradients used in
Szeliski [75]. As in Szeliski, the full potential of a preconditioning technique was not
exploited in Yaou and Chang i.e., the design of the preconditioner did not make use of
any information about the problem being solved. The stiffness matrix in the surface
interpolation problem was first transformed to a biorthonormal wavelet basis and
then the transformed linear system was solved using a Jacobi iteration. Note that,
in a preconditioning technique applied to the the positive definite stiffness matrix K,
one tries to approximate K as well as possible with another positive definite matrix P
known as the preconditioner, such that P'K is a good approximation to the Identity
matrix. Such an approximation P was never constructed in Yaou and Chang [92].
Further, upon implementing their algorithm, it was found that their published results
about rate of convergence were only valid for data from periodic function since their
implementation took advantage of periodic wavelet transforms.
In this chapter, we will present a very effective way to construct a physically
based adaptive preconditioner in a wavelet basis for several early vision problems
namely, the surface reconstruction (SR), shape from shading (SFS) and computation
of optical flow (OF). This physicallybased preconditioner is adapted to the membrane
spline or the thin plate spline or a convex combination of the two. Experimental
results depicting the performance of our algorithm in comparison to existing methods
discussed above are presented for synthetic and real data.
The rest of this chapter is organized as follows. In section 5.2, an adaptive
preconditioner constructed in a wavelet basis is presented, followed by the adaptive
preconditioned conjugate gradient algorithm in section 5.3. Algorithm implementa
tion on synthetic and real data are presented in section 5.4. In section 5.5, we end
with a discussion.
5.2 Adaptive Preconditioning in a Wavelet Basis
Iterative methods are preferable to direct methods for solving large and sparse
linear systems in early vision problems, since direct methods require the formation
of intermediate matrices which are not necessarily sparse. Although the sparsity
can be preserved when using direct methods on 1D problems with the smoothness
constraint, it will be destroyed in the case of 2D problems. This makes the number
of operations and the amount of storage too large for 2D and higher dimensional
problems.
The efficiency of applying an iterative method to solve a linear system depends
on its rate of convergence, which is a function of the condition number of the coef
ficient matrix in a linear system. Unfortunately, many early vision problems after
regularization lead to large illconditioned linear systems. Therefore, most itera
tive methods converge very slowly to the true solution and are thus computationally
impractical. One way to speed up the convergence of an iterative method is via pre
conditioning. In this section, we present a new adaptive preconditioning technique
wherein the preconditioner is constructed in a wavelet basis. Since the proposed
preconditioner is designed to adapt to the spectral characteristics of the smoothness
constraint in the regularization solution (in the frequency domain) of different early
vision problems, it is bound to perform better than the existing "fixed" precondition
ers [75, 42, 92].
5.2.1 Preconditioning Technique
For a linear system Kx = b with positivedefinite matrix K, we assume there
exists a positivedefinite matrix P, the preconditioner for the matrix K, such that the
condition number of P1K is greatly reduced. Since P is a positivedefinite matrix,
it can be shown that there exists a positivedefinite matrix C such that C2 = P [26].
Now, we can define the vectors x and b via the transformation matrix C as follows
x=Cx, 6 = Cb. (5.1)
Thus, the linear system Kx = b can be solved by solving the transformed linear
system
Kx = b, (5.2)
where K = C1KC1. Note that the matrix K in the transformed linear system
is similar to the matrix P1K, i.e., the condition numbers of K and P1K are
equivalent. The condition number of the transformed matrix is much smaller than

Full Text 
6
In this thesis, we will briefly discuss existing methods for this coupled noncon
vex minimization problem and propose a novel hybrid search algorithm as a possible
solution toward solving this problem. This hybrid search algorithm which is a combi
nation of a stochastic and a deterministic search techniques. For the stochastic search,
we propose an informed genetic algorithm (GA) whereas for the deterministic search,
we employ either one of the two deterministic algorithms mentioned above. Some
promising preliminary experimental results of applying this algorithm on sparse data
surface reconstruction along with discontinuity detection problems are presented.
1.3 Main Contributions of this Thesis
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method is proposed for solving elliptic partial differential equations arising in
early vision and engineering problems. The algorithm is based on the develop
ment of the generalized capacitance matrix theorems which we state and prove
in this thesis. We make use of the ADI method as a fast elliptic solver in this
algorithm and prove that it takes a constant number of iterations to converge
within a given error tolerance with each iteration taking 0(N) time.
2. A novel adaptive preconditioning algorithm using a wavelet transform is devel
oped for solving illconditioned large systems of equations arising in regularized
solutions of illposed problems in early vision and other domains. Our pre
conditioner is constructed by approximating the spectral characteristics of the
imposed smoothness as well as data constraints. This spectral approximation is
139
to take care of largedisplacement problems and to accelerate the convergence rate of
the nonlinear CG algorithm.
6.3.1 Nonlinear Conjugate Gradient Algorithm
The nonlinear conjugate gradient method for minimizing general nonlinear func
tions is outlined as follows [23]:
1.
2.
3.
4.
5.
6.
7.
d0 = r0 = /'(u0).
Find a, that minimizes /(u, + a,d,).
u,+i = u, + a,d,.
r+i = /'(u.+i)
fii+i = or fii+i = max{r,tl(rrT+r r,),0}
d,+i r,+i  /?,+id,.
If Tk 0, stop; else set i := i + 1 and go to step 2.
Step 2 requires a line search method to minimize the function /(u, + a,d) at u, along
the direction d,. The NewtonRaphson method and the Secant method are two popu
lar line search schemes to approximate the best a,. We employ the NewtonRaphson
method in the nonlinear CG. The Taylor series approximation of the function / up
to the secondorder terms is used to derive the approximate a,, giving
/'(u,)d,
dff"{ui)d,'
a, =
(6.11)
CHAPTER 6
ROBUST OPTICAL FLOW COMPUTATION
6.1 Introduction
Optical flow computation is a fundamental problem in the motion analysis of
image sequences. It provides very important information for estimating 3D velocity
fields, analyzing rigid and nonrigid motion, segmenting the image into regions based
on their motion, or recovering 3D structure of objects in the image. Many techniques
for computing optical flow have been proposed in literature. These can be classified
into gradientbased, correlationbased, energybased, and phasebased methods [3].
In this chapter, we present two robust and efficient algorithms namely, a modified
gradientbased algorithm and an SSDbased regularization algorithm, for optical flow
computation.
The gradientbased approach is dependent on the image flow constraint equa
tion, which is derived from the brightness constancy assumption as well as the first
order Taylor series approximation [40]. Using the image flow constraint equation
alone is not enough to compute the optical flow since each equation involves two
different variables. Horn and Schunck [40] introduced the membrane smoothness
measure to constraint the flow field. Since the smoothness constraint is invalid across
128
154
formulate in a deterministic or probabilistic framework (see chapter 2), leading to
a coupled (binaryreal) nonconvex optimization problem. In this chapter, we will
briefly discuss existing techniques for this problem and propose a novel hybrid search
algorithm as a possible approach toward solving this problem. This hybrid search
algorithm involves a combination of a stochastic and a deterministic search tech
niques. For the stochastic search, we employ a novel informed genetic algorithm
(GA), whereas for the deterministic part, we employ either the generalized capaci
tance matrix algorithm or the adaptive preconditioned conjugate gradient algorithm,
presented in chapter 4 and 5, respectively. Some promising preliminary experimental
results of applying this hybrid search algorithm on surface reconstruction problems
are presented. However, this hybrid search technique will have to be considerably
researched upon prior to deciding upon its practicality.
7.2 Previous Energy Minimization Methods
In chapter 2, we have seen that both the deterministic formulation and proba
bilistic formulation of early vision problems with discontinuity detection give rise to
minimizing a nonconvex function. Note that in some applications, a local optimum
may provide a satisfactory solution. However, in certain other applications, local
optimal solutions are unsatisfactory. A number of optimization techniques have been
proposed to solve this problem. We briefly discuss their pros and cons in this section.
There are two classes of optimization methods proposed for solving this non
convex minimization problems; one is deterministic, and the other is stochastic. The
no
Figure 5.8. The computed solutions for the surface reconstruction with discontinu
ities using the APCG algorithm after (a) 10, (b) 20, (c) 40 and (d) 60 iterations,
respectively.
168
7.3.2 Informed Genetic Algorithm
In this section, we present an informed GA which incorporates the problem
information into the genetic operators. In traditional GAs [24, 15, 69], only the
reproduction operator depends on the fitness function, while the crossover and mu
tation operators randomly explore the solution space with prespecified probabilities.
Totally random search is a very inefficient way to explore the solution space, and
it makes the GAs converge very slowly especially when the entire solution space is
very large, such as early vision problems. In our informed GA, we use an informed
mutation operator which exploits the problem information in the search, thus making
the solution search in the GA very efficient.
The only search procedure in simulated annealing involves the use of the Gibbs
sampler [22, 90] which is a random perturbation of the current solution and is similar
to the mutation operator in the GA. The Gibbs sampler is used to search the solution
space based on the Gibbs distribution of the energy function. The search in the
Gibbs sampler makes use of the information in the energy function, i.e. the problem,
therefore it is an informed search, not a totally random search. We use the Gibbs
sampler as an informed mutation operator in our informed GA.
Our informed GA consists of the reproduction operator and an informed muta
tion operator only. The main idea is to design the GA operators to make each con
figuration of candidate solutions in the generations of the GA to be a sample drawn
from the Gibbs distributions of the energy function to be minimized. To achieve this
goal, we choose our initial population as a sample from a Gibbs distribution of the
44
Using the generalized inverse 1 A+ of the matrix A, we can write x as follows.
mA
x = A+b + o.q' (325)
1=1
where = n rank(A), q'x,..., q'm/1 are the orthogonal eigenvectors in null(A),
and a, 3?, Vi. Then,
mA
VTx = VT A+b + Â£ a, VTq',. (3.26)
1=1
From Proposition 2, we have VTq't = 0, for i = 1,..., m^. Substituting into equation
3.26, we have VTx = VTA+b. Substituting for VTx in equation 3.24 yields
(A+TV7)Tb 0. (3.27)
We can show that A+TV7 null(BT) as follows.
Bt(A+TV7) = (A + UGVT)rA+rV7 (3.28)
= (AtA+TV + VGtUtA+TV)7 (3.29)
Since ATA+r = I Â£= q'tq'f, using a result of Proposition 2 namely, q'fV = 0
for i = 1,..., tua, we have
ATA+TV = V Â£ q'.q'f V = V. (3.30)
i=i
1 Let the singular value decomposition of A be A = Ui7VT, where U and V are orthogonal
matrices and Â£ = diag(0,..., 0, XmA+i, , An). Then its generalized inverse A+ = Vi7+UT,
where Â£+ = diag(0,..., 0, ,..., Â£) [26].
66
We first define two matrices X' and AX' for use in the convergence analysis
of the ADI method. Let X' and AX't be the projections of the true solution X*
and the error in computed solution (in iteration t) AX* on the range of the matrix
Ko ( = A I + I A), respectively. Note that, X' and X* are almost the same
except that X' doesnt have projection at the basis qo qo In the ADI literature
[48], the convergence criterion is normally defined as follows: The ADI method for
the Lyapunov matrix equation is said to have converged when the relative Frobenius
norm is reduced to within a specified error tolerance t lying between 0 and
l,i.e AX'Â¡f < eX'jr. This means we dont need to reduce the error component
in the basis corresponding to the zero eigenvalue or in the null space of Ko. The
problem is to find the number of iterations required in the ADI method and the
associated ADI parameter set to achieve HAX'tH^ < c11X'11^.
We make use of this convergence criterion and assumption 1 to reformulate the
convergence theory of the ADI method as follows. The Frobenius norm  AX't can
be written as,
(4.18)
where the double summation takes every k and l between 0 and n 1 except k =
l = 0 (simultaneously). As a consequence of assumption 1 the ADI method for our
Lyapunov matrix equation can converge in a constant number of iterations. We state
and prove this convergence result in the following theorem.
Theorem \ Given assumption 1 and an error tolerance t between 0 and 1, there exists
a constant integer t such thatVn, AX'tH/^ < cIX,f
CHAPTER 7
RESEARCH DIRECTIONS
7.1 Introduction
In this thesis, we have presented efficient computational methods for early vision
problems with prescribed discontinuities. In this context, several research issues
presented in the following paragraphs will benefit from further investigation.
In the regularization and MRF formulations for early vision problems, the reg
ularization parameters are usually chosen in an ad hoc fashion. Without adapting
the parameters to the problem, it is easy to end up with an oversmoothed or under
smoothed result. The generalized crossvalidation method [88] has been popularly
used to determine the regularization parameters. However, the computational cost
of this method is extremely high. In addition, this method can not be used when
discontinuity is included in the problem. It is important to have an efficient and
general technique to determine the regularization parameters to obtain better results
for early vision problems.
The adaptive preconditioner constructed in a wavelet basis can be applied to
any regularization formulation in early vision. In fact, we can use similar construction
of the adaptive preconditioner for the linear system discretized from any boundary
152
57
4 1
D
1 4 1
Â£nXn.
(4.5)
The matrix Kmem can be regarded as the result obtained from the 5point ap
proximation to Laplacian operator on a doubly periodic imbedded rectangular do
main. From a computational molecule [81] point of view, only the membrane molecule
is used at every node of the doubly periodic rectangular domain. Therefore the molec
ular inhibitions due to depth discontinuities (if any) and imposed doubly periodic
smoothness are included in the matrix UVT. Given the matrices K and K0, we can
show that the choice of the matrices U and V such that UVr = K K0 where both
matrices U and V are of full column rank is not unique. Our selection method for
the matrices U (= [ul5..., up]) and V (= [vi,..., vp]) is as follows. Let each column
vector Uj, 1 < i < p, contain only two nonzero entries +1 and 1 located at the two
neighboring sites separated by the ith discontinuity. Then select the column vector
v, to be Au,, 1 < i < p. Thus the matrices U and V are of full column rank p
where p is the number of discontinuities. Our selection scheme has the advantage
that the column vectors of U and V are very sparse (only two nonzero entries in each
column), a structure that is very conductive to the design of an efficient numerical
algorithm. We will discuss more of this in section 4.3.
79
Although, we have not compared the computational time of existing solution
methods for the discretized Poisson equation with the computational performance
of our algorithm, we believe that our method will outperform most other existing
methods. Future research efforts will focus on empirically demonstrating this claim.
67
Proof: From the discussion on assumption 1, we have the following inequality for
IIX'IIf.
)!>sEB
bkl \2 ^ n 2
) > 77awm
16 klk2 + l 16
(4.19)
Applying assumption 1 to equation 4.18, we have the following inequality
IIAX',II* < EE
wt
rifrrrIfTnr)2
n
T,(k2 + l2Ym(h + A,)2 f=rh +Pj A/ + Vi
Wm ST' }_
4m2 k i (& + &ym(^ + w~q n
where q is a positive number between 0 and 2m + 1 and
= 7r4rÂ¡7n(rrf )2(rr^)!
(A* + \Â¡)q jJi Xk + Pj A/ + pj
(4.20)
Using the fact that A* = 4sin2(^^) > 4(Â£)2 for k > 0 and substituting into the
above inequality, we have
IIax'^<7^zEE
42,n4 m2^Zrl((^)2 + (i)2)2m+2_,
(2)^(Pl.P2,...,Pf)l, (4.21)
ni
Then we have,
II AX'*p. <
42g4m2 l((n)2 + (^)2)2m+2_9 n2 J V(fc?)\(0,0) R^Pl P2 P*)
(4.22)
Note that the double summation on the RHS of equation 4.22 takes every integer k
and / between 0 and n 1 except k = l = 0 simultaneously. Let the set of all the
24
have the following procedures in each iteration: (1) compute the b vector using
the current u, (2) update the solution u by treating b as a constant. In the past,
multigrid techniques have been employed in solving this nonlinear system [79] wherein
a hierarchy of problems at different resolutions needs to be explicitly constructed.
Szeliski [76] presented a fast shape from shading algorithm which employs hierarchical
basis for preconditioning the conjugate gradient algorithm used to solve the above
nonlinear system. Unlike the multigrid methods, this technique does not require
the explicit construction of a multiresolution pyramid of problems. In this thesis we
apply our adaptive preconditioning technique that leads to a faster convergence than
the one in Szeliski [76]. Our method makes use of the spectral characteristics of the
data constraint, the imposed smoothness constraint and the integrability constraint.
This spectral function is then used to modulate the frequency characteristics of the
chosen wavelet basis leading to the construction of the preconditioner.
2.3.3 Optical Flow Computation
A third early vision problem that we consider in this dissertation is the compu
tation of optical flow from an image sequence. Many techniques for computing optical
flow have been proposed in literature. These can be classified into gradientbased,
correlationbased, energybased, and phasebased methods [3]. Recently, Barron et
al., [3] have presented a comprehensive survey of various optical flow computation
techniques along with a quantitative comparison of them. We note that the survey did
not do any comparison of the computation time required by the various algorithms.
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
August 1995
Chairman: Dr. Baba C. Vemuri
Major Department: Electrical and Computer Engineering
Early vision has been one of the most fundamental research topic in computer
vision. The task of early vision is to recover physical properties of threedimensional
objects (such as shape) from twodimensional sensed data. Example tasks consist
of surface reconstruction, optical flow computation, shape from shading, shape from
stereo, edge detection, lightness problem, segmentation problem,... etc. Early vision
problems may be formulated either in a deterministic or a probabilistic framework.
A popular technique in the deterministic framework is based on the regularization
theory, and the probabilistic formulation exploits the Markov random field (MRF)
model to characterize the function being estimated. Both formulations lead to min
imization of equivalent energy functions. When discontinuity detection is involved
ACKNOWLEDGMENTS
I am deeply grateful to my advisor and committee chairman, Dr. Baba C. Ve
muri, whose guidance, encouragement and support made the completion of this work
possible. I thank him for his academic as well as nonacademic advice and assistance
during my study at the University of Florida. Many ideas in this dissertation were
produced through the discussions with him. I would also like to thank Dr. Staudham
mer, Dr. Doty, Dr. Harris, Dr. Sigmon and Dr. Rao for serving in my supervisory
committee and advising me on various aspects of this dissertation. I sincerely thank
the National Science Foundation for supporting me financially to finish this work
under the grant number ECS9210648. Special thanks go to my parents, who never
hesitate to deliver their love, encouragement and support to enrich my life. Finally,
I would like to dedicate this dissertation to my beloved wife, YihChou, who shared
the happiness and struggle I experienced during the work of this dissertation, and to
my son, Patrick.
in
93
transform was also extended to the twodimensional case in [50]. The division of
frequency domain behavior for the 2D wavelet basis is simply an extension of the ID
case.
After introducing the division of frequency domain behavior for a wavelet basis,
we are ready to construct our preconditioner in a wavelet basis and adapt its spectrum
to that of the regularization filter discussed in section 5.2.2.
5.2.4 Adaptive Preconditioning via the Modulation in a Wavelet Basis
In section 5.2.1, we discussed two criteria for selecting a good preconditioner.
In this section, we construct an adaptive preconditioner for early vision problems by
following these two criteria.
Using techniques of frequency domain analysis, a lowpass filtering interpretation
can be given to the regularization formulation of early vision problems as was shown in
section 5.2.2. The spectral characteristics of the regularization filter in the frequency
domain can be approximated using the frequency domain subdivision behavior of
the wavelet transform which can be implemented very efficiently using a quadrature
mirror based implementation.
To apply a preconditioner P to a matrix K, the first criterion is to drastically
reduce the condition number of P1K; the second criterion for a good preconditioner
is that the auxiliary linear system Pz = r should be efficiently solvable. An obvious
choice to reduce the condition number of P1K is to choose P = K, which reduces
the condition number to 1. Although only one iteration suffices to solve the problem,
the complexity of solving the auxiliary linear system Pz = r is the same as that of
4.2.2 Sparse Data Surface Reconstruction 60
4.3 Numerical Solution 61
4.3.1 ADI Method for the Lyapunov Equation 62
4.3.2 Modified BCG Method for the Capacitance Matrix Equation 72
4.4 Experimental Results 74
4.4.1 Surface Reconstruction 75
4.4.2 Shape from Shading 77
5 ADAPTIVE PRECONDITIONING TECHNIQUE 81
5.1 Introduction 81
5.2 Adaptive Preconditioning in a Wavelet Basis 85
5.2.1 Preconditioning Technique 86
5.2.2 Regularization Filter 88
5.2.3 Wavelet Basis 90
5.2.4 Adaptive Preconditioning via the Modulation in a Wavelet Basis 93
5.3 Adaptive Preconditioned Conjugate Gradient Algorithm 96
5.3.1 Surface Reconstruction 97
5.3.2 Shape from Shading 100
5.3.3 Optical Flow Computation 102
5.4 Experimental Results 102
5.4.1 Surface Reconstruction 103
5.4.2 Shape from Shading 117
5.4.3 Optical Flow Computation 123
5.5 Summary and Discussion 126
6 ROBUST OPTICAL FLOW COMPUTATION 128
6.1 Introduction 128
6.2 Modified gradientbased method 132
6.3 SSDbased Regularization Method 137
6.3.1 Nonlinear Conjugate Gradient Algorithm 139
6.3.2 Modified Search Direction Scheme 141
6.4 Experimental Results 142
6.4.1 Modified Gradientbased Method 143
6.4.2 SSDbased Regularization Method 146
6.5 Summary and Discussion 150
7 RESEARCH DIRECTIONS 152
7.1 Introduction 152
7.2 Previous Energy Minimization Methods 154
7.2.1 Deterministic Optimization Methods 155
7.2.2 Stochastic Optimization Methods 160
7.3 Proposed Hybrid Search Algorithm 161
v
151
regularization algorithm, and the other is the SSDbased regularization algorithm.
The modified gradientbased regularization method selectively combines the image
flow constraint and the contourbased flow constraint into the data constraint. The
image flow constraint is disabled in the neighborhood of discontinuities, while the
contourbased flow constraint is active at discontinuity locations. A more precise
data constraint is obtained by using this selective combination scheme than the one
used by other gradientbased methods in literature. Flow estimates obtained us
ing our method are more accurate than those obtained from techniques reported in
literature [3]. This is evident from the experimental results.
Our SSDbased regularization method uses the SSD measure as the data con
straint in a regularization framework. The resulting energy function is neither quadratic
nor convex. The preconditioned nonlinear conjugate gradient with a modified search
direction is developed to minimize this energy function. Very accurate results were
obtained by applying this algorithm to two consecutive images of a sequence.
Comparing these two algorithms proposed in this chapter, we conclude that the
modified gradientbased regularization algorithm is more computationally efficient
than the SSDbased regularization algorithm, while the latter gives more accurate
results than the former. Also, we have experimentally demonstrated the superiority
of our algorithms for optical flow computation over those reported to date in litera
ture. Future work will focus on automatically detecting motion discontinuities in the
framework presented here.
99
the matrix Ms has very small diagonal entries at the very coarse resolution levels,
the spectral estimation error in Md at these levels may cause significant error in the
spectral approximation to the matrix K. This problem may exist only in the case of
sparse data surface reconstruction, since the spectral estimation for Kj in the dense
data case is quite accurate.
To amend the problem caused by the spectral estimation error, we introduce a
weighting function a(i), where i is the number of iterations, to appropriately weight
the importance of Ms and during the CG iterations, i.e.,
M(t) = Ms +
The matrix M() is the modulation matrix in the ith iteration. From the filtering
correspondence with the regularization formulation, we can see that the overall spec
tral characteristics for the matrix K1 is a lowpass filter. Note that the inverse of
Ks is also a lowpass filter, while the inverse of is not. Therefore, the matrix
M, ( or K, ) is more dominant than the matrix M< ( or K< ) for solving the linear
system Kx = b. Based on this observation, we set the weighting function a(i) to be
very small initially and then gradually increase the values of the weighting function.
Using this schedule for the weighting function, the lower frequency components of
the solution are recovered at the beginning of the preconditioned conjugate gradient
iterations, and then the higher frequency components. This globaltolocal shape re
covery phenomenon can be predicted by the choice of the weighting function cr(i). In
fact, the weighting function can be adapted during the iterations instead of following
CHAPTER 2
PROBLEM FORMULATIONS
There have been two classes of formulations proposed for the early vision prob
lems; one is the deterministic formulation, and the other is the probabilistic one. The
deterministic approach is based on the regularization theory proposed by Tikhonov
[82]. The regularization method introduces a stabilizing operator to wellpose the
original problem and leads to a variational principle. The most popular probabilistic
formulation to early vision problems involves the use of an MRF (Markov Random
Field) model. Although there is some correspondence between the aforementioned
stabilizing operator and the MRF model, the MRF model is known to be more ver
satile to model surfaces or images than the deterministic model [51] [74].
2.1 Deterministic Framework
Variational formulations for various early vision problems have been reported
in [62, 6]. These formulations make use of regularization theory. In a regularization
framework, generic smoothness assumptions are imposed on the solution space prior
to attempting any functional minimization. The smoothness constraints are well
characterized by a class of generalized multidimensional spline functionals [80]. The
formulations involve minimization of an energy functional Â£, which is the sum of the
energy contribution from the smoothness constraint (Â£,) and the data constraint (Â£)
11
39
For i = mg, the RHS in equation 3.3 equals 0, thus, rji Â£ null(A).
From the assumption that rjx,..., TjmAmB are chosen to be linear combinations of
q'j,..., q'myl_mB and linearly independent, we can write [tj1 ... r/mA_me] in terms
of [q'i 35 follows:
[Vl VmAmB] = [q'l q'mj(mBlT. (315)
where T sj\(mAmB)x.(mAmB) js nonsingUlar. By substituting equation 3.15 into
equation 3.14, we get
\
= 0. (3.16)
From Proposition 1, the matrix V^q'j ... q,myl_mB] is of full rank and is of size
p x (m,A ms) with p > rriA mg, which implies
Vr[q'i ... q'm.TOB]T
A
3.
/
fii
\
T
= 0.
(3.17)
V
A
mAmB
Since T is nonsingular, this makes /?, = 0, = 1,..., mg. Combining this with
equation 3.13, we obtain (3 = 0. Thus concluding that C/3 = 0 implies (3 = 0,
therefore, C is nonsingular.
167
Convergence property
Markov chain models have been employed to analyze the convergence of simple
genetic algorithms in [15, 69, 73]. Davis [15] used a nonhomogeneous finite Markov
chain to model the simple GA (population size is fixed but the mutation and crossover
are allowed to vary with the iteration index) and proved that the convergence rate
of a simple GA is far superior to that of simulate annealing. Rudolph [69] employed
a homogeneous finite Markov chain to model a canonical GA ( fixed population size,
proportional selection, fixed crossover and mutation probabilities ), and proved that
a canonical GA never converges to the global optimum regardless of the initialization.
But, he also showed that modified versions of canonical GAs with elitist selection (
the best individual survives with probability one ) asymptotically converge to the
global minimum in the probabilistic sense. Following the Rudolphs work, Suzuki
[73] proved that the probability of finding the global minimum in n generations for a
canonical GA with elitist selection is lower bounded by 1 (9(A,n), where A < 1
and A, is the largest eigenvalues of the diagonal submatrices in the transition matrix.
In addition, an upper bound for  A*  was derived in terms of the mutation probability.
Therefore, one can find the mutation probability to minimize this upper bound to
achieve best convergence performance in the GA. Because of the advantage of elitist
selection proved in [69, 73], this strategy is included in our informed GA, which will
be discussed in the next section.
165
GA can be directly employed to solve the optimization problem by defining a
positive fitness function F(x) in terms of the objective function E(x) which is to
be optimized. For example, we can choose F(x) = E0 + E(x) for the maximization
problem and E(x) = E0 E(x) for the minimization problem, where the constant E0
is used to make sure the fitness F(x) is positive. The fitness function is a measure
for the fitness of survival.
A representation of the solution space has to be chosen for the GA operators.
Normally, the bitstring representation is employed for simplicity. To use the bit
string representation for the real variables (surface), we have to discretize the contin
uous solution space. However, this representation is well suited for boolean variables,
like the line process.
A sketch of the genetic algorithm is as follows [69]:
1. Choose an initial population.
2. Determine the fitness function value for each individual.
3. Perform selection operation.
4. Perform crossover operation.
5. Perform mutation operation.
6. Check stopping criterion; if satisfied, stop; else go to step 2.
Genetic operators
In this section, the three basic genetic operators are briefly described.
42
and range(B) C range(A). Let x be a solution to
Ax = b, (3.20)
where b G range(B). For 1 < i < p, let rji be a solution to
A Vi = (3.21)
Define the capacitance matrix C as G_1 + VTT7, where r\ = [iyx,..., T7p]. Then there
exist a solution f3 to the capacitance matrix equation
C(3 = VTx, (3.22)
and x rj/3 is a solution to the linear system Bx = b.
We will use the following proposition to prove the above theorem.
Proposition 2 In Theorem 2, let {qi,...,qm/t} and {q'l5..., q'myi} be the sets of or
thogonal eigenvectors in null(AT) and null(A) respectively, where tua = nrank(A).
Then, qfuj = q'Jvj = 0, for i = 1,..., tua and j = 1,... ,p.
Proof: From the assumption that null(A) C null(B), we can see that the eigen
vectors q', i = 1,...,ttia, are also in null(B). Since Bq'( = Aq', + UGVTq't, we
obtain
UGVTq't = 0,
(3.23)
65
spectral characteristics of the differential operator in the PDE. The classical ADI
convergence analysis does not make use of any such knowledge. In this thesis, we re
formulate the ADI convergence analysis by incorporating the spectral characteristics
of the Laplacian operator. This is facilitated by making the following assumption.
Assumption 1 Let the projection of B, the RHS of the Lyapunov matrix equation, on
the basis q* 0 such
that V(Ar, /)\(0,0), \bkÂ¡\ < ^+p)> for some 171 > 0'5> and ^(p+z?)2 ^ awm
For our Lyapunov matrix equation AX f XA = B, the relationship between the
solution X and the data B in the eigen basis of the matrix Ko is akl =
Therefore, the assumption \bu\ < =* Kz < (a^+a,Hfc2+2)m for some m >
0.5. Using the relation < sinx < x for x [0,7t/2], we have C\k2 < Ak < c22
with ci = Af and c2 = ^r. Thus, requiring that the solution satisfy the constraint
K/ < (glffijm+i for some m > 0.5.
This assumption can be given the following spectral interpretation. The smaller
values k or / correspond to lower frequencies along vertical or horizontal direction, and
vice versa. Consequently, the assumption means that the overall frequency content
of the input data or the right hand side can be bounded by an increasing function
of  Hh = (^2 + Z2)0'5 of degree less than 1. This can be easily satisfied when the
overall frequency distribution of the input data is decreasing or uniform. This is true
in reality since, the low frequency content in signals is normally much higher than
the high frequency content.
43
for 1 < < mA. The matrix U is of size n x p, n > p, and of full rank, consequently
GVTq't = 0. From the assumption that G is nonsingular, we have Vrq', = 0.
Therefore, q'Jvj = 0 for t = 1,..., mA and j = 1,... ,p.
Similarly, we obtain the equation VGrUrq, = 0 from the condition Brq, =
ATq = 0 which is derived from the assumption range(B) C range(A). Using
the similar reasoning for this equation leads to the conclusion that qfuj = 0, i =
and j = l,...,p.
We are now poised to prove Theorem 2.
Proof for Theorem 2 : The proof is presented in three parts namely, (1) we
will show that there exist solutions for equations 3.20 and 3.21, (2) we will establish
that there exist a solution for equation 3.22, and (3) show that B(x rÂ¡/3) = b.
(1) To show that there exist a solution for equations 3.20 and 3.21, we prove that
the RHSs of both equations are in the range of A. It is obvious that b G range(A)
for equation 3.20, since b G range(B) and range(B) C range(A). For equation 3.21,
from Proposition 2 we can see that u,, 1 < i < p, is orthogonal to null(AT), which
means u, G range(A) for i = 1,... ,p.
(2) As for equation 3.22, we need to show that Vrx is in the range of C. We
use proof by contradiction in the following. Assume that VTx is not in the range of
C, then there exists a nonzero vector 7 G null{CT), i.e. Cr7 = 0, such that
7t(Vtx) ^ 0.
(3.24)
9
efficiency of this algorithm by applying it to surface reconstruction and shape from
shading problems.
In chapter 5, the new adaptive preconditioning technique is presented as an
efficient solution to general early vision problems. Our adaptive preconditioner is
constructed in a wavelet basis based on the approximation to the spectral character
istics of the smoothness and data constraints. We empirically show the superiority of
this adaptive preconditioner over other preconditioners previously proposed in vision
literature through experiments on surface reconstruction, shape from shading and
optical flow computation.
Two new, robust and efficient algorithms for optical flow estimation are pre
sented in chapter 6. The first is a modified gradientbased algorithm which combines
the image flow constraint and the contourbased flow constraint into the data con
straint in a regularization framework to amend the errors in the image flow constraint
caused by brightness discontinuities. We employ the adaptive preconditioned conju
gate gradient algorithm, presented in chapter 5, to solve the linear system for this new
formulation. The second is an SSDbased regularization method that uses the SSD
measure as the data constraint in a regularization framework. The preconditioned
nonlinear conjugate gradient with a modified search direction scheme is developed to
minimize the resulting energy function. Experimental results for these two algorithms
are given to demonstrate their performance.
BIOGRAPHICAL SKETCH
ShangHong Lai was born in ChangHua, Taiwan, Republic of China, on August
16, 1964. He received his Bachelor and Master of Science degrees in electrical engi
neering from National TsingHua University, HsinChu, Taiwan, in 1986 and 1988,
respectively. From 1988 to 1990, he served as a communication officer in the army
for the compulsory military service. After finishing the military service, he worked in
the Institute of Information Science, Academia Sinica, Taipei, as a research assistant
for one year. Since 1991, he entered the Department of Electrical and Computer
Engineering at the University of Florida, Gainesville, to persue his Ph.D. degree. His
research interests include computer vision, image processing, pattern recognition,
scientific computing and optimization algorithms.
188
8
theory and algorithms developed in this thesis can also be applied to many other
areas, such as cartography/terrain mapping, reverse engineering, MPEG image se
quence compression, material science applications (velocity estimation of dry granular
flows), engineering applications and mathematical physics.
1.4 Thesis Overview
The rest of this thesis is organized as follows.
In chapter 2, the deterministic and probabilistic frameworks of the early vision
problems are briefly reviewed and the formulations of some early vision problems
namely, surface reconstruction, shape from shading and optical flow computation,
are given. The deterministic framework is primarily based on the regularization
formulation. For the probabilistic framework, we discuss the maximum a posteriori
(MAP) estimation in a coupled Markov Random Field (MRF) model.
In chapter 3, the generalized capacitance matrix theorems are stated with com
plete proof. These theorems are generalizations of using ShermanMorrisonWoodbury
formula [26] to solve linear systems and the capacitance matrix theorem proposed by
Buzbee et al. [10]. The structure of the our generalized capacitance matrix algorithm
can be obtained directly from these generalized theorems.
Chapter 4 gives the complete generalized capacitance matrix algorithm which
involves using the ADI method for solving the Lyapunov matrix equations and using
the modified biconjugate gradient (BCG) algorithm for solving the associate linear
system with a capacitance matrix. We consider the membrane smoothness, thinplate
smoothness or a linear combination of the two in our algorithm. We demonstrate the
90
varying filter is difficult to characterize fully in the frequency domain, it is in general
a lowpass filter and we can approximate its spectral characteristics in the frequency
domain. In the next section we show how to achieve this approximation in a wavelet
basis.
5.2.3 Wavelet Basis
Wavelet transform has been used widely in the signal and image processing
communities since it provides a powerful tool for multiresolution analysis. The con
struction of a wavelet basis is achieved by defining a scaling function (f>(x), such that
the integer translates of the dilated scaling function V2~]
mal basis of L2(3?), where j G Z. This orthonormal basis (= {y/2~i
forms a subspace Vj C L2(3Â£), and the subspaces Vj satisfy the containment hierar
chy
Vj C Vj_i,Vj Z. (5.5)
Now we can introduce the difference space Wj between the two consecutive sub
spaces Vj and Vj_i such that Vj Wj = Vj_x, where denotes the direct sum
operator. The key idea of the wavelet basis is the difference space Wj is spanned by
an orthonormal basis which are obtained via integer translations of the associated
wavelet function ip(x), i.e. Wj = {\/2~^ip(2~^x k)}kez Therefore, a multiresolu
tion orthogonal decomposition of the space V0 can be achieved via
Vj WJ WJ_! . Wj = Vo,
(5.6)
135
flow constraint, respectively. In this dissertation, the weighting functions A and Â¡3
are chosen to be constants, and a(x) has value 0 when x is in the neighborhood of
discontinuities and takes a constant value elsewhere. A discrete version of the above
energy can be written as
y](Ex,jUj + EyVi + Et,i)2 + A(
(6.6)
Where, the index i denotes the ith location, the indices x, y and t denote the partial
derivatives along the corresponding directions, u and v are the discretized flow vector,
and the function E is defined as follows
ali when i Â£ D\N
Ei = {
/3Si when i Â£ Z
The set D contains all the discretized locations in the image domain, N is the set of
locations in the neighborhood of discontinuities, and the set Z contains the discretized
locations along the zerocrossing contours.
The image flow data constraint and the contourbased flow constraint are ob
tained by using the firstorder Taylor series approximation of the functions / and S',
respectively, thus each constraint is unreliable in the regions where the underlying
function ( I or S ) is not well approximated by a linear function locally, which means
the function contains significant nonlinear terms. We define a function () to be a
106
time. We can see that our APCG algorithm gives the best result for this comparison
with the same execution time. In figure 5.6(a), we depict the convergence rate of
the APCG algorithm in comparison to all the aforementioned iterative algorithms.
Figure 5.6(b) depicts the experiments with a thin plate stabilizer. The convergence
curves for our APCG algorithm, HCG, CG, and Yaou & Changs biorthogonal basis
transfer with Jaffards diagonal preconditioning are shown. As can be seen, our
APCG algorithm has the best convergence performance in both experiments.
We include the discontinuities in the above controlledcontinuity stabilizer to
test the performance of our APCG algorithm. The discontinuities are located along a
prespecified diagonal line. Figure 5.7 illustrates the convergence rates of our APCG,
HCG( with 4 levels) and CG algorithms. The APCG algorithm still has the best
convergence performance in this experiment. Since the inclusion of discontinuities
makes the conditioning of the stiffness matrix worse and our preconditioner was not
redesigned to account for the discontinuities in the stabilizer, the convergence rate in
this experiment is slower than that of APCG without discontinuities. The computed
solutions after 10, 20, 40 and 60 iterations of our APCG algorithm are depicted in
figure 5.8.
We attempted to incorporate the preconditioner developed in Pentlands [61]
into the comparison however, we found that the convergence of conjugate gradient
with this preconditioner is even slower than that of the CG algorithm (without pre
conditioning) for the very sparse data case. Note that the preconditioner was used
in a gradient decent iteration in [61]. In our experiments, the preconditioners are
136
measure of reliability for the data constraint at th location as follows
r, V \Exx,i\ + \Eyy,i \ + \Ett,i\ + \EXy,i \ + \Ext,i\ + \Eyt,i\
{l)~ \Ex,i\ + \Ey,i\ + \Et,i\
The partial derivatives of the function E in equation 6.7 can be estimated by standard
finite difference methods. The reliability measure S(i) is the ratio between the sums
of the absolute values of the secondorder and firstorder partial derivatives of the
function E at the ith location. Therefore, the corresponding data constraint is
reliable when the measure 8 is small and unreliable when 8 is large. We can set a
threshold to reject the data constraints with a high 8 function value. This threshold
was set to 1.0 in our implementation.
The data constraint in the above energy function implies the weighting 7j? +
Iyi or Sli + Sy on at site i. This weighting is not reasonable since the
flow constraints at high gradient locations are not necessarily more reliable than
those at low gradient locations. In fact, they tend to be less reliable because the
brightness gradients close to discontinuities are normally higher than those away from
discontinuities. Therefore, we eliminate this weighting on each data constraint via
normalization. To have a uniform contribution at each pixel from the flow constraint,
we divide the flow constraint at the ith location by the normalization factor (7j +
Iyti)* or (Sx>i + Sy)*. Lets denote the normalized flow constraint by ExUi + EyVi +
Et = 0, then the energy function to be minimized becomes
(6.7)
/() = + EV.Â¡Vi + Et,Â¡)2 + + *4, + Vl.i + *'.,)
I
(6.8)
146
example. The error in the computed optical flow after 40 iterations of the incomplete
Cholesky preconditioned CG algorithm is presented in Table 6.2, along results from
other gradientbased methods in literature that yield 100% flow density [3]. Once
again, our modified gradientbased regularization method produces more accurate
optical flow than the other methods. The computed optical flow is shown in figure
6.3(b).
Technique
Average Error
Standard Deviation
Density
Horn and Schunck (modified)
2.55
3.67
100%
Nagel
2.94
3.23
100%
Ur as et al.
4.64
3.48
100%
Our method (Gradientbased)
1.60
1.15
100%
Table 6.2. Summary of Diverging Tree results.
Our gradientbased algorithm was also tested on a real data set namely, the
Rotating Rubic cube. Figure 6.4 shows one frame of the Rubic Cubic sequence and the
computed optical flow after 40 iterations of the incomplete Cholesky preconditioned
conjugate gradient algorithm.
6.4.2 SSDbased Regularization Method
The experimental results of using our SSDbased regularization method on two
synthetic image data and one real image sequence are presented here. The two
synthetic image sequences are the Sinusoid and the Translating Tree image sequences.
Our results on these two examples are compared to the best results with 100% density
optical flow estimate reported in [3] and [77].
133
(u,v) is the optical flow field. The image flow constraint equation has been used in
different ways to obtain the optical flow estimates as reported in the literature on
gradientbased methods [40, 49, 55, 84]. In a regularization framework, the image
flow constraint equation is regarded as the data constraint and additional smoothness
constraint is imposed on the optical flow.
The image constraint equation is derived by taking the Taylor expansion of the
image constancy equation = 0 up to the firstorder terms. The higherorder
terms are neglected under the assumption that the time step between consecutive
frames is arbitrarily small, which is most often violated in practice. In reality, the
time step is usually fixed. Therefore, the lack of higherorder terms becomes the main
source of errors in the data constraint. The reliability of the image flow constraint
equation depends on the magnitudes of the higher order derivatives of image bright
ness function. If the image brightness function in the neighborhood of one point is
well approximated by a linear function, i.e. its higherorder derivatives are small,
then the image flow constraint is very reliable at this point. On the contrary, the
image flow constraint is very unreliable at the locations with significant higherorder
derivatives, which are mainly the neighborhood of brightness discontinuities. Since
the use of an unreliable flow constraint will degrade the accuracy of the optical flow
estimation, we propose to disable this unreliable image flow constraint in the neigh
borhood of discontinuities and enable the robust contourbased flow constraint at the
discontinuity locations in a regularization framework.
20
Where, d, are the depth data points specified in the domain 0 and c, are the un
certainty associated with the data. The goal is to find a u that minimizes the total
potential energy Â£(v) = S(v) + V(v).
The numerical solution to the above problem is computed by discretizing the
functionals
function is a quadratic in x (a discretization of u) given by, E(x) = xrKx xTb + c.
Where, K is a very large and sparse N x N matrix [79], with N being the number
of discretization nodes. The minimum x* of this energy function is found by solving
the large sparse linear system Kx = b. We present two very fast solutions to this
problem in this thesis. One is a generalized capacitance matrix algorithm and the
other is a novel adaptive preconditioning technique, which will be given in subsequent
chapters
2.3.2 Shape from Shading
Numerous solution methods using variational principle formulations of the shape
from shading (SFS) problem have been proposed. For a comprehensive set of papers
on this topic, we refer the reader to the book edited by Horn and Brooks [39] and the
work in [79, 72, 38, 76]. In this problem, it is required to recover the shape of surfaces
from image irradiance which depends on surface geometry and reflectance, scene
illuminance and imaging geometry. The image irradiance can be expressed directly as
a function of the surface orientation if illuminance, reflectance and imaging geometry
are assumed to be constant. Two deterministic formulations for SFS problem are
discussed in the following.
(f)
(e)
Figure 5.17. Shaded images of bump after (a) 8 (b) 10 iterations, (c) & (d) 20
iterations, (e) & (f) 40 iterations of the APCG algorithm with the occluding boundary
condition.
41
then the reduced matrices U, V 3?nxp are obtained. It is not difficult to see that
UGVr = B A and the matrices U, V and G are of full rank.
As for the condition qfu, = a,(, j), 1 < i,j < mg, we can find such
a U by transformation. If {U,G,V} satisfies UGVT = B A and each is of full
rank, then the alternative set {UQ, Q_1G, V} also satisfies these requirements when
the matrix Q 9Â£pxp is nonsingular. From Proposition 1, UT[qi... has
full rank. We can find a transformation matrix Q to make the upper (m tub) x
(rriA tub) block of the transformed matrix QTUr[qi ...qmA_mB] diagonal. Then,
the condition qfuj = 1 < i,j < tua ms, is satisfied.
There are infinite number of choices for the matrices U, V and G to satisfy the
aforementioned requirements. The above choice from the singular value decomposi
tion of B A is just a particular one that leads to a diagonal G matrix. This makes
the computation of G1 in equation 3.4 trivial. The choice of A, U, V and G is very
crucial for the efficiency of the numerical algorithms based on the capacitance matrix
algorithm to solve a linear system.
Now lets turn to the other case when null(A) C null(B) and range(B) C
range(A). Note that these two constraints are equivalent to each other when both
B and A are symmetric matrices as in [34].
Theorem 2 Given B = A + UGVT, where A, B 3ftnxn, G 3?pXp is nonsingular,
U = [ui,..., Up] 3Â£nxp, V = [vj,..., Vp] G 3?nxp, with both the sets {ui,...,up}
and {vj,..., Vp} containing linearly independent vectors. Assume null( A) C nu//(B)
153
value problems in PDEs. The extension is straightforward by replacing the spec
tral characteristics of the membrane or thinplate stabilizer by that of the associated
differential operator. There is no restriction on the order of the partial differential
equation. From the success of our adaptive preconditioner in the early vision prob
lems, we believe the adaptive preconditioner will lead to drastic acceleration for the
numerical solution of the boundary value problems.
In the construction of our adaptive preconditioner, we approximate the asso
ciated regularization filter by a linear shiftinvariant filter in a wavelet basis, since
the modulation matrix in our preconditioner gives uniform weighting to the wavelet
basis functions corresponding to the same frequency band. As discussed in section
5.2, the regularization filter is spatially varying in general, especially for the regu
larization problems with discontinuities or sparse data constraints. However, it is
possible to construct a spatially varying filter in a wavelet basis to achieve a better
approximation to the regularization filter.
The other possibility to design an adaptive preconditioner is to approximate the
generalized inverse of (K0 + UVT) by using our generalized capacitance matrix algo
rithm or the generalized ShermanMorrisonWoodbury formulas [11, 34, 67]. Using
this approximation, we can design a new adaptive preconditioner that accounts for
discontinuities and sparse data constraints in early vision problems. This will be the
focus of our future research.
The important issue of discontinuity detection has not been discussed in the
previous chapters. The early vision problem with discontinuity detection can be
5.15 The convergence of the APCG, HCG, and CG algorithms for the
shapefromshading problem 118
5.16 Shaded images of bump after 100 iterations of APCG algorithm with
out using any boundary condition 118
5.17 Shaded images of bump after (a) & (b) 10 iterations, (c) & (d) 20
iterations, (e) & (f) 40 iterations of the APCG algorithm with the
occluding boundary condition 120
5.18 The computed (a) gradient and (b) height after 40 iterations of the
APCG algorithm 121
5.19 The convergence curves of the (a) energy, (b) p q error, (c) z error
for the CG, HCG and APCG algorithm 122
5.20 One frame from the (a) square image sequence and (b) rotating Rubic
cube image sequence 124
5.21 Computed flow of the (a) translating square and (b) rotating Rubic
cube 125
5.22 The convergence curves of the CG, HCG and APCG algorithms for
(a) translating square (b) rotating Rubic image sequence 126
6.1The discrepancy between the gradient directions of the function I0 and
7i at the ith location is small and thus the use of the gradient direction
as the search direction for the ith component is robust; while the the
jth component of the gradient direction is unreliable for a local search. 142
6.2 Frames from the (a) Square 2 and (b) Diverging Tree image sequences. 144
6.3 Computed optical flow of the (a) Square 2 and (b) Diverging Tree
image sequences after 40 iterations of the incomplete Cholesky pre
conditioned CG algorithm 145
6.4 (a) One frame from the Rotating Rubic Cube sequence (b) Computed
optical flow after 40 iterations of the incomplete Cholesky precondi
tioned CG algorithm 147
6.5 Frames from the (a) Sinusoid 1 and (b) Translating Tree image sequences. 147
6.6 Computed optical flow of the (a) Sinusoid 1 and (b) Translating Tree
image sequences, using the preconditioned nonlinear CG algorithm. 149
6.7 (a) One frame from the Hamburg Taxi sequence (b) Computed optical
flow after 200 iterations of the preconditioned nonlinear CG algorithm. 150
x
52
to get the solution y which has the same projection on the range of A as 77/3. As for
the projection of rj/3 on the null space of A, it can be easily computed from the inner
product of Â¡3 and a, = (a,,i,... ,a,)P), where a.j = qJrÂ¡j, for 1 < i < mA m. The
coefficients a can be chosen arbitrarily as long as the assumption that rj1... rÂ¡mA_m
are linearly independent and linear combinations of q'l5..., q,mx_m is satisfied. Thus,
V P can be obtained from the following equation.
Am aTv
vP = y+ E (42)
,=1 H q.
The flowchart of the generalized capacitance matrix algorithm is given in figure
4.1.
4.2 Choice of the Splitting
From the variational formulations for the early vision problems, we can see it
is necessary to solve linear systems of the form Kx = b, where the matrix K
$NxN(N = n2) is the sum of the matrices K and AKS obtained from the data con
straint and smoothness constraint respectively. To apply the generalized capacitance
matrix algorithm to solve the linear systems, we have to choose a splitting scheme
for K at first. We discuss the nice choice of splitting K in this section.
Assume we split the matrix K into two components Ko and UVr. Note that the
matrices K and K0 can be replaced by the matrices B and A, respectively, used in
chapter 3. The notation K and K0 (popularly used in computational vision literature)
are used instead of B and A (widely used in the field of capacitance matrix methods)
127
frequency characteristics of a chosen wavelet basis leading to the construction of our
preconditioner. The preconditioning technique was demonstrated for the surface re
construction, shape from shading and optical flow computation problems. Through
experiments, we demonstrated the superiority of our preconditioning method over
existing methods.
Our adaptive preconditioning technique can be used in conjunction with any
wavelet basis. The choice of the wavelet basis is crucial to achieve efficient pre
conditioning. The ideal wavelet needs to be well localized in spatial (or time) and
frequency domain. Good spatial localization makes the QMF implementation very
efficient, i.e., the wavelet transform can be accomplished very efficiently. Good local
ization in frequency domain of the wavelet is necessary to achieve fast convergence
since the the accuracy of spectral approximation to the regularization filter depends
on the localization in frequency domain of the wavelet. But the localization in spa
tial and frequency domain cannot be arbitrarily small, because the product of their
bandwidths is lower bounded by a constant which is known as the uncertainty
principle or Heisenberg inequality. This implies that there is a tradeoff between the
spatial and frequency localization. Gaussian functions are optimal in the sense that
they meet the bound with equality [20]. If we use the wavelet with the optimal
property, the preconditioning should be more efficient.
25
Two new and robust algorithms namely, a modified gradientbased algorithm
and an SSDbased regularization algorithm, for accurate optical flow computation
are presented in this dissertation. In the modified gradientbased algorithm, we
propose to selectively combine the image flow constraint and the contourbased flow
constraint into the data constraint in a regularization framework to amend the errors
in the image flow constraint caused by the brightness discontinuities. The image flow
constraint is disabled in the neighborhood of discontinuities, while the contourbased
flow constraint is active at discontinuity locations. The SSDbased regularization
algorithm uses the SSD measure as the data constraint in a regularization framework,
which leads to a nonlinear optimization problem. These two different regularization
formulations are described below. These two robust algorithms will be discussed in
details in chapter 6. In this section, we briefly describe our modified gradientbased
formulation.
Our modified gradientbased formulation modifies the Horn and Schunk for
mulation [40] to incorporate a separate condition along discontinuities in the image
intensity [35]. This modification drastically enhances the performance of our method
over other gradientbased methods because the optical flow constraint equation is in
valid along intensity discontinuities and causes large errors in the computed flow. We
get drastically better and more robust estimates of optical flow than any gradient
based optical flow technique reported in Barron et al. [3].
Our variational formulation of the optical flow problem involves minimizing a
combination of the optical flow constraint equation with a global smoothness term
72
4.3.2 Modified BCG Method for the Capacitance Matrix Equation
In this section we present an algorithm to solve the capacitance matrix equa
tion defined earlier. The conjugategradient(CG) method is a very powerful iterative
scheme to solve the symmetric positive definite linear system. A natural generaliza
tion of the CGtype method to solve a general nonsymmetric linear system is called
the biconjugate gradient(BCG) method [46]. The linear system with the capacitance
matrix is a nonsymmetric and indefinite system. The size of the capacitance ma
trix is determined by the rank of the matrix UVT, which contains the difference
between K and K0. With the K0 proposed in section 4.2, the size of the associated
capacitance matrix is usually 0(n). The BCG method requires the computation of a
matrixvector product in each iteration. Since the capacitance matrix C is dense, this
matrixvector product takes 0(N) operations. In this section, we present a modified
biconjugate gradient method to solve this linear system by applying the wavelet trans
form technique to approximate the multiplication and reduce the computational cost
to 0{n). Since the BCG converges in O(n) iterations, the computational complexity
for the modified BCG method is 0(n2) = 0(N).
The BCG algorithm [46] for the nonsymmetric linear system C/3 = f is as
follows.
0. Choose /30, and set q0 = r0 = f C/30; Choose ?0 / 0, and set q0 = ?0;
1. Compute af = = qLaCq*i, and ak = af/af;
2. Update f3k = /3fc_x + akqk
177
The computational cost of our hybrid search algorithm will be very high when
the problem size is large. One possible way to improve the efficiency of this algorithm
is to modify the informed GA as a multiscale stochastic optimization method. To
achieve this, it is necessary to construct a multiscale representation for the line process
field. Heitz et al. [33] constructed a multiscale representation for a label field and
used a multiscale relaxation algorithm to minimize the energy function for some early
vision problems, ft is possible to generalize this construction for the line process
field and apply a multiscale informed GA to make the hybrid search algorithm more
efficient.
97
The strategy for constructing our adaptive preconditioner for the early vision
problems with regularization formulation was discussed in section 5.2. Our precon
ditioner is based on the idea of modulation in a wavelet basis to approximate the
spectral characteristics of the matrix K. Accordingly, our adaptive wavelet precondi
tioner P took the form WMWr with the diagonal modulation matrix M encoding
the spectral approximation in the wavelet basis. Now, the primary problem in this
preconditioner construction is how to construct the modulation matrix M for general
problems. In the following paragraphs, we show how to achieve the construction of
an adaptive preconditioner for the surface reconstruction, shape from shading, and
optical flow computation problems.
5.3.1 Surface Reconstruction
For the surface reconstruction, the stiffness matrix K(= Kj + Ks) contains the
matrix obtained from the data constraint and the matrix K4 corresponding to the
smoothness constraint. When the controlledcontinuity stabilizer is used, the matrix
Ks is the linear combination of the discrete Laplacian matrix ( from membrane) and
the discrete biharmonic matrix ( from thin plate). Since the spectral distributions for
both matrices are known a priori, the spectral distribution for the matrix Ks can be
obtained from their linear combination. Now, what remains to be determined is the
spectral distribution for the matrix K<. For the dense data case with uniform weights
for each data constraint, the matrix Kj is a scaled identity matrix, whose spectral
distribution is a constant function. This case corresponds to the discrete version
of the linear shiftinvariant lowpass filtering interpretation for the regularization
185
[54] S. G. Nadabar and A. K. Jain, Parameter estimation in MRF line process
models, in Proceedings of the IEEE Conference on Computer Vsion and Pattern
Recognition, (Champaign, IL), pp. 528533, IEEE, June 1992.
[55] H.H. Nagel, Constraints for the estimation of displacement vector fields from
image sequences, in Proceedings of the 8th International Joint Conference on
Artificial Intelligence, (Karlsruhe, Germany), pp. 945951, 1983.
[56] H. H. Nagel and W. Enkelmann, An investigation of smoothness constraints
for the estimation of displacement vector fields from image sequences , IEEE
Trans. Patt. Anal. Machine Intell., vol. PAMI8, no. 5, pp. 565593, 1986.
[57] G. M. Nielson and R. Franke, A method for construction of surfaces under
tension, Rocky Mountain Journal of Mathematics, vol. 14, no. 1, pp. 203221,
1984.
[58] D.P. OLeary, A note on the capacitance matrix algorithm, substructuring,
and mixed or neumann boundary conditions, Applied Numerical Mathematics,
vol. 3, pp. 339345, 1987.
[59] A. V. Oppenheim, Discretetime Signal Processing. Englewood Cliffs, NJ: Pren
tice Hall, 1989.
[60] A. Papoulias, liCurve Segmentation Using Weak Continuity Constraints, Mas
ters thesis, University of Edinburgh, 1983.
[61] A. Pentland, Interpolation using wavelet basis , IEEE Trans. Patt. Anal.
Machine Intell., vol. PAMI16, no. 4, pp. 410414, 1994.
[62] T. Poggio and V. Torre, IllPosed problems and regularization analysis in early
vision , Technical Report AI Memo 773, MIT, A.I. LAB. Cambridge, MA 1984.
[63] T. Poggio, V. Torre, and C. Koch, Computational vision and regularization
theory , Nature, vol. 317, pp. 314319, 1985.
[64] M. Proesmans, E. Pauwels, and L. Van Gool, Coupled geometrydriven diffu
sion equations for lowlevel vision , in GeometriyDriven Diffusion in Computer
Vision, pp. 191228, Dordrecht, The Netherlands: Kluwer Academic Pub., 1994.
[65] W. Proskurowski and 0. Wildlund, On the numerical solution of Helmholtzs
equation by the capacitance matrix method, Math. Comp., vol. 30, pp. 433468,
July 1976.
[66] W. Proskurowski and 0. Wildlund, A finite element capacitance matrix method
for the Neumann problem for Laplaces equation, SIAM J. Sci. Statist. Corn
put., vol. 1, no. 4, pp. 410425, 1980.
21
PDE formulation
The shape from shading problem is commonly posed as a nonlinear, first or
der partial differential equation in two unknowns, called the image irradiance equa
tion, E(x,y) = R(p,q) [39], where E(x,y) is the image irradiance at a point (x,y),
p = vx, q = vy are first partials of the surface function u(x,t/), and R(p,q) is the re
lation between surface orientation (p, q) and image irradiance E(x,y). To overcome
the ambiguity caused along occluding contours in the gradient space parameteriza
tion (p, q), a reparameterization of surface orientation in terms of a stereographic
mapping: / = 2ap, g = 2aq with a = 1/(1 + \/l + p2 + q2) is used. With this repa
rameterization, the overall energy function (sum of stabilizer and data energies) to
be minimized is expressed as
Â£{f,9) = X j jQ(fx + fl) + (sl + 92y)dxdy + J jjE(x,y) R{f,g)]2dxdy. (2.17)
The Euler Lagrange equations which express the necessary condition for a minimum
are given by the system of coupled PDEs
A/ = A[ R(f,g)E(x,y)]R,
Ag = \[R(f,g) E(x,y)]Rg
(2.18)
The above equations may be discretized and solved using the algorithm of Sim
chony et al. [72], which enforces integrability condition on p and q. The method
involves solving three different discretized Poisson equations namely, Kxi = bi,
54
this property has the advantage of saving the computational cost in solving equation
3.3 for each u,0 < i < p 1, since there are only one or two nonzero components in
each u,. Therefore, we need to compute the solution h0 to a linear system consisting
of a matrix K0 and an RHS vector which is the projection of the unit vector onto
the range of K0. Because, h0 is translation invariant, we can obtain the solution of
rÂ¡i in equation 3.3 without having to solve any of the linear systems associated with
each ?7.
For ease of exposition, we consider the splitting scheme for the surface recon
struction problem in the following. This splitting schemes can be extended to other
early vision problems.
The stabilizer discussed in the previous section imposes smoothness constraints
on the surface reconstruction problem leading to a C surface i.e., the membrane
spline, a C1 surface namely, the thin plate spline, or a combination thereof called
the thinplatemembrane spline. As mentioned earlier, the numerical solution to the
surface reconstruction problem involves solving a linear system Kx = b. The matrix
K is large and sparse but not tightly banded. This is primarily due to the fact
that a vector has been used to represent a twodimensional array which leads to
representing neighboring interactions in twodimension by interactions between very
distant elements in onedimension. The matrix K is a sum of two matrices Ks and
Kd, with Ka containing the membrane or thinplate molecules [81] defined on the
interior and boundary of f2. The IQ matrix is a diagonal matrix containing nonzero
weights.
46
At] from (3.21), to get B77/3 = 0, i.e., tj/3 nu 11(B). Since null(A) = null(B), we
have Ar](3 = 0. Using equation 3.21 for At], we get U/3 = 0. From the assumption on
the matrix U namely, Ui,..., up are linearly independent, we conclude that (3 = 0.
Therefore, C is nonsingular.
In the above theorems, we impose the constraints on the relationship between
B and A either via null(B) C null(A) and range(A) C ran<7e(B) or via null(A) C
null(B) and range(B) C range(A). Now we consider the general case, i.e., no con
straint on the relationship between B and A is imposed. The following theorem is
given for this general case and can be used for any matrices B and A in 3?nXn as long
as there exists a solution for the associated capacitance matrix equation.
Theorem 3 Given B = A + UGVT, where B, A Â£ 5RXri, G 3?pXp is nonsingular,
U = [ui,...,Up] 3inXp, V = [vi,...,vp] 3inxp, with both the sets {ui,...,up}
and {vi,...,Vp} containing linearly independent vectors. Let rank(A) = n ,
dim(null(B) C\ null(A)) = m', and dim(null(Br) H null(AT)) = m, 0 < m',m <
rriA < n, then there exist two sets of orthogonal basis, {qi,... ,qm/l} and {q'j,..., q'm^ },
fornull(AT) and null(A) respectively, such that null(BT)r\null(AT) = span{qm/l_m+i,
...,qm/1) and null (B) C\ null (A) = span{q'mx_m,+1,..., q'mA}. Let rank( B) =
n mg, 0 < m',m < mg < n, then there exist mg m' vectors r'm<+i,..., r'mB
such that the set {q'm^_m/+1,..., q'mA, r'm<+i,..., r'mj9} is an orthogonal basis for
null(B). Let qjuj = cti6(i,j), a, ^ 0, for 1 < i,j < m.
172
Figure 7.2. The sparse data set used for the surface reconstruction with discontinuity
detection, (a) original surface (b) sparsely sampled data set.
step, we need to compute the energy function value for the perturbed new line process
configuration and then determine the acceptance probability PaCcepti given by
i 1 when E'(\new) < E'(lold),
Paccept(\old lneu;) \
y e(Â£'(lneu,)Â£'(l0id))/7 when E'(\new) > E'(l0id),
where 1 m is the line process before the exploration step and lnew is the one after the
exploration step. Computation of the energy function value Â£'(1) given 1 involves
minimizing a quadratic convex function, which is solved via the generalized capac
itance matrix algorithm. It is necessary to repeat these two steps many times for
the Gibbs sampler to converge, therefore the computational cost is extremely high in
implementing the Gibbs sampler. In our experiments, we use an approximate Gibbs
sampler to reduce the computational cost drastically. This Gibbs sampler uses the
approximation that the minimum solution u0jd obtained by minimizing the function
f7(ul0/d) is similar to the minimum solution uneu; for the new perturbed line process
138
obtain the image flow constraint in equation 6.3. Consequently, the gradientbased
regularization approach can be perceived as an approximation to the SSDbased
regularization formulation. The energy function to be minimized in the gradient
based regularization is quadratic and convex, while the energy function in equation
6.10 is neither quadratic nor convex. For the quadratic and convex energy function in
the gradientbased regularization, the energy minimization can be accomplished by
solving an SPD linear system. The minimization of the nonquadratic and nonconvex
energy function is more difficult and requires more computational effort. The SSD
based regularization can give more accurate estimation than that from the gradient
based regularization, since the data constraint used in the SSDbased regularization
is more precise than that in the gradientbased regularization. However, it definitely
requires more computational work to minimize the energy function.
The total energy function in equation 6.10 is a nonlinear function with a large
number of variables. We use the nonlinear conjugate gradient method [23] with
a modified search direction to minimize this nonlinear energy function in equation
6.10. The nonlinear conjugate gradient algorithm is very efficient in the unconstrained
optimization of largescale nonlinear functions [23]. Although this algorithm can only
find the local minima, we can obtain very accurate solution when the initial solution
is close to the true solution. For example, small displacement between consecutive
frames can be easily found when we start the nonlinear CG from zero initial values.
In fact, a very good initial solution can be obtained from the optical flow estimated in
the previous frames. In addition, the scale space tracking scheme [89] can be employed
132
Recently, Barron et al., [3] have presented a comprehensive survey of various
optical flow computation techniques along with a quantitative comparison of them.
From their experimental results, they argue that the phasebased method by Fleet
and Jepson [18] gives the most accurate result. However, most of the reported results
are only for sparse optical flow measurements obtained after thresholding unreliable
components, present in the neighborhood of phase singularities.
The remainder of this chapter is organized as follows. In the next section, we
present our modified gradientbased method that combines the regionbased gradient
constraint in the regions away from brightness discontinuities and the contourbased
gradient constraint at these discontinuity locations. In section 6.3, our SSDbased
regularization technique for optical flow computation from two images of a time
sequence is proposed. The experimental results for both algorithms on synthetic and
real image sequences are presented in section 6.4. Finally, we conclude this chapter
in section 6.5.
6.2 Modified gradientbased method
At the foundation of the standard gradientbased approach is the image flow
constraint equation,
Ix(x, y, t)u(x, y, t) + Iy(x, y, t)v(x, y, t) + Jt(x, y, t) = 0 (6.3)
where I(x,y,t) is the image brightness function at location (x,y) and time t, Ix,Iy
and It are the partial derivatives of I with respect to x, y and t, respectively, and
129
the motion boundary, Nagel [56] proposed the oriented smoothness measure to sup
press the smoothness constraint in the direction perpendicular to boundaries. How
ever, the smoothness across motion boundaries problem can be resolved by using the
regularization with discontinuities [80].
A major problem with the gradientbased approach is the unreliability of the
image flow constraint equation in the areas whose local brightness function is highly
nonlinear. These areas occur in scenes containing highly textured regions, motion
discontinuities, or depth discontinuities. The firstorder Taylor series approximation
used in the derivation of the image flow constraint equation leads to inaccuracies
when the higher order terms are significant. The higher order terms are neglected
in the derivation of the image flow constraint equation by assuming the time step
between consecutive frames is arbitrarily small, i.e. approaches 0, which is far from
being practical. Usually, the time step between consecutive images is fixed and is not
arbitrarily small. Therefore, the gradient constraint equation is reliable only when
the higher order derivatives of the brightness function are insignificant, i.e., the local
brightness function is well approximated by a linear function. Most of the gradient
based methods do not account for the reliability of the gradient constraint equation.
In [32], Heitz and Bouthemy used the hypotheses testing technique to determine
whether the image flow constraint equation is valid at each site in the image plane.
The above mentioned gradient constraint is very unreliable in the neighborhood
of discontinuities of the brightness function. However, the contourbased methods
can give robust estimates of the optical flow along the zerocrossing contours, which
171
In step 2, we need to compute the fitness function value for each individual line
process configuration. To compute the fitness function value E'( 1) given in equation
7.8, it involves solving a convex and quadratic minimization problem. This can be
very efficiently solved by using either our generalized capacitance matrix algorithm
or our adaptive preconditioned conjugate gradient algorithm, presented in chapters
4 and 5.
7.3.3 Preliminary Experimental Results
We implemented our hybrid search algorithm on the sparse data surface recon
struction problem. In our experiments, our hybrid search algorithm consists of the
informed GA for the binary line process 1 and the generalized capacitance matrix
algorithm for determining the fitness function value for a prespecified line process
configuration. The sparse data set ( see figure 7.2(b) ) is obtained from sampling
the original surface shown in figure 7.2(a). We discretized this problem on a 32 x 32
mesh. The sparse data set shown in figure 7.2(b) contains 64 data points.
In our experiments, we use the cliques for the line process used by Geman and
Geman [22], shown in figure 7.3(a). The line clique potentials for different line process
configurations are shown in figure 7.3(b). This choice of the line clique potentials is
ad hoc. However, they can be obtained via parameter estimation in MRF models
[54].
The Gibbs sampler [22, 90] is used to implement the informed mutation operator
in our informed GA. There are two steps in the Gibbs sampler, i.e., the exploration
step (or perturbation step) and acceptance step. Each time after the exploration
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Baba CXVemuri, Chair
Associate Professor of Electrical
and Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
John Staudhammer
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
KeiYh L. Doty
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
in G. Harris
assistant Professor of Electrical
and Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation Ijoj/the degree of Doctor of Philosophy.
Permit N. Sigmon
Associate Professor of Mathematics
162
As shown in chapter 2, either the deterministic formulation or the probabilistic
formulation leads to the minimization of a nonconvex energy function E(u, 1), where
1 = [h v], given in equation 2.6 or 2.13. This energy function is a hybrid of the real
variables and boolean variables. We know that the GA is well suited for combinato
rial optimization problems because of its bitstring representation. This nonconvex
minimization problem can be transformed to a combinatorial optimization problem
as follows.
The energy function i?(u,l) can be rewritten as
E(u,l) = Â£(u  1) + Â£/(l),
(7.2)
Â£(u 11) = K,,ud2 + Â£/(ul),
(7.3)
m = e ^(i).
(7.4)
c
For the fourconnected neighborhood system, the conditional clique potential of the
real variables u given the line process 1, U(u  1), is analogous to the quadratic mem
brane smoothness energy, i.e.,
Â£/(ul) = Â£Vc(ul)
c
~ X^((ui+l>j uj) (1 hi'j) + (u,j+l Uu,j) (1 vi,j)) (7*5)
J
13
derivative along x and y, then
D(u,d) =  Ku d 2 (2.4)
S{ u) = iuTKsu (2.5)
For the choice of firstorder differential stabilizer, i.e. the membrane smoothness, the
operator Ka is a Laplacian matrix.
The regularization method imposes the smoothness constraint over the entire
domain. It also smoothes out the discontinuities which is very undesirable. Incorpo
rating the discontinuity detection into the early vision problem, the energy function
to be minimized can be modified as [8]
F7(u,h,v) =  Kju d  + ^((ut+u uj) (1 hÂ¡) + (u,j+i uuj) (1 u,j))
J
+ E^(h,v) (26)
where hij and V{j are the horizontal and vertical line processes, respectively, which
are binary variables with a value 1 denoting a vertical or horizontal discontinuity
at that location and a 0 otherwise. The line energy V(h,v) is a term that may
be used to penalize the inclusion of discontinuities or to impose the smoothness of
the line process. Most of the deterministic formulations take very simple functions
for F(h,v) merely based on some heuristics; for example, Blake and Zisserman [8]
take F(h, v) = hj + vj, which has no interaction between individual line processes
variables.
70
For any positive pj, the function ^rij=i(^)2 is always less than the righthand
side when x > Â£. Therefore, in equation 4.28, the interval (4sin2(^),4) can be
M 1 n2
replaced by S = (4 sin2(^), mm(4, [)). The convergence requirement in equation
Min2
4.28 can be made less stringent by using the inequality,
1 1
max TT (
xS xl M
J = 1
X Pj \2
X + Pj
t
< (max
v xes
n<
xPj
X+Pj
)2)(
max
xes x*
(4.29)
leading to the following modified requirement which when satisfied automatically
makes the inequality 4.28 true.
max TT ( )2 < (Mn2q)( r) = (Mn29)(4sin2())9 = M' (4.30)
This bound M' will approach a constant when n oo. Thus, the problem becomes
one of finding the number of iterations t and the parameters pj such that the re
quirement in 4.30 is satisfied. This requirement is of the same form as the classical
ADI minimax problem which was solved by Jordan as quoted in [87]. We can use the
result in [87] to determine the number of iterations required for convergence. The
number of iterations needed to meet this requirement is
J
k'
v
/
lnÂ£rln%
1,
v' + vV2 1
1. 1,
(4.31)
(4.32)
(4.33)
147
(a)
(b)
Figure 6.4. (a) One frame from the Rotating Rubic Cube sequence (b) Computed
optical flow after 40 iterations of the incomplete Cholesky preconditioned CG algo
rithm.
(a)
(b)
Figure 6.5. Frames from the (a) Sinusoid 1 and (b) Translating Tree image sequences.
23
+ J Jj.E{x, y) R(p, q))2dxdy.
The first term in the above equation is a weak smoothness constraint and the coeffi
cient A controls its contribution toward the final solution. The second term enforces
the integrability constraint, ensuring that the gradients (p, q) of the recovered surface
Z match those of a valid surface. Zy are the first partials of the surface Z(x,y).
The above energy can be discretized directly by using finite element techniques
however, the same set of discrete domain equations can be arrived at via the use of
finite difference approximations for this application [76]. The discrete energy is given
by
e = ^J2(E(x^y)R^
where the summations are over all the pixels in the image. In vector notation, the
above equation can be written as
E(u) = utKu bu + C (2.21)
j
Where, u = [zTpTqT] is a concatenation of the vector representation of the fields
Z,p and q. The minimum of the above quadratic is obtained by solving Ku = b with
Ku containing linear u terms and the constant and the nonlinear terms form the b
vector. This is an (N x N) sparse nonlinear system which is usually solved using
iterative methods. These iterative methods to solve the nonlinear system normally
170
This mutation transition probability can be simplified to be
Pm{ 1.
1 E'(l)/T
ze
(7.15)
This informed mutation operation can be implemented by using a Gibbs sampler
[22, 90].
By using the above reproduction and mutation operators with the initial popula
tion sampled from the Gibbs distribution given in equation 7.9, we can show that each
individual at the nth generation is a sample from the following Gibbs distribution
Pn(\)
z
1 ~(n+l)E'(\)/T
z
y e(n+i)Â£'(l)/r
(7.16)
(7.17)
Thus, the asymptotic convergence of this informed GA can be proved directly from
equation 7.16 by assuming n > oo.
Our informed GA is summarized as follows:
1. Choose an initial population from a Gibbs distribution in equation 7.9.
2. Determine the fitness function value for each individual.
3. Perform Boltzmann weighted selection.
4. Perform the informed mutation operation by using a Gibbs sampler.
5. Check stopping criterion; if satisfied, stop; else go to step 2.
109
(a)
(b)
Figure 5.6. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm in the surface reconstruction problem with the synthetic
sparse data set and (a) with controlledcontinuity stabilizer (p = 1, r = 0.1) (b) with
thinplate stabilizer.
Figure 5.7. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm for the surface reconstruction problem with discontinuities
specified in the controlledcontinuity stabilizer.
45
Substituting equation 3.30 into equation 3.29 yields
Bt(A+TV7) = VGt(G1 + VTA+U)T7 (3.31)
Again, using Proposition 2, we can show that C = G1 + VTA+U. Consequently,
we obtain Br(A+TV7) = VGTCT7 = 0 since CTj = 0 by assumption. Thus,
A+rV7 Â£ null(BT). Combining this with the equation 3.27 leads to the conclusion
that b is not in range(B). This is a contradiction to the assumption b Â£ range(B).
Therefore, the assumption that VTx is not in the range of C is untrue. Hence, we
can conclude that VTx C. This proves that there exists a solution to equation
3.22.
(3) The proof for B(x rj/3) = b is very similar to part (3) of the proof for
Theorem 1 and is omitted here.
If we restrict the above constraints on the relation between B and A to nu 11(B) =
null(A) and range(B) = range(A), then the capacitance matrix C is nonsingular,
i.e., the capacitance matrix equation has a unique solution. This result is stated in
the following corollary.
Corollary 1 In Theorem 2, if null(A) = null(B) and range(A) = range(B), then
the capacitance matrix C is nonsingular.
Proof: To prove that C is nonsingular, we show that C(3 = 0 => (3 = 0. Suppose
C/3 = 0, then, from the definition of C, we have Vttz/3 = G_1/3, which can be
substituted into the equation Btj/3 = Arj/3 + UVTT7/3, along with the equation for
122
... CG
HCG(L=2)
HCG(L=3)
HCG(L4)
APCG
15 20 25 30
Number of iterations
. . CG
HCG(L2)
HCG(L3)
HCG(l4)
20 25 30 35
Number of iterations
(b)
(c)
Figure 5.19. The convergence curves of the (a) energy, (b) p q error, (c) z error
for the CG, HCG and APCG algorithm.
88
K. This approximation is achieved via the frequency domain analysis of the regu
larization operators and using the frequency band partitioning behavior of a wavelet
basis. Therefore, our adaptive preconditioner is akin to a spectrally equivalent op
erator for the original matrix K. Furthermore, the solution of a linear system using
our adaptive preconditioner can be obtained by taking the wavelet transform, scaling
the coefficients, and taking the inverse wavelet transform. By using a very compactly
supported wavelet, the solution update can be achieved very efficiently.
5.2.2 Regularization Filter
The filtering interpretation of regularization operators has been discussed in
vision literature [80, 83]. The inverse of the controlcontinuity stabilizer (Laplace +
Biharmonic operators) may be interpreted as a lowpass filter hence, the solution to
an early vision problem in a regularization framework can be obtained by applying
this lowpass filter to the data. We will use this filtering behavior of the regularization
operator to develop a preconditioner which approximates this filter in a wavelet basis.
In section 2.3, we have seen the variational formulations of various early vision
problems that lead to minimizing an energy functional. This minimization can be
achieved by solving the associated EulerLagrange equation which for the surface
reconstruction problem in the continuous data case (with a controlledcontinuity
stabilizer) can be written as
/>{x){(l t(x))A2u(x) t(x)Av(x)} + c(x)t;(x) = c(x)d(x),
(5.3)
142
Figure 6.1. The discrepancy between the gradient directions of the function I0 and
/x at the ith location is small and thus the use of the gradient direction as the
search direction for the ith component is robust; while the the jth component of the
gradient direction is unreliable for a local search.
Figure 6.1 depicts the gradient directions of Io and Ii at two locations. The
gradient directions at the ith location are close, therefore the use of it as the search
direction is robust for the ith component. The two gradient directions at the jth
location differ by a significant angle, which means the jth component of the gradient
is unreliable as a local search direction.
We can set a threshold on the smoothness measure in order to reject the unre
liable components of the gradient direction as the basis of a local search direction.
In addition, this smoothness measure is also used as a confidence measure for flow
vectors.
6.4 Experimental Results
In this section, our modified gradientbased regularization algorithm and SSD
based regularization algorithm are tested on a variety of synthetic and real image
sequences. Comparisons of our results with the previously reported results in [3, 77]
are made to demonstrate the performance of our proposed algorithms.
155
deterministic methods include the GNC (graduated nonconvexity) algorithm, ana
log networks, mean field annealing and dynamic programming. As for the stochastic
methods, there are the simulated annealing, MPM (maximizer of posterior marginals)
and the GA (genetic algorithm).
7.2.1 Deterministic Optimization Methods
Most of the deterministic minimization methods can not find the global min
imum solution for the nonconvex function in equation 2.6, instead they lead to an
approximate solution or a locally optimal solution. But they can converge much
faster than the stochastic methods. The following deterministic methods have been
proposed in literature for this nonconvex minimization problem. We give a brief
description of the same.
GNC algorithm
The GNC (Graduated NonConvex) algorithm was proposed by Blake and Zisser
man [8] and can be considered as a continuation method widely known in numerical
analysis literature [14]. For the GNC algorithm to minimize the nonconvex func
tion F(u, h, v) given in equation 2.6, a dual energy i^u) is derived by the following
equation
F(u) = infÂ£(u,h,v). (7.1)
h,v
A sequence of approximating functions F'p)(u), 0 < p < 1, are constructed such that
Fis convex and Fis equal to the dual energy F. Then, gradient decent tech
niques are employed to minimize F^ for a prescribed decreasing sequence of values
71
where, \z] denotes the smallest integer larger than 2 and v is the expansion factor of
an interval (a, b) defined as K The expansion factor u of the interval of interest (5)
is fixed when n approaches infinity, i.e.
lim v =
n 00
1
47r2M1/?
(4.34)
In the classical ADI minimax formulation, v is the spectral radius of the eigenvalues
being considered in A, which is shown be of the order of n2 [85]. Since we use
a different formulation for convergence and impose the spectral assumption on the
data, the resulting v becomes When n > 00, the ratio v approaches a
constant as given in equation 4.34 and M' also approaches a constant as mentioned
above. Therefore, we conclude that limn_0O J is a constant, i.e. there is a constant t
such that  AX'f Up < eX';r is satisfied for all n.
In [87], an optimal set of ADI parameters for the general ADI iterations was
given. We may use these parameters to solve the minimax problem in equation 4.30.
Although this set of ADI parameters is not the optimal set to meet our convergence
requirement, it provides us with a parameter set which can achieve the convergence
criterion in a constant number of iterations. Therefore, we use the result in [87] to
select a suboptimal ADI parameter set. Solving for the optimal ADI parameters in
our problem requires complicated analysis using transformation theory [87] and is
currently under investigation.
126
(a)
(b)
Figure 5.22. The convergence curves of the CG, HCG and APCG algorithms for (a)
translating square (b) rotating Rubic image sequence.
regularization formulations of the optical flow problem. For the rotating Rubic cube
experiment, the computed optical flow after 15 iterations of our APCG algorithm is
shown in figure 5.21(b).
5.5 Summary and Discussion
In this chapter, we presented a novel physicallybased adaptive precondition
ing technique which was used in conjunction with a conjugate gradient algorithm to
drastically improve the speed of convergence for solving various early vision prob
lems. A preconditioner based on the membrane spline or the thin plate spline or a
convex combination of the two was termed as a physicallybased preconditioner for
obvious reasons. The adaptation of the preconditioner to an early vision problem
was achieved via the explicit use of the spectral characteristics of the regularization
filter in conjunction with the data. This spectral function was used to modulate the
119
HCG and CG algorithms. The convergence curves for the energy function and the
computed (p, q) error are shown in figure 5.15. Note that the (p, q) error is defined as
H \Pi~P,*l + kt' 9t*l> where (p,9,) and (p*,q*) are the computed and true (p,g) at the
ith location, respectively. We can see the APCG algorithm has the best convergence
behavior in figure 5.15(a) and (b). The computed solution after 100 iterations of the
APCG algorithm is fed back to the reflectance model to generate the left and right
bump images as shown in figure 5.16. Comparing these computed images to those in
figure 5.14, we can see that this computed solution is far from the true solution in
the upper left region, although its corresponding energy is close to 0. This is because
the energy function for the SFS problem is nonconvex and gradientbased methods
(e.g. CGtype methods) tend to get trapped in local minima. To obtain the true
solution using the gradientbased method, we need to have an appropriate initial
guess that is closer to the true solution. But it is difficult to find such initial guess
that is guaranteed to converge to the true solution in the CGtype algorithms. In
fact, we always use 0 as the initial guess in our experiments.
To resolve the above problem for minimizing the SFS energy function by gradient
based methods, we include the boundary conditions obtained from the occluding
boundary [41] as an additional data constraint in the energy function to be mini
mized. By incorporating the boundary conditions into the SFS energy function, the
CGtype algorithms can approach the true solution. In our experiment, we use the
active contour model [43] to locate the occluding boundary at first. By using the
property that the surface normal at a point on the occluding boundary is parallel
55
By making use of the structure of the matrix K, we can split K into two com
ponents K0 and UVT, with K0 and U, V 3Â£iVxp, such that K0 is close to
K, i.e., the rank of UVT is small, and there exists a fast algorithm to solve a linear
system with the matrix K0. Then, the solution to the linear system Kx = b can
be obtained by using the generalized capacitance matrix algorithm which involves
solving the wellstructured linear systems with the matrix Ko and the capacitance
matrix linear system.
The choice of K0 depends on the data compatibility matrix and the smooth
ness constraint matrix Ks. We have different choices of K0 for sparse (scattered) and
dense data cases. They are given in the following sections.
4.2.1 Dense Data Surface Reconstruction
We will first describe our choice of the matrix K0 and the equivalent Lyapunov
matrix equation for the dense data surface reconstruction with a membrane spline.
Then we will present the matrix Ko for the thinplatemembrane spline surface re
construction based on the construction scheme for the membrane problem. The thin
plate spline is a special case of thinplatemembrane spline and will not be discussed
separately. In addition, the selection of the matrices U and V will be discussed for
both cases.
Membrane spline for the dense data case
The data compatibility matrix K for the dense data case with uniform weight
ing is an identity matrix. Focusing on the surface reconstruction problem in a rectan
gular domain, the matrix K0 is chosen to contain both the membrane computational
131
reliable optical flow estimates in a homogeneous region. Anandan [2] treated the
estimates provided by the matching process as the data constraints for optical flow
with appropriate confidence measure, and incorporated the smoothness constraint on
optical flow to achieve better results. Recently, Szeliski and Coughlan [77] used the
twodimensional spline model to represent the flow field and minimized the following
SSD function
E(u) = [M* + uijj + VH) Hhj)}2 (62)
j=ii=i
where the vector u is the concatenation of the flow components tt,j and u,j. The
LevenbergMarquardt algorithm was then employed to solve this nonconvex opti
mization problem. They reported very accurate results using this method. The 2D
spline models for optical flow field assume the flow field to be wellapproximated
by the 2D spline basis functions in the preset patches. However, it is difficult to
incorporate discontinuities into these 2D spline patches. In our method, we use the
standard finite difference discretization on the flow field and take the SSD as the
data constraint energy in a regularization framework. A preconditioned nonlinear
conjugate gradient algorithm is employed to minimize the total energy function. The
experimental results on the standard synthetic image sequence using our algorithm
are better than or comparable to the best existing results reported in literature.
Unlike some other algorithms that only generate sparse optical flow estimates, our
algorithms give dense optical flow estimates. In addition, motion discontinuities can
be incorporated in our (regularization) framework.
89
where A is the Laplace operator, A2 is the biharmonic operator, c(x) is the weight
ing function associated with the data d(x), the functions p(x) and r(x) control the
rigidity and tension of the solution surface respectively. Now lets consider the case
when p(x) and r(x) are constant functions, i.e. p(x) = p and r(x) = r. In addition,
lets also assume the function c(x) to be a constant function, which means the Gaus
sian noise associated with each data measurement has the same variance. Taking the
Fourier transform of equation 5.3 we get
v() =
,/>(! r) u>  + pr\\ w  + c,
V(U).
(5.4)
Where u> = (uq, ,u>d) is the ddimensional frequency domain variable while, V(u>)
and V(cj) are the Fourier transforms of n(x) and d(x) respectively. Equation 5.4 can
be given a filtering interpretation namely, V(u>) is obtained by lowpass filtering the
data This lowpass filter is characterized by the first term on the right hand
side in equation 5.4 called the transfer function. Our preconditioner is constructed
in a wavelet basis to approximate this transfer function.
Equation 5.4 expresses a linear shiftinvariant lowpass filtering of the data in
frequency domain, derived from a regularization framework. This filtering interpre
tation can be generalized to the regularly sampled discrete data case in a straight
forward manner. However, this linear shiftinvariant filter will become a nonlinear
and spatially varying filter in the general cases when the functions p(x), r(x) and
c(x) are not constant. These cases include discontinuitypreserving regularization
and sparsedata interpolation. Although the corresponding nonlinear and spatially
87
that of the original matrix according to the aforementioned criterion of choosing a
preconditioner P. Consequently, the convergence of the iterative methods can be
accelerated by a transformation of the original linear system to a better conditioned
linear system.
Although the conditioning of a linear system can be improved after an appro
priate transformation, it is not practical to form the matrix K to solve equation
5.2 since, the transformation of a matrix requires a large number of operations and
the sparsity structure of the matrix K could be destroyed after the transformation.
Fortunately, it is not necessary to compute this transformation of the matrix K to
achieve the preconditioning effect in a conjugate gradient algorithm. Instead, the
same preconditioning can be achieved via solving an auxiliary linear system Pz = r,
where r is the residual vector, in each conjugate gradient iteration. (The precondi
tioned conjugate gradient algorithm is given in section 5.3.) Therefore, the second
consideration for a good preconditioner is to choose the matrix P such that the linear
system Pz = r can be easily solved.
The above two criteria for designing a good preconditioner in conjugate gradient
are equally important. Reducing the condition number can accelerate the convergence
of the iterative methods employed. Choosing the preconditioner P such that Pz = r
is easily solvable is very crucial for efficiently updating the solution in each iteration
of the preconditioned conjugate gradient (PCG) algorithm. Our preconditioner is de
signed in a wavelet basis by approximating the spectral characteristics of the matrix
186
[67] K. S. Riedel, A ShermanMorrisonWoodbury identity for rank augmenting
matrices with application to centering SIAM J. Matrix Anal. Appl., vol. 13,
no. 2, pp. 659662, 1992.
[68] 0. Rioul and M. Vitterli, Wavelets and signal processing , IEEE Signal Pro
cessing Magazine, vol. 8, pp. 1438, 1991.
[69] G. Rudolph, Convergence analysis of canonical genetic algorithms , IEEE
Trans. Neural Networks, vol. 5, no. 1, pp. 96101, 1994.
[70] J. Shah, Segmentation by nonlinear diffusion , in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, (Maui, HI), pp. 202
207, June 1991.
[71] R. Shonkwiler and E. Van Vleck, Parallel speedup of Monte Carlo methods
for global optimization, Journal of Complexity, vol. 10, pp. 6495, 1994.
[72] T. Simchony, R. Chellappa, and M. Shao, Direct analytical methods for solv
ing Poisson equations in computer vision problems , IEEE Trans. Patt. Anal.
Machine Intell., vol. 12, May 1990.
[73] J. Suzuki, A Markov chain analysis on simple genetic algorithm, IEEE Trans.
Systems, Man and Cybernetics, vol. 25, no. 4, pp. 655659, 1995.
[74] R. Szeliski, Bayesian Modeling of Uncertainty in Lowlevel Vision. Boston:
Kluwer Academic Publishers, 1989.
[75] R. Szeliski, Fast surface interpolation using hierarchical basis functions, IEEE
Trans. Patt. Anal. Machine Intell., vol. 12, no. 6, pp. 513528, 1990.
[761 R. Szeliski, Fast shape from shading , CVGIP: Image Understanding, vol. 53,
no. 2, pp. 129153, 1991.
[77] R. Szeliski and J. Coughlan, Hierarchical splinebased image registration , in
Proceedings of the IEEE Conference Computer Vision & Pattern Recognition,
(Seattle, WA), pp. 194201, 1994.
[78] D. Terzopoulos, Multilevel computational processes for visual surface recon
struction , Computer Vision, Graphics, Image Processing, vol. 24, pp. 5296,
1983.
[79] D. Terzopoulos, Image analysis using multigrid relaxation methods , IEEE
Trans. Patt. Anal. Machine Intell., vol. 8, pp. 129139, 1986.
[80] D. Terzopoulos, Regularization of inverse visual problems involving disconti
nuities , IEEE Trans. Patt. Anal. Machine Intell., vol. 8, no. 4, pp. 413424,
1986.
68
pair (&, l) in this double summation be denoted by A. By using the split in the set
A, we have
EÂ£/(m4 = 4{/(o,i)+/(i,o)+/(i,i)+Â£ /(M)+E/(M)+ Â£ /(*,/)}.
k l n n k=2 1=2 (k,l)eB
(4.23)
where /(i,!) = ((r),+),m<,, and B = (fc,f)\(l,l) I M = 1,... ,n 1. By using
the fact that f(k, l) is a strictly decreasing function of k and /, we can bound the
terms with summation in equation 4.23 by the corresponding terms in an integration
form as follows.
Â£Â£/(M)i < {/(o,i) + /(i,o) + /(i,i)}i + i /'/(*,o)
72 it Tt J
/ n
+;/i' /('i')* + !Js{x2 + yly+2,dxiV <4'24)
where the integration domain S = {(rr, ?/)(rr, y) G [0,1] x [0,1]\[0, x [0, Â£)}. By
transforming the Cartesian coordinate to the polar coordinate, the term with double
integral in equation 4.24 can be bounded by the following integral,
1
j,4m+42q
rdrdO
( 4m+22q
4(2 m + 1 q)
1 n4m+22q^
4(2 m + 1 q)
2^m+lq^
<
(4.25)
114
Figure 5.12. The computed solutions using the adaptive preconditioned conjugate
gradient algorithm after (a) 10 iterations, (b) 40 iterations, (c) 100 iterations, in the
third surface reconstruction experiment.
187
[81] D. Terzopoulos, The computation of visiblesurface representations , IEEE
Trans. Patt. Anal. Machine Intel!, vol. 10, no. 4, pp. 417438, 1988.
[82] A. N. Tikhonov and V. Y. Arsenin, Solutions of IllPosed Problems. Washington,
DC: Winston & Sons, 1977.
[83] V. Torre and T. Poggio, On edge detection , IEEE Trans. Patt. Anal. Machine
Intel!, vol. PAMI8, pp. 147163, 1986.
[84] S. Uras, F. Girosi, A. Verri, and V. Torre, A computational approach to motion
perception, Bio! Cybern., vol. 60, pp. 7987, 1988.
[85] B. C. Vemuri and S. H. Lai, A fast solution to the surface reconstruction prob
lem , in Proceedings of SPIE Conference on Mathematical Imaging: Geometric
Methods in Computer Vision, (San Diego, CA), pp. 2737, SPIE, July 1993.
[86] B. C. Vemuri and A. Radisavljevic, Multiresolution stochastic hybrid shape
models with fractal priors , ACM Trans, on Graphics, vol. 13, pp. 177207,
April 1994.
[87] E. L. Wachspress, Extended application of alternating direction implicit itera
tion model problem theory , Journal of SIAM, vol. 11, pp. 9941016, December
1963.
[88] G. Wahba, Spline Models for Observational Data. Philadelphia: Society for
Industrial and Applied Mathematics, 1990.
[89] G. Whitten, Scale space tracking of deformable sheet models for computational
vision , IEEE Trans. Patt. Ana! Mach. Intel!, vol. PAMI15, pp. 697706, July
1993.
[90] G. Winkler, Image Analysis, Random Fields and Dynamic Monte Carlo Methods.
Berlin, Germany: Springer Verlag, 1995.
[91] J. Xu, Iterative methods by space decomposition and subspace correction,
SIAM Review, vol. 34, pp. 581613, December 1992.
[92] M. H. Yaou and W. T. Chang, Fast surface interpolation using multiresolution
wavelet transform , IEEE Trans. Patt. Anal, and Machine Intel!, vol. PAMI
16, no. 7, pp. 673688, 1994.
[93] H. Yeserentant, On multilevel splitting of finite element spaces , Numeriche
Mathematik, vol. 49, pp. 379412, 1986.
184
[40] B. K. P. Horn and B. G. Schunk, Determining optical flow , Artificial Intel
ligence, vol. 17, pp. 185203, 1981.
[41] K. Ikeuchi and B. K. P. Horn, Numerical shape from shading and occluding
boundaries, Artificial Intelligence, vol. 17, pp. 141184, 1981.
[42] S. Jaffard, Wavelet methods for fast resolution of elliptic problems , SIAM
Journal of Numerical Analysis, vol. 29, pp. 965986, August 1992.
[43] M. Kass, A. Witkin, and D. Terzopoulos, Snakes: active contour models, Inti.
Journal of Computer Vision, vol. 1, pp. 321331, 1987.
[44] C. Koch, J. Marroquin, and A. Yuille, Analog neuronal networks in early
vision , Proc. Natl. Acad. Sci. USA, vol. 83, pp. 42634267, 1986.
[45] S. H. Lai and B. C. Vemuri, An O(N) iterative solution to the Poisson equa
tions in lowlevel vision problems, in Proceedings of the IEEE Conference on
Computer Vision & Pattern Recognition, (Seattle, WA), pp. 914, June 1994.
[46] C. Lanczos, An iteration method for the solution of the eigenvalue problem
of linear differential and integral operators , J. Res. Natl. Bur. Stand., vol. 45,
pp. 255282, 1950.
[47] Y. G. Leclerc, Constructing simple stable descriptions for image partition ,
International Journal of Computer Vision, vol. 3, pp. 73102, 1989.
[48] A. Lu and E. L. Wachspress, Solution of Lyapunov equations by alternating
direction implicit iteration , Computers Math. Applic., vol. 21, no. 9, pp. 4358,
1991.
[49] B. Lucas and T. Kanade, An iterative image registration technique with an
application to stereo vision , in Proceedings of the 7th International Joint Con
ference on Artificial Intelligence, (Vancouver, B.C., Canada), pp. 121130, 1981.
[50] S. G. Mallat, A theory of multiresolution signal decomposition , IEEE Trans.
Patt. Anal. Machine Intell., vol. PAMI11, no. 7, pp. 647693, 1989.
[51] J. Marroquin, S. Mitter, and T. Poggio, Probabilistic solution of illposed prob
lems in computational vision , Journal of the American Stat. Assoc., vol. 82,
no. 397, pp. 7689, 1987.
[52] J. L. Marroquin, Probabilistic Solution of Inverse Problems. Ph.D. Dissertation,
Massachusetts Institute of Technology, 1985.
[53] D. Mumford and J. Shah, Boundary detection by minimizing functionals , in
Proceedings of the IEEE Conference on Computer Vsion and Pattern Recogni
tion, (San Francisco, CA), pp. 2226, June 1985.
energy function E'(\), i.e.,
ft(l) =
(7.9)
z = 2>*,<1>'r,
i
(7.10)
where T is a constant. Let the population size be fixed and denoted by a constant
M. By using a Gibbs sampler to obtain M samples from the Gibbs distribution
given in equation 7.9, we obtain an initial population of size M, denoted by the set
{1!,12,...,1m}
In the reproduction operation, we use the Boltzmann weighted selection [25, 16]
scheme such that the survival probability for the ith individual is given by
Pr{h)
 LeE'(h)/T
~ Z'
(7.11)
M
Z'
= ^2eE'(lj)/T
J=1
(7.12)
For the informed mutation operator, we assign the probability of mutating the
ith individual 1, to a line process configuration I to be a Gibbs distribution as follows.
64
for the error after t iterations,
n1 n1
ax, = Â£Â£
k=0 =0
TT/^ Pj w Pj \
jlch + pj^i+pj
aki{qk q/).
(4.16)
The function inside the bracket is the error reduction factor in the basis q*,
Notice that the Lyapunov matrix equation has an infinite number of solutions due to
the singularity of the matrix A. The error reduction in each step is always less than
1 for eigenvectors with positive eigenvalues and positive parameters pj while it is
always 1 for the basis q0qo (which is evident from equation 4.16 after substituting
k = l = 0 and A0 = 0). A solution to the Lyapunov matrix equation can be obtained
without reducing this factor at the component k = l = 0.
The classical ADI minimax parameter problem is to find the ADI parameters pj
to minimize the maximum of the error reduction function excluding the zero eigen
value, i.e., to minimize the function
Pt =
max
V(fc,/)\(0,0)
t
UK
^k Pj\t^l Pj
^k + Pj A/ f Pj
(4.17)
Applying the result from the classical ADI minimax analysis [48], leads to an O(log N)
iterations for convergence [85]. This in turn yields an overall computational complex
ity of 0(N log TV) for the ADI method since, O(N) time is required per iteration in
the ADI method. An 0(N log N) computational complexity is unacceptable for our
purposes (recall that we seek an O(N) solution). Thus, in order to design an O(N)
algorithm, it is essential to use the additional knowledge about the problem e.g.,
53
Figure 4.1. The flow chart of the generalized capacitance matrix algorithm.
for the rest of this chapter. Using the generalized capacitance matrix algorithm to
solve the linear system Kx = b with the above splitting for K, we need to solve several
linear systems of the form K0x = b as shown in equations 3.2 and 3.3 respectively.
Especially, in equation 3.3, we have to solve it p times, where p is the rank of K K0.
Therefore, it is very crucial to choose a well structured Ko so as to efficiently compute
the solution in the capacitance matrix method.
In general, there are three criteria that dictate the choice of K0. The first is
the difference between K0 and K must be small in the sense that rank(K Ko) is
small. Secondly, K0 must possess some nice structure so as to have a fast numerical
solution to the associated linear system. Lastly, K0 should be translation invariant;
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1995
To my wife YihChou
and my son Patrick
ACKNOWLEDGMENTS
I am deeply grateful to my advisor and committee chairman, Dr. Baba C. Ve
muri, whose guidance, encouragement and support made the completion of this work
possible. I thank him for his academic as well as nonacademic advice and assistance
during my study at the University of Florida. Many ideas in this dissertation were
produced through the discussions with him. I would also like to thank Dr. Staudham
mer, Dr. Doty, Dr. Harris, Dr. Sigmon and Dr. Rao for serving in my supervisory
committee and advising me on various aspects of this dissertation. I sincerely thank
the National Science Foundation for supporting me financially to finish this work
under the grant number ECS9210648. Special thanks go to my parents, who never
hesitate to deliver their love, encouragement and support to enrich my life. Finally,
I would like to dedicate this dissertation to my beloved wife, YihChou, who shared
the happiness and struggle I experienced during the work of this dissertation, and to
my son, Patrick.
in
TABLE OF CONTENTS
ACKNOWLEDGMENTS iii
LIST OF TABLES vii
LIST OF FIGURES viii
ABSTRACT xii
CHAPTERS
1 INTRODUCTION 1
1.1 Formulations of Early Vision Problems 2
1.2 Computational Algorithms 3
1.3 Main Contributions of this Thesis 6
1.4 Thesis Overview 8
2 PROBLEM FORMULATIONS 11
2.1 Deterministic Framework 11
2.2 Probabilistic Framework 14
2.3 Variational Formulations for Some Early Vision Problems 18
2.3.1 Surface Reconstruction 19
2.3.2 Shape from Shading 20
2.3.3 Optical Flow Computation 24
3 GENERALIZED CAPACITANCE MATRIX THEOREMS 28
3.1 Introduction 28
3.2 Generalized Capacitance Matrix Theorems 33
3.3 Discussion 48
4 GENERALIZED CAPACITANCE MATRIX ALGORITHM 50
4.1 Structure of the Algorithm 50
4.2 Choice of the Splitting 52
4.2.1Dense Data Surface Reconstruction 55
IV
4.2.2 Sparse Data Surface Reconstruction 60
4.3 Numerical Solution 61
4.3.1 ADI Method for the Lyapunov Equation 62
4.3.2 Modified BCG Method for the Capacitance Matrix Equation 72
4.4 Experimental Results 74
4.4.1 Surface Reconstruction 75
4.4.2 Shape from Shading 77
5 ADAPTIVE PRECONDITIONING TECHNIQUE 81
5.1 Introduction 81
5.2 Adaptive Preconditioning in a Wavelet Basis 85
5.2.1 Preconditioning Technique 86
5.2.2 Regularization Filter 88
5.2.3 Wavelet Basis 90
5.2.4 Adaptive Preconditioning via the Modulation in a Wavelet Basis 93
5.3 Adaptive Preconditioned Conjugate Gradient Algorithm 96
5.3.1 Surface Reconstruction 97
5.3.2 Shape from Shading 100
5.3.3 Optical Flow Computation 102
5.4 Experimental Results 102
5.4.1 Surface Reconstruction 103
5.4.2 Shape from Shading 117
5.4.3 Optical Flow Computation 123
5.5 Summary and Discussion 126
6 ROBUST OPTICAL FLOW COMPUTATION 128
6.1 Introduction 128
6.2 Modified gradientbased method 132
6.3 SSDbased Regularization Method 137
6.3.1 Nonlinear Conjugate Gradient Algorithm 139
6.3.2 Modified Search Direction Scheme 141
6.4 Experimental Results 142
6.4.1 Modified Gradientbased Method 143
6.4.2 SSDbased Regularization Method 146
6.5 Summary and Discussion 150
7 RESEARCH DIRECTIONS 152
7.1 Introduction 152
7.2 Previous Energy Minimization Methods 154
7.2.1 Deterministic Optimization Methods 155
7.2.2 Stochastic Optimization Methods 160
7.3 Proposed Hybrid Search Algorithm 161
v
7.3.1 Elements of Genetic Algorithms 164
7.3.2 Informed Genetic Algorithm 168
7.3.3 Preliminary Experimental Results 171
7.3.4 Discussion 176
8 CONCLUSIONS 178
REFERENCES 181
BIOGRAPHICAL SKETCH 188
vi
LIST OF TABLES
6.1 Summary of Square 2 results 145
6.2 Summary of Diverging Tree results 146
6.3 Summary of Sinusoid 1 results 148
6.4 Summary of Translating Tree results 150
Vll
LIST OF FIGURES
4.1 The flow chart of the generalized capacitance matrix algorithm. ... 53
4.2 (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain 0 56
4.3 Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed sur
face using thinplate smoothness, obtained using the generalized ca
pacitance matrix algorithm 76
4.4 Surface reconstruction with prespecified discontinuities along a hori
zontal line; (a) original sparse data set with depth discontinuities along
the indicated horizontal line; reconstructed surfaces using a membrane
spline after (b) 1 iteration and (c) 10 iterations of ADI method; (d)
reconstructed surface using a thinplate spline after 16 iterations of
ADI method 77
4.5 Surface reconstruction with prespecified discontinuities along a diag
onal line; (a) the original sparse data set with depth discontinuities
along the indicated diagonal line, (b) reconstructed surface using a
membrane spline after 10 iterations of ADI method 78
4.6 Shape from shading example, (a) Shaded sphere image (b) irregular
occluding boundary imbedded in a rectangle (c) recovered surface ori
entation, (d) reconstructed surface shape 80
5.1 The implementation of the wavelet transform via the QMF struc
ture from one level to the next, (a) decomposition (b) reconstruction,
adopted from Mallat[1989] 92
5.2 Division of the frequency domain in a wavelet basis 92
5.3 The sparse data sets used for the surface reconstruction, (a) synthetic
data (b) synthetic symmetric data (c) real laser range data 104
vm
5.4 (a), (d), (g) & (j) are the computed solutions using the APCG al
gorithm, (b), (e), (h) & (k) are the computed solutions using HCG
algorithm, (c), (f), (i) & (1) are the computed solutions using conju
gate gradient algorithm, after 1,5, 10 and 20 iterations, respectively. 107
5.5 The computed solutions after (a) 10 iterations of APCG (b) 25 itera
tions of HCG (c) 30 iterations of CG using the same time 108
5.6 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm in the surface reconstruction problem
with the synthetic sparse data set and (a) with controlledcontinuity
stabilizer (p = 1, r = 0.1) (b) with thinplate stabilizer 109
5.7 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with discontinuities specified in the controlledcontinuity stabilizer. 109
5.8 The computed solutions for the surface reconstruction with disconti
nuities using the APCG algorithm after (a) 10, (b) 20, (c) 40 and (d)
60 iterations, respectively 110
5.9 The convergence of Pentlands preconditioner in conjunction with the
conjugate gradient algorithm is worse than that of conjugate gradi
ent algorithm without preconditioning for the surface reconstruction
problem with the synthetic sparse data set using a membrane stabilizer.Ill
5.10 Row profile of the wavelet transformed membrane matrix Ks of size
64 x 64 along different rows, (a) row 3 (b) row 10 (c) row 30 (d) row 50.
This figure establishes the falsity of the diagonal dominance behavior
of K4 claimed in [19] 112
5.11 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with sparse symmetric data set using a controlledcontinuity stabilizer. 113
5.12 The computed solutions using the adaptive preconditioned conjugate
gradient algorithm after (a) 10 iterations, (b) 40 iterations, (c) 100
iterations, in the third surface reconstruction experiment 114
5.13 (a) The convergence for the APCG, HCG and CG algorithms for the
surface reconstruction problem with the real laser range data set using
a controlledcontinuity stabilizer, (b) The computed solution after 10
APCG iterations 116
5.14 Shaded images of bump with different light source directions 117
IX
5.15 The convergence of the APCG, HCG, and CG algorithms for the
shapefromshading problem 118
5.16 Shaded images of bump after 100 iterations of APCG algorithm with
out using any boundary condition 118
5.17 Shaded images of bump after (a) & (b) 10 iterations, (c) & (d) 20
iterations, (e) & (f) 40 iterations of the APCG algorithm with the
occluding boundary condition 120
5.18 The computed (a) gradient and (b) height after 40 iterations of the
APCG algorithm 121
5.19 The convergence curves of the (a) energy, (b) p q error, (c) z error
for the CG, HCG and APCG algorithm 122
5.20 One frame from the (a) square image sequence and (b) rotating Rubic
cube image sequence 124
5.21 Computed flow of the (a) translating square and (b) rotating Rubic
cube 125
5.22 The convergence curves of the CG, HCG and APCG algorithms for
(a) translating square (b) rotating Rubic image sequence 126
6.1The discrepancy between the gradient directions of the function I0 and
7i at the ith location is small and thus the use of the gradient direction
as the search direction for the ith component is robust; while the the
jth component of the gradient direction is unreliable for a local search. 142
6.2 Frames from the (a) Square 2 and (b) Diverging Tree image sequences. 144
6.3 Computed optical flow of the (a) Square 2 and (b) Diverging Tree
image sequences after 40 iterations of the incomplete Cholesky pre
conditioned CG algorithm 145
6.4 (a) One frame from the Rotating Rubic Cube sequence (b) Computed
optical flow after 40 iterations of the incomplete Cholesky precondi
tioned CG algorithm 147
6.5 Frames from the (a) Sinusoid 1 and (b) Translating Tree image sequences. 147
6.6 Computed optical flow of the (a) Sinusoid 1 and (b) Translating Tree
image sequences, using the preconditioned nonlinear CG algorithm. 149
6.7 (a) One frame from the Hamburg Taxi sequence (b) Computed optical
flow after 200 iterations of the preconditioned nonlinear CG algorithm. 150
x
7.1 The crossover operation example 166
7.2 The sparse data set used for the surface reconstruction with disconti
nuity detection, (a) original surface (b) sparsely sampled data set. . 172
7.3 (a) Clique for the line process (x: line process, 0: nodal variable for
surface), (b) clique potentials for different line process configurations. 173
7.4 Surface reconstruction example: (a) the reconstructed surface (b) the
recovered discontinuity map ( solid line ) obtained after 3 generations
of the informed GA and imposed on the true discontinuity map (
dashed line ) 174
7.5 Surface reconstruction with noise example: (a) the noisy sparse data
set, (b) the reconstructed surface (c) the recovered discontinuity map
( solid line ) obtained after 5 generations of the informed GA and
imposed on the true discontinuity map ( dashed line ) 175
xi
Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
August 1995
Chairman: Dr. Baba C. Vemuri
Major Department: Electrical and Computer Engineering
Early vision has been one of the most fundamental research topic in computer
vision. The task of early vision is to recover physical properties of threedimensional
objects (such as shape) from twodimensional sensed data. Example tasks consist
of surface reconstruction, optical flow computation, shape from shading, shape from
stereo, edge detection, lightness problem, segmentation problem,... etc. Early vision
problems may be formulated either in a deterministic or a probabilistic framework.
A popular technique in the deterministic framework is based on the regularization
theory, and the probabilistic formulation exploits the Markov random field (MRF)
model to characterize the function being estimated. Both formulations lead to min
imization of equivalent energy functions. When discontinuity detection is involved
in the formulation, the energy functions to be minimized become nonconvex. In this
thesis, we propose efficient computational algorithms for minimizing these energy
functions with and without discontinuity detection.
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method for solving elliptic partial differential equations arising in early vision
and engineering problems.
2. A novel adaptive preconditioning algorithm using a wavelet transform for solv
ing illconditioned large sparse linear systems arising in regularized solutions of
illposed problems in early vision and other domains.
3. Two new, robust and efficient algorithms for accurate optical flow computation
namely, a modified gradientbased formulation and an SSDbased regularization
formulation.
4. A new hybrid (stochastic f deterministic) search algorithm involving an in
formed genetic algorithm (GA) and either one of the two aforementioned fast
numerical algorithms, for solving nonconvex optimization problems in early
vision.
The efficiency of the proposed algorithms are demonstrated through experi
ments on several early vision problems, including surface reconstruction, shape from
shading, and optical flow computation. In addition to computer vision applications,
Xlll
the theory and algorithms developed in this thesis can also be applied to other ar
eas such as cartography/terrain mapping, reverse engineering, MPEG image sequence
compression, material science applications (velocity estimation of dry granular flows),
engineering applications and mathematical physics.
xiv
CHAPTER 1
INTRODUCTION
Early vision has been one of the most fundamental research topic in computer
vision. The primary goal of early vision is to recover the geometric or physical
properties of threedimensional objects from twodimensional sensed data. Typical
properties of objects are the shape ( depth or orientation ), texture, color, reflectance
and motion parameters. Example tasks of early vision include surface reconstruction,
optical flow computation, shape from shading, stereo matching, edge detection and
lightness problem [28, 41, 40, 38, 63, 6, 72]. The results of early vision can be used
for higher level tasks such as bin picking, object recognition, passive navigation and
motion tracking.
Early vision problems can be regarded as inverse problems. During the imaging
process, some information is lost since it projects threedimensional objects to two
dimensional images. Therefore, it is usually not possible to solve the inverse problems
from data information only. In fact, most early vision problems are illposed problems
[6]. A problem is said to be wellposed if its solution (a) exists, (b) is unique, and
(c) depends continuously on the input data [62]. We can transform an illposed
problem into a wellposed one by imposing additional constraints (such as smoothness
constraints) on the solution.
1
2
1.1 Formulations of Early Vision Problems
Early vision problems may be formulated either in a deterministic or a prob
abilistic framework. A popular technique in the deterministic framework is based
on regularization theory [82, 80] and leads to the minimization of energy function
als. The probabilistic formulation exploits the Markov random field (MRF) model to
characterize the function being estimated. We can relate the smoothness constraints
in the regularization theory to the prior distribution of the MRF model. Both the
regularization formulation and the MRF formulation of early vision problems lead
to the minimization of equivalent energy functions. We will give the deterministic
formulations of some early vision problems in the next chapter.
Discontinuities contain very crucial information in computer vision. There are
different treatments of discontinuities in early vision research. Early work on vi
sual reconstruction problems reported in literature ignores discontinuities altogether
[40, 28, 62], which yields undesirable smoothing over discontinuity locations. How
ever, more recently problem formulations allow for incorporation of prespecified dis
continuities [81, 75]. For both cases, the aforementioned energy function is convex
and quadratic for most early vision problems. Minimization of a convex quadratic
function can be achieved by solving the associated linear system. In this thesis, we
present several efficient and novel computational algorithms to minimize the energy
functions for early vision problems with prespecified discontinuities.
There have been many methods proposed to detect discontinuities in the under
lying function from the observed data. An elegant method to treat discontinuities
3
is via the introduction of a set of binary variables called the line process. Each line
process variable takes a value of 1 in the presence of discontinuities and zero other
wise. Line process has been included in both the deterministic [53, 8] and the MRF
[22] formulations. The resulting energy function involves real variables (for surface
or image) coupled with binary variables (for discontinuities) and is therefore a non
convex function. Both formulations lead to problems involving the minimization of
nonconvex functions.
1.2 Computational Algorithms
For the prespecified discontinuities case, numerous iterative methods have been
employed to minimize the energy functions namely, GaussSeidel [14], successive
overrelaxation (SOR), conjugate gradient (CG) [26], hierarchical conjugate gradi
ent (HCG) [75], and multigrid methods [79], the slowest being GaussSeidel, taking
0(N2) time to converge ( N(= n2) is the number of discretization ) [26] while, the
fastest being multigrid methods taking O(N) time to converge for some regular class
of problems [29]. Although the multigrid techniques have been applied successfully
to a general class of problems, the proofs of convergence are restricted to a special
class of problems [29, 9, 91]. For these results ( posed as solution for PDE problems),
discontinuity in the solution is not considered. Therefore, inclusion of discontinuity
into the problem is bound to deteriorate the convergence speed of multigrid methods.
In addition, the construction of multigrid needs to be modified for nonrectangular
regions. The modification becomes quite complicated for implementation.
4
Direct solutions to the associate linear systems from minimization of the energy
functions are based on a theorem called the capacitance matrix theorem which was
stated and proved in Buzbee et al. [10]. However, this capacitance matrix theorem
is restricted to some early vision problems and on the design on numerical algo
rithms [45]. The direct solutions using the capacitance matrix theorem employ the
FourierToeplitz method in combination with LU decomposition [72], the Cholesky
decomposition [10] or the conjugate gradient [65] technique. The FourierToeplitz
method requires 0(N log N) time and the LU decomposition, Cholesky factoriza
tion or the conjugate gradient require 0(n3) = 0(Ny/), thus making the overall
complexity 0(Ny/).
In this thesis, we present two efficient algorithms for the minimization of the en
ergy functions in early vision. One is a generalized capacitance matrix algorithm and
the other is an adaptive preconditioning technique in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm. The development of the gener
alized capacitance matrix algorithm is based on the generalized capacitance matrix
theorems that we state and prove in this thesis. In our algorithm, we first transform
the linear system into a Lyapunov matrix equation or a cascade of two Lyapunov
matrix equations with an appropriate righthand side (RHS). This RHS is obtained
by solving an associated linear system with a capacitance matrix. The Lyapunov ma
trix equations are solved using the alternating direction implicit method (ADI) while
the solution to the capacitance matrix linear system is obtained using a modified
5
biconjugate gradient technique. We prove that the ADI method takes a constant
number of iterations with each iteration taking 0(N) time.
Our adaptive preconditioning technique is used in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm to minimize the energy function.
This adaptive preconditioner is constructed in a wavelet basis and adapted to the
spectral characteristics of the imposed smoothness constraint as well as the data
constraint in the energy function. The construction of the preconditioner makes
use of the division of frequency domain property in a wavelet basis. Previously,
there have been some preconditioners proposed in computer vision literature [75, 61,
92]. However, they are either not adapted to the problem [75, 92] or constructed
inappropriately [61]. We empirically show the superiority of our preconditioner over
other preconditioners on various early vision problems.
To solve the early vision problems with discontinuity detection, both regular
ization and MRF formulations lead to the minimization of a coupled (binaryreal)
nonconvex energy function. Numerous algorithms have been proposed to solve this
coupled nonconvex minimization problem. They can be categorized into the deter
ministic methods [8] [44] [21] and the stochastic methods [22] [51]. Most of the deter
ministic methods can only deal with simple constraints on the line process [8, 21, 31];
in addition, they can not find the global minimum solution. The stochastic optimiza
tion methods are primarily the simulated annealing (SA) type algorithms. Simulated
annealing is a general global optimization algorithm, but its convergence rate is very
slow, thus making it impractical for many applications.
6
In this thesis, we will briefly discuss existing methods for this coupled noncon
vex minimization problem and propose a novel hybrid search algorithm as a possible
solution toward solving this problem. This hybrid search algorithm which is a combi
nation of a stochastic and a deterministic search techniques. For the stochastic search,
we propose an informed genetic algorithm (GA) whereas for the deterministic search,
we employ either one of the two deterministic algorithms mentioned above. Some
promising preliminary experimental results of applying this algorithm on sparse data
surface reconstruction along with discontinuity detection problems are presented.
1.3 Main Contributions of this Thesis
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method is proposed for solving elliptic partial differential equations arising in
early vision and engineering problems. The algorithm is based on the develop
ment of the generalized capacitance matrix theorems which we state and prove
in this thesis. We make use of the ADI method as a fast elliptic solver in this
algorithm and prove that it takes a constant number of iterations to converge
within a given error tolerance with each iteration taking 0(N) time.
2. A novel adaptive preconditioning algorithm using a wavelet transform is devel
oped for solving illconditioned large systems of equations arising in regularized
solutions of illposed problems in early vision and other domains. Our pre
conditioner is constructed by approximating the spectral characteristics of the
imposed smoothness as well as data constraints. This spectral approximation is
7
achieved by using the division of frequency domain property in a wavelet basis.
By using this construction, our preconditioner is adapted to to the smoothness
constraint as well as data constraint.
3. Two robust algorithms are proposed for computing optical flow. The first is a
modified gradientbased regularization method, and the other is an SSDbased
regularization method. In our modified gradientbased method, we selectively
combine the image flow constraint and the contourbased flow constraint into
the data constraint in a regularization framework to amend the errors in the
image flow constraint caused by brightness discontinuities. Our SSDbased
regularization method uses the SSD (Sum of Squared Differences) measure as
the data constraint in a regularization framework. Experimental results for
these two algorithms are given to demonstrate their superior performance.
4. A novel hybrid (stochastic + deterministic) search algorithm is proposed to
solve the nonconvex optimization problems for early vision involving disconti
nuity detection. This hybrid search algorithm consists of an informed genetic
algorithm (GA) and either one of the two aforementioned numerical algorithms.
The informed GA is used for the line process only in the outer loop of our algo
rithm, while the deterministic algorithm is employed to solve the subproblems
of minimizing the energy function given a fixed line process configuration.
The efficiency of the proposed algorithms is demonstrated through experiments
on several early vision problems, including surface reconstruction, shape from shad
ing, and optical flow computation. In addition to computer vision applications, the
8
theory and algorithms developed in this thesis can also be applied to many other
areas, such as cartography/terrain mapping, reverse engineering, MPEG image se
quence compression, material science applications (velocity estimation of dry granular
flows), engineering applications and mathematical physics.
1.4 Thesis Overview
The rest of this thesis is organized as follows.
In chapter 2, the deterministic and probabilistic frameworks of the early vision
problems are briefly reviewed and the formulations of some early vision problems
namely, surface reconstruction, shape from shading and optical flow computation,
are given. The deterministic framework is primarily based on the regularization
formulation. For the probabilistic framework, we discuss the maximum a posteriori
(MAP) estimation in a coupled Markov Random Field (MRF) model.
In chapter 3, the generalized capacitance matrix theorems are stated with com
plete proof. These theorems are generalizations of using ShermanMorrisonWoodbury
formula [26] to solve linear systems and the capacitance matrix theorem proposed by
Buzbee et al. [10]. The structure of the our generalized capacitance matrix algorithm
can be obtained directly from these generalized theorems.
Chapter 4 gives the complete generalized capacitance matrix algorithm which
involves using the ADI method for solving the Lyapunov matrix equations and using
the modified biconjugate gradient (BCG) algorithm for solving the associate linear
system with a capacitance matrix. We consider the membrane smoothness, thinplate
smoothness or a linear combination of the two in our algorithm. We demonstrate the
9
efficiency of this algorithm by applying it to surface reconstruction and shape from
shading problems.
In chapter 5, the new adaptive preconditioning technique is presented as an
efficient solution to general early vision problems. Our adaptive preconditioner is
constructed in a wavelet basis based on the approximation to the spectral character
istics of the smoothness and data constraints. We empirically show the superiority of
this adaptive preconditioner over other preconditioners previously proposed in vision
literature through experiments on surface reconstruction, shape from shading and
optical flow computation.
Two new, robust and efficient algorithms for optical flow estimation are pre
sented in chapter 6. The first is a modified gradientbased algorithm which combines
the image flow constraint and the contourbased flow constraint into the data con
straint in a regularization framework to amend the errors in the image flow constraint
caused by brightness discontinuities. We employ the adaptive preconditioned conju
gate gradient algorithm, presented in chapter 5, to solve the linear system for this new
formulation. The second is an SSDbased regularization method that uses the SSD
measure as the data constraint in a regularization framework. The preconditioned
nonlinear conjugate gradient with a modified search direction scheme is developed to
minimize the resulting energy function. Experimental results for these two algorithms
are given to demonstrate their performance.
10
In chapter 7, we list several directions for conducting future research based on
the research reported in this thesis. For the early vision with discontinuity detec
tion problem, we discuss numerous existing techniques and propose a new hybrid
search algorithm as a possible approach toward solving this problem. By includ
ing line process into the optimization of the energy function, the early vision along
with discontinuity detection problem is formulated as the minimization of a coupled
(binaryreal) nonconvex energy function. Our hybrid search algorithm consists of a
stochastic and a deterministic search techniques. For the stochastic search, we pro
pose an informed genetic algorithm (GA) as a global minimizer for the binary line
process only, whereas for the deterministic search, we employ either one of the two
deterministic algorithms mentioned above to solve the minimization problem for each
fixed line process configuration inside the GA. Promising preliminary experimental
results of using this algorithm on the sparse data surface reconstruction problems are
given.
Finally, we summarize the results of this thesis in chapter 8.
CHAPTER 2
PROBLEM FORMULATIONS
There have been two classes of formulations proposed for the early vision prob
lems; one is the deterministic formulation, and the other is the probabilistic one. The
deterministic approach is based on the regularization theory proposed by Tikhonov
[82]. The regularization method introduces a stabilizing operator to wellpose the
original problem and leads to a variational principle. The most popular probabilistic
formulation to early vision problems involves the use of an MRF (Markov Random
Field) model. Although there is some correspondence between the aforementioned
stabilizing operator and the MRF model, the MRF model is known to be more ver
satile to model surfaces or images than the deterministic model [51] [74].
2.1 Deterministic Framework
Variational formulations for various early vision problems have been reported
in [62, 6]. These formulations make use of regularization theory. In a regularization
framework, generic smoothness assumptions are imposed on the solution space prior
to attempting any functional minimization. The smoothness constraints are well
characterized by a class of generalized multidimensional spline functionals [80]. The
formulations involve minimization of an energy functional Â£, which is the sum of the
energy contribution from the smoothness constraint (Â£,) and the data constraint (Â£)
11
12
i.e.,
Find u such that S[u) = inf v 7i S(v) (21)
where, 7i defines the linear admissible space of smooth functions defined on 9ft2 and
the functional
E{v) = V{v) + A
Where, A is called the regularization parameter that controls the contribution of the
smoothness term, S(v) : Ft i> 9ft is a functional on H that is a measure of smoothness
of an admissible function v(x,y), and V(v) : 7i tt 9ft is a functional that measures
the discrepancy between the observation model and the observed data [80].
To solve the variational problem, a finite difference method [37] or a finite el
ement method [78] can be used to discretize the problem domain. The infinite
dimensional minimization problem in equation 2.1 can be reduced to a finitedimensional
minimization problem as follows:
minÂ£(u) with E(u) = D(u, d) + AS'(u) (23)
where u 9ft"2 xl is the discretized surface, d 9ft"2 xl is the data vector, and the
discretization mesh is n x n. In general, the observation model can be nonlinear,
e.g. the shape from shading problem with a nonlinear reflectance map function.
Lets denote this model by K
model is linear, and choose the stabilizing ( or smoothing ) operator as the firstorder
13
derivative along x and y, then
D(u,d) =  Ku d 2 (2.4)
S{ u) = iuTKsu (2.5)
For the choice of firstorder differential stabilizer, i.e. the membrane smoothness, the
operator Ka is a Laplacian matrix.
The regularization method imposes the smoothness constraint over the entire
domain. It also smoothes out the discontinuities which is very undesirable. Incorpo
rating the discontinuity detection into the early vision problem, the energy function
to be minimized can be modified as [8]
F7(u,h,v) =  Kju d  + ^((ut+u uj) (1 hÂ¡) + (u,j+i uuj) (1 u,j))
J
+ E^(h,v) (26)
where hij and V{j are the horizontal and vertical line processes, respectively, which
are binary variables with a value 1 denoting a vertical or horizontal discontinuity
at that location and a 0 otherwise. The line energy V(h,v) is a term that may
be used to penalize the inclusion of discontinuities or to impose the smoothness of
the line process. Most of the deterministic formulations take very simple functions
for F(h,v) merely based on some heuristics; for example, Blake and Zisserman [8]
take F(h, v) = hj + vj, which has no interaction between individual line processes
variables.
14
The energy in equation 2.6 is a nonconvex function of real variables coupled
with boolean variables. It is difficult to solve a nonconvex minimization problem
since traditional gradient decent methods are easily trapped in local minima.
The deterministic formulation leads to satisfactory results for some early vision
problems, but this formulation is not general enough to include arbitrary constraints
for the line process. We will see in the next section that the probabilistic formu
lation using the MRF model is more flexible in this context than the deterministic
formulation.
2.2 Probabilistic Framework
MRF (Markov Random Field) is a generalization of Markov process in ID to
2D. To define an MRF [22], we have to define a neighborhood system first. Let S be
a finite set ofnxn sites, and G = {G,j, (i,j) G S'} be a neighborhood system for S'
such that
(1) (t,j) 0 Gij, for all (i,j) G S.
(2) (i,j) Gkli, if and only if (k,l) G,j, for all (i,j),(k,l) S.
Let F = {Fij, (i,j) S} be a random field, which is a family of random variables
indexed by the twodimensional site (i,j) S. Any possible sample realization
f = (/i,i,...,/n,n) is called a configuration of the field. Let 0 be the set of all
possible configurations, i.e. the sample space, and P be a probability measure on .
Then F is an MRF with respect to G if
(1) P(F = f) > 0, for all f G fi.
15
(2) P(FÂ¡ fij  Fk,i /fe,/,(fc,O ^ (hj)) P{Fi,j fi,j  Fk,Â¡ fk,i,{k,l)
Gj).
It is not clear how to find a probability distribution function for an MRF directly
from the above definition. Fortunately, the HammersleyClifford theorem [22] states
that F is an MRF on S with respect to the neighborhood system G if and only if
the probability distribution is a Gibbs distribution with respect to G. Therefore, we
can construct an MRF by designing a Gibbs distribution function as the distribution
function of the MRF.
To give a definition of a Gibbs distribution, we have to define a clique first.
A subset C C S is a clique if every pair of distinct sites in C are neighbors. Let C
denote the set of all cliques.
A Gibbs distribution relative to {S, G} is a probability measure t on 2 such
that
7r(u>) = ^ exp~u^IT (2.7)
Zj
where Z is the normalizing constant ( also called partition function), T is a parameter,
and the energy function U{u) is of the form
V(a) = Â£ Vc(u)
cec
(2.8)
The function Vc(u>) is called the clique potential. Its value is determined by Uij with
(*, j) C.
16
For the sake of exposition, lets consider the surface reconstruction problem
here, but the discussion to follow should apply to most early vision problems with
minor modifications. A coupled MRF model may be constructed for the surface
reconstruction problem by defining an MRF, F, for surface and a boolean MRF, L,
for line process. The prior probability can be written as
P{ F = f,L = l) = i exP_t/(f1)/r (2.9)
z
U((, 1) = Â£ifc(f,l) (2.10)
c
The parameter T is assumed to be 1 for simplicity. The prior energy /(f, 1) is
similar to the last two terms of E(u, h, v) in equation 2.6, which are the smoothness
constraints on the surface and line process.
Let the observed data be denoted by d, then the problem to estimate f and 1
from d can be obtained by the MAP (maximum a posteriori) estimation. The MAP
estimate is obtained by using Bayes rule and maximizing the posterior probability
P(F = f, L = 1  D = d). Assume the observation model is given by D = H(F) + N,
where N is zero mean white Gaussian random field with variance cr for each variable
in n. After applying the Bayes rule, we can see that the MAP estimation is equivalent
to maximizing P(D = d  F = f, L = 1)P(F = f, L = 1), and
P( D = d  F = f,L = 1) = (2I72pl'/2ex[r"'L"l'f,
(2.11)
17
Combining equations 2.9 and 2.11, we can see that the MAP estimate can be obtained
by minimizing the following energy function
(/(did) = r+(/(f,i) (2.i2)
= idir(f)i*+j:fc(fii)+s:%(i) (2i3)
C C
Investigating equations 2.6 and 2.13, we can see that these two energy functions are
very similar. If we choose the parameters and the clique potential to make these
two energy functions identical, then, both the deterministic formulation and the
MRF formulation give rise to minimizing the same energy function. Therefore, the
correspondence between the two formulations is obvious.
The advantages of the probabilistic model over a deterministic model are two
fold:
1. The MRF model can include very general constraints on the line process, while
the deterministic model is restricted in the types of constraints considered.
The MRF model regards the imposed constraints as a prior probability, which is
determined by the clique potential. In fact, we can estimate the clique potential
functions for different object models instead of ad hoc or heuristic choices [54].
2. A probabilistic model can provide second (covariance) or higher order statistics
of the estimate in addition to the MAP estimate. The higher order statis
tical information is very useful in the context of dynamic vision [74], sensor
18
fusion, and integration of multiple modules [51]. This can not be achieved by
a deterministic model.
Therefore, it can be concluded that the MRF model is more versatile than the
deterministic models. The efficient computational algorithms proposed in this thesis
can be applied to solve the minimization problems for both the deterministic and
probabilistic formulations.
2.3 Variational Formulations for Some Early Vision Problems
From the previous two sections, we can see the link between the regulariza
tion and MRF formulations, which lead to the minimization of equivalent energy
functions. In this section, we give the deterministic formulation of some early vi
sion problems namely, surface reconstruction, shape from shading and optical flow
computation. The corresponding probabilistic formulation can be directly obtained
from the equivalence between these two formulations [51]. For ease of exposition, the
location of surface and orientation discontinuities (if any) are assumed to be known
in the below discussion. However, we will discuss the issue of discontinuity detection
in chapter 7.
Variational formulations for various early vision problems have been reported
in [79] and references therein. These formulations make use of the popular theory
of regularization. In a regularization framework, generic smoothness assumptions
are imposed on the solution space prior to attempting any functional minimization.
The smoothness constraints are well characterized by a class of generalized multi
dimensional spline functionals [80]. The formulations involve minimization of an
19
energy functional Â£, which is the sum of the energy contribution from the smoothness
constraint (S) and the data constraint (V) (see [80, 75] for details of this formulation).
2.3.1 Surface Reconstruction
For the surface reconstruction problem, we are faced with recovering surface
shape from either sparse or dense range data. For a first order stabilizer, the smooth
ness constraint is given by
$() = J JQ(vl + vy) dxdy (214)
where u(x, y) is the admissible function and vx, vy its partial derivatives assumed to
be small. For the case of a second order stabilizer, we have the following smoothness
constraint
Â£() = + + uw) dajd> (215)
The first and second order stabilizer can be combined to form the controlledcontinuity
stabilizers [80]. To the smoothness constraint, we add the data constraints in the
form of penalty terms. The following penalty term which measures the discrepancy
between the surface and data weighted by the uncertainty in the data may be used,
pm = \'E.cÂ¡ iv(x" df
(2.16)
20
Where, d, are the depth data points specified in the domain 0 and c, are the un
certainty associated with the data. The goal is to find a u that minimizes the total
potential energy Â£(v) = S(v) + V(v).
The numerical solution to the above problem is computed by discretizing the
functionals
function is a quadratic in x (a discretization of u) given by, E(x) = xrKx xTb + c.
Where, K is a very large and sparse N x N matrix [79], with N being the number
of discretization nodes. The minimum x* of this energy function is found by solving
the large sparse linear system Kx = b. We present two very fast solutions to this
problem in this thesis. One is a generalized capacitance matrix algorithm and the
other is a novel adaptive preconditioning technique, which will be given in subsequent
chapters
2.3.2 Shape from Shading
Numerous solution methods using variational principle formulations of the shape
from shading (SFS) problem have been proposed. For a comprehensive set of papers
on this topic, we refer the reader to the book edited by Horn and Brooks [39] and the
work in [79, 72, 38, 76]. In this problem, it is required to recover the shape of surfaces
from image irradiance which depends on surface geometry and reflectance, scene
illuminance and imaging geometry. The image irradiance can be expressed directly as
a function of the surface orientation if illuminance, reflectance and imaging geometry
are assumed to be constant. Two deterministic formulations for SFS problem are
discussed in the following.
21
PDE formulation
The shape from shading problem is commonly posed as a nonlinear, first or
der partial differential equation in two unknowns, called the image irradiance equa
tion, E(x,y) = R(p,q) [39], where E(x,y) is the image irradiance at a point (x,y),
p = vx, q = vy are first partials of the surface function u(x,t/), and R(p,q) is the re
lation between surface orientation (p, q) and image irradiance E(x,y). To overcome
the ambiguity caused along occluding contours in the gradient space parameteriza
tion (p, q), a reparameterization of surface orientation in terms of a stereographic
mapping: / = 2ap, g = 2aq with a = 1/(1 + \/l + p2 + q2) is used. With this repa
rameterization, the overall energy function (sum of stabilizer and data energies) to
be minimized is expressed as
Â£{f,9) = X j jQ(fx + fl) + (sl + 92y)dxdy + J jjE(x,y) R{f,g)]2dxdy. (2.17)
The Euler Lagrange equations which express the necessary condition for a minimum
are given by the system of coupled PDEs
A/ = A[ R(f,g)E(x,y)]R,
Ag = \[R(f,g) E(x,y)]Rg
(2.18)
The above equations may be discretized and solved using the algorithm of Sim
chony et al. [72], which enforces integrability condition on p and q. The method
involves solving three different discretized Poisson equations namely, Kxi = bi,
22
Kx2 = b2, and KiX3 = b3, where K is the stiffness matrix obtained by discretizing
the Laplacian operator (on the LHS of the equation 2.18) with Dirichlet boundary
conditions and Kj is the stiffness matrix obtained by discretizing the Poisson equation
for the height from orientation problem i.e.,
A Z = px + qy (2.19)
with Neumann boundary conditions. The vectors bi and b2 are the appropriate
RHSs obtained by discrete computation of the RHS of equation 2.18 at the current
estimated value of (f,g). b3 is the discrete form of the RHS of the above height from
orientation equation with the estimated (p, q) from equation 2.18. The structure of
the stiffness matrices K can be analyzed using the computational molecules [81]. In
this dissertation, we apply our generalized capacitance matrix algorithm to solve the
above discretized Poisson equations.
Energy minimization formulation
The other formulation for the SFS problem is to minimize the total energy func
tional containing the data constraint, the integrability constraint and the smoothness
constraint. The minimization is taken with respect to the surface orientation (p,q)
and the surface height Z altogether. The total energy functional Â£(Z, p, q) is the sum
of the smoothness constraint and the penalty terms in this problem, i.e.,
Â£(Z,p,q) = ^JJ^pl + P^ + iql + q^dxdy + ^JJiZzpy + iZyqYdxdy
23
+ J Jj.E{x, y) R(p, q))2dxdy.
The first term in the above equation is a weak smoothness constraint and the coeffi
cient A controls its contribution toward the final solution. The second term enforces
the integrability constraint, ensuring that the gradients (p, q) of the recovered surface
Z match those of a valid surface. Zy are the first partials of the surface Z(x,y).
The above energy can be discretized directly by using finite element techniques
however, the same set of discrete domain equations can be arrived at via the use of
finite difference approximations for this application [76]. The discrete energy is given
by
e = ^J2(E(x^y)R^
where the summations are over all the pixels in the image. In vector notation, the
above equation can be written as
E(u) = utKu bu + C (2.21)
j
Where, u = [zTpTqT] is a concatenation of the vector representation of the fields
Z,p and q. The minimum of the above quadratic is obtained by solving Ku = b with
Ku containing linear u terms and the constant and the nonlinear terms form the b
vector. This is an (N x N) sparse nonlinear system which is usually solved using
iterative methods. These iterative methods to solve the nonlinear system normally
24
have the following procedures in each iteration: (1) compute the b vector using
the current u, (2) update the solution u by treating b as a constant. In the past,
multigrid techniques have been employed in solving this nonlinear system [79] wherein
a hierarchy of problems at different resolutions needs to be explicitly constructed.
Szeliski [76] presented a fast shape from shading algorithm which employs hierarchical
basis for preconditioning the conjugate gradient algorithm used to solve the above
nonlinear system. Unlike the multigrid methods, this technique does not require
the explicit construction of a multiresolution pyramid of problems. In this thesis we
apply our adaptive preconditioning technique that leads to a faster convergence than
the one in Szeliski [76]. Our method makes use of the spectral characteristics of the
data constraint, the imposed smoothness constraint and the integrability constraint.
This spectral function is then used to modulate the frequency characteristics of the
chosen wavelet basis leading to the construction of the preconditioner.
2.3.3 Optical Flow Computation
A third early vision problem that we consider in this dissertation is the compu
tation of optical flow from an image sequence. Many techniques for computing optical
flow have been proposed in literature. These can be classified into gradientbased,
correlationbased, energybased, and phasebased methods [3]. Recently, Barron et
al., [3] have presented a comprehensive survey of various optical flow computation
techniques along with a quantitative comparison of them. We note that the survey did
not do any comparison of the computation time required by the various algorithms.
25
Two new and robust algorithms namely, a modified gradientbased algorithm
and an SSDbased regularization algorithm, for accurate optical flow computation
are presented in this dissertation. In the modified gradientbased algorithm, we
propose to selectively combine the image flow constraint and the contourbased flow
constraint into the data constraint in a regularization framework to amend the errors
in the image flow constraint caused by the brightness discontinuities. The image flow
constraint is disabled in the neighborhood of discontinuities, while the contourbased
flow constraint is active at discontinuity locations. The SSDbased regularization
algorithm uses the SSD measure as the data constraint in a regularization framework,
which leads to a nonlinear optimization problem. These two different regularization
formulations are described below. These two robust algorithms will be discussed in
details in chapter 6. In this section, we briefly describe our modified gradientbased
formulation.
Our modified gradientbased formulation modifies the Horn and Schunk for
mulation [40] to incorporate a separate condition along discontinuities in the image
intensity [35]. This modification drastically enhances the performance of our method
over other gradientbased methods because the optical flow constraint equation is in
valid along intensity discontinuities and causes large errors in the computed flow. We
get drastically better and more robust estimates of optical flow than any gradient
based optical flow technique reported in Barron et al. [3].
Our variational formulation of the optical flow problem involves minimizing a
combination of the optical flow constraint equation with a global smoothness term
26
and an additional constraint term along edges leading to
[ a((VÂ£.u)+Â£t)2+A( Vu 2 +  Vu
JD
2)dx+f /?((V(V2G*Â£)u)+(V2G*Â£)t)2dx
(2.22)
Where u(x, t) = (tt(x, t), n(x, t)) is the velocity field to be estimated, E is the image
brightness function, and V2G is the Laplacian of Gaussian operator [35]. The first
term is the gradient constraint which is active away from discontinuities, the second
term is the smoothness constraint on the velocity field with A controlling the contri
bution of the smoothness term and the third term is the flow constraint along edges
which is disabled away from the edges. A discrete version of the above energy can
be written as
+ Evv + Et)2 + \ Y,(ul + UD + (vl + VD
(2.23)
Where, and v form the discretized flow vector, x,y,vx and vy represent the dis
cretized partais of the velocity field, Ex, Ey and Et correspond to the discretized
partials in the x, y and t directions respectively collectively obtained from the
first and the third terms of the equation 2.22. Note that Ex, Ey and Et are the
discretized partials of the brightness function E when the location of differentiation
is away from discontinuities, and they represent the discretized partials of V2G E
when the differentials are taken at the discontinuity locations. The minimization of
this discretized energy leads to solving a linear system of equations Ku = b where
K contains the discretized differential (smoothness) operator along with Ex, Ey and
27
the b contains the terms corresponding to the optical flow constraint equation. The
gradient constraint in the above energy function implies the weighting (Ex + Ey) on
(,) at each discretized spatial location. This weighting is not reasonable since the
gradient constraints at locations of high gradient are not necessarily reliable. In fact,
they tend to be unreliable because the brightness gradients close to discontinuities are
normally higher than those away from discontinuities. Therefore, we eliminate this
weighting for each gradient constraint via normalization. To have a uniform contri
bution at each pixel from the flow constraint, we normalize the entries E2X, E2y, ExEy
2 2
in the matrix K with Ex + Ey.
To solve the above quadratic and convex energy minimization problem, we em
ploy our adaptive preconditioned conjugate gradient algorithm presented in this
dissertation. This novel adaptive preconditioning technique is applicable to any
gradientbased optical flow computation method that uses a smoothness constraint
involving the 2D Laplacian operator. The efficiency as well as accuracy of our adap
tive preconditioning technique and new optical flow algorithms will be demonstrated
through experimental results on synthetic and real image sequences in chapters 5 and
6.
CHAPTER 3
GENERALIZED CAPACITANCE MATRIX THEOREMS
3.1 Introduction
There are numerous applications of the ShermanMorrisonWoodbury formula in
various fields [30] [85]. These applications can be classified into two categories. One is
to compute the inverse of a matrix that has lowrank difference w.r.t. another matrix
whose inverse is available or can be efficiently computed; the other is to solve a linear
system whose matrix differs from a wellstructured matrix by a lowrank matrix and
very efficient methods exist to solve the linear system with the wellstructured matrix.
The latter case is also called the capacitance matrix algorithm. This algorithm has
been widely used for solving linear systems arising from the discretization of the
boundary value problems, primarily for the elliptic partial differential equations. In
this chapter, we state and prove some very general theorems which are then applied
in developing a new capacitance matrix algorithm in the next chapter.
When A 3?nxn, U, V 3?nxp, and both the matrices A and (7 + VrA1U)
are nonsingular, the ShermanMorrisonWoodbury formula [26] gives an expression
for the inverse of (A + UVr) as
(A + UVT)X = A"1 A_1U(7 + VtA1U)1VtA1. (3.1)
28
29
The above assumptions imply that the matrix (A + UVT) is also nonsingular. The
matrix (I + VTA1U) is called the capacitance matrix and is denoted by C. Note
that a similar formula can be obtained for the inverse of (A + UGVT), where the
matrix G 3?pxp is nonsingular [30].
Let B = A + UVT, with A, B, and C being nonsingular, then the linear
system Bx = b can be solved using the traditional capacitance matrix algorithm,
which involves the following steps:
1. Solve Ax = b for x.
2. Compute W = A1U by solving AW = U column by column.
3. Form the capacitance matrix C = / + VTW.
4. Solve C/3 = VTx for Â¡3.
5. Compute the solution x = x W/3.
This capacitance matrix algorithm is derived directly from the ShermanMorrison
Woodbury formula, and therefore, it can be applied only when A, B, and C are all
nonsingular, making it unsuitable for general linear system problems. In addition, the
choice of a wellstructured matrix A which is close to the original matrix is limited
to be nonsingular.
Buzbee et al. [10] proposed a fast algorithm to solve the Poisson equation (
a secondorder elliptic PDE ) on irregular regions. Their algorithm used a domain
imbedding technique to imbed the irregular region in a rectangular region and utilized
a fast Poisson solver ( on the rectangular region ) which is based on the capacitance
30
matrix algorithm. The discrete Poisson operator on the rectangular domain is the
Laplacian matrix, which is singular with rank n 1. Therefore, they presented a
theorem to generalize the traditional capacitance matrix algorithm to the case when
rank(A) = n 1 and B is nonsingular. This generalization is based on projecting the
righthandsides (RHSs) of the linear systems appearing in the first and second steps
onto the range of A through a unit vector. In [58], a generalized capacitance matrix
algorithm was presented for rankdeficient matrices A and B. This generalization
is accomplished by establishing the relationship between the solutions of the linear
systems with matrices A and B using their singular value decompositions. The
resulting algorithm involves the projection of the RHSs of the linear systems with
matrix A onto its range through the eigenvectors in the null space of AT. In addition,
two auxiliary matrices A and B, which are the rankaugmented versions of A and B,
are constructed in this algorithm to apply the ShermanMorrisonWoodbury formula
directly. Hence, this capacitance matrix algorithm requires the computation of all
the eigenvectors in the null spaces of A, AT, B and Br.
The capacitance matrix algorithm has also been employed as a discrete anal
ogy to the potential theory for partial differential equations by Proskurowski and
Wildlund [65, 66], and others. However, the framework of their capacitance matrix
algorithms is limited to solving secondorder elliptic boundary value problems.
The other possibility to generalize the capacitance matrix algorithm for singular
A and B is to directly apply a generalized inverse form for (A + UVT) to solving
the linear system. Some works on the extension of the ShermanMorrisonWoodbury
31
formula in equation 3.1 have been reported in [11], [34], [67], and references therein.
However, these generalized inverse forms of (A + UVT) have been derived under
very restrictive assumptions. In [11], a generalized inverse form was given for the
case when U and V are both vectors, which means B is a rankone modification of
A. Although the generalized inverse of a rankp modification of A can be obtained by
using the rankone modification generalized inverse formula iteratively, the resulting
algorithm is very inefficient in terms of storage and computation when the size of
the matrix A is very large. Henderson and Searle [34] derived several generalized
inverse forms for (A + UVT) under the assumptions that range(UVr) C range(A)
and both the matrices A and UVr are symmetric. Recently, Riedel [67] presented a
generalized inverse form for (A f UVr) with a very restrictive assumption, which is
implied by the condition rank(B A) < dim(null(A)). This condition implies that
Riedels result at best can give a generalized inverse for a rank<7 modification of A,
where q < dim(null(A)). This condition becomes very restrictive and impractical
when the dimension of the null space of A is small. Although the aforementioned
generalized inverse formulas can be used to design the corresponding capacitance
matrix algorithms, the associated assumptions restrict them from being applied for
general matrices A and B.
In this chapter, we present several theorems which we call generalized capaci
tance matrix theorems that lead to a generalized capacitance matrix algorithm which
can be applied to solve very general linear systems. In the main theorem, there are
no restrictions on the rank of the matrices A or B and it can be applied to any linear
32
system as long as it has a solution. In addition, with this generalized capacitance
matrix theorem, we have ample freedom to choose a wellstructured matrix A for de
signing more efficient capacitance matrix algorithms. Unlike in [58], no construction
of auxiliary rankaugmented matrices is required since we do not use the Sherman
MorrisonWoodbury formula in our generalization, instead our resulting algorithm
uses only the eigenvectors in null(AT)\null(JZT) to project the RHSs of the linear
systems with matrix A onto range(A). To achieve the projection of the RHS onto
range(A), we subtract the RHS along the vectors Uj,l < j < m\ m, instead of
subtracting along the eigenvectors in the null space of A as in [58], By using our
projection technique, the sparsity of the RHS can be preserved, while the sparsity of
the RHS is usually destroyed after subtracting along the eigenvectors as in [58]. We
take advantage of this sparsity property to obtain fast numerical algorithms in our
work. Unlike in [58], the capacitance matrix and the capacitance matrix equation
are explicitly given in our generalized capacitance theorems and the algorithm. By
explicitly specifying the capacitance matrix and the capacitance matrix equation, it
is possible to modify the RHSs of these equations in accordance with the singularities
of A and B. With the structure of the capacitance matrix equation given explicitly,
it is easy to design an efficient numerical algorithm to solve this equation. When
comparing the generalized inverse forms of (A T UVT) proposed in [34] with our
generalized capacitance matrix theorems, we observe that the generalized inverse for
mula of Henderson and Searle [34] is a degenerate result of our theorem 2, which has
a more relaxed assumption, i.e. null(A) C null(B) and range(B) C range(A), than
33
the assumption, range(B) C range(A) for symmetric A & B, made in [34]. When
applied to solve linear systems using the capacitance matrix approach with the matri
ces A and B satisfying the aforementioned assumption, both the generalized inverse
formula in [34] and our theorem 2 lead to the same algorithm.
We will state and prove the generalized capacitance matrix theorems for different
types of relationships between the matrices A and B subsequently. In the next
chapter, the generalized capacitance matrix algorithm will be constructed based on
the theorems presented in this chapter.
3.2 Generalized Capacitance Matrix Theorems
We categorize the generalized capacitance matrix theorems according to the
type of relationship between the matrices A and B. The first theorem is for the case
when null(B) C null(A) and range(A) C ranre(B), while the second theorem is
for the assumption null(A) C null(B) and range(B) C range(A). There are no
assumptions on A and B in the third theorem. The generalized capacitance matrix
theorem for the first case is stated as follows.
Theorem 1 Given B = A + UGV7, where A,B 3?nxn, G 3?pxp is nonsingular,
U = [ui,...,Up] 3?nXp, V = [vi,...,vp] &nXp, with both the sets {ui,...,up}
and {vi,..., vp} containing linearly independent vectors. Assume null(B) C null(A)
and range(A) C ran<7e(B). Let rank(B) = n ms and rank(A) = n m^,
0 < ms < tua < ti, then there exist two sets of orthogonal bases, {qi,...,qmx}
and {q'l5..., q'mA }, for null(AT) and null(A) respectively, such that nii//(BT) =
34
span{ ..., f\mA } o.nd null(H) span{ q myims+i..., q mA }. Let qt Uy
j), 7^ 0, for 1 < i,j < rriA tub
Let x be a solution to
mAmB
Ax = b 
j=i
qjb
qjui
(3.2)
where b range(B). For 1 < i < p, let r]{ be a solution to
qJu*
AVi = ut ^ r7ui
j=i q
(3.3)
anrf choose rj1...rjto be linearly independent and the linear combinations of
q'l5..., q/rriy4_ms Define the capacitance matrix C as
C = G
i
mAmB
+ Vtt/ Â£
=1
G
i.
qJU
a
'7 71 *
qjui
(3.4)
where rÂ¡ = [t7x, ..., rÂ¡p], and ej is the jth standard basis vector in 3?p. Then, there
exists a unique solution (3 to the capacitance matrix equation
C/3 = VTx
mAmB
E
j=x
^Ge
T ^ ^ 7 7
q, ui
and x *7/3 ts a solution to the linear system Bx = b.
(3.5)
We will use the following proposition to prove this theorem.
35
Proposition 1 Under the assumptions in Theorem 1, we have vfq'j = uJqj = 0 for
1 < i < p and mA mB + 1 < j < mA. In addition, VT[q/1... q'mA_mB] and
UT[qi... q mAmB] are of full rank.
Proof: From the assumption in Theorem 1, Bq'^ = Aq'^ = 0, mA mB + 1 < j <
mA. Since Bq'; = (A + UGV71)^, we have UGV7^ = 0 for mA mB + 1 < j <
mA. From the assumption that U 3?n)
it is obvious that Vrq' = 0. Therefore, we have vfqL = 0 for 1 < i < p and
mA mB + 1 < j < mA. Similarly, we can show that ufq, = 0, 1 < i < p and
mA mB + l < j < mA.
We use proof by contradiction to show that the matrix V^q'j... q,m^_mB] is of
full rank. Lets assume this matrix is rank deficient, i.e., there exists a nonzero vector
a = (
then r null{A) and r is orthogonal to null(B). Postmultiplying both sides of the
equation B = A + UGVT by r yields Br = Ar + UGVTr. Since Ar = 0 and
VTr = 0, we obtain Br = 0. But, from the above, r is a nonzero vector orthogonal to
nu//(B), i.e. Br ^ 0, leading to a contradiction. Consequently, the assumption that
VT[q,1 ...q'm.mj is rank deficient is not true, implying that VT[q'1... q'm/l_mJ
is of full rank. Similarly, we can prove that UT[qx... qmylmB] is of full rank.
We are now equipped to prove Theorem 1. The proof is in four parts, and
we will show that (1) there exist two sets of orthogonal bases, {qx,..., qmyl} and
{q'j,..., q'm^}, for null(AT) and null(A) respectively, such that null(BT) = span{
qnums+1, ,qmx} and null{ B) = span{ q/mil_mB+1,..., q'myl}, (2) there exist
36
solutions to equations 3.2 and 3.3, (3) the capacitance matrix C is nonsingular,
implying that equation 3.5 has a unique solution, and (4) B(x rÂ¡(3) = b.
Proof for Theorem 1:
(1) Since rank(A) = titua and rank(B) = nmj, it is obvious from their singular
value decompositions that there exist tua orthogonal eigenvectors, (f)1,..., mA, for
A and mg orthogonal eigenvectors, ipx,.. for B such that null(A) = span{
C null( A), we can write each i = 1,..., m, in terms of a linear combination of
the basis vectors x,..., (Â¡>mA Thus, we have
\ Vms J
T
/
(3.6)
where T is of full rank. By using the Householder or Givens transforma
tion [26], we can obtain the factorization of T as follows:
T =
0 T
Q,
(3.7)
37
where 0 is an mg x (ttia mg) zero matrix, T is nonsingular, and
Q 6 is orthogonal. Define the vectors q'j,..., q'm^ by the following relation
/ \
( \
q'i
= Q
i
(3.8)
\ ^mA )
We can see that {q'j,..., q'm/i} is a set f orthogonal bases for null( A) and its
subset {q'm^_mB+1,..., q/TJT>1} forms an orthogonal basis for null(B). Thus, the
existence of the basis {q'j,..., q'm/1} which satisfies the aforementioned requirements
is proved. Similarly, we can prove that there exists an orthogonal basis {qj,..., qTOyt}
for null(AT) and null(BT) = span{qmx_mfl+i,..., qm/1} is satisfied.
(2) From Proposition 1, we have ujqj = 0 for 1 < i < p and mg + 1 < j <
rriA. The assumption b range(B) implies bTqj = 0 for tua mg + 1 < j < m^.
Combining these facts and the orthogonality of the eigenvectors qj,j = 1,..., m^, we
can see that the RHSs of equations 3.2 and 3.3 are orthogonal to q, for j = 1,..., tua,
thus, the RHSs are orthogonal to null(Ar). Consequently, the RHSs of equations
3.2 and 3.3 are both in the range of A, which means there always exist solutions to
equations 3.2 and 3.3.
(3) To prove that C is nonsingular, we show that C/3 = 0 => (3 = 0. Lets
suppose C/3 = 0, then, from equation 3.4, we have
Vtt7/3
mAmB
E
J = 1
G
i
eP
qjug
qj;
G'p.
(3.9)
38
By definition, B77/3 = Arj(3 + \J\Trj/3. Substituting equations 3.3 and 3.9 into this
equation, we get B77(3 = 0. Thus, rjf3 E null(B). Since null(B) C null(A), we
have rj/3 Â£ null(A), i.e. ArÂ¡(3 = 0. Substituting At] by equation 3.3 and denoting
Â¡3 = (/?i,..., /?p), we obtain
y y 171A ~mB Qiii
U E = 
j=l 4j uj
1 = 1
=1
(3.10)
After rearranging, equation 3.10 becomes
mAmB
QTU/3 P
E (A tttK + E = 
4j uj
(3.11)
j=i
i=m^mfl+1
Since Ui,..., Up are linearly independent, equation 3.11 imposes the following condi
tions on flj
Pi
Pi
qjuP
q[uj
j = l,...,mA mg,
0, j = mAmB + 1,
(3.12)
(3.13)
Substituting these conditions into the RHS of equation 3.9, we obtain
/ \
Pi
\ PmAmB J
= 0.
(3.14)
39
For i = mg, the RHS in equation 3.3 equals 0, thus, rji Â£ null(A).
From the assumption that rjx,..., TjmAmB are chosen to be linear combinations of
q'j,..., q'myl_mB and linearly independent, we can write [tj1 ... r/mA_me] in terms
of [q'i 35 follows:
[Vl VmAmB] = [q'l q'mj(mBlT. (315)
where T sj\(mAmB)x.(mAmB) js nonsingUlar. By substituting equation 3.15 into
equation 3.14, we get
\
= 0. (3.16)
From Proposition 1, the matrix V^q'j ... q,myl_mB] is of full rank and is of size
p x (m,A ms) with p > rriA mg, which implies
Vr[q'i ... q'm.TOB]T
A
3.
/
fii
\
T
= 0.
(3.17)
V
A
mAmB
Since T is nonsingular, this makes /?, = 0, = 1,..., mg. Combining this with
equation 3.13, we obtain (3 = 0. Thus concluding that C/3 = 0 implies (3 = 0,
therefore, C is nonsingular.
40
(4) Now, we will show that B(x rj(3) = b. Substituting A + UGVT for B
and expanding, we have
B(x rj/3) = Ax Atj/3 + UGVTx UGVTrjf3. (3.18)
Rewriting Vtt]/3 using equations 3.4 and 3.5 and some algebraic manipulation, we
get
mAmB
Vtt7/3 = Vtx Â£
uj
nTb mAmB
ij
3 T
qju
3=1
q; Uj
(3.19)
Substituting for Ax, Atj and VT77/3 from equations 3.2, 3.3 and 3.19 respectively,
equation 3.18 leads to B(x r]/3) = b.
In theorem 1, we assume that the matrices U and V contain linearly independent
columns and G is nonsingular, which is equivalent to saying that U, V and G are all
of full rank, in addition, the condition qfuj = a,<(, j) must be satisfied for 1 < i,j <
mAmB In fact, we can prove that given any two matrices A, B 3?nXn, there always
exist fullranked matrices U, V 9?nxp and G 6 3ipxp such that UGVT = B A and
p = rank(B A). This can be verified by taking the singular value decomposition of
the matrix B A as \JS'VT, where both U and V are orthogonal matrices and Â£ is
a diagonal matrix with rank p = rank(B A). We can reduce the size of the matrix
Â£ by retaining the p nonzero diagonal entries and taking out the remaining rows
and columns of zeros. This reduced matrix G G 9Â£pxp is nonsingular. In addition,
we take out the columns in U and V corresponding to the zero singular values in Â£,
41
then the reduced matrices U, V 3?nxp are obtained. It is not difficult to see that
UGVr = B A and the matrices U, V and G are of full rank.
As for the condition qfu, = a,(, j), 1 < i,j < mg, we can find such
a U by transformation. If {U,G,V} satisfies UGVT = B A and each is of full
rank, then the alternative set {UQ, Q_1G, V} also satisfies these requirements when
the matrix Q 9Â£pxp is nonsingular. From Proposition 1, UT[qi... has
full rank. We can find a transformation matrix Q to make the upper (m tub) x
(rriA tub) block of the transformed matrix QTUr[qi ...qmA_mB] diagonal. Then,
the condition qfuj = 1 < i,j < tua ms, is satisfied.
There are infinite number of choices for the matrices U, V and G to satisfy the
aforementioned requirements. The above choice from the singular value decomposi
tion of B A is just a particular one that leads to a diagonal G matrix. This makes
the computation of G1 in equation 3.4 trivial. The choice of A, U, V and G is very
crucial for the efficiency of the numerical algorithms based on the capacitance matrix
algorithm to solve a linear system.
Now lets turn to the other case when null(A) C null(B) and range(B) C
range(A). Note that these two constraints are equivalent to each other when both
B and A are symmetric matrices as in [34].
Theorem 2 Given B = A + UGVT, where A, B 3ftnxn, G 3?pXp is nonsingular,
U = [ui,..., Up] 3Â£nxp, V = [vj,..., Vp] G 3?nxp, with both the sets {ui,...,up}
and {vj,..., Vp} containing linearly independent vectors. Assume null( A) C nu//(B)
42
and range(B) C range(A). Let x be a solution to
Ax = b, (3.20)
where b G range(B). For 1 < i < p, let rji be a solution to
A Vi = (3.21)
Define the capacitance matrix C as G_1 + VTT7, where r\ = [iyx,..., T7p]. Then there
exist a solution f3 to the capacitance matrix equation
C(3 = VTx, (3.22)
and x rj/3 is a solution to the linear system Bx = b.
We will use the following proposition to prove the above theorem.
Proposition 2 In Theorem 2, let {qi,...,qm/t} and {q'l5..., q'myi} be the sets of or
thogonal eigenvectors in null(AT) and null(A) respectively, where tua = nrank(A).
Then, qfuj = q'Jvj = 0, for i = 1,..., tua and j = 1,... ,p.
Proof: From the assumption that null(A) C null(B), we can see that the eigen
vectors q', i = 1,...,ttia, are also in null(B). Since Bq'( = Aq', + UGVTq't, we
obtain
UGVTq't = 0,
(3.23)
43
for 1 < < mA. The matrix U is of size n x p, n > p, and of full rank, consequently
GVTq't = 0. From the assumption that G is nonsingular, we have Vrq', = 0.
Therefore, q'Jvj = 0 for t = 1,..., mA and j = 1,... ,p.
Similarly, we obtain the equation VGrUrq, = 0 from the condition Brq, =
ATq = 0 which is derived from the assumption range(B) C range(A). Using
the similar reasoning for this equation leads to the conclusion that qfuj = 0, i =
and j = l,...,p.
We are now poised to prove Theorem 2.
Proof for Theorem 2 : The proof is presented in three parts namely, (1) we
will show that there exist solutions for equations 3.20 and 3.21, (2) we will establish
that there exist a solution for equation 3.22, and (3) show that B(x rÂ¡/3) = b.
(1) To show that there exist a solution for equations 3.20 and 3.21, we prove that
the RHSs of both equations are in the range of A. It is obvious that b G range(A)
for equation 3.20, since b G range(B) and range(B) C range(A). For equation 3.21,
from Proposition 2 we can see that u,, 1 < i < p, is orthogonal to null(AT), which
means u, G range(A) for i = 1,... ,p.
(2) As for equation 3.22, we need to show that Vrx is in the range of C. We
use proof by contradiction in the following. Assume that VTx is not in the range of
C, then there exists a nonzero vector 7 G null{CT), i.e. Cr7 = 0, such that
7t(Vtx) ^ 0.
(3.24)
44
Using the generalized inverse 1 A+ of the matrix A, we can write x as follows.
mA
x = A+b + o.q' (325)
1=1
where = n rank(A), q'x,..., q'm/1 are the orthogonal eigenvectors in null(A),
and a, 3?, Vi. Then,
mA
VTx = VT A+b + Â£ a, VTq',. (3.26)
1=1
From Proposition 2, we have VTq't = 0, for i = 1,..., m^. Substituting into equation
3.26, we have VTx = VTA+b. Substituting for VTx in equation 3.24 yields
(A+TV7)Tb 0. (3.27)
We can show that A+TV7 null(BT) as follows.
Bt(A+TV7) = (A + UGVT)rA+rV7 (3.28)
= (AtA+TV + VGtUtA+TV)7 (3.29)
Since ATA+r = I Â£= q'tq'f, using a result of Proposition 2 namely, q'fV = 0
for i = 1,..., tua, we have
ATA+TV = V Â£ q'.q'f V = V. (3.30)
i=i
1 Let the singular value decomposition of A be A = Ui7VT, where U and V are orthogonal
matrices and Â£ = diag(0,..., 0, XmA+i, , An). Then its generalized inverse A+ = Vi7+UT,
where Â£+ = diag(0,..., 0, ,..., Â£) [26].
45
Substituting equation 3.30 into equation 3.29 yields
Bt(A+TV7) = VGt(G1 + VTA+U)T7 (3.31)
Again, using Proposition 2, we can show that C = G1 + VTA+U. Consequently,
we obtain Br(A+TV7) = VGTCT7 = 0 since CTj = 0 by assumption. Thus,
A+rV7 Â£ null(BT). Combining this with the equation 3.27 leads to the conclusion
that b is not in range(B). This is a contradiction to the assumption b Â£ range(B).
Therefore, the assumption that VTx is not in the range of C is untrue. Hence, we
can conclude that VTx C. This proves that there exists a solution to equation
3.22.
(3) The proof for B(x rj/3) = b is very similar to part (3) of the proof for
Theorem 1 and is omitted here.
If we restrict the above constraints on the relation between B and A to nu 11(B) =
null(A) and range(B) = range(A), then the capacitance matrix C is nonsingular,
i.e., the capacitance matrix equation has a unique solution. This result is stated in
the following corollary.
Corollary 1 In Theorem 2, if null(A) = null(B) and range(A) = range(B), then
the capacitance matrix C is nonsingular.
Proof: To prove that C is nonsingular, we show that C(3 = 0 => (3 = 0. Suppose
C/3 = 0, then, from the definition of C, we have Vttz/3 = G_1/3, which can be
substituted into the equation Btj/3 = Arj/3 + UVTT7/3, along with the equation for
46
At] from (3.21), to get B77/3 = 0, i.e., tj/3 nu 11(B). Since null(A) = null(B), we
have Ar](3 = 0. Using equation 3.21 for At], we get U/3 = 0. From the assumption on
the matrix U namely, Ui,..., up are linearly independent, we conclude that (3 = 0.
Therefore, C is nonsingular.
In the above theorems, we impose the constraints on the relationship between
B and A either via null(B) C null(A) and range(A) C ran<7e(B) or via null(A) C
null(B) and range(B) C range(A). Now we consider the general case, i.e., no con
straint on the relationship between B and A is imposed. The following theorem is
given for this general case and can be used for any matrices B and A in 3?nXn as long
as there exists a solution for the associated capacitance matrix equation.
Theorem 3 Given B = A + UGVT, where B, A Â£ 5RXri, G 3?pXp is nonsingular,
U = [ui,...,Up] 3inXp, V = [vi,...,vp] 3inxp, with both the sets {ui,...,up}
and {vi,...,Vp} containing linearly independent vectors. Let rank(A) = n ,
dim(null(B) C\ null(A)) = m', and dim(null(Br) H null(AT)) = m, 0 < m',m <
rriA < n, then there exist two sets of orthogonal basis, {qi,... ,qm/l} and {q'j,..., q'm^ },
fornull(AT) and null(A) respectively, such that null(BT)r\null(AT) = span{qm/l_m+i,
...,qm/1) and null (B) C\ null (A) = span{q'mx_m,+1,..., q'mA}. Let rank( B) =
n mg, 0 < m',m < mg < n, then there exist mg m' vectors r'm<+i,..., r'mB
such that the set {q'm^_m/+1,..., q'mA, r'm<+i,..., r'mj9} is an orthogonal basis for
null(B). Let qjuj = cti6(i,j), a, ^ 0, for 1 < i,j < m.
47
Let x be a solution to
Ax = b Â£
mAm qib
J=1 *Jui
U
J >
(3.32)
where b range(B). For 1 < i < p, let 77, be a solution to
mAm QT
Ai? = u Â£ ikruJ (333)
1=
and choose rjl... TjmA_m to be linearly independent and linear combinations of q'j,...,
q'mAm Define the capacitance matrix C as
mAm gt U
C = G1 + Vtt/ y G'e.V, (3.34)
where rÂ¡ = [77^ ..., rÂ¡p], and ej is the jth standard basis vector in 3?p. Assume the
two sets of vectors {Ar'm/+i, , Ar'mB} and {ui,...,up} are linearly independent,
then there exists a unique solution (3 to the capacitance matrix equation
C/3 = VTx 
rn^m
U
(3.35)
and x r}(3 is a solution to the linear system Bx = b.
The proof of this theorem is omitted here since, it is similar to the proof of
Theorem 1. This proof will require the use of the fact that qfiij = 0, for mA m + 1 <
i < mA and 1 < j < p, and the matrix V^q'j... q'mA_m] is of full rank.
48
In general, Theorems 1 and 2 can be regarded as the special cases for Theo
rem 3 with the exception of the additional assumption on the linear independence of
the two sets of vectors {Ar'm+i,..., Ar'ms} and {ux,..., up} made in Theorem 3 to
guarantee the uniqueness of the solution in the capacitance matrix equation. This as
sumption is automatically satisfied for Theorem 1 since the set {Ar'm/+i,..., Ar'ms}
is empty and the vectors Ui,...,up are linearly independent. However, there is no
such assumption in Theorem 2, which gives the existence of a solution to the capac
itance matrix equation instead of the uniqueness of the solution given in Theorem
3.
3.3 Discussion
In this chapter, we presented the generalized capacitance matrix theorems for
solving linear systems. Our generalized capacitance matrix theorems can be used for
very general matrices A and B, therefore the resulting algorithm can be applied to a
very broad class of problems and we have ample freedom to choose the matrix A ap
propriately for designing efficient numerical algorithms. The traditional capacitance
matrix algorithm [10, 65, 66] has been primarily applied to solve the secondorder
elliptic boundary value problems, whereas our generalized capacitance matrix algo
rithm can be used to solve anyorder elliptic boundary value problems. Also, we can
handle cases such as the shape from shading ( SFS ) problem on an irregular domain
with Neumann boundary conditions which specifically yields a singular K matrix.
This case can not be solved using the capacitance matrix technique of Buzbee et al.,
[10] as mentioned in [72] because, the capacitance matrix theorem of [10] requires a
49
nonsingular K. Our algorithm gives a solution to this problem (recovers a surface)
within a scale factor in height.
Previous works [11, 34, 67] on deriving the generalized inverse forms of (A f
UVT) were presented under restrictive assumptions. Although our generalized capac
itance matrix theorems are developed for solving general linear systems, it is possible
to derive the generalized inverse form of (A + UGVT) for more general cases based
on these theorems.
CHAPTER 4
GENERALIZED CAPACITANCE MATRIX ALGORITHM
4.1 Structure of the Algorithm
From the generalized capacitance matrix theorems presented in chapter 2, we
can generalize the standard capacitance matrix algorithm for general matrices A and
B as follows.
1. Solve equation 3.32 for x.
2. Solve equation 3.33 for r/t, for i = 1,... ,p.
3. Form the capacitance matrix C from equation 3.34.
4. Solve the capacitance matrix equation (i.e. eq. 3.35) for (3.
5. The solution x = x rj(3.
This generalized capacitance matrix algorithm can always be applied to the cases
when null(B) C null(A) and range(A) C range(B) or when null(A) C null(B)
and range(B) C range(A), as stated in Theorem 1 and 2 respectively. However,
to apply this algorithm to more general cases, we need to make sure that a solution
exists to the capacitance matrix equation in step 4. Theorem 3 gives a sufficient
condition for the uniqueness of the solution to the capacitance matrix equation in
the general case.
50
51
Note that the step 2 of the algorithm requires the solutions t^, ..., rÂ¡p in equation
3.33 with different RHSs. The computational cost to solve this equation p times is
very high when p is large. However, the vectors rÂ¡1,...,rÂ¡p can be easily obtained
when the matrix A is chosen to be circulant Toeplitz and the vectors u, have a very
sparse set of nonzero entries. The circulant Toeplitz property of the matrix A makes
it shiftinvariant, which means AS'(u) = S(Au) for all u where S is a shift
operator [59]. A linear shiftinvariant operator can be completely characterized by its
impulse response, which is the solution of the corresponding linear system Au = e
where the RHS e is a unit vector. Thus, the solution 77, can be obtained via the
convolution of the impulse response for this shiftinvariant linear system with the
RHS in equation 3.33, which is sparse since the vectors Ui,...,up themselves are
sparse. The convolution of the impulse response with a sparse vector can be easily
obtained by a linear combination of the shifted impulse response vectors. When the
vectors v,, i = 1 ,...,p, are also sparse, the computation of the matrix VTT7 for
forming the capacitance matrix C can be further simplified by obtaining the (,j)th
entry of \Tr] from the impulse response corresponding to the relative locations of
the nonzero components of v, and those of Uj,..., um^_m and Uj.
The direct computation of rj/3 in the final step of the generalized capacitance
matrix algorithm requires O(pn) operations, which are very costly when p, the rank
of difference between B and A, is large. Instead, we solve the following linear system
qJV0
niAm
Ay = U/3 Â£ JT Uj.
(4.1)
52
to get the solution y which has the same projection on the range of A as 77/3. As for
the projection of rj/3 on the null space of A, it can be easily computed from the inner
product of Â¡3 and a, = (a,,i,... ,a,)P), where a.j = qJrÂ¡j, for 1 < i < mA m. The
coefficients a can be chosen arbitrarily as long as the assumption that rj1... rÂ¡mA_m
are linearly independent and linear combinations of q'l5..., q,mx_m is satisfied. Thus,
V P can be obtained from the following equation.
Am aTv
vP = y+ E (42)
,=1 H q.
The flowchart of the generalized capacitance matrix algorithm is given in figure
4.1.
4.2 Choice of the Splitting
From the variational formulations for the early vision problems, we can see it
is necessary to solve linear systems of the form Kx = b, where the matrix K
$NxN(N = n2) is the sum of the matrices K and AKS obtained from the data con
straint and smoothness constraint respectively. To apply the generalized capacitance
matrix algorithm to solve the linear systems, we have to choose a splitting scheme
for K at first. We discuss the nice choice of splitting K in this section.
Assume we split the matrix K into two components Ko and UVr. Note that the
matrices K and K0 can be replaced by the matrices B and A, respectively, used in
chapter 3. The notation K and K0 (popularly used in computational vision literature)
are used instead of B and A (widely used in the field of capacitance matrix methods)
53
Figure 4.1. The flow chart of the generalized capacitance matrix algorithm.
for the rest of this chapter. Using the generalized capacitance matrix algorithm to
solve the linear system Kx = b with the above splitting for K, we need to solve several
linear systems of the form K0x = b as shown in equations 3.2 and 3.3 respectively.
Especially, in equation 3.3, we have to solve it p times, where p is the rank of K K0.
Therefore, it is very crucial to choose a well structured Ko so as to efficiently compute
the solution in the capacitance matrix method.
In general, there are three criteria that dictate the choice of K0. The first is
the difference between K0 and K must be small in the sense that rank(K Ko) is
small. Secondly, K0 must possess some nice structure so as to have a fast numerical
solution to the associated linear system. Lastly, K0 should be translation invariant;
54
this property has the advantage of saving the computational cost in solving equation
3.3 for each u,0 < i < p 1, since there are only one or two nonzero components in
each u,. Therefore, we need to compute the solution h0 to a linear system consisting
of a matrix K0 and an RHS vector which is the projection of the unit vector onto
the range of K0. Because, h0 is translation invariant, we can obtain the solution of
rÂ¡i in equation 3.3 without having to solve any of the linear systems associated with
each ?7.
For ease of exposition, we consider the splitting scheme for the surface recon
struction problem in the following. This splitting schemes can be extended to other
early vision problems.
The stabilizer discussed in the previous section imposes smoothness constraints
on the surface reconstruction problem leading to a C surface i.e., the membrane
spline, a C1 surface namely, the thin plate spline, or a combination thereof called
the thinplatemembrane spline. As mentioned earlier, the numerical solution to the
surface reconstruction problem involves solving a linear system Kx = b. The matrix
K is large and sparse but not tightly banded. This is primarily due to the fact
that a vector has been used to represent a twodimensional array which leads to
representing neighboring interactions in twodimension by interactions between very
distant elements in onedimension. The matrix K is a sum of two matrices Ks and
Kd, with Ka containing the membrane or thinplate molecules [81] defined on the
interior and boundary of f2. The IQ matrix is a diagonal matrix containing nonzero
weights.
55
By making use of the structure of the matrix K, we can split K into two com
ponents K0 and UVT, with K0 and U, V 3Â£iVxp, such that K0 is close to
K, i.e., the rank of UVT is small, and there exists a fast algorithm to solve a linear
system with the matrix K0. Then, the solution to the linear system Kx = b can
be obtained by using the generalized capacitance matrix algorithm which involves
solving the wellstructured linear systems with the matrix Ko and the capacitance
matrix linear system.
The choice of K0 depends on the data compatibility matrix and the smooth
ness constraint matrix Ks. We have different choices of K0 for sparse (scattered) and
dense data cases. They are given in the following sections.
4.2.1 Dense Data Surface Reconstruction
We will first describe our choice of the matrix K0 and the equivalent Lyapunov
matrix equation for the dense data surface reconstruction with a membrane spline.
Then we will present the matrix Ko for the thinplatemembrane spline surface re
construction based on the construction scheme for the membrane problem. The thin
plate spline is a special case of thinplatemembrane spline and will not be discussed
separately. In addition, the selection of the matrices U and V will be discussed for
both cases.
Membrane spline for the dense data case
The data compatibility matrix K for the dense data case with uniform weight
ing is an identity matrix. Focusing on the surface reconstruction problem in a rectan
gular domain, the matrix K0 is chosen to contain both the membrane computational
56
molecules and the data constraint molecules [81] all over the domain which are
shown in figure 4.2, Using this choice of K0, we can transform the original linear
system Kx = b to an equivalent Lyapunov matrix equation.
Figure 4.2. (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain 0.
With the above choice, the matrix Kq can be written as follows.
K0 AKmern 4 Kj AKmem (
(4.3)
where
D I
I .
K
mem
5
(4.4)
. I
I
I D
57
4 1
D
1 4 1
Â£nXn.
(4.5)
The matrix Kmem can be regarded as the result obtained from the 5point ap
proximation to Laplacian operator on a doubly periodic imbedded rectangular do
main. From a computational molecule [81] point of view, only the membrane molecule
is used at every node of the doubly periodic rectangular domain. Therefore the molec
ular inhibitions due to depth discontinuities (if any) and imposed doubly periodic
smoothness are included in the matrix UVT. Given the matrices K and K0, we can
show that the choice of the matrices U and V such that UVr = K K0 where both
matrices U and V are of full column rank is not unique. Our selection method for
the matrices U (= [ul5..., up]) and V (= [vi,..., vp]) is as follows. Let each column
vector Uj, 1 < i < p, contain only two nonzero entries +1 and 1 located at the two
neighboring sites separated by the ith discontinuity. Then select the column vector
v, to be Au,, 1 < i < p. Thus the matrices U and V are of full column rank p
where p is the number of discontinuities. Our selection scheme has the advantage
that the column vectors of U and V are very sparse (only two nonzero entries in each
column), a structure that is very conductive to the design of an efficient numerical
algorithm. We will discuss more of this in section 4.3.
58
We can see that this particular choice of K0 makes it circulant Toeplitz and thus
translation invariant. In addition, it can be decomposed as the sum of two tensor
products of matrices A and I, i.e.,
Kq A <8> I +1 <8> A,
(4.6)
with
A =
I + 2A
A
A
A i + 2A A
A A ). 2A
A
A
A  + 2A
and is the Kronecker product. By using this special structure of the matrix K0, we
can rewrite any linear system of the form K0z = f as the following Lyapunov matrix
equation
AZ + ZA = F,
(4.7)
where Z and F are nxn matrices corresponding to their concatenated n2 x 1 vectors
z and f respectively. It should be noted that n2 = N, and that the matrix A in
equation 4.7 is circulant Toeplitz and symmetric positivedefinite. We can use the
ADI (Alternating Direction Implicit) method to solve this Lyapunov matrix equation
in O(N) operations. We will prove this O(N) computational complexity of the ADI
method in section 4.3.
59
Thinplatemembrane spline for the dense data case
The construction of the matrix K0 for the thinplatemembrane spline prob
lem is based on the above construction for the membrane spline. Analogous to the
matrix Kmem which is a wellstructured matrix and close to the smoothness con
straint matrix Ks constructed for the membrane problem, a matrix Ktp is used for
the smoothness constraint matrix discretized from the thinplate smoothness energy.
The differentiation of the membrane smoothness energy function corresponds to a
Laplacian operator A, while the corresponding operator for the thinplate smooth
ness energy is a biharmonic operator A2. Thus, we can extend the selection scheme
for the membrane problem to the thinplate problem by choose the matrix Kip as
K* = KL. (4.8)
The smoothness matrix for the thinplatemembrane spline is a convex combination
of the membrane and thinplate smoothness matrices. Lets define the matrix Ktpm
as a convex combination of Kmem and Ktp, i.e.,
Ktpm crKtp + (1 cr) K mem ? (4.9)
where 0 < a < 1. Similar to that of the membrane spline, the matrix K0 is chosen
to be AKtpm f I, which can be written as
K0 = AcrK^em + A(1 <7)Kmem + I
60
= (aK mem + I)(6K mem I)?
(4.10)
where
A(1
a, 6= ^1^ l411)
When A > a and & are always real and positive. Then the matrix K0 can be
factored into two real matrices (aKmem +1) and (&Kmem +1), which are of the same
form as the matrix K0 for the membrane dense data problem. Thus, the solution to
a linear system with the matrix K0 can be obtained by solving two Lyapunov matrix
equations, which can be solved using the ADI method in 0(N) time. However, when
the above condition is violated, a and b as well as the factoring matrices in equation
4.10 become complex. It is more complicated to use ADI method to solve the complex
Lyapunov matrix equations and analyze the computational complexity. Therefore,
we limit our discussion to the case when the aforementioned condition is satisfied for
the dense data surface reconstruction problem.
Using the above choice for the matrix Ko, we can see that the matrices U and
V encode the inhibition of membrane and thinplate computational molecules due to
the presence of depth and orientation discontinuities (if any) and the imposed doubly
periodic smoothness over the imbedded rectangular domain.
4.2.2 Sparse Data Surface Reconstruction
The splitting of matrix K for the sparse data surface reconstruction problem is
discussed in this section. The only difference between the splitting schemes for the
61
sparse data problem and the dense data problem is the treatment of the data com
patibility matrix K<. For the dense data case discussed above, the matrix K is an
identity matrix and is arranged into K0, which can be rewritten as a Kronecker prod
uct of two wellstructured matrices. For the sparse data case, the nonzero elements of
the matrix K are sparsely distributed along the diagonal, thus every nonzero entry
of the sparse diagonal matrix IQ is decomposed into the outer product of u( and v'
and included into U and V as follows:
u =
Ul
. uk u'x ..
u'k.
V =
Avj .
.. Ay* v[
... \'k,
where the vectors Ui,...,Ujt, Vi,...,Vjt account for the molecule inhibition due
to depth and orientation discontinuities (if any) and the imposed doubly periodic
smoothness over the embedded rectangular domain. With the above choice of U and
V, each column of U and V is still very sparse, containing only one or two nonzero
entries. Our numerical algorithm will take advantage of this sparsity property.
Similar to the dense data case, the matrix Ko can be decomposed as the sum of
two Kronecker products of AA and I. Therefore, a linear system of the form K0z = f
can be written as the Lyapunov matrix equation AZ + ZA = F.
4.3 Numerical Solution
In this section, we discuss the convergence of ADI method applied to the Lya
punov matrix equation and show that the computational cost for solving this equation
62
is 0(N) by proving that the ADI method can converge in a constant number of it
erations independent of N. In addition, we present a modified biconjugate gradient
method that is used in solving the capacitance linear system.
4.3.1 ADI Method for the Lyapunov Equation
The ADI method for solving a Lyapunov matrix equation is described in [48].
For any Lyapunov matrix equation of the form AX + XA = B, the ADI method
involves the following steps in each iteration j = 1,2,..., J,
(A + PjI)Xj_i = B Xj_,(A pjl) (4.12)
Xj(A+pI) = B (A PjI)Xj_i (4.13)
For our problem, note that the matrix A is circulant Toeplitz and symmetric positive
semidefinite as discussed in section 4.2. The special structure of matrix A helps in
advancing the ADI iterations very efficiently in each step of the technique. On the
lefthand side (LHS) of the equations 4.12 and 4.13 respectively, the matrix A + Pjl
is very close to being tridiagonal except for the nonzero entries at the corners namely,
the entries (l,n) and (n,l). These nonzero corner entries are caused due to the doubly
periodic boundary conditions discussed in the last section. In addition, A+pjl is SPD
for any positive parameter pj. Therefore, we can compute the LU decomposition of
this matrix and then use the forward and back substitution to compute the updated
solution. Due to the the special structure of the matrix A + pjl, the solution update
requires only O(N) time per iteration.
63
We now examine the convergence of the ADI method for the Lyapunov matrix
equation in our algorithm. Let the initial Xo = 0 (zero matrix), and AX* = X* X*,
where X* is the true solution of the Lyapunov matrix equation, A0, Aj,..., An_i, and
q0, qi,..., q_i be the eigenvalues and the eigenvectors of A respectively. For this
symmetric matrix A, we have
A = QDQt, (4.14)
where Q = [q0 qi qni], D = diag[X0, A1? , An_i], and A*, = 4sin ( n2 )
for k = 0,..., n 1. We express the eigenvalue A in this form so as to have nonde
creasing eigenvalues with respect to the indices. The range of eigenvalues is between
0 and 4, and the zero eigenvalue (when k 0) is simple. The orthogonal matrix
Q contains the discrete sine or cosine function in each column with increasing fre
quency corresponding to increasing eigenvalue, i.e., the index increases. For example,
the eigenvector corresponding to the simple zero eigenvalue is a flat function whose
frequency is 0.
By taking the tensor product of the eigenvectors, qjt <8> q/, k, l = 0,..., n 1, we
can generate an orthogonal basis for 3inxn and express the true solution X* in this
basis as follows:
n1n1
x = Â£Â£ aw(q*0qj). (4.15)
k=0 1=0
Subtracting each of the equations 4.12 and 4.13 from the Lyapunov equation AX f XA
= B, and expressing X in the above tensor product basis, we have the following result
64
for the error after t iterations,
n1 n1
ax, = Â£Â£
k=0 =0
TT/^ Pj w Pj \
jlch + pj^i+pj
aki{qk q/).
(4.16)
The function inside the bracket is the error reduction factor in the basis q*,
Notice that the Lyapunov matrix equation has an infinite number of solutions due to
the singularity of the matrix A. The error reduction in each step is always less than
1 for eigenvectors with positive eigenvalues and positive parameters pj while it is
always 1 for the basis q0qo (which is evident from equation 4.16 after substituting
k = l = 0 and A0 = 0). A solution to the Lyapunov matrix equation can be obtained
without reducing this factor at the component k = l = 0.
The classical ADI minimax parameter problem is to find the ADI parameters pj
to minimize the maximum of the error reduction function excluding the zero eigen
value, i.e., to minimize the function
Pt =
max
V(fc,/)\(0,0)
t
UK
^k Pj\t^l Pj
^k + Pj A/ f Pj
(4.17)
Applying the result from the classical ADI minimax analysis [48], leads to an O(log N)
iterations for convergence [85]. This in turn yields an overall computational complex
ity of 0(N log TV) for the ADI method since, O(N) time is required per iteration in
the ADI method. An 0(N log N) computational complexity is unacceptable for our
purposes (recall that we seek an O(N) solution). Thus, in order to design an O(N)
algorithm, it is essential to use the additional knowledge about the problem e.g.,
65
spectral characteristics of the differential operator in the PDE. The classical ADI
convergence analysis does not make use of any such knowledge. In this thesis, we re
formulate the ADI convergence analysis by incorporating the spectral characteristics
of the Laplacian operator. This is facilitated by making the following assumption.
Assumption 1 Let the projection of B, the RHS of the Lyapunov matrix equation, on
the basis q* 0 such
that V(Ar, /)\(0,0), \bkÂ¡\ < ^+p)> for some 171 > 0'5> and ^(p+z?)2 ^ awm
For our Lyapunov matrix equation AX f XA = B, the relationship between the
solution X and the data B in the eigen basis of the matrix Ko is akl =
Therefore, the assumption \bu\ < =* Kz < (a^+a,Hfc2+2)m for some m >
0.5. Using the relation < sinx < x for x [0,7t/2], we have C\k2 < Ak < c22
with ci = Af and c2 = ^r. Thus, requiring that the solution satisfy the constraint
K/ < (glffijm+i for some m > 0.5.
This assumption can be given the following spectral interpretation. The smaller
values k or / correspond to lower frequencies along vertical or horizontal direction, and
vice versa. Consequently, the assumption means that the overall frequency content
of the input data or the right hand side can be bounded by an increasing function
of  Hh = (^2 + Z2)0'5 of degree less than 1. This can be easily satisfied when the
overall frequency distribution of the input data is decreasing or uniform. This is true
in reality since, the low frequency content in signals is normally much higher than
the high frequency content.
66
We first define two matrices X' and AX' for use in the convergence analysis
of the ADI method. Let X' and AX't be the projections of the true solution X*
and the error in computed solution (in iteration t) AX* on the range of the matrix
Ko ( = A I + I A), respectively. Note that, X' and X* are almost the same
except that X' doesnt have projection at the basis qo qo In the ADI literature
[48], the convergence criterion is normally defined as follows: The ADI method for
the Lyapunov matrix equation is said to have converged when the relative Frobenius
norm is reduced to within a specified error tolerance t lying between 0 and
l,i.e AX'Â¡f < eX'jr. This means we dont need to reduce the error component
in the basis corresponding to the zero eigenvalue or in the null space of Ko. The
problem is to find the number of iterations required in the ADI method and the
associated ADI parameter set to achieve HAX'tH^ < c11X'11^.
We make use of this convergence criterion and assumption 1 to reformulate the
convergence theory of the ADI method as follows. The Frobenius norm  AX't can
be written as,
(4.18)
where the double summation takes every k and l between 0 and n 1 except k =
l = 0 (simultaneously). As a consequence of assumption 1 the ADI method for our
Lyapunov matrix equation can converge in a constant number of iterations. We state
and prove this convergence result in the following theorem.
Theorem \ Given assumption 1 and an error tolerance t between 0 and 1, there exists
a constant integer t such thatVn, AX'tH/^ < cIX,f
67
Proof: From the discussion on assumption 1, we have the following inequality for
IIX'IIf.
)!>sEB
bkl \2 ^ n 2
) > 77awm
16 klk2 + l 16
(4.19)
Applying assumption 1 to equation 4.18, we have the following inequality
IIAX',II* < EE
wt
rifrrrIfTnr)2
n
T,(k2 + l2Ym(h + A,)2 f=rh +Pj A/ + Vi
Wm ST' }_
4m2 k i (& + &ym(^ + w~q n
where q is a positive number between 0 and 2m + 1 and
= 7r4rÂ¡7n(rrf )2(rr^)!
(A* + \Â¡)q jJi Xk + Pj A/ + pj
(4.20)
Using the fact that A* = 4sin2(^^) > 4(Â£)2 for k > 0 and substituting into the
above inequality, we have
IIax'^<7^zEE
42,n4 m2^Zrl((^)2 + (i)2)2m+2_,
(2)^(Pl.P2,...,Pf)l, (4.21)
ni
Then we have,
II AX'*p. <
42g4m2 l((n)2 + (^)2)2m+2_9 n2 J V(fc?)\(0,0) R^Pl P2 P*)
(4.22)
Note that the double summation on the RHS of equation 4.22 takes every integer k
and / between 0 and n 1 except k = l = 0 simultaneously. Let the set of all the
68
pair (&, l) in this double summation be denoted by A. By using the split in the set
A, we have
EÂ£/(m4 = 4{/(o,i)+/(i,o)+/(i,i)+Â£ /(M)+E/(M)+ Â£ /(*,/)}.
k l n n k=2 1=2 (k,l)eB
(4.23)
where /(i,!) = ((r),+),m<,, and B = (fc,f)\(l,l) I M = 1,... ,n 1. By using
the fact that f(k, l) is a strictly decreasing function of k and /, we can bound the
terms with summation in equation 4.23 by the corresponding terms in an integration
form as follows.
Â£Â£/(M)i < {/(o,i) + /(i,o) + /(i,i)}i + i /'/(*,o)
72 it Tt J
/ n
+;/i' /('i')* + !Js{x2 + yly+2,dxiV <4'24)
where the integration domain S = {(rr, ?/)(rr, y) G [0,1] x [0,1]\[0, x [0, Â£)}. By
transforming the Cartesian coordinate to the polar coordinate, the term with double
integral in equation 4.24 can be bounded by the following integral,
1
j,4m+42q
rdrdO
( 4m+22q
4(2 m + 1 q)
1 n4m+22q^
4(2 m + 1 q)
2^m+lq^
<
(4.25)
69
Substituting equation 4.25 into equation 4.24 and calculating the integrals in equation
4.24, we obtain the following inequality
1
1
((J)2 + ()2)2m+2<'
< /?n
4771+22?
(4.26)
where 0 = 2 + is constant.
Using equations 4.19, 4.22 and 4.26, the convergence criterion AXfF < cIX#f
can be met with by the following derived condition
ae24 9
~1~
n2q = Mn2q,
(4.27)
where M is the constant defined as M
a'2* q Furthermore, we can bound the
function max^\(0,o)R(p\,p2, ,Pt) by a function of A* as follows:
max
V(*,/)\(0,0)
R{pi,.
,Pt)
max
1
v(fe,/)\(o,o) (At + A/)9
t
IK
Afc ~ Pj
A* + Pj
t
)J)(IK
j=i
Ei)2)
A,+pj* }
<
max
i
lnXiT?II(
n_1 K j=i
Ak Pj '2
A k 4" Pj
The maximum taken from the values of function R at the nonzero eigenvalues,can
be bounded by using the maximum of the same function in an interval containing all
the nonzero eigenvalues. Then the convergence can be reached by using the following
modified requirement
max
Aj=4 sin2(^)
1 1
ik
x j=i
Â£ Pj_\2
x + Pj
< Mn2q.
(4.28)
70
For any positive pj, the function ^rij=i(^)2 is always less than the righthand
side when x > Â£. Therefore, in equation 4.28, the interval (4sin2(^),4) can be
M 1 n2
replaced by S = (4 sin2(^), mm(4, [)). The convergence requirement in equation
Min2
4.28 can be made less stringent by using the inequality,
1 1
max TT (
xS xl M
J = 1
X Pj \2
X + Pj
t
< (max
v xes
n<
xPj
X+Pj
)2)(
max
xes x*
(4.29)
leading to the following modified requirement which when satisfied automatically
makes the inequality 4.28 true.
max TT ( )2 < (Mn2q)( r) = (Mn29)(4sin2())9 = M' (4.30)
This bound M' will approach a constant when n oo. Thus, the problem becomes
one of finding the number of iterations t and the parameters pj such that the re
quirement in 4.30 is satisfied. This requirement is of the same form as the classical
ADI minimax problem which was solved by Jordan as quoted in [87]. We can use the
result in [87] to determine the number of iterations required for convergence. The
number of iterations needed to meet this requirement is
J
k'
v
/
lnÂ£rln%
1,
v' + vV2 1
1. 1,
(4.31)
(4.32)
(4.33)
71
where, \z] denotes the smallest integer larger than 2 and v is the expansion factor of
an interval (a, b) defined as K The expansion factor u of the interval of interest (5)
is fixed when n approaches infinity, i.e.
lim v =
n 00
1
47r2M1/?
(4.34)
In the classical ADI minimax formulation, v is the spectral radius of the eigenvalues
being considered in A, which is shown be of the order of n2 [85]. Since we use
a different formulation for convergence and impose the spectral assumption on the
data, the resulting v becomes When n > 00, the ratio v approaches a
constant as given in equation 4.34 and M' also approaches a constant as mentioned
above. Therefore, we conclude that limn_0O J is a constant, i.e. there is a constant t
such that  AX'f Up < eX';r is satisfied for all n.
In [87], an optimal set of ADI parameters for the general ADI iterations was
given. We may use these parameters to solve the minimax problem in equation 4.30.
Although this set of ADI parameters is not the optimal set to meet our convergence
requirement, it provides us with a parameter set which can achieve the convergence
criterion in a constant number of iterations. Therefore, we use the result in [87] to
select a suboptimal ADI parameter set. Solving for the optimal ADI parameters in
our problem requires complicated analysis using transformation theory [87] and is
currently under investigation.
72
4.3.2 Modified BCG Method for the Capacitance Matrix Equation
In this section we present an algorithm to solve the capacitance matrix equa
tion defined earlier. The conjugategradient(CG) method is a very powerful iterative
scheme to solve the symmetric positive definite linear system. A natural generaliza
tion of the CGtype method to solve a general nonsymmetric linear system is called
the biconjugate gradient(BCG) method [46]. The linear system with the capacitance
matrix is a nonsymmetric and indefinite system. The size of the capacitance ma
trix is determined by the rank of the matrix UVT, which contains the difference
between K and K0. With the K0 proposed in section 4.2, the size of the associated
capacitance matrix is usually 0(n). The BCG method requires the computation of a
matrixvector product in each iteration. Since the capacitance matrix C is dense, this
matrixvector product takes 0(N) operations. In this section, we present a modified
biconjugate gradient method to solve this linear system by applying the wavelet trans
form technique to approximate the multiplication and reduce the computational cost
to 0{n). Since the BCG converges in O(n) iterations, the computational complexity
for the modified BCG method is 0(n2) = 0(N).
The BCG algorithm [46] for the nonsymmetric linear system C/3 = f is as
follows.
0. Choose /30, and set q0 = r0 = f C/30; Choose ?0 / 0, and set q0 = ?0;
1. Compute af = = qLaCq*i, and ak = af/af;
2. Update f3k = /3fc_x + akqk
73
3. Set rk = rjti a^Cq^x, and r* = f*i akCTqk
4. Compute p% = r^r*, and pk = p%/a%;
5. Set qk = rk + pkqki, and qk = rk + pkqki',
6. If rk 0 or rk 0, stop; else go to 1.
Like the CG, the operations and storage requirement per step in the BCG are con
stant. The convergence behavior is similar to the CG method except that the BCG
algorithm is susceptible to breakdowns and numerical instabilities. To be more spe
cific, division by 0 (or a number very close to 0) may occur in computing ak and
pk. These two different breakdowns may be avoided by using the lookahead Lanczos
algorithm or the quasiminimal residual(QMR) algorithm [19].
In the above algorithm, two matrixvector multiplications Cq*_i and Cq*_i are
required in each iteration. Since the matrix C is dense, the direct multiplication will
take O(N) operations, which is not acceptable to our pursuit of an O(N) algorithm
for the Poisson equation. Thus, we propose to carry out the multiplication in a
wavelet basis to reduce the computational cost [1, 7]. With appropriate wavelet basis
[7], the decomposed matrix can be approximated to high accuracy by a sparse matrix.
After the approximation, the multiplication of this sparse matrix and the decomposed
vector can be reduced to 0(n) or 0(nlog(n)) operations [7]. Consequently, the
modified BCG method is computationally efficient.
The wavelet transform of a matrix D and vector q can be written as follows:
D = <Â£D4>t, q = <Â£q.
(4.35)
74
where # is the wavelet transform matrix with each row corresponding to a wavelet
basis vector. If we use the orthogonal wavelet transform, then Dq = ^r(Dq).
Therefore, we compute matrixvector multiplication in the wavelet domain and then
transform back to the original domain by using the wavelet reconstruction (inverse
wavelet transform) to obtain the product Dq.
It is very crucial to know the structure of the capacitance matrix to achieve an
accurate and efficient approximation of the matrix. In equation 3.4, the capacitance
matrix is composed of three matrices, i.e., the identity matrix I, ^2^=0
and VTrj. The first two matrices are extremely sparse, but the third is a dense
matrix. When computing the matrixvector product of the capacitance matrix and
a vector, we directly compute the separate products for the first two sparse matrices
while the aforementioned wavelet approximation scheme is used only for the matrix
VTrj.
m1 qfu
p
4.4 Experimental Results
We have successfully applied our generalized capacitance matrix algorithm to
a host of early vision problems which use the membrane and/or thinplate spline
to constrain the solution [85, 45]. These early vision problems include the surface
reconstruction, shape from shading, optical flow computation, lightness problem,
etc. [85, 45]. These problems when formulated as variational principles and dis
cretized, lead to solving one or more large sparse linear systems. In this section,
we present the results of applying our algorithm to the surface reconstruction and
shape from shading problems.
75
4.4.1 Surface Reconstruction
We implemented our generalized capacitance matrix algorithm on the sparse
data surface reconstruction with membrane and thinplate splines. We synthesized a
sparse range/elevation data set on a 64 x 64 grid. The input data set is very sparse
and contains 15 data points randomly scattered in the plane as shown in figure 4.3(a).
We apply our algorithm to this data to obtain a surface approximation.
In the first experiment, we reconstruct the surface to approximate this sparse
data set by using membrane and thinplate smoothness constraints. Figure 4.3(b)
shows the reconstructed surface using membrane smoothness after 10 ADI iterations.
We observe that only 10 ADI iterations are needed to attain an error tolerance of
10~5. The reconstructed surface using thinplate smoothness after 16 ADI iterations
is shown in figure 4.3(c). We can see the reconstructed surface using thinplate
smoothness is much smoother than that using membrane smoothness.
The second experiment is the surface reconstruction on the same data set with
prespecified discontinuities along a horizontal line, between the coordinates (1,32)
and (30,32). as shown in figure 4.4(a). We depict the computed surface with mem
brane smoothness obtained using our generalized capacitance matrix algorithm with
1 and 10 ADI iterations in figure 4.4(b) & (c) respectively. Figure 4.4(d) shows the
result of the same experiment with thinplate smoothness after 16 ADI iterations
in our generalized capacitance matrix algorithm. A similar experiment with depth
discontinuities along the diagonal line segment extending from the coordinate (1,63)
76
Figure 4.3. Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed surface using thin
plate smoothness, obtained using the generalized capacitance matrix algorithm.
77
to (63,1) was performed. The data along with the recovered surface using membrane
smoothness are shown in figure 4.5(a) & (b) respectively.
(c)
(d)
Figure 4.4. Surface reconstruction with prespecified discontinuities along a horizon
tal line; (a) original sparse data set with depth discontinuities along the indicated
horizontal line; reconstructed surfaces using a membrane spline after (b) 1 iteration
and (c) 10 iterations of ADI method; (d) reconstructed surface using a thinplate
spline after 16 iterations of ADI method.
4.4.2 Shape from Shading
In this problem, we tested our algorithm on a synthetically generated image of
a Lambertian sphere illuminated by a distant point source and viewed from the same
location as the point source. Figure 4.6(a) depicts the 64 x 64 original shaded image
78
(a) (b)
Figure 4.5. Surface reconstruction with prespecified discontinuities along a diagonal
line; (a) the original sparse data set with depth discontinuities along the indicated
diagonal line, (b) reconstructed surface using a membrane spline after 10 iterations
of ADI method.
with an 8 bit resolution in gray per pixel. The sphere is assumed to have Lambertian
reflectance properties with a constant albedo. Both depth and orientation data are
specified on an occluding contour depicted in figure 4.6(b). Note that we have to
solve three systems of equations namely Kxi = bi, Kx2 = b2 and KjX3 = b3 (see
section 2.3). The first two systems are for recovering the surface normals and the last
equation for recovering the depth from the surface orientation map. We start the ADI
iteration for the linear systems with the initial values of p = q = z = 0. The recovered
surface orientation is shown in figure 4.6(c) and the final surface shape obtained using
the orientation information is depicted in figure 4.6(d). For each computed (p,q) on
the RHS of equation 2.18, the ADI error tolerance was set to 105. The same tolerance
was used for the ADI iterations in the depth from orientation problem.
79
Although, we have not compared the computational time of existing solution
methods for the discretized Poisson equation with the computational performance
of our algorithm, we believe that our method will outperform most other existing
methods. Future research efforts will focus on empirically demonstrating this claim.
80
(a)
(b)
\\
V'\N
\\\\\
\\\\\\\
\\WX\W
\XXWX\W
XXXXNSXXW
XNXXXXXXW
NNWNWWW
XSNXWWWW
XXXXXXXXXXX \
///
////
It////
///////
////////
/t////
/////////
//////////
////^/////
WWWNW
\\\\\\\\
xxxxxxx
wxxxxxxx
(c)
(d)
Figure 4.6. Shape from shading
occluding boundary imbedded in
reconstructed surface shape.
example, (a) Shaded sph
a rectangle (c) recovered
ere image (b) irregular
surface orientation, (d)
CHAPTER 5
ADAPTIVE PRECONDITIONING TECHNIQUE
5.1 Introduction
The generalized capacitance matrix algorithm presented in chapter 4 is very
efficient to solve the linear systems arising from the early vision problems. However,
this technique can prove to be inefficient for dense data problems with nonuniform
weighting on the data constraints, such as the optical flow problem, since the size
of the associated ( dense ) capacitance matrix is too large. Although this problem
may be circumvented by incorporating the capacitance matrix technique as part of
an iterative scheme [72], however it was pointed out in [12] that this semidirect
numerical scheme only converges when the regularization parameter A is very large.
In this chapter, we present a novel adaptive preconditioning technique which when
used in conjunction with a conjugate gradient algorithm can solve the linear systems
very efficient. This adaptive preconditioning technique can be universally applied to
solve any early vision problem.
Numerical iterative methods have been popular for solving the linear systems in
early vision problems. Several multiresolution preconditioning techniques [75, 61, 92]
have been proposed to accelerate the convergence speed of the iterative methods in
81
82
computer vision literature. In this chapter, we will introduce a physicallybased adap
tive preconditioning technique which will outperform in computational efficiency
 all previously proposed preconditioning methods in vision literature namely, the
hierarchical basis preconditioned conjugate gradient of Szeliski [75] and the change
of basis methods of Yaou et al. [92] and Pentland [61]. In addition, it does not have
any restriction on the choice of the regularization parameter A for its convergence.
In [75, 76], Szeliski presented a preconditioning technique based on the use of
hierarchical basis functions. This technique is based on the multilevel splitting of the
finite element spaces presented in [93]. The hierarchical basis functions naturally lead
to a pyramidal representation of the unknown function being solved for. Empirical
convergence results were shown for preconditioned conjugate gradient in comparison
with the standard conjugate gradient method. A key piece of information that was
overlooked in Szeliskis work [75, 76] was that, in designing a preconditioner for the
conjugate gradient algorithm, no information about the problem being solved was
used i.e., once a basis set was chosen, the same preconditioner was used regardless
of the imposed smoothness or data constraints. Thus, the true power of a precondi
tioning transform was not exploited to its fullest.
More recently, Pentland [61] presented a technique for surface interpolation us
ing a change of basis to orthonormal wavelets as a preconditioning transform. He
uses the results of Beylkin et al., [7] which states that the application of (TV x TV)
matrices corresponding to any pseudodifferential and CalderonZygmund opera
tors to arbitrary vectors requires either O(TV) or O(TVlogTV) operations based on
83
whether nonstandard or standard wavelet transform is used. The result of Beylkin
et al., primarily shows a reduction in the bandwidth of the matrices corresponding
to pseudodifferential operators in a wavelet basis and gives a technique whereby an
0(N) coefficients can be used to approximate the (N x N) operator, for a given
tolerance. This compression takes O(N) time if the structure of the singularities of
the matrix are known a priori.
There are three issues to be noted with regards to Pentlands reported results.
Firstly, after a change of basis to orthonormal wavelets, the offdiagonal terms were
discarded in [61], under the claim that the transformed stiffness matrix was diagonally
dominant. This diagonal dominance behavior was depicted via the profiling of a row
of the transformed stiffness matrix at a high resolution. Upon close examination,
we found that the diagonal dominance property does not hold in general at lower
resolutions (see figure 5.10) and thus, discarding the offdiagonal terms is unjustified
and leads to a solution which may be far from the true solution. Secondly, Pentlands
preconditioner requires the computation of the diagonal entries in a wavelet trans
formed stiffness matrix. Although the stiffness matrix is banded, the computation of
these diagonal entries takes 0(N \og N) operations, which is too expensive for large
N. Thirdly, it is not known if the matrix (K + S) in his linear system (K + S)U = D
being solved for the interpolation problem would satisfy the conditions set forth in
Beylkin et al., [7] in order for their bandwidth reduction scheme to be applicable in
this case. Further, we found that his algorithm fails to converge in reasonable number
84
of iterations to the correct solution for very sparse data that is used in the examples
of this thesis.
Yaou and Chang [92] report a fast surface interpolation algorithm using the
multiresolution wavelet transform. This work is almost exactly similar to that of
Szeliski with the exception that Yaou and Chang use a wavelet basis instead of the
hierarchical basis and a Jacobi iteration [26] instead of the conjugate gradients used in
Szeliski [75]. As in Szeliski, the full potential of a preconditioning technique was not
exploited in Yaou and Chang i.e., the design of the preconditioner did not make use of
any information about the problem being solved. The stiffness matrix in the surface
interpolation problem was first transformed to a biorthonormal wavelet basis and
then the transformed linear system was solved using a Jacobi iteration. Note that,
in a preconditioning technique applied to the the positive definite stiffness matrix K,
one tries to approximate K as well as possible with another positive definite matrix P
known as the preconditioner, such that P1K is a good approximation to the Identity
matrix. Such an approximation P was never constructed in Yaou and Chang [92].
Further, upon implementing their algorithm, it was found that their published results
about rate of convergence were only valid for data from periodic function since their
implementation took advantage of periodic wavelet transforms.
In this chapter, we will present a very effective way to construct a physically
based adaptive preconditioner in a wavelet basis for several early vision problems
namely, the surface reconstruction (SR), shape from shading (SFS) and computation
of optical flow (OF). This physicallybased preconditioner is adapted to the membrane
85
spline or the thin plate spline or a convex combination of the two. Experimental
results depicting the performance of our algorithm in comparison to existing methods
discussed above are presented for synthetic and real data.
The rest of this chapter is organized as follows. In section 5.2, an adaptive
preconditioner constructed in a wavelet basis is presented, followed by the adaptive
preconditioned conjugate gradient algorithm in section 5.3. Algorithm implementa
tion on synthetic and real data are presented in section 5.4. In section 5.5, we end
with a discussion.
5.2 Adaptive Preconditioning in a Wavelet Basis
Iterative methods are preferable to direct methods for solving large and sparse
linear systems in early vision problems, since direct methods require the formation
of intermediate matrices which are not necessarily sparse. Although the sparsity
can be preserved when using direct methods on ID problems with the smoothness
constraint, it will be destroyed in the case of 2D problems. This makes the number
of operations and the amount of storage too large for 2D and higher dimensional
problems.
The efficiency of applying an iterative method to solve a linear system depends
on its rate of convergence, which is a function of the condition number of the coef
ficient matrix in a linear system. Unfortunately, many early vision problems after
regularization lead to large illconditioned linear systems. Therefore, most itera
tive methods converge very slowly to the true solution and are thus computationally
86
impractical. One way to speed up the convergence of an iterative method is via pre
conditioning. In this section, we present a new adaptive preconditioning technique
wherein the preconditioner is constructed in a wavelet basis. Since the proposed
preconditioner is designed to adapt to the spectral characteristics of the smoothness
constraint in the regularization solution (in the frequency domain) of different early
vision problems, it is bound to perform better than the existing fixed precondition
ers [75, 42, 92].
5.2.1 Preconditioning Technique
For a linear system Kx = b with positivedefinite matrix K, we assume there
exists a positivedefinite matrix P, the preconditioner for the matrix K, such that the
condition number of P1K is greatly reduced. Since P is a positivedefinite matrix,
it can be shown that there exists a positivedefinite matrix C such that C2 = P [26].
Now, we can define the vectors x and b via the transformation matrix C as follows
x = Cx, b = C1b. (5.1)
Thus, the linear system Kx = b can be solved by solving the transformed linear
system
Kx = b, (5.2)
where K = C1KC1. Note that the matrix K in the transformed linear system
is similar to the matrix P_1K, i.e., the condition numbers of K and P1K are
equivalent. The condition number of the transformed matrix is much smaller than
87
that of the original matrix according to the aforementioned criterion of choosing a
preconditioner P. Consequently, the convergence of the iterative methods can be
accelerated by a transformation of the original linear system to a better conditioned
linear system.
Although the conditioning of a linear system can be improved after an appro
priate transformation, it is not practical to form the matrix K to solve equation
5.2 since, the transformation of a matrix requires a large number of operations and
the sparsity structure of the matrix K could be destroyed after the transformation.
Fortunately, it is not necessary to compute this transformation of the matrix K to
achieve the preconditioning effect in a conjugate gradient algorithm. Instead, the
same preconditioning can be achieved via solving an auxiliary linear system Pz = r,
where r is the residual vector, in each conjugate gradient iteration. (The precondi
tioned conjugate gradient algorithm is given in section 5.3.) Therefore, the second
consideration for a good preconditioner is to choose the matrix P such that the linear
system Pz = r can be easily solved.
The above two criteria for designing a good preconditioner in conjugate gradient
are equally important. Reducing the condition number can accelerate the convergence
of the iterative methods employed. Choosing the preconditioner P such that Pz = r
is easily solvable is very crucial for efficiently updating the solution in each iteration
of the preconditioned conjugate gradient (PCG) algorithm. Our preconditioner is de
signed in a wavelet basis by approximating the spectral characteristics of the matrix
88
K. This approximation is achieved via the frequency domain analysis of the regu
larization operators and using the frequency band partitioning behavior of a wavelet
basis. Therefore, our adaptive preconditioner is akin to a spectrally equivalent op
erator for the original matrix K. Furthermore, the solution of a linear system using
our adaptive preconditioner can be obtained by taking the wavelet transform, scaling
the coefficients, and taking the inverse wavelet transform. By using a very compactly
supported wavelet, the solution update can be achieved very efficiently.
5.2.2 Regularization Filter
The filtering interpretation of regularization operators has been discussed in
vision literature [80, 83]. The inverse of the controlcontinuity stabilizer (Laplace +
Biharmonic operators) may be interpreted as a lowpass filter hence, the solution to
an early vision problem in a regularization framework can be obtained by applying
this lowpass filter to the data. We will use this filtering behavior of the regularization
operator to develop a preconditioner which approximates this filter in a wavelet basis.
In section 2.3, we have seen the variational formulations of various early vision
problems that lead to minimizing an energy functional. This minimization can be
achieved by solving the associated EulerLagrange equation which for the surface
reconstruction problem in the continuous data case (with a controlledcontinuity
stabilizer) can be written as
/>{x){(l t(x))A2u(x) t(x)Av(x)} + c(x)t;(x) = c(x)d(x),
(5.3)
89
where A is the Laplace operator, A2 is the biharmonic operator, c(x) is the weight
ing function associated with the data d(x), the functions p(x) and r(x) control the
rigidity and tension of the solution surface respectively. Now lets consider the case
when p(x) and r(x) are constant functions, i.e. p(x) = p and r(x) = r. In addition,
lets also assume the function c(x) to be a constant function, which means the Gaus
sian noise associated with each data measurement has the same variance. Taking the
Fourier transform of equation 5.3 we get
v() =
,/>(! r) u>  + pr\\ w  + c,
V(U).
(5.4)
Where u> = (uq, ,u>d) is the ddimensional frequency domain variable while, V(u>)
and V(cj) are the Fourier transforms of n(x) and d(x) respectively. Equation 5.4 can
be given a filtering interpretation namely, V(u>) is obtained by lowpass filtering the
data This lowpass filter is characterized by the first term on the right hand
side in equation 5.4 called the transfer function. Our preconditioner is constructed
in a wavelet basis to approximate this transfer function.
Equation 5.4 expresses a linear shiftinvariant lowpass filtering of the data in
frequency domain, derived from a regularization framework. This filtering interpre
tation can be generalized to the regularly sampled discrete data case in a straight
forward manner. However, this linear shiftinvariant filter will become a nonlinear
and spatially varying filter in the general cases when the functions p(x), r(x) and
c(x) are not constant. These cases include discontinuitypreserving regularization
and sparsedata interpolation. Although the corresponding nonlinear and spatially
90
varying filter is difficult to characterize fully in the frequency domain, it is in general
a lowpass filter and we can approximate its spectral characteristics in the frequency
domain. In the next section we show how to achieve this approximation in a wavelet
basis.
5.2.3 Wavelet Basis
Wavelet transform has been used widely in the signal and image processing
communities since it provides a powerful tool for multiresolution analysis. The con
struction of a wavelet basis is achieved by defining a scaling function (f>(x), such that
the integer translates of the dilated scaling function V2~]
mal basis of L2(3?), where j G Z. This orthonormal basis (= {y/2~i
forms a subspace Vj C L2(3Â£), and the subspaces Vj satisfy the containment hierar
chy
Vj C Vj_i,Vj Z. (5.5)
Now we can introduce the difference space Wj between the two consecutive sub
spaces Vj and Vj_i such that Vj Wj = Vj_x, where denotes the direct sum
operator. The key idea of the wavelet basis is the difference space Wj is spanned by
an orthonormal basis which are obtained via integer translations of the associated
wavelet function ip(x), i.e. Wj = {\/2~^ip(2~^x k)}kez Therefore, a multiresolu
tion orthogonal decomposition of the space V0 can be achieved via
Vj WJ WJ_! . Wj = Vo,
(5.6)
91
where J is a positive integer. From this multiresolution orthogonal space decompo
sition, we can construct a multiresolution orthonormal wavelet basis, since each of
the subspaces on the left hand side of equation 5.6 are orthogonal to each other and
there exists an orthonormal basis for each subspace as discussed above.
The wavelet decomposition and reconstruction can be efficiently achieved via a
QMF (Quadrature Mirror Filters) implementation as shown in figure 5.1 (see [50]).
In this figure, A^f is the discrete approximation of the function f in the subspace
Vj, i.e. Ajf contains the values of the inner products of the function f and the
orthonormal basis functions in Vj, and D^f encodes the projection of the function f
on the subspace Wj. The filters H (in decomposition) and H (in reconstruction) are
lowpass filters, while the filters G ( in decomposition) and G ( in reconstruction)
are highpass filters. They are usually designed to be the FIR filters, which can
be implemented via convolution operations. With an appropriate choice of a wavelet
basis, we can find the associated FIR filters with very short length/span and the QMF
filter implementation for the wavelet transform can be accomplished very efficiently.
The QMF structure decomposes the space from one level to the next coarser level
with each stage involving a lowpass and highpass filtering. From a frequency domain
analysis, we can see that a wavelet decomposition divides the frequency domain into
two separate bands in each stage. Therefore, each subspace Wj has its own frequency
band, which means that all the wavelet basis functions in any particular level are fully
contained in the frequency band associated with that level. This behavior of dividing
the frequency domain [68, 86] exhibited by a wavelet decomposition is illustrated
92
(a)
A,.,f f 2
Dj4lf] f 2  G
H Gy 2 ~A,f
 2 : Put one zero between each sample
Convolve with filter X
Q
*2  : multiplication by 2
(b)
Figure 5.1. The implementation of the wavelet transform via the QMF structure
from one level to the next, (a) decomposition (b) reconstruction, adopted from Mal
lat [1989].
in figure 5.2. Notice that the closer the QMF filters are to the ideal lowpass and
highpass filters, the better it is for achieving a good frequencydomain division, i.e.,
the overlap between the frequency bands associated with each level is minimized with
close approximations of the ideal low and highpass filters.
In [50], Mallat constructed the 2D wavelet basis by using the tensor products
of the ID wavelet and scaling functions. The QMF implementation of the wavelet
93
transform was also extended to the twodimensional case in [50]. The division of
frequency domain behavior for the 2D wavelet basis is simply an extension of the ID
case.
After introducing the division of frequency domain behavior for a wavelet basis,
we are ready to construct our preconditioner in a wavelet basis and adapt its spectrum
to that of the regularization filter discussed in section 5.2.2.
5.2.4 Adaptive Preconditioning via the Modulation in a Wavelet Basis
In section 5.2.1, we discussed two criteria for selecting a good preconditioner.
In this section, we construct an adaptive preconditioner for early vision problems by
following these two criteria.
Using techniques of frequency domain analysis, a lowpass filtering interpretation
can be given to the regularization formulation of early vision problems as was shown in
section 5.2.2. The spectral characteristics of the regularization filter in the frequency
domain can be approximated using the frequency domain subdivision behavior of
the wavelet transform which can be implemented very efficiently using a quadrature
mirror based implementation.
To apply a preconditioner P to a matrix K, the first criterion is to drastically
reduce the condition number of P1K; the second criterion for a good preconditioner
is that the auxiliary linear system Pz = r should be efficiently solvable. An obvious
choice to reduce the condition number of P1K is to choose P = K, which reduces
the condition number to 1. Although only one iteration suffices to solve the problem,
the complexity of solving the auxiliary linear system Pz = r is the same as that of
94
solving the original linear system. Therefore, nothing is gained by using P = K. A
reasonable choice for a good preconditioner is to find a matrix P that is spectrally
close to K; in addition, the solution to the auxiliary linear system Pz = r should be
easily solvable. Our preconditioner is constructed based on this very idea.
The matrix K is the stiffness matrix obtained from a discretization of the regu
larization formulation. Therefore, its inverse has the same spectral characteristics as
that of the regularization filter given in equation 5.4. Now we approximate the spec
tral characteristics of the regularization filter in a wavelet basis by using its frequency
domain partitioning property. This approximation is accomplished by modulating
each wavelet basis function by the spectral value of the regularization filter taken at
the center of the corresponding frequency domain subregion. To be more specific,
when Mallats tensor product strategy [50]is used for the 2D wavelet transform, each
level (resolution) contains three sets of wavelet basis functions, i.e. LH ( tp), HL
and HH (xpip), and each set corresponds to its own subregion in the 2D fre
quency domain. The modulation for the wavelet basis functions in each set is based
on the spectral function of the regularization filter in its corresponding subregion.
For ease of implementation, we can just take the spectral function value at the center
of the subregion to be the modulation factor.
The inverse of the stiffness matrix is basically a lowpass filter, whose spectral
function changes rapidly in the lower frequency region and varies smoothly in the
higher frequency region. An approximation to the spectral function of the lowpass
filter achieved in a wavelet basis is very fine in the lower frequency regions and coarse
95
in the higher frequency regions. This is because the division of the frequency domain
becomes finer in the lowerfrequency band with increasing levels (see figure 5.2).
Therefore, we can achieve a better approximation when more levels are used. In our
preconditioner, the number of levels used is determined by the support of the wavelet
basis used and the problem size. Therefore, for a given problem size and wavelet
basis, the number of levels is fixed.
Our preconditioner P can be written as
P = WMWr, (5.7)
where the matrix W is a 2D orthonormal wavelet transformation matrix and the
matrix M is the diagonal modulation matrix, whose diagonal values are obtained
from the spectral function of the regularization filter. From equation 5.7, we can
see that the solution to the auxiliary linear system Pz = r in the preconditioned
conjugate gradient can be computed very efficiently when the wavelet basis are chosen
appropriately, i.e. the QMF filters are of short length/span. As long as the QMF
filters are of finite length, the solution can be obtained in 0(N) operations. The
solution to this auxiliary linear system is obtained in the three steps namely, (1) take
the wavelet transform of the vector r, (2) modulate the transformed vector using the
matrix M, and (3) take the inverse wavelet transform.
96
5.3 Adaptive Preconditioned Conjugate Gradient Algorithm
As discussed in section 2.3, the discretization of the variational formulation leads
to solving the system of equations Kx = b. The stiffness matrix K is symmetric
positive definite and therefore, the conjugate gradient algorithm can be used to solve
the problem guaranteeing a unique solution. When a preconditioner P is used to
accelerate the convergence, we have what is known as the preconditioned conjugate
gradient algorithm (see [26]) given as follows.
0. Choose an initial Xo, and compute ro = b Kx0; set k = 0.
1. Solve Pzk = r*; k = k + 1.
2. If k = 1, set pi = z0; else compute /3j? = rk_2zk^2 and /?* =
update pk = 7.ki + (3kpki
3. Compute af = rJLjZfci, ak = p^Kp*, and ak = ak /ak'i
4. Update rk = rki akpk.
5. Update xk = xki + akpk.
6. If rk ~ 0, stop; else go to step 1.
The only difference between the preconditioned conjugate gradient algorithm
and the standard conjugate gradient algorithm is the preconditioning process in step
1. Since the preconditioning involves solving the auxiliary linear system Pzk = rk in
each iteration, it is absolutely necessary to have a fast solution to this preconditioning
equation when designing a good preconditioner.
97
The strategy for constructing our adaptive preconditioner for the early vision
problems with regularization formulation was discussed in section 5.2. Our precon
ditioner is based on the idea of modulation in a wavelet basis to approximate the
spectral characteristics of the matrix K. Accordingly, our adaptive wavelet precondi
tioner P took the form WMWr with the diagonal modulation matrix M encoding
the spectral approximation in the wavelet basis. Now, the primary problem in this
preconditioner construction is how to construct the modulation matrix M for general
problems. In the following paragraphs, we show how to achieve the construction of
an adaptive preconditioner for the surface reconstruction, shape from shading, and
optical flow computation problems.
5.3.1 Surface Reconstruction
For the surface reconstruction, the stiffness matrix K(= Kj + Ks) contains the
matrix obtained from the data constraint and the matrix K4 corresponding to the
smoothness constraint. When the controlledcontinuity stabilizer is used, the matrix
Ks is the linear combination of the discrete Laplacian matrix ( from membrane) and
the discrete biharmonic matrix ( from thin plate). Since the spectral distributions for
both matrices are known a priori, the spectral distribution for the matrix Ks can be
obtained from their linear combination. Now, what remains to be determined is the
spectral distribution for the matrix K<. For the dense data case with uniform weights
for each data constraint, the matrix Kj is a scaled identity matrix, whose spectral
distribution is a constant function. This case corresponds to the discrete version
of the linear shiftinvariant lowpass filtering interpretation for the regularization
98
problem. For other cases such as sparse data surface reconstruction or dense data
with nonuniform weights, we somehow need to estimate the spectral distribution of
Kd in the frequency domain. We will elaborate further on this spectral estimation in
the following paragraphs.
Since the spectral approximation of the matrix K is achieved in the wavelet
basis, we can approximate the spectral distribution of the matrix via the wavelet
transform. One way is to do the wavelet transform of the matrix K< and then compute
the energy contained in each diagonal block which corresponds to a particular division
of the frequency band, and the computed energy is used to represent the spectral value
for this frequency band. To use this spectral estimation, we need to develop a fast
computational algorithm since the wavelet transform of an N x N matrix requires
0(N2) operations. Another way to estimate the spectral distribution of K is to
take the wavelet transform of its diagonal entries only. The N diagonal entries of the
N x N diagonal matrix K< are grouped into an n x n matrix. Then we compute the
energy from the wavelet transform of this matrix and associate it with the spectral
value of each frequency band. This method takes advantage of the diagonal structure
of the matrix and can be accomplished in O(N) operations. We use this method
to estimate the spectrum of the matrix K< in our experiment.
Using the above discussed spectral approximation to the matrices Ks and K
separately in a wavelet basis, we can obtain the diagonal approximation matrices Ms
and Mj, respectively. Then, the modulation matrix M is taken to be the sum of
Ms and Mj. Since the matrix is obtained from the spectral approximation and
99
the matrix Ms has very small diagonal entries at the very coarse resolution levels,
the spectral estimation error in Md at these levels may cause significant error in the
spectral approximation to the matrix K. This problem may exist only in the case of
sparse data surface reconstruction, since the spectral estimation for Kj in the dense
data case is quite accurate.
To amend the problem caused by the spectral estimation error, we introduce a
weighting function a(i), where i is the number of iterations, to appropriately weight
the importance of Ms and during the CG iterations, i.e.,
M(t) = Ms +
The matrix M() is the modulation matrix in the ith iteration. From the filtering
correspondence with the regularization formulation, we can see that the overall spec
tral characteristics for the matrix K1 is a lowpass filter. Note that the inverse of
Ks is also a lowpass filter, while the inverse of is not. Therefore, the matrix
M, ( or K, ) is more dominant than the matrix M< ( or K< ) for solving the linear
system Kx = b. Based on this observation, we set the weighting function a(i) to be
very small initially and then gradually increase the values of the weighting function.
Using this schedule for the weighting function, the lower frequency components of
the solution are recovered at the beginning of the preconditioned conjugate gradient
iterations, and then the higher frequency components. This globaltolocal shape re
covery phenomenon can be predicted by the choice of the weighting function cr(i). In
fact, the weighting function can be adapted during the iterations instead of following
100
a preset schedule. The adaptation concept is similar to the adaptive step scheme in
the scale space tracking [89].
The preconditioner construction for the shapefromshading and the optical flow
computation problems is very similar to the one outlined here for the surface recon
struction problem. We only state their differences in the following paragraphs.
5.3.2 Shape from Shading
The variational formulation of the SFS problem leads to solving a set of nonlin
ear partial differential equations. After discretization, we need to solve the system
Kx = b. If the linear approximation of the reflectance function technique [38] is used
to improve the convergence, the stiffness matrix K sR3Nx3N has the following form.
pKs
pi
pD (g) I
pi
AKS + (Rp + p) 1
RpRql
pDT (g> I
RpRql
AK, + (Rq + p)I
(5.9)
where K, $NxN is the Laplacian matrix, I $lNxN is the identity matrix, Rv and
Rq are the firstorder partial derivatives of the reflectance function with respect to p
and q, respectively, taken at the reference gradient (po, <7o), and the matrix D 9?nXn
(N = n2) has nonzero entries only along the main diagonal and upper diagonal with
101
Is along the main diagonal and Is along the upper diagonal, i.e.,
1 1 0 0
0 1 1 0 0
0 1 1
0 0 0
(5.10)
Our preconditioner P has similar block structure as K and can be expressed as
w
0
0
Ml
0
0
WT
0
0
p =
0
w
0
0
m2
0
0
WT
0
0
0
w
0
0
m3
0
0
WT
(5.11)
The modulation matrices Mi, M2 and M3 are all diagonal and they are used to mod
ulate the wavelet basis functions corresponding to z, p, and q vectors, respectively.
The modulation matrices are constructed to approximate the spectral distributions
of the corresponding diagonal blocks in the matrix K, i.e. fiKa, AKS + (Rp + /z)I and
AKS + (Rq T /i)I. The construction procedure for each block is the same as discussed
in the surface reconstruction problem.
102
5.3.3 Optical Flow Computation
Similar to the SFS problem, the stiffness matrix K for the optical flow estimation
has the following 2x2 block structure.
K =
Ks + Ej
E.
E
Xy Kg + Eyy
(5.12)
_ . K2
where Exx, Exy, and Ey are all diagonal matrices with the normalized entries _ JÂ¡Â¡,
EX+Ey
F F E2
and =5=^=j. The construction of the block preconditioner is similar to that in
Ex+Ey Ex+E y
the SFS problem, therefore it is omitted here.
5.4 Experimental Results
In this section, we present implementation results of applying our adaptive pre
conditioner to the surface reconstruction, shapefromshading, and optical flow com
putation problems. Since our preconditioner adapts to the problem, it is bound to
perform better than the existing fixed preconditioners described in literature. We
will justify this claim through the experimental results.
Note that there is no restriction on the wavelet basis being used in the construc
tion of our preconditioner. However, the efficiency of our preconditioner is related
to the localization of the wavelet being used. The time localization makes the com
putation of the wavelet transform efficient; the localization in the frequency domain
yields a more accurate spectral approximation and therefore faster convergence of the
preconditioned conjugate gradient can be achieved. In our experiment, we use the
cubic Bspline wavelet given in [50] with the QMF filters truncated to a length of 11.
103
This truncation is used to reduce the operations involved in QMF implementation
of the wavelet transform and an insignificant error is introduced in the frequency
domain.
In constructing our preconditioner, the weighting function a(i) is needed to
control the contributions of Ka and K< to the modulation matrix. We suggest that
the weighting function be chosen as an increasing function. We used an increasing
function of the form (1e~ e )d, where c and d are constants, for the weighting function.
In our experiment, c was usually set to 10 (for problems not severely illconditioned)
or 100 (for severely illconditioned problems), and accordingly d was set to be 1 or 2.
5.4.1 Surface Reconstruction
We present results of several experiments with sparse data, and compare the con
vergence rate of our adaptive preconditioned conjugate gradient (APCG) algorithm
with the hierarchical basis conjugate gradient (HCG) technique [75], the conjugate
gradient (CG) algorithm [26], and the biorthogonal wavelet basis transfer scheme
with Jaffards diagonal preconditioning [42]. In [92], the biorthogonal wavelet basis
transfer is used in the preconditioning process. We found that this wavelet basis
transfer must be combined with appropriate scaling to improve the convergence,
which was not mentioned in [92]. In [42], Jaffard proposed the 2J, where j is the
level, diagonal scaling in the wavelet basis as a preconditioning technique. Jaffards
2J diagonal scaling can be changed to 2mj scaling where m is a positive integer. We
combined this variant of Jaffards diagonal scaling with the biorthogonal wavelet ba
sis transfer and implemented it for conducting the convergence comparisons. We use
104
Figure 5.3. The sparse data sets used for the surface reconstruction, (a) synthetic
data (b) synthetic symmetric data (c) real laser range data.
105
all of the aforementioned preconditioners in conjunction with a conjugate gradient
algorithm and test their performance on one synthesized sparse data set (see fig
ure 5.3) in the surface reconstruction experiments and one real range data set from a
laser scanned ball. Subsequent experiments depict the convergence comparison of the
aforementioned preconditioners applied to the SFS, and OF computation problems.
In the first experiment, we recover the underlying surface from the sparse data
set in figure 5.3(a) using the controlledcontinuity stabilizer with the parameters p
and r being 1 and 0.1, respectively. The size of the problem is 64 x 64. Applying
our adaptive preconditioner to this problem, we can see that the iterations converge
to the true solution very fast. The computed solutions after 1, 5, 10 and 20 steps
of conjugate gradient, HCG, and our APCG algorithms are illustrated in figure 5.4.
With only five iterations of our APCG algorithm, the global shape of the surface
is recovered. After ten iterations of our algorithm, the solution is very close to the
true solution. Comparing these intermediate solutions in figure 5.4, we can see the
solution after 20 iterations of HCG is still bumpy and the computed solution after
20 CG iterations is far from the true solution. However, this comparison is not
very fair since the computational cost for each iteration of the APCG, HCG and
CG algorithms is different. Consequently, we include the execution time into the
comparison as follows. On a multiuser SUN SPARC10 workstation, 10 iterations
of our APCG algorithm on this experiment took 2.14 seconds. For the same time,
the HCG algorithm ran 25 iterations and the CG algorithm ran 30 iterations. Figure
5.5 shows these computed solutions for the three different algorithms using the same
106
time. We can see that our APCG algorithm gives the best result for this comparison
with the same execution time. In figure 5.6(a), we depict the convergence rate of
the APCG algorithm in comparison to all the aforementioned iterative algorithms.
Figure 5.6(b) depicts the experiments with a thin plate stabilizer. The convergence
curves for our APCG algorithm, HCG, CG, and Yaou & Changs biorthogonal basis
transfer with Jaffards diagonal preconditioning are shown. As can be seen, our
APCG algorithm has the best convergence performance in both experiments.
We include the discontinuities in the above controlledcontinuity stabilizer to
test the performance of our APCG algorithm. The discontinuities are located along a
prespecified diagonal line. Figure 5.7 illustrates the convergence rates of our APCG,
HCG( with 4 levels) and CG algorithms. The APCG algorithm still has the best
convergence performance in this experiment. Since the inclusion of discontinuities
makes the conditioning of the stiffness matrix worse and our preconditioner was not
redesigned to account for the discontinuities in the stabilizer, the convergence rate in
this experiment is slower than that of APCG without discontinuities. The computed
solutions after 10, 20, 40 and 60 iterations of our APCG algorithm are depicted in
figure 5.8.
We attempted to incorporate the preconditioner developed in Pentlands [61]
into the comparison however, we found that the convergence of conjugate gradient
with this preconditioner is even slower than that of the CG algorithm (without pre
conditioning) for the very sparse data case. Note that the preconditioner was used
in a gradient decent iteration in [61]. In our experiments, the preconditioners are
107
APCG
HCG
CG
(a)
Figure 5.4. (a), (d), (g) & (j) are the computed solutions using the APCG algorithm,
(b), (e), (h) & (k) are the computed solutions using HCG algorithm, (c), (f), (i) &
(1) are the computed solutions using conjugate gradient algorithm, after 1, 5, 10 and
20 iterations, respectively.
108
(a)
(b)
(c)
Figure 5.5. The computed solutions after (a) 10 iterations of APCG (b) 25 iterations
of HCG (c) 30 iterations of CG using the same time.
109
(a)
(b)
Figure 5.6. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm in the surface reconstruction problem with the synthetic
sparse data set and (a) with controlledcontinuity stabilizer (p = 1, r = 0.1) (b) with
thinplate stabilizer.
Figure 5.7. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm for the surface reconstruction problem with discontinuities
specified in the controlledcontinuity stabilizer.
no
Figure 5.8. The computed solutions for the surface reconstruction with discontinu
ities using the APCG algorithm after (a) 10, (b) 20, (c) 40 and (d) 60 iterations,
respectively.
Ill
Figure 5.9. The convergence of Pentlands preconditioner in conjunction with the con
jugate gradient algorithm is worse than that of conjugate gradient algorithm without
preconditioning for the surface reconstruction problem with the synthetic sparse data
set using a membrane stabilizer.
used in conjunction with the conjugate gradient algorithm, which is known to be
much more efficient than a gradient decent algorithm [26]. We use the subsampled
data set in figure 5.3(a) with a membrane stabilizer in a 32 x 32 problem for testing
this preconditioner. The convergence curves for Pentlands preconditioner with CG,
CG and our APCG algorithms are shown in figure 5.9. We can see that Pentlands
preconditioning is worse than the conjugate gradient iterations without precondition
ing. This is because the diagonal dominance behavior of K* after wavelet transform
claimed in [61] doesnt hold in general. Pentlands preconditioner was constructed
by approximating the wavelet transform of the stiffness matrix by its diagonal en
tries based on the diagonal dominance assumption of Ks. The falsity of the diagonal
dominance claim is evident in figure 5.10 where we profile arbitrarily selected rows of
the wavelet transformed membrane matrix Ks of size (64 x 64) for the ID problem.
In [61], the preconditioner was chosen as the diagonal of the stiffness matrix K in
Magnitude Magnitude
112
(a) (b)
(c)
(d)
Figure 5.10. Row profile of the wavelet transformed membrane matrix Ks of size
64 x 64 along different rows, (a) row 3 (b) row 10 (c) row 30 (d) row 50. This figure
establishes the falsity of the diagonal dominance behavior of Ks claimed in [19].
113
the wavelet basis. Note that for the sparse data case, the matrix Ks dominates K<.
Therefore, bad approximations to the wavelet transform of Ks will deteriorate the
preconditioning effect and slow down the convergence speed.
Figure 5.11. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm for the surface reconstruction problem with sparse sym
metric data set using a controlledcontinuity stabilizer.
The third experiment uses the sparse data set given in figure 5.3(b). The size
of the problem is maintained at 64 x 64. The controlledcontinuity stabilizer with
p 1 and r = 0.01 is used as the smoothness constraint here. Note that the
data set is symmetric with respect to the center. Using the circular convolution in
the QMF implementation will speed up the convergence rate of the wavelet based
preconditioning algorithm in this case. This is because the wavelet preconditioner
with circular convolution operates on the residue vector periodically in 2D. This
is very helpful for symmetric data as the solution is supposed to be symmetric in
this case. Yaou h Chang [92] used the circular convolution in their wavelet basis
transfer and most of their experiments are performed on the symmetric data sets
114
Figure 5.12. The computed solutions using the adaptive preconditioned conjugate
gradient algorithm after (a) 10 iterations, (b) 40 iterations, (c) 100 iterations, in the
third surface reconstruction experiment.
115
or on the problems with the solutions which are somehow periodically continuous.
We have implemented both the circular convolution and the linear convolution (with
selfreflection at the border [50]) in the QMF for our adaptive preconditioner in
this experiment. The convergence curves for a comparison of the aforementioned
algorithms are plotted in figure 5.11. Our preconditioner with the linear as well as
periodic convolutions has the best convergence performance. The computed solutions
after 10, 40 and 100 iterations are shown in figure 5.12.
The other data set in our surface reconstruction experiments is the real laser
range data from a ball. Figure 5.3(c) shows the 148 randomly sampled data points
from the original range data set generated by the MSU (Michigan State University)
Pattern Recognition and Image Processing Labs Technical Arts 100X scanner. The
size of the problem domain is 152 x 193. The controlledcontinuity stabilizer with
p = 1 and t = 0.1 was used in this experiment. The discontinuities are specified
along the boundary of the object. The discontinuities divide the problem domain
into two nonoverlapped regions, where the inside region corresponds to the object
surface and the outside region corresponds to uniform background. Our adaptive
preconditioning is used in the inside region only by enforcing the solution values
in the outside region to be zero after the preconditioning. The convergence curves
for our APCG, HCG (with different levels), and CG algorithms are shown in figure
5.13(a). Our adaptive preconditioned CG algorithm still has the best convergence
performance in this experiment with real laser range data. Very accurate solution is
obtained after 10 iterations of the APCG algorithm, as shown in figure 5.13(b).
116
(a)
(b)
Figure 5.13. (a) The convergence for the APCG, HCG and CG algorithms for the
surface reconstruction problem with the real laser range data set using a controlled
continuity stabilizer, (b) The computed solution after 10 APCG iterations.
117
5.4.2 Shape from Shading
For the shapefromshading experiment, we use the synthetic bump images with
size 64 x 64 used in [76, 38], as shown in figure 5.14. These two images are generated
from the Lambertian reflectance model with the light source directions (pa, qs, 1) being
(0.34,0.3467,1) and (0.34,0.3467,1) for the left and right images, respectively. In
this experiment, only the left image is taken as the input to the SFS algorithm
to recover the shape, and the right image is used to monitor the closeness of the
computed solution to the true solution.
Figure 5.14. Shaded images of bump with different light source directions.
The construction of our preconditioner for the shapefromshading problem is
described in the previous section. The parameters A and p are set to be 0.5 in
this experiment. Without including any boundary condition in the energy function
to be minimized, we compare the convergence of our APCG algorithm with the
118
(a)
(b)
Figure 5.15. The convergence of the APCG, HCG, and CG algorithms for the shape
fromshading problem.
(a)
(b)
Figure 5.16. Shaded images of bump after 100 iterations of APCG algorithm without
using any boundary condition.
119
HCG and CG algorithms. The convergence curves for the energy function and the
computed (p, q) error are shown in figure 5.15. Note that the (p, q) error is defined as
H \Pi~P,*l + kt' 9t*l> where (p,9,) and (p*,q*) are the computed and true (p,g) at the
ith location, respectively. We can see the APCG algorithm has the best convergence
behavior in figure 5.15(a) and (b). The computed solution after 100 iterations of the
APCG algorithm is fed back to the reflectance model to generate the left and right
bump images as shown in figure 5.16. Comparing these computed images to those in
figure 5.14, we can see that this computed solution is far from the true solution in
the upper left region, although its corresponding energy is close to 0. This is because
the energy function for the SFS problem is nonconvex and gradientbased methods
(e.g. CGtype methods) tend to get trapped in local minima. To obtain the true
solution using the gradientbased method, we need to have an appropriate initial
guess that is closer to the true solution. But it is difficult to find such initial guess
that is guaranteed to converge to the true solution in the CGtype algorithms. In
fact, we always use 0 as the initial guess in our experiments.
To resolve the above problem for minimizing the SFS energy function by gradient
based methods, we include the boundary conditions obtained from the occluding
boundary [41] as an additional data constraint in the energy function to be mini
mized. By incorporating the boundary conditions into the SFS energy function, the
CGtype algorithms can approach the true solution. In our experiment, we use the
active contour model [43] to locate the occluding boundary at first. By using the
property that the surface normal at a point on the occluding boundary is parallel
(f)
(e)
Figure 5.17. Shaded images of bump after (a) 8 (b) 10 iterations, (c) & (d) 20
iterations, (e) & (f) 40 iterations of the APCG algorithm with the occluding boundary
condition.
121
to the normal of the silhouette in the image plane [41], we impose the Direchelet
boundary condition on (p,q) along the occluding boundary. In addition, we use
the occluding boundary to incorporate the discontinuities in (p, q) or z representing
orientation discontinuities or depth discontinuities respectively. In this experiment,
we incorporate the orientation discontinuities along the occluding contour. Figure
5.17 shows the left (north west) and right (north east) images generated from the
computed solutions for our APCG algorithm after 10, 20 and 40 iterations. We can
see that the computed solutions approach to the true solution after incorporating
the boundary conditions. The computed gradient and height after 40 iterations are
shown in figure 5.18.
(a) (b)
Figure 5.18. The computed (a) gradient and (b) height after 40 iterations of the
APCG algorithm.
The convergence of our APCG algorithm was compared to the HCG and CG
algorithms for the SFS problem with the Direchelet boundary condition. The con
vergence curves for the energy, p q error and z error reduction are shown in figure
122
... CG
HCG(L=2)
HCG(L=3)
HCG(L4)
APCG
15 20 25 30
Number of iterations
. . CG
HCG(L2)
HCG(L3)
HCG(l4)
20 25 30 35
Number of iterations
(b)
(c)
Figure 5.19. The convergence curves of the (a) energy, (b) p q error, (c) z error
for the CG, HCG and APCG algorithm.
123
5.19. Our APCG outperforms the other methods in terms of convergence speed for
all of the three different measures.
5.4.3 Optical Flow Computation
In section 2.3, we presented a modified gradientbased formulation of the opti
cal flow computation problem. Our regularizationbased formulation combines the
regionbased [40] and contourbased [35] gradient constraints. The gradient con
straint for the brightness function E is active all over the image plane except in the
the neighborhood of the discontinuity locations, while the gradient constraint for the
brightness function after the Laplacian of Gaussian operation, i.e. V2G* E, is active
at the discontinuity locations. In our implementation, we locate the zerocrossings of
S72G*E and label them as discontinuities [35]. Accurate estimation of Ex, Ey and Et
is necessary to obtain a reliable flow constraint. When E is the brightness function,
we use the Gaussian spatialtemporal presmoothing prior to applying the twopoint
central difference to approximate the partial derivatives. When E is V2G E, a
twopoint central difference is applied to E to approximate its partial derivatives. As
discussed in section 2.3, a normalization of the gradient constraints is used to make
the contribution of each (active) flow constraint uniform.
We present two experiments for the optical flow computation. The first experi
ment consists of a synthetic translating square image sequence; and the second is a
real image sequence consisting of a rotating Rubic cube. One frame for each image
sequence is shown in figure 5.20 Each frame for the translating square and rotating
Rubic cube image sequence is of size 100 x 100 and 256 x 240 respectively.
124
(a) (b)
Figure 5.20. One frame from the (a) square image sequence and (b) rotating Rubic
cube image sequence.
The construction of our adaptive preconditioner for the optical flow problem is
described in section 5.3. This preconditioner is embedded in the conjugate gradient
to accelerate the convergence speed for solving the linear system in the optical flow
computation. We use the CG, HCG ( with various number of levels), and our APCG
algorithms to solve the problem. Figure 5.22(a) and (b) shows the convergence rates
of the three algorithms for the translating square and the rotating Rubic examples,
respectively. We can see that our APCG algorithm has the best performance in
convergence. Note that the HCG algorithm also converges quite fast because the
linear system for the optical flow problem is not as illconditioned as the one in the
surface reconstruction and shapefromshading problems. Therefore, the difference
between various preconditioning methods in not as pronounced for the optical flow
problem as in the other two (SR & SFS) problems. Although the HCG algorithm
125
with an appropriate number of levels L, exhibits fast convergence, it is difficult to
determine the best L for a problem in advance. As for the APCG algorithm, the
number of levels is always chosen to be the maximum possible which allows the linear
convolution in the QMF implementation with the mirror reflection at the border to
be carried out with ease.
(a) (b)
Figure 5.21. Computed flow of the (a) translating square and (b) rotating Rubic
cube.
In our examples, the regularization parameter A was set to a value of 10 for the
translating square example and 1 for the rotating Rubic cube experiment. For the
former, the computed optical flows after 10 iterations of the APCG algorithm is shown
in figure 5.21(a), and the average angular error [3] is 0.21 with the standard deviation
0.07. Compared to the results for the other gradient based methods reported in
Barron et al. [3], our new formulation gives the best solution with 100% flow density.
In fact, our adaptive preconditioner can also be used with the other gradient based
126
(a)
(b)
Figure 5.22. The convergence curves of the CG, HCG and APCG algorithms for (a)
translating square (b) rotating Rubic image sequence.
regularization formulations of the optical flow problem. For the rotating Rubic cube
experiment, the computed optical flow after 15 iterations of our APCG algorithm is
shown in figure 5.21(b).
5.5 Summary and Discussion
In this chapter, we presented a novel physicallybased adaptive precondition
ing technique which was used in conjunction with a conjugate gradient algorithm to
drastically improve the speed of convergence for solving various early vision prob
lems. A preconditioner based on the membrane spline or the thin plate spline or a
convex combination of the two was termed as a physicallybased preconditioner for
obvious reasons. The adaptation of the preconditioner to an early vision problem
was achieved via the explicit use of the spectral characteristics of the regularization
filter in conjunction with the data. This spectral function was used to modulate the
127
frequency characteristics of a chosen wavelet basis leading to the construction of our
preconditioner. The preconditioning technique was demonstrated for the surface re
construction, shape from shading and optical flow computation problems. Through
experiments, we demonstrated the superiority of our preconditioning method over
existing methods.
Our adaptive preconditioning technique can be used in conjunction with any
wavelet basis. The choice of the wavelet basis is crucial to achieve efficient pre
conditioning. The ideal wavelet needs to be well localized in spatial (or time) and
frequency domain. Good spatial localization makes the QMF implementation very
efficient, i.e., the wavelet transform can be accomplished very efficiently. Good local
ization in frequency domain of the wavelet is necessary to achieve fast convergence
since the the accuracy of spectral approximation to the regularization filter depends
on the localization in frequency domain of the wavelet. But the localization in spa
tial and frequency domain cannot be arbitrarily small, because the product of their
bandwidths is lower bounded by a constant which is known as the uncertainty
principle or Heisenberg inequality. This implies that there is a tradeoff between the
spatial and frequency localization. Gaussian functions are optimal in the sense that
they meet the bound with equality [20]. If we use the wavelet with the optimal
property, the preconditioning should be more efficient.
CHAPTER 6
ROBUST OPTICAL FLOW COMPUTATION
6.1 Introduction
Optical flow computation is a fundamental problem in the motion analysis of
image sequences. It provides very important information for estimating 3D velocity
fields, analyzing rigid and nonrigid motion, segmenting the image into regions based
on their motion, or recovering 3D structure of objects in the image. Many techniques
for computing optical flow have been proposed in literature. These can be classified
into gradientbased, correlationbased, energybased, and phasebased methods [3].
In this chapter, we present two robust and efficient algorithms namely, a modified
gradientbased algorithm and an SSDbased regularization algorithm, for optical flow
computation.
The gradientbased approach is dependent on the image flow constraint equa
tion, which is derived from the brightness constancy assumption as well as the first
order Taylor series approximation [40]. Using the image flow constraint equation
alone is not enough to compute the optical flow since each equation involves two
different variables. Horn and Schunck [40] introduced the membrane smoothness
measure to constraint the flow field. Since the smoothness constraint is invalid across
128
129
the motion boundary, Nagel [56] proposed the oriented smoothness measure to sup
press the smoothness constraint in the direction perpendicular to boundaries. How
ever, the smoothness across motion boundaries problem can be resolved by using the
regularization with discontinuities [80].
A major problem with the gradientbased approach is the unreliability of the
image flow constraint equation in the areas whose local brightness function is highly
nonlinear. These areas occur in scenes containing highly textured regions, motion
discontinuities, or depth discontinuities. The firstorder Taylor series approximation
used in the derivation of the image flow constraint equation leads to inaccuracies
when the higher order terms are significant. The higher order terms are neglected
in the derivation of the image flow constraint equation by assuming the time step
between consecutive frames is arbitrarily small, i.e. approaches 0, which is far from
being practical. Usually, the time step between consecutive images is fixed and is not
arbitrarily small. Therefore, the gradient constraint equation is reliable only when
the higher order derivatives of the brightness function are insignificant, i.e., the local
brightness function is well approximated by a linear function. Most of the gradient
based methods do not account for the reliability of the gradient constraint equation.
In [32], Heitz and Bouthemy used the hypotheses testing technique to determine
whether the image flow constraint equation is valid at each site in the image plane.
The above mentioned gradient constraint is very unreliable in the neighborhood
of discontinuities of the brightness function. However, the contourbased methods
can give robust estimates of the optical flow along the zerocrossing contours, which
130
are the discontinuity locations of the image brightness function. The contourbased
methods use a different flow constraint along the zerocrossing contours. This flow
constraint along contours is obtained by replacing the image brightness function I by
the convolution of I with the second derivative of Gaussian smoothing function. We
propose to use the image flow constraint in the areas away from the brightness dis
continuities and use the contourbased flow constraint at the discontinuity locations.
Thus, in our gradientbased regularization method, the unreliable image flow con
straints near discontinuity locations are replaced by the robust contourbased flow
constraints at the zerocrossing locations. This yields an accurate estimate of the
optical flow field vectors.
The correlationbased approach locally finds the displacement vector (u, v) be
tween two images 70 and I\ at the location (x, y) by minimizing the sum of squared
difference (SSD) function
k k
SSD(x;u,v) = '52w(i,j)[h(x + i + u,y+j + v)I0(x + i,y+j)]2 (6.1)
j=k i=k
where w(i,j) is the weighting function. In this SSD function, the summation is
performed in a window of size (2k f 1) x (2k + 1) centered at (x, y). Most correlation
based methods do an extensive search of the displacement vector (u, v) in a finite
integerpair set and find the pair with the smallest SSD value to be the displacement
at that location. Since the search region of the displacement vector is discretized
for an extensive search, the accuracy of the computed optical flow is limited by this
discretization. In addition, the correlationbased approach is incapable of producing
131
reliable optical flow estimates in a homogeneous region. Anandan [2] treated the
estimates provided by the matching process as the data constraints for optical flow
with appropriate confidence measure, and incorporated the smoothness constraint on
optical flow to achieve better results. Recently, Szeliski and Coughlan [77] used the
twodimensional spline model to represent the flow field and minimized the following
SSD function
E(u) = [M* + uijj + VH) Hhj)}2 (62)
j=ii=i
where the vector u is the concatenation of the flow components tt,j and u,j. The
LevenbergMarquardt algorithm was then employed to solve this nonconvex opti
mization problem. They reported very accurate results using this method. The 2D
spline models for optical flow field assume the flow field to be wellapproximated
by the 2D spline basis functions in the preset patches. However, it is difficult to
incorporate discontinuities into these 2D spline patches. In our method, we use the
standard finite difference discretization on the flow field and take the SSD as the
data constraint energy in a regularization framework. A preconditioned nonlinear
conjugate gradient algorithm is employed to minimize the total energy function. The
experimental results on the standard synthetic image sequence using our algorithm
are better than or comparable to the best existing results reported in literature.
Unlike some other algorithms that only generate sparse optical flow estimates, our
algorithms give dense optical flow estimates. In addition, motion discontinuities can
be incorporated in our (regularization) framework.
132
Recently, Barron et al., [3] have presented a comprehensive survey of various
optical flow computation techniques along with a quantitative comparison of them.
From their experimental results, they argue that the phasebased method by Fleet
and Jepson [18] gives the most accurate result. However, most of the reported results
are only for sparse optical flow measurements obtained after thresholding unreliable
components, present in the neighborhood of phase singularities.
The remainder of this chapter is organized as follows. In the next section, we
present our modified gradientbased method that combines the regionbased gradient
constraint in the regions away from brightness discontinuities and the contourbased
gradient constraint at these discontinuity locations. In section 6.3, our SSDbased
regularization technique for optical flow computation from two images of a time
sequence is proposed. The experimental results for both algorithms on synthetic and
real image sequences are presented in section 6.4. Finally, we conclude this chapter
in section 6.5.
6.2 Modified gradientbased method
At the foundation of the standard gradientbased approach is the image flow
constraint equation,
Ix(x, y, t)u(x, y, t) + Iy(x, y, t)v(x, y, t) + Jt(x, y, t) = 0 (6.3)
where I(x,y,t) is the image brightness function at location (x,y) and time t, Ix,Iy
and It are the partial derivatives of I with respect to x, y and t, respectively, and
133
(u,v) is the optical flow field. The image flow constraint equation has been used in
different ways to obtain the optical flow estimates as reported in the literature on
gradientbased methods [40, 49, 55, 84]. In a regularization framework, the image
flow constraint equation is regarded as the data constraint and additional smoothness
constraint is imposed on the optical flow.
The image constraint equation is derived by taking the Taylor expansion of the
image constancy equation = 0 up to the firstorder terms. The higherorder
terms are neglected under the assumption that the time step between consecutive
frames is arbitrarily small, which is most often violated in practice. In reality, the
time step is usually fixed. Therefore, the lack of higherorder terms becomes the main
source of errors in the data constraint. The reliability of the image flow constraint
equation depends on the magnitudes of the higher order derivatives of image bright
ness function. If the image brightness function in the neighborhood of one point is
well approximated by a linear function, i.e. its higherorder derivatives are small,
then the image flow constraint is very reliable at this point. On the contrary, the
image flow constraint is very unreliable at the locations with significant higherorder
derivatives, which are mainly the neighborhood of brightness discontinuities. Since
the use of an unreliable flow constraint will degrade the accuracy of the optical flow
estimation, we propose to disable this unreliable image flow constraint in the neigh
borhood of discontinuities and enable the robust contourbased flow constraint at the
discontinuity locations in a regularization framework.
134
The contourbased methods [35, 17] compute the optical flow along the zero
crossing contours by using the following flow constraint equation
Sx(x, y, t)u(x, y, t) + Sy(x, y, t)v(x, y, t) + St{x, y,t) = 0 (6.4)
where S(x, y, t) is the convolution of 7(x, y, t) with the second derivative of the Gaus
sian function, Sx, Sy, and St are the partial derivatives of S with respect to x, y and
t, respectively. In [35], the function S was chosen to be the convolution of the Lapla
cian of Gaussian (LOG) filter with the image brightness function, i.e. V2G*/, where
V2 = ^ + Jp, G(x,y) is the 2D spatial Gaussian function and the convolution is
performed in 2D.
Now, we incorporate the above two flow constraint equations into the regu
larization framework. The image flow constraint in equation 6.3 is disabled in the
neighborhood of brightness discontinuities, while the contourbased flow constraint
is active along the zerocrossing. The variational formulation of the optical flow
problem using this selective scheme for the data constraint leads to minimizing the
following functional
/ a((V/.u) + /i)2 + A( Vu 2 +  Vu 2)dx+ f /?(V5u + 5t)2dx (6.5)
J fi J zerocrossing
where u(x, t) = (u(x,t),v(x,t)) is the optical flow field to be estimated, x = (x,y)
is a 2D vector, 1 is the 2D image domain, a, A and (3 are the weighting functions
associated to the image flow constraint, smoothness constraint and contourbased
135
flow constraint, respectively. In this dissertation, the weighting functions A and Â¡3
are chosen to be constants, and a(x) has value 0 when x is in the neighborhood of
discontinuities and takes a constant value elsewhere. A discrete version of the above
energy can be written as
y](Ex,jUj + EyVi + Et,i)2 + A(
(6.6)
Where, the index i denotes the ith location, the indices x, y and t denote the partial
derivatives along the corresponding directions, u and v are the discretized flow vector,
and the function E is defined as follows
ali when i Â£ D\N
Ei = {
/3Si when i Â£ Z
The set D contains all the discretized locations in the image domain, N is the set of
locations in the neighborhood of discontinuities, and the set Z contains the discretized
locations along the zerocrossing contours.
The image flow data constraint and the contourbased flow constraint are ob
tained by using the firstorder Taylor series approximation of the functions / and S',
respectively, thus each constraint is unreliable in the regions where the underlying
function ( I or S ) is not well approximated by a linear function locally, which means
the function contains significant nonlinear terms. We define a function () to be a
136
measure of reliability for the data constraint at th location as follows
r, V \Exx,i\ + \Eyy,i \ + \Ett,i\ + \EXy,i \ + \Ext,i\ + \Eyt,i\
{l)~ \Ex,i\ + \Ey,i\ + \Et,i\
The partial derivatives of the function E in equation 6.7 can be estimated by standard
finite difference methods. The reliability measure S(i) is the ratio between the sums
of the absolute values of the secondorder and firstorder partial derivatives of the
function E at the ith location. Therefore, the corresponding data constraint is
reliable when the measure 8 is small and unreliable when 8 is large. We can set a
threshold to reject the data constraints with a high 8 function value. This threshold
was set to 1.0 in our implementation.
The data constraint in the above energy function implies the weighting 7j? +
Iyi or Sli + Sy on at site i. This weighting is not reasonable since the
flow constraints at high gradient locations are not necessarily more reliable than
those at low gradient locations. In fact, they tend to be less reliable because the
brightness gradients close to discontinuities are normally higher than those away from
discontinuities. Therefore, we eliminate this weighting on each data constraint via
normalization. To have a uniform contribution at each pixel from the flow constraint,
we divide the flow constraint at the ith location by the normalization factor (7j +
Iyti)* or (Sx>i + Sy)*. Lets denote the normalized flow constraint by ExUi + EyVi +
Et = 0, then the energy function to be minimized becomes
(6.7)
/() = + EV.Â¡Vi + Et,Â¡)2 + + *4, + Vl.i + *'.,)
I
(6.8)
137
where the vector u is the concatenation of all the components u, and n,.
The minimization of the above energy leads to solving a linear system of equa
tions Ku = b where the stiffness matrix K Â¡ft22*22 is symmetric positivedefinite
and it has the following 2x2 block structure.
K =
Ks + EXX
E
xy
EXJ, Ks 4 E^j,
(6.9)
where Ks 3?"2x"2 is the discrete 2D Laplacian matrix from the membrane smooth
ness constraint, Exx, Exy, and Eyy are all n2 x n2 diagonal matrices with entries Exi,
ExEy and E2yi, respectively.
To efficiently solve the above linear system for computing optical flow, we use
the adaptive preconditioned conjugate gradient algorithm developed in chapter 5.
6.3 SSDbased Regularization Method
The SSD function in equation 6.2 can be used as the data constraint energy in
a regularization framework. Incorporating the membrane smoothness and the SSD
constraints into the regularization formulation, we obtain the total energy function
/(u) = Â£(7i(*' + + u*) ~ Jo(xi,Vi))2 + A{u2x + u2yi + v2xi + v2yi) (6.10)
'Â£>
where (x,, t/,) is the ith discretized location in the image domain. Note that the data
constraint, ii(x, + + vÂ¡) = 7o(x,,?/,), is not linear in the flow vector u, or u,.
If the Taylor expansion of the data constraint is taken up to first order terms, we
138
obtain the image flow constraint in equation 6.3. Consequently, the gradientbased
regularization approach can be perceived as an approximation to the SSDbased
regularization formulation. The energy function to be minimized in the gradient
based regularization is quadratic and convex, while the energy function in equation
6.10 is neither quadratic nor convex. For the quadratic and convex energy function in
the gradientbased regularization, the energy minimization can be accomplished by
solving an SPD linear system. The minimization of the nonquadratic and nonconvex
energy function is more difficult and requires more computational effort. The SSD
based regularization can give more accurate estimation than that from the gradient
based regularization, since the data constraint used in the SSDbased regularization
is more precise than that in the gradientbased regularization. However, it definitely
requires more computational work to minimize the energy function.
The total energy function in equation 6.10 is a nonlinear function with a large
number of variables. We use the nonlinear conjugate gradient method [23] with
a modified search direction to minimize this nonlinear energy function in equation
6.10. The nonlinear conjugate gradient algorithm is very efficient in the unconstrained
optimization of largescale nonlinear functions [23]. Although this algorithm can only
find the local minima, we can obtain very accurate solution when the initial solution
is close to the true solution. For example, small displacement between consecutive
frames can be easily found when we start the nonlinear CG from zero initial values.
In fact, a very good initial solution can be obtained from the optical flow estimated in
the previous frames. In addition, the scale space tracking scheme [89] can be employed
139
to take care of largedisplacement problems and to accelerate the convergence rate of
the nonlinear CG algorithm.
6.3.1 Nonlinear Conjugate Gradient Algorithm
The nonlinear conjugate gradient method for minimizing general nonlinear func
tions is outlined as follows [23]:
1.
2.
3.
4.
5.
6.
7.
d0 = r0 = /'(u0).
Find a, that minimizes /(u, + a,d,).
u,+i = u, + a,d,.
r+i = /'(u.+i)
fii+i = or fii+i = max{r,tl(rrT+r r,),0}
d,+i r,+i  /?,+id,.
If Tk 0, stop; else set i := i + 1 and go to step 2.
Step 2 requires a line search method to minimize the function /(u, + a,d) at u, along
the direction d,. The NewtonRaphson method and the Secant method are two popu
lar line search schemes to approximate the best a,. We employ the NewtonRaphson
method in the nonlinear CG. The Taylor series approximation of the function / up
to the secondorder terms is used to derive the approximate a,, giving
/'(u,)d,
dff"{ui)d,'
a, =
(6.11)
140
In step 5, there are two ways for choosing f3. The first is FletcherReeves formula and
the other is PolakRibiere formula. We adopt the latter since it usually converges
much more quickly, although the nonlinear CG needs to restart when f) = 0. Note
that /'(u) and /"(u) in this algorithm denote the gradient vector and the Hessian
matrix of the function /. From equation 6.10, we can obtain
/'(u) =
f'd,u
+ A
Ks 0
. /*.
0 Ks
u
/ =
Kd,uu t~ AKS
Kdyitv Kd,vv t" AKS
(6.12)
(6.13)
where the matrix K, is the Laplacian matrix obtained from the membrane smoothness
constraint, the vectors f'd u and f'd are the concatenations of the components
and f'dvi respectively, K<)UU, KdyXlv and Kd}VV are all diagonal matrices with the ith
diagonal entries Kd
Kd,uu,ii Kd,uv,i and Kd,w,i corresponding to the ith location are defined as follows:
f
J d,u,i
f
J d,v,i
f^d,uu,i
R d,uv,i
^d,vv,i
dli(xj + u,,y, + u,)
dx
dh{xj + u,,y, + Vj)
dx
,dh(x, + u y, + i>,) 2 d2h{xj + Uj,yj f u,)^
' dx
w
w
dx2
(6.14)
(6.15)
(6.16)
^ dh(xj + ut, y, + Vi)^dh(xi + u, y, + t>) ^ + d2Il{xl + u,, y, +
dx
dy
dxdy
, dh(xi + Ui, y{ + Vi) 2 d2h{xi + iq, y{ + v{)
( ) + w
dy
dy2
(6.18)
141
where w = h(xÂ¡ + u,,y, + u,) /0(x, yt). In the above equations, the values of the
image function h and its partial derivatives at (x f u,, y, + u) need to be computed
using any interpolation technique. In our implementation, we used the interpolating
spline under tension [57] to compute these values.
6.3.2 Modified Search Direction Scheme
The nonlinear conjugate gradient algorithm is a local search method which in
volves approximating the local landscape of the energy function by a quadratic func
tion. The use of gradient to obtain the search direction is not appropriate at the
locations where the local landscape of the energy function has large higherorder
derivatives, i.e. the local landscape is very nonlinear. We propose a new search
direction scheme to modify the gradient direction in nonlinear conjugate gradient al
gorithm. The idea of the modified search direction scheme is to check the smoothness
of the local energy function for every component of flow vector. From equation 6.10,
we can see that the smoothness energy is already quadratic, therefore, we need only
to consider the smoothness of the first term, i.e., the data constraint energy. To mea
sure the local smoothness of the data constraint energy for the flow vector (m,u),
we use the inner product of the normalized gradients of the two image brightness
functions as a smoothness measure cr(i, u,), i.e.
V/i(x, + Ui, yÂ¡ + v) V/0(x, y,)
a(Ui,V,) ~ V/i(xI + u1,y, + u,) V/0(x,y,)'
(6.19)
142
Figure 6.1. The discrepancy between the gradient directions of the function I0 and
/x at the ith location is small and thus the use of the gradient direction as the
search direction for the ith component is robust; while the the jth component of the
gradient direction is unreliable for a local search.
Figure 6.1 depicts the gradient directions of Io and Ii at two locations. The
gradient directions at the ith location are close, therefore the use of it as the search
direction is robust for the ith component. The two gradient directions at the jth
location differ by a significant angle, which means the jth component of the gradient
is unreliable as a local search direction.
We can set a threshold on the smoothness measure in order to reject the unre
liable components of the gradient direction as the basis of a local search direction.
In addition, this smoothness measure is also used as a confidence measure for flow
vectors.
6.4 Experimental Results
In this section, our modified gradientbased regularization algorithm and SSD
based regularization algorithm are tested on a variety of synthetic and real image
sequences. Comparisons of our results with the previously reported results in [3, 77]
are made to demonstrate the performance of our proposed algorithms.
143
The angular measure of error [3] between the computed optical flow and the
true flow is adopted in the experiments as a basis for comparison with the previously
reported results. Unlike some other methods which only produce sparse optical flow,
both of our regularization methods presented in this chapter give dense optical flow
estimates with 100% density. To achieve a fair comparison with the reported results,
only those techniques reporting 100% flow density in literature are included in our
comparative study.
6.4.1 Modified Gradientbased Method
In the implementation of our modified gradientbased regularization method,
we treat the zerocrossing points of S(x,y,t) as the discontinuity locations. The
computation of the derivatives Ix, Iy and It is very errorsensitive. Therefore, we
presmooth the image brightness function with the Gaussian filter prior to using the
central difference method to approximate the derivatives. As for Sx, Sy and St, we
directly use the central difference scheme to approximate them since the process of
convolution with the LOG (Laplacian of Gaussian) filter for generating the function
S already involves the smoothing operation.
Two standard synthetic image sequences are tested by using our modified gradient
based regularization method to compute the optical flow. These two image sequences
are the images of a translating Square and a Diverging Tree reported in [3]. Figure
6.2 shows one frame from each sequence. The image sizes for the square and diverging
tree are 100 x 100 and 150 x 150 respectively.
144
(a) (b)
Figure 6.2. Frames from the (a) Square 2 and (b) Diverging Tree image sequences.
The motion in the Square 2 sequence is a constant translation. In our experi
ment, the regularization parameter A is chosen to be 10, and the weighting parameters
a and (3 for the image flow constraint and contourbased constraint are set to 1. The
optical flow obtained after 40 iterations of the incomplete Cholesky preconditioned
conjugate gradient algorithm is shown in figure 6.3(a). The errors of our modi
fied gradientbased regularization method and the other gradientbased methods for
this translating square example are given in Table 6.1. Only the results from those
gradientbased methods that yield 100% density flow estimates reported in [3] are
included in this table for comparison. It is obvious that our modified gradientbased
regularization method drastically outperforms the other gradientbased methods in
this example. Note that the results of the other two gradientbased methods in table
6.1 for this image sequence could be improved by increasing the degree of smoothness,
145
i.e., the value of the regularization parameter A, since the correct optical flow field
is uniform in this example. However, large regularization parameter tends to over
smooth the estimated optical flow for other image sequences. Although it is possible
to find an optimal regularization parameter by using the generalized crossvalidation
method [88], this method is extremely computationally expensive.
Technique
Average Error
Standard Deviation
Density
Horn and Schunck (modified)
32.81
13.67
100%
Nagel
34.57
14.38
100%
Our method (Gradientbased)
0.21
0.06
100%
Table 6.1. Summary of Square 2 results.
(a)
(b)
Figure 6.3. Computed optical flow of the (a) Square 2 and (b) Diverging Tree image
sequences after 40 iterations of the incomplete Cholesky preconditioned CG algo
rithm.
The other synthetic data is the more realistic Diverging Tree image sequence.
The underlying motion in this example is a bilinear expansion from the center of
the image. We use the regularization parameter A = 0.1 and a = fl = 1 in this
146
example. The error in the computed optical flow after 40 iterations of the incomplete
Cholesky preconditioned CG algorithm is presented in Table 6.2, along results from
other gradientbased methods in literature that yield 100% flow density [3]. Once
again, our modified gradientbased regularization method produces more accurate
optical flow than the other methods. The computed optical flow is shown in figure
6.3(b).
Technique
Average Error
Standard Deviation
Density
Horn and Schunck (modified)
2.55
3.67
100%
Nagel
2.94
3.23
100%
Ur as et al.
4.64
3.48
100%
Our method (Gradientbased)
1.60
1.15
100%
Table 6.2. Summary of Diverging Tree results.
Our gradientbased algorithm was also tested on a real data set namely, the
Rotating Rubic cube. Figure 6.4 shows one frame of the Rubic Cubic sequence and the
computed optical flow after 40 iterations of the incomplete Cholesky preconditioned
conjugate gradient algorithm.
6.4.2 SSDbased Regularization Method
The experimental results of using our SSDbased regularization method on two
synthetic image data and one real image sequence are presented here. The two
synthetic image sequences are the Sinusoid and the Translating Tree image sequences.
Our results on these two examples are compared to the best results with 100% density
optical flow estimate reported in [3] and [77].
147
(a)
(b)
Figure 6.4. (a) One frame from the Rotating Rubic Cube sequence (b) Computed
optical flow after 40 iterations of the incomplete Cholesky preconditioned CG algo
rithm.
(a)
(b)
Figure 6.5. Frames from the (a) Sinusoid 1 and (b) Translating Tree image sequences.
148
The Sinusoid 1 image sequence has a constant translation with a velocity of
(1.585,0.863) pixels/frame. The image size for each frame of this image sequence
is 100 x 100. Each frame is a highly textured image whose brightness function is
fast changing almost everywhere. The regularization parameter A is set to be 300.
Very accurate estimate is obtained after 300 preconditioned nonlinear conjugate gra
dient iterations. Figure 6.6(a) shows the computed optical flow. Our result is the
best among the three depicted in Table 6.3. Note that our method uses only two
consecutive images to estimate the optical flow. The results of Fleet and Jepsons
method [18] for this example were obtained by using a 21frame image sequence [3].
To achieve a fair comparison, we depict the reported results of Szeliski and Coughlan
[77] for the 2frame image sequence. With incorporation of more frames of data, the
results in Szeliski and Coughlan do however improve over those depicted in the table.
Technique
# of
Frames
Average
Error
Standard
Deviation
Density
Fleet and Jepson
21
0.03
O
rH
o
o
100%
Szeliski and Coughlan
2
O
tH
O
0.02
100%
Our method (SSDbased)
2
0.02
0.01
100%
Table 6.3. Summary of Sinusoid 1 results.
For the translating tree example, each frame contains more complicated highly
textured regions, which result in multiple local minima in the associated energy
function to be minimized. To avoid being trapped in a local minimum during the
minimization of a nonconvex energy function, we employ the scalespace tracking
149
Figure 6.6. Computed optical flow of the (a) Sinusoid 1 and (b) Translating Tree
image sequences, using the preconditioned nonlinear CG algorithm.
scheme [89]. The scalespace tracking is achieved by applying our algorithm on the
smoothed image pair with the degree of smoothing being gradually reduced during
the energy minimization. We start from a a = 1.1647 for Gaussian smoothing, reduce
to a a = 0.5774 after 100 iterations of the preconditioned nonlinear CG algorithm,
and finally set cr to 0 after 200 iterations. The computed optical flow is shown in figure
6.6(b). The error in the computed optical flow is depicted in Table 6.4 along with
the errors of best results with 100% density reported in [3] and [77]. Our SSDbased
regularization method gives the most accurate result from among these methods.
Note that the result of Szeliski and Coughlan in this table is adopted for the two
frame case to be compared to the result of our twoframe algorithm, although more
accurate estimation can be obtained via incorporating more frames into the energy
function as shown in [77].
150
Technique
# of
Frames
Average
Error
Standard
Deviation
Density
Ur as et al.
15
0.62
0.52
100%
Szeliski and Coughlan
2
0.35
0.34
100%
Our method (SSDbased)
2
0.30
0.19
100%
Table 6.4. Summary of Translating Tree results.
Finally, our SSDbased regularization method is applied to a real image sequence
namely, the Hamburg Taxi sequence. One frame of this sequence and the computed
optical flow are shown in figure 6.7. The computed optical flow between two con
secutive images was obtained after 200 iterations of the preconditioned nonlinear
conjugate gradient algorithm.
(a) (b)
Figure 6.7. (a) One frame from the Hamburg Taxi sequence (b) Computed optical
flow after 200 iterations of the preconditioned nonlinear CG algorithm.
6.5 Summary and Discussion
In this chapter, we presented two new algorithms for robust and efficient compu
tation of the optical flow from an image sequence. One is the modified gradientbased
151
regularization algorithm, and the other is the SSDbased regularization algorithm.
The modified gradientbased regularization method selectively combines the image
flow constraint and the contourbased flow constraint into the data constraint. The
image flow constraint is disabled in the neighborhood of discontinuities, while the
contourbased flow constraint is active at discontinuity locations. A more precise
data constraint is obtained by using this selective combination scheme than the one
used by other gradientbased methods in literature. Flow estimates obtained us
ing our method are more accurate than those obtained from techniques reported in
literature [3]. This is evident from the experimental results.
Our SSDbased regularization method uses the SSD measure as the data con
straint in a regularization framework. The resulting energy function is neither quadratic
nor convex. The preconditioned nonlinear conjugate gradient with a modified search
direction is developed to minimize this energy function. Very accurate results were
obtained by applying this algorithm to two consecutive images of a sequence.
Comparing these two algorithms proposed in this chapter, we conclude that the
modified gradientbased regularization algorithm is more computationally efficient
than the SSDbased regularization algorithm, while the latter gives more accurate
results than the former. Also, we have experimentally demonstrated the superiority
of our algorithms for optical flow computation over those reported to date in litera
ture. Future work will focus on automatically detecting motion discontinuities in the
framework presented here.
CHAPTER 7
RESEARCH DIRECTIONS
7.1 Introduction
In this thesis, we have presented efficient computational methods for early vision
problems with prescribed discontinuities. In this context, several research issues
presented in the following paragraphs will benefit from further investigation.
In the regularization and MRF formulations for early vision problems, the reg
ularization parameters are usually chosen in an ad hoc fashion. Without adapting
the parameters to the problem, it is easy to end up with an oversmoothed or under
smoothed result. The generalized crossvalidation method [88] has been popularly
used to determine the regularization parameters. However, the computational cost
of this method is extremely high. In addition, this method can not be used when
discontinuity is included in the problem. It is important to have an efficient and
general technique to determine the regularization parameters to obtain better results
for early vision problems.
The adaptive preconditioner constructed in a wavelet basis can be applied to
any regularization formulation in early vision. In fact, we can use similar construction
of the adaptive preconditioner for the linear system discretized from any boundary
152
153
value problems in PDEs. The extension is straightforward by replacing the spec
tral characteristics of the membrane or thinplate stabilizer by that of the associated
differential operator. There is no restriction on the order of the partial differential
equation. From the success of our adaptive preconditioner in the early vision prob
lems, we believe the adaptive preconditioner will lead to drastic acceleration for the
numerical solution of the boundary value problems.
In the construction of our adaptive preconditioner, we approximate the asso
ciated regularization filter by a linear shiftinvariant filter in a wavelet basis, since
the modulation matrix in our preconditioner gives uniform weighting to the wavelet
basis functions corresponding to the same frequency band. As discussed in section
5.2, the regularization filter is spatially varying in general, especially for the regu
larization problems with discontinuities or sparse data constraints. However, it is
possible to construct a spatially varying filter in a wavelet basis to achieve a better
approximation to the regularization filter.
The other possibility to design an adaptive preconditioner is to approximate the
generalized inverse of (K0 + UVT) by using our generalized capacitance matrix algo
rithm or the generalized ShermanMorrisonWoodbury formulas [11, 34, 67]. Using
this approximation, we can design a new adaptive preconditioner that accounts for
discontinuities and sparse data constraints in early vision problems. This will be the
focus of our future research.
The important issue of discontinuity detection has not been discussed in the
previous chapters. The early vision problem with discontinuity detection can be
154
formulate in a deterministic or probabilistic framework (see chapter 2), leading to
a coupled (binaryreal) nonconvex optimization problem. In this chapter, we will
briefly discuss existing techniques for this problem and propose a novel hybrid search
algorithm as a possible approach toward solving this problem. This hybrid search
algorithm involves a combination of a stochastic and a deterministic search tech
niques. For the stochastic search, we employ a novel informed genetic algorithm
(GA), whereas for the deterministic part, we employ either the generalized capaci
tance matrix algorithm or the adaptive preconditioned conjugate gradient algorithm,
presented in chapter 4 and 5, respectively. Some promising preliminary experimental
results of applying this hybrid search algorithm on surface reconstruction problems
are presented. However, this hybrid search technique will have to be considerably
researched upon prior to deciding upon its practicality.
7.2 Previous Energy Minimization Methods
In chapter 2, we have seen that both the deterministic formulation and proba
bilistic formulation of early vision problems with discontinuity detection give rise to
minimizing a nonconvex function. Note that in some applications, a local optimum
may provide a satisfactory solution. However, in certain other applications, local
optimal solutions are unsatisfactory. A number of optimization techniques have been
proposed to solve this problem. We briefly discuss their pros and cons in this section.
There are two classes of optimization methods proposed for solving this non
convex minimization problems; one is deterministic, and the other is stochastic. The
155
deterministic methods include the GNC (graduated nonconvexity) algorithm, ana
log networks, mean field annealing and dynamic programming. As for the stochastic
methods, there are the simulated annealing, MPM (maximizer of posterior marginals)
and the GA (genetic algorithm).
7.2.1 Deterministic Optimization Methods
Most of the deterministic minimization methods can not find the global min
imum solution for the nonconvex function in equation 2.6, instead they lead to an
approximate solution or a locally optimal solution. But they can converge much
faster than the stochastic methods. The following deterministic methods have been
proposed in literature for this nonconvex minimization problem. We give a brief
description of the same.
GNC algorithm
The GNC (Graduated NonConvex) algorithm was proposed by Blake and Zisser
man [8] and can be considered as a continuation method widely known in numerical
analysis literature [14]. For the GNC algorithm to minimize the nonconvex func
tion F(u, h, v) given in equation 2.6, a dual energy i^u) is derived by the following
equation
F(u) = infÂ£(u,h,v). (7.1)
h,v
A sequence of approximating functions F'p)(u), 0 < p < 1, are constructed such that
Fis convex and Fis equal to the dual energy F. Then, gradient decent tech
niques are employed to minimize F^ for a prescribed decreasing sequence of values
156
of p from 1 to 0. In the form described in [8], the GNC algorithm is applicable only
for the case when there is no interaction between the individual line process variables,
i.e. no smoothness constraint is imposed on the line process. This choice of line en
ergy is purely heuristic, since it tries to penalize only the inclusion of discontinuities.
The GNC algorithm proposed in [8] can only deal with this particular form of line
energy.
Recently, Bedini et al. [4] generalized the GNC algorithm to allow some kind of
interactions between line process variables. The new line energy contains the inhibi
tion of parallelism of discontinuities and is expressed in a closed form. However, it is
very difficult to generalize the GNC algorithm to include arbitrary line energies, i.e.,
general constraints on the line process, since the procedure of dual energy in the GNC
algorithm construction to eliminate the line process will be extremely complicated for
general cases. In addition, the GNC algorithm cannot guarantee to find the globally
minimal solution [47]. However, as stated earlier, it does yield satisfactory solutions
in a number of applications.
Analog networks
This approach is very promising since it can lead to a realtime implementation.
The idea of applying analog network to solve this nonconvex minimization problem
is in spirit similar to Hopfield and Tanks neural network [36] approach for solving
combinatorial optimization problems. The original energy function in equation 2.6 is
approximated by a Lyapunov function such that the associated circuit can minimize
this Lyapunov function [44], thus leading to an approximate solution.
157
Analog VLSI circuit based solutions were developed by Harris et al. [31] for
a host of early vision problems. These analog circuit based methods lead to some
impressive practical results. However, since this design is based on the idea of line
process elimination as in the GNC algorithm, it is not clear as to how the network
could be modified to include general line energies.
This approach does permit some kinds of line process interactions in the line
energy, but the line energy must have a closedform expression. In general, the MRF
model doesnt use a closedform expression for the clique potentials. In the MRF
model, a lookup table is more prevalent for designing the clique potentials for the
line process.
Mean field annealing
The mean field annealing algorithm [21] uses the mean field approximation and
the saddle point approximation to obtain an approximate solution to the MAP esti
mation problem. It is a nice deterministic approximation to the MRF model for sim
ple clique potential functions. Like the GNC algorithm and neural network approach,
mean field annealing can only deal with specific types of clique potential. When the
line energy includes line process interactions, i.e., the neighborhood system for line
process in the MRF model is nonempty, more elaborate, nontrivial approximation
methods are required to design the algorithm. Consequently, it may not be practical
for general clique potential functions.
158
Dynamic programming
Dynamic programming is a global optimization algorithm for a function of dis
crete variables when the function can be expressed as a recurrence. To use this
algorithm, the real variables need to be quantized to discrete variables, i.e. taking
values from a finite discrete set. In [60], the recurrence form is obtained for the sim
plest line energy as in GNC, i.e. without line process interactions in the line energy.
This is a degenerate case in an MRF model, since it happens when the neighborhood
system for line process is empty. It is nontrivial to obtain a recurrence form for
general clique potential functions.
Although dynamic programming can achieve a global optimum for the dis
cretized problems, the the computational cost is very high for large size problems
[8, 60], thus rendering it impractical.
Nonlinear diffusion
Nonlinear diffusion [70, 64] has been popularly employed as an energy mini
mization tool for various vision problems recently. To apply this technique to the
coupled nonconvex minimization problem, Shah [70] changed the binary line process
variables to the edge indicator real variables which assume real values between 0 and
1. When the value is close to 1, it implies that the point is in the vicinity of an edge,
and vice versa. Then, a coupled partial differential equation system is established
from this modified energy functional. The solution update is based on this coupled
PDE system. This technique seems to be very promising in terms of computational
efficiency. However it has been applied only to the simplest line energy case, i.e.
159
no interaction between line process variables. In addition, this technique can not
guarantee the globally optimal solution.
All the above deterministic optimization methods can only be applied to
some restricted energy functions whose line energies contain either no line process
interactions or very simple line process interactions. The line energies encode the
constraints or prior information we impose on the line process. Bedini et al. [4] used
a line energy with simple line process interactions and obtained much better results
than those with the simplest line energy, i.e., no interactions between line processes.
In an MRF model, the line energy is the summation of line clique potentials. When
the neighborhood system in the MRF model is nonempty, there are line process
interactions in the line energy. In general, the line clique potentials are accessed
through a lookup table, and no closedform expression is available. The line clique
potentials are very crucial to the performance of the results [54]. They are regarded
as the parameters in an MRF model and can be estimated for different classes of
object [54] to achieve better performance.
Furthermore, Most of the above mentioned deterministic methods with the ex
ception of dynamic programming, can only reach an approximate solution or a locally
optimal solution. In fact, the quantization of the real variables in dynamic program
ming may also be regarded as an approximation.
160
7.2.2 Stochastic Optimization Methods
Three stochastic methods for the MAP estimation problems will be discussed in
the following. Unlike the deterministic optimization methods, there is no restriction
on the clique potentials for the stochastic methods. In addition, simulated annealing
as well as the genetic algorithm can converge to a global optimum. The drawback
of stochastic optimization methods is that, their computational cost is very high,
compared to most of the deterministic methods. However, we will present a novel
hybrid search algorithm that has yielded promising and practically feasible prelimi
nary results.
Simulated annealing
Simulated annealing has been known as a general global optimization algorithm.
No restriction on the clique potential functions need be imposed in this approach.
The concept of this algorithm comes from statistical physics and is based on gradually
lowering a parameter T that corresponds to temperature in a physical system. As
T decreases, samples from the posterior distribution approach toward the minimal
energy solution. Geman and Geman [22] developed the Gibbs sampler with the
simulated annealing algorithm to solve the MAP estimation problem. In order to
achieve the global minimum, the cooling schedule for the temperature must be very
slowly decreased using the formula T(k) = where T(k) is the temperature
during the kth iteration and c is a constant. Since the cooling schedule must be
very slow to guarantee global convergence, the computational cost for the simulated
annealing algorithm is enormous.
161
Some variants of the simulated annealing algorithm have been reported in liter
ature, e.g. microcanonical annealing [13]. However, their computational complexities
are similar to the standard annealing.
MPM method
The MPM (maximizer of posterior marginals) method was proposed by Marro
quin et al. [51]. It is an approximation to MAP estimation, i.e. the MPM method
only provides an approximate solution. Unlike the gradual temperature cooling sched
ule in the simulated annealing algorithm, the MPM method executes the Metropolis
algorithm at a fixed temperature to collect statistics about the equilibrium behav
ior of the coupled MRF. Therefore, this method requires far less computation than
simulated annealing.
7.3 Proposed Hybrid Search Algorithm
In this section, we present a novel hybrid (stochastic + deterministic) search
algorithm as a possible solution to the coupled (binaryreal) nonconvex optimization
problem. This hybrid search algorithm consists of an informed genetic algorithm
(GA) and either one of our generalized capacitance matrix algorithm or the adaptive
preconditioned conjugate gradient algorithm. The informed genetic algorithm is em
ployed as a global minimizer on the binary line process. Inside the GA, either one of
the aforementioned deterministic algorithms is used on the real ( surface ) variables
to solve the quadratic and convex minimization problems for each given line process
configuration in the GA.
162
As shown in chapter 2, either the deterministic formulation or the probabilistic
formulation leads to the minimization of a nonconvex energy function E(u, 1), where
1 = [h v], given in equation 2.6 or 2.13. This energy function is a hybrid of the real
variables and boolean variables. We know that the GA is well suited for combinato
rial optimization problems because of its bitstring representation. This nonconvex
minimization problem can be transformed to a combinatorial optimization problem
as follows.
The energy function i?(u,l) can be rewritten as
E(u,l) = Â£(u  1) + Â£/(l),
(7.2)
Â£(u 11) = K,,ud2 + Â£/(ul),
(7.3)
m = e ^(i).
(7.4)
c
For the fourconnected neighborhood system, the conditional clique potential of the
real variables u given the line process 1, U(u  1), is analogous to the quadratic mem
brane smoothness energy, i.e.,
Â£/(ul) = Â£Vc(ul)
c
~ X^((ui+l>j uj) (1 hi'j) + (u,j+l Uu,j) (1 vi,j)) (7*5)
J
163
We can transform the minimization of E(u,l) with respect to u,l to the mini
mization of a function of the boolean variables 1 only as follows.
minÂ£(u,l) = mm min^(u,l) (7.6)
= rnin{f7(l) + rninÂ£(u  1) (7.7)
Define the function
Â£'(1):= 1/(1) + min Â£(ul) (7.8)
Then the original nonconvex minimization problem is transformed to the combinato
rial optimization problem, i.e. minimization of a function of boolean variables. Note
that the function E(u  1) is a quadratic and convex function of u when the operator
in the observation model, K, is linear.
Previously, hybrid algorithms [52, 5] were proposed to use simulated annealing
for the binary line process and analog network for minimizing the quadratic function
Â£(u  1) for each fixed line process. In our hybrid search algorithm, we use an informed
genetic algorithm to minimize this new energy function. To compute the new energy
function value in the GA, we need to minimize the convex and quadratic function
Â£(u  1). The minimum of a quadratic and convex function can be obtained by
solving a linear system with an associated symmetric positivedefinite (SPD) matrix.
Either one of the two fast algorithms namely, the generalized capacitance matrix
algorithm and the adaptive preconditioned conjugate gradient algorithm, presented
in this thesis, can be used to solve the SPD linear system.
164
The overall structure of our algorithm is to use the GA as a global minimizer
to the function E'(l) with a fast quadratic convex function minimizer inside the GA
to obtain the function value of E'( 1). It is important to have a very fast quadratic
convex function minimizer since we need to compute the function value E'( 1) for
each new line process candidate solution 1 at each generation. We employ either
one of our fast algorithms namely, the generalized capacitance matrix algorithm or
the adaptive preconditioned conjugate gradient algorithm, to solve the minimization
problems with prescribed discontinuities.
7.3.1 Elements of Genetic Algorithms
The GA is an adaptive search algorithm based on principles derived from the
dynamics of natural population genetics. It has been successfully applied to combina
torial optimization, function optimization, classifier systems, structure optimization,
pattern recognition, and other areas [27] [24].
A population of candidate solutions is generated in the beginning as the initial
population, denoted by P(0). Then, the GA operators, i.e. reproduction, crossover
and mutation, are applied to the current population P(t) to generate the descendant
population P(t + 1). Subsequently, the algorithm is an iterative adaptation from
one generation to the next based on the genetic operators and the fitness function.
The GA operators are designed to follow the survival of fittest principle. The fitness
function, which is a measure of fitness, in the GA is directly related to the objective
function to be minimized or maximized in an optimization problem.
165
GA can be directly employed to solve the optimization problem by defining a
positive fitness function F(x) in terms of the objective function E(x) which is to
be optimized. For example, we can choose F(x) = E0 + E(x) for the maximization
problem and E(x) = E0 E(x) for the minimization problem, where the constant E0
is used to make sure the fitness F(x) is positive. The fitness function is a measure
for the fitness of survival.
A representation of the solution space has to be chosen for the GA operators.
Normally, the bitstring representation is employed for simplicity. To use the bit
string representation for the real variables (surface), we have to discretize the contin
uous solution space. However, this representation is well suited for boolean variables,
like the line process.
A sketch of the genetic algorithm is as follows [69]:
1. Choose an initial population.
2. Determine the fitness function value for each individual.
3. Perform selection operation.
4. Perform crossover operation.
5. Perform mutation operation.
6. Check stopping criterion; if satisfied, stop; else go to step 2.
Genetic operators
In this section, the three basic genetic operators are briefly described.
166
Reproduction, also called selection, is the process of choosing the candidate so
lutions for the next generation from the candidate solutions in the current generation.
Some selection schemes have been proposed [25], e.g. proportional selection, ranking
selection, tournament selection ..., etc.
Crossover selects two samples from the current population as parents and then
mates them by exchanging a part of their gene values to generate their offsprings.
The parents are replaced by their offsprings in the current population. There
are many possible implementations of the crossover process. For example, we can
randomly select two samples, a start position and an end position in the gene string,
then exchange the 0s and ls of the two samples between the two positions. An
example is illustrated in figure 7.1.
Parent! = 0110100011010101
Parent2 = 1101001010100110
Child1 = 0110001010010101
Child! = 1101100011100110
Figure 7.1. The crossover operation example.
The mutation operator is similar to the perturbation of the Gibbs sampler in
the simulated annealing. It can prevent GAs from premature convergence. It is very
easy to implement this operator. Given a mutation probability, randomly choose a
sample and a position and flip the 0 or 1 at that location according to the mutation
probability.
167
Convergence property
Markov chain models have been employed to analyze the convergence of simple
genetic algorithms in [15, 69, 73]. Davis [15] used a nonhomogeneous finite Markov
chain to model the simple GA (population size is fixed but the mutation and crossover
are allowed to vary with the iteration index) and proved that the convergence rate
of a simple GA is far superior to that of simulate annealing. Rudolph [69] employed
a homogeneous finite Markov chain to model a canonical GA ( fixed population size,
proportional selection, fixed crossover and mutation probabilities ), and proved that
a canonical GA never converges to the global optimum regardless of the initialization.
But, he also showed that modified versions of canonical GAs with elitist selection (
the best individual survives with probability one ) asymptotically converge to the
global minimum in the probabilistic sense. Following the Rudolphs work, Suzuki
[73] proved that the probability of finding the global minimum in n generations for a
canonical GA with elitist selection is lower bounded by 1 (9(A,n), where A < 1
and A, is the largest eigenvalues of the diagonal submatrices in the transition matrix.
In addition, an upper bound for  A*  was derived in terms of the mutation probability.
Therefore, one can find the mutation probability to minimize this upper bound to
achieve best convergence performance in the GA. Because of the advantage of elitist
selection proved in [69, 73], this strategy is included in our informed GA, which will
be discussed in the next section.
168
7.3.2 Informed Genetic Algorithm
In this section, we present an informed GA which incorporates the problem
information into the genetic operators. In traditional GAs [24, 15, 69], only the
reproduction operator depends on the fitness function, while the crossover and mu
tation operators randomly explore the solution space with prespecified probabilities.
Totally random search is a very inefficient way to explore the solution space, and
it makes the GAs converge very slowly especially when the entire solution space is
very large, such as early vision problems. In our informed GA, we use an informed
mutation operator which exploits the problem information in the search, thus making
the solution search in the GA very efficient.
The only search procedure in simulated annealing involves the use of the Gibbs
sampler [22, 90] which is a random perturbation of the current solution and is similar
to the mutation operator in the GA. The Gibbs sampler is used to search the solution
space based on the Gibbs distribution of the energy function. The search in the
Gibbs sampler makes use of the information in the energy function, i.e. the problem,
therefore it is an informed search, not a totally random search. We use the Gibbs
sampler as an informed mutation operator in our informed GA.
Our informed GA consists of the reproduction operator and an informed muta
tion operator only. The main idea is to design the GA operators to make each con
figuration of candidate solutions in the generations of the GA to be a sample drawn
from the Gibbs distributions of the energy function to be minimized. To achieve this
goal, we choose our initial population as a sample from a Gibbs distribution of the
energy function E'(\), i.e.,
ft(l) =
(7.9)
z = 2>*,<1>'r,
i
(7.10)
where T is a constant. Let the population size be fixed and denoted by a constant
M. By using a Gibbs sampler to obtain M samples from the Gibbs distribution
given in equation 7.9, we obtain an initial population of size M, denoted by the set
{1!,12,...,1m}
In the reproduction operation, we use the Boltzmann weighted selection [25, 16]
scheme such that the survival probability for the ith individual is given by
Pr{h)
 LeE'(h)/T
~ Z'
(7.11)
M
Z'
= ^2eE'(lj)/T
J=1
(7.12)
For the informed mutation operator, we assign the probability of mutating the
ith individual 1, to a line process configuration I to be a Gibbs distribution as follows.
170
This mutation transition probability can be simplified to be
Pm{ 1.
1 E'(l)/T
ze
(7.15)
This informed mutation operation can be implemented by using a Gibbs sampler
[22, 90].
By using the above reproduction and mutation operators with the initial popula
tion sampled from the Gibbs distribution given in equation 7.9, we can show that each
individual at the nth generation is a sample from the following Gibbs distribution
Pn(\)
z
1 ~(n+l)E'(\)/T
z
y e(n+i)Â£'(l)/r
(7.16)
(7.17)
Thus, the asymptotic convergence of this informed GA can be proved directly from
equation 7.16 by assuming n > oo.
Our informed GA is summarized as follows:
1. Choose an initial population from a Gibbs distribution in equation 7.9.
2. Determine the fitness function value for each individual.
3. Perform Boltzmann weighted selection.
4. Perform the informed mutation operation by using a Gibbs sampler.
5. Check stopping criterion; if satisfied, stop; else go to step 2.
171
In step 2, we need to compute the fitness function value for each individual line
process configuration. To compute the fitness function value E'( 1) given in equation
7.8, it involves solving a convex and quadratic minimization problem. This can be
very efficiently solved by using either our generalized capacitance matrix algorithm
or our adaptive preconditioned conjugate gradient algorithm, presented in chapters
4 and 5.
7.3.3 Preliminary Experimental Results
We implemented our hybrid search algorithm on the sparse data surface recon
struction problem. In our experiments, our hybrid search algorithm consists of the
informed GA for the binary line process 1 and the generalized capacitance matrix
algorithm for determining the fitness function value for a prespecified line process
configuration. The sparse data set ( see figure 7.2(b) ) is obtained from sampling
the original surface shown in figure 7.2(a). We discretized this problem on a 32 x 32
mesh. The sparse data set shown in figure 7.2(b) contains 64 data points.
In our experiments, we use the cliques for the line process used by Geman and
Geman [22], shown in figure 7.3(a). The line clique potentials for different line process
configurations are shown in figure 7.3(b). This choice of the line clique potentials is
ad hoc. However, they can be obtained via parameter estimation in MRF models
[54].
The Gibbs sampler [22, 90] is used to implement the informed mutation operator
in our informed GA. There are two steps in the Gibbs sampler, i.e., the exploration
step (or perturbation step) and acceptance step. Each time after the exploration
172
Figure 7.2. The sparse data set used for the surface reconstruction with discontinuity
detection, (a) original surface (b) sparsely sampled data set.
step, we need to compute the energy function value for the perturbed new line process
configuration and then determine the acceptance probability PaCcepti given by
i 1 when E'(\new) < E'(lold),
Paccept(\old lneu;) \
y e(Â£'(lneu,)Â£'(l0id))/7 when E'(\new) > E'(l0id),
where 1 m is the line process before the exploration step and lnew is the one after the
exploration step. Computation of the energy function value Â£'(1) given 1 involves
minimizing a quadratic convex function, which is solved via the generalized capac
itance matrix algorithm. It is necessary to repeat these two steps many times for
the Gibbs sampler to converge, therefore the computational cost is extremely high in
implementing the Gibbs sampler. In our experiments, we use an approximate Gibbs
sampler to reduce the computational cost drastically. This Gibbs sampler uses the
approximation that the minimum solution u0jd obtained by minimizing the function
f7(ul0/d) is similar to the minimum solution uneu; for the new perturbed line process
173
O O
O o
vc=o.o
O I o
O I o
Vc = 1.2
OxO
X X
OxO
(a)
O
O
O
O
Vc = 0.25
O o
O I o
vc=0.5
O
o
o
o
Ve = 2.0
(b)
O O
O  O
Vc =2.0
Figure 7.3. (a) Clique for the line process (x: line process, 0: nodal variable for
surface), (b) clique potentials for different line process configurations.
174
\new. By using this approximation, the acceptance probability can be obtained by
computing the local energy change in E' for each ( onesite ) perturbation.
In our experiments, the population size in the informed GA is arbitrarily chosen
to be 100. By using our hybrid search algorithm on the sparse data set shown in
figure 7.2(b), the converged solution, which means the best visited solution remains
the same for many generations, was found after 3 generations of the informed GA.
Figure 7.4(a) shows the reconstructed surface and the detected discontinuities is given
in figure 7.4(b). We can see that this solution is very close to the true solution.
(a)
(b)
Figure 7.4. Surface reconstruction example: (a) the reconstructed surface (b) the
recovered discontinuity map ( solid line ) obtained after 3 generations of the informed
GA and imposed on the true discontinuity map ( dashed line ).
In another experiment, we added noise to the sparse data in figure 7.2(b) to test
our hybrid search algorithm. The noise for each data point i is a uniform random
variable ranging in the interval [0.2dt, 0.2d,], where d, is the data value sampled
from the original shape shown in figure 7.2(a). Figure 7.5(a) shows the noisy sparse
175
data set. Our hybrid search algorithm found the converged solution after 5 gener
ations of the informed GA. The reconstructed surface and detected discontinuities
are shown in figure 7.5(b) & (c), respectively. As evident from this experiment, our
hybrid search algorithm has obtained a very accurate solution for sparse data surface
reconstruction problems with high noise.
(b) (c)
Figure 7.5. Surface reconstruction with noise example: (a) the noisy sparse data set,
(b) the reconstructed surface (c) the recovered discontinuity map ( solid line ) ob
tained after 5 generations of the informed GA and imposed on the true discontinuity
map ( dashed line ).
176
With regards to computational effort, in the above surface reconstruction exper
iments, our generalized capacitance matrix algorithm usually takes less than 1 second
on a multiuser SUN SPARC10 workstation to solve a convex quadratic minimiza
tion problem. The population size was set to 100 in our hybrid search algorithm. In
the experiments, three generations of the hybrid search algorithm were completed in
less than 3 minutes. For 5 generations of the same, it took less than 6 minutes. The
structure of our hybrid search algorithm is primarily a genetic algorithm, which is
easily parallelizable [71], thus making it straightforward to port this algorithm to a
parallel machine. A parallel implementation will make our algorithm a very useful
and efficient technique for solving hard nonconvex optimization problems.
7.3.4 Discussion
A hybrid search algorithm is presented in this chapter to solve the coupled (
binaryreal) nonconvex optimization problem. Our hybrid search algorithm consists
of a novel informed GA and either one of our generalized capacitance matrix algo
rithm or adaptive preconditioned conjugate gradient algorithm. The informed GA
is used as a stochastic global minimizer for a new energy function consisting of the
binary line process variables only. Inside the GA, either one of the above two de
terministic algorithms is used to determine the new energy function values for each
line process configuration visited in the GA. The performance of this hybrid search
algorithm is demonstrated through experiments on the sparse data surface recon
struction problem. Since the GA is a highly parallizable algorithm, we can improve
the efficiency of this hybrid search algorithm by porting it to a parallel machine.
177
The computational cost of our hybrid search algorithm will be very high when
the problem size is large. One possible way to improve the efficiency of this algorithm
is to modify the informed GA as a multiscale stochastic optimization method. To
achieve this, it is necessary to construct a multiscale representation for the line process
field. Heitz et al. [33] constructed a multiscale representation for a label field and
used a multiscale relaxation algorithm to minimize the energy function for some early
vision problems, ft is possible to generalize this construction for the line process
field and apply a multiscale informed GA to make the hybrid search algorithm more
efficient.
CHAPTER 8
CONCLUSIONS
This thesis reported the development of several novel and efficient computational
algorithms for solving early vision problems using either a regularization formulation
or an MRF formulation.
We presented the generalized capacitance matrix theorems and an accompany
ing algorithm for solving the linear systems arising from the discretization of the
early vision problems with prespecified discontinuities. By using these theorems, we
transformed the original linear system arising from the discretization of early vision
problems into a Lyapunov matrix equation or a cascaded of two Lyapunov matrix
equations with an appropriate righthand side. The Lyapunov matrix equations are
solved by the ADI method, which was proved to converge for a prespecified error
tolerance in a constant number of iterations with each iteration taking O(N) oper
ations. The righthand side of the Lyapunov matrix equation can be obtained by
solving an associate capacitance matrix linear system. A modified BCG algorithm
was used to solve this dense, nonsymmetric and indefinite linear system. This al
gorithm was implemented and tested on the surface reconstruction and shape from
shading problems to demonstrate its efficiency.
As discussed in chapter 5, the generalized capacitance matrix algorithm is not
very efficient for the early vision problems with dense data and nonuniform weighting
178
179
case. Hence, we presented an alternative algorithm called the adaptive precondition
ing technique as a general solution to early vision problems with prespecified disconti
nuities. This adaptive preconditioner was constructed in a wavelet basis based on the
approximation to the spectral characteristics of the smoothness and data constraints
by using the division of frequency domain property in a wavelet basis. We empirically
showed the superiority of this adaptive preconditioner over other preconditioners pre
viously proposed in vision literature through experiments on surface reconstruction,
shape from shading and optical flow computation.
In the context of robustness, we developed two new and robust algorithms for
accurate optical flow estimation. One is a modified gradientbased algorithm which
combines the image flow constraint and the contourbased flow constraint into the
data constraint in a regularization framework to amend the errors in the image flow
constraint caused by brightness discontinuities. To solve the linear system arising
from the regularization formulation, our adaptive preconditioned conjugate gradient
algorithm was employed, leading to an efficient algorithm. The other is an SSD
based regularization method that uses the SSD measure as the data constraint in
a regularization framework. The preconditioned nonlinear conjugate gradient with
a modified search direction scheme was developed to minimize the resulting energy
function. Experimental results for these two algorithms are given to demonstrate
their performance.
Finally, in future research directions, we presented a novel hybrid search algo
rithm which involved a combination of stochastic and deterministic search techniques.
180
We proposed an informed genetic algorithm for the stochastic search, while using ei
ther one of our generalized capacitance matrix algorithm or adaptive preconditioned
conjugate gradient algorithm for the deterministic search. We presented some promis
ing preliminary results to demonstrate the effectiveness of this algorithm in recovering
shape along with its discontinuities from sparse range data. However, extensive ex
perimentation as well as further theoretical research needs to be performed prior to
judging the practicality of this algorithm in solving early vision problems leading to
nonconvex minimization.
REFERENCES
[1] B. K. Alpert, Wavelets and other bases for fast numerical linear algebra
in Wavelets A Tutorial in Theory and Applications, pp. 181216, Boston:
Academic Press Inc., 1992.
[2] P. Anandan, A computational framework and an algorithm for the measure
ment of visual motion , Intern. J. Comput. Vision, vol. 2, pp. 283310, 1989.
[3] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, Performance of optical flow
techniques, Inti. Journ. of Comput. Vision, vol. 12, no. 1, pp. 4377, 1994.
[4] L. Bedini, I. Gerace, and A. Tonazzini, A deterministic Algorithm for recon
structing images with interacting discontinuities , CVGIP: Graphical Models
and Image Processing vol. 56, pp. 109123, 1994.
[5] L. Bedini and A. Tonazzini, Image restoration preserving discontinuities: the
Bayesian approach and neural networks, Image and Vision Computing vol. 10,
no. 2, pp. 108118, 1992.
[6] M. Bertero, T. A. Poggio, and V. Torre, Illposed problems in early vision ,
Proceedings of the IEEE, vol. 76, pp. 869889, August 1988.
[7] G. Beylkin, R. R. Coifman, and V. Rokhlin, Fast wavelet transforms and
numerical algorithm I, Comm, on Pure and Applied Math., vol. 44, pp. 141
183, 1991.
[8] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: The MIT
Press, 1987.
[9] D. Braess, The convergence of a multigrid method with GaussSeidel relaxation
for the Poisson equation, Mathematics of Computation, vol. 42, pp. 505519,
April 1984.
[10] B. L. Buzbee, F. W. Dorr, J. A. George, and G. H. Golub, The direct solution
of the discrete Poison equation on irregular regions, SIAM Journal of Numerical
Analysis, vol. 8, pp. 722736, December 1971.
[11] S. L. Campbell and C. D. Meyer, Generalized Inverses of Linear Transforma
tions. London: Pitman Pub., 1979.
181
182
[12] A. K. Chhabra and T.A. Grogan, On Poisson solvers and semidirect methods
for computing area based optical flow, IEEE Trans. Patt. Anal. Machine Intell.,
vol. 16, Nov. 1994.
[13] M. Creutz, Microcanonical Monte Carlo simulation , Physical Rev. Lett.,
vol. 50, pp. 14111414, 1983.
[14] G. Dahlquist and A. Bjorck, Numerical Methods. Englewood Cliffs, NJ: Prentice
Hall Inc., 1974.
[15] T. Davis, Toward an Extrapolation of the Simulated Annealing Convergence
Theory onto the Simple Genetic Algorithm. Ph.D. Dissertation, University of
Florida, 1991.
[16] M. de la Maza and B. Tidor, Boltzmann weighted selection improves per
formance of genetic algorithms , Technical Report AI Memo 1345, MIT, A.I.
LAB., Cambridge, MA, 1991.
[17] J. H. Duncan and T.C. Chou, On the detection of motion and the computation
of optical flow, IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI14, pp. 346
352, March. 1992.
[18] D. J. Fleet and A. D. Jepson, Computation of component image velocity from
local phase information, Inti. Journ. of Comput. Vision, vol. 5, pp. 77104,
1990.
[19] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative solution of linear
system , in Acta Numrica 1992, pp. 57100, New York: Cambridge University
Press, 1992.
[20] D. Gabor, Theory of communication , Journal of the IEE, vol. 93, pp. 429
457, 1946.
[21] D. Geiger and F. Girosi, Parallel and deterministic algorithms from MRFs:
surface reconstruction , IEEE Trans. Patt. Anal. Machine Intell., vol. 13, no. 5,
pp. 401412, 1991.
[22] S. Geman and D. Geman, Stochastic relaximation, Gibbs distributions, and
the Bayesian restoration of images , IEEE Trans. Patt. Anal. Machine Intell.,
vol. 6, no. 6, pp. 721741, 1984.
[23] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization. New York:
Academic Press, 1981.
[24] D. E. Goldberg, Genetic Algorithm in Search, Optimization and Machine Learn
ing. Reading, MA: AddisonWesley, 1989.
183
[25] D. E. Goldberg and K. Deb, A comparative analysis of selection schemes used
in genetic algorithms, Technical Report TCGA Report No. 90007, Univ. of
Alabama, 1990.
[26] G. H. Golub and C. F. Van Loan, Matrix Computations Baltimore: The Johns
Hopkins University Press, 2nd ed., 1989.
[27] J. J. Grefenstette, Genetic Algorithms and their applications , in The En
cyclopedia of Computer Science and Technology (A. Kent and J. G. Williams,
eds.), vol. 21, pp. 139152, New York: Marcel Dekker, 1990.
[28] W. E. L. Grimson, An implementation of computational theory for visual sur
face interpolation , Computer Vision, Graphics, and Image Processing, vol. 6,
no. 22, pp. 3669, 1983.
[29] W. Hackbusch and U. Trottenberg, eds., Multigrid Methods (Lecture Notes in
Mathematics), vol. 960. New York: SpringerVerlag, 1982.
[30] W. W. Hager, Updating the inverse of a matrix , SIAM Review, vol. 31,
pp. 221239, June 1989.
[31] J. G. Harris, C. Koch, E. Staats, and J. Luo, u Analog hardware for detect
ing discontinuities in early vision , International Journal of Computer Vision,
vol. 4, pp. 211223, 1990.
[32] F. Heitz and P. Bouthemy, Multimodal estimation of discontinuous optical
flow using Markov random fields , IEEE Trans. Patt. Anal. Machine Intel!,
vol. PAMI15, pp. 12171232, 1993.
[33] F. Heitz, P. Perez, and P. Bouthemy, Multiscale minimization of global energy
functions in some visual recovery problems, CVGIP: Image Understanding,
vol. 59, no. 1, pp. 125134, 1994.
[34] H. V. Henderson and S. R. Searle, On deriving the inverse of a sum of matrices
, SIAM Review, vol. 23, no. 1, pp. 5360, 1981.
[35] E. C. Hildreth, Computations underlying the measurement of visual motion ,
Artificial Intelligence, vol. 23, pp. 309354, 1984.
[36] J. J. Hopfield and D. W. Tank, Neural computation in optimization problems,
Biological Cybernetics, vol. 52, pp. 141152, 1985.
[37] B. K. P. Horn, Robot Vision. Cambridge, MA: The MIT Press, 1986.
[38] B. K. P. Horn, Height from shading , Inti. Journal of Computer Vision, vol. 5,
no. 1, pp. 174208, 1990.
[39] B. K. P. Horn and M. J. Brooks, eds., Shape from Shading. Cambridge, MA:
The MIT Press, 1989.
184
[40] B. K. P. Horn and B. G. Schunk, Determining optical flow , Artificial Intel
ligence, vol. 17, pp. 185203, 1981.
[41] K. Ikeuchi and B. K. P. Horn, Numerical shape from shading and occluding
boundaries, Artificial Intelligence, vol. 17, pp. 141184, 1981.
[42] S. Jaffard, Wavelet methods for fast resolution of elliptic problems , SIAM
Journal of Numerical Analysis, vol. 29, pp. 965986, August 1992.
[43] M. Kass, A. Witkin, and D. Terzopoulos, Snakes: active contour models, Inti.
Journal of Computer Vision, vol. 1, pp. 321331, 1987.
[44] C. Koch, J. Marroquin, and A. Yuille, Analog neuronal networks in early
vision , Proc. Natl. Acad. Sci. USA, vol. 83, pp. 42634267, 1986.
[45] S. H. Lai and B. C. Vemuri, An O(N) iterative solution to the Poisson equa
tions in lowlevel vision problems, in Proceedings of the IEEE Conference on
Computer Vision & Pattern Recognition, (Seattle, WA), pp. 914, June 1994.
[46] C. Lanczos, An iteration method for the solution of the eigenvalue problem
of linear differential and integral operators , J. Res. Natl. Bur. Stand., vol. 45,
pp. 255282, 1950.
[47] Y. G. Leclerc, Constructing simple stable descriptions for image partition ,
International Journal of Computer Vision, vol. 3, pp. 73102, 1989.
[48] A. Lu and E. L. Wachspress, Solution of Lyapunov equations by alternating
direction implicit iteration , Computers Math. Applic., vol. 21, no. 9, pp. 4358,
1991.
[49] B. Lucas and T. Kanade, An iterative image registration technique with an
application to stereo vision , in Proceedings of the 7th International Joint Con
ference on Artificial Intelligence, (Vancouver, B.C., Canada), pp. 121130, 1981.
[50] S. G. Mallat, A theory of multiresolution signal decomposition , IEEE Trans.
Patt. Anal. Machine Intell., vol. PAMI11, no. 7, pp. 647693, 1989.
[51] J. Marroquin, S. Mitter, and T. Poggio, Probabilistic solution of illposed prob
lems in computational vision , Journal of the American Stat. Assoc., vol. 82,
no. 397, pp. 7689, 1987.
[52] J. L. Marroquin, Probabilistic Solution of Inverse Problems. Ph.D. Dissertation,
Massachusetts Institute of Technology, 1985.
[53] D. Mumford and J. Shah, Boundary detection by minimizing functionals , in
Proceedings of the IEEE Conference on Computer Vsion and Pattern Recogni
tion, (San Francisco, CA), pp. 2226, June 1985.
185
[54] S. G. Nadabar and A. K. Jain, Parameter estimation in MRF line process
models, in Proceedings of the IEEE Conference on Computer Vsion and Pattern
Recognition, (Champaign, IL), pp. 528533, IEEE, June 1992.
[55] H.H. Nagel, Constraints for the estimation of displacement vector fields from
image sequences, in Proceedings of the 8th International Joint Conference on
Artificial Intelligence, (Karlsruhe, Germany), pp. 945951, 1983.
[56] H. H. Nagel and W. Enkelmann, An investigation of smoothness constraints
for the estimation of displacement vector fields from image sequences , IEEE
Trans. Patt. Anal. Machine Intell., vol. PAMI8, no. 5, pp. 565593, 1986.
[57] G. M. Nielson and R. Franke, A method for construction of surfaces under
tension, Rocky Mountain Journal of Mathematics, vol. 14, no. 1, pp. 203221,
1984.
[58] D.P. OLeary, A note on the capacitance matrix algorithm, substructuring,
and mixed or neumann boundary conditions, Applied Numerical Mathematics,
vol. 3, pp. 339345, 1987.
[59] A. V. Oppenheim, Discretetime Signal Processing. Englewood Cliffs, NJ: Pren
tice Hall, 1989.
[60] A. Papoulias, liCurve Segmentation Using Weak Continuity Constraints, Mas
ters thesis, University of Edinburgh, 1983.
[61] A. Pentland, Interpolation using wavelet basis , IEEE Trans. Patt. Anal.
Machine Intell., vol. PAMI16, no. 4, pp. 410414, 1994.
[62] T. Poggio and V. Torre, IllPosed problems and regularization analysis in early
vision , Technical Report AI Memo 773, MIT, A.I. LAB. Cambridge, MA 1984.
[63] T. Poggio, V. Torre, and C. Koch, Computational vision and regularization
theory , Nature, vol. 317, pp. 314319, 1985.
[64] M. Proesmans, E. Pauwels, and L. Van Gool, Coupled geometrydriven diffu
sion equations for lowlevel vision , in GeometriyDriven Diffusion in Computer
Vision, pp. 191228, Dordrecht, The Netherlands: Kluwer Academic Pub., 1994.
[65] W. Proskurowski and 0. Wildlund, On the numerical solution of Helmholtzs
equation by the capacitance matrix method, Math. Comp., vol. 30, pp. 433468,
July 1976.
[66] W. Proskurowski and 0. Wildlund, A finite element capacitance matrix method
for the Neumann problem for Laplaces equation, SIAM J. Sci. Statist. Corn
put., vol. 1, no. 4, pp. 410425, 1980.
186
[67] K. S. Riedel, A ShermanMorrisonWoodbury identity for rank augmenting
matrices with application to centering SIAM J. Matrix Anal. Appl., vol. 13,
no. 2, pp. 659662, 1992.
[68] 0. Rioul and M. Vitterli, Wavelets and signal processing , IEEE Signal Pro
cessing Magazine, vol. 8, pp. 1438, 1991.
[69] G. Rudolph, Convergence analysis of canonical genetic algorithms , IEEE
Trans. Neural Networks, vol. 5, no. 1, pp. 96101, 1994.
[70] J. Shah, Segmentation by nonlinear diffusion , in Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition, (Maui, HI), pp. 202
207, June 1991.
[71] R. Shonkwiler and E. Van Vleck, Parallel speedup of Monte Carlo methods
for global optimization, Journal of Complexity, vol. 10, pp. 6495, 1994.
[72] T. Simchony, R. Chellappa, and M. Shao, Direct analytical methods for solv
ing Poisson equations in computer vision problems , IEEE Trans. Patt. Anal.
Machine Intell., vol. 12, May 1990.
[73] J. Suzuki, A Markov chain analysis on simple genetic algorithm, IEEE Trans.
Systems, Man and Cybernetics, vol. 25, no. 4, pp. 655659, 1995.
[74] R. Szeliski, Bayesian Modeling of Uncertainty in Lowlevel Vision. Boston:
Kluwer Academic Publishers, 1989.
[75] R. Szeliski, Fast surface interpolation using hierarchical basis functions, IEEE
Trans. Patt. Anal. Machine Intell., vol. 12, no. 6, pp. 513528, 1990.
[761 R. Szeliski, Fast shape from shading , CVGIP: Image Understanding, vol. 53,
no. 2, pp. 129153, 1991.
[77] R. Szeliski and J. Coughlan, Hierarchical splinebased image registration , in
Proceedings of the IEEE Conference Computer Vision & Pattern Recognition,
(Seattle, WA), pp. 194201, 1994.
[78] D. Terzopoulos, Multilevel computational processes for visual surface recon
struction , Computer Vision, Graphics, Image Processing, vol. 24, pp. 5296,
1983.
[79] D. Terzopoulos, Image analysis using multigrid relaxation methods , IEEE
Trans. Patt. Anal. Machine Intell., vol. 8, pp. 129139, 1986.
[80] D. Terzopoulos, Regularization of inverse visual problems involving disconti
nuities , IEEE Trans. Patt. Anal. Machine Intell., vol. 8, no. 4, pp. 413424,
1986.
187
[81] D. Terzopoulos, The computation of visiblesurface representations , IEEE
Trans. Patt. Anal. Machine Intel!, vol. 10, no. 4, pp. 417438, 1988.
[82] A. N. Tikhonov and V. Y. Arsenin, Solutions of IllPosed Problems. Washington,
DC: Winston & Sons, 1977.
[83] V. Torre and T. Poggio, On edge detection , IEEE Trans. Patt. Anal. Machine
Intel!, vol. PAMI8, pp. 147163, 1986.
[84] S. Uras, F. Girosi, A. Verri, and V. Torre, A computational approach to motion
perception, Bio! Cybern., vol. 60, pp. 7987, 1988.
[85] B. C. Vemuri and S. H. Lai, A fast solution to the surface reconstruction prob
lem , in Proceedings of SPIE Conference on Mathematical Imaging: Geometric
Methods in Computer Vision, (San Diego, CA), pp. 2737, SPIE, July 1993.
[86] B. C. Vemuri and A. Radisavljevic, Multiresolution stochastic hybrid shape
models with fractal priors , ACM Trans, on Graphics, vol. 13, pp. 177207,
April 1994.
[87] E. L. Wachspress, Extended application of alternating direction implicit itera
tion model problem theory , Journal of SIAM, vol. 11, pp. 9941016, December
1963.
[88] G. Wahba, Spline Models for Observational Data. Philadelphia: Society for
Industrial and Applied Mathematics, 1990.
[89] G. Whitten, Scale space tracking of deformable sheet models for computational
vision , IEEE Trans. Patt. Ana! Mach. Intel!, vol. PAMI15, pp. 697706, July
1993.
[90] G. Winkler, Image Analysis, Random Fields and Dynamic Monte Carlo Methods.
Berlin, Germany: Springer Verlag, 1995.
[91] J. Xu, Iterative methods by space decomposition and subspace correction,
SIAM Review, vol. 34, pp. 581613, December 1992.
[92] M. H. Yaou and W. T. Chang, Fast surface interpolation using multiresolution
wavelet transform , IEEE Trans. Patt. Anal, and Machine Intel!, vol. PAMI
16, no. 7, pp. 673688, 1994.
[93] H. Yeserentant, On multilevel splitting of finite element spaces , Numeriche
Mathematik, vol. 49, pp. 379412, 1986.
BIOGRAPHICAL SKETCH
ShangHong Lai was born in ChangHua, Taiwan, Republic of China, on August
16, 1964. He received his Bachelor and Master of Science degrees in electrical engi
neering from National TsingHua University, HsinChu, Taiwan, in 1986 and 1988,
respectively. From 1988 to 1990, he served as a communication officer in the army
for the compulsory military service. After finishing the military service, he worked in
the Institute of Information Science, Academia Sinica, Taipei, as a research assistant
for one year. Since 1991, he entered the Department of Electrical and Computer
Engineering at the University of Florida, Gainesville, to persue his Ph.D. degree. His
research interests include computer vision, image processing, pattern recognition,
scientific computing and optimization algorithms.
188
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
Baba CXVemuri, Chair
Associate Professor of Electrical
and Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
John Staudhammer
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
KeiYh L. Doty
Professor of Electrical and
Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation for the degree of Doctor of Philosophy.
in G. Harris
assistant Professor of Electrical
and Computer Engineering
I certify that I have read this study and that in my opinion it conforms
to acceptable standards of scholarly presentation and is fully adequate, in
scope and quality, as a dissertation Ijoj/the degree of Doctor of Philosophy.
Permit N. Sigmon
Associate Professor of Mathematics
This dissertation was submitted to the Graduate Faculty of the College
of Engineering and to the Graduate School and was accepted as partial ful
fillment of the requirements for the degree of Doctor of Philosophy.
August 1995
Winfred M. Phillips
Dean, College of Engineering
Karen A. Holbrook
Dean, Graduate School
CHAPTER 8
CONCLUSIONS
This thesis reported the development of several novel and efficient computational
algorithms for solving early vision problems using either a regularization formulation
or an MRF formulation.
We presented the generalized capacitance matrix theorems and an accompany
ing algorithm for solving the linear systems arising from the discretization of the
early vision problems with prespecified discontinuities. By using these theorems, we
transformed the original linear system arising from the discretization of early vision
problems into a Lyapunov matrix equation or a cascaded of two Lyapunov matrix
equations with an appropriate righthand side. The Lyapunov matrix equations are
solved by the ADI method, which was proved to converge for a prespecified error
tolerance in a constant number of iterations with each iteration taking O(N) oper
ations. The righthand side of the Lyapunov matrix equation can be obtained by
solving an associate capacitance matrix linear system. A modified BCG algorithm
was used to solve this dense, nonsymmetric and indefinite linear system. This al
gorithm was implemented and tested on the surface reconstruction and shape from
shading problems to demonstrate its efficiency.
As discussed in chapter 5, the generalized capacitance matrix algorithm is not
very efficient for the early vision problems with dense data and nonuniform weighting
178
113
the wavelet basis. Note that for the sparse data case, the matrix Ks dominates K<.
Therefore, bad approximations to the wavelet transform of Ks will deteriorate the
preconditioning effect and slow down the convergence speed.
Figure 5.11. The convergence for different preconditioners in the preconditioned con
jugate gradient algorithm for the surface reconstruction problem with sparse sym
metric data set using a controlledcontinuity stabilizer.
The third experiment uses the sparse data set given in figure 5.3(b). The size
of the problem is maintained at 64 x 64. The controlledcontinuity stabilizer with
p 1 and r = 0.01 is used as the smoothness constraint here. Note that the
data set is symmetric with respect to the center. Using the circular convolution in
the QMF implementation will speed up the convergence rate of the wavelet based
preconditioning algorithm in this case. This is because the wavelet preconditioner
with circular convolution operates on the residue vector periodically in 2D. This
is very helpful for symmetric data as the solution is supposed to be symmetric in
this case. Yaou h Chang [92] used the circular convolution in their wavelet basis
transfer and most of their experiments are performed on the symmetric data sets
175
data set. Our hybrid search algorithm found the converged solution after 5 gener
ations of the informed GA. The reconstructed surface and detected discontinuities
are shown in figure 7.5(b) & (c), respectively. As evident from this experiment, our
hybrid search algorithm has obtained a very accurate solution for sparse data surface
reconstruction problems with high noise.
(b) (c)
Figure 7.5. Surface reconstruction with noise example: (a) the noisy sparse data set,
(b) the reconstructed surface (c) the recovered discontinuity map ( solid line ) ob
tained after 5 generations of the informed GA and imposed on the true discontinuity
map ( dashed line ).
82
computer vision literature. In this chapter, we will introduce a physicallybased adap
tive preconditioning technique which will outperform in computational efficiency
 all previously proposed preconditioning methods in vision literature namely, the
hierarchical basis preconditioned conjugate gradient of Szeliski [75] and the change
of basis methods of Yaou et al. [92] and Pentland [61]. In addition, it does not have
any restriction on the choice of the regularization parameter A for its convergence.
In [75, 76], Szeliski presented a preconditioning technique based on the use of
hierarchical basis functions. This technique is based on the multilevel splitting of the
finite element spaces presented in [93]. The hierarchical basis functions naturally lead
to a pyramidal representation of the unknown function being solved for. Empirical
convergence results were shown for preconditioned conjugate gradient in comparison
with the standard conjugate gradient method. A key piece of information that was
overlooked in Szeliskis work [75, 76] was that, in designing a preconditioner for the
conjugate gradient algorithm, no information about the problem being solved was
used i.e., once a basis set was chosen, the same preconditioner was used regardless
of the imposed smoothness or data constraints. Thus, the true power of a precondi
tioning transform was not exploited to its fullest.
More recently, Pentland [61] presented a technique for surface interpolation us
ing a change of basis to orthonormal wavelets as a preconditioning transform. He
uses the results of Beylkin et al., [7] which states that the application of (TV x TV)
matrices corresponding to any pseudodifferential and CalderonZygmund opera
tors to arbitrary vectors requires either O(TV) or O(TVlogTV) operations based on
140
In step 5, there are two ways for choosing f3. The first is FletcherReeves formula and
the other is PolakRibiere formula. We adopt the latter since it usually converges
much more quickly, although the nonlinear CG needs to restart when f) = 0. Note
that /'(u) and /"(u) in this algorithm denote the gradient vector and the Hessian
matrix of the function /. From equation 6.10, we can obtain
/'(u) =
f'd,u
+ A
Ks 0
. /*.
0 Ks
u
/ =
Kd,uu t~ AKS
Kdyitv Kd,vv t" AKS
(6.12)
(6.13)
where the matrix K, is the Laplacian matrix obtained from the membrane smoothness
constraint, the vectors f'd u and f'd are the concatenations of the components
and f'dvi respectively, K<)UU, KdyXlv and Kd}VV are all diagonal matrices with the ith
diagonal entries Kd
Kd,uu,ii Kd,uv,i and Kd,w,i corresponding to the ith location are defined as follows:
f
J d,u,i
f
J d,v,i
f^d,uu,i
R d,uv,i
^d,vv,i
dli(xj + u,,y, + u,)
dx
dh{xj + u,,y, + Vj)
dx
,dh(x, + u y, + i>,) 2 d2h{xj + Uj,yj f u,)^
' dx
w
w
dx2
(6.14)
(6.15)
(6.16)
^ dh(xj + ut, y, + Vi)^dh(xi + u, y, + t>) ^ + d2Il{xl + u,, y, +
dx
dy
dxdy
, dh(xi + Ui, y{ + Vi) 2 d2h{xi + iq, y{ + v{)
( ) + w
dy
dy2
(6.18)
EFFICIENT COMPUTATIONAL METHODS FOR EARLY VISION PROBLEMS
By
SHANGHONG LAI
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
1995
17
Combining equations 2.9 and 2.11, we can see that the MAP estimate can be obtained
by minimizing the following energy function
(/(did) = r+(/(f,i) (2.i2)
= idir(f)i*+j:fc(fii)+s:%(i) (2i3)
C C
Investigating equations 2.6 and 2.13, we can see that these two energy functions are
very similar. If we choose the parameters and the clique potential to make these
two energy functions identical, then, both the deterministic formulation and the
MRF formulation give rise to minimizing the same energy function. Therefore, the
correspondence between the two formulations is obvious.
The advantages of the probabilistic model over a deterministic model are two
fold:
1. The MRF model can include very general constraints on the line process, while
the deterministic model is restricted in the types of constraints considered.
The MRF model regards the imposed constraints as a prior probability, which is
determined by the clique potential. In fact, we can estimate the clique potential
functions for different object models instead of ad hoc or heuristic choices [54].
2. A probabilistic model can provide second (covariance) or higher order statistics
of the estimate in addition to the MAP estimate. The higher order statis
tical information is very useful in the context of dynamic vision [74], sensor
14
The energy in equation 2.6 is a nonconvex function of real variables coupled
with boolean variables. It is difficult to solve a nonconvex minimization problem
since traditional gradient decent methods are easily trapped in local minima.
The deterministic formulation leads to satisfactory results for some early vision
problems, but this formulation is not general enough to include arbitrary constraints
for the line process. We will see in the next section that the probabilistic formu
lation using the MRF model is more flexible in this context than the deterministic
formulation.
2.2 Probabilistic Framework
MRF (Markov Random Field) is a generalization of Markov process in ID to
2D. To define an MRF [22], we have to define a neighborhood system first. Let S be
a finite set ofnxn sites, and G = {G,j, (i,j) G S'} be a neighborhood system for S'
such that
(1) (t,j) 0 Gij, for all (i,j) G S.
(2) (i,j) Gkli, if and only if (k,l) G,j, for all (i,j),(k,l) S.
Let F = {Fij, (i,j) S} be a random field, which is a family of random variables
indexed by the twodimensional site (i,j) S. Any possible sample realization
f = (/i,i,...,/n,n) is called a configuration of the field. Let 0 be the set of all
possible configurations, i.e. the sample space, and P be a probability measure on .
Then F is an MRF with respect to G if
(1) P(F = f) > 0, for all f G fi.
4
Direct solutions to the associate linear systems from minimization of the energy
functions are based on a theorem called the capacitance matrix theorem which was
stated and proved in Buzbee et al. [10]. However, this capacitance matrix theorem
is restricted to some early vision problems and on the design on numerical algo
rithms [45]. The direct solutions using the capacitance matrix theorem employ the
FourierToeplitz method in combination with LU decomposition [72], the Cholesky
decomposition [10] or the conjugate gradient [65] technique. The FourierToeplitz
method requires 0(N log N) time and the LU decomposition, Cholesky factoriza
tion or the conjugate gradient require 0(n3) = 0(Ny/), thus making the overall
complexity 0(Ny/).
In this thesis, we present two efficient algorithms for the minimization of the en
ergy functions in early vision. One is a generalized capacitance matrix algorithm and
the other is an adaptive preconditioning technique in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm. The development of the gener
alized capacitance matrix algorithm is based on the generalized capacitance matrix
theorems that we state and prove in this thesis. In our algorithm, we first transform
the linear system into a Lyapunov matrix equation or a cascade of two Lyapunov
matrix equations with an appropriate righthand side (RHS). This RHS is obtained
by solving an associated linear system with a capacitance matrix. The Lyapunov ma
trix equations are solved using the alternating direction implicit method (ADI) while
the solution to the capacitance matrix linear system is obtained using a modified
32
system as long as it has a solution. In addition, with this generalized capacitance
matrix theorem, we have ample freedom to choose a wellstructured matrix A for de
signing more efficient capacitance matrix algorithms. Unlike in [58], no construction
of auxiliary rankaugmented matrices is required since we do not use the Sherman
MorrisonWoodbury formula in our generalization, instead our resulting algorithm
uses only the eigenvectors in null(AT)\null(JZT) to project the RHSs of the linear
systems with matrix A onto range(A). To achieve the projection of the RHS onto
range(A), we subtract the RHS along the vectors Uj,l < j < m\ m, instead of
subtracting along the eigenvectors in the null space of A as in [58], By using our
projection technique, the sparsity of the RHS can be preserved, while the sparsity of
the RHS is usually destroyed after subtracting along the eigenvectors as in [58]. We
take advantage of this sparsity property to obtain fast numerical algorithms in our
work. Unlike in [58], the capacitance matrix and the capacitance matrix equation
are explicitly given in our generalized capacitance theorems and the algorithm. By
explicitly specifying the capacitance matrix and the capacitance matrix equation, it
is possible to modify the RHSs of these equations in accordance with the singularities
of A and B. With the structure of the capacitance matrix equation given explicitly,
it is easy to design an efficient numerical algorithm to solve this equation. When
comparing the generalized inverse forms of (A T UVT) proposed in [34] with our
generalized capacitance matrix theorems, we observe that the generalized inverse for
mula of Henderson and Searle [34] is a degenerate result of our theorem 2, which has
a more relaxed assumption, i.e. null(A) C null(B) and range(B) C range(A), than
174
\new. By using this approximation, the acceptance probability can be obtained by
computing the local energy change in E' for each ( onesite ) perturbation.
In our experiments, the population size in the informed GA is arbitrarily chosen
to be 100. By using our hybrid search algorithm on the sparse data set shown in
figure 7.2(b), the converged solution, which means the best visited solution remains
the same for many generations, was found after 3 generations of the informed GA.
Figure 7.4(a) shows the reconstructed surface and the detected discontinuities is given
in figure 7.4(b). We can see that this solution is very close to the true solution.
(a)
(b)
Figure 7.4. Surface reconstruction example: (a) the reconstructed surface (b) the
recovered discontinuity map ( solid line ) obtained after 3 generations of the informed
GA and imposed on the true discontinuity map ( dashed line ).
In another experiment, we added noise to the sparse data in figure 7.2(b) to test
our hybrid search algorithm. The noise for each data point i is a uniform random
variable ranging in the interval [0.2dt, 0.2d,], where d, is the data value sampled
from the original shape shown in figure 7.2(a). Figure 7.5(a) shows the noisy sparse
62
is 0(N) by proving that the ADI method can converge in a constant number of it
erations independent of N. In addition, we present a modified biconjugate gradient
method that is used in solving the capacitance linear system.
4.3.1 ADI Method for the Lyapunov Equation
The ADI method for solving a Lyapunov matrix equation is described in [48].
For any Lyapunov matrix equation of the form AX + XA = B, the ADI method
involves the following steps in each iteration j = 1,2,..., J,
(A + PjI)Xj_i = B Xj_,(A pjl) (4.12)
Xj(A+pI) = B (A PjI)Xj_i (4.13)
For our problem, note that the matrix A is circulant Toeplitz and symmetric positive
semidefinite as discussed in section 4.2. The special structure of matrix A helps in
advancing the ADI iterations very efficiently in each step of the technique. On the
lefthand side (LHS) of the equations 4.12 and 4.13 respectively, the matrix A + Pjl
is very close to being tridiagonal except for the nonzero entries at the corners namely,
the entries (l,n) and (n,l). These nonzero corner entries are caused due to the doubly
periodic boundary conditions discussed in the last section. In addition, A+pjl is SPD
for any positive parameter pj. Therefore, we can compute the LU decomposition of
this matrix and then use the forward and back substitution to compute the updated
solution. Due to the the special structure of the matrix A + pjl, the solution update
requires only O(N) time per iteration.
CHAPTER 5
ADAPTIVE PRECONDITIONING TECHNIQUE
5.1 Introduction
The generalized capacitance matrix algorithm presented in chapter 4 is very
efficient to solve the linear systems arising from the early vision problems. However,
this technique can prove to be inefficient for dense data problems with nonuniform
weighting on the data constraints, such as the optical flow problem, since the size
of the associated ( dense ) capacitance matrix is too large. Although this problem
may be circumvented by incorporating the capacitance matrix technique as part of
an iterative scheme [72], however it was pointed out in [12] that this semidirect
numerical scheme only converges when the regularization parameter A is very large.
In this chapter, we present a novel adaptive preconditioning technique which when
used in conjunction with a conjugate gradient algorithm can solve the linear systems
very efficient. This adaptive preconditioning technique can be universally applied to
solve any early vision problem.
Numerical iterative methods have been popular for solving the linear systems in
early vision problems. Several multiresolution preconditioning techniques [75, 61, 92]
have been proposed to accelerate the convergence speed of the iterative methods in
81
107
APCG
HCG
CG
(a)
Figure 5.4. (a), (d), (g) & (j) are the computed solutions using the APCG algorithm,
(b), (e), (h) & (k) are the computed solutions using HCG algorithm, (c), (f), (i) &
(1) are the computed solutions using conjugate gradient algorithm, after 1, 5, 10 and
20 iterations, respectively.
33
the assumption, range(B) C range(A) for symmetric A & B, made in [34]. When
applied to solve linear systems using the capacitance matrix approach with the matri
ces A and B satisfying the aforementioned assumption, both the generalized inverse
formula in [34] and our theorem 2 lead to the same algorithm.
We will state and prove the generalized capacitance matrix theorems for different
types of relationships between the matrices A and B subsequently. In the next
chapter, the generalized capacitance matrix algorithm will be constructed based on
the theorems presented in this chapter.
3.2 Generalized Capacitance Matrix Theorems
We categorize the generalized capacitance matrix theorems according to the
type of relationship between the matrices A and B. The first theorem is for the case
when null(B) C null(A) and range(A) C ranre(B), while the second theorem is
for the assumption null(A) C null(B) and range(B) C range(A). There are no
assumptions on A and B in the third theorem. The generalized capacitance matrix
theorem for the first case is stated as follows.
Theorem 1 Given B = A + UGV7, where A,B 3?nxn, G 3?pxp is nonsingular,
U = [ui,...,Up] 3?nXp, V = [vi,...,vp] &nXp, with both the sets {ui,...,up}
and {vi,..., vp} containing linearly independent vectors. Assume null(B) C null(A)
and range(A) C ran<7e(B). Let rank(B) = n ms and rank(A) = n m^,
0 < ms < tua < ti, then there exist two sets of orthogonal bases, {qi,...,qmx}
and {q'l5..., q'mA }, for null(AT) and null(A) respectively, such that nii//(BT) =
7.3.1 Elements of Genetic Algorithms 164
7.3.2 Informed Genetic Algorithm 168
7.3.3 Preliminary Experimental Results 171
7.3.4 Discussion 176
8 CONCLUSIONS 178
REFERENCES 181
BIOGRAPHICAL SKETCH 188
vi
26
and an additional constraint term along edges leading to
[ a((VÂ£.u)+Â£t)2+A( Vu 2 +  Vu
JD
2)dx+f /?((V(V2G*Â£)u)+(V2G*Â£)t)2dx
(2.22)
Where u(x, t) = (tt(x, t), n(x, t)) is the velocity field to be estimated, E is the image
brightness function, and V2G is the Laplacian of Gaussian operator [35]. The first
term is the gradient constraint which is active away from discontinuities, the second
term is the smoothness constraint on the velocity field with A controlling the contri
bution of the smoothness term and the third term is the flow constraint along edges
which is disabled away from the edges. A discrete version of the above energy can
be written as
+ Evv + Et)2 + \ Y,(ul + UD + (vl + VD
(2.23)
Where, and v form the discretized flow vector, x,y,vx and vy represent the dis
cretized partais of the velocity field, Ex, Ey and Et correspond to the discretized
partials in the x, y and t directions respectively collectively obtained from the
first and the third terms of the equation 2.22. Note that Ex, Ey and Et are the
discretized partials of the brightness function E when the location of differentiation
is away from discontinuities, and they represent the discretized partials of V2G E
when the differentials are taken at the discontinuity locations. The minimization of
this discretized energy leads to solving a linear system of equations Ku = b where
K contains the discretized differential (smoothness) operator along with Ex, Ey and
102
5.3.3 Optical Flow Computation
Similar to the SFS problem, the stiffness matrix K for the optical flow estimation
has the following 2x2 block structure.
K =
Ks + Ej
E.
E
Xy Kg + Eyy
(5.12)
_ . K2
where Exx, Exy, and Ey are all diagonal matrices with the normalized entries _ JÂ¡Â¡,
EX+Ey
F F E2
and =5=^=j. The construction of the block preconditioner is similar to that in
Ex+Ey Ex+E y
the SFS problem, therefore it is omitted here.
5.4 Experimental Results
In this section, we present implementation results of applying our adaptive pre
conditioner to the surface reconstruction, shapefromshading, and optical flow com
putation problems. Since our preconditioner adapts to the problem, it is bound to
perform better than the existing fixed preconditioners described in literature. We
will justify this claim through the experimental results.
Note that there is no restriction on the wavelet basis being used in the construc
tion of our preconditioner. However, the efficiency of our preconditioner is related
to the localization of the wavelet being used. The time localization makes the com
putation of the wavelet transform efficient; the localization in the frequency domain
yields a more accurate spectral approximation and therefore faster convergence of the
preconditioned conjugate gradient can be achieved. In our experiment, we use the
cubic Bspline wavelet given in [50] with the QMF filters truncated to a length of 11.
61
sparse data problem and the dense data problem is the treatment of the data com
patibility matrix K<. For the dense data case discussed above, the matrix K is an
identity matrix and is arranged into K0, which can be rewritten as a Kronecker prod
uct of two wellstructured matrices. For the sparse data case, the nonzero elements of
the matrix K are sparsely distributed along the diagonal, thus every nonzero entry
of the sparse diagonal matrix IQ is decomposed into the outer product of u( and v'
and included into U and V as follows:
u =
Ul
. uk u'x ..
u'k.
V =
Avj .
.. Ay* v[
... \'k,
where the vectors Ui,...,Ujt, Vi,...,Vjt account for the molecule inhibition due
to depth and orientation discontinuities (if any) and the imposed doubly periodic
smoothness over the embedded rectangular domain. With the above choice of U and
V, each column of U and V is still very sparse, containing only one or two nonzero
entries. Our numerical algorithm will take advantage of this sparsity property.
Similar to the dense data case, the matrix Ko can be decomposed as the sum of
two Kronecker products of AA and I. Therefore, a linear system of the form K0z = f
can be written as the Lyapunov matrix equation AZ + ZA = F.
4.3 Numerical Solution
In this section, we discuss the convergence of ADI method applied to the Lya
punov matrix equation and show that the computational cost for solving this equation
22
Kx2 = b2, and KiX3 = b3, where K is the stiffness matrix obtained by discretizing
the Laplacian operator (on the LHS of the equation 2.18) with Dirichlet boundary
conditions and Kj is the stiffness matrix obtained by discretizing the Poisson equation
for the height from orientation problem i.e.,
A Z = px + qy (2.19)
with Neumann boundary conditions. The vectors bi and b2 are the appropriate
RHSs obtained by discrete computation of the RHS of equation 2.18 at the current
estimated value of (f,g). b3 is the discrete form of the RHS of the above height from
orientation equation with the estimated (p, q) from equation 2.18. The structure of
the stiffness matrices K can be analyzed using the computational molecules [81]. In
this dissertation, we apply our generalized capacitance matrix algorithm to solve the
above discretized Poisson equations.
Energy minimization formulation
The other formulation for the SFS problem is to minimize the total energy func
tional containing the data constraint, the integrability constraint and the smoothness
constraint. The minimization is taken with respect to the surface orientation (p,q)
and the surface height Z altogether. The total energy functional Â£(Z, p, q) is the sum
of the smoothness constraint and the penalty terms in this problem, i.e.,
Â£(Z,p,q) = ^JJ^pl + P^ + iql + q^dxdy + ^JJiZzpy + iZyqYdxdy
94
solving the original linear system. Therefore, nothing is gained by using P = K. A
reasonable choice for a good preconditioner is to find a matrix P that is spectrally
close to K; in addition, the solution to the auxiliary linear system Pz = r should be
easily solvable. Our preconditioner is constructed based on this very idea.
The matrix K is the stiffness matrix obtained from a discretization of the regu
larization formulation. Therefore, its inverse has the same spectral characteristics as
that of the regularization filter given in equation 5.4. Now we approximate the spec
tral characteristics of the regularization filter in a wavelet basis by using its frequency
domain partitioning property. This approximation is accomplished by modulating
each wavelet basis function by the spectral value of the regularization filter taken at
the center of the corresponding frequency domain subregion. To be more specific,
when Mallats tensor product strategy [50]is used for the 2D wavelet transform, each
level (resolution) contains three sets of wavelet basis functions, i.e. LH ( tp), HL
and HH (xpip), and each set corresponds to its own subregion in the 2D fre
quency domain. The modulation for the wavelet basis functions in each set is based
on the spectral function of the regularization filter in its corresponding subregion.
For ease of implementation, we can just take the spectral function value at the center
of the subregion to be the modulation factor.
The inverse of the stiffness matrix is basically a lowpass filter, whose spectral
function changes rapidly in the lower frequency region and varies smoothly in the
higher frequency region. An approximation to the spectral function of the lowpass
filter achieved in a wavelet basis is very fine in the lower frequency regions and coarse
101
Is along the main diagonal and Is along the upper diagonal, i.e.,
1 1 0 0
0 1 1 0 0
0 1 1
0 0 0
(5.10)
Our preconditioner P has similar block structure as K and can be expressed as
w
0
0
Ml
0
0
WT
0
0
p =
0
w
0
0
m2
0
0
WT
0
0
0
w
0
0
m3
0
0
WT
(5.11)
The modulation matrices Mi, M2 and M3 are all diagonal and they are used to mod
ulate the wavelet basis functions corresponding to z, p, and q vectors, respectively.
The modulation matrices are constructed to approximate the spectral distributions
of the corresponding diagonal blocks in the matrix K, i.e. fiKa, AKS + (Rp + /z)I and
AKS + (Rq T /i)I. The construction procedure for each block is the same as discussed
in the surface reconstruction problem.
176
With regards to computational effort, in the above surface reconstruction exper
iments, our generalized capacitance matrix algorithm usually takes less than 1 second
on a multiuser SUN SPARC10 workstation to solve a convex quadratic minimiza
tion problem. The population size was set to 100 in our hybrid search algorithm. In
the experiments, three generations of the hybrid search algorithm were completed in
less than 3 minutes. For 5 generations of the same, it took less than 6 minutes. The
structure of our hybrid search algorithm is primarily a genetic algorithm, which is
easily parallelizable [71], thus making it straightforward to port this algorithm to a
parallel machine. A parallel implementation will make our algorithm a very useful
and efficient technique for solving hard nonconvex optimization problems.
7.3.4 Discussion
A hybrid search algorithm is presented in this chapter to solve the coupled (
binaryreal) nonconvex optimization problem. Our hybrid search algorithm consists
of a novel informed GA and either one of our generalized capacitance matrix algo
rithm or adaptive preconditioned conjugate gradient algorithm. The informed GA
is used as a stochastic global minimizer for a new energy function consisting of the
binary line process variables only. Inside the GA, either one of the above two de
terministic algorithms is used to determine the new energy function values for each
line process configuration visited in the GA. The performance of this hybrid search
algorithm is demonstrated through experiments on the sparse data surface recon
struction problem. Since the GA is a highly parallizable algorithm, we can improve
the efficiency of this hybrid search algorithm by porting it to a parallel machine.
76
Figure 4.3. Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed surface using thin
plate smoothness, obtained using the generalized capacitance matrix algorithm.
85
spline or the thin plate spline or a convex combination of the two. Experimental
results depicting the performance of our algorithm in comparison to existing methods
discussed above are presented for synthetic and real data.
The rest of this chapter is organized as follows. In section 5.2, an adaptive
preconditioner constructed in a wavelet basis is presented, followed by the adaptive
preconditioned conjugate gradient algorithm in section 5.3. Algorithm implementa
tion on synthetic and real data are presented in section 5.4. In section 5.5, we end
with a discussion.
5.2 Adaptive Preconditioning in a Wavelet Basis
Iterative methods are preferable to direct methods for solving large and sparse
linear systems in early vision problems, since direct methods require the formation
of intermediate matrices which are not necessarily sparse. Although the sparsity
can be preserved when using direct methods on ID problems with the smoothness
constraint, it will be destroyed in the case of 2D problems. This makes the number
of operations and the amount of storage too large for 2D and higher dimensional
problems.
The efficiency of applying an iterative method to solve a linear system depends
on its rate of convergence, which is a function of the condition number of the coef
ficient matrix in a linear system. Unfortunately, many early vision problems after
regularization lead to large illconditioned linear systems. Therefore, most itera
tive methods converge very slowly to the true solution and are thus computationally
141
where w = h(xÂ¡ + u,,y, + u,) /0(x, yt). In the above equations, the values of the
image function h and its partial derivatives at (x f u,, y, + u) need to be computed
using any interpolation technique. In our implementation, we used the interpolating
spline under tension [57] to compute these values.
6.3.2 Modified Search Direction Scheme
The nonlinear conjugate gradient algorithm is a local search method which in
volves approximating the local landscape of the energy function by a quadratic func
tion. The use of gradient to obtain the search direction is not appropriate at the
locations where the local landscape of the energy function has large higherorder
derivatives, i.e. the local landscape is very nonlinear. We propose a new search
direction scheme to modify the gradient direction in nonlinear conjugate gradient al
gorithm. The idea of the modified search direction scheme is to check the smoothness
of the local energy function for every component of flow vector. From equation 6.10,
we can see that the smoothness energy is already quadratic, therefore, we need only
to consider the smoothness of the first term, i.e., the data constraint energy. To mea
sure the local smoothness of the data constraint energy for the flow vector (m,u),
we use the inner product of the normalized gradients of the two image brightness
functions as a smoothness measure cr(i, u,), i.e.
V/i(x, + Ui, yÂ¡ + v) V/0(x, y,)
a(Ui,V,) ~ V/i(xI + u1,y, + u,) V/0(x,y,)'
(6.19)
CHAPTER 4
GENERALIZED CAPACITANCE MATRIX ALGORITHM
4.1 Structure of the Algorithm
From the generalized capacitance matrix theorems presented in chapter 2, we
can generalize the standard capacitance matrix algorithm for general matrices A and
B as follows.
1. Solve equation 3.32 for x.
2. Solve equation 3.33 for r/t, for i = 1,... ,p.
3. Form the capacitance matrix C from equation 3.34.
4. Solve the capacitance matrix equation (i.e. eq. 3.35) for (3.
5. The solution x = x rj(3.
This generalized capacitance matrix algorithm can always be applied to the cases
when null(B) C null(A) and range(A) C range(B) or when null(A) C null(B)
and range(B) C range(A), as stated in Theorem 1 and 2 respectively. However,
to apply this algorithm to more general cases, we need to make sure that a solution
exists to the capacitance matrix equation in step 4. Theorem 3 gives a sufficient
condition for the uniqueness of the solution to the capacitance matrix equation in
the general case.
50
86
impractical. One way to speed up the convergence of an iterative method is via pre
conditioning. In this section, we present a new adaptive preconditioning technique
wherein the preconditioner is constructed in a wavelet basis. Since the proposed
preconditioner is designed to adapt to the spectral characteristics of the smoothness
constraint in the regularization solution (in the frequency domain) of different early
vision problems, it is bound to perform better than the existing fixed precondition
ers [75, 42, 92].
5.2.1 Preconditioning Technique
For a linear system Kx = b with positivedefinite matrix K, we assume there
exists a positivedefinite matrix P, the preconditioner for the matrix K, such that the
condition number of P1K is greatly reduced. Since P is a positivedefinite matrix,
it can be shown that there exists a positivedefinite matrix C such that C2 = P [26].
Now, we can define the vectors x and b via the transformation matrix C as follows
x = Cx, b = C1b. (5.1)
Thus, the linear system Kx = b can be solved by solving the transformed linear
system
Kx = b, (5.2)
where K = C1KC1. Note that the matrix K in the transformed linear system
is similar to the matrix P_1K, i.e., the condition numbers of K and P1K are
equivalent. The condition number of the transformed matrix is much smaller than
38
By definition, B77/3 = Arj(3 + \J\Trj/3. Substituting equations 3.3 and 3.9 into this
equation, we get B77(3 = 0. Thus, rjf3 E null(B). Since null(B) C null(A), we
have rj/3 Â£ null(A), i.e. ArÂ¡(3 = 0. Substituting At] by equation 3.3 and denoting
Â¡3 = (/?i,..., /?p), we obtain
y y 171A ~mB Qiii
U E = 
j=l 4j uj
1 = 1
=1
(3.10)
After rearranging, equation 3.10 becomes
mAmB
QTU/3 P
E (A tttK + E = 
4j uj
(3.11)
j=i
i=m^mfl+1
Since Ui,..., Up are linearly independent, equation 3.11 imposes the following condi
tions on flj
Pi
Pi
qjuP
q[uj
j = l,...,mA mg,
0, j = mAmB + 1,
(3.12)
(3.13)
Substituting these conditions into the RHS of equation 3.9, we obtain
/ \
Pi
\ PmAmB J
= 0.
(3.14)
164
The overall structure of our algorithm is to use the GA as a global minimizer
to the function E'(l) with a fast quadratic convex function minimizer inside the GA
to obtain the function value of E'( 1). It is important to have a very fast quadratic
convex function minimizer since we need to compute the function value E'( 1) for
each new line process candidate solution 1 at each generation. We employ either
one of our fast algorithms namely, the generalized capacitance matrix algorithm or
the adaptive preconditioned conjugate gradient algorithm, to solve the minimization
problems with prescribed discontinuities.
7.3.1 Elements of Genetic Algorithms
The GA is an adaptive search algorithm based on principles derived from the
dynamics of natural population genetics. It has been successfully applied to combina
torial optimization, function optimization, classifier systems, structure optimization,
pattern recognition, and other areas [27] [24].
A population of candidate solutions is generated in the beginning as the initial
population, denoted by P(0). Then, the GA operators, i.e. reproduction, crossover
and mutation, are applied to the current population P(t) to generate the descendant
population P(t + 1). Subsequently, the algorithm is an iterative adaptation from
one generation to the next based on the genetic operators and the fitness function.
The GA operators are designed to follow the survival of fittest principle. The fitness
function, which is a measure of fitness, in the GA is directly related to the objective
function to be minimized or maximized in an optimization problem.
83
whether nonstandard or standard wavelet transform is used. The result of Beylkin
et al., primarily shows a reduction in the bandwidth of the matrices corresponding
to pseudodifferential operators in a wavelet basis and gives a technique whereby an
0(N) coefficients can be used to approximate the (N x N) operator, for a given
tolerance. This compression takes O(N) time if the structure of the singularities of
the matrix are known a priori.
There are three issues to be noted with regards to Pentlands reported results.
Firstly, after a change of basis to orthonormal wavelets, the offdiagonal terms were
discarded in [61], under the claim that the transformed stiffness matrix was diagonally
dominant. This diagonal dominance behavior was depicted via the profiling of a row
of the transformed stiffness matrix at a high resolution. Upon close examination,
we found that the diagonal dominance property does not hold in general at lower
resolutions (see figure 5.10) and thus, discarding the offdiagonal terms is unjustified
and leads to a solution which may be far from the true solution. Secondly, Pentlands
preconditioner requires the computation of the diagonal entries in a wavelet trans
formed stiffness matrix. Although the stiffness matrix is banded, the computation of
these diagonal entries takes 0(N \og N) operations, which is too expensive for large
N. Thirdly, it is not known if the matrix (K + S) in his linear system (K + S)U = D
being solved for the interpolation problem would satisfy the conditions set forth in
Beylkin et al., [7] in order for their bandwidth reduction scheme to be applicable in
this case. Further, we found that his algorithm fails to converge in reasonable number
16
For the sake of exposition, lets consider the surface reconstruction problem
here, but the discussion to follow should apply to most early vision problems with
minor modifications. A coupled MRF model may be constructed for the surface
reconstruction problem by defining an MRF, F, for surface and a boolean MRF, L,
for line process. The prior probability can be written as
P{ F = f,L = l) = i exP_t/(f1)/r (2.9)
z
U((, 1) = Â£ifc(f,l) (2.10)
c
The parameter T is assumed to be 1 for simplicity. The prior energy /(f, 1) is
similar to the last two terms of E(u, h, v) in equation 2.6, which are the smoothness
constraints on the surface and line process.
Let the observed data be denoted by d, then the problem to estimate f and 1
from d can be obtained by the MAP (maximum a posteriori) estimation. The MAP
estimate is obtained by using Bayes rule and maximizing the posterior probability
P(F = f, L = 1  D = d). Assume the observation model is given by D = H(F) + N,
where N is zero mean white Gaussian random field with variance cr for each variable
in n. After applying the Bayes rule, we can see that the MAP estimation is equivalent
to maximizing P(D = d  F = f, L = 1)P(F = f, L = 1), and
P( D = d  F = f,L = 1) = (2I72pl'/2ex[r"'L"l'f,
(2.11)
117
5.4.2 Shape from Shading
For the shapefromshading experiment, we use the synthetic bump images with
size 64 x 64 used in [76, 38], as shown in figure 5.14. These two images are generated
from the Lambertian reflectance model with the light source directions (pa, qs, 1) being
(0.34,0.3467,1) and (0.34,0.3467,1) for the left and right images, respectively. In
this experiment, only the left image is taken as the input to the SFS algorithm
to recover the shape, and the right image is used to monitor the closeness of the
computed solution to the true solution.
Figure 5.14. Shaded images of bump with different light source directions.
The construction of our preconditioner for the shapefromshading problem is
described in the previous section. The parameters A and p are set to be 0.5 in
this experiment. Without including any boundary condition in the energy function
to be minimized, we compare the convergence of our APCG algorithm with the
12
i.e.,
Find u such that S[u) = inf v 7i S(v) (21)
where, 7i defines the linear admissible space of smooth functions defined on 9ft2 and
the functional
E{v) = V{v) + A
Where, A is called the regularization parameter that controls the contribution of the
smoothness term, S(v) : Ft i> 9ft is a functional on H that is a measure of smoothness
of an admissible function v(x,y), and V(v) : 7i tt 9ft is a functional that measures
the discrepancy between the observation model and the observed data [80].
To solve the variational problem, a finite difference method [37] or a finite el
ement method [78] can be used to discretize the problem domain. The infinite
dimensional minimization problem in equation 2.1 can be reduced to a finitedimensional
minimization problem as follows:
minÂ£(u) with E(u) = D(u, d) + AS'(u) (23)
where u 9ft"2 xl is the discretized surface, d 9ft"2 xl is the data vector, and the
discretization mesh is n x n. In general, the observation model can be nonlinear,
e.g. the shape from shading problem with a nonlinear reflectance map function.
Lets denote this model by K
model is linear, and choose the stabilizing ( or smoothing ) operator as the firstorder
5.4 (a), (d), (g) & (j) are the computed solutions using the APCG al
gorithm, (b), (e), (h) & (k) are the computed solutions using HCG
algorithm, (c), (f), (i) & (1) are the computed solutions using conju
gate gradient algorithm, after 1,5, 10 and 20 iterations, respectively. 107
5.5 The computed solutions after (a) 10 iterations of APCG (b) 25 itera
tions of HCG (c) 30 iterations of CG using the same time 108
5.6 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm in the surface reconstruction problem
with the synthetic sparse data set and (a) with controlledcontinuity
stabilizer (p = 1, r = 0.1) (b) with thinplate stabilizer 109
5.7 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with discontinuities specified in the controlledcontinuity stabilizer. 109
5.8 The computed solutions for the surface reconstruction with disconti
nuities using the APCG algorithm after (a) 10, (b) 20, (c) 40 and (d)
60 iterations, respectively 110
5.9 The convergence of Pentlands preconditioner in conjunction with the
conjugate gradient algorithm is worse than that of conjugate gradi
ent algorithm without preconditioning for the surface reconstruction
problem with the synthetic sparse data set using a membrane stabilizer.Ill
5.10 Row profile of the wavelet transformed membrane matrix Ks of size
64 x 64 along different rows, (a) row 3 (b) row 10 (c) row 30 (d) row 50.
This figure establishes the falsity of the diagonal dominance behavior
of K4 claimed in [19] 112
5.11 The convergence for different preconditioners in the preconditioned
conjugate gradient algorithm for the surface reconstruction problem
with sparse symmetric data set using a controlledcontinuity stabilizer. 113
5.12 The computed solutions using the adaptive preconditioned conjugate
gradient algorithm after (a) 10 iterations, (b) 40 iterations, (c) 100
iterations, in the third surface reconstruction experiment 114
5.13 (a) The convergence for the APCG, HCG and CG algorithms for the
surface reconstruction problem with the real laser range data set using
a controlledcontinuity stabilizer, (b) The computed solution after 10
APCG iterations 116
5.14 Shaded images of bump with different light source directions 117
IX
31
formula in equation 3.1 have been reported in [11], [34], [67], and references therein.
However, these generalized inverse forms of (A + UVT) have been derived under
very restrictive assumptions. In [11], a generalized inverse form was given for the
case when U and V are both vectors, which means B is a rankone modification of
A. Although the generalized inverse of a rankp modification of A can be obtained by
using the rankone modification generalized inverse formula iteratively, the resulting
algorithm is very inefficient in terms of storage and computation when the size of
the matrix A is very large. Henderson and Searle [34] derived several generalized
inverse forms for (A + UVT) under the assumptions that range(UVr) C range(A)
and both the matrices A and UVr are symmetric. Recently, Riedel [67] presented a
generalized inverse form for (A f UVr) with a very restrictive assumption, which is
implied by the condition rank(B A) < dim(null(A)). This condition implies that
Riedels result at best can give a generalized inverse for a rank<7 modification of A,
where q < dim(null(A)). This condition becomes very restrictive and impractical
when the dimension of the null space of A is small. Although the aforementioned
generalized inverse formulas can be used to design the corresponding capacitance
matrix algorithms, the associated assumptions restrict them from being applied for
general matrices A and B.
In this chapter, we present several theorems which we call generalized capaci
tance matrix theorems that lead to a generalized capacitance matrix algorithm which
can be applied to solve very general linear systems. In the main theorem, there are
no restrictions on the rank of the matrices A or B and it can be applied to any linear
To my wife YihChou
and my son Patrick
63
We now examine the convergence of the ADI method for the Lyapunov matrix
equation in our algorithm. Let the initial Xo = 0 (zero matrix), and AX* = X* X*,
where X* is the true solution of the Lyapunov matrix equation, A0, Aj,..., An_i, and
q0, qi,..., q_i be the eigenvalues and the eigenvectors of A respectively. For this
symmetric matrix A, we have
A = QDQt, (4.14)
where Q = [q0 qi qni], D = diag[X0, A1? , An_i], and A*, = 4sin ( n2 )
for k = 0,..., n 1. We express the eigenvalue A in this form so as to have nonde
creasing eigenvalues with respect to the indices. The range of eigenvalues is between
0 and 4, and the zero eigenvalue (when k 0) is simple. The orthogonal matrix
Q contains the discrete sine or cosine function in each column with increasing fre
quency corresponding to increasing eigenvalue, i.e., the index increases. For example,
the eigenvector corresponding to the simple zero eigenvalue is a flat function whose
frequency is 0.
By taking the tensor product of the eigenvectors, qjt <8> q/, k, l = 0,..., n 1, we
can generate an orthogonal basis for 3inxn and express the true solution X* in this
basis as follows:
n1n1
x = Â£Â£ aw(q*0qj). (4.15)
k=0 1=0
Subtracting each of the equations 4.12 and 4.13 from the Lyapunov equation AX f XA
= B, and expressing X in the above tensor product basis, we have the following result
130
are the discontinuity locations of the image brightness function. The contourbased
methods use a different flow constraint along the zerocrossing contours. This flow
constraint along contours is obtained by replacing the image brightness function I by
the convolution of I with the second derivative of Gaussian smoothing function. We
propose to use the image flow constraint in the areas away from the brightness dis
continuities and use the contourbased flow constraint at the discontinuity locations.
Thus, in our gradientbased regularization method, the unreliable image flow con
straints near discontinuity locations are replaced by the robust contourbased flow
constraints at the zerocrossing locations. This yields an accurate estimate of the
optical flow field vectors.
The correlationbased approach locally finds the displacement vector (u, v) be
tween two images 70 and I\ at the location (x, y) by minimizing the sum of squared
difference (SSD) function
k k
SSD(x;u,v) = '52w(i,j)[h(x + i + u,y+j + v)I0(x + i,y+j)]2 (6.1)
j=k i=k
where w(i,j) is the weighting function. In this SSD function, the summation is
performed in a window of size (2k f 1) x (2k + 1) centered at (x, y). Most correlation
based methods do an extensive search of the displacement vector (u, v) in a finite
integerpair set and find the pair with the smallest SSD value to be the displacement
at that location. Since the search region of the displacement vector is discretized
for an extensive search, the accuracy of the computed optical flow is limited by this
discretization. In addition, the correlationbased approach is incapable of producing
123
5.19. Our APCG outperforms the other methods in terms of convergence speed for
all of the three different measures.
5.4.3 Optical Flow Computation
In section 2.3, we presented a modified gradientbased formulation of the opti
cal flow computation problem. Our regularizationbased formulation combines the
regionbased [40] and contourbased [35] gradient constraints. The gradient con
straint for the brightness function E is active all over the image plane except in the
the neighborhood of the discontinuity locations, while the gradient constraint for the
brightness function after the Laplacian of Gaussian operation, i.e. V2G* E, is active
at the discontinuity locations. In our implementation, we locate the zerocrossings of
S72G*E and label them as discontinuities [35]. Accurate estimation of Ex, Ey and Et
is necessary to obtain a reliable flow constraint. When E is the brightness function,
we use the Gaussian spatialtemporal presmoothing prior to applying the twopoint
central difference to approximate the partial derivatives. When E is V2G E, a
twopoint central difference is applied to E to approximate its partial derivatives. As
discussed in section 2.3, a normalization of the gradient constraints is used to make
the contribution of each (active) flow constraint uniform.
We present two experiments for the optical flow computation. The first experi
ment consists of a synthetic translating square image sequence; and the second is a
real image sequence consisting of a rotating Rubic cube. One frame for each image
sequence is shown in figure 5.20 Each frame for the translating square and rotating
Rubic cube image sequence is of size 100 x 100 and 256 x 240 respectively.
7.1 The crossover operation example 166
7.2 The sparse data set used for the surface reconstruction with disconti
nuity detection, (a) original surface (b) sparsely sampled data set. . 172
7.3 (a) Clique for the line process (x: line process, 0: nodal variable for
surface), (b) clique potentials for different line process configurations. 173
7.4 Surface reconstruction example: (a) the reconstructed surface (b) the
recovered discontinuity map ( solid line ) obtained after 3 generations
of the informed GA and imposed on the true discontinuity map (
dashed line ) 174
7.5 Surface reconstruction with noise example: (a) the noisy sparse data
set, (b) the reconstructed surface (c) the recovered discontinuity map
( solid line ) obtained after 5 generations of the informed GA and
imposed on the true discontinuity map ( dashed line ) 175
xi
in the formulation, the energy functions to be minimized become nonconvex. In this
thesis, we propose efficient computational algorithms for minimizing these energy
functions with and without discontinuity detection.
This thesis contains the following four main contributions:
1. A new theoretical and algorithmic framework based on the capacitance matrix
method for solving elliptic partial differential equations arising in early vision
and engineering problems.
2. A novel adaptive preconditioning algorithm using a wavelet transform for solv
ing illconditioned large sparse linear systems arising in regularized solutions of
illposed problems in early vision and other domains.
3. Two new, robust and efficient algorithms for accurate optical flow computation
namely, a modified gradientbased formulation and an SSDbased regularization
formulation.
4. A new hybrid (stochastic f deterministic) search algorithm involving an in
formed genetic algorithm (GA) and either one of the two aforementioned fast
numerical algorithms, for solving nonconvex optimization problems in early
vision.
The efficiency of the proposed algorithms are demonstrated through experi
ments on several early vision problems, including surface reconstruction, shape from
shading, and optical flow computation. In addition to computer vision applications,
Xlll
80
(a)
(b)
\\
V'\N
\\\\\
\\\\\\\
\\WX\W
\XXWX\W
XXXXNSXXW
XNXXXXXXW
NNWNWWW
XSNXWWWW
XXXXXXXXXXX \
///
////
It////
///////
////////
/t////
/////////
//////////
////^/////
WWWNW
\\\\\\\\
xxxxxxx
wxxxxxxx
(c)
(d)
Figure 4.6. Shape from shading
occluding boundary imbedded in
reconstructed surface shape.
example, (a) Shaded sph
a rectangle (c) recovered
ere image (b) irregular
surface orientation, (d)
75
4.4.1 Surface Reconstruction
We implemented our generalized capacitance matrix algorithm on the sparse
data surface reconstruction with membrane and thinplate splines. We synthesized a
sparse range/elevation data set on a 64 x 64 grid. The input data set is very sparse
and contains 15 data points randomly scattered in the plane as shown in figure 4.3(a).
We apply our algorithm to this data to obtain a surface approximation.
In the first experiment, we reconstruct the surface to approximate this sparse
data set by using membrane and thinplate smoothness constraints. Figure 4.3(b)
shows the reconstructed surface using membrane smoothness after 10 ADI iterations.
We observe that only 10 ADI iterations are needed to attain an error tolerance of
10~5. The reconstructed surface using thinplate smoothness after 16 ADI iterations
is shown in figure 4.3(c). We can see the reconstructed surface using thinplate
smoothness is much smoother than that using membrane smoothness.
The second experiment is the surface reconstruction on the same data set with
prespecified discontinuities along a horizontal line, between the coordinates (1,32)
and (30,32). as shown in figure 4.4(a). We depict the computed surface with mem
brane smoothness obtained using our generalized capacitance matrix algorithm with
1 and 10 ADI iterations in figure 4.4(b) & (c) respectively. Figure 4.4(d) shows the
result of the same experiment with thinplate smoothness after 16 ADI iterations
in our generalized capacitance matrix algorithm. A similar experiment with depth
discontinuities along the diagonal line segment extending from the coordinate (1,63)
145
i.e., the value of the regularization parameter A, since the correct optical flow field
is uniform in this example. However, large regularization parameter tends to over
smooth the estimated optical flow for other image sequences. Although it is possible
to find an optimal regularization parameter by using the generalized crossvalidation
method [88], this method is extremely computationally expensive.
Technique
Average Error
Standard Deviation
Density
Horn and Schunck (modified)
32.81
13.67
100%
Nagel
34.57
14.38
100%
Our method (Gradientbased)
0.21
0.06
100%
Table 6.1. Summary of Square 2 results.
(a)
(b)
Figure 6.3. Computed optical flow of the (a) Square 2 and (b) Diverging Tree image
sequences after 40 iterations of the incomplete Cholesky preconditioned CG algo
rithm.
The other synthetic data is the more realistic Diverging Tree image sequence.
The underlying motion in this example is a bilinear expansion from the center of
the image. We use the regularization parameter A = 0.1 and a = fl = 1 in this
37
where 0 is an mg x (ttia mg) zero matrix, T is nonsingular, and
Q 6 is orthogonal. Define the vectors q'j,..., q'm^ by the following relation
/ \
( \
q'i
= Q
i
(3.8)
\ ^mA )
We can see that {q'j,..., q'm/i} is a set f orthogonal bases for null( A) and its
subset {q'm^_mB+1,..., q/TJT>1} forms an orthogonal basis for null(B). Thus, the
existence of the basis {q'j,..., q'm/1} which satisfies the aforementioned requirements
is proved. Similarly, we can prove that there exists an orthogonal basis {qj,..., qTOyt}
for null(AT) and null(BT) = span{qmx_mfl+i,..., qm/1} is satisfied.
(2) From Proposition 1, we have ujqj = 0 for 1 < i < p and mg + 1 < j <
rriA. The assumption b range(B) implies bTqj = 0 for tua mg + 1 < j < m^.
Combining these facts and the orthogonality of the eigenvectors qj,j = 1,..., m^, we
can see that the RHSs of equations 3.2 and 3.3 are orthogonal to q, for j = 1,..., tua,
thus, the RHSs are orthogonal to null(Ar). Consequently, the RHSs of equations
3.2 and 3.3 are both in the range of A, which means there always exist solutions to
equations 3.2 and 3.3.
(3) To prove that C is nonsingular, we show that C/3 = 0 => (3 = 0. Lets
suppose C/3 = 0, then, from equation 3.4, we have
Vtt7/3
mAmB
E
J = 1
G
i
eP
qjug
qj;
G'p.
(3.9)
157
Analog VLSI circuit based solutions were developed by Harris et al. [31] for
a host of early vision problems. These analog circuit based methods lead to some
impressive practical results. However, since this design is based on the idea of line
process elimination as in the GNC algorithm, it is not clear as to how the network
could be modified to include general line energies.
This approach does permit some kinds of line process interactions in the line
energy, but the line energy must have a closedform expression. In general, the MRF
model doesnt use a closedform expression for the clique potentials. In the MRF
model, a lookup table is more prevalent for designing the clique potentials for the
line process.
Mean field annealing
The mean field annealing algorithm [21] uses the mean field approximation and
the saddle point approximation to obtain an approximate solution to the MAP esti
mation problem. It is a nice deterministic approximation to the MRF model for sim
ple clique potential functions. Like the GNC algorithm and neural network approach,
mean field annealing can only deal with specific types of clique potential. When the
line energy includes line process interactions, i.e., the neighborhood system for line
process in the MRF model is nonempty, more elaborate, nontrivial approximation
methods are required to design the algorithm. Consequently, it may not be practical
for general clique potential functions.
91
where J is a positive integer. From this multiresolution orthogonal space decompo
sition, we can construct a multiresolution orthonormal wavelet basis, since each of
the subspaces on the left hand side of equation 5.6 are orthogonal to each other and
there exists an orthonormal basis for each subspace as discussed above.
The wavelet decomposition and reconstruction can be efficiently achieved via a
QMF (Quadrature Mirror Filters) implementation as shown in figure 5.1 (see [50]).
In this figure, A^f is the discrete approximation of the function f in the subspace
Vj, i.e. Ajf contains the values of the inner products of the function f and the
orthonormal basis functions in Vj, and D^f encodes the projection of the function f
on the subspace Wj. The filters H (in decomposition) and H (in reconstruction) are
lowpass filters, while the filters G ( in decomposition) and G ( in reconstruction)
are highpass filters. They are usually designed to be the FIR filters, which can
be implemented via convolution operations. With an appropriate choice of a wavelet
basis, we can find the associated FIR filters with very short length/span and the QMF
filter implementation for the wavelet transform can be accomplished very efficiently.
The QMF structure decomposes the space from one level to the next coarser level
with each stage involving a lowpass and highpass filtering. From a frequency domain
analysis, we can see that a wavelet decomposition divides the frequency domain into
two separate bands in each stage. Therefore, each subspace Wj has its own frequency
band, which means that all the wavelet basis functions in any particular level are fully
contained in the frequency band associated with that level. This behavior of dividing
the frequency domain [68, 86] exhibited by a wavelet decomposition is illustrated
105
all of the aforementioned preconditioners in conjunction with a conjugate gradient
algorithm and test their performance on one synthesized sparse data set (see fig
ure 5.3) in the surface reconstruction experiments and one real range data set from a
laser scanned ball. Subsequent experiments depict the convergence comparison of the
aforementioned preconditioners applied to the SFS, and OF computation problems.
In the first experiment, we recover the underlying surface from the sparse data
set in figure 5.3(a) using the controlledcontinuity stabilizer with the parameters p
and r being 1 and 0.1, respectively. The size of the problem is 64 x 64. Applying
our adaptive preconditioner to this problem, we can see that the iterations converge
to the true solution very fast. The computed solutions after 1, 5, 10 and 20 steps
of conjugate gradient, HCG, and our APCG algorithms are illustrated in figure 5.4.
With only five iterations of our APCG algorithm, the global shape of the surface
is recovered. After ten iterations of our algorithm, the solution is very close to the
true solution. Comparing these intermediate solutions in figure 5.4, we can see the
solution after 20 iterations of HCG is still bumpy and the computed solution after
20 CG iterations is far from the true solution. However, this comparison is not
very fair since the computational cost for each iteration of the APCG, HCG and
CG algorithms is different. Consequently, we include the execution time into the
comparison as follows. On a multiuser SUN SPARC10 workstation, 10 iterations
of our APCG algorithm on this experiment took 2.14 seconds. For the same time,
the HCG algorithm ran 25 iterations and the CG algorithm ran 30 iterations. Figure
5.5 shows these computed solutions for the three different algorithms using the same
49
nonsingular K. Our algorithm gives a solution to this problem (recovers a surface)
within a scale factor in height.
Previous works [11, 34, 67] on deriving the generalized inverse forms of (A f
UVT) were presented under restrictive assumptions. Although our generalized capac
itance matrix theorems are developed for solving general linear systems, it is possible
to derive the generalized inverse form of (A + UGVT) for more general cases based
on these theorems.
2
1.1 Formulations of Early Vision Problems
Early vision problems may be formulated either in a deterministic or a prob
abilistic framework. A popular technique in the deterministic framework is based
on regularization theory [82, 80] and leads to the minimization of energy function
als. The probabilistic formulation exploits the Markov random field (MRF) model to
characterize the function being estimated. We can relate the smoothness constraints
in the regularization theory to the prior distribution of the MRF model. Both the
regularization formulation and the MRF formulation of early vision problems lead
to the minimization of equivalent energy functions. We will give the deterministic
formulations of some early vision problems in the next chapter.
Discontinuities contain very crucial information in computer vision. There are
different treatments of discontinuities in early vision research. Early work on vi
sual reconstruction problems reported in literature ignores discontinuities altogether
[40, 28, 62], which yields undesirable smoothing over discontinuity locations. How
ever, more recently problem formulations allow for incorporation of prespecified dis
continuities [81, 75]. For both cases, the aforementioned energy function is convex
and quadratic for most early vision problems. Minimization of a convex quadratic
function can be achieved by solving the associated linear system. In this thesis, we
present several efficient and novel computational algorithms to minimize the energy
functions for early vision problems with prespecified discontinuities.
There have been many methods proposed to detect discontinuities in the under
lying function from the observed data. An elegant method to treat discontinuities
158
Dynamic programming
Dynamic programming is a global optimization algorithm for a function of dis
crete variables when the function can be expressed as a recurrence. To use this
algorithm, the real variables need to be quantized to discrete variables, i.e. taking
values from a finite discrete set. In [60], the recurrence form is obtained for the sim
plest line energy as in GNC, i.e. without line process interactions in the line energy.
This is a degenerate case in an MRF model, since it happens when the neighborhood
system for line process is empty. It is nontrivial to obtain a recurrence form for
general clique potential functions.
Although dynamic programming can achieve a global optimum for the dis
cretized problems, the the computational cost is very high for large size problems
[8, 60], thus rendering it impractical.
Nonlinear diffusion
Nonlinear diffusion [70, 64] has been popularly employed as an energy mini
mization tool for various vision problems recently. To apply this technique to the
coupled nonconvex minimization problem, Shah [70] changed the binary line process
variables to the edge indicator real variables which assume real values between 0 and
1. When the value is close to 1, it implies that the point is in the vicinity of an edge,
and vice versa. Then, a coupled partial differential equation system is established
from this modified energy functional. The solution update is based on this coupled
PDE system. This technique seems to be very promising in terms of computational
efficiency. However it has been applied only to the simplest line energy case, i.e.
115
or on the problems with the solutions which are somehow periodically continuous.
We have implemented both the circular convolution and the linear convolution (with
selfreflection at the border [50]) in the QMF for our adaptive preconditioner in
this experiment. The convergence curves for a comparison of the aforementioned
algorithms are plotted in figure 5.11. Our preconditioner with the linear as well as
periodic convolutions has the best convergence performance. The computed solutions
after 10, 40 and 100 iterations are shown in figure 5.12.
The other data set in our surface reconstruction experiments is the real laser
range data from a ball. Figure 5.3(c) shows the 148 randomly sampled data points
from the original range data set generated by the MSU (Michigan State University)
Pattern Recognition and Image Processing Labs Technical Arts 100X scanner. The
size of the problem domain is 152 x 193. The controlledcontinuity stabilizer with
p = 1 and t = 0.1 was used in this experiment. The discontinuities are specified
along the boundary of the object. The discontinuities divide the problem domain
into two nonoverlapped regions, where the inside region corresponds to the object
surface and the outside region corresponds to uniform background. Our adaptive
preconditioning is used in the inside region only by enforcing the solution values
in the outside region to be zero after the preconditioning. The convergence curves
for our APCG, HCG (with different levels), and CG algorithms are shown in figure
5.13(a). Our adaptive preconditioned CG algorithm still has the best convergence
performance in this experiment with real laser range data. Very accurate solution is
obtained after 10 iterations of the APCG algorithm, as shown in figure 5.13(b).
LIST OF TABLES
6.1 Summary of Square 2 results 145
6.2 Summary of Diverging Tree results 146
6.3 Summary of Sinusoid 1 results 148
6.4 Summary of Translating Tree results 150
Vll
30
matrix algorithm. The discrete Poisson operator on the rectangular domain is the
Laplacian matrix, which is singular with rank n 1. Therefore, they presented a
theorem to generalize the traditional capacitance matrix algorithm to the case when
rank(A) = n 1 and B is nonsingular. This generalization is based on projecting the
righthandsides (RHSs) of the linear systems appearing in the first and second steps
onto the range of A through a unit vector. In [58], a generalized capacitance matrix
algorithm was presented for rankdeficient matrices A and B. This generalization
is accomplished by establishing the relationship between the solutions of the linear
systems with matrices A and B using their singular value decompositions. The
resulting algorithm involves the projection of the RHSs of the linear systems with
matrix A onto its range through the eigenvectors in the null space of AT. In addition,
two auxiliary matrices A and B, which are the rankaugmented versions of A and B,
are constructed in this algorithm to apply the ShermanMorrisonWoodbury formula
directly. Hence, this capacitance matrix algorithm requires the computation of all
the eigenvectors in the null spaces of A, AT, B and Br.
The capacitance matrix algorithm has also been employed as a discrete anal
ogy to the potential theory for partial differential equations by Proskurowski and
Wildlund [65, 66], and others. However, the framework of their capacitance matrix
algorithms is limited to solving secondorder elliptic boundary value problems.
The other possibility to generalize the capacitance matrix algorithm for singular
A and B is to directly apply a generalized inverse form for (A + UVT) to solving
the linear system. Some works on the extension of the ShermanMorrisonWoodbury
the theory and algorithms developed in this thesis can also be applied to other ar
eas such as cartography/terrain mapping, reverse engineering, MPEG image sequence
compression, material science applications (velocity estimation of dry granular flows),
engineering applications and mathematical physics.
xiv
149
Figure 6.6. Computed optical flow of the (a) Sinusoid 1 and (b) Translating Tree
image sequences, using the preconditioned nonlinear CG algorithm.
scheme [89]. The scalespace tracking is achieved by applying our algorithm on the
smoothed image pair with the degree of smoothing being gradually reduced during
the energy minimization. We start from a a = 1.1647 for Gaussian smoothing, reduce
to a a = 0.5774 after 100 iterations of the preconditioned nonlinear CG algorithm,
and finally set cr to 0 after 200 iterations. The computed optical flow is shown in figure
6.6(b). The error in the computed optical flow is depicted in Table 6.4 along with
the errors of best results with 100% density reported in [3] and [77]. Our SSDbased
regularization method gives the most accurate result from among these methods.
Note that the result of Szeliski and Coughlan in this table is adopted for the two
frame case to be compared to the result of our twoframe algorithm, although more
accurate estimation can be obtained via incorporating more frames into the energy
function as shown in [77].
27
the b contains the terms corresponding to the optical flow constraint equation. The
gradient constraint in the above energy function implies the weighting (Ex + Ey) on
(,) at each discretized spatial location. This weighting is not reasonable since the
gradient constraints at locations of high gradient are not necessarily reliable. In fact,
they tend to be unreliable because the brightness gradients close to discontinuities are
normally higher than those away from discontinuities. Therefore, we eliminate this
weighting for each gradient constraint via normalization. To have a uniform contri
bution at each pixel from the flow constraint, we normalize the entries E2X, E2y, ExEy
2 2
in the matrix K with Ex + Ey.
To solve the above quadratic and convex energy minimization problem, we em
ploy our adaptive preconditioned conjugate gradient algorithm presented in this
dissertation. This novel adaptive preconditioning technique is applicable to any
gradientbased optical flow computation method that uses a smoothness constraint
involving the 2D Laplacian operator. The efficiency as well as accuracy of our adap
tive preconditioning technique and new optical flow algorithms will be demonstrated
through experimental results on synthetic and real image sequences in chapters 5 and
6.
29
The above assumptions imply that the matrix (A + UVT) is also nonsingular. The
matrix (I + VTA1U) is called the capacitance matrix and is denoted by C. Note
that a similar formula can be obtained for the inverse of (A + UGVT), where the
matrix G 3?pxp is nonsingular [30].
Let B = A + UVT, with A, B, and C being nonsingular, then the linear
system Bx = b can be solved using the traditional capacitance matrix algorithm,
which involves the following steps:
1. Solve Ax = b for x.
2. Compute W = A1U by solving AW = U column by column.
3. Form the capacitance matrix C = / + VTW.
4. Solve C/3 = VTx for Â¡3.
5. Compute the solution x = x W/3.
This capacitance matrix algorithm is derived directly from the ShermanMorrison
Woodbury formula, and therefore, it can be applied only when A, B, and C are all
nonsingular, making it unsuitable for general linear system problems. In addition, the
choice of a wellstructured matrix A which is close to the original matrix is limited
to be nonsingular.
Buzbee et al. [10] proposed a fast algorithm to solve the Poisson equation (
a secondorder elliptic PDE ) on irregular regions. Their algorithm used a domain
imbedding technique to imbed the irregular region in a rectangular region and utilized
a fast Poisson solver ( on the rectangular region ) which is based on the capacitance
56
molecules and the data constraint molecules [81] all over the domain which are
shown in figure 4.2, Using this choice of K0, we can transform the original linear
system Kx = b to an equivalent Lyapunov matrix equation.
Figure 4.2. (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain 0.
With the above choice, the matrix Kq can be written as follows.
K0 AKmern 4 Kj AKmem (
(4.3)
where
D I
I .
K
mem
5
(4.4)
. I
I
I D
34
span{ ..., f\mA } o.nd null(H) span{ q myims+i..., q mA }. Let qt Uy
j), 7^ 0, for 1 < i,j < rriA tub
Let x be a solution to
mAmB
Ax = b 
j=i
qjb
qjui
(3.2)
where b range(B). For 1 < i < p, let r]{ be a solution to
qJu*
AVi = ut ^ r7ui
j=i q
(3.3)
anrf choose rj1...rjto be linearly independent and the linear combinations of
q'l5..., q/rriy4_ms Define the capacitance matrix C as
C = G
i
mAmB
+ Vtt/ Â£
=1
G
i.
qJU
a
'7 71 *
qjui
(3.4)
where rÂ¡ = [t7x, ..., rÂ¡p], and ej is the jth standard basis vector in 3?p. Then, there
exists a unique solution (3 to the capacitance matrix equation
C/3 = VTx
mAmB
E
j=x
^Ge
T ^ ^ 7 7
q, ui
and x *7/3 ts a solution to the linear system Bx = b.
(3.5)
We will use the following proposition to prove this theorem.
156
of p from 1 to 0. In the form described in [8], the GNC algorithm is applicable only
for the case when there is no interaction between the individual line process variables,
i.e. no smoothness constraint is imposed on the line process. This choice of line en
ergy is purely heuristic, since it tries to penalize only the inclusion of discontinuities.
The GNC algorithm proposed in [8] can only deal with this particular form of line
energy.
Recently, Bedini et al. [4] generalized the GNC algorithm to allow some kind of
interactions between line process variables. The new line energy contains the inhibi
tion of parallelism of discontinuities and is expressed in a closed form. However, it is
very difficult to generalize the GNC algorithm to include arbitrary line energies, i.e.,
general constraints on the line process, since the procedure of dual energy in the GNC
algorithm construction to eliminate the line process will be extremely complicated for
general cases. In addition, the GNC algorithm cannot guarantee to find the globally
minimal solution [47]. However, as stated earlier, it does yield satisfactory solutions
in a number of applications.
Analog networks
This approach is very promising since it can lead to a realtime implementation.
The idea of applying analog network to solve this nonconvex minimization problem
is in spirit similar to Hopfield and Tanks neural network [36] approach for solving
combinatorial optimization problems. The original energy function in equation 2.6 is
approximated by a Lyapunov function such that the associated circuit can minimize
this Lyapunov function [44], thus leading to an approximate solution.
118
(a)
(b)
Figure 5.15. The convergence of the APCG, HCG, and CG algorithms for the shape
fromshading problem.
(a)
(b)
Figure 5.16. Shaded images of bump after 100 iterations of APCG algorithm without
using any boundary condition.
60
= (aK mem + I)(6K mem I)?
(4.10)
where
A(1
a, 6= ^1^ l411)
When A > a and & are always real and positive. Then the matrix K0 can be
factored into two real matrices (aKmem +1) and (&Kmem +1), which are of the same
form as the matrix K0 for the membrane dense data problem. Thus, the solution to
a linear system with the matrix K0 can be obtained by solving two Lyapunov matrix
equations, which can be solved using the ADI method in 0(N) time. However, when
the above condition is violated, a and b as well as the factoring matrices in equation
4.10 become complex. It is more complicated to use ADI method to solve the complex
Lyapunov matrix equations and analyze the computational complexity. Therefore,
we limit our discussion to the case when the aforementioned condition is satisfied for
the dense data surface reconstruction problem.
Using the above choice for the matrix Ko, we can see that the matrices U and
V encode the inhibition of membrane and thinplate computational molecules due to
the presence of depth and orientation discontinuities (if any) and the imposed doubly
periodic smoothness over the imbedded rectangular domain.
4.2.2 Sparse Data Surface Reconstruction
The splitting of matrix K for the sparse data surface reconstruction problem is
discussed in this section. The only difference between the splitting schemes for the
100
a preset schedule. The adaptation concept is similar to the adaptive step scheme in
the scale space tracking [89].
The preconditioner construction for the shapefromshading and the optical flow
computation problems is very similar to the one outlined here for the surface recon
struction problem. We only state their differences in the following paragraphs.
5.3.2 Shape from Shading
The variational formulation of the SFS problem leads to solving a set of nonlin
ear partial differential equations. After discretization, we need to solve the system
Kx = b. If the linear approximation of the reflectance function technique [38] is used
to improve the convergence, the stiffness matrix K sR3Nx3N has the following form.
pKs
pi
pD (g) I
pi
AKS + (Rp + p) 1
RpRql
pDT (g> I
RpRql
AK, + (Rq + p)I
(5.9)
where K, $NxN is the Laplacian matrix, I $lNxN is the identity matrix, Rv and
Rq are the firstorder partial derivatives of the reflectance function with respect to p
and q, respectively, taken at the reference gradient (po, <7o), and the matrix D 9?nXn
(N = n2) has nonzero entries only along the main diagonal and upper diagonal with
180
We proposed an informed genetic algorithm for the stochastic search, while using ei
ther one of our generalized capacitance matrix algorithm or adaptive preconditioned
conjugate gradient algorithm for the deterministic search. We presented some promis
ing preliminary results to demonstrate the effectiveness of this algorithm in recovering
shape along with its discontinuities from sparse range data. However, extensive ex
perimentation as well as further theoretical research needs to be performed prior to
judging the practicality of this algorithm in solving early vision problems leading to
nonconvex minimization.
163
We can transform the minimization of E(u,l) with respect to u,l to the mini
mization of a function of the boolean variables 1 only as follows.
minÂ£(u,l) = mm min^(u,l) (7.6)
= rnin{f7(l) + rninÂ£(u  1) (7.7)
Define the function
Â£'(1):= 1/(1) + min Â£(ul) (7.8)
Then the original nonconvex minimization problem is transformed to the combinato
rial optimization problem, i.e. minimization of a function of boolean variables. Note
that the function E(u  1) is a quadratic and convex function of u when the operator
in the observation model, K, is linear.
Previously, hybrid algorithms [52, 5] were proposed to use simulated annealing
for the binary line process and analog network for minimizing the quadratic function
Â£(u  1) for each fixed line process. In our hybrid search algorithm, we use an informed
genetic algorithm to minimize this new energy function. To compute the new energy
function value in the GA, we need to minimize the convex and quadratic function
Â£(u  1). The minimum of a quadratic and convex function can be obtained by
solving a linear system with an associated symmetric positivedefinite (SPD) matrix.
Either one of the two fast algorithms namely, the generalized capacitance matrix
algorithm and the adaptive preconditioned conjugate gradient algorithm, presented
in this thesis, can be used to solve the SPD linear system.
58
We can see that this particular choice of K0 makes it circulant Toeplitz and thus
translation invariant. In addition, it can be decomposed as the sum of two tensor
products of matrices A and I, i.e.,
Kq A <8> I +1 <8> A,
(4.6)
with
A =
I + 2A
A
A
A i + 2A A
A A ). 2A
A
A
A  + 2A
and is the Kronecker product. By using this special structure of the matrix K0, we
can rewrite any linear system of the form K0z = f as the following Lyapunov matrix
equation
AZ + ZA = F,
(4.7)
where Z and F are nxn matrices corresponding to their concatenated n2 x 1 vectors
z and f respectively. It should be noted that n2 = N, and that the matrix A in
equation 4.7 is circulant Toeplitz and symmetric positivedefinite. We can use the
ADI (Alternating Direction Implicit) method to solve this Lyapunov matrix equation
in O(N) operations. We will prove this O(N) computational complexity of the ADI
method in section 4.3.
116
(a)
(b)
Figure 5.13. (a) The convergence for the APCG, HCG and CG algorithms for the
surface reconstruction problem with the real laser range data set using a controlled
continuity stabilizer, (b) The computed solution after 10 APCG iterations.
74
where # is the wavelet transform matrix with each row corresponding to a wavelet
basis vector. If we use the orthogonal wavelet transform, then Dq = ^r(Dq).
Therefore, we compute matrixvector multiplication in the wavelet domain and then
transform back to the original domain by using the wavelet reconstruction (inverse
wavelet transform) to obtain the product Dq.
It is very crucial to know the structure of the capacitance matrix to achieve an
accurate and efficient approximation of the matrix. In equation 3.4, the capacitance
matrix is composed of three matrices, i.e., the identity matrix I, ^2^=0
and VTrj. The first two matrices are extremely sparse, but the third is a dense
matrix. When computing the matrixvector product of the capacitance matrix and
a vector, we directly compute the separate products for the first two sparse matrices
while the aforementioned wavelet approximation scheme is used only for the matrix
VTrj.
m1 qfu
p
4.4 Experimental Results
We have successfully applied our generalized capacitance matrix algorithm to
a host of early vision problems which use the membrane and/or thinplate spline
to constrain the solution [85, 45]. These early vision problems include the surface
reconstruction, shape from shading, optical flow computation, lightness problem,
etc. [85, 45]. These problems when formulated as variational principles and dis
cretized, lead to solving one or more large sparse linear systems. In this section,
we present the results of applying our algorithm to the surface reconstruction and
shape from shading problems.
59
Thinplatemembrane spline for the dense data case
The construction of the matrix K0 for the thinplatemembrane spline prob
lem is based on the above construction for the membrane spline. Analogous to the
matrix Kmem which is a wellstructured matrix and close to the smoothness con
straint matrix Ks constructed for the membrane problem, a matrix Ktp is used for
the smoothness constraint matrix discretized from the thinplate smoothness energy.
The differentiation of the membrane smoothness energy function corresponds to a
Laplacian operator A, while the corresponding operator for the thinplate smooth
ness energy is a biharmonic operator A2. Thus, we can extend the selection scheme
for the membrane problem to the thinplate problem by choose the matrix Kip as
K* = KL. (4.8)
The smoothness matrix for the thinplatemembrane spline is a convex combination
of the membrane and thinplate smoothness matrices. Lets define the matrix Ktpm
as a convex combination of Kmem and Ktp, i.e.,
Ktpm crKtp + (1 cr) K mem ? (4.9)
where 0 < a < 1. Similar to that of the membrane spline, the matrix K0 is chosen
to be AKtpm f I, which can be written as
K0 = AcrK^em + A(1 <7)Kmem + I
144
(a) (b)
Figure 6.2. Frames from the (a) Square 2 and (b) Diverging Tree image sequences.
The motion in the Square 2 sequence is a constant translation. In our experi
ment, the regularization parameter A is chosen to be 10, and the weighting parameters
a and (3 for the image flow constraint and contourbased constraint are set to 1. The
optical flow obtained after 40 iterations of the incomplete Cholesky preconditioned
conjugate gradient algorithm is shown in figure 6.3(a). The errors of our modi
fied gradientbased regularization method and the other gradientbased methods for
this translating square example are given in Table 6.1. Only the results from those
gradientbased methods that yield 100% density flow estimates reported in [3] are
included in this table for comparison. It is obvious that our modified gradientbased
regularization method drastically outperforms the other gradientbased methods in
this example. Note that the results of the other two gradientbased methods in table
6.1 for this image sequence could be improved by increasing the degree of smoothness,
166
Reproduction, also called selection, is the process of choosing the candidate so
lutions for the next generation from the candidate solutions in the current generation.
Some selection schemes have been proposed [25], e.g. proportional selection, ranking
selection, tournament selection ..., etc.
Crossover selects two samples from the current population as parents and then
mates them by exchanging a part of their gene values to generate their offsprings.
The parents are replaced by their offsprings in the current population. There
are many possible implementations of the crossover process. For example, we can
randomly select two samples, a start position and an end position in the gene string,
then exchange the 0s and ls of the two samples between the two positions. An
example is illustrated in figure 7.1.
Parent! = 0110100011010101
Parent2 = 1101001010100110
Child1 = 0110001010010101
Child! = 1101100011100110
Figure 7.1. The crossover operation example.
The mutation operator is similar to the perturbation of the Gibbs sampler in
the simulated annealing. It can prevent GAs from premature convergence. It is very
easy to implement this operator. Given a mutation probability, randomly choose a
sample and a position and flip the 0 or 1 at that location according to the mutation
probability.
137
where the vector u is the concatenation of all the components u, and n,.
The minimization of the above energy leads to solving a linear system of equa
tions Ku = b where the stiffness matrix K Â¡ft22*22 is symmetric positivedefinite
and it has the following 2x2 block structure.
K =
Ks + EXX
E
xy
EXJ, Ks 4 E^j,
(6.9)
where Ks 3?"2x"2 is the discrete 2D Laplacian matrix from the membrane smooth
ness constraint, Exx, Exy, and Eyy are all n2 x n2 diagonal matrices with entries Exi,
ExEy and E2yi, respectively.
To efficiently solve the above linear system for computing optical flow, we use
the adaptive preconditioned conjugate gradient algorithm developed in chapter 5.
6.3 SSDbased Regularization Method
The SSD function in equation 6.2 can be used as the data constraint energy in
a regularization framework. Incorporating the membrane smoothness and the SSD
constraints into the regularization formulation, we obtain the total energy function
/(u) = Â£(7i(*' + + u*) ~ Jo(xi,Vi))2 + A{u2x + u2yi + v2xi + v2yi) (6.10)
'Â£>
where (x,, t/,) is the ith discretized location in the image domain. Note that the data
constraint, ii(x, + + vÂ¡) = 7o(x,,?/,), is not linear in the flow vector u, or u,.
If the Taylor expansion of the data constraint is taken up to first order terms, we
5
biconjugate gradient technique. We prove that the ADI method takes a constant
number of iterations with each iteration taking 0(N) time.
Our adaptive preconditioning technique is used in conjunction with a conjugate
gradient or nonlinear conjugate gradient algorithm to minimize the energy function.
This adaptive preconditioner is constructed in a wavelet basis and adapted to the
spectral characteristics of the imposed smoothness constraint as well as the data
constraint in the energy function. The construction of the preconditioner makes
use of the division of frequency domain property in a wavelet basis. Previously,
there have been some preconditioners proposed in computer vision literature [75, 61,
92]. However, they are either not adapted to the problem [75, 92] or constructed
inappropriately [61]. We empirically show the superiority of our preconditioner over
other preconditioners on various early vision problems.
To solve the early vision problems with discontinuity detection, both regular
ization and MRF formulations lead to the minimization of a coupled (binaryreal)
nonconvex energy function. Numerous algorithms have been proposed to solve this
coupled nonconvex minimization problem. They can be categorized into the deter
ministic methods [8] [44] [21] and the stochastic methods [22] [51]. Most of the deter
ministic methods can only deal with simple constraints on the line process [8, 21, 31];
in addition, they can not find the global minimum solution. The stochastic optimiza
tion methods are primarily the simulated annealing (SA) type algorithms. Simulated
annealing is a general global optimization algorithm, but its convergence rate is very
slow, thus making it impractical for many applications.
3
is via the introduction of a set of binary variables called the line process. Each line
process variable takes a value of 1 in the presence of discontinuities and zero other
wise. Line process has been included in both the deterministic [53, 8] and the MRF
[22] formulations. The resulting energy function involves real variables (for surface
or image) coupled with binary variables (for discontinuities) and is therefore a non
convex function. Both formulations lead to problems involving the minimization of
nonconvex functions.
1.2 Computational Algorithms
For the prespecified discontinuities case, numerous iterative methods have been
employed to minimize the energy functions namely, GaussSeidel [14], successive
overrelaxation (SOR), conjugate gradient (CG) [26], hierarchical conjugate gradi
ent (HCG) [75], and multigrid methods [79], the slowest being GaussSeidel, taking
0(N2) time to converge ( N(= n2) is the number of discretization ) [26] while, the
fastest being multigrid methods taking O(N) time to converge for some regular class
of problems [29]. Although the multigrid techniques have been applied successfully
to a general class of problems, the proofs of convergence are restricted to a special
class of problems [29, 9, 91]. For these results ( posed as solution for PDE problems),
discontinuity in the solution is not considered. Therefore, inclusion of discontinuity
into the problem is bound to deteriorate the convergence speed of multigrid methods.
In addition, the construction of multigrid needs to be modified for nonrectangular
regions. The modification becomes quite complicated for implementation.
73
3. Set rk = rjti a^Cq^x, and r* = f*i akCTqk
4. Compute p% = r^r*, and pk = p%/a%;
5. Set qk = rk + pkqki, and qk = rk + pkqki',
6. If rk 0 or rk 0, stop; else go to 1.
Like the CG, the operations and storage requirement per step in the BCG are con
stant. The convergence behavior is similar to the CG method except that the BCG
algorithm is susceptible to breakdowns and numerical instabilities. To be more spe
cific, division by 0 (or a number very close to 0) may occur in computing ak and
pk. These two different breakdowns may be avoided by using the lookahead Lanczos
algorithm or the quasiminimal residual(QMR) algorithm [19].
In the above algorithm, two matrixvector multiplications Cq*_i and Cq*_i are
required in each iteration. Since the matrix C is dense, the direct multiplication will
take O(N) operations, which is not acceptable to our pursuit of an O(N) algorithm
for the Poisson equation. Thus, we propose to carry out the multiplication in a
wavelet basis to reduce the computational cost [1, 7]. With appropriate wavelet basis
[7], the decomposed matrix can be approximated to high accuracy by a sparse matrix.
After the approximation, the multiplication of this sparse matrix and the decomposed
vector can be reduced to 0(n) or 0(nlog(n)) operations [7]. Consequently, the
modified BCG method is computationally efficient.
The wavelet transform of a matrix D and vector q can be written as follows:
D = <Â£D4>t, q = <Â£q.
(4.35)
108
(a)
(b)
(c)
Figure 5.5. The computed solutions after (a) 10 iterations of APCG (b) 25 iterations
of HCG (c) 30 iterations of CG using the same time.
125
with an appropriate number of levels L, exhibits fast convergence, it is difficult to
determine the best L for a problem in advance. As for the APCG algorithm, the
number of levels is always chosen to be the maximum possible which allows the linear
convolution in the QMF implementation with the mirror reflection at the border to
be carried out with ease.
(a) (b)
Figure 5.21. Computed flow of the (a) translating square and (b) rotating Rubic
cube.
In our examples, the regularization parameter A was set to a value of 10 for the
translating square example and 1 for the rotating Rubic cube experiment. For the
former, the computed optical flows after 10 iterations of the APCG algorithm is shown
in figure 5.21(a), and the average angular error [3] is 0.21 with the standard deviation
0.07. Compared to the results for the other gradient based methods reported in
Barron et al. [3], our new formulation gives the best solution with 100% flow density.
In fact, our adaptive preconditioner can also be used with the other gradient based
150
Technique
# of
Frames
Average
Error
Standard
Deviation
Density
Ur as et al.
15
0.62
0.52
100%
Szeliski and Coughlan
2
0.35
0.34
100%
Our method (SSDbased)
2
0.30
0.19
100%
Table 6.4. Summary of Translating Tree results.
Finally, our SSDbased regularization method is applied to a real image sequence
namely, the Hamburg Taxi sequence. One frame of this sequence and the computed
optical flow are shown in figure 6.7. The computed optical flow between two con
secutive images was obtained after 200 iterations of the preconditioned nonlinear
conjugate gradient algorithm.
(a) (b)
Figure 6.7. (a) One frame from the Hamburg Taxi sequence (b) Computed optical
flow after 200 iterations of the preconditioned nonlinear CG algorithm.
6.5 Summary and Discussion
In this chapter, we presented two new algorithms for robust and efficient compu
tation of the optical flow from an image sequence. One is the modified gradientbased
18
fusion, and integration of multiple modules [51]. This can not be achieved by
a deterministic model.
Therefore, it can be concluded that the MRF model is more versatile than the
deterministic models. The efficient computational algorithms proposed in this thesis
can be applied to solve the minimization problems for both the deterministic and
probabilistic formulations.
2.3 Variational Formulations for Some Early Vision Problems
From the previous two sections, we can see the link between the regulariza
tion and MRF formulations, which lead to the minimization of equivalent energy
functions. In this section, we give the deterministic formulation of some early vi
sion problems namely, surface reconstruction, shape from shading and optical flow
computation. The corresponding probabilistic formulation can be directly obtained
from the equivalence between these two formulations [51]. For ease of exposition, the
location of surface and orientation discontinuities (if any) are assumed to be known
in the below discussion. However, we will discuss the issue of discontinuity detection
in chapter 7.
Variational formulations for various early vision problems have been reported
in [79] and references therein. These formulations make use of the popular theory
of regularization. In a regularization framework, generic smoothness assumptions
are imposed on the solution space prior to attempting any functional minimization.
The smoothness constraints are well characterized by a class of generalized multi
dimensional spline functionals [80]. The formulations involve minimization of an
183
[25] D. E. Goldberg and K. Deb, A comparative analysis of selection schemes used
in genetic algorithms, Technical Report TCGA Report No. 90007, Univ. of
Alabama, 1990.
[26] G. H. Golub and C. F. Van Loan, Matrix Computations Baltimore: The Johns
Hopkins University Press, 2nd ed., 1989.
[27] J. J. Grefenstette, Genetic Algorithms and their applications , in The En
cyclopedia of Computer Science and Technology (A. Kent and J. G. Williams,
eds.), vol. 21, pp. 139152, New York: Marcel Dekker, 1990.
[28] W. E. L. Grimson, An implementation of computational theory for visual sur
face interpolation , Computer Vision, Graphics, and Image Processing, vol. 6,
no. 22, pp. 3669, 1983.
[29] W. Hackbusch and U. Trottenberg, eds., Multigrid Methods (Lecture Notes in
Mathematics), vol. 960. New York: SpringerVerlag, 1982.
[30] W. W. Hager, Updating the inverse of a matrix , SIAM Review, vol. 31,
pp. 221239, June 1989.
[31] J. G. Harris, C. Koch, E. Staats, and J. Luo, u Analog hardware for detect
ing discontinuities in early vision , International Journal of Computer Vision,
vol. 4, pp. 211223, 1990.
[32] F. Heitz and P. Bouthemy, Multimodal estimation of discontinuous optical
flow using Markov random fields , IEEE Trans. Patt. Anal. Machine Intel!,
vol. PAMI15, pp. 12171232, 1993.
[33] F. Heitz, P. Perez, and P. Bouthemy, Multiscale minimization of global energy
functions in some visual recovery problems, CVGIP: Image Understanding,
vol. 59, no. 1, pp. 125134, 1994.
[34] H. V. Henderson and S. R. Searle, On deriving the inverse of a sum of matrices
, SIAM Review, vol. 23, no. 1, pp. 5360, 1981.
[35] E. C. Hildreth, Computations underlying the measurement of visual motion ,
Artificial Intelligence, vol. 23, pp. 309354, 1984.
[36] J. J. Hopfield and D. W. Tank, Neural computation in optimization problems,
Biological Cybernetics, vol. 52, pp. 141152, 1985.
[37] B. K. P. Horn, Robot Vision. Cambridge, MA: The MIT Press, 1986.
[38] B. K. P. Horn, Height from shading , Inti. Journal of Computer Vision, vol. 5,
no. 1, pp. 174208, 1990.
[39] B. K. P. Horn and M. J. Brooks, eds., Shape from Shading. Cambridge, MA:
The MIT Press, 1989.
7
achieved by using the division of frequency domain property in a wavelet basis.
By using this construction, our preconditioner is adapted to to the smoothness
constraint as well as data constraint.
3. Two robust algorithms are proposed for computing optical flow. The first is a
modified gradientbased regularization method, and the other is an SSDbased
regularization method. In our modified gradientbased method, we selectively
combine the image flow constraint and the contourbased flow constraint into
the data constraint in a regularization framework to amend the errors in the
image flow constraint caused by brightness discontinuities. Our SSDbased
regularization method uses the SSD (Sum of Squared Differences) measure as
the data constraint in a regularization framework. Experimental results for
these two algorithms are given to demonstrate their superior performance.
4. A novel hybrid (stochastic + deterministic) search algorithm is proposed to
solve the nonconvex optimization problems for early vision involving disconti
nuity detection. This hybrid search algorithm consists of an informed genetic
algorithm (GA) and either one of the two aforementioned numerical algorithms.
The informed GA is used for the line process only in the outer loop of our algo
rithm, while the deterministic algorithm is employed to solve the subproblems
of minimizing the energy function given a fixed line process configuration.
The efficiency of the proposed algorithms is demonstrated through experiments
on several early vision problems, including surface reconstruction, shape from shad
ing, and optical flow computation. In addition to computer vision applications, the
182
[12] A. K. Chhabra and T.A. Grogan, On Poisson solvers and semidirect methods
for computing area based optical flow, IEEE Trans. Patt. Anal. Machine Intell.,
vol. 16, Nov. 1994.
[13] M. Creutz, Microcanonical Monte Carlo simulation , Physical Rev. Lett.,
vol. 50, pp. 14111414, 1983.
[14] G. Dahlquist and A. Bjorck, Numerical Methods. Englewood Cliffs, NJ: Prentice
Hall Inc., 1974.
[15] T. Davis, Toward an Extrapolation of the Simulated Annealing Convergence
Theory onto the Simple Genetic Algorithm. Ph.D. Dissertation, University of
Florida, 1991.
[16] M. de la Maza and B. Tidor, Boltzmann weighted selection improves per
formance of genetic algorithms , Technical Report AI Memo 1345, MIT, A.I.
LAB., Cambridge, MA, 1991.
[17] J. H. Duncan and T.C. Chou, On the detection of motion and the computation
of optical flow, IEEE Trans. Patt. Anal. Machine Intell., vol. PAMI14, pp. 346
352, March. 1992.
[18] D. J. Fleet and A. D. Jepson, Computation of component image velocity from
local phase information, Inti. Journ. of Comput. Vision, vol. 5, pp. 77104,
1990.
[19] R. W. Freund, G. H. Golub, and N. M. Nachtigal, Iterative solution of linear
system , in Acta Numrica 1992, pp. 57100, New York: Cambridge University
Press, 1992.
[20] D. Gabor, Theory of communication , Journal of the IEE, vol. 93, pp. 429
457, 1946.
[21] D. Geiger and F. Girosi, Parallel and deterministic algorithms from MRFs:
surface reconstruction , IEEE Trans. Patt. Anal. Machine Intell., vol. 13, no. 5,
pp. 401412, 1991.
[22] S. Geman and D. Geman, Stochastic relaximation, Gibbs distributions, and
the Bayesian restoration of images , IEEE Trans. Patt. Anal. Machine Intell.,
vol. 6, no. 6, pp. 721741, 1984.
[23] P. E. Gill, W. Murray, and M. H. Wright, Practical Optimization. New York:
Academic Press, 1981.
[24] D. E. Goldberg, Genetic Algorithm in Search, Optimization and Machine Learn
ing. Reading, MA: AddisonWesley, 1989.
19
energy functional Â£, which is the sum of the energy contribution from the smoothness
constraint (S) and the data constraint (V) (see [80, 75] for details of this formulation).
2.3.1 Surface Reconstruction
For the surface reconstruction problem, we are faced with recovering surface
shape from either sparse or dense range data. For a first order stabilizer, the smooth
ness constraint is given by
$() = J JQ(vl + vy) dxdy (214)
where u(x, y) is the admissible function and vx, vy its partial derivatives assumed to
be small. For the case of a second order stabilizer, we have the following smoothness
constraint
Â£() = + + uw) dajd> (215)
The first and second order stabilizer can be combined to form the controlledcontinuity
stabilizers [80]. To the smoothness constraint, we add the data constraints in the
form of penalty terms. The following penalty term which measures the discrepancy
between the surface and data weighted by the uncertainty in the data may be used,
pm = \'E.cÂ¡ iv(x" df
(2.16)
LIST OF FIGURES
4.1 The flow chart of the generalized capacitance matrix algorithm. ... 53
4.2 (a) Data constraint molecule and (b) membrane computational molecules
periodically imposed on the domain 0 56
4.3 Surface reconstruction example; (a) original sparse data, (b) recon
structed surface using membrane smoothness, (c) reconstructed sur
face using thinplate smoothness, obtained using the generalized ca
pacitance matrix algorithm 76
4.4 Surface reconstruction with prespecified discontinuities along a hori
zontal line; (a) original sparse data set with depth discontinuities along
the indicated horizontal line; reconstructed surfaces using a membrane
spline after (b) 1 iteration and (c) 10 iterations of ADI method; (d)
reconstructed surface using a thinplate spline after 16 iterations of
ADI method 77
4.5 Surface reconstruction with prespecified discontinuities along a diag
onal line; (a) the original sparse data set with depth discontinuities
along the indicated diagonal line, (b) reconstructed surface using a
membrane spline after 10 iterations of ADI method 78
4.6 Shape from shading example, (a) Shaded sphere image (b) irregular
occluding boundary imbedded in a rectangle (c) recovered surface ori
entation, (d) reconstructed surface shape 80
5.1 The implementation of the wavelet transform via the QMF struc
ture from one level to the next, (a) decomposition (b) reconstruction,
adopted from Mallat[1989] 92
5.2 Division of the frequency domain in a wavelet basis 92
5.3 The sparse data sets used for the surface reconstruction, (a) synthetic
data (b) synthetic symmetric data (c) real laser range data 104
vm
10
In chapter 7, we list several directions for conducting future research based on
the research reported in this thesis. For the early vision with discontinuity detec
tion problem, we discuss numerous existing techniques and propose a new hybrid
search algorithm as a possible approach toward solving this problem. By includ
ing line process into the optimization of the energy function, the early vision along
with discontinuity detection problem is formulated as the minimization of a coupled
(binaryreal) nonconvex energy function. Our hybrid search algorithm consists of a
stochastic and a deterministic search techniques. For the stochastic search, we pro
pose an informed genetic algorithm (GA) as a global minimizer for the binary line
process only, whereas for the deterministic search, we employ either one of the two
deterministic algorithms mentioned above to solve the minimization problem for each
fixed line process configuration inside the GA. Promising preliminary experimental
results of using this algorithm on the sparse data surface reconstruction problems are
given.
Finally, we summarize the results of this thesis in chapter 8.
161
Some variants of the simulated annealing algorithm have been reported in liter
ature, e.g. microcanonical annealing [13]. However, their computational complexities
are similar to the standard annealing.
MPM method
The MPM (maximizer of posterior marginals) method was proposed by Marro
quin et al. [51]. It is an approximation to MAP estimation, i.e. the MPM method
only provides an approximate solution. Unlike the gradual temperature cooling sched
ule in the simulated annealing algorithm, the MPM method executes the Metropolis
algorithm at a fixed temperature to collect statistics about the equilibrium behav
ior of the coupled MRF. Therefore, this method requires far less computation than
simulated annealing.
7.3 Proposed Hybrid Search Algorithm
In this section, we present a novel hybrid (stochastic + deterministic) search
algorithm as a possible solution to the coupled (binaryreal) nonconvex optimization
problem. This hybrid search algorithm consists of an informed genetic algorithm
(GA) and either one of our generalized capacitance matrix algorithm or the adaptive
preconditioned conjugate gradient algorithm. The informed genetic algorithm is em
ployed as a global minimizer on the binary line process. Inside the GA, either one of
the aforementioned deterministic algorithms is used on the real ( surface ) variables
to solve the quadratic and convex minimization problems for each given line process
configuration in the GA.
84
of iterations to the correct solution for very sparse data that is used in the examples
of this thesis.
Yaou and Chang [92] report a fast surface interpolation algorithm using the
multiresolution wavelet transform. This work is almost exactly similar to that of
Szeliski with the exception that Yaou and Chang use a wavelet basis instead of the
hierarchical basis and a Jacobi iteration [26] instead of the conjugate gradients used in
Szeliski [75]. As in Szeliski, the full potential of a preconditioning technique was not
exploited in Yaou and Chang i.e., the design of the preconditioner did not make use of
any information about the problem being solved. The stiffness matrix in the surface
interpolation problem was first transformed to a biorthonormal wavelet basis and
then the transformed linear system was solved using a Jacobi iteration. Note that,
in a preconditioning technique applied to the the positive definite stiffness matrix K,
one tries to approximate K as well as possible with another positive definite matrix P
known as the preconditioner, such that P1K is a good approximation to the Identity
matrix. Such an approximation P was never constructed in Yaou and Chang [92].
Further, upon implementing their algorithm, it was found that their published results
about rate of convergence were only valid for data from periodic function since their
implementation took advantage of periodic wavelet transforms.
In this chapter, we will present a very effective way to construct a physically
based adaptive preconditioner in a wavelet basis for several early vision problems
namely, the surface reconstruction (SR), shape from shading (SFS) and computation
of optical flow (OF). This physicallybased preconditioner is adapted to the membrane
124
(a) (b)
Figure 5.20. One frame from the (a) square image sequence and (b) rotating Rubic
cube image sequence.
The construction of our adaptive preconditioner for the optical flow problem is
described in section 5.3. This preconditioner is embedded in the conjugate gradient
to accelerate the convergence speed for solving the linear system in the optical flow
computation. We use the CG, HCG ( with various number of levels), and our APCG
algorithms to solve the problem. Figure 5.22(a) and (b) shows the convergence rates
of the three algorithms for the translating square and the rotating Rubic examples,
respectively. We can see that our APCG algorithm has the best performance in
convergence. Note that the HCG algorithm also converges quite fast because the
linear system for the optical flow problem is not as illconditioned as the one in the
surface reconstruction and shapefromshading problems. Therefore, the difference
between various preconditioning methods in not as pronounced for the optical flow
problem as in the other two (SR & SFS) problems. Although the HCG algorithm
CHAPTER 1
INTRODUCTION
Early vision has been one of the most fundamental research topic in computer
vision. The primary goal of early vision is to recover the geometric or physical
properties of threedimensional objects from twodimensional sensed data. Typical
properties of objects are the shape ( depth or orientation ), texture, color, reflectance
and motion parameters. Example tasks of early vision include surface reconstruction,
optical flow computation, shape from shading, stereo matching, edge detection and
lightness problem [28, 41, 40, 38, 63, 6, 72]. The results of early vision can be used
for higher level tasks such as bin picking, object recognition, passive navigation and
motion tracking.
Early vision problems can be regarded as inverse problems. During the imaging
process, some information is lost since it projects threedimensional objects to two
dimensional images. Therefore, it is usually not possible to solve the inverse problems
from data information only. In fact, most early vision problems are illposed problems
[6]. A problem is said to be wellposed if its solution (a) exists, (b) is unique, and
(c) depends continuously on the input data [62]. We can transform an illposed
problem into a wellposed one by imposing additional constraints (such as smoothness
constraints) on the solution.
1
This dissertation was submitted to the Graduate Faculty of the College
of Engineering and to the Graduate School and was accepted as partial ful
fillment of the requirements for the degree of Doctor of Philosophy.
August 1995
Winfred M. Phillips
Dean, College of Engineering
Karen A. Holbrook
Dean, Graduate School
134
The contourbased methods [35, 17] compute the optical flow along the zero
crossing contours by using the following flow constraint equation
Sx(x, y, t)u(x, y, t) + Sy(x, y, t)v(x, y, t) + St{x, y,t) = 0 (6.4)
where S(x, y, t) is the convolution of 7(x, y, t) with the second derivative of the Gaus
sian function, Sx, Sy, and St are the partial derivatives of S with respect to x, y and
t, respectively. In [35], the function S was chosen to be the convolution of the Lapla
cian of Gaussian (LOG) filter with the image brightness function, i.e. V2G*/, where
V2 = ^ + Jp, G(x,y) is the 2D spatial Gaussian function and the convolution is
performed in 2D.
Now, we incorporate the above two flow constraint equations into the regu
larization framework. The image flow constraint in equation 6.3 is disabled in the
neighborhood of brightness discontinuities, while the contourbased flow constraint
is active along the zerocrossing. The variational formulation of the optical flow
problem using this selective scheme for the data constraint leads to minimizing the
following functional
/ a((V/.u) + /i)2 + A( Vu 2 +  Vu 2)dx+ f /?(V5u + 5t)2dx (6.5)
J fi J zerocrossing
where u(x, t) = (u(x,t),v(x,t)) is the optical flow field to be estimated, x = (x,y)
is a 2D vector, 1 is the 2D image domain, a, A and (3 are the weighting functions
associated to the image flow constraint, smoothness constraint and contourbased
40
(4) Now, we will show that B(x rj(3) = b. Substituting A + UGVT for B
and expanding, we have
B(x rj/3) = Ax Atj/3 + UGVTx UGVTrjf3. (3.18)
Rewriting Vtt]/3 using equations 3.4 and 3.5 and some algebraic manipulation, we
get
mAmB
Vtt7/3 = Vtx Â£
uj
nTb mAmB
ij
3 T
qju
3=1
q; Uj
(3.19)
Substituting for Ax, Atj and VT77/3 from equations 3.2, 3.3 and 3.19 respectively,
equation 3.18 leads to B(x r]/3) = b.
In theorem 1, we assume that the matrices U and V contain linearly independent
columns and G is nonsingular, which is equivalent to saying that U, V and G are all
of full rank, in addition, the condition qfuj = a,<(, j) must be satisfied for 1 < i,j <
mAmB In fact, we can prove that given any two matrices A, B 3?nXn, there always
exist fullranked matrices U, V 9?nxp and G 6 3ipxp such that UGVT = B A and
p = rank(B A). This can be verified by taking the singular value decomposition of
the matrix B A as \JS'VT, where both U and V are orthogonal matrices and Â£ is
a diagonal matrix with rank p = rank(B A). We can reduce the size of the matrix
Â£ by retaining the p nonzero diagonal entries and taking out the remaining rows
and columns of zeros. This reduced matrix G G 9Â£pxp is nonsingular. In addition,
we take out the columns in U and V corresponding to the zero singular values in Â£,
48
In general, Theorems 1 and 2 can be regarded as the special cases for Theo
rem 3 with the exception of the additional assumption on the linear independence of
the two sets of vectors {Ar'm+i,..., Ar'ms} and {ux,..., up} made in Theorem 3 to
guarantee the uniqueness of the solution in the capacitance matrix equation. This as
sumption is automatically satisfied for Theorem 1 since the set {Ar'm/+i,..., Ar'ms}
is empty and the vectors Ui,...,up are linearly independent. However, there is no
such assumption in Theorem 2, which gives the existence of a solution to the capac
itance matrix equation instead of the uniqueness of the solution given in Theorem
3.
3.3 Discussion
In this chapter, we presented the generalized capacitance matrix theorems for
solving linear systems. Our generalized capacitance matrix theorems can be used for
very general matrices A and B, therefore the resulting algorithm can be applied to a
very broad class of problems and we have ample freedom to choose the matrix A ap
propriately for designing efficient numerical algorithms. The traditional capacitance
matrix algorithm [10, 65, 66] has been primarily applied to solve the secondorder
elliptic boundary value problems, whereas our generalized capacitance matrix algo
rithm can be used to solve anyorder elliptic boundary value problems. Also, we can
handle cases such as the shape from shading ( SFS ) problem on an irregular domain
with Neumann boundary conditions which specifically yields a singular K matrix.
This case can not be solved using the capacitance matrix technique of Buzbee et al.,
[10] as mentioned in [72] because, the capacitance matrix theorem of [10] requires a
179
case. Hence, we presented an alternative algorithm called the adaptive precondition
ing technique as a general solution to early vision problems with prespecified disconti
nuities. This adaptive preconditioner was constructed in a wavelet basis based on the
approximation to the spectral characteristics of the smoothness and data constraints
by using the division of frequency domain property in a wavelet basis. We empirically
showed the superiority of this adaptive preconditioner over other preconditioners pre
viously proposed in vision literature through experiments on surface reconstruction,
shape from shading and optical flow computation.
In the context of robustness, we developed two new and robust algorithms for
accurate optical flow estimation. One is a modified gradientbased algorithm which
combines the image flow constraint and the contourbased flow constraint into the
data constraint in a regularization framework to amend the errors in the image flow
constraint caused by brightness discontinuities. To solve the linear system arising
from the regularization formulation, our adaptive preconditioned conjugate gradient
algorithm was employed, leading to an efficient algorithm. The other is an SSD
based regularization method that uses the SSD measure as the data constraint in
a regularization framework. The preconditioned nonlinear conjugate gradient with
a modified search direction scheme was developed to minimize the resulting energy
function. Experimental results for these two algorithms are given to demonstrate
their performance.
Finally, in future research directions, we presented a novel hybrid search algo
rithm which involved a combination of stochastic and deterministic search techniques.
51
Note that the step 2 of the algorithm requires the solutions t^, ..., rÂ¡p in equation
3.33 with different RHSs. The computational cost to solve this equation p times is
very high when p is large. However, the vectors rÂ¡1,...,rÂ¡p can be easily obtained
when the matrix A is chosen to be circulant Toeplitz and the vectors u, have a very
sparse set of nonzero entries. The circulant Toeplitz property of the matrix A makes
it shiftinvariant, which means AS'(u) = S(Au) for all u where S is a shift
operator [59]. A linear shiftinvariant operator can be completely characterized by its
impulse response, which is the solution of the corresponding linear system Au = e
where the RHS e is a unit vector. Thus, the solution 77, can be obtained via the
convolution of the impulse response for this shiftinvariant linear system with the
RHS in equation 3.33, which is sparse since the vectors Ui,...,up themselves are
sparse. The convolution of the impulse response with a sparse vector can be easily
obtained by a linear combination of the shifted impulse response vectors. When the
vectors v,, i = 1 ,...,p, are also sparse, the computation of the matrix VTT7 for
forming the capacitance matrix C can be further simplified by obtaining the (,j)th
entry of \Tr] from the impulse response corresponding to the relative locations of
the nonzero components of v, and those of Uj,..., um^_m and Uj.
The direct computation of rj/3 in the final step of the generalized capacitance
matrix algorithm requires O(pn) operations, which are very costly when p, the rank
of difference between B and A, is large. Instead, we solve the following linear system
qJV0
niAm
Ay = U/3 Â£ JT Uj.
(4.1)
143
The angular measure of error [3] between the computed optical flow and the
true flow is adopted in the experiments as a basis for comparison with the previously
reported results. Unlike some other methods which only produce sparse optical flow,
both of our regularization methods presented in this chapter give dense optical flow
estimates with 100% density. To achieve a fair comparison with the reported results,
only those techniques reporting 100% flow density in literature are included in our
comparative study.
6.4.1 Modified Gradientbased Method
In the implementation of our modified gradientbased regularization method,
we treat the zerocrossing points of S(x,y,t) as the discontinuity locations. The
computation of the derivatives Ix, Iy and It is very errorsensitive. Therefore, we
presmooth the image brightness function with the Gaussian filter prior to using the
central difference method to approximate the derivatives. As for Sx, Sy and St, we
directly use the central difference scheme to approximate them since the process of
convolution with the LOG (Laplacian of Gaussian) filter for generating the function
S already involves the smoothing operation.
Two standard synthetic image sequences are tested by using our modified gradient
based regularization method to compute the optical flow. These two image sequences
are the images of a translating Square and a Diverging Tree reported in [3]. Figure
6.2 shows one frame from each sequence. The image sizes for the square and diverging
tree are 100 x 100 and 150 x 150 respectively.
47
Let x be a solution to
Ax = b Â£
mAm qib
J=1 *Jui
U
J >
(3.32)
where b range(B). For 1 < i < p, let 77, be a solution to
mAm QT
Ai? = u Â£ ikruJ (333)
1=
and choose rjl... TjmA_m to be linearly independent and linear combinations of q'j,...,
q'mAm Define the capacitance matrix C as
mAm gt U
C = G1 + Vtt/ y G'e.V, (3.34)
where rÂ¡ = [77^ ..., rÂ¡p], and ej is the jth standard basis vector in 3?p. Assume the
two sets of vectors {Ar'm/+i, , Ar'mB} and {ui,...,up} are linearly independent,
then there exists a unique solution (3 to the capacitance matrix equation
C/3 = VTx 
rn^m
U
(3.35)
and x r}(3 is a solution to the linear system Bx = b.
The proof of this theorem is omitted here since, it is similar to the proof of
Theorem 1. This proof will require the use of the fact that qfiij = 0, for mA m + 1 <
i < mA and 1 < j < p, and the matrix V^q'j... q'mA_m] is of full rank.
103
This truncation is used to reduce the operations involved in QMF implementation
of the wavelet transform and an insignificant error is introduced in the frequency
domain.
In constructing our preconditioner, the weighting function a(i) is needed to
control the contributions of Ka and K< to the modulation matrix. We suggest that
the weighting function be chosen as an increasing function. We used an increasing
function of the form (1e~ e )d, where c and d are constants, for the weighting function.
In our experiment, c was usually set to 10 (for problems not severely illconditioned)
or 100 (for severely illconditioned problems), and accordingly d was set to be 1 or 2.
5.4.1 Surface Reconstruction
We present results of several experiments with sparse data, and compare the con
vergence rate of our adaptive preconditioned conjugate gradient (APCG) algorithm
with the hierarchical basis conjugate gradient (HCG) technique [75], the conjugate
gradient (CG) algorithm [26], and the biorthogonal wavelet basis transfer scheme
with Jaffards diagonal preconditioning [42]. In [92], the biorthogonal wavelet basis
transfer is used in the preconditioning process. We found that this wavelet basis
transfer must be combined with appropriate scaling to improve the convergence,
which was not mentioned in [92]. In [42], Jaffard proposed the 2J, where j is the
level, diagonal scaling in the wavelet basis as a preconditioning technique. Jaffards
2J diagonal scaling can be changed to 2mj scaling where m is a positive integer. We
combined this variant of Jaffards diagonal scaling with the biorthogonal wavelet ba
sis transfer and implemented it for conducting the convergence comparisons. We use
69
Substituting equation 4.25 into equation 4.24 and calculating the integrals in equation
4.24, we obtain the following inequality
1
1
((J)2 + ()2)2m+2<'
< /?n
4771+22?
(4.26)
where 0 = 2 + is constant.
Using equations 4.19, 4.22 and 4.26, the convergence criterion AXfF < cIX#f
can be met with by the following derived condition
ae24 9
~1~
n2q = Mn2q,
(4.27)
where M is the constant defined as M
a'2* q Furthermore, we can bound the
function max^\(0,o)R(p\,p2, ,Pt) by a function of A* as follows:
max
V(*,/)\(0,0)
R{pi,.
,Pt)
max
1
v(fe,/)\(o,o) (At + A/)9
t
IK
Afc ~ Pj
A* + Pj
t
)J)(IK
j=i
Ei)2)
A,+pj* }
<
max
i
lnXiT?II(
n_1 K j=i
Ak Pj '2
A k 4" Pj
The maximum taken from the values of function R at the nonzero eigenvalues,can
be bounded by using the maximum of the same function in an interval containing all
the nonzero eigenvalues. Then the convergence can be reached by using the following
modified requirement
max
Aj=4 sin2(^)
1 1
ik
x j=i
Â£ Pj_\2
x + Pj
< Mn2q.
(4.28)
TABLE OF CONTENTS
ACKNOWLEDGMENTS iii
LIST OF TABLES vii
LIST OF FIGURES viii
ABSTRACT xii
CHAPTERS
1 INTRODUCTION 1
1.1 Formulations of Early Vision Problems 2
1.2 Computational Algorithms 3
1.3 Main Contributions of this Thesis 6
1.4 Thesis Overview 8
2 PROBLEM FORMULATIONS 11
2.1 Deterministic Framework 11
2.2 Probabilistic Framework 14
2.3 Variational Formulations for Some Early Vision Problems 18
2.3.1 Surface Reconstruction 19
2.3.2 Shape from Shading 20
2.3.3 Optical Flow Computation 24
3 GENERALIZED CAPACITANCE MATRIX THEOREMS 28
3.1 Introduction 28
3.2 Generalized Capacitance Matrix Theorems 33
3.3 Discussion 48
4 GENERALIZED CAPACITANCE MATRIX ALGORITHM 50
4.1 Structure of the Algorithm 50
4.2 Choice of the Splitting 52
4.2.1Dense Data Surface Reconstruction 55
IV
15
(2) P(FÂ¡ fij  Fk,i /fe,/,(fc,O ^ (hj)) P{Fi,j fi,j  Fk,Â¡ fk,i,{k,l)
Gj).
It is not clear how to find a probability distribution function for an MRF directly
from the above definition. Fortunately, the HammersleyClifford theorem [22] states
that F is an MRF on S with respect to the neighborhood system G if and only if
the probability distribution is a Gibbs distribution with respect to G. Therefore, we
can construct an MRF by designing a Gibbs distribution function as the distribution
function of the MRF.
To give a definition of a Gibbs distribution, we have to define a clique first.
A subset C C S is a clique if every pair of distinct sites in C are neighbors. Let C
denote the set of all cliques.
A Gibbs distribution relative to {S, G} is a probability measure t on 2 such
that
7r(u>) = ^ exp~u^IT (2.7)
Zj
where Z is the normalizing constant ( also called partition function), T is a parameter,
and the energy function U{u) is of the form
V(a) = Â£ Vc(u)
cec
(2.8)
The function Vc(u>) is called the clique potential. Its value is determined by Uij with
(*, j) C.
159
no interaction between line process variables. In addition, this technique can not
guarantee the globally optimal solution.
All the above deterministic optimization methods can only be applied to
some restricted energy functions whose line energies contain either no line process
interactions or very simple line process interactions. The line energies encode the
constraints or prior information we impose on the line process. Bedini et al. [4] used
a line energy with simple line process interactions and obtained much better results
than those with the simplest line energy, i.e., no interactions between line processes.
In an MRF model, the line energy is the summation of line clique potentials. When
the neighborhood system in the MRF model is nonempty, there are line process
interactions in the line energy. In general, the line clique potentials are accessed
through a lookup table, and no closedform expression is available. The line clique
potentials are very crucial to the performance of the results [54]. They are regarded
as the parameters in an MRF model and can be estimated for different classes of
object [54] to achieve better performance.
Furthermore, Most of the above mentioned deterministic methods with the ex
ception of dynamic programming, can only reach an approximate solution or a locally
optimal solution. In fact, the quantization of the real variables in dynamic program
ming may also be regarded as an approximation.
160
7.2.2 Stochastic Optimization Methods
Three stochastic methods for the MAP estimation problems will be discussed in
the following. Unlike the deterministic optimization methods, there is no restriction
on the clique potentials for the stochastic methods. In addition, simulated annealing
as well as the genetic algorithm can converge to a global optimum. The drawback
of stochastic optimization methods is that, their computational cost is very high,
compared to most of the deterministic methods. However, we will present a novel
hybrid search algorithm that has yielded promising and practically feasible prelimi
nary results.
Simulated annealing
Simulated annealing has been known as a general global optimization algorithm.
No restriction on the clique potential functions need be imposed in this approach.
The concept of this algorithm comes from statistical physics and is based on gradually
lowering a parameter T that corresponds to temperature in a physical system. As
T decreases, samples from the posterior distribution approach toward the minimal
energy solution. Geman and Geman [22] developed the Gibbs sampler with the
simulated annealing algorithm to solve the MAP estimation problem. In order to
achieve the global minimum, the cooling schedule for the temperature must be very
slowly decreased using the formula T(k) = where T(k) is the temperature
during the kth iteration and c is a constant. Since the cooling schedule must be
very slow to guarantee global convergence, the computational cost for the simulated
annealing algorithm is enormous.
104
Figure 5.3. The sparse data sets used for the surface reconstruction, (a) synthetic
data (b) synthetic symmetric data (c) real laser range data.
xml version 1.0 encoding UTF8
REPORT xmlns http:www.fcla.edudlsmddaitss xmlns:xsi http:www.w3.org2001XMLSchemainstance xsi:schemaLocation http:www.fcla.edudlsmddaitssdaitssReport.xsd
INGEST IEID EBG0XVQ8U_E45X1P INGEST_TIME 20141003T21:50:28Z PACKAGE AA00025708_00001
AGREEMENT_INFO ACCOUNT UF PROJECT UFDC
FILES
148
The Sinusoid 1 image sequence has a constant translation with a velocity of
(1.585,0.863) pixels/frame. The image size for each frame of this image sequence
is 100 x 100. Each frame is a highly textured image whose brightness function is
fast changing almost everywhere. The regularization parameter A is set to be 300.
Very accurate estimate is obtained after 300 preconditioned nonlinear conjugate gra
dient iterations. Figure 6.6(a) shows the computed optical flow. Our result is the
best among the three depicted in Table 6.3. Note that our method uses only two
consecutive images to estimate the optical flow. The results of Fleet and Jepsons
method [18] for this example were obtained by using a 21frame image sequence [3].
To achieve a fair comparison, we depict the reported results of Szeliski and Coughlan
[77] for the 2frame image sequence. With incorporation of more frames of data, the
results in Szeliski and Coughlan do however improve over those depicted in the table.
Technique
# of
Frames
Average
Error
Standard
Deviation
Density
Fleet and Jepson
21
0.03
O
rH
o
o
100%
Szeliski and Coughlan
2
O
tH
O
0.02
100%
Our method (SSDbased)
2
0.02
0.01
100%
Table 6.3. Summary of Sinusoid 1 results.
For the translating tree example, each frame contains more complicated highly
textured regions, which result in multiple local minima in the associated energy
function to be minimized. To avoid being trapped in a local minimum during the
minimization of a nonconvex energy function, we employ the scalespace tracking
98
problem. For other cases such as sparse data surface reconstruction or dense data
with nonuniform weights, we somehow need to estimate the spectral distribution of
Kd in the frequency domain. We will elaborate further on this spectral estimation in
the following paragraphs.
Since the spectral approximation of the matrix K is achieved in the wavelet
basis, we can approximate the spectral distribution of the matrix via the wavelet
transform. One way is to do the wavelet transform of the matrix K< and then compute
the energy contained in each diagonal block which corresponds to a particular division
of the frequency band, and the computed energy is used to represent the spectral value
for this frequency band. To use this spectral estimation, we need to develop a fast
computational algorithm since the wavelet transform of an N x N matrix requires
0(N2) operations. Another way to estimate the spectral distribution of K is to
take the wavelet transform of its diagonal entries only. The N diagonal entries of the
N x N diagonal matrix K< are grouped into an n x n matrix. Then we compute the
energy from the wavelet transform of this matrix and associate it with the spectral
value of each frequency band. This method takes advantage of the diagonal structure
of the matrix and can be accomplished in O(N) operations. We use this method
to estimate the spectrum of the matrix K< in our experiment.
Using the above discussed spectral approximation to the matrices Ks and K
separately in a wavelet basis, we can obtain the diagonal approximation matrices Ms
and Mj, respectively. Then, the modulation matrix M is taken to be the sum of
Ms and Mj. Since the matrix is obtained from the spectral approximation and
173
O O
O o
vc=o.o
O I o
O I o
Vc = 1.2
OxO
X X
OxO
(a)
O
O
O
O
Vc = 0.25
O o
O I o
vc=0.5
O
o
o
o
Ve = 2.0
(b)
O O
O  O
Vc =2.0
Figure 7.3. (a) Clique for the line process (x: line process, 0: nodal variable for
surface), (b) clique potentials for different line process configurations.
Ill
Figure 5.9. The convergence of Pentlands preconditioner in conjunction with the con
jugate gradient algorithm is worse than that of conjugate gradient algorithm without
preconditioning for the surface reconstruction problem with the synthetic sparse data
set using a membrane stabilizer.
used in conjunction with the conjugate gradient algorithm, which is known to be
much more efficient than a gradient decent algorithm [26]. We use the subsampled
data set in figure 5.3(a) with a membrane stabilizer in a 32 x 32 problem for testing
this preconditioner. The convergence curves for Pentlands preconditioner with CG,
CG and our APCG algorithms are shown in figure 5.9. We can see that Pentlands
preconditioning is worse than the conjugate gradient iterations without precondition
ing. This is because the diagonal dominance behavior of K* after wavelet transform
claimed in [61] doesnt hold in general. Pentlands preconditioner was constructed
by approximating the wavelet transform of the stiffness matrix by its diagonal en
tries based on the diagonal dominance assumption of Ks. The falsity of the diagonal
dominance claim is evident in figure 5.10 where we profile arbitrarily selected rows of
the wavelet transformed membrane matrix Ks of size (64 x 64) for the ID problem.
In [61], the preconditioner was chosen as the diagonal of the stiffness matrix K in
Magnitude Magnitude
112
(a) (b)
(c)
(d)
Figure 5.10. Row profile of the wavelet transformed membrane matrix Ks of size
64 x 64 along different rows, (a) row 3 (b) row 10 (c) row 30 (d) row 50. This figure
establishes the falsity of the diagonal dominance behavior of Ks claimed in [19].
REFERENCES
[1] B. K. Alpert, Wavelets and other bases for fast numerical linear algebra
in Wavelets A Tutorial in Theory and Applications, pp. 181216, Boston:
Academic Press Inc., 1992.
[2] P. Anandan, A computational framework and an algorithm for the measure
ment of visual motion , Intern. J. Comput. Vision, vol. 2, pp. 283310, 1989.
[3] J. L. Barron, D. J. Fleet, and S. S. Beauchemin, Performance of optical flow
techniques, Inti. Journ. of Comput. Vision, vol. 12, no. 1, pp. 4377, 1994.
[4] L. Bedini, I. Gerace, and A. Tonazzini, A deterministic Algorithm for recon
structing images with interacting discontinuities , CVGIP: Graphical Models
and Image Processing vol. 56, pp. 109123, 1994.
[5] L. Bedini and A. Tonazzini, Image restoration preserving discontinuities: the
Bayesian approach and neural networks, Image and Vision Computing vol. 10,
no. 2, pp. 108118, 1992.
[6] M. Bertero, T. A. Poggio, and V. Torre, Illposed problems in early vision ,
Proceedings of the IEEE, vol. 76, pp. 869889, August 1988.
[7] G. Beylkin, R. R. Coifman, and V. Rokhlin, Fast wavelet transforms and
numerical algorithm I, Comm, on Pure and Applied Math., vol. 44, pp. 141
183, 1991.
[8] A. Blake and A. Zisserman, Visual Reconstruction. Cambridge, MA: The MIT
Press, 1987.
[9] D. Braess, The convergence of a multigrid method with GaussSeidel relaxation
for the Poisson equation, Mathematics of Computation, vol. 42, pp. 505519,
April 1984.
[10] B. L. Buzbee, F. W. Dorr, J. A. George, and G. H. Golub, The direct solution
of the discrete Poison equation on irregular regions, SIAM Journal of Numerical
Analysis, vol. 8, pp. 722736, December 1971.
[11] S. L. Campbell and C. D. Meyer, Generalized Inverses of Linear Transforma
tions. London: Pitman Pub., 1979.
181
77
to (63,1) was performed. The data along with the recovered surface using membrane
smoothness are shown in figure 4.5(a) & (b) respectively.
(c)
(d)
Figure 4.4. Surface reconstruction with prespecified discontinuities along a horizon
tal line; (a) original sparse data set with depth discontinuities along the indicated
horizontal line; reconstructed surfaces using a membrane spline after (b) 1 iteration
and (c) 10 iterations of ADI method; (d) reconstructed surface using a thinplate
spline after 16 iterations of ADI method.
4.4.2 Shape from Shading
In this problem, we tested our algorithm on a synthetically generated image of
a Lambertian sphere illuminated by a distant point source and viewed from the same
location as the point source. Figure 4.6(a) depicts the 64 x 64 original shaded image
121
to the normal of the silhouette in the image plane [41], we impose the Direchelet
boundary condition on (p,q) along the occluding boundary. In addition, we use
the occluding boundary to incorporate the discontinuities in (p, q) or z representing
orientation discontinuities or depth discontinuities respectively. In this experiment,
we incorporate the orientation discontinuities along the occluding contour. Figure
5.17 shows the left (north west) and right (north east) images generated from the
computed solutions for our APCG algorithm after 10, 20 and 40 iterations. We can
see that the computed solutions approach to the true solution after incorporating
the boundary conditions. The computed gradient and height after 40 iterations are
shown in figure 5.18.
(a) (b)
Figure 5.18. The computed (a) gradient and (b) height after 40 iterations of the
APCG algorithm.
The convergence of our APCG algorithm was compared to the HCG and CG
algorithms for the SFS problem with the Direchelet boundary condition. The con
vergence curves for the energy, p q error and z error reduction are shown in figure
36
solutions to equations 3.2 and 3.3, (3) the capacitance matrix C is nonsingular,
implying that equation 3.5 has a unique solution, and (4) B(x rÂ¡(3) = b.
Proof for Theorem 1:
(1) Since rank(A) = titua and rank(B) = nmj, it is obvious from their singular
value decompositions that there exist tua orthogonal eigenvectors, (f)1,..., mA, for
A and mg orthogonal eigenvectors, ipx,.. for B such that null(A) = span{
C null( A), we can write each i = 1,..., m, in terms of a linear combination of
the basis vectors x,..., (Â¡>mA Thus, we have
\ Vms J
T
/
(3.6)
where T is of full rank. By using the Householder or Givens transforma
tion [26], we can obtain the factorization of T as follows:
T =
0 T
Q,
(3.7)
92
(a)
A,.,f f 2
Dj4lf] f 2  G
H Gy 2 ~A,f
 2 : Put one zero between each sample
Convolve with filter X
Q
*2  : multiplication by 2
(b)
Figure 5.1. The implementation of the wavelet transform via the QMF structure
from one level to the next, (a) decomposition (b) reconstruction, adopted from Mal
lat [1989].
in figure 5.2. Notice that the closer the QMF filters are to the ideal lowpass and
highpass filters, the better it is for achieving a good frequencydomain division, i.e.,
the overlap between the frequency bands associated with each level is minimized with
close approximations of the ideal low and highpass filters.
In [50], Mallat constructed the 2D wavelet basis by using the tensor products
of the ID wavelet and scaling functions. The QMF implementation of the wavelet
96
5.3 Adaptive Preconditioned Conjugate Gradient Algorithm
As discussed in section 2.3, the discretization of the variational formulation leads
to solving the system of equations Kx = b. The stiffness matrix K is symmetric
positive definite and therefore, the conjugate gradient algorithm can be used to solve
the problem guaranteeing a unique solution. When a preconditioner P is used to
accelerate the convergence, we have what is known as the preconditioned conjugate
gradient algorithm (see [26]) given as follows.
0. Choose an initial Xo, and compute ro = b Kx0; set k = 0.
1. Solve Pzk = r*; k = k + 1.
2. If k = 1, set pi = z0; else compute /3j? = rk_2zk^2 and /?* =
update pk = 7.ki + (3kpki
3. Compute af = rJLjZfci, ak = p^Kp*, and ak = ak /ak'i
4. Update rk = rki akpk.
5. Update xk = xki + akpk.
6. If rk ~ 0, stop; else go to step 1.
The only difference between the preconditioned conjugate gradient algorithm
and the standard conjugate gradient algorithm is the preconditioning process in step
1. Since the preconditioning involves solving the auxiliary linear system Pzk = rk in
each iteration, it is absolutely necessary to have a fast solution to this preconditioning
equation when designing a good preconditioner.
CHAPTER 3
GENERALIZED CAPACITANCE MATRIX THEOREMS
3.1 Introduction
There are numerous applications of the ShermanMorrisonWoodbury formula in
various fields [30] [85]. These applications can be classified into two categories. One is
to compute the inverse of a matrix that has lowrank difference w.r.t. another matrix
whose inverse is available or can be efficiently computed; the other is to solve a linear
system whose matrix differs from a wellstructured matrix by a lowrank matrix and
very efficient methods exist to solve the linear system with the wellstructured matrix.
The latter case is also called the capacitance matrix algorithm. This algorithm has
been widely used for solving linear systems arising from the discretization of the
boundary value problems, primarily for the elliptic partial differential equations. In
this chapter, we state and prove some very general theorems which are then applied
in developing a new capacitance matrix algorithm in the next chapter.
When A 3?nxn, U, V 3?nxp, and both the matrices A and (7 + VrA1U)
are nonsingular, the ShermanMorrisonWoodbury formula [26] gives an expression
for the inverse of (A + UVr) as
(A + UVT)X = A"1 A_1U(7 + VtA1U)1VtA1. (3.1)
28
78
(a) (b)
Figure 4.5. Surface reconstruction with prespecified discontinuities along a diagonal
line; (a) the original sparse data set with depth discontinuities along the indicated
diagonal line, (b) reconstructed surface using a membrane spline after 10 iterations
of ADI method.
with an 8 bit resolution in gray per pixel. The sphere is assumed to have Lambertian
reflectance properties with a constant albedo. Both depth and orientation data are
specified on an occluding contour depicted in figure 4.6(b). Note that we have to
solve three systems of equations namely Kxi = bi, Kx2 = b2 and KjX3 = b3 (see
section 2.3). The first two systems are for recovering the surface normals and the last
equation for recovering the depth from the surface orientation map. We start the ADI
iteration for the linear systems with the initial values of p = q = z = 0. The recovered
surface orientation is shown in figure 4.6(c) and the final surface shape obtained using
the orientation information is depicted in figure 4.6(d). For each computed (p,q) on
the RHS of equation 2.18, the ADI error tolerance was set to 105. The same tolerance
was used for the ADI iterations in the depth from orientation problem.
95
in the higher frequency regions. This is because the division of the frequency domain
becomes finer in the lowerfrequency band with increasing levels (see figure 5.2).
Therefore, we can achieve a better approximation when more levels are used. In our
preconditioner, the number of levels used is determined by the support of the wavelet
basis used and the problem size. Therefore, for a given problem size and wavelet
basis, the number of levels is fixed.
Our preconditioner P can be written as
P = WMWr, (5.7)
where the matrix W is a 2D orthonormal wavelet transformation matrix and the
matrix M is the diagonal modulation matrix, whose diagonal values are obtained
from the spectral function of the regularization filter. From equation 5.7, we can
see that the solution to the auxiliary linear system Pz = r in the preconditioned
conjugate gradient can be computed very efficiently when the wavelet basis are chosen
appropriately, i.e. the QMF filters are of short length/span. As long as the QMF
filters are of finite length, the solution can be obtained in 0(N) operations. The
solution to this auxiliary linear system is obtained in the three steps namely, (1) take
the wavelet transform of the vector r, (2) modulate the transformed vector using the
matrix M, and (3) take the inverse wavelet transform.
35
Proposition 1 Under the assumptions in Theorem 1, we have vfq'j = uJqj = 0 for
1 < i < p and mA mB + 1 < j < mA. In addition, VT[q/1... q'mA_mB] and
UT[qi... q mAmB] are of full rank.
Proof: From the assumption in Theorem 1, Bq'^ = Aq'^ = 0, mA mB + 1 < j <
mA. Since Bq'; = (A + UGV71)^, we have UGV7^ = 0 for mA mB + 1 < j <
mA. From the assumption that U 3?n)
it is obvious that Vrq' = 0. Therefore, we have vfqL = 0 for 1 < i < p and
mA mB + 1 < j < mA. Similarly, we can show that ufq, = 0, 1 < i < p and
mA mB + l < j < mA.
We use proof by contradiction to show that the matrix V^q'j... q,m^_mB] is of
full rank. Lets assume this matrix is rank deficient, i.e., there exists a nonzero vector
a = (
then r null{A) and r is orthogonal to null(B). Postmultiplying both sides of the
equation B = A + UGVT by r yields Br = Ar + UGVTr. Since Ar = 0 and
VTr = 0, we obtain Br = 0. But, from the above, r is a nonzero vector orthogonal to
nu//(B), i.e. Br ^ 0, leading to a contradiction. Consequently, the assumption that
VT[q,1 ...q'm.mj is rank deficient is not true, implying that VT[q'1... q'm/l_mJ
is of full rank. Similarly, we can prove that UT[qx... qmylmB] is of full rank.
We are now equipped to prove Theorem 1. The proof is in four parts, and
we will show that (1) there exist two sets of orthogonal bases, {qx,..., qmyl} and
{q'j,..., q'm^}, for null(AT) and null(A) respectively, such that null(BT) = span{
qnums+1, ,qmx} and null{ B) = span{ q/mil_mB+1,..., q'myl}, (2) there exist

