Title: Improvement of the camera calibration through the use of machine learning techniques
Permanent Link: http://ufdc.ufl.edu/UF00100823/00001
 Material Information
Title: Improvement of the camera calibration through the use of machine learning techniques
Physical Description: Book
Language: English
Creator: Nichols, Scott A., 1969-
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2001
Copyright Date: 2001
 Subjects
Subject: Cameras -- Calibration   ( lcsh )
Computer vision   ( lcsh )
Three-dimensional imaging   ( lcsh )
Electrical and Computer Engineering thesis, M.S   ( lcsh )
Dissertations, Academic -- Electrical and Computer Engineering -- UF   ( lcsh )
Genre: government publication (state, provincial, territorial, dependent)   ( marcgt )
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )
 Notes
Summary: KEYWORDS: camera calibration
Thesis: Thesis (M.S.)--University of Florida, 2001.
Bibliography: Includes bibliographical references (p. 43-44).
System Details: System requirements: World Wide Web browser and PDF reader.
System Details: Mode of access: World Wide Web.
Statement of Responsibility: by Scott A. Nichols.
General Note: Title from first page of PDF file.
General Note: Document formatted into pages; contains vii, 45 p.; also contains graphics.
General Note: Vita.
 Record Information
Bibliographic ID: UF00100823
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: oclc - 49053113
alephbibnum - 002763565
notis - ANP1587













IMPROVEMENT OF THE CAMERA CALIBRATION
THROUGH THE USE OF MACHINE
LEARNING TECHNIQUES















BY

SCOTT A. NICHOLS


A THESIS PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
MASTER OF SCIENCE

UNIVERSITY OF FLORIDA


2001
















ACKNOWLEDGMENTS


I wish to thank Dr. Antonio Arroyo for taking a long shot on an average undergrad. You have my gratitude, and have helped me make much more of myself. I also wish to thank Dr. Michael Nechyba for more things than I can enumerate here, but will make a half-hearted effort to. Thank you for your patience; your service as an idea blackboard that corrects mistakes; your recommendation of Danela's Ristorante; oh yeah, and your patience. I also wish to thank the members of the Machine Intelligence Lab that I have shared workbench space with over the years for ideas and motivation.




















TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ............................................. ii

ABSTRACT .................................................... vii

CHAPTERS

1 INTRODUCTION ............................................... 1

   1.1 Computer Vision ....................................... 1
   1.2 Camera Calibration .................................... 2
   1.3 This Work ............................................. 2

2 THE CAMERA MODEL ........................................... 4

   2.1 Introduction .......................................... 4
   2.2 Definition ............................................ 5
   2.3 Training the model .................................... 7

3 A SINGLE CAMERA ............................................ 9

   3.1 Introduction .......................................... 9
   3.2 Training Edges ........................................ 10
   3.3 Optimization Criterion ................................ 11
   3.4 Initial Models ........................................ 12
   3.5 Gradient Descent ...................................... 13
   3.6 Model Perturbation .................................... 15
   3.7 Gradient-Perturbation Hybrid .......................... 18
   3.8 Performance Comparisons ............................... 19

4 STEREO CAMERAS ............................................. 23

   4.1 Introduction .......................................... 23
   4.2 Related Work .......................................... 24
   4.3 Our Approach .......................................... 25
   4.4 Training Data ......................................... 25
   4.5 Model Improvement ..................................... 25














5 FURTHER RESULTS AND DISCUSSION ............................. 30

   5.1 Single Camera ......................................... 30
   5.2 Stereo Cameras ........................................ 30

APPENDICES

A VISUAL CALIBRATION EVALUATION .............................. 39

B GRAPHICAL CALIBRATION TOOL ................................. 41

REFERENCES ................................................... 43

BIOGRAPHICAL SKETCH .......................................... 45















LIST OF FIGURES

figure                                                       page

2-1 The Pinhole Model of Perspective Projection ................ 5
3-1 Example Training Edges .................................... 11
3-2 Error Types of Initial Models ............................. 13
3-3 Gradient Descent vs. Different Types of Error ............. 15
3-4 Stochastic Perturbation vs. Different Types of Error ...... 16
3-5 Stochastic Perturbation with Adaptive Delta vs. Different Types of Error ... 17
3-6 Gradient-Perturbation Hybrid vs. Different Types of Error .. 18
3-7 Performance Comparison for the Translational and Scale Initial Models ... 20
3-8 Performance Comparison for the Close and Rotational Initial Models ... 21
3-9 Final Models Using the Gradient-Perturbation Hybrid Technique ... 22
4-1 Results for a Non-Weighted Model Improvement .............. 27
4-2 Error Over Time for Various Gains on the Training Model .... 28
4-3 The Long Term Performance Using Various Gains .............. 29
5-1 Example Single Camera Calibrations (1 & 2) ................. 31
5-2 Example Single Camera Calibrations (3 & 4) ................. 32
5-3 Example Single Camera Calibrations (5 & 6) ................. 33
5-4 Example Single Camera Calibrations (7 & 8) ................. 34
5-5 Poor Initial Calibrations (1 & 2) .......................... 35
5-6 Poor Initial Calibrations (3 & 4) .......................... 36
5-7 Stereo Calibration Examples (1 & 2) ........................ 37
5-8 Stereo Calibration Examples (3 & 4) ........................ 38
A-1 The Calibration Grid ....................................... 40
A-2 The Experimental Area ...................................... 40
B-1 The Graphical Calibration Tool ............................. 42















LIST OF TABLES

table                                                        page

3-1 Model Perturbation: Average Error Per Pixel ................ 19
3-2 Model Perturbation With Adaptive Delta: Average Error Per Pixel ... 19
3-3 Gradient-Perturbation Hybrid: Average Error Per Pixel ...... 19
















Abstract of Thesis Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Master of Science

IMPROVEMENT OF THE CAMERA CALIBRATION
THROUGH THE USE OF MACHINE
LEARNING TECHNIQUES

By

Scott A. Nichols

August 2001

Chairman: Dr. Michael C. Nechyba
Major Department: Electrical and Computer Engineering


In computer vision, we are frequently interested not only in automatically recognizing what is present in a particular image or video, but also where it is located in the real world. That is, we want to relate two-dimensional image coordinates and three-dimensional world coordinates. Camera calibration refers to the process through which we can derive this mapping from real-world coordinates to image pixels. We propose to reduce the amount of effort required to compute accurate camera calibration models by automating much of the calibration process through machine learning techniques. The techniques developed are intended to simplify the calibration process such that a minimally trained person can perform single, stereo, or multiple fixed-camera calibrations with precision and ease. In this thesis, we first develop a learning algorithm for improving a single calibration model through a combination of gradient descent and stochastic model perturbation. We then develop a second algorithm that applies specifically to simultaneous calibration of multiple fixed cameras. Finally, we illustrate our algorithms through a series of examples and discuss avenues for further research.
















CHAPTER 1
INTRODUCTION



1.1 Computer Vision

For decades, researchers have been attempting to duplicate in machine-centered systems what we as humans do on a daily basis with our eyes: recognize and understand the world around us through visual input. To date, however, our imagination has outpaced the reality of state-of-the-art computer vision systems. While science fiction has often depicted robots and machines with human-like capabilities, current computer vision systems do not come close to matching those imagined capabilities. In fact, we are still many years away from computers that can rival humans and other animals in image processing and recognition capabilities. Why? Computer vision, it appears, is a much more difficult problem than was first believed by early researchers. In the late sixties, with the spread of general-purpose computers, researchers felt that a solution to the general vision problem, the near-instantaneous recognition of any visual input, was achievable within a short number of years. Since then, we have come to understand the enormous computational resources our own brains devote to the vision processing task, and the consequent challenges that the general computer vision problem poses.

Therefore, rather than develop one computer vision system with very general capabilities, researchers have begun to focus on implementing practical computer-vision applications that are limited in scope but can successfully carry out specific tasks. Some examples of recent work include face and car recognition [14], people detection and tracking [6, 13], computer interaction through gestures [15], handwriting recognition [9], traffic monitoring and accident reporting [8], detection of lesions in retinal images [17], and even automated aircraft landing [5]. In many of these computer vision projects, researchers are not only interested in recognizing what is present in an image or video; they would also like to infer 3D geometric information about the world from the image itself. If a system is to interact with and/or draw conclusions about the 3D position of objects in the image, it needs to be calibrated; that is, we need to derive a relationship between the 3D geometry of the real world and corresponding image pixels.


1.2 Camera Calibration

Developing accurate calibration or camera models in computer vision has been the focus of much research over the years. Many researchers have opted for the following simple approach: first, the 3D locations of certain known image pixels are identified; then, these correspondence pairs are exploited to estimate the parameters of the calibration model. From this model, the intrinsic camera parameters, which define the properties of the camera sensor and lens, and the extrinsic parameters, which define the pose of the camera with respect to the world, can be extracted. This process can be burdensome, as it requires many precise position measurements. Often, someone doing computer vision research spends as much time fiddling with and worrying about camera calibration as they do on their actual application.


1.3 This Work

In this thesis, we propose to reduce the amount of effort required to compute accurate camera calibration models by automating much of the calibration process through machine learning techniques. The techniques developed are intended to simplify the calibration process such that a minimally trained person can perform single, stereo, or multiple fixed-camera calibrations with precision and ease. In Chapter 2, we review the basics of camera calibration. Then, in Chapter 3, we develop a training algorithm for improving a single-camera calibration model from constrained features in the image. Next, in Chapter 4, we build on the previous chapter by developing an algorithm for improving multiple fixed-camera calibration models. Finally, in Chapter 5, we present further results and discuss possible extensions of this work.
















CHAPTER 2
THE CAMERA MODEL



2.1 Introduction

Camera calibration in computer vision refers to the process of determining a camera's internal geometric and optical characteristics (intrinsic parameters) and/or the 3D position and orientation of the camera relative to a specified world coordinate system (extrinsic parameters) [16]. We do this in order to extract 3D information from image pixel coordinates and to project 3D information onto 2D image coordinates. Cameras in computer vision can be mounted with a fixed view or a panning and tilting view, or can be integrated onto a mobile system. Panning, tilting and mobile cameras do not have a fixed relationship to any world coordinate system (extrinsic parameters). For such systems, information is often available only in the camera coordinate system, which is defined by the direction of the camera at the moment the image was captured. External sensors and encoders can alleviate this problem, but typically only at a significantly higher cost. In fixed camera systems, on the other hand, both the intrinsic and extrinsic parameters combine to provide a transformation between 3D world and 2D image coordinates.

In recent years, many different techniques have been developed for camera calibration. The differences in these techniques primarily reflect the wide array of applications that researchers have pursued. One of the most popular, and the one that forms the basis of this work, is the direct linear transformation (DLT) introduced by Abdel-Aziz and Karara [1]. The DLT method does not consider radial lens distortion, but is conceptually and computationally simple. In 1987, Tsai [16] proposed a method that is likely one of the most referenced works on the topic of camera calibration. It outlines a two-stage technique using the "radial alignment constraint" to model lens distortion. Tsai's method involves a direct solution for most of the camera parameters and some iterative solutions for the remaining parameters. The cameras used in this work appear to have little or no lens distortion. Since it has been shown by Weng, Cohen and Herniou [18] that Tsai's method can be worse than DLT if lens distortion is relatively small, we chose the DLT method.


2.2 Definition

In this thesis, we apply the pinhole lens model of perspective projection, whose basic geometry is shown in Figure 2-1. This model constructs a transformation from 3D world coordinates to pixels in an image in three steps. First, 3D world coordinates are converted to 3D camera coordinates through a homogeneous transformation. Let us denote $P_w = (x_w, y_w, z_w)$ as a coordinate in the world, and $P_c = (x_c, y_c, z_c)$ as the corresponding 3D camera coordinate. Then, the homogeneous transform $T$ is defined by,

$$T = \begin{bmatrix} R & t \\ 0 \; 0 \; 0 & 1 \end{bmatrix} \qquad (2-1)$$

such that,

$$\begin{bmatrix} P_c \\ 1 \end{bmatrix} = T \begin{bmatrix} P_w \\ 1 \end{bmatrix} \qquad (2-2)$$

where $R$ denotes a $3 \times 3$ rotation matrix,

$$R = \begin{bmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{bmatrix} \qquad (2-3)$$

and $t$ denotes a $3 \times 1$ translation vector,

$$t = \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} \qquad (2-4)$$

Figure 2-1: The Pinhole Model of Perspective Projection

Second, the 3D camera coordinate $P_c$ is transformed to a 3D sensor coordinate $P_s = (u, v, w)$:

$$\begin{bmatrix} P_s \\ 1 \end{bmatrix} = K \begin{bmatrix} P_c \\ 1 \end{bmatrix} \qquad (2-5)$$

where the camera's intrinsic matrix $K$ is defined as,

$$K = \begin{bmatrix} -fa & -fb & -u_0 & 0 \\ 0 & -fc & -v_0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \qquad (2-6)$$

In equation (2-6), $f$ is the effective focal length of the camera; $a$, $b$, and $c$ describe the scaling in $x$ and $y$ and the angle between the optical axis and the image sensor plane, respectively; and $u_0$ and $v_0$ are the offset from the image origin to the imaging sensor origin. Finally, the 3D sensor coordinate $P_s$ is converted to the 2D image coordinate $(x_i, y_i)$:

$$x_i = \frac{u}{w}, \qquad y_i = \frac{v}{w} \qquad (2-7)$$

The complete projection equation is therefore given by,

$$\begin{bmatrix} P_s \\ 1 \end{bmatrix} = K T \begin{bmatrix} P_w \\ 1 \end{bmatrix} \qquad (2-8)$$

The transformation $KT$ on the world coordinate in equation (2-8) can be combined into a single matrix $S$ (the top three rows of $KT$). Since we are only interested in the overall transform between world coordinates and image coordinates, and not an explicit solution of the extrinsic and/or intrinsic camera parameters, we therefore write,

$$P_s = S \begin{bmatrix} P_w \\ 1 \end{bmatrix} \qquad (2-9)$$

or alternatively,

$$\begin{bmatrix} u \\ v \\ w \end{bmatrix} = \begin{bmatrix} s_{11} & s_{12} & s_{13} & s_{14} \\ s_{21} & s_{22} & s_{23} & s_{24} \\ s_{31} & s_{32} & s_{33} & s_{34} \end{bmatrix} \begin{bmatrix} x_w \\ y_w \\ z_w \\ 1 \end{bmatrix} \qquad (2-10)$$
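Equation (2-10) is straightforward to exercise numerically. The sketch below applies a 3 × 4 perspective matrix $S$ to a homogeneous world point and then performs the division by $w$ from equation (2-7). The function name and the identity-like example matrix are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def project(S, world_point):
    """Map a 3D world point to 2D image coordinates with the
    3x4 perspective matrix S of equation (2-10)."""
    x, y, z = world_point
    u, v, w = S @ np.array([x, y, z, 1.0])
    # Equation (2-7): divide out the homogeneous coordinate w.
    return u / w, v / w

# An identity-like S (camera frame aligned with the world frame),
# purely for illustration.
S = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
xi, yi = project(S, (2.0, 4.0, 2.0))  # w = z = 2, so (xi, yi) = (1.0, 2.0)
```

A calibrated $S$ would of course place the point according to the camera's actual pose and optics rather than this toy alignment.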

2.3 Training the model

From equation (2-10), we now have a linear transformation with 12 unknowns that relates world coordinates to image pixels. Each correspondence between a 3D world coordinate and a 2D image point yields a set of two equations,

$$u(s_{31}x + s_{32}y + s_{33}z + s_{34}) = w(s_{11}x + s_{12}y + s_{13}z + s_{14})$$
$$v(s_{31}x + s_{32}y + s_{33}z + s_{34}) = w(s_{21}x + s_{22}y + s_{23}z + s_{24}) \qquad (2-11)$$

or, in terms of the actual image coordinates,

$$x_i(s_{31}x + s_{32}y + s_{33}z + s_{34}) = s_{11}x + s_{12}y + s_{13}z + s_{14}$$
$$y_i(s_{31}x + s_{32}y + s_{33}z + s_{34}) = s_{21}x + s_{22}y + s_{23}z + s_{24} \qquad (2-12)$$

Given a set of $n$ pairs of world and image coordinates, equation (2-12) can be written in matrix form as,

$$\begin{bmatrix} x & y & z & 1 & 0 & 0 & 0 & 0 & -x_i x & -x_i y & -x_i z & -x_i \\ 0 & 0 & 0 & 0 & x & y & z & 1 & -y_i x & -y_i y & -y_i z & -y_i \\ & & & & & \vdots & & & & & & \end{bmatrix} \begin{bmatrix} s_{11} \\ \vdots \\ s_{34} \end{bmatrix} = 0 \qquad (2-13)$$

Arbitrarily setting $s_{34} = 1$ leaves 11 unknown parameters, which can be solved for using linear regression. In general, the more correspondence pairs that are defined, the less susceptible the model is to noise. To a large extent, this thesis aims to reduce or eliminate the need to collect many such precise correspondence pairs, as that can be labor intensive and/or prone to human operator error.
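The regression of equation (2-13) under the $s_{34} = 1$ normalization takes only a few lines of NumPy. This is an illustrative least-squares sketch, not the solver used in the thesis; `dlt_calibrate` and its argument names are assumptions.

```python
import numpy as np

def dlt_calibrate(world_pts, image_pts):
    """Estimate the 3x4 perspective matrix S from world/image
    correspondences (equation 2-13), fixing s34 = 1 and solving the
    remaining 11 parameters by linear least squares."""
    A, b = [], []
    for (X, Y, Z), (xi, yi) in zip(world_pts, image_pts):
        # Two rows per correspondence, from equation (2-12) with s34 = 1
        # moved to the right-hand side.
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -xi * X, -xi * Y, -xi * Z])
        b.append(xi)
        A.append([0, 0, 0, 0, X, Y, Z, 1, -yi * X, -yi * Y, -yi * Z])
        b.append(yi)
    s, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float),
                            rcond=None)
    return np.append(s, 1.0).reshape(3, 4)  # re-attach s34 = 1
```

With six or more non-coplanar correspondences the system determines all 11 parameters, and additional pairs make the least-squares fit less susceptible to noise, matching the observation above.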
















CHAPTER 3
A SINGLE CAMERA



3.1 Introduction

A common method of calibration is to place a special object in the field of view of the camera [10, 11, 16, 21]. Since the 3D shape of the object is known a priori, the 3D coordinates of specified reference points on the object can be defined in an object-relative coordinate system. One of the more popular calibration objects described in the literature consists of a flat surface with a regular pattern marked on it. This pattern is typically chosen such that the image coordinates of the projected reference points (corners, for example) can be measured with great accuracy. Given a number of these points, each one yielding an equation of the form (2-9), the perspective transformation matrix $S$ can be estimated with good results. One drawback of such techniques is that sometimes a calibration object might not be available. Even when a calibration object is available, the world coordinate system is defined by the placement of that object with respect to the camera, and is not necessarily optimized to take advantage of the geometry of the scene.

In another popular method, called structure from motion, the camera is moved relative to the world, and points of correspondence between consecutive images define the camera model [7, 21, 22]. In this approach, however, only the intrinsic parameters of the camera can be estimated (e.g. the $K$ matrix); as such, this method is used primarily for stereo vision and will be addressed in the subsequent chapter.

In our work, we propose to divide the calibration process into two stages. First, we propose to generate an initial "close" camera model, which then gets optimized to run-time quality through standard machine learning techniques. Since our system will improve its model over time, all we require initially is a method for generating reasonably close calibrations without undue effort or complexity. This type of problem was addressed by Worrall et al. [19] through a graphical calibration tool, where the user can rotate and translate a known grid to the desired position on the image. The system then calculates a perspective projection matrix $S$ that will place the grid at that location in the image. Our group has implemented a similar, intuitive GUI interface, which allows us to generate fast and easy initial calibration estimates; this interface is described in further detail in Appendix B.


3.2 Training Edges

Now, in order to improve the calibration model from the GUI interface through machine learning, we need something for the machine learning algorithm to train on. Since we do not want to require human operators to meticulously and precisely select numerous known correspondence pairs in an image, we should select features in the image that can be easily isolated or extracted through simple image processing techniques. In man-made environments, constrained edges with known dimensions are frequent and stand out visually; some examples of these might be the intersection of the floor and a wall, the corner of a room, window sills, etc. These features can provide a wealth of training data, without requiring explicit and precise image-to-world correspondence. As such, we choose to rely on such constrained data for improving our initial camera calibration model.

Exploiting the geometry of a given scene, a set of lines is chosen in 3D space, where each of the lines is constrained along two dimensions; for example, the vertical intersection of two perpendicular walls is constrained by $x = C_0$ and $y = C_1$, where $C_0$ and $C_1$ are known constants. The pixels corresponding to each of the lines and the constraints that define each line are the basis for our model improvement algorithm. In order not to bias the training algorithm, the lines (edges) should be chosen so as to provide training data that is balanced throughout the region of interest, both in area and scale, as shown, for example, in Figure 3-1.



3.3 Optimization Criterion

Given our constrained-edges training data, we must define a reasonably well-behaved optimization criterion that lets us know how the training algorithm is progressing. Care should be used when choosing the optimization criterion, as it is the only mechanism a system has to evaluate a potential model. After some experimentation, our final optimization criterion was designed to reflect the error between the actual and projected pixel locations of the constrained edges and is defined by,

$$E = \frac{1}{m} \sum_{e=1}^{m} \frac{1}{n_e} \sum_{i=1}^{n_e} E_{e,i} \qquad (3-1)$$

where $e$ denotes a training edge, $i$ denotes a pixel along that edge, $n_e$ denotes the number of pixels along a particular edge, and $m$ denotes the total number of edges. In equation (3-1), $E_{e,i}$ is defined by,

$$E_{e,i} = (x_s - x_p)^2 + (y_s - y_p)^2 \qquad (3-2)$$

Figure 3-1: Example Training Edges

where $(x_s, y_s)$ is an actual pixel location in the training set and $(x_p, y_p)$ is the projected pixel location, which is computed as follows. First, given the two constraints of a single edge (e.g. $x = C_0$, $y = C_1$), a pixel from the corresponding image line $(x_s, y_s)$, and the current perspective projection model $S$, apply equation (2-12) to generate two equations in one unknown. For example, for the constraints specified above, the two equations would become,

$$(s_{31}x_s - s_{11})C_0 + (s_{32}x_s - s_{12})C_1 + (s_{33}x_s - s_{13})z = s_{14} - s_{34}x_s$$
$$(s_{31}y_s - s_{21})C_0 + (s_{32}y_s - s_{22})C_1 + (s_{33}y_s - s_{23})z = s_{24} - s_{34}y_s \qquad (3-3)$$

Equations (3-3) are easily generalized to arbitrary lines in space, and can be solved for the unknown coordinate (or parameter, in the general case) through linear regression. Then, we can project the resulting 3D coordinate onto the image, using equations (2-8) and (2-7), to get $(x_p, y_p)$.

The camera perspective will have some effect on the training data. Given two equal-length edges, the one with a larger image cross-section will have more sample points and therefore will contribute more to the error. To compensate for this, the error generated by each edge is averaged to an error per pixel along that edge. The average error of each edge is then averaged to obtain the final error measure $E$.
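Equations (3-2) and (3-3) can be sketched for a single pixel as follows. `edge_pixel_error` is a hypothetical helper that handles only the $x = C_0$, $y = C_1$ vertical-edge constraint used as the running example; the thesis generalizes this to arbitrary constrained lines.

```python
import numpy as np

def edge_pixel_error(S, C0, C1, xs, ys):
    """Per-pixel error E_ei (equation 3-2) for a pixel (xs, ys) on an
    edge constrained by x = C0, y = C1, under the 3x4 model S."""
    # Two equations (3-3) in the single unknown z, solved by least squares.
    a = np.array([[S[2, 2] * xs - S[0, 2]],
                  [S[2, 2] * ys - S[1, 2]]])
    rhs = np.array([
        S[0, 3] - S[2, 3] * xs
        - (S[2, 0] * xs - S[0, 0]) * C0 - (S[2, 1] * xs - S[0, 1]) * C1,
        S[1, 3] - S[2, 3] * ys
        - (S[2, 0] * ys - S[1, 0]) * C0 - (S[2, 1] * ys - S[1, 1]) * C1,
    ])
    z, *_ = np.linalg.lstsq(a, rhs, rcond=None)
    # Project the recovered 3D point back into the image, as in (2-7).
    u, v, w = S @ np.array([C0, C1, float(z[0]), 1.0])
    xp, yp = u / w, v / w
    return (xs - xp) ** 2 + (ys - yp) ** 2
```

Averaging this quantity per pixel along each edge, and then across edges, yields the overall criterion $E$ of equation (3-1).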


3.4 Initial Models

Error in a calibration model can be decomposed into three basic types: rotation, translation,

and scale. Any training algorithm may handle these different sources of error with varying degrees

of success. Therefore, we investigate how well our algorithm (defined below) performs on improv-

ing initial models in four general classes, as defined and illustrated in Figure 3-2. Each of the first







13

three initial models in Figure 3-2 is labeled by the dominant type of error displayed. The fourth

model exhibits a combination of errors, but was generated to be fairly close to an acceptable final

model. Given a graphical calibration tool, as described in Appendix B, the close model represents

an easily achievable and therefore most likely starting configuration. The other three cases are pre-

sented to establish the limits of our training approach and to determine the types of error that

present the most difficulty.


3.5 Gradient Descent

The error measure defined in equation (3-2) can be expanded to,


Figure 3-2: Error Types of Initial Models: (a) Close, (b) Rotation, (c) Scale, (d) Translation











$$E_{e,i} = \left(x_s - \frac{s_{11}x + s_{12}y + s_{13}z + s_{14}}{s_{31}x + s_{32}y + s_{33}z + s_{34}}\right)^2 + \left(y_s - \frac{s_{21}x + s_{22}y + s_{23}z + s_{24}}{s_{31}x + s_{32}y + s_{33}z + s_{34}}\right)^2 \qquad (3-4)$$



This is a differentiable function in $S$, for which we can compute the gradient $\nabla E_{e,i}$ with respect to the parameters in $S$ such that,

$$\nabla E_{e,i} = \begin{bmatrix} \dfrac{\partial E_{e,i}}{\partial s_{11}} & \cdots & \dfrac{\partial E_{e,i}}{\partial s_{33}} \end{bmatrix}^T \qquad (3-5)$$

Note that since $s_{34}$ is assigned to be equal to one, it is not part of the gradient in equation (3-5). From equation (3-1), the overall gradient $\nabla E$ is then given by,

$$\nabla E = \frac{1}{m} \sum_{e=1}^{m} \frac{1}{n_e} \sum_{i=1}^{n_e} \nabla E_{e,i} \qquad (3-6)$$

Given this gradient, the current model $S$ can now be modified by a small positive constant $\delta_g$ in the direction of the negative gradient,

$$S_{new} = (1 - \delta_g \nabla E)S \qquad (3-7)$$

For error surfaces that can be roughly approximated as quadratic, we would expect the model recursion in equation (3-7) to converge to a good near-optimal solution. Figure 3-3 illustrates the performance of pure gradient descent for the four types of initial models. From these plots, it is apparent that the gradient descent recursion very quickly gets stuck in a local minimum that is far from optimal; all four types of initial models caused gradient descent to fail within 2.75 seconds.¹ In other words, the error surface is decidedly non-quadratic in a global sense, and gradient descent represents at best only a partial training algorithm for this problem.

1. All experiments in this thesis were run on a 700 MHz Pentium III running Linux.
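A runnable sketch of the descent step: the gradient here is taken by finite differences rather than the analytic derivatives of equation (3-5), and a plain additive step stands in for the multiplicative recursion (3-7). Function names and step sizes are illustrative assumptions.

```python
import numpy as np

def numeric_grad(error_fn, S, eps=1e-6):
    """Finite-difference gradient of the error over the 11 free
    parameters of S; s34 stays pinned at 1, as in equation (3-5)."""
    g = np.zeros_like(S)
    for idx in np.ndindex(3, 4):
        if idx == (2, 3):
            continue  # s34 is fixed, not part of the gradient
        Sp, Sm = S.copy(), S.copy()
        Sp[idx] += eps
        Sm[idx] -= eps
        g[idx] = (error_fn(Sp) - error_fn(Sm)) / (2 * eps)
    return g

def gradient_descent(error_fn, S, delta_g=0.01, steps=200):
    """Additive gradient descent, standing in for recursion (3-7)."""
    for _ in range(steps):
        S = S - delta_g * numeric_grad(error_fn, S)
    return S
```

On a smooth, near-quadratic error surface this converges quickly; as the discussion above notes, on the real calibration error surface it stalls in local minima, which motivates the perturbation step of the next section.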










3.6 Model Perturbation

Another method for improving the model is stochastic model perturbation. In this approach, the current model $S$ is first perturbed by a small delta $\delta_p$ along a random direction,

$$S_{new} = (1 + \delta_p \nabla D)S \qquad (3-8)$$

The error for the new model $S_{new}$ is computed; if it represents an improvement, the current model becomes the new model; otherwise, the new model is discarded, and we simply try another random perturbation of the current model. This approach is much less sensitive to the local-minima problem, since the random perturbations can effectively "jump out" of local minima; that is, the direction $\nabla D$ is not constrained to be in the negative gradient direction.

Figure 3-3: Gradient Descent vs. Different Types of Error: (a) Close initial model, (b) Rotation Error, (c) Scale Error, (d) Translation Error (average pixel error vs. time)

Figure 3-4 illustrates training results for the four initial model types. Note that there is an immediate and significant improvement for all of the initial models, particularly for the cases of translation error, where training results in an order-of-magnitude improvement. In each of the cases, the bulk of the model improvement occurs in less than one minute. The models improve further over time, although at some point there is a trade-off between model quality and computing time.
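The accept/reject loop around equation (3-8) fits in a few lines. This is a sketch with illustrative parameters: `perturb_search` is a hypothetical name, the thesis's timing-based stopping rules are omitted, and the product in (3-8) is read here as elementwise.

```python
import numpy as np

def perturb_search(error_fn, S, delta_p=0.01, iters=2000, rng=None):
    """Stochastic model perturbation (equation 3-8): nudge S along a
    random direction and keep the result only if its error drops."""
    rng = np.random.default_rng(0) if rng is None else rng
    best_err = error_fn(S)
    for _ in range(iters):
        D = rng.standard_normal(S.shape)
        D[2, 3] = 0.0                      # keep s34 fixed at 1
        S_new = (1.0 + delta_p * D) * S    # elementwise perturbation
        e_new = error_fn(S_new)
        if e_new < best_err:               # accept only improvements
            S, best_err = S_new, e_new
    return S, best_err
```

An adaptive variant would grow `delta_p` whenever many consecutive perturbations are rejected, which is the adaptive-delta scheme discussed next.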

One possible approach to speeding up improvement over time is to introduce an adaptive perturbation delta that grows when the current model has been stuck for some period of time in a relatively isolated local minimum. Examples of convergence with an adaptive delta are shown in Figure 3-5. Not surprisingly, the results are very similar, but the adaptive delta does improve performance slightly and, we expect, would increasingly improve performance if used over a longer period of time.

Figure 3-4: Stochastic Perturbation vs. Different Types of Error [panels: (a) Close initial model, (b) Rotation Error, (c) Scale Error, (d) Translation Error; axes: Avg Pixel Error vs. Time (sec)]
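A sketch of the adaptive-delta variant (again an illustration under our own naming; the growth factor, patience threshold, and generic `error_fn` are assumptions):

```python
import numpy as np

def adaptive_perturb(S, error_fn, delta0=0.01, grow=2.0, patience=100,
                     iters=2000, seed=0):
    """Stochastic perturbation with an adaptive delta: if no improving
    perturbation is found for `patience` consecutive trials, the step size
    grows so the search can jump out of an isolated local minimum; the
    step size resets whenever an improvement is found."""
    rng = np.random.default_rng(seed)
    err, delta, stuck = error_fn(S), delta0, 0
    for _ in range(iters):
        V = rng.standard_normal(S.shape)
        V /= np.linalg.norm(V)
        S_new = (1.0 + delta * V) * S
        e = error_fn(S_new)
        if e < err:
            S, err = S_new, e
            delta, stuck = delta0, 0    # improvement: reset the step size
        else:
            stuck += 1
            if stuck >= patience:       # stuck: widen the search radius
                delta *= grow
                stuck = 0
    return S, err
```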


Figure 3-5: Stochastic Perturbation with Adaptive Delta vs. Different Types of Error [panels: (a) Close initial model, (b) Rotation Error, (c) Scale Error, (d) Translation Error; axes: Avg Pixel Error vs. Time (sec)]

3.7 Gradient-Perturbation Hybrid


Given the relative speed of convergence of gradient descent in localized, near-quadratic neighborhoods, and the insensitivity of stochastic perturbation to local minima, a combination of gradient descent and model perturbation may well train faster than either approach by itself. In this hybrid approach, gradient descent quickly reaches a local minimum, at which point adaptive-delta model perturbation takes over to search for better regions in model parameter space. In other words, gradient descent and stochastic perturbation alternate in optimizing the camera model over time. Sample results for this approach are shown in Figure 3-6.
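The alternation can be sketched as follows (an illustrative sketch with a finite-difference gradient standing in for the analytic one; the learning rate, phase lengths, and names are our assumptions):

```python
import numpy as np

def numeric_grad(f, S, h=1e-6):
    """Central-difference gradient of the scalar error f at model S."""
    g = np.zeros_like(S)
    for i in np.ndindex(S.shape):
        d = np.zeros_like(S)
        d[i] = h
        g[i] = (f(S + d) - f(S - d)) / (2.0 * h)
    return g

def hybrid(S, f, lr=1e-3, gd_steps=50, trials=200, rounds=5, delta=0.05, seed=0):
    """Gradient-perturbation hybrid: gradient descent converges quickly in a
    near-quadratic neighborhood; stochastic perturbation then searches for a
    better basin; the two phases alternate."""
    rng = np.random.default_rng(seed)
    best_S, best_err = S.copy(), f(S)
    for _ in range(rounds):
        for _ in range(gd_steps):                  # descend to a local minimum
            S = S - lr * numeric_grad(f, S)
        if f(S) < best_err:
            best_S, best_err = S.copy(), f(S)
        for _ in range(trials):                    # perturb out of the minimum
            V = rng.standard_normal(S.shape)
            V /= np.linalg.norm(V)
            S_new = (1.0 + delta * V) * best_S
            e = f(S_new)
            if e < best_err:
                best_S, best_err = S_new, e
        S = best_S.copy()
    return best_S, best_err
```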


Figure 3-6: Gradient-Perturbation Hybrid vs. Different Types of Error [panels: (a) Close initial model, (b) Rotation Error, (c) Scale Error, (d) Translation Error; axes: Avg Pixel Error vs. Time (sec)]










3.8 Performance Comparisons

Which approach is best for our purposes depends on a number of different criteria. Ideally, we would like a method that responds quickly, improves in the face of difficult models, and is able to continue to improve if given an open-ended amount of time. As we have already seen, and as we further detail in Tables 3-1, 3-2 and 3-3, each method improves the camera model the most within


Table 3-1: Model Perturbation: Average Error Per Pixel

          Close     Rotation   Scale     Translation
Initial   3.9336    21.2163    28.9091   23.0482
20 Sec.   2.0053    8.8545     6.9955    1.7082
1 Min.    1.9618    6.5318     6.9518    1.6143
5 Min.    1.6327    6.2082     6.8318    1.3182
10 Min.   1.5182    6.1391     5.4664    1.1909
20 Min.   1.4246    5.8464     5.2064    1.1618


Table 3-2: Model Perturbation With Adaptive Delta: Average Error Per Pixel

          Close     Rotation   Scale     Translation
Initial   3.9336    21.2163    28.9091   23.0482
20 Sec.   2.0053    9.138      7.0035    1.7082
1 Min.    1.9691    6.8472     6.9515    1.6145
5 Min.    1.64      6.22       5.8091    1.3255
10 Min.   1.5972    5.9091     5.2945    1.2227
20 Min.   1.4127    5.2491     4.5236    1.19


Table 3-3: Gradient Perturbation Hybrid: Average Error Per Pixel

          Close     Rotation   Scale     Translation
Initial   3.9336    21.2164    28.9091   23.0482
20 Sec.   2.0364    12.4909    14.13     1.6909
1 Min.    1.9173    8.0218     6.7918    1.5627
5 Min.    1.5664    5.3855     5.7545    1.4
10 Min.   1.4736    5.1991     5.5655    1.3118
20 Min.   1.3355    5.0564     4.6891    1.1964










the first minute of training. For each approach, the error then continues to decline, but the two techniques that make use of an adaptive delta show a continuing ability to improve over time, with the hybrid approach outperforming the others. A direct comparison is shown in Figures 3-7 and 3-8,


Figure 3-7: Performance Comparison for the Translational and Scale Initial Models [panels: Perturbation vs. Perturbation with Delta, Perturbation vs. Hybrid, Perturbation with Delta vs. Hybrid; axes: Pixels vs. Time (sec)]











which plot the difference in error between the different techniques over time. The final camera

models trained by the hybrid method are depicted in Figure 3-9.


Figure 3-8: Performance Comparison for the Close and Rotational Initial Models [panels: Perturbation vs. Perturbation with Delta, Perturbation vs. Hybrid, Perturbation with Delta vs. Hybrid; axes: Pixels vs. Time (sec)]
























Figure 3-9: Final Models Using the Gradient Perturbation Hybrid Technique [(a) Close; (b) Rotation; (c) Scale; (d) Translation]



Why does training virtually eliminate translation error while rotation and scale errors prove more difficult to correct? The answer lies in the camera model itself [see equation (2-8)]. Translation is parameterized by 3 of the 12 parameters in S, while rotation is parameterized by 9 parameters, which are additionally constrained by the orthonormality requirement for rotation matrices. Finally, scale affects all 12 parameters in S. Thus, to correct rotational errors, 9 model parameters must be changed simultaneously, while for scale errors, all 12 parameters must be changed. This adds complexity and size to the search space, resulting in slower improvement and a greater likelihood of getting stuck in a remote local minimum.
















CHAPTER 4
STEREO CAMERAS


4.1 Introduction

We are interested in abstracting 3D information about the world through computer vision techniques. Therefore, we will, in general, require more than one non-coincident camera view for an area of interest, since pixels from a single-camera image do not correspond to exact 3D world coordinates, but rather to rays in space. If we can establish a feature correspondence between at least two camera views calibrated to the same world coordinate system, then we can extract the underlying 3D information for that feature by intersecting the rays corresponding to its pixel location in each image. Hence, in this chapter, we consider the problem of simultaneous calibration of multiple (stereo) fixed cameras to a unique world coordinate system. More specifically, we build on the results of the previous chapter by assuming that one camera in a multi-camera setup has already been trained towards a good model. The task that remains is to calibrate the additional cameras. In the remainder of this chapter, we focus on the two-camera case, although our results generalize in a straightforward manner to more than two cameras.

Given a set of two cameras where one has already been trained, and the second has a rough initial calibration, we propose to improve the second calibration through an iterative process that involves a "virtual calibration object." This calibration object is created by moving a physical object throughout the area of interest and tracking it from each camera view. The path that the object follows in the image (in pixels) and the corresponding 3D estimates constitute the training data at each step of the algorithm.









4.2 Related Work

Previous work in stereo camera calibration is relatively extensive, but varies based on the particular application of interest. Many of the previous methods use a known calibration pattern whose features are extracted from each image. Rander and Kanade [12] have a system of approximately 50 cameras arranged in a dome to observe a room-sized space. In order to calibrate these cameras, a large calibration object is built and then moved to several precise locations, in effect building a virtual calibration object that covers most of the room. Do [4] applied a neural network to acquire stereo calibration models and determined the approach to be adequate; however, he went on to show that high accuracy did not appear possible in a reasonable amount of training time. Horaud et al. [7] use rigid and constrained motions of a stereo rig to allow camera calibration. This approach, like most in the field, addresses small-baseline stereo rigs, which usually have a fixed, rigid geometry. Azarbayejani and Pentland [2] propose an approach that tracks an item of interest through images to obtain usable training data. This approach is similar to ours, except that they use object tracking to completely establish a calibration rather than using it to modify an already existing calibration. The drawback of this technique is that scale is not recoverable, so absolute distance has no meaning. Chen et al. [3] offer a similar technique that uses structure-from-motion to obtain initial camera calibration models, and then tracks a virtual calibration object to iterate to better models. The initial calibrations are obtained sequentially, where each new calibration model depends on those previously derived. As such, error accumulates for each successive calibration, placing newer calibrations further away from an optimal solution. Our work has shown (e.g., Figures 3-4, 3-5 and 3-6) that this can have a significant impact on both the final calibration obtained and the amount of time required to obtain it.









4.3 Our Approach

In this research, our primary motivation is to make the calibration process as quick and easy as possible without sacrificing precision. We wish for a non-expert to be able to completely calibrate a system for operation by making part of the calibration process visual and automating the rest. Each camera obtains its initial calibration via a graphical calibration tool (as described in Appendix B). The calibration is then improved by having the cameras track an object of interest throughout the viewing area. Ideally, image capture and processing should be synchronized between cameras; however, in practice, such synchronization is difficult and expensive to achieve in hardware. Even for an asynchronous system, however, we can approximate synchronous operation through Kalman filtering and interpolation of time-stamped images, as long as the system clocks are synchronized between the image processing computers. This Kalman-filtered and interpolated trajectory then becomes the training data for improving our stereo camera models.


4.4 Training Data

In order to collect training data, we apply a modified version of Zapata's color-model-based tracking algorithm [20] to track an object of interest from multiple camera views in real time. The time-stamped pixel data of the object's centroid are passed through a first-order Kalman filter and then sent to a multi-camera data fusor. The data fusor accepts the time-stamped data streams, interpolating and synchronizing the multiple data streams at 30 Hz. Prior to training, the synchronized tracking data is balanced so that no single region of the image contributes a disproportionately large amount of data. Although the training examples reported in this thesis use fixed data sets, there is no algorithmic obstacle to training off streaming data in real time.
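The interpolation-based synchronization step can be sketched as follows. This is a simplified illustration using plain linear interpolation of time-stamped centroid streams onto a common 30 Hz clock; the thesis pipeline additionally applies a first-order Kalman filter, which we omit here, and all names are ours:

```python
import numpy as np

def synchronize(streams, rate_hz=30.0):
    """Resample asynchronous time-stamped (t, x, y) tracking streams onto a
    common clock by linear interpolation over their overlapping window."""
    t0 = max(s[0][0] for s in streams)     # latest start across cameras
    t1 = min(s[-1][0] for s in streams)    # earliest end across cameras
    grid = np.arange(t0, t1, 1.0 / rate_hz)
    out = []
    for s in streams:
        t = np.array([p[0] for p in s])
        xy = np.array([p[1:] for p in s], dtype=float)
        x = np.interp(grid, t, xy[:, 0])   # per-axis linear interpolation
        y = np.interp(grid, t, xy[:, 1])
        out.append(np.stack([grid, x, y], axis=1))
    return out
```

Every camera's stream is resampled at the same instants, so each row across the returned arrays forms one synchronized set of pixel observations.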


4.5 Model Improvement

The training data now consists of n synchronized sets of pixel values from the m cameras. Let us consider the set of pixel values for the object at time t:








{(x_t^1, y_t^1), (x_t^2, y_t^2), ..., (x_t^m, y_t^m)}    (4-1)

where (x_t^j, y_t^j) denotes the pixel location of the object for camera j. Given our current estimate of each camera model S^j, we can estimate the 3D world coordinate of the object at time t by regressing the equations

(S_31^j x_t^j - S_11^j) x_t + (S_32^j x_t^j - S_12^j) y_t + (S_33^j x_t^j - S_13^j) z_t = S_14^j - S_34^j x_t^j    (4-2)
(S_31^j y_t^j - S_21^j) x_t + (S_32^j y_t^j - S_22^j) y_t + (S_33^j y_t^j - S_23^j) z_t = S_24^j - S_34^j y_t^j

j ∈ {1, 2, ..., m}, for (x_t, y_t, z_t), which denotes the estimated 3D world coordinate at time t.
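Regressing equations (4-2) over all m cameras, two linear equations per view in the three unknowns, amounts to a small least-squares solve. A minimal sketch (our own illustration, not the thesis code; `triangulate` and its argument layout are assumed names):

```python
import numpy as np

def triangulate(pixels, models):
    """Least-squares triangulation: stack the two linear equations per
    camera from eq. (4-2) relating the unknown world point (x, y, z) to
    each pixel observation, and solve the resulting 2m x 3 system."""
    A, b = [], []
    for (u, v), S in zip(pixels, models):   # S is a 3x4 projection matrix
        A.append([S[2,0]*u - S[0,0], S[2,1]*u - S[0,1], S[2,2]*u - S[0,2]])
        b.append(S[0,3] - S[2,3]*u)
        A.append([S[2,0]*v - S[1,0], S[2,1]*v - S[1,1], S[2,2]*v - S[1,2]])
        b.append(S[1,3] - S[2,3]*v)
    X, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return X                                # estimated (x, y, z)
```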



For each training camera, we now have n estimated correspondence pairs between the synchronized pixel values for that camera and their corresponding estimated 3D world coordinates. Given this data, we can apply equation (2-13) to generate a new perspective projection matrix S_new based on the estimated 3D tracking data. This process is repeated until the calibration reaches acceptable precision. Figure 4-1 shows the error over time for this approach: there is initial improvement, followed by rapid and consistent model degradation. This is not what we expected, but it can be explained by looking a little closer at how the procedure operates.

The resulting calibration in Figure 4-1 gives an indication of the source of the problem. Recall from equation (2-8) that part of the projection matrix is the camera matrix K, which contains intrinsic parameters for the camera such as scale, skew and offset. These parameters are fixed in reality but are clearly being changed by this training process. This happens because of the unconstrained nature of the training process: known-incorrect (x, y, z) estimates are being used to train a model that is simply a linear least-squares solution, and K, as a component of this model, is changed along with everything else. We are seeking to use a good model to train a bad model by modifying its rotation and translation, not its intrinsic camera parameters. A property of the linear least-squares estimate












Figure 4-1: Results for a Non-Weighted Model Improvement [(a) Error (mm) vs. Time (sec); (b) Initial Calibration; (c) Best Calibration; (d) Final Calibration]


is that it distributes the error as evenly as possible between the two models. This even distribution of the error is not appropriate for us, since we know that the source of the error is the bad model. Weighting the regression towards the training model, since the model being trained is the source of the error, might help. Figure 4-2 shows the error over time as a function of different weights applied to the training model. It shows that we can mitigate the effect on the K matrix of the training model with a weighted regression. A gain of ten shows better performance but is still unstable; a gain of 100 or 1000 shows much better stability and performance. Looking at the error over time for the different gains, it appears that a higher gain exhibits a similar response when viewed on a larger time scale.
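The weighted regression can be sketched by scaling each camera's pair of equations (4-2) by a per-camera gain before the least-squares solve, so the 3D estimates trust the already-trained model more than the rough one being improved. A sketch under our own naming assumptions:

```python
import numpy as np

def weighted_triangulate(pixels, models, gains):
    """Triangulation via eq. (4-2) with per-camera row weights: rows from
    the already-trained camera carry a large gain, biasing the 3D estimate
    toward the trusted model."""
    A, b = [], []
    for (u, v), S, g in zip(pixels, models, gains):
        A.append([g*(S[2,0]*u - S[0,0]), g*(S[2,1]*u - S[0,1]), g*(S[2,2]*u - S[0,2])])
        b.append(g*(S[0,3] - S[2,3]*u))
        A.append([g*(S[2,0]*v - S[1,0]), g*(S[2,1]*v - S[1,1]), g*(S[2,2]*v - S[1,2])])
        b.append(g*(S[1,3] - S[2,3]*v))
    X, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
    return X
```

When the observations are perfectly consistent the gains change nothing; their effect appears exactly when the models disagree, which is the situation during training.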













Figure 4-2: Error Over Time for Various Gains on the Training Model [(a) Gain of 10; (b) Gain of 100; (c) Gain of 1000; (d) Gain of 10000; axes: Error (mm) vs. Time (sec)]













Figure 4-3: The Long-Term Performance Using Various Gains [(a) Gain of 1; (b) Gain of 10; (c) Gain of 100; (d) Gain of 1000; axes: Error (mm) vs. Time (sec)]















CHAPTER 5
FURTHER RESULTS AND DISCUSSION


5.1 Single Camera

Here, we present some further results for the algorithms developed in the previous two chapters. Below, we solve eight sample calibration problems, where the initial calibrations are obtained using our previously referenced graphical calibration tool. These are shown in Figures 5-1, 5-2, 5-3 and 5-4. Given these decent initial calibrations, our algorithms develop run-time-quality models within two minutes from the start of training. Figures 5-5 and 5-6 show the system's performance in the face of poor initial calibrations. These figures show that even a poor initial calibration can be improved significantly, but not always to run-time quality. Overall, these results show that our single-camera calibration approach meets our goal, namely, making camera calibration a simpler, faster and easier process.


5.2 Stereo Cameras

Figure 5-7 and Figure 5-8 show four example calibrations obtained from two different camera angles. The initial calibrations are obtained using our graphical calibration tool. These calibrations and the associated graphs reveal the capabilities and shortcomings of our proposed stereo technique. While, in its present form, it may not yet be sufficiently robust for real-world application, it can certainly be made so with minor changes. As we discussed in Chapter 4, if we constrained modifications of the camera model to extrinsic parameters only, keeping the intrinsic parameters of the model fixed, we expect that the observed model drift would no longer occur.

































Figure 5-1: Example Single Camera Calibrations (1 & 2) [(a) Example 1 Initial; (b) Example 1 Final; (c) Example 2 Initial; (d) Example 2 Final; (e) Example 1 Error vs. Time; (f) Example 2 Error vs. Time]



































Figure 5-2: Example Single Camera Calibrations (3 & 4) [(a) Example 3 Initial; (b) Example 3 Final; (c) Example 4 Initial; (d) Example 4 Final; (e) Example 3 Error vs. Time; (f) Example 4 Error vs. Time]


































Figure 5-3: Example Single Camera Calibrations (5 & 6) [(a) Example 5 Initial; (b) Example 5 Final; (c) Example 6 Initial; (d) Example 6 Final; (e) Example 5 Error vs. Time; (f) Example 6 Error vs. Time]


































Figure 5-4: Example Single Camera Calibrations (7 & 8) [(a) Example 7 Initial; (b) Example 7 Final; (c) Example 8 Initial; (d) Example 8 Final; (e) Example 7 Error vs. Time; (f) Example 8 Error vs. Time]





























Figure 5-5: Poor Initial Calibrations (1 & 2) [(a) Poor Calibration 1 Initial; (b) Poor Calibration 1 Final; (c) Poor Calibration 2 Initial; (d) Poor Calibration 2 Final; (e) Poor Calibration 1 Error vs. Time; (f) Poor Calibration 2 Error vs. Time]



































Figure 5-6: Poor Initial Calibrations (3 & 4) [(a) Poor Calibration 3 Initial; (b) Poor Calibration 3 Final; (c) Poor Calibration 4 Initial; (d) Poor Calibration 4 Final; (e) Poor Calibration 3 Error vs. Time; (f) Poor Calibration 4 Error vs. Time]






























Figure 5-7: Stereo Calibration Examples (1 & 2) [(a) Initial Stereo Calibration 1; (b) Final Stereo Calibration 1; (c) Initial Stereo Calibration 2; (d) Final Stereo Calibration 2; (e) Calibration 1 Error vs. Time; (f) Calibration 2 Error vs. Time]


































Figure 5-8: Stereo Calibration Examples (3 & 4) [(a) Initial Stereo Calibration 3; (b) Final Stereo Calibration 3; (c) Initial Stereo Calibration 4; (d) Final Stereo Calibration 4; (e) Calibration 3 Error vs. Time; (f) Calibration 4 Error vs. Time]
















APPENDIX A
VISUAL CALIBRATION EVALUATION


When applying machine learning to improve camera models, a well-behaved error measure is critical. With exact correspondence data (2D image coordinates paired with 3D world coordinates), such an error measure is easily specified. It is precisely this type of data, however, that we want to avoid having to collect. Yet, without such data, it is difficult, if not impossible, to define a globally well-behaved error measure. As such, we choose the human eye as an appropriate means of evaluating different calibration models. To make this intuitive, we draw a grid defined in space onto the image using the current camera model, and decide visually whether the system's error measure is accurately measuring the quality of a model. This grid is chosen to match some feature(s) in the scene to allow a person to visually determine the quality of a calibration. Figure A-2 shows the experimental area. In Figure A-2, the red arrows indicate the corner points of a one-meter cube placed in the far corner of the room, while the green arrows show the world coordinate system, defined after considering the experimental area and how to keep initial data collection as simple as possible. Figure A-1 shows a sample grid drawn on an image. The grid uses 20 cm² squares and should ideally be aligned with the floor and walls. Points marked with a red arrow in Figure A-2 should have the outside corner of the fifth box out (or up) lying on them. The windowsill is a straight edge that a human can use as a guide for the top edge of the grid.
































Figure A-2: The Experimental Area


Figure A-1: The Calibration Grid















APPENDIX B
GRAPHICAL CALIBRATION TOOL


In order to generate initial rough calibration models quickly and easily, we developed an

intuitive graphical interface through which a user can manipulate and align a grid model of a scene

for different rotations and translations of the camera with respect to the world. This interface, as

shown in Figure B-1, was written in C for the X/Motif window system.

Given approximate intrinsic parameters for a camera, we can project a three-dimensional

model of the world onto an image for any given rotation, translation and scale (effective focal

length). Therefore, as the user adjusts parameters that control scale, rotation and translation, the

grid model of the world is continuously redrawn to reflect the new effective pose of the camera.

Once the user is happy with the alignment of the grid model to the world, he/she can then save the

effective perspective projection matrix and use that as the initial camera calibration model.
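The redraw step at the heart of the tool, projecting the world grid through the effective projection matrix P = K [R | t] each time the user adjusts the pose, can be sketched as follows (an illustrative sketch; the names and the simple pinhole form are our assumptions):

```python
import numpy as np

def project_grid(K, R, t, grid_pts):
    """Project 3D grid points through P = K [R | t] to image pixels, as the
    tool does after every user adjustment of rotation, translation or scale."""
    P = K @ np.hstack([R, t.reshape(3, 1)])            # 3x4 effective matrix
    X = np.hstack([grid_pts, np.ones((len(grid_pts), 1))])  # homogeneous points
    uv = X @ P.T
    return uv[:, :2] / uv[:, 2:3]                      # perspective divide
```

Saving P once the grid visually matches the scene yields exactly the initial perspective projection matrix the training algorithms start from.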







































Figure B-1: The Graphical Calibration Tool
















REFERENCES


[1] Y. I. Abdel-Aziz and H. M. Karara, "Direct Linear Transformation From Comparator Coordinates into Object-Space Coordinates," Proc. American Society of Photogrammetry Symposium on Close Range Photogrammetry, pp. 1-18, 1971.

[2] A. Azarbayejani and A. Pentland, "Camera Self Calibration From One Point Correspondence," Perceptual Computing Technical Report #341, Massachusetts Institute of Technology Media Laboratory, 1995.

[3] X. Chen, J. Davis and P. Slusallek, "Wide Area Calibration Using Virtual Calibration Objects," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 520-7, 2000.

[4] Y. Do, "Application of Neural Networks for Stereo-Camera Calibration," Proc. Int. Joint Conf. on Neural Networks, vol. 4, pp. 2719-22, 1999.

[5] R. Frezza and C. Altafini, "Autonomous Landing by Computer Vision: An Application of Path Following in SE(3)," Proc. IEEE Conf. on Decision and Control, vol. 3, pp. 2527-32, 2000.

[6] I. Haritaoglu, D. Harwood and L. Davis, "W4: Who? When? Where? What? A Real Time System for Detecting and Tracking People," Proc. IEEE Int. Conf. on Face and Gesture Recognition, pp. 222-7, 1998.

[7] R. Horaud, G. Csurka and D. Demirdjian, "Stereo Calibration from Rigid Motions," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 12, pp. 1446-52, 2000.

[8] S. Kamijo, Y. Matsushita, K. Ikeuchi and M. Sakauchi, "Traffic Monitoring and Accident Detection at Intersections," IEEE Trans. on Intelligent Transportation Systems, vol. 1, no. 2, pp. 108-18, 2000.

[9] F. Karbou and F. Karbou, "An Interval Approach to Handwriting Recognition," Proc. Conf. of the North American Fuzzy Information Processing Society, pp. 153-7, 2000.

[10] J. M. Lee, B. H. Kim, M. H. Lee, K. Son, M. C. Lee, J. W. Choi and S. H. Han, "Fine Active Calibration of Camera Position/Orientation Through Pattern Recognition," IEEE Int. Symposium on Industrial Electronics, vol. 2, pp. 657-62, 1998.

[11] P. Mendonca and R. Cipolla, "A Simple Technique for Self-Calibration," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 1, pp. 500-5, 1999.

[12] P. Rander, "A Multi-Camera Method for 3D Digitization of Dynamic, Real-World Events," CMU-RI-TR-98-12, Technical Report, The Robotics Institute, Carnegie Mellon University, 1998.

[13] G. Rigoll, S. Eickeler and S. Muller, "Person Tracking in Real-World Scenarios Using Statistical Methods," Proc. IEEE Int. Conf. on Automatic Face and Gesture Recognition, pp. 342-7, 2000.

[14] H. Schneiderman, "A Statistical Approach to 3D Object Detection Applied to Faces and Cars," CMU-RI-TR-00-06, Ph.D. Thesis, The Robotics Institute, Carnegie Mellon University, 2000.

[15] R. Sharma, M. Zeller, V. I. Pavlovic, T. S. Huang, Z. Lo, S. Chu, Y. Zhao, J. C. Phillips and K. Schulten, "Speech/Gesture Interface to a Visual-Computing Environment," IEEE Computer Graphics and Applications, vol. 20, no. 2, pp. 29-37, 2000.

[16] R. Tsai, "A Versatile Camera Calibration Technique for High-Accuracy 3D Machine Vision Metrology Using Off-the-Shelf TV Cameras and Lenses," IEEE Journal of Robotics and Automation, vol. RA-3, no. 4, pp. 323-44, 1987.

[17] H. Wang, W. Hsu, K. Guan and M. Lee, "An Effective Approach to Detect Lesions in Color Retinal Images," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, vol. 2, pp. 181-6, 2000.

[18] J. Weng, P. Cohen and M. Herniou, "Camera Calibration with Distortion Models and Accuracy Evaluation," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 14, no. 10, pp. 965-80, 1992.

[19] A. D. Worrall, G. D. Sullivan and K. D. Baker, "A Simple Intuitive Camera Calibration Tool For Natural Images," Proc. of the 5th British Machine Vision Conf., vol. 2, pp. 781-90, 1994.

[20] I. Zapata, "Detecting Humans in Video Sequences Using Color and Shape Information," M.S. Thesis, Dept. of Electrical and Computer Engineering, University of Florida, 2001.

[21] Z. Zhang, "A Flexible Technique for Camera Calibration," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330-4, 2000.

[22] Z. Zhang, "Motion and Structure of Four Points from One Motion of a Stereo Rig with Unknown Extrinsic Parameters," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 17, no. 12, pp. 1222-5, 1995.















BIOGRAPHICAL SKETCH


Scott Nichols was born in Miami, Florida, in 1969. A high school dropout, Scott decided to pursue an education and started community college full-time in 1994. He transferred as a junior to the University of Florida in 1996 and received both a Bachelor of Science degree in electrical engineering and a Bachelor of Science degree in computer engineering in August 1999. He has since worked as a research assistant at the Machine Intelligence Laboratory, working toward a Master of Science degree in electrical engineering.



