Citation
Finding Objects in Complex Scenes with Top-Down and Bottom-Up Information

Material Information

Title:
Finding Objects in Complex Scenes with Top-Down and Bottom-Up Information
Creator:
Burt, Ryan M
Place of Publication:
[Gainesville, Fla.]
Publisher:
University of Florida
Publication Date:
2017
Language:
English
Physical Description:
1 online resource (94 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
PRINCIPE, JOSE C.
Committee Co-Chair:
WU, DAPENG
Committee Members:
WONG, TAN FOON
KEIL, ANDREAS

Subjects

Subjects / Keywords:
neurocomputing -- saliency
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
Humans can view a scene and form an overall representation of it in a remarkably short time. Given the complexity of visual search, however, it is reasonable to assume that humans do not fixate on and process every small region of an image. Instead, the entire image is quickly sent through a pyramidal processing mechanism that selects fixation regions for further attention. In contrast, current computer vision methods such as convolutional neural networks employ a sliding-window approach that takes small patches across the entire image. By selecting the interesting regions of an image and processing only those, we can avoid convolving over the whole image, which should reduce the correlated dimensions in the network by skipping large uniform regions. In the same way that convolving learned filters over an image is a step beyond scanning pixel by pixel, processing still images as videos of small frames composed of visually interesting regions could be a further step that simply discards large regions of the image that have little to no effect on classification. Rather than convolving across the entire image, it is possible to take inspiration from the human visual system and process only the interesting regions that have significance to the viewer. Humans use a mixture of bottom-up attention (edges and bright colors that attract the eyes) and top-down goals (searching for specific objects) to choose where to fixate next. These top-down goals are informed by both memories and current emotional states, which affect the visual cortex and cause it to respond to different stimuli. Using the human visual system as inspiration, we use a novel saliency metric based on gamma kernels as the basis of a simple bottom-up saccade-and-fixation system that finds interesting objects in scenes and is both faster to compute and more accurate than alternatives. By processing images with this focus of attention before forming representations with the network of choice, it is possible both to speed up the computation and to improve classification results by removing background data. We can also use the attention to augment the data and provide a form of supervision to the feature extraction network without using explicit labels. Lastly, we can create a visual search mechanism by using the convolutional layers in the feature extraction network to pre-process the image, then learning a set of weights on the feature maps that correspond to specific objects. In doing so, we share information between the attention and feature extraction methods, where the output of each informs the input of the other. This mimics the "what-where" pathways in the brain, which are separate but interconnected. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: PRINCIPE, JOSE C.
Local:
Co-adviser: WU, DAPENG.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2018-06-30
Statement of Responsibility:
by Ryan M Burt.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Embargo Date:
6/30/2018
Classification:
LD1780 2017 ( lcc )

Downloads

This item has the following downloads:


Full Text

PAGE 21

The 2D gamma kernel:

g_{k,\mu}(n_1, n_2) = \frac{\mu^{k+1}}{2\pi\, k!} \left( \sqrt{n_1^2 + n_2^2} \right)^{k-1} e^{-\mu \sqrt{n_1^2 + n_2^2}}

where n_1 and n_2 are pixel offsets from the kernel center, k is the kernel order, and \mu sets its scale. For k = 1 the kernel peaks at its center; for k > 1 it peaks on a ring of radius (k - 1)/\mu.
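To make the kernel concrete, here is a minimal NumPy sketch of the 2D gamma kernel as reconstructed above; the function name, grid size, and sampling choices are my own rather than the dissertation's code.

    import math

    import numpy as np

    def gamma_kernel(k, mu, size=21):
        """2D gamma kernel g_{k,mu} sampled on a size x size grid.

        k = 1 peaks at the center; k > 1 peaks on a ring of radius
        (k - 1) / mu, giving the center/surround shapes used for saliency.
        """
        half = size // 2
        n1, n2 = np.meshgrid(np.arange(-half, half + 1),
                             np.arange(-half, half + 1))
        r = np.sqrt(n1.astype(float) ** 2 + n2.astype(float) ** 2)
        coeff = mu ** (k + 1) / (2.0 * np.pi * math.factorial(k))
        return coeff * r ** (k - 1) * np.exp(-mu * r)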

PAGE 22

The composite kernel is an alternating sum of M gamma kernels, each with its own order k_m and scale \mu_m:

g_{\text{total}} = \sum_{m=0}^{M-1} (-1)^m \, g_m(k_m, \mu_m)
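A sketch of this alternating combination, reusing the hypothetical gamma_kernel helper from the previous snippet; treating even-indexed kernels as excitatory and odd-indexed ones as inhibitory follows the reconstructed sum.

    def total_kernel(ks, mus, size=21):
        """Alternating sum of gamma kernels: sum_m (-1)^m g(k_m, mu_m).

        Pairing a center kernel (k = 1) with a ring kernel (k > 1)
        yields a center-surround operator whose response is large where
        a region differs from its neighborhood.
        """
        return sum(((-1) ** m) * gamma_kernel(k, mu, size)
                   for m, (k, mu) in enumerate(zip(ks, mus)))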

PAGE 23

The saliency map averages the absolute kernel responses over the three CIELab channels and is then post-processed at two widths:

S = \frac{|g * L| + |g * a| + |g * b|}{3}, \qquad S = (S * G(2)) * G(0.5)
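A minimal sketch of this saliency computation with SciPy and scikit-image; reading G(sigma) in the reconstructed post-processing step as a Gaussian filter is an assumption, as is the gamma_saliency name.

    import numpy as np
    from scipy.ndimage import convolve, gaussian_filter
    from skimage.color import rgb2lab

    def gamma_saliency(rgb, g):
        """Bottom-up saliency: mean absolute kernel response over CIELab."""
        lab = rgb2lab(rgb)  # H x W x 3 array with channels L, a, b
        s = sum(np.abs(convolve(lab[..., c], g)) for c in range(3)) / 3.0
        # Two smoothing passes, following the reconstructed
        # S = (S * G(2)) * G(0.5); Gaussian G is an assumption here.
        return gaussian_filter(gaussian_filter(s, sigma=2.0), sigma=0.5)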

PAGE 24

Kernel parameters: k = [1, 26, 1, 25, 1, 19], \mu = [2, 2, 1, 1, 0.5, 0.5], applied to 128 x 171 pixel images.

PAGE 47

The winner-take-all rule keeps only the maximum of each feature map:

WTA(x_{f,r,c}) = x_{f,r,c} \text{ if } x_{f,r,c} = \max_{r,c}(x_{f,r,c}), \qquad WTA(x_{f,r,c}) = 0 \text{ otherwise}

The training loss combines reconstruction of the current frame with prediction of the next:

L_t = E\left[ \left( x_{t-1} - D(E(x_{t-1})) \right)^2 + \left( x_t - D(R(x_{t-1})) \right)^2 \right]

where x_t is the frame at time t, E and D are the encoder and decoder, and R predicts the next encoding.
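A sketch of the winner-take-all rule on a stack of feature maps; the (F, R, C) array layout and function name are assumptions, and the loss term above is not covered here.

    import numpy as np

    def winner_take_all(x):
        """Zero every unit in each feature map except its spatial maximum.

        x: array of shape (F, R, C); implements WTA(x_{f,r,c}) as above,
        keeping one winning unit per feature map f.
        """
        flat = x.reshape(x.shape[0], -1)
        out = np.zeros_like(flat)
        rows = np.arange(flat.shape[0])
        cols = flat.argmax(axis=1)
        out[rows, cols] = flat[rows, cols]
        return out.reshape(x.shape)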

PAGE 51

Kernel settings: k = 1, \mu = 0.2 and k = 9, \mu = 0.5.

PAGE 54

Kernel settings: k = (1, 1, 1) with \mu = (0.1, 0.3, 0.8), and k = (13, 9, 5) with \mu = (0.3, 0.5, 0.7).

PAGE 57

Kernel settings: k = 1, \mu = 0.2; k = 8, \mu = 0.5; k = 1, \mu = 0.5; k = 5, \mu = 0.5. Reported figures: 89.5%, 86.0%, and 6%.

PAGE 65

The top-down saliency map is a weighted average of kernel responses over N feature maps C_n:

S = \frac{1}{N} \sum_{n=1}^{N} w_n \, |g * C_n|
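A sketch of the weighted map; representing the feature maps as an (N, H, W) array and using scipy.ndimage.convolve for the kernel response are assumptions.

    import numpy as np
    from scipy.ndimage import convolve

    def weighted_saliency(feature_maps, w, g):
        """S = (1/N) * sum_n w_n * |g * C_n| over N feature maps C_n."""
        return sum(w_n * np.abs(convolve(c, g))
                   for w_n, c in zip(w, feature_maps)) / len(feature_maps)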

PAGE 66

The weight w_n for feature map n is learned from M training images as the average difference between its response inside and outside the target region:

w_n = \frac{1}{M} \sum_{m=1}^{M} \left( s^{In}_m - s^{Out}_m \right)
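A minimal sketch of this weight-learning rule under the reconstruction above; the inside-minus-outside form, the boolean-mask representation of the target region, and all names here are assumptions rather than the dissertation's method as stated.

    import numpy as np

    def learn_weights(responses, masks):
        """Estimate one weight per feature map from M training images.

        responses: list of M arrays, each (N, H, W) of feature-map
        activations; masks: list of M boolean (H, W) target masks.
        Scores each map by its mean response inside minus outside the
        target, averaged over the M images (a reconstructed rule).
        """
        M = len(responses)
        N = responses[0].shape[0]
        w = np.zeros(N)
        for maps, mask in zip(responses, masks):
            for n in range(N):
                w[n] += maps[n][mask].mean() - maps[n][~mask].mean()
        return w / M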

PAGE 67

Kernel settings: k = (1, 9), \mu = (0.2, 0.5).

PAGE 73

Kernel settings: k = (1, 60, 1, 38, 1, 19), \mu = (0.05, 0.5, 0.1, 0.5, 0.5, 0.5), with n_1 = n_2 = 400.
