Citation
Multimodal Fusion

Material Information

Title:
Multimodal Fusion: A Theory and Applications
Creator:
Peng, Yang
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
2017
Language:
English
Physical Description:
1 online resource (103 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Science
Computer and Information Science and Engineering
Committee Chair:
WANG,ZHE
Committee Co-Chair:
CHEN,SHIGANG
Committee Members:
RANKA,SANJAY
SAHNI,SARTAJ KUMAR
WONG,TAN FOON

Subjects

Subjects / Keywords:
big-data -- data-science -- multimodal-fusion -- query-driven -- scalability
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Computer Science thesis, Ph.D.

Notes

Abstract:
As data grows ever larger, Big Data and Data Science have become increasingly prominent in Computer Science. In Data Science, not only does the volume of data matter for research; the variety of data has also drawn considerable attention from researchers. In recent years we have seen increasingly complex datasets containing multiple kinds of data. For example, Wikipedia is a huge dataset with unstructured text, semi-structured documents, structured knowledge and images. We call a dataset with different types of data a multimodal dataset. This dissertation focuses on employing multimodal fusion on multimodal data to improve performance on various tasks while providing scalability and high efficiency.

In this dissertation, I first introduce the concepts of multimodal datasets and multimodal fusion, and then different applications of multimodal fusion, such as information extraction, word sense disambiguation, information retrieval and knowledge base completion. Multimodal fusion is the use of algorithms to combine information from different kinds of data with the purpose of achieving better performance. The multimodal datasets studied in this dissertation include images, unstructured text and structured facts in knowledge bases. I present the correlative and complementary relations between different modalities and propose a theory of multimodal fusion based on this observation. Previous work usually focused on exploiting the correlation between modalities at the feature level and ignored the complementary relation between them. Early fusion and late fusion have been used as two schemes for combining multimodal data, but little has been said about how to design multimodal fusion algorithms effectively. In this dissertation, I discuss multimodal fusion from a deeper perspective, explain why multimodal fusion works, and analyze how to design multimodal fusion algorithms that improve task performance based on the correlative and complementary relations in different multimodal datasets.

We then present the multimodal ensemble fusion model to combine images and text for several applications, including word sense disambiguation and information retrieval. In our ensemble fusion model, text processing and image processing are conducted separately, and different fusion algorithms are employed to combine the results effectively. The ensemble fusion model can exploit the correlative and complementary relations between images and text to improve performance. Experimental results demonstrate that the ensemble approaches outperform image-only and text-only approaches.

We build a query-driven knowledge base completion system based on multimodal fusion, combining information from the Web and knowledge bases through web-based question answering and rule inference. We design a novel web-based question answering system that extracts facts from the Web using multimodal features and an effective question template selection algorithm, achieving better performance with far fewer questions than previous work. We build an augmented rule inference engine to infer new facts using logical rules learned from knowledge bases together with web-based question answering. We design different fusion algorithms to combine web-based question answering and rule inference for high performance, and we apply query-driven optimization techniques, such as query-driven snippet filtering, to improve the efficiency of the whole system.
Scalability and efficiency are also important aspects of this dissertation. We employ streaming processing for fact extraction, which can efficiently process terabytes of text data in less than one hour on a single machine. We implement a scalable image retrieval system over millions of images using distributed systems and MapReduce, which runs much faster than previous work. For knowledge base completion, query-driven techniques are applied to improve system efficiency. ( en )
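
To make the late-fusion idea in the abstract concrete, here is a minimal Python sketch of weighted ensemble fusion, assuming each modality produces a score per candidate and the two modalities are mixed with a single weight derived from their standalone accuracy. The function names, the accuracy-based weighting and the example numbers are illustrative assumptions, not the dissertation's exact algorithm.

    def fuse_scores(image_scores, text_scores, image_accuracy, text_accuracy):
        # Weight each modality by its standalone accuracy (illustrative assumption).
        alpha = image_accuracy / (image_accuracy + text_accuracy)
        fused = {}
        for candidate in set(image_scores) | set(text_scores):
            s_img = image_scores.get(candidate, 0.0)
            s_txt = text_scores.get(candidate, 0.0)
            # Convex combination of the two per-candidate scores.
            fused[candidate] = alpha * s_img + (1.0 - alpha) * s_txt
        return fused

    # Example: two word senses scored by an image-only and a text-only model.
    image_scores = {"sense_1": 0.45, "sense_2": 0.55}
    text_scores = {"sense_1": 0.91, "sense_2": 0.09}
    fused = fuse_scores(image_scores, text_scores,
                        image_accuracy=0.70, text_accuracy=0.80)
    best_sense = max(fused, key=fused.get)  # pick the highest fused score

Using a convex weight keeps the fused score on the same scale as the inputs, and letting the more accurate modality dominate is one simple way to exploit the complementary relation between images and text described in the abstract.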
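
The abstract also mentions streaming processing for fact extraction. Below is a minimal sketch of that pattern, assuming facts are pulled from a large text dump one line at a time so memory use stays constant regardless of input size; the regular expression, file name and relation label are illustrative assumptions rather than the dissertation's actual extractor.

    import re

    # Illustrative pattern for a single relation; a real extractor uses many.
    BORN_IN = re.compile(r"([A-Z][\w .'-]+) was born in ([A-Z][\w .'-]+)")

    def stream_facts(path):
        # Yield (subject, relation, object) triples lazily, line by line,
        # so only the current line is ever held in memory.
        with open(path, encoding="utf-8") as snippets:
            for line in snippets:
                for subject, place in BORN_IN.findall(line):
                    yield (subject.strip(), "wasBornIn", place.strip())

    # Usage: facts are consumed as a generator; nothing is materialized.
    # for fact in stream_facts("web_snippets.txt"):
    #     handle(fact)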
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: WANG,ZHE.
Local:
Co-adviser: CHEN,SHIGANG.
Statement of Responsibility:
by Yang Peng.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
LD1780 2017 ( lcc )

Downloads

This item has the following downloads:


Full Text
