- Permanent Link:
- Multimodal Fusion: A Theory and Applications
- Peng, Yang
- Place of Publication:
- [Gainesville, Fla.]
- University of Florida
- Publication Date:
- Physical Description:
- 1 online resource (103 p.)
- Doctorate (Ph.D.)
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Computer Science
Computer and Information Science and Engineering
- Committee Chair:
- Committee Co-Chair:
- Committee Members:
- Subjects / Keywords:
- big-data -- data-science -- multimodal-fusion -- query-driven -- scalability
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
- bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Computer Science thesis, Ph.D.
- As data grows ever larger, Big Data and Data Science have become increasingly prominent in Computer Science. In Data Science, not only the volume of data but also its variety has drawn attention from researchers. In recent years, we have seen more and more complex datasets containing multiple kinds of data. For example, Wikipedia is a huge dataset with unstructured text, semi-structured documents, structured knowledge and images. We call a dataset with different types of data a multimodal dataset. This dissertation focuses on employing multimodal fusion on multimodal data to improve performance on various tasks, while providing scalability and high efficiency.
In this dissertation, I first introduce the concepts of multimodal datasets and multimodal fusion, and then present different applications of multimodal fusion, such as information extraction, word sense disambiguation, information retrieval and knowledge base completion. Multimodal fusion is the use of algorithms to combine information from different kinds of data in order to achieve better performance. The multimodal datasets studied in this dissertation include images, unstructured text and structured facts in knowledge bases.
I present the correlative and complementary relations between different modalities and propose a theory of multimodal fusion based on this observation. Previous work usually focused on exploiting the correlation between modalities at the feature level and ignored the complementary relation between them. Early fusion and late fusion have been used as two schemes for combining multimodal data, but little has been said about how to design multimodal fusion algorithms effectively. In this dissertation, I discuss multimodal fusion from a deeper perspective, explain why multimodal fusion works, and analyze how to design multimodal fusion algorithms that improve task performance based on the correlative and complementary relations in different multimodal datasets.
We then present the multimodal ensemble fusion model to combine images and text for a few applications, including word sense disambiguation and information retrieval.
In our ensemble fusion model, text processing and image processing are conducted on text and images separately, and different fusion algorithms are employed to combine the results effectively. The ensemble fusion model can exploit the correlative and complementary relations between images and text to improve performance. Experimental results demonstrate that the ensemble approaches outperform image-only and text-only approaches.
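The late-fusion step described above can be sketched as a weighted combination of per-modality scores. This is a minimal illustration, not the dissertation's exact formulation: the function names, the linear combination, and the accuracy-based choice of weight are all assumptions for the sake of example.

```python
def late_fuse(score_image, score_text, beta):
    """Weighted late fusion of per-modality relevance scores.

    beta is the weight on the image modality; (1 - beta) goes to text.
    Both scores are assumed to be comparable (e.g. normalized to [0, 1]).
    """
    return beta * score_image + (1 - beta) * score_text


def accuracy_weight(acc_image, acc_text):
    """One plausible way to pick beta: weight each modality by its
    standalone validation accuracy, so the stronger modality dominates."""
    return acc_image / (acc_image + acc_text)


# Hypothetical standalone accuracies for the two modalities.
beta = accuracy_weight(0.91, 0.45)
fused_score = late_fuse(0.7, 0.4, beta)
```

A natural property of this scheme is that when one modality is uninformative (accuracy near zero), its weight vanishes and the fused score falls back to the other modality alone.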
We build a query-driven knowledge base completion system based on multimodal fusion with web-based question answering and rule inference to combine information from
the Web and knowledge bases. We design a novel web-based question answering system that extracts facts from the Web using multimodal features and an effective question template selection algorithm, achieving better performance with far fewer questions than previous work. We build an augmented rule inference engine to infer new facts using logical rules learned from knowledge bases together with web-based question answering. We design different fusion algorithms to combine web-based question answering and rule inference for high performance, and we use several query-driven optimization techniques, such as query-driven snippet filtering, to improve the efficiency of the whole system.
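Combining candidate facts from the two sources could look like the following sketch. The triple representation and the max-confidence rule are illustrative assumptions; the dissertation's actual fusion algorithms are more elaborate.

```python
def fuse_fact_sources(qa_facts, rule_facts):
    """Merge candidate facts from web-based QA and rule inference.

    Each input maps a (subject, relation, object) triple to a confidence
    in [0, 1]. When both sources propose the same triple, keep the higher
    confidence; otherwise keep the triple from whichever source has it.
    """
    fused = dict(qa_facts)
    for triple, conf in rule_facts.items():
        fused[triple] = max(fused.get(triple, 0.0), conf)
    return fused
```

For example, if web QA proposes ("Gainesville", "located_in", "Florida") with confidence 0.8 and rule inference proposes it with 0.6, the fused result retains it at 0.8.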
Scalability and efficiency are also important aspects in this dissertation. We employ streaming processing for fact extraction, which can efficiently process terabytes of text
data in less than one hour on a single machine. We implement a scalable image retrieval system over millions of images using distributed systems and map-reduce, which can run much faster than previous work. For knowledge base completion, query-driven techniques are applied to improve system efficiency. ( en )
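The streaming approach (scanning text one line at a time instead of loading it all into memory, which is what lets a single machine handle terabytes) might look like this minimal sketch; the regex pattern and relation name are toy assumptions, not the system's actual extractors.

```python
import re

# Toy pattern standing in for a real fact extractor.
BORN_IN = re.compile(r"(\w+) was born in (\w+)")

def stream_extract(lines, pattern=BORN_IN):
    """Stream over text line by line, yielding (subject, relation, object)
    triples as they are found. Memory use is bounded by one line at a time,
    so the input can be an open file handle over arbitrarily large data."""
    for line in lines:
        for subj, obj in pattern.findall(line):
            yield (subj, "born_in", obj)
```

Because the function is a generator over an iterable of lines, it works unchanged whether `lines` is a small list or a file object streaming a multi-terabyte corpus.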
- General Note:
- In the series University of Florida Digital Collections.
- General Note:
- Includes vita.
- Includes bibliographical references.
- Source of Description:
- Description based on online resource; title from PDF title page.
- Source of Description:
- This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
- Thesis (Ph.D.)--University of Florida, 2017.
- Adviser: Wang, Zhe.
- Co-adviser: Chen, Shigang.
- Statement of Responsibility:
- by Yang Peng.
- Source Institution:
- Rights Management:
- Applicable rights reserved.
- LD1780 2017 ( lcc )