Fast SVM Training Using Approximate Extreme Points


Material Information

Title:
Fast SVM Training Using Approximate Extreme Points
Physical Description:
1 online resource (114 p.)
Language:
English
Creator:
Nandan, Manu
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:
2013

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
KHARGONEKAR,PRAMOD P
Committee Co-Chair:
TALATHI,SACHIN S
Committee Members:
WILSON,JOSEPH N
GADER,PAUL D
RANGARAJAN,ANAND
CARNEY,PAUL RICHARD

Subjects

Subjects / Keywords:
approximate -- extreme -- kernels -- svm
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Support vector machines (SVMs) are widely applied machine learning algorithms that have many desirable characteristics. However, with the ever-increasing size of datasets, it has become increasingly difficult to use SVMs in modern applications. In this dissertation we address two disadvantages of SVMs. First, non-linear kernel SVM solvers need excessive training times on large datasets. Second, linear SVM solvers need to be parallelized to train on web-scale datasets that are large and high dimensional, yet state-of-the-art linear SVM solvers, though fast, are difficult to parallelize. We propose a modification, called the approximate extreme points support vector machine (AESVM), that is aimed at overcoming these disadvantages. Our approach relies on conducting the SVM optimization over a carefully selected subset of the training dataset, called the representative set. We present analytical results that indicate the similarity of the AESVM and SVM solutions. Linear or log-linear time algorithms based on convex hulls and extreme points are used to compute the representative sets. We also propose an algorithm that post-processes the SVM solution to enable fast classification. A variant of the algorithm is easy to parallelize and is designed to compute efficiently on frameworks such as MapReduce. Extensive computational experiments on thirteen datasets compared our algorithms to other modern SVM solvers. We compared our non-linear AESVM solver to LIBSVM [15], CVM [76], BVM [75], LASVM [7], SVMperf [36], and the random features method [61]. Our AESVM implementation was found to train much faster than the other methods, while its classification accuracy was similar to that of LIBSVM in all cases. In particular, for a seizure detection dataset, AESVM training was almost 500 times faster than LIBSVM and LASVM and 20 times faster than CVM and BVM. AESVM also gave competitively fast classification times.
To evaluate our parallel linear AESVM solver, DRSpl, we compared it against LIBLINEAR [25], LIBOCAS [29], and a bagging-based SVM solver. DRSpl computed accurate results efficiently in comparison to the other SVM solvers. In particular, for a large dataset that was used in the KDD 2010 challenge, DRSpl was more than 60 times faster than LIBLINEAR. DRSpl is designed to use very little communication bandwidth in a parallel computing cluster and is suitable for frameworks like MapReduce. We also demonstrate that our algorithms can be easily modified to perform non-negative matrix factorization (NMF), and we propose approximate extreme points NMF (AENMF) to compute NMF efficiently. Our empirical results indicate that AENMF computed matrix factors several orders of magnitude faster than other NMF solvers. In addition, when combined with another NMF solver, it computed matrix factorizations with very low approximation error.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Manu Nandan.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: KHARGONEKAR,PRAMOD P.
Local:
Co-adviser: TALATHI,SACHIN S.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-12-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0046147:00001