Techniques and Tools for an Autonomic Approach to Fault and Performance Management in Map-Reduce

Material Information

Title:
Techniques and Tools for an Autonomic Approach to Fault and Performance Management in Map-Reduce
Physical Description:
1 online resource (196 p.)
Language:
English
Creator:
Kadirvel, Selvi
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
FORTES,JOSE A
Committee Co-Chair:
KHARGONEKAR,PRAMOD P
Committee Members:
FIGUEIREDO,RENATO JANSEN
CHEN,SHIGANG

Subjects

Subjects / Keywords:
autonomic-computing -- distributed-systems -- fault-tolerance -- machine-learning -- map-reduce -- performance
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Map-Reduce is a programming paradigm and software implementation for executing data-intensive applications on a cluster of computers. Computational environments that execute Map-Reduce applications, such as data centers, grids and clouds, experience faults and hence performance degradation because of a combination of factors that includes scale, complex interdependencies, use of commodity servers, sharing of resources, heterogeneity and geographical distribution of constituent components. This dissertation proposes techniques and tools that facilitate an autonomic approach to fault and performance management in Map-Reduce systems. Fault-managed Map-Reduce (FMR) handles faults in an online, on-demand and closed-loop manner through the use of performance prediction, anomaly detection, dynamic resource scaling and other built-in features of Hadoop. Performance prediction uses machine-learning-based regression methods that have short prediction computation times and high prediction accuracy. Anomaly detection for proactively identifying a faulty node is performed using a sparse-coding-based technique. FMR reduces runtime penalties as high as 180% to an average of 14%. This dissertation also presents two tools, MRNets and FaultPlay, that facilitate performance and fault management studies in Map-Reduce. Challenges in fault studies on a Map-Reduce platform include lack of access to real-world failure datasets and the inability to precisely evaluate and compare different fault management solutions. This is because the testbed, workload and faultload used in different solutions are difficult to recreate exactly, owing to insufficient information, differences in available resources or the need for many hours of replicated effort. FaultPlay overcomes these challenges through software-defined and easily reproducible fault studies on Map-Reduce platforms.
FaultPlay enables a variety of characterization and management studies by providing modules for job creation, fault injection, distributed monitoring, log parsing and deployment of recovery-based management solutions. For long-running jobs and workloads of multiple Map-Reduce jobs, empirical studies and the generation of representative training data for machine-learning models can become time-consuming. This dissertation presents an orthogonal approach to fast and accurate performance modeling, called MRNets, that is based on Petri nets, a discrete-event modeling methodology. Petri nets provide a formal method, grounded in well-founded mathematical properties, to capture both system structure and behavior. The models are executable and are used to simulate system behavior. MRNets facilitate various performance analyses and provide a key advantage through their graphical representation, which makes it easy to design, use and extend these models for further performance studies.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Selvi Kadirvel.
Thesis:
Thesis (Ph.D.)--University of Florida, 2014.
Local:
Adviser: FORTES,JOSE A.
Local:
Co-adviser: KHARGONEKAR,PRAMOD P.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2015-05-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2014
System ID:
UFE0046167:00001