Design and Implementation of MapReduce Systems for Block-Oriented Iterative Scientific Applications

Material Information

Title:
Design and Implementation of MapReduce Systems for Block-Oriented Iterative Scientific Applications
Physical Description:
1 online resource (102 p.)
Language:
English
Creator:
Yang, Xin
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
LI,XIAOLIN
Committee Co-Chair:
FIGUEIREDO,RENATO JANSEN
Committee Members:
RANKA,SANJAY
FORTES,JOSE A
FANG,YUGUANG

Subjects

Subjects / Keywords:
design -- distributed -- implementation -- science -- systems
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Scientific applications are becoming data-intensive, and many high-impact discoveries rely on efficient processing of massive scientific data. Scientists depend heavily on techniques such as MPI and OpenMP to parallelize scientific applications. However, efficient implementations require interdisciplinary knowledge of scientific algorithms, programming models, parallel systems, data management, and so on. Popular MapReduce systems such as Hadoop offer an opportunity to parallelize scientific applications easily. They encapsulate the complexity of parallelism and expose easy-to-use programming APIs for users to implement scientific applications. This dissertation focuses on the design and implementation of MapReduce systems for Block-Oriented Iterative (BOI) applications. As a representative application, the state transition application is analyzed, and the Mammoth system is introduced so that users can program state transition applications easily and run them efficiently. Previous attempts to program and run state transition applications in the MapReduce manner work as follows: map tasks process individual points, and reduce tasks aggregate the overlapping intermediate results. Such an approach is simple but suffers from performance issues. State transition applications generate inflated intermediate data that saturates the network. They also incur substantial synchronization overheads because existing MapReduce systems support only the all-to-all communication pattern. Moreover, computation skews commonly arise from non-uniform distributions of scientific phenomena. The Mammoth system features a MapReduce-style programming model that retains ease of use. Optimizations addressing these performance issues are designed into the underlying runtime to parallelize execution efficiently.
Although the Mammoth system includes optimizations that address computation skews in state transition applications, the following parts discuss this performance issue in BOI applications in more depth. First, a traditional approach of adaptive partitioning is presented, and the Apala (Adaptive Partitioning and load balancing) system is designed. Apala features an application-aware and system-aware design so that it can adapt to various systems while maintaining satisfactory performance. Second, iPart, an autonomic workload management system for taming computation skews in BOI applications, is presented for MapReduce systems. Because a BOI application is typically expressed as a chain of MapReduce jobs, iPart introduces a workload control loop into that chain. Specifically, the partitioning phase generates partitioning plans for distributing workloads equally among reduce tasks. Workload estimates, expressed as execution times, are collected in the reduce phase and fed back to the partitioning phase to update the partitioning plans. As a result, computation skews are addressed by adapting partitioning plans iteratively.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Xin Yang.
Thesis:
Thesis (Ph.D.)--University of Florida, 2013.
Local:
Adviser: LI,XIAOLIN.
Local:
Co-adviser: FIGUEIREDO,RENATO JANSEN.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2014-12-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2013
System ID:
UFE0046154:00001