Citation
- Permanent Link:
- https://ufdc.ufl.edu/UFE0021738/00001
Material Information
- Title:
- Learning-Aided System Performance Modeling in Support of Self-Optimized Resource Scheduling in Distributed Environments
- Creator:
- Zhang, Jian
- Place of Publication:
- [Gainesville, Fla.]
- Publisher:
- University of Florida
- Publication Date:
- 2007
- Language:
- english
- Physical Description:
- 1 online resource (146 p.)
Thesis/Dissertation Information
- Degree:
- Doctorate ( Ph.D.)
- Degree Grantor:
- University of Florida
- Degree Disciplines:
- Electrical and Computer Engineering
- Committee Chair:
- Figueiredo, Renato J.
- Committee Members:
- Fortes, Jose A.
- George, Alan D.
- Ghosh, Malay
- Graduation Date:
- 12/14/2007
Subjects
- Subjects / Keywords:
- Distance functions (jstor)
- Information classification (jstor)
- Learning (jstor)
- Machine learning (jstor)
- Machinery (jstor)
- Modeling (jstor)
- Performance metrics (jstor)
- Principal components analysis (jstor)
- Scheduling (jstor)
- Workloads (jstor)
- Electrical and Computer Engineering -- Dissertations, Academic -- UF
- bayesian, classification, clustering, distributed, knn, learning, pca, performance, prediction, scheduling
- Genre:
- Electronic Thesis or Dissertation
- born-digital (sobekcm)
- Electrical and Computer Engineering thesis, Ph.D.
Notes
- Abstract:
- With the goal of autonomic computing, it is desirable to have a resource scheduler that is capable of self-optimization, meaning that, given a high-level objective, the scheduler can automatically adapt its scheduling decisions to the changing workload. This self-optimization capability poses challenges for system performance modeling because of the increasing size and complexity of computing systems. Our goals were twofold: to design performance models that can derive applications' resource consumption patterns in a systematic way, and to develop performance prediction models that can adapt to changing workloads. A novelty in the system performance model design is the use of various machine learning techniques to efficiently deal with the complexity of dynamic workloads based on monitoring and mining of historical performance data. In the environments considered in this thesis, virtual machines (VMs) are used as resource containers to host application executions because of their flexibility in supporting resource provisioning and load balancing. Our study introduced three performance models to support self-optimized scheduling and decision-making. First, a novel approach is introduced for application classification based on Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. It helps to reduce the dimensionality of the performance feature space and classify applications based on extracted features. In addition, a feature selection model is designed based on a Bayesian Network (BN) to systematically identify the feature subset that provides optimal classification accuracy and adapts to changing workloads. Second, an adaptive system performance prediction model is investigated based on a learning-aided predictor integration technique. Supervised learning techniques are used to learn the correlations between the statistical properties of the workload and the best-suited predictors.
In addition to a one-step-ahead prediction model, a phase characterization model is studied to explore the large-scale behavior of applications' resource consumption patterns. Our study provides novel methodologies to model system and application performance. The performance models can self-optimize over time based on learning from historical runs, and therefore adapt better to the changing workload and achieve better prediction accuracy than traditional methods with static parameters. ( en )
- General Note:
- In the series University of Florida Digital Collections.
- General Note:
- Includes vita.
- Bibliography:
- Includes bibliographical references.
- Source of Description:
- Description based on online resource; title from PDF title page.
- Source of Description:
- This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
- Thesis:
- Thesis (Ph.D.)--University of Florida, 2007.
- Local:
- Adviser: Figueiredo, Renato J.
- Statement of Responsibility:
- by Jian Zhang.
Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- Copyright Zhang, Jian. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Classification:
- LD1780 2007 ( lcc )
Full Text
[Figure: classification system with a training path (sample cases of feature patterns feed a general learning system that produces a classifier model) and a testing path (a case to be classified is given a class assignment by the application-specific classifier)]
Figure 1-2. Classification system representation
During the training phase, labeled sample cases are used to derive the
unknown parameters of the classifier model. During the testing phase, the
customized classifier is used to associate a given pattern of observations with
a particular class.
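The two-phase flow in the caption can be sketched with a minimal nearest-neighbor classifier. This is an illustrative sketch only; the feature vectors and class labels are made up, not taken from the thesis:

```python
import math

def train(samples):
    # Training phase: a 1-NN "model" simply memorizes the labeled sample cases.
    return list(samples)

def classify(model, case):
    # Testing phase: associate the pattern of observations with the class
    # of the closest labeled training case (Euclidean distance).
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(model, key=lambda s: dist(s[0], case))[1]

# Labeled sample cases: (pattern of feature values, class label)
model = train([((0.9, 0.1), "cpu-intensive"), ((0.1, 0.9), "io-intensive")])
print(classify(model, (0.8, 0.2)))  # prints "cpu-intensive"
```

With k = 1 the "unknown parameters" are just the stored cases themselves; richer models (k-NN with k > 1, Bayesian classifiers) fit more structure during training but keep the same train/test split.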
formulated as a finite-state Markov decision process (MDP), and reinforcement learning
algorithms for this context are highly related to dynamic programming techniques.
1.3.4 Other Learning Paradigms
In addition to the above three traditional learning methods, there are some other
learning paradigms:
Relational Learning / Structured Prediction: it predicts structure over sets of objects.
For example, a model trained on genome/proteome data with known relationships can
predict graph structure for new sets of genomes/proteomes.
[Figure: normalized error of each predictor (P-LARP, Knn-LARP, Bayes-LARP, Cum.MSE, W-Cum.MSE) across performance metric IDs 1-12]
Figure 4-10. Predictor performance comparison (VM2)
1 CPU_used_sec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write
The skeleton-based performance prediction work introduced in [48] uses a synthetic
skeleton program to reproduce the CPU utilization and communication behaviors of
message passing parallel programs to predict application performance. In contrast, the
application classifier provides application behavior learning in more dimensions.
Prophesy [49] employs a performance-modeling component, which uses coupling
parameters to quantify the interactions between kernels that compose an application.
However, to be able to collect data at the level of basic blocks, procedures, and loops,
it requires insertion of instrumentation code into the application source code. In
contrast, the classification approach uses the system performance data collected from
the application host to infer the application resource consumption pattern. It does not
require the modification of the application source code.
Statistical clustering techniques have been applied to learn application behavior
at various levels. Nickolayev et al. applied clustering techniques to efficiently reduce
the processor event trace data volume in cluster environments [50]. Ahn and Vetter
conducted application performance analysis by using clustering techniques to identify the
representative performance counter metrics [51]. Both Cohen and Chase [52] and our
work perform statistical clustering using system-level metrics. However, their work focuses
on system performance anomaly detection, whereas our work focuses on application
classification for resource scheduling.
Our work can be used to learn the resource consumption patterns of parallel
application's child process and multi-stage application's sub-stage. However, in this
study we focus on sequential and single-stage applications.
2.6 Conclusion
The application classification prototype presented in this chapter shows how to apply
the Principal Component Analysis and K-Nearest Neighbor techniques to reduce the
dimensions of application resource consumption feature space and assist the resource
scheduling. In addition to the CPU load, it also takes the I/O, network, and memory
Figure 2-3. Application classification model
The Performance profiler collects performance metrics of the target
application node. The Classification center classifies the application using
extracted key components and performs statistical analysis of the classification
results. The Application DB stores the application class information. (m is the
number of snapshots taken in one application run, t0/t1 are the beginning and
ending times of the application execution, VM IP is the IP address of the
application's host machine).
system is used to sample the system performance of a computing node running an
application of interest.
2.3.1 Performance Profiler
The performance profiler is responsible for collecting performance data of the
application node. It interfaces with the resource manager to receive data collection
instructions, including the target node and when to start and stop.
where C1, C2, and C3 denote the unit cost per resource usage, switching, and penalty,
respectively. Therefore, k_opt is derived as

    k_opt = arg min_{1<=k<=K} TC(k)
          = arg min_{1<=k<=K} [R(k) + C x TR(k) + C_P x P(k)]        (5-10)

where C is the transition factor, C_P denotes the discount factor for the misprediction
penalty, which is the ratio of C3 to C1, and K is the maximum number of phases.
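The total cost in Equation 5-10 can be evaluated directly for each candidate k. The sketch below uses hypothetical cost curves (the R, TR, P values and the C, C_P settings are made up for illustration, not taken from the experiments):

```python
def total_cost(R, TR, P, C, CP):
    # TC(k) = R(k) + C*TR(k) + CP*P(k), evaluated for every candidate k
    return {k: R[k] + C * TR[k] + CP * P[k] for k in R}

# Hypothetical cost curves: resource cost R falls as phases are added,
# while transition cost TR and misprediction penalty P grow.
R  = {1: 100.0, 2: 70.0, 3: 60.0, 4: 58.0}
TR = {1: 0.0,   2: 2.0,  3: 4.0,  4: 7.0}
P  = {1: 0.0,   2: 1.0,  3: 2.0,  4: 4.0}

TC = total_cost(R, TR, P, C=2.0, CP=3.0)
k_opt = min(TC, key=TC.get)   # the k with the lowest total cost
```

Here TC = {1: 100.0, 2: 77.0, 3: 74.0, 4: 84.0}, so k_opt = 3: adding a third phase still reduces the resource cost enough to pay for the extra transitions and mispredictions, while a fourth does not.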
5.4 Phase Prediction
This section describes the work flow of the application resource demand phase
prediction illustrated in Figure 5-3. The prediction consists of two stages: a training
stage and a testing stage. During the training stage, the number of the clusters in
the application resource usage, the corresponding cluster centroids, and the unknown
parameters of the time series prediction model of the resource usage are determined.
During the testing stage, the one-step ahead resource usage is predicted and classified as
one of the clusters.
Both stages start from pattern representation and framing. In the step of pattern
representation, the collected performance data of the application VM are profiled to
extract only the features which will be used for clustering and future resource provisioning.
For example, in the one-dimensional case discussed in this thesis, the training data of a
specific performance feature (X_{u x 1}, see Table 5-1) are extracted, where u is the total
number of input data points. Then the extracted performance data X_{u x 1} are framed with the
prediction window size m to form the data matrix X'_{(u-m+1) x m}.
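The framing step above can be sketched in a few lines (the sample series is illustrative):

```python
def frame(series, m):
    # Frame a length-u series into overlapping windows of size m,
    # producing a (u - m + 1) x m matrix X'.
    return [series[i:i + m] for i in range(len(series) - m + 1)]

X = [1, 2, 3, 4, 5]    # u = 5 samples of one performance feature
Xp = frame(X, m=3)     # (5 - 3 + 1) = 3 rows of length 3
```

Each row of X' holds one prediction window, so consecutive rows overlap by m - 1 samples.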
The training stage mainly consists of two processes: prediction model fitting and
phase behavior analysis. The algorithms defined in Section 5.3.3 and 5.3.4 are used to
find out the number of phases k, which gives the lowest total resource provisioning cost.
The output phase profile is used to train the phase predictor. In addition, the unknown
parameters of the resource predictor are estimated from the training data. In this thesis,
Initially the performance profiler collected data of all the thirty-three (n = 33)
performance metrics once every five seconds (d = 5) during the application execution.
Then the data preprocessor extracted the data of the eight (p = 8) metrics listed in
Table 2-1 based on the expert knowledge of the correlation between these metrics and the
application classes. After that, the PCA processor conducted the linear transformation of
the performance data and selected principal components based on the minimal fraction
variance defined. In this experiment, the variance contribution threshold was set to extract
two (q = 2) principal components. This helps to reduce the computational requirements of
the classifier. Then, the trained 3-NN classifier conducted classification based on the data of
the two principal components.
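The pipeline above (variance-threshold PCA followed by a k-NN vote) can be sketched as follows. The data values, labels, and the 90% threshold are illustrative, not the experiment's actual metrics:

```python
import numpy as np

def pca_reduce(data, var_threshold=0.9):
    # Project (snapshots x metrics) data onto the fewest principal
    # components whose cumulative variance fraction exceeds the threshold.
    centered = data - data.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(centered, rowvar=False))
    order = np.argsort(vals)[::-1]              # eigenvalues, descending
    vals, vecs = vals[order], vecs[:, order]
    frac = np.cumsum(vals) / vals.sum()
    q = int(np.searchsorted(frac, var_threshold)) + 1
    return centered @ vecs[:, :q], q

def knn_classify(train_x, train_y, x, k=3):
    # Majority vote among the k nearest training snapshots.
    dists = np.linalg.norm(train_x - x, axis=1)
    nearest = [train_y[i] for i in np.argsort(dists)[:k]]
    return max(set(nearest), key=nearest.count)

# Four snapshots of two metrics; the first metric dominates the variance,
# so a 90% threshold keeps a single principal component (q = 1).
data = np.array([[10.0, 0.1], [20.0, 0.2], [30.0, 0.15], [40.0, 0.05]])
reduced, q = pca_reduce(data)
```

Reducing to q components before the k-NN step is what cuts the classifier's per-snapshot distance computations from p-dimensional to q-dimensional.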
The training data's class clustering diagram is shown in Figure 2-5 (a). The diagram
shows a PCA-based two-dimensional representation of the data corresponding to the five
classes targeted by our system. After being trained with the training data, the classifier
classifies the remaining benchmark programs shown in Table 2-2. The classifier provides
outputs in two kinds of formats: the application class-clustering diagram, which helps to
visualize the classification results, and the application class composition, which can be
used to calculate the unit application cost.
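The class-composition output is just the fraction of snapshots assigned to each class. A minimal sketch (the snapshot labels are made up):

```python
from collections import Counter

def class_composition(snapshot_classes):
    # Fraction of the application's snapshots assigned to each class;
    # these fractions feed the unit application cost calculation.
    counts = Counter(snapshot_classes)
    return {cls: n / len(snapshot_classes) for cls, n in counts.items()}

labels = ["idle"] * 2 + ["io"] * 3 + ["network"] * 5   # 10 classified snapshots
comp = class_composition(labels)   # {'idle': 0.2, 'io': 0.3, 'network': 0.5}
```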
Figure 2-5 shows the sample clustering diagrams for three test applications. For
example, the interactive VMD application (Figure 2-5(d)) shows a mix of the idle class
when user is not interacting with the application, the I/O-intensive class when the user
is uploading an input file, and the Network-intensive class while the user is interacting
with the GUI through a VNC remote display. Table 2-3 summarizes the class compositions
of all the test applications. Figure 2-6 visualizes the class composition of some sample
benchmark programs. These classification results match the class expectations gained from
empirical experience with these programs. They are used to calculate the unit application
cost shown in section 4.4.
ratio schedule for the eight performance features

Performance     Number of phases (k)
Features        1     2     3     4     5     6     7     8     9     10
CPU_user        1.00  0.80  0.75  0.75  0.75  0.77  0.78  0.78  0.80  0.83
CPU_system      1.00  0.67  0.66  0.65  0.64  0.66  0.67  0.69  0.70  0.71
Bytes_in        1.00  0.97  0.96  0.96  0.96  0.96  0.96  0.95  0.95  0.95
Bytes_out       1.00  0.95  0.90  0.88  0.90  0.87  0.87  0.87  0.87  0.87
IO_BI           1.00  0.57  0.52  0.55  0.56  0.58  0.62  0.63  0.62  0.64
IO_BO           1.00  0.57  0.53  0.55  0.57  0.61  0.60  0.61  0.64  0.63
Swap_in         1.00  0.54  0.55  0.59  0.59  0.60  0.61  0.63  0.64  0.65
Swap_out        1.00  0.51  0.47  0.49  0.54  0.55  0.57  0.58  0.59  0.61

(Total cost ratio rho = TC(k)/TC(1), where C = 52 and C_P = 8)
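Reading a row of the table, the optimal phase count is the k that minimizes rho(k). A small helper (the row values are copied from the CPU_user row above):

```python
def best_phase_count(ratios):
    # ratios maps k -> rho(k) = TC(k)/TC(1); return the smallest k
    # attaining the minimum total cost ratio.
    best = min(ratios.values())
    return min(k for k, r in ratios.items() if r == best)

# CPU_user row of the total cost ratio table
cpu_user = {1: 1.00, 2: 0.80, 3: 0.75, 4: 0.75, 5: 0.75,
            6: 0.77, 7: 0.78, 8: 0.78, 9: 0.80, 10: 0.83}
```

For this row the minimum ratio 0.75 first occurs at k = 3, after which adding phases stops paying off, which matches the shape of the other rows in the table.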
5.5.1.2 World Cup web log replay
In this experiment, phase characterization was performed for the performance data
collected from a network-intensive application, a 1998 World Cup web access log replay.
The workload used in this experiment was based on the 1998 World Cup trace [98].
The openly available trace containing a log of requests to Web servers was used as an
input to a client replay tool, which enabled us to exercise a realistic Web-based workload
and collect system-level performance metrics using Ganglia in the same manner that was
done for the SPECseis96 workload. For this study, we chose to replay the five hour (from
22:00:01 Jun.23 to 3:11:20 Jun.24) log of the least loaded server (serverlD 101), which
contained 130,000 web requests.
The phase analysis and prediction techniques can be used to characterize performance
data collected from not only virtual machines but also physical machines. During the
experiment, a physical server with sixteen Intel(R) Xeon(TM) MP 3.00GHz CPUs
and 32GB memory was used to execute the replay clients to submit requests based on
submission intervals, HTTP protocol types (1.0 or 1.1), and document sizes defined in
the log file. A physical machine with Intel(R) Pentium(R) 4 1.70GHz CPU and 512MB
memory was used to host the Apache web server and a set of files which were created
based on the file sizes described in the log.
Table 5-2. SPECseis96 total cost
[Figure 5-4 panels: total cost vs. number of phases]
Figure 5-4. Continued
outputs class composition, which can be used to support application cost models (Section
4.4). The post-processed classification results, together with the corresponding execution
times (t0, t1), are stored in the application database and can be used to assist future
resource scheduling.
2.4 Experimental Results
We have implemented a prototype for application classification including a Perl
implementation of the performance profiler and a Matlab implementation of the
classification center. In addition, Ganglia was used to monitor the working status of
the virtual machines. This section evaluates our approach from the following three aspects:
the classification ability, the scheduling decision improvement and the classification cost.
2.4.1 Classification Ability
The application class set in this experiment has four classes: CPU-intensive, I/O and
paging-intensive, network-intensive, and idle. Application of I/O and paging-intensive
class can be further divided into two groups based on whether they have or do not have
substantial memory intensive activities. Various synthetic and benchmark programs,
scientific computing applications and user interactive applications are used to test
the classification ability. These programs represent typical application behaviors of
their classes. Table 2-2 summarizes the set of applications used as the training and the
testing applications in the experiments [28-38]. The 3-NN classifier was trained with the
performance data collected from the executions of the training applications highlighted in
the table. All the application executions were hosted by a VMware GSX virtual machine
(VM1). The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU
1.80GHz machine with 512KB cache and 1GB RAM. In addition, a second virtual
machine with the same specification was used to run the server applications of the network
benchmarks.
to measure the dissimilarity between two patterns. It works well when a data set has
"compact" or "isolated" clusters. In the case of clustering in a multi-dimensional space,
normalization of the continuous features can be used to remove the tendency of the
largest-scaled feature to dominate the others. In addition, the Mahalanobis distance can be
used to remove the distortion caused by linear correlation among features, as discussed
in Chapter 3.
(3) Clustering or grouping: The clustering can be performed in a number of ways [97].
The output clustering can be hard (a partition of the data into groups) or fuzzy (where
each pattern has a variable degree of membership in each of the output clusters). A hard
clustering can be obtained from a fuzzy partition by thresholding the membership values.
In this work, one of the most popular iterative clustering methods, k-means algorithm as
detailed in Section 5.3.3, is used.
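The k-means iteration can be sketched compactly. This is an illustrative implementation with made-up one-dimensional sample data, not the exact procedure of Section 5.3.3:

```python
import numpy as np

def kmeans(points, k, iters=20, seed=0):
    # Standard k-means: alternately assign each point to its nearest
    # centroid, then recompute each centroid as the mean of its partition.
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(iters):
        # pairwise distances, shape (n_points, k)
        d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):                  # keep empty clusters as-is
                centroids[j] = points[labels == j].mean(axis=0)
    return labels, centroids

# Two well-separated groups of one-dimensional resource usage samples
pts = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
labels, centroids = kmeans(pts, k=2)
```

Each resulting hard partition corresponds to one phase of resource usage, with the centroid serving as that phase's representative resource demand.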
5.3.2 Definitions and Notation
In this chapter, we follow the terms and notation defined in [97]:
A pattern(or feature vector or observation) is a single data item used by the
clustering algorithm. It typically consists of a vector of d measurements.
The individual scalar components of a pattern are called features (or attributes).
d is the dimensionality of the pattern or of the pattern space.
A class refers to a state of nature that governs the pattern generation process
in some cases. More concretely, a class can be viewed as a source of patterns whose
distribution in feature space is governed by a probability density specific to the class.
Clustering techniques attempt to group patterns so that the classes thereby obtained
reflect the different pattern generation processes represented in the pattern set.
A distance measure is a metric on the feature space used to quantify the similarity of
patterns.
5 APPLICATION RESOURCE DEMAND PHASE ANALYSIS AND PREDICTIONS ... 106
   5.1 Introduction ... 106
   5.2 Application Resource Demand Phase Analysis and Prediction Prototype ... 108
   5.3 Data Clustering ... 111
       5.3.1 Stages in Clustering ... 111
       5.3.2 Definitions and Notation ... 112
       5.3.3 k-means Clustering ... 113
       5.3.4 Finding the Optimal Number of Clusters ... 114
   5.4 Phase Prediction ... 117
   5.5 Empirical Evaluation ... 118
       5.5.1 Phase Behavior Analysis ... 119
           5.5.1.1 SPECseis96 benchmark ... 119
           5.5.1.2 World Cup web log replay ... 122
       5.5.2 Phase Prediction Accuracy ... 123
       5.5.3 Discussion ... 125
   5.6 Related Work ... 126
   5.7 Conclusion ... 128
6 CONCLUSION ... 135
REFERENCES ... 137
BIOGRAPHICAL SKETCH ... 146
[Figure panels A-C: number of phases (2-10) on the x-axis]
Figure 5-5. Phase analysis of WorldCup'98 Bytes_in. A) Phase transitions B) Misprediction
penalties C) Total cost with penalty (C_P = 8)
experiments show that the proposed scheme can effectively select a performance metric
subset providing above 91% classification accuracy for a set of benchmark applications.
In addition to the application resource demand modeling, Chapter 4 proposes a
learning-based adaptive predictor, which can be used to predict resource availability. It
uses the k-NN classifier and PCA to learn the relationship between workload characteristic
and suited predictor based on historical predictions, and to forecast the best predictor
for the workload under study. Then, only the selected best predictor is run to predict the
next value of the performance metric, instead of running multiple predictors in parallel
to identify the best one. The experimental results show that this learning-aided adaptive
resource predictor can often outperform the single best predictor in the pool without a
priori knowledge of which model best fits the data.
The application classification and the feature selection techniques can be used
to define the application resource consumption patterns at any given moment. The
experimental results of the application classification suggest that allocating applications
which have complementary resource consumption patterns to the same server can improve
the system throughput.
In addition to one-step-ahead performance prediction, Chapter 5 studied the
large-scale behavior of application resource consumption. Clustering-based algorithms have
been explored to provide a mechanism to define and predict the phase behavior of the
application resource usage to support on-demand resource allocation. The experimental
results show that an average of above 91% phase prediction accuracy can be achieved
for the four-phase cases of the benchmark workloads.
TABLE OF CONTENTS
                                                                          page
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
CHAPTER
1 INTRODUCTION
   1.1 Resource Performance Modeling
   1.2 Autonomic Computing
   1.3 Learning
       1.3.1 Supervised Learning
       1.3.2 Unsupervised Learning
       1.3.3 Reinforcement Learning
       1.3.4 Other Learning Paradigms
   1.4 Virtual Machines
       1.4.1 Virtual Machine Characteristics
       1.4.2 Virtual Machine Plant
2 APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF RESOURCE CONSUMPTION PATTERNS
   2.1 Introduction
   2.2 Classification Algorithms
       2.2.1 Principal Component Analysis
       2.2.2 k-Nearest Neighbor Algorithm
   2.3 Application Classification Framework
       2.3.1 Performance Profiler
       2.3.2 Classification Center
           2.3.2.1 Data preprocessing based on expert knowledge
           2.3.2.2 Feature selection based on principal component analysis
           2.3.2.3 Training and classification
       2.3.3 Post Processing and Application Database
   2.4 Experimental Results
       2.4.1 Classification Ability
       2.4.2 Scheduling Performance Improvement
       2.4.3 Classification Cost
   2.5 Related Work
   2.6 Conclusion
5.3 Data Clustering
Clustering is an important data mining technique for discovering patterns in the
data. It has been used effectively in many disciplines such as pattern recognition, biology,
geology, and marketing.
At a high level, the problem of clustering is defined as follows: Given a set U of
n samples u1, u2, ..., un, we would like to partition U into k subsets U1, U2, ..., Uk
such that the samples assigned to each subset are more similar to each other than the
samples assigned to different subsets. Here, we assume that two samples are similar if they
correspond to the same phase.
5.3.1 Stages in Clustering
A typical pattern clustering activity involves the following steps [97]:
(1) Pattern representation, which is used to obtain an appropriate set of features to
use in clustering. It optionally consists of feature extraction and/or selection. Feature
selection is the process of identifying the most effective subset of the original features to
use in clustering. Feature extraction is the use of one or more transformations of the input
features to produce new salient features.
In the context of resource demand phase analysis, the features under study are the
system level resource performance metrics as shown in Table 5-1. For one-dimensional
clustering, which is the case in this work, the feature selection is as simple as choosing
the performance metric which is instructive to the allocation of the corresponding system
resource. For clustering based on multiple performance metrics, feature extraction
techniques such as Principal Component Analysis (PCA) may be used to transform the
input performance metrics to a lower dimension space to reduce the computing intensity of
subsequent clustering and improve the clustering quality.
(2) Definition of a pattern proximity measure appropriate to the data domain. The
pattern proximity is usually measured by a distance function defined on pairs of patterns.
In this work, the most popular metric for continuous features, Euclidean distance is used
3.6 Conclusion
The autonomic feature selection prototype presented in this chapter shows how
to apply statistical analysis techniques to support online application classification. We
envision that this classification approach can be used to provide first-order analysis of
the dominant resource consumption patterns of an application. This chapter shows that
autonomic feature selection enables classification without requiring expert knowledge in
the selection of relevant low-level performance metrics.
activities into account for the resource scheduling in an effective way. It does not require
modifications of the application source code. Experiments with various benchmark
applications suggest that with the application class knowledge, a scheduler can improve
the system throughput by 22.11% on average by allocating the applications of different classes
to share the system resources.
In this work, the input performance metrics are selected manually based on expert
knowledge. In the next chapter, the techniques for automatically selecting features for
application classification are discussed.
is capable of cloning an application-specific virtual machine and configuring it with an
appropriate execution environment. In the context of VMPlant, the application can be
scheduled to run on a dedicated virtual machine, which is hosted by a shared physical
machine. Within the VM, system performance metrics such as CPU load, memory usage,
I/O activity and network bandwidth utilization, reflect the application's resource usage.
The classification system described in this chapter leverages the capability of
summarizing application performance data by collecting system-level data within a
VM, as follows. During the application execution, snapshots of performance metrics are
taken at a desired frequency. A PCA processor analyzes the performance snapshots and
extracts the key components of the application's resource usage. Based on the extracted
features, a k-NN classifier categorizes each snapshot into one of the following classes:
CPU-intensive, IO-intensive, memory-intensive, network-intensive and idle.
By using this system, resource scheduling can be based on a comprehensive diagnosis
of the application resource utilization, which conveys more information than CPU load
in isolation. Experiments reported in this chapter show that the resource scheduling
facilitated with application class composition knowledge can achieve better average system
throughput than scheduling without the knowledge.
The rest of the chapter is organized as follows: Section 2.2 introduces the PCA and
the k-NN classifier in the context of application classification. Section 2.3 presents the
classification model and implementation. Section 2.4 presents and discusses experimental
results of classification performance measurements. Section 2.5 discusses related work.
Conclusions and future work are discussed in Section 2.6.
2.2 Classification Algorithms
Application behavior can be defined by its resource utilization, such as CPU load,
memory usage, network and disk bandwidth utilization. In principle, the more information
a scheduler knows about an application, the better scheduling decisions it can make.
However, there is a tradeoff between the complexity of decision-making process and the
In addition, the experimental data also demonstrate the impact of changing execution
environment configurations on the application's class composition. For example, in
Table 2-3 when SPECseis96 with medium size input data was executed in VM1 with
256MB memory (SPECseis96_A), it was classified as a CPU-intensive application. In the
SPECseis96_B experiment, the smaller physical memory (32MB) resulted in increased
paging and I/O activity. The increased I/O activity is due to the fact that less physical
memory is available to the O/S buffer cache for I/O blocks. The buffer cache size at run
time was observed to be as small as 1MB in SPECseis96_B, and as large as 200MB in
SPECseis96_A. In addition, the execution time increased from 291 minutes 42
seconds in the first case to 426 minutes 58 seconds in the second case.
Similarly, in the experiments with PostMark, different execution environment
configurations changed the application's resource consumption pattern from one class to
another. Table 2-3 shows that if a local file directory was used to store the files to be read
and written during the program execution, the PostMark benchmark showed the resource
consumption pattern of the I/O-intensive class. In contrast, with an NFS mounted file
directory, it (PostMark_NFS) was turned into a Network-intensive application.
2.4.2 Scheduling Performance Improvement
Two sets of experiments are used to illustrate the performance improvement that a
scheduler can achieve with the knowledge of application class. These experiments were
performed on 4 VMware GSX 2.5 virtual machines with 256MB memory each. One of
these virtual machines (VM1) was hosted on an Intel(R) Xeon(TM) dual-CPU 1.80GHz
machine with 512KB cache and 1GB RAM. The other three (VM2, VM3, and VM4) were
hosted on an Intel(R) Xeon(TM) dual-CPU 2.40GHz machine with 512KB cache and 4GB
RAM. The host servers were connected by Gigabit Ethernet.
The first set of experiments demonstrates that the application class information can
help the scheduler to optimize resource sharing among applications running in parallel to
improve system throughput and reduce throughput variances. In the experiments, three
Table 3-2. Sample performance metrics in the original feature set

Performance Metrics        Description
cpu_system / user / idle   Percent CPU system / user / idle
cpu_nice                   Percent CPU nice
bytes_in / out             Number of bytes per second into / out of the network
io_bi / bo                 Blocks sent to / received from a block device (blocks/s)
swap_in / out              Amount of memory swapped in / out from / to disk (kB/s)
pkts_in / out              Packets in / out per second
proc_run                   Total number of running processes
load_one / five / fifteen  One / five / fifteen minute load average
the class whose centroid has the smallest Mahalanobis distance min(d1, . . ., d5) to the
snapshot. Automated and adaptive threshold setting is discussed in detail in [67].
In our implementation, Ganglia is used as the monitoring tool and twenty (m = 20)
performance metrics, which are related to resource usage, are included in the training
data. These performance metrics include 16 out of 33 default metrics monitored by
Ganglia and the 4 metrics that we added based on the need of classification. The four
metrics include the number of I/O blocks read from/written to disk, and the number of
memory pages swapped in/out. A program was developed to collect these four metrics
(using vmstat) and added them to the metric list of Ganglia's monitoring daemon gmond.
Table 3-2 shows some sample performance metrics of the training candidate.
The first round of quality assurance was performed by a human expert at initialization.
The subsequent assurance can be conducted automatically by following the above steps to
select representative training data for each class.
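The qualification step above can be sketched as follows. This is a minimal illustration with synthetic data and a plain Euclidean distance to the class centroid under an arbitrary threshold; the text itself uses the Mahalanobis distance with automated, adaptive thresholds [67].

```python
import numpy as np

# Sketch of the automated quality-assurance step: a new training candidate
# is kept for a class only if its distance to the class centroid falls
# below a threshold. Euclidean distance and the 0.5 threshold are
# illustrative stand-ins for the Mahalanobis distance and adaptive
# thresholds described in the text.
def qualify(candidate, class_samples, threshold=0.5):
    centroid = class_samples.mean(axis=0)   # average of existing samples
    return np.linalg.norm(candidate - centroid) < threshold

cpu_class = np.array([[0.9, 0.1], [0.8, 0.2], [0.95, 0.05]])  # toy snapshots
print(qualify(np.array([0.85, 0.15]), cpu_class))  # -> True
print(qualify(np.array([0.1, 0.9]), cpu_class))    # -> False
```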
3.3.2 Feature Selector
The feature selector is responsible for selecting, from the numerous performance
metrics, the features that are correlated with the application's resource consumption
pattern. The set of potential observations relevant to a particular problem are called
features, which also go by other names, including attributes and variables. Only correctly
solved cases will be used in building the specific classifier, which is called the training
phase of the classification. The pattern of feature values for each case is associated with
the correct classification or decision to form the sample cases, a set which is also called
the training data. Thus, learning in any of these systems can be viewed as a process of
generalizing these observed empirical associations subject to the constraints imposed by
the chosen classifier model. During the testing phase, the customized classifier is used
to associate a specific pattern of observations with a specific class. The learning method
introduced above is a form of supervised learning, which learns by being presented with
preclassified training data.
1.3.2 Unsupervised Learning
Unsupervised learning methods can learn without any human intervention. This
method is particularly useful in situations where data need to be classified or clustered
into a set of classifications but where the classifications are not known in advance. In
other words, it fits the model to observations. It differs from supervised learning by the
fact that there is no a priori output.
1.3.3 Reinforcement Learning
Reinforcement learning refers to a class of problems in machine learning which
postulate an agent exploring an environment in which the agent perceives its current
state and takes actions. A system that uses reinforcement learning is given a positive
reinforcement when it performs correctly and a negative reinforcement when it performs
incorrectly. However, the information of why and how the learning system performed
correctly is not provided to it.
Reinforcement learning algorithms attempt to find a policy for maximizing cumulative
reward for the agent over the course of the problem. The environment is typically
Figure 2-1. Sample of principal component analysis (scatter plot of two-dimensional
data, Dimension 1 vs. Dimension 2, with the principal component direction overlaid)
and the covariance matrix of the same data set is

C_x = E[(x - mu_x)(x - mu_x)^T]   (2-3)

The components of C_x, denoted by c_ij, represent the covariances between the random
variable components x_i and x_j. The component c_ii is the variance of the component x_i.
From a sample of vectors x_1, . . ., x_M, we can calculate the sample mean and the
sample covariance matrix as the estimates of the mean and the covariance matrix.
The eigenvectors e_i and the corresponding eigenvalues lambda_i can be obtained by
solving the equation

C_x e_i = lambda_i e_i,   i = 1, . . ., n   (2-4)
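Equations 2-3 and 2-4 can be carried out numerically as follows. This is a sketch with synthetic data, not the dissertation's implementation; the sample sizes and dimensions are arbitrary.

```python
import numpy as np

# Estimate the covariance matrix C_x from samples x_1..x_M (Eq. 2-3) and
# solve the eigenproblem C_x e_i = lambda_i e_i (Eq. 2-4). eigh is used
# because the covariance matrix is symmetric.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))          # M=200 samples, n=6 metrics
mean = X.mean(axis=0)                  # sample mean (estimate of mu_x)
Xc = X - mean                          # zero-mean the data
C = (Xc.T @ Xc) / (len(X) - 1)         # sample covariance matrix C_x

eigvals, eigvecs = np.linalg.eigh(C)   # eigenvalues/eigenvectors of C_x
order = np.argsort(eigvals)[::-1]      # sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print(eigvals.shape)                   # -> (6,)
```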
VM: Virtual Machine; VMM: Virtual Machine Monitor; DB: Database;
QA: Quality Assuror; m: prediction window size; j: quality assurance window size;
ts/te: starting / ending time stamps
Figure 4-1. Virtual machine resource usage prediction prototype
The monitor agent, which is installed in the Virtual Machine Monitor (VMM),
collects the VM resource performance data and stores them in the round robin
VM Performance Database. The profiler extracts the performance data of a
given time frame for the VM indicated by VMID and deviceID. The
LARPredictor selects the best prediction model based on learning of historical
predictions, predicts the resource performance for time t+1, and stores the
prediction results in the prediction database. The prediction results can be
used to support the resource manager in performing dynamic VM resource
allocation. The Performance Quality Assuror (QA) audits the LARPredictor's
performance and orders re-training for the predictor if the performance drops
below a predefined threshold.
Our virtual machine resource prediction prototype, illustrated in Figure 4-1, models
how the VM performance data are collected and used to predict the value for future time
to support resource allocation decision-making.
A performance monitoring agent is installed in the Virtual Machine Monitor (VMM)
to collect the performance data of the guest VMs. In our implementation, VMware's ESX
virtual machines are used to host the application execution and the monitoring tool [85]
of ESX is used to monitor and collect the performance data of the VM guests and host
4-3 Learning-aided adaptive resource predictor workflow
4-4 Learning-aided adaptive resource predictor dataflow
4-5 Best predictor selection for trace VM2_load15
4-6 Best predictor selection for trace VM2_PktIn
4-7 Best predictor selection for trace VM2_Swap
4-8 Best predictor selection for trace VM2_Disk
4-9 Predictor performance comparison (VM1)
4-10 Predictor performance comparison (VM2)
4-11 Predictor performance comparison (VM3)
4-12 Predictor performance comparison (VM4)
4-13 Predictor performance comparison (VM5)
5-1 Application resource demand phase analysis and prediction prototype
5-2 Resource allocation strategy comparison
5-3 Application resource demand phase prediction workflow
5-4 Phase analysis of SPECseis96 CPU_user
5-5 Phase analysis of WorldCup'98 BytesIn
5-6 Phase analysis of WorldCup'98 BytesOut
Table 3-4. Performance metric correlation matrixes of test applications. A) Correlation
matrix of SPECseis96 performance data. B) Correlation matrix of PostMark
performance data. C) Correlation matrix of NetPIPE performance data.
Metric 1     2     3     4     5     6
1    1.00 -0.21 -0.34  0.74  0.20 -0.02
2   -0.21  1.00 -0.16 -0.02 -0.17 -0.06
3   -0.34 -0.16  1.00 -0.60  0.20 -0.05
4    0.74 -0.02 -0.60  1.00 -0.19  0.04
5    0.20 -0.17  0.20 -0.19  1.00  0.12
6   -0.02 -0.06 -0.05  0.04  0.12  1.00
A
Metric 1     2     3     4     5     6
1    1.00 -0.24  0.22  0.34 -0.08 -0.13
2   -0.24  1.00 -0.22  0.18  0.04 -0.02
3    0.22 -0.22  1.00  0.33  0.30  0.18
4    0.34  0.18  0.33  1.00  0.42  0.47
5   -0.08  0.04  0.30  0.42  1.00  0.20
6   -0.13 -0.02  0.18  0.47  0.20  1.00
B
Metric 1     2     3     4     5     6
1    1.00  0.29  0.31  0.48  0.27  0.30
2    0.29  1.00  0.49  0.39  0.75  0.95
3    0.31  0.49  1.00  0.50  0.59  0.52
4    0.48  0.39  0.50  1.00  0.42  0.39
5    0.28  0.75  0.59  0.42  1.00  0.75
6    0.30  0.95  0.52  0.39  0.75  1.00
C
The six metrics are load_five, load_fifteen, pkts_in, pkts_out, cpu_system,
and bytes_out. Correlations larger than 0.5 are highlighted in bold.
data were classified as network-intensive. The results matched our empirical
experience with these programs and are close to the results of the expert-selected-feature
based classification, which shows 85% CPU-intensive for SPECseis96, 97% I/O-intensive
for PostMark, and predominantly network-intensive for PostMark_NFS.
Figure 1-1. Structure of an autonomic element.
the application resource performance modeling to support self-configuration and
self-optimization of application execution environments.
Generally, an autonomic system is an interactive collection of autonomic elements:
individual system constituents that contain resources and deliver services to humans and
other autonomic elements. As Figure 1-1 shows, an autonomic element will typically
consist of one or more managed elements coupled with a single autonomic manager that
controls and represents them. The managed element could be a hardware resource, such
as storage or a CPU, or a software resource, such as a database, a directory service, or
a large legacy system [1]. The monitoring process collects the performance data of the
learned. In this work, the Bayesian Network with a tree structure and full observability
is assumed. Figure 3-1 gives a sample BN learned in the experiment. The root is the
application class decision node, which is used to decide an application class given the value
of the leaf nodes. The root node is the parent of all other nodes. The leaf nodes represent
selected performance metrics, such as network packets sent and bytes written to disk.
They are connected one to another in a series.
3.2.3 Mahalanobis Distance
The Mahalanobis distance is a measure of distance between two points in the
multidimensional space defined by multidimensional correlated variables [22] [65]. For
example, if x_1 and x_2 are two points from the distribution which is characterized by
covariance matrix Sigma, then the quantity

d(x_1, x_2) = sqrt((x_1 - x_2)^T Sigma^-1 (x_1 - x_2))   (3-3)

is called the Mahalanobis distance from x_1 to x_2, where T denotes the transpose of a
matrix.
In the cases where there are correlations between variables, simple Euclidean distance
is not an appropriate measure, whereas the Mahalanobis distance can adequately account
for the correlations and is scale-invariant. Statistical analysis of the performance data
in Section 3.4.3 shows that there are correlations between the application performance
metrics with various degrees. Therefore, Mahalanobis distance between the unlabeled
performance sample and the class centroid, which represents the average of all existing
training data of the class, is used in the training data qualification process in Section 3.3.1.
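A minimal sketch of Eq. 3-3 with synthetic data; the training set, centroid, and covariance matrix below stand in for a class's training statistics.

```python
import numpy as np

# Mahalanobis distance (Eq. 3-3) between an unlabeled performance sample
# and a class centroid, using the inverse covariance matrix estimated
# from the class's training data. All data here are synthetic.
rng = np.random.default_rng(1)
train = rng.normal(size=(100, 4))           # training data of one class
centroid = train.mean(axis=0)               # class centroid
cov_inv = np.linalg.inv(np.cov(train, rowvar=False))

def mahalanobis(x1, x2, cov_inv):
    d = x1 - x2
    return float(np.sqrt(d @ cov_inv @ d))  # sqrt(d^T Sigma^-1 d)

sample = rng.normal(size=4)                 # unlabeled performance snapshot
print(mahalanobis(sample, centroid, cov_inv))
```

Unlike the Euclidean distance, this measure discounts directions in which the class's metrics vary together, which is why it suits the correlated performance data described above.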
3.2.4 Confusion Matrix
The confusion matrix [66] is commonly used to evaluate the performance of classification
systems. It shows the predicted and actual classifications done by the system. The matrix
size is LxL, where L is the number of different classes. In our case, where there are five
target application classes, L is equal to 5.
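As a small illustration with made-up labels, a 5x5 confusion matrix and the accuracy read off its diagonal can be computed as:

```python
import numpy as np

# Build an L x L confusion matrix for L = 5 application classes.
# Rows index the actual class, columns the predicted class; the
# label vectors below are illustrative.
L = 5
actual    = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4])
predicted = np.array([0, 1, 2, 3, 3, 0, 1, 1, 3, 4])

cm = np.zeros((L, L), dtype=int)
for a, p in zip(actual, predicted):
    cm[a, p] += 1                       # count each (actual, predicted) pair

accuracy = np.trace(cm) / cm.sum()      # correct predictions lie on the diagonal
print(accuracy)                         # -> 0.8
```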
Figure 4-12. Predictor performance comparison (VM4). The chart plots the normalized
MSE of the P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and W-Cum.MSE predictors
against the performance metric ID.
1 CPU_usedsec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write
CHAPTER 6
CONCLUSION
Self-management has drawn increasing attention in the last few years due to
the increasing size and complexity of computing systems. A resource scheduler that
can perform self-optimization and self-configuration can help to improve the system
throughput and free system administrators from labor-intensive and error-prone tasks.
However, it is challenging to equip a resource scheduler with such self-management
capacities because of the dynamic nature of system performance and workload.
In this dissertation, we propose to use machine learning techniques to assist system
performance modeling and application workload characterization, which can provide
support for on-demand resource scheduling. In addition, virtual machines are used
as resource containers to host application executions for the ease of dynamic resource
provisioning and load balancing.
The application classification framework presented in Chapter 2 used the Principal
Component Analysis (PCA) to reduce the dimension of the performance data space.
Then the k-Nearest Neighbor (k-NN) algorithm is used to classify the data into different
classes such as CPU-intensive, I/O-intensive, memory-intensive, and network-intensive. It
does not require modifications of the application source code. Experiments with various
benchmark applications suggest that with the application class knowledge, a scheduler
can improve the system throughput by 22.1% on average by allocating the applications of
different classes to share the system resources.
The feature selection prototype presented in Chapter 3 uses a probabilistic model
(Bayesian Network) to systematically select the representative performance features,
which can provide optimal classification accuracy and adapt to changing workloads. It
shows that autonomic feature selection enables classification without requiring expert
knowledge in the selection of relevant low-level performance metrics. This approach
requires no application source code modification nor execution intervention. Results from
Figure 2-4. Performance feature space dimension reductions in the application
classification process (Anxm -preprocess-> A'pxm -PCA-> Bqxm -classify-> C1xm
-vote-> Class).
m: The number of snapshots taken in one application run,
n: The number of performance metrics,
Anxm: All performance metrics collected by monitoring system,
A'pxm: The selected relevant performance metrics after the zero-mean and
unit-variance normalization,
Bqxm: The extracted key component metrics,
C1xm: The class vector of the snapshots,
Class: The application class, which is the majority vote of snapshots' classes.
For example, performance metrics of CPU_System and CPU_User are correlated to
CPU-intensive applications; Bytes_In and Bytes_Out are correlated to Network-intensive
applications; IO_BI and IO_BO are correlated to the IO-intensive applications; Swap_In
and Swap_Out are correlated to Memory-intensive applications. The data preprocessor
extracts these eight metrics of the target application node from the data pool based on our
expert knowledge. Thus it reduces the dimension of the performance metric from n = 33
to p = 8 and generates A'pxm as shown in Figure 2-4. In addition, the preprocessor also
normalizes the selected metrics to zero-mean and unit-variance.
2.3.2.2 Feature selection based on principal component analysis
The PCA processor takes the data collected for the performance metrics listed in
Table 2-1 as inputs. It conducts the linear transformation of the performance data and
selects the principal components based on the predefined minimal fraction variance. In
our implementation, the minimal fraction variance was set to extract exactly two principal
components. Therefore, at the end of processing, the data dimension gets further reduced
from p = 8 to q = 2 and the vector Bqxm is generated, as shown in Figure 2-4.
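The variance-fraction selection step can be sketched as follows. The data, their two-factor structure, and the 90% threshold are illustrative; as noted above, the actual implementation tunes the minimal fraction variance so that exactly two principal components are extracted.

```python
import numpy as np

# PCA processor sketch: normalize the p = 8 selected metrics to zero mean
# and unit variance, then keep the fewest principal components whose
# cumulative variance fraction reaches a predefined threshold. The
# synthetic data have two underlying factors, so q comes out as 2.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))                  # two hidden factors
mixing = np.array([[1, 1, 1, 1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1, 1, 1, 1]], float)
A = latent @ mixing + 0.05 * rng.normal(size=(500, 8))  # m=500, p=8

A = (A - A.mean(axis=0)) / A.std(axis=0)            # zero-mean, unit-variance
eigvals, eigvecs = np.linalg.eigh(np.cov(A, rowvar=False))
eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]  # decreasing variance

frac = np.cumsum(eigvals) / eigvals.sum()           # cumulative variance fraction
q = int(np.searchsorted(frac, 0.9) + 1)             # smallest q reaching 90%
B = A @ eigvecs[:, :q]                              # reduced data, one q-dim row per snapshot
print(q)                                            # -> 2
```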
CHAPTER 2
APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF
RESOURCE CONSUMPTION PATTERNS
Application awareness is an important factor of efficient resource scheduling. This
chapter introduces a novel approach for application classification based on the Principal
Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. This approach
is used to assist scheduling in heterogeneous computing environments. It helps to reduce
the dimensionality of the performance feature space and classify applications based on
extracted features. The classification considers four dimensions: CPU-intensive, I/O
and paging-intensive, network-intensive, and idle. Application class information and the
statistical abstracts of the application behavior are learned over historical runs and used to
assist multi-dimensional resource scheduling.
2.1 Introduction
Heterogeneous distributed systems that serve application needs from diverse users
face the challenge of providing effective resource scheduling to applications. Resource
awareness and application awareness are necessary to exploit the heterogeneities of
resources and applications to perform adaptive resource scheduling. In this context, there
has been substantial research on effective scheduling policies [2-4] with given resource and
application specifications. There are several methods for obtaining resource specification
parameters (e.g., CPU, memory, disk information from /proc in Unix systems). However,
application specification is challenging to describe because of the following factors:
Numerous types of applications: In a closed environment where only a limited number
of applications are running, it is possible to analyze the source codes of each application
or even plug in codes to indicate the application execution stages for effective resource
scheduling. However, in an open environment such as in Grid computing, the growing
number of applications and lack of knowledge or control of the source codes present
the necessity of a general method of learning application behaviors without source code
modifications.
Figure 5-3. Application resource demand phase prediction workflow.
In the training stage, the u performance data X1xu of the features used in the
subsequent phase analysis are extracted (pattern representation) and framed
with prediction window size m. The unknown parameters of the resource
predictor are estimated during model fitting using the framed training data
X'(u-m+1)xm. In addition, the clustering algorithms introduced in Section 5.3
are used to construct the application phase profile, including the phase labels
P1xu for all the samples and the calculated cluster centroids C1xk. In the
testing phase, the phase predictor uses the knowledge learned from the phase
profile to predict the future phases P'1xv based on the predicted resource
usage Y'1xv and P1xv based on the observed actual resource usage Y1xv, and
compares them to evaluate the phase prediction accuracy.
[98] "WorldCup98," http://ita.ee.lbl.gov/html/contrib/WorldCup.html.
[99] "Logreplayer," http://www.cs.virginia.edu/ rz5b/software/logreplayer-manual.htm.
[100] C. Isci, A. Buyuktosunoglu, and M. Martonosi, "Long-term workload phases:
duration predictions and applications to dvfs," IEEE Micro, vol. 25, no. 5, pp.
39-51, 2005.
[101] C. Isci and M. Martonosi, "Phase characterization for power: evaluating
control-flow-based and event-counter-based techniques," Proc. 12th International
Symposium on High-Performance Computer Architecture, pp. 121-132, 2006.
[102] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, "Automatically
characterizing large scale program behavior," in Proc. 10th International Con-
ference on Architectural Support for Programming Languages and Operating
Systems, 2002, pp. 45-57.
[103] H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi,
"Pinpointing representative portions of large intel itanium programs with dynamic
instrumentation," in Proc. 37th annual international symposium on Microarchitec-
ture, 2004.
[104] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas,
"Memory hierarchy reconfiguration for energy and performance in general purpose
architectures," in Proc. 33rd annual international symposium on microarchitecture,
Dec. 2000, pp. 245-257.
[105] A. Dhodapkar and J. Smith, "Managing multi-configuration hardware via dynamic
working set analysis," in Proc. 29th Annual International Symposium on Computer
Architecture, Anchorage, AK, May 2002, pp. 233-244.
[106] A. Dhodapkar and J. Smith, "Comparing program phase detection techniques," in
Proc. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003,
pp. 217-227.
[107] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic provisioning
of multi-tier internet applications," in Proc. 2nd International Conference on
Autonomic Computing, June 2005, pp. 217-228.
[108] J. Wildstrom, P. Stone, E. Witchel, R. J. Mooney, and M. Dahlin, "Towards
self-configuring hardware for distributed computer systems," in Proc. 2nd Interna-
tional Conference of Autonomic Computing, June 2005, pp. 241-249.
[109] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, "Dynamic
virtual clusters in a grid site manager," Proc. 12th IEEE International Symposium
on High Performance Distributed Computing, pp. 90-100, June 2003.
Table 5-5. Average phase prediction accuracy of the five VMs
Performance   Number of Phases
Features      1    2    3    4    5    6    7    8    9    10
CPU_Used     1.00 0.85 0.69 0.60 0.51 0.48 0.43 0.44 0.38 0.35
CPU_Ready    1.00 0.81 0.67 0.52 0.45 0.36 0.36 0.32 0.33 0.32
Mem_Size     1.00 0.91 0.84 0.71 0.70 0.59 0.57 0.52 0.50 0.48
Mem_Swap     1.00 0.96 0.89 0.89 0.83 0.75 0.71 0.70 0.66 0.64
NIC#1_RX     1.00 0.58 0.54 0.47 0.41 0.39 0.37 0.34 0.30 0.28
NIC#1_TX     1.00 0.56 0.48 0.42 0.39 0.35 0.29 0.26 0.29 0.25
NIC#2_RX     1.00 0.93 0.77 0.70 0.61 0.55 0.46 0.33 0.31 0.24
NIC#2_TX     1.00 0.88 0.81 0.76 0.71 0.63 0.53 0.48 0.56 0.45
Disk1_Read   1.00 0.97 0.92 0.86 0.80 0.73 0.64 0.56 0.52 0.44
Disk1_Write  1.00 0.94 0.87 0.78 0.70 0.67 0.63 0.59 0.58 0.55
Disk2_Read   1.00 0.67 0.61 0.55 0.50 0.49 0.47 0.46 0.41 0.38
Disk2_Write  1.00 0.93 0.84 0.76 0.60 0.57 0.51 0.46 0.41 0.38
3. In this work, one-dimensional phase analysis and prediction is performed. However,
the prototype can also work for multi-dimensional resource provisioning cases.
For clustering in the multi-dimensional space, additional pattern representation
techniques such as Principal Component Analysis (PCA) can be used to project the data
to a lower dimensional space to reduce the computing intensity. In addition, the transition factor
C will represent the unit transition cost defined in the pricing schedule of the resource
provider.
Developing prediction models for parallel and multi-tier applications is part of our
future research.
5.6 Related Work
Recently, applications' phase behavior has drawn growing research interest for
different reasons. First, tracking application phases enables workload dependent dynamic
management of power/performance trade-offs [100][101]. Second, phase characterization
that summarizes application behavior with representative execution regions can be used
each cluster to maximize system revenue [110]. Tesauro et al. used a combination of
reinforcement learning and queuing model for system performance management [5].
5.7 Conclusion
The application resource demand phase analysis and prediction prototype presented
in this chapter shows how to apply statistical learning techniques to support on-demand
resource provisioning. This chapter shows how to define the phases in the context of
system level resource provisioning and provides an approach to automatically find
the number of phases that provides the optimal cost. The proposed cost model can
take the resource cost, phase transition cost, and prediction accuracy into account. The
experimental results show that an average phase prediction accuracy above 90% can
be achieved in the experiments across the CPU and network performance features under
study for the four-phase cases. With the knowledge of the system level application phase
behavior, we envision dynamic optimization of resource scheduling during the application
run can be performed to improve system utilization and reduce the cost for the user.
Providing more informative phase prediction can help to achieve this goal and is part of
our future research.
Figure 4-9. Predictor performance comparison (VM1). The chart plots the normalized
MSE of the P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and W-Cum.MSE predictors
against the performance metric ID.
1 CPU_usedsec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write
the Bayesian classifier are used to forecast the best predictor for the workload based on
the learning of historical load characteristics and prediction performance. The principal
component analysis technique has been applied to reduce the input data dimension of
the classification process. Our experimental results with the traces of the full range
of virtual machine resources including CPU, memory, network and disk show that the
LARPredictor can effectively identify the best predictor for the workload and achieve
prediction accuracies that are close to or even better than any single best predictor.
Figure 2-8. Application throughput comparisons of different schedules. The chart plots
the throughput of SPECseis96, PostMark, and NetPIPE under the SPN, PPN, and SSN
schedules together with the MIN, MAX, and AVG values. MIN, MAX, and
AVG are the minimum, maximum, and average application throughput of all the
ten possible schedules. SPN is the proposed schedule 10 {(SPN), (SPN),
(SPN)} in Figure 2-7.
Table 2-4. System throughput: concurrent vs. sequential executions
Execution    CH3D Elapsed   PostMark Elapsed   Time Taken to Finish
             Time (sec)     Time (sec)         2 Jobs (sec)
Concurrent   613            310                613
Sequential   488            264                752
throughput of schedule ID 10 (labeled SPN in Figure 2-8) with the minimum, maximum,
and average throughputs of all the ten possible schedules. By allocating jobs from different
classes to the machine, the three applications' throughputs were higher than average by
different degrees: SPECseis96_Small by 24.9%, PostMark by 48.1%, and NetPIPE by
4.2%. Figure 2-8 also shows that the maximum application throughputs were achieved
by sub-schedule (SSN) for SPECseis96 and (PPN) for NetPIPE instead of the proposed
(SPN). However, the low throughputs of the other applications in the sub-schedule make
their total throughputs sub-optimal.
to reduce the high computation costs of large-scale simulations [102] [103]. Our purpose to
study the phase behavior is to support dynamic resource provisioning of the application
containers.
In addition to the purpose of study, our approach differs from traditional program
phase analysis in the following ways:
1) Performance metric under study: In the area of power management and simulation
optimization for computer architecture research, the metrics used for workload charac-
terization are typically Basic Block Vectors (BBV) [102] [101], conditional branch counter
[104], and instruction working set [105]. In the context of application VM/container's
resource provisioning, the metrics under study are the system level performance features,
which are instructive to VM resource provisioning such as those shown in Table 5-1.
2) Knowledge of the program codes: While [102] [101] [104] require at least profiling
of program binary codes, our approach requires neither instrumentation nor access to
program codes.
3) This thesis answers the question "how many clusters are best" in the context of
system level resource provisioning.
In [106], Dhodapkar et al. compared three dynamic program phase detection
techniques discussed in [102], [104], and [105] using a variety of performance metrics, such
as sensitivity, stability, performance variance and correlations between phase detection
techniques.
In addition, other related work on resource provisioning include: Urgaonkar et al.
studied resource provisioning in a multi-tier web environment [107]. Wildstrom et al.
developed a method to identify the best CPU and memory configuration from a pool of
configurations for a specific workload [108]. Chase et al. have proposed a hierarchical
architecture that allocates virtual clusters to a group of applications [109]. Kusic et al.
developed an optimization framework to decide the number of servers to allocate to
Semi-Supervised Learning: Given a mix of labeled and unlabeled data, it can produce a
better predictor than training on the labeled data alone.
Transductive Learning: It trains a classifier to give the best predictions on a specific set
of test data.
Active Learning: It chooses or constructs optimal samples to train on next, with the
objective of achieving the best predictor with the fewest labeled samples.
Nonlinear Dimensionality Reduction: It learns underlying complex manifolds of data
in high dimensional spaces.
In this work, various learning techniques are used to model the application resource
demand and system performance. These models can help the system adapt to
changing workloads and achieve higher performance.
1.4 Virtual Machines
Virtual machines were first developed and used in the 1960s, with the best-known
example being IBM's VM/370 [10]. A virtual machine system enables multiple
independent, isolated operating systems (guest VMs) to run on one physical machine
(host server), efficiently multiplexing system resources of the host machine [10].
A virtual-machine monitor (VMM) is a software layer that runs on a host platform
and provides an abstraction of a complete computer system to higher-level software.
The abstraction created by the VMM is called a virtual machine. Figure 1-3 shows the
structure of virtual machines.
1.4.1 Virtual Machine Characteristics
Virtual machines can greatly simplify system management (especially in environments
such as Grid computing) by raising the level of abstraction from that of the operating
system user to that of the virtual machine to the benefit of the resource providers and
users [11]. The following characteristics of virtual machines make them a highly flexible
and manageable application execution platform:
4.6.2 Testing Phase
Similar to the training phase, the testing data are normalized using the normalization
coefficient derived from the training phase and framed with the prediction window size
m. Then the PCA is used to reduce the dimension of the preprocessed testing data
(y'_t-m, y'_t-m+1, ..., y'_t-1) from m to n.
In the testing phase of the LARPredictor that is based on the k-NN classifier, the
Euclidean distances between all PCA-processed test data (y'_t-m, y'_t-m+1, ..., y'_t-1)
and all training data X''(u-m+1)xn in the reduced n-dimensional feature space are
calculated and the k (k = 3 in our implementation) training data which have the shortest
distances to the testing data are identified. The majority vote of the k nearest neighbors'
best predictor will be chosen as the best predictor to predict y'_t based on
(y'_t-m, y'_t-m+1, ..., y'_t-1) in case of the AR model or the SWAVG model, and
y'_t = y'_t-1 in case of the LAST model. The prediction performance can be obtained by
comparing the predicted value y'_t with the normalized observed value.
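For reference, the three base predictors named above (LAST, SWAVG, AR) can be sketched as follows; the window size and AR coefficients are placeholders, not the fitted values used in this work.

```python
# Minimal sketches of the three base predictors. In practice the AR
# coefficients are fit to the training windows and the SWAVG window
# size is tuned; the values below are illustrative only.
def last(history):
    return history[-1]                     # LAST: repeat the last value

def swavg(history, w=4):
    window = history[-w:]                  # SWAVG: sliding-window average
    return sum(window) / len(window)

def ar(history, coeffs=(0.5, 0.3, 0.2)):
    # AR: weighted sum of the most recent values (order = len(coeffs))
    recent = history[-len(coeffs):][::-1]  # newest value first
    return sum(c * v for c, v in zip(coeffs, recent))

load = [0.2, 0.4, 0.6, 0.8]                # toy resource usage trace
print(last(load))                          # -> 0.8
print(swavg(load))                         # average of the last 4 values
print(ar(load))                            # AR forecast from the last 3 values
```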
In the testing phase of the LARPredictor that is based on the Bayesian classifier, test
data are preprocessed in the same way as for the k-NN classifier. The PCA-processed test
data (y'_t-m, y'_t-m+1, ..., y'_t-1) are plugged into the discriminant function (4-12)
derived in Section 4.5.2. The parameters in the discriminant function for each class, the
mean vector and covariance matrix, are obtained during the training phase. Then, each
test data is classified as the class of the largest discriminant function.
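A standard quadratic Gaussian discriminant of this kind can be sketched as follows. The means, covariances, and priors below are toy values, and the exact form of Eq. 4-12 in the text may differ in constant terms.

```python
import numpy as np

# Gaussian discriminant classification sketch: each class is modeled by a
# mean vector and covariance matrix estimated during training, and a test
# point is assigned to the class with the largest discriminant value.
def discriminant(x, mean, cov, prior):
    d = x - mean
    return (-0.5 * np.log(np.linalg.det(cov))
            - 0.5 * d @ np.linalg.inv(cov) @ d
            + np.log(prior))

means  = [np.zeros(2), np.array([5.0, 5.0])]   # two classes (toy values)
covs   = [np.eye(2), np.eye(2)]
priors = [0.5, 0.5]

x = np.array([4.5, 5.2])                       # PCA-processed test point
scores = [discriminant(x, m, c, p) for m, c, p in zip(means, covs, priors)]
print(int(np.argmax(scores)))                  # -> 1 (x lies near class 1)
```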
The testing phase differs from the training phase in that it does not require running
multiple predictors in parallel to identify the one which is best suited to the data and
gives the smallest MSE. Instead, it forecasts the best predictor by learning from historical
predictions. The reasoning here is that these nearest neighbors' workload characteristics
are closest to the testing data's and the predictor that works best for these neighbors
should also work best for the testing data.
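The best-predictor forecast described above can be sketched as follows. This is an illustrative Python sketch, not the thesis's Matlab implementation; the function name `knn_best_predictor` and the predictor label strings are hypothetical.

```python
import numpy as np

def knn_best_predictor(test_point, train_points, train_best, k=3):
    """Forecast the best predictor for a PCA-reduced test frame by a
    majority vote over the best predictors of its k nearest neighbors.

    train_points : (N, n) array of PCA-reduced training frames.
    train_best   : length-N list of labels (e.g., 'AR', 'SW_AVG', 'LAST')
                   recording which predictor gave the smallest MSE on
                   each training frame during the training phase.
    """
    # Euclidean distances in the reduced n-dimensional feature space.
    dists = np.linalg.norm(train_points - test_point, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of the k nearest
    labels = [train_best[i] for i in nearest]
    # Majority vote among the neighbors' best predictors.
    return max(set(labels), key=labels.count)
```

Only the predictor returned by the vote is then run on the test frame, which is what lets the testing phase avoid running the whole predictor pool in parallel.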
This expression can be evaluated if the densities p(x|ω_i) are multivariate normal. In
this case, we have

g_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (d/2) ln 2π - (1/2) ln |Σ_i| + ln P(ω_i).   (4-12)
The resulting classification is performed by evaluating the discriminant functions. When
workloads have similar statistical properties, the Bayesian classifier derived from one
workload trace can be applied directly to another. In the case of highly variable workloads,
retraining of the classifier is necessary.
4.5.3 Principal Component Analysis
The Principal Component Analysis (PCA) [22][88], also called the Karhunen-Loève trans-
form, is a linear transformation representing data in a least-square sense. The principal
components of a set of data in R^p provide a sequence of best linear approximations to
those data, of all ranks q < p.
Denote the observations by x_1, x_2, ..., x_N; the parametric representation of the
rank-q linear model is as follows:

f(λ) = μ + V_q λ,   (4-13)

where μ is a location vector in R^p, V_q is a p × q matrix with q orthogonal unit vectors
as columns, which are called eigenvectors, and λ is a vector of q parameters, which are
called eigenvalues. These eigenvectors are the principal components. The corresponding
eigenvalues represent the contribution to the variance of the data. Often there will be just a
few (= k) large eigenvalues, which implies that k is the inherent dimensionality of the
subspace governing the data. When the k largest eigenvalues of the q principal components are
chosen to represent the data, the dimensionality of the data is reduced from q to k.
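The inherent dimensionality k described above can be estimated from the eigenvalue spectrum of the sample covariance matrix. The following Python sketch is illustrative (the thesis uses Matlab routines); the variance-fraction threshold and the function name are assumptions, not from the source.

```python
import numpy as np

def inherent_dimension(X, var_fraction=0.95):
    """Estimate the inherent dimensionality k of N x p data X as the
    number of leading eigenvalues of the sample covariance matrix
    needed to capture a given fraction of the total variance."""
    Xc = X - X.mean(axis=0)                 # center the data
    vals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending
    ratio = np.cumsum(vals) / vals.sum()    # cumulative variance explained
    return int(np.searchsorted(ratio, var_fraction) + 1)
```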
Table 4-1. Normalized prediction MSE statistics for resources of VM1
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.6976 0.9508 1.1436 0.9456 1.0352
CPU ready 0.6775 0.9632 1.1699 0.9579 1.0333
Memory_size 0.2071 0.2389 0.2298 0.2379 0.4883
Memory swapped 0.2071 0.2386 0.2298 0.2379 0.4883
NIC1 received 0.3981 0.5436 1.836 0.5436 0.9831
NIC1 transmitted 0.3776 0.5845 1.8236 0.5845 0.9829
NIC2 received 0.9788 0.9912 1.4392 0.9966 1.0397
NIC2 transmitted 0.3983 0.5463 1.8406 0.5463 0.9843
VD1 read 0.9062 1.0215 1.2849 0.9754 1.0511
VD1 write 0.7969 0.9587 1.1905 0.9473 1.0566
VD2 read 1 1.2156 1.4191 1.1536 1.035
VD2 write 0.662 0.9931 1.1572 0.9929 1.0292
duration = 168 hours, interval = 30 minutes, prediction order = 16
LARPredictor to outperform any single predictor in the pool and approach the prediction
accuracy of the P-LAR by improving the best predictor forecasting / classification
accuracy. How to further improve the predictor classification accuracy is a topic of our
future research.
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based
LARPredictor
In this experiment, a set of VM traces with 138,240 performance data points was used
to feed the LARPredictor. Half of the data were used for training and the other half
were used for testing. A Bayesian-classifier based LARPredictor was implemented.
Fig. 4-9 shows the prediction performance comparison between it and the k-NN based
LARPredictor for all the resources of VM1. The profile report of the Matlab program
execution showed that the k-NN based LARPredictor cost 205.8 seconds of CPU time, with
193.5 seconds in the testing phase and 12.3 seconds in the training phase. It took 132.1
Figure 2-5. Sample clustering diagrams of application classifications. A) Training
data: Mixture. B) SimpleScalar: CPU-intensive. C) Autobench: Network-intensive.
D) VMD: Interactive. Principal Components 1 and 2 are the principal component
metrics extracted by PCA. [Axes: Principal Component 1 vs. Principal Component 2;
legend: Idle, CPU, I/O, NET, MEM.]
performance than the locally weighted regression algorithms for the tools tested. Our
choice of k-NN classification is based on conclusions from [45]. This thesis differs from
Kapadia's work in the following ways. First, the application class knowledge is used to
facilitate the resource scheduling to improve the overall system throughput in contrast
with Kapadia's work, which focuses on application CPU time prediction. Second, the
application classifier takes performance metrics as inputs. In contrast, in [45] the CPU
time prediction is based on the input parameters of the application. Third, the application
classifier employs PCA to reduce the dimensionality of the performance feature space. It is
especially helpful when the number of input features of the classifier is not trivial.
Condor uses process checkpoint and migration techniques [20] to allow an allocation
to be created and preempted at any time. The transfer of checkpoints may occupy
significant network bandwidth. Basney's study in [46] shows that co-scheduling of CPU
and network resources can improve the Condor resource pool's goodput, which is defined
as the allocation time when a remotely executing application uses the CPU to make
forward progress. The application classifier presented in this thesis performs learning of
application's resource consumption of memory and I/O in addition to CPU and network
usage. It provides a way to extract the key performance features and generate an abstract
of the application resource consumption pattern in the form of application class. The
application class information and resource consumption statistics can be used together
with recent multi-lateral resource scheduling techniques, such as Condor's Gang-matching
[47], to facilitate the resource scheduling and improve system throughput.
Conservative Scheduling [4] uses the prediction of the average and variance of the
CPU load of some future point of time and time interval to facilitate scheduling. The
application classifier shares the common technique of resource consumption pattern
analysis of a time window, which is defined as the time of one application run. However,
the application classifier is capable of taking into account usage patterns of multiple kinds
of resources, such as CPU, I/O, network, and memory.
CHAPTER 3
AUTONOMIC FEATURE SELECTION FOR APPLICATION CLASSIFICATION
Application classification techniques based on monitoring and learning of resource
usage (e.g., CPU, memory, disk, and network) have been proposed in Chapter 2 to aid in
resource scheduling decisions. An important problem that arises in application classifiers
is how to decide which subset of the numerous performance metrics collected from monitoring
tools should be used for the classification. This chapter presents an approach based on
a probabilistic model (Bayesian Network) to systematically select the representative
performance features, which can provide optimal classification accuracy and adapt to
changing workloads.
3.1 Introduction
Awareness of application resource consumption patterns (such as CPU-intensive, I/O
and paging-intensive and network-intensive) can facilitate the mapping of workloads to
appropriate resources. Techniques of application classification based on monitoring and
learning of resource usage can be used to gain application awareness [53]. Well-known
monitoring tools such as the open source packages Ganglia [54] and dproc [55], and
commercial products such as HP's Open View [56] provide the capability of monitoring
a rich set of system level performance metrics. An important problem that arises is how
to decide which subset of numerous performance metrics collected from monitoring tools
should be used for the classification in a dynamic environment. In this chapter we address
this problem. Our approach is based on autonomic feature selection and can help to
improve the system's self-manageability [1] by reducing the reliance on expert knowledge
and increasing the system's adaptability.
The need for autonomic feature selection and application classification is motivated by
systems such as VMPlant [16], which provides automated resource provisioning of Virtual
Machine (VM). In the context of VMPlant, the application can be scheduled to run on a
dedicated virtual machine, whose system level performance metrics reflect the application's
Table 4-2. Normalized prediction MSE statistics for resources of VM2
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.8142 1.1158 1.2476 1.0311 1.0912
CPU ready 0.7873 1.0128 1.2167 1.0166 1.0948
Memory_size 0.5328 0.6213 0.637 0.6262 0.79
Memory swapped 0.5328 0.6214 0.637 0.6262 0.7901
NIC1 received 0.4872 0.6189 0.6663 0.611 0.6831
NIC1 transmitted 0.7581 1.0138 1.0303 1.0209 1.0737
NIC2 received 0.6626 0.89 0.8765 0.8923 1.0242
NIC2 transmitted 0.7434 0.9924 1.0266 0.9949 1.0775
VD1 read 0.9582 1.0467 1.2249 1.0264 1.0912
VD1 write 0.7733 1.0744 1.1574 1.0129 1.0748
VD2 read 1.0208 1.4153 1.4155 1.0843 1.0972
VD2 write 0.7389 0.9941 1.0816 0.9372 1.0792
duration = 24 hours, interval = 5 minutes, prediction order = 5
second CPU time for the Bayesian-based LARPredictor to finish execution, with a 120.8-
second testing phase and an 11.3-second training phase.
The experimental results show that the prediction accuracy in terms of normalized
MSE of the Bayesian-classifier based LARPredictor is about 3% worse than that of the k-NN
based one. However, it shortened the CPU time of the testing phase by 37.5%.
4.7.2.3 Performance comparison of the LARPredictors and the cumulative-
MSE based predictor used in the NWS
This section compares the prediction accuracy of the LARPredictors and the NWS
predictor. Figs. 4-9, 4-10, 4-11, 4-12, and 4-13 show the prediction accuracy of the perfect
LARPredictor that has 100% best-predictor forecasting accuracy (P-LARP), the k-NN
and Bayesian based LARPredictors (KnnLARP and BayesLARP), the cumulative MSE
of all history based predictor used in the NWS (Cum.MSE), and the cumulative-MSE
a time-series prediction model, autoregressive (AR), is used for its simplicity and proven
success in computer system resource prediction [78]. However, this prototype can generally
work with any other time-series prediction models. In case of highly dynamic workloads,
the Learning-Aided Resource Predictor (LARPredictor) developed in Chapter 4 can be
used. The LARPredictor uses a mix-of-experts approach, which adaptively chooses the
best prediction model from a pool of models based on learning of the correlations between
the workload and fitted prediction models of historical runs.
Similar to the training stage, the testing data Y_{1×v} are extracted and framed with
the prediction window size m. The framed testing data Y'_{(v-m+1)×m} are used as input
to the fitted resource predictor to predict the future resource usage Y'_{1×v}. The phase
predictor classifies the predicted resource usages Y'_{1×v} into the phases P'_{1×v} based on the
phase profile learned in the training stage. Similarly, the phase predictions for the actual
resource usage Y_{1×v} are performed to generate P_{1×v}. Then the corresponding predicted
phases P'_{1×v} (which are based on predicted resource usage) and P_{1×v} (which are based on
actual resource usage) are compared to evaluate the phase prediction accuracy, which is
defined as the ratio of the number of matched phase predictions to the total number of
phase predictions.
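The accuracy metric defined above reduces to a simple ratio. A minimal Python sketch (function name is illustrative):

```python
def phase_prediction_accuracy(predicted_phases, actual_phases):
    """Ratio of matched phase predictions to the total number of phase
    predictions, as defined in the text."""
    matched = sum(p == a for p, a in zip(predicted_phases, actual_phases))
    return matched / len(actual_phases)
```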
5.5 Empirical Evaluation
We have implemented a prototype for the phase analysis and prediction model
including Perl and Shell scripts to extract and profile the performance data from
the performance database, and a Matlab implementation of the phase analyzer and
predictor. This section shows the experimental results of the phase analysis and prediction
performance evaluations using traces collected from the batch executions of SPECseis96,
a scientific benchmark program, and replay of the WorldCup98 web access log. In all
the experiments, ten-fold cross validation was performed for each set of time series
performance data.
(NWS) [73] for 66.1% of the traces. It has the potential to consistently outperform any
single predictor for variable workloads and achieve about 18% lower MSE than the model used
in the NWS.
The rest of the chapter is organized as follows: Section 4.2 gives an overview of
related work. Section 4.4 describes the linear time series prediction models used to
construct the LARPredictor and Section 4.5 describes the learning techniques used for
predictor selection. Section 4.6 details the work flow of the learning-aided adaptive
resource predictor. Section 4.7 discusses the experimental results. Section 4.8 summarizes
the work and describes future direction.
4.2 Related Work
Time series analysis has been studied in many areas such as financial forecasting [74],
biomedical signal processing [75], and geoscience [76]. In this work, we focus on the time
series modeling for computer resource performance prediction.
In [77] and [78], Dinda et al. conducted extensive study of the statistical properties
and the predictions of host load. Their work indicates that CPU load is strongly
correlated over time, which implies that history-based load prediction schemes are feasible.
They evaluated the predictive power of a set of linear models including autoregression
(AR), moving average (MA), autoregression integrated moving average (ARIMA),
autoregression fractionally integrated moving average (ARFIMA), and window-mean
models. Their results show that the AR model is the best in terms of high prediction
accuracy and low overhead among the models they studied. Based on their conclusion, the
AR model is included in our predictor pool to leverage its performance.
To improve the prediction accuracy, various adaptive techniques have been exploited
by the research community. In [4], Yang et al. developed a tendency-based prediction
model that predicts the next value according to the tendency of the time series change.
An increment/decrement value is added to/subtracted from the current measurement
based on the current measurement and other dynamic information to predict the
Figure 3-2. Feature selection model. The Performance profiler collects performance
metrics of the target application node. The Application classifier classifies the
application using extracted key components and performs statistical analysis of
the classification results. The DataQA (Data Quality Assuror) selects the training
data for the classification. The Feature selector selects performance metrics which
can provide optimal classification accuracy. The Trainer trains the classifier using
the selected metrics of the training data. The Application DB stores the application
class information. (t1/t2 are the beginning/ending times of the application
execution; VMIP is the IP address of the application's host machine; CTC is the
Classification Training Center.)
training data from the application snapshots, only n out of m metrics are extracted based
on the previous feature selection result to form a set of K_c n-dimensional training points

{x_{k,1}, x_{k,2}, ..., x_{k,n}},  k = 1, 2, ..., K_c   (3-4)

that comprise a cluster C_c. From [50], it follows that the n-tuple

(x̄_1, x̄_2, ..., x̄_n)   (3-5)
Figure 3-8. Training data clustering diagram derived from expert-selected and
automatically selected feature sets. A) Automatic. B) Expert.
designed based on Bayesian Network (BN) to systematically identify the feature subset,
which can provide optimal classification accuracy and adapt to changing workloads.
Second, an adaptive system performance prediction model is investigated based
on a learning-aided predictor integration technique. Supervised learning techniques are
used to learn the correlations between the statistical properties of the workload and the
best-suited predictors.
In addition to a one-step ahead prediction model, a phase characterization model is
studied to explore the large-scale behavior of application's resource consumption patterns.
Our study provides novel methodologies to model system and application performance.
The performance models can self-optimize over time based on learning of historical
runs, and can therefore adapt better to changing workloads and achieve higher prediction
accuracy than traditional methods with static parameters.
Table 4-3. Normalized prediction MSE statistics for resources of VM3
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.9883 1.0395 1.4341 1.0376 1.0989
CPU_ready 0.6826 0.9502 1.6594 0.9502 1.0921
Memory_size 0.5009 0.6169 0.6818 0.6216 0.7481
Memory_swapped 0 0 0 NaN 0
NIC1 transmitted 0.9931 1.0514 1.3068 1.0665 1.0943
VD1 read 0 0 0 NaN 0
VD1 write 0 0 0 NaN 0
VD2 read 0.9728 1.0276 1.3969 1.0281 1.1016
VD2 write 0.8696 0.9938 1.245 0.9946 1.0815
duration = 24 hours, interval = 5 minutes, prediction order = 5
based predictor of a fixed window size (n=2 in this experiment) used in the NWS
(W-Cum.MSE).
The experimental results show that, without running all the predictors in parallel all
the time, for 66.1% of the traces the LARPredictor outperformed the cumulative-MSE
based predictor used in the NWS. The perfect LARPredictor shows the potential to
achieve about 18% lower MSE on average than the cumulative-MSE based predictor.
4.7.3 Discussion
PCA is an optimal way to project data in the mean-square sense. The computational
complexity of estimating the PCA is O(d^2 W) + O(d^3) for the original set of W
d-dimensional data points [89]. In the context of resource performance time series prediction,
W = 1 and d is the prediction window size. The typical small input data size in this
context makes the use of the PCA feasible. There also exist computationally less expensive
methods [90] for finding only a few eigenvectors and eigenvalues of a large matrix; in our
experiments, we use appropriate Matlab routines to realize these.
Figure 3-6. Five-class test data distribution with first two selected features.
[Axis: CPU_system (%).]
Figure 3-7. Comparison of distances between cluster centers derived from expert-selected
and automatically selected feature sets.
1: idle-cpu, 2: idle-I/O, 3: idle-net, 4: idle-mem, 5: cpu-I/O,
6: cpu-net, 7: cpu-mem, 8: I/O-net, 9: I/O-mem, 10: net-mem.
2007 Jian Zhang
Figure 2-7. System throughput comparisons for ten different schedules
1:{(SSS),(PPP),(NNN)}, 2:{(SSS),(PPN),(PNN)}, 3:{(SSP),(SPP),(NNN)},
4:{(SSP),(SPN),(PNN)}, 5:{(SSP),(SNN),(PPN)}, 6:{(SSN),(SPP),(PNN)},
7:{(SSN),(SPN),(PPN)}, 8:{(SSN),(SNN),(PPP)}, 9:{(SPP),(SPN),(SNN)},
10:{ (SPN), (SPN), (SPN) }
S -SPECseis96 (CPU-intensive), P -PostMark (I/O-intensive),
N -NetPIPE (Network-intensive).
selected at random. The other scenario used application class knowledge, always allocating
applications of different classes (CPU, I/O and network) to run on the same machine
(Schedule 10, Figure 2-7). The system throughputs obtained from runs of all possible
schedules in the experimental environment are shown in Figure 2-7.
The average system throughput of the schedule chosen with class knowledge was
1391 jobs per day. It achieved the highest throughput among the ten possible schedules,
22.1% larger than the weighted average of the system throughputs of all ten possible
schedules. In addition, the random selection of the possible schedules resulted in large
variances of system throughput. The application class information can be used to help
the scheduler pick the optimal schedule consistently. The application throughput
comparison of different schedules on one machine is shown in Figure 2-8. It compares the
from the server host machine's /proc nodes. The vmkusage tool samples every minute,
and updates its data every five minutes with an average of the one-minute statistics over
the given five-minute interval. The collected data is stored in a Round Robin Database
(RRD). Table 2-1 shows the list of performance features under study in this work.
The profiler retrieves the VM performance data, which are identified by vmlD,
devicelD, and a time window, from the round robin performance database. The data of
each VM device's performance metric form a time series (x_{t-m+1}, ..., x_t) with an identical
interval, where m is the data retrieval window size. The retrieved performance data with
the corresponding time stamps are stored in the prediction database. The [vmID, devicelD,
timeStamp, metricName] forms the combinational primary key of the database. Figure 4-2
shows the XML schema of the database and sample database records of virtual machines
such as VM1, which has one CPU, two Network Interface Cards (NIC), and two virtual
hard disks.
The LARPredictor takes the time series performance data (y_{t-m}, ..., y_{t-1}) as inputs,
selects the best prediction model based on the learning of historical prediction results,
and predicts the resource performance y_t at a future time. A detailed description of the
LARPredictor's work flow is given in Section 4.6. The predicted results are stored in
the prediction DB and can be used to support the resource manager's dynamic VM
provisioning decision-making.
The Prediction Quality Assuror (QA) is responsible for monitoring the LARPredictor's
performance, in terms of MSE. It periodically audits the prediction performance by
calculating the average MSE of historical prediction data stored in the prediction DB.
When the average MSE of the data in the audit window exceeds a predefined threshold,
it directs the LARPredictor to re-train the predictors and the classifier using recent
performance data stored in the database.
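The QA's audit step can be sketched as below. This is an illustrative Python sketch: the in-memory list stands in for the prediction DB, and the function name and threshold semantics are assumptions, not the thesis's implementation.

```python
def audit(prediction_db, window, threshold):
    """Sketch of the Prediction Quality Assuror: average the squared
    errors of the most recent predictions in the audit window and
    signal retraining when the average MSE exceeds the threshold.

    prediction_db : list of (predicted, observed) pairs, a hypothetical
                    stand-in for the prediction DB.
    """
    recent = prediction_db[-window:]
    mse = sum((p - o) ** 2 for p, o in recent) / len(recent)
    return mse > threshold   # True => direct the LARPredictor to retrain
```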
LIST OF FIGURES
Figure                                                                      page
1-1  Structure of an autonomic element ....................................... 16
1-2  Classification system representation .................................... 19
1-3  Virtual machine structure ............................................... 21
1-4  VMPlant architecture .................................................... 23
2-1  Sample of principal component analysis .................................. 28
2-2  k-nearest neighbor classification example ............................... 31
2-3  Application classification model ........................................ 32
2-4  Performance feature space dimension reductions in the application
     classification process .................................................. 34
2-5  Sample clustering diagrams of application classifications ............... 39
2-6  Application class composition diagram ................................... 42
2-7  System throughput comparisons for ten different schedules ............... 43
2-8  Application throughput comparisons of different schedules ............... 44
3-1  Sample Bayesian network generated by feature selector ................... 54
3-2  Feature selection model ................................................. 57
3-3  Bayesian-network based feature selection algorithm for application
     classification .......................................................... 60
3-4  Average classification accuracy of 10 sets of test data versus number
     of features selected in the first experiment ............................ 63
3-5  Two-class test data distribution with the first two selected features ... 63
3-6  Five-class test data distribution with first two selected features ...... 66
3-7  Comparison of distances between cluster centers derived from
     expert-selected and automatically selected feature sets ................. 66
3-8  Training data clustering diagram derived from expert-selected and
     automatically selected feature sets ..................................... 67
3-9  Classification results of benchmark programs ............................ 69
4-1  Virtual machine resource usage prediction prototype ..................... 78
4-2  Sample XML schema of the VM performance DB .............................. 80
where C denotes the transition factor, which is the ratio of C2 to C1, and K is the
maximum number of phases.
Encoding misprediction penalty cost: The algorithm can be extended to phase
prediction as well as phase analysis of resource usage. The determination of the best
number of phases remains the same, whereas the cost function has to be changed to
take over- or under-provisioning caused by prediction error into account. Generally the
mispredictions consist of two possible cases: over-provisioning and under-provisioning.
Over-provisioning refers to the cases that the resource reservation based on prediction is
larger than the actual usage. It guarantees that the application response time is equal
to or less than the time defined in the SLA. In this case, the penalty is the cost of the
over-reserved resource, which has been encoded in the cost model already. In case of
under-provisioning, the application's execution time will be prolonged because of the
resource constraint. The performance degradation is approximated by the penalty in the
total cost function. The penalty is defined as the difference between the under-reserved
resource and the actual resource usage, and can be written as
U_penalty = 0,            if U ≤ U_max
U_penalty = U − U_max,    if U > U_max                      (5-7)

P(k) = Σ_{i=1}^{k} U_penalty,i                              (5-8)
where k is the number of phases. Taking both the phase transition and misprediction costs
into account, the general total cost function is modified as
TC'(k) = C1R(k) + C2TR(k) + C3P(k) (5-9)
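Selecting the number of phases that minimizes the total cost of Eq. (5-9) can be sketched as follows. This is a minimal Python sketch assuming the per-k resource cost R(k), transition cost TR(k), and penalty P(k) have already been computed; the function and variable names are illustrative, not from the thesis's implementation.

```python
def penalty(reserved, actual):
    """Under-provisioning penalty of Eqs. (5-7)/(5-8): zero for each
    snapshot whose reservation covers the actual usage, otherwise the
    shortfall between actual usage and the reservation."""
    return sum(max(a - r, 0.0) for r, a in zip(reserved, actual))

def best_num_phases(R, TR, P, C1, C2, C3):
    """Pick the number of phases k minimizing the general total cost
    TC'(k) = C1*R(k) + C2*TR(k) + C3*P(k) of Eq. (5-9).
    R, TR, P map each candidate k to its resource, phase-transition,
    and misprediction-penalty costs."""
    return min(R, key=lambda k: C1 * R[k] + C2 * TR[k] + C3 * P[k])
```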
In this work, the PCA is used to reduce the prediction input data dimensions. It
helps to reduce the computing intensity of the subsequent classification process.
4.6 Learning-Aided Adaptive Resource Predictor
This section describes the work flow of the Learning-Aided Adaptive Resource Pre-
dictor (LARPredictor) illustrated in Figure 4-3. The prediction consists of two phases: a
training phase and a testing phase. During the training phase, the best predictors for each
set of training data are identified using the traditional mix-of-expert approach. During
the testing phase, the classifier forecasts the best predictor for the test data based on the
knowledge gained from the training data and historical prediction performance. Then only
the selected best predictor is run to predict the resource performance. Both phases include
the data pre-processing and the Principal Component Analysis (PCA) process.
The features under study in this work, as shown in Table 2-1, include CPU, memory,
network bandwidth, and disk I/O usages. Figure 4-4 illustrates how the features are
processed to form the prediction database. Since the features have different units of
measure, a data pre-processor was used to normalize the input data with zero mean and
unit variance. The normalized data are framed according to the prediction window size to
feed the PCA processor.
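The pre-processing step just described (z-score normalization followed by framing with the prediction window size) can be sketched as below. This is an illustrative Python sketch; the function name `preprocess` is hypothetical.

```python
import numpy as np

def preprocess(series, m):
    """Normalize a raw metric series to zero mean and unit variance,
    then frame it with prediction window size m: row i of the result
    holds (y'_i, ..., y'_{i+m-1})."""
    y = np.asarray(series, dtype=float)
    mean, std = y.mean(), y.std()
    yn = (y - mean) / std                 # z-score normalization
    frames = np.array([yn[i:i + m] for i in range(len(yn) - m + 1)])
    return frames, (mean, std)            # keep coefficients for testing
```

The returned normalization coefficients are reused in the testing phase, matching the requirement that testing data be normalized with the coefficients derived from training.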
4.6.1 Training Phase
The training phase of both the k-NN and the Bayesian classifiers mainly consists
of two processes: prediction model fitting and best predictor identification. The set of
training data with the corresponding best predictors is used for the k-NN classification in
the testing phase. The unknown parameters of the Bayesian classifier are estimated from
the training data.
The LAST and SW_AVG models do not involve any unknown parameters. They can
be used for predictions directly. The parametric prediction models such as the AR model,
which contain unknown parameters, require model fitting. The model fitting is a process
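The model-fitting step for the AR model can be sketched as ordinary least squares on lagged observations. This Python sketch is illustrative; the thesis's Matlab routine may use a different estimator (e.g., Yule-Walker), and the function names are hypothetical.

```python
import numpy as np

def fit_ar(y, p):
    """Fit AR(p) coefficients by ordinary least squares on lagged
    values: predict y_t from (y_{t-1}, ..., y_{t-p})."""
    y = np.asarray(y, dtype=float)
    # Design matrix: row t holds (y_{t-1}, ..., y_{t-p}).
    X = np.column_stack([y[p - j - 1:len(y) - j - 1] for j in range(p)])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef

def ar_predict(history, coef):
    """One-step-ahead prediction from the most recent p observations."""
    p = len(coef)
    recent = history[-p:][::-1]           # (y_{t-1}, ..., y_{t-p})
    return float(np.dot(coef, recent))
```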
Table 5-3. Average phase prediction accuracy
Performance Number of phases (k)
Application Features 1 2 3 4 5 6 7 8 9 10
Bytesin 1.00 0.99 0.99 0.98 0.98 0.97 0.97 0.96 0.96 0.96
WorldCup98 Bytes_out 1.00 0.94 0.94 0.92 0.91 0.89 0.87 0.88 0.86 0.84
CPU user 1.00 0.95 0.90 0.87 0.85 0.81 0.78 0.77 0.74 0.69
SPECseis96 CPU_system 1.00 0.94 0.87 0.83 0.83 0.79 0.76 0.74 0.73 0.69
Table 5-4. Performance feature list of VM traces
Perf. Features Description
CPU_Ready The percentage of time that the virtual machine
was ready but could not get scheduled to run on
a physical CPU.
CPU_Used The percentage of physical CPU resources used
by a virtual CPU.
Mem_Size Current amount of memory in bytes the virtual
machine has.
Mem_Swap Amount of swap space in bytes used by the
virtual machine.
Net_RX/TX The number of packets and the MBytes per
second that are transmitted and received by a NIC.
Disk_RD/WR The number of I/Os and KBytes per second
that are read from and written to the disk.
replay, and an average of 85% accuracy can be achieved for the CPU performance traces of
SPECseis96 for the four-phase cases.
In addition to the above two applications, we also evaluated the prediction per-
formance of the phase predictor using traces of a set of five virtual machines. These
virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0GHz
CPU, 4GB memory, and a 36GB SCSI disk. VMware ESX Server 2.5.2 was running on
the physical host. The vmkusage tool was run on the ESX server to collect the resource
performance data of the guest virtual machines every minute and store them in a round
robin database. The performance features under study in this experiment are shown in
Table 5-4.
To perform the web log replay, a Matlab program was developed to profile the
binary access log file and extract the entries of the target web server. The log-conversion
tool provided by [98] was used to convert the binary log into the Common Log Format.
A modified version of the Real-Time Web Log Replayer [99] was used to analyze and
generate the files needed by the log replayer and perform the replay.
Figures 5-5 and 5-6 show the phase characterization results of the performance
features bytes_in and bytes_out of the web server. The interesting observation from panels
A and B is that the number of phase transitions and misprediction penalties do not
always monotonically increase with the increasing number of phases. As a result, the
phase profile shown in panel C indicates that three-phase based resource provisioning gives
the lowest total cost with the given C = [150k, 750k] and Cp = 8. The results imply that
the phase profile is highly workload dependent. The prototype presented in this thesis can
help to construct and analyze the phase profile of the application resource consumption
and decide the proper resource provisioning strategy.
5.5.2 Phase Prediction Accuracy
As one of the cost determinants, the misprediction penalty is a function of the phase
prediction accuracy. This section evaluates the performance of the phase prediction model
introduced in Section 5.4. A performance measurement, prediction accuracy, is defined as
the ratio of the number of performance snapshots, whose predicted phases match with the
observed phases, to the total number of performance snapshots collected during the testing
period.
Table 5-3 shows the phase prediction accuracies for the performance traces of the
main resources consumed by the SPECseis96 and the WorldCup98 workloads. Generally,
the phase prediction accuracy of each performance feature decreases with an increasing
number of phases. This explains why the penalty curve rises monotonically with the
increasing number of phases in Figure D. With the current implementation, an average of
95% accuracy can be achieved for the network performance traces of the WorldCup98 log
Instead of using all the eigenvectors of the covariance matrix, we may represent the
data in terms of only a few basis vectors of the orthogonal basis. If we denote the matrix
having the K first eigenvectors as rows by A_K, we can create a similar transformation as
seen above:

y = A_K (x - x̄) (2-8)

and

x̂ = A_K^T y + x̄ (2-9)
This means that we project the original data vector onto the K coordinate axes and then
transform the vector back by a linear combination of the basis vectors. This method
minimizes the mean-square error between the data and its representation for a given
number of eigenvectors.
If the data is concentrated in a linear subspace, this method provides a way to
compress the data without losing much information while simplifying the representation.
By picking the eigenvectors with the largest eigenvalues, we lose as little information as
possible in the mean-square sense.
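As a concrete illustration, the projection and reconstruction of Equations 2-8 and 2-9 can be sketched in a few lines of Python (NumPy); the synthetic data and the choice of K here are illustrative only, not the dissertation's experimental setup:

```python
import numpy as np

def pca_reduce(X, K):
    """Project rows of X onto the K leading eigenvectors of the sample
    covariance matrix (Eq. 2-8), then reconstruct (Eq. 2-9)."""
    x_bar = X.mean(axis=0)                    # sample mean
    C = np.cov(X, rowvar=False)               # covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    A_K = eigvecs[:, np.argsort(eigvals)[::-1][:K]].T  # K leading eigenvectors as rows
    Y = (X - x_bar) @ A_K.T                   # y = A_K (x - x_bar)
    X_hat = Y @ A_K + x_bar                   # x_hat = A_K^T y + x_bar
    return Y, X_hat

# The mean-square reconstruction error shrinks as K grows, and vanishes
# (numerically) when all eigenvectors are kept.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6)) @ rng.normal(size=(6, 6))  # correlated data
mse = [np.mean((X - pca_reduce(X, K)[1]) ** 2) for K in (1, 3, 6)]
```

Because the discarded mean-square error equals the sum of the discarded eigenvalues, `mse` decreases monotonically in K.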
2.2.2 k-Nearest Neighbor Algorithm
The k-Nearest Neighbor classifier (k-NN) is a supervised learning algorithm in which a
new instance query is classified based on the majority category of its k nearest neighbors
[26]. It has been used in many applications in the fields of data mining, statistical pattern
recognition, and image processing, among others. The purpose of this algorithm is to
classify a new object based on its attributes and a set of training samples. The classifier does
not fit an explicit model; it is purely memory-based. Given a query point, we find the k
training points closest to the query point. The k-NN classifier decides
the class by considering the votes of the k (an odd number) nearest neighbors. The nearest
[83] S. Gunter and H. Bunke, "An evaluation of ensemble methods in handwritten word
recognition based on feature selection," in Proc. 17th International Conference on
Pattern Recognition, Aug. 2004, vol. 1, pp. 388-392.
[84] G. Jain, A. Ginwala, and Y. Aslandogan, "An approach to text classification using
dimensionality reduction and combination of classifiers," in Proc. IEEE International
Conference on Information Reuse and Integration, Nov. 2004, pp. 564-569.
[85] VMware white paper, "Comparing the MUI, VirtualCenter, and vmkusage."
[86] J. D. Cryer, Time Series Analysis, Duxbury Press, Boston, MA, 1986.
[87] J. O. Rawlings, S. G. Pantula, and D. A. Dickey, Applied Regression Analysis, Springer,
2001.
[88] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer,
2001.
[89] E. Bingham and H. Mannila, "Random projection in dimensionality reduction:
applications to image and text data," in Knowledge Discovery and Data Mining,
2001, pp. 245-250.
[90] L. Sirovich and R. Everson, "Management and analysis of large scientific datasets,"
Int. Journal of Supercomputer Applications, vol. 6, no. 1, pp. 50-68, 1992.
[91] Y. Yang, J. Zhang, and B. Kisiel, "A scalability analysis of classifiers in text
categorization," in ACM SIGIR'03, 2003, pp. 96-103.
[92] J. H. Friedman, F. Baskett, and L. J. Shustek, "An algorithm for finding nearest
neighbors," IEEE Transactions on Computers, vol. C-24, no. 10, pp. 1000-1006, Oct.
1975.
[93] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches
in logarithmic expected time," ACM Transactions on Mathematical Software, vol. 3,
pp. 209-226, 1977.
[94] G. Banga, P. Druschel, and J. Mogul, "Resource containers: A new facility for resource
management in server systems," in Proc. 3rd symposium on Operating System
Design and Implementation, New Orleans, Feb. 1999.
[95] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase,
"Towards a doctrine of containment: Grid hosting with adaptive resource control,"
in Proc. Supercomputing, Tampa, FL, Nov. 2006.
[96] R. Dubes, "How many clusters are best? -an experiment," Pattern Recogn., vol. 20,
no. 6, pp. 645-663, Nov. 1987.
[97] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM
Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
The classification accuracy is measured as the proportion of the total number of
predictions that are correct. A prediction is considered correct if the data is classified to
the same class as its actual class. Table 3-1 shows a sample confusion matrix with L = 2.
There are only two possible classes in this example: positive and negative. Therefore, the
classification accuracy can be calculated as (a+d)/(a+b+c+d).
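This accuracy computation generalizes directly to any L-class confusion matrix; a minimal sketch (the counts below are invented for illustration):

```python
# Classification accuracy from an L x L confusion matrix:
# correct predictions lie on the diagonal.
def accuracy(cm):
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total

cm = [[50, 5],    # row: actual negative -> a (correct), b
      [10, 35]]   # row: actual positive -> c, d (correct)
print(accuracy(cm))  # (a + d) / (a + b + c + d) = 85 / 100 = 0.85
```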
3.3 Autonomic Feature Selection Framework
Figure 3-2 shows the autonomic feature selection framework in the context of
application classification. In this section, we are going to focus on introducing the
classification training center, which enables the self-configurability for online application
classification. The training center has two 1n i' functions: quality assurance of training
data, which enables the classifier to adapt to changing workloads, and systematic feature
selection, which supports automatic feature selection. The training center consists of three
components: the data quality assuror, the feature selector, and the trainer.
3.3.1 Data Quality Assuror
The data quality assuror (DataQA) is responsible for selecting the training data for
application classification. The inputs of the DataQA are the performance snapshots taken
during the application execution. The outputs are the qualified training data with its
class, such as CPU-intensive.
The training data pool consists of representative data of five application classes
including CPU-intensive, I/O-intensive, memory-intensive, network-intensive, and idle.
Training data of each class c is a set of K m-dimensional points, where m is the number
of application-specific performance metrics reported by the monitoring tools. To select the
Table 3-1. Sample confusion matrix with two classes (L = 2)
Actual Predicted
Class Negative Positive
Negative a b
Positive c d
REFERENCES
[1] J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36,
no. 1, pp. 41-50, 2003.
[2] Y. Yang and H. Casanova, "Rumr: Robust scheduling for divisible workloads.," in
Proc. 12th High-Performance Distributed Computing, Seattle, WA, June 22-24, 2003,
pp. 114-125.
[3] J. M. Schopf and F. Berman, "Stochastic scheduling," in Proc. ACM/IEEE
Conference on Supercomputing, Portland, OR, Nov. 14-19, 1999, p. 48.
[4] L. Yang, J. M. Schopf, and I. Foster, "Conservative scheduling: Using predicted
variance to improve scheduling decisions in dynamic environments," in Proc.
ACM/IEEE Conference on Supercomputing, Nov. 15-21, 2003, p. 31.
[5] G. Tesauro, N. Jong, R. Das, and M. Bennani, "A hybrid reinforcement learning
approach to autonomic resource allocation," in Proc. IEEE International Conference
on Autonomic Computing (ICAC'06), 2006, pp. 65-73.
[6] G. Tesauro, R. Das, W. Walsh, and J. Kephart, "Utility-function-driven resource
allocation in autonomic systems," in Proc. Second International Conference on
Autonomic Computing (ICAC'05), 2005, pp. 342-343.
[7] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for
Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience,
New York, NY, Apr. 1991.
[8] J. O. Kephart, "Research challenges of autonomic computing," in Proc. 27th
International Conference on Software Engineering ICSE, May 2005, pp. 15-22.
[9] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification
and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert
Systems, Morgan Kaufmann, San Mateo, CA, 1990.
[10] R. P. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine,
vol. 7, no. 6, pp. 34-45, June 1974.
[11] R. Figueiredo, P. Dinda, and J. Fortes, "A case for grid computing on virtual
machines," in Proc. 23rd International Conference on Distributed Computing
Systems, May 19-22, 2003, pp. 550-559.
[12] S. Pinter, Y. Aridor, S. Shultz, and S. Guenender, "Improving machine virtualization
with 'hotplug memory'," Proc. 17th International Symposium on Computer
Architecture and High Performance Computing, pp. 168-175, 2005.
[13] C. Clark, K. Fraser, S. Hand, J. Hanseny, E. July, C. Limpach, I. Pratt, and
A. Warfield, "Live migration of virtual machines," in Proc. 2nd Symposium on
Networked Systems Design & Implementation (NSDI'05), Boston, MA, 2005.
Figure 4-8. Best predictor selection for trace VM2_Disk
Predictor Class: 1 LAST, 2 AR, 3 SW_AVG
1. It is hard to find a single prediction model among LAST, AR, and SW_AVG
that performs best for all types of resource performance data for a given VM trace. For
example, for VM1's trace data shown in Table 4-1, each of the three models (LAST,
AR, and SW_AVG) outperformed the other two for a subset of the performance metrics. In
this experiment, only the AR model worked best for the trace data of VM3.
2. It is hard to find a single prediction model among the three that performs best
consistently for a given type of resource across all the VM traces. In the experiment, only
the AR model worked best for the CPU performance predictions.
3. The LARPredictor achieved better-than-expert performance using the
mix-of-experts approach for 44.2% of the workload traces. It shows the potential for the
the distances between 9 out of 10 pairs of cluster centroids are bigger in the automatic
selection case than in the expert's manual selection case. This means that competitively
distinct class clusters can be formed with the 2 principal components derived from the
automatically selected features, compared with the expert-selected features.
Second, the PCA and k-NN based classifications were conducted with both the expert
selected 8 features in previous work [53] and the automatically selected features in Section
3.4.1. Table 3-3 shows the confusion matrices of the classification results. If data are
classified to the same classes as their actual classes, the classifications are considered
correct. The classification accuracy is the proportion of the total number of classifications
that were correct. The confusion matrices show that a classification accuracy of 98.05%
can be achieved with the automatically selected feature set, which is similar to the 98.1%
accuracy achieved with the expert-selected feature set. Thus the automatic feature selection
based on Bayesian Networks can reduce the reliance on expert knowledge while
offering competitive classification accuracy compared to manual selection by a human
expert.
In addition, a set of 8 features selected in the 5-class feature selection experiment
in Section 3.4.1 was used to configure the application classifier and the same training
data used in the feature selection experiment were used to train the application classifier.
Then the trained classifier conducted classification for a set of three benchmark programs:
SPECseis96 [29], PostMark and PostMark_NFS [28]. SPECseis96 is a scientific application
which is computing-intensive but also exercises disk I/O in the initial and end phases of its
execution. PostMark originally is a disk I/O benchmark program. In PostMark_NFS,
a network file system (NFS) mounted directory was used to store the files which
were read/written by the benchmark. Therefore, PostMark_NFS performs substantial
network I/O rather than disk I/O. The classification results are shown in Figure 3-9. The
results show that most of the SPECseis96 test data were classified as CPU-intensive, 95%
of the PostMark data were classified as I/O-intensive, and 61% of the PostMark_NFS
Figure 4-7. Best predictor selection for trace VM2_Swap
Predictor Class: 1 LAST, 2 AR, 3 SW_AVG
(P-LAR). The MSE of the P-LAR model shows the upper bound of the prediction
accuracy that can be achieved by the LARPredictor. The MSE of the best predictor
among LAR, LAST, AR, and SW_AVG is highlighted with italic bold numbers.
Table 4-6 shows the best predictor among LAST, AR, and SW_AVG for all the
resource performance metrics and VM traces. The symbol "*" indicates the cases in which
the LARPredictor achieved equal or higher prediction accuracy than the best of the three
predictors. Overall, the AR model performed better than the LAST and the SW_AVG
models.
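As a rough sketch of the three baseline predictors compared above (the window size w, the AR order p, and the least-squares fitting procedure are illustrative assumptions, not the dissertation's exact implementation):

```python
import numpy as np

def last(history):
    """LAST: predict the most recently observed value."""
    return history[-1]

def sw_avg(history, w=4):
    """SW_AVG: sliding-window average over the last w observations."""
    return float(np.mean(history[-w:]))

def ar_predict(history, p=4):
    """AR(p): fit coefficients by least squares on overlapping windows,
    then predict one step ahead."""
    h = np.asarray(history, dtype=float)
    X = np.array([h[i:i + p] for i in range(len(h) - p)])
    y = h[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(h[-p:] @ coef)

# On a smoothly trending trace the AR model extrapolates the trend,
# while LAST lags one step and SW_AVG lags even more.
trace = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9]
print(last(trace), round(sw_avg(trace), 2), round(ar_predict(trace), 2))
```

On a bursty trace the ranking reverses, which is precisely why no single predictor dominates across traces.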
The above experimental results show:
Symposium on Operating Systems Principles, Bolton Landing, NY, Oct. 19-22, 2003,
pp. 74-89.
[71] R. Isaacs and P. Barham, "Performance analysis in loosely-coupled distributed
systems," in Proc. 7th CaberNet Radicals Workshop, Bertinoro, Italy, Oct. 2002.
[72] I. Foster, "The anatomy of the grid: enabling scalable virtual organizations," in
Proc. 1st IEEE/ACM International Symposium on Cluster Computing and the Grid,
2001, pp. 6-7.
[73] R. Wolski, "Dynamically forecasting network performance using the network weather
service," in Journal of Cluster Computing, 1998.
[74] I. Matsuba, H. Suyari, S. Weon, and D. Sato, "Practical chaos time series analysis
with financial applications," in Proc. 5th International Conference on Signal
Processing, Beijing, 2000, vol. 1, pp. 265-271.
[75] P. Magni and R. Bellazzi, "A stochastic model to assess the variability of blood
glucose time series in diabetic patients self-monitoring," IEEE Trans. Biomed. Eng.,
vol. 53, no. 6, pp. 977-985, 2006.
[76] K. Didan and A. Huete, "Analysis of the global vegetation dynamic metrics using
modis vegetation index and land cover products," in IEEE International Geoscience
and Remote Sensing Symposium (IGARSS'04), 2004, vol. 3, pp. 2058-2061.
[77] P. Dinda, "The statistical properties of host load," Scientific Programming, no.
7:3-4, 1999.
[78] P. Dinda, "Host load prediction using linear models," Cluster Computing, vol. 3, no.
4, 2000.
[79] Y. Zhang, W. Sun, and Y. Inoguchi, "CPU load predictions on the computational
grid," in Proc. 6th IEEE International Symposium on Cluster Computing and the
Grid, May 2006, vol. 1, pp. 321-326.
[80] J. Liang, K. Nahrstedt, and Y. Zhou, "Adaptive multi-resource prediction in
distributed resource sharing environment," in Proc. IEEE International Symposium
on Cluster Computing and the Grid, 2004, pp. 293-300.
[81] S. Vazhkudai and J. Schopf, "Predicting sporadic grid data transfers," Proc.
International Symposium on High Performance Distributed Computing, pp. 188-196,
2002.
[82] S. Vazhkudai, J. Schopf, and I. Foster, "Using disk throughput data in predictions
of end-to-end grid data transfers," in Proc. 3rd International Workshop on Grid
Computing, Nov. 2002.
CHAPTER 4
ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE
PREDICTIONS
The integration of multiple predictors promises higher prediction accuracy than the
accuracy that can be obtained with a single predictor. The challenge is how to select the
best predictor at any given moment. Traditionally, multiple predictors are run in parallel
and the one that generates the best result is selected for prediction. In this chapter, we
propose a novel approach for predictor integration based on the learning of historical
predictions. Compared with the traditional approach, it does not require running all the
predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest
Neighbor (k-NN) and Bayesian classification, and dimension reduction techniques such
as Principal Component Analysis (PCA), to forecast the best predictor for the workload
under study based on the learning of historical predictions. Then only the forecasted best
predictor is run for prediction.
4.1 Introduction
Grid computing [72] enables entities to create a Virtual Organization (VO) to share
their computation resources such as CPU time, memory, network bandwidth, and disk
bandwidth. Predicting the dynamic resource availability is critical to adaptive resource
scheduling. However, determining the most appropriate resource prediction model a priori
is difficult due to the multi-dimensionality and variability of system resource usage. First,
applications may exercise different types of resources during their executions.
Some resource usages, such as CPU load, may be relatively smooth, whereas others, such
as network bandwidth, are burstier. It is hard to find a single prediction model which works
best for all types of resources. Second, different applications may have different resource
usage patterns. The best prediction model for a specific resource of one machine may not
work best for another machine. Third, the resource performance fluctuates dynamically due
to the contention created by competing applications. Indeed, in the absence of a perfect
prediction model, the best predictor for any particular resource may change over time.
Table 2-1. Performance metric list
Performance Metrics Description
CPU_System / User Percent CPU_System / User
Bytes_In / Out Number of bytes per second
into / out of the network
IO_BI / BO Blocks sent to / received from
block device (blocks/s)
Swap_In / Out Amount of memory swapped
in / out from / to disk (kB/s)
2.3.2.3 Training and classification
The 3-Nearest Neighbor classifier is used for the application classification in our
implementation. It is trained by a set of carefully chosen applications based on expert
knowledge. Each application represents the key performance characteristics of a class. For
example, an I/O benchmark program, PostMark [28], is used to represent the IO-intensive
class. SPECseis96 [29], a scientific computing intensive program, is used to represent
the CPU-intensive class. A synthetic application, Pagebench, is used to represent the
Paging-intensive class. It initializes and updates an array whose size is bigger than the
memory of the VM, thereby inducing frequent paging activity. Ettcp [30], a benchmark
that measures the network throughput over TCP or UDP between two nodes, is used as
the training application of the Network-intensive class. The performance data of all these
four applications and the idle state are used to train the classifier. For each test data, the
trained classifier calculates its distance to all the training data. The 3-NN classification
identifies only three training data sets with the shortest distance to the test data. Then
the test data's class is decided by the majority vote of the three nearest neighbors.
2.3.3 Post Processing and Application Database
At the end of classification, an m-dimensional class vector c_{1×m} = (c_1, c_2, ..., c_m)
is generated. Each element of the vector c_{1×m} represents the class of the corresponding
application performance snapshot. The majority vote of the snapshot classes determines
the application Class. The complete performance data dimension reduction process is
shown in Figure 2-4. In addition to a single value (Class), the application classifier also
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor, Professor Renato J.
Figueiredo, for his invaluable advice, encouragement, and support. This dissertation would
not have been possible without his guidance and support. My deep appreciation goes to
Professor Jose A.B. Fortes for participating in my supervisory committee and for all the
guidance and opportunities to work in the In-VIGO team that he gave me during my
Ph.D study. My deep recognition also goes to Professor Malay Ghosh and Professor Alan
George for serving on my supervisory committee and for their valuable suggestions. Many
thanks go to Dr. Mazin Yousif and Mr. Robert Carpenter from Intel Corporation for their
valuable input and generous funding for this research. Thanks also go to my colleagues
in the Advanced Computing Information Systems (ACIS) Laboratory for their discussion
of ideas and years of friendship. Last but not least, I owe a special debt of gratitude to
my family. Without their selfless love and support, I cannot imagine what I would have
achieved.
this work. While we have chosen to use the k-NN and Bayesian classification algorithms
due to their prior success in a large number of classification problems, such as handwritten
digit and satellite image scene recognition, our methodology may be generally used with
other types of classification algorithms.
4.5.1 k-Nearest Neighbor
The k-Nearest Neighbor (k-NN) classifier is memory-based. Its training data consist
of the N pairs (x_1, p_1), ..., (x_N, p_N), where p_i is a class label taking values in {1, 2, ..., P}.
In this work, P represents the number of prediction models in the pool. The training
data are represented by a set of points in the feature space, where each point x_i is
associated with its class label p_i. A testing data point x_j is assigned to the class
of the closest training data. For example, given a test data point x_j, the k training points
x_r, r = 1, ..., k, closest in distance to x_j are identified. The test data is classified by the
majority vote among the k (an odd number) neighbors.
Since the features under study, such as CPU percentage and network received bytes/sec,
have different units of measure, all features are normalized to have zero mean and unit
variance [88]. In this work, "closeness" is determined by Euclidean distance (Equation 4-6):

d_ij = ||x_i - x_j|| (4-6)
As a nonparametric method, the k-NN classifier can be applied to different time series
without modification. To address the problem associated with high dimensionality, various
dimension reduction techniques can be used in the data preprocessing.
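A minimal sketch of this normalize-then-vote procedure follows; the toy feature values and class labels are invented for illustration, and test data are standardized with the training statistics:

```python
import numpy as np
from collections import Counter

def knn_classify(train_X, train_y, x, k=3):
    """Vote among the k nearest training points (Euclidean distance,
    Eq. 4-6) after normalizing every feature to zero mean, unit variance."""
    mu, sigma = train_X.mean(axis=0), train_X.std(axis=0)
    Xn, xn = (train_X - mu) / sigma, (x - mu) / sigma
    d = np.linalg.norm(Xn - xn, axis=1)          # d_ij = ||x_i - x_j||
    nearest = np.argsort(d)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy data: CPU percentage and network bytes/sec sit on very different
# scales, so normalization keeps either feature from dominating the distance.
train_X = np.array([[90.0, 1e3], [85.0, 2e3],    # label 1: CPU-bound profile
                    [10.0, 9e5], [15.0, 8e5]])   # label 2: network-bound profile
train_y = np.array([1, 1, 2, 2])
print(knn_classify(train_X, train_y, np.array([80.0, 5e3])))  # -> 1
```

Without the normalization step, the raw bytes/sec column would dominate every distance and the CPU feature would effectively be ignored.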
4.5.2 Bayesian Classification
The Bayesian classifier is based on the well-known probability theorem, Bayes'
formula. Suppose that we know both the prior probabilities P(ω_j) and the conditional
densities p(x|ω_j), where x and ω represent a feature vector and its state (e.g., class),
respectively. The joint probability density can be written in two ways: p(ω_j, x)
[Bar chart: class composition (Idle, I/O, CPU, Network, Paging) of the applications VMD, Sftp, Autobench, NetPIPE, PostMark_NFS, Stream, SPEC M 32, Bonnie, PostMark, SimpleScalar, CH3D, SPEC S 256, and SPEC M 256.]
Figure 2-6. Application class composition diagram
applications SPECseis96 (S) with a small data size, PostMark (P) with a local file
directory, and NetPIPE Client (N) were selected, and three instances of each application
were executed. The scheduler's task was to decide how to allocate the nine application instances
to run on the 3 virtual machines (VM1, VM2 and VM3) in parallel, each of which hosted
3 jobs. The VM4 was used to host the NetPIPE server. There are ten possible schedules
available, as shown in Figure 2-7.
When multiple applications run on the same host machine at the same time, there
are resource contentions among them. Two scenarios were compared: in the first scenario,
the scheduler did not use class information, and one of the ten possible schedules was
To my family.
5.3.4 Finding the Optimal Number of Clusters
One of the most venerable problems in cluster analysis is to find the optimal number
of clusters in the data. Many statistical methods and computational algorithms have been
developed to answer this question using external indices and/or internal indices [96]. The
best number of clusters in the context of the phase analysis discussed in this work is the one
that gives the minimal total cost. The process to find the optimal number of clusters for
the application workload is explained as follows.
Let u_n = u(t_0 + nΔt) denote the resource usage sampled at time t = t_0 + nΔt
during the execution of an application. As shown in Section 5.3.3, when clustering
with input parameter k (i.e., the number of clusters) is performed on a resource usage set
U = {u_1, u_2, ...}, the subset U_i of resource usages that belong to the ith phase can be
written as:

U_i = {u | u ∈ phase i}, 1 ≤ i ≤ k (5-2)
Resource reservation strategy: Phase-based resource reservation is performed. For intervals
whose resource usages belong to the ith phase, the local maximum resource
usage U_i^max of phase i is reserved:

U_i^max = max{u | u ∈ U_i}, 1 ≤ i ≤ k (5-3)

and the total resource reservation R over the whole execution period can be written as

R(k) = Σ_{i=1}^{k} U_i^max × (size of U_i) (5-4)
where k is the number of clusters used by the clustering algorithm and the size of U_i is
defined as the number of elements of the subset U_i. Compared to the conservative
reservation strategy, which reserves the global maximum amount of resources over the
whole execution period, the phase-based reservation strategy can better adapt the resource
reservation to the actual resource usage and reduce the resource reservation cost, as shown in Figure 5-2,
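The reservation computation of Equations 5-2 to 5-4 can be sketched as follows; the simple 1-D k-means, its initialization, and the toy usage trace are illustrative assumptions, not the clustering configuration used in the experiments:

```python
import numpy as np

def phase_reservation(usage, k, iters=25):
    """R(k) under phase-based provisioning (Eqs. 5-2 to 5-4): cluster the
    usage samples into k phases with a simple 1-D k-means, then reserve
    each phase's local maximum for every interval in that phase."""
    u = np.asarray(usage, dtype=float)
    centroids = np.linspace(u.min(), u.max(), k)       # spread initial centers
    for _ in range(iters):
        labels = np.argmin(np.abs(u[:, None] - centroids[None, :]), axis=1)
        for i in range(k):
            if np.any(labels == i):                    # U_i (Eq. 5-2)
                centroids[i] = u[labels == i].mean()
    return sum(u[labels == i].max() * np.sum(labels == i)   # Eqs. 5-3, 5-4
               for i in range(k) if np.any(labels == i))

# A two-phase trace: an idle plateau and a compute plateau. With k = 1 the
# scheme degenerates to the conservative global-maximum reservation.
usage = [5, 6, 5, 7, 6, 90, 95, 92, 94, 91]
print(phase_reservation(usage, 1))  # 95 * 10 = 950.0
print(phase_reservation(usage, 2))  # 7*5 + 95*5 = 510.0, cheaper
```

Since each phase maximum is bounded by the global maximum, R(k) never exceeds the conservative reservation, and it shrinks whenever distinct usage plateaus exist.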
[57] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification
and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, Apr.
2005.
[58] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference, Morgan Kaufmann Publishers, San Francisco, CA, 1988.
[59] T. Dean, K. Basye, R. Chekaluk, S. Hyun, M. Lejter, and M. Randazza, "Coping
with uncertainty in a control system for navigation and exploration," in Proc. 8th
National Conference on Artificial Intelligence, Boston, MA, July 29-Aug. 3, 1990,
pp. 1010-1015.
[60] D. Heckerman, "Probabilistic similarity networks," Tech. Rep., Depts. of Computer
Science and Medicine, Stanford University, 1990.
[61] D. J. Spiegelhalter, R. C. Franklin, and K. Bull, "Assessment criticism and
improvement of imprecise subjective probabilities for a medical expert system,"
in Proc. Fifth Workshop on Uncertainty in Artificial Intelligence, 1989, pp. 335-342.
[62] E. Charniak and D. McDermott, Introduction to Artificial Intelligence,
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1985.
[63] T. S. Levitt, J. Mullin, and T. O. Binford, "Model-based influence diagrams for
machine vision," in Proc. 5th Workshop on Uncertainty in Artificial Intelligence,
1989, pp. 233-244.
[64] R. E. Neapolitan, Probabilistic Reasoning in Expert Systems: Theory and Algorithms,
John Wiley & Sons, Inc., New York, NY, USA, 1990.
[65] K. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin
nearest neighbor classification," in Proc. 19th Annual Conference on Neural
Information Processing Systems, Vancouver, Canada, Dec. 2005.
[66] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, pp.
271-274, 1998.
[67] B. Ziebart, D. Roth, R. Campbell, and A. Dey, "Automated and adaptive threshold
setting: Enabling technology for autonomy and self-management," in Proc. 2nd
International Conference on Autonomic Computing, June 13-16, 2005, pp. 204-215.
[68] P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature
similarity," IEEE Trans. Pat. Anal. Mach. Intel., vol. 24, no. 3, pp. 301-312, Mar.
2002.
[69] W. Lee, S. J. Stolfo, and K. W. Mok, "Adaptive intrusion detection: A data mining
approach," Artificial Intelligence Review, vol. 14, no. 6, pp. 533-567, 2000.
[70] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen,
"Performance debugging for distributed systems of black boxes," in Proc. 19th ACM
LEARNING-AIDED SYSTEM PERFORMANCE MODELING IN SUPPORT OF
SELF-OPTIMIZED RESOURCE SCHEDULING IN
DISTRIBUTED ENVIRONMENTS
By
JIAN ZHANG
A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA
2007
CHAPTER 5
APPLICATION RESOURCE DEMAND PHASE ANALYSIS AND PREDICTIONS
Profiling the execution phases of applications can help to optimize the utilization
of the underlying resources. This chapter presents a novel system level application-
resource-demand phase analysis and prediction approach in support of on-demand
resource provisioning. This approach explores the large-scale behavior of applications'
resource consumption, followed by analysis using a set of algorithms based on clustering. The
phase profile, which learns from historical runs, is used to classify and predict future
phase behavior. This process takes into consideration applications' resource consumption
patterns, phase transition costs and penalties associated with Service-Level Agreements
(SLA) violations.
5.1 Introduction
Recently there has been renewed interest in using virtual machines (VMs) as
containers [94] for applications' execution environments, both in academia and industry
[11] [16] [95]. This is motivated by the idea of providing computing resources as a utility
and charging the users for a specific usage. For example, in August 2006, Amazon
launched its Beta version of VM-based Elastic Compute Cloud (EC2) web service. EC2
allows users to rent virtual machines with specific configurations from Amazon and can
support changes in resource configurations on the order of minutes. In systems that
allow users to reserve and reconfigure resource allocations and charge based upon such
allocations, users have an incentive to request no more than the amount of resources an
application needs. A question which arises here is: how to adapt the resource provisioning
to the changing workload?
In this chapter, we focus on modeling and analyzing the phase behavior of long-running
applications. The modeling is based on monitoring and learning the applications' historical
resource consumption patterns, which likely vary over time. Understanding such behavior
is critical to optimizing resource scheduling. To self-optimize the configuration of an
Figure 3-9. Classification results of benchmark programs. A) SPECseis96 B) PostMark
C) PostMark_NFS. Principal components 1 and 2 are the principal component
metrics extracted by PCA.
3 AUTONOMIC FEATURE SELECTION FOR APPLICATION CLASSIFICATION
3.1 Introduction
3.2 Statistical Inference
3.2.1 Feature Selection
3.2.2 Bayesian Network
3.2.3 Mahalanobis Distance
3.2.4 Confusion Matrix
3.3 Autonomic Feature Selection Framework
3.3.1 Data Quality Assuror
3.3.2 Feature Selector
3.3.3 Trainer
3.4 Experimental Results
3.4.1 Feature Selection and Classification Accuracy
3.4.2 Classification Validation
3.4.3 Training Data Quality Assurance
3.5 Related Work
3.6 Conclusion
4 ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE
PREDICTIONS
4.1 Introduction
4.2 Related Work
4.3 Virtual Machine Resource Prediction Overview
4.4 Time Series Models for Resource Performance Prediction
4.5 Algorithms for Prediction Model Selection
4.5.1 k-Nearest Neighbor
4.5.2 Bayesian Classification
4.5.3 Principal Component Analysis
4.6 Learning-Aided Adaptive Resource Predictor
4.6.1 Training Phase
4.6.2 Testing Phase
4.7 Empirical Evaluation
4.7.1 Best Predictor Selection
4.7.2 Virtual Machine Performance Trace Prediction
4.7.2.1 Performance of k-NN based LARPredictor
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based LARPredictor
4.7.2.3 Performance comparison of the LARPredictors and the cumulative-MSE based predictor used in the NWS
4.7.3 Discussion
4.8 Conclusion
Table 3-3. Confusion matrix of classification results with expert-selected and
automatically selected feature sets: A) Automatic, B) Expert
Actual Classified as
Class Idle CPU IO Net Mem
Idle 4938 0 62 0 0
CPU 231 4746 23 0 0
IO 20 86 2888 6 0
Net 0 12 8 4980 0
Mem 0 0 0 0 5000
A
Actual Classified as
Class Idle CPU IO Net Mem
Idle 4962 0 38 0 0
CPU 4 4882 10 0 104
IO 20 10 2797 0 173
Net 0 0 24 4970 6
Mem 3 0 36 0 4961
B
The bold numbers along the diagonal are the number of
correctly classified data.
In this experiment, if we know that the application belongs to either the I/O-intensive or
the memory-intensive class, then with two selected features a higher classification accuracy
can be achieved than in the 5-class case. This shows the potential of using pair-wise
classification to improve classification accuracy in multi-class cases. Using the pair-wise
approach for multi-class classification is a topic of future research.
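Accuracy figures such as those above can be read directly off a confusion matrix. The sketch below uses the counts from Table 3-3A; the function names are our own, not from the dissertation's implementation:

```python
# Overall and per-class accuracy from a confusion matrix.
# Rows: actual class; columns: predicted class (counts from Table 3-3A).
CLASSES = ["Idle", "CPU", "IO", "Net", "Mem"]
CONFUSION = [
    [4938,    0,   62,    0,    0],
    [ 231, 4746,   23,    0,    0],
    [  20,   86, 2888,    6,    0],
    [   0,   12,    8, 4980,    0],
    [   0,    0,    0,    0, 5000],
]

def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (correctly classified)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_accuracy(matrix):
    """Recall for each class: diagonal count over the row total."""
    return {CLASSES[i]: matrix[i][i] / sum(row)
            for i, row in enumerate(matrix)}
```

For Table 3-3A this gives an overall accuracy of 22552/23000, about 98.1%.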
3.4.2 Classification Validation
This set of experiments aims to validate the feature selection experiment results
with the Principal Component Analysis (PCA) and k-Nearest Neighbor (k-NN) based
application classification framework described in [53].
First, the training data distributions based on principal components, which are
derived from the automatically selected features in Section 3.4.1 and the manually selected
features in previous work [53], are shown in Figure 3-8. Distances between each pair
of class centroids in Figure 3-8 are calculated and plotted in Figure 3-7. It shows that
Multi-dimensionality of application resource consumption: An application's execution
resource requirement is often multi-dimensional. That is, different applications may stretch
the use of CPU, memory, hard disk or network bandwidth to different degrees. The
knowledge of which kind of resource is the key component in the resource consumption
pattern can assist resource scheduling.
Multi-stage applications: There are cases where long-running scientific applications
exhibit multiple execution stages. Different execution stages may stress different kinds of
resources to different degrees, hence characterizing an application requires knowledge of
its dynamic run-time behavior. The identification of such stages presents opportunities to
exploit better matching of resource availability and application resource requirement across
different execution stages and across different nodes. For instance, with process migration
techniques [20] [21] it is possible to migrate an application during its execution for load
balancing.
The above characteristics of grid applications present a challenge to resource
scheduling: How to learn and make use of an application's multi-dimensional resource
consumption patterns for resource allocation? This chapter introduces a novel approach
to solve this problem: application classification based on the feature selection algorithm,
Principal Component Analysis (PCA), and K-Nearest Neighbor (k-NN) classifier [22][23].
The PCA is applied to reduce the dimensionality of application performance metrics, while
preserving the maximum amount of variance in the metrics. Then, the k-Nearest Neighbor
algorithm is used to categorize the application execution states into different classes
based on the application's resource consumption pattern. The learned application class
information is used to assist the resource scheduling decision-making in heterogeneous
computing environments.
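The classification step can be illustrated with a minimal k-NN sketch. The two-dimensional "principal component" training points and class labels below are invented for illustration; the dissertation's actual feature vectors and choice of k differ:

```python
import math
from collections import Counter

def knn_classify(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training
    samples (Euclidean distance in the reduced feature space).
    `train` is a list of (feature_vector, class_label) pairs."""
    dist = lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    neighbors = sorted(train, key=lambda s: dist(s[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 2-D points in PCA space: CPU-intensive vs I/O-intensive snapshots.
train = [((2.0, 0.1), "CPU"), ((1.8, 0.3), "CPU"), ((2.2, 0.0), "CPU"),
         ((0.1, 1.9), "IO"),  ((0.3, 2.1), "IO"),  ((0.0, 1.7), "IO")]
print(knn_classify(train, (1.9, 0.2)))  # "CPU"
```

In the actual framework the inputs to k-NN are the principal components extracted from the monitored performance metrics, not raw metrics.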
The VMPlant service introduced in Section 1.4.2 provides automated cloning
and configuration of application-centric Virtual Machines (VMs). Problem-solving
environments such as In-VIGO [24] can submit requests to the VMPlant service, which
where

\bar{x}_c(i) = \frac{1}{K_c} \sum_{k=1}^{K_c} x_k(i), \quad i = 1, 2, ..., n    (3-6)

is called the centroid of the cluster C_c, with K_c the number of samples in C_c.
The training data selection is a three-step process. First, the DataQA extracts the n
out of m metrics of the input performance snapshot to form a training data candidate.
Thus each candidate is represented by an n-dimensional point x = (x_1, x_2, ..., x_n).
Second, it evaluates whether the input candidate is qualified to be training data
representing one of the application classes. Finally, the qualified training data candidate
is associated with a scalar value Class, which defines the application class.
The first step is straightforward. In the second and third steps, the Mahalanobis
distance between the training data candidate x and the centroid \mu_c of cluster C_c is
calculated as follows:

d_c(x) = ((x - \mu_c)^T \Sigma_c^{-1} (x - \mu_c))^{1/2}    (3-7)

where c = 1, 2, ..., 5 represents the application class and \Sigma_c^{-1} denotes the inverse
covariance matrix of the cluster C_c. The distance from the training data candidate x to
the boundary between two class clusters, for example C_1 and C_2, is |d_1(x) - d_2(x)|. If
|d_1(x) - d_2(x)| = 0, the candidate lies exactly on the boundary between
classes 1 and 2. The further away the candidate is from the class boundaries, the better it
can represent a class; in other words, it is less likely to be misclassified.
Therefore, the DataQA calculates the distance from the candidate to the boundaries of all
possible pairs of classes. If the minimal distance to the class boundaries,
min(|d_1 - d_2|, |d_1 - d_3|, ..., |d_4 - d_5|), is bigger than a predefined threshold \tau, the
corresponding m-dimensional snapshot of the candidate is determined to be qualified training data of
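The DataQA filtering rule can be sketched as follows, simplifying Eq. 3-7 to diagonal covariance matrices for brevity; the centroids, variances, and threshold below are made up for illustration:

```python
import math

def mahalanobis_diag(x, centroid, variances):
    """Mahalanobis distance under a diagonal-covariance assumption
    (a simplification of Eq. 3-7, where Sigma_c is diagonal)."""
    return math.sqrt(sum((xi - mi) ** 2 / vi
                         for xi, mi, vi in zip(x, centroid, variances)))

def is_qualified(x, clusters, tau):
    """A candidate qualifies as training data when its minimum
    distance-to-boundary min |d_i(x) - d_j(x)| over all class pairs
    exceeds the threshold tau."""
    d = [mahalanobis_diag(x, c["centroid"], c["var"]) for c in clusters]
    gaps = [abs(d[i] - d[j])
            for i in range(len(d)) for j in range(i + 1, len(d))]
    return min(gaps) > tau

clusters = [{"centroid": (2.0, 0.2), "var": (0.2, 0.1)},   # e.g. CPU class
            {"centroid": (0.2, 2.0), "var": (0.1, 0.2)}]   # e.g. IO class
print(is_qualified((1.9, 0.3), clusters, tau=1.0))  # True: far from boundary
```

A candidate equidistant from two centroids has a zero boundary distance and is rejected regardless of tau.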
[14] "Vmotion," http://www.vmware.com/products/vi/vc/vmotion.html.
[15] M. Zhao, J. Zhang, and R. Figueiredo, "Distributed file system support for virtual
machines in grid computing," Proc. 13th International Symposium on High Perfor-
mance Distributed Computing, pp. 202-211, 2004.
[16] I. Krsul, A. Ganguly, J. Zhang, J. Fortes, and R. Figueiredo, "Vmplants: Providing
and managing virtual machine execution environments for grid computing," in Proc.
Supercomputing, Washington, DC, Nov. 6-12, 2004.
[17] J. Sugerman, G. Venkitachalam, and B. Lim, "Virtualizing i/o devices on vmware
workstation's hosted virtual machine monitor," in Proc. USENIX Annual Technical
Conference, 2001.
[18] J. Dike, "A user-mode port of the linux kernel," in Proc. 4th Annual Linux Showcase
and Conference, USENIX Association, Atlanta, GA, Oct. 2000.
[19] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid
computing," in Proc. 3rd USENIX Virtual Machine Research and Technology
Symposium, May 2004.
[20] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and
migration of UNIX processes in the Condor distributed processing system," Tech.
Rep. UW-CS-TR-1346, University of Wisconsin-Madison Computer Sciences
Department, Apr. 1997.
[21] A. Barak, O. Laden, and Y. Yarom, "The now mosix and its preemptive process
migration scheme," Bulletin of the IEEE Technical Committee on Operating Systems
and Application Environments, vol. 7, no. 2, pp. 5-11, 1995.
[22] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley-Interscience, New
York, NY, 2001, 2nd edition.
[23] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning," Artificial
Intelligence Review, vol. 11, no. 1-5, pp. 11-73, 1997.
[24] S. Adabala, V. Chadha, P. Chawla, R. J. O. Figueiredo, J. A. B. Fortes, I. Krsul,
A. M. Matsunaga, M. O. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From
virtualized resources to virtual computing grids: the in-vigo system," Future
Generation Comp. Syst., vol. 21, no. 6, pp. 896-909, 2005.
[25] L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and
redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224,
Oct. 2004.
[26] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf.
Theory, vol. 13, no. 1, pp. 21-27, Jan. 1967.
Figure 4-13. Predictor performance comparison (VM5). Compared predictors:
P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and W-Cum.MSE.
Performance metric IDs:
1 CPU_used_sec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write
VM: Virtual Machine
VMM: Virtual Machine Monitor
DB: Database
ARM: Application Resource Manager
CQ: Clustering Quality
Figure 5-1. Application resource demand phase analysis and prediction prototype.
The phase analyzer analyzes the performance data collected by the monitoring
agent to find out the optimal number of phases n E [1, m]. The output phase
profile is stored in the application phase database (DB) and will be used as
training data for the phase predictor. The predictor predicts the next phase of
the application resource usage based on the learning of its historical phase
behaviors. The predicted phase can be used to support the application
resource manager's (ARM's) decisions regarding resource provisioning. The
auditor monitors and evaluates the performance of the analyzer and predictor
and orders re-training of the phase predictor with the updated workload profile
when the performance measurements drop below a predefined threshold.
the application containers. The collected performance data is stored in the performance
database.
The phase analyzer retrieves the time-series VM performance data, which are
identified by vmID, FeaturelD, and a time window (ts, t), from the performance database.
Then it performs phase analysis using algorithms based on clustering to check whether
there is a phase behavior in the application's resource consumption patterns. If so, it
continues to find out how many phases in a numeric range are best in terms of providing
the minimal resource reservation costs. The output phase profile, which consists of the
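The clustering step at the heart of the phase analyzer can be sketched with a plain 1-D k-means; the dissertation's analyzer additionally searches for the optimal number of phases, and the trace values below are synthetic:

```python
def kmeans_1d(values, k, iters=50):
    """Cluster a 1-D resource-usage trace into k phases with plain
    k-means. Centers start evenly spread over the value range; returns
    the sorted phase centroids."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * (j + 0.5) / k for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in values:
            i = min(range(k), key=lambda j: abs(v - centers[j]))
            groups[i].append(v)
        # Empty groups keep their previous center.
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sorted(centers)

# Synthetic CPU trace with a low phase (~10%) and a high phase (~80%).
trace = [10, 11, 9, 12, 10, 80, 82, 79, 81, 78, 10, 9, 11, 80, 81]
print(kmeans_1d(trace, k=2))  # two phase centroids, ~10 and ~80
```

Re-running the clustering for several k and scoring each result is one way to pick the number of phases, which is the role of the clustering-quality (CQ) measure in the prototype.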
This chapter introduces a Learning-Aided Adaptive Resource Predictor (LARPre-
dictor), which can dynamically choose the best prediction model suited to the workload at
any given moment. By integrating the prediction results generated by the best predictor
of each moment during the application run, the LARPredictor can outperform any single
predictor in the pool. It differs from the traditional mix-of-expert resource prediction
approach in that it does not require running multiple prediction models in parallel all
the time to identify the best predictors. Instead, Principal Component Analysis
(PCA) and a classification algorithm such as k-Nearest Neighbor (k-NN) are used to
forecast the best prediction model from a pool based on the monitoring and learning of
the historical resource availability and the corresponding prediction performance. The
learning-aided adaptive resource performance prediction can be used to support dynamic
VM provisioning by providing accurate prediction of the resource availability of the host
server and the resource demand of the applications that are reflected by the hosting
virtual machines.
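A toy sketch of the predictor-pool idea follows. The pool below (last-value, mean, sliding-window) and the hand-made volatility rule are only stand-ins for illustration; the actual LARPredictor learns the mapping from workload statistics to the best-suited predictor with PCA and k-NN rather than using a fixed rule:

```python
from statistics import mean, pstdev

# A small pool of one-step-ahead time-series predictors.
PREDICTORS = {
    "LAST": lambda h: h[-1],            # last observed value
    "MEAN": lambda h: mean(h),          # running mean
    "SW_AVG": lambda h: mean(h[-4:]),   # sliding-window average
}

def choose_predictor(history):
    """Toy stand-in for the learned model-selection step: pick a
    predictor based on a statistical property of the recent window
    (here, relative variability)."""
    window = history[-8:]
    volatility = pstdev(window) / (abs(mean(window)) + 1e-9)
    return "MEAN" if volatility > 0.5 else "LAST"

def predict_next(history):
    name = choose_predictor(history)
    return name, PREDICTORS[name](history)

print(predict_next([50, 51, 50, 52, 51, 50, 51, 52]))  # ("LAST", 52)
```

The point of the learned selector is that only the chosen predictor needs to run at each step, instead of the whole pool in parallel.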
Our experimental results based on the analysis of a set of virtual machine trace data
show:
1. The best prediction model is workload specific. In the absence of a perfect
prediction model, it is hard to find a single predictor which works best across virtual
machines which have different resource usage patterns.
2. The best prediction model is resource specific. It is hard to find a single predictor
which works best across different resource types.
3. The best prediction model for a specific type of resource of a given VM trace varies
as a function of time. The LARPredictor can adapt the predictor selection to the change
of the resource consumption pattern.
4. In the experiments with a set of trace data, the LARPredictor outperformed the
observed single best predictor in the pool for 44.2% of the traces and outperformed the
cumulative-MSE based prediction model used in the Network Weather Service system
on-line resource reprovisioning on the same cluster node, so the transition time can be
virtually close to zero (C = 0). In this case, 10 phases can be used. If the transition takes
8 seconds (C = 156), which is achievable with intra-cluster VM migration for resource
reprovisioning, four phases work the best. When the transition cost exceeds the level that
the reduced resource reservation can justify for the workload under study, the total cost is
an increasing function of the number of phases. In this case, it is better to fall back from
the phase-based resource reservation strategy to the conservative one.
The impact of inaccuracies introduced by the phase predictor is shown in Figure F. In
addition to the resource reservation costs and the phase transition costs, this experiment
also took the phase mis-prediction penalty costs into account while calculating the total
cost. For example, for each unit of down-size mis-predicted resource, a penalty of 8 times
(Cp = 8) the unit resource cost is imposed. Comparing Figure E to Figure F, we can
see that adding the penalty into the cost model increases the final costs to the user for the
same set of k and C and potentially reduces the workload's best number of phases k_best
for the same set of C and Cp.
Finally, a total cost ratio \rho is defined as the ratio of the total cost using k phases,
TC'(k), to the total cost of 1 phase, TC'(1):

\rho = TC'(k) / TC'(1).    (5-11)

Intuitively, \rho measures the cost savings achieved using the phase-based reservation
strategy over the conservative one. Thus, the smaller the value of \rho, the more efficient
the phase-based reservation scheme. Table 5-2 gives a sample total cost schedule (C = 52
and Cp = 8) for each of the eight performance features of SPECseis96. It shows that
by changing the resource provisioning strategy from the conservative approach (k = 1)
to the phase-based provisioning (k = 3), a 29.5% total cost reduction for CPU usage can
be achieved. For spiky trace data such as disk I/O and memory usage, the total cost
reduction is even higher.
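The cost comparison can be made concrete with hypothetical numbers; everything below is invented for illustration, while the real R(k), TR(k), and P(k) come from the measured traces:

```python
def total_cost(reserved, transitions, penalty_units, C, Cp):
    """Total cost of a k-phase reservation scheme in the style of the
    chapter's cost model: reserved resources R(k), plus transition
    cost C*TR(k), plus misprediction penalty Cp*P(k)."""
    return reserved + C * transitions + Cp * penalty_units

def cost_ratio(tc_k, tc_1):
    """Eq. 5-11: rho = TC'(k) / TC'(1). rho < 1 means the k-phase
    scheme is cheaper than conservative single-phase reservation."""
    return tc_k / tc_1

# Hypothetical schedule: conservative (k=1) vs. 3-phase reservation.
tc1 = total_cost(reserved=10000, transitions=0, penalty_units=0, C=52, Cp=8)
tc3 = total_cost(reserved=6500, transitions=6, penalty_units=40, C=52, Cp=8)
print(cost_ratio(tc3, tc1))  # 0.7132, i.e. ~29% cost reduction
```

A conservative scheme pays nothing for transitions or mispredictions but over-reserves; the phase-based scheme trades smaller reservations for those two extra cost terms.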
Rsc (Actual)), the predicted resource usage by the AR prediction (Predicted Rsc), and the
resource reservation based on the predicted usage (Rsvd Rsc (Predict)).
Figures C and D show that, with an increasing number of phases, two of the
determinants in the cost model, the number of phase transitions TR(k) and the
misprediction penalty P(k), increase monotonically. The other determinant of the cost
model, the amount of reserved resources R(k), is shown by the lowest curve with index
C = 0 in Figure E. It indicates that, with an increasing number of phases, the total reserved
resources of the training set decrease monotonically. This is because, with an
increasing number of phases, the resource allocation can be performed at time scales of
finer granularity. However, there is a diminishing return on the increased number of phases
because of the increasing phase transition costs and misprediction penalties.
In the first analysis, we assume each resource reservation scheme to be clairvoyant,
i.e., it reserves resources based on exact knowledge of future workload requirements. This
assumption eliminates the impact of inaccuracies introduced by the phase predictor.
In this case, Equation (5-6), which takes the resource reservation cost and the phase
transition cost into account while deciding the optimal number of phases, can be
applied as shown in Figure E. In this figure, the total cost over the whole testing period
is measured by CPU usage in percentage. The discount factor C denotes the CPU
percentage that each phase transition will cost: C = CPU(%) x TransitionDuration.
For example, the bottom line of C = 0 shows the case of no transition cost, which gives
the lower bound of the total cost. As another instance, C = 260 implies a 13-second
transition period (2.6 intervals x 5 secs/interval) with the assumption of 100% CPU
consumption during the transition period. When the discount factor C increases from 0 to
260, the best number of phases k_best, which can provide the lowest total cost, decreases
gradually from 10 to 2. The phase profile depicted in Figure E can be used to decide the
number of phases that should be used in the phase-based resource reservation to minimize
the total cost with given available transition options. For example, VMware ESX supports
feature subset, it calls the classifier to perform classification of the data in the updated
training data pool using the old and new feature subsets respectively. Then it compares
the classification accuracy of the two. If the accuracy achieved by the new feature subset
is higher than the one achieved by the previous subset, the selected feature is updated.
Otherwise, it remains the same.
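The update rule above amounts to a guarded swap; `evaluate` and the accuracy table in this sketch are hypothetical stand-ins for the classifier runs on the updated training pool:

```python
def update_feature_subset(current, candidate, evaluate):
    """Adopt the candidate feature subset only if it classifies the
    updated training pool more accurately than the current subset;
    otherwise keep the current one. `evaluate(subset)` returns the
    classification accuracy achieved with that subset."""
    return candidate if evaluate(candidate) > evaluate(current) else current

# Hypothetical accuracies of two subsets on the current training pool.
accuracy = {("cpu_user", "io_wait"): 0.92, ("cpu_user", "net_tx"): 0.88}
chosen = update_feature_subset(("cpu_user", "net_tx"),
                               ("cpu_user", "io_wait"),
                               lambda s: accuracy[s])
print(chosen)  # ("cpu_user", "io_wait")
```

The guard keeps the selector from regressing: a newly proposed subset that performs worse on the refreshed pool is simply discarded.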
3.4 Experimental Results
We have implemented a prototype for the feature selector based on Matlab. This
section shows the experimental results of feature selection for data collected during the
execution of a set of applications representative of each class (CPU, I/O, memory and
network intensive) and the classification accuracy achieved. In addition, statistical analysis
of the performance metrics was conducted to justify the use of the Mahalanobis
distance in the training data quality assurance process.
In the experiments, all the applications were executed in a VMware GSX 2.5 virtual
machine with 256MB memory. The virtual machine was hosted on an Intel(R) Xeon(TM)
dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. The CTC and application
classifier were running on an Intel(R) Pentium(R) III 750MHz machine with 256MB RAM.
3.4.1 Feature Selection and Classification Accuracy
Two sets of experiments were conducted offline to evaluate our feature selection
algorithm. In both experiments, the training data, described by 20 performance metrics,
consists of performance snapshots of applications belonging to different classes. In the
experiments, tenfold cross validation was performed. The training data was randomly
divided into two parts: a combination of 50% of the data from each class was used to
train the feature selector (training set) to derive the feature subset, and the other 50% was
used as a test set to validate the features selected by calculating its classification accuracy.
The first experiment was designed to show the relationship between classification
accuracy and the number of features selected. The second experiment was designed to
P(\omega_j|x)p(x) = p(x|\omega_j)P(\omega_j). Rearranging these leads us to Bayes' formula:

P(\omega_j|x) = p(x|\omega_j)P(\omega_j) / p(x)    (4-7)

where in this case of c categories

p(x) = \sum_{j=1}^{c} p(x|\omega_j)P(\omega_j).    (4-8)

Then, the posterior probabilities P(\omega_j|x) can be computed from p(x|\omega_j) by Bayes'
formula. In addition, Bayes' formula can be expressed informally in English by saying that

posterior = (likelihood x prior) / evidence.    (4-9)
The multivariate normal density has been applied successfully to a number of
classification problems. In this work the feature vector can be modeled as a multivariate
normal random variable.
The general multivariate normal density in d dimensions is written as

p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right),    (4-10)

where x is a d-component column vector, \mu is the d-component mean vector, \Sigma is the
d-by-d covariance matrix, and |\Sigma| and \Sigma^{-1} are its determinant and inverse,
respectively. Further, we let (x - \mu)^T denote the transpose of x - \mu.
The minimization of the probability of error can be achieved by use of the discriminant
functions

g_i(x) = \ln p(x|\omega_i) + \ln P(\omega_i).    (4-11)
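Classification with these discriminant functions can be sketched as follows, assuming diagonal covariance matrices and made-up class parameters (means, variances, and priors):

```python
import math

def discriminant(x, mu, var, prior):
    """g_i(x) = ln p(x|w_i) + ln P(w_i) for a diagonal-covariance
    multivariate normal class model (a simplification of Eq. 4-10/4-11)."""
    log_like = sum(-0.5 * math.log(2 * math.pi * v)
                   - (xi - mi) ** 2 / (2 * v)
                   for xi, mi, v in zip(x, mu, var))
    return log_like + math.log(prior)

def classify(x, classes):
    """Pick the class whose discriminant value is largest."""
    best = max(classes,
               key=lambda c: discriminant(x, c["mu"], c["var"], c["prior"]))
    return best["name"]

classes = [{"name": "CPU", "mu": (2.0, 0.2), "var": (0.2, 0.1), "prior": 0.5},
           {"name": "IO",  "mu": (0.2, 2.0), "var": (0.1, 0.2), "prior": 0.5}]
print(classify((1.8, 0.4), classes))  # "CPU"
```

Because ln is monotonic, dropping the shared evidence term p(x) does not change which class maximizes the discriminant.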
used for performance prediction. VM2 was used in the experiments. Fig. 4-5 shows the
predictor selections for CPU fifteen minute load average during a 12 hour period with a
sampling interval of 5 minutes. The top plot shows the observed best predictor by running
three prediction models in parallel. The middle plot shows the predictor selection of the
LARPredictor and the bottom plot shows the cumulative-MSE based predictor selection
used in the NWS. Similarly the predictor selection results of the trace data of other
resources are shown as follows: Network packets in per second in Fig. 4-6, total amount of
swap memory in Fig. 4-7, and total disk space in Fig. 4-8.
These experimental results show that the best prediction model for a specific
type of resource of a given trace varies as a function of time. In the experiment, the
LARPredictor can better adapt the predictor selection to the changing workload than
the cumulative-MSE based approach presented in the NWS. The LARPredictor's average
best-predictor forecasting accuracy over all the performance traces of the five virtual
machines is 20.1% higher than the accuracy achieved by the
cumulative-MSE based predictor used in the NWS for the workload studied.
4.7.2 Virtual Machine Performance Trace Prediction
This set of experiments is used to check the prediction performance of the LARPre-
dictor. Section 4.7.2.1 shows the prediction accuracy of the k-NN based LARPredictor
and all the predictors in the pool. Section 4.7.2.2 compares the prediction accuracy and
execution time of the k-NN based LARPredictor and the Bayesian-classifier based LARPredictor.
In addition, Section 4.7.2.3 benchmarks the performance of the LARPredictors and the
cumulative-MSE based prediction model used in the NWS.
In the experiments, ten-fold cross validation was performed for each set of time series
data. A time stamp was randomly chosen to divide the performance data of a virtual
machine into two parts: 50% of the data was used to train the LARPredictor and the
other 50% was used as a test set to measure the prediction performance by calculating its
prediction MSE.
optimality of the decision. The key challenge here is how to find a representation of the
application, which can describe multiple dimensions of resource consumption, in a simple
way. This section describes how the pattern classification techniques, the PCA and the
K-NN classifier, are applied to achieve this goal.
A pattern classification system consists of pre-processing, feature extraction,
classification, and post-processing. The pre-processing and feature extraction are known
to significantly affect the classification, because errors caused by wrong features may
propagate to the subsequent steps and remain predominant in the overall classification
error. In this work, a set of application performance metrics is chosen based on expert
knowledge and the principle of increasing relevance and reducing redundancy [25].
2.2.1 Principal Component Analysis
Principal Component Analysis (PCA) [22] is a linear transformation representing
data in a least-square sense. It is designed to capture the variance in a dataset in terms of
principal components and reduce the dimensionality of the data. It has been widely used
in data analysis and compression.
When a set of vector samples is represented by a set of lines passing through
the mean of the samples, the best-fitting linear directions are the eigenvectors of the
scatter matrix, the so-called "principal components," as shown in Figure 2-1. The
corresponding eigenvalues represent the contribution to the variance of the data. When
the k largest eigenvalues of n principal components are chosen to represent the data, the
dimensionality of the data reduces from n to k.
Principal component analysis is based on the statistical representation of a random
variable. Suppose we have a random vector population x, where

x = (x_1, ..., x_n)^T    (2-1)

and the mean of that population is denoted by

\mu_x = E\{x\}.    (2-2)
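The first PCA step can be sketched for the two-dimensional case using the closed-form eigendecomposition of the 2x2 covariance matrix; the sample points below are made up, and real performance data has many more dimensions:

```python
import math

def principal_component_2d(samples):
    """First principal component of 2-D data: the eigenvector of the
    2x2 covariance matrix with the largest eigenvalue (closed form)."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / n
    syy = sum((y - my) ** 2 for _, y in samples) / n
    sxy = sum((x - mx) * (y - my) for x, y in samples) / n
    # Largest eigenvalue of [[sxx, sxy], [sxy, syy]].
    lam = 0.5 * (sxx + syy + math.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2))
    # Corresponding eigenvector, normalized to unit length.
    vx, vy = lam - syy, sxy
    norm = math.hypot(vx, vy) or 1.0
    return lam, (vx / norm, vy / norm)

# Points spread along the y = x direction: PC1 should be ~(0.707, 0.707).
data = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.0)]
lam, pc1 = principal_component_2d(data)
```

Projecting each sample onto pc1 yields the one-dimensional representation that preserves the maximum variance, which is exactly the dimensionality reduction used before classification.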
Figure 3-1. Sample Bayesian network generated by the feature selector. Visible nodes
include performance metric nodes (e.g., pksin, load fifteen) and the
Application Class node.
assertions that allow the construction of a global gpdf from the local gpdfs. As shown
previously, the chain rule of probability can be used to ascertain these values:

p(x_1, ..., x_k | \theta) = \prod_{i=1}^{k} p(x_i | x_1, ..., x_{i-1}, \theta)    (3-1)

One assumption imposed by Bayesian Network theory (and indirectly by the Product
Rule of probability theory) is that for each variable x_i, \Pi_i \subseteq {x_1, ..., x_{i-1}}
must be a set of variables that renders x_i and {x_1, ..., x_{i-1}} conditionally
independent. In this way:

p(x_i | x_1, ..., x_{i-1}, \theta) = p(x_i | \Pi_i, \theta)    (3-2)

A Bayesian Network Structure then encodes the assertions of conditional independence
in Equation 3-1 above. Essentially then, a Bayesian Network Structure B_s is a directed
acyclic graph such that each variable in U corresponds to a node in B_s, and the parents of
the node corresponding to x_i are the nodes corresponding to the variables in \Pi_i.
Depending on the problem that is defined, either (or both) of the topology and
the probability distribution of a Bayesian Network can be pre-defined by hand or may be
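The factorization in Equations 3-1 and 3-2 can be illustrated with a tiny hand-built network; the structure (Load and IOwait as parents of Class) and all probabilities below are a made-up toy, not the networks learned by the feature selector:

```python
# Joint probability from a Bayesian-network factorization (Eq. 3-1/3-2):
# p(x1, ..., xk) = product over i of p(xi | parents(xi)).
P_LOAD = {"high": 0.3, "low": 0.7}
P_IOWAIT = {"high": 0.4, "low": 0.6}
P_CLASS = {  # p(Class | Load, IOwait)
    ("high", "low"):  {"CPU": 0.9, "IO": 0.1},
    ("low",  "high"): {"CPU": 0.1, "IO": 0.9},
    ("high", "high"): {"CPU": 0.5, "IO": 0.5},
    ("low",  "low"):  {"CPU": 0.5, "IO": 0.5},
}

def joint(load, iowait, cls):
    """p(load, iowait, cls) using the conditional independence
    encoded by the network structure: Load and IOwait are root
    nodes, Class depends on both."""
    return P_LOAD[load] * P_IOWAIT[iowait] * P_CLASS[(load, iowait)][cls]

print(joint("high", "low", "CPU"))  # 0.3 * 0.6 * 0.9 = 0.162
```

The structure keeps the table sizes small: each node stores only a distribution conditioned on its parents instead of the full joint over all variables.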
ACKNOWLEDGMENTS
I would like to express my sincere gratitude to my advisor, Professor Renato J.
Figueiredo, for his invaluable advice, encouragement, and support. This dissertation
would not have been possible without his guidance and support. My deep appreciation
goes to Professor Jose A. B. Fortes for participating in my supervisory committee and for
all the guidance and opportunities to work in the In-VIGO team that he gave me during
my Ph.D study. My deep recognition also goes to Professor Malay Ghosh and Professor
Alan George for serving on my supervisory committee and for their valuable suggestions.
Many thanks go to Dr. Mazin Yousif and Mr. Robert Carpenter from Intel Corporation
for their valuable input and generous funding for this research. Thanks also go to my
colleagues in the Advanced Computing Information Systems (ACIS) Laboratory for their
discussion of ideas and years of friendship. Last but not least, I owe a special debt of
gratitude to my family. Without their selfless love and support, I cannot imagine what I
would have achieved.
TABLE OF CONTENTS
ACKNOWLEDGMENTS 4
LIST OF TABLES 8
LIST OF FIGURES 9
ABSTRACT 11
CHAPTER
1 INTRODUCTION 13
1.1 Resource Performance Modeling 14
1.2 Autonomic Computing 15
1.3 Learning 17
1.3.1 Supervised Learning 17
1.3.2 Unsupervised Learning 18
1.3.3 Reinforcement Learning 18
1.3.4 Other Learning Paradigms 19
1.4 Virtual Machines 20
1.4.1 Virtual Machine Characteristics 20
1.4.2 Virtual Machine Plant 22
2 APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF RESOURCE CONSUMPTION PATTERNS 24
2.1 Introduction 24
2.2 Classification Algorithms 26
2.2.1 Principal Component Analysis 27
2.2.2 k-Nearest Neighbor Algorithm 30
2.3 Application Classification Framework 31
2.3.1 Performance Profiler 32
2.3.2 Classification Center 33
2.3.2.1 Data preprocessing based on expert knowledge 33
2.3.2.2 Feature selection based on principal component analysis 34
2.3.2.3 Training and classification 35
2.3.3 Post Processing and Application Database 35
2.4 Experimental Results 36
2.4.1 Classification Ability 36
2.4.2 Scheduling Performance Improvement 41
2.4.3 Classification Cost 45
2.5 Related Work 45
2.6 Conclusion 47
3 AUTONOMIC FEATURE SELECTION FOR APPLICATION CLASSIFICATION 49
3.1 Introduction 49
3.2 Statistical Inference 51
3.2.1 Feature Selection 51
3.2.2 Bayesian Network 52
3.2.3 Mahalanobis Distance 55
3.2.4 Confusion Matrix 55
3.3 Autonomic Feature Selection Framework 56
3.3.1 Data Quality Assuror 56
3.3.2 Feature Selector 59
3.3.3 Trainer 61
3.4 Experimental Results 62
3.4.1 Feature Selection and Classification Accuracy 62
3.4.2 Classification Validation 65
3.4.3 Training Data Quality Assurance 71
3.5 Related Work 71
3.6 Conclusion 73
4 ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE PREDICTIONS 74
4.1 Introduction 74
4.2 Related Work 76
4.3 Virtual Machine Resource Prediction Overview 77
4.4 Time Series Models for Resource Performance Prediction 80
4.5 Algorithms for Prediction Model Selection 82
4.5.1 k-Nearest Neighbor 83
4.5.2 Bayesian Classification 83
4.5.3 Principal Component Analysis 85
4.6 Learning-Aided Adaptive Resource Predictor 86
4.6.1 Training Phase 86
4.6.2 Testing Phase 89
4.7 Empirical Evaluation 90
4.7.1 Best Predictor Selection 90
4.7.2 Virtual Machine Performance Trace Prediction 91
4.7.2.1 Performance of k-NN based LARPredictor 92
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based LARPredictor 96
4.7.2.3 Performance comparison of the LARPredictors and the cumulative-MSE based predictor used in the NWS 97
4.7.3 Discussion 98
4.8 Conclusion 100
5 ......................... 106
5.1 Introduction 106
5.2 Application Resource Demand Phase Analysis and Prediction Prototype 108
5.3 Data Clustering 111
5.3.1 Stages in Clustering 111
5.3.2 Definitions and Notation 112
5.3.3 k-means Clustering 113
5.3.4 Finding the Optimal Number of Clusters 114
5.4 Phase Prediction 117
5.5 Empirical Evaluation 118
5.5.1 Phase Behavior Analysis 119
5.5.1.1 SPECseis96 benchmark 119
5.5.1.2 WorldCup web log replay 122
5.5.2 Phase Prediction Accuracy 123
5.5.3 Discussion 125
5.6 Related Work 126
5.7 Conclusion 128
6 CONCLUSION 135
REFERENCES 137
BIOGRAPHICAL SKETCH 146
LIST OF TABLES
Table page
2-1 Performance metric list 35
2-2 List of training and testing applications 37
2-3 Experimental data: application class compositions 40
2-4 System throughput: concurrent vs. sequential executions 44
3-1 Sample confusion matrix with two classes (L=2) 56
3-2 Sample performance metrics in the original feature set 59
3-3 Confusion matrix of classification results 65
3-4 Performance metric correlation matrixes of test applications 70
4-1 Normalized prediction MSE statistics for resources of VM1 96
4-2 Normalized prediction MSE statistics for resources of VM2 97
4-3 Normalized prediction MSE statistics for resources of VM3 98
4-4 Normalized prediction MSE statistics for resources of VM4 99
4-5 Normalized prediction MSE statistics for resources of VM5 99
4-6 Best predictors of all the trace data 100
5-1 Performance feature list 119
5-2 SPECseis96 total cost ratio schedule for the eight performance features 122
5-3 Average phase prediction accuracy 124
5-4 Performance feature list of VM traces 124
5-5 Average phase prediction accuracy of the five VMs 126
Figure                                                               page
1-1  Structure of an autonomic element .................................. 16
1-2  Classification system representation ............................... 19
1-3  Virtual machine structure .......................................... 21
1-4  VMPlant architecture ............................................... 23
2-1  Sample of principal component analysis ............................. 28
2-2  k-nearest neighbor classification example .......................... 31
2-3  Application classification model ................................... 32
2-4  Performance feature space dimension reductions in the application
     classification process ............................................. 34
2-5  Sample clustering diagrams of application classifications .......... 39
2-6  Application class composition diagram .............................. 42
2-7  System throughput comparisons for ten different schedules .......... 43
2-8  Application throughput comparisons of different schedules .......... 44
3-1  Sample Bayesian network generated by feature selector .............. 54
3-2  Feature selection model ............................................ 57
3-3  Bayesian-network based feature selection algorithm for application
     classification ..................................................... 60
3-4  Average classification accuracy of 10 sets of test data versus number
     of features selected in the first experiment ....................... 63
3-5  Two-class test data distribution with the first two selected features .. 63
3-6  Five-class test data distribution with first two selected features .. 66
3-7  Comparison of distances between cluster centers derived from
     expert-selected and automatically selected feature sets ............ 66
3-8  Training data clustering diagram derived from expert-selected and
     automatically selected feature sets ................................ 67
3-9  Classification results of benchmark programs ....................... 69
4-1  Virtual machine resource usage prediction prototype ................ 78
4-2  Sample XML schema of the VM performance DB ......................... 80
4-3  .................................................................... 87
4-4  Learning-aided adaptive resource predictor data flow ................ 88
4-5  Best predictor selection for trace VM2_load15 ....................... 92
4-6  Best predictor selection for trace VM2_PktIn ........................ 93
4-7  Best predictor selection for trace VM2_Swap ......................... 94
4-8  Best predictor selection for trace VM2_Disk ......................... 95
4-9  Predictor performance comparison (VM1) ............................. 101
4-10 Predictor performance comparison (VM2) ............................. 102
4-11 Predictor performance comparison (VM3) ............................. 103
4-12 Predictor performance comparison (VM4) ............................. 104
4-13 Predictor performance comparison (VM5) ............................. 105
5-1  Application resource demand phase analysis and prediction prototype .. 109
5-2  Resource allocation strategy comparison ............................ 115
5-3  Application resource demand phase prediction workflow .............. 129
5-4  Phase analysis of SPECseis96 CPU_user .............................. 130
5-5  Phase analysis of World Cup '98 Bytes_In ........................... 133
5-6  Phase analysis of World Cup '98 Bytes_out .......................... 134
With the goal of autonomic computing, it is desirable to have a resource scheduler that is capable of self-optimization, which means that with a given high-level objective the scheduler can automatically adapt its scheduling decisions to the changing workload. This self-optimization capacity imposes challenges to system performance modeling because of the increasing size and complexity of computing systems.

Our goals were twofold: to design performance models that can derive applications' resource consumption patterns in a systematic way, and to develop performance prediction models that can adapt to changing workloads. A novelty in the system performance model design is the use of various machine learning techniques to efficiently deal with the complexity of dynamic workloads based on monitoring and mining of historical performance data. In the environments considered in this thesis, virtual machines (VMs) are used as resource containers to host application executions because of their flexibility in supporting resource provisioning and load balancing.

Our study introduced three performance models to support self-optimized scheduling and decision-making. First, a novel approach is introduced for application classification based on the Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. It helps to reduce the dimensionality of the performance feature space and classify applications based on extracted features. In addition, a feature selection model is designed based on Bayesian Network (BN) to systematically identify the feature subset, which can provide optimal classification accuracy and adapt to changing workloads.
Second, an adaptive system performance prediction model is investigated based on a learning-aided predictor integration technique. Supervised learning techniques are used to learn the correlations between the statistical properties of the workload and the best-suited predictors.

In addition to a one-step-ahead prediction model, a phase characterization model is studied to explore the large-scale behavior of applications' resource consumption patterns.

Our study provides novel methodologies to model system and application performance. The performance models can self-optimize over time based on learning of historical runs, and can therefore adapt better to the changing workload and achieve better prediction accuracy than traditional methods with static parameters.
The vision of autonomic computing [1] is to improve the manageability of complex IT systems to a far greater extent than current practice through self-configuring, self-healing, self-optimization, and self-protection. To perform the self-configuration and self-optimization of applications and associated execution environments, and to realize dynamic resource allocation, both resource awareness and application awareness are important. In this context, there has been substantial research on effective scheduling policies [2-6] with given resource and application specifications. While there are several methods for obtaining resource specification parameters (e.g., CPU, memory, and disk information from the /proc file system in Unix systems), application specification is challenging to describe due to the following factors: 1) lack of knowledge of and control over the application source code, 2) multi-dimensionality of application resource consumption patterns, and 3) multi-stage resource consumption patterns of long-running applications. Furthermore, the dynamics of system performance aggravate the difficulties of performance description and prediction.

In this dissertation, an integrated framework consisting of algorithms and middleware for resource performance modeling is developed. It includes system performance prediction models and application resource demand models based on learning from historical executions. A novelty of the performance model designs is their use of machine learning techniques to efficiently and robustly deal with the complex dynamical phenomena of the workload and resource availability. In addition, virtual machines (VMs) are used as resource containers because they provide a flexible management platform that is useful both for the encapsulation of application execution environments and for the aggregation and accounting of resources consumed by an application. In this context, resource scheduling becomes a problem of how to dynamically allocate resources to virtual machines (which host application executions) to meet the applications' resource demands.
Section 1.1 gives an overview of resource performance modeling. Sections 1.2, 1.3, and 1.4 briefly introduce autonomic computing, machine learning, and virtual machine concepts.

[7]: In system procurement studies, the cost/performance ratio is commonly used as a metric for comparing systems. Three techniques for performance evaluation are analytical modeling, simulation, and measurement. Sometimes it is helpful to use two or more techniques, either simultaneously or sequentially.

Computer system performance measurements involve monitoring the system while it is being subjected to a particular workload. In order to perform meaningful measurements, the workload should be carefully selected based on the services exercised by the workload, the level of detail, representativeness, and timeliness. Since a real user environment is generally not repeatable, it is necessary to study the real user environments, observe the key characteristics, and develop a workload model that can be used repeatedly. This
[1]. The essence of autonomic computing is to enable self-managed systems, which includes the following aspects:

Autonomic computing presents challenges and opportunities in various areas such as learning and optimization theory, automated statistical learning, and behavioral abstraction and models [8]. This dissertation addresses some of the challenges in
the application resource performance modeling to support self-configuration and self-optimization of application execution environments.

Figure 1-1. Structure of an autonomic element.

Generally, an autonomic system is an interactive collection of autonomic elements: individual system constituents that contain resources and deliver services to humans and other autonomic elements. As Figure 1-1 shows, an autonomic element will typically consist of one or more managed elements coupled with a single autonomic manager that controls and represents them. The managed element could be a hardware resource, such as storage or a CPU, or a software resource, such as a database, a directory service, or a large legacy system [1]. The monitoring process collects the performance data of the
Machine learning is a natural solution to automation. It avoids knowledge-intensive model building and reduces the reliance on expert knowledge. In addition, it can deal with complex dynamical phenomena and enables the system to adapt to changing environments.

Traditionally, there are three types of learning: supervised learning, unsupervised learning, and reinforcement learning.

[9]. "Learning" consists of choosing or adapting parameters within the model structure that work best on the samples at hand and others like them. One of the most prominent and basic learning tasks is classification or prediction, which is used extensively in this work. For classification problems, a learning system can be viewed as a higher-level system that helps build the decision-making system itself, called the classifier. Figure 1-2 illustrates the structure of a classification system and its learning process.
Reinforcement learning algorithms attempt to find a policy for maximizing the cumulative reward for the agent over the course of the problem. The environment is typically
Figure 1-2. Classification system representation. During the training phase, labeled sample cases are used to derive the unknown parameters of the classifier model. During the testing phase, the customized classifier is used to associate a specific pattern of observations with a specific class.
In this work, various learning techniques are used to model the application resource demand and system performance. These models can help the system to adapt to the changing workload and achieve higher performance.

[10]. A "classic" virtual machine (VM) enables multiple independent, isolated operating systems (guest VMs) to run on one physical machine (host server), efficiently multiplexing the system resources of the host machine [10].

A virtual-machine monitor (VMM) is a software layer that runs on a host platform and provides an abstraction of a complete computer system to higher-level software. The abstraction created by the VMM is called a virtual machine. Figure 1-3 shows the structure of virtual machines.

[11]. The following characteristics of virtual machines make them a highly flexible and manageable application execution platform:
Figure 1-3. Virtual machine structure. A virtual-machine monitor is a software layer that runs on a host platform and provides an abstraction of a complete computer system to higher-level software. The host platform may be the bare hardware (Type I VMM) or a host operating system (Type II VMM). The software running above the virtual-machine abstraction is called guest software (operating system and applications).

[12] can support dynamic memory extension of a VM guest without shutting down the system.
[13]. VMware's VMotion can support migration with zero downtime [14]. Techniques based on the Virtual File System (VFS) have been studied in [15] to support VM migration across Wide-Area Networks (WANs).

[16] handles virtual machine creation and hosting for classic virtual machines (e.g., VMware [17]) and user-mode Linux platforms (e.g., UML [18]) via dynamic cloning, instantiation, and configuration. The VMPlant has three major components: the Virtual Machine Production Center (VMPC), the Virtual Machine Warehouse (VMWH), and the Virtual Machine Information System (VMIS). The VMPC handles the virtual machine's creation, configuration, and destruction. It employs a configuration pattern recognition technique to identify opportunities to apply pre-cached virtual machine state to accelerate the machine configuration process. The VMWH stores the pre-cached machine images, monitors them and their host server's performance, and performs maintenance activity. The VMIS stores the static and dynamic information of the virtual machines and their host server. The architecture of the VMPlant is shown in Figure 1-4.

The VMPlant provides an API to VMShop for virtual machine creation, deconstruction, and monitoring. The VMShop has three major components: VMCreater, VMCollecter, and VMReporter. The VMCreater handles the virtual machines' creation; the VMCollecter handles the machines' deconstruction and suspension; the VMReporter handles information requests. In combination with a virtual machine shop service, VMPlants
deployed across the physical resources of a site allow clients (users and/or middleware acting on their behalf) to instantiate and control client-customized virtual execution environments. The plant can be integrated with virtual networking techniques (such as VNET [19]) to allow client-side network management. Customized, application-specific VMs can be defined in VMPlant with the use of a directed acyclic graph (DAG) configuration. VM execution environments defined within this framework can then be cloned and dynamically instantiated to provide a homogeneous application execution environment across distributed resources.

Figure 1-4. VMPlant architecture.

In the context of the VMPlant, an application can be scheduled to run in a specific virtual machine, which is called the application VM. Therefore, the system performance metrics collected from the application VM can reflect and summarize the resource consumption of the application.
Application awareness is an important factor in efficient resource scheduling. This chapter introduces a novel approach for application classification based on the Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. This approach is used to assist scheduling in heterogeneous computing environments. It helps to reduce the dimensionality of the performance feature space and classify applications based on extracted features. The classification considers four dimensions: CPU-intensive, I/O- and paging-intensive, network-intensive, and idle. Application class information and the statistical abstracts of the application behavior are learned over historical runs and used to assist multi-dimensional resource scheduling.

[2-4] with given resource and application specifications. There are several methods for obtaining resource specification parameters (e.g., CPU, memory, and disk information from /proc in Unix systems). However, application specification is challenging to describe because of the following factors:
[20][21], it is possible to migrate an application during its execution for load balancing.

The above characteristics of grid applications present a challenge to resource scheduling: how can an application's multi-dimensional resource consumption patterns be learned and used for resource allocation? This chapter introduces a novel approach to solve this problem: application classification based on a feature selection algorithm, Principal Component Analysis (PCA), and a k-Nearest Neighbor (k-NN) classifier [22][23]. The PCA is applied to reduce the dimensionality of the application performance metrics while preserving the maximum amount of variance in the metrics. Then, the k-Nearest Neighbor algorithm is used to categorize the application execution states into different classes based on the application's resource consumption pattern. The learned application class information is used to assist resource scheduling decision-making in heterogeneous computing environments.

The VMPlant service introduced in Section 1.4.2 provides automated cloning and configuration of application-centric Virtual Machines (VMs). Problem-solving environments such as In-VIGO [24] can submit requests to the VMPlant service, which
The classification system described in this chapter leverages the capability of summarizing application performance data by collecting system-level data within a VM, as follows. During the application execution, snapshots of performance metrics are taken at a desired frequency. A PCA processor analyzes the performance snapshots and extracts the key components of the application's resource usage. Based on the extracted features, a k-NN classifier categorizes each snapshot into one of the following classes: CPU-intensive, IO-intensive, memory-intensive, network-intensive, and idle.

By using this system, resource scheduling can be based on a comprehensive diagnosis of the application resource utilization, which conveys more information than CPU load in isolation. Experiments reported in this chapter show that resource scheduling facilitated with application class composition knowledge can achieve better average system throughput than scheduling without this knowledge.

The rest of the chapter is organized as follows: Section 2.2 introduces the PCA and the k-NN classifier in the context of application classification. Section 2.3 presents the classification model and implementation. Section 2.4 presents and discusses experimental results of classification performance measurements. Section 2.5 discusses related work. Conclusions and future work are discussed in Section 2.6.
A pattern classification system consists of pre-processing, feature extraction, classification, and post-processing. The pre-processing and feature extraction are known to significantly affect the classification, because the error caused by wrong features may propagate to the next steps and stay predominant in terms of the overall classification error. In this work, a set of application performance metrics is chosen based on expert knowledge and the principle of increasing relevance and reducing redundancy [25].

Principal Component Analysis [22] is a linear transformation representing data in a least-squares sense. It is designed to capture the variance in a data set in terms of principal components and reduce the dimensionality of the data. It has been widely used in data analysis and compression.

When a set of vector samples is represented by a set of lines passing through the mean of the samples, the best linear directions result in the eigenvectors of the scatter matrix, the so-called "principal components," as shown in Figure 2-1. The corresponding eigenvalues represent the contribution to the variance of the data. When the k largest eigenvalues of n principal components are chosen to represent the data, the dimensionality of the data reduces from n to k.

Principal component analysis is based on the statistical representation of a random variable. Suppose we have a random vector population x, where

    x = (x_1, x_2, ..., x_n)^T    (2-1)

and the mean of that population is denoted by
Figure 2-1. Sample of principal component analysis.

    mu_x = E{x}    (2-2)

and the covariance matrix of the same data set is

    C_x = E{(x - mu_x)(x - mu_x)^T}    (2-3)

The components of C_x, denoted by c_ij, represent the covariances between the random variable components x_i and x_j. The component c_ii is the variance of the component x_i.

From a sample of vectors x_1, ..., x_M, we can calculate the sample mean and the sample covariance matrix as the estimates of the mean and the covariance matrix.

The eigenvectors e_i and the corresponding eigenvalues lambda_i can be obtained by solving the equation
    |C_x - lambda * I| = 0    (2-5)

where I is the identity matrix having the same order as C_x and |.| denotes the determinant of the matrix. If the data vector has n components, the characteristic equation becomes of order n.

By ordering the eigenvectors in the order of descending eigenvalues (largest first), one can create an ordered orthogonal basis with the first eigenvector having the direction of the largest variance of the data. In this way, we can find directions in which the data set has the most significant amounts of energy.

Suppose one has a data set of which the sample mean and the covariance matrix have been calculated. Let A be a matrix consisting of the eigenvectors of the covariance matrix as the row vectors.

By transforming a data vector x, we get

    y = A(x - mu_x)    (2-6)

which is a point in the orthogonal coordinate system defined by the eigenvectors. The components of y can be seen as the coordinates in the orthogonal base. We can reconstruct the original data vector x from y by

    x = A^T y + mu_x    (2-7)

using the property of an orthogonal matrix, A^-1 = A^T, where A^T is the transpose of the matrix A. The original vector x was projected on the coordinate axes defined by the orthogonal basis. The original vector was then reconstructed by a linear combination of the orthogonal basis vectors.
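The transform and reconstruction steps in equations (2-6) and (2-7) can be sketched numerically. The following is an illustrative example, not code from this dissertation: the eigenvector matrix A is computed with numpy from synthetic data, and the round trip x to y and back recovers the original vector because A is orthogonal.

```python
import numpy as np

# Synthetic sample: M = 200 vectors with n = 3 correlated components.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) @ np.array([[3.0, 0.0, 0.0],
                                          [1.0, 1.0, 0.0],
                                          [0.5, 0.2, 0.1]])

mu = X.mean(axis=0)             # sample mean (Eq. 2-2)
C = np.cov(X, rowvar=False)     # sample covariance matrix (Eq. 2-3)

# Eigen-decomposition of the covariance matrix; eigh returns eigenvalues
# in ascending order, so reverse to get descending eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
A = eigvecs[:, order].T         # eigenvectors as row vectors

x = X[0]
y = A @ (x - mu)                # transform (Eq. 2-6)
x_back = A.T @ y + mu           # reconstruction (Eq. 2-7)
assert np.allclose(x, x_back)   # holds because A^-1 = A^T
```

The final assertion is exactly the orthogonality argument in the text: projecting onto the full eigenvector basis and transforming back is lossless.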
    y = A_K (x - mu_x)    (2-8)

and

    x = A_K^T y + mu_x    (2-9)

where A_K is the matrix having the first K eigenvectors as its rows. It means that we project the original data vector on the coordinate axes having the dimension K and transform the vector back by a linear combination of the basis vectors. This method minimizes the mean-square error between the data and the representation with a given number of eigenvectors.

If the data is concentrated in a linear subspace, this method provides a way to compress the data without losing much information, simplifying the representation. By picking the eigenvectors having the largest eigenvalues, we lose as little information as possible in the mean-square sense.

The k-Nearest Neighbor classifier [26] has been used in many applications in the fields of data mining, statistical pattern recognition, image processing, and many others. The purpose of this algorithm is to classify a new object based on attributes and training samples. The classifier does not fit a model and is only based on memory. Given a query point, we find the k objects (training points) closest to the query point. The k-NN classifier decides the class by considering the votes of the k (an odd number) nearest neighbors. The nearest
neighbor is picked as the training data geometrically closest to the test data in the feature space, as illustrated in Figure 2-2.

Figure 2-2. k-nearest neighbor classification example.

In this work, a vector of the application's resource consumption snapshots is used to represent the application. Each snapshot consists of a chosen set of performance metrics. The PCA is used to preprocess the raw data into independent features for the classifier. Then, a 3-NN classifier is used to classify each snapshot. The majority vote of the snapshots' classes is used to represent the class of the application: CPU-intensive, I/O- and paging-intensive, network-intensive, or idle. A machine with no load except for background load from system daemons is considered to be in the idle state.

The application classification model is shown in Figure 2-3. In addition, a monitoring
system is used to sample the system performance of a computing node running an application of interest.

Figure 2-3. Application classification model. The Performance profiler collects performance metrics of the target application node. The Classification center classifies the application using extracted key components and performs statistical analysis of the classification results. The Application DB stores the application class information. (m is the number of snapshots taken in one application run, t0 and t1 are the beginning and ending times of the application execution, and VMIP is the IP address of the application's host machine.)
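The 3-NN majority-vote rule used by the classifier (pick the three training snapshots geometrically closest to the test snapshot in the feature space, then take the majority vote of their labels) can be sketched as follows. The feature vectors and class labels below are invented for illustration and are not data from the experiments:

```python
from collections import Counter
import math

def knn_classify(train, test_point, k=3):
    """Classify test_point by majority vote of its k nearest
    training points, using Euclidean distance in feature space."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], test_point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature vectors (e.g., two principal components).
train = [((0.9, 0.1), "cpu"), ((0.8, 0.2), "cpu"), ((0.7, 0.1), "cpu"),
         ((0.1, 0.9), "io"),  ((0.2, 0.8), "io"),  ((0.0, 0.0), "idle")]

print(knn_classify(train, (0.85, 0.15)))  # -> cpu
print(knn_classify(train, (0.15, 0.85)))  # -> io
```

Using an odd k, as the text notes, avoids ties in the two-class case; in the second query the third-nearest neighbor is an "idle" snapshot, but the two "io" votes win.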
The Ganglia [27] distributed monitoring system is used to monitor the application nodes. The performance sampler takes snapshots of the performance metrics collected by Ganglia at a predefined frequency (currently, every 5 seconds) between the application's starting time t0 and ending time t1. Since Ganglia uses multicast based on a listen/announce protocol to monitor the machine state, the collected samples consist of the performance data of all the nodes in a subnet. The performance filter extracts the snapshots of the target application for future processing. At the end of profiling, an application performance data pool is generated. The data pool consists of a set of n-dimensional samples A_{n x m} = (a_1, a_2, ..., a_m), where m = (t1 - t0)/d is the number of snapshots taken in one application run and d is the sampling time interval. Each sample a_i consists of n performance metrics, which include all 29 default metrics monitored by Ganglia and the 4 metrics that we added based on the needs of classification: the number of I/O blocks read from/written to disk, and the number of memory pages swapped in/out. A program was developed to collect these four metrics (using vmstat), and the metrics were added to the metric list of Ganglia's gmond.
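Under this notation, the data pool is an n x m matrix whose columns are the per-interval snapshots. A minimal sketch of assembling it (the values of t0, t1, d, n, and the metric readings are invented stand-ins, not Ganglia output):

```python
import numpy as np

t0, t1, d = 0, 50, 5             # start/end times and sampling interval (s)
m = (t1 - t0) // d               # number of snapshots in one run
n = 4                            # metrics per snapshot (33 in the thesis)

# Each snapshot a_i is an n-dimensional vector of metric readings;
# here the readings are synthetic stand-ins for Ganglia/vmstat values.
rng = np.random.default_rng(1)
snapshots = [rng.random(n) for _ in range(m)]

A = np.column_stack(snapshots)   # data pool A_{n x m}
print(A.shape)                   # -> (4, 10)
```

With the thesis settings (d = 5 s), a run of t1 - t0 seconds yields m = (t1 - t0)/5 columns, each holding the 33 metric values of one snapshot.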
Figure 2-4. Performance feature space dimension reductions in the application classification process. m: the number of snapshots taken in one application run; n: the number of performance metrics; A_{n x m}: all performance metrics collected by the monitoring system; A'_{p x m}: the selected relevant performance metrics after the zero-mean and unit-variance normalization; B_{q x m}: the extracted key component metrics; C_{1 x m}: the class vector of the snapshots; Class: the application class, which is the majority vote of the snapshots' classes.

For example, the performance metrics CPU_System and CPU_User are correlated with CPU-intensive applications; Bytes_In and Bytes_Out are correlated with network-intensive applications; IO_BI and IO_BO are correlated with I/O-intensive applications; and Swap_In and Swap_Out are correlated with memory-intensive applications. The data preprocessor extracts these eight metrics of the target application node from the data pool based on our expert knowledge. Thus it reduces the dimension of the performance metrics from n = 33 to p = 8 and generates A'_{p x m}, as shown in Figure 2-4. In addition, the preprocessor also normalizes the selected metrics to zero mean and unit variance.

The PCA processor takes the selected metrics listed in Table 2-1 as inputs. It conducts the linear transformation of the performance data and selects the principal components based on the predefined minimal fraction of variance. In our implementation, the minimal fraction of variance was set to extract exactly two principal components. Therefore, at the end of processing, the data dimension gets further reduced from p = 8 to q = 2 and the vector B_{q x m} is generated, as shown in Figure 2-4.
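The preprocessing pipeline described above (normalize each selected metric to zero mean and unit variance, then keep the leading principal components) can be sketched as follows. The 8 x m matrix here is synthetic, and q = 2 is fixed to mirror the setting in the text; in general a variance-fraction threshold would choose q:

```python
import numpy as np

rng = np.random.default_rng(2)
p, m = 8, 100                    # 8 selected metrics, m snapshots
A_sel = rng.normal(size=(p, m)) * rng.uniform(0.1, 5.0, size=(p, 1))

# Zero-mean, unit-variance normalization of each metric (each row).
A_norm = (A_sel - A_sel.mean(axis=1, keepdims=True)) \
         / A_sel.std(axis=1, keepdims=True)

# PCA: eigen-decompose the covariance of the normalized metrics,
# order components by descending eigenvalue (variance contribution).
C = np.cov(A_norm)
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvecs = eigvecs[:, order]

q = 2                            # forced here to mirror the q = 2 setting
B = eigvecs[:, :q].T @ A_norm    # key-component matrix B_{q x m}
print(B.shape)                   # -> (2, 100)
```

Each column of B is one snapshot expressed in the two principal-component coordinates, which is exactly the representation the 3-NN classifier consumes.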
Table 2-1. Performance metric list

Metric             Description
CPU_System/User    Percent CPU System/User
Bytes_In/Out       Number of bytes per second into/out of the network
IO_BI/BO           Blocks sent to/received from a block device (blocks/s)
Swap_In/Out        Amount of memory swapped in/out from/to disk (kB/s)

PostMark [28] is used to represent the IO-intensive class. SPECseis96 [29], a scientific-computing-intensive program, is used to represent the CPU-intensive class. A synthetic application, Pagebench, is used to represent the paging-intensive class. It initializes and updates an array whose size is bigger than the memory of the VM, thereby inducing frequent paging activity. Ettcp [30], a benchmark that measures the network throughput over TCP or UDP between two nodes, is used as the training application of the network-intensive class. The performance data of all these four applications and the idle state are used to train the classifier. For each test data point, the trained classifier calculates its distance to all the training data. The 3-NN classification identifies the three training data sets with the shortest distance to the test data. Then the test data's class is decided by the majority vote of the three nearest neighbors.

2-4. In addition to a single value (Class), the application classifier also
Table 2-2 summarizes the set of applications used as the training and testing applications in the experiments [28-38]. The 3-NN classifier was trained with the performance data collected from the executions of the training applications highlighted in the table. All the application executions were hosted by a VMware GSX virtual machine (VM1). The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU 1.80 GHz machine with 512 KB cache and 1 GB RAM. In addition, a second virtual machine with the same specification was used to run the server applications of the network benchmarks.
Table 2-2. List of training and testing applications
Table 2-1, based on the expert knowledge of the correlation between these metrics and the application classes. After that, the PCA processor conducted the linear transformation of the performance data and selected principal components based on the minimal fraction of variance defined. In this experiment, the variance contribution threshold was set to extract two (q = 2) principal components. This helps to reduce the computational requirements of the classifier. Then, the trained 3-NN classifier conducts classification based on the data of the two principal components.

The training data's class clustering diagram is shown in Figure 2-5(a). The diagram shows a PCA-based two-dimensional representation of the data corresponding to the five classes targeted by our system. After being trained with the training data, the classifier classifies the remaining benchmark programs shown in Table 2-2. The classifier provides outputs in two kinds of formats: the application class-clustering diagram, which helps to visualize the classification results, and the application class composition, which can be used to calculate the unit application cost.

Figure 2-5 shows the sample clustering diagrams for three test applications. For example, the interactive VMD application (Figure 2-5(d)) shows a mix of the idle class when the user is not interacting with the application, the I/O-intensive class when the user is uploading an input file, and the network-intensive class while the user is interacting with the GUI through a VNC remote display. Table 2-3 summarizes the class compositions of all the test applications. Figure 2-6 visualizes the class composition of some sample benchmark programs. These classification results match the class expectations gained from empirical experience with these programs. They are used to calculate the unit application cost shown in Section 4.4.
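The two classifier outputs described above (the per-run application class and the class composition) follow directly from the per-snapshot labels: the class is the majority vote, and the composition is the fraction of snapshots in each class. A minimal sketch with invented snapshot labels:

```python
from collections import Counter

# Hypothetical per-snapshot class labels produced by the 3-NN classifier
# over one application run (20 snapshots).
snapshot_classes = ["idle"] * 5 + ["io"] * 3 + ["network"] * 12

counts = Counter(snapshot_classes)
total = len(snapshot_classes)

app_class = counts.most_common(1)[0][0]                        # majority vote
composition = {c: round(100 * k / total) for c, k in counts.items()}

print(app_class)    # -> network
print(composition)  # -> {'idle': 25, 'io': 15, 'network': 60}
```

A composition like this one is what lets the scheduler see, for instance, that an interactive run such as VMD is part idle, part I/O-intensive, and part network-intensive rather than a single class.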
Figure 2-5. Sample clustering diagrams of application classifications. A) Training data: mixture. B) SimpleScalar: CPU-intensive. C) Autobench: Network-intensive. D) VMD: Interactive. Principal Components 1 and 2 are the principal component metrics extracted by PCA.
Table 2-3. Experimental data: application class compositions
As shown in Table 2-3, when SPECseis96 with medium-size input data was executed in VM1 with 256 MB memory (SPECseis96_A), it is classified as a CPU-intensive application. In the SPECseis96_B experiment, the smaller physical memory (32 MB) resulted in increased paging and I/O activity. The increased I/O activity is due to the fact that less physical memory is available to the O/S buffer cache for I/O blocks. The buffer cache size at runtime was observed to be as small as 1 MB in SPECseis96_B, and as large as 200 MB in SPECseis96_A. In addition, the execution time increased from 291 minutes and 42 seconds in the first case to 426 minutes and 58 seconds in the second case.

Similarly, in the experiments with PostMark, different execution environment configurations changed the application's resource consumption pattern from one class to another. Table 2-3 shows that if a local file directory was used to store the files to be read and written during the program execution, the PostMark benchmark showed the resource consumption pattern of the I/O-intensive class. In contrast, with an NFS-mounted file directory, it (PostMark_NFS) was turned into a network-intensive application.

The first set of experiments demonstrates that the application class information can help the scheduler to optimize resource sharing among applications running in parallel to improve system throughput and reduce throughput variances. In the experiments, three
applications, SPECseis96 (S) with small data size, PostMark (P) with a local file directory, and NetPIPE Client (N), were selected, and three instances of each application were executed.

Figure 2-6. Application class composition diagram.

The scheduler's task was to decide how to allocate the nine application instances to run on the three virtual machines (VM1, VM2, and VM3) in parallel, each of which hosted 3 jobs. VM4 was used to host the NetPIPE server. There are ten possible schedules available, as shown in Figure 2-7.

When multiple applications run on the same host machine at the same time, there are resource contentions among them. Two scenarios were compared: in the first scenario, the scheduler did not use class information, and one of the ten possible schedules was
selected at random. The other scenario used the application class knowledge, always allocating applications of different classes (CPU, I/O, and network) to run on the same machine (Schedule 10, Figure 2-7). The system throughputs obtained from runs of all possible schedules in the experimental environment are shown in Figure 2-7.

Figure 2-7. System throughput comparisons for ten different schedules. 1: {(SSS),(PPP),(NNN)}, 2: {(SSS),(PPN),(PNN)}, 3: {(SSP),(SPP),(NNN)}, 4: {(SSP),(SPN),(PNN)}, 5: {(SSP),(SNN),(PPN)}, 6: {(SSN),(SPP),(PNN)}, 7: {(SSN),(SPN),(PPN)}, 8: {(SSN),(SNN),(PPP)}, 9: {(SPP),(SPN),(SNN)}, 10: {(SPN),(SPN),(SPN)}. S: SPECseis96 (CPU-intensive), P: PostMark (I/O-intensive), N: NetPIPE (Network-intensive).

The average system throughput of the schedule chosen with class knowledge was 1391 jobs per day. It achieved the highest throughput among the ten possible schedules, 22.11% larger than the weighted average of the system throughputs of all ten possible schedules. In addition, the random selection of the possible schedules resulted in large variances of system throughput. The application class information can be used to help the scheduler pick the optimal schedule consistently. The application throughput comparison of different schedules on one machine is shown in Figure 2-8. It compares the
throughput of schedule ID 10 (labeled SPN in Figure 2-8) with the minimum, maximum, and average throughputs of all ten possible schedules. By allocating jobs from different classes to the machine, the three applications' throughputs were higher than average by different degrees: SPECseis96 Small by 24.90%, PostMark by 48.13%, and NetPIPE by 4.29%. Figure 2-8 also shows that the maximum application throughputs were achieved by sub-schedule (SSN) for SPECseis96 and (PPN) for NetPIPE instead of the proposed (SPN). However, the low throughputs of the other applications in those sub-schedules make their total throughputs sub-optimal.

Figure 2-8. Application throughput comparisons of different schedules. MIN, MAX, and AVG are the minimum, maximum, and average application throughput of all ten possible schedules. SPN is the proposed schedule 10, {(SPN),(SPN),(SPN)}, in Figure 2-7.

Table 2-4. System throughput: concurrent vs. sequential executions (time taken to finish 2 jobs)

             CH3D Time (sec)    PostMark Time (sec)
Concurrent   310                613
Sequential   264                752
Table 2-4. The experiment results show that the execution efficiency losses caused by the relatively moderate resource contentions between applications of different classes were offset by the gains from the utilization of idle capacity. The resource sharing of applications of different classes improved the overall system throughput.

2-3 was running on an Intel(R) Pentium(R) 4 CPU 1.70 GHz machine with 512 MB memory. In addition, the application classifier was running on an Intel(R) Pentium(R) III 750 MHz machine with 256 MB RAM.

In this experiment, a total of 8000 snapshots were taken at five-second intervals for the virtual machine which hosted the execution of SPECseis96 (medium). It took the performance filter 72 seconds to extract the performance data of the target application VM. In addition, it took another 50 seconds for the classification center to train the classifier, perform the PCA feature selection, and perform the application classification. Therefore, the unit classification cost is 15 ms per sample data point, demonstrating that it is possible to consider the classifier for online training.

Feature selection [39][25] and classification techniques have been applied successfully to many areas, such as intrusion detection [40][41][42][43], text categorization [44], and image and speech analysis. Kapadia's evaluation of learning algorithms for application performance prediction in [45] shows that the nearest-neighbor algorithm has better
[45]. This thesis differs from Kapadia's work in the following ways: First, the application class knowledge is used to facilitate resource scheduling and improve the overall system throughput, in contrast with Kapadia's work, which focuses on application CPU time prediction. Second, the application classifier takes performance metrics as inputs; in contrast, in [45] the CPU time prediction is based on the input parameters of the application. Third, the application classifier employs PCA to reduce the dimensionality of the performance feature space. This is especially helpful when the number of input features of the classifier is not trivial.

Condor uses process checkpoint and migration techniques [20] to allow an allocation to be created and preempted at any time. The transfer of checkpoints may occupy significant network bandwidth. Basney's study in [46] shows that co-scheduling of CPU and network resources can improve the Condor resource pool's goodput, which is defined as the allocation time when a remotely executing application uses the CPU to make forward progress. The application classifier presented in this thesis performs learning of the application's resource consumption of memory and I/O in addition to CPU and network usage. It provides a way to extract the key performance features and generate an abstract of the application's resource consumption pattern in the form of an application class. The application class information and resource consumption statistics can be used together with recent multi-lateral resource scheduling techniques, such as Condor's Gang-matching [47], to facilitate resource scheduling and improve system throughput.

Conservative Scheduling [4] uses the prediction of the average and variance of the CPU load at some future point of time and time interval to facilitate scheduling. The application classifier shares the common technique of resource consumption pattern analysis over a time window, which is defined as the time of one application run. However, the application classifier is capable of taking into account usage patterns of multiple kinds of resources, such as CPU, I/O, network and memory.
[48] uses a synthetic skeleton program to reproduce the CPU utilization and communication behaviors of message-passing parallel programs to predict application performance. In contrast, the application classifier provides application behavior learning in more dimensions.

Prophesy [49] employs a performance-modeling component, which uses coupling parameters to quantify the interactions between kernels that compose an application. However, to be able to collect data at the level of basic blocks, procedures, and loops, it requires insertion of instrumentation code into the application source code. In contrast, the classification approach uses the system performance data collected from the application host to infer the application's resource consumption pattern. It does not require modification of the application source code.

Statistical clustering techniques have been applied to learn application behavior at various levels. Nickolayev et al. applied clustering techniques to efficiently reduce the processor event trace data volume in cluster environments [50]. Ahn and Vetter conducted application performance analysis by using clustering techniques to identify the representative performance counter metrics [51]. Both Cohen and Chase's work [52] and ours perform statistical clustering using system-level metrics. However, their work focuses on system performance anomaly detection, whereas ours focuses on application classification for resource scheduling.

Our work can be used to learn the resource consumption patterns of a parallel application's child processes and a multi-stage application's sub-stages. However, in this study we focus on sequential and single-stage applications.
In this work, the input performance metrics are selected manually based on expert knowledge. In the next chapter, techniques for automatically selecting features for application classification are discussed.
Application classification techniques based on monitoring and learning of resource usage (e.g., CPU, memory, disk, and network) have been proposed in Chapter 2 to aid in resource scheduling decisions. An important problem that arises in application classifiers is how to decide which subset of the numerous performance metrics collected from monitoring tools should be used for the classification. This chapter presents an approach based on a probabilistic model (Bayesian Network) to systematically select the representative performance features, which can provide optimal classification accuracy and adapt to changing workloads.

[53]. Well-known monitoring tools such as the open source packages Ganglia [54] and dproc [55], and commercial products such as HP's OpenView [56], provide the capability of monitoring a rich set of system-level performance metrics. An important problem that arises is how to decide which subset of the numerous performance metrics collected from monitoring tools should be used for the classification in a dynamic environment. In this chapter we address this problem. Our approach is based on autonomic feature selection and can help to improve the system's self-manageability [1] by reducing the reliance on expert knowledge and increasing the system's adaptability.

The need for autonomic feature selection and application classification is motivated by systems such as VMPlant [16], which provides automated resource provisioning of Virtual Machines (VMs). In the context of VMPlant, the application can be scheduled to run on a dedicated virtual machine, whose system-level performance metrics reflect the application's
To build an autonomic classification system with self-configurability, it is critical to devise a systematic feature selection scheme that can automatically choose the most representative features for application classification and adapt to changing workloads. This chapter presents an approach of using a probabilistic model, the Bayesian Network, to automatically select the performance metrics that correlate with application classes and optimize the classification accuracy. The approach also uses the Mahalanobis distance to support online selection of training data, which enables the feature selection to adapt to dynamic workloads. In the rest of this dissertation, we will use the terms "metrics" and "features" interchangeably.

In Chapter 2, a subset of performance metrics was manually selected based on expert knowledge to correlate to the resource consumption behavior of the application class. However, expert knowledge is not always available. In the case of a highly dynamic workload or a massive volume of performance data, the approach of manual configuration by a human expert is also not feasible. These present a need for a systematic way to select the representative metrics in the absence of sufficient expert knowledge. On the other hand, the use of the Bayesian Network leaves the option open to integrate expert knowledge with the automatic feature selection to improve the classification accuracy and efficiency.

Feature selection based on statically selected application performance data, which are used as the training set, may not always provide the optimal classification results in dynamic environments. To enable the feature selection to adapt to the changing workload, the system must be able to dynamically update the training set with data from recent workloads. A question that arises is how to decide which data should be selected as training data. In this work, an algorithm based on the Mahalanobis distance is used
Our experimental results show the following. First, we observe correlations between pairs of selected performance metrics, which justifies the use of the Mahalanobis distance as a means of taking the correlation into account in the training data selection process. Second, there is a diminishing return of the classification utility function (i.e., the ratio of classification accuracy over the number of selected metrics) as more features are selected. The experiments showed that above 90% application classification accuracy can be achieved with a small subset of performance metrics which are highly correlated with the application class. Third, the application classification based on the selected features for a set of benchmark programs and scientific applications matched our empirical experience with these applications.

The rest of the chapter is organized as follows: The statistical techniques used are described in Section 3.2. Section 3.3 presents the feature selection model. Section 3.4 presents and discusses the experimental results. Section 3.5 discusses related work. Conclusions and future work are discussed in Section 3.6.

3.2.1 Feature Selection

[57]. Subset generation is a process of heuristic search of candidate subsets. Each subset is evaluated based on the evaluation criterion. Then the evaluation result is compared with the previously computed best result. If it is better, it replaces the best result, and the process continues until the stop criterion is reached. The selection result is validated by different tests or prior knowledge.
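The evaluation criterion and the utility ratio mentioned above can be made concrete with a small sketch; the confusion counts below are hypothetical, not measurements from the experiments.

```python
def accuracy(confusion):
    """Fraction of correctly classified data: diagonal over total."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / total

def utility(confusion, n_metrics):
    """Classification utility: accuracy per selected metric."""
    return accuracy(confusion) / n_metrics

# Hypothetical 2-class confusion matrix (rows: actual, columns: predicted).
m = [[50, 5],
     [8, 37]]
print(accuracy(m))      # (50 + 37) / 100 = 0.87
print(utility(m, 2))    # utility shrinks as more metrics are selected
```

Dividing accuracy by the subset size is what produces the diminishing-return behavior reported in the experiments: once accuracy saturates, each added metric lowers the ratio.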
3.3.2 [58]. It can be used to compute the conditional probability of a node, given the values of its predecessors; hence, a BN can be used as a classifier that gives the posterior probability distribution of the class decision node given the values of the other nodes.

Bayesian Networks are based on the work of the mathematician and theologian Rev. Thomas Bayes, who worked with conditional probability theory in the late 1700s to discover a basic law of probability, which was then called Bayes' rule. Bayes' rule includes a hypothesis, past experience, and evidence:

P(H|E, c) = P(E|H, c) P(H|c) / P(E|c)

where we can update our belief in hypothesis H given the additional evidence E and the background context (past experience) c.

The left-hand term, P(H|E, c), is called the posterior probability, or the probability of hypothesis H after considering the effect of the evidence E on past experience c.

The term P(H|c) is called the a-priori probability of H given c alone.

The term P(E|H, c) is called the likelihood and gives the probability of the evidence assuming the hypothesis H and the background information c are true.
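As a toy numerical check of Bayes' rule (all probabilities below are invented, not taken from the experiments): suppose hypothesis H is "the workload is I/O-intensive" and evidence E is "high disk-block writes observed".

```python
def posterior(prior, likelihood, likelihood_alt):
    """Bayes' rule: P(H|E,c) = P(E|H,c) * P(H|c) / P(E|c),
    with the evidence P(E|c) expanded over H and its complement."""
    evidence = likelihood * prior + likelihood_alt * (1.0 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: P(H|c) = 0.2, P(E|H,c) = 0.9, P(E|not H,c) = 0.1.
print(round(posterior(0.2, 0.9, 0.1), 3))  # 0.692: the evidence raises belief in H
```

Even with a low prior of 0.2, evidence that is nine times likelier under H than under its complement lifts the posterior to about 0.69, which is the update mechanism the classifier exploits.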
Bayesian Networks capture Bayes' rule in a graphical model. They are very effective for modeling situations where some information is already known and incoming data is uncertain or partially unavailable (unlike rule-based or "expert" systems, where uncertain or unavailable data results in ineffective or inaccurate reasoning). This robustness in the face of imperfect knowledge is one of the many reasons why Bayesian Networks are increasingly used as an alternative to other AI representational formalisms. Bayesian networks have been applied to many areas successfully, including map learning [59], medical diagnosis [60][61], and speech and vision processing [62][63]. Compared with other predictive models, such as decision trees and neural networks, and with the standard feature selection model that is based on Principal Component Analysis (PCA), Bayesian networks also have the advantage of interpretability. Human experts can easily understand the network structure and modify it to obtain better predictive models. By adding decision nodes and utility nodes, BN models can also be extended to decision networks for decision analysis [64].

Consider a domain U of n variables, x1, ..., xn. Each variable may be discrete, having a finite or countable number of states, or continuous. Given a subset X of the variables xi, where xi ∈ U, if one can observe the state of every variable in X, then this observation is called an instance of X and is denoted as X = kX for the observations xi = ki, xi ∈ X. The "joint space" of U is the set of all instances of U. p(X = kX | Y = kY, ξ) denotes the "generalized probability density" that X = kX given Y = kY for a person with current state information ξ. p(X | Y, ξ) then denotes the "Generalized Probability Density Function" (gpdf) for X, given all possible observations of Y. The joint gpdf over U is the gpdf for U.

A Bayesian network for domain U represents a joint gpdf over U. This representation consists of a set of local conditional gpdfs combined with a set of conditional independence
Figure 3-1. Sample Bayesian network generated by the feature selector

assertions that allow the construction of a global gpdf from the local gpdfs. As shown previously, the chain rule of probability can be used to ascertain these values:

p(x_1, ..., x_n | ξ) = ∏_{i=1}^{n} p(x_i | x_1, ..., x_{i-1}, ξ)    (3-1)

One assumption imposed by Bayesian Network theory (and indirectly by the Product Rule of probability theory) is that for each variable x_i, Π_i ⊆ {x_1, ..., x_{i-1}} must be a set of variables that renders x_i and {x_1, ..., x_{i-1}} conditionally independent. In this way:

p(x_i | x_1, ..., x_{i-1}, ξ) = p(x_i | Π_i, ξ)    (3-2)

A Bayesian Network Structure then encodes the assertions of conditional independence in Equation 3-1 above. Essentially then, a Bayesian Network Structure BS is a directed acyclic graph such that each variable in U corresponds to a node in BS, and the parents of the node corresponding to x_i are the nodes corresponding to the variables in Π_i.

Depending on the problem that is defined, either (or both) of the topology and the probability distribution of the Bayesian Network can be pre-defined by hand or may be
3-1 gives a sample BN learned in the experiment. The root is the application class decision node, which is used to decide an application class given the values of the leaf nodes. The root node is the parent of all other nodes. The leaf nodes represent selected performance metrics, such as network packets sent and bytes written to disk. They are connected one to another in a series.

[22][65]. For example, if x1 and x2 are two points from a distribution which is characterized by the covariance matrix Σ, then the quantity

((x1 - x2)^T Σ^{-1} (x1 - x2))^{1/2}

is called the Mahalanobis distance from x1 to x2, where T denotes the transpose of a matrix.

In cases where there are correlations between variables, the simple Euclidean distance is not an appropriate measure, whereas the Mahalanobis distance can adequately account for the correlations and is scale-invariant. Statistical analysis of the performance data in Section 3.4.3 shows that there are correlations between the application performance metrics to various degrees. Therefore, the Mahalanobis distance between the unlabeled performance sample and the class centroid, which represents the average of all existing training data of the class, is used in the training data qualification process in Section 3.3.1.

The confusion matrix [66] is commonly used to evaluate the performance of classification systems. It shows the predicted and actual classifications done by the system. The matrix size is L × L, where L is the number of different classes. In our case, where there are five target application classes, L is equal to 5.
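The Mahalanobis distance above is straightforward to compute; a small sketch with a hypothetical covariance of two correlated metrics (with the identity covariance it reduces to the Euclidean distance):

```python
import numpy as np

def mahalanobis(x1, x2, cov):
    """((x1 - x2)^T Sigma^{-1} (x1 - x2))^{1/2}."""
    d = np.asarray(x1, float) - np.asarray(x2, float)
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

cov = np.array([[4.0, 2.0],   # two positively correlated metrics (hypothetical)
                [2.0, 3.0]])
print(mahalanobis([3.0, 1.0], [1.0, 0.0], cov))
# Identity covariance recovers the Euclidean distance:
print(mahalanobis([3.0, 4.0], [0.0, 0.0], np.eye(2)))  # 5.0
```

Because the covariance rescales each direction by its spread, a sample that is far in raw units along a high-variance, correlated direction can still be close in Mahalanobis terms, which is exactly why it suits correlated performance metrics.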
3-1 shows a sample confusion matrix with L = 2. There are only two possible classes in this example: positive and negative. Therefore, the classification accuracy can be calculated as (a + d) / (a + b + c + d).

3-2 shows the autonomic feature selection framework in the context of application classification. In this section, we are going to focus on introducing the classification training center, which enables the self-configurability for online application classification. The training center has two major functions: quality assurance of training data, which enables the classifier to adapt to changing workloads, and systematic feature selection, which supports automatic feature selection. The training center consists of three components: the data quality assuror, the feature selector, and the trainer.

The training data pool consists of representative data of five application classes, including CPU-intensive, I/O-intensive, memory-intensive, network-intensive, and idle. The training data of each class c is a set of Kc m-dimensional points, where m is the number of application-specific performance metrics reported by the monitoring tools. To select the

Table 3-1. Sample confusion matrix with two classes (L = 2)

                        Predicted
Actual class      Negative    Positive
Negative          a           b
Positive          c           d
Feature selection model. The Performance profiler collects performance metrics of the target application node. The Application classifier classifies the application using extracted key components and performs statistical analysis of the classification results. The Data QA selects the training data for the classification. The Feature selector selects performance metrics which can provide optimal classification accuracy. The Trainer trains the classifier using the selected metrics of the training data. The Application DB stores the application class information. (t0/t1 are the beginning/ending times of the application execution; VM IP is the IP address of the application's host machine.)

training data from the application snapshots, only n out of the m metrics are extracted based on the previous feature selection result to form a set of Kc n-dimensional training points.

Consider the Kc n-dimensional training points x_c(t), t = 1, ..., Kc, that comprise a cluster Cc. From [50], it follows that the n-tuple
c̄_c = (x̄_1c, x̄_2c, ..., x̄_nc),  where x̄_ic = (1/Kc) Σ_{t=1}^{Kc} x_ic(t),    (3-5)

is called the centroid of the cluster Cc.

The training data selection is a three-step process: First, the Data QA extracts the n out of m metrics of the input performance snapshot to form a training data candidate. Thus each candidate is represented by an n-dimensional point x = (x1, x2, ..., xn). Second, it evaluates whether the input candidate is qualified to be training data representing one of the application classes. At last, the qualified training data candidate is associated with a scalar value Class, which defines the application class.

The first step is straightforward. In the second and third steps, the Mahalanobis distance between the training data candidate x and the centroid c̄_c of the cluster Cc is calculated as follows:

d_c(x) = ((x - c̄_c)^T Σ_c^{-1} (x - c̄_c))^{1/2}

where c = 1, 2, ..., 5 represents the application class and Σ_c^{-1} denotes the inverse covariance matrix of the cluster Cc. The distance from the training data candidate x to the boundary between two class clusters, for example C1 and C2, is |d1(x) - d2(x)|. If |d1(x) - d2(x)| = 0, the candidate is exactly at the boundary between classes 1 and 2. The further away the candidate is from the class boundaries, the better it can represent a class; in other words, there is less probability for it to be misclassified. Therefore, the Data QA calculates the distance from the candidate to the boundaries of all possible pairs of the classes. If the minimal distance to the class boundaries, min(|d1 - d2|, |d1 - d3|, ..., |d4 - d5|), is bigger than a predefined threshold, the corresponding m-dimensional snapshot of the candidate is determined to be qualified training data of
Table 3-2. Sample performance metrics in the original feature set

Metric                    Description
cpu_system/user/idle      Percent CPU system/user/idle
cpu_nice                  Percent CPU nice
bytes_in/out              Number of bytes per second into/out of the network
io_bi/bo                  Blocks sent to/received from a block device (blocks/s)
swap_in/out               Amount of memory swapped in/out from/to disk (kB/s)
pkts_in/out               Packets in/out per second
proc_run                  Total number of running processes
load_one/five/fifteen     One/five/fifteen minute load average

the class, whose centroid has the smallest Mahalanobis distance min(d1, d2, ..., d5) to the snapshot. Automated and adaptive threshold setting is discussed in detail in [67].

In our implementation, Ganglia is used as the monitoring tool and twenty (m = 20) performance metrics, which are related to resource usage, are included in the training data. These performance metrics include 16 out of the 33 default metrics monitored by Ganglia and 4 metrics that we added based on the needs of classification. The four metrics include the number of I/O blocks read from/written to disk and the number of memory pages swapped in/out. A program was developed to collect these four metrics (using vmstat) and add them to the metric list of Ganglia's monitoring daemon gmond. Table 3-2 shows some sample performance metrics of the training candidate.

The first round of quality assurance was performed by a human expert at initialization. The subsequent assurance can be conducted automatically by following the above steps to select representative training data for each class.
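The qualification test described above (label by nearest centroid, accept only candidates far from every class boundary) can be sketched as follows; the two-class clusters, identity covariances, and threshold are hypothetical, whereas the real Data QA uses five classes and covariances estimated from the training pool:

```python
import itertools
import numpy as np

def qualify(x, centroids, inv_covs, threshold):
    """Return the class index of the nearest centroid if the candidate is
    further than `threshold` from every class boundary; otherwise None."""
    d = [float(np.sqrt((x - c) @ ic @ (x - c)))     # Mahalanobis distances d_c(x)
         for c, ic in zip(centroids, inv_covs)]
    margin = min(abs(d[i] - d[j])                   # min |d_i - d_j| over class pairs
                 for i, j in itertools.combinations(range(len(d)), 2))
    return int(np.argmin(d)) if margin > threshold else None

centroids = [np.array([0.0, 0.0]), np.array([10.0, 0.0])]
inv_covs = [np.eye(2), np.eye(2)]
print(qualify(np.array([1.0, 0.0]), centroids, inv_covs, 2.0))  # 0 (clearly class 0)
print(qualify(np.array([5.0, 0.0]), centroids, inv_covs, 2.0))  # None (on the boundary)
```

Rejecting boundary samples keeps ambiguous snapshots out of the training pool, which is what lets the subsequent rounds of quality assurance run without a human expert.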
Input:   C: training data set with N features
         Class: class of training data (teacher for learning)
Output:  Sbest: selected feature subset
         Amax: maximum accuracy

D = discretize(C);                        // convert continuous to discrete features
Sbest = {}; Amax = 0;
repeat
    initialize Anode = 0;                 // max accuracy for each node
    initialize Fnode = 0;                 // selected feature for each node
    foreach F in ({F0, F1, ..., FN-1} - Sbest) do
        Accuracy = eval(D, Class, Sbest ∪ {F});  // evaluate Bayesian network with extra feature F
        if Accuracy > Anode then
            Anode = Accuracy; Fnode = F;  // store the current feature
        end
    end
    if Anode > Amax then
        Sbest = Sbest ∪ {Fnode}; Amax = Anode; Anode = Anode + 1;
    end
until (Anode <= Amax);
end

Figure 3-3. Bayesian-network based feature selection algorithm for application classification

collected from monitoring tools. By filtering out metrics which contribute less to the classification, it can help to not only reduce the computational complexity of subsequent classifications, but also improve the classification accuracy.

In our previous work [53], representative features were selected manually based on expert knowledge. For example, the performance metrics cpu_system and cpu_user are correlated to the behavior of CPU-intensive applications; bytes_in and bytes_out are correlated to network-intensive applications; io_bi and io_bo are correlated to I/O-intensive applications; and swap_in and swap_out are correlated to memory-intensive applications. However, to support on-line classification, it is necessary for feature selection to have the ability to adapt to changing workloads. Therefore, the static selection
A wrapper algorithm based on the Bayesian network is employed by the feature selector to conduct the feature selection. As introduced in Section 3.2.1, although this feature selection scheme reduces the reliance on human experts' knowledge, the Bayesian network's interpretability leaves the option open to integrate expert knowledge into the selection scheme to build a better classification model.

Figure 3-3 shows the feature selection algorithm. It starts with an empty feature subset Sbest = {}. To search for the best feature F, it uses the temporary feature set {Sbest ∪ F} to perform Bayesian Network classification for the discrete training data D. The classification accuracy is calculated by comparing the classification result with the true answer of the Class information contained in the training data. After the evaluation of accuracy using all remaining features ({F1, F2, ..., FN-1} - Sbest), the best accuracy is stored in Anode. If Anode is better than the previous best accuracy Amax achieved, the corresponding feature node is added to the feature subset to form the new subset. This process is repeated until the classification accuracy cannot be improved any more by adding any of the remaining features to the subset.
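The greedy search just described can be sketched in runnable form; the accuracy oracle below is a hypothetical stand-in for training and evaluating the Bayesian-network classifier on the discrete training data:

```python
def select_features(features, evaluate):
    """Greedy forward wrapper: repeatedly add the single feature that most
    improves accuracy; stop when no remaining feature helps (cf. Figure 3-3)."""
    best, best_acc = [], 0.0
    while True:
        node_acc, node_f = best_acc, None
        for f in features:
            if f in best:
                continue
            acc = evaluate(best + [f])      # stand-in for eval(D, Class, Sbest ∪ {F})
            if acc > node_acc:
                node_acc, node_f = acc, f   # store the current best feature
        if node_f is None:                  # no feature improved accuracy: stop
            return best, best_acc
        best, best_acc = best + [node_f], node_acc

# Hypothetical accuracy table: cpu_system and load_fifteen are jointly
# descriptive; every other subset scores a 0.5 baseline.
scores = {frozenset(["cpu_system"]): 0.70,
          frozenset(["load_fifteen"]): 0.65,
          frozenset(["cpu_system", "load_fifteen"]): 0.91}
evaluate = lambda s: scores.get(frozenset(s), 0.5)
subset, acc = select_features(["cpu_system", "load_fifteen", "pkts_in"], evaluate)
print(subset, acc)  # ['cpu_system', 'load_fifteen'] 0.91
```

The wrapper evaluates O(N) candidate subsets per accepted feature rather than all 2^N subsets, which is why it remains tractable for the twenty monitored metrics.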
In the experiments, all the applications were executed in a VMware GSX 2.5 virtual machine with 256MB memory. The virtual machine was hosted on an Intel(R) Xeon(TM) dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. The CTC and the application classifier were running on an Intel(R) Pentium(R) III 750MHz machine with 256MB RAM.

The first experiment was designed to show the relationship between classification accuracy and the number of features selected. The second experiment was designed to
Average classification accuracy of 10 sets of test data versus the number of features selected in the first experiment

Figure 3-5. Two-class test data distribution with the first two selected features
In the first experiment, the training data consist of performance snapshots of five classes of applications, including CPU-intensive, I/O-intensive, memory-intensive, and network-intensive applications, and snapshots collected from an idle application-VM, which has only "background noise" from system activity (i.e., without any application execution during the monitoring period). The feature selector's task is to select those metrics which can be used to classify the test set into five classes with optimal accuracy.

In all ten iterations of cross validation, two performance metrics (cpu_system and load_fifteen) were always selected as the best two features. Figure 3-6 shows a sample test data distribution with these two features. If we project the data onto the x-axis or y-axis, we can see that it is more difficult to differentiate the data from each class by using either cpu_system or load_fifteen alone than by using both metrics. For example, the cpu_system value ranges of the network-intensive application and the I/O-intensive application largely overlap. This makes it hard to classify these two applications with only the cpu_system metric. Compared with the one-metric classification, it is much easier to decide which class the test data belong to by using information from both metrics. In other words, the combination of multiple features is more descriptive than a single feature.

The classification accuracy versus the number of features selected for the above learned Bayesian network is plotted in Figure 3-4. It shows that with a small number of features (3 to 4), it can achieve above 90% classification accuracy for this 5-class classification.

In the second experiment, the training data consist of performance snapshots of two classes of applications, I/O-intensive and memory-intensive. Figure 3-5 shows its test data distribution with the first two selected features, bytes_in and pkts_in. A comparison of Figure 3-6 and Figure 3-5 shows that with a reduced number of application classes, higher classification accuracy can be achieved with a smaller number of features. For example,
Table 3-3. Confusion matrix of classification results with expert-selected and automatically-selected feature sets. A) Automatic. B) Expert. The bold numbers along the diagonal are the numbers of correctly classified data.

in this experiment, if we know that the application belongs to either the I/O-intensive or the memory-intensive class, with two selected features, 96% classification accuracy can be achieved versus 87% accuracy in the 5-class case. It shows the potential of using pair-wise classification to improve the classification accuracy for multi-class cases. Using the pair-wise approach for multi-class classification is a topic of future research.

[53]. First, the training data distributions based on principal components, which are derived from the automatically selected features in Section 3.4.1 and the manually selected features in previous work [53], are shown in Figure 3-8. Distances between each pair of class centroids in Figure 3-8 are calculated and plotted in Figure 3-7. It shows that
Five-class test data distribution with the first two selected features

Figure 3-7. Comparison of distances between cluster centers derived from expert-selected and automatically selected feature sets. 1: idle-cpu, 2: idle-I/O, 3: idle-net, 4: idle-mem, 5: cpu-I/O, 6: cpu-net, 7: cpu-mem, 8: I/O-net, 9: I/O-mem, 10: net-mem.
Training data clustering diagram derived from expert-selected and automatically selected feature sets. A) Automatic. B) Expert.
Second, the PCA and k-NN based classifications were conducted with both the 8 expert-selected features from previous work [53] and the automatically selected features from Section 3.4.1. Table 3-3 shows the confusion matrices of the classification results. If data are classified to the same classes as their actual classes, the classifications are considered correct. The classification accuracy is the proportion of the total number of classifications that were correct. The confusion matrices show that a classification accuracy of 98.05% can be achieved with the automatically selected feature set, which is similar to the 98.14% accuracy achieved with the expert-selected feature set. Thus the automatic feature selection that is based on the Bayesian Network can reduce the reliance on expert knowledge while offering competitive classification accuracy compared to manual selection by a human expert.

In addition, the set of 8 features selected in the 5-class feature selection experiment in Section 3.4.1 was used to configure the application classifier, and the same training data used in the feature selection experiment were used to train the application classifier. Then the trained classifier conducted classification for a set of three benchmark programs: SPECseis96 [29], PostMark and PostMark_NFS [28]. SPECseis96 is a scientific application which is computing-intensive but also exercises disk I/O in the initial and end phases of its execution. PostMark originally is a disk I/O benchmark program. In PostMark_NFS, a network file system (NFS) mounted directory was used to store the files which were read/written by the benchmark. Therefore, PostMark_NFS performs substantial network I/O rather than disk I/O. The classification results are shown in Figure 3-9. The results show that 86% of the SPECseis96 test data were classified as cpu-intensive, 95% of the PostMark data were classified as I/O-intensive, and 61% of the PostMark_NFS
Classification results of benchmark programs. A) SPECseis96. B) PostMark. C) PostMark_NFS. Principal components 1 and 2 are the principal component metrics extracted by PCA.
Table 3-4. Performance metric correlation matrices of test applications. A) Correlation matrix of SPECseis96 performance data. B) Correlation matrix of PostMark performance data. C) Correlation matrix of NetPIPE performance data. (1: load_five, 2: pkts_in, 3: cpu_system, 4: load_fifteen, 5: pkts_out, 6: bytes_out. Correlations that are larger than 0.5 are highlighted in bold; entries lost in extraction are marked n/a.)

A)
Metric    1       2       3       4       5       6
1         1.00   -0.21   -0.34    n/a     0.20   -0.02
2        -0.21    1.00   -0.16   -0.02   -0.17   -0.06
3        -0.34   -0.16    1.00    n/a     0.20   -0.05
4         n/a    -0.02    n/a     1.00   -0.19    0.04
5         0.20   -0.17    0.20   -0.19    1.00    0.12
6        -0.02   -0.06   -0.05    0.04    0.12    1.00

B)
Metric    1       2       3       4       5       6
1         1.00   -0.24    0.22    0.34   -0.08   -0.13
2        -0.24    1.00   -0.22    0.18    0.04   -0.02
3         0.22   -0.22    1.00    0.33    0.30    0.18
4         0.34    0.18    0.33    1.00    0.42    0.47
5        -0.08    0.04    0.30    0.42    1.00    0.20
6        -0.13   -0.02    0.18    0.47    0.20    1.00

C)
Metric    1       2       3       4       5       6
1         1.00    0.29    0.31    0.48    0.27    0.30
2         0.29    1.00    0.49    0.39    0.95    n/a
3         0.31    0.49    1.00    0.50    0.52    n/a
4         0.48    0.39    0.50    1.00    0.42    0.39
5         0.27    0.95    0.52    0.42    1.00    n/a
6         0.30    n/a     n/a     0.39    n/a     1.00

data were classified as network-intensive. The results matched our empirical experience with these programs and are close to the results of the expert-selected-feature based classification, which shows 85% cpu-intensive for SPECseis96, 97% I/O-intensive for PostMark, and 62% network-intensive for PostMark_NFS.
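The PCA plus k-NN pipeline used in these experiments can be sketched as follows. This is a toy version with synthetic two-class "snapshots" standing in for the monitored metric vectors, PCA done via SVD, and k fixed at 1; the real classifier works on the selected Ganglia metrics:

```python
import numpy as np

# Synthetic stand-ins for labeled performance snapshots: two classes
# separated along different metric axes of a 6-metric space.
rng = np.random.default_rng(0)
a = rng.normal(0.0, 0.3, size=(50, 6)); a[:, 0] += 3.0   # "class 0" snapshots
b = rng.normal(0.0, 0.3, size=(50, 6)); b[:, 1] += 3.0   # "class 1" snapshots
X = np.vstack([a, b]); y = np.array([0] * 50 + [1] * 50)

# PCA: project centered data onto the top-2 principal components.
mean = X.mean(axis=0)
_, _, vt = np.linalg.svd(X - mean, full_matrices=False)
project = lambda pts: (np.atleast_2d(pts) - mean) @ vt[:2].T

def knn1(query):
    """1-NN classification in the reduced principal-component space."""
    z = project(query)[0]
    return int(y[np.argmin(np.linalg.norm(project(X) - z, axis=1))])

t0 = np.zeros(6); t0[0] = 3.0
print(knn1(t0))  # 0: the query resembles the class-0 snapshots
```

Reducing to the leading principal components before the nearest-neighbor step is what keeps the per-sample classification cost low enough for the online use reported earlier.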
The data quality assuror classifies each unlabeled test data point by identifying its nearest neighbor among all class centroids. Its performance thus depends crucially on the distance metric used to identify the nearest class centroid. In fact, a number of researchers have demonstrated that nearest neighbor classification can be greatly improved by learning an appropriate distance metric from labeled examples [65].

Table 3-4 shows the correlation coefficients of each pair of the first six performance metrics collected during the application execution, including load_five, pkts_in, cpu_system, load_fifteen, pkts_out, and bytes_out. Three applications are used in these experiments: SPECseis96 [29], PostMark [28] and NetPIPE [34].

The experiments show that there are correlations between pairs of performance metrics to various degrees. For example, NetPIPE's bytes_out metric is highly correlated with its pkts_in, pkts_out, and cpu_system metrics. In cases where there are correlations between metrics, a distance metric which can take the correlation into account when determining the distance from the class centroid should be used. Therefore, the Mahalanobis distance is used in the training data selection process.

[39][68] and classification techniques have been applied to many areas successfully, such as intrusion detection [69][40][42], text categorization [44], speech and image processing [62][63], and medical diagnosis [60][61].

The following works applied these techniques to analyze system performance. However, they differ from each other in the following aspects: goals of feature selection, the features under study, and implementation complexity.

Nickolayev et al. used statistical clustering techniques to identify the representative processors for parallel application performance tuning [50]. Only event tracing of the
Ahn et al. applied various statistical techniques to extract the important performance counter metrics for application performance analysis [51]. Their prototype can support parallel applications' performance analysis by collecting and aggregating local data. It requires annotation of application source code as well as appropriate operating system and library support to collect process information, which is based on hardware counters.

Cohen et al. [52] studied the correlation between component performance metrics and SLO violations in Internet server platforms. There are some similarities between their work and ours in terms of the level of performance metrics under study and the type of classifier used. However, our study differs from theirs in the following ways. First, our study focuses on application classification (CPU-intensive, I/O- and paging-intensive, and network-intensive) for resource scheduling, whereas their study focused on performance anomaly detection (SLO violation and compliance). Second, our prototype targets support for online classification. It addressed the training data qualification problem to adapt the feature selection to changing workloads; online training data selection problems were not the focus of [52]. Third, in our prototype, virtual machines were used to host application executions and summarize the application's resource usage. The prototype supports a wide range of applications, such as scientific programs and business online transaction systems, while [52] studied web applications in three-tier client/server systems.

In addition to [52], Aguilera et al. [70] and Magpie [71] also studied performance analysis of distributed systems. However, they considered message-level traces of system activities instead of system-level performance metrics. Both of them treated components of distributed systems as black boxes. Therefore, their approaches do not require application and middleware modifications.
The integration of multiple predictors promises higher prediction accuracy than the accuracy that can be obtained with a single predictor. The challenge is how to select the best predictor at any given moment. Traditionally, multiple predictors are run in parallel and the one that generates the best result is selected for prediction. In this chapter, we propose a novel approach to predictor integration based on the learning of historical predictions. Compared with the traditional approach, it does not require running all the predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest Neighbor (k-NN) and Bayesian classification, and a dimension reduction technique such as Principal Component Analysis (PCA), to forecast the best predictor for the workload under study based on the learning of historical predictions. Then only the forecasted best predictor is run for prediction.

Grid computing [72] enables entities to create a Virtual Organization (VO) to share their computation resources such as CPU time, memory, network bandwidth, and disk bandwidth. Predicting the dynamic resource availability is critical to adaptive resource scheduling. However, determining the most appropriate resource prediction model a priori is difficult due to the multi-dimensionality and variability of system resource usage. First, the applications may exercise the use of different types of resources during their executions. Some resource usages such as CPU load may be relatively smoother, whereas others such as network bandwidth are burstier. It is hard to find a single prediction model which works best for all types of resources. Second, different applications may have different resource usage patterns. The best prediction model for a specific resource of one machine may not work best for another machine. Third, the resource performance fluctuates dynamically due to the contention created by competing applications. Indeed, in the absence of a perfect prediction model, the best predictor for any particular resource may change over time.
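The selection idea pursued in this chapter can be illustrated with a toy sketch (not the dissertation's implementation): summarize a recent measurement window by a few statistics, then let a nearest-neighbor rule, trained on historical (statistics, best-predictor) pairs, forecast which predictor to run next. The windows, statistics, and labels below are all hypothetical.

```python
import numpy as np

def window_stats(w):
    """Feature vector for a measurement window: mean, std, lag-1 autocorrelation."""
    w = np.asarray(w, float)
    a, b = w[:-1], w[1:]
    ac = np.corrcoef(a, b)[0, 1] if a.std() > 0 and b.std() > 0 else 0.0
    return np.array([w.mean(), w.std(), ac])

class LARSelector:
    """1-NN forecast of the best predictor from historical window statistics."""
    def __init__(self):
        self.X, self.y = [], []
    def record(self, window, best_predictor):
        self.X.append(window_stats(window)); self.y.append(best_predictor)
    def forecast_best(self, window):
        z = window_stats(window)
        d = [np.linalg.norm(x - z) for x in self.X]
        return self.y[int(np.argmin(d))]

sel = LARSelector()
sel.record([1.0, 1.0, 1.0, 1.0], "window_mean")  # smooth load: averaging worked
sel.record([0.1, 5.0, 0.2, 4.8], "last_value")   # bursty trace: hypothetical label
print(sel.forecast_best([2.0, 2.1, 2.0, 2.1]))   # resembles the smooth case
```

Only the forecasted predictor then needs to run, which is the cost advantage over running the whole pool in parallel.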
Our experimental results based on the analysis of a set of virtual machine trace data show:

1. The best prediction model is workload-specific. In the absence of a perfect prediction model, it is hard to find a single predictor which works best across virtual machines that have different resource usage patterns.

2. The best prediction model is resource-specific. It is hard to find a single predictor which works best across different resource types.

3. The best prediction model for a specific type of resource of a given VM trace varies as a function of time. The LAR Predictor can adapt the predictor selection to the change of the resource consumption pattern.

4. In the experiments with a set of trace data, the LAR Predictor outperformed the observed single best predictor in the pool for 44.23% of the traces and outperformed the cumulative-MSE based prediction model used in the Network Weather Service system
[73] for 66.67% of the traces. It has the potential to consistently outperform any single predictor for variable workloads and achieve 18.63% lower MSE than the model used in the NWS.

The rest of the chapter is organized as follows: Section 4.2 gives an overview of related work. Section 4.4 describes the linear time series prediction models used to construct the LARPredictor and Section 4.5 describes the learning techniques used for predictor selection. Section 4.6 details the workflow of the learning-aided adaptive resource predictor. Section 4.7 discusses the experimental results. Section 4.8 summarizes the work and describes future directions.

Time series prediction techniques have been applied in many areas such as economics [74], biomedical signal processing [75], and geoscience [76]. In this work, we focus on time series modeling for computer resource performance prediction.

In [77] and [78], Dinda et al. conducted an extensive study of the statistical properties and the prediction of host load. Their work indicates that CPU load is strongly correlated over time, which implies that history-based load prediction schemes are feasible. They evaluated the predictive power of a set of linear models including autoregression (AR), moving average (MA), autoregression integrated moving average (ARIMA), autoregression fractionally integrated moving average (ARFIMA), and window-mean models. Their results show that the AR model is the best, in terms of high prediction accuracy and low overhead, among the models they studied. Based on their conclusion, the AR model is included in our predictor pool to leverage its performance.

To improve prediction accuracy, various adaptive techniques have been exploited by the research community. In [4], Yang et al. developed a tendency-based prediction model that predicts the next value according to the tendency of the time series change. An increment/decrement value is added to/subtracted from the current measurement, based on the current measurement and some other dynamic information, to predict the
[79]. In addition, in [80], Liang et al. proposed a multi-resource prediction model that uses both the autocorrelation of the CPU load and the cross correlation between the CPU load and free memory to achieve higher CPU load prediction accuracy. Vazhkudai et al. [81][82] used linear regression to predict the data transfer time from network bandwidth or disk throughput.

The Network Weather Service (NWS) [73] performs prediction of both network throughput and latency for host machines distributed over different geographic distances. Both the NWS and the LARPredictor use the mix-of-experts approach to select the best predictor at any given moment. However, they differ from each other in the way the best predictor is selected. The prediction model used in the NWS system runs a set of predictors in parallel to track their prediction accuracies. A cumulative error measurement, Mean Square Error (MSE), is calculated for each predictor. The one that generates the lowest prediction error for the known measurements is chosen to make a forecast of future measurement values. Section 4.6 shows that the LARPredictor only uses parallel prediction during the training phase. In the testing phase, it uses the PCA and the k-NN classifier to forecast the best predictor for the next value based on the learning of historical prediction performance. Only the forecasted best predictor is run to predict the next value.

The mix-of-experts approach has also been applied in the text recognition and categorization area. The combination of multiple classifiers has been proved to be able to increase the recognition rate in difficult problems when compared with a single classifier [83]. Different combination strategies, such as weighted voting and probability-based voting, and dimensionality reduction based on concept indexing are introduced in [84].
Figure 4-1. Virtual machine resource usage prediction prototype. The monitor agent, which is installed in the Virtual Machine Monitor (VMM), collects the VM resource performance data and stores them in the round robin VM Performance Database. The profiler extracts the performance data of a given time frame for the VM indicated by VM ID and device ID. The LARPredictor selects the best prediction model based on learning of historical predictions, predicts the resource performance for time t+1, and stores the prediction results in the prediction database. The prediction results can be used to support the resource manager in performing dynamic VM resource allocation. The Performance Quality Assuror (QA) audits the LARPredictor's performance and orders re-training for the predictor if the performance drops below a predefined threshold.

Our virtual machine resource prediction prototype, illustrated in Figure 4-1, models how the VM performance data are collected and used to predict values for future times to support resource allocation decision-making.

A performance monitoring agent is installed in the Virtual Machine Monitor (VMM) to collect the performance data of the guest VMs. In our implementation, VMware's ESX virtual machines are used to host the application executions and the vmkusage tool [85] of ESX is used to monitor and collect the performance data of the VM guests and host
Table 2-1 shows the list of performance features under study in this work.

The profiler retrieves the VM performance data, which are identified by vmID, deviceID, and a time window, from the round robin performance database. The data of each VM device's performance metric form a time series (x_{t-m+1}, ..., x_t) with an identical interval, where m is the data retrieval window size. The retrieved performance data with the corresponding timestamps are stored in the prediction database. The [vmID, deviceID, timeStamp, metricName] tuple forms the combinational primary key of the database. Figure 4-2 shows the XML schema of the database and sample database records of virtual machines such as VM1, which has one CPU, two Network Interface Cards (NIC), and two virtual hard disks.

The LARPredictor takes the time series performance data (y_{t-m}, ..., y_{t-1}) as inputs, selects the best prediction model based on the learning of historical prediction results, and predicts the resource performance ŷ_t of a future time. A detailed description of the LARPredictor's workflow is given in Section 4.6. The predicted results are stored in the prediction DB and can be used to support the resource manager's dynamic VM provisioning decision-making.

The Prediction Quality Assuror (QA) is responsible for monitoring the LARPredictor's performance in terms of MSE. It periodically audits the prediction performance by calculating the average MSE of historical prediction data stored in the prediction DB. When the average MSE of the data in the audit window exceeds a predefined threshold, it directs the LARPredictor to re-train the predictors and the classifier using recent performance data stored in the database.
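The profiler's windowing step can be illustrated in a few lines. The helper below (`frame_series` is a hypothetical name, not code from the prototype) turns a sampled metric into the overlapping windows (x_{t-m+1}, ..., x_t) described above:

```python
import numpy as np

def frame_series(x, m):
    """Frame a 1-D series into overlapping windows (x[t-m+1], ..., x[t])."""
    x = np.asarray(x, dtype=float)
    return np.stack([x[i:i + m] for i in range(len(x) - m + 1)])

# A series of 5 samples framed with window size m = 3 yields 3 windows:
windows = frame_series([1, 2, 3, 4, 5], 3)
```

Each row of `windows` is one retrieval window; in the prototype these rows would be keyed by [vmID, deviceID, timeStamp, metricName] in the prediction DB.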
Figure 4-2. Sample XML schema of the VM performance DB

A linear time series model expresses the observed series as a weighted sum of white noise terms:

Z_t = a_t + ψ_1 a_{t-1} + ψ_2 a_{t-2} + ... = Σ_{i=0..∞} ψ_i a_{t-i}

where {Z_t} denotes the observed time series, {a_t} denotes an unobserved white noise series, and {ψ_i} denotes the weights. In this thesis, performance snapshots of a virtual machine's resources, including CPU, memory, disk, and network bandwidth, are taken periodically to form the time series {Z_t} under study.
[86]. Time series analysis techniques have been widely applied to forecasting in many areas such as economic forecasting, sales forecasting, stock market analysis, communication traffic control, and workload projection. In this work, simple time series models, such as LAST, sliding-window average (SW_AVG), and autoregressive (AR), are used to construct the LARPredictor to support online prediction. However, the LARPredictor prototype may be generally used with other prediction models studied in [78][73][4].

SW_AVG model: The sliding-window average model predicts future values by taking the average over a fixed-length history:

ẑ_{t+1} = (1/m) Σ_{i=0..m-1} z_{t-i}

AR model: The current value of the series Z_t is a linear combination of the p latest past values of itself plus a term a_t, which incorporates everything new in the series at time t that is not explained by the past values:

Z_t = φ_1 Z_{t-1} + ... + φ_p Z_{t-p} + a_t

The Yule-Walker technique is used in the AR model fitting in this work.
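A minimal sketch of the three predictors might look as follows. The function names are illustrative, not from the thesis code; `fit_ar_yule_walker` estimates the AR coefficients from sample autocovariances by solving the Yule-Walker equations, which is one standard way to realize the fitting step described above:

```python
import numpy as np

def predict_last(history):
    return history[-1]                      # LAST: repeat the latest value

def predict_sw_avg(history, m):
    return float(np.mean(history[-m:]))     # SW_AVG: mean over a fixed window

def fit_ar_yule_walker(z, p):
    """Estimate AR(p) coefficients phi from sample autocovariances r."""
    z = np.asarray(z, float) - np.mean(z)
    n = len(z)
    r = np.array([np.dot(z[:n - k], z[k:]) / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])        # Yule-Walker: R * phi = r[1:p+1]

def predict_ar(history, phi):
    """One-step AR forecast: mu + sum_i phi_i * (z_{t-i} - mu)."""
    mu = np.mean(history)
    h = np.asarray(history, float) - mu
    return mu + float(np.dot(phi, h[-1:-len(phi) - 1:-1]))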
SW_AVG is proposed to predict the VM resource performance.

The prediction performance is measured in mean squared error (MSE) [87], which is defined as the average squared difference between independent observations and predictions from the fitted equation for the corresponding values of the independent variables:

MSE(θ̂) = E[(θ̂ - θ)²]   (4-5)

where θ̂ is the estimator of a parameter θ in a statistical model.

There are two types of classifiers: nonparametric and parametric. A parametric classifier exploits prior information to model the feature space. When the assumed model is correct, parametric classifiers outperform nonparametric ones. In contrast, nonparametric classifiers do not make such assumptions and are more robust. However, nonparametric classifiers tend to suffer from the curse of dimensionality, which means that the number of samples demanded grows exponentially with the dimensionality of the feature space. In this section, we introduce a nonparametric classifier, the k-NN classifier, and a parametric classifier, the Bayesian classifier, which are used for best predictor selection in
Since the features under study, such as CPU percentage and network received bytes/sec, have different units of measure, all features are normalized to have zero mean and unit variance [88]. In this work, "closest" is determined by Euclidean distance (Equation 4-6).

As a nonparametric method, the k-NN classifier can be applied to different time series without modification. To address the problem associated with high dimensionality, various dimension reduction techniques can be used in the data preprocessing.
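The k-NN step described above can be sketched as follows, assuming z-score normalization and Euclidean distance; `zscore` and `knn_predict` are hypothetical helper names, with the majority vote taken over the k nearest training samples:

```python
import numpy as np

def zscore(X):
    """Normalize each feature to zero mean and unit variance."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k Euclidean-nearest training samples."""
    d = np.linalg.norm(train_X - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique([train_y[i] for i in nearest],
                               return_counts=True)
    return labels[np.argmax(counts)]
```

With k = 3, as in the implementation described later, a test point surrounded by samples of one class is assigned that class's label.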
where, in the case of c categories,

p(x) = Σ_{j=1..c} p(x|ω_j) P(ω_j)

Then, the posterior probabilities P(ω_j|x) can be computed from p(x|ω_j) by Bayes formula. In addition, Bayes formula can be expressed informally in English by saying that

posterior = (likelihood × prior) / evidence.

The multivariate normal density has been applied successfully to a number of classification problems. In this work the feature vector can be modeled as a multivariate normal random variable.

The general multivariate normal density in d dimensions is written as

p(x) = (2π)^{-d/2} |Σ|^{-1/2} exp( -(1/2) (x-μ)^T Σ^{-1} (x-μ) )

where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and Σ^{-1} are its determinant and inverse, respectively. Further, we let (x-μ)^T denote the transpose of (x-μ).

The minimization of the probability of error can be achieved by use of the discriminant functions
g_i(x) = -(1/2) (x-μ_i)^T Σ_i^{-1} (x-μ_i) - (d/2) ln 2π - (1/2) ln|Σ_i| + ln P(ω_i)

The resulting classification is performed by evaluating the discriminant functions. When the workloads have similar statistical properties, the Bayesian classifier derived from one workload trace can be applied to another directly. In the case of a highly variable workload, re-training of the classifier is necessary.

Principal Component Analysis (PCA) [22][88], also called the Karhunen-Loeve transform, is a linear transformation representing data in a least-squares sense. The principal components of a set of data in
The workflow of the LARPredictor is illustrated in Figure 4-3. The prediction consists of two phases: a training phase and a testing phase. During the training phase, the best predictors for each set of training data are identified using the traditional mix-of-experts approach. During the testing phase, the classifier forecasts the best predictor for the test data based on the knowledge gained from the training data and historical prediction performance. Then only the selected best predictor is run to predict the resource performance. Both phases include the data pre-processing and the Principal Component Analysis (PCA) process.

The features under study in this work, as shown in Table 2-1, include CPU, memory, network bandwidth, and disk I/O usages. Figure 4-4 illustrates how the features are processed to form the prediction database. Since the features have different units of measure, a data pre-processor was used to normalize the input data to zero mean and unit variance. The normalized data are framed according to the prediction window size to feed the PCA processor.

The LAST and SW_AVG models do not involve any unknown parameters. They can be used for predictions directly. Parametric prediction models such as the AR model, which contain unknown parameters, require model fitting. The model fitting is a process
Figure 4-3. Learning-aided adaptive resource predictor workflow. The input data are normalized and framed with the prediction window size m. The Principal Component Analysis (PCA) is used to reduce the dimension of the input data from the window size m to n (n < m).
to estimate the unknown parameters of the models. The Yule-Walker equation [86] is used in the AR model fitting in this work.

Figure 4-4. Learning-aided adaptive resource predictor data flow. First, the u training data X_{1×u} are normalized to X'_{1×u} and subsequently framed to X'_{(u-m+1)×m} according to the predictor order m. The PCA processor is used to reduce the dimension of each set of training data from m to n before prediction. Then the predictors are run in parallel with the inputs X''_{(u-m+1)×n} and the one that gives the smallest MSE is identified as the best predictor to be associated with the corresponding training data in the prediction database. The dimension reduction of the testing data is similar to the training data's and is not shown here.

For window based prediction models, such as SW_AVG and AR, the PCA algorithm is applied to reduce the input data dimension. The naive mix-of-experts approach is used to identify the best predictor p_i for each set of pre-processed training data (e.g., (x'_i, x'_{i+1}, ..., x'_{i+m-1})). All prediction models are run in parallel with the training data and the one which generates the least MSE of prediction is identified as the best predictor p_i, a class label taking values in (LAST, AR, SW_AVG), to be associated with the training data. The u pairs of PCA-processed training data and the corresponding best predictors [(x''_1, p_1), ..., (x''_u, p_u)] form the training data of the classifiers.

As a non-parametric classifier, the k-NN classifier does not have an obvious training phase. The major task of its training phase is to label the training data with class definitions. As a parametric classifier, the Bayesian classifier uses the training data to derive its unknown parameters, which are the mean and covariance matrix of the training data of each class, to form the classification model.
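The mix-of-experts labeling step can be sketched as follows. This is a simplified illustration, not the thesis implementation: it omits the PCA stage and uses only the two parameter-free predictors, labeling each training window with the predictor whose one-step prediction of the next value has the smallest squared error:

```python
import numpy as np

# Hypothetical predictor pool; the names mirror the class labels above.
PREDICTORS = {
    "LAST":   lambda w: w[-1],
    "SW_AVG": lambda w: float(np.mean(w)),
}

def label_best_predictors(x, m):
    """Run all predictors in parallel on each window (x_i .. x_{i+m-1}) and
    label the window with the one whose prediction of x_{i+m} has the
    smallest squared error; these labels train the classifier."""
    labels = []
    for i in range(len(x) - m):
        window, target = np.asarray(x[i:i + m], float), x[i + m]
        errs = {name: (p(window) - target) ** 2
                for name, p in PREDICTORS.items()}
        labels.append(min(errs, key=errs.get))
    return labels
```

On a monotone ramp LAST wins every window, while on an oscillating series the window average wins, so the labels do track the local workload shape.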
In the testing phase of the LARPredictor based on the k-NN classifier, the Euclidean distances between the PCA-processed test data (y''_{t-n}, y''_{t-n+1}, ..., y''_{t-1}) and all training data X''_{(u-m+1)×n} in the reduced n-dimensional feature space are calculated, and the k (k = 3 in our implementation) training data which have the shortest distances to the testing data are identified. The majority vote of the k nearest neighbors' best predictors is chosen as the best predictor to predict ŷ'_t, based on (y'_{t-m}, y'_{t-m+1}, ..., y'_{t-1}) in the case of the AR model or the SW_AVG model, and ŷ'_t = y'_{t-1} in the case of the LAST model. The prediction performance can be obtained by comparing the predicted value ŷ'_t with the normalized observed value y'_t.

In the testing phase of the LARPredictor based on the Bayesian classifier, test data are preprocessed the same as for the k-NN classifier. The PCA-processed test data (y''_{t-n}, y''_{t-n+1}, ..., y''_{t-1}) are plugged into the discriminant function (4-12) derived in Section 4.5.2. The parameters in the discriminant function for each class, the mean vector and covariance matrix, are obtained during the training phase. Then, each test datum is classified as the class of the largest discriminant function.

The testing phase differs from the training phase in that it does not require running multiple predictors in parallel to identify the one which is best suited to the data and gives the smallest MSE. Instead, it forecasts the best predictor by learning from historical predictions. The reasoning here is that these nearest neighbors' workload characteristics are closest to the testing data's, and the predictor that works best for these neighbors should also work best for the testing data.
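The Bayesian testing path can be sketched with a Gaussian discriminant of the form derived in Section 4.5.2. The helper names below are hypothetical; the shared -(d/2) ln 2π term is dropped since it does not change which class attains the maximum:

```python
import numpy as np

def fit_gaussian_class(X):
    """Per-class parameters: mean vector and covariance matrix."""
    X = np.asarray(X, float)
    return X.mean(axis=0), np.cov(X, rowvar=False)

def discriminant(x, mean, cov, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^-1 (x-mu) - 1/2 ln|Sigma| + ln P(w_i);
    the constant -(d/2) ln 2*pi is omitted (same for every class)."""
    diff = x - mean
    return (-0.5 * diff @ np.linalg.solve(cov, diff)
            - 0.5 * np.log(np.linalg.det(cov)) + np.log(prior))

def bayes_classify(x, params, priors):
    """Assign x to the class whose discriminant function is largest."""
    scores = [discriminant(x, m, c, p) for (m, c), p in zip(params, priors)]
    return int(np.argmax(scores))
```

In the prototype, each class would be one predictor label (LAST, AR, SW_AVG) and x a PCA-processed test window; here two synthetic Gaussian classes suffice to exercise the decision rule.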
These virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0 GHz CPU, 4 GB memory, and a 36 GB SCSI disk. VMware ESX Server 2.5.2 was running on the physical host. The vmkusage tool was run on the ESX server to collect the resource performance data of the guest virtual machines every minute and store them in a round robin database. The profiler was used to extract the data with given VM ID, Device ID, performance metric, starting and ending timestamps, and intervals. In this experiment, the performance data of a 24-hour period with 5-minute intervals were extracted for VM2, VM3, VM4, and VM5. The data of a 7-day period with 30-minute intervals were extracted for VM1. The data of a given VM ID, Device ID, and performance metric form a time series under study. The time series data were normalized to zero mean and unit variance.
Figure 4-5 shows the predictor selections for the CPU fifteen-minute load average during a 12-hour period with a sampling interval of 5 minutes. The top plot shows the observed best predictor obtained by running the three prediction models in parallel. The middle plot shows the predictor selection of the LARPredictor and the bottom plot shows the cumulative-MSE based predictor selection used in the NWS. Similarly, the predictor selection results of the trace data of other resources are shown as follows: network packets in per second in Fig. 4-6, total amount of swap memory in Fig. 4-7, and total disk space in Fig. 4-8.

These experimental results show that the best prediction model for a specific type of resource of a given trace varies as a function of time. In the experiment, the LARPredictor adapted the predictor selection to the changing workload better than the cumulative-MSE based approach used in the NWS. The LARPredictor's average best-predictor forecasting accuracy over all the performance traces of the five virtual machines is 55.98%, which is 20.18% higher than the accuracy of 46.58% achieved by the cumulative-MSE based predictor used in the NWS for the workloads studied.

Section 4.7.2.1 shows the prediction accuracy of the k-NN based LARPredictor and all the predictors in the pool. Section 4.7.2.2 compares the prediction accuracy and execution time of the k-NN based LARPredictor and the Bayesian based LARPredictor. In addition, Section 4.7.2.3 benchmarks the performance of the LARPredictors against the cumulative-MSE based prediction model used in the NWS.

In the experiments, ten-fold cross validation was performed for each set of time series data. A timestamp was randomly chosen to divide the performance data of a virtual machine into two parts: 50% of the data was used to train the LARPredictor and the other 50% was used as a test set to measure the prediction performance by calculating its prediction MSE.
Figure 4-5. Best predictor selection for trace VM2 load15. Predictor Class: 1-LAST, 2-AR, 3-SW_AVG

In the testing phase, the 3-NN classifier was used to forecast the best predictors of the testing data. First, for each set of testing data of the prediction window size, the PCA was applied to reduce the data dimension from m, which was 5 or 16, to n = 2 in
this experiment. Then the Euclidean distances between the test data and all the training data in the reduced feature space were calculated. The three training data which had the shortest distances to the testing data were identified, and the majority vote of their associated best predictors was forecasted to be the best predictor of the testing data. At last, the forecasted best predictor was run to predict the future value of the testing data. The MSE of each time series was calculated to measure the performance of the LARPredictor.

Figure 4-6. Best predictor selection for trace VM2 PktIn. Predictor Class: 1-LAST, 2-AR, 3-SW_AVG

Tables 4-1, 4-2, 4-3, 4-4, and 4-5 show the prediction performance of the LARPredictor with the current implementation (LAR) and the three prediction models, LAST, AR, and SW_AVG, for all resource performance traces of the five virtual machines. Also shown in these tables is the computed MSE for a perfect LARPredictor
(P-LAR). The MSE of the P-LAR model shows the upper bound of the prediction accuracy that can be achieved by the LARPredictor. The MSE of the best predictor among LAR, LAST, AR, and SW_AVG is highlighted with italic bold numbers.

Figure 4-7. Best predictor selection for trace VM2 Swap. Predictor Class: 1-LAST, 2-AR, 3-SW_AVG

Table 4-6 shows the best predictor among LAST, AR, and SW_AVG for all the resource performance metrics and VM traces. The symbol "*" indicates the cases in which the LARPredictor achieved equal or higher prediction accuracy than the best of the three predictors. Overall, the AR model performed better than the LAST and SW_AVG models.

The above experimental results show:
1. It is hard to find a single prediction model among LAST, AR, and SW_AVG that performs best for all types of resource performance data for a given VM trace. For example, for VM1's trace data shown in Table 4-1, each of the three models (LAST, AR, and SW_AVG) outperformed the other two for a subset of the performance metrics. In this experiment, only for the trace data of VM3 did the AR model work best throughout.

2. It is hard to find a single prediction model among the three that performs best consistently for a given type of resource across all the VM traces. In the experiment, only the AR model worked best for the CPU performance predictions.

3. The LARPredictor achieved better-than-expert performance using the mix-of-experts approach for 44.23% of the workload traces. It shows the potential for the

Figure 4-8. Best predictor selection for trace VM2 Disk. Predictor Class: 1-LAST, 2-AR, 3-SW_AVG
Table 4-1. Normalized prediction MSE statistics for resources of VM1

Figure 4-9 shows the prediction performance comparisons between it and the k-NN based LARPredictor for all the resources of VM1. The profile report of the Matlab program execution showed that the k-NN based LARPredictor cost 205.8 seconds of CPU time, with 193.5 seconds in the testing phase and 12.3 seconds in the training phase. It took 132.1
Table 4-2. Normalized prediction MSE statistics for resources of VM2

The experimental results show that the prediction accuracy, in terms of normalized MSE, of the Bayesian-classifier based LARPredictor is about 3.8% worse than that of the k-NN based one. However, it shortened the CPU time of the testing phase by 37.57%.

Figures 4-9, 4-10, 4-11, 4-12, and 4-13 show the prediction accuracy of the perfect LARPredictor that has 100% best-predictor forecasting accuracy (P-LARP), the k-NN and Bayesian based LARPredictors (KnnLARP and BaysLARP), the cumulative-MSE-over-all-history based predictor used in the NWS (Cum.MSE), and the cumulative-MSE
Table 4-3. Normalized prediction MSE statistics for resources of VM3

The experimental results show that, without running all the predictors in parallel all the time, the LARPredictor outperformed the cumulative-MSE based predictor used in the NWS for 66.67% of the traces. The perfect LARPredictor shows the potential to achieve 18.6% lower MSE on average than the cumulative-MSE based predictor.

[89]. In the context of resource performance time series prediction, W = 1 and d is the prediction window size. The typically small input data size in this context makes the use of the PCA feasible. There also exist computationally less expensive methods [90] for finding only a few eigenvectors and eigenvalues of a large matrix; in our experiments, we use appropriate Matlab routines to realize these.
Table 4-4. Normalized prediction MSE statistics for resources of VM4

Table 4-5. Normalized prediction MSE statistics for resources of VM5
Table 4-6. Best predictors of all the trace data. The predictors shown in the table have the smallest MSE among all three predictors (LAST, AR, and SW_AVG). The "*" symbol indicates that the LARPredictor outperforms the best predictor in the predictor pool.

              VM1      VM2      VM3      VM4      VM5
CPU_usedsec   AR       AR       AR       AR*      AR*
CPU_ready     AR       AR*      AR*      AR*      AR
Mem_size      LAST     AR*      AR*      LAST     AR*
Mem_swap      LAST     AR*      LAST     AR*
NIC1_Rx       AR*      AR       AR*      AR
NIC1_Tx       AR*      AR*      AR*      AR
NIC2_Rx       AR*      LAST     AR       SW_AVG
NIC2_Tx       AR*      AR*      AR*      AR
VD1_read      AR       AR       AR       SW_AVG
VD1_write     AR       AR       SW_AVG*  AR
VD2_read      SW_AVG   AR       AR*      AR
VD2_write     AR       AR       AR*      AR*      AR

The k-NN does not have an off-line learning phase. The "training phase" in k-NN is simply to index the N training data for later use. Therefore, the training complexity of k-NN is O(N) in both time and space. In the testing phase, the k nearest neighbors of a testing datum can be obtained in O(N) time by using a modified version of quicksort [91]. There are also fast algorithms for finding nearest neighbors [92][93].

Three simple time series models were used in this experiment to show the potential of using dynamic predictor selection based on learning to improve prediction accuracy. However, the LARPredictor prototype may be generally used with other, more sophisticated prediction models such as those studied in [78][73][4]. Generally, the more predictors in the pool and the more complex the predictors are, the more beneficial it is to use the LARPredictor, because the classification overhead can be better amortized by running only a single predictor at any given time.
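The O(N) neighbor search mentioned above can be realized with a selection algorithm rather than a full sort. A sketch using NumPy's introselect-based `argpartition` (an assumed stand-in for the modified quicksort of [91]):

```python
import numpy as np

def k_nearest(dists, k):
    """Indices of the k smallest distances in expected O(N) time:
    argpartition selects the k smallest without sorting the rest,
    then only those k survivors are ordered."""
    idx = np.argpartition(dists, k)[:k]
    return idx[np.argsort(dists[idx])]

d = np.array([9.0, 1.0, 7.0, 3.0, 5.0])
# k_nearest(d, 3) returns the indices of the 3 smallest distances: 1, 3, 4
```

Only the k selected entries are sorted, so the cost stays linear in N for fixed k instead of the O(N log N) of a full sort.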
Figure 4-9. Predictor performance comparison (VM1). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write

the Bayesian classifier are used to forecast the best predictor for the workload based on the learning of historical load characteristics and prediction performance. The principal component analysis technique has been applied to reduce the input data dimension of the classification process. Our experimental results with the traces of the full range of virtual machine resources, including CPU, memory, network and disk, show that the LARPredictor can effectively identify the best predictor for the workload and achieve prediction accuracies that are close to or even better than any single best predictor.
Figure 4-10. Predictor performance comparison (VM2). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write
Figure 4-11. Predictor performance comparison (VM3). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write
Figure 4-12. Predictor performance comparison (VM4). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write
Figure 4-13. Predictor performance comparison (VM5). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write
Profiling the execution phases of applications can help to optimize the utilization of the underlying resources. This chapter presents a novel system-level application-resource-demand phase analysis and prediction approach in support of on-demand resource provisioning. This approach explores the large-scale behavior of applications' resource consumption, followed by analysis using a set of algorithms based on clustering. The phase profile, which is learned from historical runs, is used to classify and predict future phase behavior. This process takes into consideration applications' resource consumption patterns, phase transition costs, and penalties associated with Service-Level Agreement (SLA) violations.

Virtualization [94] of the application's execution environment has drawn much attention both in academia and industry [11][16][95]. This is motivated by the idea of providing computing resources as a utility and charging the users for specific usage. For example, in August 2006, Amazon launched the Beta version of its VM-based Elastic Compute Cloud (EC2) web service. EC2 allows users to rent virtual machines with specific configurations from Amazon and can support changes in resource configurations on the order of minutes. In systems that allow users to reserve and reconfigure resource allocations and charge based upon such allocations, users have an incentive to request no more than the amount of resources an application needs. A question which arises here is: how to adapt the resource provisioning to the changing workload?

In this chapter, we focus on modeling and analyzing long-running applications' phase behavior. The modeling is based on monitoring and learning of the applications' historical resource consumption patterns, which likely vary over time. Understanding such behavior is critical to optimizing resource scheduling. To self-optimize the configuration of an
In this context, a phase is defined as a set of intervals within an application's execution that have similar system-level resource consumption behavior, regardless of temporal adjacency. This means that a phase may reappear many times as an application executes. Phase classification partitions a set of intervals into phases with similar behavior. In this chapter, we introduce an application resource demand phase analysis and prediction prototype, which uses a combination of clustering and supervised learning techniques to investigate the following questions:

1) Is there phase behavior in the application's resource consumption patterns? If so, how many phases should be used to provide optimal resource provisioning?

2) Based on the observations of historical phase behaviors, what is the predicted next phase of the application's execution?

3) How do phase transition frequency and prediction accuracy affect resource allocation?

Answers to these questions can be used to decide the time and space allocation of resources.

While making optimization decisions, this prototype takes the application's resource consumption patterns, phase transition costs, and penalties associated with Service Level Agreement (SLA) violations into account. The prediction accuracy is fed back to guide future phase analysis. This prototype does not require any instrumentation of the application source code and can generally work with both physical and virtual machines which can provide monitoring of system-level performance metrics.

Our experimental results with the CPU and the network performance traces of SPECseis96 and WorldCup98 access log replay show that:
2. For applications with phase behavior, typically with a small number of phases, the savings gained from phase-based resource reservation can outweigh the costs associated with the increased number of re-provisionings and the penalties caused by mispredictions.

3. The phase prediction accuracy decreases as the number of phases increases. With the current prototype, an average of above 90% phase prediction accuracy can be achieved for the CPU and network performance features when four phases are considered.

The rest of this chapter is organized as follows: Section 5.2 presents the application phase analysis and prediction model. Sections 5.3 and 5.4 detail the algorithms used for phase analysis and prediction. Section 5.5 presents experimental results. Section 5.6 discusses related work. Section 5.7 draws conclusions and discusses future work.

Our prototype, illustrated in Figure 5-1, models how the application VM's performance data are collected and analyzed to construct the corresponding application's phase profile and how the profile is used to predict its next phase. In addition, it shows how process quality indicators, such as phase prediction accuracy, are monitored and used as feedback signals to tune the system performance (such as application response time) towards the goal defined in the SLA.

A performance monitoring agent is used to collect the performance data of the application VM, which serves as the application container. The monitoring agent can be implemented in various ways. In this work, Ganglia [54], a distributed monitoring system, and the vmkusage tool [85] provided by VMware ESX server, are used to monitor
Figure 5-1. Application resource demand phase analysis and prediction prototype. The phase analyzer analyzes the performance data collected by the monitoring agent to find out the optimal number of phases n ∈ [1, m]. The output phase profile is stored in the application phase database (DB) and will be used as training data for the phase predictor. The predictor predicts the next phase of the application resource usage based on the learning of its historical phase behaviors. The predicted phase can be used to support the application resource manager's (ARM's) decisions regarding resource provisioning. The auditor monitors and evaluates the performance of the analyzer and predictor and orders re-training of the phase predictor with the updated workload profile when the performance measurements drop below a predefined threshold.

the application containers. The collected performance data are stored in the performance database.

The phase analyzer retrieves the time-series VM performance data, which are identified by vmID, FeatureID, and a time window (t_s, t_e), from the performance database. Then it performs phase analysis using algorithms based on clustering to check whether there is phase behavior in the application's resource consumption patterns. If so, it continues to find out how many phases in a numeric range are best in terms of providing the minimal resource reservation costs. The output phase profile, which consists of the
Section 5.3.

The phase profile is used as the training data of the phase predictor. In the presence of phase behavior, the phase predictor can perform on-line prediction of the next phase of the application's resource usage based on the learning of historical phase behaviors, as shown in Section 5.4. The predicted phase information can be used to support the application resource manager's decisions regarding resource re-provisioning requests to the resource scheduler.

The auditor monitors and evaluates the health of the phase analysis and prediction process by performing quality control of each component. Clustering quality can be measured by the similarity and compactness of the clusters using various internal indices introduced in [96]. The phase predictor's performance is measured by its prediction accuracy. The application response time is used as an external signal for total quality control and is checked against the Quality of Service (QoS) defined in the SLA. Local performance tuning is triggered when the auditor observes that the component-level service quality drops below a predefined threshold. For example, when the real-time workload varies to a degree which makes it statistically significantly different from the training workload, the phase prediction accuracies may drop. Upon detection, the auditor can order a phase analysis based on the recent workload to update the phase profile and subsequently order re-training for the phase predictor. If the re-training still cannot improve the total quality of service to a satisfactory level, the resource reservation strategy falls back from phase-based reservation to a conservative strategy, which reserves the largest amount of resources the user is willing to pay for during the whole application run. Automated and adaptive threshold setting is discussed in detail in [67].
At a high level, the problem of clustering is defined as follows: Given a set U of n samples u_1, u_2, ..., u_n, we would like to partition U into k subsets U_1, U_2, ..., U_k, such that the samples assigned to each subset are more similar to each other than samples assigned to different subsets. Here, we assume that two samples are similar if they correspond to the same phase.

A typical pattern clustering activity involves the following steps [97]:

(1) Pattern representation, which is used to obtain an appropriate set of features to use in clustering. It optionally consists of feature extraction and/or selection. Feature selection is the process of identifying the most effective subset of the original features to use in clustering. Feature extraction is the use of one or more transformations of the input features to produce new salient features.

In the context of resource demand phase analysis, the features under study are the system-level resource performance metrics shown in Table 5-1. For one-dimensional clustering, which is the case in this work, feature selection is as simple as choosing the performance metric which is instructive to the allocation of the corresponding system resource. For clustering based on multiple performance metrics, feature extraction techniques such as Principal Component Analysis (PCA) may be used to transform the input performance metrics to a lower-dimensional space to reduce the computing intensity of the subsequent clustering and improve the clustering quality.

(2) Definition of a pattern proximity measure appropriate to the data domain. Pattern proximity is usually measured by a distance function defined on pairs of patterns. In this work, the most popular metric for continuous features, Euclidean distance, is used
(3) Clustering or grouping: The clustering can be performed in a number of ways [97]. The output clustering can be hard (a partition of the data into groups) or fuzzy (where each pattern has a variable degree of membership in each of the output clusters). A hard clustering can be obtained from a fuzzy partition by thresholding the membership value. In this work, one of the most popular iterative clustering methods, the k-means algorithm, detailed in Section 5.3.3, is used.

The following terms are used in the clustering literature [97]:

- A pattern (or feature vector or observation) is a single data item used by the clustering algorithm. It typically consists of a vector of d measurements.
- The individual scalar components of a pattern are called features (or attributes).
- d is the dimensionality of the pattern or of the pattern space.
- A class refers to a state of nature that governs the pattern generation process in some cases. More concretely, a class can be viewed as a source of patterns whose distribution in feature space is governed by a probability density specific to the class. Clustering techniques attempt to group patterns so that the classes thereby obtained reflect the different pattern generation processes represented in the pattern set.
- A distance measure is a metric on the feature space used to quantify the similarity of patterns.
In the case of clustering in a multi-dimensional space, normalization of the continuous features can be used to remove the tendency of the largest-scaled feature to dominate the others. In addition, the Mahalanobis distance can be used to remove the distortion caused by linear correlation among features, as discussed in Chapter 3.

The k-means algorithm works as follows [97]:
(1) Choose k cluster centers to coincide with k randomly-chosen patterns inside the hypervolume containing the pattern set.
(2) Assign each pattern to the closest cluster center.
(3) Recompute the cluster centers using the current cluster memberships.
(4) If a convergence criterion is not met, go to step 2. Typical convergence criteria are: no (or minimal) reassignment of patterns to new cluster centers, or a minimal decrease in squared error.

The algorithm has a time complexity of O(n), where n is the number of patterns, and a space complexity of O(k), where k is the number of clusters. The algorithm is order-independent; for a given initial seed set of cluster centers, it generates the same partition of the data irrespective of the order in which the patterns are presented. However, the algorithm is sensitive to initial seed selection, and even in the best case it can produce only hyperspherical clusters.
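As an illustration, the four steps above can be sketched in a few lines of NumPy. This is a generic sketch of the textbook algorithm (the function name `kmeans` and its defaults are illustrative), not the code used in the experiments:

```python
import numpy as np

def kmeans(patterns, k, max_iter=100, seed=0):
    """Textbook k-means: returns (cluster centers, per-pattern labels)."""
    patterns = np.asarray(patterns, dtype=float)
    rng = np.random.default_rng(seed)
    # (1) choose k cluster centers among the patterns themselves
    centers = patterns[rng.choice(len(patterns), size=k, replace=False)].copy()
    labels = np.full(len(patterns), -1)
    for _ in range(max_iter):
        # (2) assign each pattern to the closest center (Euclidean distance)
        dists = np.linalg.norm(patterns[:, None, :] - centers[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # (4) convergence criterion: no reassignment of patterns
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # (3) recompute each center from its current members
        for i in range(k):
            members = patterns[labels == i]
            if len(members) > 0:
                centers[i] = members.mean(axis=0)
    return centers, labels
```

Fixing the random seed reflects the order-independence noted above: for a given initial seed set of centers, the resulting partition is deterministic.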
The question of how many clusters are best has long been studied in the clustering literature [96]. The best number of clusters, in the context of the phase analysis discussed in this work, is the one that gives minimal total costs. The process to find the optimal number of clusters for an application workload is explained as follows.

Let un = u(t0 + n∆t) denote the resource usage sampled at time t = t0 + n∆t during the execution of an application. As shown in Section 5.3.3, when clustering with input parameter k (i.e., the number of clusters) is performed for a resource usage set U = {u1, u2, ...}, the subset Ui of resource usages that belong to the ith phase can be written as

    Ui = { un in U : un is assigned to cluster i },  i = 1, ..., k,

and the total resource reservation R over the whole execution period can be written as

    R = Σ_{i=1}^{k} max(Ui) · |Ui|,

where k is the number of clusters used for the clustering algorithm and the size |Ui| of Ui is defined as the number of elements of the subset Ui. Compared to the conservative reservation strategy, which reserves the global maximum amount of resources over the whole execution period, the phase-based reservation strategy can better adapt the resource reservation to the actual resource usage and reduce the resource reservation cost, as shown in Figure 5-2.
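The two reservation totals can be compared with a short sketch. The helper below is illustrative (the name `reservation_totals` is not from the thesis); it assumes a 1-D usage trace and per-sample phase labels such as those produced by k-means:

```python
import numpy as np

def reservation_totals(usage, labels):
    """Total reserved resources over the run:
    conservative = global max over all samples,
    phase-based  = sum over phases of (per-phase max) * |U_i|."""
    usage = np.asarray(usage, dtype=float)
    labels = np.asarray(labels)
    conservative = usage.max() * len(usage)
    phase_based = sum(usage[labels == i].max() * (labels == i).sum()
                      for i in np.unique(labels))
    return conservative, phase_based
```

For a trace that idles at 1 unit for three samples and peaks at 9 units for two, the conservative scheme reserves 9 units for all five samples, while the phase-based scheme reserves per-phase maxima, illustrating the cost-reduction opportunity.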
Figure 5-2. Resource allocation strategy comparison. The phase-based resource allocation strategy can adapt the time (t) and space (s) granularity of the allocation to the actual resource usage. It presents a cost-reduction opportunity compared to the coarse-grained conservative strategy.

Figure 5-2 illustrates the difference between the two reservation strategies using a hypothetical workload. The total cost of a reservation scheme with k phases combines the resource reservation cost and the phase transition cost:

    TC(k) = C1 · R(k) + C2 · TR(k),    (5-5)

where C1 and C2 denote the unit cost per resource usage and per transition, respectively, and TR(k) is the number of phase transitions. The best number of phases, k_best, should minimize the total cost. Therefore, k_best is derived as

    k_best = argmin_{1≤k≤K} TC(k),    (5-6)
where k is the number of phases. Taking both the phase transition cost and the misprediction penalty P(k) into account, the general total cost function is modified as

    TC'(k) = R(k) + C · TR(k) + Cp · P(k),  k'_best = argmin_{1≤k≤K} TC'(k),    (5-9)
where C is the transition factor, Cp denotes the discount factor for the misprediction penalty, which is the ratio of C3 (the unit misprediction cost) to C1, and K is the maximum number of phases.

The workflow of the application resource demand phase prediction is shown in Figure 5-3. The prediction consists of two stages: a training stage and a testing stage. During the training stage, the number of clusters in the application resource usage, the corresponding cluster centroids, and the unknown parameters of the time-series prediction model of the resource usage are determined. During the testing stage, the one-step-ahead resource usage is predicted and classified as one of the clusters.

Both stages start from pattern representation and framing. In the pattern representation step, the collected performance data of the application VM are profiled to extract only the features that will be used for clustering and future resource provisioning. For example, in the one-dimension case discussed in this thesis, the training data of a specific performance feature (X_{1..u}, see Table 5-1) are extracted, where u is the total number of input data. Then the extracted performance data X_{1..u} are framed with the prediction window size m to form the data X'_{(u-m+1)×m}.

The training stage mainly consists of two processes: prediction model fitting and phase behavior analysis. The algorithms defined in Sections 5.3.3 and 5.3.4 are used to find the number of phases k that gives the lowest total resource provisioning cost. The output phase profile is used to train the phase predictor. In addition, the unknown parameters of the resource predictor are estimated from the training data. In this thesis,
the resource predictor is based on the linear time-series models studied in [78]. However, the prototype can generally work with any other time-series prediction model. In the case of highly dynamic workloads, the Learning-Aided Resource Predictor (LARPredictor) developed in Chapter 4 can be used. The LARPredictor uses a mix-of-experts approach, which adaptively chooses the best prediction model from a pool of models based on learning of the correlations between the workload and the fitted prediction models of historical runs.

Similar to the training stage, the testing data Y_{1..v} are extracted and framed with the prediction window size m. The framed testing data Y'_{(v-m+1)×m} are used as input to the fitted resource predictor to predict the future resource usage Ŷ'_{1..v}. The phase predictor classifies the predicted resource usages Ŷ'_{1..v} into the phases P̂'_{1..v} based on the phase profile learned in the training stage. Similarly, phase predictions for the actual resource usage Y_{1..v} are performed to generate P̂_{1..v}. Then the corresponding predicted phases P̂'_{1..v} (which are based on predicted resource usage) and P̂_{1..v} (which are based on actual resource usage) are compared to evaluate the phase prediction accuracy, which is defined as the ratio of the number of matched phase predictions to the total number of phase predictions.
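The comparison of predicted and actual phases can be sketched as follows. Both helper names are illustrative; the sketch assumes a 1-D usage trace and nearest-centroid phase assignment, as produced by the training stage:

```python
import numpy as np

def to_phases(usage, centroids):
    """Map each (predicted or actual) usage value to the nearest cluster
    centroid, i.e., to its phase label."""
    usage = np.asarray(usage, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    return np.abs(usage[:, None] - centroids[None, :]).argmin(axis=1)

def phase_accuracy(predicted_usage, actual_usage, centroids):
    """Ratio of matched phase predictions to total phase predictions."""
    p_hat = to_phases(predicted_usage, centroids)  # phases of predicted usage
    p = to_phases(actual_usage, centroids)         # phases of actual usage
    return (p_hat == p).mean()
```

Note that a usage misprediction only counts against accuracy if it crosses a phase boundary; small errors within a phase still yield a matched phase prediction.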
Table 5-1. Performance feature list

  Feature          Description
  CPU System/User  Percent CPU (system/user)
  Bytes In/Out     Number of bytes per second into/out of the network
  IO BI/BO         Blocks sent to/received from a block device (blocks/s)
  Swap In/Out      Amount of memory swapped in/out from/to disk (kB/s)

The algorithm described in Section 5.3.4 can be used to find the best number of clusters for an application workload. The Ganglia monitoring daemon was used to collect the performance data of the application container. Table 5-1 shows the list of performance features under study in the experiments.

The benchmark application, SPECseis96 (also studied in [53]), was hosted by a VMware GSX virtual machine. The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. The Ganglia daemon was installed in the guest VM and run to collect the resource performance data once every five seconds (5 secs/interval) and store them in the performance database. During feature representation, the data were extracted based on a given VM ID, Feature ID, and starting and ending timestamps to form the time-series data under study. Then the subsequent phase analysis was performed for the 8000 performance snapshots collected during the monitoring periods.

Figure 5-4A shows a sample set of training data of the CPU user (%) of the VM, including the actual resource usages (ActualRsc), the reserved resources based on the k-means clustering with k=3 (RsvdRsc), and those based on the conservative reservation strategy (ConsrvRsc). Figure 5-4B shows a sample set of the corresponding testing data including the actual resource usage (ActualRsc), the resource reservation based on actual resource usage (Rsvd
Figures 5-4C and 5-4D show that, with an increasing number of phases, two of the determinants in the cost model, the number of phase transitions TR(k) and the misprediction penalty P(k), increase monotonically. The other determinant of the cost model, the amount of reserved resources R(k), is shown by the lowest curve (with index C=0) in Figure 5-4E. It indicates that with an increasing number of phases, the total reserved resources of the training set decrease monotonically. This is because, with an increasing number of phases, the resource allocation can be performed at time scales of finer granularity. However, there is a diminishing return on the increased number of phases because of the increasing phase transition costs and misprediction penalties.

In the first analysis, we assume each resource reservation scheme to be clairvoyant, i.e., it reserves resources based on exact knowledge of future workload requirements. This assumption eliminates the impact of inaccuracies introduced by the phase predictor. In this case, Equation (5-6), which takes the resource reservation cost and the phase transition cost into account while deciding the optimal number of phases, can be applied as shown in Figure 5-4E. In this figure, the total cost over the whole testing period is measured by CPU usage in percentage. The discount factor C denotes the CPU percentage that each phase transition will cost: C = CPU(%) × TransitionDuration. For example, the bottom line of C=0 shows the case of no transition cost, which gives the lower bound of the total cost. As another instance, C=260% implies a 13-second transition period (2.6 intervals × 5 secs/interval) with the assumption of 100% CPU consumption during the transition period. When the discount factor C increases from 0 to 260, the best number of phases k_best, which provides the lowest total cost, decreases gradually from 10 to 2. The phase profile depicted in Figure 5-4E can be used to decide the number of phases that should be used in the phase-based resource reservation to minimize the total cost with the given available transition options. For example, VMware ESX supports
The impact of inaccuracies introduced by the phase predictor is shown in Figure 5-4F. In addition to the resource reservation costs and the phase transition costs, this experiment also took the phase misprediction penalty costs into account while calculating the total cost. For example, for each unit of down-sized mispredicted resource, a penalty of 8 times the unit resource cost (Cp=8) is imposed. Comparing Figure 5-4E to Figure 5-4F, we can see that adding the penalty into the cost model increases the final costs to the user for the same set of k and C, and can potentially reduce the workload's best number of phases k'_best.

Finally, a total cost ratio ρ is defined as the ratio of the total cost using k phases, TC'(k), to the total cost of one phase, TC'(1):

    ρ(k) = TC'(k) / TC'(1).

Intuitively, ρ measures the cost savings achieved using the phase-based reservation strategy over the conservative one. Thus, the smaller the value of ρ, the more efficient the phase-based reservation scheme. Table 5-2 gives a sample total cost schedule (C=52 and Cp=8) for each of the eight performance features of SPECseis96. It shows that by changing the resource provisioning strategy from the conservative approach (k=1) to phase-based provisioning (k=3), a 29.5% total cost reduction for CPU usage can be achieved. For spiky trace data such as disk I/O and memory usage, the total cost reduction can be as high as 49%.
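The cost model and the total cost ratio can be sketched together. The helper names are illustrative, and R, TR, and P are assumed to have been precomputed from the trace for each candidate number of phases k:

```python
def total_costs(R, TR, P, C, Cp):
    """TC'(k) = R(k) + C*TR(k) + Cp*P(k) for each k; R, TR, P are
    dicts keyed by the number of phases k (precomputed from the trace)."""
    return {k: R[k] + C * TR[k] + Cp * P[k] for k in R}

def best_k_and_ratio(R, TR, P, C, Cp):
    """Return (k_best, rho), where rho[k] = TC'(k) / TC'(1)."""
    tc = total_costs(R, TR, P, C, Cp)
    k_best = min(tc, key=tc.get)
    rho = {k: v / tc[1] for k, v in tc.items()}
    return k_best, rho
```

Since R(k) decreases with k while TR(k) and P(k) generally increase, k_best captures the diminishing-return trade-off discussed above, and rho < 1 indicates a saving over the conservative (k=1) scheme.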
Table 5-2. SPECseis96 total cost ratio schedule for the eight performance features

The workload used in this experiment was based on the 1998 World Cup trace [98]. This openly available trace, containing a log of requests to Web servers, was used as input to a client replay tool, which enabled us to exercise a realistic Web-based workload and collect system-level performance metrics using Ganglia in the same manner as was done for the SPECseis96 workload. For this study, we chose to replay the five-hour (from 22:00:01 Jun. 23 to 3:11:20 Jun. 24) log of the least loaded server (server ID 101), which contained 130,000 web requests.

The phase analysis and prediction techniques can be used to characterize performance data collected not only from virtual machines but also from physical machines. During the experiment, a physical server with sixteen Intel(R) Xeon(TM) MP 3.00GHz CPUs and 32GB memory was used to execute the replay clients, which submitted requests based on the submission intervals, HTTP protocol types (1.0 or 1.1), and document sizes defined in the log file. A physical machine with an Intel(R) Pentium(R) 4 1.70GHz CPU and 512MB memory was used to host the Apache web server and a set of files which were created based on the file sizes described in the log.
A tool distributed with the trace [98] was used to convert the binary log into the Common Log Format. A modified version of the Real-Time Web Log Replayer [99] was used to analyze and generate the files needed by the log replayer and to perform the replay.

Figures 5-5 and 5-6 show the phase characterization results of the performance features bytes_in and bytes_out of the web server. The interesting observation from panels A and B of these figures is that the number of phase transitions and the misprediction penalties do not always increase monotonically with the increasing number of phases. As a result, the phase profile shown in panel C argues that three-phase based resource provisioning gives the lowest total cost with the given C = [150k, 750k] and Cp = 8. The result implies that the phase profile is highly workload dependent. The prototype presented in this thesis can help to construct and analyze the phase profile of the application resource consumption and decide the proper resource provisioning strategy.

The phase prediction performance was evaluated following the workflow described in Section 5.4. A performance measurement, prediction accuracy, is defined as the ratio of the number of performance snapshots whose predicted phases match the observed phases to the total number of performance snapshots collected during the testing period.

Table 5-3 shows the phase prediction accuracies for the performance traces of the main resources consumed by the SPECseis96 and the WorldCup98 workloads. Generally, the phase prediction accuracy of each performance feature decreases with an increasing number of phases. This explains why the penalty curve rises monotonically with the increasing number of phases in Figure 5-4D. With the current implementation, an average of 95% accuracy can be achieved for the network performance traces of the WorldCup98 log
replay, and an average of 85% accuracy can be achieved for the CPU performance traces of SPECseis96 for the four-phase cases.

Table 5-3. Average phase prediction accuracy

Table 5-4. Performance feature list of VM traces

  Feature      Description
  CPU Ready    The percentage of time that the virtual machine was ready
               but could not get scheduled to run on a physical CPU.
  CPU Used     The percentage of physical CPU resources used by a
               virtual CPU.
  Mem Size     Current amount of memory, in bytes, the virtual machine has.
  Mem Swap     Amount of swap space, in bytes, used by the virtual machine.
  Net RX/TX    The number of packets and the MBytes per second that are
               transmitted and received by a NIC.
  Disk RD/WR   The number of I/Os and KBytes per second that are read
               from and written to the disk.

In addition to the above two applications, we also evaluated the prediction performance of the phase predictor using traces of a set of five virtual machines. These virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0GHz CPU, 4GB memory, and a 36GB SCSI disk. VMware ESX Server 2.5.2 was running on the physical host. The vmkusage tool was run on the ESX server to collect the resource performance data of the guest virtual machines every minute and store them in a round-robin database. The performance features under study in this experiment are shown in Table 5-4.
Table 5-5 shows the average phase prediction accuracies for each of the 12 performance features over all five VMs. It shows that with an increasing number of phases, the phase prediction accuracy of each performance feature decreases monotonically. The prediction accuracies vary with the performance features under study. With the current implementation, an average of 83.25% accuracy can be achieved across the phase predictions of all twelve performance features for the two-phase cases.

The current prototype makes the following assumptions:

1. A clear mapping between resource consumption and response time is assumed for the application container. This might not always be true for all types of applications. More complex performance/queuing models may be needed to provide an accurate mapping in the case of complex applications.

2. A dedicated machine is assumed for the application container to collect the performance data. In case multiple applications co-exist on the same hosting machine, a more sophisticated method of data collection, for example aggregating the performance data of the processes that belong to the same application, may be needed.
Table 5-5. Average phase prediction accuracy of the five VMs

3. In this work, one-dimensional phase analysis and prediction is performed. However, the prototype can generally work for multi-dimension resource provisioning cases as well. For clustering in the multi-dimension space, additional pattern representation techniques such as Principal Component Analysis (PCA) can be used to project the data to a lower-dimensional space to reduce the computing intensity. In addition, the transition factor C will represent the unit transition cost defined in the pricing schedule of the resource provider.

Developing prediction models for parallel and multi-tier applications is part of our future research.

Program phase behavior has been exploited for several purposes. First, it can be used to guide dynamic power management [100][101]. Second, phase characterization that summarizes application behavior with representative execution regions can be used
to reduce simulation time in computer architecture research [102][103]. Our purpose in studying phase behavior is to support dynamic resource provisioning of the application containers.

In addition to the purpose of the study, our approach differs from traditional program phase analysis in the following ways:

1) Performance metrics under study: In the areas of power management and simulation optimization for computer architecture research, the metrics used for workload characterization are typically Basic Block Vectors (BBV) [102][101], conditional branch counters [104], and instruction working sets [105]. In the context of application VM/container resource provisioning, the metrics under study are the system-level performance features that are instructive to VM resource provisioning, such as those shown in Table 5-1.

2) Knowledge of the program code: While [102][101][104] at least require profiling of the program binary code, our approach requires neither instrumentation nor access to the program code.

3) This thesis answers the question "how many clusters are best" in the context of system-level resource provisioning.

In [106], Dhodapkar et al. compared the three dynamic program phase detection techniques discussed in [102], [104], and [105] using a variety of performance metrics, such as sensitivity, stability, performance variance, and the correlations between phase detection techniques.

In addition, other related work on resource provisioning includes the following: Urgaonkar et al. studied resource provisioning in a multi-tier web environment [107]. Wildstrom et al. developed a method to identify the best CPU and memory configuration from a pool of configurations for a specific workload [108]. Chase et al. have proposed a hierarchical architecture that allocates virtual clusters to a group of applications [109]. Kusic et al. developed an optimization framework to decide the number of servers to allocate to
each application [110]. Tesauro et al. used a combination of reinforcement learning and queuing models for system performance management [5].
Figure 5-3. Application resource demand phase prediction workflow. In the training stage, the u performance data X_{1..u} of the feature(s) used in the subsequent phase analysis are extracted (pattern representation) and framed with the prediction window size m. The unknown parameters of the resource predictor are estimated during model fitting using the framed training data X'_{(u-m+1)×m}. In addition, the clustering algorithms introduced in Section 5.3 are used to construct the application phase profile, including the phase labels I_{1..u} for all the samples and the calculated cluster centroids C_{1..k}. In the testing stage, the phase predictor uses the knowledge learned from the phase profile to predict the future phases P̂'_{1..v} based on the predicted resource usage Ŷ'_{1..v} and P̂_{1..v} based on the observed actual resource usage Y_{1..v}, and the two are compared to evaluate the phase prediction accuracy.
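The framing step in this workflow can be sketched as follows (the function name `frame` is illustrative): a length-u series X_{1..u} becomes a (u-m+1) × m matrix whose rows are consecutive prediction windows.

```python
import numpy as np

def frame(series, m):
    """Frame a length-u series X_1..u into a (u-m+1) x m matrix whose
    rows are consecutive sliding windows of size m."""
    series = np.asarray(series)
    u = len(series)
    return np.stack([series[i:i + m] for i in range(u - m + 1)])
```

Each row then serves as one input vector to the resource predictor during model fitting (training) or one-step-ahead prediction (testing).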
Figure 5-4. Phase analysis of SPECseis96 CPU user (%). A) Sample training data. B) Sample testing data. C) Phase transitions. D) Misprediction penalties. E) Total cost without penalty. F) Total cost with penalty (Cp=8).
Figure 5-5. Phase analysis of WorldCup'98 Bytes_In. A) Phase transitions. B) Misprediction penalties. C) Total cost with penalty (Cp=8).
Figure 5-6. Phase analysis of WorldCup'98 Bytes_Out. A) Phase transitions. B) Misprediction penalties. C) Total cost with penalty (Cp=8).
Self-management has drawn increasing attention in the last few years due to the increasing size and complexity of computing systems. A resource scheduler that can perform self-optimization and self-configuration can help to improve system throughput and free system administrators from labor-intensive and error-prone tasks. However, it is challenging to equip a resource scheduler with such self-* capacities because of the dynamic nature of system performance and workloads.

In this dissertation, we propose to use machine learning techniques to assist system performance modeling and application workload characterization, which can provide support for on-demand resource scheduling. In addition, virtual machines are used as resource containers to host application executions for the ease of dynamic resource provisioning and load balancing.

The application classification framework presented in Chapter 2 used Principal Component Analysis (PCA) to reduce the dimension of the performance data space. Then the k-Nearest Neighbor (k-NN) algorithm is used to classify the data into different classes such as CPU-intensive, I/O-intensive, memory-intensive, and network-intensive. It does not require modifications of the application source code. Experiments with various benchmark applications suggest that with the application class knowledge, a scheduler can improve the system throughput by 22.11% on average by allocating applications of different classes to share the system resources.

The feature selection prototype presented in Chapter 3 uses a probabilistic model (a Bayesian Network) to systematically select the representative performance features, which can provide optimal classification accuracy and adapt to changing workloads. It shows that autonomic feature selection enables classification without requiring expert knowledge in the selection of relevant low-level performance metrics. This approach requires neither application source code modification nor execution intervention. Results from
In addition to the application resource demand modeling, Chapter 4 proposes a learning-based adaptive predictor, which can be used to predict resource availability. It uses the k-NN classifier and PCA to learn the relationship between workload characteristics and suited predictors based on historical predictions, and to forecast the best predictor for the workload under study. Then, only the selected best predictor is run to predict the next value of the performance metric, instead of running multiple predictors in parallel to identify the best one. The experimental results show that this learning-aided adaptive resource predictor can often outperform the single best predictor in the pool without a priori knowledge of which model best fits the data.

The application classification and the feature selection techniques can be used to define the application resource consumption patterns at any given moment. The experimental results of the application classification suggest that allocating applications which have complementary resource consumption patterns to the same server can improve the system throughput.

In addition to one-step-ahead performance prediction, Chapter 5 studied the large-scale behavior of application resource consumption. Clustering-based algorithms have been explored to provide a mechanism to define and predict the phase behavior of the application resource usage to support on-demand resource allocation. The experimental results show that an average of above 90% phase prediction accuracy can be achieved for the four-phase cases of the benchmark workloads.
[1] J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, pp. 41-50, 2003.

[2] Y. Yang and H. Casanova, "Rumr: Robust scheduling for divisible workloads," in Proc. 12th High-Performance Distributed Computing, Seattle, WA, June 22-24, 2003, pp. 114-125.

[3] J. M. Schopf and F. Berman, "Stochastic scheduling," in Proc. ACM/IEEE Conference on Supercomputing, Portland, OR, Nov. 14-19, 1999, p. 48.

[4] L. Yang, J. M. Schopf, and I. Foster, "Conservative scheduling: Using predicted variance to improve scheduling decisions in dynamic environments," in Proc. ACM/IEEE Conference on Supercomputing, Nov. 15-21, 2003, p. 31.

[5] G. Tesauro, N. Jong, R. Das, and M. Bennani, "A hybrid reinforcement learning approach to autonomic resource allocation," in Proc. IEEE International Conference on Autonomic Computing (ICAC'06), 2006, pp. 65-73.

[6] G. Tesauro, R. Das, W. Walsh, and J. Kephart, "Utility-function-driven resource allocation in autonomic systems," in Proc. Second International Conference on Autonomic Computing (ICAC'05), 2005, pp. 342-343.

[7] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, Apr. 1991.

[8] J. O. Kephart, "Research challenges of autonomic computing," in Proc. 27th International Conference on Software Engineering (ICSE), May 2005, pp. 15-22.

[9] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, San Mateo, CA, 1990.

[10] R. P. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine, vol. 7, no. 6, pp. 34-45, June 1974.

[11] R. Figueiredo, P. Dinda, and J. Fortes, "A case for grid computing on virtual machines," in Proc. 23rd International Conference on Distributed Computing Systems, May 19-22, 2003, pp. 550-559.

[12] S. Pinter, Y. Aridor, S. Shultz, and S. Guenender, "Improving machine virtualization with 'hotplug memory'," in Proc. 17th International Symposium on Computer Architecture and High Performance Computing, 2005, pp. 168-175.
[13] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proc. 2nd Symposium on Networked Systems Design & Implementation (NSDI'05), Boston, MA, 2005.
[14] "VMotion," http://www.vmware.com/products/vi/vc/vmotion.html.

[15] M. Zhao, J. Zhang, and R. Figueiredo, "Distributed file system support for virtual machines in grid computing," in Proc. 13th International Symposium on High Performance Distributed Computing, 2004, pp. 202-211.

[16] I. Krsul, A. Ganguly, J. Zhang, J. Fortes, and R. Figueiredo, "VMPlants: Providing and managing virtual machine execution environments for grid computing," in Proc. Supercomputing, Washington, DC, Nov. 6-12, 2004.

[17] J. Sugerman, G. Venkitachalam, and B. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in Proc. USENIX Annual Technical Conference, 2001.

[18] J. Dike, "A user-mode port of the Linux kernel," in Proc. 4th Annual Linux Showcase and Conference, USENIX Association, Atlanta, GA, Oct. 2000.

[19] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid computing," in Proc. 3rd USENIX Virtual Machine Research and Technology Symposium, May 2004.

[20] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and migration of UNIX processes in the Condor distributed processing system," Tech. Rep. UW-CS-TR-1346, University of Wisconsin-Madison Computer Sciences Department, Apr. 1997.

[21] A. Barak, O. Laden, and Y. Yarom, "The NOW MOSIX and its preemptive process migration scheme," Bulletin of the IEEE Technical Committee on Operating Systems and Application Environments, vol. 7, no. 2, pp. 5-11, 1995.
[26] T.CoverandP.Hart,\Nearestneighborpatternclassication,"IEEETrans.Inf.Theory,vol.13,no.1,pp.21{27,Jan.1967.
[27] M. L. Massie, B. N. Chun, and D. E. Culler, "The Ganglia distributed monitoring system: Design, implementation, and experience," Parallel Computing, vol. 30, no. 5-6, pp. 817-840, 2004.

[28] "NetApp," http://www.netapp.com/tech library/3022.html.

[29] R. Eigenmann and S. Hassanzadeh, "Benchmarking with real industrial applications: the SPEC High-Performance Group," IEEE Computational Science and Engineering, vol. 3, no. 1, pp. 18-23, 1996.

[30] "Ettcp," http://sourceforge.net/projects/ettcp/.

[31] "SimpleScalar," http://www.cs.wisc.edu/mscalar/simplescalar.html.

[32] "CH3D," http://users.coastal.ufl.edu/pete/CH3D/ch3d.html.

[33] "Bonnie," http://www.textuality.com/bonnie/.

[34] Q. Snell, A. Mikler, and J. Gustafson, "NetPIPE: A network protocol independent performance evaluator," June 1996.

[35] "VMD," http://www.ks.uiuc.edu/Research/vmd/.

[36] "SPIM," http://www.cs.wisc.edu/larus/spim.html.

[37] "Reference of STREAM," http://www.cs.virginia.edu/stream/ref.html.

[38] "Autobench," http://www.xenoclast.org/autobench/.

[39] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, Mar. 2003.

[40] Y. Liao and V. R. Vemuri, "Using text categorization techniques for intrusion detection," in 11th USENIX Security Symposium, San Francisco, CA, Aug. 5-9, 2002, pp. 51-59.

[41] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program behavior profiles for intrusion detection," in Proc. Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, Apr. 9-12, 1999, pp. 51-62.

[42] M. Almgren and E. Jonsson, "Using active learning in intrusion detection," in Proc. 17th IEEE Computer Security Foundations Workshop, June 28-30, 2004, pp. 88-98.

[43] S. C. Lee and D. V. Heinbuch, "Training a neural-network based intrusion detector to recognize novel attacks," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 31, no. 4, pp. 294-299, 2001.

[44] G. Forman, "An extensive empirical study of feature selection metrics for text classification," J. Mach. Learn. Res., vol. 3, pp. 1289-1305, 2003.
[45] N. H. Kapadia, J. A. B. Fortes, and C. E. Brodley, "Predictive application-performance modeling in a computational grid environment," in Proc. 8th IEEE International Symposium on High Performance Distributed Computing, Redondo Beach, CA, Aug. 3-6, 1999, p. 6.

[46] J. Basney and M. Livny, "Improving goodput by coscheduling CPU and network capacity," Int. J. High Perform. Comput. Appl., vol. 13, no. 3, pp. 220-230, Aug. 1999.

[47] R. Raman, M. Livny, and M. Solomon, "Policy driven heterogeneous resource co-allocation with gangmatching," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'03), Seattle, WA, June 22-24, 2003, p. 80.

[48] S. Sodhi and J. Subhlok, "Skeleton based performance prediction on shared networks," in IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004, pp. 723-730.

[49] V. Taylor, X. Wu, and R. Stevens, "Prophesy: an infrastructure for performance analysis and modeling of parallel and grid applications," SIGMETRICS Perform. Eval. Rev., vol. 30, no. 4, pp. 13-18, 2003.

[50] O. Y. Nickolayev, P. C. Roth, and D. A. Reed, "Real-time statistical clustering for event trace reduction," The International Journal of Supercomputer Applications and High Performance Computing, vol. 11, no. 2, pp. 144-159, Summer 1997.

[51] D. H. Ahn and J. S. Vetter, "Scalable analysis techniques for microprocessor performance counter metrics," in Proc. Supercomputing, Baltimore, MD, Nov. 16-22, 2002, pp. 1-16.

[52] I. Cohen, J. S. Chase, M. Goldszmidt, T. Kelly, and J. Symons, "Correlating instrumentation data to system states: A building block for automated diagnosis and control," in 6th USENIX Symposium on Operating Systems Design and Implementation, 2004, pp. 231-244.

[53] J. Zhang and R. Figueiredo, "Application classification through monitoring and learning of resource consumption patterns," in Proc. 20th International Parallel & Distributed Processing Symposium, Rhodes Island, Greece, Apr. 25-29, 2006.

[54] M. Massie, B. Chun, and D. Culler, The Ganglia Distributed Monitoring System: Design, Implementation, and Experience, Addison-Wesley, Reading, MA, 2003.
[55] S. Agarwala, C. Poellabauer, J. Kong, K. Schwan, and M. Wolf, "Resource-aware stream management with the customizable dproc distributed monitoring mechanisms," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing, June 22-24, 2003, pp. 250-259.

[56] "HP," http://www.managementsoftware.hp.com.
[57] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, Apr. 2005.

[58] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Francisco, CA, 1988.

[59] T. Dean, K. Basye, R. Chekaluk, S. Hyun, M. Lejter, and M. Randazza, "Coping with uncertainty in a control system for navigation and exploration," in Proc. 8th National Conference on Artificial Intelligence, Boston, MA, July 29-Aug. 3, 1990, pp. 1010-1015.

[60] D. Heckerman, "Probabilistic similarity networks," Tech. Rep., Depts. of Computer Science and Medicine, Stanford University, 1990.

[61] D. J. Spiegelhalter, R. C. Franklin, and K. Bull, "Assessment criticism and improvement of imprecise subjective probabilities for a medical expert system," in Proc. Fifth Workshop on Uncertainty in Artificial Intelligence, 1989, pp. 335-342.

[62] E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1985.

[63] T. S. Levitt, J. Mullin, and T. O. Binford, "Model-based influence diagrams for machine vision," in Proc. 5th Workshop on Uncertainty in Artificial Intelligence, 1989, pp. 233-244.

[64] R. E. Neapolitan, Probabilistic Reasoning in Expert Systems: Theory and Algorithms, John Wiley & Sons, Inc., New York, NY, USA, 1990.

[65] K. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin nearest neighbor classification," in Proc. 19th Annual Conference on Neural Information Processing Systems, Vancouver, CA, Dec. 2005.

[66] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, pp. 271-274, 1998.

[67] B. Ziebart, D. Roth, R. Campbell, and A. Dey, "Automated and adaptive threshold setting: Enabling technology for autonomy and self-management," in Proc. 2nd International Conference on Autonomic Computing, June 13-16, 2005, pp. 204-215.

[68] P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pat. Anal. Mach. Intel., vol. 24, no. 3, pp. 301-312, Mar. 2002.
[69] W. Lee, S. J. Stolfo, and K. W. Mok, "Adaptive intrusion detection: A data mining approach," Artificial Intelligence Review, vol. 14, no. 6, pp. 533-567, 2000.

[70] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen, "Performance debugging for distributed systems of black boxes," in Proc. 19th ACM Symposium on Operating Systems Principles, 2003.
[71] R. Isaacs and P. Barham, "Performance analysis in loosely-coupled distributed systems," in Proc. 7th CaberNet Radicals Workshop, Bertinoro, Italy, Oct. 2002.

[72] I. Foster, "The anatomy of the grid: enabling scalable virtual organizations," in Proc. 1st IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001, pp. 6-7.

[73] R. Wolski, "Dynamically forecasting network performance using the Network Weather Service," in Journal of Cluster Computing, 1998.

[74] I. Matsuba, H. Suyari, S. Weon, and D. Sato, "Practical chaos time series analysis with financial applications," in Proc. 5th International Conference on Signal Processing, Beijing, 2000, vol. 1, pp. 265-271.

[75] P. Magni and R. Bellazzi, "A stochastic model to assess the variability of blood glucose time series in diabetic patients self-monitoring," IEEE Trans. Biomed. Eng., vol. 53, no. 6, pp. 977-985, 2006.

[76] K. Didan and A. Huete, "Analysis of the global vegetation dynamic metrics using MODIS vegetation index and land cover products," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS'04), 2004, vol. 3, pp. 2058-2061.

[77] P. Dinda, "The statistical properties of host load," Scientific Programming, no. 7:3-4, 1999.

[78] P. Dinda, "Host load prediction using linear models," Cluster Computing, vol. 3, no. 4, 2000.

[79] Y. Zhang, W. Sun, and Y. Inoguchi, "CPU load predictions on the computational grid," in Proc. 6th IEEE International Symposium on Cluster Computing and the Grid, May 2006, vol. 1, pp. 321-326.

[80] J. Liang, K. Nahrstedt, and Y. Zhou, "Adaptive multi-resource prediction in distributed resource sharing environment," in Proc. IEEE International Symposium on Cluster Computing and the Grid, 2004, pp. 293-300.

[81] S. Vazhkudai and J. Schopf, "Predicting sporadic grid data transfers," in Proc. International Symposium on High Performance Distributed Computing, 2002, pp. 188-196.

[82] S. Vazhkudai, J. Schopf, and I. Foster, "Using disk throughput data in predictions of end-to-end grid data transfers," in Proc. 3rd International Workshop on Grid Computing, Nov. 2002.
[83] S. Gunter and H. Bunke, "An evaluation of ensemble methods in handwritten word recognition based on feature selection," in Proc. 17th International Conference on Pattern Recognition, Aug. 2004, vol. 1, pp. 388-392.

[84] G. Jain, A. Ginwala, and Y. Aslandogan, "An approach to text classification using dimensionality reduction and combination of classifiers," in Proc. IEEE International Conference on Information Reuse and Integration, Nov. 2004, pp. 564-569.

[85] VMware white paper, "Comparing the MUI, VirtualCenter, and vmkusage."

[86] J. D. Cryer, Time Series Analysis, Duxbury Press, Boston, MA, 1986.

[87] J. O. Rawlings, S. G. Pantula, and D. A. Dickey, Applied Regression Analysis, Springer, 2001.

[88] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.

[89] E. Bingham and H. Mannila, "Random projection in dimensionality reduction: Applications to image and text data," in Knowledge Discovery and Data Mining, 2001, pp. 245-250.

[90] L. Sirovich and R. Everson, "Management and analysis of large scientific data sets," Int. Journal of Supercomputer Applications, vol. 6, no. 1, pp. 50-68, 1992.

[91] Y. Yang, J. Zhang, and B. Kisiel, "A scalability analysis of classifiers in text categorization," in ACM SIGIR'03, 2003, pp. 96-103.

[92] J. H. Friedman, F. Baskett, and L. J. Shustek, "An algorithm for finding nearest neighbors," IEEE Transactions on Computers, vol. C-24, no. 10, pp. 1000-1006, Oct. 1975.

[93] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Transactions on Mathematical Software, vol. 3, pp. 209-226, 1977.

[94] G. Banga, P. Druschel, and J. Mogul, "Resource containers: A new facility for resource management in server systems," in Proc. 3rd Symposium on Operating System Design and Implementation, New Orleans, Feb. 1999.

[95] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase, "Toward a doctrine of containment: Grid hosting with adaptive resource control," in Proc. Supercomputing, Tampa, FL, Nov. 2006.

[96] R. Dubes, "How many clusters are best? An experiment," Pattern Recognition, vol. 20, no. 6, pp. 645-663, Nov. 1987.

[97] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: A review," ACM Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.
[98] "WorldCup98," http://ita.ee.lbl.gov/html/contrib/WorldCup.html.

[99] "Logreplayer," http://www.cs.virginia.edu/rz5b/software/logreplayer-manual.htm.

[100] C. Isci, A. Buyuktosunoglu, and M. Martonosi, "Long-term workload phases: Duration predictions and applications to DVFS," IEEE Micro, vol. 25, no. 5, pp. 39-51, 2005.

[101] C. Isci and M. Martonosi, "Phase characterization for power: Evaluating control-flow-based and event-counter-based techniques," in Proc. 12th International Symposium on High-Performance Computer Architecture, 2006, pp. 121-132.

[102] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, "Automatically characterizing large scale program behavior," in Proc. 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002, pp. 45-57.

[103] H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi, "Pinpointing representative portions of large Intel Itanium programs with dynamic instrumentation," in Proc. 37th Annual International Symposium on Microarchitecture, 2004.

[104] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, "Memory hierarchy reconfiguration for energy and performance in general purpose architectures," in Proc. 33rd Annual International Symposium on Microarchitecture, Dec. 2000, pp. 245-257.

[105] A. Dhodapkar and J. Smith, "Managing multi-configuration hardware via dynamic working set analysis," in Proc. 29th Annual International Symposium on Computer Architecture, Anchorage, AK, May 2002, pp. 233-244.

[106] A. Dhodapkar and J. Smith, "Comparing program phase detection techniques," in Proc. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003, pp. 217-227.

[107] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic provisioning of multi-tier internet applications," in Proc. 2nd International Conference on Autonomic Computing, June 2005, pp. 217-228.

[108] J. Wildstrom, P. Stone, E. Witchel, R. J. Mooney, and M. Dahlin, "Towards self-configuring hardware for distributed computer systems," in Proc. 2nd International Conference on Autonomic Computing, June 2005, pp. 241-249.
[109] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, "Dynamic virtual clusters in a grid site manager," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing, June 2003, pp. 90-100.
[110] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems," in Proc. 3rd International Conference on Autonomic Computing, 2006, pp. 74-83.
Jian Zhang was born in Chengdu, China. She received her B.S. degree in 1995 from the University of Electronic Science and Technology of China, majoring in computer communication. She received her M.S. degree in 2001 from the University of Florida, majoring in electrical and computer engineering. Since 2002, she has been with the Advanced Computing and Information Systems Laboratory (ACIS) at the University of Florida, pursuing her Ph.D. degree. Her research interests include distributed systems, autonomic computing, virtualization technologies, and information systems.