Citation
Learning-Aided System Performance Modeling in Support of Self-Optimized Resource Scheduling in Distributed Environments

Material Information

Title:
Learning-Aided System Performance Modeling in Support of Self-Optimized Resource Scheduling in Distributed Environments
Creator:
Zhang, Jian
Place of Publication:
[Gainesville, Fla.]
Publisher:
University of Florida
Publication Date:
Language:
english
Physical Description:
1 online resource (146 p.)

Thesis/Dissertation Information

Degree:
Doctorate ( Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
Figueiredo, Renato J.
Committee Members:
Fortes, Jose A.
George, Alan D.
Ghosh, Malay
Graduation Date:
12/14/2007

Subjects

Subjects / Keywords:
Distance functions ( jstor )
Information classification ( jstor )
Learning ( jstor )
Machine learning ( jstor )
Machinery ( jstor )
Modeling ( jstor )
Performance metrics ( jstor )
Principal components analysis ( jstor )
Scheduling ( jstor )
Workloads ( jstor )
Electrical and Computer Engineering -- Dissertations, Academic -- UF
bayesian, classification, clustering, distributed, knn, learning, pca, performance, prediction, scheduling
Genre:
Electronic Thesis or Dissertation
born-digital ( sobekcm )
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
With the goal of autonomic computing, it is desirable to have a resource scheduler that is capable of self-optimization, meaning that, given a high-level objective, the scheduler can automatically adapt its scheduling decisions to the changing workload. This self-optimization capability poses challenges for system performance modeling because of the increasing size and complexity of computing systems. Our goals were twofold: to design performance models that can derive applications' resource consumption patterns in a systematic way, and to develop performance prediction models that can adapt to changing workloads. A novelty in the system performance model design is the use of various machine learning techniques to efficiently deal with the complexity of dynamic workloads based on monitoring and mining of historical performance data. In the environments considered in this thesis, virtual machines (VMs) are used as resource containers to host application executions because of their flexibility in supporting resource provisioning and load balancing. Our study introduced three performance models to support self-optimized scheduling and decision-making. First, a novel approach is introduced for application classification based on Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. It helps to reduce the dimensionality of the performance feature space and to classify applications based on extracted features. In addition, a feature selection model based on a Bayesian Network (BN) is designed to systematically identify the feature subset that provides optimal classification accuracy and adapts to changing workloads. Second, an adaptive system performance prediction model is investigated based on a learning-aided predictor integration technique. Supervised learning techniques are used to learn the correlations between the statistical properties of the workload and the best-suited predictors.
In addition to a one-step-ahead prediction model, a phase characterization model is studied to explore the large-scale behavior of applications' resource consumption patterns. Our study provides novel methodologies to model system and application performance. The performance models can self-optimize over time based on learning from historical runs, and therefore adapt better to changing workloads and achieve better prediction accuracy than traditional methods with static parameters. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2007.
Local:
Adviser: Figueiredo, Renato J.
Statement of Responsibility:
by Jian Zhang.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Zhang, Jian. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Classification:
LD1780 2007 ( lcc )

Downloads

This item has the following downloads:

zhang_j.pdf


Full Text








[Diagram omitted: sample cases (patterns of feature values and decisions) train a
general classifier model within a general learning system; during testing, the
application-specific classifier assigns a class to each case to be classified.]

Figure 1-2. Classification system representation
During the training phase, labeled sample cases are used to derive the
unknown parameters of the classifier model. During the testing phase, the
customized classifier is used to associate a specific pattern of observations with
a predicted class.


formulated as a finite-state Markov decision process (MDP), and reinforcement learning

algorithms for this context are highly related to dynamic programming techniques.

1.3.4 Other Learning Paradigms

In addition to the above three traditional learning methods, there are some other

learning paradigms:

Relational Learning / Structured Prediction: It predicts structure on sets of objects.

For example, it is trained on genome/proteome data with known relationships and can

predict graph structure on new sets of genomes/proteomes.
























[Bar chart omitted: error comparison of the SP-LARP, Knn-LARP, Bays-LARP,
Cum.MSE, and W-Cum.MSE predictors across the 12 performance metrics listed below.]

Figure 4-10. Predictor performance comparison (VM2)
1 CPU_used_sec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write









The skeleton-based performance prediction work introduced in [48] uses a synthetic

skeleton program to reproduce the CPU utilization and communication behaviors of

message passing parallel programs to predict application performance. In contrast, the

application classifier provides application behavior learning in more dimensions.

Prophesy [49] employs a performance-modeling component, which uses coupling

parameters to quantify the interactions between kernels that compose an application.

However, to be able to collect data at the level of basic blocks, procedures, and loops,

it requires insertion of instrumentation code into the application source code. In

contrast, the classification approach uses the system performance data collected from

the application host to infer the application resource consumption pattern. It does not

require the modification of the application source code.

Statistical clustering techniques have been applied to learn application behavior

at various levels. Nickolayev et al. applied clustering techniques to efficiently reduce

the processor event trace data volume in cluster environments [50]. Ahn and Vetter

conducted application performance analysis by using clustering techniques to identify the

representative performance counter metrics [51]. Both Cohen and Chase's [52] and our

work perform statistical clustering using system-level metrics. However, their work focuses

on system performance anomaly detection. Our work focuses on application classification

for resource scheduling.

Our work can be used to learn the resource consumption patterns of a parallel

application's child processes and a multi-stage application's sub-stages. However, in this

study we focus on sequential and single-stage applications.

2.6 Conclusion

The application classification prototype presented in this chapter shows how to apply

the Principal Component Analysis and K-Nearest Neighbor techniques to reduce the

dimensions of application resource consumption feature space and assist the resource

scheduling. In addition to the CPU load, it also takes the I/O, network, and memory





































Figure 2-3. Application classification model.
The Performance profiler collects performance metrics of the target
application node. The Classification center classifies the application using
extracted key components and performs statistical analysis of the classification
results. The Application DB stores the application class information. (m is the
number of snapshots taken in one application run, t0/t1 are the beginning/
ending times of the application execution, VMIP is the IP address of the
application's host machine).


system is used to sample the system performance of a computing node running an

application of interest.

2.3.1 Performance Profiler

The performance profiler is responsible for collecting performance data of the

application node. It interfaces with the resource manager to receive data collection

instructions, including the target node and when to start and stop.









where C1, C2, and C3 denote the unit costs per resource usage, switching, and penalty,

respectively. Therefore, k_opt is derived as

    k_opt = arg min_{1<=k<=K} TC(k)
          = arg min_{1<=k<=K} [R(k) + C x TR(k) + Cp x P(k)]        (5-10)

where C is the transition factor, Cp denotes the discount factor for the misprediction

penalty, which is the ratio of C3 to C1, and K is the maximum number of phases.
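As a concrete illustration, the minimization in Eq. 5-10 amounts to evaluating the total cost for each candidate phase count and keeping the minimum. The sketch below is illustrative only (the thesis prototype was implemented in Matlab); the R, TR, and P values are made-up placeholders, not measured data:

```python
def optimal_phase_count(R, TR, P, C, Cp):
    """Return the 1-based k minimizing TC(k) = R(k) + C*TR(k) + Cp*P(k),
    following Eq. 5-10. R, TR, P are per-k cost lists for k = 1..K."""
    costs = [r + C * tr + Cp * p for r, tr, p in zip(R, TR, P)]
    return costs.index(min(costs)) + 1

# Illustrative (made-up) per-k costs for K = 4 candidate phase counts:
R = [100.0, 80.0, 75.0, 76.0]   # resource provisioning cost R(k)
TR = [0.0, 0.2, 0.5, 0.9]       # phase-transition cost TR(k)
P = [0.0, 1.0, 1.5, 2.5]        # misprediction penalty P(k)
k_opt = optimal_phase_count(R, TR, P, C=5.0, Cp=2.0)  # -> 3
```

Note how larger C and Cp push k_opt toward fewer phases: switching and misprediction become more expensive than the over-provisioning avoided by finer phases.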

5.4 Phase Prediction

This section describes the work flow of the application resource demand phase

prediction illustrated in Figure 5-3. The prediction consists of two stages: a training

stage and a testing stage. During the training stage, the number of the clusters in

the application resource usage, the corresponding cluster centroids, and the unknown

parameters of the time series prediction model of the resource usage are determined.

During the testing stage, the one-step ahead resource usage is predicted and classified as

one of the clusters.

Both stages start from pattern representation and framing. In the step of pattern

representation, the collected performance data of the application VM are profiled to

extract only the features which will be used for clustering and future resource provisioning.

For example, in the one-dimensional case discussed in this thesis, the training data of a

specific performance feature (X_(u x 1), see Table 5-1) are extracted, where u is the total

number of input data. Then the extracted performance data X_(u x 1) are framed with the

prediction window size m to form data X'_((u-m+1) x m).
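The framing step just described can be sketched in a few lines (an illustration of the X' construction, not the thesis's Matlab code):

```python
def frame(x, m):
    """Frame a length-u series x into overlapping windows of size m,
    producing the (u - m + 1) x m matrix X' described above."""
    return [x[i:i + m] for i in range(len(x) - m + 1)]

windows = frame([1, 2, 3, 4, 5], 3)
# -> [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
```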

The training stage mainly consists of two processes: prediction model fitting and

phase behavior analysis. The algorithms defined in Section 5.3.3 and 5.3.4 are used to

find out the number of phases k, which gives the lowest total resource provisioning cost.

The output phase profile is used to train the phase predictor. In addition, the unknown

parameters of the resource predictor are estimated from the training data. In this thesis,









Initially the performance profiler collected data of all the thirty-three (n = 33)

performance metrics once every five seconds (d = 5) during the application execution.

Then the data preprocessor extracted the data of the eight (p = 8) metrics listed in

Table 2-1 based on the expert knowledge of the correlation between these metrics and the

application classes. After that, the PCA processor conducted the linear transformation of

the performance data and selected principal components based on the minimal fraction

variance defined. In this experiment, the variance contribution threshold was set to extract

two (q = 2) principal components. It helps to reduce the computational requirements of

the classifier. Then, the trained 3-NN classifier conducts classification based on the data of

the two principal components.
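The pipeline just described, PCA reduction driven by a variance-contribution threshold followed by a k-NN vote, can be sketched as below. This is an illustrative reimplementation (the actual classification center was written in Matlab); the threshold and k = 3 mirror the experiment, while the sample data in the usage note are made up:

```python
import numpy as np

def pca_reduce(X, var_threshold=0.9):
    """Project rows of X onto the fewest principal components whose
    cumulative variance fraction reaches var_threshold."""
    Xc = X - X.mean(axis=0)
    vals, vecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(vals)[::-1]            # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    frac = np.cumsum(vals) / vals.sum()       # cumulative variance fraction
    q = int(np.searchsorted(frac, var_threshold)) + 1
    return Xc @ vecs[:, :q]

def knn_classify(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean)."""
    d = np.linalg.norm(train_X - x, axis=1)
    labels = [train_y[i] for i in np.argsort(d)[:k]]
    return max(set(labels), key=labels.count)
```

For example, with two hypothetical principal-component features, `knn_classify(train_X, ['idle', 'idle', 'cpu', 'cpu'], snapshot)` assigns each performance snapshot the majority label of its three nearest training snapshots.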

The training data's class clustering diagram is shown in Figure 2-5 (a). The diagram

shows a PCA-based two-dimensional representation of the data corresponding to the five

classes targeted by our system. After being trained with the training data, the classifier

classifies the remaining benchmark programs shown in Table 2-2. The classifier provides

outputs in two kinds of formats: the application class-clustering diagram, which helps to

visualize the classification results, and the application class composition, which can be

used to calculate the unit application cost.

Figure 2-5 shows the sample clustering diagrams for three test applications. For

example, the interactive VMD application (Figure 2-5(d)) shows a mix of the idle class

when user is not interacting with the application, the I/O-intensive class when the user

is uploading an input file, and the Network-intensive class while the user is interacting

with the GUI through a VNC remote display. Table 2-3 summarizes the class compositions

of all the test applications. Figure 2-6 visualizes the class composition of some sample

benchmark programs. These classification results match the class expectations gained from

empirical experience with these programs. They are used to calculate the unit application

cost shown in section 4.4.









ratio schedule for the eight performance features


Performance     Number of phases (k)
Features        1     2     3     4     5     6     7     8     9     10
CPU_user        1.00  0.80  0.75  0.75  0.75  0.77  0.78  0.78  0.80  0.83
CPU_system      1.00  0.67  0.66  0.65  0.64  0.66  0.67  0.69  0.70  0.71
Bytes_in        1.00  0.97  0.96  0.96  0.96  0.96  0.96  0.95  0.95  0.95
Bytes_out       1.00  0.95  0.90  0.88  0.90  0.87  0.87  0.87  0.87  0.87
IO_BI           1.00  0.57  0.52  0.55  0.56  0.58  0.62  0.63  0.62  0.64
IO_BO           1.00  0.57  0.53  0.55  0.57  0.61  0.60  0.61  0.64  0.63
Swap_in         1.00  0.54  0.55  0.59  0.59  0.60  0.61  0.63  0.64  0.65
Swap_out        1.00  0.51  0.47  0.49  0.54  0.55  0.57  0.58  0.59  0.61
(Total cost ratio ρ = TC(k)/TC(1), where C = 52 and Cp = 8)

5.5.1.2 World Cup web log replay

In this experiment, phase characterization was performed for the performance data

collected from a network-intensive application, the 1998 World Cup web access log replay.

The workload used in this experiment was based on the 1998 World Cup trace [98].

The openly available trace containing a log of requests to Web servers was used as an

input to a client replay tool, which enabled us to exercise a realistic Web-based workload

and collect system-level performance metrics using Ganglia in the same manner that was

done for the SPECseis96 workload. For this study, we chose to replay the five hour (from

22:00:01 Jun.23 to 3:11:20 Jun.24) log of the least loaded server (serverlD 101), which

contained 130,000 web requests.

The phase analysis and prediction techniques can be used to characterize performance

data collected from not only virtual machines but also physical machines. During the

experiment, a physical server with sixteen Intel(R) Xeon(TM) MP 3.00GHz CPUs

and 32GB memory was used to execute the replay clients to submit requests based on

submission intervals, HTTP protocol types (1.0 or 1.1), and document sizes defined in

the log file. A physical machine with Intel(R) Pentium(R) 4 1.70GHz CPU and 512MB

memory was used to host the Apache web server and a set of files which were created

based on the file sizes described in the log.


Table 5-2. SPECseis96 total cost












[Plots omitted: total resource provisioning cost versus number of phases,
panels C and D.]

Figure 5-4. Continued









outputs class composition, which can be used to support application cost models (Section

4.4). The post-processed classification results, together with the corresponding execution

time (t0 to t1), are stored in the application database and can be used to assist future

resource scheduling.

2.4 Experimental Results

We have implemented a prototype for application classification including a Perl

implementation of the performance profiler and a Matlab implementation of the

classification center. In addition, Ganglia was used to monitor the working status of

the virtual machines. This section evaluates our approach from the following three aspects:

the classification ability, the scheduling decision improvement and the classification cost.

2.4.1 Classification Ability

The application class set in this experiment has four classes: CPU-intensive, I/O and

paging-intensive, network-intensive, and idle. Applications of the I/O- and paging-intensive

class can be further divided into two groups based on whether or not they have

substantial memory-intensive activities. Various synthetic and benchmark programs,

scientific computing applications and user interactive applications are used to test

the classification ability. These programs represent typical application behaviors of

their classes. Table 2-2 summarizes the set of applications used as the training and the

testing applications in the experiments [28-38]. The 3-NN classifier was trained with the

performance data collected from the executions of the training applications highlighted in

the table. All the application executions were hosted by a VMware GSX virtual machine

(VM1). The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU

1.80GHz machine with 512KB cache and 1GB RAM. In addition, a second virtual

machine with the same specification was used to run the server applications of the network

benchmarks.









to measure the dissimilarity between two patterns. It works well when a data set has

"compact" or "isolated" clusters. In the case of clustering in the multi-dimensional space,

normalization of the continuous features can be used to remove the tendency of the

largest-scaled feature to dominate the others. In addition, the Mahalanobis distance can be

used to remove the distortion caused by the linear correlation among features, as discussed

in Chapter 3.

(3) Clustering or grouping: The clustering can be performed in a number of ways [97].

The output clustering can be hard (a partition of the data into groups) or fuzzy (where

each pattern has a variable degree of membership in each of the output clusters). A hard

clustering can be obtained from a fuzzy partition by thresholding the membership values.

In this work, one of the most popular iterative clustering methods, the k-means algorithm,

as detailed in Section 5.3.3, is used.
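A minimal sketch of the k-means iteration used here, specialized to the one-dimensional data of this chapter (illustrative only; it uses a naive deterministic initialization from the first k points rather than the initialization of the actual prototype, which is not specified here):

```python
def kmeans_1d(points, k, iters=100):
    """Lloyd's k-means on scalar data: assign each point to its nearest
    centroid, then move each centroid to its cluster mean, until stable."""
    centroids = list(points[:k])            # naive deterministic init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda j: abs(p - centroids[j]))
            clusters[j].append(p)
        new = [sum(c) / len(c) if c else centroids[j]
               for j, c in enumerate(clusters)]
        if new == centroids:                # assignment is stable: converged
            break
        centroids = new
    return sorted(centroids)
```

Each returned centroid then serves as the representative resource-usage level of one phase.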

5.3.2 Definitions and Notation

In this chapter, we follow the terms and notation defined in [97].

A pattern (or feature vector or observation) is a single data item used by the

clustering algorithm. It typically consists of a vector of d measurements.

The individual scalar components of a pattern are called features (or attributes).

d is the dimensionality of the pattern or of the pattern space.

A class refers to a state of nature that governs the pattern generation process

in some cases. More concretely, a class can be viewed as a source of patterns whose

distribution in feature space is governed by a probability density specific to the class.

Clustering techniques attempt to group patterns so that the classes thereby obtained

reflect the different pattern generation processes represented in the pattern set.

A distance measure is a metric on the feature space used to quantify the similarity of

patterns.
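For instance, the two distance measures used in this chapter can be written as follows (a sketch; `cov` stands for a feature covariance matrix estimated from training data, an assumption of this illustration):

```python
import numpy as np

def euclidean(x, y):
    """Straight-line distance between two patterns."""
    return float(np.linalg.norm(np.asarray(x, float) - np.asarray(y, float)))

def mahalanobis(x, y, cov):
    """Distance that discounts correlated, large-variance feature
    directions; reduces to the Euclidean distance when cov = I."""
    d = np.asarray(x, float) - np.asarray(y, float)
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))
```

With a diagonal `cov`, the Mahalanobis distance is simply the Euclidean distance after per-feature scaling, which is the normalization effect described above.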










5 APPLICATION RESOURCE DEMAND PHASE ANALYSIS AND PREDICTIONS ...... 106

    5.1 Introduction ................................................ 106
    5.2 Application Resource Demand Phase Analysis and Prediction
        Prototype ................................................... 108
    5.3 Data Clustering ............................................. 111
        5.3.1 Stages in Clustering .................................. 111
        5.3.2 Definitions and Notation .............................. 112
        5.3.3 k-means Clustering .................................... 113
        5.3.4 Finding the Optimal Number of Clusters ................ 114
    5.4 Phase Prediction ............................................ 117
    5.5 Empirical Evaluation ........................................ 118
        5.5.1 Phase Behavior Analysis ............................... 119
            5.5.1.1 SPECseis96 benchmark ............................ 119
            5.5.1.2 World Cup web log replay ........................ 122
        5.5.2 Phase Prediction Accuracy ............................. 123
        5.5.3 Discussion ............................................ 125
    5.6 Related Work ................................................ 126
    5.7 Conclusion .................................................. 128

6 CONCLUSION ........................................................ 135

REFERENCES .......................................................... 137

BIOGRAPHICAL SKETCH ................................................. 146









































[Plots omitted: cost versus number of phases, panels A-C.]

Figure 5-5. Phase analysis of WorldCup'98 Bytes_in. A) Phase transitions
B) Misprediction penalties C) Total cost with penalty (Cp = 8)











experiments show that the proposed scheme can effectively select a performance metric

subset providing above 90% classification accuracy for a set of benchmark applications.

In addition to the application resource demand modeling, Chapter 4 proposes a

learning based adaptive predictor, which can be used to predict resource availability. It

uses the k-NN classifier and PCA to learn the relationship between workload characteristics

and the best-suited predictor based on historical predictions, and to forecast the best predictor

for the workload under study. Then, only the selected best predictor is run to predict the

next value of the performance metric, instead of running multiple predictors in parallel

to identify the best one. The experimental results show that this learning-aided adaptive

resource predictor can often outperform the single best predictor in the pool without a

priori knowledge of which model best fits the data.

The application classification and the feature selection techniques can be used

to define the application resource consumption patterns at any given moment. The

experimental results of the application classification suggest that allocating applications

which have complementary resource consumption patterns to the same server can improve

the system throughput.

In addition to one-step-ahead performance prediction, Chapter 5 studied the large-scale

behavior of application resource consumption. Clustering-based algorithms have

been explored to provide a mechanism to define and predict the phase behavior of the

application resource usage to support on-demand resource allocation. The experimental

results show that an average of above 90% phase prediction accuracy can be achieved

for the four-phase cases of the benchmark workloads.









TABLE OF CONTENTS


page


ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT


CHAPTER

1 INTRODUCTION

    1.1 Resource Performance Modeling
    1.2 Autonomic Computing
    1.3 Learning
        1.3.1 Supervised Learning
        1.3.2 Unsupervised Learning
        1.3.3 Reinforcement Learning
        1.3.4 Other Learning Paradigms
    1.4 Virtual Machines
        1.4.1 Virtual Machine Characteristics
        1.4.2 Virtual Machine Plant

2 APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF
  RESOURCE CONSUMPTION PATTERNS

    2.1 Introduction
    2.2 Classification Algorithms
        2.2.1 Principal Component Analysis
        2.2.2 k-Nearest Neighbor Algorithm
    2.3 Application Classification Framework
        2.3.1 Performance Profiler
        2.3.2 Classification Center
            2.3.2.1 Data preprocessing based on expert knowledge
            2.3.2.2 Feature selection based on principal component analysis
            2.3.2.3 Training and classification
        2.3.3 Post Processing and Application Database
    2.4 Experimental Results
        2.4.1 Classification Ability
        2.4.2 Scheduling Performance Improvement
        2.4.3 Classification Cost
    2.5 Related Work
    2.6 Conclusion









5.3 Data Clustering

Clustering is an important data mining technique for discovering patterns in the

data. It has been used effectively in many disciplines such as pattern recognition, biology,

geology, and marketing.

At a high level, the problem of clustering is defined as follows: Given a set U of

n samples u1, u2, ..., un, we would like to partition U into k subsets U1, U2, ..., Uk

such that the samples assigned to each subset are more similar to each other than to the

samples assigned to different subsets. Here, we assume that two samples are similar if they

correspond to the same phase.

5.3.1 Stages in Clustering

A typical pattern clustering activity involves the following steps [97]:

(1) Pattern representation, which is used to obtain an appropriate set of features to

use in clustering. It optionally consists of feature extraction and/or selection. Feature

selection is the process of identifying the most effective subset of the original features to

use in clustering. Feature extraction is the use of one or more transformations of the input

features to produce new salient features.

In the context of resource demand phase analysis, the features under study are the

system-level resource performance metrics, as shown in Table 5-1. For one-dimensional

clustering, which is the case of this work, the feature selection is as simple as choosing

the performance metric which is instructive to the allocation of the corresponding system

resource. For clustering based on multiple performance metrics, feature extraction

techniques such as Principal Component Analysis (PCA) may be used to transform the

input performance metrics to a lower dimension space to reduce the computing intensity of

subsequent clustering and improve the clustering quality.

(2) Definition of a pattern proximity measure appropriate to the data domain. The

pattern proximity is usually measured by a distance function defined on pairs of patterns.

In this work, the most popular metric for continuous features, the Euclidean distance, is used









3.6 Conclusion

The autonomic feature selection prototype presented in this chapter shows how

to apply statistical analysis techniques to support online application classification. We

envision that this classification approach can be used to provide first-order analysis of

the dominant resource consumption patterns of an application. This chapter shows that

autonomic feature selection enables classification without requiring expert knowledge in

the selection of relevant low-level performance metrics.









activities into account for the resource scheduling in an effective way. It does not require

modifications of the application source code. Experiments with various benchmark

applications suggest that, with the application class knowledge, a scheduler can improve

the system throughput by 22.1% on average by allocating the applications of different classes

to share the system resources.

In this work, the input performance metrics are selected manually based on expert

knowledge. In the next chapter, the techniques for automatically selecting features for

application classification are discussed.









is capable of cloning an application-specific virtual machine and configuring it with an

appropriate execution environment. In the context of VMPlant, the application can be

scheduled to run on a dedicated virtual machine, which is hosted by a shared physical

machine. Within the VM, system performance metrics such as CPU load, memory usage,

I/O activity and network bandwidth utilization, reflect the application's resource usage.

The classification system described in this chapter leverages the capability of

summarizing application performance data by collecting system-level data within a

VM, as follows. During the application execution, snapshots of performance metrics are

taken at a desired frequency. A PCA processor analyzes the performance snapshots and

extracts the key components of the application's resource usage. Based on the extracted

features, a k-NN classifier categorizes each snapshot into one of the following classes:

CPU-intensive, IO-intensive, memory-intensive, network-intensive and idle.
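The per-snapshot classification and majority vote can be sketched as follows. This is a minimal illustrative NumPy sketch with synthetic data and a hand-rolled k-NN, not the implementation used in this work; the metric layout and sizes are assumptions, and the PCA projection step is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(0)
CLASSES = ["CPU", "IO", "memory", "network", "idle"]

# Synthetic labeled history: rows are performance snapshots, columns are
# normalized metrics (e.g. cpu_user, io_bi, swap_in, bytes_in).
X_train = rng.normal(size=(100, 4))
y_train = rng.integers(0, 5, size=100)

def knn_classify(x, k=5):
    """Label snapshot x by majority vote among its k nearest training snapshots."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean distance to each sample
    nearest = np.argsort(d)[:k]
    return np.bincount(y_train[nearest], minlength=5).argmax()

# Classify every snapshot of a run, then majority-vote the run's class.
run_snapshots = rng.normal(size=(20, 4))
votes = np.array([knn_classify(s) for s in run_snapshots])
app_class = CLASSES[np.bincount(votes, minlength=5).argmax()]
```

The majority vote over snapshots makes the run-level label robust to transient phases within the execution.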

By using this system, resource scheduling can be based on a comprehensive diagnosis

of the application resource utilization, which conveys more information than CPU load

in isolation. Experiments reported in this chapter show that the resource scheduling

facilitated with application class composition knowledge can achieve better average system

throughput than scheduling without the knowledge.

The rest of the chapter is organized as follows: Section 2.2 introduces the PCA and

the k-NN classifier in the context of application classification. Section 2.3 presents the

classification model and implementation. Section 2.4 presents and discusses experimental

results of classification performance measurements. Section 2.5 discusses related work.

Conclusions and future work are discussed in Section 2.6.

2.2 Classification Algorithms

Application behavior can be defined by its resource utilization, such as CPU load,

memory usage, network and disk bandwidth utilization. In principle, the more information

a scheduler knows about an application, the better scheduling decisions it can make.

However, there is a tradeoff between the complexity of decision-making process and the









In addition, the experimental data also demonstrate the impact of changing execution

environment configurations on the application's class composition. For example, in

Table 2-3 when SPECseis96 with medium size input data was executed in VM1 with

256MB memory (SPECseis96_A), it is classified as CPU-intensive application. In the

SPECseis96_B experiment, the smaller physical memory (32MB) resulted in increased

paging and I/O activity. The increased I/O activity is due to the fact that less physical

memory is available to the O/S buffer cache for I/O blocks. The buffer cache size at run

time was observed to be as small as 1MB in SPECseis96_B, and as large as 200MB in

SPECseis96_A. In addition, the execution time increased from 291 minutes 42

seconds in the first case to 426 minutes 58 seconds in the second case.

Similarly, in the experiments with PostMark, different execution environment

configurations changed the application's resource consumption pattern from one class to

another. Table 2-3 shows that if a local file directory was used to store the files to be read

and written during the program execution, the PostMark benchmark showed the resource

consumption pattern of the I/O-intensive class. In contrast, with an NFS mounted file

directory, it (PostMark_NFS) was turned into a Network-intensive application.

2.4.2 Scheduling Performance Improvement

Two sets of experiments are used to illustrate the performance improvement that a

scheduler can achieve with the knowledge of application class. These experiments were

performed on 4 VMware GSX 2.5 virtual machines with 256MB memory each. One of

these virtual machines (VM1) was hosted on an Intel(R) Xeon(TM) dual-CPU 1.80GHz

machine with 512KB cache and 1GB RAM. The other three (VM2, VM3, and VM4) were

hosted on an Intel(R) Xeon(TM) dual-CPU 2.40GHz machine with 512KB cache and 4GB

RAM. The host servers were connected by Gigabit Ethernet.

The first set of experiments demonstrates that the application class information can

help the scheduler to optimize resource sharing among applications running in parallel to

improve system throughput and reduce throughput variances. In the experiments, three









Table 3-2. Sample performance metrics in the original feature set
Performance Metrics        Description
cpu_system / user / idle   Percent CPU system / user / idle
cpu_nice                   Percent CPU nice
bytes_in / out             Number of bytes per second into / out of the network
io_bi / bo                 Blocks sent to / received from a block device (blocks/s)
swap_in / out              Amount of memory swapped in / out from / to disk (kB/s)
pkts_in / out              Packets in / out per second
proc_run                   Total number of running processes
load_one / five / fifteen  One / five / fifteen minute load average


the class, whose centroid has the smallest Mahalanobis distance min(d1, d2, ..., d5) to the

snapshot. Automated and adaptive threshold setting is discussed in detail in [67].

In our implementation, Ganglia is used as the monitoring tool and twenty (m = 20)

performance metrics, which are related to resource usage, are included in the training

data. These performance metrics include 16 out of 33 default metrics monitored by

Ganglia and the 4 metrics that we added based on the need of classification. The four

metrics include the number of I/O blocks read from/written to disk, and the number of

memory pages swapped in/out. A program was developed to collect these four metrics

(using vmstat) and added them to the metric list of Ganglia's monitoring daemon gmond.

Table 3-2 shows some sample performance metrics of the training candidate.

The first-time quality assurance was performed by a human expert at initialization.

The subsequent assurance can be conducted automatically by following the above steps to

select representative training data for each class.

3.3.2 Feature Selector

The feature selector is responsible for selecting the features that are correlated with

the application's resource consumption pattern from the numerous performance metrics









The set of potential observations relevant to a particular problem are called features,

which also go by a host of other names, including attributes and variables. Only correctly

solved cases will be used in building the specific classifier, which is called the training

phase of the classification. The pattern of feature values for each case is associated with

the correct classification or decision to form the sample cases, a set which is also called

the training data. Thus, learning in any of these systems can be viewed as a process of

generalizing these observed empirical associations subject to the constraints imposed by

the chosen classifier model. During the testing phase, the customized classifier is used

to associate a specific pattern of observations with a specific class. The learning method

introduced above is a form of supervised learning, which learns by being presented with

preclassified training data.

1.3.2 Unsupervised Learning

Unsupervised learning methods can learn without any human intervention. This

method is particularly useful in situations where data need to be classified or clustered

into a set of classifications but where the classifications are not known in advance. In

other words, it fits the model to observations. It differs from supervised learning by the

fact that there is no a priori output.
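A common unsupervised method is k-means clustering, which fits cluster centroids to unlabeled observations. The following is a minimal NumPy sketch over synthetic, unlabeled data; it illustrates the general technique rather than any specific clustering algorithm used in this work:

```python
import numpy as np

rng = np.random.default_rng(1)
# Unlabeled snapshots drawn from two unknown regimes.
data = np.vstack([rng.normal(0, 1, size=(50, 2)),
                  rng.normal(5, 1, size=(50, 2))])

def kmeans(X, k, iters=20):
    """Plain k-means: alternate point assignment and centroid update."""
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign every point to its nearest centroid.
        labels = np.argmin(np.linalg.norm(X[:, None] - centroids, axis=2), axis=1)
        # Move each centroid to the mean of its assigned points
        # (keep the old centroid if a cluster ends up empty).
        centroids = np.array([X[labels == j].mean(axis=0)
                              if np.any(labels == j) else centroids[j]
                              for j in range(k)])
    return labels, centroids

labels, centroids = kmeans(data, k=2)
```

No class labels are supplied; the grouping emerges purely from the structure of the observations.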

1.3.3 Reinforcement Learning

Reinforcement learning refers to a class of problems in machine learning which

postulate an agent exploring an environment in which the agent perceives its current

state and takes actions. A system that uses reinforcement learning is given a positive

reinforcement when it performs correctly and a negative reinforcement when it performs

incorrectly. However, the information of why and how the learning system performed

correctly is not provided to it.

Reinforcement learning algorithms attempt to find a policy for maximizing cumulative

reward for the agent over the course of the problem. The environment is typically

















Figure 2-1. Sample of principal component analysis


and the covariance matrix of the same data set is


C_x = E[(x - μ_x)(x - μ_x)^T]    (2-3)


The components of Cx, denoted by cij, represent the covariances between the random

variable components xi and xj. The component cii is the variance of the component xi.

From a sample of vectors x_1, ..., x_M, we can calculate the sample mean and the

sample covariance matrix as the estimates of the mean and the covariance matrix.

The eigenvectors ei and the corresponding eigenvalues Ai can be obtained by solving

the equation


C_x e_i = λ_i e_i,    i = 1, ..., n    (2-4)
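The sample covariance of Equation 2-3 and the eigen-decomposition of Equation 2-4 can be computed numerically as follows; this is an illustrative NumPy sketch with synthetic data, not the implementation used in this work:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 5))     # 200 samples of an n = 5 dimensional vector

mean = X.mean(axis=0)             # sample mean
C = np.cov(X, rowvar=False)       # sample covariance matrix C_x (Eq. 2-3)

# Solve C_x e_i = lambda_i e_i (Eq. 2-4); eigh suits symmetric matrices.
eigvals, eigvecs = np.linalg.eigh(C)

# Order the eigenpairs by decreasing eigenvalue (largest variance first).
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Verify the eigen-equation for the leading eigenvector.
assert np.allclose(C @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0])
```

The eigenvectors with the largest eigenvalues define the directions of greatest variance and become the principal components.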














VM: Virtual Machine
VMM: Virtual Machine Monitor
DB: Database
QA: Quality Assuror
m: Prediction window size
j: Quality assurance window size
ts/te: Starting / ending time stamps
Figure 4-1. Virtual machine resource usage prediction prototype
The monitor agent, which is installed in the Virtual Machine Monitor (VMM),
collects the VM resource performance data and stores them in the round-robin
VM Performance Database. The profiler extracts the performance data of a
given time frame for the VM indicated by VMID and DeviceID. The
LARPredictor selects the best prediction model based on learning of historical
predictions, predicts the resource performance for time t+1, and stores the
prediction results in the prediction database. The prediction results can be
used to support the resource manager to perform dynamic VM resource
allocation. The Performance Quality Assuror (QA) audits the LARPredictor's
performance and orders re-training for the predictor if the performance drops
below a predefined threshold.


Our virtual machine resource prediction prototype, illustrated in Figure 4-1, models

how the VM performance data are collected and used to predict the value for future time

to support resource allocation decision-making.

A performance monitoring agent is installed in the Virtual Machine Monitor (VMM)

to collect the performance data of the guest VMs. In our implementation, VMware's ESX

virtual machines are used to host the application execution and the monitoring tool [85]

of ESX is used to monitor and collect the performance data of the VM guests and host
















4-3 Learning-aided adaptive resource predictor workflow
4-4 Learning-aided adaptive resource predictor dataflow
4-5 Best predictor selection for trace VM2_load15
4-6 Best predictor selection for trace VM2_PktIn
4-7 Best predictor selection for trace VM2_Swap
4-8 Best predictor selection for trace VM2_Disk
4-9 Predictor performance comparison (VM1)
4-10 Predictor performance comparison (VM2)
4-11 Predictor performance comparison (VM3)
4-12 Predictor performance comparison (VM4)
4-13 Predictor performance comparison (VM5)
5-1 Application resource demand phase analysis and prediction prototype
5-2 Resource allocation strategy comparison
5-3 Application resource demand phase prediction workflow
5-4 Phase analysis of SPECseis96 CPU_user
5-5 Phase analysis of WorldCup'98 Bytes_In
5-6 Phase analysis of WorldCup'98 Bytes_Out









Table 3-4. Performance metric correlation matrices of test applications. A) Correlation
matrix of SPECseis96 performance data. B) Correlation matrix of PostMark
performance data. C) Correlation matrix of NetPIPE performance data.
Metric 1 2 3 4 5 6
1 1.00 -0.21 -0.34 0.74 0.20 -0.02
2 -0.21 1.00 -0.16 -0.02 -0.17 -0.06
3 -0.34 -0.16 1.00 -0.60 0.20 -0.05
4 0.74 -0.02 -0.60 1.00 -0.19 0.04
5 0.20 -0.17 0.20 -0.19 1.00 0.12
6 -0.02 -0.06 -0.05 0.04 0.12 1.00
A

Metric 1 2 3 4 5 6
1 1.00 -0.24 0.22 0.34 -0.08 -0.13
2 -0.24 1.00 -0.22 0.18 0.04 -0.02
3 0.22 -0.22 1.00 0.33 0.30 0.18
4 0.34 0.18 0.33 1.00 0.42 0.47
5 -0.08 0.04 0.30 0.42 1.00 0.20
6 -0.13 -0.02 0.18 0.47 0.20 1.00
B

Metric 1 2 3 4 5 6
1 1.00 0.29 0.31 0.48 0.27 0.30
2 0.29 1.00 0.49 0.39 0.75 0.95
3 0.31 0.49 1.00 0.50 0.59 0.52
4 0.48 0.39 0.50 1.00 0.42 0.39
5 0.28 0.75 0.59 0.42 1.00 0.75
6 0.30 0.95 0.52 0.39 0.75 1.00


C

Highly correlated metric pairs include load_five / load_fifteen, pkts_in /
pkts_out, and cpu_system / bytes_out. Correlations larger than 0.5 are
highlighted in bold.


data were classified as network-intensive. The results matched with our empirical

experience with these programs and are close to the results of expert-selected-feature

based classification, which shows 85% CPU-intensive for SPECseis96, 97% I/O-intensive for

PostMark, and predominantly network-intensive for PostMark_NFS.









































Figure 1-1. Structure of an autonomic element.


the application resource performance modeling to support self-configuration and

self-optimization of application execution environments.

Generally, an autonomic system is an interactive collection of autonomic elements:

individual system constituents that contain resources and deliver services to humans and

other autonomic elements. As Figure 1-1 shows, an autonomic element will typically

consist of one or more managed elements coupled with a single autonomic manager that

controls and represents them. The managed element could be a hardware resource, such

as storage, a CPU, or a software resource, such as a database, or a directory service, or

a large legacy system [1]. The monitoring process collects the performance data of the









learned. In this work, the Bayesian Network with a tree structure and full observability

is assumed. Figure 3-1 gives a sample BN learned in the experiment. The root is the

application class decision node, which is used to decide an application class given the value

of the leaf nodes. The root node is the parent of all other nodes. The leaf nodes represent

selected performance metrics, such as network packets sent and bytes written to disk.

They are connected one to another in a series.
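If the chain among the leaf nodes is ignored for brevity, a class node that is the parent of all metric nodes yields a naive-Bayes-style decision rule. The following Gaussian sketch on synthetic data is purely illustrative of that computation; it is not the structure-learning procedure used in this work:

```python
import numpy as np

rng = np.random.default_rng(3)
# Synthetic training data: two classes with different metric means.
X = np.vstack([rng.normal(0, 1, (50, 3)), rng.normal(3, 1, (50, 3))])
y = np.array([0] * 50 + [1] * 50)

# Fit per-class Gaussians P(metric | class) and the class prior P(class).
means = np.array([X[y == c].mean(axis=0) for c in (0, 1)])
stds = np.array([X[y == c].std(axis=0) for c in (0, 1)])
priors = np.array([np.mean(y == c) for c in (0, 1)])

def classify(x):
    # log P(class) + sum_i log P(metric_i | class),
    # with metrics assumed independent given the class.
    ll = np.log(priors) + np.sum(
        -0.5 * ((x - means) / stds) ** 2 - np.log(stds * np.sqrt(2 * np.pi)),
        axis=1)
    return int(np.argmax(ll))
```

Given the leaf-node observations, the class decision is the value of the root node that maximizes the posterior probability.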

3.2.3 Mahalanobis Distance

The Mahalanobis distance is a measure of distance between two points in the

multidimensional space defined by multidimensional correlated variables [22] [65]. For

example, if x_1 and x_2 are two points from a distribution which is characterized by the

covariance matrix Σ, then the quantity


d(x_1, x_2) = [(x_1 - x_2)^T Σ^(-1) (x_1 - x_2)]^(1/2)    (3-3)

is called the Mahalanobis distance from x_1 to x_2, where T denotes the transpose of a

matrix.

In the cases where there are correlations between variables, simple Euclidean distance

is not an appropriate measure, whereas the Mahalanobis distance can adequately account

for the correlations and is scale-invariant. Statistical analysis of the performance data

in Section 3.4.3 shows that there are correlations between the application performance

metrics with various degrees. Therefore, Mahalanobis distance between the unlabeled

performance sample and the class centroid, which represents the average of all existing

training data of the class, is used in the training data qualification process in Section 3.3.1.
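A minimal NumPy sketch of Equation 3-3, computing the Mahalanobis distance of an unlabeled snapshot to a class centroid, follows; the data are synthetic and the variable names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
# Training snapshots of one class, with deliberately correlated metrics.
A = rng.normal(size=(300, 2)) @ np.array([[1.0, 0.8], [0.0, 0.6]])

centroid = A.mean(axis=0)                       # class centroid
cov_inv = np.linalg.inv(np.cov(A, rowvar=False))

def mahalanobis(x1, x2):
    """d = [(x1 - x2)^T Sigma^{-1} (x1 - x2)]^{1/2}, Eq. 3-3."""
    d = x1 - x2
    return float(np.sqrt(d @ cov_inv @ d))

# Distance of an unlabeled snapshot to the class centroid, as used when
# qualifying candidate training data.
sample = np.array([1.0, 1.0])
dist = mahalanobis(sample, centroid)
```

Because the inverse covariance matrix rescales and de-correlates the axes, the resulting distance is scale-invariant, unlike the plain Euclidean distance.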

3.2.4 Confusion Matrix

Confusion matrix [66] is commonly used to evaluate the performance of classification

systems. It shows the predicted and actual classification done by the system. The matrix

size is L×L, where L is the number of different classes. In our case, where there are five

target application classes, L is equal to 5.
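For example, a confusion matrix and the overall accuracy can be computed from actual and predicted labels as follows (the labels here are synthetic, for illustration only):

```python
import numpy as np

L = 5  # five target application classes
actual    = np.array([0, 0, 1, 2, 3, 4, 4, 2, 1, 0])
predicted = np.array([0, 1, 1, 2, 3, 4, 3, 2, 1, 0])

# cm[i, j] counts snapshots of actual class i predicted as class j.
cm = np.zeros((L, L), dtype=int)
for a, p in zip(actual, predicted):
    cm[a, p] += 1

# The diagonal holds the correct predictions.
accuracy = np.trace(cm) / cm.sum()
```

Off-diagonal entries reveal which pairs of classes the classifier confuses, which a single accuracy number cannot show.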
















Figure 4-12. Predictor performance comparison (VM4). The chart compares the
normalized MSE of the P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and
W-Cum.MSE predictors across twelve performance metrics:
1 CPU_usedsec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write









CHAPTER 6
CONCLUSION

Self-management has drawn increasing attention in the last few years due to

the increasing size and complexity of computing systems. A resource scheduler that

can perform self-optimization and self-configuration can help to improve the system

throughput and free system administrators from labor-intensive and error-prone tasks.

However, it is challenging to equip a resource scheduler with such self-management capacities because

of the dynamic nature of system performance and workload.

In this dissertation, we propose to use machine learning techniques to assist system

performance modeling and application workload characterization, which can provide

support for on-demand resource scheduling. In addition, virtual machines are used

as resource containers to host application executions for the ease of dynamic resource

provisioning and load balancing.

The application classification framework presented in Chapter 2 used the Principal

Component Analysis (PCA) to reduce the dimension of the performance data space.

Then the k-Nearest Neighbor (k-NN) algorithm is used to classify the data into different

classes such as CPU-intensive, I/O-intensive, memory-intensive, and network-intensive. It

does not require modifications of the application source code. Experiments with various

benchmark applications suggest that with the application class knowledge, a scheduler

can improve the system throughput by 22.11% on average by allocating the applications of

different classes to share the system resources.

The feature selection prototype presented in Chapter 3 uses a probabilistic model

(Bayesian Network) to systematically select the representative performance features,

which can provide optimal classification accuracy and adapt to changing workloads. It

shows that autonomic feature selection enables classification without requiring expert

knowledge in the selection of relevant low-level performance metrics. This approach

requires no application source code modification nor execution intervention. Results from










[Diagram: A_{n×m} → preprocess (n > p) → A'_{p×m} → PCA (p > q) → B_{q×m}
→ classify → C_{1×m} → vote → Class]

Figure 2-4. Performance feature space dimension reductions in the application
classification process
m: The number of snapshots taken in one application run,
n: The number of performance metrics,
A_{n×m}: All performance metrics collected by the monitoring system,
A'_{p×m}: The selected relevant performance metrics after the zero-mean and
unit-variance normalization,
B_{q×m}: The extracted key component metrics,
C_{1×m}: The class vector of the snapshots,
Class: The application class, which is the majority vote of the snapshots' classes.


For example, performance metrics of CPU_System and CPU_User are correlated to

CPU-intensive applications; Bytes_In and Bytes_Out are correlated to Network-intensive

applications; IO_BI and IO_BO are correlated to the IO-intensive applications; Swap_In

and Swap_Out are correlated to Memory-intensive applications. The data preprocessor

extracts these eight metrics of the target application node from the data pool based on our

expert knowledge. Thus it reduces the dimension of the performance metric from n = 33

to p = 8 and generates A'pxm as shown in Figure 2-4. In addition, the preprocessor also

normalizes the selected metrics to zero-mean and unit-variance.

2.3.2.2 Feature selection based on principal component analysis

The PCA processor takes the data collected for the performance metrics listed in

Table 2-1 as inputs. It conducts the linear transformation of the performance data and

selects the principal components based on the predefined minimal fraction variance. In

our implementation, the minimal fraction variance was set to extract exactly two principal

components. Therefore, at the end of processing, the data dimension gets further reduced

from p = 8 to q = 2 and the vector Bqxm is generated, as shown in Figure 2-4.
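The normalization and variance-fraction-based component selection described above can be sketched as follows. This NumPy sketch uses synthetic data, and the 40% threshold is an arbitrary illustration rather than the value used in our implementation (which was tuned to extract exactly two components):

```python
import numpy as np

rng = np.random.default_rng(5)
m, p = 60, 8                       # m snapshots, p expert-selected metrics
A = rng.normal(size=(m, p)) * rng.uniform(0.1, 3.0, size=p)

# Zero-mean, unit-variance normalization (the preprocessor's job).
A = (A - A.mean(axis=0)) / A.std(axis=0)

# Eigen-decompose the covariance and order components by variance.
eigvals, eigvecs = np.linalg.eigh(np.cov(A, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Keep the smallest q whose cumulative variance fraction reaches the
# predefined threshold (here 40%, for illustration).
frac = np.cumsum(eigvals) / eigvals.sum()
q = int(np.searchsorted(frac, 0.4) + 1)

B = A @ eigvecs[:, :q]             # projected data B_{q x m}, stored as (m, q)
```

Raising the threshold keeps more components and more variance; lowering it compresses the feature space further.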









CHAPTER 2
APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF
RESOURCE CONSUMPTION PATTERNS

Application awareness is an important factor of efficient resource scheduling. This

chapter introduces a novel approach for application classification based on the Principal

Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. This approach

is used to assist scheduling in heterogeneous computing environments. It helps to reduce

the dimensionality of the performance feature space and classify applications based on

extracted features. The classification considers four dimensions: CPU-intensive, I/O

and paging-intensive, network-intensive, and idle. Application class information and the

statistical abstracts of the application behavior are learned over historical runs and used to

assist multi-dimensional resource scheduling.

2.1 Introduction

Heterogeneous distributed systems that serve application needs from diverse users

face the challenge of providing effective resource scheduling to applications. Resource

awareness and application awareness are necessary to exploit the heterogeneities of

resources and applications to perform adaptive resource scheduling. In this context, there

has been substantial research on effective scheduling policies [2-4] with given resource and

application specifications. There are several methods for obtaining resource specification

parameters (e.g., CPU, memory, disk information from /proc in Unix systems). However,

application specification is challenging to describe because of the following factors:

Numerous types of applications: In a closed environment where only a limited number

of applications are running, it is possible to analyze the source codes of each application

or even plug in codes to indicate the application execution stages for effective resource

scheduling. However, in an open environment such as in Grid computing, the growing

number of applications and lack of knowledge or control of the source codes present

the necessity of a general method of learning application behaviors without source code

modifications.













Figure 5-3.


Application resource demand phase prediction workflow
In the training stage, the u performance data samples X_{1×u} (of the features) used
in the subsequent phase analysis are extracted (pattern representation) and
framed with prediction window size m. The unknown parameters of the resource
predictor are estimated during model fitting using the framed training data
X'_{(u-m+1)×m}. In addition, the clustering algorithms introduced in Section 5.3
are used to construct the application phase profile, including the phase labels
I_{1×u} for all the samples and the calculated cluster centroids C_{1×k}. In the
testing stage, the phase predictor uses the knowledge learned from the phase
profile to predict the future phases P'_{1×v} based on the predicted resource
usage Y'_{1×v} and P_{1×v} based on the observed actual resource usage Y_{1×v},
and compares them to evaluate the phase prediction accuracy.











[98] "WorldCup98," http://ita.ee.lbl.gov/html/contrib/WorldCup.html.

[99] "Logreplayer," http://www.cs.virginia.edu/~rz5b/software/logreplayer-manual.htm.

[100] C. Isci, A. Buyuktosunoglu, and M. Martonosi, "Long-term workload phases:
duration predictions and applications to dvfs," IEEE Micro, vol. 25, no. 5, pp.
39-51, 2005.

[101] C. Isci and M. Martonosi, "Phase characterization for power: evaluating
control-flow-based and event-counter-based techniques," Proc. 12th International
Symposium on High-Performance Computer Architecture, pp. 121-132, 2006.

[102] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, "Automatically
characterizing large scale program behavior," in Proc. 10th International Con-
ference on Architectural Support for Programming Languages and Operating Systems,
2002, pp. 45-57.

[103] H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi,
"Pinpointing representative portions of large Intel Itanium programs with dynamic
instrumentation," in Proc. 37th Annual International Symposium on Microarchitec-
ture, 2004.

[104] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas,
"Memory hierarchy reconfiguration for energy and performance in general-purpose
architectures," in Proc. 33rd Annual International Symposium on Microarchitecture,
Dec. 2000, pp. 245-257.

[105] A. Dhodapkar and J. Smith, "Managing multi-configuration hardware via dynamic
working set analysis," in Proc. 29th Annual International Symposium on Computer
Architecture, Anchorage, AK, May 2002, pp. 233-244.

[106] A. Dhodapkar and J. Smith, "Comparing program phase detection techniques," in
Proc. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003,
pp. 217-227.

[107] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic provisioning
of multi-tier internet applications," in Proc. 2nd International Conference on
Autonomic Computing, June 2005, pp. 217-228.

[108] J. Wildstrom, P. Stone, E. Witchel, R. J. Mooney, and M. Dahlin, "Towards
self-configuring hardware for distributed computer systems," in Proc. 2nd Interna-
tional Conference of Autonomic Computing, June 2005, pp. 241-249.

[109] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, "Dynamic
virtual clusters in a grid site manager," Proc. 12th IEEE International Symposium
on High Performance Distributed Computing, pp. 90-100, June 2003.








Table 5-5. Average phase prediction accuracy of the five VMs
Performance        Number of Phases
Features      1    2    3    4    5    6    7    8    9    10
CPU_Used     1.00 0.85 0.69 0.60 0.51 0.48 0.43 0.44 0.38 0.35
CPU_Ready    1.00 0.81 0.67 0.52 0.45 0.36 0.36 0.32 0.33 0.32
Mem_Size     1.00 0.91 0.84 0.71 0.70 0.59 0.57 0.52 0.50 0.48
Mem_Swap     1.00 0.96 0.89 0.89 0.83 0.75 0.71 0.70 0.66 0.64
NIC#1_RX     1.00 0.58 0.54 0.47 0.41 0.39 0.37 0.34 0.30 0.28
NIC#1_TX     1.00 0.56 0.48 0.42 0.39 0.35 0.29 0.26 0.29 0.25
NIC#2_RX     1.00 0.93 0.77 0.70 0.61 0.55 0.46 0.33 0.31 0.24
NIC#2_TX     1.00 0.88 0.81 0.76 0.71 0.63 0.53 0.48 0.56 0.45
Disk1_Read   1.00 0.97 0.92 0.86 0.80 0.73 0.64 0.56 0.52 0.44
Disk1_Write  1.00 0.94 0.87 0.78 0.70 0.67 0.63 0.59 0.58 0.55
Disk2_Read   1.00 0.67 0.61 0.55 0.50 0.49 0.47 0.46 0.41 0.38
Disk2_Write  1.00 0.93 0.84 0.76 0.60 0.57 0.51 0.46 0.41 0.38


3. In this work, one-dimensional phase analysis and prediction is performed. However,
the prototype can generally be applied to multi-dimensional resource provisioning cases
as well. For clustering in the multi-dimensional space, additional pattern representation
techniques such as Principal Component Analysis (PCA) can be used to project the data
to a lower dimensional space to reduce the computing intensity. In addition, the transition
factor C will represent the unit transition cost defined in the pricing schedule of the
resource provider.
Developing prediction models for parallel and multi-tier applications is part of our
future research.
5.6 Related Work
Recently, application's phase behavior has drawn a growing research interest for
different reasons. First, tracking application phases enables workload dependent dynamic
management of power/performance trade-offs [100][101]. Second, phase characterization
that summarizes application behavior with representative execution regions can be used









each cluster to maximize system revenue [110]. Tesauro et al. used a combination of

reinforcement learning and queuing model for system performance management [5].

5.7 Conclusion

The application resource demand phase analysis and prediction prototype presented

in this chapter shows how to apply statistical learning techniques to support on-demand

resource provisioning. This chapter shows how to define the phases in the context of

system level resource provisioning and provides an approach to automatically find out

the number of phases which can provide optimal cost. The proposed cost model can

take the resource cost, phase transition cost, and prediction accuracy into account. The

experimental results show that an average of above 90% phase prediction accuracy can

be achieved in the experiments across the CPU and network performance features under

study for the four-phase cases. With the knowledge of the system level application phase

behavior, we envision dynamic optimization of resource scheduling during the application

run can be performed to improve system utilization and reduce the cost for the user.

Providing more informative phase prediction can help to achieve this goal and is part of

our future research.










Figure 4-9. Predictor performance comparison (VM1). The chart compares the
normalized MSE of the P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and
W-Cum.MSE predictors across twelve performance metrics:
1 CPU_usedsec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap,
5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx,
9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write


the Bayesian classifier are used to forecast the best predictor for the workload based on

the learning of historical load characteristics and prediction performance. The principal

component analysis technique has been applied to reduce the input data dimension of

the classification process. Our experimental results with the traces of the full range

of virtual machine resources including CPU, memory, network and disk show that the

LARPredictor can effectively identify the best predictor for the workload and achieve

prediction accuracies that are close to or even better than any single best predictor.
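As an illustration of this selection step, the following sketch picks a predictor from historical (workload statistics, best predictor) pairs using a simple nearest-centroid rule. The data, statistics, and predictor set are synthetic assumptions; the actual LARPredictor uses the k-NN and Bayesian classifiers described above:

```python
import numpy as np

rng = np.random.default_rng(6)

# Synthetic history: statistics of past workload windows (e.g. mean, variance,
# autocorrelation) paired with the predictor that performed best afterwards.
stats = rng.normal(size=(81, 3))
best = np.arange(81) % 3       # 0: AR model, 1: last value, 2: sliding mean

def select_predictor(window_stats):
    """Nearest-centroid rule: pick the predictor whose historical windows'
    statistics centroid is closest to the current window's statistics."""
    centroids = np.array([stats[best == j].mean(axis=0) for j in range(3)])
    return int(np.argmin(np.linalg.norm(centroids - window_stats, axis=1)))

choice = select_predictor(rng.normal(size=3))
```

The key idea is that workloads with similar statistical properties tend to favor the same predictor, so the mapping can be learned once and reused online.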











Figure 2-8. Application throughput comparisons of different schedules. MIN, MAX, and
AVG are the minimum, maximum, average application throughput of all the
ten possible schedules. SPN is the proposed schedule 10 {(SPN), (SPN),
(SPN)} in Figure 2-7.

Table 2-4. System throughput: concurrent vs. sequential executions
Execution    CH3D Elapsed  PostMark Elapsed  Time Taken to
             Time (sec)    Time (sec)        Finish 2 Jobs (sec)
Concurrent   613           310               613
Sequential   488           264               752


throughput of schedule ID 10 (labeled SPN in Figure 2-8) with the minimum, maximum,

and average throughputs of all the ten possible schedules. By allocating jobs from different

classes to the machine, the three applications' throughputs were higher than average by

different degrees: SPECseis96 Small by 24.91%, PostMark by 48.1%, and NetPIPE by

4.2%. Figure 2-8 also shows that the maximum application throughputs were achieved

by sub-schedule (SSN) for SPECseis96 and (PPN) for NetPIPE instead of the proposed

(SPN). However, the low throughputs of the other applications in the sub-schedule make

their total throughputs sub-optimal.









to reduce the high computation costs of large-scale simulations [102] [103]. Our purpose in

studying the phase behavior is to support dynamic resource provisioning of the application

containers.

In addition to the purpose of study, our approach differs from traditional program

phase analysis in the following ways:

1) Performance metric under study: In the area of power management and simulation

optimization for computer architecture research, the metrics used for workload charac-

terization are typically Basic Block Vectors (BBV) [102] [101], conditional branch counter

[104], and instruction working set [105]. In the context of application VM/container's

resource provisioning, the metrics under study are the system level performance features,

which are instructive to VM resource provisioning such as those shown in Table 5-1.

2) Knowledge of the program codes: While [102] [101] [104] at least require profiling

of program binary codes, our approach requires neither instrumentation nor access of

program codes.

3) This thesis answers the question of how many clusters are best in the context of system-level resource provisioning.

In [106], Dhodapkar et al. compared three dynamic program phase detection

techniques discussed in [102], [104], and [105] using a variety of performance metrics, such

as sensitivity, stability, performance variance and correlations between phase detection

techniques.

In addition, other related work on resource provisioning includes the following: Urgaonkar et al.

studied resource provisioning in a multi-tier web environment [107]. Wildstrom et al.

developed a method to identify the best CPU and memory configuration from a pool of

configurations for a specific workload [108]. Chase et al. proposed a hierarchical

architecture that allocates virtual clusters to a group of applications [109]. Kusic et al.

developed an optimization framework to decide the number of servers to allocate to









Semi-Supervised Learning: Given a mix of labeled and unlabeled data, it can produce a better predictor than training on the labeled data alone.

Transductive Learning: It trains a classifier to give the best predictions on a specific set of test data.

Active Learning: It chooses or constructs optimal samples to train on next, with the objective of achieving the best predictor with the fewest labeled samples.

Nonlinear Dimensionality Reduction: It learns the underlying complex manifolds of data in high-dimensional spaces.

In this work, various learning techniques are used to model the application resource

demand and system performance. These models can help the system adapt to the

changing workload and achieve higher performance.

1.4 Virtual Machines

Virtual machines were first developed and used in the 1960s, with the best-known

example being IBM's VM/370 [10]. A VM enables multiple independent, isolated operating systems (guest VMs) to run on one physical machine (host server), efficiently multiplexing the system resources of the host machine [10].

A virtual-machine monitor (VMM) is a software layer that runs on a host platform

and provides an abstraction of a complete computer system to higher-level software.

The abstraction created by the VMM is called a virtual machine. Figure 1-3 shows the

structure of virtual machines.

1.4.1 Virtual Machine Characteristics

Virtual machines can greatly simplify system management (especially in environments

such as Grid computing) by raising the level of abstraction from that of the operating

system user to that of the virtual machine to the benefit of the resource providers and

users [11]. The following characteristics of virtual machines make them a highly flexible

and manageable application execution platform:









4.6.2 Testing Phase

Similar to the training phase, the testing data are normalized using the normalization coefficients derived from the training phase and framed with the prediction window size m. Then PCA is used to reduce the dimension of the preprocessed testing data (y'_{t-m}, ..., y'_{t-1}) from m to n.
In the testing phase of the LARPredictor based on the k-NN classifier, the Euclidean distances between the PCA-processed test data (y'_{t-m}, y'_{t-m+1}, ..., y'_{t-1}) and all training data X_{(N-m+1) x n} in the reduced n-dimensional feature space are calculated, and the k (k = 3 in our implementation) training data that have the shortest distances to the testing data are identified. The majority vote of the k nearest neighbors' best predictors is chosen as the best predictor, which predicts y'_t based on (y'_{t-m}, y'_{t-m+1}, ..., y'_{t-1}) in the case of the AR or SWAVG model, and as y'_t = y'_{t-1} in the case of the LAST model. The prediction performance is obtained by comparing the predicted value y'_t with the normalized observed value.
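The predictor pool referred to above (LAST, SWAVG, and AR) can be sketched as follows. This is an illustrative Python reconstruction rather than the dissertation's Matlab implementation; the function names, window size, and model order are arbitrary choices for the example.

```python
import numpy as np

def last_predict(history):
    """LAST model: the next value is predicted to equal the most recent value."""
    return history[-1]

def swavg_predict(history, window=4):
    """Sliding-window average: predict the mean of the last `window` values."""
    return float(np.mean(history[-window:]))

def ar_predict(history, order=3):
    """AR(p): least-squares fit of coefficients on lagged values, then a
    one-step-ahead forecast from the most recent `order` values."""
    h = np.asarray(history, dtype=float)
    # Build the lagged design matrix: each row holds `order` consecutive values.
    rows = [h[i:i + order] for i in range(len(h) - order)]
    X, y = np.array(rows), h[order:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(h[-order:] @ coef)

history = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4, 1.3, 1.5]
print(last_predict(history))   # 1.5
print(swavg_predict(history))  # 1.35
```

During training, each window of the trace is labeled with whichever of these models gave the smallest MSE; the classifier then forecasts that label for unseen windows.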

In the testing phase of the LARPredictor based on the Bayesian classifier, the test data are preprocessed in the same way as for the k-NN classifier. The PCA-processed test data (y'_{t-m}, y'_{t-m+1}, ..., y'_{t-1}) are plugged into the discriminant function (4-12) derived in Section 4.5.2. The parameters of the discriminant function for each class, the mean vector and covariance matrix, are obtained during the training phase. Then each test datum is classified into the class with the largest discriminant function value.

The testing phase differs from the training phase in that it does not require running

multiple predictors in parallel to identify the one which is best suited to the data and

gives the smallest MSE. Instead, it forecasts the best predictor by learning from historical

predictions. The reasoning here is that these nearest neighbors' workload characteristics

are closest to the testing data's and the predictor that works best for these neighbors

should also work best for the testing data.
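The k-NN selection step described above can be sketched in Python (an illustrative sketch with k = 3 as in the text; the data and labels are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_best_predictor(test_point, train_points, train_best, k=3):
    """Forecast the best predictor for a test window by majority vote among
    the k nearest training windows (Euclidean distance in the PCA-reduced
    feature space)."""
    dists = np.linalg.norm(train_points - test_point, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_best[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Toy example: 2-D PCA-reduced training windows, each labeled with the
# predictor that gave the lowest MSE on that window during training.
train_points = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.1, 4.9]])
train_best = ["AR", "AR", "LAST", "LAST"]
print(knn_best_predictor(np.array([0.05, 0.1]), train_points, train_best))  # AR
```

Only the predictor returned by the vote is then run on the test window, which is what lets the testing phase avoid executing the whole predictor pool in parallel.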









[27] M. L. Massie, B. N. Chun, and D. E. Culler, "The ganglia distributed monitoring
system: Design, implementation, and experience," Parallel Computing, vol. 30, no.
5-6, pp. 817-840, 2004.

[28] "Netapp," http://www.netapp.com/techlibrary/3022.html.

[29] R. Eigenmann and S. Hassanzadeh, "Benchmarking with real industrial applications:
the spec high-performance group," IEEE Computational Science and Engineering,
vol. 3, no. 1, pp. 18-23, 1996.

[30] "Ettcp," http://sourceforge.net/projects/ettcp/.

[31] "Simplescalar," http://www.cs.wisc.edu/~mscalar/simplescalar.html.

[32] "Ch3d," http://users.coastal.ufl.edu/~pete/CH3D/ch3d.html.

[33] "Bonnie," http://www.textuality.com/bonnie/.

[34] Q. Snell, A. Mikler, and J. Gustafson, "Netpipe: A network protocol independent
performance evaluator," June 1996.

[35] "Vmd," http://www.ks.uiuc.edu/Research/vmd/.

[36] "Spim," http://www.cs.wisc.edu/~larus/spim.html.

[37] "Reference of stream," http://www.cs.virginia.edu/stream/ref.html.

[38] "Autobench," http://www.xenoclast.org/autobench/.

[39] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J.
Mach. Learn. Res., vol. 3, pp. 1157-1182, Mar. 2003.

[40] Y. Liao and V. R. Vemuri, "Using text categorization techniques for intrusion
detection," in 11th USENIX Security Symposium, San Francisco, CA, Aug. 5-9,
2002, pp. 51-59.

[41] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program behavior profiles
for intrusion detection," in Proc. the Workshop on Intrusion Detection and Network
Monitoring, Santa Clara, CA, Apr. 9-12, 1999, pp. 51-62.

[42] M. Almgren and E. Jonsson, "Using active learning in intrusion detection," in Proc.
17th IEEE Computer Security Foundations Workshop, June 28-30, 2004, pp. 88-98.

[43] S. C. Lee and D. V. Heinbuch, "Training a neural-network based intrusion detector
to recognize novel attacks," IEEE Transactions on Systems, Man, and Cybernetics,
Part A, vol. 31, no. 4, pp. 294-299, 2001.

[44] G. Forman, "An extensive empirical study of feature selection metrics for text
classification," J. Mach. Learn. Res., vol. 3, pp. 1289-1305, 2003.









This expression can be evaluated if the densities p(x|ω_i) are multivariate normal. In

this case, we have

g_i(x) = -(1/2)(x - μ_i)^t Σ_i^{-1} (x - μ_i) - (d/2) ln 2π - (1/2) ln |Σ_i| + ln P(ω_i).   (4-12)

The resulting classification is performed by evaluating the discriminant functions. When workloads have similar statistical properties, the Bayesian classifier derived from one workload trace can be applied directly to another. For highly variable workloads, retraining of the classifier is necessary.
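A minimal sketch of the discriminant in Eq. (4-12), written in Python for illustration (the class names, means, covariances, and priors below are hypothetical, standing in for the per-class parameters estimated during training):

```python
import numpy as np

def discriminant(x, mean, cov, prior):
    """Quadratic discriminant g_i(x) for a multivariate normal class density,
    per Eq. (4-12): -1/2 (x-mu)^T Sigma^{-1} (x-mu) - d/2 ln(2 pi)
    - 1/2 ln|Sigma| + ln P(omega_i)."""
    d = len(x)
    diff = x - mean
    return (-0.5 * diff @ np.linalg.inv(cov) @ diff
            - 0.5 * d * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(cov))
            + np.log(prior))

def classify(x, params):
    """Assign x to the class whose discriminant value is largest."""
    return max(params, key=lambda c: discriminant(x, *params[c]))

# Two hypothetical classes with parameters obtained in the training phase.
params = {
    "AR":   (np.array([0.0, 0.0]), np.eye(2), 0.5),
    "LAST": (np.array([4.0, 4.0]), np.eye(2), 0.5),
}
print(classify(np.array([0.3, -0.2]), params))  # AR
```

Because the priors and Gaussian parameters are fixed after training, evaluating the discriminants is cheap at test time, which is why the Bayesian classifier's testing phase runs faster than k-NN's distance search.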

4.5.3 Principal Component Analysis

The Principal Component Analysis (PCA) [22][88], also called the Karhunen-Loève transform, is a linear transformation representing data in a least-squares sense. The principal

components of a set of data in RP provide a sequence of best linear approximations to

those data, of all ranks q < p.

Denote the observations by x_1, x_2, ..., x_N; the parametric representation of the rank-q linear model is as follows:

f(λ) = μ + V_q λ,   (4-13)

where μ is a location vector in R^p, V_q is a p x q matrix with q orthogonal unit vectors as columns, which are called eigenvectors, and λ is a vector of q parameters. These eigenvectors are the principal components. The corresponding eigenvalues represent their contributions to the variance of the data. Often there will be just a few (= k) large eigenvalues, which implies that k is the inherent dimensionality of the subspace governing the data. When the k principal components with the largest eigenvalues are chosen to represent the data, the dimensionality of the data is reduced from q to k.
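The reduction described above can be sketched via an eigendecomposition of the sample covariance matrix. This is an illustrative Python sketch, not the dissertation's Matlab routines; the data here are random stand-ins.

```python
import numpy as np

def pca_reduce(X, k):
    """Project N x p data onto the k principal components, i.e., the
    eigenvectors of the sample covariance matrix with the k largest
    eigenvalues."""
    Xc = X - X.mean(axis=0)             # center the data
    cov = np.cov(Xc, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)  # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k] # indices of the k largest
    V_k = evecs[:, order]               # p x k orthonormal basis
    return Xc @ V_k                     # N x k reduced data

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca_reduce(X, 2)
print(Z.shape)  # (100, 2)
```

The columns of V_k play the role of V_q in Eq. (4-13) truncated to the k dominant eigenvectors.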








Table 4-1. Normalized prediction MSE statistics for resources of VM1
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.6976 0.9508 1.1436 0.9456 1.0352
CPU ready 0.6775 0.9632 1.1699 0.9579 1.0333
Memory_size 0.2071 0.2389 0.2298 0.2379 0.4883
Memory swapped 0.2071 0.2386 0.2298 0.2379 0.4883
NIC1 received 0.3981 0.5436 1.836 0.5436 0.9831
NIC1 transmitted 0.3776 0.5845 1.8236 0.5845 0.9829
NIC2 received 0.9788 0.9912 1.4392 0.9966 1.0397
NIC2 transmitted 0.3983 0.5463 1.8406 0.5463 0.9843
VD1 read 0.9062 1.0215 1.2849 0.9754 1.0511
VD1 write 0.7969 0.9587 1.1905 0.9473 1.0566
VD2 read 1 1.2156 1.4191 1.1536 1.035
VD2 write 0.662 0.9931 1.1572 0.9929 1.0292


duration = 168 hours, interval = 30 minutes, prediction order = 16


LARPredictor to outperform any single predictor in the pool and approach the prediction
accuracy of the P-LAR by improving the best predictor forecasting / classification
accuracy. How to further improve the predictor classification accuracy is a topic of our
future research.
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based
LARPredictor
In this experiment, a set of VM traces with 138,240 performance data points was used
to feed the LARPredictor. Half of the data was used for training and the other half
for testing. A Bayesian-classifier based LARPredictor was implemented.
Fig. 4-9 shows the prediction performance comparison between it and the k-NN based
LARPredictor for all the resources of VM1. The profile report of the Matlab program
execution showed that the k-NN based LARPredictor cost 205.8 seconds of CPU time, with
193.5 seconds in the testing phase and 12.3 seconds in the training phase. It took 132.1































[Figure 2-5 appears here: four scatter plots over Principal Components 1 and 2, with points labeled Idle, IO, CPU, NET, and MEM.]

Figure 2-5. Sample clustering diagrams of application classifications. A) Training data: Mixture. B) SimpleScalar: CPU-intensive. C) Autobench: Network-intensive. D) VMD: Interactive. Principal Components 1 and 2 are the principal component metrics extracted by PCA.









performance than the locally weighted regression algorithms for the tools tested. Our

choice of k-NN classification is based on conclusions from [45]. This thesis differs from

Kapadia's work in the following ways. First, the application class knowledge is used to

facilitate the resource scheduling to improve the overall system throughput in contrast

with Kapadia's work, which focuses on application CPU time prediction. Second, the

application classifier takes performance metrics as inputs. In contrast, in [45] the CPU

time prediction is based on the input parameters of the application. Third, the application

classifier employs PCA to reduce the dimensionality of the performance feature space. It is

especially helpful when the number of input features of the classifier is not trivial.

Condor uses process checkpoint and migration techniques [20] to allow an allocation

to be created and preempted at any time. The transfer of checkpoints may occupy

significant network bandwidth. Basney's study in [46] shows that co-scheduling of CPU

and network resources can improve the Condor resource pool's goodput, which is defined

as the allocation time when a remotely executing application uses the CPU to make

forward progress. The application classifier presented in this thesis performs learning of

application's resource consumption of memory and I/O in addition to CPU and network

usage. It provides a way to extract the key performance features and generate an abstract

of the application resource consumption pattern in the form of application class. The

application class information and resource consumption statistics can be used together

with recent multi-lateral resource scheduling techniques, such as Condor's Gang-matching

[47], to facilitate the resource scheduling and improve system throughput.
Conservative Scheduling [4] uses predictions of the average and variance of the

CPU load at some future point in time and over a time interval to facilitate scheduling. The

application classifier shares the common technique of resource consumption pattern

analysis of a time window, which is defined as the time of one application run. However,

the application classifier is capable of taking into account usage patterns of multiple kinds of

resources, such as CPU, I/O, network and memory.









CHAPTER 3
AUTONOMIC FEATURE SELECTION FOR APPLICATION CLASSIFICATION

Application classification techniques based on monitoring and learning of resource

usage (e.g., CPU, memory, disk, and network) have been proposed in Chapter 2 to aid in

resource scheduling decisions. An important problem that arises in application classifiers

is how to decide which subset of numerous performance metrics collected from monitoring

tools should be used for the classification. This chapter presents an approach based on

a probabilistic model (Bayesian Network) to systematically select the representative

performance features, which can provide optimal classification accuracy and adapt to

changing workloads.

3.1 Introduction

Awareness of application resource consumption patterns (such as CPU-intensive, I/O-

and paging-intensive, and network-intensive) can facilitate the mapping of workloads to

appropriate resources. Techniques of application classification based on monitoring and

learning of resource usage can be used to gain application awareness [53]. Well-known

monitoring tools such as the open source packages Ganglia [54] and dproc [55], and

commercial products such as HP's Open View [56] provide the capability of monitoring

a rich set of system level performance metrics. An important problem that arises is how

to decide which subset of numerous performance metrics collected from monitoring tools

should be used for the classification in a dynamic environment. In this chapter we address

this problem. Our approach is based on autonomic feature selection and can help to

improve the system's self-manageability [1] by reducing the reliance on expert knowledge

and increasing the system's adaptability.

The need for autonomic feature selection and application classification is motivated by

systems such as VMPlant [16], which provides automated resource provisioning of Virtual

Machine (VM). In the context of VMPlant, the application can be scheduled to run on a

dedicated virtual machine, whose system level performance metrics reflect the application's








Table 4-2. Normalized prediction MSE statistics for resources of VM2
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.8142 1.1158 1.2476 1.0311 1.0912
CPU ready 0.7873 1.0128 1.2167 1.0166 1.0948
Memory_size 0.5328 0.6213 0.637 0.6262 0.79
Memory swapped 0.5328 0.6214 0.637 0.6262 0.7901
NIC1 received 0.4872 0.6189 0.6663 0.611 0.6831
NIC1 transmitted 0.7581 1.0138 1.0303 1.0209 1.0737
NIC2 received 0.6626 0.89 0.8765 0.8923 1.0242
NIC2 transmitted 0.7434 0.9924 1.0266 0.9949 1.0775
VD1 read 0.9582 1.0467 1.2249 1.0264 1.0912
VD1 write 0.7733 1.0744 1.1574 1.0129 1.0748
VD2 read 1.0208 1.4153 1.4155 1.0843 1.0972
VD2 write 0.7389 0.9941 1.0816 0.9372 1.0792


duration = 24 hours, interval = 5 minutes, prediction order = 5


seconds of CPU time for the Bayesian-based LARPredictor to finish execution, with a 120.8-second
testing phase and an 11.3-second training phase.
The experimental results show that the prediction accuracy, in terms of normalized
MSE, of the Bayesian-classifier based LARPredictor is about 3% worse than that of the k-NN
based one. However, it shortened the CPU time of the testing phase by 37.5%.
4.7.2.3 Performance comparison of the LARPredictors and the cumulative-
MSE based predictor used in the NWS
This section compares the prediction accuracy of the LARPredictors and the NWS
predictor. Figs. 4-9, 4-10, 4-11, 4-12, and 4-13 show the prediction accuracy of the perfect
LARPredictor that has 100% best-predictor forecasting accuracy (P-LARP), the k-NN
and Bayesian based LARPredictors (KnnLARP and BayesLARP), the cumulative MSE
of all history based predictor used in the NWS (Cum.MSE), and the cumulative-MSE









a time-series prediction model, autoregressive (AR), is used for its simplicity and proven

success in computer system resource prediction [78]. However, this prototype can generally

work with any other time-series prediction models. In case of highly dynamic workloads,

the Learning-Aided Resource Predictor (LARPredictor) developed in Chapter 4 can be

used. The LARPredictor uses a mix-of-experts approach, which adaptively chooses the

best prediction model from a pool of models based on learning of the correlations between

the workload and fitted prediction models of historical runs.

Similar to the training stage, the testing data Y_{1 x N} are extracted and framed with the prediction window size m. The framed testing data Y'_{(N-m+1) x m} are used as input to the fitted resource predictor to predict the future resource usage Y'_{1 x N}. The phase predictor classifies the predicted resource usages Y'_{1 x N} into the phases P'_{1 x N} based on the phase profile learned in the training stage. Similarly, phase predictions for the actual resource usage Y_{1 x N} are performed to generate P_{1 x N}. Then the corresponding predicted phases P'_{1 x N} (which are based on predicted resource usage) and P_{1 x N} (which are based on actual resource usage) are compared to evaluate the phase prediction accuracy, which is defined as the ratio of the number of matched phase predictions to the total number of phase predictions.
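The accuracy metric just defined reduces to a one-line computation; the sketch below is illustrative Python (the phase labels are hypothetical):

```python
def phase_prediction_accuracy(predicted_phases, actual_phases):
    """Accuracy = number of matched phase predictions / total predictions."""
    matched = sum(p == a for p, a in zip(predicted_phases, actual_phases))
    return matched / len(actual_phases)

# Four of the five predicted phase labels match the actual labels.
print(phase_prediction_accuracy([1, 1, 2, 3, 2], [1, 2, 2, 3, 2]))  # 0.8
```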

5.5 Empirical Evaluation

We have implemented a prototype for the phase analysis and prediction model

including Perl and Shell scripts to extract and profile the performance data from

the performance database, and a Matlab implementation of the phase analyzer and

predictor. This section shows the experimental results of the phase analysis and prediction

performance evaluations using traces collected from the batch executions of SPECseis96,

a scientific benchmark program, and replay of the WorldCup98 web access log. In all

the experiments, ten-fold cross validation was performed for each set of time series

performance data.









(NWS) [73] for 66.1% of the traces. It has the potential to consistently outperform any

single predictor for variable workloads and achieve about 18% lower MSE than the model used

in the NWS.

The rest of the chapter is organized as follows: Section 4.2 gives an overview of

related work. Section 4.4 describes the linear time series prediction models used to

construct the LARPredictor and Section 4.5 describes the learning techniques used for

predictor selection. Section 4.6 details the work flow of the learning-aided adaptive

resource predictor. Section 4.7 discusses the experimental results. Section 4.8 summarizes

the work and describes future direction.

4.2 Related Work

Time series analysis has been studied in many areas such as financial forecasting [74],

biomedical signal processing [75], and geoscience [76]. In this work, we focus on the time

series modeling for computer resource performance prediction.

In [77] and [78], Dinda et al. conducted extensive study of the statistical properties

and the predictions of host load. Their work indicates that CPU load is strongly

correlated over time, which implies that history-based load prediction schemes are feasible.

They evaluated the predictive power of a set of linear models including autoregression

(AR), moving average (MA), autoregressive integrated moving average (ARIMA),

autoregression fractionally integrated moving average (ARFIMA), and window-mean

models. Their results show that the AR model is the best in terms of high prediction

accuracy and low overhead among the models they studied. Based on their conclusion, the

AR model is included in our predictor pool to leverage its performance.

To improve the prediction accuracy, various adaptive techniques have been exploited

by the research community. In [4], Yang et al. developed a tendency-based prediction

model that predicts the next value according to the tendency of the time-series change.

An increment or decrement is added to or subtracted from the current measurement,

based on the current measurement and other dynamic information, to predict the








[110] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic
resource provisioning in enterprise computing systems," in Proc. 3rd International
Conference on Autonomic Computing, 2006, pp. 74-83.











[Figure 3-2 appears here: diagram of the feature selection model; CTC = Classification Training Center, DataQA = Data Quality Assuror.]

Figure 3-2. Feature selection model. The Performance profiler collects performance
metrics of the target application node. The Application classifier classifies
the application using extracted key components and performs statistical
analysis of the classification results. The DataQA selects the training data
for the classification. The Feature selector selects the performance metrics
which can provide optimal classification accuracy. The Trainer trains the
classifier using the selected metrics of the training data. The Application
DB stores the application class information. (t1/t2 are the beginning/ending
times of the application execution; VMIP is the IP address of the
application's host machine.)


training data from the application snapshots, only n out of m metrics are extracted based on the previous feature selection result to form a set of K_c n-dimensional training points

{x_{k,1}, x_{k,2}, ..., x_{k,n}}, k = 1, 2, ..., K_c   (3-4)

that comprise a cluster C_c. From [50], it follows that the n-tuple

c̄_c = (x̄_1, x̄_2, ..., x̄_n)   (3-5)
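The n-tuple of per-dimension means in Eq. (3-5) is simply the cluster center; a minimal illustrative Python sketch (the points are hypothetical):

```python
import numpy as np

def cluster_center(points):
    """Center of cluster C_c: the n-tuple of per-dimension sample means over
    its K_c n-dimensional training points, as in Eq. (3-5)."""
    return np.asarray(points, dtype=float).mean(axis=0)

points = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
print(cluster_center(points).tolist())  # [2.0, 2.0, 2.0]
```

Distances between such centers are what Figure 3-7 compares for the expert-selected and automatically selected feature sets.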






























[Figure 3-8 appears here: two scatter plots over Principal Components 1 and 2, with points labeled Idle, IO, CPU, NET, and MEM.]

Figure 3-8. Training data clustering diagrams derived from expert-selected and
automatically selected feature sets. A) Automatic. B) Expert.









designed based on a Bayesian Network (BN) to systematically identify the feature subset,

which can provide optimal classification accuracy and adapt to changing workloads.

Second, an adaptive system performance prediction model is investigated based

on a learning-aided predictor integration technique. Supervised learning techniques are

used to learn the correlations between the statistical properties of the workload and the

best-suited predictors.

In addition to a one-step ahead prediction model, a phase characterization model is

studied to explore the large-scale behavior of application's resource consumption patterns.

Our study provides novel methodologies to model system and application performance. The performance models can self-optimize over time based on learning from historical runs, and therefore adapt better to changing workloads and achieve better prediction accuracy than traditional methods with static parameters.








Table 4-3. Normalized prediction MSE statistics for resources of VM3
Predictors
Perf.Metrics P-LAR LAR LAST AR SW
CPU usedsec 0.9883 1.0395 1.4341 1.0376 1.0989
CPUready 0.6826 0.9502 1.6594 0.9502 1.0921
Memory_size 0.5009 0.6169 0.6818 0.6216 0.7481
Memory_swapped 0 0 0 NaN 0

NIC1 transmitted 0.9931 1.0514 1.3068 1.0665 1.0943
VD1 read 0 0 0 NaN 0
VD1 write 0 0 0 NaN 0
VD2 read 0.9728 1.0276 1.3969 1.0281 1.1016
VD2 write 0.8696 0.9938 1.245 0.9946 1.0815
duration = 24 hours, interval = 5 minutes, prediction order = 5

based predictor of a fixed window size (n=2 in this experiment) used in the NWS
(W-Cum.MSE).
The experimental results show that, without running all the predictors in parallel all
the time, the LARPredictor outperformed the cumulative-MSE based predictor used in
the NWS for 66.1% of the traces. The perfect LARPredictor shows the potential to
achieve about 18% lower MSE on average than the cumulative-MSE based predictor.
4.7.3 Discussion
PCA is an optimal way to project data in the mean-square sense. The computational
complexity of estimating the PCA is O(d^2 W) + O(d^3) for an original set of W
d-dimensional data points [89]. In the context of resource performance time-series prediction,
W = 1 and d is the prediction window size. The typically small input data size in this
context makes the use of the PCA feasible. There also exist computationally less expensive
methods [90] for finding only a few eigenvectors and eigenvalues of a large matrix; in our
experiments, we use appropriate Matlab routines to realize these.

















[Figure 3-6 appears here: scatter plot of the five-class test data over the first two selected features; one axis is CPU system (%).]

Figure 3-6. Five-class test data distribution with first two selected features


[Figure 3-7 appears here: bar chart comparing inter-cluster-center distances for automatically selected versus expert-selected feature sets over ten cluster pairs.]

Figure 3-7. Comparison of distances between cluster centers derived from expert-selected
and automatically selected feature sets.
1:idle-cpu 2:idle-I/O 3:idle-net 4:idle-mem 5:cpu-I/O
6:cpu-net 7:cpu-mem 8:I/O-net 9:I/O-mem 10:net-mem





























2007 Jian Zhang










[Figure 2-7 appears here: bar chart titled "System Throughput of Different Schedules," plotting system throughput (jobs/day) for schedule IDs 1 through 10.]

Figure 2-7. System throughput comparisons for ten different schedules.
1:{(SSS),(PPP),(NNN)}, 2:{(SSS),(PPN),(PNN)}, 3:{(SSP),(SPP),(NNN)},
4:{(SSP),(SPN),(PNN)}, 5:{(SSP),(SNN),(PPN)}, 6:{(SSN),(SPP),(PNN)},
7:{(SSN),(SPN),(PPN)}, 8:{(SSN),(SNN),(PPP)}, 9:{(SPP),(SPN),(SNN)},
10:{(SPN),(SPN),(SPN)}.
S = SPECseis96 (CPU-intensive), P = PostMark (I/O-intensive),
N = NetPIPE (Network-intensive).

selected at random. The other scenario used application class knowledge, always allocating
applications of different classes (CPU, I/O, and network) to run on the same machine
(Schedule 10, Figure 2-7). The system throughputs obtained from runs of all possible
schedules in the experimental environment are shown in Figure 2-7.
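The class-aware allocation policy just described can be sketched as follows. This is an illustrative Python sketch, not the dissertation's scheduler; the job names and class labels are hypothetical.

```python
def class_aware_schedule(jobs):
    """Group jobs so that each machine runs one job of each distinct class
    (e.g., schedule 10: {(S,P,N)} on every machine), rather than co-locating
    jobs that compete for the same resource."""
    by_class = {}
    for job, cls in jobs:
        by_class.setdefault(cls, []).append(job)
    # Take one job from each class per machine.
    return list(zip(*by_class.values()))

jobs = [("s1", "CPU"), ("s2", "CPU"), ("p1", "IO"), ("p2", "IO"),
        ("n1", "NET"), ("n2", "NET")]
print(class_aware_schedule(jobs))  # [('s1', 'p1', 'n1'), ('s2', 'p2', 'n2')]
```

Mixing classes this way is what allowed Schedule 10 to avoid contention on any single resource and achieve the highest aggregate throughput in the experiment.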
The average system throughput of the schedule chosen with class knowledge was
1391 jobs per day. It achieved the highest throughput among the ten possible schedules,
22.1% larger than the weighted average of the system throughputs of all ten possible
schedules. In addition, random selection of the possible schedules resulted in large
variances of system throughput. The application class information can be used to help
the scheduler pick the optimal schedule consistently. The application throughput
comparison of different schedules on one machine is shown in Figure 2-8. It compares the









from the server host machine's /proc nodes. The vmkusage tool samples every minute,

and updates its data every five minutes with an average of the one-minute statistics over

the given five-minute interval. The collected data is stored in a Round Robin Database

(RRD). Table 2-1 shows the list of performance features under study in this work.

The profiler retrieves the VM performance data, which are identified by vmID,
deviceID, and a time window, from the round-robin performance database. The data of
each VM device's performance metric form a time series (x_{t-m+1}, ..., x_t) with an identical
interval, where m is the data retrieval window size. The retrieved performance data with
the corresponding time stamps are stored in the prediction database. [vmID, deviceID,
timeStamp, metricName] forms the combinational primary key of the database. Figure 4-2

shows the XML schema of the database and sample database records of virtual machines

such as VM1, which has one CPU, two Network Interface Cards (NIC), and two virtual

hard disks.

The LARPredictor takes the time-series performance data (y_{t-m}, ..., y_{t-1}) as inputs,
selects the best prediction model based on learning from historical prediction results,
and predicts the future resource performance y_t. A detailed description of the
LARPredictor's work flow is given in Section 4.6. The predicted results are stored in

the prediction DB and can be used to support the resource manager's dynamic VM

provisioning decision-making.

The Prediction Quality Assuror (QA) is responsible for monitoring the LARPredic-

tor's performance, in terms of MSE. It periodically audits the prediction performance by

calculating the average MSE of historical prediction data stored in the prediction DB.

When the average MSE of the data in the audit window exceeds a predefined threshold,

it directs the LARPredictor to re-train the predictors and the classifier using recent

performance data stored in the database.
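The QA's audit rule reduces to a simple threshold check over the audit window; the sketch below is illustrative Python, with the window contents and threshold value chosen only for the example:

```python
def should_retrain(recent_mse, threshold):
    """Prediction Quality Assuror audit: trigger retraining of the predictors
    and classifier when the average MSE over the audit window exceeds a
    predefined threshold."""
    avg = sum(recent_mse) / len(recent_mse)
    return avg > threshold

audit_window = [0.8, 1.1, 1.4, 1.3]  # hypothetical normalized MSEs
print(should_retrain(audit_window, threshold=1.0))  # True
```

In the prototype, a positive result would direct the LARPredictor to retrain on the most recent performance data in the prediction DB.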









LIST OF FIGURES

Figure                                                                      page

1-1 Structure of an autonomic element ..................................... 16
1-2 Classification system representation .................................. 19
1-3 Virtual machine structure ............................................. 21
1-4 VMPlant architecture .................................................. 23
2-1 Sample of principal component analysis ................................ 28
2-2 k-nearest neighbor classification example ............................. 31
2-3 Application classification model ...................................... 32
2-4 Performance feature space dimension reductions in the application
    classification process ................................................ 34
2-5 Sample clustering diagrams of application classifications ............. 39
2-6 Application class composition diagram ................................. 42
2-7 System throughput comparisons for ten different schedules ............. 43
2-8 Application throughput comparisons of different schedules ............. 44
3-1 Sample Bayesian network generated by feature selector ................. 54
3-2 Feature selection model ............................................... 57
3-3 Bayesian-network based feature selection algorithm for application
    classification ........................................................ 60
3-4 Average classification accuracy of 10 sets of test data versus number
    of features selected in the first experiment .......................... 63
3-5 Two-class test data distribution with the first two selected features . 63
3-6 Five-class test data distribution with first two selected features .... 66
3-7 Comparison of distances between cluster centers derived from
    expert-selected and automatically selected feature sets ............... 66
3-8 Training data clustering diagram derived from expert-selected and
    automatically selected feature sets ................................... 67
3-9 Classification results of benchmark programs .......................... 69
4-1 Virtual machine resource usage prediction prototype ................... 78
4-2 Sample XML schema of the VM performance DB ............................ 80









where C denotes the transition factor, which is the ratio of C2 to C1, and K is the

maximum number of phases.

Encoding misprediction p y.J '.all; cost: The algorithm can be extended to phase

prediction as well as phase analysis of resource usage. The determination of the best

number of phases remains the same, whereas the cost function has to be changed to

take over- or under-provisioning caused by prediction error into account. Generally the

mispredictions consist of two possible cases: over-provisioning and under-provisioning.

Over-provisioning refers to the cases that the resource reservation based on prediction is

larger than the actual usage. It guarantees that the application response time is equal

to or less than the time defined in the SLA. In this case, the penalty is the cost of the

over-reserved resource, which has been encoded in the cost model already. In case of

under-provisioning, the application's execution time will be prolonged because of the

resource constraint. The performance degradation is approximated by the penalty term in the

total cost function. The penalty is defined as the difference between the under-reserved

resource and the actual resource usage, and can be written as


U_penalty = 0,            if u <= U_max
U_penalty = u - U_max,    if u > U_max                    (5-7)


P(k) = sum_{i=1}^{k} U_penalty,i                          (5-8)

where k is the number of phases. Taking both the phase transition and misprediction costs

into account, the general total cost function is modified as


TC'(k) = C1R(k) + C2TR(k) + C3P(k) (5-9)
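The selection of the number of phases implied by Eq. 5-9 can be sketched as follows. This is an illustrative example, not the thesis implementation: the cost weights c1, c2, c3 and the R(k), TR(k), P(k) values are made-up numbers chosen only to show how the minimum lands at an intermediate k.

```python
# Illustrative sketch: choose the number of phases k that minimizes the
# total cost TC'(k) = C1*R(k) + C2*TR(k) + C3*P(k).  Weights and the
# R/TR/P tables below are hypothetical example values.

def total_cost(k, R, TR, P, c1=1.0, c2=2.0, c3=3.0):
    """Evaluate TC'(k) for one candidate number of phases k."""
    return c1 * R[k] + c2 * TR[k] + c3 * P[k]

def best_num_phases(R, TR, P):
    """Return the k with the minimal total cost, plus all costs."""
    costs = {k: total_cost(k, R, TR, P) for k in R}
    return min(costs, key=costs.get), costs

# Reservation cost falls with more phases, while transition and
# misprediction penalties grow; the minimum lands at an intermediate k.
R = {1: 100.0, 2: 60.0, 3: 55.0}    # reservation cost R(k)
TR = {1: 0.0, 2: 5.0, 3: 9.0}       # transition cost TR(k)
P = {1: 0.0, 2: 2.0, 3: 6.0}        # misprediction penalty P(k)
k_best, costs = best_num_phases(R, TR, P)   # k_best is 2 here
```

The same scan over candidate k values is what the phase profile in the experiments reports, with the costs computed from actual traces rather than the toy numbers above.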









In this work, the PCA is used to reduce the prediction input data dimensions. It

helps to reduce the computing intensity of the subsequent classification process.

4.6 Learning-Aided Adaptive Resource Predictor

This section describes the workflow of the Learning-Aided Adaptive Resource Predictor

(LARPredictor) illustrated in Figure 4-3. The prediction consists of two phases: a

training phase and a testing phase. During the training phase, the best predictors for each

set of training data are identified using the traditional mix-of-expert approach. During

the testing phase, the classifier forecasts the best predictor for the test data based on the

knowledge gained from the training data and historical prediction performance. Then only

the selected best predictor is run to predict the resource performance. Both phases include

the data pre-processing and the Principal Component Analysis (PCA) process.

The features under study in this work, as shown in Table 2-1, include CPU, memory,

network bandwidth, and disk I/O usages. Figure 4-4 illustrates how the features are

processed to form the prediction database. Since the features have different units of

measure, a data pre-processor was used to normalize the input data with zero mean and

unit variance. The normalized data are framed according to the prediction window size to

feed the PCA processor.
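The pre-processing step described above can be sketched as follows; this is a minimal illustration, with made-up data and an assumed window size of 2, of the zero-mean/unit-variance normalization and the framing of the series into prediction windows.

```python
import numpy as np

# Sketch of the data pre-processor: normalize each feature column to zero
# mean and unit variance, then frame the series into fixed-size windows.

def normalize(X):
    """Column-wise zero-mean, unit-variance scaling."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def frame(series, window):
    """Slice a 1-D series into overlapping windows of length `window`."""
    return np.array([series[i:i + window]
                     for i in range(len(series) - window + 1)])

# Two features with different units of measure (illustrative values).
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]])
Xn = normalize(X)                     # each column: mean 0, std 1
windows = frame(Xn[:, 0], window=2)   # frames fed to the PCA processor
```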

4.6.1 Training Phase

The training phase of both the k-NN and the Bayesian classifiers mainly consists

of two processes: Prediction model fitting and best predictor identification. The set of

training data with the corresponding best predictors are used for the k-NN classification in

the testing phase. The unknown parameters of the Bayesian classifier are estimated

from the training data.

The LAST and SW_AVG models do not involve any unknown parameters. They can

be used for predictions directly. The parametric prediction models such as the AR model,

which contain unknown parameters, require model fitting. The model fitting is a process









Table 5-3. Average phase prediction accuracy
Performance Number of phases (k)
Application Features 1 2 3 4 5 6 7 8 9 10
Bytesin 1.00 0.99 0.99 0.98 0.98 0.97 0.97 0.96 0.96 0.96
WorldCup98 Bytes_out 1.00 0.94 0.94 0.92 0.91 0.89 0.87 0.88 0.86 0.84
CPU user 1.00 0.95 0.90 0.87 0.85 0.81 0.78 0.77 0.74 0.69
SPECseis96 CPU_system 1.00 0.94 0.87 0.83 0.83 0.79 0.76 0.74 0.73 0.69


Table 5-4. Performance feature list of VM traces
Perf. Features Description
CPU_Ready The percentage of time that the virtual machine
was ready but could not get scheduled to run on
a physical CPU.
CPU_Used The percentage of physical CPU resources used
by a virtual CPU.
Mem_Size Current amount of memory in bytes the virtual
machine has.
Mem_Swap Amount of swap space in bytes used by the
virtual machine.
Net_RX/TX The number of packets and the MBytes per
second that are transmitted and received by a NIC.
Disk_RD/WR The number of I/Os and KBytes per second
that are read from and written to the disk.


replay, and an average of 85% accuracy can be achieved for the CPU performance traces

of SPECseis96 for the four-phase cases.

In addition to the above two applications, we also evaluated the prediction per-

formance of the phase predictor using traces of a set of five virtual machines. These

virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0GHz

CPU, 4GB memory, and 36GB SCSI disk. VMware ESX server 2.5.2 was running on

the physical host. The vmkusage tool was run on the ESX server to collect the resource

performance data of the guest virtual machines every minute and store them in a round

robin database. The performance features under study in this experiment are shown in

Table 5-4.









To perform the web log replay, a Matlab program was developed to profile the

binary access log file and extract the entries of the target web server. The recreate

tool provided by [98] was used to convert the binary log into the Common Log Format.

A modified version of the Real-Time Web Log Replayer [99] was used to analyze and

generate the files needed by the log replayer and perform the replay.

Figures 5-5 and 5-6 show the phase characterization results of the performance

features bytes_in and bytes_out of the web server. The interesting observation from panels

A and B is that the number of phase transitions and misprediction penalties do not

always monotonically increase with the increasing number of phases. As a result, the

phase profile shown in panel C indicates that three-phase-based resource provisioning gives

the lowest total cost with the given C = [150k, 750k] and Cp = 8. The results imply that

the phase profile is highly workload dependent. The prototype presented in this thesis can

help to construct and analyze the phase profile of the application resource consumption

and decide the proper resource provisioning strategy.

5.5.2 Phase Prediction Accuracy

As one of the cost determinants, the misprediction penalty is a function of the phase

prediction accuracy. This section evaluates the performance of phase prediction model

introduced in Section 5.4. A performance measurement, prediction accuracy, is defined as

the ratio of the number of performance snapshots, whose predicted phases match with the

observed phases, to the total number of performance snapshots collected during the testing

period.
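The accuracy metric defined above can be sketched directly; the phase sequences below are illustrative, not taken from the experiments.

```python
# Sketch of the prediction-accuracy metric: the fraction of performance
# snapshots whose predicted phase matches the observed phase.

def phase_prediction_accuracy(predicted, observed):
    matches = sum(1 for p, o in zip(predicted, observed) if p == o)
    return matches / len(observed)

predicted = [1, 1, 2, 2, 3, 3, 1, 1]   # hypothetical predicted phases
observed  = [1, 1, 2, 3, 3, 3, 1, 2]   # hypothetical observed phases
acc = phase_prediction_accuracy(predicted, observed)   # 6 of 8 match
```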

Table 5-3 shows the phase prediction accuracies for the performance traces of the

main resources consumed by the SPECseis96 and the WorldCup98 workloads. Generally,

the phase prediction accuracy of each performance feature decreases with increasing

number of phases. This explains why the penalty curve rises monotonically with the

increasing number of phases in panel D. With the current implementation, an average of

95% accuracy can be achieved for the network performance traces of the WorldCup98 log









Instead of using all the eigenvectors of the covariance matrix, we may represent the

data in terms of only a few basis vectors of the orthogonal basis. If we denote the matrix

having the K first eigenvectors as rows by AK, we can create a similar transformation as

seen above


y = A_K (x − x̄) (2-8)


and


x ≈ A_K^T y + x̄ (2-9)


This means that we project the original data vector onto the coordinate axes having

dimension K and transform the vector back by a linear combination of the basis

vectors. This method minimizes the mean-square error between the data and its

representation for a given number of eigenvectors.

If the data are concentrated in a linear subspace, this method provides a way to

compress the data without losing much information while simplifying the representation. By

picking the eigenvectors having the largest eigenvalues we lose as little information as

possible in the mean-square sense.
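A minimal numerical sketch of Eqs. 2-8 and 2-9 follows, assuming synthetic data in which one feature is a linear function of another so the data concentrate in a subspace; the data and the choice of K are illustrative.

```python
import numpy as np

# PCA sketch: project centered data onto the K eigenvectors of the
# covariance matrix with the largest eigenvalues, then reconstruct.

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:, 1] = 2.0 * X[:, 0]          # data now lie in a 4-D linear subspace

mean = X.mean(axis=0)
cov = np.cov(X - mean, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]        # largest eigenvalues first
K = 4
A_K = eigvecs[:, order[:K]].T            # rows = top-K eigenvectors

Y = (X - mean) @ A_K.T                   # Eq. 2-8: y = A_K (x - mean)
X_hat = Y @ A_K + mean                   # Eq. 2-9: x ~= A_K^T y + mean
mse = np.mean((X - X_hat) ** 2)          # near zero: subspace captured
```

Because the synthetic data have rank 4, the top 4 eigenvectors capture essentially all the variance and the mean-square reconstruction error is negligible.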

2.2.2 k-Nearest Neighbor Algorithm

The k-Nearest Neighbor (k-NN) classifier is a supervised learning algorithm in which

a new instance query is classified based on the majority category of its k nearest neighbors

[26]. It has been used in many applications in the fields of data mining, statistical pattern

recognition, and image processing, among others. The purpose of this algorithm is to

classify a new object based on attributes and training samples. The classifier does not

fit any model; it is purely memory-based. Given a query point, we find the k objects

(training points) closest to the query point. The k-NN classifier decides

the class by considering the votes of the k (an odd number) nearest neighbors. The nearest









[83] S. Gunter and H. Bunke, "An evaluation of ensemble methods in handwritten word
recognition based on feature selection," in Proc. 17th International Conference on
Pattern Recognition, Aug. 2004, vol. 1, pp. 388-392.

[84] G. Jain, A. Ginwala, and Y. Aslandogan, "An approach to text classification using
dimensionality reduction and combination of classifiers," in Proc. IEEE International
Conference on Information Reuse and Integration, Nov. 2004, pp. 564-569.

[85] VMware white paper, "Comparing the MUI, VirtualCenter, and vmkusage,"

[86] J. D. Cryer, Time Series Analysis, Duxbury Press, Boston, MA, 1986.

[87] J. O. Rawlings, S. G. Pantula, and D. A. Dickey, Applied Regression Analysis, Springer,
2001.

[88] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer,
2001.

[89] E. Bingham and H. Mannila, "Random projection in dimensionality reduction:
applications to image and text data," in Knowledge Discovery and Data Mining,
2001, pp. 245-250.

[90] L. Sirovich and R. Everson, "Management and analysis of large scientific datasets,"
Int. Journal of Supercomputer Applications, vol. 6, no. 1, pp. 50-68, 1992.

[91] Y. Yang, J. Zhang, and B. Kisiel, "A scalability analysis of classifiers in text
categorization," in ACM SIGIR'03, 2003, pp. 96-103.

[92] J. H. Friedman, F. Baskett, and L. J. Shustek, "An algorithm for finding nearest
neighbors," IEEE Transactions on Computers, vol. C-24, no. 10, pp. 1000-1006, Oct.
1975.

[93] J. H. Friedman, J. L. Bentley, and R. A. Finkel, "An algorithm for finding best matches in
logarithmic expected time," ACM Transactions on Mathematical Software, vol. 3,
pp. 209-226, 1977.

[94] G. Banga, P. Druschel, and J. C. Mogul, "Resource containers: A new facility for resource
management in server systems," in Proc. 3rd symposium on Operating System
Design and Implementation, New Orleans, Feb. 1999.

[95] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase,
"Towards a doctrine of containment: Grid hosting with adaptive resource control,"
in Proc. Supercomputing, Tampa, FL, Nov. 2006.

[96] R. Dubes, "How many clusters are best? -an experiment," Pattern Recogn., vol. 20,
no. 6, pp. 645-663, Nov. 1987.

[97] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM
Computing Surveys, vol. 31, no. 3, pp. 264-323, 1999.









The classification accuracy is measured as the proportion of the total number of

predictions that are correct. A prediction is considered correct if the data is classified to

the same class as its actual class. Table 3-1 shows a sample confusion matrix with L=2.

There are only two possible classes in this example: Positive and negative. Therefore, its

classification accuracy can be calculated as (a+d)/(a+b+c+d).
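The accuracy computation from the confusion matrix can be sketched as follows; the counts are illustrative, and the function generalizes the two-class formula (a+d)/(a+b+c+d) to L classes by summing the diagonal.

```python
# Sketch: accuracy from a confusion matrix, where cm[i][j] is the number
# of class-i samples predicted as class j.  Correct predictions lie on
# the diagonal, so accuracy = trace / total.

def accuracy(cm):
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

cm = [[45, 5],    # a = 45 negatives correct, b = 5 mispredicted
      [3, 47]]    # c = 3 positives mispredicted, d = 47 correct
# accuracy(cm) = (45 + 47) / 100 = 0.92
```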

3.3 Autonomic Feature Selection Framework

Figure 3-2 shows the autonomic feature selection framework in the context of

application classification. In this section, we are going to focus on introducing the

classification training center, which enables the self-configurability for online application

classification. The training center has two main functions: quality assurance of training

data, which enables the classifier to adapt to changing workloads, and systematic feature

selection, which supports automatic feature selection. The training center consists of three

components: the data quality assuror, the feature selector, and the trainer.

3.3.1 Data Quality Assuror

The data quality assuror (DataQA) is responsible for selecting the training data for

application classification. The inputs of the DataQA are the performance snapshots taken

during the application execution. The outputs are the qualified training data with its

class, such as CPU-intensive.

The training data pool consists of representative data of five application classes

including CPU-intensive, I/O-intensive, memory-intensive, network-intensive, and idle.

Training data of each class c is a set of K_c m-dimensional points, where m is the number

of application-specific performance metrics reported by the monitoring tools. To select the

Table 3-1. Sample confusion matrix with two classes (L = 2)
Actual Predicted
Class Negative Positive
Negative a b
Positive c d









REFERENCES


[1] J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36,
no. 1, pp. 41-50, 2003.

[2] Y. Yang and H. Casanova, "Rumr: Robust scheduling for divisible workloads.," in
Proc. 12th High-Performance Distributed Computing, Seattle, WA, June 22-24, 2003,
pp. 114-125.

[3] J. M. Schopf and F. Berman, "Stochastic scheduling," in Proc. ACM/IEEE
Conference on Supercomputing, Portland, OR, Nov. 14-19, 1999, p. 48.

[4] L. Yang, J. M. Schopf, and I. Foster, "Conservative scheduling: Using predicted
variance to improve scheduling decisions in dynamic environments," in Proc.
ACM/IEEE Conference on Supercomputing, Nov. 15-21, 2003, p. 31.

[5] G. Tesauro, N. Jong, R. Das, and M. Bennani, "A hybrid reinforcement learning
approach to autonomic resource allocation," in Proc. IEEE International Conference
on Autonomic Computing (ICAC'06), 2006, pp. 65-73.

[6] G. Tesauro, R. Das, W. Walsh, and J. Kephart, "Utility-function-driven resource
allocation in autonomic systems," in Proc. Second International Conference on
Autonomic Computing (ICAC'05), 2005, pp. 342-343.

[7] R. Duda, P. Hart, and D. Stork, The Art of Computer Systems Performance
Analysis: Techniques for Experimental Design, Measurement, Simulation, and
Modeling, Wiley-Interscience, New York, NY, Apr. 1991.

[8] J. O. Kephart, "Research challenges of autonomic computing," in Proc. 27th
International Conference on Software Engineering ICSE, May 2005, pp. 15-22.

[9] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification
and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert
Systems, Morgan Kaufmann, San Mateo, CA, 1990.

[10] R. P. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine,
vol. 7, no. 6, pp. 34-45, June 1974.

[11] R. Figueiredo, P. Dinda, and J. Fortes, "A case for grid computing on virtual
machines," in Proc. ,./ International Conference on Distributed CornI'nIl.I',;
S. ,1, 4 M iv 19-22, 2003, pp. 550-559.

[12] S. Pinter, Y. Aridor, S. Shultz, and S. Guenender, "Improving machine virtualization
with 'hotplug memory'," Proc. 17th International Symposium on Computer
Architecture and High Performance Computing, pp. 168-175, 2005.

[13] C. Clark, K. Fraser, S. Hand, J. Hanseny, E. July, C. Limpach, I. Pratt, and
A. Warfield, "Live migration of virtual machines," in Proc. ',./I Symposium on
Networked S,/-1' m- Design & Implementation (NSDI'05), Boston, MA, 2005.








Figure 4-8. Best predictor selection for trace VM2_Disk. Predictor classes: 1 = LAST,
2 = AR, 3 = SW_AVG.

1. It is hard to find a single prediction model among LAST, AR, and SW_AVG
that performs best for all types of resource performance data for a given VM trace. For
example, for VM1's trace data shown in Table 4-1, each of the three models (LAST,
AR, and SW_AVG) outperformed the other two for a subset of the performance metrics. In this
experiment, only the AR model worked best for the trace data of VM3.
2. It is hard to find a single prediction model among the three that performs best
consistently for a given type of resources across all the VM traces. In the experiment, only
the AR model worked best for the CPU performance predictions.
3. The LARPredictor achieved better-than-expert performance using the
mix-of-expert approach for 44.2% of the workload traces. It shows the potential for the









the distances between 9 out of 10 pairs of cluster centroids are bigger in the automatic

selection case than in the expert's manual selection case. This means that comparably distinct

class clusters can be formed with the 2 principal components derived from the

automatically selected features, compared with the expert-selected features.

Second, the PCA- and k-NN-based classifications were conducted with both the 8

features selected by an expert in previous work [53] and the features automatically selected in Section

3.4.1. Table 3-3 shows the confusion matrices of the classification results. If data are

classified to the same classes as their actual classes, the classifications are considered

correct. The classification accuracy is the proportion of the total number of classifications

that were correct. The confusion matrices show that a classification accuracy of 98.05%

can be achieved with the automatically selected feature set, which is similar to the 98.1%

accuracy achieved with the expert-selected feature set. Thus the automatic feature selection

that is based on the Bayesian network can reduce the reliance on expert knowledge while

offering competitive classification accuracy compared to manual selection by a human

expert.

In addition, a set of 8 features selected in the 5-class feature selection experiment

in Section 3.4.1 was used to configure the application classifier and the same training

data used in the feature selection experiment were used to train the application classifier.

Then the trained classifier conducted classification for a set of three benchmark programs:

SPECseis96 [29], PostMark and PostMark_NFS [28]. SPECseis96 is a scientific application

which is computing-intensive but also exercises disk I/O in the initial and end phases of its

execution. PostMark originally is a disk I/O benchmark program. In PostMark_NFS,

a network file system (NFS) mounted directory was used to store the files which

were read/written by the benchmark. Therefore, PostMark_NFS performs substantial

network-I/O rather than disk I/O. The classification results are shown in Figure 3-9. The

results show that most of the SPECseis96 test data were classified as CPU-intensive, 95%

of the PostMark data were classified as I/O-intensive, and 61% of the PostMark_NFS








Figure 4-7. Best predictor selection for trace VM2_Swap. Predictor classes: 1 = LAST,
2 = AR, 3 = SW_AVG.

(P-LAR). The MSE of the P-LAR model shows the upper bound of the prediction
accuracy that can be achieved by the LARPredictor. The MSE of the best predictor
among LAR, LAST, AR, and SW_AVG is highlighted with italic bold numbers.
Table 4-6 shows the best predictor among LAST, AR, and SW_AVG for all the
resource performance metrics and VM traces. The symbol "*" indicates the cases in which
the LARPredictor achieved equal or higher prediction accuracy than the best of the three
predictors. Overall, the AR model performed better than the LAST and the SW_AVG
models.
The above experimental results show:









Symposium on Operating Systems Principles, Bolton Landing, NY, Oct. 19-22, 2003,
pp. 74-89.

[71] R. Isaacs and P. Barham, "Performance analysis in loosely-coupled distributed
systems," in Proc. 7th CaberNet Radicals Workshop, Bertinoro, Italy, Oct. 2002.

[72] I. Foster, "The .in ,l .-_r: of the grid: enabling scalable virtual organizations," in
Proc. 1st IEEE/AC(M International Symposium on C'i,-l r CorT,,l'I,-Il and the Grid,
2001, pp. 6-7.

[73] R. Wolski, "Dynamically forecasting network performance using the network weather
service," in Journal of cluster -,,, 1i.:,1u, 1998.

[74] I. Matsuba, H. Suyari, S. Weon, and D. Sato, "Practical chaos time series analysis
with financial applications," in Proc. 5th International Conference on Signal
Processing, Beijing, 2000, vol. 1, pp. 265-271.

[75] P. Magni and R. Bellazzi, "A stochastic model to assess the variability of blood
glucose time series in diabetic patients self-monitoring," IEEE Trans. Biomed. Eng.,
vol. 53, no. 6, pp. 977-985, 2006.

[76] K. Didan and A. Huete, "Analysis of the global vegetation dynamic metrics using
modis vegetation index and land cover products," in IEEE International Geoscience
and Remote Sensing Symposium (IGARSS'04), 2004, vol. 3, pp. 2058-2061.

[77] P. Dinda, "The statistical properties of host load," Scientific P,.. 'Iini,,,. no.
7:3-4, 1999.

[78] P. Dinda, "Host load prediction using linear models," C'l,-i, r Cornl';,I.:., vol. 3, no.
4, 2000.

[79] Y. Zhang, W. Sun, and Y. Inoguchi, "CPU load predictions on the computational
grid *," in Proc. 6th IEEE International Symposium on C'ii. r Computing and the
Grid, liv 2006, vol. 1, pp. 321-326.

[80] J. Liang, K. Nahrstedt, and Y. Zhou, "Adaptive multi-resource prediction in
distributed resource sharing environment," in Proc. IEEE International Symposium
on Cluster Computing and the Grid, 2004, pp. 293-300.

[81] S. Vazhkudai and J. Schopf, "Predicting sporadic grid data transfers," Proc.
International Symposium on High Performance Distributed Computing, pp. 188-196,
2002.

[82] S. Vazhkudai, J. Schopf, and I. Foster, "Using disk throughput data in predictions
of end-to-end grid data transfers," in Proc. 3rd International Workshop on Grid
Computing, Nov. 2002.









CHAPTER 4
ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE
PREDICTIONS

The integration of multiple predictors promises higher prediction accuracy than the

accuracy that can be obtained with a single predictor. The challenge is how to select the

best predictor at any given moment. Traditionally, multiple predictors are run in parallel

and the one that generates the best result is selected for prediction. In this chapter, we

propose a novel approach for predictor integration based on the learning of historical

predictions. Compared with the traditional approach, it does not require running all the

predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest

Neighbor (k-NN) and Bayesian classification, and dimensionality reduction techniques such

as Principal Component Analysis (PCA) to forecast the best predictor for the workload

under study based on the learning of historical predictions. Then only the forecasted best

predictor is run for prediction.
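The workflow just described can be sketched in simplified form. This is an illustrative toy, not the thesis implementation: only two predictors (LAST and a sliding-window average) are included, the workload feature is a single variance statistic rather than PCA-reduced features, the classifier is 1-NN, and all traces are made-up.

```python
# Toy sketch of learning-aided predictor integration: training runs all
# predictors and records the best one as a label; testing forecasts the
# best predictor via nearest-neighbor lookup and runs only that one.

def last(h):    return h[-1]                       # LAST predictor
def sw_avg(h):  return sum(h[-3:]) / len(h[-3:])   # sliding-window average

PREDICTORS = {"LAST": last, "SW_AVG": sw_avg}

def best_label(history, actual):
    """Mix-of-experts labeling: run all predictors, keep the most accurate."""
    errs = {n: abs(p(history) - actual) for n, p in PREDICTORS.items()}
    return min(errs, key=errs.get)

def feature(history):
    """A single workload statistic (variance) standing in for PCA features."""
    m = sum(history) / len(history)
    return sum((x - m) ** 2 for x in history) / len(history)

# Training phase: (feature, best-predictor) pairs from two toy traces.
smooth = [5.0, 5.1, 4.9, 5.0]     # stationary trace: SW_AVG wins
trend = [1.0, 2.0, 3.0, 4.0]      # trending trace: LAST wins
train = [(feature(smooth[:-1]), best_label(smooth[:-1], smooth[-1])),
         (feature(trend[:-1]), best_label(trend[:-1], trend[-1]))]

def forecast_predictor(history):
    """Testing phase: 1-NN on the feature; only the chosen predictor runs."""
    f = feature(history)
    name = min(train, key=lambda t: abs(t[0] - f))[1]
    return name, PREDICTORS[name](history)
```

A low-variance test trace is routed to SW_AVG and a high-variance one to LAST, without ever running the non-selected predictors on the test data.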

4.1 Introduction

Grid computing [72] enables entities to create a Virtual Organization (VO) to share

their computation resources such as CPU time, memory, network bandwidth, and disk

bandwidth. Predicting the dynamic resource availability is critical to adaptive resource

scheduling. However, determining the most appropriate resource prediction model a priori

is difficult due to the multi-dimensionality and variability of system resource usage. First,

the applications may use different types of resources during their executions.

Some resource usages, such as CPU load, may be relatively smooth, whereas others, such

as network bandwidth, are burstier. It is hard to find a single prediction model which works

best for all types of resources. Second, different applications may have different resource

usage patterns. The best prediction model for a specific resource of one machine may not

work best for another machine. Third, the resource performance fluctuates dynamically due

to the contention created by competing applications. Indeed, in the absence of a perfect

prediction model, the best predictor for any particular resource may change over time.









Table 2-1. Performance metric list
Performance Metrics Description
CPU_System / User Percent CPU_System / User
Bytes_In / Out Number of bytes per second
into / out of the network
IO_BI / BO Blocks sent to / received from
block device (blocks/s)
Swap_In / Out Amount of memory swapped
in / out from / to disk (kB/s)


2.3.2.3 Training and classification

The 3-Nearest Neighbor classifier is used for the application classification in our

implementation. It is trained by a set of carefully chosen applications based on expert

knowledge. Each application represents the key performance characteristics of a class. For

example, an I/O benchmark program, PostMark [28], is used to represent the IO-intensive

class. SPECseis96 [29], a scientific computing intensive program, is used to represent

the CPU-intensive class. A synthetic application, Pagebench, is used to represent the

Paging-intensive class. It initializes and updates an array whose size is bigger than the

memory of the VM, thereby inducing frequent paging activity. Ettcp [30], a benchmark

that measures the network throughput over TCP or UDP between two nodes, is used as

the training application of the Network-intensive class. The performance data of all these

four applications and the idle state are used to train the classifier. For each test data, the

trained classifier calculates its distance to all the training data. The 3-NN classification

identifies only three training data sets with the shortest distance to the test data. Then

the test data's class is decided by the majority vote of the three nearest neighbors.

2.3.3 Post Processing and Application Database

At the end of classification, an m-dimensional class vector c1×m = (c1, c2, ..., cm)

is generated. Each element of the vector c1×m represents the class of the corresponding

application performance snapshot. The majority vote of the snapshot classes determines

the application Class. The complete performance data dimension reduction process is

shown in Figure 2-4. In addition to a single value (Class) the application classifier also









ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Professor Renato J.

Figueiredo, for his invaluable advice, encouragement, and support. This dissertation would

not have been possible without his guidance and support. My deep appreciation goes to

Professor Jose A.B. Fortes for participating in my supervisory committee and for all the

guidance and opportunities to work in the In-VIGO team that he gave me during my

Ph.D study. My deep recognition also goes to Professor Malay Ghosh and Professor Alan

George for serving on my supervisory committee and for their valuable suggestions. Many

thanks go to Dr. Mazin Yousif and Mr. Robert Carpenter from Intel Corporation for their

valuable input and generous funding for this research. Thanks also go to my colleagues

in the Advanced Computing Information Systems (ACIS) Laboratory for their discussion

of ideas and years of friendship. Last but not least, I owe a special debt of gratitude to

my family. Without their selfless love and support, I cannot imagine what I would have

achieved.









this work. While we have chosen to use the k-NN and Bayesian classification algorithms

due to their prior success in a large number of classification problems, such as handwritten

digits and satellite image scenes, our methodology may be generally used with other types

of classification algorithms.

4.5.1 k-Nearest Neighbor

The k-Nearest Neighbor (k-NN) classifier is memory-based. Its training data consist

of the N pairs (x1, p1), ..., (xN, pN), where pi is a class label taking values in {1, 2, ..., P}.

In this work, the P represents the number of prediction models in the pool. The training

data are represented by a set of points in the feature space, where each point xi is

associated with its class label pi. Classification of testing data xj is made to the class

of the closest training data. For example, given a test data xj, the k training data

xr, r = 1, ..., k, closest in distance to xj are identified. The test data is classified by using

the majority vote among the k (an odd number) neighbors.

Since the features under study, such as CPU percentage and network received bytes/sec,

have different units of measure, all features are normalized to have zero mean and unit

variance [88]. In this work, "closeness" is determined by Euclidean distance (Equation 4-6).


dij = ||xi − xj|| (4-6)


As a nonparametric method, the k-NN classifier can be applied to different time series

without modification. To address the problem associated with high dimensionality, various

dimension reduction techniques can be used in the data preprocessing.
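The classifier described above can be sketched in a few lines; the training points and labels below are illustrative, assuming features have already been normalized.

```python
import numpy as np

# Minimal k-NN sketch: Euclidean distance (Eq. 4-6) and a majority vote
# among the k nearest training points.

def knn_classify(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)    # Euclidean distances
    nearest = np.argsort(d)[:k]                # indices of k closest points
    votes = [y_train[i] for i in nearest]
    return max(set(votes), key=votes.count)    # majority vote

# Two well-separated classes of (normalized) feature points.
X_train = np.array([[0.0, 0.0], [0.1, 0.1], [0.2, 0.0],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = [1, 1, 1, 2, 2, 2]
label = knn_classify(X_train, y_train, np.array([0.05, 0.05]), k=3)
```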

4.5.2 Bayesian Classification

The Bayesian classifier is based on the well-known probability theorem, Bayes'

formula. Suppose that we know both the prior probabilities P(ωj) and the conditional

densities p(x|ωj), where x and ω represent a feature vector and its state (e.g., class),

respectively. The joint probability density can be written in two ways: p(ωj, x)










Figure 2-6. Application class composition diagram. Applications: VMD, Sftp,
Autobench, NetPIPE, PostMark_NFS, Stream, SPEC M 32, Bonnie, PostMark,
SimpleScalar, CH3D, SPEC S 256, SPEC M 256. Class legend: Idle, I/O, CPU,
Network, Paging (percentage composition, 0-100%).


applications SPECseis96 (S) with small data size, PostMark (P) with local file directory

and NetPIPE Client (N) were selected, and three instances of each application were

executed. The scheduler's task was to decide how to allocate the nine application instances

to run on the 3 virtual machines (VM1, VM2 and VM3) in parallel, each of which hosted

3 jobs. The VM4 was used to host the NetPIPE server. There are ten possible schedules

available, as shown in Figure 2-7.

When multiple applications run on the same host machine at the same time, there

are resource contentions among them. Two scenarios were compared: in the first scenario,

the scheduler did not use class information, and one of the ten possible schedules was



































To my family.









5.3.4 Finding the Optimal Number of Clusters

One of the most venerable problems in cluster analysis is to find the optimal number

of clusters in the data. Many statistical methods and computational algorithms have been

developed to answer this question using external indices and/or internal indices [96]. The

best number of clusters in the context of phase analysis discussed in this work is the one

that gives minimal total costs. The process to find out the optimal number of clusters of

the application workload is explained as follows.

Let u_n = u(t0 + nΔt) denote the resource usage sampled at time t = t0 + nΔt

during the execution of an application. As shown in Section 5.3.3, when the clustering

with input parameter k (i.e., the number of clusters) is performed for a resource usage set

U = {u1, u2, ...}, the subset Ui of resource usages that belong to the ith phase can be

written as:


Ui = {u | u ∈ phase i}, 1 ≤ i ≤ k. (5-2)


Resource reservation strategy: Phase-based resource reservation is performed. For intervals

whose resource usages belong to the ith phase, the local maximum amount of resource

usage Ui_max of phase i is reserved:


Ui_max = max(u | u ∈ Ui), 1 ≤ i ≤ k (5-3)


and the total resource reservation R over the whole execution period can be written as

R(k) = sum_{i=1}^{k} Ui_max × (size of Ui) (5-4)

where k is the number of clusters used for clustering algorithm and the size of Ui is defined

as the number of elements of the subset Ui. Compared to the conservative reservation

strategy, which reserves the global maximum amount of resources over the whole execution

period, the phase-based reservation strategy can better adapt the resource reservation to

the actual resource usage and reduce the resource reservation cost as shown in Figure 5-2,









[57] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification
and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491-502, Apr.
2005.

[58] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference, Morgan Kaufmann Publishers, San Francisco, CA, 1988.

[59] T. Dean, K. B .-,-., R. Chekaluk, S. Hyun, M. Lejter, and M. Randazza, "Coping
with uncertainty in a control system for navigation and exploration.," in Proc. 8th
National Conference on Ar'.:, .:,l Intelligence, Boston, MA, July 29-Aug. 3, 1990,
pp. 1010-1015.

[60] D. Heckerman, "Probabilistic similarity networks," Tech. Rep., Depts. of Computer
Science and Medicine, Stanford University, 1990.

[61] D. J. Spiegelhalter, R. C. Franklin, and K. Bull, "Assessment criticism and
improvement of imprecise subjective probabilities for a medical expert system,"
in Proc. Fifth Workshop on Uncer'i,,.:,l in Ar'.(l,.:,l Intelligence, 1989, pp. 335-342.

[62] E. C('! i i i1: and D. McDermott, Introduction to Ar'.:i .:.,l Intelligence,
Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1985.

[63] T. S. Levitt, J. Mullin, and T. O. Binford, "Model-based influence diagrams for
machine vision," in Proc. 5th Workshop on Uncer''i,.:ul in Ar'.fl, .:,i Intelligence,
1989, pp. 233-244.

[64] R. E. Neapolitan, Probabilistic reasoning in expert s;,;-. ii- theory and il,.,rithms,
John Wiley & Sons, Inc., New York, NY, USA, 1990.

[65] K. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin
nearest neighbor classification," in Proc. 19th annual Conference on Neural
Information Processing S;, .i' Vancouver, CA, Dec. 2005.

[66] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, pp.
271-274, 1998.

[67] B. Ziebart, D. Roth, R. Campbell, and A. Dey, "Automated and adaptive threshold
setting: Enabling technology for .illiiin and self-management," in Proc. '.1/
International Conference of Autonomic Cor,,i',.:,I,, June 13-16, 2005, pp. 204-215.

[68] P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature
similarity," IEEE Trans. Pat. Anal. Mach. Intel., vol. 24, no. 3, pp. 301-312, Mar.
2002.

[69] W. Lee, S. J. Stolfo, and K. W. Mok, "Adaptive intrusion detection: A data mining
approach," Ar'.:l i.:.1 Intelligence Review, vol. 14, no. 6, pp. 533-567, 2000.

[70] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen,
"Performance debugging for distributed systems of black boxes," in Proc. 19th AC'CM









LEARNING-AIDED SYSTEM PERFORMANCE MODELING IN SUPPORT OF
SELF-OPTIMIZED RESOURCE SCHEDULING IN
DISTRIBUTED ENVIRONMENTS


















By

JIAN ZHANG


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007









CHAPTER 5
APPLICATION RESOURCE DEMAND PHASE ANALYSIS AND PREDICTIONS

Profiling the execution phases of applications can help to optimize the utilization

of the underlying resources. This chapter presents a novel system level application-

resource-demand phase analysis and prediction approach in support of on-demand

resource provisioning. This approach explores large-scale behavior of applications' resource

consumption, followed by an analysis using a set of clustering-based algorithms. The

phase profile, which learns from historical runs, is used to classify and predict future

phase behavior. This process takes into consideration applications' resource consumption

patterns, phase transition costs and penalties associated with Service-Level Agreements

(SLA) violations.

5.1 Introduction

Recently there has been a renewed interest in using virtual machines (VMs) as a

container [94] of the application's execution environment both in academia and industry

[11] [16] [95]. This is motivated by the idea of providing computing resources as a utility

and charging the users for a specific usage. For example, in August 2006, Amazon

launched its Beta version of VM-based Elastic Compute Cloud (EC2) web service. EC2

allows users to rent virtual machines with specific configurations from Amazon and can

support changes in resource configurations on the order of minutes. In systems that

allow users to reserve and reconfigure resource allocations and charge based upon such

allocations, users have an incentive to request no more than the amount of resources an

application needs. A question which arises here is: how to adapt the resource provisioning

to the changing workload?

In this chapter, we focus on modeling and analyzing long-running applications' phase

behavior. The modeling is based on monitoring and learning of the applications' historical

resource consumption patterns, which likely vary over time. Understanding such behavior

is critical to optimizing resource scheduling. To self-optimize the configuration of an
















Figure 3-9. Classification results of benchmark programs: A) SPECseis96, B) PostMark, C) PostMarkNFS. Principal components 1 and 2 are the principal component metrics extracted by PCA.









3 AUTONOMIC FEATURE SELECTION FOR APPLICATION CLASSIFICATION ..... 49

3.1 Introduction ..... 49
3.2 Statistical Inference ..... 51
3.2.1 Feature Selection ..... 51
3.2.2 Bayesian Network ..... 52
3.2.3 Mahalanobis Distance ..... 55
3.2.4 Confusion Matrix ..... 55
3.3 Autonomic Feature Selection Framework ..... 56
3.3.1 Data Quality Assuror ..... 56
3.3.2 Feature Selector ..... 59
3.3.3 Trainer ..... 61
3.4 Experimental Results ..... 62
3.4.1 Feature Selection and Classification Accuracy ..... 62
3.4.2 Classification Validation ..... 65
3.4.3 Training Data Quality Assurance ..... 71
3.5 Related Work ..... 71
3.6 Conclusion ..... 73

4 ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE PREDICTIONS ..... 74

4.1 Introduction ..... 74
4.2 Related Work ..... 76
4.3 Virtual Machine Resource Prediction Overview ..... 77
4.4 Time Series Models for Resource Performance Prediction ..... 80
4.5 Algorithms for Prediction Model Selection ..... 82
4.5.1 k-Nearest Neighbor ..... 83
4.5.2 Bayesian Classification ..... 83
4.5.3 Principal Component Analysis ..... 85
4.6 Learning-Aided Adaptive Resource Predictor ..... 86
4.6.1 Training Phase ..... 86
4.6.2 Testing Phase ..... 89
4.7 Empirical Evaluation ..... 90
4.7.1 Best Predictor Selection ..... 90
4.7.2 Virtual Machine Performance Trace Prediction ..... 91
4.7.2.1 Performance of k-NN based LARPredictor ..... 92
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based LARPredictor .....
4.7.2.3 Performance comparison of the LARPredictors and the cumulative-MSE based predictor used in the NWS .....
4.7.3 Discussion .....
4.8 Conclusion .....









Table 3-3. Confusion matrix of classification results with expert-selected and
automatically-selected feature sets. A)Automatic B)Expert
Actual Classified as
Class Idle CPU IO Net Mem
Idle 4938 0 62 0 0
CPU 231 4746 23 0 0
IO 20 86 2888 6 0
Net 0 12 8 4980 0
Mem 0 0 0 0 5000
A

Actual Classified as
Class Idle CPU IO Net Mem
Idle 4962 0 38 0 0
CPU 4 4882 10 0 104
IO 20 10 2797 0 173
Net 0 0 24 4970 6
Mem 3 0 36 0 4961
B

The bold numbers along the diagonal are the number of
correctly classified data.
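As a quick check, the diagonal convention described in the note can be turned into an overall accuracy. The sketch below does this for matrix A, using only the counts from the table:

```python
# Overall accuracy from the confusion matrix of Table 3-3, matrix A.
conf = {
    "Idle": [4938, 0, 62, 0, 0],
    "CPU":  [231, 4746, 23, 0, 0],
    "IO":   [20, 86, 2888, 6, 0],
    "Net":  [0, 12, 8, 4980, 0],
    "Mem":  [0, 0, 0, 0, 5000],
}
classes = ["Idle", "CPU", "IO", "Net", "Mem"]
correct = sum(conf[c][i] for i, c in enumerate(classes))  # diagonal entries
total = sum(sum(row) for row in conf.values())
accuracy = correct / total  # roughly 0.98 for matrix A
```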


in this experiment, if we know that the application belongs to either the I/O-intensive or the memory-intensive class, a higher classification accuracy can be achieved with two selected features than in the 5-class case. This result shows the potential of using pair-wise classification to improve the classification accuracy for multi-class cases. Using the pair-wise approach for multi-class classification is a topic of future research.

3.4.2 Classification Validation

This set of experiments targets to validate the feature selection experiment results

with the Principal Component Analysis (PCA) and k-Nearest Neighbor (k-NN) based

application classification framework described in [53].

First, the training data distributions based on principal components, which are

derived from automatically selected features in Section 3.4.1 and manually selected

features in previous work [53], are shown in Figure 3-8. Distances between each pair

of class centroids in Figure 3-8 are calculated and plotted in Figure 3-7. It shows that









Multi-dimensionality of application resource consumption: An application's execution

resource requirement is often multi-dimensional. That is, different applications may stretch

the use of CPU, memory, hard disk or network bandwidth to different degrees. The

knowledge of which kind of resource is the key component in the resource consumption

pattern can assist resource scheduling.

Multi-stage applications: There are cases where long-running scientific applications

exhibit multiple execution stages. Different execution stages may stress different kinds of

resources to different degrees, hence characterizing an application requires knowledge of

its dynamic run-time behavior. The identification of such stages presents opportunities to

exploit better matching of resource availability and application resource requirement across

different execution stages and across different nodes. For instance, with process migration

techniques [20] [21] it is possible to migrate an application during its execution for load

balancing.

The above characteristics of grid applications present a challenge to resource

scheduling: How to learn and make use of an application's multi-dimensional resource

consumption patterns for resource allocation? This chapter introduces a novel approach

to solve this problem: application classification based on the feature selection algorithm,

Principal Component Analysis (PCA), and K-Nearest Neighbor (k-NN) classifier [22][23].

The PCA is applied to reduce the dimensionality of application performance metrics, while

preserving the maximum amount of variance in the metrics. Then, the k-Nearest Neighbor

algorithm is used to categorize the application execution states into different classes

based on the application's resource consumption pattern. The learned application class

information is used to assist the resource scheduling decision-making in heterogeneous

computing environments.
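The pipeline just described, PCA to reduce the dimensionality of the performance metrics followed by k-NN classification on the reduced features, can be sketched as follows. The two-metric snapshots, class labels, and power-iteration PCA are illustrative stand-ins, not the chapter's actual implementation:

```python
import math

def top_component(data):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(d)] for i in range(d)]
    v = [1.0] * d
    for _ in range(100):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v

def project(x, mean, v):
    """Scalar coordinate of sample x along the principal component."""
    return sum((xi - mi) * vi for xi, mi, vi in zip(x, mean, v))

def knn_label(train, query, k=3):
    """Majority label among the k nearest 1-D projections."""
    ranked = sorted(train, key=lambda t: abs(t[0] - query))
    votes = [label for _, label in ranked[:k]]
    return max(set(votes), key=votes.count)

# Hypothetical performance snapshots: (cpu_user %, io_wait %) with known class.
samples = [((90, 5), "CPU"), ((85, 8), "CPU"), ((88, 4), "CPU"),
           ((10, 80), "IO"), ((12, 85), "IO"), ((8, 78), "IO")]
mean, pc1 = top_component([list(x) for x, _ in samples])
train = [(project(x, mean, pc1), label) for x, label in samples]
pred = knn_label(train, project((87, 6), mean, pc1))
```

Here the two metrics are so strongly anti-correlated that a single principal component retains nearly all of the variance, which is the dimensionality reduction the text refers to.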

The VMPlant service introduced in Section 1.4.2 provides automated cloning

and configuration of application-centric Virtual Machines (VMs). Problem-solving

environments such as In-VIGO [24] can submit requests to the VMPlant service, which









where

μ_{c,i} = (1/K_c) Σ_{k_c=1}^{K_c} x_{k_c,i},  i = 1, 2, ..., n    (3-6)

is called the centroid of the cluster C_c.

The training data selection is a three-step process: First, the DataQA extracts the n out of m metrics of the input performance snapshot to form a training data candidate. Thus each candidate is represented by an n-dimensional point x = (x_1, x_2, ..., x_n). Second, it evaluates whether the input candidate is qualified to be training data representing one of the application classes. Finally, the qualified training data candidate is associated with a scalar value Class, which defines the application class.

The first step is straightforward. In the second and third steps, the Mahalanobis distance between the training data candidate x and the centroid μ_c of cluster C_c is calculated as follows:

d_c(x) = ((x - μ_c)^T Σ_c^{-1} (x - μ_c))^{1/2}    (3-7)

where c = 1, 2, ..., 5 represents the application class and Σ_c^{-1} denotes the inverse covariance matrix of the cluster C_c. The distance from the training data candidate x to

the boundary between two class clusters, for example C_1 and C_2, is |d_1(x) - d_2(x)|. If |d_1(x) - d_2(x)| = 0, the candidate is exactly at the boundary between classes 1 and 2. The further away the candidate is from the class boundaries, the better it can represent a class. In other words, there is less probability for it to be misclassified.
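The distance test above can be illustrated with a minimal two-dimensional sketch of the Mahalanobis distance of Equation 3-7; the cluster points and query snapshots below are invented for the example:

```python
# Mahalanobis distance (Equation 3-7) for 2-D data, with a closed-form
# inverse of the 2x2 sample covariance matrix (hypothetical snapshots).

def mahalanobis_2d(x, points):
    """d(x) = sqrt((x - mu)^T S^-1 (x - mu)) for a 2-D cluster."""
    n = len(points)
    mu = [sum(p[i] for p in points) / n for i in (0, 1)]
    dev = [[p[0] - mu[0], p[1] - mu[1]] for p in points]
    s00 = sum(d[0] * d[0] for d in dev) / (n - 1)
    s11 = sum(d[1] * d[1] for d in dev) / (n - 1)
    s01 = sum(d[0] * d[1] for d in dev) / (n - 1)
    det = s00 * s11 - s01 * s01
    inv = [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy)) ** 0.5

cluster = [(90.0, 5.0), (85.0, 8.0), (88.0, 4.0), (92.0, 6.0)]
d_near = mahalanobis_2d((89.0, 6.0), cluster)   # close to the centroid
d_far = mahalanobis_2d((40.0, 50.0), cluster)   # far from the cluster
```

A candidate like the first query, well inside the cluster, yields a small distance, while the second lies far from all class boundaries of this cluster and would be rejected or assigned elsewhere.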

Therefore, the DataQA calculates the distance from the candidate to the boundaries of all possible pairs of classes. If the minimal distance to class boundaries, min(|d_1 - d_2|, |d_1 - d_3|, ..., |d_4 - d_5|), is larger than a predefined threshold τ, the corresponding m-dimensional snapshot of the candidate is determined to be qualified training data of









[14] "Vmotion," http://www.vmware.com/products/vi/vc/vmotion.html.

[15] M. Zhao, J. Zhang, and R. Figueiredo, "Distributed file system support for virtual machines in grid computing," Proc. 13th International Symposium on High Performance Distributed Computing, pp. 202-211, 2004.

[16] I. Krsul, A. Ganguly, J. Zhang, J. Fortes, and R. Figueiredo, "Vmplants: Providing and managing virtual machine execution environments for grid computing," in Proc. Supercomputing, Washington, DC, Nov. 6-12, 2004.

[17] J. Sugerman, G. Venkitachalam, and B. Lim, "Virtualizing i/o devices on vmware workstation's hosted virtual machine monitor," in Proc. USENIX Annual Technical Conference, 2001.

[18] J. Dike, "A user-mode port of the linux kernel," in Proc. 4th Annual Linux Showcase and Conference, USENIX Association, Atlanta, GA, Oct. 2000.

[19] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid computing," in Proc. 3rd USENIX Virtual Machine Research and Technology Symposium, May 2004.

[20] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and migration of UNIX processes in the Condor distributed processing system," Tech. Rep. UW-CS-TR-1346, University of Wisconsin-Madison Computer Sciences Department, Apr. 1997.

[21] A. Barak, O. Laden, and Y. Yarom, "The now mosix and its preemptive process migration scheme," Bulletin of the IEEE Technical Committee on Operating Systems and Application Environments, vol. 7, no. 2, pp. 5-11, 1995.

[22] R. Duda, P. Hart, and D. Stork, Pattern Classification, Wiley-Interscience, New York, NY, 2001, 2nd edition.

[23] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning," Artificial Intelligence Review, vol. 11, no. 1-5, pp. 11-73, 1997.

[24] S. Adabala, V. Chadha, P. Chawla, R. J. O. Figueiredo, J. A. B. Fortes, I. Krsul, A. M. Matsunaga, M. O. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From virtualized resources to virtual computing grids: the in-vigo system," Future Generation Comp. Syst., vol. 21, no. 6, pp. 896-909, 2005.

[25] L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224, Oct. 2004.

[26] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21-27, Jan. 1967.






















Figure 4-13. Predictor performance comparison (VM5). Predictors compared: P-LARP, Knn-LARP, Bays-LARP, Cum.MSE, and W-Cum.MSE. Performance metric IDs: 1 CPU_usedsec, 2 CPU_ready, 3 Mem_size, 4 Mem_swap, 5 NIC1_rx, 6 NIC1_tx, 7 NIC2_rx, 8 NIC2_tx, 9 VD1_read, 10 VD1_write, 11 VD2_read, 12 VD2_write.



























VM: Virtual Machine
VMM: Virtual Machine Monitor
DB: Database
ARM: Application Resource Manager
CQ: Clustering Quality


Figure 5-1. Application resource demand phase analysis and prediction prototype. The phase analyzer analyzes the performance data collected by the monitoring agent to find the optimal number of phases n ∈ [1, m]. The output phase profile is stored in the application phase database (DB) and is used as training data for the phase predictor. The predictor predicts the next phase of the application resource usage based on learning of its historical phase behaviors. The predicted phase can be used to support the application resource manager's (ARM's) decisions regarding resource provisioning. The auditor monitors and evaluates the performance of the analyzer and predictor, and orders re-training of the phase predictor with the updated workload profile when the performance measurements drop below a predefined threshold.


the application containers. The collected performance data is stored in the performance

database.

The phase analyzer retrieves the time-series VM performance data, which are

identified by vmID, FeatureID, and a time window (ts, t), from the performance database.

Then it performs phase analysis using algorithms based on clustering to check whether

there is a phase behavior in the application's resource consumption patterns. If so, it

continues to find out how many phases in a numeric range are best in terms of providing

the minimal resource reservation costs. The output phase profile, which consists of the









This chapter introduces a Learning-Aided Adaptive Resource Predictor (LARPredictor), which can dynamically choose the best prediction model suited to the workload at

any given moment. By integrating the prediction results generated by the best predictor

of each moment during the application run, the LARPredictor can outperform any single

predictor in the pool. It differs from the traditional mix-of-expert resource prediction

approach in that it does not require running multiple prediction models in parallel all

the time to identify the best predictors. Instead, the Principal Component Analysis

(PCA) and classification algorithm such as k-Nearest Neighbor (k-NN) are used to

forecast the best prediction model from a pool based on the monitoring and learning of

the historical resource availability and the corresponding prediction performance. The

learning-aided adaptive resource performance prediction can be used to support dynamic

VM provisioning by providing accurate prediction of the resource availability of the host

server and the resource demand of the applications that are reflected by the hosting

virtual machines.
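A toy version of this selection step might look as follows; summarizing a trace window by its mean and standard deviation, the 1-NN lookup, and the predictor names ("MEAN", "AR") are illustrative assumptions rather than the LARPredictor's actual feature set:

```python
# Sketch of learning-aided predictor selection: describe a trace window by
# simple statistics and let a 1-NN lookup forecast which predictor to use.

def stats(window):
    """(mean, standard deviation) of a trace window."""
    n = len(window)
    mean = sum(window) / n
    var = sum((v - mean) ** 2 for v in window) / n
    return (mean, var ** 0.5)

def nearest_predictor(training, window):
    """Predictor label of the training window with the closest statistics."""
    f = stats(window)
    best = min(training,
               key=lambda t: sum((a - b) ** 2 for a, b in zip(stats(t[0]), f)))
    return best[1]

training = [
    ([50, 51, 50, 49, 50], "MEAN"),   # flat load: a running mean works well
    ([10, 30, 50, 70, 90], "AR"),     # trending load: an autoregressive model fits
]
label = nearest_predictor(training, [60, 61, 59, 60, 60])
```

Only the selection step runs at prediction time; unlike a mix-of-experts scheme, no parallel execution of all candidate predictors is needed, which matches the distinction drawn in the text.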

Our experimental results based on the analysis of a set of virtual machine trace data

show:

1. The best prediction model is workload specific. In the absence of a perfect

prediction model, it is hard to find a single predictor which works best across virtual

machines which have different resource usage patterns.

2. The best prediction model is resource specific. It is hard to find a single predictor

which works best across different resource types.

3. The best prediction model for a specific type of resource of a given VM trace varies

as a function of time. The LARPredictor can adapt the predictor selection to the change

of the resource consumption pattern.

4. In the experiments with a set of trace data, the LARPredictor outperformed the observed single best predictor in the pool for 44.2% of the traces and outperformed the cumulative-MSE based prediction model used in the Network Weather Service system









on-line resource reprovisioning on the same cluster node, so the transition time can be virtually close to zero (C = 0). In this case, 10 phases can be used. If the transition takes 8 seconds (C = 156), which is achievable with intra-cluster VM migration for resource reprovisioning, four phases work best. When the transition cost exceeds the level that the reduced resource reservation can justify for the workload under study, the total cost is an increasing function of the number of phases. In this case, it is better to fall back from the phase-based resource reservation strategy to the conservative one.

The impact of inaccuracies introduced by the phase predictor is shown in Figure F. In

addition to the resource reservation costs and the phase transition costs, this experiment

also took the phase mis-prediction penalty costs into account while calculating the total

cost. For example, for each unit of down-size mis-predicted resource, a penalty of 8-times

(Cp = 8) of the unit resource cost is imposed. Comparing Figure E to Figure F, we can

see that adding penalty into the cost model will increase the final costs to the user for the

same set of k and C and potentially reduce the workload's best number of phases k_best

for the same set of C and Cp.

Finally, a total cost ratio ρ is defined as the ratio of the total cost using k phases, TC(k), to the total cost of 1 phase, TC(1):

ρ = TC(k)/TC(1).    (5-11)

Intuitively, ρ measures the cost savings achieved using the phase-based reservation strategy over the conservative one. Thus, the smaller the value of ρ, the more efficient the phase-based reservation scheme. Table 5-2 gives a sample total cost schedule (C = 52

and Cp = 8) for each of the eight performance features of SPECseis96. It shows that by changing the resource provisioning strategy from the conservative approach (k = 1) to the phase-based provisioning (k = 3), a 29.5% total cost reduction for CPU usage can be achieved. For spiky trace data such as disk I/O and memory usage, the total cost reduction can be even higher.









Rsc (Actual)), the predicted resource usage by the AR prediction (Predicted Rsc), and the

resource reservation based on the predicted usage (Rsvd Rsc (Predict)).

Figures C and D show that, with an increasing number of phases, two of the determinants in the cost model, the number of phase transitions TR(k) and the misprediction penalty P(k), increase monotonically. The other determinant of the cost model, the amount of reserved resources R(k), is shown by the lowest curve with index

C = 0 in Figure E. It indicates that, with an increasing number of phases, the total amount of reserved resources for the training set decreases monotonically. This is because, with an increasing number of phases, the resource allocation can be performed at time scales of finer granularity. However, there is a diminishing return on the increased number of phases because of the increasing phase transition costs and misprediction penalties.
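The trade-off can be made concrete with a hedged sketch: the total cost combines the reserved resources R(k), a per-transition cost C, and a per-unit misprediction penalty Cp. Equation 5-6 itself is not reproduced here, and the R, TR, and P tables below are made-up illustrations:

```python
def total_cost(k, R, TR, P, C, Cp):
    """TC(k): reserved resources plus C per transition plus Cp per penalty unit."""
    return R[k] + C * TR[k] + Cp * P[k]

# Illustrative determinants: R(k) falls with k while TR(k) and P(k) grow.
R = {1: 850, 2: 500, 4: 420, 10: 400}    # reserved resources
TR = {1: 0, 2: 1, 4: 3, 10: 9}           # number of phase transitions
P = {1: 0, 2: 1, 4: 2, 10: 6}            # mispredicted resource units

def best_k(C, Cp):
    """Number of phases minimizing the total cost for given C and Cp."""
    return min(R, key=lambda k: total_cost(k, R, TR, P, C, Cp))
```

With these toy numbers, a free transition favors many phases, a moderate transition cost favors a few, and a very expensive transition drives the choice back to the conservative single-phase reservation, the same qualitative behavior the experiments report.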

In the first analysis, we assume each resource reservation scheme to be clairvoyant, i.e., it reserves resources based on exact knowledge of future workload requirements. This

assumption eliminates the impact of inaccuracies introduced by the phase predictor.

In this case, Equation (5-6), which takes the resource reservation cost and the phase

transition cost into account while deciding the optimal number of phases, can be

applied as shown in Figure E. In this figure, the total cost over the whole testing period

is measured by CPU usage in percentage. The discount factor C denotes the CPU percentage that each phase transition will cost: C = CPU(%) × TransitionDuration. For example, the bottom line of C = 0 shows the case of no transition cost, which gives the lower bound of the total cost. For another instance, C = 260 implies a 13-second transition period (2.6 intervals × 5 secs/interval) with the assumption of 100% CPU consumption during the transition period. When the discount factor C increases from 0 to 260, the best number of phases k_best, which can provide the lowest total cost, decreases gradually from 10 to 2.

gradually from 10 to 2. The phase profile depicted in Figure E can be used to decide the

number of phases that should be used in the phase-based resource reservation to minimize

the total cost with given available transition options. For example, VMware ESX supports









feature subset, it calls the classifier to perform classification of the data in the updated

training data pool using the old and new feature subsets respectively. Then it compares

the classification accuracy of the two. If the accuracy achieved by the new feature subset

is higher than the one achieved by the previous subset, the selected feature is updated.

Otherwise, it remains the same.
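The accept-only-if-better update rule described above can be sketched directly; the stand-in classifier and the four labeled snapshots below are invented for illustration:

```python
# Feature-subset update: keep the new subset only if it classifies the pooled
# training data more accurately than the old one.

def accuracy(classify, data, features):
    """Fraction of samples whose predicted label matches the true label."""
    hits = sum(1 for x, label in data if classify(x, features) == label)
    return hits / len(data)

def update_subset(classify, data, old_subset, new_subset):
    """Return whichever feature subset yields the higher classification accuracy."""
    if accuracy(classify, data, new_subset) > accuracy(classify, data, old_subset):
        return new_subset
    return old_subset

# Stand-in classifier: thresholds the first selected feature.
def classify(x, features):
    return "CPU" if x[features[0]] > 50 else "IO"

data = [({"cpu": 90, "io": 10}, "CPU"), ({"cpu": 5, "io": 80}, "IO"),
        ({"cpu": 85, "io": 20}, "CPU"), ({"cpu": 10, "io": 90}, "IO")]
chosen = update_subset(classify, data, ["io"], ["cpu"])
```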

3.4 Experimental Results

We have implemented a prototype for the feature selector based on Matlab. This

section shows the experimental results of feature selection for data collected during the

execution of a set of applications representative of each class (CPU, I/O, memory and

network intensive) and the classification accuracy achieved. In addition, statistical analysis

of the performance metrics was conducted to justify the use of the Mahalanobis distance in the training data quality assurance process.

In the experiments, all the applications were executed in a VMware GSX 2.5 virtual

machine with 256MB memory. The virtual machine was hosted on an Intel(R) Xeon(TM)

dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. The CTC and application

classifier were running on an Intel(R) Pentium(R) III 750MHz machine with 256MB RAM.

3.4.1 Feature Selection and Classification Accuracy

Two sets of experiments were conducted offline to evaluate our feature selection

algorithm. In both experiments, the training data, described by 20 performance metrics,

consists of performance snapshots of applications belonging to different classes. In the experiments, tenfold cross validation was performed. The training data was randomly divided into two parts: a combination of 50% of the data from each class was used to train the feature selector (training set) to derive the feature subset, and the other 50% was used as a test set to validate the features selected by calculating the classification accuracy.

The first experiment was designed to show the relationship between classification

accuracy and the number of features selected. The second experiment was designed to









P(ω_j|x)p(x) = p(x|ω_j)P(ω_j). Rearranging these leads us to "Bayes formula":

P(ω_j|x) = p(x|ω_j)P(ω_j) / p(x)    (4-7)

where in this case of c categories

p(x) = Σ_{j=1}^{c} p(x|ω_j)P(ω_j).    (4-8)

Then, the posterior probabilities P(ω_j|x) can be computed from p(x|ω_j) by Bayes formula. In addition, Bayes formula can be expressed informally in English by saying that

posterior = (likelihood × prior) / evidence.    (4-9)

The multivariate normal density has been applied successfully to a number of classification problems. In this work the feature vector can be modeled as a multivariate normal random variable.

The general multivariate normal density in d dimensions is written as

p(x) = (1 / ((2π)^{d/2} |Σ|^{1/2})) exp(-(1/2)(x - μ)^T Σ^{-1} (x - μ)),    (4-10)

where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and Σ^{-1} are its determinant and inverse, respectively. Further, we let (x - μ)^T denote the transpose of x - μ.

The minimization of the probability of error can be achieved by use of the discriminant functions

g_i(x) = ln p(x|ω_i) + ln P(ω_i).    (4-11)
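For one-dimensional Gaussian class-conditional densities, Equations 4-7 and 4-11 reduce to a few lines of arithmetic; the means, variances, and priors below are made up for the example:

```python
import math

# Bayes discriminant g_i(x) = ln p(x|w_i) + ln P(w_i) for two 1-D Gaussian
# classes (illustrative parameters, not from the dissertation's experiments).

def log_gaussian(x, mu, var):
    """Log of the univariate normal density."""
    return -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)

def discriminant(x, mu, var, prior):
    """g_i(x): log-likelihood plus log-prior, Equation 4-11."""
    return log_gaussian(x, mu, var) + math.log(prior)

classes = {"low": (10.0, 4.0, 0.5), "high": (80.0, 9.0, 0.5)}

def decide(x):
    """Pick the class whose discriminant is largest at x."""
    return max(classes, key=lambda c: discriminant(x, *classes[c]))
```

Because the evidence p(x) is common to all classes, it can be dropped from the discriminant without changing the decision, which is why Equation 4-11 omits it.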









used for performance prediction. VM2 was used in the experiments. Fig. 4-5 shows the

predictor selections for CPU fifteen minute load average during a 12 hour period with a

sampling interval of 5 minutes. The top plot shows the observed best predictor by running

three prediction models in parallel. The middle plot shows the predictor selection of the

LARPredictor and the bottom plot shows the cumulative-MSE based predictor selection

used in the NWS. Similarly the predictor selection results of the trace data of other

resources are shown as follows: Network packets in per second in Fig. 4-6, total amount of

swap memory in Fig. 4-7, and total disk space in Fig. 4-8.

These experimental results show that the best prediction model for a specific

type of resource of a given trace varies as a function of time. In the experiment, the

LARPredictor can better adapt the predictor selection to the changing workload than

the cumulative-MSE based approach presented in the NWS. The LARPredictor's average best-predictor forecasting accuracy over all the performance traces of the five virtual machines is 51.5%, which is 20.1% higher than the accuracy of 42.9% achieved by the cumulative-MSE based predictor used in the NWS for the workload studied.

4.7.2 Virtual Machine Performance Trace Prediction

This set of experiments is used to check the prediction performance of the LARPredictor. Section 4.7.2.1 shows the prediction accuracy of the k-NN based LARPredictor and all the predictors in the pool. Section 4.7.2.2 compares the prediction accuracy and execution time of the k-NN based LARPredictor and the Bayesian-classifier based LARPredictor. In addition, Section 4.7.2.3 benchmarks the performance of the LARPredictors and the cumulative-MSE based prediction model used in the NWS.

In the experiments, ten-fold cross validation was performed for each set of time series data. A time stamp was randomly chosen to divide the performance data of a virtual machine into two parts: 50% of the data was used to train the LARPredictor and the other 50% was used as a test set to measure the prediction performance by calculating its prediction MSE.
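The evaluation protocol can be sketched as follows; the trace values, the split range, and the last-value predictor standing in for the LARPredictor are all illustrative assumptions:

```python
import random

def split_at_random_timestamp(trace, seed=0):
    """Split a trace at a randomly chosen time stamp (middle half of the run)."""
    random.seed(seed)
    cut = random.randrange(len(trace) // 4, 3 * len(trace) // 4)
    return trace[:cut], trace[cut:]

def last_value_mse(test):
    """MSE of predicting each sample as equal to its predecessor."""
    errors = [(test[i] - test[i - 1]) ** 2 for i in range(1, len(test))]
    return sum(errors) / len(errors)

trace = [50, 52, 51, 53, 80, 82, 81, 83, 50, 51, 52, 50]
train, test = split_at_random_timestamp(trace)
mse = last_value_mse(test)
```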









optimality of the decision. The key challenge here is how to find a representation of the

application, which can describe multiple dimensions of resource consumption, in a simple

way. This section describes how the pattern classification techniques, the PCA and the

K-NN classifier, are applied to achieve this goal.

A pattern classification system consists of pre-processing, feature extraction,

classification, and post-processing. The pre-processing and feature extraction are known

to significantly affect the classification, because the error caused by wrong features may

propagate to the next steps and stay predominant in terms of the overall classification

error. In this work, a set of application performance metrics are chosen based on expert

knowledge and the principle of increasing relevance and reducing redundancy [25].

2.2.1 Principal Component Analysis

Principal Component Analysis (PCA) [22] is a linear transformation representing

data in a least-square sense. It is designed to capture the variance in a dataset in terms of

principal components and reduce the dimensionality of the data. It has been widely used

in data analysis and compression.

When a set of vector samples are represented by a set of lines passing through

the mean of the samples, the best linear directions result in eigenvectors of the scatter matrix, the so-called "principal components," as shown in Figure 2-1. The corresponding

eigenvalues represent the contribution to the variance of data. When the k largest

eigenvalues of n principal components are chosen to represent the data, the dimensionality

of the data reduces from n to k.
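For a 2-by-2 scatter matrix the eigenvalues have a closed form, which makes it easy to check how much variance the largest eigenvalue retains when reducing from 2 to 1 dimension; the covariance values below are hypothetical:

```python
def eig_2x2(a, b, c):
    """Eigenvalues of the symmetric matrix [[a, b], [b, c]], largest first."""
    mid = (a + c) / 2
    rad = ((a - c) ** 2 / 4 + b * b) ** 0.5
    return mid + rad, mid - rad

lam1, lam2 = eig_2x2(4.0, 1.5, 1.0)
explained = lam1 / (lam1 + lam2)  # variance kept by the leading component
```

Here the leading component keeps over 90% of the variance, illustrating why dropping the smaller eigenvalues loses little information.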

Principal component analysis is based on the statistical representation of a random

variable. Suppose we have a random vector population x, where

x = (x_1, ..., x_n)^T    (2-1)

and the mean of that population is denoted by

μ_x = E{x}.    (2-2)










Figure 3-1. Sample Bayesian network generated by the feature selector, relating performance metrics (e.g., pksin, load fifteen) to the application class.


assertions that allow the construction of a global gpdf from the local gpdfs. As shown previously, the chain rule of probability can be used to ascertain these values:

p(x_1, ..., x_k|θ) = Π_{i=1}^{k} p(x_i|x_1, ..., x_{i-1}, θ)    (3-1)

One assumption imposed by Bayesian Network theory (and indirectly by the Product Rule of probability theory) is that for each variable x_i there must be a set Π_i ⊆ {x_1, ..., x_{i-1}} that renders x_i and {x_1, ..., x_{i-1}} conditionally independent. In this way:

p(x_i|x_1, ..., x_{i-1}, θ) = p(x_i|Π_i, θ)    (3-2)

A Bayesian Network Structure then encodes the assertions of conditional independence in Equation 3-1 above. Essentially then, a Bayesian Network Structure B_s is a directed acyclic graph such that each variable in U corresponds to a node in B_s, and the parents of the node corresponding to x_i are the nodes corresponding to the variables in Π_i.
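The factorization in Equations 3-1 and 3-2 can be demonstrated on a toy three-node network; the node names and probability tables below are invented, not taken from Figure 3-1:

```python
# Toy Bayesian network Class -> Load -> Alarm: the joint distribution is the
# product of each node's conditional given only its parents (Equation 3-2).

p_class = {"CPU": 0.6, "IO": 0.4}
p_load_given_class = {"CPU": {"high": 0.9, "low": 0.1},
                      "IO": {"high": 0.2, "low": 0.8}}
p_alarm_given_load = {"high": {True: 0.7, False: 0.3},
                      "low": {True: 0.05, False: 0.95}}

def joint(cls, load, alarm):
    """P(class, load, alarm) = P(class) P(load|class) P(alarm|load)."""
    return (p_class[cls]
            * p_load_given_class[cls][load]
            * p_alarm_given_load[load][alarm])

total = sum(joint(c, l, a)
            for c in p_class for l in ("high", "low") for a in (True, False))
```

Because each factor conditions only on the node's parents, the full joint over eight assignments still sums to one while needing far fewer parameters than an unfactored table.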

Depending on the problem that is defined, either (or both) of the topology and the probability distribution of a Bayesian Network can be pre-defined by hand or may be



I would like to express my sincere gratitude to my advisor, Professor Renato J. Figueiredo, for his invaluable advice, encouragement, and support. This dissertation would not have been possible without his guidance and support. My deep appreciation goes to Professor Jose A. B. Fortes for participating in my supervisory committee and for all the guidance and opportunities to work in the In-VIGO team that he gave me during my Ph.D study. My deep recognition also goes to Professor Malay Ghosh and Professor Alan George for serving on my supervisory committee and for their valuable suggestions. Many thanks go to Dr. Mazin Yousif and Mr. Robert Carpenter from Intel Corporation for their valuable input and generous funding for this research. Thanks also go to my colleagues in the Advanced Computing Information Systems (ACIS) Laboratory for their discussion of ideas and years of friendship. Last but not least, I owe a special debt of gratitude to my family. Without their selfless love and support, I cannot imagine what I would have achieved.


page
ACKNOWLEDGMENTS 4
LIST OF TABLES 8
LIST OF FIGURES 9
ABSTRACT 11
CHAPTER
1 INTRODUCTION 13
1.1 Resource Performance Modeling 14
1.2 Autonomic Computing 15
1.3 Learning 17
1.3.1 Supervised Learning 17
1.3.2 Unsupervised Learning 18
1.3.3 Reinforcement Learning 18
1.3.4 Other Learning Paradigms 19
1.4 Virtual Machines 20
1.4.1 Virtual Machine Characteristics 20
1.4.2 Virtual Machine Plant 22
2 APPLICATION CLASSIFICATION BASED ON MONITORING AND LEARNING OF RESOURCE CONSUMPTION PATTERNS 24
2.1 Introduction 24
2.2 Classification Algorithms 26
2.2.1 Principal Component Analysis 27
2.2.2 k-Nearest Neighbor Algorithm 30
2.3 Application Classification Framework 31
2.3.1 Performance Profiler 32
2.3.2 Classification Center 33
2.3.2.1 Data preprocessing based on expert knowledge 33
2.3.2.2 Feature selection based on principal component analysis 34
2.3.2.3 Training and classification 35
2.3.3 Post Processing and Application Database 35
2.4 Experimental Results 36
2.4.1 Classification Ability 36
2.4.2 Scheduling Performance Improvement 41
2.4.3 Classification Cost 45
2.5 Related Work 45
2.6 Conclusion 47


......................................... 49
3.1 Introduction 49
3.2 Statistical Inference 51
3.2.1 Feature Selection 51
3.2.2 Bayesian Network 52
3.2.3 Mahalanobis Distance 55
3.2.4 Confusion Matrix 55
3.3 Autonomic Feature Selection Framework 56
3.3.1 Data Quality Assuror 56
3.3.2 Feature Selector 59
3.3.3 Trainer 61
3.4 Experimental Results 62
3.4.1 Feature Selection and Classification Accuracy 62
3.4.2 Classification Validation 65
3.4.3 Training Data Quality Assurance 71
3.5 Related Work 71
3.6 Conclusion 73
4 ADAPTIVE PREDICTOR INTEGRATION FOR SYSTEM PERFORMANCE PREDICTIONS 74
4.1 Introduction 74
4.2 Related Work 76
4.3 Virtual Machine Resource Prediction Overview 77
4.4 Time Series Models for Resource Performance Prediction 80
4.5 Algorithms for Prediction Model Selection 82
4.5.1 k-Nearest Neighbor 83
4.5.2 Bayesian Classification 83
4.5.3 Principal Component Analysis 85
4.6 Learning-Aided Adaptive Resource Predictor 86
4.6.1 Training Phase 86
4.6.2 Testing Phase 89
4.7 Empirical Evaluation 90
4.7.1 Best Predictor Selection 90
4.7.2 Virtual Machine Performance Trace Prediction 91
4.7.2.1 Performance of k-NN based LAR Predictor 92
4.7.2.2 Performance comparison of k-NN and Bayesian-classifier based LAR Predictor 96
4.7.2.3 Performance comparison of the LAR Predictors and the cumulative-MSE based predictor used in the NWS 97
4.7.3 Discussion 98
4.8 Conclusion 100


......................................... 106
5.1 Introduction 106
5.2 Application Resource Demand Phase Analysis and Prediction Prototype 108
5.3 Data Clustering 111
5.3.1 Stages in Clustering 111
5.3.2 Definitions and Notation 112
5.3.3 k-means Clustering 113
5.3.4 Finding the Optimal Number of Clusters 114
5.4 Phase Prediction 117
5.5 Empirical Evaluation 118
5.5.1 Phase Behavior Analysis 119
5.5.1.1 SPECseis96 benchmark 119
5.5.1.2 WorldCup web log replay 122
5.5.2 Phase Prediction Accuracy 123
5.5.3 Discussion 125
5.6 Related Work 126
5.7 Conclusion 128
6 CONCLUSION 135
REFERENCES 137
BIOGRAPHICAL SKETCH 146


Table page
2-1 Performance metric list 35
2-2 List of training and testing applications 37
2-3 Experimental data: application class compositions 40
2-4 System throughput: concurrent vs. sequential executions 44
3-1 Sample confusion matrix with two classes (L=2) 56
3-2 Sample performance metrics in the original feature set 59
3-3 Confusion matrix of classification results 65
3-4 Performance metric correlation matrixes of test applications 70
4-1 Normalized prediction MSE statistics for resources of VM1 96
4-2 Normalized prediction MSE statistics for resources of VM2 97
4-3 Normalized prediction MSE statistics for resources of VM3 98
4-4 Normalized prediction MSE statistics for resources of VM4 99
4-5 Normalized prediction MSE statistics for resources of VM5 99
4-6 Best predictors of all the trace data 100
5-1 Performance feature list 119
5-2 SPECseis96 total cost ratio schedule for the eight performance features 122
5-3 Average phase prediction accuracy 124
5-4 Performance feature list of VM traces 124
5-5 Average phase prediction accuracy of the five VMs 126


Figure page
1-1 Structure of an autonomic element 16
1-2 Classification system representation 19
1-3 Virtual machine structure 21
1-4 VMPlant architecture 23
2-1 Sample of principal component analysis 28
2-2 k-nearest neighbor classification example 31
2-3 Application classification model 32
2-4 Performance feature space dimension reductions in the application classification process 34
2-5 Sample clustering diagrams of application classifications 39
2-6 Application class composition diagram 42
2-7 System throughput comparisons for ten different schedules 43
2-8 Application throughput comparisons of different schedules 44
3-1 Sample Bayesian network generated by feature selector 54
3-2 Feature selection model 57
3-3 Bayesian-network based feature selection algorithm for application classification 60
3-4 Average classification accuracy of 10 sets of test data versus number of features selected in the first experiment 63
3-5 Two-class test data distribution with the first two selected features 63
3-6 Five-class test data distribution with first two selected features 66
3-7 Comparison of distances between cluster centers derived from expert-selected and automatically selected feature sets 66
3-8 Training data clustering diagram derived from expert-selected and automatically selected feature sets 67
3-9 Classification results of benchmark programs 69
4-1 Virtual machine resource usage prediction prototype 78
4-2 Sample XML schema of the VM performance DB 80


............... 87
4-4 Learning-aided adaptive resource predictor data flow 88
4-5 Best predictor selection for trace VM2_load15 92
4-6 Best predictor selection for trace VM2_PktIn 93
4-7 Best predictor selection for trace VM2_Swap 94
4-8 Best predictor selection for trace VM2_Disk 95
4-9 Predictor performance comparison (VM1) 101
4-10 Predictor performance comparison (VM2) 102
4-11 Predictor performance comparison (VM3) 103
4-12 Predictor performance comparison (VM4) 104
4-13 Predictor performance comparison (VM5) 105
5-1 Application resource demand phase analysis and prediction prototype 109
5-2 Resource allocation strategy comparison 115
5-3 Application resource demand phase prediction workflow 129
5-4 Phase analysis of SPECseis96 CPU_user 130
5-5 Phase analysis of WorldCup'98 Bytes_In 133
5-6 Phase analysis of WorldCup'98 Bytes_out 134


With the goal of autonomic computing, it is desirable to have a resource scheduler that is capable of self-optimization, which means that with a given high-level objective the scheduler can automatically adapt its scheduling decisions to the changing workload. This self-optimization capacity imposes challenges to system performance modeling because of the increasing size and complexity of computing systems.

Our goals were twofold: to design performance models that can derive applications' resource consumption patterns in a systematic way, and to develop performance prediction models that can adapt to changing workloads. A novelty in the system performance model design is the use of various machine learning techniques to efficiently deal with the complexity of dynamic workloads based on monitoring and mining of historical performance data. In the environments considered in this thesis, virtual machines (VMs) are used as resource containers to host application executions because of their flexibility in supporting resource provisioning and load balancing.

Our study introduced three performance models to support self-optimized scheduling and decision-making. First, a novel approach is introduced for application classification based on the Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. It helps to reduce the dimensionality of the performance feature space and classify applications based on extracted features. In addition, a feature selection model is designed based on Bayesian Network (BN) to systematically identify the feature subset, which can provide optimal classification accuracy and adapt to changing workloads.


Second, an adaptive system performance prediction model is investigated based on a learning-aided predictor integration technique. Supervised learning techniques are used to learn the correlations between the statistical properties of the workload and the best-suited predictors.

In addition to a one-step ahead prediction model, a phase characterization model is studied to explore the large-scale behavior of an application's resource consumption patterns.

Our study provides novel methodologies to model system and application performance. The performance models can self-optimize over time based on learning of historical runs, therefore better adapt to the changing workload and achieve better prediction accuracy than traditional methods with static parameters.


The vision of autonomic computing [1] is to improve manageability of complex IT systems to a far greater extent than current practice through self-configuring, self-healing, self-optimization, and self-protection. To perform the self-configuration and self-optimization of applications and associated execution environments and to realize dynamic resource allocation, both resource awareness and application awareness are important. In this context, there has been substantial research on effective scheduling policies [2-6] with given resource and application specifications. While there are several methods for obtaining resource specification parameters (e.g., CPU, memory, and disk information from the /proc file system in Unix systems), application specification is challenging to describe due to the following factors: 1) lack of knowledge and control of the application source codes, 2) multi-dimensionality of application resource consumption patterns, and 3) multi-stage resource consumption patterns of long-running applications. Furthermore, the dynamics of system performance aggravate the difficulties of performance description and prediction.

In this dissertation, an integrated framework consisting of algorithms and middleware for resource performance modeling is developed. It includes system performance prediction models and application resource demand models based on learning of historical executions. A novelty of the performance model designs is their use of machine learning techniques to efficiently and robustly deal with the complex dynamical phenomena of the workload and resource availability. In addition, virtual machines (VMs) are used as resource containers because they provide a flexible management platform that is useful for both the encapsulation of application execution environments and the aggregation and accounting of resources consumed by an application. In this context, resource scheduling becomes a problem of how to dynamically allocate resources to virtual machines (which host application executions) to meet the applications' resource demands.


Section 1.1 gives an overview of resource performance modeling. Sections 1.2, 1.3, and 1.4 briefly introduce autonomic computing, machine learning, and virtual machine concepts.

As discussed in [7], in system procurement studies the cost/performance ratio is commonly used as a metric for comparing systems. Three techniques for performance evaluation are analytical modeling, simulation, and measurement. Sometimes it is helpful to use two or more techniques, either simultaneously or sequentially.

Computer system performance measurements involve monitoring the system while it is being subjected to a particular workload. In order to perform meaningful measurements, the workload should be carefully selected based on the services exercised by the workload, the level of detail, representativeness, and timeliness. Since a real user environment is generally not repeatable, it is necessary to study the real user environments, observe the key characteristics, and develop a workload model that can be used repeatedly. This


[1]. The essence of autonomic computing is to enable self-managed systems, which includes the following aspects:

Autonomic computing presents challenges and opportunities in various areas such as learning and optimization theory, automated statistical learning, and behavioral abstraction and models [8]. This dissertation addresses some of the challenges in


Figure 1-1. Structure of an autonomic element.

the application resource performance modeling to support self-configuration and self-optimization of application execution environments.

Generally, an autonomic system is an interactive collection of autonomic elements: individual system constituents that contain resources and deliver services to humans and other autonomic elements. As Figure 1-1 shows, an autonomic element will typically consist of one or more managed elements coupled with a single autonomic manager that controls and represents them. The managed element could be a hardware resource, such as storage or a CPU, or a software resource, such as a database, a directory service, or a large legacy system [1]. The monitoring process collects the performance data of the


Machine learning is a natural solution to automation. It avoids knowledge-intensive model building and reduces the reliance on expert knowledge. In addition, it can deal with complex dynamical phenomena and enable the system to adapt to the changing environments.

Traditionally there are generally three types of learning: supervised learning, unsupervised learning, and reinforcement learning.

[9]. "Learning" consists of choosing or adapting parameters within the model structure that work best on the samples at hand and others like them. One of the most prominent and basic learning tasks is classification or prediction, which is used extensively in this work. For classification problems, a learning system can be viewed as a higher-level system that helps build the decision-making system itself, called the classifier. Figure 1-2 illustrates the structure of a classification system and its learning process.


Reinforcement learning algorithms attempt to find a policy for maximizing cumulative reward for the agent over the course of the problem. The environment is typically


Figure 1-2. Classification system representation. During the training phase, labeled sample cases are used to derive the unknown parameters of the classifier model. During the testing phase, the customized classifier is used to associate a specific pattern of observations with a specific class.


In this work, various learning techniques are used to model the application resource demand and system performance. These models can help the system adapt to the changing workload and achieve higher performance.

[10]. A "classic" virtual machine (VM) enables multiple independent, isolated operating systems (guest VMs) to run on one physical machine (host server), efficiently multiplexing system resources of the host machine [10].

A virtual-machine monitor (VMM) is a software layer that runs on a host platform and provides an abstraction of a complete computer system to higher-level software. The abstraction created by the VMM is called a virtual machine. Figure 1-3 shows the structure of virtual machines.

[11]. The following characteristics of virtual machines make them a highly flexible and manageable application execution platform:


Figure 1-3. Virtual machine structure. A virtual-machine monitor is a software layer that runs on a host platform and provides an abstraction of a complete computer system to higher-level software. The host platform may be the bare hardware (Type I VMM) or a host operating system (Type II VMM). The software running above the virtual-machine abstraction is called guest software (operating system and applications).

[12] can support dynamic memory extension of a VM guest without shutting down the system.


[13]. VMware's VMotion can support migration with zero downtime [14]. Techniques based on the Virtual File System (VFS) have been studied in [15] to support VM migration across Wide-Area Networks (WANs).

[16] handles virtual machine creation and hosting for classic virtual machines (e.g., VMware [17]) and user-mode Linux platforms (e.g., UML [18]) via dynamic cloning, instantiation and configuration. The VMPlant has three major components: the Virtual Machine Production Center (VMPC), the Virtual Machine Warehouse (VMWH) and the Virtual Machine Information System (VMIS). The VMPC handles the virtual machine's creation, configuration and destruction. It employs a configuration pattern recognition technique to identify opportunities to apply the pre-cached virtual machine state to accelerate the machine configuration process. The VMWH stores the pre-cached machine images, monitors them and their host server's performance, and performs the maintenance activity. The VMIS stores the static and dynamic information of the virtual machines and their host server. The architecture of the VMPlant is shown in Figure 1-4.

The VMPlant provides an API to the VMShop for virtual machine creation, deconstruction, and monitoring. The VMShop has three major components: VMCreater, VMCollecter and VMReporter. The VMCreater handles the virtual machines' creation; the VMCollecter handles the machines' deconstruction and suspension; the VMReporter handles information requests. In combination with a virtual machine shop service, VMPlants


Figure 1-4. VMPlant architecture.

deployed across physical resources of a site allow clients (users and/or middleware acting on their behalf) to instantiate and control client-customized virtual execution environments. The plant can be integrated with virtual networking techniques (such as VNET [19]) to allow client-side network management. Customized, application-specific VMs can be defined in VMPlant with the use of a directed acyclic graph (DAG) configuration. VM execution environments defined within this framework can then be cloned and dynamically instantiated to provide a homogeneous application execution environment across distributed resources.

In the context of the VMPlant, an application can be scheduled to run in a specific virtual machine, which is called the application VM. Therefore, the system performance metrics collected from the application VM can reflect and summarize the resource consumption of the application.


Application awareness is an important factor in efficient resource scheduling. This chapter introduces a novel approach for application classification based on the Principal Component Analysis (PCA) and the k-Nearest Neighbor (k-NN) classifier. This approach is used to assist scheduling in heterogeneous computing environments. It helps to reduce the dimensionality of the performance feature space and classify applications based on extracted features. The classification considers four dimensions: CPU-intensive, I/O and paging-intensive, network-intensive, and idle. Application class information and the statistical abstracts of the application behavior are learned over historical runs and used to assist multi-dimensional resource scheduling.

There has been substantial research on effective scheduling policies [2-4] with given resource and application specifications. There are several methods for obtaining resource specification parameters (e.g., CPU, memory, disk information from /proc in Unix systems). However, application specification is challenging to describe because of the following factors:


[20][21] it is possible to migrate an application during its execution for load balancing.

The above characteristics of grid applications present a challenge to resource scheduling: how to learn and make use of an application's multi-dimensional resource consumption patterns for resource allocation? This chapter introduces a novel approach to solve this problem: application classification based on the feature selection algorithm, Principal Component Analysis (PCA), and the k-Nearest Neighbor (k-NN) classifier [22][23]. The PCA is applied to reduce the dimensionality of application performance metrics, while preserving the maximum amount of variance in the metrics. Then, the k-Nearest Neighbor algorithm is used to categorize the application execution states into different classes based on the application's resource consumption pattern. The learned application class information is used to assist the resource scheduling decision-making in heterogeneous computing environments.

The VMPlant service introduced in Section 1.4.2 provides automated cloning and configuration of application-centric Virtual Machines (VMs). Problem-solving environments such as In-VIGO [24] can submit requests to the VMPlant service, which


The classification system described in this chapter leverages the capability of summarizing application performance data by collecting system-level data within a VM, as follows. During the application execution, snapshots of performance metrics are taken at a desired frequency. A PCA processor analyzes the performance snapshots and extracts the key components of the application's resource usage. Based on the extracted features, a k-NN classifier categorizes each snapshot into one of the following classes: CPU-intensive, IO-intensive, memory-intensive, network-intensive and idle.

By using this system, resource scheduling can be based on a comprehensive diagnosis of the application resource utilization, which conveys more information than CPU load in isolation. Experiments reported in this chapter show that resource scheduling facilitated with application class composition knowledge can achieve better average system throughput than scheduling without the knowledge.

The rest of the chapter is organized as follows: Section 2.2 introduces the PCA and the k-NN classifier in the context of application classification. Section 2.3 presents the classification model and implementation. Section 2.4 presents and discusses experimental results of classification performance measurements. Section 2.5 discusses related work. Conclusions and future work are discussed in Section 2.6.


A pattern classification system consists of pre-processing, feature extraction, classification, and post-processing. The pre-processing and feature extraction are known to significantly affect the classification, because the error caused by wrong features may propagate to the next steps and stay predominant in terms of the overall classification error. In this work, a set of application performance metrics are chosen based on expert knowledge and the principle of increasing relevance and reducing redundancy [25].

Principal Component Analysis (PCA) [22] is a linear transformation representing data in a least-squares sense. It is designed to capture the variance in a data set in terms of principal components and reduce the dimensionality of the data. It has been widely used in data analysis and compression.

When a set of vector samples are represented by a set of lines passing through the mean of the samples, the best linear directions result in eigenvectors of the scatter matrix, the so-called "principal components," as shown in Figure 2-1. The corresponding eigenvalues represent the contribution to the variance of the data. When the k largest eigenvalues of n principal components are chosen to represent the data, the dimensionality of the data reduces from n to k.

Principal component analysis is based on the statistical representation of a random variable. Suppose we have a random vector population x, where

    x = (x_1, x_2, ..., x_n)^T    (2-1)

and the mean of that population is denoted by

    mu_x = E{x}    (2-2)


Figure 2-1. Sample of principal component analysis.

and the covariance matrix of the same data set is

    C_x = E{(x - mu_x)(x - mu_x)^T}    (2-3)

The components of C_x, denoted by c_ij, represent the covariances between the random variable components x_i and x_j. The component c_ii is the variance of the component x_i.

From a sample of vectors x_1, ..., x_M, we can calculate the sample mean and the sample covariance matrix as the estimates of the mean and the covariance matrix.

The eigenvectors e_i and the corresponding eigenvalues lambda_i can be obtained by solving the equation

    C_x e_i = lambda_i e_i    (2-4)


    |C_x - lambda I| = 0    (2-5)

where the I is the identity matrix having the same order as C_x and the |.| denotes the determinant of the matrix. If the data vector has n components, the characteristic equation becomes of order n.

By ordering the eigenvectors in the order of descending eigenvalues (largest first), one can create an ordered orthogonal basis with the first eigenvector having the direction of the largest variance of the data. In this way, we can find directions in which the data set has the most significant amounts of energy.

Suppose one has a data set of which the sample mean and the covariance matrix have been calculated. Let A be a matrix consisting of the eigenvectors of the covariance matrix as the row vectors.

By transforming a data vector x, we get

    y = A(x - mu_x)    (2-6)

which is a point in the orthogonal coordinate system defined by the eigenvectors. Components of y can be seen as the coordinates in the orthogonal base. We can reconstruct the original data vector x from y by

    x = A^T y + mu_x    (2-7)

using the property of an orthogonal matrix, A^{-1} = A^T. The A^T is the transpose of the matrix A. The original vector x was projected on the coordinate axes defined by the orthogonal basis. The original vector was then reconstructed by a linear combination of the orthogonal basis vectors.
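The eigen-decomposition and change of basis above can be sketched numerically with NumPy. This is an illustrative sketch, not the dissertation's implementation; the random data and seed are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # 100 samples of a 3-component vector x

mu = X.mean(axis=0)                    # sample mean (estimate of mu_x)
C = np.cov(X, rowvar=False)            # sample covariance matrix C_x

# Solve C_x e_i = lambda_i e_i and order eigenvectors by descending eigenvalue.
vals, vecs = np.linalg.eigh(C)
order = np.argsort(vals)[::-1]
A = vecs[:, order].T                   # rows of A are the ordered eigenvectors

y = A @ (X[0] - mu)                    # transform into the eigenbasis
x_back = A.T @ y + mu                  # reconstruct, using A^{-1} = A^T
print(np.allclose(x_back, X[0]))       # True: the full basis loses nothing
```

Because A is orthogonal, the round trip through the eigenbasis is exact; information is lost only when some eigenvectors are dropped, as described next.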


    y = A_K (x - mu_x)    (2-8)

and

    x' = A_K^T y + mu_x    (2-9)

where A_K consists of the first K rows of A. It means that we project the original data vector on the coordinate axes having the dimension K and transform the vector back by a linear combination of the basis vectors. This method minimizes the mean-square error between the data and the representation with a given number of eigenvectors.

If the data is concentrated in a linear subspace, this method provides a way to compress data without losing much information and simplifies the representation. By picking the eigenvectors having the largest eigenvalues, we lose as little information as possible in the mean-square sense.

The k-nearest neighbor algorithm [26] has been used in many applications in the field of data mining, statistical pattern recognition, image processing, and many others. The purpose of this algorithm is to classify a new object based on attributes and training samples. The classifiers do not use any model to fit; they are based only on memory. Given a query point, we find the k objects (training points) closest to the query point. The k-NN classifier decides the class by considering the votes of the k (an odd number) nearest neighbors. The nearest


Figure 2-2. k-nearest neighbor classification example.

neighbor is picked as the training data geometrically closest to the test data in the feature space, as illustrated in Figure 2-2.

In this work, a vector of the application's resource consumption snapshots is used to represent the application. Each snapshot consists of a chosen set of performance metrics. The PCA is used to preprocess the raw data into independent features for the classifier. Then, a 3-NN classifier is used to classify each snapshot. The majority vote of the snapshots' classes is used to represent the class of the application: CPU-intensive, I/O and paging-intensive, network-intensive, or idle. A machine with no load except for background load from system daemons is considered to be in the idle state.

The application classification model is shown in Figure 2-3. In addition, a monitoring


Figure 2-3. Application classification model. The Performance profiler collects performance metrics of the target application node. The Classification center classifies the application using extracted key components and performs statistical analysis of the classification results. The Application DB stores the application class information. (m is the number of snapshots taken in one application run, t0 and t1 are the beginning and ending times of the application execution, and VM IP is the IP address of the application's host machine.)

system is used to sample the system performance of a computing node running an application of interest.


The Ganglia [27] distributed monitoring system is used to monitor application nodes. The performance sampler takes snapshots of the performance metrics collected by Ganglia at a predefined frequency (currently, 5 seconds) between the application's starting time t0 and ending time t1. Since Ganglia uses multicast based on a listen/announce protocol to monitor the machine state, the collected samples consist of the performance data of all the nodes in a subnet. The performance filter extracts the snapshots of the target application for future processing. At the end of profiling, an application performance data pool is generated. The data pool consists of a set of n-dimensional samples A_{n x m} = (a_1, a_2, ..., a_m), where m = (t1 - t0)/d is the number of snapshots taken in one application run and d is the sampling time interval. Each sample a_i consists of n performance metrics, which include all the default 29 metrics monitored by Ganglia and the 4 metrics that we added based on the need of classification, including the number of I/O blocks read from/written to disk, and the number of memory pages swapped in/out. A program was developed to collect these four metrics (using vmstat) and the metrics were added to the metric list of Ganglia's gmond.

The chosen metrics are listed in Table 2-1. Each pair of the performance metrics correlates to the resource consumption behavior of a specific application class and has limited redundancy.
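The profiler's output described above can be sketched as follows. This is an illustrative sketch only: the sample_metrics() stub and random values stand in for querying Ganglia, and the t0/t1/d values are assumptions.

```python
import random

def sample_metrics(n=33):
    """Stub for one snapshot of the n performance metrics (29 Ganglia defaults
    plus the 4 added I/O and swap metrics); random values stand in for real data."""
    return [random.random() for _ in range(n)]

t0, t1, d = 0, 50, 5                    # start time, end time, sampling interval
m = (t1 - t0) // d                      # number of snapshots in one run
A = [sample_metrics() for _ in range(m)]  # data pool A: m samples of n metrics
print(len(A), len(A[0]))                # 10 33
```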


Figure 2-4. Performance feature space dimension reductions in the application classification process. m: the number of snapshots taken in one application run; n: the number of performance metrics; A_{n x m}: all performance metrics collected by the monitoring system; A'_{p x m}: the selected relevant performance metrics after the zero-mean and unit-variance normalization; B_{q x m}: the extracted key component metrics; C_{1 x m}: the class vector of the snapshots; Class: the application class, which is the majority vote of the snapshots' classes.

For example, the performance metrics CPU_System and CPU_User are correlated to CPU-intensive applications; Bytes_In and Bytes_Out are correlated to Network-intensive applications; IO_BI and IO_BO are correlated to IO-intensive applications; Swap_In and Swap_Out are correlated to Memory-intensive applications. The data preprocessor extracts these eight metrics of the target application node from the data pool based on our expert knowledge. Thus it reduces the dimension of the performance metrics from n = 33 to p = 8 and generates A'_{p x m}, as shown in Figure 2-4. In addition, the preprocessor also normalizes the selected metrics to zero mean and unit variance.

The PCA processor takes the performance metrics in Table 2-1 as inputs. It conducts the linear transformation of the performance data and selects the principal components based on the predefined minimal fraction variance. In our implementation, the minimal fraction variance was set to extract exactly two principal components. Therefore, at the end of processing, the data dimension gets further reduced from p = 8 to q = 2 and the vector B_{q x m} is generated, as shown in Figure 2-4.
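The normalization and variance-based component selection above can be sketched as a short pipeline. The random matrix and the 0.5 variance-fraction threshold below are illustrative assumptions, not the implementation's actual data or threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
A_sel = rng.normal(size=(60, 8))        # stand-in for m=60 snapshots of p=8 metrics

# Normalize each metric to zero mean and unit variance across the snapshots.
A_norm = (A_sel - A_sel.mean(axis=0)) / A_sel.std(axis=0)

# Keep the fewest principal components whose cumulative variance fraction
# reaches the (assumed) 0.5 threshold.
vals, vecs = np.linalg.eigh(np.cov(A_norm, rowvar=False))
order = np.argsort(vals)[::-1]
frac = np.cumsum(vals[order]) / vals.sum()
q = int(np.searchsorted(frac, 0.5) + 1)   # smallest q covering the threshold
B = A_norm @ vecs[:, order[:q]]           # reduced q-dimensional data B_{q x m}
print(B.shape[0], q < 8)                  # 60 True
```

In the dissertation's setting the threshold was tuned so that exactly q = 2 components were extracted; with uncorrelated random data the selected q is larger.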


Table 2-1. Performance metric list
  Metric            Description
  CPU System/User   Percent CPU System/User
  Bytes In/Out      Number of bytes per second into/out of the network
  IO BI/BO          Blocks sent to/received from block device (blocks/s)
  Swap In/Out       Amount of memory swapped in/out from/to disk (kB/s)

PostMark [28] is used to represent the IO-intensive class. SPECseis96 [29], a scientific computing intensive program, is used to represent the CPU-intensive class. A synthetic application, Pagebench, is used to represent the Paging-intensive class. It initializes and updates an array whose size is bigger than the memory of the VM, thereby inducing frequent paging activity. Ettcp [30], a benchmark that measures the network throughput over TCP or UDP between two nodes, is used as the training application of the Network-intensive class. The performance data of all these four applications and the idle state are used to train the classifier. For each test data point, the trained classifier calculates its distance to all the training data. The 3-NN classification identifies the three training data sets with the shortest distance to the test data. Then the test data's class is decided by the majority vote of the three nearest neighbors.

(see Figure 2-4). In addition to a single value (Class), the application classifier also


Table 2-2 summarizes the set of applications used as the training and the testing applications in the experiments [28-38]. The 3-NN classifier was trained with the performance data collected from the executions of the training applications highlighted in the table. All the application executions were hosted by a VMware GSX virtual machine (VM1). The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. In addition, a second virtual machine with the same specification was used to run the server applications of the network benchmarks.
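The distance computation and majority vote of the 3-NN classification described above can be sketched in a few lines. The two-feature training points and class labels below are made-up stand-ins for the PCA-extracted snapshot features, not the experiments' real data.

```python
from collections import Counter
import math

# Hypothetical training set: (feature vector, class label) pairs.
train = [((0.1, 0.2), "cpu"), ((0.2, 0.1), "cpu"), ((0.0, 0.3), "cpu"),
         ((5.0, 5.1), "io"),  ((5.2, 4.9), "io"),  ((4.8, 5.0), "io")]

def knn_classify(x, k=3):
    """Label x by majority vote of its k geometrically nearest neighbors."""
    nearest = sorted(train, key=lambda p: math.dist(x, p[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

print(knn_classify((0.15, 0.25)))  # cpu
print(knn_classify((5.1, 5.0)))    # io
```

With k odd and per-snapshot votes aggregated the same way, the classifier needs no fitted model, only the stored training data, matching the memory-based description above.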


Table 2-2. List of training and testing applications


The eight metrics were selected from Table 2-1 based on the expert knowledge of the correlation between these metrics and the application classes. After that, the PCA processor conducted the linear transformation of the performance data and selected principal components based on the minimal fraction variance defined. In this experiment, the variance contribution threshold was set to extract two (q = 2) principal components. It helps to reduce the computational requirements of the classifier. Then, the trained 3-NN classifier conducts classification based on the data of the two principal components.

The training data's class clustering diagram is shown in Figure 2-5(a). The diagram shows a PCA-based two-dimensional representation of the data corresponding to the five classes targeted by our system. After being trained with the training data, the classifier classifies the remaining benchmark programs shown in Table 2-2. The classifier provides outputs in two kinds of formats: the application class-clustering diagram, which helps to visualize the classification results, and the application class composition, which can be used to calculate the unit application cost.

Figure 2-5 shows the sample clustering diagrams for three test applications. For example, the interactive VMD application (Figure 2-5(d)) shows a mix of the idle class when the user is not interacting with the application, the I/O-intensive class when the user is uploading an input file, and the Network-intensive class while the user is interacting with the GUI through a VNC remote display. Table 2-3 summarizes the class compositions of all the test applications. Figure 2-6 visualizes the class composition of some sample benchmark programs. These classification results match the class expectations gained from empirical experience with these programs. They are used to calculate the unit application cost shown in section 4.4.

Figure 2-5. Sample clustering diagrams of application classifications. A) Training data: Mixture. B) SimpleScalar: CPU-intensive. C) Autobench: Network-intensive. D) VMD: Interactive. Principal Components 1 and 2 are the principal component metrics extracted by PCA.

Table 2-3. Experimental data: application class compositions

2-3, when SPECseis96 with medium-size input data was executed in VM1 with 256MB memory (SPECseis96_A), it is classified as a CPU-intensive application. In the SPECseis96_B experiment, the smaller physical memory (32MB) resulted in increased paging and I/O activity. The increased I/O activity is due to the fact that less physical memory is available to the O/S buffer cache for I/O blocks. The buffer cache size at runtime was observed to be as small as 1MB in SPECseis96_B, and as large as 200MB in SPECseis96_A. In addition, the execution time increased from 291 minutes and 42 seconds in the first case to 426 minutes and 58 seconds in the second case.

Similarly, in the experiments with PostMark, different execution environment configurations changed the application's resource consumption pattern from one class to another. Table 2-3 shows that if a local file directory was used to store the files to be read and written during the program execution, the PostMark benchmark showed the resource consumption pattern of the I/O-intensive class. In contrast, with an NFS-mounted file directory, it (PostMark_NFS) was turned into a Network-intensive application.

The first set of experiments demonstrates that the application class information can help the scheduler to optimize resource sharing among applications running in parallel to improve system throughput and reduce throughput variances. In the experiments, three

Figure 2-6. Application class composition diagram

applications, SPECseis96 (S) with small data size, PostMark (P) with a local file directory, and NetPIPE Client (N), were selected, and three instances of each application were executed. The scheduler's task was to decide how to allocate the nine application instances to run on the 3 virtual machines (VM1, VM2 and VM3) in parallel, each of which hosted 3 jobs. VM4 was used to host the NetPIPE server. There are ten possible schedules available, as shown in Figure 2-7.

When multiple applications run on the same host machine at the same time, there are resource contentions among them. Two scenarios were compared: in the first scenario, the scheduler did not use class information, and one of the ten possible schedules was

Figure 2-7. System throughput comparisons for ten different schedules. 1: {(SSS),(PPP),(NNN)}, 2: {(SSS),(PPN),(PNN)}, 3: {(SSP),(SPP),(NNN)}, 4: {(SSP),(SPN),(PNN)}, 5: {(SSP),(SNN),(PPN)}, 6: {(SSN),(SPP),(PNN)}, 7: {(SSN),(SPN),(PPN)}, 8: {(SSN),(SNN),(PPP)}, 9: {(SPP),(SPN),(SNN)}, 10: {(SPN),(SPN),(SPN)}. S: SPECseis96 (CPU-intensive), P: PostMark (I/O-intensive), N: NetPIPE (Network-intensive).

selected at random. The other scenario used application class knowledge, always allocating applications of different classes (CPU, I/O and network) to run on the same machine (Schedule 10, Figure 2-7). The system throughputs obtained from runs of all possible schedules in the experimental environment are shown in Figure 2-7.

The average system throughput of the schedule chosen with class knowledge was 1391 jobs per day. It achieved the highest throughput among the ten possible schedules, 22.11% larger than the weighted average of the system throughputs of all ten possible schedules. In addition, the random selection of the possible schedules resulted in large variances of system throughput. The application class information can be used to help the scheduler pick the optimal schedule consistently. The application throughput comparison of different schedules on one machine is shown in Figure 2-8. It compares the

Figure 2-8. Application throughput comparisons of different schedules. MIN, MAX, and AVG are the minimum, maximum, and average application throughputs of all ten possible schedules. SPN is the proposed schedule 10, {(SPN),(SPN),(SPN)}, in Figure 2-7.

Table 2-4. System throughput: concurrent vs. sequential executions

                 CH3D         PostMark     Time Taken to
                 Time (sec)   Time (sec)   Finish 2 Jobs
    Concurrent   310          613
    Sequential   264          752

throughput of schedule ID 10 (labeled SPN in Figure 2-8) with the minimum, maximum, and average throughputs of all ten possible schedules. By allocating jobs from different classes to the machine, the three applications' throughputs were higher than average by different degrees: SPECseis96-Small by 24.90%, PostMark by 48.13%, and NetPIPE by 4.29%. Figure 2-8 also shows that the maximum application throughputs were achieved by sub-schedule (SSN) for SPECseis96 and (PPN) for NetPIPE instead of the proposed (SPN). However, the low throughputs of the other applications in those sub-schedules make their total throughputs sub-optimal.
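The ten schedules of Figure 2-7 are exactly the distinct ways of splitting the nine job instances (three each of S, P and N) across three identical three-slot machines, which a short enumeration can confirm:

```python
from itertools import combinations_with_replacement
from collections import Counter

# All multisets of three jobs drawn from {S, P, N}: SSS, SSP, ..., NNN.
triples = {"".join(t) for t in combinations_with_replacement("SPN", 3)}

# A schedule assigns three such triples to three identical machines (order
# irrelevant) so that exactly three S, three P, and three N jobs are placed.
schedules = [
    combo
    for combo in combinations_with_replacement(sorted(triples), 3)
    if Counter("".join(combo)) == Counter("SSSPPPNNN")
]

print(len(schedules))  # 10, matching the ten schedules in Figure 2-7
```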

2-4. The experiment results show that the execution efficiency losses caused by the relatively moderate resource contentions between applications of different classes were offset by the gains from the utilization of idle capacity. The resource sharing of applications of different classes improved the overall system throughput.

2-3 was running on an Intel(R) Pentium(R) 4 CPU 1.70GHz machine with 512MB memory. In addition, the application classifier was running on an Intel(R) Pentium(R) III 750MHz machine with 256MB RAM.

In this experiment, a total of 8000 snapshots were taken at five-second intervals for the virtual machine, which hosted the execution of SPECseis96 (medium). It took the performance filter 72 seconds to extract the performance data of the target application VM. In addition, it took another 50 seconds for the classification center to train the classifier and perform the PCA feature selection and the application classification. Therefore the unit classification cost is about 15 ms per data sample, demonstrating that it is possible to consider the classifier for online training.

[39][25] and classification techniques have been applied to many areas successfully, such as intrusion detection [40][41][42][43], text categorization [44], and image and speech analysis. Kapadia's evaluation of learning algorithms for application performance prediction in [45] shows that the nearest-neighbor algorithm has better

[45]. This thesis differs from Kapadia's work in the following ways. First, the application class knowledge is used to facilitate the resource scheduling to improve the overall system throughput, in contrast with Kapadia's work, which focuses on application CPU time prediction. Second, the application classifier takes performance metrics as inputs; in contrast, in [45] the CPU time prediction is based on the input parameters of the application. Third, the application classifier employs PCA to reduce the dimensionality of the performance feature space, which is especially helpful when the number of input features of the classifier is not trivial.

Condor uses process checkpoint and migration techniques [20] to allow an allocation to be created and preempted at any time. The transfer of checkpoints may occupy significant network bandwidth. Basney's study in [46] shows that co-scheduling of CPU and network resources can improve the Condor resource pool's goodput, which is defined as the allocation time during which a remotely executing application uses the CPU to make forward progress. The application classifier presented in this thesis performs learning of the application's resource consumption of memory and I/O in addition to CPU and network usage. It provides a way to extract the key performance features and generate an abstract of the application's resource consumption pattern in the form of an application class. The application class information and resource consumption statistics can be used together with recent multi-lateral resource scheduling techniques, such as Condor's Gang-matching [47], to facilitate the resource scheduling and improve system throughput.

Conservative Scheduling [4] uses the prediction of the average and variance of the CPU load at some future point of time and time interval to facilitate scheduling. The application classifier shares the common technique of resource consumption pattern analysis over a time window, which is defined as the time of one application run. However, the application classifier is capable of taking into account usage patterns of multiple kinds of resources, such as CPU, I/O, network and memory.

[48] uses a synthetic skeleton program to reproduce the CPU utilization and communication behaviors of message-passing parallel programs to predict application performance. In contrast, the application classifier provides application behavior learning in more dimensions.

Prophesy [49] employs a performance-modeling component, which uses coupling parameters to quantify the interactions between kernels that compose an application. However, to be able to collect data at the level of basic blocks, procedures, and loops, it requires the insertion of instrumentation code into the application source code. In contrast, the classification approach uses the system performance data collected from the application host to infer the application's resource consumption pattern. It does not require modification of the application source code.

Statistical clustering techniques have been applied to learn application behavior at various levels. Nickolayev et al. applied clustering techniques to efficiently reduce the processor event trace data volume in cluster environments [50]. Ahn and Vetter conducted application performance analysis by using clustering techniques to identify the representative performance counter metrics [51]. Both Cohen and Chase's work [52] and ours perform statistical clustering using system-level metrics. However, their work focuses on system performance anomaly detection, whereas ours focuses on application classification for resource scheduling.

Our work can be used to learn the resource consumption patterns of a parallel application's child processes and a multi-stage application's sub-stages. However, in this study we focus on sequential and single-stage applications.

In this work, the input performance metrics are selected manually based on expert knowledge. In the next chapter, techniques for automatically selecting features for application classification are discussed.

Application classification techniques based on monitoring and learning of resource usage (e.g., CPU, memory, disk, and network) have been proposed in Chapter 2 to aid in resource scheduling decisions. An important problem that arises in application classifiers is how to decide which subset of the numerous performance metrics collected from monitoring tools should be used for the classification. This chapter presents an approach based on a probabilistic model (the Bayesian Network) to systematically select the representative performance features, which can provide optimal classification accuracy and adapt to changing workloads.

[53]. Well-known monitoring tools such as the open source packages Ganglia [54] and dproc [55], and commercial products such as HP's OpenView [56], provide the capability of monitoring a rich set of system-level performance metrics. An important problem that arises is how to decide which subset of the numerous performance metrics collected from monitoring tools should be used for the classification in a dynamic environment. In this chapter we address this problem. Our approach is based on autonomic feature selection and can help to improve the system's self-manageability [1] by reducing the reliance on expert knowledge and increasing the system's adaptability.

The need for autonomic feature selection and application classification is motivated by systems such as VMPlant [16], which provides automated resource provisioning of Virtual Machines (VMs). In the context of VMPlant, the application can be scheduled to run on a dedicated virtual machine, whose system-level performance metrics reflect the application's

To build an autonomic classification system with self-configurability, it is critical to devise a systematic feature selection scheme that can automatically choose the most representative features for application classification and adapt to changing workloads. This chapter presents an approach of using a probabilistic model, the Bayesian Network, to automatically select the performance metrics that correlate with the application classes and optimize the classification accuracy. The approach also uses the Mahalanobis distance to support online selection of training data, which enables the feature selection to adapt to dynamic workloads. In the rest of this dissertation, we use the terms "metrics" and "features" interchangeably.

In Chapter 2, a subset of performance metrics was manually selected based on expert knowledge to correlate to the resource consumption behavior of the application class. However, expert knowledge is not always available. In the case of highly dynamic workloads or massive volumes of performance data, manual configuration by a human expert is also not feasible. These present a need for a systematic way to select the representative metrics in the absence of sufficient expert knowledge. On the other hand, the use of the Bayesian Network leaves the option open to integrate expert knowledge with the automatic feature selection to improve the classification accuracy and efficiency.

Feature selection based on statically selected application performance data, which are used as the training set, may not always provide optimal classification results in dynamic environments. To enable the feature selection to adapt to the changing workload, the system must be able to dynamically update the training set with data from the recent workload. A question that arises is how to decide which data should be selected as training data. In this work, an algorithm based on the Mahalanobis distance is used

Our experimental results show the following. First, we observe correlations between pairs of selected performance metrics, which justifies the use of the Mahalanobis distance as a means of taking the correlation into account in the training data selection process. Second, there is a diminishing return of the classification utility function (i.e., the ratio of classification accuracy over the number of selected metrics) as more features are selected. The experiments showed that above 90% application classification accuracy can be achieved with a small subset of performance metrics which are highly correlated with the application class. Third, the application classification based on the selected features for a set of benchmark programs and scientific applications matched our empirical experience with these applications.

The rest of the chapter is organized as follows: The statistical techniques used are described in Section 3.2. Section 3.3 presents the feature selection model. Section 3.4 presents and discusses the experimental results. Section 3.5 discusses related work. Conclusions and future work are discussed in Section 3.6.

3.2.1 Feature Selection

[57]. Subset generation is a process of heuristic search of candidate subsets. Each subset is evaluated based on the evaluation criterion. Then the evaluation result is compared with the previously computed best result. If it is better, it replaces the best result, and the process continues until the stop criterion is reached. The selection result is validated by different tests or prior knowledge.
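The diminishing return of the utility function mentioned above (classification accuracy divided by the number of selected metrics) is easy to see numerically. The accuracy figures below are invented to mirror the reported trend, not taken from the experiments:

```python
# Hypothetical accuracy of the classifier as features are added one at a time.
accuracy = {1: 0.62, 2: 0.87, 3: 0.91, 4: 0.93, 5: 0.935}

# Utility = classification accuracy / number of selected metrics.
utility = {n: round(acc / n, 3) for n, acc in accuracy.items()}
print(utility)  # accuracy keeps rising, but each extra metric adds less and less
```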

3.3.2.

[58]. It can be used to compute the conditional probability of a node, given the values of its predecessors; hence, a BN can be used as a classifier that gives the posterior probability distribution of the class decision node given the values of the other nodes.

Bayesian Networks are based on the work of the mathematician and theologian Rev. Thomas Bayes, who worked with conditional probability theory in the late 1700s to discover a basic law of probability, which was then called Bayes' rule. Bayes' rule includes a hypothesis, past experience, and evidence:

    P(H|E,c) = P(E|H,c) P(H|c) / P(E|c)

where we can update our belief in hypothesis H given the additional evidence E and the background context (past experience), c.

The left-hand term, P(H|E,c), is called the posterior probability, or the probability of hypothesis H after considering the effect of the evidence E on past experience c.

The term P(H|c) is called the a-priori probability of H given c alone.

The term P(E|H,c) is called the likelihood and gives the probability of the evidence assuming the hypothesis H and the background information c are true.
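A tiny numeric instance of Bayes' rule can make the terms concrete. The background context c is left implicit, and the priors and likelihoods below are invented for illustration:

```python
# P(H|E) = P(E|H) * P(H) / P(E), with P(E) obtained by summing over the
# competing hypotheses (here, two application classes).
prior = {"cpu": 0.5, "io": 0.5}          # P(H): belief before seeing evidence
likelihood = {"cpu": 0.9, "io": 0.2}     # P(E|H): e.g. a high cpu_system reading

evidence = sum(likelihood[h] * prior[h] for h in prior)              # P(E)
posterior = {h: likelihood[h] * prior[h] / evidence for h in prior}  # P(H|E)

print(posterior)  # belief shifts strongly toward the "cpu" hypothesis
```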

Bayesian Networks capture Bayes' rule in a graphical model. They are very effective for modeling situations where some information is already known and incoming data is uncertain or partially unavailable (unlike rule-based or "expert" systems, where uncertain or unavailable data results in ineffective or inaccurate reasoning). This robustness in the face of imperfect knowledge is one of the many reasons why Bayesian Networks are increasingly used as an alternative to other AI representational formalisms. Bayesian networks have been applied to many areas successfully, including map learning [59], medical diagnosis [60][61], and speech and vision processing [62][63]. Compared with other predictive models, such as decision trees and neural networks, and the standard feature selection model that is based on Principal Component Analysis (PCA), Bayesian networks also have the advantage of interpretability. Human experts can easily understand the network structure and modify it to obtain better predictive models. By adding decision nodes and utility nodes, BN models can also be extended to decision networks for decision analysis [64].

Consider a domain U of n variables, x1, ..., xn. Each variable may be discrete, having a finite or countable number of states, or continuous. Given a subset X of variables xi, where xi ∈ U, if one can observe the state of every variable in X, then this observation is called an instance of X and is denoted as X = kX for the observations xi = ki, xi ∈ X. The "joint space" of U is the set of all instances of U. p(X = kX | Y = kY, ξ) denotes the "generalized probability density" that X = kX given Y = kY for a person with current state of information ξ. p(X | Y, ξ) then denotes the "Generalized Probability Density Function" (gpdf) for X, given all possible observations of Y. The joint gpdf over U is the gpdf for U.

A Bayesian network for domain U represents a joint gpdf over U. This representation consists of a set of local conditional gpdfs combined with a set of conditional independence

Figure 3-1. Sample Bayesian network generated by the feature selector

assertions that allow the construction of a global gpdf from the local gpdfs. As shown previously, the chain rule of probability can be used to ascertain these values:

    p(x1, ..., xn) = ∏_{i=1..n} p(xi | x1, ..., xi-1)        (3-1)

One assumption imposed by Bayesian Network theory (and indirectly by the Product Rule of probability theory) is that for each variable xi, Πi ⊆ {x1, ..., xi-1} must be a set of variables that renders xi and {x1, ..., xi-1} conditionally independent. In this way:

    p(x1, ..., xn) = ∏_{i=1..n} p(xi | Πi)        (3-2)

A Bayesian Network Structure then encodes the assertions of conditional independence in Equation 3-1 above. Essentially, then, a Bayesian Network Structure BS is a directed acyclic graph such that each variable in U corresponds to a node in BS, and the parents of the node corresponding to xi are the nodes corresponding to the variables in Πi.

Depending on the problem that is defined, either (or both) of the topology and the probability distribution of the Bayesian Network can be pre-defined by hand or may be

Figure 3-1 gives a sample BN learned in the experiment. The root is the application class decision node, which is used to decide an application class given the values of the leaf nodes. The root node is the parent of all other nodes. The leaf nodes represent selected performance metrics, such as network packets sent and bytes written to disk. They are connected one to another in a series.

[22][65]. For example, if x1 and x2 are two points from the distribution which is characterized by the covariance matrix Σ, then the quantity

    ((x1 - x2)^T Σ^{-1} (x1 - x2))^{1/2}

is called the Mahalanobis distance from x1 to x2, where T denotes the transpose of a matrix.

In the cases where there are correlations between variables, simple Euclidean distance is not an appropriate measure, whereas the Mahalanobis distance can adequately account for the correlations and is scale-invariant. Statistical analysis of the performance data in Section 3.4.3 shows that there are correlations between the application performance metrics to various degrees. Therefore, the Mahalanobis distance between the unlabeled performance sample and the class centroid, which represents the average of all existing training data of the class, is used in the training data qualification process in Section 3.3.1.

[66] is commonly used to evaluate the performance of classification systems. It shows the predicted and actual classifications done by the system. The matrix size is L x L, where L is the number of different classes. In our case, where there are five target application classes, L is equal to 5.
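Accuracy can be read off such a matrix as the fraction of the total count that lies on the diagonal (predicted class equals actual class). The counts below are invented for a two-class example:

```python
# Confusion matrix keyed by (actual, predicted); counts are made up.
confusion = {
    ("negative", "negative"): 50,
    ("negative", "positive"): 3,
    ("positive", "negative"): 2,
    ("positive", "positive"): 45,
}

correct = sum(n for (actual, predicted), n in confusion.items()
              if actual == predicted)
accuracy = correct / sum(confusion.values())
print(accuracy)  # (50 + 45) / 100 = 0.95
```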

Table 3-1 shows a sample confusion matrix with L = 2. There are only two possible classes in this example: positive and negative. Therefore, its classification accuracy can be calculated as (a+d)/(a+b+c+d).

Table 3-1. Sample confusion matrix with two classes (L = 2)

    Actual                Predicted
    Class       Negative   Positive
    Negative    a          b
    Positive    c          d

Figure 3-2 shows the autonomic feature selection framework in the context of application classification. In this section, we focus on introducing the classification training center, which enables the self-configurability for online application classification. The training center has two major functions: quality assurance of the training data, which enables the classifier to adapt to changing workloads, and systematic feature selection, which supports automatic feature selection. The training center consists of three components: the data quality assuror, the feature selector, and the trainer.

The training data pool consists of representative data of five application classes including CPU-intensive, I/O-intensive, memory-intensive, network-intensive, and idle. The training data of each class c is a set of Kc m-dimensional points, where m is the number of application-specific performance metrics reported by the monitoring tools. To select the

Figure 3-2. Feature selection model. The Performance profiler collects performance metrics of the target application node. The Application classifier classifies the application using the extracted key components and performs statistical analysis of the classification results. The Data QA selects the training data for the classification. The Feature selector selects performance metrics which can provide optimal classification accuracy. The Trainer trains the classifier using the selected metrics of the training data. The Application DB stores the application class information. (t0/t1 are the beginning/ending times of the application execution; VM IP is the IP address of the application's host machine.)

training data from the application snapshots, only n out of m metrics are extracted based on the previous feature selection result to form a set of Kc n-dimensional training points xic(t), i = 1, ..., Kc, that comprise a cluster Cc. From [50], it follows that the n-tuple

    μc(t) = (1/Kc) Σ_{i=1..Kc} xic(t)        (3-5)

is called the centroid of the cluster Cc.

The training data selection is a three-step process. First, the Data QA extracts the n out of m metrics of the input performance snapshot to form a training data candidate; thus each candidate is represented by an n-dimensional point x = (x1, x2, ..., xn). Second, it evaluates whether the input candidate is qualified to be training data representing one of the application classes. Last, the qualified training data candidate is associated with a scalar value Class, which defines the application class.

The first step is straightforward. In the second and third steps, the Mahalanobis distance between the training data candidate x and the centroid μc of cluster Cc is calculated as follows:

    dc(x) = ((x - μc)^T Σc^{-1} (x - μc))^{1/2}

where c = 1, 2, ..., 5 represents the application class and Σc^{-1} denotes the inverse covariance matrix of the cluster Cc. The distance from the training data candidate x to the boundary between two class clusters, for example C1 and C2, is |d1(x) - d2(x)|. If |d1(x) - d2(x)| = 0, the candidate is exactly at the boundary between classes 1 and 2. The further away the candidate is from the class boundaries, the better it can represent a class; in other words, there is less probability for it to be misclassified. Therefore, the Data QA calculates the distance from the candidate to the boundaries of all possible pairs of the classes. If the minimal distance to the class boundaries, min(|d1-d2|, |d1-d3|, ..., |d4-d5|), is bigger than a predefined threshold, the corresponding m-dimensional snapshot of the candidate is determined to be qualified training data of

the class whose centroid has the smallest Mahalanobis distance min(d1, d2, ..., d5) to the snapshot. Automated and adaptive threshold setting is discussed in detail in [67].

In our implementation, Ganglia is used as the monitoring tool and twenty (m = 20) performance metrics, which are related to resource usage, are included in the training data. These performance metrics include 16 out of the 33 default metrics monitored by Ganglia and 4 metrics that we added based on the needs of classification. The four metrics include the number of I/O blocks read from/written to disk, and the number of memory pages swapped in/out. A program was developed to collect these four metrics (using vmstat) and add them to the metric list of Ganglia's monitoring daemon gmond. Table 3-2 shows some sample performance metrics of the training candidate.

The first round of quality assurance was performed by a human expert at initialization. The subsequent assurance can be conducted automatically by following the above steps to select representative training data for each class.

Table 3-2. Sample performance metrics in the original feature set

    Metrics                  Description
    cpu_system/user/idle     Percent CPU system/user/idle
    cpu_nice                 Percent CPU nice
    bytes_in/out             Number of bytes per second into/out of the network
    io_bi/bo                 Blocks sent to/received from a block device (blocks/s)
    swap_in/out              Amount of memory swapped in/out from/to disk (kB/s)
    pkts_in/out              Packets in/out per second
    proc_run                 Total number of running processes
    load_one/five/fifteen    One/Five/Fifteen minute load average
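The Data QA's qualification test described above can be sketched end to end: compute the Mahalanobis distance from a candidate snapshot to each class centroid, then accept it for the nearest class only if the smallest boundary distance |di - dj| exceeds the threshold. Everything below is invented for illustration: two classes instead of five, 2-D points instead of n dimensions, and made-up centroids, covariances, and threshold.

```python
import math
from itertools import combinations

def mahalanobis_2d(x, centroid, cov):
    """((x - c)^T S^-1 (x - c))^(1/2), inverting the 2x2 covariance analytically."""
    (a, b), (c2, d) = cov
    det = a * d - b * c2
    inv = ((d / det, -b / det), (-c2 / det, a / det))
    dx = (x[0] - centroid[0], x[1] - centroid[1])
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def qualify(x, clusters, threshold):
    """Return the class label for x, or None if x lies too close to a boundary."""
    d = {cls: mahalanobis_2d(x, cen, cov) for cls, (cen, cov) in clusters.items()}
    gap = min(abs(d[i] - d[j]) for i, j in combinations(d, 2))
    return min(d, key=d.get) if gap > threshold else None

clusters = {  # invented 2-D centroids and covariance matrices for two classes
    "cpu": ((0.9, 0.2), ((0.01, 0.0), (0.0, 0.01))),
    "io":  ((0.1, 0.8), ((0.01, 0.0), (0.0, 0.01))),
}
print(qualify((0.85, 0.25), clusters, threshold=1.0))  # well inside the cpu cluster
print(qualify((0.50, 0.50), clusters, threshold=1.0))  # ambiguous, so rejected
```

Rejecting boundary cases keeps ambiguous snapshots out of the training pool, which is exactly what lets the subsequent rounds of quality assurance run without a human expert.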

    Input:  C: training data set with N features
            Class: class of training data (teacher for learning)
    Output: Sbest: selected feature subset
            Amax: maximum accuracy

    D = discretize(C);                      // convert continuous to discrete features
    Sbest = {}; Amax = 0;
    repeat
        initialize Anode = 0;               // max accuracy for each node
        initialize Fnode = 0;               // selected feature for each node
        foreach F in ({F0, F1, ..., FN-1} - Sbest) do
            Accuracy = eval(D, Class, Sbest ∪ {F});   // evaluate Bayesian network with extra feature F
            if Accuracy > Anode then
                Anode = Accuracy; Fnode = F;          // store the current feature
            end
        end
        if Anode > Amax then
            Sbest = Sbest ∪ {Fnode}; Amax = Anode; Anode = Anode + 1;
        end
    until (Anode <= Amax);
    end

Figure 3-3. Bayesian-network based feature selection algorithm for application classification

collected from monitoring tools. By filtering out metrics which contribute less to the classification, it can help to not only reduce the computational complexity of subsequent classifications, but also improve classification accuracy.

In our previous work [53], representative features were selected manually based on expert knowledge. For example, the performance metrics cpu_system and cpu_user are correlated to the behavior of CPU-intensive applications; bytes_in and bytes_out are correlated to network-intensive applications; io_bi and io_bo are correlated to I/O-intensive applications; swap_in and swap_out are correlated to memory-intensive applications. However, to support on-line classification, it is necessary for feature selection to have the ability to adapt to changing workloads. Therefore, the static selection

A wrapper algorithm based on the Bayesian network is employed by the feature selector to conduct the feature selection. As introduced in Section 3.2.1, although this feature selection scheme reduces the reliance on human experts' knowledge, the Bayesian network's interpretability leaves the option open to integrate expert knowledge into the selection scheme to build a better classification model.

Figure 3-3 shows the feature selection algorithm. It starts with an empty feature subset Sbest = {}. To search for the best feature F, it uses the temporary feature set {Sbest ∪ F} to perform Bayesian Network classification for the discrete training data D. The classification accuracy is calculated by comparing the classification result and the true answer of the Class information contained in the training data. After the evaluation of accuracy using all remaining features ({F1, F2, ..., FN-1} - Sbest), the best accuracy is stored in Anode. If Anode is better than the previous best accuracy Amax achieved, the corresponding feature node is added to the feature subset to form the new subset. This process is repeated until the classification accuracy cannot be improved any more by adding any of the remaining features into the subset.
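The greedy wrapper loop of Figure 3-3 can be sketched as below. Here `fake_accuracy` is only a stand-in for the Bayesian-network evaluation `eval(D, Class, Sbest ∪ {F})`; its scores are made up so that two metrics are informative and the rest act as noise, letting the sketch run on its own.

```python
# Greedy forward (wrapper) feature selection in the spirit of Figure 3-3.
def forward_select(features, evaluate):
    s_best, a_max = set(), 0.0
    while True:
        a_node, f_node = 0.0, None
        for f in features - s_best:
            acc = evaluate(s_best | {f})       # accuracy with one extra feature
            if acc > a_node:
                a_node, f_node = acc, f
        if f_node is None or a_node <= a_max:  # no further improvement: stop
            return s_best, a_max
        s_best, a_max = s_best | {f_node}, a_node

# Invented ground truth: only these two metrics carry class information.
useful = {"cpu_system": 0.45, "load_fifteen": 0.25}

def fake_accuracy(subset):
    # Base accuracy plus a gain per useful metric; noise features hurt slightly.
    return 0.2 + sum(useful.get(f, -0.01) for f in subset)

selected, acc = forward_select(
    {"cpu_system", "load_fifteen", "cpu_nice", "proc_run"}, fake_accuracy)
print(sorted(selected), round(acc, 2))
```

The loop adds `cpu_system` first (largest single-feature gain), then `load_fifteen`, and stops once the remaining features can only lower the score, mirroring the stop criterion of the figure.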

In the experiments, all the applications were executed in a VMware GSX 2.5 virtual machine with 256MB memory. The virtual machine was hosted on an Intel(R) Xeon(TM) dual-CPU 1.80GHz machine with 512KB cache and 1GB RAM. The CTC and application classifier were running on an Intel(R) Pentium(R) III 750MHz machine with 256MB RAM.

The first experiment was designed to show the relationship between classification accuracy and the number of features selected. The second experiment was designed to

Figure 3-4. Average classification accuracy of 10 sets of test data versus the number of features selected in the first experiment

Figure 3-5. Two-class test data distribution with the first two selected features

In the first experiment, the training data consist of performance snapshots of five classes of applications, including CPU-intensive, I/O-intensive, memory-intensive, and network-intensive applications, and the snapshots collected from an idle application-VM, which has only "background noise" from system activity (i.e., without any application execution during the monitoring period). The feature selector's task is to select those metrics which can be used to classify the test set into five classes with optimal accuracy.

In all ten iterations of cross validation, two performance metrics (cpu_system and load_fifteen) were always selected as the best two features. Figure 3-6 shows a sample test data distribution with these two features. If we project the data onto the x-axis or y-axis, we can see that it is more difficult to differentiate the data from each class by using either cpu_system or load_fifteen alone than by using both metrics. For example, the cpu_system value ranges of the network-intensive application and the I/O-intensive application largely overlap. This makes it hard to classify these two applications with only the cpu_system metric. Compared with the one-metric classification, it is much easier to decide which class the test data belong to by using information from both metrics. In other words, the combination of multiple features is more descriptive than a single feature.

The classification accuracy versus the number of features selected for the above learned Bayesian network is plotted in Figure 3-4. It shows that with a small number of features (3 to 4), above 90% classification accuracy can be achieved for this 5-class classification.

In the second experiment, the training data consist of performance snapshots of two classes of applications, I/O-intensive and memory-intensive. Figure 3-5 shows its test data distribution with the first two selected features, bytes_in and pkts_in. A comparison of Figure 3-6 and Figure 3-5 shows that with a reduced number of application classes, higher classification accuracy can be achieved with fewer features. For example,

Table 3-3. Confusion matrices of classification results with the expert-selected and automatically-selected feature sets. A) Automatic. B) Expert. The bold numbers along the diagonal are the numbers of correctly classified data.

in this experiment, if we know that the application belongs to either the I/O-intensive or the memory-intensive class, with two selected features 96% classification accuracy can be achieved, versus 87% accuracy in the 5-class case. This shows the potential of using pair-wise classification to improve the classification accuracy for multi-class cases. Using the pair-wise approach for multi-class classification is a topic of future research.

[53].

First, the training data distributions based on principal components, which are derived from the automatically selected features in Section 3.4.1 and the manually selected features in previous work [53], are shown in Figure 3-8. Distances between each pair of class centroids in Figure 3-8 are calculated and plotted in Figure 3-7. It shows that

Figure 3-6. Five-class test data distribution with the first two selected features

Figure 3-7. Comparison of distances between cluster centers derived from the expert-selected and automatically selected feature sets. 1: idle-cpu, 2: idle-I/O, 3: idle-net, 4: idle-mem, 5: cpu-I/O, 6: cpu-net, 7: cpu-mem, 8: I/O-net, 9: I/O-mem, 10: net-mem.

Figure 3-8. Training data clustering diagrams derived from the expert-selected and automatically selected feature sets. A) Automatic. B) Expert.

Second, the PCA- and k-NN-based classifications were conducted with both the expert-selected 8 features in previous work [53] and the automatically selected features in Section 3.4.1. Table 3-3 shows the confusion matrices of the classification results. If data are classified to the same classes as their actual classes, the classifications are considered correct. The classification accuracy is the proportion of the total number of classifications that were correct. The confusion matrices show that a classification accuracy of 98.05% can be achieved with the automatically selected feature set, which is similar to the 98.14% accuracy achieved with the expert-selected feature set. Thus the automatic feature selection that is based on the Bayesian Network can reduce the reliance on expert knowledge while offering competitive classification accuracy compared to manual selection by a human expert.

In addition, a set of 8 features selected in the 5-class feature selection experiment in Section 3.4.1 was used to configure the application classifier, and the same training data used in the feature selection experiment were used to train the application classifier. Then the trained classifier conducted classification for a set of three benchmark programs: SPECseis96 [29], PostMark and PostMark_NFS [28]. SPECseis96 is a scientific application which is computing-intensive but also exercises disk I/O in the initial and end phases of its execution. PostMark originally is a disk I/O benchmark program. In PostMark_NFS, a network file system (NFS) mounted directory was used to store the files which were read/written by the benchmark. Therefore, PostMark_NFS performs substantial network I/O rather than disk I/O. The classification results are shown in Figure 3-9. The results show that 86% of the SPECseis96 test data were classified as CPU-intensive, 95% of the PostMark data were classified as I/O-intensive, and 61% of the PostMark_NFS

Figure 3-9. Classification results of benchmark programs. A) SPECseis96. B) PostMark. C) PostMark_NFS. Principal components 1 and 2 are the principal component metrics extracted by PCA.

data were classified as network-intensive. The results matched our empirical experience with these programs and are close to the results of the expert-selected-feature based classification, which shows 85% CPU-intensive for SPECseis96, 97% I/O-intensive for PostMark, and 62% network-intensive for PostMark_NFS.

Table 3-4. Performance metric correlation matrices of the test applications. A) Correlation matrix of SPECseis96 performance data. B) Correlation matrix of PostMark performance data. C) Correlation matrix of NetPIPE performance data.

A)
    Metric     1      2      3      4      5      6
    1       1.00  -0.21  -0.34          0.20  -0.02
    2      -0.21   1.00  -0.16  -0.02  -0.17  -0.06
    3      -0.34  -0.16   1.00          0.20  -0.05
    4             -0.02          1.00  -0.19   0.04
    5       0.20  -0.17   0.20  -0.19   1.00   0.12
    6      -0.02  -0.06  -0.05   0.04   0.12   1.00

B)
    Metric     1      2      3      4      5      6
    1       1.00  -0.24   0.22   0.34  -0.08  -0.13
    2      -0.24   1.00  -0.22   0.18   0.04  -0.02
    3       0.22  -0.22   1.00   0.33   0.30   0.18
    4       0.34   0.18   0.33   1.00   0.42   0.47
    5      -0.08   0.04   0.30   0.42   1.00   0.20
    6      -0.13  -0.02   0.18   0.47   0.20   1.00

C)
    Metric     1      2      3      4      5      6
    1       1.00   0.29   0.31   0.48   0.27   0.30
    2       0.29   1.00   0.49   0.39   0.95   0.31
    3       0.31   0.49   1.00   0.50   0.52
    4       0.48   0.39   0.50   1.00   0.42   0.39
    5       0.27   0.95   0.52   0.42   1.00
    6       0.30   0.31          0.39          1.00

1: load_five, 2: pkts_in, 3: cpu_system, 4: load_fifteen, 5: pkts_out, 6: bytes_out. Correlations that are larger than 0.5 are highlighted with bold characters.
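A correlation matrix of the kind shown in Table 3-4 can be derived from raw metric samples with the Pearson coefficient. The snapshot values below are invented; the metric names follow the table's legend:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sample lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented five-snapshot samples for three of the monitored metrics.
metrics = {
    "pkts_in":    [10, 40, 35, 80, 60],
    "pkts_out":   [12, 38, 40, 75, 65],
    "cpu_system": [5, 9, 8, 4, 6],
}
names = list(metrics)
corr = {(a, b): round(pearson(metrics[a], metrics[b]), 2)
        for a in names for b in names}
print(corr[("pkts_in", "pkts_out")])  # close to 1: strongly correlated
```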
The data quality assuror classifies each unlabeled test datum by identifying its nearest neighbor among all class centroids. Its performance thus depends crucially on the distance metric used to identify the nearest class centroid. In fact, a number of researchers have demonstrated that nearest-neighbor classification can be greatly improved by learning an appropriate distance metric from labeled examples [65].

Table 3-4 shows the correlation coefficients of each pair of the first six performance metrics collected during application execution: load_five, pkts_in, cpu_system, load_fifteen, pkts_out, and bytes_out. Three applications are used in these experiments: SPECseis96 [29], PostMark [28], and NetPIPE [34].

The experiments show that there are correlations of various degrees between pairs of performance metrics. For example, NetPIPE's bytes_out metric is highly correlated with its pkts_in, pkts_out, and cpu_system metrics. In cases where metrics are correlated, a distance metric that can take the correlation into account when determining the distance from the class centroid should be used. Therefore, the Mahalanobis distance is used in the training-data selection process.

Clustering [39][68] and classification techniques have been applied successfully to many areas, such as intrusion detection [69][40][42], text categorization [44], speech and image processing [62][63], and medical diagnosis [60][61].

The following works applied these techniques to analyze system performance. However, they differ from each other in the following aspects: the goals of feature selection, the features under study, and implementation complexity.

Nickolayev et al. used statistical clustering techniques to identify representative processors for parallel application performance tuning [50]. Only event tracing of the
Ahn et al. applied various statistical techniques to extract the important performance-counter metrics for application performance analysis [51]. Their prototype can support parallel applications' performance analysis by collecting and aggregating local data. It requires annotation of application source code, as well as appropriate operating system and library support, to collect process information based on hardware counters.

Cohen et al. [52] studied the correlation between component performance metrics and SLO violations in Internet server platforms. There are some similarities between their work and ours in terms of the level of performance metrics under study and the type of classifier used. However, our study differs from theirs in the following ways. First, our study focuses on application classification (CPU-intensive, I/O- and paging-intensive, and network-intensive) for resource scheduling, whereas their study focused on performance anomaly detection (SLO violation and compliance). Second, our prototype targets online classification; it addressed the training-data qualification problem to adapt the feature selection to changing workloads. Online training-data selection problems were not the focus of [52]. Third, in our prototype, virtual machines were used to host application executions and summarize applications' resource usage. The prototype supports a wide range of applications, such as scientific programs and business online transaction systems. In contrast, [52] studied web applications in three-tier client/server systems.

In addition to [52], Aguilera et al. [70] and Magpie [71] also studied performance analysis of distributed systems. However, they considered message-level traces of system activities instead of system-level performance metrics. Both of them treated components of distributed systems as black boxes; therefore, their approaches do not require application and middleware modifications.
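The correlation-aware nearest-centroid selection described earlier (each test datum assigned to the class whose centroid is nearest under the Mahalanobis distance) can be sketched as follows. The function names and the two-class setup are illustrative assumptions, not part of the prototype:

```python
import numpy as np

def mahalanobis(x, centroid, cov):
    """Distance from x to a class centroid that accounts for correlations
    between performance metrics via the inverse covariance matrix."""
    diff = np.asarray(x, float) - np.asarray(centroid, float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

def nearest_centroid(x, centroids, covs):
    """Label of the class whose centroid is closest in Mahalanobis distance."""
    return min(centroids, key=lambda c: mahalanobis(x, centroids[c], covs[c]))
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance; the benefit appears precisely when the metrics are correlated, as in the NetPIPE trace above.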
The integration of multiple predictors promises higher prediction accuracy than can be obtained with a single predictor. The challenge is how to select the best predictor at any given moment. Traditionally, multiple predictors are run in parallel and the one that generates the best result is selected for prediction. In this chapter, we propose a novel approach to predictor integration based on learning from historical predictions. Compared with the traditional approach, it does not require running all the predictors simultaneously. Instead, it uses classification algorithms such as k-Nearest Neighbor (k-NN) and Bayesian classification, together with a dimension-reduction technique such as Principal Component Analysis (PCA), to forecast the best predictor for the workload under study based on learning from historical predictions. Then only the forecasted best predictor is run for prediction.

Grid computing [72] enables entities to create a Virtual Organization (VO) to share their computation resources, such as CPU time, memory, network bandwidth, and disk bandwidth. Predicting dynamic resource availability is critical to adaptive resource scheduling. However, determining the most appropriate resource prediction model a priori is difficult due to the multi-dimensionality and variability of system resource usage. First, applications may exercise different types of resources during their executions. Some resource usages, such as CPU load, may be relatively smooth, whereas others, such as network bandwidth, are burstier. It is hard to find a single prediction model that works best for all types of resources. Second, different applications may have different resource usage patterns. The best prediction model for a specific resource of one machine may not work best for another machine. Third, resource performance fluctuates dynamically due to the contention created by competing applications. Indeed, in the absence of a perfect prediction model, the best predictor for any particular resource may change over time.
Our experimental results, based on the analysis of a set of virtual machine trace data, show:

1. The best prediction model is workload specific. In the absence of a perfect prediction model, it is hard to find a single predictor that works best across virtual machines with different resource usage patterns.

2. The best prediction model is resource specific. It is hard to find a single predictor that works best across different resource types.

3. The best prediction model for a specific type of resource of a given VM trace varies as a function of time. The LARPredictor can adapt the predictor selection to changes in the resource consumption pattern.

4. In the experiments with a set of trace data, the LARPredictor outperformed the observed single best predictor in the pool for 44.23% of the traces and outperformed the cumulative-MSE based prediction model used in the Network Weather Service system
[73] for 66.67% of the traces. It has the potential to consistently outperform any single predictor for variable workloads and to achieve 18.63% lower MSE than the model used in the NWS.

The rest of the chapter is organized as follows: Section 4.2 gives an overview of related work. Section 4.4 describes the linear time-series prediction models used to construct the LARPredictor, and Section 4.5 describes the learning techniques used for predictor selection. Section 4.6 details the workflow of the learning-aided adaptive resource predictor. Section 4.7 discusses the experimental results. Section 4.8 summarizes the work and describes future directions.

Time-series prediction has been applied in many areas, such as economics [74], biomedical signal processing [75], and geoscience [76]. In this work, we focus on time-series modeling for computer resource performance prediction.

In [77] and [78], Dinda et al. conducted an extensive study of the statistical properties and prediction of host load. Their work indicates that CPU load is strongly correlated over time, which implies that history-based load prediction schemes are feasible. They evaluated the predictive power of a set of linear models, including autoregression (AR), moving average (MA), autoregression integrated moving average (ARIMA), autoregression fractionally integrated moving average (ARFIMA), and window-mean models. Their results show that the AR model is the best, in terms of high prediction accuracy and low overhead, among the models they studied. Based on their conclusion, the AR model is included in our predictor pool to leverage its performance.

To improve prediction accuracy, various adaptive techniques have been exploited by the research community. In [4], Yang et al. developed a tendency-based prediction model that predicts the next value according to the tendency of the time-series change. An increment/decrement value is added to or subtracted from the current measurement, based on the current measurement and other dynamic information, to predict the
next value [79]. In addition, in [80], Liang et al. proposed a multi-resource prediction model that uses both the autocorrelation of the CPU load and the cross-correlation between the CPU load and free memory to achieve higher CPU load prediction accuracy. Vazhkudai et al. [81][82] used linear regression to predict data transfer times from network bandwidth or disk throughput.

The Network Weather Service (NWS) [73] performs prediction of both network throughput and latency for host machines distributed across different geographic distances. Both the NWS and the LARPredictor use the mix-of-experts approach to select the best predictor at any given moment. However, they differ in the way the best predictor is selected. The prediction model used in the NWS system runs a set of predictors in parallel to track their prediction accuracies. A cumulative error measurement, the Mean Square Error (MSE), is calculated for each predictor. The one that generates the lowest prediction error for the known measurements is chosen to forecast future measurement values. Section 4.6 shows that the LARPredictor only uses parallel prediction during the training phase. In the testing phase, it uses PCA and a k-NN classifier to forecast the best predictor for the next value based on learning from historical prediction performance. Only the forecasted best predictor is run to predict the next value.

The mix-of-experts approach has also been applied in the text recognition and categorization area. The combination of multiple classifiers has been proved able to increase the recognition rate in difficult problems compared with a single classifier [83]. Different combination strategies, such as weighted voting, probability-based voting, and dimensionality reduction based on concept indexing, are introduced in [84].
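The NWS-style cumulative-MSE selection described above can be sketched as follows. This is a minimal illustration assuming a pool of callable predictors, not the NWS implementation itself:

```python
class CumulativeMSESelector:
    """Run every predictor on each new measurement, accumulate each
    predictor's squared error, and forecast with the predictor whose
    cumulative error over the known measurements is currently lowest."""

    def __init__(self, predictors):
        self.predictors = predictors  # name -> callable(history) -> prediction
        self.sq_err = {name: 0.0 for name in predictors}

    def observe(self, history, actual):
        """Score all predictors in parallel against a newly observed value."""
        for name, predict in self.predictors.items():
            self.sq_err[name] += (predict(history) - actual) ** 2

    def best(self):
        """Predictor with the lowest cumulative squared error so far."""
        return min(self.sq_err, key=self.sq_err.get)
```

On a steadily increasing series, for example, a LAST predictor accumulates less error than a window average and would therefore be selected for the next forecast.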
Figure 4-1. Virtual machine resource usage prediction prototype. The monitor agent, which is installed in the Virtual Machine Monitor (VMM), collects the VM resource performance data and stores them in the round-robin VM Performance Database. The profiler extracts the performance data of a given time frame for the VM indicated by VM ID and device ID. The LARPredictor selects the best prediction model based on learning from historical predictions, predicts the resource performance for time t+1, and stores the prediction results in the prediction database. The prediction results can be used to support the resource manager in performing dynamic VM resource allocation. The Performance Quality Assuror (QA) audits the LARPredictor's performance and orders re-training for the predictor if the performance drops below a predefined threshold.

Our virtual machine resource prediction prototype, illustrated in Figure 4-1, models how the VM performance data are collected and used to predict future values to support resource allocation decision-making.

A performance monitoring agent is installed in the Virtual Machine Monitor (VMM) to collect the performance data of the guest VMs. In our implementation, VMware ESX virtual machines are used to host the application executions, and the vmkusage tool [85] of ESX is used to monitor and collect the performance data of the VM guests and host
machines. Table 2-1 shows the list of performance features under study in this work.

The profiler retrieves the VM performance data, identified by vmID, deviceID, and a time window, from the round-robin performance database. The data of each VM device's performance metric form a time series (x_{t-m+1}, ..., x_t) with an identical sampling interval, where m is the data retrieval window size. The retrieved performance data, with the corresponding timestamps, are stored in the prediction database. The tuple [vmID, deviceID, timeStamp, metricName] forms the combined primary key of the database. Figure 4-2 shows the XML schema of the database and sample database records of virtual machines such as VM1, which has one CPU, two Network Interface Cards (NICs), and two virtual hard disks.

The LARPredictor takes the time-series performance data (y_{t-m}, ..., y_{t-1}) as inputs, selects the best prediction model based on learning from historical prediction results, and predicts the resource performance ŷ_t of a future time. A detailed description of the LARPredictor's workflow is given in Section 4.6. The predicted results are stored in the prediction DB and can be used to support the resource manager's dynamic VM provisioning decision-making.

The Prediction Quality Assuror (QA) is responsible for monitoring the LARPredictor's performance in terms of MSE. It periodically audits the prediction performance by calculating the average MSE of the historical prediction data stored in the prediction DB. When the average MSE of the data in the audit window exceeds a predefined threshold, it directs the LARPredictor to re-train the predictors and the classifier using recent performance data stored in the database.
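Framing a metric's time series into overlapping windows of size m, as the profiler's retrieval step implies, can be sketched as follows (the helper name is an assumption for illustration):

```python
def frame_series(series, m):
    """Slide a window of size m over the series; each frame
    (x_{t-m+1}, ..., x_t) becomes one input for the predictors/classifier."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]
```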
Figure 4-2. Sample XML schema of the VM performance DB.

A general linear process expresses the observed series as a weighted sum of white-noise terms,

Z_t = Σ_{i=0}^{∞} ψ_i a_{t-i},

where {Z_t} denotes the observed time series, {a_t} denotes an unobserved white-noise series, and {ψ_i} denotes the weights. In this thesis, performance snapshots of a virtual machine's resources, including CPU, memory, disk, and network bandwidth, are taken periodically to form the time series {Z_t} under study.
Time-series analysis techniques [86] have been widely applied to forecasting in many areas, such as economic forecasting, sales forecasting, stock market analysis, communication traffic control, and workload projection. In this work, simple time-series models, such as LAST, sliding-window average (SW_AVG), and autoregressive (AR), are used to construct the LARPredictor to support online prediction. However, the LARPredictor prototype may be used generally with other prediction models, such as those studied in [78][73][4].

SW_AVG model: the sliding-window average model predicts future values by taking the average over a fixed-length history of size w:

Ẑ_t = (1/w) Σ_{i=1}^{w} Z_{t-i}.

AR model: the current value of the series Z_t is a linear combination of the p latest past values of itself plus a term a_t, which incorporates everything new in the series at time t that is not explained by the past values:

Z_t = Σ_{i=1}^{p} φ_i Z_{t-i} + a_t.

The Yule-Walker technique is used for AR model fitting in this work.
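A minimal sketch of the SW_AVG and AR predictors, with the AR weights estimated from the Yule-Walker equations, together with the MSE measure used to compare them. Function names are illustrative, and this is a simplification of the Matlab prototype, not its implementation:

```python
import numpy as np

def mse(observed, predicted):
    """Mean squared error between observations and predictions."""
    o, p = np.asarray(observed, float), np.asarray(predicted, float)
    return float(np.mean((o - p) ** 2))

def sw_avg_predict(history, w):
    """SW_AVG: predict the next value as the mean of the last w values."""
    return float(np.mean(history[-w:]))

def yule_walker(series, p):
    """Solve the Yule-Walker equations R * phi = r for the AR(p) weights,
    using sample autocovariances of the mean-centered series."""
    x = np.asarray(series, float)
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    return np.linalg.solve(R, r[1:])  # phi_1 .. phi_p

def ar_predict(history, phi):
    """One-step AR forecast: a linear combination of the p latest values."""
    x = np.asarray(history, float)
    mu = x.mean()
    p = len(phi)
    past = (x - mu)[-1:-p - 1:-1]  # latest p centered values, most recent first
    return float(mu + np.dot(phi, past))
```

For example, an AR(2) model fitted to a long pure sinusoid recovers weights close to (2 cos ω, -1) and predicts the next sample accurately, while SW_AVG lags behind any trend.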
In this work, a pool of predictors including LAST, AR, and SW_AVG is proposed to predict the VM resource performance.

The prediction performance is measured in mean squared error (MSE) [87], which is defined as the average squared difference between independent observations and predictions from the fitted equation for the corresponding values of the independent variables:

MSE(θ̂) = E[(θ̂ - θ)²],    (4-5)

where θ̂ is the estimator of a parameter in a statistical model.

There are two types of classifiers: nonparametric and parametric. A parametric classifier exploits prior information to model the feature space. When the assumed model is correct, parametric classifiers outperform nonparametric ones. In contrast, nonparametric classifiers do not make such assumptions and are more robust. However, nonparametric classifiers tend to suffer from the curse of dimensionality, which means that the number of samples demanded grows exponentially with the dimensionality of the feature space. In this section, we introduce a nonparametric classifier, the k-NN classifier, and a parametric classifier, the Bayesian classifier, which are used for best-predictor selection in
Since the features under study, such as CPU percentage and network received bytes/sec, have different units of measure, all features are normalized to have zero mean and unit variance [88]. In this work, "closest" is determined by the Euclidean distance (Equation 4-6).

As a nonparametric method, the k-NN classifier can be applied to different time series without modification. To address the problem associated with high dimensionality, various dimension-reduction techniques can be used in the data preprocessing.
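The normalization and Euclidean k-NN voting described above can be sketched as follows (the helper names are assumptions):

```python
import numpy as np
from collections import Counter

def normalize(X):
    """Scale each feature to zero mean and unit variance, since the raw
    metrics (CPU percentage, bytes/sec, ...) have different units."""
    X = np.asarray(X, float)
    return (X - X.mean(axis=0)) / X.std(axis=0)

def knn_classify(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training points
    under the Euclidean distance."""
    d = np.linalg.norm(np.asarray(train_X, float) - np.asarray(x, float), axis=1)
    nearest = np.argsort(d)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]
```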
where, in this case of c categories,

p(x) = Σ_{j=1}^{c} p(x|ω_j) P(ω_j).

Then, the posterior probabilities P(ω_j|x) can be computed from p(x|ω_j) by the Bayes formula. In addition, the Bayes formula can be expressed informally in English by saying that

posterior = (likelihood × prior) / evidence.

The multivariate normal density has been applied successfully to a number of classification problems. In this work the feature vector can be modeled as a multivariate normal random variable. The general multivariate normal density in d dimensions is written as

p(x) = 1 / ((2π)^{d/2} |Σ|^{1/2}) exp( -(1/2)(x - μ)^T Σ^{-1} (x - μ) ),

where x is a d-component column vector, μ is the d-component mean vector, Σ is the d-by-d covariance matrix, and |Σ| and Σ^{-1} are its determinant and inverse, respectively. Further, we let (x - μ)^T denote the transpose of (x - μ).

The minimization of the probability of error can be achieved by use of the discriminant functions
g_i(x) = -(1/2)(x - μ_i)^T Σ_i^{-1} (x - μ_i) - (d/2) ln 2π - (1/2) ln |Σ_i| + ln P(ω_i).

The resulting classification is performed by evaluating the discriminant functions. When workloads have similar statistical properties, the Bayesian classifier derived from one workload trace can be applied directly to another. In the case of highly variable workloads, re-training of the classifier is necessary.

Principal Component Analysis (PCA) [22][88], also called the Karhunen-Loeve transform, is a linear transformation representing data in a least-squares sense. The principal components of a set of data in
The workflow of the LARPredictor is illustrated in Figure 4-3. The prediction consists of two phases: a training phase and a testing phase. During the training phase, the best predictors for each set of training data are identified using the traditional mix-of-experts approach. During the testing phase, the classifier forecasts the best predictor for the test data based on the knowledge gained from the training data and historical prediction performance. Then only the selected best predictor is run to predict the resource performance. Both phases include data pre-processing and the Principal Component Analysis (PCA) process.

The features under study in this work, as shown in Table 2-1, include CPU, memory, network bandwidth, and disk I/O usages. Figure 4-4 illustrates how the features are processed to form the prediction database. Since the features have different units of measure, a data pre-processor was used to normalize the input data to zero mean and unit variance. The normalized data are framed according to the prediction window size to feed the PCA processor.

The LAST and SW_AVG models do not involve any unknown parameters; they can be used for prediction directly. Parametric prediction models such as the AR model, which contain unknown parameters, require model fitting. Model fitting is a process
Figure 4-3. Learning-aided adaptive resource predictor workflow. The input data are normalized and framed with the prediction window size m. Principal Component Analysis (PCA) is used to reduce the dimension of the input data from the window size m to n (n < m).
Figure 4-4. Learning-aided adaptive resource predictor data flow. First, the u training data X_{1×u} are normalized to X'_{1×u} and subsequently framed into X'_{(u-m+1)×m} according to the predictor order m. The PCA processor is used to reduce the dimension of each set of training data from m to n before prediction. Then the predictors are run in parallel with the inputs X''_{(u-m+1)×n}, and the one that gives the smallest MSE is identified as the best predictor to be associated with the corresponding training data in the prediction database. The dimension reduction of the testing data is similar to the training data's and is not shown here.

to estimate the unknown parameters of the models. The Yule-Walker equation [86] is used for AR model fitting in this work.

For window-based prediction models, such as SW_AVG and AR, the PCA algorithm is applied to reduce the input data dimension. The naive mix-of-experts approach is used to identify the best predictor p_i for each set of pre-processed training data (e.g., (x'_i, x'_{i+1}, ..., x'_{i+m-1})). All prediction models are run in parallel with the training data, and the one that generates the smallest prediction MSE is identified as the best predictor p_i, a class label taking values in {LAST, AR, SW_AVG} to be associated with the training data. The u pairs of PCA-processed training data and the corresponding best predictors [(x''_1, p_1), ..., (x''_u, p_u)] form the training data of the classifiers.

As a non-parametric classifier, the k-NN classifier does not have an explicit training phase; the major task of its training phase is to label the training data with class definitions. As a parametric classifier, the Bayesian classifier uses the training data to derive its unknown parameters, the mean and covariance matrix of the training data of each class, to form the classification model.
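The PCA dimension reduction from m to n used in both phases can be sketched as an eigen-decomposition of the sample covariance matrix. This is an illustrative stand-in for the Matlab routines used in the prototype:

```python
import numpy as np

def pca_reduce(X, n):
    """Project each m-dimensional frame onto the n principal components
    with the largest eigenvalues of the sample covariance matrix."""
    X = np.asarray(X, float)
    Xc = X - X.mean(axis=0)                    # center the frames
    cov = np.cov(Xc, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    top = vecs[:, np.argsort(vals)[::-1][:n]]  # top-n eigenvectors as columns
    return Xc @ top
```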
In the testing phase of the LARPredictor based on the k-NN classifier, the Euclidean distances between the PCA-processed test data (y''_{t-n}, y''_{t-n+1}, ..., y''_{t-1}) and all training data X''_{(u-m+1)×n} in the reduced n-dimensional feature space are calculated, and the k (k = 3 in our implementation) training data that have the shortest distances to the testing data are identified. The majority vote of the k nearest neighbors' best predictors is chosen as the best predictor to predict ŷ'_t based on (y'_{t-m}, y'_{t-m+1}, ..., y'_{t-1}) in the case of the AR or SW_AVG model, or ŷ'_t = y'_{t-1} in the case of the LAST model. The prediction performance can be obtained by comparing the predicted value ŷ'_t with the normalized observed value y'_t.

In the testing phase of the LARPredictor based on the Bayesian classifier, test data are preprocessed the same way as for the k-NN classifier. The PCA-processed test data (y''_{t-n}, y''_{t-n+1}, ..., y''_{t-1}) are plugged into the discriminant function (4-12) derived in Section 4.5.2. The parameters in the discriminant function for each class, the mean vector and the covariance matrix, are obtained during the training phase. Then each test datum is classified to the class with the largest discriminant function.

The testing phase differs from the training phase in that it does not require running multiple predictors in parallel to identify the one best suited to the data with the smallest MSE. Instead, it forecasts the best predictor by learning from historical predictions. The reasoning here is that these nearest neighbors' workload characteristics are closest to the testing data's, and the predictor that works best for these neighbors should also work best for the testing data.
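The Bayesian testing phase, plugging the PCA-processed data into the Gaussian discriminant of Section 4.5.2 and taking the class with the largest value, can be sketched as follows; the class parameters below are hypothetical:

```python
import numpy as np

def discriminant(x, mean, cov, prior):
    """g_i(x) = -1/2 (x-mu)^T Sigma^{-1} (x-mu) - d/2 ln(2 pi)
               - 1/2 ln|Sigma| + ln P(omega_i)."""
    x = np.asarray(x, float)
    diff = x - np.asarray(mean, float)
    d = x.size
    return float(-0.5 * diff @ np.linalg.inv(cov) @ diff
                 - 0.5 * d * np.log(2 * np.pi)
                 - 0.5 * np.log(np.linalg.det(cov))
                 + np.log(prior))

def bayes_select(x, class_params):
    """Pick the class (predictor label) with the largest discriminant value."""
    return max(class_params, key=lambda c: discriminant(x, *class_params[c]))
```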
These virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0 GHz CPU, 4 GB of memory, and a 36 GB SCSI disk. VMware ESX Server 2.5.2 was running on the physical host. The vmkusage tool was run on the ESX server to collect the resource performance data of the guest virtual machines every minute and store them in a round-robin database. The profiler was used to extract the data with a given VM ID, device ID, performance metric, starting and ending timestamps, and interval. In this experiment, the performance data of a 24-hour period with 5-minute intervals were extracted for VM2, VM3, VM4, and VM5. The data of a 7-day period with 30-minute intervals were extracted for VM1. The data of a given VM ID, device ID, and performance metric form a time series under study. The time-series data were normalized to zero mean and unit variance.
Figure 4-5 shows the predictor selections for the CPU fifteen-minute load average during a 12-hour period with a sampling interval of 5 minutes. The top plot shows the observed best predictor obtained by running the three prediction models in parallel. The middle plot shows the predictor selection of the LARPredictor, and the bottom plot shows the cumulative-MSE based predictor selection used in the NWS. Similarly, the predictor selection results for the trace data of other resources are shown as follows: network packets in per second in Figure 4-6, total amount of swap memory in Figure 4-7, and total disk space in Figure 4-8.

These experimental results show that the best prediction model for a specific type of resource of a given trace varies as a function of time. In the experiment, the LARPredictor adapted the predictor selection to the changing workload better than the cumulative-MSE based approach used in the NWS. The LARPredictor's average best-predictor forecasting accuracy over all the performance traces of the five virtual machines is 55.98%, which is 20.18% higher than the accuracy of 46.58% achieved by the cumulative-MSE based predictor used in the NWS for the workloads studied.

Section 4.7.2.1 shows the prediction accuracy of the k-NN based LARPredictor and all the predictors in the pool. Section 4.7.2.2 compares the prediction accuracy and execution time of the k-NN based LARPredictor and the Bayesian-based LARPredictor. In addition, Section 4.7.2.3 benchmarks the performance of the LARPredictors against the cumulative-MSE based prediction model used in the NWS.

In the experiments, ten-fold cross-validation was performed for each set of time-series data. A timestamp was randomly chosen to divide the performance data of a virtual machine into two parts: 50% of the data was used to train the LARPredictor, and the other 50% was used as the test set to measure the prediction performance by calculating its prediction MSE.
Figure 4-5. Best predictor selection for trace VM2_load15. Predictor class: 1-LAST, 2-AR, 3-SW_AVG.

In the testing phase, the 3-NN classifier was used to forecast the best predictors for the testing data. First, for each set of testing data of the prediction window size, the PCA was applied to reduce the data dimension from m, which was 5 or 16, to n = 2 in
Figure 4-6. Best predictor selection for trace VM2_PktIn. Predictor class: 1-LAST, 2-AR, 3-SW_AVG.

this experiment. Then the Euclidean distances between the test data and all the training data in the reduced feature space were calculated. The three training data that had the shortest distances to the testing data were identified, and the majority vote of their associated best predictors was forecasted to be the best predictor of the testing data. Finally, the forecasted best predictor was run to predict the future value of the testing data. The MSE of each time series was calculated to measure the performance of the LARPredictor. Tables 4-1, 4-2, 4-3, 4-4, and 4-5 show the prediction performance of the LARPredictor with the current implementation (LAR) and the three prediction models LAST, AR, and SW_AVG for all resource performance traces of the five virtual machines. Also shown in these tables is the computed MSE for a perfect LARPredictor
Figure 4-7. Best predictor selection for trace VM2_Swap. Predictor class: 1-LAST, 2-AR, 3-SW_AVG.

(P-LAR). The MSE of the P-LAR model shows the upper bound of the prediction accuracy that can be achieved by the LARPredictor. The MSE of the best predictor among LAR, LAST, AR, and SW_AVG is highlighted with italic bold numbers.

Table 4-6 shows the best predictor among LAST, AR, and SW_AVG for all the resource performance metrics and VM traces. The symbol "*" indicates the cases in which the LARPredictor achieved equal or higher prediction accuracy than the best of the three predictors. Overall, the AR model performed better than the LAST and SW_AVG models.

The above experimental results show:
Figure 4-8. Best predictor selection for trace VM2_Disk. Predictor class: 1-LAST, 2-AR, 3-SW_AVG.

1. It is hard to find a single prediction model among LAST, AR, and SW_AVG that performs best for all types of resource performance data for a given VM trace. For example, for VM1's trace data shown in Table 4-1, each of the three models (LAST, AR, and SW_AVG) outperformed the other two for a subset of the performance metrics. In this experiment, only the AR model worked best for the trace data of VM3.

2. It is hard to find a single prediction model among the three that performs best consistently for a given type of resource across all the VM traces. In the experiment, only the AR model worked best for the CPU performance predictions.

3. The LARPredictor achieved better-than-expert performance using the mix-of-experts approach for 44.23% of the workload traces. It shows the potential for the
Table 4-1. Normalized prediction MSE statistics for resources of VM1.

Figure 4-9 shows the prediction performance comparisons between it and the k-NN based LARPredictor for all the resources of VM1. The profile report of the Matlab program execution showed that the k-NN based LARPredictor cost 205.8 seconds of CPU time, with 193.5 seconds in the testing phase and 12.3 seconds in the training phase. It took 132.1
Table 4-2. Normalized prediction MSE statistics for resources of VM2.

The experimental results show that the prediction accuracy, in terms of normalized MSE, of the Bayesian-classifier based LARPredictor is about 3.8% worse than that of the k-NN based one. However, it shortened the CPU time of the testing phase by 37.57%.

Figures 4-9, 4-10, 4-11, 4-12, and 4-13 show the prediction accuracy of the perfect LARPredictor that has 100% best-predictor forecasting accuracy (P-LARP), the k-NN and Bayesian based LARPredictors (KnnLARP and BaysLARP), the cumulative MSE of the all-history based predictor used in the NWS (Cum.MSE), and the cumulative-MSE
Table 4-3. Normalized prediction MSE statistics for resources of VM3.

The experimental results show that, without running all the predictors in parallel all the time, the LARPredictor outperformed the cumulative-MSE based predictor used in the NWS for 66.67% of the traces. The perfect LARPredictor shows the potential to achieve, on average, 18.6% lower MSE than the cumulative-MSE based predictor.

In the context of resource performance time-series prediction, W = 1 and d is the prediction window size [89]. The typically small input data size in this context makes the use of the PCA feasible. There also exist computationally less expensive methods [90] for finding only a few eigenvectors and eigenvalues of a large matrix; in our experiments, we use appropriate Matlab routines to realize these.
Table 4-4. Normalized prediction MSE statistics for resources of VM4.

Table 4-5. Normalized prediction MSE statistics for resources of VM5.
Table 4-6. Best predictors of all the trace data. The predictors shown in the table have the smallest MSE among the three predictors (LAST, AR, and SW_AVG). The "*" symbol indicates that the LARPredictor outperforms the best predictor in the predictor pool. Entries marked "-" were not recoverable.

Metric        VM1      VM2     VM3       VM4      VM5
CPU_usedsec   AR       AR      AR        AR*      AR*
CPU_ready     AR       AR*     AR*       AR*      AR
Mem_size      LAST     AR*     AR*       LAST     AR*
Mem_swap      LAST     AR*     LAST      AR*      -
NIC1_Rx       AR*      AR      AR*       AR       -
NIC1_Tx       AR*      AR*     AR*       AR       -
NIC2_Rx       AR*      LAST    AR        SW_AVG   -
NIC2_Tx       AR*      AR*     AR*       AR       -
VD1_read      AR       AR      AR        SW_AVG   -
VD1_write     AR       AR      SW_AVG*   AR       -
VD2_read      SW_AVG   AR      AR*       AR       -
VD2_write     AR       AR      AR*       AR*      AR

The k-NN does not have an off-line learning phase. The "training phase" in k-NN simply indexes the N training data for later use; therefore, the training complexity of k-NN is O(N) in both time and space. In the testing phase, the k nearest neighbors of a testing datum can be obtained in O(N) time by using a modified version of quicksort [91]. There are also fast algorithms for finding nearest neighbors [92][93].

Three simple time-series models were used in this experiment to show the potential of using learning-based dynamic predictor selection to improve prediction accuracy. However, the LARPredictor prototype may be used generally with other, more sophisticated prediction models, such as those studied in [78][73][4]. Generally, the more predictors in the pool and the more complex the predictors are, the more beneficial it is to use the LARPredictor, because the classification overhead can be better amortized by running only a single predictor at any given time.
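The O(N) selection of the k nearest neighbors mentioned above can be realized with a partial selection instead of a full sort. The sketch below uses NumPy's introselect-based argpartition as an illustrative stand-in for the modified quicksort:

```python
import numpy as np

def k_nearest_indices(train_X, x, k):
    """Indices of the k nearest training points without fully sorting:
    np.argpartition does an average-case O(N) partial selection, so the
    k smallest distances end up (unordered) in the first k positions."""
    d = np.linalg.norm(np.asarray(train_X, float) - np.asarray(x, float), axis=1)
    return np.argpartition(d, k)[:k]
```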
Figure 4-9. Predictor performance comparison (VM1). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write.

the Bayesian classifier are used to forecast the best predictor for the workload based on learning from historical load characteristics and prediction performance. The principal component analysis technique has been applied to reduce the input data dimension of the classification process. Our experimental results with traces of the full range of virtual machine resources, including CPU, memory, network, and disk, show that the LARPredictor can effectively identify the best predictor for the workload and achieve prediction accuracies that are close to, or even better than, any single best predictor.
Figure 4-10. Predictor performance comparison (VM2). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write.
Figure 4-11. Predictor performance comparison (VM3). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write.
Figure 4-12. Predictor performance comparison (VM4). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write.
Figure 4-13. Predictor performance comparison (VM5). 1-CPU_usedsec, 2-CPU_ready, 3-Mem_size, 4-Mem_swap, 5-NIC1_rx, 6-NIC1_tx, 7-NIC2_rx, 8-NIC2_tx, 9-VD1_read, 10-VD1_write, 11-VD2_read, 12-VD2_write.
Profiling the execution phases of applications can help to optimize the utilization of the underlying resources. This chapter presents a novel system-level application resource-demand phase analysis and prediction approach in support of on-demand resource provisioning. The approach explores the large-scale behavior of applications' resource consumption, followed by analysis using a set of clustering-based algorithms. The phase profile, which is learned from historical runs, is used to classify and predict future phase behavior. This process takes into consideration applications' resource consumption patterns, phase transition costs, and penalties associated with Service-Level Agreement (SLA) violations.

There is growing interest in the virtualization [94] of applications' execution environments, both in academia and industry [11][16][95]. This is motivated by the idea of providing computing resources as a utility and charging users for specific usage. For example, in August 2006, Amazon launched the beta version of its VM-based Elastic Compute Cloud (EC2) web service. EC2 allows users to rent virtual machines with specific configurations from Amazon and can support changes in resource configurations on the order of minutes. In systems that allow users to reserve and reconfigure resource allocations and charge based upon such allocations, users have an incentive to request no more than the amount of resources an application needs. A question that arises here is: how to adapt the resource provisioning to the changing workload?

In this chapter, we focus on modeling and analyzing long-running applications' phase behavior. The modeling is based on monitoring and learning of the applications' historical resource consumption patterns, which likely vary over time. Understanding such behavior is critical to optimizing resource scheduling. To self-optimize the configuration of an
In this context, a phase is defined as a set of intervals within an application's execution that have similar system-level resource consumption behavior, regardless of temporal adjacency. This means that a phase may reappear many times as an application executes. Phase classification partitions a set of intervals into phases with similar behavior. In this chapter, we introduce an application resource-demand phase analysis and prediction prototype, which uses a combination of clustering and supervised learning techniques to investigate the following questions:

1) Is there phase behavior in the application's resource consumption patterns? If so, how many phases should be used to provide optimal resource provisioning?

2) Based on observations of historical phase behavior, what is the predicted next phase of the application's execution?

3) How do phase transition frequency and prediction accuracy affect resource allocation?

Answers to these questions can be used to decide the time and space allocation of resources.

While making optimization decisions, the prototype takes the application's resource consumption patterns, phase transition costs, and penalties associated with Service Level Agreement (SLA) violations into account. The prediction accuracy is fed back to guide future phase analysis. The prototype does not require any instrumentation of application source code and can work generally with both physical and virtual machines that provide monitoring of system-level performance metrics.

Our experimental results with the CPU and network performance traces of SPECseis96 and a WorldCup98 access-log replay show that:


2. For applications with phase behavior, typically with a small number of phases, the savings gained from phase-based resource reservation can outweigh the costs associated with the increased number of re-provisionings and the penalties caused by mispredictions.

3. The phase prediction accuracy decreases as the number of phases increases. With the current prototype, an average of above 90% phase prediction accuracy can be achieved for the CPU and network performance features when four phases are considered.

The rest of this chapter is organized as follows: Section 5.2 presents the application phase analysis and prediction model. Sections 5.3 and 5.4 detail the algorithms used for phase analysis and prediction. Section 5.5 presents experimental results. Section 5.6 discusses related work. Section 5.7 draws conclusions and discusses future work.

Figure 5-1 shows how the application VM's performance data are collected and analyzed to construct the corresponding application's phase profile, and how the profile is used to predict the application's next phase. In addition, it shows how process quality indicators, such as phase prediction accuracy, are monitored and used as feedback signals to tune the system performance (such as application response time) towards the goal defined in the SLA.

A performance monitoring agent is used to collect the performance data of the application VM, which serves as the application container. The monitoring agent can be implemented in various ways. In this work, Ganglia [54], a distributed monitoring system, and the vmkusage tool [85] provided by VMware ESX server, are used to monitor


the application containers. The collected performance data are stored in the performance database.

Figure 5-1. Application resource demand phase analysis and prediction prototype. The phase analyzer analyzes the performance data collected by the monitoring agent to find the optimal number of phases n ∈ [1, m]. The output phase profile is stored in the application phase database (DB) and is used as training data for the phase predictor. The predictor predicts the next phase of the application's resource usage based on learning of its historical phase behaviors. The predicted phase can be used to support the application resource manager's (ARM's) decisions regarding resource provisioning. The auditor monitors and evaluates the performance of the analyzer and predictor, and orders re-training of the phase predictor with the updated workload profile when the performance measurements drop below a predefined threshold.

The phase analyzer retrieves the time-series VM performance data, identified by vmID, FeatureID, and a time window (ts, te), from the performance database. It then performs phase analysis using clustering-based algorithms to check whether there is phase behavior in the application's resource consumption patterns. If so, it continues to find out how many phases within a numeric range are best in terms of providing the minimal resource reservation costs. The output phase profile, which consists of the


phase labels and cluster centroids, is constructed as described in Section 5.3. The phase profile is used as training data for the phase predictor. In the presence of phase behavior, the phase predictor can perform on-line prediction of the next phase of the application's resource usage based on learning of historical phase behaviors, as shown in Section 5.4. The predicted phase information can be used to support the application resource manager's decisions regarding resource re-provisioning requests to the resource scheduler.

The auditor monitors and evaluates the health of the phase analysis and prediction process by performing quality control of each component. Clustering quality can be measured by the similarity and compactness of the clusters using various internal indices introduced in [96]. The phase predictor's performance is measured by its prediction accuracy. The application response time is used as an external signal for total quality control and is checked against the Quality of Service (QoS) defined in the SLA. Local performance tuning is triggered when the auditor observes that the component-level service quality drops below a predefined threshold. For example, when the real-time workload varies to a degree that makes it statistically significantly different from the training workload, the phase prediction accuracy may drop. Upon detection, the auditor can order a phase analysis based on the recent workload to update the phase profile and subsequently order a re-training of the phase predictor. If the re-training still cannot improve the total quality of service to a satisfactory level, the resource reservation strategy falls back from phase-based reservation to a conservative strategy, which reserves the largest amount of resources the user is willing to pay for during the whole application run. Automated and adaptive threshold setting is discussed in detail in [67].
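The auditor's control loop can be sketched as follows. This is a minimal illustration; the function and threshold names are hypothetical and not part of the prototype's implementation:

```python
def audit(accuracy, qos_ok, threshold=0.8, retrained=False):
    """Auditor decision sketch: returns the next action for the
    phase-based reservation pipeline based on observed quality."""
    if accuracy >= threshold and qos_ok:
        return "keep"                      # component-level quality is fine
    if not retrained:
        # Workload drifted from the training profile: re-analyze recent
        # data, update the phase profile, and re-train the predictor.
        return "retrain"
    # Re-training did not restore QoS: fall back to reserving the
    # largest amount the user is willing to pay for the whole run.
    return "fallback_conservative"
```

The fallback branch mirrors the conservative reservation strategy described above, which trades cost savings for predictability.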


At a high level, the problem of clustering is defined as follows: given a set U of n samples u1, u2, ..., un, we would like to partition U into k subsets U1, U2, ..., Uk, such that the samples assigned to each subset are more similar to each other than to samples assigned to different subsets. Here, we assume that two samples are similar if they correspond to the same phase.

A typical clustering activity involves the following steps [97]:

(1) Pattern representation, which is used to obtain an appropriate set of features to use in clustering. It optionally consists of feature extraction and/or selection. Feature selection is the process of identifying the most effective subset of the original features to use in clustering. Feature extraction is the use of one or more transformations of the input features to produce new salient features.

In the context of resource demand phase analysis, the features under study are the system-level resource performance metrics shown in Table 5-1. For one-dimensional clustering, which is the case in this work, feature selection is as simple as choosing the performance metric that is instructive to the allocation of the corresponding system resource. For clustering based on multiple performance metrics, feature extraction techniques such as Principal Component Analysis (PCA) may be used to transform the input performance metrics to a lower-dimensional space, reducing the computational intensity of the subsequent clustering and improving the clustering quality.

(2) Definition of a pattern proximity measure appropriate to the data domain. The pattern proximity is usually measured by a distance function defined on pairs of patterns. In this work, the most popular metric for continuous features, the Euclidean distance, is used.
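For continuous features, the Euclidean proximity measure, together with the feature normalization used before multi-dimensional clustering, can be sketched as follows. This is a generic illustration, not code from the prototype:

```python
import math

def euclidean(p, q):
    """Euclidean distance between two d-dimensional patterns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def normalize(patterns):
    """Min-max normalize each feature so a large-scaled feature
    does not dominate the distance computation."""
    lo = [min(col) for col in zip(*patterns)]
    hi = [max(col) for col in zip(*patterns)]
    return [[(x - l) / (h - l) if h > l else 0.0
             for x, l, h in zip(p, lo, hi)]
            for p in patterns]
```

In the one-dimensional case studied in this chapter, the distance reduces to the absolute difference between two usage samples.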


(3) Clustering or grouping: the clustering can be performed in a number of ways [97]. The output clustering can be hard (a partition of the data into groups) or fuzzy (where each pattern has a variable degree of membership in each of the output clusters). A hard clustering can be obtained from a fuzzy partition by thresholding the membership values. In this work, one of the most popular iterative clustering methods, the k-means algorithm, detailed in Section 5.3.3, is used.

The following terms from the clustering literature [97] are used in this chapter:

- A pattern (or feature vector, or observation) is a single data item used by the clustering algorithm. It typically consists of a vector of d measurements.
- The individual scalar components of a pattern are called features (or attributes).
- d is the dimensionality of the pattern or of the pattern space.
- A class refers to a state of nature that governs the pattern generation process in some cases. More concretely, a class can be viewed as a source of patterns whose distribution in feature space is governed by a probability density specific to the class. Clustering techniques attempt to group patterns so that the classes thereby obtained reflect the different pattern generation processes represented in the pattern set.
- A distance measure is a metric on the feature space used to quantify the similarity of patterns.


In the case of clustering in a multi-dimensional space, normalization of the continuous features can be used to remove the tendency of the largest-scaled feature to dominate the others. In addition, the Mahalanobis distance can be used to remove the distortion caused by linear correlation among the features, as discussed in Chapter 3.

The k-means algorithm works as follows [97]:

(1) Choose k cluster centers to coincide with k randomly chosen patterns inside the hypervolume containing the pattern set.

(2) Assign each pattern to the closest cluster center.

(3) Recompute the cluster centers using the current cluster memberships.

(4) If a convergence criterion is not met, go to step 2. Typical convergence criteria are: no (or minimal) reassignment of patterns to new cluster centers, or minimal decrease in squared error.

The algorithm has a time complexity of O(n), where n is the number of patterns, and a space complexity of O(k), where k is the number of clusters. The algorithm is order-independent: for a given initial seed set of cluster centers, it generates the same partition of the data irrespective of the order in which the patterns are presented. However, the algorithm is sensitive to the initial seed selection, and even in the best case it can produce only hyperspherical clusters.
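The four steps above can be sketched in a few lines. This is a minimal one-dimensional k-means for illustration (the sample data and the fixed seed in the test are hypothetical; a production system would typically use a library implementation):

```python
import random

def kmeans(samples, k, max_iter=100, seed=0):
    """Minimal k-means for one-dimensional data (a list of floats).
    Returns the cluster centers and the per-sample assignment."""
    rng = random.Random(seed)
    centers = rng.sample(samples, k)   # step 1: random initial centers
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign each pattern to the closest cluster center.
        new_assignment = [min(range(k), key=lambda i: abs(x - centers[i]))
                          for x in samples]
        if new_assignment == assignment:   # step 4: convergence check
            break
        assignment = new_assignment
        # Step 3: recompute the centers from the current memberships.
        for i in range(k):
            members = [x for x, a in zip(samples, assignment) if a == i]
            if members:
                centers[i] = sum(members) / len(members)
    return centers, assignment
```

With multi-dimensional patterns, the absolute difference would be replaced by the Euclidean (or Mahalanobis) distance discussed above.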


The best number of clusters, in the context of the phase analysis discussed in this work, is the one that gives the minimal total costs [96]. The process to find the optimal number of clusters for the application workload is explained as follows.

Let un = u(t0 + n·Δt) denote the resource usage sampled at time t = t0 + n·Δt during the execution of an application. As shown in Section 5.3.3, when clustering with input parameter k (i.e., the number of clusters) is performed for a resource usage set U = {u1, u2, ...}, the subset Ui of resource usages that belong to the ith phase can be written as

    Ui = { un ∈ U : un is assigned to cluster i },  i = 1, ..., k,

and the total resource reservation R over the whole execution period can be written as

    R(k) = Σ_{i=1..k} max(Ui) × size(Ui)

where k is the number of clusters used for the clustering algorithm and size(Ui) is defined as the number of elements of the subset Ui. Compared to the conservative reservation strategy, which reserves the global maximum amount of resources over the whole execution period, the phase-based reservation strategy can better adapt the resource reservation to the actual resource usage and reduce the resource reservation cost, as shown in Figure 5-2.
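The difference between the two strategies can be computed directly from a clustering result. The sketch below is illustrative (the usage values and labels in the test are hypothetical); each interval of a phase is provisioned at that phase's maximum observed usage:

```python
def phase_reservation(usage, labels):
    """Total reservation R(k) when each interval is provisioned at the
    maximum usage of its phase: sum over phases of max(Ui) * size(Ui)."""
    total = 0.0
    for p in set(labels):
        members = [u for u, l in zip(usage, labels) if l == p]
        total += max(members) * len(members)
    return total

def conservative_reservation(usage):
    """Reserve the global maximum usage for every interval."""
    return max(usage) * len(usage)
```

By construction, the phase-based total never exceeds the conservative one; the gap widens when the workload spends long stretches far below its global peak.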


Figure 5-2 illustrates the difference between the two reservation strategies using a hypothetical workload.

Figure 5-2. Resource allocation strategy comparison. The phase-based resource allocation strategy can adapt the time (t) and space (s) granularity of the allocation to the actual resource usage. It presents a cost-reduction opportunity compared to the coarse-grained conservative strategy.

The total cost of a phase-based reservation combines the resource reservation cost and the phase transition cost:

    TC(k) = C1 × R(k) + C2 × TR(k)    (5-5)

where C1 and C2 denote the unit costs per resource usage and per transition, and TR(k) is the number of phase transitions. The best number of phases, kbest, should minimize the total cost. Therefore, kbest is derived as


    kbest = argmin_{1 ≤ k ≤ K} TC(k)    (5-6)

where k is the number of phases. Taking both the phase transition and misprediction costs into account, the general total cost function is modified as

    TC'(k) = R(k) + C × TR(k) + Cp × P(k)    (5-9)


where C is the transition factor, Cp denotes the discount factor for the misprediction penalty, which is the ratio of C3 to C1, P(k) is the misprediction penalty, and K is the maximum number of phases.

The phase prediction workflow is shown in Figure 5-3. The prediction consists of two stages: a training stage and a testing stage. During the training stage, the number of clusters in the application resource usage, the corresponding cluster centroids, and the unknown parameters of the time-series prediction model of the resource usage are determined. During the testing stage, the one-step-ahead resource usage is predicted and classified as one of the clusters.

Both stages start from pattern representation and framing. In the pattern representation step, the collected performance data of the application VM are profiled to extract only the features that will be used for clustering and future resource provisioning. For example, in the one-dimensional case discussed in this thesis, the training data of a specific performance feature, X1..u (see Table 5-1), are extracted, where u is the total number of input data points. Then the extracted performance data X1..u are framed with the prediction window size m to form the data X'(u-m+1)×m.

The training stage mainly consists of two processes: prediction model fitting and phase behavior analysis. The algorithms defined in Sections 5.3.3 and 5.3.4 are used to find the number of phases k which gives the lowest total resource provisioning cost. The output phase profile is used to train the phase predictor. In addition, the unknown parameters of the resource predictor are estimated from the training data. In this thesis,


a time-series prediction model [78] is fitted to forecast the one-step-ahead resource usage. However, the prototype can generally work with any other time-series prediction model. In the case of highly dynamic workloads, the Learning-Aided Resource Predictor (LARPredictor) developed in Chapter 4 can be used. The LARPredictor uses a mix-of-experts approach, which adaptively chooses the best prediction model from a pool of models based on learning of the correlations between the workload and the fitted prediction models of historical runs.

Similar to the training stage, the testing data Y1..v are extracted and framed with the prediction window size m. The framed testing data Y'(v-m+1)×m are used as input to the fitted resource predictor to predict the future resource usage Ŷ'1..v. The phase predictor classifies the predicted resource usages Ŷ'1..v into the phases P'1..v based on the phase profile learned in the training stage. Similarly, phase predictions for the actual resource usage Y1..v are performed to generate P1..v. Then the corresponding predicted phases P'1..v (which are based on predicted resource usage) and P1..v (which are based on actual resource usage) are compared to evaluate the phase prediction accuracy, which is defined as the ratio of the number of matched phase predictions to the total number of phase predictions.
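The testing-stage bookkeeping above can be sketched as follows; this is a simplified illustration (the function names are not the prototype's actual code), with phase classification done by the nearest cluster centroid from the phase profile:

```python
def frame(series, m):
    """Slide a window of size m over the series: each row is one
    input frame for the one-step-ahead resource predictor."""
    return [series[i:i + m] for i in range(len(series) - m + 1)]

def nearest_centroid(value, centroids):
    """Classify a (predicted or observed) usage value into the phase
    whose cluster centroid is closest."""
    return min(range(len(centroids)), key=lambda i: abs(value - centroids[i]))

def phase_prediction_accuracy(predicted_usage, actual_usage, centroids):
    """Ratio of intervals whose phase derived from the forecast usage
    matches the phase derived from the actual usage."""
    pred = [nearest_centroid(v, centroids) for v in predicted_usage]
    actual = [nearest_centroid(v, centroids) for v in actual_usage]
    matches = sum(p == a for p, a in zip(pred, actual))
    return matches / len(actual)
```

Note that a forecast can be numerically off yet still land in the correct phase, which is why phase accuracy can exceed raw predictor accuracy.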


Table 5-1. Performance feature list

  Feature           Description
  CPU System/User   Percent CPU (system/user)
  Bytes In/Out      Number of bytes per second into/out of the network
  IO BI/BO          Blocks sent to/received from a block device (blocks/s)
  Swap In/Out       Amount of memory swapped in/out from/to disk (kB/s)

The algorithms in Section 5.3.4 can be used to find the best number of clusters for an application workload. The Ganglia monitoring daemon was used to collect the performance data of the application container. Table 5-1 shows the list of performance features under study in the experiments.

The SPECseis96 workload [53] was hosted by a VMware GSX virtual machine. The host server of the virtual machine was an Intel(R) Xeon(TM) dual-CPU 1.80 GHz machine with 512 KB cache and 1 GB RAM. The Ganglia daemon was installed in the guest VM and run to collect the resource performance data once every five seconds (5 secs/interval) and store them in the performance database. During feature representation, the data were extracted based on a given VMID, FeatureID, and starting and ending timestamps to form the time-series data under study. Subsequent phase analysis was then performed for the 8,000 performance snapshots collected during the monitoring periods.

Figure 5-4A shows a sample set of training data of the CPU_user (%) of the VM, including the actual resource usages (ActualRsc), reserved resources based on the k-means clustering with k=3 (RsvdRsc), and reserved resources based on the conservative reservation strategy (ConsrvRsc). Figure 5-4B shows a sample set of the corresponding testing data, including the actual resource usage (ActualRsc) and the resource reservation based on actual resource usage (RsvdRsc).


Figures 5-4C and 5-4D show that, with an increasing number of phases, two of the determinants in the cost model, the number of phase transitions TR(k) and the misprediction penalty P(k), increase monotonically. The other determinant of the cost model, the amount of reserved resources R(k), is shown by the lowest curve, with index C=0, in Figure 5-4E. It indicates that with an increasing number of phases, the total reserved resources of the training set decrease monotonically. This is because, with an increasing number of phases, the resource allocation can be performed at time scales of finer granularity. However, there is a diminishing return on the increased number of phases because of the increasing phase transition costs and misprediction penalties.

In the first analysis, we assume each resource reservation scheme to be clairvoyant, i.e., it reserves resources based on exact knowledge of future workload requirements. This assumption eliminates the impact of inaccuracies introduced by the phase predictor. In this case, Equation (5-6), which takes the resource reservation cost and the phase transition cost into account while deciding the optimal number of phases, can be applied, as shown in Figure 5-4E. In this figure, the total cost over the whole testing period is measured by CPU usage in percentage. The discount factor C denotes the CPU percentage that each phase transition costs: C = CPU(%) × TransitionDuration. For example, the bottom line of C=0 shows the case of no transition cost, which gives the lower bound of the total cost. As another instance, C=260% implies a 13-second transition period (2.6 intervals × 5 secs/interval) under the assumption of 100% CPU consumption during the transition period. When the discount factor C increases from 0 to 260, the best number of phases kbest, which provides the lowest total cost, decreases gradually from 10 to 2. The phase profile depicted in Figure 5-4E can be used to decide the number of phases that should be used in phase-based resource reservation to minimize the total cost with the given available transition options. For example, VMware ESX supports


The impact of inaccuracies introduced by the phase predictor is shown in Figure 5-4F. In addition to the resource reservation costs and the phase transition costs, this experiment also took the phase misprediction penalty costs into account while calculating the total cost. For example, for each unit of down-sized mispredicted resource, a penalty of 8 times (Cp=8) the unit resource cost is imposed. Comparing Figure 5-4E to Figure 5-4F, we can see that adding the penalty to the cost model increases the final costs to the user for the same set of k and C, and can potentially reduce the workload's best number of phases k'best.

Finally, a total cost ratio γ is defined as the ratio of the total cost using k phases, TC'(k), to the total cost of one phase, TC'(1):

    γ(k) = TC'(k) / TC'(1)

Intuitively, γ measures the cost savings achieved using the phase-based reservation strategy over the conservative one. Thus, the smaller the value of γ, the more efficient the phase-based reservation scheme. Table 5-2 gives a sample total cost schedule (C=52 and Cp=8) for each of the eight performance features of SPECseis96. It shows that by changing the resource provisioning strategy from the conservative approach (k=1) to phase-based provisioning (k=3), a 29.5% total cost reduction for CPU usage can be achieved. For spiky trace data such as disk I/O and memory usage, the total cost reduction can be as high as 49%.
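The cost-model selection above can be sketched as follows. The unit costs and per-k values in the test are hypothetical; `R`, `TR`, and `P` stand for the reserved resources, transition counts, and misprediction penalties obtained from a given clustering:

```python
def total_cost(R, TR, P, C, Cp):
    """Generalized total cost TC'(k): reserved resources plus
    transition cost (factor C) plus misprediction penalty (factor Cp)."""
    return R + C * TR + Cp * P

def best_num_phases(costs_by_k, C, Cp):
    """Pick the k minimizing the total cost.
    costs_by_k maps k -> (R(k), TR(k), P(k))."""
    return min(costs_by_k,
               key=lambda k: total_cost(*costs_by_k[k], C, Cp))

def cost_ratio(costs_by_k, k, C, Cp):
    """Total cost ratio: TC'(k) relative to the conservative TC'(1)."""
    return (total_cost(*costs_by_k[k], C, Cp)
            / total_cost(*costs_by_k[1], C, Cp))
```

As in the experiments, raising the transition factor C pushes the optimal k back toward the conservative single-phase reservation.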


Table 5-2. SPECseis96 total cost ratio schedule for the eight performance features

The workload used in this experiment was based on the 1998 World Cup trace [98]. The openly available trace, containing a log of requests to Web servers, was used as input to a client replay tool, which enabled us to exercise a realistic Web-based workload and collect system-level performance metrics using Ganglia in the same manner as was done for the SPECseis96 workload. For this study, we chose to replay the five-hour (from 22:00:01 Jun. 23 to 3:11:20 Jun. 24) log of the least-loaded server (server ID 101), which contained 130,000 web requests.

The phase analysis and prediction techniques can be used to characterize performance data collected not only from virtual machines but also from physical machines. During the experiment, a physical server with sixteen Intel(R) Xeon(TM) MP 3.00 GHz CPUs and 32 GB memory was used to execute the replay clients, which submit requests based on the submission intervals, HTTP protocol types (1.0 or 1.1), and document sizes defined in the log file. A physical machine with an Intel(R) Pentium(R) 4 1.70 GHz CPU and 512 MB memory was used to host the Apache web server and a set of files which were created based on the file sizes described in the log.


A tool accompanying the trace [98] was used to convert the binary log into the Common Log Format. A modified version of the Real-Time Web Log Replayer [99] was used to analyze and generate the files needed by the log replayer and to perform the replay.

Figures 5-5 and 5-6 show the phase characterization results for the performance features bytes_in and bytes_out of the web server. The interesting observation from Figures 5-5A and 5-5B is that the number of phase transitions and the misprediction penalties do not always monotonically increase with the increasing number of phases. As a result, the phase profile shown in Figure 5-5C indicates that three-phase-based resource provisioning gives the lowest total cost with given C = [150k, 750k] and Cp = 8. The result implies that the phase profile is highly workload dependent. The prototype presented in this thesis can help to construct and analyze the phase profile of the application's resource consumption and decide the proper resource provisioning strategy.

The phase predictions were performed as described in Section 5.4. A performance measurement, prediction accuracy, is defined as the ratio of the number of performance snapshots whose predicted phases match the observed phases to the total number of performance snapshots collected during the testing period.

Table 5-3 shows the phase prediction accuracies for the performance traces of the main resources consumed by the SPECseis96 and WorldCup98 workloads. Generally, the phase prediction accuracy of each performance feature decreases with an increasing number of phases. This explains why the penalty curve rises monotonically with the increasing number of phases in Figure 5-4D. With the current implementation, an average of 95% accuracy can be achieved for the network performance traces of the WorldCup98 log


replay, and an average of 85% accuracy can be achieved for the CPU performance traces of SPECseis96 for the four-phase cases.

Table 5-3. Average phase prediction accuracy

Table 5-4. Performance feature list of VM traces

  Feature      Description
  CPU Ready    The percentage of time that the virtual machine was ready but could not get scheduled to run on a physical CPU.
  CPU Used     The percentage of physical CPU resources used by a virtual CPU.
  Mem Size     Current amount of memory in bytes the virtual machine has.
  Mem Swap     Amount of swap space in bytes used by the virtual machine.
  Net RX/TX    The number of packets and the MBytes per second that are transmitted and received by a NIC.
  Disk RD/WR   The number of I/Os and KBytes per second that are read from and written to the disk.

In addition to the above two applications, we also evaluated the prediction performance of the phase predictor using traces of a set of five virtual machines. These virtual machines were hosted by a physical machine with an Intel(R) Xeon(TM) 2.0 GHz CPU, 4 GB memory, and a 36 GB SCSI disk. VMware ESX server 2.5.2 was running on the physical host. The vmkusage tool was run on the ESX server to collect the resource performance data of the guest virtual machines every minute and store them in a round-robin database. The performance features under study in this experiment are shown in Table 5-4.


Table 5-5 shows the average phase prediction accuracies for each of the 12 performance features over all five VMs. It shows that with an increasing number of phases, the phase prediction accuracy of each performance feature decreases monotonically. The prediction accuracies vary with the performance features under study. With the current implementation, an average of 83.25% accuracy can be achieved across the phase predictions of all twelve performance features for the two-phase cases.

The current prototype makes the following assumptions:

1. A clear mapping between resource consumption and response time is assumed for the application container. This might not always be true for all types of applications. More complex performance/queuing models may be needed to provide an accurate mapping in the case of complex applications.

2. A dedicated machine is assumed for the application container to collect the performance data. In the case that multiple applications co-exist on the same hosting machine, a more sophisticated method of data collection, for example aggregating the performance data of the processes that belong to the same application, may be needed.


Table 5-5. Average phase prediction accuracy of the five VMs

3. In this work, one-dimensional phase analysis and prediction is performed. However, the prototype can generally work for multi-dimensional resource provisioning cases as well. For clustering in the multi-dimensional space, additional pattern representation techniques such as Principal Component Analysis (PCA) can be used to project the data to a lower-dimensional space to reduce the computational intensity. In addition, the transition factor C will represent the unit transition cost defined in the pricing schedule of the resource provider.

Developing prediction models for parallel and multi-tier applications is part of our future research.

Program phase behavior has been studied for several purposes [100][101]. Second, phase characterization that summarizes application behavior with representative execution regions can be used


in architecture simulation and optimization studies [102][103]. Our purpose in studying phase behavior is to support dynamic resource provisioning of the application containers.

In addition to the purpose of study, our approach differs from traditional program phase analysis in the following ways:

1) Performance metric under study: in the area of power management and simulation optimization for computer architecture research, the metrics used for workload characterization are typically Basic Block Vectors (BBV) [102][101], conditional branch counters [104], and instruction working sets [105]. In the context of application VM/container resource provisioning, the metrics under study are the system-level performance features that are instructive to VM resource provisioning, such as those shown in Table 5-1.

2) Knowledge of the program code: while [102][101][104] at least require profiling of program binary codes, our approach requires neither instrumentation of nor access to program codes.

3) This thesis answers the question "how many clusters are best" in the context of system-level resource provisioning.

In [106], Dhodapkar et al. compared the three dynamic program phase detection techniques discussed in [102], [104], and [105] using a variety of performance metrics, such as sensitivity, stability, performance variance, and correlations between phase detection techniques.

In addition, other related work on resource provisioning includes the following. Urgaonkar et al. studied resource provisioning in a multi-tier web environment [107]. Wildstrom et al. developed a method to identify the best CPU and memory configuration from a pool of configurations for a specific workload [108]. Chase et al. proposed a hierarchical architecture that allocates virtual clusters to a group of applications [109]. Kusic et al. developed an optimization framework to decide the number of servers to allocate to


each workload [110]. Tesauro et al. used a combination of reinforcement learning and queuing models for system performance management [5].


Figure 5-3. Application resource demand phase prediction workflow. In the training stage, the u performance data X1..u of the feature(s) used in the subsequent phase analysis are extracted (pattern representation) and framed with prediction window size m. The unknown parameters of the resource predictor are estimated during model fitting using the framed training data X'(u-m+1)×m. In addition, the clustering algorithms introduced in Section 5.3 are used to construct the application phase profile, including the phase labels I1..u for all the samples and the calculated cluster centroids C1..k. In the testing stage, the phase predictor uses the knowledge learned from the phase profile to predict the future phases P'1..v, based on the predicted resource usage Ŷ'1..v, and P1..v, based on the observed actual resource usage Y1..v, and compares them to evaluate the phase prediction accuracy.


Figure 5-4. Phase analysis of SPECseis96 CPU_user. A) Sample training data. B) Sample testing data. C) Phase transitions. D) Misprediction penalties. E) Total cost without penalty. F) Total cost with penalty (Cp=8).






Figure 5-5. Phase analysis of WorldCup'98 bytes_in. A) Phase transitions. B) Misprediction penalties. C) Total cost with penalty (Cp=8).


Figure 5-6. Phase analysis of WorldCup'98 bytes_out. A) Phase transitions. B) Misprediction penalties. C) Total cost with penalty (Cp=8).


Self-management has drawn increasing attention in the last few years due to the increasing size and complexity of computing systems. A resource scheduler that can perform self-optimization and self-configuration can help to improve system throughput and free system administrators from labor-intensive and error-prone tasks. However, it is challenging to equip a resource scheduler with such capacities because of the dynamic nature of system performance and workloads.

In this dissertation, we propose to use machine learning techniques to assist system performance modeling and application workload characterization, which can provide support for on-demand resource scheduling. In addition, virtual machines are used as resource containers to host application executions for the ease of dynamic resource provisioning and load balancing.

The application classification framework presented in Chapter 2 used Principal Component Analysis (PCA) to reduce the dimension of the performance data space. Then the k-Nearest Neighbor (k-NN) algorithm is used to classify the data into different classes such as CPU-intensive, I/O-intensive, memory-intensive, and network-intensive. It does not require modifications of the application source code. Experiments with various benchmark applications suggest that with the application class knowledge, a scheduler can improve the system throughput by 22.11% on average by allocating applications of different classes to share the system resources.

The feature selection prototype presented in Chapter 3 uses a probabilistic model (a Bayesian Network) to systematically select the representative performance features, which can provide optimal classification accuracy and adapt to changing workloads. It shows that autonomic feature selection enables classification without requiring expert knowledge in the selection of relevant low-level performance metrics. This approach requires neither application source code modification nor execution intervention. Results from


In addition to the application resource demand modeling, Chapter 4 proposes a learning-based adaptive predictor, which can be used to predict resource availability. It uses the k-NN classifier and PCA to learn the relationship between workload characteristics and the best-suited predictor based on historical predictions, and to forecast the best predictor for the workload under study. Then only the selected best predictor is run to predict the next value of the performance metric, instead of running multiple predictors in parallel to identify the best one. The experimental results show that this learning-aided adaptive resource predictor can often outperform the single best predictor in the pool without a priori knowledge of which model best fits the data.

The application classification and feature selection techniques can be used to determine an application's resource consumption patterns at any given moment. The experimental results of the application classification suggest that allocating applications which have complementary resource consumption patterns to the same server can improve the system throughput.

In addition to one-step-ahead performance prediction, Chapter 5 studied the large-scale behavior of application resource consumption. Clustering-based algorithms have been explored to provide a mechanism to define and predict the phase behavior of application resource usage in support of on-demand resource allocation. The experimental results show that an average of above 90% phase prediction accuracy can be achieved for the four-phase cases of the benchmark workloads.


[1] J. Kephart and D. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, pp. 41-50, 2003.

[2] Y. Yang and H. Casanova, "Rumr: Robust scheduling for divisible workloads," in Proc. 12th High-Performance Distributed Computing, Seattle, WA, June 22-24, 2003, pp. 114-125.

[3] J. M. Schopf and F. Berman, "Stochastic scheduling," in Proc. ACM/IEEE Conference on Supercomputing, Portland, OR, Nov. 14-19, 1999, p. 48.

[4] L. Yang, J. M. Schopf, and I. Foster, "Conservative scheduling: Using predicted variance to improve scheduling decisions in dynamic environments," in Proc. ACM/IEEE Conference on Supercomputing, Nov. 15-21, 2003, p. 31.

[5] G. Tesauro, N. Jong, R. Das, and M. Bennani, "A hybrid reinforcement learning approach to autonomic resource allocation," in Proc. IEEE International Conference on Autonomic Computing (ICAC'06), 2006, pp. 65-73.

[6] G. Tesauro, R. Das, W. Walsh, and J. Kephart, "Utility-function-driven resource allocation in autonomic systems," in Proc. Second International Conference on Autonomic Computing (ICAC'05), 2005, pp. 342-343.

[7] R. Jain, The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, Wiley-Interscience, New York, NY, Apr. 1991.

[8] J. O. Kephart, "Research challenges of autonomic computing," in Proc. 27th International Conference on Software Engineering (ICSE), May 2005, pp. 15-22.

[9] S. M. Weiss and C. A. Kulikowski, Computer Systems That Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems, Morgan Kaufmann, San Mateo, CA, 1990.

[10] R. P. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine, vol. 7, no. 6, pp. 34-45, June 1974.

[11] R. Figueiredo, P. Dinda, and J. Fortes, "A case for grid computing on virtual machines," in Proc. 23rd International Conference on Distributed Computing Systems, May 19-22, 2003, pp. 550-559.

[12] S. Pinter, Y. Aridor, S. Shultz, and S. Guenender, "Improving machine virtualization with 'hotplug memory'," in Proc. 17th International Symposium on Computer Architecture and High Performance Computing, 2005, pp. 168-175.
[13] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proc. 2nd Symposium on Networked Systems Design & Implementation (NSDI'05), Boston, MA, 2005.

PAGE 138

[14] "VMotion," http://www.vmware.com/products/vi/vc/vmotion.html.

[15] M. Zhao, J. Zhang, and R. Figueiredo, "Distributed file system support for virtual machines in grid computing," in Proc. 13th International Symposium on High Performance Distributed Computing, 2004, pp. 202-211.

[16] I. Krsul, A. Ganguly, J. Zhang, J. Fortes, and R. Figueiredo, "VMPlants: Providing and managing virtual machine execution environments for grid computing," in Proc. Supercomputing, Washington, DC, Nov. 6-12, 2004.

[17] J. Sugerman, G. Venkitachalam, and B. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in Proc. USENIX Annual Technical Conference, 2001.

[18] J. Dike, "A user-mode port of the Linux kernel," in Proc. 4th Annual Linux Showcase and Conference, USENIX Association, Atlanta, GA, Oct. 2000.

[19] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid computing," in Proc. 3rd USENIX Virtual Machine Research and Technology Symposium, May 2004.

[20] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and migration of UNIX processes in the Condor distributed processing system," Tech. Rep. UW-CS-TR-1346, University of Wisconsin-Madison Computer Sciences Department, Apr. 1997.

[21] A. Barak, O. Laden, and Y. Yarom, "The NOW MOSIX and its preemptive process migration scheme," Bulletin of the IEEE Technical Committee on Operating Systems and Application Environments, vol. 7, no. 2, pp. 5-11, 1995.

[22] R. Duda, P. Hart, and D. Stork, Pattern Classification, 2nd ed., Wiley-Interscience, New York, NY, 2001.

[23] C. G. Atkeson, A. W. Moore, and S. Schaal, "Locally weighted learning," Artificial Intelligence Review, vol. 11, no. 1-5, pp. 11-73, 1997.

[24] S. Adabala, V. Chadha, P. Chawla, R. J. O. Figueiredo, J. A. B. Fortes, I. Krsul, A. M. Matsunaga, M. O. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From virtualized resources to virtual computing grids: the In-VIGO system," Future Generation Computer Systems, vol. 21, no. 6, pp. 896-909, 2005.

[25] L. Yu and H. Liu, "Efficient feature selection via analysis of relevance and redundancy," Journal of Machine Learning Research, vol. 5, pp. 1205-1224, Oct. 2004.
[26] T. Cover and P. Hart, "Nearest neighbor pattern classification," IEEE Trans. Inf. Theory, vol. 13, no. 1, pp. 21-27, Jan. 1967.


[27] M. L. Massie, B. N. Chun, and D. E. Culler, "The Ganglia distributed monitoring system: Design, implementation, and experience," Parallel Computing, vol. 30, no. 5-6, pp. 817-840, 2004.

[28] "NetApp," http://www.netapp.com/tech library/3022.html.

[29] R. Eigenmann and S. Hassanzadeh, "Benchmarking with real industrial applications: the SPEC High-Performance Group," IEEE Computational Science and Engineering, vol. 3, no. 1, pp. 18-23, 1996.

[30] "Ettcp," http://sourceforge.net/projects/ettcp/.

[31] "SimpleScalar," http://www.cs.wisc.edu/mscalar/simplescalar.html.

[32] "CH3D," http://users.coastal.ufl.edu/pete/CH3D/ch3d.html.

[33] "Bonnie," http://www.textuality.com/bonnie/.

[34] Q. Snell, A. Mikler, and J. Gustafson, "NetPIPE: A network protocol independent performance evaluator," June 1996.

[35] "VMD," http://www.ks.uiuc.edu/Research/vmd/.

[36] "SPIM," http://www.cs.wisc.edu/larus/spim.html.

[37] "Reference of STREAM," http://www.cs.virginia.edu/stream/ref.html.

[38] "Autobench," http://www.xenoclast.org/autobench/.

[39] I. Guyon and A. Elisseeff, "An introduction to variable and feature selection," J. Mach. Learn. Res., vol. 3, pp. 1157-1182, Mar. 2003.

[40] Y. Liao and V. R. Vemuri, "Using text categorization techniques for intrusion detection," in Proc. 11th USENIX Security Symposium, San Francisco, CA, Aug. 5-9, 2002, pp. 51-59.

[41] A. K. Ghosh, A. Schwartzbard, and M. Schatz, "Learning program behavior profiles for intrusion detection," in Proc. Workshop on Intrusion Detection and Network Monitoring, Santa Clara, CA, Apr. 9-12, 1999, pp. 51-62.

[42] M. Almgren and E. Jonsson, "Using active learning in intrusion detection," in Proc. 17th IEEE Computer Security Foundations Workshop, June 28-30, 2004, pp. 88-98.

[43] S. C. Lee and D. V. Heinbuch, "Training a neural-network based intrusion detector to recognize novel attacks," IEEE Transactions on Systems, Man, and Cybernetics, Part A, vol. 31, no. 4, pp. 294-299, 2001.

[44] G. Forman, "An extensive empirical study of feature selection metrics for text classification," J. Mach. Learn. Res., vol. 3, pp. 1289-1305, 2003.


[45] N. H. Kapadia, J. A. B. Fortes, and C. E. Brodley, "Predictive application-performance modeling in a computational grid environment," in Proc. 8th IEEE International Symposium on High Performance Distributed Computing, Redondo Beach, CA, Aug. 3–6, 1999, p. 6.

[46] J. Basney and M. Livny, "Improving goodput by coscheduling CPU and network capacity," Int. J. High Perform. Comput. Appl., vol. 13, no. 3, pp. 220–230, Aug. 1999.

[47] R. Raman, M. Livny, and M. Solomon, "Policy driven heterogeneous resource co-allocation with gangmatching," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing (HPDC'03), Seattle, WA, June 22–24, 2003, p. 80.

[48] S. Sodhi and J. Subhlok, "Skeleton based performance prediction on shared networks," in IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2004), 2004, pp. 723–730.

[49] V. Taylor, X. Wu, and R. Stevens, "Prophesy: an infrastructure for performance analysis and modeling of parallel and grid applications," SIGMETRICS Perform. Eval. Rev., vol. 30, no. 4, pp. 13–18, 2003.

[50] O. Y. Nickolayev, P. C. Roth, and D. A. Reed, "Real-time statistical clustering for event trace reduction," The International Journal of Supercomputer Applications and High Performance Computing, vol. 11, no. 2, pp. 144–159, Summer 1997.

[51] D. H. Ahn and J. S. Vetter, "Scalable analysis techniques for microprocessor performance counter metrics," in Proc. SuperComputing, Baltimore, MD, Nov. 16–22, 2002, pp. 1–16.

[52] I. Cohen, J. S. Chase, M. Goldszmidt, T. Kelly, and J. Symons, "Correlating instrumentation data to system states: A building block for automated diagnosis and control," in 6th USENIX Symposium on Operating Systems Design and Implementation, 2004, pp. 231–244.

[53] J. Zhang and R. Figueiredo, "Application classification through monitoring and learning of resource consumption patterns," in Proc. 20th International Parallel & Distributed Processing Symposium, Rhodes Island, Greece, Apr. 25–29, 2006.

[54] M. Massie, B. Chun, and D. Culler, The Ganglia Distributed Monitoring System: Design, Implementation, and Experience, Addison-Wesley, Reading, MA, 2003.
[55] S. Agarwala, C. Poellabauer, J. Kong, K. Schwan, and M. Wolf, "Resource-aware stream management with the customizable dproc distributed monitoring mechanisms," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing, June 22–24, 2003, pp. 250–259.

[56] "HP," http://www.managementsoftware.hp.com.


[57] H. Liu and L. Yu, "Toward integrating feature selection algorithms for classification and clustering," IEEE Trans. Knowl. Data Eng., vol. 17, no. 4, pp. 491–502, Apr. 2005.

[58] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann Publishers, San Francisco, CA, 1988.

[59] T. Dean, K. Basye, R. Chekaluk, S. Hyun, M. Lejter, and M. Randazza, "Coping with uncertainty in a control system for navigation and exploration," in Proc. 8th National Conference on Artificial Intelligence, Boston, MA, July 29–Aug. 3, 1990, pp. 1010–1015.

[60] D. Heckerman, "Probabilistic similarity networks," Tech. Rep., Depts. of Computer Science and Medicine, Stanford University, 1990.

[61] D. J. Spiegelhalter, R. C. Franklin, and K. Bull, "Assessment criticism and improvement of imprecise subjective probabilities for a medical expert system," in Proc. Fifth Workshop on Uncertainty in Artificial Intelligence, 1989, pp. 335–342.

[62] E. Charniak and D. McDermott, Introduction to Artificial Intelligence, Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1985.

[63] T. S. Levitt, J. Mullin, and T. O. Binford, "Model-based influence diagrams for machine vision," in Proc. 5th Workshop on Uncertainty in Artificial Intelligence, 1989, pp. 233–244.

[64] R. E. Neapolitan, Probabilistic Reasoning in Expert Systems: Theory and Algorithms, John Wiley & Sons, Inc., New York, NY, USA, 1990.

[65] K. Weinberger, J. Blitzer, and L. Saul, "Distance metric learning for large margin nearest neighbor classification," in Proc. 19th Annual Conference on Neural Information Processing Systems, Vancouver, CA, Dec. 2005.

[66] R. Kohavi and F. Provost, "Glossary of terms," Machine Learning, vol. 30, pp. 271–274, 1998.

[67] B. Ziebart, D. Roth, R. Campbell, and A. Dey, "Automated and adaptive threshold setting: Enabling technology for autonomy and self-management," in Proc. 2nd International Conference of Autonomic Computing, June 13–16, 2005, pp. 204–215.

[68] P. Mitra, C. Murthy, and S. Pal, "Unsupervised feature selection using feature similarity," IEEE Trans. Pat. Anal. Mach. Intel., vol. 24, no. 3, pp. 301–312, Mar. 2002.
[69] W. Lee, S. J. Stolfo, and K. W. Mok, "Adaptive intrusion detection: A data mining approach," Artificial Intelligence Review, vol. 14, no. 6, pp. 533–567, 2000.

[70] M. K. Aguilera, J. C. Mogul, J. L. Wiener, P. Reynolds, and A. Muthitacharoen, "Performance debugging for distributed systems of black boxes," in Proc. 19th ACM Symposium on Operating Systems Principles, 2003.


[71] R. Isaacs and P. Barham, "Performance analysis in loosely-coupled distributed systems," in Proc. 7th CaberNet Radicals Workshop, Bertinoro, Italy, Oct. 2002.

[72] I. Foster, "The anatomy of the grid: enabling scalable virtual organizations," in Proc. 1st IEEE/ACM International Symposium on Cluster Computing and the Grid, 2001, pp. 6–7.

[73] R. Wolski, "Dynamically forecasting network performance using the network weather service," in Journal of Cluster Computing, 1998.

[74] I. Matsuba, H. Suyari, S. Weon, and D. Sato, "Practical chaos time series analysis with financial applications," in Proc. 5th International Conference on Signal Processing, Beijing, 2000, vol. 1, pp. 265–271.

[75] P. Magni and R. Bellazzi, "A stochastic model to assess the variability of blood glucose time series in diabetic patients self-monitoring," IEEE Trans. Biomed. Eng., vol. 53, no. 6, pp. 977–985, 2006.

[76] K. Didan and A. Huete, "Analysis of the global vegetation dynamic metrics using MODIS vegetation index and land cover products," in IEEE International Geoscience and Remote Sensing Symposium (IGARSS'04), 2004, vol. 3, pp. 2058–2061.

[77] P. Dinda, "The statistical properties of host load," Scientific Programming, no. 7:3-4, 1999.

[78] P. Dinda, "Host load prediction using linear models," Cluster Computing, vol. 3, no. 4, 2000.

[79] Y. Zhang, W. Sun, and Y. Inoguchi, "CPU load predictions on the computational grid," in Proc. 6th IEEE International Symposium on Cluster Computing and the Grid, May 2006, vol. 1, pp. 321–326.

[80] J. Liang, K. Nahrstedt, and Y. Zhou, "Adaptive multi-resource prediction in distributed resource sharing environment," in Proc. IEEE International Symposium on Cluster Computing and the Grid, 2004, pp. 293–300.

[81] S. Vazhkudai and J. Schopf, "Predicting sporadic grid data transfers," in Proc. International Symposium on High Performance Distributed Computing, pp. 188–196, 2002.

[82] S. Vazhkudai, J. Schopf, and I. Foster, "Using disk throughput data in predictions of end-to-end grid data transfers," in Proc. 3rd International Workshop on Grid Computing, Nov. 2002.


[83] S. Gunter and H. Bunke, "An evaluation of ensemble methods in handwritten word recognition based on feature selection," in Proc. 17th International Conference on Pattern Recognition, Aug. 2004, vol. 1, pp. 388–392.

[84] G. Jain, A. Ginwala, and Y. Aslandogan, "An approach to text classification using dimensionality reduction and combination of classifiers," in Proc. IEEE International Conference on Information Reuse and Integration, Nov. 2004, pp. 564–569.

[85] VMware white paper, "Comparing the MUI, VirtualCenter, and vmkusage."

[86] J. D. Cryer, Time Series Analysis, Duxbury Press, Boston, MA, 1986.

[87] J. O. Rawlings, S. G. Pantula, and D. A. Dickey, Applied Regression Analysis, Springer, 2001.

[88] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning, Springer, 2001.

[89] E. Bingham and H. Mannila, "Random projection in dimensionality reduction: applications to image and text data," in Knowledge Discovery and Data Mining, 2001, pp. 245–250.

[90] L. Sirovich and R. Everson, "Management and analysis of large scientific data sets," Int. Journal of Supercomputer Applications, vol. 6, no. 1, pp. 50–68, 1992.

[91] Y. Yang, J. Zhang, and B. Kisiel, "A scalability analysis of classifiers in text categorization," in ACM SIGIR'03, 2003, pp. 96–103.

[92] J. H. Friedman, F. Baskett, and L. Shustek, "An algorithm for finding nearest neighbors," IEEE Transactions on Computers, vol. C-24, no. 10, pp. 1000–1006, Oct. 1975.

[93] J. H. Friedman, J. L. Bentley, and R. Finkel, "An algorithm for finding best matches in logarithmic expected time," ACM Transactions on Mathematical Software, vol. 3, pp. 209–226, 1977.

[94] G. Banga, P. Druschel, and J. Mogul, "Resource containers: A new facility for resource management in server systems," in Proc. 3rd Symposium on Operating System Design and Implementation, New Orleans, Feb. 1999.

[95] L. Ramakrishnan, L. Grit, A. Iamnitchi, D. Irwin, A. Yumerefendi, and J. Chase, "Towards a doctrine of containment: Grid hosting with adaptive resource control," in Proc. Supercomputing, Tampa, FL, Nov. 2006.

[96] R. Dubes, "How many clusters are best? - an experiment," Pattern Recogn., vol. 20, no. 6, pp. 645–663, Nov. 1987.

[97] A. K. Jain, M. N. Murty, and P. J. Flynn, "Data clustering: a review," ACM Computing Surveys, vol. 31, no. 3, pp. 264–323, 1999.


[98] "WorldCup98," http://ita.ee.lbl.gov/html/contrib/WorldCup.html.

[99] "LogReplayer," http://www.cs.virginia.edu/rz5b/software/logreplayer-manual.htm.

[100] C. Isci, A. Buyuktosunoglu, and M. Martonosi, "Long-term workload phases: duration predictions and applications to DVFS," IEEE Micro, vol. 25, no. 5, pp. 39–51, 2005.

[101] C. Isci and M. Martonosi, "Phase characterization for power: evaluating control-flow-based and event-counter-based techniques," in Proc. 12th International Symposium on High-Performance Computer Architecture, pp. 121–132, 2006.

[102] T. Sherwood, E. Perelman, G. Hamerly, and B. Calder, "Automatically characterizing large scale program behavior," in Proc. 10th International Conference on Architectural Support for Programming Languages and Operating Systems, 2002, pp. 45–57.

[103] H. Patil, R. Cohn, M. Charney, R. Kapoor, A. Sun, and A. Karunanidhi, "Pinpointing representative portions of large Intel Itanium programs with dynamic instrumentation," in Proc. 37th Annual International Symposium on Microarchitecture, 2004.

[104] R. Balasubramonian, D. Albonesi, A. Buyuktosunoglu, and S. Dwarkadas, "Memory hierarchy reconfiguration for energy and performance in general purpose architectures," in Proc. 33rd Annual International Symposium on Microarchitecture, Dec. 2000, pp. 245–257.

[105] A. Dhodapkar and J. Smith, "Managing multi-configuration hardware via dynamic working set analysis," in Proc. 29th Annual International Symposium on Computer Architecture, Anchorage, AK, May 2002, pp. 233–244.

[106] A. Dhodapkar and J. Smith, "Comparing program phase detection techniques," in Proc. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003, pp. 217–227.

[107] B. Urgaonkar, P. Shenoy, A. Chandra, and P. Goyal, "Dynamic provisioning of multi-tier internet applications," in Proc. 2nd International Conference of Autonomic Computing, June 2005, pp. 217–228.

[108] J. Wildstrom, P. Stone, E. Witchel, R. J. Mooney, and M. Dahlin, "Towards self-configuring hardware for distributed computer systems," in Proc. 2nd International Conference of Autonomic Computing, June 2005, pp. 241–249.
[109] J. S. Chase, D. E. Irwin, L. E. Grit, J. D. Moore, and S. E. Sprenkle, "Dynamic virtual clusters in a grid site manager," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing, pp. 90–100, June 2003.


[110] D. Kusic and N. Kandasamy, "Risk-aware limited lookahead control for dynamic resource provisioning in enterprise computing systems," in Proc. 3rd International Conference of Autonomic Computing, 2006, pp. 74–83.


Jian Zhang was born in Chengdu, China. She received her B.S. degree in 1995 from the University of Electronic Science and Technology of China, majoring in computer communication. She received her M.S. degree in 2001 from the University of Florida, majoring in electrical and computer engineering. Since 2002, she has been with the Advanced Computing and Information Systems Laboratory (ACIS) at the University of Florida, pursuing her Ph.D. degree. Her research interests include distributed systems, autonomic computing, virtualization technologies, and information systems.