
File System Virtualization and Service for Grid Data Management

Permanent Link: http://ufdc.ufl.edu/UFE0022584/00001

Material Information

Title: File System Virtualization and Service for Grid Data Management
Physical Description: 1 online resource (188 p.)
Language: english
Creator: Zhao, Ming
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: autonomic, data, distributed, filesystem, grid, management, middleware, service, virtualization
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre: Electrical and Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Large-scale distributed computing systems, such as computational grids, aggregate computing and storage resources from multiple organizations to foster collaborations and facilitate problem solving through shared access to large volumes of data and high-performance machines. Data management in these systems is particularly challenging because of the heterogeneity, dynamism, size, and distribution of such grid-style environments. This dissertation addresses these challenges with a two-level data management system, in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of the data provisioning. The diversity of applications and resources requires a data provisioning solution that can be transparently deployed, whereas the dynamic, wide-area environments necessitate tailored optimizations for data access. To achieve these goals, this dissertation proposes grid-wide virtual file systems (GVFS), a novel approach that virtualizes existing kernel distributed file systems (NFS) with user-level proxies, and provides transparent cross-domain data access to applications. User-level enhancements designed for grid-style environments are provided upon the virtualization layer in GVFS, including: customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, where each session can apply and configure these enhancements independently.
The second level of the proposed data management system addresses the problems of managing data provisioning in a large, dynamic system: how to control the data access for many applications based on their needs, and how to optimize it automatically according to high-level objectives. It proposes service-based middleware to manage the lifecycles and configurations of dynamic GVFS sessions. These data management services are able to exploit application knowledge to flexibly customize data sessions, and support interoperability with other middleware based on the Web Services Resource Framework. In order to further reduce the complexity of managing data sessions and adapt them promptly to changing environments, an autonomic data management system is built by evolving these services into self-managing elements. Autonomic functions are integrated into the services to provide goal-driven automatic control of GVFS sessions in aspects including cache configuration, data replication, and session redirection. A prototype of the proposed system is evaluated with a series of experiments based on file system benchmarks and typical grid applications. The results demonstrate that GVFS can transparently enable on-demand grid-wide data access with application-tailored enhancements; the proposed enhancements can achieve strong cache consistency, security, and reliability, as well as substantially outperform traditional DFS approaches (NFS) in wide-area networks; the autonomic services support flexible and dynamic management of GVFS sessions, and can also automatically optimize their performance and reliability in the presence of changing resource availability.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Ming Zhao.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Figueiredo, Renato J.

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022584:00001





Full Text

PAGE 1

FILE SYSTEM VIRTUALIZATION AND SERVICE FOR GRID DATA MANAGEMENT

By

MING ZHAO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2008

PAGE 2

© 2008 Ming Zhao

PAGE 3

To my wife and my parents

PAGE 4

ACKNOWLEDGMENTS

First and foremost, I would like to express my deepest gratitude to my advisor, Prof. Renato Figueiredo, for his excellent guidance throughout my Ph.D. study. I am greatly indebted to him for providing me such an exciting research opportunity, constantly supporting me on things no matter big or small, and patiently giving me time and helping me grow. I would also like to gratefully and sincerely thank Prof. Jose Fortes for his exceptional leadership of the ACIS lab and his invaluable advice in every aspect of my academic pursuit. I have learned enormously from them about research, teaching, and advising, and they will always be examples for me to follow in my future career.

I am very grateful to the other members of my supervisory committee, Prof. Sanjay Ranka and Prof. Tao Li, for taking time out of their busy schedules to review my work. Their constructive criticism and comments are very helpful to improving my work and are highly appreciated. I also wish to thank Prof. Alan George and Prof. Oscar Boykin for their advice and help.

My heartfelt thanks and appreciation are extended to my current and former fellow ACIS lab members. The lab is where I have obtained solid support of my research, gained precious knowledge and experience, and grown from a student to a professional. I am especially thankful to my colleague and good friend, Prapaporn, for her careful proofing of the manuscript as well as our close collaboration in the DDDBMI project.

A special note of gratitude is due to Dr. Gang Rong at Tsinghua University, my former advisor during my master's degree study, for encouraging and assisting me to pursue my Ph.D. overseas.

Finally, and most importantly, I would like to thank my family, and I owe everything I have achieved to them. My parents' unwavering belief in me and unending caring of me are what have made me the person I am today. My brother, Hui, has been my best friend since my childhood and has always been there when I need him. My wife, Jing, has been both a loving companion and a supportive colleague, bringing endless inspiration, joy, and

PAGE 5

passion into my life. It is truly wonderful to have her sharing every moment with me in the past five years, and I look forward to walking hand in hand with her through the new journey ahead of us.

PAGE 6

TABLE OF CONTENTS

page

ACKNOWLEDGMENTS ................................................ 4
LIST OF TABLES ................................................ 10
LIST OF FIGURES ............................................... 11
ABSTRACT ...................................................... 13

CHAPTER

1 INTRODUCTION ................................................ 15
  1.1 Application-Transparent Grid-Wide Data Access .......... 17
  1.2 Application-Tailored Grid Data Provisioning ............ 18
  1.3 Service-Based Autonomic Data Management ................ 19

2 BACKGROUND AND RELATED WORK ................................ 21
  2.1 Typical Grid Data Management Approaches ................ 21
  2.2 Traditional Distributed File Systems ................... 23
  2.3 Application-Tailored Grid File Systems ................. 26
    2.3.1 Need for Application-Tailored Enhancements ......... 26
    2.3.2 Caching and Consistency ............................ 28
    2.3.3 Security ........................................... 34
      2.3.3.1 Security in distributed file systems ........... 35
      2.3.3.2 Security in grid systems ....................... 37
    2.3.4 Fault Tolerance .................................... 38
  2.4 Service-Oriented and Autonomic Data Management ......... 40
  2.5 Support for Distributed Virtual Machines ............... 42

3 DISTRIBUTED FILE SYSTEM VIRTUALIZATION ..................... 44
  3.1 User-Level Proxy-Based Virtualization .................. 44
    3.1.1 Architecture ....................................... 44
    3.1.2 NFS-Based GVFS ..................................... 47
      3.1.2.1 User-level NFS proxy ........................... 47
      3.1.2.2 Multi-proxy GVFS ............................... 50
  3.2 Evaluation ............................................. 52
    3.2.1 Setup .............................................. 52
    3.2.2 Stat ............................................... 52
    3.2.3 IOzone ............................................. 54
    3.2.4 PostMark ........................................... 56

PAGE 7

4 APPLICATION-TAILORED DISTRIBUTED FILE SYSTEMS .............. 58
  4.1 Motivating Examples .................................... 58
  4.2 Performance ............................................ 61
    4.2.1 Client-Side Disk Caching ........................... 61
      4.2.1.1 Design ......................................... 61
      4.2.1.2 Deployment ..................................... 63
      4.2.1.3 Application-tailored configurations ............ 64
      4.2.1.4 Evaluation ..................................... 65
    4.2.2 Multithreaded Data Transfer ........................ 68
      4.2.2.1 Design and implementation ...................... 68
      4.2.2.2 Evaluation ..................................... 70
  4.3 Consistency ............................................ 71
    4.3.1 Architecture ....................................... 72
    4.3.2 Invalidation Polling Based Cache Consistency ....... 75
      4.3.2.1 Protocol ....................................... 75
      4.3.2.2 Bootstrapping .................................. 77
      4.3.2.3 Failure handling ............................... 77
    4.3.3 Delegation Callback Based Cache Consistency ........ 78
      4.3.3.1 Delegation ..................................... 78
      4.3.3.2 Callback ....................................... 80
      4.3.3.3 State maintenance .............................. 81
      4.3.3.4 Failure handling ............................... 82
    4.3.4 Evaluation ......................................... 83
      4.3.4.1 Setup .......................................... 83
      4.3.4.2 Make ........................................... 84
      4.3.4.3 PostMark ....................................... 85
      4.3.4.4 Lock ........................................... 87
      4.3.4.5 Software repository ............................ 89
      4.3.4.6 Scientific data processing ..................... 91
  4.4 Security ............................................... 92
    4.4.1 Secure Tunneling Based Private Grid File System .... 93
      4.4.1.1 Secure data tunneling .......................... 93
      4.4.1.2 Security model ................................. 95
      4.4.1.3 Evaluation ..................................... 97
    4.4.2 The SSL-Enabled Secure Grid File System ............ 100
      4.4.2.1 Design ......................................... 100
      4.4.2.2 Implementation ................................. 101
      4.4.2.3 Deployment ..................................... 104
      4.4.2.4 Evaluation ..................................... 105
  4.5 Fault Tolerance ........................................ 115
    4.5.1 Virtualization of Data Sets ........................ 116
    4.5.2 Replication Schemes ................................ 117
    4.5.3 Application-Transparent Failover ................... 119
    4.5.4 Evaluation ......................................... 120

PAGE 8

5 APPLICATION STUDY: SUPPORTING GRID VIRTUAL MACHINES ........ 122
  5.1 Architecture ........................................... 122
  5.2 Virtual Machine Aware Data Transfer .................... 124
  5.3 Integration with VM-Based Grid Computing ............... 126
  5.4 Evaluation ............................................. 128
    5.4.1 Setup .............................................. 128
    5.4.2 Performance of Application Executions within VMs ... 129
    5.4.3 Performance of VM Cloning .......................... 133

6 SERVICE-ORIENTED AUTONOMIC DATA MANAGEMENT ................. 137
  6.1 Service-Based Data Management .......................... 137
    6.1.1 Architecture ....................................... 137
    6.1.2 The WSRF-Based Data Management Services ............ 140
      6.1.2.1 File system service ............................ 140
      6.1.2.2 Data scheduler service ......................... 140
      6.1.2.3 Data replication service ....................... 141
    6.1.3 Application-Tailored Data Sessions ................. 142
      6.1.3.1 Grid data access and file transfer ............. 142
      6.1.3.2 Cache consistency .............................. 144
      6.1.3.3 Fault tolerance ................................ 145
    6.1.4 Security Architecture .............................. 148
    6.1.5 Usage Examples ..................................... 150
      6.1.5.1 Virtual machine based grid computing ........... 150
      6.1.5.2 Workflow execution ............................. 152
  6.2 Autonomic Data Management .............................. 154
    6.2.1 Autonomic Data Scheduler Service ................... 155
    6.2.2 Autonomic Data Replication Service ................. 158
      6.2.2.1 Data replication degree and placement .......... 158
      6.2.2.2 Data replica regeneration ...................... 160
    6.2.3 Autonomic File System Service ...................... 161
      6.2.3.1 Client-side file system service ................ 161
      6.2.3.2 Server-side file system service ................ 163
    6.2.4 Evaluation ......................................... 164
      6.2.4.1 Setup .......................................... 164
      6.2.4.2 Autonomic session redirection .................. 164
      6.2.4.3 Autonomic cache configuration .................. 167
      6.2.4.4 Autonomic data replication ..................... 168

7 CONCLUSION ................................................. 170
  7.1 Summary ................................................ 170
  7.2 Future Work ............................................ 173
    7.2.1 Performance ........................................ 173
    7.2.2 Intelligence ....................................... 175

PAGE 9

    7.2.3 Integration ........................................ 176

REFERENCES ................................................... 178

BIOGRAPHICAL SKETCH .......................................... 188

PAGE 10

LIST OF TABLES

Table                                                         page

4-1 Overhead of private GVFS for the LaTeX and SPECseis benchmarks ... 99
5-1 Performance of parallel VM cloning ........................... 135

PAGE 11

LIST OF FIGURES

Figure                                                        page

2-1  Typical NFS setup ............................................ 24
2-2  Architecture of an autonomic element ........................ 41
3-1  Architecture of GVFS-based virtual DFSs ..................... 45
3-2  User-level proxy-based DFS virtualization ................... 46
3-3  Single-proxy based GVFS ..................................... 49
3-4  Multi-proxy based GVFS ...................................... 51
3-5  Performance of the Stat benchmark on GVFS ................... 53
3-6  Throughputs of the IOzone benchmark on GVFS ................. 54
3-7  The CPU usage of GVFS proxy with the IOzone benchmark ....... 55
3-8  Performance of the PostMark benchmark on GVFS ............... 57
4-1  Setup of GVFS proxy-managed disk caching .................... 63
4-2  Performance of IOzone without caching on WAN ................ 67
4-3  Performance of IOzone with caching on WAN ................... 68
4-4  Throughputs of IOzone with different number of GVFS threads . 70
4-5  Application-tailored cache consistency protocols on GVFS sessions ... 73
4-6  Sample configuration file for customizing a GVFS session .... 74
4-7  Invalidation polling based GVFS cache consistency ........... 75
4-8  Delegation callback based GVFS cache consistency ............ 79
4-9  Performance of the Make benchmark ........................... 84
4-10 Performance of PostMark with different network latency ...... 86
4-11 Performance of the different phases of PostMark ............. 87
4-12 Performance of the Lock benchmark ........................... 88
4-13 Performance of the parallel NanoMOS benchmark ............... 90
4-14 Performance of the CH1D benchmark ........................... 91
4-15 Secure tunneling based private GVFS ......................... 93

PAGE 12

4-16 Performance of IOzone on secure GVFS ........................ 108
4-17 Proxy client's CPU usage in secure GVFS ..................... 110
4-18 Proxy server's CPU usage in secure GVFS ..................... 111
4-19 Performance of PostMark in LAN with secure GVFS ............. 112
4-20 Performance of PostMark in WAN with secure GVFS ............. 113
4-21 Performance of the modified Andrew benchmark on secure GVFS . 114
4-22 Performance of the SPECseis benchmark on secure GVFS ........ 115
5-1  The GVFS support for VM instantiation in grid systems ....... 123
5-2  The GVFS extensions for VM state transfers .................. 124
5-3  Performance of the SPECseis benchmark in VM ................. 131
5-4  Performance of the LaTeX benchmark in VM .................... 132
5-5  Performance of the kernel compilation benchmark in VM ....... 133
5-6  Performance of VM cloning ................................... 134
6-1  Data management services for GVFS sessions .................. 138
6-2  Application-tailored enhancements for a GVFS session ........ 143
6-3  Security architecture of GVFS-based data management system .. 149
6-4  The VM-based grid computing supported by the data management services ... 151
6-5  A Monte-Carlo workflow supported by the data management services ... 153
6-6  Autonomic data management system ............................ 155
6-7  Network RTT impacted by third-party parallel TCP transfers .. 165
6-8  Autonomic session redirection for executions of IOzone ...... 166
6-9  Autonomic cache configuration for executions of IOzone ...... 167
6-10 Autonomic data replication for executions of IOzone ......... 169

PAGE 13

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

FILE SYSTEM VIRTUALIZATION AND SERVICE FOR GRID DATA MANAGEMENT

By

Ming Zhao

August 2008

Chair: Renato J. Figueiredo
Major: Electrical and Computer Engineering

Large-scale distributed computing systems, such as computational grids, aggregate computing and storage resources from multiple organizations to foster collaborations and facilitate problem solving through shared access to large volumes of data and high-performance machines. Data management in these systems is particularly challenging because of the heterogeneity, dynamism, size, and distribution of such grid-style environments. This dissertation addresses these challenges with a two-level data management system, in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of the data provisioning.

The diversity of applications and resources requires a data provisioning solution that can be transparently deployed, whereas the dynamic, wide-area environments necessitate tailored optimizations for data access. To achieve these goals, this dissertation proposes grid-wide virtual file systems (GVFS), a novel approach that virtualizes existing kernel distributed file systems (NFS) with user-level proxies, and provides transparent cross-domain data access to applications. User-level enhancements designed for grid-style environments are provided upon the virtualization layer in GVFS, including: customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent

PAGE 14

failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, where each session can apply and configure these enhancements independently.

The second level of the proposed data management system addresses the problems of managing data provisioning in a large, dynamic system: how to control the data access for many applications based on their needs, and how to optimize it automatically according to high-level objectives. It proposes service-based middleware to manage the lifecycles and configurations of dynamic GVFS sessions. These data management services are able to exploit application knowledge to flexibly customize data sessions, and support interoperability with other middleware based on the Web Services Resource Framework. In order to further reduce the complexity of managing data sessions and adapt them promptly to changing environments, an autonomic data management system is built by evolving these services into self-managing elements. Autonomic functions are integrated into the services to provide goal-driven automatic control of GVFS sessions in aspects including cache configuration, data replication, and session redirection.

A prototype of the proposed system is evaluated with a series of experiments based on file system benchmarks and typical grid applications. The results demonstrate that GVFS can transparently enable on-demand grid-wide data access with application-tailored enhancements; the proposed enhancements can achieve strong cache consistency, security, and reliability, as well as substantially outperform traditional DFS approaches (NFS) in wide-area networks; the autonomic services support flexible and dynamic management of GVFS sessions, and can also automatically optimize their performance and reliability in the presence of changing resource availability.

PAGE 15

CHAPTER 1
INTRODUCTION

Computations are becoming increasingly larger scale, in terms of both size and geographical and administrative distribution. Examples include scientific grids [1], which harness resources among several institutions for coordinated problem solving, and enterprise information systems that aggregate efforts from multiple sites for collaborative development. Common in these systems is that applications and data are distributed on resources across administrative boundaries and wide-area networks. Such environments can be referred to as "grid-style" environments, which have the following distinctive characteristics:

Heterogeneity: There exist a wide variety of applications and resources in a grid-style environment. The resources typically have different hardware configurations (e.g., CPU speed and architecture, memory size, disk bandwidth and capacity) and software setups (e.g., operating systems and libraries); the applications also have diverse characteristics (e.g., data access patterns) and needs (e.g., desired data access performance, security, and reliability).

Dynamism: Systems deployed in a grid-style environment are highly dynamic. Failures on machines and networks can happen at any time, and non-dedicated resources may dynamically join and leave the system. On the other hand, applications are started and terminated on demand, and their workloads also vary over time.

Scale: Large amounts of resources can be aggregated in a grid-style environment. They are distributed across different institutions and connected on wide-area networks, providing the computing power and storage capacity to support executions of many applications.

This dissertation focuses on two specific aspects of data management in distributed systems: data provisioning, i.e., providing applications running on the computing resources with remote access to their data stored on the storage resources, and the management of the data provisioning, i.e., the establishment, configuration, and termination of the remote data access. Computing in a grid-style environment poses unique challenges to these tasks because of the above-mentioned heterogeneous, dynamic, and large-scale nature of applications and resources.

PAGE 16

First, the diversity of applications and resources motivates a data provisioning solution that can be transparently deployed, without modifying the existing operating systems (O/Ss) or changing the application source code or binaries. Second, the wide-area, cross-domain environments necessitate application-tailored optimizations for data access to address the inefficiency (long network delay, limited network bandwidth), insecurity (insecure resources, limited mutual trust between different domains), and unsafety (unreliable machines and networks) that are typical in such environments. Last but not least, the management of data provisioning in a large, dynamic system also desires flexible control and automatic optimization of the remote data access, in order to deal with the complexity of providing data to many applications, to agilely adapt to the changing environments, and to deliver application-desired performance, security, and reliability.

To address these challenges, this dissertation presents a two-level data management system in which file system virtualization provides application-tailored grid-wide data access, and service-based middleware enables autonomic management of the data provisioning. In particular, this system has made the following contributions:

- It provides on-demand, cross-domain data access transparently for unmodified applications and O/Ss based on user-level virtualization of widely available O/S-level distributed file systems (DFSs).

- It supports application-tailored enhancements designed for grid-style environments on several important aspects of remote data access, including performance, consistency, security, and reliability.

- It employs middleware services to achieve flexible and interoperable management of grid-scale data provisioning, which is capable of controlling the lifecycles and configurations of dynamic data sessions based on application needs.

- It develops autonomic functions to automatically optimize the data management according to high-level objectives, in order to reduce the complexity of managing data sessions and adapt them promptly to changing environments.

- Finally, the proposed system has been demonstrated, with thorough experimental evaluation, to be effective and to significantly outperform conventional

PAGE 17

DFS-based approaches in grid-style environments; it has also been successfully deployed in a production grid system [2][3] for several years, supporting scientific tools and users from many disciplines.

The data management system proposed in this dissertation is architected to address three important questions, which are discussed in the following subsections respectively.

1.1 Application-Transparent Grid-Wide Data Access

The first question is: how to provide application-transparent grid-wide data access?

Grids differ from traditional distributed computing environments because of their distinct characteristics, e.g., wide-area networking, heterogeneous end systems, and disjoint administrative domains. These differences bring new challenges to data management systems, and the technologies that are successful in local-area networks (LANs), e.g., LAN file systems, cannot be directly applied in a grid environment. Instead, grid data management needs to specifically address these unique issues.

Existing solutions allow applications to access grid data through the use of specialized APIs or libraries. However, the required modifications to application sources or binaries often place a burden upon the shoulders of end users and developers, and present a hurdle to applications that cannot be easily modified. Therefore, application transparency is desirable to facilitate the deployment of a wide range of applications on grids, where grid-enabling should be the responsibility of the grid middleware but not the application users or developers.

This dissertation presents a user-level DFS virtualization, namely the Grid Virtual File System (GVFS), for application-transparent grid data access. Because the well-known DFS interface is preserved by GVFS and presented to applications, no modifications are required to their source code, libraries, or binaries. In addition, the proposed approach is based on user-level virtualization techniques, which requires no changes to existing O/Ss and can be conveniently deployed on grid resources. Furthermore, user-level enhancements designed for grid-style environments are built upon the virtualization layer to enable data provisioning with application-desired characteristics.
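The interposition idea behind this user-level virtualization can be pictured with a minimal sketch. The Python below is purely illustrative and is not the actual GVFS code: it relays opaque RPC datagrams between a kernel DFS client (pointed at a local port) and a remote file server, with all enhancements hooking in at the forwarding step. The addresses, function names, and the UDP-only transport are assumptions for illustration.

```python
import socket

def run_proxy(listen_addr, server_addr, max_calls=None):
    """Relay RPC datagrams between a kernel DFS client and a remote server.

    Illustrative only: a real user-level proxy would parse each NFS RPC at
    the interposition point to add caching, credential mapping, etc.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(listen_addr)
    handled = 0
    while max_calls is None or handled < max_calls:
        call, client = sock.recvfrom(65535)
        # Interposition point: inspect or rewrite the call here
        # before forwarding it to the actual file server.
        upstream = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        upstream.sendto(call, server_addr)
        reply, _ = upstream.recvfrom(65535)
        upstream.close()
        sock.sendto(reply, client)
        handled += 1
    sock.close()
```

Because the kernel client only sees an ordinary NFS endpoint on the local port, neither the O/S nor the application needs modification; that is the property the virtualization layer exploits.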

PAGE 18

In short, the proposed GVFS approach answers the first question by providing transparent grid-wide data access for unmodified applications and O/Ss through the user-level DFS virtualization.

1.2 Application-Tailored Grid Data Provisioning

The second question is: how to provide data with application-tailored optimizations?

Typical O/Ss are designed to support general-purpose applications, but it is often the case that "one size does not fit all". Applications have diverse characteristics and requirements, in terms of, for example, data access patterns, acceptable caching and consistency policies, security concerns, and fault tolerance requirements. To provide the desired performance, security, and reliability to a grid application, data provisioning needs to be optimized according to the application's behaviors and needs.

Because an optimization tailored for one application (e.g., aggressive prefetching of file contents) may result in performance degradation for several others (e.g., sparse files, databases), application-tailored features are typically not implemented in general-purpose O/S kernels. In addition, kernel-level modifications are difficult to port and deploy, notably in shared environments. Toolkit-based solutions typically give users powerful APIs to program remote data access with desired behaviors, but few programmers are skilled to make effective use of such APIs.

To solve this problem, user-level DFS customizations are proposed to support application-tailored GVFS data sessions. In particular, enhancements designed for grid-style environments are provided upon the virtualization layer in GVFS, which include customizable disk caching and multithreading for high-performance data access, efficient consistency protocols for application-desired data coherence, strong and grid-compatible security for secure grid-wide data access, and reliability protocols supporting application-transparent failure detection and recovery. Based on GVFS, data sessions can be created on demand on a per-application basis, where each session can apply and configure these enhancements independently to address its application's needs.
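To make the per-session tailoring concrete, a session's configuration might enable or tune each enhancement independently. The fragment below is a hypothetical sketch: the keys, values, and INI-style layout are invented for illustration and do not reproduce GVFS's actual configuration format.

```ini
; Hypothetical per-session GVFS configuration (illustrative only)
[session]
server   = data.example.org
export   = /export/app1

[cache]
enabled  = true
location = /var/tmp/gvfs-cache/app1
size_mb  = 2048                      ; disk cache capacity

[consistency]
protocol = invalidation-polling      ; or: delegation-callback

[security]
channel  = ssl
cross_domain_auth = true

[reliability]
replicas = server2.example.org, server3.example.org
failover = transparent
```

Because each session carries its own configuration of this kind, two applications sharing the same resources can, for example, run with different cache sizes and consistency protocols without interfering with each other.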


Therefore, the answer to the second question is to use the application-tailored enhancements enabled by GVFS to provide grid-wide data sessions with application-desired performance, consistency, security, and reliability.

1.3 Service-Based Autonomic Data Management

The third question is, how to manage data provisioning in a grid-scale system with dynamically changing environments?

Based on the GVFS approach, data sessions can be started on demand and independently customized for applications. However, in a large-scale system, the management of many dynamic data sessions is another challenging task due to its complexity. Data sessions need to be dynamically established and destroyed based on the lifecycles of applications and the locations of their instantiations and data storage. Customization of data sessions also implies the consideration of various relevant factors and tuning of many parameters, in accordance with the desired behaviors and the surrounding environments. Dynamically changing application workload and resource availability further require continuous monitoring of data sessions and timely adaptation of their configurations.

These requirements are often beyond the capability of end users and even system administrators. Yet the goals of users or administrators are rather simple and explicit. For example, from an application user's point of view, it is desired that the job execution is fast, secure, and reliable; from a resource provider's point of view, it is expected that the resource use is healthy and profitable. Therefore, this dissertation presents a novel service-based autonomic data management approach to automatically manage and optimize the data provisioning according to such high-level objectives.

This dissertation proposes a set of data management services to manage the per-application GVFS sessions, enforce the isolation among the independent sessions, and apply the desired customization for each session. They support flexible control over the lifecycles and configurations of data sessions, and can explore the knowledge


of applications (e.g., data access patterns, data sharing scenarios, and service quality requirements) to customize their data sessions on the use of performance, consistency, security, and reliability enhancements. These services also provide interoperable interfaces which allow for direct interactions with other grid middleware services and automated executions of data provisioning tasks.

To further reduce human intervention in managing data sessions and enable them to promptly adapt to the changing environments, autonomic functions are built into the data management services to make them capable of automatically monitoring, analyzing, and optimizing the distributed entities of grid-wide data sessions, and cooperatively working together to achieve the desired data provisioning and resource usage goals. Such autonomic management is applied to several important aspects of data sessions including cache configuration, data replication, and session redirection.

In summary, the GVFS-based data management system addresses the last question by employing autonomic services to provide automatic management and optimization of data sessions according to the application needs and changing environments.
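The monitor/analyze/optimize cycle described above can be sketched as a simple feedback loop over one tunable aspect, cache configuration. This is only an illustrative model; the hit-rate metric, the 50% threshold, and the doubling rule are arbitrary assumptions, not the dissertation's actual policies:

```python
def autonomic_step(session):
    """One monitor-analyze-optimize iteration for a data session.

    `session` is a dict of observed counters plus a tunable cache size;
    the threshold and doubling rule below are example policies only.
    """
    # Monitor: read the session's observed behavior.
    total = session["hits"] + session["misses"]
    hit_rate = session["hits"] / total if total else 1.0
    # Analyze: compare against the high-level goal (fast data access).
    if hit_rate < 0.5 and session["cache_mb"] < session["cache_limit_mb"]:
        # Optimize: adapt the cache configuration toward the goal,
        # respecting the resource provider's limit.
        session["cache_mb"] = min(session["cache_mb"] * 2,
                                  session["cache_limit_mb"])
    return session

s = {"hits": 10, "misses": 40, "cache_mb": 64, "cache_limit_mb": 256}
autonomic_step(s)  # low hit rate: the cache is grown toward the limit
```

Repeating the step converges to the configured limit, which mirrors the idea that the autonomic services pursue high-level goals within resource-usage constraints rather than acting on one-shot settings.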


CHAPTER 2
BACKGROUND AND RELATED WORK

2.1 Typical Grid Data Management Approaches

Currently there are three main approaches to grid data provisioning, which are summarized as follows:

The first approach leverages middleware to explicitly transfer files prior to and after application execution. This approach is often called "file staging" and is adopted by several major cluster computing and grid computing systems, such as PBS [4] and Globus [5]. Typically, the necessary inputs are staged in before a job starts and the produced outputs are staged out after the job completes, where the files are transferred entirely using tools such as RCP (Remote Copy), SCP (Secure Copy), and GridFTP [6].

The second approach is based on application programming interfaces (APIs) which allow an application to explicitly control transfers. For example, Globus GridFTP [6] and GASS [7] provide APIs for applications to download and upload files entirely for access, whereas the RFT (Reliable File Transfer) [8] service exposes a Web service based interface for scheduling GridFTP-based file transfers.

The third approach employs mechanisms to intercept data-related events and handle them with remote data access implicitly to the applications. For example, standard C library calls (e.g., fread, fwrite) and Linux system calls (e.g., read, write) for local file access can be intercepted and mapped to operations on a remote file [9][10][11][12]; distributed file system (DFS) calls (e.g., NFS read and write remote procedure calls on a regular file) can be mapped to the access on a grid object [13][14].

Comparing these three different approaches, the first one is traditionally taken for applications with well-defined data sets and flows, such as uploading of standard input and downloading of standard output. It is difficult to support applications that have obscure, complex data access patterns. The second approach is adopted for applications where the development cost of incorporating specialized APIs is justifiable for certain


specific purposes, such as performance and security. It is not applicable for applications that do not have source code available, such as packaged commercial software. The efforts required for the modifications also make it difficult to port a wide variety of applications. In addition, both of the first two approaches need to transfer files in their entirety in order to read or write them. Thus, they are not efficient for files that are only accessed sparsely (e.g., database files, virtual machine disk state) and they cannot support fine-grained data sharing among several distributed users or applications.

In contrast, the third approach achieves great application transparency and can be used for applications that do not have well-defined data sets or access patterns and for applications that cannot be easily modified. It is also possible for this approach to transfer only the needed data blocks of files on demand, so that sparse data access on large files can be efficient and multiple users or applications can flexibly work on the same files concurrently.

The Grid Virtual File System (GVFS) described in this dissertation is based on the third approach. Experience with network-computing environments has shown that there are many applications in need of such solutions [2][15]. The Condor [9] and Kangaroo [16] systems provide remote data access to an application through library call interception by means of either static or dynamic relinking. Static linking requires the application to be linked to a specialized library that replaces the existing one (e.g., the standard C library), so that the data-related library calls can be intercepted and mapped to remote I/O operations [9]. Dynamic linking achieves the same goal by using linker control to direct the specialized dynamic library to be used in place of the existing one when the application is executed. However, static linking requires rebuilding the application and does not work if its source code is not available; dynamic linking only works for applications that are dynamically linked and does not support statically-linked applications. In addition, Kangaroo does not provide full file system semantics and thus


cannot support many applications that require these missing operations (e.g., delete and link).

Previous efforts on the UFO system [11] and, recently, the Parrot file system [12] leverage system call tracing to intercept an application's data-related system calls and map them to remote data access for the application. But they require low-level process tracing capabilities that are highly O/S dependent and not widely available. It is also very difficult to implement robust system call interposition, and this approach is unable to support non-POSIX compliant operations (e.g., setuid).

Compared to the above approaches, the DFS-based techniques utilized by GVFS are key to supporting a wide range of applications, especially the ones that must be deployed without modifications to source code, libraries, or binaries. Examples include commercial, interactive scientific and engineering tools and virtual machine monitors that operate on large, sparse data sets [17][18][19][20].

2.2 Traditional Distributed File Systems

Distributed file systems (DFSs), such as NFS [21][22][23] and CIFS [24], have been successfully deployed on local-area systems (e.g., computer clusters) for decades, enabling applications to transparently access large amounts of remotely stored data. Their architecture is typically designed in a client-server style. A server stores the data in its local disks and runs the DFS server to provide the remote file service. The DFS server hides the actual implementation of its local data access (e.g., a local file system) and the actual location of the data, and presents a standard interface defined by the DFS protocol to service the remote file access from clients. A client is where the user or application that needs access to the remotely stored data resides, and it runs the DFS client to handle the remote data access for applications. The DFS client, together with other components of the O/S, offers a generic file system interface, which is the same for both local and remote data access, to applications, and thus hides the complexity of bringing in data from a remote server.
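The client/server split just described can be pictured with a toy model: the server function hides where and how the bytes are stored, while the client exposes one generic read interface and forwards the call whenever the path falls under a mount point. The paths, mount table, and function names here are illustrative assumptions; a real DFS client would issue RPCs at the marked point rather than a function call:

```python
# Toy model of the DFS client/server split.
SERVER_DISK = {"/export/data.txt": b"remotely stored bytes"}

def dfs_server(path, offset, size):
    """Server side: service a remote read from the server's local storage."""
    return SERVER_DISK[path][offset:offset + size]

LOCAL_DISK = {"/home/log.txt": b"locally stored bytes"}
MOUNTS = {"/mnt/server": "/export"}   # client-side mount table

def generic_read(path, offset, size):
    """Client side: the same interface serves local and remote data."""
    for mount_point, export in MOUNTS.items():
        if path.startswith(mount_point):
            remote_path = export + path[len(mount_point):]
            # In a real DFS this would be an RPC across the network.
            return dfs_server(remote_path, offset, size)
    return LOCAL_DISK[path][offset:offset + size]
```

Applications calling `generic_read` cannot tell (except by latency) whether the bytes came from the local disk or the server, which is the transparency property the section describes.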


Figure 2-1. An NFS setup is typically created by mounting a file system from the server to the client, and the NFS-mounted file system is presented to applications in the same way as local file systems. An application's data access triggers system calls which are handled by the kernel, and they are passed on to the NFS client if the requested data are actually mounted from the remote server. The NFS client handles the data access by sending remote procedure calls (RPCs), according to the NFS protocol, to the NFS server across the network via either UDP or TCP. The NFS server processes the incoming client RPC requests and invokes I/Os on the server local disks to satisfy the access. The results are sent back from the NFS server to the NFS client and then returned to the application.

Network File System (NFS) is the de facto DFS since it is the most widely-deployed one. It has been implemented for a large number of different types of O/Ss, including UNIX, Linux, and Windows. The widely-used versions of NFS are NFSv2 and NFSv3, while the latest version, NFSv4, is also becoming available in the recent O/S distributions. The NFS clients and servers are typically implemented in O/S kernels. In NFS, access to remotely stored files is serviced in a block-by-block manner, that is, only the data blocks that are needed by a user or application are transferred across the network.

As illustrated in Figure 2-1, an NFS setup is typically created by mounting a file system from the server to the client, and the NFS-mounted file system is presented to applications in the same way as local file systems. An application's data access triggers system calls which are handled by the kernel, and they are passed on to the NFS client if the requested data are actually mounted from the remote server. The NFS client handles the data access by sending remote procedure calls (RPCs), according to the NFS protocol,


to the NFS server across the network via either UDP or TCP. The NFS server processes the incoming client RPC requests and invokes the necessary I/Os on the server's local disks to satisfy the access. The results are sent back from the NFS server to the NFS client and then returned to the application. This whole process is transparent to the application in that it is completely unaware of where the data are stored and how they are accessed, except for a probably longer delay of the access.

Many DFSs follow largely the same data access model as NFS. The DFS that is widely available on Windows-family O/Ss is the Common Internet File System (CIFS). It is based on the Server Message Block (SMB) protocol, in which remote data access is carried out via SMB requests and responses between the CIFS client and server. Another noteworthy family of DFSs is AFS (Andrew File System) [25] and its descendants OpenAFS [26] and Coda [27]. Coda and earlier versions of AFS use a different data access model in which files are transferred entirely when they are accessed by applications. AFS and Coda are also designed with wide-area environments in mind and have special enhancements for such usage. The next section will discuss several important aspects of wide-area file systems in detail, including caching and consistency, security, and reliability.

Similar to the GVFS approach proposed in this dissertation, several related systems have also leveraged user-level techniques based on loop-back server/client proxies to extend O/S-level DFS functionality, in essence virtualizing DFSs by means of intercepting RPC calls of protocols such as NFS [21], e.g., the automounter [28], CFS [29], and SFS [30]. In particular, LegionFS [13] interposes a user-level modified NFS server between a kernel NFS client and a Legion server to provide access to grid objects. NeST [14] is a storage appliance that services requests for data transfers supporting a variety of protocols, including NFS and GridFTP. However, only a restricted subset of NFS operations and anonymous user access are available. Furthermore, the system does not integrate with unmodified kernel NFS clients, a key requirement for application transparency. The approach described in this dissertation differentiates from these efforts


in that it supports application-tailored DFSs which are important for data provisioning in grid-style environments.

2.3 Application-Tailored Grid File Systems

2.3.1 Need for Application-Tailored Enhancements

Transparency is the main motivation for using DFS-based techniques for grid-wide data access. However, currently there are no mechanisms that allow a conventional DFS implementation to be customized to support application- or user-tailored enhancements. To illustrate with examples, consider the case of a file server exporting user home directories to clients. Suppose user Alice is a programmer that uses a single client to perform the bulk of her software development (editing, compiling, debugging). User Bob is a researcher that uses one or more clients to develop signal-processing algorithms that later are to be run across many clients concurrently, using as inputs a large number of benchmark media files.

Existing DFSs are unable to recognize per-session and per-application differences that could drive performance and functionality improvements to these users. Consider the case of NFS. Currently widely deployed versions of the protocol (v2, v3) do not store client state information in the server, rely on client-initiated revalidation requests to check for consistency, and write through cache blocks on file closes.

In Alice's example, an NFS client would not be able to exploit the fact that she uses a single client to aggressively cache read/write data in local disk, and thus could not avoid the unnecessary network calls for consistency checks and write requests. Neither would NFS clients be able to exploit the fact that Bob's input files do not often change, and would poll the server to revalidate each individual file upon opening. A currently available customization, increasing cached attribute expiration times, would not be advisable because it would apply to the entire remotely-mounted file system, potentially forcing large expiration times on files of other applications/users that share the same file system.
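The mount-wide attribute expiration just discussed can be contrasted with a per-session setting in a short sketch. The class, its fields, and the timing values are illustrative assumptions rather than NFS client internals; a fake clock is injected so the behavior is deterministic:

```python
import time  # real default clock; a fake clock is injected below

class AttributeCache:
    """Timestamp-based attribute cache with a configurable expiration.

    In kernel NFS the expiration time applies mount-wide; here each
    instance (say, one per data session) can pick its own interval.
    """
    def __init__(self, expiration_secs, clock=time.monotonic):
        self.expiration = expiration_secs
        self.clock = clock
        self.cached_at = {}   # path -> time its attributes were cached

    def record(self, path):
        self.cached_at[path] = self.clock()

    def needs_revalidation(self, path):
        if path not in self.cached_at:
            return True       # nothing cached: must ask the server
        return self.clock() - self.cached_at[path] >= self.expiration

now = [0.0]
clock = lambda: now[0]
single_client = AttributeCache(expiration_secs=3600, clock=clock)  # Alice-like
shared_files = AttributeCache(expiration_secs=3, clock=clock)      # default-like
for c in (single_client, shared_files):
    c.record("/home/file")
now[0] = 30.0   # thirty seconds later
```

With independent instances, a long expiration can be granted to the single-client session without forcing it on files shared by other users, which is exactly what a mount-wide setting cannot do.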


These examples highlight cases where DFS behavior can be tuned by exploiting application knowledge such as:

Number of clients. For instance, aggressive caching of attributes and data can be performed without consistency checks if it is known that only a single client is associated with a particular computing session.

Sharing role of clients. For instance, consistency models well-suited for scenarios with one or a few writer clients and many reader clients can be employed if this property is known by middleware to hold true for a particular computing session.

Other examples of application knowledge that is important to DFSs include the application's need on the use of full-file or partial-file access, the strength of security enforcement, and the level of fault tolerance. The inability to perform optimizations based on such information presents a hurdle to the deployment of pervasive LAN file systems (e.g., NFS) across grid-style environments. As illustrated in the above examples of Alice and Bob, if DFSs are capable of leveraging application knowledge, the number of client-server interactions can be reduced, thereby reducing server loads and average request latencies. However, typical DFS implementations are not designed to exploit such knowledge, for two important reasons.

First, traditionally DFSs are set up by system administrators, who, for management efficiency reasons, favor static, long-lived, homogeneous configurations at the granularity of a collection of users rather than dynamic, short-lived, customized setups at the granularity of an application session. In addition, current systems have no mechanisms allowing users to convey to system administrators information about the DFS features that they desire for their applications.

Second, integrating application-tailored features with DFS implementations in commonly available kernels is very difficult in practice. For designers of a stable kernel tree, selecting which enhancements should be added based on application needs is difficult: an optimization tailored for one application (e.g., aggressive pre-fetching of


file contents) may result in performance degradation for several others (e.g., sparse files, databases). Furthermore, it is difficult for the kernel to gain application knowledge that is needed for driving the usage of such features: it may require additional system calls or non-standard APIs that must then be present in future releases for legacy support, even if the features are rarely used. In addition, kernel-level modifications (even if encapsulated into modules) are difficult to port and deploy, notably in shared environments. For management and security reasons, administrators are often strict about controlling their kernel configurations and are reluctant to allow modifications that deviate from stock O/S distributions.

The proposed GVFS-based approach addresses the need for application-tailored data provisioning by enabling user-level per-application customization on remote data access. The lack of support for application-tailored optimizations has also been recognized as a limitation by BAD-FS [10], which exposes the control decisions on caching, consistency, and replication to grid middleware. However, it relies on system-call based interposition agents, and therefore, as discussed in Section 2.1, it only supports specific types of applications and O/Ss. In contrast, the techniques described in this dissertation enable application- and O/S-transparent grid file systems with application-tailored enhancements. These enhancements cover several important aspects of remote data access, including caching, consistency, security, and fault tolerance, which are discussed in detail in the rest of this section.

2.3.2 Caching and Consistency

Caching is a classic, successful technique to improve the performance of various types of computer systems by exploiting temporal and spatial locality of data references and providing high-bandwidth, low-latency access to cached data. The basic idea of client-side caching is that when a client requests data from a storage facility, it temporarily stores a copy of the data in another layer of storage, namely a cache, which is closer to the client and provides faster data access than the original storage.
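The basic client-side caching idea above can be modeled in a few lines, with hit/miss accounting. The remote server is simulated by a dictionary lookup; a real DFS cache would of course manage sized blocks, eviction, and writes as well:

```python
class ClientCache:
    """Toy client-side cache: keep a local copy of remotely fetched blocks."""
    def __init__(self, remote_storage):
        self.remote = remote_storage   # stands in for the slow remote server
        self.cache = {}                # the faster, closer layer of storage
        self.hits = self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:     # served locally from the cache
            self.hits += 1
            return self.cache[block_id]
        self.misses += 1               # must go to the remote storage
        data = self.remote[block_id]
        self.cache[block_id] = data    # keep a copy for later reuse
        return data

remote = {0: b"alpha", 1: b"beta"}
c = ClientCache(remote)
c.read(0); c.read(0); c.read(1); c.read(0)  # temporal locality: block 0 reused
```

The access sequence shows the cache warming up: the first reference to each block misses, while repeated references hit and never touch the remote storage.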


Different levels of caching exist in a typical computer system. For example, CPUs use hardware-implemented caches to speed up the access to data stored in memory; O/Ss manage part of memory as caches to improve the access times to data stored on disks; and a DFS can use disks as caches to provide high-performance access to data from the remote file server.

Caches leverage the locality that typically exists in data references to improve performance. There are two types of locality: temporal locality refers to the fact that if a client accesses some data, it is highly probable that these data will be used again by the client in the near future; spatial locality means that when a piece of data is accessed, its spatially nearby stored data are also very likely to be needed by the client. If a data request can be satisfied from the cache, it is called a cache hit; otherwise, it is a cache miss, and the requested data need to be fetched from the remote storage. Apparently, a cache is effective when most of the data accesses can be served from the cache, i.e., the hit rate is high and the miss rate is low. A cache is initially "cold", which means it does not have any data to serve any requests; as data are brought into the cache, it becomes "warm" and able to satisfy data requests leveraging the locality.

Different caching policies are possible: read-only caching only stores the data requested by read operations in caches, whereas write caching also caches the data accessed by write operations. There are two types of write caching: write-through caching allows a client to directly modify data in its cache and forward the update to the remote storage at the same time; write-back caching further delays the propagation of data updates and keeps the modified data only in caches for a period of time. Read-only caching works well for read-mostly data accesses, and it is also easy to implement in a robust manner. Write-through caching potentially offers improved performance over read-only caching as the data cached from writes can be reused by the following reads. Write-back caching is important to the data access performance when there are intensive writes, because the locality that exists across write operations can be leveraged to reduce the


amount of data updates on the remote storage. It is, however, more complex to implement, and more difficult to handle client failures and maintain data consistency between the cache and the remote storage.

Cache consistency concerns the requirement that, when there are multiple clients sharing the data stored on the remote storage, a client's read on a piece of data should always return the value from the latest write on it, no matter whether the write is from the same client or others. Inconsistency happens when a client reads data from its cache while the data are already modified by another client, and when a client delays its updates in its cache while the other clients get a stale copy of the data from either their caches or the remote storage. Due to the lack of an absolute global time, it is impossible to determine which operation is the "latest" one, so a formal definition of cache consistency typically considers that the operations from all the clients on the same piece of data follow a hypothetical serial order. In this serial order, operations from any particular client follow the order in which they are issued by the client, and the value returned by each read operation is the value written by the last write to the piece of data in the serial order. This consistency model is able to present a coherent view of data to clients, but it can be very expensive to implement in practice. Existing DFSs often use other cache consistency models which are more or less relaxed from this one and provide relatively weaker data coherence.

Conventional DFSs usually employ client-side caching in memory, but the use of disk caching is not typical, since most of them are designed for local-area environments where the latency of network transactions is comparable to the latency of local disk access. For example, it is common among different NFS client implementations to cache file data blocks, attributes, file handles, and directories in memory, but disk caching is only available on Solaris with the kernel-level CacheFS [31] service. There are several related kernel-level DFS solutions that are specially designed for use in WANs and exploit the advantages of disk caching. In particular, AFS [25][26] and Coda [27] make use of disk caching to improve both performance and availability. Coda and the earlier versions of


AFS cache files entirely on local disks, whereas the later versions of AFS also support partial-file caching (i.e., caching only certain data blocks of files on demand).

As caching is widely used in DFSs, maintaining cache consistency is an important task for DFSs to support the concurrent sharing of data among distributed clients. Traditionally, NFS relies on a timestamp-based algorithm to maintain consistency of cached data. When a client caches any block of a file in the data cache, it also stores the file's modification time in the attribute cache. The cached blocks of the file are assumed to be valid for a finite interval of time, and the first reference to any block of the file after this interval forces a revalidation, in which the client compares the recorded timestamp with the file's modification time on the server. If the latter is more recent, it means that the file has been recently modified by someone else, so the client invalidates the cached blocks of the file and refetches them on demand. Because of this timestamp-based algorithm, a client needs to periodically revalidate a file if it is continuously referenced.

The NFS protocol also provides a close-to-open consistency in which a client always revalidates a file when it opens it and always flushes the locally modified data of a file when it closes it. This consistency model is useful for the sequential write-sharing scenario, in which a shared file is never open simultaneously for reading and writing by different clients. It makes sure that a client always gets the latest copy of a file when the client starts to work on it. However, if a file is open simultaneously by several clients and one of them modifies it, which is called "concurrent write-sharing", a stronger consistency model is needed to allow the other clients to see the changes immediately.

Several solutions have been proposed to improve upon NFS and provide stronger cache consistency. Spritely NFS [32] applies the cache consistency protocol designed in Sprite [33] to NFS: it adds open and close calls to the NFS protocol to allow a server to keep track of the clients that open the file for reading and writing; and when the write-sharing of a file is detected, the server uses callback calls to inform the clients that the file is no


longer cacheable and force them to invalidate and/or write back their cached data of this file.

NQNFS [34] uses leases to allow a client to cache data for reading or writing without worrying about conflicts. Such a lease has a limited duration and must be renewed by the client if it wishes to continue to cache the data. A read-caching lease allows a client to use read-only caching; a write-caching lease permits write-back caching, and the cached modifications are submitted when the lease expires or is terminated by an eviction callback issued from the server when a sharing conflict is detected.

The most recent version, NFSv4 [35], differs from the earlier versions by including open and close calls in the protocol and provides open delegations to clients. With a read delegation, the client can use cached data without periodic consistency checks; a write delegation further allows the client to retain modified data in its caches. A lease is associated with every delegation and its expiration automatically revokes the delegation. When a sharing conflict is detected by the server, it can also revoke the delegation using a server-to-client callback call.

The cache consistency model provided by AFS [25][26] and Coda [27] is similar to the aforementioned close-to-open model, in which a client gets the latest copy of a file on open and propagates the modified file on close. There are two limitations of this model: first, modifications on a file cannot be retained in cache after the file is closed; second, it cannot support concurrent file sharing, since the modification made by a client cannot be seen by the others immediately.

In AFS and Coda, consistency is maintained by means of callbacks. When a client caches a file, the server keeps track of that and promises to inform the client if the file is modified by other clients. With this promise, the client can use the cached copy without checking with the server. When a client updates the file, the server sends out callback break messages to the other clients which have cached this file, since the server will discard the callback promises it held for these clients on this file. On the other hand, if a client opens


a file and finds it in its cache, it needs to check with the server whether the promise still holds. If not, it has to fetch the latest version of the file from the server.

However, the above caching and consistency designs require kernel support that is difficult to deploy across shared grid environments, and they are not able to employ per-user or per-application cache policies. In contrast, GVFS enhances caching and consistency at user level based on the virtualization of widely available NFS versions (v2 and v3), and supports per-user and per-application customization on the use of disk caches and consistency protocols.

Caching or replication on persistent storage is also widely used among the related scalable distributed data storage/delivery systems. Pangaea [36] is a decentralized wide-area file system that uses pervasive replication in a peer-to-peer fashion to improve system performance, but it supports only one consistency model, "eventual consistency", i.e., it only promises that a user sees a change made by another user at some unspecified future time. OceanStore [37] is an architecture designed for global-scale persistent storage; Pond [38] is its prototype that implements a file system interface using an NFS loop-back server and allows for application-specific consistency, but it is not application-transparent, requiring the use of its API to achieve this goal. In the context of Web content caching, a related proxy cache invalidation approach has been studied in [39]. These systems differ from the proposed GVFS approach in that they are not architected to provide different consistency models transparently to applications according to their requirements and usage scenarios.

Note that the consistency models and protocols discussed in this dissertation only consider the order of read and write operations on a single data item (e.g., a file) and do not consider the order of operations on different data items (e.g., all the files in the file system). In the terminology typically used in shared-memory multiprocessor systems, a consistency model specifies the constraints on the order of operations on the entire data set, whereas a coherence model only considers that with respect to a single data item


[40]. However, this definition of cache coherence has traditionally been referred to as cache consistency in the literature of DFSs, and that convention is thus followed in this dissertation.

2.3.3 Security

Security for a DFS typically concerns both confidentiality and integrity: confidentiality refers to the property that data accessed through the DFS are disclosed only to authorized clients; integrity means that alterations to the data can only be made in an authorized way. There are several important mechanisms to protect the security of a DFS. Encryption makes use of cryptography to transform data into something that an attacker cannot understand, and the encrypted data can only be decrypted by someone with the proper key. Authentication is used to verify the claimed identity of a party, and it is typically also based on cryptography. After a party is authenticated, authorization is the process to check whether that party is authorized to perform the requested access on the data, and the access control can be performed by checking the party's identity against an Access Control List (ACL) which lists the permitted operations. Integrity checks make sure that the data are not altered by unauthorized parties, and they can be done using Message Authentication Codes (MACs) or digital signatures.

Full-featured security needs to support all of the above mentioned security mechanisms: authentication, encryption, integrity check, and access control. Strong security in DFSs is often based on two types of security systems: Kerberos [41] and Public-key Infrastructure (PKI) [42]. A Kerberos system is built around Key Distribution Centers (KDCs). Users are organized into realms and each realm's KDCs are managed by its administrators. A user authenticates into her realm through the realm's KDCs, from which she obtains the tickets for secure access to the resources in the realm. A cross-realm access requires the cooperation of the administrators in each realm to develop trust relationships and exchange per-realm keys. In PKI-based security, a public-key based certificate (e.g., X.509 [43]) along with its associated private key uniquely identifies a user. Two parties can use their certificates to establish mutual authentication and then create a secure channel for


access. Validation of a certificate is done by checking the signature of its issuer, and a trusted third party known as a Certificate Authority (CA) is typically leveraged to issue certificates.

2.3.3.1 Security in distributed file systems

Existing DFSs have diverse security designs and strengths. Earlier versions of NFS (v2 [22] and v3 [23]) rely on UNIX-style authentication, using user and group IDs. Although stronger authentication flavors are defined in the specifications, they have never prevailed in deployments. There is also no support for privacy and integrity in these versions, and NFS RPC messages can be easily spoofed, altered, and forged. Complete support of security has not been available until the latest version, NFSv4 [35], which mandates the support of RPCSEC_GSS [44], an RPC-layer security protocol based on the Generic Security Services API (GSS-API) [45]. It is required that a conforming NFSv4 implementation must implement RPCSEC_GSS with two security mechanisms, one based on Kerberos V5 [41] and the other based on PKI (LIPKEY, Low Infrastructure Public Key [46]).

All NFS versions use an exports file to specify the hosts that are allowed to access an exported directory. The ACCESS procedure call was introduced in NFSv3 to provide fine-grained access control using POSIX-model ACLs, but again it is not widely used in practice. NFSv4 improves upon this by providing Windows NT-model ACLs which have richer semantics and wider deployments. In addition, NFSv4 represents users and groups with string IDs instead of integers, which facilitates cross-domain identity mapping.

AFS [25][26] and Coda [27] use Kerberos-based systems to provide strong security. Access control is achieved by associating with directories an ACL that lists positive or negative rights for a user or group. The Kerberos security relies on centralized control and works well within an intranet. But cross-domain security is difficult to set up because it requires the involved administrations to negotiate a trust relationship.
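The AFS-style directory ACLs just mentioned, listing positive or negative rights for a user or group, can be evaluated as in the sketch below. The rights names, the group-membership model, and the negative-overrides-positive rule follow AFS's general approach but are simplified assumptions, not the real implementation:

```python
def authorize(user, groups, right, acl):
    """Evaluate an AFS-style directory ACL.

    `acl` maps "positive"/"negative" to {principal: rights} entries, where
    a principal may name a user or a group; negative entries override
    positive ones (a simplification of AFS semantics).
    """
    principals = {user} | set(groups)
    positive, negative = set(), set()
    for principal, rights in acl.get("positive", {}).items():
        if principal in principals:
            positive |= set(rights)
    for principal, rights in acl.get("negative", {}).items():
        if principal in principals:
            negative |= set(rights)
    return right in positive and right not in negative

# Example directory ACL: a group may read, one user may write,
# and a negative entry denies write to another group.
acl = {
    "positive": {"researchers": {"read", "lookup"}, "bob": {"write"}},
    "negative": {"interns": {"write"}},
}
```

For instance, a user who holds "write" through a positive entry still loses it if any of that user's groups appears in a matching negative entry, which is the role negative rights play in the model.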


None of these conventional DFSs has been designed to support grid security requirements. There is related work on extending DFS security at kernel level. In particular, the GridNFS [47] project develops a GSI-compatible security in NFSv4. However, such a design requires kernel support that is difficult to deploy across shared grid environments. Kernel-level security techniques are also unable to employ per-user or per-application security configurations. In contrast, a GVFS-style user-level solution can support flexible customization on security mechanisms and policies based on individual application and user needs.

User-level techniques can achieve privacy and integrity of NFS through secure tunneling, where SSH or SSL can be leveraged to establish a secure end-to-end connection between the client and server for NFS traffic [48]. A secure tunnel multiplexed by users faces the same limitations as NFS, since an RPC-layer mechanism is still required for authentication and authorization within the tunnel, and such tunnels are created statically by system administrators. In GVFS, per-session SSH channels are created on demand to ensure privacy and integrity of the data sessions, whereas authentication and authorization can be performed by proxies using middleware-managed session keys.

Self-certifying File System (SFS) [49] also leverages user-level loop-back client and server to enhance DFS security. It addresses the problem of mutual authentication between a file server and users by providing self-certifying pathnames for files. Such a pathname has the server's public key embedded inside, which is used by a client to verify the authenticity of the server, and then to create a secure channel to protect the file system traffic. The SFS approach is also extended to provide decentralized access control, in which users are allowed to create file sharing groups with ACLs in the file system [50]. When a user tries to access a file, the authentication server fetches the user's credentials and checks them against the ACL to authorize the access. Compared to SFS, the proposed GVFS focuses on providing data access that meets grid security requirements, employs dynamically-created per-user, per-application file system proxies,


and allows for middleware-controlled security configurations on a per-user, per-application basis.

2.3.3.2 Security in grid systems

The dynamic and multi-institutional nature of grid-style environments introduces new challenges to security. In [51] several key requirements were studied for a grid security model, including the support for multiple security mechanisms, dynamic creation of services, and dynamic establishment of trust domains. This research resulted in a de facto grid security standard, GSI (Grid Security Infrastructure). GSI employs public-key based certificates for grid authentication. Authorization is done by checking a grid user's identity (the distinguished name in the user's certificate) against a certain access control mechanism (e.g., grid mapfile in GSI, MayI layer in Legion [52]). One important security requirement unique to grid systems is delegation, which allows a service to act on behalf of a user. This can also be supported with extensions to public key certificates, e.g., proxy certificates in GSI and credentials in Legion.

Grid security can be implemented at two different levels. Transport-level security [53][54] uses public-key certificates to create a secure socket layer connection between two endpoints and protects the data exchanges between them. It is a mature technology that has efficient implementations (e.g., OpenSSL [55]), but it lacks service-level semantics and does not work for multi-hop connections. Message-level security is a suite of standards arising from the emerging Web service technologies [56][57][58], which provides security at the layer of SOAP messaging. It is agnostic to transport-layer protocols and connections, and supports more service-level functionalities. However, its performance is not comparable to transport-level security because XML processing is expensive. In this dissertation, a two-level security architecture that exploits the advantages of both levels of security is proposed for the GVFS-based grid data management.

Among the related data management solutions, GSI-based GridFTP [6] provides an API for programming grid data access, and RFT is a web service for reliable file transfer using


GridFTP; the Legion system [59] is an object-based grid system, which employs a modified NFS server to provide access to grid objects, and it integrates GSI in Legion-G [60]; the Condor system [9] uses library call interception to provide remote I/O, and it also supports GSI in Condor-G [61]. This dissertation proposes a grid-wide file system with security mechanisms compatible with these efforts. It differs from and complements them in that GVFS-based data sessions allow unmodified application binaries to access grid data using existing kernel clients and servers, and support application-tailored per-session customization.

2.3.4 Fault Tolerance

Reliable remote data access requires DFSs to tolerate the failures that happen in the system. This is especially important for grid/wide-area file systems because of the dynamic nature of such environments. The common types of failures include server and client crashes due to software or hardware problems, as well as network partitioning caused by crashed network devices which break the physical network connection between the client and server. Another type of failure is data corruption that happens during data storage and transmission, causing incorrect results from data access. In addition, resources can also become unavailable to a DFS when resources voluntarily leave the system, which is common in grid and peer-to-peer systems built upon non-dedicated resources. In the worst possible failure semantics, any of the above types of failures may occur and a client cannot tell whether the result received from a server is correct or not; such a scenario is referred to as an arbitrary or Byzantine failure.

Fault-tolerant systems are often built by replicating the data to introduce redundancy into the system. Server failures can be masked if the data are replicated across different servers, and tolerance of network partitions can also be provided by replicating the data across different sites. Successful recovery of an application's execution after a client failure often relies on the use of a checkpointing mechanism, which saves the state of the application on persistent storage. After the client comes back from a failure, the


application can roll back to its most recent checkpoint and continue its execution from there.

There are two basic models of replication: passive primary-backup and active replication. In the primary-backup replication model, only the primary replica services data requests, and it synchronizes with the backups by sending the updated data to them. If the primary replica fails, one of the backups is promoted to act as the primary. In active replication, all replicas execute operations in the same order, which usually causes higher overhead compared to passive replication, but it can tolerate Byzantine failures by collecting the results received from the replicas and using voting to find out the correct one.

Conventional DFSs have limited support for fault tolerance. AFS [25][26] supports read-only replication of data that are frequently read but rarely modified, in order to enhance data availability; Coda [27] supports read-write replication with a read-one, write-all approach. Earlier versions of NFS (v2 [22] and v3 [23]) do not provide any support for replication; with help from Automounter [28], a remote mount point can be specified as a set of servers instead of a single one, which allows the use of replication, but propagation of modifications to replicas has to be done manually. FT-NFS [62] is a user-level NFS that employs a primary-backup replication scheme to improve data availability. BFS [63] is another NFS service, which employs a replication algorithm for tolerating Byzantine faults. The latest version, NFSv4 [35], provides very limited support for read-only replication: each file can have an attribute to list the file system locations where the file's replicas are stored, but the management of replicas is left out of the protocol.

Fault-tolerance techniques are widely used in large-scale distributed storage systems. Oceanstore [37] and its prototype Pond [38] encode data with an erasure code to introduce redundancy and spread the coded data over a large number of servers to provide high availability. The PAST project [64] is a peer-to-peer storage system that uses replication for


durability. The FarSite system [65] aims to build a scalable serverless network file system, using replication to provide file availability and reliability. Wide-area data replication is presented in [66] for scientific collaborations. It manages grid data replication for read-only scientific datasets, using the Globus Reliable File Transfer service for scheduling of GridFTP-based data transfers, and using the Globus Replica Location Service for locating data replicas.

Compared to these systems, the GVFS-based approach described in this dissertation supports application-tailored customization of fault-tolerance mechanisms and policies, and it leverages middleware services for autonomic replication management and optimization.

2.4 Service-Oriented and Autonomic Data Management

Service-oriented architecture (SOA) is an approach to building loosely coupled distributed systems with minimal shared understanding among system components. In particular, the Web services architecture [67] has been broadly accepted as a means of structuring interactions among distributed software services, which exchange XML documents using SOAP messages over a network. The Web Service Resource Framework (WSRF) [68] is a specification that describes a consistent and interoperable way of dealing with stateful resources that typically exist in a grid system, e.g., files in a file system and records in a database. This framework is becoming widely adopted by grid middleware, including Globus Toolkit version 4 [69], WSRF.Net [70], and WSRF::Lite [71]. The system described in this dissertation focuses on data management and is unique in its support for dynamic and customizable data sessions, and it can also provide an interoperable service to other grid middleware services based on WSRF.

Autonomic computing addresses the complexity of managing large-scale, heterogeneous computing systems by endowing systems and their components with the capability of self-managing according to high-level objectives [72]. The building blocks of an autonomic system are autonomic elements. An autonomic element manages its own


Figure 2-2. An autonomic element employs an autonomic manager to monitor the managed element, analyze the monitored information, plan management actions upon the element, and execute the plan accordingly. Self-management of the element is realized through this feedback-control loop.

resource or service guided by policies, and its typical architecture is illustrated in Figure 2-2. It employs an autonomic manager to monitor the managed element, analyze the monitored information, plan management actions upon the element, and execute the plan accordingly; self-management of the element is realized through this feedback-control loop. Furthermore, such autonomic elements also interact with each other to achieve the desired system-level self-management [73]. The proposed research follows this approach by building grid data management services as self-managing, interacting autonomic elements.

Automatic performance optimization and fault tolerance are proposed in [74] for file-staging-based data provisioning. In that approach, performance improvement is achieved through the tuning of data block size and TCP parameters; failure recovery is achieved by logging transfer progress and retrying the transfer after a failure. In comparison, the approach of this dissertation supports general data access patterns beyond bulk data transfer; in particular, it can support interactive applications and efficient sparse file accesses. It also supports flexible customization and autonomic optimization of a variety of important aspects of data sessions based on application needs.
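As a rough illustration (not code from GVFS or [72]), the monitor-analyze-plan-execute feedback loop of an autonomic manager can be sketched as follows. The managed cache element, metric names, and policy threshold are all hypothetical placeholders chosen for the sketch.

```python
# Hypothetical sketch of an autonomic manager's feedback-control loop
# (monitor -> analyze -> plan -> execute). The DiskCache element and its
# miss-rate behavior are illustrative stand-ins, not part of GVFS.

class DiskCache:
    """A stand-in managed element: a client-side disk cache."""
    def __init__(self, size_mb):
        self.size_mb = size_mb
        self.miss_rate = 0.5            # pretend the cache is too small

    def metrics(self):
        return {"miss_rate": self.miss_rate}

    def reconfigure(self, size_mb):
        self.size_mb = size_mb
        self.miss_rate /= 2             # pretend a larger cache halves misses

class AutonomicManager:
    def __init__(self, element, policy):
        self.element = element
        self.policy = policy            # high-level objective

    def monitor(self):
        return self.element.metrics()

    def analyze(self, metrics):
        # Does the element violate the high-level objective?
        return metrics["miss_rate"] > self.policy["max_miss_rate"]

    def plan(self):
        return {"size_mb": self.element.size_mb * 2}

    def execute(self, actions):
        self.element.reconfigure(**actions)

    def step(self):
        """One iteration of the feedback-control loop."""
        if self.analyze(self.monitor()):
            self.execute(self.plan())

cache = DiskCache(size_mb=256)
mgr = AutonomicManager(cache, {"max_miss_rate": 0.2})
mgr.step(); mgr.step()
print(cache.size_mb)  # -> 1024: grown until the objective is met
```

Once the monitored miss rate falls within the policy objective, further iterations leave the element unchanged, which is the steady state of the control loop.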


There is extensive research on autonomic storage management. In particular, in [75] a utility-based algorithm is used to decide the replication degree (the number of replicas for a dataset) for resource managers; the IBM autonomic storage manager implements policy-based storage allocation [76]. Automatic replica generation and distribution are studied in the context of Content Delivery Networks (CDNs) [77] and peer-to-peer storage systems [78]. Compared to these systems, this dissertation proposes autonomic storage and replica management in order to support dynamic grid-wide file systems that provide application-transparent and tailored grid data access.

2.5 Support for Distributed Virtual Machines

A virtual machine (VM) presents the view of a duplicate of the underlying physical machine to the software that runs within it, allowing multiple operating systems to run concurrently and multiplex the resources of a computer, including processor, memory, disk, and network. Such VMs are often called system-level VMs, to differentiate them from other types of VMs, and they are becoming increasingly valuable for providing flexible resource containers and portable encapsulations of execution environments. In particular, there is growing interest in employing VMs in grid computing [17][79].

System-level VMs are mainly provided by software called a virtual machine monitor, also known as a hypervisor (sometimes also with a certain level of hardware support). A VM's entire state, including CPUs, memory, and disks, can be represented as data. In typical VM technologies, such as VMware [80], Xen [81], and UML [82], a VM's state data are often encapsulated in files and stored on physical disks. Thus the GVFS-based approach proposed by this dissertation can be applied to manage VM state and provide remote state access for VMs instantiated across grids.

A related project has investigated techniques that improve the performance of VM migrations [83][84]. That work focuses on mechanisms to transfer the state of virtual desktops, possibly across low-bandwidth links. Common between their approach and this dissertation are mechanisms for on-demand block transfers, and


optimizations based on the observation that zero-filled blocks are common in suspended VM memory state. The key differences are: the techniques described in this dissertation are generic to various VM technologies, implemented through intercepting NFS RPCs and leveraging O/S clients and servers available in typical grid resources, whereas their approach is specific to a particular type of VM, using modified libraries as a means of intercepting VM monitor accesses to files with a customized protocol.

The work presented in [85] introduces techniques for low-overhead migration of VM memory state in LAN environments. The approach presented in this dissertation provides an efficient way of migrating VMs across the WAN. In addition, the implementation of [85] requires access to the VM's shadow page table, which is not possible for commercial VM software, such as VMware. On the contrary, the techniques described in this dissertation are applicable to different types of VMs.
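The zero-filled-block observation shared by both approaches can be illustrated with a minimal sketch: a sender replaces blocks that are entirely zero with a marker so they need not cross the network, and the receiver re-expands the markers. The block size and in-memory encoding here are assumptions made for the sketch, not the wire format of either system.

```python
BLOCK_SIZE = 4096  # illustrative block size, not mandated by any VM format

def is_zero_block(block: bytes) -> bool:
    """True if the block is entirely zero-filled."""
    return block.count(0) == len(block)

def encode_state(data: bytes):
    """Split VM state into blocks, replacing zero-filled blocks with a
    None marker so their contents need not be transferred."""
    out = []
    for off in range(0, len(data), BLOCK_SIZE):
        block = data[off:off + BLOCK_SIZE]
        out.append(None if is_zero_block(block) else block)
    return out

def decode_state(blocks):
    """Reconstruct the state, re-expanding the zero markers."""
    return b"".join(b if b is not None else b"\x00" * BLOCK_SIZE
                    for b in blocks)

# A toy "memory state": one data page surrounded by zero-filled pages.
state = b"\x00" * BLOCK_SIZE + b"page" * 1024 + b"\x00" * BLOCK_SIZE
sent = encode_state(state)
print(sum(b is None for b in sent))   # -> 2 blocks skipped
assert decode_state(sent) == state
```

For a suspended VM with mostly-zero memory, such an encoding transfers only the non-zero blocks, which is the source of the savings both projects exploit.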


CHAPTER 3
DISTRIBUTED FILE SYSTEM VIRTUALIZATION

Conventional distributed file systems (DFSs) are designed for general-purpose usage, and implemented in operating systems (O/Ss) as part of kernels or privileged user-space software. They are unable to incorporate application-tailored features, because an optimization tailored for one application (e.g., aggressive pre-fetching of file contents) may result in performance degradation for several others (e.g., sparse files, databases). Modifications to DFSs at O/S level are also difficult to port and deploy, notably in shared environments. Moreover, such DFSs are unable to employ configurations that are customized for specific applications: they are typically deployed by system administrators with relatively static, long-lived, and homogeneous configurations at the granularity of a collection of users, rather than dynamic, short-lived, and different setups at the granularity of an individual user or application session.

This dissertation addresses these limitations by proposing virtual DFSs, namely Grid Virtual File Systems (GVFSs) (Figure 3-1). These virtual DFSs are built upon the conventional DFSs, but they can behave differently than the physical ones in terms of data accessibility and characteristics. They share the underlying software and hardware resources, but they are isolated from each other and can be dynamically created and configured independently. With GVFS, application-tailored data provisioning can be realized by establishing per-application virtual DFSs and customizing each of them according to its application's requirements and characteristics.

3.1 User-Level Proxy-Based Virtualization

3.1.1 Architecture

A GVFS-based virtual DFS consists of several virtual clients and servers which are implemented by unprivileged user-level file system proxies. These proxies virtualize the physical O/S-level DFS clients and servers by interposing between them and broker the data sharing across the distributed systems (Figure 3-2): the O/S servers delegate the


Figure 3-1. The GVFS-based virtual DFSs are built upon the conventional DFSs, but they can behave differently than the physical ones in terms of data accessibility and characteristics. They share the underlying software and hardware resources, but they are isolated from each other and can be dynamically created and configured independently.

control of data sharing to the proxies on the server side (namely, proxy servers), and the O/S clients access the remote data through the proxies on the client side (namely, proxy clients). The layer of indirection provided by the proxies forms the virtualization, which allows the proxies to multiplex the physical DFS clients and servers and to establish independent virtual DFSs for applications, with access to different data sets and with different configurations.

GVFS can leverage the widely available DFS implementations in existing O/Ss, such as NFS and CIFS/Samba, to provide virtual DFSs across heterogeneous platforms. Network File System (NFS) [22][23][35] is the de facto DFS, available on many O/Ss, including UNIX and Linux as well as Windows (with an extension service). Common Internet File System (CIFS) [24] is provided by the Windows-family O/Ss and is interoperable with Samba [86] on UNIX and Linux. By virtualizing these DFSs with user-level proxies, GVFS can be seamlessly deployed on a wide variety of systems, without any changes to the existing O/Ss.


Figure 3-2. A GVFS-based virtual DFS consists of several virtual clients and servers which are implemented by unprivileged user-level GVFS proxies. These proxies virtualize the physical O/S DFS clients and servers by interposing between them and broker the data sharing across the distributed systems.

Unlike a conventional DFS statically deployed in a local-area environment, GVFS can be dynamically created across wide-area networks and administrative domains. In a wide-area environment, a user's identity is often not consistent across domains due to the lack of centralized administration. In a grid system, virtual organizations are dynamically established on resources distributed across different physical organizations, where a grid user's identity needs to be dynamically mapped to the physical ones. Therefore, an important task performed by a GVFS proxy is cross-domain identity mapping, which dynamically maps the identities between the account where the job is running and the account where the files are stored.

While virtualizing a conventional DFS, the GVFS proxies communicate with the O/S clients and servers via the native DFS protocols (e.g., NFS RPC, CIFS SMB). Nonetheless, the protocol used between the proxy clients and servers can be different


from the native protocol. It can be extended to provide more functionality and improved to achieve optimization of various important aspects of remote data access, including performance, consistency, security, and fault-tolerance.

Based on the GVFS approach, both NFS and CIFS can be virtualized by interposing file system proxies between native NFS and CIFS clients and servers. The resulting virtual NFS's or CIFS's data access is managed by its proxies, which process and forward the corresponding NFS RPC or CIFS SMB messages. Because of the good standardization and availability of the NFS specifications [22][23][35], this dissertation uses NFS-based GVFS to present its design and implementation, as well as to develop and evaluate the prototype system.

3.1.2 NFS-Based GVFS

3.1.2.1 User-level NFS proxy

In a typical NFS setup, the kernel NFS server exports the shared file systems to certain users and clients, and an authorized user accesses the remote data via the kernel NFS client's RPCs. With GVFS, a virtual NFS can be created by placing a user-level file system proxy between the kernel NFS client and server. To the kernel client, the proxy works as a server; to the kernel server, the proxy works as a client. When a user or application on the client needs to access the remote data on the server, the RPCs from the kernel NFS client are processed by the proxy and then forwarded to the kernel NFS server.

Specifically, due to the decoupling of the mount protocol and the NFS protocol in NFSv2 and NFSv3, a proxy consists of two user-level daemons, gvfs.mountd and gvfs.nfsd, for handling mount RPCs and NFS RPCs, respectively. When a client tries to mount the remote file system, the request is processed by the GVFS mount daemon, which first checks whether the client is allowed to mount it. If it is allowed, the proxy forwards the request to the server's native NFS mount daemon, and the result from the server (the root file handle of the remote file system) is returned to the client. Otherwise, the request is rejected by the GVFS mount daemon and is not forwarded to the server.
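The mount-time admission control just described might be sketched as follows. The exports-file syntax and function names here are illustrative assumptions; only the control flow (check the requesting client against the proxy's exports list, then forward or reject) reflects the text.

```python
# Sketch of gvfs.mountd's decision logic: an allowed MOUNT request is
# forwarded to the native mount daemon, which returns the root file
# handle; a disallowed one is rejected without being forwarded.
# The exports format and names below are hypothetical.

def parse_exports(lines):
    """Parse entries like '/home/F/X 10.0.0.5(rw)' into a table."""
    table = {}
    for line in lines:
        path, clients = line.split(None, 1)
        table[path] = {}
        for entry in clients.split():
            host, _, mode = entry.partition("(")
            table[path][host] = mode.rstrip(")") or "ro"
    return table

def handle_mount(path, client_ip, exports, forward):
    """Forward an allowed MOUNT request; reject it otherwise."""
    if client_ip in exports.get(path, {}):
        return forward(path)           # returns the root file handle
    return "MNT3ERR_ACCES"             # rejected, not forwarded

exports = parse_exports(["/home/F/X 10.0.0.5(rw)"])
fake_forward = lambda path: "root-fh-of-" + path   # stand-in for the kernel mountd
print(handle_mount("/home/F/X", "10.0.0.5", exports, fake_forward))
print(handle_mount("/home/F/X", "10.0.0.9", exports, fake_forward))
```

The same check-then-forward pattern applies to the per-request access control performed by gvfs.nfsd on NFS RPCs.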


A GVFS is established once the remote file system is mounted on the client. A user's data access to the remote file system on the client triggers the kernel NFS client to issue RPCs to the GVFS proxy's NFS daemon. The proxy checks whether the client and user are allowed to access the requested data, forwards the permitted requests to the native NFS server, and returns the results to the client. Invalid requests are rejected by the proxy without being forwarded to the server. In the end, a GVFS can be destroyed by unmounting the remote file system and terminating the proxy daemons.

GVFS enforces access control for the mount and NFS requests by checking the user's identity (typically user ID and group ID) and the client's identity (typically the IP address) against its access control list files. It uses an exports file to list the clients that are allowed to access the remote file system and their read and write permissions. A map file is used by GVFS to list the users that are allowed to access the remote file system.

This map file also stores the cross-domain user identity mappings between the account where the user's job is running and the account where her files are stored. Upon receiving a request, the proxy checks the identity of the job account embedded in the RPC message against the map file and finds the corresponding file account's identity. It then changes the user and group IDs of the job account to the IDs of the file account in the RPC message before forwarding it to the server. If a job account's identity is not found in the map file, the request is either denied or the account is mapped to nobody (the least-privileged account), depending on the configuration of GVFS.

Figure 3-3 illustrates an example setup of two NFS-based GVFSs. Grid users X and Y are running their jobs under accounts shadow1 and shadow2 on compute servers C1 and C2, respectively. On the file server S, the corresponding user data are located in subdirectories X and Y of file account F (/home/F). The native NFS server exports user data to itself (localhost) since the proxies are running on the file server. When a job needs to access the remote data on GVFS, the corresponding requests from the kernel


Figure 3-3. Two grid users (X, Y) execute jobs with their allocated shadow accounts (shadow1, shadow2) on compute servers C1 and C2, respectively. They access their data remotely stored on file server S under the file account F. Their data requests are authenticated and processed by the user-level GVFS proxies on S. The accepted requests are forwarded to the NFS server, and the credentials inside the requests are mapped from the shadow accounts to the file account.

NFS client are processed by its proxy. For a valid request, the proxy modifies the RPC message to map the job account's identity (shadow1 or shadow2) to the file account's (F) so that the request can be properly serviced by the kernel NFS server. Upon receiving the result from the server, the proxy also modifies the RPC message to map the file account's identity back to the job account's and then forwards it to the client. Because the user data are only exported to localhost on the server, the remote data access is completely under the control of the proxies: the clients do not have direct access to the data; they can only access them through the proxies. The path exported by a proxy also ensures that a user (e.g., X) can only access her own data (/home/F/X); she cannot access the other users' files (e.g., /home/F/Y) since they are not exported to her client by the proxy.
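The credential rewriting performed by the proxy can be sketched as below. The map-file format, the particular UID/GID values, and the helper names are hypothetical; only the mapping logic (job account to file account, with unknown accounts either denied or mapped to nobody) follows the description above.

```python
# Illustrative sketch of cross-domain identity mapping: the UID/GID of
# the shadow (job) account carried in an NFS RPC credential is rewritten
# to the file account's IDs before the call is forwarded. The map-file
# format below is an assumption, not GVFS's actual on-disk format.

NOBODY = (65534, 65534)   # conventional 'nobody' IDs, least privileged

def load_map(lines):
    """Parse map entries of the form 'job_uid:job_gid file_uid:file_gid'."""
    table = {}
    for line in lines:
        job, fileacct = line.split()
        jid = tuple(int(x) for x in job.split(":"))
        fid = tuple(int(x) for x in fileacct.split(":"))
        table[jid] = fid
    return table

def map_credential(rpc_cred, table, deny_unknown=False):
    """Return the credential to forward to the kernel NFS server."""
    key = (rpc_cred["uid"], rpc_cred["gid"])
    if key in table:
        uid, gid = table[key]
    elif deny_unknown:
        raise PermissionError("job account not in map file")
    else:
        uid, gid = NOBODY
    return dict(rpc_cred, uid=uid, gid=gid)

table = load_map(["1001:1001 500:500"])       # shadow1 -> file account F
cred = {"uid": 1001, "gid": 1001, "machinename": "C1"}
print(map_credential(cred, table))            # uid/gid rewritten to 500
```

On the reply path the proxy applies the inverse mapping, so the kernel client on the compute server only ever sees the shadow account's IDs.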


3.1.2.2 Multi-proxy GVFS

Although a GVFS-based virtual DFS can be created by leveraging a single native NFS server-side proxy [87], the design supports connections of proxies "in series" between a native NFS client and server. In particular, a pair of proxies can be placed on the kernel client and server to provide both virtual client and server, as described at the beginning of this chapter. In this setup, the kernel server still exports the file system to its local proxy server (the proxy running on the server side), the proxy server exports it to the proxy client (the proxy running on the client side), and the proxy client exports it to the kernel client. So the client can only access the remote file system through the proxy client: the proxy client forwards the access to the proxy server, and the proxy server then forwards it to the kernel server. The proxy server also performs the necessary user identity mapping to support cross-domain data access, which is not needed for the proxy client.

As illustrated in Figure 3-4, a pair of proxy client and server cooperate between the native NFS server and client to establish a GVFS (e.g., GVFS1). The proxy server works in the same way as in a single-proxy GVFS: the kernel NFS server exports the user data directory /home/F/X to localhost, so the proxy server on S is responsible for processing and forwarding the remote access to the data (it also maps the job account's identity shadow1 in the RPCs to the file account's identity F). On the other hand, the client accesses the remote data through the proxy client on C. The proxy client exports the remote file system to its localhost, so it accepts only the requests coming from the kernel NFS client and then forwards them to the proxy server.

Although a multi-proxy design may introduce more overhead from processing and forwarding RPC calls, there are important design goals that lead to its consideration:

Improved performance: The addition of a proxy at the client side enables techniques which address the inefficiency of the native NFS protocol and improve the performance of remote data access. For example, a proxy client can introduce a level of caches on local disks (additional to kernel-level memory buffers) and improve access


Figure 3-4. Multi-proxy GVFS setup. Two proxies work between the native NFS server and client cooperatively to provide remote data access. The kernel NFS server exports the user data directory /home/F/X to localhost, so the proxy server on S is responsible for processing and forwarding the remote access to the data. It also maps the job account's identity shadow1 in the RPCs to the file account's identity F. On the other hand, the client accesses the remote data through the proxy client on C. The proxy client exports the remote file system to its localhost, so it accepts only the requests coming from the kernel NFS client and then forwards them to the proxy server.

latency for requests that exhibit locality; the proxy client and server can also employ inter-proxy high-throughput data transfer protocols (e.g., Secure FTP, GridFTP [6]) for access of large files.

Additional functionality: Extensions to the NFS protocol can be implemented between proxies without modifications to native NFS clients/servers or applications. For example, secure remote data access can be achieved by inter-proxy authentication, authorization, data encryption, and integrity checks; the proxy client and server can also


cooperate to realize fine-grained consistency models to maintain the data coherence between client-side caches and the server.

These potential enhancements enabled by multi-proxy GVFS are discussed in detail in Chapter 4. The rest of this chapter presents a thorough performance evaluation of the basic NFS-based GVFS implementation.

3.2 Evaluation

3.2.1 Setup

A prototype of GVFS is implemented based on virtualizing NFS v2 and v3, and it is evaluated in this subsection using experiments with typical file system benchmarks in a local-area environment. These benchmarks exercise GVFS with intensive file system operations and demonstrate its performance compared to conventional DFSs. The experiments were conducted on a high-speed LAN (Gigabit Ethernet) in order to reveal the worst-case overhead of the user-level virtualization. Two physical servers were used as the file system client and server; each has dual 2.4GHz hyper-threaded Xeon processors with 4GB of memory and runs Fedora Core 6 with kernel 2.6.17.

These experiments compare the performance of the NFS-based GVFS with the native NFS, where version 3 of NFS over TCP was used for both. The servers exported the file system with write delay and synchronous access. The native NFS daemon used the default configuration of 8 threads, whereas the GVFS proxies were also multithreaded, with 8 worker threads (see 4.2.2 for a detailed discussion of GVFS multithreading). Due to the limitation of the kernels, the maximum block size for read and write RPCs was set to 32KB. No swap was used on the physical machines during the experiments. Every run was started with a cold kernel buffer by unmounting the file system.

3.2.2 Stat

The virtualization provided by GVFS involves overhead from processing RPCs with the user-level proxies. The first experiment studies this overhead by using a micro benchmark to measure the latency of a single RPC. This benchmark uses the stat system


Figure 3-5. Latency of a stat system call that triggers a single GETATTR RPC to the file server. Three different DFS setups were considered: native NFS (NFS), GVFS based on only a proxy server (GVFS-1-Proxy), and GVFS based on both proxy client and server (GVFS-2-Proxy).

call to check a directory on the remote file system, which triggers the kernel client to issue a single GETATTR RPC to retrieve the directory's attributes. These attributes are preloaded in the server's memory, and thus the latency of the stat call mainly entails the network round-trip time and the RPC processing delay.

Compared to using NFS to serve the stat call, GVFS introduces additional latency from the user-level RPC processing and the kernel-user space switching. To take a closer look at this, two different GVFS setups were tested: in GVFS-1-Proxy, only the proxy server was used to create the virtual DFS; and in GVFS-2-Proxy, both proxy client and server were employed. Both UDP and TCP were considered as the transport to carry the RPCs. Figure 3-5 illustrates the latencies of the stat call on these different setups. With a single proxy, GVFS adds 0.23ms and 0.27ms of delay with UDP and TCP, respectively. When both the proxy client and server are used, the virtualization overhead increases to 0.48ms for UDP and 0.51ms for TCP.


Figure 3-6. Throughputs of IOzone with different numbers of threads reading large files through separate NFS/GVFS connections to the file server. The standard deviations are all under 1% of the reported average values.

This micro benchmark shows that the latency for a single RPC is doubled or tripled using GVFS. However, the application-perceived overhead can be much smaller, because the processing of multiple outstanding RPCs can be overlapped, and the latency of disk accesses can also diminish the user-level delay. This is demonstrated by the following experiments. The typical GVFS setup, which utilizes both proxy client and server, is used throughout the rest of this subsection.

3.2.3 IOzone

The second experiment evaluates the throughput of GVFS with a typical file system benchmark, IOzone [88]. It is used to sequentially read a large file (1GB) from the remote file system. The file is preloaded in the file server's memory, and thus there is no disk access to slow down the benchmark's request rate. This "extreme" intensive setup reveals the worst-case overhead from the user-level virtualization. The throughputs on NFS and GVFS are plotted in Figure 3-6 (the first group of bars). Note that the maximum TCP throughput between the client and server is 111MB/s (measured with Iperf [89]). In


Figure 3-7. The CPU usage of the GVFS proxy client and proxy server during one typical run of IOzone on GVFS. The average user time percentages for the proxy client and server are about 14% and 8%, respectively.

comparison, GVFS delivers a performance that is 70% of the maximum and 80% of NFS. This confirms that the capability of handling many outstanding requests helps GVFS to significantly reduce the overhead of user-level virtualization.

This overhead was further measured in terms of the CPU usage of the user-level proxies. The user CPU time percentages for the proxy client and server were collected throughout the benchmark's executions. On average, they consume about 14% and 8% of CPU on the client and server, respectively. Figure 3-7 shows the proxy client's and proxy server's CPU consumption for one typical run of the benchmark. Considering the intensity of the workload, these usages are reasonable, and they can be much lower for typical applications. The proxy client spends more cycles than the proxy server because its RPC, which is serviced across the network, takes longer to finish, whereas the proxy server's RPC is replied from its localhost.

This experiment has also studied the scalability of GVFS with a multi-client setup, where a single proxy on the server services concurrent data access from multiple proxies on


the client. IOzone was executed with several threads, each sequentially reading a different file through a separate proxy client. As the number of threads was increased, the size of the files was reduced accordingly (from 1GB to 192MB), so they could still be preloaded in the server's memory. For comparison, the benchmark was also executed on native NFS, where separate connections between the kernel client and server were used for the IOzone threads to access files.

The aggregate throughputs with various numbers of threads are shown in Figure 3-6. Compared to the results from the previous single-client tests, the throughput of GVFS is consistent with respect to both the maximum achievable throughput and NFS' throughput. Regardless of the number of concurrent intensive clients, GVFS can always effectively utilize the resources and deliver the same level of performance. This demonstrates that GVFS can support scalable data sharing by using a single proxy server to service a large number of clients. It also shows that having several proxies running on the same host can be an efficient way of providing multiple virtual DFSs upon a single physical resource.

3.2.4 PostMark

The third experiment chooses a more realistic file system benchmark, PostMark [90], which simulates the workloads of email, news, and Web commerce applications. It starts with the creation of a pool of directories and files (creation phase), then issues a mix of transactions, including create, delete, read, and append (transaction phase), and finally removes all the directories and files (deletion phase). In contrast to the uniform, sequential data accesses used in the IOzone experiment, the file system is randomly accessed by PostMark with a variety of data and metadata operations. In the experiment, the initial numbers of directories and files were 200 and 2000, the file sizes ranged from 512B to 50KB, and the number of transactions was set to 20000.


Figure 3-8. Runtimes of the various phases of PostMark as well as its total runtimes on NFS and GVFS.

The execution times of PostMark's various phases and its total runtime on NFS and GVFS are shown in Figure 3-8. Compared to NFS, the total runtime on GVFS is only longer by 7%, and for the very intensive transaction phase, where a large volume of metadata and data updates are involved, GVFS is slower by 6%. It is evident that the overhead of GVFS' user-level virtualization is very small for such a benchmark, which involves a large amount of disk accesses and exhibits a more typical data access pattern.


CHAPTER 4
APPLICATION-TAILORED DISTRIBUTED FILE SYSTEMS

User-level virtualization in GVFS provides the foundation for application-tailored DFSs. Based on virtualization, GVFSs can be managed on demand by middleware on a per-application, per-session basis (Figure 3-4): a GVFS is created before a compute session starts to provide remote data access for the application, and it is destroyed after the compute session completes. Such a data provisioning cycle is called a GVFS session. Since each GVFS session is dedicated to serving its application's remote data access, it can be customized with configurations that are tailored to the application's needs. Concurrent GVFS sessions can share the underlying physical software and hardware resources, whereas the virtualization layer enforces the isolation among them and allows them to be created and customized independently.

This virtualization also allows GVFS sessions to employ extensions and improvements that are not available in the physical DFS, and to address the limitations and inefficiencies of the physical DFS. This chapter introduces such enhancements that are designed for application-tailored data provisioning, particularly in a wide-area, cross-domain environment. They cover various important aspects of DFSs, including performance, consistency, security, and fault-tolerance. Note that optimizations on these aspects and others such as cost cannot be considered in an isolated manner, because one often has implications on others. Therefore, tradeoffs have to be made to balance among these different goals. The flexibility provided by GVFS allows individual GVFS sessions to choose the configurations that best suit their needs.

4.1 Motivating Examples

While application transparency is a strong asset of DFS-based data provisioning approaches, it can also become a liability when DFSs are scaled to grid-style environments. The nature of WAN and grid resources dictates that optimizations must be made for such environments in order to provide application-desired performance, consistency,


security, and reliability. However, it is often the case that "one size does not fit all". The enhancements need to be customized according to the data access characteristics and requirements, and considered in a context where application-specific modifications are unlikely to be implemented in kernels. These motivate the pursuit of user-level, application-tailored DFS enhancements in this chapter. Potential uses of such enhancements can be illustrated with the following concrete scenarios.

Distributed virtual machines: Virtual machines (VMs) are increasingly used in distributed systems [17][79]. Efficient provisioning of VM images, which are typically very large in size, is key to dynamic instantiations of VMs across networks. Using a DFS for remote VM state access allows distributed VMs to be quickly started without entirely transferring the large VM state files, and it allows many VM instances to be created from a small set of templates by read-only sharing of their templates with independent copy-on-write state. Due to the absence of write sharing, both reads and writes can be cached with write delay on the client side to support efficient executions of the instantiated VMs, where a customized caching scheme is necessary to provide the capacity and persistence needed for the aggressive caching.

Software repositories: Software repositories are popular in enterprises as a means of sharing software among users. Such repositories are often set up on a DFS in an enterprise-scale network, read-only shared by organization users, and centrally managed by system administrators. However, as the scale of the enterprise's resources and users grows, support for wide-area sharing becomes a challenge to traditional DFS technologies. In this example, client-side data caching is important to improving user-perceived performance of using applications from the repository, but a cache consistency protocol is also needed to let the users see the latest software after it is updated by the administrator.

Scientific data processing: Scientific data are often continuously produced on-site and at the same time processed off-site in computing facilities. A DFS helps the distributed programs to conveniently share the possibly massive amount of data,


and it allows the analysis to be performed over different data ranges and with different granularity [91]. This scenario precludes the use of write delay on the data producing side, but permits reads to be cached on the processing side to speed up the analysis. The cache consistency protocol needs to support effective use of the cached data with small consistency maintenance overhead. Meanwhile, it should still provide a consistency guarantee that allows the generated data to be available for processing in a timely fashion.

GSI-enabled grid file systems: Employing DFSs to provide data to grid applications allows unmodified applications to transparently utilize computing and data resources across administrative domains. Security is critical in such grid file systems because data are shared among organizations with limited mutual trust, and stored and transferred on resources with limited security. It is necessary to enhance the DFSs to support strong authentication, privacy, and integrity. These mechanisms also need to be compatible with the widely adopted grid security infrastructure (GSI) [51], so that the data management can be interoperable with other grid middleware and integrated with existing grid systems.

Long-running computation tasks: Large computation tasks, such as simulation and data mining, are often conducted in parallel on computing resources aggregated across LAN and WAN. Using DFSs to support these tasks enables the parallel processes to transparently share the inputs and outputs without explicitly transferring them. These tasks often take a long time, possibly days or even weeks, to finish, and thus require highly reliable executions. Their DFSs need to be tailored to provide good data availability that can tolerate failures on clients, servers, and networks. It is also desirable that they be able to automatically detect and recover from failures, and support continuous remote data access for these tasks transparently.

To satisfy the diverse needs of applications, such as the ones in the above examples, the rest of this chapter presents the application-tailored enhancements on several important aspects of DFSs, including performance, consistency, security, and reliability.


4.2 Performance

Performance is one of the main hurdles preventing conventional DFSs from scaling in wide-area environments. Design decisions made under the assumption of a LAN-speed connection between client and server do not apply to WAN, where the round-trip latencies are often larger by orders of magnitude. Consequently, inefficiency appears when such DFSs are used in the wide area; e.g., excessive interactions between client and server on a long-latency network link can cause a significant increase in the response time and reduction in the throughput.

4.2.1 Client-Side Disk Caching

One particular limitation of conventional DFSs is in the use of client-side caching. Caching is a classic and successful technique to improve the performance of computer systems by exploiting temporal and spatial locality of data references and providing high-bandwidth, low-latency access to cached data. However, typical DFS clients employ only memory caching, because they assume that servers are in close proximity. Nonetheless, memory often does not have sufficient capacity to exploit locality, and it is non-persistent and thus unable to support extensive write delay. For example, the NFS protocol allows the results of various NFS requests to be cached by an NFS client [21]. Although memory caching is generally employed by NFS clients, disk caching is not typical.

On wide-area file systems, caching is key to hiding long network latencies and improving an application's or user's data access experience, because the overhead of a network transaction is much higher compared to that of a local I/O access. Hence, GVFS provides persistent caching on client-side local disks to enhance the performance of remote data access on WAN.

4.2.1.1 Design

The GVFS disk caching effectively complements the existing memory caches in the native DFS. Its greater disk capacity promises reduction in cache capacity and


conflict misses [92], and thus fewer high-latency network communications. Its use of nonvolatile storage also allows more aggressive write-back caching, because the delayed data modifications can be recovered across client restarts or crashes. Therefore, employing disk caching can form an effective cache hierarchy: memory is used as a small but fast first-level cache, whereas disk works as a relatively slower but much larger second-level cache.

Disk caching in GVFS is implemented at user level by the file system proxy. A virtual DFS can be established by a chain of proxies, where the native O/S client-side proxy (the proxy client) can be employed to establish and manage disk caches. As illustrated in Figure 4-1, kernel buffer misses can be satisfied locally if they hit in the disk caches; otherwise they are forwarded to the server, and the returned results are stored in the disk caches. The GVFS disk caching operates at the granularity of NFS remote procedure calls (RPCs). A GVFS disk cache is generally structured in a way similar to traditional block-based hardware designs: it contains file banks that hold frames in which data blocks and cache tags can be stored. Cache banks are created on the local disk by the proxy client on demand. The indexing of banks and frames is based on a hash of the requested NFS file handle¹ and offset, and allows for associative lookups. The hashing function is designed to exploit spatial locality by mapping consecutive blocks of a file into consecutive sets of a cache bank, so that when a data block is serviced from the cache, its adjacent blocks can be quickly accessed from the cache as well. More details about the cache design and implementation can be found in [93].

The GVFS disk caching supports different policies for write operations: read-only, write-through, and write-back, which can be configured by middleware for a specific user and application on a per-GVFS-session basis. Write-back caching is an important feature

¹ Files and directories are referred to by file handles. A file handle is an opaque binary value which can be up to 64 bytes long in NFS v3 and even longer in NFS v4. A file's file handle is assigned by an NFS server, and it uniquely identifies the file on the server throughout its lifetime.


Figure 4-1. A GVFS-based virtual DFS can be established by a chain of proxies, where the native O/S client-side proxy can establish and manage disk caches. Kernel buffer misses can be satisfied locally if they hit in the disk caches; otherwise they are forwarded to the server and the returned results are stored in the disk caches.

in wide-area environments to hide long write latencies by leveraging the locality among write accesses. Furthermore, write-back disk caching can avoid transfer of temporary files. After the computing session completes, a user or data scheduler can remove temporary files from the working directory, which automatically triggers the proxy to invalidate cached modifications of those files. Thus when the proxy writes back the cached modifications, only the useful data are submitted to the server, so that both bandwidth and time can be effectively saved.

4.2.1.2 Deployment

As GVFS sessions are dynamically set up by middleware, disk caches are also dynamically created and managed by their proxy clients on a per-session basis. When a GVFS session starts, its proxy client initializes the cache with middleware-configured parameters, including cache path, size, associativity, and policies. During the session, some of the parameters, including cache write and consistency policies, can also be reconfigured. When the session finishes, policies implemented by grid middleware can drive the proxy to flush, write back, or preserve cached contents as needed.
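The bank and frame indexing described in Section 4.2.1.1 can be illustrated with a small sketch. The hash below is hypothetical (the actual GVFS function is described in [93]); it only shows the design property the text names: all blocks of one file map to one bank selected from a digest of the opaque file handle, while the set index advances with the block number, so adjacent blocks of a file occupy adjacent sets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE    (32 * 1024)  /* cached data block size (illustrative) */
#define SETS_PER_BANK 256
#define NUM_BANKS     64

/* Hypothetical digest of an opaque NFS file handle (FNV-1a style fold). */
static uint32_t fh_digest(const unsigned char *fh, size_t len)
{
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++)
        h = (h ^ fh[i]) * 16777619u;
    return h;
}

/* Map a (file handle, offset) pair to a (bank, set) location: one bank per
 * file, consecutive blocks in consecutive sets, preserving spatial locality. */
static void cache_index(const unsigned char *fh, size_t fhlen, uint64_t offset,
                        uint32_t *bank, uint32_t *set)
{
    uint64_t block = offset / BLOCK_SIZE;
    *bank = fh_digest(fh, fhlen) % NUM_BANKS;
    *set  = (uint32_t)(block % SETS_PER_BANK);
}
```

With this layout, after a miss brings block n of a file into its frame, block n+1 is found in the next set of the same bank, which keeps the associative lookup for a sequential read cheap.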


Typically, kernel-level NFS clients are geared towards a local-area environment and implement a write policy with support for staging writes for a limited time in kernel memory buffers. Kernel extensions to support more aggressive solutions, such as long-term, high-capacity write-back buffers, are unlikely to be undertaken; NFS clients are not aware of the existence of other potential sharing clients, and thus maintaining consistency in this scenario is difficult. The write-back caching in GVFS can leverage middleware support to implement a session-based consistency model from a higher abstraction layer: it supports middleware to command a proxy client through O/S signals and control it to write back and flush cache contents.

Such middleware-driven consistency is sufficient to support many grid applications, e.g., when tasks are known to be independent by a scheduler for high-throughput computing. Furthermore, it is also possible to achieve fine-grained cache consistency models through inter-proxy coordination mechanisms, which are presented in Section 4.3.

4.2.1.3 Application-tailored configurations

There are several DFSs that exploit the advantages of disk caching as well; for example, AFS [25] transfers and caches entire files in the client disk, and CacheFS [31] supports disk-based caching of NFS blocks. However, these designs require kernel support, and are not able to employ per-user or per-application caching configurations. In contrast, GVFS is unique in supporting customization on a per-user, per-application basis [94].

The GVFS sessions can employ disk caches independently from one another. The configurations of cache parameters (size, associativity) and policies (write-through, write-back) can be customized according to the data access patterns and requirements of applications. Specifically, for applications that use intensive writes, write-back caching can be employed to improve the application's data access performance; for read-mostly applications, the use of write-through can improve the tolerance of client-side failures and the maintenance of data consistency.
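A per-session cache configuration of the kind described here might look like the following sketch. The parameter names and syntax are hypothetical, chosen only to illustrate the knobs discussed above; Figure 4-6 in Section 4.3.1 shows an actual sample configuration file.

```
# hypothetical per-session GVFS proxy cache configuration
cache-path     /var/gvfs/session-42/cache
cache-size     2G            # bounded to trade storage cost for hit rate
associativity  8
write-policy   write-back    # or write-through for read-mostly sessions
```

Middleware would generate such a file when it establishes the session and pass it to the proxy client at startup.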


The size of a GVFS session's disk cache can also be customized according to its application's needs. For example, when the use of storage is not free, the application can balance between performance and cost by adjusting its disk cache size. On the other hand, when the available storage capacity is limited, the middleware can allocate the disk space among the caches of concurrent GVFS sessions according to the resource utilization policy, e.g., based on the priority of applications, or based on the profits generated by hosting the applications. See Section 6.2 for more discussion of policy-driven resource allocation for GVFS sessions. Another concrete example of application-tailored disk caching is enabling heterogeneous disk caching using metadata handling and application-specific knowledge, in order to support block-based caching for virtual machine disk state and file-based caching for virtual machine memory state, as discussed in Chapter 5.

While caches of different GVFS sessions are normally independently configured and managed, GVFS also allows them to share read-only cached data to save storage usage and exploit more data locality. Proxy clients running on the same host can access the shared caches directly. On the other hand, a series of proxies, with independent caches of different capacities, can be cascaded between client and server, supporting scalability to a multi-level disk cache hierarchy. For example, the proxy clients located in the same LAN can employ a dedicated cache server managed by an additional proxy that interposes between the proxy clients and servers. This forms a two-level hierarchy, with GBytes of capacity in a client's local disk to exploit locality of data accesses from the node, and TBytes of capacity available from a LAN disk array server to exploit locality of data accesses from clients in the same LAN. Such a setup is studied in Section 5.4.3 to support fast virtual machine cloning.

4.2.1.4 Evaluation

This subsection presents the experimental evaluation of GVFS with disk caching, using a prototype implemented based on the virtualization of NFS v2 and v3. The evaluation focuses on the performance in WAN, the target environment of GVFS. For


easy deployment and control of the testbed, VMware-based virtual machines were used to set up the file system clients and servers, and a network emulator (NISTNet) [95] was employed to emulate the wide-area links among them. Each virtual machine was configured with 1 CPU and 512MB memory and was installed with UBUNTU 7 with kernel 2.6.20. These virtual machines were hosted on a cluster, where each physical node has dual 2.4GHz hyper-threaded Xeon processors and 1.5GB memory. The system clock on a separate physical server was used to measure time, which suffices for the granularity required by this evaluation.

This experiment compares the NFS-based GVFS implementation with the native NFS. Unless otherwise noted, version 3 of NFS over TCP was used for both. The servers exported the file system with write delay and synchronous access. The native NFS daemons used the default configuration of 8 threads, whereas the GVFS proxies were also multithreaded and used 8 worker threads. (See Section 4.2.2 for a detailed discussion of GVFS multithreading.) The data block size for read and write RPCs was set to 64KB. No swap was used on the physical and virtual machines during the experiments. Every run was started with cold kernel buffer and disk caches (if used) by unmounting the file system and flushing the disk cache.

The experiment evaluates the throughput of GVFS with a typical file system benchmark, IOzone [88]. It was executed in the read/reread mode, which sequentially reads a 512MB file twice from the server. Since the client and server have only 512MB of memory, the buffer cache, with its LRU-based replacement, does not help for the benchmark's sequential reads. This experiment is designed to study the overhead of virtualization in wide-area environments with the read phase, and demonstrate the benefits of disk caching with the reread phase. On NFS, both phases need to fetch the file entirely across the network from the server's disk. With the disk cache's greater capacity, GVFS can satisfy the reread phase's data access locally without contacting the server.


Figure 4-2. Throughputs of IOzone's read phase on NFS v3, NFS v4, and GVFS, with different network latencies between the client and server. The standard deviations are all under 10% of the reported means.

The throughputs of the read phase on NFS v3, NFS v4, and GVFS are compared in Figure 4-2, with different round-trip times (RTT) between the client and server. When the network latency is relatively small (less than 20ms), GVFS' throughput is 16% less than NFS. Recall that in the LAN experiment discussed in Section 3.2.3, where the RTT is 0.071ms, the slowdown of GVFS is 25%. The longer network latency effectively diminishes the latency from the user-level virtualization and brings GVFS performance closer to NFS. When the RTT is beyond 40ms, GVFS performs as well as NFS.

On the other hand, GVFS substantially outperforms NFS by leveraging data locality with disk caching. This is demonstrated by the throughput of the reread phase shown in Figure 4-3. With the help of warm disk caches, the throughput on GVFS is not affected by the growing network latency, and in fact, it is only bounded by the bandwidth of the client's local disks. Consequently, the speedup with respect to NFS increases as the network latency grows, and GVFS is four times faster when the RTT reaches 80ms.


Figure 4-3. Throughputs of IOzone's reread phase on NFS v3, NFS v4, and GVFS, with warm caches, with different network latencies between the client and server. The standard deviations are all under 10% of the reported means.

4.2.2 Multithreaded Data Transfer

4.2.2.1 Design and implementation

The ability to overlap data request processing and data block transfer is important to the throughput of remote data access. Conventional DFSs often use multiple daemons to serve incoming data requests, so that multiple outstanding data requests can be handled at the same time while waiting for the data accesses to complete on disks. For example, on a typical NFS server, 6 to 8 NFS daemons are usually running to serve the RPC requests from clients. This ability is even more critical in a wide-area environment, where the network latency is very high but the network bandwidth is sufficient to transfer data for multiple requests. If at any given time only one data request could be sent across the network, the remote data access would be significantly slowed down even though the network bandwidth is highly underutilized. Therefore, while virtualizing a conventional DFS such as NFS, the GVFS proxies need to be able to service data requests in a non-blocking manner in order to improve the remote data access throughput.
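One standard way to obtain this overlap in a user-level proxy is a dispatcher thread feeding a bounded queue of blocking worker threads, which is the organization GVFS adopts, as described next. The following is a minimal POSIX-threads sketch of that producer-consumer structure; the request type, queue depth, and served counter are illustrative stand-ins, not the proxy's actual TI-RPC handling.

```c
#include <assert.h>
#include <pthread.h>

#define QUEUE_DEPTH 64
#define NUM_WORKERS 8

/* Hypothetical stand-in for a queued NFS RPC request. */
typedef struct { int id; } rpc_req;

static rpc_req queue[QUEUE_DEPTH];
static int head, tail, count, done;
static pthread_mutex_t lock     = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  nonempty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t  nonfull  = PTHREAD_COND_INITIALIZER;
static int served;  /* protected by lock; stands in for replies sent */

/* Dispatcher side: receive a request and queue it, blocking only when full. */
static void enqueue(rpc_req r)
{
    pthread_mutex_lock(&lock);
    while (count == QUEUE_DEPTH)
        pthread_cond_wait(&nonfull, &lock);
    queue[tail] = r;
    tail = (tail + 1) % QUEUE_DEPTH;
    count++;
    pthread_cond_signal(&nonempty);
    pthread_mutex_unlock(&lock);
}

/* Worker side: each thread handles one request at a time in a blocking
 * manner, but the proxy as a whole keeps many requests in flight. */
static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !done)
            pthread_cond_wait(&nonempty, &lock);
        if (count == 0 && done) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        rpc_req r = queue[head];
        head = (head + 1) % QUEUE_DEPTH;
        count--;
        pthread_cond_signal(&nonfull);
        served++;   /* stand-in for forwarding r to the server and replying */
        (void)r;
        pthread_mutex_unlock(&lock);
    }
}
```

Varying NUM_WORKERS in such a structure is also what makes the per-session bandwidth throttling discussed later in this section possible: fewer workers mean fewer requests in flight.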


To achieve this goal, the GVFS proxy is enhanced by making it capable of multithreading. Specifically, the GVFS' NFS daemon consists of multiple threads which work around an RPC queue. A dispatcher thread is responsible for receiving RPC requests from the client and putting them in the queue. The other worker threads concurrently retrieve the requests from the queue, send them out to the server, and return the results to the client. In this way, even though every worker thread can only handle a single RPC request in a blocking manner, the entire proxy processes the requests in a non-blocking manner.

The prototype of the multithreaded GVFS proxy is developed on Linux, and its implementation is not trivial due to the fact that the standard RPC library is not multithreading-safe (MT-safe). Programs that make use of RPC on Linux typically utilize the existing RPC library provided by the standard C library, but a program cannot work correctly if it uses multiple threads to issue and service RPC requests. This is caused by the fact that certain data handling structures in the RPC library are shared, and thus conflicts happen when multiple threads access them concurrently. To address this problem, the prototype uses the TI-RPC library [96] to provide the RPC functionality. This library is an improved version of the existing RPC library in Linux, which provides generic RPC functionality to applications independently of the underlying transport protocols. The existing library is transport-dependent and will eventually be replaced by the TI-RPC library. Note that the TI-RPC library is still not completely MT-safe; the proxy employs techniques to further improve upon that and make itself MT-safe by replicating the shared data processing structures in the proxy, making sure that each thread has its own dedicated copy to work with.

The use of multithreading in the GVFS proxy also allows a GVFS session to customize its bandwidth usage according to its needs. Although the native NFS server is also multithreaded and can serve multiple clients' data accesses at the same time, it is not possible to isolate them from each other and control the bandwidth consumed by each client. In contrast, with a multithreaded proxy, a GVFS session's bandwidth usage can be


Figure 4-4. Throughput of IOzone with the number of GVFS worker threads varying from 0 to 16. The dispatcher thread is responsible for queuing the incoming RPC requests, whereas the worker threads are responsible for issuing the queued RPC requests to the server. When the number of worker threads is 0, the proxy is in fact not multithreaded and it blocks on every RPC request until the remote call is completed.

flexibly controlled by tuning the number of threads used by the proxy. As discussed for the customization of cache size in Section 4.2.1.3, the ability to control a GVFS session's bandwidth usage is important in two respects. First, it is important for an application to trade data access throughput for other considerations, e.g., cost, when the application has to pay for the resource usage. Second, it is necessary to allocate the shared network bandwidth resource among the concurrent DFSs based on policies, so that the resource provider can optimize its resource provisioning.

4.2.2.2 Evaluation

This subsection uses an experiment to demonstrate the efficiency of multithreaded data transfer and the effectiveness of using the number of threads to control the bandwidth usage. The experiment was conducted on a Gigabit Ethernet where the file system client and server were set up on two virtual machines. Each virtual machine


was configured with 1 CPU and 512MB memory, and was installed with UBUNTU 7 with kernel 2.6.20. They were hosted on two physical servers, where each has a 3.0GHz Pentium D processor with 4GB of memory. The experiment evaluates the throughput of GVFS with a typical file system benchmark, IOzone [88]. The benchmark was executed in the read mode, which sequentially reads a 512MB file from the server.

Figure 4-4 compares the throughputs of IOzone with the number of GVFS worker threads varying from 0 to 16. The dispatcher thread is responsible for queuing the incoming RPC requests, whereas the worker threads are responsible for issuing the queued RPC requests to the server. When the number of worker threads is 0, the proxy is in fact not multithreaded and it blocks on every RPC request until the remote call is completed. The results show that the throughput grows practically linearly as the number of worker threads increases up to 8. However, when the number of worker threads goes beyond 8, the throughput slightly decreases. This is because 8 worker threads are sufficient to handle the incoming requests and sustain the maximum throughput in this setup, whereas more threads only cause more overhead from multithreading and degrade the overall performance. Nonetheless, these results prove that controlling the number of threads between 0 and the number needed for the maximum throughput can be an effective way to throttle the throughput of a GVFS session.

4.3 Consistency

As DFS clients widely employ local caches for performance improvement, cache consistency becomes a problem in that stale data may be seen by a client after another client has modified it. This happens when a client reads a stale copy of the data from its cache or when a client does not propagate its modified copy of data in a timely manner. To address this problem, a DFS needs to provide proper cache consistency semantics to the clients in order to support concurrent data sharing and deliver the application-desired data access behaviors. As discussed in Section 2.3.2, the definition of consistency in this dissertation only considers the order of read and write operations on


a single data item (e.g., a file). Although it is a weaker form of consistency compared to models such as sequential consistency, which specify constraints on the ordering with respect to the entire data set, it is sufficient to satisfy the needs of many applications. On the other hand, for applications that do require the stronger form of consistency, GVFS supports the use of file locking mechanisms to achieve that.

4.3.1 Architecture

The choice of a consistency model in a DFS is an important and difficult one, because it has implications on the complexity of developing applications and the DFS itself, and on the performance of applications. A relaxed model may be acceptable and desirable to a simulation application, but may fall short of supporting database applications that rely on locks. A complex protocol that implements strong consistency may be desirable if it delivers high performance, but undesirable if it is difficult to implement, test, and deploy in existing O/Ss, or if it requires applications to use a consistency-aware API.

A cache consistency protocol describes the implementation of a specific consistency model. For wide-area DFSs, such a protocol is important not only to the correct execution of the application, but also to its data access performance, because the client-server interactions needed to maintain the consistency according to the protocol are very expensive on WAN. If DFSs are capable of leveraging application knowledge, the number of network transactions can be reduced, thereby reducing server loads and average request latencies.

The architecture described in this dissertation enables applications to use consistency protocols better suited than those native to a DFS, in a manner transparent to the kernel and applications. In this architecture, illustrated in Figure 4-5, different consistency protocols can be overlaid upon the native DFS consistency mechanisms, and be applied to data sessions selectively and independently, based on the virtualization in GVFS. For example, the native NFS protocols (v2 and v3) mainly rely on client-initiated revalidation requests to check for consistency. A proxy client hides the kernel client's consistency


Figure 4-5. Application-tailored cache consistency protocols on GVFS sessions. The sessions consist of virtual clients (VC1–VC5) and servers (VS1–VS3) implemented by user-level proxies. They are dynamically established and managed by middleware and are overlaid upon shared physical resources (C1–Cn, S1, S2). Each GVFS session can employ an independent, application-tailored user-level disk caching and consistency protocol. E.g., Session 1 applies the delegation-callback based protocol (Section 4.3.3) and supports a scenario where real-time data are collected on-site (VC1) and processed off-site (VC2); Session 2 uses the invalidation-polling protocol (Section 4.3.2) to enable read-only sharing of a software repository (VS2) among WAN users (VC3, VC4) and maintenance update by a LAN administrator (VC5).

checks from the data session by serving them locally, and it instead uses the user-level mechanisms to cooperate with the proxy server to keep data consistent across the network. In this way, kernel clients and servers are oblivious to these user-level protocols; the GVFS proxies, however, can be configured to maintain the consistency for the data sessions according to their selected protocols.

A variety of protocols are supported in GVFS, including the underlying DFS consistency itself, and more importantly, alternative ones that are specially designed for wide-area environments. These customized protocols are explained in detail in the rest of this section. They can achieve different levels of consistency, and can be selected and tuned by middleware based on the application needs on a per-session, per-application


Figure 4-6. Sample configuration file used to customize a GVFS proxy cache as well as the consistency protocol. The parameters include the path and session ID of the proxy cache; the size, associativity, and bank numbers of the attribute cache (acache) and data cache (dcache); and the use of write-back and invalidation-based consistency.

basis. Figure 4-5 illustrates two GVFS sessions that are customized to support data provisioning for two example applications described in Section 4.1.

Two components are key to realizing GVFS-based application-tailored cache consistency over a wide-area environment: (1) file system proxies providing per-application GVFS sessions and enhanced with user-level disk caching and consistency; and (2) a middleware-based service that schedules the GVFS sessions and configures their use of caching and consistency according to application needs. Figure 4-6 shows an example of a configuration file used to customize a proxy cache as well as the consistency protocol. The parameters include the path and session ID of the proxy cache; the size, associativity, and bank numbers of the attribute cache (acache) and data cache (dcache); and the use of write-back and invalidation-based consistency. Such a configuration file is used by middleware to establish an application-tailored GVFS session. This chapter focuses on


Figure 4-7. A GVFS session using the invalidation polling consistency protocol between the proxy clients at C1, C2 and the proxy server at S1. The RPCs issued from kernel NFS clients can be served from the disk caches, while the proxy clients poll the proxy server for contents of per-client invalidation buffers (BC1, BC2) to maintain consistency.

the core mechanisms supporting the first component. The mechanisms to schedule and configure GVFS sessions on demand will be investigated in Section 6.1.2.

4.3.2 Invalidation Polling Based Cache Consistency

4.3.2.1 Protocol

This protocol employs invalidation buffers that reflect potential modifications to many files, in order to reduce the rate at which per-file information is polled. Such an approach proves effective when modifications to the file system are infrequent and need to be quickly propagated to clients. The approach is illustrated in Figure 4-7: the proxy server of a GVFS session keeps track of logically timestamped file handles that need to be invalidated in per-client buffers; the proxy clients use a new protocol message, GETINV, to request information related to the invalidation buffer.

Server-side: When the proxy server receives a file modification request from a client (e.g., CREATE, WRITE), it adds the file's NFS file handle into the other clients'


invalidation buffers, since this file needs to be invalidated from the other clients' caches later. These per-client invalidation buffers are of finite size and implemented as circular queues. Multiple invalidations to the same file in an invalidation buffer can be coalesced in order to save space. The timestamps associated with each invalidation entry are generated by the server and increase monotonically with incoming requests. A proxy client's GETINV request contains the timestamp of the last invalidation it has performed, and the proxy server returns the file handles stored in the client's invalidation buffer, which represent the files that the client needs to invalidate in its cache. The proxy server also returns its current, updated timestamp to the proxy client, which will be used in the client's next GETINV request. The server can handle protocol cases where invalidation information is not fully available by using a flag (force-invalidate) to inform the client to invalidate its entire attribute cache. The proxy server processes a GETINV call as follows:

1. If this is the first GETINV call received from the client: initialize an invalidation buffer for the client, and return an updated timestamp and a force-invalidation flag with value 1. Else,

2. If the timestamp in the GETINV request is earlier than the earliest one in the client's invalidation buffer: flush the buffer, and return an updated timestamp and a force-invalidation flag with value 1. Else,

3. Return the buffer contents and clear them, along with an updated timestamp and a force-invalidation flag with value 0. If the buffer contents do not fit in a single RPC message, then return a poll-again flag with value 1 along with partial buffer contents.

Client-side: The proxy client polls the server with GETINV calls for potential invalidations that occurred since its last known timestamp, within a short time window. The polling time window can be fixed or varied within a configurable range using an "exponential back-off" style policy: the window size doubles every time there are no invalidations during the previous window, until it reaches the maximum value, and it drops to the minimum value if invalidations have happened during the previous window. The received invalidations are performed by invalidating the cached attributes of the


concerned files, which will cause the proxy client to revalidate these files when they are accessed again. The proxy client processes the result from the GETINV call as follows:

1. Update a local variable holding the last known server timestamp.

2. If force-invalidation is equal to 1: invalidate its entire attribute cache. Else,

3. Scan the returned buffer and invalidate the attributes of the concerned files in its cache. And,

4. If poll-again is equal to 1: send another GETINV call to the server immediately.

In summary, only the file modifications observed by the proxy server cause invalidations on the proxy clients, and they are transferred in a small number of GETINV replies. Only the files that are modified by the other clients during the past polling time window need to be revalidated by a proxy client, but all the other per-file consistency checks issued from the kernel NFS client will be filtered out during the next time window.

4.3.2.2 Bootstrapping

The protocol uses logical timestamps to manage invalidation buffers. These are created by the proxy server and used as arguments to GETINV calls by the proxy client. The bootstrapping mechanism that provides an initial timestamp to a client uses a GETINV call with a null argument. Another form of bootstrapping takes place if the server fails or restarts and loses timestamp information. In this case, a client has a timestamp which is invalid and must obtain a new, valid timestamp. The server handles this case by returning a new timestamp and a force-invalidation flag to each client's first GETINV after it comes back.

4.3.2.3 Failure handling

The main factor that facilitates failure handling in this protocol is that the state stored on the proxy server (invalidation buffers, timestamps) and clients (cached attributes and timestamps) is soft state which can be safely discarded. If the server crashes, once recovered it can initialize new invalidation buffers from scratch, bootstrap clients with new timestamps as described above, and continue to serve the clients' GETINV calls.
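The server-side GETINV handling in Section 4.3.2.1, including the wrap-around detection used in the failure cases above, can be sketched as follows. This is a simplified model, not the GVFS proxy code: the file handle is reduced to an integer key, the buffer depth is fixed and small, and the dropped_upto field is an assumed bookkeeping device for detecting that a client has fallen behind entries lost to wrap-around.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define INVBUF_DEPTH 8   /* per-client circular invalidation buffer */

typedef struct {
    uint64_t fh;          /* simplified stand-in for an opaque NFS file handle */
    uint64_t stamp;       /* logical timestamp assigned by the server */
} inv_entry;

typedef struct {
    inv_entry buf[INVBUF_DEPTH];
    int head, count;
    uint64_t now;          /* server's monotonically increasing logical clock */
    uint64_t dropped_upto; /* stamp of the newest entry lost to wrap-around */
} inv_buffer;

/* Record that `fh` was modified; coalesce with an existing entry if present,
 * and drop the oldest entry (remembering its stamp) when the queue is full. */
static void inv_record(inv_buffer *b, uint64_t fh)
{
    b->now++;
    for (int i = 0; i < b->count; i++) {
        inv_entry *e = &b->buf[(b->head + i) % INVBUF_DEPTH];
        if (e->fh == fh) { e->stamp = b->now; return; }   /* coalesce */
    }
    if (b->count == INVBUF_DEPTH) {                       /* wrap-around */
        b->dropped_upto = b->buf[b->head].stamp;
        b->head = (b->head + 1) % INVBUF_DEPTH;
        b->count--;
    }
    b->buf[(b->head + b->count) % INVBUF_DEPTH] = (inv_entry){ fh, b->now };
    b->count++;
}

/* Serve a GETINV: copy out entries newer than the client's timestamp.
 * Returns 1 (force-invalidate) when entries the client has not yet seen
 * were dropped by wrap-around, so it must flush its whole attribute cache. */
static int inv_getinv(inv_buffer *b, uint64_t client_stamp,
                      uint64_t *out, int *nout, uint64_t *new_stamp)
{
    *new_stamp = b->now;
    *nout = 0;
    if (client_stamp < b->dropped_upto) {
        b->head = b->count = 0;      /* flush buffer: step 2 of the protocol */
        return 1;
    }
    for (int i = 0; i < b->count; i++) {
        inv_entry *e = &b->buf[(b->head + i) % INVBUF_DEPTH];
        if (e->stamp > client_stamp)
            out[(*nout)++] = e->fh;
    }
    b->head = b->count = 0;          /* entries delivered; clear them */
    return 0;
}
```

Because all of this state can be rebuilt from scratch, a server that crashes simply zeroes the structure on recovery and bootstraps each client with a fresh timestamp and a force-invalidation flag, exactly as described above.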


If a client crashes and loses its timestamp, then after recovery it issues GETINV with a null argument, and the server returns the latest timestamp with a force-invalidation flag. The same mechanism can be used if the client implements a policy to limit the number of invalidations it should process and bound the overhead from performing the individual invalidations, effectively allowing the client to force a self-invalidation on its entire attribute cache.

If a network partition happens, it is possible that the server's circular invalidation queue for the client has wrapped around by the time it receives a GETINV from the client again. The server can detect this case by comparing the client's current timestamp with the earliest timestamp in its buffer. If the former is earlier, it means that the client has not kept up with the invalidations, so the server should return the force-invalidation flag and an updated timestamp.

The invalidation mechanism is intended to provide relaxed consistency for the benefit of performance, but inconsistency can occur during the polling time window: a client may read a stale data block or file handle. It is appropriate for applications that can tolerate modest inconsistency with help from the user or middleware. In practice, write sharing happens much less often than read sharing. Therefore this protocol is capable of providing applications with good performance and acceptable consistency. However, if stronger consistency is required, the delegation callback based protocol described below is better suited.

4.3.3 Delegation Callback Based Cache Consistency

4.3.3.1 Delegation

Strong consistency can be achieved in GVFS via delegation and callback mechanisms. A delegation gives a proxy client the guarantee that it can perform operations on the cached data without consistency compromises, whereas a callback is used by a proxy server to revoke the delegation in order to avoid potential conflicts. Delegation and callback decisions are made by the proxy server on a per-file basis. A GVFS session can realize strong consistency by


disabling the kernel NFS client's attribute cache to force revalidations of every accessed file, and enabling the GVFS cache's delegation callback protocol to handle consistency enforcement. Two types of delegations are provided. A read delegation allows a client to read cached data without revalidation; the periodic consistency checks issued by the kernel NFS client can be fully handled at the client side. With a write delegation, the proxy client can further delay writes; both read and write requests to the file can be satisfied from the GVFS cache without contacting the server.

Figure 4-8. A GVFS session using the delegation callback based consistency protocol between the proxy clients at C1, C2 and the proxy server at S1. The figure shows the sequence of interactions that occur during a read delegation and its callback.

In the absence of open and close file operations in NFS v2/v3, a proxy server speculates about these operations by tracking a client's data accesses. When a read or write request is received, the corresponding file is considered "opened" by the client. In a read sharing scenario, multiple clients can have read delegations on the same file at the same time. But a write delegation can be granted only if no other clients have the file opened. When there is no sharing conflict, a client obtains a delegation automatically with its first


read/write request for the file. Otherwise, the conflicting request triggers the proxy server to recall the file's existing delegations and make it temporarily non-cacheable until the conflict is resolved.

On the other hand, when a file has not been accessed by a client for a while, it is speculated to be closed by the client, and the proxy server issues a callback if this client has a delegation on the file. To allow a client to automatically renew a delegation, the proxy client periodically lets a request for the file bypass the cache. The delegation's expiration and renewal periods are both configurable per session, e.g., 10 minutes and 8 minutes respectively. The callback ensures the correctness of consistency even if the clocks on the server and client are badly skewed.

For the above-mentioned proxy server-to-client interactions, the delegation and cacheability decisions are either piggybacked on a native NFS reply message or enclosed in the GVFS callback calls. Figure 4-8 shows an example of these interactions.

4.3.3.2 Callback

A callback requires a server-to-client RPC call, which is inherently supported in GVFS because a proxy works as both RPC client and server. A proxy client encapsulates its listening port number along with its identification in regular RPC requests, so the proxy server knows how to connect to an authenticated client for callbacks. To avoid deadlocks, the proxies are multithreaded to serve both NFS RPCs and GVFS callbacks. Correspondingly, separate queues are maintained to buffer these two types of calls.

A callback of a read delegation invalidates the file's attributes in the proxy client's cache, which causes revalidation of the file's cached data when they are accessed again, whereas a callback of a write delegation also forces the write-back of cached data modifications. In a simple implementation, the callback does not return until all the data have been submitted to the server. However, the volume of dirty data can be very large, and thus the callback, as well as the other client's request which triggered it, may be blocked for a long time and eventually time out. Note that this is still safe


because both NFS and callback requests can simply be retried. But it is not desirable if the application which waits on the write-back perceives a substantial response delay.

Since a request for a single block does not have to wait for the entire file to be written back, the protocol is optimized as follows. If the number of cached dirty blocks is considerably large (e.g., more than 1K blocks), the proxy client returns a list of these blocks' offsets for the received callback. The block that is requested by the other client is immediately written back if it is indeed dirty, but the other blocks are submitted afterwards. To realize this, the requested block's offset is sent along with the file's NFS file handle in the callback. Upon receiving the block list and the first block, the proxy server considers the write delegation revoked. However, it needs to monitor the progress of the write-back and update the list accordingly until it completes. Meanwhile, requests from other clients for the blocks that are not yet written back will still generate callbacks to force the client to submit them promptly.

4.3.3.3 State maintenance

The proxy server manages a GVFS session's state using a list of participating clients and a hash table of opened files. Client identification is provided by a unique session key or distinguished name encapsulated by the proxy client in every RPC request (see Section 4.4 for details on the authentication mechanisms). The client list stores the clients' IDs and callback ports. Each opened file has an entry in the hash table to record its current state and the sharers' client IDs. A timestamp is also kept along with each client ID and updated every time the file is accessed by the client, which is used to speculate on the file close. Once the file is considered closed by a client, the client's information is removed from the file's entry; an entry is deleted from the hash table when the file is no longer opened by any client.

The expiration time determining whether a client has closed a file presents a tradeoff. Its value cannot be too small; otherwise delegations are given out too often, which requires many callbacks to maintain consistency. In contrast, a long expiration time


prevents a client from obtaining delegations and can potentially hurt its performance, while the proxy server also needs to track a large number of files and sharers. In the latter case, the proxy server can reduce the amount of state by proactively issuing callbacks on the least recently accessed files and then evicting their entries.

4.3.3.4 Failure handling

Failure handling is more important to this consistency protocol than to the invalidation polling based protocol, because the state stored at the proxy server is crucial to strong consistency. But delegations also give the proxy clients opportunities to continue serving applications' requests from locally cached data even in the presence of a server crash or network partition. After the server comes back, it can reconstruct the session's state by issuing special callbacks to all the known participating clients. To realize this, the client list data structure mentioned above is always stored directly on disk.

This type of callback is different because it targets the entire cache rather than a specific file. The read delegation holders will invalidate all the cached attributes and thus require revalidation of every cached file when it is reaccessed. A proxy client that has write delegations will also reply to the callback with a list of locally modified files so that the proxy server can rebuild the hash table. Note that until every client has answered the callback, the proxy server should block all incoming requests. However, this grace period is quite short because it only requires a single multicasted callback to the clients. If the callback to a client times out, the client is assumed to have failed and is not considered any more.

The nature of disk caching guarantees that a proxy client does not lose anything after it recovers from a failure, and it can easily reconstruct the list of dirty blocks by scanning the entire cache once. However, it needs to contact the server to reconcile any inconsistency that occurred during its crash. Therefore, it invalidates the entire attribute cache to force revalidation of all the cached files. In addition, for a file that has modifications delayed in the cache, the proxy client checks with the proxy server whether


the file was updated by other clients during the crash. If not, the proxy client writes back the cached modifications; otherwise, those data are considered stale and are discarded. Note that if the user or middleware that manages the session makes sure that no write sharing happens during the client's crash, then no data loss will occur after the client is recovered. On the other hand, this protocol also allows the client/middleware to quickly give up on the failed client and redo the data operations on the session from another client.

4.3.4 Evaluation

4.3.4.1 Setup

The proposed GVFS cache consistency protocols are evaluated in this section using experiments on both microbenchmarks and application benchmarks. Microbenchmarks exercise GVFS with simple programs to demonstrate its performance compared to conventional DFSs, whereas application benchmarks use real scientific tools to investigate GVFS in typical grid computing scenarios.

The emphasis of the experiments is on wide-area environments, which were emulated using NIST Net [95]. Each link between the file system clients and the server was configured with a typical wide-area RTT of 40ms and bandwidth of 4Mbps. Six file system clients and one file system server were set up on VMware-based virtual machines, which were hosted on two physical servers. Each physical server has dual 2.4GHz hyper-threaded Xeon processors and 1.5GB of memory. Each virtual machine was configured with 256MB of memory and was installed with SUSE Linux 9.2. The use of a network emulator and VMs facilitates the quick deployment of a controllable, duplicable experimental setup. However, timekeeping within a virtual machine is often inaccurate, so the system clock on a physical server was used to measure time, which suffices for the granularity required by this evaluation.

The experiments were mainly conducted on file systems mounted through native NFSv3 and NFSv3-based GVFS, both with ACL disabled. The file system server exported the file system with write delay and synchronous access. Every experiment was started


with cold kernel buffer and GVFS disk caches, by unmounting the file system and flushing the disk cache.

Figure 4-9. (a) Numbers of RPCs transferred over the network during the execution of the Make benchmark. (b) Runtimes of the Make benchmark. The benchmark was executed on different setups: NFS (NFS), GVFS with read-only disk caching (GVFS), and GVFS with write-back disk caching (GVFS-WB).

4.3.4.2 Make

The first benchmark demonstrates the performance edge of GVFS caching in a single-client scenario. It runs "make" on an application's source code (Tcl/Tk 8.4.5), similar to the Andrew benchmark [25]. The make takes 357 C sources and 103 headers to generate 168 objects. It was executed on a file system mounted from the server via three different setups: native NFS (NFS), GVFS with read-only caching (GVFS), and GVFS with write-back caching (GVFS-WB). This benchmark mainly exercises the GETATTR, LOOKUP, READ, and WRITE RPCs. The execution times and RPC counts are reported in Figure 4-9.


The data from the executions on NFS show that, although the make only accesses hundreds of files, it generates tens of thousands of cache consistency checks (GETATTR calls) in the process of cross-referencing existing files to produce new objects. However, the results from GVFS prove that the disk cache can virtually satisfy all of them with its consistency protocol. When the client uses invalidation polling with a typical period (e.g., 30 seconds), only tens of GETINV calls are required; with delegation callback based consistency there are no extra calls. The larger capacity of the disk cache also substantially reduces the number of LOOKUP calls, and the use of write-back further decreases the numbers of READs and WRITEs. Consequently, in the WAN environment GVFS runs the benchmark three times faster than NFS. The runtime of the benchmark in a 100Mbps LAN was also measured and is reported in the figure. It shows that GVFS is relatively slower than NFS in the LAN due to the overhead from RPC interception and cache management. However, this overhead is very small: only 4% with read-only caching and 8% when write-back is also used, and as network latency grows it should be overcome by the gain from saving network trips, as confirmed by the next experiment.

4.3.4.3 PostMark

The second experiment uses the PostMark [90] benchmark, which simulates the workloads of email, news, and web commerce applications. The experiment with PostMark was conducted on NFS and GVFS in different network environments by varying the end-to-end latency. Two GVFS setups that employ different cache consistency protocols are used: GVFS-inv uses the invalidation-polling protocol, overlaid upon the default kernel NFS cache configuration; and GVFS-cb disables the kernel attribute caching and applies the delegation-callback protocol. The benchmark was configured with the following parameters: the initial numbers of directories and files are 20 and 200, the file sizes range from 512B to 16KB, and the number of transactions is set to 2000.

The total execution times of PostMark on the above setups with different network latencies are compared in Figure 4-10, and the runtimes of the benchmark's various phases


for the case of 80ms RTT are plotted in Figure 4-11. The results show that both GVFS setups substantially outperform NFS in typical WAN environments, and the speedup is about 2-fold when the RTT is beyond 20ms. This can be attributed to the significant amount of client-server interactions saved by using the GVFS cache consistency protocols. Specifically, when the RTT is 80ms, the number of RPCs received by the server during the execution on GVFS is 47% and 37% of that on NFS v3 and v4, respectively.

Figure 4-10. Runtimes of PostMark with different network latencies over NFSv3 (NFS3), NFSv4 (NFS4), GVFS with the default kernel buffer setup (GVFS-inv), and GVFS with kernel attribute caching disabled (GVFS-cb).

Because GVFS-cb has the kernel attribute caching disabled, it needs to handle more requests from the kernel client and thus is relatively slower than GVFS-inv. This is the price it has to pay in order to realize strong consistency, but the slowdown is very small and its performance is still considerably better than the native NFS protocols. Note that the NFSv4 implementation is still at the experimental stage, and future improvements may support more aggressive caching and deliver better performance. Nonetheless, such support will still be deeply embedded in the kernel, and it is unlikely to be tuned or modified according to the needs of specific applications.


Figure 4-11. Runtimes of the different phases of the PostMark benchmark over NFSv3 (NFS3), NFSv4 (NFS4), GVFS with the default kernel buffer setup (GVFS-inv), and GVFS with kernel attribute caching disabled (GVFS-cb). The network RTT is 80ms.

4.3.4.4 Lock

The third benchmark studies the behavior of GVFS' different consistency protocols when supporting cooperative, multiple-client workloads. It uses a popular mutual exclusion mechanism on file systems: file-based locks. In the experiment, six distributed clients compete for a lock by creating an independent temporary file and attempting to hard-link it to the shared lock file. If a client gets the lock, it pauses for a period of ten seconds and then releases the lock by unlinking the lock file. Otherwise, it pauses for a second and tries for the lock again until it gets the lock. After a client releases the lock, it also pauses for a second and then rejoins the competition, until it has succeeded ten times.

This experiment was conducted in the WAN with different consistency protocols. It serves as a good example of the tradeoff between consistency and performance. When consistency is relaxed, a client may not see the release of the lock immediately, and the previous owner of the lock tends to get it again. On the other hand, stronger consistency


provides better fairness among the clients but also consumes more bandwidth and generates higher server loads due to the use of more consistency calls. To demonstrate the first case, the benchmark was executed on NFS and GVFS, both with a revalidation/invalidation period of 30 seconds (NFS-inv and GVFS-inv). For the second case, the experiment was conducted with NFS with no attribute cache (NFS-noac) and GVFS with delegation and callback (GVFS-cb).

Figure 4-12. (a) Numbers of RPCs transferred over the network during the execution of the Lock benchmark. (b) Runtimes of the Lock benchmark. The benchmark was executed across the WAN with different setups: NFS with a 30s revalidation period (NFS-inv), NFS with no attribute cache (NFS-noac), GVFS with a 30s invalidation period (GVFS-inv), GVFS with delegation and callback (GVFS-cb), and AFS. The RPC counts used by AFS are not shown because it uses a different RPC protocol and is not comparable.

By analyzing the distribution of lock acquisitions from the experimental results, it is confirmed that fairness can be achieved with the strong consistency models but not with the weak ones. Further, Figure 4-12 shows that, in the latter case, the benchmark takes


nearly twice as long to execute, also because of the delay before a lock release is observed by other clients.

The overhead involved in achieving the same level of consistency is significantly different between NFS and GVFS. GVFS' client polling protocol uses 44% fewer consistency checks (GETATTR, GETINV) than NFS (GETATTR). In the stronger consistency case, the difference between NFS and GVFS is even more dramatic. The consistency related calls issued by NFS (GETATTR) outnumber those of GVFS (GETATTR, CALLBACK) by more than 10-fold. Hence substantial bandwidth and load are saved by using GVFS. Note that although the benchmark runs faster on GVFS than on NFS, the advantage is not as large as in the number of RPCs. This is because most of the extra RPCs' latencies are overlapped with the lock owners' pausing times during the execution.

As a reference, another traditional DFS that delivers strong consistency, AFS (OpenAFS 1.2.11) [25][26], was also tested with the benchmark. The above experiments prove that GVFS can flexibly and efficiently provide different application-tailored consistency protocols, which is difficult to achieve with traditional DFSs.

4.3.4.5 Software repository

The wide-area shared software repository scenario discussed in Section 4.1 is studied here with NanoMOS, a 2-D n-MOSFET simulator. This is a compute-intensive application and benefits from parallel execution on grid resources. A wide-area file system supports it by allowing the WAN users to read-share the application and its required software, including MATLAB with the MPI toolbox (MPITB), while allowing the local administrator to maintain the repository at the same time. In the experiment, NanoMOS was stored along with the other software in the repository on the server, and it was executed in parallel on six clients for eight consecutive iterations. The repository was maintained by the administrator from another client, and between the fourth and fifth runs a software update was performed on the repository. Two different cases of updates were considered: an update to MPITB only and to the entire MATLAB package. The repository was shared


among all the clients via native NFS or GVFS with invalidation polling based consistency. The runtimes of the NanoMOS executions are shown in Figure 4-13.

Figure 4-13. Runtimes of the parallel NanoMOS benchmark accessed from a wide-area shared software repository. The benchmark was executed in parallel on six clients, while the repository was maintained by the administrator from another client. The repository was shared among all the clients via native NFS or GVFS with a 30s invalidation period. Between the 4th and 5th runs an update happened on: (a) the entire MATLAB directory; (b) the MPITB directory only.

NanoMOS' working dataset is relatively small (about 30MB per client), so both the NFS and GVFS clients can cache it and reduce the runtime from the second run on. But the difference is that the NFS client has to frequently check consistency for the cached data (about 2.7K GETATTRs per client per run), which can be almost eliminated by the GVFS client with its cache consistency. As a result, GVFS delivers more than a 2-fold speedup compared to NFS. When an update happens, the NFS client cannot know how many files are affected (the MATLAB package consists of 14K files/directories, but MPITB has only 540), so it always has to issue the same volume of consistency checks


for the entire package. However, the GVFS client only uses invalidations proportional to the size of the update and batches them together in a few transactions (the MATLAB update needs about 30 GETINV calls per client; the MPITB update needs only two calls per client).

Figure 4-14. Runtimes of consecutive executions of the CH1D data-processing program. Data generation and processing were performed across the WAN, where data were shared via native NFS or GVFS with delegation callback based consistency. The data-processing program started each run with 30 more input files from the data-producing program.

4.3.4.6 Scientific data processing

Another benchmark uses a coastal ocean hydrodynamics modeling application, CH1D, to model the distributed scientific data processing scenario discussed in Section 4.1: real-time data are accumulated at coastal observation sites and meanwhile processed at off-site computing centers. A wide-area file system helps the programs to share data naturally without explicitly transferring data back and forth. In the experiment, the data-processing program ran consecutively 15 times, where each run was started with 30 more input files from the data-producing program. The data were shared between the programs via native NFS or GVFS with the delegation and callback consistency protocol.


Similar to the previous benchmark, the input dataset of the data-processing program is small, and even 15 runs of data can still fit into its kernel client's memory buffer. However, as the dataset grows, the amount of consistency that the kernel client has to maintain also increases accordingly. The runtimes of the data-processing program (Figure 4-14) clearly demonstrate this trend: the overhead from maintaining cache consistency increases linearly with the size of the dataset. In contrast, with GVFS' consistency protocol this overhead is much smaller and remains practically constant for each run (only 30 callbacks). Accordingly, the performance speedup achieved by GVFS also grows as the dataset does, and at the 15th run the benchmark already runs 5 times faster than on NFS.

4.4 Security

Security has always been one of the most important concerns for data management in grid-style environments, where data are shared across organizations and domains with limited mutual trust, and stored and transferred on resources with limited security. However, conventional DFSs designed for use on LANs support only weak security, because they are often deployed in relatively more trustworthy environments. For example, NFS v2 and v3 typically employ UNIX-style authentication with user and group identifiers (IDs), which is difficult to use across domains. It also does not provide privacy and integrity, and thus NFS RPC messages can be easily spoofed, altered, and forged. Several WAN-oriented DFSs (AFS [25][26], Coda [27]) provide strong security mechanisms, but they need a complex security infrastructure (Kerberos [41]) in place, which requires substantial administrative work from the involved domains.

In addition, conventional DFSs are not designed to support application-tailored, dynamically configurable security mechanisms and policies. Nonetheless, in a grid system, virtual organizations are dynamically established, applications and services are dynamically initiated, and entities and trust are dynamically created. Applications and their execution environments also have very diverse needs for security. In some cases, a cross-domain user and group ID mapping is sufficient for authentication and authorization,


whereas in others, a security token that uniquely identifies a user across the distributed systems is necessary. To some applications, privacy is not important, and thus encryption can be avoided to improve performance, whereas others may access highly sensitive data and need strong encryption mechanisms that require substantial computation.

Figure 4-15. Private grid file system built upon GVFS virtualization, SSH tunneling, and session-key based cross-domain authentication.

Hence, strong security is needed for wide-area DFSs, whereas per-application configuration is also important since it has impacts on both security and performance. GVFS supports both of these goals on top of the virtualization layer by providing strong security mechanisms with high customizability, which are achieved through the two different security approaches discussed in the rest of this section.

4.4.1 Secure Tunneling Based Private Grid File System

In the context of RPC-based applications, security in communication can be provided within or outside the RPC protocol. A key advantage of the latter approach lies in the fact that existing RPC-based clients and servers can be reused without modifications. This subsection describes such an approach taken by GVFS to provide private grid file systems based on secure data tunneling (Figure 4-15).

4.4.1.1 Secure data tunneling

Secure RPC-based connections can be established through the use of TCP/IP tunneling. A tunnel allows the encapsulation and encryption of datagrams at the client side, and the corresponding decryption and de-capsulation at the server side. It


supports private communication channels in an application-transparent manner. The application-transparent property of tunneling is a key advantage of this technique and has found wide use in applications such as Virtual Private Networks (VPNs) and secure remote X-Windows sessions.

Tunneling of RPC-based connections can be achieved through mechanisms such as SSL and SSH. The latter is a de facto standard for secure logins, providing strong authentication, data privacy, and data integrity for remote login sessions, as well as tunneling of arbitrary TCP connections. GVFS leverages the functionality of SSH to create authenticated, encrypted tunnels between proxy client and proxy server, as illustrated in Figure 4-15. Tunneled GVFS connections are TCP-based. This, however, does not prevent GVFS from supporting UDP-based kernel clients and servers. A proxy client can receive RPC calls over UDP from the localhost and forward them to the proxy server using TCP, and the proxy server can receive RPC calls over the TCP tunnel and forward them to the localhost using UDP.

The use of SSH to tunnel NFS traffic has been pursued by related efforts, such as Secure NFS [48]. A key differentiator of GVFS from previous approaches is that private GVFS sessions are established dynamically by middleware on a per-session basis, rather than statically by a system administrator for groups of users. Another key difference is the GVFS support for per-user identity mappings across network domains.

Per-user tunnels and user mappings are key to establishing dynamic file system sessions in a grid-oriented environment, where users belong to different administrative domains. A secure tunnel multiplexed among users faces the same limitations for cross-domain authentication as NFS, since RPC-based security must be used to authenticate users within such a tunnel. With per-user, per-file-system secure channels, the task of guaranteeing privacy and integrity can be delegated to the secure channels, whereas the task of authenticating users can be independently carried out by each private file system outside of its channel.
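The datagram-to-stream conversion performed by the proxies can be illustrated with the framing that ONC RPC specifies for TCP transports, where each message is prefixed by a 4-byte record mark (RFC 5531). The sketch below is illustrative only; the function names are hypothetical and this is not the GVFS code.

```python
import struct

# Illustrative sketch of the datagram <-> stream conversion a GVFS proxy
# performs when bridging UDP-based kernel clients to a TCP tunnel. ONC RPC
# over TCP frames each message with a 4-byte big-endian record mark: the
# high bit flags the last fragment, the low 31 bits carry the fragment
# length (RFC 5531, record marking standard).
LAST_FRAGMENT = 0x80000000

def udp_to_tcp(rpc_msg: bytes) -> bytes:
    """Wrap one UDP RPC datagram as a single-fragment TCP record."""
    return struct.pack(">I", LAST_FRAGMENT | len(rpc_msg)) + rpc_msg

def tcp_to_udp(record: bytes) -> bytes:
    """Unwrap a single-fragment TCP record back into a datagram payload."""
    (mark,) = struct.unpack(">I", record[:4])
    assert mark & LAST_FRAGMENT, "multi-fragment records not handled here"
    length = mark & 0x7FFFFFFF
    return record[4:4 + length]
```

The record mark is what lets the receiving proxy recover RPC message boundaries from the TCP byte stream before re-emitting each message as a UDP datagram to the localhost.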


4.4.1.2 Security model

The GVFS private file system channels rely on existing kernel-level services at the client (file system mounting) and server (file system exporting), user-level middleware-controlled proxies at both client and server, and SSH tunnels established between them. Hence, the deployment of GVFS involves the setup of appropriate trust relationships between client, server, and middleware.

In the GVFS security model, the data server administrator needs to trust grid middleware to the extent that it allows access to an exported directory tree to be brokered by proxies (e.g., /GVFS/X on server S in Figure 4-15). In essence, the server administrator delegates access control for one or more exported directories to the grid middleware. This trust relationship is similar to those found in other grid data deployments (e.g., GridFTP [6]). The architecture of GVFS allows kernel export definitions to be implemented by local system administrators in a simple manner: the kernel server exports only to the localhost, and exports only the directories that should be accessible via GVFS. Users outside the localhost cannot directly mount file systems from the kernel, only via GVFS proxies. A typical scenario, where a base home directory for grid users is exported through GVFS, requires a single entry in an exports definition file.

The proxies are then responsible for authenticating accesses to those file systems exported by GVFS. This is accomplished by means of two mechanisms. The first authentication mechanism is independent of the proxy and consists of the client machine being able to present appropriate credentials (an SSH key, or an X.509 certificate for GSI-enabled SSH) to the server machine to establish a tunnel. Second, once the tunnel is established, it is necessary for the proxy server to authenticate requests received through it. Typically, NFS servers authenticate client requests by checking the origin of NFS calls and only allowing those that come from privileged ports of trusted IPs to proceed. In the private GVFS setup, the originator of requests sent to the proxy server is the server's tunnel end-point. Hence, the proxy server receives requests from the localhost


and from non-privileged ports, and cannot authenticate the client based on trusted IP/port information. It thus becomes necessary to implement an alternative approach for inter-proxy authentication between tunnel end-points.

The authentication in the private GVFS approach consists of the dynamic creation of a random session key by middleware at the time the proxy server is started, and its transmission over a separate private channel to the proxy client (e.g., using GSI-enabled Secure Copy). The proxy client then appends the key to each NFS procedure call, and the proxy server only authenticates an incoming request if it originated from the localhost and it carries a session key that matches the proxy server's key (Figure 4-15). Hence, the use of session keys is completely transparent to kernel clients and servers and requires no changes to their implementations; it only applies to inter-proxy authentication between tunnel end-points. These session keys are used for authentication, similarly to X11/xauth, but not for encryption purposes (the privacy of GVFS is provided by its SSH channel). In the implementation, a session key is a randomly generated 128-bit string and is encapsulated in the original NFS RPC messages by replacing an unused credential field, so the runtime overhead of supporting this method is very small, consisting only of the encapsulation and decapsulation of a session key and a simple comparison between key values.

The proxy server thus needs to trust the grid middleware to authenticate user credentials and establish an encrypted tunnel, create a random session key, and provide the key to the proxy client through a separate private channel. These mechanisms can be provided by existing grid security infrastructure, such as Globus GSI. Finally, the client administrator needs to trust grid middleware to the extent that it needs to allow NFS mount and unmount operations to be initiated by grid middleware (possibly within a restricted set of allowed base directories, e.g., /GVFS/X in Figure 4-15). In current GVFS setups, this is implemented with the use of sudo entries for these commands.
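The session-key check itself is simple enough to sketch in a few lines. The following is a hypothetical illustration of the scheme described above (a dict field stands in for the unused NFS RPC credential field, and the names are not from the GVFS sources):

```python
import os
import hmac

# Minimal sketch of the inter-proxy session-key authentication described
# above. Middleware generates a random 128-bit key when the proxy server
# starts and ships it to the proxy client over a separate private channel;
# the proxy client then embeds the key in every forwarded NFS call.
def generate_session_key() -> bytes:
    return os.urandom(16)   # 128 bits of randomness

def attach_key(rpc_call: dict, key: bytes) -> dict:
    # In GVFS the key replaces an unused credential field of the NFS RPC
    # message; here a plain dict field stands in for that.
    return {**rpc_call, "session_key": key}

def server_accepts(rpc_call: dict, server_key: bytes,
                   from_localhost: bool) -> bool:
    # Only requests arriving from the tunnel end-point on the localhost
    # with a matching key are authenticated; compare_digest performs a
    # constant-time comparison.
    return from_localhost and hmac.compare_digest(
        rpc_call.get("session_key", b""), server_key)
```

As in the text, the check costs only the encapsulation/decapsulation of the key and one comparison per call, and it is invisible to the kernel NFS client and server.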


4.4.1.3 Evaluation

A prototype of the proposed secure tunneling based private GVFS is evaluated with experiments in this subsection. The experiments were conducted in both local-area and wide-area environments. The file system client is a 1.1GHz Pentium-III cluster node with 1GB of RAM and an 18GB SCSI disk. In the LAN experiment, the file system server is a dual-processor 1.3GHz Pentium-III cluster node with 1GB of RAM and 18GB of disk storage; in the WAN experiment, it is a dual-processor 1GHz Pentium-III cluster node with 1GB of RAM and a 45GB disk. The LAN setup is 100Mbps Ethernet, and the WAN is through Abilene between Northwestern University and the University of Florida. The RTT between the client and server is around 0.17ms in the LAN and around 32ms in the WAN, as measured by RTTometer [97].

The experiments compare the performance of private GVFS (with and without disk caching) with native NFS. In the experiments on GVFS with disk caching, the cache was configured with 8GByte capacity, 512 file banks, and 16-way associativity. The GVFS proxy prototype used here is based on NFSv2, which limits the maximum size of an on-the-wire NFS read or write operation to 8KB. Thus NFSv2 with an 8KB block size was used for GVFS. However, in the experiments on native NFS, NFSv3 with a 32KB block size was used to provide the best achievable results for comparison. Furthermore, all the experiments were initially set up with cold caches (both the kernel buffer cache and, if enabled, the proxy disk cache) by unmounting the remote file system and flushing the disk cache if it was used.

The experimental results shown in this subsection consider application-perceived performance measured as elapsed execution time for the following benchmarks:

SPECseis: a benchmark from the SPEC high-performance group. It consists of four phases, where the first phase generates a large trace file on disk and the last phase involves intensive seismic processing computations. The benchmark was tested in sequential mode


with the small dataset. It models a scientific application that is both I/O-intensive and compute-intensive.

LaTeX: a benchmark designed to model an interactive document processing session. It is based on the generation of a PDF (Portable Document Format) version of a 190-page document edited in LaTeX. It runs the "latex", "bibtex", and "dvipdf" programs in sequence and iterates 20 times, where each time a different version of one of the LaTeX input files is used.

These benchmarks were executed on the clients, where the working directories were either stored on local disks or mounted from the remote LAN or WAN servers. To investigate the overhead incurred by private file system channels, in the LAN environment the performance of native NFS (LAN/N) is compared with private GVFS without disk caches (LAN/G). Experiments conducted in the WAN environment must use private GVFS to ensure data privacy and traverse firewalls², but the performance of GVFS without disk caches (WAN/G) and with caches (WAN/GC) are both compared against the performance of the local disk (Local), in order to investigate the overhead of GVFS security and the potential performance improvement achieved by using disk caching.

The experiment results are summarized in Table 4-1. Consider the execution of the LaTeX benchmark. In the LAN scenarios, the overhead of private GVFS (LAN/N vs. LAN/G) is large at the beginning but is substantially reduced once the kernel buffer cache holds the working dataset. This overhead is caused by the latency from both SSH tunneling and proxy processing of RPC calls. The results also show that the kernel buffer cache alone is not sufficient to lower the WAN execution time of the LaTeX benchmark in WAN/G, but the use of disk caching in GVFS can remarkably reduce the overhead, to 17% in WAN/GC compared to Local. Two factors allow the combination of kernel buffer

² Firewalls often restrict network access from outside networks, but SSH connections are typically allowed because of their strong security.


and disk caches to outperform a solution with the kernel buffer only: first, a larger storage capacity; second, the implementation of a write-back policy that allows write hits to complete without contacting the server.

Table 4-1. The overhead of private GVFS for the LaTeX and SPECseis benchmarks. The overhead data are calculated by comparing the execution time of the benchmarks in different scenarios: local disk (Local), LAN on NFS (LAN/N), LAN on GVFS (LAN/G), WAN on GVFS without disk caching (WAN/G) and with disk caching (WAN/GC). For the LaTeX benchmark, the comparisons for the execution time of the first iteration, the average execution time of the second to the twentieth iterations, and the total execution time are listed. For the SPECseis benchmark, the comparisons for the execution times of the first phase, the fourth phase, and the total execution time are listed. For both benchmarks, in the WAN/GC scenario the write-back cached data were submitted to the server after the executions and the needed time was summed into the total execution time.

Overhead            LaTeX                          SPECseis
                    1st run  2nd-20th run  total   phase 1  phase 4  total
LAN/G vs. LAN/N     124%     7%            13%     47%      0%       9%
WAN/G vs. Local     797%     180%          215%    1500%    1%       265%
WAN/GC vs. Local    691%     17%           60%     24%      0%       26%
For the execution of the SPECseis benchmark, the compute-intensive phase (phase 4) as expected achieves very close performance in all scenarios, but the performance of the I/O-intensive phase (phase 1) differs greatly. It is reasonable to see that the overhead of private GVFS becomes larger as more network communication requires more time for SSH tunneling and proxy processing. This overhead is especially large in WAN/G due to the much higher network latency in the wide-area environment; however, in WAN/GC the use of disk caching effectively hides the latency and significantly reduces the overhead to 24% with respect to Local. Besides, the write-back caching also helps to improve performance by avoiding the transfer of temporary data to the server. In fact, the benchmark generates hundreds of MBytes of data in the working directory during its calculations, of which only tens of MBytes are the required results. In WAN/GC, the cached data modifications were written back to the server after the execution and the needed time was summed into its total execution time, which is still less than a tenth of that of WAN/G.
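The write-back policy credited above can be illustrated with a toy in-memory sketch; the names are hypothetical, and the actual GVFS cache is a banked on-disk cache rather than a dict.

```python
# Toy write-back cache sketch: writes complete locally with no server round
# trip, and delayed modifications are pushed to the server only on flush
# (e.g., after the benchmark finishes). Illustrative only, not GVFS code.
class WriteBackCache:
    def __init__(self, server):
        self.server = server    # any object with a write(block, data) method
        self.blocks = {}        # block id -> data
        self.dirty = set()      # blocks modified locally but not yet flushed

    def write(self, block, data):
        # Write hit: satisfied entirely from the local cache.
        self.blocks[block] = data
        self.dirty.add(block)

    def flush(self):
        # Submit the delayed modifications to the server in one pass.
        for block in sorted(self.dirty):
            self.server.write(block, self.blocks[block])
        self.dirty.clear()
```

Repeated writes to the same block collapse into one server write at flush time, which is why temporary data that are overwritten or short-lived never cost a WAN transfer.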


Overall, the performance of the proposed private GVFS is close to NFS in LAN (within 10% for both benchmarks), and with the help of disk caching the overhead in WAN is within 20% for the LaTeX benchmark and within 30% for the SPECseis benchmark, relative to the local-disk file system.

4.4.2 The SSL-Enabled Secure Grid File System

The key advantages of the aforementioned secure-tunneling based private GVFS are that existing RPC-based clients and servers can be reused without modifications, and that it leverages mature security technologies. However, it requires additional middleware to set up tunnels and keys, and its performance also suffers from the overhead of double user-level forwarding incurred by proxy RPC processing and SSH tunneling. In addition, it is not compatible with the widely accepted grid security infrastructure [51], which presents a hurdle to interoperability with other grid middleware. This subsection presents another security approach that preserves the merits of the secure-tunneling based approach and addresses its limitations: it protects RPC communication directly with transport-level security, without the addition of tunneling, and uses widely accepted grid security tokens to provide compatible authentication and authorization.

4.4.2.1 Design

Secure data access in this approach is provided by transport-level security mechanisms, which enable an efficient secure end-to-end connection between proxy client and proxy server to protect RPC communications. In order to create a secure GVFS session for a grid user to access a file server, public-key based user and server certificates are used to establish mutual authentication between the proxies. A user certificate can be the user's grid identity certificate, or a proxy certificate issued by the user that supports delegation [51]. After successful authentication, a shared key is negotiated between the two parties and is used to encrypt the GVFS traffic, whereas the data integrity can also be protected using digital signatures or a Message Authentication Code (MAC).
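The design above (certificate-based mutual authentication plus a per-session choice of ciphers and integrity protection) maps naturally onto a TLS context. A minimal Python sketch using the standard `ssl` module, not the dissertation's OpenSSL/TI-RPC-based C library; the function name and its knobs are illustrative, not GVFS's actual API:

```python
import ssl

def make_session_context(require_peer_cert=True, ciphers=None):
    """Build a TLS context for one data session, mirroring GVFS's
    per-session security configuration: mutual authentication can be
    required and the cipher suites chosen per session (illustrative)."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    if require_peer_cert:
        # Both ends must present certificates for mutual authentication;
        # user/host/CA certificate paths would also be loaded here.
        ctx.verify_mode = ssl.CERT_REQUIRED
    if ciphers:
        # A session for highly sensitive data can demand strong ciphers.
        ctx.set_ciphers(ciphers)
    return ctx

# Example: a session that requires peer certificates and AES-GCM suites.
ctx = make_session_context(ciphers="ECDHE+AESGCM")
```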


An authenticated user's certificate is used by the proxy server to make authorization decisions, i.e., whether to grant the user access to the exported files. This is achieved using a grid-style access control mechanism, which associates file system access permissions with the grid user's identity embedded in the certificate. Such access control is provided with different granularity, which allows for flexible selection based on application needs. For an authorized data access request, the necessary identity mapping is also performed by the proxy server so that the request can be successfully executed on the file server.

The choice of security mechanisms and policies is flexible and customizable per GVFS session, in order to satisfy different security requirements from users and applications. This is important because such configurations have implications on both security and performance. For example, if the data transferred by GVFS are not confidential, encryption can be avoided to improve the data access performance, whereas digital signatures can still be employed to protect their integrity. In contrast, for a GVFS session created for highly sensitive data, encryption must be enabled with strong ciphers, which consumes considerable CPU cycles.

4.4.2.2 Implementation

The SSL-enabled secure RPC: In this approach, secure communication for NFS RPC is achieved using the transport-layer security protocols SSL/TLS [53][54], referred to generally as SSL in the rest of this subsection. Although secure RPC can be realized at the RPC layer itself (RPCSEC_GSS [44]), several factors have motivated the use of SSL: it has very mature and efficient implementations, which have been successfully employed by many important applications; it supports a wide range of algorithms, which can be leveraged to support flexible security configurations; and GVFS sessions are established on a per-user/application basis, and thus can use SSL to provide full-featured security without using any RPC-layer mechanisms.

An SSL-enabled secure RPC library has been developed for GVFS based on two key packages, TI-RPC [96] and OpenSSL [55]. TI-RPC (Transport Independent RPC) is the


replacement for the original transport-specific RPC. It allows distributed applications to transparently support RPC over connectionless and connection-oriented transports for both IPv4 and v6. OpenSSL is an excellent implementation of SSL, and it has recently also included support for datagram protocols (DTLS). Therefore, these tools can be effectively utilized to build a secure RPC library that supports both TCP and UDP.

In this library, secure RPC APIs are provided in a way that closely resembles the regular RPC APIs. For example, clnt_tli_ssl_create and svc_tli_ssl_create are two expert-level APIs for creating an RPC client and server, respectively, using a secure transport for communications. These APIs take the same parameters as their regular counterparts, with an additional one for the security configuration structure. The use of authentication, encryption, and MAC, as well as their specific algorithms, can be specified through this structure and passed on to the library to create secure transports for RPC with the desired security mechanisms.

This secure RPC library is generic enough to support all RPC-based applications. The fact that both TI-RPC and OpenSSL are standalone packages helps its use by ordinary users without the need to change any system-level configurations. The current implementation is based on Linux; support for other platforms is also conceivable.

The GSI-based GVFS proxy: The GVFS proxies are enhanced to use the SSL-enabled secure RPC library for communications, and are also extended with the capability of parsing and validating GSI-based certificates. Using these proxies to establish a grid-wide file system, the privacy and integrity of data access are protected in the secure RPCs, whereas grid authentication and authorization are performed based on the user and server certificates.

A GVFS proxy is configured by a user or service through a configuration file, which is useful for customizing several important aspects of a GVFS session (e.g., the use of disk caching and its parameters), as shown in Figure 4-6. This configuration mechanism is augmented to include the security configurations, including the algorithms


for authentication, encryption, and MAC, and the paths to the user, host, and trusted CA certificates. In this way, both the proxy client and proxy server can be properly configured to use the grid user's and server's certificates to authenticate with each other, and set up a data session with the desired security mechanisms and policies.

A GVFS session's security customization can also be reconfigured by signaling the proxies to reload the configuration files. Such dynamic reconfiguration is very useful in several important scenarios. For example, it can force a proxy to reload a certificate when the original one has expired or is believed to be breached. It can reset a session's security setup when the desired configuration is changed. It can also be used to force an SSL renegotiation and refresh the session key for a long-lived session. In fact, a proxy can be configured with a timeout value to enable periodic automatic renegotiation.

Grid file access control: After successful mutual authentication, the grid user's certificate presented by the proxy client is used by the proxy server for authorization of the data requests received from this session. The user credentials (UNIX user and group ID) in each NFS RPC message are from the client-side account allocated for a grid user or job. They do not represent the grid user's identity and cannot be used for the purpose of authorization, but they are still necessary for cross-domain identity mapping. For each authorized RPC request, these credentials are mapped to a local user account's credentials, which are then used by the kernel NFS server to grant access to files. Such authorization and mapping are determined by the grid file access control policies.

With GSI-enabled proxies, a variety of ACL mechanisms can be employed to enforce access control for GVFS sessions. The basic mechanism is based on a gridmap file, which is similar to GSI's gridmap file [51] and provides access control per exported file system. This file describes the mapping between a user's grid identity (the distinguished name in the certificate) and a local account's name. If a mapping exists for a user in the gridmap file, the user gains the same access rights to the exported file system as the corresponding local


user. Otherwise, the user is mapped to an anonymous user, or denied access completely, depending on the session's configuration. In GVFS, the gridmap file can be set up on a per-session basis to enable flexible sharing. For example, if a user wants to share her files with another user, she only needs to add the mapping between that user's distinguished name and her local account name in the session's gridmap file.

Fine-grained access control is realized by leveraging the ACCESS procedure call available in NFSv3 and NFSv4. Each file or directory can have an ACL file associated with it, stored under the same path and named in the style of .filename.acl. A user or service can grant or deny a user's access to a file or directory by putting the user's distinguished name inside the corresponding ACL file along with a bitmask encoding the access permissions. Only the NFSv3-style ACL is supported in the current implementation. Upon receiving an ACCESS request, the proxy server checks the user's grid identity against the requested file or directory's ACL, and returns the corresponding bitmask if the user is found in the ACL, or a zero, which disables all access permissions.

A file or directory automatically inherits its parent's ACL if it does not have a dedicated ACL file. This inheritance mechanism can reduce the management complexity of ACLs. For the sake of performance, the ACLs are cached in memory by the proxy server once they are read from disk. The ACL files are protected by the proxy server from remote access, and can only be modified by the local owner of the files (typically the GVFS management account, as described below), manually or through an authorized middleware service. Note that in the GVFS security model, the NFS server delegates the access control of the exported file systems completely to the proxies. So the ACL mechanisms in the kernel (except for the kernel exports file) are no longer useful for the exported file system and should be disabled to avoid overhead.

4.4.2.3 Deployment

GVFS can be conveniently deployed on grid resources because it does not require any modifications to either applications or kernels. It also obeys the least-privilege principle


in that the proxies and their management services (Section 6.1) work completely at user level and use unprivileged network ports, and they can be managed using a single regular user account (e.g., user gvfs) on each host. On the server side, the only privilege required is the configuration of a host-wide exports file used by the kernel NFS server. This can be restricted to a single entry in the exports file by organizing all the grid-accessible file systems under a single path (e.g., /GVFS), which needs to be exported to only the localhost. On the client side, the use of file system mount and unmount is necessary, and it can also be minimized by giving only the local GVFS management account the permission to use sudo or a setuid program to mount and unmount GVFS sessions at a restricted path (e.g., /GVFS).

To use GVFS, it is not necessary for a grid user to have a personal account on the client or server. The proxies and their management services create a secure file system session on behalf of the user between the account where her job is running and the account where her files are stored. These job and file accounts are often provided by mapping a grid user to a local user [51], or allocated on demand for dynamically submitted jobs [87].

4.4.2.4 Evaluation

A prototype of the SSL-enabled GVFS is evaluated in this subsection with experiments. File system benchmarks (IOzone and PostMark) are used to investigate the overhead of achieving strong security under intensive I/O load. Application benchmarks modeling workloads in software development and scientific computing are also employed to study GVFS performance with typical file system usages.

Both LAN and WAN environments were considered in the experiments. LAN-based runs study the overhead from the user-level techniques, whereas tests in an emulated WAN reveal its performance for the target grid environments. NIST Net [95] was used to emulate different wide-area network latencies. The file system client and server as well as the NIST Net router were set up on VMware-based virtual machines. They were hosted on


separate physical servers connected via Gigabit Ethernet. Each physical server has dual 3.2GHz hyperthreaded Xeon processors and 4GB of memory. The client and server VMs both have 1 CPU but different amounts of memory, 256MB and 768MB, respectively. The use of a network emulator and virtual machines facilitates the quick deployment of a controllable and replicable experimental setup. However, the timekeeping within a virtual machine may be inaccurate, so the system clock on a physical server was used to measure time, which suffices for the granularity required by this evaluation. All the experiments were conducted on virtual machines running on dedicated physical servers, without interference from other workloads. Different secure DFS setups were experimented with, including:

NFSv3 and NFSv4: The native kernel-level NFSv3 and NFSv4 provide the baseline performance for comparison. Although not evaluated here, kernel-level secure NFS solutions (e.g., Kerberos-enabled NFS, GridNFS) can be expected to have worse performance than these results. Kernel NFS implementations use only memory for caching and revalidate the cached data when the file is opened or its attributes have timed out. NFSv4 also provides delegation, which allows a client to aggressively cache data.

GVFS-BASIC and GVFS-SSH: The basic GVFS without any security enhancements, and the SSH-enabled private GVFS presented in Section 4.4.1. Their results demonstrate the overhead from the user-level RPC processing and SSH tunneling.

SFS: The related work of Self-certifying File System [30], another NFS-based user-level secure file system. The overhead from the user-level techniques can also be observed from its performance. SFS aggressively caches attributes and access permissions in memory, which improves the performance of metadata operations.

GVFS-SSL: The proposed SSL-enabled secure GVFS approach. By comparing to the above systems, the experiments examine the performance of the SSL-enabled strong authentication, privacy, and integrity. Aggressive disk caching of attributes, access


permissions, and data was used in the WAN-based tests, so those results also reflect the potential performance improvement from it.

In all of the above setups, the server exported the file system with write delay and synchronous update, and the client accessed the server using TCP and a 32KB block size for reads and writes. All the experiment results are reported with the average and standard deviation values from multiple runs. Every run was started with cold client-side caches by unmounting the file system and flushing the disk cache.

IOzone: The first experiment considers the IOzone [88] benchmark, which analyzes a file system's performance by performing read and write operations on a large file with a variety of access patterns. In this experiment, it was executed on the client in read/reread mode, which sequentially reads a 512MB file twice from the server. Since the client has only 256MB of memory, the buffer cache, with its LRU-based replacement, does not help the benchmark's sequential reads. In fact, the client needs to read a total of 1GB of data from the server during the execution. On the server side, the file is preloaded into memory before each run, so there is no actual disk I/O involved in the tests. This "extremely intensive" setup reveals the worst-case overhead from GVFS' user-level virtualization and security enhancements.

The experiments evaluate various SSL-enabled GVFS configurations that have different security strengths, as follows:

GVFS-AES uses AES (Rijndael) [98] in CBC mode with a 256-bit key, a very strong cipher, to encrypt RPC traffic, and ensures data integrity with SHA1-based HMAC [99].

GVFS-RC uses RC4 (ARCFOUR) [100] with a 128-bit key, a relatively weaker cipher, for encryption, and it also enables SHA1-HMAC for data integrity.

GVFS-SHA does not use any encryption but still provides integrity using SHA1-HMAC.

To compare with GVFS, in GVFS-SSH the SSH tunnels were configured to use 256-bit AES-CBC and SHA1-HMAC, which is similar to the GVFS-AES configuration; SFS


Figure 4-16. Runtimes of IOzone on the different file system setups in LAN: NFSv3, NFSv4, SFS, basic GVFS without security enhancement (GVFS-BASIC), secure GVFS with only integrity check but not encryption (GVFS-SHA), secure GVFS with RC4 cipher (GVFS-RC), secure GVFS with AES cipher (GVFS-AES), and SSH-enabled private GVFS (GVFS-SSH).

provides privacy and integrity using a customized RC4 and SHA1-HMAC, which is close to the GVFS-RC setup.

Figure 4-16 illustrates the runtimes of IOzone on the above DFS setups in LAN. The user-level file systems all show a slowdown of more than two-fold compared to the kernel NFS implementations. However, such an intensive workload is very rare in practice, and the user-level processing latency can often be overlapped with application "thinking" time or be diminished by disk I/O latency. More importantly, in a WAN environment, the network latency becomes the dominant factor and renders the user-level latency negligible. User-level caching techniques can further hide the latency and improve the file system's performance. These discussions will be validated with the experiments presented later in this subsection.


Comparing the different secure GVFS configurations, GVFS-SHA has the lowest overhead from the security enhancements with respect to GVFS-BASIC, because it only calculates HMAC but does not perform any encryption/decryption on the file system traffic. With the use of encryption, the overhead is increased to 15% in GVFS-RC, and 50% in GVFS-AES. GVFS-SSH has a much higher overhead than the other ones (more than a six-fold slowdown w.r.t. GVFS-BASIC). This can be at least partially attributed to the penalties from the double user-level forwarding: for every RPC message, two network stack traversals and kernel-user space switches are required by GVFS and SSH to process it. As discussed earlier, such an overhead is highlighted by this intensive experiment setup. The SSL-enabled secure GVFS approach removes this extra penalty, and thus improves the performance substantially.³

The experiment also measured the overhead of the user-level file systems in terms of CPU usage. The user time percentages for GVFS proxies and SFS daemons were collected every 5 seconds throughout the benchmark's execution. The client- and server-side results are plotted in Figure 4-17 and Figure 4-18, respectively. On the client, the basic GVFS' CPU usage is very low, averaging 0.6% and staying under 1% at all times. For SSL-enabled secure GVFS, the usage goes up to 5% with SHA1-HMAC, and further increases to about 8% when encryption/decryption is also used (256-bit AES consumes slightly more CPU than 128-bit RC4). On the server, the CPU usage is even less for GVFS, GVFS-SHA, and GVFS-RC, averaging 0.3%, 1.5%, and 3.6%, respectively. All the GVFS configurations need less CPU than SFS, which has more than 30% usage on both sides.

³ The performance of GVFS-RC is relatively worse than SFS's, because the GVFS prototype used in this evaluation is a single-threaded implementation, which cannot handle multiple outstanding RPCs simultaneously. In contrast, SFS makes use of asynchronous RPC and can process several requests at the same time. However, it is reasonable to believe that a multithreaded secure GVFS implementation can deliver much better results, as demonstrated in Section 4.2.2.

PostMark:


Figure 4-17. Client-side CPU usage of the user-level file system proxy/daemon during the execution of IOzone on different setups: basic GVFS without security enhancement (GVFS-BASIC), secure GVFS with only integrity check but not encryption (GVFS-SHA), secure GVFS with RC4 cipher (GVFS-RC), secure GVFS with AES cipher (GVFS-AES), and SFS.

The second experiment uses the PostMark [90] benchmark, a more realistic file system benchmark that simulates the workloads of email, news, and web commerce applications. It starts with the creation of a pool of directories and files (creation phase), then issues a number of transactions, including create, delete, read, and append, on the initial pool (transaction phase), and finally removes all the directories and files (deletion phase). In contrast to the uniform, sequential data accesses used in the IOzone experiment, the file system is randomly accessed with a variety of data and metadata operations from PostMark.

In this experiment, the initial numbers of directories and files were 100 and 500, respectively, and the number of transactions was 1000. The transactions were equally distributed between create and delete, and between read and append. The file sizes ranged from 512B to 16KB, and thus the benchmark exercised mostly metadata


Figure 4-18. Server-side CPU usage of the user-level file system proxy/daemon during the execution of IOzone on different setups: basic GVFS without security enhancement (GVFS-BASIC), secure GVFS with only integrity check but not encryption (GVFS-SHA), secure GVFS with RC4 cipher (GVFS-RC), secure GVFS with AES cipher (GVFS-AES), and SFS.

operations and small writes. Figure 4-19 shows the runtimes of each PostMark phase for the aforementioned DFS setups. In order to demonstrate the worst-case overhead, the strong GVFS configuration (GVFS-AES) is used for the rest of this subsection and is denoted as GVFS or GVFS-SSL from here on. For the creation and deletion phases, the runtimes of the secure file systems are all very close to the native NFS'; GVFS-SSH is marginally worse than the others. However, for the more intensive transaction phase, where a large number of small data and metadata updates are involved, only GVFS shows performance close to NFSv3, and it is better than SFS and GVFS-SSH by 17% and 14%, respectively.

The above experiment was conducted in a LAN environment, where the network round-trip time (RTT) between the file system client and server is about 0.3ms. Then it was repeated in the emulated WAN with different network latencies. Figure 4-20 compares


Figure 4-19. Runtimes of various PostMark phases on the different DFS setups in LAN: NFSv3, NFSv4, SFS, SSL-enabled secure GVFS (GVFS-SSL), and SSH-enabled private GVFS (GVFS-SSH).

the total runtimes of PostMark on NFSv3 and GVFS-SSL. Benefiting from the use of disk caching, GVFS shows a very slow decrease in performance as the network latency grows. It is also significantly more efficient than native NFS in wide-area environments, and the speedup is about two-fold when the RTT is 80ms. These results support the earlier discussion that a user-level secure file system based on GVFS can be very efficient for grid-scale systems.

Since no performance advantage has been observed in the version of NFSv4 used in the experiments, only the results from NFSv3 are reported here as well as in the following experiments, and it is referred to as NFS.

Modified Andrew benchmark: The third experiment models the typical software development process using a modified Andrew benchmark (MAB). It consists of four phases that exercise different aspects of a file system. The first phase (copy) makes a copy of a software source tree, which transfers a large number of small files within the file system. The second phase


Figure 4-20. Total runtimes of PostMark on NFSv3 and SSL-enabled secure GVFS with varying network round-trip time (RTT).

(stat) recursively examines the status of every file and thus exercises metadata lookups intensively. The third phase (search) reads every file thoroughly to search for a keyword. The last phase (compile) compiles the entire source tree, which generates a large number of data and metadata operations. Because the original Andrew benchmark [25] uses a workload that is too light for today's file systems, the source tree is replaced with the package of an OpenSSH client (openssh-4.6p1). It is a 3-level source tree with 13 directories and 449 files, and the entire compilation generates 194 binaries and object files.

The benchmark was executed on NFS and the proposed SSL-enabled secure GVFS in both LAN and emulated WAN with 40ms RTT. Their results are shown in Figure 4-21. The secure GVFS performs as well as NFS for the first three phases in LAN, and in the intensive compile phase, it shows a relatively small overhead of 14%. In WAN, GVFS caching effectively hides the network latency, and the total runtime of MAB is slowed down by only 2.5 times. Compared to NFS, it is more than four times faster, and the speedup is approximately nine-fold, five-fold, and eight-fold for the stat, search, and


Figure 4-21. Runtimes of various MAB phases on NFS and secure GVFS in both LAN and WAN. The time needed to write back data at the end of execution is 51.2s on average with a standard deviation of 1.3s.

compile phases, respectively. Although not shown here, the performance of secure GVFS in LAN can also be improved if disk caching is used, in which case the compile phase is only 2% slower than NFS.

SPECseis: The last benchmark uses a scientific tool, SPECseis, which implements algorithms used by seismologists to locate resources of oil. It is taken from the SPEC HPC96 suite, and its sequential version is considered in the experiment. The execution consists of four phases: data generation, data stacking, time migration, and depth migration. Phase 1 prepares a large initial data file, and each of the following phases performs certain computation based on its previous phase's output file and then generates its own results on disk. In the end, the intermediate outputs are removed and only the results from the last two phases are preserved. This benchmark models a grid application that is both I/O and computation intensive.


Figure 4-22. Runtimes of various SPECseis phases on NFS and secure GVFS in both LAN and WAN. The time needed to write back data at the end of execution is 14.2s on average with a standard deviation of 1.3s.

This experiment was conducted in both LAN and emulated WAN with 40ms RTT. Based on the results shown in Figure 4-22, similar observations can be made as in the previous experiment: in LAN, the performance of secure GVFS is very close to NFS; in WAN, GVFS still delivers good performance and is substantially better than NFS. In phase 1, GVFS stores the large output entirely in cache with the use of write-back; in phase 2, a large number of reads can be satisfied from the data cached on disk, which are not available in memory; and at the end, GVFS also saves considerable time by writing back only the final results but not the temporary data to the server. Consequently, GVFS shows no slowdown in WAN. Compared to NFSv3, it is more than five times faster in the total runtime, and the speedup is about two-fold, forty-fold, and four-fold for the first three phases, respectively.

4.5 Fault Tolerance

As DFSs get deployed on WANs, their scale and dynamism also grow dramatically. Unreliable machines and networks often cause the interruption of remote data access


on a DFS and even the loss of data due to unrecoverable failures. In a system built on non-dedicated resources, such as a grid or peer-to-peer system, the dynamic leaving of resources also makes the data stored on them unavailable to others. Therefore, strong tolerance of dynamic failures is critical to using DFSs in such environments, especially for applications that take a long time to finish and cannot be easily recovered in case of failures.

Widely deployed DFSs typically do not provide any support for fault tolerance, since they are designed for a relatively reliable and stable environment. However, the GVFS-based virtual DFSs built upon them can employ user-level fault-tolerance protocols to improve data reliability and availability. In these protocols, fault tolerance is provided by introducing redundancy to the data sets, through the use of replication or coding. Redundancy can be deployed across different servers and different sites to protect against various sorts of failures, including the loss of data due to system crashes, the loss of connections due to network partitions, and the corruption of data due to faulty storage or transmissions. This section presents the replication-based fault-tolerance protocol that makes use of redundancy to improve the reliability of data access on GVFS.

4.5.1 Virtualization of Data Sets

Replication is a common practice for improving reliability. A GVFS-based data session can employ multiple file servers to replicate its data set and provide fault tolerance for the application or user's data access. In such a setup, a proxy server is started on each of the replication servers; it allows the connection from the proxy client and services the client's data requests using its local replica. Data virtualization is provided by the proxy client to hide the different physical locations and identities of the replicas, and present a single, consistent view of the data set to the application.

In NFS, a client references files and directories by file handles (FHs), which are typically less than 64 bytes in length and store contents that are opaque to the client. A file is uniquely identified by its file handle on its file server. However, after replication, a file or directory may have a different FH on every file server, and the client cannot use


the same one to reference its replicas. To address this, virtual FHs are provided by GVFS to virtualize a data set across the replication servers. The proxy client presents virtual FHs to the client and maps them to the physical ones on-the-fly while the data are being accessed.

When a client first mounts a replicated remote file system through GVFS, the proxy client creates a virtual root FH for the virtual DFS, forwards the request to all the servers to obtain the physical root FHs, stores the virtual-to-physical mappings persistently in its local disk caches, and returns the virtual one to the client. Henceforth, the client can use this virtual FH to reference the root in its RPCs, and the proxy client will translate it to the physical root FHs before it forwards the calls to the servers. Similar steps are followed when the client creates a regular file or directory, or looks up one that is not already cached by the proxy.

4.5.2 Replication Schemes

Initial replication of a data set is done by a user or the middleware that manages it (please see Section 6.1.3.3 for more details on the replication management). After a GVFS session is created to provide an application with access to the data, the consistency among the replicas is maintained using an active-style protocol [101], with several variations to allow application-tailored customizations. In the basic scheme, every data and metadata update request from the client is multicast to the proxy servers and performed on all the replicas, and it does not return until the proxy client has received an acknowledgement from every one. Consequently, each replica has the exact same copy of the data set, and if any of them crashes, it has no impact upon the application, since the remaining replicas can continue to provide service as usual.

The above scheme performs updates synchronously and thus limits the performance to the slowest replication server. It can be relaxed to allow asynchronous updates and reduce the overhead for write-intensive applications. In this variation, one of the replication servers, namely the primary server, is updated synchronously, whereas the


others are done asynchronously. Specifically, the proxy client responds to an update immediately after the primary server acknowledges it, without waiting for the others. The ordering of requests to a file is provided by the reliable multicast mechanism described below. Therefore, the performance of the updates can be improved by choosing the fastest server as the primary. It is useful for supporting heterogeneous replication servers, which have, e.g., different levels of load or different network latencies to the client.

The synchronization of updates can be further delayed by the proxy client using its disk caches to buffer them locally. Given sufficient cache capacity, the updates are submitted to the servers in batches when the application is idle or completes. In addition to exploiting the locality of updates, the use of write-back reduces the overhead of maintaining replication consistency. The drawback of this scheme is that, in case of client crashes, the updates cannot be recovered until the client is back, or the application needs to be restarted on another client.

Reliable multicast of updates in the above protocol is realized by client-side logging and multithreaded RPCs. Before the proxy client multicasts an update request, it logs it synchronously on persistent storage. Then it uses multiple threads to forward the update RPCs to the proxy servers concurrently. The RPCs can use both UDP and TCP, and the proxy client will retransmit upon a timeout until a failure is determined. After the request is acknowledged by all the proxy servers, it is considered completed and removed from the log. If the proxy client crashes in the middle of the multicast procedure, then after the recovery, it can check the log and resend the incomplete updates. To enforce the ordering of multicast, during an update multicast, the following requests to the same file will be blocked by the proxy client until the update is completed.

Although it is necessary to propagate an update to the entire replication group, a GVFS session can choose whether to perform read operations on all the replicas or only on the primary one. The former scheme is necessary to provide fault tolerance against Byzantine failures, in which the proxy client compares replies from the servers and returns


one based on the majority vote. This scheme is often combined with the synchronous update approach, and it has the similar disadvantage of limiting the performance to the slowest replication server.

A GVFS session can also use a primary-backup based scheme for read operations, in which the proxy client sends reads only to the primary server. By choosing the server that delivers the best data access performance as the primary, it can not only reduce the overhead of using replication but also improve the data access performance, which is discussed in detail in Section 6.2.3.1. In combination with the above active-style schemes for write operations, this approach in essence provides a "read-one, write-all" fault-tolerance protocol, which can be employed by a read-mostly GVFS session to improve both its performance and reliability.

4.5.3 Application-Transparent Failover

Based on replication and virtualized data sets, fault tolerance can be provided for GVFS data sessions with application-transparent failover. In order to detect a server-side failure happening in a session, the proxy client keeps in touch with the proxy servers through the requests from the application, or by sending periodic heartbeats using NULL RPCs if the application is idle. If an RPC times out, it is first retransmitted, in case the failure is caused by a transient network or server problem. When a major timeout (e.g., 100 times the average response time) is reached before any of the retransmissions succeeds, it is assumed that an unrecoverable failure has occurred on that server.

A failed server is excluded from the group of replication servers for the GVFS session, and its application can continue to access the remote data using the available servers. If a primary-backup based replication scheme is used and the primary server has failed, the proxy client immediately chooses a new primary server from the backup ones and forwards the failed call as well as the following ones to it. In this way, the failure detection and


recovery are completely masked from the application⁴, and except for a short delay, its data access is not affected at all in this process.

⁴ To hide a long timeout from the application, the GVFS session needs to be mounted in a "hard" manner, in which the kernel NFS client continues retrying the failed operation indefinitely until it succeeds. If the session is mounted in a "soft" manner, the kernel NFS client will report an I/O error to the application after a long timeout.

To support timely replica repair and regeneration, a proxy client notifies the user or the middleware service that manages the GVFS session of a detected failure, so that actions can be taken to recover the replicas either online or offline. GVFS also supports non-interrupting online replica regeneration by allowing a user or service to temporarily switch the GVFS session to the repair mode, in which all the application's updates are buffered by the proxy client in local disk caches and synchronized later with the replication servers after the new replica is generated. The replication management service is discussed in detail in Section 6.1.2.

4.5.4 Evaluation

The effectiveness of the replication-based fault-tolerance protocol is evaluated in this subsection with a GVFS data session established for the SPECseis96 benchmark application. The file system client and server were set up on VMware GSX 2.5 based virtual machines (VMs), connected by the WAN between the University of Florida (UFL) and Louisiana State University (LSU). The server VM was replicated to provide redundancy for the data set.

During the execution of the benchmark on the client VM, a failure was injected by powering off the server VM. The failure was detected when a timeout of an RPC call happened, and was immediately recovered from by establishing a new connection to the replica VM and redirecting the calls through it. The benchmark finished successfully, without being aware of the server failure and recovery during its execution. The elapsed time of such a run is compared with the execution time of the benchmark in a


normal GVFS session without injected failure, 258 seconds. The results show that the overhead of error detection and redirection setup is 5 seconds plus the specified timeout value (5 seconds), which is adjustable through the proxy. Considering a long-running application, this overhead is negligible.
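The detection-and-failover behavior evaluated above can be illustrated with a minimal sketch. This is not the GVFS proxy code (which operates on NFS RPCs inside a user-level proxy); the class and method names here are hypothetical, and the "major timeout" is modeled as the 100-times-average-response-time threshold described in Section 4.5.3:

```python
import time

class FailoverClient:
    """Sketch of timeout-based failure detection with primary-backup failover.
    All names are illustrative, not the actual GVFS proxy interfaces."""

    MAJOR_TIMEOUT_FACTOR = 100  # major timeout = 100x the average response time

    def __init__(self, servers):
        self.servers = list(servers)  # servers[0] acts as the primary
        self.avg_response = 0.05      # running average RPC response time (seconds)

    def call(self, send_rpc):
        """Send an RPC, retransmitting on transient timeouts; on a major
        timeout, drop the failed server and redirect to a backup."""
        deadline = self.MAJOR_TIMEOUT_FACTOR * self.avg_response
        while self.servers:
            primary = self.servers[0]
            start = time.monotonic()
            while time.monotonic() - start < deadline:
                try:
                    return send_rpc(primary)  # success: failure was transient
                except TimeoutError:
                    continue                  # retransmit until the major timeout
            self.servers.pop(0)               # unrecoverable: exclude the server
        raise IOError("all replication servers failed")
```

A primary that stops responding is retried until the major timeout expires, then silently dropped from the replication group, so the application only observes a short delay, as in the benchmark run above.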


CHAPTER 5
APPLICATION STUDY: SUPPORTING GRID VIRTUAL MACHINES

The GVFS-based file system virtualization and application-tailored enhancements discussed in the above chapters have been successfully employed to support applications from many different disciplines, including spectroscopy study for biomedical scientists [91] and storm surge modeling for coastal researchers [102]. A particularly important and interesting application of this solution is to support virtual machines (VMs) as execution environments for grid applications, where efficient and secure access to both user and VM data is transparently provided to the applications and VMs instantiated on demand across grids.

5.1 Architecture

A fundamental goal of computational grid systems is to allow flexible, secure sharing of resources distributed across different administrative domains [1]. To realize this vision, a key challenge that must be addressed by grid middleware is the provisioning of execution environments that have flexible, customizable configurations and allow for secure execution of untrusted code from grid users [103]. Such environments can be delivered by architectures that combine system-level VMs [104] and middleware for dynamic instantiation of VM instances on a per-user, per-application basis [17]. Efficient instantiation of VMs across distributed resources requires middleware support for transfer of large VM state files (e.g., memory state, disk state) and thus poses challenges to data management infrastructures.

Mechanisms that are present in existing middleware can be utilized to support this functionality by treating VM-based computing sessions as processes to be scheduled (VM monitors) and data to be transferred (VM state). In order to fully exploit the benefits of a VM-based model of grid computing, data management is key: without middleware support for transfer of VM state, computation is tied to the end-resources that have a copy of a user's VM; without support for the transfer of application data, computation is tied to


Figure 5-1. The GVFS supported data management for both virtual machine state and user data allows for per-user, per-application VM instantiations (VM1, VM2 and VM3) across grid resources (compute servers C1 and C2, state servers S1 and S2, data servers D1 and D2).

the end-resources that have local access to a user's files. However, with appropriate data management support, the components of a grid VM computing session can be distributed across three different logical entities: the "state server", which stores VM state; the "compute server", which provides the capability of instantiating VMs; and the "data server", which stores user data (Figure 5-1).

As an important application of the GVFS virtualization approach, a data management solution based on GVFS and its application-tailored enhancements is proposed to allow fast dynamic VM instantiation and efficient runtime execution to support VMs as execution environments in grid computing. In particular, as illustrated in Figure 5-2, GVFS disk caching is important to improving the performance of remote VM state access


Figure 5-2. The GVFS extensions for VM state transfers. At the compute server, the VM monitor issues system calls that are processed by the kernel NFS client. Requests may hit in the kernel-level memory buffer cache; those that miss are processed by the user-level proxy. At the proxy, requests that hit in the block-based disk cache, or in the file-based disk cache if matching stored metadata, are satisfied locally; proxy misses are forwarded as SSH-tunneled RPC calls to a remote proxy, which fetches data directly (for VM memory state) or through the kernel NFS server (for VM disk state).

for many VM technologies, including VMware [80], UML [82], and Xen [81], where VM state (e.g., virtual disk, virtual memory) is stored as regular files or file systems. The GVFS private file system channels can also be used to provide secure VM state access over insecure grid resources.

Another user-level extension made to GVFS is the handling of application-specific metadata information. This technique is employed to support grid VMs and realize VM-aware data transfer for on-demand VM instantiation, as discussed in the next section.

5.2 Virtual Machine Aware Data Transfer

The main motivation for metadata handling is to use middleware information to generate metadata for certain categories of files to capture the knowledge of grid applications. Then, a GVFS proxy can take advantage of the metadata to improve data transfer. Metadata contain the data characteristics of the file they are associated with, and define a sequence of actions which should be taken on the file when it is accessed, where


each action can be described as a command or script. When the proxy receives an NFS request to a file which has metadata associated with it, it processes the metadata and executes the required actions on the file accordingly. In the current implementation, the metadata file is stored as a hidden file in the same directory as the file it is associated with, and has a special extension named in the style of .filename.meta, so that it can be easily looked up.

For example, resuming a VMware VM requires reading the entire memory state file (typically hundreds of MBytes or more). Transferring the entire contents of this file over a DFS is time-consuming; however, with application-specific knowledge, it can be pre-processed to generate a metadata file specifying which blocks in the memory state are all zeros. Then, when the memory state file is requested, the proxy client, through processing of the metadata, can service requests to zero-filled blocks locally, ask for only non-zero blocks from the server, reconstruct the entire memory state, and present it to the VM monitor. Normally the memory state contains many zero-filled blocks that can be filtered out by this technique [83], and the traffic on the wire can be greatly reduced while instantiating a VM. For instance, when resuming a 512MB-RAM Red Hat 7.3 VM which was suspended after boot-up, the client issued 65,750 NFS reads, whereas by using this metadata handling technique 92% of the requests could be filtered out without being sent to the server.

Another example of GVFS' metadata handling capability is to help the transfer of large files and enable file-based disk caching. Inherited from the underlying NFS protocol, data transfer in GVFS is on-demand and block-by-block based (typically 4KB to 64KB per block), which allows for partial transfer of files. Many applications can benefit from this property, especially when the working set sizes of the accessed files are considerably smaller than the original sizes of the files. For example, accesses to the VM disk state are typically restricted to a working set that is much smaller (< 10%) than the large disk state files. But when large files are indeed completely required by an application (e.g., when a


remotely stored memory state file is requested by VMware to resume a VM), block-based data transfer may become inefficient.

However, if grid middleware can speculate in advance which files will be entirely required, based on its knowledge of the application, it can generate metadata for the GVFS proxy to expedite the data transfer. The actions described in the metadata can be "compress", "remotely copy", "uncompress", and "read locally", which means that when the referred file is accessed by the client, instead of fetching the file block by block from the server, the proxy will: (1) compress the file on the server (e.g., using GZIP); (2) remotely copy the compressed file to the client (e.g., using GSI-enabled SCP [105]); (3) uncompress it to the file cache (e.g., using GUNZIP); and (4) generate the result for the request from the locally cached file. Once the file is cached, all the following requests to the file will also be satisfied locally (Figure 5-2).

Hence, the proxy effectively establishes an on-demand fast file-based data channel (which can also be made secure by employing SSH tunneling for data transfer), in addition to the traditional block-based NFS data channel, and a file-based cache which complements the block-based cache in GVFS to form a heterogeneous disk cache. The key to the success of this technique is the proper speculation of an application's behavior. Grid middleware should be able to accumulate knowledge about applications from their past behaviors and make intelligent decisions based on this knowledge. For instance, since for VMware the entire memory state file is always required from the state server before a VM can be resumed on the compute server, and since it is often highly compressible, the above technique can be applied very efficiently to expedite its transfer.

5.3 Integration with VM-Based Grid Computing

VMs can be deployed in a grid in two different kinds of scenarios, which pose different data management requirements to the GVFS data sessions created for VM instantiations. In the first scenario, a grid user is allocated a dedicated VM which has a persistent virtual disk on the state server. It is suspended at the current state when the


user leaves and resumed when the user comes back. Nonetheless, the user may or may not start computing sessions from the same server. The VM should be efficiently instantiated on the compute server when the session starts, and the modifications to the VM state from the application execution during the session should also be efficiently reflected on the state server.

GVFS along with its application-tailored enhancements can well support this scenario in that: (1) the use of metadata handling can quickly restore the VM from its checkpointed state; (2) the on-demand block-based access to the virtual disk can avoid the large overhead incurred from downloading and uploading the entire virtual disk; (3) proxy disk caches can exploit locality of references to the virtual disk and provide high-bandwidth, low-latency accesses to cached file blocks; (4) write-back caching can effectively hide the latencies of write operations perceived by the user/application, which are typically very large in a wide-area environment, and submit the modifications when the user is offline or the session is idle.

In the other scenario, the state server stores a number of template VMs for the purpose of "cloning". These generic VMs have application-tailored hardware and software configurations, and when a VM is requested from a compute server, the state server is searched against the requirements of the desired VM. The best match is returned as the "golden" VM, which is then "cloned" at the compute server [106]. The cloning process entails copying the "golden" VM, restoring it from checkpointed state, and setting up the clone with customized configurations. But instead of copying the entire virtual disk, only symbolic links are made to the disk state files. The golden VM's disk state is read-only, shared by all of its clones. After a new clone comes to "life", computing can start in the VM, and modifications to the original disk state are stored in the form of redo logs (also known as copy-on-write files). So data management in this scenario requires efficient transfer of the VM state from the state server to the compute server, as well as efficient writes to the redo logs for checkpointing.
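Both scenarios rely on the metadata handling of Section 5.2 to quickly restore a VM from its checkpointed memory state. A minimal sketch of the zero-block filtering follows; the helper names are hypothetical, and JSON stands in for whatever encoding the real .filename.meta files use:

```python
import json
import os

BLOCK = 4096  # illustrative block size; NFS transfers use 4KB-64KB blocks

def meta_path(path):
    """Metadata is a hidden .<filename>.meta file in the same directory."""
    d, name = os.path.split(path)
    return os.path.join(d, "." + name + ".meta")

def generate_zero_map(path):
    """Pre-process a memory state file, recording which blocks are all zeros."""
    zero_blocks = []
    with open(path, "rb") as f:
        idx = 0
        while True:
            buf = f.read(BLOCK)
            if not buf:
                break
            if buf == b"\x00" * len(buf):
                zero_blocks.append(idx)
            idx += 1
    with open(meta_path(path), "w") as m:
        json.dump({"zero_blocks": zero_blocks}, m)

def read_block(path, idx, fetch_remote):
    """Satisfy reads of zero-filled blocks locally; fetch only non-zero blocks."""
    with open(meta_path(path)) as m:
        meta = json.load(m)
    if idx in meta["zero_blocks"]:
        return b"\x00" * BLOCK      # reconstructed locally, no wire traffic
    return fetch_remote(path, idx)  # forwarded to the remote proxy
```

In the 512MB-RAM example of Section 5.2, this kind of filtering let 92% of the 65,750 NFS reads be answered without contacting the server.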


Similar to the first scenario, GVFS can quickly instantiate a VM clone by using metadata handling for the memory state file and on-demand block-based access to the disk state files. After the computing starts, the proxy disk cache can help speed up access to the shared virtual disk files, and write-back can help save user time for writes to the redo logs. However, a key difference in this scenario is that a small set of golden VMs can be used to instantiate many clones, e.g., for parallel execution of a high-throughput task. The proxy disk caches can exploit temporal locality among cloned instances and accelerate the cloning process. On the compute server, the cached data of memory and disk state files from previous clones can greatly expedite new clonings from the same golden VMs. In addition, a second-level proxy disk cache can be set up on a LAN server, as explained in Section 4.2.1.3, to further exploit the locality and provide high speed access to the state of golden VMs for clonings to the same LAN.

In both of the above scenarios, the middleware-driven cache consistency discussed in Section 4.2.1.2 can be employed. Under a VM management system, such as VMPlant [106] and VMware VirtualCenter [107]: a VM with persistent state can be dedicated to a single user, where aggressive read and write caching with write delay can be used; a VM with non-persistent state can be read-shared among users but each user can have independent redo logs, where read caching for state files and write-back caching for redo logs can be employed. In both cases, the management middleware can control GVFS to write back VM state modifications at the end of the data sessions.

5.4 Evaluation

5.4.1 Setup

This section uses a group of typical benchmarks to evaluate the efficiency of using GVFS to support VM instantiations and executions across the network. Experiments were conducted in both local-area and wide-area environments. The LAN setup studies the overhead of using GVFS, whereas the WAN setup investigates its performance in the target grid environments.


The LAN state server is a dual-processor 1.3GHz Pentium-III cluster node with 1GB of RAM and 18GB of disk storage. The WAN state server is a dual-processor 1GHz Pentium-III cluster node with 1GB RAM and 45GB disk. In Section 5.4.2, the compute server is a 1.1GHz Pentium-III cluster node with 1GB of RAM and 18GB of SCSI disk; in Section 5.4.3, the compute servers are cluster nodes which have two 2.4GHz hyper-threaded Xeon processors with 1.5GB RAM and 18GB disk per node. All of the compute servers run VMware GSX server 2.5 to support x86-based VMs. The compute servers are connected with the LAN state server in a 100Mbit/s Ethernet at the University of Florida, and connected with the WAN state server through Abilene between Northwestern University and the University of Florida. The RTT from the compute servers to the LAN state server is around 0.17ms, whereas to the WAN state server it is around 32ms, as measured by RTTometer [97].

In the experiments on GVFS with disk caching, the cache was configured with 8GByte capacity, 512 file banks, and 16-way associativity. The GVFS prototype used here is based on NFSv2, which limits the maximum size of an on-the-wire read or write RPC to 8KB. However, in the experiments on native NFS, NFSv3 with 32KB block size was used to provide the best achievable results for comparison. Furthermore, all the experiments were initially set up with cold caches (both the kernel buffer cache and, if enabled, the proxy disk cache) by unmounting the remote file system and flushing the proxy cache if it was used. Private file system channels were always employed in GVFS during the experiments.

5.4.2 Performance of Application Executions within VMs

Three benchmarks that represent different typical usage of VMs were used to evaluate the performance of applications executing on GVFS-mounted VM environments:

SPECseis: a benchmark from the SPEC high-performance group. It consists of four phases, where the first phase generates a large trace file on disk and the last phase involves intensive seismic processing computations. The benchmark was tested in sequential mode with the small data set. It models a scientific application that is both I/O-intensive and


compute-intensive. SPECseis is used to study the performance of an application that exhibits a mix of compute-intensive and I/O-intensive phases.

LaTeX: a benchmark designed to model an interactive document processing session. It is based on the generation of a PDF (Portable Document File) version of a 190-page document edited by LaTeX. It runs the "latex", "bibtex", and "dvipdf" programs in sequence and iterates 20 times, where each time a different version of one of the LaTeX input files is used. This benchmark is used to study a scenario where users interact with a VM to customize an execution environment for an application that can then be cloned by other users for execution [106]. In this environment, it is important that interactive sessions for VM setup show good response times to the grid users.

Kernel compilation: a benchmark that represents file system usage in a software development environment, similar to the Andrew benchmark [25]. The kernel is a Linux 2.4.18 with the default configurations in a Red Hat 7.3 Workstation deployment, and the compilation consists of four major steps, "make dep", "make bzImage", "make modules", and "make modules_install", which involve substantial reads and writes on a large number of files.

The execution times of the above benchmarks within a VM, which has 512MB RAM and 2GB disk in VMware plain disk mode [108], runs Linux Red Hat 7.3, and stores the benchmark applications and their data sets, were measured in the following four different scenarios:

Local: The VM state was accessed from a local-disk file system.

LAN/G: The VM state was accessed from the LAN state server via GVFS without disk caching.

WAN/G: The VM state was accessed from the WAN state server via GVFS without disk caching.

WAN/GC: The VM state was accessed from the WAN state server via GVFS with disk caching.


Figure 5-3. The SPECseis benchmark execution time in VM. The results show the runtimes of various SPECseis phases with the VM state accessed from a local-disk file system (Local), the LAN state server via GVFS without disk caching (LAN/G), the WAN state server via GVFS without disk caching (WAN/G), and the WAN state server via GVFS with disk caching (WAN/GC).

Figure 5-3 shows the execution times of the four SPECseis phases. The performance of the compute-intensive part (phase 4) is within a 10% range across all scenarios. The results of the I/O-intensive part (phase 1), however, show a large difference between the WAN/G and WAN/GC scenarios: the latter is faster by a factor of 2.1. The benefit of a write-back policy is evident in phase 1, where a large file that is used as an input to the following phases is created. The use of disk caching in GVFS also brings down the total execution time by 33 percent in the wide-area environment.

The LaTeX benchmark results in Figure 5-4 show that in the WAN, interactive users would experience a startup latency of 225.67 seconds (WAN/G) or 217.33 seconds (WAN/GC). This overhead is substantial when compared to Local and LAN, which execute the first iteration in about 12 seconds. Nonetheless, the start-up overhead in these scenarios is much smaller than what one would experience if the entire VM state had to be downloaded from the state server for data access (2818 seconds). During subsequent


Figure 5-4. The LaTeX benchmark execution time in VM. The results show the runtime of the first run, the average runtime of the following 19 runs, and the total runtime, with the VM state accessed from a local-disk file system (Local), the LAN state server via GVFS without disk caching (LAN/G), the WAN state server via GVFS without disk caching (WAN/G), and the WAN state server via GVFS with disk caching (WAN/GC).

iterations, the kernel buffer can help to reduce the average response time for WAN/G to about 20 seconds. The use of GVFS disk caching can further improve it for WAN/GC to very close to Local (…% slower) and LAN/G (…% slower), which is 54% faster than WAN/G. The time needed to submit cached state modifications is around 160 seconds, which is also much shorter than the uploading time (… seconds) of the entire VM state in the download-upload data access model.

Experimental results from the kernel compilation benchmark are illustrated in Figure 5-5. The first run of the benchmark in the WAN/GC scenario, which begins with "cold" caches, shows an 84% overhead compared to that of the Local scenario. However, for the second run, the "warm" caches help to bring the overhead down to 9%, and compared to the second run of the LAN scenario, it is less than 4% slower. The availability of disk caching allows WAN/GC to outperform WAN/G by more than 30 percent. As


Figure 5-5. Kernel compilation benchmark execution time in VM. The results show the runtime for four different phases in two consecutive runs of the benchmark, with the VM state accessed from a local-disk file system (Local), the LAN state server via GVFS without disk caching (LAN/G), the WAN state server via GVFS without disk caching (WAN/G), and the WAN state server via GVFS with disk caching (WAN/GC).

in the LaTeX case, the data show that in an environment where program binaries and/or data sets are partially reused across iterations (e.g., in application development environments), the response times of WAN-mounted GVFS sessions are acceptable.

5.4.3 Performance of VM Cloning

Another benchmark is designed to investigate the performance of VM cloning under GVFS. The cloning scheme is as discussed in Section 5.3, which includes copying the configuration file, copying the memory state file, building symbolic links to the disk state files, configuring the clone, and at last resuming the new VM. The execution time of the benchmark was also measured in five different scenarios:

Local: The VM was cloned eight times sequentially from a local-disk file system.


Figure 5-6. Performance of a sequence of VM clonings from 1 to 8. Each cloned VM has 320MB of virtual memory and 1.6GB of virtual disk. The results show the VM cloning time in the different scenarios.

WAN-S1: The VM was cloned eight times sequentially from the WAN state server via GVFS. The clonings were supported by GVFS with private file system channel, proxy disk caching, and metadata handling. This scenario is designed to evaluate the performance when there is temporal locality among clonings.

WAN-S2: The setup is the same as WAN-S1 except that eight different VMs were each cloned once to the compute server sequentially. It is designed to evaluate the performance when there is no locality among clonings.

WAN-S3: The setup is the same as WAN-S2 except that a LAN server was used to provide a second-level proxy disk cache to the compute server. Eight different VMs were cloned, which were new to the compute server, but were pre-cached on the LAN server due to previous clonings to other compute servers in the same LAN. This setup is designed to model a scenario where there is temporal locality among the VMs cloned to the same LAN.


Table 5-1. Total times of cloning eight VMs in WAN-S1 and WAN-P when the caches (kernel buffer cache, proxy block-based cache, and proxy file-based cache) are cold and warm.

            Total time when caches are cold    Total time when caches are warm
  WAN-S1    1056 seconds                       200 seconds
  WAN-P     150.3 seconds                      32 seconds

WAN-P: Eight VMs were cloned in parallel from the WAN state server to eight compute servers via GVFS.

Figure 5-6 shows the cloning time for a sequence of VMs which have 320MB of virtual memory and 1.6GB of virtual disk. In comparison with the range of GVFS-based cloning times shown in the figure, if the VM is cloned using Secure Copy for full-file copying, it takes approximately twenty minutes to transfer the entire state. If the VM state is not copied but is read from a native NFS-mounted directory, the cloning takes more than half an hour, because the block-based transfer of the memory state file is slow. However, the enhanced GVFS, with proxy disk caches and metadata support to compress (using GZIP) and transfer (using SCP) the VM's memory state, can greatly speed up the cloning process to within 160 seconds. Furthermore, if there is temporal locality of accesses to the memory and disk state files among the clones, GVFS even allows the cloning to be performed within 25 seconds if data are cached on local disks, or within 80 seconds if data are cached on a LAN server.

Table 5-1 compares the performance of sequential cloning with parallel cloning via GVFS. In the experiment of WAN-P, the eight compute servers shared a single state server and GVFS proxy server. But when the eight clonings started in parallel, each proxy client spawned a file-based data channel to fetch the memory state file on demand. The speedup from parallel cloning versus sequential cloning is more than 700% when the caches are cold and more than 600% when the caches are warm. Compared with the average time to clone a single VM in the sequential case (WAN-S1), the total time for cloning eight VMs in parallel is only 14% longer with cold caches and 24% longer with


warm caches, which implies that GVFS' support for VM cloning can scale very well to parallel cloning of a large number of VMs. In both scenarios, the support from GVFS is on demand and transparent to the user and VM monitor. In addition, as demonstrated in Section 5.4.2, following a VM's instantiation via cloning, GVFS can also improve its run-time performance substantially.
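The file-based data channel that each proxy client spawned in WAN-P follows the compress, remotely copy, uncompress, and read locally action sequence of Section 5.2. A minimal sketch is given below; the function names are hypothetical, a caller-supplied remote_copy stands in for GSI-enabled SCP, and the server-side GZIP step is simulated locally:

```python
import gzip
import os
import shutil

def fetch_via_file_channel(remote_path, cache_dir, remote_copy):
    """Sketch of the metadata-driven file-based channel: compress on the
    server, securely copy, uncompress into the file cache, then serve
    all subsequent requests from the locally cached file."""
    os.makedirs(cache_dir, exist_ok=True)
    # 1. compress the file on the server (simulated locally in this sketch)
    gz = remote_path + ".gz"
    with open(remote_path, "rb") as src, gzip.open(gz, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # 2. remotely copy the compressed file to the client
    local_gz = os.path.join(cache_dir, os.path.basename(gz))
    remote_copy(gz, local_gz)
    # 3. uncompress it into the file-based cache
    cached = local_gz[:-3]
    with gzip.open(local_gz, "rb") as src, open(cached, "wb") as dst:
        shutil.copyfileobj(src, dst)
    # 4. subsequent block requests are satisfied from the cached file
    return cached
```

Since memory state files are often highly compressible, this whole-file path can be much faster than block-by-block NFS transfer when the entire file is known to be needed.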


CHAPTER 6
SERVICE-ORIENTED AUTONOMIC DATA MANAGEMENT

The data management system proposed in this dissertation is built upon two levels of software. The first level is based on the GVFS approach described in the previous chapters, which addresses the problem of providing transparent and application-tailored data access in grid-style environments. Based on GVFS, data sessions can be created on demand for applications upon the shared physical resources, as illustrated in Figure 3-1. These sessions can be independently customized to provide the desired data access according to their application needs, e.g., to serve the diverse applications described in Section 4.1.

The second level of the proposed data management system addresses the problem of managing data provisioning in a large, dynamic system, i.e., how to manage many dynamic and diverse GVFS data sessions in a large-scale system. It is desirable that data management can leverage the knowledge of applications (characteristics, usage scenarios, service quality requirements) to optimize the data provisioning, and that it can be done automatically according to high-level objectives (e.g., Quality of Service). This chapter describes service-oriented middleware designed towards these goals. Service-based management hides the complexity of data provisioning from clients (users or other middleware), and transparently prepares data for their applications through interoperable interfaces. Intelligence can be further embedded into the services to automatically optimize data access according to the QoS goals specified by clients.

6.1 Service-Based Data Management

6.1.1 Architecture

Figure 6-1 illustrates the overall architecture of the proposed service-oriented data management. It supports dynamic management of grid-scale data provisioning by means of service-based management middleware (the control flow, dashed lines) and GVFS-based


Figure 6-1. Example of GVFS sessions established by the data management services on compute servers (C1, C2) and file servers (F1, F2). In step 1, the job scheduler requests the DSS (Data Scheduler Service) to start a session between C1 and F1; in step 2, the DSS queries the DRS (Data Replication Service) for replica information; it then requests in step 3 the FSS (File System Service) on F1 to start the proxy server (step 4). The DSS also requests the FSS on C1 to start the proxy client and mount the file system (steps 5, 6). The job scheduler can then start a task on C1 (step 7), which can access the data on server F1 through session I. Sessions II, III and IV are isolated from session I.

data sessions¹ (the data flow, shaded regions). The figure shows examples of data sessions established by the data management services. Sessions are independently configured and isolated from each other through the services and proxies. Several sessions can also share the same data set by connecting to the same proxy server (e.g., Sessions II and III in Figure 6-1).

¹ The service also supports file-based data transfers for the data flow, as described in Section 6.1.3.1.
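The session-establishment flow in the caption above can be sketched as follows. The class and method names are hypothetical (the actual services expose WSRF operations through WSRF::Lite), but the call order mirrors steps 2 through 6 of Figure 6-1:

```python
class DataSchedulerService:
    """Sketch of the DSS orchestrating a GVFS session (Figure 6-1).
    `drs` and the per-host `fss` objects stand in for the real services."""

    def __init__(self, drs, fss_by_host):
        self.drs = drs           # Data Replication Service
        self.fss = fss_by_host   # File System Service on each host
        self.sessions = {}       # stateful resources: session configurations

    def create_session(self, session_id, client, mountpoint, server, export):
        # Step 2: query the DRS for replicas of the requested data set
        replicas = self.drs.query(server, export)
        # Steps 3-4: ask the server-side FSS to start a proxy server
        port = self.fss[server].start_proxy_server(export)
        # Steps 5-6: ask the client-side FSS to start the proxy client and
        # mount the file system, passing replica locations for failover
        self.fss[client].start_proxy_client(mountpoint, server, port, replicas)
        self.sessions[session_id] = (client, server)
        return session_id
```

A job scheduler (step 1) would call create_session before dispatching the task (step 7); tearing down a session would reverse these calls through the same FSS instances.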


Fundamentally, the goal of this architecture is to enable flexible, secure resource sharing. This involves the establishment of relationships between providers and users that are complex and often conflicting in distributed environments. From a user's standpoint, resources should ideally be customizable to their needs, regardless of their location. From a provider's standpoint, resources should ideally be configured in a single, consistent way. Otherwise, sharing is hindered by a provider's inability to accommodate individual user needs (and the associated security risks) and by the user's inability to effectively use systems over which they have limited control.

To this end, the proposed service-oriented approach builds upon two key aspects of the Web Service Resource Framework (WSRF): interoperability in the definition, publishing, discovery, and interactions of services [109][68][110], and state management for controlling data access sessions that persist throughout the execution of an application. It also builds upon a virtualized data access layer that supports user-level customization. As a result, the services are deployed once by the provider, and can then be accessed by authorized users to create and customize independent data access sessions.

These data management services are intended for use by both end-users and middleware brokers (e.g., job schedulers) acting on their behalf. In either case, it is assumed that the user or middleware client can authenticate to the service host, directly or indirectly through delegation, leveraging authentication support at the WSRF layer, and obtain access to a local user identity on the host (e.g., via GSI-based grid-to-local account mappings, or via middleware-allocated, "logical" user accounts [111][112]). See Section 6.1.4 for more details about the security architecture.

The remainder of this chapter is organized as follows. Section 6.1.2 presents the details of the proposed data management services. Section 6.1.3 describes how they are employed to manage application-tailored data sessions. Section 6.1.5 discusses several usage examples.


6.1.2 The WSRF-Based Data Management Services

6.1.2.1 File system service

The File System Service (FSS) runs on every compute and file server and controls the local GVFS proxies. It implements the establishment and customization of file system sessions through the management of the proxies. The proxy processes are the stateful resources to the service, and the service provides the interface to start, configure, monitor, and terminate them. Their state (e.g., proxy configurations) is stored in files on local disk. A proxy client is associated with a single session; a proxy server, however, can be involved in more than one session to support data sharing among multiple applications (e.g., sessions II and III in Figure 6-1).

The service customizes a proxy via configurations defined in a file and can signal it to dynamically reconfigure itself by reloading the file (Figure 4-6). The configuration file holds information including disk cache parameters, cache consistency protocol, security configuration, and data replica location. They are represented as WS-ResourceProperty and can be viewed and modified with the standard WSRF operations getResourceProperty and setResourceProperty. When the FSS receives a request for a session's status, it signals the proxy to report the accumulated statistics (number of RPC calls, resource usage, etc.) and to issue an NFS NULL call to the server to check whether the connection is still alive.

6.1.2.2 Data scheduler service

The Data Scheduler Service (DSS) is in charge of the creation and management of GVFS sessions. These sessions are associated to the service as its stateful resources, and their state (e.g., session configurations) is maintained in a database. The service supports the operations of creating, configuring, monitoring, and tearing down a session.

A request to create a session needs to specify the two endpoint locations (client's IP address and mount point, server's IP address and file system export path) and the desired configurations of the session in the aspects of caching, multithreading, consistency,


security, and reliability. The DSS first checks its information about other sessions to resolve sharing conflicts. For example, if the same data set is accessed by another session with write-back caching enabled, the service can interact with the corresponding FSS to force that session to submit the cached data modifications and disable write-back afterwards.

When there is no conflict, the DSS can proceed to start the session (Figure 6-1). It asks the server-side FSS to start the proxy server and the client-side FSS to start the proxy client, and then establishes the connection. Before sending a request to the client-side FSS, the DSS also queries the DRS (a service described below). If there are replicas for the data set, their locations are also sent along with the request, so that in case of failure the session can be redirected to a backup server.

Note that a session is set up for a particular job submitted by a user or service. If there is an irresolvable data sharing conflict when scheduling a session (e.g., the data set is currently under exclusive access by another session), the DSS cannot establish the session and it returns an error to the service client.

The DSS can also dynamically reconfigure a session's configurations and monitor its status as needed, via the configuring and monitoring operations on the session's proxies through the corresponding FSSs. It associates the resource ID of the session with the resource IDs of the session's proxies, so that it can reference the concerned proxies during its interactions with the FSS.

6.1.2.3 Data replication service

The Data Replication Service (DRS) is responsible for managing data replication, and its stateful resources are the data replicas. The service exposes interfaces for creating, destroying, and querying a given data set's replicas. The state of the replicas is stored in a relational database, which facilitates the query and manipulation of information about replicas. The service can be queried with the location (IP address of the server and


path to the data on the server) of a data set (primary or backup one), and it returns the locations of all the replicas.

A request to create a replica needs to specify the locations of the source data and the desired replica. If a replica does not already exist at the requested location, the DRS then interacts with the DSS to schedule a session between the source and destination and have the data replicated. Such a data session can employ the efficient bulk-data transfer mechanisms described below (Section 6.1.3.1) to achieve high-throughput replication, especially for wide-area data replication. A replica can also be destroyed as needed through the DRS, which contacts the DSS to schedule the replica removal from the specified location, after it becomes unused by existing sessions. Whenever a replica is created or destroyed, the DRS updates the database accordingly.

The prototype of the above described data management services has been built using WSRF::Lite [71], a Perl-based implementation of WSRF. The database needed to store the resource state for the DSS and DRS has been implemented using MySQL.

6.1.3 Application-Tailored Data Sessions

The data management services are capable of creating and managing dynamic GVFS sessions. Unlike traditional distributed file systems, which are statically set up for general-purpose data access, each GVFS session is established for a particular task. Hence the services can apply application-tailored customizations on these sessions to enhance data access in the aspects of performance, consistency, security, and fault tolerance. Figure 6-2 illustrates the use of several such enhancements on a session to improve data access performance and reliability. The following three subsections describe the choices that can currently be made on a per-application basis.

6.1.3.1 Grid data access and file transfer

FTP-based tools can often achieve high throughput for large-size file movements [6], but the application's data access pattern needs to be well defined to employ such utilities. For applications which have complex data access patterns, and for those


Figure 6-2. Application-tailored enhancements for a GVFS session. Read requests are satisfied from the remote server or the proxy cache. Writes are forwarded to the loopback ROW server and stored in shadow files. When a request to the remote server fails, it is redirected to the backup server.

that operate on sparse data sets, the generic file system interface and partial-data transfer supported by GVFS are advantageous. Both models are supported by the data management services.

The FSS can configure data access sessions based on file system proxies. According to the information about the job accounts and file accounts provided by the DSS, the FSS dynamically sets up cross-domain identity mappings (e.g., remote-to-local user/group IDs) on a per-session basis. The FSS can configure the GVFS session with application-tailored caching, security, and reliability configurations, as discussed later. It is also capable of dynamically reconfiguring a GVFS session based on changed data access requirements, for example, when a session's data set becomes shared by multiple sessions.

The services can also employ high-throughput data transfer mechanisms (e.g., GridFTP [6], SFTP/GSI-OpenSSH [105]) if it is known in advance that applications use whole-file transfers. This scenario can be dealt with in two different ways. In the conventional way, a user or service authenticates through the DSS, which requests the FSS to transfer files on behalf of the user: downloading the required inputs and presenting
them to the application before the execution; uploading the specified outputs to the server after the execution.

The FTP-style data transfer can also be exploited by GVFS while maintaining the generic file system interface. The proxy client uses this functionality to fetch entire needed large files to a local cache, but the application still operates on the files through the kernel NFS client and proxy client in a block-based fashion. In this way, the selection of data transfer mechanism becomes transparent to applications and can be leveraged by unmodified applications. Such an application-selective data transfer session has been shown to improve the performance of instantiating grid virtual machines (VMs) [20] in Section 5.2, and it can be applied to support other applications as well through the data management services.

Block-based or file-based disk caching can be employed by data sessions to support the above different data transfer mechanisms, leveraging locality to improve remote data access performance (Figure 6-2). Each session's cache can be independently customized in terms of both parameters (size, associativity) and policies (read-only, write-through, write-back), according to its application's characteristics and requirements.

6.1.3.2 Cache consistency

Applications can also benefit from the availability of different cache consistency models. The DSS and FSS services enable applications to select well-suited strong or weak consistency models by dynamically customizing GVFS sessions. Different cache consistency protocols are overlaid upon the native NFS client-polling mechanism by the user-level proxies, as discussed in Section 4.3.

Typical NFS clients use per-file and per-directory timers to determine when to poll a server. This can lead to unnecessary traffic if files do not change often and timers are set to too small a value on one hand, and long delays in updates if timers have large values on the other hand. Across wide-area networks, revalidation calls contribute to
long application-perceived latencies. In contrast, the overlaid models customize the use of consistency checks on a per-GVFS-session basis.

Because the data management services dynamically establish sessions that can be independently configured, the overlaid consistency protocol can be selected to improve performance when it is applicable. Two examples where overlaid consistency protocols can improve performance are described below:

Single-client sessions: when a task is known to the scheduler to be independent (e.g., in high-throughput task farm jobs), aggressive client-side caching can be enabled for both read and write data, and consistency checks can be satisfied completely locally to achieve the best possible performance. As writes are delayed on the client, the data may become inconsistent with the server. But from the session's point of view, its data can be guaranteed to be consistent by the DSS. Consistency actions that apply to a session are initiated through the DSS on two occasions: (1) when the task finishes and the session is to be terminated, the cached data modifications are automatically submitted to the server; (2) when the data are to be shared with other sessions, the DSS reconfigures the session by forcing it to write back cached modifications and disable write-back henceforth. In either case, the DSS waits for the write-back to complete before it starts another session on the same data.

Multiple-client sessions: For GVFS sessions where exclusive write access to data is not necessary or not possible, the scheduler can apply relaxed cache consistency protocols on these sessions to improve performance, e.g., the invalidation-polling based consistency discussed in Section 4.3.2. For applications that cannot tolerate any inconsistency, strong consistency protocols can be employed by their sessions, such as the delegation-callback based consistency introduced in Section 4.3.3.

6.1.3.3 Fault tolerance

Reliable execution is crucial for many applications, especially long-running computing and simulation tasks. The data management services currently provide two techniques for
improved fault tolerance: client-side ROW-assisted checkpointing, and server replication and session redirection (Figure 6-2).

Redirect-on-Write (ROW) file system [113]: The services can enable ROW on a GVFS session, so all file system modifications produced by the client are transparently buffered in local stable storage. In such a session, the proxy client splits the data requests across two servers: reads go to the remote main server, and writes are redirected to a local ROW server.² This approach relies on the opaque nature of NFS file handles to allow for virtual handles that are always returned to the client, but mapped to different physical file handles at the main and ROW servers. Files whose contents are modified by the client have "shadow" files created by the ROW server as sparse files, and block-based modifications are inserted in-place in the shadow file.

When an application is checkpointed, the FSS can request the checkpointing of all buffered modifications in the shadow file system. Then, when recovery from a client-side failure is needed, as the application is rolled back to the previously saved state, the FSS also rolls back the application's data modifications to the corresponding state. Without the ROW mechanism, when the application rolls back, the modifications on the files since the last checkpoint are already reflected on the server; the data state thus becomes inconsistent with the application state and the recovery cannot proceed correctly. For instance, files already deleted on the server may be needed by the application again after it is rolled back, which will cause the application to fail. A number of checkpointing techniques can be employed in this approach, including [114][115]. One particular case is the checkpointing of an entire VM, which includes both the application instance and the ROW file system. This is demonstrated in the following experiment.

² Reads of files that have been modified by the client are routed to the ROW server, instead of the main server.
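The virtual-to-physical handle mapping and shadow-file buffering described above can be illustrated with a minimal sketch. All names are hypothetical, and Python dictionaries stand in for the proxy's state; the real proxy operates on opaque NFS file handles inside RPC messages:

```python
# Minimal sketch of redirect-on-write (ROW) request splitting. Hypothetical
# names throughout; dicts stand in for the proxy's handle map and the ROW
# server's sparse shadow files.

class RowProxy:
    def __init__(self):
        # virtual handle -> (main server handle, shadow handle or None)
        self.handle_map = {}
        self.shadow = {}  # shadow handle -> {offset: block} (sparse file)

    def register(self, vhandle, main_handle):
        self.handle_map[vhandle] = (main_handle, None)

    def write(self, vhandle, offset, block):
        # Writes never reach the main server: create a shadow file on the
        # first write and insert the modified block in place.
        main, shadow = self.handle_map[vhandle]
        if shadow is None:
            shadow = "shadow:" + main
            self.handle_map[vhandle] = (main, shadow)
            self.shadow[shadow] = {}
        self.shadow[shadow][offset] = block

    def read(self, vhandle, offset, fetch_from_main):
        # Modified blocks are served by the ROW server; unmodified blocks
        # still come from the remote main server.
        main, shadow = self.handle_map[vhandle]
        if shadow is not None and offset in self.shadow[shadow]:
            return self.shadow[shadow][offset]
        return fetch_from_main(main, offset)

    def discard(self):
        # Rolling back to a checkpoint simply drops the buffered state.
        self.shadow.clear()
        self.handle_map = {v: (m, None) for v, (m, _) in self.handle_map.items()}
```

Discarding the buffered modifications on rollback is what keeps the server-side data consistent with a restored checkpoint, since none of the client's changes ever reached the main server.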
This experiment models a scenario where a VM running an arbitrary application is checkpointed, continues to execute, and later fails. Before failing, the application changes the state of the file server irreversibly, e.g., by deleting some temporary files. This case was tested with the Gaussian computational chemistry application running on a compute VM and using data mounted from a data VM. The experiment shows that, in native NFS, when the compute VM was resumed to its previous checkpointed state, the temporary files needed by the application were already gone on the data VM, so NFS reported a stale file handle error and the application aborted. In contrast, with GVFS and the ROW enhancement, the data state was stored along with the application state as part of the VM's checkpoint, and the file deletions were buffered locally and did not take place on the data VM. So when the compute VM was resumed from the checkpoint, the application was recovered successfully, because the data modifications that happened after the checkpoint were discarded from the ROW server and the data on the data VM were still in a consistent state with the application.

Server replication and session redirection: Replication is a common practice for fault tolerance, and it is employed in GVFS to improve the reliability of data sessions, as discussed in Section 4.5. The data management services support the use of server replication and session redirection to tolerate server-side failures, including server crashes and network partitions, as follows. To prepare a GVFS session with replicated datasets, the DSS checks the DRS for information about the dataset's existing replicas (the IP address of a replica server and the path to the replica on the server). If the dataset does not have a sufficient number of replicas to support the application-needed replication degree (the number of replicas for the dataset), the services can be asked to create the replicas on the desired file servers.

The DSS then requests the FSS on each replica server to start a proxy server for this session. It also passes on the replica information to the FSS on the client, so that the proxy client can be started with connections to all the proxy servers. During such
a session, the proxy client virtualizes the dataset across the replicas and transparently detects a failure as well as recovers from it based on the fault-tolerance protocols described in Section 4.5. The choice of replication degree, replica placement, and consistency scheme for a GVFS session can all be customized by the services according to the application's requirements.

The data management services can also customize the security configurations of GVFS data sessions in terms of the security policies and mechanisms discussed in Section 4.4. However, a complete security architecture is needed to provide security to not only the data sessions but also the management services, and these two levels of security need to be consistent with each other and be compatible with other security middleware. The following subsection presents such a security architecture designed to achieve these goals.

6.1.4 Security Architecture

This dissertation proposes a two-level security architecture for GVFS-based grid data management (Figure 6-3). It leverages transport-level security to protect the file system traffic of GVFS and employs message-level security to secure the interactions with the management services. Both levels utilize widely accepted security tokens (X.509 [43]/GSI certificates [51]) to support grid user authentication and file access control. The design and implementation of the file system level security are already discussed in Section 4.4.2. Therefore, the rest of this subsection focuses on the service-level security and its interaction with the file system level security.

To create a GVFS session through the services, a grid user (or a service that acts on behalf of the user) needs to authenticate with the DSS using the user's certificate. Authorization is performed by checking the certificate against an access control list (ACL), or consulting a dedicated authorization service. An authorized user can then proceed to delegate to the management services the right to create a GVFS session on behalf of the user: the DSS uses the user's certificate to interact with the client- and server-side
Figure 6-3. Security architecture of the GVFS-based data management system. Transport-level security is leveraged to protect the data access on GVFS, while message-level security is employed to secure the interactions with the management services. Grid user certificates and ACLs are used for authentication and access control.

FSSs, which in turn configure the proxies to use this certificate to establish a secure file system session.

Security for service interactions is enabled at the message level (Figure 6-3). Despite its performance inefficiency, message-level security offers great functionality at the service level, which is necessary for the management services to use and interact with other high-level services. These services are not in the critical path of grid data access; they are only involved infrequently, when control is needed on a GVFS session, specifically, when creating, configuring, and destroying a session. Therefore, the use of more expensive security mechanisms does not hurt a GVFS session's I/O performance and has negligible overhead compared to a session's lifecycle.

The data management services are implemented using WSRF::Lite [71], a Perl-based implementation of WSRF (Web Services Resource Framework) [68]. This tool supports signing and verifying SOAP messages using X.509 certificates according to the WS-Security standard [56], which is utilized to enable grid security at the service level.
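The ACL-based authorization step described above (checking an authenticated user's certificate against an access control list before allowing a management operation) can be sketched minimally as follows. The subject strings and ACL format are invented for illustration; a real GSI deployment would first validate the full X.509 certificate chain with a cryptographic library rather than trust a subject string:

```python
# Hypothetical sketch of ACL-based authorization for management-service
# requests. The DN strings and operation names are illustrative only.

ACL = {
    "/O=Grid/OU=UF/CN=alice": {"create_session", "destroy_session"},
    "/O=Grid/OU=UF/CN=bob":   {"create_session"},
}

def authorize(subject, operation, acl=ACL):
    """Return True if the (already authenticated) certificate subject is
    permitted to perform the requested management operation."""
    return operation in acl.get(subject, set())
```

A dedicated authorization service, as mentioned above, would replace the local table lookup with a remote query while keeping the same decision interface.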
The resulting data management services can securely communicate with each other, use the grid user and server certificates to perform authentication and authorization, and then control the GVFS proxies to use these certificates for the file system level security.

As mentioned earlier, the management services are responsible for creating and customizing GVFS-based data sessions on behalf of grid users or services. These operations are provided through Web service interfaces, which conform to the WSRF standard; meanwhile, the security of the service-level interactions also follows Web service security standards and is compatible with GSI. Such compatibility is important to providing interoperability with other grid services, e.g., to serve a job scheduler which needs to prepare data access for the jobs submitted to a grid resource.

The management services support flexible grid file access control for GVFS sessions using the mechanisms discussed in Section 4.4.2.2. Per-file-system ACLs are stored in the DSS database and are used to automatically generate the sessions' grid map files. For fine-grained access control, the services also provide an interface to manage the per-file/directory ACLs stored along with the exported files and directories. However, in a large grid system, the access control to grid resources is often dedicated to the virtual organization's security service (e.g., a Community Authorization Service [116]). The GVFS services can consult such a special security service for file access control decisions at the granularity of individual grid users or groups of users.

6.1.5 Usage Examples

This subsection describes two examples of using the management services discussed above to manage GVFS data sessions for two important applications.

6.1.5.1 Virtual machine based grid computing

As discussed in Chapter 5, GVFS has been applied to support VM-based grid computing. VMs have been demonstrated as an effective way to provide secure and flexible application execution environments in grids [17]. The dynamic instantiation of VMs requires efficient data management: both VM state and user/application data need
Figure 6-4. The VM-based grid computing system supported by the data management services. To instantiate a compute VM, the VMPlant service requests the DSS to schedule a GVFS session between the VM state server and the VM host. After the VM is started, the job scheduler service can then request the DSS to schedule another session between the compute VM and the data VM, to provide tailored access to application/user data for the job submitted to the compute VM. The DRS also allows for replication of data VMs for improved reliability.

to be provisioned across the network. Previous work has described a VMPlant grid service to support the selection, cloning, and instantiation of VMs [106]. The data management services provide functionality that complements VMPlant to support VM-based grid systems, as depicted in Figure 6-4.

In such a system, the VMPlant service is in charge of managing VMs, including the ones used for computing (execution of applications) and data storage (of application and user data). To instantiate a compute VM for job submission, the VMPlant service requests the DSS to schedule a GVFS session between the VM state server and the VM host, where the VM state transfer can be optimized in the way discussed in Chapter 5. After the VM is started, the job scheduler service can then request the DSS to schedule another
session between the compute VM and the data VM, which provides tailored access to application/user data for the job submitted to the compute VM.

The DRS allows for replication of data VMs for improved reliability. The VM replication can be conveniently done by copying its state files via DSS-scheduled data sessions that use high-throughput transfer mechanisms. The replicated VMs can be distributed across different physical servers and sites to provide tolerance of server and network failures. When a failure happens, the services transparently redirect the application's remote data access to a backup VM and regenerate the lost data VM. To tolerate client-side failures, the services can checkpoint and resume VM instances using the techniques available in existing VM monitors (e.g., VMware suspend/resume, scrapbook UML [117], Xen suspend/resume). With ROW enabled in the GVFS session, buffered data modifications induced by the application execution are also checkpointed as part of the VM's saved state. So, upon failure of the compute VM, the application along with its data changes can be consistently resumed from the last checkpoint.

6.1.5.2 Workflow execution

A workflow typically consists of a series of phases, where in each phase a job is executed using inputs that may be dependent on the outputs of the previous phases. Workflow data requirements can be managed by the DSS with GVFS sessions created on a per-phase basis, and each session can be tailored to suit the corresponding job's needs through the service. In particular, the control over enabling and disabling the consistency protocols and synchronizing client/server data copies is available via the service interface. Hence scheduling middleware can select and steer consistency models throughout the execution of the workflow.

For instance, a typical workflow in Monte-Carlo simulations consists of running large numbers of independent jobs. Their outputs are then post-processed to provide summary results. This two-phase workflow's execution can be supported by the data management services with a data flow (Figure 6-5) such that (1) a session is created
Figure 6-5. A Monte-Carlo workflow and the corresponding data flow supported by the data management services. To support this two-phase workflow's execution, a session is created for each independent simulation job with an individual cache for read/write data; each session is forced to write back and then disable write delay as the simulation jobs complete; and a new session with invalidation-polling consistency is created for running the post-processing jobs that consume the produced data.

for each independent simulation job with an individual cache for read/write data, (2) each session is forced to write back and then disable write delay as the simulation jobs complete, and (3) a new session with the invalidation-polling consistency protocol is created for running the post-processing jobs that consume the data produced in step (1). Such a workflow can be supported by the In-VIGO system [2][3], where a configuration file is provided by the installer to specify the data requirement and preferred consistency model for each phase. When it is requested by a user via the In-VIGO portal, the virtual application manager interacts with the resource manager to allocate the necessary resources, interacts with the data management services to prepare the required GVFS sessions, and then submits and monitors the execution, for each phase of the workflow.
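The per-phase tailoring described above could be driven by a configuration description along the lines of the following sketch. All field and policy names here are invented for illustration; the actual In-VIGO configuration file format is not reproduced:

```python
# Hypothetical per-phase session description for the two-phase Monte-Carlo
# workflow; field names are illustrative only.

PHASES = [
    {   # Phase 1: independent simulation jobs, private write-back caches
        "name": "simulate",
        "session_per_job": True,
        "cache": {"policy": "write-back", "private": True},
        "consistency": "none",  # no sharing while the jobs run
        "on_complete": ["write-back", "disable-write-delay"],
    },
    {   # Phase 2: post-processing of the data produced in phase 1
        "name": "post-process",
        "session_per_job": False,
        "cache": {"policy": "write-through", "private": False},
        "consistency": "invalidation-polling",
        "on_complete": ["write-back"],
    },
]

def sessions_for(phase, jobs):
    """Expand a phase description into the GVFS sessions the DSS would
    create: one per job for independent simulation jobs, one shared
    session otherwise."""
    n = len(jobs) if phase["session_per_job"] else 1
    return [{"phase": phase["name"], "config": phase} for _ in range(n)]
```

The point of the structure is that the scheduling middleware only declares requirements per phase; the DSS maps each declaration onto concrete session configurations.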
6.2 Autonomic Data Management

The service-oriented data provisioning and management framework described in the previous sections can serve end-users or job scheduling middleware (e.g., the In-VIGO virtual application manager [2]) to prepare data sessions for application executions. A key challenge faced by such middleware in a grid-style environment is the complexity of managing the performance of many on-demand application instances in the presence of dynamic resource availability. An autonomic application management service [118] is proposed to automatically recover jobs from performance faults based on monitoring and predictions using application execution history and dynamic resource information. The data management services discussed in this dissertation are key to supporting transparent resubmission of jobs, providing fast and on-demand data session establishment to the application management service. On the other hand, the idea of autonomic management can also be applied to the data management itself in order to reduce its complexity in a grid-style environment.

The use of user-level virtualization to create dynamic, per-application GVFS sessions and the use of service-oriented middleware to control the lifetimes and configurations of the sessions provide the basis for supporting tailored data provisioning to applications in large-scale grid systems. However, the management of GVFS-based data sessions in such an environment is still challenging because of its complexity: large numbers of data sessions need to be created, customized, and terminated on demand based on the applications' lifecycles and requirements; their configurations also need to be adapted in a timely manner according to the changes to application workloads and the usage of shared processor, network, and storage resources. It is desirable that the data sessions can manage themselves to automatically achieve the performance and reliability expected by users or job schedulers, so that the management can be simple and the system can be agile.

This section describes the approach to realizing this goal by evolving the data management services into autonomic elements to provide goal-driven self-management of
Figure 6-6. The autonomic data management system consists of the autonomic data scheduler service, replication service, and client- and server-side file system services. They function as self-managing autonomic elements, which control the client, server, and session of a GVFS data session according to the high-level objectives, and interact with each other to automatically achieve the desired data provisioning behaviors.

the data provisioning (Figure 6-6). The resulting autonomic data management services are capable of automatically monitoring, analyzing, and optimizing the different entities of grid-wide data sessions, as well as cooperatively working together to achieve the desired data provisioning goals. The rest of this section presents the details of these autonomic services as well as an experimental evaluation.

6.2.1 Autonomic Data Scheduler Service

As described in Section 6.1, the Data Scheduler Service (DSS) schedules data sessions for application executions. It interacts with the Data Replication Service (DRS) to request data replication, and interacts with client-side and server-side File System Services (FSSs) to create, configure, and terminate sessions. It is responsible for customizing and isolating different sessions with different configurations. One of the most important session
parameters that can be customized is the size of the client-side disk cache employed by GVFS.

Kernel NFS clients typically buffer data and metadata in memory, but the use of disk caching is rare. In wide-area, long-latency environments, the aggressive use of disk caching can be beneficial to many applications. Therefore, GVFS implements client-side per-session disk caches, which can leverage the large capacity and persistence of disks to further exploit data locality (Section 4.2.1). However, as the dataset size of modern scientific and commercial applications grows rapidly, the DSS needs to carefully manage the storage use for caching, which can have an important impact on the performance of GVFS data sessions.

An application's remote I/O time is estimated by

    T = N · r_mem · t_mem + N · r_disk · t_disk + N · (1 − r_mem − r_disk) · t_network

where N is the total number of remote data requests issued by the application; r_mem is the memory buffer hit rate and r_disk is the disk cache hit rate; and t_mem, t_disk, and t_network are the average service times of a request from memory, local disks, and network storage, respectively.

A data-intensive grid application typically has r_disk >> r_mem and t_network >> t_disk >> t_mem, so the hit rate of the disk cache is crucial to delivering good application data access performance. A larger cache results in a better hit rate, because capacity misses and conflict misses generally decrease as the cache size grows [92]. However, the relationship between a cache's size and hit rate is a complex one, depending on the locality of data references and the associativity of the cache. Therefore, the DSS by default takes a conservative approach in which a session's disk cache is configured with a size larger than its application's dataset.

There are also important scenarios where the DSS needs to configure sessions with smaller disk caches. For example, when a node is more powerful or closer to the data server than the other nodes, it is chosen to execute the application because it can provide
better performance even if its storage cannot hold the entire dataset. In another common case, multiple applications need to execute on the same node and the available disk space is not enough for their datasets. The DSS can schedule their sessions to run sequentially with full-size disk caches. However, it may be necessary or more beneficial to configure each one with a smaller disk cache and run them concurrently, e.g., in order to meet deadline requirements or achieve a shorter total runtime.

If an application's data access pattern is known from the knowledge base, the DSS can use existing methods [119] to estimate the session's miss rate given the configured cache size, and then estimate its remote I/O performance using the above equation. t_network is monitored online³; N is known from history and is offset by the number of already transferred requests, which is also monitored online. This information can help the DSS allocate the available storage among multiple sessions that are scheduled to the same node.

A session i's utility U_i represents the value of providing a given level of service to the application. It can be calculated by considering the deliverable session performance and the application priority,

    U_i = Performance_i × Priority_i

where shorter runtime and higher priority generate a greater utility value. Since a session's remote I/O time is affected by its disk cache size, given the available storage space as the constraint, the optimal allocation is achieved when the total utility from the different sessions on the node is maximized. The complexity of this optimization computation is bounded by the limited number of concurrent sessions and possible cache sizes. To perform the above analysis, the DSS must monitor the storage usage and the data server response time on the client. This is realized by interacting with the client-side FSS, which operates an effective monitoring daemon.

³ Besides data requests, kernel NFS clients also issue a considerable number of metadata requests for consistency checks. Most of these requests can be satisfied by GVFS disk caches with the consistency protocols discussed in Section 4.3.
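Because the numbers of concurrent sessions and candidate cache sizes are both small, the utility-maximizing allocation can be found by exhaustive search. The following is a minimal sketch with made-up performance curves and priorities; in practice, perf(size) would be derived from the remote I/O time estimate T above (e.g., as its inverse), using the miss-rate estimation methods cited earlier:

```python
# Sketch of utility-based disk cache allocation across co-located sessions.
# Performance functions and priorities are illustrative placeholders.
from itertools import product

def best_allocation(sessions, total_storage, sizes):
    """Pick one cache size per session so that the total utility
    sum(perf(size) * priority) is maximized subject to the storage
    constraint. Exhaustive search is feasible because the numbers of
    concurrent sessions and candidate sizes are small."""
    best, best_util = None, float("-inf")
    for combo in product(sizes, repeat=len(sessions)):
        if sum(combo) > total_storage:
            continue  # violates the available-storage constraint
        util = sum(s["perf"](sz) * s["priority"]
                   for s, sz in zip(sessions, combo))
        if util > best_util:
            best, best_util = combo, util
    return best, best_util
```

For example, with a high-priority session whose performance scales linearly with cache size and a low-priority one with diminishing returns, the search assigns most of the storage to the former.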
Due to the dynamic and non-dedicated nature of grid resources, environment changes may trigger the DSS to reconfigure the session parameters. For example, when the disk usage is reaching the limit because of other local activities, the DSS will detect it and reduce the total space occupied by the caches to avoid overflowing the storage. On the other hand, when more space becomes available for grid use, the DSS can expand caches as necessary.

Note that the changing resource availability may falsify the prediction which has motivated the end-user or application manager to submit a job on this resource. Or, even worse, the client node may crash and fail the job execution. An autonomic application manager should subscribe to these changes and act accordingly by recovering or restarting the job with assistance from the DSS. The server-side fault tolerance, on the other hand, is provided by the autonomic DRS.

6.2.2 Autonomic Data Replication Service

6.2.2.1 Data replication degree and placement

Data replication has long been recognized as the key to achieving high availability. In grid environments replication not only needs to be performed across servers in order to provide failover on server failures, but also should be distributed to different sites to protect against network partitions. However, the limited bandwidth and high delay of WANs make wide-area replication very expensive. Although the DRS uses high-throughput transfer mechanisms (e.g., GridFTP [6]) to replicate data, the overhead is still considerable for large datasets. Because an application's session cannot start until the necessary replicas are ready, this overhead needs to be considered in the cost associated with the session.

The choice of the replication degree, i.e., the number of replicas for a given dataset, is a decision that needs to be made based on a benefit-cost analysis. Typically at
least two replicas are required for each dataset, so that an application can continue its execution in the presence of infrequent failures. As the failure rate goes higher, more replicas are required to sustain good reliability, but the cost from replica creation and teardown is also increased. Although it is generally difficult to predict a particular data server's failure rate or MTTF (Mean Time To Failure), it can be estimated based on observation and analysis. Initially every server has a hypothetical failure rate stored in the knowledge base, and it is adjusted and updated by the DRS as failures happen over time. Gradually the value becomes representative of the server's actual reliability.

The available storage capacity for replica placement is shared among the existing sessions. The replicas prepared for a past application execution may also occupy disk space because a lazy style of cleanup is used, where a replica is not removed immediately after its session finishes, in anticipation of future uses of the same dataset. Therefore, the storage management takes into account the values of different datasets, in which higher-priority applications' datasets have greater values, and live sessions' datasets are always valued more than those that are not currently in use.

Based on the above considerations, another utility function is used to solve the replication degree and placement problem. The utility U_d^i of having the i-th replica for dataset d is computed as the product of the dataset's value V_d and the reliability R_d^i provided by its replicas, R_d^i = 1 − ∏_{j=1..i} r_d^j, where r_d^j is the failure probability of dataset d's j-th replica, decided by the failure rate of the data server where this replica is stored. On the other hand, the cost of creating these replicas is C_d^i = Σ_{j=1..i} c_d^j, where c_d^j is the overhead from copying the dataset to the j-th replica's data server from the nearest existing replica.

Therefore, when considering adding a replica for a dataset, its utility as well as the cost and reliability constraints are used to decide whether to add it and where to place it,
as follows:

    U_d^i = V_d · R_d^i, subject to R_d^i ≥ R_min and C_d^i ≤ C_max

where R_min is the desired minimum level of reliability for the dataset, and C_max is the maximum tolerable replica creation overhead. This algorithm tends to place a replica on more reliable servers when the reliability is more important, and put it on closer servers if the cost is more of a concern. When multiple allocations are available, the DRS chooses the server that has the best-fit available space. If replacement is necessary due to the lack of storage, the replica with the lowest utility is chosen and evicted.

6.2.2.2 Data replica regeneration

Autonomic replica regeneration is also supported by the DRS. When a data server failure occurs, the running sessions' replicas on that server need to be promptly regenerated in order to minimize the time windows in which the necessary replication degrees are not satisfied for the datasets. In this process, human intervention should be avoided because it tends to be slow and costly. Furthermore, the impact of failures on applications should also be reduced to the minimum; it is desirable that applications can continue their executions without interruptions. This is realized by the autonomic FSS and will be discussed shortly.

The DRS achieves autonomic replica regeneration by means of automatic failure detection and replica reconfiguration. A failure is usually notified by the DSS, which monitors the session through the client-side FSS. However, a failure reported by the client-side FSS can be caused by network partitioning between the client and server. This is confirmed by the DRS if it can still connect to that server, and then only the datasets that are used by this particular client need to be regenerated. Once a failure is determined, the DRS immediately allocates storage space using the aforementioned algorithm and regenerates the lost replicas for the running sessions on the newly selected
data servers. The information about these new replicas is also passed on to the DSS, so that it can reconfigure the concerned sessions to use them in need of failover.

6.2.3 Autonomic File System Service

6.2.3.1 Client-side file system service

The client-side FSS facilitates the task of the autonomic DSS by monitoring storage usage and server response time, and by executing the session configurations decided by the DSS. As discussed in Section 6.2.1, an FSS controls the proxy to shrink or expand a session's disk cache as instructed by the DSS. A disk cache is structured as file banks that contain data blocks hashed according to their file handles and offsets. Sophisticated algorithms can be conceived to reduce a cache's size by evicting the least recently used blocks and rehashing the other ones, which often incurs substantial overhead. Instead, the proxy removes the least-used and most-clean file banks from the cache until the required shrinkage is achieved. This needs only a simple re-mapping of file banks but not any rehashing of data blocks, and the new cache size can take effect immediately.

The client-side FSS is also the key to realizing application-transparent failover in the presence of server-side failures, including data server failure, network partitioning between the client and server, and server or network overloading. As discussed in Section 4.5.3, these are detected when a major timeout (e.g., 100 times the average response time) occurs for a data request. In order to recover from a fault, the proxy immediately redirects the session to the backup data server. The FSS also reports a detected fault to the DSS so that it can ask the DRS to take actions on replica regeneration.

To achieve failover without any interruption to the application, a session uses an active-style model, as presented in Section 4.5.2, to maintain data consistency among the replicas. Every data modification request issued by the application is multicast from the proxy client to the session's every proxy server in a reliable manner. Consequently, each replica has the exact same copy of the dataset during the entire session, and if any
of them crashes, it has no impact upon the application since the remaining replicas can continue to provide service as usual.

Although it is necessary to write-all, a session can choose to use read-all or read-one. In the former case read operations are also performed on all the replicas. This model is employed to further improve reliability against not only server crashes, but also Byzantine failures: the proxy client collects and compares all the replies from the proxy servers and then decides on the correct one. However, the disadvantage of this model is that it limits the performance of the session to the slowest replica server at all times. On the other hand, it is often safe to assume that a successful data operation is correct, because there are other mechanisms, from hardware to software, that are in place and can ensure that an error would not happen without being noticed. Read operations are the most common ones for typical applications, and thus the most valuable to optimize to achieve speedup according to Amdahl's Law [92]. Therefore, GVFS sessions typically employ this "read-one, write-all" replication model.

This model relies on the client-side FSS' autonomic functions to choose the best replica server to perform read operations throughout a session. The chosen server is called the primary server for the session. It is selected from the session's replica servers based on the performance that can be delivered for the application's remote I/O operations. The primary server can change over time as network conditions and server loads vary, so the FSS monitors the performance of the replica servers periodically using a monitoring daemon.

It is important to design an accurate and low-overhead mechanism to measure the performance. From a session client's point of view, the response time of a remote data request is determined by the network delay as well as the data server's CPU processing delay and disk access delay. Simple network probing mechanisms, e.g., ping, can give information about the network's performance, but not the server's. Using NULL RPC requests to the server incurs little overhead and can measure both the first and second
delays, but it cannot reveal the server's I/O load and disk performance. Instead, the monitoring daemon issues very small writes on the GVFS partition and uses the response times to estimate the performance.

Writes are used in the measurement because it is difficult to prevent the effect of server-side caching (processor caching and memory buffering) with read operations. The FSS monitoring daemon periodically performs a one-byte write at a different block of a hidden remote file, and it requests the server to commit the write so as to avoid the effect of server-side write delay. Even though the hidden file's size may reach a large value for a long session, it does not necessarily occupy much space on disk because the server's file system typically uses holes in sparse files. To further reduce this overhead, the monitoring daemon automatically truncates this file and starts over from the beginning after a few hundred probes.

Because this monitoring mechanism has very low overhead and is done outside of the proxy that is responsible for processing the application's data requests, the session's performance is almost intact. The FSS can use well-known time series analysis algorithms to predict a replica server's future performance based on the observed response times, and then make decisions on the primary server selection. Complex algorithms are not suitable because they would intensively compete for CPU with the running applications and not necessarily give the best predictions. On the other hand, simpler algorithms have been proven effective in many cases [120]. In this prototype the exponential smoothing algorithm [121] is used. Once the primary server is planned to change, the FSS controls the proxy to switch it transparently to the application.

6.2.3.2 Server-side file system service

An autonomic server-side FSS monitors the server's storage usage, which helps the autonomic DRS to decide replica placement. It also monitors the response times of RPC requests forwarded by a proxy server to the kernel server. The server is not necessarily on the proxy's local host, but can be a virtualized server that consolidates network-attached devices [122].

PAGE 164

attacheddevices[122].Therearewell-studiedalgorithms[123]toprovideloadbalancing inthisscenariowhichcanbeleveragedbytheFSSandhenceitisnotdiscussedinthis dissertation. 6.2.4Evaluation 6.2.4.1Setup Aprototypefortheproposedautonomicdatamanagementsystemhasbeen implementedanditsautonomicfeaturesareevaluatedinthissection.VMware-based virtualmachineswereusedtosetupthelesystemclientsandservers,whichwerehosted ontwophysicalclusternodes.Eachphysicalnodehasdual2.4GHzhyper-threaded Xeonprocessorsand1.5GBmemory.EachVMwasconguredwith64MBmemoryand wasinstalledwithSUSELINUX9.2.Theemphasisoftheexperimentsisinwide-area environments,whichwereemulatedusingNISTNet[95].Unlessotherwisenoted,every linkwasconguredwithatypicalwide-areaRTTof40ms. Theexperimentswereconductedbyusingatypicallesystembenchmark,IOzone [88],torepresenttheI/Opartofgridapplications.Itwasexecutedontheclientswith inputaccessedfromtheserversviaGVFS.TheGVFSsessionswerevirtualizedupon NFSv3,using32KBdatablocktransfer,andthedataserversexportedthelesystem withoutwritedelayandwithsynchronousaccess.Everyexperimentwasstartedwithcold kernelbuerandGVFSdiskcachesbyunmountingthelesystemandushingthedisk cache. 6.2.4.2Autonomicsessionredirection Inagridenvironmentnetworklatencyandthroughputareoftenaectedbythe existenceofparallelTCPtransfers[124],forexample,thepopularuseofGridFTP[6]. Figure6-7showsthelatencybetweentwonodesundersuchinuenceinarealwide-area setup.Eachnodewaslocatedinadierent100MbpsLANandtherewereparallelTCP transfersfromtheuseofIperf[89]betweenanotherpairofnodesfromthesetwoLANs. 164


Figure 6-7. Network round-trip time (RTT) between two WAN nodes impacted by third-party parallel TCP transfers. The aggregate bandwidth (BW) consumed by the interfering parallel TCP transfers is also shown in the figure.

The data demonstrate that the latency grows rapidly as the number of parallel TCP streams used by the third-party transfer increases.

Such a scenario was emulated in the experiment and used to study the effectiveness of autonomic session redirection in the presence of network performance fluctuations. The data set was replicated across two servers, and the client was connected to servers 1 and 2 via two independently emulated WAN links, where each link's latency was varied randomly among the values shown in Figure 6-7 in the presence of 0, 2, 4, and 8 parallel TCP streams, with decreasing probabilities for these values.

The IOzone benchmark was executed on the client node, which reads and rereads a 256 MB file accessed through GVFS with different data server configurations: using server 1 statically; using server 2 statically; and using autonomic session redirection between these servers dynamically. The average server response times were collected every 10 s throughout the execution of the benchmark, as shown in Figure 6-8(a). The results show


that with autonomic redirection the GVFS session can almost always choose the better link for data access, and consequently the runtime of the benchmark is about 13% better than using server 1 statically and 16% better than using server 2 statically.

Figure 6-8. Average response times during the execution of IOzone (read/reread mode) with a 256 MB input accessed through GVFS with different data server configurations: using server 1 statically; using server 2 statically; and using autonomic session redirection between these servers dynamically. The benchmark was executed under dynamic network load fluctuations in (a) and dynamic server load variations in (b).

The second scenario considers the effect of dynamic server load variations on the performance of a session. I/O-intensive jobs (executions of IOzone with read/reread of different 256 MB input files) were loaded onto the server following a Poisson process, and the intensity was varied by randomly choosing the number of concurrent jobs between 0 and 6. Then the benchmark was executed with the same configurations as above. The average server response times were collected every 60 s throughout the execution and are plotted in Figure 6-8(b). The results also show that autonomic redirection can achieve the best


response time, and helps the benchmark to run 16% faster than using server 1 statically and 29% faster than using server 2 statically.

Figure 6-9. (a) Runtimes of two executions of IOzone (jobs 1 and 2), which randomly read a different 256 MB file through GVFS; (b) their total runtime; and (c) total number of requests received by the server. Different cache configurations are used for the sessions: A, starts the first one with a full-size cache and the second one without caching, concurrently; B, starts them sequentially with full-size caches; Auto, starts them concurrently and divides the available storage autonomically between their caches.

6.2.4.3 Autonomic cache configuration

The second experiment studies the use of autonomic cache configuration by the DSS while scheduling different data sessions. In this setup, two tasks need to be run on the same client node, where each task executes IOzone with random reading of a different 256 MB file accessed from the data server via GVFS. The DSS is required to prepare two data sessions for these tasks, but the available storage on the client can only hold 256 MB of disk caches. Thus three different configurations are possible for these sessions: A, starts the first session with a full-size cache and the second one without caching, concurrently; B, starts them sequentially with full-size caches; and Auto, starts the sessions concurrently and splits up the available storage autonomically between the sessions' caches based on their utilities (in this setup each session gets half of the disk space because they are assumed to be equally important).
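As a brief illustration of the Auto configuration, the following sketch divides a fixed amount of client storage among concurrent sessions' disk caches in proportion to their utilities; with equal utilities, as in this experiment, each session receives an equal share. The function and session names are illustrative only, not the DSS interface.

```python
def allocate_cache(total_mb, utilities):
    """Split total_mb of cache storage proportionally to session utilities."""
    total_utility = sum(utilities.values())
    return {session: total_mb * u / total_utility
            for session, u in utilities.items()}

# Two equally important sessions sharing 256 MB of client disk cache:
print(allocate_cache(256, {"session1": 1.0, "session2": 1.0}))
# {'session1': 128.0, 'session2': 128.0}
```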


Figure 6-9 shows the runtimes of the jobs as well as their total runtime and the total number of requests received by the server during their executions. For concurrent sessions, the total runtime is the maximum among the individual runtimes. Compared to A, the autonomic configuration provides better fairness between the jobs and also greatly reduces the server load (the number of server-received requests is down by 18%); compared to B, the autonomic configuration's total runtime is much shorter (reduced by 40%).

6.2.4.4 Autonomic data replication

The last experiment investigates autonomic data replication in the presence of server-side node or network failures. A sequence of tasks was launched on the client node sequentially, where each one ran IOzone in random reading mode with a 512 MB input accessed from the data servers through GVFS. The data servers failed randomly, where the failures were modeled as a Poisson process with an average inter-arrival time of half an hour. The experiment used a replication degree of 2 for the data sets, and failures were injected on the servers by randomly choosing one of them to stop its network connection. Two different situations were considered for the tasks: independent, each task had an independent data set (the input file) and was scheduled with a different data session; dependent, the tasks had the same data set and shared the same data session.

Figure 6-10 shows the timeline of events as they happened during the experiment. In the independent tasks case, a total of four server-side failures happened. Each failure caused a new replica to be generated by copying the data from the remaining server to a new server. Two of the failures occurred at the primary data server and also triggered the client to redirect the session's data access to the backup server. These all caused delays in the benchmark's executions; e.g., the second run took the longest time to finish because two failures occurred during that run. Nonetheless, every run successfully completed regardless of the failures. In the dependent tasks case, the disk cache was warmed up after the first run, which not only substantially reduced the times of the following runs,


but also made the client completely unaware of the server-side failures and delays. This experiment demonstrates that by using the autonomic services to select and combine application-tailored enhancements, such as caching and replication, the performance and reliability of grid-wide data sessions can be substantially improved in an automatic and transparent manner.

Figure 6-10. Timeline of events that happened during a sequence of executions of IOzone. Each run randomly read a 512 MB input accessed from the data servers through GVFS. The data servers failed randomly, and replica regenerations were triggered accordingly. In (a), the executions were independent and were supported by different GVFS sessions; in (b), the executions used the same data set and shared the same session.
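The failure-handling sequence observed in this experiment, transparent redirection away from a failed primary plus regeneration of a replacement replica, can be summarized in a short sketch. This is a minimal illustration assuming an in-memory replica list and a pool of spare servers; the names are hypothetical, not the actual GVFS service interface.

```python
def handle_failure(failed, replicas, primary, spare_pool):
    """React to a failed replica server: redirect the session if the
    primary failed, and regenerate a replica to restore the degree."""
    replicas = [r for r in replicas if r != failed]
    if primary == failed:
        primary = replicas[0]          # transparent session redirection
    if spare_pool:
        new_replica = spare_pool.pop(0)
        # (copy the data set from a surviving replica to the new server)
        replicas.append(new_replica)   # replica regeneration
    return replicas, primary

replicas, primary = handle_failure(
    failed="server1", replicas=["server1", "server2"],
    primary="server1", spare_pool=["server3"])
print(primary, replicas)   # server2 ['server2', 'server3']
```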


CHAPTER 7
CONCLUSION

7.1 Summary

This dissertation focuses on data management in a grid-style environment, where applications and data are distributed on resources across administrative boundaries and wide-area networks. Such an environment is becoming increasingly typical as today's large distributed computing systems, such as scientific grids, enterprise data centers, and "cloud" computing systems, grow in scale. The dissertation has designed and implemented a novel system for data management in these environments, and it advocates the three key elements of this solution: user-level distributed file system (DFS) virtualization, application-tailored data provisioning, and autonomic service-based management.

Several types of grid data management approaches are available to provide cross-domain data access to applications. Fundamentally, they differ in the level at which remote data access is introduced in the system and in the degree of transparency provided to the system. Application-level approaches explicitly involve data management middleware in staging files for applications, and thus they are strongly tied to specific applications. Such approaches are not application transparent, which makes it difficult to enable a wide variety of applications to harness the power of a large distributed computing system. Operating system (O/S) level approaches achieve application transparency by implicitly servicing an application's I/O requests with remote data access. But these approaches are not O/S transparent: they require O/S-specific modifications, which are difficult to deploy on the typically heterogeneous and non-dedicated resources in a grid-style environment.

User-level approaches can provide transparency to both applications and O/Ss. One such approach is based on intercepting an application's library calls or system calls and mapping them to remote data access. However, this approach is only applicable to applications that can be relinked or O/Ss that have system call


tracing capability. Another user-level approach, which is taken by this dissertation, is through the interception and handling of DFS calls, in essence virtualizing an O/S-level DFS and providing grid-wide virtual file systems (GVFSs). It preserves the generic file system interface to provide complete application transparency and leverages widely available DFSs (e.g., NFS) to achieve great O/S transparency. Moreover, based on this virtualization, enhancements to grid-wide data access can also be realized without changing applications and O/Ss, and the management of data provisioning can be decoupled from the underlying physical resources and optimized independently.

A unique contribution made by this dissertation is application-tailored data provisioning. None of the existing solutions, including the closely related DFS-based ones, can address the diverse needs of applications. Nonetheless, the inefficiency, insecurity, and unreliability of grid-style environments require application-tailored optimizations on different aspects of remote data access. This dissertation addresses this need by proposing user-level enhancements for GVFS on performance, consistency, security, and fault tolerance. GVFS-based data sessions are created on demand on a per-application basis, and each can independently select and customize these enhancements according to its application's characteristics and requirements.

This dissertation is also the first to realize the importance of applying autonomic techniques to data management in grid-style environments, and of addressing the complexity of managing large numbers of on-demand data sessions with changing application workloads and resource availability. It has developed a novel autonomic data management system based on GVFS virtualization and service-oriented management. A set of services is developed to provide flexible control of the lifecycles and configurations of GVFS data sessions, and to support interoperable interactions with other middleware services based on the Web Service Resource Framework (WSRF). Intelligence can then be built into these services to enable automatic configuration, optimization, healing, and protection of the data provisioning in accordance with high-level objectives.
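The per-session selection of enhancements summarized above can be pictured as a small configuration object chosen for each application when its session is created. This is a purely illustrative sketch; the field names and profile strings are hypothetical and do not reflect the actual service interface.

```python
from dataclasses import dataclass

@dataclass
class SessionConfig:
    # Hypothetical per-session knobs mirroring the GVFS enhancements:
    disk_cache_mb: int = 0          # 0 disables client-side disk caching
    consistency: str = "on-demand"  # application-desired coherence model
    security: str = "ssh-tunnel"    # transport security for the session
    replication_degree: int = 1     # number of replica servers (write-all)

def session_for(profile):
    """Pick a configuration from a coarse application profile (illustrative)."""
    if profile == "read-mostly-wan":
        return SessionConfig(disk_cache_mb=4096, replication_degree=2)
    return SessionConfig()

cfg = session_for("read-mostly-wan")
print(cfg.replication_degree)   # 2
```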


Every aspect of the proposed system has been thoroughly evaluated with experiments based on typical file system benchmarks and real applications. A key insight from the results is that the overhead from virtualization is not significant, but virtualization can enable important improvements and functionality that are not available or possible in the underlying physical system. On the other hand, the virtualized systems can leverage these enhancements to further reduce the overhead and even outperform the physical system. Specifically, GVFS virtualization enables dynamic data provisioning and flexible application-tailored enhancements to address the limitations of traditional DFSs (e.g., NFS) in grid-style environments. In addition to providing strong consistency, security, and reliability for grid-wide data access, the enhanced GVFS is able to not only hide the overhead of user-level proxy-based virtualization, but also deliver significant speedup compared to NFS in WANs.

These experiments also demonstrate the effectiveness of using the proposed data management services to create and terminate GVFS sessions on demand, as well as to customize and apply the various enhancements on the sessions as needed. These services also allow interactions with other WSRF-based middleware, e.g., services built with the Globus Toolkit [69], and support its data management requirements. The autonomic data management architecture described in this dissertation enables autonomic functions to be developed for self-management of different aspects of data provisioning. Experimental evaluation of the provided autonomic features shows that they can automatically allocate resources to GVFS sessions based on policies, improve data access performance under network and server load fluctuations, and protect applications and regenerate data replicas in the presence of dynamic server failures.

This GVFS-based approach has been applied to support the use of virtual machines (VMs) as execution environments in grid computing, in which on-demand access to both user data and VM state is transparently provided to the applications and VMs dynamically instantiated across grids. Results from experiments with VMware-based


VMs show that this GVFS solution supports fast VM instantiations via cloning and efficient executions of applications in VMs. It substantially reduces the overhead of provisioning VMs across WANs, compared to approaches based on native NFS or full-file download/upload.

An implementation of this dissertation's approach has also been deployed in the production In-VIGO system [2][3], a grid system for scientific applications and the prototype of nanoHUB [125]. It has supported applications from many different disciplines, such as the spectroscopy study for biomedical scientists [91] and storm surge modeling for coastal researchers [102]. The GVFS solution provides transparent, on-demand data access not only to these applications enabled by In-VIGO, but also to the other middleware components of the In-VIGO system, such as the user interface manager, virtual application manager, and VM manager. In the past few years, tens or even hundreds of GVFS sessions have been dynamically created every day to support application executions on grid resources and the smooth functioning of In-VIGO.

7.2 Future Work

The fundamental goal of my research is to design and develop DFS virtualization and services for flexible and scalable data management in grid-style environments. This dissertation has made substantial progress towards this goal, and it has also laid a solid foundation for future research. Further improvement of the proposed data management system can be considered along the following three directions.

7.2.1 Performance

Distributed file system based data provisioning is important for supporting application transparency, and block-based data transfer is very efficient for interactive applications as well as large sparse file access. However, for large bulk data transfers, the proposed data management system has to rely on additional high-throughput data transfer mechanisms such as GridFTP [6]. This inefficiency of block-based DFS access can be attributed to two factors: conservative prefetching policies and low-throughput network data transport


mechanisms. Future research can be conducted to improve GVFS for bulk data transfer by specifically addressing these two problems.

Traditional DFS clients use only conservative prefetching policies; e.g., the adaptive read-ahead algorithm in the Linux kernel uses a relatively small value for the read-ahead window size. These policies are designed under assumptions that hold for LAN environments, where the penalty of misprediction in prefetching often outweighs the benefit of aggressive prefetching, because the network round-trip time (RTT) is relatively small and clients use limited memory for caching. In a grid-style environment, these assumptions no longer hold. Wide-area networks have high latency as well as a large bandwidth-delay product (BDP), the product of the link capacity and the RTT of a packet. With the emergence and spread of new WAN technologies, network bandwidths are getting higher, but latencies are still bounded by the speed of light, which results in even larger network BDPs. If the network bandwidth is underutilized, there is little difference in the cost of fetching a larger block of data rather than a smaller one, but there would be a significant gain in performance if the extra data are indeed useful to the client. Therefore, it would be beneficial to use aggressive prefetching to "fill the pipe" and take advantage of the available bandwidth.

If a client-side cache's capacity is limited, premature prefetching could push other useful data out of the cache, increase cache misses, and cause performance degradation. This is also part of the reason why traditional memory-caching-only DFSs restrict the amount of prefetching. The GVFS approach employs disk caches, which have much larger capacity (in the order of gigabytes or even terabytes) than physical memories (in the order of megabytes), and thus there is much less concern about the caches being flooded. Moreover, disk caching is persistent in the face of client crashes/reboots, which means that prefetched data can be useful to the client over a long period of time. For these reasons, aggressive prefetching with GVFS disk caching has the potential to substantially improve the performance of remote data access.
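The "fill the pipe" argument above can be made concrete with a back-of-the-envelope calculation: to keep a high-BDP link busy, the amount of data in flight (the read-ahead window) should be at least the bandwidth-delay product. The values below are examples only, chosen to match the experimental setup in this dissertation (a 100 Mbps link, 40 ms RTT, 32 KB NFS transfer blocks).

```python
import math

def prefetch_window_blocks(bandwidth_bps, rtt_s, block_bytes):
    """Blocks that must be in flight to fill the pipe: ceil(BDP / block size)."""
    bdp_bytes = (bandwidth_bps / 8) * rtt_s   # link capacity (bytes/s) * RTT
    return math.ceil(bdp_bytes / block_bytes)

# 100 Mbps WAN link, 40 ms RTT, 32 KB blocks -> BDP = 500,000 bytes:
print(prefetch_window_blocks(100e6, 0.040, 32 * 1024))   # 16
```

A window of 16 blocks is far larger than the conservative kernel default, which is the quantitative case for aggressive prefetching in GVFS.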


The GVFS approach leverages NFS remote procedure calls (RPCs) for remote data access, which can be transferred over either UDP or TCP. Traditional NFS implementations use UDP because they are designed for use in relatively reliable LAN environments. To cope with packet loss, NFS clients have to retransmit requests that have timed out, which, however, can significantly impact throughput on WANs. If the timeout is too short, the retransmissions may aggravate problems such as network congestion or server overloading. If the timeout value is too large, the client may be unnecessarily idle waiting for a lost packet, causing additional latency for the data request. NFS implementations based on TCP perform better over a long-latency or unreliable network because TCP provides more efficient reliable data transfer at the transport layer. Nonetheless, it is still difficult to achieve high throughput in WANs with TCP, due to its inefficiencies in networks that have high BDP [126], including the slow congestion-control algorithm, the bandwidth underutilization in the face of errors that are not caused by congestion, and the bias against TCP flows with higher RTTs.

To address the above limitations of the native UDP and TCP protocols, future research can study high-throughput data transport to serve both remote data access and data prefetching. In addition to improving performance, it is important that such transport can be transparently deployed on resources in grid-style environments. It should not require any modifications to existing network infrastructures (e.g., the O/S networking stack), and it should be convenient to install without local administration involvement on the systems. Therefore, an application-level and user-level transport enhancement for GVFS is more desirable.

7.2.2 Intelligence

In order to deal with the data management complexity and provide optimal data service, an autonomic grid data management system is proposed in this dissertation based on GVFS and its management services. It supports policy-driven self-management of GVFS sessions in several aspects, including cache configuration, data replication, and


session redirection. Working towards a more advanced autonomic grid data management system, more self-optimization features can be developed and integrated into this system.

In particular, cache replacement and prefetching decisions can have a significant impact on data access performance. The current GVFS implementation employs a simple priority-based cache replacement policy, in which dirty cache blocks have higher priority than clean blocks, and within the same priority class a block is randomly chosen for eviction. Results from the experimental evaluation show that this simple scheme can deliver good performance by taking advantage of data locality. As a first-step enhancement to this simple scheme, traditionally successful cache replacement algorithms (e.g., LRU, MRU) can be employed. However, it can be further improved by adopting more intelligent policies based on the prediction of application data access patterns.

While studying these patterns, it is important to maintain application transparency, which has always been the top design goal of the proposed data management system. Therefore, approaches that rely on explicit hints produced by applications (e.g., [127]) are not appealing to this research. Section 5.2 has described the use of metadata handling to capture application-specific knowledge and optimize data transfer, which is employed to support efficient VM state transfer. Such an application-transparent approach can be extended to a more comprehensive framework, in which the knowledge about application data access characteristics and requirements is automatically learned and leveraged by the autonomic data management system. Future research can investigate how to achieve this goal based on techniques such as trace tracking and analysis, offline and online learning, as well as phase and pattern classification.

7.2.3 Integration

By virtue of transparency, the proposed data management approach is able to not only support the heterogeneous systems in grid-style environments, but also exploit the strengths of the diverse storage assets deployed in these systems, such as storage area networks (SANs), parallel file systems, and RAID systems. Therefore, in future research, a


scalable data management middleware system can be built by extending GVFS and its management services and bridging existing, special-purpose file and storage systems. GVFS will be the backbone of the data management, connecting heterogeneous resources in different domains and providing application-tailored, transparent data access across WANs. Specifically, it can be extended to support parallel file access for high throughput, by bridging a parallel file system, and to support storage networking for high performance, by bridging a SAN file system. The management services will be the brain that controls GVFS as well as the bridged systems and optimizes the end-to-end data provisioning. They can be extended to support more high-level functionalities (e.g., global snapshots, live migration), and to provide scalable management in a cooperative, peer-to-peer manner.

Based on this approach, the grid VM data management proposed in this dissertation can also be extended to support an even larger-scale high-performance and high-throughput computing system built upon virtualized resources across data centers, enterprises, and grids. In this envisioned system, the VM data management can enable fast cloning and versioning on VM state servers to provide time- and space-efficient VM creation and customization, and employ smart prefetching, caching, and checkpointing on the VM hosts to achieve efficient and reliable VM instantiations and executions.


REFERENCES

[1] I. Foster, C. Kesselman, and S. Tuecke, "The anatomy of the grid: Enabling scalable virtual organizations," Int. J. High Perform. Comput. Appl., vol. 15, pp. 200-222, August 2001.

[2] S. Adabala, V. Chadha, P. Chawla, R. Figueiredo, J. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From virtualized resources to virtual computing grids: the In-VIGO system," Future Generation Computer Systems, vol. 21, pp. 896-909, June 2005.

[3] A. M. Matsunaga, M. O. Tsugawa, M. Zhao, L. Zhu, V. Sanjeepan, S. Adabala, R. J. O. Figueiredo, H. Lam, and J. A. B. Fortes, "On the use of virtualization and service technologies to enable grid-computing," in Proc. Euro-Par, pp. 1-12, 2005.

[4] A. Bayucan, R. L. Henderson, C. Lesiak, B. Mann, T. Proett, and D. Tweten, "Portable batch system: External reference specification," tech. rep., MRJ Technology Solutions, November 1999.

[5] K. Czajkowski, I. T. Foster, N. T. Karonis, C. Kesselman, S. Martin, W. Smith, and S. Tuecke, "A resource management architecture for metacomputing systems," in Proc. the Workshop on Job Scheduling Strategies for Parallel Processing, London, UK, pp. 62-82, Springer-Verlag, 1998.

[6] B. Allcock, J. Bester, J. Bresnahan, A. L. Chervenak, C. Kesselman, S. Meder, V. Nefedova, D. Quesnel, S. Tuecke, and I. Foster, "Secure, efficient data transport and replica management for high-performance data-intensive computing," in Proc. the Eighteenth IEEE Symposium on Mass Storage Systems and Technologies, Washington, DC, USA, IEEE Computer Society, 2001.

[7] J. Bester, I. Foster, C. Kesselman, J. Tedesco, and S. Tuecke, "GASS: A data movement and access service for wide area computing systems," in Proc. the Sixth Workshop on Input/Output in Parallel and Distributed Systems, Atlanta, GA, pp. 78-88, ACM Press, 1999.

[8] "GT 4.0 reliable file transfer (RFT) service." URL: http://www.globus.org/toolkit/docs/4.0/data/rft/.

[9] M. J. Litzkow, M. Livny, and M. W. Mutka, "Condor - a hunter of idle workstations," in Proc. 8th International Conference on Distributed Computing Systems, San Jose, CA, USA, pp. 104-111, June 1988.
[10] J. Bent, D. Thain, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny, "Explicit control in a batch-aware distributed file system," in Proc. the 1st USENIX Symposium on Networked Systems Design and Implementation, 2004.


[11] A. D. Alexandrov, M. Ibel, K. E. Schauser, and C. J. Scheiman, "UFO: A personal global file system based on user-level extensions to the operating system," ACM Transactions on Computer Systems, vol. 16, no. 3, pp. 207-233, 1998.

[12] D. Thain and M. Livny, "Parrot: Transparent user-level middleware for data-intensive computing," in Proc. the Workshop on Adaptive Grid Middleware, September 2003.

[13] B. S. White, M. Walker, M. Humphrey, and A. S. Grimshaw, "LegionFS: a secure and scalable file system supporting cross-domain high-performance applications," in Proc. the 2001 ACM/IEEE Conference on Supercomputing, New York, NY, USA, pp. 59-59, ACM Press, 2001.

[14] J. Bent, V. Venkataramani, N. Leroy, A. Roy, J. Stanley, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and M. Livny, "Flexibility, manageability, and performance in a grid storage appliance," in Proc. the 11th IEEE International Symposium on High Performance Distributed Computing (HPDC-11 2002), Washington, DC, USA, IEEE Computer Society, 2002.

[15] N. H. Kapadia and J. A. B. Fortes, "PUNCH: An architecture for web-enabled wide-area network-computing," Cluster Computing, vol. 2, no. 2, pp. 153-164, 1999.

[16] D. Thain, J. Basney, S. C. Son, and M. Livny, "The Kangaroo approach to data movement on the grid," in Proc. the Tenth IEEE Symposium on High Performance Distributed Computing (HPDC-10), pp. 7-9, 2001.

[17] R. J. Figueiredo, P. A. Dinda, and J. A. B. Fortes, "A case for grid computing on virtual machines," in Proc. the 23rd International Conference on Distributed Computing Systems, Washington, DC, USA, IEEE Computer Society, 2003.

[18] M. Kozuch and M. Satyanarayanan, "Internet suspend/resume," in Proc. the Fourth IEEE Workshop on Mobile Computing Systems and Applications, Washington, DC, USA, IEEE Computer Society, 2002.

[19] C. Sapuntzakis, D. Brumley, R. Chandra, N. Zeldovich, J. Chow, M. S. Lam, and M. Rosenblum, "Virtual appliances for deploying and maintaining software," in Proc. the 17th USENIX Conference on System Administration, Berkeley, CA, USA, pp. 181-194, 2003.
[20] M. Zhao, J. Zhang, and R. Figueiredo, "Distributed file system support for virtual machines in grid computing," in Proc. the 13th IEEE International Symposium on High Performance Distributed Computing (HPDC'04), Washington, DC, USA, pp. 202-211, IEEE Computer Society, 2004.

[21] B. Callaghan, NFS Illustrated. Addison-Wesley, 2002.

[22] "NFS: Network file system protocol specification." RFC 1094, March 1989.


[23] B. Pawlowski, C. Juszczak, P. Staubach, C. Smith, D. Lebel, and D. Hitz, "NFS version 3: Design and implementation," in Proc. USENIX Summer, pp. 137-152, 1994.

[24] P. J. Leach and D. C. Naik, "A common internet file system (CIFS/1.0) protocol." Internet Draft, 1997.

[25] J. H. Howard, M. L. Kazar, S. G. Menees, D. A. Nichols, M. Satyanarayanan, R. N. Sidebotham, and M. J. West, "Scale and performance in a distributed file system," ACM Trans. Comput. Syst., vol. 6, pp. 51-81, February 1988.

[26] "Open source version of AFS." URL: http://www.openafs.org.

[27] P. J. Braam, "The Coda distributed file system," Linux J., vol. 1998, no. 50es, 1998.

[28] B. Callaghan and T. Lyon, "The automounter," in Proc. the Winter 1989 USENIX Conference, pp. 43-51, 1989.

[29] M. Blaze, "A cryptographic file system for Unix," in Proc. 1st ACM Conference on CCS, pp. 9-16, 1993.

[30] K. Fu, F. M. Kaashoek, and D. Mazieres, "Fast and secure distributed read-only file system," ACM Trans. Comput. Syst., vol. 20, pp. 1-24, February 2002.

[31] "Cache file system (CacheFS)." White Paper, Sun Microsystems, Incorporated, February 1994.

[32] V. Srinivasan and J. Mogul, "Spritely NFS: experiments with cache-consistency protocols," in Proc. the Twelfth ACM Symposium on Operating Systems Principles, New York, NY, USA, pp. 44-57, ACM Press, 1989.

[33] M. N. Nelson, B. B. Welch, and J. K. Ousterhout, "Caching in the Sprite network file system," ACM Trans. Comput. Syst., vol. 6, pp. 134-154, February 1988.

[34] R. Macklem, "Not quite NFS, soft cache consistency for NFS," in Proc. the USENIX Winter 1994 Technical Conference, San Francisco, CA, USA, pp. 261-278, 1994.

[35] S. Shepler, B. Callaghan, D. Robinson, R. Thurlow, C. Beame, M. Eisler, and D. Noveck, "RFC 3530: Network file system (NFS) version 4 protocol." URL: http://www.ietf.org/rfc/rfc3530.txt, 2003.

[36] Y. Saito, C. Karamonolis, M. Karlsson, and M. Mahalingam, "Taming aggressive replication in the Pangaea wide-area file system," ACM SIGOPS Operating Systems Review, vol. 36, pp. 15-30, 2002.

[37] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, C. Wells, and B. Zhao, "OceanStore:


an architecture for global-scale persistent storage," in Proc. the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems, vol. 28, pp. 190-201, ACM Press, December 2000.

[38] S. Rhea, P. Eaton, D. Geels, H. Weatherspoon, B. Zhao, and J. Kubiatowicz, "Pond: The OceanStore prototype," in Proc. the Conference on File and Storage Technologies, USENIX, 2003.

[39] B. Krishnamurthy and C. E. Wills, "Study of piggyback cache validation for proxy caches in the World Wide Web," in USENIX Symposium on Internet Technologies and Systems, 1997.

[40] D. E. Culler and J. P. Singh, Parallel Computer Architecture: A Hardware/Software Approach. USA: Morgan Kaufmann Publishers, Inc., 1999.

[41] J. Linn, "The Kerberos version 5 GSS-API mechanism." RFC 1964, June 1996.

[42] "PKI." URL: http://www.oasis-pki.org/resources/techstandards/.

[43] "Public-key infrastructure (X.509) (PKIX) charter." URL: http://www.ietf.org/html.charters/pkix-charter.html.

[44] M. Eisler, A. Chiu, and L. Ling, "RPCSEC_GSS protocol specification." RFC 2203, September 1997.

[45] J. Linn, "Generic security service application program interface." RFC 1508, September 1993.

[46] M. Eisler, "LIPKEY - a low infrastructure public key mechanism using SPKM." RFC 2847, June 2000.

[47] P. Honeyman, W. A. Adamson, and S. McKee, "GridNFS: Global storage for global collaborations," Local to Global Data Interoperability - Challenges and Technologies, 2005.

[48] "Secure NFS via SSH tunnel." URL: http://www.math.ualberta.ca/imaging/snfs/.

[49] D. Mazieres, M. Kaminsky, M. F. Kaashoek, and E. Witchel, "Separating key management from file system security," ACM SIGOPS Operating Systems Review, vol. 34, no. 2, pp. 19-20, 2000.

[50] M. Kaminsky, G. Savvides, D. Mazieres, and F. M. Kaashoek, "Decentralized user authentication in a global file system," ACM SIGOPS Operating Systems Review, vol. 37, pp. 60-73, December 2003.

[51] V. Welch, F. Siebenlist, I. Foster, J. Bresnahan, K. Czajkowski, J. Gawor, C. Kesselman, S. Meder, L. Pearlman, and S. Tuecke, "Security for grid services," in Proc. 12th IEEE International Symposium on High Performance Distributed Computing, pp. 48-57, 2003.


[52] A. Ferrari, F. Knabe, M. Humphrey, S. J. Chapin, and A. S. Grimshaw, "A flexible security system for metacomputing environments," in Proc. the 7th International Conference on High-Performance Computing and Networking, London, UK, pp. 370-380, Springer-Verlag, 1999.

[53] A. O. Freier, P. Karlton, and P. C. Kocher, "The SSL protocol version 3.0." Internet Draft, January 1996.

[54] T. Dierks and E. Rescorla, "The transport layer security (TLS) protocol." RFC 4346, April 2006.

[55] "OpenSSL: The open source toolkit for SSL/TLS." URL: http://www.openssl.org.

[56] "WS-Security specification." URL: http://www.oasis-open.org/specs/index.php#wssv1.0.

[57] "Web services secure conversation language." URL: http://www6.software.ibm.com/software/developer/library/ws-secureconversation.pdf.

[58] "Web services security policy language." URL: http://www6.software.ibm.com/software/developer/library/ws-secpol.pdf.

[59] A. Grimshaw, A. Ferrari, F. Knabe, and M. Humphrey, "Wide-area computing: resource sharing on a large scale," Computer, vol. 32, no. 5, pp. 29-37, 1999.

[60] M. Humphrey, "From Legion to Legion-G to OGSI.NET: Object-based computing for grids," in Proc. the 17th International Symposium on Parallel and Distributed Processing, IEEE Computer Society, 2003.

[61] J. Frey, T. Tannenbaum, M. Livny, I. Foster, and S. Tuecke, "Condor-G: A computation management agent for multi-institutional grids," Cluster Computing, vol. 5, pp. 237-246, July 2002.

[62] N. Peyrouze and G. Muller, "FT-NFS: an efficient fault-tolerant NFS server designed for off-the-shelf workstations," in Proc. the Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS'96), Washington, DC, USA, IEEE Computer Society, 1996.

[63] M. Castro and B. Liskov, "Practical Byzantine fault tolerance," in Proc. Symposium on Operating Systems Design and Implementation, USENIX Association, 1999.

[64] A. I. T. Rowstron and P. Druschel, "Storage management and caching in PAST, a large-scale, persistent peer-to-peer storage utility," in Proc. Symposium on Operating Systems Principles, pp. 188-201, 2001.
[65] A. Adya, W. Bolosky, M. Castro, R. Chaiken, G. Cermak, J. Douceur, J. Howell, J. Lorch, M. Theimer, and R. Wattenhofer, "FARSITE: Federated, available, and reliable storage for an incompletely trusted environment," in Proc. 5th Symposium on Operating Systems Design and Implementation, 2002.


[66] A. Chervenak, R. Schuler, C. Kesselman, S. Koranda, and B. Moe, "Wide area data replication for scientific collaborations," in Proc. 6th IEEE/ACM International Workshop on Grid Computing, pp. 1–8, 2005.
[67] "Web services architecture." Public draft, http://www.w3.org/TR/2003/WD-ws-arch-20030808/, August 2003.
[68] K. Czajkowski, D. Ferguson, F. Leymann, M. Nally, I. Sedukhin, D. Snelling, T. Storey, W. Vambenepe, and S. Weerawarana, "Modeling stateful resources using web services," Globus Alliance, March 2004.
[69] "Globus Toolkit." URL: http://www.globus.org/toolkit/.
[70] G. Wasson and M. Humphrey, "Exploiting WSRF and WSRF.NET for remote job execution in grid environments," in Proc. 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS '05), Washington, DC, USA, IEEE Computer Society, 2005.
[71] "WSRF::Lite." URL: http://www.sve.man.ac.uk/Research/AtoZ/ILCT.
[72] J. O. Kephart and D. M. Chess, "The vision of autonomic computing," Computer, vol. 36, no. 1, pp. 41–50, 2003.
[73] S. R. White, J. E. Hanson, I. Whalley, D. M. Chess, and J. O. Kephart, "An architectural approach to autonomic computing," in Proc. International Conference on Autonomic Computing, pp. 2–9, 2004.
[74] T. Kosar, G. Kola, and M. Livny, "A framework for self-optimizing, fault-tolerant, high performance bulk data transfers in a heterogeneous grid environment," in Proc. Second International Symposium on Parallel and Distributed Computing, pp. 137–144, 2003.
[75] V. Kalogeraki, "Decentralized resource management for real-time object-oriented dependable systems," tech. rep., 2001.
[76] D. Murky, C. David, W. Ian, S. Alla, G. Pawan, S. Aamer, R. Keri, L. Ed, T. William, A. Bill, B. Marcus, and K. Alexander, "Policy-based autonomic storage allocation," in Proc. IFIP/IEEE International Workshop on Distributed Systems, Operations and Management, October 2003.
[77] S. Sivasubramanian, G. Alonso, G. Pierre, and M. van Steen, "GlobeDB: autonomic data replication for web applications," in Proc. 14th International Conference on World Wide Web, New York, NY, USA, pp. 33–42, ACM Press, 2005.
[78] H. Yu and A. Vahdat, "Consistent and automatic replica regeneration," Trans. Storage, vol. 1, pp. 3–37, February 2005.


[79] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman, "PlanetLab: an overlay testbed for broad-coverage services," SIGCOMM Comput. Commun. Rev., vol. 33, pp. 3–12, July 2003.
[80] G. Venkitachalam and B. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in USENIX Annual Technical Conference, 2001.
[81] B. Dragovic, K. Fraser, S. Hand, T. Harris, A. Ho, I. Pratt, A. Warfield, P. Barham, and R. Neugebauer, "Xen and the art of virtualization," in Proc. ACM Symposium on Operating Systems Principles, October 2003.
[82] J. Dike, "A user-mode port of the Linux kernel," in Proc. 4th Annual Linux Showcase and Conference, 2000.
[83] C. P. Sapuntzakis, R. Chandra, B. Pfaff, J. Chow, M. S. Lam, and M. Rosenblum, "Optimizing the migration of virtual computers," in Proc. 5th Symposium on Operating Systems Design and Implementation, December 2002.
[84] R. Chandra, N. Zeldovich, C. Sapuntzakis, and M. S. Lam, "The Collective: A cache-based system management architecture," in Proc. 2nd Symposium on Networked Systems Design and Implementation, pp. 259–272, May 2005.
[85] C. Clark, K. Fraser, and S. Hand, "Live migration of virtual machines," in Proc. 2nd ACM/USENIX Symposium on Networked Systems Design and Implementation (NSDI), pp. 1–11, 2005.
[86] "Samba." URL: http://us4.samba.org/samba/.
[87] R. J. Figueiredo, N. Kapadia, and J. A. B. Fortes, "Seamless access to decentralized storage services in computational grids via a virtual file system," Cluster Computing, vol. 7, pp. 113–122, April 2004.
[88] "IOzone filesystem benchmark." URL: http://www.iozone.org.
[89] "Iperf: The TCP/UDP bandwidth measurement tool." URL: http://dast.nlanr.net/Projects/Iperf/.
[90] J. Katcher, "PostMark: A new file system benchmark," tech. rep., Network Appliance, 1997.
[91] J. Paladugula, M. Zhao, and R. J. Figueiredo, "Support for data-intensive, variable-granularity grid applications via distributed file system virtualization: a case study of light scattering spectroscopy," in Proc. Second International Workshop on Challenges of Large Applications in Distributed Environments (CLADE 2004), pp. 12–21, 2004.
[92] J. Hennessy and D. Patterson, Computer Architecture: A Quantitative Approach. Morgan Kaufmann, 3rd ed., 2002.


[93] M. Zhao and R. J. Figueiredo, "Proxy managed client-side disk caching for the virtual file system," tech. rep., November 2004.
[94] M. Zhao, V. Chadha, and R. J. Figueiredo, "Supporting application-tailored grid file system sessions with WSRF-based services," in Proc. 14th IEEE International Symposium on High Performance Distributed Computing (HPDC-14), pp. 24–33, 2005.
[95] M. Carson and D. Santay, "NIST Net: a Linux-based network emulation tool," SIGCOMM Comput. Commun. Rev., vol. 33, pp. 111–126, July 2003.
[96] "TI-RPC." URL: http://nfsv4.bullopensource.org/doc/tirpc_rpcbind.php.
[97] A. Zeitoun, Z. Wang, and S. Jamin, "RTTometer: measuring path minimum RTT with confidence," in Proc. 3rd IEEE Workshop on IP Operations and Management (IPOM 2003), pp. 127–134, 2003.
[98] J. Daemen and V. Rijmen, "AES proposal: Rijndael," 1998.
[99] "The keyed-hash message authentication code (HMAC)." FIPS Pub 198, 2002.
[100] K. Kaukonen and R. Thayer, "A stream cipher encryption algorithm 'Arcfour'." Internet Draft, 1999.
[101] G. Coulouris, J. Dollimore, and T. Kindberg, Distributed Systems: Concepts and Design. Addison-Wesley, 3rd ed., 2001.
[102] "SURA Coastal Ocean Observing and Prediction (SCOOP) program." URL: http://scoop.sura.org.
[103] A. Butt, S. Adabala, N. Kapadia, R. Figueiredo, and J. Fortes, "Grid-computing portals and security issues," Journal of Parallel and Distributed Computing, vol. 63, no. 10, pp. 1006–1014, 2003.
[104] R. P. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine, vol. 7, no. 6, pp. 34–45, 1974.
[105] "GSI-enabled OpenSSH." URL: http://grid.ncsa.uiuc.edu/ssh/.
[106] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, "VMPlants: Providing and managing virtual machine execution environments for grid computing," in Proc. 2004 ACM/IEEE Conference on Supercomputing, Washington, DC, USA, IEEE Computer Society, 2004.
[107] VMware Inc., VMware VirtualCenter User's Manual.
[108] VMware Inc., GSX Server 2.5.1 User's Manual.


[109] I. Foster, C. Kesselman, J. Nick, and S. Tuecke, "The physiology of the grid: An open grid services architecture for distributed systems integration." Open Grid Service Infrastructure WG, Global Grid Forum, 2002.
[110] H. Kreger, "Web services conceptual architecture," IBM Software Group, 2001.
[111] N. H. Kapadia, R. J. Figueiredo, and J. A. B. Fortes, "Enhancing the scalability and usability of computational grids via logical user accounts and virtual file systems," in Proc. 10th Heterogeneous Computing Workshop, pp. 82–82.
[112] S. Adabala, A. Matsunaga, M. Tsugawa, R. Figueiredo, and J. A. B. Fortes, "Single sign-on in In-VIGO: role-based access via delegation mechanisms using short-lived user identities," in Proc. 18th International Parallel and Distributed Processing Symposium, pp. 26–30, April 2004.
[113] V. Chadha and R. J. Figueiredo, "ROW-FS: A user-level virtualized redirect-on-write distributed file system for wide area applications," in Proc. 13th International Conference on High Performance Computing (HiPC), 2007.
[114] M. Litzkow, T. Tannenbaum, J. Basney, and M. Livny, "Checkpoint and migration of UNIX processes in the Condor distributed processing system," tech. rep., Univ. of Wisconsin-Madison, 1997.
[115] M. Bozyigit and M. Wasiq, "User-level process checkpoint and restore for migration," ACM SIGOPS Operating Systems Review, vol. 35, pp. 86–96, April 2001.
[116] L. Pearlman, V. Welch, I. Foster, C. Kesselman, and S. Tuecke, "A community authorization service for group collaboration," in Proc. Third International Workshop on Policies for Distributed Systems and Networks, pp. 50–59, 2002.
[117] O. Sato, R. Potter, M. Yamamoto, and M. Hagiya, "UML scrapbook and realization of snapshot programming environment," in Proc. International Symposium on Software Security, 2003.
[118] J. Xu, S. Adabala, and J. A. B. Fortes, "Towards autonomic virtual applications in the In-VIGO system," in Proc. Second International Conference on Autonomic Computing (ICAC 2005), pp. 15–26, 2005.
[119] Y. Zhong, S. Dropsho, and C. Ding, "Miss rate prediction across all program inputs," in Proc. 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), 2003.
[120] R. Wolski, "Dynamically forecasting network performance using the network weather service," Cluster Computing, vol. 1, no. 1, pp. 119–132, 1998.
[121] E. J. Gardner, "Exponential smoothing: The state of the art, Part II," International Journal of Forecasting, vol. 22, no. 4, pp. 637–666, 2006.


[122] D. C. Anderson, J. S. Chase, and A. M. Vahdat, "Interposed request routing for scalable network storage," ACM Trans. Comput. Syst., vol. 20, no. 1, pp. 25–48, 2002.
[123] L. W. Russell, S. P. Morgan, and E. G. Chron, "Clockwork: A new movement in autonomic systems," IBM Systems Journal, vol. 42, no. 1, 2003.
[124] D. Lu, Y. Qiao, P. A. Dinda, and F. E. Bustamante, "Modeling and taming parallel TCP on the wide area network," in Proc. 19th IEEE International Parallel and Distributed Processing Symposium, 2005.
[125] "nanoHUB." URL: http://www.nanohub.org/.
[126] W. C. Feng and P. Tinnakornsrisuphap, "The failure of TCP in high-performance computational grids," in Proc. High-Performance Networking and Computing Conference, 2000.
[127] H. R. Patterson, G. A. Gibson, E. Ginting, D. Stodolsky, and J. Zelenka, "Informed prefetching and caching," in High Performance Mass Storage and Parallel I/O: Technologies and Applications (H. Jin, T. Cortes, and R. Buyya, eds.), pp. 224–244, New York, NY: IEEE Computer Society Press and Wiley, 2001.


BIOGRAPHICAL SKETCH

Ming Zhao was born in Shanghai and grew up in Wuhan, two beautiful cities along the Yangzi River in China. At the age of nineteen, he left his hometown in pursuit of higher education, which has continued until today. Through seven years of study at Tsinghua University, Beijing, China, he received the B.E. and M.E. degrees from the Department of Automation, with a specialty in pattern recognition and intelligent systems. During the course of his master's dissertation work, a Web-based system for content-based image retrieval, he developed strong interests in computer systems research. Encouraged by his parents and advisor, he decided to continue his study overseas, in the United States of America.

In 2001, he started as a Ph.D. student at Rice University, Houston, TX. However, it was not until 2003, when he followed Prof. Renato Figueiredo to join Prof. Jose Fortes' ACIS lab at the University of Florida, Gainesville, FL, that he finally found the research he is fascinated with: distributed systems and virtualization. During more than five years of research in this area, he has actively published in the related conferences and journals, and the systems he built have also been deployed for production use, servicing applications and users from many disciplines. It has been quite a long journey for him in this intellectual quest, and he will continue it as an Assistant Professor at the School of Computing and Information Sciences at Florida International University, Miami, FL, in Fall 2008.