ACKNOWLEDGMENTS

I would like to thank my advisor, Dr. Renato Figueiredo, for all the support he has provided me during the last six years. He has taken me around the maze of systems research and shown me the right way whenever I felt lost. It has been a privilege to work with Dr. Figueiredo, whose calm, humble, and polite demeanor is one I would like to carry forward in my career. Thanks to Dr. Jose Fortes, who provided me the opportunity to work at the Advanced Computing and Information Systems (ACIS) laboratory. He gave me encouragement and support whenever things were down. I would like to thank Dr. Oscar Boykin for serving on my committee and for all those fruitful discussions on research, Linux, healthy food, and running. His passion for achieving perfection in every endeavor of life often spurs me to do better. Thanks to Dr. Alan George and Dr. Joseph Wilson for serving on my PhD program committee and motivating me through their courses and research work. I would like to thank my mentor, Ramesh Illikkal, and my manager, Donald Newell, at Intel Corporation for the faith they have shown in me and for spurring me to work hard. It has been a privilege to work with Ramesh, who taught me the importance of teamwork, failure, and success. Thanks to Dr. Padmashree Apparao and Dr. Ravishankar Iyer for guidance and encouragement to achieve my goals. Thanks are also due to Dr. Ivan Krsul and Dr. Suma Adabala for guiding me not only during the development and research of the In-VIGO project but also for often sharing thoughts on a PhD program and its expectations. Thanks are due to all the colleagues here at ACIS who made the work environment fun to work in. I would like to thank Andrea and Mauricio for providing excellent research facilities and resources. Thanks to my officemates Arijit and Girish for all the fruitful discussions. Thanks are due to Cathy for maintaining a cordial environment in the ACIS lab and extending support to me as a good friend whenever the need arises.
TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Introduction
    1.1.1 Virtual Network File System I/O Redirection
    1.1.2 Characterization of I/O Overheads in Virtualized Environments
      1.1.2.1 Simulation
      1.1.2.2 I/O Mechanisms
  1.2 Dissertation Contributions
  1.3 Dissertation Relevance
  1.4 Dissertation Overview
  1.5 Dissertation Organization

2 I/O VIRTUALIZATION: RELATED TERMS AND TECHNOLOGIES
  2.1 Introduction
  2.2 Virtualization Technologies
  2.3 Virtual Machine Architectures
    2.3.1 I/O Mechanisms in Virtual Machines
    2.3.2 Virtual Machines and CMP Architectures
  2.4 Grid Computing
  2.5 File System Virtualization
    2.5.1 Network File System
    2.5.2 Grid Virtual File System

3 REDIRECT-ON-WRITE DISTRIBUTED FILE SYSTEM
  3.1 Introduction
    3.1.1 File System Abstraction
    3.1.2 Redirect-on-Write File System
  3.2 Motivation and Background
    3.2.1 Use-Case Scenario: File System Sessions for Grid Computing
    3.2.2 Use-Case Scenario: NFS-Mounted Virtual Machine Images and O/S File Systems
  3.3 ROW-FS Architecture
    3.3.1 Hash Table
    3.3.2 Bitmap
  3.4 ROW-FS Implementation
    3.4.1 MOUNT
    3.4.2 LOOKUP
    3.4.3 GETATTR/SETATTR
    3.4.4 READ
    3.4.5 WRITE
    3.4.6 READDIR
    3.4.7 REMOVE/RMDIR/RENAME
    3.4.8 LINK/READLINK
    3.4.9 SYMLINK
    3.4.10 CREATE/MKDIR
    3.4.11 STATFS
  3.5 Experimental Results
    3.5.1 Microbenchmark
    3.5.2 Application Benchmark
    3.5.3 Virtual Machine Instantiation
    3.5.4 File System Comparison
  3.6 Related Work
  3.7 Conclusion

4 PROVISIONING OF VIRTUAL ENVIRONMENTS FOR WIDE AREA DESKTOP GRIDS
  4.1 Introduction
  4.2 Data Provisioning Architecture
  4.3 ROW-FS Consistency and Replication Approach
    4.3.1 ROW-FS Consistency in Image Provisioning
    4.3.2 ROW-FS Replication in Image Provisioning
  4.4 Security Implications
  4.5 Experiments and Results
    4.5.1 Proxy VM Resource Consumption
    4.5.2 RPC Call Profile
    4.5.3 Data Transfer Size
    4.5.4 Wide-area Experiment
    4.5.5 Distributed Hash Table State Evaluation and Analysis
  4.6 Related Work
  4.7 Conclusion
5 ...
  5.1 Introduction
  5.2 Motivation and Background
    5.2.1 Full System Simulator
    5.2.2 I/O Virtualization in Xen
  5.3 Analysis Methodology
    5.3.1 Full System Simulation: Xen VMM as Workload
    5.3.2 Instruction Trace
    5.3.3 Symbol Annotation
    5.3.4 Performance Statistics
    5.3.5 Environmental Setup for Virtualized Workload
  5.4 Experiments and Simulation Results
    5.4.1 Life Cycle of an I/O Packet
      5.4.1.1 Unprivileged Domain
      5.4.1.2 Grant Table Mechanism
      5.4.1.3 Timer Interrupts
      5.4.1.4 Privileged Domain
    5.4.2 Cache and TLB Characteristics
  5.5 Cache and TLB Scaling
  5.6 Related Work
  5.7 Conclusion

6 HARDWARE SUPPORT FOR I/O WORKLOADS: AN ANALYSIS
  6.1 Introduction
  6.2 Translation Lookaside Buffer
    6.2.1 Introduction
    6.2.2 TLB Invalidation in Multiprocessors
  6.3 Interprocessor Interrupts
  6.4 Grant Table Mechanism: I/O Analysis
  6.5 Experiments and Results
    6.5.1 Grant Table Performance
    6.5.2 Hypervisor Global Bit
    6.5.3 TLB Coherence Evaluation
  6.6 Related Work
  6.7 Conclusion

7 CONCLUSION AND FUTURE WORK
  7.1 Conclusion
  7.2 Future Work

REFERENCES

BIOGRAPHICAL SKETCH
LIST OF TABLES

Table
3-1 Summary of the NFSv2 protocol remote procedure calls
3-2 LAN and WAN experiments for micro-benchmarks
3-3 Andrew benchmark and AM-Utils execution times
3-4 Linux kernel compilation execution times on a LAN and WAN
3-5 Wide area experimental results for diskless Linux boot and second boot
3-6 Remote Xen boot/reboot experiment
4-1 Grid appliance boot and reboot times over wide area network
4-2 Mean and variance of DHT access time for five clients
6-1 Grant table overhead summary
6-2 TLB flush statistics with and without IPI flush optimization
6-3 Instruction TLB miss statistics with and without IPI flush optimization
6-4 Data TLB miss statistics with and without IPI flush optimization
LIST OF FIGURES

Figure
1-1 Protocol redirection through user-level proxies
1-2 Illustration of server consolidation
2-1 Landscape of virtualized computer systems
2-2 Systems partitioning characteristics
2-3 I/O virtualization path for single O/S and virtual machines
2-4 I/O partitioning in virtual machine and CMP architectures
2-5 Grid virtual file system
3-1 Indirection mechanism in the Linux virtual file system
3-2 Middleware data management for shared VM images
3-3 Check-pointing a VM container running an application with NFS-mounted file system
3-4 Redirect-on-write file system architecture
3-5 ROW-FS proxy deployment options
3-6 Hash table and flag descriptions
3-7 Remote procedure call processing in ROW-FS
3-8 A snapshot view of file system session through Redirect-on-Write proxy
3-9 Sequence of redirect-on-write file system calls
3-10 Number of RPC calls received by NFS server in non-virtualized environment, and by ROW-FS shadow and main servers during Andrew benchmark execution
4-1 O/S image management over wide area desktops
4-2 The deployment of the ROW proxy to support PXE-based boot of a (diskless) non-persistent VM over a wide area network
4-3 Algorithm to bootstrap a VM session
4-4 Algorithm to publish a virtual machine image
4-5 Replication approach for ROW-FS
4-6 Diskless client and publisher client security
4-7 ...
4-8 RPC statistics for diskless boot
4-9 Cumulative distribution of DHT query through 10 IPOP clients (in seconds)
5-1 Full system simulation environment with Xen execution
5-2 Execution driven simulation and symbol annotated profiling methodology
5-3 Symbol annotation
5-4 Function-level performance statistics
5-5 SoftSDV CPU controller execution mode: performance or functional
5-6 Life of an I/O packet
5-7 Unprivileged domain call graph
5-8 TCP transmit and grant table invocation
5-9 Timer interrupts to initiate context switch
5-10 Life of a packet in privileged domain
5-11 Impact of TLB flush and context switch
5-12 Correlation between VM switching and TLB misses
5-13 TLB misses after a VM context switch
5-14 TLB misses after a grant destroy
5-15 Impact of VM switch on cache misses
5-16 L2 cache performance for transmit of I/O packets
5-17 Data and instruction TLB performance for transmit of I/O packets
5-18 L2 cache performance for receive of I/O packets
5-19 Data and instruction TLB performance for receive of I/O packets
6-1 The x86 page table for small pages
6-2 Interprocessor interrupt mechanism in x86 architecture
6-3 Simulation experimental setup
6-4 Impact of tagging TLB with a global bit
6-5 Page sharing in multicore environment
6-6 ...
CHAPTER 1
INTRODUCTION

1.1 Introduction

This dissertation investigates data provisioning and performance characterization of virtual I/O, in particular network file systems, in such virtualized environments. The goals of this dissertation are as follows. First, devise and evaluate techniques which can seamlessly provide data to applications in wide-area environments spanning multiple domains through distributed file system protocol I/O redirection. Second, because the performance of this data provisioning solution is inherently limited by overheads associated with network I/O in a virtualized environment, this dissertation evaluates network I/O virtualization overheads in such environments with a simulation-based methodology that enables quantitative analysis of the impact of micro-architecture features on the performance of a contemporary split-I/O virtual machine hypervisor. Finally, this work explores hardware/software support to improve bottlenecks in I/O performance. The distributed file system redirection approach for data management and evaluation can benefit, in particular, applications where large, mostly-read data sets need to be provisioned in a virtual data center.

1.1.1 Virtual Network File System I/O Redirection

To facilitate data movement in such environments, I have developed a novel redirect-on-write distributed file system (ROW-FS) which allows for application-transparent buffering and request re-routing of all file system modifications locally through user-level proxies. These proxies forward file system accesses that modify any file system objects to a "shadow" distributed file system server, creating on-demand private copies of such objects in the shadow file server while routing accesses to unmodified data to a "main" server.
Figure 1-1 shows an example of protocol redirection of NFS through user-level proxies. Virtual machine environments are commonly used to improve CPU utilization in computer systems [1]. Such VM-based environments are increasingly used in data centers for resource consolidation, high-performance and grid computing, as a means to facilitate the deployment of user-customized execution environments [4][5][6][7]. Thus, the deployment of user-level proxies in VM-based execution environments is a common scenario to facilitate data movement and provisioning [4].

Figure 1-1. Protocol Redirection: the client/server protocol is modified to forward RPC calls to a shadow distributed file system server.

For example, consider the scenario of multiple servers in a data center which are consolidated into a single server through the deployment of virtual machines, a common contemporary use case of virtualization [8]. Figure 1-2 shows a possible deployment of ROW proxies which forward calls to the NFS and shadow servers. The shadow server VM and client VM are consolidated into a single physical machine. Such a scenario is common when the client VM is diskless or disk space on the client VM is a constraint. While deploying ROW proxies in such cases provides much needed functionality, the overhead associated with virtualized network I/O is often considered a bottleneck [9][10]. Specifically, the performance of ROW-FS in such environments, as well as that of [3], is dependent upon the performance of the underlying virtual I/O network layer.
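To make the redirection idea in Figure 1-1 concrete, the following is a minimal Python sketch of the routing decision a user-level proxy could make. The procedure names and server addresses are illustrative assumptions, and the sketch is deliberately simplified: in the actual ROW-FS design (Chapter 3), reads of already-modified blocks are also served by the shadow server.

```python
# Minimal sketch of per-procedure RPC redirection through a user-level proxy.
# Mutating NFS procedures are buffered at a local "shadow" server, while
# read-only procedures continue to the wide-area "main" server.

MUTATING_PROCS = {"WRITE", "CREATE", "REMOVE", "RENAME", "MKDIR", "RMDIR",
                  "LINK", "SYMLINK", "SETATTR"}

def route_rpc(procedure, main_server, shadow_server):
    """Return the server that should receive this NFS procedure call."""
    if procedure.upper() in MUTATING_PROCS:
        return shadow_server          # buffer all modifications locally
    return main_server                # serve unmodified data from the origin

# Example: a WRITE is redirected to the LAN shadow server,
# while a READ still goes to the wide-area main server.
print(route_rpc("WRITE", "main.example.org", "shadow.local"))
print(route_rpc("READ", "main.example.org", "shadow.local"))
```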
Figure 1-2. Server Consolidation: illustration of partitioning of a physical system with shared hardware resources. A specific case of ROW-FS deployment is shown. The shadow server is encapsulated in a separate VM. RPC calls are passed to the remote/shadow servers through a ROW-FS proxy deployed in the privileged VM.

1.1.2 Characterization of I/O Overheads in Virtualized Environments

Also, system workloads are becoming more complex as applications are often compartmentalized and executed within virtual machines [1]. Thus, key responsibilities of conventional O/Ses, such as scheduling, are delegated to a new layer: the hypervisor or virtual machine monitor [12]. Hypervisors are widely accepted as an approach to address under-utilization of resources such as I/O and CPU in physical machines. To characterize the impact, two questions arise:
1. Why has a simulation-based approach been chosen?
2. What are the options for network I/O mechanisms in virtual machine designs?

1.1.2.1 Simulation

The use of simulation in this dissertation is motivated by the fact that current system evaluation methodologies for virtual machines are based on measurements of a deployed virtualized environment on a physical machine. Although such an approach gives good estimates of performance overheads for a given physical machine, it lacks flexibility in determining resource scaling performance. In addition, it is difficult to replicate a measurement framework on different system architectures. It is important to move towards a full system simulation methodology because it is a flexible approach for studying different architectures.

1.1.2.2 I/O Mechanisms

The virtual machine I/O architectures which have been implemented in different hypervisors are: (1) the direct I/O model (Xen 1.0), where the hypervisor is responsible for running device drivers; (2) the split I/O model (Xen 2.0/3.0), where device drivers are exported to a privileged guest virtual machine; and (3) the pass-through I/O model, where direct access to hardware devices is provided from guest virtual machines. Chapter 2 further discusses these I/O mechanisms.
Microkernels have not found wide acceptance, arguably because of overheads associated with inter-process communication between different entities (such as the file system and device drivers). As processors become faster and new CMP architectures provide core-level parallelism, I/O approaches such as offloading selective virtual I/O functionality to a separate core are being explored [18]. I chose split I/O as the basis for the investigation in this dissertation for the following reasons: with the advent of chip multiprocessor architectures, the microkernel approach is being re-visited, and the Xen VMM version 3.0 [20] has features which are also found in microkernels [17]; dedicated CPUs to improve split I/O performance can easily be addressed through CMP architectures.
Systems such as Condor [2] have dealt with this problem via application check-pointing and restart. A limitation of this approach lies in that it only supports a very restricted set of applications: they must be re-linked to Condor libraries and cannot use many system calls (e.g., fork, exec, mmap). The approach of redirecting RPC calls to a shadow server, in contrast, supports unmodified applications. It uses client-side virtualization mechanisms that allow for transparent buffering of all file system modifications produced by distributed file system clients on local stable storage. Locally buffered file system modifications can then be checkpointed together with the application state. By "local storage", I mean storage which is either local to the client machine or in the client's local area network.
These O/S image frameworks [21][22] either require kernel-level support or rely on aggressive caching of full O/S images.
Problem 1: Previous research has shown that data access patterns for a DFS are often dynamic and ephemeral [23]. Also, current solutions are either based on complete file transfer, which may incur access latency overhead (as compared to block transfers) [24], or are limited to single domains [25]; this is not acceptable in virtualized environments where there is often a need to access large O/S images over a wide area.

Solution: I present a novel approach that enables wide-area applications to leverage on-demand block-based data transfers and a de-facto distributed file system (NFS) to access data stored remotely and modify it in the local area: the Redirect-on-Write File System (ROW-FS). I show that the ROW-FS approach provides substantial improvements compared to the traditional NFS protocol for benchmark applications such as Linux kernel compilation and virtual machine instantiation.

Problem 2: Client-server consistency and replication mechanisms for on-demand data transfer. During the time a ROW-FS file system session is mounted, all modifications are redirected to the shadow server. It is important to consider consistency in distributed file systems because data can potentially be shared by multiple clients. For consistency, two different scenarios need to be considered. First, there are applications in which it is neither needed nor desirable for data in the shadow server to be reconciled with the main server; an example is the provisioning of system images for diskless clients or virtual machines. Second, for applications in which it is desirable to reconcile data with the server, the ROW-FS proxy holds state in its primary data structures that can be used to commit modifications back to the server.

Solution: I leverage APIs exported by lookup services (such as a distributed hash table) in distributed frameworks (e.g., IPOP [26]) to keep clients consistent with the latest updates.
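As a rough illustration of how DHT lookup primitives could be used to keep clients aware of the latest published image version, the sketch below uses an in-memory dictionary as a stand-in for the real DHT (e.g., the one exposed by IPOP). The key naming and integer version scheme are assumptions made purely for the example.

```python
# Hedged sketch: a DHT's put/get primitives track the latest version of a
# published O/S image so clients can tell when a locally buffered copy is
# stale. ToyDHT is only a stand-in for a real distributed hash table.

class ToyDHT:
    def __init__(self):
        self._store = {}
    def put(self, key, value):
        self._store[key] = value
    def get(self, key):
        return self._store.get(key)

def client_is_current(dht, image_key, local_version):
    """Compare the locally cached image version against the published one."""
    published = dht.get(image_key)
    return published is not None and local_version >= published

dht = ToyDHT()
dht.put("appliance/disk.img/version", 7)                        # publisher announces an upgrade
print(client_is_current(dht, "appliance/disk.img/version", 6))  # False: client copy is stale
```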
Problem 3: The architectural reasons behind the I/O performance overheads in virtualized environments are not well understood. Early research in characterizing these penalties has shown that cache misses and TLB-related overheads contribute most of the I/O virtualization cost [27].

Solution: I have applied an execution-driven methodology to study the network I/O performance of Xen (as a case study) in a full system simulation environment, using detailed cache and TLB models to profile and characterize software and hardware hotspots. By applying symbol annotation to the instruction flow reported by the execution-driven simulator, I derive function-level call flow information. This methodology provides detailed information at the architectural level and allows designers to evaluate potential hardware enhancements to reduce virtualization overhead.

Problem 4: Hardware support for network I/O virtualization.
1.5 Dissertation Organization

Chapter 3 discusses the design and implementation of ROW-FS. I evaluate ROW-FS with micro- and application benchmarks to measure the overhead associated with individual RPC calls. Chapter 4 describes a novel approach to an O/S image management and provisioning architecture through diskless setup of VMs using ROW-FS.
PAGE 28
16 ],andwide-areaGridandpeer-to-peer(P2P)computingsystems[ 29 { 32 ].Assystemsevolveinthedirectionoflarge-scale,networkedenvironmentsandasapplicationsevolvetodemandprocessingofvastamountsofinformation,theinput/output(I/O)subsystemwhichprovidesacomputerwithaccesstomassstorageandtonetworksbecomesincreasinglyimportant.Fundamentally,computersystemsconsistofthreesubsystems:processors,memoryandI/O[ 1 ].Intheearlydaysofdesktopcomputing,I/Osystemswereprimarilyusedasanextensionofthememorysystem(asahierarchylevelbackingupcacheandRAMmemories)andforpersistentdatastorage.Withthearrivalofnetworksandthewide-areaInternet,theI/Osubsystemcanbeinterpretedasagenerictermreferringtoaccessofdata,overthenetworkorinstorage.InthecontextofnetworkedI/O,aparticularsubsystemwhichhasbeensuccessfullyusedinprovisioningdataovernetworksisa 28
PAGE 29
33 ],AFS[ 24 ],Legion[ 34 ]).Intheprocessorandmemorysubsystems,systemarchitectsincreasinglyapplyideasofvirtualization,withapproachesthatbuildontechniquesforlogicallypartitioningphysicalsystemsdevelopedinmainframes[ 12 ]whicharenowaccessibleforcommoditysystemsbasedonthex86architecture.Whileforseveralyearsthex86architecturedidnotsatisfyconditionsthatmakeaCPUamenabletovirtualization[ 12 ],hardwarevendors(IntelandAMD)haveprovidedhardwaresupporttosimplifythedesignofVirtualMachineMonitors(VMMs)[ 35 ][ 36 ].EventhoughvirtualizationapproachescanbeusedtoaddresstheproblemofCPUunder-utilizationthroughworkloadconsolidation,thegapbetweenI/OmechanismsandCPUeciencyhaswiden.Figure 2-1 givesanoverviewofsomeofthedierenttechnologiesbeingharnessedbystate-of-the-artcomputingsystems,andprovidesalandscapeinwhichtheapproachesdescribedinthisdissertationareintendedtobeapplied:wide-areasystems(possiblyorganizedinapeer-to-peerfashion),wherenodesandnetworksarevirtualizedandwherecommodityCPUscontainmultiplecores.Thefollowingsubsectionsaddressthesetechnologiesinmoredetail. 37 ].Theprocessofprovidingtransparentaccesstoheterogeneousresourcesthroughalayerof 29
PAGE 30
Landscapeofvirtualizedcomputersystems:resourcesandplatformsareheterogeneouswithrespecttohardwareandsystemsoftwareenvironments;computingnodeshavemultipleindependentprocessingunits(cores)andarevirtualizable;Gridcomputingmiddlewareisusedtoharnessesthecomputepowerofheterogeneousresources;apeer-to-peerorganizationofresourcesenablesself-organizingensemblesofCPUstosupporthigh-throughputcomputingapplicationsandscienticexperiments. indirectioniscentraltovirtualization.Forexample,inLinux,thevirtuallesystemprovidestransparentaccesstodataacrossdierentlesystems.Indirectioncanbeachievedthroughinterpositionbyanagentorproxybetweentwoormorecommunicatingentities.Proxieshavebeenextensivelyusedacrossforuserauthentication,callforwardingandforsecuregateways[ 38 ].Forexample,virtualmemoryisawidelyusedmechanismformultiplexingphysicalRAMintraditionaloperatingsystems. 30
PAGE 31
2-2 providesabroaderviewofpartitioningofasystemwhichalsoincludesverticalpartitioning.Horizontalpartitioningessentiallyaddsanextralayerinthesystemstackwhichprovidesabstractiontoapplicationlayertoaccessunderlyingresources,whileverticalpartitioningmechanismscanbeusedtodividetheunderlyingresources,forinstancetoisolatesub-systemssuchthatinterferenceduetofaultsandperformancecross-talkisminimized. Figure2-2. Systemspartitioningcharacteristics:AccesstoCPUinmulti-coresystemscanbenaturallypartitionedacrosscores,howevertherearehardwareresourceswhichareshared(e.g.L2cache,memory,harddiskandNICs).Qualityofservice(QoS)provisioningcanbeusedtopartitionsharedresourcesacrossvirtualcontainers. Ingeneral,virtualmachinearchitecturesexhibitthreecommoncharacteristics:multiplexing,polymorphismandmanifolding[ 4 ].Thevirtualizationapproachofmultiplexingphysicalresourcesnotonlydecouplesthecomputeresourcesfromhardwarebutprovidestheexibilityofallowingcomputeresourcestomigrateseamlessly.Today,manyvirtualmachinemonitorsareavailableforresearchanddevelopment(e.g.VMware[ 37 ],Parallels[ 39 ],VirtualBox[ 40 ],KVMcitevm:kvm,lguest[ 41 ],Xen[ 20 ],UML[ 42 ],andQemu[ 43 ]). 31
PAGE 32
1 ].Inordertobeeectivelyvirtualizable,itisdesirablethattheunderlyingprocessorinstructionsetarchitecture(ISA)followsconditionssetforthin[ 12 ].Forseveralyears,theinstructionsetofwhatiscurrentlythemostpopularmicroarchitecure[ 44 ]wasnotdirectlyvirtualizablebecausethereareasetofsensitiveinstructionswhichdonotcausetheprocessortotrapwhenrunninginunprivilegedmode[ 1 ].Systemvirtualmachineshavebeendesignedtoovercomethislimitationinpreviousx86generationswithtwomajorapproachesresultinginsuccessfulimplementations.Intheclassicalapproach,avirtualmachinesuchasVMwarereliesonecientbinarytranslationmechanismstoemulatenon-virtualizableinstructionswithoutrequiringmodicationstotheguestO/S.Intheparavirtualizedapproach(e.gXen),modicationstothearchitecture-dependentcodeoftheguestO/Sarerequired,bothtoavoidtheoccurrenceofnon-virtualizableinstructionsandtoenableimprovedsystemperformance.IntelandAMDhaveprovidedhardwaresupporttoextendthex86architectureforvirtualizedenvironment[ 1 ],makingtheimplementationofclassicVMMsamucheasiertaskbecausebinarytranslationisnotrequired;theKVMvirtualmachineisanexampleofarecentVMMwhichbuildsuponsuchhardwareextensions. 2-3 (a).Incontrast,aconventionalI/OvirtualizationpathtraversesthroughaguestVMdriver,virtualI/Odevice,physicaldevicedriverandphysicaldevice(e.g.anetworkinterfacecard,NIC).TheguestVMdrivermerelyprovidesamechanismtoshareaccesstoavirtualI/Odevice(emulateddevicedriver)throughasharedmemorymechanism.Theemulateddeviceiseither 32
PAGE 33
2-3 (b)). Figure2-3. I/Ovirtualizationpath:(a)InsingleO/S,applicationinvokesphysicaldevicedriverbymeansofsystemcalls(b)InaVMenvironment,eachVM'sguestdrivercommunicateswithvirtualI/Odevicethroughasharedmemorymechanism.ThevirtualI/OdevicecaneitherresideinthehypervisororintheseparateVMasshownbydottedlines. ThefollowingaretypicalalternativeswhichhavebeenconsideredforI/Ohandlinginvirtualmachineenvironments: 37 ]. 45 ]. 33
PAGE 34
46 ][ 47 ].IOMMUallowsdirectmappingofhardwaredeviceaddressesintoguestphysicalmemoryandalsoprotectsVMsfromspuriousmemoryaccesses. I/OPartitioning:Resourcemappinginmulti-coresystems.VM1isallocatedtwoCPUs.VM3ispinnedtoI/OdevicesDiskD1,andNICN1,N2. CMParchitecturesallowconcurrentexecutionofsoftwarethreadsandmodules.Thus,toleverageonCMParchitectures,itisimportanttopartitionthesystemsoftwareresources,operatingsystemsandhardwareresourcessoastodeterministicallyallocateCPUresourcestotheapplications[ 48 ].TheperformancegainthroughsuchapartitioningofresourcescouldbeabbreviatedifbottleneckishardwaredevicessuchasNICs.Toaddressthis,trendsareeithertowardsharnessingmultipledevicessuchasnetwork 34
PAGE 35
10 ]ordevelopingnewmodelsofcommunicationwithhardwaredevices[ 48 ][ 49 ].ItisconceivablethatvirtualizedCMParchitecturesofthefuturewillrunmultipleVMs,withamixofresourcestime-sharedand/ordedicatedtoVMguests.Forexample,asshowninFigure 2-4 ,virtualmachinesVM1,VM2andVM3arehostedonamulti-coresystem.AsshowninFigure 2-4 ,VM3ispinnedtoDiskD1,NICsN1andN2andhasbeenallocatedasingleCPU.VM1hasbeenallocatedwith2CPUsandhaveprivilegedstatustoaccesshardwareresources.ThegurealsoillustratesvirtualizationpathforsplitI/O(dottedpath)anddirectI/Oapproach.InsplitI/O,sincevirtualizationpathisdividedintoseparateVMcontainers,allocatingdedicatedresources(e.g.CPUsandNICs)toguestandprivilegedVMscanpotentiallyimproveI/Operformance. 29 ],Gridcomputingreferstothe\coordinatedresourcesharingandproblemsolvingindynamic,multi-institutionalvirtualorganization".Gridcomputingtypicallytakesplaceonheterogeneousresourcesdistributedacrosswide-areanetworks,relyingonsupportfrommiddlewaretoprovideservicessuchasauthentication,scheduling,anddatatransfers.VirtualizationinthecontextofGridcomputingismotivatedbytheabilitytoabstracttheheterogeneityofresourcesandprovideaconsistentenvironmentfortheexecutionofworkloads.TheIn-VIGO[ 4 ]middlewareisarepresentativeexampleofasystemwhichextensivelyemploysvirtualizationinthisscenario.SimilarapproachhasbeentakenbyseveralprojectsinGridorUtilitycomputingsuchasCOD[ 28 ]andVirtualWorkspaces[ 5 ].Akeychallengearisinginwide-area,GridcomputinginfrastructuresisthatofmanagementandaccesstodatanotonlyI/Olocaltoanode,butalsohowtoprovidedatatoapplications,seamlessly,inenvironmentsspanningmultipledomains. 50 ].Ingridenvironments,datamovementand 35
PAGE 36
3 ][ 24 ][ 33 ]toenhancementsofwidely-usedlocal-areaprotocols(NFSv2/v3)and,tooverlayadditionalfunctionalitiesormodiedconsistencymodelsoverwideareanetworks[ 51 ][ 52 ][ 53 ][ 54 ].However,thewide-spreaddeploymentofnewprotocolsishinderedbythefactthatoperatingsystemsdesignershavemostlyfocusedonlocal-areadistributedlesystemswhichcovertypicalusagescenarios.Asanexample,open-sourceandproprietaryimplementationsoftheLAN-orientedversionsoftheNFSprotocol(v2/v3)havebeendeployed(andhardenedovertime)inthemajorityofUNIXavorsandWindows,whileopen-sourceimplementationsofthewide-areaprotocol(v4)underdevelopmentsincethelate1990shavenotyetbeenwidelydeployed.ThefollowingsectionswillexplaintheNFSprotocolarchitectureandarelatedapproachthatconsistofvirtualizationlayerbuiltonexistingNetworkFileSystem(NFS)components-Gridvirtuallesystem(GVFS). 1. Machineandlesystemindependence 2. Transparentaccesstoremoteles 3. Simplecrashrecoverymechanism 4. Lowperformanceoverheadinlocalareanetworks 36
PAGE 37
3 ]andwaslaterextendedtoaddressshortcomingssuchassmalllesizes,largenumberofGetattrcallsandperformanceoverhead,leadingtoNFSversion3(NFSv3).Forexample,inNFSv2,theinvocationofalookupcallisalwaysfollowedbytheinvocationofagetattrcall.InNFSv3,thelookupcallisoptimizedtoreturnattributesinasingleRPCoperation.Similarly,NFSv3introducesnewprocedurecallstoprovisionbueringofwritesinclientandlatercommittingittotheserver.NeitherNFSv2norv3scalewellincross-domainwide-areaenvironments[ 3 ].Inaddition,NFSv3doesnotprovideguaranteedconsistencybetweentheclients.NFSv4improvestheconsistencymechanismattheexpenseofamorecomplexserverdesign,whichisnolongerstateless[ 3 ].NFSv4implementsanopen-closeconsistencymechanism.NFSv4clientscancachethedataafterleisopenedforaccess.Ifcacheddataismodied,NFSclientsneedtocommitthedatabackduringlecloseoperation.NFSclientscanalsore-validatethecacheddatathroughletimestampforfutureaccessofthedata.Inaddition,NFSv4serversupportsdelegationandcallbackmechanismstoprovidewritepermissionstotheclient.Thus,aNFSv4clientcanallowotherclientstoaccessthedatafromthedelegatedle.NFSsupportshierarchicalorganizationoflesanddirectories.EachdirectoryorleinNFSserverisuniquelyaddressedbyapersistentlehandle[ 3 ].Alehandleisareferencetoaleordirectorythatisindependentofthelename.Forexample,NFSv2haspersistentlehandlesofsizeof32-bytes.AcomprehensivesurveyofNFSlehandlestructurecanbefoundin[ 55 ]. 37
PAGE 38
3 ].Thisapproachofstatelessserversimpliesfailurerecovery.Forexample,afailurerequiringaserverrestartcanbedealtwithbysimplyrequiringclientstore-establishtheconnection.Inaddition,theNFSprotocolsupportsidempotentoperations.Intheeventofaservercrash,theclientonlyneedstowaitfortheservertobootandre-sendtherequest.NetworkFileSystemprimarilyconsistsoftwoprotocols.First,themountdprotocolisusedtoinitiallyaccessthelehandleoftherootofanexporteddirectory.Second,thenfsdprotocolisusedtoinvokeRPCprocedurecallstoperformleoperationsonremoteserver.ANFSclientinvokesthemountprotocolthroughmountutility.Themountprotocolisathreestepprocess.First,theclientcontactsthemountdservertoobtaintheinitiallehandleforanexportedlesystem.Inthesecondstep,themountprotocolaccesstheattributesofthedirectorymountpointrequestedbytheclient.Finally,theNFSclientobtainstheattributesoftheexportedlesystem.NFSprovidesthecapabilitytoenabledierentauthenticationmechanismsuchasUnixsystemauthentication(UIDorGID).NFSsupportsauthorizationbasedonaccesscontrollistsmaintainedbytheserver.ThisaccesscontrollistprovidesamappingofuserandgroupIDbetweenclientandserver.WheneveraRPCcallisreceivedbytheserver,theservervalidatestheclientcredentialsthroughaccesscontrollist. 56 ].GVFSformsthebasicframeworkforthetransferofdatanecessaryforproblemsolvingenvironmentssuchasIn-VIGO.ItreliesonavirtualizationlayerbuiltonexistingNetworkFileSystem(NFS)components,andisimplementedatthelevelofRemoteProcedureCalls(RPC)bymeansofmiddleware-controlledlesystemproxies.AvirtuallesystemproxyinterceptsRPCcallsfromanNFSclientandforwardsthemtoanNFSserver,possiblymodifying 38
PAGE 39
4 ]. Figure2-5. GridVirtualFileSystem:NFSproceduralcallsareinterceptedthroughuser-levelproxies.GVFSproxiesaredeployedinclientandservermachines.UsersareauthenticatedthroughaaccesscontrollistexportedbyGVFSproxyandNFSserver. Figure 2-5 providesanoverviewofgridvirtuallesystem.Asshowninthisgure,middleware-controlledlesystemproxiesareusedtostartagridsessionfortheclient.GridvirtualFilesystemsupportsperformanceenhancingmechanismssuchasadisk-cache[ 56 ].GVFSproxiesarefurtherextendedwithwritebacksupporttoprovideon-demandvirtualenvironmentsforgridcomputing[ 57 ].ThisapproachreliesonbueringRPCrequestsandresultsinadisk-cache,andcommittingchangesbacktoserverattheendofusersession.Arelatedworkusesaservice-orientedapproachtoharnesstheGVFSproxiesforoptimizationssuchascachingorcopy-on-writesupport.Thisapproachisbased-onWebservicesresourceframework(WSRF)thatenablestheprovisioningofdata 39
PAGE 40
58 ]. 40
CHAPTER 3
REDIRECT-ON-WRITE DISTRIBUTED FILE SYSTEM

3.1 Introduction

3.1.1 File System Abstraction

A file system is an abstraction commonly used to access data from memory/storage systems (e.g., disk). This abstraction is often implemented as a layer of indirection. Indirection mechanisms are commonly used to address computer science problems. For example, in the Linux O/S, to provide transparent access to different file systems, indirection mechanisms are typically used to steer file system operations through a common file system framework called the virtual file system (VFS). Figure 3-1 shows indirection mechanisms across three levels: transparent access to a file system through the VFS framework, logical access to a disk volume, and indirect access to a file block through an i-node. This dissertation applies indirection mechanisms through a user-level proxy so as to provide transparent access to data from one or more servers. The redirect-on-write file system enables wide-area applications to leverage on-demand block-based data transfers and a de-facto distributed file system (NFS) to access data stored remotely and modify it in the local area.

Figure 3-1. Indirection mechanism in the Linux virtual file system (VFS). Access to various file systems is provided through the indirection mechanism. A file system's blocks can be logically present in multiple hard disks. Further, access to file blocks is performed through direct or indirect access of blocks through inodes.
Figure 3-2. Middleware data management: grid users G1, G2 and G3 access file disk.img from the server and customize it for personal use through ROW proxies. G1 modifies the second block B to B', G2 modifies block C to C', and G3 extends the file with an additional block D. (a) Modifications are stored locally at each shadow server. (b) Virtualized view.

ROW-FS complements capabilities provided by "classic" virtual machines (VMs [12][1]) to support flexible, fault-tolerant execution environments in distributed computing systems. Namely, ROW-FS enables mounted distributed file system data to be periodically check-pointed along with a VM's state during the execution of a long-running application. ROW-FS also enables the creation of non-persistent execution environments for non-virtualized machines. For instance, it allows multiple clients to access in read/write mode an NFS file system containing an O/S distribution exported in read-only mode by a single server. Local modifications are kept in per-client "shadow" file systems.
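The per-client buffering of Figure 3-2 can be pictured with a small sketch: a local overlay captures modified blocks (for example, G1's change of block B to B'), while unmodified blocks are still read from the shared, read-only image. The class name and toy string block contents below are illustrative only, not the ROW-FS implementation.

```python
# Minimal sketch of a per-client redirect-on-write view: writes land in a
# local block overlay, reads fall through to the read-only base image when
# a block was never modified.

class RowOverlay:
    def __init__(self, base_blocks):
        self.base = base_blocks      # read-only image on the main server
        self.shadow = {}             # per-client modified blocks

    def write(self, index, data):
        self.shadow[index] = data    # redirect the modification locally

    def read(self, index):
        return self.shadow.get(index, self.base[index])

base = ["A", "B", "C"]               # shared disk.img blocks
g1 = RowOverlay(base)
g1.write(1, "B'")                    # G1 customizes the second block
print(g1.read(1), g1.read(2))        # B' C  -> virtualized view seen by G1
print(base)                          # ['A', 'B', 'C'] -> server image untouched
```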
Figure 3-2 illustrates an example of a shared VM image between grid users G1, G2 and G3. O/S image modifications are locally buffered, whereas the server hosts read-only O/S images.

3.2.1 Use-Case Scenario: File System Sessions for Grid Computing

For example, data management in In-VIGO is provided by a virtualization layer known as the Grid Virtual File System. The resulting grid virtual file system allows dynamic creation and destruction of file system sessions on a per-user or per-application basis. Such sessions allow on-demand data transfers, and present to users and applications the API of a widely used distributed network file system across nodes of a computational grid. ROW-FS can export similar APIs to end-user and network-intensive applications to transparently buffer writes in a local server.
3.2.2 Use-Case Scenario: NFS-Mounted Virtual Machine Images and O/S File Systems

In these environments, it is often the case that the base operating system layer is shared between different clients (read-only). Any modifications to the base OS by clients (e.g., a kernel patch) can be made feasible by deploying ROW-FS to access the read-only images.
In the figure, the client virtual machine "C" crashes at time tf. In traditional NFS (Figure 3-3, top), job execution has to restart again, because the server state "S" may no longer be consistent with the client state at the time of the last checkpoint. In the redirect-on-write setup (Figure 3-3, bottom), job execution can correctly restart at the last checkpoint tc.

Figure 3-3. Check-pointing a VM container running an application with NFS-mounted file systems. In traditional NFS (top), once a client rolls back to checkpointed state, it may be inconsistent with respect to the (non-checkpointed) server state. In ROW-FS (bottom), state modifications are buffered at the client side and are checkpointed along with the VM.

An important class of grid applications consists of long-running simulations, where execution times on the order of days are not uncommon, and mid-session faults are highly undesirable. Systems such as Condor [2] have dealt with this problem via application check-pointing and restart. A limitation of this approach lies in that it only supports a restricted set of applications: they must be re-linked to specific libraries and cannot use many system calls (e.g., fork, exec, mmap). ROW-FS, in contrast, supports unmodified applications.
The Condor [2][61] middleware is being extended with the so-called VM universe to support checkpoint and restore of entire VMs rather than individual processes; ROW-FS sessions can conceivably be controlled by this middleware to buffer file system modifications until a VM session completes.

3.3 ROW-FS Architecture

The ROW-FS architecture is depicted in Figure 3-4. It consists of user-level DFS extensions that support selective redirection of distributed file system (DFS) calls to two servers: the main server and a shadow server. The architecture is novel in the manner it overlays the ROW capabilities upon unmodified clients and servers, without requiring changes to the underlying protocol. The approach relies on the opaque nature of NFS file handles to allow for virtual handles [3] that are always returned to the client, but map to physical file handles at the main and ROW servers. A file handle hash table stores such mappings, as well as information about client modifications made to each file handle. Files whose contents are modified by the client have "shadow" files created by the shadow server as sparse files, and block-based modifications are inserted in-place into the shadow file. A presence bitmap marks which blocks have been modified, at the granularity of NFS blocks (typically of size 8-32 KB).

Figure 3-5 shows possible deployments of proxies enabled with user-level disk caching and ROW capabilities. For example, a cache proxy configured to cache read-only data may precede the ROW proxy, thus effectively forming a read/write cache hierarchy. Such a cache-before-redirect (Figure 3-5(a)) proxy setup allows disk caching of both the read-only contents of the main server as well as of client modifications.
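A minimal sketch of the file-handle virtualization described above follows: the client only ever sees a virtual handle, while the proxy's hash table maps it to the physical handles on the main and shadow servers. The handle strings and table layout are illustrative assumptions, not the proxy's actual implementation (real NFSv2 handles are 32-byte opaque values).

```python
# Sketch of virtual-to-physical file handle mapping kept by the ROW proxy.

class HandleTable:
    def __init__(self):
        self.table = {}   # virtual handle -> {"main": MFH, "shadow": SFH}

    def register(self, virtual_fh, main_fh, shadow_fh):
        self.table[virtual_fh] = {"main": main_fh, "shadow": shadow_fh}

    def physical(self, virtual_fh, server):
        """Translate the client's handle for the chosen server ('main' or 'shadow')."""
        return self.table[virtual_fh][server]

handles = HandleTable()
handles.register("vfh-01", main_fh="mfh-aa", shadow_fh="sfh-07")
print(handles.physical("vfh-01", "shadow"))   # handle used when redirecting a call
```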
Figure 3-4. ROW-FS Architecture: the redirect-on-write file system is implemented by means of a user-level proxy which virtualizes NFS by selectively steering calls to either a main server or a shadow server. MFH: Main File Handle, SFH: Shadow File Handle, F: Flags, HP: Hash table processor, BITMAP: bitmap processor.

Write-intensive applications can be supported with better performance using a redirect-before-cache (Figure 3-5(b)) proxy setup. Furthermore, redirection mechanisms based on the ROW proxy can be configured with both shadow and main servers being remote (Figure 3-5(c)). Such a setup could, for example, be used to support a ROW-mounted O/S image for a diskless workstation.

Figure 3-5. Proxy deployment options: (a) Cache-before-redirect (CBR), (b) Redirect-before-cache (RBC), (c) Non-local shadow server.

3.3.1 Hash Table
The set of file system objects maintained at the shadow server is a superset of the file system objects in the main server. The main-indexed (MI) table is needed to maintain state information about files in the main server. Figure 3-6 shows the structure of the hash table and flag information. The readdir flag (RD) is used to indicate that an NFS readdir procedure call has been invoked for a directory in the main server. The generation count (GC) is a number inserted into the hash tuple for each file system object to create a unique disk-based bitmap. The remove (RM) and rename (RN) flags are used to indicate deletion/renaming of a file.
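The following sketch shows one shadow-indexed hash-table entry carrying a subset of the fields named in Figure 3-6. It is a Python illustration rather than the proxy's real data structure, and the field values are made up for the example.

```python
# Hedged sketch of a shadow-indexed (SI) hash-table entry and its flags.

from dataclasses import dataclass

@dataclass
class RowEntry:
    sfh: bytes          # shadow file handle
    mfh: bytes          # main file handle
    rd: bool = False    # readdir already replayed from the main server
    re: bool = False    # read may need to consult the block bitmap
    gc: int = 0         # generation count, makes the bitmap name unique
    rm: bool = False    # object removed during this session
    rn: bool = False    # object renamed during this session

entry = RowEntry(sfh=b"sfh-07", mfh=b"mfh-aa", re=True, gc=234)
print(entry.rm or entry.rn)   # False: object is still visible to the client
```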
Figure 3-6. Hash table and flag descriptions. SFH: Shadow File Handle, MFH: Main File Handle, RD: Readdir Flag, RE: Read Flag, GC: Generation Count, RM: Remove Flag, RN: Rename Flag, L1: Initial Main Link, L2: New Shadow Link, L3: Current Main Link, RL: Remove/Rename File List.

3.3.2 Bitmap

To keep track of the current location of updated blocks, each file is represented by a two-level hierarchical data structure on disk. The first level indicates the name of the file which contains information about the block. The second level indicates the location of a presence bit within the bitmap file.

Figure 3-7. Remote procedure call processing in ROW-FS. The procedure call is first forwarded to the shadow server and later to the main NFS server. SS: Shadow Server, MS: Main Server, SI: Shadow Indexed, MI: Main Indexed.
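Returning to the two-level bitmap, a hedged sketch of the presence-bit lookup is shown below. The 32 KB block size, the per-bitmap-file capacity, and the directory naming (hashed shadow handle concatenated with the generation count) are illustrative assumptions made for the example, not exact implementation constants.

```python
# Sketch of the two-level, on-disk presence bitmap: level one names a
# per-file bitmap directory, level two locates the presence bit for an
# NFS block inside a numbered bitmap file.

BLOCK_SIZE = 32 * 1024           # NFS transfer size assumed for the sketch
BITS_PER_BITMAP_FILE = 8 * 4096  # one 4 KB bitmap file covers this many blocks

def bitmap_location(hashed_sfh, generation, offset):
    """Map a byte offset to (bitmap directory, bitmap file, bit index)."""
    block = offset // BLOCK_SIZE
    directory = f"{hashed_sfh}{generation}"       # e.g. hash '777' + count '234'
    bitmap_file = block // BITS_PER_BITMAP_FILE   # file '0', '1', ...
    bit = block % BITS_PER_BITMAP_FILE
    return directory, str(bitmap_file), bit

print(bitmap_location("777", "234", 96 * 1024))   # ('777234', '0', 3)
```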
PAGE 51
3-8 illustratesasnapshotviewoflesystemsessionthroughRedirect-on-writeproxy.ROW-proxywhichisusedtointerceptthemountprotocolisabstractedinthegure.NFSclientsmountsaread-onlydirectory(/usr/lib)fromtheserverVM.ThemountedlesystemdirectoryistransparentlyreplicatedintheclientVMtobuerlocalmodications.Filesreplicatedatshadowserveraredummyleswhichrepresentsasparseversionofread-onlyleintheServerVM.Onlyleblockswrittenduringthelesystemsessionsarereplicatedintheshadowserver.Anhashtableentryisupdatedtoprovidestatusofles.Figure 3-8 illustrateshexdumpof32bytesNFSv2hashtable.Generationcountisusedtoprovideauniquebitmapdirectory.Thegenerationcountalongwithhashedvalueofshadowlehandleisusedtocreateabitmapdirectoryperle.AsshownintheFigure 3-8 ,libXlehandleishashedto"777"whichisfurtherconcatenatedwithgenerationcount"234"toproduceauniquebitmapdirectory.RDagismarked"0"asthisisforale.REagismarked"1"toindicatethatbitmapneedstobeaccessedforapossible"libX"leblockintheshadowserver.For"libX"le,thereisnomain-indexedhashtableentryasthereisnostatusinformationtokeepforread-onlyleintheServerVM.Allthenewlywrittenblocksarepresentin"0"leofbitmapdirectory. 3-7 describesthevarioushashtableentriesstoredintheproxywhicharereferencedthroughoutthissection.Table 3-1 brieydescribesNFSv2RPCcallsandpointstotherelevantsectionsforcallmodications.AdetaileddescriptionofallNFSprotocolcallsdescribedbelowcanbefoundin[ 3 ]. 51
PAGE 52
AsnapshotviewoflesystemsessionthroughRedirect-on-Writeproxy.Thehashtableandbitmapstatusisshownforthele\libX"whichistransparentlyreplicatedinashadowserver.Threeblocksof\libX"areshowntoberecentlyaccessedandwrittenintheshadowserver. theserver.Inthesecondstep,themountutilityinvokestheNFSgetattrproceduretogetattributesofdirectory.Finally,themountutilitygetstheattributesoflesystem.Tomaintainmounttransparency,ROW-FSalsohasaproxyforthemountprotocol.Themountprocedureismodiedtoobtaininitialmountlehandleofshadowserver.Specically,themountproxyforwardsamountcalltobothshadowandmainserver.Whenthemountutilityisissuedbyaclient,theshadowserveriscontactedrsttosavethelehandleofthedirectorytobemounted.ThislehandleislaterusedbyNFSprocedurecallstodirectRPCcallstotheshadowserver.TheinitialmappingoflehandlesofamounteddirectoryisinsertedintheSIhashtableduringinvocationofthegetattrprocedure.Figure 3-9 (top,left)depictshandlingoftheMountprocedure. 52
PAGE 53
SummaryoftheNFSv2protocolremoteprocedurecalls.EachrowsummarizesthebehavioroftheRPCcallandpointstothesectionwithinthischapterwherethemechanismtovirtualizeeachcallinROW-FSisdescribed. NFScall Behavior(ModicationSection) Null Testingcall(Nomodication) Getattr RetrievestheattributesfromNFSserver(Section 3.4.3 ) Setattr Settheattributesofaleordirectory(Section 3.4.3 ) Lookup Returnlehandleforalenameordirectory(Section 3.4.2 ) Readlink Readsymboliclink(Section 3.4.8 ) Read Readablockofale(Section 3.4.4 ) Write Writetoablockofale(Section 3.4.5 ) Create Createanewle(Section 3.4.10 ) Remove Removeale(Section 3.4.7 ) Rename Renameale(Section 3.4.7 ) Link Createahardlinkofale(Section 3.4.8 ) Symlink Createasymboliclinkofale(Section 3.4.9 ) Mkdir Createanewdirectory(Section 3.4.10 ) Rmdir Removeanexistingdirectory(Section 3.4.7 ) Readdir Listcontentofanexistingdirectory(Section 3.4.6 ) Statfs Checkstatusoflesystem(Section 3.4.11 ) exist"error.Otherwise,theproxyissuesanNFScreatecallforadummy 53
PAGE 54
1. Anewlemayhavebeencreatedattheshadowserver.Inthiscase,allblocksarepresentinshadowserverandallreadcallsaredirectedtoit. 54
PAGE 55
Ifthelesystemobjectisnotnewlycreatedatshadowserver,leblocksmayresideineithershadowormainserver.Inthatcase,theproxyusesthebitmappresencedatastructureandcalculateslocationofcurrentandvalidleblocktodeterminewhetherthereadrequestshouldbesatisedbythemainorbytheshadowserver. 3. Anoptimization(REag)isdoneforthecasewhenlehandlemappingispresentinMIhashtable;eventhoughabitmaphasnotbeencreated(i.e.noblocksofthelehavebeenwritteninto),thecallisforwardedtothemainserver.Thisoptimizationpreventsexpenseofcheckingbitmapdatastructurefromdisk. 55
PAGE 56
Sequenceofredirect-on-writelesystemcalls asimplelistdirectoryutility(i.e."ls-l")mayinvokemultiplereaddirprocedurecallsbecauseoflargenumberofleobjectspresentinthedirectory.Toprovidesynchronizationbetweenmultiplereaddircalls,informationisneededtokeeptrackofthepositionwherelastreaddircallreturnedthelesystemobject.IntraditionalNFS,thisisaccomplishedbythemeansofacookierandomlygeneratedonaper-leobjectbasis.InthecontextofROW-FS,therearetwopossiblescenariosforreaddirprocedure:rstcallorsubsequentcall.Forsakeofclarity,Ireferthemainserverreaddirasm-readdirandtheshadowserverforwardinginvocationass-readdir.Tovirtualizethersts-readdirprocedure,it 56
PAGE 57
1. ROWproxyinterceptss-readdirrequestfromclientandchecksstatusofparentlehandleintheSIhashtable.Ifitistherstcall,initializeatemporarybuertostoretemporarycookieformultiplem-readdircalls. 2. CheckstatusofRDagofparentdirectorywhichindicatesifreaddirhasbeenpreviouslycalled.IfRDagissetthenforwardcalltoshadowserver.IfRDagisnotset,thefollowingaretheoptionsofrelativestructureofthedirectoriesintheshadowandmainserver: 3. Checkforlesystemtypeofthereturnedlesystemobject.Ifitisasymboliclink,thereadlinkprocedureisinvokedtogetthedatafrommainserverandasymlinkcallisissuedtoshadowservertoreplicatetheobject. 57
PAGE 58
Iflesystemobjectisoflinktype(asprovidedbythenlinkattributeofthefattrdatastructure),thentheLINKprocedureiscalledintheshadowserver.Linkprocedureistheonlyprocedurewhichcanincrementnlinkattributeoflesystemobject.UpdateallregeneratedlesystemobjectsinSI/MIhashtables. exist"error. 58
PAGE 59
59
PAGE 60
60
PAGE 61
62 ].TheNISTnetemulatorisdeployedasavirtualrouterinaVMwareVMwith256MBmemoryandrunningLinuxRedhat7.3.Redirectionisperformedtoashadowserverrunninginavirtualmachineintheclient'slocaldomain. 3-2 showthatROW-FSperformanceissuperiortoNFSv3inaWANscenario,whilecomparableinaLAN.IntheWANexperiment,recursivestatshowsnearlyvetimesimprovementover 61
PAGE 62
3-2 .Clearly,performanceforWANforROW-FSduringthesecondruniscomparablewithLANperformanceandmuchimprovedoverNFSv3.Thisisbecauseonceadirectoryisreplicatedattheshadowserver,subsequentcallsaredirectedtotheshadowserverbymeansofreaddirstatusag.TheinitialreaddiroverheadforROW-FS(especiallyinLANsetup)isduetothefactthatdummyleobjectsarebeingcreatedintheshadowserverduringtheexecution.Remove:Tomeasurethelatencyofremoveoperations,Ideletedalargenumberofles(greaterthan15000andtotaldatasize190MB).IobservedthatinROW-FS,sinceonlytheremovestateisbeingmaintainedratherthancompleteremovalofle,performanceisnearly80%betterthanthatofconventionalNFSv3.Ittakesnearly37minutesinROW-FSincomparisonto63minutesinNFS3todelete190MBofdataoverawideareanetwork.Notethateachexperimentisperformedwithcoldcaches,setupbyre-mountinglesystemsineverynewsession.Ifthelesystemisalreadyreplicatedinshadowserver,ittakes18minutes(WAN)todeletethecompletehierarchy. Table3-2. LANandWANexperimentsforlookup,readdirandrecursivestatmicro-benchmarks.ForROW-FS,eachbenchmarkisrunfortwoiterations:First,warmsupshadowserver.Second,toaccessmodicationslocally.NFSv3isexecutedonceasperformanceforsecondrunissimilartorstrun.InbothROW-FSandNFSv3,NFScachingisdisabled. Micro-benchmark LAN(seconds) WAN(seconds) ROW-FS NFSv3 ROW-FS NFSv3 1strun 2ndrun 1strun 2ndrun Lookup 0.018 0.008 0.011 0.089 0.018 0.108 Readdir 67 17 41 1127 17 1170 RecursiveStat 425 404 367 1434 367 1965 Remove 160 NA 230 2250 NA 3785 62
PAGE 63
3-3 summarizestheperformanceofAndrewbenchmarkandFigure 3-10 providesstatisticsfornumberofRPCcalls.TheimportantconclusiontakenfromthedatainFigure 3-10 isthatROW-FS,whileincreasingthetotalnumberofRPCcallsprocessedduringtheapplicationexecution,itreducesthenumberofRPCcallsthatcrossdomainstolessthanhalf.Notethattheincreaseinnumberofgetattrcallsisduetoinvocationofgetattrproceduretovirtualizereadcallstomainserver.Readcallsarevirtualizedwithshadowattributes(thecasewhenblocksarereadfromMainserver)becausetheclientisunawareoftheshadowserver;lesystemattributeslikelesystemstatisticsandleinodenumberhavetobeconsistentbetweenreadandpost-readgetattrcall.Nonetheless,sinceallgetattrcallsgotothelocal-areashadowserver,theoverheadofextragetattrcallsissmallcomparedtogetattrcallsoverWAN. Table3-3. AndrewBenchmarkandAM-Utilsexecutiontimesinlocal-andwide-areanetworks. Benchmark ROW-FS(sec) NFSv3(sec) Andrew(LAN) 13 10 Andrew(WAN) 78 308 AMUtils(LAN) 833 703 AMUtils(WAN) 986 2744 63
PAGE 64
63 ]isalsousedasanadditionalbenchmarktoevaluateperformance.Theautomounterbuildconsistsofcongurationteststodeterminerequiredfeaturesforbuild,thusgeneratinglargenumberoflookups,readandwritecalls.Thesecondstepinvolvescompilingofam-utilssoftwarepackage.Table 3-3 providesexperimentalresultsforLANandWAN.TheresultingaveragepingtimefortheNIST-emulatedWANis48.9msintheROW-FSexperimentand29.1msintheNFSv3experiment.Wide-areaperformanceofROW-FSforthisbenchmarkisagainbetterthanNFSv3,evenunderlargeraveragepinglatencies. Figure3-10. NumberofRPCcallsreceivedbyNFSserverinnon-virtualizedenvironment,andbyROW-FSshadowandmainserversduringAndrewbenchmarkexecution 64
PAGE 65
LinuxkernelcompilationexecutiontimesonaLANandWAN. Setup FS Oldcongtime(s) Deptime(s) BzImagetime(s) NFSv3 49 120 710 LAN ROW-FS 55 315 652 NFSv3 472 2648 4200 WAN ROW-FS 77 1590 780 "oldcong",make"dep"andmake"bzImage".Table4showsperformancereadingsforbothLANandWANenvironments.TheperformanceofLinuxkernelcompilationforROW-FSiscomparablewithNFSv3inLANenvironmentandshowssubstantialimprovementintheperformanceovertheemulatedWAN.Notethat,forWAN,kernelcompilationperformanceisnearlyvetimesbetterwiththeROWproxyincomparisonwithNFSv3.TheresultsshowninTable 3-4 donotaccountfortheoverheadinsynchronizingthemainserver.Nonetheless,asshowninFigure 3-10 ,amajorityofRPCcallsdonotrequireserverupdates(read,lookup,getattr);furthermore,manyRPCcalls(write,create,mkdir,rename)arealsoaggregateinstatistics-oftenthesamedataiswrittenagain,andmanytemporarylesaredeletedandneednotbecommitted.Faulttolerance:Finally,Itestedthecheck-pointingandrecoveryofacomputationalchemistryscienticapplication(Gaussian[ 64 ]).AVMwarevirtualmachinerunningGaussianischeckpointed(alongwithROW-FSstateintheVM'smemoryanddisk).Itisthenresumed,runsforaperiodoftime,andafaultisinjected.SomeGaussianexperimentstakemorethanonehourtonishandgeneratealargeamountoftemporarydata(hundredsofMBytes).WithROW-FS,Iobservethattheapplicationsuccessfullyresumesfromapreviouscheckpoint.WithNFSv3,inconsistenciesbetweentheclientcheckpointandtheserverstatecausedtheapplicationtocrash,preventingitssuccessfulcompletion. 65
PAGE 66
3-5 summarizestheperformanceofdisklessboottimeswithdierentproxycachecongurations.Theresultsshowwhatprecachingofattributesbeforeredirectionandpostredirectiondatacachingdeliverthebestperformance,reducingwide-areaboottimewith"warm"cachesbyover300%. Table3-5. WideareaexperimentalresultsfordisklessLinuxboot/secondbootfor(1)ROWproxyonly(2)ROWproxy+datacache(3)attribute+ROW+datacache WAN Boot(sec) 2ndBoot(sec) Client->ROW->Server 435 236 Client->ROW->DataCache->Server 495 109 Client->Attr.Cache->ROW->DataCache->Server 409 76 3-6 .Inthesecondpart,Itestedthesetupwith 66
PAGE 67
RemoteXenboot/rebootexperimentwithROWproxyandROWproxy+cache NISTNetDelay ROWProxy ROWProxy+CacheProxy Boot(sec) 2ndBoot(sec) Boot(sec) 2ndBoot(sec) 1ms 121 38 147 36 5ms 179 63 188 36 10ms 248 88 279 37 20ms 346 156 331 37 50ms 748 266 604 41 aggressiveclientsidecaching(Figure2(b)).Table 3-6 alsopresentstheboot/secondbootlatenciesforthisscenario.Fordelayssmallerthan10ms,theROW+CPsetuphasadditionaloverheadforXenboot(incomparisonwithROWsetup);however,fordelaysgreaterthan10ms,thebootperformancewithROW+CPsetupisbetterthanROWsetup.RebootexecutiontimeisalmostconstantwithROW+CPproxysetup.Clearly,theresultsshowmuchbetterperformanceofXensecondbootfortheROW+CPexperimentalsetup. 65 ].ThekeyadvantagesofROW-FSoverUnionFSarethattheformerisuser-levelandintegrateswithunmodiedNFSclients/servers,whilethelatterisakernel-levelapproachthatrequiressupportfromthekernel,andthattheformeroperateswithindividualledatablockswhilethelatteroperatesonwholeles.Thisisimportantinapplicationswhereunmodiedclientsaredeployedandapplicationsthataccesssparsedata;forexample,theprovisioningofVMimages.IhaveattemptedtocomparetheperformanceofUnionFSandROW-FSforXenvirtualmachineinstantiationacrosswide-area,butinstantiatingaXen3.0domUwithanimagestackedusingthelatestversionofUnionFSavailableatthetimeofwriting(Unionfs1.4)fails.UnionFScopy-on-writemechanismisbasedoncopy-upcompletetonewbranchonwriteinvocationwhereasROW-FSjustreplicatestheneededblock;hence,ROW-FShasaddedadvantageoverUnionFSfordiskimagesinstantiation(largecopy-upisexpensive).AdvantagesofUnionFSoverROW-FSincludepotentially 67
PAGE 68
38 ].Inthepast,researchershaveusedNFSshadowingtechniquetologusers'behavioronoldlesinaversioninglesystem[ 66 ].EmulationofNFSmounteddirectoryhierarchyisoftenusedasameansofcachingandperformanceimprovement[ 67 ].Koshaprovidesapeertopeerenhancementofnetworklesystemtoutilizeredundantstoragespace[ 51 ].Inthepast,levirtualizationwasaddressedthroughNFSmountedlesystemwithintheprivatenamespacesforgroupofprocesseswithmotivationtomigratetheprocessdomain[ 68 ].Stripednetworklesystemisimplementedtoincreasetheserverthroughputbystripinglebetweenmultipleservers[ 69 ].Thisapproachisprimarilyusedtoparallelyaccessleblocksfrommultipleservers,thusimprovingitsperformanceoverNFS.Acopy-on-writeleserverisdeployedtoshareimmutabletemplateimagesforoperatingsystemskernelsandle-systemsin[ 21 ].Theproxy-basedapproachpresentedinthisdissertationisuniqueinhowitnotonlyprovidescopy-on-writefunctionality,butalsoprovidesprovisionforinter-proxycomposition.Checkpointmechanismsareintegratedintolanguagespecicbyte-codevirtualmachineasmeansofsavingapplication'sstate[ 70 ].VMwareandXen3.0virtualmachineshaveprovisionoftakingcheckpoints(snapshots)andrevertingbacktothem.Thesesnapshots,however,donotsupportcheckpointsofchangesinamounteddistributedlesystem. 68
PAGE 69
69
PAGE 70
71 ].Theapproachisathinclientsolutionfordesktopgridcomputingbasedonvirtualmachineapplianceswhoseimagesarefetchedon-demandandonaper-blockbasisoverwide-areanetworks.Specically,Iaimatreducingdownloadtimesassociatedwithapplianceimages,andprovidingadecentralized,scalablemechanismtopublishanddiscoverupgradestoapplianceimages.Theapproachembodiesdierentcomponentsandtechnologies|virtualmachines,anoverlaynetwork,pre-bootexecutionenvironmentservices,andaredirect-on-writevirtuallesystem.VirtualMachinesovervirtualnetworksaredeployedwithpre-bootexecutionservertofacilitateremotenetworkbooting.TheapproachusesROW-FSthatenablestheuseofunmodiedNFSclients/serversandlocalbueringoflesystemmodicationsduringtheappliance'slifetime.Similarlytorelatedeorts,ourapproachtargetsapplicationsdeployedonnon-persistentvirtualcontainers[ 4 ][ 28 ]throughprovisioningofvirtualenvironmentswithrole-specicdiskimages[ 60 ].Thincomputingparadigmsoeradvantagessuchasloweradministrationcostandfailuremanagement.Inearlycomputingsystems,thin-clientcomputingwassuccessfulbecauseoftwomainreasons:low-costcommodityhardwarewasnotavailabletoenduser,andacentralizedapproachofcomputingisoftenpreferredduetoeasiersystemadministration.Aslow-costPCsandhigh-bandwidthlocal-areanetworksbecamewidelyavailable,thin-clientcomputinglostground.Theadventofvirtualmachineshaveopenedupnewopportunities;virtualmachinescanbeeasilycreated,congured,managedanddeployed.Thevirtualizationapproachofmultiplexingphysicalresourcesnotonly 70
PAGE 71
72 ].AnillustrativeexampleofanvirtualapplianceisaFedora9applianceofsize800-MBwithpre-conguredgraphicaluserinterfacepackages[ 72 ].Optimizingthesizeofanapplianceistime-consuming,andinmanycasesnotpossiblewithoutlossoffunctionality(e.g.byavoidinginstallationofcertainpackages).Nonetheless,itisoftenthecasethatatrun-timeonlyasmallfractionofthevirtualdiskisactually\touched"byanapplication.Iexploitthisbehaviorbybuildingonon-demanddatatransfersthatsubstantiallyreducethedownloadtimeandbandwidthrequirementsfortheenduser.Thefollowingsectionswillexplaintheoverallarchitectureandapproach. 4-1 .Theapproachisbasedondisklessprovisioningofvirtualmachineenvironmentsthroughavirtualmachineproxy.Theutilityoftheenvisionedarchitecturecanbeobservedfromtheviewpointofbothusersandsystemadministrators.UsersnotonlyhavefastandtransparentaccesstodierentO/SimagesbutalsohaveautomaticsupporttoupgradetheO/Simages.Foradministrators,itprovidesaframeworkforsimpledeploymentandmaintenanceofnewimages.AsshowninFigure 4-1 ,anenduserX1downloadsasmallproxyappliance(VM2)fromDownloadServerDS.Theproxyapplianceisconguredtoconnecttoavirtual 71
network overlay connecting it to other users (e.g., using IPOP [26][30]). An example NFS proxy appliance of size 350 MB can be downloaded from the VMware virtual appliance marketplace [72]. The proxy appliance is also configured to run a small FTP server and a DHCP server to download the network bootstrap program and allocate an IP address to the client's working environment. The actual appliances which carry out computation can be configured with a desired execution environment and need not be downloaded in their entirety by end users; they are brought in on demand through the proxy appliance. Each node is an independent computer which has its own IP address on a private network. Key to this architecture is the redirect-on-write file system (ROW-FS). As explained in Chapter 3, ROW-FS consists of user-level DFS extensions that support selective

Figure 4-1. O/S image management over wide-area desktops: User X1 downloads a small ROW-FS proxy-configured appliance (VM2) from download server DS. User X2 can potentially share the appliance image with User X1. The Image Server (IS) exports read-only images to clients over NFS. VM1 is a diskless client. The appliance bootstrap procedure is explained further in Figure 4-2.
Figure 4-2. The deployment of the ROW proxy to support PXE-based boot of a (diskless) non-persistent VM over a wide-area network.

Figure 4-2 expands on Figure 4-1 to show the diskless provisioning of virtual machines. In Figure 4-2, VM1 is a diskless virtual machine, and VM2 is a boot proxy appliance configured with two NIC cards for communication with the host-only and public networks. VM2 is configured to execute the ROW file system (ROW-FS) and NFS cache proxies. In addition, VM2 is configured to run DHCP and TFTP servers to provide the diskless client (VM1) with an IP address and an initial kernel image. Classic virtual machines such as VMware provide support for PXE-enabled BIOS and NICs; PXE is a technology to boot diskless computers using network interface cards. The server VM is configured to share a common directory through ROW-FS to clients. To illustrate the workings of the diskless setup, consider the following steps to boot a diskless VM with an appliance image served over a wide-area network:
1. The diskless VM (VM1) invokes a DHCP request for an IP address.
2. The DHCP request is routed through a host-only switch to the gateway VM (VM2).
3. VM2 is configured to have two NICs: host-only (private IP address) and public. VM2 receives the request at the host-only NIC (eth0).
4. The DHCP server allocates an IP address and sends a reply back to the diskless VM (VM1).
5. The diskless VM invokes a TFTP request to obtain the network bootstrap program and initial kernel image.
6. VM2 receives the TFTP request at the host-only NIC (eth0).
7. The kernel image is transferred to VM1 and loaded into RAM to kickstart the boot process.
8. The diskless VM invokes a mount request to mount a read-only directory from the server (VM3) through the proxy VM (VM2).
9. VM2 is configured to redirect write calls to a local server. Read-only NFS calls are routed through the proxy VM2 to VM3; the connection between VM2 and VM3 is through the virtual overlay network.

P2P networks are considered to be inherently self-configuring, scalable and robust to node or system failures. Each P2P node maintains a view of the network at regular intervals, which facilitates seamless addition or removal of a node from the system. As nodes are added into the network pool, bandwidth and CPU processing are distributed and shared among users; thus P2P systems are very scalable. Furthermore, P2P systems are configured to be tolerant to node failures. P2P overlay networks such as IPOP also facilitate firewall traversal without administrator intervention, which allows P2P nodes behind firewalls to join the network [30]. The process of publishing and sharing O/S images is well supported by these P2P properties. The primary goal of the architecture is to automate the process of publishing, discovering and mounting appliance images. Furthermore, it should be possible for images to be replicated (fully or partially) across multiple virtual servers throughout a virtual network for load balancing and fault tolerance. It is feasible to provide image versioning capability by maintaining the latest image state in a decentralized way using a Distributed Hash Table (DHT), which, in the case of the IPOP virtual network [30], is already responsible for providing DHCP addresses. DHTs provide two simple primitives: put(key, value) and get(key). In order to use the DHT to track appliance image versions, the key functionality needed can be broken down into a diskless client and a publisher client.
1. The diskless client downloads the boot proxy appliance machine from the download server (VM2 in Figure 4-1). The client bootstraps this generic appliance, which is configured to forward the client's requests to the Image Server (IS). The downloaded proxy machine is configured to connect with the network of appliances through the IPOP P2P network.
2. The diskless client queries the DHT for the version of appliance A. An illustrative example of an appliance name could be a "Redhat" appliance.
3. The diskless client queries the DHT to obtain the image server IP address and mount path for appliance A.
4. Start the ROW-FS proxies using the image server IP address. The startup of the ROW-FS proxies sets up an access control list and a session directory to allow call forwarding to the image server and the local NFS server.
5. Bootstrap a diskless client virtual machine and establish a mount session with the image server. Virtual machine APIs are leveraged to bootstrap the diskless client.
6. The diskless client performs its experiments during the established ROW-FS session between the diskless client and the image server.
7. Halt the booted diskless client.
8. Kill the ROW-FS proxies. The boot proxy appliance machine contains the client's session data and experimental run results.

Figure 4-3 and Figure 4-4 illustrate the algorithms for a diskless client to bootstrap an appliance and for a publisher client to publish the O/S image. Unused VMs can be removed from the system at regular intervals. When the number of clients accessing the VM image is zero and the image is not the latest version, the corresponding DHT entry expires with a timeout.
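The DHT interactions in steps 2 and 3 above reduce to put/get operations keyed on the appliance name. The following is a minimal C sketch of that exchange; the dht_put()/dht_get() wrappers and the key naming scheme are assumptions of mine for illustration, not the IPOP DHT API or the dissertation's implementation.

    #include <stdio.h>

    /* Hypothetical wrappers around the overlay's DHT primitives. */
    extern int dht_put(const char *key, const char *value);
    extern int dht_get(const char *key, char *value, size_t len);

    /* Publisher side: advertise the latest version and its location. */
    static int publish_appliance(const char *name, int version,
                                 const char *server_ip, const char *mount_path)
    {
        char key[128], value[256];
        snprintf(key, sizeof key, "appliance:%s:version", name);
        snprintf(value, sizeof value, "%d", version);
        if (dht_put(key, value) != 0)
            return -1;
        snprintf(key, sizeof key, "appliance:%s:%d:location", name, version);
        snprintf(value, sizeof value, "%s:%s", server_ip, mount_path);
        return dht_put(key, value);
    }

    /* Diskless client side (steps 2-3): discover version, then location. */
    static int discover_appliance(const char *name, char *location, size_t len)
    {
        char key[128], version[32];
        snprintf(key, sizeof key, "appliance:%s:version", name);
        if (dht_get(key, version, sizeof version) != 0)
            return -1;                      /* no published version found */
        snprintf(key, sizeof key, "appliance:%s:%s:location", name, version);
        return dht_get(key, location, len); /* "IP:mount_path" used to start ROW-FS proxies */
    }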
Figure 4-3. Algorithm to bootstrap a VM session.

Figure 4-4. Algorithm to publish a virtual machine image.
[58]; it is also conceivable to integrate the logic to configure, create and tear down ROW-FS sessions with application workflow schedulers such as [73]. Further, failure transparency is an important property of distributed systems. During the time a ROW-FS file system session is mounted, all modifications are redirected to the shadow server. A question that arises is how file system modifications in the shadow server should be reconciled with data in the main server at the end of a session. Three scenarios can be considered for consistency support in the context of a redirect-on-write file system (a sketch of the reconciliation loop for the second scenario follows the list):
1. There are applications in which it is neither needed nor desirable for data in the shadow server to be reconciled with the main server; an example is the provisioning of system images for diskless clients or virtual machines, where local modifications made by individual VM instances or diskless machines are not persistent.
2. For applications in which it is desirable to reconcile data with the server, the ROW-FS proxy holds state in its primary data structures (the file handle hash table and the block bitmaps) that can be used to commit modifications back to the server. The approach is to remount the file system at the end of a ROW-FS session in read/write mode, and signal the ROW-FS proxy to traverse its file handle hash tables and bitmaps to commit changes (moves, removes, renames, etc.) to directories
3. One particular use case of ROW-FS is autonomic provisioning of O/S disk images shared between multiple clients. In this context, I leverage APIs exported by lookup services (such as a Distributed Hash Table) in distributed frameworks (such as IPOP [26]) to store clients' usage and sharing information. This approach is based on multiple clients converging over time to use the latest appliance image.
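As a rough illustration of the reconciliation described in the second scenario, the following C sketch walks hypothetical ROW-FS bookkeeping structures and replays locally buffered blocks and namespace operations against the main server. The structure and helper names (row_entry, shadow_read_block, and so on) are assumptions for illustration only, not the actual ROW-FS code.

    /* Hypothetical ROW-FS bookkeeping: one entry per shadowed file handle. */
    struct row_entry {
        struct row_entry *next;
        void *main_fh;            /* file handle on the main server        */
        void *shadow_fh;          /* file handle on the shadow server      */
        unsigned char *bitmap;    /* one bit per block modified locally    */
        unsigned int nblocks;
        int renamed, removed;     /* namespace changes recorded locally    */
    };

    /* Assumed helpers (not real ROW-FS calls). */
    extern int shadow_read_block(void *fh, unsigned int blk, void *buf);
    extern int main_write_block(void *fh, unsigned int blk, const void *buf);
    extern int main_replay_namespace(const struct row_entry *e);

    /* Commit locally buffered modifications back to the main server after
     * the file system has been remounted in read/write mode. */
    int rowfs_commit(struct row_entry *table)
    {
        char buf[8192];           /* one block (8 KB assumed)              */
        for (struct row_entry *e = table; e; e = e->next) {
            if (e->renamed || e->removed)
                if (main_replay_namespace(e) != 0)
                    return -1;
            for (unsigned int b = 0; b < e->nblocks; b++) {
                if (!(e->bitmap[b / 8] & (1 << (b % 8))))
                    continue;     /* block never written during the session */
                if (shadow_read_block(e->shadow_fh, b, buf) != 0 ||
                    main_write_block(e->main_fh, b, buf) != 0)
                    return -1;
            }
        }
        return 0;
    }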
[74]. [16]). Each directory or file in ROW-FS is uniquely addressed by a persistent file handle. It is feasible to provide replica support for wide-area desktops in the ROW-FS proxy, as file handles in a replicated VM are consistent with the primary VM.
Figure 4-5. Replication approach for ROW-FS. File handles of the server replica and the read-only server are equivalent. A timeout of the connection to the read-only server can result in switchover to the replica server.

The ROW-FS proxy can be configured to provide support for a read-only server replica. Figure 4-5 shows the feasibility of a virtual machine-based replication mechanism. The ROW-FS proxy is configured to forward calls to both the read-only and the replica server. Each replica is a cloned virtual machine which exports the root file system of the grid appliance to the client. An example scenario is shown in the figure. If read-only server r0 goes down for some reason, the ROW-FS proxy can be configured to switch over to a replica server (r1) after a no-response time, Tout. The shadow state of the ROW-FS proxy, which includes the file handle mapping between the shadow server (s0) and the read-only server (r0) and the bitmap data structure, is still valid with the new read-only server replica (r1). This is feasible because the shadow server state depends only on the file handles of the s0 and r0 servers. Servers r0 and r1 have identical file handles, which facilitates a seamless transition of NFS calls to r1.
[74]:
1. Confidentiality: Publishers should be able to publish O/S images for their intended users.
2. Integrity: The integrity of a publisher's claim should be maintained. No other user should be able to modify the publisher's claim.

To address these security properties, I consider the following security mechanisms:
1. Encryption: A publisher's claim for an O/S image needs to be encrypted to avoid any interception or fabrication by a rogue user.
2. Authentication: The publisher of an image must be able to prove its identity. To establish the publisher's identity, a public key cryptography scheme can be applied. The public key of each user in the P2P network can be advertised, and any claim by a user is encrypted with its own private key.

While authentication and encryption help in sending data securely, it is important to model trust between the P2P users. Trust models are a way to validate client X's claim to be "User X". Various trust models have been used to establish trust in distributed systems. For example, a "web of trust" approach is a commonly used email scheme to send private emails to end users [74]. The approach relies on each user maintaining a list of trusted public keys. I suggest a public key infrastructure as the trust model for the O/S management framework. A public key infrastructure is a collection of certificate authorities and certificates assigned to users. Certificates are a common cryptographic technique used in e-commerce applications. A certificate is a digital signature which helps in maintaining identification, authorization and data confidentiality for the user. A digital signature is a form of asymmetric cryptography used to securely send messages between users. While asymmetric key cryptography securely transfers the data, the question of establishing trust between end users persists. For the purpose of this dissertation, I assume that there is a trusted certificate authority.
Figure 4-6 illustrates the security mechanism to authenticate and encrypt the client's and publisher's data. Here, the assumption is that the public key of the certificate authority is built into the P2P overlay network. To validate each client's identity, the certificate authority encrypts the client's identification (ID_C) and public key (K_C^+), i.e., the certificate, with its private key (K_CA^-) and distributes it over the P2P network.

Figure 4-6. Diskless client and publisher client security. C: Client, P: Publisher and CA: Certificate Authority.

The following equations illustrate the encryption of appliance (A) information by a publisher and its decryption by a client, where K_P^- and K_P^+ denote the publisher's private and public keys and V_i the appliance version:

Publisher encryption:
(A, V_i) => K_P^-(A, V_i)
(A, V_i, IP, MountPath) => K_P^-(A, V_i, IP, MountPath)

Client decryption:
K_P^+(K_P^-(A, V_i)) => (A, V_i)
K_P^+(K_P^-(A, V_i, IP, MountPath)) => (A, V_i, IP, MountPath)
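A minimal C sketch of this publish/verify exchange is shown below. All functions and types (pk_sign, pk_verify, struct claim) are hypothetical placeholders for whatever asymmetric-cryptography library is used; the sketch only mirrors the equations above, with the publisher signing the (A, V_i, IP, MountPath) tuple under its private key K_P^- and the client checking it with the advertised public key K_P^+.

    #include <stddef.h>

    struct claim { char appliance[64]; int version; char ip[16]; char mount_path[128]; };

    /* Hypothetical asymmetric-crypto primitives (placeholders, not a real API). */
    extern int pk_sign(const void *priv_key, const void *msg, size_t len,
                       void *sig, size_t *sig_len);
    extern int pk_verify(const void *pub_key, const void *msg, size_t len,
                         const void *sig, size_t sig_len);

    /* Publisher: produce K_P^-(A, V_i, IP, MountPath). */
    int publish_claim(const void *publisher_priv, const struct claim *c,
                      void *sig, size_t *sig_len)
    {
        return pk_sign(publisher_priv, c, sizeof *c, sig, sig_len);
    }

    /* Client: apply K_P^+ to check the claim retrieved from the DHT. */
    int verify_claim(const void *publisher_pub, const struct claim *c,
                     const void *sig, size_t sig_len)
    {
        return pk_verify(publisher_pub, c, sizeof *c, sig, sig_len);
    }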
4-2). In this experiment, virtual machines VM1, VM2 and VM3 are deployed in a host-only network using VMware's ESX Server VM monitor. Note that this experiment only reflects VM resource statistics (not application execution time). The experimental setup is as follows: server VM3 is configured with 2 GB RAM and a single virtual CPU. VM2 and VM1 are configured with 1 GB RAM and also a single VCPU. The VMs are hosted by a dual Xeon 3.2 GHz processor, 4 GB memory server. The size of the appliance image is 934 MB. Figure 4-7 shows time-series plots with CPU, disk and network rates for three different intervals. These values are obtained in 20-second intervals leveraging VMware ESX's internal monitoring capabilities. In the first interval, the VM is booted. In the second interval, the VM runs a CPU-intensive application (the computer architecture simulator SimpleScalar) which models the target workload of a typical voluntary computing execution. In the third phase, the appliance is rebooted.
Figure 4-7. Proxy VM usage time series for CPU, disk and network. Results are sampled every 20 seconds and reflect data measured at the VM monitor. Three phases are shown (marked by vertical lines): appliance boot, execution of a CPU-intensive application (SimpleScalar), and appliance reboot.
[75]. The figure shows high data write rates during VM boot execution. Clearly, I can observe a maximum of 12% CPU consumption and also a high data rate across the network to load the initial kernel image into the diskless client's (VM1) memory. Application execution: Since the application is CPU intensive, the proxy VM exhibits little run-time overhead in this phase. This is because once the diskless VM (VM1) is booted, it loads the necessary files for the application execution into RAM (as shown by the initial network activity). I can further observe that disk and network usage is negligible in VM2 during the execution of the SimpleScalar application, thus supporting my assumption of minimal overhead for the proxy-configured VM. VM reboot: During reboot, the client has replicated session state at the shadow server. I see an average boot time reduction (described below). I observe further spikes in network and CPU usage as some files are fetched and read from the server VM. In past results, I have shown that aggressive caching can further improve boot performance [75].

Figure 4-8 provides statistics for the number of RPC calls during the boot-up of the diskless appliance VM1. The histogram is broken down by the different types of RPC calls corresponding to NFS protocol calls, from left to right: get and set file attributes, file handle lookup, read links, read block, write block, create file, rename file, make directory, make symbolic link, and read directory. The important conclusion taken from the data in Figure 4-8 is that ROW-FS, while increasing the total number of calls routed to the local shadow server, reduces the number of RPC calls that cross domains. Note that the increase in the number of get attribute (getattr) calls is due to the invocation of the getattr procedure to virtualize read calls to the main
Figure 4-8. RPC statistics for diskless boot. The shadow server receives the majority of RPC calls. The bars represent the number of RPC calls received by the shadow and main servers.

[30] virtual network, and the server and client VMs are behind NATs. The proxy VM2 is
Table 4-1 summarizes the results from this experiment. Notice that the boot times are reduced to less than half, becoming comparable to the LAN PXE/NFS boot time of approximately 2 minutes.

Table 4-1. Appliance boot/reboot times over WAN. ISP is a VM behind a residential ISP provider; UFL is a desktop machine at the University of Florida; VIMS is a server machine at the Virginia Institute of Marine Sciences.
  VM1/VM2, VM3   Boot (seconds)   2nd boot (seconds)   Ping latency
  ISP, UFL       291              116                  23 ms
  UFL, VIMS      351              162                  68 ms

Figure 4-9 shows the cumulative distribution of 100 DHT accesses through 10 IPOP clients. The clients are randomly chosen to query the DHT. The distribution shows that in most cases it takes less than 2 seconds to query the DHT and obtain the appliance status information. The average time to insert an appliance version, with the appliance name as key, over 10 iterations is 1.4 sec. Table 4-2 provides the mean and variance statistics for five clients. The mean and variance statistics show that client access times to the DHT vary across the P2P nodes deployed on PlanetLab. The access time often depends on the route path taken to access the DHT information.

Table 4-2. Mean and variance of DHT access time (seconds) for five clients
  Statistics   Client 1   Client 2   Client 3   Client 4   Client 5
  Mean         0.567      0.648      2.699      0.6875     2.224
  Variance     0.00254    0.0389     4.731      0.08512    1.337

[76] provides mechanisms to make read-only data scalable over the wide-area network through cooperative caching and NFS proxies; my approach
Figure 4-9. Cumulative distribution of DHT query times through 10 IPOP clients (in seconds).

complements it by enabling redirect-on-write capabilities, which is a requirement to support the target application environment of NFS-mounted diskless VM clients. SFS advocated the approach of a read-only file system for untrusted clients. There are upcoming commercial products which either provide thin-client solutions based on the pre-boot execution environment [77][78] or provide a cache-based solution as a viable thin-client approach for scalable computing [79]. A distributed computing approach based on stackable virtual machines and sandboxes is advocated in [59]. A stackable storage-based framework is also used to automate cluster management as a means to reduce administrative complexity and cost [60]. The approach advocated in [60] is re-provisioning of the application environment (base OS, servers, libraries and application) through role-specific (read or write) disk images. A framework to manage clusters of virtual machines is proposed in [80]. The Stork package management tool provides mechanisms to share files such as libraries and binaries between virtual machines [81]. A copy-on-write file server is deployed to share immutable template images for operating system kernels and file systems in [21]. This approach uses a combination of traditional NFS for read-only mounts and AFS for aggressive caching of shared images
[21]. File systems based on write-once semantics are commonly leveraged on commodity disk images for applications such as map-reduce [82].
[1][71], a common deployment scenario of ROW-FS is when the virtual machine hosting the shadow server and the client virtual machine are consolidated into a single physical machine. Such a scenario is common when the client VM is diskless or disk space on the client VM is a constraint. While deploying ROW proxies in such cases provides much-needed functionality, the overhead associated with virtualized network I/O is often considered a bottleneck [9][10]. While the virtualization cost depends heavily on workloads, it has been demonstrated that the overhead is much higher with I/O-intensive workloads compared to those which are compute-intensive [10]. Unfortunately, the architectural reasons behind the I/O performance overheads are not well understood. Early research in characterizing these penalties has shown that cache misses and TLB-related overheads contribute most of the I/O virtualization cost [10][83][84]. While most of these evaluations were done using measurements, in this chapter I discuss an execution-driven simulation-based analysis methodology with symbol annotation as a means of evaluating the performance of virtualized workloads. This methodology provides detailed information at the architectural level (with a focus on cache and TLB) and allows designers to evaluate potential hardware enhancements to reduce virtualization overhead. This methodology is applied to study the network I/O performance of Xen (as a case study) in a full-system simulation environment, using detailed cache and TLB models to profile and characterize software and hardware hotspots. By applying symbol annotation to the instruction flow reported by the execution-driven simulator, I derive function-level call flow information. I follow the anatomy of
[20]), using the SoftSDV [85] execution-driven simulator extended with symbol annotation support and a network I/O workload (iperf).
[10][83]. The rest of this chapter is organized as follows. The motivation behind the current work is described in Section 5.2. Section 5.3 describes the simulation methodology, tools and symbol annotations. Section 5.4 details the software and architectural anatomy of I/O processing by following the execution path through the guest domain, the hypervisor and the I/O VM domain. I also provide initial results on resource scaling in Section 5.5. Section 5.6 describes related work.

[13][14][15]. A simulation-based methodology for virtual environments is also important to guide the design and tuning of architectures for virtualized workloads, and to help software systems developers identify and mitigate sources of overhead in their code. A driving application for simulation-driven analysis is I/O workloads. It is important to minimize the performance overheads of I/O virtualization in order to enable efficient workload consolidation. For example, in a typical three-tier datacenter environment, Web
[17]. Enabling a low-latency, high-bandwidth inter-domain communication mechanism between VM domains is one of the key architectural elements which could push this distributed services architecture evolution forward.

[86][13]; I use the SoftSDV simulator [85] as the basis for the experiments. SoftSDV not only supports fast emulation with dynamic binary translation, but also allows proxy I/O devices to connect a simulation run with physical hardware devices. It also supports multiple sessions being connected and synchronized through a virtual SoftSDV network. For cache and TLB modeling I integrated CASPER [87], a functional simulator which offers a rich set of performance metrics and protocols to determine cache hierarchy statistics.
5-1, (A)). The guest domain's front-end driver communicates with back-end drivers through IPC calls. The virtual and back-end driver interfaces are connected by an I/O channel. This I/O channel implements a zero-copy page remapping mechanism for transferring packets between multiple domains. I describe the I/O VM architecture along with the life-of-packet analysis in Section 5.4.

Figure 5-1. Full system simulation environment with Xen execution includes (A) Xen virtual environment, (B) SoftSDV simulator, (C) physical machine.
Figure 5-2 summarizes the profiling methodology and the tools used. The following sections describe the individual steps in detail; these include (1) virtualization workload, (2) full system simulation, (3) instruction trace, (4) performance simulation with detailed cache and TLB simulation, and (5) symbol annotation.

5-1). In order to analyze a network-intensive I/O workload, the iperf benchmark application is executed in DomU. This environment allows us to tap into the instruction flow to study the execution flow and to plug in detailed performance models to characterize architectural overheads. The DomU guest uses a front-end driver to communicate with a back-end driver inside Dom0, which controls the I/O devices. I synchronized two separate simulation sessions to create a virtual networked environment for I/O evaluation. The execution-driven simulation environment combines functional and performance models of the platform. For this study, I chose to abstract the processor performance model and focus on cache and TLB models to enable coverage of a long period in the workload (approximately 1.2 billion instructions).
Figure 5-2. Execution-driven simulation and symbol-annotated profiling methodology. The full system simulator operates either in functional mode or performance mode. The instruction trace and hardware events are parsed and correlated with symbols to obtain an annotated instruction trace.
5.4. An example execution flow after symbol annotation is given in Figure 5-3. These decoded instructions from the functional model are then provided to the performance model, which simulates the architectural resources and timing for the instructions executed (Figure 5-4).
Figure 5-3. Symbol annotation. Compile-time Xen symbols are collected from the hypervisor, driver and application and annotated. The figure shows an example where symbols are annotated with "kernel" and "hypervisor".

Figure 5-4. Function-level performance statistics. The figure illustrates how performance statistics are coupled with the instruction call graph for each function. Sample statistics for the L1/L2 caches and the instruction and data TLBs are shown.
As shown in Figure 5-5, an instruction parser is used to parse different instruction events such as INT (interrupts, system calls), MOV CR3 (address space switch), and CALL (function call). These traces were dumped into a file with run-time virtual address information, as well as cache and TLB statistics. Instruction traces are parsed and mapped with symbol dumps to create an I/O call graph. SoftSDV system call (SSC) utilities facilitate the transfer of data between the host and the simulated guest. A performance simulation model is used to collect instruction traces along with hardware events of the virtualized workload. These utilities are important, as I gathered run-time symbols of the kernels and application from the proc kernel data structure to transfer to the host system (for example, /proc/kallsyms for kernel symbols). For iperf run-time symbols, we mapped the process ID with the corresponding process ID in the proc directory. These run-time symbols,
Figure 5-5. SoftSDV CPU controller execution mode: performance or functional. In functional mode, the SoftSDV simulator provides an instruction trace. In performance mode, the instruction trace is parsed to obtain hardware events such as cache and TLB misses. Compile-time symbols from the kernel, drivers and application, along with run-time symbols from the proc file system, are collected to obtain per-function event statistics.

Symbols are annotated to keep track of the source of a function call invocation. Note that there can be duplicate symbols when the collected symbols are merged into a single file. These duplicates are removed and the collected data is formatted in a useful way. In some cases, it is necessary to manually resolve ambiguities in virtual address spaces through a checkpoint at a virtual address during a re-run of a simulated SoftSDV session. Linux utilities such as nm and objdump are often used to collect symbols from compile-time symbol tables. In general, any application can be compiled to provide symbol table information. In C++ applications (such as iperf), function name mangling in object code is used to provide distinct names for functions that share the same name. Essentially, it adds an encoding prefix and suffix to the function name. I used the demangle option of the nm utility to identify the correct functions for the iperf application.
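To make the annotation step concrete, the following is a minimal C sketch (an assumption of mine, not the dissertation's tool) that loads nm/kallsyms-style "address name" pairs and maps a traced instruction address to the nearest preceding symbol with a binary search.

    #include <stdio.h>
    #include <stdlib.h>

    struct sym { unsigned long addr; char name[64]; };

    static int cmp_sym(const void *a, const void *b)
    {
        const struct sym *x = a, *y = b;
        return (x->addr > y->addr) - (x->addr < y->addr);
    }

    /* Load "address name" pairs (e.g., nm or /proc/kallsyms output with the
     * type column removed) and sort them by address. */
    static size_t load_syms(const char *path, struct sym *tab, size_t max)
    {
        FILE *f = fopen(path, "r");
        size_t n = 0;
        if (!f) return 0;
        while (n < max && fscanf(f, "%lx %63s", &tab[n].addr, tab[n].name) == 2)
            n++;
        fclose(f);
        qsort(tab, n, sizeof *tab, cmp_sym);
        return n;
    }

    /* Annotate one traced address with the nearest preceding symbol. */
    static const char *annotate(unsigned long pc, const struct sym *tab, size_t n)
    {
        size_t lo = 0, hi = n;
        while (lo < hi) {                 /* find last entry with addr <= pc */
            size_t mid = (lo + hi) / 2;
            if (tab[mid].addr <= pc) lo = mid + 1; else hi = mid;
        }
        return lo ? tab[lo - 1].name : "unknown";
    }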
Figure 5-5 shows the simulation framework implementation used to obtain call graph information and perform cache scaling studies. As illustrated in Figure 5-5, the CPU controller layer in SoftSDV integrates with a performance or functional model. The platform configuration for this study is set to a single processor with two levels of cache (32 KB first-level data and instruction caches, 2 MB L2 cache) and with 64-entry instruction and data TLBs. The experimental setup involved multiple SoftSDV sessions connected over a virtual network. I chose to run the iperf application to study the life of an I/O packet, as it is a representative benchmark to measure and study network characteristics. The iperf client is executed to initiate packet transmissions from a Xen environment.

Figure 5-6 shows an overview of the different stages which characterize the life of a packet between VM domains. Typically, a network packet in the Xen environment goes through the following four stages in its lifecycle after the application execution:
1. Unprivileged domain - packet build and memory allocation
2. Page transfer mechanism - a zero-copy mechanism to map pages in the virtual address space of the Dom0/DomU domains
3. Timer interrupts - context switch between the hypervisor and domains
4. Privileged domain - forwarding the I/O packet down the wire and sending an acknowledgment back to the guest domain.

Figure 5-6. Life of an I/O packet: (a) application execution, (b) unprivileged domain, (c) grant table mechanism - switch to hypervisor, (d) timer interrupt, (e) privileged domain.

[27]. An interface in Xen to allocate a socket buffer in the networking layer (alloc_skb_from_cache) is identified. The front-end driver uses the grant table mechanism provided by the hypervisor to transfer the buffer to Dom-0. The functions and the associated instruction counts for the overall life of the packet in DomU include socket lock, copy data from user space to kernel space, allocate page from free list, and release socket lock (Figure 5-7). Note that the instruction count statistics are shown in chronological order with function entry points as markers. I removed some repeating
Figure 5-7. Dom-U call graph: socket allocation (alloc_skb_from_cache), user-kernel data copy (copy_from_user) and finally TCP transmit write (tcp_write_xmit).
Figure 5-8 demonstrates the execution flow from DomU to the hypervisor through the grant table mechanism.

Figure 5-8. TCP transmit (tcp_transmit_skb) and grant table invocation (gnttab_claim_grant_reference).

do_upcall to start processing the event. The functions invoked during the timer interrupt are shown in Figure 5-9.
Figure 5-9. Annotated call graph showing the context switch between the hypervisor and the Dom-0 VM - timer interrupts (write_ptbase).

The life of the packet in Dom-0 is shown in Figure 5-10 (since the complete execution at this stage is long, snippets of execution covering the basic flow and highlighting the important functions are shown). Note that the grant table mechanism is used to map guest pages into the Dom0 address domain on the back-end receiving side. Then the packet is sent to the bridge code, after which it is sent out on the wire. Once complete, the host mapping is destroyed and an event is sent on the event channel to the guest domain. It is interesting to note that the processor TLB is flushed while destroying the grant. It is done by writing the CR3 register (the x86 page table pointer) through the write_cr3 function. I describe the impact of this TLB flush in Section 5.4.2.
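The reason a CR3 write flushes the TLB can be made concrete with a small sketch. On x86, reloading CR3 invalidates all non-global TLB entries, which is exactly the side effect incurred when the grant mapping is torn down; the snippet below is a generic illustration of the privileged MOV-to-CR3 that a function such as write_cr3 ultimately issues, not Xen's own code.

    #include <stdint.h>

    /* Illustrative only: reloading CR3 with the page-directory base address
     * invalidates all non-global TLB entries on x86. Global (G-bit) entries,
     * e.g. persistent hypervisor mappings, survive the reload. Must execute
     * at privilege level 0. */
    static inline void reload_cr3(uintptr_t pgd_phys)
    {
        __asm__ __volatile__("mov %0, %%cr3" : : "r"(pgd_phys) : "memory");
    }

    /* Revoking a grant therefore costs more than the unmap itself: every
     * subsequent memory access in the affected address space pays a TLB
     * refill (a page-table walk) until the working set is re-cached. */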
Figure 5-10. Life of a packet in Dom-0: accessing the granted page (create_grant_host_mapping), Ethernet transmission (e100_tx_clean), destroying the grant mapping (destroy_grant_host_mapping) and event notification back to the hypervisor (evtchn_send).
Figure 5-11 shows an execution snippet where TLB flushes and misses are plotted as a function of simulated instructions retired. The figure shows that there is a high correlation between the TLB misses, context switches and TLB flush events. An execution run of a VM during a period with no context switches or TLB flushes results in negligible TLB misses. Whenever TLB flushing events happen, there is a surge of TLB misses. This correlates well with the observations of TLB miss overhead in earlier studies. Figure 5-12 shows the increased number of TLB misses associated with the VM switches in a cumulative graph. I observe that there is a surge of TLB misses associated with each VM switch. Execution segments without VM switches show flat areas with few TLB flushes. Figure 5-13 depicts a typical VM switch scenario. The execution moves from one VM to another through a context switch. The CR3 value is changed to point to the new VM context. This triggers the hardware to flush all the TLB entries to avoid invalid translations. But this comes with the cost of a TLB miss every time a new page is touched, both for code and data pages. Another scenario is the explicit TLB flushes done by the Xen hypervisor as part of the data transfer between VMs. This is an artifact of the current I/O VM implementation, as explained in the previous section. In order to revoke a grant, a complete TLB flush
Figure 5-11. Impact of TLB flush and context switch. The x-axis shows a slice of the total number of instructions retired during an execution run of the iperf application. The y-axis shows instruction and data TLB miss events, normalized to TLB flushes and context switches.

Figure 5-12. Correlation between VM switching and TLB misses. The x-axis shows a segment of the total number of instructions retired. The y-axis (left) represents VM switching, where "1" indicates a VM switch. The y-axis (right) shows cumulative TLB misses.
is executed explicitly, which also creates TLB performance issues similar to a VM switch. Figure 5-14 demonstrates the code flow and the TLB impact. Figure 5-15 shows the impact of context switches on cache performance. The vertical lines mark VM switch events obtained through symbol annotation, and the plotted line shows the cumulative cache miss events. Note that the cache miss rate increases are also correlated with VM switch events.

Figure 5-13. TLB misses after a VM context switch. Instruction and data TLB misses are plotted on the y-axis against a segment of instructions retired on the x-axis. The context switch between virtual machines causes a TLB flush which increases the number of TLB misses.

[88]. TLB and cache statistics are measured for the transfer of approximately 25 million TCP/IP packets.
Figure 5-14. TLB misses after a grant destroy. The x-axis shows a segment of instructions retired and the y-axis represents data and instruction TLB misses.

Figure 5-15. Impact of VM switch on cache misses. The x-axis shows a segment of instructions retired. The y-axis (left) represents VM context switches through the vertical lines between 0 and 1. The context switch between virtual machines causes a TLB flush which increases the number of L2 cache misses (y-axis (right)).
Figure 5-16. L2 cache performance when the L2 cache size is scaled from 2 MB to 32 MB. In the plot, the L2 cache miss ratio is normalized to an L2 cache size of 2 MB. The data points are collected when the iperf client is executed in a guest VM (transmit of I/O packets).

Figure 5-17. Data and instruction TLB performance when the TLB size is scaled between 64 and 1024 entries. The TLB miss ratio is normalized to a TLB size of 64 entries. The data points are collected when the iperf client is executed in a guest VM (transmit of I/O packets).
Figure 5-16 shows the effect of scaling the L2 cache. The performance model is configured to simulate a two-level cache: 32 KB L1 (split data and instruction) and a 2 MB unified L2 cache. The primary goal is to understand the cache sensitivity of the I/O virtualization architecture in the context of network I/O. Note that increasing the L2 cache size up to 4 MB provided good performance scaling, after which the increase in performance was minimal. When increasing the cache size beyond 8 MB, the rate of reduction in miss rates is small. I can attribute the reduced miss rates of the 8 MB cache to the inclusion of the needed pages from the hypervisor, Dom0 and DomU.

Figure 5-18. L2 cache performance when the L2 cache size is scaled from 2 MB to 32 MB. The L2 cache miss ratio is normalized to an L2 cache size of 2 MB. The data points are collected when the iperf server is running in a guest VM (receive of I/O packets).

Figure 5-17 shows the TLB performance scaling impact for data and instruction TLBs. As shown in the figure, with an increase in the size of the data TLB, the miss ratio decreases for sizes up to 128 entries. For larger sizes, the miss ratio is nearly constant. The ITLB miss rate decreases slightly, while the DTLB rate shows a sharper decrease from 64 to 128 entries. It can be inferred that a TLB size of 128 entries is sufficient to incorporate all address translations during the TLB stage. Increasing the TLB size is not a very effective enhancement in this scenario. This is because, as observed in Figures
5-12 and 5-13, there are substantial numbers of TLB flushes during grant revocation and VM switches, which invalidate all TLB entries. A large TLB size does not help mitigate the effect of the compulsory TLB misses that follow a flush. Similarly, the cache and TLB scaling studies were performed on the receive side. Results are given in Figures 5-18 and 5-19, respectively.

Figure 5-19. Data and instruction TLB performance when the TLB size is scaled between 64 and 1024 entries. The TLB miss ratio is normalized to a TLB size of 64 entries. The data points are collected when the iperf server is running in a guest VM (receive of I/O packets).

[89][20]. Performance monitoring tools have been deployed to gauge application performance in virtualized environments [9][83][10]. Traditional network optimizations such as TCP/IP checksum offload and TCP segmentation offload are being used to improve the network performance of Xen-based virtual machines [10]. In addition, a faster I/O channel for transferring network packets between guest and driver domains is being studied [10]. These studies lack micro-architectural overhead analysis of the virtualized environment.
[90][36].
[91]. In multi-core processors, a change to a virtual address translation stored in the page table entries of one processor needs to be propagated to the TLBs of all processors. Many architectures resort to flushing the entire contents of a remote TLB to enforce such coherence, in a process often called "TLB shootdown". Consider an example of network I/O communication in a virtualized environment. The grant table mechanism adopted by the Xen VMM is based on modifying the access protection bits of a page table entry shared between the guest and privileged domains. Therefore, network I/O communication between guest and privileged domains may result in TLB shootdowns. The problem with the shootdown approach is that it works at a coarse coherence granularity by invalidating all entries of a TLB. Because not all TLB entries must be invalidated to enforce consistency (only those that are affected by the protection changes), this coarse-grained approach to enforcing coherence can result in the
6.2. I introduce an overview of the interprocessor interrupt mechanism used by Linux on x86-based processors to implement TLB shootdowns in Section 6.3. Section 6.4 explains the page sharing mechanism in the Xen hypervisor. Section 6.5 provides details of experiments to measure I/O overhead, evaluate hardware support to tag hypervisor pages, and evaluate the potential for selective flushing in interprocessor interrupts. Section 6.6 describes the related work.

6.2.1 Introduction

The translation look-aside buffer (TLB) is an on-chip cache to expedite virtual-to-physical address translation. In the absence of a TLB, the page table data structure is used to access the physical page corresponding to a virtual address; this process of translating a virtual into a physical address is expensive. Instead, processors rely on the TLB and locality of reference to achieve fast address translation. This process can be summarized as follows. An application run generates a virtual memory address to access an instruction or data from memory. The CPU looks up the virtual address by indexing the TLB. If the TLB access is a hit, then the page table entry present in the TLB is used to access the physical page. In multi-core systems, typically each processor has its own TLB in order to achieve fast lookup times. This creates a challenge in managing multiple translations cached across multiple TLBs, and thus it is important to maintain TLB coherency. Unlike processor data and instruction caches, TLB coherency is implemented by the operating system. This is accomplished by the operating system issuing, for any update to a page table entry, a TLB invalidation operation.
[91]. TLB entries rely on information provided and updated through the page table. A TLB flush operation may result in a TLB miss, a page table walk, a possible page fault (if the page is not in memory) and a TLB refill. A hardware state machine walks through the page table to refill the TLB entry.

Figure 6-1. The x86 page table for small pages: the paging mechanism in the x86 architecture is shown. The control register (CR3) is loaded for the currently scheduled process. The virtual address from an application is divided to obtain the page directory entry (PDE), page table entry (PTE) and page offset.

Figure 6-1 illustrates the translation of a virtual address into a physical address in the x86 architecture. The sequence of accesses to a physical page from a virtual or linear address is as follows: (1) the virtual address is looked up in the TLB; (2) if a TLB translation is not available, the virtual address is translated into a physical address to retrieve the page content through the page table; (3) if the virtual address is not present in the page table, a page fault is invoked. The page table is a hierarchical structure to index and retrieve the final location of the physical address corresponding to the virtual address. In addition, it provides entries to check the access privileges and mode of invocation of the page. This is to prevent other processes from accessing pages for which they do not have privileges.
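The PDE/PTE/offset split in Figure 6-1 can be expressed compactly in C. The sketch below assumes the classic 32-bit, non-PAE x86 layout with 4 KB pages (10-bit directory index, 10-bit table index, 12-bit offset); it is a software illustration of what the hardware page walker does, not code from this dissertation.

    #include <stdint.h>

    #define PAGE_MASK    0xFFFu           /* low 12 bits: byte offset in page */
    #define PTE_PRESENT  0x1u             /* bit 0 of PDE/PTE: present        */
    #define FRAME_MASK   0xFFFFF000u      /* bits 31..12: physical frame base */

    /* Assumed helper: read a 32-bit word at a physical address. */
    extern uint32_t phys_read32(uint32_t paddr);

    /* Walk a two-level 32-bit x86 page table (4 KB pages, no PAE).
     * cr3 holds the physical base of the page directory.
     * Returns 0 if an entry is not present (a page fault would be raised). */
    static uint32_t walk_page_table(uint32_t cr3, uint32_t vaddr, uint32_t *paddr)
    {
        uint32_t pde_index = (vaddr >> 22) & 0x3FF;   /* bits 31..22 */
        uint32_t pte_index = (vaddr >> 12) & 0x3FF;   /* bits 21..12 */

        uint32_t pde = phys_read32((cr3 & FRAME_MASK) + pde_index * 4);
        if (!(pde & PTE_PRESENT))
            return 0;                                  /* page-directory fault */

        uint32_t pte = phys_read32((pde & FRAME_MASK) + pte_index * 4);
        if (!(pte & PTE_PRESENT))
            return 0;                                  /* page-table fault     */

        *paddr = (pte & FRAME_MASK) | (vaddr & PAGE_MASK);
        return 1;                                      /* translation found    */
    }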
[92]. In the multi-core case, either the page table is shared between two CPUs or page entries are shared between different CPUs.

[92]. Sending such a request involves programming special registers in this APIC (the ICR, or interrupt command register) to select, among other things, the destination of an interrupt and its arguments. The format and entries of the ICR register can be obtained from Intel's software developer specifications [92]. Interprocessor interrupts are initiated after a write to the APIC's ICR register. In an SMP architecture, each processor has its own local APIC to invoke or act on an IPI. A local APIC unit indicates successful dispatch of an IPI by resetting the delivery status bit in the interrupt command register (ICR). An example of an IPI is flushing the TLB contents when a processor modifies an entry in a page table data structure that is shared with other processors. This allows synchronization between the address translation procedures of SMP processors. Another example of an IPI is rescheduling a new task on an SMP machine. Consider the example of scheduling the "idle" task. To implement this, first all the interrupts are enabled on the SMP processors and then the "hlt" instruction is issued to all the processors. Whenever an interrupt is received from a system device such as the keyboard, a CPU is awakened through the interrupt. In SMP, when one CPU is awakened by such an interrupt, an IPI is sent to the other CPUs through a write to the APIC's ICR register (e.g., in the Linux O/S the "send_IPI_mask" function performs the task of writing to the ICR register). Figure 6-2 shows an example of the IPI invocation mechanism in a two-processor SMP system. Each CPU has its own local APIC unit. CPUs can store the interrupt vector
and the identifier of the target processor in the ICR. On a write to the ICR, a message is sent to the target processor via the system bus. Figure 6-2 shows the contents of the ICR register used to identify the destination CPU (0x1000000) and the kind of IPI (e.g., the content of the ICR register for TLB invalidation is 0x8fd). The default location of the APIC registers is at 0xfee00000 in physical memory. The sequence of events to invoke an IPI for TLB invalidation follows from this ICR write mechanism.

Figure 6-2. Interprocessor interrupt mechanism in the x86 architecture: the interrupt is initiated with a write to the interrupt command register (ICR). The ICR is a memory-mapped register of the APIC system.
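As a rough illustration of that ICR write, the following C sketch programs the memory-mapped xAPIC ICR to send a fixed-vector IPI to one destination CPU. The register offsets (0x300/0x310) follow the conventional xAPIC layout; the flush vector number and the memory-mapping assumption are placeholders of mine, and a real kernel path such as Linux's send_IPI_mask adds CPU masking, busy-polling of the delivery-status bit, and locking that are omitted here.

    #include <stdint.h>

    #define APIC_BASE        0xFEE00000u   /* default xAPIC MMIO base          */
    #define APIC_ICR_LOW     0x300u        /* vector, delivery mode, etc.      */
    #define APIC_ICR_HIGH    0x310u        /* destination field (ICR bits 56..63) */
    #define APIC_DEST_SHIFT  24
    #define FLUSH_VECTOR     0xFD          /* hypothetical TLB-flush vector    */

    /* Assumed helper: the APIC page must already be mapped uncached. */
    static inline void apic_write(uint32_t off, uint32_t val)
    {
        *(volatile uint32_t *)(uintptr_t)(APIC_BASE + off) = val;
    }

    /* Send a fixed-delivery IPI carrying FLUSH_VECTOR to one target CPU.
     * Writing ICR_LOW is what actually triggers the interprocessor message;
     * the receiving CPU's handler then flushes (part of) its TLB. */
    static void send_tlb_flush_ipi(uint8_t target_apic_id)
    {
        apic_write(APIC_ICR_HIGH, (uint32_t)target_apic_id << APIC_DEST_SHIFT);
        apic_write(APIC_ICR_LOW, FLUSH_VECTOR);   /* fixed mode, physical destination */
    }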
6.5.1 Grant Table Performance

The performance model used in Chapter 5 captures functional behavior but is not timing accurate. This choice is motivated by the fact that timing-accurate models are considerably slower than functional models. A conclusion drawn from the previous chapter was that the split I/O implementation results in additional TLB flushes and misses. It is also important to characterize the impact of this mechanism on the timing and performance of a virtualized system. This subsection addresses this issue through a profiling-based analysis of a modified version of Xen. To accurately evaluate the overhead in the lifecycle of a network packet, in this chapter I instrumented the Xen source code to evaluate the percentage of cycles consumed by the grant mechanism. This experiment is performed on both Intel Core 2 Duo and Pentium-based machines. Table 6-1 provides the percentage of cycles consumed due to grant table operations by the hypervisor, averaged over the period where the selected network application benchmark (iperf) executed. As inferred from Table 6-1, grant operations consume a significant amount of resources - approximately 20% of CPU cycles during a packet transfer for both the Core 2 Duo and Pentium III CPUs. While grant map and unmap statistics are gathered during the transmit of the iperf packets, grant copy or transfer statistics show
[10], similar to the results shown in Table 6-1.

Table 6-1. Grant table overhead summary
  Function calls         % cycles (Core 2 Duo)   % cycles (Pentium III)
  gnttab_map             7.13                    6.18
  gnttab_unmap           4.28                    3.56
  gnttab_transfer/copy   8.20 (copy)             10.39 (transfer)
The statistics are obtained by profiling the grant operations in the Xen hypervisor through the xentrace tool [93] and accessing the system cycle counts through the time-stamp register (RDTSC).

Figure 6-3. Experimental setup: two Simics/SoftSDV sessions are synchronized using a virtual network. The iperf application deployed on a Linux O/S is executed in one session. Iperf is deployed in DomU in a Xen hypervisor environment. Packets are sent to/from DomU from/to the Linux O/S.

I studied the potential impact of a TLB optimization that makes global hypervisor pages persistent in TLBs. In the absence of TLB tagging, a TLB flush discards all translations, including those for hypervisor pages.
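On x86, this kind of tagging corresponds to the global (G) bit in page table entries together with CR4.PGE: globally tagged translations survive the CR3 reloads used for VM switches and grant revocation. The sketch below shows the relevant bits as a generic x86 illustration added for concreteness; the study itself evaluates the effect in the simulator rather than by patching page tables this way.

    #include <stdint.h>

    #define PTE_GLOBAL  (1u << 8)    /* x86 PTE G bit: survive CR3 reloads */
    #define CR4_PGE     (1u << 7)    /* enable global-page support         */

    /* Enable global pages (CR4.PGE) so TLB entries whose PTE has the G bit
     * set are not invalidated by a MOV to CR3. Privileged operation. */
    static inline void enable_global_pages(void)
    {
        unsigned long cr4;
        __asm__ __volatile__("mov %%cr4, %0" : "=r"(cr4));
        cr4 |= CR4_PGE;
        __asm__ __volatile__("mov %0, %%cr4" : : "r"(cr4));
    }

    /* Tag a (hypervisor) mapping as global so it persists across the VM
     * switches and grant revocations that reload CR3. */
    static inline uint32_t make_global(uint32_t pte)
    {
        return pte | PTE_GLOBAL;
    }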
As shown in Figure 6-4, such an optimization indeed has the potential to substantially reduce DTLB misses (and, to a lesser extent, reduce ITLB misses). It is more effective than increasing the TLB size because global-bit tagging allows a subset of the translations to remain cached during switches and grant revocations. The experimental setup is shown in Figure 6-3.

Figure 6-4. Impact of tagging the TLB with a global bit to prevent TLB flushes for hypervisor pages. In the plot, the TLB miss ratio is normalized to a TLB size of 64 entries. The data points are collected when the iperf client is executed in a guest VM (transmit of I/O packets).

The importance of performance isolation and VM-level QoS is a growing research area, especially with the introduction of multi-core processors which share platform resources such as cache, TLB and memory. My work is further extended to study the impact of quality of service on the TLB in [94].
Figure 6-5. Page sharing in a multicore environment: an example of a potential reason for invocation of an interprocessor interrupt between two CPUs is shown.

To check the consistency between the page table and TLB contents at the invocation of an interprocessor interrupt, the sequence of steps shown in Figure 6-5 is followed.
Figure 6-6. Simics model to capture inter-processor interrupts: Simics exports an API to register different performance models. These performance models can capture important events such as IPIs through the Simics API (SIM_hap_add_callback). The Simics workload is abstracted to represent an O/S or a hypervisor.

The TLB model is generally initialized and loaded when Simics is booted. Simics provides an API to register a callback function to capture and act on an IPI event in the TLB module. An example functionality of the IPI callback function could be a counter to count the number of IPIs during the execution of a workload. Figure 6-6 illustrates that the Simics API (SIM_hap_add_callback) is used to register for an event related to a core system device (in this case the APIC). This callback function is used to modify the semantics of the TLB flush operation.
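A minimal C sketch of such a callback registration is shown below. It follows the SIM_hap_add_callback pattern named above, but the hap name for an APIC/IPI event, the header path and the callback prototype are assumptions on my part (hap names and handler signatures vary by Simics version and model), so this is illustrative rather than the module used in the experiments.

    /* Illustrative Simics-module fragment: count IPIs seen by the TLB model.
     * The hap name and callback prototype are assumed for illustration;
     * consult the Simics API reference for the exact hap exported by the
     * APIC/TLB models in a given release. */
    #include <simics/api.h>          /* assumed Simics API header */

    static long long ipi_count;

    /* Callback invoked whenever the registered hap occurs. */
    static void ipi_hap_cb(void *user_data, conf_object_t *obj, long long arg)
    {
        ipi_count++;
        /* A selective-flush policy could be applied here instead of the
         * default full TLB flush. */
    }

    static void register_ipi_hap(void)
    {
        /* "Hypothetical_APIC_IPI" stands in for the real hap name. */
        SIM_hap_add_callback("Hypothetical_APIC_IPI",
                             (obj_hap_func_t)ipi_hap_cb, NULL);
    }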
Table 6-2 shows the number of IPIs for transmit and receive of the network I/O packets during an execution run of the iperf benchmark. The potential benefit of not flushing the TLB during an interprocessor interrupt can be negated if the O/S issues a normal flush due to scheduling another process. For this experiment, I created a network of two Simics-simulated machines. For each experiment, the Simics sessions are warmed up for 20 million instructions. The statistics shown are collected for a run of 500 million instructions. Note that the number of flushes is higher in the receive scenario, when the guest virtual machine is receiving the packets. To evaluate the consistency during the IPI, a page table parser is implemented in the Simics TLB module to look up the TLB entries in the page table.

Table 6-2. TLB flush statistics with and without IPI flush optimization
  Transmit/Receive   IPI flush   Normal flush
  Transmit           485         31670
  Receive            602         57718

To further understand the impact of the IPI mechanism, I studied the impact of scaling the TLB size for instruction and data TLBs. For these experiments, domain-1 is affinitized to CPU1 while domain-0 does not have any CPU affinity. Table 6-3 and Table 6-4 provide the TLB miss statistics for the instruction and data TLBs during an application run of iperf for 50 million instructions. While the data TLB does not show significant improvement in performance for either CPU, the instruction TLB for CPU1 shows an improvement of 1.2 to 2.4%. In addition, the number of misses does not change significantly beyond a TLB size of 128 entries. While the number of TLB misses is reduced when the complete flush is avoided during the invocation of an interprocessor interrupt, the potential performance improvement is negated by complete flushes due to local flushes (e.g., scheduling of a new process).
Table 6-3. Instruction TLB miss statistics with and without IPI flush optimization for the receive iperf benchmark
  IPI mode / TLB entries   64       128      256      512
  IPI flush (CPU0)         112848   100687   100657   100687
  IPI flush (CPU1)         347      4506     4506     4506
  No IPI flush (CPU0)      112811   100604   100602   100602
  No IPI flush (CPU1)      347      4455     4455     4455

Table 6-4. Data TLB miss statistics with and without IPI flush optimization for the receive iperf benchmark
  IPI mode / TLB entries   64       128      256      512
  IPI flush (CPU0)         389221   454949   450143   450118
  IPI flush (CPU1)         55369    33636    33628    33628
  No IPI flush (CPU0)      389186   454916   450051   450008
  No IPI flush (CPU1)      55369    33580    33554    33554

[95]. Performance evaluation of shared hardware resources between multi-core processors with virtual machine environment-based workloads for server consolidation has recently been addressed [96]. A simulation-based approach has been used to maximize shared memory access between multiple VMs [96]; however, this study approximates the use of virtualization - the model does not consider the execution of hypervisors. All these analyses lacked a complete understanding of the network stack. Specialized virtual machine containers (with minimal functionality, such as only I/O) are being used to study I/O scalability [97]. Solarflare [11] has an approach to bypass the hypervisor for network I/O. The approach is based on providing hardware support for a virtual NIC (vNIC) per virtual machine. The virtual NIC controller for I/O acceleration communicates with the network driver interface offered by guest virtual machines. The network drivers inside the guest virtual machines communicate directly with the interface offered by the virtual NIC controller (bypassing the hypervisor).
[98]. Many approaches have been considered in the past to improve TLB performance. When a processor tries to modify a TLB entry, it locks the page table to prevent other processors from modifying it. It flushes the local TLB entries. TLB operations (such as TLB refills) are queued for update. The processor sends an IPI and spins until all other processors are done. Finally, it unlocks the page table. These steps are cycle-consuming for the TLB. Many improvements have been suggested to improve this TLB performance. These include:

[99]. An example of this is the upgrade of a page from read-only to read-write. The Linux O/S does support lazy TLB flushing.
[100] (Nehalem) to improve the cost of virtual-to-physical address translation. In this scenario, this dissertation motivates research on models that capture the behavior of similar new on-chip resources. These models can be augmented and evaluated further to guide system designers toward improved system performance. Future internet usage is predicted to include new usage models that are based on user connectivity, such as social networking, collaboration and on-line gaming. There are instances where these applications are desired to be encapsulated in VM environments
[1] J.E.SmithandR.Nair,VirtualMachines:versatileplatformsforsystemsandprocesses,MorganKaufmannpublishers,May2005. [2] M.Litzkow,M.Livny,andM.W.Mutka,\Condor:ahunterofidleworkstations,"inProceedingsof8thinternationalconferenceonDistributedComputingSystems,Jun1988,pp.104{111. [3] B.Callaghan,NFSillustrated,Addison-WesleyLongmanLtd.,Essex,UK,2000. [4] S.Adabala,V.Chadha,P.Chawla,R.J.Figueiredo,J.A.B.Fortes,I.Krsul,A.Matsunga,M.Tsugawa,J.Zhang,M.Zhao,L.Zhu,andX.Zhu,\Fromvirtualizedresourcestovirtualcomputinggrids:Thein-vigosystem,"FutureGenerationComputingSystems,specialissueonComplexProblem-SolvingEnviron-mentsforGridComputing,vol.21,no.6,Apr2005. [5] K.Keahey,I.Foster,T.Freeman,X.Zhang,andD.Galron.,\Virtualworkspacesinthegrid,"inProceedingsoftheEuro-ParConference,Lisbon,Portugal,Sep2005. [6] A.SundarajandP.A.Dinda,\Towardsvirtualnetworksforvirtualmachinegridcomputing,"in3rdUSENIXVirtualMachineResearchandTechnologySymp.,May2004. [7] D.Nurmi,R.Wolski,C.Grzegorczyk,G.Obertelli,S.Soman,L.Youse,andD.Zagorodnov,\Theeucalyptusopen-sourcecloud-computingsystem,"inCloudComputingandItsApplicationsworkshop(CCA'08),Chicago,IL,October2008. [8] VMware,\Merrilllynchtostandardizeonvmwarevirtualmachinesoftware[Online],"WorldWideWebelectronicpublication,Available: lynch.html [9] A.Menon,J.R.Santos,Y.Turner,G.J.Janakiraman,andW.Zwaenepoel,\Diagnosingperformanceoverheadsinthexenvirtualmachineenvironment,"inVEE'05:Proceedingsofthe1stACM/USENIXinternationalconferenceonVirtualexecutionenvironments,NewYork,NY,USA,2005,pp.13{23,ACM. [10] A.Menon,A.L.Cox,andW.Zwaenepoel,\Optimizingnetworkvirtualizationinxen,"inATEC'06:ProceedingsoftheannualconferenceonUSENIX'06AnnualTechnicalConference,Berkeley,CA,USA,2006,USENIXAssociation. [11] D.Chisnall,Thedenitiveguidetothexenhypervisor,PrenticeHallPress,UpperSaddleRiver,NJ,USA,2007. [12] R.Goldberg,\Surveyofvirtualmachineresearch,"IEEEComputerMagazine,vol.7,no.6,pp.34{45,1974. 136
[13] P.S.Magnusson,M.Christensson,J.Eskilson,D.Forsgren,G.Hllberg,J.Hgberg,F.Larsson,A.Moestedt,andB.Werner,\Simics:Afullsystemsimulationplatform,"IEEEComputer,2002. [14] M.Rosenblum,S.A.Herrod,E.Witchel,andA.Gupta,\Completecomputersystemsimulation:ThesimOSapproach,"IEEEParallelandDistributedTechnol-ogy,vol.3,pp.34{43,1995. [15] J.J.YiandD.J.Lilja,\Simulationofcomputerarchitectures:Simulators,benchmarks,methodologies,andrecommendations,"IEEETransactionsonComputers,vol.55,no.3,pp.268{280,2006. [16] J.Sugerman,G.Venkitachalan,andB.H.Lim,\Virtualizingi/odevicesonvmwareworkstation'shostedvirtualmachinemonitor,"inProceedingsoftheUSENIXAnnualTechnicalConference,Jun2001. [17] S.Hand,A.Wareld,K.Fraser,E.Kotsovinos,andD.Magenheimer,\Arevirtualmachinemonitorsmicrokernelsdoneright?,"inHOTOS'05:Proceedingsofthe10thconferenceonHotTopicsinOperatingSystems,Berkeley,CA,USA,2005,USENIXAssociation. [18] S.Kumar,H.Raj,K.Schwan,andI.Ganev,\Re-architectingvmmsformulticoresystems:Thesidecoreapproach,"inWorkshopontheInteractionbetweenOperatingSystemsandComputerArchitecture,2007. [19] K.Krewell,\Bestserversof2004:Multicoreisnorm,"MicroprocessorReport,2005. [20] P.Barham,B.Dragovic,K.Fraser,S.Hand,T.Harris,A.Ho,R.Neugebauer,I.Pratt,andA.Wareld,\Xenandtheartofvirtualization,"inSOSP'03:ProceedingsofthenineteenthACMsymposiumonOperatingsystemsprinciples,NewYork,NY,USA,2003,pp.164{177,ACM. [21] E.Kotsovinos,T.Moreton,I.Pratt,R.Ross,K.Fraser,S.Hand,andT.Harris,\Global-scaleservicedeploymentinthexenoserverplatform,"inProceedingsoftheFirstWorkshoponReal,LargeDistributedSystems(WORLDS'04),Dec2004. [22] K.Suzaki,T.Yagi,K.Iijima,andN.A.Quynh,\Oscircular:internetclientforreference,"inLISA'07:Proceedingsofthe21stconferenceonLargeInstallationSystemAdministrationConference,Berkeley,CA,USA,2007,pp.105{116,USENIXAssociation. [23] B.M.G.,J.Hartman,M.Kupfer,K.W.Shri,andJ.Ousterhout,\Measurementsofadistributedlesystem,"inProceedingsofthe13thSymposiumonOperatingSystemsPrinciples,1991. [24] J.H.Howard,M.L.Kazar,S.G.Menees,A.Nichols,M.Satyanarayanan,R.N.Sidebotham,andM.J.West,\Scaleandperformanceinadistributedlesystem,"ACMTransactionsonComputerSystems,vol.6,Feb1988.
[25] B.Pawlowski,C.Juszczak,P.Staubach,C.Smith,D.Lebel,andD.Hitz,\Nfsversion3:Designandimplementation,"inUSENIXSummer,Boston,MA,Jun1994. [26] A.Ganguly,D.Wolinsky,P.O.Boykin,andR.Figueiredo,\Decentralizeddynamichost:Congurationinwide-areaoverlaynetworksofvirtualworkstations,"inWorkshoponLarge-ScaleandVolatileDesktopGrids(PCGrid),LongBeach,CA,Mar2007,pp.1{8. [27] P.Apparao,S.Makineni,andD.Newell,\Characterizationofnetworkprocessingoverheadsinxen,"inVTDC'06:Proceedingsofthe2ndInternationalWorkshoponVirtualizationTechnologyinDistributedComputing,WashingtonDC,USA,2006,p.2,IEEEComputerSociety. [28] J.S.Chase,D.E.Irwin,L.E.Grit,J.D.Moore,andS.E.Sprenkle,\Dynamicvirtualclustersinagridsitemanager,"inHPDC'03:Proceedingsofthe12thIEEEInternationalSymposiumonHighPerformanceDistributedComputing,Washington,DC,USA,2003,pp.90{101,IEEEComputerSociety. [29] I.Foster,C.Kesselman,andS.Tuecke,\Theanatomyofthegrid:enablingscalablevirtualorganizations,"InternationalJournalofSupercomputingApplications,vol.15,no.3,pp.200{222,Apr2001. [30] A.Ganguly,A.Agrawal,P.Boykin,andR.J.Figueiredo,\Ipoverp2p:Enablingself-conguringvirtualipnetworksforgridcomputing,"inIEEEInternationalParallel&DistributedProcessingSymposium(IPDPS),RhodeIsland,Greece,Apr2006. [31] S.M.Larson,C.D.Snow,M.R.Shirts,andV.S.Pande,\Folding@homeandgenome@home:Usingdistributedcomputingtotacklepreviouslyintractableproblemsincomputationalbiology,"inComputationalGenomics.2002,HorizonPress. [32] D.P.Anderson,J.Cobb,E.Korpella,M.Lebofsky,andD.Werthimer,\Seti@home:Anexperimentinpublic-resourcecomputing,"CommunicationsoftheACM,vol.11,no.45,pp.56{61,2002. [33] J.J.KistlerandM.Satyanarayan,\Disconnectedoperationincodalesystem,"ACMTransactionsonComputerSystems,vol.6,Feb1992. [34] B.S.White,A.S.Grimshaw,andA.Nguyen-Tuong,\Grid-basedleaccess:Thelegioni/omodel,"inHighPerformancedistributedComputing,Pittsburgh,PA,Aug2000,pp.165{174. [35] R.Uhlig,G.Neiger,D.Rodgers,A.L.Santoni,F.C.M.Martins,A.V.Anderson,S.M.Bennett,A.Kagi,F.H.Leung,andL.Smith,\Intelvirtualizationtechnology,"Computer,vol.38,no.5,2005.
[36] T.Garnkel,K.Adams,A.Wareld,andJ.Franklin,\Compatibilityisnottransparency:Vmmdetectionmythsandrealities,"inHOTOS'07:Proceedingsofthe11thUSENIXworkshoponHottopicsinoperatingsystems,Berkeley,CA,USA,2007,pp.1{6,USENIXAssociation. [37] M.RosemblumandT.Garnkel,\Virtualmachinemonitors:Currenttechnologyandfuturetrends,"IEEEComputer,vol.38,pp.39{47,2005. [38] D.C.Anderson,J.S.Chase,andA.M.Vahdat,\Interposedrequestforroutingforscalablenetworkstorage,"inSymposiumonOSDI,SanDiego,CA,oct2000. [39] D.J.Blezard,\Multi-platformcomputerlabsandclassrooms:amagicbullet?,"inSIGUCCS'07:Proceedingsofthe35thannualACMSIGUCCSconferenceonUserservices,NewYork,NY,USA,2007,pp.16{20,ACM. [40] J.Watson,\Virtualbox:bitsandbytesmasqueradingasmachines,"LinuxJ.,vol.2008,no.166,pp.1,2008. [41] S.Bhattiprolu,E.W.Biederman,S.Hallyn,andD.Lezcano,\Virtualserversandcheckpoint/restartinmainstreamlinux,"SIGOPSOperatingSystemReview,vol.42,no.5,pp.104{113,2008. [42] J.Dike,\Auser-modeportofthelinuxkernel,"inProc.ofthe4thAnnualLinuxShowcaseandConference,Atlanta,GA,2000. [43] F.Bellard,\Qemu,afastandportabledynamictranslator,"inATEC'05:ProceedingsoftheannualconferenceonUSENIXAnnualTechnicalConference,Berkeley,CA,USA,2005,pp.41{46,USENIXAssociation. [44] \x86architecture[Online],"WorldWideWebelectronicpublication,Available: [45] A.S.Tanenbaum,J.N.Herder,andH.Bos,\Canwemakeoperatingsystemsreliableandsecure?,"inComputer.may2006,pp.44{51,IEEEComputerSociety. [46] R.Bhargava,B.Serebrin,F.Spadini,andS.Manne,\Acceleratingtwo-dimensionalpagewalksforvirtualizedsystems,"inASPLOSXIII:Proceedingsofthe13thinternationalconferenceonArchitecturalsupportforprogramminglanguagesandoperatingsystems,NewYork,NY,USA,2008,pp.26{35,ACM. [47] P.Willmann,S.Rixner,andA.L.Cox,\Protectionstrategiesfordirectaccesstovirtualizedi/odevices,"inATC'08:USENIX2008AnnualTechnicalConferenceonAnnualTechnicalConference,Berkeley,CA,USA,2008,pp.15{28,USENIXAssociation. [48] R.Iyer,L.Zhao,F.Guo,R.Illikkal,S.Makineni,D.Newell,Y.Solihin,L.Hsu,andS.Reinhardt,\Qospoliciesandarchitectureforcache/memoryincmpplatforms,"inACMSigmetrics,Jun2007.
[49] N.Egi,A.Greenhalgh,M.Handley,M.Hoerdt,F.Huici,andL.Mathy,\Fairnessissuesinsoftwarevirtualrouters,"inPRESTO'08:ProceedingsoftheACMworkshoponProgrammableroutersforextensibleservicesoftomorrow,NewYork,NY,USA,2008,pp.33{38,ACM. [50] A.Muthitacharoen,B.Chen,andD.Mazieres,\Alow-bandwidthnetworklesystem,"inSymposiumonOperatingSystemsPrinciples,2001,pp.174{187. [51] A.R.Butt,T.Johnson,Y.Zheng,andY.C.Hu,\Kosha:Apeer-to-peerenhancementforthenetworklesystem,"inProceedingsofIEEE/ACMSC2004,Nov2004. [52] V.SrinivasanandJ.C.Mogul,\Spritelynfs:Experiementswithcache-consistencyprotocols,"inProceedingsoftheTwelfthACMSymposiumonOperatingSystemsPrinciples,Dec1989,pp.45{57. [53] R.Macklem,\Notquitenfs,softcacheconsistencyfornfs,"inProceedingsoftheWinter1994UsenixConference,SanFrancisco,CA,Jan1994. [54] D.HildebrandandP.Honeyman,\Exportingstoragesystemsinascalablemannerwithpnfs,"inProceedingsofthe22ndIEEE/13thNASAGoddardConferenceonMassStorageSystemsandTechnologies,Washington,DC,USA,2005,pp.18{27,IEEEComputerSociety. [55] A.Traeger,A.Rai,C.P.Wright,andE.Zadok,\Nfslehandlesecurity,"Tech.Rep.,StonyBrookUniversity,2004. [56] R.J.Figueiredo,P.Dinda,andJ.A.B.Fortes,\Acaseforgridcomputingonvirtualmachines,"inProc.ofthe23rdIEEEIntl.ConferenceonDistributedComputingSystems(ICDCS),Providence,RhodeIsland,May2003. [57] M.ZhaoandR.J.Figueiredo,\Distributedlesystemsupportforvirtualmachinesingridcomputing,"inProceedingsofHPDC,Jun2004. [58] M.Zhao,V.Chadha,andR.J.Figueiredo,\Supportingapplication-tailoredgridlesystemsessionswithwsrf-basedservices.,"inProceedingsofHPDC,Jul2005. [59] D.Wolinsky,A.Agrawal,P.O.Boykin,J.Davis,A.Ganguly,V.Paramygin,P.Sheng,andR.Figueiredo,\Onthedesignofvirtualmachinesandboxesfordistributedcomputinginwideareaoverlaysofvirtualworkstations,"inFirstWorkshoponVirtualizationTechnologiesinDistributedComputing(VTDC),Nov2006. [60] F.Oliveira,G.Guardiola,J.A.Patel,andE.V.Hensbergen,\Blutopia:Stackablestorageforclustermanagement,"inProceedingsoftheIEEEclustercomputing,Sep2007. [61] S.Santhanam,P.Elango,A.Arpaci-Dusseau,andM.Livny,\Deployingvirtualmachinesassandboxesforthegrid,"inUSENIXWORLDS,2004.
[62] M.CarsonandD.Santay,\Nistnet:alinux-basednetworkemulationtool,"SIGCOMMComputerCommunicationReview,vol.33,no.3,pp.111{126,2003. [63] J.SpadavecchiaandE.Zadok,\Enhancingnfscross-administrativedomainaccess,"inUSENIXAnnualTechnicalConferenceFREENIXTrack,2002,pp.181{194. [64] M.Baker,R.Buyya,andD.Laforenza,\Gridsandgridtechnologiesforwide-areadistributedcomputing,"SoftwarePractice&Experience,vol.32,no.15,pp.1437{1466,2002. [65] C.P.Wright,J.Dave,P.Gupta,H.Krishnan,D.P.Quigley,E.Zadok,andM.N.Zubair,\Versatilityandunixsemanticsinnamespaceunication,"ACMTransac-tionsonStorage(TOS),vol.2,no.1,pp.1{s32,February2006. [66] D.Santry,M.Feeley,N.Hutchinson,A.Veitch,R.Carton,andJ.Or.,\Decidingwhentoforgetintheelephantlesystem,"in17thACMSOSPPrinciples,1999. [67] R.G.Minnich,\Theautocacher:Alecachewhichoperatesatthenfslevel,"inUSENIXconferenceproceedings,1993,pp.77{83. [68] S.Osman,D.Subhraveti,G.Su,andJ.Nieh,\Thedesignandimplementationofzap:Asystemformigratingcomputingenvironments.,"inSymposiumonOSDI,Boston,MA,Dec2002. [69] J.H.HartmanandJ.K.Ousterhout,\Thezebrastripednetworklesystem,"inSOSP'93:ProceedingsofthefourteenthACMsymposiumonOperatingsystemsprinciples.Dec1993,pp.29{43,ACM. [70] A.AgbariaandR.Friedman,\Virtualmachinebasedheterogeneouscheckpointing,"software-Practice&Experience,vol.32,pp.1175{1192,2002. [71] A.Ganguly,A.Agrawal,P.O.Boykin,andR.J.O.Figueiredo,\Wow:Self-organizingwideareaoverlaynetworksofvirtualworkstations.,"inHPDC.June2006,pp.30{42,IEEE. [72] C.Sun,L.He,Q.Wang,andR.Willenborg,\Simplifyingservicedeploymentwithvirtualappliances,"inIEEEInternationalConferenceonServicesComputing,vol.2,pp.265{272,2008. [73] R.ProdanandT.Fahringer,\Overheadanalysisofscienticworkowsingridenvironments,"IEEETransactionsonParallelandDistributedSystems,vol.19,no.3,pp.378{393,2008. [74] A.S.TanenbaumandM.vanSteen,DistributedSystems:PrinciplesandParadigms(1stEdition),Prentics-HallInc,2002. [75] V.ChadhaandR.J.Figueiredo,\Row-fs:Auser-levelvirtualizedredirect-on-writedistributedlesystemforwideareaapplications,"inInternationalConferenceonhighPerformanceComputing(HiPC),Goa,India,Dec2007.
PAGE 142
[76] S.Annapureddy,M.J.Freedman,andD.Mazires,\Shark:Scalingleserversviacooperativecaching,"inProceedingsofthe2ndUSENIX/ACMSymposiumonNetworkedSystemsDesignandImplementation,May2005. [77] D.Reimer,A.Thomas,G.Ammons,T.Mummert,B.Alpern,andV.Bala,\Openingblackboxes:usingsemanticinformationtocombatvirtualmachineimagesprawl,"inVEE'08:ProceedingsofthefourthACMSIGPLAN/SIGOPSinternationalconferenceonVirtualexecutionenvironments,NewYork,NY,USA,2008,pp.111{120,ACM. [78] R.Chandra,N.Zeldovich,C.Sapuntzakis,andM.S.Lam,\Thecollective:Acache-basedsystemmanagementarchitecture,"inProceedingsof2ndSymposiumonNetworkedSystemsDesign&Implementation(NSDI),2005,pp.259{272. [79] \2xthinclientserver,"WorldWideWebelectronicpublication,2008, [80] M.McNett,D.Gupta,A.Vahdat,andG.M.Voelker,\Usher:Anextensibleframeworkformanagingclustersofvirtualmachines,"inProceedingsofthe21stLargeInstallationSystemAdministrationConference(LISA),Nov2007. [81] J.Cappos,S.Baker,J.Plichta,D.Nyugen,J.Hardies,M.Borgard,J.Johnston,andJ.H.Hartman,\Stork:Packagemanagementfordistributedvmenvironments,"inProceedingsofthe21stLargeInstallationSystemAdministrationConference(LISA),Nov2007. [82] R.GrossmanandY.Gu,\Dataminingusinghighperformancedataclouds:experimentalstudiesusingsectorandsphere,"inKDD'08:Proceedingofthe14thACMSIGKDDinternationalconferenceonKnowledgediscoveryanddatamining,NewYork,NY,USA,2008,pp.920{927,ACM. [83] A.Menon,J.R.Santos,Y.Turner,G.Janakiraman,andW.Zwaenepoel,\Diagnosingperformanceoverheadsinthexenvirtualmachineenvironment,"inVEE'05:Proceedingsofthe1stACM/USENIXinternationalconferenceonVirtualexecutionenvironments.2005,pp.13{23,ACM. [84] J.R.Santos,Y.Turner,G.Janakiraman,andI.Pratt,\Bridgingthegapbetweensoftwareandhardwaretechniquesfori/ovirtualization,"inATC'08:USENIX2008AnnualTechnicalConferenceonAnnualTechnicalConference,Berkeley,CA,USA,2008,pp.29{42,USENIXAssociation. [85] R.Uhlig,R.Fishtein,O.Gershon,I.Hirsh,andH.Wang,\Softsdv:Apresiliconsoftwaredevelopmentenvironmentfortheia-64architecture,"IntelTechnologyJournal,1999. [86] M.Yourst,\PTLsim:Acycleaccuratefullsystemx86-64microarchitecturalsimulator,"IEEEInternationalSymposiumonPerformanceAnalysisofSystems&Software,2007,pp.23{34,April2007.
PAGE 143
[87] R.Iyer,\Onmodelingandanalyzingcachehierarchiesusingcasper,"in11thIEEEInternationalSymposiumonModeling,Analysis,andSimulationofComputerandTelecommunicationsSystems(MASCOTS),Oct2003. [88] W.WuandM.Crawford,\Interactivityvs.fairnessinnetworkedlinuxsystems,"ComputerNetworks,vol.51,no.14,pp.4050{4069,2007. [89] L.CherkasovaandR.Gardner,\Measuringcpuoverheadfori/oprocessinginthexenvirtualmachinemonitor,"inATEC'05:ProceedingsoftheannualconferenceonUSENIXAnnualTechnicalConference,Berkeley,CA,USA,Apr2005,USENIXAssociation. [90] G.Neiger,A.Santoni,F.Leung,D.Rodgers,andR.Uhlig,\Intelvirtualizationtechnology:Hardwaresupportforecientprocessorvirtualization,"IntelTechnologyJournal,Aug2006. [91] B.Jacob,S.W.NG,andD.T.Wang,MemorySystems:Cache,DRAMandDisk,MorganKaufmannpublishers,2007. [92] I.Corporation,\Intel64andia-32architecturessoftwaredeveloper'smanuals[Online],"WorldWideWebelectronicpublication,Available: [93] D.Gupta,R.Gardner,andL.Cherkasova,\Xenmon:Qosmonitoringandperformanceprolingtool[Online],"WorldWideWebelectronicpublication,Available: [94] O.tickoo,H.Kannan,V.Chadha,R.Illikkal,R.Iyer,andD.Newell,\qtlb:Lookinginsidelookasidebuer,"inInternationalConferenceonhighPerformanceComputing(HiPC),Goa,India,Dec2007. [95] R.Santos,G.Janikaraman,andY.turner,\Xennetworkoptimization[Online],"WorldWideWebelectronicpublication,Available: 3/networkoptimizations.pdf [96] M.R.MartyandM.D.Hill,\Virtualhierarchiestosupportserverconsolidation,"inISCA'07:Proceedingsofthe34thannualinternationalsymposiumonComputerarchitecture,NewYork,NY,USA,2007,pp.46{56,ACM. [97] J.Wiegert,G.Regnier,andJ.Jackson,\Challengesforscalablenetworkinginavirtualizedserver,"inProceedingsofthe16thInternationalConferenceonComputerCommunicationsandNetworks,Aug2007. [98] J.Liu,W.Huang,B.Abali,andD.K.Panda,\Highperformancevmm-bypassi/oinvirtualmachines,"inATEC'06:ProceedingsoftheannualconferenceonUSENIX'06AnnualTechnicalConference,Berkeley,CA,USA,May2006,USENIXAssociation.
PAGE 144
[99] M.-S.ChangandK.Koh,\Lazytlbconsistencyforlarge-scalemultiprocessors,"inProceedingsofthe2ndAIZUInternationalSymposiumonParallelAlgo-rithms/ArchitectureSynthesis,1997. [100] I.Corporation,\Firstthetick,nowthetock:NextgenerationIntelmicroarchitecture[Online],"WorldWideWebelectronicpublication,Available:
PAGE 145
Vineet graduated with a B.E. in Electronics and Telecommunication from the University of Pune, India, and finished his M.S. in Computer Science at Mississippi State University. He is pursuing his Ph.D. in Computer and Information Science and Engineering at the University of Florida. His research interests include virtualization, operating systems, computer architecture, file systems, and distributed computing. Since Fall 2002, Vineet has been a research assistant at the Advanced Computing and Information Systems (ACIS) Laboratory, where his work has focused on the Grid Virtual File System (GVFS) and I/O virtualization. He has been involved in developing middleware support for the network file system and a simulation-based evaluation methodology for characterizing I/O overheads in virtualized environments. To complement his academic experience, Vineet has completed two summer internships at the Intel Systems Technology Lab. Upon graduation, he plans to take up a full-time position at Intel Corporation.