
Self-Managing Virtual Networks for Wide-Area Distributed Computing

Permanent Link: http://ufdc.ufl.edu/UFE0022657/00001

Material Information

Title: Self-Managing Virtual Networks for Wide-Area Distributed Computing
Physical Description: 1 online resource (154 p.)
Language: english
Creator: Ganguly, Arijit
Publisher: University of Florida
Place of Publication: Gainesville, Fla.
Publication Date: 2008

Subjects

Subjects / Keywords: dht, firewalls, grid, nat, networks, p2p, virtual
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre: Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract: Sharing of computing and storage resources among different institutions and individuals connected over the Internet is seen as a solution to meet the ever-increasing computation and storage demands of modern applications. Several factors curtail the ability of existing applications to run seamlessly on Wide-area Networks (WANs): heterogeneous resource configurations, obscured access to resources due to Network Address Translators (NATs) and firewalls, inability to express sharing policies and lack of isolation provided by operating systems. This work addresses the problem of providing bi-directional network connectivity among wide-area resources behind NATs and firewalls. At the core of the presented approach is a self-managing networking infrastructure (IPOP) that aggregates wide-area hosts into a private network with decoupled address space management, and is functionally equivalent to a Local-area network (LAN) environment where a wealth of existing, unmodified IP-based applications can be deployed. The IPOP virtual network tunnels the traffic generated by applications over a P2P-based overlay network, which handles NAT/firewall traversal (through hole-punching techniques) and dynamically adapts its topology (through establishment of direct connections between communicating nodes) in a self-organized, decentralized manner. Together with classic virtual machine technology for software dissemination, IPOP facilitates deployment of large-scale distributed computing environments on wide-area hosts owned by different organizations and individuals. A real deployment of the system has been up and running for more than one year, providing access to computational resources for several users. This dissertation makes the following contributions in the area of virtualization applied to wide-area networks: a novel self-organizing IP-over-P2P system with decentralized NAT traversal; decentralized self-optimization techniques to create overlay links between nodes based on traffic inspection; creation of isolated address spaces and decentralized allocation of IP addresses within each such address space using Distributed Hash Tables (DHTs); tunneling of overlay links for maintaining the overlay structure even in the presence of NATs and routing outages; and techniques for proxy discovery for tunnel nodes using network coordinates. I describe the IPOP virtual network architecture and present an evaluation of a prototype implementation using well-known network performance benchmarks and a set of distributed applications. To further facilitate deployment of IPOP, I describe techniques that allow new users to easily create and manage isolated address spaces and decentralized allocation of IP addresses within each such address space. I present generally applicable techniques that facilitate consistent routing in structured P2P systems even in the presence of overlay faults, thereby benefiting different applications of these systems. In the context of the IPOP system, these techniques provide improved virtual IP connectivity. I also describe and evaluate decentralized techniques for discovering suitable proxy nodes to establish a 2-hop overlay path between virtual IP nodes, when direct communication is not possible.
General Note: In the series University of Florida Digital Collections.
General Note: Includes vita.
Bibliography: Includes bibliographical references.
Source of Description: Description based on online resource; title from PDF title page.
Source of Description: This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility: by Arijit Ganguly.
Thesis: Thesis (Ph.D.)--University of Florida, 2008.
Local: Adviser: Figueiredo, Renato J.
Electronic Access: RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2009-02-28

Record Information

Source Institution: UFRGP
Rights Management: Applicable rights reserved.
Classification: lcc - LD1780 2008
System ID: UFE0022657:00001


Full Text

PAGE 4

I would like to take this opportunity to express my sincere gratitude to my advisor Dr. Renato Figueiredo, for his constant support and guidance throughout the course of my study. Without his encouragement, appreciation and openness for research ideas, it would not have been possible to publish the defining papers of my work. I also thank him for supporting my studies with a Research Assistantship and also for the opportunity to present research papers at various conferences. I am equally thankful to Dr. Oscar Boykin, whose ideas and initial work on the Brunet P2P library have been the baseline of my research and also the development of the IPOP system. I would also like to thank Dr. Jose Fortes for giving me an opportunity to be a part of the Advanced Computing and Information Systems (ACIS) Laboratory, and do cutting-edge systems research. I am thankful to my other committee members, Dr. Richard Newman, Dr. Alan George and Dr. Paul Avery, for some stimulating discussions, and the inspiration which I drew from their research work, which in turn kindled my own creativity.

In today's competitive world, it is not possible to make an impact without teamwork. I am thankful to the members of the ACIS P2P team for sharing with me several tools and infrastructures that have helped me in my research. In particular, I thank David Wolinsky, whose grid-appliance was very useful for gathering data for one of my papers. I thank ACIS researchers, Mauricio and Andrea, for their efforts in ensuring the high availability of lab facilities and also for their spontaneous help whenever needed. I am also thankful to Vineet and Girish, for sharing a wonderful work environment.

I am thankful to my parents, who have taught me the ability of the right judgement, and have always taken pride and expressed confidence in me. I am thankful to my sister, Inakshi, who has been encouraging at times of despair. I should also express my thankfulness to my girlfriend, Swasti, for being so supportive, understanding and patient all this while. Finally, no achievement is possible without the blessings and wish of the All Mighty, who I thank for all my accomplishments and seek his blessings for the future.

PAGE 5

page

ACKNOWLEDGMENTS  4
LIST OF TABLES  9
LIST OF FIGURES  10
ABSTRACT  13

CHAPTER

1 INTRODUCTION  15
  1.1 Wide-Area Distributed Computing  18
    1.1.1 High-Throughput Computing  18
    1.1.2 Cross-Domain Collaboration and Development  18
  1.2 Issues with Traditional Approaches  19
    1.2.1 Heterogeneity of Compute Resources  19
    1.2.2 Obscured Internet Connectivity  20
    1.2.3 Lack of Isolation  20
  1.3 Solution: Virtualization  21
    1.3.1 Virtual Machines  21
    1.3.2 Virtual Networks  22
    1.3.3 Combining Virtual Machines and Virtual Networks  23
  1.4 Focus and Contributions  23

2 SELF-ORGANIZING VIRTUAL NETWORKING  25
  2.1 Peer-to-Peer Networks and Architectures  26
  2.2 Tunneling IP over P2P - IPOP  29
  2.3 Architecture of IPOP  30
    2.3.1 Virtual IP Packet Capture and Routing  30
    2.3.2 Overlay Connection Management  31
      2.3.2.1 Connection setup  33
      2.3.2.2 Joining an existing network and acquiring connections  35
      2.3.2.3 Establishing communication through NATs  36
      2.3.2.4 Adaptive shortcut creation  37
  2.4 Experiments  39
    2.4.1 Shortcut Connections: Latency and Bandwidth  39
    2.4.2 Single IPOP link: Latency and Bandwidth  43
    2.4.3 Implementation Optimizations  46
  2.5 Related Work  46
    2.5.1 Resource Virtualization  46
    2.5.2 Applications of P2P  46
    2.5.3 Techniques for NAT-Traversal  47

PAGE 6

  …  48

3 WIDE-AREA OVERLAYS OF VIRTUAL WORKSTATIONS  49
  3.1 Node Configuration and Deployment in WOWs  50
    3.1.1 Background and Motivations  50
    3.1.2 Virtual Machine Configuration  51
    3.1.3 Deployment Scenarios  52
    3.1.4 Testbed WOW Performance  53
      3.1.4.1 Batch application  55
      3.1.4.2 Parallel application  56
      3.1.4.3 Virtual machine migration  57
  3.2 Related Work  59
  3.3 Discussion  60
  3.4 Making Deployment of WOWs Simpler  62

4 DISTRIBUTED HASH TABLES AND APPLICATIONS IN IPOP  64
  4.1 Related Work  65
    4.1.1 Systems Based on Distributed Hash Table  65
    4.1.2 Distributed Hash Table Design  65
  4.2 Distributed Hash Table on Brunet  67
    4.2.1 Handling Topology Changes  67
    4.2.2 Application Programmers Interface (API)  68
    4.2.3 Tolerance to Inconsistent Roots  69
  4.3 Extensions to IPOP  71
  4.4 Lifecycle Management of WOWs  72
    4.4.1 Creating an IPOP namespace  73
    4.4.2 Dynamic Host Configuration  73
    4.4.3 Resolution of Virtual IP to P2P Address  75
  4.5 Experiments  75
  4.6 Conclusion  77

5 IMPACT OF WIDE-AREA CONNECTIVITY CONSTRAINTS ON IPOP  79
  5.1 Connectivity Hazards in Wide-Area Networks  80
  5.2 Impact of Connectivity Constraints  82
    5.2.1 Impact on Core Structured Overlay Routing  83
    5.2.2 Effect on All-to-All Connectivity  84
    5.2.3 Effect on Dynamic Virtual IP Configuration  84
    5.2.4 Effect on Completion of DHT Operations  85
    5.2.5 Effect on DHT Dynamics  85
    5.2.6 Effect on Topology Adaptation  86
  5.3 Discussion  86

PAGE 7

  …  87
  6.1 Annealing Routing  87
  6.2 Tunnel Edges  90
  6.3 Improvements in Structured Routing  93
    6.3.1 Simulation Methodology  94
    6.3.2 Evaluating the Impact of Annealing Routing  95
    6.3.3 Evaluating the Impact of Tunnel Edges  98
  6.4 Tunnel Edge Implementation in Brunet  99
  6.5 Experiments  101
    6.5.1 Structure Verification of P2P Network  101
      6.5.1.1 Majority of nodes behind NATs  102
      6.5.1.2 Incomplete underlying network  102
      6.5.1.3 Wide-area deployment  102
    6.5.2 Connectivity within a WOW  104
  6.6 Related Work  105
  6.7 Conclusion  107

7 IMPROVING END-TO-END LATENCY  109
  7.1 Network Coordinates  110
  7.2 Improving Latency of Overlay Routing  112
    7.2.1 Proximity Neighbor Selection in Brunet  112
    7.2.2 Implementation in Brunet  115
    7.2.3 Experiments  116
  7.3 Improving Latency of Virtual IP Communication  119
    7.3.1 Wide-area Resource Discovery  120
    7.3.2 Discovering Proxies in IPOP  121
      7.3.2.1 Closest connection (CC)  121
      7.3.2.2 Expanding Segment Search (ESS)  121
    7.3.3 Experiments  123
    7.3.4 Discussion  127
  7.4 Conclusion  127

8 CONCLUSION  132

APPENDIX

A NETWORK ADDRESS TRANSLATORS  134
  A.1 Traversing NATs with UDP  135
  A.2 Traversing NATs with TCP  136
  A.3 Problems using STUN/STUNT  136

B VIVALDI NETWORK COORDINATES  137
  B.1 Vivaldi Algorithm  137
  B.2 Considerations for Convergence and Accuracy  138

PAGE 8

  …  140
  C.1 Application Programmers Interface (API)  140
  C.2 Tree Generation Algorithms  141
  C.3 Applications of Map-Reduce  142

REFERENCES  144

BIOGRAPHICAL SKETCH  154

PAGE 9

Table  page

2-1 Bandwidth measurements between two IPOP nodes: with and without shortcuts  43
2-2 Configurations of machines used for evaluating performance of single IPOP link  44
2-3 Mean and standard deviation of 10000 ping round-trip times for IPOP-UDP and physical network  44
2-4 Mean and standard deviation of rate of TCP request/response transactions measured over 100 samples with netperf for IPOP-UDP and physical network  44
2-5 Comparison of throughput of a single IPOP link in LAN and WAN environments  45
3-1 Configuration of the WOW testbed depicted in Figure 3-1. All WOW guests run the same Debian/Linux 2.4.27-2 O/S  54
3-2 Execution times and speedups for the execution of fastDNAml-PVM in 1, 15 and 30 nodes of the WOW. The sequential execution time for the application is 22,272 seconds on node 2 and 45,191 seconds on node 34. Parallel speedups are reported with respect to the execution time of node 2.  57
6-1 Probability of being able to form a tunnel edge as a function of edge probability and number of required near connections on each side  93
6-2 Adjacent P2P nodes in wide-area deployment that could not communicate using TCP or UDP  103
7-1 Configurations of overlays used to evaluate CC and ESS  124
7-2 Relative and absolute penalties of CC, ESS(1.1), ESS(1.2) and ESS(1.05) for overlays Exp1 (with PNS) and Exp2 (without PNS)  126
7-3 Relative and absolute penalties of CC and ESS(1.2) for different fraction of proxy nodes  130

PAGE 10

Figure  page

1-1 Hosted virtual machine monitor running two isolated virtual machine "guests"  22
2-1 Structured P2P routing  28
2-2 Virtualizing IP over a P2P overlay  30
2-3 Architectural overview of IPOP  31
2-4 Peer-to-peer connections in Brunet  32
2-5 Connection setup between P2P nodes  33
2-6 Distribution of round-trip latencies for ICMP/ping packets over 118-node PlanetLab overlay. Two hops separate the ping source from the destination.  38
2-7 Profiles of ICMP echo round-trip latencies and dropped packets during IPOP node join  40
2-8 Three regimes for percentage of dropped ICMP packets in UFL-NWU (first 50 packets).  42
3-1 Testbed WOW used for experiments  50
3-2 Frequency distributions of PBS/MEME job wall clock times  54
3-3 Profile of execution times for PBS-scheduled MEME sequential jobs during the migration of a worker node  59
4-1 Handling topology changes  68
4-2 Inconsistent roots in DHT  70
4-3 Example of two different WOWs with namespaces N1 and N2 sharing a common Brunet overlay  72
4-4 Events leading to virtual IP address configuration of tap interface  76
4-5 Cumulative distribution of the time taken by a new IPOP node to acquire a virtual IP address (T2 in Figure 4-4). Mean and variance are 27.41 seconds and 20.19 seconds, respectively.  77
4-6 Cumulative distribution of the time taken by a new P2P node to get connected to its left and right neighbors in the ring (T1 in Figure 4-4). Mean and variance are 4.98 seconds and 13.99 seconds, respectively.  77
4-7 Cumulative distribution of the number of different virtual IP addresses tried during DHCP. Mean and variance are 1.096 and 0.784, respectively.  78

PAGE 11

…  80
5-2 Multiple levels of NATs  82
5-3 Inconsistent roots in DHT  83
6-1 Annealing routing algorithm  89
6-2 Tunnel edge between nodes A and B which cannot communicate over TCP or UDP transports  91
6-3 At edge likelihood of 70% (0.7), the percentage of non-routable pairs varies from 9.5% to 10.9% (the total number of pairs in the simulated network is 1,000,000)  95
6-4 At edge likelihood of 70%, the percentage of wrongly routed keys varies from 9.5% to 10.7% (the total number of simulated messages is 10,000,000)  96
6-5 Comparing greedy routing with tunnel edges for m=3 and m=2. At edge likelihood of 70%, the percentage of non-routable pairs in a network of 1000 nodes is (1) 3.9% for m=2, and (2) 0.86% for m=3.  97
6-6 Average number of non-routable pairs. At edge likelihood of 70%, the percentage of non-routable pairs for greedy and annealing routing is (1) without tunnel edges, 10.26% and 3.4% respectively; (2) with tunnel edges, 0.86% and 0.21% respectively. At edge likelihoods of 95%, there are no non-routable pairs with tunnel edges.  97
6-7 Average number of wrongly routed keys. At edge likelihood of 70%, the percentage of wrongly routed keys for greedy and annealing routing is (1) without tunnel edges, 10.2% and 3.4% respectively; (2) with tunnel edges, 0.86% and 0.19% respectively.  98
7-1 Relative error for round-trip time prediction using network coordinates measured after 1, 3 and 5 hours of bootstrapping  112
7-2 Proximity neighbor selection  113
7-3 Round-trip times (RTTs) for overlay pings with PNS and without PNS. Average RTT is (1) 1.75 secs (with PNS), (2) 2.86 secs (without PNS).  117
7-4 Percentage of nodes in the queried range R that are closer to the source node than the one selected by network coordinates. On average, 8.76% of the nodes in the queried range R are closer than the selected node.  117
7-5 Relative latency error for choosing the closest node in the network coordinate space. On average, the relative latency error is 1.432.  118
7-6 Expanding Segment Search (ESS)  123

PAGE 12

…  125
7-8 Relative penalty (RP) for using a proxy node selected by CC and ESS(1.2), when all nodes in the network can serve as proxies, when only 30% of the nodes can serve as proxies.  128
7-9 Relative penalty (RP) for using a proxy node selected by CC and ESS(1.2), when all nodes in the network can serve as proxies, when only 20% of the nodes can serve as proxies.  129
C-1 Execution of a map-reduce computation over a tree of nodes  141
C-2 Bounded broadcast over a segment of the P2P ring  142

PAGE 13

Sharing of computing and storage resources among different institutions and individuals connected over the Internet is seen as a solution to meet the ever-increasing computation and storage demands of modern applications. Several factors curtail the ability of existing applications to run seamlessly on Wide-area Networks (WANs): heterogeneous resource configurations, obscured access to resources due to Network Address Translators (NATs) and firewalls, inability to express sharing policies and lack of isolation provided by operating systems.

This work addresses the problem of providing bi-directional network connectivity among wide-area resources behind NATs and firewalls. At the core of the presented approach is a self-managing networking infrastructure (IPOP) that aggregates wide-area hosts into a private network with decoupled address space management, and is functionally equivalent to a Local-area network (LAN) environment where a wealth of existing, unmodified IP-based applications can be deployed. The IPOP virtual network tunnels the traffic generated by applications over a P2P-based overlay network, which handles NAT/firewall traversal (through hole-punching techniques) and dynamically adapts its topology (through establishment of direct connections between communicating nodes) in a self-organized, decentralized manner. Together with classic virtual machine technology for software dissemination, IPOP facilitates deployment of large-scale distributed computing environments on wide-area hosts owned by different organizations and individuals. A real

PAGE 14

deployment of the system has been up and running for more than one year, providing access to computational resources for several users.

This dissertation makes the following contributions in the area of virtualization applied to wide-area networks: a novel self-organizing IP-over-P2P system with decentralized NAT traversal; decentralized self-optimization techniques to create overlay links between nodes based on traffic inspection; creation of isolated address spaces and decentralized allocation of IP addresses within each such address space using Distributed Hash Tables (DHTs); tunneling of overlay links for maintaining the overlay structure even in the presence of NATs and routing outages; and techniques for proxy discovery for tunnel nodes using network coordinates.

I describe the IPOP virtual network architecture and present an evaluation of a prototype implementation using well-known network performance benchmarks and a set of distributed applications. To further facilitate deployment of IPOP, I describe techniques that allow new users to easily create and manage isolated address spaces and decentralized allocation of IP addresses within each such address space. I present generally applicable techniques that facilitate consistent routing in structured P2P systems even in the presence of overlay faults, thereby benefiting different applications of these systems. In the context of the IPOP system, these techniques provide improved virtual IP connectivity. I also describe and evaluate decentralized techniques for discovering suitable proxy nodes to establish a 2-hop overlay path between virtual IP nodes, when direct communication is not possible.

PAGE 15

The growth of the Internet has given an opportunity to share resources such as CPU cycles and storage capacity among different institutions connected over wide-area networks. Besides collaboration, such sharing is particularly useful to meet the ever-increasing computation and storage demands of modern applications from different domains, such as high-energy physics, medical imaging, and business data analysis, among others. Several middleware solutions [1] have been proposed to facilitate resource sharing in a way that not only respects the policies defined by the resource owners, but also provides maximum flexibility to consumers. Systems have also been conceived and implemented to harness idle cycles from desktops of users connected to the Internet [2, 3]. Common to these efforts is the vision of providing computing as a utility that can be delivered by a pool of distributed resources in a seamless manner. The terms Grid and utility computing are used to refer to such systems for wide-area distributed computing.

As distributed computing moves from within an organization to a wide-area collaboration, several new challenges arise that limit the class of applications that can benefit from resource sharing, compared to local-area environments.

Firstly, resources owned by different domains tend to differ in their hardware and software configurations. Resource providers independently choose the configurations of the resources they own (including O/S kernels, libraries, and applications), which might be incompatible with application or middleware requirements. It is difficult for the application/middleware developer to maintain different distributions for every possible resource configuration [4, 5]. Secondly, the increasing use of NATs and firewall routers at Internet sites hinders bi-directional access to resources across different domains [6-8]. The network policies at each site are designed with a focus on securing hosts; often, the implementation of such policies makes it difficult to support applications from external users. Although some mechanisms to access firewalled resources (for example, secure shell

PAGE 16

To facilitate collaboration, several middleware-level solutions have been proposed. The Globus [1] toolkit provides a public-key based security infrastructure and Web-service based standards for interactions between components. BOINC [9] provides a platform for writing distributed applications that can harness CPU cycles on desktop hosts belonging to individual users connected to the Internet. However, these approaches require existing applications to be re-written with the application programming interface (API) they provide for operations such as scheduling and data transfers. The heterogeneity of wide-area hosts further complicates middleware deployment and configuration.

In this dissertation, a novel approach to wide-area distributed computing is proposed and investigated. This approach exploits virtualization to provide to applications their preferred execution environment. The execution environment of an application consists of the machine on which it runs and the network over which it communicates with other processes, middleware components and applications. In this approach, an application executes inside a virtual machine (VM) and is connected to other applications through a virtual network that is functionally equivalent to a local-area TCP/IP network.

"Classic" VMs [10-12] enable multiple operating systems, completely isolated from each other, to time-share the resources of a single machine. VMs encapsulate the software dependencies of an application, consisting of the entire OS, libraries, and applications, within a closed environment which is completely decoupled from the physical host. A VM-based execution environment can be quickly instantiated on any physical host [4, 5], with the only software dependence being the presence of a virtual machine monitor (VMM).

PAGE 17

Virtual networks [13-16] provide to applications a communication environment with decoupled IP address management and all-to-all connectivity. In a virtual network, idiosyncrasies of heterogeneous network access are handled by the virtualization layer, while applications perceive an environment that is functionally equivalent to a local-area network.

Within the context of such a virtualized distributed system, this dissertation focuses on the architecture, design, implementation and evaluation of a novel virtual networking technique. The proposed approach combines IP tunneling and peer-to-peer (P2P) overlay networking to aggregate wide-area hosts into an IP-over-P2P (IPOP) virtual network. The IPOP virtual network is architected to be scalable, fault-tolerant and self-managing, requiring minimal administrative control. The virtual network also incorporates novel techniques to protect itself from the connectivity and performance degradation due to wide-area connectivity constraints that prevent communication between pairs of nodes.

The remainder of this dissertation is organized as follows. Chapter 2 describes the IPOP architecture. Chapter 3 presents and evaluates systems which combine VM technologies and IPOP virtual networking to provide homogeneously configured Wide-area Overlays of Virtual Workstations (WOWs). Chapter 4 presents the design of a storage layer based on Distributed Hash Table (DHT) and how it can be leveraged to facilitate the deployment of WOWs. Chapter 5 describes how connectivity hazards in a wide-area network can affect consistency of P2P routing, and subsequently connectivity and performance within the IPOP virtual network. Novel techniques that facilitate consistent routing under connectivity constraints are presented in Chapter 6, and their implementation in the IPOP system is described. Chapter 7 investigates techniques to discover suitable proxy nodes to minimize end-to-end latency when direct communication is not possible, and also validates existing techniques to reduce latency of multi-hop overlay routing. Finally, the dissertation is concluded in Chapter 8.

PAGE 18

1.1.1 High-Throughput Computing

… [17, 18] consist of a shared pool of loosely-coupled compute resources managed by middleware components [19, 20] which include a job scheduler and resource monitor, among others. The system throughput can be readily increased by adding more (heterogeneous with respect to hardware configurations) compute resources. Similarly, computationally-intensive loosely-coupled parallel applications (such as master/worker applications with small communication-to-computation ratios) often achieve a lower execution time when more resources (workers) are available, irrespective of the relative CPU speeds of the workers.

Adding a new resource to the shared pool requires configuring a new host with the necessary software (OS, libraries, application binaries and middleware) and also ensuring that the new resource is accessible over the network, a process that becomes non-trivial when resources are owned by multiple organizations. This dissertation investigates mechanisms by which new resources can be added to such a pool with minimal manual intervention, with a focus on accessibility of the new resource.

PAGE 19

As an illustration, experiences with researchers in the coastal sciences application domain reveal that researchers in different universities (e.g. University of Florida, Virginia Institute of Marine Sciences) collaborate by exchanging datasets and prediction models. They run predictions by coupling models that run at different locations, and the simulation models often require not only an executable binary but also a variety of support tools (scripting languages, data conversion tools, geographical information systems). Users at each site are very familiar with OS level abstractions, such as Unix accounts and distributed file systems. Even though Grid frameworks and tools are available to facilitate cross-domain computation and data transfers, in such scenarios users still want to reuse their existing software infrastructure without extensive modifications.

1.2.1 Heterogeneity of Compute Resources

PAGE 20

Many applications have strict software dependencies and therefore cannot be deployed on arbitrary hosts. To deal with heterogeneity, application programmers must re-write their applications to fit within the resource framework they are provided. Modifying the wealth of existing applications to make them run on remote resources is seen as an obstacle to wide-area distributed computing.

Mechanisms for NAT traversal with UDP (STUN) [21] and TCP (STUNT) [22] exist, and are being used by applications such as Skype [23]. However, these mechanisms require setting up and managing publicly reachable (STUN or STUNT) servers. Furthermore, they are not directly usable by existing TCP/IP applications that were not designed to use these mechanisms.

PAGE 21

It is equally important to isolate from each other the applications from different users that share the same resource. Traditional approaches rely on the operating systems to provide the desired security and performance isolation. However, the complexity of modern operating systems introduces several loop-holes that can be exploited by malicious applications.

Virtual machines (e.g. VMware [12] and Xen [10]) were originally introduced by IBM (System/370) in 1970 to allow for time-sharing of its expensive mainframe platforms. Virtual machines allow simultaneous execution of multiple full-blown operating systems on the same host (see Figure 1-1). This process is achieved through a thin layer of software called the virtual machine monitor (VMM), which provides to each running operating system (guest) an abstraction of a complete hardware platform (CPU, memory, storage and peripherals). Virtual machines provide an interface equivalent to the ISA offered by the underlying platform and allow most instructions issued by the guest OS and guest applications to be directly executed on the CPU, thus reducing the overhead of emulation.

Virtual machines completely decouple the software environment for an application from the underlying physical host. The entire execution environment (operating system, libraries and application binaries) can be encapsulated into a single large file (called a VM

PAGE 22

Figure 1-1. Hosted virtual machine monitor running two isolated virtual machine "guests"

image), that can be copied and instantiated on any physical host with a suitable virtual machine monitor. This encapsulation enables homogeneous configurations of wide-area resources using unmodified middleware and applications, without interfering with the local site policies. In [24], the authors propose to use VM images to deploy and maintain software in an organization. The entire memory and CPU state of a VM can be checkpointed and resumed, thus allowing for migration of unmodified applications. VMs confine the guest applications within a closed sandbox, which can prevent a malicious application from causing harm to the physical host resources.

A virtual machine monitor operates at a level where it has to manage only a few resources (CPU, memory and devices) as opposed to a modern OS that manages several entities (users, files, processes etc). This simplicity not only makes VMMs more secure than modern OSes, but also makes resource usage policies easy to express and enforce [10].

PAGE 23

Virtual networks [13-15] decouple the network environment of an application from the physical environment and provide an opportunity to aggregate wide-area resources. The virtualized network has its own address space, and all application traffic is isolated from the physical network. The virtualization layer handles all complications relating to the presence of NATs/firewalls, thus presenting to applications an environment similar to local-area networks.

… [13-15] techniques exist, but impose non-trivial management overhead.

In this work, I focus on the IPOP self-managing virtual network, which uses P2P techniques for overlay routing and decentralized establishment of direct connections among nodes behind NAT/firewall routers. IPOP can scale to a large number of hosts, and is highly resilient to node and link failures. The novelty of this work lies in the application of structured P2P techniques to the different aspects of the virtual network: address space management, route discovery and NAT-traversal. In combination with classic VM technology, IPOP facilitates creation of homogeneously configured Wide-area clusters of Virtual Workstations (WOWs). These systems support execution/checkpoint/migration

PAGE 24

… [1], Condor [18]), thus providing excellent infrastructure for deployment of desktop grids and cross-domain collaboration.

Within the context of the IPOP system, the key contributions of this work are listed below:

1. … [25] from Microsoft) require highly-available servers for out-of-band exchange of information relevant to NAT-traversal. The decentralized NAT-traversal presented here uses structured P2P routing for the out-of-band messaging between NATed hosts, requiring only an easy-to-manage seed network of public nodes. The implementation of this technique in the IPOP system is described in Chapter 2.

2. … Chapter 4.

3. … [26] and several applications and services that build on top of it. I have developed generally applicable techniques to facilitate consistent P2P routing in the presence of overlay faults. These techniques and the subsequent improvements in routing are presented in Chapter 6.

4. … [27] is a well-known technique to reduce latency of structured P2P routing, by selecting overlay links based on proximity. To assess network proximity without explicit latency measurements, a technique called network coordinates [28] is used that allows embedding node latencies in a low-dimensional space such that the distance in the coordinate space provides an estimate of latency. I have investigated the usefulness of PNS based on network coordinates as a means to improve route latency of the IPOP overlays. Chapter 7 describes the technique and presents analysis of a deployment on PlanetLab [29].

5. … Chapter 7).

PAGE 25

The increasing use of Network Address Translators (NATs) and firewalls creates a situation where some nodes on the network can create outgoing connections, but cannot receive incoming connections. This lack of bi-directional connectivity is recognized as a hindrance to programming and deploying distributed applications [6, 7, 30]. Protocols for NAT/firewall traversal [21] exist, but require applications to be re-linked with the new protocol libraries.

Virtual networks [13-15] can provide to applications running in a virtualized infrastructure the perception of an environment functionally identical to a local-area network, despite the presence of NATs and firewall routers in the physical infrastructure. Virtual networks also confine communication of a distributed application within an environment that is logically isolated from the physical network infrastructure, thus reducing vulnerability to non-participating hosts and users at a site.

In the current techniques for network virtualization, overlay routing tables are either set up by an administrator or rely on virtual network routers/switches to have all-to-all connectivity among themselves. Hence, the process of adding, configuring and managing clients and servers that route traffic within the overlay is difficult to scale. Although topology adaptation is possible using techniques proposed in [31], adaptive routes are coordinated by a centralized server. These approaches can provide a robust overlay through redundancy. However, the effort required to preserve robustness would increase every time a new node is added and the network grows in size.

For wide-area collaborations, it is also necessary that a network virtualization technique is scalable, fault-tolerant and requires minimal administrative control. This work presents IPOP, a network virtualization technique based on IP tunneling over peer-to-peer (P2P) networks that meets these requirements. P2P networks are

PAGE 26

The rest of this chapter is organized as follows. Section 2.1 gives an overview of P2P networks, applications and different P2P architectures. In Section 2.3, I present the IPOP architecture, and mechanisms to discover, establish and maintain overlay links between nodes behind NATs/firewalls. A comprehensive evaluation of the virtual network performance is presented in Section 6.5.

As desktop computers become more and more powerful, there is an increasing interest to use resources from commodity computers at the edge of the Internet. Peer-to-peer (P2P) networks refer to distributed computing architectures that are designed to facilitate this exchange of computer resources (content, storage and CPU cycles) by direct exchange, rather than requiring intermediation from centralized servers. P2P networks are designed to function, scale and self-organize in the presence of a highly transient population of nodes, and that of network and computer failures. Applications of P2P networks include:

1.

PAGE 27

… [32], Gnutella [33], Kazaa [34], Oceanstore [35], PAST [36], CFS [37].

2. …

The first generation of P2P networks (such as the original Napster system) required centralized servers to keep track of the location of different files in the P2P network. Once a file was correctly located, it could be transferred directly without involving the server. Gnutella and Kazaa alleviated the requirement of high-capacity centralized servers by storing a distributed index of files across a set of P2P nodes. A common point among these protocols was that none of them imposed any structure on the overlay topology, and a file could potentially be located on any node in the network. These P2P networks thus incurred high overheads on searching for files. Caching techniques reduced the search cost, but limited the applicability of these systems to immutable data. Nevertheless, the lack of structure also made these protocols highly resilient to churn.

In the next generation of P2P systems [38-41], the P2P nodes arrange themselves into a well-defined topology (such as a ring, or a hypercube) based on their P2P identifiers (also called P2P addresses) chosen from a large address space. The overlay topology and routing protocols bound the number of hops (with asymptotic complexity sub-linear with respect to the size of the network) between P2P nodes.

Figure 2-1 illustrates routing over a ring-structured P2P network. Each node maintains a routing table with network endpoints (IP address and port) of only a few nodes in the system. At each node, the following routing rule is executed: if the destination appears in the local routing table, the message is directly communicated to the destination; otherwise it is sent to the node (in the routing table) that is closest to the destination; and in case the current node is closest, the message is delivered locally.
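A minimal sketch can make this greedy rule concrete. The Python fragment below is illustrative only; the node and address representations are assumptions made for the example, not Brunet's implementation. Addresses live on a 160-bit ring, and each hop forwards the message to the routing-table entry closest to the destination, delivering locally when no entry is closer.

    # Illustrative sketch of greedy routing on a structured ring (not Brunet code).
    # Assumption: addresses are integers on a ring of size 2**160, and each node
    # knows a small routing table of {address: node} entries (near and far links).
    RING = 2 ** 160

    def ring_distance(a, b):
        """Shortest distance between two addresses along the ring."""
        d = abs(a - b) % RING
        return min(d, RING - d)

    class Node:
        def __init__(self, address):
            self.address = address
            self.routing_table = {}   # address -> Node

        def route(self, dest, message):
            # Rule 1: destination is directly connected, deliver to it.
            if dest in self.routing_table:
                self.routing_table[dest].deliver(message)
                return
            # Rule 2: forward to the known node closest to the destination.
            closest = min(self.routing_table.values(),
                          key=lambda n: ring_distance(n.address, dest),
                          default=self)
            if ring_distance(closest.address, dest) < ring_distance(self.address, dest):
                closest.route(dest, message)
            else:
                # Rule 3: the current node is closest, deliver locally.
                self.deliver(message)

        def deliver(self, message):
            print(f"node {self.address}: delivered {message!r}")

Run against the example of Figure 2-1, a message from node 100 to address 130 would hop through 118 and 128, since each of those is the closest known node to the destination at that step.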

PAGE 28

To illustrate P2P overlay routing techniques, in Figure 2-1(A) node 100 sends a message to address 130. The message is forwarded to node 118 and subsequently to node 128, which has node 130 in its routing table. Eventually, the message is forwarded to node 130 where it is delivered to the local application. In Figure 2-1(B), node 100 sends a message to address 133 (destination node not present in the network); the message is routed through nodes 118, 128 and 130 to node 134 (closest to address 133), where it is delivered to the local application.

Figure 2-1. Structured P2P routing

Structured P2P systems provide an object storage facility called a Distributed Hash Table (DHT). Each object is associated with a key that belongs to the same address space as node identifiers. The ownership of keys is partitioned among participating nodes, such that each key is stored on a set of nodes that are closest to the key in the identifier space. This partitioning of key ownership together with efficient routing bounds the lookup overhead for an object stored in the DHT.
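The key-ownership rule can be illustrated in a few lines. In this sketch the replication factor of two is an arbitrary assumption made for the example; only the rule itself (a key is stored on the nodes closest to it in the identifier space) comes from the text.

    # Sketch: which nodes own a DHT key under the "closest in identifier space" rule.
    RING = 2 ** 160

    def ring_distance(a, b):
        d = abs(a - b) % RING
        return min(d, RING - d)

    def owners(key, node_addresses, replicas=2):
        """Return the `replicas` node addresses closest to `key` on the ring."""
        return sorted(node_addresses, key=lambda n: ring_distance(n, key))[:replicas]

    nodes = [100, 118, 128, 130, 134]      # toy identifiers from Figure 2-1
    print(owners(133, nodes))              # -> [134, 130]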

PAGE 29

… [42][43]. An IP-over-P2P overlay benefits from the synergy of fault-tolerant techniques applied at different levels. The IPOP overlay dynamically adapts routing of IP packets as nodes fail or leave the network; even if packets are dropped by such nodes, IP and other protocols above it in the network stack have been designed to cope with such transient failures.

… [21][22] to traverse NATs/firewalls. These approaches require setting up globally reachable STUN or STUNT servers that aid building the necessary NAT state by carefully crafted exchange of packets. With P2P networks, each overlay node can provide this functionality for detection of NATs and their subsequent traversal. This approach is decentralized and introduces no dedicated servers.

A useful application for IPOP is in the area of Grid Computing [1], where wide-area nodes (irrespective of their physical locations) can be aggregated into a virtual IP network (Figure 2-2). The IPOP layer sits between applications (e.g. Grid clusters of physical and/or virtual machines, voice-over-IP) and physical computing nodes interconnected by existing IP networking infrastructures. This virtual network is completely decoupled from the physical network, which not only isolates the Grid application traffic, but also allows for migration of virtual IP nodes into new subnets.

PAGE 30

Figure 2-2. Virtualizing IP over a P2P overlay

The IPOP architecture is described next.

IPOP consists of two components (Figure 2-3): a virtualized network interface for capturing and injecting IP packets into the virtual network, and a P2P node that encapsulates, tunnels and routes packets within the overlay. IPOP builds upon a user-level framework provided by the Brunet P2P protocol suite [44], which provides mechanisms to discover, establish and maintain overlay links between nodes, even across NATs and firewalls. The next section describes how IP packets are captured/injected at the end hosts, and routed on the Brunet P2P network; and the following section will discuss mechanisms for overlay link setup and NAT/firewall traversal.

PAGE 31

Figure 2-3. Architectural overview of IPOP

Figure 2-3 shows the flow of data between two applications communicating over the virtual IP network provided by IPOP: 1) Application (on left) sends data to a virtual IP destination (src: 172.16.0.2, dest: 172.16.0.18). 2) IPOP reads out the ethernet frame from the tap and extracts the virtual IP packet, 3) The virtual IP packet is encapsulated inside a P2P (Brunet) packet addressed to P2P node B (right) associated with the virtual IP destination, 4) and then routed within the P2P overlay to a destination node B. 5) At node B, IPOP extracts the virtual IP packet from the P2P packet, 6) builds an ethernet frame that it injects into the tap. 7) Eventually, data is delivered to application (on B). While IPOP sees Ethernet frames, it only routes IP packets; non-IP traffic, notably ARP traffic, is contained within the host.

The P2P address of an IPOP node is the 160-bit SHA-1 hash of the IP address (virtualized IP) of the tap device, and this is used for mapping a destination virtual IP address to the P2P address (of the destination IPOP node) and vice-versa.

… [44], particularly the mechanisms that enable new nodes to join an existing P2P network and connections to form between nodes behind NATs. The term "connection" is used to refer to an overlay link between P2P nodes over which packets are routed.
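As a rough illustration of the virtual-IP-to-P2P-address mapping described above, the fragment below hashes a virtual IP with SHA-1 to obtain a 160-bit identifier. The choice of byte encoding for the address is an assumption made for this example; the deployed system may serialize the IP differently.

    # Sketch of deriving a 160-bit P2P address from a virtual IP via SHA-1.
    # The dotted-quad-to-bytes encoding below is an illustrative assumption.
    import hashlib
    import ipaddress

    def p2p_address(virtual_ip: str) -> int:
        """Return a 160-bit integer identifier derived from the virtual IP."""
        packed = ipaddress.IPv4Address(virtual_ip).packed  # 4-byte form of the IP
        digest = hashlib.sha1(packed).digest()             # 20 bytes = 160 bits
        return int.from_bytes(digest, "big")

    print(hex(p2p_address("172.16.0.18")))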

PAGE 32

Figure 2-4. Peer-to-peer connections in Brunet

Brunet maintains a structured ring of P2P nodes ordered by 160-bit Brunet addresses (Figure 2-4). Each node maintains connections to its nearest neighbors in the P2P address space, called structured near connections.

… [45]. The system supports decentralized traffic inspection and bootstrapping of direct overlay connections between frequently communicating P2P nodes, which we call shortcut connections.

Brunet uses greedy routing of packets over structured (near and far) connections, where at each overlay hop the packet gets closer to the destination in the P2P address space. The packet is eventually delivered to the destination; or, if the destination is down, it is delivered to its nearest neighbors in the P2P address space.

PAGE 33

Figure 2-5. Connection setup between P2P nodes

Connections between Brunet nodes are abstracted and may operate over any transport. The information about the transport protocol and the physical endpoint (e.g. IP address and port number) is contained inside a Uniform Resource Indicator (URI), such as brunet.tcp:192.0.1.1:1024. Note that a P2P node may have multiple URIs, if it has multiple network interfaces or if it is behind one or more levels of network address translation. The encapsulation provided by URIs provides extensibility to new connection types; currently there are implementations for TCP and UDP transports.

Figure 2-5 illustrates connection setup between two P2P nodes. The mechanism for connection setup between nodes consists of conveying the intent to connect, and resolution of P2P addresses to URIs followed by a linking handshake, which are summarized as follows:

PAGE 34

2. … (Section 2.3.2.3)

Nodes keep an idle connection state alive by periodically probing each other through ping messages, which also involves resending of unresponded pings and exponential back-offs between resends. A succession of unresponded pings is perceived as the target node going down or a network outage, and the current node discards the connection state. These ping messages incur bandwidth and processing overhead at end nodes, which restricts the number of connections a node can maintain.

It should be noted that the linking protocol is initiated by both the peers, leading to a potential race condition that must be broken in favor of one peer succeeding while the other fails. Therefore, each node records its active linking attempt to the target before sending out a link request. If the current node now gets a link request from its target, it responds with a link error message stating that the target should give up its active attempt and let the current node go ahead with the protocol. The target node gives up its connection attempt, and eventually the current node would succeed. It is possible that both the nodes (current and target) initiate active linking, get link error messages

PAGE 35

The new node must now identify its correct position in the ring, and form structured near connections with its left and right neighbors. It sends a CTM request addressed to itself on the network through the leaf target. The CTM request is routed over the structured network, and (since the new node is still not in the ring) eventually delivered to its two nearest neighbors. The CTM replies received by the forwarding node are passed back to the new node. The node now knows the URIs of its nearest (left and right) neighbors and vice versa, and can form structured near connections with them. At this point, the node is fully routable. The time taken for a new node to join an existing P2P network of more than 130 nodes and become fully-routable is found to be of the order of seconds (see Figure 2-7).
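The effect of routing a CTM request to one's own address can be shown with a small model (a simplification, not Brunet's API): on a sorted ring of existing identifiers, the request ends up at the joining node's immediate predecessor and successor, which are exactly the left and right neighbors it must link to.

    # Sketch: the two nodes that receive a CTM request addressed to a joining
    # node's own address are its future left and right neighbors on the ring.
    # Addresses are small integers here purely for illustration.

    def ring_neighbors(new_address, existing_addresses):
        """Return the (left, right) neighbors of new_address on the ring."""
        ring = sorted(existing_addresses)
        right = next((a for a in ring if a > new_address), ring[0])            # successor, wrapping
        left = next((a for a in reversed(ring) if a < new_address), ring[-1])  # predecessor, wrapping
        return left, right

    ring = [100, 118, 128, 130, 134]
    print(ring_neighbors(125, ring))   # -> (118, 128): the nodes that send CTM replies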

PAGE 36

… [45]) on the ring using protocols described in Section 2.3.2.1.

… [21] protocol, there are four types of NATs in common use today (described in Appendix A). Of these four types, all have the property that if a UDP packet is sent from IP address A port pa to IP address B port pb, the NAT device will allow packets from IP address B port pb to flow to IP address A port pa. In addition to the above property, three out of four of the common NAT types (all but the symmetric) use the same mapping from the NAT's port to the internal (IP, port) pair, irrespective of the destination (IP, port). The UDP transport implementation of Brunet is designed to deal with NAT traversal for the large class of NAT devices found in practical deployments. The bi-directionality of the linking handshake is what enables nodes to punch holes into their own NATs as described in STUN [21, 46, 47]. This happens because one of the incoming packets is perceived as a reply to an outgoing packet, and allowed to pass. Furthermore, this approach is decentralized and introduces no single points of failure or dedicated servers (unlike the STUN protocol).

Based on the description of URIs presented earlier, it follows that a node inside a private network behind a NAT can have multiple URIs (corresponding to the private IP/port, and the NAT-assigned IP/port when it communicates with nodes on the public Internet). Furthermore, not all URIs can be used to communicate with it. Which URIs are usable depends on the locations of the communicating nodes and the nature of the NATs. For example, two nodes behind a NAT that does not support "hairpin" translation [46] can communicate only using URIs corresponding to their private IP/port, and they fail when using the NAT-assigned IP/port. In contrast, two nodes behind "hairpin" NATs

PAGE 37

Since private IP addresses are not unique across LANs, it is possible that trying a URI with a private address leads to communication with a node other than the intended connection target. However, each node has a unique P2P address. Linking messages contain the P2P addresses of both the peers. This information is used to detect such false hits in the same LAN and suppress the linking attempt to the intended target using that URI.

Figure 2-6 shows observed latencies of as high as 1600 ms between IPOP nodes (in [16]) connected to a P2P network of over 100 nodes on PlanetLab. These high latencies were due to multi-hop overlay routing through highly loaded PlanetLab nodes. This section describes the technique for decentralized adaptive shortcut creation, which enables the setup of single-hop overlay links on demand, based on traffic inspection. Section 6.5 shows that shortcuts greatly reduce latency and improve bandwidth of the virtual network.

PAGE 38

Figure 2-6. Distribution of round-trip latencies for ICMP/ping packets over 118-node PlanetLab overlay. Two hops separate the ping source from the destination.

The Brunet P2P library is an extensible system which allows developers to add new routing protocols and connection types. For each connection type, a P2P node has a ConnectionOverlord that ensures the node has the right number of connections of that type.

To support shortcut P2P connections, I have implemented a ShortcutConnectionOverlord within the Brunet library. The ShortcutConnectionOverlord at a node tracks communication with other nodes using a metric called score. The algorithm is one based on a queueing system. The number of packets that arrive in the i-th unit of time is a_i. There is a constant service rate on this work queue. The score is the amount of remaining work left in this virtual queue. If the score at time i is s_i, and the rate of service is c, it follows:

    s_{i+1} = max(s_i + a_i - c, 0)

The higher the score of a destination node, the more communication there has been with it. The nodes for which the virtual queue is the longest are the nodes it connects to. The ShortcutConnectionOverlord establishes and maintains shortcut connections with nodes whose scores exceed a certain threshold.
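A compact sketch of this scoring rule follows. Only the update rule s_{i+1} = max(s_i + a_i - c, 0) comes from the text; the service rate, threshold value and per-destination bookkeeping are assumptions made for the illustration.

    # Sketch of the per-destination traffic score used to trigger shortcut creation.
    # Update rule from the text: s_{i+1} = max(s_i + a_i - c, 0).
    from collections import defaultdict

    class ShortcutScorer:
        def __init__(self, service_rate=10.0, threshold=50.0):
            self.c = service_rate               # packets drained per time unit (assumed value)
            self.threshold = threshold          # score above which a shortcut is requested (assumed)
            self.scores = defaultdict(float)    # destination P2P address -> score

        def tick(self, arrivals):
            """arrivals: dict of destination -> packets observed in this time unit.
            Returns destinations whose backlog warrants a shortcut connection."""
            for dest in set(self.scores) | set(arrivals):
                a_i = arrivals.get(dest, 0)
                self.scores[dest] = max(self.scores[dest] + a_i - self.c, 0.0)
            return [d for d, s in self.scores.items() if s > self.threshold]

    scorer = ShortcutScorer()
    for _ in range(10):
        busy = scorer.tick({"node-B": 20, "node-C": 3})   # node-B is a heavy destination
    print(busy)   # ['node-B']: sustained traffic to node-B exceeds the threshold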

PAGE 39

The next few sections present experimental results comparing bandwidth and latency of the virtual network with and without the adaptive shortcuts. I also report the time required for the network to adapt and create shortcuts between communicating nodes.

PAGE 40

… 3-1. The nodes in UFL were located in the same private network behind a site NAT, while the ones at NWU were in separate private networks (behind VMware NATs on two different hosts) behind a common site firewall.

B) Dropped packets
Figure 2-7. Profiles of ICMP echo round-trip latencies and dropped packets during IPOP node join

PAGE 41

Figure 2-7 summarizes the results from this experiment. The experiment considers three combinations of the location of the node A joining the network and the node B it communicates to: UFL-UFL, UFL-NWU and NWU-NWU.

Figure 2-7A: The plot shows latencies averaged over 100 trials as reported by the ping application for packets which were not dropped.

Figure 2-7B: The plot shows the percentage of lost packets (over 100 trials) for each ICMP sequence number as reported by the ping application.

Focusing initially on the UFL-NWU case, analyzing the data for the initial tens of ICMP packets shows three different regimes (see Figure 2-8). For the first three ICMP requests, on average, 90% of the packets are dropped.

The UFL-UFL case also reveals the same three regimes; however, the timings differ. It takes up to about 40 ICMP ping packets before the node becomes routable over the P2P network. Furthermore, it takes about 200 ICMP packets before shortcut connections are formed. This high delay is because of the nature of the UFL NAT and the implementation

(Footnote: … Figure 2-7, these few initial packets do not appear on the plot.)

PAGE 42

Figure 2-8. Three regimes for percentage of dropped ICMP packets in UFL-NWU (first 50 packets).

of the IPOP linking protocol, as follows. The UFL NAT does not support "hairpin" translation [46], i.e. it discards packets sourced within the private network and destined to the NAT-translated public IP/port. As described in Section 2.3.2.1, the linking handshake involves nodes trying target URIs one by one until finding one on which they can send and receive handshake messages. In IPOP, nodes first attempt the URIs corresponding to the NAT-assigned public IP/port for the linking handshake during the connection setup. Because of conservative estimates for the re-send interval, the back-off factor and the number of retries for UDP tunneling, nodes take several seconds before giving up on that URI and trying the next in the list (private IP/port), on which they succeed. In Section 2.4.3, I describe implementation-level optimizations that increase the likelihood of picking the correct URI in the first attempt, thus reducing the connection setup delay in the UFL-UFL case to a few seconds.

For the NWU-NWU case, the two nodes are either inside the same private network (the VMware NAT network), or on different hosts. The VMware NAT supports hairpin translation, hence both URIs for the P2P node work for connection setup. As with the UFL-NWU case, the linking protocol succeeds with the first URI it tries and hence the shortcut connections are set up within a few (about 20) ICMP packets.

The bandwidth improvements over the original P2P routing by enabling the shortcut connection setup, between two IPOP nodes communicating over the virtual network,

PAGE 43

Table 2-1. Bandwidth measurements between two IPOP nodes: with and without shortcuts

             Shortcuts enabled                    Shortcuts disabled
             Bandwidth (Mbps)  Std. dev (Mbps)    Bandwidth (Mbps)  Std. dev (Mbps)
UFL-UFL      12.91             0.74               0.67              0.024
UFL-NWU      10.00             1.62               0.68              0.018

were evaluated using the TestTCP (ttcp) utility. This utility is used to measure the end-to-end bandwidth achieved in transfers of large files. Table 2-1 shows the average bandwidth and standard deviation measurements between two IPOP nodes: with and without shortcuts. The experiment considers 12 ttcp-based transfers of files of three different sizes (695 MB, 50 MB and 8 MB) and two scenarios for the location of nodes: UFL-UFL and NWU-UFL. Two factors along the routing path limit the bandwidth between two nodes: first, the bandwidth of the overlay links, and second, the very high CPU load of machines hosting the intermediate IPOP routers, which reduces the processing throughput of the user-level IPOP implementation. Without shortcut connections, nodes communicated over a 3-hop communication path traversing the heavily loaded PlanetLab nodes and a very low bandwidth was recorded. However, with shortcuts enabled, nodes communicate over a single overlay hop, thus achieving a much higher bandwidth.

The preceding sections of this chapter have reported on the delays incurred by a new node to join the P2P network and become fully routable, and also improvements in latency and bandwidth by using shortcuts as opposed to the multi-hop routing path. Further assuming that shortcut connections are always established between two communicating IPOP nodes, the next section quantifies the latency and bandwidth overhead (due to virtualization) on a single IPOP link (using a single P2P hop) over the physical network (direct IP level communication).

… [16]. Based on several optimizations


Based on several optimizations to the Brunet P2P library and IPOP [16], this section presents a similar comparison between IPOP-UDP and the physical network. Latency was measured from the round-trip times of ICMP pings, and also by the rate of TCP request/response transactions from the netperf benchmark. The bandwidth measurements were performed using iperf. The configurations of the machines used for these experiments are shown in Table 2-2. The LAN experiments (latency and bandwidth) refer to machines A and B connected to the same 100Mbps switch. The WAN latency experiments refer to machines C and D, while the WAN throughput experiments refer to machines C and E connected via Abilene.

Table 2-2. Configurations of machines used for evaluating the performance of a single IPOP link
  Machine   Host type                  CPU                 Location
  A         Physical                   Pentium-4 1.8GHz    UFL
  B         Physical                   Pentium-4 1.7GHz    UFL
  C         Virtual (VMware ESX 3.0)   Xeon 3.2GHz         UFL
  D         Physical                   Pentium 2400MHz     UCLA
  E         Physical                   Pentium-4 3.0GHz    VIMS

Table 2-3 summarizes the ping round-trip times for the latency experiments on an IPOP link using a single P2P hop. The latency overhead for LAN and WAN is observed to be 4ms and 15ms, respectively. The higher overhead on WAN is under investigation.

Table 2-3. Mean and standard deviation of 10000 ping round-trip times for IPOP-UDP and the physical network
                             mean (msec)   std. dev (msec)
  LAN (A and B)  physical    0.240         0.028
                 IPOP-UDP    4.25          1.43
  WAN (C and D)  physical    66.13         1.20
                 IPOP-UDP    80.54         14.27

Latencies of the order of milliseconds per packet have also been reported in the context of other user-level routing systems, such as VNET [13]. The LAN experiment provides a rough estimate of the overhead associated with the implementation of IPOP. This overhead is attributed to the traversal of the kernel TCP/IP stack twice by any packet sent on the virtual network (once on the virtual interface, and additionally on the physical interface). While the relative overhead is high in the LAN environment, for the WAN used in this experiment the overhead is 31% of that of the physical network. In a WAN, the overhead of user-level routing gets amortized over the number of Internet hops (in our case, 10) that make up a P2P link.

Table 2-4 presents the rate of TCP request/reply transactions over IPOP and the physical network, measured with netperf with a 1-byte payload. Each transaction involves sending a request to another node (which, on getting the request, immediately sends a response back) and waiting for the response. The netperf benchmark measures the number of such back-to-back transactions that can be carried out within a time interval. The higher the latency, the higher the time per transaction, and thus the lower the transaction rate. Netperf measurements are representative of the latencies (the multiplicative inverse of the transaction rate) incurred by applications, which are typically based on TCP/IP. With IPOP, we observe a low transaction rate on the local area (2% of physical), while on the wide area IPOP is able to achieve up to 90% of the rate of the physical network.

Table 2-4. Mean and standard deviation of the rate of TCP request/response transactions measured over 100 samples with netperf for IPOP-UDP and the physical network
                             mean (trans/sec)   std. dev (trans/sec)
  LAN (A and B)  physical    8148.6             942.53
                 IPOP-UDP    183.6              45.19
  WAN (C and D)  physical    14.91              0.28
                 IPOP-UDP    13.40              0.34

Table 2-5 compares the throughput of a single IPOP link to that of the physical network, for both LAN and WAN scenarios.

Table 2-5. Comparison of the throughput of a single IPOP link in LAN and WAN environments
                                       Abs. B/W (Mbps)   Rel. B/W (IPOP/Phys)
  LAN (A and B)  physical              93.8
                 IPOP-UDP (Host A)     23.91             25%
                 IPOP-UDP (Host B)     29.58             32%
  WAN (C and E)  physical              13.9
                 IPOP-UDP (Host E)     12.5              90%


As reported in Section 2.4.1, delays greater than 150 seconds were observed for direct connection setup between two nodes at UFL behind a site NAT which does not support "hairpinning". Given the conservatively chosen timeouts for packet resends in the linking protocol, this delay happened because nodes first tried the wrong URIs (corresponding to the NAT-assigned IP/port of the peer) for linking. The linking protocol has been extended so that nodes try each other's URIs (five in parallel), allowing them to right away try the correct URI for linking. This enhancement has resulted in UFL-UFL connection setup being as fast (within a few seconds) as in the other scenarios.

2.5.1 Resource Virtualization

Several efforts [48][5][49][50][4] recognize the usefulness of virtual machines (VMware [12][51], Xen [10]) as execution environments for Grid applications. Such environments can be deployed independent of the physical setup at each Grid site. In addition, VIOLIN [14], VNET [13][31], and ViNe [15] have also recognized the utility of network overlays in wide-area distributed environments. In these techniques, it is necessary for administrators to set up overlay links, and a centralized entity is needed to orchestrate network adaptation in [31]. The described approach is fundamentally different in its use of a P2P-based overlay, which nodes can join and leave in a completely decentralized, self-organizing fashion.

P2P techniques have also been applied to Grid computing [52]. In [53] Cheema et al. and in [54] Iamnitchi et al. have investigated P2P discovery of Grid resources, and in [55] Cao et al. have proposed a P2P approach to task scheduling in computational grids. Related to this work are the Jalapeno [56], OrganicGrid [57], OurGrid [58] and ParCop [59] projects, which also pursue decentralized computing using P2P technology. There is also an existing body of research on various ways in which P2P systems can be applied to existing IP systems. In [60] Cox et al. have proposed to build a Distributed DNS using DHash, a peer-to-peer distributed hash table built on top of Chord [40]. The IPOP system currently applies P2P techniques to achieve self-configuration of overlay network links, enabling efficient and easy-to-deploy virtual private networks on which applications, including existing P2P-based systems ([36][37]), can be deployed without concern for NAT/firewall traversal.

The use of a P2P-based overlay to support legacy applications has also been described in the context of i3 ([61][62]). The goal there is to support interoperability with new I3 applications that support multicast, anycast and mobility. In contrast, the motivation of my research is to provide seamless access to Grid resources spanning different network domains by aggregating them into a virtual IP network that is completely isolated from the physical network.

Zhou et al. have developed P6P [63][64], an implementation of IPv6 on a P2P overlay. The Teredo [25] protocol developed by Microsoft tunnels IPv6 packets over IPv4 UDP packets to enable nodes behind NATs to be addressed with IPv6 connectivity. My focus, on the other hand, is to enable existing grid applications (typically based on IPv4) to run unmodified on the wide area; few existing applications support IPv6.

Existing NAT traversal techniques ([46][47][21][22]) require publicly available rendezvous servers for out-of-band signalling and the exchange of NAT-assigned IP addresses and port numbers. For example, the Teredo protocol, which tunnels IPv6 inside IPv4 UDP messages, requires maintaining public Teredo servers; the IPv6 address of a Teredo client is derived from the IPv4 addresses of the corresponding Teredo server.




Inthischapter,Idescribehowself-conguringvirtualnetworkingthroughIPOPcanbecombinedwithvirtualmachinetechnologytocreatescalablewide-areaoverlaynetworksofvirtualworkstationscalledWOWs.Thesesystems:(1)facilitatetheadditionofnodestoapoolofresourcesthroughtheuseofsystemvirtualmachines(VMs)andself-organizingvirtualnetworklinks,(2)maintainIPconnectivityevenifVMsmigrateacrossnetworkdomains;(3)presenttoend-usersandapplicationsanenvironmentthatisfunctionallyidenticaltoalocal-areanetworkorclusterofworkstations[ 65 ].Bydoingso,WOWnodescanbedeployedindependentlyondierentdomains,andWOWdistributedsystemscanbemanagedandprogrammedjustlikelocal-areanetworks,reusingunmodiedsubsystemssuchasbatchschedulers,distributedlesystems,andparallelapplicationenvironmentsthatareveryfamiliartosystemadministratorsandusers. Furthermore,WOWnodescanbepackagedasVM\appliances"[ 24 ]thatcanbeinstantiatedwithoutdisruptingthecongurationofexisting,commoditydesktopswithavarietyofhostedI/Ovirtualizationtechnologies(e.g.VMware,Parallels,LinuxKVM).ThesecharacteristicsmakeWOWsanexcellentinfrastructureforthedeploymentofdesktopgridsthatsupportnotonlyapplicationsdesignedforsuchenvironments,asin[ 2 3 66 67 ]andsystemsbasedonBOINC[ 9 ],butalsocomplex,full-edgedO/Senvironmentswithunmodiedsoftwareandmiddlewarecomponents(e.g.Condor[ 18 { 20 ]). Experimentswitharealisticdeploymentconsistingof118routernodesonPlanetLab[ 29 ]and33computenodesacrosssixdierentrewalleddomains(Figure 3-1 ),demonstratetheabilityofWOWs(1)toestablishdirectoverlaylinksbetweenIPOPnodes(2)tosupportexistingmiddlewareandcompute-intensiveapplicationsanddelivergoodperformance,and(3)toautonomouslyre-establishvirtualnetworklinksafteraVMmigratesacrossawide-areanetwork,andsuccessfullyresumetheexecutionofTCP/IPclient/serverapplicationsinamannerthatiscompletelytransparenttotheapplications. 49


TestbedWOWusedforexperiments Therestofthischapterisorganizedasfollows.In 3.1 ,IdescribehowWOWnodesareconguredanddeployed.Section 3.1.4 evaluatesatestbedWOWprototype.AcaseforusingWOWstosetupadhocCondorpoolsforhigh-throughputcomputingispresentedinSection 3.3 .IdescriberelatedworkinSection 3.2 .InSection 3.4 ,IlistthevariousdecienciesincurrentIPOPprototypethathinderthedeploymentofWOWsbynewusers;solutionstotheseproblemsarepresentedinsubsequentchaptersofthisdissertation. 3.1.1BackgroundandMotivations 1 ].Attheresourcelevel,systemcongurationheterogeneityanddicultytoestablishconnectivityamongmachinesduetotheincreasinguseofNATs/rewalls[ 7 ]substantiallyhindersharingofresources.WOWsaredesignedtofacilitatetheaggregationofresourcesinanenvironmentwheresystemsindierentdomainshavedierenthardwareandsoftwarecongurationsandaresubjecttodierent 50


Virtualizationallowsforisolated,exibleandecientmultiplexingofresourcesofashared,distributedinfrastructure[ 48 ].WiththeuseofVMs,thenativeorpreferredsoftwareenvironmentforapplicationscanbeinstantiatedonanyphysicalhost,replicatedtoformvirtualclusters[ 68 ],andcheckpointed/migrated[ 69 ]toenableuniqueopportunitiesforloadbalancingandfaulttolerance. 16 ]virtualnetworkconsistingof:mono.NETruntimeenvironment,a\tap"device(Figure 2-3 ),andashort(tensoflines)congurationscripttolaunchIPOPandsetuptheVMwithanIPaddressontheoverlay.ThecongurationscriptspeciesthelocationofatleastoneIPOPnodeonthepublicInternettoestablishP2Pconnectionswithothernodes.Currently,weuseanoverlaydeployedonPlanetLabforthispurpose. OneimportantadvantageofavirtualnetworksupportingseamlessNATtraversalisthatincaseswheretheVMmonitorprovidesaNAT-basedvirtualnetworkinterface 51


Alternatively,itisalsopossibletorunIPOPonthephysicalhostthathoststheVM,andstillbeabletocapture/injectvirtualIPtrac.TheVM'sethernetinterfacehasanIPaddressfromthevirtualaddressspace,andallnetworkvirtualizationmechanismstakeplaceoutsidetheVM.Atthecostofextraconguration(installingIPOP)onthehost,suchamodelcompletelyconnestheVMtracwithinavirtualnetwork. 24 ]isconguredonce,thencopiedanddeployedacrossmanyresources,facilitatingthedeploymentofopenenvironmentsforgridcomputingsimilarinnaturetoeortssuchastheOpenScienceGrid(OSG[ 70 ]).WOWallowsparticipantstoaddresourcesinafullydecentralizedmannerthatimposesverylittleadministrativeoverhead. AnillustrativeusecaseexampleofWOWtechniquesisaVMappliance[ 71 ]thatself-conguresCondor[ 18 ]poolsonwide-areahosts.TheVMcongurationisbasedonaLinux2.16kernelandaDebiandistributionthatiscustomizedtooptimizetheVMimagesize,Condor6.8.20,andtheIPOPruntime 52


3-1 detailsthecongurationofthevariouscomputenodesofthetestbedillustratedinFigure 3-1 .TheWOWhas33computenodes,32ofwhicharehostedinuniversitiesandbehindatleastonelevelofNATand/orrewallrouters:16nodesinFlorida;13inIllinois(NorthwesternU.);2inLouisiana;and1nodeeachinVirginiaandNorthCarolina(VIMSandUNC).Node34isinahomenetwork,behindmultipleNATs(VMware,wirelessrouter,andISPprovider).Atotalof118P2Prouternodeswhichrunon20PlanetLabhostsarealsopartoftheoverlaynetwork,toprovidea\bootstrap"overlayrunningonpublic-addressInternetnodes,towhichnodesbehindrewallscouldconnect Withtheonlyexceptionofthencgrid.orgrewall,whichhadasingleUDPportopentoallowIPOPtrac,norewallchangesneededtobeimplementedbysystemadministrators.Furthermore,noneofthesites(exceptUFL)providedDHCPcapabilitiesandWOWnodesoverthereusedVMwareNATdevices,whichdonotrequireanIPaddresstobeallocatedbythesiteadministrator. 53


Table 3-1. Configuration of the WOW testbed depicted in Figure 3-1. All WOW guests run the same Debian/Linux 2.4.27-2 O/S.
  Node number        Physical domain     Host CPU           Host O/S               VM monitor (VMware)
  node002            ufl.edu             Xeon 2.4GHz        Linux 2.4.20-20.7smp   Workstation 5.5
  node003-node016    ufl.edu             Xeon 2.4GHz        Linux 2.4.20-20.7smp   GSX 2.5.1
  node017-node029    northwestern.edu    Xeon 2.0GHz        Linux 2.4.20-8smp      GSX 2.5.1
  node030-node031    lsu.edu             Xeon 3.2GHz        Linux 2.4.26           GSX 3.0.0
  node032            ncgrid.org          Pentium-3 1.3GHz   Linux 2.4.21-20.ELsmp  VMPlayer 1.0.0
  node033            vims.edu            Xeon 3.2GHz        Linux 2.4.31           GSX 3.2.0
  node034            gru.net             Pentium-4 1.7GHz   Windows XP SP2         VMPlayer 1.0.0

B) Shortcuts disabled
Figure 3-2. Frequency distributions of PBS/MEME job wallclock times.

I chose two representative life-science applications as benchmarks: MEME [72] version 3.5.0 and fastDNAml-p [73][74] version 1.2.2. These applications ran, without any modifications, on the 33-node WOW; scheduling, data transfer and parallel programming run-time middleware also ran unmodified, including OpenPBS [75] version 2.3.16, PVM [76] version 3.4.5, SSH, RSH and NFS version 3.

In the PBS experiment, one of the WOW VMs (node002) was configured as the cluster head node and the rest were configured as worker nodes.


The experiments were designed to benchmark my implementation for classes of target applications for WOWs: high-throughput independent tasks and parallel applications with high computation-to-communication ratios. Specifically, the goals of the experiments are: (1) to show that WOWs can deliver good throughput and parallel speedups, (2) to quantify the performance improvements due to shortcut connections, and (3) to provide qualitative insights on the deployment, use and stability of the IPOP system in a realistic environment.

MEME [72] is a compute-intensive application that implements an algorithm to discover one or more motifs in a collection of DNA or protein sequences. In this experiment, I consider the execution of a large number (4000) of short-running MEME sequential jobs (approximately 30s each) queued and scheduled by PBS. The jobs run with the same set of input files and arguments, and are submitted at a frequency of 1 job/second at the PBS head node. Jobs read and write input and output files to an NFS file system mounted from the head node.

For the scenario where shortcut connections were enabled, the overall wall-clock time to finish the 4000 jobs was 4565s, and the average throughput of the WOW was 53 jobs per minute. Figure 3-2 shows a detailed analysis of the distribution of job execution times, for both the case where the WOW had shortcut connection establishment enabled and the case where it was disabled. The variation in job execution times shown in the histogram can be qualitatively explained with the help of Table 3-1: most physical machines in the WOW prototype have 2.4GHz Pentium-4-class CPUs; a couple of them (nodes 32 and 34) are noticeably slower, and three of them are noticeably faster (nodes 30, 31 and 33). Overall, the slower nodes end up running a substantially smaller number of jobs than the fastest nodes (node 32 runs 1.6% of the jobs, while node 33 runs 4.2%).


Figure 3-2 also shows that the use of shortcut connections decreases both the average and the relative standard deviation of the job execution times. The wallclock time average and standard deviation are 24.1s and 6.5s (shortcuts enabled) and 32.2s and 9.7s (shortcuts disabled). The use of shortcuts also reduced queuing delays at the PBS head node, which resulted in a substantial throughput improvement, from 22 jobs/minute (without shortcut connections) to 53 jobs/minute (with shortcut connections).

Note that the throughput achieved by the deployed system depends not only on the performance of the overlay, but also on the performance of the scheduling and data transfer software that runs on it (PBS, NFS). The choice of different middleware implementations running inside the WOW (e.g. Condor, Globus) can lead to different throughput values. The VM technology in use also impacts performance. The average execution time for the MEME application inside a VM was observed to be 13% higher than that of a physical host.

The second benchmark is fastDNAml [73][74]. The parallel implementation of fastDNAml over PVM is based on a master-workers model, where the master maintains a task pool and dispatches tasks to workers dynamically. It has a high computation-to-communication ratio and, due to the dynamic nature of its task dispatching, it tolerates performance heterogeneities among computing nodes.

Table 3-2 summarizes the results of this experiment for the 50-taxa input data set reported in [74]. The parallel execution of fastDNAml on the WOW reduces the execution time significantly. The fastest execution is achieved on 30 nodes when the WOW has shortcut connections enabled: 24% faster than 30 nodes without shortcuts enabled, and 49% faster than the 15-node execution. Even though fastDNAml has a high computation-to-communication ratio for each task, the use of shortcuts resulted in substantial performance improvements.


Table 3-2. Execution times and speedups for the execution of fastDNAml-PVM on 1, 15 and 30 nodes of the WOW. The sequential execution time for the application is 22,272 seconds on node 2 and 45,191 seconds on node 34. Parallel speedups are reported with respect to the execution time of node 2.
  Parallel execution                    15 Nodes             30 Nodes
                                        Shortcuts enabled    Shortcuts disabled   Shortcuts enabled
  Execution time (seconds)              2439                 2033                 1642
  Parallel speedup (w.r.t. node 2)      9.1                  11.0                 13.6

While I have not profiled where time is spent within the application during its execution, the increase in execution times can be explained by the fact that the application needs to synchronize many times during its execution, to select the best tree at each round of tree optimization [74].

The sequential execution times of fastDNAml are reported for two different nodes (node002 and node034) and show that the differences in the hardware configuration of the individual nodes of the WOW result in substantial performance differences. While modeling parallel speedups in such a heterogeneous environment is difficult, I report the speedups with respect to a node which has the hardware setup most common in the network. The parallel speedup computed under this assumption is 13.6x; in comparison, the speedup reported in [74] is approximately 23x, but is achieved in a homogeneous IBM RS/6000 SP cluster within a LAN.

Techniques for VM migration have been described in [69][77][78]. However, when a VM migrates, it also carries along its connection state. Such connection state can also accumulate inside other hosts with which it is communicating. This forces the VM to retain its network identity, which in turn hampers VM migration between subnets. Virtual networking provides the opportunity of maintaining a consistent network identity for a VM, even when it migrates to a different network.


2.3.2 .Clearly,packetsdonotgetroutedandaredroppeduntilthenoderejoinstheP2Pnetwork;thisshortperiodofnoroutabilityisapproximately8minutesforthe150-nodenetworkusedinthesetup.TheTCPtransportandapplicationsareresilienttosuchtemporarynetworkoutages,asthefollowingexperimentsshow. 1. WhentheVMwasresumed,itsvirtualeth0networkinterfacewasrestarted,andbecausetheNATsthattheVMconnectedtoweredierentatUFLandNWU,theVMacquiredanewphysicaladdressforeth0atthedestination.However,thevirtualtap0interfacedidnotneedtoberestartedandremainedwiththesameidentityontheoverlaynetwork.ThenIPOPwasrestarted;secondslater,theSCPserverVMagainbecameroutableovertheP2Pnetwork,thenestablishedashortcutconnectionwiththeSCPclientVM,andeventuallytheSCPletransferresumedfromthepointithadstalled.Thesustainedtransferbandwidthsbeforeandaftermigrationare1.36MB/sand1.83MB/s,respectively. 2. Thisexperimentsimulatesausecaseofapplyingmigrationtoimproveloadbalancing:backgroundloadwasintroducedtoaVMhostresultinginanincreaseintheexecutiontimeofapplicationsexecutingontheVMguest;theVMguestwasthenmigratedfromUFLtoadierenthostatNWU.IPOPwasrestartedonthe 58


ProleofexecutiontimesforPBS-scheduledMEMEsequentialjobsduringthemigrationofaworkernode guestuponVMresume.ThejobthatwasrunningontheVMcontinuedtoworkwithoutproblems,andeventuallycommitteditsoutputdatatotheNFS-mountedhomedirectoryoftheaccountusedintheexperiment.Whiletheruntimeforthejobthatwas\intransit"duringthemigrationwasincreasedsubstantiallyduetotheWANmigrationdelay.OncePBSstartedsubmittingjobstotheVMrunningonanunloadedhost,itwasobservedthatthejobruntimesdecreasedwithrespecttotheloadedhost.ThisexperimentalsoshowedthattheNFSandPBSclient/serverimplementationsweretoleranttotheperiodwithlackofconnectivity.Figure 3-3 summarizestheresultsfromthisexperiment.JobIDs1through87runonaVMatUFL.DuringjobID88,theVMismigratedtoNWU.JobID88isimpactedbythewide-areamigrationlatencyofhundredsofseconds,butcompletessuccessfully.SubsequentjobsscheduledbyPBSalsorunsuccessfullyonthemigratedVM,withoutrequiringanyapplicationrecongurationorrestart. 65 ]andBeowulf[ 79 ],whichareverysuccessfuleortsatusingcommoditymachinesandnetworksforhighperformancedistributedcomputing.Ratherthansupportingtightly-coupledparallelcomputationwithinalocal-areaorclusternetwork,theaimofWOWistosupporthigh-throughputcomputingandcross-domaincollaborations.TheDAS[ 80 ]projectbuiltadistributedclusterbasedonhomogeneouslyconguredcommoditynodesacrossveDutchuniversities.AlsorelatedtomyworkistheIbisproject[ 30 81 ]whichletsapplicationsspanmultiplesitesofagrid,andcopes 59


82 ]. Severaleortsonlarge-scaledistributedcomputinghavefocusedonaggregatingwide-arearesourcestosupporthigh-throughputcomputing,butattheexpenseofrequiringapplicationstobedesignedfromscratch[ 2 3 66 67 83 ].Legion[ 84 ]isasystemalsodesignedtoscaletolargenumbersofmachinesandcrossadministrativedomains.Globus[ 1 ]providesasecurityinfrastructureforfederatedsystemsandsupportsseveralservicesforresource,dataandinformationmanagement.Condor[ 18 { 20 ]hasbeenhighlysuccessfulatdeliveringhigh-throughputcomputingtolargenumberofusers. Myworkdiersfromtheseapproachesinthatitisanend-to-endapproachtoprovidingafullyconnectedwide-areaclusterenvironmentforrunningunmodiedapplications.Nonetheless,theuseofvirtualizationmakesmyapproachonethatdoesnotprecludetheuseofanyofthesesystems.Quitethecontrary,becausevirtualizationenablesustorununmodiedsystemssoftware,itispossibletoreadilyreuseexisting,maturemiddlewareimplementationswhenapplicable,andrapidlyintegratefuturetechniques. In[ 82 ],authorsdescribeavirtualizedinfrastructuresforwide-areadistributedcomputingbasedonvirtualmachinesandvirtualnetworking.Thekeydistinguishingfeatureofmyapproachistheuseofpeer-to-peer(P2P)techniquesforoverlayroutingandestablishmentofoverlayconnectionsamongnodesbehindNAT/rewallroutersinahighlyscalablemanner. 71 ]havebeenusedtodeployaresourcepoolforrunningcompute-intensivejobsthroughCondor.Thepoolconsistsofmorethan80nodesinseveralNATed/rewalleddomains,andrunsjobssubmittedfromnanoHub[ 17 ]andusers 60


Thedeploymentofthispoolhasbeengreatlyfacilitatedbypackagingallthesoftware(IPOP,Condormiddleware)withintheVMandrequiringonlyaNATnetwork,aswellasbytheavailabilityoffreex86-basedVMmonitors,notablyVMPlayerandVMwareServer.OnceabaseVMimagewascreated,replicatingandinstantiatingnewnodeswasquitesimple.TheVMimagecanbedownloadandinstantiatedbyordinaryusersontheirdesktops.Onceinstantiated,theVMautomaticallybecomesthepartofthesharedpool.ThroughthisVM,userscansubmitjobstothesharedpoolandalsorunjobsfromotherusers. Thesystemhasbeenobservedtobetoleranttophysicalnodesfailures{neighborsofafailednoderespondbycreatingconnectionstoothernodesandthusmaintainroutabilitywithinthenetwork.havebeenshutdownandrestartedduringthisperiodoftime.TheoverlaynetworkhasalsoexhibitedresiliencytochangesinNATIP/porttranslations.IPOPisabletodealwiththesetranslationchangesautonomouslybydetectingbrokenlinksandre-establishingthemusingtheconnectiontechniquesdiscussedinSection 2.3.2 AnapplicationofWOWtechniqueswithinaLANisfacilitatingdeploymentofCondorpoolsatUniversityComputerCentersconsistingofhundredsofidledesktops,usuallyrunningtheWindowsoperatingsystem.CondormiddlewarerunsonLinux,andcanbedeployedinsideLinuxVMcontaineronWindowsmachines.However,tobeabletocommunicatewitheachother,theseVMsneedunique,routableIPaddresses.ManagingadditionalIPaddressesforVMscanbeverydicultforadministrators,becausemanynewVMscanbereadilyinstantiatedbyusersfromaVMimage,andcanalsobeeasilymigratedacrossphysicalhosts.ThedecentralizedNAT-traversalsupportinIPOPallowsinstantiationoftheseVMsbehindNATinterfacesprovidedbyVMMs(suchasVMwareandXen),andbeabletoprovideconnectivitybetweenthemwithoutrequiringaroutableIPaddressesforVMs. 61


Although WOWs have been successfully demonstrated in realistic deployments [85], the deployment of WOWs by new users is still hindered by:

1. In the implementation described in [71], IPOP supports dynamic virtual IP configuration using unmodified DHCP clients, by capturing DHCP packets from the tap interface and making requests to a SOAP server that maintains virtual IP leases. With the virtual network provided by IPOP potentially involving hosts spanning wide-area networks and owned by multiple organizations, maintaining such dedicated DHCP servers is difficult. Moreover, dedicated servers introduce central points of failure.

2.

Furthermore, earlier versions of IPOP [16][71] have also suffered from limitations with respect to:

1. VM migration (cf. [78], [13]), which requires killing and restarting the P2P node on the target host, as shown in [85].

2.


28 ]andresourcediscoverytondproxynodesthatcanroutecommunicationbetweenvirtualIPnodeswhenshortcutconnectionscannotform.Subsequentchaptersofthisdissertationdescribeandevaluatethesetechniques. Inthenextchapter,IdescribeanimplementationofaDHTovertheBrunetP2PsystemandhowitcanbeleveragedtomakedeploymentofWOWseasierfornewusers. 63


Enterpriseinformationsystemsinvolvepackingandstoringlargeamountsofstoragedevicesthroughoutaseriesofshelvesinaroom,alllinkedtogether.Theinformationinthesestoragesystemscanbeaccessedbyasupercomputer,mainframecomputer,orpersonalcomputer.Thesesystemscanonlybeaccessedbyauthorizedusersandrequireconstantattentionandmanagementbyexpertswithinanorganization.Themanagementactivitiesincludekeepingthesystemupandrunning,hardwareandsoftwareupgrades,backupanddisasterrecovery. Inawide-areaenvironmentthatinvolvesseveralorganizationsandindividualusers,usingsuchcentralizedsystemsposesseveralissues:Whomanagesthesystem?Whatshouldbethetargetedsystemcapacity?Isthesystemaccessibletoallusers? Architecturesbasedonself-managingpeer-to-peerstorage(CFS[ 37 ],PAST[ 36 ])havebeenproposedasanalternativetocentralizedapproachesforvariousapplications.AsdescribedinChapter2,structuredP2Psystemsprovideaprimitivecalledthedistributedhashtable(DHT)forstoringandlocatingobjects.Eachobjectisassociatedwithakeythatbelongstothesameaddressspaceasnodeidentiers.Theownershipofkeysispartitionedamongparticipatingnodes,suchthateachkeyisstoredonasetofnodesthatareclosesttothekeyintheidentierspace.ThispartitioningofkeyownershiptogetherwithecientroutingbetweennodesboundslookupoverheadforanobjectstoredintheDHT.ThefollowingpropertiesmakeDHTsusefulwide-areastoragearchitectures: 1. 2. 64


4.1.1SystemsBasedonDistributedHashTable 37 ]basedonChord[ 40 ],andPAST[ 36 ]developedbyMicrosoftbasedonPastry[ 39 ].In[ 60 ]Coxet.al.haveproposedtobuildDistributedDNSusingDHash,adistributedhashtable(DHT)basedonChord[ 40 ].SCRIBE[ 86 ]isalargescaleapplication-levelmulticastandeventnoticationinfrastructurebasedinPastryP2Psystem.ePost[ 87 ]describesadecentralizedemailservice,alsobasedonPastry.OpenDHT[ 88 ]isapublicDHTservicebasedonBambooP2Psystem[ 89 ]operatingonPlanetLab,andcanbeusedbythird-partyapplications. Byrestrictingthekeyownershipstoasmallsetofnodesandnotrequiringcachingtoachieveecientlookup,DHTscanalsobeusedtostoremutabledata.In[ 90 ],authorsproposeanalgorithmtoprovideatomicityofmutabledatastoredinaDHT.Comet[ 91 ]usesDHTtoprovideascalableanddecentralizedcoordinationspaceasinLinda[ 92 ]. In[ 93 ],theauthorsproposetouseauniversaloverlaytoprovideascalableinfrastructuretobootstrapmultipleserviceoverlaysprovidingdierentfunctionality.Itprovidesmechanismstoadvertiseservicesandtodiscoverservices,contactnodes,andservicecode.Inthiswork,IdemonstratehowauniversaloverlaycanbeusedtofacilitatebootstrappingofmultipleWOWs,eachsupportingadierentcommunityandhavingitsownvirtualprivateIPaddressspace. 65


1. ProximityNeighborSelection(PNS)[ 27 ]isatechniqueinwhichwheneveraP2Pnodehasachoiceonitsroutingtableentries,itpicksthenodethathastheleastlatencytoit.Ithasbeenshownthatthislocalminimaateachhopcanboundthetotallatencyofalookuptowithinacertainfractionoftheactuallatencytothenodestoringthekey.PNSisalreadyemployedinexistingP2Psystems(Chord[ 40 ],Pastry[ 39 ])andcanbeincorporatedintoBrunettoachievetheboundsonthetotaltransitlatencyincurredbyavirtualIPpacket. ItispossiblethatthemessagescontainingDHToperationsgetlostandneedretry.TheBambooP2Poverlay[ 89 ]usestheInternetround-triplatencyinformationtocalculatetherighttimeoutsonDHToperations.TheP2Proutingateachnodealsotriestoavoidahighlatencyhop.TheBambooDHTalsoreplicatesakeyatseveralnodes,sothatasingleslownodestoringthekeydoesnotdelaythelookupprocess. InSBARC[ 94 ]andBrocade[ 95 ],someP2Pnodesupgradetosupernodesbasedontheirhighresourcecapacitiesandhighstability.AllintermediateP2Phopsinvolvetheonlysupernodes,thusachievingaquicklookuplatency.Skypeusesasimilarapproach,exceptthatitisbasedonanunstructuredP2Pnetwork. 2. 26 ].Anodecanpotentiallymissaneighborwithwhichitcannotcommunicate,whichcanleadtoanincorrectroutingdecisionandsubsequentlyaDHToperationtoberoutedtowrongsetofnodes. 3. 66


4. [96-98].

5. In [99], the authors present techniques to provide content/path locality and support for NATs and firewalls, where instances of conventional overlays are configured to form a hierarchy of identifier spaces that reflects administrative boundaries and respects connectivity constraints among networks.

Figure 4-1A shows how a new node arrival is handled. Initially, node 123 stores keys in the ranges [110,123] and [123,128], while node 110 stores keys in the range [110,123] along with the range on the other side of its identifier. In response to the arrival of node 116, node 123 migrates the keys in the range [110,116] to the new node. Similarly, node 110 migrates the keys in the range [116,123] to the new node.

Figure 4-1B shows how a node failure is handled. Initially, node 123 stores keys in the ranges [116,123] and [123,128], while node 110 stores keys in the ranges [110,116] and [110,123]. In response to the failure of node 116, node 123 copies the keys in the range [116,123] to node 110, while node 110 copies the keys in the range [110,116] to node 123.


B) Node departure
Figure 4-1. Handling topology changes.

Since the objects are stored in the DHT as soft state for the lifetime specified in a time-to-live (after which they are automatically garbage-collected), there is no primitive to delete an object associated with a key.
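The DHT primitives used in the remainder of this chapter are Create, Get and Recreate, all operating on soft-state entries with a time-to-live. The following is a minimal, single-node sketch of their semantics only; the distributed implementation, key re-mapping and replication are described in the following sections, and the method names' signatures and the in-memory store used here are illustrative assumptions rather than the actual Brunet API.

    import time

    class SoftStateDht:
        """Single-node model of the soft-state DHT semantics (illustrative only)."""

        def __init__(self):
            self._store = {}                      # key -> (value, expiration time)

        def create(self, key, value, ttl):
            """Store key/value only if no live entry exists; return False otherwise."""
            self._expire()
            if key in self._store:
                return False
            self._store[key] = (value, time.time() + ttl)
            return True

        def get(self, key):
            """Return the value associated with a live key, or None."""
            self._expire()
            entry = self._store.get(key)
            return entry[0] if entry else None

        def recreate(self, key, value, ttl):
            """Refresh (or re-insert) an entry before its time-to-live expires."""
            self._expire()
            self._store[key] = (value, time.time() + ttl)
            return True

        def _expire(self):
            now = time.time()
            self._store = {k: v for k, v in self._store.items() if v[1] > now}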


Connectivity constraints in wide-area environments, such as NATs, firewalls and routing outages [26], often prevent communication between immediate neighbors in the identifier space, thus affecting overlay structure maintenance and routing. This inability to communicate with a neighbor node is perceived as the neighbor being down, resulting in an inconsistent view of the local neighborhood and subsequently incorrect routing decisions at that node. In such cases, DHT operations on the same key k originating at different sources may not always be routed to the same node (called the root for that key). This problem is referred to as inconsistent roots in the DHT literature and hinders the ability of the DHT Create primitive to detect duplication of keys, as shown in Figure 5-3.

In the figure, nodes 115 and 110 cannot form a connection. A message is sent to key 112, and the closest node is 110. Left: a Create message addressed to key 112 arrives at node 115; it believes that it is the closest to the destination, so the message is delivered locally and the key is successfully created (also replicated at node 100). Right: another Create message addressed to the same key, arriving at node 83, is correctly routed to node 110; here the key is not found and is created again (this Create operation also returns success instead of returning an error).

The inability of the DHT Create primitive to detect duplication of keys subsequently affects the correct operation of applications requiring such uniqueness guarantees, such as the decentralized Dynamic Host Configuration Protocol (DHCP) described in Section 4.4.2.


Figure. Inconsistent roots in the DHT.

To reduce the likelihood of inconsistent roots, each application-specified key k is internally re-mapped to n keys (k1, k2, ..., kn), which are then stored (together with the associated value) at n different locations on the P2P ring. Applications can choose this degree of re-mapping for each key, and expect DHT operations to separately provide return values for each re-mapped key, thus allowing applications to implement schemes such as a majority vote on the results obtained for each such re-mapped key. For a fault to occur now, the roots of as many as half (more than one) of the re-mapped keys have to be inconsistent. Majority voting also has the advantage that, by not requiring results for all re-mapped keys, a few slow nodes cannot slow down the entire DHT operation.

Furthermore, a new P2P node joining the overlay is not allowed to perform any DHT operation until it gets connected correctly, i.e., it forms connections with its nearest left and right neighbors on the ring. This is because an incorrectly connected node has an inconsistent view of the ring and may observe roots for DHT keys that are inconsistent with those observed by existing nodes. The time taken for a new node to get correctly connected to an existing network of over 100 nodes has been observed to be about 5 seconds on average.
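As a concrete illustration of the re-mapping and majority vote just described, the sketch below derives n internal keys from an application key with a salted hash (the actual re-mapping function is not specified here, so the hash is an assumption) and treats a Create as successful only when a quorum of the re-mapped operations succeed. The defaults n = 8 and quorum = 5 match the values used by the DHCP implementation in Section 4.4.2, and the dht argument can be any object exposing the Create primitive, such as the single-node sketch above.

    import hashlib

    def remap(key, n):
        """Derive n internal DHT keys from one application-specified key."""
        return [hashlib.sha1(f"{key}:{i}".encode()).hexdigest() for i in range(n)]

    def majority_create(dht, key, value, ttl, n=8, quorum=5):
        """Create the value under every re-mapped key; succeed only on a quorum."""
        successes = sum(1 for k in remap(key, n) if dht.create(k, value, ttl))
        return successes >= quorum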


Figure 4-3 shows how different WOWs (or IPOP namespaces) can exist on top of a common P2P overlay. Each virtual IP node belongs to some IPOP namespace and is associated with a P2P node. In this example, the IP-to-P2P mappings for nodes A1, B1, A2, B2 (A1 to X8, B1 to X1, A2 to X2 and B2 to X4) are stored at nodes X3, X5, X6 and X7, respectively. The DHT key for each such mapping is a combination of a globally unique identifier for the namespace and the virtual IP address within that namespace. The inclusion of the namespace identifier allows virtual IP nodes in different namespaces to have the same IP addresses. To send a virtual IP packet to node B1, node A1 queries the DHT with (N1, B1) as the key. The value associated with this key is the P2P address (X1) of the P2P node associated with B1, and is quickly retrieved from node X5. From this point onwards, communication proceeds as described in Chapter 2.

Creating a new IPOP namespace only requires executing a simple program with information about the IPOP namespace (assignable virtual IP addresses and other network parameters). The namespace identifier is then provided as a parameter inside the IPOP configuration of the appliance VMs for distribution. Experiments show that a new node joining a WOW takes about 20-30 seconds on average to acquire a virtual IP address.
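The lookup performed by node A1 in this example can be sketched as a single Get on the combined namespace/IP key; the key encoding shown is an assumption, and dht stands for the DHT interface sketched earlier.

    def resolve_virtual_ip(dht, namespace, virtual_ip):
        """Return the P2P address registered for (namespace, virtual IP), or None."""
        return dht.get(f"{namespace}:{virtual_ip}")

    # In the example of Figure 4-3, a lookup on namespace N1 and the virtual IP of
    # node B1 would return the P2P address X1.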


ExampleoftwodierentWOWswithnamespacesN1andN2sharingacommonBrunetoverlay SimilartoDHT-basedsystemsdescribedearlier,thisdecentralizedIPaddressmanagement(1)eliminatesanydedicatedcomponents,(2)scalestolargenumbersbyharnessingtheresourcesatparticipatingnodesand(3)providesresiliencetonodefailures,whicharecommonrequirementsinlarge-scaledesktopgridenvironments. Thedescribedfunctionalityiscomparablewithtasksanadministratorwouldtypicallyperformtosetupaprivatenetwork.Followingthesettingupofswitchesandcables,aprivateIPaddressrangeissetaside.HostsconnectingtotheprivatenetworkareassigneduniqueIPaddressesfromthisIPaddressrange.Toenabledynamicnetworkcongurationofhostsconnectingtothenetwork,oneormoreDHCPserversareconguredwiththelistofassignableIPaddressesandothernetworkparameters.Newhostsdiscover 72


In contrast, to set up a new WOW, a user is only required to create an IPOP namespace with a unique identifier and a private address space. The namespace identifier is specified inside the IPOP configuration of the WOW VM appliance image. Each deployed instance of the appliance, on boot, retrieves the namespace information (virtual IP address range) and configures itself with a unique virtual IP address. These steps are described below.

The namespace-creation program is provided with a list (see [85]) of nodes in the universal overlay and starts up a P2P node that connects to that overlay. The node tries to insert the namespace information into the DHT (using Create) with a randomly chosen identifier as the key. If the key already exists, the Create returns an error and the program retries with a different identifier until it succeeds. Since the DHT does not store objects indefinitely, the object holding the namespace information has to be periodically recreated (using Recreate). This namespace identifier is specified inside the IPOP configuration of the WOW appliance image.

In the implementation described in [71], IPOP supports dynamic virtual IP configuration using unmodified DHCP clients. This is achieved by capturing DHCP request packets from the tap, making SOAP requests to a publicly accessible server that stores the list of assignable IP addresses and active leases, and eventually injecting DHCP response packets to the tap. The SOAP server can be a single point of failure. The decentralized DHCP instead uses the DHT as the store for assignable addresses and active leases, as follows.


On intercepting a DHCP packet, IPOP retrieves information about its namespace (assignable IP address range, netmask, lease times) from the DHT (using a Get) with the namespace identifier as the key. It then chooses a random IP address from that range, belonging to the namespace, and attempts to create a DHT entry (using a Create) with: the combination of the namespace identifier and the guessed IP address as the key, and a randomly chosen password together with its P2P address as the value. The entry is successfully created only if there is no other entry with the same key. This prevents IP address conflicts between WOW nodes belonging to the same namespace. In case Create returns an error, IPOP tries another (randomly chosen) IP address until it eventually succeeds. The DHCP response packet with information about the lease is written to the tap. The password is recorded for subsequent operations on the key.

The entry is only created with a time-to-live (TTL) equal to the lease time for that namespace, and thus needs to be recreated (using a ReCreate) periodically. This process is again triggered by the DHCP client, which attempts to renew a virtual IP lease after half the lease time has elapsed. In this case, IPOP attempts to ReCreate the same DHT key corresponding to the virtual IP address bound to the tap.

DHT inconsistencies, as described in Section 4.2.3, can create a situation where multiple IPOP nodes acquire the same virtual IP address. The DHT implementation in IPOP therefore achieves fault tolerance to inconsistencies by re-mapping each application-specified key k to 8 different keys (k1, k2, ..., k8) and performing the corresponding operation on each of these keys. For a Create or Recreate to be considered successful, at least 5 of these operations must succeed (return true); otherwise, the operation is considered to have failed and a different IP address is tried. However, the initial implementation of this technique that is evaluated in Section 4.5 still waits for all 8 results before performing the majority vote; in cases where some of the re-mapped operations are slow, this delays lease acquisition.
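The lease-acquisition loop can be summarized by the following sketch, which reuses majority_create from the earlier sketch; the namespace record layout, password handling and retry bound are illustrative assumptions, not the actual IPOP implementation.

    import random

    def acquire_lease(dht, namespace_id, my_p2p_address, max_attempts=50):
        ns = dht.get(namespace_id)            # namespace record: address range, lease time
        if ns is None:
            raise RuntimeError("namespace not found in the DHT")
        for _ in range(max_attempts):
            ip = random.choice(ns["assignable_ips"])          # guess a random address
            key = f"{namespace_id}:{ip}"
            value = {"p2p": my_p2p_address, "password": random.getrandbits(64)}
            # The Create succeeds only if no other node holds this (namespace, IP) key.
            if majority_create(dht, key, value, ttl=ns["lease_time"]):
                return ip, value["password"]  # renew later with Recreate before TTL expiry
        raise RuntimeError("could not acquire a free virtual IP address")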


Figure 4-4 shows a time-line of events, from the startup of the IPOP and DHCP client (dhclient) processes to having an IP address bound to the tap. It should be noted that the IP address lease acquisition cannot start until the associated P2P node is correctly connected (i.e., with its left/right neighbors). Once correctly connected, IPOP tries different virtual IP addresses (one by one) until it is assured that no other node within the same IPOP namespace has acquired the same IP address. It is possible to have a large IP address range for the private network, which reduces the chances of two IPOP nodes guessing the same IP address.

When a virtual IP packet is destined to a node whose IP-to-P2P mapping is not yet known, the mapping must first be retrieved from the DHT, and packets may be dropped until the lookup completes; as shown in Chapter 3, most TCP/IP-based applications communicating over the IPOP virtual network are resilient to such transient packet losses. This process of resolving a virtual IP address to a P2P address is called "Brunet-ARP".


EventsleadingtovirtualIPaddresscongurationoftapinterface AnexperimentbetweentwodesktopmachinesAandBisthenperformedasfollows.OndesktopA,startIPOPandDHCPclientsothatitacquiresavirtualIPaddress,whichremainsxedduringtheexperiment.OnthedesktopB,proceedaniterativeprocessof:(1)startIPOPnodeandDHCPclient(2)waituntilanIPaddressisboundtotap(3)startpingingthevirtualIPaddressofdesktopAfor200seconds(4)killIPOPandDHCPclientonB.Thisprocesswasrepeated250times. Ineachtrialoftheexperiment,theIPOPnodeondesktopBhadadierent(randomlychosen)P2Paddress.ThevirtualIPleasesfromdierenttrialspersistedintheDHT(sincetheleasesarenotrelinquished),andthispreventedthedesktopBfromacquiringthesameIPaddressoverdierenttrials.ThismappingbetweenvirtualIPaddressandP2Paddressisarbitraryandisdiscoveredautomaticallyineverytrial(asdescribedinSection 4.4.3 )beforepingpacketsstartowingbetweenthedesktopsoverthevirtualnetwork. Figure 4-5 showsthecumulativedistributionofdelayseenbytheDHCPclient(dhclient)toacquireanIPaddressonthetap.Weobservethatin90%ofthecases,DHCPprocessnisheswithin30secondsofIPOPandDHCPclientstartup.AsshowninFigure 4-4 ,thisdelaydependson(1)timetakenfortheIPOPnodestogetcorrectlyconnectedand(2)numberofdierentIPaddressesthataretried.ThecumulativedistributionsofthesecomponentsareshowninFigure 4-6 andFigure 4-7 76


CumulativedistributionofthetimetakenbyanewIPOPnodetoacquireavirtualIPaddress(T2inFigure 4-4 ).Meanandvarianceare27.41secondsand20.19seconds,respectively. CumulativedistributionofthetimetakenbyanewP2Pnodetogetconnectedtoitsleftandrightneighborsinthering(T1inFigure 4-4 ).Meanandvarianceare4.98secondsand13.99seconds,respectively. 77


Figure 4-7. Cumulative distribution of the number of different virtual IP addresses tried during DHCP. Mean and variance are 1.096 and 0.784, respectively.

I have described a protocol for decentralized virtual IP address management within a WOW that leverages the DHT functionality of the P2P network. Experiments have shown that a new WOW node can acquire a unique virtual IP address within 20-30 seconds on average. The implementation of this protocol has since been made resilient to P2P message losses: instead of waiting for each internally re-mapped key operation (Create or Recreate) to return a result, the operation is considered successful and the protocol proceeds forward as soon as sufficient results are available to assure that the operation has succeeded on a majority of nodes. These enhancements have resulted in a reduction of the average time to acquire a virtual IP address to less than 10 seconds.

Additional decentralized configuration can be integrated into WOWs. In particular, the WOW-based Condor pools that are currently configured from a central server can be extended to leverage DHT-based techniques to achieve manager discovery for worker nodes.


Likeseveralrelatedeorts(suchasChord[ 40 ],Kademlia[ 100 ]andPastry[ 39 ]),IPOPreliesonstructuredP2Poverlaystoprovidethecoreserviceofmessageroutingandadditionalcapabilitiessuchasobjectstorageandretrieval.StructuredP2ProutingassumeseachnodehasaconsistentviewofitslocalneighborhoodintheP2Pidentierspace,whichisreectedinitsabilitytocommunicatewithitsneighboringnodes.RelatedworkonstructuredP2PsystemshaveimplicitlyassumedanenvironmentwhereP2Pnodesareabletoestablishdirectconnectionstooneanother,andhavemainlyfocussedonecientoverlaytopologies[ 27 ],correctroutingofobjectlookupsunderchurn[ 89 101 ],andproximity-awarerouting[ 102 ]. However,inpractice,wide-areaenvironmentsarebecomingincreasinglyconstrainedintermsofpeerconnectivity,primarilyduetotheproliferationofNATandrewallrouters.Studieshaveshownthatabout30%-40%[ 103 ]ofthenodesinaP2PsystemarebehindNATs.EventhoughmajorityoftheNATdevices(coneNATs)canbe\traversed"throughUDP-holepunching,upto20%oftheNATs[ 46 ](symmetricNATs)cannotbetraversedusingtheexistingtechniques.Inaddition,studieshavealsoshowntheexistenceofpermanentortransientrouteoutagesbetweenpairsofnodesontheInternet;forexample,[ 26 ]reports5.2%pair-wiseoutagesamongnodesonPlanetLab[ 29 ].Together,theseconnectivityconstraintsposeachallengetooverlaystructuremaintenance:twoadjacentnodescannotcommunicatedirectly,creatingfalseperceptionsofaneighbornotbeingavailable.Ingeneral,thesemissinglinksonaP2Pstructureleadtoinconsistentroutingdecisions,andsubsequentlyaectingroutabilityandservicesbuiltupontheassumptionofconsistentrouting. TheexistingimplementationsofstructuredP2Psystemshaverecognizedtheproblemofoverlaystructuremaintenancewhenonlyasmallfractionofpairs(about4%[ 104 ])cannotcommunicatewitheachother[ 26 105 ].However,practicalexperienceswithWOW 79


Connectivity-constrainedwide-areadeploymentscenariotargetedbydeploymentsoftheIPOPP2Psystem deploymentsrevealthatthisfractioncanbesignicantlylargerduetonodesbehind(multiple)NATs,andNATsthataresymmetricordonotsupport\hairpin"translationthatprecludehole-punching. 88 ]reliesonnon-rewalledPlanetLabP2PnodestodeployitsDHT;however,nodesbehindNATsandrewallscanonlyactasOpenDHTclientsanddonotstorekeys.InordertoaggregatetheincreasingnumberofhostsbehindNATs/rewallsasWOWnodes,theIPOPvirtualnetworkmustbeabletodealwithacomplexwide-areaenvironmentastheonedepictedinFigure 5-1 ,wheretypicalendusersofaP2PsystemareconstrainedbyNATsinwhichtheydonothavethecontrol(orexpertise)necessarytosetupandmaintainrewallexceptionsandmappingsnecessaryforNATtraversal. Itiscommonforbroad-bandhoststobebehindtwolevelsofNAT(ahomegateway/routerandtheISPedgeNAT,nodesAandBinFigure 5-1 ).IPOPsupportsestablishmentofUDPcommunicationusingholepunchingtechniquesfor\cone"typeNATs(e.g.betweennodesAandCinFigure 5-1 ),andthereisempiricalevidencepointingtothefactthatthesearethecommoncase[ 46 ].However,nodesbehindNATsfor 80


Recognizingtheimportanceofsupportingtraversal,someofrecentNATshavestartedsupportingUniversalPlugandPlay(UPnP)[ 106 ]whichallowthemtobeconguredtoopenportssothatotherhosts(outsidetheNAT)caninitiatecommunicationwithhostsbehindtheNAT(e.g.hostsIandH).However,UPnPisnotubiquitous,andevenwhenitisavailable,multi-levelNATscreatetheproblemthathostscanonlyconguretheirlocalNATsthroughUPnP,whilehavingnoaccesstocontrolthebehavioroftheedgeNAT,whichrenderstheUPnPapproachineectiveoutsidethedomain.Forexample,althoughhostsAandEinFigure 5-1 areconnectedtoUPnPNATs,theyarealsosubjecttorulesfromanISPNATandaUniversityNATrespectively,whichtheydonotcontrol. SomeNATssupport"hairpinning",wheretwonodesinthesameprivatenetworkandbehindthesameNATcancommunicateusingeachother'stranslatedIPandport.Suchabehaviorisusefulinamulti-levelNATscenario,wheretwohostsbehindthesamepublicNATbutdierentsemi-publicNATsareabletocommunicateonlyusingtheirIPaddressandportassignedbythepublicNAT.However,notallNATssupporthairpinning,creatingasituationinwhichtwonodesinthesamemulti-levelNATeddomainmaynotbeabletousehole-punchingtocommunicatedirectly(e.g.nodesEandF)asdepictedinFigure 5-2 .NodesEandFarebehindtwodierentsemi-publicNATsrespectively,whichinturnarebehindapublicNAT-P.WhileformingitsinitialconnectionswithbootstrapnodesonPlanetLab,thesenodesonlylearntheirIPendpointsassignedbythepublicNAT-P.WhenEandFtrytoformaconnectiontheysendlinkmessagestoeachothersIPendpointsassignedbyNAT-P.IncaseNAT-Pdoesnotsupport\hairpinning",EandFcannotformaconnection.Only24%oftheNATstestedin[ 46 ]supporthairpinning. Somehostsarebehindrewallrouters(e.g.hostG)thatmightblockallUDPtracaltogether.OnlyafewP2Pnodesarepublicandareexpectedtobeableto 81


MultiplelevelsofNATs communicatewitheachother.Connectivityevenamongthesehostsisconstrained:Internet-1andInternet-2hostscannotcommunicatewitheachother(e.g.hostsJandK),whilemulti-homedhostscancommunicatewiththemboth.Inaddition,linkfailures,BGProutingupdates,andISPpeeringdisputescaneasilycreatesituationswheretwopublicnodescannotcommunicatedirectlywitheachother.In[ 26 ],theauthorsobservedthatabout5.2%ofunorderedpairsofhosts(P1,P2)onPlanetLabexhibitedabehaviorsuchthatP1andP2cannotreacheachotherbutanotherhostP3canreachbothP1andP2. Itsisobservedthatatypicalwide-areaenvironmentpresentsseveraldeterrentstoconnectivitybetweenapairofnodes,andwhentwosuchnodeshaveadjacentidentiersontheP2Pring,structuremaintenanceisaected.Tothebestofmyknowledge,whilestructuredP2PsystemshavebeendemonstratedinpublicinfrastructuressuchasPlanetLab,wherethereareonlyafewpair-wiseoutagesandasmallamountofdisordercanbetolerated[ 105 ],nostructuredP2PsystemsdescribedintheliteraturehavebeendemonstratedwherethemajorityofP2PnodesaresubjecttoNATconstraintsofvariouskindsasillustratedinFigure 5-1 82


Figure 5-3. Inconsistent roots in DHT.

The Brunet P2P system used by IPOP organizes nodes into a structured ring [45]. Similar to other structured systems, routing in Brunet uses the greedy algorithm, where at each hop a message gets monotonically closer to the destination until it is either delivered to the destination or to the node that is closest to the destination in the P2P identifier space. Greedy routing assumes each node has a consistent view of its local neighborhood, which is reflected in its ability to form structured near connections with its left and right neighbors in the P2P identifier space. The inability to form connections with immediate neighbors in the identifier space creates an inconsistent view of the local neighborhood, resulting in incorrect routing decisions as shown in Figure 5-3(a).


Overlay links in Brunet are established through the connection protocols described in [85]. The notion of a connection, which describes an overlay link between two P2P nodes, is key to establishing such links. Connections operate over physical channels called edges, which in IPOP can be based on different transports such as UDP or TCP. Besides assisting in overlay structure maintenance, the connection protocols allow the creation of 1-hop shortcuts between WOW nodes to self-optimize the performance of the virtual network with respect to latency and bandwidth.

The connection setup between P2P nodes is preceded by a connection protocol for conveying the intent to connect and exchanging the list of Uniform Resource Indicators (URIs) for communication. These connection messages are routed over the P2P network. Incorrect routing leads to situations where connection messages are either misdelivered (or not delivered at all), thus affecting both overlay structure maintenance and connectivity within the virtual network.

The overlay also supports a DHT [107], which is used for dynamic virtual IP configuration of WOW nodes, summarized as follows. IPOP supports the creation of multiple mutually-isolated virtual networks (called IPOP namespaces) over a common P2P overlay. The virtual IP configuration of WOW nodes in each such private network is achieved using a decentralized implementation of the Dynamic Host Configuration Protocol (DHCP). The DHCP implementation uses a DHT primitive (called Create) to create key/value pairs mapping virtual network namespaces and virtual IP addresses uniquely to P2P identifiers. The Create primitive relies on the consistency of key-based routing to guarantee uniqueness of IP-to-P2P address mappings. That is, messages addressed to some key k must be delivered to the same set of nodes regardless of their originator.


Inconsistent routing, illustrated in Figures 5-3(a) and (b), can cause Create messages addressed to the same key from different sources to be routed to different nodes. This problem is also identified in [26] and is referred to as inconsistent roots, and it can lead to a situation where two WOW nodes claim the same virtual IP address.


71 ],theinabilityofaworkernodetoobtainanIPaddressimpliesitdoesnotjointhepool.Evenifanode\N"obtainsanIPaddress,ifitcannotcommunicatewiththecentralmanagernode\M",itisnotavailableforcomputation.Furthermore,theinabilityofnode\N'toroutetoaworkernode\W"preventsjobssubmittedby\N"toexecuteon\W".Allthesesituationsresultinthesystemnotbeingabletoachievablethemaximumavailablethroughputbecausenotallnodescanparticipateincomputations. Chapter 6 describesgenerallyapplicabletechniquesthatfacilitateconsistentstructuredroutingdespitetheconnectivityconstrainedpresentedbyawide-areaenvironment.TheninChapter 7 ,IevaluatetheapplicabilityofProximityNeighborSelection(PNS)inconjunctionwithnetworkcoordinatestoreducethelatencyofkeylookupsinIPOPoverlaysonPlanetLab.TechniquesarealsodescribedforselectingsuitableproxynodestoroutecommunicationbetweenvirtualIPnodes,whentopologyadaptationbasedonshortcutsisnotpossible. 86


In this chapter, I describe and evaluate two novel, synergistic techniques for fault-tolerant routing and structured overlay maintenance in the presence of network outages: annealing routing, an algorithm based on simulated annealing from optimization theory, and tunnel edges, a technique to establish connections between P2P nodes by tunneling over common neighbors. These are fully decentralized and self-configuring techniques that have been successfully implemented in the Brunet P2P system and demonstrated in actual wide-area PlanetLab deployments as well as in NATed environments with emulated pair-wise outages. The effectiveness of these approaches is analyzed for various system configurations with the aid of analytical models, simulation, and data collected from realistic system deployments.


The annealing routing algorithm is listed in Figure 6-1 and works as follows.

In lines 10-17, the node looks up its connection table to determine if it is adjacent to the destination in the identifier space. In that case, the node delivers the message locally and also sends it to the node on the other side (left or right) of the destination in the identifier space.

If the message has not taken any hops yet (i.e., it originated at the current node), it is sent to the closest node u. Otherwise (lines 23-29), until the message has taken MAX_UPHILL hops it is delivered to the closest node u or the next-closest node usec (if it was already received from the closest node u). Up to this point, the algorithm does not check for forward progress of the message towards the destination in identifier space.

Beyond MAX_UPHILL hops (lines 30-41), the message is sent to u or usec only if the next hop is closer to the destination than the previous hop. It should be noted that this condition only requires progress with respect to the previous node; it still allows a message to take one hop that is farther away from the destination than the current node.

The annealing algorithm is very useful for routing messages addressed to exact destinations, which include connection setup messages between P2P nodes, virtual IP packets between IPOP nodes, and the results of DHT operations sent back to the source node. In a perfectly-formed structured ring, the algorithm works exactly as the greedy algorithm and incurs the same number of hops.

When messages are addressed to DHT keys, this algorithm has a better chance of reaching the node closest to the key, by delivering the message at each local minimum.


Figure 6-1. Annealing routing algorithm.
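The sketch below re-expresses the routing decision of Figure 6-1 in Python. It is a simplification under stated assumptions: identifiers are integers on a ring of size RING_SIZE, connections is this node's connection table, other_side_of is an assumed helper that returns the neighbor on the far side of the destination when this node is adjacent to it in identifier space (None otherwise), and the MAX_UPHILL value is purely illustrative.

    RING_SIZE = 1 << 160          # assumed identifier space size
    MAX_UPHILL = 2                # illustrative; not fixed by the text

    def ring_dist(a, b):
        """Shortest distance between two identifiers on the ring."""
        d = abs(a - b) % RING_SIZE
        return min(d, RING_SIZE - d)

    def route(me, dest, prev_hop, hops, connections, other_side_of):
        """Return (deliver_locally, next_hop) for a message addressed to dest."""
        # Lines 10-17: adjacent to the destination, so deliver locally and also
        # pass the message to the node on the other side of the destination.
        other = other_side_of(dest)
        if other is not None:
            return True, other

        # Closest (u) and second-closest (usec) connections to the destination.
        ranked = sorted(connections, key=lambda n: ring_dist(n, dest))
        if not ranked:
            return True, None           # no connections: deliver locally
        u = ranked[0]
        usec = ranked[1] if len(ranked) > 1 else None

        if hops == 0:                   # the message originated at this node
            return False, u

        if hops <= MAX_UPHILL:          # lines 23-29: no progress check yet
            return False, usec if u == prev_hop else u

        # Lines 30-41: forward only if the candidate is closer to the destination
        # than the previous hop was (progress is measured against prev_hop).
        for cand in (u, usec):
            if cand is not None and cand != prev_hop and \
               ring_dist(cand, dest) < ring_dist(prev_hop, dest):
                return False, cand
        return True, None               # local minimum: deliver the message here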


The idea behind tunnel edges is as follows. Assume that each node in the network attempts to acquire connections to its closest 2m near neighbors on the P2P ring, m such neighbors on each side. Consider a situation where there is an outage between two adjacent nodes A and B on the P2P ring. Since both A and B also attempt to form near connections with 2m nodes each, their neighborhoods intersect at 2(m-1) nodes, as shown in Figure 6-2. For a tunnel edge to exist between A and B, there must be at least one node C in the intersection I to which both A and B are connected. Such a node C is a candidate to be used in tunneling the structured near connection between A and B.

This enhancement allows the connection state at a node to consistently reflect the overlay topology even when it is not possible to communicate with some neighbors using the conventional TCP or UDP transports. Tunnel edges are implemented (as described in Section 6.4) such that they are functionally equivalent to UDP or TCP edges once they are established, allowing seamless reuse of the code responsible for state maintenance and routing logic in the system.

Figure 6-2. Tunnel edge between nodes A and B, which cannot communicate over TCP or UDP transports.

Two important questions arise in the context of this proposed approach: what is the probability of a tunnel edge being formed between two nodes A and B? How many nodes are candidates for proxying tunnel edges? These questions are addressed analytically in this section.

Let p be the probability that an edge can be formed between a pair of nodes (so 1-p is the probability of a pair-wise outage). For a tunnel edge to exist between A and B, there must be at least one node C in the intersection I to which both A and B are connected. Assuming m near connections on each side of both nodes, a given node C in I can proxy the edge only if both the A-C and B-C edges exist, which happens with probability p^2; the probability that a tunnel edge between A and B can be formed through at least one of the 2(m-1) nodes in I is therefore

    P[tunnel edge between A and B] = 1 - (1 - p^2)^(2(m-1))
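A candidate-selection sketch for the forwarding set described above: each endpoint needs only the peer's list of structured near connections, which (as Section 6.4 explains) is exchanged through the connection protocol. The function and the toy node identifiers are illustrative assumptions.

    def tunnel_forwarders(my_connections, peer_connections):
        """Candidate proxies: nodes holding connections to both endpoints."""
        return set(my_connections) & set(peer_connections)

    # Toy example with made-up node identifiers:
    a_near = {"N105", "N110", "N120", "N125"}   # A's structured near connections
    b_near = {"N110", "N120", "N122", "N130"}   # B's, learned via the connection protocol
    forwarding_set = tunnel_forwarders(a_near, b_near)   # {"N110", "N120"}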


Table 6-1 shows the probability of forming a tunnel edge between unconnected nodes A and B for different values of the edge probability p and the number of near connections m. It should be noted that there is a sharp increase in the probability of being able to form a tunnel edge when nodes acquire more than 2 near connections on each side. This fact is also reflected in the simulation results, which show that improvements in the correctness of routing using tunnel edges are significantly higher when m >= 3. Figure 6-5 shows 3.9% broken pairs when m = 2 and 0.86% broken pairs when m = 3.

Now consider a situation where a tunnel edge involves exactly one forwarding node. When the forwarding node departs, the current node also loses the tunnel edge connection. Therefore, for fault tolerance, it is also important that the forwarding set of nodes for a tunnel edge contains more than one node. The probability that the forwarding set consists of at least 2 nodes is given by

    P[forwarding set has at least 2 nodes]
        = 1 - sum over k in {0,1} of P[forwarding set has exactly k nodes]
        = 1 - sum over k in {0,1} of C(2(m-1), k) * (p^2)^k * (1 - p^2)^(2(m-1)-k)


Table 6-1. Probability of being able to form a tunnel edge as a function of the edge probability and the number of required near connections on each side.
  edge prob    m=2      m=3      m=4      m=5
  0.70         0.7399   0.9323   0.9824   0.9954
  0.75         0.8085   0.9633   0.9929   0.9986
  0.80         0.8704   0.9832   0.9978   0.9997
  0.90         0.9638   0.9986   0.9999   0.9999

It can further be shown that if each node maintains O(log2 N) neighbors, then the probability of not being able to form a tunnel edge is:

    (1 - p^2)^(2(m-1)) = (1 - p^2)^O(m) = (1 - p^2)^O(log2 N) = O(N^(log2(1 - p^2)))

For p = 0.9, the above expression evaluates to O(N^(-2.39)). Therefore, as the network grows in size and nodes tend to acquire more near connections, tunnel edges become more and more probable.

Scenarios such as symmetric NATs, multi-level NATs and Internet route outages result in complex models for the likelihood of two nodes being able to communicate. For example, the likelihood of a node behind a symmetric NAT being able to form an edge with another arbitrary node depends on the fraction of nodes that are public (or behind full-cone NATs). In the multi-level NAT scenario (Figure 5-2), where the outermost NAT-P does not support "hairpinning", the likelihood of a node E forming an edge with another arbitrary node is a function of the fraction of nodes that are behind the same NAT-P but behind different semi-public NATs.
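As a quick numerical check of the expressions above, the following snippet evaluates the tunnel-edge probability and the forwarding-set bound; it reproduces the rows of Table 6-1 up to rounding. This is an illustrative calculation, not code from the system.

    from math import comb

    def p_tunnel(p, m):
        """Probability that at least one common neighbor can proxy a tunnel edge."""
        return 1 - (1 - p**2) ** (2 * (m - 1))

    def p_forwarding_set_at_least_2(p, m):
        q, n = p**2, 2 * (m - 1)      # q: a given common neighbor is usable
        return 1 - sum(comb(n, k) * q**k * (1 - q)**(n - k) for k in (0, 1))

    for p in (0.70, 0.75, 0.80, 0.90):
        print(p, [round(p_tunnel(p, m), 4) for m in (2, 3, 4, 5)])
    # e.g. p = 0.70 -> [0.7399, 0.9323, 0.9824, 0.9954], matching Table 6-1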


Intheabsenceofanypublishedworkthatprovidesafaultmodeltocaptureallsuchscenarios,thelikelihoodofanedgebetweenapairofnodeshasbeenmodeledwithauniformpair-wiseedgeprobability,andallowingforhighprobabilitiesofP2Pedgesnotbeingabletoform{ashighas30%. 1. Attempttoaddnearconnectionstotheimmediatemneighbors(oneitherside)respectingtheconnectionmatrix. 2. Iftunnelingisenabled:identifyallthemissingconnectionsbetweenpairsofnodes,computetheoverlapoftheirconnectiontablestoseeiftunnelingispossible,andaddthepossibletunneledgestothenetwork. 3. Iftherearenodeswithfewerthanmconnectionsoneachside:eachsuchnodetriestoacquiremorenearconnections(toitsclosestneighbors,andfullyrespectingtheconnectionmatrix),untilithassuccessfullyacquiredmnearconnectionsoneachside. 4. Iftherearenodeswhichacquiredmorethanmconnectionsoneachside,theseexcessconnectionsaretrimmedinthesubsequentstep. 5. Iteratively,attempttoaddafarconnectionateachnode.Thedistancestraveledbytheseconnectionsinthestructuredringfollowthedistributiondescribedin[ 45 ].Thisstepisrepeateduntileverynodehassuccessfullyacquiredatleastonefarconnectionthatisallowedbytheconnectionmatrix. Theall-to-allroutabilityofthenetworkisstudiedbysimulatingthesendingofamessagebetweeneachpairofnodes,andcountthenumberoftimesthemessageis 94


Atedgelikelihoodof70%(0.7),thepercentageofnon-routablepairsvariesfrom9.5%to10.9%(thetotalnumberofpairsinthesimulatednetworkis1,000,000) incorrectlydelivered.Thisexperimentisconductedthisfor200dierentrandomlygeneratedgraphs.Toinvestigatecorrectroutingofkeys,Irandomlygenerate10000dierentkeys.Foreachkey,Isimulatethesendingofamessageaddressedtothatkeyfromeachnodeasthesource,andcountthenumberoftimesthelookupiswronglydelivered,i.e.tonodesotherthanthenodeclosesttothekeyinidentierspace.Weconductthisexperimentfor200dierentrandomlygeneratedgraphs. Figure 6-3 showsthenumberofnon-routablepairs(outof10001000possiblepairs)ofnodesfordierentvaluesofthenumberofnearneighborsm,whenneitherannealingroutingnortunneledgesareused.Itisobservedthatasedgelikelihooddropsto70%,theall-to-allroutabilityofthenetworkdropstolessthan90%,i.e.morethan10%ofpairsarenon-routable.Similarobservationsarealsomadefortheaveragenumberofinstanceswhenkeyswerewronglyrouted(seeFigure 6-4 ).Asedgelikelihooddropsto70%,thereismorethan10%chancethatakeyiswronglyrouted.Furthermore,keepingmorenearconnectionsateachnodeonlymarginallyimprovesthenetworkroutability. 6-1 isshowninFigure 6-6 ,withm=3;tunneledgesarenotenabled.Itisobservedthat,atanedge 95


Atedgelikelihoodof70%,thepercentageofwronglyroutedkeysvariesfrom9.5%to10.7%(thetotalnumberofsimulatedmessagesis10,000,000) likelihoodof85%,thepercentageofnon-routablepairswithannealingroutingisabout0.6%,whichislessthanone-fthofthepercentagewhengreedyrouting(3.3%)isused.Evenwhentheedgelikelihooddropsto70%,thepercentageofnon-routablepairs(lessthan3.4%)isstilllessthanhalfofthatwhengreedyroutingisused(morethan10%).Itshouldalsobenotedthatannealingroutingwithm=3ismorelikelytoreachthecorrectdestinationthanusinggreedyroutingwithm=5,forthesameedgelikelihoodinthisnetworkof1000nodes. Theaveragenumberofhopstakenbyamessageforbothgreedyandannealingroutingwasalsomeasuredineachsimulation.Inaperfectlyformedstructurednetwork,bothroutingalgorithmsincurexactlythesamenumberofhops.Otherwise,theaveragenumberofhopsbetweenP2Pnodesforannealingisalmostthesameasforgreedyrouting.Foranedgelikelihoodof70%andm=3,theratioofnumberofhopsincurredbyannealingtothatofgreedyis1.01.Therefore,annealingroutingonlyincursamarginaloverheadintermsofnumberofhops. Figure 6-7 showstheaveragenumberofwronglyroutedkeylookupsforbothannealingandgreedyroutingusingthemethodologyasdescribedinSection 6.3.1 .Atanedgelikelihoodof70%(m=3),annealingroutingreducesthechancesofakeybeingwronglyroutedfrom10.2%to3.4%.Bydeliveringamessageatmorethanonenode,the 96


Comparing greedy routing with tunnel edges for m = 3 and m = 2. At an edge likelihood of 70%, the percentage of non-routable pairs in a network of 1000 nodes is (1) 3.9% for m = 2, and (2) 0.86% for m = 3.

Average number of non-routable pairs. At an edge likelihood of 70%, the percentage of non-routable pairs for greedy and annealing routing is (1) without tunnel edges, 10.26% and 3.4%, respectively; (2) with tunnel edges, 0.86% and 0.21%, respectively. At an edge likelihood of 95%, there are no non-routable pairs with tunnel edges.


Average number of wrongly routed keys. At an edge likelihood of 70%, the percentage of wrongly routed keys for greedy and annealing routing is (1) without tunnel edges, 10.2% and 3.4%, respectively; (2) with tunnel edges, 0.86% and 0.19%, respectively.

The effect of tunnel edges on the number of non-routable pairs is shown in Figure 6-6, for m = 3. It is observed that at an edge likelihood of 70%, tunnel edges substantially reduce the percentage of non-routable pairs of nodes from 3.4% to 0.21% for annealing routing (and from 10% to 0.86% for greedy routing).

Each virtual hop over a tunnel edge actually corresponds to two overlay hops. The actual number of hops taken by messages addressed to exact destinations in an overlay that supported tunnel edges was also recorded. For an edge likelihood of 70% and m = 3, the ratio of the number of actual hops to that of virtual hops was observed to be 1.14, which is a small overhead considering the improvement in routability.

Figure 6-7 also compares how tunnel edges improve the consistency of key routability of the network, for m = 3. It is observed that, at an edge likelihood of 70% with tunnel edges, the chances of a key being wrongly routed are 0.86% for greedy routing (and 0.19% for annealing routing).


2) that uses the P2P overlay to rendezvous with a remote node for out-of-band exchange of information relevant for communication (through ConnectToMe (CTM) messages), followed by a bidirectional linking protocol that establishes the connection. The connection protocol allows nodes to exchange their NAT-assigned IP address/port for hole-punching. To implement tunnel edges, the same mechanism is also used to exchange information about connections to near neighbors.

Each connection in Brunet is based on an edge. Each node has one or more Uniform Resource Identifiers (URIs) that abstract the edge protocols it can support and the endpoints over which it can communicate. For each type of edge, an EdgeListener is responsible for creating and maintaining edges of that type, and also for sending and receiving messages over connections using that edge type. For example, to create an edge with another node using the URI ipop.udp://128.227.56.123:4000, the UdpEdgeListener is invoked, whereas to communicate with the same node using the URI ipop.tcp://128.227.56.123:4001, the TcpEdgeListener is invoked. A Brunet node can have more than one EdgeListener, and new types can easily be added.

Before describing the process of creating a tunnel edge, I overview the functionality that allows each Brunet node C to also act as a message forwarder for communication between two nodes A and B. The message from the original source A is encapsulated inside a forward request message addressed to node C. When node C receives the message from A, it extracts the original message (from A to B), and sends it to node B. This functionality is used by a new Brunet node to identify its left and right neighbors in the P2P ring [85].

The tunneling of a connection between nodes A and B over common neighbors is achieved by implementing an EdgeListener called TunnelEdgeListener. The URI for a node corresponding to the tunnel edges is computed dynamically by concatenating the
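Stepping back to the edge abstraction described above, the dispatch from a URI to the EdgeListener responsible for its transport can be pictured roughly as in the following sketch. Brunet itself is written in C#, so this Python rendering, and the class and method names in it, are illustrative only and not Brunet's actual API.

class EdgeListener:
    """One listener per edge type (UDP, TCP, tunnel, ...)."""
    scheme = None                           # e.g. "ipop.udp" or "ipop.tcp"

    def create_edge(self, remote_uri):      # create and maintain an edge of this type
        raise NotImplementedError

    def send(self, edge, packet):           # send a packet over an edge of this type
        raise NotImplementedError

class Node:
    def __init__(self, uris):
        self.uris = uris                    # URIs this node can be reached at
        self.listeners = {}                 # scheme -> EdgeListener

    def register(self, listener):
        # New edge types are added simply by registering another listener.
        self.listeners[listener.scheme] = listener

    def connect(self, remote_uri):
        # "ipop.udp://128.227.56.123:4000" is handled by the UDP listener,
        # "ipop.tcp://128.227.56.123:4001" by the TCP listener, and so on.
        scheme = remote_uri.split("://", 1)[0]
        return self.listeners[scheme].create_edge(remote_uri)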


When a node A has computed the forwarding set F with a remote node B, it then sends this information to B in an EdgeRequest using the forwarding services of one of the nodes in F. When B receives an EdgeRequest, it replies back with an EdgeResponse and also records the new tunnel edge. On receiving the EdgeResponse, node A also records the new tunnel edge. Once the tunnel edge is successfully created, nodes A and B can subsequently create a connection between them.

This implementation does not require nodes in the forwarding set to keep any state about the tunnel edges that are using them. Furthermore, the periodic ping messages to maintain a connection based on a tunnel edge also keep the underlying connections alive. Therefore, no extra overhead is incurred by nodes in the forwarding set. The forwarding set for a tunnel edge can change over time as connections are acquired or lost. To keep the forwarding set up to date and synchronized, nodes A and B notify each other about the changes in their connections.

When a node joins an existing overlay and cannot communicate with its immediate left and right neighbors, its tunnel URI is initially empty since it does not have any connections yet. However, it is possible that the new node can communicate with its other near neighbors; it must therefore first try to form connections with them, and then use those connections to form tunnel edges with its immediate neighbors on the P2P ring. The new node learns about its other close neighbors through the CTM messages it receives from its immediate neighbors, which also contain a list of their near connections.
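A rough sketch of the tunnel-edge handshake just described is given below. It is written against a hypothetical send_via(relay, destination, message) overlay primitive and simplified message dictionaries; the real TunnelEdgeListener exchanges Brunet protocol messages, so this only illustrates the EdgeRequest/EdgeResponse flow and the synchronization of the forwarding set.

class TunnelEdge:
    """State an endpoint keeps for one tunnel edge (illustrative only)."""
    def __init__(self, peer_addr, forwarders):
        self.peer_addr = peer_addr
        self.forwarders = set(forwarders)    # shared neighbors used as relays

def compute_forwarding_set(my_connections, peer_near_connections):
    # Overlap between A's connections and B's near connections,
    # as learned from the CTM message B sent to A.
    return set(my_connections) & set(peer_near_connections)

def start_tunnel_edge(send_via, my_connections, peer_addr, peer_near_connections):
    F = compute_forwarding_set(my_connections, peer_near_connections)
    if not F:
        return None                          # no overlap: tunnel edge cannot form
    relay = sorted(F)[0]
    # The EdgeRequest travels through a member of F; the peer records the new
    # tunnel edge and answers with an EdgeResponse, after which A records it too.
    send_via(relay, peer_addr, {"type": "EdgeRequest", "forwarders": sorted(F)})
    return TunnelEdge(peer_addr, F)

def on_connection_lost(send_via, edge, my_connections):
    # Drop relays we no longer have a connection to, and tell the peer so its
    # view of the forwarding set stays synchronized with ours.
    edge.forwarders &= set(my_connections)
    if edge.forwarders:
        relay = sorted(edge.forwarders)[0]
        send_via(relay, edge.peer_addr,
                 {"type": "TunnelSync", "forwarders": sorted(edge.forwarders)})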


When two adjacent P2P neighbors cannot form a connection, it is likely that crawling the network using neighbor information will skip a node. If the next reported node has a connection with the missing node, an inconsistency will be reported. However, in case even the second node does not have a connection to the missing node, the inconsistency may go unnoticed. It is therefore still possible to observe a 100% consistent ring with a few nodes completely missing. These hidden nodes can be detected using information logged by Brunet at each node; knowledge of the number of nodes and their identifiers is also available.

The effect of the presented techniques with respect to overlay structure is demonstrated both in a synthetic environment with artificially created situations that prevent connection setup, and in a large-scale PlanetLab environment, which is known to exhibit route outages between pairs of hosts.
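Returning to the crawl-based consistency check described above, the hidden-node condition can be tested offline from the crawl output and the per-node logs; the sketch below is illustrative only, with a plain dictionary standing in for the crawler's reported right-neighbor relation.

def crawl_ring(start, right_neighbor):
    # Walk the ring by repeatedly following each node's reported right neighbor.
    seen, cur = [], start
    while cur not in seen:
        seen.append(cur)
        cur = right_neighbor[cur]
    return seen

def hidden_nodes(known_nodes, start, right_neighbor):
    # Nodes that appear in the logged membership but are never reached by the
    # crawl are "hidden": the ring can look 100% consistent even though these
    # nodes are missing from it.
    return set(known_nodes) - set(crawl_ring(start, right_neighbor))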


Each node was configured to form connections with 3 neighbors on its immediate left and right. Of the total of 6180 structured near connections reported by all nodes, about 4926 (70%) existed between nodes which were on different private networks. These connections were not possible without decentralized NAT traversal. The P2P ring was 100% consistent.

The P2P ring contained 35 pairs of adjacent nodes that could not set up a UDP connection because of firewall rules. These pairs of nodes were however able to connect using tunnel edges, thus rendering a complete P2P ring.


Adjacent P2P nodes in the wide-area deployment that could not communicate using TCP or UDP (PlanetLab hosts with pair-wise routing outages):

planetdev05.fm.intel.com | pli2-pa-3.hpl.hp.com
planetlab-1.fokus.fraunhofer.de | planetlab2.cosc.canterbury.ac.nz
sjtu2.6planetlab.edu.cn | planetlab1.ucb-dsl.nodes.planet-lab.org
planetlab2-tijuana.lan.redclara.net | pli1-pa-3.hpl.hp.com
planetlab-1.man.poznan.pl | pli2-pa-2.hpl.hp.com
planetlab2.ls.fi.upm.es | planetlab1.cosc.canterbury.ac.nz
planetlab2.cosc.canterbury.ac.nz | planetlab3-dsl.cs.cornell.edu
athen.dvs.informatik.tu-darmstadt.de | uestc2.6planetlab.edu.cn
uestc2.6planetlab.edu.cn | planetdev02.fm.intel.com

These adjacent P2P node pairs (Table 6-2) could not communicate using TCP or UDP transports. Their inability to connect, which was observed indirectly through the fact that tunnel edges had been created, was verified directly by logging into each host and observing that ICMP messages (and SSH connections) to its peer did not go through either.

The ability of tunnel edges to form was further evaluated by deploying an additional 20 P2P nodes on host H1 and 20 nodes running on host H2. These nodes were configured to use only UDP transports, and their hosts H1 and H2 were configured to drop UDP packets between them, thus modeling a scenario where there is a routing outage between two sites. Two instances were observed where one of the adjacent pairs was running on H1, while the other was running on H2. Tunnel edges formed between these nodes in both cases, again rendering a 100% consistent P2P ring. Without tunnel edges, these nodes would have had inconsistent views of their local neighborhoods (in identifier space), and messages addressed to them were likely to be misdelivered.

The delay incurred by a new P2P node (on a home desktop) to get connected with its left and right neighbors on the ring using tunnel edges was also measured, over several trials. The average time to get connected with neighbors is less than 10 seconds, using UDP or TCP. The home desktop did not have an Internet path to a few nodes on PlanetLab, and every time it became a neighbor to one of these nodes, it relied on tunnel edges to get connected, which took 41 seconds on average.


The current linking protocol for connection setup is executed through one or more linkers; each linker sends link messages using the different URIs of the remote node in parallel until it starts receiving replies. Only one linker is active at a time, during which it sends several link messages over a URI until it starts receiving replies or gives up. Initially, the new node does not have any connections to tunnel over and its tunnel URI is empty. The first linker that is created can thus only succeed using TCP or UDP. When TCP or UDP communication is not possible, it takes several attempts for the linker to finish and the next linker to be activated. In some cases, the linker containing a usable tunnel URI (created after the node has acquired a few connections) is still waiting in the queue. An alternative implementation is possible that allows updates to the tunnel URI listed in the currently active linker, which would obviate the need to wait for the next linker.

status command) was measured, which is representative of the achievable throughput of the Condor pool. Furthermore, once a worker has been chosen for job execution by the Condor manager through matchmaking, the process of job submission involves direct communication between the submit node and the worker. The all-to-all connectivity between worker nodes is also reported. The connectivity within the WOW


Initially, P2P edges are allowed to form without constraints. The Condor manager reported all 180 workers, and the workers were all-to-all connected. The P2P ring was 100% consistent. To create situations where direct communication was not always possible, the UdpEdgeListener at each node was configured to deny UDP-based connections with a probability of 0.10. The probability of two nodes being able to form a UDP-based connection is thus given by (1 - 0.10)^2 = 0.90^2 = 0.81.

The P2P nodes were then configured to use only greedy routing and no tunnel edges. The Condor manager reported at most 160 nodes, i.e., only 88% of the worker nodes were available. In addition, there were 6020 pair-wise worker connections (out of 180 x 180) that could not form. In another experiment, the P2P nodes were configured to use annealing routing but no tunnel edges. The Condor manager reported 177 worker nodes, and there were 859 pairs of workers that could not communicate with each other.

Finally, both annealing routing and tunnel edges were enabled. The P2P ring consisting of 201 nodes (20 bootstrap, 1 manager and 180 workers) reported 40 tunnel edges, which formed when UDP communication was denied by one of the UdpEdgeListeners between adjacent P2P nodes. Only one inconsistency was observed on the P2P ring, where a tunnel edge did not form because the P2P nodes did not have any overlap in their UDP-based connections; the overlapping connections were already based on tunnel edges, and the existing implementation does not support recursive tunneling. The Condor manager reported all 180 workers, and there were only 7 pairs of workers that could not communicate.

In [8], the authors describe the implementation of a Sockets library that can be used by applications for communication between nodes subject to a variety of constraints in wide-area networks. My work, on the other hand, investigates an approach where


Structured P2P systems (Chord [40], Pastry [39], Bamboo [89], Kademlia [100]) have primarily focused on efficient overlay topologies [27], reliable routing under churn [89][101], and improving the latency of lookups through proximity-aware routing [102]. In [26][105], the authors describe the effect of a few (5% broken pairs) Internet routing outages on wide-area deployments of structured P2P systems. My focus, on the other hand, is to enable overlay structure maintenance when a large majority of nodes are behind NATs and several scenarios hinder communication between nodes. The techniques described in this chapter facilitate correct structured routing, even when many (up to 30%) pairs of nodes cannot communicate directly using TCP or UDP.

In [99], the authors present techniques to provide content/path locality and support for NATs and firewalls, where instances of conventional overlays are configured to form a hierarchy of identifier spaces that reflects administrative boundaries and respects connectivity constraints among networks. In a Grid scenario, however, network constraints are not representative of collaboration boundaries, as virtual organizations (VOs) are known to span multiple administrative domains.

A technique similar to tunnel edges is also described in [87], in the context of a P2P-based email system built on top of Pastry. My work, on the other hand, uses tunneling to improve all-to-all virtual-IP connectivity between WOW nodes. I also quantify the impact of the described techniques on structured routing through simulations, under different edge probabilities between nodes. Unmanaged Internet Protocol (UIP) [108] proposes to use tunneling in the Kademlia DHT to route between "unmanaged" mobile devices and hosts in ad hoc environments, beyond the hierarchical topologies that make up the current Internet. My focus, however, is to facilitate IP communication between Grid resources in different "managed" Internet domains.


In [109], the authors describe an algorithm for providing strong consistency of key-based routing (KBR) in dynamic P2P environments, characterized by frequent changes in membership due to node arrivals and departures. The improvements in eventual consistency obtained by using the techniques described in this chapter can also benefit the implementation of strongly consistent KBR. Similarly, in [110], the authors provide asymptotic upper bounds on the number of hops taken by messages under varying rates of link and node failures, and describe heuristics to improve routing under those failures. However, their work does not consider failures of links with neighbor nodes and the subsequent impact on consistent structured routing. To complement fault-tolerant routing, the tunneling technique described in this chapter also attempts to correct the overlay structure in the presence of link failures.

The current implementation of tunnel edges "passively" relies on an overlap existing between the connections of two nodes for forming a tunnel edge between them. Symmetric-NATed nodes can be difficult to handle using this implementation, since they can only communicate with public nodes (or nodes behind cone NATs). These nodes may not




In structured P2P systems such as IPOP, nodes arrange themselves in a well-defined topology that is dictated by their randomly chosen node identifiers. As a result, structured routing is oblivious of the geographical location of nodes. The resulting routing delays affect the latency observed by services such as the DHT operations, the connection setup protocols, and the applications of the IPOP virtual network when shortcut connections cannot form. Furthermore, several tools used to gather information about the IPOP deployments on PlanetLab (such as the P2P ring crawler described in Chapter 6) also rely on overlay routing to communicate with a node.

Techniques [111] have been proposed to embed locality information into node identifiers to make the overlay routing (which follows the node identifiers) latency-aware; however, doing so adversely affects the uniform distribution of node identifiers and subsequently the bounds on the average number of overlay hops and DHT data reliability. A well-known technique called Proximity Neighbor Selection (PNS) [27] reduces the latency of structured P2P routing by requiring nodes to consider proximity information while choosing only some (not all) of their connections. It works with random assignment of P2P identifiers and does not affect the bounds on the number of overlay hops and DHT reliability. In this chapter, an implementation of PNS in the IPOP system is presented, and is shown to achieve up to 30% improvement in the route latency of IPOP overlays deployed on PlanetLab.

As described earlier, connectivity constraints in the wide area can prevent creation of adaptive shortcut connections between IPOP nodes. Even with PNS, the latency of (multi-hop) P2P routing of virtual IP packets is still very high for many applications. In such cases, it is possible to establish a 2-hop overlay tunnel between IPOP nodes, by selecting a proxy node based on some criteria. This chapter also investigates techniques to


Both PNS and proxy discovery require knowledge of the Internet latencies between pairs of nodes in the network. One possibility is to take explicit round-trip time (RTT) measurements to another node. However, in a heavily NATed environment (as depicted in Sections 5.1 and 6.5.1.1), measuring latency to an arbitrary node may also require setting up a connection to that node through hole-punching, which can incur a messaging overhead of up to 10.2 Kbytes. Since connection setup incurs non-trivial cost and delay, it is not useful to establish connections for short-lived communication with arbitrary nodes, such as measuring the RTT to a node. To alleviate this need for creation of short-lived connections, a low-overhead technique based on network coordinates is used that allows arbitrary pairs of nodes to estimate Internet latencies to each other without requiring an explicit measurement.

The rest of this chapter is organized as follows. In Section 7.1, I describe network coordinates and their implementation in the IPOP system. Section 7.2 focuses on improvements in overlay routing: I describe PNS, its implementation in IPOP, and evaluate the corresponding improvements in route latency of IPOP overlays on PlanetLab. The focus of Section 7.3 is on improving IPOP latency by setting up a 2-hop path through a suitably chosen proxy node when direct communication is not possible between end-nodes. I describe and evaluate techniques to discover proxy nodes that minimize latency between end-nodes, under different scenarios.

One class of network coordinate algorithms relies on a fixed set of landmark nodes (GNP [112], Lighthouses [113]), and all other nodes compute their coordinates by measuring latency to these landmark nodes.


A second class of algorithms, exemplified by Vivaldi [28], computes coordinates in a fully decentralized manner, and there is no dependence on external nodes. The second class of algorithms is more suitable for an autonomous system such as IPOP, since they facilitate deployment and maintainability by allowing all nodes in the system to execute the same code.

In the Vivaldi algorithm, all nodes in the system initially start at random points in space. By periodically taking RTT samples to a fixed set of a few random nodes, a node adjusts its coordinates. Gradually, these coordinates evolve to represent the Internet latencies between nodes. The Vivaldi algorithm is described in detail in Appendix B.

Support for Vivaldi network coordinates has been incorporated into the Brunet P2P system. Periodically, a node takes an RTT measurement to one of its existing connections. These measurements do not impose any extra overhead, since they supplement the periodic ping messages that keep idle connections alive. The periodic RTT measurements are taken at the application level, and hence exhibit high variances that can cause slow convergence and instability. The implementation of network coordinates in the IPOP system uses statistical filtering of latency samples [114] to achieve tolerance to the noise in RTT measurements, while still being responsive to sustained changes in latencies.

Figure 7-1 shows the cumulative distribution function (CDF) of the estimation error for latencies between pairs of nodes (using network coordinates) on an overlay consisting of more than 350 nodes on PlanetLab, measured at different instants. Each node in the overlay acquires about 15 connections on average, which it periodically samples every 10 seconds. The embedding space for the network coordinates is 2-dimensional, with a scalar height vector. The prediction accuracy of network coordinates is observed over time. After 5 hours of bootstrapping the system, the median prediction error is observed to be about 15%, which is also in agreement with other studies on using network coordinates [115]. Even though a network of 350 nodes, all bootstrapped at once, takes about 5 hours to achieve this accuracy, such networks typically grow incrementally, one node at a time. Related work [28] has shown that once there is a critical mass of well-placed nodes in a Vivaldi network, a new node joining the system needs to make few measurements in order to find a good place for itself.


Relative error for round-trip time prediction using network coordinates, measured after 1, 3 and 5 hours of bootstrapping.

This section describes an implementation of Proximity Neighbor Selection (PNS) [27] to improve the latency of multi-hop structured P2P routing in IPOP, by requiring P2P nodes to select a subset of their connections based on network proximity.


Proximity neighbor selection

PNS requires nodes to also consider network proximity while choosing far connections, as illustrated in Figure 7-2. In this figure, node A considers a range R of size O(log(n)) nodes starting at the random target node B (in identifier space), selected using the technique described above. Instead of connecting to node B, node A connects to the node B' in range R that has the smallest Internet latency to itself. In addition, each node also keeps a connection to the closest node (in terms of latency) among its log(n) nearest neighbors in identifier space. In general, an overlay path between two nodes (adequately separated in identifier space) is dominated by far connections, since these connections travel most of the distance in identifier space. The number of hops over far connections is thus a function of n, while only the last few (constant, say M) hops are over near connections. By selecting far connections based on proximity, messages can avoid taking long-latency hops for most of their progress towards the destination, thus leading to a reduction in overlay latency. Furthermore, choosing far connections using this technique also preserves the bound on the number of overlay hops between pairs of nodes to O(1




Given the above functionality, a component called VivaldiTargetSelector (VTS) has been implemented that enables a node to discover the closest node B' in the range R starting at its guessed random address s. The VTS takes as input (1) the start address s of the range and (2) the size r of the range. Using the start address s as forwarder, it sends a query requesting the local network coordinates to the directional address "Left" with TTL set to r, and routing mode path-deliver. This query is then delivered to each node in the range of size r to the left of address s, and the results are communicated back to the source node. The VTS then measures the distance in the network coordinate space to each candidate node, and selects the closest node B' for connection setup. An estimate of the network size is made using the address range spanned by the near connections at a node. Periodically, every 300 seconds, a node randomly selects a far connection to check whether it is still optimal, by querying the same range R for network coordinates. In case the connected node is no longer the closest, the connection is trimmed. These periodic checks take care of the evolution of network coordinates over time and also the arrival of a closer node in the range R.
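A condensed sketch of this selection logic is given below. It assumes the VTS query has already returned the list of (address, coordinates) candidates for the range R and uses plain Euclidean distance between coordinates (the deployed implementation also carries Vivaldi's scalar height component); all names are illustrative.

import math

def coord_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def pick_far_target(my_coords, candidates):
    # candidates: list of (address, coords) returned by the directional "Left"
    # query (TTL = r, path-deliver mode) over the range R of size O(log n).
    # Connect to the candidate closest to us in the coordinate space.
    return min(candidates, key=lambda c: coord_distance(my_coords, c[1]))[0]

def recheck_far_connection(my_coords, current_far_addr, candidates):
    # Run every ~300 s: if a closer node has appeared in R (or coordinates
    # have drifted), trim the current far connection and pick the new target.
    best = pick_far_target(my_coords, candidates)
    return None if best == current_far_addr else best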


The cumulative distribution of the observed round-trip delay is shown in Figure 7-3. It is observed that PNS results in up to 40% reduction in the average overlay latency. This reduction in overlay latency is also reflected in the decrease in crawl times and DHT operations on IPOP overlays, thus validating the applicability of PNS and network coordinates to structured overlays deployed in the wide area.

The ability of network coordinates to correctly predict the closest node in each queried range R is also evaluated. The candidate set (with their coordinates) for each PNS query is logged. The round-trip times (RTTs) between PlanetLab nodes, as measured by ICMP pings, are then used to determine the percentage of nodes that were closer to the current node than the one selected by the VivaldiTargetSelector (using network coordinates).


Round-trip times (RTTs) for overlay pings with PNS and without PNS. The average RTT is (1) 1.75 secs (with PNS), (2) 2.86 secs (without PNS).

Percentage of nodes in the queried range R that are closer to the source node than the one selected by network coordinates. On average, 8.76% of the nodes in the queried range R are closer than the selected node.


Relative latency error for choosing the closest node in the network coordinate space. On average, the relative latency error is 1.432.

This distribution is shown in Figure 7-4. It is observed that with probability greater than 0.60, network coordinates can correctly identify the closest node, while with probability 0.90 the selected node is among the closest 20% of the queried nodes.

The relative latency error for each selection is defined as (d_sel - d_opt)/d_opt; the corresponding distribution is shown in Figure 7-5. It is observed that in more than 80% of the cases, the relative latency error is less than 0.50.

Even with PNS, the average latency on the overlay is still observed to be very high (in excess of a second) for efficient routing of virtual IP packets when direct communication is not possible between IPOP nodes. In the next section, I investigate techniques to improve the end-to-end performance of the IPOP virtual network in such scenarios, by setting up a 2-hop communication path through proxy nodes with desired capabilities.


The latency of virtual IP communication can be reduced by setting up a 2-hop communication path between end nodes A and B (that cannot establish direct communication) through another node P in the network that has sufficient resources to efficiently route virtual IP packets. A similar approach is also used in the Skype [23] system. In Skype, some nodes in the system are elevated to the status of supernodes, on the basis of their capabilities (CPU, memory and disk) and network connectivity. These nodes can proxy communication between other nodes in the system. The information about the supernodes is maintained in a central directory. It is retrieved by a new node on start-up, and cached locally for the future. In a system such as IPOP where nodes belong to different organizations and individuals, managing such central servers is a problem, as described in Chapter 4. To achieve autonomous operation of the IPOP virtual network, decentralized techniques to discover proxy nodes are more suitable, and are investigated in this chapter.

In an IPOP overlay, a node is only aware of a few other nodes, the ones it is connected with. In order to discover a proxy node P to communicate with another node B in the network, a node A must be able to query the network for a node with certain desired characteristics that depend on the nature of the application: an interactive application might require a proxy node such that the end-to-end delay is within a delay budget, whereas a data-transfer application would need a proxy that will provide sufficient


[19] uses a central manager to match application requirements to resource characteristics. Grid technologies [116] require each participating site in the collaboration to maintain a local directory, which maintains information about the local resources at that site. These systems require setting up and maintaining dedicated servers, which becomes difficult when users and resource providers are individual users on the Internet.

SWORD [97] presents an extensive framework for discovering wide-area hosts for service deployment that meet certain user-defined criteria. Each resource is described by a set of attribute values which can be time-varying. Wide-area hosts form a structured overlay where, besides being a resource that periodically publishes its state, every host is also responsible for storing part of the information about other resources. XenoSearch [117] is another system for discovering suitable Xen hosts on the Internet, for deploying services inside VM containers. The absence of special servers to store resource characteristics makes these systems highly self-managing, and thus more suitable for a fully decentralized environment.

Both these systems use a DHT to store resource attributes. For each relevant resource attribute, an entry (attribute value, resource identifier) is created in the DHT. This information is only stored as soft state since attribute values may vary over time. In addition, SWORD also handles range queries on attribute values. Many resources might satisfy the constraints imposed by a query, which affects the query processing cost. Each query therefore also specifies the maximum cost that must be incurred to process it.


The quality of proxy selection in this approach is limited by the connections at a node, which are chosen randomly. The next section describes an approach that can search other nodes in the network, beyond the existing connections at a node, to find proxy nodes that minimize end-to-end latency.


Prefix Hash Trees (PHTs) [118] have been proposed to efficiently store multi-dimensional attributes such as network coordinates, and to perform range queries on them. However, network coordinates are dynamic and error-prone; therefore, using a PHT to store these attributes can be overkill. Instead, it is possible to directly query nodes for their network coordinates. I have developed a generic framework based on map-reduce for parallel execution of such distributed queries and aggregation of results from a large number of nodes, which is described in Appendix C. I now describe the search algorithm.

The goal of this algorithm is to find a proxy node such that the sum S of distances (A to P and P to B) is within a fraction f of the direct distance A to B. The search query is executed using the map-reduce resource discovery framework by broadcasting over different segments of the P2P ring, as shown in Figure 7-6.

The Map function takes as arguments the coordinates of nodes A (nca) and B (ncb), and a delay fraction f. If the sum Sp of the distances of the local coordinates (nclocal) from the end nodes (i.e., |nca - nclocal| + |ncb - nclocal|) is within fraction f of the direct distance (i.e., f * |nca - ncb|), it returns a list containing a single tuple (Sp, Identifierp). Otherwise it returns an empty list. The Reduce function merges the child list of tuples with the current list, based on the first component (sum of distances) in each tuple, to build a single list of at most K (typically 10) tuples with the best sums.

In Figure 7-6, node A first queries a segment of the ring of size r, starting at itself, for the best K nodes that match the criteria. If no such node is found, node A then queries the next segment, which is twice as big as the current segment. This expanding ring search terminates once at least one node that matches the criteria has been found, or when the entire ring has been queried. In the rest of this chapter, I refer to this approach as ESS.
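The Map and Reduce functions of ESS translate almost directly into code. The sketch below, with K = 10 and plain Euclidean distance between coordinates, is a simplified restatement of the description above rather than the deployed implementation.

K = 10  # number of best candidate proxies carried back up the tree

def dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def ess_map(nc_a, nc_b, nc_local, local_identifier, f):
    # A node qualifies as a proxy candidate if the 2-hop coordinate distance
    # A-P plus P-B is within a fraction f of the direct distance A-B.
    s_p = dist(nc_a, nc_local) + dist(nc_b, nc_local)
    if s_p <= f * dist(nc_a, nc_b):
        return [(s_p, local_identifier)]
    return []

def ess_reduce(current_list, child_list):
    # Merge the child's candidates into the current list, keeping the K tuples
    # with the smallest sums of distances.
    return sorted(current_list + child_list)[:K]

The expanding search then repeatedly doubles the queried segment and re-applies these functions until the merged list is non-empty or the whole ring has been covered.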


Expanding Segment Search (ESS)

Table 7-1 shows the 6 different overlays on PlanetLab that were used for the experiments. The overlays within each set ((Exp1, Exp2), (Exp3, Exp4) and (Exp5, Exp6)) were deployed simultaneously on two different slices, with and without PNS, respectively. These overlays were crawled to collect the list of connections (ordered by latency) and the network coordinates at each node. The average number of connections acquired by overlay nodes was observed to be 15. The ordered list of connections is then used to determine the proxy node that would have been selected using CC, for each pair of nodes. Likewise, it is also possible to simulate the ESS algorithm offline for each pair of end nodes, given the node identifiers and coordinates (both of which are available from the crawl).

The correct ordered list of proxies (ordered by end-to-end latency through the proxy node) for each pair of end nodes was computed from the ICMP ping latencies between nodes. For each pair, I refer to the most ideal proxy as popt, and the end-to-end latency when using that proxy node as dopt. Likewise, I refer to the proxy selected by the described


Configurations of overlays used to evaluate CC and ESS

Overlay   # of nodes   PNS
Exp1      485          Yes
Exp2      448          No
Exp3      482          Yes
Exp4      454          No
Exp5      488          Yes
Exp6      452          No

techniques as p, and the end-to-end latency through that proxy as dp. To compare the quality of the proxy node p selected by CC and ESS, the following metrics are used:

1. Relative Penalty (RP): the relative penalty for using p over popt, given by RP = (dp - dopt)/dopt.
2. Absolute Penalty (AP): the absolute penalty for using p over popt, given by AP = dp - dopt.

Table 7-2 summarizes the RP and AP for CC and ESS (with f = 1.2, 1.1, 1.05), observed on overlays Exp1 and Exp2. The average number of nodes queried in the three configurations of ESS was 37, 57 and 87. Interestingly, it is observed that the simple approach CC performs better than either configuration of ESS, which is reflected in its lower RP and AP. Another interesting observation is that for the overlay without PNS (Exp2), the difference in the RPs and APs of the proxies selected by CC and ESS is smaller than that for the overlay with PNS (Exp1). In fact, CC performs almost the same as ESS.

This observation is explained from an information-theoretic perspective. CC relies on the information contained in a node's connections for proxy selection, whereas ESS uses the information contained in network coordinates. With 15 connections, it is possible that a node acquires a good view of the Internet space. The amount of information present in CC cannot be matched by that of network coordinates, due to their high estimation errors. Furthermore, when PNS is used, the information contained in network coordinates is also used in selecting connections close to a node, thus providing a node a better view of the network and a much better performance of CC over ESS. Similar observations were also made for the other overlays, as shown in Figure 7-7.


Relative penalty (RP) for using a proxy node selected by CC and ESS(1.2), when all nodes in the network can serve as proxies (panels: B Exp2, C Exp3, D Exp4, E Exp5, F Exp6).


Relative and absolute penalties of CC, ESS(1.1), ESS(1.2) and ESS(1.05) for overlays Exp1 (with PNS) and Exp2 (without PNS). For each percentile, the first row gives RP and the second row gives AP.

                 Exp1 (With PNS)                        Exp2 (Without PNS)
%-ile   CC      ESS(1.2)  ESS(1.1)  ESS(1.05)   CC      ESS(1.2)  ESS(1.1)  ESS(1.05)
25      0.076   0.117     0.113     0.111       0.106   0.133     0.127     0.124
        8.732   14.296    13.825    13.619      12.986  17.077    16.113    15.645
50      0.166   0.213     0.208     0.204       0.225   0.245     0.236     0.231
        19.509  25.787    25.232    24.863      26.573  30.247    28.998    28.3185
75      0.337   0.427     0.417     0.415       0.470   0.537     0.519     0.511
        35.572  43.622    42.693    42.003      49.051  51.238    49.689    48.98
90      0.742   1.008     1.014     1.012       1.129   1.373     1.337     1.324
        63.403  71.118    69.397    68.466      92.258  110.227   106.027   105.575

Even though the low-overhead technique CC provides a much better proxy selection (in terms of latency) than ESS, the search space and the quality of proxies (by also including other criteria) in CC is still limited by the connections at a node. Scenarios arise where only a subset of the nodes in an IPOP overlay can be used as proxies. For example, PlanetLab nodes that impose limits on the amount of traffic they can handle are poor candidates for proxy selection. In addition, to proxy connections between certain pairs of nodes, it is also important that the proxy node has good network connectivity (such as being a public node) and also has sufficient CPU cycles to route IPOP traffic. It is possible that only a few (or none) of the existing connections have sufficient resources to efficiently route virtual IP traffic. With only a small subset of a node's connections to select from, it is possible that only a small amount of latency information is usable by CC.

In another set of experiments, only a fraction of the nodes were randomly classified as potential proxies. These represent nodes that have good network connectivity and other resources to efficiently route IPOP traffic. In this scenario, CC selects the closest connection that is also a potential proxy. Similarly, the ESS search only considers nodes that are potential proxies. Another algorithm, called ESS(first), is also introduced, which selects the first potential proxy encountered in the ESS search without considering latency at all.


Table 7-3 summarizes the penalties (RP and AP) for CC and ESS when 100%, 30% and 20% of the nodes are potential proxies, for the overlays Exp1 and Exp2. It is observed that when less than 30% of the nodes (about 135 nodes out of 450) in the network can serve as proxies, ESS provides a better proxy selection than CC. Given a uniform likelihood of 30% of being a proxy, fewer than 5 (out of 15) of a node's connections can be used as proxies, which greatly reduces the latency information available to CC for proxy selection. It is also observed that CC performs better on overlays configured with PNS, because the connection set of each node already incorporates latency information from network coordinates. The average number of nodes queried by ESS(1.2) was 74 and 99 for the 30% and 20% fractions of potential proxies, respectively.

Figures 7-8 and 7-9 present the CDF of the RP for the three approaches, CC, ESS(1.2) and ESS(first), on the other overlays. In each case, ESS(1.2) results in a lower RP than both CC and ESS(first).


Relative penalty (RP) for using a proxy node selected by CC and ESS(1.2), when only 30% of the nodes in the network can serve as proxies (panels: B Exp2, C Exp3, D Exp4, E Exp5, F Exp6).


Relative penalty (RP) for using a proxy node selected by CC and ESS(1.2), when only 20% of the nodes in the network can serve as proxies (panels: B Exp2, C Exp3, D Exp4, E Exp5, F Exp6).


Relative and absolute penalties of CC and ESS(1.2) for different fractions of proxy nodes. For each percentile, the first row gives RP and the second row gives AP.

                                  % of proxy nodes
                        100               30                20
Network         %-ile   CC      ESS(1.2)  CC      ESS(1.2)  CC       ESS(1.2)
Exp1            25      0.076   0.117     0.085   0.079     0.101    0.083
(With PNS)              8.732   14.296    11.281  9.807     13.116   10.515
                50      0.166   0.213     0.194   0.167     0.232    0.170
                        19.509  25.787    25.114  21.363    29.639   21.948
                75      0.337   0.427     0.472   0.368     0.558    0.377
                        35.357  43.622    49.260  40.266    58.390   40.612
                90      0.742   1.008     1.236   0.934     1.402    0.891
                        63.403  71.118    92.470  69.599    109.546  77.211
Exp2            25      0.106   0.133     0.132   0.131     0.124    0.090
(Without PNS)           12.986  17.077    18.623  16.618    18.315   11.555
                50      0.225   0.245     0.270   0.240     0.274    0.187
                        26.573  30.247    36.405  31.771    38.277   23.158
                75      0.470   0.537     0.695   0.513     0.965    0.415
                        49.051  51.238    67.040  52.404    108.508  42.939
                90      1.129   1.373     2.482   1.278     4.210    1.097
                        92.258  110.227   184.208 113.434   214.333  95.353

To improve the latency of overlay routing, I implemented Proximity Neighbor Selection (PNS) based on latency estimates from network coordinates. Experiments with IPOP overlays on PlanetLab show up to 30% reduction in the average overlay latency. The lower overlay latency in turn also results in better response times for DHT-based systems that rely on overlay routing of keys.

On the other hand, to provide low end-to-end latency when direct communication is not possible, techniques have been presented to discover suitable proxy nodes that can be used to set up a 2-hop path between the end nodes. It is observed that a low-overhead technique based on sampling of existing connections (CC) results in lower end-to-end latency than search based on network coordinates (ESS), when all nodes in the network are willing to serve as proxies. However, practical deployments of IPOP overlays have revealed scenarios where only a fraction of the nodes can be used as proxy nodes, based on their capabilities to efficiently route IPOP traffic. In scenarios where less than 30% of the




My research has leveraged a combination of several scientific methods: (1) validation of existing techniques, (2) presenting novel ideas and evaluating their efficacy using simulations, (3) working implementations of research contributions, and (4) large-scale experiments to demonstrate the operation of the implemented techniques.

I have addressed the problem of providing bi-directional network connectivity among wide-area hosts behind NATs and firewalls, to support unmodified distributed applications in the wide area. I have presented a self-managing virtual network (IPOP) that aggregates wide-area hosts into a private network with decoupled address space management. The virtual network is functionally equivalent to a Local-area Network (LAN) environment where a wealth of existing, unmodified IP-based applications can be deployed. The IPOP virtual network tunnels the traffic generated by applications over a P2P-based overlay. IPOP nodes self-configure virtual IP addresses using a DHCP implementation over a Distributed Hash Table (DHT), and self-configure IP tunnels to connect to other nodes on the network. The virtual network provides a mechanism to selectively establish 1-hop overlay links between communicating nodes, which self-optimizes the virtual network with respect to overlay link latency and bandwidth.

Together with VMs for software dissemination, IPOP facilitates the creation of homogeneously configured wide-area clusters of Virtual Workstations (called WOWs). These systems can be programmed using existing batch schedulers and middleware, and support checkpoint/migration of distributed applications across domains. WOW distributed systems provide an excellent infrastructure for deployment of desktop grids and cross-domain collaboration, where new nodes can be added by simply downloading a VM image and instantiating it.

The WOW techniques have resulted in an easily deployable and highly usable VM appliance that configures ad hoc Condor pools on wide-area hosts for high-throughput


[17], and from users at the University of Florida.

In support of IPOP overlays for end-users, I have also addressed interesting research problems from different areas, structured P2P systems and wide-area resource discovery amongst others. Deploying structured P2P systems in the wide area is a well-recognized challenge; connectivity constraints such as symmetric NATs and Internet route outages affect overlay structure maintenance, often leading to inconsistent routing decisions. I have presented generally applicable techniques to improve the routability of structured P2P systems, thus benefiting the applications of these systems. In several large-scale distributed systems (including IPOP), when direct communication is not possible between end nodes, it is possible to set up 2-hop communication through a suitably chosen proxy node. Different techniques have been investigated to discover proxies that minimize end-to-end latency, and their efficacy has been evaluated under different scenarios using experiments on PlanetLab, a testbed that is representative of the Internet.


Network Address Translators (NATs) first became popular as a way to deal with the shortage of IPv4 addresses and also to avoid the difficulty of reserving IP addresses for building local-area networks. According to RFC 1918, three blocks of IPv4 addresses (10.0.0.0-10.255.255.255, 172.16.0.0-172.31.255.255 and 192.168.0.0-192.168.255.255) have been reserved for private networks, and are not used by hosts on the public Internet.

NATs are used to provide Internet connectivity to hosts in such private networks. A NAT router has two network interfaces: one connected to the private network and the other connected to the Internet with one or more public IP address(es). As traffic passes from the private network to the Internet, the source IP/port in each packet is translated to a NAT-assigned public IP/port. The NAT tracks this mapping from internal private IP/port to public IP/port. When a reply returns to the NAT, it uses the tracking data it stored during the outbound phase to re-write the destination IP/port. To a system on the Internet, the router itself appears to be the source/destination for this traffic. NATs have become a standard feature in routers for home and small-office Internet connections, where the price of extra IP addresses would often outweigh the benefits.

Based on the assignment of public IP and port, and the treatment of inbound packets, the following NAT behaviors are observed for UDP traffic (following the standard classification of RFC 3489 [21]):

1. Full-cone NAT: a given internal IP/port is mapped to the same public IP/port for all destinations, and any external host can send packets to the internal host through that mapping.
2. Restricted-cone NAT: the mapping is the same for all destinations, but an external host can send packets through it only after the internal host has first sent a packet to that host's IP address.
3. Port-restricted-cone NAT: like a restricted-cone NAT, except that the external host must also send from the exact IP/port that the internal host previously contacted.


4. Symmetric NAT: a new public IP/port mapping is created for each distinct destination IP/port, and only that destination can send packets back through the mapping.

Since UDP is stateless, NATs do not do much communication state tracking. Through a carefully crafted exchange of packets, two nodes can punch "holes" in their local NATs, and then communicate directly without any external proxying. This hole-punching technique is used in the STUN protocol [21], which we now describe.

When a node A behind a NAT wants to communicate with a node B behind another NAT, it contacts the STUN server, which then communicates back to A the external IP/port that it had recorded for node B. At the same time, the STUN server also informs node B about the external IP/port of node A. Node B then sends a message to A's external IP/port, which punches a hole in B's local NAT that will later allow all packets from A's external IP/port. The packets sent out by A to B's external IP/port create a hole in A's NAT, thus allowing packets coming in from B.
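As a bare-bones illustration of the exchange just described, the Python sketch below attempts UDP hole punching toward a peer whose NAT-assigned public IP/port has already been learned through a rendezvous; it covers only the cone-NAT cases, and symmetric NATs generally defeat this simple exchange.

import socket

def hole_punch(local_port, peer_public_ip, peer_public_port, attempts=5, timeout=2.0):
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", local_port))
    sock.settimeout(timeout)
    for _ in range(attempts):
        # The outbound packet opens (or refreshes) a mapping in our own NAT,
        # which will then admit packets arriving from the peer's public IP/port.
        sock.sendto(b"punch", (peer_public_ip, peer_public_port))
        try:
            data, addr = sock.recvfrom(1024)  # a packet from the peer means both holes exist
            return sock, addr
        except socket.timeout:
            continue
    sock.close()
    return None, None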


STUNT protocols [22] exist for NAT traversal using TCP, but rely on techniques like packet sniffing/spoofing that require superuser privileges.

NAT traversal with TCP is further complicated by the fact that, unlike UDP, a different local port is chosen for every outgoing TCP connection. Even though the local NAT is of cone type, it sees a different internal port for each new connection, which it maps differently.


Synthetic coordinates allow Internet hosts to predict round-trip times (RTTs) to other hosts as follows: hosts compute their coordinates in some space such that the distance between two hosts' synthetic coordinates predicts the RTT between them on the Internet. A coordinate system is particularly useful in server selection to fetch replicated data, especially when the number of servers is large and the amount of data is small. In such cases, the cost of explicit RTT measurements can easily outweigh the benefits of exploiting proximity information.

Vivaldi [28] is a simple, light-weight and fully distributed algorithm that allows hosts to compute synthetic coordinates by communicating with only a few other hosts. The algorithm does not require any fixed network infrastructure or any distinguished hosts.

Let L_ij be the actual RTT between nodes i and j, and x_i the coordinates assigned to node i. The total prediction error in the coordinates is given by

E = Σ_i Σ_j (L_ij - ||x_i - x_j||)^2,

where ||x_i - x_j|| is the distance between the coordinates of nodes i and j. This error function corresponds to the energy stored in the spring network connecting nodes to each other, so minimizing the spring energy is equivalent to minimizing the prediction error.


The force exerted on node i by the spring to node j is

F_ij = (L_ij - ||x_i - x_j||) * u(x_i - x_j).   (B-2)

The scalar quantity (L_ij - ||x_i - x_j||) is the displacement of the spring from rest, and u(x_i - x_j) is a unit vector in the direction of the force on i.

To simulate the evolution of the spring network, the algorithm considers small intervals of time. At each interval, the algorithm moves each node i a small distance in the direction of each force F_ij and then recomputes all forces. The coordinates at the end of the interval are x_i = x_i + F_ij * δ, where δ is the length of the interval.

Each node in the network simulates its movement in the spring system. Each node maintains its current coordinates, starting with coordinates at the origin. Whenever a node communicates with another node, it measures the RTT to that node and also learns that node's current coordinates. In response to such a sample, a node pushes itself for a short time in the direction of the force computed by Equation B-2; each such movement reduces the node's error with respect to the other node in the system. As nodes communicate with each other, they converge to coordinates that predict RTT well.

1. Each node keeps track of its local prediction error using an exponentially moving average that updates the local error. Each RTT sample bears the prediction error at the remote node. The timestep is chosen as

   δ = c_c * (local error) / (local error + remote error),   (B-3)

   where c_c is a small tuning constant.


2. The authors of [114] propose to use a non-linear moving percentile within a fixed window of RTT samples. It removes noise and also responds to actual changes in RTT. Overall, this technique provides improved accuracy and stability.
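Putting Equations B-2 and B-3 together, a single Vivaldi sample can be processed roughly as in the sketch below; the constants cc and ce and the error-averaging rule follow the Vivaldi paper [28], and the sketch omits the sample filtering of item 2, so it should be read as an illustration rather than the IPOP implementation.

def vivaldi_sample(x_i, e_i, x_j, e_j, rtt, cc=0.25, ce=0.25):
    # x_i, x_j: coordinate vectors; e_i, e_j: local/remote prediction errors.
    diff = [a - b for a, b in zip(x_i, x_j)]
    dist = sum(c * c for c in diff) ** 0.5 or 1e-9
    unit = [c / dist for c in diff]                  # u(x_i - x_j)
    # Adaptive timestep (Eq. B-3): weigh the sample by our error relative
    # to the remote node's error.
    w = e_i / (e_i + e_j)
    delta = cc * w
    # Update the local error estimate with an exponentially weighted average.
    sample_error = abs(dist - rtt) / rtt
    e_i = sample_error * ce * w + e_i * (1.0 - ce * w)
    # Move a small step along the spring force (Eq. B-2).
    x_i = [c + delta * (rtt - dist) * u for c, u in zip(x_i, unit)]
    return x_i, e_i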


In the simplest form, resource discovery is possible by querying a subset of nodes in the network for their attributes, gathering all the results at the source node, and then computing an aggregate result. The resource discovery framework in IPOP allows for parallel query execution and result aggregation by building a tree consisting of candidate nodes rooted at the source node, propagating the query down the tree in parallel and, as the results propagate up the tree, performing an aggregation of results at each intermediate node. Such a query can be described as a map-reduce computation that executes in the following two phases (Figure C-1):

1. Map phase (Figure C-1A): Each node in the tree computes a local result (called map_result), and uses it to initialize the reduce_result. It then dynamically computes the list of its children and initiates similar map-reduce computations at each child node, with possibly different parameters. The node then starts waiting for child computations to return. In case the node does not have any children, it returns the reduce_result to its parent.

2. Reduce phase (Figure C-1B): Aggregation takes place at each intermediate node. On getting the result of a child computation, the parent updates its reduce_result. When all child computations have returned their results, the current reduce_result is returned back to the parent.

Each map-reduce computation is specified through the following functions:

1. The Map function (taking map_args): computes the map_result, as the query traverses down the tree.

2. The tree-generation function (taking gen_args): computes the list of immediate children of the current node, based on the arguments gen_args. It returns a list of children, and arguments for their corresponding map-reduce computations. It returns an empty list if the node has no children.

3. The Reduce function (taking reduce_args, the current reduce_result, a child_result, and an [out] done parameter): invoked on getting a result from a child. It computes an aggregation using the current value of reduce_result and the child_result. It also returns the out parameter done if certain termination criteria are met, indicating that there is no need to wait for results from other children and the current reduce_result can be returned right away to the parent.


Execution of a map-reduce computation over a tree of nodes (B: Reduce phase).

1. A tree corresponding to the overlay route to a destination. The tree-generation method returns a list containing the single node that is closest to the destination in the local connection table; in case there is no node closer to the destination than the current node, it returns an empty list.

This tree allows computing statistics about an overlay route between two nodes. For example, it is possible to aggregate all intermediate node addresses into a list, by specifying a Map function that returns a list containing the current node address, and a Reduce function that simply does list concatenation. To count the number of tunnel edges in an overlay route, provide a Map function that returns 1 if the


2. A bounded broadcast over a segment of the ring, starting at the current node A (see Figure C-2). To compute the children at each node, the following algorithm is used.

To broadcast over a region of the ring starting at the current node A and ending at a node B, the node determines all its connections in the region [A, B), say c1, c2, c3, ..., cm. The node then assigns to ci the segment [ci, ci+1]. The process continues until the current node is the only node in its assigned range. It can be shown that given O(log(n)) connections at each node, the maximum depth of the tree is O(log(n)), for a range of size n.

To execute the computation over an arbitrary range [C, D) (where C is not the current node), it is required to determine the first node C' in this range, which is doable using greedy routing, and then initiate the computation at that node.

Bounded broadcast over a segment of the P2P ring
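The segment-splitting step of this bounded broadcast can be sketched as follows, modeling P2P addresses as integers on a ring of known size; the helper is illustrative and leaves out the greedy-routing step used to reach an arbitrary range [C, D).

def children_for_segment(my_addr, my_connections, segment_end, ring_size):
    # Split the region [my_addr, segment_end) among the connections that fall
    # inside it: connection c_i is assigned the sub-segment [c_i, c_{i+1}).
    def offset(x):
        return (x - my_addr) % ring_size
    span = offset(segment_end)
    inside = sorted((c for c in my_connections if 0 < offset(c) < span), key=offset)
    children = []
    for i, c in enumerate(inside):
        end = inside[i + 1] if i + 1 < len(inside) else segment_end
        children.append((c, end))        # child c recursively handles [c, end)
    return children                      # empty list: this node is a leaf of the tree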


3. identifier).TheReducefunctionmergesthechildlistoftupleswiththecurrentlisttobuildasinglelistof(atmostK)tupleswiththebestranks. 4. 143


[1] I. Foster and C. Kesselman, "Globus: A metacomputing infrastructure toolkit," Intl. Journal of Supercomputer Applications, vol. 11, no. 2, pp. 115-128, 1997.
[2] D. P. Anderson, J. Cobb, E. Korpella, M. Lebofsky, and D. Werthimer, "SETI@Home: An experiment in public-resource computing," Communications of the ACM, vol. 11, no. 45, pp. 56-61, 2002.
[3] M. W. Chang, W. Lindstrom, A. J. Olson, and R. K. Belew, "Analysis of HIV wild-type and mutant structures via in silico docking against diverse ligand libraries," Journal of Chemical Information and Modeling, vol. 47, no. 3, pp. 1258-1262, 2007.
[4] K. Keahey, I. Foster, T. Freeman, X. Zhang, and D. Galron, "Virtual workspaces in the grid," in Proc. of Europar, Lisbon, Portugal, Sep 2005.
[5] I. Krsul, A. Ganguly, J. Zhang, J. A. B. Fortes, and R. J. Figueiredo, "VMPlants: Providing and managing virtual machine execution environments for grid computing," in Proc. of SC2004, Pittsburgh, PA, Nov 2004.
[6] S. Son and M. Livny, "Recovering internet symmetry in distributed computing," in Proc. of the 3rd Intl. Symp. on Cluster Computing and the Grid, May 2003.
[7] S. Son, B. Allcock, and M. Livny, "Codo: Firewall traversal by cooperative on-demand opening," in Proc. of the 14th Intl. Symp. on High Performance Distributed Computing (HPDC), 2005.
[8] J. Maassen and H. E. Bal, "Smartsockets: Solving the connectivity problems in grid computing," in Proc. of the Symp. on High Performance Distributed Computing, Monterey Bay, CA, Jun 2007.
[9] D. Anderson, "BOINC: A system for public-resource computing and storage," in Proc. of the 5th Intl. Workshop on Grid Computing (GRID-2004), Pittsburgh, PA, Nov 2004, pp. 4-10.
[10] P. Barham, B. Dragovic, K. Fraser, and S. H. et al., "Xen and the art of virtualization," in Proc. of the 19th ACM Symposium on Operating Systems Principles, Bolton Landing, NY, 2003, pp. 164-177.
[11] R. Goldberg, "Survey of virtual machine research," IEEE Computer Magazine, vol. 7, no. 6, pp. 34-45, 1974.
[12] J. Sugerman, G. Venkitachalan, and B. H. Lim, "Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor," in Proc. of the USENIX Annual Technical Conference, Jun 2001.


[13] A. Sundararaj and P. Dinda, "Towards virtual networks for virtual machine grid computing," in Proc. of the 3rd USENIX Virtual Machine Research and Technology Symp., San Jose, CA, May 2004.
[14] X. Jiang and D. Xu, "Violin: Virtual internetworking on overlay infrastructure," in Proc. of the 2nd Intl. Symp. on Parallel and Distributed Processing and Applications, Dec 2004.
[15] M. Tsugawa and J. A. B. Fortes, "A virtual network (ViNe) architecture for grid computing," in Proc. of the IEEE Intl. Parallel and Distributed Processing Symp. (IPDPS), Rhodes, Greece, Jun 2006.
[16] A. Ganguly, A. Agrawal, P. O. Boykin, and R. J. Figueiredo, "IP over P2P: Enabling self-configuring virtual IP networks for grid computing," in Proc. of the IEEE Intl. Parallel and Distributed Processing Symp. (IPDPS), Rhodes, Greece, Apr 2006.
[17] G. Klimeck, "NanoHUB.org tutorial: Education simulation tools," in Proc. of the IEEE Conference on Nano/Micro Engineered and Molecular Systems (NEMS), Bangkok, Thailand, Jan 2007.
[18] M. Litzkow, M. Livny, and M. Mutka, "Condor - A hunter of idle workstations," in Proc. of the 8th IEEE Intl. Conference on Distributed Computing Systems (ICDCS), Jun 1988.
[19] R. Raman, M. Livny, and M. Solomon, "Matchmaking: Distributed resource management for high throughput computing," in Proc. of the 7th IEEE Intl. Symp. on High Performance Distributed Computing (HPDC), Chicago, IL, Jul 1998.
[20] T. Tannenbaum, D. Wright, K. Miller, and M. Livny, Beowulf Cluster Computing with Linux. The MIT Press, 2002, ch. Condor - A Distributed Job Scheduler.
[21] J. Rosenberg, J. Weinberger, C. Huitema, and R. Mahy, "RFC 3489 - STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators," Mar 2003. [Online]. Available: http://www.ietf.org/rfc/rfc3489.txt. Last accessed: Jul 2008.
[22] S. Guha and P. Francis, "Characterization and measurement of TCP traversal through NATs and firewalls," in Proceedings of the Internet Measurement Conference, Berkeley, CA, Oct 2005.
[23] S. Guha, N. Daswani, and R. Jain, "An experimental study of the Skype peer-to-peer VoIP system," in Proc. of the International Workshop on Peer-to-Peer Systems (IPTPS), Santa Barbara, CA, Feb 2006.
[24] C. Sapuntzakis, D. Brumley, R. Chandra, N. Zeldovich, J. Chow, M. S. Lam, and M. Rosenblum, "Virtual appliances for deploying and maintaining software,"


[25] S.-M. Huang, Q. Wu, and Y.-B. Lin, "Tunneling IPv6 through NAT with Teredo mechanism," in Proceedings of the International Conference on Advanced Information Networking and Applications, Taiwan, Mar 2005.
[26] M. J. Freedman, K. Lakshminarayanan, S. Rhea, and I. Stoica, "Non-transitive connectivity and DHTs," in Proc. of the 2nd USENIX Workshop on Real, Large Distributed Systems (WORLDS), San Francisco, CA, Dec 2005.
[27] K. Gummadi, R. Gummadi, S. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica, "The impact of DHT routing geometry on resilience and proximity," in Proc. of ACM SIGCOMM, Karlsruhe, Germany, Aug 2003.
[28] F. Dabek, R. Cox, F. Kaashoek, and R. Morris, "Vivaldi: A decentralized network coordinate system," in Proc. of ACM SIGCOMM, Portland, OR, Aug 2004.
[29] B. Chun, D. Culler, T. Roscoe, A. Bavier, L. Peterson, M. Wawrzoniak, and M. Bowman, "PlanetLab: An overlay testbed for broad-coverage services," ACM SIGCOMM Computer Communication Review, vol. 33, no. 3, 2003.
[30] A. Denis, O. Aumage, R. Hofman, K. Verstoep, T. Kielmann, and H. E. Bal, "Wide-area communication for grids: An integrated solution to connectivity, performance and security problems," in Proc. of the 13th Intl. Symp. on High Performance Distributed Computing, Honolulu, Hawaii, Jun 2004.
[31] A. Sundararaj, A. Gupta, and P. Dinda, "Dynamic topology adaptation of virtual networks of virtual machines," in Proc. of the Seventh Workshop on Languages, Compilers and Run-time Support for Scalable Systems (LCR), Oct 2004.
[32] P. K. Gummadi, S. Saroiu, and S. D. Gribble, "A measurement study of Napster and Gnutella as examples of peer-to-peer file sharing systems," ACM SIGCOMM Computer Communication Review, vol. 32, no. 1, p. 82, Jan 2002.
[33] M. Ripeanu, A. Iamnitchi, and I. Foster, "Mapping the Gnutella network," IEEE Internet Communication, vol. 6, no. 1, pp. 50-57, Feb 2002.
[34] N. Leibowitz, M. Ripeanu, and A. Wierzbicki, "Deconstructing the Kazaa network," in Proceedings of the 3rd Workshop on Internet Applications, San Jose, CA, Jun 2003.
[35] J. Kubiatowicz, D. Bindel, Y. Chen, S. Czerwinski, P. Eaton, D. Geels, R. Gummadi, S. Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao, "OceanStore: An architecture for global-scale persistent storage," in Proc. of the Intl. Conf. on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2000), Cambridge, MA, Nov 2000.


[36] P. Druschel and A. Rowstron, "PAST: A large-scale, persistent peer-to-peer storage utility," in Proc. of the 8th Workshop on Hot Topics in Operating Systems (HotOS), Schloss Elmau, Germany, May 2001.
[37] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica, "Wide-area cooperative storage with cooperative file system," in Proc. of the 18th ACM Symp. on Operating Systems Principles, Banff, Canada, Oct 2001.
[38] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A scalable content addressable network," in Proc. of the ACM SIGCOMM 2001, 2001.
[39] A. Rowstron and P. Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems," in Proc. of the IFIP/ACM Intl. Conf. on Distributed Systems Platforms (Middleware), Heidelberg, Germany, Nov 2001.
[40] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, M. F. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: A scalable peer-to-peer lookup protocol for internet applications," IEEE/ACM Transactions on Networking, vol. 11, no. 1, pp. 17-32, 2003.
[41] B. Y. Zhao, L. Huang, S. C. Rhea, J. Stribling, A. D. Joseph, and J. D. Kubiatowicz, "Tapestry: A global-scale overlay for rapid service deployment," IEEE J-SAC, vol. 22, no. 1, pp. 41-53, Jan 2004.
[42] R. Albert, H. Jeong, and A.-L. Barabasi, "Error and attack tolerance of complex networks," Nature, vol. 406, pp. 378-381, 2000.
[43] M. Ripeanu, I. Foster, and A. Iamnitchi, "Mapping the Gnutella network: Properties of large-scale peer-to-peer systems and implications for system design," IEEE Internet Computing Journal, vol. 6, no. 1, 2002, special issue on peer-to-peer networking.
[44] P. O. Boykin, J. S. A. Bridgewater, J. S. Kong, K. M. Lozev, B. A. Rezaei, and V. P. Roychowdhury, "A symphony conducted by Brunet," in arXiv:0709.4048v1, Sep 2007.
[45] J. Kleinberg, "Navigation in a small world," Nature, vol. 406, p. 845, 2000.
[46] B. Ford, P. Srisuresh, and D. Kegel, "Peer-to-peer communication across network address translators," in Proc. of the 2005 USENIX Annual Technical Conference (USENIX'05), Anaheim, California, Apr 2005.
[47] S. Guha, Y. Takeda, and P. Francis, "NUTSS: A SIP based approach to UDP and TCP connectivity," in Proc. of Special Interest Group on Data Communications (SIGCOMM) Workshops, Portland, OR, Aug 2004, pp. 43-48.
[48] R. J. Figueiredo, P. Dinda, and J. A. B. Fortes, "A case for grid computing on virtual machines," in Proc. of the 23rd IEEE Intl. Conference on Distributed Computing Systems (ICDCS), Providence, Rhode Island, May 2003.


[49] S. Adabala, V. Chadha, P. Chawla, R. J. Figueiredo, J. A. B. Fortes, I. Krsul, A. Matsunaga, M. Tsugawa, J. Zhang, M. Zhao, L. Zhu, and X. Zhu, "From virtualized resources to virtual computing grids: The In-VIGO system," Future Generation Computing Systems, special issue on Complex Problem-Solving Environments for Grid Computing, vol. 21, no. 6, Apr 2005.
[50] A. Shoykhet, J. Lange, and P. Dinda, "Virtuoso: A system for virtual machine marketplaces," Northwestern University, Jul 2004, Technical Report NWU-CS-04-39.
[51] J. Smith and R. Nair, Virtual Machines: Versatile Platforms for Systems and Processes. Morgan Kaufmann, 2005.
[52] I. Foster and A. Iamnitchi, "On death, taxes, and the convergence of peer-to-peer and grid computing," in Proc. of the 2nd Intl. Workshop on Peer-to-Peer Systems (IPTPS), Berkeley, CA, Feb 2003.
[53] A. S. Cheema, M. Muhammad, and I. Gupta, "Peer-to-peer discovery of computational resources for grid applications," in Proc. of the 6th IEEE/ACM Workshop on Grid Computing, Seattle, WA, Nov 2005.
[54] A. Iamnitchi, I. Foster, and D. C. Nurmi, "A peer-to-peer approach to resource location in grid environments," in Proc. of the 11th Symp. on High Performance Distributed Computing, Edinburgh, UK, Aug 2002.
[55] J. Cao, O. M. K. Kwong, X. Wang, and W. Cai, "A peer-to-peer approach to task scheduling in computation grid," Intl. Journal of Grid and Utility Computing, vol. 1, no. 1, 2005.
[56] N. Therning and L. Bengtsson, "Jalapeno - Decentralized grid computing using peer-to-peer technology," in Proc. of the 2nd Conference on Computing Frontiers, Ischia, Italy, 2005.
[57] A. J. Chakravarti, G. Baumgartner, and M. Lauria, "The organic grid: Self-organizing computation on a peer-to-peer network," IEEE Transactions on Systems, Man, and Cybernetics, vol. 35, no. 3, May 2005.
[58] N. Andrade, L. Costa, G. Germoglio, and W. Cirne, "Peer-to-peer grid computing with the OurGrid community," in Proc. of the 23rd Brazilian Symp. on Computer Networks, May 2005.
[59] N. A. Al-Dmour and W. J. Teahan, "ParCop: A decentralized peer-to-peer computing system," in Proc. of the 3rd Intl. Symp. on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks, Jul 2004.
[60] R. Cox, A. Muthitacharoen, and R. Morris, "Serving DNS using Chord," in Proc. of the 1st Intl. Workshop on Peer-to-Peer Systems (IPTPS), Cambridge, MA, Mar 2002.


[61] I. Stoica, D. Adkins, S. Zhuang, S. Shenker, and S. Surana, "Internet Indirection Infrastructure," IEEE/ACM Transactions on Networking, vol. 12, no. 2, pp. 215-218, Apr 2004.
[62] J. Kannan, A. Kubota, and K. Lakshminarayan, "Supporting legacy applications over I3," Computer Science Division, University of California, Berkeley, Jun 2004, Technical Report no. UCB/CSD-04-1342.
[63] L. Zhou and R. van Renesse, "P6P: A peer-to-peer approach to internet infrastructure," in Proc. of Peer-to-peer Systems III: Third Intl. Workshop (IPTPS), 2004, p. 75.
[64] L. Zhou, R. van Renesse, and M. Marsh, "Implementing IPv6 as a peer-to-peer overlay network," in 21st IEEE Symp. on Reliable Distributed Systems (SRDS), 2002, p. 347.
[65] T. E. Anderson, D. E. Culler, and D. A. Patterson, et al., "A case for networks of workstations: NOW," IEEE Micro, February 1995.
[66] C. Chiapusio. (2007, May) distributed.net. [Online]. Available: http://distributed.net. Last accessed: Jul 2008.
[67] B. Calder, A. A. Chien, J. Wang, and D. Yang, "The Entropia virtual machine for desktop grids," CSE Technical Report CS2003-0773, University of California, San Diego, San Diego, CA, Oct 2003.
[68] X. Zhang, K. Keahey, I. Foster, and T. Freeman, "Virtual cluster workspaces for grid applications," Argonne National Laboratories, Tech. Rep. ANL/MCS-P1246-0405, Apr 2005.
[69] C. Clark, K. Fraser, S. Hand, J. Hansen, E. Jul, C. Limpach, I. Pratt, and A. Warfield, "Live migration of virtual machines," in Proc. of the 2nd Symp. on Networked Systems Design and Implementation (NSDI), Boston, MA, May 2005.
[70] M. Lorch, D. Kafura, I. Fisk, K. Keahey, G. Carcassi, T. Freeman, T. Peremutov, and A. S. Rana, "Authorization and account management in the Open Science Grid," in Proc. of the 6th IEEE/ACM Workshop on Grid Computing, Nov 2005.
[71] D. Wolinsky, A. Agrawal, P. O. Boykin, J. Davis, A. Ganguly, V. Paramygin, P. Sheng, and R. Figueiredo, "On the design of virtual machine sandboxes for distributed computing in wide area overlays of virtual workstations," in Proc. of the 1st Workshop on Virtualization Technologies in Distributed Computing (VTDC), with Supercomputing, Tampa, FL, Nov 2006.
[72] T. Bailey and C. Elkan, "Fitting a mixture model by expectation maximization to discover motifs in biopolymers," in Proc. of the Second Intl. Conference on Intelligent Systems for Molecular Biology. Menlo Park, CA: AAAI Press, 1994, pp. 28-36.

PAGE 150

G.Olsen,H.Matsuda,R.Haggstrom,andR.Overbeek,\fastDNAml:Atoolforconstructionofphyllogenetictreesofdnasequencesusingmaximumlikelihood,"Comput.Appl.Biosci.,vol.10,pp.41{48,1994. [74] C.Stewart,D.Hart,D.Berry,G.Olsen,E.Wernert,andW.Fischer,\ParallelimplementationandperformanceoffastDNAml-aprogramformaximumlikelihoodphylogeneticinference,"inProc.ofIEEE/ACMSupercomputingConference(SC01),2001. [75] K.Melero,M.Hardy,andM.Lucas,\Open-sourcesoftwaretechnologiesfordataarchivingandonlinegeospatialprocessing,"inProc.oftheIEEEGeoscienceandRemoteSensingSymposium,Toulouse,France,Jul2003. [76] V.Sunderam,J.Dongarra,A.Geist,andR.Manchek,\ThePVMconcurrentcomputingsystem:Evolution,experiences,andtrends,"ParallelComputing,vol.20,no.4,pp.531{547,Apr1994. [77] M.KozuchandM.Satyanarayanan,\Internetsuspend/resume,"in4thIEEEWorkshoponMobileComputingSystemsandApplications(WMCSA2002),20-21June2002,Callicoon,NY,USA.IEEEComputerSociety,2002,p.40. [78] C.Sapuntzakis,R.Chandra,B.Pfa,J.Chow,M.Lam,andM.Rosenblum,\Optimizingthemigrationofvirtualcomputers,"inProc.ofUSENIXOperatingSystemDesignandImplementation(OSDI),2002. [79] D.Becker,T.Sterling,D.Savarese,J.Dorband,U.Ranawake,andC.Packer,\Beowulf:Aparallelworkstationforscienticcomputation,"inProc.oftheIntl.ConferenceonParallelProcessing(ICPP),1995. [80] H.Bal,R.Bhoedjang,R.Hofman,C.Jacobs,T.Kielmann,J.Maassen,R.vanNieuwpoort,J.Romein,L.Renambot,T.Ruhl,R.Veldema,K.Verstoep,A.Baggio,G.Ballintijn,I.Kuz,G.Pierre,M.vanSteen,A.Tanenbaum,G.Doornbos,D.Germans,H.Spoelder,E.-J.Baerends,S.vanGisbergen,H.Afsermanesh,D.vanAlbada,A.Belloum,D.Dubbeldam,Z.Hendrikse,B.Hertzberger,A.Hoekstra,K.Iskra,D.Kandhai,D.Koelma,F.vanderLinden,B.Overeinder,P.Sloot,P.Spinnato,D.Epema,A.vanGemund,P.Jonker,A.Radulescu,C.vanReeuwijk,H.Sips,P.Knijnenburg,M.Lew,F.Sluiter,L.Wolters,H.Blom,C.deLaat,andA.vanderSteen,\ThedistributedASCIsupercomputerproject,"SIGOPSOper.Syst.Rev.,vol.34,no.4,pp.76{96,2000. [81] O.Aumage,R.F.H.Hofman,andH.E.Bal,\NetIbis:Anecientanddynamiccommunicationsystemforheterogenousgrids,"inProc.ofCCGrid2005,Cardi,UK,May2005. [82] D.Xu,P.Ruth,J.Rhee,R.Kennel,andS.Goasguen,\Autonomicadaptationofvirtualdistributedenvironmentsinamulti-domaininfrastructure,"inProc.ofIEEEIntlSymp.onHigh-PerformanceDistributedComputing(HPDC),HotTopicsSession,Paris,France,Jun2006. 150

PAGE 151

V.Lo,D.Zappala,D.Zhou,Y.Liu,andS.Zhao,\Clustercomputingonthey:P2Pschedulingofidlecyclesintheinternet,"inProc.ofthe3rdIntl.WorkshoponPeer-to-PeerSystems(IPTPS),SanDiego,CA,Feb2004. [84] A.S.GrimshawandW.A.Wulf,\Legion:Flexiblesupportforwide-areacomputing,"inProc.ofthe7thACMSIGOPSEuropeanWorkshop,Ireland,1996. [85] A.Ganguly,A.Agrawal,P.O.Boykin,andR.J.Figueiredo,\WOW:Self-organizingwideareaoverlaynetworksofvirtualworkstations,"inProc.oftheIEEEIntlSymp.onHigh-PerformanceDistributedComputing(HPDC),Paris,France,Jun2006. [86] M.Castro,P.Druschel,A.-M.Kermarrec,andA.Rowstron,\Scalableapplication-levelanycastforhighlydynamicgroups,"inProc.ofthe5thIntl.WorkshoponNetworkedGroupCommunications(NGC),Munich,Germany,Sep2003. [87] A.Mislove,A.Post,A.Haeberlen,andP.Druschel,\Experiencesinbuildingandoperatingepost,areliablepeer-to-peerapplication,"inProc.ofACMSIGOPS/EuroSysEuropeanConf.onComputerSystems,Leuven,Belgium,Apr2006. [88] S.Rhea,B.Godfrey,B.Karp,J.Kubiatowicz,S.Ratnasamy,S.Shenker,I.Stoica,andH.Yu,\Opendht:ApublicDHTserviceanditsuses."inProc.ofACMSIGCOMM,Philadelphia,PA,Aug2005. [89] S.Rhea,D.Geels,T.Roscoe,andJ.Kubiatowicz,\HandlingchurninaDHT,"inProc.ofUSENIXTechnicalConference,Jun2004. [90] A.Muthitacharoen,S.Gilbert,andR.Morris,\Etna:Afault-tolerantalgorithmforatomicmutabledhtdata,"inTechnicalReportMIT-LCS-TR-993,MIT-LCS,Jun2005. [91] Z.LiandM.Parashar,\Comet:Ascalablecoordinationspacefordecentralizeddistributedenvironments,"inProceedingsofthe2ndInternationalWorkshoponHotTopicsinPeer-to-PeerSystems(HOT-P2P),SanDiego,CA,Jul2005. [92] D.Gelernter,\Generativecommunicationinlinda,"inACMTransactionsonProgrammingLanguageSystems,vol.7,no.1,1985,pp.80{112. [93] M.Castro,P.Druschel,A.-M.Kermarrec,andA.Rowstron,\Oneringtorulethemall:Servicediscoveryandbindinginstructuredpeer-to-peeroverlaynetworks,"inProc.oftheSIGOPSEuropeanWorkshop,France,Sep2002. [94] Z.XuandY.Hu,\SBARC:Asupernodebasedpeer-to-peerlesharingsystem,"inProc.ofthe8thIEEEInternationalSymposiumonComputersandCommunication(ISCC),Antalaya,Turkey,Jun2003. [95] B.Y.Zhao,Y.Duan,L.Huang,A.D.Joseph,andJ.D.Kubiatowicz,\Brocade:Landmarkroutingonoverlaynetworks,"inInProc.oftheInternationalWorkshoponPeer-to-PeerSystems(IPTPS),Cambridge,MA,Mar2002. 151

PAGE 152

A.R.Bharambe,M.Agrawal,andS.Seshan,\Mercury:Supportingscalablemulti-attributerangequeries,"inProc.ofACMSIGCOMM,Portland,OR,2004. [97] D.Oppenheimer,J.Albrecht,D.Patterson,andA.Vahdat,\DistributedresourcediscoveryonplanetlabwithSWORD,"inProc.oftheACM/USENIXWorkshoponReal,LargeDistributedSystems(WORLDS),SanFrancisco,CA,Dec2004. [98] C.SchmidtandM.Parashar,\Enablingexiblequerieswithguaranteesinp2psysteme,"inIEEENetworkComputing(SpecialIssueonInformationDisseminationontheWeb),vol.3,Jun2004,pp.19{26. [99] A.MisloveandP.Druschel,\Providingadministrativecontrolandautonomyinstructuredpeer-to-peeroverlays,"inProc.ofthe3rdIntl.WorkshoponPeer-to-peersystems,SanDiego,ca,Feb2004. [100] P.MaymounkovandD.Mazieres,\Kademlia:Apeer-to-peerinformationsystembasedonthexormetric,"inProc.oftheWorkshoponPeer-to-PeerSystems(IPTPS),Cambridge,MA,Mar2002. [101] M.Castro,M.Costa,andA.Rowstron,\Performanceanddependabilityofstructuredpeer-to-peeroverlays,"inProc.oftheConf.onDependableSystemsandNetworks,Jun2004. [102] M.Castro,P.Druschel,Y.C.Hu,andA.Rowstron,\Topology-awareroutinginstructuredpeer-to-peeroverlaynetworks,"inMicrosoftResearchMSR-TR-2002-82,Sep2002. [103] J.Liang,R.Kumar,andK.Ross,\TheFastTrackoverlay:Ameasurementstudy,"inComputerNetworks(SpecialIssueonOverlays),2005. [104] J.Li,J.Stribling,R.Morris,M.F.Kaashoek,andT.M.Gil,\Aperformancevs.costframeworkforevaluatingDHTdesigntradeosunderchurn,"inProc.IEEEINFOCOM,2005. [105] S.GerdingandJ.Stribling,\Examiningthetrade-osofstructruredoverlaysindynamicnon-transitivenetwork,"Dec2003,Classproject.[Online].Available: http://pdos.lcs.mit.edu/strib/docs/projects/networking fall2003.ps Lastaccessed:Jul2008. [106] B.A.Miller,T.Nixon,C.Tai,andM.D.Wood,\HomenetworkingwithUniversalPlugandPlay,"IEEECommunicationsMagazine,vol.39,no.12,pp.104{109,Dec2001. [107] A.Ganguly,D.Wolinsky,P.O.Boykin,andR.J.Figueiredo,\Decentralizeddynamichostcongurationinwide-areaoverlaysofvirtualworkstations,"inProc.ofthePCGridworkshop,LongBeach,CA,Mar2006. 152

PAGE 153

B.Ford,\UnmanagedInternetProtrocol:Tamingtheedgenetworkmanagementcrisis,"inProc.oftheWorkshoponHotTopicsinNetworks(HotNets),Cambridge,MA,Nov2003. [109] W.ChenandX.Liu,\Enforcingroutingconsistencyinstructuredpeer-to-peeroverlays:Shouldweandcouldwe?"inInProc.oftheWorkshoponPeer-to-PeerSystems(IPTPS),SantaBarbara,CA,Feb2006. [110] J.Aspnes,Z.Diamadi,andG.Shah,\Fault-tolerantroutinginpeer-to-peersystems,"inProc.oftheSymp.onPrinciplesofDistributedComputing(PODC),Monterey,CA,Jul2002. [111] S.Ratnasamy,M.Handley,R.Karp,andS.Shenker,\Topology-awareoverlayconstructionandserverselection,"inProc.oftheIEEEINFOCOMM,NewYork,NY,2002. [112] T.S.E.NgandH.Zhang,\Predictinginternetnetworkdistancewithcoordinates-basedapproaches,"inProc.ofIEEEINFOCOMM,NewYork,NY,Jun2002. [113] M.Pias,J.Crowcroft,S.Wilbur,S.Bhatti,andT.Harris,\Lighthousesforscalabledistributedlocation,"inProc.oftheWorkshoponPeer-to-PeerSystems(IPTPS),Berkeley,CA,Feb2003. [114] P.Pietzuch,J.Ledlie,andM.Seltzer,\SupportingnetworkcoordinatesonPlanetLab,"inProc.ofthe2ndUSENIXWorkshoponReal,LargeDistributedSystems(WORLDS),SanFrancisco,CA,Dec2005. [115] P.Pietzuch,J.Ledlie,M.Mitzenmacher,andM.Seltzer,\Network-awareoverlayswithnetworkcoordinates,"inProc.oftheWorkshoponDynamicDistributedSystems(IWDDS),Lisbon,Portugal,Jul2006. [116] A.IamnitchiandI.Foster,\Onfullydecentralizedresourcediscoveryingridenvironments,"inProc.oftheInternationalWorkshoponGridComputing,Pittsburgh,PA,Nov2004. [117] D.SpenceandT.Harris,\Distributedresourcediscoveryinxenoserveropenplatform,"inProc.oftheIEEEIntl.Symp.onHighPerformanceDistributedCOmputing(HDDC),Seattle,WA,Jun2003. [118] Y.Chawathe,S.Ramabhadran,S.Ratnasamy,A.LaMarca,S.Shenker,andJ.Hellerstein,\Acasestudyinbuildinglayereddhtapplications,"inProceedingsoftheConferenceonApplications,technologies,architectures,andprotocolsforcomputercommunications.NewYork,NY,USA:ACM,2005,pp.97{108. 153

PAGE 154

Arijit Ganguly grew up in New Delhi, the capital city of India. He attended the Indian Institute of Technology (IIT), Guwahati (India) for his undergraduate studies (B.Tech.) in computer science. There he became interested in computer systems, distributed computing, and networks, and later decided to pursue graduate studies. After graduating from IIT Guwahati, Arijit joined the University of Florida in Fall 2002. In Fall 2003, he began his Ph.D. research under Dr. Renato Figueiredo at the Advanced Computing and Information Systems (ACIS) Laboratory, where he was later appointed as a Research Assistant (RA). At ACIS, he had the opportunity to conduct cutting-edge research on virtual machines, networks, and P2P systems, and in the process published papers in highly acclaimed conferences and journals. Together with other ACIS researchers, he has also been involved in the development of the open-source software IPOP, which is already being used by researchers at the University of Florida. To complement his academic experience, Arijit completed summer internships at VMware, IBM Research, and Microsoft in 2005, 2006, and 2007, respectively. Upon his graduation, Arijit plans to take up a full-time position in industry in the area of computer systems.