Citation
SDN and Overlay Network Hybrid System for Flexible and High Performance Network Virtualization

Material Information

Title:
SDN and Overlay Network Hybrid System for Flexible and High Performance Network Virtualization
Creator:
Jeong, Kyuho
Place of Publication:
[Gainesville, Fla.]
Florida
Publisher:
University of Florida
Publication Date:
2017
Language:
english
Physical Description:
1 online resource (121 p.)

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Electrical and Computer Engineering
Committee Chair:
FIGUEIREDO,RENATO JANSEN
Committee Co-Chair:
FORTES,JOSE A
Committee Members:
LI,XIAOLIN
CHEN,SHIGANG

Subjects

Subjects / Keywords:
multipath -- network -- overlays -- sdn
Electrical and Computer Engineering -- Dissertations, Academic -- UF
Genre:
bibliography ( marcgt )
theses ( marcgt )
government publication (state, provincial, territorial, dependent) ( marcgt )
born-digital ( sobekcm )
Electronic Thesis or Dissertation
Electrical and Computer Engineering thesis, Ph.D.

Notes

Abstract:
Network protocols and infrastructure are difficult to modify after deployment. Thus, overlay networks have been widely used to provide a virtual abstraction of the network layer, achieved by encapsulation at the end-points and tunneling through the network fabric. Although overlay networks provide the flexibility to implement operations such as modifying/mapping identifiers and control of information for network virtualization, they are not able to keep pace with the transmission rate capacity of modern hardware and communication links, and they consume computing resources from the end-hosts. The recent emergence of Software Defined Networking (SDN) created a means of virtualizing the network from the infrastructure by using common APIs. This dissertation addresses challenges in the context of overlay network design, including performance improvements by using SDN in the contexts of inter-cloud and multi-tenant data centers. In the context of inter-cloud architectures, this dissertation investigates a novel approach called VIAS (VIrtualization Acceleration using SDN) to the design of overlay networks. VIAS uses SDN to selectively bypass tunneled links when SDN-managed paths are feasible. Architecturally, VIAS is self-organizing, whereby overlay nodes can detect that peer endpoints are in the same network and program bypass flows between OpenFlow switches. In the context of multi-tenant data centers, the virtualization performance of overlay network technologies such as VXLAN is far below link speeds, and overlay processing consumes substantial resources from the hypervisor. This dissertation introduces PARES (PAcket REwriting on SDN), which uses the packet rewriting feature of SDN switches to enable multi-tenant functionality with near-native performance and reduced load on end-hosts. In the public WAN Internet, along with the emergence of cloud computing, overlay networks can be used to increase end-to-end bandwidth by using additional cloud paths, instead of using only the default Internet path. This dissertation empirically evaluates the extent to which using cloud paths to transfer data in parallel with the default Internet path can improve the end-to-end bandwidth in bulk data transfers. ( en )
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Thesis:
Thesis (Ph.D.)--University of Florida, 2017.
Local:
Adviser: FIGUEIREDO,RENATO JANSEN.
Local:
Co-adviser: FORTES,JOSE A.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2018-06-30
Statement of Responsibility:
by Kyuho Jeong.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Embargo Date:
6/30/2018
Classification:
LD1780 2017 ( lcc )

Full Text

PAGE 1

SDN AND OVERLAY NETWORK HYBRID SYSTEM FOR FLEXIBLE AND HIGH PERFORMANCE NETWORK VIRTUALIZATION

By

KYUHO JEONG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2017

PAGE 2

© 2017 Kyuho Jeong

PAGE 3

To you as a reader

PAGE 4

ACKNOWLEDGMENTS

First and foremost, I sincerely thank my academic advisor, Dr. Renato Figueiredo, who has guided me patiently toward this ultimate goal. Despite my frequent changes of academic topic and deviations from my main focus during my Ph.D. study, he enduringly ushered me in the right direction. When I got stuck in petty details, he always presented the big picture and an integral insight into the problem, and inspired me to come up with solutions. He not only helped me academically but also gave insightful lessons in life, such as the importance of social and writing skills for interacting with other insightful researchers, and offered opportunities to improve these through various projects. During my Ph.D. study, I have worked with researchers and developers around the world, at universities and institutes in Italy, the Netherlands, Miami, San Diego, China and Japan. I would not have had these valuable experiences in such a short period of time without his drive for interdisciplinary work and his eagerness for synergy with other researchers. I also express my gratitude to Dr. Jose Fortes, Dr. Xiaolin Li and Dr. Shigang Chen for serving as my committee members, providing insightful input on my research, and encouraging me to take on daunting research topics. I express my gratitude to Dr. Kohei Ichikawa, whom I always called for help when I was short of insight in the computer networking field. Had it not been for his help, my research would be far below its current level. Even the time difference and long distance could not diminish his helpful nature; he always responded kindly with insightful ideas upon my calls for help. I appreciate Dr. Pierre St. Juste, Benjamin Woodruff and Dr. Heungsik Eom, who left the legacy of the IPOP project from which I could learn and extend my research. Were it not for their endeavors in development, careful design decisions, and quality assurance for a reliable software project, I could not have extended my insights into overlay networks. I did not acknowledge your importance at the time you were around me, but the sense of void I felt after you left cannot help but make me appreciate your presence in the early part of my Ph.D. study.

PAGE 5

As I am completing my degree program, I cannot help but recall the conversation with Dr. Ming Zhao when we met in Kyoto, Japan. When I felt lost in pursuing my path, he encouraged me to set an ambitious objective to drive myself further. I thank him for that nudge. I already feel nostalgia for the good old days in ACIS with Dr. Yonggang Liu and Dr. Kyungyong Lee. I thank them for spending leisure time with me and for extending my knowledge through their research; both of their successes after graduation motivated me to pursue the Ph.D. I can hardly show enough gratitude to Giljae Lee, Kyungmin Park and Dr. Jongmin Lee. As mentors in life and research, they advised me, spent time with me, and listened patiently to me. Finally, I especially thank my family. I feel remorse all the time about not being with you more often. No matter how long I think, I cannot find rhetoric adequate to express your devotion to me, so I plainly state that I sincerely thank you for all of your devotion.

This material is based upon work supported in part by the National Science Foundation under Grants No. 1527415, 1339737 and 1234983, and by the Department of Defense award W911NF-13-1-0157. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or the Department of Defense. The author acknowledges the use of NSFCloud resources such as CloudLab and Chameleon for the experiments.

PAGE 6

TABLE OF CONTENTS

ACKNOWLEDGMENTS 4
LIST OF TABLES 9
LIST OF FIGURES 10
ABSTRACT 12

CHAPTER
1 INTRODUCTION 14
  1.1 Overlay Paradigm 15
  1.2 Overlay Networks 17
  1.3 Software Defined Networking 20
  1.4 Network Functions in SDN, NFV and Middleboxes 22
  1.5 Network Virtualization in Multi-Tenant Data Center 24
  1.6 Cloud-Routed Overlay Path 26
  1.7 Parallel TCP 27
  1.8 Dissertation Organization 28
2 OVERLAY AND SDN HYBRID MODEL FOR INTER-CLOUD ARCHITECTURE 29
  2.1 Background 30
    2.1.1 Inter-Cloud Architecture 30
    2.1.2 Nested Virtualization and Nested Virtual Network 32
    2.1.3 Networking in Nested Containers 35
  2.2 Network Virtualization 36
    2.2.1 Overlay Networks 37
    2.2.2 Software Defined Networking 37
    2.2.3 NAT Traversal 38
  2.3 VIAS Overview 39
  2.4 VIAS Architecture 41
    2.4.1 SDN Controller 43
    2.4.2 Overlay Datapath 43
    2.4.3 Overlay Controller 44
    2.4.4 Overlay Link and Bypass Flow Management 44
  2.5 VIAS Implementation 45
    2.5.1 SDN Controller 45
    2.5.2 Overlay Datapath and Overlay Controller 48
    2.5.3 Software-defined Overlay Bypass 51
  2.6 VIAS Evaluation 54
    2.6.1 VIAS Open vSwitch bridge/NAT 55
    2.6.2 Microbenchmarks 56

PAGE 7

      2.6.2.1 ARP 57
      2.6.2.2 ICMP 59
      2.6.2.3 TCP 59
    2.6.3 Application Test 61
  2.7 Related Work 62
  2.8 Chapter Conclusion 64
3 SDN FOR MULTI-TENANT DATA CENTER 65
  3.1 Background 66
    3.1.1 Multi-Tenant Data Center (MTDC) 66
    3.1.2 Related Works of MTDC Network Architecture 68
    3.1.3 Layer 2 Semantics in MTDCs 70
  3.2 Architecture of PARES 72
    3.2.1 High-level Overview 72
    3.2.2 PARES Packet Handling Layers 73
      3.2.2.1 End-host solicitation layer 74
      3.2.2.2 Network function layer 74
      3.2.2.3 Routing layer 75
  3.3 Implementation 75
    3.3.1 Layer 2 Semantics in PARES 75
    3.3.2 End-host Solicitation Layer 76
    3.3.3 Network Function Layer 77
    3.3.4 Packet Walk-through Example 78
    3.3.5 Routing Layer 79
  3.4 Evaluation 79
    3.4.1 Multi-root Fat Tree and Bidirectional Longest Prefix 80
    3.4.2 Virtual eXtensible Local Area Network (VXLAN) 82
    3.4.3 End-host Solicitation Layer Scale Test 82
    3.4.4 Line Rate Virtualization Test 83
    3.4.5 Scalability Test 86
  3.5 Discussion 88
    3.5.1 Applicability to General Enterprise Data Center 88
    3.5.2 Scalability Consideration 89
  3.6 Conclusion 89
4 OVERLAY NETWORKS FOR BULK DATA TRANSFER IN PUBLIC WAN ENVIRONMENT 91
  4.1 Cloud-routed Overlay Network (CRONets) 91
  4.2 Overlay Path 93
  4.3 Finding A Stream Count Combination for Maximum Aggregate Bandwidth 95
  4.4 Algorithm Details 97
  4.5 Parallel TCP Experiment Using Cloud-paths 100
  4.6 Bandwidth-Cost Relationship in Cloud Multi-Path 102
  4.7 A Design Pattern for Multi-Path TCP Socket 103
  4.8 Evaluation 106

PAGE 8

  4.9 Related Work 108
  4.10 Chapter Conclusion 110
REFERENCES 112
BIOGRAPHICAL SKETCH 121

PAGE 9

LIST OF TABLES

2-1 Stream bypass rule example 53
2-2 ICMP and TCP performance comparison between Linux NAT and the NAT-featured Open vSwitch of VIAS 55
2-3 ARP and ICMP latency comparison among conventional Linux implementation, overlay datapath and VIAS 58
2-4 TCP performance comparison among physical, overlay datapath and SDN bypass virtualization scenarios 59
3-1 Comparison of various network virtualization approaches in multi-tenant or enterprise data centers 70

PAGE 10

LIST OF FIGURES

1-1 Various overlay use cases this dissertation studies 16
1-2 Approximate implementation complexity of network functions 23
2-1 Inter-cloud architecture 30
2-2 Inherent heterogeneity of the network environment for inter-cloud architecture 32
2-3 Layer-2 network virtualization for nested VMs across three cloud providers 36
2-4 Conceptual illustration of VIAS 40
2-5 VIAS overall architecture 42
2-6 NAT implementation in VIAS using the OpenFlow protocol 46
2-7 Packet encapsulation in IPOP 49
2-8 Packet datapath alternatives in VIAS 50
2-9 Virtualization comparison of datapaths 52
2-10 TCP performance comparison among physical, overlay datapath and VIAS virtualization scenarios 60
2-11 Redis simulation 61
3-1 PARES in a multi-tenant data center 71
3-2 Architecture of PARES 72
3-3 Operation layering on the SDN fabric 73
3-4 Datapath comparison of conventional tunneling (top) and address translation (PARES, bottom) 78
3-5 SDN testbeds 80
3-6 IP core/aggregation routing rule example with a single pod of a multi-root fat tree data center architecture 81
3-7 End-host solicitation test 83
3-8 TCP maximum throughput comparison with various maximum segment sizes with a 1 Gbps edge switch 84
3-9 TCP maximum throughput comparison with various maximum segment sizes with a 10 Gbps edge switch 85

PAGE 11

3-10 CPU profile span of a 60 second run on physical hosts 85
3-11 CPU profile span of a 60 second VXLAN run 85
3-12 CPU profile span of a 60 second PARES run 85
3-13 TCP throughput among multiple VMs and streams 86
3-14 Flow entry scalability test (base values are 144 s and 520 s respectively for testbeds 1 and 2) 87
4-1 Locations of relay and end-point nodes 92
4-2 Overlay path relayed by cloud instances using GRE and IPsec tunneling 94
4-3 Average of relative aggregate bandwidth of the 18 WAN paths 96
4-4 Aggregate bandwidth comparison among single stream on single path, multi-stream on single path, and multi-stream on multi-path 99
4-5 Sankey diagram from source (left) to destination (right) in terms of stream count; line width is scaled with the stream count 100
4-6 Bandwidth comparison among default Internet path, simultaneous multi-path and arithmetic sum of multi-path 101
4-7 Bandwidth-cost relationship (cost is for a 1 GB data transfer) 102
4-8 A design pattern for multi-path data transfer 105
4-9 Transfer completion time of 1 GB data/file on various paths 108
4-10 Selective path throughput improvement with incurred cost 109

PAGE 12

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

SDN AND OVERLAY NETWORK HYBRID SYSTEM FOR FLEXIBLE AND HIGH PERFORMANCE NETWORK VIRTUALIZATION

By Kyuho Jeong

December 2017

Chair: Renato J. Figueiredo
Major: Electrical and Computer Engineering

Network protocols and infrastructure are difficult to modify after deployment. Thus, overlay networks have been widely used to provide a virtual abstraction of the network layer, achieved by encapsulation at the end-points and tunneling through the network fabric. Although overlay networks provide the flexibility to implement operations such as modifying/mapping identifiers and control of information for network virtualization, they are not able to keep pace with the transmission rate capacity of modern hardware and communication links, and they consume computing resources from the end-hosts. The recent emergence of Software Defined Networking (SDN) created a means of virtualizing the network from the infrastructure by using common APIs. This dissertation addresses challenges in the context of overlay network design, including performance improvements by using SDN in the contexts of inter-cloud and multi-tenant data centers. In the context of inter-cloud architectures, this dissertation investigates a novel approach called VIAS (VIrtualization Acceleration using SDN) to the design of overlay networks. VIAS uses SDN to selectively bypass tunneled links when SDN-managed paths are feasible. Architecturally, VIAS is self-organizing, whereby overlay nodes can detect that peer endpoints are in the same network and program bypass flows between OpenFlow switches. In the context of multi-tenant data centers, the virtualization performance of overlay network technologies such as VXLAN is far below link speeds, and overlay processing consumes substantial resources from the hypervisor. This dissertation introduces PARES (PAcket

PAGE 13

REwriting on SDN), which uses the packet rewriting feature of SDN switches to enable multi-tenant functionality with near-native performance and reduced load on end-hosts. In the public WAN Internet, along with the emergence of cloud computing, overlay networks can be used to increase end-to-end bandwidth by using additional cloud paths, instead of using only the default Internet path. This dissertation empirically evaluates the extent to which using cloud paths to transfer data in parallel with the default Internet path can improve the end-to-end bandwidth in bulk data transfers.

PAGE 14

CHAPTER 1
INTRODUCTION

Network protocols cannot be modified without significant effort and cost after they become widely adopted. To address this issue, overlay networks have been used in several scenarios to provide flexibility on top of an existing unmodified infrastructure [3, 36, 4]. For example, a user may want to create a Virtual Private Network (VPN) [4] connection between two nodes separated by multiple Network Address Translation (NAT) [83] devices across the public Wide Area Network (WAN). She may choose one of many known overlay network technologies (such as IPsec [48], GRE [36] or IPOP [47]), which capture packets or frames from the end-points and encapsulate them with another network header (such as GRE or UDP). These outer network headers then enable the packets to traverse the fabric based on the outer address, and to reach the destination endpoints. As another example, in a virtualized multi-tenant data center, the administrator wants to decouple the actual network identifiers of the physical network interfaces used by hypervisors from the virtual network interfaces of tenants' virtual machines. Thus, she may choose one of many overlay network technologies (such as VXLAN [55] or NVGRE [26]), which encapsulate a tenant's packets with the hypervisor network identifier to build arbitrary virtual network topologies and carry the packets across the data center [43]. Another use of overlay networks is to add supplemental paths to increase aggregate bandwidth or reliability. The Internet routing path between a given source and destination pair is determined by protocols between Autonomous Systems (AS) [37] and then commonly by Classless Inter-Domain Routing (CIDR) [25]. However, this path is not always the best path in terms of bandwidth and packet loss rate. A user may want to use a path other than the default Internet path; however, this is often not practical, not only because end-hosts do not have the authority to modify the routing rules, but also because the decision is made by multiple entities such as ASes. Alternatively, a user can intentionally deploy additional cloud nodes across the public WAN, and use an overlay network approach to support packets to be routed

PAGE 15

through the cloud-provisioned nodes. Throughout this dissertation, we refer to this alternate path using the above technique as a cloud-path. As stated above, overlay networks have been widely used in various scenarios. However, overlay networks have inherent overheads, including encapsulation and context switching. Moreover, although overlays can provide additional flexibility and additional network resources, widely used protocols and APIs are not designed to expose overlay functionality. This dissertation studies the above three usage scenarios of overlay networks. In the remainder of Chapter 1, I briefly survey overlay networks, Software Defined Networking (SDN) [57], Multi-Tenant Data Center (MTDC) [61] architectures, and cloud-routed overlay paths (cloud-paths) [11], as these are the foundational concepts of the problem domain this dissertation deals with. Chapters 2, 3 and 4 are dedicated to studies and findings for the usage scenarios stated above, respectively.

1.1 Overlay Paradigm

The philosophy of the IP-based network is to keep the middle simple and the edge smart. Best-effort packet delivery in the network layer is characterized as unreliable and connectionless. Thus, packets may be corrupted, lost, or delivered out-of-order. What makes it possible to enforce reliable end-to-end delivery are protocols at the transport layer, which are implemented above the IP layer, on edge/end devices. However, IPv4 address exhaustion and the necessity for isolation from the public Internet for security inevitably led to the introduction of Network Address Translation (NAT) at edge network devices, which separates IPv4 address subnets and prevents packets of connectionless protocols from traversing through. In addition, middleboxes (such as load balancers or proxies) transform transport headers between the end-points. Thus, the current network infrastructure is not a pure IP network fabric from end to end, but rather a more complex fabric subject to the introduction of network devices such as gateways, NATs, firewalls, or middleboxes. Since an overlay places nodes in the middle of the fabric that function as more than network/switch devices, the overlay network paradigm contrasts with this "keep the middle

PAGE 16

Figure 1-1. Various overlay use cases this dissertation studies

simple and make the edge smart" design philosophy. Overlays may place nodes on the fabric such that they are able to modify the packet to meet end-point application interests. For example, RON [3] and X-Bone [88] improve resilience and flexibility. Yoid [24] and Bullet [52] deploy overlay nodes in the network and allow multicast from a single source to multiple destinations. Chord [85] deploys overlay nodes in a ring-shaped topology, and packets are routed based on identifiers to implement systems such as a Distributed Hash Table (DHT). As such, overlays are now essential in current network infrastructure. In this dissertation, we study three different overlay usage scenarios as shown in Figure 1-1: one in inter-cloud architecture (red circle), one in the multi-tenant data center (blue square) and another in the public Internet environment (purple star). In inter-cloud architecture, overlays are deployed in multiple and different cloud service providers, and provide a layer 2 or 3 abstraction to the VMs as if they were in the same LAN or CIDR environment. In multi-tenant

PAGE 17

data centers (MTDCs), overlays are deployed inside the data center and run on the hypervisor to provide the same virtual network environment to the multiple VM instances of a tenant. Unlike other overlay environments, MTDCs require performance equivalent to line rate; thus, conventional overlay implementations for MTDCs require substantial computing resources from the hypervisor. In the final usage case, multiple nodes are deployed in clouds, and end-hosts use these nodes to maximize throughput by aggregating bandwidth. In this case, cloud nodes are used to bypass congested routing paths and to diversify the available paths, which the end-users cannot control. Thus, overlays are used as a means to bypass exterior gateway protocols.

1.2 Overlay Networks

Before the emergence of Software Defined Networking (SDN), there were two general approaches to network virtualization. One is through encapsulation, also known as tunneling, to form overlay networks [3, 47]. The other approach is to use an additional protocol, such as VLAN tagging, along with network devices supporting the protocol [55]. The two approaches have advantages and shortcomings. Encapsulation has the advantage of flexibility, since it can virtualize any layer of a network frame or packet. However, it needs to limit the packet/frame size to avoid the fragmentation problem. Another drawback is the overhead of appending an additional network header to perform encapsulation. During the encapsulation process, for every packet/frame in the datapath, a new network header needs to be built along with its new checksum. While hardware network devices use Ternary Content Addressable Memory (TCAM) to achieve packet switching/routing at 10+ Gbps bandwidth on prefix routing [1, 68], with an extremely limited number of entries in the TCAM, the encapsulation process typically depends on a commodity CPU, which makes it difficult to process packets on par with hardware switches [77]. Besides, applications using encapsulation expose the packets to untrusted public Internet devices/links, requiring encryption, which places an additional burden on the end-host CPU. While the latter approach (a separate protocol with hardware implementation) has the advantage of line-speed virtualization, it needs to be supported by the network devices. It

PAGE 18

also has the disadvantage that a network administrator needs to thoroughly configure all the involved network devices. Furthermore, it may not scale because of limited support from the protocol itself or the hardware device implementation. For instance, a VLAN tag identifier has only a 12-bit field, limiting the number of entries to 4096 [55]. Furthermore, the IP multicast entries of commodity routers or switches are on the order of 100s to 1000s, limited by the hardware specification. Those limitations in protocol and hardware implementation place a scalability barrier upon network virtualization. In addition, switch and router console interfaces are different from vendor to vendor, making it harder to manage virtualized systems at scale. Overlay networks provide a flexible foundation for Virtual Private Networking (VPN), by tunneling virtual network traffic through private, authenticated end-to-end overlay links. Overlay networks have been widely used as a solution for network virtualization at different layers (link, network, and application). The idea of building user-level virtual networks for grid/cloud computing dates back to systems including RON [3], Violin [46], VNET [86], IPOP [47], ViNe [89], hSwitch [22], X-Bone [88], and MBone [19]. RON [3] showed that an application layer overlay on top of the Internet routing substrate substantially improves the recovery rate from infrastructure outages. Violin [46] proposed overlay virtual networks providing the abstraction of separate layer-2 networks for tenants. Violin was implemented without requiring any modifications to the VMM (Virtual Machine Monitor) or the hosting network infrastructure, not only providing a flexible, user-configurable network environment, but also reducing the threat of security risk from the host. VNET [86] addressed a similar problem, focusing on the ability to inter-connect virtual machines across providers. This was accomplished by running VNET "proxies" on endpoints (e.g. VMs) at different sites, and tunneling L2 traffic over TCP/TLS links. hSwitch [22] forms an overlay network by creating Generic Routing Encapsulation (GRE) [36] tunnels across dispersed nested VMs. However, because it creates tunnels based on provided configuration files, the overlay topology is static, and dynamic joining/leaving of instances is complex to manage. ViNe [89] can be considered an NFV (Network Functions Virtualization) node. ViNe runs a Virtual Router

PAGE 19

(VR), implemented in the form of a virtual machine, in a cluster environment, and provides a virtual network environment across geo-distributed clusters. Internet Protocol Security (IPsec) is widely used in VPNs since it provides authentication, encryption and host-to-host virtualization. The IP-over-P2P overlay (IPOP) [47] supports both layer 2 and 3 virtual private networking with peer-to-peer tunnels that are dynamically created and managed, even when nodes are behind NATs. Tunnels are created to reflect relationships stored in messaging or online social network services (e.g. an XMPP server), supporting overlay topologies such as unstructured social graphs and structured P2P networks [85]. Many kernel-level overlays are also used, especially in data centers. VMware's NSX and Microsoft's Hyper-V run VXLAN [55] and NVGRE [26] respectively on virtual machine monitors and encapsulate packets from the virtual network interfaces of guest machines with UDP and GRE headers respectively. While providing flexibility, both user- and kernel-level overlays inherently incur network virtualization overheads, such as user/kernel boundary crossing and header encapsulation, and management overheads, such as virtual network interfaces to attach/detach to the physical network infrastructure, and setup and management of the overlay itself. Violin, RON and IPOP create their own routing header appended before the conventional network routing header, which allows a dynamic overlay network topology and convenient configuration. VNET and hSwitch take advantage of conventional protocols, TCP/UDP and GRE respectively, as encapsulation headers. VNET achieves relatively high-performance virtualization compared to other research by taking advantage of a kernel implementation and multi-threaded packet handling [95]. However, the nature of a commodity processor handling the virtualization process does not change. In this dissertation, IPOP is used as one of the studied substrates for overlay networking. Although IPOP is not a performance-centric overlay network, it has embedded NAT traversal techniques to create dynamic peer-to-peer tunnels, and has autonomic topology management. In particular, it separates the topology policy module from the encapsulation and encryption

PAGE 20

module, separates the data plane from the control plane, and exposes APIs. Those features make IPOP suitable software to integrate with an SDN framework. IPOP works in live migration scenarios, and has been used in the Kangaroo [75] platform-as-a-service system. However, its overlay processing hindered the performance of nested VMs across hosts. This dissertation directly addresses this performance limitation.

1.3 Software Defined Networking

While overlay networks provide network virtualization at the end-points, another approach to deliver network virtualization is from the network infrastructure fabric, along with dedicated protocols. A well-known and widely-used technique, VLAN, keeps the layer 2 boundary based on a 12-bit VID (VLAN Identifier) field. A network administrator can manage layer 2 virtualization by configuring VLANs in network devices, creating a virtual layer 2 fabric upon a physical switch fabric. VLAN uses a flat 12-bit identifier to virtualize layer 2 networks, thus it inherently cannot scale to more than 4096 virtual layer 2 networks [55]. Koponen et al. also point out that conventional techniques such as VLANs are not able to cope with the need for flexible configuration and isolation across large numbers of tenant address spaces [51]. Software-defined networking (SDN) [35, 57] has emerged as a general approach to address such challenges. SDN allows network administrators to programmatically and dynamically configure, control, and manage network behavior through open interfaces, and provides a means to monitor traffic on network devices. SDN differs from protocol-oriented network virtualization (e.g. VLAN), as it lets a controller define the behavior of switches through open API interfaces. SDN has emerged to provide flexibility in petrified, hardware-based network infrastructure, and to unify interfaces among vendors. It divides the control and data planes in the infrastructure layer, which enables a network operator or administrator to programmatically configure the network infrastructure. Although configuration of network devices was possible even before SDN, the scope of configurability and functionality differs from vendor to vendor. As data centers scale out, not only in terms of the number of physical devices inside the site but

PAGE 21

also in terms of the number of virtual entities, the level of heterogeneity among network devices increases, which motivates abstracting away network device functionality in the form of SDN. Although, in SDN, the abstraction of the infrastructure is provided to the control plane, allowing even experimental protocols to run on SDN infrastructure, the control plane does not have complete control of the data plane. For example, the dominant SDN protocol, OpenFlow [57], allows match/rewrite of a limited set of fields of the network protocol header, such as MAC/IP addresses and transport port numbers. However, features such as overall packet rewriting, fragmenting, encrypting, or merging packets or frames are not part of the OpenFlow specification. One of the motivations of SDN is the necessity of testing experimental protocols, to overcome the difficulties of deploying new protocols on legacy hardware switches and routers [57]. Nonetheless, much research has been conducted on using SDN as a technique to virtualize the network from the infrastructure service providers' point of view. VirtualWire [93] takes advantage of connect/disconnect primitives, which are assumed to be exposed by the cloud service provider, and uses these primitives to live migrate VMs from one cloud to another. Network Virtualization Platform (NVP) [51] provides REST APIs to the tenants to expose network virtualization capabilities. Network elements such as switches and ports are presented to the tenants, and tenants build the topology of their network. Then, tunnels and flow rules are created and programmed by NVP into each hardware and software OpenFlow switch to forward packets among VMs deployed intra- and inter-cloud. However, NVP is designed to support multiple tenants in a single cloud provider. WL2 [14] presents an SDN-based solution for an Internet-scale virtual layer 2 across multiple data centers. It eliminates layer 2 broadcasting by rerouting and soliciting layer 2 traffic in the control plane, and introduces a virtual MAC address, which is a concatenation of data center, switch, VM, and tenant identifiers, to map flat addresses to hierarchical network entities. The advent of SDN techniques and the OpenFlow standard has unlocked the potential to address the limitations of virtualization performance, and to address range scalability, while still deploying network virtualization services at scale within or beyond service providers.
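As a concrete illustration of the limited match/set-field capability described above, the sketch below shows how an OpenFlow 1.3 controller application could install a flow that rewrites the destination IP and MAC of a matching TCP flow and forwards it out a chosen port. It uses the Ryu framework purely as an example (not necessarily the controller or rule set used in this dissertation), and the addresses, port numbers and output port are placeholders.

from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3


class RewriteExample(app_manager.RyuApp):
    """Install a single header-rewrite flow when a switch connects."""
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def switch_features_handler(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        # Match a specific TCP flow (placeholder addresses/ports).
        match = parser.OFPMatch(eth_type=0x0800, ip_proto=6,
                                ipv4_dst='10.10.0.2', tcp_dst=5001)
        # Set-Field actions: only header fields exposed by OpenFlow can be
        # rewritten; payload rewriting is not expressible here.
        actions = [parser.OFPActionSetField(ipv4_dst='192.168.1.20'),
                   parser.OFPActionSetField(eth_dst='aa:bb:cc:dd:ee:02'),
                   parser.OFPActionOutput(2)]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                      match=match, instructions=inst))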

PAGE 22

However, while SDN-based solutions are becoming well-understood within the context of a single provider's infrastructure, SDN solutions across non-cooperating providers (e.g. different public/private clouds) are currently not feasible, as providers are not willing to allow external entities to exert control over their SDN infrastructure.

1.4 Network Functions in SDN, NFV and Middleboxes

While SDN emerged from academia, industry and the open source community, Network Function Virtualization (NFV) arose from telecommunication and infrastructure vendors. NFV virtualizes network node functions, typically in the form of virtual machines attached to the network (also known as middleboxes), for improved communication services. Hundreds of telecommunication companies and network device manufacturers are trying to generalize which functions to virtualize, and to identify common software design patterns from what has previously been implemented as proprietary hardware appliances [12, 56, 42]. Along with data center and server virtualization trends, SDN/NFV are now understood as a key to improving virtualization performance. Subsequently, the need to efficiently support large virtual private networks for tenants in enterprise data centers, such as public and private clouds, has motivated the adoption of SDN/NFV for data center network virtualization. In this section, the implementation complexity of various network functions is surveyed and the boundaries between SDN, NFV and middleboxes are clarified. Middleboxes are usually a single entity capable of one of various network functions. Network functions range from load balancers, NATs or proxies in their simplest form [12], to transport-layer encryption (TLS) [64] and Hadoop-layer data aggregation [56], which require reading and writing the whole packet payload. Figure 1-2 illustrates the complexity of implementing network functions and the boundaries between these entities. Each axis shows the required degree of depth in write/read access to packets. As the depth increases, the functions require deeper inspection of, or writes to, the packets. The leftmost network function is VLAN, which only requires layer 2 shallow packet inspection

PAGE 23

Figure 1-2. Approximate implementation complexity of network functions

and does not require modification of any layer of the network header. Moving horizontally, NFV requires access to higher network layers or deeper into the payload. As the depth of inspection increases, network functions require payload inspection, such as a transcoder (an on-the-fly encryption/decryption middlebox for the application layer [12]) or a Hadoop data aggregator. The vertical axis represents the degree of complexity of rewriting a packet. As it increases, the network function requires more complex writing, such as encapsulation or payload rewriting. A simple load balancer can be implemented using rerouting, without requiring payload access, while a complex HTTP load balancer requires access to the payload

PAGE 24

to parse HTTP. Note that Figure 1-2 is for visual illustration; stateless and stateful processes are arbitrarily placed along the vertical axis. While recent network functions require deeper rewriting of packet payloads, the OpenFlow protocol only allows packet rewriting of a fraction of the fields of the network header. From OpenFlow version 1.5.1, the "Set-Field" action is categorized as an "optional" action instead of a "required" action, leading hardware OpenFlow vendors to implement this action only in the slow-path (using a general purpose processor) rather than in the fast-path. This is also known as forwarding and control element separation [63, 50]. Although it is conceivably possible to implement any form of network function in OpenFlow by calling into the controller, many are impractical since this involves forwarding whole packets to the controller, applying modifications, and sending them back to the original switches. Thus, in this dissertation, we concentrate on network functions that are implementable in the current OpenFlow specification, in the fast-path when possible, or in the slow-path.

1.5 Network Virtualization in Multi-Tenant Data Center

Multi-tenant data centers for cloud computing require the deployment of virtual private networks for tenants in an on-demand manner, providing isolation and security between tenants. To address these requirements, hypervisors use protocols such as VXLAN [55] or NVGRE [26]. These protocols run in a kernel layer of the hypervisor, encapsulate packets from the guest machines, and tunnel the packets inside the data center in a secure and flexible manner. However, these approaches inherently incur processing overhead on the hypervisor, as other user-space overlays do, reducing the effective throughput for the tenant virtual network compared to the native network. This is unlike other overlay networks for VPNs, such as GRE [36] or IPsec [48], where the virtualization processing overhead in user space can keep pace with the throughput; the virtualization processing overhead is exacerbated in data center network environments, where 10 Gbps bandwidth is common. As server virtualization technologies mature, multi-tenant cloud computing has become widely used as a platform for on-demand resource provisioning. Dynamic provisioning of

PAGE 25

computing resources as virtual machines provides a flexible computing infrastructure that can be tailored by tenants, and provides high availability and proximity to clients by geographically distributing servers across different zones. From a networking perspective, multi-tenancy in data centers requires an ability to deploy arbitrary virtual network topologies upon the physical network infrastructure, along with managing overlapping addresses between different tenants. In addition, for better utilization and high availability, VM migration is an essential feature in MTDCs. When VMs are created upon a user's request, they should be virtually connected to the user's already deployed VMs, regardless of their physical location. Also, when live migrated, the network identity (such as MAC or IP address) of VMs should remain the same. Finally, when VMs are terminated, their network presence should be revoked from the tenant virtual network, and network identities should be reclaimed. Tunneling has been widely used to tackle these issues by providing a virtual overlay network upon the physical data center network. Tunneling advantages include ease of deployment and separation from the physical network topology, because it obviates the need for additional protocols from physical switches such as VLAN or MPLS (Multiprotocol Label Switching Architecture) [78]. Generally, the tunneling process involves encapsulating every packet from the virtual overlay network with a physical network header. This process is usually carried out by general purpose processors at end-points and implemented in software (e.g. hypervisors), instead of the physical network fabric. Nowadays, a link speed of 10 Gbps is prevalent, and the trend is toward higher rates. While network devices surpass 10 Gbps, it is increasingly difficult for the tunneling process to keep this pace: although current CPU technology achieves a few hundreds of GIPS (Giga Instructions per Second), the processing time of packet classification is dominated by memory access time rather than CPU cycle time [31]. Moreover, the tunneling process hogs computing resources at hypervisors, nibbling away resources available to the guest machines, as the encapsulation process includes prepending an additional header on every packet and frequent context switches between the hypervisor and guest machines.
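To make the fixed header cost of tunneling concrete, the short calculation below (a worked example, not taken from the dissertation's experiments) uses the VXLAN header sizes from RFC 7348: the outer IP, outer UDP, VXLAN and inner Ethernet headers add 50 bytes to every packet, shrinking the guest-visible MTU accordingly. The per-packet CPU and context-switch cost discussed above comes on top of this byte overhead.

# Fixed per-packet cost of VXLAN-style encapsulation (RFC 7348 header sizes).
OUTER_IP, OUTER_UDP, VXLAN_HDR, INNER_ETH = 20, 8, 8, 14
ENCAP_BYTES = OUTER_IP + OUTER_UDP + VXLAN_HDR + INNER_ETH  # 50 bytes

def inner_ip_mtu(physical_mtu: int = 1500) -> int:
    # The inner IP packet plus all encapsulation headers must fit within the
    # physical MTU, so the guest-visible MTU shrinks by ENCAP_BYTES.
    return physical_mtu - ENCAP_BYTES

if __name__ == "__main__":
    for mtu in (1500, 9000):
        inner = inner_ip_mtu(mtu)
        overhead = ENCAP_BYTES / mtu
        print(f"physical MTU {mtu}: inner MTU {inner}, "
              f"~{overhead:.1%} of each full-sized packet is encapsulation")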

PAGE 26

In Chapter 3, PARES (PAcket REwriting on SDN) is introduced, a novel technique which uses the packet rewriting feature of SDN switches to provide multi-tenancy in data center networks at edge switches, thereby reducing the load on end-point hypervisors and improving the throughput compared to tunneling. Experiments in an SDN testbed show that the proposed data center architecture with PARES achieves near line-rate multi-tenancy virtualization with 10 Gbps links (compared to 20% of line rate for VXLAN tunneling), without incurring processing overhead at end-point hypervisors or guest servers. Additionally, Chapter 3 evaluates the scalability of PARES for ARP protocol handling and with respect to the number of SDN flow entries.

1.6 Cloud-Routed Overlay Path

It is known that the Internet is a loosely-coupled aggregate of multiple ASes (Autonomous Systems), where no single entity has complete authority or control to optimize congestion or the utilization of traffic across the network. This nature of the Internet architecture occasionally leads to a counter-intuitive phenomenon, where a direct shortest route with low latency delivers less bandwidth than alternative, long and roundabout routes. In other words, congestion control and utilization are not globally optimized. Although ISP-provided dedicated private leased lines can be used for higher bandwidth and improved reliability, line lease contracts occur at the granularity of only a few per year [44], making it practically impossible to use them for a short-term period, or to address sudden spikes in network traffic demand. Previous research has shown that geospatially distributed computing instances in commercial clouds offer users an opportunity to deploy relay points to detour potentially congested ASes, and a means to diversify paths to increase overall bandwidth and reliability. The existence of a variety of commercial cloud computing offerings allows users to swiftly deploy publicly routable computing resources in a flexible and distributed manner: users can

PAGE 27

deploy VM instances in minutes, and at different geographical locations within the scope of a single provider (different availability zones), or across providers. For example, just within the continental United States, Amazon EC2 operates data centers at four sites (Oregon, Ohio, Virginia and California); other cloud providers have similar geospatial diversity in their data center locations. This geospatial diversity presents a means to use cloud services as detour relay points to create additional end-to-end paths. CRONets [11] found that 78% of Internet paths can achieve improved throughput by leveraging alternate cloud-routed paths, and that the improvement factor is about 3 times, on average. Such an opportunity comes with a cost, as cloud-routed paths incur the cost not only of provisioning computing resources, but also of additional traffic to/from the Internet. Chapter 4 empirically evaluates the extent to which using cloud paths to transfer data in parallel with the default Internet path can improve the end-to-end bandwidth in bulk data transfers. In the evaluation, we consider single-stream and multi-stream TCP transfers across one or more paths. Moreover, we suggest an application-level design pattern that takes advantage of this improved aggregate bandwidth to reduce data transfer times.

1.7 Parallel TCP

Previous work has also shown that parallel TCP streams on a single path can be applied to increase overall throughput [32, 33, 34]. Parallel single-path TCP is widely used in applications such as GridFTP [2], in particular in scientific projects that require bulk data transfers such as ATLAS or CERN. However, although parallel TCP is effective in underutilized network environments, if the network is fully utilized, parallel TCP unfairly ousts competing TCP streams. Moreover, end-points in the public WAN do not have many options to diversify their routing path. As such, parallel TCP expects only a single routing path, as with other transport protocols, and it is not an effective solution to cope with fully utilized and bottlenecked network environments. Nonetheless, the availability of commercial cloud instances on the

PAGE 28

Internet, and of various overlay technologies, presents options for diversifying the routing path on the public WAN, which can overcome the limits of a single routing path. In this dissertation, we use the term bandwidth to refer to the transfer rate of one or more individual streams at the transport layer (e.g. as measured by "iperf"), where the byte stream is not ordered across streams, and throughput to refer to the end-to-end bulk data rate at the application layer, where the byte stream is ordered at the application layer. In Chapter 4, we show how much aggregate bandwidth can be achieved in bulk data transfers by concurrently running parallel TCP streams on a default Internet path, along with multiple cloud-routed overlay paths, in a public WAN environment. It is observed that, in many scenarios, additional cloud paths with parallel TCP streams achieve aggregate throughput equal to or greater than the default Internet path (single-stream and multi-stream TCP), while in some cases achieving significantly larger aggregate throughput for end-to-end bulk data transfers. In addition, a design pattern, as a pseudo socket API, that can leverage the increased aggregate bandwidth of cloud multi-paths to increase overall throughput is presented.

1.8 Dissertation Organization

In Chapter 2, we study a hybrid model of overlay networks and SDN called VIAS. VIAS integrates overlay and SDN techniques to support flexible and high-performance virtual networking, in particular across tenant-managed nested virtualization instances distributed across cloud providers. In Chapter 3, we investigate how overlay networks help to provide multi-tenancy in current multi-tenant data centers (MTDCs), then study and present an SDN-based approach that operates at edge switches of a data center fabric. In Chapter 4, empirical experiment results of increased aggregate bandwidth using cloud-paths are described and analyzed. Then, we investigate and implement socket APIs to leverage this multi-stream aggregate bandwidth and to provide an ordered byte stream to the application layer.
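The multi-path idea summarized in Sections 1.6 and 1.7 can be sketched at the socket level as follows. This is only an illustrative sketch, not the design pattern evaluated in Chapter 4: the endpoint addresses are placeholders (one direct path plus two hypothetical cloud relays assumed to forward traffic to the same destination), and chunk ordering/reassembly at the receiver, which turns aggregate bandwidth into application throughput, is omitted.

import socket
import threading

# Placeholder endpoints: the default Internet path plus two cloud relays.
PATHS = [("203.0.113.10", 5001),   # direct path (example address)
         ("198.51.100.20", 5001),  # cloud relay A (example address)
         ("192.0.2.30", 5001)]     # cloud relay B (example address)

def send_chunk(addr, chunk):
    # One TCP stream per path; a real transfer would also carry each chunk's
    # offset so the receiver can rebuild an ordered byte stream.
    with socket.create_connection(addr) as s:
        s.sendall(chunk)

def parallel_send(data, paths=PATHS):
    # Naive static split across paths; Chapter 4 discusses choosing the
    # stream count per path to maximize aggregate bandwidth.
    n = len(paths)
    size = (len(data) + n - 1) // n
    threads = [threading.Thread(target=send_chunk,
                                args=(p, data[i * size:(i + 1) * size]))
               for i, p in enumerate(paths)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

if __name__ == "__main__":
    parallel_send(b"x" * (64 * 1024 * 1024))  # 64 MiB test payload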

PAGE 29

CHAPTER 2
OVERLAY AND SDN HYBRID MODEL FOR INTER-CLOUD ARCHITECTURE

Chapter 2 presents VIAS, a novel approach that integrates overlay and SDN techniques to support flexible and high-performance virtual networking, in particular across tenant-managed nested virtualization instances distributed across cloud providers. It does so while exposing a network layer 2 abstraction to endpoints. VIAS supports the general case (the abstraction of a layer-2 virtual private network linking instances across providers) by employing overlay networking as the substrate for virtualization, and optimizes for a common case (network flows among nodes within the same provider) by means of a novel performance enhancement technique that automatically detects and programs fast bypass TCP/UDP flows using SDN APIs. Such an SDN-programmed VIAS bypass removes the necessity of packet encapsulation and delivers virtualization performance near wire link speed. VIAS detects traffic-intensive TCP/UDP flows inside the overlay-encapsulated data traffic, and automatically switches over to the SDN fabric whenever such a path can be programmed, such as when endpoints are within the same cloud provider. The main contribution of Chapter 2 is a novel system that integrates distributed overlay network and SDN controllers to self-configure peer-to-peer virtual private overlay network tunnels across cloud providers, and to transparently detect and program virtual network bypass flows within a provider. The system has been implemented by leveraging an existing overlay technique (IPOP [47]) as a substrate, and integrating existing SDN programmable switches (Open vSwitch) to establish bypass flows. While it can generalize to different overlay tunneling and SDN targets, VIAS applies, in particular, to inter-cloud nested virtualization environments, where tenants can deploy software SDN switches (e.g. Open vSwitch) in their own instances. Chapter 2 demonstrates the functionality of VIAS within and across clouds, and evaluates the performance of the prototype in multiple cloud environments. The rest of Chapter 2 is organized as follows: Section 2.1 elaborates on the various aspects of network virtualization and the environment of inter-cloud architecture. Section 2.3

PAGE 30

Figure 2-1. Inter-cloud architecture

presents an overview of the overlay and SDN hybrid model. Section 2.4 describes the general architecture of VIAS. Section 2.5 provides details on the design and implementation of a VIAS system. Section 2.6 presents results from the evaluation of the prototype in realistic cloud environments. Section 2.7 discusses related work and a literature survey, and Section 2.8 concludes the chapter.

2.1 Background

This section overviews nested virtualization, inter-cloud architecture, process containers and the virtual networking issues and challenges that serve as a basis to motivate the design and uses of VIAS.

2.1.1 Inter-Cloud Architecture

As cloud computing technology has emerged, the concept of a computing resource has changed into a form of utility such as electricity or water. These resources are now provisioned on-demand, and they are provided in the form of virtual machines at diverse granularities; for example,

PAGE 31

\m1.medium"asinAmazonEC2andmediumorn1-standard-1inGoogleCloudPlatform.Thesechangesconceptuallydetachcomputingresourcesfromphysicalinfrastructure.Thesedays,usersdeploytheirmulti-tenantapplicationonprovidedcomputingresourcesfromclouds,notonphysicalservers.Alongwiththesetrend,socalledinter-cloudarchitectureemerges,whichdeploymulti-tenantsoftwareuponheterogeneouscomputingresourcesacrossdierentcloudserviceproviders[ 29 ].Forexample,singlemulti-tenantapplicationcanbedeployedacrossAmazonEC2,GooglecomputeengineandCloudLab[ 16 ]asinFigure 2-1 .Inthisexample,userdeployedmultiplecloudinstancesacrossheterogeneouscloudserviceprovider,whereusingthepublicInternetasasubstratenetworkandwhereprovidingthesamelayer2networkabstractionisessentialinthisusagescenario.ThebenetofexibledeploymentofcomputingresourcesenabledInter-CloudArchitecture(ICA)whichappropriatesmultiplecloudserviceproviderforbetterQualityofService(QoS),reliability,costeectiveness,andpreventingvendorlock-inproblem[ 29 ].MassivelyparallelapplicationssuchasworkowmanagementsystemsuchasPegasus[ 17 ],parametersweepsandbag-of-tasks[ 66 ]canbebenetedbothintermsofcostandreliability.Naturally,sincewecanobtaincomputingresourcesfromanycloudproviders,wecandiversifythegeographicallocationofcomputingresourcesandincreasetheproximitybetweentheserverandclient.Especially,intheeldofcontentdeliverynetwork,wherethelatencybetweenclientandserveriscritical,diversifyinglocationscanleadtobetterqualityofservice.Also,since,wearenotdependentonasinglecloudserviceproviders,wecanalsoincreasethereliability,cost-eectivenessandcanavoidvendor-lockinproblem.Inter-cloudarchitectureusestheInternetasasubstratenetwork.However,theinternetitselfisacoordinationofheterogeneousautonomousnetwork.AndMostofInfrastructuresareoutsidethecontrolofanygivenentityAndcomputingresourcesoncloudareusuallybehindrewallandNATrouter. 31

PAGE 32

Figure 2-2. Inherent heterogeneity of the network environment for inter-cloud architecture.

So, it is difficult to merge resources into a single LAN environment across an ICA. Usually, different private IP subnets are assigned by different cloud service providers, and not all providers support layer 2 networking among the instances nor expose the whole layer 2 network to the instances. Moreover, computing resources may be provided in different forms, as shown in Figure 2-2. Generally, a resource is provided in the form of a guest VM from the virtual machine monitor. However, it is also now becoming popular to provide it in the form of a container within a VM, and in a few cases cloud services provide bare-metal machines with hardware switches. Overall, it is basically impractical to physically interconnect computing resources in an ICA, so it is inevitable to use a virtual network.

2.1.2 Nested Virtualization and Nested Virtual Network

VIAS is a natural fit for nested virtual environments, which recursively virtualize the virtual network. Since the rise of multi-tenant clouds, the approach of providing virtualized computing resources such as virtual machines and containers has become key to an increasing body of distributed computing applications. As Infrastructure-as-a-Service (IaaS) becomes pervasive, tenants have the ability to deploy distributed computing services on demand across multiple private and public clouds. Furthermore, tenants are able to deploy their own virtual infrastructure on provider resources, using techniques that can be broadly characterized as "nested virtualization", not only as a means to improve utilization of resources, but also to enhance performance by co-locating services. To effectively use such

PAGE 33

deeply-virtualized, distributed cloud environments, seamless networking across (nested) VM instances is key. Today, computing resources are readily available through multiple cloud service APIs and portals, and can be customized and/or instantiated from a template image in a matter of minutes. Cloud providers expose such virtual resources with a set of instance types, with discrete combinations of CPU, memory and storage capacities. However, the resource offerings may not match the requirements of tenant workloads, and may impact resource utilization, motivating several approaches that leverage nested virtualization [75, 97, 8]. Nested virtualization allows tenants to provision "inner" VM instances within the "outer" VM provisioned by the cloud service. As such, tenants can choose their own "inner" VMM and cloud stack (which may be different from the provider's) to run within the provisioned instance, allowing for increased flexibility, e.g. by slicing a generic large "outer" instance into several "inner" instances with tenant-configured sizes. Furthermore, nested virtualization allows tenants to configure once, and deploy their entire environment across different cloud service providers. Such features are important because, while libraries such as libcloud [5] in principle allow tenants to manage multiple providers, in practice the problem of heterogeneity of services cannot be resolved just by using such libraries. Issues include that different providers may use different VMMs with different image formats, different block storage abstractions, and different network capabilities. While nested virtualization adds a layer of indirection that unlocks benefits in flexibility and fine-grain control of resource allocations, it raises challenges with respect to performance and networking. One incarnation of nested virtualization that addresses the performance issue within an instance is to deploy light-weight containers (e.g. LXC, Docker) as the "inner" instances on top of a provider's hypervisor. While tenants can use cloud APIs and Web interfaces within a provider to configure the network for their cloud instances (e.g. boto or gcloud), there is no inter-operability across providers.
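The heterogeneity point above can be seen even when a multi-provider library is used. The sketch below uses Apache Libcloud purely as an example (credentials, regions and project names are placeholders, and exact constructor arguments vary by driver version): the listing calls are uniform, but the constructor arguments and the provider-specific options behind create_node() still differ per cloud, which is exactly what a tenant-managed nested layer has to paper over.

from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

# Placeholder credentials and identifiers; the constructor arguments already
# differ between providers.
ec2 = get_driver(Provider.EC2)("ACCESS_KEY", "SECRET_KEY", region="us-east-1")
gce = get_driver(Provider.GCE)("svc@example-project.iam.gserviceaccount.com",
                               "key.json", project="example-project",
                               datacenter="us-central1-a")

# The listing API is common, but sizes, images and the extra keyword
# arguments accepted by create_node() remain provider-specific.
for name, drv in (("EC2", ec2), ("GCE", gce)):
    sizes = drv.list_sizes()
    print(name, [s.id for s in sizes[:3]])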

PAGE 34

In its most general form, nested virtualization gives virtual machines the capability to themselves host other virtual machines, in a recursive fashion. It builds upon the ability of "classic" VM monitors/hypervisors to expose the abstraction of a virtual machine at the Instruction Set Architecture (ISA) layer, and can be supported if the underlying ISA follows the guidelines set forth in seminal virtualization work [71]. Initially applied in partitionable mainframe systems, nested virtualization became feasible in commodity server systems with the advent of virtualization extensions for the x86 architecture [90], and has been motivated by use cases in cloud computing [8]. While the most general form of nested virtualization allows a "classic" hypervisor to be instantiated within a virtual machine and supports completely unmodified software running within VMs (e.g. instantiating KVM within VMware), the nested approach can also be applied in other configurations, such as nested para-virtualized hypervisors (e.g. Xen on Xen [94]), trading off potential performance benefits against the additional requirement of software modifications in the kernel and/or applications. In particular, a form of nested virtualization that is appealing for deployment of software services in IaaS cloud computing platforms is to use the hypervisor of the cloud provider (e.g. Xen in Amazon EC2, KVM in Google Compute Engine, Hyper-V in Microsoft Azure) to deploy O/S containers (e.g. Linux LXC/Docker). This is the approach taken in Kangaroo [75]; the key advantage of this approach is performance, because containers are light-weight. The requirement to run software within containers poses a constraint, as nested instances need to use the same O/S kernel, but it is acceptable in many applications, as the adoption of container technologies continues to increase. Different approaches to nested virtualization, as described above, yield different models of how virtual CPUs, memory and storage are allocated and managed. For instance, classic VMs expose virtual CPUs, physical memory, and block storage devices, while containers expose processes, virtual memory, and file systems at the O/S layer. Nonetheless, in general, nested virtualized systems typically expose a similar interface to the networking subsystem across multiple platforms. The lower layer has Ethernet virtual network interface(s) provided and

PAGE 35

managed by a cloud provider's "outer" hypervisor; these are multiplexed among Ethernet virtual network interfaces managed/exposed by the tenant's "inner" virtualization stack. The increasing variety of cloud providers and the added flexibility of an extra layer of indirection from nested virtualization are welcome news for tenants interested in deploying multi-cloud virtual infrastructures that seek to achieve high availability and performance while minimizing cost and avoiding vendor lock-in. Networking, however, becomes harder: it requires the coordinated configuration of virtual network interfaces, switches and routers across multiple instances. Furthermore, each cloud infrastructure has its own networking model and stack, and may be subject to connectivity constraints such as Network Address Translation (NAT), in particular in private clouds. This leads to management complexity that is greatly compounded when tenants distribute their applications across clouds and expect elastic provisioning and management operations, such as VM migration, to work seamlessly. Throughout this dissertation, we use the term "nested VMs" (or "nVMs" in short) to refer broadly to any nested virtualization technique that exposes a layer-2 virtual network interface to each instance. In the evaluation, we focus on O/S containers as nVMs and on physical hosts as well. The layer 2 virtual/physical networking techniques in VIAS generally apply to any nested virtualized system that uses layer-2 Ethernet virtual interfaces.

2.1.3 Networking in Nested Containers

To the best of my knowledge, all VMMs used in nested virtualization support the TUN/TAP (or equivalent) virtual network interface device for nested VMs. In particular, in Linux Containers (LXC), one or more TUN/TAP devices are created for each container. By default, LXC also creates a Linux bridge on the host, and connects all virtual network interfaces of the nested VMs to this bridge. In this way, all the nested VMs attached to this bridge reside on the same layer-2 Ethernet segment. Because multiple nested instances share a single interface of the lower-layer hypervisor, it is necessary to assign and manage addresses, and to multiplex access to the single interface using address translation. To this end, LXC also
Figure 2-3. Layer-2 network virtualization for nested VMs across three cloud providers.

creates Network Address Translation (NAT) rules in the host machine, using Linux iptables. To support automatic address assignment, LXC runs a lightweight DHCP server on the host machine. Thus, upon instantiation, each nested VM gets assigned a random IP address within a predetermined subnet range of (private) IP addresses. While not all nested virtualization technologies automate the process of providing a layer 2 network environment behind a virtual NAT, this behavior can be programmed using existing Linux devices and toolsets: a Linux bridge, iptables and dnsmasq.

2.2 Network Virtualization

The purpose of VIAS is to provide a complete layer 2 abstraction to any host deployed across heterogeneous cloud services, without significant performance degradation. In particular, the goal is to allow distributed applications (possibly multi-tenant) within nVMs to seamlessly communicate as illustrated in Figure 2-3, as if they were connected to the same layer-2 segment, even though they are distributed across independently-managed providers. In general, there are two approaches to tackle this issue. The first relies on tenants deploying their own overlay network, which has the key advantage of not requiring any support from the underlying infrastructure. The other approach is to exploit SDN and/or Network Function Virtualization (NFV) services provided by the cloud provider, which is not inter-operable across providers.
2.2.1 Overlay Networks

In overlay networks and VPNs (Virtual Private Networks), the entire header and payload of a virtual network packet are encapsulated (and possibly encrypted and authenticated) by another layer to transfer the packet over public network links. Tunneling techniques such as L2TP, GRE or MPLS take advantage of encapsulation, which prepends an additional network header to the same or a different OSI layer of the packet [80]. Tunnels can be built as stand-alone point-to-point links, or organized to form a topology, such as a mesh or structured P2P, that can be used for scalable overlay routing [95, 46, 47]. Overlay and tunneling techniques benefit from the flexibility of using encapsulation at the endpoints, which does not require changes to the infrastructure, but suffer from performance degradation. The additional encapsulation header is a source of overhead, limiting the effective Maximum Transmission Unit (MTU) size. Furthermore, overlay processing adds the computation overhead of dealing with encapsulation, possibly at the user level, as typical overlay networks are implemented as user-level processes.

2.2.2 Software Defined Networking

Software Defined Networking initially emerged from the necessity of testing experimental protocols, to overcome the difficulties of deploying new protocols on legacy hardware switches and routers [57]. Subsequently, the need for efficiently supporting large multi-tenant enterprise data centers, such as public and private clouds, has motivated the adoption of SDN techniques as an approach complementary to NFV (Network Function Virtualization) for data center network virtualization. Virtualization in cloud computing impacts network performance because of its inherent sharing of processor resources. This can lead to negative impacts on network performance and stability, such as exceptionally long delays and degraded throughput [91]. SDN and NFV techniques can mitigate performance degradation by migrating network virtualization processing overheads to network devices, and possibly lead to substantial reductions in operating expense for cloud and network service providers [38]. For example, VMware [51] supports network virtualization through both logically and physically deployed
SDN nodes, providing a Network Virtualization Platform (NVP) within a multi-tenant data center. NVP leverages software switches on VMware hypervisors at each server endpoint. However, cloud providers often constrain the network communications available across instances; furthermore, SDN integration and layer-2 messaging outside a domain are not possible, hindering the ability for tenants to deploy their virtual networks across providers. Typically, tunneling and SDN approaches are used in different network virtualization contexts, such as across and within a data center. VIAS seeks to integrate these two approaches into a flexible, cross-cloud overlay network virtualization technique that mitigates performance degradation by selectively applying SDN to establish intra-cloud fast bypass flows.

2.2.3 NAT Traversal

As described above, one challenge with nested virtualization is the need to multiplex the host network across multiple nested VM instances. Consider, for instance, a tenant using a VM provisioned by a cloud provider (cVM1), and then instantiating nested instances (nVM1...nVMi) within cVM1. Each nVM is managed by the tenant, and hence has a virtual address that is private. While the nVMi instances can communicate within cVM1 through a virtual bridge or switch, in order to communicate across multiple instances (e.g. to nVMs hosted in other machines in the same provider, or on a different provider) it becomes necessary to map and translate addresses. It is also possible that the VMs provisioned by a service themselves have private IP addresses that are translated by a NAT device, which is common in private clouds. Hence, in the absence of the ability to provision public addresses to nested VM instances (which is currently not offered by cloud providers), network virtualization techniques must deal with multiplexing and translation of nested/host addresses. This can, in principle, be accomplished through careful crafting of network rules by the tenant. However, this becomes complex and error-prone as the network increases in size. VIAS leverages the ICE [92] and STUN [79] protocols integrated in IPOP [47] to support dynamic NAT traversal, allowing nodes to self-organize peer-to-peer links on demand. To
illustrate, assume two peers A and B, both behind distinct NATs, want to establish a direct P2P tunnel. Initially, A does not know its own outermost (public) transport address, nor does it know B's outermost address. Conversely, B does not know A's. At first, each peer queries a STUN server to discover this transport address. At each site, independently, the NAT binds one of the available transport port numbers, along with its IP address, to each peer's private address and port number; furthermore, the STUN server replies with the outermost transport address to each of the peers. At this point, both peers know their outermost public addresses; to establish a tunnel, those addresses need to be exchanged. For this exchange, VIAS leverages IPOP's use of the eXtensible Messaging and Presence Protocol (XMPP), using an XMPP server in the public network as an intermediary. After the successful exchange, each peer can send packets with the destination transport address set to the remote peer's outermost address. For instance, if A sends a packet to B's outermost NAT transport address, the message is received by B's NAT, which translates the outer transport address to an inner transport address, then delivers the packet within the private network to reach the final destination B.

2.3 VIAS Overview

This section overviews a novel system, VIAS (VIrtualization Acceleration using SDN), that delivers the flexibility of overlays for inter-cloud virtual private networking, while transparently applying SDN techniques (available in existing OpenFlow hardware or software switches) to selectively bypass overlay tunneling and achieve near-native performance within a provider. Figure 2-4 shows an architectural illustration of VIAS. VIAS, by and large, consists of two modules: IPOP and the SDN controller. IPOP handles overlay-network-related features, such as the creation and termination of P2P links, while the SDN controller controls the datapath of OpenFlow devices. In addition, while the SDN controller takes responsibility for programming OpenFlow switches to virtualize TCP/UDP streams, IPOP handles features that SDN is unable to support, such as encapsulation, encryption and authentication. Architecturally, VIAS is unique in how it integrates SDN and overlay controllers in a distributed fashion to coordinate the management of virtual network links. The approach is self-organizing, whereby overlay nodes can detect
Figure 2-4. Conceptual illustration of VIAS.

that peer endpoints are in the same network and program a bypass path between OpenFlow switches. While generally applicable, VIAS targets tenants who use nested VMs/containers across cloud providers, supporting seamless communication among their nested resources without requiring any support, coordination, or standard interfaces across providers. VIAS has been implemented as an extension to an existing virtual network overlay platform, integrating OpenFlow controller support with distributed overlay controllers. A prototype of VIAS has been deployed and tested in realistic cloud environments, using an implementation based on IPOP, the RYU SDN framework, OpenvSwitch, and LXC containers across various cloud environments including Amazon, Google compute engine and CloudLab [16]. Network Address Translation (NAT) is heavily used, not only to deal with the shortage of IPv4 addresses, but also in data center networks to provide isolation. NAT is also essential in
nested virtual network environments, which separate the guest VM/container network environment from the host network environment. NAT inherently blocks unsolicited traffic from outside, accepting only traffic that is intended to be received. NAT dynamically translates from a public IP and transport port number pair to a private one, and vice versa. NAT behavior has not been implemented in SDNs in previous work, but is important for cloud-provisioned inter-cloud overlays. This dissertation elaborates on the design and implementation of an SDN NAT. VIAS supports the general case (scalable, dynamic virtual private networking across providers possibly constrained by NATs) by employing overlay networking as the substrate for virtualization, and optimizes for a common case (communication among nodes within the same provider) by means of a novel performance enhancement technique of automatically detecting and programming fast bypass links using SDN APIs. Such an SDN-programmed VIAS bypass removes the necessity of packet encapsulation and delivers virtualization performance near wire link speed. VIAS detects traffic-intensive TCP/UDP flows inside the overlay-encapsulated data traffic, and automatically switches over to the SDN fabric whenever such a path can be programmed, such as when endpoints are within the same cloud provider. SDN switches on end nodes translate from outer/inner to inner/outer address space, which eliminates overlay headers, overlay processing, and user/kernel context switches.

2.4 VIAS Architecture

The key design requirements in VIAS are: 1) to expose a layer-2 abstraction to unmodified applications, 2) to operate as a user-level application that can be deployed in any existing cloud instance, thus not requiring development of kernel modules, 3) to support private, encrypted tunnels in inter-cloud communications, and 4) to avoid the overhead of encapsulation and kernel/user crossings for TCP/UDP flows within a cloud provider. The VIAS architecture addresses these requirements by fully supporting layer-2 tunneling through TLS/DTLS P2P overlay links as a baseline functionality implemented in user-level overlay software, and by automatically detecting and programming SDN switches to bypass TCP/UDP flows.
Figure 2-5. VIAS overall architecture.

There are important reasons why VIAS bypasses TCP/UDP flows via SDN switches, but carries other virtual network traffic (e.g. ARP, ICMP) in tunnels. First, TCP/UDP are the transports used by the majority of cloud applications and are the common case that needs to be made fast. Second, cloud providers typically allow instances to communicate using TCP/UDP, but block layer-2 protocols (such as ARP); since it is assumed that SDN switches are available, at the minimum, only at the endpoints (e.g. software switches such as OpenvSwitch in an instance), VIAS is not necessarily able to program the cloud provider's entire switch fabric. With existing OpenFlow standard APIs and existing functionality in cloud platforms, bypass flows can be programmed by coordinated mapping/translation of TCP/UDP flow endpoints at SDN switches connected to VIAS virtual network interfaces. To accomplish this, the VIAS architecture is structured as illustrated in Figure 2-5. It comprises three main modules. The VIAS main modules are user-level applications that are responsible for managing bypass flows (SDN cntr), setup and configuration of overlay tunnels (Overlay cntr), and encryption, encapsulation, tunneling and NAT traversal (Overlay datapath). The VIAS overlay datapath binds to an SDN switch port through a virtual network interface (tap), and the SDN controller module programs the switch. The SDN switches are commodity software/hardware
devices programmed using OpenFlow APIs; they are also referred to as vias-nat-switch in this dissertation. The detailed functionality of each module follows below.

2.4.1 SDN Controller

This module acts as a controller for the SDN switches. In essence, it allows OpenFlow switches to retain conventional features (such as MAC learning) but, in addition, implements the distinctive features of VIAS: the abstraction of a layer-2 virtual network, and isolation of the virtual network from the host network. The VIAS SDN controller programs gateway and NAT functionalities on the SDN switch, such that virtual network endpoints can have a different subnet and private address range from the host network's. Those features are programmed using standard OpenFlow APIs, without any modifications to the switch, allowing the use of commodity hardware and software implementations of SDN switches. The main requirement is that the VIAS module is allowed to issue OpenFlow API calls to the switches. For brevity, we will refer to the SDN switches with these capabilities as vias-nat-switch. The vias-nat-switch also implements the VIAS overlay bypass to achieve high throughput for intra-cloud TCP/UDP flows. These flow bypass rules are programmed (also using OpenFlow APIs) to have higher priority than that of other flow rules. We give more details on the implementation of this vias-nat-switch and overlay bypass in Subsections 2.5.1 and 2.5.2.

2.4.2 Overlay Datapath

Packets captured by the tap virtual network interface are handled by this module. This module runs as a user-space process that reads/writes from the tap device, and executes all low-level packet processing functions, such as prepending or translating network headers, as well as encryption and authentication. VIAS leverages the IPOP [47] overlay stack for this module. It reads the destination address to look up an overlay destination, prepends IPOP and UDP headers, then injects the packet to the WAN interface. While the creation and termination of P2P overlay links are managed by the overlay controller, the metadata associated with each P2P overlay link (such as peer UID, IPv4 or MAC addresses) are used by the overlay datapath module to make forwarding decisions.
Likewise, the necessary attributes of these headers (such as the tunneling identifier and mapped IP and port numbers) are dynamically assigned by the overlay controller and programmed into the overlay datapath module. After the UDP header prepending, packets are ready to traverse through tunnels across the public Internet. The design and implementation of this module is elaborated in Subsection 2.5.2.

2.4.3 Overlay Controller

The creation and termination of P2P links among overlay nodes, and the topology of the overlay network, are managed by this module. The VIAS overlay controller extends the IPOP overlay controller, which currently supports three types of topologies. One is an unstructured social network graph topology, where P2P links are created based on social links. A second topology is an all-to-all graph connecting all nodes, while the third topology is based on the Chord [85] structured P2P system. VIAS can use the all-to-all topology for small networks (tens of hosts), or the structured topology to scale to larger numbers of nodes. In the structured topology, each node identifier is based on the SHA-1 of its virtual IP address, and identifier-based structured routing is performed when there is no direct P2P link connecting nodes. Structured topology policies, such as the number of successors or chord links, are configurable in the controller.

2.4.4 Overlay Link and Bypass Flow Management

The general approach taken by VIAS for the management (creation, monitoring, and tear-down) of links is as follows. First, overlay links are created by the overlay controller. Links may be created to enforce a topology invariant (e.g. left/right neighbors and chords in structured P2P), or on demand based on traffic initiated by a peer. Second, for active links, their state is monitored (with respect to their traffic and transport addresses) by the overlay datapath module, and made available to the overlay controller. Building on overlay link monitoring mechanisms, the overlay controller defines policies for establishing a bypass TCP/UDP flow that take into account: 1) the traffic between nodes, and 2) whether the two endpoints are within the same provider network, in order to initiate a request to create a bypass flow. The mechanisms to initiate a bypass flow are handled by the SDN controller module.
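To make the policy concrete, the fragment below is a minimal, illustrative sketch of the kind of check the overlay controller could perform before requesting a bypass. The threshold value, the same_provider() heuristic, and the request_bypass() call are assumptions introduced purely for illustration; they are not part of the IPOP/VIAS code base.

    import ipaddress

    # Assumed threshold for labeling an overlay link "traffic-intensive" (illustrative).
    BYPASS_THRESHOLD_BPS = 50_000_000

    def same_provider(a, b, prefix=16):
        # Illustrative heuristic: endpoints whose public addresses share a prefix
        # are treated as being inside the same provider network.
        net = ipaddress.ip_network("{}/{}".format(a[0], prefix), strict=False)
        return ipaddress.ip_address(b[0]) in net

    def consider_bypass(bytes_per_sec, local_endpoint, peer_endpoint, sdn_controller):
        # local_endpoint / peer_endpoint: (public_ip, port) pairs learned via
        # STUN/ICE and exchanged over the XMPP-assisted signaling channel.
        if bytes_per_sec < BYPASS_THRESHOLD_BPS:
            return False        # 1) only heavily used links justify flow-rule setup
        if not same_provider(local_endpoint, peer_endpoint):
            return False        # 2) bypass requires both endpoints in the same network
        sdn_controller.request_bypass(local_endpoint, peer_endpoint)
        return True             # 3) the SDN controller module installs the rules

In this sketch the decision is purely threshold-based; as noted in the text, the corresponding tear-down policies are left for future work.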
There are two instances where a bypass flow might be terminated: one is after VM migration, and another is if resources (available ports) are exhausted at the SDN switch. The policies to deal with bypass termination are left for future work, but mechanisms available to support these policies are monitoring of traffic per flow (which can be used to prioritize high-throughput traffic) and migration events (e.g. unsolicited ARP requests) that notify controllers to terminate a flow. In such events, the SDN controller can then clear out bypass rules, hence reverting all virtual network traffic to pass through overlay tunnels.

2.5 VIAS Implementation

In this section, details of a VIAS prototype implementation are provided, highlighting how each module is implemented and integrated. Moreover, we explain how the transition between overlay and SDN virtualization takes place in VIAS. Subsection 2.5.1 elaborates on how the dynamic NAT feature offered by VIAS is programmed into the OpenFlow SDN switch. Subsection 2.5.2 explains tunneling and overlay virtualization along with the various tunneling modes. Finally, Subsection 2.5.3 describes how VIAS detects flows and implements rules to bypass overlay virtualization by using SDN flow rules.

2.5.1 SDN Controller

As explained in Section 2.1.3, each nested VM should be presented with the full abstraction of a private layer 2 network, while still being able to access the public Internet as if behind a NAT. To support this requirement, VIAS programs address translation rules using the OpenFlow controller, which makes OpenvSwitch (or any other OpenFlow-enabled device) work as a full-cone NAT router. Through a configuration file, VIAS specifies a single switch port as its WAN interface (either by the physical port number of a switch or by interface name); the remaining switch ports are set as LAN ports. The controller also implements a gateway. Both the subnet range and the gateway IP address are statically assigned through the VIAS configuration, and VIAS separates the address space of nested VMs from the cloud provider's address space. When packets are sourced from the LAN address range and destined beyond the gateway (i.e. the
Figure 2-6. NAT implementation in VIAS using the OpenFlow protocol.

destination address is out of the range of the LAN subnet), the controller programs a flow rule in the OpenFlow switch to perform full-cone NAT mappings.
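As a concrete illustration, the following is a minimal sketch of how such a pair of NAT flow rules could be installed from a RYU application. It uses the OpenFlow 1.3 parser for readability, whereas the VIAS prototype targets OpenFlow 1.0 (whose equivalent set-field actions are named OFPAT_SET_NW_SRC, OFPAT_SET_TP_SRC, and so on). The addresses, port numbers, and the install_nat_pair() helper are illustrative and mirror the example discussed next; they are not the actual VIAS controller code.

    from ryu.base import app_manager
    from ryu.ofproto import ofproto_v1_3

    class NatSketch(app_manager.RyuApp):
        OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

        def install_nat_pair(self, dp, lan_port, wan_port):
            ofp, parser = dp.ofproto, dp.ofproto_parser
            # Outgoing entry: rewrite the nested VM's source transport address
            # (192.168.1.2, TCP port 40000) to the WAN-side address (10.0.0.1, 50000).
            out_match = parser.OFPMatch(in_port=lan_port, eth_type=0x0800,
                                        ip_proto=6, ipv4_src='192.168.1.2',
                                        tcp_src=40000)
            out_actions = [parser.OFPActionSetField(ipv4_src='10.0.0.1'),
                           parser.OFPActionSetField(tcp_src=50000),
                           parser.OFPActionOutput(wan_port)]
            # Incoming entry: reverse translation for the return stream.
            in_match = parser.OFPMatch(in_port=wan_port, eth_type=0x0800,
                                       ip_proto=6, ipv4_dst='10.0.0.1',
                                       tcp_dst=50000)
            in_actions = [parser.OFPActionSetField(ipv4_dst='192.168.1.2'),
                          parser.OFPActionSetField(tcp_dst=40000),
                          parser.OFPActionOutput(lan_port)]
            for match, actions in ((out_match, out_actions), (in_match, in_actions)):
                inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
                dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=100,
                                              match=match, instructions=inst))

A packet-in handler would invoke such a helper for the first packet of each new outbound stream, choosing an unused WAN-side port number, as described in the example that follows.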
For example, as in Figure 2-6, the VIAS SDN switch is programmed using OpenFlow to support NAT translation, in addition to "upcalls" to the overlay datapath and overlay bypass rules. This allows nVMs to access public Internet nodes through a layer-3 NAT gateway, in addition to exposing a virtual layer-2 network across all nodes connected to the overlay. In the example, an IP packet sent by nVM 192.168.1.2 to public server 4.2.2.2 triggers the programming of NAT rules by the VIAS SDN controller. In detail, assume a nested nVM with IP address 192.168.1.2 tries to access a public server 4.2.2.2 through TCP with source port number X. Initially, the destination address of the very first packet does not match any existing flow rules in the switch. Note that this controller supports MAC address learning, such that it can learn and map each nVM MAC address to its respective port. Then, the metadata of this first packet is forwarded to the SDN controller through the OpenFlow API ofp_packet_in. The SDN controller then checks the destination address; it can determine that the destination 4.2.2.2 is not in its LAN segment, but instead across the gateway. The SDN controller then randomly chooses a port number Y from an available port number pool (which is also configurable). Finally, it makes two flow entries: one is an outgoing flow entry for streams from the nested nVM to the public Internet, and the other is an incoming flow entry for the reverse stream. This can be done with a series of ofp_match and ofp_action entries. Continuing the example, the outgoing flow entry translates the address from 192.168.1.2:X to 10.0.0.1:Y, replacing the source address of the nested nVM with that of the WAN interface. The incoming flow entry translates the destination address from 10.0.0.1:Y to 192.168.1.2:X, and injects the packets to the switch port of the destination nested VM. The same technique can be applied to UDP streams. Since the ICMP echo identifier field is absent from the match fields of the OpenFlow specification, one limitation of this SDN controller is that it cannot apply this NATting behavior to ICMP echo request/reply (ping) messages, a protocol widely used for testing reachability of layer-3 network devices. OpenFlow increases the number of ofp_match field types from 10 in version 1.0 to 45 in version
1.5. However, the ICMP echo identifier field, which is widely used for NATting in conventional routers, is not incorporated in the OpenFlow specification. Therefore, it is impossible to apply the above technique of ofp_match and ofp_action operations to ICMP messages. If future OpenFlow specifications include this field, we can apply the same approach by using the ICMP echo identifier instead of the transport port number. To this end, ICMP messages are forwarded to the controller, and NATting is performed in the SDN controller. While this approach increases the latency of ICMP messages, the functional behavior of the protocol is unaltered. Since ICMP is not used for traffic-intensive communications, this performance degradation is acceptable. In the VIAS prototype, the SDN controller has been built upon the open-source RYU [6] SDN framework. The code (except the framework) is approximately 800 lines of Python. For full backward compatibility, this implementation is based on OpenFlow specification version 1.0.

2.5.2 Overlay Datapath and Overlay Controller

VIAS builds upon the IPOP code base as a basis for the overlay datapath. In its current architecture, IPOP comprises a packet processing/overlay datapath binary and an overlay controller, which communicate through the TinCan API [47]. In essence, VIAS extends the IPOP overlay controller to support SDN bypass processing; these two modules are embodied in VIAS as explained in Section 2.4. In IPOP, nodes create direct P2P overlay links to tunnel virtual network traffic using the ICE protocol [92]. In order to bootstrap these direct P2P overlay links, an XMPP server is employed to assist in exchanging messages containing candidate endpoint information of peers, including the outer- and inner-most transport pairs, if nodes are behind multiple NATs. During this process, IPOP's overlay datapath module opens UDP ports ("hole-punching") on NATs to create P2P tunnels. Subsequently, peers communicate using these assigned UDP transports. To capture packets within the overlay address range from the O/S network kernel, a virtual network interface device (tap) is used. After the packet is captured by the tap device, it is forwarded to the IPOP overlay datapath module. This module prepends an IPOP header
Figure 2-7. Packet encapsulation in IPOP.

to the packet; it is then encapsulated again by a UDP header, and then sent to the destination peer's UDP port, which is discovered through the ICE protocol and punched in the NAT by the remote peer. Each IPOP overlay node is assigned a 20-byte unique identifier (UID) that is used in overlay routing. The IPOP overlay header consists of two fields, which are the source and destination UIDs of the overlay sender and receiver. The outer UDP and IP headers are placed before the IPOP headers, resulting in the overall packet structure shown in Figure 2-7. IPOP maps various network address identifiers (MAC, IPv4 and IPv6 addresses) to P2P links, and currently supports both layer 2 and layer 3 virtual networks. VIAS uses the switch (layer 2) mode of operation. IPOP keeps four separate hash tables: p2plink_table, ipv4_table, ipv6_table, and mac_table. The key of p2plink_table corresponds to the UID of the remote IPOP peer, while the value corresponds to the link object. The ipv4_table, ipv6_table and mac_table map IPv4, IPv6 and MAC addresses, respectively, to IPOP overlay node UIDs, which are the keys of p2plink_table. In VIAS, each IPOP overlay link is considered as an OSI layer 2 tunnel. Each IPOP link is mapped such that multiple MAC addresses are bound to a link, which is akin to a layer 2 switch's MAC addresses bound to a port.
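For clarity, the fragment below sketches the packet layout of Figure 2-7 in a few lines of Python. It assumes, as described in Section 2.4.3, that a node's 20-byte UID is the SHA-1 digest of its virtual IP address; since the real IPOP datapath is a native binary, this is only an illustration of the header layout, not the implementation.

    import hashlib
    import socket

    def uid_from_virtual_ip(virtual_ip):
        # Section 2.4.3: a node identifier is the SHA-1 of its virtual IP (20 bytes).
        return hashlib.sha1(virtual_ip.encode()).digest()

    def encapsulate(src_vip, dst_vip, eth_frame):
        # IPOP header (source UID | destination UID) prepended to the captured frame.
        return uid_from_virtual_ip(src_vip) + uid_from_virtual_ip(dst_vip) + eth_frame

    def send_over_tunnel(sock, peer_addr, src_vip, dst_vip, eth_frame):
        # The outer UDP and IP headers of Figure 2-7 are added by the kernel when
        # the datagram is sent to the peer's hole-punched transport (ip, port).
        sock.sendto(encapsulate(src_vip, dst_vip, eth_frame), peer_addr)

    # Example usage (addresses are placeholders):
    #   sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    #   send_over_tunnel(sock, ("203.0.113.5", 49152), "10.0.3.1", "10.0.3.2", frame)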
Figure 2-8. Packet datapath alternatives in VIAS.

The layer-2 overlay implements learning of MAC addresses by handling ARP request/reply messages in its LAN. IPOP checks the destination MAC address of the frame and injects the frame to the corresponding link. To bind the virtual network to VMs, IPOP uses virtual devices, including the Linux bridge and OpenvSwitch [70]. The tap interface is attached to this binding device and works as a bridge to the remote LAN virtual network. For example, consider the usage scenario in Figure 2-8. The illustration shows two hosts running VIAS-enhanced IPOP. Each host contains multiple nested VMs (containers) with virtual network interfaces (veth#) attached to the OpenvSwitch device. These guests are in the same layer 2 network, and are NATted by the host network stack to the physical interface (peth0). Initially, packets flow through the encapsulation tunneling datapath (solid line), through the overlay datapath module. Once a bypass flow is installed by the SDN controllers on both endpoints, the faster SDN virtualization datapath (dashed line) is used. When guest0 (veth0) attempts to send IP packets to guest1 (veth2), it first broadcasts an ARP request message. The VIAS-IPOP overlay datapath module picks up this message through the tap device, and handles it as follows. First, the ARP message is encapsulated (Figure 2-7) and forwarded
to all overlay links. At this stage, different overlay multicast approaches can be implemented by the overlay controller, depending on the overlay's topology. All the overlay nodes receiving the overlay-broadcast packet (e.g. the right-hand side of Figure 2-8) decapsulate the message and broadcast it to their L2 network (using the tap device). If there is no destination matching the ARP request message, the message is simply dropped. If the destination is in the network (e.g. veth2 in Figure 2-8), an ARP reply message is created by the guest, and the reply is sent back (as a unicast message) to the sender (e.g. guest0). As part of this process, the overlay creates entries in mac_table binding the MAC address it learned to the corresponding overlay link. All unicast MAC address frames captured by the overlay look up these mappings to determine along which overlay link to forward. VIAS can dynamically accommodate overlay topology changes by updating a MAC address and its overlay link binding upon detecting an ARP frame. If we consider a usage scenario of VM migration from one host in a provider to another host in a different provider, the process can be handled automatically; no network administrator involvement is needed, since the ARP message from the migrated node itself incurs updates to the MAC-overlay link mapping on the deployed overlay network.

2.5.3 Software-defined Overlay Bypass

As described in the previous section, the overlay virtualization process introduces several sources of overhead. First, there is the transition overhead of context switches between the network kernel and user space. It also requires multiple copy operations when it sends packets from the network kernel to user space. Moreover, since there is the need for additional prepending of overlay headers, the MTU size is smaller than that of the physical network. This overhead is the price paid to create overlay virtual networks linking nVMs across multiple providers, because tunneling, NAT traversal and encryption are required for virtual private networking. However, this overhead can be mitigated for nVMs within the same provider. To accomplish this, we bypass this encapsulation process for traffic-intensive TCP/UDP flows, as follows.
Figure 2-9. Virtualization comparison of datapaths.

VIAS detects a traffic-intensive stream by monitoring traffic on overlay links, and can determine whether a bypass path is possible by inspecting whether the overlay endpoints of a link are within the same network. When VIAS determines that a bypass link should be created, it first allocates and assigns an available TCP (or UDP) port number pair for the outer address space. Then it establishes a mapping of the inner address range to the outer address range with its port number. In OpenFlow, this can be programmed with a single flow-add API call (ofp_flow_mod) along with a set of ofp_match and ofp_action entries. On a packet matched by ofp_match, the VIAS controller performs the actions OFPAT_SET_DL_SRC and OFPAT_SET_DL_DST to replace the MAC addresses of the nested VM and OpenvSwitch with those of the physical Ethernet interface (peth) and the gateway of the outer address space. Then it performs the actions OFPAT_SET_NW_SRC and OFPAT_SET_NW_DST to replace the inner IP addresses with outer address space addresses. Finally, the actions OFPAT_SET_TP_SRC and OFPAT_SET_TP_DST replace the inner transport port numbers with the outer transport port numbers. Incoming traffic takes the same steps, but translates from the public address space to the private address space.
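To make the rewriting concrete, the sketch below shows how the local host's outbound bypass rule from the example that follows (Table 2-1) might be installed. As before, it uses the RYU OpenFlow 1.3 parser for brevity, while the prototype issues the equivalent OpenFlow 1.0 OFPAT_SET_* actions; the MAC addresses, switch ports, and priority value are illustrative placeholders.

    def install_outbound_bypass(dp, in_port, wan_port, peth_mac, gw_mac):
        ofp, parser = dp.ofproto, dp.ofproto_parser
        match = parser.OFPMatch(in_port=in_port, eth_type=0x0800, ip_proto=6,
                                ipv4_src='10.0.3.1', ipv4_dst='10.0.3.2',
                                tcp_src=40001, tcp_dst=40002)
        actions = [
            parser.OFPActionSetField(eth_src=peth_mac),      # host's physical MAC
            parser.OFPActionSetField(eth_dst=gw_mac),        # outer-network gateway
            parser.OFPActionSetField(ipv4_src='128.0.0.1'),  # inner -> outer IP addresses
            parser.OFPActionSetField(ipv4_dst='128.0.0.2'),
            parser.OFPActionSetField(tcp_src=50001),         # inner -> outer transport ports
            parser.OFPActionSetField(tcp_dst=50002),
            parser.OFPActionOutput(wan_port),
        ]
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        # Higher priority than the default rules, so bypassed packets never fall
        # through to the overlay datapath.
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=200, match=match,
                                      instructions=inst))

The inbound rule on the same host, and both rules on the remote host, are symmetric (see Table 2-1); as discussed further below, the local outbound rule is installed last, only after the peer acknowledges over the overlay link, so that no packet reaches a switch that has no matching rule yet.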
To illustrate this behavior, consider the example of a TCP stream depicted in Figure 2-9 alongside the flow rule example of Table 2-1. The example uses the private address space 10.0.3.0/24 as the inner address space for nested VMs, and public network addresses in 128.0.0.0/24 for the outer address space (note that it is also possible to use private addresses for the outer address space, since VIAS supports NAT traversal).

Table 2-1. Stream bypass rule example

Local host, outbound:
  Match:  nw_src=10.0.3.1, nw_dst=10.0.3.2, tp_src=40001, tp_dst=40002
  Action: set_nw_src=128.0.0.1, set_nw_dst=128.0.0.2, set_tp_src=50001, set_tp_dst=50002
Local host, inbound:
  Match:  nw_src=128.0.0.2, nw_dst=128.0.0.1, tp_src=50002, tp_dst=50001
  Action: set_nw_src=10.0.3.2, set_nw_dst=10.0.3.1, set_tp_src=40002, set_tp_dst=40001
Remote host, outbound:
  Match:  nw_src=10.0.3.2, nw_dst=10.0.3.1, tp_src=40002, tp_dst=40001
  Action: set_nw_src=128.0.0.2, set_nw_dst=128.0.0.1, set_tp_src=50002, set_tp_dst=50001
Remote host, inbound:
  Match:  nw_src=128.0.0.1, nw_dst=128.0.0.2, tp_src=50001, tp_dst=50002
  Action: set_nw_src=10.0.3.1, set_nw_dst=10.0.3.2, set_tp_src=40001, set_tp_dst=40002

Initially, packets stream through the encapsulation overlay datapath, with VIAS prepending headers to every packet. At a certain traffic threshold, VIAS triggers the SDN bypass. First it extracts the metadata of the stream, including the source/destination IP addresses (10.0.3.X) and port numbers (4000X). Then VIAS allocates and assigns available port numbers (5000Y) to use as the outer transport addresses. Next, VIAS programs inbound OpenFlow rules on the OpenvSwitch SDN switch in host VM A. These inbound rules translate the outer transport (IP address and port number) to the inner transport address. When VIAS programs an OpenFlow switch, it ensures the stream bypass rules have higher priority than the other flow rules, so that the packet is not matched against other flow rules. Soon after, VIAS sends an inter-controller RPC message through its P2P overlay link to the peer overlay node, passing along JSON-formatted metadata of the stream, as in Listing 2.1. This metadata sent through the inter-controller RPC API contains information such as the outer transport address, inner transport address and transport type. Upon receiving the message, the VIAS controller at host B programs inbound and outbound rules to its OpenvSwitch SDN switch. The VIAS controller at host B then sends
Listing 2.1. Example of stream metadata exchanged by VIAS controllers to coordinate bypass flow rules.

    {
        "protocol":"TCP",
        "local_host_ipv4":"128.0.0.3",
        "remote_host_ipv4":"128.0.0.13",
        "src_random_port":50001,
        "dst_random_port":50002,
        "src_ipv4":"192.168.4.3",
        "dst_ipv4":"192.168.4.13",
        "src_port":40001,
        "dst_port":40002
    }

an acknowledgement RPC to host A through the overlay link, which makes sure that the outbound rule in the local host is programmed only after all the other rules are programmed. This ordering of events needs to be enforced to avoid packet loss during the setup of bypass rules. If the local outbound rule were set before the other flow rules, the stream would have packets silently discarded by the SDN switch because of the absence of matching rules. Finally, right after the final outbound rules are programmed in the OpenFlow switches, the stream bypasses encapsulation and transfers packets through the SDN switches on both endpoints. This approach can be seen as akin to NATs, as it provides the ability to map and translate addresses. However, unlike the conventional use of NAT, where each individual NAT is independently controlled, this scheme orchestrates the programming of mappings across controllers on both peer endpoints. VIAS essentially uses the overlay link as a control channel for coordination among the two peer SDN controllers to establish NAT mappings simultaneously across endpoints, allowing both nodes behind NATs to have a direct SDN flow that bypasses the overlay.

2.6 VIAS Evaluation

In this section, we evaluate VIAS from three different perspectives. Firstly, in addition to providing overlay networking and bypass paths, VIAS acts as an OpenFlow-programmed SDN bridge and NAT for nested VMs. Typically, nested VMs create virtual network interfaces bound
to a host Linux bridge, and NAT behavior is implemented using iptables. To evaluate the VIAS SDN bridge/NAT, we compare the performance of OpenvSwitch-based VIAS to the native Linux bridge/NAT implementation. The second evaluation considers the throughput delivered by VIAS for TCP streams between nested VMs within and across cloud providers. Thirdly, we use an application-layer benchmark (Redis) to evaluate end-to-end VIAS performance. For all experiments, Ubuntu Linux 14.04 LTS hosts are provisioned from clouds. Software SDN switches (OpenvSwitch version 2.0.2) and LXC containers inside the hosts are installed and configured to create nested instances. VIAS is implemented as extensions to IPOP 15.01, using RYU as a framework for OpenFlow handling.

Table 2-2. ICMP and TCP performance comparison between Linux NAT and the NAT-featured OpenvSwitch of VIAS

Test case                         ICMP                TCP
Host VM, S1    Native             0.487 ms [0.104]    5.24 Gbps [0.421]
               OpenvSwitch        7.76 ms [0.790]     4.87 Gbps [0.577]
               Percent change     1493%               -7.06%
Host VM, S2    Native             41.7 ms [0.269]     547 Mbps [10.4]
               OpenvSwitch        49.2 ms [1.35]      527 Mbps [48.9]
               Percent change     17.9%               -3.80%
Guest VM, S1   Native             0.569 ms [0.0611]   4.09 Gbps [0.406]
               OpenvSwitch        7.76 ms [0.686]     3.86 Gbps [0.518]
               Percent change     1264%               -5.62%
Guest VM, S2   Native             41.7 ms [0.514]     411 Mbps [85.6]
               OpenvSwitch        49.3 ms [0.978]     398 Mbps [83.1]
               Percent change     18.2%               -3.16%
Standard deviation is shown in square brackets.

2.6.1 VIAS OpenvSwitch bridge/NAT

As pointed out in Section 2.5.1, OpenFlow is not capable of programming flow rules to handle NATting of ICMP echo request/reply. Consequently, all ICMP echo request/reply packets are forwarded by the SDN switch to the VIAS controller: the controller itself handles NAT for ICMP packets by making an entry in a local table for every outgoing ICMP echo request message, using the ICMP echo identifier field as a key to this table. Even though the VIAS controller runs in the same host as OpenvSwitch, ICMP handling incurs overheads.
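The controller-side bookkeeping just described can be pictured with the following minimal sketch; the table layout and function names are illustrative rather than the actual VIAS controller code.

    # Controller-side ICMP NAT bookkeeping (illustrative): outgoing echo requests
    # are recorded by their ICMP echo identifier so that replies arriving on the
    # WAN side can be mapped back to the originating nested VM.
    icmp_nat_table = {}   # echo identifier -> (nested VM IP, ingress switch port)

    def on_echo_request(echo_id, vm_ip, in_port):
        icmp_nat_table[echo_id] = (vm_ip, in_port)
        # ...the controller then rewrites the source IP to the WAN address and
        # re-injects the packet toward the gateway...

    def on_echo_reply(echo_id):
        # Look up the originating VM; unsolicited replies map to None and are dropped.
        return icmp_nat_table.pop(echo_id, None)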
The OpenvSwitch NAT overhead is compared with the native Linux bridge/NAT implemented by iptables in Table 2-2. In the experiment, host VMs are deployed in the CloudLab [16] IG-DDC cluster in Utah and are provisioned with 862 MB of memory. 50 ping tests (ICMP echo) and 10 iperf throughput (TCP) tests are conducted, and the arithmetic mean (and standard deviation) is reported, considering a client in CloudLab and two different servers. One server (S1) resides in the same CloudLab cluster. The other server (S2) is provisioned as an "m1.small" instance with 2 GB of memory in the Chameleon cloud [13] in Texas. As the results show, a latency overhead of about 7 ms is incurred in ICMP NAT handling, irrespective of the network distance to the server. This is because the overhead mostly comes from the inter-process communication between OpenvSwitch and the VIAS controller, and the kernel/user context switch. This overhead is acceptable, as the ICMP echo message is typically used for network reachability checks and not for latency-sensitive applications. The TCP iperf test results show that the bridge/NAT throughput degrades by about 3-7% compared to native Linux iptables. Since the OpenvSwitch developers report that its TCP performance is equivalent to that of the Linux bridge [70], the slight performance degradation observed is attributed to the NAT rules programmed by VIAS. The NAT rules in the native Linux case are set by iptables and executed in the Linux network kernel, while the VIAS NAT is implemented by flow rules inside OpenvSwitch.

2.6.2 Microbenchmarks

This experiment evaluates the performance of VIAS for virtual network communication among nested VM instances. To this end, VIAS is deployed on multiple cloud service platforms, including commercial and academic clouds. This allows the evaluation of functionality and performance for intra- and inter-cloud deployments across various geographical locations. It is demonstrated that nested containers separated by multiple NATs across multiple clouds are successfully connected by a virtual layer-2 network, and the performance of the ARP, TCP, and ICMP protocols is evaluated.
The following test cases are considered. The first case (CC) uses two Xen VMs deployed on CloudLab [16]. Each VM is provisioned with 862 MB of memory on CloudLab IG-DDC. The second case (AA) uses two Amazon EC2 instances of type t2.medium in the same zone (Oregon). Although Amazon does not provide specifications of the network throughput of t2.medium (only mentioning that it is low to moderate), based on the link test in Figure 2-4, the performance levels are commensurate with a 1 Gbps Ethernet link. The third case (GG) uses two Google compute engine instances of type n1-standard-1 in the US central zone. The host physical machine was provisioned with an Intel(R) Xeon(R) CPU @ 2.30GHz and 3.7 GB of memory. In the fourth case (AA_dz), experiments are also conducted with t2.medium Amazon EC2 instances, but with VMs distributed across two availability zones (N. Virginia and Oregon) for comparison. Finally, the fifth case (AG_dz) considers two instances deployed across two different cloud service providers. One instance is on Amazon EC2 (t2.medium) at Oregon and the other instance is on Google compute engine (n1-standard-1) at US central. The physical column is the latency between host VMs. The overlay column shows the latency between nested VMs; the traffic streams through the overlay datapath from the Linux bridge to the tap device to the P2P tunnel. The VIAS column shows the latency of the overlay datapath with OpenvSwitch and the tap device. Note that, for the VIAS column, ARP and ICMP are forwarded to the SDN controller, incurring additional latency.

2.6.2.1 ARP

The ARP latency is measured using iputils-arping and is shown in Table 2-3. The test is repeated 50 times, and the arithmetic mean and standard deviation are reported. The results for the AA, CC and GG cases show that the overhead of ARP handling in the overlay is less than 1.5 ms, while in VIAS the latency overheads are in the range of 4-24 ms, due to inter-process communication and SDN controller processing. Surprisingly, the results show an exceptionally long latency of ARP across host Xen VMs in one particular environment, CC: the overlay and VIAS latencies are smaller than the physical network's. While we were not able to definitively determine the reason for this behavior, one observation is that the VIAS and the overlay
paths use UDP as the protocol, whereas arping on the host VMs uses the ARP protocol. It is possible that the CloudLab platform handles ARP and UDP in different ways; since ARP traffic is relatively infrequent, the effects of this delay on overall network performance are typically not significant.

Table 2-3. ARP and ICMP latency comparison among the conventional Linux implementation, overlay datapath and VIAS

            Test case   Physical          Overlay           VIAS
ARP [ms]    CC          432 [243]         1.47 [0.130]      24.4 [0.102]
            AA          0.693 [0.0357]    1.42 [0.0805]     4.00 [0.120]
            GG          N/A               0.4207 [0.126]    4.63 [0.372]
            AA_dz       N/A               84.5 [0.243]      N/A
            AG_dz       N/A               49.7 [0.116]      N/A
ICMP [ms]   CC          0.954 [0.121]     1.17 [0.0825]     15.4 [1.20]
            AA          0.559 [0.132]     0.970 [0.149]     5.33 [0.197]
            GG          0.421 [0.126]     0.697 [0.0843]    6.22 [0.409]
            AA_dz       84.6 [0.400]      92.5 [0.169]      N/A
            AG_dz       50.7 [0.101]      50.2 [0.379]      N/A
CC: CloudLab, AA: Amazon, GG: Google compute engine, AG: Amazon-Google, dz: different zone. Standard deviation is shown in square brackets.

The ARP measurements in the AA case show that the physical latency is the lowest. Naturally, geographically separated nested VMs (AA_dz and AG_dz) exhibit longer ARP latencies due to network distance; this is observed in the results summarized in the overlay column. Note that because the AA_dz and AG_dz instances are not in the same LAN segment, ARP traffic in the physical network is not supported, hence the physical column shows N/A. Furthermore, we were not able to evaluate ARP in the GG case, because Google compute engine does not provide a layer 2 abstraction among instances deployed in the same zone, and thus the ARP protocol does not work. Nonetheless, the results demonstrate that, regardless of providers blocking L2 traffic within or across clouds, VIAS can present an L2 virtual network to nested VMs.
Table 2-4. TCP performance comparison among the physical, overlay datapath and SDN bypass virtualization scenarios.

Test case              Average       Standard deviation   Percent change
CC       Physical      932 Mbps      9.66                 0.0%
         Overlay       92.8 Mbps     6.97                 -90.0%
         VIAS          901 Mbps      7.42                 -3.33%
AA       Physical      933 Mbps      53.4                 0.0%
         Overlay       293.4 Mbps    8.47                 -68.6%
         VIAS          876 Mbps      49.6                 -6.11%
GG       Physical      1.99 Gbps     0.00471              0.0%
         Overlay       223 Mbps      14.7                 -88.8%
         VIAS          1.99 Gbps     0.00316              -0.00%
AA_dz    Physical      140 Mbps      24.3                 0.0%
         Overlay       41.18 Mbps    17.3                 -70.6%
         VIAS          71.17 Mbps    43.9                 -38.0%
AG_dz    Physical      118 Mbps      27.1                 0.0%
         Overlay       44.9 Mbps     11.9                 -61.9%
         VIAS          162.2 Mbps    27.9                 +37.5%

2.6.2.2 ICMP

ICMP echo latency is measured using the Linux ping command. The test is repeated 50 times, and the arithmetic mean and standard deviation of the latency are reported. The trend is similar to the ARP latency. The latency overhead is of the order of a few milliseconds across all test cases. Since the overhead only concerns the local kernel/user boundary crossing and the local socket interface, the nested VM latency is not a function of the physical network's latency, but rather a constant overhead at the endpoints.

2.6.2.3 TCP

Table 2-4 and Figure 2-10 summarize the TCP throughputs with different configurations between nested VMs across different cloud services. The iperf tool is used to test the maximum TCP throughput; tests were repeated 10 times, and the arithmetic mean and standard deviation are shown. For the physical row, iperf was executed on the host VM, while for the overlay and VIAS rows, iperf was executed on the nested VMs. The results show that encapsulation in user space has its peak throughput at around a few hundred Mbps, regardless of link-layer bandwidth. This overlay performance is a function of
Figure 2-10. TCP performance comparison among the physical, overlay datapath and VIAS virtualization scenarios.

the host processor's performance, and of the overheads associated with packet handling and copying and the O/S kernel-user context switch. In contrast, SDN virtualization achieves over 94% of the throughput of the physical network when endpoints are in the same data center (cases CC, AA and GG). Note that this experiment considers a software OpenvSwitch that runs on the host VM, and does not use any assist from SDN hardware, though in principle VIAS can also program hardware-layer SDN switches, if available. Nonetheless, eliminating the O/S context switch and overlay packet handling through the VIAS SDN overlay bypass results in substantial performance improvements for network virtualization. As described in Section 2.5.3, VIAS detects traffic-intensive TCP streams at runtime, and then automatically inserts bypass rules in the SDN fabric. Prior to completion of the flow rules, TCP streams traverse the overlay datapath. Thus, there is a latency involved in the coordinated programming of flow rules in the SDN switches of both peer endpoints. This latency is a function of the round trip time between host VMs: the SDN bypass setup latencies are
Figure 2-11. Redis simulation.

measured to be 13.5 ms, 12.3 ms and 2.63 ms, respectively, in the CC, AA and GG cases. This latency is measured as the difference between the first packet and the last packet in the encapsulation path of each TCP stream.

2.6.3 Application Test

In this section, virtualization performance at the application layer is evaluated, using a round-trip-latency-sensitive application: Redis, a widely used NoSQL, memory-based, key-value data structure storage system. In the experiment, two host VMs are deployed in the CloudLab IG-DDC cluster, provisioned with 862 MB of memory. The test is done with version 3.0.6 of Redis. The key length is 10 Bytes and the value length is set to 50 Bytes, a common usage pattern of Redis [21]. For each run, clients make 1 million queries (50% sets and 50% gets). Every set packet is 93 Bytes, and the get request and return packets are 36 and 57 Bytes, respectively. Figure 2-11 shows the results of the physical, overlay datapath, and VIAS SDN bypass cases. One important observation needs to be made: the "Physical" case is one where both the Redis server and clients run on the host VM, while in the overlay datapath and SDN bypass
cases, those run on nested containers. Each thread sets up a single TCP stream with the Redis server. As the thread count increases, the throughput also increases, saturating at around 20 threads for LXC containers. The throughput of the VIAS SDN bypass is on par with the physical case, increasing proportionally with the thread count as the physical case does. On the contrary, the overlay datapath throughput saturates around 8K OP/S. After 20 threads, performance starts to degrade; this is due not only to the overhead of SDN processing in the switch, but also to resource limitations within the containers. The experiment shows that VIAS bypass performs significantly better than overlay encapsulation, and that it is capable of bypassing multiple TCP streams simultaneously.

2.7 Related Work

The idea of building user-level virtual networks for Grid/cloud computing dates back to systems including Violin [46], VNET [86], and IPOP [47]. Violin proposed overlay virtual networks providing the abstraction of separate layer-2 networks for tenants. Violin was implemented without requiring any modifications to the VMM or the hosting network infrastructure, not only providing a flexible, user-configurable network environment, but also reducing the threat of security risks from the host. VNET addressed a similar problem, focusing on the ability to inter-connect virtual machines across providers. This was accomplished by running VNET "proxies" on endpoints (e.g. VMs) at different sites, and tunneling L2 traffic over TCP/TLS links. A limitation of these systems is performance: nowadays, cloud network infrastructure performance is 1 Gbps or higher, making it challenging to deliver high-performance network virtualization at the application layer on commodity CPUs. VNET/P [95] addresses the performance gap by implementing a kernel module for layer 2 virtualization of guest VMs inside the VMM, increasing the performance of virtual networks substantially by moving the virtualization process from user space to a VMM module. It showed that it can achieve line-speed virtualization performance in a 1 Gbps network and 78% of line rate in a 10 Gbps network. However,
it requires changes to the VMM, which hinders deployment of this technique. VIAS also seeks to bypass user-level processing, but does so while reusing existing, unmodified systems by leveraging software SDN switches. The experiments have shown that it is possible to deploy VIAS in existing cloud infrastructures (Amazon EC2, Google compute engine, CloudLab) without requiring any changes to VMMs or VM images; VIAS only requires user-level software to be installed. Another approach to minimizing context switches between kernel and user space is Netmap [76]. It eliminates the copy operation by using shared buffers and metadata between kernel and user space, showing that it achieves 20x speedups compared to conventional APIs. However, netmap relies on its own custom kernel module, again hindering the ability to deploy on commodity systems. Instead of providing a full virtual network to the users, VirtualWire [93] takes advantage of connect/disconnect primitives, which are assumed to be exposed by the cloud service provider. Its implementation requires changes to a Xen-blanket hypervisor, while VIAS can be deployed without changes to the VMM. Furthermore, VIAS makes no assumption about the primitives exposed by a provider to manage connectors; VIAS overlay links can be established even when cloud instances are constrained by NATs. The authors showed successful live migration of a VM instance from one network to another using these simple primitives. NVP [51] is closely related, as it also presents a combination of tunneling and an SDN fabric. However, it requires that all the logical and physical network switches be SDN-capable. It provides REST APIs to the tenants to expose network virtualization capabilities. Network elements such as switches and ports are presented to the tenants, and tenants build the topology of their network. Then, tunnels and flow rules are created and programmed by NVP on each hardware and software OpenFlow switch to forward packets among VMs deployed intra- and inter-cloud. However, NVP is designed to support multiple tenants in a single cloud provider. Unlike VIAS, its techniques do not support inter-cloud network virtualization.
2.8 Chapter Conclusion

Chapter 2 presented the novel architecture of VIAS, and demonstrated its ability to automatically provide fast virtual network paths within a cloud provider via coordinated programming of SDN switches, while retaining the ability to dynamically establish inter-cloud virtual private network tunnels. The main contribution of Chapter 2 is a novel user-level approach to distributed, coordinated control of overlay and SDN controllers, supporting private inter-cloud and high-performance intra-cloud network virtualization flows. VIAS leverages existing system-level VMM/kernel software without modifications, and has been demonstrated to work by extending existing overlay software (IPOP) and SDN platforms (RYU, OpenvSwitch) in realistic cloud computing environments, including Amazon EC2 and Google compute engine. Results showed that VIAS can provide a flexible layer 2 virtual network, in particular to nested virtualization environments, where tenants deploy containers across multiple providers. While this dissertation quantitatively evaluated the use of VIAS with software virtual switches, the use of the OpenFlow standard allows VIAS to tap into hardware SDN resources, if available.
CHAPTER 3
SDN FOR MULTI-TENANT DATA CENTER

As server virtualization technologies mature, multi-tenant cloud computing has become widely used as a platform for on-demand resource provisioning. Dynamic provisioning of computing resources as virtual machines provides flexible computing infrastructure that can be tailored by tenants, and provides high availability and proximity to clients by geographically distributing servers across different zones. From a networking perspective, multi-tenancy in data centers requires an ability to deploy arbitrary virtual network topologies upon physical network infrastructure, along with managing overlapping addresses between different tenants. In addition, for better utilization and high availability, VM migration is an essential feature in a Multi-Tenant Data Center (MTDC). When VMs are created upon a user's request, they should be virtually connected to the user's already deployed VMs, regardless of their physical location. Also, when live migrated, the network identity (such as MAC or IP address) of VMs should remain the same. Finally, when VMs are terminated, their network presence should be revoked from the tenant virtual network, and network identities should be reclaimed. Tunneling has been widely used to tackle these issues by providing a virtual overlay network upon the physical data center network. Tunneling advantages include ease of deployment and separation from the physical network topology, because it obviates the need for additional protocols on physical switches (such as VLAN or MPLS). Generally, the tunneling process involves encapsulating every packet from the virtual overlay network with a physical network header. This process is usually carried out by general-purpose processors at end-points and implemented in software (e.g. hypervisors), instead of in the physical network fabric. Nowadays, a link speed of 10 Gbps is prevalent, and the trend is toward higher rates. While network devices surpass 10 Gbps, it is increasingly difficult for the tunneling process to keep this pace: the processing time of packet classification is dominated by memory access time rather than CPU cycle time [31]. Although current CPU technology achieves a few
hundreds of GIPS (Giga Instructions per Second), the tunneling process hogs computing resources at hypervisors, nibbling away resources available for the guest machines. Software Defined Networking (SDN) [57] has been widely used for implementing flexible routing rules against elastic traffic demand, and also as a means to provide network virtualization in data centers. For example, NVP [51] uses forwarding paths with OpenvSwitch, but its scope is bounded to a single hypervisor; it requires tunneling to reach beyond end-hosts. WL2 [14] leverages MAC address rewriting and a redesigned Layer 2 address scheme for hierarchical data forwarding, but it still depends on tunneling (VXLAN) and gateways for a scalable, hierarchical, fully meshed Layer 2 network. VIAS, in Chapter 2, uses transport address translation with SDN primitives to selectively bypass traffic-intensive TCP/UDP streams from encapsulation, while letting the overlay handle the rest of the network protocols. In Chapter 3, PARES (PAcket REwriting over SDN), a network virtualization framework for MTDCs that leverages the OpenFlow SDN protocol, is introduced. PARES achieves network virtualization by packet rewriting on the SDN fabric, without incurring processing overheads on end-host hypervisors. Experiments show that PARES achieves near-native virtualization performance on a 10 Gbps link, while VXLAN-based tunneling achieves only 20% of line rate. Additionally, since PARES is implemented exclusively in the data-plane fabric and packets are processed in an in-situ manner, it inherently has the advantage of performance isolation from the hypervisor, and avoids the overhead of layers of indirection caused by virtualization.

3.1 Background

3.1.1 Multi-Tenant Data Center (MTDC)

Tenants in a multi-tenant data center require access to the public Internet, while also being required to communicate among their VMs as if they were connected to the same LAN environment, regardless of location. Requirements for the MTDC network infrastructure include:
- Managing and isolating overlapping address spaces
- Support of VM migration for flexible provisioning
- Decoupling of virtual from physical network topology
- Ability to scale to large numbers of nodes

There are two existing general approaches to provide Virtual Private Networks (VPNs) for MTDC networks. One approach is to micro-manage the FIBs (Forwarding Information Bases) of all network entities in the data center by establishing link paths from source hosts to destination hosts, e.g. by using IEEE 802.1q (VLAN) or MPLS [78]. These protocols use additional fields to route or to isolate sub-networks without intervention of the conventional link layer or IP network protocol. However, the VLAN approach has not been widely used to provide VPNs in MTDCs, because it incurs configuration complexity and scalability limitations: initial MTDCs used IEEE 802.1q virtual LANs for bridging and isolation [10], but suffered from scalability limitations. For example, in a traditional data center [7], VLANs are used to provide broadcast isolation and to slice the Layer 2 network. The Layer 2 address space can be sliced by VLAN ID, and each VLAN ID is assigned to the corresponding tenant for isolation. However, the VLAN ID is limited to 12 bits of width (4096 entries), which is way below the requirements of a current typical multi-tenant data center. Typical data center ToR (Top of Rack) switches consist of 24-48 ports [55], and each server could possibly provision up to hundreds of VMs, although, as a rule of thumb, a single hypervisor is not assigned more VMs than the number of threads on the physical machine. A single rack with 40 physical servers can readily reach the limits. Moreover, configuring forwarding paths also bears scalability limitations because of the limited TCAM entries of switches. A typical low-cost commodity switch is known to have around 16K entries [27]. The other approach is to encompass virtual network packets inside a physical network header, which is referred to as tunneling, overlay networking or encapsulation. Arbitrary network topologies and addressing schemes of the MTDC network can be realized with this approach. Furthermore, it can be accomplished using commodity switches without the requirement of additional protocol features. However, this approach comes with the cost of encapsulation overhead: the virtual network header is encapsulated by a physical network header, reducing the
effective MTU (Maximum Transmission Unit), and packet processing on general-purpose processors incurs frequent context switching on the hypervisor. In Chapter 3, we address these two issues of this approach.

3.1.2 Related Works of MTDC Network Architecture

Several MTDC network architectures and data center network virtualization techniques have been proposed. To name a few, NVP [51] uses the Stateless Transport Tunneling protocol (STT) [28], VXLAN [55] and GRE as tunneling protocols at a service node (which resides in the hypervisor) to provide network virtualization. To avoid the encapsulation overhead at the hypervisor, NVP introduced STT, which amortizes the encapsulation overhead by coalescing multiple datagrams. Although NVP is able to achieve the line rate of 10 Gbps, it still hogs CPU resources at both end-hosts and requires a special NIC feature called TCP Segmentation Offload (TSO). Since STT disguises the encapsulation header as a regular TCP header, there is no handshake or acknowledgment; thus, middleboxes such as firewalls or load balancers in the data center need to be modified. Furthermore, a single packet loss can lead to the drop of an entire STT frame, and significant performance degradation. When NVP uses GRE tunneling, it only achieves 25% of line rate at 10 Gbps and incurs more than 80% load on the CPU. VL2 [27] uses IP-in-IP [81] and the tunneling process is done at the ToR (Top of Rack) switches, but its evaluation is based on a 1 Gbps line rate. NetLord [58] uses an agent on the hypervisor for tunneling and defines its own encapsulation protocol. PortLand [65] uses packet rewriting instead of tunneling, by translating pseudo MAC and actual MAC addresses at ToR switches. However, it assumes IP core/aggregation switches support longest prefix match on MAC addresses. It uses MAC addresses in a manner similar to IP addresses in a multi-root fat tree data center topology [1]. Although this is supported in OpenFlow-enabled switches, it is difficult to expect commodity switches to have this feature, since longest prefix match is only implemented on IP addresses but not on flat addresses such as MAC. SecondNet [30] uses "source-based routing" by using the MPLS label field. Every packet is encapsulated with an MPLS field, which records every egress port of every switch on the path. Since
the data center physical network topology is fairly static with a small number of hops, the 20-bit-wide MPLS label has enough space to record all egress port numbers of every hopping switch. Since it is not based on encapsulation, it achieves 10 Gbps line rate with a fair throughput balance among VMs. However, it has scalability limitations in terms of the IP core switch hopping count, and requires the MPLS feature on every switch at the core and aggregation layers. Common aspects of the aforementioned MTDC network architectures are that they assume low-cost commodity switches with simple routing protocols at the IP core/aggregation layers, which use only a few FIBs at each switch [1, 7]. The multi-root fat tree topology is especially popular these days, since it can achieve fair bisection bandwidth without the requirement of high-end switches at the core or aggregation layer by using redundant multiple routing paths. Moreover, it enables commodity switches to replace high-end switches, especially in the core layer. Another common aspect is that MTDC architectures virtualize the network at the boundary of the core fabric network and provide a means to resolve the "many-to-one" mapping of virtual address space to physical underlay address space, either by multiplexing addresses or by encapsulating packets. These functions reside in the hypervisor, edge switches, or ToR switches. As examples, PortLand uses edge switches to map multiple actual MAC addresses to a pseudo MAC address. NetLord and NVP place agents at the hypervisor to perform encapsulation of virtual addresses with the corresponding physical network addresses. VL2 uses IP-in-IP in ToR switches, encapsulating the virtual IP address with the physical IP address. The above-mentioned data center architectures from academia and industry, along with their virtualization techniques, are summarized in Table 3-1. Overall, there is a clear trend that state-of-the-art data centers with network virtualization require a simple routing policy at the core/aggregation layer, and expand the address space at the "edge" of the data center network infrastructure. The virtualization primitives, such as address mapping and enforcing isolation, are handled at the edge. Rather than doing so at the end-point servers/hypervisors, our proposed architecture places SDN-enabled edge switches around the core network fabric (as in Figure 3-1) and exploits them for packet rewriting.
Table 3-1. Comparison of various network virtualization approaches in multi-tenant or enterprise data centers.
NVP: address mapping by encapsulation (STT, GRE, VXLAN); virtualization performed at a service node in the hypervisor; achieved 10 Gbps line rate using STT.
VL2: address mapping by encapsulation (IP-in-IP); virtualization performed at the ToR switch; 1 Gbps line rate.
WL2: hierarchical addressing, with VXLAN for inter-data-center communication; virtualization performed at a gateway; line rate not reported; inter-data-center architecture.
SecondNet: source-based routing using MPLS; port-switching based source routing (PSSR); 10 Gbps line rate; limited hopping count on the switch fabric.
PortLand: MAC header rewriting (pseudo/actual MAC translation); virtualization performed at edge switches; 150 Mbps at 1 Gbps.
NetLord: encapsulation; virtualization performed at the hypervisor; 1 Gbps line rate.

3.1.3 Layer 2 Semantics in MTDCs

To traverse packets from one server (one VM inside a hypervisor) to another server (another VM inside another hypervisor)¹, MTDCs typically use tunneling. Moreover, to support the broadcasting nature of a Layer 2 network, an agent on the hypervisor can be used to handle broadcasting (such as ARP or DHCP) and multicasting protocols as multiple unicast messages with tunneling, or IP multicast trees can be used. For example, NVP [51] runs a service node at the hypervisor and uses multiple unicast messages to implement broadcast/multicast semantics. VL2 [27] invokes a directory service at the shim layer (a layer in the network stack of the end-host O/S that invokes RPCs to a directory server) to resolve the location of an end-host address, and then tunnels the original and subsequent packets.

¹ In Chapter 3, server and VM are used interchangeably; servers/VMs are virtual instances inside a physical hypervisor.
Figure 3-1. PARES in a multi-tenant data center. CS: Core Switch, AS: Aggregation Switch, ES: Edge Switch, HV: Hypervisor, VM: Virtual Machine.

VL2 suppresses layer-2 semantics only at the edge switch layer to prevent the scalability problems created by ARP. PortLand [65] uses proxy ARP, which forwards all ARP messages captured at the edge switch to the fabric manager. When an initial broadcast ARP message is sent, it is intercepted at the edge switch and is not egressed to the IP core/aggregation layer. Instead, the fabric manager (proxy ARP) itself sends this ARP message to all the edge switches in the network. PARES uses a similar approach, but it looks up its directory from the SDN control plane to resolve ARP. WL2 [14] intercepts ARP and DHCP packets from the virtual switches, then forwards them to the SDN control plane, and the controller replies back to the end-hosts. This is similar to the approach of PARES, except that PARES intercepts broadcast packets at the edge switches. As such, common principles of well-known MTDCs indicate that, although they keep the Layer 2 semantics at the edge (between the network fabric and the end-point hypervisors), they suppress them beyond the IP core.

PAGE 72

Figure 3-2. Architecture of PARES.

3.2 Architecture of PARES

3.2.1 High-level Overview

PARES resides between the front-end console portal/API and the physical data center, as illustrated in Figure 3-2. Consider an example of a tenant requesting a VM instance deployment through the cloud service provider's front-end. The front-end processes authentication and access control, then virtual network configuration requests are sent to PARES through its APIs. PARES acts as an OpenFlow controller to program edge switches (Figure 3-1) dynamically, based on tenant requests. However, rather than statically programming each OpenFlow switch's routing rules, PARES programs OpenFlow-enabled edge switches dynamically, on demand, as shown in Figure 3-2.

Figure 3-3. Operation layering on the SDN fabric.

3.2.2 PARES Packet Handling Layers

The network layers of PARES differ from the conventional OSI network layers. The conventional network stack is known to have an hourglass shape, which has prevented the adoption of new and advanced protocols because of its inherent necessity of obeying the existing protocols. From the perspective of each network layer, it does not have any knowledge of, or access to, the network layers below or above it. Most of the PARES logic is in its OpenFlow protocol control module, which programs the flow tables of edge switches (the data plane) according to the given network configuration, and dynamically updates the data planes upon the front-end's requests. In essence, PARES programs the edge switches of the SDN fabric in three layers: the end-host solicitation, network function, and routing layers, as shown in Figure 3-3. The operation at these layers is elaborated below. In PARES, every layer has access to all network headers, but the layers are classified by their functionality, order, and dependencies.

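To make the layering concrete, the following is a minimal sketch of how an OpenFlow controller could express the three layers as a pipeline of flow tables on an edge switch. It assumes the Ryu framework (named later in Section 3.4) and OpenFlow 1.3; the table numbering, priorities, ports, and addresses are illustrative assumptions and not the actual PARES rule set.

```python
# Minimal sketch (not the actual PARES code base): expressing the three PARES
# packet-handling layers as a pipeline of OpenFlow 1.3 tables, using Ryu.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import CONFIG_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3

SOLICIT, NETFN, ROUTING = 0, 1, 2  # one OpenFlow table per PARES layer


class LayeredEdgeSwitch(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPSwitchFeatures, CONFIG_DISPATCHER)
    def on_switch_features(self, ev):
        dp = ev.msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser

        # End-host solicitation layer: punt ARP and DHCP to the controller,
        # which answers from the PARES directory; all other traffic falls
        # through to the network function layer.
        to_ctrl = [parser.OFPActionOutput(ofp.OFPP_CONTROLLER, ofp.OFPCML_NO_BUFFER)]
        self._add(dp, SOLICIT, 10, parser.OFPMatch(eth_type=0x0806), to_ctrl)
        self._add(dp, SOLICIT, 10,
                  parser.OFPMatch(eth_type=0x0800, ip_proto=17, udp_dst=67), to_ctrl)
        self._goto(dp, SOLICIT, NETFN)

        # Network function layer: e.g. tenant address translation (illustrative
        # addresses); rewritten packets continue to the routing layer.
        rewrite = [parser.OFPActionSetField(ipv4_src='10.0.1.2')]
        self._add(dp, NETFN, 10,
                  parser.OFPMatch(eth_type=0x0800, ipv4_src='192.168.1.1'),
                  rewrite, goto=ROUTING)
        self._goto(dp, NETFN, ROUTING)

        # Routing layer: plain longest-prefix forwarding on the underlay.
        self._add(dp, ROUTING, 10,
                  parser.OFPMatch(eth_type=0x0800,
                                  ipv4_dst=('10.0.1.0', '255.255.255.0')),
                  [parser.OFPActionOutput(2)])

    def _add(self, dp, table, prio, match, actions, goto=None):
        parser, ofp = dp.ofproto_parser, dp.ofproto
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        if goto is not None:
            inst.append(parser.OFPInstructionGotoTable(goto))
        dp.send_msg(parser.OFPFlowMod(datapath=dp, table_id=table, priority=prio,
                                      match=match, instructions=inst))

    def _goto(self, dp, table, nxt):
        parser = dp.ofproto_parser
        inst = [parser.OFPInstructionGotoTable(nxt)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, table_id=table, priority=0,
                                      match=parser.OFPMatch(), instructions=inst))
```
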
3.2.2.1 End-host solicitation layer

Separating names from locations is a quintessential requirement for MTDCs, as providers and tenants require elasticity and VM migration. As explained in Section 3.1.3, MTDCs typically place a shim layer inside each server [27], or run service nodes on the hypervisor [51], to intercept location solicitation packets in order to prevent broadcast traffic from flooding the data center. VL2 [27] places a shim layer on every server in order to intercept location solicitation packets, then invokes its directory service (rather than a broadcast protocol) to solicit a location. This prevents broadcast traffic from flooding the data center. A somewhat different approach is used in NVP [51], where a service node at the hypervisor creates an individual tunnel for each pair of remote servers. Broadcast traffic does not go beyond the host to the physical data center network; rather, it is replicated and tunneled by this service node. Different from these approaches, PARES places the end-host solicitation functionality on edge switches in the network fabric. This layer is the first layer a packet encounters when ingressed on the switch. If the packet matches any end-host solicitation protocol (ARP, DNS, or DHCP), the whole packet is forwarded to the PARES OpenFlow controller, and PARES replies (by looking up its directory) and injects the reply back to the original port.

3.2.2.2 Network function layer

All traffic other than end-host solicitation is forwarded to this layer. PARES places the network function layer before the routing layer, such that packet rewriting can be completed before routing. For example, in the case of load balancing, the destination transport address usually needs to be rewritten at the network function layer before its final routing at the routing layer. It is important to note that network functions can be independent or can require sequential ordering among each other. For example, north-south network traffic first requires a firewall, then load balancing. Network functions are diverse in nature [12]; in this section, the scope of network functions is limited to multi-tenancy only (tunneling and address isolation). After processing each network function, the packets are passed to the routing layer.

3.2.2.3 Routing layer

After the packet modification in the network function layer, the switching/routing decision is made in this layer. If the underlay network is a Layer 2 fabric, this layer performs the MAC-learning process, and switching is performed based on MAC addresses. While the flood-and-learn semantics of Layer 2 provide easy deployment, STP (spanning tree protocol) prohibits exploiting possible redundant multiple parallel paths to increase overall bisection bandwidth. Although TRILL [18] and LISP [20] can be solutions for exploiting multipath in a Layer 2 data center, they do not solve the problem of limited scalability caused by the number of TCAM entries in switches. As a result, enterprise/multi-tenant data center networks have evolved to shift from Layer 2 to Layer 3 fabrics, due to the reasons outlined below. With the introduction of the multi-root fat tree network topology [1], which places all switches and servers in Layer 3 and exploits the existence and predictable latency of multiple parallel paths among all nodes, a pure Layer 3 data center network topology has become popular, especially in massively large data centers [7]. Additionally, it maximizes the bisection traffic in the data center network using commodity switches, without the requirement of expensive high-capacity switches at the core. If a Layer 3 fabric is used as the underlay network, switching is performed based on IP addresses.

3.3 Implementation

In this section, it is described in detail how each operation layer of PARES handles the corresponding protocols or network functions.

3.3.1 Layer 2 Semantics in PARES

As explained in Section 3.1.3, MTDC architectures provide a means to intercept broadcast packets to prevent flooding the whole data center. PortLand's proxy ARP approach [65] intercepts ARP at edge switches and then forwards it to the proxy ARP. VL2 [27] also has a similar approach, but it intercepts frames at end-hosts rather than at edge switches. PARES uses an approach inspired by WL2 [14]: it intercepts broadcasting-based protocols on SDN-enabled edge switches. Instead of letting the broadcasting-based protocol reach its destination, edge switches handle broadcast traffic by: 1) forwarding packets to PARES
through the OpenFlow control plane, 2) waiting for the reply from the PARES controller following its directory mapping lookup, and 3) finally injecting the reply message back to the original port (see Footnote 2). Note that broadcasting is generally used for soliciting end-host addresses, not for actual packet traversal. Although using the control plane incurs substantial latencies and can be a potential bottleneck, the performance impact of broadcast frames is typically negligible compared to the actual data traffic (unicast) of typical MTDC applications. For example, ARP cache entries typically evict on a 60-second timeout, and DHCP lease times are on the order of minutes. Given the analysis from PortLand [65] and the evaluation in Section 3.4, a single-core server can service ARP for several hundred thousand end-hosts. Also, end-host address mapping information is rather static, with very few updates: it is only updated when a VM is created, terminated, or migrated.

3.3.2 End-host Solicitation Layer

ARP and DHCP are the most widely used protocols to solicit end-host network configuration. However, they flood the network due to their broadcast nature. These protocols are not involved in conveying the bulk of the (unicast) data traffic among end-hosts, so they are best handled as close as possible to the end-host. Although Layer 2 is supposed to be link-to-link and Layer 4 is for end-to-end transport, the late appearance of DHCP placed it at Layer 4 and above; ARP and DHCP are Layer 2 and Layer 7 respectively, but DHCP packets can be inferred from the Layer 4 header (transport port number). End-host solicitation works as a delegate of the remote host solicited by ARP or of the DHCP server. End-host solicitation in PARES directly handles ARP and DHCP using the approach described above in Section 3.3.1 (see Footnote 3).

Footnote 2: In OpenFlow, when packets are forwarded to the controller, it can read any fields of the packet. The controller itself has a complete degree of freedom in crafting and injecting packets.

Footnote 3: Simple OpenFlow rules forwarding all Ethernet type 0x0806 packets (ARP) and transport port number 67 (DHCP) to the controller can achieve this.

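As a concrete illustration of Footnote 3 and the reply path described above, the following is a minimal Ryu-based sketch, not the actual PARES controller: it assumes the ARP/DHCP punt rules from the earlier sketch are already installed, uses a plain in-memory IP-to-MAC directory (a Python dict), and shows an intercepted ARP request being answered from the control plane and injected back out of the ingress port.

```python
# Minimal sketch (assumed directory and addresses, not the PARES implementation):
# answering an intercepted ARP request from the controller's directory.
from ryu.base import app_manager
from ryu.controller import ofp_event
from ryu.controller.handler import MAIN_DISPATCHER, set_ev_cls
from ryu.ofproto import ofproto_v1_3
from ryu.lib.packet import packet, ethernet, arp

# Illustrative directory: virtual IP -> MAC, maintained via the PARES APIs.
DIRECTORY = {'192.168.1.2': '00:00:00:00:00:02'}


class ArpResponder(app_manager.RyuApp):
    OFP_VERSIONS = [ofproto_v1_3.OFP_VERSION]

    @set_ev_cls(ofp_event.EventOFPPacketIn, MAIN_DISPATCHER)
    def on_packet_in(self, ev):
        msg = ev.msg
        dp = msg.datapath
        ofp, parser = dp.ofproto, dp.ofproto_parser
        in_port = msg.match['in_port']

        pkt = packet.Packet(msg.data)
        req = pkt.get_protocol(arp.arp)
        if req is None or req.opcode != arp.ARP_REQUEST:
            return
        mac = DIRECTORY.get(req.dst_ip)
        if mac is None:
            return  # unknown host: drop instead of flooding the fabric

        # Craft the ARP reply on behalf of the solicited host and inject it
        # back out of the port the request arrived on.
        reply = packet.Packet()
        reply.add_protocol(ethernet.ethernet(dst=req.src_mac, src=mac,
                                             ethertype=0x0806))
        reply.add_protocol(arp.arp(opcode=arp.ARP_REPLY,
                                   src_mac=mac, src_ip=req.dst_ip,
                                   dst_mac=req.src_mac, dst_ip=req.src_ip))
        reply.serialize()
        out = parser.OFPPacketOut(datapath=dp, buffer_id=ofp.OFP_NO_BUFFER,
                                  in_port=ofp.OFPP_CONTROLLER,
                                  actions=[parser.OFPActionOutput(in_port)],
                                  data=reply.data)
        dp.send_msg(out)
```
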
It is important to place end-host solicitation protocols such as ARP, NDP (Neighbor Discovery Protocol), and DHCP as near as possible to the end-host because of the flooding nature of those protocols, and to separate this traffic from the network fabric. Thus, flood-based packets do not spread beyond the end-host solicitation layer. Other discovery protocols, such as LLDP (Link Layer Discovery Protocol) [53] or NDP [62] for IPv6, and name resolution protocols (DNS), can be implemented in the same manner. End-host solicitation protocols are resolved in a centralized manner in the SDN control plane by looking up a dictionary managed by PARES. In a multi-root fat tree data center network, where IP addresses are assigned statically based on pod location, traditional subnetting is not used inside the data center fabric. When an ARP or DHCP request comes from the end-hosts (e.g., a hypervisor), the edge switch forwards it to PARES, and the ARP or DHCP reply message is injected from the virtual switches to the originating switch port. PARES exposes APIs to update entries of the MAC-IP mapping directory, such that the cloud front-end can use these APIs to update the information of which instance is associated with which IP or MAC address.

3.3.3 Network Function Layer

Various network functions, such as firewalling or load balancing, can be implemented in this layer. OpenFlow not only provides a means to inspect packet headers from Layer 2 to Layer 4 (match), but also allows it to perform actions such as forwarding to a switch port or rewriting network headers (action). This capability is enough to implement 5-tuple-based network functions. Implementations of such network functions are readily available in the literature, such as the gateway in VIAS [45], and the load balancer and firewall in SoftFlow [42]. In this section, we concentrate on multi-tenant network functions and address isolation (the equivalent of tunneling), which are key to providing multi-tenancy for guest machines. PARES tracks information regarding network functions. For example, if PARES deploys the multi-tenancy network function, associated information such as the locations of edge switches,
Figure3-4. Datapathcomparisonofconventionaltunneling(top)andaddresstranslation(PARES,bottom). hypervisorsorVMinstancesarestoredinPAREStochecktheidentierconicts(MACaddress)orresourceexhaustion(tablesentriesinedgeswitches). 3.3.4PacketWalk-throughExampleConsideranexamplewhereatenanthastwoVMs(server1andserver2),andthatserver1sendsapackettoserver2,whichresidesintheremotehypervisoracrossthedatacenter,asillustratedinthebottompartofFigure 3-4 .Thetenantisprovisionedthevirtualnetworkaddressrange192.168.1.0/24,andservers1and2areassignedvirtualaddresses192.168.1.1and192.168.1.2.Edgeswitch1(whichhasbeenprogrammedbyPARESupontenantVMinstatiation)usesanSDNprimitivetotranslatetheprivateIPaddressrangetounderlaynetworkaddressrangeoftheMTDCnetworkfabric(10.0.0.0/8),thenpushesthepackettotheunderlaynetwork.ThepackettraversestheLayer3routingfabric,thenreachesthedestinationedgeswitch2.Atthispoint,theedgeswitchinspectstheMACaddressofVM1,andtranslatestheaddressbackto192.168.1.0/24.Asacomparison,tunnelingapproachesencapsulateeachIPpackets(asshowninupperpartofFigure 3-4 ),eectivelyreducingtheMTUsizebyitsincreasedheadersizeandincurringfrequentsoftwareinterruptsonhypervisor.TheroleofPARESisinmanagingtheserulesonedgeswitches1,2,on-demand,basedontenantprovisioning.Continuingtheexample,supposethereisaserver3residinginhypervisor1(alongwithserver1),withMACaddress5.Whenserver3sendspackettoserver4on 78

PAGE 79

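The address translation in this walk-through can be expressed as ordinary OpenFlow set-field rules. The following is a minimal sketch using Ryu, with illustrative underlay addresses, output ports, and MAC values; it is meant to show the shape of the rules PARES installs, not the actual rules themselves.

```python
# Minimal sketch of the walk-through's translation rules (illustrative
# addresses/ports/MACs, assuming Ryu and OpenFlow 1.3).
def install_translation(dp_edge1, dp_edge2):
    def add(dp, match, actions, priority=200):
        parser, ofp = dp.ofproto_parser, dp.ofproto
        inst = [parser.OFPInstructionActions(ofp.OFPIT_APPLY_ACTIONS, actions)]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=priority,
                                      match=match, instructions=inst))

    p1, p2 = dp_edge1.ofproto_parser, dp_edge2.ofproto_parser

    # Edge switch 1: map the tenant packet (192.168.1.1 -> 192.168.1.2) onto
    # underlay addresses before it enters the Layer 3 fabric.
    add(dp_edge1,
        p1.OFPMatch(eth_type=0x0800, ipv4_src='192.168.1.1',
                    ipv4_dst='192.168.1.2'),
        [p1.OFPActionSetField(ipv4_src='10.0.1.2'),   # underlay locator of HV1 (assumed)
         p1.OFPActionSetField(ipv4_dst='10.0.2.2'),   # underlay locator of HV2 (assumed)
         p1.OFPActionOutput(1)])                      # uplink toward the fabric

    # Edge switch 2: the destination VM's MAC disambiguates the tenant, so the
    # underlay addresses are translated back to the tenant's 192.168.1.0/24 range.
    add(dp_edge2,
        p2.OFPMatch(eth_type=0x0800, eth_dst='00:00:00:00:00:02',
                    ipv4_dst='10.0.2.2'),
        [p2.OFPActionSetField(ipv4_src='192.168.1.1'),
         p2.OFPActionSetField(ipv4_dst='192.168.1.2'),
         p2.OFPActionOutput(3)])                      # port toward hypervisor 2
```
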
When server 3 sends a packet to server 4 on hypervisor 2, edge switch 2 translates back to the tenant address range based on the MAC address. Note that, while the IP addresses are translated, the MAC address is not translated while the packets traverse the IP core/aggregation layer, since switching is done solely by IP address in the multi-root fat tree network architecture. This allows the translation at edge switch 2 to take effect.

3.3.5 Routing Layer

Conventional routing or switching, based on IP or MAC addresses respectively, is implemented in this layer. If the aggregation/core switches are based on a conventional Layer 2 network, this layer functions as MAC learning and switching based on MAC addresses. If the switching layers above are based on Layer 3, as in hyperscale data centers such as a massively scalable data center [15], this layer functions as longest-prefix routing, as in the multi-root fat tree architecture [1]. In the same manner, an IP multicast tree can be constructed in this layer in conjunction with the end-host solicitation layer. For example, an IGMP subscribe packet is captured at the end-host solicitation layer and PARES constructs the routing rule in the routing layer.

3.4 Evaluation

In this section, we evaluate PARES from different perspectives. First, since we apply the above technique to a multi-root fat tree data center architecture, we describe how this topology is emulated in the testbed. Then VXLAN is explained, which was used as the control group for experiments and performance comparison to PARES. Then, the end-host solicitation layer of PARES is evaluated to demonstrate its ability to perform address resolution in MTDCs at scale. Additionally, it is shown that PARES can achieve line-rate virtualization without using end-host computing resources. Finally, scalability with respect to the number of flow entries at each edge switch is evaluated. PARES is deployed in two testbeds. One uses a hardware OpenFlow switch with 1 Gbps bandwidth, and the other uses 10 Gbps NICs with multiple Open vSwitch modules deployed on a single physical machine. All these resources are based on the PRAGMA-ENT SDN testbed [40]. The first testbed uses a single hardware OpenFlow switch of model PICA8 3295 and two
physical machines (Intel(R) Xeon(R) CPU E5530 with 24 GB memory) attached to it, as shown in Figure 3-5(a).

Figure 3-5. SDN testbeds.

The second testbed uses three physical machines, with the aforementioned CPU/memory specification, and with 10 Gbps Intel X520-DA2 NICs. To emulate the IP fabric core and edge switches, four Open vSwitch instances are instantiated on one physical machine, and the two other machines are used as end-point hypervisors. All physical machines run Ubuntu Linux 16.04. Instead of using a full or paravirtualization hypervisor, lightweight OS-level virtualization, Linux containers (LXC), is used, since the sole purpose is to evaluate network virtualization performance and not processor virtualization. PARES has been built upon the open-source Ryu SDN framework [6], conforming to OpenFlow specification version 1.3.

3.4.1 Multi-root Fat Tree and Bidirectional Longest Prefix

Routing decisions based on IP addresses are typically determined by longest prefix match. This can be used to make the topology of a Layer 3 network a tree-like structure, where the routers closer to the top of the tree have the shortest netmasks, and those closer to the end-hosts have longer netmasks. This, however, introduces inherent bottlenecks and incurs a severe over-subscription ratio at core and aggregation switches. The use of a multi-root fat tree and bidirectional longest prefix routing scheme substantially reduces this problem [1]. Both the most and least significant address fields are used in routing decisions at the aggregation/edge switches in the pod, with a two-level routing table.

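The two-level table can be expressed directly in OpenFlow, since OpenFlow 1.3 allows masked IPv4 matches on either end of the address. The following is a minimal Ryu sketch for one switch of a pod, with illustrative pod addressing and port numbers in the spirit of the scheme in [1]; it is not the exact rule set used in the evaluation (that is shown in Figure 3-6).

```python
# Minimal sketch of bidirectional longest-prefix rules on one pod switch
# (illustrative addresses and ports, assuming Ryu / OpenFlow 1.3).
def install_two_level_routing(dp):
    parser, ofp = dp.ofproto_parser, dp.ofproto

    def add(priority, match, out_port):
        inst = [parser.OFPInstructionActions(
            ofp.OFPIT_APPLY_ACTIONS, [parser.OFPActionOutput(out_port)])]
        dp.send_msg(parser.OFPFlowMod(datapath=dp, priority=priority,
                                      match=match, instructions=inst))

    # First level (downward): prefixes on the most significant fields send
    # traffic destined to this pod's own subnets toward the right edge switch.
    add(200, parser.OFPMatch(eth_type=0x0800,
                             ipv4_dst=('10.0.0.0', '255.255.255.0')), 1)
    add(200, parser.OFPMatch(eth_type=0x0800,
                             ipv4_dst=('10.0.1.0', '255.255.255.0')), 2)

    # Second level (upward): suffix matches on the least significant byte
    # spread the remaining (inter-pod) traffic across the core uplinks.
    add(100, parser.OFPMatch(eth_type=0x0800,
                             ipv4_dst=('0.0.0.2', '0.0.0.255')), 3)
    add(100, parser.OFPMatch(eth_type=0x0800,
                             ipv4_dst=('0.0.0.3', '0.0.0.255')), 4)
```
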
Figure 3-6. IP core/aggregation routing rule example with a single pod of the multi-root fat tree data center architecture. The bidirectional longest prefix scheme of flow entries is shown inside each Open vSwitch instance.

For the evaluation, the multi-root fat tree network topology [1] is used as the underlay network topology and address scheme, which uses longest prefix match on the IP address in either direction (from MSB to LSB or the other way around). OpenFlow currently supports this address scheme. While PARES is also applicable in a conventional Layer 2, or any other form of Layer 3, data center network, the evaluation scope is limited to a system on a multi-root fat tree network topology. Specifically, a 4-port pod consisting of 4 switches is deployed as in Figure 3-6. The address 10.0.i.1 is assigned to each switch (i is the index of each switch) and 10.0.i.j is assigned to each hypervisor (j is the index of each hypervisor). Note that the four Open vSwitch
instances run on a single physical machine and the links between the switch instances are created with veth peers.

3.4.2 Virtual eXtensible Local Area Network (VXLAN)

There is a myriad of tunneling protocols available. To name a few, VXLAN [55], GRE, STT, IPsec tunnel mode [48], NVGRE, and IPOP [47] have been used in various scenarios. In this section, we compare the proposed approach to VXLAN, which is a widely used overlay network solution, especially designed for multi-tenancy in data center networks. In addition, a kernel implementation of VXLAN is readily available and optimized for high performance. VXLAN encapsulates packets from the tenants with the address range of the underlay network. It has an extensive logical identifier space (16 million), enough for current multi-tenant data centers. To emulate Layer 2 flood-and-learn semantics on an underlay Layer 3 network, VXLAN exploits IP multicast, which effectively reduces the range of flooding in the data center network. Because VXLAN requires neither central control plane involvement nor additional features from the underlay network for its deployment, it is widely and flexibly adopted in current MTDC networks. The limitation of the VXLAN protocol, as pointed out earlier, is that it takes up resources from the hypervisors. Every packet destined to a remote tenant is encapsulated in a UDP header, which triggers software interrupts and multiple copy operations when crossing the hypervisor-guest boundary.

3.4.3 End-host Solicitation Layer Scale Test

To test the end-host solicitation layer, ARP request messages are deliberately crafted from an ARP traffic generator. As in Figure 3-7, we varied the rate of ARP requests per second from 1 to 1 million. PARES starts to exhibit latency increases at 10K queries per second. Given that typical ARP caches evict entries every 60 seconds, 10K per second is equivalent to ARP requests from 600K end-hosts. A similar analysis is presented in PortLand [65], which runs a proxy ARP server for all end-hosts. Note that this performance is achieved with a single SDN controller server.

Figure 3-7. End-host solicitation test.

3.4.4 Line Rate Virtualization Test

Figure 3-8 shows the TCP throughput performance among the underlay network native throughput (host), PARES, and VXLAN, using the TCP performance tool "iperf". In the first testbed (1 Gbps), PARES achieves line-rate virtualization while VXLAN peaks at slightly (about 4%) lower performance. This comes from the fact that VXLAN has a reduced MSS size because of the increased header size caused by encapsulation: the MTU is lowered to 1450. In the 1 Gbps testbed experiment, significant performance degradation caused by encapsulation is not observed, since there are sufficient computing resources for encapsulation. Figure 3-9 shows the TCP throughput measured from an experiment using testbed 2. In this case, VXLAN only achieves about 20% of the throughput, while PARES achieves near-native physical performance. Certainly, encapsulating at 10 Gbps places a burden on computing resources at the end-point hypervisors, and VXLAN could not keep pace. This result is similar
to NVP [51], which achieves only 2.4 Gbps with conventional tunneling (GRE) at 10 Gbps line rate.

Figure 3-8. TCP maximum throughput comparison with various maximum segment sizes with a 1 Gbps edge switch.

To confirm that the overloaded CPU is the source of the poor performance, CPU usage is tracked by using the "top" command for 60 seconds on all physical machines (both the iperf server (s)/client (c) and the switch fabric machine (sw)) during the 10 Gbps TCP throughput test. Figure 3-10 shows the CPU usage profile for the native physical case. It is observed that CPU usage is dominated by iperf at the server side. Software interrupt calls at both hypervisors take less than 1.0% of the time. Figure 3-11 shows the CPU profile of VXLAN at 10 Gbps. It shows that the CPU profile is dominated by the interrupt calls ("ksoftirqd" in the Linux kernel), and not by iperf. Note that the VXLAN case achieves less than 25% of the TCP throughput of PARES or the native case. Figure 3-12 shows the CPU profile of PARES. The trend is similar to the native case, since it does not incur any interrupts at the hypervisor kernel.

Figure 3-9. TCP maximum throughput comparison with various maximum segment sizes with a 10 Gbps edge switch.

Figure 3-10. CPU profile span of a 60-second run on physical hosts.

Figure 3-11. CPU profile span of a 60-second VXLAN run.

Figure 3-12. CPU profile span of a 60-second PARES run.

In real data centers, 99% of flows are smaller than 100 MB and 90% of traffic comes from flows between 100 MB and 1 GB [27]. Single long flows can substantially benefit from PARES, and hence most of the traffic in data centers can benefit from PARES.

Figure 3-13. TCP throughput among multiple VMs and streams.

3.4.5 Scalability Test

Figure 3-13 shows the TCP throughput when multiple VMs from each hypervisor start to run iperf at different times. The experiment is set up as follows: at time 0, only a single VM sends TCP traffic; every 10 seconds, additional VMs start TCP streams. Every 10 seconds, the number of TCP streams from different VMs on the same hypervisor doubles. Note that each TCP stream shares fair throughput. At 40 seconds, the count of TCP streams from different VMs reaches 16. The physical machines have two sockets and each CPU has 4 cores; thus 16 VMs incurred contention on processing resources, reducing the overall peak throughput. Although over-commitment of multiple virtual CPUs to a single physical core or thread is common, all VMs provisioned on a single hypervisor contending for network resources at peak throughput is extremely unlikely in real data center workloads. Through this experiment, it is confirmed that PARES provides peak throughput of 10 Gbps with fair sharing among multiple streams.

Figure 3-14. Flow entry scalability test (base values are 144 s and 520 s respectively for testbeds 1 and 2).

Different from conventional MTDCs, PARES requires two rules per tunnel at each edge switch to perform multi-tenancy network virtualization. To test the effect of the number of flow entries, the number of entries at each edge switch is varied. A single long flow achieves peak throughput irrespective of the number of entries. Moreover, it is known that the majority of flows are small (a few KB) [27] in real MTDCs. It is therefore important to quantify the performance of PARES as the number of flow entries increases. Thus, a TCP flow synthesizer is developed, based on the real data center flow distribution from VL2 [27]. This traffic generator generates flows of random sizes following the distribution from a real data center, randomly selects one of the tunnels, then transmits the flow using any available thread. The generator waits for an available thread if the number of iperf threads exceeds the thread count. Note that when the number of tunnels is relatively small, it is more likely that the generator chooses a recently used tunnel, leading to less cache eviction in general-
purpose processors. Two different transfer sizes (1 GB for testbed 1 and 10 GB for testbed 2) are generated and the completion time of the transfer is recorded in Figure 3-14. Since a single hypervisor cannot associate more than 1000 virtual network interfaces, the maximum number of tunnels in this test is limited. In testbed 2, the number of flow entries does not affect the performance, while in testbed 1 it does. Initially, it was expected that the result would be the other way around, since testbed 2 uses a virtual switch and testbed 1 uses a hardware implementation of OpenFlow. However, while packet matching and switching is handled in hardware, packet rewriting is performed by a general purpose processor embedded in the PICA8 OpenFlow switch, which is an 850 MHz single-core microprocessor. Although we observe a scalability limitation in testbed 1, this problem can be overcome with a full hardware implementation of the OpenFlow switch or faster multi-core microprocessors.

3.5 Discussion

3.5.1 Applicability to General Enterprise Data Centers

Placing SDN devices at edge switches to separate the IP core fabric and the end-host network can also be applicable in general enterprise data centers, as a technique to scale out the number of end-hosts without changing the topology of the underlay network or address/location considerations. Data center network architectures evolved in a way that maximizes bisection bandwidth by reducing over-subscription at the core/aggregation layer with commodity switches [1, 27]. This trend sets forth as a premise a simpler routing rule in the underlay network, while requiring network virtualization at the edge network devices [51]. Although we mainly discussed MTDC cases throughout Chapter 3, the same technique can be applied in general enterprise data centers. For example, VL2 [27] separates location addresses (LA) and application addresses (AA), which respectively represent Top of Rack (ToR) addresses (underlay network) and end-host addresses (application servers). It encapsulates AA with LA at the ToR level, and the mappings between AAs and LAs are managed by the VL2 directory system. To handle broadcasts such as ARP or DHCP, it places a shim layer in every end-host O/S network stack to intercept broadcast traffic and invoke the directory service of the data center. If we apply the
technique in VL2, the shim layer is replaced with the end-host solicitation layer at the edge switches, and tunneling and its mapping are replaced by the network function layer. The approach of PARES requires no modification of the end-host O/S network stack, nor does it incur encapsulation overhead at network appliances at the ToR.

3.5.2 Scalability Consideration

Given that Open vSwitch allows 1 million flow entries, it is safe to say that PARES implemented with Open vSwitch as the edge switch has more than enough capacity to scale to typical deployments. Let us assume a scenario with a 50-port edge switch and a hypervisor at each port, for a total of 50 hypervisors. Assume each hypervisor contains 10 VMs. In the worst case, all 5000 VMs belong to different tenants (although this rarely happens, since an MTDC tries to co-locate VMs belonging to the same tenant on the same hypervisor). If we assume 100 VMs per tenant, the total number of flow entries required is 100K. The scalability of hardware OpenFlow edge switches can be limited by the specification of the corresponding hardware OpenFlow devices. Flow entry counts in typical hardware OpenFlow switches are around 100K entries. Although this is one tenth of the software switch capacity, it is a sufficient size for the above usage scenario. Moreover, a hardware OpenFlow switch guarantees line rate irrespective of the flow table size, while large flow tables in software switches potentially incur cache evictions leading to performance degradation [70]. Currently, a full implementation of a hardware OpenFlow switch with TCAM is costly. However, this problem can be overcome as SDN implementations become prevalent.

3.6 Conclusion

In order to provide multi-tenancy in the data center, or to extend Layer 2 semantics over a Layer 3 data center network, "map and encapsulation" has been widely used in the literature and in industry applications. In Chapter 3, an alternative "map and address translation" approach is proposed, which translates network addresses from the virtual range to the underlay physical range using OpenFlow-enabled edge switches. PARES achieves line-rate virtualization using this approach without consuming computing resources from end-hosts or hypervisors. Other
advantages of PARES are that there is no need to modify the end-host O/S, nor any requirement for computing resources from end-host hypervisors or servers, since it directly handles the ARP and DHCP protocols on edge switches to separate Layer 2 semantics from the end servers to the IP core layers. Experimental results show that PARES provides scalable line-rate multi-tenancy virtualization at 10 Gbps without sacrificing end-host computing resources.

CHAPTER 4
OVERLAY NETWORKS FOR BULK DATA TRANSFER IN PUBLIC WAN ENVIRONMENT

Since the Internet is an aggregation of multiple ASes (Autonomous Systems), congestion control and utilization are not globally optimized. For example, it is not uncommon that a direct shortest route with low latency delivers less bandwidth than an alternative, long and roundabout route. Previous research has shown that geospatially distributed computing instances in commercial clouds offer users an opportunity to deploy relay points to detour potentially congested ASes, and a means to diversify paths to increase overall bandwidth and reliability. Such opportunity comes with a cost, as cloud-routed paths incur not only the cost of provisioning computing resources, but also costs for additional traffic to/from the Internet. Well-established protocols, such as TCP, were created based on the assumption of a single end-point-to-end-point transfer; nonetheless, current computing devices have multiple end-points, and the increasing availability of overlay networks allows multiplexing multiple virtual network flows onto a single physical network interface. In Chapter 4, we empirically evaluate the extent to which using cloud paths to transfer data in parallel with the default Internet path can improve the end-to-end bandwidth in bulk data transfers. In the evaluation, single-stream and multi-stream TCP transfers across one or more paths are studied. This dissertation also presents a design pattern, as pseudo socket APIs, that can leverage the increased aggregate bandwidth of cloud multi-paths to increase overall throughput.

4.1 Cloud-routed Overlay Network (CRONets)

In CRONets [11], the authors ran extensive experiments involving 6,600 Internet paths, and observed empirically that using a cloud-routed overlay path can increase TCP throughput for 78% of the default Internet paths, with an average improvement factor of over 3 times. It uses overlay networks such as GRE and IPsec to create virtual tunnels, and the relay node rewrites transport pairs using the Linux iptables masquerade. While motivated by CRONets, we focus on a different study. Rather than experiments to assess an alternate single cloud-routed path as a substitute for the default Internet path across large numbers of paths, various
experiments are conducted on how much aggregate bandwidth can be achieved by using multiple cloud-routed paths, in addition to, and concurrently with, the default Internet path, and on socket-level design patterns to exploit them to maximize the TCP throughput.

Figure 4-1. Locations of relay and end-point nodes. Relay nodes (red, Amazon EC2-hosted) and end-point nodes (black for CloudLab-hosted nodes, and blue for Google Compute Engine-hosted nodes) are denoted as circles. The direct path is denoted as a solid line, while detour paths relayed by Amazon EC2 nodes are denoted as dashed lines.

Experiments are performed to verify that the cloud paths considered in the evaluation (summarized in Figure 4-1) have equivalent bandwidth to the default Internet path, as in CRONets. The experimental setup is as follows. Three overlay relay nodes, as Amazon EC2 small instances, are deployed in availability zones in the west, mid-west, and east (labelled "oreg", "ohio", and "nova"). Three nodes from CloudLab [16] (labelled "utah", "wisc", and "clem") and an additional 3 nodes from Google Compute Engine (labelled "dall", "iowa", and "soca") are deployed as end-point nodes. These nodes serve as source and destination virtual machines in our experiments. The testbed is designed to explicitly address the requirement that traffic routes through the public Internet. While resources from CloudLab are used as endpoints in the experiments, traffic is not routed through Internet2; rather, all traffic goes through the public Internet to the EC2 and Google Compute Engine nodes. CloudLab nodes
are not used as both source and destination endpoints in a measurement; this is to ensure that our experiments reflect the target use case of data transfers by commercial cloud users, who may not have access to a research network with larger throughput and less contention than the public Internet. The setup leads to a total of 18 paths. As one example, consider the case where the source node is "utah" in CloudLab and the destination node is "soca" from Google Compute Engine. These two end-nodes have 4 Internet paths between them: the default Internet path, and three cloud-routed paths, relaying through "nova", "ohio", or "oreg" respectively. Our measurements on this setup show that 6 out of 18 paths have at least one alternative path better than the direct TCP path in terms of TCP throughput; the improvement factor is 6.54 on average. While the fraction of alternative path candidates (i.e., those with higher throughput than the direct path) is smaller in this setup than what was observed in CRONets (owing to the smaller number of relay nodes considered), the improvement factor is significant.

4.2 Overlay Path

In this section, an approach to build overlay paths through cloud nodes using tunneling and NAT (Network Address Translation) is explained. The cloud relay node is configured to create a pair of tunnels, one connected to the source, and one to the destination. The type of tunnel created depends on constraints imposed by the end nodes. One type uses GRE (Generic Routing Encapsulation). GRE is well-known, easy to deploy, and a connectionless protocol without a handshake that provides a persistent tunnel between peers. However, GRE not only requires public IP addresses, but it also needs to be supported by the cloud provider. Public IP addresses may incur additional costs, and not all providers support GRE; for instance, Google Compute Engine does not. IPsec is another type of tunnel that can be used when GRE is unavailable. IPsec is a connection-oriented protocol and uses UDP encapsulation, such that one of the end nodes can reside behind a NAT router. Its encapsulation header is larger than that of GRE, thus leading to a reduced MSS (maximum segment size).

Figure 4-2. Overlay path relayed by cloud instances using GRE and IPsec tunneling.

Figure 4-2 shows an example of a single overlay path deployment with GRE and IPsec as the tunneling methods for each of the endpoints. Note that GRE is a point-to-point protocol while IPsec is a site-to-site VPN. Thus, the subnets of the local and remote endpoints of a GRE tunnel can overlap, while IPsec requires separate subnets for the local and remote address ranges. When the source node sends a packet to the destination, the cloud node rewrites the source IP address to the next corresponding tunnel using Source NAT in Linux Netfilter. Linux provides a means to track connections and to conduct address translation in the Netfilter framework, along with the user-space tool iptables to set up the rules for it. When the destination replies to the cloud node, the cloud node rewrites the destination address to the original source node. In more detail: when the source sends to IP address 10.75.128.1, the packet first falls in the subnet of the GRE tunnel at the source, so it is captured and encapsulated, then sent to the cloud node. The cloud node, aware that the destination is over the IPsec tunnel, rewrites the source IP address of the packet to 10.75.0.1. Meanwhile, Linux NAT tracks the connection by storing the 5-tuple of the stream (the IP address and transport number pairs for TCP/UDP).

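The relay-side setup described above amounts to a tunnel interface plus a source NAT rule. The following is a minimal sketch that drives the standard Linux ip/iptables tools from Python; the interface names, peer addresses, and tunnel addresses are illustrative placeholders loosely following Figure 4-2, not the exact configuration used in the experiments, and the IPsec tunnel toward the destination is assumed to be configured separately.

```python
# Minimal sketch (illustrative names/addresses): configure a cloud relay node
# with a GRE tunnel toward the source and source NAT toward the destination.
# The IPsec leg toward the destination endpoint is assumed to exist already.
import subprocess


def sh(cmd):
    # Run a configuration command and fail loudly if it does not apply.
    subprocess.run(cmd, shell=True, check=True)


def configure_relay(src_public_ip, relay_public_ip,
                    gre_local_addr="10.75.128.2/24",   # relay end of the GRE tunnel (assumed)
                    snat_source="10.75.0.1",           # relay address on the IPsec side (as in Figure 4-2)
                    tunnel_subnet="10.75.128.0/24"):
    # GRE tunnel between the relay and the source endpoint.
    sh(f"ip tunnel add gre-src mode gre local {relay_public_ip} "
       f"remote {src_public_ip} ttl 255")
    sh(f"ip addr add {gre_local_addr} dev gre-src")
    sh("ip link set gre-src up")

    # Forward packets between the two tunnels and rewrite the source address;
    # Netfilter connection tracking translates the replies back automatically.
    sh("sysctl -w net.ipv4.ip_forward=1")
    sh(f"iptables -t nat -A POSTROUTING -s {tunnel_subnet} "
       f"-j SNAT --to-source {snat_source}")


if __name__ == "__main__":
    configure_relay("203.0.113.10", "198.51.100.20")   # documentation-range example IPs
```
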
When the destination node replies to the source, the reply first reaches the cloud node (10.75.0.1). Then the cloud node, aware that the destination 10.75.0.1 is originally 10.75.1.1, rewrites the destination field accordingly. An overlay path inherently incurs overheads compared to the direct path. Since both GRE and IPsec encapsulate the original packets with a header, they reduce the MSS by 24 and 40 bytes respectively. Also, if multiple tunnels are used on the path, the maximum transfer unit (MTU) of all the tunnels should match the smallest MTU on the path to prevent IP fragmentation. Another source of overhead is packet rewriting at the cloud node. Packet processing and classification with general purpose processors also incurs overhead [31]; these are not a bottleneck in the use case scenario of bulk data transfers across the public Internet. In the testbed in Figure 4-1, the overlay path between a red and a blue circle is an IPsec tunnel, and the path between red and black nodes is a GRE tunnel.

4.3 Finding a Stream Count Combination for Maximum Aggregate Bandwidth

In a WAN environment, transferring data over multiple TCP streams can increase the aggregate throughput compared to a single TCP stream, even if the streams share a single routing path for a given source and destination pair [2, 32, 73]. For instance, a representative application is GridFTP [2, 39], which takes advantage of this fact by transferring file data across multiple TCP streams to achieve improved throughput. GridFTP takes the number of parallel TCP streams as a configurable parameter from the user; while predicting the optimum count of concurrent parallel TCP streams is in itself a complex issue [96, 82], as a rule of thumb, the aggregate bandwidth increases logarithmically as the number of parallel TCP streams increases, then saturates (or decreases). In order to verify whether this behavior is observed in our testbed, the aggregate TCP bandwidth on a single default Internet path with various stream counts is measured by using iperf. The aggregate bandwidth of each path is normalized by its peak bandwidth, and then averaged over all 18 paths. Figure 4-3 shows the ratio of increase in bandwidth obtained by adding additional streams on a single path. The ratio is defined as AggregateBandwidth_k / AggregateBandwidth_{k-1}, where k is the number of additional streams. It is observed that the bandwidth gain from increasing
stream counts is largest with a small number of streams (four or less), and saturates at around 10 or more.

Figure 4-3. Average of the relative aggregate bandwidth of the 18 WAN paths.

It is also observed that the aggregate bandwidth plunges in a few cases, as the min line in Figure 4-3 shows. Note that our testbed is in the public WAN, so the aggregate bandwidth not only varies with the path but also changes over time, as competing traffic exists in the public WAN or inside the cloud service providers. The first stream count reaching 90% of the peak aggregate bandwidth is analyzed for each path. This analysis shows that 10 out of 18 paths fall in the 1-5 stream count bucket, and 5, 2, and 1 fall in the 6-10, 11-15, and 16-20 buckets, respectively. Overall, a similar trend as in GridFTP [96] is observed, where the aggregate bandwidth increases as the stream count increases until a certain point (usually below 10), after which it saturates.

Algorithm 1: Exhaustive search for finding a combination of stream counts for maximum aggregate bandwidth.

  Data: path_set
  Result: Maximum bandwidth with the respective combination of stream counts on all paths

  Function Wrapper(path_set):
      count_table[1, ..., path_set.length] <- {0}
      bw, Opt[1, ..., path_set.length] = FindMaxAggRec(path_set, 0, count_table, 0.0)
      return (bw, Opt)

  Function FindMaxAggRec(path_set, path_index, count_table, prev_bw):
      if path_index > count_table.length then
          return (0.0, NIL)
      count_table[path_index]++
      bw_np, Opt_np[1, ..., path_set.length] = FindMaxAggRec(path_set, path_index + 1, count_table.copy(), max_bw)
      bw = run iperf simultaneously with the stream counts in count_table and store the sum of all bandwidths
      if bw > prev_bw then
          bw_as, Opt_as[1, ..., path_set.length] = FindMaxAggRec(path_set, path_index, count_table.copy(), max_bw)
      return Max((bw_np, Opt_np), (bw, count_table), (bw_as, Opt_as), key = lambda i: i[0])

Knowing that the stream count that leads to maximum aggregate bandwidth on a single path varies from path to path, it is sensible that the stream count combination across paths also varies if the source and destination pair is multi-homed and has multiple paths between them. Finding the stream count combination on multi-path has extra complexity, since paths are neither completely independent nor completely correlated with each other. The situation is completely circumstantial. It is possible that an alternate cloud path happens to share most ASes with the default Internet path, or that the paths are completely independent from each other, depending on the endpoints. For the purpose of finding the stream count combination of multiple TCP streams on multi-path, Algorithm 1 is run on every source and destination pair in our testbed.

4.4 Algorithm Details

Algorithm 1 takes destination end-points with various paths as a path_set. The destination endpoints have separate IP address spaces with different routing paths.

The wrapper function initializes the count_table array with zeroes and calls the recursive function "FindMaxAggRec". This recursive function calls itself with an increased path_index, then increases count_table at the given path_index and performs parallel iperf to measure the bandwidth with the given combination in the count_table. If the measured bandwidth is larger than the previous bandwidth (prev_bw), it assumes that the corresponding path has more capacity to increase the bandwidth by increasing the TCP stream count by one. It then recursively calls itself to measure the bandwidth with the TCP stream count increased by one. Note that every recursive call copies the count_table. The complexity of Algorithm 1 is exponential, O(N^M), where N is the optimal count of TCP streams per path and M is the number of available paths. In practice, exhaustively searching for the combination itself can be prohibitively expensive depending on the number of paths. Also, the combination changes over time, as circumstances such as competing traffic in the public WAN or inside the cloud differ over time. It is also worth noting that, in a public WAN environment, finding the global optimum of the stream count combination is next to impossible. Even the above exhaustive search of Algorithm 1 can lead to a local optimum, as the transient nature of the public WAN can cause glitches at certain iterations that lead to forgoing further recursion. The purpose of Algorithm 1 is to enable us to perform an empirical study to assess the possible gains in throughput from using multiple cloud paths. Heuristics to find a combination of paths/streams per path that strives to maximize performance are not within the scope of this dissertation. The maximum aggregate bandwidth using multi-path found by using Algorithm 1 is shown in Figure 4-4, along with the results of a single stream on a single path and multiple streams on a single path. By using multiple streams on a single path, the aggregate bandwidth can be improved by a factor of 1.7 on average compared to a single stream, while a 4.5x improvement factor on average can be achieved using multiple streams on multi-path. In cases such as wisc-dall, soca-wisc, and dall-wisc, where the default Internet path suffers from significant congestion, using multi-paths can substantially increase the aggregate bandwidth.

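For reference, the following is a minimal Python rendering of Algorithm 1. The measurement step is stubbed out (measure_aggregate is assumed to launch iperf concurrently with the given per-path stream counts and return the sum of the measured bandwidths), and the pseudocode's prev_bw/max_bw parameter is treated here as the most recently measured aggregate bandwidth; these are interpretive choices for illustration rather than the exact experimental code.

```python
# Minimal Python sketch of Algorithm 1 (exhaustive search).
def measure_aggregate(path_set, count_table):
    # Assumed hook: run iperf on path i with count_table[i] streams, in
    # parallel across all paths, and return the summed bandwidth.
    raise NotImplementedError("run iperf with the given per-path stream counts")


def find_max_agg(path_set):
    count_table = [0] * len(path_set)
    return _find_max_agg_rec(path_set, 0, count_table, 0.0)


def _find_max_agg_rec(path_set, path_index, count_table, prev_bw):
    if path_index >= len(path_set):
        return 0.0, None

    count_table = list(count_table)      # every call works on its own copy
    count_table[path_index] += 1

    # Branch 1: move on to the next path with the current counts.
    bw_np, opt_np = _find_max_agg_rec(path_set, path_index + 1,
                                      list(count_table), prev_bw)

    # Measure the aggregate bandwidth of the current combination.
    bw = measure_aggregate(path_set, count_table)

    # Branch 2: if this path still improved the aggregate, try adding one
    # more stream on the same path.
    bw_as, opt_as = 0.0, None
    if bw > prev_bw:
        bw_as, opt_as = _find_max_agg_rec(path_set, path_index,
                                          list(count_table), bw)

    # Return whichever of the three candidates achieved the most bandwidth.
    return max((bw_np, opt_np), (bw, count_table), (bw_as, opt_as),
               key=lambda item: item[0])
```
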
Figure 4-4. Aggregate bandwidth comparison among a single stream on a single path, multiple streams on a single path, and multiple streams on multi-path.

To better represent the stream counts found in this experiment, a Sankey diagram is used for their visualization (Figure 4-5). The streams flow from the left (sender) to the right (receiver); the line widths represent the relative counts of streams. Lines hopping through relay nodes (oreg, nova, and ohio) represent streams that are routed through a cloud path, while lines directly connecting a sender to a receiver represent the default Internet paths. One notable phenomenon is that the oreg cloud relay, which is located fairly remotely from the other nodes, happens to carry more streams than the other cloud nodes. This trend also occurs in a Sankey diagram of aggregate bandwidth per path: oreg provides more bandwidth than the other paths. While the geospatial diversity of the cloud nodes in this test is limited,
one possible explanation is that geographically distant nodes can provide more independent cloud paths, compared to cloud paths located near the default Internet path.

Figure 4-5. Sankey diagram from source (left) to destination (right) in terms of stream count. Line width is scaled with the stream count.

Since our testbed is in a public WAN environment, it is possible that the experiment is affected by temporal competing traffic patterns, such as the diurnal or weekly patterns shown in other research [87, 23]. The experiment ran extensively to cover a week-long period, and frequently enough to cover the diurnal pattern. Rather than a regular pattern, sudden dips or spikes of throughput are observed occasionally on a cloud path, and these are not correlated with other cloud paths or the default Internet path. This suggests a factor within the cloud service provider's zone.

4.5 Parallel TCP Experiment Using Cloud Paths

Previous research [11] showed that a cloud-routed path potentially has higher bandwidth than the default Internet path. In this research, we target increasing the overall bandwidth with additional cloud-routed paths by simultaneously transferring packets using all available cloud-
routed paths along with the default Internet path.

Figure 4-6. Bandwidth comparison among the default Internet path, simultaneous multi-path, and the arithmetic sum of multi-path.

To measure the bandwidth using multi-path transfer, the TCP application iperf was run on all available paths simultaneously, and the results were compared with the single default Internet path. In addition, iperf was run on all paths (the default Internet path and the cloud paths) separately, and the bandwidths of all paths were added. This arithmetic sum represents the theoretical maximum bandwidth, as cloud multi-paths may happen to share parts of their paths and coincide in the same ASes. This test was run on all paths in our testbed and the result is shown in Figure 4-6. It is observed that using multi-paths can increase the chance of achieving higher bandwidth, and the improvement factor is 1.45 on average.

Figure 4-7. Bandwidth-cost relationship (cost is for a 1 GB data transfer).

4.6 Bandwidth-Cost Relationship in Cloud Multi-Path

The data analyzed in the previous section shows that cloud-routed multi-path can deliver higher aggregate bandwidth than the default Internet path in several instances. However, unlike the default Internet path, the cloud path incurs costs: both the cost of running a compute instance and the network transfer cost. The provisioning cost is calculated based on the number of hours an instance is running, and the traffic cost is calculated based on the number of bytes transferred out of the cloud provider. To characterize the relationship between bandwidth increase and cost, TCP-based iperf is used to perform transfers with all possible combinations of paths. In this experiment, a single path is characterized by running iperf on the path; to characterize combinations of multiple paths, iperf is run concurrently on all corresponding paths,
and all the data transferred is added up. Since Internet paths have overlapping ASes among them, the multi-path bandwidth does not exceed the arithmetic sum of the individual single-path bandwidths. The results from the characterization of bandwidth and cost in our testbed are summarized in Figure 4-7. As shown in Figure 4-7, dots on the left (y-axis) denote the default Internet path; it does not incur any cost. As more alternate paths are added to the default path, the cost (and bandwidth) increases. The data points on the right-most part of the graph use all available paths. The bandwidth/cost relationship forms a convex curve; as the cost increases, the aggregate bandwidth increases, but in a diminishing manner. It is sensible that, as more alternate paths are used, there is a greater chance of overlapping ASes among paths. It is also important to note that potential increases in aggregate bandwidth are provided only in an opportunistic and granular manner, and incurring additional cost by adding alternate paths does not always lead to higher aggregate bandwidth.

4.7 A Design Pattern for Multi-Path TCP Socket

In Section 4.3, we show that the aggregate bandwidth can be increased by using multiple streams on multiple paths. However, higher aggregate bandwidth does not always lead to higher end-to-end data transfer throughput. Packets are delivered out of order at the receiving end even on a single path, and in a multi-path environment, out-of-order delivery at the receiving end is exacerbated as packets go through different paths with different RTTs. Widely used transport protocols, such as TCP and SCTP, handle this by retransmission or by sending gap acks back to the sender, but this leads to duplicate transmissions and a reduced congestion window at the sender [41]. In this dissertation, we do not introduce a transport-level solution to handle out-of-order delivery. Instead, we slice the bulk data into blocks of sizes proportional to the respective bandwidth of the paths that the blocks are to be transferred over. Then, each block is sent over a single TCP stream on an assigned single path, and multiple blocks are concurrently sent over multiple streams on multiple paths. Out-of-order delivery at the block
level can occur in this approach; it is the responsibility of the receiver side to reorder these blocks. This approach runs Algorithm 1 to find the aggregate bandwidth of each path and the count of streams per path. Then the sender allots the total size of the bulk data to each path by computing the ratio of the aggregate bandwidth of each path over the total aggregate bandwidth, such that the allotted size is proportional to the respective aggregate bandwidth of each path. The allotted size is then divided evenly by the count of streams of the path, and the divided value becomes the block size. To inform the receiver about how many blocks are to be sent and on which path they are sent over, the sender and the receiver establish a socket connection for signaling. Thus, there is one signal socket to arbitrate a data transfer between a sender and a receiver, and multiple data transfer sockets, matching the number of blocks. The signal socket between the sender and the receiver arbitrates signaling of the beginning and ending of the data transfer, and metadata about the transfer session. In the prototype, "init", "ready", "done", and "fin" signals are introduced. The init signal from the sender notifies the receiver about the block counts for each path and that the start of data transfer is imminent. The ready signal from the receiver notifies the sender that the receiver has opened sockets and is waiting to be connected. It also notifies which port number is to be used for data transfer. The done signal from the receiver notifies the sender that the data transfer has completed and it is ready to terminate the session. The fin signal from the sender notifies the receiver that it has no more sessions, and closes the socket. In the implementation of the signaling protocol, a typical TLV (Type-Length-Value)-based approach is used, but without the type field. The length of all signal messages is shorter than the two-byte integer range; the first two bytes are used as a length field, and the rest is used as the message payload. Figure 4-8 shows the key elements of our proposed design pattern. The socket slices the data to be transferred into multiple blocks and queues them to the data transfer threads along with the block ID, offset, and size of each block.

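The length-prefixed framing described above is straightforward to implement over a TCP signal socket. The following is a minimal sketch, not the prototype's code: the two-byte length field is assumed to be big-endian, and the payload encoding in the example (JSON for an init message) is an illustrative choice.

```python
# Minimal sketch of the length-prefixed signal framing (2-byte length field,
# then the message payload). Byte order and payload encoding are assumptions.
import json
import socket
import struct


def send_signal(sock: socket.socket, payload: bytes) -> None:
    if len(payload) > 0xFFFF:
        raise ValueError("signal messages must fit a two-byte length field")
    sock.sendall(struct.pack("!H", len(payload)) + payload)


def recv_signal(sock: socket.socket) -> bytes:
    (length,) = struct.unpack("!H", _recv_exact(sock, 2))
    return _recv_exact(sock, length)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("signal socket closed mid-message")
        buf += chunk
    return buf


# Example: an init message carrying per-path block counts (illustrative encoding).
# send_signal(signal_sock, json.dumps({"type": "init",
#                                      "blocks": {"default": 4, "oreg": 8}}).encode())
```
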
Figure 4-8. A design pattern for multi-path data transfer.

Initially, before actually sending data, the socket sends init along with how many blocks are to be sent over which path, and waits for ready. Then, the receiver prepares a bucket array and data threads of the same number as the block count, after which it notifies the sender with ready. The data threads then start sending upon receipt of the ready signal. Each block is sent with an 8-byte header, which consists of a 4-byte block_id field and the size of the block. Since each data thread knows how many bytes are to be transferred through its respective socket, it terminates once the received bytes equal the size of the block.

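A data-transfer thread therefore only needs the 8-byte header plus a fixed-size read loop. Below is a minimal sketch of that per-block framing; the big-endian byte order and the helper names are assumptions for illustration, not the prototype's exact wire format.

```python
# Minimal sketch of per-block framing: an 8-byte header (4-byte block_id,
# 4-byte block size) followed by the block payload. Byte order is assumed.
import socket
import struct

HEADER = struct.Struct("!II")  # block_id, block size in bytes


def send_block(sock: socket.socket, block_id: int, data: bytes) -> None:
    sock.sendall(HEADER.pack(block_id, len(data)) + data)


def recv_block(sock: socket.socket):
    block_id, size = HEADER.unpack(_recv_exact(sock, HEADER.size))
    return block_id, _recv_exact(sock, size)   # caller places it in the bucket array


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("data socket closed before the block completed")
        buf.extend(chunk)
    return bytes(buf)
```
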
Then the received block data is stored in the bucket array. Each cloud path and the default Internet path contains a single TCP socket pair between sender and receiver. At the receiving side, upon the completion of the data, the done signal is sent along with the chunk_id and path_id. Upon receipt of done, the sender queues the next block_id onto the available path. Note that all the signaling and data transfer is performed over TCP connections that guarantee reliable transmission. It is better to establish the signal socket on the default Internet path rather than a cloud path, since an Internet path is more resilient to failure. In this proof-of-concept implementation of the approach, used for the experiments conducted in this chapter, failure handling for the case where a cloud node fails during a transfer is not implemented. An approach that can be used in this case is to time out and retransmit on another available path. The purpose of presenting this design pattern is to demonstrate that an application-level protocol can exploit the aggregate bandwidth from multi-path multi-streaming to deliver improved aggregate throughput for data transfers. The core principle of the above design is inspired by the GridFTP protocol, which conducts application-level striping across multiple TCP streams. Moreover, reordering at the block level (across TCP sockets) should be carefully designed to minimize computing and memory resources. Currently, this implementation is not optimized to address this issue, as it concatenates blocks at the final phase of the session.

4.8 Evaluation

For the performance evaluation, a prototype has been developed which implements the data transfer pattern of the previous section at the application layer, using multiple TCP streams, for the following reasons. TCP is implemented in the kernel layer of the O/S, making it difficult to modify the code. Furthermore, given the prevalent existence of middleboxes in the Internet, modifying TCP can significantly hinder applicability [69]. For example, SCTP [84] has been extensively used between telecoms, is implemented in most O/Ss, and is proven to perform well with multi-path for bandwidth increase; however, it has not been used with end-users simply because most users are behind home routers [9] that embed a classical 5-tuple NAT [74] and do not support SCTP. Thus, it was decided to use TCP as is and to present a
design pattern at the application layer which exploits multi-path for improved aggregate throughput. The design pattern discussed in Section 4.7 has been implemented as a pseudo-socket API, where "write()" and "read()" methods are implemented as member functions. While an actual socket-level API implementation could be conceived to leverage cloud multi-path, it is not within the scope of this research; the scope of the implementation is limited to demonstrating empirically the improvement of throughput using cloud multi-paths. For the same reason, instead of implementing "bind()", "listen()", and "connect()" as APIs, the endpoint transport addresses of the cloud paths are given as configurable parameters. In the implementation, the application takes the bulk data as a file, then opens the socket and sends the data by calling "write()". At the receiving end, the socket is called through "read()" and waits for the completion of the data transfer. We implemented a simple application which sends 1 GByte of data using this pseudo socket API. The transfer completion times on the various paths of our testbed are shown in Figure 4-9. It is observed that multi-path can guarantee at least the throughput that could be achieved using a single path. Moreover, in some cases, throughput using multi-path can be higher, by a significant factor, than a single path. This happens when one of the cloud paths happens to have as much as or much higher throughput than the default Internet path. Figure 4-10 shows the selected paths that achieved the most throughput improvement (left axis, red bar) by using cloud paths, along with their incurred cost (right axis, blue bar). This data reflects the current cost incurred on the Amazon EC2 instances ($0.09 per GB) used in the experiment. Note that the cost is with respect to a 1 GB data transfer; larger data transfers would lead to proportionally larger costs, e.g., $90 for a 1 TB transfer in the case of "wisc-dall", a scenario where all data are transferred over the cloud path. The results show that significant improvements in end-to-end data transfer can be achieved by using cloud-routed paths in some scenarios, if a user is willing to pay the additional costs associated with cloud data transfer.

Figure 4-9. Transfer completion time of 1 GB of data/file on various paths.

The total cost tends to be dominated by data transfer costs, not instance costs, allowing users to trade off cost for performance depending on service-level objectives.

4.9 Related Work

The GridFTP framework [2] has been widely used across the globe for reliable and secure file transfer. It extended FTP [72] to incorporate striped and interleaved sending through parallel TCP streams. Originally, FTP uses two TCP connections: control and data. The control connection exchanges commands between the server and the client, while the data connection is created each time a file transfer is necessary. GridFTP extended the protocol to enable parallel TCP by allowing the opening of multiple data connections and leveraging parallel transfers.

Figure 4-10. Selective path throughput improvement with incurred cost.

Opening multiple TCP sockets and striping data over different paths at the application layer has been studied in related work [82, 32, 33, 34]. In parallel TCP, each TCP stream guarantees in-order delivery of packets, but out-of-order delivery of blocks across multiple TCP streams can stall applications in GridFTP. The approach of Hacker et al. [34] could reduce this reordering by allocating packets to sockets proportionally to the size of the congestion window of each TCP socket. However, they also note that using the congestion window as the basis for the scheduler which throttles packets to each stream incurs a vicious cycle of either over-allocating or starving a socket. Another drawback of parallel TCP over multiple paths is that it cannot dynamically adapt to changes in the quality of the paths [98]. mTCP [98] is a modification of TCP which stripes packets into sub-TCP flows, where each sub-TCP flow runs its own congestion control. It considers not only the throughput improvement but also fairness for the public WAN by detecting/avoiding common congestion among paths. Since it dynamically scores each path based on the congestion window and outstanding packets, it can adapt to changes in path
characteristics. Moreover, it uses heuristics to find the optimum combination of paths that leads to the most independence among paths. MPTCP [74] is a recent and noteworthy study on implementing a transport layer protocol that leverages multi-path. It is implemented to maximize applicability in the current Internet architecture, where middleboxes and home routers are prevalent. MPTCP strictly targets performing no worse than the best single path, and in some cases it performs worse than regular TCP. It is implemented in the kernel and is open to the community, and related work has considered it in various platforms and scenarios [54, 11, 67]. Since its target is not maximizing aggregate throughput, in our experience an increase in aggregate throughput using MPTCP is not observed in our testbed, an issue also raised in related work [11, 49]. However, in scenarios where there is minimal competing traffic and complete independence of each path [60, 67], MPTCP could improve the aggregate throughput. In Section 4.3, we pointed out that finding the optimal overlay path from the endpoints is prohibitively expensive. Typical overlays use ping and traceroute or other probing techniques to attain underlay network traits [98] in order to build more suitable virtual topologies: an MST algorithm to build a multicast tree, the nearest peer for P2P file sharing, and faraway peers for data replication. However, it has been pointed out that probing to find disjoint paths is not scalable, and also redundant, as different kinds of overlays duplicate this effort [3, 59]. Nakao et al. [59] suggested a shared routing underlay that overlays can query to attain the underlay network topologies, which could reduce the probing from O(N^2) to O(N) when trying to find disjoint paths among overlay nodes.

4.10 Chapter Conclusion

In Chapter 4, it is demonstrated empirically that cloud instances can be used as relay points to diversify routing paths between a given source and destination pair. Various experiments show that additional cloud paths can be used to increase aggregate throughput. In particular, the cloud paths least proximal to the default Internet path tend to provide more additional throughput. However, additional cloud paths do not lead to a deterministic throughput increase,
as can be expected with a private line lease; rather, the throughput increase is provided in an opportunistic and granular manner, but at a fraction of the cost. Since there was no transport protocol that maximizes aggregate throughput using multi-path at the time of writing this dissertation, an approach inspired by applications such as GridFTP is used instead, which stripes data across parallel TCP sockets; pseudo socket APIs are then implemented to demonstrate that the aggregate bandwidth from using multiple paths can lead to an actual increase in throughput at the application layer.

REFERENCES

[1] M. Al-Fares, A. Loukissas, and A. Vahdat. A scalable, commodity data center network architecture. SIGCOMM Comput. Commun. Rev., 38(4):63-74, Aug. 2008.

[2] W. Allcock, J. Bresnahan, R. Kettimuthu, M. Link, C. Dumitrescu, I. Raicu, and I. Foster. The Globus striped GridFTP framework and server. In Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, SC '05, pages 54-, Washington, DC, USA, 2005. IEEE Computer Society.

[3] D. Andersen, H. Balakrishnan, F. Kaashoek, and R. Morris. Resilient overlay networks. In Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, SOSP '01, pages 131-145, New York, NY, USA, 2001. ACM.

[4] L. Andersson and T. Madsen. Provider provisioned virtual private network (VPN) terminology. RFC 4026, RFC Editor, March 2005.

[5] Apache Libcloud. https://libcloud.apache.org/.

[6] Ryu SDN Framework. https://osrg.github.io/ryu/.

[7] M. F. Bari, R. Boutaba, R. Esteves, L. Z. Granville, M. Podlesny, M. G. Rabbani, Q. Zhang, and M. F. Zhani. Data center network virtualization: A survey. IEEE Communications Surveys Tutorials, 15(2):909-928, Second 2013.

[8] M. Ben-Yehuda, M. D. Day, Z. Dubitzky, M. Factor, N. Har'El, A. Gordon, A. Liguori, O. Wasserman, and B.-A. Yassour. The Turtles project: Design and implementation of nested virtualization. In Proceedings of the 9th USENIX Conference on Operating Systems Design and Implementation, OSDI '10, pages 423-436, Berkeley, CA, USA, 2010. USENIX Association.

[9] R. W. Bickhart. Transparent TCP-to-SCTP translation shim layer. Master's Thesis, University of Delaware, 2005.

[10] N. Bitar. Multi-tenant data center and cloud networking evolution. In 2013 Optical Fiber Communication Conference and Exposition and the National Fiber Optic Engineers Conference (OFC/NFOEC), pages 1-3, March 2013.

[11] C. X. Cai, F. Le, X. Sun, G. G. Xie, H. Jamjoom, and R. H. Campbell. CRONets: Cloud-routed overlay networks. In 2016 IEEE 36th International Conference on Distributed Computing Systems (ICDCS), pages 67-77, June 2016.

[12] B. Carpenter and S. Brim. Middleboxes: Taxonomy and issues. RFC 3234, RFC Editor, February 2002. http://www.rfc-editor.org/rfc/rfc3234.txt

[13] Chameleon Cloud. https://www.chameleoncloud.org/.

[14] C. Chen, C. Liu, P. Liu, B. T. Loo, and L. Ding. A scalable multi-datacenter layer-2 network architecture. In Proceedings of the 1st ACM SIGCOMM Symposium on Software Defined Networking Research, SOSR '15, pages 8:1-8:12, New York, NY, USA, 2015. ACM.

[15] Cisco. Cisco's Massively Scalable Data Center.

[16] CloudLab. http://cloudlab.us/.

[17] E. Deelman, K. Vahi, G. Juve, M. Rynge, S. Callaghan, P. J. Maechling, R. Mayani, W. Chen, R. F. da Silva, M. Livny, and K. Wenger. Pegasus, a workflow management system for science automation. Future Generation Computer Systems, 46:17-35, 2015.

[18] D. Eastlake, A. Banerjee, D. Dutt, R. Perlman, and A. Ghanwani. Transparent interconnection of lots of links (TRILL) use of IS-IS. RFC 6326, RFC Editor, July 2011.

[19] H. Eriksson. MBone: The multicast backbone. Commun. ACM, 37(8):54-60, Aug. 1994.

[20] D. Farinacci, V. Fuller, D. Meyer, and D. Lewis. The locator/ID separation protocol (LISP). RFC 6830, RFC Editor, January 2013. http://www.rfc-editor.org/rfc/rfc6830.txt

[21] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio. An updated performance comparison of virtual machines and Linux containers. In 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 171-172, March 2015.

[22] A. Fishman, M. Rapoport, E. Budilovsky, and I. Eidus. HVX: Virtualizing the cloud. In Presented as part of the 5th USENIX Workshop on Hot Topics in Cloud Computing, San Jose, CA, 2013. USENIX.

[23] C. Fraleigh, S. Moon, B. Lyles, C. Cotton, M. Khan, D. Moll, R. Rockell, T. Seely, and S. C. Diot. Packet-level traffic measurements from the Sprint IP backbone. IEEE Network, 17(6):6-16, Nov 2003.

[24] P. Francis. http://www.icir.org/yoid/.

[25] V. Fuller and T. Li. Classless inter-domain routing (CIDR): The Internet address assignment and aggregation plan. BCP 122, RFC Editor, August 2006. http://www.rfc-editor.org/rfc/rfc4632.txt

[26] P. Garg and Y. Wang. NVGRE: Network virtualization using generic routing encapsulation. RFC 7637, RFC Editor, September 2015.

[27] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. SIGCOMM Comput. Commun. Rev., 39(4):51-62, Aug. 2009.

[28] J. Gross and W. Lambeth. Method and apparatus for stateless transport layer tunneling, Sept. 2 2014. US Patent 8,825,900.

[29] N.GrozevandR.Buyya.Inter-cloudarchitecturesandapplicationbrokering:taxonomyandsurvey.Software:PracticeandExperience,44(3):369{390,2014. [30] C.Guo,G.Lu,H.J.Wang,S.Yang,C.Kong,P.Sun,W.Wu,andY.Zhang.Secondnet:Adatacenternetworkvirtualizationarchitecturewithbandwidthguarantees.InProceed-ingsofthe6thInternationalCOnference,Co-NEXT'10,pages15:1{15:12,NewYork,NY,USA,2010.ACM. [31] P.GuptaandN.McKeown.Algorithmsforpacketclassication.IEEENetwork,15(2):24{32,Mar2001. [32] T.J.Hacker,B.D.Athey,andB.Noble.Theend-to-endperformanceeectsofparalleltcpsocketsonalossywide-areanetwork.InProceedings16thInternationalParallelandDistributedProcessingSymposium,April2002. [33] T.J.Hacker,B.D.Noble,andB.D.Athey.Improvingthroughputandmaintainingfairnessusingparalleltcp.InIEEEINFOCOM2004,volume4,pages2480{2489vol.4,March2004. [34] T.J.Hacker,B.D.Noble,andB.D.Athey.Adaptivedatablockschedulingforparalleltcpstreams.InHPDC-14.Proceedings.14thIEEEInternationalSymposiumonHighPerformanceDistributedComputing,2005.,pages265{275,July2005. [35] E.Haleplidis,K.Pentikousis,S.Denazis,J.H.Salim,D.Meyer,andO.Koufopavlou.Software-denednetworking(sdn):Layersandarchitectureterminology.RFC7426,RFCEditor,January2015. http://www.rfc-editor.org/rfc/rfc7426.txt [36] S.Hanks,T.Li,D.Farinacci,andP.Traina.Genericroutingencapsulation(gre).RFC1701,RFCEditor,October1994. [37] J.HawkinsonandT.Bates.Guidelinesforcreation,selection,andregistrationofanautonomoussystem(as).BCP6,RFCEditor,March1996. [38] E.Hernandez-Valencia,S.Izzo,andB.Polonsky.Howwillnfv/sdntransformserviceprovideropex?IEEENetwork,29(3):60{67,May2015. [39] C.Huang,C.Nakasan,K.Ichikawa,andH.Iida.Amultipathcontrollerforacceleratinggridftptransferoversdn.In2015IEEE11thInternationalConferenceone-Science,pages439{447,Aug2015. [40] K.Ichikawa,P.U-Chupala,C.Huang,C.Nakasan,T.-L.Liu,J.-Y.Chang,L.-C.Ku,W.-F.Tsai,J.Haga,H.Yamanaka,E.Kawai,Y.Kido,S.Date,S.Shimojo,P.Papadopoulos,M.Tsugawa,M.Collins,K.Jeong,R.Figueiredo,andJ.Fortes.Pragma-ent:Aninternationalsdntestbedforcyberinfrastructureinthepacicrim.Con-currencyandComputation:PracticeandExperience,29(13):e4138{n/a,2017.e4138cpe.4138. 114

PAGE 115

[41] J.R.Iyengar,P.D.Amer,andR.Stewart.Concurrentmultipathtransferusingsctpmultihomingoverindependentend-to-endpaths.IEEE/ACMTransactionsonNetworking,14(5):951{964,Oct2006. [42] E.J.Jackson,M.Walls,A.Panda,J.Pettit,B.Pfa,J.Rajahalme,T.Koponen,andS.Shenker.Softow:Amiddleboxarchitectureforopenvswitch.In2016USENIXAnnualTechnicalConference(USENIXATC16),pages15{28,Denver,CO,2016.USENIXAssociation. [43] R.JainandS.Paul.Networkvirtualizationandsoftwaredenednetworkingforcloudcomputing:asurvey.IEEECommunicationsMagazine,51(11):24{31,November2013. [44] V.Jalaparti,I.Bliznets,S.Kandula,B.Lucier,andI.Menache.Dynamicpricingandtracengineeringfortimelyinter-datacentertransfers.InProceedingsofthe2016ACMSIGCOMMConference,SIGCOMM'16,pages73{86,NewYork,NY,USA,2016.ACM. [45] K.JeongandR.Figueiredo.Self-conguringsoftware-denedoverlaybypassforseamlessinter-andintra-cloudvirtualnetworking.InProceedingsofthe25thACMInternationalSymposiumonHigh-PerformanceParallelandDistributedComputing,HPDC'16,pages153{164,NewYork,NY,USA,2016.ACM. [46] X.JiangandD.Xu.VIOLIN:VirtualInternetworkingonOverlayInfrastructure,pages937{946.SpringerBerlinHeidelberg,Berlin,Heidelberg,2005. [47] P.S.Juste,K.Jeong,H.Eom,C.Baker,andR.Figueiredo.Tincan:User-denedp2pvirtualnetworkoverlaysforad-hoccollaboration.EAIEndorsedTransactionsonCollaborativeComputing,14(2),102014. [48] S.KentandK.Seo.Securityarchitecturefortheinternetprotocol.RFC4301,RFCEditor,December2005. http://www.rfc-editor.org/rfc/rfc4301.txt [49] R.Khalili,N.Gast,M.Popovic,andJ.Y.L.Boudec.Mptcpisnotpareto-optimal:Performanceissuesandapossiblesolution.IEEE/ACMTransactionsonNetworking,21(5):1651{1665,Oct2013. [50] H.KhosraviandT.Anderson.Requirementsforseparationofipcontrolandforwarding.RFC3654,RFCEditor,November2003. [51] T.Koponen,K.Amidon,P.Balland,M.Casado,A.Chanda,B.Fulton,I.Ganichev,J.Gross,P.Ingram,E.Jackson,A.Lambeth,R.Lenglet,S.-H.Li,A.Padmanabhan,J.Pettit,B.Pfa,R.Ramanathan,S.Shenker,A.Shieh,J.Stribling,P.Thakkar,D.Wendlandt,A.Yip,andR.Zhang.Networkvirtualizationinmulti-tenantdatacenters.In11thUSENIXSymposiumonNetworkedSystemsDesignandImplementation(NSDI14),pages203{216,Seattle,WA,2014.USENIXAssociation. 115

PAGE 116

[52] D.Kostic,A.Rodriguez,J.Albrecht,andA.Vahdat.Bullet:Highbandwidthdatadisseminationusinganoverlaymesh.InProceedingsoftheNineteenthACMSymposiumonOperatingSystemsPrinciples,SOSP'03,pages282{297,NewYork,NY,USA,2003.ACM. [53] S.Krishnan,N.Montavont,E.Njedjou,S.Veerepalli,andA.Yegin.Link-layereventnoticationsfordetectingnetworkattachments.RFC4957,RFCEditor,August2007. [54] Y.-s.Lim,Y.-C.Chen,E.M.Nahum,D.Towsley,andR.J.Gibbens.Howgreenismultipathtcpformobiledevices?InProceedingsofthe4thWorkshoponAllThingsCellular:Operations,Applications,&Challenges,AllThingsCellular'14,pages3{8,NewYork,NY,USA,2014.ACM. [55] M.Mahalingam,D.Dutt,K.Duda,P.Agarwal,L.Kreeger,T.Sridhar,M.Bursell,andC.Wright.Virtualextensiblelocalareanetwork(vxlan):Aframeworkforoverlayingvirtualizedlayer2networksoverlayer3networks.RFC7348,RFCEditor,August2014. http://www.rfc-editor.org/rfc/rfc7348.txt [56] L.Mai,L.Rupprecht,A.Alim,P.Costa,M.Migliavacca,P.Pietzuch,andA.L.Wolf.Netagg:Usingmiddleboxesforapplication-specicon-pathaggregationindatacentres.InProceedingsofthe10thACMInternationalonConferenceonEmergingNetworkingExperimentsandTechnologies,CoNEXT'14,pages249{262,NewYork,NY,USA,2014.ACM. [57] N.McKeown,T.Anderson,H.Balakrishnan,G.Parulkar,L.Peterson,J.Rexford,S.Shenker,andJ.Turner.Openow:Enablinginnovationincampusnetworks.SIG-COMMComput.Commun.Rev.,38(2):69{74,Mar.2008. [58] J.Mudigonda,P.Yalagandula,J.Mogul,B.Stiekes,andY.Pouary.Netlord:Ascalablemulti-tenantnetworkarchitectureforvirtualizeddatacenters.InProceedingsoftheACMSIGCOMM2011Conference,SIGCOMM'11,pages62{73,NewYork,NY,USA,2011.ACM. [59] A.Nakao,L.Peterson,andA.Bavier.Aroutingunderlayforoverlaynetworks.InProceedingsofthe2003ConferenceonApplications,Technologies,Architectures,andProtocolsforComputerCommunications,SIGCOMM'03,pages11{18,NewYork,NY,USA,2003.ACM. [60] C.Nakasan,K.Ichikawa,H.Iida,andP.Uthayopas.Asimplemultipathopenowcontrollerusingtopology-basedalgorithmformultipathtcp.ConcurrencyandComputa-tion:PracticeandExperience,29(13):e4134{n/a,2017.e4134cpe.4134. [61] T.Narten,E.Gray,D.Black,L.Fang,L.Kreeger,andM.Napierala.Problemstatement:Overlaysfornetworkvirtualization.RFC7364,RFCEditor,October2014. [62] T.Narten,E.Nordmark,W.Simpson,andH.Soliman.Neighbordiscoveryforipversion6(ipv6).RFC4861,RFCEditor,September2007. http://www.rfc-editor.org/rfc/rfc4861.txt 116

PAGE 117

[63] M.R.Nascimento,C.E.Rothenberg,M.R.Salvador,C.N.A.Corr^ea,S.C.deLucena,andM.F.Magalh~aes.Virtualroutersasaservice:Therouteowapproachleveragingsoftware-denednetworks.InProceedingsofthe6thInternationalConferenceonFutureInternetTechnologies,CFI'11,pages34{37,NewYork,NY,USA,2011.ACM. [64] D.Naylor,K.Schomp,M.Varvello,I.Leontiadis,J.Blackburn,D.R.Lopez,K.Papagiannaki,P.RodriguezRodriguez,andP.Steenkiste.Multi-contexttls(mctls):Enablingsecurein-networkfunctionalityintls.InProceedingsofthe2015ACMConfer-enceonSpecialInterestGrouponDataCommunication,SIGCOMM'15,pages199{212,NewYork,NY,USA,2015.ACM. [65] R.NiranjanMysore,A.Pamboris,N.Farrington,N.Huang,P.Miri,S.Radhakrishnan,V.Subramanya,andA.Vahdat.Portland:Ascalablefault-tolerantlayer2datacenternetworkfabric.SIGCOMMComput.Commun.Rev.,39(4):39{50,Aug.2009. [66] A.M.OprescuandT.Kielmann.Bag-of-tasksschedulingunderbudgetconstraints.In2010IEEESecondInternationalConferenceonCloudComputingTechnologyandScience,pages351{359,Nov2010. [67] C.Paasch,S.Ferlin,O.Alay,andO.Bonaventure.Experimentalevaluationofmultipathtcpschedulers.InProceedingsofthe2014ACMSIGCOMMWorkshoponCapacitySharingWorkshop,CSWS'14,pages27{32,NewYork,NY,USA,2014.ACM. [68] K.PagiamtzisandA.Sheikholeslami.Content-addressablememory(cam)circuitsandarchitectures:atutorialandsurvey.IEEEJournalofSolid-StateCircuits,41(3):712{727,March2006. [69] G.Papastergiou,G.Fairhurst,D.Ros,A.Brunstrom,K.J.Grinnemo,P.Hurtig,N.Khademi,M.Txen,M.Welzl,D.Damjanovic,andS.Mangiante.De-ossifyingtheinternettransportlayer:Asurveyandfutureperspectives.IEEECommunicationsSurveysTutorials,19(1):619{639,Firstquarter2017. [70] B.Pfa,J.Pettit,T.Koponen,E.Jackson,A.Zhou,J.Rajahalme,J.Gross,A.Wang,J.Stringer,P.Shelar,K.Amidon,andM.Casado.Thedesignandimplementationofopenvswitch.In12thUSENIXSymposiumonNetworkedSystemsDesignandImplementation(NSDI15),pages117{130,Oakland,CA,2015.USENIXAssociation. [71] G.J.PopekandR.P.Goldberg.Formalrequirementsforvirtualizablethirdgenerationarchitectures.Commun.ACM,17(7):412{421,July1974. [72] J.Postel.Filetransferprotocolspecication.RFC765,RFCEditor,June1980. [73] L.Qiu,Y.Zhang,andS.Keshav.Onindividualandaggregatetcpperformance.InProceedings.SeventhInternationalConferenceonNetworkProtocols,pages203{212,Oct1999. 117

PAGE 118

[74] C.Raiciu,C.Paasch,S.Barre,A.Ford,M.Honda,F.Duchene,O.Bonaventure,andM.Handley.Howhardcanitbe?designingandimplementingadeployablemultipathTCP.In9thUSENIXSymposiumonNetworkedSystemsDesignandImplementation(NSDI12),pages399{412,SanJose,CA,2012.USENIXAssociation. [75] K.Razavi,A.Ion,G.Tato,K.Jeong,R.Figueiredo,G.Pierre,andT.Kielmann.Kangaroo:Atenant-centricsoftware-denedcloudinfrastructure.In2015IEEEIn-ternationalConferenceonCloudEngineering,pages106{115,March2015. [76] L.Rizzo.netmap:Anovelframeworkforfastpacketi/o.In2012USENIXAnnualTechnicalConference(USENIXATC12),pages101{112,Boston,MA,2012.USENIXAssociation. [77] R.M.RobinsonandP.A.S.Ward.Anarchitectureforreliableencapsulationendpointsusingcommodityhardware.In2011IEEE30thInternationalSymposiumonReliableDistributedSystems,pages183{192,Oct2011. [78] E.Rosen,A.Viswanathan,andR.Callon.Multiprotocollabelswitchingarchitecture.RFC3031,RFCEditor,January2001. http://www.rfc-editor.org/rfc/rfc3031.txt [79] J.Rosenberg,J.Weinberger,C.Huitema,andR.Mahy.Stun-simpletraversalofuserdatagramprotocol(udp)throughnetworkaddresstranslators(nats).RFC3489,RFCEditor,March2003. http://www.rfc-editor.org/rfc/rfc3489.txt [80] T.Saad,B.Alawieh,H.T.Mouftah,andS.Gulder.Tunnelingtechniquesforend-to-endvpns:genericdeploymentinanopticaltestbedenvironment.IEEECommunicationsMagazine,44(5):124{132,May2006. [81] W.Simpson.Ipiniptunneling.RFC1853,RFCEditor,October1995. [82] H.Sivakumar,S.Bailey,andR.L.Grossman.Psockets:Thecaseforapplication-levelnetworkstripingfordataintensiveapplicationsusinghighspeedwideareanetworks.InSupercomputing,ACM/IEEE2000Conference,pages38{38,Nov2000. [83] P.SrisureshandM.Holdrege.Ipnetworkaddresstranslator(nat)terminologyandconsiderations.RFC2663,RFCEditor,August1999. http://www.rfc-editor.org/rfc/rfc2663.txt [84] R.Stewart.Streamcontroltransmissionprotocol.RFC4960,RFCEditor,September2007. http://www.rfc-editor.org/rfc/rfc4960.txt [85] I.Stoica,R.Morris,D.Karger,M.F.Kaashoek,andH.Balakrishnan.Chord:Ascalablepeer-to-peerlookupserviceforinternetapplications.InProceedingsofthe2001ConferenceonApplications,Technologies,Architectures,andProtocolsforComputerCommunications,SIGCOMM'01,pages149{160,NewYork,NY,USA,2001.ACM. 118

PAGE 119

[86] A.I.SundararajandP.A.Dinda.Towardsvirtualnetworksforvirtualmachinegridcomputing.InProceedingsofthe3rdConferenceonVirtualMachineResearchAndTechnologySymposium-Volume3,VM'04,pages14{14,Berkeley,CA,USA,2004.USENIXAssociation. [87] K.Thompson,G.J.Miller,andR.Wilder.Wide-areainternettracpatternsandcharacteristics.IEEENetwork,11(6):10{23,Nov1997. [88] J.Touch.Dynamicinternetoverlaydeploymentandmanagementusingthex-bone.InProceedings2000InternationalConferenceonNetworkProtocols,pages59{68,2000. [89] M.TsugawaandJ.A.B.Fortes.Avirtualnetwork(vine)architectureforgridcomputing.InProceedings20thIEEEInternationalParallelDistributedProcessingSymposium,pages10pp.{,April2006. [90] R.Uhlig,G.Neiger,D.Rodgers,A.L.Santoni,F.C.M.Martins,A.V.Anderson,S.M.Bennett,A.Kagi,F.H.Leung,andL.Smith.Intelvirtualizationtechnology.Computer,38(5):48{56,May2005. [91] G.WangandT.S.E.Ng.Theimpactofvirtualizationonnetworkperformanceofamazonec2datacenter.In2010ProceedingsIEEEINFOCOM,pages1{9,March2010. [92] M.WesterlundandC.Perkins.Ianaregistryforinteractiveconnectivityestablishment(ice)options.RFC6336,RFCEditor,July2011. [93] D.Williams,H.Jamjoom,Z.Jiang,andH.Weatherspoon.VirtualWiresforLiveMigratingVirtualNetworksacrossClouds.InIBMRESEARCH.IBMResearchReportRC25378,2013. [94] D.Williams,H.Jamjoom,andH.Weatherspoon.Thexen-blanket:Virtualizeonce,runeverywhere.InProceedingsofthe7thACMEuropeanConferenceonComputerSystems,EuroSys'12,pages113{126,NewYork,NY,USA,2012.ACM. [95] L.Xia,Z.Cui,J.R.Lange,Y.Tang,P.A.Dinda,andP.G.Bridges.Vnet/p:Bridgingthecloudandhighperformancecomputingthroughfastoverlaynetworking.InProceed-ingsofthe21stInternationalSymposiumonHigh-PerformanceParallelandDistributedComputing,HPDC'12,pages259{270,NewYork,NY,USA,2012.ACM. [96] E.Yildirim,D.Yin,andT.Kosar.Balancingtcpbuervsparallelstreamsinapplicationlevelthroughputoptimization.InProceedingsoftheSecondInternationalWorkshoponData-awareDistributedComputing,DADC'09,pages21{30,NewYork,NY,USA,2009.ACM. [97] F.Zhang,J.Chen,H.Chen,andB.Zang.Cloudvisor:Retrottingprotectionofvirtualmachinesinmulti-tenantcloudwithnestedvirtualization.InProceedingsoftheTwenty-ThirdACMSymposiumonOperatingSystemsPrinciples,SOSP'11,pages203{216,NewYork,NY,USA,2011.ACM. 119

PAGE 120

[98] M.Zhang,J.Lai,A.Krishnamurthy,L.Peterson,andR.Wang.Atransportlayerapproachforimprovingend-to-endperformanceandrobustnessusingredundantpaths.InProceedingsoftheAnnualConferenceonUSENIXAnnualTechnicalConference,ATEC'04,pages8{8,Berkeley,CA,USA,2004.USENIXAssociation. 120

PAGE 121

BIOGRAPHICAL SKETCH
Kyuho Jeong was born and grew up in Seoul, South Korea. He graduated from Sillim High School and entered Hongik University in 2001. He served in the Republic of Korea Army from 2003 to 2005, then moved to Australia to practice English and to travel. He returned to college and received his Bachelor of Science degree in electronic and electrical engineering from Hongik University, Seoul, South Korea, in 2008. After graduation, he worked at Samsung Electronics as a logic design and verification engineer. Following this industry experience, he decided to study computer engineering abroad. He received the Master of Science and doctoral degrees in electrical and computer engineering from the University of Florida in 2012 and 2017, respectively. During his Ph.D. studies, he focused on virtual networking and cloud computing and published multiple research papers.