
Overlay Infrastructure Support for Internet Applications


ACKNOWLEDGMENTS

I am grateful for the help I have received in writing this dissertation. First of all, I would like to thank my advisor Prof. Shigang Chen for his guidance and support throughout my graduate studies. Without the numerous discussions and brainstorms with him, the results presented in this thesis would never have existed. I am grateful to Prof. Liuqing Yang, Prof. Randy Chow, Prof. Sartaj Sahni, and Prof. Ye Xia for their guidance and encouragement during my years at the University of Florida (UF). I am thankful to all my colleagues in Prof. Chen's group, including Liang Zhang, MyungKeun Yoon, Ying Jian, and Ming Zhang. They provided valuable feedback on my research. I would like to thank the many people in the Computer and Information Science and Engineering (CISE) Department for their help in my research work. Last but not least, I am grateful to my family for their love, encouragement, and understanding. It would be impossible for me to express my gratitude towards them in mere words.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF FIGURES
ABSTRACT

CHAPTER

1  INTRODUCTION
   1.1  Overlay Networks
   1.2  Related Work
   1.3  Contribution
        1.3.1  A Distributed Hybrid Query Scheme to Speed Up Queries in Unstructured Peer-to-Peer Networks
        1.3.2  A Distributed Incentive Scheme for Peer-to-Peer Networks
        1.3.3  Capacity-Aware Multicast Algorithms on Heterogeneous Overlay Networks

2  A HYBRID QUERY SCHEME TO SPEED UP QUERIES IN UNSTRUCTURED PEER-TO-PEER NETWORKS
   2.1  Motivation
        2.1.1  Problems in Prior Work
        2.1.2  Motivation
   2.2  Constructing a Small-World Topology
        2.2.1  Measuring the Interest Similarity between Two Nodes
        2.2.2  Clustering Nodes with Similar Interests
        2.2.3  Bounding Clusters
   2.3  A Hybrid Query Scheme
        2.3.1  Mixing Inter-Cluster Queries and Intra-Cluster Queries
        2.3.2  Reducing the Communication Overhead
   2.4  Simulation

3  MARCH: A DISTRIBUTED INCENTIVE SCHEME IN OVERLAY NETWORKS
   3.1  Motivation
        3.1.1  Limitation of Prior Work
        3.1.2  Motivation
   3.2  System Model
   3.3  Authority Infrastructure
        3.3.1  Delegation
        3.3.2  k-pair Trustworthy Set
   3.4  MARCH: A Distributed Incentive Scheme
        3.4.1  Money and Reputation
        3.4.3  Phase 2: Contract Verification
        3.4.4  Phases 3 and 4: Money Transfer and Contract Execution
        3.4.5  Phase 5: Prosecution
   3.5  System Properties and Defense against Various Attacks
        3.5.1  System Properties
        3.5.2  Defending Against Various Attacks
   3.6  Discussions
        3.6.1  Rewarding Delegation Members
        3.6.2  Money Refilling
        3.6.3  System Dynamics and Overhead
   3.7  Simulation
        3.7.1  Effectiveness of Authority
        3.7.2  Effectiveness of MARCH

4  CAPACITY-AWARE MULTICAST ALGORITHMS ON HETEROGENEOUS OVERLAY NETWORKS
   4.1  Motivation
   4.2  System Overview
   4.3  CAM-Chord Approach
        4.3.1  Neighbors
        4.3.2  Lookup Routine
        4.3.3  Topology Maintenance
        4.3.4  Multicast Routine
        4.3.5  Analysis
   4.4  CAM-Koorde Approach
        4.4.1  Neighbors
        4.4.2  Lookup Routine
        4.4.3  Multicast Routine
        4.4.4  Analysis
   4.5  Discussions
        4.5.1  Group Members with Very Small Upload Bandwidth
        4.5.2  Proximity and Geography
   4.6  Simulation
        4.6.1  Throughput
        4.6.2  Throughput vs. Latency
        4.6.3  Path Length Distribution
        4.6.4  Average Path Length
        4.6.5  Impact of Dynamic Capacity Variation

5  SUMMARY

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF FIGURES

2-1  Probability of random walks escaping out of the cluster decreases exponentially with the ratio of the number of inter-cluster edges to the number of intra-cluster edges.
2-2  Markov random walks discover fewer distinct nodes than uniform random walks do.
2-3  A query scheme mixing inter-cluster queries and intra-cluster queries (the nodes in the grey clusters fall into the same interest group).
2-4  Interest similarity between nodes. The numbers of data items in nodes 1, 2, and n are 50, 100, and 200, respectively. A) No common visited nodes (different interests). B) u and v have visited node 2 (100 data items) 8 and 5 times, respectively (a certain level of similar interests). C) u and v have visited node n (200 data items) 8 and 5 times, respectively (falls in between).
2-5  Effect of average query number on the cluster size.
2-6  Interest association is a good metric to estimate interest similarity.
2-7  Percentage of returned queries within a specific hop number.
2-8  Percentage of returned queries within a specific message number.
2-9  Percentage of returned queries within a specific hop number in a less clustered network.
2-10 Percentage of returned queries within a specific message number in a less clustered network.
2-11 Number of distinct nodes discovered in the same group within a certain message range.
2-12 Number of distinct nodes discovered in the same group within a specific hop number.
2-13 Percentage of messages discovering distinct nodes within a certain message range.
2-14 Total number of distinct nodes discovered within a specific hop number.
3-1  Trustworthy probabilities for a delegation and a 5-pair delegation set with m = 3,000 are 99.975% and 99.815%, respectively. Even if a delegation/k-pair delegation set is not trustworthy, it may not be compromised, because it is very unlikely that a single colluding group can control the majority of it.
3-2  A) Protocols for contract verification and exchange. B) Money transfer. C) Prosecution.
3-4  Trustworthiness of the k-pair delegation set.
3-5  Most malicious nodes are rejected within the first 50 transactions.
3-6  The failed-transaction ratio and the overpaid-money ratio drop quickly to small percentages within the first 100 transactions.
3-7  The overpaid-money ratio (measured after 250 transactions) increases linearly with the number of dishonest nodes.
3-8  The number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.
3-9  Overpaid-money ratio with respect to the threshold.
3-10 Number of rejected nodes with respect to the threshold.
3-11 Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.
4-1  Chord vs. CAM-Chord neighbors (c_x = 3).
4-2  CAM-Koorde topology with identifier space [0..63].
4-3  Multicast throughput with respect to the average number of children per non-leaf node.
4-4  Throughput improvement ratio with respect to the upload bandwidth range.
4-5  Multicast throughput with respect to the size of the multicast group.
4-6  Throughput vs. average path length.
4-7  Path length distribution in CAM-Chord. Legend "[x..y]" means the node capacities are uniformly distributed in the range [x..y].
4-8  Path length distribution in CAM-Koorde. Legend "[x..y]" means the node capacities are uniformly distributed in the range [x..y].
4-9  Average path length with respect to average node capacity.
4-10 Proximity optimization.
4-11 Throughput vs. latency.

[1-7], in which distributed hash tables (DHTs) are used to provide data location management in a strictly structured way. Whenever a node joins/leaves the overlay, a number of nodes need to update their routing tables to preserve desirable properties for fast lookup. While structured P2P networks can offer better performance in response time and communication overhead for query procedures, they suffer from the large overhead of overlay maintenance due to network dynamics. In addition, DHTs are inflexible in providing generic keyword searches because they have to hash the keys associated with certain objects [8].

Unstructured P2P networks such as Gnutella rely on a random process, in which nodes are interconnected in a random manner. The randomness offers high resilience to network dynamics. Basic unstructured networks rely on flooding for users' queries, which is expensive in computation and communication overhead. Consequently, scalability has always been a major weakness of unstructured networks. Even with the use of supernodes in Morpheus and KaZaA, the traffic is still high and even exceeds web traffic. Searching through random walks is proposed in [8-10], in which incoming queries are forwarded to a randomly chosen neighbor. In random walks, there is typically no preference for a query to visit the nodes most likely to maintain the needed data, resulting in long response times.
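The blind random-walk style of search described above can be illustrated with a short sketch. This is our own toy version, not code from the cited systems; the graph layout, TTL, and walker count are arbitrary illustrative choices.

```python
import random

def random_walk_search(graph, source, has_item, ttl=50, walkers=4, seed=0):
    """Uniform random-walk search: each walker repeatedly forwards the
    query to a uniformly chosen neighbor, with no preference for nodes
    likely to hold the data. Returns (found, fewest hops used)."""
    rng = random.Random(seed)
    best = None
    for _ in range(walkers):
        node = source
        for hop in range(1, ttl + 1):
            node = rng.choice(graph[node])  # blind choice of next hop
            if has_item(node):
                best = hop if best is None else min(best, hop)
                break
    return best is not None, best

# Toy overlay: a 30-node ring with one chord link per node.
n = 30
graph = {i: [(i - 1) % n, (i + 1) % n, (i + 7) % n] for i in range(n)}
found, hops = random_walk_search(graph, 0, lambda v: v in (13, 22))
```

Because the next hop is always chosen blindly, a query may wander for many hops even when replicas are nearby, which is exactly the response-time weakness noted in the text.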

[11] exploits the locality of interests among different nodes. In this approach, a peer learns its shortcuts by flooding or by passively observing its own traffic. A peer ranks its shortcuts in a list and locates content by sequentially asking all of the shortcuts on the list from the top, until the content is found. The basic principle behind this approach is that a node tends to revisit accessed nodes again, since it was interested in the data items from these nodes before. The concept of interest similarity is vague, and it is difficult to make a subtle, quantitative definition based on it. In addition, it may cause new problems.

Another challenge in resource sharing systems is the incentive scheme. Today's peer-to-peer networks suffer from the problem of free-riders, who consume resources in the network without contributing anything in return. Originally it was hoped that users would be altruistic: "from each according to his abilities, to each according to his needs." In practice, altruism breaks down as networks grow larger and include more diverse users. This leads to a "tragedy of the commons," where individual players' self-interest causes the system to collapse. To reduce free-riders, systems have to incorporate incentive schemes that encourage cooperative behavior. Some recent works [12-17] propose reputation-based trust systems, in which each node is associated with a reputation established from the feedback of other nodes it has transacted with. The reputation information helps users identify and avoid malicious nodes. An alternative is virtual currency schemes [18-20], in which each node is associated with a certain amount of money. Money is deducted from the consumer of a service and transferred to the provider of the service after each transaction. Both types of schemes rely on authentic measurement of service quality and unforgeable reputation/money information. Otherwise, selfish/malicious nodes may gain advantages based on false reports. For example, a consumer may falsely claim to have not received a service in order to pay less or to defame others. More seriously, malicious nodes

[13, 21], or remove the underlying incentive for cheating [22]. In order to apply these algorithms, the nodes' history information must be managed by a central authority, which is not available in typical peer-to-peer networks. Some other works [23, 24] find circular service patterns based on the history information shared among trusted nodes. Each node in a service circle has the chance to be both a provider and a consumer. The communication overhead for discovering service circles is very high, which makes these schemes not scalable. In addition, nodes belonging to different interest groups have little chance to cooperate, because service circles are unlikely to form among them.

Besides resource sharing systems, application-level multicast is another promising application of overlay networks. Many research papers [25-28] pointed out the disadvantages of implementing multicast at the IP level [29] and argued for an application-level overlay multicast service. More recent work [30-36] studied overlay multicast from different aspects. To handle dynamic groups and ensure scalability, novel proposals were made to implement multicast on top of overlay networks. For example, Bayeux [37] and Borg [38] were implemented on top of Tapestry [3] and Pastry [4], respectively, and CAN-based Multicast [39] was implemented based on CAN [2]. El-Ansary et al. studied efficient broadcast in a Chord network, and their approach can be adapted for the purpose of multicast [40]. Castro et al. compared the performance of tree-based and flooding-based multicast in CAN-style versus Pastry-style overlay networks [41]. These systems assume each node has the same number of children; host heterogeneity is not addressed. Even though overlay multicast can be implemented on top of overlay unicast, they have very different requirements. In overlay unicast, low-capacity nodes

[42]. Note that the terms "degree" and "capacity" are interchangeable in the context of this dissertation. Centralized heuristic algorithms were proposed to balance multicast traffic among multicast service nodes (MSNs) and to maintain low end-to-end latency [42, 43]. The algorithms do not address the dynamic membership problem, such as MSN join/departure. There has been a flourish of capacity-aware multicast systems, which excel in optimizing single-source multicast trees but are not suitable for multi-source applications such as distributed games, teleconferencing, and virtual classrooms, which are the target applications of our algorithms. Bullet [44] is designed to improve the throughput of data dissemination from one source to a group of receivers. An overlay tree rooted at the source must be established. Disjoint data objects are disseminated from the source via the tree to different receivers. The receivers then communicate among themselves to retrieve the missing objects; these dynamic communication links, together with the tree, form a mesh, which offers better bandwidth than the tree alone. Overlay Multicast Network Infrastructure (OMNI) [45] dynamically adapts its degree-constrained multicast tree to minimize the latencies to the entire client set. Riabov et al. proposed a centralized constant-factor approximation algorithm for the problem of constructing a single-source degree-constrained minimum-delay multicast tree [46]. Yamaguchi et al. described a

[47]. These algorithms are designed for a single source and are not suitable when there are many potential sources (such as in distributed games). Building one tree for each possible source is too costly. Using a single tree for all sources is also problematic. First, a minimum-delay tree for one source may not be a minimum-delay tree for other sources. Second, the single-tree approach concentrates the traffic on the links of the tree and leaves the capacities of the majority of nodes (the leaves) unused, which affects the overall throughput in multi-source multicasting. Third, a single tree may be partitioned beyond repair for a dynamic group.

2.1.1 Problems in Prior Work

Small communication overhead and short response time are the two main concerns in designing efficient query schemes in peer-to-peer networks. Current approaches suffer various problems in achieving a better tradeoff between communication overhead and response time, due to the blindness of their searching procedures.

Flooding: Flooding [48, 49] is a popular query scheme to search for a data item in fully unstructured P2P networks such as Gnutella. While flooding is simple and robust, its communication overhead, that is, the number of messages, increases exponentially with the hop number. In addition, most of these messages visit nodes that have already been searched in the same query, and they can be regarded as duplicate messages. Consequently, communication overhead and scalability have always been the main problems of the flooding approach.

Random Walks: Random walks [8-10, 50] rely on query messages randomly selecting their next hops among neighbors to reduce the communication overhead. A query may have to go through many hops before it successfully locates the queried data item. Consequently, this approach takes a long time. If the network is well-clustered (nodes with similar interests are densely connected), one might expect the query latency to be reduced significantly. However, this is not true, because the chance of a random-walk message escaping out of the original cluster increases exponentially with the ratio r of the number of inter-cluster edges to the number of intra-cluster edges, as shown in Figure 2-1. In the case of a network with a small value of r (e.g., r < 0.01), if the queried data items are in different clusters from the source node, a query message has to walk a long distance before it can traverse the cluster border and locate the queried data items. In the case of a network with a large value of r (e.g., r > 0.1), query requests may escape out of the

Figure 2-1. Probability of random walks escaping out of the cluster decreases exponentially with the ratio of the number of inter-cluster edges to the number of intra-cluster edges.

original cluster within a small number of hops, resulting in a long response time if the queried data is in the original cluster. These observations are also demonstrated by our simulations in Section 3.7. Consequently, random walks may suffer long response times regardless of whether the network is well-clustered or not.

Interest-based shortcut: The interest-based shortcut approach [11] tries to avoid the blindness of random walks by favoring nodes sharing similar interests with the source, and can be regarded as a variation of Markov random walks. Markov random walks may accelerate the query process to some extent in some cases. However, they cause new problems. Suppose the nodes in an interest group have formed a cluster, and query messages can be artificially confined in this specific cluster. Since the nodes in the cluster share similar interests, any of them possibly maintains the queried data. The query procedure should shorten the covering time of the whole cluster instead of the hitting time of some specific nodes in it. However, due to the bias in selecting the next hop in Markov random walks, a query tends to keep visiting some specific nodes, resulting in fewer distinct nodes being covered compared to uniform random walks, as illustrated by Figure 2-2. Consequently, Markov random walks work worse than uniform random walks if query messages can be confined in specific clusters.
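The two regimes of r discussed above can be captured by a simple per-hop model. This is our own simplification, not the dissertation's simulation setup: we assume each forwarding step leaves the cluster independently with probability r/(1+r), where r is the inter- to intra-cluster edge ratio.

```python
def stay_probability(r, hops):
    """P(a uniform random walk is still inside its starting cluster after
    `hops` steps), assuming each step independently escapes with
    probability r / (1 + r)."""
    return (1.0 / (1.0 + r)) ** hops

def escape_probability(r, hops):
    return 1.0 - stay_probability(r, hops)

# Small r (< 0.01): walks stay trapped inside the cluster for a long time.
# Large r (> 0.1): walks escape within a small number of hops.
assert escape_probability(0.01, 10) < 0.1
assert escape_probability(0.5, 10) > 0.9
```

Under this model, data outside the cluster is slow to reach when r is small, and a walk quickly abandons its own cluster when r is large, matching both problem cases described in the text.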

Figure 2-2. Markov random walks discover fewer distinct nodes than uniform random walks do.

[11, 51, 52] have found that many peer-to-peer networks exhibit a small-world topology, and most queried data items are offered by nodes that share similar interests with the source node. Intuitively, the nodes sharing similar interests with the source node should have higher priority to be searched than others. Practically, there are two challenges in designing such a query scheme. The first one is how to construct a small-world topology to cluster nodes sharing similar interests. By saying "similar interests," we actually mean that two nodes are interested in a common set of data items. The number of commonly accessed data items can serve as a metric to measure the interest similarity between two nodes. A clustering algorithm based on the metric can be easily designed to densely connect the nodes in the same interest group. Moreover, each node u can explicitly pick a set of inter-cluster neighbors that have different interests from u, and a set of intra-cluster neighbors that share similar interests with u. Take Figure 2-3 as an example. The network consists of 5 clusters, and nodes in the same cluster fall into the same interest group. Note that there exists an interest group consisting of two clusters: 1 and 5.

In Figure 2-3, inter-queries are first initiated by a node in cluster 1 and travel among different clusters. By the interest information carried in the inter-queries, cluster 5 is found to share similar interests when it is hit, and an intra-cluster query is spawned, which then exhaustively searches the nodes in it. In addition, an intra-cluster query is spawned in cluster 2 by inter-cluster queries to support blind search.

Figure 2-3. A query scheme mixing inter-cluster queries and intra-cluster queries (the nodes in the grey clusters fall into the same interest group).

2.2.1 Measuring the Interest Similarity between Two Nodes

A cluster is generally formed by connecting nodes with similar interests in a network. We start our discussion with the definition of interest similarity between two nodes in P2P networks. If node u and node v share similar interests, then it is very likely that they have accessed the same data items previously. The size of the common subset of accessed data items can serve as a metric to measure to what extent the interests of two nodes are similar. Each node may offer hundreds of data items, and hence there may exist a large number of data items even in a small network. As a result, only if u and v have each visited a large number of data items are they able to show some degree of similarity. An alternative way to evaluate the interest similarity is by the number of commonly accessed nodes, which may enable a clustering algorithm to converge faster than the former approach. The problem with this approach is that two nodes visiting a common node does not indicate that they have similar interests, because a node may offer data items belonging to multiple interest groups. For instance, a user u may offer resources for two groups: a number of MP3 music files for one group, and a number of research papers on P2P networks for the other group. It is possible that two nodes that have visited u may be

Figure 2-4. Interest similarity between nodes. The numbers of data items in nodes 1, 2, and n are 50, 100, and 200, respectively. A) No common visited nodes (different interests). B) u and v have visited node 2 (100 data items) 8 and 5 times, respectively (a certain level of similar interests). C) u and v have visited node n (200 data items) 8 and 5 times, respectively (falls in between).

In Figure 2-4, u and v have visited one common node. But u and v in B have a better chance of having visited common data items, because the number of data items in node 2 is half of that in node n. To account for this issue, we introduce a weighted diagonal matrix W with the (i, i)-th value w_{i,i} equal to 1 ... (take Figure 2-4(B) as an example):

... Figure 2-4(C) is 0.002, which means the nodes in A show more interest similarity. From the definition, we have ... If we view f_{u,i} and f_{v,i} as the probabilities of nodes u and v visiting node i, and 1 ...
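The weighting idea, down-weighting common visits to nodes that offer many data items, can be sketched as follows. The exact definition of W and of the association metric is truncated in the extracted text, so the 1/(items offered) weight and the function name below are our reading, not the dissertation's verbatim formula.

```python
def interest_similarity(freq_u, freq_v, items_per_node):
    """Weighted interest association between nodes u and v.
    freq_u[i] = number of times u visited node i. Each commonly visited
    node is down-weighted by the number of data items it offers, so
    sharing a small, specialized node counts for more than sharing a
    large, mixed one."""
    common = set(freq_u) & set(freq_v)
    return sum(freq_u[i] * freq_v[i] / items_per_node[i] for i in common)

# The three cases of Figure 2-4 (nodes offering 50, 100, 200 items):
items = {1: 50, 2: 100, 'n': 200}
a = interest_similarity({1: 8}, {2: 5}, items)      # A) no common node
b = interest_similarity({2: 8}, {2: 5}, items)      # B) share node 2
c = interest_similarity({'n': 8}, {'n': 5}, items)  # C) share node n
assert a == 0 and b > c > 0
```

The ordering a < c < b reproduces the qualitative ranking in the figure caption: no common node, a shared large node, and a shared small node, in increasing order of similarity.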

... Section 3.1. To exploit the characteristics of the small-world topology, our approach is to explicitly capture the clusters in the underlying topology, by having each node i maintain a set of inter-cluster neighbors in other interest groups and a set of intra-cluster neighbors in its own interest group.

2.3.1 Mixing Inter-Cluster Queries and Intra-Cluster Queries

By explicitly capturing the cluster structures in the underlying network, we can formally define the following three types of query messages. The first one is called the l-query message, denoted as M_l, which is a special type of inter-cluster message traveling only on inter-cluster neighbors. Its purpose is to quickly locate the clusters that may share similar interests with the source node and to disperse intra-cluster queries among different clusters. Messages of this type are issued by the source node and walk among different clusters randomly. Moreover, if the queried data is in the source node's interest group, l-query messages should piggyback the source node's frequency vector, so that nodes hit by the messages can determine whether their clusters share similar interests with the source node.

The second one is called the s-query message, denoted as M_s, which is a special type of intra-cluster message confining itself within a specific cluster by doing uniform random walks only on intra-cluster neighbors. s-query messages are only spawned/issued in the clusters that share similar interests with the source node. Their purpose is to exhaustively search the nodes that fall into the same interest group as the original node. Messages of this type are issued by the source node if the query is in the same interest group, and/or spawned by l-query messages if clusters sharing similar interests with the source node are hit. In order to reduce the number of duplicate s-query messages, each node having received the message should be able to estimate to what extent the cluster has been covered. If most nodes have been visited, an s-query message has little chance to discover any new nodes by continuing to walk in the cluster, indicating that the newly received message should be discarded. Otherwise, the message should be forwarded.
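One forwarding step of the two message types above might look like the following sketch. The Node fields, the placeholder similarity test, and the message layout are our illustrative choices; the dissertation's actual similarity test is the interest-association metric of Section 2.2.

```python
import random

class Node:
    def __init__(self, inter, intra, vector):
        self.inter = inter      # inter-cluster neighbors (other groups)
        self.intra = intra      # intra-cluster neighbors (same group)
        self.vector = vector    # this node's interest vector (a set here)

    def similar_to(self, vec, threshold=0.5):
        # Placeholder similarity test on interest vectors.
        return len(self.vector & vec) / max(len(vec), 1) >= threshold

def step(node, msg, rng):
    """Forward one message one hop.
    An l-query walks only inter-cluster links and spawns an s-query when
    it hits a cluster similar to the source; an s-query does a uniform
    random walk confined to intra-cluster links.
    Returns (spawned s-query or None, (next_hop, message) or None)."""
    spawned = None
    if msg['kind'] == 'l' and msg['vec'] and node.similar_to(msg['vec']):
        spawned = {'kind': 's', 'ttl': msg['ttl'], 'vec': None}
    neighbors = node.inter if msg['kind'] == 'l' else node.intra
    if msg['ttl'] > 0 and neighbors:
        return spawned, (rng.choice(neighbors), dict(msg, ttl=msg['ttl'] - 1))
    return spawned, None

rng = random.Random(0)
node = Node(inter=[1, 2], intra=[3], vector={'a', 'b'})
sp, fwd = step(node, {'kind': 'l', 'ttl': 3, 'vec': {'a', 'b'}}, rng)
```

Note how the l-query carries the source's interest vector so that a hit node can decide locally whether to spawn an s-query, mirroring the piggybacked frequency vector described in the text.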

Figure 2-5. Effect of average query number on the cluster size.

We compare our scheme to random walks, in which a source node issues 32 random-walk messages in each query, correspondingly. We have also compared our scheme to flooding schemes. As expected, we observe that the flooding schemes suffer from very large communication overhead. In the figures, the legend "Uniform random walks (0)" / "Uniform random walks (1)" indicates that the queried data items are out of / in the source node's interest group in the uniform random walks query scheme, and similarly "Inter-intra (0)" / "Inter-intra (1)" indicates that the queries are out of / in the source node's interest group in the proposed scheme.

First, we study the effectiveness of the metric measuring interest similarity and of the clustering algorithm. In Figure 2-5, it is observed that when the average query number is larger than 10, the algorithm reaches a stable state and almost all nodes in the same interest group form a single cluster. It indicates that our algorithm converges fast with the average number of queries, which is especially useful in P2P networks, where nodes tend to join/leave the system frequently. From Figure 2-6, it can be observed that the average number of nodes in a cluster is almost the same as the group size, demonstrating that A^W_{u,v} can effectively measure the nodes' interest similarity.

Figure 2-6. Interest association is a good metric to estimate interest similarity.

Figure 2-7. Percentage of returned queries within a specific hop number.

Figure 2-8. Percentage of returned queries within a specific message number.

Second, we study the performance of our scheme with respect to query latency and communication overhead. In Figure 2-7, it is observed that if the queried data items fall into the original node's interest group, the number of hops needed for the majority of the queries is significantly reduced, to about 20, while uniform random walks take a much longer time. Correspondingly, the number of messages is also much smaller in our scheme than in random walks, as shown in Figure 2-8. The figures also show that if the queried data are out of the source node's interest group, the performance of our scheme is similar to uniform random walks. Note that a longer response time is acceptable, since only a few queries will be out of the source node's interest group in many P2P networks. In addition, these two figures also demonstrate that random walks for queries in the source node's interest group can only benefit marginally from the underlying clustered topology; that is, only a slightly larger percentage of them can be returned, compared with those out of the source node's interest group, within the same number of hops (messages). We have also studied the performance of a network in which each group consists of multiple different clusters, as shown in Figure 2-9 and Figure 2-10. The results show similar trends, which remain true with respect to all other metrics that will be studied later. Moreover, comparing Figure 2-7 with Figure 2-9, and Figure 2-8 with Figure 2-10, it can

Figure 2-9. Percentage of returned queries within a specific hop number in a less clustered network.

Figure 2-10. Percentage of returned queries within a specific message number in a less clustered network.

Figure 2-11. Number of distinct nodes discovered in the same group within a certain message range.

be observed that the performance of random walks in two different (well-clustered and poorly-clustered) networks is similar, which further verifies our argument in Section 3.1. As has been observed, when the queried data items are in the source node's interest group, our scheme works much better than random walks. The reason behind this is that our scheme can discover more distinct nodes in the source node's interest group within the same number of messages or hops, as shown in Figure 2-11 and Figure 2-12. In the figures, it can be observed that within the first 1,000 messages or 30 hops, more than 120 nodes in the source node's interest group have been searched by query messages. Consequently, the majority of queried data falling into the node's interest group can be found with smaller overhead and shorter latency. It also indicates that our scheme has a stronger ability to locate more replicas, since it can discover a much larger number of nodes sharing similar interests.

Occasionally, the queried data item may be maintained by nodes in other interest groups, or be classified into the wrong interest group by the source node. In the former case, l-queries will not carry any interest information, but in the latter case, l-queries will carry wrong interest information. In both cases, the efficiency of our query scheme can be evaluated

Figure 2-12. Number of distinct nodes discovered in the same group within a specific hop number.

Figure 2-13. Percentage of messages discovering distinct nodes within a certain message range.

Figure 2-14. Total number of distinct nodes discovered within a specific hop number.

by the number of distinct nodes discovered by queries, including those out of the source node's interest group, within a certain number of messages and hops. Note that whether the queries are in or out of the original node's interest group makes no difference to random walks. Figure 2-13 and Figure 2-14 show that in the first 1,000 messages, if the queries carry interest information, fewer distinct nodes can be searched in our scheme. The reason is that s-query messages mistakenly and exhaustively search the nodes in the clusters that share "similar" interests in the beginning, which has been demonstrated by our previous simulations. Consequently, the number of b-query messages is limited. Along with the increase of the number of messages/hops, our scheme works similarly to uniform random walks. This is because, after most of the nodes sharing similar interests are covered, more b-query messages will be spawned to search clusters with different interests, which are able to discover more distinct nodes. In addition, as shown by the figures, if the queries carry no interest information, our scheme works similarly to uniform random walks.

3.1.1 Limitation of Prior Work

Any node in a peer-to-peer network is both a service provider and a service consumer. It contributes to the system by working as a provider, and benefits from the system as a consumer. A transaction is the process of a provider offering a service to a consumer, such as supplying a video file. The purpose of an incentive scheme is to encourage the nodes to take the role of providers. Neither reputation systems [12-17] nor virtual currency systems [18, 19] can effectively prevent malicious nodes, especially those in collusion, from manipulating their history information by using false service reports. Specifically, the existing schemes have the following problems.

Reputation inflation: In the reputation schemes, malicious nodes can work together to inflate each other's reputation or to defame innocent nodes, by which colluding nodes protect themselves from the complaints of innocent nodes, as these complaints may be treated as noise by the systems.

Money depletion: In the virtual currency schemes, malicious nodes may launch attacks to deplete other nodes' money and paralyze the whole system. Without authentic reputation information, innocent nodes are not able to proactively select benign partners and avoid malicious ones.

Frequent complainer: In many incentive schemes, nodes will be punished if they complain frequently, which prevents malicious nodes from constantly defaming others at no cost. It also discourages innocent nodes from reporting frequent malicious acts, because otherwise they would become frequent complainers.

Punishment scale: In most existing schemes, the scale of punishment is related to the service history of the transaction participants. Consequently, an innocent node may be subject to negative discrimination attacks [12] launched by nodes with an excellent history.

[1, 2, 4]. We assume the routing protocol is robust, ensuring the reliable delivery of messages in the network [53]. We also assume the networks have the following properties.

Random, non-selectable identifier: A node cannot select its identifier, which should be arbitrarily assigned by the system. This requirement is essential to defending against the Sybil attack [54]. One common approach is to hash a node's IP address to derive a random identifier for the node [1].

Public/private key pair: Each node A in the network has a public/private key pair, denoted as P_A and S_A, respectively. A trusted third party is needed to issue public-key certificates. The trusted third party is used off-line once per node for certificate issuance, and it is not involved in any transaction.
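The identifier-assignment rule, hashing the node's IP address as in Chord [1], is easy to sketch. SHA-1 and the 160-bit space below follow Chord's convention; the dissertation itself does not fix a particular hash function.

```python
import hashlib

def node_id(ip: str, bits: int = 160) -> int:
    """Derive a random, non-selectable identifier by hashing the node's
    IP address (SHA-1, giving a 160-bit Chord-style identifier space)."""
    digest = hashlib.sha1(ip.encode()).digest()
    return int.from_bytes(digest, 'big') % (1 << bits)

# A node cannot pick its identifier: it is determined by its address.
assert node_id('10.0.0.1') == node_id('10.0.0.1')
assert node_id('10.0.0.1') != node_id('10.0.0.2')
```

Because the identifier is a deterministic function of the address, a node cannot position itself at a chosen point of the identifier space, which is what makes the construction useful against Sybil-style placement attacks.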

3.3.1 Delegation

Who will keep track of the money/reputation information in a P2P network? In the absence of a central authority for this task, we design a distributed authority infrastructure. Each node A is assigned a delegation, denoted as D_A, which consists of k nodes picked pseudo-randomly. For example, we can apply k hash functions, i.e., {h_1, h_2, ..., h_k}, to the identifier of node A to derive the identifiers of the nodes in D_A. If a derived identifier does not belong to any node currently in the network, the "closest" node is selected. For example, in [1], it will be the node clockwise after the derived identifier on the ring. The j-th element of D_A is denoted as D_A(j).

D_A keeps track of A's money/reputation. Any anomaly in the information stored at the delegation members may indicate an attempt to forge data. The information is legitimate only if the majority of the delegation members agree on it. As long as the majority of the delegation members are honest, the information about node A cannot be forged. Such a delegation is said to be trustworthy. On the other hand, if at least half of the members are dishonest, then the delegation is untrustworthy.

The delegation members are appointed pseudo-randomly by the system. A node cannot select its delegation members, but can easily determine who the members of its own or any other node's delegation are. To compromise a delegation, the malicious/selfish nodes from a colluding group must constitute the majority of the delegation. Unless the colluding group is very large, the probability for this to happen is small, because the identifiers of the colluding nodes are randomly assigned by the system and the identifiers of the delegation are also randomly assigned. Let m be the size of a colluding group and n be the total number of nodes in the system. The probability for t out of k nodes to be in the colluding group is

    P(t; k, m/n) = C(k, t) (m/n)^t (1 - m/n)^(k-t)
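The delegation construction, k hash functions applied to A's identifier with each result mapped to the closest live node clockwise on the ring, can be sketched as follows. The salted-SHA-1 hash family and the 32-bit toy identifier space are our choices; the dissertation does not fix a particular construction.

```python
import hashlib
from bisect import bisect_left

SPACE = 1 << 32   # toy identifier space; Chord-style systems use 2^160

def h(j, ident):
    """j-th member of a hash-function family, built by salting SHA-1
    with j (one common construction; assumed here for illustration)."""
    d = hashlib.sha1(f'{j}:{ident}'.encode()).digest()
    return int.from_bytes(d[:4], 'big') % SPACE

def delegation(ident, live_nodes, k=5):
    """Members of D_A: apply k hash functions h_1..h_k to A's identifier
    and map each derived identifier to the closest live node clockwise
    on the ring (as in [1])."""
    ring = sorted(live_nodes)
    members = []
    for j in range(1, k + 1):
        target = h(j, ident)
        i = bisect_left(ring, target)
        members.append(ring[i % len(ring)])   # wrap past the largest id
    return members

live = [h(0, f'node{i}') for i in range(50)]
d_a = delegation('A', live, k=5)
```

Because both the hash family and the node identifiers are fixed by the system, any node can recompute any other node's delegation, yet no node can steer its own delegation membership.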


Figure 3-1. Trustworthy probabilities for a delegation and a 5-pair delegation set with m = 3,000 are 99.975% and 99.815%, respectively. Even if a delegation/k-pair delegation set is not trustworthy, it may not be compromised because it is very unlikely that a single colluding group can control the majority of them.

where P(t; k, m/n) denotes the probability of t successes in k trials in a Binomial distribution with the probability of success in any trial being m/n. Let m be the total number of distinct nodes in all colluding groups, also including all malicious nodes. The probability of a delegation being trustworthy is at least the sum of P(t; k, m/n) for t from 0 to floor((k-1)/2), as shown in Figure 3-1 (the upper curve). In order to control the overhead, we shall keep the value of k small.
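The quoted trustworthiness figures can be reproduced from this formula. The sketch below computes P(t; k, m/n) and the probability that a 5-member delegation stays trustworthy with m = 3,000 colluders; n = 100,000 is an assumption consistent with the simulation setup (100,000 delegations), and the result matches the ~99.975% figure up to rounding.

```python
from math import comb

def p_colluders(t: int, k: int, q: float) -> float:
    """P(t; k, q): probability that exactly t of the k pseudo-randomly
    appointed members fall into a colluding fraction q = m/n."""
    return comb(k, t) * q**t * (1 - q) ** (k - t)

def p_trustworthy(k: int, m: int, n: int) -> float:
    """A delegation is trustworthy when fewer than half of its k members
    are dishonest, i.e. at most (k - 1) // 2 of them."""
    q = m / n
    return sum(p_colluders(t, k, q) for t in range((k - 1) // 2 + 1))

# k = 5 delegation, m = 3,000 colluders; n = 100,000 total nodes is an
# assumed value consistent with the simulations.
print(round(p_trustworthy(5, 3_000, 100_000) * 100, 3))  # 99.974
```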


Figure 3-1 (the lower curve).

3.4.1 Money and Reputation

With the distributed authority designed in the previous section, the following information about a node A is maintained by a delegation of k nodes.

Total money (TMA): It is the total amount of money paid by others to node A minus the total amount of money paid to others by A in all previous transactions. The universal refilled money (Section 3.6.2) will also be added to this variable.

Overpaid money (OMA): It is the total amount of money overpaid by consumers. A consumer pays money to node A before a service. If the service contract is not fulfilled by the transaction, the consumer may file a complaint, specifying the amount of money that it has overpaid. This amount cannot be greater than what the consumer has paid.

When a new node joins the network, its total money and overpaid money are initialized to zero. From TMA and OMA, we define the following two quantities.

Available money (mA): It is the amount of money that node A can use to buy services from others.



[55]. A (k, t) threshold cryptography scheme allows k members to share the responsibility of performing a cryptographic operation, so that any subgroup of t members can perform this operation successfully, whereas any subgroup of fewer than t members cannot. For digital signatures, k shares of the private key are distributed to the k members. Each member generates a partial signature by using its share of the key. After a combiner receives at least t partial signatures, it can construct the complete signature.
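The (k, t) threshold idea can be illustrated with toy Shamir secret sharing, the polynomial technique underlying threshold cryptosystems such as [55]. This is a didactic sketch, not the signature scheme used by MARCH: the prime field and the parameters below are arbitrary illustrative choices.

```python
import random

# Toy Shamir secret sharing over a prime field: any t of the k shares
# reconstruct the secret, while fewer than t reveal nothing about it.
P = 2**31 - 1  # arbitrary prime field size for illustration

def make_shares(secret: int, k: int, t: int) -> list:
    """Split `secret` into k shares with reconstruction threshold t."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    f = lambda x: sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, k + 1)]

def reconstruct(shares: list) -> int:
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for xi, yi in shares:
        num, den = 1, 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(1234, k=5, t=3)
assert reconstruct(shares[:3]) == 1234  # any t = 3 shares suffice
assert reconstruct(shares[2:]) == 1234
```

A threshold signature scheme applies the same sharing to the signing key, so each member produces a partial signature with its share.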


MSG1  Alice -> DA(i): [{SA(i)}PDA(i)]SA, for all DA(i) in DA

Step 2: After all members receive their key shares, they negotiate a common random number s (possibly by multi-party Diffie-Hellman exchange with authentication). Each member sends the number s as a challenge to Alice, signed by the member's private key and then encrypted by Alice's public key.

MSG2  DA(i) -> Alice: {[s]SDA(i)}PA, for all DA(i) in DA

Step 3: Alice signs s with SA(i) and then with SA before sending it back to DA(i).

MSG3  Alice -> DA(i): [[s]SA(i)]SA, for all DA(i) in DA


MSG5  DA(i) -> DA(j): ["SA(i) is invalid"]SDA(i), for all DA(j) in DA

Step 5: DA(i) needs to collect [s]SA(j), for all DA(j) in DA, which are the partial signatures on s. If it receives MSG4 [[s]SA(j)]SA from DA(j), the value of [s]SA(j) is in the message. If it receives MSG5 from DA(j), there are two possibilities: either Alice or DA(j) is dishonest. To resolve this situation, DA(i) forwards the certified complaint to Alice. If Alice challenges the complaint, she must disclose the correct value of SA(j) to DA(i) in the following message (then DA(j) can learn SA(j) from DA(i)).

MSG6  Alice -> DA(i): [{SA(j)}PDA(i)]SA

Learning SA(j) from this message, DA(i) can compute [s]SA(j). After DA(i) has all k partial signatures on s, it can determine that Alice is honest if any floor((k+1)/2) of them combine into a valid signature on s.

Proof.



Figure 3-2. A) Protocols for contract verification and exchange. B) Money transfer. C) Prosecution.

later times. The delegations must verify the information claimed by Alice and Bob in the contract and generate the contract proofs that Alice and Bob need in order to continue their transaction. We design a contract verification protocol to implement the above requirements. The protocol consists of four steps, illustrated in Figure 3-2 (A), and the number of messages is O(k) for normal cases. A procedure call is denoted as x.y(z), which means to invoke procedure y at node x with parameter(s) z. If x is a remote node, a signed message carrying z must be sent to x first.

Step 1: Alice sends the contract c and a digital signature c' to the delegation DA for validation. c' may be a signature of the contract concatenating the identifier of the receiver, i.e., c' = [c|DA(i)]SA. Bob does the same thing.

Alice.SendContract(Contract c, Signature c')
1. for i = 1 to k do
2.     DA(i).ComputePartialProof(c, c')

Step 2: Then the delegation member DA(i) verifies the reputation claimed by Alice in the contract (denoted as c.rA) and computes a partial signature (denoted as psi) on the contract with its key share established by the previous protocol.



Proof.

1. Bob over-claims its available money mB. In this case, all honest members in DB can detect Bob's fraud in Step 1 and punish Bob. In the meantime, these members will neither forward the partial signature [c]SB(i) to DA(i) nor deliver [c]SA(i) to Bob, and consequently the transaction will be aborted.

2. Bob modifies the contract specification, for example, by lowering the transaction price c.L in order to pay less for the transaction. He sends the same modified


3. Bob sends different modified contracts to the delegation members. Multiple delegation members will detect that the contracts from Bob and Alice are different, and they will invoke the detect() routine. At the end of the detection procedure, all members will learn that there are different contract signatures coming from Bob. Consequently, they will all punish Bob and abort the transaction.

4. Bob does not send the contracts to some (or all) delegation members. In this case, a member DB(i) that does not receive the contract from Bob will send the request REQ to all other members. If no other member receives the contract from Bob, DB(i) will receive no reply back. It will refuse to continue the verification process, but will not punish Bob because either Alice or Bob can be lying. Similarly, all other members will also stop the verification. Now if some other members have received the contracts from Bob, DB(i) will receive Bob's signatures in the replies from them, and it will continue the verification process using the contracts retrieved from other members. No one stops the verification process.

In summary, we can see that all members in the trustworthy set will take the same action (continuing or stopping the contract verification process) in all four possible cases. Next, we prove that dishonest members cannot deceive honest members in the trustworthy sets into stopping the verification process if both Alice and Bob are honest. As we have discussed above, only in two cases will an honest delegation member, DA(i) or DB(i), stop the contract verification. One case is that the contract signatures (both Alice's and Bob's) received by the member in the detect() routine are not identical. The other case is that the member receives different contracts from Alice and Bob in Step 3. The


3.1 and the discussions above, if Alice and Bob are honest, both of them are able to collect no less than floor((k+1)/2) correct partial signatures and compute the valid contract proofs. Otherwise, if either Alice or Bob is dishonest, the transaction is aborted.

Figure 3-2 (B). Upon receiving a money transfer request from Alice, the delegation member DA(i) invokes the following procedure.


Figure 3-2 (C). The request specifies the amount of money f that Bob thinks he has overpaid. Upon receiving a prosecution request from Bob, if DA cannot evaluate the service quality, it punishes both Alice and Bob by freezing the money overpaid by Bob. The procedure is given as follows.


3.5.1 System Properties

We study the properties of MARCH, which solves or alleviates the problems in the previous approaches. First, according to the money transfer procedures in Section 3.4.4, transactions among members in the same colluding group cannot increase the total amount of available money of the group. We have the following property, which indicates that the malicious nodes cannot benefit by cooperation.

[12, 13, 15, 16], MARCH does not maintain the history of any consumer's complaints, and does not punish frequent complainers. We have the following property.


2, a deceived consumer can seek revenge with no restriction, which means a malicious provider cannot benefit from its action. We have the following properties.

Property 3 removes financial incentives to cheat. A provider can make money only by serving others; a consumer will not be refunded for cheating. Property 4 makes sure that an innocent node will not be subject to negative discrimination attacks [12], in which nodes with excellent reputation can severely damage other nodes.

In summary, the malicious nodes cannot increase their power (in terms of available money) by cooperation, and they can only attack others at the cost of their own interests, i.e., money and/or reputation. Consequently, the total damage caused by the malicious nodes is strictly limited. They will eventually be rejected from the system due to poor reputation, or be forced to serve others for better reputation in order to stay in the system.

[12]. Unfairly high ratings: The members of a colluding group cooperate to artificially inflate each other's reputation by false reports, so that they can attack innocent nodes more effectively. In MARCH, a colluding group can inflate the reputation of some members only by moving the available money from other members to them. According


1, the total money in the group cannot be inflated through cooperation. Although some members' reputation can be made better, other members' reputation will become worse, making them ineffective in attacks.

Unfairly low ratings: Providers collude with consumers to "bad-mouth" other providers that they want to drive out of the market. Because MARCH requires all consumers to pay money for their transactions before they can defame the providers, the malicious consumers lose their money (and reputation) for "bad-mouthing", which in turn makes it harder for them to stay in the system.

Negative discrimination: A provider discriminates against only a few specific consumers by offering services with much lower quality than what the contract specifies. It hopes to earn some "extra" money without damaging its reputation since it serves most consumers honestly. In MARCH, a provider cannot make such "extra" money because of the prosecution mechanism and Properties 2-3.

Positive discrimination: A provider gives an exceptionally good service to a few consumers with high reputation and an average service to the rest of the consumers. This strategy will work in an incentive scheme where a consumer's ability to affect a provider's reputation is highly related to the consumer's own reputation, and vice versa. MARCH does not have this problem. The provider's reputation change after a transaction is determined by how much money it receives for the service, not by the reputation of the consumer.


3.3.1. Take Chord [1] as an example. We can select a subset of the log n neighbors of node A as the delegation DA. In this way, the maintenance of the delegation is free, as Chord already maintains the neighbor set.

The communication overhead of a transaction (excluding the actual service) consists of O(k) control messages, which are sent from the provider (consumer) via k pairs of


Figure 3-3. Trustworthiness of delegation

Figure 3-4. Trustworthiness of k-pair delegation set

delegation members to the consumer (provider) through direct TCP connections. This overhead is quite small compared to typical services such as downloading video files of many gigabytes or sharing storage for months. More importantly, the overhead does not increase with the network size, which makes MARCH a scalable solution, compared with other schemes [23, 24] whose overhead increases with the network size.


(4-2). If max is negative, then the node can no longer be a provider. (If a dishonest node in Category two or three finds that its max value may become negative after additional malicious acts, it will behave honestly.) The actual selling price is a random number taken uniformly from (0, max]. If a node can neither be a provider (due to poor reputation) nor be a consumer (due to little money), it is said to be rejected from the system. If one participant in a transaction tries to deceive the other one, the transaction is called a failed transaction. We define "failed transaction ratio" as the number of failed transactions divided by the total number of transactions, and "overpaid money ratio" as the total amount of overpaid money divided by the total amount of money paid in the transactions. These metrics are used to assess the overall damage caused by dishonest nodes.


Figure 3-5. Most of the malicious nodes are rejected within the first 50 transactions.

Figure 3-3 shows the number of untrustworthy delegations with respect to the number of dishonest nodes for k = 3, 5, and 7. Recall that a delegation is untrustworthy if at least half of its members are dishonest. Out of 100,000 delegations, only a few of them are untrustworthy. For k = 5, the number of nodes with untrustworthy delegations is just 23 even when there are 3,000 dishonest nodes. Figure 3-4 shows the probability for an arbitrary k-pair delegation set to be untrustworthy (Section 3.3.2). The 5-pair delegation set is trustworthy with a probability larger than 99.8% even when there are 3,000 dishonest nodes. Note that when a delegation is untrustworthy, the dishonest members may not belong to the same colluding group. Without cooperation, the damage they can cause will be smaller.

Figure 3-5 presents how the number of rejected nodes changes with the average number of transactions performed per node, which can be used as the logical time as the simulation progresses. Recall that the default number of dishonest nodes is 1,000. The figure shows that most dishonest nodes are rejected from the system within 50 transactions per node.


Figure 3-6. The failed transaction ratio and the overpaid money ratio drop quickly to small percentages within the first 100 transactions.

Because of money refilling, some rejected nodes will recover after enough money is refilled, but they will be rejected again after performing malicious transactions. No honest nodes are rejected from the system during the simulation.

Figure 3-6 shows that the failed transaction ratio drops quickly from 1.4% to 0.3% within the first 100 transactions per node, and the overpaid money ratio drops from 1.4% to 0.2% in the same period. As time progresses, these ratios become even more insignificant. Note that the overpaid money ratio is smaller than the failed transaction ratio. This is because the dishonest providers have to lower their prices in order to compete with honest providers, which in turn lowers their ability to cause significant damage. Ironically, if a dishonest node with poor reputation wants to stay in the system, not only does it have to behave honestly to gain reputation, but it also has to do so at a lower price in order to attract consumers, which "repairs" the damage it previously did to the system.

Next, we study how the number of dishonest nodes affects the system performance. Figure 3-7 shows the overpaid money ratio after 250 transactions per node. We find that the ratio increases linearly with the number of dishonest nodes. Even when there are 3,000


Figure 3-7. Overpaid money ratio (measured after 250 transactions) increases linearly with the number of dishonest nodes.

Figure 3-8. Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.


Figure 3-9. Overpaid money ratio with respect to the threshold

Figure 3-10. Number of rejected nodes with respect to the threshold

dishonest nodes, the overpaid money ratio remains very small, just 0.15%. Figure 3-11 shows that the more dishonest nodes there are, the more of them are rejected.

Last, we study the impact of the threshold on the system performance. The threshold is used by a consumer to select the potential providers (Section 3.4.2). Figure 3-9 shows that the overpaid money ratio decreases linearly with the threshold value, which means the system performs better with a larger threshold. Figure 3-10 shows that the number of rejected dishonest nodes is largely insensitive to the threshold value. When the threshold is too low, some honest nodes may be rejected by the system because a smaller threshold allows the dishonest nodes to do more damage to the honest nodes, which may even


Figure 3-11. Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.

cause some honest nodes to be rejected from the system due to defamed reputation. The numbers in the above two figures are measured after 250 transactions per node.


[1] and Koorde [56] to be capacity-aware,


[1] and Koorde [56] for this purpose. The resulting systems are called CAM-Chord and CAM-Koorde, respectively. The principles and techniques developed should be easily applied to other P2P networks as well. A CAM-Chord or CAM-Koorde overlay network is established for each multicast group. All member nodes (i.e., hosts of the multicast group) are randomly mapped by a hash function (such as SHA-1) onto an identifier ring [0, N-1], where the next


1/2, 1/4, 1/8, ..., of the ring away from x, respectively. When receiving a lookup request for identifier k, a node forwards the request to the neighbor closest to k. This greedy algorithm takes O(log2 n) hops with high probability to find k^, the node responsible for k.

Each node in Koorde has m neighbors. A node's identifier x is represented as a base-m number. Its neighbors are derived by shifting one digit (with value 0..m-1) into x from the right side and discarding x's leftmost digit to maintain the same number of digits. When x receives a lookup request for k, the routing path of the request represents a transformation from x to k by shifting one digit of k at a time into x from the right until the request reaches the node responsible for k. Because k has O(log_m n) digits, it takes O(log_m n) hops with high probability to resolve a lookup request. Readers are referred to the original papers for more details.

Our first system is CAM-Chord, which is essentially a base-c_x (instead of base-2) Chord with c_x variable for different nodes. The number of neighbors of a node x is
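The greedy Chord lookup described above can be sketched in a few lines of Python. The 2^10 identifier ring and the simplified finger tables are illustrative assumptions; each hop at least halves the remaining clockwise distance, giving the O(log2 n) hop bound.

```python
# A minimal sketch of Chord's greedy lookup on an identifier ring of
# size N = 2^10. Each node keeps fingers at clockwise distances
# 1, 2, 4, ..., N/2.
N = 2**10

def dist(a: int, b: int) -> int:
    """Clockwise distance from a to b on the ring."""
    return (b - a) % N

def successor(ring: list, k: int) -> int:
    """The node responsible for identifier k: the first node at or
    after k in the clockwise direction."""
    return min(ring, key=lambda n: dist(k, n))

def lookup(ring: list, start: int, k: int):
    """Greedily forward to the finger closest to the responsible node
    without overshooting; returns (responsible node, hop count)."""
    target = successor(ring, k)
    node, hops = start, 0
    while node != target:
        fingers = [successor(ring, (node + 2**i) % N) for i in range(10)]
        nxt = min(fingers, key=lambda f: dist(f, target))
        if nxt == node:          # safety: no progress possible
            break
        node, hops = nxt, hops + 1
    return node, hops
```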


[57]. The two systems achieve their best performance under different conditions. Our simulations show that CAM-Chord is a better choice when the node capacities are small and CAM-Koorde is better when the node capacities are large.


[1]) do. There are NO data items associated with the identifier space. Each multicast group forms its own CAM-Chord network, whose sole purpose is to provide an infrastructure for dynamic capacity-aware multicasting.

1/2, 1/4, 1/8, ..., of the ring away from x. CAM-Chord is a base-c_x Chord with variable c_x for different nodes. Let c_x^i mean (c_x)^i. The neighbor identifiers are

x_{i,j} = (x + j * c_x^i) mod N, for all j in [1..c_x - 1], for all i in [0..floor(log N / log c_x)]   (4-1)

and, for a target identifier k, the sequence number of k with respect to x at level i is

j = floor((k - x) / c_x^i)   (4-2)
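Under the neighbor definition reconstructed above, the neighbor set of a CAM-Chord node can be enumerated directly. The ring size N = 27 and the dictionary layout are illustrative assumptions; a node with capacity c keeps (c - 1) neighbors per level, roughly (c - 1) * log_c N in total.

```python
# Enumerate the CAM-Chord neighbor identifiers of node x with
# capacity c on a ring of size N: x_{i,j} = (x + j*c^i) mod N for
# j in [1..c-1] and every level i with c^i < N.
def cam_chord_neighbors(x: int, c: int, N: int) -> dict:
    nbrs = {}
    i = 0
    while c**i < N:                    # levels 0 .. floor(log_c N)
        for j in range(1, c):
            nbrs[(i, j)] = (x + j * c**i) % N
        i += 1
    return nbrs

nbrs = cam_chord_neighbors(x=0, c=3, N=27)
# (c - 1) neighbors per level: {1, 2}, {3, 6}, {9, 18}
```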


Figure 4-1. Chord vs. CAM-Chord neighbors (c_x = 3)

It can be easily verified that x_{i,j} is the neighbor identifier of x that is counter-clockwise closest to k, which means x^_{i,j} is the neighbor node of x that is counter-clockwise closest to node k^.

4.  i <- floor(log(k - x) / log c_x)
5.  j <- floor((k - x) / c_x^i)
6.  if k in (x, x^_{i,j}] then


[1]. The join operation of Chord can be optimized because two consecutive nodes on the ring are likely to have similar neighbors. When a new node joins, it first performs a lookup to find its successor and retrieves its successor's neighbors (called fingers in Chord). It then checks those neighbors to make corrections if necessary. In a base-c Chord, the join complexity without the optimization is O(c log^2 n / log c) by Theorem 1 (to be proved). The join complexity is O(c_x log^2 n / log c_x), which would be reduced to O(c log^2 n


[58] showed that over 20% of the connections last 1 minute or less and 60% of the IP addresses keep active for no more than 10 minutes each time after they join the system. But CAM-Chord is not designed for file-sharing applications. One appropriate CAM-Chord application is teleconferencing, which has far fewer participants than FastTrack and less dynamic membership changes. We do not expect the majority of participants to keep joining and departing during a conference call. Another application is distributed games, where a user is more likely to play for an hour than for one minute. CAM-Chord makes a tradeoff between capacity awareness and maintenance overhead, which makes it unsuitable for highly dynamic multicast groups. For them, CAM-Koorde is a better choice because a node only has c_x neighbors. Our future research will attempt to develop new techniques to overcome this limitation of CAM-Chord.


5.  j <- floor((k - x) / c_x^i)     /* select children from level-i neighbors preceding k */
6.  k' <- k
7.  for m = j downto 1
8.      x^_{i,m}.MULTICAST(msg, k')
9.      k' <- x_{i,m} - 1
        /* select (c_x - j - 1) children from level-(i-1) neighbors */
10. l <- c_x
11. for m = c_x - j - 1 downto 1
12.     l <- l * c_x
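The divide-and-conquer strategy behind MULTICAST can be sketched as follows. This is a simplified illustration under assumed names, not the exact routine of the thesis: a node asked to cover the member segment (x, k] splits it into at most c contiguous subsegments and hands each one to its first node, which recurses, so every member receives the message exactly once.

```python
import random

def multicast(ring, x, k_index, c, delivered):
    """Deliver to every node in the segment (x, ring[k_index]]: split
    the members into at most c contiguous subsegments and hand each to
    its first node, which recurses on the rest of its subsegment."""
    xi = ring.index(x)
    members = ring[xi + 1 : k_index + 1]
    if not members:
        return
    step = -(-len(members) // c)          # ceil(len/c) -> at most c parts
    for start in range(0, len(members), step):
        seg = members[start : start + step]
        child = seg[0]
        delivered.append(child)
        multicast(ring, child, ring.index(seg[-1]), c, delivered)

ring = sorted(random.sample(range(1024), 60))
got = [ring[0]]                           # the source already has the message
multicast(ring, ring[0], len(ring) - 1, 4, got)
assert sorted(got) == ring                # everyone covered, no duplicates
```

Because the subsegments are disjoint, each member is appended exactly once, which mirrors the tree structure the analysis below reasons about.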


By (4-1) and (4-2), k can be written as k = x + j * c_x^i + l, where j in [1..c_x - 1] is the sequence number of k with respect to x and l in [0..c_x^i).


4.3.1), x_{i,j} = x + j * c_x^i. Because x, x_{i,j}, x^_{i,j}, and k are in clockwise order on the identifier ring, we have (4-3) and (4-4). By (4-3) and (4-4), we have

(k - x^_{i,j}) / (j * c_x^i + l)

N/n, which is the average distance between two adjacent nodes on the identifier ring. The following is a sufficient condition to achieve E(k - x_m)

because t2 <= 2c and t1 >= c. Therefore, E(ln c_x) = Theta(ln c). By Theorem 4.2, O(ln n / E(ln c_x)) = O(ln n / ln c).

Other distributions of c_x may be analyzed similarly. Next we analyze the performance of the MULTICAST routine in CAM-Chord. Suppose x executes x.MULTICAST(msg, k), which is responsible for delivering msg to all nodes in the identifier segment (x, k). Specifically, x forwards msg to some neighbor nodes x^_{i,m} by remotely invoking x^_{i,m}.MULTICAST(msg, k'), which is responsible for delivering msg to a smaller subsegment (x^_{i,m}, k'), where x^_{i,m}, k' in (x, k). It is a typical divide-and-conquer strategy. We call (k' - x^_{i,m})


By (4-1) and (4-2), k can be written as k = x + j * c_x^i + l, where j in [1..c_x - 1] is the sequence number of k with respect to x and l in [0..c_x^i). By (4-6) and (4-7), we have (k' - x^_{i,m})


4.2 and 4.3, due to the similarity between Lemma 4.4 and Lemma 4.1, on which the theorems are based. To avoid excessive repetition and to conserve space, we omit the proofs for Theorems 4.5 and 4.6.


Figure 4-2. CAM-Koorde topology with identifier space [0..63]

Figure 4-2 shows the neighbors of node 36 (100100), whose capacity is 10. The binary representation of the node identifier is given in the parentheses. The basic group is {35 (100011), 37 (100101), 18 (010010), 50 (110010)}. The second group is {9 (001001), 25 (011001), 41 (101001), 57 (111001)}.
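The grouping in this example can be reproduced by collecting, for each level i, the identifiers whose image after i de Bruijn shifts equals the node's identifier. The shift rule below is inferred from the example and is an assumption; the ring neighbors 35 and 37 that complete the basic group come from the ring itself, not from this relation.

```python
# Reproduce the neighbor groups of the example on the ring [0..63]:
# the i-th group of node x collects the identifiers y whose image
# after i de Bruijn left shifts equals x, i.e. y * 2^i = x (mod 64).
N = 64

def shift_group(x: int, i: int) -> list:
    """Identifiers mapped onto x by i left shifts on the de Bruijn graph."""
    return sorted(y for y in range(N) if (y << i) % N == x)

print(shift_group(36, 1))   # [18, 50]: de Bruijn part of the basic group
print(shift_group(36, 2))   # [9, 25, 41, 57]: the second group
```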


Definition 1.


Lemma 4.7.

Proof.


N/n). Note that N/n is the average distance between any two adjacent nodes on the ring. If we can show that E(|x^_m - x'_m|)

2^max{1, floor(log(c_x/4))}. Because c_x >= 4 for any node x in CAM-Chord, max{1, floor(log(c_x/4))} >= 1. Hence,

|x^_m - x'_m| <= Sum_{i=1}^{m} delta_i * (1/2)^(m-i)

E(|x^_m - x'_m|) <= Sum_{i=1}^{m} E(delta_i) * (1/2)^(m-i) = (N/n) * Sum_{i=1}^{m} (1/2)^(m-i) = (N/n) * Sum_{i=0}^{m-1} (1/2)^i

4.7.

Proof. By Theorem 4.8, O(log n / E(log c_x)) = O(log n / log c).

4.5.1 Group Members with Very Small Upload Bandwidth

A node x with very small upload bandwidth should only be a leaf in the multicast trees unless it itself is the data source. In order to make sure that the MULTICAST routine does not select x as an intermediate node in any multicast tree, x must not be in the CAM-Chord (or CAM-Koorde) overlay network. Instead, it joins as an external member. x asks a node y known to be in CAM-Chord (or CAM-Koorde) to perform LOOKUP(k) for an arbitrary identifier k, which returns a random node z in the overlay network. x then attempts to join z as an external member. If z cannot support x, z forwards x to successor(z). If z admits x as an external member, z will forward the received multicast messages to x and x will multicast its messages via z. If z leaves the group, x must rejoin via another node in CAM-Chord (or CAM-Koorde).


[57, 59] for details. As extensions to the existing P2P networks, CAMs can naturally inherit most of those features without much additional effort. For example, instead of choosing the i-th neighbor to be (x + 2^i), a proximity optimization of Chord allows the i-th neighbor to be any node within the range of [(a + 2^i), (a + 2^(i+1))), which does not affect the complexities [57]. This optimization can also be applied to CAM-Chord (which is an extension of Chord) without affecting the complexities. A node x can choose any node whose identifier belongs to the segment [x + j * c_x^i, x + (j+1) * c_x^i) as the neighbor x_{i,j}. Given this freedom, some heuristics, such as smallest delay first, may be used to choose neighbors to promote proximity-clustering. Specifically, a node x can progressively scan the nodes in the allowed segment for neighbor x_{i,j}, for example, by following the successor link to probe the next node in the segment after every 100k data bits sent by x, which trivializes the probing overhead. x always uses the nearest node it has found recently as its neighbor.


Figure 4-3 compares the throughput of CAM-Chord, Chord, CAM-Koorde, and Koorde with respect to the average number of children per non-leaf node in the multicast tree. CAMs perform much better. Their throughput improvement over Chord and Koorde is 70-80% when the nodes' upload bandwidths are chosen from the rather narrow default range of [400, 1000] kbps. The larger the upload-bandwidth range, the greater the throughput improvement, as demonstrated by Figure 4-4. In general, let [a, b] be the range of upload bandwidth. The upper bound b of the range is shown on the horizontal axis


Figure 4-3. Multicast throughput with respect to average number of children per non-leaf node

Figure 4-4. Throughput improvement ratio with respect to upload bandwidth range


Figure 4-5. Multicast throughput with respect to size of the multicast group

Figure 4-6. Throughput vs. average path length

of Figure 4-4, while the lower bound a is fixed at 400 kbps. The figure shows that the throughput improvement by CAMs increases when the upload-bandwidth range is larger, representing a greater degree of host heterogeneity. The simulation data also indicate that the throughput ratio of CAM-Chord (CAM-Koorde) over Chord (Koorde) is roughly proportional to (a + b)/(2a).

Figure 4-5 shows the multicast throughput with respect to the size of the multicast group. According to the simulation results, the throughput is largely insensitive to the group size.


4.5.2, the latency is no longer proportional to the number of hops. We add a simulation in Section 4.6.4 to study this case, where the actual delay is measured. Both throughput and latency are functions of the average node capacity. With a larger average node capacity (achieved by a smaller value of p), the throughput decreases due to more children per non-leaf node, and the latency also decreases due to smaller tree depth. There exists a tradeoff between throughput and latency, which is depicted by Figure 4-6. Higher throughput can be achieved at the cost of longer routing paths. Given the same upload bandwidth distribution, the system's performance can be tuned by adjusting p. The figure also shows that, for relatively small throughput (less than 46 kbps in the figure), namely for large node capacities, CAM-Koorde slightly outperforms CAM-Chord; for relatively large throughput (greater than 46 kbps in the figure), namely for small node capacities, CAM-Chord outperforms CAM-Koorde, which will be further explained in Section 4.6.4.

Figure 4-7 and Figure 4-8 present the statistical distribution of multicast path lengths, i.e., the number of nodes that can be reached by a multicast tree in a certain number of hops. Each curve represents a simulation with node capacities chosen from a different range. When the capacity range increases, the distribution curve moves to


Figure 4-7. Path length distribution in CAM-Chord. Legend "[x..y]" means the node capacities are uniformly distributed in the range of [x..y].

Figure 4-8. Path length distribution in CAM-Koorde. Legend "[x..y]" means the node capacities are uniformly distributed in the range of [x..y].


Figure 4-9. Average path length with respect to average node capacity

the left of the plot due to shorter multicast paths. The improvement in the distribution can be measured by how much the curve is shifted to the left. At the beginning, a small increase in the capacity range causes significant improvement in the distribution. After the capacity range reaches a certain level ([4, 10] in our simulations), the improvement slows down drastically. Each curve has a single peak, and the right side of the peak quickly decreases to zero. It means that the vast majority of nodes are reached within a small range of path lengths. We did not observe any multicast path whose length was significantly larger than the average path length.

Figure 4-9 shows the average path length with respect to the average node capacity. We also plot an artificial line, 1.5 ln n / ln c, based on Theorem 4.6 and Theorem 4.9. From the figure, when the average node capacity is less than 10, the average path length of CAM-Chord is smaller than that of CAM-Koorde; when the average node capacity is greater than 12, the average path length of CAM-Koorde is smaller than that of CAM-Chord. A smaller average path length means more balanced multicast trees. For small node capacities, CAM-Chord multicast trees are more balanced than CAM-Koorde


Figure 4-10. Proximity optimization

multicast trees, and vice versa. The reasons are explained as follows. On one hand, a non-leaf CAM-Koorde node x may have fewer children than c_x because some neighbors may have already received the multicast message from a different path. This tends to make the depth of a CAM-Koorde multicast tree larger than that of a CAM-Chord tree. On the other hand, a CAM-Chord node x may split (x, k] into uneven subsegments (i.e., subtrees) with a ratio up to c_x (Lines 6-15 in Section 4.3.4). This ratio of unevenness becomes small when the node capacities are small. Combining these two factors, CAM-Chord creates better balanced trees for small node capacities; CAM-Koorde creates better balanced trees for large node capacities.

Next we use CAM-Chord as an example (Section 4.5.2) to study the impact of proximity optimization. In [60], Mukherjee found that the end-to-end packet delay on the Internet can be modeled by a shifted Gamma distribution, which is a long-tail distribution. The shape parameter varies from approximately 1.0 during low loads to 6.0 during high loads on the backbone. We set the shape parameter to be 5.0 and the average packet delay to be 50 ms. Figure 4-10 compares the average latency of delivering a multicast message from a source to a receiver in CAM-Chord with or without the proximity optimization. The simulation is performed for different average node capacities,


Throughput vs. latency

and the impact of proximity optimization is significant. In most cases, it reduces the latency by more than half.

Figure 4-11 shows the relation between the latency ratio and the capacity ratio. When the capacity ratio is smaller, which means the nodes cannot support as many children as they have claimed, the nodes will forward the received messages to a



[1] I. Stoica, R. Morris, D. Liben-Nowell, D. R. Karger, F. M. Kaashoek, F. Dabek, and H. Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Protocol for Internet Applications," IEEE/ACM Transactions on Networking (TON), 2003.

[2] S. Ratnasamy, P. Francis, M. Handley, R. Karp, and S. Shenker, "A Scalable Content-Addressable Network," Proc. of ACM SIGCOMM '01, August 2001.

[3] B. Y. Zhao, L. Huang, S. C. Rhea, J. Stribling, A. D. Joseph, and J. D. Kubiatowicz, "Tapestry: A Global-scale Overlay for Rapid Service Deployment," IEEE J-SAC, January 2004.

[4] A. Rowstron and P. Druschel, "Pastry: Scalable, Decentralized Object Location, and Routing for Large-Scale Peer-to-Peer Systems," Proc. of Middleware '01, November 2001.

[5] C. G. Plaxton, R. Rajaraman, and A. W. Richa, "Accessing Nearby Copies of Replicated Objects in a Distributed Environment," Theory of Computing Systems, vol. 32, no. 3, pp. 241-280, 1999.

[6] D. Malkhi, M. Naor, and D. Ratajczak, "Viceroy: A Scalable and Dynamic Emulation of the Butterfly," Proc. of ACM PODC '02, pp. 183-192, 2002.

[7] A. Kumar, S. Merugu, J. Xu, and X. Yu, "Ulysses: A Robust, Low-Diameter, Low-Latency Peer-to-Peer Network," Proc. of IEEE ICNP '03, Nov. 2003.

[8] Y. Chawathe, S. Ratnasamy, L. Breslau, N. Lanham, and S. Shenker, "Making Gnutella-like P2P Systems Scalable," Proc. of ACM SIGCOMM '03, pp. 407-418, 2003.

[9] C. Gkantsidis, M. Mihail, and A. Saberi, "Random Walks in Peer-to-Peer Networks," Proc. of IEEE INFOCOM '04, Mar. 2004.

[10] Q. Lv, P. Cao, E. Cohen, K. Li, and S. Shenker, "Search and Replication in Unstructured Peer-to-Peer Networks," Proc. of ICS '02, pp. 84-95, Sept. 2002.

[11] K. Sripanidkulchai, B. Maggs, and H. Zhang, "Efficient Content Location Using Interest-Based Locality in Peer-to-Peer Systems," Proc. of IEEE INFOCOM '03, Mar. 2003.

[12] C. Dellarocas, "Immunizing Online Reputation Reporting Systems Against Unfair Ratings and Discriminatory Behavior," EC '00: Proc. of the 2nd ACM Conference on Electronic Commerce, 2000.

[13] M. Srivatsa, L. Xiong, and L. Liu, "TrustGuard: Countering Vulnerabilities in Reputation Management for Decentralized Networks," Proc. of WWW '05, May 2005.

[14] P. Dewan and P. Dasgupta, "Securing Reputation Data in Peer-to-Peer Networks," Proc. of Parallel and Distributed Computing and Systems, 2004.


[15] Y. Wang and J. Vassileva, "Bayesian Network Trust Model in Peer-to-Peer Networks," Proc. of AP2PC '03, 2003.

[16] M. Venkatraman, B. Yu, and M. P. Singh, "Trust and Reputation Management in a Small-World Network," ICMAS '00: Proc. of the Fourth International Conference on MultiAgent Systems, 2000.

[17] S. Marti and H. Garcia-Molina, "Identity Crisis: Anonymity vs. Reputation in P2P Systems," Third IEEE International Conference on Peer-to-Peer Computing, 2003.

[18] M. Jakobsson, J. Hubaux, and L. Buttyan, "A Micropayment Scheme Encouraging Collaboration in Multi-hop Cellular Networks," Proc. of Financial Crypto '03, 2003.

[19] V. Vishnumurthy, S. Chandrakumar, and E. G. Sirer, "KARMA: A Secure Economic Framework for P2P Resource Sharing," Proc. of the Workshop on the Economics of Peer-to-Peer Systems, June 2003.

[20] Y. Zhang, W. Lou, and Y. Fang, "SIP: A Secure Incentive Protocol against Selfishness in Mobile Ad hoc Networks," Proc. of WCNC '04, March 2004.

[21] T. B. Ma, S. C. M. Lee, J. C. S. Lui, and D. K. Yau, "A Game Theoretic Approach to Provide Incentive and Service Differentiation in P2P Networks," Proc. of ACM SIGMETRICS/PERFORMANCE, June 2004.

[22] L. Buttyan, "Removing the Financial Incentive to Cheat in Micropayment Schemes," IEE Electronics Letters, pp. 132-133, January 2002.

[23] S. Lee, R. Sherwood, and B. Bhattacharjee, "Cooperative Peer Groups in NICE," Proc. of IEEE INFOCOM '03, Apr. 2003.

[24] M. Feldman, K. Lai, I. Stoica, and J. Chuang, "Robust Incentive Techniques for Peer-to-Peer Networks," ACM Electronic Commerce, 2004.

[25] G. Banavar, M. Chandra, B. Nagarajaro, R. Strom, and C. Sturman, "An Efficient Multicast Protocol for Content-based Publish-Subscribe Systems," Proc. of ICDCS '98, May 1998.

[26] Y. H. Chu, S. Rao, and H. Zhang, "A Case for End System Multicast," Proc. of SIGMETRICS '00, June 2000.

[27] J. Jannotti, D. Gifford, K. Johnson, M. Kaashoek, and J. O'Toole, "Overcast: Reliable Multicasting with an Overlay Network," Proc. of OSDI '00, Oct. 2000.

[28] S. Banerjee, B. Bhattacharjee, and C. Kommareddy, "Scalable Application Layer Multicast," Proc. of ACM SIGCOMM '02, August 2002.

[29] S. E. Deering, D. Estrin, D. Farinacci, V. Jacobson, C.-G. Liu, and L. Wei, "An Architecture for Wide-Area Multicast Routing," Proc. of ACM SIGCOMM '94, pp. 126-135, August 1994.


[30] Y. Chu, S. G. Rao, S. Seshan, and H. Zhang, "Enabling Conferencing Applications on the Internet Using an Overlay Multicast Architecture," Proc. of ACM SIGCOMM '01, August 2001.

[31] B. Zhang, S. Jamin, and L. Zhang, "Host Multicast: A Framework for Delivering Multicast to End Users," Proc. of IEEE INFOCOM '02, June 2002.

[32] G.-I. Kwon and J. W. Byers, "ROMA: Reliable Overlay Multicast with Loosely Coupled TCP Connections," Proc. of IEEE INFOCOM '04, March 2004.

[33] Z. Li and P. Mohapatra, "Impact of Topology On Overlay Routing Service," Proc. of IEEE INFOCOM '04, March 2004.

[34] Y. Shavitt and T. Tankel, "On the Curvature of the Internet and Its Usage for Overlay Construction and Distance Estimation," Proc. of IEEE INFOCOM '04, March 2004.

[35] F. Baccelli, A. Chaintreau, Z. Liu, A. Riabov, and S. Sahu, "Scalability of Reliable Group Communication Using Overlays," Proc. of IEEE INFOCOM '04, March 2004.

[36] V. Pappas, B. Zhang, A. Terzis, and L. Zhang, "Fault-Tolerant Data Delivery for Multicast Overlay Networks," Proc. of ICDCS '04, March 2004.

[37] S. Zhuang, B. Zhao, A. Joseph, R. Katz, and J. Kubiatowicz, "Bayeux: An Architecture for Scalable and Fault-tolerant Wide Area Data Dissemination," Proc. of the Eleventh International Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV '01), June 2001.

[38] R. Zhang and Y. C. Hu, "Borg: A Hybrid Protocol for Scalable Application-level Multicast in Peer-to-Peer Networks," Proc. of NOSSDAV '03, 2003.

[39] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, "Application-Level Multicast Using Content-Addressable Networks," Proc. of NGC '01, 2001.

[40] S. El-Ansary, L. O. Alima, P. Brand, and S. Haridi, "Efficient Broadcast in Structured P2P Networks," Proc. of IPTPS '03, Feb. 2003.

[41] M. Castro, M. Jones, A.-M. Kermarrec, A. Rowstron, M. Theimer, H. Wang, and A. Wolman, "An Evaluation of Scalable Application-Level Multicast Built Using Peer-To-Peer Overlays," Proc. of IEEE INFOCOM '03, April 2003.

[42] S. Shi, J. Turner, and M. Waldvogel, "Dimensioning Server Access Bandwidth and Multicast Routing in Overlay Networks," Proc. of NOSSDAV '01, June 2001.

[43] S. Shi and J. Turner, "Routing in Overlay Multicast Networks," Proc. of IEEE INFOCOM '02, June 2002.
[44] D. Kostic, A. Rodriguez, J. Albrecht, and A. Vahdat, "Bullet: High Bandwidth Data Dissemination Using an Overlay Mesh," Proc. of SOSP'03, October 2003.


[45] S. Banerjee, C. Kommareddy, K. Kar, B. Bhattacharjee, and S. Khuller, "Construction of an Efficient Overlay Multicast Infrastructure for Real-time Applications," Proc. of IEEE INFOCOM'03, March 2003.
[46] A. Riabov, Z. Liu, and L. Zhang, "Overlay Multicast Trees of Minimal Delay," Proc. of ICDCS'04, March 2004.
[47] H. Yamaguchi, A. Hiromori, T. Higashino, and K. Taniguchi, "An Autonomous and Decentralized Protocol for Delay Sensitive Overlay Multicast Tree," Proc. of ICDCS'04, March 2004.
[48] N. Chang and M. Liu, "Revisiting the TTL-based Controlled Flooding Search: Optimality and Randomization," Proc. of ACM MobiCom'04, 2004.
[49] N. Chang and M. Liu, "Controlled Flooding Search in a Large Network," Proc. of Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks, 2005.
[50] C. Gkantsidis, M. Mihail, and A. Saberi, "Hybrid Search Schemes for Unstructured Peer-to-Peer Networks," Proc. of IEEE INFOCOM'05, 2005.
[51] A. Iamnitchi, M. Ripeanu, and I. Foster, "Locating Data in (Small-World?) Peer-to-Peer Scientific Collaborations," Proc. of IPTPS'02: 1st International Workshop on Peer-to-Peer Systems, 2002.
[52] A. Iamnitchi, M. Ripeanu, and I. Foster, "Small-world File-sharing Communities," Proc. of IEEE INFOCOM'04, 2004.
[53] M. Castro, P. Druschel, A. Ganesh, A. Rowstron, and D. Wallach, "Secure Routing for Structured Peer-to-Peer Overlay Networks," Proc. of OSDI'02, 2002.
[54] J. R. Douceur, "The Sybil Attack," IPTPS'01: Revised Papers from the First International Workshop on Peer-to-Peer Systems, pp. 251-260, 2002.
[55] Y. G. Desmedt and Y. Frankel, "Threshold Cryptosystems," CRYPTO'89: Proceedings on Advances in Cryptology, pp. 307-315, 1989.
[56] M. Kaashoek and D. Karger, "Koorde: A Simple Degree-optimal Distributed Hash Table," Proc. of 2nd IPTPS, Berkeley, CA, February 2003.
[57] K. P. Gummadi, R. Gummadi, S. D. Gribble, S. Ratnasamy, S. Shenker, and I. Stoica, "The Impact of DHT Routing Geometry on Resilience and Proximity," Proc. of ACM SIGCOMM'03, August 2003.
[58] S. Sen and J. Wong, "Analyzing Peer-to-Peer Traffic across Large Networks," Proc. of Second Annual ACM Internet Measurement Workshop, November 2002.
[59] S. Ratnasamy, "Routing Algorithms for DHTs: Some Open Questions," Proc. of 1st International Workshop on Peer-to-Peer Systems, March 2002.


[60] A. Mukherjee, "On the Dynamics and Significance of Low Frequency Components of Internet Load," Internetworking: Research and Experience, vol. 5, no. 4, pp. 163-205, 1994.


Zhan Zhang received his M.S. degree in computer science from Fudan University of China in 2003, and received his doctorate in Computer and Information Science and Engineering from the University of Florida in 2007. His current research fields include overlay networks, network security, and sensor networks. His email address is zzhan@cise.ufl.edu.


Permanent Link: http://ufdc.ufl.edu/UFE0018800/00001

Material Information

Title: Overlay Infrastructure Support for Internet Applications
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0018800:00001







OVERLAY INFRASTRUCTURE SUPPORT FOR INTERNET APPLICATIONS


By

ZHAN ZHANG




















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2007





























2007 Zhan Zhang



































To my family









ACKNOWLEDGMENTS

I am grateful for the help I have received in writing this dissertation. First of all,

I would like to thank my advisor Prof. Shigang Chen for his guidance and support

throughout my graduate studies. Without the numerous discussions and brainstorms with

him, the results presented in this thesis would never have existed.

I am grateful to Prof. Liuqing Yang, Prof. Randy Chow, Prof. Sartaj Sahni, and

Prof. Ye Xia for their guidance and encouragement during my years at the University of

Florida (UF).

I am thankful to all my colleagues in Prof. Chen's group, including Liang Zhang,

MyungKeun Yoon, Ying Jian, and Ming Zhang. They provided valuable feedback for my

research. I would like to thank many people in the Computer and Information Science and

Engineering (CISE) Department for their help in my research work.

Last but not least, I am grateful to my family for their love, encouragement, and

understanding. It would be impossible for me to express my gratitude towards them in

mere words.









TABLE OF CONTENTS

page

ACKNOWLEDGMENTS 4

LIST OF FIGURES 7

ABSTRACT 9

CHAPTER

1 INTRODUCTION 11

   1.1 Overlay Networks 11
   1.2 Related Work 13
   1.3 Contribution 17
      1.3.1 A Distributed Hybrid Query Scheme to Speed Up Queries in Unstructured Peer-to-Peer Networks 17
      1.3.2 A Distributed Incentive Scheme for Peer-to-Peer Networks 18
      1.3.3 Capacity-Aware Multicast Algorithms on Heterogeneous Overlay Networks 18

2 A HYBRID QUERY SCHEME TO SPEED UP QUERIES IN UNSTRUCTURED PEER-TO-PEER NETWORKS 19

   2.1 Motivation 19
      2.1.1 Problems in Prior Work 19
      2.1.2 Motivation 21
   2.2 Constructing a Small-world Topology 23
      2.2.1 Measuring the Interest Similarity between Two Nodes 23
      2.2.2 Clustering Nodes with Similar Interests 27
      2.2.3 Bounding Clusters 27
   2.3 A Hybrid Query Scheme 29
      2.3.1 Mixing Inter-Cluster Queries and Intra-Cluster Queries 29
      2.3.2 Reducing the Communication Overhead 32
   2.4 Simulation 32

3 MARCH: A DISTRIBUTED INCENTIVE SCHEME IN OVERLAY NETWORKS 40

   3.1 Motivation 40
      3.1.1 Limitation of Prior Work 40
      3.1.2 Motivation 41
   3.2 System Model 42
   3.3 Authority Infrastructure 43
      3.3.1 Delegation 43
      3.3.2 k-pair Trustworthy Set 44
   3.4 MARCH: A Distributed Incentive Scheme 45
      3.4.1 Money and Reputation 45
      3.4.2 Phase 1: Contract Negotiation
      3.4.3 Phase 2: Contract Verification
      3.4.4 Phases 3 and 4: Money Transfer and Contract Execution
      3.4.5 Phase 5: Prosecution
   3.5 System Properties and Defense against Various Attacks
      3.5.1 System Properties
      3.5.2 Defending Against Various Attacks
   3.6 Discussions
      3.6.1 Rewarding Delegation Members
      3.6.2 Money Refilling
      3.6.3 System Dynamics and Overhead
   3.7 Simulation
      3.7.1 Effectiveness of Authority
      3.7.2 Effectiveness of MARCH

4 CAPACITY-AWARE MULTICAST ALGORITHMS ON HETEROGENEOUS OVERLAY NETWORKS

   4.1 Motivation
   4.2 System Overview
   4.3 CAM-Chord Approach
      4.3.1 Neighbors
      4.3.2 Lookup Routine
      4.3.3 Topology Maintenance
      4.3.4 Multicast Routine
      4.3.5 Analysis
   4.4 CAM-Koorde Approach
      4.4.1 Neighbors
      4.4.2 Lookup Routine
      4.4.3 Multicast Routine
      4.4.4 Analysis
   4.5 Discussions
      4.5.1 Group Members with Very Small Upload Bandwidth
      4.5.2 Proximity and Geography
   4.6 Simulation
      4.6.1 Throughput
      4.6.2 Throughput vs. Latency
      4.6.3 Path Length Distribution
      4.6.4 Average Path Length
      4.6.5 Impact of Dynamic Capacity Variation 102

5 SUMMARY

REFERENCES

BIOGRAPHICAL SKETCH 108









LIST OF FIGURES

Figure page

2-1 Probability of random walks escaping out of the cluster decreases exponentially with the ratio of the number of inter-cluster edges to the number of intra-cluster edges. 20

2-2 Markov random walks discover fewer distinct nodes than uniform random walks do. 21

2-3 A query scheme mixing inter-cluster queries and intra-cluster queries (the nodes in the grey clusters fall into the same interest group). 23

2-4 Interest similarity between nodes. The numbers of data items in nodes 1, 2, and n are 50, 100, and 200 respectively. A) No common visited nodes (different interests). B) u and v have visited node 2 (100 data items) 8 and 5 times respectively (a certain level of similar interests). C) u and v have visited node n (200 data items) 8 and 5 times respectively (falls in-between). 25

2-5 Effect of average query number on the cluster size. 33

2-6 Interest association is a good metric to estimate interest similarity. 34

2-7 Percentage of returned queries within a specific hop number. 34

2-8 Percentage of returned queries within a specific message number. 35

2-9 Percentage of returned queries within a specific hop number in a less clustered network. 36

2-10 Percentage of returned queries within a specific message number in a less clustered network. 36

2-11 Number of distinct nodes discovered in the same group within a certain message range. 37

2-12 Number of distinct nodes discovered in the same group within a specific hop number. 38

2-13 Percentage of messages discovering distinct nodes within a certain message range. 38

2-14 Total number of distinct nodes discovered within a specific hop number. 39

3-1 Trustworthy probabilities for a delegation and a 5-pair delegation set with m* = 3,000 are 99.97% and 99.815% respectively. Even if a delegation/k-pair delegation set is not trustworthy, it may not be compromised, because it is very unlikely that a single colluding group can control the majority of them. 44

3-2 A) Protocols for contract verification and exchange. B) Money transfer. C) Prosecution. 52

3-3 Trustworthiness of delegation. 63

3-4 Trustworthiness of k-pair delegation set. 63

3-5 Most of the malicious nodes are rejected within the first 50 transactions. 65

3-6 Failed transaction ratio and overpaid money ratio drop quickly to small percentages within the first 100 transactions. 66

3-7 Overpaid money ratio (measured after 250 transactions) increases linearly with the number of dishonest nodes. 67

3-8 Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes. 67

3-9 Overpaid money ratio with respect to the threshold. 68

3-10 Number of rejected nodes with respect to the threshold. 68

3-11 Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes. 69

4-1 Chord vs. CAM-Chord neighbors (c = 3). 75

4-2 CAM-Koorde topology with identifier space [0..63]. 85

4-3 Multicast throughput with respect to the average number of children per non-leaf node. 94

4-4 Throughput improvement ratio with respect to upload bandwidth range. 94

4-5 Multicast throughput with respect to the size of the multicast group. 95

4-6 Throughput vs. average path length. 95

4-7 Path length distribution in CAM-Chord. Legend "[x..y]" means the node capacities are uniformly distributed in the range [x..y]. 97

4-8 Path length distribution in CAM-Koorde. Legend "[x..y]" means the node capacities are uniformly distributed in the range [x..y]. 97

4-9 Average path length with respect to average node capacity. 98

4-10 Proximity optimization. 99

4-11 Throughput vs. latency. 100









Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

OVERLAY INFRASTRUCTURE SUPPORT FOR INTERNET APPLICATIONS

By

Zhan Zhang

May 2007

Chair: Shigang Chen
Major: Computer Engineering

Overlay networks have gained a lot of popularity, and are now being considered

in many application domains such as content distribution, conferencing, gaming, etc.

These applications rely on the support from underlying overlay infrastructures. How to

design overlay infrastructures to satisfy requirements from different applications, such as

resource-sharing systems and application-level multicast, is largely an open problem. My

dissertation develops several techniques in the design of overlay infrastructure to facilitate

Internet applications.

A distributed hybrid query scheme is proposed for resource-sharing systems. It

combines Markov random walks and peer clustering to achieve a better tradeoff. The

scheme has a short response time for most of the queries that belong to the same interest

group, while still maintaining a smaller network diameter. More important, we propose

a totally distributed clustering algorithm, which means better resilience to network

dynamics.

A new incentive scheme is proposed to support cooperative overlays, in which the

amount a node can benefit is proportional to its contribution, malicious nodes can only

attack others at the cost of their own interests, and a collusion cannot gain advantage by

cooperation. Furthermore, we have designed a distributed authority infrastructure and a

set of protocols to facilitate the implementation of the scheme in peer-to-peer networks,

which have no trusted authorities to manage each peer's history information.









Moreover, we propose two capacity-aware multicast systems that focus on host

heterogeneity, any source multicast, dynamic membership, and scalability. We extend

Chord and Koorde to be capacity-aware. We then embed implicit degree-varying multicast

trees on top of the overlay network and develop multicast routines that automatically

follow the trees to disseminate multicast messages. The implicit trees are well balanced

with workload evenly spread across the network. We rigorously analyze the expected

performance of multi-source capacity-aware multicasting, which was not thoroughly

addressed in any previous work. We also perform extensive simulations to evaluate the

proposed multicast systems.









CHAPTER 1
INTRODUCTION

1.1 Overlay Networks

Overlay networks have recently attracted a lot of attention and are now being

considered in many application domains such as content distribution, conferencing,

gaming, etc.

An overlay network is a computer network which is built on top of another network.

Each node in an overlay network maintains pointers to a set of neighbor nodes. These

pointers are used both to maintain the overlay and to implement application functionality,

for example, to locate content stored by overlay nodes. Two neighbors in the overlay can

be thought of as being connected by virtual or logical links, each of which corresponds to a

path, perhaps through many physical links, in the underlying network. For example, many

peer-to-peer networks are overlay networks because they run on top of the Internet.

Manually configured static overlays are nothing new. A salient, modern feature of

overlay networks is that they can be made to self-organize autonomically, which provides

great ease of deployment.

Resource sharing and application-level multicast are two of the major applications in

overlay networks, and we study the infrastructure support for these applications from

the following three aspects.

Query schemes for lookup services: The basic overlay applications are

resource-sharing systems, such as grid computing, and file-sharing peer-to-peer networks.

The core operation in these systems is an efficient lookup mechanism for locating

resources. The fundamental challenge is to achieve faster response time, smaller network

diameter, better resilience to network dynamics, and lower overhead. The overlays

need to be constructed with good search properties, such as clustering the nodes with

similar interests, and the query scheme needs to utilize the properties to support lookup

efficiently.









Incentive schemes to encourage cooperation: In cooperative overlay networks,

a node is allowed to consume resources from other nodes, and is also expected to share

its resources with the community. Today's peer-to-peer networks suffer from the problem

of free-riders, which consume resources in the network without contributing anything in

return. Originally it was hoped that users would be altruistic, "from each according to his

abilities, to each according to his needs." In practice, altruism breaks down as networks

grow larger and include more diverse users. This leads to a "tragedy of the commons,"

where individual peers' self-interest causes the system to collapse.

Overlay multicast for group communication: Overlay multicast is becoming

a promising alternative for group communications, because the global deployment of

IP multicast has been slow due to the difficulties related to heterogeneity, scalability,

manageability, and lack of a robust inter-domain multicast routing protocol. Even though

overlay multicast can be implemented on top of overlay unicast, they have very different

requirements, because in overlay unicast, low-capacity nodes only affect traffic passing

through them and they create bottlenecks of limited impact. In overlay multicast, all

traffic will pass all nodes in the group, and the multicast throughput is decided by

the node with the smallest throughput, particularly in the case of reliable delivery.

The strategy of assigning an equal number of children to each intermediate node is far

from optimal. If the number of children is set too big, the low-capacity nodes will be

overloaded, which slows down the entire session. If the number of children is set too small,

the high-capacity nodes will be under-utilized. In such systems, the network heterogeneity,

scalability, and manageability must be well addressed.

We focus on how to design overlay infrastructures to address these challenges. For

resource sharing systems, we present a clustering algorithm and a labeling algorithm to

cluster members within the same interest group, and propose an efficient query scheme to

deliver a better tradeoff between communication overhead and response time. Moreover,

we propose a new incentive scheme, which is particularly suitable to overlay networks









without any centralized authority. Our incentive scheme is more resilient against various

attacks, especially launched by a large number of colluding nodes.

With respect to overlay multicast, we propose two capacity-aware overlay systems

that support distributed applications requiring multiple-source multicast with dynamic

membership.

1.2 Related Work

The fundamental challenge of constructing an overlay network for resource-sharing

systems such as peer-to-peer networks is to achieve faster response time, smaller network

diameter, better resilience to network dynamics, and higher security. Structured P2P

networks have been proposed by many researchers [1-7], in which distributed hash tables

(DHT) are used to provide data location management in a strictly structured way.

Whenever a node joins/leaves the overlay, a number of nodes need to update their routing

tables to preserve desirable properties for fast lookup. While structured P2P networks

can offer better performance in response time and communication overhead for query

procedures, they suffer from the large overhead for overlay maintenance due to network

dynamics. In addition, DHTs are inflexible in providing generic keyword searches because

they have to hash the keys associated with certain objects [8].
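The keyword-search limitation follows directly from how a DHT maps keys to nodes. A minimal sketch (the hash width and identifier-space size are illustrative choices, not taken from any particular DHT):

```python
import hashlib

def key_to_id(key: str, id_space: int = 2**16) -> int:
    """Hash a key to a point in the DHT identifier space."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % id_space

# Exact-match lookups work: the same key always maps to the same identifier.
# But nearly identical keywords ("overlay network" vs. "overlay networks")
# hash to unrelated points, so a keyword query cannot be routed toward any
# single region of the identifier space.
```

This is why exact-key routing is cheap in a DHT while substring or keyword queries require extra machinery on top.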

Unstructured P2P networks such as Gnutella rely on a random process, in which

nodes are interconnected in a random manner. The randomness offers high resilience to

the network dynamics. Basic unstructured networks rely on flooding for users' queries,

which is expensive in computation and communication overhead. Consequently, scalability

has always been a major weakness for unstructured networks. Even with the use of super

nodes in Morpheus and KaZaA, the traffic is still high, and even exceeds web traffic.

Searching through random walks is proposed in [8-10], in which incoming queries

are forwarded to a randomly chosen neighbor. In random walks, there is typically

no bias steering a query toward the nodes most likely to hold the requested data,

resulting in long response times.
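The blind forwarding described above can be sketched as follows; the graph, item placement, and TTL here are hypothetical placeholders, not part of any cited protocol:

```python
import random

def random_walk_query(graph, items, start, wanted, ttl):
    """Blind random walk: at each hop, forward the query to one uniformly
    chosen neighbor; no preference is given to nodes likely to hold `wanted`."""
    node = start
    for hop in range(ttl):
        if wanted in items.get(node, set()):
            return node, hop          # found after `hop` forwardings
        node = random.choice(graph[node])
    return None, ttl                  # TTL expired without locating the item
```

Because each hop carries exactly one message, the overhead grows linearly with the walk length, but a query for a rare item may need a walk far longer than the network diameter.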









Interest-based shortcut [11] exploits the locality of interests among different nodes. In

this approach, a peer learns its shortcuts by flooding or passively observing its own traffic.

A peer ranks its shortcuts in a list and locates content by sequentially asking all of the

shortcuts on the list from the top, until content is found. The basic principle behind this

approach is that a node tends to revisit accessed nodes again since it was interested in the

data items from these nodes before. The concept of interest similarity is vague, and it is

difficult to make a subtle, quantitative definition based on it. In addition, it may cause

new problems.
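Operationally, the shortcut mechanism amounts to probing a ranked list before falling back to the regular search. A sketch under stated assumptions (the move-to-front promotion is one plausible ranking policy, not necessarily the one used in [11]; all names are illustrative):

```python
def locate_via_shortcuts(shortcuts, peer_has, fallback_search, content):
    """Ask shortcuts in rank order; on a hit, promote the peer to the front
    so frequently useful shortcuts are tried first next time."""
    for i, peer in enumerate(shortcuts):
        if peer_has(peer, content):
            shortcuts.insert(0, shortcuts.pop(i))  # move-to-front promotion
            return peer
    return fallback_search(content)  # e.g., flooding or a random walk
```

The scheme's benefit hinges entirely on how well past hits predict future ones, which is exactly the interest-locality assumption the paragraph questions.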

Another challenge in resource sharing systems is the design of incentive schemes. Today's

peer-to-peer networks suffer from the problem of free-riders, which consume resources in

the network without contributing anything in return. Originally it was hoped that users

would be altruistic, "from each according to his abilities, to each according to his needs."

In practice, altruism breaks down as networks grow larger and include more diverse users.

This leads to a "tragedy of the commons," where individual peers' self-interest causes

the system to collapse.

To reduce free-riders, the systems have to incorporate incentive schemes to encourage

cooperative behavior. Some recent works [12-17] propose reputation based trust systems,

in which each node is associated with a reputation established based on the feedbacks

from others that it has made transactions with. The reputation information helps users to

identify and avoid malicious nodes. An alternative is virtual currency schemes [18-20],

in which each node is associated with a certain amount of money. Money is deducted

from the consumers of a service, and transferred to the providers of the service after each

transaction.
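The per-transaction accounting in such currency schemes is simple in principle; the hard part, as discussed next, is making the ledger unforgeable. A toy illustration (balance representation and refusal policy are assumptions for this sketch):

```python
def settle(balances, consumer, provider, price):
    """Transfer virtual currency from consumer to provider after a completed
    transaction; refuse service if the consumer cannot cover the price."""
    if balances.get(consumer, 0) < price:
        raise ValueError("insufficient funds: service refused")
    balances[consumer] -= price
    balances[provider] = balances.get(provider, 0) + price
    return balances
```

Note that the refusal branch is what limits free-riding: a node that never contributes eventually cannot pay for service.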

Both types of schemes rely on authentic measurement of service quality and

unforgeable reputation/money information. Otherwise, selfish/malicious nodes may

gain advantage based on false reports. For example, a consumer may falsely claim to have

not received service in order to pay less or defame others. More seriously, malicious nodes









may collude in cheating in order to manipulate their information. Several algorithms

are proposed to address these problems. They either analyze statistical characteristics

of the nodes' behavior patterns and other nodes' feedbacks [13, 21], or remove the

underlying incentive for cheating [22]. In order to apply these algorithms, the nodes'

history information must be managed by a central authority, which is not available in

typical peer-to-peer networks.

Some other works [23, 24] find circular service patterns based on the history

information shared among trusted nodes. Each node in a service circle has a chance to

be both a provider and a consumer. The communication overhead for discovering service

circles is very high, which makes these schemes not scalable. In addition, nodes belonging

to different interest groups have little chance to cooperate because service circles are unlikely

to form among them.

Besides the resource sharing systems, application-level multicast is another promising

application in overlay networks. Many research papers [25-28] pointed out the disadvantages

of implementing multicast at the IP level [29], and argued for an application-level overlay

multicast service. More recent work [30-36] studied overlay multicast from different

aspects.

To handle dynamic groups and ensure scalability, novel proposals were made to

implement multicast on top of overlay networks. For example, Bayeux [37] and Borg [38]

were implemented on top of Tapestry [3] and Pastry [4] respectively, and CAN-based

Multicast [39] was implemented based on CAN [2]. El-Ansary et al. studied efficient

broadcast in a Chord network, and their approach can be adapted for the purpose of

multicast [40]. Castro et al. compare the performance of tree-based and flooding-based

multicast in CAN-style versus Pastry-style overlay networks [41].

These systems assume each node has the same number of children. Host heterogeneity

is not addressed. Even though overlay multicast can be implemented on top of overlay

unicast, they have very different requirements. In overlay unicast, low-capacity nodes









only affect traffic passing through them and they create bottlenecks of limited impact. In

overlay multicast, all traffic will pass all nodes in the group, and the multicast throughput

is decided by the node with the smallest throughput, particularly in the case of reliable

delivery. The strategy of assigning an equal number of children to each intermediate node

is far from optimal. If the number of children is set too big, the low-capacity nodes will be

overloaded, which slows down the entire session. If the number of children is set too small,

the high-capacity nodes will be under-utilized. To support efficient multicast, we should

allow nodes in a P2P network to have different numbers of neighbors.
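The point about varying the number of children can be made concrete with a toy tree builder that hands each node as many children as its capacity allows; this illustrates the principle only, not the CAM-Chord/CAM-Koorde constructions described later (it assumes total capacity suffices to attach every node):

```python
from collections import deque

def build_capacity_tree(root, capacity):
    """Attach each joining node to the earliest parent with spare capacity,
    so high-capacity nodes take proportionally more children."""
    children = {n: [] for n in capacity}
    slots = deque([root] * capacity[root])   # one slot per unit of capacity
    for node in (n for n in capacity if n != root):
        parent = slots.popleft()             # first parent with a free slot
        children[parent].append(node)
        slots.extend([node] * capacity[node])
    return children
```

With uniform capacities this degenerates to the fixed-fanout tree criticized above; with heterogeneous capacities, no node is assigned more children than it can serve.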

Shi et al. proved that constructing a minimum-diameter degree-limited spanning tree

is NP-hard [42]. Note that the terms "degree" and "capacity" are interchangeable in the

context of this dissertation. Centralized heuristic algorithms were proposed to balance

multicast traffic among multicast service nodes (MSNs) and to maintain low end-to-end

latency [42, 43]. The algorithms do not address the dynamic membership problem, such as

MSN join/departure.

There has been a flurry of capacity-aware multicast systems, which excel in

optimizing single-source multicast trees but are not suitable for multi-source applications

such as distributed games, teleconferencing, and virtual classrooms, which are the target

applications of our algorithms. Bullet [44] is designed to improve the throughput of data

dissemination from one source to a group of receivers. An overlay tree rooted at the source

must be established. Disjoint data objects are disseminated from the source via the tree

to different receivers. The receivers then communicate amongst themselves to retrieve

the missing objects; these dynamic communication links, together with the tree, form

a mesh, which offers better bandwidth than the tree alone. Overlay multicast network

infrastructure (OMNI) [45] dynamically adapts its degree-constrained multicast tree

to minimize the latencies to the entire client set. Riabov et al. proposed a centralized

constant-factor approximation algorithm for the problem of constructing a single-source

degree-constrained minimum-delay multicast tree [46]. Yamaguchi et al. described a









distributed algorithm that maintains a degree-constrained delay-sensitive multicast tree for

a dynamic group [47]. These algorithms are designed for a single source and not suitable

when there are many potential sources (such as in distributed games). Building one tree

for each possible source is too costly. Using a single tree for all sources is also problematic.

First, a minimum-delay tree for one source may not be a minimum-delay tree for other

sources. Second, the single-tree approach concentrates the traffic on the links of the tree

and leaves the capacities of the majority of nodes (leaves) unused, which affects the overall

throughput in multi-source multicasting. Third, a single tree may be partitioned beyond

repair for a dynamic group.

1.3 Contribution

There are three major contributions in this dissertation. First of all, we propose an

efficient distributed hybrid query scheme for unstructured peer-to-peer networks. Second,

we propose an incentive scheme to encourage members to contribute to the community.

Third, we propose two overlay infrastructures to support application-level multicast in

heterogeneous environment.

1.3.1 A Distributed Hybrid Query Scheme to Speed Up Queries in Unstructured Peer-to-Peer Networks

We propose a hybrid query scheme, which delivers a better tradeoff between

communication overhead and response time.

We define a metric, independent of any global information, to measure the interest
similarity between nodes. Based on the metric, we propose a clustering algorithm to
cluster nodes sharing similar interests with small overhead, and fast convergence.

We propose a distributed labeling algorithm to explicitly capture the borders of
clusters without any extra communication overhead.

We propose a new query scheme, which is able to deliver a better tradeoff among
response time, communication overhead, and the ability to locate more resources by
mixing inter-cluster queries and intra-cluster queries.









1.3.2 A Distributed Incentive Scheme for Peer-to-Peer Networks

We propose a new incentive scheme to encourage end users to contribute to the

systems, which is suitable for networks without any centralized authority, and is more

resilient against various attacks, especially those launched by a large collusion. Furthermore, we

have designed a distributed authority infrastructure and a set of protocols to implement

the scheme in peer-to-peer networks. More specifically, the scheme proposed has the

following advantages over the conventional approaches:

We propose a new distributed incentive scheme, which combines reputation and

virtual money. It is able to strictly limit the damage caused by malicious nodes and their

colluding groups. The following features distinguish our scheme from others.

The benefit that a node can get from the system is limited by its contribution
to the system. The members in a colluding group cannot increase their total money or
aggregate reputation by cooperation, regardless of the group size, and malicious nodes can
only attack others at the cost of their own interest.

We design a distributed authority infrastructure to manage the nodes' history
information with low overhead and high security.

We design a key sharing protocol and a contract verification protocol based on the
threshold cryptography to implement the proposed distributed incentive scheme.

1.3.3 Capacity-Aware Multicast Algorithms on Heterogeneous Overlay
Networks

We extend Chord, and Koorde to support application-level multicast, which has

following properties.

Capacity awareness: Member hosts may vary widely in their capacities in terms of
network bandwidth, memory, and CPU power. Some are able to support a large number of
direct children, but others support few.

Any source multicast: The system should allow any member to send data to other
members. A multicast tree that is optimal for one source may be bad for another source.
On the other hand, one tree per member is too costly.

Dynamic membership: Members may join and leave at any time. The system must
be able to efficiently maintain the multicast trees for a dynamic group.

Scalability: The system must be able to scale to a large Internet group. It should be
fully distributed without a single point of failure.









CHAPTER 2
A HYBRID QUERY SCHEME TO SPEED UP QUERIES IN UNSTRUCTURED
PEER-TO-PEER NETWORKS

2.1 Motivation

2.1.1 Problems in Prior Work

Small communication overhead and short response time are the two main concerns in designing efficient query schemes in peer-to-peer networks. Current approaches struggle to achieve a good tradeoff between communication overhead and response time due to the blindness of their search procedures.

Flooding: Flooding [48, 49] is a popular query scheme for locating a data item in fully unstructured P2P networks such as Gnutella. While flooding is simple and robust, its communication overhead, that is, the number of messages, increases exponentially with the number of hops. In addition, most of these messages visit nodes that have already been searched in the same query, and can be regarded as duplicates. Consequently, communication overhead and scalability are the main problems of the flooding approach.

Random Walks: Random walks [8-10, 50] rely on query messages randomly

selecting their next hops among neighbors to reduce the communication overhead. A query

may have to go through many hops before it successfully locates the queried data item.

Consequently, this approach may take a long time. If the network is well-clustered (nodes with similar interests are densely connected), one might expect the query latency to be reduced significantly. However, this is not the case, because the chance of a random walk message escaping from its original cluster increases exponentially with the ratio r of the number of inter-cluster edges to the number of intra-cluster edges, as shown in Figure 2-1.

In the case of a network with a small value of r (e.g., r < 0.01), if queried data items

are in different clusters from the source node, a query message has to walk a long distance

to be able to traverse the cluster border and locate the queried data items. In the case

of a network with a large value of r (e.g., r > 0.1), query requests may escape out of the











Figure 2-1. Number of hops for uniform random walks to escape the cluster decreases
exponentially with the ratio of the number of inter-cluster edges to the
number of intra-cluster edges.


original cluster within a small number of hops, resulting in a long response time if the

queried data is in the original cluster. These observations also are demonstrated by our

simulations in Section 3.7. Consequently, random walks may suffer long response times regardless of whether the network is well-clustered or not.

Interest-based shortcut: Interest-based shortcut [11] tries to avoid the blindness of random walks by favoring nodes sharing similar interests with the source, and can be regarded as a variation of Markov random walks. Markov random walks may accelerate the query process to some extent in some cases. However, they cause new problems. Suppose the nodes in an interest group have formed a cluster, and query messages can be artificially confined to this specific cluster. Since the nodes in the cluster share similar interests, any of them may maintain the queried data. The query procedure should therefore shorten the covering time of the whole cluster instead of the hitting time of some specific nodes in it. However, due to the bias in selecting the next hop, Markov random walks tend to keep visiting some specific nodes, covering fewer distinct nodes than uniform random walks, as illustrated by Figure 2-2. Consequently, Markov random walks work worse than uniform random walks when query messages can be confined to specific clusters.











Figure 2-2. Markov random walks discover fewer distinct nodes than uniform random
walks do.


2.1.2 Motivation

Researchers [11, 51, 52] have found that many peer-to-peer networks exhibit small-world topology, and that most queried data items are offered by nodes that share similar interests with the source node.

Intuitively, the nodes sharing similar interests with the source node should have

higher priority to be searched than others. Practically, there are two challenges in

designing such a query scheme. The first one is how to construct a small-world topology

to cluster nodes sharing similar interests. By "similar interests", we actually mean that two nodes are interested in a common set of data items. The number of commonly accessed data items can serve as a metric to measure the interest similarity between two nodes. A clustering algorithm based on this metric can be easily designed

to densely connect the nodes in the same interest group. Moreover, each node u can

explicitly pick up a set of inter-cluster neighbors that have different interests from u, and

a set of intra-cluster neighbors that share similar interests with u. Take Figure 2-3 as an

example. The network consists of 5 clusters, and nodes in the same cluster fall into the

same interest group. Note that there exists an interest group consisting of two clusters: 1,

and 5.









Suppose the network has been well-clustered, and each node explicitly maintains a set

of inter-cluster and intra-cluster neighbors. The second challenge is how to quickly locate the clusters that share similar interests with the source node, and how to exhaustively search the nodes in the found clusters if the queried data items are in the source node's interest group. We introduce two types of queries: inter-cluster queries and intra-cluster queries.

The inter-cluster queries serve to discover the clusters that share similar interests with the source node; they are issued by the source node, carry the interest information, and only travel on inter-cluster neighbors. It can be expected that clusters sharing similar interests with the source node can be located quickly, because the number of clusters is much smaller compared to the network size, and inter-cluster queries only

travel among different clusters. The intra-cluster queries are spawned by inter-cluster

queries when a cluster sharing similar interests with the source node is hit. An intra-cluster query thoroughly goes through the nodes in the found cluster, where it is spawned, by only

traveling on intra-cluster neighbors. Note that inter-cluster queries and intra-cluster

queries can be easily implemented if each node explicitly knows the types of its neighbors.

Occasionally, queried data may be out of the source node's interest group, and

possibly maintained by a cluster(s) with different interests. This problem is addressed

by blind search: inter-cluster messages randomly spawning intra-cluster messages when

hitting clusters with different interests.

For example, in Figure 2-3, inter-cluster queries are first initiated by a node in cluster 1, and travel among different clusters. By the interest information carried in the inter-cluster queries, cluster 5 is found to share similar interests when it is hit, and an intra-cluster

query is spawned, which then will exhaustively search the nodes in it. In addition, an

intra-cluster query is spawned in cluster 2 by inter-cluster queries to support blind search.












Figure 2-3. A query scheme mixing inter-cluster queries and intra-cluster queries (the
nodes in the grey clusters fall into the same interest group).


2.2 Constructing a Small-world Topology

2.2.1 Measuring the Interest Similarity between Two Nodes

Clusters are generally formed by connecting nodes with similar interests in a network.

We start our discussion with the definition of interest similarity between two nodes in P2P

networks.

If node u and node v share similar interests, then it is very likely that they have previously accessed some of the same data items. The size of the common subset of accessed data items can serve as a metric to measure to what extent the interests of two nodes are similar.

Each node may offer hundreds of data items, and hence, there may exist a large number of data items even in a small network. As a result, u and v can only show some degree of similarity if each of them has visited a large number of data items. An alternative is to evaluate the interest similarity by the number of commonly accessed nodes, which may enable a clustering algorithm to converge faster than the former approach. The problem with this approach is that two nodes visiting a common node does

not indicate they have similar interests, because a node may offer data items belonging

to multiple interest groups. For instance, a user u may offer resources for two groups: a number of mp3 music files for one group, and a number of research papers on P2P networks for the other group. It is possible that two nodes that have visited u may be









interested in data items of different interest groups. We have to address the discrepancy

between the common set of accessed nodes and the common set of visited data items.

Suppose there are n nodes N = {1, 2, ..., n} in the whole P2P network, and a node i offers a number of data items to the community. It categorizes (maps) all of these data items into a_i different categories, denoted as C^i = {c^i_1, c^i_2, ..., c^i_{a_i}}. Suppose a data item x in i is mapped to a category c^i(x), where c^i(x) ∈ C^i. How to categorize the data items is determined by the node i independently. For instance, node i may classify music files as category 1, while another node may classify music files as category 2. On the other hand, node i may fall into multiple interest groups, denoted as G^i = {g^i_1, g^i_2, ..., g^i_{b_i}}.

For a node u, the access history with respect to each of its interest groups (e.g., g^u_j) can be specified by a set of data items x, each denoted as (i, c^i(x)), where i represents the node offering the data item, and c^i(x) is the category in C^i defined by i. If two nodes u and v share "similar interests" (e.g., g^u_j = g^v_k), their histories for g^u_j and g^v_k tend to consist of a common set of (i, c^i(x)).

Note that in the above definitions, each node determines its interest groups and

categories independently, indicating a node need not maintain any global information.

For easy explanation, we study a basic approach by assuming each user only falls

into one interest group, and offers one category of data items. In this scenario, the access

history can be represented by the accessed nodes alone. This approach can be easily extended to the multi-category, multi-group case based on the definitions above.

One node u may access another node multiple times for different data items, and hence, the access history of node u can be represented by a vector V^u = (v^u_1, ..., v^u_n),^1 where v^u_x represents the number of times u has visited node x. To cancel out the number of queries a node has issued, the access vector V^u is normalized to the frequency vector



1 The real size of the data structure maintaining V^u is much smaller than the network
size n, and can be fixed to only record the nodes accessed most frequently by u.


















Figure 2-4. Interest similarity between nodes. The numbers of data items in nodes 1, 2, and n
are 50, 100, and 200, respectively. A) No common visited nodes (different
interests). B) u and v have visited node 2 (100 data items) 8 and 5 times
respectively (a certain level of similar interests). C) u and v have visited node
n (200 data items) 8 and 5 times respectively (falls in between).


F^u, in which the i-th element f^u_i is computed from V^u as f^u_i = v^u_i / Σ_{j∈N} v^u_j, representing the frequency with which the corresponding node i has been accessed.

Note that each value in F^u falls into the range [0, 1]. If u has never accessed node i, the corresponding element f^u_i is equal to 0. The summation of all elements is equal to 1.

Furthermore, if the number of data items in node i, denoted as d_i, is large, the chance that two nodes u and v have visited common data items in i may be small even if both of them have visited i multiple times. As an example, in B and C of Figure 2-4, u and v have visited one common node. But u and v in B have a higher chance of having visited common data items because the number of data items in node 2 is half of that in node n. To account for this issue, we introduce a weighted diagonal matrix W whose (i, i)-th value w_ii is equal to 1/d_i. It represents the probability of u and v visiting a common data item, if both of them visit i once.

Now we define the following metric to evaluate the interest similarity between two nodes (take Figure 2-4 (B) as an example):

A^{uv} = F^{uT} W F^v
       = (.2, .8, 0, ..., 0) · diag(.02, .01, ..., .005) · (0, .5, 0, ..., .5)^T
       = 0.004

Similarly, the interest similarity in Figure 2-4 (C) is 0.002, which means the nodes in B show more interest similarity.

From the definition, we have

A^{uv} = F^{uT} W F^v = Σ_{i∈N} (1/d_i) f^u_i f^v_i        (2-1)

If we view f^u_i and f^v_i as the probabilities of nodes u and v visiting node i, and 1/d_i as the probability of u and v visiting a common data item if both of them visit i, then the summation A^{uv} can be used to predict the probability that both u and v will visit a common data item in their future queries.

Note that if a node i has not been visited by both u and v, then f^u_i = 0 and/or f^v_i = 0, indicating that a node does not need to maintain any information about the nodes it has never visited. As we have discussed, the size of the vector V^u is fixed. Hence, both the storage for the access history and the computation overhead for A^{uv} are constant.

Our definition is advantageous in several ways. First, nodes u and v need not maintain any global information to compute A^{uv}. Second, the frequency vector cancels out the effect of the number of queries that a node has issued, enabling a clustering algorithm based on this definition to converge quickly with the average number of queries. Third, the definition prefers nodes with good properties. For instance, if two nodes i and j maintaining the same data items can be accessed at 1 MB/s and 56 Kb/s respectively, i obviously will be accessed more often, resulting in larger values of f^u_i and f^v_i. Fourth, it reduces the impact of possible discrepancies in the category definitions of different nodes. For instance, if a category in node i is poorly defined and consists of data items belonging to various interest groups, then the category will be seldom accessed compared to its size, resulting in a small value of f^u_i f^v_i.
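To make the metric concrete, the following sketch computes A^{uv} from two access-count vectors, reproducing the Figure 2-4 (B) example. The specific access counts (u visiting nodes 1 and 2, v visiting nodes 2 and n) are illustrative assumptions consistent with the frequencies in the figure.

```python
# Sketch of the interest-similarity metric A^{uv} = sum_i f_i^u f_i^v / d_i
# (equation 2-1). The access counts below are illustrative assumptions.

def frequency_vector(access_counts):
    """Normalize an access-count vector V^u into a frequency vector F^u."""
    total = sum(access_counts.values())
    return {node: count / total for node, count in access_counts.items()}

def interest_similarity(Vu, Vv, num_items):
    """A^{uv}: only nodes visited by both u and v contribute, each term
    weighted by 1/d_i, the chance of hitting a common item in node i."""
    Fu, Fv = frequency_vector(Vu), frequency_vector(Vv)
    return sum(Fu[i] * Fv[i] / num_items[i] for i in Fu if i in Fv)

# Figure 2-4(B): nodes 1, 2, and n offer 50, 100, and 200 data items.
num_items = {1: 50, 2: 100, 'n': 200}
Vu = {1: 2, 2: 8}    # u visited node 1 twice and node 2 eight times
Vv = {2: 5, 'n': 5}  # v visited node 2 and node n five times each
print(round(interest_similarity(Vu, Vv, num_items), 6))  # 0.004
```

Only node 2 is visited by both, so the single term 0.8 × 0.5 / 100 yields the 0.004 of the example.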

2.2.2 Clustering Nodes with Similar Interests

Given the metric to evaluate the interest similarity between two nodes, we propose a

light-weight clustering algorithm to connect nodes sharing similar interests.

In our strategy, each node i maintains a list L of limited size (e.g., 30) to record the nodes that possibly share the same interests with itself. Each time a query message is processed, the similarity between the querying node and the node that owns the data items is computed. The newly obtained interest similarity and the corresponding node's address are inserted into the list L. If the list is full, the stored neighbor with the lowest interest similarity is dropped.

By assuming that the interests of nodes do not shift within a limited time frame, the nodes collected in L likely fall into the same interest group as i, and will serve as candidates for its intra-cluster neighbors.
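This bookkeeping can be sketched as follows; the list size of 30 comes from the text, while the min-heap representation of L is our own implementation choice.

```python
import heapq

L_SIZE = 30  # bounded candidate-list size suggested in the text

def update_candidates(L, similarity, node_addr):
    """Insert a (similarity, address) pair into the candidate list L,
    dropping the stored neighbor with the lowest similarity when full.
    L is kept as a min-heap keyed by similarity, so eviction is O(log n)."""
    heapq.heappush(L, (similarity, node_addr))
    if len(L) > L_SIZE:
        heapq.heappop(L)  # evict the least-similar candidate
    return L

# Usage: after each processed query, record the data owner's similarity.
L = []
for i in range(40):
    update_candidates(L, i / 100.0, f"node-{i}")
print(len(L), min(L)[0])  # 30 0.1
```

After 40 insertions, only the 30 most similar candidates remain, the ten least-similar having been dropped one by one.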

2.2.3 Bounding Clusters

Although a small-world topology can be formed along with queries by the above

clustering algorithm, existing query schemes such as random walks can only benefit

marginally from it as we discussed in Section 3.1.

To exploit the characteristics of the small world topology, our approach is to explicitly

capture the clusters in the underlying topology by each node i maintaining a set of

inter-cluster neighbors in other interest groups, and intra-cluster neighbors in its own

interest group.









For inter-cluster neighbors, a node i can learn them easily. For example, i can issue

a certain number of random walk messages only traveling on other nodes' inter-cluster

neighbors, and choose the nodes hit by the messages as its inter-cluster neighbors. Note

that the list L collects candidates of its intra-cluster neighbors, and should not overlap

with the set of inter-cluster neighbors.

It is most important to learn the intra-cluster neighbors, which can be selected from the nodes collected in the list L. The purpose of intra-cluster neighbors is to confine intra-cluster queries within a specific interest group. Two nodes falsely regarded as intra-cluster neighbors (a false positive) may create a dramatic impact, because an intra-cluster query may traverse to another cluster with different interests. In contrast, two nodes i and k falsely regarded as not being intra-cluster neighbors (a false negative) will only have limited impact, because i and k may be connected by an intermediate intra-cluster neighbor j. In addition, the chance of i and k falling into the same cluster tends to grow as queries proceed if they are in the same interest group.

Based on this observation, we propose a labeling algorithm, which ensures that if a

link (i,j) is labeled as an intra-cluster edge, then i and j are in the same interest group

with high probability.

For a node i, we normalize the interest similarity of its neighbors j in L as follows.

p_ij = A^{ij} / Σ_{k∈L} A^{ik}

If a matrix P is organized such that its (i, j)-th element is p_ij, then each row of P sums to 1, i.e., the matrix P is row stochastic. Intuitively, p_ij can be viewed as the transition probability of a Markov random walk.

The transition probability p_ij can serve as a good metric to determine whether i and j are in the same interest group by introducing a threshold, denoted as T, as a lower bound. T can be set to a relatively large value, because false negatives have limited impact as discussed. Suppose there are a neighbors in L that are possibly in the same interest group as i; T can then be set to 1/a. Note that p_ij and T are computed by node i locally. The labeling algorithm does not involve any extra communication.
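A minimal sketch of this local labeling step, under our reading that the threshold is T = 1/a; the similarity values and the choice of a below are illustrative assumptions.

```python
def label_neighbors(similarities, a):
    """Normalize node i's similarities A^{ij} over its candidate list L
    into transition probabilities p_ij, then label neighbor j as
    intra-cluster when p_ij >= T, with T = 1/a where a is the expected
    number of same-group candidates in L (our reading of the text)."""
    total = sum(similarities.values())
    T = 1.0 / a
    return {j: A / total >= T for j, A in similarities.items()}

# Illustrative candidate list for node i: similarities to three neighbors.
labels = label_neighbors({'j1': 0.006, 'j2': 0.003, 'j3': 0.001}, a=2)
print(labels)  # {'j1': True, 'j2': False, 'j3': False}
```

Here p_{i,j1} = 0.6 clears the bound T = 0.5, so only the edge to j1 is labeled intra-cluster; no messages are exchanged.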

2.3 A Hybrid Query Scheme

2.3.1 Mixing Inter-Cluster Queries and Intra-Cluster Queries

By explicitly capturing the cluster structures in the underlying network, we can formally define the following three types of query messages.

The first one is called the l-query message, which is a special type of inter-cluster message that only travels on inter-cluster neighbors. Its purpose is to quickly locate the clusters that may share similar interests with the source node and to disperse intra-cluster queries among different clusters. Messages of this type are issued by the source node and walk among different clusters randomly. Moreover, if the queried data is in the source node's interest group, l-query messages should piggyback the source node's frequency vector, so that nodes hit by the messages can determine whether their clusters share similar interests with the source node.

The second one is called the s-query message, which is a special type of intra-cluster message that confines itself within a specific cluster by performing uniform random walks only on intra-cluster neighbors. s-query messages are only spawned or issued in clusters that share similar interests with the source node. Their purpose is to exhaustively search the nodes that fall into the same interest group as the source node. Messages of this type are issued by the source node if the query is in its own interest group, and/or spawned by l-query messages when clusters sharing similar interests with the source node are hit.

In order to reduce the number of duplicate s-query messages, each node having

received the message should be able to estimate to what extent the cluster has been

covered. If most of nodes have been visited, an s-query message has little chance

to discover any new nodes by continuing to walk in the cluster, indicating the new

received message should be discarded. Otherwise, the message should be forwarded.









Accurately estimating the covering time of a cluster is difficult and resource-consuming in

a distributed system. Heuristically, if the message has been sequentially hitting a certain

number, denoted as h, of nodes that have been visited by previous intra-cluster messages,

most of the nodes may have been covered, and hence, the message should be discarded. All s-query messages need to maintain a counter to keep track of the sequential number of nodes that have been visited by previous intra-cluster messages. The counter of each newly spawned s-query message is set to 1 or 0 depending on whether the starting node has been hit before or not. Note that the total (not sequential) number of nodes hit by previous intra-cluster queries is not a good metric for this estimation, because it heavily depends on the cluster size.

The last one is called the b-query message, which is also a special type of intra-cluster message similar to the s-query message. The difference of b-query messages

from s-query messages is that b-query messages may be spawned in clusters that have

different interests from the source node. The purpose of it is to support blind search,

because occasionally, the queried data may be out of the source node's interest group. The

messages in this type are issued by the source node if the query is out of source's interest

group, and/or possibly spawned by 1-query messages, when clusters that have not been

visited by intra-cluster query messages are hit. The chance that the queried data item is in a cluster with different interests is very small. Once a b-query message hits a node that

has been visited by intra-cluster messages before, the message is discarded to reduce the

number of duplicated messages.

To control the communication overhead, the total number of concurrent query

messages has to be limited. The source node needs to count the numbers of l-query, s-query, and b-query messages, denoted as m_l, m_s, and m_b respectively. Whenever an intra-cluster message, such as an s-query or b-query message, is spawned or discarded, the source node needs to be notified to update the corresponding counter. Only if the summation of m_l, m_s, and m_b is smaller than a certain number, denoted as m, can a new b-query message be spawned to support blind search. s-query messages can be spawned without restriction, so the total number of concurrent messages may exceed m temporarily. Note that the counter m_l does not change during a query. In addition, all messages need to periodically check the status of the source node so that they can stop if the query has been successfully answered.

With these three types of messages, our query scheme is designed as follows.

Initialization: To initiate a query request, a node u issues a number m_l of l-query messages. If the queried data item falls in u's interest group, the l-query messages carry the source's frequency vector, and a certain number m_s of s-query messages are issued to exhaustively search u's own cluster. Otherwise, the messages do not carry any interest information, and a b-query message is issued.

Receiving an l-query message: When a node u receives an l-query message, it calculates the interest similarity with the source node. If u shares similar interests (e.g., the similarity is larger than a small value), it spawns a new s-query message and updates m_s maintained by the source node. Otherwise, a new b-query message is spawned if the node has not been hit by other intra-cluster messages and m_l + m_s + m_b < m. Finally, node u forwards the received message to a randomly selected inter-cluster neighbor.

Receiving an s-query message: When a node u receives an s-query message, if u has already been hit by intra-cluster messages, it increases the counter in the message by 1. Otherwise, it resets the counter to 0. Next, if the counter is larger than the threshold h (e.g., 10), the node discards the message and notifies the source node to update the counter m_s. Otherwise, it forwards the message to a randomly selected intra-cluster neighbor.

Receiving a b-query message: When a node u receives a b-query message, if it has already been hit by intra-cluster messages, the node discards the message and notifies the source node to update the counter m_b. Otherwise, it forwards the message to a randomly selected intra-cluster neighbor.

Our scheme can be considered stateful: if the same queries are reissued multiple times, fewer intra-cluster queries will be spawned in the clusters that have already been well searched, and in contrast, more intra-cluster queries will be spawned in the less-searched clusters, resulting in a stronger ability to discover more resources/replicas.
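The per-message rules above can be sketched roughly as follows. The node/message structures are our own simplifications (e.g., a single visited flag standing in for "hit by previous intra-cluster messages"); the threshold h = 10 is taken from the text.

```python
import random
from dataclasses import dataclass, field

H = 10  # threshold h on sequentially revisited nodes (from the text)

@dataclass
class Node:
    visited: bool = False  # hit by a previous intra-cluster message?
    intra_neighbors: list = field(default_factory=list)

@dataclass
class SQuery:
    counter: int = 0  # sequential count of already-visited nodes

def handle_s_query(msg: SQuery, node: Node):
    """Forwarding rule for s-query messages: returns the next-hop
    neighbor, or None when the message is discarded (the source is then
    notified to decrement its s-query counter m_s)."""
    if node.visited:
        msg.counter += 1   # another already-covered node in a row
    else:
        msg.counter = 0    # fresh node: reset the sequential counter
        node.visited = True
    if msg.counter > H:
        return None        # cluster looks mostly covered; discard
    return random.choice(node.intra_neighbors)

def handle_b_query(node: Node):
    """Forwarding rule for b-query messages: discard on the first
    already-visited node, otherwise keep walking intra-cluster."""
    if node.visited:
        return None
    node.visited = True
    return random.choice(node.intra_neighbors)
```

The asymmetry between the two handlers mirrors the text: an s-query tolerates up to h consecutive revisits before giving up, while a b-query, whose hit probability is low anyway, is discarded on the first revisit.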

2.3.2 Reducing the Communication Overhead

By mixing inter-cluster and intra-cluster queries, the system performance can be expected to improve significantly. Because the access vector V^u of a node u can be fixed to a small size, the extra overhead will not increase much (only l-query messages need to carry the frequency vector).

Moreover, l-query messages only travel among different clusters, and the number of clusters, especially in a well-clustered network, is much smaller compared to the network size. It can be expected that most clusters can be covered by l-query messages within a small number of hops. l-query messages can remove the frequency vector from their payloads after a certain number of hops. Meanwhile, a source node can specify one l-query message to keep the frequency vector, in case some clusters sharing similar interests with the source node have not been discovered after the specified number of hops.

2.4 Simulation

In this section, the performance of the proposed clustering algorithm and query

scheme is studied by simulations. Unless explicitly stated otherwise, the number of nodes is 10,000, and the average group size is 150. Each node maintains 1,000 data items, the average number of queries issued by each node is 30, the threshold h is equal to 10, and the probability of a node incorrectly classifying its queries or data items is 0.1. Moreover, m and m_l are set to 32 and 16 respectively.











Figure 2-5. Effect of average query number on the cluster size.


We compare our scheme to random walks, in which a source node correspondingly issues 32 random walk messages for each query. We have also compared our scheme to flooding schemes; as expected, we observe that the flooding schemes suffer from very large communication overhead.

In the figures, the legends "Uniform random walks (0)" and "Uniform random walks (1)" refer to the queried data items being out of or in the source node's interest group under the uniform random walks query scheme, and similarly "Inter-intra (0)" and "Inter-intra (1)" refer to the queries being out of or in the source node's interest group under the proposed scheme.

First, we study the effectiveness of the interest-similarity metric and the clustering algorithm. In Figure 2-5, it is observed that when the average query number is larger than 10, the algorithm reaches a stable state and almost all nodes in the same interest group form a single cluster. This indicates that our algorithm converges quickly with the average number of queries, which is especially useful in P2P networks, where nodes tend to join/leave the system frequently.

From Figure 2-6, it can be observed that the average number of nodes in a cluster is almost the same as the group size, demonstrating that A^{uv} can effectively measure the nodes' interest similarity.



















Figure 2-6. Interest association is a good metric to estimate interest similarity



Figure 2-7. Percentage of returned query within a specific hop number.











Figure 2-8. Percentage of returned query within a specific message number.


Second, we study the performance of our scheme with respect to query latency and communication overhead. In Figure 2-7, it is observed that if the queried data items fall into the source node's interest group, the number of hops needed for the majority of the queries is significantly reduced to about 20, while uniform random walks take much longer. Correspondingly, the number of messages is also much smaller in our scheme than in random walks, as shown in Figure 2-8. The figures also show that if the queried data are out of the source node's interest group, the performance of our scheme is similar to uniform random walks. Note that a longer response time is acceptable, since only a few queries will be out of the source node's interest group in many P2P networks. In addition, these two figures also demonstrate that random walks for queries in the source node's interest group benefit only marginally from the underlying clustered topology; that is, only a slightly larger percentage of them are returned than those out of the source node's interest group within the same number of hops (messages).

We have also studied the performance of a network in which each group consists of multiple different clusters, as shown in Figure 2-9 and Figure 2-10. The results show similar trends, which hold for all other metrics studied later.

Moreover, comparing Figure 2-7 with Figure 2-9, and Figure 2-8 with Figure 2-10, it can


























Figure 2-9. Percentage of returned query within a specific hop number in a less
clustered network.



Figure 2-10. Percentage of returned query within a specific message number in a less
clustered network.












Figure 2-11. Number of distinct nodes discovered in the same group within a certain
message range.



be observed that the performance of random walks in the two different (well-clustered and poorly clustered) networks is similar, which further verifies our argument in Section 3.1.

As observed, when the queried data items are in the source node's interest group, our scheme works much better than random walks. The reason is that

our scheme can discover more distinct nodes in the source nodes' interest groups within

the same number of messages or hops, as shown in Figure 2-11 and Figure 2-12. In the

figures, it can be observed that within the first 1,000 messages or 30 hops, more than

120 nodes in the source node's interest group have been searched by query messages.

Consequently, the majority of queried data falling into the node's interest group can be

found with smaller overhead, and shorter latency. It also indicates our scheme has stronger

ability to locate more replicas, since it can discover a much larger number of nodes sharing

similar interests.

Occasionally, the queried data item may be maintained by nodes in other interest

groups, or classified into wrong interest group by source node. In the former case, 1-queries

will not carry any interest information, but in the latter case, 1-queries will carry wrong

interest information. In both cases, the efficiency of our query scheme can be evaluated






















Figure 2-12. Number of distinct nodes discovered in the same group within a specific hop number.


Figure 2-13. Percentage of messages discovering distinct nodes within a certain message range.










Figure 2-14. Total number of distinct nodes discovered within a specific hop number.


by the number of distinct nodes discovered by queries, including those outside the source node's interest group, within a certain number of messages and hops. Note that whether the queries are in or out of the original node's interest group makes no difference to random walks. Figure 2-13 and Figure 2-14 show that in the first 1,000 messages, if the queries carry interest information, fewer distinct nodes can be searched in our scheme. The reason is that s-query messages mistakenly and exhaustively search the nodes in the clusters that share similar interests in the beginning, which has been demonstrated by our previous simulations. Consequently, the number of b-query messages is limited. As the number of messages/hops increases, our scheme performs similarly to uniform random walks. This is because after most of the nodes sharing similar interests are covered, more b-query messages will be spawned to search clusters with different interests, which are able to discover more distinct nodes. In addition, the figures show that if the queries carry no interest information, our scheme performs similarly to uniform random walks.









CHAPTER 3
MARCH: A DISTRIBUTED INCENTIVE SCHEME IN OVERLAY NETWORKS

3.1 Motivation

3.1.1 Limitation of Prior Work

Any node in a peer-to-peer network is both a service provider and a service consumer.

It contributes to the system by working as a provider, and benefits from the system as a

consumer. A transaction is the process of a provider offering a service to a consumer, such

as supplying a video file. The purpose of an incentive scheme is to encourage the nodes

to take the role of providers. Neither reputation systems [12-17] nor virtual currency

systems [18, 19] can effectively prevent malicious nodes, especially those in collusion, from

manipulating their history information by using false service reports. Specifically, the

existing schemes have the following problems.

Reputation inflation: In the reputation schemes, malicious nodes can work together

to inflate each other's reputation or to defame innocent nodes, by which colluding nodes

protect themselves from the complaints by innocent nodes as these complaints may be

treated as noise by the systems.

Money depletion: In the virtual currency schemes, malicious nodes may launch

attacks to deplete other nodes' money and paralyze the whole system. Without authentic

reputation information, innocent nodes are not able to proactively select benign partners

and avoid malicious ones.

Frequent complainer: In many incentive schemes, nodes will be punished if they

complain frequently, which prevents malicious nodes from constantly defaming others at

no cost. It also discourages innocent nodes from reporting frequent malicious acts because

otherwise they would become frequent complainers.

Punishment scale: In most existing schemes, the scale of punishment is related to

the service history of the transaction participants. Consequently, an innocent node may be

subject to negative discrimination attacks [12] launched by nodes with excellent history.









3.1.2 Motivation

Punishing malicious nodes and limiting the damage caused by a colluding group are

indispensable requirements of an incentive scheme that is able to deter bad behavior.

There are two major kinds of bad behavior. First, a provider may deceive a consumer by

providing less-than-promised service. Second, a consumer may defame a provider by falsely

claiming the service is poor.

Consider how these problems are dealt with in real life. Before a transaction happens,

the provider would want to know if the consumer has enough money to pay for the

service, and the consumer would want to know the reputation of the provider. With such

information, they can control the risk and decide whether to carry out the transaction

or not. After the transaction, if the provider deceives, it will be sued by the consumer.

Consequently, the malicious provider will build up bad reputation, which prevents it from

deceiving more consumers. Now consider a consumer that intentionally defames a provider. It does so only after it can show the evidence of a transaction, which requires it to pay money first. Consequently, defaming comes with a cost. The ability of the malicious consumer to defame others is limited by the amount of money it has.

Inspired by the observation above, we propose a new incentive scheme: MARCH,

which is a combination of Money And Reputation sCHemes.

The basic idea behind the scheme is simple: each node is associated with two

parameters: money and reputation. The providers earn money (and also reputation)

by serving others. The consumers pay money for the service. If a consumer does not think the received service is worth the money it has paid, it reports to an authority, specifying

the amount of money it believes it has overpaid. If the authority can determine who is

lying, the liar is punished. Otherwise, the authority freezes the money claimed to have

been overpaid. The money will not be available to the provider and will not be returned to

the consumer either, which eliminates any reason for the consumer to lie. If the provider

is guilty, the consumer has its revenge and the provider's reputation suffers. If the









provider is innocent, the consumer does it at a cost because after all it has paid the price

of the transaction. In addition, the falsely-penalized provider will not serve it any more.

The technical challenges are (1) establishing a distributed authority for managing the

money and reputation, (2) designing the protocol of transaction that ensures authentic

exchange of money/reputation information and allows the unsatisfied consumers to sue the

providers, (3) analyzing the properties of such a system, and (4) evaluating the system.

3.2 System Model

The nodes in a P2P network fall in three categories: honest, selfish, and malicious.

Honest nodes follow the protocol exactly, and they both provide and receive services.

Selfish nodes will break the protocol only if they can benefit. Malicious nodes are willing

to compromise the system by breaking the protocol even when they benefit nothing and

may be punished. Selfish/malicious nodes may form colluding groups. There may exist

a significant number of selfish nodes, but the malicious nodes are likely to account for

a relatively small percentage of the whole network. At a certain time, all selfish/malicious

nodes that break the protocol are called dishonest nodes. A node is said to be rejected

from the system if it has too little money and too poor reputation such that no honest

providers/consumers will perform transaction with it. We study the incentive scheme

in the context of DHT-based P2P networks [1, 2, 4]. We assume the routing protocol is

robust, ensuring the reliable delivery of messages in the network [53]. We also assume the

networks have the following properties.

Random, non-selectable identifier: A node cannot select its identifier, which should be arbitrarily assigned by the system. This requirement is essential to defending against the Sybil attack [54]. One common approach is to hash a node's IP address to derive a random identifier for the node [1].

Public/private key pair: Each node A in the network has a public/private key pair,
denoted as PA and SA respectively. A trusted third party is needed to issue public-key
certificates. The trusted third party is used off-line once per node for certificate issuance,
and it is not involved in any transaction.









3.3 Authority Infrastructure


3.3.1 Delegation

Who will keep track of the money/reputation information in a P2P network? In

the absence of a central authority for this task, we design a distributed authority

infrastructure. Each node A is assigned a delegation, denoted as DA, which consists

of k nodes picked pseudo-randomly. For example, we can apply k hash functions, i.e., {h1, h2, ..., hk}, on the identifier of node A to derive the identifiers of nodes in DA. If a derived identifier does not belong to any node currently in the network, the "closest" node is selected. For example, in [1], it will be the node clockwise after the derived identifier on the ring. The j-th element in DA is denoted as DA(j).

DA keeps track of A's money/reputation. Any anomaly in the information stored

at the delegation members may indicate an attempt to forge data. The information is

legitimate only if the majority of the delegation members agree on it. As long as the majority of the delegation members are honest, the information about node A cannot be forged. Such a delegation is said to be trustworthy. On the other hand, if at least half of the members are dishonest, then the delegation is untrustworthy.

The delegation members are appointed pseudo-randomly by the system. A node

cannot select its delegation members, but can easily determine who are the members

in its or any other node's delegation. To compromise a delegation, the malicious/selfish

nodes from a colluding group must constitute the majority of the delegation. Unless

the colluding group is very large, the probability for this to happen is small because the

identifiers of the colluding nodes are randomly assigned by the system and the identifiers

of the delegation are also randomly assigned. Let m be the size of a colluding group and n

be the total number of nodes in the system. The probability for t out of k nodes to be in

the colluding group is

P(t, k, m/n) = C(k, t) * (m/n)^t * (1 - m/n)^(k-t)










Figure 3-1. Trustworthy probability for a delegation and a 5-pair delegation set with n = 100,000; at m* = 3,000 the probabilities are 99.97% and 99.81%, respectively. Even if a delegation/k-pair delegation set is not trustworthy, it may not be compromised, because it is very unlikely that a single colluding group can control the majority of them.


where P(t, k, p) denotes the probability of t successes in k trials in a Binomial distribution with the probability of success in any trial being p, and C(k, t) is the binomial coefficient. Let m* be the total number of distinct nodes in all colluding groups, also including all malicious nodes. The probability of a delegation being trustworthy is at least the sum of P(t, k, m*/n) over t = 0, ..., floor(k/2). We plot the trustworthy probability with respect to m* when k = 5 in Figure 3-1 (the upper curve). In order to control the overhead, we shall keep the value of k small.

3.3.2 k-pair Trustworthy Set

A transaction involves two delegations, one for the provider and the other for the

consumer. They have to cooperate in maintaining the money and reputation information,

and avoiding any fraud. To facilitate the cooperation, we introduce a new structure, called

k-pair delegation set, consisting of k pairs of delegation members. Suppose node A is

the provider and node B is the consumer. The ith pair is (DA(i), DB(i)), for all i in [1..k], and

the whole set is


{(DA(1), DB (1)), (DA (2), DB (2)), ..., (DA (k), DB (k))}









If both DA(i) and DB(i) are honest, the pair (DA(i), DB(i)) is trustworthy. If the majority of the k pairs are trustworthy, the whole set is trustworthy. It can be easily verified that the probability for the whole set to be trustworthy is at least the sum of

P(t, k, 2m*/n - (m*/n)^2)

over t = 0, ..., floor(k/2).

We plot the trustworthy probability for the whole set with respect to m* in Figure 3-1

(the lower curve).
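To make these probabilities concrete, the following sketch (our illustration, not part of the protocol; function names are our own) evaluates both expressions with Python's `math.comb` and reproduces the values highlighted in Figure 3-1:

```python
from math import comb

def binom_tail(k: int, p: float) -> float:
    """Probability of at most floor(k/2) successes in k Bernoulli(p) trials,
    i.e., dishonest nodes fail to reach a majority."""
    return sum(comb(k, t) * p**t * (1 - p)**(k - t) for t in range(k // 2 + 1))

def delegation_trustworthy(k: int, m_star: int, n: int) -> float:
    # Each delegation member is colluding with probability m*/n.
    return binom_tail(k, m_star / n)

def pair_set_trustworthy(k: int, m_star: int, n: int) -> float:
    # A pair fails if either member is colluding: 1 - (1 - m*/n)^2.
    p = m_star / n
    return binom_tail(k, 2 * p - p * p)

# Parameters matching Figure 3-1: n = 100,000, m* = 3,000, k = 5.
print(delegation_trustworthy(5, 3000, 100000))  # about 0.9997
print(pair_set_trustworthy(5, 3000, 100000))    # about 0.9981
```

A k-pair set is always somewhat less likely to be trustworthy than a single delegation, since each pair fails if either of its two members is compromised.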

3.4 MARCH: A Distributed Incentive Scheme

3.4.1 Money and Reputation

With the distributed authority designed in the previous section, the following

information about a node A is maintained by a delegation of k nodes.

Total money (TMA): It is the total amount of money paid by others to node A

minus the total amount of money paid to others by A in all previous transactions. The

universal refilled money (Section 3.6.2) will also be added to this variable.

Overpaid money (OMA): It is the total amount of money overpaid by consumers.

A consumer pays money to node A before a service. If the service contract is not fulfilled by the transaction, the consumer may file a complaint, specifying the amount of money that it has overpaid. This amount cannot be greater than what the consumer has paid.

When a new node joins the network, its total money and overpaid money are

initialized to zero. From TMA and OMA, we define the following two quantities.

Available money (mA): It is the amount of money that node A can use to buy

services from others.

mA = TMA - OMA    (3-1)









Reputation (rA): It evaluates the quality of service (with respect to the service

contracts) that node A has provided.

rA = (TMA - OMA) / TMA   if TMA != 0
rA = 1                   if TMA = 0    (3-2)

For example, if TMA = 500 and OMA = 10, then A's available money is 490, i.e.,

mA =490, and its reputation is 0.98, i.e., rA = 0.98.
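As a quick sanity check of Equations 3-1 and 3-2, here is a small Python sketch (our illustration; the function names are hypothetical):

```python
def available_money(TM: float, OM: float) -> float:
    """Equation 3-1: money the node can spend."""
    return TM - OM

def reputation(TM: float, OM: float) -> float:
    """Equation 3-2: fraction of received money that was not disputed;
    a node with no transaction history starts with reputation 1."""
    return (TM - OM) / TM if TM != 0 else 1.0

# The example from the text: TM_A = 500, OM_A = 10.
print(available_money(500, 10))  # 490
print(reputation(500, 10))       # 0.98
```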

To track every node's available money and reputation, we propose a set of protocols.

Consider a transaction, in which Alice (A) is the provider and Bob (B) is the consumer.

The transaction consists of five sequential phases.

Phase 1: Contract Negotiation. Alice and Bob negotiate a service contract.

Phase 2: Contract Verification. Through the help of their delegations, Alice and
Bob verify the authenticity of the information claimed in the contract.

Phase 3: Money Transfer. The amount of money specified in the contract is
transferred from Bob's account in DB to Alice's account in DA.

Phase 4: Contract Execution. Alice offers the service to Bob based on the contract
specification.

Phase 5: Prosecution. After the service, Bob provides feedback reflecting the quality
of service offered by Alice.

3.4.2 Phase 1: Contract Negotiation

Suppose Bob has received a list of providers through the lookup routine of the P2P

network. Each provider specifies its reputation and its price for the service. Bob wants to

minimize his risk when deciding which service provider he is going to use.

Let LA be the price specified by Alice and GB be the fair price estimated by Bob

himself. According to the definition, rA can roughly be used as a lower bound on the

probability of Alice being honest. Intuitively, the probability for Bob to receive the service

is at least rA, and the probability for Bob to waste his money LA is at most (1 - rA). We define the benefit for Bob to have a transaction with Alice as GB x rA - LA x (1 - rA). We









further normalize it as

R = rA - (LA / GB) x (1 - rA)    (3-3)
To avoid excessive risk, Bob takes Alice as a potential provider if R is greater than a

threshold value T. The use of threshold helps the system reject dishonest providers with

poor reputation. Among all potential providers, Bob picks the one with the highest

normalized benefit.
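Bob's selection rule can be sketched as follows (our illustration; the offer tuples and function names are hypothetical):

```python
def normalized_benefit(r_a: float, l_a: float, g_b: float) -> float:
    """Equation 3-3: R = rA - (LA/GB) * (1 - rA)."""
    return r_a - (l_a / g_b) * (1 - r_a)

def pick_provider(offers, g_b: float, threshold: float):
    """offers: list of (name, price LA, reputation rA).
    Keep offers with R > threshold; pick the one with the highest R."""
    candidates = [(normalized_benefit(r, l, g_b), name)
                  for name, l, r in offers
                  if normalized_benefit(r, l, g_b) > threshold]
    return max(candidates)[1] if candidates else None

# Alice: R = 0.98 - (5/6)*0.02 ~ 0.963; Carol: R = 0.6 - 0.5*0.4 = 0.4 (rejected).
offers = [("Alice", 5.0, 0.98), ("Carol", 3.0, 0.60)]
print(pick_provider(offers, g_b=6.0, threshold=0.5))  # Alice
```

Note how a low price can compensate for a modest reputation, which is exactly the recovery path the text describes for providers with poor history.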

Both the value of LA and the value of rA are given by the provider A. If LA is set too high, R will be small and the provider runs the risk of not being picked by Bob. Providers with poor reputation can improve their R values by setting their prices low. In this way, they can recover their reputation by selling services at lower prices. If Alice lies about her rA, she will be caught in the next phase and be punished.

Now suppose Bob chooses Alice as the best service provider. They have to negotiate a

service contract, denoted as c, in the following format.


< A, B, S, Q, L, SeqA, SeqB, rA, mB >


where A, B, S, Q, and L specify the provider, the consumer, the service type, the service

quality, and the service price respectively. SeqA and SeqB are the contract sequence

numbers of Alice and Bob, respectively. After the transaction, Alice and Bob each increase

their sequence numbers by 1. The values of rA and mB in the contract will be verified by

the delegations in the next phase.

As an example, if S = Storage, Q = 200G, and L = 5, the contract means that Alice offers storage of size 200G to Bob and, in return, the amount of money Bob must pay is 5.
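The contract can be modeled as a simple record; this sketch (our illustration) mirrors the fields listed above:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Contract:
    A: str      # provider
    B: str      # consumer
    S: str      # service type
    Q: str      # service quality
    L: float    # service price
    SeqA: int   # provider's contract sequence number
    SeqB: int   # consumer's contract sequence number
    rA: float   # provider's claimed reputation (verified in Phase 2)
    mB: float   # consumer's claimed available money (verified in Phase 2)

# The storage example from the text.
c = Contract("Alice", "Bob", "Storage", "200G", 5,
             SeqA=1, SeqB=1, rA=0.98, mB=490)
print(c.S, c.Q, c.L)  # Storage 200G 5
```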

3.4.3 Phase 2: Contract Verification

After negotiating a contract, Alice and Bob should exchange an authenticatable

contract proof, so that Alice is able to activate the money transfer procedure and Bob is









granted the prosecution rights. In addition, the information in the contract such as rA and mB should be verified by the delegations of Alice and Bob.

We use the notation [x]y for the digital signature signed on message x with key y and {x}y for the ciphertext of message x encrypted with key y. After Phase 2, if the contract is verified by the delegations, Alice should have the following contract proof


cA = [c]SB


cA cannot simply be produced by Bob, who may lie about mB. Instead, Alice must receive cA from Bob's delegation after the members confirm the value of mB. Bob has k delegation members. Each of them will produce a "piece" of cA and send it to Alice, who will combine the pieces into a valid contract proof. Similarly, Bob must receive the following contract proof from Alice's delegation


cB = [c]SA

The contract proofs will be used by Alice for money transfer and by Bob for prosecution.

It is important to ensure that either both Alice and Bob, or none of them, receive the

contract proofs. Otherwise, dishonest nodes may take advantage of it. It can be shown

that ensuring both or neither one receives her/his contract proof is impossible without

using a third party (the delegation of Alice or Bob in this case).

Key Sharing Protocol

A k-member delegation is not a centralized third party. One possible approach for

producing a contract proof by a delegation is to use threshold cryptography [55]. A (k, t)

threshold cryptography scheme allows k members to share the responsibility of performing

a cryptographic operation, so that any subgroup of t members can perform this operation

successfully, whereas any subgroup of fewer than t members cannot. For digital signature,

k shares of the private key are distributed to the k members. Each member generates a

partial signature by using its share of the key. After a combiner receives at least t partial









signatures, it is able to compute the signature, which is verifiable by the public key. An

important property is that fewer than t compromised members cannot produce a verifiable

signature on a false message.
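The t-of-k property can be illustrated with Shamir secret sharing over a prime field, which underlies many threshold schemes. This is a toy sketch of share distribution and reconstruction only (our illustration with toy parameters), not the full threshold-signature machinery:

```python
import random

P = 2**31 - 1  # a prime modulus (toy parameter)

def make_shares(secret: int, k: int, t: int):
    """Split `secret` into k shares; any t of them reconstruct it."""
    coeffs = [secret] + [random.randrange(P) for _ in range(t - 1)]
    return [(x, sum(c * pow(x, j, P) for j, c in enumerate(coeffs)) % P)
            for x in range(1, k + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 recovers the secret."""
    secret = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P
    return secret

k, t = 5, 3  # t = floor(k/2) + 1 as used by MARCH
shares = make_shares(123456789, k, t)
print(reconstruct(shares[:t]))   # 123456789
print(reconstruct(shares[-t:]))  # 123456789
```

Any subset of t shares recovers the same secret, while the sketch gives a feel for why fewer than t shares reveal nothing useful about it.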

In our case, the problem is to produce cB (or cA) by the k-member delegation of Alice (or Bob). We employ a (k, floor(k/2) + 1) threshold cryptography scheme to produce the contract proof. Alice distributes shares SA(i) of her private key SA to her delegation members DA(i), which will produce partial signatures [c]SA(i) and forward them to Bob for combination. As long as the delegation of Alice is trustworthy, Bob will receive enough correct partial signatures to compute a verifiable contract proof, while the false partial signatures generated by the compromised delegation members will not yield any verifiable proof.

When applying threshold cryptography, we have to defend against dishonest nodes, which may intentionally distribute incorrect secret shares. Incorrect partial signatures cannot yield a valid signature. We propose a protocol for distributing the key shares.

Take Alice as an example. The protocol guarantees that either all delegation members

receive the correct shares, or they all detect that Alice is dishonest.

Step 1: Alice sends a key share SA(i) to each delegation member DA(i), encrypted with the member's public key PDA(i). The messages are shown below.

MSG1 Alice -> DA(i): [{SA(i)}PDA(i)]SA, for all DA(i) in DA

Step 2: After all members receive their key shares, they negotiate a common random

number s (possibly by multi-party Diffie-Hellman exchange with authentication). Each

member sends the number s as a challenge to Alice, signed by the member's private key

and then encrypted by Alice's public key.

MSG2 DA(i) -> Alice: {[s]SDA(i)}PA, for all DA(i) in DA

Step 3: Alice signs s with SA(i) and then with SA before sending it back to DA(i).

MSG3 Alice -> DA(i): [[s]SA(i)]SA, for all DA(i) in DA









Step 4: After authentication, if the received [s]SA(i) value matches the locally computed one, DA(i) forwards the message to all other members in DA. 1

MSG4 DA(i) -> DA(j): [[s]SA(i)]SA, for all DA(j) in DA

Otherwise, DA(i) files a certified complaint to the other members.

MSG5 DA(i) -> DA(j): ["SA(i) is invalid"]SDA(i), for all DA(j) in DA

Step 5: DA(i) needs to collect [s]SA(j), for all DA(j) in DA, which are the partial signatures on s. If it receives MSG4 [[s]SA(j)]SA from DA(j), the value of [s]SA(j) is in the message.

If it receives MSG5 from DA(j), there are two possibilities: either Alice or DA(j) is dishonest. To resolve this situation, DA(i) forwards the certified complaint to Alice. If Alice challenges the complaint, she must disclose the correct value of SA(j) to DA(i) in the following message (then DA(j) can learn SA(j) from DA(i)).

MSG6 Alice -> DA(i): [{SA(j)}PDA(i)]SA

Learning SA(j) from this message, DA(i) can compute [s]SA(j). After DA(i) has all k partial signatures on s, it can determine that Alice is honest if any (floor(k/2) + 1) partial signatures produce the same signature [s]SA, which can be verified by Alice's public key. Otherwise, Alice must be dishonest.
Since the value of k is typically set small (e.g. 5) and the key distribution is

performed once per node, the overhead of the above protocol is not significant.
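The Step 5 decision rule, namely that Alice is honest if and only if some (floor(k/2)+1)-subset of the k partial values yields a result verifiable against her public key, can be sketched with a toy polynomial model (our simplification: "combining" is Lagrange interpolation and "verification" is comparison against a known expected value):

```python
from itertools import combinations

P = 2**31 - 1  # toy prime modulus

def lagrange_at_zero(shares):
    """Reconstruct a polynomial's value at x = 0 from sample points."""
    total = 0
    for xi, yi in shares:
        num = den = 1
        for xj, _ in shares:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def alice_seems_honest(partials, k, expected):
    """Step 5 (simplified): honest iff some (floor(k/2) + 1)-subset of the
    k partial values reconstructs the expected signature."""
    t = k // 2 + 1
    return any(lagrange_at_zero(list(sub)) == expected
               for sub in combinations(partials, t))

# Honest shares of "signature" 777 on polynomial 777 + 3x + 5x^2 (t = 3).
good = [(x, (777 + 3 * x + 5 * x * x) % P) for x in range(1, 6)]
print(alice_seems_honest(good, k=5, expected=777))  # True
bad = good[:2] + [(x, 1) for x in range(3, 6)]      # 3 of 5 shares corrupted
print(alice_seems_honest(bad, k=5, expected=777))   # False
```

With a majority of correct shares, some qualifying subset always verifies; once corrupted shares form a majority, no subset does, so fraud is detected.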

Theorem 3.1. The key sharing protocol ensures that all delegation members will either obtain the correct shares of Alice's private key or detect Alice's fraud.

Proof. First of all, no node can deny the messages it has sent to others or falsely declare that it has received some messages from others, because all messages in the protocol are signed by the corresponding nodes with their private keys.



1 Note that DA(i) knows s and learns SA(i) from MSG1.









Consider the first case, in which Alice is honest. All delegation members can obtain the correct shares in Step 1. In the meantime, Alice discloses a share to challenge a complaint only if the complaint is signed by a delegation member. If Alice is honest, only dishonest members may issue certified complaints. The total number of distinct shares exposed by Alice is no larger than the number of dishonest members.

Next consider the second case, in which Alice is dishonest. Alice may try to deceive the delegation in two possible ways. One way is that Alice does not send shares to some delegation members DA(i), which can be easily detected by DA(i) when it receives MSG3 from Alice or MSG4 from other delegation members. Subsequently, DA(i) will file a certified complaint (MSG5). If Alice discloses the correct share (MSG6) to challenge the complaint, DA(i) can obtain its share from other members; otherwise, honest members are certain that Alice is dishonest, and will punish her.

The other possible way for Alice to deceive is to distribute incorrect shares to some members DA(i) in MSG1. There are three possible outcomes when MSG3 is processed by DA(i). (1) The partial signature in MSG3 matches the locally computed one. Subsequently, MSG3 is forwarded to all other members by DA(i). Then all honest members can detect Alice's fraud in Step 5 because the incorrect partial signature [s]SA(i) cannot be combined with the other partial signatures to compute the signature [s]SA. (2) The partial signature in MSG3 does not match the locally computed one. DA(i) will detect Alice's fraud in Step 4 because of the inconsistency between MSG1 and MSG3. It will forward the two inconsistent messages from Alice to all other members in MSG5. Consequently, all members learn of the inconsistency and punish Alice. (3) Alice does not send MSG3 to DA(i) at all. This can be handled in a way similar to the previous case in which DA(i) does not receive MSG1 from Alice.


Contract Verification Protocol

Both Alice and Bob must register the contract with their delegations so that the

money transfer and the optional prosecution can be performed through the delegations at










Figure 3-2. A) Protocol for contract verification and exchange. B) Money transfer. C) Prosecution.


later times. The delegations must verify the information claimed by Alice and Bob in the

contract and generate the contract proofs that Alice and Bob need in order to continue

their transaction. We design a contract verification protocol to implement the above

requirements. The protocol consists of four steps, illustrated in Figure 3-2 (A), and the number of messages is O(k) for normal cases.

A procedure call is denoted as x.y(z), which means to invoke procedure y at node x

with parameter(s) z. If x is a remote node, a signed message carrying z must be sent to x

first.

Step 1: Alice sends the contract c and a digital signature c' to the delegation DA for validation. c' may be a signature on the contract concatenated with the identifier of the receiver, i.e., c' = [c|DA(i)]SA. Bob does the same thing.

Alice.SendContract(Contract c, Signature c')

1. for i = 1 to k do

2. DA(i).ComputePartialProof(c, c')

Step 2: Then the delegation member DA(i) verifies the reputation claimed by Alice in the contract (denoted as c.rA) and computes a partial signature (denoted as psi) on the contract with its key share established by the previous protocol.










DA(i).ComputePartialProof(Contract c, Signature c')

1. if rA >= c.rA then

2. ContractList.add(c, c')

3. psi <- [c]SA(i)

4. DB(i).DeliverPartialProof(c, psi)

5. else punish(A)

Line 1 verifies whether c.rA is over-claimed or not. Line 2 saves the contract for later use in Step 3. The signature c' will be used in a procedure called detect(). Line 3 produces a partial signature on the contract by using SA(i). Line 4 sends the partial signature to the ith member of DB. If c.rA is over-claimed, Alice will be punished at Line 5.

The delegation members in DB execute a similar procedure except that the condition in Line 1 should be mB >= c.mB.
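The Line-1 checks on the two sides can be sketched together (our illustration; the function names are hypothetical, and the true values would come from the delegation's own records):

```python
def provider_side_ok(true_rA: float, claimed_rA: float) -> bool:
    """DA(i): sign only if Alice's true reputation covers her claim."""
    return true_rA >= claimed_rA

def consumer_side_ok(true_mB: float, claimed_mB: float) -> bool:
    """DB(i): sign only if Bob's true available money covers his claim."""
    return true_mB >= claimed_mB

print(provider_side_ok(0.98, 0.98))  # True: a truthful claim passes
print(provider_side_ok(0.90, 0.98))  # False: over-claimed reputation -> punish(A)
print(consumer_side_ok(490, 500))    # False: over-claimed money -> punish(B)
```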

Step 3: When DB(i) receives the contract c and the partial signature psi from DA(i),

it executes the following procedure.

DB(i).DeliverPartialProof(Contract c, PartialSignature psi)

1. wait for a timeout period

2. if c is found in ContractList then

3. Bob.ProcessPartialProof(psi)

4. else

5. detect()

Line 1 waits for a timeout period to ensure that the contract from Bob has arrived. Line 2 checks whether the received contract c is also in the local ContractList. If it is, Bob has announced exactly the same contract as Alice, and DB(i) forwards the partial signature psi to Bob. Otherwise, there are two possibilities: (1) Bob does not have the same contract as Alice has, or (2) Bob does have the same contract but it has not arrived at DB(i) yet. To distinguish these two cases, DB(i) waits for another timeout period before checking again whether c is now in ContractList. If not, DB(i) believes that Alice and Bob do not have the same contract, and the detect() procedure is executed to detect the special case of a malicious Bob forging the contract. Once Bob is detected to be dishonest, the delegation can regard him as a liar and omit the detection procedure in future suspicious transactions, so the detection procedure needs to be invoked only once for each dishonest node.

In the following, we present the design details of the detect() procedure, and then provide the correctness proof. The delegation member DB(i), which has received two different contracts from Bob and DA(i), must handle two possible cases. One case is that DB(i) has received a contract with the sequence number c.SeqB from Bob, and the other is that DB(i) has never received such a contract from Bob. In the former case, DB(i) stops the verification procedure immediately. Then it tries to detect whether Bob is lying by sending Bob's signature c' to all other delegation members in DB. If a member finds that c' is different from Bob's signature that it received directly from Bob, it sends its version of Bob's signature to all other members. Otherwise, the member discards the signature from DB(i). In the latter case, DB(i) sends a special request, denoted REQ, with the sequence number c.SeqB to all other members in DB. If a member has already received the contract from Bob with the specified sequence number, it sends the corresponding signature c' to all other members after receiving REQ. Otherwise, the member discards the request REQ. In both cases, any member that has received two different versions of Bob's signature c' punishes Bob and refuses to participate in the rest of the transaction for the contract. In addition, for the latter case, DB(i) refuses to proceed with the verification if no replies are received from other members, or punishes Bob but still continues the verification procedure, using the contract retrieved from other members, if all of the received signatures are the same.

We show that, having received conflicting contracts, if a delegation member simply stops the verification procedure without invoking the detect() routine, Bob will be able to break the protocol. Suppose k is equal to 3, DB(1) is a friend of Bob, and both DB(2) and DB(3) are honest. Bob can break the protocol by sending DB(2) a correct contract while sending DB(3) a forged contract (for instance, with a lower price c.L) or not sending the contract to DB(3) at all. Because DB(1) is Bob's friend, it may forward the partial signature psA(1) from DA(1) to Bob, but not send the partial signature psB(1) to DA(1). Then, Bob can collect two partial signatures psA(1) and psA(2) because DB(2) cannot detect Bob's fraud and will forward the partial signature psA(2) to Bob, while Alice can only get one signature psB(2) from DA(2). Bob can compute the contract proof signed by Alice, while Alice cannot compute the proof signed by Bob. In our protocol, this problem is addressed by DB(3) invoking the detect() routine after receiving the contract from DA(3). The detect() routine guarantees that either DB(2) detects Bob's fraud or DB(1) forwards the partial signature of the contract retrieved from DB(2).

Step 4: After Alice (Bob) receives t or more correct partial signatures, she (he) can

compute the contract proof cA (cB), which can be verified by using Bob's (Alice's) public

key.

Theorem 3.2. If the k-pair delegation set of Alice and Bob is trustworthy, the contract verification protocol ensures that both Alice and Bob will receive the correct proofs, or neither one can receive a valid contract proof and the transaction is aborted.

Proof. First, we prove that neither Alice nor Bob can deceive the authority. This is a

symmetric protocol, so without losing generality we only consider Bob. Below we analyze

the four possible ways that Bob may use to deceive the authority.

1. Bob over-claims its available money mB. In this case, all honest members in DB can detect Bob's fraud in Step 1 and punish Bob. In the meantime, these members will neither forward the partial signature [c]sB(i) to DA(i) nor deliver [c]sA(i) to Bob, and consequently the transaction will be aborted.

2. Bob modifies the contract specification, for example, by lowering the transaction

price c.L in order to pay less for the transaction. He sends the same modified









contract to DB. In Step 3, all members in DB learn that the contract presented

by Bob is different from that presented by Alice, and they will invoke the detect()

procedure. In this case, all honest delegation members will stop contract verification

immediately, but they will not punish Bob, because there is only one contract

signature from Bob and either Alice or Bob may be lying.

3. Bob sends different modified contracts to the delegation members. Multiple

delegation members will detect that the contracts from Bob and Alice are different,

and they will invoke the detect() routine. At the end of the detection procedure, all

members will learn that there are different contract signatures coming from Bob.

Consequently, they will all punish Bob and abort the transaction.

4. Bob does not send the contracts to some (or all) delegation members. In this case,

a member DB(i) that does not receive the contract from Bob will send the request

REQ to all other members. If no other member receives the contract from Bob,

DB(i) will receive no reply back. It will refuse to continue the verification process,

but will not punish Bob because either Alice or Bob can be lying. Similarly, all other

members will also stop the verification. Now if some other members have received

the contracts from Bob, DB(i) will receive Bob's signatures in the replies from them,

and it will continue the verification process using the contracts retrieved from other

members. No one stops the verification process.

In summary, we can see that all members in the trustworthy set will take the same

action (continuing or stopping the contract verification process) in all four possible cases.

Next, we prove that dishonest members cannot deceive honest members in the

trustworthy sets to stop the verification process if both Alice and Bob are honest. As we

have discussed above, only in two cases, an honest delegation member, DA(i) or DB(i),

will stop the contract verification. One case is that the contract signatures (both Alice's

and Bob's) received by the member in the detect() routine are not identical. The other

case is that the member receives different contracts from Alice and Bob in Step 3. The









former case happens only when Alice/Bob sign and distribute different contracts to

delegation members dishonestly, while the latter case happens when both DA(i) and DB(i)

are untrustworthy or when either Alice or Bob is dishonest. If both Alice and Bob are

honest, dishonest members cannot interrupt the verification process at the members in the

trustworthy sets.

By Theorem 3.1 and the discussions above, if Alice and Bob are honest, both of them are able to collect no less than ⌊k/2⌋ correct partial signatures and compute the valid contract proofs. Otherwise, if either Alice or Bob is dishonest, the transaction is aborted.


3.4.4 Phases 3 and 4: Money Transfer and Contract Execution

Before providing the service, Alice requests her delegation to transfer the money, as illustrated in Figure 3-2 (B). Upon receiving a money transfer request from Alice, the

delegation member DA(i) invokes the following procedure.

DA(i).TransferMoneyProvider(Contract c, ContractProof CA)

1. if valid(c, CA) and DB(i).TransferMoneyConsumer(c, CA)

2. TMA = TMA + c.L

3. else verify()

In Line 1, both DA(i) and DB(i) need to validate the contract by using Bob's public

key, which can be queried from Bob if it is not locally available. After validation, DA(i)

increases Alice's earned money in Line 2.

Note that DB(i) may be malicious. If DA(i) cannot get a positive answer from DB(i),

it must verify the validity of the contract further (Line 3), which can be designed as

follows. DA(i) asks other members in DA. If the majority of DA have received a positive

answer from DB, the contract is considered to be valid (DB(i) is malicious). Otherwise,

the contract is considered to be invalid and Alice is punished.

When DB(i) receives a money transfer request from DA(i), it performs the following

operations.










DB(i).TransferMoneyConsumer(Contract c, ContractProof CA)

1. if valid(c, CA) then

2. if mB > c.L then

3. TMB = TMB + c.L

4. return true;

5. else

6. punish(B)

7. return false;

8. else return false;

First, if the contract is valid (Line 1) and Bob has enough money to pay for the service (Line 2), then Bob's spent money is increased and a positive answer is returned to DA(i) (Lines 3 and 4). Second, it is possible that the contract is valid but Bob does not have enough money. This happens when Alice and Bob are colluding nodes and Alice gets the contract proof CA directly from Bob instead of through her delegation. In such a case, Bob is punished and a negative answer is returned (Lines 6 and 7). Third, if the contract is invalid, a negative answer is returned (Line 8).

DA(i) and DB(i), ∀i ∈ [1..k], perform money transfer at most once for each contract.

They keep track of the sequence numbers (SeqA and SeqB) of the last contract for which

the money has been transferred. All new contracts have larger sequence numbers.
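The accounting in the two transfer procedures above can be sketched in executable form. This is an illustrative Python sketch, not the MARCH implementation: contract validation, punishment, and the verify() fallback are stubbed out, and all class and function names below are hypothetical.

```python
# Sketch of the Phase-3 accounting at one delegation pair (DA(i), DB(i)).
# Validation, punishment, and the verify() fallback are stubbed; only the
# money bookkeeping and the sequence-number replay check are modeled.

class Member:
    def __init__(self):
        self.TM = 0          # money transferred so far (earned or spent)
        self.last_seq = -1   # last contract settled by this member

class Contract:
    def __init__(self, seq, price):
        self.seq = seq       # new contracts carry larger sequence numbers
        self.L = price       # transaction price c.L

def transfer_money_consumer(db, c, bob_money, proof_valid):
    """DB(i): record Bob's spent money, or reject if he cannot pay."""
    if not proof_valid or c.seq <= db.last_seq:
        return False                     # invalid proof or replayed contract
    if bob_money >= c.L:
        db.TM += c.L                     # Bob's spent money is increased
        db.last_seq = c.seq
        return True
    return False                         # valid contract, no money: punish Bob

def transfer_money_provider(da, db, c, bob_money, proof_valid):
    """DA(i): credit Alice only after DB(i) confirms the debit."""
    if proof_valid and transfer_money_consumer(db, c, bob_money, proof_valid):
        da.TM += c.L                     # Alice's earned money is increased
        da.last_seq = c.seq
        return True
    return False                         # MARCH would invoke verify() here
```

A contract settles at most once: a second call with the same sequence number is refused, which mirrors the replay protection described above.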

3.4.5 Phase 5: Prosecution

After Bob receives the service from Alice, if the quality of service specified in the

contract is not met, Bob may issue a prosecution request to Alice's delegation, as

illustrated in Figure 3-2 (C). The request specifies the amount of money f that Bob

thinks he has overpaid.

Upon receiving a prosecution request from Bob, if DA cannot evaluate the service

quality, it punishes both Alice and Bob by freezing the money overpaid by Bob. The

procedure is given as follows.










DA(i).Prosecution(Contract c, ContractProof CB, Overpaid f)

1. if valid(c,CB) and f < c.L then

2. OMA = OMA + f

3. notify(A)

First DA(i) validates the prosecution request by checking if the contract proof is

authentic (Line 1). If the contract is valid, it increases Alice's overpaid money by f (Line

2). Finally, it notifies Alice so that Alice is able to determine whether to sell service to

Bob in the future.
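The prosecution bookkeeping can be condensed into a one-function sketch. The contract-proof check and the notification to Alice are abstracted into arguments, and the function name is hypothetical; the positivity check on f is an added sanity guard not stated in the pseudocode.

```python
# Minimal sketch of the Phase-5 prosecution handler at DA(i): only the
# overpaid-money accounting from the pseudocode above is modeled.

def prosecution(OMA, price, f, proof_valid):
    """Return Alice's updated overpaid money OMA.

    The claim is accepted only when the contract proof CB is authentic
    and the claimed overpaid amount f is below the contract price c.L
    (f > 0 is an added sanity check).
    """
    if proof_valid and 0 < f < price:
        return OMA + f      # Line 2: OMA = OMA + f; the money is frozen
    return OMA              # otherwise the claim is rejected
```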

3.5 System Properties and Defense against Various Attacks

3.5.1 System Properties

We study the properties of MARCH, which solves or alleviates the problems in the

previous approaches.

First, according to the money transfer procedures in Section 3.4.4, transactions among

members in the same colluding group cannot increase the total amount of available money

of the group. We have the following property, which indicates that the malicious nodes

cannot benefit by cooperation.

Property 1. Regardless of its size, a colluding group cannot increase its members' money or reputation by cooperation without decreasing other members' money and/or reputation.

Second, unlike some other schemes [12, 13, 15, 16], MARCH does not maintain the

history of any consumer's complaints, and does not punish frequent complainers. We have

the following property.

Property 2. If a consumer is deceived, it is not restricted by the system in any way from seeking prosecution against the malicious providers.

Third, the overpaid money is not returned to the complaining consumer, which

eliminates any reason for the consumer to lie if the consumer is not malicious. If the

consumer is malicious and intends to defame the providers, it has to pay the price for









the transactions before committing any harm, which serves as an automatic punishment.

Consequently, its ability of defaming is limited by the money it has, which cannot be

increased artificially by collusion, according to Property 1.

In addition, by Property 2, a deceived consumer can seek revenge with no restriction,

which means a malicious provider cannot benefit from its action. We have the following

properties.

Property 3. A malicious provider cannot benefit by deceiving the consumers, and a malicious consumer will be automatically punished for defaming the providers.

Property 4. The maximum amount of loss for an innocent provider or consumer in a transaction with a malicious node is limited by the price specified in the contract.

Property 3 removes financial incentives to cheat. A provider can make money only

by serving others; a consumer will not be refunded for cheating. Property 4 makes sure

that an innocent node will not be subject to negative discrimination attacks [12], in which

nodes with excellent reputation can severely damage other nodes.

In summary, the malicious nodes cannot increase their power (in terms of available

money) by cooperation, and they can only attack others at the cost of their own interests,

i.e., money and/or reputation. Consequently, the total damage caused by the malicious

nodes is strictly limited. They will eventually be rejected from the system due to poor

reputation or be forced to serve others for better reputation in order to stay in the

system.

3.5.2 Defending Against Various Attacks

In the following, we consider four different types of attacks launched by a colluding

group [12].

Unfairly high ratings: The members of a colluding group cooperate to artificially

inflate each other's reputation by false reports, so that they can attack innocent nodes

more effectively. In MARCH, a colluding group can inflate the reputation of some

members only by moving the available money from other members to them. According









to Property 1, the total money in the group cannot be inflated through cooperation.

Although some members' reputation can be made better, other members' reputation will

become worse, making them ineffective in attacks.

Unfairly low ratings: Providers collude with consumers to "bad-mouth" other

providers that they want to drive out of the market. Because MARCH requires all

consumers to pay money for their transactions before they can defame the providers, the malicious consumers lose their money (and reputation) for "bad-mouthing", which in turn

makes it harder for them to stay in the system.

Negative discrimination: A provider discriminates against only a few specific consumers by offering services with much lower quality than what the contract specifies. It hopes to earn some "extra" money without damaging its reputation since it serves most consumers honestly. In MARCH, a provider cannot make such "extra" money because of

the prosecution mechanism and Properties 2-3.

Positive discrimination: A provider gives an exceptionally good service to a

few consumers with high reputation and an average service to the rest of the consumers.

The strategy will work in an incentive scheme where a consumer's ability of affecting

a provider's reputation is highly related to the consumer's own reputation, and vice

versa. MARCH does not have this problem. The change in a provider's reputation after a transaction is determined by how much money it receives for the service, not by the

reputation of the consumer.

3.6 Discussions

In this section, we discuss other important issues on implementing MARCH.

3.6.1 Rewarding Delegation Members

The system should offer incentive for the delegation members to perform their tasks.

A simple approach is for the provider and the consumer of a transaction to reward their

delegation members with a certain amount of money, which should be less than the price

of the transaction. After the transaction, the provider A signs an incentive payment









certificate and sends the certificate to every delegation member DA(i), which reduces

TMA by a certain amount and then forwards the certificate to its delegation members,

where the certificate is authenticated and the money is deposited. The consumer pays its delegation members in a similar way. If a delegation member in DA refuses to serve, node A can increase k to bring new members into DA.

3.6.2 Money Refilling

Because the overpaid money will be frozen forever, the total amount of available

money in the whole system may decrease over time. As a result, the system may enter

into deflation and lack sufficient money for the providers and the consumers to engage in

transactions. This problem can be addressed by money refilling. The delegation members

of a node A will replenish the total money TMA of the node at a slow, steady rate. In this

way, a minimal amount of service is provided to all consumers, even the free-riders, at all

times, which we believe is reasonable. For additional service, a consumer has to contribute

to the P2P network by also serving as a provider.

3.6.3 System Dynamics and Overhead

In a P2P network, nodes may join/leave the network at any time. When a node

X leaves the network, its DHT table will be taken over by the closest neighbor X'.

In MARCH, suppose X is a delegation member of A. After X leaves the network, X'

will become a new member in A's delegation. In order to deal with abrupt departure,

X' should cache the information kept at X, or it can learn the information from other

delegation members after X leaves.

For a specific DHT network, there are better ways of selecting the delegation members

than the approach in Section 3.3.1. Take Chord [1] as an example. We can select a subset

of the log n neighbors of node A as the delegation DA. In this way, the maintenance of the

delegation is free as Chord already maintains the neighbor set.

The communication overhead of a transaction (excluding the actual service) consists

of 0(k) control messages, which are sent from the provider (consumer) via k pairs of











Figure 3-3. Trustworthiness of delegation (number of untrustworthy delegations vs. the number of dishonest nodes, for k = 3, 5, and 7)


Figure 3-4. Trustworthiness of k-pair delegation set (probability of an untrustworthy set vs. the number of dishonest nodes, for 3-, 5-, and 7-pair sets)


delegation members to the consumer (provider) through direct TCP connections. This overhead is quite small compared to typical services such as downloading video files of many gigabytes or sharing storage for months. More importantly, the overhead does not increase with the network size, which makes MARCH a scalable solution, compared with

other schemes [23, 24] whose overhead increases with the network size.

3.7 Simulation

In our simulations, the dishonest nodes fall into three categories with equal

probability.









Category one: These nodes never offer services to others after receiving money, and

always defame the providers after receiving services.

Category two: When these nodes find that they may be rejected from the system,

they behave honestly. Otherwise, they behave in the same way as the nodes in category

one.

Category three: When these nodes find that they may be rejected from the system,

they behave honestly. Otherwise, they cheat their transaction partners with a probability

taken from [0.5,1] uniformly at random.

Unless explicitly specified otherwise, the system parameters are set as follows. The

number of nodes is 100,000 and k is 5. The average number of dishonest nodes is 1,000.

Initially, the total money for a node is 500, and the overpaid money is 0. The service

price G estimated by the consumers is 10. The threshold T is 0.9. To satisfy the threshold

requirement, the maximum selling price for a provider is denoted as max (max is the

maximum value of L that keeps R above the threshold, calculated based on Eq. (4-2).

If max is negative, then the node can no longer be a provider. If a dishonest node in

Category two or three finds that its max value may become negative after additional

malicious acts, it will behave honestly). The actual selling price is a random number taken

uniformly from (0, max]. If a node can neither be a provider (due to poor reputation), nor

be a consumer (due to little money), it is said to be rejected from the system.

If one participant in a transaction tries to deceive the other one, the transaction is

called a failed transaction. We define "failed transaction ratio" as the number of failed

transactions divided by the total number of transactions, and "overpaid money ratio" as

the total amount of overpaid money divided by the total amount of money paid in the

transactions. These metrics are used to assess the overall damage caused by dishonest

nodes.










Figure 3-5. Most of the malicious nodes are rejected within the first 50 transactions (number of rejected honest and dishonest nodes vs. the average number of transactions performed by each node).


3.7.1 Effectiveness of Authority

In the first set of simulations, we study the trustworthiness of the delegations and

the k-pair delegation sets. Figure 3-3 shows the number of untrustworthy delegations

with respect to the number of dishonest nodes for k = 3, 5, and 7. Recall that a delegation is untrustworthy if at least half of its members are dishonest. Out of 100,000 delegations, only a small number are untrustworthy. For k = 5, the number of

nodes with untrustworthy delegations is just 23 even when there are 3,000 dishonest

nodes. Figure 3-4 shows the probability for an arbitrary k-pair delegation set to be

untrustworthy (Section 3.3.2). The 5-pair delegation set is trustworthy with a probability

larger than 99% even when there are 3,000 dishonest nodes. Note that when a delegation

is untrustworthy, the dishonest members may not belong to the same colluding group.

Without cooperation, the damage they can cause will be smaller.
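Assuming delegation members are effectively random nodes, the counts in Figure 3-3 can be sanity-checked with a simple binomial-tail estimate. The snippet below is a back-of-the-envelope check, not part of MARCH; the function name is illustrative.

```python
# Estimate the expected number of untrustworthy delegations: a k-member
# delegation is untrustworthy when at least half of its members are
# dishonest, so with dishonest fraction p the probability is a binomial
# tail, and the expected count over n delegations is n times that tail.
from math import ceil, comb

def p_untrustworthy(k, p):
    t = ceil(k / 2)   # "at least half" of the k members
    return sum(comb(k, i) * p**i * (1 - p)**(k - i) for i in range(t, k + 1))

n = 100_000
estimate = n * p_untrustworthy(5, 3_000 / n)
# estimate is roughly 26, close to the 23 untrustworthy delegations
# reported for k = 5 with 3,000 dishonest nodes.
```

Larger k shrinks the tail quickly, which is why the 7-pair curves in Figures 3-3 and 3-4 sit below the 5-pair curves.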

3.7.2 Effectiveness of MARCH

The second set of simulations study the effectiveness of our incentive scheme.

Figure 3-5 presents how the number of rejected nodes changes with the average number of

transactions performed per node, which can be used as the logical time as the simulation

progresses. Recall that the default number of dishonest nodes is 1,000. The figure shows

that most dishonest nodes are rejected from the system within 50 transactions per node.










Figure 3-6. Failed transaction ratio and the overpaid money ratio drop quickly to small percentages within the first 100 transactions.


Because of money refilling, some rejected nodes will recover after enough money is refilled,

but they will be rejected again after performing malicious transactions. No honest nodes

are rejected from the system during the simulation.

Figure 3-6 shows that the failed transaction ratio drops quickly to a small percentage within the first 100 transactions per node, and the overpaid money ratio drops to an even smaller value in the same period. As time progresses, these ratios become even more

insignificant. Note that the overpaid money ratio is smaller than the failed transaction

ratio. This is because the dishonest providers have to lower their prices in order to

compete with honest providers, which in turn lowers their ability to cause significant

damage. Ironically, if a dishonest node with poor reputation wants to stay in the system,

not only does it have to behave honestly to gain reputation, but it also has to do so at a lower price in order to attract consumers, which "repays" the damage it did to the system previously.

Next, we study how the number of dishonest nodes affects the system performance.

Figure 3-7 shows the overpaid money ratio after 250 transactions per node. We find that

the ratio increases linearly with the number of dishonest nodes. Even when there are 3,000
















Figure 3-7. Overpaid money ratio (measured after 250 transactions) increases linearly with the number of dishonest nodes.












Figure 3-8. Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.











Figure 3-9. Overpaid money ratio with respect to the threshold


Figure 3-10. Number of rejected nodes with respect to the threshold


dishonest nodes, the overpaid money ratio remains very small, just 0.15%. Figure 3-11

shows that the more dishonest nodes there are, the more of them are rejected.

Last, we study the impact of the threshold on the system performance. The threshold

is used by a consumer to select the potential providers (Section 3.4.2). Figure 3-9 shows

that the overpaid money ratio decreases linearly with the threshold value, which means

the system performs better with a larger threshold. Figure 3-10 shows that the number of

rejected dishonest nodes is largely insensitive to the threshold value. When the threshold

is too low, some honest nodes may be rejected by the system because a smaller threshold

allows the dishonest nodes to do more damage to the honest nodes, which may even













Figure 3-11. Number of rejected dishonest nodes (measured after 250 transactions) increases linearly with the number of dishonest nodes.


cause some honest nodes to be rejected from the system due to defamed reputation. The

numbers in the above two figures are measured after 250 transactions per node.









CHAPTER 4
CAPACITY-AWARE MULTICAST ALGORITHMS ON HETEROGENEOUS OVERLAY
NETWORKS

4.1 Motivation

Multicast is an important network function for group communication among a

distributed, dynamic set of heterogeneous nodes. The global deployment of IP multicast

has been slow due to the difficulties related to heterogeneity, scalability, manageability,

and lack of a robust inter-domain multicast routing protocol. Application-level multicast

becomes a promising alternative.

Even though overlay multicast can be implemented on top of overlay unicast, they

have very different requirements. In overlay unicast, low-capacity nodes only affect

traffic passing through them and they create bottlenecks of limited impact. In overlay

multicast, all traffic will pass all nodes in the group, and the multicast throughput is

decided by the node with the smallest throughput, particularly in the case of reliable

delivery. The strategy of assigning an equal number of children to each intermediate node

is far from optimal. If the number of children is set too big, the low-capacity nodes will be

overloaded, which slows down the entire session. If the number of children is set too small,

the high-capacity nodes will be under-utilized.

There has been a flourish of capacity-aware multicast systems, which excel in

optimizing single-source multicast trees but are not suitable for multi-source applications

such as distributed games, teleconferencing, and virtual classrooms. In particular, they are insufficient in supporting applications that require any-source multicast with varied host

capacities and dynamic membership.

To support efficient multicast, we should allow nodes in a P2P network to have

different numbers of neighbors. We propose two overlay multicast systems that support any-source multicast with varied host capacities and dynamic membership. We model

the capacity as the maximum number of direct children to which a node is willing to

forward multicast messages. We extend Chord [1] and Koorde [56] to be capacity-aware,









and they are called CAM-Chord and CAM-Koorde, respectively.1 A dedicated CAM-Chord

or CAM-Koorde overlay network is established for each multicast group. We then embed

implicit degree-varying multicast trees on top of CAM-Chord or CAM-Koorde and develop

multicast routines that automatically follow the implicit multicast trees to disseminate

multicast messages. Dynamic membership management and scalability are features inherited from Chord or Koorde. Capacity-aware multiple-source multicast is an added feature. Our analysis of CAM multicasting sheds light on the expected performance

bounds with respect to the statistical distribution of host heterogeneity.

4.2 System Overview

Consider a multicast group G of n nodes. Each node x ∈ G has a capacity cx,

specifying the maximum number of direct child nodes to which x is willing to forward the

received multicast messages. The value of cx should be made roughly proportional to the

upload bandwidth of node x. Intuitively, x is able to support more direct children in a

multicast tree when it has more upload bandwidth. In a heterogeneous environment, the

capacities of different nodes may vary in a wide range. Our goal is to construct a resilient

capacity-aware multicast service, which meets the capacity constraints of all nodes, allows

frequent membership changes, and delivers multicast messages from any source to the

group members via a dynamic, balanced multicast tree.

Our basic idea is to build the multicast service on top of a capacity-aware structured

P2P network. We focus on extending Chord [1] and Koorde [56] for this purpose. The

resulting systems are called CAM-Chord and CAM-Koorde, respectively. The principles

and techniques developed should be easily applied to other P2P networks as well.

A CAM-Chord or CAM-Koorde overlay network is established for each multicast

group. All member nodes (i.e., hosts of the multicast group) are randomly mapped

by a hash function (such as SHA-1) onto an identifier ring [0, N − 1], where the next



1 CAM stands for Capacity-Aware Multicast.









identifier after N − 1 is zero. N (= 2^b) should be large enough such that the probability of mapping two nodes to the same identifier is negligible. Given an identifier x ∈ [0, N − 1],

we define successor(x) as the node clockwise after x on the ring, and predecessor(x) as

the node clockwise before x on the ring. x̂ refers to the node whose identifier is x; if there is no such node, then it refers to successor(x). Node x̂ is said to be responsible for identifier x. With a little abuse of notation, x, x̂, successor(x), and predecessor(x) may represent a node or the identifier that the node is mapped to, depending on the appropriate context where the notations appear. Given two arbitrary identifiers x and y, (x, y] is an identifier segment that starts from (x + 1), moves clockwise, and ends at y. The size of (x, y] is denoted as (y − x). Note that (y − x) is always positive. It is the number of identifiers in the segment (x, y]. The distance between x and y is |x − y| = |y − x| = min{(y − x), (x − y)}, where (y − x) is the size of segment (x, y] and (x − y) is the size of segment (y, x]. (y, x] and (x, y] form the entire identifier ring.
Before we discuss the CAMs, we briefly review Chord and Koorde. Each node x in

Chord has O(log2 n) neighbors, which are 1/2, 1/4, ..., 1/N of the ring away from x, respectively.

When receiving a lookup request for identifier k, a node forwards the request to the

neighbor closest to k. This greedy algorithm takes O(log2 n) hops with high probability

to find k, the node responsible for k. Each node in Koorde has m neighbors. A node's

identifier x is represented as a base-m number. Its neighbors are derived by shifting one

digit (with value 0..m−1) into x from the right side and discarding x's leftmost digit to maintain

the same number of digits. When x receives a lookup request for k, the routing path of

the request represents a transformation from x to k by shifting one digit of k at a time

into x from the right until the request reaches the node responsible for k. Because k has

O(log_m n) digits, it takes O(log_m n) hops with high probability to resolve a lookup request.

Readers are referred to the original papers for more details.
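The digit-shifting review above can be made concrete with a toy trace. This sketch follows only the sequence of identifiers a lookup transforms through, under the assumption of a ring of m^digits identifiers; a real Koorde lookup routes via the node responsible for each intermediate identifier, which is omitted here.

```python
# Toy trace of Koorde's base-m digit shifting: each hop shifts one digit
# of the target k into the current identifier from the right, so after
# `digits` hops the identifier has been transformed into k.

def koorde_path(x, k, m, digits):
    ring = m ** digits
    k_digits = []
    tmp = k
    for _ in range(digits):          # base-m digits of k, least significant first
        k_digits.append(tmp % m)
        tmp //= m
    path, cur = [x], x
    for d in reversed(k_digits):     # inject the most significant digit first
        cur = (cur * m + d) % ring   # shift left, drop overflow, append digit
        path.append(cur)
    return path

# After `digits` hops the identifier equals the target k.
assert koorde_path(0b1011, 0b0110, m=2, digits=4)[-1] == 0b0110
```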

Our first system is CAM-Chord, which is essentially a base-cx (instead of base-2)

Chord with cx variable for different nodes. The number of neighbors of a node x is









O(cx log_cx n), which is related to the node's capacity. Hence, different nodes may have different numbers of neighbors. The distances between x and its neighbors on the identifier ring are j × cx^i identifiers, for j ∈ [1..cx − 1] and i ∈ [0..log_cx N − 1]. Apparently

CAM-Chord becomes Chord if cx = 2 for all nodes x. We will design a greedy lookup

routine for CAM-Chord and a multicast routine that essentially embeds implicit,

capacity-constrained, balanced multicast trees in CAM-Chord. The multicast messages

are disseminated via these implicit trees. It is a challenging problem to analyze the

performance of CAM-Chord. The original analysis of Chord cannot be applied here

because cx is variable. We will provide a new set of analysis on the expected performance

of the lookup routine and the multicast routine.
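The CAM-Chord neighbor structure can be written down concretely. The sketch below enumerates the neighbor identifiers (x + j × cx^i) mod N given in Section 4.3.1; the function name is a hypothetical helper, not part of the system.

```python
# Neighbor identifiers of a CAM-Chord node x with capacity c on a ring
# of N identifiers: (x + j * c**i) mod N for j in [1..c-1] and
# i in [0..ceil(log_c N) - 1]. With c = 2 this degenerates to Chord's
# finger table (x + 2**i) mod N.

def cam_chord_neighbors(x, c, N):
    levels = 0
    span = 1
    while span < N:            # levels = ceil(log_c N), computed exactly
        span *= c
        levels += 1
    return [(x + j * c**i) % N
            for i in range(levels)
            for j in range(1, c)]

# A capacity-2 node has the classic log2(N) Chord fingers.
assert cam_chord_neighbors(0, 2, 256) == [1, 2, 4, 8, 16, 32, 64, 128]
```

Note that a node with capacity c maintains (c − 1) · ⌈log_c N⌉ neighbors, consistent with the O(cx log_cx n) table size stated above.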

The second system is CAM-Koorde, which differs from Koorde in both variable

number of neighbors and how the neighbors are calculated. This difference is critical

in constructing balanced multicast trees. Each node x has cx neighbors. The neighbor

identifiers are derived by shifting x to the right for a variable number l of bits and then replacing the left-most l bits of x with a certain value. In comparison, Koorde shifts x

one digit (base m) to the left and replaces the right-most digit. This subtle difference

makes sure that CAM-Koorde spreads neighbors of a node evenly on the identifier ring

while neighbors in Koorde tend to cluster together. We will design a lookup routine and

a multicast routine that essentially performs broadcast. Remarkably, we show that this

broadcast-based routine achieves roughly balanced multicast trees with the expected

number of hops to a receiver being O(log n / E(log c_x)).

CAM-Chord maintains a larger number of neighbors than CAM-Koorde (by a factor of O(log n / log c_x)), which means larger maintenance overhead. On the other hand, CAM-Chord

is more robust and flexible because it offers backup paths in its topology [57]. The two

systems achieve their best performances under different conditions. Our simulations show

that CAM-Chord is a better choice when the node capacities are small and CAM-Koorde

is better when the node capacities are large.









4.3 CAM-Chord Approach

CAM-Chord is an extension of Chord. It takes the capacity of each individual node

into consideration. We first describe CAM-Chord as a regular P2P network that supports

a lookup routine, which is to find the node responsible for a given identifier k. We then present our multicast

algorithm on top of CAM-Chord.

When a node joins or leaves the overlay topology, the lookup routine is needed to

maintain the topology as it is defined. CAM-Chord is not designed for data sharing

among peers as most other P2P networks (e.g., Chord [1]) do. There are NO data items

associated with the identifier space. Each multicast group forms its own CAM-Chord

network, whose sole purpose is to provide an infrastructure for dynamic capacity-aware

multicasting.

4.3.1 Neighbors

For a node x in Chord, its neighbor identifiers are (x + 2^{i−1}) mod N, ∀i ∈ [1..log_2 N], which are 1, 2, 4, ..., N/2 of the ring away from x. CAM-Chord is a base-c_x Chord with variable c_x for different nodes. Let c_x^i mean (c_x)^i. The neighbor identifiers are (x + j × c_x^i) mod N, denoted as x_{i,j}, ∀j ∈ [1..c_x − 1], ∀i ∈ [0..⌈log_{c_x} N⌉ − 1]. i and j are called the level and the sequence number of x_{i,j}. Let x_{0,0} = x. The actual neighbors are x̂_{i,j}, which are the nodes responsible for x_{i,j}. Note that x̂_{0,1} is just successor(x).

See an illustration in Figure 4-1 with c_x = 3. The level-one neighbors in CAM-Chord divide the whole ring into c_x segments of similar size. The level-two neighbors further divide the closest segment (x, x + N/c_x] into c_x sub-segments of similar size. And so on.

Consider an arbitrary identifier k. Let

    i = ⌊log(k − x) / log c_x⌋    (4-1)

    j = ⌊(k − x) / c_x^i⌋    (4-2)











Figure 4-1. Chord vs. CAM-Chord neighbors (c_x = 3)


It can be easily verified that x_{i,j} is the neighbor identifier of x that is counter-clockwise closest to k, which means x̂_{i,j} is the neighbor node of x that is counter-clockwise closest to the node responsible for k.² We call i the level and j the sequence number of k with respect to x.
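For illustration, the neighbor-identifier layout and the level/sequence computation of (4-1) and (4-2) can be sketched in a few lines. This is a minimal sketch; the function names and the small ring size are our own choices, not part of the protocol:

```python
import math

def neighbor_ids(x, c, N):
    # All CAM-Chord neighbor identifiers of node x with capacity c on a
    # ring of size N: (x + j * c^i) mod N, j in [1..c-1], i in [0..levels-1].
    levels = math.ceil(math.log(N, c))
    return {(i, j): (x + j * c**i) % N
            for i in range(levels) for j in range(1, c)}

def level_and_sequence(x, k, c, N):
    # Level i and sequence number j of identifier k with respect to x,
    # per equations (4-1) and (4-2).
    d = (k - x) % N                       # clockwise distance from x to k
    i = int(math.log(d) / math.log(c))    # i = floor(log(k - x) / log c)
    j = d // c**i                         # j = floor((k - x) / c^i)
    return i, j
```

With c_x = 3 and N = 100, node 0's neighbor identifier x_{3,1} is 27, and identifier k = 50 has level 3 and sequence number 1, so x_{3,1} = 27 is indeed the neighbor identifier of x counter-clockwise closest to k.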

4.3.2 Lookup Routine

CAM-Chord requires a lookup routine to assist member join/departure during a

dynamic multicast session. This routine returns the address of the node responsible for a given identifier k. x.foo() denotes a procedure call to be executed at x. It is a local (or

remote) procedure call if x is the local (or a remote) node. The set of identifiers that x is

responsible for is (predecessor(x), x]. The set of identifiers that successor(x) is responsible

for is (x, successor(x)].

x.LOOKUP(k)

1. if k ∈ (x, successor(x)] then

2.     return the address of successor(x)

3. else

4.     i ← ⌊log(k − x) / log c_x⌋

5.     j ← ⌊(k − x) / c_x^i⌋

6.     if k ∈ (x, x̂_{i,j}] then




² It is possible that x̂_{i,j} is itself the node responsible for k.










7.     return the address of x̂_{i,j}

8. else

/* forward the request to x̂_{i,j} */

9.     return x̂_{i,j}.LOOKUP(k)

First the LOOKUP routine checks if k is between x and successor(x). If so, LOOKUP

returns the address of successor(x). Otherwise, it calculates the level i and the sequence

number j of k. If k falls between x and x̂_{i,j}, which means x̂_{i,j} is responsible for the identifier k, LOOKUP returns the address of x̂_{i,j}. On the other hand, if x̂_{i,j} precedes k, then x forwards the lookup request to x̂_{i,j}. Because x̂_{i,j} is x's closest neighbor preceding k, CAM-Chord makes greedy progress, moving the request closer to k.
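As a concrete sketch, the greedy LOOKUP can be simulated on a sorted list of node identifiers. The helper names, the capacity dictionary, and the dense test ring are assumptions for illustration only; a real deployment resolves x̂_{i,j} through each node's routing state rather than a global list:

```python
import bisect
import math

def responsible(ident, nodes, N):
    # Node responsible for an identifier: its clockwise successor
    # (identifiers in (predecessor(v), v] belong to node v).
    i = bisect.bisect_left(nodes, ident % N)
    return nodes[i % len(nodes)]

def lookup(x, k, nodes, cap, N):
    # Greedy CAM-Chord lookup from node x for identifier k.
    # nodes: sorted node identifiers; cap[x]: capacity c_x of node x.
    hops = 0
    while True:
        succ = responsible((x + 1) % N, nodes, N)
        if (k - x) % N <= (succ - x) % N:      # k in (x, successor(x)]
            return succ, hops
        c, d = cap[x], (k - x) % N
        i = int(math.log(d) / math.log(c))     # level of k w.r.t. x
        j = d // c**i                          # sequence number of k
        xij = (x + j * c**i) % N               # neighbor identifier x_{i,j}
        xhat = responsible(xij, nodes, N)      # neighbor node (responsible)
        if (k - x) % N <= (xhat - x) % N:      # k in (x, xhat]: xhat owns k
            return xhat, hops + 1
        x = xhat                               # greedy step toward k
        hops += 1
```

On a ring of N = 1024 with a node at every multiple of 8 and c_x = 4 everywhere, looking up identifier 500 from node 0 reaches the responsible node 504 in a few greedy hops.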

4.3.3 Topology Maintenance

Because CAM-Chord is an extension of Chord, we use the same Chord protocols to

handle member join/departure and to maintain the correct set of neighbors at each node.

The difference is that our LOOKUP routine replaces the Chord LOOKUP routine. The

details of the protocols can be found in [1].

The join operation of Chord can be optimized because two consecutive nodes on

the ring are likely to have similar neighbors. When a new node joins, it first performs a

lookup to find its successor and retrieves its successor's neighbors (called fingers in Chord).

It then checks those neighbors to make corrections if necessary. In a base-c Chord, the join complexity without the optimization is O(c log^2 n / log^2 c) for a constant c. The optimization reduces the complexity to O(c log n / log c).

CAM-Chord can be regarded as a base-c Chord where c is a random variable following the node-capacity distribution. It cannot always perform the above optimization because consecutive nodes may have different capacities, which makes their neighbor sets different. When this happens, a new node x has to perform O(c_x log n / log c_x) lookups to find all its neighbors. The lookup complexity is O(log n / log(1/E(ln c_x/(c_x − 1)))) by Theorem 4.2 (to be proved). The join complexity is O(c_x log^2 n / (log c_x × log(1/E(ln c_x/(c_x − 1))))), which would be reduced to O(c log^2 n / log^2 c) if









the capacities of all nodes had the same value, i.e., c was a constant. This overhead is

too high for a traditional P2P file-sharing application such as FastTrack, because the observations in [58] showed that a large fraction of the connections last 1 minute or less and many of the IP addresses keep active for no more than 10 minutes each time after they join the system. But CAM-Chord is not designed for file-sharing applications. One appropriate CAM-Chord application is teleconferencing, which has far fewer participants than FastTrack and less dynamic membership changes. We do not expect the majority of participants to keep joining and departing during a conference call. Another application is distributed games, where a user is more likely to play for an hour than for one minute.

CAM-Chord makes a tradeoff between capacity awareness and maintenance overhead,

which makes it unsuitable for highly dynamic multicast groups. For them, CAM-Koorde is

a better choice because a node only has c_x neighbors. Our future research will attempt to

develop new techniques to overcome this limitation of CAM-Chord.

4.3.4 Multicast Routine

On top of the CAM-Chord overlay, we want to implicitly embed a dynamic, roughly

balanced multicast tree for each source node. Each intermediate node in the tree should

not have more children than its capacity. It should be emphasized that no explicit tree

is built. Given a multicast message, a node x executes a MULTICAST routine, sending

the message to c_x selected neighbors, which in turn execute the MULTICAST routine to

further propagate the message. The execution of the MULTICAST routine at different

nodes makes sure that the message follows a capacity-aware multicast tree to reach every

member.

Let msg be a multicast message, k be an identifier, and x be a node executing the

MULTICAST routine. The goal of x.MULTICAST(msg, k) is to deliver msg to all nodes

whose identifiers belong to (x, k]. The routine is implemented as follows: x chooses c_x neighbors that split (x, k] into c_x subsegments as evenly as possible. Each subsegment begins

from a chosen neighbor and ends at the next clockwise chosen neighbor. x forwards the









multicast message to each chosen neighbor, together with the subsegment assigned to

this neighbor. When a neighbor receives the message and its subsegment, it forwards

the message using the same method. The above process repeats until the size of the

subsegment is reduced to one. The distributed computation of MULTICAST recursively

divides (x, k] into non-overlapping subsegments and hence no node will receive the

multicast message more than once.

x.MULTICAST(msg, k)

1. if k = x then

2.     return

3. else

4.     i ← ⌊log(k − x) / log c_x⌋

5.     j ← ⌊(k − x) / c_x^i⌋

/* select children from level-i neighbors preceding k */

6.     k' ← k

7.     for m = j down to 1

8.         x̂_{i,m}.MULTICAST(msg, k')

9.         k' ← x_{i,m} − 1

/* select (c_x − j − 1) children from level-(i−1) neighbors */

10.    l ← c_x

11.    for m = c_x − j − 1 down to 1

12.        l ← l − ⌈c_x / (c_x − j)⌉    /* for even separation */

13.        x̂_{i−1,l}.MULTICAST(msg, k')

14.        k' ← x_{i−1,l} − 1

/* select x's successor */

15.    x̂_{0,1}.MULTICAST(msg, k')

To split (x, k] evenly, x first calculates the level i and the sequence number j of k with respect to x (Lines 4-5). Then neighbors x̂_{i,m} (∀m ∈ [1..j]) at the ith level preceding k are selected as children of x in the multicast tree (Lines 6-9). We also select x's successor, which is x̂_{0,1} (Line 15). Since j + 1 may be less than c_x, in order to fully use x's capacity, c_x − 1 − j neighbors at the (i−1)th level are chosen; Lines 10-14 ensure that the selection is evenly spread at the (i−1)th level. Because the algorithm selects neighbors that divide (x, k] as evenly as possible, it constructs a multicast tree that is roughly balanced. At Line 9, we optimize the code by using k' ← x_{i,m} − 1 instead of k' ← x̂_{i,m} − 1. That is because there is no node in (x_{i,m}, x̂_{i,m}) by the definition of x̂_{i,m}.
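The recursive segment-splitting idea can be sketched as follows. This is a simplified illustration: rather than reproduce the exact level-based child selection of Lines 6-15, it splits (x, k] into up to c_x equal subsegments, which preserves the capacity constraint and the rough balance but not the precise neighbor choices; `responsible` maps an identifier to its clockwise successor node, and the function names are our own:

```python
import bisect

def responsible(ident, nodes, N):
    # Node responsible for an identifier (its clockwise successor).
    i = bisect.bisect_left(nodes, ident % N)
    return nodes[i % len(nodes)]

def multicast(x, k, nodes, cap, N, delivered):
    # Deliver a message from node x to every node with identifier in (x, k].
    # Simplified sketch: split (x, k] into at most cap[x] equal subsegments
    # and delegate each to the first node inside it.
    seg = (k - x) % N
    if seg == 0:
        return
    offsets = sorted({seg * m // cap[x] for m in range(1, cap[x] + 1)})
    prev = 0
    for off in offsets:
        child = responsible((x + prev + 1) % N, nodes, N)
        if prev < (child - x) % N <= off:      # a node exists in (x+prev, x+off]
            delivered.add(child)               # child receives the message once
            multicast(child, (x + off) % N, nodes, cap, N, delivered)
        prev = off
```

Because the subsegments are non-overlapping, every node in (x, k] is reached exactly once, mirroring the divide-and-conquer argument above.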

4.3.5 Analysis

Assume c_x ≥ 2 for every node x. We analyze the performance of the LOOKUP routine and the multicast routine of CAM-Chord. Suppose a node x receives a lookup request for identifier k and it forwards the request to a neighbor node x̂_{i,j} that is closest to k. We call (k − x̂_{i,j})/(k − x) the distance reduction ratio, which measures how much closer the request is to k after one-hop routing. The following lemma establishes an upper bound on the distance reduction ratio with respect to c_x, which is a random variable of certain distribution.

Lemma 4.1. Suppose a node x forwards a lookup request for identifier k to a neighbor x̂_{i,j}. If x̂_{i,j} ∈ (x, k], then

    E((k − x̂_{i,j})/(k − x)) ≤ E(ln c_x / (c_x − 1))

Proof. Based on the algorithm of the LOOKUP routine, i must be the level of k with respect to x. By (4-1) and (4-2), k can be written as

    k = x + j c_x^i + l

where j ∈ [1..c_x − 1] is the sequence number of k with respect to x and l ∈ [0..c_x^i).

    k − x = j c_x^i + l    (4-3)

By definition (Section 4.3.1), x_{i,j} = x + j c_x^i. Because x, x_{i,j}, x̂_{i,j}, and k are in clockwise order on the identifier ring, we have

    k − x̂_{i,j} ≤ k − x_{i,j} = l    (4-4)

By (4-3) and (4-4), we have

    (k − x̂_{i,j})/(k − x) ≤ l/(j c_x^i + l)

We now derive the expected distance reduction ratio. E((k − x̂_{i,j})/(k − x)) depends on three random variables, j, l, and c_x. Because the location of k is arbitrary with respect to x, we can consider j and l as independent random variables with uniform distributions on their respective value ranges.

    E((k − x̂_{i,j})/(k − x)) ≤ E( (1/(c_x − 1)) Σ_{j=1}^{c_x−1} (1/c_x^i) ∫_0^{c_x^i} l/(j c_x^i + l) dl )
                             ≤ E( (1/(c_x − 1)) Σ_{j=1}^{c_x−1} (ln(j + 1) − ln j) )
                             = E( ln c_x / (c_x − 1) )

□


Theorem 4.2. Let c_x, for all nodes x, be independent random variables of certain distribution. The expected length of a lookup path in CAM-Chord is O(ln n / ln(1/E(ln c_x/(c_x − 1)))).

Proof. Suppose (x_1, x_2, ..., x_m) is a prefix of a lookup path for identifier k, where x_1 is the node that initiates the lookup, and x_i, i ∈ [1..m], and k are in clockwise order on the identifier ring. Because the nodes are randomly mapped to the identifier ring by a hash function, the distance reduction ratio after each hop is independent of those after other hops. Consequently (k − x_i)/(k − x_{i−1}), i ∈ [2..m], are independent random variables.

    E(k − x_m) = E( (k − x_m)/(k − x_{m−1}) × (k − x_{m−1})/(k − x_{m−2}) × ... × (k − x_2)/(k − x_1) × (k − x_1) )
               = E((k − x_m)/(k − x_{m−1})) × ... × E((k − x_2)/(k − x_1)) × E(k − x_1)    (4-5)
               ≤ (E(ln c_x/(c_x − 1)))^{m−1} N

where c_x is a random variable with the same distribution as c_{x_i}, i ∈ [1..m]. Next we derive the value of m that ensures E(k − x_m) ≤ N/n, which is the average distance between two adjacent nodes on the identifier ring. The following is a sufficient condition to achieve E(k − x_m) ≤ N/n:

    (E(ln c_x/(c_x − 1)))^{m−1} N ≤ N/n

    m ≥ 1 + ln n / ln(1/E(ln c_x/(c_x − 1)))

If E(k − x_m) ≤ N/n, the expected number of additional routing hops from x_m to k is O(1). O(m) = O(ln n / ln(1/E(ln c_x/(c_x − 1)))) gives the expected length of the lookup path. □

It is natural that the expected length of a lookup path in CAM-Chord depends on

the probability distribution of c_x, which affects the topological structure of the overlay

network. For a given distribution, an upper bound of the expected path length can be

derived from Theorem 4.2. The following theorem gives an example.

Theorem 4.3. Suppose the node capacity c_x follows a uniform distribution with E(c_x) = c. The expected length of a lookup path in CAM-Chord is O(log n / log c).

Proof. Suppose the range of c_x is [t_1..t_2] with E(c_x) = c. We perform Big-O reduction as follows.

    E(ln c_x/(c_x − 1)) = Θ( Σ_{c_x=t_1}^{t_2} (ln c_x/c_x) × 1/(t_2 − t_1 + 1) )
                        = Θ( (1/(t_2 − t_1 + 1)) ∫_{t_1}^{t_2} (ln c_x/c_x) dc_x )
                        = Θ( (ln^2 t_2 − ln^2 t_1) / (2(t_2 − t_1 + 1)) )
                        = Θ( ln^2 c / c )    because t_2 ≤ 2c and t_1 ≤ c

Therefore,

    ln E(ln c_x/(c_x − 1)) = Θ( ln(ln^2 c / c) ) = Θ(−ln c)

By Theorem 4.2, O(ln n / ln(1/E(ln c_x/(c_x − 1)))) = O(ln n / ln c). □
cx

Other distributions of c_x may be analyzed similarly.
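For concreteness, the Lemma 4.1 bound and the resulting path length from the proof of Theorem 4.2 can be evaluated numerically. This is a small sketch; the capacity range [4..10] matches the simulation defaults of Section 4.6, and the function names are our own:

```python
import math

def expected_reduction(caps):
    # E(ln c_x / (c_x - 1)) for a capacity drawn uniformly from caps:
    # the per-hop distance-reduction bound of Lemma 4.1.
    return sum(math.log(c) / (c - 1) for c in caps) / len(caps)

def expected_lookup_hops(n, caps):
    # m = 1 + ln n / ln(1 / E(ln c_x/(c_x-1))): the number of hops after
    # which the expected remaining distance drops below N/n (Theorem 4.2).
    r = expected_reduction(caps)
    return 1 + math.log(n) / math.log(1 / r)
```

For the default capacities [4..10] and n = 100,000 nodes, the per-hop reduction ratio is about 0.34, and the bound gives roughly a dozen expected hops per lookup.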

Next we analyze the performance of the MULTICAST routine in CAM-Chord.

Suppose x executes x.MULTICAST(msg, k), which is responsible for delivering msg to all nodes in the identifier segment (x, k]. Specifically, x forwards msg to some neighbor nodes x̂_{i,m} by remotely invoking x̂_{i,m}.MULTICAST(msg, k'), which is responsible for delivering msg to a smaller subsegment (x̂_{i,m}, k'], where x̂_{i,m}, k' ∈ (x, k]. It is a typical divide-and-conquer strategy. We call (k' − x_{i,m})/(k − x) the segment reduction ratio, which measures the degree of reduction in problem size after one-hop multicast routing. The following lemma establishes an upper bound on the segment reduction ratio with respect to c_x, which is a random variable of certain distribution.

Lemma 4.4. Suppose a node x forwards a multicast message to a neighbor x̂_{i,m}, i.e., x.MULTICAST(msg, k) calls x̂_{i,m}.MULTICAST(msg, k'). It must be true that

    E((k' − x_{i,m})/(k − x)) ≤ E(ln c_x / (c_x − 1))









Proof. Based on the algorithm of the MULTICAST routine, the execution of x.MULTICAST(msg, k) will divide its responsible segment (x, k] into c_x subsegments, and x̂_{i,m} is responsible for delivering msg to all nodes in one subsegment (x_{i,m}, k']. The largest subsegments are created by Lines 6-9. When Line 8 is executed for m ∈ [1..j − 1], k' = x_{i,m+1} − 1. Therefore,

    k' − x_{i,m} = (x + (m + 1) c_x^i − 1) − (x + m c_x^i) < c_x^i    (4-6)

By Line 4, i is the level of k with respect to x. By (4-1) and (4-2), k can be written as

    k = x + j c_x^i + l

where j ∈ [1..c_x − 1] is the sequence number of k with respect to x and l ∈ [0..c_x^i).

    k − x = j c_x^i + l    (4-7)

By (4-6) and (4-7), we have

    (k' − x_{i,m})/(k − x) < c_x^i/(j c_x^i + l)

We now derive the expected segment reduction ratio. E((k' − x_{i,m})/(k − x)) depends on three random variables, j, l, and c_x. Because the location of k is arbitrary with respect to x, we can consider j and l as independent random variables with uniform distributions on their respective value ranges.

    E((k' − x_{i,m})/(k − x)) ≤ E( (1/(c_x − 1)) Σ_{j=1}^{c_x−1} (1/c_x^i) ∫_0^{c_x^i} c_x^i/(j c_x^i + l) dl )    (4-8)
                              = E( (1/(c_x − 1)) Σ_{j=1}^{c_x−1} (ln(j + 1) − ln j) )
                              = E( ln c_x / (c_x − 1) )

□









A multicast path is defined as the path that the MULTICAST routine takes to deliver

a multicast message from a source node to a destination node. The proofs of the following

two theorems are very similar to those of Theorems 4.2 and 4.3, due to the similarity

between Lemma 4.4 and Lemma 4.1, on which the theorems are based. To avoid excessive

repetition and to conserve space, we omit the proofs for Theorems 4.5 and 4.6.

Theorem 4.5. Let c_x, for all nodes x, be independent random variables of certain distribution. The expected length of a multicast path in CAM-Chord is O(ln n / ln(1/E(ln c_x/(c_x − 1)))).

Theorem 4.6. Suppose the node capacity c_x follows a uniform distribution and E(c_x) = c. The expected length of a multicast path in CAM-Chord is O(log n / log c).

4.4 CAM-Koorde Approach

This section proposes CAM-Koorde. For any node x in CAM-Koorde, its number of neighbors is exactly equal to its capacity c_x. The maintenance overhead of CAM-Koorde is smaller than that of CAM-Chord due to the smaller number of neighbors.

Like Koorde, CAM-Koorde embeds the de Bruijn graph in the identifier ring. On the other hand, it has two major differences from Koorde, which are critical to our capacity-aware multicast service.

The first difference is about neighbor selection. The neighbor identifiers of a node
x in Koorde are derived by shifting x one digit (base m) to the left and then replacing
the last digit with 0 through m 1. The neighbor identifiers differ only at the last
digit. Consequently they are clustered and often refer to the same physical node. For the
purpose of multicast, we want the neighbors to spread evenly on the identifier ring. The
neighbor identifiers of a node x in CAM-Koorde are derived by shifting x one or more bits
to the right and then replacing the high-order bits with values from 0 through a certain number. The
neighbor identifiers differ at the high-order bits, and they are evenly distributed on the
identifier ring.

The second difference is about the number of neighbors. Koorde requires every
node to have the same number of neighbors. CAM-Koorde allows nodes to have different
numbers of neighbors.

4.4.1 Neighbors

Let N = 2^b. In CAM-Koorde, x has c_x neighbors, which are categorized into three

groups. All computations are assumed to be modulo N.












Figure 4-2. CAM-Koorde topology with identifier space [0..63]


The basic group has four neighbors. Two are x's predecessor and successor. The other two are the nodes responsible for identifiers (x/2) and 2^{b−1} + (x/2), respectively.

After the basic group, there are c_x − 4 remaining neighbors. Let s = ⌊log(c_x − 4)⌋. If s > 1, we shall shift x by s bits to the right to derive the neighbor identifiers.³ If s > 1, then let t = 2^s; otherwise let t = 0. The neighbors in the second group are the nodes responsible for identifiers (i × 2^{b−s} + x/2^s), ∀i ∈ [0..t − 1].

After the basic and second groups, there are t' = c_x − 4 − t remaining neighbors. Let s' = s + 1. The neighbors in the third group are the nodes responsible for identifiers (i × 2^{b−s'} + x/2^{s'}), ∀i ∈ [0..t' − 1].

It is required that c_x ≥ 4. The basic group is mandatory. The optional second and third groups pick up the remaining neighbors.

An example is given in Figure 4-2, showing the neighbors of node 36 (100100) whose

capacity is 10. The binary representation of the node identifier is given in the parentheses.

The basic group is


{35 (100011), 37 (100101), 18 (010010), 50 (110010)}


The second group is


{9 (001001), 25 (011001),41 (101001),57 (111001)}




3 If s = 1, it means to shift one bit. The basic group already does that.









The third group is


{4 (000100), 12 (001100)}
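The three neighbor groups can be computed mechanically. The sketch below (with an assumed function name, and with predecessor/successor shown as x ∓ 1 as if every identifier were occupied) reproduces the example sets for node 36 with capacity 10 and b = 6:

```python
def koorde_neighbor_ids(x, c, b):
    # CAM-Koorde neighbor identifiers of node x with capacity c >= 4
    # on a ring of N = 2^b identifiers (Section 4.4.1).
    N = 1 << b
    # Basic group: predecessor, successor, x/2, and 2^(b-1) + x/2.
    basic = [(x - 1) % N, (x + 1) % N, x >> 1, (1 << (b - 1)) | (x >> 1)]
    s = (c - 4).bit_length() - 1 if c > 4 else 0   # s = floor(log(c - 4))
    t = 1 << s if s > 1 else 0
    second = [(i << (b - s)) | (x >> s) for i in range(t)]
    t2, s2 = c - 4 - t, s + 1
    third = [(i << (b - s2)) | (x >> s2) for i in range(t2)]
    return basic, second, third
```

For node 36 (100100) with capacity 10, this yields the basic group {35, 37, 18, 50}, the second group {9, 25, 41, 57}, and the third group {4, 12}, matching the example above.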

4.4.2 Lookup Routine

Definition 1. Given two b-bit identifiers x and k, if an l-bit prefix of x matches an l-bit suffix of k, we say x and k share l ps-common bits. x = k if the two share b ps-common bits.

Similar to CAM-Chord, a lookup routine is needed in CAM-Koorde for member

join/departure. First consider an N-node network with every identifier having a

corresponding node. Given an identifier k, suppose node x wants to query for the address

of node k. The lookup routine forwards the lookup request along a chain of neighbors

whose identifiers share progressively more ps-common bits with k. Starting from x, we

identify a neighbor that has the longest prefix matching the suffix of k. More specifically,

if the third group is not empty and a third-group neighbor can be derived by selecting the (⌊log(c_x − 4)⌋ + 1) bits of k that precede the current ps-common bits and shifting them from the left into x, then the lookup request is forwarded to this neighbor. Otherwise, if the second group is not empty and a second-group neighbor can be derived by selecting the ⌊log(c_x − 4)⌋ bits of k that precede the current ps-common bits and shifting them

from the left into x, then the lookup request is forwarded to this neighbor. Otherwise,

we forward the request to a first-group neighbor that increases the number of ps-common

bits by one. To determine each subsequent node on the forwarding path, a similar process

repeats by shifting more bits of k into the identifier of the next hop receiver. After at most

b hops, the request can reach node k.

Now suppose n < N, which is normally the case. We still calculate the chain

of neighbor identifiers in the above way, which essentially transforms identifier x to

identifier k in a series of steps, each step adding one or more bits from k. Once the next

neighbor identifier y on the chain is calculated, the request is forwarded to y, which in









turn calculates its neighbor identifier that should be the next on the forwarding path and

then forwards the request.

The pseudo code of the LOOKUP routine is shown below. It uses the high-order bits

of the node identifier to match the low-order bits of k, which is different from Koorde's

routine and is critical for our multicast routine, to be discussed shortly.

x.LOOKUP(k)

1. if k ∈ (predecessor(x), x] then

2.     return the address of x

3. if k ∈ (x, successor(x)] then

4.     return the address of successor(x)

5. let m_1 be the number of ps-common bits shared by x and k

6. find the neighbor y that shares the largest number m_2 of ps-common bits with k

7. if m_1 < m_2 then

8.     return y.LOOKUP(k)

9. else

10.    if |k − predecessor(x)| < |k − successor(x)| then

11.        return predecessor(x).LOOKUP(k)

12.    else

13.        return successor(x).LOOKUP(k)
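The ps-common-bits test of Definition 1, which drives the routine above, can be sketched directly (the function name is our own):

```python
def ps_common_bits(x, k, b):
    # Largest l such that the l-bit prefix of x equals the l-bit suffix
    # of k, for b-bit identifiers (Definition 1).
    for l in range(b, 0, -1):
        if x >> (b - l) == k & ((1 << l) - 1):
            return l
    return 0
```

For example, node 36 (100100) and identifier 41 (101001) share 4 ps-common bits, since the 4-bit prefix 1001 of 36 equals the 4-bit suffix 1001 of 41; forwarding to the neighbor with the most ps-common bits is exactly the greedy step of the routine.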



Koorde uses Chord's protocols with a new LOOKUP routine for node join/departure,

so does CAM-Koorde.

4.4.3 Multicast Routine

When a node receives a multicast message, it forwards the message to all neighbors

except those that have received or are receiving the message. Because neighbor connections

are bidirectional, it is easy for a node to perform the checking through a short control

packet. The overhead is negligible when the message is large, such as a video file. Note









that a node does not have to wait for the entire message to arrive before forwarding it to

neighbors. The forwarding is done on a per-packet basis, but the checking is performed only

for the first packet of a message, which carries the message header. The pseudo code of

the MULTICAST routine is shown below.

x.MULTICAST(msg)

1. for each neighbor y do

2. if y has not received and is not receiving msg then

3. y.MULTICAST(msg)
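Under the simplifying assumption of a static overlay, with a central `seen` set standing in for the per-link control-packet check, the flooding multicast is essentially a breadth-first traversal:

```python
from collections import deque

def flood(source, adjacency):
    # Flood a message from source over bidirectional neighbor links;
    # a neighbor is skipped if it has already seen the message.
    # Returns the list of (forwarder, receiver) deliveries.
    seen = {source}
    queue = deque([source])
    deliveries = []
    while queue:
        x = queue.popleft()
        for y in adjacency[x]:
            if y not in seen:
                seen.add(y)
                deliveries.append((x, y))
                queue.append(y)
    return deliveries
```

Each member receives the message exactly once, and under synchronous forwarding the deliveries follow shortest paths, matching the premise of Theorem 4.8 below.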

4.4.4 Analysis

Lemma 4.7. Let c_x, for all nodes x, be independent random variables of certain distribution. The expected length of the shortest path between two nodes in CAM-Koorde is O(log n / E(log c_x)).


Proof. Consider two arbitrary nodes x_0 and y. Let y' be an O(log n)-bit prefix of y. We show there exists a path of length O(log n / E(log c_x)) that reaches a node x_m with y' also as its prefix.

We construct a physical path (x_0, x_1, x_2, ..., x_{m−1}, x_m) as follows. Node x_0 has a basic- or second-group neighbor x_1, where x_1 is derived by shifting max{1, ⌊log(c_{x_0} − 4)⌋} low-order bits of y' into x_0 from the left.⁴ The bits of y' that have been used in shifting are called used bits. Similarly, x_1 has a second-group neighbor x_2, where x_2 is derived by shifting max{1, ⌊log(c_{x_1} − 4)⌋} low-order unused bits of y' into x_1 ... Finally, x_{m−1} has a second-group neighbor x_m, where x_m is derived by shifting the remaining max{1, ⌊log(c_{x_{m−1}} − 4)⌋} low-order unused bits of y' into x_{m−1}. The length of path (x_0, x_1, x_2, ..., x_m) is m. The total number of used bits of y' is Σ_{i=0}^{m−1} max{1, ⌊log(c_{x_i} − 4)⌋},



⁴ If c_{x_0} < 6, we can pick x_1 from the basic group, which shifts one bit of y' into x_0; if c_{x_0} ≥ 6, we can pick x_1 from the second group, which shifts ⌊log(c_{x_0} − 4)⌋ bits of y' into x_0.









which is O(log n). Let c_x be a random variable of the same distribution as c_{x_i}, ∀i ∈ [0..m − 1].

    Σ_{i=0}^{m−1} max{1, ⌊log(c_{x_i} − 4)⌋} = O(log n)

    Σ_{i=0}^{m−1} log c_{x_i} = O(log n)

    E( Σ_{i=0}^{m−1} log c_x ) = O(log n)

    m E(log c_x) = O(log n)

    m = O(log n / E(log c_x))

Next we construct an identifier list (x_0, x'_1, x'_2, ..., x'_{m−1}, x'_m) as follows. x'_1 is derived by shifting max{1, ⌊log(c_{x_0} − 4)⌋} low-order bits of y' into x_0 from the left. x'_2 is derived by shifting max{1, ⌊log(c_{x_1} − 4)⌋} low-order unused bits of y' into x'_1 ... Finally, x'_m is derived by shifting the remaining max{1, ⌊log(c_{x_{m−1}} − 4)⌋} low-order unused bits of y' into x'_{m−1}.

y' is an O(log n)-bit prefix of both x'_m and y. The distance between x'_m and y on the identifier ring, |x'_m − y|, is O(N/n). Note that N/n is the average distance between any two adjacent nodes on the ring. If we can show that E(|x_m − x'_m|) ≤ N/n, then E(|x_m − y|) ≤ E(|x_m − x'_m| + |x'_m − y|) = E(|x_m − x'_m|) + E(|x'_m − y|) = O(N/n), which means that the expected number of hops from x_m to y is O(1).

∀i ∈ [1..m], define a random variable Δ_i = |x'_i − x_i|. Because x_i can be anywhere in (predecessor(x'_i), x'_i], we have

    E(Δ_i) = (1/2) E(|predecessor(j) − j|) = N/(2n)    (4-9)

where N/n is the expected distance between two adjacent nodes on the identifier ring.










x_i and x'_i are derived from x_{i−1} and x'_{i−1}, respectively, by shifting the same max{1, ⌊log(c_{x_{i−1}} − 4)⌋} bits of y' in from the left. Therefore, ∀i ∈ [1..m],

    |x'_i − x_i| ≤ Δ_i + |x'_{i−1} − x_{i−1}| / 2^{max{1, ⌊log(c_{x_{i−1}} − 4)⌋}}

By induction, we have

    |x'_m − x_m| ≤ Σ_{i=1}^{m} ( Δ_i Π_{j=i}^{m−1} 1/2^{max{1, ⌊log(c_{x_j} − 4)⌋}} )

Because c_x ≥ 4 for any node x in CAM-Koorde, max{1, ⌊log(c_x − 4)⌋} ≥ 1. Hence,

    E(|x'_m − x_m|) ≤ E( Σ_{i=1}^{m} Δ_i (1/2)^{m−i} )
                   = Σ_{i=1}^{m} E(Δ_i) (1/2)^{m−i}
                   = (N/(2n)) Σ_{i=0}^{m−1} (1/2)^i
                   ≤ N/n

Consequently, the expected number of hops from x_m to x'_m and then to y is O(1). The expected length of the entire path from x_0 to y is O(m) = O(log n / E(log c_x)). □


Theorem 4.8. Let c_x, for all nodes x, be independent random variables of certain distribution. The expected length of a multicast path in CAM-Koorde from a source node to a member node is O(log n / E(log c_x)).

Proof. According to the MULTICAST routine, a multicast packet is delivered in

CAM-Koorde by broadcast, which follows the shortest paths to the member nodes.









The expected length of a multicast path from a source node to a member node is

O(log n / E(log c_x)) by Lemma 4.7. □


Theorem 4.9. Suppose the node capacity c_x follows a uniform distribution and E(c_x) = c. The expected length of a multicast path in CAM-Koorde from a source node to a member node is O(log n / log c).

Proof. Suppose the range of c_x is [t_1..t_2] with E(c_x) = c. We perform Big-O reduction as follows.

    E(log c_x) = Σ_{c_x=t_1}^{t_2} (1/(t_2 − t_1 + 1)) log c_x
               = Θ( (1/(t_2 − t_1 + 1)) ∫_{t_1}^{t_2} log c_x dc_x )
               = Θ( ((t_2 log t_2 − t_2) − (t_1 log t_1 − t_1)) / (t_2 − t_1 + 1) )
               = Θ(log c)    because t_2 ≤ 2c and t_1 ≤ c

By Theorem 4.8, O(log n / E(log c_x)) = O(log n / log c). □

4.5 Discussions

4.5.1 Group Members with Very Small Upload Bandwidth

A node x with very small upload bandwidth should only be a leaf in the multicast

trees unless it is itself the data source. In order to make sure that the MULTICAST routine

does not select x as an intermediate node in any multicast tree, x must not be in the

CAM-Chord (or CAM-Koorde) overlay network. Instead, it joins as an external member.

x asks a node y known to be in CAM-Chord (or CAM-Koorde) to perform LOOKUP(k)

for an arbitrary identifier k, which returns a random node z in the overlay network. x

then attempts to join z as an external member. If z cannot support x, z forwards x to

successor(z). If z admits x as an external member, z will forward the received multicast

messages to x and x will multicast its messages via z. If z leaves the group, x must rejoin

via another node in CAM-Chord (or CAM-Koorde).









4.5.2 Proximity and Geography

The overlay connections between neighbors may have very different delays. Two neighbors may be separated by transcontinental links, or they may be on the same LAN. There exist some approaches to cope with geography, for example, Proximity Neighbor Selection and Geographic Layout. With Proximity Neighbor Selection, nodes are given some freedom in choosing neighbors based on other criteria (e.g., latencies) in addition to the arithmetic relations between their identifiers. With Geographic Layout, node identifiers are chosen in a geographically informed manner. The main idea is to make geographically close nodes form clusters in the overlay. Readers are referred to [57, 59] for details.

As extensions of the existing P2P networks, CAMs can naturally inherit most of those features without much additional effort. For example, instead of choosing the ith neighbor to be (x + 2^i), a proximity optimization of Chord allows the ith neighbor to be any node within the range of [(x + 2^i), (x + 2^{i+1})), which does not affect the complexities [57]. This optimization can also be applied to CAM-Chord (which is an extension of Chord) without affecting the complexities. A node x can choose any node whose identifier belongs to the segment [x + j × c_x^i, x + (j + 1) × c_x^i) as the neighbor x̂_{i,j}. Given this freedom, some heuristics, such as smallest delay first, may be used to choose neighbors to promote proximity-clustering. Specifically, a node x can progressively scan the nodes in the allowed segment for neighbor x̂_{i,j}, for example, by following the successor link to probe the next node in the segment after every 100k data bits sent by x, which trivializes the probing overhead. x always uses the nearest node it has found recently as its neighbor.

4.6 Simulation

Throughput and latency are two major performance metrics for a multicast

application. We evaluate the performance of CAMs from these two aspects. We

simulate multicast algorithms on top of CAM-Chord, Chord, CAM-Koorde, and Koorde,

respectively. The identifier space is [0, 2^19). If not specified otherwise, the default number of nodes in an overlay network is 100,000, and the node capacities are taken from [4..10]









with uniform probability. The upload bandwidths of the nodes are randomly distributed in

a default range of [400,1000] kbps.

It should be noted that the value ranges may go far beyond the default ones in

specific simulations. In our simulations, c_x = ⌊B_x/ρ⌋, where B_x is the node's upload bandwidth and ρ is a system parameter of CAMs, specifying the desired bandwidth per link in the multicast tree. By varying the value of ρ, we can construct CAMs with

different average node capacities, which also mean different average numbers of children

per non-leaf node and consequently different tree depths (latency). If the average node

capacity c is not the default value of 7, the node capacities are taken uniformly from

[4, 2c − 4].
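The capacity assignment used in the simulations is a one-liner. The sketch below assumes ρ = 100 kbps, which reproduces the default capacity range [4..10] from the default bandwidth range [400, 1000] kbps; the clamp to 4 reflects the c_x ≥ 4 requirement of CAM-Koorde:

```python
def capacity(upload_kbps, rho=100):
    # c_x = floor(B_x / rho), clamped to the minimum capacity of 4.
    # rho = 100 kbps is an assumed default, not a value fixed by the text.
    return max(4, upload_kbps // rho)
```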

4.6.1 Throughput

We compare the sustainable throughput of multicast systems based on CAM-Chord,

Chord, CAM-Koorde, and Koorde. Throughput is defined as the rate at which data can

be continuously delivered from a source node to all other nodes. Due to limited buffer

space at each node, the sustainable multicast throughput is decided by the link with the

least allocated bandwidth in the multicast tree. CAM-Chord and CAM-Koorde produce

much larger throughput because a node's capacity c_x (which is its number of children

in the multicast tree) is adjustable based on the node's upload bandwidth. The primary

advantage of CAMs over the Chord/Koorde is their ability to adapt the overlay topology

according to host heterogeneity.

Figure 4-3 compares the throughput of CAM-Chord, Chord, CAM-Koorde, and

Koorde with respect to the average number of children per non-leaf node in the multicast

tree. CAMs perform much better. Their throughput improvement over Chord and Koorde is roughly 70-80% when the nodes' upload bandwidths are chosen from the rather narrow

default range of [400,1000] kbps. The larger the upload-bandwidth range, the more the

throughput improvement, as demonstrated by Figure 4-4. In general, let [a, b] be the range of upload bandwidth. The upper bound b of the range is shown on the horizontal axis of Figure 4-4, while the lower bound a is fixed at 400 kbps.

Figure 4-3. Multicast throughput with respect to average number of children per non-leaf node

Figure 4-4. Throughput improvement ratio with respect to upload bandwidth range

Figure 4-5. Multicast throughput with respect to size of the multicast group

Figure 4-6. Throughput vs. average path length

The figure shows that the

throughput improvement by CAMs increases when the upload-bandwidth range is larger,

representing a greater degree of host heterogeneity. The simulation data also indicate

that the throughput ratio of CAM-Chord (CAM-Koorde) over Chord (Koorde) is roughly

proportional to (a + b)/2a.
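The (a + b)/2a relation can be sanity-checked with a back-of-envelope model: when both trees average c children per non-leaf node, the CAM per-link rate is roughly the average bandwidth divided by c, while Chord's bottleneck is the slowest node's bandwidth divided by c. This is a sketch of that reasoning, not the dissertation's analysis:

```python
def predicted_ratio(a, b, c=7):
    """Rough throughput ratio for bandwidth uniform on [a, b] when both
    trees average c children per non-leaf node (simplified model)."""
    cam = ((a + b) / 2) / c      # CAM per-link rate ~ average bandwidth / c
    chord = a / c                # Chord bottleneck ~ minimum bandwidth / c
    return cam / chord           # simplifies to (a + b) / (2 * a)

print(predicted_ratio(400, 1000))   # 1.75 for the default [400, 1000] kbps range
```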
















Figure 4-5 shows the multicast throughput with respect to the size of the multicast

group. According to the simulation results, the throughput is largely insensitive to the

group size.









4.6.2 Throughput vs. Latency

We measure multicast latency by the average length of multicast paths. Latency

is determined by both the number of hops and the hop delay. In CAM-Chord and

CAM-Koorde, the overlay links are randomly formed among the nodes due to the use of

DHT. The latency of an overlay path is statistically proportional to the number of hops.

That is why we use the number of hops to characterize the latency performance. In fact, the measurement in terms of the number of hops carries information beyond latency. It is also an indication of how many times a message has to be relayed, which is a resource consumption issue. With the proximity neighbor selection in Section 4.5.2, the latency is no longer proportional to the number of hops. We add a simulation in Section 4.6.4 to study this case, where the actual delay is measured.

Both throughput and latency are functions of average node capacity. With a larger

average node capacity (achieved by a smaller value of p), the throughput decreases due to

more children per non-leaf node and the latency also decreases due to smaller tree depth.

There exists a tradeoff between throughput and latency, which is depicted by Figure 4-6.

Higher throughput can be achieved at the cost of longer routing paths. Given the same

upload bandwidth distribution, the system's performance can be tuned by adjusting p.
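The tuning knob described above can be sketched with a simplified model in which throughput tracks p and tree depth tracks log base c of n, with c = (mean bandwidth)/p. The numeric values are illustrative assumptions, not the dissertation's measurements:

```python
import math

def tradeoff(p, mean_bw=700.0, n=100_000):
    """Simplified model: per-link rate ~ p; average capacity c = mean_bw / p;
    multicast tree depth ~ log base c of n."""
    c = mean_bw / p
    return p, math.log(n) / math.log(c)

for p in (50, 100, 200):
    rate, depth = tradeoff(p)
    print(rate, round(depth, 2))   # a higher per-link rate comes with a deeper tree
```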

The figure also shows that, for relatively small throughput (less than 46 kbps in the figure), namely for large node capacities, CAM-Koorde slightly outperforms CAM-Chord; for relatively large throughput (greater than 46 kbps in the figure), namely for small node capacities, CAM-Chord outperforms CAM-Koorde, which will be further explained in Section 4.6.4.

4.6.3 Path Length Distribution

Figure 4-7 and Figure 4-8 present the statistical distribution of multicast path

lengths, i.e., the number of nodes that can be reached by a multicast tree in certain

number of hops. Each curve represents a simulation with node capacities chosen from

a different range. When the capacity range increases, the distribution curve moves to

















45000
40000
35000 |
30000
25000
20000
15000 -
10000
5000

0 2 4 6 8
Path Length(hops)


10 12 14


Figure 4-7. Path length distribution in CAM-Chord. Legend "[x..y]" means the node
capacities are uniformly distributed in the range of [x..y].


90000
80000
70000
60000
50000
40000
30000
20000
10000
0


4
[4..6]
[4..8]
[4..10]
[4..20]
[4..40]
[4..100]
[4..200]


5 10
Path Length(hops)


15 20


Figure 4-8. Path length distribution in CAM-Koorde. Legend "[x..'/|
capacities are uniformly distributed in the range of [x..y].


means the node










16
CAM-Chord
CAM-Koorde ---
14 1.5*ln(100,000)/ln(c)
S12
i5)
S10

s 8 -&

6


2
4 ----------------------------

0 10 20 30 40 50 60 70 80 90 100 110
Average Node Capacity

Figure 4-9. Average path length with respect to average node capacity


the left of the plot due to shorter multicast paths. The improvement in the distribution

can be measured by how much the curve is shifted to the left. At the beginning, a small

increase in the capacity range causes significant improvement in the distribution. After the

capacity range reaches a certain level ([4,10] in our simulations), the improvement slows

down drastically.

Each curve has a single peak, and the right side of the peak quickly decreases to zero. This means that the vast majority of nodes are reached within a small range of path lengths. We did not observe any multicast path whose length was significantly larger than the average path length.

4.6.4 Average Path Length

Figure 4-9 shows the average path length with respect to the average node capacity. We also plot an artificial line, 1.5 ln n/ln c with n = 10^5, which upper-bounds the average path lengths of CAM-Chord and CAM-Koorde, verifying Theorem 4.6 and Theorem 4.9.
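The reference curve plotted in Figure 4-9 is easy to evaluate directly:

```python
import math

def path_length_bound(c, n=100_000):
    """The artificial line from Figure 4-9: 1.5 * ln(n) / ln(c)."""
    return 1.5 * math.log(n) / math.log(c)

for c in (7, 12, 50, 100):
    print(c, round(path_length_bound(c), 2))   # the bound shrinks as capacity grows
```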

From the figure, when the average node capacity is less than 10, the average path

length of CAM-Chord is smaller than that of CAM-Koorde; when the average node

capacity is greater than 12, the average path length of CAM-Koorde is smaller than that of CAM-Chord. A smaller average path length means more balanced multicast trees. For

small node capacities, CAM-Chord multicast trees are more balanced than CAM-Koorde multicast trees, and vice versa.

Figure 4-10. Proximity optimization

The reasons are explained as follows. On one hand, a

non-leaf CAM-Koorde node x may have fewer children than c_x because some neighbors may have already received the multicast message from a different path. This tends to make the depth of a CAM-Koorde multicast tree larger than that of a CAM-Chord tree. On the other hand, a CAM-Chord node x may split (x, k] into uneven subsegments (i.e., subtrees) with a ratio up to c_x (Lines 6-15 in Section 4.3.4). This ratio of unevenness becomes small

when the node capacities are small. Combining these two factors, CAM-Chord creates

better balanced trees for small node capacities; CAM-Koorde creates better balanced trees

for large node capacities.

Next we use CAM-Chord as an example (Section 4.5.2) to study the impact of

proximity optimization. In [60], Mukherjee found that the end-to-end packet delay on the Internet can be modeled by a shifted Gamma distribution, which is a long-tail

distribution. The shape parameter varies from approximately 1.0 during low loads to

6.0 during high loads on the backbone. We set the shape parameter to be 5.0 and the

average packet delay to be 50 ms. Figure 4-10 compares the average latency of delivering

a multicast message from a source to a receiver in CAM-Chord with or without the

proximity optimization. The simulation is performed for different average node capacities, and the impact of proximity optimization is significant. In most cases, it reduces the latency by more than half.

Figure 4-11. Latency ratio with respect to capacity ratio
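The shifted Gamma delay model described above can be sampled as follows. Only the shape (5.0) and the mean (50 ms) are specified in the text, so the 10 ms shift below is an illustrative assumption:

```python
import random

def hop_delay_ms(shape=5.0, mean_ms=50.0, shift_ms=10.0):
    """Shifted Gamma hop delay: shape and mean follow the simulation setup;
    the 10 ms shift is a hypothetical choice, not taken from the text."""
    scale = (mean_ms - shift_ms) / shape   # E[delay] = shift + shape*scale = mean
    return shift_ms + random.gammavariate(shape, scale)

random.seed(3)
samples = [hop_delay_ms() for _ in range(20_000)]
print(sum(samples) / len(samples))   # sample mean close to 50 ms
```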

4.6.5 Impact of Dynamic Capacity Variation

In a real environment, the upload bandwidth of a node may fluctuate. If we always use the same implicit multicast trees, then the dynamic variation of node capacities will

cause variation in average throughput but not in average latency. CAMs can also be easily

modified to ensure throughput but allow latency variation. If a node's capacity decreases,

it simply forwards messages to a smaller number of neighbors, which automatically

reshapes the implicit tree. If a node's capacity increases for a long time, the node can take

advantage of the improved capacity by increasing the number of neighbors.
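The adaptation rule above can be sketched in a few lines; this is a simplified illustration of the idea, not the actual protocol code:

```python
def forwarding_fanout(claimed_capacity, capacity_ratio):
    """If actual capacity falls below the claimed c_x, forward to only as many
    neighbors as can currently be supported; the remaining subtrees are reached
    through those neighbors, reshaping the implicit multicast tree."""
    actual = int(claimed_capacity * capacity_ratio)
    return max(1, min(claimed_capacity, actual))

print(forwarding_fanout(10, 1.0))   # 10: full claimed fan-out
print(forwarding_fanout(10, 0.6))   # 6: reduced fan-out, deeper tree, higher latency
```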

We define the capacity ratio as the actual capacity of a node x at run time divided by the claimed capacity c_x that is used to build the topology of CAMs. We define the latency ratio as the actual delay at a given capacity ratio divided by the baseline delay when the capacity ratio is 100%, namely, when there is no dynamic capacity variation. Clearly, the latency ratio is a function of the capacity ratio.

Figure 4-11 shows the relation between the latency ratio and the capacity ratio.

When the capacity ratio is smaller, which means the nodes cannot support as many

children as they have claimed, the nodes will forward the received messages to a