Efficient Data Structures and Protocols with Applications in Space-Time Constrained Systems


Material Information

Title:
Efficient Data Structures and Protocols with Applications in Space-Time Constrained Systems
Physical Description:
1 online resource (180 p.)
Language:
English
Creator:
Qiao, Yan
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
CHEN,SHIGANG
Committee Co-Chair:
SAHNI,SARTAJ KUMAR
Committee Members:
PEIR,JIHKWON
FORTES,JOSE A
FANG,YUGUANG

Subjects

Subjects / Keywords:
bloom-filter -- distributed-join -- membership-lookup -- mobile-peer-to-peer -- multi-set-membership -- rfid -- space-efficiency -- time-efficiency
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography   ( marcgt )
theses   ( marcgt )
government publication (state, provincial, territorial, dependent)   ( marcgt )
born-digital   ( sobekcm )
Electronic Thesis or Dissertation

Notes

Abstract:
Space-time efficient data structures can benefit many application systems. Space-efficiency describes a data structure that fits compactly into memory. Time-efficiency means that accessing or updating it takes a short period of time. In this work, we cover space-time efficient data structures related to four abstract problems: set membership checking, multi-set membership checking, distributed set joining, and set size estimation. The goal is to improve the space-time efficiency of existing solutions. We first study an efficient set representation, the Bloom filter. Traditional Bloom filters offer excellent space efficiency, but each lookup requires multiple hash computations and multiple memory accesses. We propose a new family of space-time efficient Bloom filters that provide the same space efficiency but require far fewer memory accesses for each lookup. Due to their wide applicability, any improvement to the performance of Bloom filters can potentially have a broad impact in many areas of research. We then propose a data structure for representing multi-sets, where a set ID is associated with each member element. Our proposed data structure reduces the space and computational complexity while keeping the error ratio at a low level. Next, we design another Bloom filter variant, called the adaptive Bloom filter, for efficiently joining two distributed sets. When two nodes in a distributed system exchange their elements to find the common ones, the Bloom filter can be used to encode element IDs with less network overhead. Our design further reduces the amount of data that is exchanged. Then, we apply our efficient Bloom filter idea to cyber-physical systems. We show that the space-time domain in a computer/network system can be mapped to the time-energy domain of an RFID system. Based on the partitioned Bloom filter and our efficient Bloom filter idea, we design two time-energy efficient protocols for RFID systems.
We show how to tweak the data structures to adapt to application-specific features. Last, we propose two lightweight protocols to estimate the number of vehicles in a region. Future vehicles may be equipped with wireless devices to form a powerful vehicular peer-to-peer network. For a stationary wireless network, this problem has been extensively studied. However, it becomes much more challenging for mobile vehicular peer-to-peer networks whose topologies are constantly changing, and the estimation protocol has to be fast to take a useful snapshot of the dynamic network. Our protocols are based on circled random walks and tokened random walks. With simulations under two different mobility models, we show the efficiency of our protocols.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Yan Qiao.
Thesis:
Thesis (Ph.D.)--University of Florida, 2014.
Local:
Adviser: CHEN,SHIGANG.
Local:
Co-adviser: SAHNI,SARTAJ KUMAR.

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2014
System ID:
UFE0046700:00001




Full Text

EFFICIENT DATA STRUCTURES AND PROTOCOLS WITH APPLICATIONS IN SPACE-TIME CONSTRAINED SYSTEMS

By
YAN QIAO

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2014

© 2014 Yan Qiao

To my parents and my husband, without whom I could never reach so far

ACKNOWLEDGMENTS

It was such a long journey from the second I decided to pursue a Ph.D. degree to this moment when I finally get one. I have received so much help, inspiration, and encouragement from my family, friends, teachers, and colleagues. Without them, I cannot imagine how dark this path would be.

Looking back on the past few years of my study at the University of Florida, it is undoubtedly clear that the first and most important person along this journey was Dr. Shigang Chen, my Ph.D. advisor and my mentor. Thanks to his open-door policy, I can always discuss unsolved issues with him and seek his suggestions. He has spent countless hours working with me even though he doesn't have to. Sometimes he was still up at 3 am just to finish reviewing my paper so that I would have more time to revise it before the submission deadline. I am so grateful. In life Dr. Chen is a very nice person. Every Thanksgiving he invites our group to dinner with his family. It feels so warm, especially as my own family was far away in China. He is not just an advisor to my research and a mentor from whom I learn, but also a friend and family.

I want to give my gratitude to my Ph.D. committee members. They are Dr. Jih-Kwon Peir, Dr. Jose Fortes, Dr. Sartaj Sahni and Dr. Yuguang Fang. Thank you for your advice and support during my study at the University of Florida.

I also would like to thank the fellow students and researchers in my research group. Their names are Ming Zhang, Tao Li, Wen Luo, Zhen Mo, Yian Zhou, Min Chen, Xi Tao, You Zhou, Qingjun Xiao, Zhiping Cai, and Liping Zhang. Thanks for the fresh insights you gave me and all the fun you brought me.

Lastly, I would like to give my greatest thanks to my parents. They are such hardworking people and they have provided me with the best resources they can get. I also want to thank my husband, who has always believed in me even when I didn't. Thanks for being by my side for all these years.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
  1.1 Background
  1.2 Research Scope of This Work

2 SYSTEM MODEL, METRICS, RELATED WORKS, AND CONTRIBUTIONS
  2.1 System Model
  2.2 Performance Metrics
  2.3 Related Works
    2.3.1 Better Bloom Filter
    2.3.2 Bloom Filter for Dynamic Set
    2.3.3 Bloom Filter for Distributed Join
    2.3.4 Bloom Filter for Multi-Set
  2.4 Contributions

3 SPACE-TIME EFFICIENT BLOOM FILTERS FOR SET MEMBERSHIP LOOKUP
  3.1 Background
  3.2 Bloom-1: One Memory Access Bloom Filter
    3.2.1 Bloom Filter
    3.2.2 Bloom-1 Filter
    3.2.3 Impact of Word Size
    3.2.4 Bloom-1 v.s. Bloom with Small k
    3.2.5 Bloom-1 v.s. Bloom with Optimal k
  3.3 Bloom-g: A Generalization of Bloom-1
    3.3.1 Bloom-g Filter
    3.3.2 Bloom-g v.s. Bloom with Small k
    3.3.3 Bloom-g v.s. Bloom with Optimal k
    3.3.4 Discussion
    3.3.5 Using Bloom-g in a Dynamic Environment
    3.3.6 Experiment
  3.4 Bloom-: Another Generalization of Bloom-1
    3.4.1 Motivation
    3.4.2 Bloom- Filter
    3.4.3 Experiment
  3.5 Summary

4 SPACE-TIME EFFICIENT MULTI-SET MEMBERSHIP LOOKUP
  4.1 Background
  4.2 Problem Definition
    4.2.1 Multi-Set Membership Lookup (MSM)
    4.2.2 Performance Metrics
  4.3 Design of MSM Function
    4.3.1 Motivation
    4.3.2 MSM Function
    4.3.3 Conflict Classification
    4.3.4 False Positive
    4.3.5 Mis-Classification
  4.4 Discussions
    4.4.1 Deletion
    4.4.2 Multiple Memory Banks
    4.4.3 System Overload
  4.5 Evaluation
    4.5.1 Simulation Setup
    4.5.2 Simulation Results
  4.6 Summary

5 ADAPTIVE BLOOM FILTERS FOR DISTRIBUTED JOIN
  5.1 Background
  5.2 System Model
    5.2.1 Problem Definition
    5.2.2 Performance Metrics
  5.3 Bloom filter Approaches
    5.3.1 Bloom filter (BF)
    5.3.2 Compressed Bloom filter (CBF)
  5.4 Adaptive Bloom filter (ABF)
    5.4.1 Motivation
    5.4.2 Description
    5.4.3 Analysis
  5.5 Summary

6 USING PARTITIONED BLOOM FILTERS FOR INFORMATION COLLECTION IN RFID SYSTEMS
  6.1 From Space-Time Efficiency to Time-Energy Efficiency
  6.2 RFID Background, Model, and Problem Definition
    6.2.1 RFID System
    6.2.2 System Model
    6.2.3 Problem Definition
    6.2.4 Prior Art
  6.3 Basic Polling Protocol (BP)
  6.4 Coded Polling Protocol
  6.5 Tag-Ordering Polling Protocol (TOP)
    6.5.1 Motivation
    6.5.2 Protocol Description
  6.6 Performance Analysis of TOP
    6.6.1 Energy Cost
    6.6.2 Execution Time
    6.6.3 Choosing v for Time-constrained Energy Minimization
    6.6.4 Computing pd(m, v, x) the Balls And Bins Algorithm
  6.7 Enhanced Tag-Ordering Polling Protocol (ETOP)
    6.7.1 Motivation
    6.7.2 Protocol Description
  6.8 Performance Analysis of ETOP
    6.8.1 Energy Cost
    6.8.2 Execution Time
    6.8.3 Choosing v for Time-constrained Energy Minimization
  6.9 Channel Error
  6.10 Simulation Results
    6.10.1 Varying number n of tags
    6.10.2 Varying Size v of Reporting-order Vector
  6.11 Summary

7 SET SIZE ESTIMATION IN MOBILE VEHICULAR P2P NETWORKS
  7.1 Background, System Model, and Problem Definition
    7.1.1 Mobile Vehicular P2P Networks
    7.1.2 Network Model
    7.1.3 Problem Definition
    7.1.4 Prior Art
  7.2 Circled Random Walk Protocol (CRW)
    7.2.1 Description of the Protocol
    7.2.2 Linking E(X) to n
    7.2.3 Determining the Number of Probes
  7.3 Tokened Random Walk Protocol (TRW)
    7.3.1 Description of the Protocol
    7.3.2 Linking E(Y) to n
    7.3.3 Random Token Distribution
    7.3.4 Determining the Number of Probes
  7.4 Simulation
    7.4.1 Simulation Setup for Street Model
    7.4.2 Accuracy and Overhead under Street Model
    7.4.3 Simulation Setup for Random Waypoint Model
    7.4.4 Accuracy and Overhead under the Random Waypoint Model
    7.4.5 Performance under Low Mobility
  7.5 Summary

8 CONCLUSION AND FUTURE WORK

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

Table
3-1 Notations in Chapter 3
3-2 Query overhead comparison of Bloom-1 filters and Bloom filter with k = 3.
3-3 Query overhead comparison of Bloom-1 filter and Bloom filter with optimal number of membership bits.
3-4 Query overhead comparison of Bloom-2 filter and Bloom filter with k = 3.
3-5 Query overhead comparison of Bloom filter and Bloom-g filter with optimal k.
3-6 Query overhead comparison of Bloom filter with k = 3 and Bloom-g filter with $h = 3\log_2 m$.
3-7 Query overhead comparison of Bloom filter and Bloom-g filter with optimal k.
3-8 Query overhead comparison of Bloom-1, Bloom-2 and Bloom- filter.
7-1 Average completion time of Circled Random Walk (CRW)

LIST OF FIGURES

Figure
3-1 False positive ratio with respect to word size w.
3-2 Performance comparison in terms of false positive ratio.
3-3 Number of membership bits used by the filters.
3-4 Optimal number of membership bits with respect to the load factor.
3-5 False positive ratios of the Bloom filter and the Bloom-1 filter with optimal k.
3-6 False positive ratios of Bloom filter and Bloom-2 filter.
3-7 Number of membership bits used by the filters.
3-8 Optimal number of membership bits with respect to the load factor.
3-9 False positive ratios of Bloom and Bloom-g with optimal k.
3-10 False positive ratios of the Bloom filter with k = 3, the Bloom-1 filter with k = 6, and the Bloom-2 filter with k = 5.
3-11 False positive ratios of Bloom with k = 3 and Bloom-g with $h = 3\log_2 m$ in real trace experiment.
3-12 False positive ratios of Bloom and Bloom-g with optimal k in real trace experiment.
3-13 False positive ratio of Bloom-1 with respect to load of words.
3-14 Frequency distribution of words with respect to load values.
3-15 False positive ratios of Bloom-1, Bloom-2 and Bloom- Filter with k = 3.
3-16 Average number of memory access for Bloom-1, Bloom-2 and Bloom- Filter with k = 3 in real trace experiment.
3-17 False positive ratios of Bloom-1, Bloom-2 and Bloom- Filter with k = 3 in real trace experiment.
3-18 Average number of memory access for Bloom-1, Bloom-2 and Bloom- Filter with optimal k in real trace experiment. Parameters: n = 25,000 and w = 64.
3-19 False positive ratios of Bloom-1, Bloom-2 and Bloom- Filter with optimal k in real trace experiment. Parameters: n = 25,000 and w = 64.
3-20 Optimal number of membership bits for the Bloom- Filter in real trace experiment. Parameters: n = 25,000 and w = 64.
4-1 An example of inserting a member to the MSM function.
4-2 An illustration of using load-to-left, candidate-to-right policy to insert a member to a SID-table. An entry marked with `x' (`0') means is used (unused).
4-3 An example of looking up a member in the MSM function.
4-4 State transition diagram for codes in index encoder with time-based deletion. is the starting state.
4-5 Insertion failure ratio of the Bloomier filter and the MSM filter.
4-6 Average number of memory access per membership lookup by the Bloomier filter, COMB, and the MSM filter.
4-7 Number of hash bits required by the Bloomier filter, COMB, and the MSM filter for each membership lookup.
4-8 False positive ratio comparison among the Bloomier filter, COMB, and the MSM filter.
4-9 Conflict classification ratio comparison between COMB and MSM.
4-10 Error ratio of the Bloomier filter, COMB, and the MSM filter.
5-1 The partitioned Bloom filter with k = 4 segments.
5-2 A simplified adaptive Bloom filter.
6-1 Illustration of the representative segment of a tag t in TOP.
6-2 Energy, time, and energy-time trade-off of TOP.
6-3 Bound B that satisfies Prob{T B} with respect to v. Parameters: n = 10,000, m = 1,000.
6-4 Illustration of the representative segment of a tag t in ETOP.
6-5 Bound B that satisfies Prob{T B} with respect to v. Parameters: n = 10,000, m = 1,000.
6-6 Energy and time comparison between BP, CP, TOP, and ETOP.
6-7 Energy and execution time comparison of TOP and ETOP.
7-1 Illustration of CRW.
7-2 E(X) as a function of n.
7-3 Illustration of TRW.
7-4 CRW's estimation accuracy under the street model.
7-5 TRW's estimation accuracy under the street model.
7-6 Number of message transmissions for each estimation, under the street model.
7-7 CRW's estimation accuracy under the random waypoint model with nodal moving velocity in range of [5, 30].
7-8 TRW's estimation accuracy under the random waypoint model with nodal moving velocity in range of [5, 30].
7-9 Number of message transmissions for each estimation, under the random waypoint model with nodal moving velocity in range of [5, 30].
7-10 CRW's estimation accuracy under the random waypoint model with nodal moving velocity in range of [3, 10].
7-11 TRW's estimation accuracy under the random waypoint model with nodal moving velocity in range of [3, 10].
7-12 Number of message transmissions for each estimation, under the random waypoint model with nodal moving velocity in range of [3, 10].

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EFFICIENT DATA STRUCTURES AND PROTOCOLS WITH APPLICATIONS IN SPACE-TIME CONSTRAINED SYSTEMS

By
Yan Qiao
May 2014

Chair: Shigang Chen
Major: Computer Engineering

Space-time efficient data structures can benefit many application systems. Space-efficiency describes a data structure that fits compactly into memory. Time-efficiency means that accessing or updating it takes a short period of time. In this work, we cover space-time efficient data structures related to four abstract problems: set membership checking, multi-set membership checking, distributed set joining, and set size estimation. The goal is to improve the space-time efficiency of existing solutions.

We first study an efficient set representation, the Bloom filter. Traditional Bloom filters offer excellent space efficiency, but each lookup requires multiple hash computations and multiple memory accesses. We propose a new family of space-time efficient Bloom filters that provide the same space efficiency but require far fewer memory accesses for each lookup. Due to their wide applicability, any improvement to the performance of Bloom filters can potentially have a broad impact in many areas of research.

We then propose a data structure for representing multi-sets, where a set ID is associated with each member element. Our proposed data structure reduces the space and computational complexity while keeping the error ratio at a low level.

Next, we design another Bloom filter variant, called the adaptive Bloom filter, for efficiently joining two distributed sets. When two nodes in a distributed system exchange their elements to find the common ones, the Bloom filter can be used to encode element IDs with less network overhead. Our design further reduces the amount of data that is exchanged.

Then, we apply our efficient Bloom filter idea to cyber-physical systems. We show that the space-time domain in a computer/network system can be mapped to the time-energy domain of an RFID system. Based on the partitioned Bloom filter and our efficient Bloom filter idea, we design two time-energy efficient protocols for RFID systems. We show how to tweak the data structures to adapt to application-specific features.

Last, we propose two lightweight protocols to estimate the number of vehicles in a region. Future vehicles may be equipped with wireless devices to form a powerful vehicular peer-to-peer network. For a stationary wireless network, this problem has been extensively studied. However, it becomes much more challenging for mobile vehicular peer-to-peer networks whose topologies are constantly changing, and the estimation protocol has to be fast to take a useful snapshot of the dynamic network. Our protocols are based on circled random walks and tokened random walks. With simulations under two different mobility models, we show the efficiency of our protocols.

CHAPTER 1
INTRODUCTION

1.1 Background

Space-time efficient data structures and protocols can benefit many application systems. Space-efficiency describes a data structure that can fit compactly into memory. The term time-efficiency has two dimensions: for a data structure, it implies that accessing or updating it requires a short period of time; for a protocol that contains continuous time slots, it means that the execution time of the protocol is short. Space-time efficiency implies both space-efficiency and time-efficiency. Space-time efficient data structures and protocols play an important role when space or time has a major influence on performance. In this work, we show several space-time efficient data structure and protocol designs and their applications in such systems.

For example, with modern high-end routers, today's network system is able to process millions of packets per second and forward each packet in a few clock cycles (40 Gbps for OC-768 and 16.4 Tbps in experimental systems [49]). To keep up with such high throughput, many network functions that deal with real-time packets also have to be implemented in hardware, e.g. traffic measurement, packet scheduling, access control, quality of service, and so on. However, the data associated with these functions may not be stored in DRAM, because the bandwidth and delay of DRAM access cannot match the packet throughput at line speed. Consequently, the recent research trend is to implement network functions in the high-speed on-die cache memory, which is typically SRAM for network processor chips. SRAM has limited size, typically a few megabytes. Further increasing on-chip memory to more than 10 MB is technically feasible, but it is more expensive and the access time is longer. Off-chip SRAM is larger [124], but it is slower to access and its memory bandwidth is constrained by the number of I/O pins. Furthermore, with a limited size, on-chip memory may have to be shared by various integrated functions that are implemented together. For example, traffic monitoring and measurement functions provide critical information for service provision, anomaly detection, capacity planning, accounting and billing in modern computer networks [43 45 74 90 166]. If these functions are implemented in off-chip DRAM, they will not be able to keep up with today's line speed. As a result, it is highly desirable to use space-time efficient data structures in high-speed network systems.

As another example, a distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages [37]. In distributed systems such as the ones mentioned in [75 82], communication between distributed components is large in scale and happens frequently. It is desirable to make the communication extremely efficient.

In this work, we study space-time efficient data structures and protocols in space-time constrained systems in general. We show how these data structures can be applied in, but are not restricted to, high-speed network systems, distributed systems, RFID systems, and vehicular systems.

1.2 Research Scope of This Work

In this work, we first study a general efficient data structure, i.e. the Bloom filter. Bloom filters are efficient data structures for membership checks against large data sets [7 12]. They are widely applied in routing-table lookup [40 145 146], per-flow traffic measurement [76], flow label identification [97], firewall design [103], intrusion detection [150], and so on. Traditional Bloom filters offer excellent space efficiency, but each lookup requires multiple hash computations and multiple memory accesses [7 12 98]. We propose a new family of space-time efficient Bloom filters that provide the same space efficiency but require far fewer memory accesses for each lookup (Chapter 3). Due to their wide applicability, any improvement to the performance of Bloom filters can potentially have a broad impact in many areas of research.

The Bloom filter can compactly encode a single set, but using it alone does not efficiently represent a multi-set, where a set ID is associated with each member element. Inspired by the space-time efficient Bloom filter design, we propose an efficient data structure to encode multi-set membership. Our design goal is to reduce the space and computational complexity while keeping the error ratio caused by false positives at a low level. Chapter 4 shows the detailed design.

Bloom filters have wide application in distributed systems. We design an adaptive Bloom filter for distributively joining two sets in distributed systems (Chapter 5). When the Bloom filter and the compressed Bloom filter were first introduced to this problem, the communication cost was greatly reduced. Our design can further reduce the size of the messages that have to be exchanged.

Next, we shift our view from the space domain to the time domain and show how an efficient protocol can benefit cyber-physical systems. In a typical computer/network system, information is stored in memory as 0/1 bits. In a wireless environment such as an RFID system, by contrast, information is sent and received via continuous time slots. The total length of the time slots defines the execution time of the protocol. As a result, the space domain in computer/network systems becomes the time domain in RFID systems. Meanwhile, RFID tags send/receive information to/from one or more time slots. The amount of data that a tag sends or receives determines its energy expenditure. Therefore, the time domain in computer/network systems (accessing/updating the data structure) can be mapped to the energy domain of RFID tags (sending/receiving information to/from time slots). As a result, designing a time-energy efficient RFID protocol shares the same discipline as implementing a space-time efficient data structure in a computer/network system. Through a specific problem in an RFID system, i.e. how to collect information efficiently from a subset of RFID tags, we show that an efficient protocol design can reduce the battery consumption of RFID tags to a constant level (Chapter 6).

Last, we further discuss applications in vehicular peer-to-peer (VP2P) networks (Chapter 7). Collecting information from VP2P networks has important applications. One problem is to determine the cardinality of a large VP2P network, i.e., the number of vehicles in the network. For a stationary wireless network, this problem can be trivially solved through a flooding-based query. However, it becomes much more challenging for mobile VP2P networks whose topologies are constantly changing. Meanwhile, the protocol has to run quickly in order to provide a useful snapshot of the network. We propose two lightweight protocols, based on circled random walks and tokened random walks, for estimating the cardinality. Simulation is performed under two different mobility models: the street model and the random waypoint model.

At the end, we conclude this work and discuss future work.
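To make the lookup cost discussed in Section 1.2 concrete, the following minimal Python sketch implements a standard Bloom filter over 64-bit words and reports how many distinct words a single lookup reads. It is an illustration only, not the implementation evaluated in this work: the class name, the salted BLAKE2b hashing, and the 64-bit word size are assumptions made for the example.

```python
import hashlib

WORD_BITS = 64  # assumed memory word size for this illustration


class BloomFilter:
    """Standard Bloom filter stored as an array of fixed-size words."""

    def __init__(self, m_bits, k):
        # Round the bit array up to a whole number of words.
        self.num_words = (m_bits + WORD_BITS - 1) // WORD_BITS
        self.m = self.num_words * WORD_BITS
        self.k = k
        self.words = [0] * self.num_words

    def _positions(self, item):
        # Derive k bit positions from k differently salted hashes
        # (hypothetical choice; any k independent hash functions would do).
        for i in range(self.k):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=i.to_bytes(8, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.words[pos // WORD_BITS] |= 1 << (pos % WORD_BITS)

    def lookup(self, item):
        """Return (is_member, number_of_distinct_words_read)."""
        touched = set()
        for pos in self._positions(item):
            word_index = pos // WORD_BITS
            touched.add(word_index)
            if not (self.words[word_index] >> (pos % WORD_BITS)) & 1:
                return False, len(touched)  # a zero bit proves non-membership
        return True, len(touched)


bf = BloomFilter(m_bits=4096, k=3)
for name in ["10.0.0.1", "10.0.0.2", "10.0.0.3"]:
    bf.add(name)
found, words_read = bf.lookup("10.0.0.2")
print(found, words_read)
```

With k hash functions a lookup may read up to k different words, since the k bit positions are spread over the whole array. This per-query memory-access overhead is what the filters of Chapter 3 aim to reduce.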

CHAPTER 2
SYSTEM MODEL, METRICS, RELATED WORKS, AND CONTRIBUTIONS

In this chapter, we introduce the system models, performance metrics, related works and contributions of this work.

2.1 System Model

In this work, we focus on space-time constrained systems. The applications we use in this work are diverse, including high-speed network systems, distributed systems, RFID systems, and vehicular peer-to-peer systems. Despite application-specific features, these systems share some similarities: space and time are precious resources that directly or indirectly constrain the system performance. Therefore, the space or time or both that a data structure/protocol may use is limited. For instance, a high-speed network system such as that in high-end routers has huge throughput demands and high speed requirements [49]. We make the following reasonable assumptions while designing the data structures for such systems:

- Improving per-operation overhead brings large performance gains.
- Available space is limited in size (e.g. online SRAM is normally smaller than 10 MB).
- The system fetches a block of memory (e.g. 32 bits, or 64 bits) into the processor at a time. This is mostly true for modern processors.

More application-specific models will be discussed in the corresponding chapters.

2.2 Performance Metrics

The performance of the Bloom filter and its many variants is judged based on three criteria. The first one is the processing overhead, which includes the number of memory accesses and the number of hash operations. The overhead limits the highest throughput that the data structure can support. Because both SRAM and the hash function circuit may be shared among different network functions, it is important for each function to minimize its processing overhead in order to achieve good system performance. The second performance criterion is the space requirement. A smaller space requirement reduces the usage of SRAM, saving space for richer online functions. If the data structure is mainly transferred via the network, then we can call it the bandwidth requirement. The third criterion is the correctness. For example, a Bloom filter may mistakenly claim a non-member to be a member due to its lossy encoding method; the probability of this error is called the false positive ratio. As another example, some variants of Bloom filters may introduce false negatives, which means a member is mistakenly claimed as a non-member. There is a tradeoff between the space requirement and the correctness. We can improve the latter by allocating more memory. In summary, the performance metrics we consider include:

- Processing Overhead:
  - Memory Access: number of memory accesses for each membership query.
  - Hash Complexity: number of hash bits required for each membership query.
- Space Requirement:
  - If the data structure is stored in memory, how much memory space it takes.
  - If the data structure is passed as a message, what the message size is.
- Correctness:
  - False Positive Ratio: probability for a non-member being considered as a member.
  - False Negative Ratio: probability for a member being claimed as a non-member.
  - Conflict Classification: probability for a multi-set membership query to return multiple possible set-IDs for a member.
  - Mis-Classification Ratio: probability for a member of a multi-set being recognized as a non-member.

2.3 Related Works

Since the Bloom filter was first proposed in 1970 [7], many variants have emerged to meet different requirements, such as supporting set updates and improving the space-efficiency, memory access efficiency, false positive ratio, and hash complexity of the Bloom filter. A comprehensive survey on Bloom filter variants and their applications can be found in [153]. Other Bloom filter variants were proposed to solve specific problems, e.g. representing multi-sets or sets with multi-attribute members [59 161].

In the following, we introduce prior works that are related to this dissertation. We divide these works into four categories. The first category aims to improve the space, hash, accuracy, and memory access efficiency of traditional Bloom filters. The second category includes variants of Bloom filters that allow element updates or deletions. The third category involves the use of the Bloom filter in the distributed join problem. The last category deals with multi-set membership lookup. Application-specific (i.e. RFID, VP2P network) related works will be further discussed in the corresponding chapters.

2.3.1 Better Bloom Filter

2.3.1.1 Improving Space Efficiency

Some Bloom filter variants aim to improve the space-efficiency [111 121 128]. It has been proven that the Bloom filter is not space-optimal [12 93 121]. Pagh et al. designed a new RAM data structure whose space usage is within a lower order term of the optimal value [121]. Porat designed a dictionary data structure that maps keys to values with optimal space [128]. Mitzenmacher proposed the compressed Bloom filter [111], which is suitable for message passing, for example, during web cache sharing. The idea is to use a large, sparse Bloom filter at the sender/receiver, and compress/decompress the filter before/after transmission. We will cover this work in more detail in Chapter 5.

2.3.1.2 Reducing False Positive Ratio

Reducing the false positive ratio of the Bloom filter is another subject of research. Lumetta and Mitzenmacher [99] proposed to use two choices (i.e. two sets of hash functions) for each element encoded in the Bloom filter. The idea is to pick the set of hash functions that produces the fewest ones, so as to reduce the false positive ratio. This approach generates fewer false positives than the Bloom filter when the load factor is small, but it requires twice as many hash bits and twice as many memory accesses as the Bloom filter does.

Hao et al. use partitioned hashing to divide members into several groups, each group with a different set of hash functions [46]. Heuristic algorithms are used to find the best partition. However, this scheme requires the keys of member elements to compute the final partition. Therefore, it only works well when members are inserted all at once and membership queries are performed afterwards.

Tabataba and Hashemi proposed the Dual Bloom Filter to improve the false positive ratio of a single Bloom filter [151]. The idea is to use two equal-sized Bloom filters to encode the set with different sets of hash functions. An element is considered a member only when both filters produce positives. In order to reduce the space used for storing random numbers for different hash functions, the second Bloom filter encodes the ones' complement values of the elements.

Lim et al. use two cross-checking Bloom filters together with a traditional Bloom filter to reduce the false positives [92]. The scheme first divides the set S to be encoded into two disjoint subsets A and B ($S = A \cup B$, $A \cap B = \emptyset$). Three Bloom filters are then used to encode S, A, and B respectively, denoted as F(S), F(A), and F(B). Different hash functions are used to provide cross checking. All three filters are checked during membership lookup. Only when F(S) gives a positive result, while F(A) and F(B) do not both give negative results, does it claim the element as a member. Their evaluation shows that it improves the false positive ratio of Bloom filters of the same size by orders of magnitude.

2.3.1.3 Improving Read/Write Efficiency

Other works try to improve the read/write efficiency of the Bloom filter. This category is most related to our work in [132 133]. Zhu et al. proposed the hierarchical Bloom filter, which builds a small, less accurate Bloom filter over the large, accurate filter to reduce the number of queries to the large filter [178]. As the small filter has better locality, the total number of memory accesses is reduced. Their method is suitable for scenarios where access to the large filter is costly and element queries are not uniformly distributed [178].

The Partitioned Bloom filter [65 114] can also improve the read/write efficiency. It divides the memory into k segments, and k hash functions are used to map an element to one bit in each segment. If the segments reside in different memory banks, reads/writes of the membership bits can proceed in parallel, so that the overall read/write efficiency is improved.

Kim et al. proposed a parallelised Bloom filter design to reduce the power consumption and increase the computation throughput of Bloom filters [68]. The idea is to transform the multiple hash functions of Bloom filters into multiple stages. When the membership bit from an earlier hash function is `0', the query string does not progress to the next stage.

Chen et al. proposed a new Bloom filter variant that allocates two membership bits to each memory block [30]. This reduces the total number of memory accesses by half. It is equivalent to the Bloom-k/2 filter in our Bloom-g family. Our work can be interpreted as a generalization of this work. However, that paper did not push the analysis to the extreme: k/2 memory blocks are accessed for each membership query, which is not necessary, as our simulation shows. Besides, that work does not thoroughly study the trade-offs between block sizes, number of hash bits, and false positive ratio.

Canim et al. proposed a two-layer buffered Bloom filter to represent a large set stored in flash memory (a.k.a. Solid State Storage, SSD) [17]. The design considers several important characteristics of flash memory: (1) Data is read and written in pages. (2) A page write is slower than a page read, and data blocks are erased first before they are updated (the in-place update problem). (3) Each cell allows a limited number of erase operations in the flash memory life cycle. Therefore, it is important to reduce the number of writes to flash memory. In the design of [17], the filter layer in flash consists of several sub-filters, each with the size of a page; while the buffer layer in RAM buffers the read

PAGE 24

andwriteoperationstoeachsub-lter,andappliestheminbulkwhenabufferisfull.ThebufferedBloomlteradoptsanoff-linemodel,whereelementinsertionsandqueriesaredeferredandhandledinbatch.Debnathetal.adoptedanonlinemodelintheirdesignoftheBloomFlash,whichisalsoasetrepresentativestoredinashmemory[ 38 ].Theideaistoreducerandomwriteoperationsasmuchaspossible.First,bitupdatesarebufferedinRAMsothatupdatestothesamepagearehandledatonce.Second,theBloomlterissegmentedtomanysub-lters,witheachsub-lteroccupyingoneashpage.Foreachelementinsertion/query,anadditionalhashfunctionisevokedtochoosewhichsub-ltertoaccess.Intheirdesign,thesizeofeachsub-lterisxedtothesizeofaashpage,typically2KBor4KB[ 38 ].Comparingtoourwork,theydidnotstudyamoregenericcasewherethememoryblocksizeis32bits,64bits,ormore,orhowtoremedythefalsepositiveratioinducedbyimbalanceddistributionofmembershipbits.Luetal.proposedaForest-structuredBloomlterfordynamicsetinashstorage[ 94 ].TheproposeddatastructureresidespartiallyinashandpartiallyinRAM.Itpartitionsashspaceintoacollectionofash-pagesizedsub-ltersandorganizesthemintoaforeststructure.Asthesetsizeisincreasing,newsub-ltersareaddedtotheforest. 2.3.1.4ReducingHashComplexityTherearealsoworksthataimtoreducethehashcomplexityoftheBloomlter.KirschandMitzenmacherhaveshownthatfortheBloomlter,onlytwohashfunctionsarenecessary.Additionalhashfunctionscanbeproducedbyasimplelinearcombinationoftheoutputoftwohashfunctions[ 65 ].Thisgivesusanefcientwaytoproducemanyhashbits,butitdoesnotreducethenumberofhashbitsrequiredbytheBloomlteroritsvariants.TherearealsootherworksthatdesignefcientyetwellrandomizedhashfunctionsthataresuitableforBloomlter-likedatastructures.Sincethistopicislessrelatedtothiswork,wedonotenumeratethem. 24
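The linear-combination idea attributed to Kirsch and Mitzenmacher above can be sketched as follows. This is a minimal illustration, not the construction from [65]: the helper names `two_base_hashes` and `derived_indices`, the use of SHA-256 as the source of the two base hashes, and the choice to force the second hash to be odd are all assumptions made here for the example.

```python
import hashlib


def two_base_hashes(key: bytes) -> tuple[int, int]:
    """Split one SHA-256 digest into two 64-bit base hash values."""
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "big")
    # Forcing h2 to be odd keeps the stride invertible modulo a power-of-two m,
    # so the k derived indices are distinct (an illustrative choice).
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return h1, h2


def derived_indices(key: bytes, k: int, m: int) -> list[int]:
    """Return k bit positions in [0, m) of the form h1 + i * h2 (mod m)."""
    h1, h2 = two_base_hashes(key)
    return [(h1 + i * h2) % m for i in range(k)]


if __name__ == "__main__":
    # Five bit positions for one key in a 2^20-bit filter.
    print(derived_indices(b"10.0.0.1:443", k=5, m=1 << 20))
```

Only two base hash values are computed per key, yet the number of index bits consumed by the filter is unchanged, which matches the remark above that this technique does not reduce the number of hash bits required.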


2.3.2 Bloom Filter for Dynamic Set

The Bloom filter does not support element deletion or update well, and it requires the number of elements to be known a priori in order to optimize the number of hash functions. Almeida et al. proposed the Scalable Bloom Filters for dynamic sets [3], which create a new Bloom filter when the current Bloom filters are full. A membership query checks all the filters. Each successive Bloom filter has a tighter maximum false positive probability on a geometric progression, so that the compounded probability over the whole series converges to some wanted value. However, this approach uses a lot of space and does not support deletion.

The counting Bloom filter (CBF) [82] addresses the element deletion problem by replacing the bit array with a counter array. Insertions (deletions) to a CBF are done by incrementing (decrementing) the counters by 1. A membership query checks whether the counters are positive. The CBF and its variants (e.g., [54, 137]) require several times the space used by a standard Bloom filter, depending on how many bits each counter takes.

Rottenstreich et al. proposed the Variable-Increment Counting Bloom Filter (VI-CBF) [137] to improve the space efficiency of the CBF. For each element insertion, counters are updated by a hashed variable increment instead of a unit increment. Then, during a query, the exact value of a counter is considered, not just whether it is positive. Due to possible counter overflow, CBF and VI-CBF both introduce false negatives [53].

The Bloom filter with variable-length signatures (VBF) proposed by Lu et al. also enables element deletion [98]. Instead of setting $k$ membership bits, a VBF sets $t$ ($t \le k$) bits to one to encode an element. To check the membership of an element, if $q$ ($q \le t \le k$) membership bits are ones, it claims that the element is a member. Deletion is done by setting several membership bits to zero such that the remaining ones are fewer than $q$. The VBF also has false negatives.
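The counter-array idea behind the counting Bloom filter can be illustrated with a short sketch. It is not the implementation from [82]: the class name, the salted SHA-256 hashing, and the use of unbounded Python integers as counters (so counter overflow never occurs here) are assumptions made for this example.

```python
import hashlib


class CountingBloomFilter:
    """Toy counting Bloom filter: an array of m counters and k hash functions."""

    def __init__(self, m: int = 1024, k: int = 3):
        self.m, self.k = m, k
        # Python ints never overflow; hardware CBFs use a few bits per counter.
        self.counters = [0] * m

    def _indices(self, key: bytes) -> list[int]:
        # One salted SHA-256 call per hash function (a hypothetical choice).
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + key).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def insert(self, key: bytes) -> None:
        for i in self._indices(key):
            self.counters[i] += 1

    def delete(self, key: bytes) -> None:
        # Only meaningful for keys that were inserted; deleting a non-member can
        # zero a counter shared with a real member and cause false negatives.
        for i in self._indices(key):
            if self.counters[i] > 0:
                self.counters[i] -= 1

    def query(self, key: bytes) -> bool:
        return all(self.counters[i] > 0 for i in self._indices(key))


if __name__ == "__main__":
    cbf = CountingBloomFilter()
    cbf.insert(b"alice")
    print(cbf.query(b"alice"))  # a member is always reported as present
    cbf.delete(b"alice")
    print(cbf.query(b"alice"))  # all counters are back to zero
```

The space cost mentioned above is visible in the sketch: every position that a plain Bloom filter stores as one bit is now a multi-bit counter.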


Rothenberg et al. achieved false-negative-free deletions by deleting elements probabilistically [136]. Their proposed Deletable Bloom filter divides the standard Bloom filter into $r$ regions. It uses a bitmap of size $r$ to indicate whether there is any bit collision (i.e., two members setting the same bit to '1') among member elements in each region. When trying to delete an element, it only resets the bits that are located in collision-free regions. An element can therefore be deleted successfully only with some probability, namely when at least one of its membership bits is located in a collision-free region. As the load factor increases, the probability of successful deletion decreases. Also, as elements are inserted and deleted, the deletion probability still drops even if the load factor remains the same. This is because the bitmap no longer reflects the real bit collisions once the elements that previously caused a collision have been deleted, and there is no way to reset bits in the bitmap back to '0'.

In some applications, new data is more meaningful than old data, so some data structures delete old data in a first-in-first-out manner. For example, Chang et al. proposed an aging scheme called double buffering where two large buffers are used alternately [21]. In this scheme, the memory space is divided into two equal-sized Bloom filters called the active filter and the warm-up filter. The warm-up filter is a subset of the active filter, which consists only of the elements that appeared after the active filter became more than half full. When the active filter is full, it is cleared out, and the two filters exchange roles. Each time the active filter is cleared out, a number of old data items are deleted, the size of which is half the capacity of the Bloom filter.

Yoon proposed an active-active scheme to better utilize the memory space of double buffering [165]. The idea is to insert new elements into one filter (instead of possibly two in double buffering) and query both filters. When the filter that elements are currently being inserted into becomes full, the other filter is cleared out and the filters switch roles. It can store more data with the same memory size and tolerable false positive rate compared to double buffering.


Deng and Rafiei proposed the Stable Bloom filter to detect duplicates in an infinite input data stream [39]. Similar to the Counting Bloom filter, a Stable Bloom filter is an array of counters. When an element is inserted, the corresponding counters are set to the maximum value. The false positive ratio depends on the ratio of zeros, which is a fixed value for a Stable Bloom filter. It achieves this by decrementing some random counters by one whenever a new element is inserted.

Bonomi et al. proposed time-based deletion [8] with a flag associated with each bit in the filter. If a member is accessed, the flags of its corresponding bits are set to '1'. At the end of each period, bits with unset flags are reset to zeros, and then all flags are reset to '0' to prepare for the next period. Elements that are not accessed during a period are consequently deleted. However, a deletion takes effect only after a full time period has passed.

2.3.3 Bloom Filter for Distributed Join

Bloom filters are widely used in distributed systems. One application is to efficiently process database joins in a distributed setting. Mackert and Lohman proposed Bloomjoin, which sends a Bloom filter to filter out a large portion of unneeded data before transferring real data [104]. Bloomjoin largely reduces the communication cost for distributively joining two sets.

Mullin suggests using a partitioned Bloom filter (PBF) to encode the data and sending one segment at a time [114]. When the data size of the filtered-out elements is smaller than the size of a PBF segment, it stops sending PBF segments and sends the remaining elements instead. The advantage of this approach over Bloomjoin is that it does not need a false positive ratio a priori, so that the $k$ that is actually used is closer to the optimal value.

Michael et al. study the intersection of multiple sets [109]. They optimize the join sequence and cache the Bloom filters of key indices in block-partitioned Bloom filters so that rehashing is not necessary.
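The basic Bloomjoin exchange described above can be sketched as a three-step protocol between two sites. This is only an illustration of the idea, not the algorithm from [104]: the `BloomFilter` class, the `bloomjoin` function, the salted SHA-256 hashing, and the default sizes `m = 4096` and `k = 3` are assumptions made for this example, and the "network" is simulated by passing Python objects.

```python
import hashlib


class BloomFilter:
    """Plain m-bit Bloom filter with k salted SHA-256 hash functions."""

    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray((m + 7) // 8)

    def _indices(self, key: str) -> list[int]:
        data = key.encode()
        return [
            int.from_bytes(hashlib.sha256(bytes([i]) + data).digest()[:8], "big") % self.m
            for i in range(self.k)
        ]

    def add(self, key: str) -> None:
        for i in self._indices(key):
            self.bits[i // 8] |= 1 << (i % 8)

    def __contains__(self, key: str) -> bool:
        return all((self.bits[i // 8] >> (i % 8)) & 1 for i in self._indices(key))


def bloomjoin(keys_a: set[str], keys_b: set[str], m: int = 4096, k: int = 3):
    """Return (exact common keys, candidate keys that site B had to ship)."""
    # Step 1: site A encodes its join keys and "sends" the m-bit filter to site B.
    filt = BloomFilter(m, k)
    for key in keys_a:
        filt.add(key)
    # Step 2: site B ships only the keys that pass the filter (false positives included).
    candidates = {key for key in keys_b if key in filt}
    # Step 3: site A removes the false positives with an exact comparison.
    return keys_a & candidates, candidates


if __name__ == "__main__":
    a = {f"k{i}" for i in range(100)}
    b = {f"k{i}" for i in range(50, 400)}
    common, shipped = bloomjoin(a, b)
    print(len(common), len(shipped), len(b))
```

Because a Bloom filter has no false negatives, the shipped candidates always contain the true intersection; the saving comes from the keys of site B that never leave the site.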


Ramesh et al. investigate the join problem of multiple sites and propose four Bloomjoin extensions: result merging at participating sites, result merging at the user site, caching at a participating site, and caching at the coordinator site [134]. They also analyze each extension and construct a query optimizer for selecting the best extension for each query.

[79] and [127] study the use of Bloom filters in distributed join with MapReduce. Phan et al. use Bloom filters and partitioned Bloom filters for both set intersection and set join in MapReduce to save communication cost [127]. Zhang et al. investigate the use of Graphics Processing Units (GPU) in generating the intersection of two lists [171]. To the best of our knowledge, the idea of using Bloom filters iteratively with decreasing size and global optimization is underexplored.

2.3.4 Bloom Filter for Multi-Set

A multi-set is defined as a virtual set that consists of several disjoint sets. To encode a multi-set, we may use one separate Bloom filter to encode each set [21, 56]. However, this method has two problems: First, it is hard to allocate a proper amount of memory for each filter if the sizes of the sets are unknown. Second, it makes too many memory accesses, as the number of filters required increases with the number of sets.

Coded Bloom filters [21, 98] are proposed to significantly cut the number of Bloom filters by encoding the set IDs as binary codes. Each bit of the code is represented by a Bloom filter. However, they are still sensitive to the size of each set, meaning that some filters may be overloaded while others are underloaded.

The combinatorial Bloom filter (COMB) proposed by Hao et al. uses hash function sets instead of separate filters to encode set IDs [57]. It assigns each set $S_i$ a code $c_i$ that has a fixed number of ones. For instance, if the code length is 20 bits and the number of ones is 5, there will be $\binom{20}{5} = 15{,}504$ different valid codes, which can represent as many set IDs. An example of such a code is 00010011000010000001. For each bit in the code, COMB generates a group of $k$ hash functions. To encode a member in $S_i$, COMB examines $c_i$ bit by bit. For each bit of value 1, it uses the corresponding group of hash functions to set $k$ bits in the filter to ones. To look up an element, COMB uses all groups of hash functions, each group corresponding to one bit in the code. For each hash group, COMB performs a classical Bloom filter lookup: hash the element to $k$ bits in the filter and see if they are all set. If so, the corresponding bit in the code is 1; otherwise, the bit is 0. Again, the value of $k$ determines the probability of a false positive (mistakenly concluding that a bit is 1 although it should be 0). After knowing all bits in the code, COMB translates the code back to a set ID.

Another related work is the Bloomier filter [22]. It extends each bit in the Bloom filter to a multi-bit entry, which can store a value (set ID). To store the set ID $S_e$ of an element $e$, it hashes $e$ to $k$ entries in the Bloomier filter, and makes sure that the XOR of these entries is equal to $S_e$. To perform a lookup on $e$, we simply hash $e$ to its $k$ entries, and their XOR gives its set ID. Once an entry in the filter is used by an element, its value cannot be changed. When we insert a new element $e'$ that belongs to set $S'_e$, after we hash $e'$ to $k$ entries, at least one entry must be unused in order to encode $e'$. If there is an unused entry, we can set its value such that the XOR of the $k$ entries is equal to $S'_e$. However, if all $k$ entries are already used, the element cannot be inserted into the filter. To handle insertion failures, a separate data structure or infrastructure such as TCAM is used.

Goodrich and Mitzenmacher proposed the Invertible Bloom Lookup Table (IBLT) to encode both keys and values of elements [52]. An IBLT uses a cell (including a counter, a KeySum, and a ValueSum) to replace each bit of a Bloom filter. An element is mapped to $k$ cells just as in the Bloom filter. The counter stores how many elements are mapped to each cell, while KeySum (ValueSum) stores the sum of the keys (values) mapped to the cell. During the membership lookup of an element $e$, if any mapped cell has a counter of '0', or any mapped cell has a counter of '1' but KeySum $\ne e$, then $e$ is not a member. Otherwise, if there is at least one mapped cell with counter equal to '1' and KeySum $= e$, then $e$ is a member. Otherwise, it produces "not found", meaning that it is not sure whether $e$ is encoded or not. A great advantage of the IBLT is that it supports element deletion and enumeration of all stored key-value pairs with a bounded failure ratio. However, it needs to store the keys of elements in the table, which may take too much space when the elements to be encoded have long keys.

There are also some works focusing on encoding keys with multiple occurrences. This problem can also be modeled as a (key, value) pair, where the value is a counter indicating how many times the corresponding key appears in the set. One application example of this problem is to measure flow sizes on a router. Each flow ID is the key, and the number of packets in the flow is counted and stored as the value. The Spectral Bloom filter [35], Virtual BitMaps [84] and Counter Braids [97], for example, are proposed to solve this problem. However, these data structures cannot be directly used to solve our problem. This is because these approaches normally give an estimated value instead of the real one, which translates to an incorrect set ID. Also, [84] and [97] use an offline model instead of an online checking model.

2.4 Contributions

The contributions of this work can be summarized as follows:

- We summarize the metrics for evaluating the Bloom filter and its variants in network system scenarios. The trade-off involves space, false positive ratio, memory access, and hash bit requirement. The latter two metrics are often overlooked by existing works.
- We conduct a detailed study of one-memory-access Bloom filter designs and their generalizations, which opens the door to more design choices.
- We propose an efficient representation of multi-sets.
- We design an adaptive Bloom filter that is efficient for the distributed join operation in distributed systems.
- We discover the underlying connection between space-time efficient data structures and time-energy efficient RFID protocols. We apply the efficient Bloom filter ideas to information collection in RFID systems. Our proposed protocol consumes only $O(1)$ tag energy, compared to the best existing $O(m)$ protocols, where $m$ is the number of RFID tags from which information is collected.
- We design two efficient cardinality estimation protocols for mobile vehicular peer-to-peer networks. The protocols show good estimation results with relatively low network overhead.


CHAPTER 3
SPACE-TIME EFFICIENT BLOOM FILTERS FOR SET MEMBERSHIP LOOKUP

3.1 Background

Bloom filters are compact data structures for high-speed online membership check against large data sets [7, 12]. They have wide applications in routing-table lookup [40, 145, 146], online traffic measurement [76, 97], peer-to-peer systems [77, 135], cooperative caching [82], firewall design [103], intrusion detection [150], bioinformatics [106], database query processing [114, 158], stream computing [167], distributed storage systems [20], etc. [12, 153].

Many network functions require membership check. A firewall may be configured with a large watch list of addresses that are collected by an intrusion detection system. If the requirement is to log all packets from those addresses, the firewall must check each arriving packet to see if the source address is a member of the list. Another example is routing-table lookup. The lengths of the prefixes in a routing table range from 8 to 32. A router can extract 25 prefixes of different lengths from the destination address of an incoming packet, and it needs to determine which prefixes are in the routing tables [40]. Some traffic measurement functions require the router to collect the flow labels [96, 97], such as source/destination address pairs or address/port tuples that identify TCP flows. Each flow label should be collected only once. When a new packet arrives, the router must check whether the flow label extracted from the packet belongs to the set that has already been collected. As a last example for the membership check problem, we consider the context-based access control (CBAC) function in Cisco routers [47]. When a router receives a packet, it may want to first determine whether the addresses/ports in the packet have a matching entry in the CBAC table before performing the CBAC lookup.

In all of the previous examples, we face the same fundamental problem: For a large data set, which may be an address list, an address prefix table, a flow label set, a CBAC table, or other types of data, we want to check whether a given element belongs to this set or not. If there is no performance requirement, this problem can be easily solved using textbook data structures such as binary search [58] (which stores the set in a sorted array and uses binary search for membership check), or a traditional hash table [41] (which uses linked lists to resolve hash collisions). However, these approaches are inadequate if there are stringent speed and memory requirements.

Modern high-end routers and firewalls implement their per-packet operations mostly in hardware. They are able to forward each packet in a couple of clock cycles. To keep up with such high throughput, many network functions that involve per-packet processing also have to be implemented in hardware. However, they cannot store the data structures for membership check in DRAM because the bandwidth and delay of DRAM access cannot match the packet throughput at line speed. Consequently, the recent research trend is to implement membership check in the high-speed on-die cache memory, which is typically SRAM. The SRAM is however small and must be shared among many online functions. This prevents us from storing a large data set directly in the form of a sorted array or a hash table.

A Bloom filter [7] is a bit array that encodes the membership of data elements in a set. Each member in the set is hashed to $k$ bits in the array at random locations, and these bits are set to ones. To query for the membership of a given element, we also hash it to $k$ bits in the array and see if these bits are all ones.

The performance of the Bloom filter and its many variants is judged based on three criteria: The first one is the processing overhead, which is $k$ memory accesses and $k$ hash operations for each membership query. The overhead (in the rest of this chapter, we use "processing overhead" and "overhead" interchangeably without further notification) limits the highest throughput that the filter can support. Because both SRAM and the hash function circuit may be shared among different network functions, it is important for them to minimize their processing overhead in order to achieve good system performance. The second performance criterion is the space requirement. Minimizing the space requirement to encode each member allows a network function to fit a large set in the limited SRAM space for membership check. The third criterion is the false positive ratio. A Bloom filter may mistakenly claim a non-member to be a member due to its lossy encoding method. There is a tradeoff between the space requirement and the false positive ratio. We can reduce the latter by allocating more memory.

Given the fact that Bloom filters have been applied so extensively in network research, any improvement to their performance can potentially have a broad impact. We study a data structure, called Bloom-1, which makes just one memory access to perform membership check. Therefore, it can be configured to outperform the commonly-used Bloom filter with constant $k$. We point out that, due to its high overhead, the traditional Bloom filter is not practical when the optimal value of $k$ is used to achieve a low false positive ratio. We generalize Bloom-1 to Bloom-g, which allows $g$ memory accesses. We show that they can achieve the low false positive ratio of the Bloom filter with optimal $k$, without incurring the same kind of high overhead. We further generalize Bloom-1 to Bloom-, which achieves better performance with a very small increase in overhead. We perform a thorough analysis to reveal the properties of this family of filters. We discuss how they can be applied for static or dynamic data sets. We also conduct experiments based on a real traffic trace to study the performance of the new filters.

3.2 Bloom-1: One Memory Access Bloom Filter

3.2.1 Bloom Filter

A Bloom filter is a space-efficient data structure for membership check. It includes an array $B$ of $m$ bits, which are initialized to zeros. The array stores the membership information of a set as follows: Each member $e$ of the set is mapped to $k$ bits that are randomly selected from $B$ through $k$ hash functions, $H_i(e)$, $1 \le i \le k$, whose range is $[0, m-1]$. (The hash functions can be any random functions as long as the $k$ functions are different, i.e., $H_i \ne H_j$ for $i \ne j \in [1, k]$. However, hash function outputs are random and two hash results may happen to be the same, i.e., $\exists\, i, j, e$, $i \ne j$, $H_i(e) = H_j(e)$.) To encode the membership information of $e$, the bits $B[H_1(e)], \ldots, B[H_k(e)]$ are set to ones. These are called the membership bits in $B$ for the element $e$. Some frequently-used notations in this chapter can be found in Table 3-1.

To check the membership of an arbitrary element $e'$, if the $k$ bits, $B[H_i(e')]$, $1 \le i \le k$, are all ones, $e'$ is considered to be a member of the set. Otherwise, it is not a member.

We can treat the $k$ hash functions logically as a single one that produces $k \log_2 m$ hash bits. For example, suppose $m$ is $2^{20}$, $k = 3$, and a hash routine outputs 64 bits. We can extract three 20-bit segments from the first 60 bits of a single hash output and use them to locate three bits in $B$. Hence, from now on, instead of specifying the number of hash functions required by a filter, we will state the number of hash bits that are needed, which is denoted as $h$.

A Bloom filter doesn't have false negatives, meaning that if it answers that an element is not in the set, it is truly not in the set. The filter however has false positives, meaning that if it answers that an element is in the set, it may not really be in the set. According to [7] and [12], the false positive ratio $f_B$, which is the probability of mistakenly treating a non-member as a member, is approximately as follows (a more precise analysis can be found in [31]; however, when $m$ is large enough ($> 1024$), the approximation error can be neglected [31]):

$$ f_B = \left(1 - \left(1 - \frac{1}{m}\right)^{nk}\right)^{k} \approx \left(1 - e^{-\frac{nk}{m}}\right)^{k}, \qquad (3) $$

where $n$ is the number of members in the set. Obviously, the false positive ratio decreases as $m$ increases, and increases as $n$ increases. The optimal value of $k$ (denoted as $k^*$) that minimizes the false positive ratio can be derived by taking the first-order derivative of (3) with respect to $k$, setting it to zero, and solving the equation. The result is

$$ k^* = \ln 2 \cdot \frac{m}{n} \approx 0.7\, \frac{m}{n}. \qquad (3) $$

The optimal $k$ can sometimes be very large. To avoid too many memory accesses, we may also set $k$ to a small constant in practice.

Table 3-1. Notations in Chapter 3

Symbol                        Meaning
----------------------------  ------------------------------------------------------------
$n$                           number of members in a set
$B$ or $B1$                   bit array
$m$                           number of bits in the bit array $B$ or $B1$
$k$                           number of membership bits for each element
$h$                           number of hash bits for locating all membership bits
$k^*$                         the optimal value of $k$ that minimizes the false positive ratio of a Bloom filter
$k_1^*$                       the optimal value of $k$ that minimizes the false positive ratio of a Bloom-1 filter
$f_B$, $f_{B1}$, $f_{Bg}$     false positive ratios of a Bloom filter, a Bloom-1 filter, and a Bloom-g filter, respectively
$l$                           number of words in a bit array
$w$                           number of bits in a word, $m = l \times w$
$B(k = 3)$                    Bloom filter that uses three bits to encode each member
$B1(k = 3)$                   Bloom-1 filter that uses three bits to encode each member
$B1(h = 3 \log_2 m)$          Bloom-1 filter that uses up to $3 \log_2 m$ hash bits
$B(\text{optimal } k)$        Bloom filter that uses the optimal number $k^*$ of bits to encode each member to minimize its false positive ratio
$B1(\text{optimal } k)$       Bloom-1 filter that uses the optimal number $k_1^*$ of bits to encode each member to minimize its false positive ratio

3.2.2 Bloom-1 Filter

To check the membership of an element, a Bloom filter requires $k$ memory accesses. We introduce the Bloom-1 filter, which requires one memory access for membership check. The basic idea is that, instead of mapping an element to $k$ bits randomly selected from the entire bit array, we map it to $k$ bits in a word that is randomly selected from the bit array. A word is defined as a continuous block of bits that can be fetched from the memory to the processor in one memory access. In today's computer architectures, most general-purpose processors fetch words of 32 bits or 64 bits. Specifically designed hardware may access words of 72 bits or longer.

A Bloom-1 filter is an array $B1$ of $l$ words, each of which is $w$ bits long. The total number $m$ of bits is $l \times w$. To encode a member $e$ during the filter setup, we first obtain a number of hash bits from $e$, and use $\log_2 l$ hash bits to map $e$ to a word in $B1$. It is called the membership word of $e$ in the Bloom-1 filter. We then use $k \log_2 w$ hash bits to further map $e$ to $k$ membership bits in the word and set them to ones. The total number of hash bits that are needed is $\log_2 l + k \log_2 w$. Suppose $m = 2^{20}$, $k = 3$, $w = 2^6$, and $l = 2^{14}$. Only 32 hash bits are needed, fewer than the 60 hash bits required in the previous Bloom filter example under similar parameters.

To check if an element $e'$ is a member of the set that is encoded in a Bloom-1 filter, we first perform hash operations on $e'$ to obtain $\log_2 l + k \log_2 w$ hash bits. We use $\log_2 l$ bits to locate its membership word in $B1$, and then use $k \log_2 w$ bits to identify the membership bits in the word. If all membership bits are ones, it is considered to be a member. Otherwise, it is not.

The change from using $k$ random bits in the array to using $k$ random bits in a word may appear simple, but it is also fundamental. An important question is how it will affect the false positive ratio and the processing overhead. A more interesting question is how it will open up new design space to configure various new filters with different performance properties. This is what we will investigate in depth.

The false negative ratio of a Bloom-1 filter is also zero. The false positive ratio $f_{B1}$ of Bloom-1, which is the probability of mistakenly treating a non-member as a member, is derived as follows: Let $F$ be the false positive event that a non-member $e'$ is mistaken for a member. The element $e'$ is hashed to a membership word. Let $X$ be the random variable for the number of members that are mapped to the same membership word. Let $x$ be a constant in the range $[0, n]$, where $n$ is the number of members in the set. Assume we use fully random hash functions. When $X = x$, the conditional probability for $F$ to occur is

$$ \mathrm{Prob}\{F \mid X = x\} = \left(1 - \left(1 - \frac{1}{w}\right)^{xk}\right)^{k}. \qquad (3) $$

Obviously, $X$ follows the binomial distribution, $\mathrm{Bino}(n, \frac{1}{l})$, because each of the $n$ elements may be mapped to any of the $l$ words with equal probability. Hence,

$$ \mathrm{Prob}\{X = x\} = \binom{n}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{n-x}, \quad \forall\, 0 \le x \le n. \qquad (3) $$

Therefore, the false positive ratio can be written as

$$ f_{B1} = \mathrm{Prob}\{F\} = \sum_{x=0}^{n} \mathrm{Prob}\{X = x\} \cdot \mathrm{Prob}\{F \mid X = x\} = \sum_{x=0}^{n} \binom{n}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{n-x} \left(1 - \left(1 - \frac{1}{w}\right)^{xk}\right)^{k}. \qquad (3) $$

3.2.3 Impact of Word Size

We first investigate the impact of the word size $w$ on the performance of a Bloom-1 filter. If $n$, $l$ and $k$ are known, we can obtain the optimal word size that minimizes (3). However, in reality, we can only decide the amount of memory (i.e., $m$) to be used for a filter, but cannot choose the word size once the hardware is installed. In the left plot of Figure 3-1, we compute the false positive ratios of Bloom-1 under four word sizes: 32, 64, 72, and 256 bits, when the total amount of memory is fixed at $m = 2^{20}$ and $k$ is set to 3. Note that the number of words, $l = \frac{m}{w}$, is inversely proportional to the word size. The horizontal axis in the figure is the load factor, $\frac{n}{m}$, which is the number of members stored by the filter divided by the number of bits in the filter. Since most applications require relatively small false positive ratios, we zoom in on the load-factor range of $[0, 0.2]$ for a detailed look in the right plot of Figure 3-1. The computation results based on (3) show that a larger word size helps to reduce the false positive ratio. In that case, we should simply set $w = m$ for the lowest false positive ratio. However, in practice, $w$ is given by the hardware and is not a configurable parameter. Without loss of generality, we choose $w = 64$ in our computations and simulations for the rest of this chapter.

Figure 3-1. False positive ratio with respect to word size $w$. Left plot: false positive ratios for Bloom-1 under different word sizes. Right plot: magnified false positive ratios for Bloom-1 under different word sizes.

3.2.4 Bloom-1 vs. Bloom with Small k

Although the optimal $k$ always yields the best false positive ratio of the Bloom filter, a small value of $k$ is sometimes preferred to bound the processing overhead in terms of memory accesses and hash operations. We compare the performance of the Bloom-1 filter and the Bloom filter in both scenarios. In this section, we use the Bloom filter with $k = 3$ as the benchmark. Using $k = 4, 5, \ldots$ produces quantitatively different results, but the qualitative conclusion will stay the same.

We compare the performance of three types of filters: (1) $B(k = 3)$, which represents a Bloom filter that uses three bits to encode each member; (2) $B1(k = 3)$, which represents a Bloom-1 filter that uses three bits to encode each member; (3) $B1(h = 3 \log_2 m)$, which represents a Bloom-1 filter that uses the same number of hash bits as $B(k = 3)$ does.
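The false positive formulas for the Bloom filter and the Bloom-1 filter given in Sections 3.2.1 and 3.2.2 can be evaluated numerically with a few lines of code. The following is a sketch under stated assumptions: the function names `fp_bloom` and `fp_bloom1`, the log-gamma evaluation of the binomial probabilities, and the early cut-off of the negligible upper tail are choices made here for illustration and are not taken from the dissertation. The Bloom-1 function assumes `w < m` so that there is more than one word.

```python
import math


def fp_bloom(n: int, m: int, k: int) -> float:
    """Bloom filter false positive ratio, (1 - (1 - 1/m)^(nk))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (n * k)) ** k


def fp_bloom1(n: int, m: int, k: int, w: int) -> float:
    """Bloom-1 false positive ratio: binomial mixture over the word occupancy X."""
    l = m // w  # number of words; requires w < m
    log_p, log_q = math.log(1.0 / l), math.log1p(-1.0 / l)
    mean = n / l
    total = 0.0
    for x in range(n + 1):
        # log of C(n, x) * (1/l)^x * (1 - 1/l)^(n - x)
        log_pmf = (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * log_p + (n - x) * log_q)
        pmf = math.exp(log_pmf)
        if x > mean and pmf < 1e-18:
            break  # the remaining upper tail is numerically negligible
        total += pmf * (1.0 - (1.0 - 1.0 / w) ** (x * k)) ** k
    return total


if __name__ == "__main__":
    m, k = 1 << 20, 3
    for load in (0.05, 0.10, 0.20):
        n = int(load * m)
        row = [fp_bloom1(n, m, k, w) for w in (32, 64, 72, 256)]
        print(load, fp_bloom(n, m, k), row)
```

Running the script prints, for each load factor, the Bloom filter ratio followed by the Bloom-1 ratios for word sizes 32, 64, 72, and 256, which are the four word sizes considered in Section 3.2.3.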


Table3-2. QueryoverheadcomparisonofBloom-1ltersandBloomlterwithk=3. DataStructureNumberofNumberofHashBitsMemoryAccessm=216m=220m=224 B(k=3)3486072B1(k=3)1283236B1(h=3log2m)1162024 Letk1betheoptimalvalueofkthatminimizesthefalsepositiveratiooftheBloom-1lterin( 3 ).ForB1(h=3log2m),weareallowedtouse3log2mhashbits,whichcanencode3ormoremembershipbitsinthelter.However,itisnotnecessarytousemorethantheoptimalnumberk1ofmembershipbits.Hence,B1(h=3log2m)actuallyusesthekthatisclosesttok1toencodeeachmember.Table 3-2 presentsnumericalresultsofthenumberofmemoryaccessesandthenumberofhashbitsneededbythethreeltersforeachmembershipquery.Theytogetherrepresentthequeryoverheadandcontrolthequerythroughput.First,wecompareB(k=3)andB1(k=3).TheBloom-1ltersavesnotonlymemoryaccessesbutalsohashbits.Forexample,intheseexamples,B1(k=3)requiresabouthalfofthehashbitsneededbyB(k=3).Whenthehashroutineisimplementedinhardware(suchasCRC[ 46 ]),thememoryaccessmaybecometheperformancebottleneck,particularlywhenthelter'sbitarrayislocatedoff-chipor,evenifitison-chip,thebandwidthofthecachememoryissharedbyothersystemcomponents.Inthiscase,thequerythroughputofB1(k=3)willbethreetimesofthethroughputofB(k=3).Next,weconsiderB1(h=3log2m).Eventhoughitstillmakesonememoryaccesstofetchaword,theprocessormaycheckmorethan3bitsinthewordforamembershipquery.Iftheoperationsofhashing,accessingmemory,andcheckingmembershipbitsarepipelinedandthememoryaccessistheperformancebottleneck,thethroughputofB1(h=3log2m)willalsobethreetimesofthethroughputofB(k=3).Finally,wecomparethefalsepositiveratiosofthethreeltersinFigure 3-2 .ThegureshowthattheoverallperformanceofB1(k=3)iscomparabletothatofB(k=3),butitsfalsepositiveratioisworsewhentheloadfactorissmall.The 40

PAGE 41

Figure3-2. Performancecomparisonintermsoffalsepositiveratio. Figure3-3. Numberofmembershipbitsusedbythelters. reasonisthatconcentratingthemembershipbitsinonewordreducestherandomness.B1(h=3log2m)isbetterthanB(k=3)whentheloadfactorissmallerthan0.1.ThisisbecausetheBloom-1lterrequireslessnumberofhashbitsonaveragetolocateeachmembershipbitthantheBloomlterdoes.Therefore,whenavailablehashbitsarethesame,B1(h=3log2m)isabletousealargerkthanB(k=3),asshowninFigure 3-3 3.2.5Bloom-1v.s.BloomwithOptimalkWecanreducethefalsepositiveratioofaBloomlteroraBloom-1lterbychoosingtheoptimalnumberofmembershipbits.From( 3 ),wendtheoptimalvaluekthatminimizesthefalsepositiveratioofaBloomlter.From( 3 ),wecanndthe 41

PAGE 42

Figure3-4. Optimalnumberofmembershipbitswithrespecttotheloadfactor. Table3-3. QueryoverheadcomparisonofBloom-1lterandBloomlterwithoptimalnumberofmembershipbits.Parameters:m=220andw=64. a.Numberofmemoryaccessesperquery DataStructureLoadFactorn=m0.010.020.040.080.16 B(optimalk)69351794B1(optimalk)11111 b.Numberofhashbitsperquery DataStructureLoadFactorn=m0.010.020.040.080.16 B(optimalk)138070034018080B1(optimalk)8074625038 optimalvaluek1thatminimizesthefalsepositiveratioofaBloom-1lter.Thevaluesofkandk1withrespecttotheloadfactorareshowninFigure 3-4 .Whentheloadfactorislessthan0.1,k1issignicantlysmallerthank.WeuseB(optimalk)todenoteaBloomlterthatusestheoptimalnumberkofmembershipbits,andB1(optimalk)todenoteaBloom-1lterthatusestheoptimalnumberk1ofmembershipbits.Tomakethecomparisonmoreconcrete,wepresentthenumericalresultsofmemoryaccessoverheadandhashingoverheadwithrespecttotheloadfactorinTable 3-3 .Forexample,whentheloadfactoris0.04,theBloomlterrequires17 42


memory accesses and 340 hash bits to minimize its false positive ratio, whereas the Bloom-1 filter requires only 1 memory access and 62 hash bits. In practice, the load factor is determined by the application requirement on the false positive ratio. If an application requires a very small false positive ratio, it has to choose a small load factor.

Figure 3-5. False positive ratios of the Bloom filter and the Bloom-1 filter with optimal $k$.

Next, we compare the false positive ratios of $B$(optimal $k$) and $B_1$(optimal $k$) with respect to the load factor in Figure 3-5. The Bloom filter has a much lower false positive ratio than the Bloom-1 filter. On one hand, we must recognize the fact that, as shown in Table 3-3, the overhead for the Bloom filter to achieve its low false positive ratio is simply too high to be practical. On the other hand, it raises a challenge for us to improve the design of the Bloom-1 filter so that it can match the performance of the Bloom filter at much lower overhead. In the next section, we generalize the Bloom-1 filter to allow a performance-overhead tradeoff, which provides flexibility for practitioners to achieve a lower false positive ratio at the expense of modestly higher query overhead.

3.3 Bloom-g: A Generalization of Bloom-1

3.3.1 Bloom-g Filter

As a generalization of the Bloom-1 filter, a Bloom-g filter maps each member $e$ to $g$ words instead of one, and spreads its $k$ membership bits evenly in the $g$ words. More specifically, we use $g\log_2 l$ hash bits derived from $e$ to locate $g$ membership words, and


then use $k\log_2 w$ hash bits to locate $k$ membership bits. The first one or multiple words are each assigned $\lceil k/g \rceil$ membership bits, and the remaining words are each assigned $\lfloor k/g \rfloor$ bits, so that the total number of membership bits is $k$.

To check the membership of an element $e'$, we have to access $g$ words. Hence the query overhead includes $g$ memory accesses and $g\log_2 l + k\log_2 w$ hash bits.

The false negative ratio of a Bloom-g filter is zero, and the false positive ratio $f_{B_g}$ of the Bloom-g filter is derived as follows. Each member encoded in the filter randomly selects $g$ membership words. There are $n$ members. Together they select $gn$ membership words (with replacement). These words are called the encoded words. In each encoded word, $k/g$ bits are randomly selected to be set as ones during the filter setup. To simplify the analysis, we use $k/g$ instead of taking the ceiling or floor.

Now consider an arbitrary word $D$ in the array. Let $X$ be the number of times this word is selected as an encoded word during the filter setup. Assume we use fully random hash functions. When any member randomly selects a word to encode its membership, the word $D$ has a probability of $1/l$ to be selected. Hence, $X$ is a random number that follows the binomial distribution $\mathrm{Bino}(gn, 1/l)$. Let $x$ be a constant in the range $[0, gn]$.

$$\mathrm{Prob}\{X = x\} = \binom{gn}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{gn - x}. \tag{3}$$

Consider an arbitrary non-member $e'$. It is hashed to $g$ membership words. A false positive happens when its membership bits in each of the $g$ words are ones. Consider an arbitrary membership word of $e'$. Let $F$ be the event that the $k/g$ membership bits of $e'$ in this word are all ones. Suppose this word is selected $x$ times as an encoded word during the filter setup. We have the following conditional probability:

$$\mathrm{Prob}\{F \mid X = x\} = \left(1 - \left(1 - \frac{1}{w}\right)^{x k / g}\right)^{k/g}. \tag{3}$$


The probability for $F$ to happen is

$$\mathrm{Prob}\{F\} = \sum_{x=0}^{gn} \mathrm{Prob}\{X = x\} \cdot \mathrm{Prob}\{F \mid X = x\} = \sum_{x=0}^{gn} \binom{gn}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{gn - x} \left(1 - \left(1 - \frac{1}{w}\right)^{x k / g}\right)^{k/g}. \tag{3}$$

Element $e'$ has $g$ membership words. Hence, the false positive ratio is

$$f_{B_g} = \left(\mathrm{Prob}\{F\}\right)^{g} = \left[\sum_{x=0}^{gn} \binom{gn}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{gn - x} \left(1 - \left(1 - \frac{1}{w}\right)^{x k / g}\right)^{k/g}\right]^{g}. \tag{3}$$

When $g = k$, exactly one bit is set in each membership word. This special Bloom-k is identical to a Bloom filter with $k$ membership bits. (Note that Bloom-k may happen to pick the same membership word more than once. Hence, just like a Bloom filter, Bloom-k allows more than one membership bit in a word.) To prove this, we first let $g = k$, and the expression for $f_{B_g}$ in (3) becomes

$$f_{B_k} = \left[\sum_{x=0}^{kn} \binom{kn}{x} \left(\frac{1}{l}\right)^{x} \left(1 - \frac{1}{l}\right)^{kn - x} \left(1 - \left(1 - \frac{1}{w}\right)^{x}\right)\right]^{k} = \left[1 - \sum_{x=0}^{kn} \binom{kn}{x} \left(\frac{1}{l}\left(1 - \frac{1}{w}\right)\right)^{x} \left(1 - \frac{1}{l}\right)^{kn - x}\right]^{k} = \left(1 - \left(1 - \frac{1}{lw}\right)^{kn}\right)^{k}. \tag{3}$$

As $m = lw$, we have $f_{B_k} = f_B$. In other words, a Bloom-k filter is identical to a Bloom filter.

3.3.2 Bloom-g vs. Bloom with Small k

We compare the performance and overhead of the Bloom-g filters and the Bloom filter with $k = 3$. Because the overhead of Bloom-g increases with $g$, it is highly desirable to use a small value for $g$. Hence, we focus on Bloom-2 and Bloom-3 filters as typical examples in the Bloom-g family. We compare the following filters: (1) $B(k = 3)$, the Bloom filter with $k = 3$; (2) $B_2(k = 3)$, the Bloom-2 filter with $k = 3$; (3) $B_2(h = 3\log_2 m)$, the Bloom-2 filter that is


allowed to use the same number of hash bits as $B(k = 3)$ does. In this subsection, we do not consider Bloom-3 because it is equivalent to $B(k = 3)$, as we have discussed in Section 3.3.1. From (3), when $g = 2$, we can find the optimal value of $k$, denoted as $k_2$, that minimizes the false positive ratio. Similar to $B_1(h = 3\log_2 m)$, the filter $B_2(h = 3\log_2 m)$ uses the $k$ that is closest to $k_2$ under the constraint that the number of required hash bits is less than or equal to $3\log_2 m$.

Table 3-4. Query overhead comparison of Bloom-2 filter and Bloom filter with $k = 3$.

Data Structure           Number of          Number of Hash Bits
                         Memory Accesses    $m = 2^{16}$   $m = 2^{20}$   $m = 2^{24}$
$B(k = 3)$               3                  48             60             72
$B_2(k = 3)$             2                  38             46             54
$B_2(h = 3\log_2 m)$     2                  32             40             48

Table 3-4 compares the query overhead of the three filters. The Bloom filter, $B(k = 3)$, needs 3 memory accesses and $3\log_2 m$ hash bits for each membership query. The Bloom-2 filter, $B_2(k = 3)$, requires 2 memory accesses and $2\log_2 l + 3\log_2 w$ hash bits. It is easy to see that $2\log_2 l + 3\log_2 w = 3\log_2 m - \log_2 l < 3\log_2 m$. Hence, $B_2(k = 3)$ incurs fewer memory accesses and fewer hash bits than $B(k = 3)$. On the other hand, $B_2(h = 3\log_2 m)$ uses the same number of hash bits as $B(k = 3)$ does, but makes fewer memory accesses.

Figure 3-6 presents the false positive ratios of $B(k = 3)$, $B_2(k = 3)$ and $B_2(h = 3\log_2 m)$; Figure 3-7 shows the number of membership bits used by the filters. The figures show that $B(k = 3)$ and $B_2(k = 3)$ have comparable false positive ratios in the load factor range of [0.1, 1], whereas $B_2(h = 3\log_2 m)$ performs better in the load factor range of [0.00005, 1]. For example, when the load factor is 0.04, the false positive ratio of $B(k = 3)$ is $1.5 \times 10^{-3}$ and that of $B_2(k = 3)$ is $1.6 \times 10^{-3}$, while the false positive ratio of $B_2(h = 3\log_2 m)$ is $3.1 \times 10^{-4}$, about one fifth of the other two. Considering that $B_2(h = 3\log_2 m)$ uses the same number of hash bits as $B(k = 3)$ but only 2 memory


accesses per query, it is a very useful substitute for the Bloom filter to build fast and accurate data structures for membership check.

Figure 3-6. False positive ratios of Bloom filter and Bloom-2 filter.

Figure 3-7. Number of membership bits used by the filters.

3.3.3 Bloom-g vs. Bloom with Optimal k

We now compare the Bloom-g filters and the Bloom filter when they use the optimal numbers of membership bits determined from (3) and (3), respectively. We use $B$(optimal $k$) to denote a Bloom filter that uses the optimal number $k$ of membership bits to minimize the false positive ratio. We use $B_g$(optimal $k$) to denote a Bloom-g filter that uses the optimal number $k_g$ of membership bits, where $g$ = 1, 2 or 3. Figure 3-8


compares their numbers of membership bits (i.e., $k$, $k_1$, $k_2$ and $k_3$). It shows that the Bloom filter uses many more membership bits when the load factor is small.

Figure 3-8. Optimal number of membership bits with respect to the load factor.

Table 3-5. Query overhead comparison of Bloom filter and Bloom-g filter with optimal $k$. Parameters: $m = 2^{20}$ and $w = 64$.

a. Number of memory accesses per query

Data Structure          Load Factor $n/m$
                        0.01    0.02    0.04    0.08    0.16
$B$(optimal $k$)        69      35      17      9       4
$B_1$(optimal $k$)      1       1       1       1       1
$B_2$(optimal $k$)      2       2       2       2       2
$B_3$(optimal $k$)      3       3       3       3       3

b. Number of hash bits per query

Data Structure          Load Factor $n/m$
                        0.01    0.02    0.04    0.08    0.16
$B$(optimal $k$)        1380    700     340     180     80
$B_1$(optimal $k$)      80      74      62      50      38
$B_2$(optimal $k$)      142     118     94      70      52
$B_3$(optimal $k$)      198     162     126     90      66

Next we compare the filters in terms of query overhead. For $1 \le g \le 3$, $B_g$(optimal $k$) makes $g$ memory accesses and uses $g\log_2 l + k_g\log_2 w$ hash bits per membership query. Numerical comparison is provided in Table 3-5. In order to achieve a small false positive ratio, one has to keep the load factor small, which means that $B$(optimal $k$) will have to make a large number of memory accesses and use a


large number of hash bits. For example, when the load factor is 0.08, it makes 9 memory accesses with 180 hash bits per query, whereas the Bloom-1, Bloom-2 and Bloom-3 filters make 1, 2 and 3 memory accesses with 50, 70 and 90 hash bits, respectively. When the load factor is 0.02, it makes 35 memory accesses with 700 hash bits, whereas the Bloom-1, Bloom-2 and Bloom-3 filters make just 1, 2 and 3 memory accesses with 74, 118 and 162 hash bits, respectively.

Figure 3-9. False positive ratios of Bloom and Bloom-g with optimal $k$.

Figure 3-9 presents the false positive ratios of the Bloom and Bloom-g filters. As we already showed in Section 3.2.5, $B_1$(optimal $k$) performs worse than $B$(optimal $k$). However, the false positive ratio of $B_2$(optimal $k$) is very close to that of $B$(optimal $k$). Furthermore, the curve of $B_3$(optimal $k$) almost entirely overlaps with that of $B$(optimal $k$) over the whole load-factor range. The results indicate that with just two memory accesses per query, $B_2$(optimal $k$) works almost as well as $B$(optimal $k$), even though the latter makes many more memory accesses.

3.3.4 Discussion

The mathematical and numerical results demonstrate that Bloom-2 and Bloom-3 have smaller false positive ratios than Bloom-1 at the expense of larger query overhead. Below we give an intuitive explanation: Bloom-1 uses a single hash to map each member to a word before encoding. It is well known that a single hash cannot achieve


an evenly distributed load; some words will have to encode many more members than others, and some words may be empty as no members are mapped to them. This uneven distribution of members to the words is the reason for larger false positive ratios. Bloom-2 maps each member to two words and splits the membership bits among the words. Bloom-3 maps each member to three words. They achieve better load balance such that most words will each encode about the same number of membership bits. This helps them improve their false positive ratios.

3.3.5 Using Bloom-g in a Dynamic Environment

In order to compute the optimal number of membership bits, we must know the values of $n$, $m$, $w$, and $l$. The values of $m$, $w$ and $l$ are known once the amount of memory for the filter is allocated. The value of $n$ is known only when the filter is used to encode a static set of members. In practice, however, the filter may be used for a dynamic set of members. For example, a router may use a Bloom filter to store a watch list of IP addresses, which are identified by the intrusion detection system as potential attackers. The router inspects the arriving packets and logs those packets whose source addresses belong to the list. If the watch list is updated once a week or at midnight each day, we can consider it as a static set of addresses during most of the time. However, if the system is allowed to add new addresses to the list continuously during the day, the watch list becomes a dynamic set. In this case, we do not have a fixed optimal value of $k$ for the Bloom filter. One approach is to set the number of membership bits to a small constant, such as three, which limits the query overhead. In addition, we should also set the maximum load factor to bound the false positive ratio. If the actual load factor exceeds the maximum value, we allocate more memory and set up the filter again in a larger bit array.

The same is true for the Bloom-g filter. For a dynamic set of members, we do not have a fixed optimal number of membership bits, and the Bloom-g filter will also have to choose a fixed number of membership bits. The good news for the Bloom-g filter


is that its number of membership bits is unrelated to its number of memory accesses. The flexible design allows it to use more membership bits while keeping the number of memory accesses small or even a constant one. Compared with the Bloom filter, we may configure a Bloom-g filter with more membership bits for a smaller false positive ratio, while at the same time keeping both the number of memory accesses and the number of hash bits smaller.

Figure 3-10. False positive ratios of the Bloom filter with $k = 3$, the Bloom-1 filter with $k = 6$, and the Bloom-2 filter with $k = 5$.

Imagine a filter of $2^{20}$ bits is used for a dynamic set of members. Suppose the maximum load factor is set to be 0.1 to ensure a small false positive ratio. Figure 3-10 compares the Bloom filter with $k = 3$, the Bloom-1 filter with $k = 6$, and the Bloom-2 filter with $k = 5$. As new members are added over time, the load factor increases from zero to 0.1. In this range of load factors, the Bloom-2 filter has significantly smaller false positive ratios than the Bloom filter. When the load factor is 0.04, the false positive ratio of Bloom-2 is just one fourth of the false positive ratio of Bloom. Moreover, it makes fewer memory accesses per membership query. The Bloom-2 filter uses 58 hash bits per query, and the Bloom filter uses 60 bits. The false positive ratios of the Bloom-1 filter are close to or slightly better than those of the Bloom filter. It achieves such performance while making just one memory access per query and using 50 hash bits.
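The hash-bit counts and false positive ratios quoted in this subsection can be checked numerically. The following is a minimal sketch, assuming $w = 64$ (so $l = 2^{14}$) and evaluating the binomial-sum formula for $f_{B_g}$ from Section 3.3.1 with $k/g$ bits per word; the function names are hypothetical and chosen only for illustration.

```python
import math

def ceil_log2(x):
    """Number of hash bits needed to address x distinct positions."""
    return (x - 1).bit_length()

def bloom_hash_bits(m, k):
    """Standard Bloom filter: k positions in an m-bit array."""
    return k * ceil_log2(m)

def bloom_g_hash_bits(m, w, k, g):
    """Bloom-g: g words out of l = m / w, then k bit positions inside w-bit words."""
    return g * ceil_log2(m // w) + k * ceil_log2(w)

def bloom_fp(n, m, k):
    """Standard Bloom filter false positive ratio, (1 - (1 - 1/m)^(kn))^k."""
    return (1.0 - (1.0 - 1.0 / m) ** (k * n)) ** k

def bloom_g_fp(n, m, w, k, g):
    """Bloom-g false positive ratio from the binomial sum, with k/g bits per word."""
    l = m // w
    trials = g * n
    p = 1.0 / l
    per_word = k / g
    prob_f = 0.0
    for x in range(trials + 1):
        # log of C(gn, x) (1/l)^x (1 - 1/l)^(gn - x), kept in log space to avoid overflow
        log_pmf = (math.lgamma(trials + 1) - math.lgamma(x + 1)
                   - math.lgamma(trials - x + 1)
                   + x * math.log(p) + (trials - x) * math.log1p(-p))
        # probability that k/g bits are all ones in a word selected x times
        all_ones = (1.0 - (1.0 - 1.0 / w) ** (x * per_word)) ** per_word
        prob_f += math.exp(log_pmf) * all_ones
    return prob_f ** g

m, w = 2 ** 20, 64
n = int(0.04 * m)  # load factor 0.04
print(bloom_hash_bits(m, 3), bloom_g_hash_bits(m, w, 5, 2), bloom_g_hash_bits(m, w, 6, 1))
print(bloom_fp(n, m, 3), bloom_g_fp(n, m, w, 5, 2), bloom_g_fp(n, m, w, 6, 1))
```

The first line of output gives the hash bits per query for the three configurations (60, 58 and 50, as stated above); the second gives their false positive ratios at load factor 0.04. Setting $g = k$ in `bloom_g_fp` should agree with `bloom_fp`, which mirrors the Bloom-k equivalence shown in Section 3.3.1.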


3.3.6 Experiment

We further evaluate the Bloom-g filters through experiments using real network traces.

3.3.6.1 Experiment Setup

Imagine that an intrusion detection system maintains a watch list consisting of previously-identified external sources, which exhibit suspicious behaviors that match the patterns of worm attacks, DDoS attacks, scanning or reconnaissance. The intrusion detection system wants to further analyze the packets from these hosts in order to capture the real offenders. It needs to match the source addresses of the incoming packets against the watch list and log the ones whose addresses are members of the list. While the whole watch list is stored in a hash table located in DRAM, it is also encoded in a Bloom or Bloom-g filter whose small size can fit in SRAM in order to keep up with the line speed. Suppose the watch list is updated once a day. It can be treated as a static list during operation.

We obtain inbound packet header traces from the main gateway at our campus. They contain 2,064,081 distinct source IP addresses and 2,192,707 distinct destination addresses. We first generate 10 watch lists from the trace, each with 25,000 randomly selected source addresses. We then feed the traffic traces through the filters as well as the hash table to identify the matching source addresses and compute false positive ratios based on the number of matches by the filters and the number of matches by the hash table. Results are averaged among the 10 watch lists.

Each experiment consists of two phases: the initialization phase and the execution phase. In the initialization phase, we set up the Bloom/Bloom-g filters, as well as the hash table, by using a watch list of addresses. We always allocate the same amount of memory to the Bloom and Bloom-g filters for fair comparison. We use six filters. They are (a) two Bloom filters: $B(k = 3)$ and $B$(optimal $k$), and (b) four Bloom-g filters:


$B_1(h = 3\log_2 m)$, $B_2(h = 3\log_2 m)$, $B_1$(optimal $k$), and $B_2$(optimal $k$). Their definitions can be found in Section 3.2 and Section 3.3. In the execution phase, we perform a membership query in each filter for the source address of each packet. If a filter claims that it is a member but the source is not found in the hash table, it is a false positive. We vary the size $m$ of the filters from 125Kb to 500Kb, which translates into 5 to 20 bits per member in the watch list, and a load factor from 0.2 to 0.05. Let $w = 64$. Then $l = m/w$ varies from 2,000 to 8,000.

3.3.6.2 Performance Comparison of Bloom and Bloom-g

Table 3-6. Query overhead comparison of Bloom filter with $k = 3$ and Bloom-g filter with $h = 3\log_2 m$. Parameters: $n = 25{,}000$ and $w = 64$.

a. Number of memory accesses per query

Data Structure           # Memory Accesses
$B(k = 3)$               3
$B_1(h = 3\log_2 m)$     1
$B_2(h = 3\log_2 m)$     2

b. Number of hash bits per query

Data Structure           $m$
                         125Kb    250Kb    500Kb
$B(k = 3)$               51       54       57
$B_1(h = 3\log_2 m)$     29       42       55
$B_2(h = 3\log_2 m)$     40       54       56

First, we compare the performance of $B(k = 3)$, $B_1(h = 3\log_2 m)$ and $B_2(h = 3\log_2 m)$. Table 3-6 presents the query overhead of the Bloom filter and the Bloom-g filters. Note that Table 3-6b is computed based on the number of membership bits (i.e., $k$) for each filter. In our experiment, $B_1(h = 3\log_2 m)$ uses 7 membership bits, and $B_2(h = 3\log_2 m)$ uses 5. $B(k = 3)$, $B_1(h = 3\log_2 m)$ and $B_2(h = 3\log_2 m)$ require 3, 1 and 2 memory accesses, respectively, for each query. $B_1(h = 3\log_2 m)$ and $B_2(h = 3\log_2 m)$ also require fewer hash bits than $B(k = 3)$. For example, when $m$ = 125Kb, $B(k = 3)$ requires 51 hash bits per query, while $B_1(h = 3\log_2 m)$ and $B_2(h = 3\log_2 m)$ only need 29 and 40 hash bits, respectively, for each query.
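To make the two experiment phases concrete, the following is a minimal sketch of a Bloom-g filter and of the false positive measurement against an exact set. It is not the implementation used in the experiments: the class and function names are hypothetical, SHA-256 stands in for the hash bits, the addresses are synthetic rather than taken from the trace, and the sizes are scaled down.

```python
import hashlib

def _hash(label, item):
    """64 pseudo-random bits from SHA-256; a software stand-in for the hash bits in the text."""
    digest = hashlib.sha256(f"{label}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class BloomG:
    """Illustrative Bloom-g filter: l words of w bits, k membership bits spread over g words."""

    def __init__(self, l, w=64, k=3, g=1):
        self.l, self.w, self.k, self.g = l, w, k, g
        self.words = [0] * l

    def _masks(self, item):
        base, extra = divmod(self.k, self.g)
        masks, bit_no = [], 0
        for j in range(self.g):
            word = _hash(f"word{j}", item) % self.l
            # The first (k mod g) words carry ceil(k/g) bits, the rest floor(k/g).
            nbits = base + (1 if j < extra else 0)
            mask = 0
            for _ in range(nbits):
                mask |= 1 << (_hash(f"bit{bit_no}", item) % self.w)
                bit_no += 1
            masks.append((word, mask))
        return masks

    def add(self, item):
        for word, mask in self._masks(item):
            self.words[word] |= mask

    def __contains__(self, item):
        # One word fetch per membership word; all() stops at the first failing word.
        return all((self.words[word] & mask) == mask for word, mask in self._masks(item))

def false_positive_ratio(filt, watch_list, sources):
    """Initialization phase (encode the watch list), then execution phase (query every source)."""
    exact = set(watch_list)  # plays the role of the hash table in DRAM
    for addr in watch_list:
        filt.add(addr)
    non_members = [s for s in sources if s not in exact]
    hits = sum(1 for s in non_members if s in filt)
    return hits / len(non_members)

# Synthetic stand-ins for the watch list and the packet sources.
watch_list = [f"10.0.{i // 256}.{i % 256}" for i in range(2500)]
sources = watch_list + [f"172.16.{i // 256}.{i % 256}" for i in range(20000)]
for g in (1, 2):
    print(g, false_positive_ratio(BloomG(l=800, k=3, g=g), watch_list, sources))
```

With 800 words of 64 bits and 2,500 members, the sketch runs at roughly 20 bits per member, the lightest load considered in the experiments. Members are never rejected, so any filter hit on a source outside the exact set counts as a false positive.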


Figure 3-11. False positive ratios of Bloom with $k = 3$ and Bloom-g with $h = 3\log_2 m$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

Figure 3-12. False positive ratios of Bloom and Bloom-g with optimal $k$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

Figure 3-11 presents the false positive ratios of the filters with respect to the amount of memory $m$. As our theoretical analysis has predicted, the false positive ratio of $B_2(h = 3\log_2 m)$ is smaller than that of $B(k = 3)$. $B_1(h = 3\log_2 m)$ also has a slightly smaller false positive ratio than $B(k = 3)$ when $m$ is larger than 250Kb. When $m$ is smaller than 250Kb, it yields a slightly larger false positive ratio than $B(k = 3)$. Given that the throughput of $B_1(h = 3\log_2 m)$ can potentially be up to three times that of $B(k = 3)$, it is an attractive option in practice despite its slightly higher false positive ratio.


Table 3-7. Query overhead comparison of Bloom filter and Bloom-g filter with optimal $k$. Parameters: $n = 25{,}000$ and $w = 64$.

a. Number of memory accesses per query

Data Structure          $m$
                        125Kb    250Kb    500Kb
$B$(optimal $k$)        3        7        14
$B_1$(optimal $k$)      1        1        1
$B_2$(optimal $k$)      2        2        2

b. Number of hash bits per query

Data Structure          $m$
                        125Kb    250Kb    500Kb
$B$(optimal $k$)        51       126      266
$B_1$(optimal $k$)      29       42       55
$B_2$(optimal $k$)      40       60       86

Next, we compare $B$(optimal $k$), $B_1$(optimal $k$) and $B_2$(optimal $k$). We use the optimal $k$ as shown in Figure 3-8 for each filter. Table 3-7 presents the query overhead of the Bloom filter and Bloom-g filters. When $m$ varies from 125Kb to 500Kb, the query overhead of $B$(optimal $k$) increases dramatically. When $m$ = 500Kb, for example, $B$(optimal $k$) requires 14 memory accesses and 266 hash bits for each query, making it impractical. In comparison, $B_1$(optimal $k$) requires only one memory access and 55 hash bits, while $B_2$(optimal $k$) requires just two memory accesses and 86 hash bits. They remain practical solutions under this setting.

Figure 3-12 presents the false positive ratios of the filters with respect to $m$. These results match the theoretical analysis we give in Section 3.3.3. The false positive ratio of $B_2$(optimal $k$) is comparable to that of $B$(optimal $k$) even though its query overhead is much smaller. It is an excellent candidate for applications that require a very low false positive ratio. The false positive ratio of $B_1$(optimal $k$) is larger than that of $B_2$(optimal $k$), but it has even smaller overhead, which represents a performance-overhead tradeoff.


3.4 Bloom-α: Another Generalization of Bloom-1

3.4.1 Motivation

From Figure 3-6, we see that Bloom-2 is comparable to the Bloom filter in terms of false positive ratio when $k = 3$. However, it performs much better if it uses the same number of hash bits as Bloom does. From Figure 3-9, when the optimal number of membership bits is used, Bloom-2 is almost as good as Bloom and better than Bloom-1. The downside is that Bloom-2 needs two memory accesses per query, whereas Bloom-1 needs just one. Our goal is to make a performance tradeoff between Bloom-1 and Bloom-2 such that the false positive ratio is close to what Bloom-2 has, whereas the overhead is close to what Bloom-1 has.

We have briefly touched upon the reason why Bloom-2 outperforms Bloom-1 in Section 3.3.4. Now we take a closer look. Recall that a Bloom-1 filter uses an array of words. Let us define the load of a word to be the number of members that are encoded in the word. False positive ratios incurred in different words may vary. Naturally, the false positive ratio of a heavily-loaded word will be higher than that of a lightly-loaded word. This is confirmed by our simulation that keeps track of which word each false positive occurs in. We classify the words into groups based on their load values, and find the average false positive ratio for each group by simulation. The results are shown in Figure 3-13. For words whose loads are 6 or smaller, false positive ratios are very small. However, the false positive ratio grows superlinearly with the load. When the loads are 10 or greater, false positive ratios are very high. Clearly, reducing the number of heavily-loaded words helps reduce the overall false positive ratio.

The approach adopted by Bloom-2 maps each member to two membership words. Each word only carries half of the membership bits. In other words, each word only encodes half of the member. Consider a heavily loaded word that serves as the membership word for 14 members. In Bloom-2, since the word encodes half of each member, its load is reduced from 14 to 7. Figure 3-14 shows the


frequency distribution of words with respect to load values. Fewer words have heavy loads of 10 or higher in Bloom-2 than in Bloom-1. That is why Bloom-2 performs better.

Figure 3-13. False positive ratio of Bloom-1 with respect to load of words. Parameters: $m = 2^{20}$, $w = 64$, load factor $n/m = 0.1$.

Figure 3-14. Frequency distribution of words with respect to load values. Parameters: $m = 2^{20}$, $w = 64$, load factor $n/m = 0.1$ for both filters.

However, two membership words require two memory accesses for each query. What if only some members have two membership words while others have one? In this case, some queries need two memory accesses, but others need just one. The average number of memory accesses per query will be between 1 and 2. Intuitively, for a lightly loaded word, we want its encoded members to have just one membership word. However, for a heavily loaded word, we want all or some of its encoded members to


have additional membership words so that some of their membership bits will be moved elsewhere.

3.4.2 Bloom-α Filter

A Bloom-α filter (denoted as $B_\alpha$) is also an array of $l$ words, each of which is $w$ bits long. The first $w - 1$ bits in each word are used to encode members of a set, and the last bit is a flag to indicate whether members mapped to this word have second membership words. Initially, all bits (including flag bits) are zeros.

The setup procedure of a Bloom-α filter consists of two phases. In the first phase, we map elements in the set to the words in the same way as we do for Bloom-1. Each element is mapped to and encoded in exactly one word. We keep track of the elements that are encoded in each word. In the second phase, we pick a word $W_1$ that has the largest number of encoded members. We then split these members: for each member, we use $\log_2 l$ of its hash bits to locate a second word $W_2$ and move its last $\lfloor k/2 \rfloor$ membership bits from $W_1$ to $W_2$. Finally the flag bit in $W_1$ is set to one; we call $W_1$ a split word. We keep splitting the most heavily loaded word until a certain pre-defined fraction $\alpha$ of members are split, i.e., the number of members having two membership words reaches $\alpha n$. As split words cannot be split again, this process will terminate as soon as enough members are split.

The construction of Bloom-α ensures that, if the flag of a word is one, the members mapped to the word must have two membership words; if the flag is zero, the members have one membership word. Bloom-1 is a special case of Bloom-α with $\alpha = 0$.

To perform a membership check on an element $e'$, we first hash the element to locate its membership word $W_1$. If the flag of the word is zero, we check $k$ membership bits in this word. If all these bits are ones, $e'$ is a member of the set; otherwise, it is not. If the flag of the word is one, we check the first $\lceil k/2 \rceil$ membership bits in $W_1$. If any of these bits is zero, we are sure that $e'$ is not a member and there is no need to check the second word. However, if all these bits are ones, we find the second membership word


by using additional hash bits and check the remaining $\lfloor k/2 \rfloor$ membership bits in that word. We claim that $e'$ is a member of the set only if all those bits are also ones.

Table 3-8. Query overhead comparison of Bloom-1, Bloom-2 and Bloom-α filter.

Data Structure                                 # Memory Accesses    # Hash Bits
$B_1(k = 3)$, $B_1$(optimal $k$)               1                    $\log_2 l + k\log_2 w$
$B_2(k = 3)$, $B_2$(optimal $k$)               2                    $2\log_2 l + k\log_2 w$
$B_\alpha(k = 3)$, $B_\alpha$(optimal $k$)     $1 + \alpha$         $2\log_2 l + k\log_2 w$

We analyze the query overhead in Table 3-8. Consider an arbitrary element $e'$. If $e'$ is mapped to a word whose flag is zero, it takes one memory access. If $e'$ is mapped to a split word whose flag is one, it takes one or two memory accesses, depending on whether any of the first $\lceil k/2 \rceil$ membership bits is zero. The chance for $e'$ to be mapped to any word is equal. Hence, the probability for $e'$ to be mapped to a split word is equal to the fraction (or percentage) of words that are split. Considering that split words are the most heavily loaded, the average load of these words is larger than (or equal to, only if members are evenly distributed) that of the whole word array. In order to split an $\alpha$ fraction of members, splitting no more than an $\alpha$ fraction of words is enough. A rigorous proof is straightforward and is omitted. Therefore, the average number of memory accesses per query is bounded by $1 \cdot (1 - \alpha) + 2\alpha = 1 + \alpha$. Our experiment results in the next subsection show that, for $\alpha = 0.5$, the average number of memory accesses is actually very close to one.

Figure 3-15 compares false positive ratios of Bloom-1, Bloom-2, and Bloom-α with $k = 3$ by simulations. The false positive ratio of Bloom-α is between those of Bloom-1 and Bloom-2. When the value of $\alpha$ is increased from 25% to 50%, the false positive ratio of Bloom-α decreases, suggesting a performance-overhead tradeoff since the average number of memory accesses will increase. When $\alpha$ = 50%, the false positive ratio of Bloom-α is close to that of Bloom-2.
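The setup and query procedures of Section 3.4.2 can be sketched in a few lines. This is an illustration under stated assumptions, not the author's implementation: the class name is hypothetical, SHA-256 stands in for the hash bits, word loads are taken from the first phase only (so the words are ranked once), and the members of a split word are encoded directly in their final positions, which gives the same bit array as encoding them in $W_1$ and then moving $\lfloor k/2 \rfloor$ bits.

```python
import hashlib

def _hash(label, item):
    """64 pseudo-random bits from SHA-256; a stand-in for the hash bits in the text."""
    digest = hashlib.sha256(f"{label}|{item}".encode()).digest()
    return int.from_bytes(digest[:8], "big")

class BloomAlpha:
    """Illustrative Bloom-alpha filter built from a static set of members."""

    def __init__(self, members, l, w=64, k=3, alpha=0.25):
        self.l, self.w, self.k = l, w, k
        self.half = (k + 1) // 2       # ceil(k/2) bits stay in the first word
        self.flag = 1 << (w - 1)       # last bit of a word marks it as split
        self.words = [0] * l
        members = list(members)
        # Phase 1: map every member to one word, as Bloom-1 does, and remember the mapping.
        buckets = {}
        for e in members:
            buckets.setdefault(self._word1(e), []).append(e)
        # Phase 2: flag the most heavily loaded words until alpha * n members are split.
        split, target = 0, alpha * len(members)
        for idx in sorted(buckets, key=lambda i: len(buckets[i]), reverse=True):
            if split >= target:
                break
            self.words[idx] |= self.flag
            split += len(buckets[idx])
        # Encode each member according to the flag of its first word.
        for idx, elems in buckets.items():
            for e in elems:
                bits = self._bits(e)
                cut = self.half if self.words[idx] & self.flag else self.k
                for b in bits[:cut]:
                    self.words[idx] |= 1 << b
                for b in bits[cut:]:
                    self.words[self._word2(e)] |= 1 << b

    def _word1(self, e):
        return _hash("w1", e) % self.l

    def _word2(self, e):
        return _hash("w2", e) % self.l

    def _bits(self, e):
        # Bit positions are restricted to the first w - 1 bits; the last bit is the flag.
        return [_hash(f"bit{i}", e) % (self.w - 1) for i in range(self.k)]

    def query(self, e):
        """Return (claimed_member, number_of_memory_accesses)."""
        word, bits = self.words[self._word1(e)], self._bits(e)
        if not word & self.flag:
            return all((word >> b) & 1 for b in bits), 1
        if not all((word >> b) & 1 for b in bits[:self.half]):
            return False, 1            # rejected without touching the second word
        word2 = self.words[self._word2(e)]
        return all((word2 >> b) & 1 for b in bits[self.half:]), 2

members = [f"member-{i}" for i in range(2000)]
filt = BloomAlpha(members, l=256, k=3, alpha=0.25)
print(all(filt.query(e)[0] for e in members))  # members are never rejected
```

Averaging the second component of `query` over a stream of non-members gives the measured number of memory accesses per query, which can be compared with the $1 + \alpha$ bound in Table 3-8. With `alpha=0.0` no word is flagged and the structure behaves as a Bloom-1 filter with $w - 1$ usable bits per word.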


Figure 3-15. False positive ratios of Bloom-1, Bloom-2 and Bloom-α filter with $k = 3$.

3.4.3 Experiment

We further evaluate the Bloom-α filter through experiments using network traces. The experimental settings are identical to those we used in Section 3.3.6.1. We compare eight filters. They are (a) two Bloom-1 filters: $B_1(k = 3)$ and $B_1$(optimal $k$); (b) two Bloom-2 filters: $B_2(k = 3)$ and $B_2$(optimal $k$); and (c) four Bloom-α filters: $B_\alpha(k = 3, \alpha = 25\%)$, $B_\alpha(k = 3, \alpha = 50\%)$, $B_\alpha$(optimal $k$, $\alpha = 25\%$), and $B_\alpha$(optimal $k$, $\alpha = 50\%$). It is difficult to determine the theoretical optimal $k$ for Bloom-α, so we use the value of $k$ that produces the lowest false positive ratio in simulations.

First, we compare the performance of $B_1(k = 3)$, $B_2(k = 3)$, $B_\alpha(k = 3, \alpha = 25\%)$, and $B_\alpha(k = 3, \alpha = 50\%)$. Figure 3-16 shows memory access overhead. We see that both $B_\alpha(k = 3, \alpha = 25\%)$ and $B_\alpha(k = 3, \alpha = 50\%)$ need much fewer memory accesses than the upper bound $(1 + \alpha)$ in Table 3-8. In fact, the numbers are close to one. There are two reasons for this. First, the fraction of split words is smaller than $\alpha$. In fact, in our experiments, around 17.6% of the words are split when $\alpha$ = 25%, and 39.2% are split when $\alpha$ = 50%. As a result, the probability that an arbitrary element is mapped to a split word is smaller than $\alpha$. Second, even if an element is mapped to a split word, an additional memory access is not necessarily made, as we have discussed in Section 3.4.2. Figure 3-17 compares false positive ratios of the filters. We see that false positive


ratios of $B_\alpha$ are smaller than those of Bloom-1 but slightly worse than those of Bloom-2, particularly when $\alpha$ = 50%.

Figure 3-16. Average number of memory accesses for Bloom-1, Bloom-2 and Bloom-α filter with $k = 3$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

Figure 3-17. False positive ratios of Bloom-1, Bloom-2 and Bloom-α filter with $k = 3$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

Next, Figures 3-18 and 3-19 present the performance comparison of $B_1$(optimal $k$), $B_2$(optimal $k$), $B_\alpha$(optimal $k$, $\alpha = 25\%$), and $B_\alpha$(optimal $k$, $\alpha = 50\%$). When the optimal $k$ is used, the gap between $B_1$(optimal $k$) and $B_2$(optimal $k$) in terms of false positive ratio becomes larger. $B_\alpha$(optimal $k$, $\alpha = 25\%$) and $B_\alpha$(optimal $k$, $\alpha = 50\%$) fill the gap with smaller false positive ratios than $B_1$(optimal $k$) and fewer memory accesses than


$B_2$(optimal $k$). Figure 3-20 presents the optimal number of membership bits for Bloom-α filters. Compared with Figure 3-8, the optimal $k$ for Bloom-α is smaller than that of the Bloom-1 and Bloom-2 filters.

Figure 3-18. Average number of memory accesses for Bloom-1, Bloom-2 and Bloom-α filter with optimal $k$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

Figure 3-19. False positive ratios of Bloom-1, Bloom-2 and Bloom-α filter with optimal $k$ in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.

3.5 Summary

In this chapter, we introduce one-memory-access Bloom filters and their generalization. This family of data structures enriches the design space of the Bloom filters and their application scope by reducing the query overhead to allow high throughput. Using a


number of random bits in a word instead of from the entire bit array, we analyze the impact of this design change in terms of overhead and performance. This change also opens the door for constructing other variants for performance tradeoff. In this enlarged design space, we can configure filters that not only make fewer memory accesses but also have comparable or superior false positive ratios in scenarios where the standard Bloom filter with the optimal value of $k$ incurs too much overhead to be practical.

Figure 3-20. Optimal number of membership bits for the Bloom-α filter in real trace experiment. Parameters: $n = 25{,}000$ and $w = 64$.


CHAPTER 4
SPACE-TIME EFFICIENT MULTI-SET MEMBERSHIP LOOKUP

4.1 Background

Many important network functions require online membership lookup against a large set of addresses, flow labels, signatures, etc. For example, a router may be configured with counters [13] to collect per-client information for a given set of client addresses. For each arriving packet, the router needs to perform membership lookup to see if the source address of the packet belongs to the client set. A router may also be instructed to identify the set of current flows. This requires the router to collect flow labels [96, 97], such as client addresses or address/port tuples that identify TCP flows. Since each flow label should be collected only once, when a new packet arrives, the router must check whether the flow label extracted from the packet belongs to the set that has already been collected. For routing-table lookup, we need to determine whether a given destination address prefix is in the routing table [146]. These online membership lookup problems have been extensively studied. Many influential solutions [40, 76, 97, 103, 145, 146, 150] are designed using Bloom filters [7, 12], which are compact data structures suitable for hardware implementation in on-die SRAM.

This chapter studies a more difficult, yet less investigated problem, called multi-set membership lookup (MSM), which involves multiple (sometimes hundreds of) sets, and classifies an incoming stream of elements, e.g., addresses, ports, a combination of them and other fields in packet headers, or even packet content.¹

¹ In the literature, there is another type of multi-set, in which an element may appear multiple times in a set.

The classification determines not only whether an element is a member of the sets but also which set it belongs to. It has many important applications. For example, with MSM, the routers of an ISP can classify packets for differentiated services based on their sources which are


placed into different sets according to different types of service contracts. A firewall may be supplied with an action list for addresses that are collected by an intrusion detection system. Depending on the suspicion levels of source addresses, some packets may be logged, some may be content-inspected, while others may be dropped. The firewall must check each arriving packet to see if the source address is a member of the different sets, and if it is, what further action it should take, depending on which set the packet belongs to. In another example, as the VMs are placed onto the servers of a data center, each server has a set of VMs. The gateway can be tasked to determine which incoming packets should be forwarded to which server (where the corresponding VMs are hosted).

Traditional exact-match data structures such as binary search tree [58], trie [48], or hash table [41] have to store both keys and values (set IDs), and some need additional space for pointers. Therefore, they require much more space than Bloom filter variations. Tries can organize the keys in a compact way, but extra pointers are still required for a sparse key space [44]. This chapter studies probabilistic-match data structures to reduce the memory requirement, which is useful when multi-set membership lookup needs to be implemented on small but fast on-die memory to keep up with high line speed. Our solution combines the Bloom-g filter with a multi-hashing table employing a novel load-to-left, candidate-to-right policy for element placement. These techniques together allow the proposed multi-set membership lookup function to work in very compact memory, take only a few memory accesses and hash operations for each lookup, and have much lower error probabilities when compared with alternative data structures.

4.2 Problem Definition

4.2.1 Multi-Set Membership Lookup (MSM)

Consider a number $s$ of sets whose IDs are 1, 2, ..., $s$, respectively. Each set has a certain number of elements. Given an arbitrary element $e$, the multi-set membership


lookup function is to find the set ID that $e$ belongs to. We consider the sets to be disjoint for applications that classify an incoming stream of elements into different sets. So the returned value for the MSM function is either a valid set ID $S_e$ ($1 \le S_e \le s$) if $e \in S_e$, or 0 if $e$ does not belong to any set, where we use $S_e$ for both the ID and the set for convenience.

4.2.2 Performance Metrics

The performance criteria considered in our design of the MSM function are given below:

- Space: For space efficiency, we should reduce the number of bits it takes to encode each member and its set ID. This is extremely important if the data structures are placed in on-die SRAM.
- Memory Access: We should reduce the number of memory accesses to SRAM for each membership lookup. This is particularly important if operations are performed very frequently, e.g., on a per-packet basis in a router.
- Hash Computation: We also want to reduce the number of hash bits that are needed for each membership lookup. This helps reduce the computational overhead.

To further elaborate on correctness, we define a few concepts:

- False Positive: For an element that does not belong to any set, a false positive happens if the MSM function mistakenly believes the element belongs to a set.
- Conflict Classification: For an element that belongs to a set, conflict classification occurs if the MSM function cannot definitively determine the right set ID but knows it must be among several candidate sets.
- Mis-Classification: For a member of an existing set, mis-classification occurs if either of the following two conditions happens: (1) the MSM function claims that the element belongs to a different set, or (2) the MSM function claims that the element does not belong to any set.

False positive and conflict classification are direct consequences of bit sharing in Bloom-like data structures, where the set of bits to which a non-member is mapped happen to be all ones. While mis-classification can be avoided [57], false positive and conflict classification cannot be eliminated because, for a virtually unlimited number of non-members, the probability for a non-member to share the same set of bits with


members cannot be reduced to zero in hash-based designs adopted by Bloom filters and the like. In this chapter, we focus on data structures that may have false positives or conflict classifications, but do not have mis-classifications. In other words, no member in an existing set is treated as a non-member or as a member of a different set.

4.3 Design of MSM Function

4.3.1 Motivation

If coded Bloom filters [21, 57, 98] are used, each entry in the membership lookup table (i.e., each member in $S_e$, $1 \le S_e \le s$) has to be encoded multiple times. To reduce the number of bits needed per entry, we should encode each entry just once, as the straightforward solution [21] does. One possibility is to encode each entry, including both the member identifier and the set ID, i.e., $(e, S_e)$, as a member in a Bloom filter [7, 12]. The problem is that when performing lookup for an element, we only have the element identifier and will have to try all set IDs to see if any of them, when combined with the element identifier, is encoded in the filter. This is equivalent to what the straightforward solution does. The number of different set IDs is in the thousands, causing huge lookup overhead. The Bloomier filter [22] does not perform well either, as our later evaluation will show.

Our idea is to create indirection in the lookup process by separating membership encoding and set ID storage into two data structures, called the index encoder and the set-id table (abbreviated as SID-table), respectively. In the index encoder, we encode the membership of a member as well as a small index that points out where to find the right set ID in the SID-table. This index may take a few different values (e.g., from 1 to 10). The lookup process consists of two steps: Given an element identifier, the first step tests whether the element is a member and checks the few index values (instead of set IDs in the thousands) to see which one is encoded in the index encoder. Using the right index, the second step finds out where to fetch the set ID from the SID-table.

Because the first step checks all possible index values, if their encoding locations scatter all over the index encoder, we will have to make many memory accesses. To reduce the number of memory accesses, we cluster the encoding locations of all indices in one or a few contiguous blocks in the index encoder, where each block can be retrieved in one memory access. In this way, we only need to make one or a few memory accesses before checking all indices in parallel within a multi-core processor.

4.3.2 MSM Function

4.3.2.1 Set-ID Table
The SID-table stores the set IDs of the members. Each entry in the SID-table consists of two fields: a checksum and a set ID. The checksum is used for resolving lookup conflicts. If the set ID field is zero, the entry is unused (valid set IDs start from 1 and go up).

4.3.2.2 Index Encoder
The index encoder shares similarity with a Bloom filter. It is an array of bits organized in a two-level structure. At the first level, the array is divided into consecutive blocks, each of which may be 64 bits long and can be retrieved in one memory access. At the second level, each block consists of consecutive bits, which encode the keys of member elements together with the locations in the SID-table where their set IDs are stored.

4.3.2.3 Insertion
To insert a new member e of set $S_e$, we hash e to k entries in the SID-table, where k is a system parameter. These entries are called the candidate entries for member e. We will select one of them to store the set ID $S_e$; that entry is called the primary entry for e.

More specifically, the SID-table is divided into q equal-sized segments,² where $q \le k$. For convenience, we refer to the order from the first segment to the qth segment as from left to right, following [8, 69]. We hash e to at least one candidate entry in each segment. When $k > q$, some segments will have more than one candidate entry. The candidate entries are indexed from 1 to k by the order of their segments and, for those in the same segment, by the order of hashing.

² Technically speaking, we may allow variable-sized segments [11, 66]. However, equal-sized segments are easier to handle in practice for multi-banked memory and dynamic memory allocation. See Section 4.4.

If any candidate entry is unused, we will be able to successfully insert the member. Let a be the index of the primary entry for e; it is also called the primary index. We insert a checksum computed from e, and set the set ID field to $S_e$. However, if all candidate entries for e are used, we fail to find an entry to store $S_e$ in the SID-table. In this case, we insert the pair, e and $S_e$, in a small ternary content-addressable memory (TCAM) [122].

In order to support efficient lookup, we encode the primary index a in the index encoder in two steps. In the first step, we hash the member identifier e to a number g of blocks in the index encoder, where g may be one or a small integer. We then fetch these blocks to the processor. They can be logically thought of as a small Bloom-like filter, denoted as $C_e$, now residing in the processor for encoding a. In the second step, we hash a (together with e) to $k'$ bits in $C_e$ and set them to '1'. These operations are performed within the processor. After that, the blocks are written back to SRAM. The way that the index encoder works is very similar to the Bloom-g filter we mentioned in Chapter 3. In this chapter, when we say we hash the primary index a (or hash another index), we always mean to hash it together with the member identifier e.
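
The insertion procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the dissertation's implementation: it assumes one candidate entry per segment ($k = q$), takes the leftmost unused candidate as the primary entry, derives all hash values from SHA-256, and uses a plain dictionary as a stand-in for the TCAM. The class name `MSMSketch`, its parameters, and its helper methods are hypothetical. A `lookup` method following the two-step process of Section 4.3.1 is included only so that the sketch can be exercised.

```python
import hashlib


class MSMSketch:
    """Toy MSM function: SID-table, index encoder, and a dict in place of TCAM."""

    def __init__(self, l=1024, q=4, blocks=256, g=2, k_prime=5, w=64, c=8):
        self.q, self.g, self.k_prime, self.w, self.c = q, g, k_prime, w, c
        self.seg = l // q                        # entries per segment
        self.sid = [(0, 0)] * (self.seg * q)     # (checksum, set ID); set ID 0 = unused
        self.enc = [0] * blocks                  # index encoder, one w-bit integer per block
        self.tcam = {}                           # overflow store (assumed stand-in for TCAM)

    @staticmethod
    def _h(*parts):
        # Hypothetical hash helper: 64 bits derived from SHA-256.
        data = "|".join(map(str, parts)).encode()
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _candidates(self, e):
        # One candidate entry per segment, listed left to right (indices 1..q).
        return [i * self.seg + self._h("sid", i, e) % self.seg for i in range(self.q)]

    def _blocks(self, e):
        # The g blocks that form the virtual filter C_e.
        return [self._h("blk", j, e) % len(self.enc) for j in range(self.g)]

    def _bits(self, e, a):
        # k' bit positions inside C_e (g*w bits) for index a hashed together with e.
        return [self._h("bit", t, a, e) % (self.g * self.w) for t in range(self.k_prime)]

    def _checksum(self, e):
        return self._h("chk", e) % (1 << self.c)

    def insert(self, e, set_id):
        """Store set_id for e; return the primary index a, or None on overflow."""
        for a, pos in enumerate(self._candidates(e), start=1):
            if self.sid[pos][1] == 0:            # leftmost unused candidate entry
                self.sid[pos] = (self._checksum(e), set_id)
                blocks = self._blocks(e)
                for b in self._bits(e, a):       # encode (e, a) in C_e
                    self.enc[blocks[b // self.w]] |= 1 << (b % self.w)
                return a
        self.tcam[e] = set_id                    # all candidate entries are used
        return None

    def lookup(self, e):
        """Return the set IDs that pass both tests (empty set: non-member)."""
        if e in self.tcam:
            return {self.tcam[e]}
        blocks, result = self._blocks(e), set()
        for a, pos in enumerate(self._candidates(e), start=1):
            encoded = all((self.enc[blocks[b // self.w]] >> (b % self.w)) & 1
                          for b in self._bits(e, a))
            checksum, set_id = self.sid[pos]
            if encoded and set_id != 0 and checksum == self._checksum(e):
                result.add(set_id)
        return result


msm = MSMSketch()
for i in range(500):
    msm.insert(f"flow-{i}", 1 + i % 7)
```

Because an entry is written only while it is unused and bits are never cleared, a lookup of an inserted member always includes its own set ID. The result can contain additional IDs only when both the index encoder and the checksum produce a false match.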

Figure 4-1. An example of inserting a member to the MSM function.

Figure 4-2. An illustration of using the load-to-left, candidate-to-right policy to insert a member to a SID-table. An entry marked with 'x' ('0') is used (unused).

Figure 4-1 illustrates how insertion to the MSM function is done through a simple example. (1) We perform hash operations on the member's ID e and obtain a sequence of hash bits. (2) Using the hash bits, we find k candidate entries from the SID-table, and store the set ID $S_e$ of e with a checksum computed from e in one of the entries; the index of the entry is a. (3) Using the obtained hash bits, g blocks are fetched from the index encoder, which form a virtual Bloom-like filter $C_e$. (4) We encode (e, a) in $C_e$.

4.3.2.4 Load-to-left, Candidate-to-right
For insertion, there are two remaining problems: How to choose the primary entry? How many candidate entries should each segment have? Our objective is to increase the probability of finding an unused entry for each new member. To achieve this objective, we adopt a load-to-left and candidate-to-right policy to address those two problems, where load-to-left has appeared in previous hashing schemes [8, 11, 69] and candidate-to-right is original.

The load-to-left policy deals with the choice of the primary entry: If e is hashed to more than one unused entry, we always choose the one from the segment furthest to the left. This creates a skew in the segment load, which is measured as the fraction of entries in a segment that are used. A segment to the left tends to have a higher load than a segment to the right, as shown in Figure 4-2. A skew in the load improves the probability for a new member to be successfully inserted. For example, suppose $k = q = 2$ and the segment loads are 0.9 and 0.1, respectively. The successful-insertion probability is $1 - 0.9 \times 0.1 = 0.91$. In comparison, if the segment loads are uniform, i.e., 0.5 and 0.5, the successful-insertion probability will be only $1 - 0.5 \times 0.5 = 0.75$.

Let $\rho_i$ be the load of the ith segment. Its value can be recursively computed. Let n be the number of members and l be the number of entries in the SID-table. Each segment has $l/q$ entries. Suppose $k = q$. The probability $P_i$ for an arbitrary entry in the ith segment to be unused is

$$P_1 = \left(1 - \frac{1}{l/q}\right)^{n} \approx e^{-qn/l},$$
$$P_i = \left(1 - \prod_{j=1}^{i-1}(1 - P_j)\,\frac{1}{l/q}\right)^{n} \approx e^{-\prod_{j=1}^{i-1}(1 - P_j)\,qn/l}, \quad \forall i \in [2, q]. \qquad (4)$$

Clearly, the value of $P_i$ increases with respect to i. The expected value of $\rho_i$ is simply $1 - P_i$, which decreases with respect to i. In fact, the segment loads decrease quickly from segment 1 to segment q. A similar analysis can be done for the case of $k > q$, i.e., when a segment may have more than one candidate entry.

The candidate-to-right policy determines the number of candidate entries in each segment. Taking advantage of the load skew, it further improves the probability for a new member to be successfully inserted into the SID-table. If $k = q$, the successful-insertion probability is $1 - \prod_{i=1}^{q} \rho_i$. If $k > q$, the successful-insertion probability is $1 - \prod_{i=1}^{q} (\rho_i)^{d_i}$, where $d_i$ is the number of candidate entries in the ith segment. Since $\rho_q$ has the
smallest value, we naturally want to push the candidate entries to the rightmost segment. As shown in Figure 4-2, $d_i = 1$, $\forall i \in [1, q-1]$, and $d_q = k - q + 1$. The successful-insertion probability becomes $1 - \prod_{i=1}^{q-1} \rho_i \, (\rho_q)^{k-q+1}$, where $\rho_i$ is the load of the ith segment.

The effect of the above candidate-to-right policy may be amplified if we reduce the value of $\rho_q$. This can be achieved by allowing more than one candidate entry in the second rightmost segment. If we use two candidate entries in the (q-1)th segment, a new member will have a better chance to find an unused entry in the (q-1)th segment, without having to resort to the qth segment, thereby reducing its load $\rho_q$. In this case, the successful-insertion probability is $1 - \prod_{i=1}^{q-2} \rho_i \, (\rho_{q-1})^{2} (\rho_q)^{k-q}$. It performs better if the reduction in $\rho_q$ more than compensates for the reduction in its exponent. In general, if the values of n, l, q and k are fixed, we can mathematically compute the optimal values of $d_i$, $i \in [1, q]$, that maximize the successful-insertion probability.

4.3.2.5 Lookup

Figure 4-3. An example of looking up a member in the MSM function.

Consider an arbitrary element e; we use MSM to determine its set ID. The steps are shown in Figure 4-3. First, we generate a sequence of hash bits using element e. Using the hash bits, we locate and fetch g blocks in the index encoder, i.e., $C_e$, to the processor, where the remaining operations are performed. The processor has k units that test in parallel whether any candidate index i is encoded in $C_e$, for $1 \le i \le k$. Each
unit hashes i to $k'$ bits in $C_e$ and checks whether all bits are ones. If so, i is encoded in $C_e$. If e is a live member and it has been previously inserted in the MSM function, its primary index a must have been encoded in $C_e$. Ideally, all other indices should be tested as not encoded in $C_e$. However, a false positive may happen such that the $k'$ bits of another index are all ones because they were set during the insertion of other members (which are also hashed to these bits by chance). In this case, we need to figure out which one is the primary index that we can use to find the right entry in the SID-table for member e. The checksum field in the SID-table serves this purpose. We emphasize that the probability of false positive should be made very small such that in most cases only one index (namely, a) will be found in $C_e$.

If no index is found in $C_e$, element e is not a member and the packet will be dropped. For each index i that is encoded in $C_e$, we hash e for its ith candidate entry in the SID-table. We then fetch the entry and compare its checksum field with what is computed from e. If the checksum does not match, i must not be the primary index. Otherwise, we output the set-ID field of that entry. Figure 4-3 shows an example of looking up a member in the MSM function.

Clearly, the primary index a will pass the above checksum test. However, the checksum field cannot totally resolve false positives because the probability for another index i to pass the checksum test by chance is small but not zero. When this indeed happens, we have conflict classification.

Because some members may be stored in TCAM, the lookup happens simultaneously in both the TCAM and the index encoder/SID-table. If there is a match in TCAM, the lookup in the index encoder/SID-table will be aborted and its result will be discarded.

4.3.2.6 Hash Functions
The MSM function requires many hash operations. For example, it needs to hash a member identifier e to g blocks in the index encoder, and then hash each index in
[1, k] to $k'$ bits in $C_e$ during lookup. For insertion, it needs to hash e to k entries in the SID-table. That will add up to $g + kk' + k$ hash operations. If $g = 2$, $q = k = 4$ and $k' = 5$, the number of hash operations will be 26. But the actual implementation does not require so many if we take a hash-bit point of view, instead of a hash-operation point of view.

Let l be the number of entries in the SID-table, $l'$ be the number of blocks in the index encoder, and each block be 64 bits long. For the index encoder, the lookup operation needs the most hash bits: $g \log_2 l'$ hash bits to locate $C_e$, and $k' \log_2(64g)$ hash bits to locate the bits for each index in [1, k]. If $l' = 1$M, $g = 2$, $q = k = 4$ and $k' = 5$, it needs just $40 + 4 \times 35 = 180$ hash bits, which can be produced by three hashing operations if each generates 64-bit output. Suppose 192 hash bits are produced by hardware. We take the first $g \log_2 l' = 40$ bits to locate $C_e$. Then we take the next $k' \log_2(64g) = 35$ bits to locate the $k'$ bits in $C_e$ for the first index, take the next 35 bits to locate the bits for the second index, and so on.³

³ It may be counter-intuitive to implement hashing of an index (together with the member identifier) in this non-conventional way. A normal hash function would take the index and the member identifier as two inputs. The above implementation uses the member identifier to produce a sequence of hash bits and then uses the index to choose some bits from the sequence.

For the SID-table, the insertion operation needs the most hash bits. It needs $k \log_2(l/q)$ hash bits to locate k entries. If $l = 0.25$M,⁴ each segment has $2^{16}$ entries and we need $4 \times 16 = 64$ hash bits, which can be produced by one hashing operation that generates 64-bit output.

⁴ Our analysis and experiments will show that the number of entries in the SID-table can be much smaller than the number of bits in the index encoder.

4.3.3 Conflict Classification
Conflict classification happens under the following conditions: (1) a false positive occurs in the index encoder such that more than one index is found in $C_e$, and (2) the
checksum field in the SID-table fails to resolve the false positive. Below we derive the probability of conflict classification.

For each member inserted in the MSM function, the $k'$ bits for its primary index are set to ones. Consider an arbitrary bit z in $C_e$. It remains zero if it was not chosen by the primary index of any member. Let w be the number of bits in each block. The probability that z is not chosen by the primary index of member e is $\left(1 - \frac{1}{wg}\right)^{k'}$, where $wg$ is the number of bits in $C_e$. The probability that z is not chosen by another member $e'$ is $1 - \left[1 - \left(1 - \frac{1}{l'}\right)^{g}\right]\left[1 - \left(1 - \frac{1}{wg}\right)^{k'}\right]$, where the first term in the product is the probability for $C_{e'}$ to contain z's block and the second term in the product is the probability for z to be chosen by the primary index of $e'$. Hence, the probability for z to be zero is

$$P_0 = \left(1 - \frac{1}{wg}\right)^{k'} \left(1 - \left[1 - \left(1 - \frac{1}{l'}\right)^{g}\right]\left[1 - \left(1 - \frac{1}{wg}\right)^{k'}\right]\right)^{n-1}, \qquad (4)$$

where n is the number of members. Consider an arbitrary non-primary index of member e. It chooses $k'$ bits in $C_e$. A false positive occurs if all of them are ones. Hence, the false-positive probability is

$$P_{fp} = (1 - P_0)^{k'}. \qquad (4)$$

This is not strictly correct as it assumes independence for the probabilities of each bit being set. However, it is a very close approximation. The probability that the checksum in the SID-table cannot resolve this false positive is $\frac{1}{2^c}$, where c is the number of bits in the checksum. Hence, the probability for one non-primary index to cause conflict classification is $\frac{1}{2^c} P_{fp}$. Because there are $k - 1$ non-primary indices, the conflict-classification probability is

$$P_{conflict} = 1 - \left(1 - \frac{P_{fp}}{2^c}\right)^{k-1}. \qquad (4)$$

4.3.4 False Positive
Let e be the identifier of an element that does not belong to any set. The MSM function first hashes e to find $C_e$, and then checks if any candidate index i is encoded in $C_e$. A candidate index is not in $C_e$ if one of its $k'$ bits is zero. If no candidate indices are found in $C_e$, the MSM function knows that e is not a member. Due to false positives, however, one or more candidate indices may be found in $C_e$. In this case, the MSM function moves to the SID-table. Each candidate index in $C_e$ points to a candidate entry in the SID-table. If the checksum of that entry does not match what is computed from e, we know that it is a false positive caused by the index encoder; if this happens to all indices found in $C_e$, the MSM function will recognize e as a non-member. However, if the checksum of one entry matches e, the set ID in that entry will be output, resulting in a false positive.

We derive the probability of false positive. Following an analysis similar to the one that leads to $P_0$ in (4), we give the probability for an arbitrary bit in $C_e$ to be zero as follows:

$$P'_0 = \left(1 - \left[1 - \left(1 - \frac{1}{l'}\right)^{g}\right]\left[1 - \left(1 - \frac{1}{wg}\right)^{k'}\right]\right)^{n}. \qquad (4)$$

It is slightly different from $P_0$ in (4), without the term $\left(1 - \frac{1}{wg}\right)^{k'}$, because no primary index of e is encoded. Consider an arbitrary index of element e. It has $k'$ bits in $C_e$. A false positive occurs if all of them are ones. Hence, the false-positive probability is

$$P'_{fp} = (1 - P'_0)^{k'}. \qquad (4)$$

The probability that the checksum in the SID-table cannot resolve this false positive is $\frac{1}{2^c}$. Hence, the probability for one index to cause a false positive is $\frac{1}{2^c} P'_{fp}$. Because there are k indices, the false-positive probability is

$$P_{false} = 1 - \left(1 - \frac{P'_{fp}}{2^c}\right)^{k}. \qquad (4)$$
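
To get a feel for the magnitudes involved, the expressions for the conflict-classification and false-positive probabilities can be evaluated numerically. The sketch below is only an illustration: the function names and the parameter values in the usage lines are assumptions chosen for the example, not settings taken from the evaluation.

```python
def p_zero(n, l_blocks, g, w, k_prime, member):
    """Probability that an arbitrary bit of C_e is zero (P_0 if member, else P'_0)."""
    in_block = 1 - (1 - 1 / l_blocks) ** g        # C_e' contains the bit's block
    chosen = 1 - (1 - 1 / (w * g)) ** k_prime     # bit chosen by the primary index of e'
    if member:
        # e's own primary index must also miss the bit; n - 1 other members remain
        return (1 - 1 / (w * g)) ** k_prime * (1 - in_block * chosen) ** (n - 1)
    return (1 - in_block * chosen) ** n


def p_conflict(n, l_blocks, g, w, k_prime, k, c):
    """Conflict-classification probability for a member (k - 1 non-primary indices)."""
    p_fp = (1 - p_zero(n, l_blocks, g, w, k_prime, True)) ** k_prime
    return 1 - (1 - p_fp / 2 ** c) ** (k - 1)


def p_false(n, l_blocks, g, w, k_prime, k, c):
    """False-positive probability for a non-member (k candidate indices)."""
    p_fp = (1 - p_zero(n, l_blocks, g, w, k_prime, False)) ** k_prime
    return 1 - (1 - p_fp / 2 ** c) ** k


# Assumed example parameters: 100,000 members, 2^17 blocks of 64 bits each.
params = dict(n=100_000, l_blocks=2 ** 17, g=2, w=64, k_prime=5, k=10)
for c in (4, 8, 16):
    print(c, p_conflict(c=c, **params), p_false(c=c, **params))
```

When the index-encoder false-positive probability is small, each extra checksum bit roughly halves both results, which reflects the $1/2^c$ factor in the formulas.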

4.3.5 Mis-Classification
Next, we consider a member element e that belongs to set $S_e$. By definition, mis-classification occurs if either of the following two conditions holds: (1) the MSM function claims that e belongs to a different set $S'_e$, or (2) the MSM function claims that e does not belong to any set. Since the member has been inserted in the MSM function, the lookup will certainly find the primary index a in the index encoder, and that leads to the primary entry in the SID-table for the set ID $S_e$. So condition (2) will not happen for the MSM function. Condition (1) only happens when the MSM function outputs $S'_e$ as the sole answer, which is impossible given that $S_e$ is guaranteed to be included in the output. In other words, the membership lookup result for element e can only be $S_e$ or a conflict classification.

4.4 Discussions

4.4.1 Deletion
Bloom filters do not inherently support deletion due to randomized sharing of bits. In order to support deletion in the MSM function, we may use one of two techniques: counting-based deletion and time-based deletion.

4.4.1.1 Counting-based Deletion
We replace the bit array $C_e$ with a counter array as the counting Bloom filter does [54, 82, 137]. When we insert the primary index a of a member e, we hash e to find $C_e$. We then hash a to $k'$ counters (instead of bits) and increase all those counters by one. When we delete a member $e'$ of set $S_{e'}$, after finding $C_{e'}$, we check all indices $i \in [1, k]$ to see which one is encoded in $C_{e'}$. If there is only one index in $C_{e'}$, it must be the primary index a. We hash $e'$ for its ath candidate entry in the SID-table and set the set ID field to zero, which releases the entry for future use. We also decrease the $k'$ counters of index a by one, as in a counting Bloom filter. The advantage of counting-based deletion is that the logic is simple and easy to implement. However, what if we find more than one index in $C_{e'}$ (due to false positives)? They give us more than one candidate entry in the SID-table. If the
checksums of two or more candidate entries match $e'$, we will have no idea which entry to remove. Although the probability for this to happen is small, it can happen and we will not be able to release a used entry. Over time, such wasted entries may accumulate.

4.4.1.2 Time-Based Deletion
If each member only remains in a set for a period of time, we can perform deletion periodically by removing the elements that were not visited in the previous period. More specifically, we replace each bit in the index encoder with a two-bit code, and add one flag bit to each entry of the SID-table. Their purpose is to track whether an element is visited (being looked up).

Initially, all flag bits of SID-table entries are '0'. If an entry in the SID-table is visited and the checksum matches, we set the flag bit of that entry to '1'. After each period, the MSM function recycles all entries in the SID-table whose flags are '0'. To change their status to unused, we simply assign zero to the set ID field. We also reset the flags of all entries to '0', preparing for deletion after the next period.

The MSM function also clears the codes that were not accessed in the previous period. All codes are initially […]. When a member is inserted or visited, the codes it maps to are set to […]. Code […] is used to resolve conflict classification as we will show later. Membership lookup checks the positiveness of codes. During periodical deletion,
Codes that are […] (not being accessed during the previous period) are set to […].
Codes that are […] or […] (having been accessed during the previous period) are set to […], initializing them for the next period.
Figure 4-4 shows the detailed transition of code states.

Deletion needs to access the entire index encoder and SID-table. If the SRAM space allocated to the MSM function is 10 MB and each memory access retrieves a block of 64 bits into a hardware unit for deletion, there are 1.25M blocks and it takes 2.5M memory accesses (for reading the blocks and writing them back) to process all
blocks. This overhead is manageable if we amortize it over the numerous packets that the MSM function receives over each deletion period. For example, suppose MSM is used to classify inbound packets of a router, and deletion is performed every 10 seconds. If the inbound line speed is 100 Gbps and the average packet size is 1000 bytes, the number of packets received over a period is around 125M.⁵ If we interleave memory accesses for deletion with packet processing, we may process one block for deletion after forwarding an inbound packet. In this way, for the first 1.25M packets out of 125M, two memory accesses for deletion will be made after each packet. That translates into a sub-period of about 0.1 second (out of 10 seconds). After that, the deletion is completed. Alternatively, we may process one block for deletion after forwarding multiple packets, which reduces the impact of deletion even at the micro time level.

⁵ For simplicity in discussion, we ignore the physical layer overhead during transmission, which cuts down the packet rate.

Figure 4-4. State transition diagram for codes in the index encoder with time-based deletion. […] is the starting state.

Code Value: Suppose conflict classification happens to a member with identifier e. To prevent subsequent lookups of member e from encountering conflict classification, the MSM function may issue an exact lookup to confirm the real set ID of e. For example, some application may store the exact membership in larger but slower
DRAM. Then, it performs a lookup on e and finds the primary entry in the SID-table, i.e., the one whose set ID field stores the confirmed set ID x. Once it finds such an entry, it knows that the corresponding index must be the primary index. It then goes back to the index encoder, finds the codes for the primary index, and sets them to […]. For subsequent lookups of member e, when the MSM function performs a lookup and finds multiple indices in $C_e$, it will use the index whose codes are all […].

Although the above approach helps resolve conflict classification, it cannot totally eliminate it because in theory a false positive may still occur such that the codes of a non-primary index are all […], no matter how small the likelihood is. Nevertheless, with the help of the code value, we can reduce the chance for conflict classification to happen. A false positive may even occur such that the codes of a non-primary index are all […], while the codes of the primary index are not yet set to all […]. However, the probability for this to happen is extremely small.

4.4.2 Multiple Memory Banks
So far we have not considered multi-banked on-chip memory, which implements separate SRAM arrays such that each can be accessed independently. Normally, each bank interleaves its address blocks with other banks. It may also be designed to occupy a contiguous section of addresses. Multi-banked memory allows us to make multiple memory accesses in parallel. It helps increase the memory bandwidth if on-chip memory becomes a performance bottleneck. For example, the combinatorial Bloom filter (COMB) [57] may need to make over a hundred memory accesses per packet, depending on its parameter setting. If numerous memory banks are used to support parallel accesses, the actual per-packet delay due to memory access will be greatly reduced.

Multiple banks can equally benefit the MSM function. First of all, if the g blocks in $C_e$ are taken from g memory banks respectively, they can be fetched in parallel. If the SID-table resides on a different memory bank, the operations on the index encoder
and the SID-table can be pipelined: For example, when the lookup of a packet moves to the SID-table, the lookup of the next packet can begin to fetch blocks from the index encoder.

For each membership query, the MSM function makes g memory accesses to fetch $C_e$ from the index encoder. If there is no conflict classification, it makes one memory access to fetch an entry from the SID-table. Hence, when $g = 2$, three memory accesses are needed in total. If the number of memory banks available is more than that, we can still make full use of the parallelism by looking up multiple elements simultaneously. We divide the memory banks into groups and assign members to different groups. Each group has its own index encoder and SID-table, as well as a dedicated set of hardware units for insertion, deletion and lookup. When inserting a member e, we hash it to a memory-bank group and then store its set ID in the SID-table and index encoder of that group in exactly the same way as we have described previously.

4.4.3 System Overload
When there are too many members to be stored in the SID-table, insertion failure will happen more frequently, which may overflow the TCAM, resulting in system overload. The MSM function can be designed to dynamically enlarge the SID-table to accommodate more members. To do so, we add a number r of additional segments by preempting less important functions that are also implemented on the EN-interface (for example, a traffic measurement function may have lower priority than cloud membership lookup; it can be stopped to give up its memory until the overload condition eases). Recall that the SID-table consists of q segments from which k candidate entries are accessed for each new member during insertion. Besides adding segments, we also need to increase the number of candidate entries from k to $k + r$, assuming that each
new segment has one candidate entry and that there are extra hardware units to test whether the additional indices are encoded in $C_e$ during lookup. When the overload condition eases and the existing members can be held in the original q segments, we will release the added segments from the SID-table when their entries become unused.

4.5 Evaluation
We compare our data structure (MSM) with two other Bloom-filter-based data structures, the Bloomier filter (Bloomier) [22] and the combinatorial Bloom filter (COMB) [57], using synthetic data. The details of COMB and Bloomier can be found in Section 2.3.4.

4.5.1 Simulation Setup
In our simulation, we allocate the same amount of memory ($m = 16$M bits) for each data structure and insert n randomly generated members with random set IDs. The number of sets s is 4000, and the set IDs are 1, 2, ..., 4000. We vary n so that the average memory per member changes from 20 bits to 50 bits ($\frac{m}{50} \le n \le \frac{m}{20}$). Given that we need $\lceil \log_2 s \rceil = 12$ bits to represent a set ID, there are actually 8 to 38 bits per member left to encode the membership. We use the optimal configurations of the data structures as follows:

Bloomier: We assign $k = \ln 2 \cdot \frac{l}{n}$ elements to encode each member, where $l = \frac{m}{\log_2 s + 1}$ is the number of entries. This will minimize both false positives and insertion failures [12].
COMB: We choose COMB(15, 6), which can encode up to $\binom{15}{6} = 5005$ sets. We use $k = \ln 2 \cdot \frac{m}{6n}$ hash functions for each code bit to obtain the fewest false positives [57].
MSM: We choose $g = 2$ to bound the number of memory accesses per lookup. We divide the SID-table into $q = 8$ segments and set the number of candidate entries k to 10 to obtain reasonable hash overhead and insertion failure rate. We choose the optimal $k'$ that minimizes the false-positive ratio of the index encoder ($P'_{fp}$) based on
Equation (4). Then, the only remaining open variables are c and $l'$. As c and $l'$ are bounded integers, with the help of linear programming, we can compute the optimal values that minimize the false-positive ratio in Equation (4).

4.5.2 Simulation Results

4.5.2.1 Insertion Failure
First we keep inserting members into the data structures until the number of members encoded reaches n. For the Bloomier filter, insertion failure happens when there is no available element for a new member; for MSM, insertion failure occurs when all k candidate entries for the new member are used. Both Bloomier and MSM handle this by inserting the new member into TCAM.

Figure 4-5 shows the ratio of members that encounter insertion failure. When there are 20 bits per member, the Bloomier filter fails to insert 26% of the elements, while MSM only fails 1%. When there are 50 bits per member, the Bloomier filter has 5% insertion failure, while MSM does not encounter any insertion failure once the number of bits per member reaches 25. The consequence of insertion failure is the consumption of expensive TCAM or the use of larger SRAM instead. It is a tough decision to make as both TCAM and SRAM are expensive for a network device.

Figure 4-5. Insertion failure ratio of the Bloomier filter and the MSM filter.

4.5.2.2 Membership Lookup Overhead
Next we examine the hash overhead and memory access overhead for each membership lookup. Figure 4-6 shows the average number of memory accesses for each membership lookup. As the number of bits per member increases, the number of memory accesses for COMB and Bloomier also increases. This is because they use a larger k to reduce the false-positive ratio. MSM makes fewer memory accesses when given more bits to encode each member. This is because a smaller false-positive ratio in the index encoder results in fewer memory accesses to the SID-table. The Bloomier filter has 1 memory access per query. MSM has 7.8.5 memory accesses for members, and 7.3.6 memory accesses for non-members. The minor difference is due to the nature of the index encoder: if no index is found in the index encoder, there is no need to access the SID-table. COMB requires 30 to 90 memory accesses for each membership lookup, which is 10 times that of MSM, and 30 times that of the Bloomier filter.

Figure 4-7 presents the number of hash bits required by the data structures for each membership lookup. Among them, the Bloomier filter works the best with 21 hash bits per query. MSM requires 278 hash bits, and COMB requires 720 hash bits.

4.5.2.3 Correctness
Finally, we examine the correctness of the data structures by initiating membership queries for both members and non-members. Two types of errors may happen: false positive, which happens when a valid set ID is output for a non-member; and conflict classification, which means more than one valid set ID is output for a member. Figures 4-8 and 4-9 present the false-positive ratios and conflict-classification ratios of the three data structures, respectively. The Bloomier filter outputs one and only one valid set ID for member elements. Therefore, it does not have conflict classifications. COMB has the smallest false-positive ratio, but it has a large conflict-classification ratio. However, the Bloomier filter has a false-positive ratio that is magnitudes larger than those of the other two filters. In practice, one can determine which data structure to use if one has knowledge
about the query set: If the queries are mostly for members, MSM or the Bloomier filter is a good option. Otherwise, if the queries are mostly for non-members, COMB makes a good candidate as well.

Figure 4-6. Average number of memory accesses per membership lookup by the Bloomier filter, COMB, and the MSM filter.

Figure 4-7. Number of hash bits required by the Bloomier filter, COMB, and the MSM filter for each membership lookup.

Figure 4-10 shows the ratio of elements that are either false positives or conflict classifications, defined as the error ratio (the actual error ratio can vary depending on the query set). Among them, COMB has a larger error ratio
than the Bloomier filter. MSM has the lowest error ratio, and this is achieved with only 7.83.6 memory accesses and 278 hash bits.

Figure 4-8. False positive ratio comparison among the Bloomier filter, COMB, and the MSM filter.

Figure 4-9. Conflict classification ratio comparison between COMB and MSM.

4.6 Summary
In this chapter, we propose a novel data structure for multi-set membership lookup based on an index encoder and a SID-table that are highly space-efficient, yet need only a small number of memory accesses for each query. We propose a load-to-left, candidate-to-right policy to improve insertion performance. Simulations show that our design significantly outperforms known alternatives.

Figure 4-10. Error ratio of the Bloomier filter, COMB, and the MSM filter.

CHAPTER 5
ADAPTIVE BLOOM FILTERS FOR DISTRIBUTED JOIN

5.1 Background
Bloom filters are widely used in distributed systems, with applications in anomaly detection, attack detection, network traffic measurement, account management, and so on. For example, distributed intrusion detection systems (IDSes) detect address/port scanning [148], in which an external host attempts to establish connections to an unusual number of internal hosts. Multiple IDSes guarding geographically dispersed sub-nets report to a central coordinator their measurements of the number of distinct destinations that each source has contacted, and the central coordinator combines the measurements to detect scanners. As another security application, a honeynet detects attacks by placing several fake servers called honey-pots in the network, which log all attempted accesses to them [147]. As honey-pots do not provide any real service, anyone who contacts them is probably the source of a scanning attack or a worm [107]. The center can collect the log information from all honey-pots to locate the attacker. As an example in traffic engineering, some ISPs measure their networks and analyze the measurement results for improved QoS [26, 27, 86]. Some measure the flows that travel along a particular path. In order to do this, each router on the path records all passing flows. Later, by joining the flows that passed two routers, we know the set of flows that were traveling between them. Lastly, in a peer-to-peer network such as a DHT, information is stored on millions of sites in the network. Each keyword is mapped to a node in the network, where an inverted list containing the document IDs corresponding to the keyword is stored [23]. If multiple keywords are searched, the inverted lists from the corresponding nodes are joined together to form the final search results.

In all the above examples, there is a fundamental problem to be solved: finding the common elements of sets, which could be lists of source IP addresses captured by honeypots, sets of flow labels collected by routers, or inverted lists of document IDs.

We define this problem as the distributed join problem, which is to find the common elements of two sets that are stored at different locations in the network.

The straightforward solution is to let one node send the keys of its elements to the other node. However, when the sets are large, it is not efficient to send the whole set. Suppose that in a distributed monitoring system there are a million flows in a measurement period of 1 minute, and each flow identifier consists of 100 bits.¹ In order to find the common flows collected by two monitors, one monitor needs to send $1\text{M} \times 100 = 100$ Mb of flow IDs per minute, which translates to about 1.67 Mbps of upstream/downstream traffic. As one monitor may share monitoring results with many others, the bandwidth requirement is too large.

¹ Suppose we use 5-tuples to identify each flow.

Another solution is to use a Bloom filter to filter out unneeded flow IDs before transferring the real IDs [104]. One monitor transfers a Bloom filter that encodes its set of elements to the other monitor. The other monitor then performs a membership lookup using its own set of flow identifiers. Now it only needs to send the flow identifiers that test positive in the Bloom filter. Suppose the Bloom filter takes 20 Mb (20 bits per element) and the false-positive ratio is 0.1%. If there are 1,000 common flows, the traffic for transferring flow identifiers is $(1\text{M} \times 0.1\% + 1{,}000) \times 100 = 0.2$ Mb and the overall traffic for both the Bloom filter and the flow identifiers is 20.2 Mb. This brings down the traffic by around 80%.

The compressed Bloom filter [111] can be used in the previous solution to further reduce the communication cost. The compressed Bloom filter uses arithmetic coding to encode a sparse Bloom filter. It reduces the transmission size of the Bloom filter at the cost of compression/decompression at the sender/receiver.

In this chapter, we will introduce another approach called the adaptive Bloom filter, which is designed for efficient distributed joining of two sets. Our idea is to iteratively eliminate
elements that are not in the common set. We found that when the join is highly selective (the number of common elements is negligible compared to the size of each set), the adaptive Bloom filter outperforms both the Bloom filter and the compressed Bloom filter.

5.2 System Model

5.2.1 Problem Definition
Assume there are two sets of elements residing at different locations in the network. Without loss of generality, we call one of the locations the local host, and the other the remote host. We denote the two sets as $S_l$ and $S_r$ respectively. Our goal is for the local host to find out the set T of common elements between the two sets, i.e. $T = S_l \cap S_r$.

Exactly finding T can be expensive if the sets are very large. A large message containing element information has to be sent over the network. If the operation is performed frequently, it could have a great impact on the network traffic. Alternatively, we may allow a few elements that do not belong to T to be included in the final result. In other words, we want to identify a set $T'$ such that $T \subseteq T'$, and $\frac{|T' - T|}{|T|}$ is small. This will enable us to compactly encode the sets using efficient data structures such as Bloom filters. If the goal is to exactly identify T, we can send over the elements in $T'$ (instead of the whole set) for exact checking. Tremendous savings are still possible.

5.2.2 Performance Metrics
The first performance metric evaluates the accuracy of $T'$. We define the false-positive ratio as the probability for an element in $S_l - T$ to be included in $T'$, i.e.,

$$f_p = \frac{|(T' \cap S_l) - T|}{|S_l - T|}. \qquad (5)$$
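
As a small illustration of this metric, the following sketch computes the false-positive ratio for explicitly given sets. The function name `false_positive_ratio` and the example sets are hypothetical; `t_approx` plays the role of the approximate join result that may contain a few extra elements.

```python
def false_positive_ratio(s_l, s_r, t_approx):
    """Fraction of non-common local elements that end up in the approximate result.

    s_l, s_r : local and remote sets
    t_approx : approximate join result; must contain the exact join s_l & s_r
    """
    t = s_l & s_r                            # exact join T
    if not t <= t_approx:
        raise ValueError("the approximate result must contain every common element")
    non_common = s_l - t                     # S_l - T
    if not non_common:
        return 0.0                           # no element can be a false positive
    return len((t_approx & s_l) - t) / len(non_common)


s_l = set(range(1, 11))                      # local set {1, ..., 10}
s_r = {8, 9, 10, 11}                         # remote set
t_approx = {3, 8, 9, 10}                     # exact join {8, 9, 10} plus one extra element
print(false_positive_ratio(s_l, s_r, t_approx))   # 1/7, about 0.143
```

Here one of the seven non-common local elements is wrongly kept, so the ratio is 1/7.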

The second performance metric describes the traffic overhead brought to the network. We define the message size as the number of bits sent and received by the local host, denoted as $M$.² The goal is to reduce the message size $M$ and the false positive ratio $f_p$ as much as possible. In fact, there is a tradeoff between the message size and the false positive ratio, regardless of what data structure is used to encode the message. Therefore, following the convention of [134], we bound the false positive ratio with a pre-defined parameter $\epsilon$ and compare the message sizes of different approaches. With this performance constraint, we define the optimization problem as follows: minimize the message size $M$ under the constraint $f_p \le \epsilon$.

² To simplify the analysis, we do not consider the various headers added for physically transferring the message.

5.3 Bloom Filter Approaches

5.3.1 Bloom Filter (BF)

If Bloom filters are used, we can ask the remote host to send a Bloom filter encoding all its elements $S_r$, denoted as $B_r$. On receiving the Bloom filter, the local host checks its own elements one by one. If an element tests as a member of $B_r$, we insert it into the result set $T'$.

In the following, we prove that $T \subseteq T'$. From our description, $T' = B_r \cap S_l$, where $B_r$ here stands for the set of elements encoded by the Bloom filter. As Bloom filters have false positives, $B_r$ is slightly larger than $S_r$, and $S_r \subseteq B_r$. As a result, $T = S_r \cap S_l \subseteq B_r \cap S_l = T'$. We have $T \subseteq T'$.

For an element in $S_l - T$, the probability for it to be in $T'$ is the probability for an arbitrary element to be in $B_r$, which is exactly the false positive ratio of $B_r$. Recall that the false positive ratio of a Bloom filter is

\[ f_B = \left(1 - \left(1 - \frac{1}{m}\right)^{nk}\right)^k \approx \left(1 - e^{-\frac{nk}{m}}\right)^k, \]
where $m$ is the size of the Bloom filter $B_r$, $n = n_r$ is the number of elements in $S_r$, and $k$ is the number of hash functions used. Please refer to Chapter 3, Section 3.2.1 for the detailed derivation. When $k = \frac{m}{n}\ln 2$ is used, the false positive ratio is minimized with respect to $m$ and $n$. The optimal false positive ratio is

\[ f_B = \left(1 - e^{-\frac{m}{n}\ln 2 \cdot \frac{n}{m}}\right)^{\frac{m}{n}\ln 2} = e^{-\frac{m}{n}(\ln 2)^2}. \]

Applying the performance constraint $f_B \le \epsilon$ to this optimum, we have

\[ f_B = e^{-\frac{m}{n}(\ln 2)^2} \le \epsilon. \]

Therefore,

\[ m \ge -\frac{n \ln \epsilon}{(\ln 2)^2}. \]

From this bound, we know that in order to satisfy the false positive requirement, the minimum size of the Bloom filter is $M = -\frac{n_r \ln \epsilon}{(\ln 2)^2}$.

5.3.2 Compressed Bloom Filter (CBF)

The compressed Bloom filter was proposed by Mitzenmacher in [111] to reduce the size of Bloom filters when they are transmitted as messages. The idea is to use a lossless compression algorithm to compress the Bloom filter before transmission and decompress it at the receiver. The real size of the Bloom filter can be very large, yet it can be compressed for efficient transmission.

If CBF is used for the problem of this chapter, the process is similar to that with the Bloom filter: a CBF $CB_r$ is generated at the remote host, compressed, and sent to the local host. The local host receives it and decompresses it back to $CB_r$. It then checks its elements one by one to get the final result set $T'$.

Let $m$ be the size of the Bloom filter before compression and $M$ be the size of the transmitted message after compression. If an optimal compressor exists, the message size can be $M = mH(p)$, where $p$ is the probability for a bit to be '0', and $H(p) =$
$-p\log_2 p - (1-p)\log_2(1-p)$ is the entropy function. The authors of CBF suggest using arithmetic coding [113, 159] to achieve near-optimal compression.

It has been proven that the number of hash functions that minimizes the false positive rate without compression in fact maximizes the false positive rate with compression. The number $k$ of hash functions that minimizes the false positive ratio with compression is either 0 or infinity [111]. As we need at least one hash function, and it is impractical to have an infinite number of hash functions, we use $k = 1$ as the optimal value. When $k = 1$, the probability $p$ for a bit to be '0' is

\[ p = e^{-k n_r/m} = e^{-n_r/m}, \]

and the false positive ratio of CBF is

\[ f_{CB} = (1-p)^k = 1 - p. \]

Applying these two expressions to the constraint $f_{CB} \le \epsilon$, we have

\[ m \ge -\frac{n_r}{\ln(1-\epsilon)}. \]

From $M = mH(p)$, we have

\[ M = mH(p) = m\left[-p\log_2 p - (1-p)\log_2(1-p)\right] = \frac{n e^{-n/m}}{\ln 2} - m\left(1 - e^{-n/m}\right)\log_2\left(1 - e^{-n/m}\right). \]

From this expression, we can numerically compute the minimum message size $M$, or we can take the following approximation for $M$, where $m$ is minimized:

\[ M = mH(p)\Big|_{m = -n_r/\ln(1-\epsilon)} = \frac{n_r}{\ln 2}\left[1 - \epsilon + \frac{\epsilon \ln \epsilon}{\ln(1-\epsilon)}\right]. \]
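To get a feel for the two message sizes derived in this section, the following Python sketch evaluates the minimum Bloom filter size and the compressed Bloom filter message size for a given set size and false positive bound (written `eps` here). The function names and the sample values (one million elements, a bound of 0.1%) are illustrative assumptions, and an ideal compressor is assumed, as in the analysis above.

```python
import math


def bf_min_bits(n_r, eps):
    """Minimum Bloom filter size with the optimal number of hash functions."""
    return -n_r * math.log(eps) / (math.log(2) ** 2)


def binary_entropy(p):
    """H(p) in bits; defined as 0 at the end points."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def cbf_message_bits(n_r, eps):
    """Compressed size m * H(p) with k = 1 and the smallest admissible m."""
    m = -n_r / math.log1p(-eps)   # m = -n_r / ln(1 - eps)
    p = math.exp(-n_r / m)        # probability that a bit is '0'
    return m * binary_entropy(p)


def cbf_closed_form(n_r, eps):
    """Closed-form version of the same quantity."""
    return n_r / math.log(2) * (1 - eps + eps * math.log(eps) / math.log1p(-eps))


N_R, EPS = 1_000_000, 1e-3
bf_bits = bf_min_bits(N_R, EPS)
cbf_bits = cbf_message_bits(N_R, EPS)
```

Under these assumptions the Bloom filter needs roughly 14.4 bits per element, while the ideally compressed filter needs roughly 11.4 bits per element, at the price of a much larger uncompressed bit array.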

Figure 5-1. The partitioned Bloom filter with $k = 4$ segments.

5.4 Adaptive Bloom Filter (ABF)

5.4.1 Motivation

From the optimal false positive ratio derived in Section 5.3.1, $f_B = e^{-\frac{m}{n}(\ln 2)^2}$, we can see that the optimal false positive ratio of a Bloom filter depends only on $m$ and $n$.³ The compressed Bloom filter (CBF) increases the actual size $m$ of the Bloom filter to achieve a lower false positive ratio. This inspired us: can we decrease the number $n$ of elements instead? If we can, how can we achieve it?

³ In practice, the hash functions used can also affect the false positive ratio. Imperfect hash functions incur a higher false positive ratio than theoretical expectations [92].

First, let us see what $n$ really means. In the context of this chapter, the elements we are interested in are those in both sets. In other words, elements not belonging to the intersection all contribute to false positives. Let us name the elements in the intersection $T$ candidate elements, and those in $S_l \cup S_r - T$ non-candidate elements. In both the Bloom filter approach and the CBF approach, a portion of the non-candidates are encoded in $B_r$ or $CB_r$. As the false positive formulas suggest, the false positive ratio of a BF or CBF is bigger when the number of encoded elements increases. In a perfect scenario, we should let $B_r$ or $CB_r$ encode only the elements in $T$, which would theoretically give the lowest false positive ratio. But how can we achieve this in practice?

In order to reduce $n$, we first need to separate the membership bits of the elements. Our approach is inspired by the partitioned Bloom filter. In a partitioned Bloom filter, the bit array is divided into $k$ equal-sized sub-arrays. Each element maps to $k$ bits, one in each segment, as illustrated in Figure 5-1. In a partitioned Bloom filter, segments are
independent: each segment independently provides some filtering power to eliminate non-members. If an element is eliminated by the first $i$ segments, we do not need to check the $(i+1)$th segment. Thinking about this in reverse, if we know from the $i$th segment that an element is not in $T$, we do not need to encode it in the segments that follow. This leads to our solution.

5.4.2 Description

We first introduce a simplified version of our approach, and then generalize it to optimize the communication cost for arbitrary set sizes.

Our solution works in iterations. In each iteration, the local host constructs a segment of a partitioned Bloom filter, i.e., a bitmap, from its remaining elements. We denote it as $b^l_i$, where $i$ is the iteration number. Let $n_l$ be the number of elements in the local host. We use $\frac{n_l}{\ln 2}$ bits in the bitmap so that each bit in $b^l_i$ is '1' with a probability of $\frac{1}{2}$. The local host sends $b^l_i$ to the remote host. The remote host uses $b^l_i$ to filter its elements: if an element is mapped to a '1' in $b^l_i$, it stays; otherwise it is eliminated from the intersection. The probability for a non-candidate to be eliminated is the same as the probability for a bit in $b^l_i$ to be '0', which is $\frac{1}{2}$. After the elimination, the number of elements left in the remote host is roughly $n_c + \frac{n_r - n_c}{2}$, where $n_c = |T|$ is the number of common elements. The remote host encodes the remaining elements into another bitmap $b^r_i$ (using a different hash seed) with $\left(n_c + \frac{n_r - n_c}{2}\right)/\ln 2$ bits, so that $b^r_i$ has roughly the same number of '0's and '1's. The remote host then sends $b^r_i$ back to the local host. On receiving $b^r_i$, the local host performs the same elimination on its remaining elements. After $-\log_2 \epsilon$ iterations, the required false positive ratio is met. Figure 5-2 illustrates this approach.

However, when one set is much larger than the other, the above solution is not optimal. We need filtering power greater than $\frac{1}{2}$ to quickly rule out non-candidate elements from the large set. Therefore, we propose a generalization of the above iterative bitmap approach, called the adaptive Bloom filter (ABF). Instead of exchanging bitmaps in each iteration, we
exchange Bloom filters. Note that we may also send multiple bitmaps, but a Bloom filter with $k$ hash functions is more powerful than $k$ bitmaps with the same overall space.

Figure 5-2. A simplified adaptive Bloom filter.

5.4.3 Analysis

In an iterative bitmap, all bitmaps transmitted have the same property: a bit is '1' with a probability of $\frac{1}{2}$. After each iteration, the local host receives a bitmap from the remote host. The filtering power of the bitmap is $\frac{1}{2}$. The probability for a non-candidate element in $S_l$ to remain un-eliminated after the $k$th iteration is $\left(\frac{1}{2}\right)^k$. Considering the false positive constraint, we have $\left(\frac{1}{2}\right)^k \le \epsilon$, so

\[ k \ge \log_{\frac{1}{2}} \epsilon = -\log_2 \epsilon, \]

where $k$ is the number of iterations.

In each iteration, about half of the non-candidate members can be eliminated. Let $n^l_i$ be the number of non-candidate elements left in the local host after the $i$th iteration. Then $n^l_{i+1} = \frac{1}{2} n^l_i$. The number of bits that the local host sends in the $i$th iteration is $\frac{n_c + n^l_i}{\ln 2}$. As a result, the total number of bits that the local host sends in $k$ iterations is

\[ M_{sent} = \sum_{i=1}^{k} \frac{1}{\ln 2}\left(n_c + \frac{n_l - n_c}{2^{i-1}}\right) = \frac{k n_c}{\ln 2} + \frac{2^k - 1}{2^{k-1}\ln 2}(n_l - n_c). \]
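The geometric sum behind the number of bits sent can be checked numerically. The following Python sketch, with hypothetical function names and sample values, computes the required number of iterations from the false positive bound (written `eps` here) and compares the term-by-term sum with the closed form.

```python
import math


def iterations_needed(eps):
    """Smallest integer k with (1/2)**k <= eps."""
    return math.ceil(-math.log2(eps))


def m_sent_sum(n_l, n_c, k):
    """Bits sent by the local host, summed iteration by iteration."""
    return sum(
        (n_c + (n_l - n_c) / 2 ** (i - 1)) / math.log(2)
        for i in range(1, k + 1)
    )


def m_sent_closed(n_l, n_c, k):
    """Closed form of the same sum."""
    return (k * n_c + (2 ** k - 1) / 2 ** (k - 1) * (n_l - n_c)) / math.log(2)


# Sample values (assumptions for illustration only).
N_L, N_C, EPS = 1_000_000, 1_000, 1e-3
iterations = iterations_needed(EPS)
sent_bits = m_sent_sum(N_L, N_C, iterations)
```

With these values ten iterations are needed, and the common elements contribute a fixed cost in every iteration while the non-candidate cost halves each round.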

Similarly, let $n_r$ be the number of elements in $S_r$. We can find the number of bits received by the local host, which is

\[ M_{received} = \sum_{i=1}^{k} \frac{1}{\ln 2}\left(n_c + \frac{n_r - n_c}{2^{i}}\right) = \frac{k n_c}{\ln 2} + \frac{2^k - 1}{2^{k}\ln 2}(n_r - n_c). \]

Therefore, the total message size is

\[ M = M_{sent} + M_{received} = \frac{2k n_c}{\ln 2} + \frac{2^k - 1}{2^k \ln 2}(2n_l + n_r - 3n_c) \approx \frac{1}{\ln 2}\left[2n_l + n_r + (2k - 3)n_c\right] = \frac{1}{\ln 2}\left[2n_l + n_r + (-2\log_2 \epsilon - 3)n_c\right]. \]

From the asymmetry in this expression, we can see that the local host contributes more to the final message size. This is because the local host sends a filter out first, while the remote host is able to filter out half of its non-candidates before its first transmission. In practice, if $n_l > n_r$, we may ask the remote host to send its bitmap first. In that case, the larger set ($S_l$) is reduced first and the overall message size is reduced.

Another observation is that this approach is sensitive to the number of common elements. Intuitively, elements in the common set are never eliminated, so they are encoded into the message in every iteration. However, in many applications $n_c$ is very small compared to $n_l$ and $n_r$. One example is scanner detection, where the proportion of scanners is very small compared to normal users. Another example is keyword search, where the results matching multiple keywords are far fewer than those matching each single keyword [23, 135].

Now, if Bloom filters are exchanged instead of bitmaps, we have two series of new variables, $k^l_i$ and $k^r_i$, where $k^l_i$ represents the number of hash functions for the Bloom filter that the local host sends in the $i$th iteration, and $k^r_i$ stands for the number of hash functions for the Bloom filter that the remote host sends back in the $i$th iteration. We have
$\sum_{i=1}^{t} k^r_i = k$, where $t$ is the number of iterations. Now we have an optimization problem as follows: optimize $M$, with the constraints

$M = \frac{1}{\ln 2}\sum_{i=1}^{t}(n^l_i + n_c)k^l_i + \frac{1}{\ln 2}\sum_{i=1}^{t}(n^r_i + n_c)k^r_i$;
$n^l_1 = n_l$; $n^r_0 = n_r$;
$n^l_{i+1} = n^l_i f^{r_i}_p + \hat{n}_c$, for $1 \le i$

filter, a new family of Bloom filter variants designed for data sharing over a network. The idea is to iteratively eliminate non-candidates. We transform the problem into an optimization problem.
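The simplified iterative bitmap exchange of Section 5.4.2 can be sketched in Python as follows. This is a single-process simulation under stated assumptions, not the implementation evaluated in this work: SHA-256 with a per-round seed stands in for the unspecified hash functions, both hosts live in one process, and the names `bit_index`, `build_bitmap`, `keep_positive`, and `iterative_bitmap_join` are hypothetical.

```python
import hashlib
import math


def bit_index(element, seed, size):
    """Map an element to one bit position (SHA-256 is an assumed stand-in hash)."""
    digest = hashlib.sha256(f"{seed}:{element}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % size


def build_bitmap(elements, seed):
    """One-hash bitmap with about n / ln 2 bits, so roughly half the bits are set."""
    size = max(1, math.ceil(len(elements) / math.log(2)))
    bits = bytearray(size)
    for element in elements:
        bits[bit_index(element, seed, size)] = 1
    return bits


def keep_positive(elements, bitmap, seed):
    """Keep only the elements whose bit is set in the received bitmap."""
    return {e for e in elements if bitmap[bit_index(e, seed, len(bitmap))]}


def iterative_bitmap_join(s_local, s_remote, eps):
    """Return the local host's approximate intersection and the bits exchanged."""
    rounds = math.ceil(-math.log2(eps))
    local, remote = set(s_local), set(s_remote)
    total_bits = 0
    for i in range(rounds):
        b_l = build_bitmap(local, seed=2 * i)            # local -> remote
        remote = keep_positive(remote, b_l, seed=2 * i)
        b_r = build_bitmap(remote, seed=2 * i + 1)       # remote -> local
        local = keep_positive(local, b_r, seed=2 * i + 1)
        total_bits += len(b_l) + len(b_r)
    return local, total_bits


# Toy sets (assumptions): 100 common elements out of about 5,000 per host.
S_LOCAL = set(range(0, 5000))
S_REMOTE = set(range(4900, 10000))
result, total_bits = iterative_bitmap_join(S_LOCAL, S_REMOTE, eps=0.01)
```

Common elements always map to a set bit, so they are never eliminated; the result can only contain extra non-candidates, and their expected share shrinks by about half per round.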

CHAPTER 6
USING PARTITIONED BLOOM FILTERS FOR INFORMATION COLLECTION IN RFID SYSTEMS

6.1 From Space-Time Efficiency to Time-Energy Efficiency

Switching our perspective from the space domain to the time domain, and from the time domain to the energy domain, we found that space-time efficient data structures can also be used to design time-energy efficient RFID protocols. In a typical computer/network system, information is stored in memory as 0/1 bits. In a wireless environment such as an RFID system, by contrast, information is sent/received in continuous time slots while the protocol is executing. The total length of the time slots defines the execution time of the protocol. As a result, the space domain in computer/network systems becomes the time domain in RFID systems. Meanwhile, RFID tags send/receive information to/from one or more time slots. The amount of data that a tag sends/receives determines its energy expenditure. Therefore, the time domain in computer/network systems (accessing/updating the data structure) can be mapped to the energy domain of RFID tags (sending/receiving information to/from time slots). As a result, designing a time-energy efficient RFID protocol shares the same discipline as implementing a space-time efficient data structure in a computer/network system.

6.2 RFID Background, Model, and Problem Definition

6.2.1 RFID System

Traditional barcodes can only be read at close range. RFID tags replace barcodes with electronic circuits that can transmit identification numbers wirelessly over a distance. The longer operational range makes them popular in automatic transportation payments, object tracking, and supply chain management [72, 119, 152]. A typical RFID system consists of one or multiple readers and numerous tags. Each tag carries a unique identifier (ID). Tags do not communicate amongst themselves; they communicate directly with the reader.

Passive tags are the most widely used today. They are cheap but do not have internal power sources. They rely on radio waves emitted from the reader for power and have small operational ranges of a few meters, which seriously limits their applicability. For example, consider a large warehouse in a distribution center of a major retailer, where hundreds of thousands of tagged commercial products are stored. In such an indoor environment, if we use passive tags, hundreds of RFID readers may have to be installed in order to access tags in the whole area, which is not only costly but also causes interference when nearby readers communicate with their tags simultaneously. Nor is it a good solution to use a mobile reader and walk through the whole area whenever we need information from tags.

To automate warehouse management at large scale, a much better choice is to use battery-powered active tags because of their long transmission ranges. The lifetime of these tags is determined by how their battery power is used. Energy conservation must be one of the top priorities in any protocol design that involves active tags.

With richer on-tag resources, active tags are likely to gain more popularity in the future, particularly when their prices drop over time as manufacturing technologies improve and markets expand. These tags can be integrated with miniaturized sensors [112, 119, 130, 139]. Not only will they report their IDs, but they can also report dynamic, real-time information about the operation status of the tags or the conditions of the environment.

We study the general problem of how to design efficient information collection protocols that collect information from a subset of tags in a large RFID system. For example, consider a large chilled food storage facility, where each food item is attached with an RFID tag that has a thermal sensor. An RFID reader periodically collects temperature readings from tags to check whether any area is too hot (or too cold),
which may cause food spoilage (or energy waste).¹ Because each area in the facility may be packed with many food items, the temperature readings from these close-by tags are highly redundant. Hence, it is not necessary for the reader to collect information from all tags in the system. The reader may select a subset of tags each time to collect temperature information. In another example, an RFID reader periodically accesses the residual energy levels of on-tag batteries to see if some tags (or their batteries) need to be replaced. If the reader has information about which tags are new and which ones are old, it may choose to query only the old tags. As we will see later, it costs less energy to query a smaller number of tags. On the other hand, it is a harder problem to collect information from a subset of tags than from all tags, because the reader has to make sure that tags that are not under query do not transmit; their transmissions would interfere with the transmissions made by tags of interest, causing unnecessary energy waste.

¹ If a tag reports an abnormal temperature, the reader may instruct the tag to keep transmitting beacons, which guide a mobile signal detector to locate the tag.

Much existing research has focused on designing ID-collection protocols that read IDs from tags in an RFID system [1, 6, 19, 78, 115, 140, 143, 157, 162, 172, 173, 177]. In recent years, some interest has shifted to other functions such as estimating the number of tags in a system [55, 71, 72, 89, 129, 157, 170, 175, 176], detecting missing tags [83, 85, 100, 102, 152] or misplaced tags [14], and tag authentication and privacy [42, 87, 95, 164]. The primary performance objective in most papers is to minimize the execution time it takes a protocol to read all tag IDs or perform other functions. Energy efficiency, particularly how to reduce energy consumption by the tags, is an under-studied subject. There exists prior work on energy-efficient protocols for estimating the number of tags [88], or anti-collision protocols that minimize the energy consumption of a mobile reader [70, 116]. To the best of our knowledge, we were the
first to investigate energy-efficient protocols for collecting information from a subset of tags in a large RFID system [131].

In this chapter, we first show that the standard, straightforward polling design is not energy-efficient because each tag has to continuously monitor the wireless channel and receive $O(m)$ tag IDs, which is energy-consuming if the number $m$ of tags that the reader needs to collect information from is large. We then show that a coded polling protocol (CP) is able to cut the amount of data each tag has to receive by half, which means that energy consumption per tag is also reduced by half. This is still far from our objective of reducing energy consumption to $O(1)$. We propose a novel tag-ordering polling protocol (TOP) that can reduce per-tag energy consumption by more than an order of magnitude compared with the coded polling protocol. We also reveal an energy-time tradeoff in the protocol design: per-tag energy consumption can be reduced to $O(1)$ at the expense of longer protocol execution time. We then apply partitioned Bloom filters to enhance the performance of TOP, such that it can achieve much better energy efficiency without degradation in protocol execution time. Finally, we show how to configure the new protocols for time-constrained energy minimization.

6.2.2 System Model

We consider a large RFID system using active tags. Each tag carries a unique ID and one or more sensors. It also has the capability of performing certain computations as well as communicating with the RFID reader wirelessly. The reader and the tags transmit with sufficient power such that they can communicate over a long distance. We assume that the RFID reader knows the IDs of all tags in the system by executing an ID-collection protocol, and that it has enough power supply.

6.2.3 Problem Definition

Let $N$ be the set of tags in the system and $n = |N|$. Let $M$ be a subset of tags, $m = |M|$, and $M \subseteq N$. Our objective is to design efficient information collection protocols that collect information from tags in $M$. An information collection protocol may be
scheduled to execute periodically. $M$ may change over time so that different subsets of tags are queried.

We have two performance objectives. The primary performance objective is to achieve energy efficiency. We want to minimize the average amount of energy that a tag spends during one execution of an information collection protocol. The energy expenditure of a tag has two components: (1) energy for transmitting its information (e.g., 32 bits) to the reader, and (2) energy for receiving the polling request and other information from the reader. The former is a small, fixed amount of energy that must be spent. The latter depends on the protocol design, as we will see shortly. It is a variable amount of energy that should be minimized. Simple protocol designs will cause all tags in the system, including those not in $M$, to be continuously active and unnecessarily receive a large amount of data from the reader for an extended period of time. How to minimize such energy cost is the focus of this chapter.

Our secondary performance objective is to reduce protocol execution time. RFID systems use low-rate communication channels. For example, in the Philips I-Code system, the rate from a reader to a tag is about 27 Kbps and the rate from a tag to a reader is about 53 Kbps. Low rates, coupled with a large number of tags, often cause long execution times for RFID protocols. To apply such protocols in a busy warehouse environment, it is desirable to reduce protocol execution time as much as possible.

Communication between the reader and tags is time-slotted. The reader's signal synchronizes the clocks of tags. Let $t_{tag}$ be the length of a time slot during which the reader is able to broadcast a tag ID, and $t_{inf}$ be the length of a time slot during which a tag is able to transmit its information.

6.2.4 Prior Art

Much existing work on RFID systems designs anti-collision ID-collection protocols, which read IDs from all the tags in the system. They mainly fall into two categories. One is ALOHA-based [19, 78, 140, 143, 157, 173], and the other is tree-based [1, 6, 115, 177]. The ALOHA-based protocols work as follows: The reader
broadcasts a query request. With a certain probability, each tag chooses a time slot in the current frame to transmit its ID. If there is a collision, the tag will continue participating in the next frame. This process repeats until all tags are identified successfully.

The tree-based protocols organize all IDs in a tree of ID prefixes [6, 115, 177]. Each in-tree prefix has two child nodes that have one additional bit, '0' or '1'. The tag IDs are leaves of the tree. The RFID reader walks through the tree. As it reaches an in-tree node, it queries for tags with the prefix represented by the node. When multiple tags match the prefix, they will all respond and cause a collision. Then the reader moves to a child node by extending the prefix with one more bit. If zero or one tag responds (in the one-tag case, the reader receives an ID), it moves up in the tree and follows the next branch. Another type of tree-based protocol tries to balance the tree by letting the tags randomly pick which branches they belong to [1, 18, 115].

ID-collection protocols can be adopted to solve our problem, but they are inefficient compared to our approaches. For ALOHA-based protocols, tags can send sensor information along with their IDs. But each tag needs to continuously listen to the channel and may send its sensor information multiple times due to collisions, which results in at least $O(n)$ per-tag energy consumption. For tree-based protocols, the reader can use an additional time slot to receive a tag's information when it becomes the sole leaf of a branch. However, all tags need to receive at least $O(m)$ prefixes, which is equivalent to the basic polling protocol design in Section 6.3.

Shen et al. proposed a data collection protocol in hybrid mobile networks consisting of wireless sensor nodes, RFID readers, and smart RFID tags [142]. Zheng and Li proposed a two-phase fast tag search protocol to find the subset of wanted tags in the readers' interrogation zone [174]. Chen, Zhang, and Xiao proposed efficient information collection protocols for sensor-augmented RFID networks [28]. Yue et al. designed efficient protocols to collect sensor information from RFID tags in multi-reader scenarios
[168]. A reader first detects which tags are located in its interrogation region. It then broadcasts a distributively constructed Bloom filter to the tags and waits for the tags' responses. A follow-up work [169] further analyzes the impact of cardinality estimation error and channel error on the execution time/overhead of the protocol. Min et al. proposed an iterative tag search protocol, which uses filtering vectors to iteratively filter out tags not of interest [24]. These methods, however, only improve time efficiency [24, 28, 142, 168, 169, 174] or infrastructure-cost efficiency [142]. Tag energy consumption is not considered. Our proposed methods, on the other hand, minimize tag energy consumption while constraining execution time.

6.3 Basic Polling Protocol (BP)

In a standard, straightforward polling protocol design, we simply let the RFID reader broadcast the tag IDs in $M$ one by one. After it transmits an ID, it waits for a time slot of $t_{inf}$ during which the corresponding tag transmits its information. Each tag continuously listens to the wireless channel. Whenever it receives an ID from the reader, the tag compares the received ID with its own ID. If they match, the tag transmits its information and then goes to sleep until the next scheduled execution of the protocol.

In the above protocol, each tag in $M$ has to receive $\frac{m}{2}$ IDs on average from the reader before it transmits. Each tag not in $M$ has to receive all $m$ IDs. The amount of energy spent by a tag in receiving such data grows linearly with respect to $m$. It takes a constant amount of energy for a tag to receive an ID and another constant amount of energy for it to transmit its information. The energy cost of the whole system is thus $O(nm)$. The protocol execution time is $m(t_{tag} + t_{inf})$.

We use a numerical example to explain the energy cost. Consider a military base that has a large warehouse storing 50,000 weapons, ammunition magazines, and other equipment, which are tagged with RFID sensors. Among them, there are 1,000 sensitive devices, from which an RFID reader needs to access information in order to make sure that they are in good condition or simply to confirm their presence (against
unauthorized removal). Let $e_r$ be the amount of energy a tag spends in receiving an ID and $e_s$ be the amount of energy a tag spends in transmitting its information. The total energy consumed by all tags for transmitting is $1{,}000\,e_s$, and the total energy consumed by all tags for receiving is about $50{,}000{,}000\,e_r$. Even though $e_r$ may be smaller than $e_s$, the total amount of energy spent by tags in receiving can be much greater than the amount spent in transmitting.

6.4 Coded Polling Protocol

We show that a coded polling protocol (CP) [25] is able to reduce the amount of data each tag has to receive by half. The protocol assumes that each tag ID carries an identification number and a CRC (cyclic redundancy code) for error detection. This requirement is satisfied by the EPCglobal Gen-2 standard, where each 96-bit tag ID contains a CRC checksum. The CRC is computed based on the identification number and a generator. When a tag receives an ID from a wireless channel, it computes a CRC based on the received identification number and then compares the result with the received CRC. If they are the same, we say the ID contains a valid CRC.

CRC has the following property: if $x$ and $y$ are two tag IDs with valid CRCs, then $x \oplus y$ also has a valid CRC. The same property does not hold for $x \oplus \hat{y}$, where $\hat{y}$ contains the same bits as $y$ but in reverse order. For example, if $y = 10110$, then $\hat{y} = 01101$. We call $\hat{y}$ the reversal of $y$.

In the coded polling protocol, the RFID reader first arranges the IDs in $M$ in pairs. Each pair consists of two IDs that are arbitrarily selected from $M$. Consider an arbitrary pair, $x$ and $y$, which are called each other's pairing ID. We define the polling code of the pair as $c = x \oplus \hat{y}$. Instead of sending out the IDs in $M$ one after another, the reader broadcasts the polling code of each pair one after another. After each broadcast of a polling code $c = x \oplus \hat{y}$, the reader waits for two time slots, during which tag $x$ and tag $y$ will transmit.

More specifically, when an arbitrary tag $z$ receives the polling code $c$, it first computes
$z \oplus c$ and checks whether the CRC in the reversal of $z \oplus c$ is valid. If it is, the tag will transmit its information. Otherwise, the tag computes $\hat{z} \oplus c$ and checks whether the CRC in $\hat{z} \oplus c$ is valid. Again, if it is valid, the tag will transmit. Otherwise, the tag will not transmit.

We show that only tag $x$ and tag $y$ will transmit. First, consider the case of $z = x$. The tag first computes $z \oplus c = x \oplus x \oplus \hat{y} = \hat{y}$. The reversal of $\hat{y}$ is $y$. The CRC in any tag ID (including $y$) is valid. Hence, tag $x$ will transmit. Moreover, it now knows its pairing ID, $y$. If $x$ is greater than $y$, the tag will transmit in the first slot after receiving the polling code; otherwise, it will transmit in the second slot.

Second, we consider the case of $z = y$. The tag first computes $y \oplus c = y \oplus x \oplus \hat{y}$. Its reversal is likely to have an invalid CRC; the chance for an arbitrary number to contain a valid CRC is very small. Then, the tag computes $\hat{z} \oplus c = \hat{y} \oplus x \oplus \hat{y} = x$, which contains a valid CRC. Consequently, $y$ will transmit. Since it now knows its pairing ID, $x$, it also knows in which slot it should transmit.

Finally, consider the case of $z \ne x$ and $z \ne y$. The tag computes the reversal of $z \oplus c = z \oplus x \oplus \hat{y}$ and then computes $\hat{z} \oplus c = \hat{z} \oplus x \oplus \hat{y}$. Both of them are likely to have invalid CRCs.

A minor problem is that $y \oplus c$ in the second case and $z \oplus c$ or $\hat{z} \oplus c$ in the third case still have a small probability of containing a valid CRC. However, the reader can easily prevent this from happening. It knows all tag IDs. It can precompute all polling codes and check whether a valid CRC happens in the above cases by chance when it is not supposed to. If this is true for a pair of tags, $x$ and $y$, the reader must break up the pair and use them to form new pairs with other IDs in $M$. Such an approach is effective because the probability of this happening is exceedingly small when the CRC is sufficiently long.

Because each polling code represents two tag IDs, the number of polling codes in CP is $\frac{m}{2}$. Hence, compared with the basic polling protocol, CP reduces the number of broadcasts made by the reader by half, and it also reduces the amount of
data that each tag has to receive by half. This not only saves energy for tags, but also reduces the protocol execution time to $\frac{m}{2}t_{tag} + m\,t_{inf}$.

6.5 Tag-Ordering Polling Protocol (TOP)

Although CP is more efficient, the expected amount of energy that each tag spends in receiving remains $O(m)$. In this section, we propose a new tag-ordering polling protocol that reduces such energy cost to $O(1)$.

6.5.1 Motivation

In the basic polling protocol, an RFID reader broadcasts $m$ IDs in time slots of length $t_{tag}$. All tags must continuously monitor the wireless channel in order to know whether their own IDs are in the broadcast. In CP, the reader broadcasts $\frac{m}{2}$ polling codes, also in time slots of length $t_{tag}$. Again, all tags must continuously monitor the wireless channel. They have to keep receiving and processing the polling codes. Each tag in the basic protocol has to receive up to $m$ IDs. Even though CP is more efficient, a tag still has to receive up to $\frac{m}{2}$ codes.

We want to remove the necessity for any tag to keep monitoring the wireless channel. Ideally, a tag should stay in an energy-conserving standby mode most of the time and only wake up at the right time slot to receive information about itself, such as whether it is polled and, if so, when it should transmit.

To further reduce the amount of data that tags have to receive, we let the reader broadcast a so-called reporting-order vector $V$ instead of the IDs in $M$. Each ID in $M$ is mapped to a bit in $V$ through a hash function; the bit is set to one to encode the ID in the vector. A tag only needs to check a specific bit in $V$ at a location determined by the hash of its ID. This bit is called the representative bit of the tag. If its value is one, the tag is polled by the reader for reporting, i.e., the tag belongs to $M$; if its value is zero, the tag is not polled.

The vector $V$ also carries information about the order in which the polled tags will report their data. Each bit whose value is one in $V$ represents a polled tag. If a tag finds that there are $i$ ones in $V$ preceding its representative bit, it knows that it should be the $(i+1)$th tag in
$M$ to report its information. With such an ordering, it becomes possible for tags in $M$ to report at different times and avoid collision.

However, this basic idea has two problems. First, there should be at least $m$ bits in $V$ to encode $m$ IDs in $M$. The energy cost of receiving $V$ remains $O(m)$. How can a tag find out the number of ones in $V$ preceding its representative bit without having to receive the whole vector? Second, hash collision causes two issues. If a tag not in $M$ is hashed to the same bit in $V$ as a tag in $M$, it will find its representative bit to be one, causing a false positive. If two tags in $M$ are mapped to the same bit in $V$, they will transmit at the same time, causing a report collision.

In the rest of this section, we design a new tag-ordering polling protocol (TOP) to solve these problems. It consists of three phases: the ordering phase, the polling phase, and the reporting phase. In the ordering phase, the reader broadcasts the vector $V$ so that each tag knows whether it is polled and where it is located in the reporting order. The polling phase resolves the issues of false positives and report collisions. Finally, in the reporting phase, tags in $M$ report their information in the order defined by $V$ without collision.

6.5.2 Protocol Description

6.5.2.1 Ordering Phase

The RFID reader does not broadcast any IDs or indices. It only broadcasts the reporting-order vector, $V$. If $V$ cannot fit in one time slot of length $t_{tag}$, the reader breaks the vector into segments and broadcasts each segment in a time slot of $t_{tag}$. In addition, the reader also broadcasts the vector size $v$.

Knowing the vector size, a tag $t$ is able to hash its ID and find out the location of its representative bit in $V$. Because the segment size is fixed, $t$ also knows which segment its representative bit belongs to. This segment, denoted as $V_t$, is called the representative segment of tag $t$. A tag stays in standby mode and is active only when receiving its representative segment.
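On the tag side, locating the representative segment takes one hash and one integer division. The following Python sketch illustrates this under assumptions that the text does not fix: SHA-256 stands in for the hash function, tag IDs are treated as 96-bit integers, and the segment carries 80 bits of the vector (the value used as an example later in Section 6.5.2.4). The function names are hypothetical.

```python
import hashlib

SEGMENT_BITS = 80  # assumed number of vector bits carried per time slot


def representative_bit(tag_id: int, v: int) -> int:
    """Position of the tag's representative bit in a vector of v bits."""
    digest = hashlib.sha256(tag_id.to_bytes(12, "big")).digest()  # 96-bit ID
    return int.from_bytes(digest[:8], "big") % v


def representative_segment(tag_id: int, v: int, segment_bits: int = SEGMENT_BITS):
    """Return (segment index, bit offset inside that segment)."""
    position = representative_bit(tag_id, v)
    return position // segment_bits, position % segment_bits


# Example with a made-up tag ID and a 4,000-bit vector (50 segments).
segment_index, offset = representative_segment(0x1234ABCD, 4000)
```

A tag would stay in standby until the time slot numbered `segment_index` and then inspect only the bit at `offset` and the bits before it in that segment.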

Figure 6-1. $V_t$ is the representative segment of tag $t$, $x_t$ is the total number of ones in all previous segments, and $y_t$ is the number of ones in $V_t$ that precede tag $t$'s representative bit. $I_t$ is the position of $t$ in the reporting order: $I_t = x_t + y_t$.

If a tag finds that its representative bit is zero, it knows for sure that it is not a member of $M$. If a tag finds that its representative bit is one, it may be a member of $M$ or a non-member that is mapped to a bit to which a member of $M$ is also mapped. The latter case causes a false positive. Because the reader knows all IDs in the system, it can pre-compute the set $F$ of non-member tags that cause false positives.

When the reader broadcasts any segment of $V$, it includes in the same time slot the total number of ones in the previous segments. For an arbitrary tag $t$, let $I_t$ be the number of ones in $V$ preceding the representative bit of $t$. When tag $t$ receives $V_t$, it can compute $I_t$ as the sum of (a) the number of ones in the previous segments and (b) the number of ones in $V_t$ before its representative bit. See Figure 6-1 for an illustration. As we will see later, the value of $I_t$ specifies when tag $t$ will transmit during the reporting phase.

If two tags in $M$ are mapped to the same bit in $V$, they will have the same $I_t$ value and thus transmit at the same time during the reporting phase, causing a collision. Because the reader has all IDs in $M$, it knows exactly which tags will be mapped to the same bit. This makes it easy to resolve collisions. For each such bit, the reader removes all but one of the tags mapped to it and puts the removed tags in a set $C$. These tags, together with the tags in $F$, will not participate in the reporting phase. They are handled separately in the polling phase.
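The reader-side bookkeeping of the ordering phase can be sketched as follows. This Python example is illustrative only: SHA-256 is an assumed stand-in for the hash function, tag IDs are small integers padded to 96 bits, the vector and segment sizes are made up, and all function names are hypothetical. It builds the reporting-order vector, the sets of colliding and false positive tags, and each reporting tag's index, and it derives the per-segment counts of preceding ones that the reader broadcasts.

```python
import hashlib


def representative_bit(tag_id: int, v: int) -> int:
    """Position of the tag's representative bit (assumed hash: SHA-256)."""
    digest = hashlib.sha256(tag_id.to_bytes(12, "big")).digest()
    return int.from_bytes(digest[:8], "big") % v


def build_reporting_order(all_tags, polled, v):
    """Return (vector bits, collision set C, false positive set F, reporting index per tag)."""
    bits = [0] * v
    owner = {}          # bit position -> the one polled tag kept for reporting
    collided = set()    # polled tags removed because their bit is already taken
    for tag in sorted(polled):
        position = representative_bit(tag, v)
        if position in owner:
            collided.add(tag)
        else:
            owner[position] = tag
            bits[position] = 1
    false_positives = {
        tag for tag in all_tags
        if tag not in polled and bits[representative_bit(tag, v)] == 1
    }
    order, ones_seen = {}, 0
    for position in range(v):
        if bits[position]:
            order[owner[position]] = ones_seen  # number of ones preceding this bit
            ones_seen += 1
    return bits, collided, false_positives, order


def ones_before_segments(bits, segment_bits):
    """For each segment, the number of ones in all previous segments."""
    counts, total = [], 0
    for start in range(0, len(bits), segment_bits):
        counts.append(total)
        total += sum(bits[start:start + segment_bits])
    return counts


# Made-up system: 2,000 tags, the first 100 are polled, 800-bit vector.
ALL_TAGS = set(range(1, 2001))
POLLED = set(range(1, 101))
V_SIZE, SEGMENT_BITS = 800, 80
bits, C, F, order = build_reporting_order(ALL_TAGS, POLLED, V_SIZE)
prefix = ones_before_segments(bits, SEGMENT_BITS)
```

A tag can recover its index from `prefix` for its own segment plus the ones that precede its bit inside that segment, which is the sum described in the caption of Figure 6-1.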

6.5.2.2PollingPhaseInthisphase,thereaderissuestwotypesofpollingrequests.ForeachtaginC,itsendsapositivepollingrequest.ForeachtaginF,itsendsanegativepollingrequest.Todistinguishthesetwotypes,thereadermusttransmitaone-bitagtogetherwithatagIDineachrequest,specifyingwhetherthepollingispositiveornegativeandwhichtagispolled.Tagsthatndtheirrepresentativebitstobeonesinthepreviousphasemustcontinuouslylistentothechannelduringthepollingphase.Aftersendingapositiverequest,thereaderwaitsforatimeslottoreceiveinformation.ThetagthatndsitsIDintherequestwilltransmititsinformationinthisslot.Thistag,whichbelongstoC,willnotparticipateinthereportingphase.Aftersendinganegativerequest,thereaderdoesnotwaitbeforesendingoutthenextrequest.ThetagthatndsitsIDinanegativerequestknowsthatitmustbelongtoFandhenceshouldnotfurtherparticipateintheprotocolexecution.ThetotalnumberofpollingrequestsisjFj+jCj.Bychoosinganappropriatesizeforthereporting-ordervector,wecanmakesurethatjFj+jCj=O(1)(seeSection 6.6 ).NotethatonlytagsinMandFhavetolistentothechannelinthisphase.TagsinN)]TJ /F3 11.955 Tf 12.23 0 Td[(M)]TJ /F3 11.955 Tf 12.23 0 Td[(F,whichmaycontainthemajorityoftagsinthesystem,havealreadyknownthattheydonotbelongtoMandthusdonotneedtoparticipateintheprotocolexecution. 6.5.2.3ReportingPhaseAtagparticipatesinthereportingphaseonlyifitsatisesthefollowingtwoconditions:(1)itndsthatitsrepresentativebitisoneintheorderingphase,and(2)itdoesnotnditsIDintherequestsofthepollingphase.Thereportingphaseconsistsofm)-309(jCjtimeslots.Ineachtimeslot,onetaginM)]TJ /F3 11.955 Tf 12.88 0 Td[(Ctransmitsitsinformation.RecallthateachtaginMlearnsitsindexinthe 112

reporting order during the ordering phase. The tag will transmit in the reporting phase at the time slot of the same index.

6.5.2.4 Timing

Before executing the protocol, the RFID reader uses its broadcasting signal to synchronize the clocks of the tags. The reader computes the vector $V$ and breaks it into segments. Suppose each time slot of length $t_{tag}$ can carry 96 bits. We may set the segment size to be 80 bits and use the remaining 16 bits to carry the total number of ones in the previous segments (see footnote 2). The reader is able to compute the execution time $T_1$ of the ordering phase, which is the number of segments multiplied by $t_{tag}$. Since the reader knows all IDs in the system, it can precompute the set $F$ of tags that cause false positives and the set $C$ of tags that should not participate in the reporting phase in order to avoid collision. Based on $F$ and $C$, the reader can compute the execution time $T_2$ of the polling phase, which is $|F|\,t_{tag} + |C|\,(t_{tag} + t_{inf})$. Suppose all tags wake up at each scheduled execution of the protocol. The reader computes and broadcasts the values of $T_1$ and $T_2$ right before the ordering phase, so that the tags know when each phase of the protocol will begin. They will remain in the standby mode unless they have to receive their representative segments, participate in the polling phase, or transmit their information in the reporting phase. If the system requires on-demand polling of tag information instead of periodic execution, there are two possible solutions to wake the tags up in the first place. The first one is pseudo-on-demand polling, where tags still wake up periodically, but the reader only issues the polling request when needed. The second approach is to attach a wake-up circuit to each tag, and use the two-stage wake-up scheme proposed in [29]

Footnote 2: Using 16 bits to carry the number of ones in previous segments will limit the value of $m$ to $(0, 65{,}535]$. To get rid of this limitation, we can use $\lceil \log_2 m \rceil$ bits instead and broadcast the value of $\lceil \log_2 m \rceil$ to the tags at the beginning of the protocol. However, for the sake of simplicity, we use 16 bits in the following to help demonstrate the main idea.

to activate the tags. In this approach, tags respond almost immediately to the polling event. However, the wake-up circuit requires the reader to be close enough so that the radio power is strong enough to trigger the wake-up event. As a result, we may have to deploy extra readers to cover all the tags.

6.6 Performance Analysis of TOP

6.6.1 Energy Cost

We show how to configure TOP such that the energy cost per tag is $O(1)$. The energy cost of a tag has four components: (1) receiving $v$, $T_1$ and $T_2$, (2) receiving a segment of $V$ in the ordering phase, (3) listening to the channel during the polling phase, and (4) transmitting information in a slot at the reporting phase (or at the polling phase if the tag is in $C$). The first two components incur small, constant energy expenditure to every tag in the system. The fourth component also incurs small, constant energy cost, but only to the tags in $M$. The third component incurs energy cost only to tags in $F$ and $M$. In the worst case, a tag has to listen to all $|C| + |F|$ polling requests from the reader. Suppose it takes one unit of energy to receive a polling request. The total energy cost of a tag is then bounded by

$$|C| + |F| + O(1).$$

We treat $|C|$ and $|F|$ as random variables and derive their expected values. Recall that $v$ is the number of bits in the reporting-order vector $V$. Let $b_i$ be the value of the $i$th bit in $V$, $0 \le i < v$. [...]

of ones in $V$. Hence, we have

$$E(|C|) = m - \sum_{i=1}^{v} \mathrm{Prob}\{b_i = 1\} \approx m - v\,(1 - e^{-m/v}).$$

A tag not in $M$ will cause a false positive when its representative bit is one. The probability for this to happen is $\mathrm{Prob}\{b_i = 1\}$. Hence,

$$E(|F|) = (n - m)\,\mathrm{Prob}\{b_i = 1\} \approx (n - m)(1 - e^{-m/v}).$$

Both $E(|C|)$ and $E(|F|)$ are monotonically decreasing functions of $v$. We show that $E(|C|) = O(1)$ if $v$ is sufficiently large. Let $v = \frac{m^2}{2}$. From Taylor expansion, we know that

$$1 - e^{-m/v} = \frac{m}{v} - \frac{1}{2!}\left(\frac{m}{v}\right)^2 + \frac{1}{3!}\left(\frac{m}{v}\right)^3 - \frac{1}{4!}\left(\frac{m}{v}\right)^4 + \cdots \ge \frac{m}{v} - \frac{1}{2!}\left(\frac{m}{v}\right)^2.$$

Applying it to the expression for $E(|C|)$, we have

$$E(|C|) = m - v\,(1 - e^{-m/v}) \le \frac{1}{2!}\,\frac{m^2}{v} = 1.$$

Next we show that $E(|F|) = O(1)$ if $v$ is sufficiently large. If $n = m$, $E(|F|) = 0$. Now assume $n > m$. Let $v = -\frac{m}{\ln\left(1 - \frac{1}{n-m}\right)}$. Applying it to the expression for $E(|F|)$, we have

$$E(|F|) = (n - m)(1 - e^{-m/v}) = 1.$$

Therefore, if we choose $v = \max\left\{\frac{m^2}{2},\; -\frac{m}{\ln\left(1 - \frac{1}{n-m}\right)}\right\}$, the expected energy cost per tag is at most $E(|C|) + E(|F|) + O(1) \le 1 + 1 + O(1) = O(1)$. We conclude that TOP can be configured such that the expected energy cost per tag is $O(1)$. As we will see shortly, the protocol execution time increases when $v$ becomes too large. To strike a balance between energy cost and protocol execution time, we may choose a value of $v$ much smaller than $\max\left\{\frac{m^2}{2},\; -\frac{m}{\ln\left(1 - \frac{1}{n-m}\right)}\right\}$. In Section 6.10, we use simulations to study the performance of TOP under practical values of $v$. For example,

when $v = 24m$, the amount of data that a tag receives in TOP is more than an order of magnitude smaller than what a tag has to receive in CP. We characterize the energy cost in the polling phase by counting the amount of data (in kilobits) that a tag has to receive. Numerical results are shown in the first plot of Figure 6-2, where $n = 50{,}000$ and $m = 5{,}000$, $10{,}000$, or $25{,}000$, corresponding to the three curves in the plot. Clearly, as $v$ increases, the energy cost decreases.

6.6.2 Execution Time

The protocol execution time also consists of four components. To begin with, it takes the reader a small, constant time to broadcast $v$, $T_1$ and $T_2$. The time for the ordering phase is $\frac{v}{l}\,t_{tag}$, where $l$ is the segment size. The time for the polling phase is $|F|\,t_{tag} + |C|\,(t_{tag} + t_{inf})$. The time for the reporting phase is $|M - C|\,t_{inf}$. Hence, the total execution time is

$$T = \left(\frac{v}{l} + |F| + |C|\right) t_{tag} + m\,t_{inf} + O(1).$$

From the expected values of $|C|$ and $|F|$ derived in Section 6.6.1, the expected protocol execution time is

$$E(T) = \left[\frac{v}{l} + (n - m)(1 - e^{-m/v}) + m - v\,(1 - e^{-m/v})\right] t_{tag} + m\,t_{inf} + O(1) \approx \left[\frac{v}{l} + (n - m)\frac{m}{v}\right] t_{tag} + m\,t_{inf} + O(1).$$

The second plot of Figure 6-2 presents the protocol execution time (excluding the constant $O(1)$) when $n = 50{,}000$, $m = 5{,}000$, $10{,}000$, or $25{,}000$, $t_{tag} = 3927\,\mu s$, and $t_{inf} = 906\,\mu s$; see Section 6.10 for how they are determined. Interestingly, as $v$ increases, the execution time first decreases and then increases. We can find the optimal value of $v$ that minimizes the execution time from $\frac{\partial E(T)}{\partial v} = 0$. Combining the results in the first and second plots, we can figure out the tradeoff relation between energy cost and protocol execution time, which is presented in the third plot. As $v$ becomes large, the energy cost decreases at the expense of increased execution time.
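The closed-form expectations of Sections 6.6.1 and 6.6.2 are easy to evaluate numerically. The following sketch is illustrative only: the function names are hypothetical, the $O(1)$ term is dropped, and the default slot lengths are the values computed in Section 6.10 ($t_{tag} = 3927\,\mu s$, $t_{inf} = 906\,\mu s$), expressed in seconds.

```python
import math

def expected_C(m, v):
    """E(|C|) ~ m - v(1 - e^{-m/v}): members removed to avoid collisions."""
    return m - v * (1 - math.exp(-m / v))

def expected_F(n, m, v):
    """E(|F|) ~ (n - m)(1 - e^{-m/v}): non-members causing false positives."""
    return (n - m) * (1 - math.exp(-m / v))

def expected_time(n, m, v, l=80, t_tag=3927e-6, t_inf=906e-6):
    """E(T) in seconds, ignoring the O(1) broadcast of v, T1 and T2."""
    slots = v / l + expected_F(n, m, v) + expected_C(m, v)
    return slots * t_tag + m * t_inf

if __name__ == "__main__":
    n, m = 50_000, 5_000
    for factor in (4, 8, 16, 27, 32, 64):
        v = factor * m
        energy = expected_C(m, v) + expected_F(n, m, v)   # polling requests heard
        print(f"v = {factor:2d}m  requests = {energy:9.1f}  "
              f"E(T) = {expected_time(n, m, v):6.2f} s")
```

With these formulas the number of polling requests a tag may have to hear falls monotonically as $v$ grows, whereas the expected time reaches its minimum near $v \approx \sqrt{l\,(n-m)\,m}$, which is where the approximate form of $E(T)$ has zero derivative.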

Figure 6-2. Energy, time, and energy-time trade-off of TOP. First plot: Energy cost per tag with respect to $v$. Second plot: Protocol execution time with respect to $v$. Third plot: Energy-time tradeoff controlled by $v$.

6.6.3 Choosing v for Time-constrained Energy Minimization

Recall that the performance objectives of TOP are energy efficiency and time efficiency. However, as shown in Figure 6-2, we may not be able to achieve the best performance in both metrics using one configuration. Below we study how to configure TOP for time-constrained energy minimization. Consider a warehouse with a large number of RFID-tagged goods. Suppose the system administrator wants to maximize the tags' battery lifetime, but there is a requirement on the execution time of a polling operation because excessively long execution time increases the chance of interfering with other scheduled tasks. From the previous analysis, we know that the protocol execution time is treated as a random variable. Let $T$ be the execution time of TOP, $B$ be a pre-defined time bound, and $\alpha$ be a probability value, $0 < \alpha < 1$. The time constraint can be specified in a probabilistic way,

$$\mathrm{Prob}\{T \le B\} \ge \alpha.$$

Our performance objective is to find the optimal value of $v$ that minimizes the energy cost, subject to the above constraint. As shown in the first plot of Figure 6-2, the energy cost decreases as the size of the reporting-order vector, $v$, increases. Hence, our goal becomes finding the largest $v$ that satisfies this constraint. In the following, we derive $\mathrm{Prob}\{T \le B\}$ as a function of $v$. Based on this function, we will be able to compute the optimal value of $v$. Let $d$ be the total number of ones in $V$ after encoding tags in $M$, $0 <$ [...]

When a tag not in $M$ is mapped to a bit that is one, a false positive happens. The reader puts all false-positive tags into $F$. When there are $x$ bits that are ones in $V$, the conditional false-positive probability is $\frac{x}{v}$. Thus,

$$\mathrm{Prob}\{\text{false positive} \mid d = x\} = \frac{x}{v}.$$

Obviously, when $d = x$, the total number of false-positive tags follows a binomial distribution $\mathrm{Bino}(n - m, \frac{x}{v})$:

$$\mathrm{Prob}\{|F| = f \mid d = x\} = \binom{n-m}{f}\left(\frac{x}{v}\right)^{f}\left(1 - \frac{x}{v}\right)^{n-m-f}.$$

Let $S$ be the union of $C$ and $F$, so $|S| = |C| + |F|$. The probability distribution of $|S|$ is

$$\begin{aligned}
\mathrm{Prob}\{|S| = s\} &= \sum_{c=0}^{s} \mathrm{Prob}\{|F| = s - c \mid |C| = c\}\,\mathrm{Prob}\{|C| = c\} \\
&= \sum_{x=1}^{m} \mathrm{Prob}\{|F| = s - m + x \mid d = x\}\,\mathrm{Prob}\{d = x\} \\
&= \sum_{x=1}^{m} \binom{n-m}{s-m+x}\left(\frac{x}{v}\right)^{s-m+x}\left(1 - \frac{x}{v}\right)^{n-s-x} p_d(m, v, x).
\end{aligned}$$

Adopt the execution-time formula of Section 6.6.2 and ignore $O(1)$, a small constant time for the reader to broadcast $v$, $T_1$ and $T_2$, which is negligibly small when compared with the other components on the right side of that formula. We have

$$\mathrm{Prob}\{T \le B\} = \sum_{s=0}^{s_{max}} \mathrm{Prob}\{|S| = s\},$$

where $s_{max} = \frac{B - m\,t_{inf}}{t_{tag}} - \frac{v}{l}$. We denote the right side of this equation as $P_t(v, B)$, which is the probability for the protocol execution time to be bounded by $B$ under a certain value of $v$. It is computable as a function of $v$ and $B$ after the distribution of $|S|$ above is applied and parameters $m$ and $n$ are given. We want to find the largest value of $v$ that satisfies the inequality $P_t(v, B) \ge \alpha$. Our numerical computation shows that, given a fixed value of $B$, $P_t(v, B)$ is not a

monotonic function with respect to $v$. Hence, we cannot directly apply the bisection search method to find the largest $v$ that satisfies $P_t(v, B) \ge \alpha$. We may use the False Position algorithm [16] to find the optimal value of $v$. The computation overhead is reasonable. For $n = 10{,}000$, $m = 1{,}000$, $B = 4$ seconds, and $\alpha = 99\%$, it takes an Apple MacBook (2.4 GHz CPU and 4 GB memory) 3 seconds to find the optimal $v = 60{,}160$. And for $n = 10{,}000$, $m = 1{,}000$, $B = 3$ seconds, and $\alpha = 99\%$, it takes the same computer 16 seconds to find that no $v$ can satisfy the requirement, because $B = 3$ seconds is smaller than the minimum execution time that TOP can achieve.

Figure 6-3. Bound $B$ that satisfies $\mathrm{Prob}\{T \le B\} \ge \alpha$ with respect to $v$. Parameters: $n = 10{,}000$, $m = 1{,}000$.

As a related problem, if $v$ and $\alpha$ are given, we can also use $P_t(v, B)$ to compute the time bound that TOP can achieve. More specifically, given a value of $v$, we are able to find the smallest $B$ that satisfies $P_t(v, B) \ge \alpha$ through bisection search: Recall that $P_t(v, B)$ is the formula for $\mathrm{Prob}\{T \le B\}$, the probability for the protocol execution time to be bounded by $B$. Clearly, it is an increasing function of $B$ with $P_t(v, 0) = 0$ and $P_t(v, +\infty) = 1$. We choose a small value $B_1$ (e.g., 0) such that $P_t(v, B_1) <$ [...]

$P_t(v, B)$. Let $n = 10{,}000$ and $m = 1{,}000$. Figure 6-3 shows the smallest bound $B$ with respect to $v$ when $\alpha = 90\%$, $95\%$ and $99\%$, which correspond to the three curves in the figure.

6.6.4 Computing $p_d(m, v, x)$: the Balls-and-Bins Algorithm

Problem: Suppose we throw $m$ balls into $v$ empty bins. Each ball is thrown to a random bin, and each bin can hold an unlimited number of balls. We want to find the probability that after $m$ balls are thrown, $x$ bins are not empty, denoted as $p_d(m, v, x)$.

Solution: There are many solutions to this problem. We now provide a recursive one. Assume that after we throw $m$ balls, there are $x$ non-empty bins, $1 \le x \le m$. When $x > 1$, there are two possibilities of where the $m$th ball goes: (1) If the $m$th ball is placed into a previously empty bin, there should be $x - 1$ non-empty bins after $m - 1$ balls were thrown, and the probability for this to happen is $\frac{v - x + 1}{v}$; (2) Otherwise, if the $m$th ball goes to a previously non-empty bin, there must be $x$ non-empty bins after $m - 1$ balls were thrown, and the probability of this option is $\frac{x}{v}$. Thus,

$$p_d(m, v, x) = \begin{cases}
1, & x = m = 1, \\
\frac{x}{v}\,p_d(m - 1, v, x) + \frac{v - x + 1}{v}\,p_d(m - 1, v, x - 1), & 1 \le x \le m \text{ and } x \le v, \\
0, & \text{all other cases.}
\end{cases}$$

$p_d(m, v, x)$ can be calculated by simple dynamic programming.

6.7 Enhanced Tag-Ordering Polling Protocol (ETOP)

6.7.1 Motivation

If we do not want to significantly increase execution time, we cannot choose a large value for $v$. In this case, we must find other means to lower energy cost. The key is to reduce the number of IDs that have to be transmitted in the polling phase. Namely, we should reduce the number of tags in $F$ and $C$. Let us first focus our discussion on false positives. Consider an arbitrary tag $t \notin M$. Its representative segment is $V_t$. Let $q$ be

the number of tags in $M$ that are also mapped to $V_t$. A false positive occurs if $t$ and one of those $q$ tags have the same representative bit. The probability for this to happen is $1 - (1 - \frac{1}{l})^{q}$, where $l$ is the number of bits in $V_t$. To further reduce the false-positive probability, we can implement each segment of $V$ as a Bloom filter [7 12]. The reader uses multiple hash functions to map each tag to $k$ ($> 1$) representative bits in $V$, instead of just one in TOP. More specifically, for each member $t' \in M$, the reader first maps it to a representative segment $V_{t'}$ through a hash function whose range is $[0, \frac{v}{l})$. Then the reader further maps $t'$ to $k$ representative bits in $V_{t'}$ and sets them to ones. After all members in $M$ are encoded in the segments of $V$, the reader broadcasts the segments in the ordering phase. A tag $t$ only listens for its representative segment $V_t$ and then checks its representative bits. If any representative bit is zero, the tag cannot be in $M$. If all representative bits are ones, the tag may be a member of $M$ or a false positive. In the case of a false positive, even though the tag does not belong to $M$, every one of its representative bits is set because it is also a representative bit of a member tag in $M$. The probability for this to happen is $\left(1 - (1 - \frac{1}{l})^{kq}\right)^{k}$, where $q$ is the number of tags in $M$ whose representative segments are also $V_t$. For example, if $l = 80$, $k = 3$, and $q = 2$, the false-positive probability is just $3.8 \times 10^{-4}$, much lower than $1 - (1 - \frac{1}{l})^{q} = 2.5 \times 10^{-2}$ in TOP under the same parameters.

Bloom filters can reduce the false-positive probability. But it is more difficult to use them to carry the reporting order, based on which the tags will take turns to transmit during the reporting phase. In TOP, we use the number of ones that precede the representative bit of a tag to determine the tag's position in the reporting order. Bloom filters use multiple representative bits to encode each member. The representative bits of different members may overlap in an arbitrary way. Hence, we cannot simply use all bits whose values are ones to represent tags in $M$ because there is no one-to-one mapping between them.
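The two false-positive probabilities compared in this subsection can be checked directly. The helper names below are hypothetical; the formulas are the ones stated in the text, evaluated for the quoted example $l = 80$, $k = 3$, $q = 2$.

```python
def fp_single_bit(l: int, q: int) -> float:
    """TOP: one representative bit per tag in an l-bit segment holding q members."""
    return 1 - (1 - 1 / l) ** q

def fp_bloom(l: int, k: int, q: int) -> float:
    """Bloom-filter segment: k representative bits per tag, q members encoded."""
    return (1 - (1 - 1 / l) ** (k * q)) ** k

if __name__ == "__main__":
    l, k, q = 80, 3, 2
    print(f"single bit  : {fp_single_bit(l, q):.2e}")   # about 2.5e-02
    print(f"Bloom filter: {fp_bloom(l, k, q):.2e}")     # about 3.8e-04
```

The gain comes from requiring all $k$ bits to collide at once, at the price of setting up to $kq$ bits in the segment instead of $q$.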

Figure 6-4. $V_t$ is the representative segment of tag $t$. $V_t$ is evenly divided into $k$ partitions, each having $\lfloor \frac{l}{k} \rfloor$ bits. Tag $t$ has one representative bit in every partition.

In the following, we design an enhanced tag-ordering polling protocol (ETOP) to solve the above problem. ETOP uses partitioned Bloom filters, which not only reduce false positives and encode the reporting order, but also reduce $|C|$ as well as the overall execution time of the protocol.

6.7.2 Protocol Description

The main difference between ETOP and TOP is that ETOP implements each segment of $V$ as a partitioned Bloom filter instead of a simple bit array. When we describe the protocol of ETOP, we focus on the difference while omitting the details that it shares in common with TOP. In a partitioned Bloom filter, the $l$ bits of a segment are evenly divided into $k$ partitions. Each partition has $\lfloor \frac{l}{k} \rfloor$ bits. See Figure 6-4 for illustration. For every member tag $t$ in $M$, the reader applies a hash function on its ID to obtain a number of hash bits. The reader uses $\lceil \log_2 v \rceil$ hash bits to map $t$ to a representative segment $V_t$, and then uses $k \lceil \log_2 \frac{l}{k} \rceil$ hash bits to further map $t$ to one representative bit in every partition of the segment. Like a classical Bloom filter, the partitioned Bloom filter sets $k$ representative bits for each encoded member; unlike a classical Bloom filter, a partitioned Bloom filter spreads the $k$ representative bits over $k$ different partitions. After receiving its representative segment, a tag checks the $k$ representative bits to determine if it is a member of $M$. False-positive cases are handled by the reader in the polling phase as usual.
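A partitioned Bloom-filter segment as described above might be sketched as follows. The hashing scheme (4-byte slices of a SHA-256 digest, one slice per partition) and all function names are assumptions made for illustration; the protocol only requires that each tag obtains one pseudo-random bit offset per partition.

```python
import hashlib

def partition_offsets(tag_id: str, k: int, part_len: int):
    """Hypothetical hashing: one bit offset per partition (supports k <= 8)."""
    digest = hashlib.sha256(tag_id.encode()).digest()   # 32 bytes = 8 slices
    return [int.from_bytes(digest[4 * j:4 * j + 4], "big") % part_len
            for j in range(k)]

def encode_segment(tags, l=80, k=4):
    """Reader side: set one representative bit per partition for every tag."""
    part_len = l // k                                   # floor(l / k) bits each
    segment = [[0] * part_len for _ in range(k)]
    for t in tags:
        for j, off in enumerate(partition_offsets(t, k, part_len)):
            segment[j][off] = 1
    return segment

def maybe_member(tag_id: str, segment) -> bool:
    """Tag side: False means certainly not in M; True may be a false positive."""
    k, part_len = len(segment), len(segment[0])
    offs = partition_offsets(tag_id, k, part_len)
    return all(segment[j][off] for j, off in enumerate(offs))
```

Because every encoded tag sets exactly one bit in each partition, no partition can contain more ones than there are encoded tags, which is the property ETOP exploits to carry the reporting order.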

How does a tag $t$ know its position in the reporting order? First we consider the reporting order among tags that are encoded in the same segment $V_t$. Since every tag has exactly one representative bit in each partition of $V_t$, we may be able to use one of the partitions to carry the order information. In other words, if there is a partition $P$ whose number of ones is equal to the number of tags encoded in $V_t$, we know that there must be a one-to-one mapping between these tags and the '1' bits in $P$. We can use the order of '1' bits in $P$ as the reporting order of the corresponding tags. We will explain later how the reader makes sure that such a partition exists. When the reader sends out $V_t$, in the same time slot it also sends the total number $x_t$ of tags that are encoded in all previous segments of $V$. The position of tag $t$ in the reporting order can be computed from $x_t$ and the information in $P$, which we will further explain shortly.

How can we make sure that any segment of $V$ always has a partition whose number of ones is equal to the number of tags encoded in the segment? The reader has to do some extra work. After encoding all tags in $M$, the reader examines the partitions one by one for each segment. If there is no such partition, the reader removes an encoded tag and places it in the set $C$, which will be explicitly polled in the polling phase. The reader keeps removing tags until it finds a partition that satisfies the above requirement. Note that the requirement is always satisfied when the number of tags encoded in a segment is one.

After receiving its representative segment $V_t$, a tag $t \in M$ computes its position in the reporting order as follows: It finds a partition $P$ in $V_t$ that has the largest number of ones. This partition must have a one-to-one mapping between '1' bits and encoded tags. Let $y_t$ be the number of ones in $P$ that precede the representative bit of $t$. The tag computes its position in the reporting order as $y_t + x_t$. Recall that $x_t$ is the number of tags that are encoded in the previous segments. It is received together with $V_t$ in the same time slot.
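The reader-side removal loop and the tag-side position computation for a single segment can be sketched as below. This is a hedged illustration: the hash (SHA-256 slices), the choice of which tag to remove (here simply the last one), the tie-breaking rule among equally full partitions (lowest index), and the function names are all assumptions that the text leaves open.

```python
import hashlib

def offsets(tag_id: str, k: int, part_len: int):
    """Hypothetical hashing: one bit offset per partition (supports k <= 8)."""
    digest = hashlib.sha256(tag_id.encode()).digest()
    return [int.from_bytes(digest[4 * j:4 * j + 4], "big") % part_len
            for j in range(k)]

def encode_with_order(tags, l=80, k=4):
    """Reader side: return (segment, kept_tags, removed_tags) for one segment.

    Tags are removed (to be polled explicitly as part of C) until some
    partition has exactly as many ones as there are encoded tags.
    """
    part_len = l // k
    kept, removed = list(tags), []
    while True:
        segment = [[0] * part_len for _ in range(k)]
        for t in kept:
            for j, off in enumerate(offsets(t, k, part_len)):
                segment[j][off] = 1
        if not kept or any(sum(p) == len(kept) for p in segment):
            return segment, kept, removed
        removed.append(kept.pop())      # arbitrary choice of the tag to remove

def position(tag_id: str, segment, x_t: int):
    """Tag side: return y_t + x_t, or None if some representative bit is zero."""
    k, part_len = len(segment), len(segment[0])
    offs = offsets(tag_id, k, part_len)
    if not all(segment[j][o] for j, o in enumerate(offs)):
        return None
    # Partition with the most ones; ties resolved by lowest index on every tag.
    best = max(range(k), key=lambda j: sum(segment[j]))
    y_t = sum(segment[best][:offs[best]])
    return x_t + y_t
```

Since the chosen partition has one '1' bit per kept tag, the kept tags of a segment receive consecutive, distinct positions starting at $x_t$.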

The polling phase and the reporting phase of ETOP are identical to their counterparts in TOP.

6.8 Performance Analysis of ETOP

6.8.1 Energy Cost

We show that ETOP can be configured such that the energy cost per tag is $O(1)$. ETOP has the same upper bound formula for per-tag energy cost as TOP does (Section 6.6.1), but it has different values of $|C|$ and $|F|$. In the following, we derive $|C|$ and $|F|$ for ETOP. Let $m_i$ be the number of tags in $M$ that are encoded in the $i$th segment, $0 \le i <$ [...] $> 1 - \frac{ml}{v}$, we have $E(|C_i|) <$ [...]

representative bit with a tag in $M$ is $1 - (1 - \frac{k}{l})^{m_i}$. The probability for that to happen in all partitions is $\left[1 - (1 - \frac{k}{l})^{m_i}\right]^{k}$. Hence, the probability for the tag to cause a false positive, denoted as $p_f$, is

$$p_f = \sum_{q=0}^{m} \mathrm{Prob}\{m_i = q\}\left[1 - \left(1 - \frac{k}{l}\right)^{q}\right]^{k} < \left(1 - \mathrm{Prob}\{m_i = 0\}\right)\left[1 - \left(1 - \frac{k}{l}\right)^{m}\right]^{k} \approx (1 - e^{-lm/v})(1 - e^{-km/l}).$$

The expected value of $|F|$ is

$$E(|F|) = (n - m)\,p_f < (n - m)(1 - e^{-lm/v})(1 - e^{-km/l}).$$

If we let $v = -\frac{ml}{\ln\left(1 - \frac{1}{(n-m)(1 - e^{-km/l})}\right)}$ and apply it to this bound, we have $E(|F|) < 1$. Now, if we choose $v = \max\left\{\sqrt{m^3 l^2},\; -\frac{ml}{\ln\left(1 - \frac{1}{(n-m)(1 - e^{-km/l})}\right)}\right\}$, the expected energy cost per tag is at most $E(|C|) + E(|F|) + O(1) < 1 + 1 + O(1) = O(1)$. Therefore, ETOP can also be configured such that the energy cost per tag is $O(1)$.

6.8.2 Execution Time

Following the same analysis as in Section 6.6.2, it is easy to see that ETOP has the same formula for protocol execution time as TOP does: $T = \left(\frac{v}{l} + |F| + |C|\right) t_{tag} + m\,t_{inf} + O(1)$, but the values of $|C|$ and $|F|$ are different. Our simulation results in Section 6.10 show that ETOP has smaller execution time than TOP.
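The time-constrained analyses of both TOP (Section 6.6.3) and ETOP (Section 6.8.3) rely on the balls-and-bins probability $p_d(m, v, x)$ of Section 6.6.4, which the text says can be computed by dynamic programming. A minimal sketch of that table computation follows; the function name and the rolling-array layout are illustrative choices, and $m \ge 1$, $v \ge 1$ are assumed.

```python
def pd_table(m: int, v: int):
    """Return p with p[x] = p_d(m, v, x) for 0 <= x <= min(m, v).

    p_d(m, v, x) is the probability that exactly x of v bins are non-empty
    after m balls are thrown uniformly at random. Assumes m >= 1 and v >= 1.
    """
    top = min(m, v)
    prev = [0.0] * (top + 1)
    prev[1] = 1.0                       # base case: p_d(1, v, 1) = 1
    for balls in range(2, m + 1):
        cur = [0.0] * (top + 1)
        for x in range(1, min(balls, v) + 1):
            # Last ball lands in an occupied bin, or opens a new one.
            cur[x] = (x / v) * prev[x] + ((v - x + 1) / v) * prev[x - 1]
        prev = cur
    return prev

if __name__ == "__main__":
    p = pd_table(3, 2)
    print(p)        # p_d(3, 2, 1) = 0.25 and p_d(3, 2, 2) = 0.75
```

Only the previous row is kept, so the computation takes $O(m \cdot \min(m, v))$ time and $O(\min(m, v))$ memory. For ETOP the same routine would be called with the partition size $\frac{l}{k}$ in place of $v$.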
6.8.3 Choosing v for Time-constrained Energy Minimization

Following the same reasoning as in Section 6.6.3, we define the time bound for ETOP to be

$$\mathrm{Prob}\{T \le B\} \ge \alpha,$$

where $T$ is the execution time of ETOP, $B$ is a pre-defined time bound, and $\alpha$ is a probability value, $0 < \alpha < 1$. The objective is to find the largest value $v$ that minimizes the energy cost, subject to this constraint. In the following, we derive a

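computable formula for $\mathrm{Prob}\{T \le B\}$, which is derived at the end of this subsection. Based on the formula, we will be able to find the optimal value $v$. Let $m_i$ be the number of tags in $M$ that are encoded in the $i$th segment, denoted as $V_i$, $0 \le i <$ [...]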

$$\begin{aligned}
\mathrm{Prob}\Big\{\max_{j \in [1,k]} d_{ij} = y \;\Big|\; m_i = x\Big\}
&= \prod_{j=1}^{k} \mathrm{Prob}\{d_{ij} \le y \mid m_i = x\} - \prod_{j=1}^{k} \mathrm{Prob}\{d_{ij} \le y - 1 \mid m_i = x\} \\
&= \Big(\sum_{d=0}^{y} \mathrm{Prob}\{d_{ij} = d \mid m_i = x\}\Big)^{k} - \Big(\sum_{d=0}^{y-1} \mathrm{Prob}\{d_{ij} = d \mid m_i = x\}\Big)^{k} \\
&= \Big(\sum_{d=0}^{y} p_d(x, \tfrac{l}{k}, d)\Big)^{k} - \Big(\sum_{d=0}^{y-1} p_d(x, \tfrac{l}{k}, d)\Big)^{k},
\end{aligned}$$

where $p_d(x, \frac{l}{k}, d) = \mathrm{Prob}\{d_{ij} = d \mid m_i = x\}$ is the conditional probability that a partition containing $m_i = x$ tags happens to have $d$ ones. The calculation of $p_d(\cdot)$ can be found in Section 6.6.4. Hence, the conditional distribution of $|C_i|$ is

$$\mathrm{Prob}\{|C_i| = c \mid m_i = x\} = \Big(\sum_{d=0}^{x-c} p_d(x, \tfrac{l}{k}, d)\Big)^{k} - \Big(\sum_{d=0}^{x-c-1} p_d(x, \tfrac{l}{k}, d)\Big)^{k}.$$

Secondly, we derive $\mathrm{Prob}\{|F_i| = s - c \mid |C_i| = c, m_i = x, n_i = z\}$. A tag not in $M$ maps itself to $k$ partitions and chooses one bit randomly from each partition. If all these bits are ones, a false positive happens. The conditional false-positive probability is

$$\mathrm{Prob}\{\text{false positive in } V_i \mid m_i = x\} = \Big(\sum_{d=0}^{x} \frac{kd}{l}\,p_d(x, \tfrac{l}{k}, d)\Big)^{k}.$$

When $|C_i| = c$, $\max_{j \in [1,k]} d_{ij} = m_i - c$; hence,

$$\mathrm{Prob}\{\text{false positive in } V_i \mid |C_i| = c, m_i = x\} = \frac{1}{p_d(x, \frac{l}{k}, x - c)}\left[\Big(\sum_{d=0}^{x-c} \frac{kd}{l}\,p_d(x, \tfrac{l}{k}, d)\Big)^{k} - \Big(\sum_{d=0}^{x-c-1} \frac{kd}{l}\,p_d(x, \tfrac{l}{k}, d)\Big)^{k}\right],$$

denoted as $p_{fc}$, which represents the false-positive probability when $m_i = x$ tags are encoded in the $i$th segment and $|C_i| = c$ tags are moved to the collision set $C$. When $n_i$ tags in $N - M$ are mapped to $V_i$, the conditional distribution of $|F_i|$ follows the binomial

distribution $\mathrm{Bino}(n_i, p_{fc})$; thus,

$$\mathrm{Prob}\{|F_i| = s - c \mid |C_i| = c, m_i = x, n_i = z\} = \binom{z}{s-c}\,p_{fc}^{\,s-c}\,(1 - p_{fc})^{z-s+c}.$$

From the conditional distributions of $|C_i|$ and $|F_i|$, we can derive $\mathrm{Prob}\{|S_i| = s \mid m_i = x, n_i = z\}$. Thus, the probability distribution of $|S_i|$ is

$$\begin{aligned}
\mathrm{Prob}\{|S_i| = s\} &= \sum_{x=0}^{m} \sum_{z=0}^{n-m} \sum_{c=0}^{s} \mathrm{Prob}\{|S_i| = s \mid m_i = x, n_i = z\}\,\mathrm{Prob}\{m_i = x\}\,\mathrm{Prob}\{n_i = z\} \\
&= \sum_{x=0}^{m} \sum_{z=0}^{n-m} \sum_{c=0}^{s} \left[\Big(\sum_{d=0}^{x-c} p_d(x, \tfrac{l}{k}, d)\Big)^{k} - \Big(\sum_{d=0}^{x-c-1} p_d(x, \tfrac{l}{k}, d)\Big)^{k}\right] \binom{z}{s-c}\,p_{fc}^{\,s-c}\,(1 - p_{fc})^{z-s+c} \\
&\qquad \times \binom{m}{x}\left(\frac{l}{v}\right)^{x}\left(1 - \frac{l}{v}\right)^{m-x} \binom{n-m}{z}\left(\frac{l}{v}\right)^{z}\left(1 - \frac{l}{v}\right)^{n-m-z}.
\end{aligned}$$

Let $S$ be the union of $C$ and $F$. We have $|S| = |C| + |F|$ and $|S| = \sum_{i=1}^{v/l} |S_i|$. As $S_1, S_2, \ldots, S_{v/l}$ are independent of each other, the probability distribution of $|S|$ is the convolution of the distributions of $|S_i|$. Hence,

$$\mathrm{Prob}\{|S| = s\} = \mathrm{Prob}\{|S_1| = s\} * \cdots * \mathrm{Prob}\{|S_{v/l}| = s\},$$

where $*$ is the convolution operator. With the help of the Fourier Transform, we have

$$\mathrm{Prob}\{|S| = s\} = \widehat{\mathrm{FFT}}\left[\Big(\mathrm{FFT}\big(\mathrm{Prob}\{|S_i| = s\}\big)\Big)^{v/l}\right],$$

where $\mathrm{FFT}$ is the Fast Fourier Transform, and $\widehat{\mathrm{FFT}}$ is the inverse Fast Fourier Transform. Adopting the execution-time formula of Section 6.8.2, we have

$$\mathrm{Prob}\{T \le B\} = \sum_{s=0}^{s_{max}} \mathrm{Prob}\{|S| = s\},$$

where $s_{max} = \frac{B - m\,t_{inf}}{t_{tag}} - \frac{v}{l}$. The right side is denoted as $P'_t(v, B)$, which is the probability for the protocol execution time to be bounded by $B$ under a certain value of $v$. It is computable as a function of $v$ and $B$ after the distribution of $|S|$ above is applied and parameters $m$ and $n$ are

given. Given a value of $B$, we can find the largest $v$ that satisfies $P'_t(v, B)$ [...]

Figure 6-5. Bound $B$ that satisfies $\mathrm{Prob}\{T \le B\} \ge \alpha$ with respect to $v$. Parameters: $n = 10{,}000$, $m = 1{,}000$.

when a tag transmits, we require it to include a CRC checksum that is computed from the concatenation of the information bits and the tag's ID. When the reader expects information from a tag in a time slot, if the slot turns out to be empty or the data received in the slot do not carry a correct CRC, the reader knows that information from the tag is not correctly received. At the end of the protocol, all missed information can be retrieved by polling the tags directly.

6.10 Simulation Results

In this section, we evaluate the performance of our new protocols, the tag ordering polling protocol (TOP) and the enhanced tag ordering polling protocol (ETOP). We compare them with the basic polling protocol (BP) and the coded polling protocol (CP). Our evaluation uses two performance metrics: (1) the average number of bits that each tag has to receive during the protocol execution, and (2) the overall execution time. We only consider energy consumption of tags in receiving information for two reasons. First, this is the major, variable portion of the energy cost per tag. As we will see shortly, each tag may have to receive hundreds of thousands of bits during protocol execution, whereas it only sends a small, fixed amount, e.g., 32 bits. Second, the energy cost for tags in $M$ to transmit their information is the same for all protocols. Omitting it does not affect the comparison.

We use the following parameters to configure the simulation: each tag ID is 96 bits long, information reported from a tag to the reader is 32 bits long, and each segment in ETOP is 80 bits long and divided into 4 partitions, i.e., $k = 4$. The transmission time is based on the parameters of the Philips I-Code specification [141]. The rate from a tag to the reader is 53 Kb/sec; it takes $18.88\,\mu s$ for a tag to transmit one bit. Any two consecutive transmissions (from the reader to tags or vice versa) are separated by a waiting time of $302\,\mu s$. The value of $t_{inf}$ is calculated as the sum of a waiting time and the time for transmitting the information, which is $18.88\,\mu s$ multiplied by the length of the information. For 32-bit information, $t_{inf} = 906\,\mu s$. The transmission rate from the reader

to tags is 26.5 Kb/sec; it takes $37.76\,\mu s$ for the reader to transmit one bit. The value of $t_{tag}$ is calculated as the sum of a waiting time and the time for transmitting a 96-bit ID. The result is $3927\,\mu s$.

Figure 6-6. Energy and time comparison between BP, CP, TOP, and ETOP. Parameters: $m = 0.1n$, $v = 24m$ for TOP and ETOP. Note that the horizontal '0' line is not at the bottom in order to make the ETOP curve visible.

6.10.1 Varying Number n of Tags

We first vary the number $n$ of tags in the system from 10,000 to 100,000. We set $v = 24m$ and $m = 0.1n$, i.e., 10% of all tags are selected by the reader to report information. Figure 6-6 compares the four protocols in terms of energy cost and protocol execution time. The left plot shows energy costs. TOP and ETOP reduce energy consumption by one or multiple orders of magnitude. For example, when $n = 100{,}000$, per-tag energy cost in TOP is 9.4% of the cost in CP, and 5.0% of the cost in BP. Per-tag energy cost in ETOP is just 0.52% of the cost in CP, and 0.28% of the cost in BP. The right plot shows the execution time comparison. TOP requires 25% less time than BP, but 27% more time than CP. ETOP requires 55% less time than BP and 24% less time than CP. In summary, CP reduces both energy cost and execution time nearly by half when compared with BP. TOP makes great improvement over CP in terms of energy cost, but has modestly higher execution time. ETOP considerably outperforms CP in terms of both energy cost and execution time.
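The two slot lengths used throughout this section follow from the quoted per-bit times and the waiting time. As a small worked check (the helper name `slot_time_us` is hypothetical):

```python
def slot_time_us(bits: int, per_bit_us: float, wait_us: float = 302.0) -> float:
    """Slot length in microseconds: one waiting time plus the transmission time."""
    return wait_us + bits * per_bit_us

# Tag-to-reader: 32 information bits at 18.88 us per bit.
t_inf = slot_time_us(32, 18.88)
# Reader-to-tag: one 96-bit ID (or one 96-bit slot of V) at 37.76 us per bit.
t_tag = slot_time_us(96, 37.76)

if __name__ == "__main__":
    print(f"t_inf = {t_inf:.2f} us, t_tag = {t_tag:.2f} us")
```

Rounded to the nearest microsecond, this gives $t_{inf} = 906\,\mu s$ and $t_{tag} = 3927\,\mu s$.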

Figure 6-7. Energy and execution time comparison of TOP and ETOP. Top left plot: Energy cost of TOP with respect to $v$. Top right plot: Energy cost of ETOP with respect to $v$. Bottom left plot: Execution time of TOP with respect to $v$. Bottom right plot: Execution time of ETOP with respect to $v$.

6.10.2 Varying Size v of Reporting-order Vector

Next, we show how the value of $v$ influences the performance of TOP and ETOP. We set $n = 50{,}000$ and $m = 5{,}000$, $10{,}000$, or $25{,}000$. We vary $v$ from $4m$ to $64m$ and use simulation to find the energy cost per tag and the protocol execution time. Figure 6-7 shows the simulation results. The top left and top right plots present the average amount of data each tag receives in TOP and ETOP, respectively. The curves match the theoretical results we have given in Section 6.6. When $v$ is reasonably large, e.g., $v \ge 7m$, ETOP consumes less energy than TOP. The bottom left and bottom right plots present the protocol execution time of TOP and ETOP, respectively. ETOP also requires less time than TOP when $v \ge 7m$.

6.11 Summary

In this chapter, we introduce two energy-efficient information collection protocols, TOP and ETOP, for large-scale RFID systems. These protocols are designed to collect real-time information from a subset of tags in the system. Our primary objective is to lower energy consumption by tags in order to extend their lifetime. TOP shares similarity with the Bloom-1 filter we discussed in Chapter 3, while ETOP can be treated as an extension of it adopting the idea of the partitioned Bloom filter. The new protocols can be configured to achieve $O(1)$ energy cost per tag. A performance tradeoff between energy cost and execution time can be made by controlling the size of the reporting-order vector. Simulation results show that the new protocols are able to cut energy cost by more than an order of magnitude when compared with other protocols.

CHAPTER 7
SET SIZE ESTIMATION IN MOBILE VEHICULAR P2P NETWORKS

7.1 Background, System Model, and Problem Definition

7.1.1 Mobile Vehicular P2P Networks

The RFID system is not the only cyber-physical system that requires time and energy efficiency. Future automobiles may be equipped with wireless devices that enable vehicular peer-to-peer (VP2P) networks to make our transportation systems more efficient. As vehicles move close to or pass by each other, data may be exchanged to report road conditions ahead, and statistical information or system-wide measurements may be generated collaboratively [154]. In a VP2P network, moving cars communicate using short-range wireless technologies such as IEEE 802.11 and Dedicated Short Range Communications (DSRC). Vehicles in these networks can be highly mobile. Constantly unstable network topologies may make it unsuitable to use some popular routing protocols such as DSDV [125], AODV [126] and DSR [64], which are designed for traditional mobile ad-hoc networks (whose topologies should be stable enough for these protocols to work). If the vehicles are GPS-capable, geographical routing protocols [9 67] can be used to route data over multiple hops.

New functions have to be developed in the context of VP2P to support emerging applications. This chapter investigates one basic function: estimating the cardinality of a mobile VP2P network, i.e., the number of vehicles in the network. In the literature, this problem is also called the vehicle density estimation problem. Such a function is useful when a transportation department needs to know information about traffic volume, i.e., the number of moving cars (see footnote 1), in each district of a city, where cars within the boundary of each district form a VP2P network with the assistance of GPS and

Footnote 1: Parked cars do not contribute to traffic. They are mostly powered off and therefore do not participate in the network.

wireless communication devices. Each vehicle in a VP2P network only has knowledge about its immediate neighborhood. Without a central entity that has access to all vehicles, cardinality estimation is a distributed process carried out among peer nodes, as vehicles enter and depart from the network dynamically.

The function of cardinality estimation can be easily implemented in a static wireless network where all nodes are stationary. A query node broadcasts a request message into the network. Each node transmits the message once when it receives the message for the first time. The node also remembers its predecessor, which is the one from which it receives the message for the first time. This simple broadcast protocol is not an efficient one compared with others [60 91 105]. But it quickly establishes a routing tree if each node treats its predecessor as the parent node. In this tree, each node learns the numbers of nodes in the subtrees rooted at its children, sums them up for the number of nodes in the subtree rooted at itself, and then reports that information to its parent. As this distributed computation is recursively carried out from the leaf nodes toward the root, the query node, which is located at the root of the tree, will learn the number of nodes in the whole network. However, any node failure will break the tree. To solve this problem, the synopsis diffusion approach [117] is proposed to collect information based on a more robust DAG (Directed Acyclic Graph [32]) routing structure. Each node aggregates data from its upstream neighbors, integrates its own information, and broadcasts the aggregated information downstream. To ensure each node only transmits once, it must receive data from all upstream neighbors before its own transmission. This requires a static topology such as the type of sensor networks investigated in [117]. Other more efficient approaches have been invented to estimate the number of nodes in a stationary sensor network [15 33 34 81] or estimate the number of tags in a large RFID system [55 71 72 129 176], assuming a centralized

communication model where all tags directly communicate with an RFID reader (see footnote 2). The above solutions cannot be applied to the multi-hop mobile network model in the context of this chapter.

In a mobile VP2P network, because nodes (i.e., vehicles) are moving, there is no stable routing structure for collecting information. Moreover, the number of nodes may change as new nodes join and existing nodes depart. This requires the operation of determining the number of nodes to be carried out frequently if applications demand an evolving picture of network size over time. On one hand, it is not efficient to rely on a flooding (or broadcasting) approach for such an operation due to high overhead. On the other hand, many applications may not require the exact count of the nodes, which varies over time anyway. An estimated number can already be very useful if a certain accuracy requirement is met. For example, the transportation department may not have to know the exact number of vehicles that are present in a district. If an estimated number, say within 10% of the exact number, meets their need, then we can adopt more efficient approaches that work well for mobile networks.

This chapter proposes two statistical methods for estimating the cardinality of a mobile VP2P network. The first method is called the circled random walk (CRW). It estimates the number of nodes with an accuracy that is tunable. It makes a tradeoff between communication overhead and estimation error. The error can be made arbitrarily small at the expense of increasing overhead. One problem of CRW is that it has an idealized assumption of a randomized neighboring relationship among the nodes, which approximates the case of a relatively small network of highly mobile nodes. This assumption is however not generally applicable. Our second method, called the tokened random walk (TRW), removes such an assumption. With the assistance

Footnote 2: Normally, RFID tags cannot communicate with each other to form a multi-hop wireless network.

PAGE 138

of randomly distributed tokens, TRW provides a statistical estimation of the number of nodes in an arbitrarily connected VP2P network. Our extensive simulations demonstrate that CRW works well in networks of high mobility, whereas TRW works well with high or low mobility.

There are other types of mobile P2P networks [160, 163], where communication infrastructure does not exist or cannot be accessed due to technical or economic reasons. Mobile P2P networks have many civilian and military applications. For instance, soldiers in a battlefield may carry small wireless devices and form a mobile P2P network among them. In another example, smartphones or PDAs may carry applications that benefit from peer-to-peer networks for information discovery or social networking functions. Solutions developed for VP2P networks, including the methods for cardinality estimation in this chapter, will be applicable to other types of mobile P2P networks as well.

7.1.2 Network Model

Consider a large vehicular peer-to-peer system where we abstract vehicles as wireless nodes. Wireless links are established between nearby nodes that are within the communication range of each other. All links form a connected graph. Two nodes are neighbors of each other if there is a wireless link between them. Because the nodes move, the neighboring relationship changes over time.

7.1.3 Problem Definition

The problem is to estimate the number n of nodes in a mobile VP2P network, i.e., the cardinality of the network. Our goal is to develop statistical methods that estimate the value of n without flooding the network or requiring the participation of all nodes (for overhead reduction). Let $\hat{n}$ be the estimated number of nodes. The goal is that the probability for n to fall in the range $[(1-\beta)\hat{n}, (1+\beta)\hat{n}]$ is at least $\alpha$, where $\alpha$ (< 1) and $\beta$ (< 1) are the parameters of the accuracy requirement.
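The accuracy requirement can be made concrete with a short check. The following is a minimal sketch, not part of the original work: the function names `within_bound` and `meets_requirement` are hypothetical, `alpha` stands for the required confidence level and `beta` for the relative error bound, and the sample pairs are made-up numbers used only to exercise the code.

```python
def within_bound(n: int, n_hat: float, beta: float) -> bool:
    """True if the actual count n lies in [(1 - beta) * n_hat, (1 + beta) * n_hat]."""
    return (1 - beta) * n_hat <= n <= (1 + beta) * n_hat


def meets_requirement(pairs, alpha: float, beta: float) -> bool:
    """Empirical check over (n, n_hat) pairs: the fraction of estimates
    whose interval contains n must be at least alpha."""
    hits = sum(1 for n, n_hat in pairs if within_bound(n, n_hat, beta))
    return hits / len(pairs) >= alpha


# Made-up (actual, estimated) pairs, for illustration only.
sample_pairs = [(1000, 1100), (1000, 950), (1000, 1200)]
```

With these pairs and `beta = 0.1`, two of the three estimates satisfy the bound, so the empirical fraction is 2/3.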


We assume that there exist end-to-end routing protocols [9, 67] that are able to route a message to a certain node.

For mobile VP2P networks, we may approximately consider that the neighbors of a node are randomly selected from the network if the movement of a node allows it to frequently change neighbors and have a chance of neighboring with many other nodes over time. In practice, the random-neighbor assumption cannot really be met. However, our simulation reveals that even though this assumption is not accurate, our method based on the assumption still provides good cardinality estimation. We also observe that for static networks or networks of low mobility, the method based on the random-neighbor assumption does not work well.

We develop cardinality estimation methods for mobile VP2P networks. Our first method uses the random-neighbor assumption. It is simple and easy to implement. Removing the random-neighbor assumption, our second method is able to work for

networks of high or low mobility. It is more robust but requires additional assistance in its execution.

7.1.4 Prior Art

The most closely related topic in vehicular networks is the vehicle density estimation problem, which aims to calculate the number of vehicles per unit of road length in an area. Our estimated cardinality can be directly applied to calculate vehicle density by dividing it by the total length of the roads in the area. Most related work estimates the traffic density on a specific road or section, whereas our approach focuses on a relatively large district, for example, the whole downtown area, whose actual density changes much more slowly over time than that of street segments.

Vehicle density estimation methods can be divided into two general categories: infrastructure-assisted approaches and infrastructure-free approaches. Infrastructure-assisted approaches [4, 36, 51] rely on detecting devices such as inductive loop detectors [149], roadside radar, sensors, or cameras to be installed for collecting traffic data. These devices may collect the vehicle count, the vehicle speed, or the instant at which each vehicle moves past them, which are then used to estimate the vehicle density. However, these approaches suffer from limited coverage, low reliability, and high deployment and maintenance costs [80].

Infrastructure-free density estimation approaches do not rely on any pre-installed detecting devices. Existing works in this category can be further divided into two sub-categories: speed-based approaches and message-based approaches. Speed-based approaches [5, 144, 156] assume a relationship between a vehicle's speed and the vehicle density on the road. For example, low speed implies dense traffic. However, these approaches could give inaccurate densities since speed is not always an indication of the vehicle density on the road. An example is when vehicles stop at sparse intersections, with low speed and low density.

The following works, together with our approach, belong to the message-based category. Jerbi et al. proposed a fully distributed grouping approach for density estimation [63]. In this approach, a road segment is divided into multiple fixed-size cells. The group leader, which is a designated vehicle in a cell, computes the vehicle density and spreads this information among the other members of the cell and the group leaders of other cells. The average density on the road can then be obtained.

Panichpapiboon and Pattara-atikom proposed a neighbor-based density estimation [123]. Local density is first calculated based on the number of one-hop or two-hop local neighbors. Then, the global density of the road is estimated assuming the inter-vehicle spacing is exponentially distributed. However, in a sparse network, this approach becomes unreliable because the probe vehicle is not able to find a sufficient number of neighbors to make an accurate estimation. Also, their assumption of exponentially distributed inter-vehicle spacing might not hold for all possible traffic scenarios.

Garelli et al. proposed a method called MobSampling to calculate the density of a specific target area [50]. It selects a vehicle called the sampler, which broadcasts a POLL message with the sampler position and the radius of the target area at random intervals. When a vehicle in the target area receives the message, it replies after waiting for a random time to avoid flooding the receiver. The sampler counts the received messages to estimate the local density. When a sampler leaves the target area, a new vehicle is chosen as the sampler. As the sampler and the sampler position change randomly, continuous measurements of local density give an estimate of the density of the whole area. However, this method needs a relatively long time due to the random waiting and the sampler hand-over.

In [2], Akhtar et al. adapted mechanisms from system size estimation in peer-to-peer systems [62, 73, 108] for vehicular density estimation. They adapted and implemented three fully distributed algorithms, namely Sample & Collide, Hop Sampling and Gossip-based Aggregation. The Sample & Collide algorithm tries to sample the nodes from

the network, and then estimates the system size depending on how many samples of the nodes are collected before an already sampled node is re-selected. This algorithm is similar to our CRW approach in terms of the variable used for estimation, but random sampling is achieved differently. In their algorithm, an initiator node sends a message with a pre-defined value T to one of its neighboring nodes (T > 0). Upon receiving a sampling message, a node decrements T by a random amount and sends the new value to one of its neighbors selected uniformly at random. When a node finds T ≤ 0, it is sampled and replies to the initiator. The initiator keeps sending sampling messages until an already sampled node replies. This algorithm has three disadvantages. First, it does not take advantage of node mobility, which itself shows randomness in the neighboring relationship. Second, even though it sends out several probe messages, it essentially has only one measurement of C (equivalent to X in our approach), so it is not efficient. Third, precision is not bounded in their algorithm.

The Hop Sampling algorithm works as follows. The initiator node broadcasts a message to all the vehicles in the network using gossiping. The vehicles reply to the initiator with a probability that is inversely proportional to the distance (hop count) from the initiator. Based on the replies that the initiator gets from other vehicles at different distances, it estimates the size of the network. However, the overhead of this approach is high: even though nodes reply probabilistically, the gossiping message traverses every edge in the network at least once. Moreover, as the network topology changes dynamically, a vehicle's probability of replying is larger than expected, because its stored hop count never goes up even if its real distance has increased. As a consequence, the algorithm has an over-estimation problem, which is indicated by their simulation results.

In the Gossip-based Aggregation algorithm, the initiator samples K vehicles at random and assigns a value of 1 to them (random sampling is achieved using the method mentioned in the Sample & Collide algorithm). Other vehicles are assigned 0. Neighbors

periodically exchange information, each with one of its neighbors selected at random, and update their values to the average of the two. In steady state, the value in each vehicle will be K/N, where N is the number of vehicles in the network. However, this algorithm needs time to converge. Also, if a node leaves the network during the initial phase of the algorithm after receiving the gossip message, the accuracy of the algorithm decreases significantly [2].

There are also works that estimate local density to dynamically change transmission power [110, 155]. For example, Noureddine and Samira propose an efficient local density estimation method using segmented roads and extended beacons to estimate the number of neighbors around a vehicle [120]. These algorithms can be used collaboratively with our approach to prevent MAC-level collisions.

7.2 Circled Random Walk Protocol (CRW)

In this section, we describe our first statistical method for estimating the number of nodes in a large VP2P network.

7.2.1 Description of the Protocol

A query node a sends out a number of probe messages. Each probe independently performs a circled random walk (CRW). The probe carries a globally unique identifier and a hop count. The identifier consists of node a's location and a sequence number. The hop count is initialized to zero. When a node receives a probe, it records the probe's identifier and increases the hop count in the probe by one. If the node receives the probe for the first time, it forwards the probe to a randomly selected neighbor except for the one from which the probe was just received. If the node receives the probe for the second time, it discards the probe and sends the probe's hop count back to node a. The random walk of a probe terminates once its traversing path forms a circle. An illustration is given in Figure 7-1.

Let X be the length of a circled random walk, which is the number of hops that the probe traverses before it reaches a node for the second time. X is a random variable. In

Figure 7-1. Illustration of CRW.

the next subsection, we establish the following mathematical formula that links E(X) to n:

\[ E(X) = f(n) = \sum_{i=3}^{n} i \left( \prod_{j=2}^{i-1} \left( 1 - \frac{j-2}{n-2} \right) \right) \frac{i-2}{n-2}, \qquad n = f^{-1}(E(X)). \quad (7) \]

Using the formula, we can estimate n after we measure E(X) by performing a number of circled random walks and taking the average $\bar{X}$ of the received hop counts. Let $\hat{n}$ be the estimated value of n.

\[ \hat{n} = f^{-1}(\bar{X}) \quad (7) \]

The pseudocode of the algorithm we use to numerically compute $\hat{n}$ from $\bar{X}$ based on (7) is given below. Note that f(n) is a monotonically increasing function.

1. pick a small value n1 and a large value n2 such that $f(n_1) \le \bar{X} \le f(n_2)$
2. WHILE (n1 ≠ n2) DO
3. let n3 = ⌊(n1 + n2)/2⌋
4. IF f(n3) …

7.2.2 Linking E(X) to n

In this subsection, we derive Equation (7) and the variance of X. We also give a way for quick estimation of n. It is less accurate than what is computed by Equation (7) but is easier to calculate.

Let q(i) be the probability of not visiting any node twice after a probe moves for i hops in its random walk. The node a that initializes the random walk is a visited node by default. It is obvious that q(1) = q(2) = 100%. Consider the ith hop of the random walk, for all i > 2. The previous (i − 1) hops visited i nodes, including node a. The next hop can be any node except for the current node and the previous node of the random walk. The probability for the ith hop to be an unvisited node is $1 - \frac{i-2}{n-2}$. Hence,

\[ q(i) = q(i-1) \left( 1 - \frac{i-2}{n-2} \right). \]

Here, we have used the random-neighbor assumption in Section 7.1. It does not work well for low mobility, as we will demonstrate through simulations. This assumption will be removed in the next section. Recursively applying the above equation, we have

\[ q(i) = q(1) \prod_{j=2}^{i} \left( 1 - \frac{j-2}{n-2} \right) = \prod_{j=2}^{i} \left( 1 - \frac{j-2}{n-2} \right). \]

X is a random variable with a distribution from 3 to n. We know that q(i − 1) is the probability of not visiting any node twice after the first (i − 1) hops, and $\frac{i-2}{n-2}$ is the conditional probability for the ith hop to reach a node that has been visited previously. Clearly, $q(i-1) \frac{i-2}{n-2}$ is the probability of visiting a node twice after i hops. Hence, the expected value of X is

\[ E(X) = \sum_{i=3}^{n} i \, q(i-1) \, \frac{i-2}{n-2} = \sum_{i=3}^{n} i \left( \prod_{j=2}^{i-1} \left( 1 - \frac{j-2}{n-2} \right) \right) \frac{i-2}{n-2}. \]
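As an illustrative sketch (not the implementation used in this work), the expression for E(X) derived above can be evaluated incrementally through the recursion for q(i), and then inverted by bisection, since f(n) is monotonically increasing. The function names `expected_walk_length` and `estimate_n`, the search bounds, and the early-exit tolerance are assumptions introduced here for illustration.

```python
def expected_walk_length(n: int) -> float:
    """Evaluate f(n) = E(X) for a circled random walk under the
    random-neighbor assumption (requires n >= 3)."""
    if n < 3:
        raise ValueError("n must be at least 3")
    total = 0.0
    q = 1.0  # holds q(i-1); starts as q(2) = 1
    for i in range(3, n + 1):
        hit = (i - 2) / (n - 2)  # conditional probability of a revisit at hop i
        total += i * q * hit     # contributes i * P(X = i)
        q *= 1.0 - hit           # q(i) = q(i-1) * (1 - (i-2)/(n-2))
        if q < 1e-18:            # assumed tolerance: remaining terms are negligible
            break
    return total


def estimate_n(mean_hops: float, lo: int = 3, hi: int = 10**7) -> int:
    """Smallest n in [lo, hi] with f(n) >= mean_hops, found by bisection.
    The default bounds are assumptions, not values from the text."""
    while lo < hi:
        mid = (lo + hi) // 2
        if expected_walk_length(mid) < mean_hops:
            lo = mid + 1
        else:
            hi = mid
    return lo
```

For example, `estimate_n(x_bar)` returns an estimate of n from a measured average hop count `x_bar`. For the smallest cases the sum can be checked by hand: f(3) = 3 and f(4) = 3·(1/2) + 4·(1/2) = 3.5.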


The variance of X is

\[ V(X) = \sum_{i=3}^{n} \left( i - E(X) \right)^2 \left( \prod_{j=2}^{i-1} \left( 1 - \frac{j-2}{n-2} \right) \right) \frac{i-2}{n-2}. \]

Figure 7-2 shows E(X) with respect to n in log scale. In the same plot, two additional curves, $1.31\sqrt{n}$ and $1.25\sqrt{n}$, are shown to closely overlap with E(X). In fact, numerical computation shows that they are upper and lower bounds of E(X) for a wide range of n, $n \in [10^3, 10^7]$. This indicates that $E(X) = O(\sqrt{n})$ in this range. $1.25\sqrt{n}$ …

the ith probe. As $\bar{X}$ is the average hop count of the probes, we have

\[ \bar{X} = \frac{1}{m} \sum_{i=1}^{m} X_i \]
\[ E(\bar{X}) = \frac{1}{m} E\left( \sum_{i=1}^{m} X_i \right) = \frac{1}{m} \sum_{i=1}^{m} E(X_i) \quad (7) \]

Each $X_i$ is an independent random sample of X. Therefore $E(X_i) = E(X)$. From (7), we have $E(\bar{X}) = E(X)$.

Next, we determine the value of m such that the probability for E(X) to fall in the range $[(1-\beta')\bar{X}, (1+\beta')\bar{X}]$ is at least $\alpha$, where $\beta'$ is a constant whose value will be determined shortly. The confidence interval of E(X) is $\bar{X} \pm s\,t/\sqrt{m}$, where s is the standard deviation calculated from the received hop counts and t is the upper $\frac{1+\alpha}{2}$ point of the t distribution with (m − 1) degrees of freedom. The value of m is calculated as follows:

\[ \frac{s\,t}{\sqrt{m}} = \beta' \bar{X} \]
\[ m = \frac{(s\,t)^2}{(\beta' \bar{X})^2} \quad (7) \]

Then, we determine an appropriate value for $\beta'$. We know that if m is chosen based on (7), then the following inequalities are satisfied with probability $\alpha$:

\[ (1-\beta')\bar{X} \le E(X) \le (1+\beta')\bar{X} \]
\[ f^{-1}((1-\beta')\bar{X}) \le f^{-1}(E(X)) \le f^{-1}((1+\beta')\bar{X}) \]
\[ f^{-1}((1-\beta')\bar{X}) \le n \le f^{-1}((1+\beta')\bar{X}) \]

Our estimation accuracy requirement is to satisfy $(1-\beta)\hat{n} \le n \le (1+\beta)\hat{n}$ with probability $\alpha$. To meet this requirement, we select the maximum value of $\beta'$ that meets the following conditions:

\[ f^{-1}((1+\beta')\bar{X}) \le (1+\beta)\hat{n} \]
\[ f^{-1}((1-\beta')\bar{X}) \ge (1-\beta)\hat{n}, \quad (7) \]

where $\beta$ is a constant specified in the accuracy requirement, $\bar{X}$ and $\hat{n}$ are results of the CRW execution, and we know how to solve $f^{-1}$ based on the algorithm in Section 7.2.1. Hence, $\beta'$ can be computed numerically from the above inequalities.

Finally, based on (7) and (7), we design an iterative process for determining the number of probes that is required to meet a given accuracy requirement specified by $\alpha$ and $\beta$. The query node a begins with a certain number (e.g., 50) of probes. After it receives the hop counts of the probes and computes $\bar{X}$ and $\hat{n}$, node a determines $\beta'$ from the conditions in (7) and then the value of m from the probe-count formula (7). If the total number m′ of probes that have already been sent is equal to or greater than m, it returns the current value of $\hat{n}$ as the estimation of n. Otherwise, if m′ is less than m, node a sends (m − m′) more probes and uses the returned hop counts to refine the value of m. This process continues until the total number of probes that have been sent is equal to or greater than m.

7.3 Tokened Random Walk Protocol (TRW)

In this section, we describe our second statistical method for estimating the number of nodes in a large mobile P2P network.

7.3.1 Description of the Protocol

We randomly distribute a number T of tokens to nodes in the network. The problem of how to randomly distribute tokens will be discussed in Section 7.3.3. The nodes that hold tokens are called the tokened nodes.

A query node a sends out a number of probe messages. Each probe independently performs a tokened random walk (TRW) and counts the number of hops that it has traversed. When a node that does not hold a token receives a probe, it increases the hop count in the probe by one and forwards the probe to one of its neighbors. When a

tokened node receives a probe, it discards the probe and sends the probe's hop count back to node a. An illustration of TRW is shown in Figure 7-3.³

Let Y be the length of a tokened random walk, which is the number of hops that a probe traverses before it reaches a tokened node. In the next subsection, we establish a mathematical formula that links E(Y) to n:

\[ E(Y) = g(n) = \sum_{i=1}^{n-T+1} i \, \frac{(n-i+1)!\,(n-T)!}{n!\,(n-i-T+1)!} \, \frac{T}{n-i+1}, \qquad n = g^{-1}(E(Y)). \quad (7) \]

Using the formula, we can estimate n after we measure E(Y) by performing a number of tokened random walks and taking the average $\bar{Y}$ of the received hop counts. The estimated number $\hat{n}$ of nodes is calculated as

\[ \hat{n} = g^{-1}(\bar{Y}). \quad (7) \]

7.3.2 Linking E(Y) to n

Let p(i) be the probability of not reaching a tokened node after a probe has visited i nodes. This happens when and only when the set of T tokened nodes does not overlap with the set of i visited nodes. Recall that the tokened nodes are randomly chosen from the network. $\binom{n}{T}$ is the number of different ways for picking the set of tokened nodes. $\binom{n-i}{T}$ is the number of different ways for picking them from the nodes that are not visited. Hence,

\[ p(i) = \frac{\binom{n-i}{T}}{\binom{n}{T}} = \frac{(n-i)!\,(n-T)!}{n!\,(n-i-T)!}. \]

Y is a random variable with a distribution from 1 to n − T + 1. We know that p(i − 1) is the probability of not reaching a tokened node after the first (i − 1) hops. The conditional

³ In practice, we may allow the probes to make some random walks before starting to count the hops. Our simulation shows that it increases the randomness and returns better estimation results.

probability for the ith visited node to have a token is $\frac{T}{n-i+1}$. Clearly, $p(i-1) \frac{T}{n-i+1}$ is the probability of reaching the first tokened node at the ith hop. The expected value of Y is

\[ E(Y) = \sum_{i=1}^{n-T+1} i \, p(i-1) \, \frac{T}{n-i+1} = \sum_{i=1}^{n-T+1} i \, \frac{(n-i+1)!\,(n-T)!}{n!\,(n-i-T+1)!} \, \frac{T}{n-i+1}. \]

The variance of Y is

\[ V(Y) = \sum_{i=1}^{n-T+1} \left( i - E(Y) \right)^2 \frac{(n-i+1)!\,(n-T)!}{n!\,(n-i-T+1)!} \, \frac{T}{n-i+1}. \]

7.3.3 Random Token Distribution

The process of deriving (7) in the previous subsection does not require the assumption of a randomly connected network. Instead, it requires that the tokens are randomly distributed. Essentially, we shift the random topology requirement of CRW to a random token distribution requirement, which can be implemented by the following distributed algorithm: Every token independently moves around in the network. When a node receives a token, it holds the token for a period that is inversely proportional to its current number of neighbors, i.e., the node's degree. After that period, it forwards the token to a neighbor selected uniformly at random.

The transmission of a token may be piggybacked in the periodic hello exchange between the neighbors [138]. In order to support communications in a mobile network, each node has to periodically broadcast a hello packet (also called a beacon), which allows the node to learn its new neighbors. One bit in the hello packet can be used to encode a token transmission: '0' means there is no token carried in the hello packet, and '1' means there is.

We prove that, in the steady state, the rate at which a node receives tokens is proportional to the node's degree. Because the holding time after each receipt of a token is inversely proportional to the node's degree, the aggregate holding time at every node is about the same, which ensures the uniform random distribution of tokens.
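The steady-state claim can be checked numerically on a small example. The sketch below is only an illustration under assumptions introduced here: the four-node graph, the dictionary layout, and the helper names `degree_distribution` and `forward_once` are hypothetical. It verifies that a degree-proportional distribution is unchanged by one forwarding step, and that weighting it by a holding time inversely proportional to the degree gives every node the same share of the token's time.

```python
def degree_distribution(neighbors):
    """Candidate stationary distribution: probability of node i is
    its degree divided by the sum of all degrees."""
    total_degree = sum(len(nbrs) for nbrs in neighbors.values())
    return {i: len(nbrs) / total_degree for i, nbrs in neighbors.items()}


def forward_once(dist, neighbors):
    """Apply one token-forwarding step: node j passes the token to each
    of its neighbors with probability 1 / degree(j)."""
    nxt = {i: 0.0 for i in neighbors}
    for j, nbrs in neighbors.items():
        for i in nbrs:
            nxt[i] += dist[j] / len(nbrs)
    return nxt


# Hypothetical undirected example graph given as adjacency lists.
neighbors = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1]}

pi = degree_distribution(neighbors)
after = forward_once(pi, neighbors)
is_stationary = all(abs(pi[i] - after[i]) < 1e-12 for i in neighbors)

# Holding time proportional to 1 / degree: the time share of node i is
# proportional to pi[i] / degree(i); normalize so the shares sum to one.
raw = {i: pi[i] / len(neighbors[i]) for i in neighbors}
norm = sum(raw.values())
holding_share = {i: raw[i] / norm for i in neighbors}
```

In this example the degrees are 3, 2, 2 and 1, so the visiting rates are 3/8, 2/8, 2/8 and 1/8, while every entry of `holding_share` equals 1/4.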


Consider an arbitrary token. The movement of the token in the network is modeled as a discrete-time finite-state Markov chain. The current state is i if node i holds the token. The set of states is {1, ..., n} for the n nodes. Let $N_i$ be the set of neighbors of node i. Let $n_i = |N_i|$. The transition probability from state i to state j is

\[ p_{ij} = \begin{cases} \frac{1}{n_i} & \text{if } j \in N_i \\ 0 & \text{if } j \notin N_i \end{cases} \]

The matrix of transition probabilities is $P = (p_{ij},\ i, j \in [1, n])$. Let $\pi = (\pi_1, \pi_2, \ldots, \pi_n)$ be the stationary distribution of the Markov chain, which satisfies $\pi P = \pi$ and $\sum_{i=1}^{n} \pi_i = 1$, where $\pi_i$ represents how often node i has (or receives) the token in the steady state of the stochastic token movement process. $\pi P = \pi$ can be rewritten as

\[ \sum_{j \in N_i} \frac{\pi_j}{n_j} = \pi_i, \quad i \in [1, n]. \]

It can be easily verified that the solution is

\[ \pi_i = \frac{n_i}{\sum_{j=1}^{n} n_j}, \quad i \in [1, n]. \]

Therefore, the rate at which a node receives a particular token is proportional to its degree. Because all tokens move independently, the rate at which a node receives tokens is also proportional to its degree.

Our goal is for each node to have an equal chance of being a tokened node. To compensate for the rate difference, when a node receives a token, the holding time should be inversely proportional to the degree. In particular, we may randomly pick a holding time from an exponential distribution whose mean is inversely proportional to the degree. If a node's degree changes during this period, we use the degree when it first receives the token.

In a mobile environment, a node may depart from the network or be powered down. To support the TRW protocol, we require that if a node has a token, it must transmit the token to a neighbor before it departs or is powered down. There may be rare cases

where a node crashes and causes a permanent token loss. One solution is to release a new set of tokens periodically. Each release is identified by a sequence number s. The sequence number is initialized to one and increased by one for every subsequent release. Each token must be associated with the sequence number that identifies which release it belongs to. When a node transmits a token to a neighbor piggybacked in a hello packet, instead of using one bit, the packet carries a sequence number. If the number is zero, it means no token; otherwise, it means a token of a certain release. Each query is made with a sequence number, and each probe of the query carries that sequence number. When a node receives a probe, it discards the probe and sends the probe's hop count back to the query node only if it has a token of the same sequence number.

Let D be the period between two consecutive token releases. Suppose D is chosen to be sufficiently large, such that tokens will be randomly distributed after they have been released for a time period of D. Now, after tokens of sequence number s are released and before tokens of sequence number s + 1 are released, all queries can be made with sequence number s − 1. Moreover, all previous tokens (with sequence numbers s − 2 or smaller) are no longer useful. To remove outdated tokens, when a node receives a token with a new sequence number s, it knows that it should remove any token with sequence number s − 2 or less that it may receive in the future. Hence, as tokens with a new sequence number start to travel in the network, nodes begin to remove old, useless tokens.

7.3.4 Determining the Number of Probes

We now determine the minimum number of probes that are needed to ensure that the probability for n to fall in the range $[(1-\beta)\hat{n}, (1+\beta)\hat{n}]$ is at least $\alpha$.

First, we prove that $\bar{Y}$ is an unbiased estimate of E(Y), which is the mean hop count of a tokened random walk. Let $Y_i$ be the hop count of the tokened random walk by

the ith probe. As $\bar{Y}$ is the average hop count of the probes, we have

\[ \bar{Y} = \frac{1}{m} \sum_{i=1}^{m} Y_i \]
\[ E(\bar{Y}) = \frac{1}{m} E\left( \sum_{i=1}^{m} Y_i \right) = \frac{1}{m} \sum_{i=1}^{m} E(Y_i) \quad (7) \]

Each $Y_i$ is an independent random sample of Y. Therefore $E(Y_i) = E(Y)$. From (7), we have $E(\bar{Y}) = E(Y)$.

Next, based on the same process as in Section 7.2.3, it can be shown that if the number of probes is

\[ m = \frac{(s\,t)^2}{(\beta' \bar{Y})^2}, \quad (7) \]

then the probability for E(Y) to fall in $[(1-\beta')\bar{Y}, (1+\beta')\bar{Y}]$ is at least $\alpha$, where $\beta'$ is a given value, s is the standard deviation calculated from the received hop counts, and t is the upper $\frac{1+\alpha}{2}$ point of the t distribution with (m − 1) degrees of freedom. We have the following inequalities:

\[ (1-\beta')\bar{Y} \le E(Y) \le (1+\beta')\bar{Y} \]
\[ g^{-1}((1-\beta')\bar{Y}) \le g^{-1}(E(Y)) \le g^{-1}((1+\beta')\bar{Y}) \]
\[ g^{-1}((1-\beta')\bar{Y}) \le n \le g^{-1}((1+\beta')\bar{Y}). \]

The accuracy requirement is $(1-\beta)\hat{n} \le n \le (1+\beta)\hat{n}$ with probability $\alpha$. To meet this requirement, we select the maximum value of $\beta'$ that meets the following conditions:

\[ g^{-1}((1+\beta')\bar{Y}) \le (1+\beta)\hat{n} \]
\[ g^{-1}((1-\beta')\bar{Y}) \ge (1-\beta)\hat{n}. \quad (7) \]

Given an accuracy requirement specified by $\alpha$ and $\beta$, the query node a begins with a certain number of probes. After it receives the hop counts of the probes, node a determines $\beta'$ from the conditions in (7) and then the value of m from the probe-count formula (7). If the total number m′ of probes that have already been sent is equal to or greater than m, it returns the current value of $\hat{n}$ as the estimation of n. Otherwise, if m′ is less than m, node a sends (m − m′)

more probes and uses the returned hop counts to refine the value of m. This process continues until the total number of probes that have been sent is equal to or greater than m.

7.4 Simulation

In this section, we use simulations to evaluate the performance of our cardinality estimation methods in terms of accuracy and overhead. Our simulations are performed based on two models: (1) the street model, which estimates the number of moving vehicles in the streets of a city district; (2) the random waypoint model [10, 61, 64, 118], which is used to demonstrate the performance of our methods for other mobile P2P systems that are not restricted to streets.

7.4.1 Simulation Setup for Street Model

Suppose future cars are equipped with wireless devices that not only support user applications but also assist transportation management functions such as information gathering. One such function may be estimating the number of moving vehicles. When a vehicle is powered down, we can consider it to be in parking status (in most cases). When a vehicle is powered up with its wireless device running, we can consider it to be in moving status (in most cases).

Consider a square area of 1000 meters by 1000 meters. Let the time unit be a second. Suppose the streets and avenues are arranged in a grid pattern and the distance between two adjacent streets (or avenues) is 100 meters. The number n of moving vehicles ranges from 1,000 to 5,000. The transmission range of a vehicle is 100 meters. Each vehicle is capable of forwarding messages to any other vehicle in its transmission range. Each moving vehicle selects an arbitrary intersection as its destination and a velocity in the range of [20, 40] miles/hour, i.e., [32, 64] kilometers/hour. Then it moves first along a street and then along an avenue, or vice versa, towards the destination at the selected speed. For simplicity, we do not implement variable speed and stops at signal lights.

We use CRW and TRW to estimate the number n of vehicles in the area. Both methods send probe messages to the network. For CRW, when a vehicle receives a probe message, it holds the message for 10 seconds before forwarding it. This holding period allows each vehicle to have a chance to change neighbors. TRW does not need the random-neighbor assumption, and therefore the holding period for probe messages is not necessary. For TRW, if a vehicle receives a token, it keeps the token for 10 to 100 seconds before passing it to another vehicle. The actual time that it holds the token is inversely proportional to the number of its neighbors. We set the number of tokens in the network to be 100. In order to increase randomness, we let probe messages make some random walks before starting to count the hops.

Assume each vehicle knows its geographic location (possibly through GPS). A vehicle also knows approximately the geographic locations of its neighbors, which are piggybacked in the hello exchanges. We implement geographic routing [67] for the hop count result to be sent back to the query node whenever a probe message completes its random walk. The location of the query node is carried by the probe message.

7.4.2 Accuracy and Overhead under Street Model

Figure 7-4 presents the estimations made by CRW. Each point in the figure represents one simulation result; the x coordinate of the point is the actual number n of vehicles, and the y coordinate is the estimated number $\hat{n}$. An estimation is more accurate if the point is closer to the equality line, y = x. In the leftmost plot, we let $\alpha$ = 95% and $\beta$ = 0.2, which requires that 95% of the estimated numbers fall within 20% of the true numbers. In the middle plot, $\alpha$ = 99% and $\beta$ = 0.2. In the rightmost plot, $\alpha$ = 95% and $\beta$ = 0.1. Our statistical measurement based on the data points in the figure confirms that the accuracy requirement is indeed met. Figure 7-5 presents the estimations made by TRW. Again, the accuracy requirement is met.

Next, we investigate the overhead of the two methods. For CRW, the overhead is caused by probe message transmissions. For TRW, the overhead comes from both

probe transmissions and token transmissions. A probe transmission is a separate packet with headers at the MAC, network, and application (CRW/TRW) layers, as well as physical-layer cost. A token transmission is piggybacked in the hello exchanges, which incurs far smaller overhead than a probe. Hence, we only count the probe message transmissions in the overhead comparison. Figure 7-6 compares CRW and TRW in terms of the total number of message transmissions. For each data point, we make 100 simulation runs, average the overhead results, and present the mean overhead and its standard deviation. TRW outperforms CRW. Its communication overhead is only a fraction of CRW's overhead.

For a circled random walk, the time for a probe to travel each hop has two components: the holding period and the transmission time from one node to the next. Compared with the holding time (10 seconds in this simulation), the transmission time is negligible. Hence, our simulation only considers the holding time. The average completion time of CRW (including all its probes) is shown in Table 7-1. Because of the long holding period, the time for CRW is significant. However, it is probably acceptable in practice to spend six to fifteen minutes on the task of measuring the number of vehicles in a city district. Because TRW does not need any holding period, its completion is much quicker. When n = 1000 and the number of tokens is 100, each tokened random walk has an average of just 10 hops, and only the transmission time is needed.

Table 7-1. Average completion time of Circled Random Walk (CRW)

n                    1000   2000   3000    4000    5000
CRW time (min'sec)   6'41   9'18   11'35   13'32   14'59

7.4.3 Simulation Setup for Random Waypoint Model

We study how our methods perform for arbitrary mobile P2P networks under the random waypoint model. Consider a square area of 1000 × 1000 units of length. The number n of wireless nodes ranges from 1,000 to 5,000. The transmission range of the nodes is 100 units of length. Each node is capable of forwarding messages to any

other node in its transmission range. We use the random waypoint model to simulate the mobility of nodes: Each node randomly selects a position within the area as its destination and a velocity in the range of [5, 30]. Then it moves towards the destination at the selected speed. After it arrives at the destination, it moves again with a new destination and a new velocity. Other parameters are similar to those in Section 7.4.1, with the second being replaced by an abstract unit of time, in correspondence with the abstract unit of length above.

7.4.4 Accuracy and Overhead under the Random Waypoint Model

We first present the estimation accuracy of the two methods. We vary the number n of wireless nodes from 1000 to 5000. We run the simulation under three different accuracy requirements: $\alpha$ = 95%, $\beta$ = 20%; $\alpha$ = 99%, $\beta$ = 20%; and $\alpha$ = 95%, $\beta$ = 10%.

Figures 7-7 and 7-8 compare the actual number of nodes, n, and the estimated number, $\hat{n}$. Each point in the figures represents one simulation result; the x coordinate of the point is the actual number n of nodes, and the y coordinate is the estimated number $\hat{n}$. In the leftmost plot, the requirement specified by $\alpha$ and $\beta$ is that the estimated number $\hat{n}$ should have a 95% chance to fall within 20% of the actual number n. Our statistical measurement based on the data points in the figure shows that this requirement is met. The same is true for the other plots in Figures 7-7 and 7-8 with different combinations of $\alpha$ and $\beta$ values.

Figure 7-9 compares CRW and TRW in terms of the total number of message transmissions. From the figure, we observe that TRW is much more efficient, only requiring 20% to 30% of the overhead incurred by CRW.

7.4.5 Performance under Low Mobility

As we have shown in the previous simulations, CRW produces good estimation when the nodes move relatively fast. Even though the random-neighbor assumption is not accurate, it approximates the highly mobile scenarios well. However, our next set

ofsimulationsshowthatCRWdoesnotworkwellfornetworksoflowmobility.Onthecontrary,TRWremainsaccurateforthosenetworks.ThesimulationsetupisthesameasdescribedinSection 7.4.3 exceptthatthevelocityofeachnodeisrandomlyselectedfrom[3,10],insteadof[5,30],whenthenodemovesfromonelocationtoanother.Figure 7-10 showsthatCRWconsistentlyunder-estimatesthenumberofnodesinanetworkoflowmobility.Thatisbecausenodeshaverelativelystableneighborsandastheyforwardprobemessages,themessagestendtomakeshortcycleswithinsmallneighborhood,whichleadstotheunder-estimation.Figure 7-11 showsthatTRWremainsaccuratefornetworksoflowmobilitybecauseitsdesigndoesnotrelyontherandomneighborassumption.Figure 7-12 comparestheoverheadofthetwomethods.Again,TRWoutperformsCRWsignicantly.Overall,oursimulationresultsareconsistentacrossthestreetmodelandtherandomwaypointmodel.Insummary,TRWisabettermethodintermsofcommunicationoverhead,anditisalsobetterintermsofaccuracyforlow-mobilitynetworks.CRWisabletoproduceresultsthatmeettheaccuracyrequirementinVP2Porothermobilenetworkswithrelativelyhighmobility.Itsadvantageissimplicity(withouttheassistanceoftokens). 7.5SummaryThischapterinvestigatestheproblemofhowtoestimatethenumberofnodesinalargeVP2Pnetwork.Weproposetwostatisticalmethods,calledthecircledrandomwalkandthetokenedrandomwalk,respectively.Theproposedmethodsareadjustableforestimationaccuracyandcommunicationoverhead.Theirestimationprocessinvolvesonlyasubsetofthenodes,andtheestimationerrorscanbemadearbitrarilysmall.Whilethecircledrandomwalkmethodrequiresarandomizedneighboringrelationshipamongthenodesanditworkswellinhigh-mobilitynetworks,thetokenedrandomwalk 158

PAGE 159

ismorepracticalforalltypesofVP2Porothermobilenetworksbecauseitremovestherandom-neighborrequirementthrougharandomizedtokendistributionprocess. 159
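The accuracy requirement used in the simulations of Section 7.4.4 (for example, a 95% chance that the estimate falls within 20% of the actual number of nodes) can be checked against a batch of simulation results with a few lines of Python. The following is only a minimal sketch: the function name, the parameter names, and the sample numbers are illustrative assumptions and are not taken from the simulation code or results of this work.

```python
def meets_accuracy_requirement(actual_n, estimates, confidence, error_bound):
    """Return True if at least a `confidence` fraction of the estimates
    lies within a relative error of `error_bound` around `actual_n`.

    All names here are hypothetical; they are not from the dissertation.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    # Count the estimates whose absolute error is within the allowed bound.
    within = sum(1 for est in estimates
                 if abs(est - actual_n) <= error_bound * actual_n)
    return within / len(estimates) >= confidence


# Made-up estimates for a network of 1000 nodes (illustration only).
sample = [950, 1100, 1190, 800, 1300]
print(meets_accuracy_requirement(1000, sample, confidence=0.95, error_bound=0.20))
```

In this made-up sample, four of the five estimates fall within 20% of 1000, so a 95% requirement is not met, while a 75% requirement would be.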


Figure 7-2. E(X) as a function of n.
Figure 7-3. Illustration of TRW.


Figure 7-4. CRW's estimation accuracy under the street model.
Figure 7-5. TRW's estimation accuracy under the street model.
Figure 7-6. Number of message transmissions for each estimation, under the street model.


Figure 7-7. CRW's estimation accuracy under the random waypoint model with nodal moving velocity in the range of [5, 30].
Figure 7-8. TRW's estimation accuracy under the random waypoint model with nodal moving velocity in the range of [5, 30].
Figure 7-9. Number of message transmissions for each estimation, under the random waypoint model with nodal moving velocity in the range of [5, 30].
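The random waypoint mobility behind Figures 7-7 through 7-9 (a 1000 × 1000 area, a transmission range of 100, and velocities drawn from [5, 30]) can be sketched as below. This is an illustrative sketch rather than the simulator used for the figures. The class and function names, the uniformly random initial placement, and the unit time step are all assumptions.

```python
import math
import random

AREA = 1000.0               # side length of the square area (units of length)
SPEED_RANGE = (5.0, 30.0)   # velocity range; (3.0, 10.0) gives the low-mobility case
TX_RANGE = 100.0            # transmission range (units of length)


class Node:
    """A node moving under the random waypoint model (hypothetical helper)."""

    def __init__(self, rng):
        self.rng = rng
        # Assumption: nodes start at a uniformly random position.
        self.x = rng.uniform(0.0, AREA)
        self.y = rng.uniform(0.0, AREA)
        self._pick_waypoint()

    def _pick_waypoint(self):
        # Choose a new random destination and a new random velocity.
        self.dest = (self.rng.uniform(0.0, AREA), self.rng.uniform(0.0, AREA))
        self.speed = self.rng.uniform(*SPEED_RANGE)

    def step(self, dt=1.0):
        """Advance the node by dt units of time."""
        remaining = dt
        while remaining > 0:
            dx = self.dest[0] - self.x
            dy = self.dest[1] - self.y
            dist = math.hypot(dx, dy)
            travel = self.speed * remaining
            if travel < dist:
                # Move part of the way towards the current destination.
                self.x += dx / dist * travel
                self.y += dy / dist * travel
                return
            # Arrive, then spend any leftover time heading to a new waypoint.
            self.x, self.y = self.dest
            remaining -= dist / self.speed
            self._pick_waypoint()


def neighbors(nodes, i):
    """Indices of the nodes within transmission range of node i."""
    me = nodes[i]
    return [j for j, other in enumerate(nodes)
            if j != i and math.hypot(other.x - me.x, other.y - me.y) <= TX_RANGE]


rng = random.Random(1)
nodes = [Node(rng) for _ in range(1000)]
for _ in range(10):
    for node in nodes:
        node.step()
print(len(neighbors(nodes, 0)))
```

Because every waypoint lies inside the square and nodes move along straight segments between waypoints, positions never leave the area. With a slower velocity range the neighbor sets returned by `neighbors` change more slowly, which is the situation examined in the low-mobility experiments.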


Figure 7-10. CRW's estimation accuracy under the random waypoint model with nodal moving velocity in the range of [3, 10].
Figure 7-11. TRW's estimation accuracy under the random waypoint model with nodal moving velocity in the range of [3, 10].
Figure 7-12. Number of message transmissions for each estimation, under the random waypoint model with nodal moving velocity in the range of [3, 10].


CHAPTER 8
CONCLUSION AND FUTURE WORK

In this work, we propose several space-time efficient data structures and protocols and place them into various application scenarios. We first study a new family of space-time efficient Bloom filters that provide the same compactness as the Bloom filter but require much fewer memory accesses for each lookup. We open a new design space with trade-offs among space usage, false positive ratio, memory access, and number of hash bits. We show the design trade-offs through theoretical analysis and experiments. Our proposed family of Bloom filter variants, the Bloom-g filter, consistently uses fewer memory accesses and fewer hash bits than the Bloom filter when k is the same. If we allow the Bloom-g filter to use as many hash bits as the standard Bloom filter, it produces even better results in terms of the false positive ratio. When a small k is used, the Bloom-1 filter makes only one memory access and has a false positive ratio comparable to that of the standard Bloom filter. When the optimal k is used and the load factor is small, Bloom-1 introduces many more false positives than the standard Bloom filter due to unbalanced load among different words. As a compensation, the Bloom-2 filter and the Bloom-3 filter make 2 and 3 memory accesses, respectively, yet produce false positive ratios comparable to that of the standard Bloom filter. In conclusion, we suggest using Bloom-1 as a substitute for the Bloom filter when k is small, and using Bloom-2 or Bloom-3 to replace the Bloom filter with optimal k. In order to allow more design trade-offs, we propose another Bloom filter variant, the Bloom-α filter, which makes 1.x memory accesses on average and produces false positive ratios that are between those of Bloom-1 and Bloom-2.

Then, we further propose an idea for representing multi-sets, where a set ID is associated with each member element. Our idea is to separate membership encoding and set ID storage into two data structures, called the index encoder and the set-id table, respectively. The index encoder adopts ideas from the Bloom-g filter, which takes g memory accesses to check the membership of a given element and to locate the index where the set-id is stored. The set-id table is a multi-hashing table employing a novel load-to-left, candidate-to-right policy for element placement. These techniques together allow the proposed multi-set membership lookup function to work in very compact memory, take only a few memory accesses and hash operations for each lookup, and have much lower error probabilities when compared with alternative data structures.

Next, we design another Bloom filter variant, called the adaptive Bloom filter, for joining two sets stored distributively in the network. The idea is to iteratively eliminate non-candidates using small Bloom filters. We transform the problem into an optimization problem. Our design outperforms the Bloom filter and the compressed Bloom filter in terms of exchanged message size.

After that, we apply the Bloom-1 filter idea together with the partitioned Bloom filter in the design of efficient information collection protocols for large-scale RFID systems. Our primary objective is to lower energy consumption by tags in order to extend their lifetime. TOP shares similarity with the Bloom-1 filter, in that tags only listen to one vector that the reader broadcasts for necessary information. ETOP is an extension of TOP, adopting the idea of the partitioned Bloom filter. The new protocols can be configured to achieve O(1) energy cost per tag. A performance tradeoff between energy cost and execution time can be made by controlling the size of the reporting-order vector. Simulation results show that the new protocols are able to cut energy cost by more than an order of magnitude when compared with other protocols.

Lastly, we design two efficient protocols for estimating the network size of mobile vehicular peer-to-peer networks. Existing solutions either require flooding the network or take too long to converge. We present two novel statistical methods, called the circled random walk and the tokened random walk, to address this problem. While the circled random walk method requires a randomized neighboring relationship among the nodes and works well in high-mobility networks, the tokened random walk is more practical for all types of VP2P or other mobile networks because it removes the random-neighbor requirement through a randomized token distribution process. Both methods provide cardinality estimation by involving only a small subset of the nodes (vehicles). They make a tradeoff between overhead and estimation accuracy. The estimation error can be made arbitrarily small at the expense of larger overhead.

Future research will continue in several directions. We will further explore the design space of the adaptive Bloom filter, for example, when it is used for multi-party information sharing. We will test it using real network traces. Also, for the multi-set membership checking problem, we proposed an idea for element update/deletion. In practice, there will be practical issues, such as the introduction of false negatives. How to reduce the downside as much as possible while utilizing its advantage remains an open problem.
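To make the one-memory-access property of the Bloom-1 filter concrete, the following minimal Python sketch maps every element to a single word and sets (or tests) k bits inside that word only. It is a sketch under stated assumptions, not the implementation evaluated in this work: the class name, the use of SHA-256 as the source of hash bits, and the default parameters are all illustrative choices.

```python
import hashlib


class Bloom1:
    """Sketch of a Bloom-1 style filter: k membership bits in one word."""

    def __init__(self, num_words, word_bits=64, k=3):
        self.num_words = num_words
        self.word_bits = word_bits
        self.k = k
        self.words = [0] * num_words   # each entry models one memory word

    def _word_and_mask(self, item):
        # Assumption: SHA-256 is used only as a convenient source of hash bits.
        value = int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest(), "big")
        word_index = value % self.num_words      # hash bits that select the word
        value //= self.num_words
        mask = 0
        for _ in range(self.k):                  # hash bits that select k bit positions
            mask |= 1 << (value % self.word_bits)
            value //= self.word_bits
        return word_index, mask

    def add(self, item):
        index, mask = self._word_and_mask(item)
        self.words[index] |= mask

    def __contains__(self, item):
        index, mask = self._word_and_mask(item)
        # A lookup reads a single word and tests all k bits at once.
        return (self.words[index] & mask) == mask


bf = Bloom1(num_words=1024)
bf.add("10.0.0.1")
print("10.0.0.1" in bf)
```

A lookup needs roughly log2(num_words) + k * log2(word_bits) hash bits, fewer than the k * log2(m) bits a standard Bloom filter with m bits would need for the same k. Because all k bits of an element share one word, heavily loaded words produce more false positives, which is the load imbalance noted above for the optimal-k case.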




BIOGRAPHICAL SKETCH

Yan Qiao received her Ph.D. in computer engineering from the University of Florida in the spring of 2014. She received her B.S. degree in computer science and technology from Shanghai Jiao Tong University, China, in 2009. Her research interests include network measurement, network algorithms, and RFID protocols.