Efficient Statistical Measurement Methods in Wired and Wireless Systems


Material Information

Title:
Efficient Statistical Measurement Methods in Wired and Wireless Systems
Physical Description:
1 online resource (140 p.)
Language:
English
Creator:
Li, Tao
Publisher:
University of Florida
Place of Publication:
Gainesville, Fla.
Publication Date:

Thesis/Dissertation Information

Degree:
Doctorate (Ph.D.)
Degree Grantor:
University of Florida
Degree Disciplines:
Computer Engineering, Computer and Information Science and Engineering
Committee Chair:
Chen, Shigang
Committee Members:
Sahni, Sartaj
Wong, Tan F
Helmy, Ahmed H.
Fang, Yuguang

Subjects

Subjects / Keywords:
measurement -- networks -- rfid -- traffic
Computer and Information Science and Engineering -- Dissertations, Academic -- UF
Genre:
Computer Engineering thesis, Ph.D.
bibliography (marcgt)
theses (marcgt)
government publication (state, provincial, territorial, dependent) (marcgt)
born-digital (sobekcm)
Electronic Thesis or Dissertation

Notes

Abstract:
Traffic measurement provides critical real-world data for service providers and network administrators to perform capacity planning, accounting and billing, anomaly detection, and service provision. We observe that in many measurement functions, statistical methods play important roles in system design, model building, formula derivation, and error analysis. In this dissertation, we first propose several novel online measurement functions for high-speed networks. We then observe that statistical methods for measurement problems are generic across many network systems. They can also be applied to wireless systems such as RFID (radio frequency identification) systems, which have been gaining popularity for inventory control, object tracking, and supply chain management in warehouses, retail stores, hospitals, etc. The second part of the dissertation studies the RFID estimation problem and designs two probabilistic algorithms for it.

One of the greatest challenges in designing an online measurement module is to minimize the per-packet processing time in order to keep up with the line speed of modern routers. To meet this challenge, we should minimize the number of memory accesses per packet and implement the measurement module in on-die SRAM, which is fast but expensive. Because many other essential routing, security, and performance functions may also run from SRAM, the amount of high-speed memory allocated to the module is expected to be small. Hence, it is critical to make the measurement module's data structure as compact as possible.

The first work of this dissertation focuses on a particularly challenging problem: the measurement of per-flow information in high-speed networks. We design a fast and compact measurement function that estimates the sizes of all flows. It achieves the optimal processing speed of 2 memory accesses per packet. In addition, it provides reasonable measurement accuracy in a tight space where the best existing methods no longer work. Our design is based on a new data encoding/decoding scheme called randomized counter sharing. This scheme allows us to mix per-flow information together in storage for compactness and, at decoding time, to separate the information of each flow by statistically removing the error that other flows introduce during information mixing. The effectiveness of our online per-flow measurement approach is analyzed and confirmed through extensive experiments based on real network traffic traces. We also propose several methods to increase the estimation range of flow sizes.

Our second work studies scan detection, one of the most fundamental functions in intrusion detection systems. We propose an efficient scan detection scheme based on dynamic bit sharing, which incorporates probabilistic sampling and bit sharing for compact information storage. We design a maximum likelihood estimation method to extract per-source information from the shared bits in order to identify the scanners. Our new scheme ensures that the false positive and false negative ratios are bounded with high probability. Moreover, given an arbitrary set of bounds, we develop a systematic approach to determine the optimal system parameters that minimize the amount of memory needed to meet the bounds. Experiments based on a real Internet traffic trace demonstrate that the proposed scan detection scheme reduces memory consumption by a factor of three to twenty compared with the best existing work.

Origin-destination flow measurement is the focus of our third work. An origin-destination (OD) flow between two routers is the set of packets that pass both routers in a network. We design a new measurement method that employs a compact data structure for packet information storage and uses a novel statistical inference approach for OD-flow size estimation. Compared with existing approaches, the proposed method not only requires less per-packet processing overhead but also achieves much better measurement accuracy. We perform both simulations and experiments to demonstrate the effectiveness of our method.

Our last work focuses on estimating the number of RFID tags deployed in a large area, which has many important applications in inventory management and theft detection. Prior works focus on designing time-efficient algorithms that can estimate tens of thousands of tags in seconds. We observe that, for an RFID reader to access tags in a large area, active tags are likely to be used because of their longer operational ranges. These tags are battery-powered and use their own energy for information transmission. However, recharging batteries for tens of thousands of tags is laborious. Hence, conserving the energy of active tags becomes critical. Some prior works have studied how to reduce the energy expenditure of an RFID reader when it reads tag IDs. We study how to reduce the amount of energy consumed by active tags during the process of estimating the number of tags in a system. We design two energy-efficient probabilistic estimation algorithms that iteratively refine a control parameter to optimize the information carried in transmissions from tags, such that both the number and the size of transmissions are reduced. These algorithms can also take time efficiency into consideration. By tuning a contention probability parameter ω, the new algorithms can trade off energy cost against estimation time.
General Note:
In the series University of Florida Digital Collections.
General Note:
Includes vita.
Bibliography:
Includes bibliographical references.
Source of Description:
Description based on online resource; title from PDF title page.
Source of Description:
This bibliographic record is available under the Creative Commons CC0 public domain dedication. The University of Florida Libraries, as creator of this bibliographic record, has waived all rights to it worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law.
Statement of Responsibility:
by Tao Li.
Thesis:
Thesis (Ph.D.)--University of Florida, 2012.
Local:
Adviser: Chen, Shigang.
Electronic Access:
RESTRICTED TO UF STUDENTS, STAFF, FACULTY, AND ON-CAMPUS USE UNTIL 2013-05-31

Record Information

Source Institution:
UFRGP
Rights Management:
Applicable rights reserved.
Classification:
lcc - LD1780 2012
System ID:
UFE0044106:00001




Full Text

EFFICIENT STATISTICAL MEASUREMENT METHODS IN WIRED AND WIRELESS SYSTEMS

By

TAO LI

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2012

© 2012 Tao Li

To my parents and my wife

ACKNOWLEDGMENTS

First of all, I would like to gratefully and sincerely thank my adviser, Prof. Shigang Chen, for his great support, guidance, understanding, and, most importantly, his friendship during my graduate study at the University of Florida. He is an incredible adviser, a passionate scientist, and a terrific person. His consistent support and encouragement helped me to overcome many crisis situations and finish my Ph.D. study. For everything you have done for me, Prof. Chen, I want to say thank you from the bottom of my heart.

I am grateful to Prof. Sartaj Sahni, Prof. Yuguang Fang, Prof. Ahmed Helmy, and Prof. Tan Wong for their advice and support during my study at the University of Florida. I would also like to thank all the members of Prof. Chen's group for their help. They are Liang Zhang, MyungKeun Yoon, Ying Jian, Wen Luo, Yan Qiao, Zhen Mo, Yian Zhou, Yangbae Park, Min Chen, and especially Ming Zhang, who offered me a lot of suggestions and encouragement.

I would also like to thank my friends at the University of Florida. They are Ying Xuan, Ting Chen, Bing Jian, Yan Li, Lu Chen, Zhuo Huang, Xuelian Xiao, Xiao Li, Wenjie Yuan, Jianmin Chen, Jingyi Wang, Long Yu, Yuchen Xie, Hechen Liu, Meizhu Liu, Yixing Yang, Shuang Zhao, Junjie Li, Mingmin Zhu, Lin Qi, Yan Deng, Shuang Lin, and Xiangguo Li. You made my life full of fun throughout my graduate study, and I really enjoy being friends with you.

Finally, and most importantly, I am thankful to each member of my family: my parents, my brother, my sister-in-law, my little niece, and especially my wife Xiaoqian. Thank you all for your support, understanding, encouragement, and love for so many years.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER

1 INTRODUCTION
1.1 Online Network Functions
1.2 Fundamental Primitives
1.3 Per-Flow Traffic Measurement through Randomized Counter Sharing
1.4 Scan Detection in High-Speed Networks
1.5 Origin-Destination Flow Measurement in High-Speed Networks
1.6 Size Estimation Problem in RFID Systems
1.7 Outline of the Dissertation

2 PER-FLOW TRAFFIC MEASUREMENT THROUGH RANDOMIZED COUNTER SHARING
2.1 Performance Metrics
2.1.1 Processing Time
2.1.2 Storage Overhead
2.1.3 Estimation Accuracy
2.2 System Design
2.2.1 Basic Idea
2.2.2 Overall Design
2.3 State of the Art
2.4 Online Data Encoding
2.5 Offline Counter Sum Estimation
2.5.1 Estimation Method
2.5.2 Estimation Accuracy
2.5.3 Confidence Interval
2.6 Maximum Likelihood Estimation
2.6.1 Estimation Method
2.6.2 Estimation Accuracy
2.7 Setting Counter Length
2.8 Flow Labels
2.9 Experiments
2.9.1 Processing Time
2.9.2 Memory Overhead and Estimation Accuracy
2.10 Extension of Estimation Range
2.10.1 Increasing Counter Size b
2.10.2 Increasing Storage Vector Size l
2.10.3 Employing Sampling Module
2.10.4 Hybrid SRAM/DRAM Design
2.11 Summary

3 SCAN DETECTION IN HIGH-SPEED NETWORKS
3.1 Problem Statement
3.2 Related Work
3.3 An Efficient Scan Detection Scheme
3.3.1 Probabilistic Sampling
3.3.2 Bit-Sharing Storage
3.3.3 Maximum Likelihood Estimation and Scanner Report
3.3.4 Variance of V_m
3.3.5 Source Addresses
3.4 Optimal System Parameters and Minimum Memory Requirement
3.4.1 Report Probability
3.4.2 Constraints for the System Parameters
3.4.3 Optimal System Parameters
3.5 Experiments
3.5.1 Experimental Setup
3.5.2 Comparison in Terms of Memory Requirement
3.5.3 Comparison in Terms of False Positive Ratio and False Negative Ratio
3.6 Summary

4 ORIGIN-DESTINATION FLOW MEASUREMENT IN HIGH-SPEED NETWORKS
4.1 Problem Statement and Performance Metrics
4.1.1 Problem Statement
4.1.2 Per-packet Processing Overhead
4.1.3 Measurement Accuracy
4.2 Related Work
4.3 Origin-Destination Flow Measurement
4.3.1 Straightforward Approaches and Their Limitations
4.3.2 ODFM: Motivation and Overview
4.3.3 ODFM: Storing the Packet Information
4.3.4 ODFM: Measuring the Size of Each OD Flow
4.3.4.1 Measure n1 and n2
4.3.4.2 Measure n_c
4.3.5 Measurement Accuracy
4.4 Simulations
4.4.1 Processing Overhead
4.4.2 Measurement Accuracy
4.5 Experiments
4.5.1 Number of Packets for an Origin-Destination Pair
4.5.2 Processing Overhead
4.5.3 Measurement Accuracy
4.6 Summary

5 SIZE ESTIMATION PROBLEM IN RFID SYSTEMS
5.1 Related Work
5.2 Problem Definition and System Model
5.2.1 RFID Estimation Problem
5.2.2 Active Tags
5.2.3 Communication Protocol
5.2.4 Empty/Singleton/Collision Slots
5.3 Generalized Maximum Likelihood Estimation Algorithm
5.3.1 Overview
5.3.2 Initialization Phase
5.3.3 Iterative Phase
5.3.3.1 Compute the value of N̂_i
5.3.3.2 Termination Condition
5.3.4 Determine the value of ω
5.3.4.1 Number of Pollings
5.3.4.2 Number of Responses
5.3.4.3 Summary
5.3.5 Request-less Pollings
5.3.6 Information Loss due to Collision
5.4 Enhanced Generalized Maximum Likelihood Estimation Algorithm
5.4.1 Overview
5.4.2 Iterative Phase
5.4.2.1 Compute the number of responses
5.4.2.2 Compute the value of N̂_i
5.4.2.3 Termination Condition
5.4.3 Performance Tradeoff
5.4.3.1 Number of Pollings
5.4.3.2 Number of Responses
5.4.3.3 Summary
5.5 Simulations
5.5.1 Number of Responses
5.5.2 Total Number of Bits Transmitted
5.5.3 Estimation Time
5.6 Summary

6 CONCLUSIONS

REFERENCES

BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Number of memory accesses and number of hash computations per packet
3-1 Memory requirements (in MB) of ESD, TFA and ESD-1 (i.e., ESD with p = 1) when = 0.9 and = 0.1
3-2 Memory requirements (in MB) of ESD, TFA and ESD-1 (i.e., ESD with p = 1) when = 0.95 and = 0.05
3-3 False negative ratio and false positive ratio of ESD, CSE and TBA with m = 0.05 MB
4-1 Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 6,000,000
4-2 Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 300,000
4-3 Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 100,000
4-4 Number of memory accesses and number of hash operations per packet with the values of n1 and n2 randomly assigned between 100,000 and 10,000,000
4-5 Number of memory accesses and number of hash operations per packet
5-1 Number of Responses when = 90%, = 9%
5-2 Number of Responses when = 90%, = 6%
5-3 Number of Responses when = 90%, = 3%
5-4 Number of Responses when = 95%, = 9%
5-5 Number of Responses when = 95%, = 6%
5-6 Number of Responses when = 95%, = 3%
5-7 Number of Responses when = 99%, = 9%
5-8 Number of Responses when = 99%, = 6%
5-9 Number of Responses when = 99%, = 3%

LIST OF FIGURES

2-1 Traffic distribution: each point shows the number (y coordinate) of flows that have a certain size (x coordinate).
2-2 Estimation results by CSE and MLM when M = 2 Mb.
2-3 Estimation results by CSE and MLM when M = 4 Mb.
2-4 Estimation results by CSE and MLM when M = 8 Mb.
2-5 Estimation results by CB when M = 2, 4, and 8 Mb.
2-6 Estimation results by MRSCBF.
2-7 Estimation results by MLM when b = 6, 7, 8, and 9. In these experiments, n = 10M, M = 4 Mb.
2-8 Estimation bias and standard deviation in the experimental results shown in Figure 2-7
2-9 Estimation results by MLM when l = 50, 70, 100, and 1000. In these experiments, n = 10M, M = 4 Mb.
2-10 Estimation bias and standard deviation in the experimental results shown in Figure 2-9
2-11 Estimation results by MLM when p = 75%, 50%, 25%, and 2%. In these experiments, n = 10M, M = 4 Mb.
2-12 Estimation bias and standard deviation in the experimental results shown in Figure 2-11
3-1 The relative standard deviation, Std(V_m)/E(V_m), approaches zero as m increases.
3-2 (A) The curve (without the arrows) shows the value of Potential(m, s, p) with respect to p when m = 0.45 MB and s = 150. (B) The arrows illustrate the operation of OptimalP(m, s).
3-3 The value of Potential(m, s, OptimalP(m, s)) with respect to s when m = 0.25 MB.
3-4 Traffic distribution: each point shows the number of sources having a certain spread value.
4-1 The relation between two routers r1 and r2
4-2 Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 6,000,000.
4-3 Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 300,000.
4-4 Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 100,000.
4-5 Estimation results by ODFM and QMLE when the values of n1 and n2 are randomly assigned between 100,000 and 10,000,000.
4-6 The number of packets for 100 OD pairs.
4-7 Estimation results by ODFM and QMLE when n1 = 1,000,000 and n2 = 1,000,000.
5-1 The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval.
5-2 The solid line shows the number of pollings with respect to ω when = 95% and = 5%. The dotted line shows the number of responses.
5-3 The collision probability with respect to the frame size f.
5-4 The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval.
5-5 The solid line shows the number of pollings with respect to ω when = 95% and = 5%. The dotted line shows the number of responses.
5-6 Numbers of bits transmitted when = 90%, = 9%, 6% and 3%.
5-7 Numbers of bits transmitted when = 95%, = 9%, 6% and 3%.
5-8 Numbers of bits transmitted when = 99%, = 9%, 6% and 3%.
5-9 Estimation times of the algorithms when = 90%, = 9%, 6% and 3%.
5-10 Estimation times of the algorithms when = 95%, = 9%, 6% and 3%.
5-11 Estimation times of the algorithms when = 99%, = 9%, 6% and 3%.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

EFFICIENT STATISTICAL MEASUREMENT METHODS IN WIRED AND WIRELESS SYSTEMS

By

Tao Li

May 2012

Chair: Shigang Chen
Major: Computer Engineering

Traffic measurement provides critical real-world data for service providers and network administrators to perform capacity planning, accounting and billing, anomaly detection, and service provision. We observe that in many measurement functions, statistical methods play important roles in system design, model building, formula derivation, and error analysis. In this dissertation, we first propose several novel online measurement functions for high-speed networks. We then observe that statistical methods for measurement problems are generic across many network systems. They can also be applied to wireless systems such as RFID (radio frequency identification) systems, which have been gaining popularity for inventory control, object tracking, and supply chain management in warehouses, retail stores, hospitals, etc. The second part of the dissertation studies the RFID estimation problem and designs two probabilistic algorithms for it.

One of the greatest challenges in designing an online measurement module is to minimize the per-packet processing time in order to keep up with the line speed of modern routers. To meet this challenge, we should minimize the number of memory accesses per packet and implement the measurement module in on-die SRAM, which is fast but expensive. Because many other essential routing, security, and performance functions may also run from SRAM, the amount of high-speed memory allocated to the module is expected to be small. Hence, it is critical to make the measurement module's data structure as compact as possible.

The first work of this dissertation focuses on a particularly challenging problem: the measurement of per-flow information in high-speed networks. We design a fast and compact measurement function that estimates the sizes of all flows. It achieves the optimal processing speed of 2 memory accesses per packet. In addition, it provides reasonable measurement accuracy in a tight space where the best existing methods no longer work. Our design is based on a new data encoding/decoding scheme called randomized counter sharing. This scheme allows us to mix per-flow information together in storage for compactness and, at decoding time, to separate the information of each flow by statistically removing the error that other flows introduce during information mixing. The effectiveness of our online per-flow measurement approach is analyzed and confirmed through extensive experiments based on real network traffic traces. We also propose several methods to increase the estimation range of flow sizes.

Our second work studies scan detection, one of the most fundamental functions in intrusion detection systems. We propose an efficient scan detection scheme based on dynamic bit sharing, which incorporates probabilistic sampling and bit sharing for compact information storage. We design a maximum likelihood estimation method to extract per-source information from the shared bits in order to identify the scanners. Our new scheme ensures that the false positive and false negative ratios are bounded with high probability. Moreover, given an arbitrary set of bounds, we develop a systematic approach to determine the optimal system parameters that minimize the amount of memory needed to meet the bounds. Experiments based on a real Internet traffic trace demonstrate that the proposed scan detection scheme reduces memory consumption by a factor of three to twenty compared with the best existing work.

Origin-destination flow measurement is the focus of our third work. An origin-destination (OD) flow between two routers is the set of packets that pass both routers in a network. We design a new measurement method that employs a compact data structure for packet information storage and uses a novel statistical inference approach for OD-flow size estimation. Compared with existing approaches, the proposed method not only requires less per-packet processing overhead but also achieves much better measurement accuracy. We perform both simulations and experiments to demonstrate the effectiveness of our method.

Our last work focuses on estimating the number of RFID tags deployed in a large area, which has many important applications in inventory management and theft detection. Prior works focus on designing time-efficient algorithms that can estimate tens of thousands of tags in seconds. We observe that, for an RFID reader to access tags in a large area, active tags are likely to be used because of their longer operational ranges. These tags are battery-powered and use their own energy for information transmission. However, recharging batteries for tens of thousands of tags is laborious. Hence, conserving the energy of active tags becomes critical. Some prior works have studied how to reduce the energy expenditure of an RFID reader when it reads tag IDs. We study how to reduce the amount of energy consumed by active tags during the process of estimating the number of tags in a system. We design two energy-efficient probabilistic estimation algorithms that iteratively refine a control parameter to optimize the information carried in transmissions from tags, such that both the number and the size of transmissions are reduced. These algorithms can also take time efficiency into consideration. By tuning a contention probability parameter ω, the new algorithms can trade off energy cost against estimation time.

CHAPTER 1
INTRODUCTION

1.1 Online Network Functions

Modern high-speed routers forward packets from incoming ports to outgoing ports via switching fabric, bypassing main memory and CPU. New technologies are pushing line speeds beyond OC-768 (40 Gb/s) to reach 100 Gb/s or even terabits per second [47]. The line cards in core routers must therefore forward packets at a rate exceeding 150 Mpps [104]; that leaves no more than 6.7 ns to process each packet. Parallel processing and pipelining are used to speed up packet switching to a few clock cycles per packet [51]. In order to keep up with such high throughput, online network functions for traffic measurement, packet scheduling, access control, and quality of service will also have to be implemented using on-chip cache memory, bypassing main memory and CPU almost entirely [75, 104, 129]. However, fitting these network functions in fast but small on-chip memory represents a major technical challenge today [51, 95].

The commonly used cache memory on network processor chips is SRAM, typically a few megabytes. Further increasing on-chip memory to more than 10 MB is technically feasible, but it comes with a much higher price tag and a longer access time. There is a huge incentive to keep on-chip memory small because smaller memory can be made faster and cheaper. Off-chip SRAM is larger; for example, QDR-III SRAM has 36 MB [91]. But it is slower to access. Hence, on-chip memory remains the first choice for online network functions that are designed to match the line speeds.

On-chip memory is limited in size. To make matters even more challenging, it may have to be shared by security [57], measurement [75], routing [20], and performance [55] functions that are implemented on the same chip. When multiple network functions share the same memory, each of them can only use a fraction of the available space. Depending on their relative importance, some functions may be allocated tiny portions of the limited memory, whereas the amount of data they have to process and store can be extremely large in high-speed networks. The disparity between memory demand and supply requires us to implement online functions as compactly as possible [106, 114]. Furthermore, when different functions share the same memory, they may have to take turns accessing the memory, making memory access the performance bottleneck. Since most online functions require only simple computations that can be efficiently implemented in hardware, their throughput will be determined by the bottleneck in memory access. Hence, we must also minimize the number of memory accesses made by each function when it processes a packet. The challenge is that compactness (in terms of space requirement) and speed (in terms of memory accesses) are sometimes conflicting objectives.

1.2 Fundamental Primitives

We observe that the implementations of many online functions rely heavily on several fundamental building blocks for data processing and storage. The first part of this dissertation studies three important fundamental online functions: per-flow estimators, spread estimators, and origin-destination flow estimators.

Per-flow estimators are used to measure per-flow information for high-speed links. The goal is to estimate the size of each flow (in terms of number of packets). A flow is identified by a label that can be a source address, a destination address, or any combination of addresses, ports, and other fields in the packet header. Measuring the sizes of individual flows has important applications. For example, if we use the addresses of the users as flow labels, per-flow traffic measurement provides the basis for usage-based billing and graceful service differentiation, where a user's service priority gracefully drops as he overspends his resource quota. Studying per-flow data over consecutive measurement periods may help us discover network access patterns and, together with user profiling, reveal geographic/demographic traffic distributions among users. Such information will help Internet service providers and application developers to align network resource allocation with the majority's needs.
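To make the per-flow measurement goal concrete, the following Python sketch computes exact flow sizes with one counter per flow label. It is only an illustration under stated assumptions: the `Packet` record and its fields are hypothetical, and the function keeps per-flow state (a label plus a counter for every flow), which is exactly the kind of data structure that does not fit in a small SRAM budget. The per-flow estimators studied in this dissertation aim to approximate these counts without such state.

```python
from collections import Counter
from typing import Callable, Hashable, Iterable, NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal header fields, used only for this illustration.
    src: str
    dst: str
    sport: int
    dport: int


def exact_flow_sizes(packets: Iterable[Packet],
                     label: Callable[[Packet], Hashable] = lambda p: p.src) -> Counter:
    """Exact flow sizes (in packets), keeping one counter per distinct flow label."""
    sizes: Counter = Counter()
    for p in packets:
        sizes[label(p)] += 1  # per-flow state: the label plus its counter
    return sizes


pkts = [
    Packet("10.0.0.1", "10.0.0.9", 1234, 80),
    Packet("10.0.0.1", "10.0.0.8", 1235, 443),
    Packet("10.0.0.2", "10.0.0.9", 5555, 80),
]
per_source = exact_flow_sizes(pkts)                                # flows keyed by source address
per_pair = exact_flow_sizes(pkts, label=lambda p: (p.src, p.dst))  # keyed by (source, destination)
print(per_source["10.0.0.1"], len(per_pair))  # 2 3
```

Changing the `label` function switches between flow definitions (per source, per source-destination pair, and so on) without changing the counting logic.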

PAGE 17

Spreadestimators[ 70 ]aredesignedforthemeasurementofdistinctelementsineachow,whereowscanbeper-sourceows,per-destinationows,TCPows,P2Pows,andsoon.Andelementscanbesourceaddresses,destinationaddresses,oranyotherapplication-specicaddresses.Forexample,ifwetreatallpacketsthatarefromthesamesourceaddressasaow,aspreadestimatorthatmeasuresthenumberofdistinctdestinationsforeachowcanbeusedtodetectportscans[ 106 ].SpreadestimatorsmaybeusedtodetectDDoSattackswhentoomanyhostssendtrafctoareceiver[ 92 ],i.e.,thespreadofadestinationisabnormallyhigh.Theycanbeusedtoestimatetheinfectionrateofawormbymonitoringhowmanyaddresseseachinfectedhostcontactsoveraperiodoftime. Origin-destination(OD)owestimatorsareusedtomeasureODowsizes.Considertworoutersr1andr2.Wedenethesetofpacketsthatrstpassr1andthenpassr2orrstpassr2andthenpassr1asanorigin-destination(OD)owofthetworouters.ThecardinalityofthepacketsetiscalledtheODowsize.TheODowmeasurementisalsoanimportanttopicinmanynetworkmanagementapplications[ 39 46 80 82 99 ].Forexample,InternetserviceprovidersmayusetheOD-owinformationbetweenpointsofinterestasareferencetoaligntrafcdistributionwithinthenetwork.TheymayalsostudytheOD-owtrafcpatternandidentifyanomaliesthatdeviatesignicantlyfromthenormalpattern.Intheeventofapersistentcongestion,OD-owdatamayhelppointoutthesourceofthecongestion. Oneofthegreatestchallengesindesigninganonlinemeasurementmoduleistominimizetheper-packetprocessingtimeinordertokeepupwiththelinespeedofthemodernrouters.Tomeetthischallenge,weshouldminimizethenumberofmemoryaccessesperpacketandimplementthemeasurementmoduleintheon-dieSRAM,whichisfastbutexpensive.BecausemanyotherfunctionsmayalsorunfromSRAM,itisexpectedthattheamountofhigh-speedmemoryallocatedforthemodulewillbe 17

PAGE 18

small. Hence, it is critical to make the data structure of the measurement module as compact as possible.

We observe that when studying these measurement problems, statistical methods play important roles in system design, model building, formula derivation, and error analysis. We first propose several novel schemes that employ classical statistical methods such as maximum likelihood estimation. We then find that statistical methods for measurement problems are generic across many network systems. They can also be applied to wireless systems such as RFID (radio frequency identification) systems, which have been gaining popularity for inventory control, object tracking, and supply chain management in warehouses, retail stores, hospitals, etc. The second part of this dissertation studies the RFID estimation problem and designs two probabilistic algorithms for it.

1.3 Per-Flow Traffic Measurement through Randomized Counter Sharing

This work focuses on a particularly challenging problem: the measurement of per-flow information for a high-speed link without using per-flow data structures [69]. It has been shown in [40] that maintaining per-flow counters cannot scale to high-speed links. Even with efficient counter implementations [96, 102, 130], SRAM will only be able to hold a small fraction of the per-flow state (including the counters and the indexing data structures, such as pointers and flow identities, used for locating the counters). Counter braids avoid per-flow counters and achieve near-optimal memory efficiency [75, 76]. This method maps each flow to three arbitrary counters, all of which are incremented by one for every packet of the flow. Many flows may be mapped to the same counter, which stores the sum of their flow sizes. Essentially, the counters represent linear equations, which can be solved for the flow sizes. Two levels of counters are used to reduce the memory overhead. Counter braids require slightly more than 4 bits per flow and are able to count the exact sizes of all flows. However, the method has two limitations. First, it performs 6, or occasionally 12, memory accesses per packet. Second, when the memory allocated
to a measurement function is far less than 4 bits per flow, our experiments show that the message-passing decoding algorithm of counter braids cannot converge to any meaningful result. When the available memory is just 1 to 2 bits per flow, exact measurement of the flow sizes is no longer possible, and we have to resort to estimation methods. The key is to use the limited space efficiently to improve the accuracy of the estimated flow sizes, and to do so with the minimum number of memory accesses per packet. This is what this work tries to achieve.

We design a fast and compact per-flow traffic measurement function that achieves three main objectives: i) It shares counters among flows to save space and does not incur any space overhead for mapping flows to their counters. This distinguishes our work from [96, 102, 130]. ii) It updates exactly one counter per packet, which is optimal. This separates our work from counter braids, which update three or more counters per packet. Updating each counter requires two memory accesses, a read followed by a write. iii) It provides estimates of the flow sizes, together with confidence intervals that characterize their accuracy, even when the available memory is too small for exact-counting methods, including [75, 76], to work. We believe our work is the first to achieve all of these objectives. It complements existing work by giving practitioners an additional option when other methods cannot meet the speed and space requirements.

The design of our measurement function is based on a new data encoding/decoding scheme called randomized counter sharing. It splits the size of each flow among a number of counters that are randomly selected from a counter pool. These counters form the storage vector of the flow. For each packet of a flow, we randomly select a counter from the flow's storage vector and increment that counter by one. Such a simple online operation can be implemented very efficiently. The storage vectors of different flows share counters uniformly at random; the size information of one flow in a counter is noise to the other flows that share the same counter. Fortunately, this noise can be
quantitatively measured and removed through statistical methods, which allows us to estimate the size of a flow from the information in its storage vector. We propose two estimation methods whose accuracies are statistically guaranteed. They work well even when the total number of counters in the pool is far smaller than the total number of flows that share the counters. Our experimental results based on real traffic traces demonstrate that the new methods can achieve good accuracy in a tight space. We also propose several methods to increase the range of flow sizes that the estimators can measure.

The randomized counter sharing scheme proposed in this work for per-flow traffic measurement has applications beyond the networking field. It may be used in data streaming applications to collect per-item information from a stream of data items.

1.4 Scan Detection in High-Speed Networks

Internet security is a fundamental problem and has received considerable attention for years [23, 24, 56]. Many network-based attacks are preceded by a reconnaissance phase, in which the attacker or its zombies scan the hosts in a network to identify vulnerabilities. As a result, scan detection is one of the most fundamental functions in almost any network intrusion detection system (IDS). Cisco has been pushing for years to build security functions into its high-end routers, and scan detection is increasingly performed by routers with security modules or by firewalls that inspect packets [32].

We define a contact as a source-destination pair for which the source sends a packet to the destination. The source or destination can be an IP address, a port number, or a combination of them together with other fields in the packet header. The spread of a source is the number of distinct destinations contacted by the source during a measurement period. A source is classified as a scanner if its spread exceeds a certain threshold. Therefore, scan detection is fundamentally an online traffic measurement problem.

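The definitions of contact and spread translate directly into code. The Python sketch below is the exact, per-source baseline: it keeps a set of distinct destinations for every source and flags a source as a scanner when its spread exceeds a threshold. It is only meant to illustrate the definitions, since its memory grows with the number of distinct contacts. The function name, the contact tuples, and the threshold are hypothetical.

```python
from collections import defaultdict


def detect_scanners(contacts, threshold):
    """Flag sources whose spread (number of distinct destinations) exceeds threshold.

    contacts: iterable of (source, destination) pairs seen in one measurement period.
    """
    destinations = defaultdict(set)   # exact per-source state; sets drop duplicate contacts
    for src, dst in contacts:
        destinations[src].add(dst)
    return {src for src, dsts in destinations.items() if len(dsts) > threshold}


# Hypothetical period: "a" contacts 5 distinct hosts once each,
# "b" sends 1,000 packets to a single host (still a spread of 1).
trace = [("a", f"10.0.0.{k}") for k in range(5)] + [("b", "10.0.0.1")] * 1000
print(detect_scanners(trace, threshold=3))   # {'a'}
```

Note how the 1,000 duplicate packets from "b" collapse into one contact. A plain per-source packet counter would not do this.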

A great challenge for scan detection is that the volume of data to be stored can be huge. For example, the main gateway at our campus observes more than 10 million distinct source-destination pairs on an average day. Suppose each measurement period is one day long (in order to catch stealthy low-rate scanners). If we simply store all distinct source/destination address pairs for scan detection, it will require more than 80 MB of SRAM (at 8 bytes per source/destination address pair), which is too much. A major thrust of scan detection research is to reduce memory consumption [13, 41, 110, 114, 129].

Reducing memory consumption does not come for free. Prior research sacrifices detection accuracy for memory savings. The basic idea is to compress the contact information into a limited memory space. The compressed information allows us to estimate the spreads of the sources instead of counting them exactly. However, the estimated spread values may cause false positives (in which a non-scanner is mistakenly reported as a scanner) and false negatives (in which a scanner is not reported). Consequently, the following questions become important for any practical security system: How serious is the false positive/false negative problem? Can the system be configured such that the false positive/false negative ratios are bounded? To date, few papers have directly addressed these questions.

Prior work follows two general methods for memory reduction: probabilistic sampling and storage sharing. The probabilistic sampling method records only a certain percentage of randomly sampled contacts. An example is the one-level/two-level algorithms proposed by Venkataraman et al. [110]. These algorithms store the source/destination addresses of the sampled contacts in hash tables. Their main contribution is to derive the optimal sampling probability that ensures, with high probability, that the false positive/false negative ratios do not exceed certain predefined bounds.

However, directly storing the addresses of the contacts made by each source is not memory-efficient. A naive solution is to use per-source counters to record the
number of packets from each source. Near-optimal counter architectures such as counter braids [75] require only a few bits per source. The problem is that counters cannot remove duplicates: a thousand packets from the same source to the same destination should count as one contact instead of a thousand. In order to remove duplicates, one may use Bloom filters [110] or bitmap algorithms [41]. They encode the contacts made by each source in a separate bitmap, which automatically filters duplicates. However, per-source bitmaps still take too much space. Cao et al. use a series of Bloom filters and a hash table to reduce the number of sources that need bitmaps [13].

Instead of using a separate bitmap for each source, an interesting space-saving method is to allow storage sharing, where each data structure is no longer dedicated to a single source but is shared among multiple sources. This is particularly necessary when the number of sources exceeds the number of available bits. Zhao et al. [129] encode each contact in three shared bitmaps using a technique similar to Bloom filters. Yoon et al. [114] design another storage sharing method with superior performance. Although both methods can be used for scan detection, neither provides any means to ensure that the false positive/false negative ratios are bounded. Moreover, our experiments show that these existing methods [13, 114, 129] take far more memory than the one proposed in this study.

This study proposes an efficient scan detection scheme based on a new storage sharing method, called dynamic bit sharing, which shares the available bits uniformly at random among all sources, such that the memory space is fully utilized for storing contact information. It employs a maximum likelihood estimation method to extract per-source information from the shared bits in order to determine the scanners. It also enhances security through a private key. Our new method ensures that the false positive/false negative ratios are bounded. Moreover, given an arbitrary set of bounds, we show analytically how to choose the optimal system parameters such that
the amount of memory needed to satisfy the bounds is minimized. We also perform experiments based on a real traffic trace and demonstrate that, using these optimal parameters, we can reduce memory consumption by a factor of three to twenty compared with the best existing work.

1.5 Origin-Destination Flow Measurement in High-Speed Networks

Our third work focuses on the problem of origin-destination (OD) flow measurement [71]. The goal is to design an efficient method to measure the number of packets that traverse between two routers during a measurement period. Such a method generally consists of two phases: one for online packet information storage and the other for offline OD flow size computation. In the first phase, routers record information about arriving packets. In the second phase, each router reports its stored information to a centralized server, which measures each OD flow based on the information sent from the origin/destination router pair.

Measurement efficiency and accuracy are the two main technical challenges. In terms of efficiency, we want to minimize the per-packet processing overhead to accommodate future routers that forward packets at extremely high rates. More specifically, the function should minimize the computational complexity and the number of memory accesses for each packet.

Accuracy is another important design goal. In high-speed networks, we have to deal with a very large volume of packets, and it is unrealistic to store all packet-level information in order to achieve 100% accuracy. To solve this problem, some past research [119, 121] uses data such as link loads, network routing, and configuration data to measure the OD flows indirectly. Cao, Chen, and Bu [11] propose a quasi-likelihood approach based on a continuous variant of the Flajolet-Martin sketches [43]. However, none of them is able to achieve both efficiency and accuracy at the same time.

To meet these challenges, we design a novel OD flow measurement method, which uses a compact bitmap data structure for packet information storage. At the end
of a measurement period, bitmaps from all routers are sent to a centralized server, which examines the bitmaps of each origin/destination router pair and uses a statistical inference approach to estimate the OD flow size. The proposed method has three elegant properties. First, its processing overhead is small and constant: only one hash operation and one memory access per packet. Second, it is able to achieve excellent measurement results, which will be demonstrated by both simulations and experiments. Finally, its data storage is very compact: the memory allocation is less than 1 bit per packet on average.

1.6 Size Estimation Problem in RFID Systems

Radio-frequency identification (RFID) technology has been widely used in various commercial applications. RFID tags (each storing a unique ID) are attached to merchandise at retail stores, equipment at hospitals, or goods at warehouses, allowing an authenticated RFID reader to quickly access the properties of each individual item or collect statistical information about a large group of items.

This work focuses on an RFID-enabled function that is very useful in inventory management. Imagine a large warehouse with thousands of laptops, cell phones, electronics, apparel, bags, or furniture pieces. A national retail survey showed that administration errors, vendor fraud, and employee theft caused losses of about 20 billion dollars a year [52]. Hence, it is desirable to have a quick way of counting the number of items in the warehouse or in each section of the warehouse. To detect theft or management errors in a timely manner, such counting may be performed frequently.

If each item is attached with an RFID tag, the counting problem can be solved by an RFID reader that receives the IDs transmitted (or backscattered) by the tags [112]. However, reading the actual tag IDs can be time-consuming because so many of them have to be delivered over the same low-rate channel, and collisions caused by simultaneous transmissions from different tags make the matter worse. To address this problem, Kodialam and Nandagopal [63, 64] showed that reading time can be
greatly reduced through probabilistic methods that estimate the number of tags. This is called the RFID estimation problem. The follow-up work by Qian et al. [93] significantly reduces estimation time compared with [63]. It can be shown that even for applications that require reading the actual tag IDs, estimating the number of tags as a pre-processing step will help make the main procedure of reading tag IDs much more efficient [63]. Another advantage of estimating the number of tags without reading the IDs is that it ensures the anonymity of the tags, which may be useful in privacy-sensitive scenarios involving RFID-enhanced passports or driver's licenses, where counting the number of people present is needed but revealing their identities is not necessary.

Is time efficiency the only performance metric for the estimation problem in large-scale RFID systems that use active tags? We argue that energy cost is also an important issue that must be carefully dealt with. For any application that requires an RFID reader to access tags in a large area, it is likely that battery-powered active tags will be used. Passive tags harvest energy from the radio signal of a reader and use this minute amount of energy to deliver information back to the reader. Their typical reading range is only several meters, which does not fit well with the big-warehouse scenario. Active tags use their own power to transmit, and a longer reading range can be achieved by transmitting at higher power. They are also richer in resources for implementing advanced functions. Their price becomes less of a concern if they are used for expensive merchandise or reused many times as goods move in and out of the warehouse. But active tags also have a problem: they are powered by batteries. Recharging batteries for tens of thousands of tags is a laborious operation, considering that tagged products may be stacked up, making tags not easily accessible. To prolong the lifetime of tags and reduce the frequency of battery recharge, all functions that involve large-scale transmission by many tags should be made energy-efficient. Prior work focuses on energy-efficient anti-collision protocols that minimize the energy consumption of a mobile reader [61, 85] when the reader collects tag IDs. To the best of
our knowledge, this work is the first to study energy-efficient solutions for the estimation problem in large-scale RFID systems that use active tags.

This work has four major contributions. First, we observe that there exists an asymmetry in energy cost. Solving the RFID estimation problem incurs energy cost both at the RFID reader and at the active tags. The asymmetry is that energy cost at the tags should be minimized, while energy cost at the reader is less of a concern because the reader's battery can be replaced easily or the reader may be powered by an external source. To exploit this asymmetry, our new algorithms follow a common framework that trades more energy cost at the reader for less cost at the tags. The reader continuously refines and broadcasts a control parameter called the contention probability, which optimizes the amount of information the reader can extract from transmissions by tags. This in turn reduces the number of transmissions by tags that are necessary to achieve a certain estimation accuracy.

Second, the design of our estimation algorithms is based on the maximum likelihood estimation method (MLE), which is different from the probabilistic counting methods [113] used by [63, 64]. Our estimation algorithms optimize their performance by iteratively applying MLE with continuously refined parameters. These new algorithms not only require fewer transmissions by tags but also minimize the size of each transmission. The number of transmissions made by tags in our best algorithm is less than one fourth of that in the state-of-the-art algorithms. In terms of the total number of bits transmitted by tags, it is more than an order of magnitude smaller.

Third, we formally analyze the confidence intervals of the estimates made by our new algorithms and establish the termination conditions for any given accuracy requirement. We perform extensive simulations to demonstrate that the measured results match the analytical results well and that the new algorithms perform far better in terms of energy saving than the best existing algorithms.


Fourth, our algorithms are generalized with a tunable parameter ω, which specifies the contention probability that tags use to decide whether they will transmit. By modifying this parameter, the generalized algorithms can trade off energy cost against estimation time (i.e., the time it takes to complete the process of estimating the number of tags). Even though our main goal is to reduce energy cost, this ability to trade off performance makes our algorithms more adaptable in practical settings that are sensitive not only to energy cost but also to estimation time.

In the broad context of computer networks, there are many other important topics that have drawn extensive attention from researchers, such as QoS and max-min routing [19, 21, 22, 77, 84, 103, 109], P2P networks [59, 125, 126], and distributed computing [6, 18].

1.7 Outline of the Dissertation

The rest of the dissertation is organized as follows. Chapter 2 presents a fast and compact per-flow traffic measurement approach through randomized counter sharing. In this chapter, we design a novel data encoding/decoding scheme, which mixes per-flow information randomly in a tight SRAM space for compactness. Chapter 3 proposes an efficient scan detection scheme based on a new method called dynamic bit sharing, which optimally combines probabilistic sampling, bit-sharing storage, and maximum likelihood estimation. Chapter 4 designs a new method for OD flow measurement, which employs a bitmap data structure for packet information storage and uses a statistical inference approach to compute the measurement results. Chapter 5 proposes two probabilistic algorithms for estimating the number of RFID tags in a region. Chapter 6 draws the conclusion.


CHAPTER 2
PER-FLOW TRAFFIC MEASUREMENT THROUGH RANDOMIZED COUNTER SHARING

This chapter studies the measurement of per-flow information for high-speed links. It is a particularly difficult problem because of the need to process and store a huge amount of information, which makes it difficult for the measurement module to fit in the small but fast SRAM space (in order to operate at the line speed). We propose a novel measurement function that estimates the sizes of all flows. It delivers good performance in tight memory space where the best existing approaches no longer work. The effectiveness of our online per-flow measurement approach is analyzed and confirmed through extensive experiments based on real network traffic traces.

The rest of this chapter is organized as follows. Section 2.1 discusses the performance metrics. Section 2.2 gives an overview of our system design. Section 2.3 discusses the state of the art. Section 2.4 presents the online data encoding module. Sections 2.5 and 2.6 propose two offline data decoding modules. Section 2.7 discusses the problem of setting the counter length. Section 2.8 addresses the problem of collecting flow labels. Section 2.9 presents the experimental results. Section 2.10 extends our estimators to large flow sizes. Section 2.11 gives the summary.

2.1 Performance Metrics

We measure the number of packets in each flow during a measurement period, which ends each time a certain number (e.g., 10 million) of packets have been processed. The design of per-flow measurement functions should consider the following three key performance metrics.

2.1.1 Processing Time

The per-packet processing time of an online measurement function determines the maximum packet throughput at which the function can operate. It should be made as small as possible in order to keep up with the line speed. This is especially true when multiple
routing, security, measurement, and resource management functions share SRAM and processing circuits.

The processing time is mainly determined by the number of memory accesses and the number of hash computations (which can be efficiently implemented in hardware [97]). Counter braids [75, 76] update three counters at the first level for each packet. When a counter at the first level overflows, three additional counters at the second level need to be updated. Hence, counter braids require 3 hashes and 6 memory accesses per packet (a read and a write-back for each incremented counter), and 6 hashes and 12 memory accesses in the worst case. The multi-resolution space-code Bloom filters [66] probabilistically select one or more of their 9 filters and set 3 to 6 bits in each of the selected ones. Each of those bits requires one memory access and one hash computation.

Our objective is to achieve a constant per-packet processing time of one hash computation and two memory accesses (for updating a single counter). This is the minimum processing time for any method that uses hash operations to identify counters for update.

2.1.2 Storage Overhead

The need to reduce the SRAM overhead has been discussed in Chapter 1. One may argue that because the amount of memory needed is related to the number of packets in a measurement period, we can reduce the memory requirement by shortening the measurement period. However, when the measurement period is shorter, more flows will span multiple periods and consequently the average flow size in each period will be smaller. When we measure the flow sizes, we also need to capture the flow labels [76], e.g., a tuple of source address/port and destination address/port that identifies a TCP flow. The flow labels are too large to fit in SRAM and have to be stored in DRAM. Therefore, in a measurement period, each flow incurs at least one DRAM access to store its flow label. If the average flow size is large enough, the overhead of this DRAM access is amortized over many packets of a flow. However,
if the average flow size is too small, the DRAM access will become the performance bottleneck that seriously limits the throughput of the measurement function. This means the measurement period should not be too short. Our experiments in Section 2.9 set the measurement period such that the average flow size is about 10.

2.1.3 Estimation Accuracy

Let $s$ be the size of a flow and $\hat{s}$ be the estimated size of the flow based on a measurement function. The estimation accuracy of the function can be specified by a confidence interval: the probability for $s$ to be within $[\hat{s}(1-\beta), \hat{s}(1+\beta)]$ is at least a pre-specified value, e.g., 95%. A smaller value of $\beta$ means that the estimated flow size is more accurate (in a probabilistic sense).

There is a tradeoff between estimation accuracy and storage overhead. If the storage space and the processing time are unrestricted, we can count each packet to achieve perfect accuracy. In practice, however, there are constraints on both storage and processing speed, which sometimes make 100% accurate measurement infeasible. In this case, one has to settle for imperfect results that can be produced with the available resources. Within the bounds of the limited resources, we must explore novel measurement methods to make the estimated flow sizes as accurate as possible.

2.2 System Design

2.2.1 Basic Idea

We use an example to illustrate the idea behind our new measurement approach. Suppose the amount of SRAM allocated to one of the measurement functions is 2 Mb ($2 \times 2^{20}$ bits), and each measurement period ends after 10 million packets, which translates into about 8 seconds for an OC-192 link (10+ Gbps) with an average packet size of 1,000 bytes. The types of flows that the online functions may measure include per-source flows, per-destination flows, per-source/destination flows, TCP flows, WWW
flows (with destination port 80), etc. Without loss of generality, suppose the specific function under consideration in this example measures the sizes of TCP flows.

Figure 2-1 shows, on a log scale, the number of TCP flows that have a certain flow size, based on a real network trace captured by the main gateway of our campus. If we use 10 bits for each counter, there will be only 0.2 million counters ($2 \times 2^{20}/10 \approx 0.21$ million). The number of concurrent flows in our trace for a typical measurement period is around 1 million. Clearly, allocating per-flow state is not possible, and each counter has to store the information of multiple flows. But if an elephant flow is mapped to a counter, that counter will overflow and lose information. On the other hand, if only a couple of mouse flows are mapped to a counter, the counter will be under-utilized, with most of its high-order bits left as zeros.

To solve the above problems, we not only store the information of multiple flows in each counter, but also store the information of each flow in a large number of counters, such that an elephant is broken into many mice that are stored in different counters. More specifically, we map each flow to a set of $l$ randomly selected counters and split the flow size into $l$ roughly equal shares, each of which is stored in one counter. The value of a counter is the sum of the shares from all flows that are mapped to the counter. Because flows share counters, they introduce noise into each other's measurement. The key to accurately estimating the size of a flow is to measure the noise introduced by other flows in the counters to which the flow is mapped.

Fortunately, this can be done if the flows are mapped to the counters uniformly at random. Any two flows have the same probability of sharing counters, which means that each flow has the same probability of introducing a certain amount of noise to any other flow. If the number of flows and the number of counters are very large, the combined noise introduced by all flows will be distributed across the counter space approximately uniformly. This statistically uniform noise can be measured and
removed. The above scheme of information storage and recovery is called randomized counter sharing.

We stress that this design philosophy of splitting each flow among a large number of counters is very different from replicating each flow in three counters, as the counting Bloom filter [27] or counter braids [75] do: they add the size of each flow as a whole to three randomly selected counters. Most notably, our method increments one counter for each arriving packet, while the counting Bloom filter or counter braids increment three counters. We store the information of each flow in many counters (e.g., 50), while they store the information of each flow in three counters.

2.2.2 Overall Design

Our online traffic measurement function consists of two modules. The online data encoding module stores the information of arriving packets in an array of counters. For each packet, it performs one hash computation to identify a counter and then updates the counter with two memory accesses, one for reading and the other for writing. At the end of each measurement period, the counter array is stored to disk and then reset to zeros.

The offline data decoding module answers queries for flow sizes. It is performed by a designated offline computer. We propose two methods for separating the information about the size of a flow from the noise in the counters. The first one is called the counter sum estimation method (CSM), which is very simple and easy to compute. The second one is called the maximum likelihood estimation method (MLM), which is more accurate but also more computationally intensive. The two complementary methods provide flexibility in designing a practical system, which may first use CSM for rough estimation and then apply MLM to the flows of interest.

2.3 State of the Art

A related thread of research is to collect statistical information about the flows [36, 65], or to identify the largest flows and devote the available memory to measuring their
sizes while ignoring the smaller ones [34, 40, 58, 60]. For example, RATE [62] and ACCEL-RATE [50] measure per-flow rates by maintaining per-flow state, but they use a two-run sampling method to filter out small-rate flows so that only high-rate flows are measured.

Another thread of research is to maintain a large number of counters to track various networking information. One possible solution [30, 107] is to update a counter statistically according to its current value. This approach fits applications with loose accuracy requirements. To improve accuracy, Zhao et al. [127] propose a statistical method to make a DRAM-based solution practical, which uses a small cache and request queues to balance the counter values. Since DRAM is involved and wire speed is achieved, this approach is able to achieve decent measurement accuracy.

Also related is the work [110] that measures the number of distinct destinations that each source has contacted. Per-flow counters cannot be used to solve this problem because they cannot remove duplicate packets. If a source sends 1,000 packets to a destination, the packets contribute only one contact, but they count as 1,000 when we measure the flow size. To remove duplicates, bitmaps (instead of counters) should be used [13, 41, 114, 115, 128]. From a technical point of view, this represents a separate line of research, which employs a different set of data structures and analytical tools. Attempts have also been made to use bitmaps for estimating flow sizes, which are, however, far less efficient than counters, as our experiments will show.

2.4 Online Data Encoding

The flow size information is stored in an array $C$ of $m$ counters. The $i$th counter in the array is denoted as $C[i]$, $0 \le i \le m-1$. The size of the counters should be set so that the chance of overflow is negligible; we will discuss this issue in detail in Section 2.7. Each flow is mapped to $l$ counters that are randomly selected from $C$ through hash functions. These counters logically form the storage vector of the flow,
denoted as $C_f$, where $f$ is the label of the flow. The $i$th counter of the vector, denoted as $C_f[i]$, $0 \le i \le l-1$, is selected from $C$ as follows:
$$C_f[i] = C[H_i(f)], \tag{2-1}$$
where $H_i(\cdot)$ is a hash function whose range is $[0, m)$. We want to stress that $C_f$ is not a separate array for flow $f$. It is merely a logical construction from counters in $C$ for the purpose of simplifying the presentation. In all our formulas, one should treat the notation $C_f[i]$ simply as $C[H_i(f)]$. The hash function $H_i$, $0 \le i \le l-1$, can be implemented from a master function $H(\cdot)$ as follows: $H_i(f) = H(f \,|\, i)$ or $H_i(f) = H(f \oplus R[i])$, where '|' is the concatenation operator, '$\oplus$' is the XOR operator, and $R[i]$ is a constant whose bits differ randomly for different indices $i$.

All counters are initialized to zeros at the beginning of each measurement period. The operation of online data encoding is very simple: when the router receives a packet, it extracts the flow label $f$ from the packet header, randomly selects a counter from $C_f$, and increases the counter by one. More specifically, the router randomly picks a number $i$ between 0 and $l-1$, computes the hash $H_i(f)$, and increases the counter $C[H_i(f)]$, which is physically in the array $C$ but logically the $i$th element in the vector $C_f$.

2.5 Offline Counter Sum Estimation

2.5.1 Estimation Method

At the end of a measurement period, the router stores the counter array $C$ to disk for long-term storage and offline data analysis. Let $n$ be the combined size of all flows, which is $\sum_{i=0}^{m-1} C[i]$. Let $s$ be the true size of a flow $f$ during the measurement period. The estimated size $\hat{s}$ based on our counter sum estimation method (CSM) is
$$\hat{s} = \sum_{i=0}^{l-1} C_f[i] - \frac{l\,n}{m}. \tag{2-2}$$
The first term is the sum of the counters in the storage vector of flow $f$. It can also be interpreted as the sum of the flow's size and the noise from other flows due
to counter sharing. The second term captures the expected noise. Below we formally derive this estimator.

Consider an arbitrary counter in the storage vector of flow $f$. We treat the value of the counter as a random variable $X$. Let $Y$ be the portion of $X$ contributed by the packets of flow $f$, and $Z$ be the portion of $X$ contributed by the packets of other flows. Obviously, $X = Y + Z$.

Each of the $s$ packets in flow $f$ has a probability of $1/l$ to increase the value of the counter by one. Hence, $Y$ follows a binomial distribution:
$$Y \sim \mathrm{Bino}\left(s, \frac{1}{l}\right). \tag{2-3}$$
Each packet of another flow $f'$ has a probability of $1/m$ to increase the counter by one. That is because the probability for the counter to belong to the storage vector of flow $f'$ is $l/m$, and if that happens, the counter has a probability of $1/l$ to be selected for increment. Assume that there is a large number of flows, that the size of each flow is negligible compared with the total size of all flows, and that $l$ is large such that each flow's size is randomly spread among many counters. Then we can approximately treat the packets independently. Hence, $Z$ approximately follows a binomial distribution:
$$Z \sim \mathrm{Bino}\left(n-s, \frac{1}{m}\right) \approx \mathrm{Bino}\left(n, \frac{1}{m}\right), \quad \text{because } s \ll n. \tag{2-4}$$
We must have
$$E(X) = E(Y+Z) = E(Y) + E(Z) = \frac{s}{l} + \frac{n}{m}. \tag{2-5}$$
That is,
$$s = l\,E(X) - \frac{l\,n}{m}. \tag{2-6}$$
From the observed counter values $C_f[i]$, $E(X)$ can be measured as $\frac{1}{l}\sum_{i=0}^{l-1} C_f[i]$. We have the following estimate of $s$:
$$\hat{s} = \sum_{i=0}^{l-1} C_f[i] - \frac{l\,n}{m}. \tag{2-7}$$

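The online encoding of Section 2.4 and the CSM estimator derived above fit in a few lines. The Python sketch below is illustrative rather than the dissertation's implementation. It realizes $H_i(f) = H(f \,|\, i)$ with SHA-256 from `hashlib`, which is an assumption standing in for a hardware hash. It uses Python's `random` module for the per-packet choice of $i$. The class name, method names, and the traffic mix in the usage lines are hypothetical.

```python
import hashlib
import random


class CounterSharing:
    """Sketch of randomized counter sharing: m shared counters, l counters per flow."""

    def __init__(self, m, l, seed=0):
        self.m, self.l = m, l
        self.C = [0] * m                 # shared counter array C, zeroed each period
        self.rng = random.Random(seed)   # source of the random index i per packet

    def h(self, flow, i):
        """H_i(f) = H(f | i): hash the concatenated label and index into [0, m)."""
        digest = hashlib.sha256(f"{flow}|{i}".encode()).digest()
        return int.from_bytes(digest[:8], "big") % self.m

    def encode(self, flow):
        """Online step: pick i uniformly from [0, l), hash once, increment C[H_i(f)]."""
        i = self.rng.randrange(self.l)
        self.C[self.h(flow, i)] += 1

    def csm_estimate(self, flow):
        """Offline CSM: sum of the flow's l counters minus the expected noise l*n/m."""
        n = sum(self.C)
        vector_sum = sum(self.C[self.h(flow, i)] for i in range(self.l))
        return vector_sum - self.l * n / self.m


# Hypothetical traffic: 2,000 background flows of 20 packets plus one 1,000-packet flow.
cs = CounterSharing(m=20_000, l=50)
for k in range(2_000):
    for _ in range(20):
        cs.encode(f"flow{k}")
for _ in range(1_000):
    cs.encode("target")
print(round(cs.csm_estimate("target")))   # typically close to 1000; exact value depends on hashing
```

The per-packet work in `encode` is one random draw, one hash, and one read-modify-write of a single counter. This mirrors the one-hash, two-memory-access budget set in Section 2.1.1.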

If a flow shares a counter with an elephant flow, its size estimate can be skewed. However, our experiments show that CSM works well in general because the number of elephants is typically small (as shown in Figure 2-1) and thus their impact is also small, particularly when there are a very large number of counters and flows. Moreover, our next method, based on maximum likelihood estimation, can effectively reduce the impact of an outlier in a flow's storage vector that is caused by an elephant flow.

2.5.2 Estimation Accuracy

The mean and variance of $\hat{s}$ are derived as follows. Because $X = Y + Z$, we have
$$\begin{aligned} E(X^2) &= E\left((Y+Z)^2\right) = E(Y^2) + 2E(YZ) + E(Z^2) = E(Y^2) + 2E(Y)E(Z) + E(Z^2) \\ &= \frac{s^2}{l^2} - \frac{s}{l^2} + \frac{s}{l} + 2\,\frac{s}{l}\,\frac{n}{m} + \frac{n^2}{m^2} - \frac{n}{m^2} + \frac{n}{m}. \end{aligned}$$
The following facts are used in the above derivation: $E(Y^2) = \frac{s^2}{l^2} - \frac{s}{l^2} + \frac{s}{l}$ because $Y \sim \mathrm{Bino}(s, 1/l)$; $E(YZ) = E(Y)E(Z)$ since $Y$ and $Z$ are independent; and $E(Z^2) = \frac{n^2}{m^2} - \frac{n}{m^2} + \frac{n}{m}$ because $Z \sim \mathrm{Bino}(n, 1/m)$. Therefore,
$$\mathrm{Var}(X) = E(X^2) - \left(E(X)\right)^2 = \frac{s}{l}\left(1-\frac{1}{l}\right) + \frac{n}{m}\left(1-\frac{1}{m}\right). \tag{2-8}$$
In the CSM estimate $\hat{s}$, the values $C_f[i]$, $0 \le i \le l-1$, are independent samples of $X$. We can interpret $\hat{s}$ as a random variable in the sense that a different set of samples of $X$ may result in a different value of $\hat{s}$. From the definition of $\hat{s}$, we have
$$E(\hat{s}) = l\,E(X) - \frac{l\,n}{m} = l\left(\frac{s}{l} + \frac{n}{m}\right) - \frac{l\,n}{m} = s, \tag{2-9}$$
which means our estimate is unbiased. The variance of $\hat{s}$ can be written as
$$\mathrm{Var}(\hat{s}) = l^2\,\mathrm{Var}(X) = l^2\left[\frac{s}{l}\left(1-\frac{1}{l}\right) + \frac{n}{m}\left(1-\frac{1}{m}\right)\right] = s(l-1) + \frac{l^2 n}{m}\left(1-\frac{1}{m}\right). \tag{2-10}$$

2.5.3 Confidence Interval

The confidence interval for the estimate is given in (2-13) and is derived as follows. The binomial distribution $Z \sim \mathrm{Bino}(n, 1/m)$ can be closely approximated by a Gaussian distribution, $\mathrm{Norm}\left(\frac{n}{m}, \frac{n}{m}(1-\frac{1}{m})\right)$, when $n$ is large. Similarly, the binomial distribution $Y \sim \mathrm{Bino}(s, 1/l)$ can be approximated by $\mathrm{Norm}\left(\frac{s}{l}, \frac{s}{l}(1-\frac{1}{l})\right)$. Because a linear combination of two independent Gaussian random variables is also normally distributed [14], we have $X \sim \mathrm{Norm}\left(\frac{s}{l} + \frac{n}{m}, \frac{s}{l}(1-\frac{1}{l}) + \frac{n}{m}(1-\frac{1}{m})\right)$. To simplify the presentation, let $\mu = \frac{s}{l} + \frac{n}{m}$ and $\sigma = \frac{s}{l}(1-\frac{1}{l}) + \frac{n}{m}(1-\frac{1}{m})$; note that $\sigma$ denotes a variance here, not a standard deviation. Then
$$X \sim \mathrm{Norm}(\mu, \sigma), \tag{2-11}$$
where the mean and the variance agree with $E(X)$ and $\mathrm{Var}(X)$ derived above.

Because $\hat{s}$ is a linear function of $C_f[i]$, $0 \le i \le l-1$, which are independent samples of $X$, $\hat{s}$ must also approximately follow a Gaussian distribution. From (2-10) and the unbiasedness of $\hat{s}$, we have
$$\hat{s} \sim \mathrm{Norm}\left(s,\; s(l-1) + \frac{l^2 n}{m}\left(1-\frac{1}{m}\right)\right). \tag{2-12}$$
Hence, the confidence interval is
$$\hat{s} \pm Z_\alpha \sqrt{s(l-1) + \frac{l^2 n}{m}\left(1-\frac{1}{m}\right)}, \tag{2-13}$$
where $\alpha$ is the confidence level and $Z_\alpha$ is the corresponding two-sided critical value of the standard Gaussian distribution, i.e., its $(1+\alpha)/2$ quantile. As an example, when $\alpha = 95\%$, $Z_\alpha = 1.96$.

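To see what the CSM interval looks like numerically, the sketch below evaluates it, with `statistics.NormalDist` supplying $Z_\alpha$. The true size $s$ inside the square root is unknown, so the sketch plugs in $\hat{s}$ and clips it at zero; this is an assumption. The usage line borrows the example of Section 2.2.1 (10 million packets, about 0.2 million counters) with $l = 50$. The estimate of 1,000 packets is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def csm_confidence_interval(s_hat, l, n, m, alpha=0.95):
    """Return (low, high) = s_hat -/+ Z_alpha * sqrt(s(l-1) + (l^2 n / m)(1 - 1/m)).

    The unknown true size s is replaced by the estimate s_hat (plug-in assumption).
    """
    z = NormalDist().inv_cdf((1 + alpha) / 2)   # two-sided critical value, about 1.96 for 95%
    s = max(s_hat, 0.0)                         # keep the variance term non-negative
    half_width = z * sqrt(s * (l - 1) + (l * l * n / m) * (1 - 1 / m))
    return s_hat - half_width, s_hat + half_width


low, high = csm_confidence_interval(s_hat=1000, l=50, n=10_000_000, m=200_000)
print(round(low), round(high))   # 182 1818
```

The interval is wide here because each counter carries about $n/m = 50$ packets of noise from other flows. The $l^2 n/m$ term therefore dominates the variance.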

2.6 Maximum Likelihood Estimation

In this section, we propose the second estimation method, which is more accurate but also more computationally expensive.

2.6.1 Estimation Method

We know from the previous section that any counter in the storage vector of flow $f$ can be represented by a random variable $X$, which is the sum of $Y$ and $Z$, where $Y \sim \mathrm{Bino}(s, 1/l)$ and $Z \sim \mathrm{Bino}(n, 1/m)$. For any integer $z \in [0, n)$, the probability for the event $Z = z$ to occur can be computed as follows:
$$\Pr\{Z = z\} = \binom{n}{z}\left(\frac{1}{m}\right)^{z}\left(1-\frac{1}{m}\right)^{n-z}.$$
Because $n$ and $m$ are known, $\Pr\{Z = z\}$ is a function of the single variable $z$ and is thus denoted as $P(z)$.

Based on the probability distributions of $Y$ and $Z$, the probability for the observed value of a counter, $C_f[i]$, $\forall i \in [0, l)$, to occur is
$$\begin{aligned} \Pr\{X = C_f[i]\} &= \sum_{z=0}^{C_f[i]} \left(\Pr\{Z = z\}\,\Pr\{Y = C_f[i] - z\}\right) \\ &= \sum_{z=0}^{C_f[i]} \binom{s}{C_f[i]-z}\left(\frac{1}{l}\right)^{C_f[i]-z}\left(1-\frac{1}{l}\right)^{s-(C_f[i]-z)} P(z). \end{aligned} \tag{2-14}$$
Let $y = C_f[i] - z$ to simplify the formula. The probability for all observed values in the storage vector of flow $f$ to occur is
$$L = \prod_{i=0}^{l-1} \Pr\{X = C_f[i]\} = \prod_{i=0}^{l-1} \sum_{z=0}^{C_f[i]} \binom{s}{y}\left(\frac{1}{l}\right)^{y}\left(1-\frac{1}{l}\right)^{s-y} P(z). \tag{2-15}$$

The maximum likelihood method (MLM) finds the estimated size $\hat{s}$ of flow f that maximizes the above likelihood function. Namely, we want to find

$$\hat{s} = \arg\max_{s}\{L\}.$$

To find $\hat{s}$, we first take the logarithm to turn the right side of the equation from a product into a summation:

$$\ln(L) = \sum_{i=0}^{l-1}\ln\left(\sum_{z=0}^{C_f[i]}\binom{s}{y}\left(\frac{1}{l}\right)^{y}\left(1-\frac{1}{l}\right)^{s-y}P(z)\right).$$

Because $\frac{d}{ds}\binom{s}{y} = \binom{s}{y}\big(\psi(s+1)-\psi(s+1-y)\big)$, where $\psi(\cdot)$ is the digamma function (the polygamma function of order zero) [4], we have

$$\frac{d}{ds}\left[\binom{s}{y}\left(1-\frac{1}{l}\right)^{s-y}\right] = \binom{s}{y}\left(1-\frac{1}{l}\right)^{s-y}\left(\psi(s+1)-\psi(s+1-y)+\ln\left(1-\frac{1}{l}\right)\right).$$

To simplify the presentation, we denote the right side of the above equation as $O(s)$. From the log-likelihood above, we can compute the first-order derivative of $\ln(L)$ as follows:

$$\frac{d\ln(L)}{ds} = \sum_{i=0}^{l-1}\frac{\sum_{z=0}^{C_f[i]}O(s)\left(\frac{1}{l}\right)^{y}P(z)}{\sum_{z=0}^{C_f[i]}\binom{s}{y}\left(\frac{1}{l}\right)^{y}\left(1-\frac{1}{l}\right)^{s-y}P(z)}.$$

Maximizing L is equivalent to maximizing $\ln(L)$. Hence, by setting the right side of the above derivative to zero, we can find the value of $\hat{s}$ through numerical methods. Because $d\ln(L)/ds$ is a monotone function of s, we have used the bisection search method in all our experiments in Section 2.9 to find the value $\hat{s}$ that makes $d\ln(L)/ds$ equal to zero.

2.6.2 Estimation Accuracy

The confidence interval of the estimate is derived as follows. According to the classical theory of maximum likelihood estimation, when l is sufficiently large, the distribution of the flow-size estimate $\hat{s}$ can be
approximated by

$$\mathrm{Norm}\left(s, \frac{1}{I(\hat{s})}\right),$$

where the Fisher information $I(\hat{s})$ [67] of L is defined as follows:

$$I(\hat{s}) = -E\left[\frac{d^2\ln(L)}{ds^2}\right].$$

In order to compute the second-order derivative, we begin from the Gaussian approximation $X \sim \mathrm{Norm}(\mu, \sigma)$ of Section 2.5.3, where $\sigma$ is the variance, and have the following:

$$\Pr\{X=C_f[i]\} = \frac{1}{\sqrt{2\pi\sigma}}\,e^{-\frac{(C_f[i]-\mu)^2}{2\sigma}}, \qquad \ln\big(\Pr\{X=C_f[i]\}\big) = -\ln\left(\sqrt{2\pi\sigma}\right)-\frac{(C_f[i]-\mu)^2}{2\sigma},$$

where $0 \le i \le l-1$. Performing the second-order differentiation, we have

$$\frac{d^2\ln\big(\Pr\{X=C_f[i]\}\big)}{ds^2} = -\frac{\mu'^2}{\sigma}+\frac{\sigma'^2}{2\sigma^2}+\frac{2(\mu-C_f[i])\,\mu'\sigma'}{\sigma^2}-\frac{(\mu-C_f[i])^2\,\sigma'^2}{\sigma^3},$$

where $\mu' = d\mu/ds = \frac{1}{l}$ and $\sigma' = d\sigma/ds = \frac{1}{l}\left(1-\frac{1}{l}\right)$. Therefore,

$$E\left[\frac{d^2\ln\big(\Pr\{X=C_f[i]\}\big)}{ds^2}\right] = -\frac{\mu'^2}{\sigma}+\frac{\sigma'^2}{2\sigma^2}-\frac{\sigma'^2\,E\big[(\mu-C_f[i])^2\big]}{\sigma^3} = -\frac{1}{l^2\sigma}-\frac{\left(1-\frac{1}{l}\right)^2}{2l^2\sigma^2},$$

where we have used the facts $E(\mu-C_f[i]) = 0$ and $E\big[(\mu-C_f[i])^2\big] = \sigma$. Because $L = \prod_{i=0}^{l-1}\Pr\{X=C_f[i]\}$, we have

$$I(\hat{s}) = -E\left[\frac{d^2\ln(L)}{ds^2}\right] = -\sum_{i=0}^{l-1}E\left[\frac{d^2\ln\big(\Pr\{X=C_f[i]\}\big)}{ds^2}\right] = \frac{1}{l\sigma}+\frac{\left(1-\frac{1}{l}\right)^2}{2l\sigma^2}.$$

Hence, the variance of $\hat{s}$ is

$$\mathrm{Var}(\hat{s}) = \frac{1}{I(\hat{s})} = \frac{2l\sigma^2}{2\sigma+\left(1-\frac{1}{l}\right)^2},$$

and the confidence interval is

$$\hat{s} \pm Z_\alpha\sqrt{\frac{2l\sigma^2}{2\sigma+\left(1-\frac{1}{l}\right)^2}},$$

where $Z_\alpha$ is the percentile of the standard Gaussian distribution for the confidence level $\alpha$.

2.7 Setting Counter Length

So far, our analysis has assumed that each counter has a sufficient number of bits such that it will not overflow. However, in order to save space, we want to set the counter length as short as possible. Suppose each measurement period ends after a pre-specified number n of packets are received. (Note that n is the combined size of all flows during each measurement period.) The average value of all counters will be $n/m$. We set the number of bits in each counter, denoted as b, to be $\log_2\frac{n}{m}+1$. Due to the additional bit, each counter can hold about two times the average before overflowing. If the allocated memory has M bits, the values of b and m can be determined from the following equations:

$$b \cdot m = M, \qquad \log_2\frac{n}{m}+1 = b.$$

Due to the randomized counter sharing design, roughly speaking, the packets are distributed among the counters at random. We observe in our experiments that the counter values approximately follow a Gaussian distribution with a mean of $n/m$. In this distribution, the fraction of counters that are more than four times the mean is very small (less than 5.3% in all our experiments). Consequently, the impact of counter overflow on CSM or MLM is also very small for most flows. Though it is small, we will totally eliminate this impact in Section 2.10.4.
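The two equations above can be solved with a short search. The sketch below is a hypothetical Python helper, not taken from the text: since a counter needs a whole number of bits, it picks the smallest integer b with $m = \lfloor M/b \rfloor$ such that $b \ge \log_2(n/m)+1$, which is an assumption about how the real-valued solution is turned into a hardware layout. The test uses the integer form $2^{b-1} m \ge n$ to avoid floating-point logarithms.

```python
def counter_layout(n: int, memory_bits: int) -> tuple[int, int]:
    """Return (b, m): the smallest integer counter width b, with m = M // b
    counters, such that 2**(b - 1) >= n / m, i.e. b >= log2(n / m) + 1.

    Each b-bit counter then holds up to 2**b - 1 >= 2n/m - 1 packets,
    roughly twice the average counter value n/m.
    """
    if n <= 0 or memory_bits <= 0:
        raise ValueError("n and memory_bits must be positive")
    b = 1
    while True:
        m = memory_bits // b
        if m == 0:
            raise ValueError("not enough memory for even one counter")
        if (1 << (b - 1)) * m >= n:  # integer form of b >= log2(n/m) + 1
            return b, m
        b += 1


if __name__ == "__main__":
    n = 10_000_000  # packets per measurement period, as in Section 2.9
    for mbits in (2, 4, 8):
        b, m = counter_layout(n, mbits * 2**20)
        print(f"M = {mbits}Mb: b = {b} bits, m = {m} counters, n/m = {n / m:.1f}")
```

Rounding b up rather than down keeps the "about twice the average" headroom at the cost of slightly fewer counters.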

2.8 Flow Labels

The compact online data structure introduced in Section 2.4 only stores the flow size information. It does not store the flow labels. The labels are per-flow information, and they cannot be compressed in the same way as the flow sizes. In some applications, the flow labels are known in advance and do not have to be collected. For example, if an ISP wants to measure the traffic from its customers, it knows their IP addresses (which are the flow labels in this case). Similarly, if the system administrator of a large enterprise network needs information about the traffic volumes of the hosts in the network, she has the hosts' addresses.

In case the flow labels need to be collected and there is not enough SRAM to keep them, the labels have to be stored in DRAM. An efficient solution for label collection was proposed in [76]. A Bloom filter [7, 8] can be implemented in SRAM to encode the flow labels that have been seen by the router during a measurement period, such that each label is stored in DRAM only once, when it appears for the first time in the packet stream; storing each label once is the minimum overhead if the labels must be collected.

If we use three hash functions in the Bloom filter, each packet incurs three SRAM accesses in order to check whether the flow label carried by the packet is already encoded in the Bloom filter. A recent work on one-memory-access Bloom filters [94] shows that three SRAM accesses per packet can be reduced to one. This overhead is further reduced if we only examine UDP packets and SYN packets (which carry the label information of TCP traffic). A recent study shows that UDP accounts for 20% of Internet traffic [10], and the measurement of our campus traffic shows that SYN packets account for less than 10% of all TCP traffic. Therefore, the Bloom filter operation only needs to be carried out for less than 28% of all packets (20% + 10% × 80%), which amortizes the overhead.
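The label-collection idea can be sketched as follows. This is a minimal Python illustration of a standard Bloom filter with three hash functions gating writes to a simulated DRAM list, not the implementation of [76]: the class and method names are hypothetical, SHA-256 with an index prefix stands in for the three hash functions, and plain Python objects stand in for SRAM and DRAM. Because a Bloom filter can return false positives, a new label that happens to hit only bits already set would be skipped; a large filter keeps that probability small but not zero.

```python
import hashlib


class LabelCollector:
    """Write each flow label to (simulated) DRAM only on its first appearance."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.sram_bits = bytearray((num_bits + 7) // 8)  # Bloom filter bits
        self.dram_labels: list[str] = []                 # labels written once

    def _positions(self, label: str) -> list[int]:
        # One derived hash per index i; each selects a bit in [0, num_bits).
        positions = []
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}|{label}".encode()).digest()
            positions.append(int.from_bytes(digest[:8], "big") % self.num_bits)
        return positions

    def observe(self, label: str) -> bool:
        """Process one packet's label; return True if it was written to DRAM."""
        positions = self._positions(label)
        if all(self.sram_bits[p >> 3] & (1 << (p & 7)) for p in positions):
            return False  # (probably) seen before: no DRAM write
        for p in positions:
            self.sram_bits[p >> 3] |= 1 << (p & 7)
        self.dram_labels.append(label)
        return True


if __name__ == "__main__":
    collector = LabelCollector()
    for label in ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3", "10.0.0.2"]:
        collector.observe(label)
    print(collector.dram_labels)
```

With the packet filtering described above, `observe()` would only be called for UDP packets and TCP SYN packets.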

2.9 Experiments

We use experiments to evaluate our estimation methods, CSM (Counter Sum estimation Method) and MLM (Maximum Likelihood estimation Method), which are designed based on the randomized counter sharing scheme. We also compare our methods with CB (Counter Braids) [75] and MRSCBF (Multi-Resolution Space-Code Bloom Filters) [66]. Our evaluation is based on the performance metrics outlined in Section 2.1, including per-packet processing time, memory overhead, and estimation accuracy.

The experiments use a network traffic trace obtained from the main gateway of our campus. We perform experiments on various types of flows, such as per-source flows, per-destination flows, per-source/destination flows, and TCP flows. They all lead to the same conclusions. Without loss of generality, we choose TCP flows for presentation. The trace contains about 68 million TCP flows and 750 million packets. In each measurement period, 10 million packets are processed; this typically covers slightly more than 1 million flows.

2.9.1 Processing Time

The processing time is mainly determined by the number of memory accesses and the number of hash computations per packet. Table 2-1 presents the comparison. CSM or MLM performs two memory accesses and one hash computation for each packet. CB incurs three times this overhead: it performs six memory accesses and three hash computations for each packet at the first counter level, and in the worst case makes six additional memory accesses and three additional hash computations at the second level. MRSCBF has nine filters. The ith filter uses $k_i$ hash functions and encodes packets with a sampling probability $p_i$, where $k_1 = 3$, $k_2 = 4$, $k_i = 6$ for $i \in [3, 9]$, and $p_i = (1/4)^{i-1}$ for $i \in [1, 9]$. When encoding a packet, the ith filter performs $k_i$ hash computations and sets $k_i$ bits. Hence, the total number of memory accesses (or hash computations) per packet for all filters is $\sum_{i=1}^{9} p_i k_i \approx 4.47$.
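To make the per-packet cost concrete, the sketch below models the online update of randomized counter sharing and counts its memory accesses and hash computations. It is a hypothetical Python model, not the author's code, and it assumes a reading of the Section 2.4 design that this section does not restate: each flow f's storage vector consists of the l counters $C[H(f \oplus R[i])]$, $0 \le i < l$, drawn from a shared array of m counters (mirroring the logical bitmaps of Chapter 3), and each packet increments one counter of its flow's vector chosen at random. BLAKE2b stands in for H, counter overflow is ignored, and the CSM estimate $\sum_i C_f[i] - l\,n/m$ is likewise an assumed form consistent with $E(X) = s/l + n/m$.

```python
import hashlib
import random


class RandomizedCounterSharing:
    """Hypothetical model of the online encoding: m shared counters, l per flow.

    Flow IDs are assumed to be non-negative integers below 2**32.
    """

    def __init__(self, m: int, l: int, seed: int = 1):
        self.m, self.l = m, l
        self.rng = random.Random(seed)
        self.R = [self.rng.getrandbits(32) for _ in range(l)]  # random constants R[i]
        self.C = [0] * m  # shared counter array (SRAM in the real design)
        self.packets = 0
        self.memory_accesses = 0
        self.hash_computations = 0

    def _index(self, flow: int, i: int) -> int:
        """H(f XOR R[i]) mod m, with BLAKE2b standing in for H."""
        data = (flow ^ self.R[i]).to_bytes(8, "big")
        return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big") % self.m

    def encode(self, flow: int) -> None:
        """Per-packet update: one hash, one counter read, one counter write."""
        i = self.rng.randrange(self.l)  # pick one counter of the storage vector
        idx = self._index(flow, i)
        self.hash_computations += 1
        value = self.C[idx]             # memory access 1: read
        self.C[idx] = value + 1         # memory access 2: write
        self.memory_accesses += 2
        self.packets += 1

    def storage_vector(self, flow: int) -> list[int]:
        """Offline: the flow's l counters C_f[0..l-1]."""
        return [self.C[self._index(flow, i)] for i in range(self.l)]

    def csm_estimate(self, flow: int) -> float:
        """Assumed CSM form: sum of the flow's counters minus expected noise l*n/m."""
        return sum(self.storage_vector(flow)) - self.l * self.packets / self.m
```

For example, encoding 500 packets of one flow and 5 packets each of 100 other flows into m = 1,000 counters with l = 10 costs exactly 2,000 memory accesses and 1,000 hash computations, independent of how many flows are active.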

2.9.2 Memory Overhead and Estimation Accuracy

We study the estimation accuracy of CSM and MLM under different levels of memory availability. In each measurement period, 10M packets are processed, i.e., n = 10M, which translates into about 8 seconds for an OC-192 link (10+ Gbps) or about 2 seconds for an OC-768 link (40+ Gbps) with an average packet size of 1,000 bytes. The memory M allocated to this particular measurement function is varied from 2Mb ($2 \times 2^{20}$ bits) to 8Mb. The counter length b and the number of counters m are determined from the equations in Section 2.7. The size of each storage vector is 50 (l = 50).

When M = 2Mb, the experimental results are presented in Figure 2-2. The first plot from the left shows the estimation results by CSM for one measurement period; the results for other measurement periods are very similar. Each flow is represented by a point in the plot, whose x coordinate is the true flow size s and whose y coordinate is the estimated flow size $\hat{s}$. The equality line, y = x, is also shown for reference. An estimate is more accurate if its point is closer to the equality line.

The second plot presents the 95% confidence intervals for the estimates made by CSM. The width of each vertical bar shows the size of the confidence interval at a certain flow size (which is the x coordinate of the bar). The middle point of each bar shows the mean estimate for all flows of that size. Intuitively, the estimation is more accurate if the confidence interval is smaller and the middle point is closer to the equality line.

The third plot shows the estimation results by MLM, and the fourth plot shows the 95% confidence intervals for the estimates made by MLM. Clearly, MLM achieves better accuracy than CSM. The estimation accuracy shown in Figure 2-2 is achieved with a memory of slightly less than 2 bits per flow.

We can improve the estimation accuracy of CSM or MLM by using more memory. We increase M to 4Mb and repeat the above experiments. The results are shown in Figure 2-3. We then increase M to 8Mb and repeat the above experiments. The results are shown in Figure 2-4. The accuracy clearly improves, as the confidence intervals shrink when M becomes larger.

We repeat the same experiments on CB, whose parameters are selected according to [75]. The results are presented in Figure 2-5. The first plot shows that CB totally fails to produce any meaningful results when the available memory is too small: M = 2Mb, which translates into less than 2 bits per flow. In fact, its algorithm cannot converge, but instead produces oscillating results. We have to artificially stop the algorithm after a very long time. The second plot shows that CB works well when M = 4Mb. The algorithm still cannot converge by itself, even though it can produce very good results when we artificially stop it after a long time without observing any further improvement in the results. It can be seen that the results carry a small positive bias, because most points are on one side of the equality line. The third plot shows that CB is able to return the exact sizes for most flows when the memory is M = 8Mb.

Combining these results with Table 2-1, we draw the following conclusions: (1) In practice, we should choose CSM/MLM if the requirement is to handle high measurement throughput (which means low per-packet processing time) or if the available memory is too small for CB to work, while relatively coarse estimation is acceptable. (2) We should choose CB if the processing time is less of a concern, sufficient memory is available, and the exact flow sizes are required.

We also run MRSCBF under different levels of memory availability. We begin with M = 8Mb. CSM or MLM works very well with this memory size (Figure 2-4). The performance of MRSCBF is shown in the first plot of Figure 2-6. There are some very large estimated sizes. To control the scale of the vertical axis, we artificially set any estimate beyond 2,800 to 2,800. The results demonstrate that MRSCBF totally fails when M = 8Mb. The performance of MRSCBF improves when we increase the memory.

The results when M = 40Mb are shown in the second plot.¹ In the third plot, when we further increase M to 80Mb,² no obvious improvement is observed compared with the second plot. A final note is that the original paper of MRSCBF uses a log scale in its presentation. The third plot in Figure 2-6 would appear as the fourth plot when drawn in log scale.

¹ At the end of each measurement period, about half of the bits in the filters of MRSCBF are set to ones.
² At the end of each measurement period, less than half of the bits in the filters of MRSCBF are set to ones.

Clearly, the bitmap-based MRSCBF performs worse than CB, CSM, or MLM. To measure flow sizes, counters are superior to bitmaps.

2.10 Extension of Estimation Range

In Section 2.9, we set the upper bound on the flow size that CSM and MLM can estimate to 2,500. However, in today's high-speed networks, the sizes of some flows are much larger than 2,500. In order to extend the estimation range to cover these large flows, we propose four approaches that increase the estimation upper bound, and present extensive experimental results to demonstrate their effectiveness. Since MLM generally performs better than CSM, we only discuss how to extend the estimation range of MLM. CSM can be easily enhanced by similar approaches.

According to Section 2.4, each flow is assigned a unique storage vector. A flow's storage vector consists of l counters, and each counter has b bits. Therefore, the maximum number of packets that the storage vector can represent is $l(2^b-1)$. If we increase b by one, the number of packets that the vector can represent will be roughly doubled. Similarly, if we increase l by a certain factor, the number of packets that the vector can represent will be increased by the same factor. Based on these observations, we extend the estimation range of MLM by increasing the values of b and l, respectively. In addition,
we add a sampling module to MLM and consider a hybrid SRAM/DRAM implementation to extend the estimation range.

2.10.1 Increasing Counter Size b

Our first approach to extending the estimation range is to enlarge the counter size b. We repeat the same experiment on MLM presented in the third plot of Figure 2-3 (Section 2.9.2), where M = 4Mb, l = 50, and n = 10M. This time, instead of computing b from the equations in Section 2.7, we vary its value from 6 to 9. The new experimental results are shown in Figure 2-7. In the first plot, the maximum flow size that MLM can estimate is about 1,400 when b = 6. In the second plot, where b = 7, the maximum flow size is about 2,800, which is twice the maximum flow size achieved in the first plot. When b is set to 8, the third plot shows that the estimation range of MLM is further extended. The fourth plot shows that, when b = 9, the maximum flow size that MLM can estimate does not increase any further compared with the third plot, which we will explain shortly. The estimation accuracy of the above experiments is presented in Figure 2-8, where the first plot shows the estimation bias and the second plot shows the standard deviation of the experimental results in Figure 2-7. Generally speaking, both bias and standard deviation increase slightly when b increases.

Since flows share counters in MLM, the size information of one flow in a counter is noise to the other flows that share the same counter. When the amount of memory allocated to MLM is fixed (M = 4Mb in these experiments), a larger value of b results in a smaller value of m, i.e., the total number of counters is reduced. Hence, each counter has to be shared by more flows, and the average number of packets stored in each counter increases. That means heavier noise among flows, which degrades the estimation accuracy, as demonstrated by Figure 2-8. Moreover, although a counter with a larger size b can keep track of a larger number of packets, it also carries more noise, so MLM has to subtract more noise from the counter value during the estimation process. As a result, the estimation range cannot be extended indefinitely by
simply increasing b, which explains why the maximum flow size that MLM can estimate does not increase when b reaches 9 in Figure 2-7.

2.10.2 Increasing Storage Vector Size l

Our second approach to extending the estimation range is to increase the storage vector size l. We repeat the experiments in the previous subsection for MLM with M = 4Mb, b = 7, and n = 10M. We vary l from 50 to 1,000. Figure 2-9 presents the experimental results. The first plot shows that the maximum flow size that MLM can estimate is about 5,800 when l = 50. As we increase the value of l, MLM can estimate increasingly larger flow sizes. However, when l becomes too large, the estimation accuracy degrades, which is evident in the fourth plot. The reason is that each flow shares too many counters with others, which results in excessive noise in the counters and consequently introduces inaccuracy into the estimation process.

The estimation accuracy of the above experiments is presented in Figure 2-10, where the first plot shows the estimation bias and the second plot shows the standard deviation of the experimental results in Figure 2-9. Generally speaking, both bias and standard deviation increase slightly when l increases. Clearly, the value of l should not be chosen too large (such as l = 1,000), in order to prevent the estimation accuracy from degrading significantly.

2.10.3 Employing Sampling Module

In our third approach, we add a sampling module to MLM to enlarge the estimation range. The sampling technique has been widely used in network measurement [13, 35, 36, 66, 128]. We show that it also works for MLM. Let p be the sampling probability. For each packet that the router receives in the data encoding phase, the router generates a random number r in a range [0, N]. If r

We again repeat the experiments in the previous subsections for MLM with M = 4Mb, l = 50, and n = 10M. The value of b is computed from the equations in Section 2.7. This time, we introduce a sampling probability p and vary its value. Figure 2-11 presents the experimental results of MLM with p = 75%, 50%, 25%, and 2%, respectively. The figure shows that when the sampling probability decreases, the estimation range increases. However, this comes with a penalty in estimation accuracy. Figure 2-12 shows the estimation bias and standard deviation of the estimation results in Figure 2-11. If the sampling probability is not too small, e.g., when p ≥ 25%, the increase in bias and standard deviation is insignificant. However, if the sampling probability becomes too small, such as 2%, the degradation in estimation accuracy becomes noticeable.

2.10.4 Hybrid SRAM/DRAM Design

Can we extend the estimation range without any limitation, and do so without any degradation in estimation accuracy? This requires a hybrid SRAM/DRAM design. In SRAM, we still choose the value of b based on the equations in Section 2.7. The limited size of each counter means that a counter may overflow during the data encoding phase, even though the chance for this to happen is very small (Section 2.7). To totally eliminate the impact of counter overflow, we keep another array of counters in DRAM, each of which has a sufficient number of bits. The counters in DRAM are mapped one-to-one to the counters in SRAM. When a counter in SRAM overflows, it is reset to zero and the corresponding counter in DRAM is incremented by one. During offline data analysis, each counter value is reconstructed from both the SRAM and the DRAM data: a DRAM value d and an SRAM value c together represent $d \cdot 2^b + c$ packets. Because overflow happens only to a small fraction of SRAM counters, and a DRAM access is made only after an overflowed SRAM counter has been accessed $2^b$ times, the overall overhead of DRAM access is very small.

2.11 Summary

Per-flow traffic measurement provides real-world data for a variety of applications, such as accounting and billing, anomaly detection, and traffic engineering. Current online data collection methods cannot meet the requirements of being both fast and compact. This work proposes a novel data encoding/decoding scheme, which mixes per-flow information randomly in a tight SRAM space for compactness. Its online operation only incurs a small overhead of one hash computation and one counter update per packet. Two offline statistical methods, counter sum estimation and maximum likelihood estimation, are used to extract per-flow sizes from the mixed data structures with good accuracy. Due to its fundamentally different design philosophy, the new measurement function is able to work in a tight space where exact measurement is no longer possible, and it does so with the minimal number of memory accesses per packet.

Figure 2-1. Traffic distribution: each point shows the number (y coordinate) of flows that have a certain size (x coordinate).

Figure 2-2. Estimation results by CSM and MLM when M = 2Mb.

Figure 2-3. Estimation results by CSM and MLM when M = 4Mb.

Figure 2-4. Estimation results by CSM and MLM when M = 8Mb.

Figure 2-5. Estimation results by CB when M = 2, 4, and 8Mb.

Table 2-1. Number of memory accesses and number of hash computations per packet

Method    Memory accesses    Hash computations    Constant?
CSM       2                  1                    Y
MLM       2                  1                    Y
CB        6                  3                    N
MRSCBF    4.47               4.47                 N

Figure 2-6. Estimation results by MRSCBF.

Figure 2-7. Estimation results by MLM when b = 6, 7, 8, and 9. In these experiments, n = 10M and M = 4Mb.

Figure 2-8. Estimation bias and standard deviation in the experimental results shown in Figure 2-7.

Figure 2-9. Estimation results by MLM when l = 50, 70, 100, and 1000. In these experiments, n = 10M and M = 4Mb.

Figure 2-10. Estimation bias and standard deviation in the experimental results shown in Figure 2-9.

Figure 2-11. Estimation results by MLM when p = 75%, 50%, 25%, and 2%. In these experiments, n = 10M and M = 4Mb.

Figure 2-12. Estimation bias and standard deviation in the experimental results shown in Figure 2-11.

CHAPTER 3
SCAN DETECTION IN HIGH-SPEED NETWORKS

This chapter attempts to fill a missing piece in the existing research landscape. Its goal is to optimally combine probabilistic sampling and bit sharing, the two most effective memory reduction methods, to fulfill quantitatively specified performance objectives. We have three contributions. First, we present a generalized scheme for scan detection based on bit sharing. It incorporates probabilistic sampling and enhances security through a private key. Second, as the main result of this chapter, we show analytically how to optimally combine probabilistic sampling and bit sharing. We derive the probability for the integrated sampling/bit-sharing scheme to miss reporting a scanner and the probability to mistakenly report a non-scanner. We then construct an iterative algorithm that solves a non-linear constrained optimization problem to obtain the optimal values for the sampling probability and other parameters, such that the memory required to bound the above probabilities is minimized. Third, we perform experiments based on real traffic traces and demonstrate that, using the optimal parameters obtained from this work, we can reduce the memory consumption by three to twenty times compared with the best existing work. Remarkably, the number of bits required by our scheme is far smaller than the number of distinct sources in the traffic trace. On average, it takes much less than 1 bit per source to perform scan detection.

The rest of this chapter is organized as follows: Section 3.1 gives the problem definition. Section 3.2 describes the related work. Section 3.3 describes our generalized bit-sharing scheme. Section 3.4 presents the analytical results for optimal parameters. Section 3.5 presents the experimental results. Section 3.6 gives the summary.

3.1 Problem Statement

The number of distinct destination addresses that an external source has contacted is called the spread of the source. The problem of scan detection is to configure a
firewall or an intrusion detection system (IDS) to report all external sources whose spreads exceed a certain threshold during a measurement period. We refer to these sources as potential scanners (or scanners for short).

If a firewall or an IDS keeps the exact count of distinct destinations that each source has contacted, it is able to report the scanners precisely. However, keeping track of per-source information consumes a large amount of resources. The limited SRAM may only allow us to estimate a rough count of the distinct destinations that each source contacts [110, 114, 115, 129]. When precisely reporting scanners is infeasible, the function of scan detection must be defined in probabilistic terms.

We adopt the probabilistic performance objective from [110]. Let h and l be two positive integers, h > l. Let $\alpha$ and $\beta$ be two probability values, $0 < \alpha < 1$ and $0 < \beta < 1$. The objective is to report any source whose spread is h or larger with a probability no less than $\alpha$, and to report any source whose spread is l or smaller with a probability no more than $\beta$. Let k be the spread of an arbitrary source src. The objective can be expressed in terms of conditional probabilities:

$$\Pr\{\text{report } src \text{ as a scanner} \mid k \ge h\} \ge \alpha, \qquad \Pr\{\text{report } src \text{ as a scanner} \mid k \le l\} \le \beta.$$

We treat the report of a source whose spread is l or smaller as a false positive, and the non-report of a source whose spread is h or larger as a false negative. Hence, the above objective can also be stated as bounding the false positive ratio by $\beta$ and the false negative ratio by $1-\alpha$.

Our goal is to minimize the amount of SRAM that is needed for achieving the above objective.

The memory requirement for detecting aggressive scanners is likely to be small. For example, suppose an aggressive scanner makes 100 distinct contacts each second, whereas a normal host rarely makes 100 distinct contacts in a day. To detect such a
scanner, a firewall can set the measurement period to one second. The number of contacts that pass the firewall in such a small period is likely to be small. Consequently, it does not need much memory to store them. However, the situation is totally different for stealthy scanners that make contacts at low rates. Consider a scanner that makes 500 distinct contacts a day. If the measurement period is a day, we are able to set it apart from the normal hosts. However, if the measurement period is one second, we will not detect this scanner, because it makes less than 0.006 contacts per second on average (500/86,400).

In order to detect different types of scanners, a firewall may execute multiple instances of a scan detection function simultaneously, each having a different measurement period. For aggressive scanners, a small period will be chosen so that they can be detected in real time. For stealthy scanners, a large period will be chosen. In the latter case, timely detection is of secondary importance because the scanners themselves operate slowly, whereas the memory requirement is the first priority due to the large number of contacts that are expected to pass through the firewall in a long measurement period. Reducing memory consumption is the focus of this study.

3.2 Related Work

Venkataraman et al. [110] use hash tables to store the addresses of the sampled contacts. Their main contribution is to derive the optimal sampling probability that achieves a classification objective with pre-specified upper bounds on the false-positive ratio and the false-negative ratio. However, because their algorithms store the contact addresses, they leave great room for improvement. Even if Bloom filters are used, the room for improvement is still significant, as we show in Section 3.5.

Estan et al. [41] propose a variety of bitmap algorithms to store the contacts (or active flows in their context). This saves space because each destination address is stored as a bit. However, assigning one bitmap to each source is not cheap if the average number of contacts per source is small. In addition, an index structure is needed
to map a source to its bitmap. It is typically a hash table where each entry stores a source address and a pointer to the corresponding bitmap. Cao et al. [13] develop the thresholded bitmap algorithm, based on the virtual bitmap algorithm presented in [41], for spread estimation. They use probabilistic sampling to reduce the information to be stored. Zhao et al. [129] share a set of bitmaps among all sources. The scheme assigns three pseudo-randomly selected bitmaps to each source. When the source contacts a destination, the destination is stored by setting one bit in each of the three bitmaps. Because the bitmaps are shared by others, the information stored for one source becomes noise for others. Yoon et al. [114, 115] observe that the noise introduced by sharing bitmaps cannot be appropriately removed if the number of bitmaps is not sufficiently large. By sharing bits instead of bitmaps, their scheme, CSE, considerably reduces the memory consumption.

Also related is the work by Bandi et al. [5] on the heavy distinct hitter problem, which is essentially the same as spreader classification. Their algorithm exploits TCAM (Ternary Content Addressable Memory), a special kind of memory found in NPUs (Network Processing Units). The emphasis of their work is on the processing time. A related branch of research is the detection of heavy hitters [17, 29, 33, 35, 40, 48, 81, 123]. A heavy hitter is a source that sends many packets during a measurement period, regardless of whether the packets are sent to a few or many distinct destinations.

3.3 An Efficient Scan Detection Scheme

This section presents our efficient scan detection scheme (ESD).

3.3.1 Probabilistic Sampling

To save resources, a firewall (or IDS) samples the contacts made by external sources to internal destinations, and it only stores the sampled contacts. The firewall selects contacts for storage uniformly at random with a sampling probability p. The sampling procedure is simple: the firewall hashes the source/destination address pair of each packet that arrives at the external network interface into a number in a range
[0, N). If the hash result is smaller than pN, the contact will be stored; otherwise, the contact will not be stored.

3.3.2 Bit-Sharing Storage

A bit array (also called a bitmap) may be used to store all sampled contacts made by a source [41]. The bits are initially zeros. Each sampled contact is hashed to a bit in the bitmap, and the bit is set to one. At the end of the measurement period, we can estimate the number of contacts, i.e., the spread of the source, based on the number of zeros remaining in the bitmap. Using per-source bitmaps is not memory-efficient. On one hand, the size of each bitmap has to be large enough to ensure accuracy in estimating the spread values of the scanners. On the other hand, the vast majority of normal sources have small spread values, and their bitmaps are largely wasted because most bits remain zeros. To solve this problem, we want to put those wasted bits to good use by allowing bitmaps to share their bits.

To fully share the available bits, ESD stores contacts from different sources in a single bit array B. Let m be the number of bits in B. For an arbitrary source src, we use a hash function to pseudo-randomly select a number of bits from B to store the contacts made by src. The indices of the selected bits are $H(src \oplus R[0]), H(src \oplus R[1]), \ldots, H(src \oplus R[s-1])$, where $H(\ldots)$ is a hash function whose range is [0, m), R is an integer array storing randomly chosen constants whose purpose is to arbitrarily alter the hash result, and s (much smaller than m) is a system parameter that specifies the number of bits to be selected. The above bits form a logical bitmap of source src, denoted as LB(src).

Similarly, a logical bitmap can be constructed from B for any other source. Essentially, we embed the bitmaps of all possible sources in B. The bit-sharing relationship is determined dynamically on the fly, as each new source src′ whose contacts are sampled by the firewall will be allocated a logical bitmap LB(src′) from B.

At the beginning of a measurement period, all bits in B are reset to zeros. Consider an arbitrary contact ⟨src, dst⟩ that is sampled for storage, where src is the source
address and dst is the destination address. The firewall sets a single bit in B to one. Obviously, it must also be a bit in the logical bitmap LB(src). The index of the bit to be set for this contact is given as follows:

$$H\big(src \oplus R[H(dst \oplus K) \bmod s]\big).$$

The second hash, $H(dst \oplus K)$, ensures that the bit is pseudo-randomly selected from LB(src). The private key K is introduced to prevent hash collision attacks. In such an attack, a scanner src finds a set of destination addresses, $dst_1, dst_2, \ldots$, that have the same hash value, $H(dst_1) = H(dst_2) = \ldots$. If it only contacts these destinations, the same bit in LB(src) will be set, which allows the scanner to stay undetected. This type of attack can be prevented if we use a cryptographic hash function such as MD5 or SHA-1, which makes it difficult to find destination addresses that have the same hash value. However, if a weaker hash function is used for performance reasons, then a private key becomes necessary. Without knowing the key, the scanners will not be able to predict which destination addresses produce the same hash value.

To store a contact, ESD only sets a single bit and performs two hash operations. This is more efficient than the methods that use hash tables [110] or that have features similar to Bloom filters, which require setting multiple bits for storing each contact [129].

3.3.3 Maximum Likelihood Estimation and Scanner Report

At the end of the measurement period, ESD sends the content of B to an offline data processing center. There, the logical bitmap of each source src is extracted, and the estimated spread $\hat{k}$ of the source is computed. Only if $\hat{k}$ is greater than a threshold value T does ESD report the source as a potential scanner. We will discuss how to keep track of the source addresses in Section 3.3.5 and explain how to determine the threshold T in Section 3.4. Below we derive the formula for $\hat{k}$.

Let k be the true spread of source src, and n be the number of distinct contacts made by all sources. Let $V_m$ be the fraction of bits in B whose values are zeros at the
end of the measurement period, $V_s$ be the fraction of bits in LB(src) whose values are zeros, and $U_s$ be the number of bits in LB(src) whose values are zeros. Clearly, $V_s = U_s/s$. Depending on the context, $V_m$ (or $V_s$, $U_s$) is used either as a random variable or as an instance value of the random variable.

The probability for any contact to be sampled for storage is p. Consider an arbitrary bit b in LB(src). A sampled contact made by src has a probability of $1/s$ to set b to '1', and a sampled contact made by any other source has a probability of $1/m$ to set b to '1'. Hence, the probability $q(k)$ for b to remain '0' at the end of the measurement period is

$$q(k) = \left(1-\frac{p}{m}\right)^{n-k}\left(1-\frac{p}{s}\right)^{k}.$$

Each bit in LB(src) has a probability of $q(k)$ to remain '0'. The observed number of '0' bits in LB(src) is $U_s$. The likelihood function for this observation to occur is given as follows:

$$L = q(k)^{U_s}\,\big(1-q(k)\big)^{s-U_s}.$$

In the standard process of maximum likelihood estimation, the unknown value k is technically treated as a variable in the likelihood function. We want to find an estimate $\hat{k}$ that maximizes the likelihood function. Namely,

$$\hat{k} = \arg\max_{k}\{L\}.$$

Since the maximizer is not affected by monotone transformations, we take the logarithm to turn the right side of the likelihood function from a product into a summation:

$$\ln(L) = U_s\ln\big(q(k)\big) + (s-U_s)\ln\big(1-q(k)\big).$$
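The online part of ESD (Sections 3.3.1 and 3.3.2) amounts to one sampling test and at most one bit write per packet. The Python sketch below is a hypothetical model, not the author's implementation: BLAKE2b stands in for H, BLAKE2b's built-in keyed mode stands in for $H(dst \oplus K)$ (a swapped-in way of mixing in the private key K), N is taken to be $2^{32}$, and a bytearray holding one byte per bit stands in for the SRAM array B. Following Section 3.3.5, it also records src whenever a contact flips a bit from '0' to '1'.

```python
import hashlib
import random


def _h(data: bytes, key: bytes = b"") -> int:
    """64-bit hash of data; with a non-empty key, BLAKE2b's keyed mode is used."""
    return int.from_bytes(hashlib.blake2b(data, digest_size=8, key=key).digest(), "big")


class ESDEncoder:
    """Sampling plus bit-sharing storage for contacts <src, dst> (IPv4 as ints)."""

    N = 1 << 32  # range [0, N) of the sampling hash

    def __init__(self, m: int, s: int, p: float, key: bytes, seed: int = 7):
        self.m, self.s, self.p, self.key = m, s, p, key
        rng = random.Random(seed)
        self.R = [rng.getrandbits(32) for _ in range(s)]  # constants R[0..s-1]
        self.B = bytearray(m)         # one byte per bit, for clarity
        self.sources: list[int] = []  # addresses stored on 0 -> 1 transitions

    def sampled(self, src: int, dst: int) -> bool:
        """Keep the contact iff hash(src, dst) mod N < pN."""
        pair = src.to_bytes(4, "big") + dst.to_bytes(4, "big")
        return _h(pair) % self.N < self.p * self.N

    def bit_index(self, src: int, dst: int) -> int:
        """H(src XOR R[H(dst, K) mod s]) mod m."""
        j = _h(dst.to_bytes(4, "big"), key=self.key) % self.s
        return _h((src ^ self.R[j]).to_bytes(8, "big")) % self.m

    def record(self, src: int, dst: int) -> None:
        """Process one packet's contact."""
        if not self.sampled(src, dst):
            return
        idx = self.bit_index(src, dst)
        if self.B[idx] == 0:
            self.B[idx] = 1
            self.sources.append(src)

    def logical_bitmap(self, src: int) -> list[int]:
        """LB(src): the bits H(src XOR R[i]) mod m for i = 0..s-1."""
        return [self.B[_h((src ^ r).to_bytes(8, "big")) % self.m] for r in self.R]
```

Because repeated packets of the same contact hash to the same bit, a long TCP/UDP session sets at most one bit and stores its source address at most once.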

From the expression for $q(k)$, the above equation can be written as

$$\ln(L) = U_s\left((n-k)\ln\left(1-\frac{p}{m}\right)+k\ln\left(1-\frac{p}{s}\right)\right)+(s-U_s)\ln\left(1-\left(1-\frac{p}{m}\right)^{n-k}\left(1-\frac{p}{s}\right)^{k}\right).$$

To find the maximum, we differentiate both sides:

$$\frac{\partial\ln(L)}{\partial k} = \ln\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)\cdot\frac{U_s-s\left(1-\frac{p}{m}\right)^{n-k}\left(1-\frac{p}{s}\right)^{k}}{1-\left(1-\frac{p}{m}\right)^{n-k}\left(1-\frac{p}{s}\right)^{k}}.$$

We then set the right side to zero. That is,

$$U_s = s\left(1-\frac{p}{m}\right)^{n-k}\left(1-\frac{p}{s}\right)^{k}.$$

Taking the logarithm of both sides, we have

$$\ln\frac{U_s}{s} = n\ln\left(1-\frac{p}{m}\right)+k\left(\ln\left(1-\frac{p}{s}\right)-\ln\left(1-\frac{p}{m}\right)\right),$$

$$k = \frac{\ln V_s - n\ln\left(1-\frac{p}{m}\right)}{\ln\left(1-\frac{p}{s}\right)-\ln\left(1-\frac{p}{m}\right)},$$

where $V_s = U_s/s$. Suppose the number of sources (which equals the number of logical bitmaps) is sufficiently large. Because every bit in every logical bitmap is randomly selected from B, each of the n contacts has about the same probability $p/m$ of setting any given bit in B. Hence, we have

$$E(V_m) = \left(1-\frac{p}{m}\right)^{n}.$$

Applying this expectation to the expression for k, we have

$$k = \frac{\ln V_s - \ln E(V_m)}{\ln\left(1-\frac{p}{s}\right)-\ln\left(1-\frac{p}{m}\right)}.$$
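The closed form above is cheap to evaluate once the zero fractions are known. The Python sketch below (hypothetical names, not the author's code) computes it from a zero fraction $V_s$ of the logical bitmap and a zero fraction of B; the argument `v_m` may be $E(V_m)$, as in the expression just derived, or the measured fraction of zeros in B, which is the substitution made in the next step. It assumes $0 < V_m \le 1$ and $p < s$; if LB(src) has no zero bits, $\ln V_s$ is undefined, and the sketch returns infinity to signal that the spread is beyond what s bits can resolve. A source would then be reported when the estimate exceeds the threshold T of Section 3.4.

```python
import math


def zero_fraction(bits) -> float:
    """Fraction of '0' entries in a bit sequence (V_s for LB(src), V_m for B)."""
    return sum(1 for b in bits if b == 0) / len(bits)


def spread_estimate(v_s: float, v_m: float, s: int, m: int, p: float) -> float:
    """k = (ln V_s - ln V_m) / (ln(1 - p/s) - ln(1 - p/m))."""
    if v_s == 0:
        return math.inf  # saturated logical bitmap: spread not resolvable
    return (math.log(v_s) - math.log(v_m)) / (math.log1p(-p / s) - math.log1p(-p / m))


if __name__ == "__main__":
    # Noise-free check: plug in the expected zero fractions for a known spread k.
    n, m, s, p, k = 100_000, 2**20, 128, 0.5, 300
    v_m = (1 - p / m) ** n
    v_s = (1 - p / m) ** (n - k) * (1 - p / s) ** k
    print(spread_estimate(v_s, v_m, s, m, p))  # approximately 300
```

Plugging in the expected zero fractions recovers k exactly up to floating-point rounding, which is a quick consistency check of the algebra above; with measured fractions, the estimate inherits the randomness of $V_s$ and $V_m$.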
(3) ReplacingE(Vm)bytheinstancevalueVm,wehavethefollowingestimationfork.^k=lnVs)]TJ /F6 11.955 Tf 11.96 0 Td[(lnVm ln(1)]TJ /F4 7.97 Tf 13.15 5.04 Td[(p s))]TJ /F6 11.955 Tf 11.95 0 Td[(ln(1)]TJ /F4 7.97 Tf 14.34 5.04 Td[(p m), (3) 62

PAGE 63

whereVscanbemeasuredbycountingthenumberofzerosinLB(src),VmcanbemeasuredbycountingthenumberofzerosinB,ands,pandmarepre-setparametersofESD(seethenextsection). 3.3.4VarianceofVm LetAibetheeventthattheithbitinBremains`0'attheendofthemeasurementperiodand1Aibethecorrespondingindicatorrandomvariable.LetUmbetherandomvariableforthenumberof`0'bitsinB.WerstderivetheprobabilityforAitooccurandtheexpectedvalueofUm.ForanarbitrarybitinB,eachdistinctcontacthasaprobabilityofp mtosetthebittoone.AllcontactsareindependentofeachotherwhensettingbitsinB.Hence, ProbfAig=(1)]TJ /F3 11.955 Tf 14.85 8.09 Td[(p m)n,8i2[0,s). TheprobabilityforAiandAj,8i,j2[0,m),i6=j,tohappensimultaneouslyisProbfAi\Ajg=(1)]TJ /F21 10.909 Tf 12.1 7.38 Td[(2p m)n.SinceVm=Um mandUm=Pmi=11Ai,wehaveE(V2m)=1 m2E((mXi=11Ai)2)=1 m2E(mXi=112Ai)+2 m2E(X1i
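
The two probabilities above are enough to evaluate $\mathrm{Var}(V_m)$ numerically, since $\mathrm{Var}(U_m) = m\,\mathrm{Prob}\{A_i\} + m(m-1)\,\mathrm{Prob}\{A_i \cap A_j\} - (m\,\mathrm{Prob}\{A_i\})^2$. The following Python sketch is our own illustration rather than part of ESD. The function names `var_vm_exact` and `var_vm_approx` are hypothetical, and the second function evaluates the closed-form approximation used in Section 3.4.2, so the two can be compared under the same independence assumption.

```python
import math

def var_vm_exact(n: int, m: int, p: float) -> float:
    """Var(V_m) from Prob{A_i} and Prob{A_i and A_j} under the independence model."""
    q1 = math.exp(n * math.log1p(-p / m))        # Prob{A_i} = (1 - p/m)^n
    q2 = math.exp(n * math.log1p(-2 * p / m))    # Prob{A_i and A_j} = (1 - 2p/m)^n
    # Var(U_m) = m*q1 + m(m-1)*q2 - (m*q1)^2, regrouped to limit cancellation
    var_um = m * (q1 - q2) + m * m * (q2 - q1 * q1)
    return var_um / (m * m)                      # V_m = U_m / m

def var_vm_approx(n: int, m: int, p: float) -> float:
    """Closed form e^{-np/m} (1 - (1 + n p^2 / m) e^{-np/m}) / m."""
    e = math.exp(-n * p / m)
    return e * (1 - (1 + n * p * p / m) * e) / m

if __name__ == "__main__":
    p = 0.5                                      # hypothetical sampling probability
    for m in (10_000, 100_000, 1_000_000):
        n = 2 * m                                # hypothetical load with np/m = 1
        exact = var_vm_exact(n, m, p)
        rel_std = math.sqrt(exact) / math.exp(n * math.log1p(-p / m))
        print(f"m={m:>9}  exact={exact:.3e}  "
              f"approx={var_vm_approx(n, m, p):.3e}  Std/E={rel_std:.3e}")
```

For a fixed load $np/m$, both values shrink roughly in proportion to $1/m$.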

3.3.5 Source Addresses

ESD does not store the source address of every arrival packet. Instead, it stores a source address only when a contact sets a bit in $B$ from '0' to '1'. Hence, the frequency of storing source addresses is much smaller than the frequency at which contacts are sampled for setting bits in $B$. First, numerous packets may be sent from a source to a destination in a TCP/UDP session. Only the first sampled packet may cause the source address to be stored, because only the first packet sets a bit from '0' to '1' and the remaining packets will set the same bit (which is already '1'). Second, a source may send thousands or even millions of packets through a firewall, but the number of times its address will be stored is bounded by $s$ (which is the number of bits in the source's logical bitmap). In summary, because the operation of storing source addresses is relatively infrequent, these addresses can be stored in the main memory.

3.4 Optimal System Parameters and Minimum Memory Requirement

In this section, we first develop the constraints that the system parameters must satisfy in order to achieve the probabilistic performance objective. Based on the constraints, we determine the optimal values for the sizes of the logical bitmaps, the sampling probability $p$, and the threshold $T$. We also determine the minimum amount of memory $m$ that should be allocated for ESD to achieve the performance objective. Recall that on-die SRAM may be shared by other functions.

3.4.1 Report Probability

Consider an arbitrary source $src$ whose spread is $k$. Given a set of system parameters, $m$, $s$, $p$ and $T$, we derive the probability for ESD to report $src$ as a scanner, i.e., $\mathrm{Prob}\{\hat{k} \ge T\}$. From the estimator $\hat{k}$ derived above, we know that the following inequalities are equivalent (the denominator of $\hat{k}$ is negative because $p/s > p/m$ when $s < m$, so the inequality flips when we multiply through):

$$\hat{k} \ge T \iff \frac{\ln V_s - \ln V_m}{\ln\left(1-\frac{p}{s}\right) - \ln\left(1-\frac{p}{m}\right)} \ge T \iff V_s \le V_m\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T}.$$

Let $U_s$ be the random variable for the number of '0' bits in $LB(src)$, so that $U_s = sV_s$. The above inequality becomes

$$U_s \le sV_m\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T}.$$

For a set of parameters $m$, $s$, $p$ and $T$, we define a constant

$$C = sV_m\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T},$$

where the instance value of $V_m$ can be measured from $B$ after the measurement period. Hence, the probability for ESD to report $src$ is $\mathrm{Prob}\{\hat{k} \ge T\} = \mathrm{Prob}\{U_s \le C\}$.

$U_s$ follows the binomial distribution with parameters $s$ and $q(k)$, where $q(k)$, derived above, is the probability for an arbitrary bit in $LB(src)$ to remain zero at the end of the measurement period. Hence, the probability of having exactly $i$ zeros in $LB(src)$ is given by the following probability mass function:

$$\mathrm{Prob}\{U_s = i\} = \binom{s}{i}q(k)^{i}\left(1-q(k)\right)^{s-i}.$$

We thus have

$$\mathrm{Prob}\{\hat{k} \ge T\} = \mathrm{Prob}\{U_s \le C\} = \sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(k)^{i}\left(1-q(k)\right)^{s-i}.$$
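
The estimator and the equivalent report test can be checked with a few lines of Python. This is a minimal sketch under the model above, not the ESD implementation: `q_zero`, `estimate_spread` and `report_bound` are hypothetical names, the inputs $V_s$ and $V_m$ are the zero fractions defined earlier, and logarithms use `math.log1p` for accuracy because $p/m$ is tiny. The usage part, with made-up parameters, confirms that plugging the expected values into $\hat{k}$ returns the true spread, and that $\hat{k} \ge T$ holds exactly when $U_s \le C$.

```python
import math

def q_zero(k: float, n: int, m: int, s: int, p: float) -> float:
    """q(k) = (1 - p/m)^(n-k) * (1 - p/s)^k: a bit of LB(src) stays '0'."""
    return math.exp((n - k) * math.log1p(-p / m) + k * math.log1p(-p / s))

def estimate_spread(v_s: float, v_m: float, m: int, s: int, p: float) -> float:
    """k_hat = (ln V_s - ln V_m) / (ln(1 - p/s) - ln(1 - p/m)); needs V_s, V_m > 0."""
    return (math.log(v_s) - math.log(v_m)) / (math.log1p(-p / s) - math.log1p(-p / m))

def report_bound(v_m: float, m: int, s: int, p: float, T: float) -> float:
    """C = s * V_m * ((1 - p/s) / (1 - p/m))^T, so that k_hat >= T  <=>  U_s <= C."""
    return s * v_m * math.exp(T * (math.log1p(-p / s) - math.log1p(-p / m)))

if __name__ == "__main__":
    n, m, s, p, T = 4_000_000, 2**21, 150, 0.1, 100.5   # hypothetical parameters
    v_m = math.exp(n * math.log1p(-p / m))               # E(V_m) used as the instance value
    print(estimate_spread(q_zero(120, n, m, s, p), v_m, m, s, p))  # close to 120
    C = report_bound(v_m, m, s, p, T)
    reported = [u for u in range(1, s + 1)
                if estimate_spread(u / s, v_m, m, s, p) >= T]
    print(f"C = {C:.2f}; reported when U_s <= {max(reported)}")
```

Comparing $U_s$ against the precomputed $C$ avoids evaluating logarithms for every source at report time.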

3.4.2 Constraints for the System Parameters

We derive the constraints that the system parameters must satisfy in order to achieve the probabilistic performance objective defined in Section 3.1. First, we give the variance of $V_m$, which is derived in Section 3.3.4:

$$\mathrm{Var}(V_m) \approx \frac{e^{-\frac{np}{m}}\left(1-\left(1+\frac{np^2}{m}\right)e^{-\frac{np}{m}}\right)}{m}.$$

It approaches zero as $m$ increases. In Figure 3-1, we plot the ratio of the standard deviation $\mathrm{Std}(V_m) = \sqrt{\mathrm{Var}(V_m)}$ to $E(V_m) = (1-\frac{p}{m})^{n}$. The figure shows that $\mathrm{Std}(V_m)/E(V_m)$ is very small when $m$ is reasonably large. In this case, we can approximately treat $V_m$ as a constant:

$$V_m \approx E(V_m) \approx \left(1-\frac{p}{m}\right)^{n}.$$

The probabilistic performance objective can be stated as two requirements. First, the probability for ESD to report a source with $k \ge h$ must be at least $\alpha$. That is, $\mathrm{Prob}\{\hat{k} \ge T\} \ge \alpha$, $\forall k \ge h$. From the report probability derived in Section 3.4.1, this requirement can be written as the following inequality:

$$\sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(k)^{i}\left(1-q(k)\right)^{s-i} \ge \alpha,$$

where $C = sV_m\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T} \approx s\left(1-\frac{p}{m}\right)^{n}\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T}$. The left side of the inequality is an increasing function in $k$, because $q(k)$ decreases as $k$ grows. Hence, to satisfy the requirement in the worst case when $k = h$, the following constraint for the system parameters must be met:

$$\sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(h)^{i}\left(1-q(h)\right)^{s-i} \ge \alpha.$$
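
The left side of this constraint is a binomial sum that is easy to evaluate numerically. The Python sketch below is illustrative only: `report_probability` and `meets_objective` are hypothetical names, $V_m$ is replaced by $(1-p/m)^n$ as in the approximation above, and `meets_objective` also checks the analogous constraint for sources with $k \le l$ that is stated next. All parameter values in the usage part are made up for demonstration.

```python
import math

def report_probability(k: float, n: int, m: int, s: int, p: float, T: float) -> float:
    """Prob{k_hat >= T} = sum_{i=0}^{floor(C)} binom(s, i) q(k)^i (1 - q(k))^(s - i)."""
    log_a = math.log1p(-p / m)                         # ln(1 - p/m)
    log_b = math.log1p(-p / s)                         # ln(1 - p/s)
    q = math.exp((n - k) * log_a + k * log_b)          # q(k)
    C = s * math.exp(n * log_a + T * (log_b - log_a))  # C with V_m ~ (1 - p/m)^n
    upper = min(s, math.floor(C))
    return sum(math.comb(s, i) * q**i * (1 - q) ** (s - i) for i in range(upper + 1))

def meets_objective(h, l, alpha, beta, n, m, s, p, T) -> bool:
    """Worst cases: k = h must be reported with prob >= alpha, k = l with prob <= beta."""
    return (report_probability(h, n, m, s, p, T) >= alpha
            and report_probability(l, n, m, s, p, T) <= beta)

if __name__ == "__main__":
    n, m, s, p, T = 1_000_000, 2**20, 150, 0.5, 100   # hypothetical parameters
    for k in (50, 100, 150, 200):
        print(k, round(report_probability(k, n, m, s, p, T), 4))
    print(meets_objective(200, 50, 0.9, 0.1, n, m, s, p, T))
```

Because $q(k)$ decreases in $k$, the printed probabilities grow with $k$, which is why checking only the two worst cases $k = h$ and $k = l$ suffices.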

Second, the probability for ESD to report a source with $k \le l$ must be no more than $\beta$. This requirement can be similarly converted into the following constraint:

$$\sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(l)^{i}\left(1-q(l)\right)^{s-i} \le \beta.$$

3.4.3 Optimal System Parameters

Our goal is to optimize the system parameters such that the memory requirement, $m$, is minimized under the two constraints above. The problem is formally defined as follows.

$$\begin{aligned}
\text{Minimize}\quad & m\\
\text{subject to}\quad & \sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(h)^{i}\left(1-q(h)\right)^{s-i} \ge \alpha,\\
& \sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(l)^{i}\left(1-q(l)\right)^{s-i} \le \beta,\\
& C = s\left(1-\frac{p}{m}\right)^{n}\left(\frac{1-\frac{p}{s}}{1-\frac{p}{m}}\right)^{T}.
\end{aligned}$$

The parameters $\alpha$, $\beta$, $h$ and $l$ are specified in the performance objective. The value of $n$ is decided based on the history data in past measurement periods. To be conservative, we take the maximum number $n$ of distinct contacts observed in a number of previous measurement periods. More specifically, the expression $E(V_m) = (1-\frac{p}{m})^{n}$ can be turned into a formula for estimating $n$ in each previous period if we replace $E(V_m)$ with the instance value $V_m$ and use $\ln(1-\frac{p}{m}) \approx -\frac{p}{m}$:

$$\hat{n} = -\frac{m}{p}\ln V_m.$$

We derive the relative bias and the relative standard deviation of the above estimation:

$$\mathrm{Bias}\left(\frac{\hat{n}}{n}\right) = E\left(\frac{\hat{n}}{n}\right) - 1 \approx \frac{e^{\frac{np}{m}} - \frac{np^2}{m} - 1}{2np},$$

$$\mathrm{Std}\left(\frac{\hat{n}}{n}\right) = \frac{\sqrt{m}}{np}\left(e^{\frac{np}{m}} - \frac{np^2}{m} - 1\right)^{1/2}.$$

Both decrease as $m$ increases: the relative bias approaches zero, while the relative standard deviation approaches $\sqrt{(1-p)/(np)}$, the error introduced by sampling, which is small when $n$ is large. Based on the largest $\hat{n}$ value observed in a certain number of past measurement periods, we can set the value of $n$.

To solve the constrained optimization problem above, we need to determine the optimal values of the remaining three system parameters, $s$, $p$ and $T$, such that $m$ will be minimized. We consider the left side of the first constraint (for $h$) as a function $F_h(m,s,p,T)$, and the left side of the second constraint (for $l$) as $F_l(m,s,p,T)$. Namely,

$$F_h(m,s,p,T) = \sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(h)^{i}\left(1-q(h)\right)^{s-i}, \qquad F_l(m,s,p,T) = \sum_{i=0}^{\lfloor C\rfloor}\binom{s}{i}q(l)^{i}\left(1-q(l)\right)^{s-i}.$$

Both of them are non-increasing functions in $T$, according to the relation between $C$ and $T$ ($C$ decreases as $T$ increases, because $\frac{1-p/s}{1-p/m} < 1$). In the following, we present an iterative numerical algorithm to solve the optimization problem. The algorithm consists of four procedures.

Algorithm 1 Potential(m, s, p)
INPUT: $m$, $s$, $p$ and $\beta$
OUTPUT: The maximum value of $F_h(m,s,p,T)$ under the condition that $F_l(m,s,p,T) \le \beta$
  Pick a small integer $T_1$ such that $F_l(m,s,p,T_1) > \beta$ and a large integer $T_2$ such that $F_l(m,s,p,T_2) \le \beta$;
  while $T_2 - T_1 > 1$ do
    $T = \lfloor (T_1+T_2)/2 \rfloor$;
    if $F_l(m,s,p,T) > \beta$ then $T_1 = T$ else $T_2 = T$;
  end while
  $T^* = T_2$;
  return $F_h(m,s,p,T^*)$;

First, we construct a procedure called Potential(m, s, p), which takes a value of $m$, a value of $s$ and a value of $p$ as input and returns the maximum value of $F_h(m,s,p,T)$ under the condition that $F_l(m,s,p,T) \le \beta$ is satisfied. Because $F_h(m,s,p,T)$ is a non-increasing function in $T$, we need to find the smallest value of $T$ that satisfies $F_l(m,s,p,T) \le \beta$. That can be done numerically through binary search: Pick a small integer $T_1$ such that $F_l(m,s,p,T_1) > \beta$ and a large integer $T_2$ such that $F_l(m,s,p,T_2) \le \beta$. We iteratively shrink the difference between them by resetting one of them to be the average $\lfloor (T_1+T_2)/2 \rfloor$, while maintaining the inequalities $F_l(m,s,p,T_1) > \beta$ and $F_l(m,s,p,T_2) \le \beta$. The process stops when $T_2 = T_1 + 1$; $T_2$ is then the smallest value of $T$ that satisfies $F_l(m,s,p,T) \le \beta$, and we denote it as $T^*$. The procedure Potential(m, s, p) returns $F_h(m,s,p,T^*)$. The pseudocode is presented in Algorithm 1.

Essentially, what Potential(m, s, p) returns is the maximum value of the left side of the first constraint (for $h$) under the condition that the second constraint (for $l$) is satisfied. The difference between Potential(m, s, p) and $\alpha$ provides a quantitative indication of how conservatively or aggressively we have chosen the value of $m$. If Potential(m, s, p) $-\ \alpha$ is positive, the performance achieved by the current memory size is more than required, and we shall reduce $m$. On the contrary, if Potential(m, s, p) $-\ \alpha$ is negative, we shall increase $m$.

Given the above semantics, when we determine the optimal values for $p$ and $s$, our goal is to maximize the return value of Potential(m, s, p).

Second, given a value of $m$ and a value of $s$, we construct a procedure OptimalP(m, s) that determines the optimal value of $p$ such that Potential(m, s, p) is maximized. When the values of $m$ and $s$ are fixed, Potential(m, s, p) becomes a function of $p$. It is a curve as illustrated in Figure 3-2. The curve (without the arrows) shows the value of Potential(m, s, p) with respect to $p$ when $m = 0.45$ MB and $s = 150$. Its non-smooth appearance is due to $\lfloor C\rfloor$ in the formula of $F_h(m,s,p,T)$: $F_h(m,s,p,T)$ depends on the values of $\lfloor C\rfloor$ and $q(h)$, which are both functions of $p$.

We use a binary search algorithm to find a near-optimal value of $p$. Let $p_1 = 0$ and $p_2 = 1$. Let $\epsilon$ be a small positive value (such as 0.001). Repeat the following operation: Let $p = (p_1+p_2)/2$. If Potential(m, s, p)

This difference can be made arbitrarily small when we decrease $\epsilon$, at the expense of increased computation overhead. We want to stress that this is a one-time overhead (not an online overhead) incurred to determine the system parameters before deployment. The arrows in Figure 3-2 illustrate the operation of OptimalP(m, s). In the first iteration (arrow i1), $p_2$ is set to $(p_1+p_2)/2$. In the second iteration (arrow i2), $p_1$ is set to $(p_1+p_2)/2$. In the third iteration (arrow i3), $p_2$ is set to $(p_1+p_2)/2$.

Third, given a value of $m$, we construct a procedure OptimalS(m) that determines the optimal value of $s$ such that Potential(m, s, OptimalP(m, s)) is maximized. When the value of $m$ is fixed, Potential(m, s, OptimalP(m, s)) becomes a function of $s$. It is a curve as illustrated in Figure 3-3. We can use a binary search algorithm similar to that of OptimalP(m, s) to find $s$.

Fourth, we construct a procedure OptimalM() that determines the minimum memory requirement $m$ through binary search. Denote Potential(m, OptimalS(m), OptimalP(m, OptimalS(m))) as Potential(m, ...). Pick a small value $m_1$ such that Potential($m_1$, ...) $< \alpha$, which means that the performance objective is not met; more specifically, according to the semantics of Potential(...), the first constraint (for $h$) cannot be satisfied if the second constraint (for $l$) is satisfied. Pick a large value $m_2$ such that Potential($m_2$, ...) $\ge \alpha$, which means that the performance objective is met. Repeat the following operation: Let $m = \lfloor (m_1+m_2)/2 \rfloor$. If Potential(m, ...) $< \alpha$, set $m_1$ to $m$; otherwise, set $m_2$ to $m$. The above iterative operation terminates when $m_2 = m_1 + 1$, and $m_2$ is returned by the procedure OptimalM().

In practice, a network administrator will first define the performance objective that is specified by $\alpha$, $\beta$, $h$ and $l$. He or she sets the value of $n$ based on history data, and then sets $m$ = OptimalM(), $s$ = OptimalS(m), $p$ = OptimalP(m, s), and $T$ to the threshold value $T^*$ computed by the last call to Potential(m, s, p) during the execution of OptimalM(). After the firewall (or IDS) is configured with these parameters and begins to measure the network traffic, it also monitors the value of $n$. If the maximum number of distinct contacts in a measurement period changes significantly, the values of $m$, $s$, $p$ and $T$ will be recomputed.

3.5 Experiments

3.5.1 Experimental Setup

We evaluate the performance of ESD and compare it with existing work, including the Two-level Filtering Algorithm (TFA) [110], the Thresholded Bitmap Algorithm (TBA) [13], and the Compact Spread Estimator (CSE) [114]. TFA uses two filters to reduce both the number of sources to be monitored and the number of contacts to be stored. It is designed to satisfy the probabilistic performance objective defined in Section 3.1. TBA is not designed to meet the probabilistic performance objective; it cannot ensure that the false positive/false negative ratios are bounded. CSE is designed to estimate the spreads of external sources in a very compact memory space. It can be used for scan detection by reporting the sources whose estimated spreads exceed a certain threshold. However, the design of CSE makes it unsuitable for meeting the objective defined in Section 3.1.

Online Streaming Module (OSM) [129] is another related work. We do not implement OSM in this study because Yoon et al. show that, given the same amount of memory, CSE estimates spread values more accurately than OSM [114]. Moreover, the operations of OSM share certain similarity with Bloom filters. To store each contact, it computes three hash functions and makes three memory accesses. In comparison, ESD computes two hash functions and makes one memory access.

The experiments use a real Internet traffic trace captured by Cisco's NetFlow at the main gateway of our campus for a week. For example, in one day of the week, the traffic trace records 10,702,677 distinct contacts, 4,007,256 distinct source IP addresses and 56,167 distinct destination addresses. The average spread per source is 2.67, which means a source contacts 2.67 distinct destinations on average. Figure 3-4 shows the number of sources with respect to the source spread in log scale. The number of sources decreases exponentially as the spread value increases from 1 to 500. After that, there are zero, one or a few sources for each spread value.

We implement ESD, TFA, TBA and CSE, and execute them with the traffic trace as input. As part of the setup in each experiment, the values of $h$ and $l$ are given to specify what to report as scanners. For example, if $h = 500$ and $l = 0.7h$, the sources whose spreads are 500 or more should be reported, and the sources whose spreads are 350 or less should not be reported. In the experiments, the source of a contact is the IP address of the sender and the destination is the IP address of the receiver. The measurement period is one day. A long measurement period helps to separate low-rate scanners from normal hosts. The experimental results are the average over the week-long data.

One performance metric used in the comparison is the amount of memory that a scan detection scheme requires to meet a given probabilistic performance objective. Remarkably, the number of bits required by ESD is far smaller than the number of distinct sources in the traffic trace. That is, ESD requires much less than 1 bit per source to perform scan detection. Other performance metrics include the false positive ratio and the false negative ratio, which will be explained shortly.

3.5.2 Comparison in Terms of Memory Requirement

The first set of experiments compares ESD and TFA for the amount of memory that they need in order to satisfy a given probabilistic performance objective, which is specified by four parameters, $\alpha$, $\beta$, $h$ and $l$. See Section 3.1 for the formal definition of the performance objective. We do not compare TBA and CSE here because they are not designed to meet this objective.

The memory required by ESD is determined by the iterative algorithm in Section 3.4.3. The values of the other parameters, $s$, $T$ and $p$, are decided by the same algorithm. Using these parameters, we perform experiments on ESD with the traffic trace as input, and the experimental results confirm that the performance objective is indeed achieved for each day during the week. The amount of memory required by TFA is determined experimentally based on the method in [110] together with the traffic trace. The parameters of TFA are chosen based on the original paper.

The memory requirements of ESD and TFA are presented in Tables 3-1 and 3-2 with respect to $\alpha$, $\beta$, $h$ and $l$. For $\alpha = 0.9$ and $\beta = 0.1$, Table 3-1 shows that TFA requires six to twenty-four times the memory that ESD requires, depending on the values of $h$ and $l$ (which the system administrator will select based on the organization's security policy). For example, when $h = 500$ and $l = 0.5h$, ESD reduces the memory consumption by an order of magnitude compared with TFA.

To demonstrate the impact of probabilistic sampling, the table also includes the memory requirement of ESD when sampling is turned off (by setting $p = 1$). This version of ESD is denoted as ESD-1. Since $p$ is set as a constant, the iterative algorithm in Section 3.4.3 needs to be slightly modified: the procedure OptimalP(m, s) always returns 1, while the other procedures remain the same. Table 3-1 shows that the memory saved by sampling is significant when $h$ is large. For example, when $h = 5{,}000$ and $l = 0.3h$, ESD with sampling uses less than one thirteenth of the memory that is needed by ESD-1. However, when $h$ becomes smaller or $l/h$ becomes larger, ESD has to choose a larger sampling probability in order to limit the error in spread estimation caused by sampling. Consequently, it has to store more contacts and thus requires more memory. For instance, when $h = 500$ and $l = 0.5h$, ESD with sampling uses 55.6% of the memory that is needed by ESD-1.

Table 3-2 compares the memory requirements when $\alpha = 0.95$ and $\beta = 0.05$. It shows similar results: (1) ESD uses significantly less memory than TFA, and (2) the probabilistic sampling method in ESD is critical for memory saving, especially when $h$ is large or $l/h$ is small. The table also demonstrates that the memory requirement of either ESD or TFA increases when the performance objective becomes more stringent, i.e., when $\alpha$ is set larger and $\beta$ smaller.

TFA requires more memory because it stores the source and destination addresses of the contacts. In [128], the authors also indicate that Bloom filters [7, 8] can be used to reduce the memory consumption. However, the paper does not give a detailed design or parameter settings. Therefore, we cannot implement the Bloom-filter version of TFA. The paper claims that the memory requirement will be reduced by a factor of 2.5 when Bloom filters are used. Even when this factor is taken into account in Tables 3-1 and 3-2, the memory saving by ESD will still be significant.

3.5.3 Comparison in Terms of False Positive Ratio and False Negative Ratio

The false positive ratio (FPR) is defined as the fraction of all non-scanners (whose spreads are no more than $l$) that are mistakenly reported as scanners. The false negative ratio (FNR) is the fraction of all scanners (whose spreads are no less than $h$) that are not reported by the system. In the previous subsection, we have shown that, given the bounds of FPR and FNR, ESD needs much less memory than TFA to achieve the bounds. Since CSE and TBA are not designed for meeting a given set of bounds, we compare ESD with them by a different set of experiments that measure and compare the FPR and FNR values under a fixed amount of SRAM.

Given a fixed memory size $m$, we use OptimalS(m) in Section 3.4.3 to determine the value of $s$ in ESD, use OptimalP(m, s) to determine the value of $p$, and then set the threshold $T$ as $(h+l)/2$. We perform experiments using the week-long traffic trace. For $m = 0.05$ MB and $l = 0.5h$, the results are presented in Table 3-3. We also perform the same experiments for CSE and TBA, and their results are presented in the same table. The optimal parameters are chosen for each scheme based on the original papers.

When the available memory is very small, such as 0.05 MB in Table 3-3, CSE has zero FNR but its FPR is 1.0, which means it reports all non-scanners. The reason is that, without probabilistic sampling, CSE stores information of so many contacts that its data structure is fully saturated. In this case, the spread estimation method of CSE breaks down. TBA has a small FPR but its FNR is large. For example, when $h = 500$, its FNR is 26%. Only ESD achieves small values for both FNR and FPR. For example, when $h = 500$, its FNR is 7.4% and its FPR is 5.0%. These values decrease quickly as $h$ increases. When $h = 1{,}000$, they are 1.0% and 0.55%, respectively, while the FNR of TBA remains 26%.

3.6 Summary

Scan detection is one of the most important functions in intrusion detection systems. The recent research trend is to implement such a function in the tight SRAM space to keep up with the rapid advance in network speed. This work proposes an efficient scan detection scheme based on a new method called dynamic bit sharing, which optimally combines probabilistic sampling, bit-sharing storage, and maximum likelihood estimation. We demonstrate theoretically and experimentally that the new scheme is able to achieve a probabilistic performance objective with arbitrarily-set bounds on worst-case false positive/negative ratios. It does so in a very tight memory space where the number of bits available is much smaller than the number of external sources to be monitored.

Figure 3-1. The relative standard deviation, $\mathrm{Std}(V_m)/E(V_m)$, approaches zero as $m$ increases.

Figure 3-2. (A) The curve (without the arrows) shows the value of Potential(m, s, p) with respect to $p$ when $m = 0.45$ MB and $s = 150$. (B) The arrows illustrate the operation of OptimalP(m, s).

Figure 3-3. The value of Potential(m, s, OptimalP(m, s)) with respect to $s$ when $m = 0.25$ MB.

Table 3-1. Memory requirements (in MB) of ESD, TFA and ESD-1 (i.e., ESD with $p = 1$) when $\alpha = 0.9$ and $\beta = 0.1$.

        l = 0.1h            l = 0.3h            l = 0.5h            l = 0.7h
h       ESD   TFA   ESD-1   ESD   TFA   ESD-1   ESD   TFA   ESD-1   ESD   TFA   ESD-1
500     0.09  2.02  0.33    0.19  2.53  0.43    0.30  3.61  0.54    0.97  6.12  1.01
1000    0.07  1.10  0.27    0.09  1.29  0.33    0.15  1.85  0.42    0.47  3.11  0.86
2000    0.03  0.55  0.24    0.05  0.71  0.29    0.08  1.02  0.42    0.25  1.62  0.86
3000    0.02  0.42  0.24    0.03  0.51  0.27    0.06  0.68  0.42    0.17  1.09  0.86
4000    0.01  0.32  0.21    0.03  0.38  0.27    0.03  0.52  0.42    0.13  0.83  0.86
5000    0.01  0.24  0.21    0.02  0.31  0.27    0.03  0.43  0.42    0.11  0.66  0.86

Table 3-2. Memory requirements (in MB) of ESD, TFA and ESD-1 (i.e., ESD with $p = 1$) when $\alpha = 0.95$ and $\beta = 0.05$.

        l = 0.1h            l = 0.3h            l = 0.5h            l = 0.7h
h       ESD   TFA   ESD-1   ESD   TFA   ESD-1   ESD   TFA   ESD-1   ESD   TFA   ESD-1
500     0.12  2.41  0.38    0.22  3.27  0.48    0.48  4.59  0.68    1.56  8.03  1.60
1000    0.08  1.29  0.32    0.12  1.65  0.38    0.24  2.34  0.50    0.76  4.04  1.20
2000    0.03  0.69  0.26    0.08  0.87  0.32    0.13  1.21  0.47    0.38  2.12  1.20
3000    0.02  0.46  0.26    0.06  0.60  0.32    0.09  0.83  0.47    0.26  1.42  1.20
4000    0.02  0.37  0.23    0.04  0.45  0.32    0.06  0.63  0.47    0.20  1.08  1.20
5000    0.01  0.29  0.23    0.04  0.35  0.32    0.05  0.52  0.47    0.16  0.89  1.20

Table 3-3. False negative ratio (FNR) and false positive ratio (FPR) of ESD, CSE and TBA with $m = 0.05$ MB.

        FNR                      FPR
h       ESD      CSE   TBA       ESD      CSE   TBA
500     7.4e-2   0     2.6e-1    5.0e-2   1     9.0e-6
1000    1.0e-2   0     2.6e-1    5.5e-3   1     9.0e-6
2000    4.2e-3   0     2.5e-1    2.0e-3   1     1.1e-5
3000    5.5e-3   0     2.5e-1    2.0e-3   1     1.0e-5
4000    0        0     2.4e-1    2.0e-3   1     7.0e-6
5000    0        0     2.4e-1    2.0e-3   1     7.0e-6

Figure 3-4. Traffic distribution: each point shows the number of sources having a certain spread value.

CHAPTER 4
ORIGIN-DESTINATION FLOW MEASUREMENT IN HIGH-SPEED NETWORKS

This chapter designs an efficient approach for origin-destination flow measurement in high-speed networks, where an origin-destination (OD) flow between two routers is the set of packets that pass both routers. OD flow measurement is widely used in many network management applications. We consider two performance challenges, measurement efficiency and accuracy. The former requires measurement functions to minimize per-packet processing overhead to keep up with today's high-speed networks. The latter requires measurement functions to achieve accurate measurement results with small bias and standard deviation. We design a novel measurement method that employs a compact data structure for packet information storage and uses a new statistical inference approach for OD flow measurement. Both simulations and experiments are performed to demonstrate the effectiveness of our method.

The rest of this chapter is organized as follows: Section 4.1 gives the problem statement and performance metrics. Section 4.2 describes the related work. Section 4.3 presents our new origin-destination flow measurement method. Section 4.4 discusses the simulation results. Section 4.5 presents the experimental results. Section 4.6 gives the summary.

4.1 Problem Statement and Performance Metrics

4.1.1 Problem Statement

Let $S$ be a subset of routers of interest in a network. The problem is to measure the traffic volume between any pair of routers in $S$. We model an origin-destination (OD) flow as the set of packets that traverse between two routers (the undirected case) or traverse from one router to the other (the directional case). Our goal is to measure the size of each OD flow in terms of the number of packets.

Consider the set of access routers on the perimeter of an ISP network. If each access router stores information about ingress packets (that enter the ISP network) and egress packets (that leave the ISP network) in separate data structures, we can figure out the size of a directional OD flow by comparing the information in the ingress data structure of the origin router and the information in the egress data structure of the destination router. On the other hand, if each access router stores information of all arrival packets in the same data structure, we can figure out the size of an undirected OD flow by comparing the information in the data structures of both routers. The measurement method proposed in this work can be applied to both cases, even though our description uses the undirected case for simplicity.

We consider two performance metrics, per-packet processing overhead and measurement accuracy, which are discussed below.

4.1.2 Per-packet Processing Overhead

The maximum packet throughput that an online measurement function can achieve is determined by the per-packet processing overhead of the function. In order to keep up with today's high-speed networks, it is desirable to make the per-packet processing overhead as small as possible, especially when the SRAM and processing circuits are shared by other critical functions for routing, packet scheduling, traffic management and security purposes.

The per-packet processing overhead is mainly determined by the computational complexity and the number of memory accesses for each packet. When a router receives a packet, it needs to perform certain computations to determine the proper location for storing the information of that packet, and at least one memory access for the storage operation. We will show that our OD flow measurement function is able to achieve extremely small per-packet processing overhead.

4.1.3 Measurement Accuracy

Let $n_c$ be the OD flow size of an origin/destination router pair and $\hat{n}_c$ be the corresponding measured result. The event for $n_c$ to fall into the interval $[\hat{n}_c(1-\beta), \hat{n}_c(1+\beta)]$ with probability at least $\alpha$ specifies the measurement accuracy of our function, where $\alpha$ is a pre-determined accuracy parameter, e.g., 95%. A smaller value of $\beta$ means better measurement results.

If the memory and the per-packet processing time were unlimited, we could achieve 100% accurate measurement results. Otherwise, we have to compromise on measurement accuracy when the memory is insufficient or the processing speed requirement is stringent.

4.2 Related Work

The origin-destination (OD) flow measurement methods mainly fall into two categories. One is intermediate-based [74, 82, 88, 105, 119, 121] and the other is end-to-end-based [11, 37, 42]. The intermediate-based methods [119, 121] employ statistical techniques to indirectly estimate the OD flows based on link load, network routing, and configuration data, which are widely available. Zhang et al. [119] assume an underlying gravity model [82, 100] for OD flows and use edge link load data together with additional information on intermediate routers to analyze the model. After that, they introduce the tomographic method [12, 26] to determine the results that best fit the obtained gravity model. The methods in [120, 121] extend point-to-point measurement to point-to-multipoint measurement using a regularization based on entropy penalization. These intermediate-based methods share a common property that prevents them from being widely applied: the estimation relies on traffic volumes, which are usually unknown. As a result, these methods either cannot achieve high measurement accuracy or incur severe computational cost.

Considine et al. [28] use the method of moments for OD packet counts, which extracts a traffic digest from the packet stream. As the study in [11] points out, when the noise-to-signal ratios are high, the performance of [28] degrades.

Cao, Chen and Bu [11] design a quasi-likelihood approach (QMLE) for OD flow measurement based on a continuous variant of the Flajolet-Martin sketches [43]. The approach maintains an array of buckets in each network node, whose initial values are all set to infinity at the beginning of a measurement period. When it receives a packet, the node performs two hash operations. The first one pseudo-randomly chooses a bucket $i$ in the array for packet information storage. The second one generates, based on the packet, an exponential random number $v$ whose expected value is one. After the two hash operations, the node updates bucket $i$ with $v$: if the original value of bucket $i$ is larger than $v$, the node sets the value of bucket $i$ to $v$; otherwise, it skips this packet. At the end of the measurement period, in order to estimate the OD flow size of two routers $r_1$ and $r_2$, QMLE derives the quasi-probability distribution of the packet information and employs maximum likelihood estimation to compute the OD flow size based on the values of the two bucket arrays.

QMLE claims that it is able to achieve small per-packet update overhead and accurate measurement results with a compact memory requirement. However, for each packet, it needs to perform two hash operations and more than one memory access on average (it always needs one memory read and sometimes one memory write), while the optimum would be exactly one hash operation and one memory access per packet. Moreover, its measurement accuracy leaves room for improvement. With the same amount of memory, the method proposed in this study achieves much more accurate measurement results, which will be demonstrated by simulations and experiments in Sections 4.4 and 4.5, respectively.

Also related is the recovery of missing values in traffic measurement by compressive sensing [122], which proposes a spatio-temporal framework to exploit the presence of both global structure and local structure. Rincon et al. [98] provide a multi-resolution analysis to develop a general model for traffic matrices, which is based on the diffusion wavelet transform. They find that the model must be sparse and demonstrate it by experimental results.
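
To make the per-packet cost of QMLE concrete, here is a minimal Python sketch of the bucket update described above. It is our own illustration, not code from [11]: the class name `MinExpSketch`, the use of salted SHA-256 as the two hash functions, and the conversion of a hash value to an exponential value through $-\ln u$ are all assumptions. The sketch shows the one read per packet and the occasional write.

```python
import hashlib
import math

def _hash64(packet_id: bytes, salt: bytes) -> int:
    """64-bit hash of a packet identifier; the salt separates the two hash functions."""
    return int.from_bytes(hashlib.sha256(salt + packet_id).digest()[:8], "big")

class MinExpSketch:
    """Array of buckets, each keeping the smallest exponential value hashed into it."""

    def __init__(self, num_buckets: int) -> None:
        self.buckets = [math.inf] * num_buckets            # all buckets start at infinity

    def locate(self, packet_id: bytes):
        """Return (bucket index, Exp(1) value) for a packet."""
        i = _hash64(packet_id, b"bucket") % len(self.buckets)     # hash 1: bucket index
        u = ((_hash64(packet_id, b"value") >> 12) + 0.5) / 2**52  # uniform in (0, 1)
        return i, -math.log(u)                                    # hash 2: mean-one value

    def update(self, packet_id: bytes) -> bool:
        """Return True when the packet caused a memory write."""
        i, v = self.locate(packet_id)
        if self.buckets[i] > v:                             # always one read ...
            self.buckets[i] = v                             # ... and a write only if v is smaller
            return True
        return False

if __name__ == "__main__":
    sketch = MinExpSketch(64)
    writes = sum(sketch.update(f"pkt-{k}".encode()) for k in range(10_000))
    print(f"{writes} writes for 10000 packets")
```

Repeated packets never change a bucket, since they hash to the same index and the same value.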

PAGE 83

4.3Origin-DestinationFlowMeasurement Werstdescribetwostraightforwardapproachesanddiscusstheirlimitations.Wethenmotivatethebitmapideathatweuseinthisstudy.Finally,wepresentourorigin-destinationowmeasurementmethod(ODFM)indetails. 4.3.1StraightforwardApproachesandTheirLimitations Astraightforwardapproachisforeachroutertostoretheinformationofallpacketsthatpassit.Inthisway,whenwewanttomeasuretheODowsizeoftworouters,weonlyneedtocomparethetwosetsofpacketinformationandcounthowmanypacketsthetwosetshaveincommon,i.e.,thecardinalityoftheintersectionofthetwosets.Clearly,storinginformationofallpacketsisunrealisticsincethenumberofpacketspassingarouterishugeinhigh-speednetworksanditimposesanextremelylargememoryrequirementontherouter. Inordertoreducethememoryrequirement,wecanstorethesignaturesofpacketsinstead.Thesignatureofapacketisahashvalueofthepacketwithaxedlength.Whenthelengthofthesignatureislongenough,e.g.,160bitsifusingSHA-1[ 89 ],thechanceoftwopacketshavingthesamesignaturesisnegligiblysmall.Therefore,wecancountthenumberofidenticalsignaturesthatstoredinthetworouterstoobtaintheODowsize.Thisenhancementcanreducethememoryrequirementtosomeextent.However,itisstillnotmemoryefcient.Supposethereare1Mpacketsthatpassarouterduringameasurementperiod.Whenthelengthofthesignatureis160bitslong,arouterneeds20MB(1M160=8)memorytostoretheinformationofallsignatures,whichisstilltoomuchinpractise.Usingsmallersignaturescannotsolvetheproblem.Forexample,ifwereducethesignaturelengthtojust16bits,thememoryrequirementisstill2MB,farhigherthanthegoalofthisstudy,lessthan1bitperpacket. Anothersolutionisforaroutertomaintainacounter,whoseinitialvalueissettozero,foreachofotherroutersinthenetwork.Inordertonotifythecurrentrouterwhichroutersapackethaspassed,itneedstocarrytheinformationofallpreviousrouters 83

PAGE 84

in its header. When a router receives a packet, it first checks the packet header and obtains the information of which routers this packet has passed. It then increases the corresponding counters by one. Finally, the router adds its own information into the packet before sending it out. At the end of the measurement period, in order to obtain the OD flow size of router r1 and router r2, we first check r1 and find the counter that corresponds to r2, which stores the number of packets that enter r2 and exit from r1. We then check r2 and find the counter that corresponds to r1, which stores the number of packets that enter r1 and exit from r2. The sum of the two counters is the OD flow size of r1 and r2. Although computing the OD flow size is very simple at the end of the measurement period, this approach has two main drawbacks during the packet processing period. First, the router needs to extract the information of all routers that is stored in a packet and update the corresponding counters, which slows down packet processing so that it cannot keep up with the line speed of today's high-speed networks. Second, each packet has to carry the information of all routers it has passed, which requires modifying the packet structure and incurs unnecessary storage overhead in the packet.

4.3.2 ODFM: Motivation and Overview

We design a bitmap-based OD flow measurement method that solves the problems of the above two approaches. Instead of storing the signatures of packets, each router maintains a bit array of fixed length, and initially all bits in the array are set to zero. When the router receives a packet, it pseudo-randomly maps the packet to one bit of the array by a hash operation and sets the bit to one. At the end of the measurement period, we measure the OD flow size of two routers by comparing their bit arrays. A packet always uses the same hash function to choose a bit in the arrays of all routers, and the size of each bit array is fixed within a measurement period. The packet therefore maps to the same location in the bit arrays of all routers it has passed. Therefore, if a packet enters router r1 and exits from r2, or the other way around, its corresponding bit must be set to one in both bit arrays. Based on this observation, we can take a bitwise AND of the two bit arrays and count the number of ones in the combined bit array to measure the OD flow.

Note that this approach may introduce an overestimation problem, which could lead to an inaccurate measurement result. Suppose two packets, p1 and p2, are mapped to the same location j by the hash function, while p1 passes one router and p2 passes the other. In this case, the jth bit in the bit arrays of both routers will be set to one. When we compare the two bit arrays, we will falsely treat p1 and p2 as the same packet and overestimate the OD flow size. However, our scheme has a useful property. Because the bit for each packet is picked at random in the bit array, any pair of packets is equally likely to choose the same bit. When the number of packets and the size of the bit array are large enough, such collisions occur uniformly at random across the bit array, and the overestimation can be removed through statistical analysis. This property enables us to design a compact yet accurate measurement method. Moreover, in our scheme a router only needs to perform one hash operation and one memory access per packet, which is very efficient and feasible for high-speed networks.

4.3.3 ODFM: Storing the Packet Information

ODFM consists of two components: one stores the packet information in routers, and the other measures the OD flow of any two routers. This subsection presents the first component. The second component is described in the next subsection.

At the beginning of the measurement period, each router maintains a bit array B of fixed length m, and initially each bit in B is set to zero. The ith bit in the array is denoted as B[i]. When a router receives a packet p, it pseudo-randomly picks one bit in B by performing a hash operation H(p) and sets that bit to one. Here H(·) is a hash function whose output range is [0..m-1]. More specifically, to store the packet p, ODFM performs the following assignment:

B[H(p)] := 1.

A router does not actually have to hash the entire content of a packet. In the network layer, a packet can be uniquely identified by its IP header, which stores the packet label information, i.e., the source IP address, the destination IP address, and so on. Two packets that are fragments of the same original, larger packet share the same source/destination IP addresses and identification number, but their fragmentation offset values differ. Therefore, a router only needs to perform the hash operation H(p) on the IP header of a packet, which further reduces the hash computation cost and improves the processing speed. This enhancement can also be applied to the two straightforward approaches in Section 4.3.1.

It is worth noting that a router only needs to perform one hash operation and set one bit in its bit array per packet. This is simple and efficient, and it is easy to implement in high-speed routers.

4.3.4 ODFM: Measuring the Size of Each OD Flow

At the end of the measurement period, all routers report their bit arrays to a centralized server, e.g., the network management center, which performs the offline measurement. ODFM employs maximum likelihood estimation (MLE) [15] to measure the OD flow of any two routers based on their bit arrays. Let S1 and S2 be the sets of packets that pass the two routers r1 and r2. Let n1 and n2 be the cardinalities of S1 and S2, respectively, i.e., n1 = |S1| and n2 = |S2|. Let nc be the number of common packets that r1 and r2 share, i.e., the OD flow size of the two routers, which is the value we want to measure in this study. Figure 4-1 illustrates the relationship of n1, n2, and nc. Obviously, nc = |S1 ∩ S2|. Let B1 and B2 be the bit arrays of r1 and r2. Let U1 and U2 be the numbers of '0's in B1 and B2, respectively, and let V1 and V2 be the fractions of bits in B1 and B2 whose values are zero. Clearly, V1 = U1/m and V2 = U2/m.
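The per-packet recording step can be sketched in Python as follows. This is a minimal illustration and not the author's implementation. The class name `OdfmRouter`, the helper `invariant_header`, and the use of BLAKE2b (from the standard `hashlib` module) as the hash function H are all assumptions. A `bytearray` holds one bit per byte for readability; a router would pack the bits into on-die SRAM words.

```python
import hashlib


def invariant_header(src: str, dst: str, ident: int, frag_offset: int) -> bytes:
    # Hypothetical packet label built from IP header fields that do not change
    # hop by hop (unlike TTL or the checksum), so every router hashes the same
    # bytes; the fragment offset keeps fragments of one datagram apart.
    return f"{src}|{dst}|{ident}|{frag_offset}".encode()


class OdfmRouter:
    """Per-router ODFM state: an m-bit array B, all zeros at period start."""

    def __init__(self, m: int):
        self.m = m
        self.bits = bytearray(m)  # B[0..m-1], one bit stored per byte

    @staticmethod
    def h(label: bytes, m: int) -> int:
        # Assumed H(.): a 64-bit BLAKE2b digest reduced modulo m. All routers
        # must use the same function so a packet hits the same index everywhere.
        digest = hashlib.blake2b(label, digest_size=8).digest()
        return int.from_bytes(digest, "big") % m

    def record(self, label: bytes) -> None:
        # B[H(p)] := 1, i.e., one hash operation and one memory write per packet.
        self.bits[self.h(label, self.m)] = 1
```

When `record` is called on two routers with the same label, it sets the same index `OdfmRouter.h(label, m)` in both arrays. This shared index is what makes the later bitwise AND of B1 and B2 meaningful.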


The measurement consists of two steps. In the first step, we obtain n1 and n2, the cardinalities of S1 and S2. In the second step, we take a bitwise AND of B1 and B2 to generate a new bit array, denoted as Bc, and use it to compute the OD flow size nc. Let Uc be the number of '0's in Bc and let Vc be the fraction of bits in Bc whose values are zero. Clearly, Vc = Uc/m. We compute nc from Bc and the results of the first step, i.e., the values of n1 and n2.

4.3.4.1 Measure n1 and n2

The number of packets that a router receives during a measurement period can be easily obtained with a counter whose initial value is zero. Whenever a new packet arrives, the router simply increases the counter by one. In this way, we obtain the exact values of n1 and n2, which we will use to measure nc in the following subsection.

4.3.4.2 Measure nc

After n1 and n2 are obtained, we take a bitwise AND of B1 and B2, denoted as Bc, to measure nc. More specifically,

\[ B_c[i] = B_1[i] \,\&\, B_2[i], \quad \forall i \in [0..m-1]. \]

An arbitrary bit b in Bc is '0' if and only if the following two conditions are both satisfied. First, b is not chosen by any packet in S1 ∩ S2. If b is chosen by a packet p ∈ S1 ∩ S2, the corresponding bits in both B1 and B2 are set to '1', so b is '1'. Second, b is either not chosen by any packet in S1 - S2 or not chosen by any packet in S2 - S1. If b is chosen by both a packet p1 ∈ S1 - S2 and a packet p2 ∈ S2 - S1, the corresponding bits in both B1 and B2 are also set to '1', so b is '1'.

For the first condition, a packet in S1 ∩ S2 sets b to '1' with probability 1/m, so the probability that this packet does not set b is 1 - 1/m. As Figure 4-1 shows, nc = |S1 ∩ S2|. Therefore, the probability that b is not set to '1' by any packet in S1 ∩ S2 is (1 - 1/m)^nc. Similarly, the probability that b is not chosen by any packet in S1 - S2 is (1 - 1/m)^(n1-nc), and the probability that it is not chosen by any packet in S2 - S1 is (1 - 1/m)^(n2-nc). As a result, the probability q(nc) that b remains '0' in Bc is

\[ q(n_c) = \Big(1-\frac{1}{m}\Big)^{n_c}\Big\{1-\Big(1-\big(1-\tfrac{1}{m}\big)^{n_1-n_c}\Big)\Big(1-\big(1-\tfrac{1}{m}\big)^{n_2-n_c}\Big)\Big\} = \Big(1-\frac{1}{m}\Big)^{n_1}+\Big(1-\frac{1}{m}\Big)^{n_2}-\Big(1-\frac{1}{m}\Big)^{n_1+n_2-n_c}. \]

Each bit in Bc is '0' with probability q(nc), and the observed number of '0' bits in Bc is Uc. Therefore, the likelihood function for this observation is

\[ L = q(n_c)^{U_c}\,\big(1-q(n_c)\big)^{m-U_c}. \]

Following the standard process of maximum likelihood estimation, we look for the value of nc that maximizes the above likelihood function. Namely, we want to find

\[ \hat{n}_c = \arg\max_{n_c}\{L\}. \]

To find n̂c, we take the logarithm of both sides of the likelihood function:

\[ \ln L = U_c\ln q(n_c) + (m-U_c)\ln\big(1-q(n_c)\big). \]

We then differentiate the above equation:

\[ \frac{d\ln L}{dn_c} = \Big(\frac{U_c}{q(n_c)}-\frac{m-U_c}{1-q(n_c)}\Big)q'(n_c) = \Big(\frac{U_c}{q(n_c)}-\frac{m-U_c}{1-q(n_c)}\Big)\ln\Big(1-\frac{1}{m}\Big)\Big(1-\frac{1}{m}\Big)^{n_1+n_2-n_c}, \]

since, according to the expression for q(nc) above,

\[ q'(n_c) = \frac{dq(n_c)}{dn_c} = \ln\Big(1-\frac{1}{m}\Big)\Big(1-\frac{1}{m}\Big)^{n_1+n_2-n_c}. \]
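The sketch below makes the likelihood concrete. It codes q(nc) and ln L directly from the formulas above and finds n̂c by exhaustive search over the integers 0 ≤ nc ≤ min(n1, n2). The function names are hypothetical, and the brute-force search serves only as a numerical cross-check; the closed form derived in the next step makes it unnecessary.

```python
import math


def q_zero(nc, n1, n2, m):
    """q(nc) = a^n1 + a^n2 - a^(n1 + n2 - nc) with a = 1 - 1/m."""
    ln_a = math.log1p(-1.0 / m)
    return (math.exp(n1 * ln_a) + math.exp(n2 * ln_a)
            - math.exp((n1 + n2 - nc) * ln_a))


def log_likelihood(nc, uc, n1, n2, m):
    """ln L = Uc ln q(nc) + (m - Uc) ln(1 - q(nc))."""
    q = q_zero(nc, n1, n2, m)
    return uc * math.log(q) + (m - uc) * math.log1p(-q)


def mle_by_search(uc, n1, n2, m):
    """Brute-force argmax of ln L over the integers 0 <= nc <= min(n1, n2)."""
    return max(range(min(n1, n2) + 1),
               key=lambda nc: log_likelihood(nc, uc, n1, n2, m))
```

Because q(nc) is strictly decreasing in nc and ln L is maximized at q(nc) = Uc/m, the search returns the integer nc whose q(nc) best matches the observed fraction of zeros in Bc.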


In order to compute n̂c, we set the right side of the derivative to zero, i.e.,

\[ \Big(\frac{U_c}{q(n_c)}-\frac{m-U_c}{1-q(n_c)}\Big)\ln\Big(1-\frac{1}{m}\Big)\Big(1-\frac{1}{m}\Big)^{n_1+n_2-n_c} = 0. \]

Since neither ln(1 - 1/m) nor (1 - 1/m)^(n1+n2-nc) can be 0 when m > 1, we have

\[ \frac{U_c}{q(n_c)}-\frac{m-U_c}{1-q(n_c)} = 0, \quad\text{i.e.,}\quad q(n_c) = \frac{U_c}{m}. \]

Substituting the expression for q(nc) into this equation, we have

\[ \Big(1-\frac{1}{m}\Big)^{n_1}+\Big(1-\frac{1}{m}\Big)^{n_2}-\Big(1-\frac{1}{m}\Big)^{n_1+n_2-n_c} = \frac{U_c}{m} = V_c. \]

In the above equation, m, n1, and n2 are all known values, and Vc can be computed once the packet information has been recorded. As a result, we can measure nc with the following formula:

\[ \hat{n}_c = n_1+n_2-\frac{\ln\big((1-\frac{1}{m})^{n_1}+(1-\frac{1}{m})^{n_2}-V_c\big)}{\ln(1-\frac{1}{m})}. \]

4.3.5 Measurement Accuracy

The previous subsection gives the MLE formula for nc. In this subsection, we analyze the measurement accuracy of our method. According to the standard theory of MLE [86], when m, n1, and n2 are large enough, the measured OD flow size n̂c approximately follows a normal distribution:

\[ \hat{n}_c \sim \mathrm{Norm}\Big(n_c, \frac{1}{I(\hat{n}_c)}\Big), \]

where I(n̂c) is the Fisher information¹ of L, which is defined as follows:

\[ I(\hat{n}_c) = -E\Big[\frac{d^2\ln L}{dn_c^2}\Big]. \]

From the first derivative of ln L above, we compute the second-order derivative of ln L:

\[ \frac{d^2\ln L}{dn_c^2} = \ln\Big(1-\frac{1}{m}\Big)\Big(-\frac{U_c\,q'(n_c)}{q^2(n_c)}-\frac{(m-U_c)\,q'(n_c)}{(1-q(n_c))^2}\Big)C-\ln^2\Big(1-\frac{1}{m}\Big)\Big(\frac{U_c}{q(n_c)}-\frac{m-U_c}{1-q(n_c)}\Big)C, \]

where C = (1 - 1/m)^(n1+n2-nc) and q'(nc) is given above.

We use the probabilistic counting method [54] to compute the expected value of Uc. Let Xi be the event that the ith bit in Bc remains '0' at the end of the measurement period, and let 1_Xi be the corresponding indicator random variable. The size of Bc is m, and an arbitrary bit remains '0' with probability q(nc). Since Uc is the number of '0's in Bc, Uc = Σ_{i=0}^{m-1} 1_Xi. Hence,

\[ E(U_c) = \sum_{i=0}^{m-1}E(1_{X_i}) = \sum_{i=0}^{m-1}q(n_c) = m\,q(n_c). \]

Therefore,

\[ I(\hat{n}_c) = -E\Big[\frac{d^2\ln L}{dn_c^2}\Big] = \ln\Big(1-\frac{1}{m}\Big)\Big(\frac{m\,q'(n_c)}{q(n_c)}+\frac{m\,q'(n_c)}{1-q(n_c)}\Big)C, \]

as the expected value of Uc/q(nc) - (m-Uc)/(1-q(nc)) is 0.

¹ The Fisher information [67] is a way of measuring the amount of information that an observable random variable x carries about an unknown parameter θ upon which the likelihood function of θ, L(θ) = f(x; θ), depends.
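The closed-form estimator can be exercised end to end with the sketch below. It assumes an ideal hash, so each packet's bit index is drawn uniformly with Python's `random` module instead of being computed from a header. The names `simulate_pair` and `odfm_estimate` are hypothetical, and the parameter values used in the example are arbitrary; they are not taken from the evaluation in this chapter.

```python
import math
import random


def simulate_pair(m, n1, n2, nc, seed=0):
    """Bit arrays of two routers that share nc packets (ideal uniform hash)."""
    rng = random.Random(seed)
    b1, b2 = bytearray(m), bytearray(m)
    for _ in range(nc):          # common packets set the same bit in both arrays
        j = rng.randrange(m)
        b1[j] = b2[j] = 1
    for _ in range(n1 - nc):     # packets that pass r1 only
        b1[rng.randrange(m)] = 1
    for _ in range(n2 - nc):     # packets that pass r2 only
        b2[rng.randrange(m)] = 1
    return b1, b2


def odfm_estimate(b1, b2, n1, n2):
    """n_hat_c = n1 + n2 - ln(a^n1 + a^n2 - Vc) / ln(a), with a = 1 - 1/m."""
    m = len(b1)
    uc = sum(1 for x, y in zip(b1, b2) if not (x and y))  # zeros in B1 & B2
    vc = uc / m
    ln_a = math.log1p(-1.0 / m)
    inner = math.exp(n1 * ln_a) + math.exp(n2 * ln_a) - vc
    if inner <= 0.0:
        raise ValueError("observed Vc is inconsistent with n1 and n2")
    return n1 + n2 - math.log(inner) / ln_a
```

Consider m = 65,536 bits, n1 = 30,000, n2 = 20,000, and nc = 5,000. For these values, `odfm_estimate(*simulate_pair(65536, 30000, 20000, 5000, seed=1), 30000, 20000)` should land within a few hundred packets of 5,000. Evaluated at these values, the Fisher information above gives a standard deviation of roughly 170.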


According to the Fisher information above, the variance of n̂c is

\[ \mathrm{Var}(\hat{n}_c) = \frac{1}{I(\hat{n}_c)} = \frac{1}{\ln(1-\frac{1}{m})\big(\frac{m\,q'(n_c)}{q(n_c)}+\frac{m\,q'(n_c)}{1-q(n_c)}\big)C}, \]

and the confidence interval of our measurement is

\[ \hat{n}_c \pm Z_\alpha\sqrt{\frac{1}{\ln(1-\frac{1}{m})\big(\frac{m\,q'(n_c)}{q(n_c)}+\frac{m\,q'(n_c)}{1-q(n_c)}\big)C}}, \]

where α is the confidence level parameter and Zα is the percentile for the standard Gaussian distribution [9]. For example, when α = 99%, Zα = 2.58.

4.4 Simulations

In this section, we first evaluate the performance of ODFM by simulations; experimental results based on real traffic traces are presented in the next section. In both simulations and experiments, we compare ODFM with the most closely related work, QMLE [11]. For a fair comparison, we assign the same amount of memory to ODFM and QMLE. We compare them in terms of online processing overhead and offline measurement accuracy.

Simulations are performed under the system parameters n1, n2, and nc. For an origin-destination router pair, n1 is the number of packets that one router receives during the measurement period, and n2 is the number of packets that the other router receives. Parameter nc is the actual OD flow size. The amount of memory used is set to 1 MB.

In the first set of simulations, we let n1 = 6,000,000 and n2 = 6,000,000, 300,000, or 100,000. We vary nc from 100 to 50,000. We use ODFM and QMLE to measure the flow size and compare each estimate with nc to see how accurate the measurement is.
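The variance and confidence interval can be evaluated numerically as in the sketch below; the function name is hypothetical. As in the text, the Fisher information is evaluated at the estimate n̂c in place of the unknown nc. Substituting q'(nc) = ln(1 - 1/m)·C shows that the expression equals m·q'(nc)²/(q(nc)(1 - q(nc))), which provides a convenient sanity check. Zα is taken from the standard library's `statistics.NormalDist`.

```python
import math
from statistics import NormalDist


def odfm_confidence_interval(nc_hat, n1, n2, m, alpha=0.99):
    """Returns (lower, upper, std) for the ODFM estimate at confidence alpha."""
    ln_a = math.log1p(-1.0 / m)                        # ln(1 - 1/m)
    c = math.exp((n1 + n2 - nc_hat) * ln_a)            # C
    q = math.exp(n1 * ln_a) + math.exp(n2 * ln_a) - c  # q(nc), evaluated at nc_hat
    dq = ln_a * c                                      # q'(nc)
    # I = ln(1 - 1/m) * (m q'/q + m q'/(1 - q)) * C
    info = ln_a * (m * dq / q + m * dq / (1.0 - q)) * c
    std = math.sqrt(1.0 / info)
    z = NormalDist().inv_cdf(0.5 + alpha / 2.0)        # Z_alpha, about 2.58 for 99%
    return nc_hat - z * std, nc_hat + z * std, std
```

The interval width scales as 1/√m for fixed load factors n1/m and n2/m. This is why the memory budget allocated to each router directly controls the achievable accuracy.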


In the second set of simulations, we model a more realistic scenario in which n1, n2, and nc are randomly chosen. In each simulation run, the values of n1 and n2 are randomly selected from the range [100,000, 10,000,000], and the value of nc is randomly selected from [100, 50,000].

4.4.1 Processing Overhead

The per-packet processing overhead of a measurement method is mainly determined by the number of memory accesses and the number of hash operations for each packet. Table 4-1 shows the averaged results when n1 = 6,000,000, n2 = 6,000,000, and nc varies from 100 to 50,000. ODFM requires only 1 hash operation and 1 memory access (a memory write) for each packet, which is optimal. QMLE requires more per-packet processing overhead, incurring 1.50 memory accesses and 2 hash operations on average. Furthermore, the per-packet processing overhead of ODFM is constant. QMLE's per-packet overhead is variable, which is undesirable in practice. Table 4-2 and Table 4-3 present similar results with n2 = 300,000 and 100,000, respectively. Table 4-4 shows the results when the values of n1 and n2 are randomly chosen in the range [100,000, 10,000,000] and the value of nc is randomly chosen in the range [100, 50,000].

4.4.2 Measurement Accuracy

Figures 4-2 to 4-4 present the measurement results of ODFM and QMLE. Each figure consists of four plots. Each point in the first plot (ODFM) or the second plot (QMLE) represents an OD flow. The x-axis is the actual flow size nc, and the y-axis is the estimated value n̂c. We also show the equality line, y = x, for reference. The closer a point is to the equality line, the better the estimate. The third plot shows the measured bias for the first two plots, E(n̂c - nc). The fourth plot shows the corresponding standard deviation, √Var(n̂c)/nc. To present the estimation results of the two methods clearly, we divide the horizontal axis into 25 measurement bins of width 2,000 and numerically measure the bias and standard deviation in each bin. The three figures present the following results.

As shown in the first plot of Figure 4-2, when n1 and n2 are equal, ODFM has a small bias in its measurement. This is understandable, because maximum likelihood estimation is well known to produce a small bias under certain parameter settings. The second plot shows that QMLE performs better and produces almost perfect results. However, this is only part of the story. When the values of n1 and n2 are different, as in Figure 4-3 where n1 = 6,000,000 and n2 = 300,000, ODFM performs nearly perfectly, while QMLE produces a large bias. As the difference between n1 and n2 widens, the bias of QMLE becomes larger, whereas the performance of ODFM actually improves. Figure 4-4, where n1 = 6,000,000 and n2 = 100,000, shows this case. The question is then which case is closer to reality: do n1 and n2 tend to have close values or diverse values? It is the latter, as we will show in the next section.

Figure 4-5 compares the performance of ODFM and QMLE when n1 and n2 are randomly picked in the range [100,000, 10,000,000]. ODFM outperforms QMLE by a wide margin. The reason is that randomly selected values of n1 and n2 tend to be very different rather than close to each other.

4.5 Experiments

In this section, we further evaluate the performance of ODFM and QMLE by experiments. The experimental data set comes from the Abilene network (Internet2) [3] and was collected and shared by Yin Zhang [124]. The network consists of 12 routers located in different cities in the US [1]. The data set contains 24 weeks of Abilene traffic matrices from March 1st to September 10th, 2004. The resolution of the data set is 5 minutes, which means there are 24 × 7 × 24 × 12 = 48,384 5-min traffic matrices. In each 5-min traffic matrix, the traffic flows of the routers range from 0.5 gigabytes to 20 gigabytes. We set the duration of a measurement period to 5 minutes and assume that the packet size is 1,500 bytes. Under these settings, the routers receive about 0.3M to 13M packets in one measurement period.

We allocate 1 MB of memory to each router and implement the two measurement methods on the 24 weeks of traffic matrices. The experimental results are similar across those weeks, so in this section we only present the results for the first week.

4.5.1 Number of Packets for an Origin-Destination Pair

Before measuring the size of each OD flow, we study the numbers of packets that the origin router and the destination router receive, denoted as n1 and n2, respectively. We randomly pick 100 OD pairs in the traffic matrices and present the values of n1 and n2 in Figure 4-6. The x-axis is the index of the OD pair, and each index corresponds to an origin-destination pair (n1, n2). The figure shows that in most cases the values of n1 and n2 are very different from each other. For example, for the tenth OD pair, n1 = 1,111,022 and n2 = 17,795,961, so the ratio between n1 and n2 is about 0.06. As the previous section shows, QMLE does not work well in this situation. We will demonstrate this shortly.

4.5.2 Processing Overhead

Table 4-5 shows the averaged per-packet processing overhead in terms of the number of memory accesses and the number of hash operations for each packet. As in the previous section, ODFM requires only 1 hash operation and 1 memory access (a memory write) for each packet. QMLE requires more per-packet processing overhead than ODFM, incurring 1.17 memory accesses and 2 hash operations per packet on average. Furthermore, ODFM requires a constant per-packet processing overhead, whereas QMLE's overhead in memory accesses is unpredictable.
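The figures quoted for the Abilene data set follow from simple arithmetic, reproduced below as a quick check. The only assumption beyond the text is that a gigabyte means 10⁹ bytes. The 1,500-byte packet size is the one stated above.

```python
# 5-minute traffic matrices in 24 weeks: weeks * days * hours * (12 per hour)
matrices = 24 * 7 * 24 * 12

PACKET_BYTES = 1_500   # packet size assumed in Section 4.5
GB = 10**9             # assumption: decimal gigabytes

low_pkts = 0.5 * GB / PACKET_BYTES    # packets per period for a 0.5 GB flow
high_pkts = 20 * GB / PACKET_BYTES    # packets per period for a 20 GB flow
ratio = 1_111_022 / 17_795_961        # n1 / n2 for the tenth OD pair

print(matrices, round(low_pkts / 1e6, 2), round(high_pkts / 1e6, 2), round(ratio, 3))
# prints: 48384 0.33 13.33 0.062
```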


4.5.3 Measurement Accuracy

Like Figures 4-2 to 4-5, Figure 4-7 has four plots. The first plot presents the estimation results of ODFM, and the second plot presents the estimation results of QMLE. The third plot shows the corresponding estimation bias, and the last plot shows the standard deviation. ODFM works far better than QMLE, which matches the simulation results in Figure 4-5. The reason is that in the traffic matrices, the origin router and the destination router are likely to receive different numbers of packets. The performance of QMLE degrades in that situation, while ODFM does not have this problem.

4.6 Summary

This chapter proposes a new method for OD flow measurement. The method uses a bitmap data structure to store packet information and a statistical inference approach to compute the measurement results. Compared with the best existing approach, our method requires less per-packet processing overhead and achieves much more accurate results. We conduct both simulations and experiments to demonstrate the superior performance of our method.


Table 4-1. Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 6,000,000

        Memory accesses   Hash operations   Constant?
ODFM    1                 1                 Yes
QMLE    1.50              2                 No

Table 4-2. Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 300,000

        Memory accesses   Hash operations   Constant?
ODFM    1                 1                 Yes
QMLE    1.56              2                 No

Table 4-3. Number of memory accesses and number of hash operations per packet with n1 = 6,000,000 and n2 = 100,000

        Memory accesses   Hash operations   Constant?
ODFM    1                 1                 Yes
QMLE    1.54              2                 No

Table 4-4. Number of memory accesses and number of hash operations per packet with the values of n1 and n2 randomly assigned between 100,000 and 10,000,000

        Memory accesses   Hash operations   Constant?
ODFM    1                 1                 Yes
QMLE    1.22              2                 No

Table 4-5. Number of memory accesses and number of hash operations per packet

        Memory accesses   Hash operations   Constant?
ODFM    1                 1                 Yes
QMLE    1.17              2                 No


Figure 4-1. The relation between two routers r1 and r2

Figure 4-2. Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 6,000,000.

Figure 4-3. Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 300,000.

Figure 4-4. Estimation results by ODFM and QMLE when n1 = 6,000,000 and n2 = 100,000.


Figure 4-5. Estimation results by ODFM and QMLE when the values of n1 and n2 are randomly assigned between 100,000 and 10,000,000.

Figure 4-6. The number of packets for 100 OD pairs.

Figure 4-7. Estimation results by ODFM and QMLE when n1 = 1,000,000 and n2 = 1,000,000.


CHAPTER 5
SIZE ESTIMATION PROBLEM IN RFID SYSTEMS

RFID (radio-frequency identification) tags are becoming ubiquitous in warehouse management, object tracking, and inventory control. Researchers have been actively studying RFID systems as an emerging pervasive computing platform [73, 78, 79, 87, 116, 118], which helps create a multi-billion-dollar market [31]. This chapter focuses on periodically and automatically estimating the number of RFID tags in a large deployment area. Large RFID systems are likely to use active tags because of their longer transmission distance. However, these battery-powered tags need to be recharged when they run out of energy, and recharging tens of thousands of tags is a laborious operation. Moreover, tagged products are sometimes stacked up, which makes the tags hard to access. To prolong the lifetime of tags and reduce the frequency of battery recharges, all functions that involve large-scale transmission by many tags should be energy-efficient. To the best of our knowledge, this work is the first to design energy-efficient protocols for the estimation problem in large-scale RFID systems that use active tags.

The rest of this chapter is organized as follows. Section 5.1 discusses the related work. Section 5.2 defines the problem to be solved and the system model. Sections 5.3 and 5.4 propose two energy-efficient algorithms for the RFID estimation problem. Section 5.5 evaluates the algorithms through simulations. Section 5.6 gives the summary.

5.1 Related Work

Most existing work focuses on how to read tag IDs efficiently. A collision occurs when multiple tags transmit their IDs in the same time slot. Collision arbitration protocols mainly fall into two categories: framed ALOHA-based protocols [16, 61, 94, 111, 117] and tree-based protocols [25, 53, 72, 83, 85, 90]. In the former category, each polling request carries a frame length, and every tag independently chooses a slot in the frame to transmit its ID. The process repeats until all tags have successfully transmitted their IDs to the RFID reader. In the latter category, a reader first sends out an ID prefix string, and the tags whose IDs match the string respond. If a collision happens, the reader appends a '0' or '1' to the prefix string and sends out the new string. This process repeats until only one tag responds. Essentially, the approach traverses a binary tree whose leaf nodes are the tag IDs.

Instead of identifying individual RFID tags, Floerkemeier [44, 45] studies the problem of estimating the cardinality of a tag set based on the number of empty slots. The proposed scheme employs Bayesian probability estimation to achieve fast estimation. The scheme is similar to hash-based estimators [38, 113], and the difference is discussed in [64]. In Kodialam and Nandagopal's approach [63], an RFID reader collects information from tags in a series of time frames. Each frame consists of a number of slots, and the tags respond probabilistically in those slots. Using probabilistic counting methods, the reader estimates the number of tags from the number of empty slots or the number of collision slots in each frame. Their best estimator is called the Unified Probabilistic Estimator (UPE). A follow-up work by the same authors proposes the Enhanced Zero-Based Estimator (EZB) [64], which bases its estimate on the number of empty slots. The above estimators aim to reduce the time a reader takes to complete the estimation process. They are not designed to conserve the energy of active tags, so they do not try to reduce the number of transmissions made by the tags.

The Lottery-Frame scheme (LoF) [93] by Qian et al. uses a geometric distribution to determine the slot of a time frame in which each tag responds. It significantly reduces the estimation time compared with UPE. However, every tag must respond in each time frame, which results in a large energy cost when active tags use their own power to transmit. The First Non-Empty slots Based algorithm (FNEB) [49] uses the slot number of the first reply from tags in a frame to count RFID tags in both static and dynamic environments.

Also related is a novel security protocol proposed by Tan et al. to monitor the event of missing tags in the presence of dishonest RFID readers [108]. To prevent a dishonest reader from replaying previously collected information, they maintain a timer in the server and periodically update the system clock. Li et al. [68] design a series of efficient protocols that employ novel techniques to identify missing tags in large-scale RFID systems.

None of the above estimators is designed with energy conservation in mind. In the following, we present our energy-efficient estimators.

5.2 Problem Definition and System Model

5.2.1 RFID Estimation Problem

The problem is to design efficient algorithms that estimate the number of RFID tags in a deployment area without actually reading the ID of each tag. Let N be the actual number of tags and N̂ be the estimate. The estimation accuracy is specified by a confidence interval with two parameters: a probability value α and an error bound β, both in the range (0, 1). The requirement is that the probability for N/N̂ to fall in the interval [1 - β, 1 + β] should be at least α, i.e.,

\[ \mathrm{Prob}\{(1-\beta)\hat{N} \le N \le (1+\beta)\hat{N}\} \ge \alpha. \]

Our goal is to reduce the energy overhead that the tags incur during an estimation process that achieves the above accuracy. Prior work on the RFID estimation problem focuses on time efficiency, i.e., the amount of time an RFID reader spends estimating the number of tags in the system. Our work focuses on energy efficiency, i.e., the amount of energy the tags spend during the estimation process.
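The (α, β) accuracy requirement can be checked empirically over repeated estimation runs, as in the hypothetical helpers below. They are a sketch for evaluating any estimator against the definition above and are not part of the protocol itself.

```python
from typing import Sequence


def within_interval(n_true: float, n_hat: float, beta: float) -> bool:
    """True if (1 - beta) * N_hat <= N <= (1 + beta) * N_hat."""
    return (1.0 - beta) * n_hat <= n_true <= (1.0 + beta) * n_hat


def meets_requirement(n_true: float, estimates: Sequence[float],
                      alpha: float, beta: float) -> bool:
    # Empirical estimate of Prob{(1 - beta) N_hat <= N <= (1 + beta) N_hat},
    # compared against the required probability alpha.
    hits = sum(within_interval(n_true, e, beta) for e in estimates)
    return hits / len(estimates) >= alpha
```

Note that the interval is expressed relative to the estimate N̂, not to N, exactly as in the requirement. An estimate of 9,600 for N = 10,000 therefore passes with β = 5% because 0.95 · 9,600 ≤ 10,000 ≤ 1.05 · 9,600.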


5.2.2 Active Tags

The type of active RFID system considered in this work is applicable to a large deployment area that is hundreds of feet or more across. Passive tags are beyond the scope of this work. If they were used, one would have to carry the RFID reader around the whole area and collect tag information every few feet. Active tags allow a reader to collect information from one location.

Tagged goods (such as apparel) may be stacked in piles, and there may be obstacles, such as racks filled with merchandise, between a tag and the reader. We expect active tags to transmit with enough power to ensure reliable information delivery in such a demanding environment. Hence, the energy cost of the tags' transmissions is the main concern in our algorithm design. This cost increases at least quadratically with the maximum distance that the RFID system must cover. The energy that powers a tag's circuit for computing and receiving information is not affected by long distances and obstacles. Our new estimators are designed for RFID systems where the power consumption of tags is dominated by transmission events, due to the long distances that the systems need to cover. The energy consumed by the RFID reader is less of a concern, and we assume the reader transmits at sufficiently high power.

5.2.3 Communication Protocol

We use the following communication protocol between a reader and tags. The reader first synchronizes the clocks of the tags and then performs a sequence of pollings. Clock synchronization only needs to happen at the beginning of the protocol execution. RFID systems operate in low-rate wireless channels, and our new estimators take only a few seconds to complete. Clock drift should therefore not be a major issue within such a short period of time.

In each polling, the reader sends out a request, which is followed by a slotted time frame during which the tags respond. The polling request from the reader carries a contention probability p (0 < p ≤ 1) and a frame size f. Each tag decides to participate in the current polling with probability p. If it decides to participate, it picks a slot uniformly at random from the frame and transmits a bit string (called a response) in that slot. The format of the response depends on the application. If the tag decides not to participate, it keeps silent. In our solutions, p will be set on the order of 1/N.

If we know a lower bound Nmin of N, the contention probability can be implemented efficiently to conserve energy. Such a bound is often available. For example, a company's inventory of certain goods may be in the thousands and never drop below a certain number, or the company may have a policy on the minimum inventory. It may also be that the RFID estimation becomes unnecessary when the number of tags is below a threshold. Nmin can be much smaller than N. With such a value of Nmin, we can implement a contention probability p without requiring all tags to participate in the contention process. Because only a small number of tags actually participate in contention, the energy cost is reduced. The implementation is as follows. At the beginning of a polling, each tag makes a probabilistic decision. With probability 1 - 1/Nmin, it goes into standby mode for the current polling and wakes up when the next polling starts. With probability 1/Nmin, it stays awake to receive the polling request and then responds with probability min{pNmin, 1}. For example, if N = 10,000 and Nmin = 1,000, only 10 tags stay awake in each polling on average. In Section 5.3.5, another energy-reduction method, called request-less pollings, will be proposed to eliminate most polling requests.

In the above communication protocol, the reader's request may include an optional prefix, and only tags that satisfy the prefix participate in the polling. For example, suppose all tags deployed in one section of a warehouse carry 96-bit GEN2 IDs that begin with a particular three-bit string in the Serial Number field. To estimate the number of tags in this section, the request carries a predicate testing whether the first three bits of a tag's Serial Number match that string.
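The two-stage participation rule can be simulated with the sketch below. It illustrates the rule under stated assumptions: `one_polling` is a hypothetical name, each tag is simulated individually with Python's `random` module, and the radio is abstracted away. The overall response probability per tag is (1/Nmin)·min{pNmin, 1}. This equals p whenever p ≤ 1/Nmin, which holds when p is on the order of 1/N and N ≥ Nmin.

```python
import random


def one_polling(n_tags, p, n_min, rng):
    """Simulates one polling; returns (#awake tags, #responding tags)."""
    respond_prob = min(p * n_min, 1.0)
    awake = responders = 0
    for _ in range(n_tags):
        if rng.random() < 1.0 / n_min:
            # Stays awake (probability 1/Nmin) to receive the request...
            awake += 1
            if rng.random() < respond_prob:
                # ...and responds with probability min{p * Nmin, 1}.
                responders += 1
        # Otherwise the tag is in standby until the next polling starts.
    return awake, responders
```

With N = 10,000, Nmin = 1,000, and p = 1.594/N, about 10 tags should be awake and about 1.6 should respond per polling on average. The other tags skip receiving the request altogether.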


5.2.4 Empty/Singleton/Collision Slots

A slot is said to be empty if no tag responds (transmits) in it. It is called a singleton slot if exactly one tag responds, and a collision slot if more than one tag responds. A singleton or collision slot is also called a non-empty slot. The Philips I-Code system [101] requires a slot length of 10 bits in order to distinguish singleton slots from collision slots. In contrast, one bit is enough to distinguish empty slots from non-empty slots: '0' means empty and '1' means non-empty. Hence, the response will be much shorter (and consume much less energy) if an algorithm only needs to know whether slots are empty or non-empty, instead of all three slot types as required by [63].

There are two ways to reduce the energy consumption of tags and thus prolong their lifetime: reducing the size of each response and reducing the number of responses. We design algorithms that require only the knowledge of empty/non-empty slots and employ statistical methods to minimize the amount of transmission needed from the tags.

5.3 Generalized Maximum Likelihood Estimation Algorithm

Our first estimator for the number of RFID tags is called the generalized maximum likelihood estimation (GMLE) algorithm. It fully utilizes the information from all pollings in order to minimize the number of pollings needed to meet the accuracy requirement.

5.3.1 Overview

GMLE uses the polling protocol described in Section 5.2.3, with the frame size f fixed to one slot. The RFID reader adjusts the contention probability for each polling; let pi be the contention probability of the ith polling. GMLE only records whether the sole slot in each polling is empty or non-empty. Based on this information, it refines the estimate N̂ until the accuracy requirement is met. Let zi be the slot state of the ith polling. When at least one tag responds, the slot is non-empty and zi = 1. When no tag responds, the slot is empty and zi = 0. The sequence of zi, i ≥ 1, forms the response vector.
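The sketch below contrasts the three-way slot classification with the one-bit state that GMLE records. The names are hypothetical. `one_slot_polling` draws each tag's decision with Python's `random` module as a stand-in for real tag behavior.

```python
import random
from enum import Enum


class Slot(Enum):
    EMPTY = 0
    SINGLETON = 1
    COLLISION = 2


def classify(responders: int) -> Slot:
    # Three-way classification (per the text, I-Code needs 10-bit slots for it).
    if responders == 0:
        return Slot.EMPTY
    return Slot.SINGLETON if responders == 1 else Slot.COLLISION


def z_bit(responders: int) -> int:
    # The one-bit state used by GMLE: 0 = empty, 1 = non-empty.
    return 0 if responders == 0 else 1


def one_slot_polling(n_tags: int, p: float, rng: random.Random) -> int:
    # Frame size f = 1: every tag transmits in the sole slot with probability p.
    responders = sum(rng.random() < p for _ in range(n_tags))
    return z_bit(responders)
```

Repeated calls of `one_slot_polling` with the reader's chosen pi values produce a response vector z1, z2, ... of the kind described above.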


At the ith polling, each tag transmits with probability pi, and zi is one if any tag transmits. Hence,

\[ \mathrm{Prob}\{z_i=1\} = 1-(1-p_i)^N \approx 1-e^{-Np_i}, \]

where N is the actual number of tags.

If the contention probabilities of the pollings are too small, the response vector will contain mostly zeros. If they are too large, the response vector will contain mostly ones. Neither case provides sufficient statistical information for accurate estimation. As discussed shortly, our analysis shows that the optimal contention probability for minimizing the number of pollings is pi = 1.594/N. The problem is that we do not know N, which is the quantity we want to estimate.

To determine pi, GMLE consists of an initialization phase and an iterative phase. The former quickly produces a coarse estimate of N. The latter refines the contention probability and generates the estimation result.

5.3.2 Initialization Phase

We want to pick a small value for the initial contention probability p1 of the first polling. The expected number of responding tags is Np1. If p1 is too large, many tags will respond. This is wasteful, because one response and many responses produce the same information: a non-empty slot. Suppose we know an upper bound Nmax of N; this information is often available in practice. For example, we know that Nmax is 10,000 if the warehouse is designed to hold no more than 10,000 microwaves (each carrying an RFID tag). The same holds if the company's inventory policy requires that in-store microwaves not exceed 10,000, or if the warehouse only has 10,000 RFID tags in use. Nmax can be much bigger than N. We pick p1 = 1/Nmax so that the expected number of responding tags is no more than one. If z1 = 0, we multiply the contention probability by a constant C (> 1), i.e., p2 = p1·C for the second polling. We keep multiplying the contention probability by C after each polling until a non-empty slot is observed. When that happens (say, at the lth polling), we take 1/pl as a coarse estimate of N and move to the next phase. When C is relatively large, the initialization phase takes only a few pollings because the contention probability increases exponentially.

5.3.3 Iterative Phase

This phase refines the estimation result after each polling and terminates when the specified accuracy requirement is met. Let N̂i be the estimated number of tags after the ith polling. To compute N̂i, the reader performs three tasks at the ith polling. First, before sending out the polling request, it sets the contention probability as follows:

\[ p_i = \frac{\omega}{\hat{N}_{i-1}}, \]

where N̂i-1 is the estimate after the previous polling and ω is a system parameter, which is analyzed extensively in the next subsection. Second, based on the received zi and the history information, the reader finds the new estimate of N that maximizes the following likelihood function:

\[ L_i = \prod_{j=1}^{i}(1-p_j)^{N(1-z_j)}\big(1-(1-p_j)^N\big)^{z_j}, \]

where (1 - pj)^(N(1-zj)) (1 - (1 - pj)^N)^zj is the probability that the observed state zj of the jth polling occurs. Namely, we want to find

\[ \hat{N}_i = \arg\max_N\{L_i\}. \]

Third, after computing N̂i, the reader determines whether the confidence interval of the new estimate meets the requirement. The following subsections show how these tasks are carried out.

5.3.3.1 Compute the value of N̂i

We compute the new estimate of N that maximizes Li. Since the maximizer is not affected by monotone transformations, we take the logarithm to turn the right side of the equation from a product into a sum:

\[ \ln(L_i) = \sum_{j=1}^{i}\Big(N(1-z_j)\ln(1-p_j) + z_j\ln\big(1-(1-p_j)^N\big)\Big). \]

To find the maximum, we differentiate both sides:

\[ \frac{\partial\ln(L_i)}{\partial N} = \sum_{j=1}^{i}\Big((1-z_j)\ln(1-p_j) - \frac{z_j(1-p_j)^N\ln(1-p_j)}{1-(1-p_j)^N}\Big). \]

We then set the right side to zero and solve the equation for the new estimate N̂i. Since the derivative is a monotone (decreasing) function of N, we can obtain N̂i numerically through bisection search.

5.3.3.2 Termination Condition

Using the δ-method [15], we show in Appendix A that, when i is large, N̂i approximately follows the Gaussian distribution

\[ \mathrm{Norm}\Big(N, \frac{1-(1-p_i)^N}{i\,(1-p_i)^N\ln^2(1-p_i)}\Big). \]

The variance of N̂i is

\[ \mathrm{Var}(\hat{N}_i) \approx \frac{1-(1-p_i)^N}{i\,(1-p_i)^N\ln^2(1-p_i)}. \]

When N is large and pi is small, we can approximate (1 - pi)^N as e^(-Npi) and ln(1 - pi) as -pi. The above variance becomes

\[ \mathrm{Var}(\hat{N}_i) \approx \frac{e^{Np_i}-1}{i\,p_i^2}. \]

Hence, the confidence interval of N is

\[ \hat{N}_i \pm Z_\alpha\sqrt{\frac{e^{\hat{N}_ip_i}-1}{i\,p_i^2}}, \]

where Zα is the percentile for the standard Gaussian distribution. For example, when α = 95%, Zα = 1.96. Because N is unknown, we use N̂i as an approximation of N when computing the standard deviation in the confidence interval.
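Putting the initialization phase, the likelihood maximization, and the confidence-interval test together gives the simulation sketch below. It is not the author's implementation, and the names `score`, `mle`, and `gmle` are hypothetical. Several assumptions should be noted:

- The channel is simulated by drawing each slot state directly from Prob{zi = 1} = 1 - (1 - pi)^N instead of simulating each tag.
- C = 2 is an arbitrary choice.
- Pollings from the initialization phase are kept in the likelihood history, and i counts all pollings.
- Contention probabilities are capped at 0.99 to keep ln(1 - p) finite.

The bisection relies on the score being decreasing in N. It needs at least one empty and one non-empty slot in the history to have a finite root.

```python
import math
import random
from statistics import NormalDist


def score(history, n):
    """d ln(L_i)/dN; history holds (ln(1 - p_j), z_j) for every polling j."""
    total = 0.0
    for lq, z in history:
        if z == 0:
            total += lq
        else:
            x = math.exp(n * lq)                       # (1 - p_j)^N
            total -= x * lq / -math.expm1(n * lq)      # divided by 1 - (1 - p_j)^N
    return total


def mle(history, n_prev):
    """Bisection search for the root of the decreasing score function."""
    zs = [z for _, z in history]
    if all(zs) or not any(zs):
        return n_prev                 # no finite maximizer yet; keep the old estimate
    lo, hi = 1e-6, max(2.0, 2.0 * n_prev)
    while score(history, hi) > 0:     # grow the bracket until the score turns negative
        hi *= 2.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        if score(history, mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gmle(n_true, n_max, alpha, beta, omega=1.594, c=2.0, seed=None,
         max_pollings=100_000):
    """Simulated GMLE run; returns (final estimate, number of pollings)."""
    rng = random.Random(seed)
    z_alpha = NormalDist().inv_cdf(0.5 + alpha / 2.0)
    history = []

    def poll(p):
        lq = math.log1p(-p)
        z = int(rng.random() < -math.expm1(n_true * lq))   # 1 - (1 - p)^N
        history.append((lq, z))
        return z

    p = 1.0 / n_max                   # initialization phase: p_1 = 1 / Nmax
    while poll(p) == 0:
        p = min(0.99, p * c)
    n_hat = 1.0 / p                   # coarse estimate 1 / p_l

    while len(history) < max_pollings:            # iterative phase
        p = min(0.99, omega / n_hat)              # p_i = omega / N_hat_{i-1}
        poll(p)
        n_hat = mle(history, n_hat)
        i = len(history)
        half = z_alpha * math.sqrt(math.expm1(n_hat * p) / (i * p * p))
        if half <= beta * n_hat:                  # termination condition
            break
    return n_hat, len(history)
```

For example, `gmle(5000, 20000, alpha=0.95, beta=0.2, seed=7)` stops after roughly 150 pollings with a relaxed error bound of 20%. With the 5% bound used in the text, the history grows to a few thousand pollings, and this pure-Python bisection becomes slow.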

The termination condition for GMLE is therefore

$$Z_\alpha \sqrt{\frac{e^{\hat{N}_i p_i}-1}{i p_i^2}} \le \beta \hat{N}_i,$$

where $\beta$ is the error bound. The above inequality can be rewritten as

$$\sqrt{i} \ge \frac{Z_\alpha \sqrt{e^{\hat{N}_i p_i}-1}}{\beta \hat{N}_i p_i}.$$

When $i$ is large, the estimate changes little from one polling to the next. Hence, $p_i = \omega/\hat{N}_{i-1} \approx \omega/\hat{N}_i$, and we have

$$i \ge \frac{Z_\alpha^2 (e^{\omega}-1)}{\omega^2 \beta^2}.$$

Hence, once $\omega$ is determined, we can theoretically compute the approximate number of pollings required to meet the accuracy requirement. For example, if $\alpha = 95\%$, $\beta = 5\%$, and $\omega = 1.594$ (the optimal value, derived shortly), 2372 pollings will be required. Note that this bound is independent of the actual number of tags, N. Hence, the estimation time does not grow with the tag population, which gives our approach perfect scalability.

Figure 5-1 shows the simulation result of GMLE when N = 10,000, $\alpha = 95\%$, $\beta = 5\%$, and $\omega = 1.594$. The simulation setup can be found in Section 5.5. The middle curve is the estimated number of tags, $\hat{N}_i$, with respect to the number of pollings. It converges to the true value N, represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases.

5.3.4 Determine the value of $\omega$

We demonstrate the impact of the value of $\omega$ on two performance metrics: the number of pollings and the number of tag responses (i.e., the number of tag transmissions). The former measures the estimation time, since each polling takes an equal amount of time for the request/response exchange. The latter measures the energy cost, because each response corresponds to one tag making one transmission in a slot.
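
The termination test and the polling-count bound can be checked numerically with the minimal Python sketch below. The helper names are hypothetical, and $Z_\alpha$ is taken as the two-sided standard Gaussian quantile, which reproduces the document's example $Z_\alpha = 1.96$ for $\alpha = 95\%$.

```python
import math
from statistics import NormalDist


def z_value(alpha):
    """Two-sided standard Gaussian quantile Z_alpha (about 1.96 for alpha = 0.95)."""
    return NormalDist().inv_cdf(0.5 + alpha / 2.0)


def gmle_half_width(n_hat, p, i, alpha):
    """Z_alpha * sqrt((e^{N_hat p} - 1) / (i p^2)): half-width of the confidence interval."""
    return z_value(alpha) * math.sqrt(math.expm1(n_hat * p) / (i * p * p))


def gmle_should_stop(n_hat, p, i, alpha, beta):
    """GMLE termination condition: half-width <= beta * N_hat."""
    return gmle_half_width(n_hat, p, i, alpha) <= beta * n_hat


def gmle_pollings(omega, alpha, beta):
    """Approximate pollings needed: Z_alpha^2 (e^omega - 1) / (omega^2 beta^2)."""
    z = z_value(alpha)
    return z * z * math.expm1(omega) / (omega * omega * beta * beta)


print(int(gmle_pollings(1.594, 0.95, 0.05)))   # 2372, as stated in the text
```

Scaling N up while keeping $N p_i = \omega$ leaves the outcome of `gmle_should_stop` unchanged for a given $i$, which is the scalability property noted above.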

5.3.4.1 Number of Pollings

According to the bound derived in Section 5.3.3.2, the number of pollings needed to meet the accuracy requirement is $Z_\alpha^2(e^{\omega}-1)/(\omega^2\beta^2)$. To find its minimum value, we differentiate it with respect to $\omega$ and set the result to zero, which gives $\omega e^{\omega} = 2(e^{\omega}-1)$. Solving this equation numerically, we have $\omega = 1.594$. Hence, the optimal value of $p_i$ that minimizes the number of pollings is

$$p_i = \frac{1.594}{\hat{N}_{i-1}}.$$

5.3.4.2 Number of Responses

We count the total number of responses during the estimation process. After a small number of pollings, the estimate closely approximates N (see Figure 5-1). Hence, the expected number of responses for each polling is $Np_i \approx \hat{N}_{i-1}p_i = \omega$. After $Z_\alpha^2(e^{\omega}-1)/(\omega^2\beta^2)$ pollings are made, the total number of responses is roughly

$$\frac{Z_\alpha^2(e^{\omega}-1)}{\omega^2\beta^2}\cdot\omega = \frac{Z_\alpha^2(e^{\omega}-1)}{\omega\beta^2}.$$

Our simulation results in Section 5.5 demonstrate that this approximation is reasonably accurate. The count is an increasing function of $\omega$, which means that a larger value of $\omega$ leads to a larger number of responses. The intuition is as follows: a larger $\omega$ means a larger contention probability and thus more collisions. Two or more responses in a collision slot produce the same amount of information as one response in a singleton slot (see further explanation in Section 5.3.6). In other words, when there are more collisions, more responses are needed to generate the information necessary for meeting the accuracy requirement.
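
The stationarity condition $\omega e^{\omega} = 2(e^{\omega}-1)$ is equivalent to $e^{\omega}(2-\omega) = 2$. The Python sketch below solves it by bisection and evaluates the response-count formula of Section 5.3.4.2; the helper names and the bracketing interval [0.5, 2] are illustrative choices.

```python
import math


def optimal_omega(lo=0.5, hi=2.0, iters=60):
    """Solve e^w (2 - w) = 2 by bisection.

    This is the stationarity condition of (e^w - 1) / w^2 for w > 0;
    g(w) is positive below the root and negative above it on [lo, hi].
    """
    def g(w):
        return math.exp(w) * (2.0 - w) - 2.0

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gmle_responses(omega, z, beta):
    """Approximate total responses: Z^2 (e^omega - 1) / (omega beta^2)."""
    return z * z * math.expm1(omega) / (omega * beta * beta)


w_star = optimal_omega()
print(round(w_star, 3))                         # 1.594
# Energy cost of GMLE(1.594) relative to GMLE(0.5); Z and beta cancel out.
ratio = gmle_responses(1.594, 1.96, 0.05) / gmle_responses(0.5, 1.96, 0.05)
print(f"{ratio:.2f}")                           # 1.90
```

The ratio matches the theoretical value of 1.90 that Section 5.5.1 reads from Figure 5-2.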

5.3.4.3 Summary

In Figure 5-2, we plot the number of pollings and the number of responses with respect to the value of $\omega$. The number of pollings is minimized at $\omega = 1.594$. When $\omega$ is smaller than 1.594, its value controls the performance tradeoff between the two metrics: as we decrease $\omega$, the energy cost (i.e., the number of responses) drops at the expense of the estimation time (i.e., the number of pollings). Our further simulations in Section 5.5 show that even at $\omega = 1.594$, the energy cost of GMLE is far below those of the existing protocols.

5.3.5 Request-less Pollings

We observe that, after a number of pollings, the value of $p_i$ stays within a very small range and does not change much. It then becomes unnecessary for the RFID reader to transmit it at each polling. Hence, we improve GMLE as follows. If the percentage change in $p_i$ during a certain number $M_1$ of consecutive pollings is below a small threshold, the reader broadcasts a polling request carrying the latest value of $p_i$, a flag indicating that it will no longer transmit polling requests for a certain number $M_2$ of slots, and the value of $M_2$. Without receiving further polling requests, the tags respond with the same contention probability in the subsequent $M_2$ slots. These are called request-less pollings. After $M_2$ slots, the reader recalculates the contention probability and broadcasts another polling request carrying the new probability value, a flag, and $M_2$. This process repeats until the termination condition of GMLE is met. With the threshold being 10%, $M_1 = 10$, and $M_2 = 50$, our simulation results show that the performance difference caused by request-less pollings is negligibly small, even though the contention probability during request-less pollings may be slightly off the value set by $p_i = \omega/\hat{N}_{i-1}$. Request-less pollings can also be applied to the algorithm in the next section.

5.3.6 Information Loss due to Collision

GMLE has a frame size of one slot. It obtains only binary information at each polling. No matter how many tags respond, the information that the reader receives is
always the same, i.e., $z_i = 1$, which implies information loss when two or more tags decide to transmit at a polling. Let us compare two scenarios. In one scenario, only one tag responds at a polling. In the other, two tags respond. These two scenarios generate the same information, but the energy cost of the second scenario is twice that of the first. To address this issue, we design another algorithm that reduces the probability of collision and, moreover, compensates for the impact of collision in its computation.

5.4 Enhanced Generalized Maximum Likelihood Estimation Algorithm

The enhanced generalized maximum likelihood estimation (EGMLE) algorithm is our second estimator for the number of RFID tags. It also utilizes history information from previous pollings and uses the maximum likelihood method to estimate the number of tags. However, instead of obtaining only binary information, it computes the number of responses in each polling. Because more information can be extracted, it achieves much better energy efficiency than GMLE.

5.4.1 Overview

EGMLE uses the same polling protocol as GMLE, except that its frame size f is larger than one in order to reduce the probability of collision. The result of the $i$th polling, $x_i$, is no longer a binary value. Instead, it is an estimate of the number of tags that respond during the polling.

EGMLE takes two steps to solve the collision problem. First, it increases the frame size f such that the tags that decide to respond at a polling are likely to respond in different slots of the frame. We pick values for $p_i$ and f such that the collision probability is very small. Second, we compensate for the remaining impact of collision in our computation.

EGMLE also consists of an initialization phase and an iterative phase. The initialization phase of EGMLE is the same as the initialization phase of GMLE, except that when the RFID reader obtains the first non-zero result $x_l$ at the $l$th polling with a
contention probability $p_l$, it computes a coarse estimation of N as $x_l/p_l$. Then it moves to the next phase below.

5.4.2 Iterative Phase

This phase iteratively refines the estimation after each polling, and terminates when the specified accuracy requirement is met. The reader performs four tasks during the $i$th polling. First, it computes the contention probability before sending out the polling request:

$$p_i = \frac{\omega}{\hat{N}_{i-1}},$$

where $\hat{N}_{i-1}$ is the estimate after the previous polling and $\omega$ is one by default. As we will show in Section 5.4.3, a performance tradeoff can be made by choosing other values for $\omega$.

Second, the reader computes the number of responses $x_i$ in the current frame.

Third, based on the received $x_i$ and the history information, the reader computes the new estimate of N that maximizes the following likelihood function:

$$L_i = \prod_{j=l+1}^{i} \frac{1}{\sqrt{2\pi N p_j (1-p_j)}} \exp\left[-\frac{\left((1+\epsilon)\bar{x}_j - N p_j\right)^2}{2 N p_j (1-p_j)}\right],$$

where $\epsilon$ is introduced to compensate for collision and the iterative phase begins from the $(l+1)$th polling. The above formula and the value of $\epsilon$ will be derived shortly. The new estimate is

$$\hat{N}_i = \arg\max_N \{L_i\}.$$

Fourth, after computing $\hat{N}_i$, the reader determines whether the estimate meets the accuracy requirement. In the following, we give the details of the above tasks.

5.4.2.1 Compute the number of responses

At the $i$th polling, the reader measures the number of non-empty slots in the frame, denoted as $\bar{x}_i$, which is an integer in the range $[0, f]$. Due to possible collision, the
actual number of responses, denoted as $x_i$, can be greater. Let $x_i = (1+\epsilon)\bar{x}_i$. The value of $\epsilon$ is determined below.

Since each tag independently decides to respond with probability $p_i$, $x_i$ follows a binomial distribution, $\mathrm{Bino}(N, p_i)$, i.e.,

$$\mathrm{Prob}\{x_i = k\} = \binom{N}{k} p_i^k (1-p_i)^{N-k}.$$

Suppose $\omega$ takes the default value, 1. When $i$ is large, $\hat{N}_{i-1}$ approximates N and thus $p_i \approx 1/N$. If N is sufficiently large, $\mathrm{Prob}\{x_i = 2\} \approx 0.1839$, $\mathrm{Prob}\{x_i = 3\} \approx 0.0613$, $\mathrm{Prob}\{x_i = 4\} \approx 0.0153$, and the probability keeps decreasing rapidly with respect to k; $\mathrm{Prob}\{x_i > 4\}$ is only about 0.0037.

Next, we compute the probability for a collision to happen at the $i$th polling, which is denoted as $\mathrm{Prob}_i\{\mathrm{collision}\}$:

$$\mathrm{Prob}_i\{\mathrm{collision}\} = \sum_{k=2}^{N} \mathrm{Prob}_i\{\mathrm{collision} \mid x_i = k\}\,\mathrm{Prob}\{x_i = k\} = \sum_{k=2}^{f} \left(1 - \frac{P(f,k)}{f^k}\right) \mathrm{Prob}\{x_i = k\} + \sum_{k=f+1}^{N} 1 \cdot \mathrm{Prob}\{x_i = k\},$$

where $P(f,k) = f!/(f-k)!$ is the permutation function. Figure 5-3 shows the collision probability $\mathrm{Prob}_i\{\mathrm{collision}\}$ with respect to f. It diminishes quickly as f increases. When f = 10 (the value we use in the simulations), $\mathrm{Prob}_i\{\mathrm{collision}\}$ is just 0.046. With such a small probability, the chance that more than two tags are involved in a collision, or that more than one collision occurs at a polling, is exceedingly small and is thus ignored. Therefore, to approximate $x_i$, we multiply $\bar{x}_i$ by 1.046 to compensate for the impact of collision. Namely, $\epsilon = 0.046$.

5.4.2.2 Compute the value of $\hat{N}_i$

Recall that the iterative phase starts at the $(l+1)$th polling. After the $i$th polling, the reader has collected the values of $\bar{x}_j$, $l < j \le i$

that $x_j = (1+\epsilon)\bar{x}_j$ and that it follows a binomial distribution $\mathrm{Bino}(N, p_j)$. When N is large enough, the binomial distribution can be closely approximated by a Gaussian distribution $\mathrm{Norm}(\mu_j, \sigma_j^2)$ with parameters $\mu_j = Np_j$ and $\sigma_j = \sqrt{Np_j(1-p_j)}$. Namely,

$$x_j \approx (1+\epsilon)\bar{x}_j \sim \mathrm{Norm}\left(Np_j,\; Np_j(1-p_j)\right).$$

Hence, the probability density of the measured number of responses, $(1+\epsilon)\bar{x}_j$, under this distribution is

$$\frac{1}{\sqrt{2\pi N p_j (1-p_j)}} \exp\left[-\frac{\left((1+\epsilon)\bar{x}_j - Np_j\right)^2}{2Np_j(1-p_j)}\right].$$

The likelihood function for all measured numbers of responses in the pollings, $(1+\epsilon)\bar{x}_j$, $l < j \le i$

Since the Fisher information is $I(N) = E[-\partial^2 \ln(L_i)/\partial N^2]$, we have

$$I(\hat{N}_i) = E\left[\sum_{j=l+1}^{i} \frac{(1+\epsilon)^2\bar{x}_j^2}{N^3 p_j(1-p_j)} - \frac{i-l}{2N^2}\right] = \sum_{j=l+1}^{i} \frac{(Np_j)^2 + Np_j(1-p_j)}{N^3 p_j (1-p_j)} - \frac{i-l}{2N^2} = \sum_{j=l+1}^{i} \frac{p_j}{N(1-p_j)} + \frac{i-l}{2N^2}.$$

Above, we have applied $E\left((1+\epsilon)^2\bar{x}_j^2\right) = (Np_j)^2 + Np_j(1-p_j)$ in the second step, because $(1+\epsilon)\bar{x}_j \sim \mathrm{Norm}(Np_j, Np_j(1-p_j))$ and $E(x^2) = (E(x))^2 + \mathrm{Var}(x)$.

Following the classical theory for MLE, when $i$ is sufficiently large, the distribution of $\hat{N}_i$ is approximated by

$$\mathrm{Norm}\left(N, \frac{1}{I(\hat{N}_i)}\right).$$

Hence, the confidence interval is

$$\hat{N}_i \pm Z_\alpha \sqrt{\frac{1}{I(\hat{N}_i)}}.$$

Note that, since N is unknown, we use $\hat{N}_i$ as an approximation for N in the computation when necessary. The termination condition for EGMLE to achieve the required accuracy is

$$Z_\alpha \sqrt{\frac{1}{I(\hat{N}_i)}} \le \beta \hat{N}_i.$$

Figure 5-4 shows the simulation result of EGMLE when N = 10,000, $\alpha = 95\%$, $\beta = 5\%$, and $\omega = 1$. The middle curve is the value of $\hat{N}_i$, which converges to the value of N represented by the central straight line. The upper and lower curves represent the 95% confidence interval, which shrinks as the number of pollings increases. The algorithm terminates after 1081 pollings.

5.4.3 Performance Tradeoff

Since $p_i = \omega/\hat{N}_{i-1}$, the contention probability is proportional to $\omega$. We study how the value of $\omega$ controls the tradeoff between the estimation time and the energy
cost, which are measured by the number of pollings and the number of responses, respectively.

5.4.3.1 Number of Pollings

Since the MLE approach provides a statistically consistent estimate, when $i$ is large, the Fisher information can be approximated as follows:

$$I(\hat{N}_i) = \sum_{j=l+1}^{i} \frac{p_j}{N(1-p_j)} + \frac{i-l}{2N^2} \approx \left(\frac{p_i}{N(1-p_i)} + \frac{1}{2N^2}\right)(i-l) \approx \frac{2Np_i+1}{2N^2}(i-l),$$

where $p_i \ll 1$. According to the termination condition of EGMLE, we have

$$I(\hat{N}_i) \ge \left(\frac{Z_\alpha}{\beta \hat{N}_i}\right)^2.$$

The above two expressions, together with $Np_i \approx \omega$, give us the following inequality:

$$\frac{2Np_i+1}{2N^2}(i-l) \ge \left(\frac{Z_\alpha}{\beta \hat{N}_i}\right)^2 \;\Rightarrow\; i \ge \frac{2Z_\alpha^2}{(2\omega+1)\beta^2},$$

where $\hat{N}_i \approx N$ and $l \ll i$. Hence, the number of pollings it takes to achieve the accuracy requirement is $2Z_\alpha^2/((2\omega+1)\beta^2)$.

The solid line in Figure 5-5 shows the number of pollings with respect to $\omega$ when $\alpha = 95\%$ and $\beta = 5\%$. It is a decreasing function of $\omega$. The reason is that a larger $\omega$ results in more responses (and thus more information) in each polling. Consequently, fewer pollings are needed to achieve a given accuracy requirement.

5.4.3.2 Number of Responses

When $i$ is large, the expected number of responses for each polling is $Np_i \approx \hat{N}_{i-1}p_i = \omega$. After $2Z_\alpha^2/((2\omega+1)\beta^2)$ pollings are made, the total number of responses is roughly
$$\frac{2Z_\alpha^2}{(2\omega+1)\beta^2}\cdot\omega = \frac{2\omega Z_\alpha^2}{(2\omega+1)\beta^2}.$$

The dotted line in Figure 5-5 shows the number of responses with respect to $\omega$ when $\alpha = 95\%$ and $\beta = 5\%$. It is an increasing function of $\omega$, which means that a larger value of $\omega$ leads to a larger number of responses.

5.4.3.3 Summary

Figure 5-5 demonstrates the performance tradeoff under different values of $\omega$. As we decrease $\omega$, EGMLE achieves better energy efficiency by requiring fewer responses, at the expense of time efficiency by requiring a larger number of pollings.

5.5 Simulations

We evaluate the performance of GMLE and EGMLE by simulations. In order to demonstrate the performance tradeoff between energy cost and estimation time, we choose two different contention probability parameters for each of the two algorithms. We use $\omega = 0.5$ and 1.594 for GMLE, i.e., $p_i = 0.5/\hat{N}_{i-1}$ and $1.594/\hat{N}_{i-1}$. Note that 1.594 is the optimal value of $\omega$ for time efficiency in GMLE. We denote the corresponding variants of the algorithm as GMLE(0.5) and GMLE(1.594).

For EGMLE, Figure 5-5 shows that the number of pollings and the number of responses are both monotonic functions of $\omega$, which means that there is no optimal $\omega$ for either energy efficiency or time efficiency. We choose $\omega = 0.5$ and 1.0 for EGMLE, i.e., $p_i = 0.5/\hat{N}_{i-1}$ and $1.0/\hat{N}_{i-1}$. The corresponding variants of the algorithm are denoted as EGMLE(0.5) and EGMLE(1.0). Section 5.4.2 shows how to compute the compensation parameter $\epsilon$ for EGMLE(1.0), which is 0.046. Following the same steps, we obtain $\epsilon = 0.012$ for EGMLE(0.5). We compare the proposed algorithms with the state-of-the-art algorithms in the related work: the Unified Probabilistic Estimator (UPE) [63] and the Enhanced Zero-Based (EZB) estimator [64]. The original UPE, denoted as UPE-O, is very energy-inefficient because its contention probability
begins from 100%, and thus all tags respond. We modify it (denoted as UPE-M) to begin from a small initial contention probability $1/N_{max}$ and keep the remaining part of UPE-O. This section shows the performance of both UPE-O and UPE-M. We run each simulation 100 times and average the outcomes.

In the initialization phase of our algorithms, we let $N_{max} = 1{,}000{,}000$ and C = 2. The frame size in EGMLE(0.5) and EGMLE(1.0) is 10 slots. The parameters for UPE and EZB are chosen based on the original papers whenever possible. All algorithms except UPE need only to identify empty and non-empty slots. To set a non-empty slot apart from an empty slot, a tag only needs to respond with a short bit string (one bit) to make the channel busy. UPE has to identify empty, singleton, and collision slots. To set a singleton slot apart from a collision slot, many more bits (10 are used by UPE) are necessary [2]. For example, a CRC may be used to detect collision.

The energy cost of an algorithm depends on (1) the number of responses that all tags transmit before the algorithm terminates and (2) the size of each response. We use 'S' to mean that the response is a short bit string (in the empty/non-empty case), and 'L' to mean a long bit string (in the empty/singleton/collision case).

We do not include the simulation results for LoF [93] because its energy cost is much higher than the others. Its number of responses transmitted by the tags is kN, where k is the number of frames used in the estimation process.

5.5.1 Number of Responses

The first simulation studies the number of responses in each algorithm with respect to N, $\alpha$, and $\beta$. Table 5-1 shows the number of responses with respect to N when $\alpha = 90\%$ and $\beta = 9\%$. The proposed algorithms require fewer responses than UPE and EZB. As predicted, UPE-O is energy-inefficient; UPE-M works much better. The best algorithm is EGMLE(0.5), whose number of responses is about one fifth of what UPE-M requires and one ninetieth of what EZB requires when N is 20,000. Moreover, each response in UPE is much longer.
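
The closed-form response counts of Sections 5.3.4.2 and 5.4.3.2 can be evaluated for the Table 5-1 setting with the short Python sketch below. It is only an approximation under stated assumptions: it ignores the responses of the initialization phase and the collision compensation $\epsilon$, and it takes $Z_\alpha$ as the two-sided Gaussian quantile. The function names are hypothetical.

```python
import math
from statistics import NormalDist


def z_value(alpha):
    """Two-sided standard Gaussian quantile Z_alpha."""
    return NormalDist().inv_cdf(0.5 + alpha / 2.0)


def gmle_responses(omega, alpha, beta):
    """GMLE: about Z^2 (e^omega - 1) / (omega beta^2) responses in total."""
    z = z_value(alpha)
    return z * z * math.expm1(omega) / (omega * beta ** 2)


def egmle_responses(omega, alpha, beta):
    """EGMLE: about 2 omega Z^2 / ((2 omega + 1) beta^2) responses in total."""
    z = z_value(alpha)
    return 2.0 * omega * z * z / ((2.0 * omega + 1.0) * beta ** 2)


alpha, beta = 0.90, 0.09                        # the setting of Table 5-1
predicted = {
    "GMLE(0.5)": gmle_responses(0.5, alpha, beta),
    "GMLE(1.594)": gmle_responses(1.594, alpha, beta),
    "EGMLE(0.5)": egmle_responses(0.5, alpha, beta),
    "EGMLE(1.0)": egmle_responses(1.0, alpha, beta),
}
for name, value in predicted.items():
    print(f"{name:12s} {value:6.0f}")
```

The predictions (roughly 433, 822, 167, and 223 responses) fall in the same range as the simulated counts reported in Table 5-1.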

GMLE(0.5) has a smaller energy cost than GMLE(1.594). For example, when N = 10,000, the ratio between the number of responses by GMLE(1.594) and that by GMLE(0.5) is 2.01, which is close to the theoretically computed ratio of 1.90 in Figure 5-2. Similarly, EGMLE(0.5) is more energy efficient than EGMLE(1.0). When N = 10,000, the ratio between the number of responses by EGMLE(1.0) and that by EGMLE(0.5) is 1.28, which is also close to the theoretical value of 1.34 in Figure 5-5.

We vary $\alpha$ from 90% to 95% and to 99%, and vary $\beta$ from 9% to 6% and to 3%. Tables 5-2 to 5-9 show similar comparisons under different values of $\alpha$ and $\beta$. In all cases, the number of responses increases when $\alpha$ increases or $\beta$ decreases, and, except for EZB, the number does not vary much with respect to N, meaning that all algorithms except EZB achieve good scalability. The ratio between the numbers for different algorithms appears to be quite stable under different parameter settings.

5.5.2 Total Number of Bits Transmitted

The second simulation evaluates the energy cost of the algorithms. As mentioned before, one bit is enough to separate empty and non-empty slots. Hence, the response of GMLE, EGMLE, and EZB is one bit long. A response in UPE-M is 10 bits long [63]. We compare the total number of bits transmitted by all tags before each algorithm terminates. We omit the results for UPE-O, which are much worse than the results of UPE-M. Figure 5-6 shows the simulation results with respect to N when $\alpha = 90\%$ and $\beta = 9\%$, 6%, and 3%. For example, when $\alpha = 90\%$, $\beta = 3\%$, and N = 20,000, the ratio between the number of bits transmitted by UPE-M (EZB) and that by our best estimator EGMLE(0.5) is 45.32 (71.28). Figure 5-7 and Figure 5-8 show the comparison under different $\beta$ values when $\alpha = 95\%$ and 99%, respectively. Their results are similar to Figure 5-6. It should be noted that the number of bits transmitted is not an accurate measurement of the energy cost, because it ignores the energy spent to power up the radio and synchronize with the reader. However, combining the number of bits and the
number of transmissions (in the previous subsection) still gives a good idea of how energy-efficient each algorithm is.

5.5.3 Estimation Time

The third simulation compares the time it takes for each algorithm to complete the estimation of N. Based on the specification of the Philips I-Code system [101], after the required waiting times (e.g., gaps between transmissions) are included, it can be calculated that an RFID reader needs 0.4 ms to detect an empty slot, 0.8 ms to detect a collision or a singleton slot, and 1 ms to broadcast a polling request. Hence, GMLE, EGMLE, and EZB require a slot length of 0.4 ms, while UPE-M requires a slot length of 0.8 ms. Recall that the contention probability takes the form $\omega/\hat{N}_{i-1}$, where $\omega$ is a known constant. Thus, the reader transmits $\hat{N}_{i-1}$ instead of the actual probability value in the polling requests. If we assume $N_{max}$ is no more than a million, then 20 bits for $\hat{N}_{i-1}$ are sufficient. GMLE has a fixed frame size of one slot. EGMLE has a fixed frame size of 10 slots. EZB and UPE-M also have predetermined frame sizes. Let $\alpha = 90\%$ and $\beta = 9\%$, 6%, and 3%. The three plots in Figure 5-9 show the estimation times of the algorithms with respect to the number of tags in the deployment. The times grow very slowly as the number of tags increases, which suggests that the algorithms all scale well. In the first plot of Figure 5-9, UPE-M takes the least amount of time, only about 0.5 seconds, to estimate 20,000 tags, while the other algorithms take between 0.7 and 2.0 seconds. GMLE(1.594) takes less estimation time than GMLE(0.5), and the ratio is 0.61, which is consistent with the theoretical value of 0.58 in Figure 5-2. Similarly, EGMLE(1.0) takes less time than EGMLE(0.5), and the ratio is 0.68, which is also consistent with the theoretical value of 0.67 in Figure 5-5. Figure 5-10 and Figure 5-11 show similar simulation results when $\alpha = 95\%$ and 99%, respectively. Even though the new algorithms take longer to complete, their estimation time is still small. We believe the extra time is well justified by the large energy saving.
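
A rough estimation-time model follows from the polling counts of Sections 5.3.3.2 and 5.4.3.1 and the I-Code timings quoted above. The Python sketch below is a simplification under explicit assumptions: every polling costs one 1 ms request plus one frame of 0.4 ms slots, and the initialization phase and request-less pollings are ignored. The constant and function names are hypothetical.

```python
import math
from statistics import NormalDist

REQUEST_MS = 1.0       # assumed cost of broadcasting one polling request
SHORT_SLOT_MS = 0.4    # assumed cost of one empty / non-empty slot


def z_value(alpha):
    return NormalDist().inv_cdf(0.5 + alpha / 2.0)


def gmle_pollings(omega, alpha, beta):
    z = z_value(alpha)
    return z * z * math.expm1(omega) / (omega ** 2 * beta ** 2)


def egmle_pollings(omega, alpha, beta):
    z = z_value(alpha)
    return 2.0 * z * z / ((2.0 * omega + 1.0) * beta ** 2)


def estimation_time_s(pollings, frame_size):
    """Seconds, assuming every polling is one request followed by one frame."""
    return pollings * (REQUEST_MS + frame_size * SHORT_SLOT_MS) / 1000.0


alpha, beta = 0.90, 0.09
times = {
    "GMLE(0.5)": estimation_time_s(gmle_pollings(0.5, alpha, beta), 1),
    "GMLE(1.594)": estimation_time_s(gmle_pollings(1.594, alpha, beta), 1),
    "EGMLE(0.5)": estimation_time_s(egmle_pollings(0.5, alpha, beta), 10),
    "EGMLE(1.0)": estimation_time_s(egmle_pollings(1.0, alpha, beta), 10),
}
for name, t in times.items():
    print(f"{name:12s} {t:.2f} s")
print(f"EGMLE(1.0)/EGMLE(0.5) = {times['EGMLE(1.0)'] / times['EGMLE(0.5)']:.2f}")
```

Because both EGMLE variants use the same frame size, their time ratio reduces to $(2\cdot 0.5+1)/(2\cdot 1+1) = 2/3$, the theoretical value of 0.67 cited above.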

There exists a performance tradeoff between GMLE and EGMLE. In the previous two subsections, we examined the energy cost in terms of the number of responses and the number of transmitted bits, and EGMLE always performs better than GMLE. In this subsection, we compared the estimation time of our two methods, and GMLE performs better than EGMLE. Because the focus of this work is on energy efficiency, we regard EGMLE as our best estimator for energy saving.

5.6 Summary

This chapter proposes two probabilistic algorithms for estimating the number of RFID tags in a region. We believe the algorithms are the first of their kind that target prolonging the lifetime of active RFIDs. Their energy cost is far less than that of the state-of-the-art algorithms in the related work. Moreover, we reveal a fundamental tradeoff between the energy cost and the estimation time. By tuning a system parameter, the algorithms can trade longer estimation time for lower energy cost, or vice versa.

Appendix A: Distribution and Variance of $\hat{N}_i$

Let $i$ be a large positive integer. Consider the sequence of Bernoulli random variables $Z_j$, $1 \le j \le i$, whose success probability is $q = 1-(1-p_i)^N$. Let $\hat{q} = (\sum_{j=1}^{i} Z_j)/i$, which is the estimate of the success probability q. It is known that, asymptotically, $\hat{q}$ follows a normal distribution:

$$\hat{q} \sim \mathrm{Norm}\left(q, \frac{q(1-q)}{i}\right).$$

Because the MLE approach provides a statistically consistent estimate, when $i$ is large, we can consider the contention probabilities in the later stage of the polling process to be approximately constant. In addition, the number of polling results before the contention probability stabilizes is limited, and their impact diminishes as $i$ becomes large. That is, they can be ignored when the asymptotic property of $\hat{N}_i$ is considered. Hence, for the asymptotic property, we can let $p_j = p_i$ for $1 \le j \le i$, and
the derivative of $\ln(L_i)$ given in Section 5.3.3.1 becomes

$$\frac{\partial \ln(L_i)}{\partial N} = \ln(1-p_i)\left(i - \sum_{j=1}^{i} Z_j\right) - \frac{(1-p_i)^N \ln(1-p_i)}{1-(1-p_i)^N} \sum_{j=1}^{i} Z_j.$$

Therefore, the MLE $\hat{N}_i$ that solves $\partial \ln(L_i)/\partial N = 0$ satisfies

$$(1-p_i)^{\hat{N}_i} = 1 - \left(\sum_{j=1}^{i} Z_j\right)/i = 1 - \hat{q}.$$

Hence, from the asymptotic distribution of $\hat{q}$, $(1-p_i)^{\hat{N}_i}$ asymptotically follows the normal distribution

$$\mathrm{Norm}\left((1-p_i)^N, \frac{\left(1-(1-p_i)^N\right)(1-p_i)^N}{i}\right).$$

According to the $\delta$-method [15], if a random variable $X_i$ satisfies

$$X_i \xrightarrow{D} \mathrm{Norm}\left(\mu, \frac{\sigma^2}{i}\right),$$

where $\mu$ and $\sigma$ are finite constants and $\xrightarrow{D}$ denotes convergence in distribution, then we must have

$$g(X_i) \xrightarrow{D} \mathrm{Norm}\left(g(\mu), \frac{\sigma^2 [g'(\mu)]^2}{i}\right)$$

for any function g such that $g'(\mu)$ exists and takes a non-zero value. Applying this result with $g(x) = \ln x$ to the distribution of $(1-p_i)^{\hat{N}_i}$, we have

$$\hat{N}_i \ln(1-p_i) \sim \mathrm{Norm}\left(N\ln(1-p_i), \frac{1-(1-p_i)^N}{i(1-p_i)^N}\right).$$

That is,

$$\hat{N}_i \sim \mathrm{Norm}\left(N, \frac{1-(1-p_i)^N}{i(1-p_i)^N \ln^2(1-p_i)}\right).$$
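
The asymptotic variance can be sanity-checked by simulation. The Python sketch below assumes, as the appendix does, a constant contention probability; it draws each polling outcome directly as a Bernoulli variable with success probability $q = 1-(1-p)^N$ (rather than simulating individual tags), and compares the sample variance of the closed-form estimate with the $\delta$-method formula. The parameter values and function names are illustrative choices.

```python
import math
import random


def mle_constant_p(busy, i, p):
    """GMLE estimate when all i pollings use the same p: solves
    (1 - p)^N_hat = 1 - q_hat, i.e. N_hat = ln(1 - q_hat) / ln(1 - p)."""
    q_hat = busy / i
    if q_hat >= 1.0:
        return math.inf                     # every slot busy: the MLE is unbounded
    return math.log1p(-q_hat) / math.log1p(-p)


def delta_method_variance(n, p, i):
    """(1 - (1-p)^N) / (i (1-p)^N ln^2(1-p)), the asymptotic variance of N_hat."""
    q0 = (1.0 - p) ** n
    return (1.0 - q0) / (i * q0 * math.log1p(-p) ** 2)


def empirical_variance(n, p, i, trials, seed=1):
    """Sample variance of N_hat over repeated simulated estimation runs."""
    rng = random.Random(seed)
    q = 1.0 - (1.0 - p) ** n                # probability that a polling is busy
    estimates = []
    for _ in range(trials):
        busy = sum(rng.random() < q for _ in range(i))
        estimates.append(mle_constant_p(busy, i, p))
    mean = sum(estimates) / trials
    return sum((e - mean) ** 2 for e in estimates) / (trials - 1)


n, i = 1000, 2000
p = 1.594 / n
theory = delta_method_variance(n, p, i)
simulated = empirical_variance(n, p, i, trials=1000)
print(f"delta method: {theory:.1f}, simulation: {simulated:.1f}")
```

With 1000 trials the sample variance has a relative standard error of roughly 4.5%, so the two numbers should agree to within about ten percent.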

Figure 5-1. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval.

Figure 5-2. The solid line shows the number of pollings with respect to $\omega$ when $\alpha = 95\%$ and $\beta = 5\%$. The dotted line shows the number of responses.

Figure 5-3. The collision probability with respect to the frame size f.

Figure 5-4. The middle curve shows the estimated number of tags with respect to the number of pollings. The upper and lower curves show the confidence interval.
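
The curve in Figure 5-3 and the compensation values $\epsilon = 0.046$ (for $\omega = 1$) and $\epsilon = 0.012$ (for $\omega = 0.5$) quoted in Sections 5.4.2.1 and 5.5 can be reproduced approximately with the Python sketch below. It assumes N is large, so the binomial number of responders is replaced by its Poisson limit with mean $\omega$ (the same limit behind the values 0.1839, 0.0613, and 0.0153 in Section 5.4.2.1). The function name is hypothetical, and `math.perm` requires Python 3.8 or later.

```python
import math


def collision_probability(f, omega=1.0, k_max=60):
    """Prob_i{collision} for a frame of f slots.

    The number of responders x_i ~ Bino(N, p_i) with N p_i = omega is replaced by
    its large-N Poisson(omega) limit. For k responders, all land in distinct
    slots with probability P(f, k) / f^k; for k > f a collision is certain.
    """
    total = 0.0
    pmf = math.exp(-omega)                  # Poisson pmf at k = 0
    for k in range(1, k_max + 1):
        pmf *= omega / k                    # now the pmf at k
        if k < 2:
            continue                        # a single responder cannot collide
        if k <= f:
            total += (1.0 - math.perm(f, k) / f ** k) * pmf
        else:
            total += pmf
    return total


for f in (2, 5, 10, 20):
    print(f, round(collision_probability(f), 4))
print("epsilon for omega = 0.5:", round(collision_probability(10, 0.5), 3))
```

For f = 10 the sketch gives about 0.046 with $\omega = 1$ and about 0.012 with $\omega = 0.5$, matching the values used for EGMLE(1.0) and EGMLE(0.5).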

Figure 5-5. The solid line shows the number of pollings with respect to $\omega$ when $\alpha = 95\%$ and $\beta = 5\%$. The dotted line shows the number of responses.

Table 5-1. Number of responses when $\alpha = 90\%$, $\beta = 9\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   432S       767S         172S        225S        6345L   709L    4342S
10000  414S       832S         180S        231S        11986L  899L    8683S
20000  402S       844S         186S        213S        22895L  977L    17366S

Table 5-2. Number of responses when $\alpha = 90\%$, $\beta = 6\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   1041S      1855S        402S        523S        7144L   1811L   7236S
10000  1153S      1924S        414S        519S        12645L  1687L   14472S
20000  1015S      1797S        375S        503S        23808L  1814L   28944S

Table 5-3. Number of responses when $\alpha = 90\%$, $\beta = 3\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   3927S      7341S        1499S       2037S       12664L  6426L   27497S
10000  3760S      7339S        1489S       2059S       18023L  6581L   54993S
20000  3783S      7350S        1543S       2002S       28708L  6993L   109987S
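
The EGMLE curves in Figure 5-5 rest on the Fisher information derived in Section 5.4.2. The Python sketch below checks, term by term, that $p_j/(N(1-p_j)) + 1/(2N^2)$ equals the standard Fisher information of a Gaussian whose mean $Np$ and variance $Np(1-p)$ both depend on N, and compares the exact polling count with the closed form $2Z_\alpha^2/((2\omega+1)\beta^2)$. It assumes every $p_j$ equals $\omega/N$, an idealization of the iterative phase; the function names are hypothetical.

```python
import math
from statistics import NormalDist


def egmle_fisher(n, ps):
    """I(N) = sum_j p_j / (N (1 - p_j)) + (i - l) / (2 N^2).

    ps lists the contention probabilities p_j of pollings j = l+1, ..., i.
    """
    return sum(p / (n * (1.0 - p)) for p in ps) + len(ps) / (2.0 * n * n)


def gaussian_fisher(n, p):
    """Fisher information about N carried by one Norm(N p, N p (1 - p)) sample:
    (d mean / dN)^2 / var + (d var / dN)^2 / (2 var^2)."""
    var = n * p * (1.0 - p)
    d_mean, d_var = p, p * (1.0 - p)
    return d_mean ** 2 / var + d_var ** 2 / (2.0 * var ** 2)


def z_value(alpha):
    return NormalDist().inv_cdf(0.5 + alpha / 2.0)


def egmle_pollings(n, omega, alpha, beta):
    """Smallest i - l with Z_alpha sqrt(1 / I) <= beta N, if every p_j = omega / N."""
    needed = (z_value(alpha) / (beta * n)) ** 2      # I must reach (Z / (beta N))^2
    return math.ceil(needed / egmle_fisher(n, [omega / n]))


def egmle_pollings_closed_form(omega, alpha, beta):
    """2 Z_alpha^2 / ((2 omega + 1) beta^2), the approximation of Section 5.4.3.1."""
    return 2.0 * z_value(alpha) ** 2 / ((2.0 * omega + 1.0) * beta ** 2)


print(egmle_pollings(10000, 1.0, 0.95, 0.05),
      round(egmle_pollings_closed_form(1.0, 0.95, 0.05)))
```

Both counts land near 1,024 pollings for $\alpha = 95\%$, $\beta = 5\%$, and $\omega = 1$; for comparison, the simulated run shown in Figure 5-4 terminated after 1081 pollings.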

Table 5-4. Number of responses when $\alpha = 95\%$, $\beta = 9\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   603S       1112S        258S        330S        6715L   1073L   4342S
10000  669S       1120S        247S        304S        12062L  961L    8683S
20000  680S       1197S        262S        320S        23345L  1136L   17366S

Table 5-5. Number of responses when $\alpha = 95\%$, $\beta = 6\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   1340S      2515S        581S        736S        7712L   2598L   10130S
10000  1354S      2511S        596S        736S        13477L  2318L   20261S
20000  1381S      2630S        555S        749S        24631L  2510L   40521S

Table 5-6. Number of responses when $\alpha = 95\%$, $\beta = 3\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   5687S      10493S       2181S       2915S       14678L  8858L   39074S
10000  5673S      10286S       2267S       2924S       20845L  9364L   78148S
20000  5588S      10637S       2217S       2990S       32339L  9683L   156297S

Table 5-7. Number of responses when $\alpha = 99\%$, $\beta = 9\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   1040S      2162S        427S        453S        7240L   1726L   7236S
10000  1071S      2135S        416S        529S        12842L  1906L   14472S
20000  1017S      1916S        439S        573S        23982L  1819L   28944S

Table 5-8. Number of responses when $\alpha = 99\%$, $\beta = 6\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   2527S      4785S        965S        1269S       9679L   4311L   17366S
10000  2527S      4637S        973S        1248S       15336L  4130L   34733S
20000  2440S      4580S        991S        1293S       26128L  4044L   69465S

Table 5-9. Number of responses when $\alpha = 99\%$, $\beta = 3\%$ (total number of responses; S = short response, L = long response)

N      GMLE(0.5)  GMLE(1.594)  EGMLE(0.5)  EGMLE(1.0)  UPE-O   UPE-M   EZB
5000   9693S      18690S       3818S       4993S       21823L  16705L  65124S
10000  9606S      18223S       3791S       4998S       27667L  15882L  130247S
20000  9385S      17735S       3847S       5027S       38935L  16471L  260495S

Figure 5-6. Numbers of bits transmitted when $\alpha = 90\%$, $\beta = 9\%$, 6%, and 3%.

Figure 5-7. Numbers of bits transmitted when $\alpha = 95\%$, $\beta = 9\%$, 6%, and 3%.

Figure 5-8. Numbers of bits transmitted when $\alpha = 99\%$, $\beta = 9\%$, 6%, and 3%.

Figure 5-9. Estimation times of the algorithms when $\alpha = 90\%$, $\beta = 9\%$, 6%, and 3%.

Figure 5-10. Estimation times of the algorithms when $\alpha = 95\%$, $\beta = 9\%$, 6%, and 3%.

Figure 5-11. Estimation times of the algorithms when $\alpha = 99\%$, $\beta = 9\%$, 6%, and 3%.

CHAPTER 6
CONCLUSIONS

In this dissertation, we first develop a fast and compact per-flow traffic measurement approach through randomized counter sharing. The approach employs a novel data encoding/decoding scheme, which mixes per-flow information randomly in a tight SRAM space for compactness.

We then focus on the scan detection problem in high-speed networks, which is another important research topic of online network measurement. We optimally combine probabilistic sampling, bit-sharing storage, and maximum likelihood estimation to achieve an efficient scan detection scheme.

Thirdly, we propose a new method for origin-destination (OD) flow measurement, which employs the bitmap data structure for packet information storage and uses a statistical inference approach to compute the measurement results. Compared with the best existing approach, our method achieves smaller per-packet processing overhead and much more accurate results.

Finally, we design two probabilistic algorithms for estimating the number of RFID tags in a region. We believe the algorithms are the first of their kind that target prolonging the lifetime of active tags. Their energy cost is far less than that of the state-of-the-art algorithms in the related work. Moreover, we reveal a fundamental tradeoff between the energy cost and the estimation time. By tuning a system parameter, the algorithms can trade longer estimation time for lower energy cost, or vice versa.

REFERENCES

[1] Abilene Update. http://www.internet2.edu/presentations/spring03/20030410-Abilene-Corbato.pdf (2003).
[2] EPC Radio-Frequency Identity Protocols Class-1 Generation-2 UHF RFID Protocol for Communications at 860 MHz-960 MHz Version 1.0.9. http://www.epcglobalinc.org/standards/uhfc1g2/uhfc1g2_1_0_9-standard-20050126.pdf (2005).
[3] Abilene Network. http://en.wikipedia.org/wiki/Abilene_Network (2011).
[4] Abramowitz, M. and Stegun, I. Handbook of Mathematical Functions: with Formulas, Graphs, and Mathematical Tables. Dover Publications (1964).
[5] Bandi, N., Agrawal, D., and Abbadi, A. Fast Algorithms for Heavy Distinct Hitters using Associative Memories. Proc. of IEEE International Conference on Distributed Computing Systems (ICDCS) (2007).
[6] Basu, A., Buch, V., Vogels, W., and von Eicken, T. U-Net: a user-level network interface for parallel and distributed computing. Proc. of ACM SOSP (1995): 40.
[7] Bloom, B. H. Space/Time Trade-offs in Hash Coding with Allowable Errors. Communications of the ACM 13 (1970).7: 422.
[8] Broder, A. and Mitzenmacher, M. Network Applications of Bloom Filters: A Survey. Internet Mathematics 1 (2002).4: 485.
[9] Bryc, W. The normal distribution: characterizations with applications. Springer-Verlag (1995).
[10] CAIDA. Analyzing UDP Usage in Internet Traffic. http://www.caida.org/research/traffic-analysis/tcpudpratio/ (2009).
[11] Cao, J., Chen, A., and Bu, T. A Quasi-Likelihood Approach for Accurate Traffic Matrix Estimation in a High Speed Network. Proc. of IEEE INFOCOM (2008).
[12] Cao, J., Davis, D., Wiel, S. V., and Yu, B. Time-varying network tomography. J. Amer. Statist. Assoc (2000).
[13] Cao, J., Jin, Y., Chen, A., Bu, T., and Zhang, Z. Identifying High Cardinality Internet Hosts. Proc. of IEEE INFOCOM (2009).
[14] Casella, G. and Berger, R. Statistical Inference. Duxbury Press (2001).
[15] Casella, G. and Berger, R. L. Statistical Inference. 2nd edition, Duxbury Press (2002).
[16] Cha, J. and Kim, J. Novel Anti-collision Algorithms for Fast Object Identification in RFID System. Proc. IEEE ICPADS (2005).
[17] Charikar, M., Chen, K., and Farach-Colton, M. Finding Frequent Items in Data Streams. Proc. of International Colloquium on Automata, Languages, and Programming (ICALP) (2002).
[18] Chen, S., Deng, Y., Attie, P., and Sun, W. Optimal Deadlock Detection in Distributed Systems based on Locally Constructed Wait-for Graphs. (1996): 613.
[19] Chen, S., Fang, Y., and Xia, Y. Lexicographic Maxmin Fairness for Data Collection in Wireless Sensor Networks. IEEE Transactions on Mobile Computing 6 (2007).7: 762.
[20] Chen, S. and Nahrstedt, K. Maxmin Fair Routing in Connection-oriented Networks. Proc. Euro-Parallel and Distributed Systems Conf (1998): 163.
[21] Chen, S. and Shavitt, Y. SoMR: A Scalable Distributed QoS Multicast Routing Protocol. Journal of Parallel and Distributed Computing 68 (2008).2: 137.
[22] Chen, S., Song, M., and Sahni, S. Two Techniques for Fast Computation of Constrained Shortest Paths. IEEE/ACM Transactions on Networking 16 (2008).1: 105.
[23] Chen, S., Tang, Y., and Du, W. Stateful DDoS Attacks and Targeted Filtering. Journal of Network and Computer Applications 30 (2007).3: 823.
[24] Cheswick, W. and Bellovin, S. Firewalls and Internet Security: Repelling the Wily Hacker. Addison-Wesley (1994).
[25] Choi, H., Cha, J., and Kim, J. Fast Wireless Anti-collision Algorithm in Ubiquitous ID System. Proc. IEEE VTC (2004).
[26] Coates, M., Hero, A., Nowak, R., and Yu, B. Internet tomography. IEEE Signal Processing Magazine (2002).
[27] Cohen, S. and Matias, Y. Spectral Bloom Filters. Proc. of ACM SIGMOD (2003).
[28] Considine, J., Li, F., Kollios, G., and Byers, J. Approximate aggregation techniques for sensor databases. In The 20th International Conference on Data Engineering (ICDE) (2004).
[29] Cormode, G. and Muthukrishnan, S. Space Efficient Mining of Multigraph Streams. Proc. of ACM PODS (2005).
[30] Cvetkovski, A. An algorithm for approximate counting using limited memory resources. Proc. of ACM SIGMETRICS (2007).
[31] Das, R. Global RFID Market Tops $5.5 Billion. http://www.convertingmagazine.com/article/CA6653688.html (2009).
[32] Deal, R. Cisco Router Firewall Security. Cisco Press, ISBN-10: 1-58705-175-3 (2004).
[33] Demaine, E., Lopez-Ortiz, A., and Ian-Munro, J. Frequency Estimation of Internet Packet Streams with Limited Space. Proc. of Annual European Symposium on Algorithms (ESA) (2002).
[34] Demaine, E., Lopez-Ortiz, A., and Munro, J. Frequency Estimation of Internet Packet Streams with Limited Space. Proc. of 10th ESA Annual European Symposium on Algorithms (2002).
[35] Dimitropoulos, X., Hurley, P., and Kind, A. Probabilistic Lossy Counting: An Efficient Algorithm for Finding Heavy Hitters. ACM SIGCOMM Computer Communication Review 38 (2008).1: 7.
[36] Duffield, N., Lund, C., and Thorup, M. Estimating Flow Distributions from Sampled Flow Statistics. Proc. of ACM SIGCOMM (2003).
[37] Duffield, N. G. and Grossglauser, M. Trajectory sampling for direct traffic observation. Proc. of ACM SIGCOMM (2000).
[38] Durand, M. and Flajolet, P. LogLog Counting of Large Cardinalities. Proc. of European Symposium on Algorithms (2003).
[39] Erramilli, V., Crovella, M., and Taft, N. An independent-connection model for traffic matrices. Proc. of Internet Measurement Conference (IMC) (2006).
[40] Estan, C. and Varghese, G. New Directions in Traffic Measurement and Accounting. Proc. of ACM SIGCOMM (2002).
[41] Estan, C., Varghese, G., and Fish, M. Bitmap Algorithms for Counting Active Flows on High-Speed Links. IEEE/ACM Transactions on Networking (TON) 14 (2006).5: 925.
[42] Feldmann, A., Greenberg, A. G., Lund, C., Reingold, N., Rexford, J., and True, F. Deriving traffic demands for operational IP networks: methodology and experience. Proc. of ACM SIGCOMM (2000).
[43] Flajolet, G. Probabilistic counting. Proc. of Symp. on Foundations of Computer Science (FOCS) (1983).
[44] Floerkemeier, C. Transmission Control Scheme for Fast RFID Object Identification. Proc. of PerCom Workshops on Pervasive Wireless Networking (2006).
[45] Floerkemeier,C.andWille,M.ComparisonofTransmissionSchemesforFrameALOHAbasedRFIDProtocols.Proc.ofSAINTWorkshopsonRFIDandExtendedNetworkDeploymentofTechnologiesandApplications(2006). [46] Fortz,B.andThorup,M.OptimizingOSPF/IS-ISweightsinachangingworld.IEEEJSACSpecialIssueonAdvancesinFundamentalsofNetworkManagement(2002). [47] Gardner,W.David.ResearchersTransmitOpticalDataAt16.4Tbps.Informa-tionWeek(2008). [48] Gibbons,P.andMatias,Y.NewSampling-basedSummaryStatisticsforImprovingApproximateQueryAnswers.Proc.ofACMSIGMOD(1998). [49] Han,H.,Sheng,B.,Tan,C.,Li,Q.,Mao,W.,andLu,S.CountingRFIDTagsEfcientlyandAnonymously.Proc.ofIEEEInfocom(2010). [50] Hao,F.,Kodialam,M.,andLakshman,T.V.ACCEL-RATE:AFasterMechanismforMemoryEfcientPer-owTrafcEstimation.Proc.ofACMSIGMET-RICS/Performance(2004). [51] Hermsmeyer,C.,Song,H.,Gemelli,R.,andBunse,S.Towards100GPacketProcessing:ChallengesandTechnologies.BellLabsTechnicalJournal14(2009).2:57. [52] Hollinger,R.andDavis,J.NationalRetailSecuritySurvey.http://diogenesllc.com/NRSS 2001.pdf(2001). [53] Hush,D.andWood,C.AnalysisofTreeAlgorithmforRFIDArbitration.Proc.IEEEISIT(1998). [54] Hwang,K.,Vander-Zanden,B.,andTaylor,H.ALinear-timeProbabilisticCountingAlgorithmforDatabaseApplications.ACMTransactionsonDatabaseSystems15(1990).2. [55] Jian,Y.andChen,S.CanCSMA/CANetworksBeMadeFair?(2008):235. [56] Jian,Y.,Chen,S.,Zhang,Z.,andZhang,L.ANovelSchemeforProtectingReceiver'sLocationPrivacyinWirelessSensorNetworks.IEEETransactionsonWirelessCommunications7(2008).10:3769. [57] Jung,J.,Paxson,V.,Berger,A.,andBalakrishnan,H.FastPortscanDetectionUsingSequentialHypothesisTesting.Proc.ofIEEESymposiumonSecurityandPrivacy(2004). [58] Kamiyama,N.andMori,T.SimpleandAccurateIdenticationofHigh-rateFlowsbyPacketSampling.Proc.ofIEEEINFOCOM(2006). 134
[59] Kamvar, S., Schlosser, M., and Garcia-Molina, H. The Eigentrust algorithm for reputation management in P2P networks. Proc. of the World Wide Web Conference (2003).
[60] Karp, R., Shenker, S., and Papadimitriou, C. A Simple Algorithm for Finding Frequent Elements in Streams and Bags. ACM Transactions on Database Systems 28 (2003).1: 51.
[61] Klair, D., Chin, K., and Raad, R. On the Energy Consumption of Pure and Slotted Aloha based RFID Anti-Collision Protocols. Computer Communications (2008).
[62] Kodialam, M., Lakshman, T. V., and Mohanty, S. Runs bAsed Traffic Estimator (RATE): A Simple, Memory Efficient Scheme for Per-Flow Rate Estimation. Proc. of INFOCOM (2004).
[63] Kodialam, M. and Nandagopal, T. Fast and Reliable Estimation Schemes in RFID Systems. Proc. ACM MOBICOM, Los Angeles (2006).
[64] Kodialam, M., Nandagopal, T., and Lau, W. Anonymous Tracking using RFID tags. Proc. IEEE INFOCOM (2007).
[65] Kumar, A., Sung, M., Xu, J., and Wang, J. Data Streaming Algorithms for Efficient and Accurate Estimation of Flow Size Distribution. Proc. of ACM SIGMETRICS (2004).
[66] Kumar, A., Xu, J., Wang, J., Spatschek, O., and Li, L. Space-Code Bloom Filter for Efficient Per-Flow Traffic Measurement. Proc. of IEEE INFOCOM (2004; a journal version was published in IEEE JSAC, 24(12): 2327-2339, December 2006).
[67] Lehmann, E. and Casella, G. Theory of Point Estimation. Springer Press (1998).
[68] Li, T., Chen, S., and Ling, Y. Identifying the Missing Tags in a Large RFID System. Proc. of ACM MobiHoc (2010).
[69] Li, T., Chen, S., and Ling, Y. Fast and Compact Per-Flow Traffic Measurement through Randomized Counter Sharing. Proc. of IEEE INFOCOM (2011).
[70] Li, T., Chen, S., Luo, W., and Zhang, M. Scan Detection in High-Speed Networks Based on Optimal Dynamic Bit Sharing. Proc. of IEEE INFOCOM (2011).
[71] Li, T., Chen, S., and Qiao, Y. Origin-Destination Flow Measurement in High-Speed Networks. Proc. of IEEE INFOCOM, mini-conference (2011).
[72] Li, T., Luo, W., Mo, Z., and Chen, S. Privacy-preserving RFID Authentication based on Cryptographical Encoding. Proc. of IEEE INFOCOM (2012).
[73] Li, T., Wu, S., Chen, S., and Yang, M. Energy Efficient Algorithms for the RFID Estimation Problem. Proc. of IEEE INFOCOM (2010).
[74] Liang, G. and Yu, B. Maximum pseudo likelihood estimation in network tomography. IEEE Trans. Signal Processing 51 (2003): 2043-2053.
[75] Lu, Y., Montanari, A., Prabhakar, B., Dharmapurikar, S., and Kabbani, A. Counter Braids: A Novel Counter Architecture for Per-Flow Measurement. Proc. of ACM SIGMETRICS (2008).
[76] Lu, Y. and Prabhakar, B. Robust Counting Via Counter Braids: An Error-Resilient Network Measurement Architecture. Proc. of IEEE INFOCOM (2009).
[77] Lui, K. S., Nahrstedt, K., and Chen, S. Hierarchical QoS Routing in Delay-bandwidth Sensitive Networks. (2000): 579.
[78] Luo, W., Chen, S., Li, T., and Chen, S. Efficient Missing Tag Detection in RFID Systems. Proc. of IEEE INFOCOM, mini-conference (2011).
[79] Luo, W., Chen, S., Li, T., and Qiao, Y. Probabilistic Missing-tag Detection and Energy-Time Tradeoff in Large-scale RFID Systems. Proc. of ACM MobiHoc (2012).
[80] Roughan, M., Thorup, M., and Zhang, Y. Traffic engineering with estimated traffic matrices. Proc. of Internet Measurement Conference (IMC) (2003).
[81] Manku, G. and Motwani, R. Approximate Frequency Counts over Data Streams. Proc. of VLDB (2002).
[82] Medina, A., Taft, N., Salamatian, K., Bhattacharyya, S., and Diot, C. Traffic matrix estimation: Existing techniques and new directions. Proc. of ACM SIGCOMM (2002).
[83] Myung, J. and Lee, W. An adaptive memoryless tag anti-collision protocol for RFID networks. Proc. IEEE ICC (2005).
[84] Nahrstedt, K. and Chen, S. Coexistence of QoS and Best-effort Flows - Routing and Scheduling. (1998).
[85] Namboodiri, V. and Gao, L. Energy-Aware Tag Anti-Collision Protocols for RFID Systems. Proc. of IEEE PerCom (2007).
[86] Newey, W. and McFadden, D. Large Sample Estimation and Hypothesis Testing. Dan. Handbook of Econometrics 4 (1994): 2111.
[87] Ni, L., Liu, Y., and Lau, Y. C. Landmarc: Indoor Location Sensing using Active RFID. Proc. IEEE PerCom (2003).
[88] Nucci, A., Cruz, R., Taft, N., and Diot, C. Design of IGP link weight changes for estimation of traffic matrices. Proc. of IEEE INFOCOM (2004).
[89] National Institute of Standards and Technology. FIPS 180-1: Secure Hash Standard. http://csrc.nist.gov (1995).
[90] Pan, L. and Wu, H. Smart Trend-Traversal: A Low Delay and Energy Tag Arbitration Protocol for Large RFID Systems. Proc. of IEEE Infocom (2009).
[91] Pearson, M. QDR™-III: Next Generation SRAM for Networking. http://www.qdrconsortium.org/presentation/QDR-III-SRAM.pdf (2009).
[92] Plonka, D. FlowScan: A Network Traffic Flow Reporting and Visualization Tool. Proc. of USENIX LISA (2000).
[93] Qian, C., Ngan, H., and Liu, Y. Cardinality Estimation for Large-scale RFID Systems. Proc. IEEE PerCom (2008).
[94] Qiao, Y., Chen, S., Li, T., and Chen, S. Energy-Efficient Polling Protocols in RFID Systems. Proc. of ACM MobiHoc (2011).
[95] Qiao, Y., Li, T., and Chen, S. One Memory Access Bloom Filters and Their Generalization. Proc. of IEEE INFOCOM (2011).
[96] Ramabhadran, S. and Varghese, G. Efficient Implementation of a Statistics Counter Architecture. Proc. ACM SIGMETRICS (2003).
[97] Ramakrishna, M., Fu, E., and Bahcekapili, E. Efficient Hardware Hashing Functions for High Performance Computers. IEEE Transactions on Computers 46 (1997).12: 1378.
[98] Rincon, D., Roughan, M., and Willinger, W. Towards a Meaningful MRA of Traffic Matrices. Proc. of ACM SIGCOMM IMC (2008).
[99] Ringberg, H., Soule, A., Rexford, J., and Diot, C. Sensitivity of PCA for traffic anomaly detection. Proc. of ACM SIGMETRICS (2007).
[100] Roughan, M., Greenberg, A., Kalmanek, C., Rumsewicz, M., Yates, J., and Zhang, Y. Experience in measuring backbone traffic variability: Models, metrics, measurements and meaning. Proc. of ACM SIGCOMM Internet Measurement Workshop (2002).
[101] Philips Semiconductors. I-CODE Smart Label RFID Tags. http://www.nxp.com/acrobat download/other/identification/SL092030.pdf (2004).
[102] Shah, D., Iyer, S., Prabhakar, B., and McKeown, N. Maintaining Statistics Counters in Router Line Cards. Proc. of IEEE Micro 22 (2002).1: 76.
[103] Shi, Y. and Hou, Y. Thomas. Theoretical results on base station movement problem for sensor network. (2008).
[104] Song, H., Hao, F., Kodialam, M., and Lakshman, T. IPv6 Lookups Using Distributed and Load Balanced Bloom Filters for 100Gbps Core Router Line Cards. Proc. of INFOCOM (2009).
[105] Soule, A., Nucci, A., Cruz, R., Leonardi, E., and Taft, N. How to identify and estimate the largest traffic matrix elements in a dynamic environment. Proc. of ACM Sigmetrics (2004).
[106] Staniford, S., Hoagland, J., and McAlerney, J. Practical Automated Detection of Stealthy Portscans. Journal of Computer Security 10 (2002): 105-136.
[107] Stanojevic, R. Small active counters. Proc. of IEEE INFOCOM (2007).
[108] Tan, Chiu C., Sheng, Bo, and Li, Qun. How to Monitor for Missing RFID Tags. Proc. IEEE ICDCS (2008).
[109] Tang, Y., Chen, S., and Ling, Y. State Aggregation of Large Network Domains. Computer Communications 30 (2007).4: 873.
[110] Venkataraman, S., Song, D., Gibbons, P., and Blum, A. New Streaming Algorithms for Fast Detection of Superspreaders. Proc. of NDSS (2005).
[111] Vogt, H. Efficient Object Identification with Passive RFID Tags. Proc. IEEE PerCom (2002).
[112] Want, R. An Introduction to RFID Technology. Proc. IEEE PerCom (2006).
[113] Whang, K., Vander-Zanden, B., and Taylor, H. A Linear Time Probabilistic Counting Algorithm for Database Applications. ACM Transactions on Database Systems (1990).
[114] Yoon, M., Li, T., Chen, S., and Peir, J. Fit a Spread Estimator in Small Memory. Proc. of IEEE INFOCOM (2009).
[115] Yoon, M., Li, T., Chen, S., and Peir, J. Fit a Compact Spread Estimator in Small High-Speed Memory. IEEE/ACM Transactions on Networking 19 (2011).5.
[116] Zecca, G., Couderc, P., Banatre, M., and Beraldi, R. Swarm Robot Synchronization Using RFID Tags. Proc. of IEEE PERCOM (2009).
[117] Zhai, J. and Wang, G. N. An Anti-Collision Algorithm Using Two-functioned Estimation for RFID Tags. Proc. ICCSA (2005).
[118] Zhang, M., Li, T., Chen, S., and Li, B. Using Analog Network Coding to Improve the RFID Reading Throughput. Proc. of IEEE ICDCS (2010).
[119] Zhang, Y., Roughan, M., Duffield, N., and Greenberg, A. Fast accurate computation of large-scale IP traffic matrices from link loads. Proc. of ACM SIGMETRICS (2003).
[120] Zhang, Y., Roughan, M., Lund, C., and Donoho, D. An information theoretic approach to traffic matrix estimation. Proc. of ACM SIGCOMM (2003).
[121] Zhang, Y., Roughan, M., Lund, C., and Donoho, D. Estimating Point-to-Point and Point-to-Multipoint Traffic Matrices: An Information-Theoretic Approach. IEEE/ACM Transactions on Networking 10 (2005).10.
[122] Zhang, Y., Roughan, M., Willinger, W., and Qiu, L. Spatio-Temporal Compressive Sensing and Internet Traffic Matrices. Proc. of ACM SIGCOMM (2009).
[123] Zhang, Y., Singh, S., Sen, S., Duffield, N., and Lund, C. Online Identification of Hierarchical Heavy Hitters: Algorithms, Evaluation, and Application. Proc. of ACM SIGCOMM IMC (2004).
[124] Zhang, Yin. months of Abilene traffic matrices. http://www.cs.utexas.edu/yzhang/research/AbileneTM/ (2004).
[125] Zhang, Z., Chen, S., Ling, Y., and Chow, R. Capacity-aware Multicast Algorithms on Heterogeneous Overlay Networks. IEEE Transactions on Parallel and Distributed Systems 17 (2006).2: 135.
[126] Zhang, Z., Chen, S., and Yoon, M. MARCH: A Distributed Incentive Scheme for Peer-to-peer Networks. (2007): 1091.
[127] Zhao, H., Wang, H., Lin, B., and Xu, J. Design and performance analysis of a DRAM-based statistics counter array architecture. Proc. of ACM/IEEE ANCS (2009).
[128] Zhao, Q., Kumar, A., and Xu, J. Joint Data Streaming and Sampling Techniques for Detection of Super Sources and Destinations. Proc. of USENIX/ACM Internet Measurement Conference (2005).
[129] Zhao, Q., Xu, J., and Kumar, A. Detection of Super Sources and Destinations in High-Speed Networks: Algorithms, Analysis and Evaluation. IEEE Journal on Selected Areas in Communications (JSAC) 24 (2006).10: 1840.
[130] Zhao, Q., Xu, J., and Liu, Z. Design of a Novel Statistics Counter Architecture with Optimal Space and Time Efficiency. Proc. of ACM Sigmetrics/Performance (2006).

BIOGRAPHICAL SKETCH

Tao Li was born in Fuyang, Anhui, China, in 1984. He received his B.E. degree in computer science and engineering from the University of Science and Technology of China in 2007. After that, he joined the Department of Computer and Information Science and Engineering at the University of Florida to pursue his Ph.D. degree under the supervision of Dr. Shigang Chen. His research interests include Internet traffic measurement and RFID technologies.