Citation
Data Structures for Efficient Packet Classification

Material Information

Title:
Data Structures for Efficient Packet Classification
Creator:
LU, WENCHENG ( Author, Primary )
Copyright Date:
2008

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Data models ( jstor )
Databases ( jstor )
Datasets ( jstor )
Dynamic programming ( jstor )
Epics ( jstor )
Heuristics ( jstor )
Information search ( jstor )
Pipelines ( jstor )
Standard deviation ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Wencheng Lu. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
11/30/2007
Resource Identifier:
660162330 ( OCLC )


Full Text


PAGE 4

I have always felt fortunate to have Dr. Sahni as my advisor. I would like to express my sincere gratitude to Dr. Sahni for his mentoring, inspiration, and care during my Ph.D. study. I also want to thank Dr. Jih-kwon Peir, Dr. Yuguang Fang, Dr. Shigang Chen, Dr. Dapeng Wu, and Dr. Richard Newman for serving on my Ph.D. supervisory committee and giving me tremendous help and valuable suggestions. I thank my friends at UF. With them, 5 years went by so fast, and so many wonderful memories are left. I will always remember that I was with them to witness UF win two big championships (football and basketball). Special thanks go to my beautiful wife, my beloved parents, and my cute little girl for their unselfish love, encouragement and support. I would like to tell them that they are always the most important part of my life.

PAGE 5

page
ACKNOWLEDGMENTS ..... 4
LIST OF TABLES ..... 8
LIST OF FIGURES ..... 11
ABSTRACT ..... 15
CHAPTER
1 INTRODUCTION ..... 17
2 PACKET CLASSIFICATION USING TWO-DIMENSIONAL MULTIBIT TRIES ..... 19
  2.1 Related Work ..... 19
  2.2 Tries ..... 21
    2.2.1 One-Bit Tries ..... 21
    2.2.2 Multibit Tries ..... 21
    2.2.3 Two-Dimensional 1-Bit Tries ..... 24
  2.3 Two-Dimensional Multibit Tries ..... 25
  2.4 Algorithm dominatingLevel ..... 27
  2.5 Algorithm sourceTries ..... 28
  2.6 Space-Optimal Constrained 2DMTs ..... 30
    2.6.1 2DMTa ..... 32
    2.6.2 2DMTb ..... 35
    2.6.3 2DMTc ..... 35
    2.6.4 2DMTd ..... 37
    2.6.5 Postprocessing ..... 39
  2.7 2DMTs with Switch Pointers ..... 40
  2.8 Experimental Results ..... 44
    2.8.1 Two-Dimensional Data Sets ..... 45
    2.8.2 Five-Dimensional Data Sets ..... 51
  2.9 Summary ..... 54
3 PACKET CLASSIFICATION USING PIPELINED MULTIBIT TRIES ..... 56
  3.1 Related Work ..... 56
  3.2 Pipelined One-Dimensional VSTs ..... 57
    3.2.1 A Dynamic Programming Construction ..... 57
    3.2.2 Mapping onto a Pipeline Architecture ..... 59
  3.3 Pipelined 2DMTs - Coarse Mapping ..... 63
    3.3.1 FST Dest Trie ..... 64
    3.3.2 VST Dest Trie ..... 66
  3.4 Pipelined 2DMTs - Fine Mapping ..... 69
    3.4.1 FST Dest Trie ..... 69

PAGE 6

    ..... 73
  3.5 Experimental Results ..... 78
    3.5.1 Pipelined One-Dimensional Multibit Tries ..... 78
    3.5.2 Pipelined Two-Dimensional Multibit Tries ..... 82
      3.5.2.1 Coarse pipeline mapping ..... 83
      3.5.2.2 Fine pipeline mapping ..... 85
  3.6 Summary ..... 87
4 SUCCINCT REPRESENTATION OF STATIC PACKET CLASSIFIERS ..... 89
  4.1 Background and Related Work ..... 89
    4.1.1 One-Dimensional Packet Classification ..... 89
    4.1.2 Two-Dimensional Packet Classification ..... 96
  4.2 Minimum-Height SSTs ..... 97
  4.3 Space-Optimal HSSTs ..... 101
  4.4 Space-Optimal 2DHSSTs ..... 103
  4.5 2DHSSTs With Prefix Inheritance (2DHSSTP) ..... 107
  4.6 Implementation Considerations ..... 111
    4.6.1 HSSTs ..... 111
    4.6.2 Base Implementation Optimization ..... 113
    4.6.3 End-Node Optimized HSSTs ..... 114
    4.6.4 2DHSSTs and 2DHSSTPCs ..... 116
  4.7 Experimental Results ..... 117
    4.7.1 One-Dimensional Routing Tables ..... 117
    4.7.2 Further Optimizations ..... 126
    4.7.3 Multi-Dimensional IPv4 Tables ..... 127
      4.7.3.1 Two-dimensional IPv4 tables ..... 128
      4.7.3.2 Five-dimensional IPv4 tables ..... 131
  4.8 Summary ..... 135
5 RECURSIVELY PARTITIONED STATIC IP ROUTER-TABLES ..... 137
  5.1 Related Work ..... 137
  5.2 Recursive Partitioning ..... 139
    5.2.1 Basic Strategy ..... 139
    5.2.2 Incorporating Leaf Pushing ..... 141
    5.2.3 Optimization ..... 142
    5.2.4 Comparison with Other Partitioning Methods ..... 145
    5.2.5 Implementation Considerations ..... 146
  5.3 Experimental Results ..... 147
    5.3.1 IPv4 Router Tables ..... 148
    5.3.2 IPv6 Router Tables ..... 151
  5.4 Summary ..... 154

PAGE 7

..... 156
  6.1 Related Work ..... 157
  6.2 Subtree Split ..... 162
  6.3 Postorder Split ..... 167
  6.4 Simple TCAM with Wide SRAM ..... 171
    6.4.1 Carving Heuristic ..... 175
    6.4.2 Dynamic Programming Carving Algorithm ..... 177
  6.5 Two-Level TCAM with Wide SRAM ..... 179
    6.5.1 One-to-one two-level TCAM ..... 179
    6.5.2 Many-to-one two-level TCAM ..... 181
  6.6 A Comparison ..... 183
  6.7 Experimental Results ..... 183
    6.7.1 IPv4 Router Tables ..... 184
      6.7.1.1 Two-level TCAMs without wide SRAMs ..... 184
      6.7.1.2 Two-level TCAMs with wide SRAMs ..... 186
      6.7.1.3 Two-level TCAMs without wide SRAMs vs. two-level TCAMs with wide SRAMs ..... 190
    6.7.2 IPv6 Router Tables ..... 191
  6.8 Conclusion ..... 193
7 CONCLUSION ..... 199
BIOGRAPHICAL SKETCH ..... 204

PAGE 8

Table page
2-1 Example of seven dest-source filters ..... 24
2-2 Memory (KBytes) required by 2DMTSas with VST dest tries ..... 48
2-3 Memory (KBytes) required by 2DMTds with VST source tries ..... 48
2-4 Total memory (KBytes) and number of memory accesses (MAs) required by GoT-MTs, EGT-PC-TBMs, and HyperCuts structure on 2-dimensional data sets ..... 50
2-5 Total memory (KBytes) and number of memory accesses (MAs) required by extended 2DMTds, EGT-PC-TBM, and HyperCuts structure on 5-dimensional data sets ..... 53
3-1 4-filter database ..... 74
3-2 Reduction in maximum per-stage memory resulting from tree packing heuristic ..... 79
3-3 Maximum per-stage memory (KB) ..... 80
3-4 Maximum per-stage memory (KB) ..... 81
3-5 Maximum per-stage memory normalized by PVST's maximum per-stage memory ..... 82
3-6 Total memory normalized by PVST's total memory ..... 83
3-7 Reduction in maximum per-stage memory resulting from tree packing heuristic ..... 84
4-1 Example of five dest-source filters ..... 97
4-2 Number of memory accesses required for a lookup in IPv4 tables ..... 118
4-3 Accesses for IPv4 data normalized by EBO data. ..... 118
4-4 Memory for IPv4 data normalized by EBO data. ..... 119
4-5 Total memory (KBytes) required by IPv4 tables ..... 120
4-6 Memory (KBytes) of BART search ..... 123
4-7 Number of memory accesses required for a lookup in IPv6 tables ..... 124
4-8 Accesses for IPv6 data normalized by EBO data. ..... 124
4-9 Memory for IPv6 data normalized by EBO data. ..... 125
4-10 Total memory (KBytes) required by IPv6 tables (AS1221* data is in bytes) ..... 125

PAGE 9

..... 126
4-12 Total memory (KBytes) and number of memory accesses (MAs) required by 2DHSSTs and 2DHSSTPCs ..... 129
4-13 Total memory (KBytes), bits/rule, and number of memory accesses required by extended 2DHSSTPCs on 5-dimensional data sets. ..... 132
4-14 Total memory (KBytes), bits/rule, and number of memory accesses required by HyperCuts on 5-dimensional data sets. ..... 133
5-1 Memory accesses and total memory (KBytes) required for IPv4 tables ..... 150
5-2 Statistics for IPv4 memory requirement normalized by that for RP(4) using 36-bit entries ..... 150
5-3 Memory accesses and total memory (KBytes) required for IPv6 tables ..... 153
5-4 IPv6 data normalized by the memory required by RP(4) using 72-bit entries. The data set AS1221* is excluded here. ..... 153
6-1 An example 7-prefix forwarding table ..... 158
6-2 Comparison of worst-case TCAM memory and power required ..... 183
6-3 ITCAM size for 1-1 2-level TCAMs ..... 184
6-4 ITCAM size for many-1 2-level TCAMs ..... 185
6-5 Number of DTCAM buckets for many-1 2-level TCAMs ..... 185
6-6 Total TCAM size with wide SRAMs ..... 187
6-7 Total TCAM size normalized by that of M-12Wb ..... 187
6-8 Total TCAM power with wide SRAMs ..... 188
6-9 Total TCAM power normalized by that of M-12Wb ..... 188
6-10 Total SRAM size (KBytes) ..... 188
6-11 Total SRAM size normalized by that of M-12Wb ..... 189
6-12 Total TCAM size ..... 191
6-13 Total TCAM size normalized by that of M-12Wb ..... 191
6-14 Total TCAM power ..... 192
6-15 Total TCAM power normalized by that of M-12Wb ..... 192

PAGE 10

..... 192
6-17 Total SRAM size normalized by that of M-12Wb ..... 192
6-18 ITCAM size for 1-1 2-level TCAMs for IPv6 ..... 193
6-19 ITCAM size for many-1 2-level TCAMs for IPv6 ..... 193
6-20 Number of DTCAM buckets required for many-1 2-level TCAMs for IPv6 ..... 194
6-21 Total TCAM size for IPv6 ..... 194
6-22 Total TCAM size normalized by that of M-12Wb ..... 194
6-23 Total TCAM power for IPv6 ..... 196
6-24 Total TCAM power normalized by that of M-12Wb ..... 197
6-25 Total SRAM size (KBytes) for IPv6 ..... 197
6-26 Total SRAM size normalized by that of M-12Wb ..... 197

PAGE 11

Figure page
2-1 Prefixes and corresponding 1-bit trie [32]. A) 8-prefix example [38]. B) Corresponding 1-bit trie. ..... 22
2-2 Prefix expansion and fixed-stride trie. A) Expanded prefixes. B) Corresponding fixed-stride trie ..... 23
2-3 Two-dimensional 1-bit trie for Table 2-1 ..... 25
2-4 2DMT for seven dest-source filters of Table 2-1 ..... 26
2-5 Algorithm for calculating dominating levels. ..... 28
2-6 Algorithm to compute 1-bit source tries ..... 29
2-7 Algorithm to remove dominated nodes ..... 30
2-8 Algorithm to compute f(s, e, z) ..... 33
2-9 2DMT of Figure 2-4 augmented with switch pointers ..... 40
2-10 2DMTSa for the filters of Table 2-1 ..... 43
2-11 Heuristic to determine a good 2DMTSa(k). ..... 44
2-12 Number of elements in space-optimal constrained 2DMTs with postprocessing. Source tries are VSTs. A) ACL1. B) ACL2. C) ACL3. D) ACL4. E) ACL5. F) FW1. G) FW2. H) FW3. I) FW4. J) FW5. K) IPC1. L) IPC2. ..... 47
2-13 Memory requirements of 2DMTds and 2DMTSas. A) ACL1. B) ACL2. C) ACL3. D) ACL4. E) ACL5. F) FW1. G) FW2. H) FW3. I) FW4. J) FW5. K) IPC1. L) IPC2. ..... 49
2-14 Performance of GoT-MTs, EGT-PC-TBMs, and HyperCuts on 2-dimensional data sets. A) Memory. B) Number of accesses ..... 50
2-15 Performance of extended 2DMTds, EGT-PC-TBM, and HyperCuts on 5-dimensional data sets. A) Memory. B) Accesses. ..... 54
3-1 Tree packing heuristic ..... 63
3-2 Example for node pullup ..... 69
3-3 Computing AoptSTs(N, i, j) using node pullups ..... 70
3-4 2DMT for 4-filter database in Table 3-1 ..... 75
3-5 Maximum per-stage and total memory (KB) for AS1221. A) Maximum per-stage memory. B) Total memory ..... 82

PAGE 12

..... 85
3-7 Maximum per-stage and total memory required by fine-grain pipelining. Source tries are VSTs. A) ACL1, Max Per Stage Memory. B) FW1, Max Per Stage Memory. C) IPC1, Max Per Stage Memory. D) ACL1, Total Memory. E) FW1, Total Memory. F) IPC1, Total Memory. ..... 86
3-8 Maximum per-stage and total memory required by coarse-grain and fine-grain pipelining. The dest trie is VST and the source tries are VSTs. A) ACL1, Max Per Stage Memory. B) FW1, Max Per Stage Memory. C) IPC1, Max Per Stage Memory. D) ACL1, Total Memory. E) FW1, Total Memory. F) IPC1, Total Memory. ..... 88
4-1 Prefixes and corresponding binary trie. A) 5 prefixes. B) Corresponding binary trie. ..... 90
4-2 TBM for binary trie of Figure 4-1(B). A) TBM partitioning. B) TBM node representation. ..... 92
4-3 SST for binary trie of Figure 4-1(B). A) SST partitioning. B) SST node. ..... 94
4-4 HSST for binary trie of Figure 4-1(B). A) HSST partitioning. B) HSST node representation. ..... 95
4-5 Two-dimensional binary trie for five dest-source filters of Figure 4-1 ..... 97
4-6 Visit function for minHtSST ..... 99
4-7 Two-dimensional supernode trie for 5 dest-source filters of Figure 4-1 ..... 104
4-8 2DHSSTP for Figure 4-7 ..... 108
4-9 Leaf supernode formats ..... 115
4-10 Number of memory accesses required by a lookup in IPv4 tables. A) B=72 bits. B) B=144 bits. ..... 119
4-11 Total memory (KBytes) required for IPv4 tables. A) B=72 bits. B) 144 bits. ..... 120
4-12 Number of memory accesses required by a lookup in IPv6 tables. A) B=72 bits. B) B=144 bits ..... 124
4-13 Total memory (KBytes) required by IPv6 tables (AS1221* data is in bytes). A) B=72 bits. B) B=144 bits. ..... 125
4-14 Total memory (KBytes) and number of memory accesses required by 2DHSSTs and 2DHSSTPCs. A) Memory. B) Accesses. ..... 129

PAGE 13

..... 130
4-16 Total memory (KBytes) and number of memory accesses required by 2DHSSTPCs and extended 2DHSSTPCs. A) Memory. B) Accesses. ..... 132
4-17 Total memory (KBytes) and number of memory accesses required by HyperCuts and extended 2DHSSTPCs. A) Memory. B) Accesses. ..... 134
5-1 Stride s partitioning of a binary trie T. A) Trie T. B) Hash table representation. ..... 139
5-2 Hash table entry types ..... 141
5-3 Searching with basic strategy ..... 142
5-4 Searching with leaf pushing version A ..... 143
5-5 Memory accesses required for a lookup in IPv4 tables ..... 149
5-6 Total memory required for IPv4 tables ..... 149
5-7 Memory accesses required for a lookup in IPv6 tables ..... 152
5-8 Total memory required for IPv6 tables ..... 152
6-1 Simple TCAM organization for Table 6-1 ..... 158
6-2 1-bit trie for 7-prefix example of Table 6-1 ..... 159
6-3 2-level TCAM organization using subtree split ..... 160
6-4 2-level TCAM organization using postorder split ..... 161
6-5 Bad example for subtree split of [45] ..... 163
6-6 Visit function for optimal subtree splitting ..... 164
6-7 PS1 ..... 169
6-8 Visit function for feasibleST as used by PS1 ..... 170
6-9 PS2 ..... 172
6-10 Visit function for feasibleST2 as used by PS2 ..... 173
6-11 Suffix node format 4 ..... 173
6-12 Simple TCAM with SRAM (STW) for the prefix set of Table 6-1 ..... 174
6-13 Visit function for subtree carving heuristic ..... 176
6-14 1-12Wa with fixed-size DTCAM buckets ..... 180

PAGE 14

..... 181
6-16 1-12Wc with fixed-size DTCAM buckets ..... 182
6-17 1-2Wd with variable-size DTCAM buckets ..... 182
6-18 Many-to-one 2-level TCAM with wide SRAM. A) M-12Wa. B) M-12Wb ..... 182
6-19 Total ITCAM size with wide SRAMs for AS1221. ..... 185
6-20 ITCAM size and number of DTCAM buckets for many-1 2-level TCAMs for AS1221. A) ITCAM size. B) # of DTCAM buckets ..... 186
6-21 Total TCAM size with wide SRAMs for AS1221. ..... 189
6-22 Total TCAM power with wide SRAMs for AS1221. ..... 189
6-23 Total SRAM size with wide SRAMs for AS1221. ..... 190
6-24 Total TCAM size, TCAM power, and SRAM size for AS1221. A) TCAM size. B) TCAM power. C) SRAM size ..... 191
6-25 Total ITCAM size with wide SRAMs for IPv6 AS1221. ..... 195
6-26 Total ITCAM size and number of DTCAM buckets with wide SRAMs for IPv6 AS1221. A) ITCAM size. B) # of DTCAM buckets ..... 195
6-27 Total TCAM size, TCAM power, and SRAM size for IPv6 AS1221. A) TCAM size. B) TCAM power. C) SRAM size. ..... 196


PAGE 17

30, 31]). Although 1-dimensional prefix filters are adequate for destination based packet forwarding, higher dimensional filters are required for firewall, quality of service, and virtual private network applications, for example. Two-dimensional prefix filters, for example, may be used "to represent host to host or network to network or IP multicast flows" [11] and higher dimensional filters are required if these flows are to be represented "with greater granularity." Eppstein and Muthukrishnan [8] state that "Some proposals are underway to specify many fields while others are underway

PAGE 18

15] also point out that in IPsec, for security reasons, fields other than the source and destination address may not be available to a classifier. Thus two-dimensional prefix filters represent an important special case of multi-dimensional packet classification. Data structures for multi-dimensional (i.e., d > 1) packet classification were developed [2, 3, 5, 8-11, 16, 21, 22, 29, 33-36].

Router tables generally operate in one of two modes: static (or offline) and dynamic (or online). In the static mode, we employ a router table that supports very high speed lookup. Update requests are handled online using a background processor. With some periodicity, a new and updated forwarding table is created. In the dynamic mode, lookup and update requests are processed in the order they appear. So, a lookup cannot be done until a preceding update has been done. Since updates in router tables occur with about 3 orders of magnitude less frequency than do lookups, it is sufficient for data structures to support lookup at line rates and updates at much lower rates. The primary metrics employed to evaluate a data structure for a static table are memory requirement and worst-case number of memory accesses to perform a lookup. In the case of a dynamic table, an additional metric, the worst-case number of memory accesses needed for an update, is used.

In this dissertation, we focus on static classifiers. That is, the set of rules that comprise the classifier does not change (no inserts/deletes). This assumption is consistent with that made in most of the classifier literature where the objective is to develop a memory-efficient classifier representation that can be searched very fast.

PAGE 19

36], the strides for the trie were determined empirically and no attempt was made to optimize these strides for individual data sets. We develop fast polynomial-time algorithms to construct space-optimal constrained 2DMTs (two-dimensional multibit tries). The constructed 2DMTs may be searched with at most k memory accesses, where k is a design parameter. Our space-optimal constrained 2DMTs may be used for d-dimensional filters, d > 2, using the bucketing strategy proposed in [2]. For the case d = 2, switch pointers may be employed to get multibit tries that require less memory than required by our space-optimal constrained 2DMTs and that permit packet classification with at most k memory accesses. We develop a fast heuristic to construct good multibit tries with switch pointers. Experiments conducted by us indicate that our proposed space-optimal constrained 2DMT structures are superior to other data structures proposed for multi-dimensional (d > 1) packet classification.

38] explored the idea of using multibit tries (i.e., tries whose degree is more than 2) for one-dimensional packet classification. They proposed the use of fixed- (FST) and variable-stride (VST) one-dimensional tries and developed dynamic programming algorithms to construct space-optimal FSTs and VSTs whose height is k, where k is a design parameter

32] develop improved algorithms for space optimal one-dimensional FSTs and VSTs.

PAGE 20

2, 3, 5, 8-11, 16, 21, 22, 29, 33-36], those proposed in [2, 33, 36] appear to be the most effective. As we use these to benchmark the structures proposed in this paper, we provide a brief description of each.

Srinivasan and Varghese [36] proposed using two-dimensional one-bit tries for destination-source prefix filters. The proposed two-dimensional trie structure takes O(nW) memory, where n is the number of filters in the classifier and W is the length of the longest prefix. Using this structure, a packet may be classified with $O(W^2)$ memory accesses. The basic two-dimensional one-bit trie may be improved upon by using pre-computation and switch pointers [36]. The improved version classifies a packet making only O(W) memory accesses. By using multibit tries rather than 1-bit tries, the lookup performance may be improved further. For example, with a 4-level dest trie with strides 8-8-8-8, 5-level source tries with strides 6-6-6-6-8, and switch pointers, a lookup requires only 9 memory accesses. Srinivasan recommends these strides for IPv4 data. We refer to the resulting multibit trie with switch pointers as GoT-MT (grid of tries using multibit nodes). Srinivasan and Varghese [36] also propose extensions to higher-dimensional tries that may be used with d-dimensional, d > 2, filters.

Baboescu et al. [2] suggest the use of two-dimensional one-bit tries with buckets for d-dimensional, d > 2, classifiers. Basically, the destination and source fields of the filters are used to construct a two-dimensional one-bit trie. Filters that have the same destination and source fields are considered to be equivalent. Equivalent filters are stored in a bucket that may be searched serially. Baboescu et al. [2] report that this scheme is expected to work well in practice because the bucket size tends to be small. They note also that switch pointers may not be used in conjunction with the bucketing scheme. To achieve space efficiency, [2] proposes the use of path compression and tree bitmaps. The resulting space-efficient structure is referred to as EGT-PC-TBM (extended grid of tries with path compression and tree bitmaps).

PAGE 21

33], which is one of the best known algorithmic schemes for multidimensional packet classification, uses a decision tree and rules are stored in buckets of bounded size; each bucket is associated with a tree node. A major weakness of HyperCuts is the excessive space it needs to represent classifiers that have a modest number of rules with a wildcard in the source or dest fields.

2.2.1 One-Bit Tries

A 1-bit trie is a binary tree-like structure in which each node has two element fields, le (left element) and re (right element), and each element field has the components child and data. Branching is done based on the bits in the search key. A left-element child branch is followed at a node at level i (the root is at level 0) if the ith bit of the search key is 0; otherwise a right-element child branch is followed. Level i nodes store prefixes whose length is i+1 in their data fields. A prefix that ends in 0 is stored as le.data and one whose last bit is a 1 is stored as re.data. The node in which a prefix is to be stored is determined by doing a search using that prefix as key.

Let N be a node in a 1-bit trie and let E be an element field (either left or right) of N. Let Q(E) be the bit string defined by the path from the root to N followed by a 0 in case E is a left element field and 1 otherwise. Q(E) is the prefix that corresponds to E. Q(E) is stored in E.data in case Q(E) is one of the prefixes to be stored in the trie. Figure 2-1 shows a set of 8 prefixes and the corresponding 1-bit trie. The * shown at the right end of each prefix is used neither for the branching described above nor in the length computation. So, the length of P1 is 2.
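The node layout described above can be sketched in a few lines of Python. This is a minimal illustration, not the dissertation's implementation: the class and function names (`Element`, `Node`, `insert`, `longest_match`, `memory_units`) and the three sample prefixes are assumptions made for the example, and the prefixes of Figure 2-1 are not reproduced because they do not appear in the text.

```python
class Element:
    """One element field (le or re): a child pointer plus a data slot for Q(E)."""
    def __init__(self):
        self.child = None
        self.data = None


class Node:
    """A 1-bit trie node with a left element (le) and a right element (re)."""
    def __init__(self):
        self.le = Element()
        self.re = Element()


def insert(root, prefix, data):
    """Store data for a prefix given as a non-empty bit string without the '*'."""
    if not prefix:
        raise ValueError("a prefix of length 0 has no element field in this layout")
    node = root
    for bit in prefix[:-1]:                      # walk down, creating nodes as needed
        elem = node.le if bit == "0" else node.re
        if elem.child is None:
            elem.child = Node()
        node = elem.child
    last = node.le if prefix[-1] == "0" else node.re
    last.data = data                             # a length i+1 prefix lands at level i


def longest_match(root, key):
    """Return the data of the longest stored prefix of key, or None."""
    node, best = root, None
    for bit in key:
        if node is None:
            break
        elem = node.le if bit == "0" else node.re
        if elem.data is not None:
            best = elem.data
        node = elem.child
    return best


def memory_units(node):
    """Count element fields: two per node, one memory unit per element field."""
    if node is None:
        return 0
    return 2 + memory_units(node.le.child) + memory_units(node.re.child)


# Hypothetical prefixes 0*, 10* and 1011*.
root = Node()
for prefix, name in [("0", "A"), ("10", "B"), ("1011", "C")]:
    insert(root, prefix, name)
```

Counting two element fields per node matches the way the memory of a 1-bit trie is measured in units of element fields.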

PAGE 22

Figure 2-1. Prefixes and corresponding 1-bit trie [32]. A) 8-prefix example [38]. B) Corresponding 1-bit trie.

(one memory unit being large enough to accommodate an element field). Note that the stride of every node in a 1-bit trie is 1. In a fixed-stride trie (FST), all nodes at the same level have the same stride; nodes at different levels may have different strides. In a variable-stride trie (VST), nodes may have different strides regardless of their level.

Suppose we wish to represent the prefixes of Figure 2-1(A) using an FST that has three levels. Assume that the strides are 2, 3, and 2. The root of the trie stores prefixes whose length is 2; the level one nodes store prefixes whose length is 5 (2+3); and the level two nodes store prefixes whose length is 7 (2+3+2). This poses a problem for the prefixes of our example, because the length of some of these prefixes is different from the storeable lengths. For instance, the length of P5 is 1. To get around this problem, a prefix with a nonpermissible length is expanded to the next permissible length [38]. For example, P5 = 0* is expanded to P5a = 00* and P5b = 01*. If one of the newly created prefixes is a duplicate, dominance rules are used to eliminate all but one occurrence of the prefix. Because of the elimination of duplicate prefixes from the expanded prefix set, all prefixes are distinct. Figure 2-2(A) shows the prefixes that result when we expand the prefixes of Figure 2-1 to lengths 2, 5, and 7. Duplicate prefixes, following expansion, are eliminated
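The expansion of a prefix to the next permissible length can be sketched as follows. This is a hedged illustration only: the function names are made up for the example, and the dominance rule used to resolve duplicates (the expansion that came from the longer original prefix wins) is an assumption, since the text does not spell the rule out.

```python
def expand_prefix(prefix, lengths):
    """Expand a prefix (bit string without '*') to the next permissible length.

    Raises ValueError if the prefix is longer than every permissible length.
    """
    target = min(l for l in lengths if l >= len(prefix))
    pad = target - len(prefix)
    if pad == 0:
        return [prefix]
    # Append every bit pattern of width pad.
    return [prefix + format(i, "0{}b".format(pad)) for i in range(2 ** pad)]


def expand_table(table, lengths):
    """Expand every prefix of table (prefix -> data) to a permissible length.

    Assumed dominance rule: on a collision, the entry derived from the longer
    original prefix is kept.
    """
    expanded = {}
    for prefix in sorted(table, key=len):        # shorter first, longer overwrite
        for p in expand_prefix(prefix, lengths):
            expanded[p] = table[prefix]
    return expanded


# P5 = 0* with permissible lengths 2, 5 and 7 becomes 00* and 01*.
print(expand_prefix("0", [2, 5, 7]))
```

With permissible lengths 2, 5, and 7, the call above reproduces the expansion of P5 = 0* into 00* and 01* given in the text.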

PAGE 23

2-2(B) shows the corresponding FST whose height is 2 and whose strides are 2, 3, and 2.

Figure 2-2. Prefix expansion and fixed-stride trie. A) Expanded prefixes. B) Corresponding fixed-stride trie

Since the trie of Figure 2-2(B) can be searched with at most 3 memory accesses, it represents a time-performance improvement over the 1-bit trie of Figure 2-1(B), which requires up to 7 memory accesses to perform a search. However, the space requirements of the FST of Figure 2-2(B) are more than that of the corresponding 1-bit trie. For the root of the FST, we need 4 units; the two level 1 nodes require 8 units each; and the level 2 node requires 4 units. The total is 24 memory units. Note that the 1-bit trie of Figure 2-1 requires only 20 memory units.

Let N be a node at level j of a multibit trie. Let $s_0, s_1, \ldots, s_j$ be the strides of the nodes on the path from the root of the multibit trie to node N. Note that $s_0$ is the stride of the root and $s_j$ is the stride of N. With node N we associate a pair [s, e], called the start-end pair, that gives the start and end levels of the corresponding 1-bit trie covered by this node. By definition, $s = \sum_{i=0}^{j-1} s_i$ and $e = s + s_j - 1$. In the case of an FST all nodes

PAGE 24

Table 2-1. Example of seven dest-source filters

Filter  Dest   Source  Cost
F1      0*     1100*   1
F2      0*     1110*   2
F3      0*     1111    3
F4      000*   10*     4
F5      000*   11*     5
F6      0001*  000*    6
F7      0*     1*      7

at the same level of the FST have the same [s, e] values. In the FST of Figure 2-2(B), the [s, e] values for the level 0, level 1, and level 2 nodes are [0, 1], [2, 4] and [5, 6], respectively. Starting with a 1-bit trie for n prefixes whose length is at most W, the strides for a space-optimal FST with at most k levels may be determined in $O(nW + kW^2)$ time

32, 38]. For a space-optimal VST whose height

32, 38].

2-1. For each rule, the filter is defined by the Dest (destination) and Source prefixes. So, for example, F1 = (0*, 1100*) matches all packets whose destination address begins with 0 and whose source address

32, 38] assumes we start with data extracted from the 1-bit trie; the extraction of this data takes O(nW) time.

2 and 3 is more convenient.

PAGE 25

Figure 2-3. Two-dimensional 1-bit trie for Table 2-1

begins with 1100. When a packet is matched by two or more filters, the rule with least cost is used. The classifier of Table 2-1 may be represented as a 2D1BT in which the top-level trie is constructed using the destination prefixes. In the context of our destination-source filters, this top-level trie is called the destination trie. Let N be a node in the destination trie and let E be one of the element fields (either le or re) of N. If no dest prefix equals Q(E), then N.E.data points to an empty lower-level trie. If there is a dest prefix D that equals Q(E), then N.E.data points to a 1-bit trie for all source prefixes S such that (D, S) is a filter. In the context of destination-source filters, the lower-level tries are called source tries. We say that N has two source-tries (one or both of which may be empty) hanging from it. Figure 2-3 gives the 2D1BT for the filters of Table 2-1.

2-1, u = 3, $D_1 = 0$, $D_2 = 000$ and $D_3 = 0001$. The destination trie of the 2DMT for our 7 filters is a multibit trie for $D_1$-$D_3$ (see Figure 2-4). In the destination trie of this figure element fields that correspond to a $D_i$ or the expansion of a $D_i$ (in case $D_i$

PAGE 26

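Figure 2-4. 2DMT for seven dest-source filters of Table 2-1

is to be expanded) are marked with `+'. The remaining element fields are marked with `-'. Element fields marked with `+' have a non-empty source trie hanging from them; the remaining element fields have an empty (or null) source trie hanging from them.

Let L(p) be the length of the prefix p, let $D_{i,j} = \{D_q \mid D_q \text{ is a prefix of } D_i \text{ and } L(D_i) - L(D_q) \le j\}$, and let $S_{i,j} = \{S_q \mid (D_q, S_q) \text{ is a filter and } D_q \in D_{i,j}\}$. We see that $D_{i,0} = \{D_i\}$ for all i. For our example filter set, $D_{1,0} = \{0\}$, $S_{1,0} = \{1, 1100, 1110, 1111\}$, $D_{2,0} = D_{2,1} = \{000\}$, $S_{2,0} = S_{2,1} = \{10, 11\}$, $D_{2,2} = \{0, 000\}$, $S_{2,2} = \{1, 10, 11, 1100, 1110, 1111\}$, $D_{3,0} = \{0001\}$, $S_{3,0} = \{000\}$, $D_{3,1} = D_{3,2} = \{0001, 000\}$, $S_{3,1} = S_{3,2} = \{000, 10, 11\}$, $D_{3,3} = \{0001, 000, 0\}$, and $S_{3,3} = \{000, 1, 10, 11, 1100, 1110, 1111\}$ (the source prefix 1 of F7 belongs to $S_{3,3}$ because its destination prefix 0 is in $D_{3,3}$).

Let N be a node in the destination trie of a 2DMT. Let [s, e] be the start-end pair associated with N and let E be an element field of N. Let $D_i$ be the longest destination prefix, s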
PAGE 27

2-4 for 00010 takes us through the 0 field of the root, the 00 field of the root's left child, and the 1 field of the remaining dest-trie node. Each of these three fields has a non-empty source-trie hanging from it. So, three source tries are searched for 11101. When the root source-trie is searched, the element fields with F2 and F7 are encountered. When the source trie in the left child of the root is searched, the element field with F5 is encountered. The search in the remaining source trie encounters no matching filters. The least-cost matching-filter is determined to be F2.

Let P be any root to leaf path in the destination trie of a 2DMT T. Let the sum of the heights of the source tries hanging from the element fields on this path be H(P). The maximum number of memory accesses (MNMA) needed to find the least-cost matching filter for a given destination-source address pair is

$\mathrm{MNMA}(T) = \max_P \{H(P) + \text{number of nodes on } P\}$

We can reduce the MNMA needed for a search of a 2DMT through the use of switch pointers [35]. Although [35] describes the use of switch pointers only in the context of 2D1BTs, the concept is extended easily to 2DMTs (though not to 2DMTs augmented with the bucketing scheme of [2]).
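The worked search above can be checked with a small Python sketch that works directly on the filters of Table 2-1 rather than on an actual 2DMT. Everything here is illustrative: `trace_search`, `least_cost`, and `mnma` are hypothetical helper names, F3's source is treated as a prefix, and the tree passed to `mnma` in the usage lines uses made-up source-trie heights, because the heights in Figure 2-4 are not given in the text.

```python
# Filters of Table 2-1: name -> (dest prefix, source prefix, cost); '*' is dropped.
FILTERS = {
    "F1": ("0", "1100", 1),
    "F2": ("0", "1110", 2),
    "F3": ("0", "1111", 3),
    "F4": ("000", "10", 4),
    "F5": ("000", "11", 5),
    "F6": ("0001", "000", 6),
    "F7": ("0", "1", 7),
}


def trace_search(dest, src, filters=FILTERS):
    """Group the filters matching (dest, src) by the dest prefix they hang from.

    Each key stands for one destination element field whose source trie lies on
    the search path; the value lists the filters met in that source trie.
    """
    hits = {}
    for name, (d, s, _cost) in filters.items():
        if dest.startswith(d):               # this source trie gets searched
            hits.setdefault(d, [])
            if src.startswith(s):            # the source prefix matches the packet
                hits[d].append(name)
    return hits


def least_cost(hits, filters=FILTERS):
    """Return the least-cost filter among all hits, or None if nothing matched."""
    names = [n for group in hits.values() for n in group]
    return min(names, key=lambda n: filters[n][2]) if names else None


def mnma(node):
    """MNMA of a dest trie modelled as a non-empty list of (height, child) pairs.

    height is the height of the source trie hanging from an element field and
    child is another such list or None; one access is charged per dest node.
    """
    return 1 + max(h + (mnma(child) if child else 0) for h, child in node)


hits = trace_search("00010", "11101")
print(hits, least_cost(hits))
print(mnma([(2, [(1, None), (3, None)]), (0, None)]))   # hypothetical heights
```

The brute-force trace visits the same three source tries as the search in the text and agrees with its outcome, F2.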

PAGE 28

Figure 2-5. Algorithm for calculating dominating levels.

two-dimensional classifier. Let $D_1, D_2, \ldots, D_u$ be these distinct destination prefixes. Each $D_i$ is stored in exactly one of the element fields of exactly one of the nodes of O. Let N be a node of O. Let E be one of the element fields of N. If Q(E) is one of the $D_i$s, this $D_i$ is stored in E.data. Otherwise, E.data is empty. We say that Q(E) is dominated by $D_i$ iff L(Q(E))
PAGE 29

Figure 2-6. Algorithm to compute 1-bit source tries

initial invocation is sourceTries(root(O)). The correctness of this algorithm follows from the definitions of $S_{i,j}$ and S[]. Note that when there are u distinct dest prefixes, exactly u element fields have a non-null S[0]. So, the union between tries done in the second for loop needs to be done only for u element fields. At the remaining element fields, we simply copy the S[i-1] values, which are pointers to source-trie roots, from the parent element. Therefore, the number of union operations is O(uW). When the total number of filters in the two-dimensional classifier is n, each S[j] has O(nW) nodes and a union takes O(nW) time. So, the time complexity of sourceTries is $O(unW^2) = O(n^2W^2)$. The space complexity is $O(n^2W^2)$ because there are O(uW) = O(nW) source tries and each source trie has O(nW) nodes.

Let p be a source prefix stored in E. If the cost of the filter associated with p is more than that of any one of the prefixes stored in an ancestor element of E, p may be deleted from that source trie. This observation is used in Algorithm 2-7 to reduce the size of a source trie.
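The pruning observation can be illustrated with a short Python sketch. It is not Algorithm 2-7, which is not reproduced in the text; it applies the same observation to a flat dictionary that stands in for one source trie. The function name `prune_dominated` and the sample costs are assumptions made for the example.

```python
def prune_dominated(source_costs):
    """Drop every source prefix that can never be the least-cost match.

    source_costs maps a source prefix (bit string) to the cost of its filter.
    A prefix p is dropped when some proper prefix q of p (an ancestor element
    in the trie) is also stored with a strictly smaller cost: every packet
    that matches p also matches q, so p never wins.
    """
    kept = {}
    for p, cost in source_costs.items():
        ancestors = (q for q in source_costs if q != p and p.startswith(q))
        if all(source_costs[q] >= cost for q in ancestors):
            kept[p] = cost
    return kept


# Hypothetical source trie: 11* (cost 5) is dominated by 1* (cost 2).
print(prune_dominated({"1": 2, "11": 5, "1100": 1}))
```

Applied to the root source trie of the running example (prefixes 1, 1100, 1110, and 1111 with costs 7, 1, 2, and 3), nothing is removed, because the only ancestor, 1*, has the highest cost.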

PAGE 30

Figure 2-7. Algorithm to remove dominated nodes

PAGE 31

$\max_P \{H(P) + \mathrm{nodes}(P)\} \le z$    (2-1)

Note that every T, $T \in$ 2DMTb(z), can be searched with at most z memory accesses per packet. So, every T in 2DMTb(k) provides the desired performance.

In the third variant, 2DMTc, the destination trie is a VST and all source tries that hang from the same destination-trie node obey the same height constraint. A 2DMTc T is in 2DMTc(z) iff Equation 2-1 is satisfied for every root-to-leaf path P in T. As is the case for 2DMTb(z)s, every T, $T \in$ 2DMTc(z), can be searched with at most z memory accesses per packet. So, every T in 2DMTc(k) provides the desired performance.

In the fourth and final variant, 2DMTd, the destination trie is a VST and no constraint is placed on the source tries that hang off of a node. A 2DMTd T is in 2DMTd(z) iff Equation 2-1 is satisfied for every root-to-leaf path P in T. As is the case for 2DMTb(z)s and 2DMTc(z)s, every T, $T \in$ 2DMTd(z), can be searched with at most z memory accesses per packet. So, every T in 2DMTd(k) provides the desired performance.

Let 2DMTa(k) be the union of 2DMTa(y, z) over all y and z satisfying $y(z+1) \le k$ and let 2DMT(k) be the set of 2DMTs that can be searched with at most k memory accesses. Note that even though no constraints are placed on the source tries of a 2DMTd, a space-optimal 2DMTd(k) may not be a space-optimal 2DMT(k). This is because 2DMT(k) may contain tries that have root-to-leaf paths P that do not satisfy Equation 2-1. Of course, every path P that violates Equation 2-1 must be a path for which no search query is possible.

PAGE 33

Figure caption: Algorithm to compute f(s, e, z).

levels each. Let $f(s,e,z) = \mathrm{nodes}(s)\cdot 2^{e-s+1} + \mathrm{sourceSum}(s,e,z)$. If level l of a 2DMTa has the start-end pair [s, e], then f(s, e, z) gives the minimum number of elements at level l of the 2DMTa (this count includes elements in the source tries that hang from the level l dest-trie elements). For $s > e$, define $f(s,e,z) = 0$. Algorithm f2DMTa (Figure 2-8) computes f(s, e, z), $0 \le s \le e$ [...]
PAGE 34

z+1, z)}

So, once the Opt1 values have been computed the number of elements in the space-optimal 2DMTa(k) may be determined expending an additional $O(k)$ time. The overall complexity of our algorithm to compute the number of elements in a space-optimal 2DMTa(k) is $O(n^2W^3k + W^2k^2) = O(n^2W^3k)$ (under the assumption $n > \max\{k, W\}$). Its space complexity is $O(n^2W^2)$ (the space complexity is determined by the space

PAGE 35

2.6.1. Let Opt2(j, z) be the minimum number of elements in a space-optimal 2DMTb(z) that covers only levels 0 through j of O. When the destination trie has only 1 level, Opt2(j, z) = f(0, j, z). When the destination trie has more than 1 level, the start-end pair for the last level is [i+1, j] for some i in the range [0, j-1]. If the source tries hanging from the last level of the destination trie have at most y levels, the last level contributes y+1 to the value of H(P) + number of nodes on P for some path P. So, we get the following recurrence for Opt2.

PAGE 36

(2-6)

PAGE 37

where

Using a strategy similar to that used to compute f in Section 2.6.1, we can compute the needed g values in $O(n^2W^3k)$ time when the source tries are VSTs and in $O(nW^3k)$ time when the source tries are FSTs. There are $O(nW^2k)$ Opt3 values to compute using Equation 2-6. Each of these is computed in $O(1)$ time (exclusive of the time required to compute the needed Opt3($\cdot$, 0, $\cdot$) values), for a total time of $O(nW^2k)$. We also have $O(nWk)$ Opt3 values to compute using Equation 2-9. Each of these takes $O(Wk)$ time. So, the total time needed to compute these $O(nWk)$ Opt3 values is $O(nW^2k^2)$. Hence the time needed to determine the number of elements, Opt3(root(O), k) = Opt3(root(O), 0, k), in the space-optimal 2DMTc(k) for the classifier is $O(n^2W^3k + nW^2k^2) = O(n^2W^3k)$ when the source tries are VSTs and $O(nW^3k + nW^2k^2)$ when the source tries are FSTs. The space complexity is $O(n^2W^2)$.

2.6.3 except that M now is a node of the 2DMTd. sA(N, q) denotes the subset of sourceT(N, q) that hang from elements of M that have a non-null child. sB(N, q) denotes sourceT(N, q) $-$ sA(N, q). $A_{q+1}(N, s)$ comprises all nodes $R \in D_{q+1}(N)$ for which there is an element E in M with the properties (a)

PAGE 38

To compute Opt4(root(O), k), we must compute $O(nWk)$ Opt4($\cdot$, $\cdot$) values. It takes $O(nW)$ time to compute each Opt4($\cdot$, $\cdot$) value exclusive of the time needed to compute the opt($\cdot$, $\cdot$) and h($\cdot$, $\cdot$, $\cdot$, $\cdot$) values. So, the total time needed is $O(n^2W^2k)$ plus the time for the opt and h values. For each s, it takes $O(nW^2k)$ time [32, 38] to compute opt(s, $\cdot$). Since there are $O(nW)$ s values, all opt($\cdot$, $\cdot$) values may be computed in $O(n^2W^3k)$ time. For h(), we see that there are $O(nW)$ s values, $O(nW)$ N values, $O(W)$ q values and $O(k)$ z values. However, for any triple (N, q, z), only $O(n)$ s values are possible. So, the total

PAGE 39

2.6.1-2.6.3. If a source trie of T is on no search path P with H(P) + nodes(P) = k, then this source trie may be replaced with an optimal source trie of larger height. This replacement, of course, is done only if the larger-height source trie has a smaller number of elements.

PAGE 40

Figure caption: 2DMT of Figure 2-4 augmented with switch pointers.

2, 36]. Switch pointers also may be used with 2DMTs to reduce the number of memory accesses during the search for a least-cost matching filter. Let E be an element field in the dest-trie of a 2DMT T and let sT = E.data be the source trie hanging from E. Assume that sT $\ne$ null. Let E1 be an element field in sT such that E1.child = null. Let E2 be the nearest ancestor of E such that the source trie E2.data has an element field E3 such that Q(E1) is a proper prefix of Q(E3) and E3.data $\ne$ null. If there is no such E2 the switch pointer list for E1 is null. Assume that such an E2 exists and let F be an E3 in E2.data such that L(Q(E3)) is least. Let N be the node of E2.data that contains the element field F. The element E1 maintains a list of pointers to all element fields G of N such that Q(E1) is a (proper) prefix of Q(G) and either the data or child (or both) field of G is not null. The nodes on this pointer list, which is called the switch-pointer list, are indexed by bit sequences b such that Q(E1)||b = Q(G). Figure 2-9 shows the 2DMT of Figure 2-4 augmented with switch-pointer lists.

In a 2DMT with switch pointers (2DMTS), source tries are augmented with switch-pointer lists. Additionally, every element field E1 in every source trie stores the least-cost filter (D, S) that matches the destination prefix Q(E) and the source prefix

PAGE 41

2-9. We first search the dest-trie for 00010. This search stops at the element with Q($\cdot$) = 0001. We continue by searching for 11101 in the source trie hanging from 0001. This search stops at the element with Q($\cdot$) = 1. This element has both the least-cost filter that matches (0001, 1) (it is empty in this case) and a list of switch pointers. Since the switch pointers on this list are indexed with 1-bit indexes, we use the next bit of sa to move into an ancestor source trie. The search continues at the element field with Q($\cdot$) = 11 of the source trie hanging from the destination-trie element with Q($\cdot$) = 000. The search for sa = 11101 terminates at this element. This element stores the least-cost filter that matches (000, 11) as well as a list of switch pointers. Since these switch pointers use 2-bit indexes, we use the next two bits of sa (i.e., 10) to move to an element in an ancestor source trie. We get to the element with Q($\cdot$) = 1110 in the source trie hanging from the root of the destination trie. When we search from here for sa, the search terminates at this element. Since this element has no switch pointer, we are done. By examining the least-cost filters encountered on this search path, we may determine the least-cost filter that matches (da, sa).

When implementing a 2DMTS, we do not actually store switch-pointer lists explicitly. This is because of the excessive space required by these lists and because of the excessive

PAGE 42

36] have observed that when switch pointers are used, a lookup may fail to find the least-cost filter because, when a switch pointer is followed, shorter source prefixes are overlooked in the new source trie. To overcome this problem, [36] proposes the addition of the field storedFilter for every element E associated with a dest-source prefix pair (da, sa). This field records the least-cost filter from SF(da, sa), where SF(da, sa) = {(d, s) | d is a prefix of da and s is a prefix of sa}. We adopt the same strategy in our use of switch pointers.

Although we may convert the space-optimal 2DMTa(k), 2DMTb(k), 2DMTc(k) and 2DMTd(k) tries obtained using the algorithms of Sections 2.6.1-2.6.4 into 2DMTSs and improve search performance, there appears to be no easy way to modify the algorithms of Sections 2.6.1-2.6.4 to construct space-optimal constrained 2DMTSs that may be searched with at most k memory accesses.

Let T be an l-level FST. Let $s_i$ be the stride for nodes at level i of T. $(s_0, \ldots, s_l)$ is the stride sequence for T. A set of FSTs is said to be uniform iff all FSTs in the set have the same stride sequence. For purposes of this definition we regard two sequences as the same even when one is a proper prefix of the other. Let 2DMTSa(k) be a 2DMTS in which the destination trie is an FST or a VST and all source tries define a uniform set of FSTs. Figure 2-10 shows a 2DMTSa for the data of Table 2-1. In this 2DMTSa, the stride sequence for the source tries is 2, 1, 1. When a uniform set of source tries is used, the switch pointer scheme is simplified as shown in Figure 2-10. It is easy to see that the number of memory accesses needed to search a 2DMTSa is at most (height of

PAGE 43

Figure caption: 2DMTSa for the filters of Table 2-1.

destination trie + maximum height of a source trie). So, the 2DMTSa of Figure 2-10 is a 2DMTSa(6).

In 2DMTSa, when computing the storedFilter for an element (da, sa), we may reduce the set of candidate filters SF(da, sa) to SubSF(da, sa, s, e) = {(dp, sp) | (dp, sp) $\in$ SF(da, sa) and $s \le sp \le e$} (that is, the length of sp lies between s and e), where [s, e] is the start-end pair of the node in which E resides. We see that, along a search path using switch pointers, storedFilters computed from SubSFs guarantee that all matching filters of shorter source prefixes have been taken into account. Using SubSF instead of SF to compute storedFilters reduces the number of storedFilter fields that actually contain a filter. This, in turn, reduces the total memory required (null storedFilter fields need be only 1 bit each). Our experiments indicate that using SubSF is much more efficient than using SF(da, sa) when the packed array [38] and butler node [17] techniques are used to compress a multibit node. Both of these techniques attempt to replace a subtrie or a multibit node with a small amount of actual data (prefixes, storedFilters, pointers, etc.) by a single node that contains these data.

Figure 2-11 gives an algorithm to determine an approximation to the minimum number of elements in a space-optimal 2DMTSa(k). This algorithm essentially tries all combinations of x and k-x, where x is the height of the destination VST and k-x is the maximum height of the source FSTs in a uniform set of FSTs. The VSTs used

PAGE 44

Figure caption: Heuristic to determine a good 2DMTSa(k).

for the destination trie are required to be space-optimal VSTs. opt() refers to the VST algorithm of [32]. Once this VST algorithm is run from the statement outside the for loop, we have the number of elements in the optimal height l VST, $1 \le l$ [...] k) as the overall complexity of Algorithm 2DMTSa. When the destination trie is an FST, the complexity of opt(O, x) is $O(Wk)$ and the total complexity is $O(n^2W^2)$.
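The enumeration over x and k-x that the heuristic performs can be sketched in Python as follows. This is not the algorithm of Figure 2-11 itself: `best_split` is a hypothetical name, and the two cost callables (`dest_elements`, standing in for opt(O, x), and `source_elements`, standing in for the element count of the uniform source FSTs of height at most k-x) are assumptions supplied by the caller.

```python
from typing import Callable, Optional, Tuple


def best_split(
    k: int,
    dest_elements: Callable[[int], int],
    source_elements: Callable[[int, int], int],
) -> Optional[Tuple[int, int]]:
    """Try every split of the access budget k into x dest levels and k - x source levels.

    Returns (total elements, x) for the cheapest split, or None when k < 2
    (assumption: both the dest trie and the source tries need at least one level).
    """
    best: Optional[Tuple[int, int]] = None
    for x in range(1, k):
        total = dest_elements(x) + source_elements(x, k - x)
        if best is None or total < best[0]:
            best = (total, x)
    return best
```

With toy cost functions such as `lambda x: 100 // x` and `lambda x, h: 60 // h`, a budget of k = 4 gives totals 120, 80 and 93 for x = 1, 2, 3, so the sketch selects x = 2.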

PAGE 45

42]. Each of these data sets actually has 10 different databases of rules. So, in all, we have 120 databases of 5-dimensional rules. The data sets were generated using the twelve parameter files (acl[1-5]_seed, fw[1-5]_seed, ipc[1-2]_seed) given in [43] and the specification "number of filters = 20000, smoothness = 0, address scope = 0, application scope = 0" (these parameters are described in [42]). The data sets, which are named ACL1 through ACL5 (Access Control List), FW1 through FW5 (Firewall), IPC1 and IPC2 (IP Chain) have, respectively, 20K, 19K, 19K, 19K, 12K, 19K, 19K, 18K, 17K, 17K, 19K, and 20K rules, on average, in each database. Our 2-dimensional data sets, which were derived from our 5-dimensional data sets, have, respectively, 20K, 19K, 10K, 13K, 5K, 19K, 19K, 18K, 17K, 17K, 16K and 20K rules on average in each database. The 2-dimensional rules were obtained from our 5-dimensional rules by stripping off the source and destination port fields as well as the protocol field; the dest and source prefix fields were retained. Following this stripping process, duplicates were deleted (i.e., two rules are considered duplicate if they have the same dest prefix and the same source prefix).

For the postprocessing step (Section 2.6.5), we examined each source trie T in the constructed two-dimensional trie. Let E be the element of the destination trie from which T hangs. If E.child $\ne$ null, T was not changed. If E.child = null, T was redetermined permitting its height to be as much as $k - H(P) - \mathrm{nodes}(P)$, where P is the path from the root of the destination trie to E and k, H(P) and nodes(P) are as defined in Section 2.6.
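The derivation of the 2-dimensional data sets described above (strip the port and protocol fields, then delete duplicates) can be sketched in Python. The `Rule5D` record and its field names are hypothetical; the text does not specify the rule file layout.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Rule5D(NamedTuple):
    """Hypothetical 5-dimensional rule record."""
    dest_prefix: str
    src_prefix: str
    src_ports: Tuple[int, int]
    dst_ports: Tuple[int, int]
    protocol: int


def to_two_dimensional(rules: Iterable[Rule5D]) -> List[Tuple[str, str]]:
    """Keep only (dest prefix, source prefix) and drop duplicate pairs.

    The first occurrence of each pair is kept, so the original rule order
    (and hence any priority implied by it) is preserved.
    """
    seen = set()
    out: List[Tuple[str, str]] = []
    for rule in rules:
        key = (rule.dest_prefix, rule.src_prefix)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out


sample = [
    Rule5D("0001*", "1*", (0, 65535), (80, 80), 6),
    Rule5D("0001*", "1*", (0, 65535), (443, 443), 6),  # same prefix pair as above
    Rule5D("000*", "11*", (0, 1023), (0, 65535), 17),
]
```

Under this sketch the three sample rules collapse to two 2-dimensional rules, which is the kind of shrinkage seen, for example, between the 19K-rule and 10K-rule versions of ACL3.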

PAGE 46

2-12 gives the average number of elements in the space-optimal constrained 2DMTs with VST source tries for our 12 data sets. When this number exceeds $10^9$, the figure uses $10^9$ to complete the plot. For 2DMTa, 2DMTb and 2DMTc only the data with postprocessing included is shown. When VST source tries are used, space-optimal 2DMT(k)s have up to 22% fewer elements than those with FST destination tries. Although prior to postprocessing space-optimal 2DMTas have significantly more elements than do space-optimal 2DMTbs, which in turn have significantly more elements than space-optimal 2DMTcs, following postprocessing the difference was considerably less. Prior to postprocessing, the number of elements in a 2DMT decreases as k increases. However, following postprocessing the number of elements increased as k was increased for some of our data sets. For example, for FW3, 2DMTa(26)s, at times, had fewer elements than did 2DMTa(28)s. On 5 of our 12 data sets (ACL2-5, IPC1), space-optimal 2DMTds have significantly fewer elements than do space-optimal 2DMTcs, but the difference was slight for the remaining 7 data sets.

Next, we experimented with our heuristics for 2DMTs with switch pointers. (Recall that switch pointers may not be used with the bucketing scheme of Baboescu et al. [2].) We found that 2DMTSa(k)s with FST destination tries have up to 30% more elements than 2DMTSa(k)s with VST destination tries. Figure 2-13 compares the memory requirements of 2DMTd(k)s using VST source tries and 2DMTSa(k)s using VST dest tries. For this comparison, the compression techniques packed array [38] and butler node [17] were applied to 2DMTds and 2DMTSas. For benchmarking purposes, the size of a packed array and of a butler node was set to 144 bits. This size setting allows us to examine a packed array or a butler node with a single memory access of a QDRII SRAM, which supports a 144-bit burst per memory access. Both element fields, either in the dest trie or in the source trie, were set to have 48 bits each.

PAGE 47

Figure 2-12. Number of elements in space-optimal constrained 2DMTs with postprocessing. Source tries are VSTs. A) ACL1. B) ACL2. C) ACL3. D) ACL4. E) ACL5. F) FW1. G) FW2. H) FW3. I) FW4. J) FW5. K) IPC1. L) IPC2.

Tables 2-2 and 2-3 give the total memory requirements of 2DMTSa(k)s and 2DMTd(k)s. For $k \ge 6$, 2DMTd(k)s required significantly less memory than that required by 2DMTSa(k)s on 7 of our twelve data sets (ACL1, FW1-5, IPC2). For the remaining 5 data sets, 2DMTd(k)s became superior to 2DMTSa(k)s only for larger values of k (for

PAGE 48

Table 2-2. Memory (KBytes) required by 2DMTSas with VST dest tries

 k    ACL1    ACL2    ACL3    ACL4    ACL5     FW1     FW2     FW3     FW4     FW5    IPC1    IPC2
 6   23459   29822   17481   21201    6025   24860   14640   17897   28950   18949   28514   13627
 7    5665    8139    8257   12884     989    5395    4588    8314    7365    6642   10599    5905
 8    3847    5621    5407    5408     558    4155    3316    3593    5342    2689    6496    4524
 9    2870    2900    2959    3122     388    3662    2044    2689    4697    2458    4624    4310
10    2756    2494    2034    2592     257    2484    1645    2663    2688    2262    3767    3059
11    2756    2325    1572    1881     233    2483    1454    2075    2572    1677    2519    2984
12    2756    2287    1295    1467     220    2483    1454    2050    2572    1676    1718    2945
13    2756    2285    1206    1373     218    2411    1454    2048    2367    1674    1646    2893
14    2756    2284    1156    1340     218    2411    1454    2047    2367    1674    1640    2893
15    2756    2284    1131    1326     218    2411    1454    2047    2367    1674    1640    2893
16    2756    2284    1104    1321     218    2411    1454    2047    2367    1674    1640    2893

Table 2-3. Memory (KBytes) required by 2DMTds with VST source tries

 k    ACL1    ACL2    ACL3    ACL4    ACL5     FW1     FW2     FW3     FW4     FW5     IPC1    IPC2
 6     619   29201  561023  721949  177478   18443     537   15130    6933   20586  1008361     421
 7     619   20682  192271  188870   31184     384     521     362     378     314   273127     405
 8     618     553   19741   23303    5469     361     520     304     333     312    55683     401
 9     618     510    8966   12435    1369     358     520     289     331     311    16718     400
10     618     500    2976    6892     555     357     520     289     331     310     6506     400
11     618     495    1843    3701     173     357     520     289     331     310     1590     400
12     618     492    1471    2583     113     357     520     289     331     310      616     400
13     618     491     757    2145      89     357     520     289     331     310      434     400
14     618     491     443    1160      75     357     520     289     331     310      292     400
15     618     491     284     660      65     357     520     289     331     310      266     400
16     618     491     215     475      60     357     520     289     331     310      259     400

36], EGT-PCs using tree bitmap tries (EGT-PC-TBMs) [2], and HyperCuts [33]. In their GoT-MTs, Srinivasan et al. [36] used 4-level FST dest tries and 5-level FST source tries; they also employed packed arrays [38] and butler nodes. The strides for the dest trie were set to 8-8-8-8 and those for the source tries to 6-6-6-6-8. In EGT-PC-TBMs, we set the size of a tree bitmap node [7] to 144 bits so that such a node could be examined with a single memory access using a QDRII SRAM. Of the several HyperCuts schemes proposed in [33], HyperCuts-4 is the most efficient. So, we used this version of HyperCuts in our experiments. Since [33] has

PAGE 49

Figure 2-13. Memory requirements of 2DMTds and 2DMTSas. A) ACL1. B) ACL2. C) ACL3. D) ACL4. E) ACL5. F) FW1. G) FW2. H) FW3. I) FW4. J) FW5. K) IPC1. L) IPC2.

established the superiority of HyperCuts over EGT-PCs [2], we did not include EGT-PCs in our experiments. Table 2-4 gives the memory and number of memory accesses per lookup required by GoT-MT, EGT-PC-TBM and HyperCuts. Figure 2-14 plots this data. As can be seen, GoT-MTs, which need only 9 accesses for a lookup, were superior to both EGT-PC-TBMs and HyperCuts in terms of the number of memory accesses per lookup metric. The

PAGE 50

Table 2-4. Total memory (KBytes) and number of memory accesses (MAs) required by GoT-MTs, EGT-PC-TBMs, and HyperCuts structures on 2-dimensional data sets

             GoT-MT        EGT-PC-TBM       HyperCuts
Data Set    Mem   MAs      Mem   MAs        Mem   MAs
ACL1       5322     9     1101    21        463    13
ACL2       8906     9      919    26       9094    24
ACL3       6475     9      414    63       1605    43
ACL4       7162     9      597    76      11136    56
ACL5       2228     9      141    40        123    37
FW1        5685     9     1076    23     200901    28
FW2        7796     9     1015    22      67364    30
FW3        3955     9     1037    23     230229    26
FW4        6954     9      753    23      64675    28
FW5        6013     9     1000    23     142792    28
IPC1       8189     9      628    45       5111    38
IPC2       5790     9     1149    23      87482    27

Figure 2-14. Performance of GoT-MTs, EGT-PC-TBMs, and HyperCuts on 2-dimensional data sets. A) Memory. B) Number of accesses.

PAGE 51

2]. We start with a 2D1BT for the destination and source prefixes. All rules that have the same dest-source prefix pair (dp, sp) are placed in a bucket that is pointed at from the appropriate source trie node of the 2-dimensional trie. Since dp and sp are defined by the path to this bucket, the dest and source prefix fields are not stored explicitly in a bucket. However, the source port range, dest port range, protocol type, priority and action are stored for each rule in the bucket. The 2DMTd algorithm of this paper is used to obtain a 2DMT for the constructed 2D1BT with buckets. In this 2DMT trie, each source-trie element is a pointer to a bucket. 2DMTds with buckets are called extended 2DMTds. During source-prefix expansion in 2DMTds, a source prefix S may expand to several elements. When this happens, each of the elements expanded to inherits the bucket of rules associated with S. To save space, we link together all buckets (in order of the length of their associated source prefixes) inherited by an element. Note that in a 2DMTd, a source trie may store a prefix from another source trie. In this case, the rules associated with this prefix are stored in both source tries.
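The bucket construction described above can be sketched in Python. The tuple layout of a rule, the function names, and the choice of shortest-prefix-first order for linked buckets are all assumptions made for illustration; the text states only that buckets are linked in order of source-prefix length.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

PrefixPair = Tuple[str, str]


def build_buckets(rules: Iterable[tuple]) -> Dict[PrefixPair, List[tuple]]:
    """Group 5-dimensional rules by their (dest prefix, source prefix) pair.

    Each rule is assumed to be (dp, sp, src_ports, dst_ports, protocol,
    priority, action). The prefixes are implied by the bucket's position in
    the trie, so only the remaining fields are stored in the bucket.
    """
    buckets: Dict[PrefixPair, List[tuple]] = defaultdict(list)
    for dp, sp, *rest in rules:
        buckets[(dp, sp)].append(tuple(rest))
    return dict(buckets)


def inherited_chain(
    dp: str, inherited_source_prefixes: Iterable[str], buckets: Dict[PrefixPair, List[tuple]]
) -> List[List[tuple]]:
    """Link the buckets inherited by one expanded element, shortest source prefix first."""
    ordered = sorted(inherited_source_prefixes, key=len)
    return [buckets[(dp, sp)] for sp in ordered if (dp, sp) in buckets]


rules = [
    ("00*", "1*", (0, 65535), (80, 80), 6, 1, "permit"),
    ("00*", "1*", (0, 65535), (443, 443), 6, 2, "deny"),
    ("00*", "11*", (0, 1023), (0, 65535), 17, 3, "permit"),
]
buckets = build_buckets(rules)
```

Here the first two rules share a bucket, and an element that inherits both source prefixes 1* and 11* would be linked to the two buckets in that order.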

PAGE 52

2] state that when 2-dimensional tries with buckets are used, as above, for 5-dimensional tables, most buckets have no more than 5 rules and no bucket has more than 20 rules. While this observation was true of the data sets used in [2], some buckets had significantly more rules for our data sets. For example, in FW4, about 100 rules contain wildcards in both the dest and source prefix fields. These rules may be removed from the original data set and stored in a search structure that is optimized for the remaining 3 fields. We note that this strategy of storing a large cluster of rules with wildcards in the dest and source prefix fields in a separate structure was used earlier in the HyperCuts [33] scheme. The data reported in the following figures and tables are only for structures constructed for the rules that remain after rules with wildcards in both dest and source prefix fields are removed.

Extended 2DMTds were constructed under the same constraints as used for 2DMTds. That is, the number of memory accesses per lookup was fixed to the smallest number such that an increase in this number would not reduce the total memory significantly. Table 2-5 compares the performance of extended 2DMTds, EGT-PC-TBMs and HyperCuts on our twelve 5-dimensional data sets. Figure 2-15 plots this data. The number of bits per rule required by the extended 2DMTds structure was between 107 and 336; the average was 223. The corresponding numbers for EGT-PC-TBMs and HyperCuts are, respectively, 134, 528, 405, and 242, 163519, 56801. It is important to note that there is a wide variation in the bits/rule required by HyperCuts; the bits/rule required by extended 2DMTds is far more predictable. In particular, [33] reports that the performance of HyperCuts is not good for firewall-like databases as these tend to have a high frequency of wildcards in the source and/or dest fields. In fact, [33] reports that a 10% presence of wildcards in either the source or dest prefix fields resulted in a steep increase in memory requirement! This observation is confirmed by our experiments. HyperCuts exhibited its best bits/rule performance on ACL1 and ACL5 (242 and 400, respectively), in which the frequency of wildcards in either the source or dest fields is less than 1%. It exhibited its

PAGE 53

Table 2-5. Total memory (KBytes) and number of memory accesses (MAs) required by extended 2DMTds, EGT-PC-TBMs, and HyperCuts structures on 5-dimensional data sets

           extended 2DMTds   EGT-PC-TBMs      HyperCuts
Data Set    Mem   MAs        Mem   MAs        Mem   MAs
ACL1        821     9       1205    24        605    16
ACL2        702    13       1000    31      10487    24
ACL3        309    41        510    87      19591    43
ACL4        444    44        701   101      17661    44
ACL5        156    29        195    56        600    44
FW1         518    11       1177    27     308121    26
FW2         671     8       1110    24     189751    24
FW3         429    12       1135    27     341813    23
FW4         547    11        874    27     199096    24
FW5         435    11       1094    27     347478    20
IPC1        431    27        719    58      38863    51
IPC2        573     8       1242    25      64394    24

As can be seen, extended 2DMTds were superior to EGT-PC-TBMs and HyperCuts on both criteria (except on ACL1 for which HyperCuts required 26% less total memory than required by extended 2DMTds). On the memory requirement metric, EGT-PC-TBMs were generally superior to HyperCuts, while on the other metric, number of memory accesses per lookup, HyperCuts were superior to EGT-PC-TBMs. The memory required by the EGT-PC-TBMs normalized by that of extended 2DMTds was between 1.25 and 2.65 (the average and standard deviation were 1.82 and 0.45, respectively). For HyperCuts, the normalized memory was between 0.74 and 799 (the average and standard

PAGE 54

Figure 2-15. Performance of extended 2DMTds, EGT-PC-TBMs, and HyperCuts on 5-dimensional data sets. A) Memory. B) Accesses.

deviation were 264 and 306, respectively). The number of memory accesses required to search EGT-PC-TBMs normalized by that for extended 2DMTds was between 1.93 and 3.13; the average and standard deviation were 2.44 and 0.35, respectively. The corresponding numbers for HyperCuts were 1, 3, 1.95, and 0.63. For all three schemes, the reported memory and accesses are only for the rules that remain after rules with wildcards in both dest and source prefix fields are removed. In summary, the memory requirement of 2DMTds is more predictable and far lower on both average and worst-case data; 2DMTds also are superior on the memory access criterion.

2], using between 29% and 60% the memory, space-optimal two-dimensional multibit tries may be searched with between 25% and 38% of the memory accesses. For 5-dimensional classifiers, extended space-optimal constrained two-dimensional multibit tries using the

PAGE 55

33]. We have proposed a heuristic to construct two-dimensional multibit tries with switch pointers. These tries may be used only with two-dimensional filters. Using the same number of accesses per lookup budget as for GoT-MTs [36] with switch pointers, our two-dimensional multibit tries required just 17% to 74% of the memory.
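The normalized memory figures quoted for the 5-dimensional experiments can be re-derived from the memory columns of Table 2-5. The Python sketch below does this; the dictionary layout and the helper name `normalized` are illustrative choices, while the numbers are copied from the table.

```python
from statistics import mean

# Memory (KBytes) from Table 2-5: (extended 2DMTds, EGT-PC-TBMs, HyperCuts).
TABLE_2_5_MEM = {
    "ACL1": (821, 1205, 605),
    "ACL2": (702, 1000, 10487),
    "ACL3": (309, 510, 19591),
    "ACL4": (444, 701, 17661),
    "ACL5": (156, 195, 600),
    "FW1": (518, 1177, 308121),
    "FW2": (671, 1110, 189751),
    "FW3": (429, 1135, 341813),
    "FW4": (547, 874, 199096),
    "FW5": (435, 1094, 347478),
    "IPC1": (431, 719, 38863),
    "IPC2": (573, 1242, 64394),
}


def normalized(column: int) -> list:
    """Memory of the scheme in `column`, divided by that of extended 2DMTds."""
    return [row[column] / row[0] for row in TABLE_2_5_MEM.values()]


egt = normalized(1)        # EGT-PC-TBMs relative to extended 2DMTds
hypercuts = normalized(2)  # HyperCuts relative to extended 2DMTds
print(round(min(egt), 2), round(max(egt), 2), round(mean(egt), 2))
print(round(min(hypercuts), 2), round(max(hypercuts)))
```

The extremes and the EGT-PC-TBM average computed this way agree with the ranges reported in the text (1.25 to 2.65 with average 1.82, and 0.74 to 799).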

PAGE 56

23], and 2DMTas and 2DMTds in Chapter 2. The variable-stride one-dimensional tries constructed by our heuristic require significantly less per-stage memory than required by optimal pipelined fixed-stride tries. Also, the pipelined two-dimensional multibit tries constructed by our proposed heuristics are superior, for pipelined architectures, to two-dimensional multibit tries constructed by the best algorithms proposed in Chapter 2 for non-pipelined architectures.

Section 3.1 reviews related work on multibit tries developed for pipelined architectures. In Section 3.2 we develop a (heuristic) algorithm for pipelined one-dimensional variable-stride tries. For two-dimensional tries, we consider two strategies to map a two-dimensional trie onto a pipelined architecture: coarse grain and fine grain. Our algorithms for coarse-grain mapping are developed in Section 3.3 and those for fine-grain mapping in Section 3.4. An experimental evaluation of our algorithms is conducted in Section 3.5.

4] and Kim and Sahni [23] have proposed algorithms for the construction of optimal fixed-stride tries for one-dimensional prefix tables; these fixed-stride tries are optimized for pipelined architectures. Basu and Narlikar [4] list three constraints for optimal pipelined fixed-stride multibit tries: (C1) Each level in the fixed-stride trie must fit in a single pipeline stage; (C2) The maximum memory allocated to a stage (over all stages) is minimized; (C3) The total memory used is minimized subject to the first two constraints.
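Constraints C2 and C3 can be made concrete with a small Python sketch. It assumes the element count used in Chapter 2, namely that an FST level covering 1-bit-trie levels s through e has nodes(s) · 2^(e-s+1) elements, and it picks a stride sequence by exhaustive enumeration. That brute force is a stand-in chosen for brevity, not the dynamic programs of [4] or [23], and all names in it are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Optional, Tuple


def fst_stage_elements(level_nodes: List[int], strides: List[int]) -> List[int]:
    """Elements per FST level; level_nodes[s] is the node count at level s of the 1-bit trie."""
    stages, start = [], 0
    for stride in strides:
        stages.append(level_nodes[start] * 2 ** stride)
        start += stride
    return stages


def stride_sequences(width: int, levels: int) -> Iterator[List[int]]:
    """All ways to cut `width` 1-bit levels into `levels` consecutive strides."""
    for cuts in combinations(range(1, width), levels - 1):
        bounds = (0,) + cuts + (width,)
        yield [bounds[i + 1] - bounds[i] for i in range(levels)]


def best_pipelined_fst(level_nodes: List[int], k: int) -> Optional[Tuple[Tuple[int, int], List[int]]]:
    """Minimize the maximum per-stage memory (C2), then the total memory (C3), using at most k levels."""
    width = len(level_nodes)
    best = None
    for levels in range(1, min(k, width) + 1):
        for strides in stride_sequences(width, levels):
            stages = fst_stage_elements(level_nodes, strides)
            key = (max(stages), sum(stages))
            if best is None or key < best[0]:
                best = (key, strides)
    return best
```

For a toy 1-bit trie with 1, 2, 3 and 4 nodes on its four levels and k = 2 stages, the strides (3, 1) give stage sizes 8 and 8, which beats (2, 2) with sizes 4 and 12 on the C2 criterion even though both use 16 elements in total.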

PAGE 57

4] assert that constraint C3 reduces pipeline disruption resulting from rule-table updates. Although the algorithm proposed in [4] constructs fixed-stride tries that satisfy constraints C1 and C2, the constructed tries may violate constraint C3. Kim and Sahni [23] have developed faster algorithms to construct pipelined fixed-stride tries; their tries satisfy all three of the constraints C1-C3. FSTs that satisfy C1-C3 are called optimal pipelined FSTs.

3.2.1 A Dynamic Programming Construction

Algorithms to construct optimal pipelined one-dimensional FSTs (i.e., FSTs that satisfy constraints C1-C3) have been developed in [23]. So, we consider only the construction of pipelined one-dimensional VSTs in this section. Optimal pipelined VSTs satisfy constraints C1-C3 (although, in C1, we replace "fixed-stride trie" with "variable-stride trie"). Although we do not develop an algorithm that constructs an optimal pipelined VST, the heuristic proposed by us, in this section, constructs an "approximately optimal" VST, which when mapped onto a pipelined architecture using constraint C1 results in a maximum per-stage memory requirement that is considerably less than that for an optimal pipelined FST for the given rule table.

Let O be the 1-bit trie for the given filter set, let N be a node of O and let ST(N) be the subtree of O that is rooted at N. Let Aopt(N, r) denote the approximately optimal (pipelined) VST for the subtree ST(N); this VST has at most r levels. Let Aopt(N, r).E(l) be the (total) number of elements at level l of Aopt(N, r), $0 \le l$ [...]
PAGE 58

3-1 through 3-3 may be solved directly to determine Aopt(root(O), k), the time complexity of the resulting algorithm is reduced by defining auxiliary equations. For this purpose, let AoptSTs(N, i, r-1), $i > 0$, $r > 1$, denote the set of approximately optimal VSTs for $D_i(N)$ (AoptSTs(N, i, r-1) has one VST for each member of $D_i(N)$); each VST has at most r-1 levels. Let AoptSTs(N, i, r-1).E(l) be the sum of the number of elements at level l of each VST of AoptSTs(N, i, r-1). So,

PAGE 59

3-1 through 3-5, the total time for this is $O(nW^2k^2)$.

3.2.1 is mapped onto a k stage pipeline in the most straightforward way (i.e., nodes at level l of the VST are packed into stage l+1, $0 \le l$ [...] q. Hence, we are motivated to solve the following tree packing problem:

Tree Packing (TP)
Input: Two integers $k > 0$ and $M > 0$ and a tree T, each of whose nodes has a positive size.
Output: "Yes" iff the nodes of T can be packed into k bins, each of capacity M. The bins are indexed 1 through k and the packing is constrained so that for every node packed into bin q, each of its descendent nodes is packed into a bin with index more than q.

By performing a binary (or other) search over M, we may use an algorithm for TP to determine an optimal packing (i.e., one with least M) of Aopt(root(O), k) into a k-stage pipeline. Unfortunately, problem TP is NP-complete. This may be shown by using a reduction from the partition problem [13]. In the partition problem, we are given n positive integers $s_i$, $1 \le i \le n$, whose sum is 2B and we are to determine whether any subset of the given $s_i$'s sums to B.
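The use of a TP decision procedure inside a binary search over M can be sketched in Python. Because TP is NP-complete, the decision procedure below is an exhaustive backtracking search intended only for very small trees; the tree encoding and both function names are assumptions made for this illustration.

```python
from typing import Dict, List, Optional, Tuple

Tree = Dict[str, Tuple[int, List[str]]]  # node name -> (size, child names)


def can_pack(tree: Tree, root: str, k: int, capacity: int) -> bool:
    """Decide TP by backtracking: k bins of the given capacity, descendants in later bins."""
    free = [capacity] * (k + 1)  # free[q] is the remaining room in bin q (bins are 1-based)

    def place(pending: List[Tuple[str, int]]) -> bool:
        if not pending:
            return True
        node, lowest_bin = pending[0]
        size, children = tree[node]
        for q in range(lowest_bin, k + 1):
            if free[q] >= size:
                free[q] -= size
                # Children may only use bins after q, which covers all descendants.
                if place(pending[1:] + [(child, q + 1) for child in children]):
                    return True
                free[q] += size
        return False

    return place([(root, 1)])


def min_capacity(tree: Tree, root: str, k: int) -> Optional[int]:
    """Binary search for the least M with a feasible packing; None if the tree is too tall for k bins."""
    lo = max(size for size, _ in tree.values())
    hi = sum(size for size, _ in tree.values())
    if not can_pack(tree, root, k, hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if can_pack(tree, root, k, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


toy_tree: Tree = {
    "r": (2, ["a", "b"]),
    "a": (4, ["c"]),
    "b": (4, ["d"]),
    "c": (8, []),
    "d": (2, []),
}
```

In the toy tree every root-to-leaf path has three nodes, so with k = 3 each level is forced into its own bin and the least feasible capacity is 10 (nodes c and d share the last bin).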

PAGE 60

All VSTs have nodes whose size is a power of 2 (more precisely, some constant times a power of 2). The TP construction of Theorem 4-1 results in node sizes that are not necessarily a power of 2. Despite Theorem 4-1, it is possible that TP restricted to nodes whose size is a power of 2 is polynomially solvable. However, we have been unable to develop a polynomial-time algorithm for this restricted version of TP. Instead, we propose a heuristic, which is motivated by the optimality of the First Fit Decreasing (FFD) algorithm to pack bins when the size of each item is a power of a, where $a \ge 2$ is an integer. In FFD [13], items are packed in decreasing order of size; when an item is considered for packing, it is packed into the first bin into which it fits; if the item fits in no existing bin, a new bin is started. Although this packing strategy does not guarantee to minimize the number of bins into which the items are packed when item sizes are arbitrary integers, the strategy works for the restricted case when the size of each item is of the form $a^i$, where $a \ge 2$ is an integer. Theorem 4-2 establishes this by considering a related problem: restricted max packing (RMP).

Let $a \ge 2$ be an integer. Let $s_i$, a power of a, be the size of the ith item, $1 \le i \le n$. Let $c_i$ be the capacity of the ith bin, $1 \le i \le k$. In the restricted max packing problem, we are to maximize the sum of the sizes of the items

PAGE 61


PAGE 62

3-7 is the first sum of Equation 3-6. From observations (2) and (3) and the fact that A is the smallest size in S, it follows that every item whose size is less than A is packed into a bin by FFD and contributes to the second sum in Equation 3-6. Since all item sizes are a power of a, no item whose size is more than A can contribute to a $y_i$. Hence, the second sum of Equation 3-7 is the second sum of Equation 3-6. So, OtherSize $\le$ FFDSize and FFD solves the RMP problem.

The optimality of FFD for RMP motivates our tree packing heuristic of Figure 3-1, which attempts to pack a tree into k bins each of size M. It is assumed that the tree height is k. The heuristic uses the notions of a ready node and a critical node. A ready node is one whose ancestors have been packed into prior bins. Only a ready node may be packed into the current bin. A critical node is an, as yet, unpacked node whose height

PAGE 63

Figure caption: Tree packing heuristic.

2-4 has a unique mapping onto a 3 stage pipeline, each level of the dest trie is mapped to a different stage of the pipeline. In this mapping, the dest-trie root and the 2-node source subtrie that hangs from it map into stage 1 of the pipeline, the left child of the dest-trie root together with its 1-node hanging source trie map into stage 2, and the level 2 dest-trie node and its 2-node hanging source trie map into stage 3. The memory requirement for the 3 stages is 12 (2 for the dest-trie node and 2+8 for the 2 source-trie nodes), 8, and 8, respectively. The maximum per-stage memory, therefore, is 12. The maximum number of memory accesses needed to process a packet is 3 for stage 1 (1 to access the dest-trie node and 1 for each of the 2 source-trie nodes), 2 for stage 2, and 3 for stage 3. For smooth operation of the pipeline, the cycle time has to be sufficient for 3 memory accesses.

In a fine-grain mapping, a dest-trie node and the nodes of the source tries hanging from it are mapped to different stages of the pipeline. So, for example, in a fine-grain

PAGE 64

2-4 onto an 8-stage pipeline, the dest-trie root could be mapped to stage 1, the root of its hanging source trie to stage 2 and the remaining node in this source trie to stage 3; the level 1 dest-trie node could be mapped to stage 4 and its 1-node hanging source trie to stage 5; the remaining 3 stages of the pipeline could be used for the level 2 dest-trie node and its 2-node hanging source trie. Each stage would perform one memory access when processing a packet and the maximum per-stage memory is 8.

In this section, we consider coarse-grain mapping only. Fine-grain mapping is considered in Section 3.4. We develop two algorithms to construct 2DMTs that are suitable for a coarse-grain mapping onto a k-stage pipeline architecture. The first constructs a 2DMT in which the dest trie is an FST and the source tries are VSTs. The constructed 2DMT is optimal (in the sense of constraints C1-C3) for coarse-grain mapping under the assumption that the 2DMT dest trie is an FST. In the second algorithm, the dest trie of the constructed 2DMT is a VST. When the constructed 2DMT is mapped so as to satisfy C1 (under the added constraint of a coarse-grain mapping), constraints C2 and C3 may not be satisfied. However, experimental results reported in Section 3.5 indicate that the mapping requires much less maximum per-stage as well as total memory than when the 2DMT of the first algorithm is used.

PAGE 65

2. Then, $O(kW)$ MME values are computed using Equation 3-8. Each of these MME values is computed in $O(W)$ time, for a total of $O(kW^2)$ time. The overall time to compute MME(W-1, k) is, therefore, $O(n^2W^2 + kW^2) = O(n^2W^2)$ (under the very realistic assumption that $k = O(n^2)$).

PAGE 66

3-9 and 3-10 is $O(kW^2)$. Note that the tree-mapping heuristic of Figure 3-1 may be employed to map the constructed trie onto a k stage pipeline.

3.2 to obtain approximately optimal coarse-grain pipelined 2DMTs whose dest and source tries are VSTs; each source VST has at most H levels. Let O be the 2D1BT for the given filter set, let N be a node of the dest trie of O and let ST(N) be the subtree of O that is rooted at N. Note that ST(N) includes all dest-trie nodes of O that are descendents of N together with the source tries that hang off of these dest-trie nodes. Let Aopt(N, r) denote the approximately optimal (pipelined) 2DMT for the subtree ST(N); the dest trie of this 2DMT is a VST that has at most r levels, and each of the source VSTs hanging off of the dest-trie nodes has at most H levels. Let Aopt(N, r).E(l) be the (total) number of elements at level l of Aopt(N, r), $0 \le l$ [...]
PAGE 67

3.2, our approximately optimal 2DMT has the property that the dest-trie subtrees are approximately optimal for the subtrees of N that they represent. When r = 1, Aopt(N, 1) has only a root node; this root represents all of ST(N). So,

3.2, the time needed to determine Aopt(root(O), k) is reduced by defining auxiliary equations. For this purpose, let AoptSTs(N, i, r-1), $i > 0$, $r > 1$, denote the set of approximately optimal pipelined 2DMTs for $D_i(N)$ (AoptSTs(N, i, r-1) has one 2DMT for each member of $D_i(N)$); the dest trie of each 2DMT has at most r-1 levels and each source trie has at most H levels. Let AoptSTs(N, i, r-1).E(l) be the sum of the number of elements at level l of each 2DMT of AoptSTs(N, i, r-1). Equations 3-4 and 3-5 hold for the new definition of AoptSTs. All values of sourceSum(N, j) may be computed in $O(n^2W^3k)$ time as in Chapter 2. Using Equations 3-11 through 3-13, 3-4 and 3-5, we may compute Aopt(root(O), k) in an

PAGE 68

We can improve the pipelined 2DMT tries constructed using Equations 3-11 through 3-13, 3-4 and 3-5 by employing the node pullup technique [4, 23]. The node pullup technique helps reduce congestion (i.e., excess memory required by a trie level) at a trie level by collapsing subtries at that level into the parent level. As an example, consider the 2D1BT ST(R) of Figure 3-2. Suppose that the optimal H-level VST for each of the source tries S1 and S2 has 100 elements. When Equations 3-11 through 3-13, 3-4 and 3-5 are used, the constructed 3-stage pipelined 2DMT for ST(R) has node R assigned to stage 1, nodes A and B assigned to stage 2, and nodes C and D together with the source tries S1 and S2 assigned to stage 3. The memory required by the 3 stages is 2 (the stride of R is 1), 4, and 204 (2 for each of C and D and 100 for each of S1 and S2), respectively. Using a node pullup, we can pull D into B (or C into A), changing the stride of the right (left) child of R to 2. When the resulting 2DMT is mapped onto a 3-stage pipeline, the memory requirement for stage 1 is 2, that for stage 2 is 2 (for the single stride 1 dest node at level 1) + 4 (for the stride 2 dest node at level 1) + 100 (for the single distinct non-empty source trie hanging from the stride 2 dest node) = 106, and that for stage 3 is 2 (for the single dest-trie node at level 2 of the dest trie) + 100 (for the source trie hanging from this dest-trie node) = 102. For ST(R), the performed node pullup reduces the maximum per-stage memory from 204 to 106.

Let N be a dest-trie node of O. When computing AoptSTs(N, i, j), for each $M \in D_i(N)$, we may either do a node pullup (i.e., ST(M) is represented by a single dest-trie node in the 2DMT) or represent ST(M) by its approximately optimal j-level 2DMT whose maximum per-stage memory requirement is Aopt(M, j). Algorithm NodePullup (Figure 3-3) uses a greedy strategy to choose between these two options. The algorithm

PAGE 69

Figure caption: Example for node pullup.

computes STs(N, i, j).E(j) using Aopt(M, j) for $M \in D_i(N)$. Note that the algorithm incorporates the node pullup option directly into Equation 3-4.

3.3, we consider two cases. In the first, the dest trie of the 2DMT used to map onto the pipeline architecture is an FST and in the second, this dest trie is a VST. Unlike in Section 3.3 where we obtained an algorithm to construct an optimal pipelined 2DMT (under the constraint that the dest trie is an FST), our algorithms for fine-grain mapping do not guarantee optimality. Hence, they are just heuristics to construct good pipelined 2DMTs for fine-grain mapping.
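The per-stage memory arithmetic of the node pullup example for ST(R) (Figure 3-2) can be replayed with a few lines of Python. The dictionaries simply list the contributions named in the text for each pipeline stage; the labels and the helper `stage_memory` are illustrative, and the code does not implement Algorithm NodePullup itself.

```python
from typing import Dict, List


def stage_memory(stages: List[Dict[str, int]]) -> List[int]:
    """Sum the memory contributions assigned to each pipeline stage."""
    return [sum(items.values()) for items in stages]


# Mapping without a pullup: R | A, B | C, D with source tries S1 and S2.
before = [
    {"R": 2},
    {"A": 2, "B": 2},
    {"C": 2, "D": 2, "S1": 100, "S2": 100},
]

# Mapping after one child of R becomes a stride-2 node that absorbs a level-2 node.
after = [
    {"R": 2},
    {"stride-1 dest node": 2, "stride-2 dest node": 4, "source trie": 100},
    {"level-2 dest node": 2, "source trie": 100},
]

print(stage_memory(before), max(stage_memory(before)))
print(stage_memory(after), max(stage_memory(after)))
```

The pullup moves one 100-element source trie from stage 3 to stage 2, which is why the maximum per-stage memory falls from 204 to 106 even though the stride-2 node is larger than the node it replaces.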

PAGE 70

Figure caption: Computing AoptSTs(N, i, j) using node pullups.

3.3.1 except that the terms are now for a fine-grain mapping onto an r-stage pipeline. It is easy to see that for $j \ge 0$,

PAGE 71

3.3.1 .Whenthedesttrieoftheoptimalpipelined2DMTforlevels0throughjofOhasonlyonelevel,MME(j;r)=X(j;r),whereX(j;r)=maxf2j+1;minMaxE(0;j;r1)gWhenthisdesttriehasmorethan1level,thestart-endpairforthelastlevelis(m+1;j)forsomem,0m
PAGE 72

[...] [23] to determine, for each $S_i \in S(s, e)$, the FST $U_i$ that has minimum total number of elements subject to the constraints that the FST has at most h levels and minimizes the maximum number of elements on any level. Let levelSum(j) be the sum of the number of elements on level j of all the $U_i$s. minMaxE(s, e, h) is estimated to be $\max_j\{\mathit{levelSum}(j)\}$ and totalE(s, e, h) is estimated to be the total number of elements in all the $U_i$s.

Complexity Analysis

[...] Equations 3-14 through 3-16 to determine MME(W-1, k), we must compute $O(W^2 k)$ minMaxE values. For this latter computation, we need to determine the approximately optimal pipelined VSTs (or optimal pipelined FSTs) for all source tries S(s, e). As noted in Chapter 2, the total number of source tries we need to work with is $O(nW)$ and each of these source tries has $O(nW)$ nodes. Using the heuristic of Section 3.2, the approximately optimal VSTs (for all possible heights) for these $O(nW)$ source tries may be computed in $O(n^2 W^3 k^2)$ time. If FST source tries are to be used, optimal FSTs for all possible heights and all possible source tries may be computed in $O(nW^3 k)$ time using the algorithm of [23]. Each minMaxE(s, e, h) is estimated to be $\max_j\{\mathit{levelSum}(j)\}$, where levelSum(j) is the sum of the number of elements on level j of the approximately optimal (or optimal) pipelined VSTs (FSTs) that cover levels s through e of the 2D1BT. As $O(n)$ VSTs (or FSTs) are involved in the computation of each levelSum(j) value and there are $O(k)$ levels to consider, an additional $O(nk)$ time is needed for each minMaxE value that is to be computed. The additional time for all (s, e, h) combinations is, therefore, $O(nW^2 k^2)$. (Note that the size of all of the VSTs (or FSTs) may be determined in this much time as well.) Adding up all the components, we see that the computation of all minMaxE values takes $O(n^2 W^3 k^2)$ time in the case of VST source tries and $O(nW^3 k + nW^2 k^2)$ time in the case of FST source tries. Once we have the minMaxE values, Equations 3-14 through 3-16 may be solved for MME(W-1, k) in $O(W^2 k^2)$ time as there are $O(Wk)$ MME values to compute and [...]

[...] Equations 3-17 through 3-19 may now be solved for TE(W, k) in $O(W^2 k^2)$ time ($O(Wk)$ TE values are to be computed at a cost of $O(Wk)$ time each). So, the overall complexity of computing an approximately optimal pipelined 2DMT when the dest trie is an FST is $O(n^2 W^3 k^2)$ for the case of VST source tries and $O(nW^3 k + nW^2 k^2)$ for the case of FST source tries.

[...] 2.2) and E.child, respectively, denote the source subtrie and dest subtrie associated with E. We say that E.sourceTrie and E.child are corresponding tries. The 2DMT T may be mapped onto the stages of a pipeline architecture in the following way beginning with stage nextStage. First, the root M of T (but not the source tries that hang from it) is mapped to stage nextStage. Second, let noCor(M) be the set of 2DMT subtries of M that have no corresponding nonempty source tries. These 2DMT subtries are mapped recursively, one at a time, beginning at stage nextStage+1. Third, let S3(M) be the set of source tries that hang from M and that have no nonempty corresponding 2DMT subtrie. (Note that two or more elements of M may point to the same source trie S. If any one of these elements has a non-null child field, S is not in S3(M).) The tries in S3(M) are mapped level by level onto stages nextStage+1, nextStage+2, .... Note that these source tries share pipeline stages among themselves as well as with the 2DMT subtries mapped in step 2. Finally, let S4(M) be the set of source tries that hang from M and that were not mapped in step 3. Each of these source tries has one or more nonempty corresponding 2DMT subtries. For $S \in S4(M)$, let combo(S) = (S, dest(S)) be such that dest(S) is the nonempty set of 2DMT subtries of M that correspond to S. The tries in each combo(S) are mapped onto the pipeline beginning at stage nextStage+1. For each $S \in S4(M)$, S is mapped level by level onto stages nextStage+1, nextStage+2, .... Let l be the number of levels in S. The 2DMTs in dest(S) are mapped one at a time and recursively beginning at stage nextStage+l+1. Note that each combo(S) shares pipeline stages with the source tries and 2DMT subtries mapped in steps 2 and 3. Further, different combo(S)s share stages among themselves.

Table 3-1. 4-filter database

Filter  Dest  Source  Cost
F1      00*   0*      1
F2      000*  0*      2
F3      10*   000*    3
F4      111*  111**   4

As an example of this mapping strategy, consider the 4-filter database of Table 3-1. Figure 3-4 shows a possible 2DMT for this database as well as the mapping of this 2DMT onto a pipeline. The mapping requires a 5-stage pipeline. The dest trie of this 2DMT is a VST and the source tries are FSTs. The mapping process begins with N being the root of the 2DMT and nextStage = 1. N is mapped to stage 1 in step 1. N has two 2DMT subtries, neither of which has a nonempty corresponding source trie. Both 2DMT subtries of N are mapped recursively beginning at stage 2 in step 2. When N is the left child of the root, nextStage = 2. This left child is mapped to stage 2. The only source trie, S1, hanging from this left child and the only 2DMT subtrie of this left child define combo(S1). combo(S1) is mapped in step 4 and so on. Notice that a pipeline stage may accommodate both dest- and source-trie nodes. So, we shall need to add a bit to each node to differentiate.

Figure 3-4. 2DMT for 4-filter database in Table 3-1

We develop dynamic programming recurrences to determine a 2DMT with VST dest and source tries that is suitable for fine grain mapping onto a k-stage pipeline using the just stated mapping strategy. While the constructed 2DMT doesn't necessarily minimize the maximum per-stage memory requirement, it does produce good mappings. Let N be a dest-trie node in the 2D1BT O. Aopt(N, r) denotes the approximately optimal pipelined 2DMT for the subtrie ST(N) such that the 2DMT maps onto at most r pipeline stages using the just stated mapping strategy. Let M be the root of Aopt(N, r). In Aopt(N, r), the subtries of M are themselves approximately optimal 2DMTs and the source tries are approximately optimal VSTs (as defined in Section 3.2). Aopt(N, r).E(l) denotes the number of elements mapped to stage l of the pipeline. $D'_{q+1}(N)$ denotes the subset of $D_{q+1}(N)$ that are the roots of the subtries represented by noCor(M) under the assumption that M covers levels 0 through q of ST(N). $D''_{q+1}(N)$ denotes $D_{q+1}(N) - D'_{q+1}(N)$. Tries(N, q, z) denotes the set of approximately optimal pipelined VSTs for S3(M). The pipelined VSTs have at most z levels each and M covers levels 0 through q of ST(N). Tries(N, q, z).E(l) denotes the total number of elements at level l of all VSTs in Tries(N, q, z). OptCombo(S, N, q, z).E(l) denotes the number of elements assigned to stage l+1 of the pipeline when combo(S) is mapped to a pipeline beginning with stage 1; the root M of the 2DMT for ST(N) covers levels 0 through q of ST(N); and the mapping of combo(S) is limited to z stages.
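The four-step mapping strategy stated above can be sketched in a few lines. This is only an illustration under assumptions: the classes `SourceTrie` and `Node`, the function `map_fine`, and the small trie built at the end are hypothetical and are not the 2DMT of Figure 3-4. A source trie is reduced to its per-level element counts, and the function returns the number of elements that land on each pipeline stage.

```python
from collections import defaultdict


class SourceTrie:
    def __init__(self, level_sizes):
        # Number of elements on each level; an empty list is an empty trie.
        self.level_sizes = level_sizes


class Node:
    def __init__(self, size, elements=()):
        self.size = size                # number of elements in this 2DMT node
        # Each element is a (source trie or None, child Node or None) pair.
        self.elements = list(elements)


def map_fine(node, next_stage, load):
    """Add the elements of the 2DMT rooted at node to load[stage]."""
    load[next_stage] += node.size                       # step 1: the root M
    groups = {}                                         # id(S) -> (S, dest(S))
    for source, child in node.elements:
        if source is None or not source.level_sizes:
            if child is not None:                       # step 2: noCor(M)
                map_fine(child, next_stage + 1, load)
            continue
        _, children = groups.setdefault(id(source), (source, []))
        if child is not None:
            children.append(child)
    for source, children in groups.values():
        # Steps 3 and 4: S occupies stages next_stage+1, next_stage+2, ...
        for level, count in enumerate(source.level_sizes):
            load[next_stage + 1 + level] += count
        # Step 4 only: dest(S) starts after the l levels of S.
        for child in children:
            map_fine(child, next_stage + 1 + len(source.level_sizes), load)
    return load


# Hypothetical example: the left child of the root has one combo(S1).
s2 = SourceTrie([2])
g = Node(2, [(s2, None), (None, None)])
s1 = SourceTrie([2, 4])
left = Node(2, [(s1, g), (None, None)])
right = Node(4)
root = Node(2, [(None, left), (None, right)])
load = dict(map_fine(root, 1, defaultdict(int)))
print(load, max(load.values()))
```

A source trie with no corresponding child (set S3) and one with children (set S4) are handled by the same loop, since both are laid out level by level from stage nextStage+1; only the recursive call for dest(S) differs.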


[...] 3-21) in $O(1)$ time. So, all $O(n^2 W^2 k^2)$ OptCombo(S, N, q, z).E(l) values may be computed in $O(n^2 W^2 k^2)$ additional time. Since each of the $O(nW)$ nodes in the dest trie of the 2D1BT has $O(W)$ ancestors and $|S4(M)| \le n$ for every M, all $O(nW^2 k^2)$ h values and hence, all Aopt(·, ·).E(·) values, may be computed in $O(n^2 W^2 k^2)$ time. Adding together all components of the time complexity, we see that an approximately optimal fine-grain pipelined 2DMT with a VST dest trie may be computed in $O(n^2 W^3 k^2)$ time when the source tries are also VSTs and in $O(nW^3 k + n^2 W^2 k^2) = O(n^2 W^2 k^2)$ time (under the very realistic assumption that $W = O(nk)$) when the source tries are FSTs.

[...] [23] and our two-dimensional algorithms were benchmarked against the two-dimensional algorithms of Chapter 2.

[...] [4] and Kim and Sahni [23] have proposed algorithms for the construction of pipelined one-dimensional multibit tries. Since the algorithms of [23] are superior to those of [4], we focus on the algorithms of [23]. [23] develops an algorithm PFST-2, which is a 2-stage algorithm that results in pipelined FSTs that minimize total memory subject to minimizing the maximum per-stage memory. Kim and Sahni [23] also propose two algorithms, PU-2n and PART, that are based on FSTs but result in VSTs that are better suited to pipeline applications than the optimal FSTs generated by PFST-2.

[...] [32], which constructs VSTs with minimum total memory, and let PVST be our algorithm of Section 3.2. So, in all, we have 5 algorithms (PFST, PU-2n, PART, VST and PVST) for the construction of pipelined multibit tries. Only one of these, PFST, results in an FST and the others result in VSTs. We first determine the effectiveness of our tree packing heuristic of Figure 3-1 relative to the straightforward mapping (i.e., nodes at level l of the multibit trie are packed into stage l+1, $0 \le l$ [...]

[...] Figure 3-5 plots this data for AS1221.

Table 3-3. Maximum per-stage memory (KB)

Database  Algorithm    k=2    k=3    k=4    k=5    k=6    k=7    k=8
Aads      PFST        2304    216    118     64     64     64     64
Aads      PU-2n       2304    216    118     64     64     64     64
Aads      PART        2304    219    105     72     71     51     49
Aads      VST          576     94     60     42     33     30     27
Aads      PVST         576     88     46     36     26     23     18
MaeWest   PFST        7634    327    144    102    102    102    102
MaeWest   PU-2n       7634    327    144    102    102    102    102
MaeWest   PART        5838    369    151    144     94     78     72
MaeWest   VST          896    175     97     71     59     52     44
MaeWest   PVST         896    144     73     55     42     37     33
RRC01     PFST        9216    708    362    308    308    230    230
RRC01     PU-2n       9216    708    360    301    301    225    225
RRC01     PART        9216    708    380    288    324    248    188
RRC01     VST         1866    587    335    229    192    161    153
RRC01     PVST        1866    447    248    175    146    114    111
RRC04     PFST       10490   1048    395    339    243    243    243
RRC04     PU-2n      10490   1048    394    334    239    239    239
RRC04     PART       10490   1048    445    326    288    261    209
RRC04     VST         2304    504    349    240    207    169    154
RRC04     PVST        2304    482    264    187    155    121    118
AS4637    PFST       18432   1026    560    479    479    479    359
AS4637    PU-2n      18432   1026    557    469    469    469    351
AS4637    PART       15149    989    576    406    368    358    315
AS4637    VST         2517    751    555    376    321    268    245
AS4637    PVST        2304    597    368    288    288    185    181
AS1221    PFST       18432   1714    795    498    439    439    348
AS1221    PU-2n      18432   1714    795    496    431    431    342
AS1221    PART       18432   1743    818    576    387    365    311
AS1221    VST         5750   1002    704    497    404    324    298
AS1221    PVST        4608    802    476    332    313    288    217
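As a quick way to read Table 3-3, the following sketch normalizes two sets of rows by the corresponding PVST rows. The numbers are copied from the table; the helper name `normalized` is an assumption made for this illustration only.

```python
def normalized(row, base):
    """Divide each entry of row by the matching entry of base."""
    return [r / b for r, b in zip(row, base)]


# MaeWest rows of Table 3-3, k = 2..8 (KB).
maewest_pfst = [7634, 327, 144, 102, 102, 102, 102]
maewest_pvst = [896, 144, 73, 55, 42, 37, 33]

# Aads rows of Table 3-3, k = 2..8 (KB).
aads_vst = [576, 94, 60, 42, 33, 30, 27]
aads_pvst = [576, 88, 46, 36, 26, 23, 18]

pfst_ratio = normalized(maewest_pfst, maewest_pvst)
vst_ratio = normalized(aads_vst, aads_pvst)

print([round(r, 2) for r in pfst_ratio])
print([round(r, 2) for r in vst_ratio])
```

On these rows the largest PFST-to-PVST ratio occurs at k = 2 and is roughly 8.5, while the largest VST-to-PVST ratio occurs at k = 8 and is exactly 1.5 (27 KB versus 18 KB), i.e., VST needs 50% more per-stage memory than PVST there.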

Table 3-4. Total memory (KB)

Database  Algorithm    k=2    k=3    k=4    k=5    k=6    k=7    k=8
Aads      PFST        3715    577    350    242    230    230    230
Aads      PU-2n       3715    577    350    242    230    230    230
Aads      PART        3482    566    355    330    326    297    300
Aads      VST         1010    238    140    121    117    116    116
Aads      PVST        1010    242    163    157    139    132    132
MaeWest   PFST       12242    879    540    367    356    355    355
MaeWest   PU-2n      12242    879    540    367    355    355    355
MaeWest   PART       10446    965    599    575    477    469    469
MaeWest   VST         1472    368    232    202    196    194    194
MaeWest   PVST        1472    402    282    239    239    221    216
RRC01     PFST       18049   1920   1375   1062   1039   1035   1034
RRC01     PU-2n      18049   1920   1370   1042   1018   1015   1013
RRC01     PART       17050   1920   1429   1440   1528   1524   1314
RRC01     VST         3018    957    680    626    613    610    609
RRC01     PVST        3018    980    815    799    712    715    671
RRC04     PFST       19706   2283   1475   1159   1114   1090   1088
RRC04     PU-2n      19706   2283   1471   1142   1097   1073   1071
RRC04     PART       19706   2283   1623   1590   1553   1605   1463
RRC04     VST         4440   1039    726    663    649    645    643
RRC04     PVST        4440   1057    854    835    748    751    706
AS4637    PFST       19445   2628   1968   1624   1595   1594   1593
AS4637    PU-2n      19445   2628   1961   1594   1565   1563   1562
AS4637    PART       24365   2428   2214   1913   1784   2151   2150
AS4637    VST         3669   1371   1071   1003    986    981    980
AS4637    PVST        3801   1565   1248   1253   1260   1101   1065
AS1221    PFST       23161   4580   2664   2283   1939   1911   1896
AS1221    PU-2n      23161   4580   2664   2272   1906   1878   1864
AS1221    PART       23161   4629   2929   2650   2227   2264   2250
AS1221    VST         8054   1730   1313   1217   1195   1189   1187
AS1221    PVST        8403   1913   1520   1485   1361   1485   1308

Figure 3-5. Maximum per-stage and total memory (KB) for AS1221. A) Maximum per-stage memory. B) Total memory.

In all of our 42 tests, PVST resulted in the least maximum per-stage memory requirement. Tables 3-5 and 3-6 give the maximum per-stage and total memory required by the 5 algorithms normalized by the requirements for PVST. The maximum per-stage memory requirement for the algorithms of [23] is up to 8.51 times that of PVST while the requirement for VST is up to 50% more than that of PVST. On average, the total memory required by the multibit tries produced by VST was 12% less than that required by the PVST tries; the tries generated by the algorithms of [23] required, on average, about 2 times the total memory required by the PVST tries.

Table 3-5. Maximum per-stage memory normalized by PVST's maximum per-stage memory

Algorithm  Min   Max   Mean  Standard Deviation
PFST       1.40  8.51  2.54  1.54
PU-2n      1.38  8.51  2.53  1.551
PART       1.24  6.58  2.38  1.25
VST        1.00  1.50  1.28  0.14
PVST       1     1     1     0

Table 3-6. Total memory normalized by PVST's total memory

Algorithm  Min   Max   Mean  Standard Deviation
PFST       1.27  8.31  2.13  1.41
PU-2n      1.24  8.31  2.12  1.41
PART       1.42  7.09  2.42  1.23
VST        0.77  1     0.88  0.07
PVST       1     1     1     0

[...] Chapter 2 for two-dimensional multibit tries that minimize total memory. Chapter 2 proposes 4 heuristics, 2DMTa through 2DMTd, to construct two-dimensional multibit tries. Of these, only 2DMTa constructs tries suitable for coarse mapping [...] 2 (from the standpoint of total memory requirement), the tries constructed by 2DMTd are suited only for a fine mapping onto a pipeline. So, we compare our coarse grain algorithms to 2DMTa and our fine grain algorithms to 2DMTd. For test data, we used the 12 data sets of Chapter 2. These data sets are derived from 5-dimensional data sets generated using the filter generator of [42]. Each of these data sets actually has 10 different databases of filters. So, in all, we have 120 databases of two-dimensional filters. The data sets, which are named ACL1 through ACL5, FW1 through FW5, IPC1 and IPC2, have, respectively, 20K, 19K, 10K, 13K, 5K, 19K, 19K, 18K, 17K, 17K, 16K and 20K filters, on average, in each database.

[...] Table 3-7 gives the reduction in maximum per-stage memory when the tree packing heuristic of Figure 3-1 is used versus the straightforward mapping. Although the mean reduction (less than 7%) is small, the maximum reduction (51%) is significant. All [...]

[...] Figure 3-1 is used.

Table 3-7. Reduction in maximum per-stage memory resulting from tree packing heuristic

Algorithm  Min  Max  Mean  Standard Deviation
2DMTa      0%   19%  2%    5%
FST        0%   19%  2%    5%
VST        0%   51%  7%    12%

In our experiments, the source tries were restricted to have height at most maxSH for $maxSH \in \{2, 3, 4\}$ and the dest-trie height was restricted to be at most k, $2 \le k \le 8$. With these restrictions, the constructed two-dimensional trie could be mapped into a k-stage pipeline with a cycle time of maxSH+1 (1 to access a dest-trie node and maxSH to access source-trie nodes). For FST, the tree-packing heuristic of Section 3.2.2 was employed to reduce the maximum per-stage memory and for VST, the node pullup scheme of Section 3.3.2 followed by the tree-packing heuristic of Figure 3-1 were employed. The node pullup scheme reduced the maximum per-stage memory by as much as 51% (the minimum and mean reductions were 0% and 3%, respectively; the standard deviation was 11%) and the tree-packing heuristic reduced this by an additional up to 19% for FSTs and up to 51% for VSTs (as reported in Table 3-7).

Figure 3-6 shows the maximum per-stage and total memory requirements of the coarse grain mapping resulting from the tries produced by 2DMTa and the FST and VST dest-trie coarse mapping algorithms of Section 3.3 for the data sets ACL1, FW1 and IPC1 and the case maxSH = 4. Since each data set is comprised of 10 databases, we actually show the average of the 10 values. The experimental results are similar for the remaining 9 data sets. On our test data, the maximum per-stage memory and total memory of VST are generally superior to those of FST and 2DMTa. The maximum per-stage memory of FST normalized by that of VST is between 0.91 and 80; the mean and standard deviation are 3.16 and 5.95, respectively. The normalized numbers for total memory required are between 0.58 and 65 with the mean and standard deviation being 2.04 and 4.80. On our test data, the maximum per-stage memory of 2DMTa normalized by that of VST is between 0.93 and 80; the mean and standard deviation are 3.39 and 5.91, respectively. The normalized numbers for total memory required are between 0.49 and 65 with the mean and standard deviation being 1.92 and 4.74. For the structures constructed by these three algorithms, increasing maxSH significantly reduced the per-stage (total) memory requirement on 4 of our data sets (ACL3 through ACL5 and IPC1), while this had little or no impact on the remaining 8 data sets.

Figure 3-6. Maximum per-stage and total memory required by coarse-grain pipelining, maxSH = 4. Source tries are VSTs. A) ACL1, MaxPerStageMemory. B) FW1, MaxPerStageMemory. C) IPC1, MaxPerStageMemory. D) ACL1, TotalMemory. E) FW1, TotalMemory. F) IPC1, TotalMemory.

[...] Figure 3-7 shows the maximum per-stage and total memory requirements of the fine grain mapping resulting from the tries produced by 2DMTd(k) and the FST and VST [...]

[...] 3.4. Memory requirements in excess of $10^6$ are plotted as $10^6$. Node pullup did not reduce the maximum per-stage memory requirement of the trie resulting from our VST dest-trie fine mapping algorithm.

Figure 3-7. Maximum per-stage and total memory required by fine-grain pipelining. Source tries are VSTs. A) ACL1, MaxPerStageMemory. B) FW1, MaxPerStageMemory. C) IPC1, MaxPerStageMemory. D) ACL1, TotalMemory. E) FW1, TotalMemory. F) IPC1, TotalMemory.

For all our data sets, VST generally outperformed FST on both the maximum per-stage metric as well as the total memory metric. In fact, the maximum per-stage memory for FST normalized by that for VST was between 0.7 and 2047; the mean and standard deviation were 109 and 328, respectively. The normalized numbers for total memory were between 0.82 and 1020. The mean and standard deviation were 42 and 128. VST was also generally superior to 2DMTd on the maximum per-stage memory metric. The maximum per-stage memory for 2DMTd normalized by that for VST was between 0.65 and 2.16; the mean and standard deviation were 1.27 and 0.29, respectively. Not surprisingly, 2DMTd was always superior to VST on the total memory metric. The [...]

[...] Figure 3-8 compares the memory requirements of the tries resulting from the best coarse grain and fine grain algorithms (i.e., the VST dest-trie algorithms of Sections 3.3 and 3.4). For the case of a coarse grain mapping, we show three cases: maxSH = 2 (Coarse-2), 3 and 4. The fine grain mapping is superior to the coarse grain mapping when k is large but not when k is small. For FW1, for example, the maximum per-stage memory required by the coarse (maxSH = 3) and fine mapping is (almost) the same when k = 7. However, the cycle time for the 7-stage pipeline is 4 for the coarse mapping and 1 for the fine mapping. So, the fine mapping results in a throughput that is 4 times that of the coarse mapping while using (almost) the same maximum per-stage memory. On the other hand, when k = 6, the fine mapping required up to 6050 times the maximum per-stage memory required by a coarse mapping with maxSH = 4! When k = 12, the coarse mapping with maxSH = 2 required up to 1061 times the maximum per-stage memory required by the fine mapping.

Figure 3-8. Maximum per-stage and total memory required by coarse-grain and fine-grain pipelining. The dest trie is a VST and the source tries are VSTs. A) ACL1, MaxPerStageMemory. B) FW1, MaxPerStageMemory. C) IPC1, MaxPerStageMemory. D) ACL1, TotalMemory. E) FW1, TotalMemory. F) IPC1, TotalMemory.

[...] [23] by as much as 60%. Our PVST algorithm results in VST multibit tries, which when mapped into a pipelined architecture, have a maximum per-stage memory requirement that is up to 1/8 that required by the tries of [23]. For two-dimensional multibit tries, we have proposed two strategies, fine and coarse, for mapping into a pipeline. For each strategy, we have developed dynamic programming algorithms for both FST and VST dest and source tries. Although these algorithms do not find optimal multibit tries, they construct coarse grain multibit tries that generally have a much smaller maximum per-stage memory requirement than do the tries obtained using the 2DMTa algorithm of Chapter 2 and they construct fine grain multibit tries that generally have a much smaller maximum per-stage memory requirement than do the tries obtained using the 2DMTd algorithm of Chapter 2.

[...] 4.1, by reviewing the 1- and 2-dimensional binary trie representation of a classifier together with the research that has been done on the succinct representation of these structures. In Sections 4.2 through 4.6.3, we develop our algorithms for the succinct representation of 1-dimensional tries and in Sections 4.4 and 4.5, we do this for 2-dimensional tries. Experimental results are presented in Section 4.7.

4.1.1 One-Dimensional Packet Classification

We assume that the filters in a 1-dimensional classifier are prefixes of destination addresses. Many of the data structures developed for the representation of a 1-dimensional classifier are based on the binary trie structure [12]. A binary trie is a binary tree structure in which each node has a data field and two children fields. Branching is done based on the bits in the search key. A left child branch is followed at a node at level i (the root is at level 0) if the ith bit of the search key (the leftmost bit of the search key is bit 0) is 0; otherwise a right child branch is followed. Level i nodes store prefixes whose length is i in their data fields. The node in which a prefix is to be stored is determined by doing a search using that prefix as key. Let N be a node in a binary trie. Let Q(N) be the bit string defined by the path from the root to N. Q(N) is the prefix that corresponds to N. Q(N) is stored in N.data in case Q(N) is one of the prefixes to be stored in the trie.
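The binary trie just described can be sketched as follows. This is a minimal illustration, not the implementation used for the experiments: the class and function names are assumptions, prefixes are written as bit strings without the trailing *, and the five prefixes P1 through P5 are those of the chapter's running example (Figure 4-1(A)). The lookup follows the path dictated by the address and remembers the last prefix seen.

```python
class BinaryTrieNode:
    def __init__(self):
        self.left = None    # followed when the next key bit is 0
        self.right = None   # followed when the next key bit is 1
        self.data = None    # set when Q(N) is one of the stored prefixes


def insert(root, prefix, data):
    """Store data in the node reached by searching with prefix as the key."""
    node = root
    for bit in prefix:
        if bit == "0":
            if node.left is None:
                node.left = BinaryTrieNode()
            node = node.left
        else:
            if node.right is None:
                node.right = BinaryTrieNode()
            node = node.right
    node.data = data


def longest_match(root, address):
    """Return the data of the last prefix met on the path dictated by address."""
    node, best = root, root.data
    for bit in address:
        node = node.left if bit == "0" else node.right
        if node is None:
            break
        if node.data is not None:
            best = node.data
    return best


root = BinaryTrieNode()
for name, prefix in [("P1", ""), ("P2", "0"), ("P3", "000"),
                     ("P4", "10"), ("P5", "11")]:
    insert(root, prefix, name)

print(longest_match(root, "000"), longest_match(root, "011"),
      longest_match(root, "101"), longest_match(root, "110"))
```

Each step of `longest_match` visits one trie level, which is why the number of memory accesses grows with the number of levels in the trie.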

Figure 4-1(A) shows a set of 5 prefixes. The * shown at the right end of each prefix is used neither for the branching described above nor in the length computation. So, the length of P2 is 1. Figure 4-1(B) shows the binary trie corresponding to this set of prefixes. Shaded nodes correspond to prefixes in the rule table and each contains the next hop for the associated prefix. The binary trie of Figure 4-1(B) differs from the 1-bit trie described in Section 2.2, which is used in [38], [31], and others, in that a 1-bit trie stores up to 2 prefixes in a node (a prefix of length l is stored in a node at level l-1) whereas each node of a binary trie stores at most 1 prefix. Because of this difference in prefix storage strategy, a binary trie may have up to 33 (129) levels when storing IPv4 (IPv6) prefixes while the number of levels in a 1-bit trie is at most 32 (128).

P1  *
P2  0*
P3  000*
P4  10*
P5  11*

Figure 4-1. Prefixes and corresponding binary trie. A) 5 prefixes. B) Corresponding binary trie.

For any destination address d, we may find the longest matching prefix by following a path beginning at the trie root and dictated by d. The last prefix encountered on this path is the longest prefix that matches d. While this search algorithm is simple, it results in as many cache misses as the number of levels in the trie. Even for IPv4, this number, which is at most 33, is too large for us to classify/forward packets at line speed. Several strategies (e.g., LC trie [28], Lulea [6], tree bitmap [7], multibit tries [38], shape shifting trie [39]) have been proposed to improve the lookup performance of binary tries. All of these strategies collapse several levels of each subtree of a binary trie into a single node, which we call a supernode, that can be searched with a number of memory accesses that is less than the number of levels collapsed into the supernode. For example, we can access the correct child pointer (as well as its associated prefix) in a multibit trie with a single memory access independent of the size of the multibit node. The resulting trie, which is composed of supernodes, is called a supernode trie. Lunteren [25, 26] has devised a perfect-hash-function scheme for the compact representation of the supernodes of a multibit trie.

The data structure we propose in this paper also is a supernode trie structure. Our structure is most closely related to the shape shifting trie (SST) structure of Song et al. [39], which in turn draws heavily from the tree bitmap (TBM) scheme of Eatherton et al. [7] and the technique developed by Jacobson [14, 27] for the succinct representation of a binary tree. In TBM we start with the binary trie for our classifier and partition this binary trie into subtries that have at most S levels each. Each partition is then represented as a (TBM) supernode. S is the stride of a TBM supernode. While S = 8 is suggested in [7] for real-world IPv4 classifiers, we use S = 2 here to illustrate the TBM structure. Figure 4-2(A) shows a partitioning of the binary trie of Figure 4-1(B) into 4 subtries W through Z that have 2 levels each. Although a full binary trie with S = 2 levels has 3 nodes, X has only 2 nodes and Y and Z have only one node each. Each partition is represented by a supernode (Figure 4-2(B)) that has the following components:

Figure 4-2. TBM for binary trie of Figure 4-1(B). A) TBM partitioning. B) TBM node representation.

The first component is a $(2^S - 1)$-bit bitmap IBM (internal bitmap) that indicates whether each of the up to $2^S - 1$ nodes in the partition contains a prefix. The IBM is constructed by superimposing the partition nodes on a full binary trie that has S levels and traversing the nodes of this full binary trie in level order. For node W, the IBM is 110 indicating that the root and its left child have a prefix and the root's right child is either absent or has no prefix. The IBM for X is 010, which indicates that the left child of the root of X has a prefix and that the right child of the root is either absent or has no prefix (note that the root itself is always present and so a 0 in the leading position of an IBM indicates that the root has no prefix). The IBMs for Y and Z are both 100.

The second component is a $2^S$-bit EBM (external bitmap) that corresponds to the $2^S$ child pointers that the leaves of a full S-level binary trie have. As was the case for the IBM, we superimpose the nodes of the partition on a full binary trie that has S levels. Then we see which of the partition nodes have child pointers emanating from the leaves of the full binary trie. The EBM for W is 1011, which indicates that only the right child of the leftmost leaf of the full binary trie is null. The EBMs for X, Y and Z are 0000 indicating that the nodes of X, Y and Z have no children that are not included in X, Y, and Z, respectively. Each child pointer from a node in one partition to a node in another partition becomes a pointer from a supernode to another supernode. To reduce the space required for these inter-supernode pointers, the children supernodes of a supernode are stored sequentially from left to right so that using the location of the first child and the size of a supernode, we can compute the location of any child supernode.

The third component is a child pointer that points to the location where the first child supernode is stored and the last component is a pointer to a list NH of next-hop [...]

[...] [39] is obtained by partitioning a binary trie into subtries that have at most K nodes each. K is the stride of an SST supernode. To correctly search an SST, each SST supernode requires a shape bitmap (SBM) in addition to an IBM and EBM. The SBM used by Song et al. [39] is the succinct representation of a binary tree developed by Jacobson [14]. Jacobson's SBM is obtained by replacing every null link in the binary tree being coded by the SBM with an external node. Next, place a 0 in every external node and a 1 in every other node. Finally, traverse this extended binary tree in level order, listing the bits in the nodes as they are visited by the traversal.

Suppose we partition our example binary trie of Figure 4-1(B) into binary tries that have at most K = 3 nodes each. Figure 4-3(A) shows a possible partitioning into the 3 partitions X through Z. X includes nodes a, b and d of Figure 4-1(B); Y includes nodes c, e and [...]

[...] Figure 4-3(B) shows the node representation for each partition of Figure 4-3(A). The shown SBMs exclude the first and last two bits.

Figure 4-3. SST for binary trie of Figure 4-1(B). A) SST partitioning. B) SST node.

The IBM of an SST supernode is obtained by traversing the partition in level order; when a node is visited, we output a 1 to the IBM if the node has a prefix and a 0 otherwise. The IBMs for nodes X through Z are, respectively, 110, 011, and 1. Note that the IBM of an SST supernode is at most K bits in length. To obtain the EBM of a supernode, we start with the extended binary tree for the partition and place a 1 in each external node that corresponds to a node in the original binary trie and a 0 in every other external node. Next, we visit the external nodes in level order and output their bit to the EBM. The EBMs for our 3 supernodes are, respectively, 1010, 0000, and 00. Since the number of external nodes for each partition is at most K+1, the size of an EBM is at most K+1 bits.
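The three SST bitmaps can be produced by a single level-order pass over the extended binary tree of a partition. The sketch below is an illustration under assumptions: the dictionary `TRIE` encodes the example binary trie with node names a through g assigned in level order (a is the root, b and c its children, and so on), which is how the text refers to them, and `encode` is a hypothetical helper. The SBM it returns is the full bit string; dropping its first bit and last two bits gives the shortened form mentioned above.

```python
from collections import deque

# name: (left child, right child, stores a prefix?) for the example trie
TRIE = {
    "a": ("b", "c", True),     # *
    "b": ("d", None, True),    # 0*
    "c": ("e", "f", False),    # 1
    "d": ("g", None, False),   # 00
    "e": (None, None, True),   # 10*
    "f": (None, None, True),   # 11*
    "g": (None, None, True),   # 000*
}


def encode(root, members):
    """Return (SBM, IBM, EBM) for the partition `members` rooted at `root`."""
    sbm, ibm, ebm = [], [], []
    queue = deque([root])
    while queue:
        name = queue.popleft()
        if name in members:                 # node of the partition
            left, right, has_prefix = TRIE[name]
            sbm.append("1")
            ibm.append("1" if has_prefix else "0")
            queue.extend([left, right])
        else:                               # external node of the extended tree
            sbm.append("0")
            # 1 if the external node is a real trie node in another partition
            ebm.append("1" if name is not None else "0")
    return "".join(sbm), "".join(ibm), "".join(ebm)


x = encode("a", {"a", "b", "d"})
y = encode("c", {"c", "e", "f"})
z = encode("g", {"g"})
print(x, y, z)
```

For the three partitions this reproduces the IBMs 110, 011 and 1 and the EBMs 1010, 0000 and 00 quoted in the text.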

Figure 4-4. HSST for binary trie of Figure 4-1(B). A) HSST partitioning. B) HSST node representation.

As in the case of the TBM structure, child supernodes of an SST supernode are stored sequentially and a pointer to the first child supernode is maintained. The NH list for the supernode is stored in separate memory and a pointer to this list is maintained within the supernode. Although the size of an SBM, IBM and EBM varies with the partition size, an SST supernode is of a fixed size and allocates 2K bits to the SBM, K bits to the IBM and K+1 bits to the EBM. Unused bits are filled with 0s. Hence, the size of an SST supernode is $4K + 2b - 1$ bits.

Song et al. [39] develop an $O(m)$ time algorithm, called post-order pruning, to construct a minimum-node SST, for any given K, from an m-node binary trie. They also develop a breadth-first pruning algorithm to construct, for any given K, a minimum height SST. The complexity of this algorithm is $O(m^2)$.

For dense binary tries, TBMs are more space efficient than SSTs. However, for sparse binary tries, SSTs are more space efficient. Song et al. [39] propose a hybrid SST (HSST) in which dense subtries of the overall binary trie are partitioned into TBM supernodes and sparse subtries into SST supernodes. Figure 4-4 shows an HSST for the binary trie of Figure 4-1(B). For this HSST, K = S = 2. The HSST has two SST nodes X and Z, and one TBM node Y.
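Since an HSST mixes TBM and SST supernodes, it may help to see the TBM side computed as well. The sketch below is an illustration with assumed names (`PREFIXES`, `trie_nodes`, `tbm_bitmaps`): it superimposes a partition of stride S on a full S-level binary trie and reads off the IBM and EBM in level order, using the prefixes of the running example and the S = 2 partitioning into W, X, Y and Z described for Figure 4-2.

```python
from itertools import product

# Prefixes of the running example, written without the trailing *.
PREFIXES = {"", "0", "000", "10", "11"}


def trie_nodes(prefixes):
    """Every node of the binary trie, identified by its root-to-node bit string."""
    return {p[:i] for p in prefixes for i in range(len(p) + 1)}


def tbm_bitmaps(root, stride, prefixes):
    """Return (IBM, EBM) for the TBM supernode of the given stride rooted at root."""
    nodes = trie_nodes(prefixes)
    # IBM: level-order walk over the 2**stride - 1 positions of the full trie.
    ibm = "".join(
        "1" if root + "".join(bits) in prefixes else "0"
        for depth in range(stride)
        for bits in product("01", repeat=depth)
    )
    # EBM: the 2**stride child pointers leaving the bottom level, left to right.
    ebm = "".join(
        "1" if root + "".join(bits) in nodes else "0"
        for bits in product("01", repeat=stride)
    )
    return ibm, ebm


w = tbm_bitmaps("", 2, PREFIXES)
x = tbm_bitmaps("00", 2, PREFIXES)
y = tbm_bitmaps("10", 2, PREFIXES)
z = tbm_bitmaps("11", 2, PREFIXES)
print(w, x, y, z)
```

The IBM always has $2^S - 1$ bits and the EBM $2^S$ bits, regardless of how many nodes the partition actually contains, which is why TBM nodes pay off on dense subtries only.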

[...] [39] do not develop an algorithm to construct a space-optimal HSST, they propose a heuristic that is a modification of their breadth-first pruning algorithm for SSTs. This heuristic guarantees that the height of the constructed HSST is no more than that of the height-optimal SST.

[...] 4-1. For each rule, the filter is defined by the Dest (destination) and Source prefixes. So, for example, F2 = (0*, 1*) matches all packets whose destination address begins with 0 and whose source address begins with 1. When a packet is matched by two or more filters, the matching rule with least cost is used. The classifier of Figure 4-1 may be represented as a 2DBT in which the top-level trie is constructed using the destination prefixes. In the context of our destination-source filters, this top-level trie is called the destination trie (or simply, dest trie). Let N be a node in the destination trie. If no dest prefix equals Q(N), then N.data points to an empty lower-level trie. If there is a dest prefix D that equals Q(N), then N.data points to a binary trie for all source prefixes E such that (D, E) is a filter. In the context of destination-source filters, the lower-level tries are called source tries. Every node N of the dest trie of a 2DBT has a (possibly empty) source trie hanging from it. Let a and b be two nodes in the dest trie. Let b be an ancestor of a. We say that the source trie that hangs from b is an ancestor trie of the one that hangs from a. Figure 4-5 gives the 2DBT for the filters of Figure 4-1.
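A 2DBT and its least-cost search can be sketched as a trie of tries. The code below is illustrative only: the class and function names are assumptions, prefixes are written without the trailing *, and the five filters are those of the chapter's dest-source example (F1 through F5). The search walks the dest trie along the destination address and, at every node that has a source trie, walks that source trie along the source address.

```python
class TrieNode:
    def __init__(self):
        self.child = {}    # "0" or "1" -> TrieNode
        self.data = None   # dest trie: source-trie root; source trie: (cost, filter)


def insert(root, prefix):
    """Return the node for prefix, creating nodes along the way."""
    node = root
    for bit in prefix:
        node = node.child.setdefault(bit, TrieNode())
    return node


def path(root, address):
    """Yield the nodes on the search path dictated by address."""
    node = root
    yield node
    for bit in address:
        node = node.child.get(bit)
        if node is None:
            return
        yield node


def build(filters):
    dest_root = TrieNode()
    for name, dest, source, cost in filters:
        dest_node = insert(dest_root, dest)
        if dest_node.data is None:
            dest_node.data = TrieNode()          # empty source trie
        source_node = insert(dest_node.data, source)
        entry = (cost, name)
        if source_node.data is None or entry < source_node.data:
            source_node.data = entry
    return dest_root


def search(dest_root, da, sa):
    """Return (cost, filter) of the least-cost filter matching (da, sa), or None."""
    best = None
    for dest_node in path(dest_root, da):
        if dest_node.data is None:
            continue
        for source_node in path(dest_node.data, sa):
            entry = source_node.data
            if entry is not None and (best is None or entry < best):
                best = entry
    return best


FILTERS = [("F1", "", "0", 1), ("F2", "0", "1", 2), ("F3", "000", "0", 3),
           ("F4", "10", "0", 4), ("F5", "11", "1", 5)]
trie = build(FILTERS)
print(search(trie, "000", "111"), search(trie, "100", "000"))
```

For (da, sa) = (000, 111) the dest path passes through the nodes for *, 0* and 000*, each of which has a source trie, and only the source trie of 0* contains a prefix matching 111.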

Example of five dest-source filters

Filter  Dest Prefix  Source Prefix  Cost
F1      *            0*             1
F2      0*           1*             2
F3      000*         0*             3
F4      10*          0*             4
F5      11*          1*             5

Figure 4-5. Two-dimensional binary trie for five dest-source filters of Figure 4-1

[...] [39] constructs, for any given K and binary trie T, a minimum height SST. The complexity of this algorithm is $O(m^2)$, where m is the number of nodes in T. In this section, we develop an $O(m)$ algorithm for this task. Our algorithm, which we call minHtSST, performs a postorder traversal [...]

[...] [39], which constructs an SST that has the fewest number of nodes.

Figure 4-6 gives the visit function employed by our postorder traversal algorithm minHtSST. x is the node of T being visited. To avoid code clutter, we do not show the code needed to update size, lht, rht, SSTs, and so on. This visit function has 3 mutually exclusive cases. Exactly one of these is executed during a visit. It is easy to see that if T is traversed in postorder using the visit function of Figure 4-6, then x.leftChild.size [...]

Figure 4-6. Visit function for minHtSST

Since the visit function of Figure 4-6 can be implemented to run in $O(1)$ time, the complexity of our postorder traversal function minHtSST is $O(m)$, where m is the number of nodes in the binary trie T. Note that the number of nodes in the binary trie for n prefixes whose length is at most W is $O(nW)$. So, in terms of n and W, the complexity of minHtSST is $O(nW)$.

[...] 4-3, each opt(·, ·) value can be computed in $O(K)$ time, since $|D_S(N)| \le 2^S \le 2K$. Also, each opt(·, ·, ·) value can be computed in $O(K)$ time using Equations 4-4 through 4-8. There are $O(mH)$ opt(·, ·) and $O(mHK)$ opt(·, ·, ·) values to compute. Hence, the time complexity is $O(mHK + mHK^2) = O(mHK^2) = O(nWHK^2)$, where n is the number of filters and W is the length of the longest prefix.

Figure 4-7 shows a possible 2DHSST for the 2DBT of Figure 4-5. The supernode strides used are K = S = 2. A 2DHSST may be searched for the least-cost filter that matches any given pair of destination and source addresses (da, sa) by following the search path for da in the destination HSST of the 2DHSST. All source tries encountered on this path are searched for sa. The least-cost filter on these source-trie search paths that matches sa is returned. Suppose we are to find the least-cost filter that matches (000, 111). The search path for 000 takes us first to the root (ab) of the 2DHSST of Figure 4-7 and then to the left child (dg). In the 2DHSST root, we go through nodes a and b of the dest binary trie and in the supernode dg, we go through nodes d and g of T. Three of the encountered nodes (a, b, and g) have a hanging source trie. The corresponding source HSSTs are searched for 111 and F2 is returned as the least-cost matching filter.

Figure 4-7. Two-dimensional supernode trie for 5 dest-source filters of Figure 4-1

To determine the number of memory accesses required by a search of a 2DHSST, we assume sufficient memory bandwidth that an entire supernode (this includes the IBM, EBM, child and NH pointers) may be accessed with a single memory reference. To access a component of the NH array, an additional memory access is required. For each supernode on the search path for da, we make one memory access to get the supernode's fields (e.g., IBM, EBM, child and NH pointers). In addition, for each supernode on this path, we need to examine some number of hanging source HSSTs. For each source HSST examined, we first access a component of the dest-trie supernode's NH array to get the root of the hanging source HSST. Then we search this hanging source HSST by accessing the remaining nodes on the search path (as determined by the source address) for this HSST. Finally, the NH component corresponding to the last node on this search path is accessed. So, in the case of our above example, we make 2 memory accesses to fetch the 2 supernodes on the dest HSST path. In addition, 3 source HSSTs are searched. Each [...]

$$\max_P\{H(P) + \mathit{nodes}(P)\} \le h \qquad (4\text{-}9)$$

Note that every U, $U \in$ 2DHSST(h), can be searched with at most h memory accesses per lookup. Note also that some 2DHSSTs that have a path P for which H(P) + nodes(P) = h can be searched with fewer memory accesses than h as there may be no (da, sa) that causes a search to take the longest path through every source HSST on paths P for which H(P) + nodes(P) = h. We consider the construction of a space-optimal 2DHSST V such that $V \in$ 2DHSST(H). We refer to such a V as a space-optimal 2DHSST(H). Let N be a node in T's top-level trie, and let 2DBT(N) be the 2-dimensional binary trie rooted at N. Let opt1(N, h) be the size (i.e., number of supernodes) of the space-optimal 2DHSST(h) for 2DBT(N). opt1(root(T), H) gives the size of a space-optimal 2DHSST(H) for T. Let g(N, q, h) be the size (excluding the root) of a space-optimal 2DHSST(h) for 2DBT(N) under the constraint that the root of the 2DHSST is a TBM supernode whose stride is q. So, g(N, S, h) + 1 gives the size of a space-optimal 2DHSST(h) for 2DBT(N) under the constraint that the root of the 2DHSST is a TBM supernode whose stride is S. We see that, for q > 0, [...]

[...] 4.3 to compute opt is $O(n^2 WHK^2)$ time. Using Equation 4-10 and previously computed g values, $O(H)$ time is needed to compute each g(·, ·, ·) value. Using Equation 4-11, each opt1(·, ·) value may be computed in $O(K)$ time. Using Equations 4-12 through 4-15, we can compute each opt1(·, ·, ·) value in $O(KH)$ time. Since there are $O(nWH)$ opt1(·, ·), $O(nWHK)$ opt1(·, ·, ·), and $O(nWSH)$ g(·, ·, ·) values to compute, the time to determine opt1(root(T), H) is $O(n^2 WHK^2 + nWHK + nWH^2 K^2 + nWSH^2) = O(n^2 WHK^2)$ (as, in typical applications, n > H).

[...] 4-5. Consider the dest-trie supernode ab of Figure 4-7. This supernode represents the subtrie of T that is comprised of the binary nodes a and b. A search in this subtrie has three exit points: left child of b, right child of b and right child of a. For the first two exit points, the source tries that hang off of a and b are searched whereas for the third exit point, only the source trie that hangs off of a is searched. We say that the first two exit points use the source tries that hang off of a and b while the third exit point uses only the source trie that hangs off of a. If the source trie that hangs off of b is augmented with the prefixes in the source trie that hangs off of a, then when the first two exit points are used, only the augmented source trie that hangs off of b need be searched. In prefix inheritance, each non-empty source trie in a partition is augmented with the prefixes in all source tries that hang off of ancestors in the partition. When this augmentation results in duplicate prefixes, the least-cost prefix in each set of duplicates is retained. The resulting augmented source tries are called exit tries. In a 2DHSST with prefix inheritance (2DHSSTP), prefix inheritance is done in each supernode. Figure 4-8 gives the 2DHSSTP for the 2DHSST of Figure 4-7.
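Prefix inheritance within one partition can be sketched as a root-first merge. The code below is an illustration with assumed names and an assumed representation: each source trie is flattened to a dictionary from source prefix (without the trailing *) to a (cost, filter) pair, and a partition is a root-first list of (node, parent within the partition) pairs. The data are the source tries that hang off of a, b and g in the running example.

```python
def inherit(partition, source_tries):
    """Return node -> exit trie for every node with a non-empty source trie.

    partition: root-first list of (node, parent in the partition or None).
    source_tries: node -> {source prefix: (cost, filter)}.
    """
    inherited = {}    # node -> merged prefixes on the path from the partition root
    exit_tries = {}
    for node, parent in partition:
        merged = dict(inherited.get(parent, {}))
        for prefix, entry in source_tries.get(node, {}).items():
            # On duplicate prefixes keep the least-cost entry.
            if prefix not in merged or entry < merged[prefix]:
                merged[prefix] = entry
        inherited[node] = merged
        if source_tries.get(node):
            exit_tries[node] = merged
    return exit_tries


def best_match(exit_trie, sa):
    """Least-cost entry whose prefix matches the source address sa, or None."""
    matches = [entry for prefix, entry in exit_trie.items() if sa.startswith(prefix)]
    return min(matches) if matches else None


SOURCE_TRIES = {
    "a": {"0": (1, "F1")},
    "b": {"1": (2, "F2")},
    "g": {"0": (3, "F3")},
}
exit_ab = inherit([("a", None), ("b", "a")], SOURCE_TRIES)
exit_dg = inherit([("d", None), ("g", "d")], SOURCE_TRIES)
print(exit_ab["b"], exit_dg["g"])
print(best_match(exit_ab["b"], "111"), best_match(exit_dg["g"], "111"))
```

After inheritance the exit trie of b holds both 0* (from a) and 1*, so a search leaving supernode ab through either child link of b needs to consult only that one trie.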


Figure 4-8. 2DHSSTP for Figure 4-7

Notice that to search a 2DHSSTP we need search at most one exit trie for each dest-trie supernode encountered: the last exit trie encountered in the search of the partition represented by that dest-trie supernode. So, when searching for (da, sa) = (000, 111), we search the exit tries that hang off of b and g for 111. The number of memory accesses is 2 (for the two supernodes ab and dg) + 2 (1 to access the supernode in each of the two source tries searched) + 2 (to access the NH arrays for the source trie supernodes) = 6. The same search using the 2DHSST of Figure 4-7 will search three source tries (those hanging off of a, b, and g) for a total cost of 8 memory accesses.

A node N in a dest-trie partition is a dominating node iff there is an exit trie on every path from N to an exit point of the partition. Notice that if N has two children, both of which are dominating nodes, then the exit trie (if any) in N is never searched. Hence, there is no need to store this exit trie.

We are interested in constructing a space-optimal 2DHSSTP that can be searched with at most H memory accesses. Although we have been unable to develop a good algorithm for this, we are able to develop a good algorithm to construct a space-optimal constrained 2DHSSTP for any 2DBT T. Note that the 2DHSSTP for T is comprised of supernodes for the dest-trie of T plus supernodes for the exit tries.
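The dominating-node definition lends itself to a short recursive check. The sketch below is an illustration only; the `PNode` class and both function names are hypothetical, and it assumes that every child slot of a partition node that is not occupied by another node of the same partition counts as an exit point, and that a path from N includes N itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PNode:
    """Binary dest-trie node inside one partition (hypothetical layout)."""
    has_exit_trie: bool = False
    left: Optional["PNode"] = None   # None means this slot is an exit point
    right: Optional["PNode"] = None


def is_dominating(n: PNode) -> bool:
    """True iff every path from n to an exit point passes an exit trie."""
    if n.has_exit_trie:
        return True
    if n.left is None or n.right is None:
        return False  # a path leaves the partition without meeting an exit trie
    return is_dominating(n.left) and is_dominating(n.right)


def exit_trie_is_redundant(n: PNode) -> bool:
    """The exit trie at n is never searched when both children dominate."""
    return (n.has_exit_trie
            and n.left is not None and n.right is not None
            and is_dominating(n.left) and is_dominating(n.right))


leaf_with_trie = PNode(has_exit_trie=True)
covered_root = PNode(True, PNode(True), PNode(True))
open_root = PNode(False, PNode(True), None)
```

Here `covered_root` holds an exit trie that could be dropped, while `open_root` is not dominating because a search can leave through its empty right slot.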


The h(N, p) values are computed easily during the computation of the s(N, p) values. x(N, h, k, p) is the size of a space-optimal 2DHSSTPC(N, h, k, p) under the added constraint that the root of the 2DHSSTPC(N, h, k, p) is a dominating node. We obtain recurrences for opt2(N, h, k, p) and x(N, h, k, p) by considering three cases for N. When N has no child (i.e., N is a leaf), where


4-16 and 4-17, we get

4.3, all s(*, *) and h(*, *) values may be computed in O(n²W²HK²) time. Following this computation, each ss(N) value may be computed in O(2^S) = O(K) time by traversing the first S levels of the subtree of T rooted at N. Thus all ss(*) values may be determined in O(nWK) additional time. As can be seen from Equation 4-25, O(K) time is needed to compute each opt2(*, *) value (assuming that the ss and opt2 terms on the right-hand side of the equation are known). It takes O(K) time to compute each opt2(*, *, *, *) and x(*, *, *, *) value. As there are O(nWH) opt2(*, *) values and O(nW²HK) opt2(*, *, *, *) and x(*, *, *, *) values, the total time complexity is O(n²W²HK² + nWK + nWHK + nW²HK²) = O(n²W²HK²).

4.6.1 HSSTs

If each supernode can be examined with a single memory access, then an HSST whose height is H (i.e., the number of levels is H + 1) may be searched for the next hop of the longest matching prefix by making at most H + 2 memory accesses. To get this performance, we must choose the supernode parameters K and S such that each type of supernode can be retrieved with a single access. As noted in Section 4.1.1, the size of a TBM node is 2^(S+1) + 2b - 1 bits and that of an SST node is 4K + 2b - 1 bits. An additional bit is needed for us to distinguish the two node types. So, any implementation of an HSST must allocate 2^(S+1) + 2b bits for a TBM node and 4K + 2b bits for an SST node. We refer to such an implementation as the base implementation of an HSST. Let B be the number of bits that may be retrieved with a single memory access and suppose that we use b = 20 bits for a pointer (as is done in [39]). When B = 72, our supernode parameters become


[39] have proposed an alternative implementation, called the prefix-bit implementation, for supernodes. This alternative implementation employs the prefix-bit optimization technique of Eatherton et al. [7]. An additional bit (called prefixBit) is added to each supernode. This bit is a 1 for a supernode N iff the search path through the parent supernode (if any) of N that leads us to N goes through a binary trie node that contains a prefix. With the prefixBit added to each supernode, we may search an HSST as follows:

(a) Move down the HSST keeping track of the parent, Z, of the most recently seen supernode whose prefixBit is 1. Do not examine the IBM of any node encountered in this step.
(b) Examine the IBM of the last supernode on the search path. If no matching prefix is found in this supernode, examine the IBM of supernode Z.

When prefix-bit optimization is employed, it is possible to have a larger K and S as the IBM (K or 2^S - 1 bits) and NH (b bits) fields of a supernode are not accessed (except in Step (b)). So, it is sufficient that the space needed by the remaining supernode fields be at most B bits. The IBM and NH fields may spill over into the next memory word. In other words, we select K and S to be the largest integers for which 3K + b + 1 ≤ B and 2^S + b + 2 ≤ B. When B = 72 and b = 20, we use K = 17 and S = 5; and when B = 64 and b = 20, we use K = 14 and S = 5. When the prefix-bit optimization scheme is employed, the number of memory accesses for a search is H + 4, as two additional accesses (relative to the base implementation) are needed to fetch the up to two IBMs and NH fields that may be needed in Step (b).

The additional access to the IBM of Z may be avoided using controlled leaf pushing, which is quite similar to the standard leaf pushing proposed in [6]. In each supernode, if its underlying subtree root doesn't contain a next hop, we store in this root the next hop of its nearest ancestor that contains a next hop. This way we neither need a prefixBit
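The parameter selection stated by the two pairs of inequalities (4K + 2b ≤ B and 2^(S+1) + 2b ≤ B for the base implementation; 3K + b + 1 ≤ B and 2^S + b + 2 ≤ B for the prefix-bit implementation) can be checked mechanically. The following Python sketch is an illustration only; the helper names are hypothetical and the linear scan is chosen for clarity rather than speed.

```python
def largest(pred, limit=1024):
    """Largest integer v in [1, limit] satisfying pred, or None if none does."""
    best = None
    for v in range(1, limit + 1):
        if pred(v):
            best = v
    return best


def base_parameters(B, b):
    """K and S for the base implementation: whole supernode fits in B bits."""
    K = largest(lambda k: 4 * k + 2 * b <= B)
    S = largest(lambda s: 2 ** (s + 1) + 2 * b <= B)
    return K, S


def prefix_bit_parameters(B, b):
    """K and S when the IBM and NH fields may spill into the next word."""
    K = largest(lambda k: 3 * k + b + 1 <= B)
    S = largest(lambda s: 2 ** s + b + 2 <= B)
    return K, S


prefix_bit_72 = prefix_bit_parameters(72, 20)
prefix_bit_64 = prefix_bit_parameters(64, 20)
base_72 = base_parameters(72, 20)
```

For the prefix-bit case the sketch reproduces the values quoted in the text (K = 17, S = 5 for B = 72 and K = 14, S = 5 for B = 64, both with b = 20).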


[7]. We permit four formats for a leaf supernode. Figure 4-9 shows these four formats for the base implementation. Each supernode (leaf or non-leaf) uses a bit to distinguish between leaf and non-leaf supernodes. Each leaf supernode uses two additional bits to distinguish among the four leaf formats while each non-leaf supernode uses an additional bit to distinguish between SST and TBM supernodes. The leaf supernodes are obtained by identifying the largest subtries of the binary trie T that fit into one of the four leaf-supernode formats. Notice that a leaf supernode has no child pointer. Consequently, in the SST format we may use a larger K than used for non-leaf supernodes and in the TBM format, a larger S may be possible.

Figure 4-9. Leaf supernode formats

The third format (SuffixA) is used when we may pack the prefixes in a subtrie into a single supernode. For this packing, let N be the root of the subtrie being packed. Then, Q(N) (the prefix defined by the path from the root of T to N) is the same for all prefixes in the subtrie rooted at N. Hence the leaf supernode need store only the suffixes obtained by deleting Q(N) from each prefix in ST(N). The leaf supernode stores the number of these suffixes, followed by pairs of the form (suffix length, suffix). In Figure 4-9, len(S1) is the length of the first suffix and S1 is the first suffix in the supernode. Leaf supernodes in the third format are searched by serially examining the suffixes stored in the node and comparing these with the destination address (after this is stripped of the prefix Q(N); this stripping may be done as we move from root(T) to N). For all ST(N)s that are represented by a leaf supernode, we set opt(N, h) = 1 for h ≥ 0. The dynamic programming recurrence of Section 4.3 is then used to determine opt(root(T), H).

The fourth format (SuffixB) is similar to the third leaf supernode format, but avoids an additional access to extract the next hop. When controlled leaf pushing is applied to the fourth format leaf supernode, the worst-case number of memory accesses required for a lookup may be reduced. Note that without controlled leaf pushing, if no matching prefix is found in a fourth format leaf supernode, we still need an additional access to extract the next hop associated with the longest matching prefix along the search path. For all ST(N)s that may be represented by a leaf supernode of the first three types, we set opt(N, h) = 1 for h ≥ 0 and for all ST(N)s that may be represented by a SuffixB


4.3 is then used to determine opt(root(T), H). Although we have described end-node optimization only for the base implementation, this technique may be applied to the prefix-bit implementation as well to reduce total memory requirement.

4.4 and 4.5 to construct space-optimal 2DHSSTs and 2DHSSTPCs. Following the construction, identify the parent dest-trie supernode for each leaf that was cut off. Second, in the case of 2DHSSTPCs, each source trie that hangs off of a leaf of the dest binary trie inherits the prefixes stored along the path, in the parent dest-trie supernode, to this leaf. Third, each cut-off leaf is replaced by the HSST for its source trie (this source trie includes the inherited prefixes of the second step in case of a 2DHSSTPC). The root of this HSST is placed as the appropriate child of the parent dest-trie supernode. (This requires us to use an additional bit to distinguish between dest-trie supernodes and source HSST roots.) By handling the leaves of the binary dest-trie as above, we eliminate the need to search the source tries that are on the path, in the dest-trie parent, to a leaf child.

Finally, for 2DHSSTPCs, we may reduce the time and space required to construct space-optimal structures by using an alternative definition of the p used in Section 4.5. In this new definition, prefix inheritance extends up to the p nearest ancestors of N in T that have a non-empty source trie. Since, on typical data sets, a dest-trie node has a small (say 3 or 4) number of ancestors that have non-empty source tries while the number of ancestors may be as large as 32 in IPv4 and 128 in IPv6, the new definition of p allows us to work with much smaller ps. This reduces the memory required by the arrays for


4.5 have to be modified to account for this change in definition. Note also that while the space required for minx(*, *, *, *) also is reduced, we may solve the recurrences of Section 4.5 without actually using such an array.

33, 39, 41], and 2DMTds and 2DMTSas in Chapter 2. The benchmarked algorithms seek to construct lookup structures that (A) minimize the worst-case number of memory accesses needed for a lookup and (B) minimize the total memory needed to store the constructed data structure. As a result, our experiments measured only these two quantities. Further, all test algorithms were run so as to generate a lookup structure that minimizes the worst-case number of memory accesses needed for a lookup; the size (i.e., memory required) of the constructed lookup structure was minimized subject to this former constraint.

For benchmarking purposes we assumed that the classifier data structure will reside on a QDRII SRAM, which supports both B = 72 bits (dual burst) and B = 144 bits (quad burst). For our experiments, we used b = 22 bits for a pointer (whether a child pointer or a pointer to a next-hop array) and 12 bits for each next hop. In the case of two-dimensional tables, we need to store the priority and action associated with a prefix. We allocate 18 bits for this purpose.


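[39] [41] to construct multi-way trees. Extensive experiments reported in [41] establish the superiority of V3MT, in terms of space and lookup efficiency, over other known schemes for space and time efficient representation of IP lookup tables. [39] establishes the superiority of BFP over TBM [7]. However, [39] did not compare BFP to V3MT.

IPv4 Router Tables

Table 4-2. Number of memory accesses required for a lookup in IPv4 tables (first six data columns: B = 72 bits; last six: B = 144 bits)

| Database | EP | EPO | EB | EBO | BFP | V3MT | EP | EPO | EB | EBO | BFP | V3MT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aads | 6 | 5 | 6 | 5 | 7 | 8 | 5 | 4 | 4 | 3 | 6 | 5 |
| MaeWest | 6 | 5 | 6 | 5 | 7 | 8 | 5 | 4 | 4 | 3 | 6 | 5 |
| RRC01 | 6 | 6 | 6 | 6 | 8 | 8 | 5 | 4 | 5 | 4 | 6 | 6 |
| RRC04 | 7 | 6 | 6 | 6 | 8 | 9 | 5 | 4 | 5 | 4 | 6 | 6 |
| AS4637 | 7 | 6 | 6 | 6 | 8 | 9 | 5 | 5 | 5 | 4 | 6 | 6 |
| AS1221 | 7 | 6 | 7 | 6 | 8 | 9 | 5 | 5 | 5 | 4 | 7 | 6 |

Table 4-3. Accesses for IPv4 data normalized by EBO data.

| Algorithm | Min | Max | Mean | Standard Deviation |
|---|---|---|---|---|
| EP | 1.00 | 1.67 | 1.27 | 0.20 |
| EPO | 1.00 | 1.33 | 1.10 | 0.15 |
| EB | 1.00 | 1.33 | 1.19 | 0.12 |
| EBO | 1 | 1 | 1 | 0 |
| BFP | 1.33 | 2.00 | 1.53 | 0.25 |
| V3MT | 1.33 | 1.67 | 1.53 | 0.09 |

[39] is flawed as during its generation of TBM nodes it doesn't check if part of the underlying full tree has already been pruned to construct another supernode. Our implementation fixes this flaw using a fix discussed with the authors of [39].

[41] for providing their code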


Figure 4-10. Number of memory accesses required by a lookup in IPv4 tables. A) B = 72 bits. B) B = 144 bits.

Table 4-4. Memory for IPv4 data normalized by EBO data.

| Algorithm | Min | Max | Mean | Standard Deviation |
|---|---|---|---|---|
| EP | 1.10 | 1.50 | 1.34 | 0.11 |
| EPO | 0.87 | 1.30 | 1.02 | 0.14 |
| EB | 1.05 | 1.35 | 1.23 | 0.08 |
| EBO | 1 | 1 | 1 | 0 |
| BFP | 1.31 | 1.79 | 1.61 | 0.15 |
| V3MT | 1.14 | 1.61 | 1.41 | 0.15 |

Table 4-2 shows the number of memory accesses required for a lookup in the data structure constructed by each of our algorithms (assuming the root is held in a register). Unlike the access counts reported in [39, 41], the numbers reported by us include an additional access needed to obtain the next hop for the longest matching prefix. Figure 4-10 plots this data. As can be seen, EBO results in the smallest access counts for all of our test sets; EPO ties with EBO on all of the six test sets when B = 72 (our other experiments with 9-bit next hop and 18-bit pointer indicate that EBO may require one memory access less than EPO when B = 72) and on 2 when B = 144. Tables 4-3 and ?? normalize the access count data by the counts for EBO and present the min, max, and standard deviation of the normalized count for the 6 data sets. The number of memory accesses for a lookup in the structure constructed by BFP ranges from 1.33 to 2.00 times


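Table 4-5. Total memory (KBytes) required by IPv4 tables (first six data columns: B = 72 bits; last six: B = 144 bits)

| Database | EP | EPO | EB | EBO | BFP | V3MT | EP | EPO | EB | EBO | BFP | V3MT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aads | 102 | 67 | 92 | 68 | 122 | 110 | 86 | 64 | 81 | 71 | 103 | 89 |
| MaeWest | 167 | 116 | 150 | 113 | 199 | 180 | 141 | 111 | 135 | 128 | 167 | 146 |
| RRC01 | 479 | 326 | 428 | 335 | 578 | 515 | 419 | 405 | 382 | 314 | 488 | 427 |
| RRC04 | 484 | 344 | 452 | 354 | 610 | 544 | 444 | 432 | 404 | 333 | 514 | 452 |
| AS4637 | 721 | 518 | 682 | 530 | 918 | 798 | 677 | 493 | 611 | 507 | 785 | 663 |
| AS1221 | 870 | 630 | 810 | 644 | 1108 | 981 | 823 | 596 | 750 | 635 | 957 | 819 |

Figure 4-11. Total memory (KBytes) required for IPv4 tables. A) B = 72 bits. B) B = 144 bits.

Table 4-5 shows the total memory required by the lookup structure constructed by each of our 6 algorithms. Figure 4-11 plots this data and Table ?? (B) presents statistics normalized by the data for EBO. As can be seen, EPO and EBO result in the least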


[39]), 42 for V3MT [41] and 27 for HSSTs (EBO). The corresponding numbers for the case when B = 144 are 41, 35, and 27.

Comparison With Other Succinct Representations

[6] have proposed a succinct router table structure called Lulea. This is a 3-level multibit trie. A lookup in Lulea requires 12 memory accesses. So, as far as lookup time goes, Lulea is inferior to all 6 of the structures we have considered here. Since we do not have the code for Lulea, we are able to do only an approximate memory comparison. [6] reports memory requirements for 6 databases, the largest of which has 38,141 prefixes and uses 34 bits of memory per prefix. Since the memory required per


Lunteren [25, 26] has proposed a succinct representation of a multibit trie using perfect hash functions: balanced routing table search (BARTs). Table 4-6 gives the memory requirement of BARTs 12-4-4-4-8, one of his two most memory efficient schemes (the other scheme is BARTs 8-4-4-4-4-8, which requires slightly less memory but two more accesses for a search). The number of memory accesses needed for a lookup is 9 in BARTs 12-4-4-4-8. By comparison, the lookup complexity for EBO with B = 72 is 5 or 6 accesses/lookup, and the total memory required is between 38% and 43% of the memory of BARTs 12-4-4-4-8.

We note that the implementation assumptions used by Lunteren [25] and us are slightly different. [25] allocates 18 bits for each pointer and next hop whereas we allocate 22 bits for a pointer and 12 for a next hop. The scheme of [25] requires pointers and next hops to be of the same size. In reality, the number of different next hops is small and 12 bits are adequate. On the other hand, for large databases, 18 bits may not be adequate for a pointer. Despite these minor differences, our experiments show that EBO is superior to the scheme of [25] on both lookup complexity and total memory required.
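The 38% to 43% range quoted above can be re-derived from the reported table values. The short Python check below is an illustration only: it copies the EBO column for B = 72 from the IPv4 total-memory table and the BARTs 12-4-4-4-8 row from the BART memory table, and the variable names are invented for the example.

```python
# Total memory in KBytes, in the order
# Aads, MaeWest, RRC01, RRC04, AS4637, AS1221.
ebo_b72 = [68, 113, 335, 354, 530, 644]            # EBO, B = 72 bits
barts_12_4_4_4_8 = [163, 262, 793, 844, 1270, 1685]  # BARTs 12-4-4-4-8, B = 32

# Memory of EBO as a fraction of the memory of BARTs, per database.
ratios = [e / b for e, b in zip(ebo_b72, barts_12_4_4_4_8)]

lowest_percent = round(100 * min(ratios))
highest_percent = round(100 * max(ratios))
```

The smallest ratio comes from AS1221 and the largest from MaeWest; rounded to whole percentages they match the range stated in the text.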


Table 4-6. Memory (KBytes) of BART search

| Database | Aads | MaeWest | RRC01 | RRC04 | AS4637 | AS1221 |
|---|---|---|---|---|---|---|
| BARTs 12-4-4-4-8 (B = 32) | 163 | 262 | 793 | 844 | 1270 | 1685 |
| BARTS 12-6-6-8 (B = 288) | 137 | 214 | 692 | 733 | 1149 | 1380 |

Lunteren [26] describes perfect-hash-function strategies for very wide memory: balanced routing table search (BARTS), B ≥ 288. Table 4-6 shows the memory requirement of his most memory efficient scheme, BARTS 12-6-6-8 with B = 288. The number of memory accesses needed for a lookup is 4. EBO with B = 144 achieves a lookup complexity of 3 or 4 accesses/lookup while requiring from 44% to 60% of the memory of BARTS 12-6-6-8.

IPv6 Router Tables

[44] to generate IPv6 tables from IPv4 tables. In this strategy, to each IPv4 prefix we prepend a 16-bit string comprised of 001 followed by 13 random bits. If this prepending doesn't at least double the prefix length, we append a sufficient number of random bits so that the length of the prefix is doubled. Following this prepending and possible appending, we drop the last bit from one-fourth of the prefixes so as to maintain the 3:1 ratio of even length prefixes to odd length observed in real router tables. Each synthetic table is given the same name as the IPv4 table from which it was synthesized. The AS1221-Telstra IPv6 table is named AS1221* to distinguish it from the IPv6 table synthesized from the IPv4 AS1221 table.

Table 4-7. Number of memory accesses required for a lookup in IPv6 tables (first six data columns: B = 72 bits; last six: B = 144 bits)

| Database | EP | EPO | EB | EBO | BFP | V3MT | EP | EPO | EB | EBO | BFP | V3MT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AS1221* | 7 | 6 | 7 | 7 | 8 | 7 | 5 | 4 | 4 | 3 | 6 | 5 |
| Aads | 7 | 5 | 7 | 5 | 9 | 9 | 5 | 4 | 5 | 4 | 6 | 6 |
| MaeWest | 8 | 5 | 8 | 5 | 9 | 9 | 6 | 4 | 5 | 4 | 7 | 6 |
| RRC01 | 8 | 6 | 8 | 6 | 9 | 10 | 6 | 4 | 5 | 4 | 7 | 7 |
| RRC04 | 8 | 6 | 8 | 6 | 10 | 10 | 6 | 4 | 5 | 4 | 7 | 7 |
| AS4637 | 8 | 6 | 8 | 6 | 10 | 10 | 6 | 5 | 6 | 4 | 7 | 7 |
| AS1221 | 9 | 6 | 9 | 7 | 10 | 11 | 6 | 5 | 6 | 4 | 7 | 7 |

Figure 4-12. Number of memory accesses required by a lookup in IPv6 tables. A) B = 72 bits. B) B = 144 bits.

Table 4-8. Accesses for IPv6 data normalized by EBO data.

| Algorithm | Min | Max | Mean | Standard Deviation |
|---|---|---|---|---|
| EP | 1.00 | 1.67 | 1.41 | 0.17 |
| EPO | 0.86 | 1.33 | 1.04 | 0.14 |
| EB | 1.00 | 1.60 | 1.33 | 0.14 |
| EBO | 1 | 1 | 1 | 0 |
| BFP | 1.14 | 2.00 | 1.66 | 0.21 |
| V3MT | 1.00 | 1.80 | 1.63 | 0.21 |

Table 4-9. Memory for IPv6 data normalized by EBO data.

| Algorithm | Min | Max | Mean | Standard Deviation |
|---|---|---|---|---|
| EP | 1.42 | 2.99 | 2.47 | 0.45 |
| EPO | 0.92 | 1.21 | 0.99 | 0.10 |
| EB | 1.35 | 2.16 | 1.97 | 0.26 |
| EBO | 1 | 1 | 1 | 0 |
| BFP | 1.82 | 3.17 | 2.71 | 0.42 |
| V3MT | 2.26 | 3.12 | 2.79 | 0.30 |

Table 4-10. Total memory (KBytes) required by IPv6 tables (AS1221* data is in bytes; first six data columns: B = 72 bits; last six: B = 144 bits)

| Database | EP | EPO | EB | EBO | BFP | V3MT | EP | EPO | EB | EBO | BFP | V3MT |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| AS1221* | 7255 | 4372 | 6574 | 4755 | 8634 | 10996 | 6043 | 4213 | 5760 | 4254 | 7729 | 9601 |
| Aads | 473 | 170 | 391 | 184 | 501 | 558 | 464 | 157 | 334 | 155 | 492 | 438 |
| MaeWest | 769 | 296 | 641 | 311 | 836 | 946 | 750 | 284 | 563 | 270 | 832 | 724 |
| RRC01 | 2581 | 974 | 2102 | 1046 | 2785 | 3267 | 2556 | 1107 | 1969 | 917 | 2782 | 2405 |
| RRC04 | 2744 | 1030 | 2234 | 1108 | 2969 | 3459 | 2712 | 1163 | 2098 | 971 | 2949 | 2562 |
| AS4637 | 4249 | 1617 | 3449 | 1752 | 4609 | 5445 | 4216 | 1461 | 3201 | 1503 | 4584 | 3957 |
| AS1221 | 5656 | 2182 | 4665 | 2252 | 6240 | 6982 | 5489 | 2005 | 4220 | 2168 | 6090 | 5516 |

Figure 4-13. Total memory (KBytes) required by IPv6 tables (AS1221* data is in bytes). A) B = 72 bits. B) B = 144 bits.

Tables 4-7 and 4-10 give the number of memory accesses and memory required by the search structures for our 7 IPv6 data sets. Figures 4-12 and 4-13 plot these data and Tables 4-8 and 4-9 give statistics normalized by the data for EBO. EPO and EBO are the best with respect to number of memory accesses. When B = 72, EPO was superior to EBO by 1 memory access on 2 of the 7 data sets and tied on the remaining 5. However, when B = 144, EBO was superior to EPO by 1 memory access on 3 of the 7 data sets and tied on the remaining 4. As with the IPv4 data, the memory utilization of the EBO structures is almost as good as that of the EPO structures (an average difference of 1%). Worst-case lookups in the constructed BFP structures require 1.14 to 2.00 times as many memory accesses as required in the EBO structures and the BFP structures require 1.82 to 3.17 times the memory required by the EBO structures.

As was the case for our IPv4 experiments, increasing B from 72 to 144 results in a reduction in the number of memory accesses required for a lookup. For EPO the maximum, minimum, and average reduction in the number of memory accesses were 33%, 17%, and 25%; the standard deviation was 8%. The corresponding percentages for EBO


[39] have proposed two techniques, child promotion and nearest-ancestor collapse, that may be used to reduce the number of nodes and number of prefixes in the one-bit binary trie. These techniques reduce the size of the one-bit binary trie as well as that of its compact representation. In child promotion, we promote the prefix stored in a binary node, if its sibling also contains a valid prefix, to the parent node. After the promotion, the node is deleted provided it is a leaf. In the nearest-ancestor collapse technique, which draws from [18], we eliminate the prefix stored in a node if its nearest ancestor contains a prefix with the same next hop; leaves are deleted if they become empty. We note that nearest-ancestor collapse is very similar to the port merge technique proposed by Sun et al. [41]. A port merge is used to reduce the number of endpoints by merging two consecutive destination-address intervals that have the same next hop.

In this section, we study the effect of child promotion and nearest-ancestor collapse on the succinct representations generated by EBO, BFP, and V3MT. For V3MT, we do a port merge on the intervals constructed from the optimized binary trie. For this experimental study, we are able to use only 3 of our IPv4 data sets (Aads, MaeWest, and AS1221) as these are the only data sets for which we have next-hop data.

Table 4-11. Total memory (KBytes) and number of memory accesses (MAs) required by IPv4 tables after optimizations (first six data columns: B = 72 bits; last six: B = 144 bits)

| Database | EBO Mem | EBO MAs | BFP Mem | BFP MAs | V3MT Mem | V3MT MAs | EBO Mem | EBO MAs | BFP Mem | BFP MAs | V3MT Mem | V3MT MAs |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Aads | 51 | 5 | 94 | 7 | 72 | 7 | 50 | 3 | 78 | 6 | 57 | 5 |
| MaeWest | 85 | 5 | 151 | 7 | 116 | 8 | 89 | 3 | 126 | 6 | 93 | 5 |
| AS1221 | 425 | 6 | 716 | 8 | 507 | 9 | 411 | 4 | 605 | 6 | 418 | 6 |
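Nearest-ancestor collapse, as described above, can be sketched on a one-bit binary trie as follows. This is an illustrative Python sketch, not the code used in the experiments: the `Node` class and the function names are hypothetical, and "nearest ancestor" is read here as the nearest ancestor that stores a prefix. Child promotion is not shown.

```python
class Node:
    """One-bit binary trie node; next_hop is None when no prefix ends here."""
    def __init__(self):
        self.next_hop = None
        self.child = [None, None]


def insert(root, prefix, hop):
    node = root
    for bit in prefix:
        i = int(bit)
        if node.child[i] is None:
            node.child[i] = Node()
        node = node.child[i]
    node.next_hop = hop


def lookup(root, addr):
    """Next hop of the longest matching prefix of addr (a bit string)."""
    node, best = root, root.next_hop
    for bit in addr:
        node = node.child[int(bit)]
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop
    return best


def collapse(node, inherited=None):
    """Drop prefixes whose nearest prefix-holding ancestor has the same
    next hop; delete nodes left empty. Returns False if node can be deleted."""
    if node.next_hop is not None:
        if node.next_hop == inherited:
            node.next_hop = None
        else:
            inherited = node.next_hop
    for i in (0, 1):
        c = node.child[i]
        if c is not None and not collapse(c, inherited):
            node.child[i] = None
    return node.next_hop is not None or any(c is not None for c in node.child)


def count_nodes(node):
    return 0 if node is None else 1 + sum(count_nodes(c) for c in node.child)


root = Node()
for p, hop in [("", "A"), ("0", "A"), ("01", "B"),
               ("011", "B"), ("1", "C"), ("10", "C")]:
    insert(root, p, hop)
addresses = [format(i, "03b") for i in range(8)]
before = {a: lookup(root, a) for a in addresses}
nodes_before = count_nodes(root)
collapse(root)
after = {a: lookup(root, a) for a in addresses}
nodes_after = count_nodes(root)
```

On this small example the prefixes 0, 011, and 10 are eliminated and two leaves disappear, while every lookup still returns the same next hop as before the collapse.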


Table 4-11 gives the total memory requirement and memory accesses needed for a lookup. EBO remains the best succinct representation method on both the number of memory accesses measure and the total memory measure. On 2 of our 18 tests (BFP on AS1221 with B = 144 and V3MT on Aads with B = 72), the number of memory accesses required for a lookup is reduced by 1. For the remaining 16 tests, there is no change in the number of accesses required for a lookup. The application of the child promotion and nearest-ancestor collapse optimizations reduces the total memory required by the succinct representations of the binary trie. For EBO, the reduction varies from 24% to 35% with the mean reduction being 29%; the standard deviation is 5%. For BFP, these percentages were 23%, 37%, 28%, and 6%. These percentages for V3MT were 34%, 49%, 40%, and 7%. Our experiments indicate that most of the reduction in memory requirement is due to the nearest-ancestor collapse optimization. Child promotion contributed around 1% of the memory reduction. The memory required by the BFP structures normalized by that required by the EBO structures was between 1.41 and 1.82, with the mean and standard deviation being 1.62 and 0.17. The corresponding ratios for V3MT were 1.02, 1.40, 1.19, and 0.16.

[42]. Each of these data sets actually has 10 different databases of rules. So, in all, we have 120 databases of 5-dimensional rules. The data sets, which are named ACL1 through ACL5 (Access Control List), FW1 through FW5 (Firewall), IPC1 and IPC2 (IP Chain) have, respectively, 20K, 19K, 19K, 19K, 12K, 19K, 19K, 18K, 17K, 17K, 19K, and 20K rules, on average, in each database. Our 2-dimensional data sets, which were derived from these 5-dimensional data sets, have, respectively, 20K, 19K, 10K, 13K, 5K, 19K, 19K, 18K, 17K, 17K, 16K, and 20K rules on average in each database. The 2-dimensional rules were obtained from our 5-dimensional rules by stripping off the source


Table 4-12 and Figure 4-14 show the results from our experiment. For 5 of our 12 data sets (ACL2-5 and IPC1), 2DHSSTPCs reduce the number of accesses at the expense of increased memory requirement. For the remaining data sets, 2DHSSTPCs and 2DHSSTs require almost the same number of accesses and the same amount of memory. Across all our data sets, 2DHSSTPCs required between 0% and 29% more memory than required by 2DHSSTs (the mean increase in memory required was 6% and the standard deviation was 9%). As noted earlier, although 2DHSSTPCs required more memory, they required a smaller number of memory accesses for a lookup. The reduction in number of memory accesses afforded by 2DHSSTPCs was between 0% and 41% (the mean reduction was 11% and the standard deviation was 13%).

When B is increased from 72 to 144, for both 2DHSSTs and 2DHSSTPCs, the number of memory accesses required is reduced, but the total memory required is generally increased. For 2DHSSTs, the total memory required when B = 144 normalized by that required when B = 72 is between 0.98 and 1.50 (the mean and the standard deviation are 1.21 and 0.19); the number of memory accesses reduces by between 28% and 41% (the mean reduction is 30% and the standard deviation is 9%). For 2DHSSTPCs, the normalized memory requirement is between 1.04 and 1.49 (the mean and standard deviation are 1.23 and 0.16); the reduction in number of memory accesses ranges from 18% to 56% (the mean reduction and the standard deviation are 31% and 11%).

Since our primary objective is to reduce the number of memory accesses, we use 2DHSSTPCs with B = 144 for further benchmarking with the 2DMTSas and 2DMTds of Chapter 2. The 2DMTSas and 2DMTds employed by us used the compression techniques


Table 4-12. Total memory (KBytes) and number of memory accesses (MAs) required by 2DHSSTs and 2DHSSTPCs (first four data columns: B = 72 bits; last four: B = 144 bits)

| Data Set | 2DHSST Mem | 2DHSST MAs | 2DHSSTPC Mem | 2DHSSTPC MAs | 2DHSST Mem | 2DHSST MAs | 2DHSSTPC Mem | 2DHSSTPC MAs |
|---|---|---|---|---|---|---|---|---|
| ACL1 | 329 | 9 | 333 | 9 | 479 | 6 | 479 | 6 |
| ACL2 | 256 | 13 | 264 | 12 | 379 | 9 | 388 | 8 |
| ACL3 | 102 | 26 | 119 | 18 | 102 | 16 | 131 | 11 |
| ACL4 | 130 | 34 | 148 | 20 | 131 | 20 | 154 | 12 |
| ACL5 | 40 | 17 | 45 | 16 | 39 | 10 | 50 | 7 |
| FW1 | 236 | 11 | 236 | 11 | 274 | 8 | 274 | 8 |
| FW2 | 261 | 11 | 261 | 11 | 392 | 8 | 390 | 8 |
| FW3 | 210 | 11 | 210 | 11 | 260 | 8 | 260 | 8 |
| FW4 | 203 | 11 | 203 | 11 | 237 | 8 | 237 | 8 |
| FW5 | 216 | 11 | 216 | 11 | 274 | 9 | 273 | 9 |
| IPC1 | 174 | 20 | 190 | 17 | 189 | 12 | 211 | 11 |
| IPC2 | 265 | 10 | 265 | 10 | 300 | 8 | 300 | 8 |

Figure 4-14. Total memory (KBytes) and number of memory accesses required by 2DHSSTs and 2DHSSTPCs. A) Memory. B) Accesses.

[38] and butler node [17]. These two techniques are very similar; both attempt to replace a subtrie with a small amount of actual data (prefixes and pointers) by a single node that contains these data. We note that 2DMTds and 2DMTSas are the best of the structures developed in Chapter 2, and using these two compression techniques, Chapter 2 has established the superiority of 2DMTds and 2DMTSas over other competing packet classification structures such as Grid-of-Tries [37], EGT-PCs [2], and HyperCuts [33].

For this further benchmarking, we constructed space-optimal 2DHSSTPCs with the minimum possible number, H, of memory accesses for a worst-case search. This minimum H was provided as input to the 2DMTSa (2DMTd) algorithm to construct a 2DMTSa (2DMTd) that could be searched with H memory accesses in the worst case. Because of this strategy, the worst-case number of memory accesses for 2DHSSTPCs and 2DMTSas (2DMTds) is the same.

Figure 4-15. Total memory (KBytes) required by 2DHSSTPCs, 2DMTSas, and 2DMTds

Figure 4-15 plots the memory required by 2DHSSTPCs, 2DMTds, and 2DMTSas. We see that, on the memory criterion, 2DHSSTPCs outperform 2DMTSas by an order of magnitude, and outperform 2DMTds by an order of magnitude on 4 of our 12 data sets. The memory required by 2DMTds normalized by that required by 2DHSSTPCs is between 1.14 and 624, the mean and standard deviation being 56 and 179. The normalized numbers for 2DMTSas were 9, 49, 17, and 11. We also observed that when 2DMTds are given up to 60% more memory than required by space-optimal 2DHSSTPCs with the minimum possible H, we can construct 2DMTds that can be searched with 1 or 2 fewer accesses for our data sets FW1-5 and IPC2.


[2]. We start with a 2-dimensional trie for the destination and source prefixes. All rules that have the same dest-source prefix pair (dp, sp) are placed in a bucket that is pointed at from the appropriate source trie node of the 2-dimensional trie. Since dp and sp are defined by the path to this bucket, the dest and source prefix fields are not stored explicitly in a bucket. However, the source port range, dest port range, protocol type, priority, and action are stored for each rule in the bucket. The 2DHSSTPC algorithms of this paper are used to obtain a supernode representation of the 2-dimensional trie and the NH lists of next-hop data are comprised of buckets. We modify SuffixB nodes so that they contain source prefix suffixes, dest and source ports, protocols, priorities, and actions rather than just source prefix suffixes, priorities, and actions.

During prefix inheritance in 2DHSSTPCs, a source trie may inherit prefixes, from its ancestor tries, that already are in that source trie. When this happens, the rules associated with these inherited prefixes need also to be stored in this source trie. To avoid this redundancy, we store a pointer in the bucket associated with a source-trie prefix, which points to the bucket associated with the same prefix in the nearest ancestor source trie. 2DHSSTPCs with buckets are called extended 2DHSSTPCs. Unlike 2DHSSTs, we do not modify the source tries of an extended 2DHSSTPC so that the last source prefix seen on a search path has highest priority (or least cost).

Florin et al. [2] state that when 2-dimensional tries with buckets are used, as above, for 5-dimensional tables, most buckets have no more than 5 rules and no bucket has more than 20 rules. While this observation was true of the data sets used in [2], some buckets had significantly more rules for our data sets. For example, in FW4, about 100 rules contain wildcards in both the dest and source prefix fields. These rules may be removed from the original data set and stored in a search structure that is optimized for the remaining 3 fields. We note that this strategy of storing a large cluster of rules with


[33] scheme. The data reported in the following figures and tables are only for structures constructed for the rules that remain after rules with wildcards in both dest and source prefix fields are removed.

Table 4-13. Total memory (KBytes), bits/rule, and number of memory accesses required by extended 2DHSSTPCs on 5-dimensional data sets.

| Data Set | Mem | MAs | bits/rule |
|---|---|---|---|
| ACL1 | 480 | 6 | 196 |
| ACL2 | 392 | 12 | 169 |
| ACL3 | 181 | 35 | 78 |
| ACL4 | 252 | 32 | 108 |
| ACL5 | 87 | 20 | 59 |
| FW1 | 282 | 8 | 121 |
| FW2 | 391 | 8 | 168 |
| FW3 | 264 | 8 | 120 |
| FW4 | 302 | 9 | 145 |
| FW5 | 280 | 9 | 135 |
| IPC1 | 244 | 23 | 105 |
| IPC2 | 326 | 8 | 133 |

Figure 4-16. Total memory (KBytes) and number of memory accesses required by 2DHSSTPCs and extended 2DHSSTPCs. A) Memory. B) Accesses.

Table 4-13 gives the total memory and number of memory accesses required by extended 2DHSSTPCs on our twelve 5-dimensional data sets. Figure 4-16 compares 2DHSSTPCs (these, of course, store only the derived 2-dimensional rules) with extended


Table 4-14. Total memory (KBytes), bits/rule, and number of memory accesses required by HyperCuts on 5-dimensional data sets.

| Data Set | Mem | MAs | bits/rule |
|---|---|---|---|
| ACL1 | 605 | 16 | 242 |
| ACL2 | 10487 | 24 | 4415 |
| ACL3 | 19591 | 43 | 8248 |
| ACL4 | 17661 | 44 | 7436 |
| ACL5 | 600 | 44 | 400 |
| FW1 | 308121 | 26 | 129735 |
| FW2 | 189751 | 24 | 79895 |
| FW3 | 341813 | 23 | 151916 |
| FW4 | 199096 | 24 | 93692 |
| FW5 | 347478 | 20 | 163519 |
| IPC1 | 38863 | 51 | 16363 |
| IPC2 | 64394 | 24 | 25757 |

Figure 4-17. Total memory (KBytes) and number of memory accesses required by HyperCuts and extended 2DHSSTPCs. A) Memory. B) Accesses.

HyperCuts [33], which is one of the previously best known algorithmic schemes for multidimensional packet classification, uses a decision tree and rules are stored in buckets of bounded size; each bucket is associated with a tree node. Unlike the bucket scheme used by extended 2DHSSTPCs, in which the dest and source prefixes are not stored explicitly, the bucket scheme of HyperCuts requires the storage of these fields as well as those stored in extended 2DHSSTPC buckets. So, the storage of an individual rule in HyperCuts requires more space than is required in extended 2DHSSTPCs. Additionally, in HyperCuts, a rule may be stored in several buckets whereas in extended 2DHSSTPCs, each rule is stored in exactly 1 bucket. The most efficient HyperCuts scheme reported in [33] is HyperCuts-4. We use this scheme for comparison with extended 2DHSSTPCs.

Table 4-14 shows the total memory and number of memory accesses required by HyperCuts [33] on our twelve 5-dimensional data sets. The number of bits per rule required by the HyperCuts structure was between 242 and 163,519; the average was 56,801. It is important to note that there is wide variation in the bits/rule required by HyperCuts; the bits/rule required by extended 2DHSSTPCs is far more predictable. In particular, [33] reports that the performance of HyperCuts is not good for firewall-like databases as these tend to have a high frequency of wildcards in the source and/or dest fields. In fact, [33] reports that a 10% presence of wildcards in either the source or dest prefix fields resulted in a steep increase in memory requirement! This observation is confirmed by our experiments. HyperCuts exhibited its best bits/rule performance on


Figure 4-17 compares extended 2DHSSTPCs and HyperCuts. The structure constructed by extended 2DHSSTPCs required between 0.1% and 79% of the memory required by that constructed by HyperCuts; the average and standard deviation being 8% and 23%, respectively. The number of accesses for a lookup in the extended 2DHSSTPCs structure was between 31% and 81% of that required by the HyperCuts structure; the average and standard deviation were 46% and 16%, respectively. For both schemes, the reported memory and accesses are only for the rules that remain after rules with wildcards in both dest and source prefix fields are removed. Since, in extended 2DHSSTPCs, no rule is stored twice while the same rule may be stored in several HyperCuts buckets (depending on the complexity of the rule set), the memory requirement of 2DHSSTPCs is better predicted and far less on average and worst-case data.

[39] to O(m), where m is the number of nodes in the input binary trie. Additionally, we have developed dynamic programming formulations for the construction of space-optimal HSSTs and good 2DHSSTs and 2DHSSTPCs. Our experiments indicate that for IPv4 data sets, our EBO structures require between 25% and 50% fewer memory accesses for a lookup than


[39]. Additionally, our EBO structures require between 24% and 44% less memory. Compared to the structures produced by the V3MT algorithm of [41], our EBO structures require between 25% and 40% fewer memory accesses and between 12% and 38% less memory. The memory access and storage requirements of EBO also are superior to those of Lulea [6] and the perfect-hash-function schemes of [25, 26]. Similar improvements were provided by EPO and EBO on IPv6 data.

For two-dimensional IPv4 data sets, 2DHSSTPCs result in fewer memory accesses but a larger memory requirement than do 2DHSSTs. Since memory accesses per lookup is the primary optimization criterion, we recommend 2DHSSTPCs over 2DHSSTs. Given the same budget for the number of memory accesses, 2DHSSTPCs require between 0.2% and 88% of the memory required by the 2DMTds of Chapter 2, and between 2% and 11% of the memory required by the 2DMTSas of Chapter 2. For 5-dimensional classifiers, extended 2DHSSTPCs require significantly less memory and significantly fewer memory accesses, both on average and in the worst case, than required by HyperCuts [33].


2.2 or a hybrid shape shifting trie (HSST) in Chapter 4. The partitioning results in an overall reduction in the number of memory accesses needed for a lookup and a reduction in the total memory required. Section 5.1 reviews related work on router-table partitioning and Section 5.2 describes our partitioning method. Experimental results are presented in Section 5.3.

Lampson et al. [17] propose a partitioning scheme for static router-tables. This scheme employs a front-end array, partition, to partition the prefixes in a router table based on their first s bits. Prefixes that are longer than s bits and whose first s bits correspond to the number i, 0 ≤ i < 2^s, are stored in a bucket partition[i].bucket using any data structure (e.g., multibit trie) suitable for a router-table. Further, partition[i].lmp, which is the longest matching-prefix in the database for the binary representation of i (note that the length of partition[i].lmp is at most s), is precomputed from the given prefix set. For any destination address d, lmp(d) is determined as follows:

(a) Let i be the integer whose binary representation equals the first s bits of d. Let V equal NULL if no prefix in partition[i].bucket matches d; otherwise, let V be the longest prefix in partition[i].bucket that matches d.
(b) If V is NULL, lmp(d) = partition[i].lmp. Otherwise, lmp(d) = V.

Note that the case s = 0 results in a single bucket and, effectively, no partitioning. As s is increased, the average number of prefixes per bucket as well as the maximum number in any bucket decreases. Although the worst-case time to find lmp(d) decreases as we increase s, the storage needed for the array partition[] increases with s and quickly becomes impractical. Lampson et al. [17] recommend using s = 16. This recommendation results in 2^s = 65,536 buckets. For practical router-table databases that may have up to a few hundred thousand rules, s = 16 results in buckets that have at most a few hundred
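Steps (a) and (b) of the front-end array scheme can be sketched as follows. This Python sketch is an illustration under simplifying assumptions, not the implementation of [17]: prefixes and addresses are bit strings, each bucket is a plain list searched linearly (the text allows any router-table structure here), and the names `build_partition` and `lmp_lookup` are invented for the example.

```python
def build_partition(prefixes, s):
    """Build the buckets and the precomputed lmp array for stride s."""
    size = 1 << s
    buckets = [[] for _ in range(size)]
    lmp = []
    for i in range(size):
        key = format(i, "b").zfill(s) if s > 0 else ""
        # Longest prefix of length at most s that matches the s bits of i.
        short = [p for p in prefixes if len(p) <= s and key.startswith(p)]
        lmp.append(max(short, key=len) if short else None)
    for p in prefixes:
        if len(p) > s:  # only prefixes longer than s bits go into buckets
            buckets[int(p[:s], 2) if s > 0 else 0].append(p)
    return buckets, lmp


def lmp_lookup(d, s, buckets, lmp):
    """Longest matching prefix of address d, following steps (a) and (b)."""
    i = int(d[:s], 2) if s > 0 else 0
    matches = [p for p in buckets[i] if d.startswith(p)]   # step (a)
    if matches:
        return max(matches, key=len)
    return lmp[i]                                          # step (b)


prefixes = ["", "0", "0101", "01011", "11"]
buckets, lmp = build_partition(prefixes, 2)
```

With s = 2 the only non-empty bucket is the one for bits 01; an address such as 01001111 finds no match there and falls back to the precomputed prefix 0.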

Lu, Kim and Sahni [20] have proposed partitioning schemes for dynamic router-tables. While these schemes are designed to keep the number of memory accesses required for an update at an acceptable level, they may increase the worst-case number of memory accesses required for a lookup and also increase the total memory required to store the structure. Of the schemes proposed by Lu, Kim and Sahni [20], the two-level dynamic partitioning scheme (TLDP) works best for average-case performance. TLDP, like the scheme of Lampson et al. [17], employs a front-end array partition with partition[i].bucket, $0 \le i < 2^s$, storing all prefixes whose length is at least s and whose first s bits correspond to i. Unlike the scheme of Lampson et al. [17], however, prefixes whose length is less than s are stored in an auxiliary structure X and there is no precomputation of a quantity such as partition[i].lmp. The prefixes in X are themselves partitioned using t

20] show that the TLDP scheme leads to reduced average search and update times as well as to a reduction in memory requirement over the case when the tested base schemes are used with no partitioning.

5.2.1 Basic Strategy

In recursive partitioning, we start with the binary trie T (Figure 5-1(a)) for our prefix set and select a stride, s, to partition the binary trie into subtries. Let $D_l(R)$ be the level l (the root is at level 0) descendants of the root R of T. Note that $D_0(R)$ is just R and $D_1(R)$ comprises the children of R. When the trie is partitioned with stride s, each subtrie, ST(N), rooted at a node $N \in D_s(R)$ defines a partition of the router table. Note that 0

5-1(B)). The bit strings Q(N), $N \in D_s(R)$, define the keys used to index into the hash table. Although any perfect hash function for this set of keys may be used, we use the perfect hash function defined by Lunteren [25, 26]. We note that when s = T.height + 1, the hash table is empty and L(R) = T. In this case, T is simply represented by a base structure such as MBT or HSST. When s
Figure 5-2. Hash table entry types

Figure 5-3 gives the algorithm to do a lookup in a router table that has been partitioned using the basic strategy. The algorithm assumes that at least one level of partitioning has been done. The initial invocation specifies, for the first-level partitioning, the stride s, address of first hash table entry, ht, and perfect hash function h (specified by its mask).

38]. In controlled leaf pushing, every base structure that does not have a (stripped) prefix of length 0 is given a length 0 prefix whose next hop is the same as that of the longest prefix that matches the bits stripped from all prefixes in that partition. So, for example, suppose we have a base structure whose stripped prefixes are 00, 01, 101 and 110. All 4 of these prefixes have had the same number of bits (say 3) stripped from their left end. The stripped 3 bits are the same for all 4 prefixes. Suppose that the stripped bits are 010. Since the partition does not have a length 0 prefix, it inherits a length 0 prefix whose next hop corresponds to the longest of *, 0, 01 and 010 that is in the original set of prefixes. Assuming that the original prefix set contains the default prefix, the stated inheritance ensures that every search in a partition finds a

Figure 5-3. Searching with basic strategy

matching prefix and hence a next hop. So, the lookup algorithm takes the form given in Figure 5-4.
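The inheritance rule of controlled leaf pushing can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: prefixes are modeled as bit strings (the empty string stands for the default prefix *), and the function names, the sample router table, and the next-hop labels are all hypothetical.

```python
def inherited_prefix(stripped_bits, original):
    """Longest prefix of `stripped_bits` (including the empty default
    prefix) that is present in the original prefix set, or None."""
    for length in range(len(stripped_bits), -1, -1):
        candidate = stripped_bits[:length]
        if candidate in original:
            return candidate
    return None


def leaf_push(stripped_bits, partition, original):
    """Return a copy of `partition` that is guaranteed a length-0 prefix
    whenever the original set holds some prefix of `stripped_bits`."""
    pushed = dict(partition)
    if "" not in pushed:
        source = inherited_prefix(stripped_bits, original)
        if source is not None:
            pushed[""] = original[source]  # inherit that prefix's next hop
    return pushed


# Hypothetical original table: bit string -> next hop.
original = {"": "H0", "01": "H1", "01000": "H2", "01001": "H3",
            "010101": "H4", "010110": "H5"}
# Stripped prefixes 00, 01, 101, 110 after removing the common bits 010.
partition = {"00": "H2", "01": "H3", "101": "H4", "110": "H5"}

pushed = leaf_push("010", partition, original)
print(pushed[""])  # next hop inherited from the longest of *, 0, 01, 010
```

In this made-up table the candidates 010 and 0 are absent, so the partition inherits the next hop of 01; a search that matches none of the four stripped prefixes still returns a next hop.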

Figure 5-4. Searching with leaf pushing version A

representation of the subtrie defined by levels 0 through l of ST(N). From the definition of recursive partitioning, the choices for l in C(N, l, r) are 1 through N.height + 1. When l = N.height + 1, ST(N) is represented by the base structure. So, from the definition of recursive partitioning, it follows that

(5-1)

The recurrence in Equations 5-1 and 5-2 assumes that no memory access is needed to determine whether the entire router table has been stored as a base structure. Further, in case the router table has been partitioned then no memory access is needed to determine

(5-3)

(5-4)

Recurrences for B may be obtained from Sahni and Kim [32] for fixed- and variable-stride MBTs and Chapter 4 for HSSTs. Our experiments with real-world router tables indicate that when auxiliary partitions are restricted to be represented by base structures, the memory requirement is reduced. With this restriction, the dynamic programming recurrence becomes

Now, the second parameter l of C(N, l, r) always is N.height and so this second parameter may be dropped. Further optimization is possible by permitting the method used to keep track of partitions to be either a hash table plus an auxiliary structure for prefixes whose length is less than the stride or a simple array with $2^l$ entries when the partition stride is l (this latter strategy is identical to that used by Lampson et al. [17]

where c is the memory required by each position of the front-end array. Again, the second parameter in C may be dropped. Notice that the inclusion of front-end arrays as a mechanism to keep track of partitions requires us to add a fifth entry type (011) for hash table entries. This fifth type, which indicates a partition represented using a front-end array, includes a field for the key Q(N), another field for the stride of the next-level partition, and a pointer to the next-level front-end array. Note also that while our discussion may have given the impression that all base structures in a recursively partitioned router table must be of the same type (i.e., all are MBTs or all are HSSTs), it is possible to solve the dynamic programming recurrences allowing a mix of base structures.

17], one-level prefix partitioning [20], interval partitioning [20]) or two (two-level prefix partitioning) levels of partitioning. The strides of the partitioning tables are fixed (e.g., the front-end array of [17] uses a stride of 16) and not data dependent. Recursive partitioning permits multilevel partitioning with strides determined by a dynamic programming recurrence to optimize memory utilization.
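For contrast with recursive partitioning, the fixed-stride front-end array of [17] reviewed in Section 5.1 can be sketched in a few lines of Python. This is an illustrative sketch only: prefixes are bit strings, each bucket is a plain dictionary scanned linearly (standing in for "any data structure suitable for a router-table"), and the sample prefixes, next hops, and function names are hypothetical.

```python
def build_partition(prefixes, s):
    """Build the front-end array for stride s.

    prefixes maps a bit-string prefix to its next hop.  Returns
    (buckets, lmp): buckets[i] holds the prefixes longer than s bits whose
    first s bits equal i; lmp[i] is the longest prefix of length <= s that
    matches the s-bit representation of i (or None)."""
    size = 1 << s
    buckets = [dict() for _ in range(size)]
    lmp = [None] * size
    for i in range(size):
        bits = format(i, "0{}b".format(s)) if s > 0 else ""
        for length in range(s, -1, -1):
            if bits[:length] in prefixes:
                lmp[i] = bits[:length]
                break
    for p, hop in prefixes.items():
        if len(p) > s:
            index = int(p[:s], 2) if s > 0 else 0
            buckets[index][p] = hop
    return buckets, lmp


def lookup(d, buckets, lmp, s):
    """Steps (a) and (b): search the bucket, then fall back to lmp."""
    i = int(d[:s], 2) if s > 0 else 0
    best = None
    for p in buckets[i]:
        if d.startswith(p) and (best is None or len(p) > len(best)):
            best = p
    return best if best is not None else lmp[i]


# Hypothetical prefix set and stride.
prefixes = {"": "A", "0": "B", "01": "C", "0110": "D", "01101": "E", "11": "F"}
buckets, lmp = build_partition(prefixes, 2)
print(lookup("01101000", buckets, lmp, 2))  # matched inside bucket 1
print(lookup("01000000", buckets, lmp, 2))  # falls back to lmp[1]
```

The array grows as $2^s$ regardless of the data, which is the cost that data-dependent strides are meant to control.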

32, 38].

4 as these were shown to be the most efficient router-table structure for static router tables. Non-leaf EBO nodes have child pointers and some EBO leaf nodes have pointers to next-hop arrays. For child pointers we allocated 10 bits. This allows us to index 1024 nodes. We modified the dynamic programming equations developed in Chapter 4 for the construction of optimal EBOs so that EBOs that require

5.2.5 were compiled using the Microsoft Visual C++ compiler with optimization level O2 and run on a 3.06 GHz Pentium 4 PC. Our recursive partitioning scheme was compared against a one-level partitioning scheme, OLP, which is a generalization of the front-end array of Lampson et al. [17], and a non-partitioned EBO. OLP does only one level of partitioning (as does

17]) and uses EBO as the base structure. However, unlike [17], which fixes the size of the front-end array to $2^{16}$, OLP selects an optimal, data-dependent size for the front-end array. Specifically, OLP tries out front-end arrays of size 0 and $2^i$, $1 \le i \le 24$, and determines those sizes that minimize the worst-case number of memory accesses for a lookup; from these sizes, the size that minimizes total memory is selected. Note that using a front-end array of size 0 is equivalent to using no front-end array. We found OLP to be superior, on our data sets, to simply limiting our recursive partitioning scheme so as to partition only at the root level. We did not compare with the partitioning schemes of [20], because these schemes improve average performance at the expense of worst-case performance and our focus in this paper is worst-case performance. The schemes of [20] result in increased memory requirement and worst-case number of memory accesses for a search relative to the base structure. All of our programs were written so as to construct lookup structures that (A) minimize the worst-case number of memory accesses needed for a lookup and (B) minimize the total memory needed to store the constructed data structure. As a result, our experiments measured only these two quantities. Further, all test algorithms were run so as to generate a lookup structure that minimizes the worst-case number of memory accesses needed for a lookup; the size (i.e., memory required) of the constructed lookup structure was minimized subject to this former constraint.

Table 5-1 shows the number of memory accesses and memory requirement for the tested lookup structures. RP(k) (k = 4, 5) denotes the space-optimal recursively partitioned structure that requires at most k memory accesses per search. Figures 5-5 and 5-6 plot this data. As can be seen, on the memory access count, RP(4) is superior to EBO

Figure 5-5. Memory accesses required for a lookup in IPv4 tables

Figure 5-6. Total memory required for IPv4 tables

Table 5-1. Memory accesses and total memory (KBytes) required for IPv4 tables

            36-bit entries    72-bit entries    OLP                 EBO
Database    RP(4)    RP(5)    RP(4)    RP(5)    Accesses  Memory    Accesses  Memory
Aads           77       59       90       61       4        141        5         68
MaeWest       124       98      143      100       4        186        5        113
RRC01         392      300      442      309       4        507        6        335
RRC04         417      318      474      327       4       2687        6        354
AS4637        591      473      669      485       4        717        6        530
AS1221        861      611     1080      634       5       3041        6        644

on all 6 data sets by 1 or 2 accesses. OLP is superior to EBO by 1 access on 3 of our data sets and by 2 accesses on the remaining 3 data sets; RP(5) is superior to EBO by 1 access on 4 of the 6 data sets. OLP required one more access than RP(4) on the largest data set (AS1221) and tied with RP(4) on the remaining 5. On all of our test sets, the 36-bit implementation required less memory than required by the corresponding 72-bit implementation. In fact, the 36-bit implementation required between 80% and 98% of the memory required by the 72-bit implementation; the average is 92% and the standard deviation is 6%.

Table 5-2. Statistics for IPv4 memory requirement normalized by that for RP(4) using 36-bit entries

Algorithm                     Min     Max     Mean    Standard Deviation
RP(5) using 36-bit entries    0.71    0.80    0.77    0.03
RP(4) using 72-bit entries    1.13    1.25    1.16    0.05
RP(5) using 72-bit entries    0.74    0.82    0.79    0.03
OLP                           1.21    6.44    2.64    2.05
EBO                           0.75    0.91    0.86    0.06

Table 5-2 gives the memory requirement of the lookup structures normalized by that for RP(4) using 36-bit entries. Compared to RP(4) with 36-bit entries, OLP required from 21% to 544% more memory, while EBO required between 9% and 25% less memory. Among all six representations, RP(5) using 36-bit entries was the most memory efficient. Compared to EBO, this implementation of RP(5) used between 5% and 13% less memory; the average reduction in memory required was 10% and the standard deviation was 3%.
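The normalized figures quoted above can be re-derived from the memory columns of Table 5-1. The Python sketch below is only a cross-check of that arithmetic; the dictionary layout and helper names are our own, while the numbers are the KByte values from the table.

```python
from statistics import mean

# Memory in KBytes from Table 5-1:
# (RP(4) 36-bit, RP(5) 36-bit, RP(4) 72-bit, RP(5) 72-bit, OLP, EBO)
table_5_1 = {
    "Aads":    (77, 59, 90, 61, 141, 68),
    "MaeWest": (124, 98, 143, 100, 186, 113),
    "RRC01":   (392, 300, 442, 309, 507, 335),
    "RRC04":   (417, 318, 474, 327, 2687, 354),
    "AS4637":  (591, 473, 669, 485, 717, 530),
    "AS1221":  (861, 611, 1080, 634, 3041, 644),
}


def summarize(ratios):
    """Return (min, max, mean) of the ratios, rounded to two decimals."""
    return tuple(round(x, 2) for x in (min(ratios), max(ratios), mean(ratios)))


# 36-bit memory as a fraction of 72-bit memory, over RP(4) and RP(5).
bits36_vs_72 = [row[k] / row[k + 2] for row in table_5_1.values() for k in (0, 1)]
# OLP and EBO memory normalized by RP(4) with 36-bit entries.
olp = [row[4] / row[0] for row in table_5_1.values()]
ebo = [row[5] / row[0] for row in table_5_1.values()]

print(summarize(bits36_vs_72))  # 80% to 98%, mean 92%
print(summarize(olp))           # Table 5-2, OLP row
print(summarize(ebo))           # Table 5-2, EBO row
```

The results agree with the Min, Max, and Mean columns of Table 5-2 for OLP and EBO and with the 80% to 98% range stated in the text.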

44] to generate IPv6 tables from IPv4 tables. In this strategy, to each IPv4 prefix we prepend a 16-bit string comprised of 001 followed by 13 random bits. If this prepending doesn't at least double the prefix length, we append a sufficient number of random bits so that the length of the prefix is doubled. Following this prepending and possible appending, we drop the last bit from one-fourth of the prefixes so as to maintain the 3:1 ratio of even length prefixes to odd length prefixes observed in real router tables. Each synthetic table is given the same name as the IPv4 table from which it was synthesized. The AS1221-Telstra IPv6 table is named AS1221* to distinguish it from the IPv6 table synthesized from the IPv4 AS1221 table. Table 5-3 gives the number of memory accesses and memory requirement for our IPv6 data sets. Figures 5-7 and 5-8 plot this data. As was the case for our IPv4 experiments, RP(4) was the best in terms of lookup complexity. In particular, RP(4) required 1 to 3 fewer memory accesses than required by EBO for a lookup. RP(4) and OLP tied on 5 of the 7 data sets; on 1, RP(4) required 3 fewer memory accesses and on

Figure 5-7. Memory accesses required for a lookup in IPv6 tables

Figure 5-8. Total memory required for IPv6 tables

Table 5-3. Memory accesses and total memory (KBytes) required for IPv6 tables

            36-bit entries    72-bit entries    OLP                 EBO
Database    RP(4)    RP(5)    RP(4)    RP(5)    Accesses  Memory    Accesses  Memory
AS1221*      2021      282       79      5.7       7        4.6        7        4.6
Aads          197      179      183      178       4        221        5        184
MaeWest       332      302      309      299       4        456        5        311
RRC01        1165     1027     1295     1015       4       1348        6       1046
RRC04        1253     1088     1372     1075       4       2287        6       1108
AS4637       1871     1712     2013     1695       4       2346        6       1752
AS1221       3432     2217     2979     2188       5       2414        7       2252

the other, it required 1 fewer access. RP(5) outperformed EBO by 1 or 2 accesses on 5 data sets and tied on the remaining 2. In contrast to the experiments with IPv4 tables, the 72-bit implementation of recursive partitioning generally required less memory than did the 36-bit implementation. On 11 of our 14 tests (RP(4) and RP(5)) with recursive partitioning, the memory required by the 72-bit implementation was less than that required by the 36-bit implementation; it was more on the remaining 3 tests. The memory of the recursively partitioned structure using 36-bit hash entries normalized by that required using 72-bit entries ranged from 0.9 to 49.9. We see that the data set AS1221* incurred the largest difference. When AS1221* is excluded, the normalized number for the remaining 6 data sets is between 0.90 and 1.15 (the mean and standard deviation were 1.00 and 0.00).

Table 5-4. IPv6 data normalized by the memory required by RP(4) using 72-bit entries. The data set AS1221* is excluded here.

Algorithm                     Min     Max     Mean    Standard Deviation
RP(4) using 36-bit entries    0.90    1.15    1.00    0.11
RP(5) using 36-bit entries    0.74    0.98    0.86    0.10
RP(5) using 72-bit entries    0.73    0.97    0.85    0.10
OLP                           0.81    1.67    1.23    0.31
EBO                           0.76    1.00    0.87    0.11

For the data set AS1221*, the 72-bit implementation of RP(4) reduced the memory accesses of EBO by 3 but required 17 times as much memory. The same implementation of RP(5) required 24% more memory than required by the base EBO structure. On the other hand, RP(6) required 3.8 KBytes; a 17% memory reduction accompanied by

Table 5-4 presents the statistics normalized by the memory required by RP(4) using 72-bit entries for the remaining 6 data sets. As can be seen, the memory of EBO normalized by RP(4) using 72-bit entries ranged from 0.76 to 1.00, with the mean and standard deviation being 0.87 and 0.11. The corresponding normalized numbers for OLP were 0.81, 1.67, 1.23, and 0.31.

17] commonly used in conjunction with base router-table data structures. Although we did not do a direct comparison with the standard 16-bit front-end table scheme, we did compare with its generalization OLP, which uses a front-end table that minimizes total memory subject to minimizing the worst-case number of memory accesses per lookup. By design, OLP cannot be inferior to the employed base structure (in our case EBO). OLP improved the lookup performance of EBO by 1 or 2 memory accesses on all but one of our test sets. In all cases, the improvement in lookup performance came at the expense of increased memory requirement; for the RRC04 IPv4 data set, OLP reduced the memory accesses per lookup by 2 but required 6.7 times the memory. RP(4) improved the lookup performance of EBO by 1 to 3 memory accesses on all our data sets. On all test sets where RP(4) and OLP resulted in the same lookup performance, RP(4) took less memory than did OLP. For example, on the RRC04 IPv4 data set, the 36-bit implementation of RP(4) took 15.5% of the memory taken by OLP; it took only 18% more memory than EBO while reducing the worst-case memory accesses from 6 to 4.

5.3, using even larger hash-table entries (e.g., 144 bits) resulted in no reduction in memory required by either RP(4) or RP(5) for our IPv4 and IPv6 test data. Further, we expect the results reported in this paper to carry over to the case when base structures other than EBO (e.g., multibit tries) are employed.

Zane et al. [45] show how to reduce TCAM power significantly by capitalizing on a feature in contemporary TCAMs that permits one to select a portion of the entire TCAM for search. The power consumption now corresponds to that for a TCAM whose size is that of the portion that is searched. Using the example of [45], suppose we have a TCAM whose capacity is 512K prefixes and that the TCAM has a block size of 8K. So, the total number of blocks is 64. The portion of the total TCAM that is to be searched is specified using a 64-bit vector. Each bit of this vector corresponds to a block. The 1s in this vector define the portion (subtable) of the TCAM that is to be searched and the power required to search a TCAM subtable is proportional to the subtable size. While it is not required that a subtable be comprised of contiguous TCAM blocks, we assume, in this paper, that this is the case. We use the term bucket to refer to a set of contiguous blocks. Although, in the example of [45], the size of a bucket is a multiple of 8K prefixes, we assume that bucket sizes are required only to be integral. Zane et al. [45] partition the forwarding table into smaller subtables (actually, buckets) so that each lookup requires 2 searches of smaller TCAMs. Their method, however, increases the total TCAM memory that is required. Lu [19] has proposed an

Akhbarizadeh et al. [1] propose an alternative TCAM architecture that employs multiple TCAMs and multiple TCAM selectors. The routing table is distributed over the multiple TCAMs; the selectors decide which TCAM is to be searched. The architecture of [1] is able to determine the next-hop for several packets in parallel and so achieves processing rates higher than those achievable by using a single pipeline architecture such as the one proposed by Zane et al. [45]. The proposal of Zane et al. [45], however, has the advantage that it can be implemented using a commercial network processor board equipped with a TCAM and an SRAM (for example, Intel's IXP2800 network processor supports a TCAM and up to 4 SRAMs, no customized hardware support is required) whereas that of Akhbarizadeh et al. [1] cannot. In this chapter, we improve upon the router-table partitioning algorithms of [45] and [19]. These algorithms may be used to partition router tables into fixed size blocks as is required by the architecture of [1] as well. Additionally, we show how to couple TCAMs and wide SRAMs so as to search forwarding tables whose size is much larger than the TCAM size with no loss in time and with power reduction. All of our algorithms and techniques are implementable using a commercial network processor board equipped with a TCAM and multiple SRAMs. We begin in Section 6.1 by reviewing related work. In Section 6.2 we develop an algorithm to do optimal subtree splits and in Section 6.3 we propose a heuristic for postorder split. Methods to efficiently search forwarding tables whose size is larger than the TCAM size are proposed in Sections 6.4 and 6.5. A qualitative comparison of the methods to reduce TCAM power and increase the size of the forwarding table that may be searched is done in Section 6.6 and an experimental evaluation of the proposed methods is done in Section 6.7.

Table 6-1 gives an example 7-prefix forwarding table. Figure 6-1 shows a simple TCAM organization for this forwarding table. In this organization, the 7 prefixes are stored in the TCAM in decreasing order of prefix length and the next hops are stored

Table 6-1. An example 7-prefix forwarding table

        Prefixes    Next Hop
P1      *           H1
P2      0*          H2
P3      00*         H3
P4      01*         H4
P5      11*         H5
P6      000*        H6
P7      011*        H7

Figure 6-1. Simple TCAM organization for Table 6-1

in corresponding words of an SRAM. We assume that the TCAM and SRAM words are indexed beginning at 0. Suppose that we have a packet whose destination address begins with 010. The longest matching prefix is P4. A TCAM search for the destination address returns the TCAM index 3 for the longest matching prefix. Accessing word 3 of the SRAM yields H4 as the next hop for the subject packet. To reduce the power consumed by the TCAM search, Zane et al. [45] propose partitioning the TCAM into an index TCAM (ITCAM) and a data TCAM (DTCAM). The DTCAM is comprised of several buckets of prefixes. Each lookup requires a search of the ITCAM, a search of 1 bucket of the DTCAM, and 2 SRAM accesses. Zane et al. [45] propose two methods, subtree split and postorder split, to partition the forwarding table prefixes into DTCAM buckets. Both methods start with the 1-bit trie representation of the prefixes in the forwarding table. Figure 6-2 shows the 1-bit trie for the 7-prefix example of Table 6-1.
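The simple TCAM organization can be mimicked in software to check the worked example above. The sketch below is a software stand-in only (a real TCAM compares all entries in parallel and reports the first match); prefixes are bit strings with the empty string for *, and the function name is hypothetical.

```python
# Table 6-1 as (name, prefix bits, next hop); "" stands for the prefix *.
table_6_1 = [("P1", "", "H1"), ("P2", "0", "H2"), ("P3", "00", "H3"),
             ("P4", "01", "H4"), ("P5", "11", "H5"), ("P6", "000", "H6"),
             ("P7", "011", "H7")]

# Store prefixes in decreasing order of length.  Python's sort is stable,
# so equal-length prefixes keep their Table 6-1 order.
ordered = sorted(table_6_1, key=lambda entry: -len(entry[1]))
tcam = [prefix for _, prefix, _ in ordered]   # TCAM words, indexed from 0
sram = [hop for _, _, hop in ordered]         # corresponding SRAM words


def tcam_search(tcam_words, d):
    """Index of the first TCAM word matching destination d, or None."""
    for index, prefix in enumerate(tcam_words):
        if d.startswith(prefix):
            return index
    return None


index = tcam_search(tcam, "0101")   # a destination that begins with 010
print(index, sram[index])           # TCAM index and next hop
```

Because the words are ordered by decreasing length, the first match is also the longest match, which is why a single priority-encoded TCAM search suffices.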

Figure 6-2. 1-bit trie for 7-prefix example of Table 6-1

In subtree split, the prefixes are partitioned into variable-size buckets. All but one of the buckets contain between $\lceil b/2 \rceil$ and b prefixes, where b > 1 is a specified bound on the bucket size. The remaining bucket contains between 1 and b prefixes. The partitioning is accomplished by performing a postorder traversal of the 1-bit trie. During the visit operation, the subtree rooted at the visited node v is carved out if it contains at least $\lceil b/2 \rceil$ prefixes and if the subtree rooted at its parent (if any) contains more than b prefixes. The prefixes in the carved out subtree are mapped into a DTCAM bucket in decreasing order of length. A covering prefix

6-3 shows the ITCAM, DTCAM and the 2 SRAMs (ISRAM and DSRAM) for our 7-prefix example. For each ITCAM prefix, the corresponding ISRAM entry points to the start of the DTCAM bucket that generated that prefix and for each DTCAM prefix, the corresponding DSRAM entry is the next hop for that prefix. Since DTCAM buckets are of variable size,

Figure 6-3. 2-level TCAM organization using subtree split

ISRAM entries will need also to store the size of the bucket pointed to. To do a lookup, the ITCAM is searched for the first prefix that matches the destination address. The corresponding ISRAM entry points to the DTCAM bucket that is to be searched next. So, by doing 2 TCAM searches and 2 SRAM accesses, we can determine the next hop for the packet. For a forwarding table with n prefixes, the number of ITCAM entries is at most $\lceil 2n/b \rceil$ and each bucket has at most b + 1 prefixes (including the covering prefix). Assuming that TCAM power consumption is roughly linear in the size of the TCAM being searched, the TCAM power requirement is approximately $\lceil 2n/b \rceil + b + 1$, which is minimized when $b = \sqrt{2n}$ […] 6-1. When $n = 8 \times 10^4$, for example, the minimum power required by the 2-level TCAM solution of Figure 6-3 is 801 and TCAM memory for 80,800 prefixes is required. In contrast, the simple solution of Figure 6-1 has a power and memory requirement of 80,000. All but at most one of the buckets generated by postorder split [45] contain b forwarding table prefixes (plus up to W covering prefixes

Figure 6-4. 2-level TCAM organization using postorder split

longest forwarding-table prefix); the remaining bucket has fewer than b forwarding-table prefixes (plus up to W covering prefixes). All buckets may be padded with null prefixes so that, for all practical purposes, they have the same size. The partitioning is done using a postorder traversal as in the case of subtree splitting. However, now, we may pack the prefixes of several subtrees into the same bucket so as to fill each bucket. Consequently, the ITCAM may have several prefixes for each DTCAM bucket; one prefix for each subtree that is packed into the bucket. Note also that a bucket may contain up to 1 covering prefix for each subtree packed into it. Figure 6-4 shows the ITCAM, ISRAM, DTCAM, and DSRAM configurations for the 7-prefix example of Table 6-1. Zane et al. [45] have shown that the size of the ITCAM is at most $(W+1)\lceil n/b \rceil$ and a bucket may have up to b + W prefixes (including covering prefixes). Lu [19] has developed an alternative algorithm to partition into equal-size buckets. His algorithm results in an ITCAM that has at most $\lceil (n/b) \log_2 b \rceil$ ITCAM prefixes and each DTCAM bucket has at most $b + \lceil \log_2 b \rceil$ prefixes (including covering prefixes); each bucket except possibly one has exactly b forwarding-table prefixes (plus up to $\lceil \log_2 b \rceil$ covering prefixes). Since $\log_2 b$

45] is suboptimal; that is, it does not partition a 1-bit trie into the smallest number of subtrees that have at most b prefixes each. In fact, the algorithm of [45] may generate almost twice the optimal number of subtrees and hence buckets and ITCAM prefixes. To see this, consider the 1-bit trie of Figure 6-5(A). In this, b is even, the rightmost subtrie has b − 1 prefixes and each of the left subtries has b/2 prefixes. Let h − 1 be the total number of left subtries (i.e., subtries with b/2 prefixes each). Figure 6-5(A) shows the bucketing obtained by the algorithm of [45]. One bucket has b − 1 prefixes and the remainder have b/2 prefixes each. The total number of buckets (and hence ITCAM prefixes) is h. Figure 6-5(B) shows an optimal partitioning into a 1-1 2-level TCAM. The number of buckets is h/2 + 1. Note that since each bucket has at least $\lceil b/2 \rceil$ prefixes, 2 is an upper bound on the ratio of the number of buckets generated by the subtree split algorithm of [45] and the optimal number of buckets.

Theorem 6-1 Let m be the number of buckets (and hence ITCAM prefixes) generated by the subtree split algorithm of [45]. Let $m^*$ be the number of buckets in an optimal subtree split. $m/m^* < 2$ and this bound is best possible.

We may construct optimal subtree splits using the visit algorithm of Figure 6-6 in conjunction with a postorder traversal of the 1-bit trie T for the forwarding table. In the visit algorithm of Figure 6-6, b is the maximum number of prefixes (including the covering

Figure 6-5. Bad example for subtree split of [45]

prefix (if any)) that may be stored in a DTCAM bucket

45].

Figure 6-6. Visit function for optimal subtree splitting

algorithm of Figure 6-6 in conjunction with a postorder traversal of the 1-bit trie for a forwarding table.

Theorem 6-2 Algorithm optSplit minimizes the number of DTCAM buckets and hence minimizes the number of ITCAM prefixes.

6-6 reveals that count(w)

From Theorems 6-1 and 6-2, it follows that algorithm optSplit results in 1-1 2-level TCAMs with the fewest ITCAM prefixes, and that these TCAMs may have as few as half as many ITCAM prefixes as the ITCAMs resulting from the algorithm of [45]. By deferring the computation of a node's count until it is needed, the complexity of optSplit becomes O(nW), where n is the number of prefixes in the forwarding table and W is the length of the longest prefix.
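As a point of reference for the comparison with optSplit, the baseline subtree-split rule of [45] described in Section 6.1 (carve the subtree at v when it holds at least $\lceil b/2 \rceil$ prefixes and its parent's subtree holds more than b) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the algorithm of Figure 6-6: covering prefixes are omitted, whatever remains at the root forms the last bucket, and the class and function names are our own.

```python
import math


class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.child = {}      # '0' or '1' -> Node
        self.prefix = None   # bit string stored at this node, if any
        self.count = 0       # prefixes not yet carved in this subtree


def build_trie(prefixes):
    root = Node()
    for p in prefixes:
        node = root
        node.count += 1
        for bit in p:
            if bit not in node.child:
                node.child[bit] = Node(node)
            node = node.child[bit]
            node.count += 1
        node.prefix = p
    return root


def subtree_split(prefixes, b):
    """Partition prefixes into buckets with the carving rule of [45]."""
    root, half, buckets = build_trie(prefixes), math.ceil(b / 2), []

    def collect(node, out):
        if node.prefix is not None:
            out.append(node.prefix)
            node.prefix = None
        for c in node.child.values():
            collect(c, out)
        node.count = 0

    def carve(node):
        removed, bucket = node.count, []
        collect(node, bucket)
        anc = node.parent
        while anc is not None:          # update counts on the path to the root
            anc.count -= removed
            anc = anc.parent
        buckets.append(sorted(bucket, key=lambda p: -len(p)))

    def visit(node):                    # postorder traversal
        for bit in ("0", "1"):
            if bit in node.child:
                visit(node.child[bit])
        if node.parent is not None:
            if node.count >= half and node.parent.count > b:
                carve(node)
        elif node.count > 0:
            carve(node)                 # remaining prefixes form the last bucket

    visit(root)
    return buckets


buckets = subtree_split(["", "0", "00", "01", "11", "000", "011"], 4)
print(buckets)
```

On the prefixes of Table 6-1 with b = 4 (a value chosen here only for illustration), this sketch produces three buckets, each with between 2 and 4 prefixes; it is not claimed to reproduce the layout of Figure 6-3.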

45]. The next two theorems are similar to theorems in [45].

Theorem 6-3 The number of forwarding-table prefixes (this count excludes the covering prefix (if any)) in each bucket is in the range [b […] 6-6. Note that the buckets created by the algorithm of [45] may have up to b + 1 prefixes (including the covering prefix).

[…] $b\rceil, \lceil 2n/b \rceil$].

[…] $b\rceil$ prefixes and a DTCAM search of at most b prefixes.

[…] $b\rceil$. Also, no DTCAM bucket has more than b prefixes.

45], a postorder split is required to pack

45] may pack up to W covering prefixes into a DTCAM bucket while that of [19] packs up to $\lceil \log_2 b \rceil$ covering prefixes into a DTCAM bucket. In both algorithms, each bucket contributes a number of ITCAM entries equal to the number of carved subtrees packed into it. In this section, we propose a new algorithm for postorder split. While the variation in the number of prefixes in a bucket is the same as for the algorithm of [19] (from b to $b + \lceil \log_2 b \rceil$) and the worst-case number of ITCAM prefixes is the same for both our algorithm and that of [19], our algorithm generates far fewer ITCAM prefixes on real-world data sets. We develop also a variant of our algorithm that has the property that each DTCAM bucket other than the last one has exactly b prefixes (including covering prefixes). The last bucket may be packed with null prefixes to make it the same size as the others. When we limit each bucket to b forwarding-table prefixes, the total number of buckets is increased slightly. We use PS1 to refer to our postorder split algorithm that strictly adheres to the definition of [45] and we use PS2 to refer to the stated variant. The strategy in PS1 is to first seed $\lceil n/b \rceil$ DTCAM buckets with a feasible subtree

a bucket, the feasible subtree is carved out of T and not available for further carving

Figure 6-7 gives our PS1 algorithm. Here, feasibleST(T, q) determines a feasible subtree of T whose size is as large as possible but no more than q. The found subtree is bestST. In the interests of run-time efficiency, we use a heuristic for feasibleST(T, q). This heuristic performs a traversal of T using the visit algorithm given in Figure 6-8. In this

Figure 6-8. Visit function for feasibleST as used by PS1

visit algorithm, count(x) is the number of forwarding-table prefixes in ST(x) and only two feasible subtrees, ST(x) and T − ST(x), are examined. Following the preorder traversal, bestST gives the best feasible subtree found. This subtree is packed into a bucket by algorithm PS1 and T is updated to the subtree that remains after bestST is carved from T (the subtree that remains is either ST(x) or T − ST(x) for some x in T). Notice also that when bestST is carved out of T, it is necessary to update the counts of the nodes on the path from the root of T (prior to the carving) to the root of bestST.

Lemma 6-1 Except for the last invocation of feasibleST(T, q), bestCount $\ge \lceil q/2 \rceil$.
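The selection step of the feasibleST heuristic can be sketched as follows. Since the visit function of Figure 6-8 is not reproduced here, this Python sketch rests on assumptions: it keeps, over a preorder traversal, the largest of count(x) and count(T) − count(x) that does not exceed q, trie nodes are named by their bit strings, and the helper names are hypothetical.

```python
def trie_counts(prefixes):
    """count(x) and child lists for every node of the 1-bit trie,
    with nodes named by the bit string on the path from the root."""
    counts, children = {}, {}
    for p in prefixes:
        for k in range(len(p) + 1):
            node = p[:k]
            counts[node] = counts.get(node, 0) + 1
            if k > 0:
                kids = children.setdefault(p[:k - 1], [])
                if node not in kids:
                    kids.append(node)
    return counts, children


def feasible_st(counts, children, q, root=""):
    """Return (bestCount, x, kind), where kind says whether the chosen
    feasible subtree is ST(x) or T - ST(x)."""
    total = counts[root]
    best = (0, None, None)
    stack = [root]
    while stack:                                   # preorder traversal
        x = stack.pop()
        for size, kind in ((counts[x], "ST(x)"), (total - counts[x], "T-ST(x)")):
            if best[0] < size <= q:
                best = (size, x, kind)
        stack.extend(sorted(children.get(x, []), reverse=True))
    return best


counts, children = trie_counts(["", "0", "00", "01", "11", "000", "011"])
print(feasible_st(counts, children, 3))
print(feasible_st(counts, children, 5))
```

For the 7-prefix trie of Table 6-1 and q = 3, the sketch selects a feasible subtree with 2 prefixes, which is consistent with the bound of Lemma 6-1; updating the counts after carving is left out for brevity.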

[…] $b\rceil \lceil \log_2 b \rceil$.

The time complexity for feasibleST(T, q) is O(nW), where n is the number of forwarding-table prefixes and W is the length of the longest prefix. The time complexity of PS1 is dominated by the time spent in the $O((n/b) \log b)$ invocations of feasibleST. Thus, the complexity of PS1 is $O((n^2 W \log b)/b)$. Figure 6-9 gives the variant PS2; the visit algorithm used for this variant is given in Figure 6-10.

[…] Figure 6-1, each word of the SRAM is used to store only a next hop. Since a next hop requires only a small number of bits (e.g., 10 bits are sufficient when the number of different next hops is up to 1024) and a word of SRAM is typically quite large (e.g., using a QDRII SRAM, we can access 72 bits (dual burst) or 144 bits (quad burst) at a time), the simple TCAM organization of Figure 6-1 does not optimize SRAM usage. By using each word of the SRAM to store a subtree of the 1-bit trie of a forwarding table, we can reduce the size of the required TCAM and hence reduce the power required for table lookup. The lookup time is not significantly affected as a lookup still requires 1 TCAM search (the TCAM to be searched is smaller and so the search requires less power but otherwise takes the same amount of time) and 1 SRAM access and search (the SRAM access takes the same amount of time regardless of whether a single hop or a subtree of the 1-bit trie is accessed; although the time to process the accessed SRAM word increases, the total SRAM time is dominated by the access time).

To store a 1-bit subtree in an SRAM word, we use the suffix-node structure used by us in Chapter 4 to compactly store small subtrees of a 1-bit trie. Figure 6-11 shows this structure. Consider a subtree of a 1-bit trie T. Let N be the root of the subtree and let Q(N) be the prefix defined by the path from the root of T to N. Let $P_1 \ldots P_k$ be the prefixes in the subtree plus the covering prefix for N (if needed). The suffix node for N will store a suffix count of k and, for each prefix $P_i$, it will store the suffix $S_i$ obtained by removing the first $|Q(N)|$ bits from $P_i$, the length $|S_i| = |P_i| - |Q(N)|$ of this suffix (the covering prefix (if any) is an exception: its suffix is empty and the suffix length is 0) and the next hop associated with the suffix (this is the same as the next hop associated with the prefix $P_i$).
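The suffix-node contents just described can be made concrete with a small Python sketch. It models only the logical content (suffix count, suffixes, next hops), not the bit-level layout of Figure 6-11; the dictionary representation and function names are assumptions made for illustration, and a covering prefix, when needed, would simply appear as an entry with an empty suffix.

```python
def make_suffix_node(q, prefixes):
    """Build the suffix node for a subtree rooted at N, where q = Q(N)
    and prefixes maps each prefix in the subtree to its next hop."""
    entries = [(p[len(q):], hop) for p, hop in prefixes.items()]
    return {"count": len(entries), "entries": entries}


def search_suffix_node(node, d, q_len):
    """Next hop of the longest suffix matching d once its first q_len
    bits are stripped, or None when no suffix matches."""
    rest = d[q_len:]
    best = None
    for suffix, hop in node["entries"]:
        if rest.startswith(suffix) and (best is None or len(suffix) > len(best[0])):
            best = (suffix, hop)
    return None if best is None else best[1]


# Subtree of the Table 6-1 trie rooted at the node for 00: holds P3 and P6.
node = make_suffix_node("00", {"00": "H3", "000": "H6"})
print(node)
print(search_suffix_node(node, "0001", 2))  # stripped address is 01
print(search_suffix_node(node, "0011", 2))  # stripped address is 11
```

Here the suffixes are the empty string (for 00*) and 0 (for 000*), so the node stores a suffix count of 2 and a single suffix bit in total.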

Figure 6-10. Visit function for feasibleST2 as used by PS2

Figure 6-11. Suffix node format (Chapter 4)

Let u be the number of bits allocated to the suffix count field of a suffix node and let v be the sum of the number of bits allocated to a length field and a next-hop field. Let $len(S_i)$ be the length of the suffix $S_i$. The space needed by the suffix node fields for $S_1 \ldots S_k$ is $u + kv + \sum len(S_i)$ bits. Typically, we fix the size of a suffix node to equal the bandwidth (or word size) of the SRAM in use

6-1. Suppose that a suffix node is 32 bits long (equivalently, the bandwidth of the SRAM is 32 bits). We may use 2 bits for the suffix count field (this allows up to 4 suffixes in a node as the count must be more than 0), 2 bits for the suffix length field (permitting suffixes of length up to 3), and 12 bits for a next hop (permitting up to 4096 different next hops). With this bit allocation, a suffix node may store up to 2 suffixes. Figure 6-12(A) shows a carving of the 1-bit trie (Figure 6-2) for our 7-prefix example. This carving has the property that no subtree needs a covering prefix and each subtree may be stored in a suffix node using the stated format. Figure 6-12(B) shows the STW representation for this carving.

Figure 6-12. Simple TCAM with SRAM (STW) for the prefix set of Table 6-1

To search for the longest matching prefix (actually the next hop associated with this prefix) for the destination address d, we find first the TCAM index of the longest matching Q(N) in the TCAM. This index tells us which SRAM word to search. The SRAM word is then searched for the longest suffix $S_i$ that matches d with the first $|Q(N)|$ bits stripped. If the average number of prefixes packed into a suffix node is $a_1$, then the TCAM size is approximately $n/a_1$, where n is the total number of forwarding-table prefixes. So, the power needed for a lookup in a forwarding table using an STW is about $1/a_1$ that

6-1 is used. Equivalently, if we have a TCAM whose capacity is n prefixes, the STW representation permits us to handle forwarding tables with up to $n a_1$ prefixes while tables with up to only n prefixes may be handled using the organization of Table 6-1; in both cases, the power and lookup time are about the same. In the remainder of this section, we propose a heuristic to carve subtrees from T as well as a dynamic programming algorithm that does this. The heuristic attempts to minimize the number of subtrees carved (each subtree must fit in an SRAM word or suffix node) while the dynamic programming algorithm guarantees a minimum carving.

(6-1)

To see the correctness of this recurrence, notice that each prefix in ST(l) and ST(r) has a suffix that is 1 longer in ST(x) than in ST(l) and ST(r). So, we need ST(l).numB + ST(l).numP + ST(r).numB + ST(r).numP bits to store their lengths, suffixes, and next hops. Additionally, when x contains a prefix, we need v bits to store the length (0) of its suffix as well as its next hop; no bits are needed for the suffix itself (as the suffix is empty and has length 0). The size, ST(x).size, of the suffix node needed by ST(x) is given by

Figure 6-13. Visit function for subtree carving heuristic

6-2 follows from the observation that in either case, we need u additional bits for the suffix count. When a covering prefix is needed, we require also v bits for the length (which is 0) and next-hop fields for this covering prefix. Our carving heuristic performs a postorder traversal of the 1-bit trie T using the visit algorithm of Figure 6-13. Whenever a subtree is split from the 1-bit trie, the prefixes in that subtree as well as a covering prefix (if needed) are put into a suffix node and a TCAM entry for this suffix node is generated. The complexity of the visit algorithm (including the time to recompute x.size) is O(1). So, the overall complexity of our tree carving heuristic is O(nW), where n is the number of prefixes in the forwarding table and W is the length of the longest prefix.
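The bit accounting behind the carving heuristic can be checked with a short Python sketch. It follows the verbal description of the recurrence (child suffixes grow by one bit at x, a prefix at x costs v bits, the node costs u bits for the suffix count) and assumes, as our own simplification, that a covering prefix is needed exactly when the subtree root holds no prefix. Nodes are named by bit strings and the function name is hypothetical.

```python
def suffix_node_bits(prefixes, root, u, v):
    """Return (numP, size in bits) of the suffix node for ST(root).

    prefixes is the set of bit-string prefixes in the trie; u is the width
    of the suffix count field and v the combined width of a length field
    and a next-hop field."""
    def walk(x):
        num_p, num_b = 0, 0
        for bit in "01":
            child = x + bit
            if any(p.startswith(child) for p in prefixes):
                child_p, child_b = walk(child)
                num_p += child_p
                num_b += child_b + child_p   # each child suffix is 1 bit longer at x
        if x in prefixes:
            num_p += 1
            num_b += v                       # length (0) and next hop; empty suffix
        return num_p, num_b

    num_p, num_b = walk(root)
    size = u + num_b                         # suffix count field
    if root not in prefixes:
        size += v                            # assumed covering prefix entry
    return num_p, size


table = {"", "0", "00", "01", "11", "000", "011"}   # prefixes of Table 6-1
u, v = 2, 2 + 12        # field widths from the 32-bit example
for root in ("00", "0", "1"):
    print(root, suffix_node_bits(table, root, u, v))
```

With these field widths, the subtree rooted at 00 needs 31 bits and fits a 32-bit word, whereas the subtree rooted at 0 (five prefixes) needs 78 bits and would have to be carved further; both values agree with the closed form $u + kv + \sum len(S_i)$.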


6-4 […] 6-8, each opt(·, ·, ·) value can be computed in $O(w \cdot pMax)$ time. Since $O(nW \cdot w \cdot pMax)$ opt(·, ·, ·) values are to be computed, the time required to determine opt(root(T)) is $O(nW w^2 pMax^2) = O(nW w^4 / v^2)$ (as pMax

Figure 6-14 shows the layout for the 7-prefix forwarding-table example of Table 6-1. This layout uses DTCAM buckets with b = 3.

Figure 6-14. 1-12Wa with fixed-size DTCAM buckets

To search for the longest matching prefix of d using the 1-12Wa organization, we first search the ITCAM for the first ITCAM entry that matches d. From the index of this ITCAM entry and the DTCAM bucket size b, we compute the location of the DTCAM bucket that is to be searched. The identified DTCAM bucket is next searched for the first entry that matches d. The SRAM word corresponding to this matching entry is then searched for the longest matching prefix using the search strategy for a suffix node. In all, 2 TCAM searches and 1 SRAM search are done. The power reduction, relative to the STW organization, is by a factor equal to that provided by the subtree split scheme of Section 6.2 (the reduction factor is approximately $n/(a_1 b)$). Additionally, the number of SRAM accesses is only 1 vs 2 for the scheme of Section 6.2. However, 1-12Wa may waste up to half of the DTCAM because the subtree split algorithm of Section 6.2 may populate DTCAM buckets with as few as $\lceil b/2 \rceil$ prefixes. We can overcome the problem of inefficient DTCAM space utilization by 1-12Wa by introducing an ISRAM (this may just be a logical partition of the SRAM used for suffix nodes) as is done in a 2-level TCAM organization that uses subtree split (Figure 6-3).

Figure 6-15 shows the 1-12Wb layout for our 7-prefix example.

Figure 6-15. 1-12Wb with variable-size DTCAM buckets

Two additional organizations, 1-12Wc and 1-12Wd, result from recognizing that the ISRAM could be used to store a suffix node rather than just a pointer to a DTCAM bucket. 1-12Wc (Figure 6-16) uses the fixed DTCAM bucket size organization used by 1-12Wa while 1-12Wd uses the variable DTCAM bucket organization of 1-12Wb. The suffix nodes in the ISRAM are constructed from the 1-bit trie V for the prefixes used in the ITCAM of Figures 6-14 and 6-15. This construction of suffix nodes uses one of the algorithms given in Section 6.4. The prefixes in the ITCAM for the 1-12Wc and 1-12Wd organizations correspond to those for its ISRAM suffix nodes. To search using 1-12Wc, for example, we first search the ITCAM for the first entry that matches d, then the corresponding suffix node in the ISRAM is accessed and searched using the search method for a suffix node. This search yields the same result as obtained by searching the ITCAM of the 1-12Wa representation. Since DTCAM buckets are of a fixed size, using the single pointer stored in the searched ISRAM suffix node, we can determine which DTCAM bucket to search next.

[…] 6.3. Two variants (M-12Wa and M-12Wb, see Figure 6-18) are possible


Figure 6-16. 1-12Wc with fixed-size DTCAM buckets

Figure 6-17. 1-12Wd with variable-size DTCAM buckets

depending on whether the ISRAM simply stores pointers to DTCAM buckets (as in Figure 6-4) or it stores suffix nodes formed from V.

Figure 6-18. Many-to-one 2-level TCAM with wide SRAM. A) M-12Wa. B) M-12Wb

The search process for an M-12Wa (M-12Wb) is the same as that for a 1-12Wb (1-12Wd).


Comparison of worst-case TCAM memory and power required
TCAM Size  TCAM Power  TCAM Searches  SRAM Accesses
Simple n 1
1-1 2n+2n b b+b 2
M-1 b(b+2log2b) b) 2 2
STW 1
1-12Wa a1(1+1 a1b 1
1-12Wb a1(1+2 a1b 2
1-12Wc a1(1+1 a1a2b 2
1-12Wd a1(1+2 a1a2b 2
M-12Wa a1b(b+2log2b) a1b) 2 2
M-12Wb a1b(b+log2b+log2b a2) a1a2b) 2 2

[...] 6-1, 1-1 denotes a 1-1 TCAM using subtree split, M-1 denotes a many-1 TCAM using one of the postorder split methods of [19] and Section 6.3, and a2 is the average number of prefixes stored in the suffix nodes of an ISRAM.

[...] [19, 45] to construct low-power 2-level TCAMs for very large forwarding tables. For our wide SRAM strategies, we assume a QDRII SRAM (quad burst) that supports the retrieval of 144 bits of data with a single memory access. For all implementations, we allocate 12 bits for each next hop field. For the ISRAM in 1-12Wb and 1-12Wd, the size of the pointer pointing to a DTCAM entry was 16 bits and another 10 bits were used to specify the actual size of a bucket. For the ISRAM in 1-12Wc, M-12Wa and M-12Wb, the size of the pointer pointing to a DTCAM bucket was 10 bits. Our experiments used both IPv4 and IPv6 data sets.
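To make the ISRAM field widths concrete, the following sketch packs a 16-bit DTCAM pointer and a 10-bit bucket size into a single 26-bit field, as would be needed for a variable-size bucket in 1-12Wb or 1-12Wd. Only the two widths come from the text; the field order (pointer in the high bits) and the function names are assumptions made for illustration.

```python
POINTER_BITS = 16  # width of the pointer to the first DTCAM entry of a bucket
SIZE_BITS = 10     # width of the field giving the actual bucket size


def pack_isram_entry(start: int, size: int) -> int:
    """Pack (start, size) into one integer; the layout is an assumed one."""
    if not 0 <= start < (1 << POINTER_BITS):
        raise ValueError("DTCAM pointer does not fit in 16 bits")
    if not 0 <= size < (1 << SIZE_BITS):
        raise ValueError("bucket size does not fit in 10 bits")
    return (start << SIZE_BITS) | size


def unpack_isram_entry(word: int):
    """Recover (start, size) from a packed entry."""
    return word >> SIZE_BITS, word & ((1 << SIZE_BITS) - 1)
```

With these widths a pointer can address 65536 DTCAM entries and a bucket can hold at most 1023 entries, and the 26-bit field fits well within one 144-bit SRAM word.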


[...] [45]. Recall that for any given DTCAM bucket size, optSplit results in an ITCAM of minimum size, where the size of a TCAM is the number of TCAM entries. Note also that, for 1-1 2-level TCAMs, the ITCAM size equals the number of DTCAM buckets. Table 6-3 gives the ITCAM size constructed by these two algorithms for different DTCAM bucket sizes b. Figure 6-19 plots the data for AS1221. Even though subtree-split may generate ITCAMs whose size is up to twice optimal, on our 3 IPv4 test sets, the ITCAMs generated by subtree-split were only between 1.9% and 3.4% larger than optimal; the average and standard deviation were 2.9% and 0.1%, respectively.

Table 6-3. ITCAM size for 1-1 2-level TCAMs
Algorithm
AS1221
subtree-split [...] 2946 1467 734 377
optSplit [...] 2861 1429 720 367
AS3333
subtree-split [...] 2193 1091 545 276
optSplit [...] 2121 1056 531 269
AS4637
subtree-split [...] 2172 1079 541 274
optSplit [...] 2100 1048 529 266

[...] [19]. [19] has established the superiority of triePartition to the postorder split of [45]. Tables 6-4 and 6-5 show the ITCAM size and the number of DTCAM buckets constructed by these three algorithms. Figure 6-20 plots the data for AS1221. We see that PS2 has the best performance. The ITCAMs constructed by triePartition are from 80% to 137% larger than those constructed by PS2, with the average and standard deviation being 98% and 48%, respectively. The size of the ITCAMs constructed by PS1 was between 0.94 and 1.22 times that of the ITCAMs constructed by PS2; the average


Figure 6-19. Total ITCAM size with wide SRAMs for AS1221.

and standard deviation were 1.08 and 0.16, respectively. The number of DTCAM buckets constructed by triePartition was between 4% and 7% more than that constructed by PS2, the average and standard deviation being 3% and 1%, respectively. PS1 resulted in the same number of DTCAM buckets as did triePartition.

Table 6-4. ITCAM size for many-1 2-level TCAMs
Algorithm
AS1221
triePartition [...] 8773 4694 2574 1436
PS1 10758 5345 2602 1473 759
PS2 8804 4638 2601 1398 731
AS3333
triePartition [...] 6440 3488 1921 1047
PS1 7812 3698 1664 905 548
PS2 6569 3293 1667 956 451
AS4637
triePartition [...] 6364 3468 1899 1068
PS1 7670 3585 1687 908 445
PS2 6509 3276 1639 967 451

Table 6-5. Number of DTCAM buckets for many-1 2-level TCAMs
Algorithm
AS1221
triePartition [...] 2327 1136 560 278
PS1 4854 2327 1136 560 278
PS2 4550 2239 1111 553 276
AS3333
triePartition [...] 1751 855 422 209
PS1 3653 1751 855 422 209
PS2 3419 1682 835 416 208
AS4637
triePartition [...] 1737 848 418 208
PS1 3623 1737 848 418 208
PS2 3390 1668 828 413 206


Figure 6-20. ITCAM size and number of DTCAM buckets for many-1 2-level TCAMs for AS1221. A) ITCAM size. B) # of DTCAM buckets

[...] Section 6.4 give similar results. Since the heuristic is considerably faster, we used the carving heuristic for benchmarking here. Table 6-6 shows the total TCAM size (ITCAM plus DTCAM) constructed by each of our 6 wide-SRAM algorithms (1-12Wa, 1-12Wb, 1-12Wc, 1-12Wd, M-12Wa, and M-12Wb). Figure 6-21 plots the data for AS1221. Table 6-7 shows the same data normalized by that for M-12Wb. The normalization was done by dividing the datum for each algorithm, each data set, and each bucket size by the corresponding datum for M-12Wb. This resulted in 15 data for each algorithm (3 data sets and 5 bucket sizes per algorithm). The min, max, mean, and standard deviation of these 15 data are reported in the table for each of the 5 remaining algorithms. The six strategies cluster into two groups, with 1-12Wa and 1-12Wc forming the first group and the remaining 4 forming the second group. The TCAM size is about the same for each strategy in the same group. Strategies in the first group required between 26% and 35% more TCAM memory than required by strategies in the second group.
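The normalization procedure can be illustrated with a short Python sketch. It uses the 1-12Wa and M-12Wb rows of the total TCAM size table for the three IPv4 data sets and computes the min, max, and mean of the 15 ratios. The dictionary layout and the function name normalize are choices made for this illustration only.

```python
from statistics import mean

# Total TCAM size for five DTCAM bucket sizes, copied from the wide-SRAM
# total TCAM size table (only the two rows needed here).
TOTAL_TCAM_SIZE = {
    "AS1221": {"1-12Wa": [73320, 71208, 71446, 69768, 69700],
               "M-12Wb": [55981, 55071, 54620, 54314, 54294]},
    "AS3333": {"1-12Wa": [54145, 52761, 53199, 53865, 55350],
               "M-12Wb": [42139, 41460, 41026, 40991, 40976]},
    "AS4637": {"1-12Wa": [53755, 52632, 52685, 53352, 55350],
               "M-12Wb": [41750, 41076, 40773, 40485, 40976]},
}


def normalize(table, algorithm, baseline="M-12Wb"):
    """Divide each datum by the matching baseline datum (same data set, same b)."""
    return [x / y
            for rows in table.values()
            for x, y in zip(rows[algorithm], rows[baseline])]


ratios = normalize(TOTAL_TCAM_SIZE, "1-12Wa")
summary = (min(ratios), max(ratios), mean(ratios))
```

Rounded to two decimals, the three summary values agree with the min, max, and mean reported for 1-12Wa in the normalized table.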


Table 6-6. Total TCAM size with wide SRAMs
Algorithm
AS1221
1-12Wa 73320 71208 71446 69768 69700
1-12Wb 54953 54377 54103 53961 53893
1-12Wc 72287 70704 71193 69644 69639
1-12Wd 54146 53988 53905 53862 53845
M-12Wa 57400 55818 55055 54506 54388
M-12Wb 55981 55071 54620 54314 54294
AS3333
1-12Wa 54145 52761 53199 53865 55350
1-12Wb 41402 40978 40776 40674 40623
1-12Wc 53381 52388 53010 53769 55301
1-12Wd 40810 40688 40628 40599 40584
M-12Wa 43178 41997 41324 41138 41054
M-12Wb 42139 41460 41026 40991 40976
AS4637
1-12Wa 53755 52632 52685 53352 55350
1-12Wb 41050 40631 40428 40327 40277
1-12Wc 52998 52260 52498 53257 55301
1-12Wd 40461 40345 40282 40253 40238
M-12Wa 42792 41605 41074 40648 41046
M-12Wb 41750 41076 40773 40485 40976

Table 6-7. Total TCAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.27 1.35 1.30 0.01
1-12Wb 0.98 1.00 0.99 0.00
1-12Wc 1.26 1.35 1.29 0.01
1-12Wd 0.97 0.99 0.98 0.00
M-12Wa 1.00 1.03 1.01 0.00

Table 6-8 shows the total TCAM power required by our 6 strategies and Figure 6-22 plots the data for AS1221. Table 6-9 shows this data normalized by that for M-12Wb. On the power metric, 1-12Wc is the clear winner. Table 6-10 shows the total SRAM size required by our 6 strategies and Figure 6-23 plots the data for AS1221. Table 6-11 shows this data normalized by that for M-12Wb. The strategies cluster into two groups, with strategies in the same group requiring about the same amount of SRAM. The first group comprises 1-12Wb, 1-12Wd, M-12Wa, and M-12Wb while the second group comprises 1-12Wa and 1-12Wc. The SRAM requirement of strategies in the second group is between 26% and 35% larger than that of those in the first group, the average being 29%.


Table 6-8. Total TCAM power with wide SRAMs
Algorithm
AS1221
1-12Wa 1192 680 534 648 1092
1-12Wb 1192 680 534 648 1092
1-12Wc 159 176 281 524 1031
1-12Wd 385 291 336 549 1044
M-12Wa 1784 1034 783 746 1140
M-12Wb 365 287 348 554 1046
AS3333
1-12Wa 897 537 463 617 1078
1-12Wb 897 537 463 617 1078
1-12Wc 133 164 274 521 1029
1-12Wd 305 247 315 542 1039
M-12Wa 1322 781 620 690 1118
M-12Wb 283 244 322 543 1040
AS4637
1-12Wa 891 536 461 616 1078
1-12Wb 891 536 461 616 1078
1-12Wc 134 164 274 521 1029
1-12Wd 302 250 315 542 1039
M-12Wa 1320 773 626 712 1110
M-12Wb 278 244 325 549 1040

Table 6-9. Total TCAM power normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.04 3.27 1.82 0.23
1-12Wb 1.04 3.27 1.82 0.23
1-12Wc 0.44 0.99 0.78 0.05
1-12Wd 0.97 1.09 1.01 0.01
M-12Wa 1.07 4.89 2.50 0.38

Table 6-10. Total SRAM size (KBytes)
Algorithm
AS1221
1-12Wa 1269 1242 1251 1224 1224
1-12Wb 949 947 947 946 946
1-12Wc 1270 1242 1251 1224 1224
1-12Wd 951 949 947 946 946
M-12Wa 981 966 959 954 954
M-12Wb 984 968 960 954 954
AS3333
1-12Wa 937 920 931 945 972
1-12Wb 715 714 713 713 713
1-12Wc 938 920 931 945 972
1-12Wd 717 715 714 713 713
M-12Wa 738 727 720 720 720
M-12Wb 740 728 721 720 720
AS4637
1-12Wa 930 918 922 936 972
1-12Wb 709 708 707 707 707
1-12Wc 931 918 922 936 972
1-12Wd 711 709 708 707 707
M-12Wa 731 720 716 711 720
M-12Wb 733 722 716 711 720


Figure 6-21. Total TCAM size with wide SRAMs for AS1221.

Figure 6-22. Total TCAM power with wide SRAMs for AS1221.

Table 6-11. Total SRAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.26 1.35 1.29 0.01
1-12Wb 0.97 0.99 0.98 0.00
1-12Wc 1.26 1.35 1.29 0.01
1-12Wd 0.97 0.99 0.98 0.00
M-12Wa 1.00 1.00 1.00 0.00


Figure 6-23. Total SRAM size with wide SRAMs for AS1221.

[...] Tables 6-12, 6-14, and 6-16 show the total TCAM size, total TCAM power, and total SRAM size for each of 3 data sets using these four algorithms. Figure 6-24 plots the data for AS1221. Tables 6-13, 6-15, and 6-17 show this data normalized by that for M-12Wb. As can be seen, in terms of total TCAM size and TCAM power, 1-12Wc and M-12Wb are significantly superior to optSplit and PS2. Both optSplit and PS2 required more than 5 times the TCAM required by M-12Wb; optSplit also required more than 6 times as much TCAM power, and PS2 required about 10 times as much TCAM power as required by the strategies employing wide SRAM. optSplit required slightly less total TCAM than PS2, and much less total TCAM power than PS2; both require about the same amount of SRAM. Both optSplit and PS2 require about 66% less SRAM than required by 1-12Wc and about 56% less SRAM than required by M-12Wb. Since TCAM is more expensive than SRAM and also consumes more power, we recommend 1-12Wc and M-12Wb over optSplit and PS2.


Table 6-12. Total TCAM size
Algorithm
AS1221
1-12Wc 72287 70704 71193 69644 69639
M-12Wb 55981 55071 54620 54314 54294
optSplit 287273 284377 282945 282236 281883
PS2 300004 291230 287017 284534 283355
AS3333
1-12Wc 53381 52388 53010 53769 55301
M-12Wb 42139 41460 41026 40991 40976
optSplit [...] 214089 213024 212499 212237
PS2 225385 218589 215427 213948 213443
AS4637
1-12Wc 52998 52260 52498 53257 55301
M-12Wb 41750 41076 40773 40485 40976
optSplit [...] 212219 211167 210648 210385
PS2 223469 216780 213607 212423 211395

Figure 6-24. Total TCAM size, TCAM power, and SRAM size for AS1221. A) TCAM size. B) TCAM power. C) SRAM size

Table 6-13. Total TCAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wc 1.26 1.35 1.29 0.01
optSplit [...] 5.20 5.17 0.01
PS2 5.16 5.36 5.26 0.02

[...] [44] to generate IPv6 tables from IPv4 tables. In this strategy, to each IPv4 prefix we prepend a 16-bit string comprised of 001 followed by 13 random bits. If this prepending doesn't at least double the prefix length, we append a sufficient number of random bits so that the length of the prefix is doubled. Following this prepending and possible appending, we drop the last bit from one-fourth of the prefixes so as to maintain the 3:1 ratio of even length prefixes to


Table 6-14. Total TCAM power
Algorithm
AS1221
1-12Wc 159 176 281 524 1031
M-12Wb 365 287 348 554 1046
optSplit [...] 2989 1685 1232 1391
PS2 8868 4766 2857 1910 1755
AS3333
1-12Wc 133 164 274 521 1029
M-12Wb 283 244 322 543 1040
optSplit [...] 2249 1312 1043 1293
PS2 6633 3421 1923 1468 1475
AS4637
1-12Wc 134 164 274 521 1029
M-12Wb 278 244 325 549 1040
optSplit [...] 2228 1304 1041 1290
PS2 6573 3404 1895 1479 1475

Table 6-15. Total TCAM power normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wc 0.44 0.99 0.78 0.05
optSplit [...] 15.95 6.56 1.49
PS2 1.42 24.30 9.96 2.31

Table 6-16. Total SRAM size (KBytes)
Algorithm
AS1221
1-12Wc 1270 1242 1251 1224 1224
M-12Wb 984 968 960 954 954
optSplit [...] 423 417 415 413
PS2 439 426 420 416 415
AS3333
1-12Wc 938 920 931 945 972
M-12Wb 740 728 721 720 720
optSplit [...] 318 314 312 311
PS2 330 320 315 313 312
AS4637
1-12Wc 931 918 922 936 972
M-12Wb 733 722 716 711 720
optSplit [...] 315 311 309 308
PS2 327 317 312 311 309

Table 6-17. Total SRAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wc 1.26 1.35 1.29 0.01
optSplit [...] 0.44 0.44 0.00
PS2 0.43 0.45 0.44 0.00

odd length observed in real router tables. Each synthetic table is given the same name as the IPv4 table from which it was synthesized. Our IPv6 experiments followed the pattern of our IPv4 experiments and the results are shown in Tables 6-18 through 6-26. Figures 6-25 through 6-27 plot the data for [...]
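The IPv6 table synthesis strategy described above (prepend 16 bits, pad until the length is doubled, trim one prefix in four) can be sketched as follows. This is an illustrative reading of the description, not the generator of [44]: prefixes are represented as bit strings, the random bits come from Python's random module, and the rule that every fourth prefix loses its last bit is an assumption, since the text does not say how that one-fourth is chosen. The function names are hypothetical.

```python
import random


def synthesize_ipv6_prefix(ipv4_prefix: str, rng: random.Random,
                           drop_last_bit: bool) -> str:
    """Build a synthetic IPv6 prefix from an IPv4 prefix given as a bit string."""
    # Prepend a 16-bit string: 001 followed by 13 random bits.
    head = "001" + "".join(rng.choice("01") for _ in range(13))
    prefix = head + ipv4_prefix
    # Append random bits until the length is at least double the original.
    while len(prefix) < 2 * len(ipv4_prefix):
        prefix += rng.choice("01")
    # Optionally drop the last bit to produce an odd-length prefix.
    return prefix[:-1] if drop_last_bit else prefix


def synthesize_table(ipv4_prefixes, seed=0):
    """Apply the synthesis to a whole table; every 4th prefix is trimmed (assumed rule)."""
    rng = random.Random(seed)
    return [synthesize_ipv6_prefix(p, rng, drop_last_bit=(i % 4 == 3))
            for i, p in enumerate(ipv4_prefixes)]
```

For an IPv4 prefix of length at most 16, the 16 prepended bits already double the length and nothing is appended; longer prefixes are padded to exactly twice their original length.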


Table 6-18. ITCAM size for 1-1 2-level TCAMs for IPv6
Algorithm
AS1221
subtree-split [...] 3389 1791 949 496
optSplit [...] 3300 1750 934 491
AS3333
subtree-split 4225 2060 1023 511 255
optSplit [...] 2058 1023 511 255
AS4637
subtree-split 4220 2059 1023 511 255
optSplit [...] 2057 1023 511 255

Table 6-19. ITCAM size for many-1 2-level TCAMs for IPv6
Algorithm
AS1221
triePartition [...] 9233 5098 2792 1507
PS1 12410 8080 4153 2432 1340
PS2 9015 6222 3920 2241 1378
AS3333
triePartition [...] 6316 3602 1994 1090
PS1 7146 3474 1967 1044 507
PS2 6674 3406 1979 1055 488
AS4637
triePartition [...] 6380 3564 1987 1060
PS1 7098 3436 1886 1027 480
PS2 6658 3684 2052 1095 514


Table 6-20. Number of DTCAM buckets required for many-1 2-level TCAMs for IPv6
Algorithm
AS1221
triePartition [...] 2327 1136 560 278
PS1 4854 2327 1136 560 278
PS2 4570 2256 1116 555 277
AS3333
triePartition [...] 1752 855 422 210
PS1 3655 1752 855 422 210
PS2 3418 1686 837 417 208
AS4637
triePartition [...] 1737 848 418 208
PS1 3623 1737 848 418 208
PS2 3389 1673 830 413 206

Table 6-21. Total TCAM size for IPv6
Algorithm
AS1221
1-12Wa 134615 132096 131327 130815 130175
1-12Wb 111767 110720 110207 109951 109823
1-12Wc 132673 131137 130849 130577 130057
1-12Wd 110223 109960 109829 109763 109729
M-12Wa 116707 113909 112267 111341 110934
M-12Wb 113819 112187 111300 110716 110652
optSplit [...] 284816 283266 282450 282007
PS2 301495 294990 289616 286401 285026
AS3333
1-12Wa 131040 131838 131327 130815 130175
1-12Wb 80509 79515 79004 78748 78620
1-12Wc 129153 130881 130849 130577 130057
1-12Wd 79017 78755 78624 78559 78526
M-12Wa 85869 82658 81066 79980 80248
M-12Wb 82871 81107 80075 79469 79938
optSplit [...] 214026 212991 212479 212223
PS2 225426 219214 216251 214559 213480
AS4637
1-12Wa 130715 131838 131327 130815 130175
1-12Wb 79825 78836 78325 78069 77941
1-12Wc 128833 130881 130849 130577 130057
1-12Wd 78339 78076 77945 77879 77847
M-12Wa 85181 82207 80311 79549 79141
M-12Wb 82173 80463 79315 78968 78898
optSplit [...] 212176 211142 210630 210374
PS2 223554 217828 214532 212551 211458

Table 6-22. Total TCAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.18 1.66 1.48 0.06
1-12Wb 0.97 0.99 0.98 0.00
1-12Wc 1.17 1.65 1.47 0.06
1-12Wd 0.95 0.99 0.98 0.00
M-12Wa 1.00 1.04 1.01 0.00
optSplit [...] 2.67 2.61 0.01
PS2 2.58 2.72 2.67 0.01


Figure 6-25. Total ITCAM size with wide SRAMs for IPv6 AS1221.

Figure 6-26. Total ITCAM size and number of DTCAM buckets with wide SRAMs for IPv6 AS1221. A) ITCAM size. B) # of DTCAM buckets

[...] (equivalently, DTCAM buckets) when partitioning a 1-bit trie as generated by the heuristic, subtree split of [45]. However, on our test data, the heuristic of [45] generated near-optimal partitions. For many-1 partitioning, our heuristic PS2 outperforms the heuristic triePartition of [19]. In fact, on IPv4 data, triePartition results in 80% to 137% more ITCAM entries than generated by PS2 on our test data. Besides improving upon existing trie partitioning algorithms for TCAMs, we have proposed a novel way to combine TCAMs and SRAMs so as to achieve a significant [...]


Figure 6-27. Total TCAM size, TCAM power, and SRAM size for IPv6 AS1221. A) TCAM size. B) TCAM power. C) SRAM size.

Table 6-23. Total TCAM power for IPv6
Algorithm
AS1221
1-12Wa 2135 1152 767 767 1151
1-12Wb 2135 1152 767 767 1151
1-12Wc 193 193 289 529 1033
1-12Wd 591 392 389 579 1057
M-12Wa 3555 2165 1419 1261 1366
M-12Wb 667 443 452 636 1084
optSplit [...] 3428 2006 1446 1515
PS2 9079 6350 4176 2753 2402
AS3333
1-12Wa 2080 1150 767 767 1151
1-12Wb 2080 1150 767 767 1151
1-12Wc 193 193 289 529 1033
1-12Wd 588 390 387 578 1057
M-12Wa 3693 2018 1450 1132 1400
M-12Wb 695 467 459 621 1090
optSplit [...] 2186 1279 1023 1279
PS2 6738 3534 2235 1567 1512
AS4637
1-12Wa 2075 1150 767 767 1151
1-12Wb 2075 1150 767 767 1151
1-12Wc 193 193 289 529 1033
1-12Wd 589 390 387 577 1057
M-12Wa 3709 2207 1463 1213 1317
M-12Wb 701 463 467 632 1074
optSplit [...] 2185 1279 1023 1279
PS2 6722 3812 2308 1607 1538

[...] [19, 45] or the recommended wide memory schemes M-12Wb and 1-12Wc developed by us, a lookup requires 2 TCAM searches and 2 SRAM accesses. However, on our IPv4 test data, M-12Wb required about 1/5th the [...]


Table 6-24. Total TCAM power normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.06 3.20 1.90 0.21
1-12Wb 1.06 3.20 1.90 0.21
1-12Wc 0.28 0.96 0.63 0.07
1-12Wd 0.83 0.98 0.89 0.01
M-12Wa 1.23 5.33 3.26 0.43
optSplit [...] 9.54 3.88 0.69
PS2 1.39 14.33 6.43 1.14

Table 6-25. Total SRAM size (KBytes) for IPv6
Algorithm
AS1221
1-12Wa 2329 2304 2299 2295 2286
1-12Wb 1934 1931 1929 1929 1928
1-12Wc 2332 2305 2300 2295 2286
1-12Wd 1937 1932 1930 1929 1928
M-12Wa 1995 1969 1954 1945 1944
M-12Wb 2000 1972 1956 1946 1945
optSplit [...] 425 419 416 414
PS2 441 432 424 419 417
AS3333
1-12Wa 2268 2299 2299 2295 2286
1-12Wb 1386 1383 1381 1380 1380
1-12Wc 2270 2300 2300 2295 2286
1-12Wd 1388 1384 1382 1380 1380
M-12Wa 1450 1422 1405 1395 1404
M-12Wb 1456 1425 1407 1396 1405
optSplit [...] 318 314 312 311
PS2 330 321 316 314 312
AS4637
1-12Wa 2262 2299 2299 2295 2286
1-12Wb 1374 1371 1369 1368 1368
1-12Wc 2264 2300 2300 2295 2286
1-12Wd 1377 1372 1370 1368 1368
M-12Wa 1438 1411 1392 1387 1386
M-12Wb 1444 1414 1394 1388 1386
optSplit [...] 315 311 309 308
PS2 327 319 314 311 309

Table 6-26. Total SRAM size normalized by that of M-12Wb
Algorithm min max mean standard deviation
1-12Wa 1.16 1.65 1.47 0.06
1-12Wb 0.95 0.99 0.98 0.00
1-12Wc 1.17 1.65 1.47 0.06
1-12Wd 0.95 0.99 0.98 0.00
M-12Wa 1.00 1.00 1.00 0.00
optSplit [...] 0.22 0.22 0.00
PS2 0.21 0.23 0.22 0.00


[...] [19, 45]; however, M-12Wb required 2.5 times as much SRAM memory. On IPv6 data, these ratios were 2/5, 1/6, and 5, respectively. On IPv4 data, 1-12Wc required about 1/4th the TCAM memory, 1/12th as much TCAM power, and about 3 times as much SRAM memory as required by our improved versions of the schemes of [19, 45]. These ratios were 1/2, 1/10, and 7, respectively, for IPv6 data. Since TCAM memory and power are the dominant criteria for optimization, we recommend M-12Wb when we wish to optimize TCAM memory and 1-12Wc when we wish to optimize power.


[...] [33]. We then proposed heuristics for the construction of one- and two-dimensional multi-bit tries. These multi-bit tries are suitable for classifying Internet packets in a pipelined platform where each pipeline stage has its own memory and the trie structure is partitioned into separate pipeline stages. The maximum per-stage memory requirement of pipelined multi-bit tries constructed by our heuristics is reduced dramatically when compared to recently published schemes.

Our other work develops dynamic programming algorithms to build succinct data structures, HSSTs and 2DHSSTPCs, for 1- and multi-dimensional classifiers, respectively. These classifiers are compact enough to fit into a high-speed SRAM and can be searched using a small number of memory accesses. The HSSTs constructed by our algorithms were shown to be superior to many other state-of-the-art 1-dimensional schemes in terms of total memory requirement, search time and scalability. 2DHSSTPCs are superior, in memory storage requirement, to two-dimensional multi-bit tries by an order of magnitude, given the same budget of worst-case memory accesses per search.

We developed an algorithm that recursively partitions a large rule table into many small tables to reduce the search time for packet forwarding as well as the memory storage required. The efficacy of the recursive partitioning is compared to that of another [...]




[1] M. Akhbarizadeh, M. Nourani, R. Panigrahy and S. Sharma, "A TCAM-based parallel architecture for high-speed packet forwarding", IEEE Trans. on Computers, 56, 1, 2007.
[2] F. Baboescu, S. Singh and G. Varghese, "Packet Classification for Core Routers: Is there an alternative to CAMs?", INFOCOM, 2003.
[3] F. Baboescu and G. Varghese, "Fast and Scalable Conflict Detection for Packet Classifiers", 10th IEEE International Conference on Network Protocols (ICNP'02), 2002.
[4] A. Basu and G. Narlikar, "Fast Incremental Updates for Pipeline Forwarding Engines", INFOCOM, 2003.
[5] M. Buddhikot, S. Suri and M. Waldvogel, "Space decomposition techniques for fast layer-4 switching", Conference on High Speed Networks, 1998.
[6] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, "Small forwarding tables for fast routing lookups", Proceedings of SIGCOMM, 3-14, 1997.
[7] W. Eatherton, G. Varghese, Z. Dittia, "Tree bitmap: hardware/software IP lookups with incremental updates", Computer Communication Review, 34(2):97-122, 2004.
[8] D. Eppstein and S. Muthukrishnan, "Internet packet filter management and rectangle geometry", 12th ACM-SIAM Symp. on Discrete Algorithms, 2001, 827-835.
[9] A. Feldman and S. Muthukrishnan, "Tradeoffs for packet classification", INFOCOM, 2000.
[10] P. Gupta and N. McKeown, "Packet classification using hierarchical intelligent cuts", ACM SIGCOMM, 1999.
[11] A. Hari, S. Suri, G. Parulkar, "Detecting and resolving packet filter conflicts", INFOCOM, 2000.
[12] E. Horowitz, S. Sahni, and D. Mehta, "Fundamentals of Data Structures in C++", W. H. Freeman, NY, 1995.
[13] E. Horowitz, S. Sahni, and S. Rajasekeran, "Computer Algorithms/C++", W. H. Freeman, NY, 1997.
[14] G. Jacobson, "Succinct Static Data Structure", Carnegie Mellon University Ph.D Thesis, 1998.
[15] C. Kaufman, R. Perlman and M. Speciner, "Network Security: Private communication in a public world", Second Edition, Chapter 17, Prentice Hall,


[16] T. Lakshman and D. Stidialis, "High speed policy-based packet forwarding using efficient multi-dimensional range matching", ACM SIGCOMM, 1998.
[17] B. Lampson, V. Srinivasan, and G. Varghese, "IP Lookup using Multi-way and Multi-column Search", IEEE Infocom 98, 1998.
[18] H. Liu, "Routing Table Compaction in Ternary TCAM", IEEE Micro, 22, 2002.
[19] H. Lu, "Improved Trie Partitioning for Cooler TCAMs", ACST, 2004.
[20] H. Lu, K. Kim, and S. Sahni, "Prefix- and interval-partitioned dynamic router tables", IEEE Trans. on Computers, 54, 5, 2005, 545-557.
[21] H. Lu and S. Sahni, "O(log W) multidimensional packet classification", paper in review. IEEE/ACM Transactions on Networking, to appear.
[22] H. Lu and S. Sahni, "Conflict detection and resolution in two-dimensional prefix router tables", IEEE/ACM Transactions on Networking, 13, 6, 2005, 1353-1363.
[23] K. Kim and S. Sahni, "Efficient construction of pipelined multibit-trie Router-Tables", IEEE Transactions on Computers, to appear.
[24] C. W. Mortensen, "Fully-dynamic two dimensional orthogonal range and line segment intersection reporting in logarithmic time", Proceedings of the fourteenth annual ACM-SIAM symposium on Discrete algorithms, 2003.
[25] J. Lunteren, "Searching very large routing tables in fast SRAM", ICCCN 2001.
[26] J. Lunteren, "Searching very large routing tables in wide embedded memory", Proceedings Globecom, 2001.
[27] J. Munro and S. Rao, "Succinct representation of data structures", in Handbook of Data Structures and Applications, D. Mehta and S. Sahni editors, Chapman & Hall/CRC, 2005.
[28] S. Nilsson and G. Karlsson, "Fast address look-up for Internet routers", IEEE Broadband Communications, 1998.
[29] L. Qiu, G. Varghese and S. Suri, "Fast firewall implementation for software and hardware based routers", 9th International Conference on Network Protocols, 2001.
[30] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous, "Survey and taxonomy of IP address lookup algorithms", IEEE Network, 2001, 8-23.


[31] S. Sahni, K. Kim, and H. Lu, "Data structures for one-dimensional packet classification using most-specific-rule matching", International Journal on Foundations of Computer Science, 14, 3, 2003, 337-358.
[32] S. Sahni and K. Kim, "Efficient construction of multibit tries for IP lookup", IEEE/ACM Transactions on Networking, 11, 4, 2003.
[33] S. Singh, F. Baboescu, G. Varghese, and J. Wang, "Packet Classification Using Multidimensional Cutting", Proceedings of ACM SIGCOMM, August 2003.
[34] V. Srinivasan, "A packet classification and filter management system", INFOCOM, 2001.
[35] V. Srinivasan, S. Suri, and G. Varghese, "Packet classification using tuple space search", ACM SIGCOMM, 1999.
[36] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, "Fast and scalable layer four switching", Proc. ACM SIGCOMM, 1998.
[37] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, "Scalable Algorithms for Layer four Switching", Proceedings of ACM Sigcomm, 8, 1998.
[38] V. Srinivasan and G. Varghese, "Faster IP lookups using controlled prefix expansion", ACM Transactions on Computer Systems, Feb:1-40, 1999.
[39] H. Song, J. Turner, and J. Lockwood, "Shape Shifting Tries for Faster IP Route Lookup", Proceedings of 13th IEEE International Conference on Network Protocols, 2005.
[40] X. Sun and Y. Zhao, "Packet classification using independent sets", IEEE Symposium on Computers & Communications, 2003, 83-90.
[41] X. Sun and Y. Zhao, "An On-Chip IP Address Lookup Algorithm", IEEE Transactions on Computers, 2005, 873-885.
[42] D. Taylor and J. Turner, "ClassBench: A Packet Classification Benchmark", INFOCOM, 2005.
[43] D. Taylor and J. Turner, "Scalable Packet Classification using Distributed Crossproducting of Field Labels", INFOCOM, 2005.
[44] M. Wang, S. Deering, T. Hain, and L. Dunn, "Non-random Generator for IPv6 Tables", 12th Annual IEEE Symposium on High Performance Interconnects, 2004.
[45] F. Zane, G. Narlikar and A. Basu, "CoolCAMs: Power-Efficient TCAMs for Forwarding Engines", INFOCOM, 2003.


Wencheng Lu received his bachelor's (1998) and master's (2001) degrees from Zhejiang University in China. After graduation from Zhejiang University, Mr. Lu worked for Philips (China) Investment Co. as a software engineer. He worked with Dr. Sahni for his Ph.D. degree at the University of Florida, in the field of computer networks and communications.