<%BANNER%>

Data Structures for Static and Dynamic Router Tables


PAGE 1

D A T A STR UCTURES F OR ST A TIC AND D YNAMIC R OUTER T ABLES By KUN SUK KIM A DISSER T A TION PRESENTED TO THE GRADUA TE SCHOOL OF THE UNIVERSITY OF FLORID A IN P AR TIAL FULFILLMENT OF THE REQUIREMENTS F OR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORID A 2003

PAGE 2

Cop yrigh t 2003 b y Kun Suk Kim

PAGE 3

T o m y mother, Kyung Hw an Lee m y wife, Hy e Ryung and m y sons, Jin Sung and Daniel

PAGE 4

A CKNO WLEDGMENTS I appreciate Go d for m y exciting and w onderful time at the Univ ersit y of Florida. In teractions with the man y m ulticultural studen ts and great professors ha v e enric hed me in b oth academic and so cial asp ects. I w ould lik e to deeply thank Distinguished Professor Sarta j Sahni, m y advisor for guiding me through the do ctoral study I thank Go d for ha ving giv en me suc h an ideal men tor. I ha v e learned from him ho w an advisor should treat his studen ts. His en th usiastic and dev oted attitude to w ard teac hing and researc h has strongly aected me. He has inspired me to solv e dicult problems and giv en me in v aluable advices. I consider him as a role mo del for m y life and hop e to lift m y abilities up to his high standards in the future. I also thank Drs. Randy Cho w, Ric hard Newman, Shigang Chen, and Janise McNair for serving on m y sup ervisory committee. I am thankful to V enk atac hary Sriniv asan and Jon Sharp for giving their programming co des and commen ts for the m ultibit trie and BSL structures, resp ectiv ely With their help I could start m y researc h smo othly Sp ecial thanks go to m y former advisor, Dr. Y ann-Hang Lee for supp orting me for the frst t w o semesters at Univ ersit y of Florida and encouraging me ev en after his mo ving to Arizona State Univ ersit y I w ould lik e to thank former and curren t mem b ers of the Korean Studen t Asso ciation of the CISE departmen t at the Univ ersit y of Florida for their assistance and friendship. Thanks go to Daey oung Kim, Y o onmee Doh, Y oung Jo on Byun, Myung Ch ul Song, and man y others. I am eternally grateful to P astor Hee Y oung Sohn, who is m y spiritual men tor; and to the ministers at Korean Baptist Ch urc h of Gainesville (KBCG), for con tin uously caring ab out me in Jesus lo v e. Thanks go to Jin Kun Song, Dong Y ul Sung, iv

PAGE 5

and other former and curren t mem b ers of m y cell c h urc h of KBCG for sharing their liv es with me and pra ying for me. I cannot adequately express m y gratitude to m y mother, Kyung Hw an Lee. She has w ork ed hard to pro vide me with b etter educational opp ortunities that made it p ossible for me to get this degree. I w ould lik e to thank Mo on Suk, m y brother; and Mi Ok, m y sister for their constan t supp ort and encouragemen t. I am also grateful to m y paren ts-in-la w, Dae Hun Song and Ok Jo Choi; m y fv e elder sisters-in-la w (and their h usbands), Mi Hy e, Mi Ah, Hy e Y oung, and Hy e Y oun; and m y y ounger sister-in-la w, Hy o Jae for supp orting me b oth materially and morally A t the end I ha v e to men tion m y n uclear family that has put up with m y absence for man y ev enings while I fnished this w ork. Thanks go to m y wife, Hy e Ryung; and m y sons, Jin Sung and Daniel for their lo v e and tolerance. I am grateful to all for their help and guidance and hop e to remem b er their lo v e forev er. v

PAGE 6

vi TABLE OF CONTENTS page ACKNOWLEDGMENTS.................................................................................................iv LIST OF TABLES...........................................................................................................viii LIST OF FIGURES.............................................................................................................x ABSTRACT.....................................................................................................................xi ii CHAPTER 1 INTRODUCTION......................................................................................................1 1.1 Internet Router....................................................................................................2 1.1.1 Internet Protocols.....................................................................................4 1.1.2 Classless Inter-Domain Routing (CIDR).................................................6 1.1.3 Packet Forwarding...................................................................................8 1.2 Packet Classification........................................................................................10 1.3 Prior Work........................................................................................................15 1.3.1 Linear List..............................................................................................15 1.3.2 End-Point Array.....................................................................................16 1.3.3 Sets of Equal-Length Prefixes................................................................16 1.3.4 Tries.......................................................................................................19 1.3.5 Binary Search Trees...............................................................................20 1.5.6 Others.......................................................................................................21 1.4 Dissertation Outline..........................................................................................22 2 MULTIBIT TRIES...................................................................................................23 2.1 1-Bit Tries........................................................................................................23 2.2 Fixed-Stride Tries.............................................................................................26 2.2.1 Definition...............................................................................................26 2.2.2 Construction of Optimal Fixed-Stride Tries..........................................27 2.3 Variable-Stride Tries........................................................................................38 2.3.1 Definition and Construction...................................................................38 2.3.2 An Example............................................................................................44 2.3.3 Faster k = 2 Algorithm...........................................................................46 2.3.4 Faster k = 3 Algorithm...........................................................................48 2.4 Experimental Results........................................................................................51 2.4.1 Performance of Fixed-Stride Algorithm................................................51 2.4.2 Performance of Fixed-Stride Algorithm................................................52 2.5 Summary..........................................................................................................67

PAGE 7

vii 3 BINARY SEARCH ON PREFIX LENGTH...........................................................68 3.1 Heuristic of Srinivasan.....................................................................................71 3.2 Optimal-Storage Algorithm..............................................................................74 3.2.1 Expansion Cost......................................................................................75 3.2.2 Number of Markers................................................................................76 3.2.3 Algorithm for ECHT..............................................................................78 3.3 Alternative Formulation...................................................................................80 3.4 Reduced-Range Heuristic.................................................................................81 3.5 More Accurate Cost Estimator.........................................................................95 3.6 Experimental Results........................................................................................95 3.7 Summary..........................................................................................................99 4 O (log n ) DYNAMIC ROUTER-TABLE...............................................................101 4.1 Prefixes and Ranges.......................................................................................101 4.2 Properties of Prefix Ranges............................................................................103 4.3 Representation Using Binary Search Trees....................................................105 4.3.1 Representation......................................................................................105 4.3.2 Longest Prefix Matching......................................................................108 4.3.3 Inserting a Prefix..................................................................................110 4.3.4 Deleting a Prefix..................................................................................120 4.3.5 Complexity...........................................................................................123 4.3.6 Comments............................................................................................124 4.4 Experimental Results......................................................................................125 4.5 Summary........................................................................................................129 5 DYNAMIC LOOKUP FOR BU RSTY ACCESS PATTERNS.............................132 5.1 Biased Skip Lists with Prefix Trees...............................................................133 5.2 Collection of Splay Trees...............................................................................139 5.3 Comparison of BITs and ABITs....................................................................140 5.4 Experimental Results......................................................................................142 5.5 Summary........................................................................................................154 6 CONCLUSIONS AND FUTURE WORK............................................................159 6.1 Conclusions....................................................................................................159 6.2 Future Work.....................................................................................................161 REFERENCES ................................................................................................................164 BIOGRAPHICAL SKETCH...........................................................................................170

PAGE 8

viii LIST OF TABLES Table page 2-1 Prefix databases obtained fr om IPMA project on Sep 13, 2000............................25 2-2 Distributions of the prefixes a nd nodes in the 1-bit trie for Paix...........................25 2-3 Memory required (in Kbytes) by best k -level FST................................................51 2-4 Execution time (in sec) for FST algorithms, Pentium 4 PC................................53 2-5 Execution time (in sec) for FST algorithms, SUN Ultra Enterprise 4000/5000.............................................................................................................53 2-6 Memory required (in Kbytes) by best k -level VST...............................................54 2-7 Execution times (in msec) for firs t two implementations of our VST algorithm, Pentium 4 PC........................................................................................56 2-8 Execution times (in msec) for firs t two implementations of our VST algorithm, SUN Ultra Enterprise 4000/5000.........................................................56 2-9 Execution times (in msec) for third implementation of our VST algorithm, Pentium 4 PC.........................................................................................................57 2-10 Execution times (in msec) for third implementation of our VST algorithm, SUN Ultra Enterprise 4000/5000...........................................................................57 2-11 Execution times (in msec) for our best VST implementation and the VST algorithm of Srinivasan and Varghese, Pentium 4 PC...........................................59 2-12 Execution times (in msec) for our best VST implementation and the VST algorithm of Srinivasan and Varghe se, SUN Ultra Enterprise 4000/5000............59 2-13 Time (in msec) to construct optimal VST from optimal stride data, Pentium 4 PC........................................................................................................................61 2-14 Search time (in sec) in optimal VST, Pentium 4 PC...........................................61 2-15 Insertion time (in sec) for OptVST, Pentium 4 PC..............................................64 2-16 Deletion time (in sec) for OptVST, Pentium 4 PC..............................................64

PAGE 9

ix 2-17 Insertion time (in sec) for Batch1, Pentium 4 PC................................................64 2-18 Deletion time (in sec) for Batch1, Pentium 4 PC................................................65 2-19 Insertion time (in sec) for Batch2, Pentium 4 PC................................................65 2-20 Deletion time (in sec) for Batch2, Pentium 4 PC................................................66 3-1 Number of prefixes and markers in solution to ECHT(P, k) .................................97 3-2 Number of prefixes and markers in solution to ACHT(P, k) .................................98 3-3 Preprocessing time in m illiseconds........................................................................98 3-4 Execution time, in sec, for ECHT(P, k) ...............................................................99 3-5 Execution time, in sec, for ACHT(P, k) .............................................................100 4-1 Statistics of prefix databases obt ained from IPMA project on Sep 13, 2000......125 4-2 Memory for data structure (in Kbytes)................................................................126 4-3 Execution time (in sec) for randomized databases............................................128 4-4 Execution time (in sec) for original databases...................................................129 5-1 Memory requirement (in KB)..............................................................................144 5-2 Trace sequences...................................................................................................145 5-3 Search time (in sec) for CRBT, ACRBT, and SACRBT structures on NODUP, DUP, and RAN data sets......................................................................147 5-4 Search time (in sec) for CST and BSLPT structures on NODUP, DUP, and RAN data sets................................................................................................148 5-5 Search time (in sec) for CRBT, ACRBT, and SACRBT structures on trace sequences.....................................................................................................149 5-6 Search time (in sec) for CST and BSLPT structures on trace sequences..........150 5-7 Average time to insert a prefix (in sec).............................................................153 5-8 Average time to delete a prefix (in sec).............................................................153 6-1 Performance of data structur es for longest matching-prefix................................161

PAGE 10

x LIST OF FIGURES Figure page 1-1 Internet structure......................................................................................................2 1-2 Generic router architecture......................................................................................5 1-3 Formats for IP packet header...................................................................................6 1-4 Transport protocol header formats...........................................................................7 1-5 Router table example.............................................................................................10 2-1 Prefixes and corresponding 1-bit trie.....................................................................24 2-2 Prefix expansion a nd fixed-stride trie....................................................................27 2-3 Algorithm for fixed-stride tries..............................................................................38 2-4 Two-level VST for prefixes of Figure 2-1(a)........................................................39 2-5 A prefix set and its e xpansion to four lengths........................................................44 2-6 1-bit trie for pref ixes of Figure 2-5(a)....................................................................44 2-7 Opt values in the computation of Opt(N0, 0, 4) .....................................................45 2-8 Optimal 4-VST for prefixes of Figure 2-5(a)........................................................46 2-9 Algorithm to compute C using Equation 2.20.......................................................47 2-10 Algorithm to compute T using Equation 2.22........................................................49 2-11 Memory required (in Kbytes) by best k -level FST................................................52 2-12 Execution time (in sec) for FST algorithms, Pentium 4 PC................................53 2-13 Execution time (in sec) for FST algorithms, SUN Ultra Enterprise 4000/5000..............................................................................................................54 2-14 Memory required (in Kbytes) for Paix by best k -VST and best FST....................55 2-15 Execution times (in msec) for Paix for our three VST implementations, Pentium 4 PC.........................................................................................................58

PAGE 11

xi 2-16 Execution times (in msec) for Paix for our three VST implementations, SUN Ultra Enterprise 4000/5000...........................................................................58 2-17 Execution times (in msec) for Paix for our best VST implementation and the VST algorithm of Srinivasan and Varghese, Pentium 4 PC............................60 2-18 Execution times (in msec) for Paix for our best VST implementation and the VST algorithm of Srinivasan and Varghese, SUN Ultra Enterprise 4000/5000..............................................................................................................60 2-19 Search time (in nsec) in op timal VST for Paix, Pentium 4 PC..............................62 2-20 Insertion time (in sec) for Paix, Pentium 4 PC....................................................65 2-21 Deletion time (in sec) for Paix, Pentium 4 PC ....................................................66 3-1 Controlled prefix expansion...................................................................................69 3-2 Prefixes and corresponding 1-bit trie.....................................................................73 3-3 Alternative binary tree for binary search...............................................................75 3-4 LEC and EC values for Figure 3-2.........................................................................76 3-5 LMC and MC values for Figure 3-2.......................................................................77 3-6 Optimal-storage CHTs for Figure 3-2....................................................................80 3-7 Algorithm for binary-search hash tables................................................................94 4-1 Prefixes and their ranges......................................................................................102 4-2 Pictorial and tabular represen tation of prefixes and ranges.................................102 4-3 Types of prefix ranges.........................................................................................104 4-4 CBST for Figure 4-2(a)........................................................................................106 4-5 Values of next() are shown as left arrows............................................................107 4-6 Algorithm to find LMP(d) ....................................................................................111 4-7 Pictorial representation of prefixes and ranges after inserting a prefix...............112 4-8 Basic interval tree and prefix trees after inserting P6 = 01* into Figure 4-4.......113 4-9 Algorithm to insert an end point..........................................................................114 4-10 Splitting a basic interval when lsb(u) = 1............................................................115

PAGE 12

xii 4-11 Prefix trees after inserting P7 = 10* into P1 P5 ..................................................117 4-12 Algorithm to update prefix trees..........................................................................119 4-13 P = S ; P S and S starts at s ; and P S and S finishes at f ..................................122 4-14 Memory required (in Kbytes) by best k -VST and CRBT for Paix......................126 4-15 Search time (in sec) comparison for Paix..........................................................129 4-16 Insert time (in sec) comparison for Paix............................................................130 4-17 Delete time (in sec) comparison for Paix..........................................................130 5-1 Skip list representation for ba sic intervals of Figure 4-2(a)................................134 5-2 Start point s of P splits the basic interval [ a b ]...................................................137 5-3 BSLPT insert algorithm.......................................................................................138 5-4 Alternative Base Interval Tree corresponding to Figure 4-2(a)...........................140 5-5 Total memory requirement (in MB)....................................................................145 5-6 Average search time for NOD UP, DUP, and RAN data sets..............................156 5-7 Average search time for trace sequences.............................................................157 5-8 Average time to insert a prefix.............................................................................158 5-9 Average time to delete a prefix............................................................................158

PAGE 13

Abstract of Dissertation Presen ted to the Graduate Sc ho ol of the Univ ersit y of Florida in P artial F ulfllmen t of the Requiremen ts for the Degree of Do ctor of Philosoph y D A T A STR UCTURES F OR ST A TIC AND D YNAMIC R OUTER T ABLES By Kun Suk Kim August 2003 Chair: Sarta j K. Sahni Ma jor Departmen t: Computer and Information Science and Engineering The In ternet has b een gro wing exp onen tially since man y users mak e use of new applications and w an t to connect their hosts to In ternet. Because of increased processing p o w er and high-sp eed comm unication links, pac k et header pro cessing has b ecome a ma jor b ottlenec k in In ternet routing. T o impro v e pac k et forw arding, our study dev elop ed fast and ecien t algorithms. W e impro v ed on dynamic programming algorithms to determine the strides of optimal m ultibit tries b y pro viding alternativ e dynamic programming form ulations for b oth fxedand v ariable-stride tries. While the asymptotic complexities of our algorithms are the same as those for the corresp onding algorithms of [82], exp erimen ts using real IPv4 routing table data indicate that our algorithms run considerably faster. An added feature of our v ariable-stride trie algorithm is the abilit y to insert and delete prefxes taking a fraction of the time needed to construct an optimal v ariable-stride trie from scratc h. IP lo okup in a collection of hash tables (CHT) organization can b e done with O (log l dist ) hash-table searc hes, where l dist is the n um b er of distinct prefx-lengths (also equal to the n um b er of hash tables in the CHT). W e dev elop ed an algorithm xiii

PAGE 14

that minimizes the storage required b y the prefxes and mark ers for the resulting set of prefxes to reduce the v alue of l dist b y using the con trolled prefx-expansion tec hnique. Also, w e prop osed impro v emen ts to the heuristic of [80]. W e dev elop ed a data structure called collection of red-blac k trees (CRBT) in whic h prefx matc hing, insertion, and deletion can eac h b e done in O (log n ) time, where n is the n um b er of prefxes in the router table. Exp erimen ts using real IPv4 routing databases indicate that although the prop osed data structure is slo w er than optimized v ariable-stride tries for longest prefx matc hing, the prop osed data structure is considerably faster for the insert and delete op erations. W e form ulate a v arian t, alternativ e collection of red-blac k trees (A CRBT) from the CRBT data structure to dev elop data structures for burst y access patterns. By replacing the red-blac k trees used in the A CRBT with spla y trees (or biased skip lists), w e obtained the collection of spla y trees (CST) structure (or the biased skip lists with prefx trees (BSLPT) structure) in whic h searc h, insert, and delete tak e O (log n ) amortized time (or O (log n ) exp ected time) p er op eration, where n is the n um b er of prefxes in the router table. Exp erimen tal results using real IPv4 routing databases and syn thetically generated searc h sequences as w ell as trace sequences indicate that the CST structure is b est for extremely burst y access patterns. Otherwise, the A CRBT is recommended. Our exp erimen ts also indicate that a sup erno de implemen tation of the A CRBT usually has b etter searc h p erformance than do es the traditional one-elemen t-p er-no de implemen tation. xiv

PAGE 15

CHAPTER 1 INTR ODUCTION The inruence of In ternet ev olution reac hes not only to the tec hnical felds of computer and comm unications but throughout so ciet y as w e mo v e to w ard increasing use of online services (e.g., electronic commerce and information acquisition). The In ternet is a w orld-wide comm unication infrastructure in whic h individuals and their computers in teract and collab orate without regard for geographic lo cation. Beginning with the early researc h on p acket switching 1 and the ARP ANET, 2 go v ernmen t, industry and academia ha v e b een co op erating to ev olv e and deplo y this exciting new tec hnology [12 45 ]. The In ternet is an in ternet w ork that ties man y groups of net w orks with a common In ternet Proto col (IP) [6, 19]. Figure 1{1 sho ws routers as the switc h p oin ts of an in ternet w ork. A pac k et ma y pass through man y dieren t deplo ymen t classes of routers from source to destination. The en terprise router, lo cated at the lo w est lev el in the router hierarc h y m ust supp ort tens to thousands of net w ork routes and medium bandwidth in terfaces of 100 Mbps to 1 Gbps. Access routers are aggregation p oin ts for corp orations and residen tial services. Access routers m ust supp ort thousands or tens of thousands of routes. Residen tial terminals are connected to mo dem p o ols of the telephone cen tral oce through plain old telephone service (POTS), cable service, 1 This tec hnology is fundamen tally dieren t from the cir cuit switching that w as used b y the telephone system. In a pac k et-switc hing system, data to b e deliv ered is brok en in to small c h unks called pac k et that are lab eled to sho w where they come from and where they are to go. 2 This pro ject w as sp onsored b y U.S. Departmen t of Defense to dev elop a whole new sc heme for p ostn uclear comm unication. 1

PAGE 16

2 Core routerr ISP access routerr Enterprise routerr Modemr Bankr Figure 1{1: In ternet structure or one of the digital subscrib er lines (DSLs). Bac kb one routers require m ultiples of high bandwidth p orts suc h as OC-48 at 2.5 Gbps and OC-192 at 9.6 Gbps. Bac kb one routers co v er national and in ternational areas. With the doubling of In ternet trac ev ery 3 mon ths [85 ] and the tripling of In ternet hosts ev ery 2 y ears [31], the imp ortance of high sp eed scalable net w ork routers cannot b e o v eremphasized. F ast net w orking \will pla y a k ey role in enabling future progress" [54]. F ast net w orking requires fast routers; and fast routers require fast router table lo okup. The rest of this c hapter is structured as follo ws. Section 1.1 in tro duces the basics of In ternet routers. W e describ e the IP lo okup and pac k et classifcation problems in Section 1.2. In Section 1.3 discusses related w ork in these felds. Finally Section 1.4 presen ts an outline of the dissertation. 1.1 In ternet Router Figure 1{2 sho ws the generic arc hitecture of an IP router. Generally a router consists of the follo wing basic comp onen ts: the con troller card, the router bac kplane, and line cards [3 51]. The CPU in the con troller card p erforms path computations

PAGE 17

3 and router table main tenance. The line cards p erform in b ound and outb ound pac k et forw arding. The router bac kplane transfers pac k ets b et w een the cards. The basic functions in a router can b e classifed as routing, pac k et forw arding, switc hing, and queueing [3, 6, 51, 58]. W e discuss the eac h function in more detail b elo w. Routing: Routing is the pro cess of comm unicating to other routers and exc hanging route information to construct and main tain the router tables that are used b y the pac k et-forw arding function. Routing proto cols that are used to learn ab out and create a view of the net w ork's top ology include the routing information proto col (RIP) [37 ], op en shortest path frst (OSPF) [55 ], b order gatew a y proto col (BGP) [47 65], distance v ector m ulticast routing proto col (D VMRP) [86 ], and proto col indep enden t m ulticast (PIM) [27 ]. P ac k et forw arding: The router lo oks at eac h incoming pac k et and p erforms a table lo okup to decide whic h output p ort to use. This is based on the destination IP address in the incoming pac k et. The result of this lo okup ma y imply a lo cal, unicast, or m ulticast deliv ery A lo cal deliv ery is the case that the destination address is one of the router's lo cal addresses and the pac k et is lo cally deliv ered. A unicast deliv ery sends the pac k et to a single output p ort. A m ulticast deliv ery is done through a set of output p orts, dep ending on the m ulticast group mem b ership of the router. In addition to table lo okup, routers m ust p erform other functions. { P ac k et v alidation: This function c hec ks to see that the reciev ed IPv4 pac k et is prop erly formed for the proto col b efore it pro ceeds with proto col pro cessing. Ho w ev er, b ecause the c hec ksum calculation is considered to o exp ensiv e, curren t routers hardly v erify the c hec ksum, instead assuming that pac k ets are transmitted through reliable media lik e fb er optics and assuming that end hosts will recognize an y p ossible corruption.

PAGE 18

4 { P ac k et lifetime con trol: The router adjusts the time-to-liv e (TTL) feld in the pac k et to prev en t pac k ets lo oping endlessly in the net w ork. A host sending a pac k et initializes the TTL with 64 (recommended b y [67]) or 255 (the maxim um). A pac k et b eing routed to output p orts has its TTL v alue decremen ted b y 1. A pac k et whose TTL is zero b efore reac hing the destination is discarded b y the router. { Chec ksum up date: The IP header c hec ksum m ust b e recalculated since the TTL feld w as c hanged. RF C 1071 [8] con tains implemen tation tec hniques for computing the IP c hec ksum. If only the TTL w as decremen ted b y 1, a router can ecien tly up date the c hec ksum incremen tally instead of calculating the c hec ksum o v er the en tire IP header [48]. P ac k et switc hing: P ac k et switc hing is the pro cess of mo ving pac k ets from one in terface to other p ort in terface based on the forw arding decision. P ac k et switc hing can b e done at v ery high sp eed [14 52 ]. Queueing: Queueing is the action of buering eac h pac k et in a small memory for a short time (on the order of a few microseconds) during pro cessing of the pac k et. Queueing can b e done at the input, in the switc h fabric, and/or at the output. 1.1.1 In ternet Proto cols The headers of the IPv4 and IPv6 proto cols [21, 59 ] are sho wn in Figure 1{3. Unicast pac k ets are forw arded based on the destination address feld. Eac h router b et w een source and destination m ust lo ok at this feld. Multicast pac k ets are forw arded based on the source net w ork and destination group address. The proto col feld defnes the transp ort proto col (e.g., TCP [60] and UDP [61]) that is encapsulated within this IP pac k et. The t yp e-of-service (T oS) feld notices a pac k et's priorit y its queueing, and its dropping b eha vior (to the routers). Some applications suc h as telnet and FTP set these rags.

PAGE 19

5 Controller Cardr Routerr Tabler Routerr Controllerr Line Cardr Forwardingr Incomingr packetr Outgoingr packetr Routerr Backplaner Figure 1{2: Generic router arc hitecture The most notable c hange from the IPv4 to the IPv6 header is the address length of 128 bits. P a yload length is the length of the IPv6 pa yload in o ctets. Next header uses the same v alue as the IPv4 proto col feld. Hop limit is decremen ted b y 1 b y eac h no de that forw ards the pac k et. The pac k et is discarded if the hop limit is decremen ted to zero. The ro w ID feld is added to simplify pac k et classifcation. The tuple (source address, ro w ID) uniquely iden tifes a ro w for an y nonzero ro w ID. The headers of t w o transp ort proto cols, as sho wn in Figure 1{4, pro vide more information (suc h as source and destination p ort n um b ers and rags that are used to further classify pac k ets). In TCP and UDP net w orks, a p ort is an endp oin t to a logical connection and the w a y a clien t program sp ecifes a sp ecifc serv er program on a computer in a net w ork. These t w o p ort n um b ers are used to distribute pac k ets to the application and represen t the fne-grained v ariet y of ro ws. They can b e used to iden tify a ro w within the net w ork. Th us, applications can reserv e the resources to

PAGE 20

6 Versr HLenr ToSr Packet Lengthr Identificationr Flagment Info/Offsetr Protocolr TTLr Header Checksumr Source Addressr Destination Addressr IP Options (optional, variable length)r 0r 3 (octet)r 2r 1r (a) F ormat for an IPv4 header Payload Lengthr Source Address (128 bits)r Destination Address (128 bits)r Next Headerr Hop Limitr 0r 3 (octet)r 2r 1r Versr Flow IDr Traffic Classr (b) F ormat for an IPv6 header Figure 1{3: F ormats for IP pac k et header guaran tee their service requiremen t using appropriate signalling proto col lik e RSVP [9]. Some p orts ha v e n um b ers that are preassigned to them, and these are kno wn as wel l-known p orts P ort n um b ers range from 0 to 65536, but only p orts n um b ers 0 to 1024 are reserv ed for privileged services and designated as w ell-kno wn p orts [67]. Eac h of these w ell-kno wn p ort n um b ers sp ecifes the p ort used b y the serv er pro cess as its con tact p ort. F or example, p ort n um b ers 20, 25, and 80 are assigned for FTP simple mail transfer proto col (SMTP) [42], and h yp ertext transfer proto col (HTTP) [29] serv ers, resp ectiv ely 1.1.2 Classless In ter-Domain Routing (CIDR) As the In ternet has ev olv ed and gro wn, it faced t w o serious scaling problems [38].

PAGE 21

7 Source Portr Destination Portr Sequence Numberr Acknowlegement Numberr Window Sizer TCP Options (optional, variable length)r Offsetr Reservedr Flagsr Checksumr Urgent Pointerr 0r 3 (octet)r 2r 1r (a) TCP header Source Portr Destination Portr UDP Data Lengthr Checksumr 0r 3 (octet)r 2r 1r (b) UDP header Figure 1{4: T ransp ort proto col header formats Exhaustion of IP address space: In the old Class A, B, and C address sc heme, a fundamen tal cause of this problem w as the lac k of a net w ork class of a size that is appropriate for a mid-sized organization. Class C with a maxim um of 254 host addresses is to o small; while class B, whic h allo ws up to 65534 addresses, is to o large to b e densely p opulated. The result is inecien t use of class B net w ork n um b ers. F or example, if y ou needed 500 addresses to confgure a net w ork, y ou w ould b e assigned Class B. Ho w ev er, that means 65034 un used addresses. Routing information o v erload: As the n um b er of net w orks on the In ternet increased, so did the n um b er of routes. The size and rate of gro wth of the router tables in In ternet routers is b ey ond the abilit y to eectiv ely manage it. CIDR is a mec hanism to slo w the gro wth of router tables and allo w for more ecien t allo cation of IP addresses than the old Class A, B, and C address sc heme. Tw o solutions to these problems w ere dev elop ed and adopted b y the In ternet comm unit y [66, 30].

PAGE 22

8 Restructuring IP address assignmen ts: Instead of b eing limited to a netw ork iden tifer (or pr efx ) of 8, 16, or 24 bits, CIDR uses generalized prefxes an ywhere from 13 to 27 bits. Th us, blo c ks of addresses can b e assigned to netw orks with 32 hosts or to those with o v er 500,000 hosts. This allo ws for address assignmen ts that m uc h more closely ft an organization's sp ecifc needs. Hierarc hical routing aggregation: The CIDR addressing sc heme also enables r oute aggr e gation in whic h a single high-lev el route en try can represen t man y lo w er-lev el routes in the global router tables. Big blo c ks of addresses are assigned to the large In ternet Service Pro viders (ISPs) who then re-allo cate p ortions of their address blo c ks to their customers. F or example, a tier-1 ISP (e.g., Sprin t and P acifc Bell) w as assigned a CIDR address blo c k with a prefx of 14 bits and t ypically assigns its customers, who ma y b e smaller tier-2 ISPs, CIDR addresses with prefxes ranging from 27 bits to 18 bits. These customers in turn re-allo cate p ortions of their address blo c k to their users and/or customers (tier-3 or lo cal ISPs). In the bac kb one router tables all these dieren t net w orks and hosts can b e represen ted b y the single tier-1 ISP route en try In this w a y the gro wth in the n um b er of router table en tries at eac h lev el in the net w ork hierarc h y has b een signifcan tly reduced. 1.1.3 P ac k et F orw arding Consider a part of the In ternet in Figure 1{5(a) to get an in tuitiv e idea of pac k et deliv ery If a user in Chicago wishes to send a pac k et to Orlando, the pac k et is sen t to a router R4. The router R4 ma y send this pac k et on the comm unication link L3 to a router R1. The router R1 ma y then send the pac k et on link L4 to a router R5 in Orlando. R4 then sends the pac k et to the fnal destination. An In ternet router table is a set of tuples of the form ( p; a ), where p is a binary string whose length is at most W ( W = 32 for IPv4 destination addresses and W = 128 for IPv6), and a is an output link (or next hop). When a pac k et with destination

PAGE 23

9 address A arriv es at a router, w e are to fnd the pair ( p; a ) in the router table for whic h p is a longest matc hing prefx of A (i.e., p is a prefx of A and there is no longer prefx q of A suc h that ( q ; b ) is in the table). Once this pair is determined, the pac k et is sen t to output link a The sp eed at whic h the router can route pac k ets is limited b y the time it tak es to p erform this table lo okup for eac h pac k et. F or example, consider a router table at the router R1 in Figure 1{5(a), sho wn in Figure 1{5(b). Assume that when a pac k et arriv es on router R1, the pac k et carries the destination address 101110 in its header. In this example w e assume that the longest prefx length is 6. T o forw ard the pac k et to its fnal destination, router R1 consults a router table, whic h lists eac h p ossible prefx and the corresp onding output link. The address 101110 matc hes b oth 1* and 101* in the router table, but 101* is the longest matc hing prefx. Since the table indicates output link L2, the router then switc hes the pac k et to L2. Longest prefx routing is used b ecause this results in smaller and more manageable router tables. It is impractical for a router table to con tain an en try for eac h of the p ossible destination addresses. Tw o of the reasons this is so are The n um b er of suc h en tries w ould b e almost one h undred million and w ould triple ev ery 3 y ears. Ev ery time a new host comes online, all router tables m ust incorp orate the new host's address. By using longest prefx routing, the size of router tables is con tained to a reasonable quan tit y; and information ab out host/router c hanges made in one part of the In ternet need not b e propagated throughout the In ternet.

PAGE 24

10 Chicagor (11010*)r Orlandor (11100*)r Los Angelesr (101*)r CAr (1*)r R1r L1r L5r L4r L3r L2r R5r R4r R3r R2r (a) Bac kb one routers Prefixr Output Linkr 1*r 101*r 11010*r 11100*r L1r L2r L3r L4r (b) Router table for router R1 Figure 1{5: Router table example 1.2 P ac k et Classifcation An In ternet router classifes incoming pac k ets in to ro ws, 3 using information con tained in pac k et headers and a table of (classifcation) rules. This table is called the rule table (equiv alen tly r outer table ). The pac k et-header information that is used to p erform the classifcation is some subset of the source and destination addresses, the source and destination p orts, the proto col, proto col rags, t yp e of service, and so 3 A ro w is a set of pac k ets that are to b e treated similarly for routing purp oses.

PAGE 25

11 on. The sp ecifc header information used for pac k et classifcation is go v erned b y the rules in the rule table. Eac h rule-table rule is a pair of the form ( F ; A ), where F is a flter and A is an action. The action comp onen t of a rule sp ecifes what is to b e done when a pac k et that satisfes the rule flter is receiv ed. Sample actions are drop the pac k et, forw ard the pac k et along a certain output link, and reserv e a sp ecifed amoun t of bandwidth. A rule flter F is a tuple that is comprised of one or more felds. In the simplest case of destination-based pac k et forw arding, F has a single feld, whic h is a destination (address) prefx; and A is the next hop for pac k ets whose destination address has the sp ecifed prefx. F or example, the rule (01 ; a ) states that the next hop for pac k ets whose destination address (in binary) b egins with 01 is a IP m ulticasting uses rules in whic h F comprises the t w o felds source prefx and destination prefx; QoS routers ma y use fv e-feld rule flters (source-address prefx, destination-address prefx, source-p ort range, destination-p ort range, and proto col); and frew all flters ma y ha v e one or more felds. In the d -dimensional pac k et classifcation problem, eac h rule has a d -feld flter. Our study w as concerned solely with 1-dimensional pac k et classifcation. It should b e noted, that data structures for m ultidimensional pac k et classifcation are usually built on top of data structures for 1-dimensional pac k et classifcation. Therefore, the study of data structures for 1-dimensional pac k et classifcation is fundamen tal to the design and dev elopmen t of data structures for d -dimensional, d > 1, pac k et classifcation. F or the 1-dimensional pac k et classifcation problem, w e assume that the single feld in the flter is the destination feld; and that the action is the next hop for the pac k et. With these assumptions, 1-dimensional pac k et classifcation is equiv alen t to the destination-based pac k et forw arding problem. Henceforth, w e use the terms rule table and router table to mean tables in whic h the flters ha v e a single feld, whic h is the destination address. This single feld of a flter ma y b e sp ecifed in one of t w o w a ys:

PAGE 26

12 As a range: F or example, the range [35, 2096] matc hes all destination addresses d suc h that 35 d 2096. As an address/mask pair: Let x i denote the i th bit of x The address/mask pair a=m matc hes all destination addresses d for whic h d i = a i for all i for whic h m i = 1. That is, a 1 in the mask sp ecifes a bit p osition in whic h d and a m ust agree; while a 0 in the mask sp ecifes a don't-care bit p osition. F or example, the address/mask pair 101100/011101 matc hes the destination addresses 101100, 101110, 001100, and 001110. When all the 1-bits of a mask are to the left of all 0-bits, the address/mask pair sp ecifes an address prefx. F or example, 101100/110000 matc hes all destination addresses that ha v e the prefx 10 (i.e., all destination addresses that b egin with 10). In this case, the address/mask pair is simply represen ted as the prefx 10*, where the denotes a sequence of don't-care bits. If W is the length, in bits, of a destination address, then the in 10* represen ts all sequences of ( W )Tj/T1_0 11.9552 Tf11.7494 0 Td(2) bits. In IPv4 the address and mask are b oth 32 bits; while in IPv6 b oth of these are 128 bits. Notice that ev ery prefx ma y b e represen ted as a range. F or example, when W = 6, the prefx 10* is equiv alen t to the range [32, 47]. A range that ma y b e sp ecifed as a prefx for some W is called a pr efx r ange The sp ecifcation 101100/011101 ma y b e abbreviated to ?011?0, where ? denotes a don't-care bit. This sp ecifcation is not equiv alen t to an y single range. Also, the range sp ecifcation [3,6] is not equiv alen t to an y single address/mask sp ecifcation. When more than one rule matc hes an incoming pac k et, a tie o ccurs. T o select one of the man y rules that ma y matc h an incoming pac k et, w e use a tie br e aker Let R S b e the set of rules in a rule table and let F S b e the set of flters asso ciated with these rules. r ul es ( d; R S ) (or simply r ul es ( d ) when R S is implicit) is the subset of rules of R S that matc h/co v er the destination address d f il ter s ( d; F S ) and

PAGE 27

13 f il ter s ( d ) are defned similarly A tie o ccurs whenev er j r ul es ( d ) j > 1 (equiv alen tly j f il ter s ( d ) j > 1). Three p opular tie break ers are First matc hing rule in table: The rule table is assumed to b e a linear list ([39 ]) of rules with the rules indexed 1 through n for an n -rule table. The action corresp onding to the frst rule in the table that matc hes the incoming pac k et is used. In other w ords, for pac k ets with destination address d the rule of r ul es ( d ) that has least index is selected. F or our example router table corresp onding to the fv e prefxes of Figure 4{ 1, rule R1 is selected for ev ery incoming pac k et, b ecause P1 matc hes ev ery destination address. When using the frst-matc hing-rule criteria, w e m ust index the rules carefully In our example, P1 should corresp ond to the last rule so that ev ery other rule has a c hance to b e selected for at least one destination address. Highest-priorit y rule: Eac h rule in the rule table is assigned a priorit y F rom among the rules that matc h an incoming pac k et, the rule that has highest priorit y wins is selected. T o a v oid the p ossibilit y of a further tie, rules are assigned dieren t priorities (it is actually sucien t to ensure that for ev ery destination address d r ul es ( d ) do es not ha v e t w o or more highest-priorit y rules). Notice that the frst-matc hing-rule criteria is a sp ecial case of the highestpriorit y criteria (simply assign eac h rule a priortiy equal to the negativ e of its index in the linear list). Most-sp ecifc-rule matc hing: The flter F 1 is more sp ecifc than the flter F 2 i F 2 matc hes all pac k ets matc hed b y F 1 plus at least one additional pac k et. So, for example, the range [2 ; 4] is more sp ecifc than [1 ; 6], and [5 ; 9] is more sp ecifc than [5 ; 12]. Since [2 ; 4] and [8 ; 14] are disjoin t (i.e., they ha v e no address in common), neither is more sp ecifc than the other. Also, since [4 ; 14] and

PAGE 28

14 [6 ; 20] in tersect 4 neither is more sp ecifc than the other. The prefx 110* is more sp ecifc than the prefx 11*. In most-sp ecifc-rule matc hing, ties are brok en b y selecting the matc hing rule that has the most sp ecifc flter. When the flters are destination prefxes, the most-sp ecifc-rule that matc hes a giv en destination d is the longest 5 prefx in f il ter s ( d ). Hence, for prefx flters, the most-sp ecifc-rule tie break er is equivalen t to the longest-matc hing-prefx criteria used in router tables. F or our example rule set, when the destination address is 18, the longest matc hing-prefx is P4. When the flters are ranges, the most-sp ecifc-rule tie break er requires us to select the most sp ecifc range in f il ter s ( d ). Notice also that most-sp ecifcrange matc hing is a sp ecial case of the the highest-priorit y rule. F or example, when the flters are prefxes, set the prefx priorit y equal to the prefx length. F or the case of ranges, the range priorit y equals the negtiv e of the range size. In a static rule table, the rule set do es not v ary in time. F or these tables, w e are concerned primarily with the follo wing metrics: Time required to pro cess an incoming pac k et: This is the time required to searc h the rule table for the rule to use. Prepro cessing time: This is the time to create the rule-table data structure. Storage requiremen t: That is, ho w m uc h memory is required b y the ruletable data structure? 4 Tw o ranges [ u; v ] and [ x; y ] in tersect i u < x v < y x < u y < v 5 The length of a prefx is the n um b er of bits in that prefx (note that the is not used in determining prefx length). The length of P1 is 0 and that of P2 is 4.

PAGE 29

15 In practice, rule tables are seldom truly static. A t b est, rules ma y b e added to or deleted from the rule table infrequen tly T ypically in a \static" rule table, inserts/deletes are batc hed and the rule-table data structure reconstructed as needed. In a dynamic rule table, rules are added/deleted with some frequency F or suc h tables, inserts/deletes are not batc hed. Rather, they are p erformed in real time. F or suc h tables, w e are concerned additionally with the time required to insert/delete a rule. F or a dynamic rule table, the initial rule-table data structure is constructed b y starting with an empt y data structures and then inserting the initial set of rules in to the data structure one b y one. So, t ypically in the case of dynamic tables, the prepro cessing metric, men tioned ab o v e, is v ery closely related to the insert time. In this dissertation, w e fo cus on data structures for static and dynamic router tables (1-dimensional pac k et classifcation) in whic h the flters are either prefxes or ranges. Although some of our data structures apply equally w ell to all three of the commonly used tie break ers, our fo cus, in this dissertation, is on longest-prefx matc hing. 1.3 Prior W ork Sev eral solutions for the IP lo okup problem (i.e., fnding the longest matc hing prefx) ha v e b een prop osed. Let LM P ( d ) b e the longest matc hing-prefx for address d 1.3.1 Linear List In this data structure, the rules of the rule table are stored as a linear list ([39]) L The LM P ( d ) is determined b y examining the prefxes in L from left to righ t; for eac h prefx, w e determine whether or not that prefx matc hes d ; and from the set of matc hing prefxes, the one with longest length is selected. T o insert a rule q w e frst searc h the list L from left to righ t to ensure that L do esn't already ha v e a rule with the same flter as do es q Ha ving v erifed this, the new rule q is added to the end of the list. Deletion is similar. The time for eac h of the op erations to determine

PAGE 30

16 LM P ( d ), insert a rule, and delete a rule is O ( n ), where n is the n um b er of rules in L The memory required is also O ( n ). Note that this data structure ma y b e used regardless of the form of the flter (i.e., ranges, Bo olean expressions, etc.) and regardless of the tie break er in use. The time and memory complexities are unc hanged. 1.3.2 End-P oin t Arra y Lampson, Sriniv asan, and V arghese [44] prop osed a data structure in whic h the end p oin ts of the ranges defned b y the prefxes are stored in ascending order in an arra y The LM P ( d ) is found b y p erformaing a binary searc h on this ordered arra y of end p oin ts. Although Lampson et al. [44 ] pro vide w a ys to reduce the complexit y of the searc h for the LMP b y a constan t factor, these metho ds do not result in sc hemes that p ermit prefx insertion and deletion in O (log n ) time. It should b e noted that the end-p oin t arra y ma y b e used ev en when ties are brok en b y selecting the frst matc hing rule or the highest-priorit y matc hing rule. F urther, the metho d applies to the case when the flters are arbitrary ranges rather than simply prefxes. The complexit y of the prepro cessing step (i.e., creation of the arra y of ordered end-p oin ts) and the searc h for the rule to use is unc hanged. F urther, the memory requiremen ts are the same, O ( n ) for an n -rule table, regardless of the tie break er and whether the flters are prefxes or general ranges. 1.3.3 Sets of Equal-Length Prefxes W aldv ogel et al. [87] prop osed a data structure to determine LM P ( d ) b y p erforming a binary searc h on prefx length. In this data structure, the prefxes in the router table T are partitioned in to the sets S 0 S 1 ::: suc h that S i con tains all prefxes of T whose length is i F or simplicit y w e assume that T con tains the default prefx *. So, S 0 = fg Next, eac h S i is augmen ted with mark ers that represen t prefxes in S j suc h that j > i and i is on the binary searc h path to S j F or example, supp ose that the length of the longest prefx of T is 32 and that the length of LM P ( d ) is 22.

PAGE 31

17 T o fnd LM P ( d ) b y a binary searc h on length, w e will frst searc h S 16 for an en try that matc hes the frst 16 bits of d This searc h 6 will need to b e successful for us to pro ceed to a larger length. The next searc h will b e in S 24 This searc h will need to fail. Then, w e will searc h S 20 follo w ed b y S 22 So, the path follo w ed b y a binary searc h on length to get to S 22 is S 16 S 24 S 20 and S 22 F or this to b e follo w ed, the searc hes in S 16 S 20 and S 22 m ust succeed while that in S 24 m ust fail. Since the length of LM P ( d ) is 22, T has no matc hing prefx whose length is more than 22. So, the searc h in S 24 is guaran teed to fail. Similarly the searc h in S 22 is guaran teed to succeed. Ho w ev er, the searc hes in S 16 and S 20 will succeed i T has matc hing prefxes of length 16 and 20. T o ensure success, ev ery length 22 prefx P places a marker in S 16 and S 20 the mark er in S 16 is the frst 16 bits of P and that in S 20 is the frst 20 bits in P Note that a mark er M is placed in S i only if S i do esn't con tain a prefx equal to M Notice also that for eac h i the binary searc h path to S i has O (log l max ) = O (log W ), where l max is the length of the longest prefx in T S j s on it. So, eac h prefx creates O (log W ) mark ers. With eac h mark er M in S i w e record the longest prefx of T that matc hes M (the length of this longest matc hing-prefx is neccessarily smaller than i ). T o determine LM P ( d ), w e b egin b y setting l ef tE nd = 0 and r ig htE nd = l max The rep etitiv e step of the binary searc h requires us to searc h for an en try in S m where m = b ( l ef tE nd + r ig htE nd ) = 2 c that equals the frst m bits of d If S m do es not ha v e suc h an en try set r ig htE nd = m )Tj/T1_0 11.9552 Tf12.1094 0 Td(1. Otherwise, if the matc hing en try is the prefx P P b ecomes the longest matc hing-prefx found so far. If the matc hing en try is the mark er M the prefx recorded with M is the longest matc hing-prefx 6 When searc hing S i only the frst i bits of d are used, b ecause all prefxes in S i ha v e exactly i bits.

PAGE 32

18 found so far. In either case, set l ef tE nd = m + 1. The binary searc h terminates when l ef tE nd > r ig htE nd One ma y easily establish the correctness of the describ ed binary searc h. Since, eac h prefx creates O (log W ) mark ers, the memory requiremen t of the sc heme is O ( n log W ). When eac h set S i is represen ted as a hash table, the data structure is called SELPH (sets of equal length prefxes using hash tables). The exp ected time to fnd LM P ( d ) is O (log W ) when the router table is represen ted as an SELPH. When inserting a prefx, O (log W ) mark ers m ust also b e inserted. With eac h mark er, w e m ust record a longest-matc hing prefx. The exp ected time to fnd these longest matc hing-prefxes is O (log 2 W ). In addition, w e ma y need to up date the longestmatc hing prefx information stored with the O ( n log W ) mark ers at lengths greater than the length of the newly inserted prefx. This tak es O ( n log 2 W ) time. So, the exp ected insert time is O ( n log 2 W ). When deleting a prefx P w e m ust searc h all hash tables for mark ers M that ha v e P recorded with them and then up date the recorded prefx for eac h of these mark ers. F or hash tables with a b ounded loading densit y the exp ected time for a delete (including mark er-prefx up dates) is O ( n log 2 W ). W aldv ogel et al. [87 ] ha v e sho wn that b y inserting the prefxes in ascending order of length, an n -prefx SELPH ma y b e constructed in O ( n log 2 W ) time. When eac h set is represen ted as a balanced searc h tree, the data structure is called SELPT. In an SELPT, the time to fnd LM P ( d ) is O (log n log W ); the insert time is O ( n log n log 2 W ); the delete time is O ( n log n log 2 W ); and the time to construct the data structure for n prefxes is O ( W + n log n log 2 W ). In the full v ersion of [87 ], W aldv ogel et al. sho w that b y using a tec hnique called mark er partitioning, the SELPH data structure ma y b e mo difed to ha v e a searc h time of O ( + log W ) and an insert/delete time of O ( p nW log W ), for an y > 1. Because of the excessiv e insert and delete times, the sets of equal-length prefxes data structure is suitable only for static router tables. By using the prefx expansion

PAGE 33

19 metho d [22, 82], w e can limit the n um b er of distinct lengths in the prefx set and so reduce the run time b y a constan t factor [87]. 1.3.4 T ries IP lo okup in the BSD k ernel is done using the P atricia data structure [78], whic h is a v arian t of a compressed binary trie [39]. This sc heme requires O ( W ) memory accesses p er lo okup, insert, and delete. W e note that the lo okup complexit y of longest prefx matc hing algorithms is generally measured b y the n um b er of accesses made to main memory (equiv alen tly the n um b er of cac he misses). Dynamic prefx tries, whic h are an extension of P atricia, and whic h also tak e O ( W ) memory accesses for lo okup, w ere prop osed b y Do eringer et al. [23 ]. F or IPv4 prefx sets, Degermark et al. [22] prop osed the use of a three-lev el trie in whic h the strides are 16, 8, and 8. They prop ose enco ding the no des in this trie using bit v ectors to reduce memory requiremen ts. The resulting data structure requires at most 12 memory accesses. Ho w ev er, inserts and deletes are quite exp ensiv e. F or example, the insertion of the prefx 1* c hanges up to 2 15 en tries in the trie's ro ot no de. All of these c hanges ma y prop ogate in to the compacted storage sc heme of [22]. The m ultibit trie data structures of Sriniv asan and V arghese [82] are, p erhaps, the most rexible and eectiv e trie structure for IP lo okup. Using a tec hnique called con trolled prefx expansion, whic h is v ery similar to the tec hnique used in [22 ], tries of a predetermined heigh t (and hence with a predetermined n um b er of memory accesses p er lo okup) ma y b e constructed for an y prefx set. Sriniv asan and V arghese [82] dev elop dynamic programming algorithms to obtain space optimal fxed-stride tries (FSTs) and v ariable-stride tries (VSTs) of a giv en heigh t. Lampson et al. [44] prop osed the use of h ybrid data structures comprised of a stride-16 ro ot and an auxiliary data structure for eac h of the subtries of the stride-16 ro ot. This auxiliary data structure could b e the end-p oin t arra y (since eac h subtrie is exp ected to con tain only a small n um b er of prefxes, the n um b er of end p oin ts in

PAGE 34

20 eac h end-p oin t arra y is also exp ected to b e quite small). An alternativ e auxiliary data structure suggested b y Lampson et al. [44] is a 6-w a y searc h tree for IPv4 router tables. In the case of these 6-w a y trees, the k eys are the remaining up to 16 bits of the prefx (recall that the stride-16 ro ot consumes the frst 16 bits of a prefx). F or IPv6 prefxes, a m ulticolumn sc heme is suggested [44]. None of these prop osed structures is suitable for a dynamic table. Nilsson and Karlsson [57] prop ose a greedy heuristic to construct optimal VSTs. They call the resulting VSTs LC-tries (lev el-compressed tries). An LC-tries obtained from a 1-bit trie b y replacing full subtries of the 1-bit trie b y single m ultibit no des. This replacemen t is done b y examining the 1-bit trie top to b ottom (i.e., from ro ot to lea v es). 1.3.5 Binary Searc h T rees Suri et al. [84] prop osed a B-tree data structure for dynamic router tables. Using their structure, w e ma y fnd the longest matc hing prefx in O (log n ) time. Ho w ev er, inserts/deletes tak e O ( W log n ) time. The n um b er of cac he misses is O (log n ) for eac h op eration. When W bits ft in O (1) w ords (as is the case for IPv4 and IPv6 prefxes) logical op erations on W -bit v ectors can b e done in O (1) time eac h. In this case, the sc heme of [84] tak es O (log W log n ) time for an insert and O ( W + log n ) = O ( W ) time for a delete. Sev eral researc hers ([16 25 26 36, 74], for example), ha v e in v estigated router table data structures that accoun t for bias in access patterns. Gupta, Prabhak ar, and Bo yd [36 ], for example, prop ose the use of ranges. They assume that access frequencies for the ranges are kno wn, and they construct a b ounded-heigh t binary searc h tree of ranges. This binary searc h tree accoun ts for the kno wn range access frequencies to obtain near-optimal IP lo okup. Although the sc heme of [36 ] p erforms IP lo okup in near-optimal time, c hanges in the access frequencies, or the insertion

PAGE 35

21 or remo v al of a prefx require us to reconstruct the data structure, a task that tak es O ( n log n ) time. Ergun et al. [25 26 ] use ranges to dev elop a biased skip list structure that p erforms longest prefx-matc hing in O (log n ) exp ected time. Their sc heme is designed to giv e go o d exp ected p erformance for burst y 7 access patterns". The biased skip list sc heme of Ergun et al. [25, 26 ] p ermits inserts and deletes in O (log n ) time only in the sev erely restricted and impractical situation when all prefxes in the router table are of the same length. F or the more general, and practical, case when the router table comprises prefxes of dieren t length, their sc heme tak es O ( n ) exp ected time for eac h insert and delete. 1.3.6 Others Cheung and McCanne [16] dev elop ed \a mo del for table-driv en route lo okup and cast the table design problem as an optimization problem within this mo del." Their mo del accoun ts for the memory hierarc h y of mo dern computers and they optimize a v erage p erformance rather than w orst-case p erformance. Gupta and McKeo wn [33 ] examine the asymptotic complexit y of a related problem, pac k et classifcation. They dev elop t w o data structures, heap-on-trie (HoT) and binary-searc h-tree-on-trie (BoT), for the dynamic pac k et classifcation problem. The complexit y of these data structures (for pac k et classifcation and the insertion and deletion of rules) also is dep enden t on W F or d -dimensional rules, a searc h in a HoT tak es O ( W d ) and an up date (insert or delete) tak es O ( W d log n ) time. The corresp onding times for a BoT are O ( W d log n ) and O ( W d )Tj/T1_2 7.9701 Tf6.5992 0 Td(1 log n ), resp ectiv ely 7 In a bursty access pattern the n um b er of dieren t destination addresses in an y subsequence of q pac k ets is << q That is, if the destination of the curren t pac k et is d there is a high probabilit y that d is also the destination for one or more of the next few pac k ets. The fact that In ternet pac k ets tend to b e burst y has b een noted in [18, 46], for example.

PAGE 36

22 Hardw are solutions that in v olv e the use of con ten t addressable memory [50] as w ell as solutions that in v olv e mo difcations to the In ternet Proto col (i.e., the addition of information to eac h pac k et) ha v e also b een prop osed [10, 13, 56 ]. 1.4 Dissertation Outline The remainder of the dissertation is organized as follo ws. Chapters 2 and 3 concen trate on t w o data structures for static router table, in whic h the rule set do es not v ary in time. In Chapter 2, w e dev elop new dynamic programming form ulations for the construction of space optimal tries of a predetermined heigh t. In Chapter 3, w e dev elop an algorithm that minimizes storage requiremen t for collection of hash table optimization problem. Also, w e prop ose impro v emen ts to the heuristic of [80]. Chapters 4 and 5 pro vide data structures for dynamic router tables, in whic h rules are added/deleted with some frequency In Chapter 4, w e sho w ho w to use the range enco ding idea of [44] so that longest prefx matc hing as w ell as prefx insertion and deletion can b e done in O (log n ) time. Chapter 5 presen ts the managemen t of router tables for a dynamic en vironmen t (i.e., searc h, insert, and deletes are p erformed dynamically) in whic h the access pattern is burst y .

PAGE 37

CHAPTER 2 MUL TIBIT TRIES In this c hapter, w e fo cus on the con trolled expansion tec hnique of Sriniv asan and V arghese [82 ]. In particular, w e dev elop new dynamic programming form ulations for the construction of space optimal tries of a predetermined heigh t. While the asymptotic complexities of the algorithms that result from our form ulations are the same as those for the corresp onding algorithms of [82], exp erimen ts using real IPv4 routing table data indicate that our algorithms run considerably faster. Our fxedstride trie algorithm is 2 to 4 times as fast on a SUN w orkstation and 1.5 to 3 times as fast on a P en tium 4 PC. On a SUN w orkstation, our v ariable-stride trie algorithm is b et w een 2 and 17 times as fast as the corresp onding algorithm of [82]; on a P en tium 4 PC, our algorithm is b et w een 3 and 47 times as fast. In Section 2.1, w e describ e the data structure for 1-bit tries. W e dev elop our new dynamic programming form ulations for b oth fxed-stride and v ariable-stride tries in Section 2.2 and 2.3, resp ectiv ely In Section 2.4, w e presen t our exp erimen tal results. 2.1 1-Bit T ries A 1-bit trie is a tree-lik e structure in whic h eac h no de has a left c hild, left data, righ t c hild, and righ t data feld. No des at lev el l )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 of the trie store prefxes whose length is l (the length of a prefx is the n um b er of bits in that prefx; the terminating (if presen t) do es not coun t to w ards the prefx length). If the righ tmost bit in a prefx whose length is l is 0, the prefx is stored in the left data feld of a no de that is at lev el l )Tj/T1_0 11.9552 Tf11.0293 0 Td(1; otherwise, the prefx is stored in the righ t data feld of a no de that is at lev el l )Tj/T1_0 11.9552 Tf10.1894 0 Td(1. A t lev el i of a trie, branc hing is done b y examining bit i (bits are n um b ered from left to righ t b eginning with the n um b er 0, and lev els are n um b ered with the ro ot b eing at lev el 0) of a prefx or destination address. When bit i is 0, w e mo v e in to the 23

PAGE 38

24 left subtree; when the bit is 1, w e mo v e in to the righ t subtree. Figure 2{1(a) giv es the prefxes in the 8-prefx example of [82 ], and Figure 2{1(b) sho ws the corresp onding 1-bit trie. The prefxes in Figure 2{1(a) are n um b ered and ordered as in [82]. Since the trie of Figure 2{1(b) has a heigh t of 6, a searc h in to this trie ma y mak e up to 7 memory accesses. The total memory required for the 1-bit trie of Figure 2{1(b) is 20 units (eac h no de requires 2 units, one for eac h pair of (c hild, data) felds). The 1-bit tries describ ed here are an extension of the 1-bit tries describ ed in [39]. The primary dierence b eing that the 1-bit tries of [39 ] are for the case when all k eys (prefxes) ha v e the same length. Original prefixes P5=0* P1=10* P2=111* P3=11001* P4=1* P6=1000* P7=100000* P8=1000000* (a) 8-prefx example of [82 ] N0 N1 N21 N22 P5 P4 P1 P2 P3 P6 P7 P8 N31 N32 N41 N42 N5 N6 (b) Corresp onding 1-bit trie Figure 2{1: Prefxes and corresp onding 1-bit trie When 1-bit tries are used to represen t IPv4 router tables, the trie heigh t ma y b e as m uc h as 31. A lo okup in suc h a trie tak es up to 32 memory accesses. T able 2{1 giv es the c haracteristics of fv e IPv4 bac kb one router prefx sets, and T able 2{2 giv es a more detailed c haracterization of the prefxes in the largest of these fv e databases, P aix [53]. F or our fv e databases, the n um b er of no des in a 1-bit trie is b et w een 2 n and 3 n where n is the n um b er of prefxes in the database (T able 2{1).

PAGE 39

25 T able 2{1: Prefx databases obtained from IPMA pro ject on Sep 13, 2000 Database Prefxes 16-bit prefxes 24-bit prefxes No des* P aix 85,682 6,606 49,756 173,012 Pb 35,151 2,684 19,444 91,718 MaeW est 30,599 2,500 16,260 81,104 Aads 26,970 2,236 14,468 74,290 MaeEast 22,630 1,810 11,386 65,862 The last column sho ws the n um b er of no des in the 1-bit trie represen tation of the prefx database. Note: the n um b er of prefxes stored at lev el i of a 1-bit trie equals the n um b er of prefxes whose length is i + 1. T able 2{2: Distributions of the prefxes and no des in the 1-bit trie for P aix Lev el Num b er of Num b er of Lev el Num b er of Num b er of prefxes no des prefxes no des 0 0 1 16 918 5,117 1 0 2 17 1,787 8,245 2 0 4 18 5,862 12,634 3 0 7 19 3,614 15,504 4 0 11 20 3,750 20,557 5 0 20 21 5,525 26,811 6 0 36 22 7,217 32,476 7 22 62 23 49,756 37,467 8 4 93 24 12 54 9 5 169 25 26 44 10 9 303 26 12 20 11 26 561 27 5 9 12 56 1,037 28 4 5 13 176 1,933 29 1 2 14 288 3,552 30 0 1 15 6,606 6,274 31 1 1

PAGE 40

26 2.2 Fixed-Stride T ries 2.2.1 Defnition Sriniv asan and V arghese [82 ] ha v e prop osed the use of fxed-stride tries to enable fast iden tifcation of the longest matc hing prefx in a router table. The stride of a no de is defned to b e the n um b er of bits used at that no de to determine whic h branc h to tak e. A no de whose stride is s has 2 s c hild felds (corresp onding to the 2 s p ossible v alues for the s bits that are used) and 2 s data felds. Suc h a no de requires 2 s memory units. In a fxe d-stride trie (FST), all no des at the same lev el ha v e the same stride; no des at dieren t lev els ma y ha v e dieren t strides. Supp ose w e wish to represen t the prefxes of Figure 2{1(a) using an FST that has three lev els. Assume that the strides are 2, 3, and 2. The ro ot of the trie stores prefxes whose length is 2; the lev el one no des store prefxes whose length is 5 (2 + 3); and lev el three no des store prefxes whose length is 7 (2 + 3 + 2). This p oses a problem for the prefxes of our example, b ecause the length of some of these prefxes is dieren t from the storeable lengths. F or instance, the length of P5 is 1. T o get around this problem, a prefx with a nonp ermissible length is expanded to the next p ermissible length. F or example, P5 = 0* is expanded to P5a = 00* and P5b = 01*. If one of the newly created prefxes is a duplicate, natural dominance rules are used to eliminate all but one o ccurrence of the prefx. F or instance, P4 = 1* is expanded to P4a = 10* and P4b = 11*. Ho w ev er, P1 = 10* is to b e c hosen o v er P4a = 10*, b ecause P1 is a longer matc h than P4. So, P4a is eliminated. Because of the elimination of duplicate prefxes from the expanded prefx set, all prefxes are distinct. Figure 2{2(a) sho ws the prefxes that result when w e expand the prefxes of Figure 2{1 to lengths 2, 5, and 7. Figure 2{2(b) sho ws the corresp onding FST whose heigh t is 2 and whose strides are 2, 3, and 2. Since the trie of Figure 2{2(b) can b e searc hed with at most 3 memory references, it represen ts a time p erformance impro v emen t o v er the 1-bit trie of Figure 2{1(b),

PAGE 41

27 (3 levels) 00* (P5a) 01* (P5b) 10* (P1) 11* (P4) 11100* (P2a) 11001* (P3) 10000* (P6a) 11101* (P2b) 11110* (P2c) 11111* (P2d) 10001* (P6b) 1000001* (P7) 1000000* (P8) Expanded prefixes (a) Expanded prefxes 00 01 10 11 100 110 111 101 001 010 011 000 100 110 111 101 001 010 011 000 00 01 10 11 P5 P3 P1 P4 P6 P6 P8 P7 P5 P2 P2 P2 P2 (b) Corresp onding fxed-stride trie Figure 2{2: Prefx expansion and fxed-stride trie whic h requires up to 7 memory references to p erform a searc h. Ho w ev er, the space requiremen ts of the FST of Figure 2{2(b) are more than that of the corresp onding 1-bit trie. F or the ro ot of the FST, w e need 8 felds or 4 units; the t w o lev el 1 no des require 8 units eac h; and the lev el 3 no de requires 4 units. The total is 24 memory units. W e ma y represen t the prefxes of Figure 2{1(a) using a one-lev el trie whose ro ot has a stride of 7. Using suc h a trie, searc hes could b e p erformed making a single memory access. Ho w ev er, the one-lev el trie w ould require 2 7 = 128 memory units. 2.2.2 Construction of Optimal Fixed-Stride T ries In the fxe d-stride trie optimization (FSTO) problem, w e are giv en a set P of prefxes and an in teger k W e are to select the strides for a k -lev el FST in suc h a manner that the k -lev el FST for the giv en prefxes uses the smallest amoun t of memory .

PAGE 42

28 F or some P a k -lev el FST ma y actually require more space than a ( k )Tj/T1_0 11.9552 Tf11.7493 0 Td(1)-lev el FST. F or example, when P = f 00*, 01*, 10*, 11* g the unique 1-lev el FST for P requires 4 memory units while the unique 2-lev el FST (whic h is actually the 1-bit trie for P ) requires 6 memory units. Since the searc h time for a ( k )Tj/T1_0 11.9552 Tf12.3494 0 Td(1)-lev el FST is less than that for a k -lev el FST, w e w ould actually prefer ( k )Tj/T1_0 11.9552 Tf12.1094 0 Td(1)-lev el FSTs that tak e less (or ev en equal) memory o v er k -lev el FSTs. Therefore, in practice, w e are really in terested in determining the b est FST that uses at most k lev els (rather than exactly k lev els). The mo dife d FSTO problem (MFSTO) is to determine the b est FST that uses at most k lev els for the giv en prefx set P Let O b e the 1-bit trie for the giv en set of prefxes, and let F b e an y k -lev el FST for this prefx set. Let s 0 ::: s k )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 b e the strides for F W e shall sa y that lev el 0 of F co v ers lev els 0 ; :::; s 0 )Tj/T1_0 11.9552 Tf11.6293 0 Td(1 of O and that lev el j 0 < j < k of F co v ers lev els a; :::; b of O where a = P j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 q =0 s q and b = P j q =0 s q )Tj/T1_0 11.9552 Tf12.1094 0 Td(1. So, lev el 0 of the FST of Figure 2{2(b) co v ers lev els 0 and 1 of the 1-bit trie of Figure 2{1(b). Lev el 1 of this FST co v ers lev els 2, 3, and 4 of the 1-bit trie of Figure 2{1(b); and lev el 2 of this FST co v ers lev els 5 and 6 of the 1-bit trie. W e shall refer to lev els e u = P u )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 q =0 s q 0 u < k as the exp ansion levels of O The expansion lev els defned b y the FST of Figure 2{2(b) are 0, 2, and 5. Let nodes ( i ) b e the n um b er of no des at lev el i of the 1-bit trie O F or the 1-bit trie of Figure 2{1(a), nodes (0 : 6) = [1 ; 1 ; 2 ; 2 ; 2 ; 1 ; 1]. The memory required b y F is P k )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 q =0 nodes ( e q ) 2 s q F or example, the memory required b y the FST of Figure 2{2(b) is nodes (0) 2 2 + nodes (2) 2 3 + nodes (5) 2 2 = 24. Let T ( j; r ), r j + 1, b e the cost (i.e., memory requiremen t) of the b est w a y to co v er lev els 0 through j of O using exactly r expansion lev els. When the maxim um prefx length is W T ( W )Tj/T1_0 11.9552 Tf12.8293 0 Td(1 ; k ) is the cost of the b est k -lev el FST for the giv en set of prefxes. Sriniv asan and V arghese [82 ] ha v e obtained the follo wing dynamic programming recurrence for T :

PAGE 43

29 T ( j; r ) = min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(2 ::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f T ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_2 7.9701 Tf6.5993 0 Td(m g ; r > 1 (2.1) T ( j; 1) = 2 j +1 (2.2) The rationale for Equation 2.1 is that the b est w a y to co v er lev els 0 through j of O using exactly r expansion lev els, r > 1, m ust ha v e its last expansion lev el at lev el m + 1 of O where m m ust b e at least r )Tj/T1_0 11.9552 Tf12.3494 0 Td(2 (as otherwise, w e do not ha v e enough lev els b et w een lev els 0 and m of O to select the remaining r )Tj/T1_0 11.9552 Tf12.4693 0 Td(1 expansion lev els) and at most j )Tj/T1_0 11.9552 Tf11.2694 0 Td(1 (b ecause the last expansion lev el is j ). When the last expansion lev el is lev el m + 1, the stride for this lev el is j )Tj/T1_1 11.9552 Tf11.5094 0 Td(m and the n um b er of no des at this expansion lev el is nodes ( m + 1). F or optimalit y lev els 0 through m of O m ust b e co v ered in the b est p ossible w a y using exactly r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 expansion lev els. As noted b y Sriniv asan and V arghese [82], using the ab o v e recurrence, w e ma y determine T ( W )Tj/T1_0 11.9552 Tf12.4694 0 Td(1 ; k ) in O ( k W 2 ) time (excluding the time needed to compute O from the giv en prefx set and determine nodes ()). The strides for the optimal k -lev el FST can b e obtained in an additional O ( k ) time. Since, Equation 2.1 also ma y b e used to compute T ( W )Tj/T1_0 11.9552 Tf12.4694 0 Td(1 ; q ) for all q k in O ( k W 2 ) time, w e can actually solv e the MFSTO problem in the same asymptotic complexit y as required for the FSTO problem. W e can reduce the time needed to solv e the MFSTO problem b y mo difying the defnition of T The mo difed function is C where C ( j; r ) is the cost of the b est FST that uses at most r expansion lev els. It is easy to see that C ( j; r ) C ( j; r )Tj/T1_0 11.9552 Tf10.5493 0 Td(1), r > 1. A simple dynamic programming recurrence for C is: C ( j; r ) = min m 2f)Tj/T1_4 7.9701 Tf16.4264 0 Td(1 ::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_2 7.9701 Tf6.5992 0 Td(m g ; j 0 ; r > 1 (2.3)

PAGE 44

30 C ( )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 ; r ) = 0 and C ( j; 1) = 2 j +1 ; j 0 (2.4) T o see the correctness of Equations 2.3 and 2.4, note that when j 0, there m ust b e at least one expansion lev el. If r = 1, then there is eactly one expansion lev el and the cost is 2 j +1 If r > 1, the last expansion lev el in the b est FST could b e at an y of the lev els 0 through j Let m + 1 b e this last expansion lev el. The cost of the co v ering is C ( m; r )Tj/T1_0 11.9552 Tf12.3494 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(m When j = )Tj/T1_0 11.9552 Tf9.2294 0 Td(1, no lev els of the 1-bit trie remain to b e co v ered. Therefore, C ( )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 ; r ) = 0. W e ma y obtain an alternativ e recurrence for C ( j; r ) in whic h the range of m on the righ t side is r )Tj/T1_0 11.9552 Tf12.5894 0 Td(2 ::j )Tj/T1_0 11.9552 Tf12.4694 0 Td(1 rather than )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 ::j )Tj/T1_0 11.9552 Tf12.5894 0 Td(1. First, w e obtain the follo wing dynamic programming recurrence for C : C ( j; r ) = min f C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) ; T ( j; r ) g ; r > 1 (2.5) C ( j; 1) = 2 j +1 (2.6) The rationale for Equation 2.5 is that the b est FST that uses at most r expansion lev els either uses at most r )Tj/T1_0 11.9552 Tf12.2293 0 Td(1 lev els or uses exactly r lev els. When at most r )Tj/T1_0 11.9552 Tf12.3494 0 Td(1 lev els are used, the cost is C ( j; r )Tj/T1_0 11.9552 Tf11.6293 0 Td(1), and when exactly r lev els are used, the cost is T ( j; r ), whic h is defned b y Equation 2.1. Let U ( j; r ) b e as defned in Equation 2.7. U ( j; r ) = min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(2 ::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(m g (2.7) F rom Equations 2.1 and 2.5 w e obtain C ( j; r ) = min f C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; U ( j; r ) g (2.8)

PAGE 45

31 T o see the correctness of Equation 2.8, note that for all j and r suc h that r j + 1, T ( j; r ) C ( j; r ). F urthermore, T ( j; r ) = min m 2f r )Tj/T1_5 7.9701 Tf6.5992 0 Td(2 ::j )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 g f T ( m; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(m g min m 2f r )Tj/T1_5 7.9701 Tf6.5992 0 Td(2 ::j )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(m g = U ( j; r ) (2.9) Therefore, when C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) U ( j; r ), Equations 2.5 and 2.8 compute the same v alue for C ( j; r ) (i.e., C ( j; r )Tj/T1_0 11.9552 Tf12.9494 0 Td(1)). When C ( j; r )Tj/T1_0 11.9552 Tf12.9494 0 Td(1) > U ( j; r ), it app ears from Equation 2.9 that Equation 2.8 ma y compute a smaller C ( j; r ) than is computed b y Equation 2.5. Ho w ev er, from Equation 2.3, whic h is equiv alen t to Equation 2.5, the C ( j; r ) computed b y Equations 2.3 and 2.5 satisifes C ( j; r ) = min m 2f)Tj/T1_5 7.9701 Tf16.4264 0 Td(1 ::j )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(m g min m 2f r )Tj/T1_5 7.9701 Tf6.5992 0 Td(2 ::j )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(m g = U ( j; r ) where C ( )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 ; r ) = 0. Ho w ev er, when C ( j; r )Tj/T1_0 11.9552 Tf12.2293 0 Td(1) > U ( j; r ), the C ( j; r ) computed b y Equation 2.8 is U ( j; r ). Therefore, when C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) > U ( j; r ), the C ( j; r ) computed b y Equation 2.8 cannot b e smaller than that computed b y Equation 2.5. Therefore, the C ( j; r )s computed b y Equations 2.5 and 2.8 are equal. In the remainder of this section, w e use Equations 2.3 and 2.4 for C The range for m (in Equation 2.3) ma y b e restricted to a range that is (often) considerably smaller than r )Tj/T1_0 11.9552 Tf12.1094 0 Td(2 ::j )Tj/T1_0 11.9552 Tf12.2293 0 Td(1. T o obtain this narro w er searc h range, w e frst establish a few prop erties of 1-bit tries and their corresp onding optimal FSTs. Lemma 1 F or every 1-bit trie O (a) nodes ( i ) 2 i i 0 and (b) nodes ( i + j ) 2 j nodes ( i ) j 0 i 0 Pro of F ollo ws from the fact that a 1-bit trie is a binary tree.

PAGE 46

32 Let M ( j; r ), r > 1, b e the smallest m that minimizes C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(m ; in Equation 2.3. Lemma 2 8 ( j 0 ; r > 1)[ M ( j + 1 ; r ) M ( j; r )] : Pro of Let M ( j; r ) = a and M ( j + 1 ; r ) = b Supp ose b < a Then, C ( j; r ) = C ( a; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(a < C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b since, otherwise, M ( j; r ) = b F urthermore, C ( j + 1 ; r ) = C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( b + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(b C ( a; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a : Therefore, nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a + nodes ( b + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(b < nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a So, nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b < nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(a Hence, 2 a )Tj/T1_3 7.9701 Tf6.5992 0 Td(b nodes ( b + 1) < nodes ( a + 1) This con tradicts Lemma 1(b). So, b a Lemma 3 8 ( j 0 ; r > 0)[ C ( j; r ) < C ( j + 1 ; r )] :

PAGE 47

33 Pro of The case r = 1 follo ws from C ( j; 1) = 2 j +1 So, assume r > 1. F rom the defnition of M it follo ws that C ( j + 1 ; r ) = C ( b; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( b + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5992 0 Td(b ; where )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 b = M ( j + 1 ; r ) j When b < j w e get C ( j; r ) C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(b < C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( b + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(b = C ( j + 1 ; r ) When b = j C ( j + 1 ; r ) = C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( j + 1) 2 > C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; since nodes ( j + 1) > 0. The next few lemmas use the function whic h is defned as ( j; r ) = C ( j; r )Tj/T1_0 11.9552 Tf-421.797 -24 Td(1) )Tj/T1_2 11.9552 Tf11.8694 0 Td(C ( j; r ). Since, C ( j; r ) C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1), ( j; r ) 0 for all j 0 and all r 2. Lemma 4 8 ( j 0)[( j; 2) ( j + 1 ; 2)] Pro of If C ( j; 2) = C ( j; 1), there is nothing to pro v e as ( j + 1 ; 2) 0. The only other p ossibilit y is C ( j; 2) < C ( j; 1) (i.e., ( j; 2) > 0). In this case, the b est co v er for lev els 0 through j uses exactly 2 expansion lev els. F rom the recurrence for C (Equations 2.3 and 2.4), it follo ws that C ( j; 1) = 2 j +1 and C ( j; 2) = C ( a; 1) + nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a = 2 a +1 )Tj/T1_2 11.9552 Tf11.9894 0 Td(nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(a ; for some a 0 a < j Therefore, ( j; 2) = C ( j; 1) )Tj/T1_2 11.9552 Tf11.8694 0 Td(C ( j; 2) = 2 j +1 )Tj/T1_0 11.9552 Tf11.9894 0 Td(2 a +1 )Tj/T1_2 11.9552 Tf11.8694 0 Td(nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a :

PAGE 48

34 F rom Equations 2.3 and 2.4, it follo ws that C ( j + 1 ; 2) C ( a; 1) + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a = 2 a +1 + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a : Hence, ( j + 1 ; 2) 2 j +2 )Tj/T1_0 11.9552 Tf11.8694 0 Td(2 a +1 )Tj/T1_1 11.9552 Tf11.9894 0 Td(nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a : Therefore, ( j + 1 ; 2) )Tj/T1_0 11.9552 Tf11.9894 0 Td(( j; 2) 2 j +2 )Tj/T1_0 11.9552 Tf11.8694 0 Td(2 a +1 )Tj/T1_1 11.9552 Tf11.9894 0 Td(nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a )Tj/T1_0 11.9552 Tf9.2294 0 Td(2 j +1 + 2 a +1 + nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a = 2 j +1 )Tj/T1_1 11.9552 Tf11.8694 0 Td(nodes ( a + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a 2 j +1 )Tj/T1_0 11.9552 Tf11.8694 0 Td(2 a +1 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(a (Lemma 1(a)) = 0 Lemma 5 8 ( j 0 ; k > 2)[( j; k )Tj/T1_0 11.9552 Tf9.5894 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf9.5894 0 Td(1)] = ) 8 ( j 0 ; k > 2)[( j; k ) ( j + 1 ; k )] : Pro of Assume that 8 ( j 0 ; k > 2)[( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)]. W e shall sho w that 8 ( j 0 ; k > 2)[( j; k ) ( j + 1 ; k )]. Let M ( j; k ) = b and M ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.5094 0 Td(1) = c Case 1: c b .

PAGE 49

35 ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(C ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b )Tj/T1_1 11.9552 Tf9.2294 0 Td(C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(b = ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) : Also, ( j + 1 ; k ) = C ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(C ( j + 1 ; k ) C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + nodes ( c + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(c )Tj/T1_1 11.9552 Tf9.2294 0 Td(C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(nodes ( c + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(c = ( c; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) : Since c b ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1). Therefore, ( j + 1 ; k ) ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( j; k ) : Case 2: c < b Let M ( j + 1 ; k ) = a M ( j; k ) = b M ( j + 1 ; k )Tj/T1_0 11.9552 Tf12.4694 0 Td(1) = c and M ( j; k )Tj/T1_0 11.9552 Tf12.4694 0 Td(1) = d F rom Lemma 2, a b and c d Since c < b a b > c d Also, ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j; k ) = [ C ( d; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + nodes ( d + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(d ] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b ]

PAGE 50

36 and ( j + 1 ; k ) = C ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j + 1 ; k ) = [ C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + nodes ( c + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5992 0 Td(c ] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a ] : Therefore, ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf11.9894 0 Td(( j; k ) = [ C ( c; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + nodes ( c + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(c ] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( d; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + nodes ( d + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(d ] +[ C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(b ] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a ] : (2.10) Since j > b > c d = M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1), C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + nodes ( c + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(c C ( d; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + nodes ( d + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(d (2.11) F urthermore, since M ( j + 1 ; k ) = a b C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( b + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5992 0 Td(b C ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( a + 1) 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(a (2.12) Substituting Equations 2.11 and 2.12 in to Equation 2.10, w e get ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf11.8694 0 Td(( j; k ) nodes ( c + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(c )Tj/T1_1 11.9552 Tf11.9894 0 Td(nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(b : Lemma 1 and c < b imply nodes ( c + 1) 2 b )Tj/T1_3 7.9701 Tf6.5992 0 Td(c nodes ( b + 1). Therefore, nodes ( c + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(c nodes ( b + 1) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(b : So, ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf11.9894 0 Td(( j; k ) 0.

PAGE 51

37 Lemma 6 8 ( j 0 ; k 2)[( j; k ) ( j + 1 ; k )] : Pro of F ollo ws from Lemmas 4 and 5. Lemma 7 L et k > 2 8 ( j 0)[( j; k )Tj/T1_0 11.9552 Tf10.7894 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf10.7894 0 Td(1)] = ) 8 ( j 0)[ M ( j; k ) M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)] : Pro of Assume that 8 ( j 0)[( j; k )Tj/T1_0 11.9552 Tf11.3894 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.5094 0 Td(1)]. Supp ose that M ( j; k )Tj/T1_0 11.9552 Tf-421.729 -24 Td(1) = a M ( j; k ) = b and b < a for some j j 0. F rom Equation 2.3, w e get C ( j; k ) = C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( b + 1) 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(b C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + nodes ( a + 1) 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(a and C ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) = C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + nodes ( a + 1) 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(a < C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + nodes ( b + 1) 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(b : Hence, C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + C ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) < C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) : Therefore, ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) < ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) : Ho w ev er, b < a and 8 ( j 0)[( j; k )Tj/T1_0 11.9552 Tf11.7494 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.7494 0 Td(1)] imply that ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) ( a; k )Tj/T1_0 11.9552 Tf12.4694 0 Td(1). Since our assumption that b < a leads to a con tradiction, it m ust b e that there is no j 0 for whic h M ( j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) = a M ( j; k ) = b and b < a Lemma 8 8 ( j 0 ; k > 2)[ M ( j; k ) M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)] : Pro of F ollo ws from Lemmas 6 and 7. Theorem 1 8 ( j 0 ; k > 2)[ M ( j; k ) max f M ( j )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; k ) ; M ( j; k )Tj/T1_0 11.9552 Tf11.8693 0 Td(1) g ] : Pro of F ollo ws from Lemmas 2 and 8.

PAGE 52

38 Algorithm FixedStrides( W k ) // W is length of longest prefx. // k is maxim um n um b er of expansion lev els desired. // Return C ( W )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; k ) and compute M ( ; ) : f for ( j = 0; j < W ; j + +) f C ( j; 1) := 2 j +1 ; M ( j; 1) := )Tj/T1_0 11.9552 Tf9.2294 0 Td(1; g for ( r = 1; r < k ; r + +) C ( )Tj/T1_0 11.9552 Tf9.2294 0 Td(1 ; r ) := 0; for ( r = 2; r k ; r + +) for ( j = r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1; j < W ; j + +) f // Compute C ( j; r ). minJ := max ( M ( j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 ; r ) ; M ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)); minC ost := C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1); minL := M ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1); for ( m = minJ ; m < j ; m + +) f cost := C ( m; j )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + nodes ( m + 1) 2 j )Tj/T1_4 7.9701 Tf6.5993 0 Td(m ; if ( cost < minC ost ) then f minC ost := cost ; minL := m ; gg C ( j; r ) := minC ost ; M ( j; r ) := minL ; g return C ( W )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; k ); g Figure 2{3: Algorithm for fxed-stride tries. Note 1 F r om L emma 6, it fol lows that whenever ( j; k ) > 0 ( q ; k ) > 0 8 q > j Theorem 1 leads to Algorithm Fixe dStrides (Figure 2{3), whic h computes C ( W )Tj/T1_0 11.9552 Tf-421.707 -24 Td(1 ; k ). The complexit y of this algorithm is O ( k W 2 ). Using the computed M v alues, the strides for the OFST that uses at most k expansion lev els ma y b e determined in an additional O ( k ) time. Although our algorithm has the same asymptotic complexit y as do es the algorithm of Sriniv asan and V arghese [82 ], exp erimen ts conducted b y us using real prefx sets indicate that our algorithm runs faster. 2.3 V ariable-Stride T ries 2.3.1 Defnition and Construction In a variable-stride trie (VST) [82], no des at the same lev el ma y ha v e dieren t strides. Figure 2{4 sho ws a t w o-lev el VST for the 1-bit trie of Figure 2{1. The stride

PAGE 53

39 00 01 10 11 100 110 111 101 001 010 011 000 00001 00010 00000 00100 00110 00111 00101 00011 P5 P3 P1 P4 P8 P7 P5 P2 P2 P2 P2 )Tj0 -25.8159 TD()Tj0 -10.7566 TD()Tj-163.501 -79.599 Td(11100 11101 11110 11111 )Tj0 12.9079 TD()Tj0 12.908 TD()TjT*()Tj-2.1513 25.8159 Td(P6 P6 P6 P6 P6 P6 ... ... Figure 2{4: Tw o-lev el VST for prefxes of Figure 2{1(a) for the ro ot is 2; that for the left c hild of the ro ot is 5; and that for the ro ot's righ t c hild is 3. The memory requiremen t of this VBT is 4 (ro ot) + 32 (left c hild of ro ot) + 8 (righ t c hild of ro ot) = 44. Since FSTs are a sp ecial case of VSTs, the memory required b y the b est VST for a giv en prefx set P and n um b er of expansion lev els k is less than or equal to that required b y the b est FST for P and k Despite this, FSTs ma y b e preferred in certain router applications \b ecause of their simplicit y and sligh tly faster searc h time" [82]. Let r -VST b e a VST that has at most r lev els. Let O pt ( N ; r ) b e the cost (i.e., memory requiremen t) of the b est r -VST for a 1-bit trie whose ro ot is N Sriniv asan and V arghese [82] ha v e obtained the follo wing dynamic programming recurrence for O pt ( N ; r ). O pt ( N ; r ) = min s 2f 1 :: 1+ heig ht ( N ) g f 2 s + X Q 2 D s ( N ) O pt ( Q; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) g ; r > 1 (2.13)

PAGE 54

40 where D s ( N ) is the set of all descenden ts of N that are at lev el s of N F or example, D 1 ( N ) is the set of c hildren of N and D 2 ( N ) is the set of grandc hildren of N heig ht ( N ) is the maxim um lev el at whic h the trie ro oted at N has a no de. F or example, in Figure 2{1(b), the heigh t of the trie ro oted at N1 is 5. When r = 1, O pt ( N ; 1) = 2 1+ heig ht ( N ) (2.14) Equation 2.14 is equiv alen t to Equation 2.2; the cost of co v ering all lev els of N using at most one expansion lev el is 2 1+ heig ht ( N ) When more than one expansion lev el is p ermissible, the stride of the frst expansion lev el ma y b e an y n um b er s that is b et w een 1 and 1 + heig ht ( N ). F or an y suc h selection of s the next expansion lev el is lev el s of the 1-bit trie whose ro ot is N The sum in Equation 2.13 giv es the cost of the b est w a y to co v er all subtrees whose ro ots are at this next expansion lev el. Eac h suc h subtree is co v ered using at most r )Tj/T1_0 11.9552 Tf11.3894 0 Td(1 expansion lev els. It is easy to see that O pt ( R ; k ), where R is the ro ot of the o v erall 1-bit trie for the giv en prefx set P is the cost of the b est k -VST for P Sriniv asan and V arghese [82] describ e a w a y to determine O pt ( R ; k ) using Equations 2.13 and 2.14. Although Sriniv asan and V arghese state that the complexit y of their algorithm is O ( nW 2 k ), where n is the n um b er of prefxes in P and W is the length of the longest prefx, a close examination rev eals that the complexit y is O ( pW k ), where p is the n um b er of no des in the 1-bit trie. Since p = O ( n ) for realistic router prefx sets, the complexit y of their algorithm is O ( nW k ) on realistic router prefx sets. W e dev elop an alternativ e dynamic programming form ulation that also p ermits the computation of O pt ( R ; k ) in O ( pW k ) time. Ho w ev er, the resulting algorithm is considerably faster. Let O pt ( N ; s; r ) = X Q 2 D s ( N ) O pt ( Q; r ) ; s > 0 ; r > 1 ; and let O pt ( N ; 0 ; r ) = O pt ( N ; r ). F rom Equations 2.13 and 2.14, w e obtain:

PAGE 55

41 O pt ( N ; 0 ; r ) = min s 2f 1 :: 1+ heig ht ( N ) g f 2 s + O pt ( N ; s; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) g ; r > 1 (2.15) and O pt ( N ; 0 ; 1) = 2 1+ heig ht ( N ) : (2.16) F or s > 0 and r > 1, w e get O pt ( N ; s; r ) = X Q 2 D s ( N ) O pt ( Q; r ) = O pt ( Lef tC hil d ( N ) ; s )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 ; r ) + O pt ( R ig htC hil d ( N ) ; s )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; r ) : (2.17) F or Equation 2.17, w e need the follo wing initial condition: O pt ( nul l ; ; ) = 0 (2.18) The n um b er of O pt ( ; ; ) v alues is O ( pW k ). Eac h O pt ( ; ; ) v alue ma y b e computed in O (1) time using Equations 2.15 through 2.18 pro vided the O pt v alues are computed in p ostorder. Therefore, w e ma y compute O pt ( R ; k ) = O pt ( R ; 0 ; k ) in O ( pW k ) time. Although b oth our algorithm and that of [82] run in O ( pW k ) time, our algorithm is exp ected to do less w ork. W e arriv e at this exp ectation b y p erforming a somewhat crude op eration coun t analysis. In the algorithm of [82], for eac h v alue of r (see Equation 2.13), O pt ( M ; r )Tj/T1_0 11.9552 Tf12.1094 0 Td(1) is used l ev el M times, where l ev el M is the lev el for no de M Adding in 1 unit for the initial storage of O pt ( M ; r )Tj/T1_0 11.9552 Tf12.1094 0 Td(1), w e see that a l ev el M no de con tributes roughly l ev el M + 1 to the total cost of computing O pt ( ; r ). Therefore, a rough op eration coun t for the algorithm of [82] is O pC ountS r ini = k X M ( l ev el M + 1)

PAGE 56

42 where the sum is tak en o v er all no des M of the 1-bit trie. Let heig ht M b e the heigh t of the subtree ro oted at no de M of the 1-bit trie (the heigh t of a subtree that has only a ro ot is 0). Our algorithm computes ( heig ht M + 1) k O pt ( M ; ; ) v alues at no de M Eac h of these v alues is computed using a single addition. So, the op eration coun t for our algorithm is crudely estimated to b e O pC ountO ur = k X M ( heig ht M + 1) F or our fv e databases P aix, Pb, MaeW est, Aads, and MaeEast, the ratios O pC ountS r ini=O pC ountO ur are 6.7, 5.9, 5.7, 5.6, and 5.4. W e can determine the p ossible range for this ratio b y computing the ratio for sk ew ed as w ell as full binary trees. F or a totally sk ew ed 1-bit trie (e.g., a left or righ t sk ew ed trie), the t w o op eration coun t estimates are the same. F or a 1-bit trie that is a full binary tree of heigh t W )Tj/T1_0 11.9552 Tf10.0693 0 Td(1, O pC ountS r ini=k = W )Tj/T1_6 7.9701 Tf6.5992 0 Td(1 X 0 ( p + 1)2 p = ( W )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)2 W + 1 and O pC ountO ur =k = W )Tj/T1_6 7.9701 Tf6.5992 0 Td(1 X 0 ( W )Tj/T1_1 11.9552 Tf11.9894 0 Td(p )2 p = 2 W +1 )Tj/T1_1 11.9552 Tf11.8694 0 Td(W )Tj/T1_0 11.9552 Tf11.9894 0 Td(2 So, O pC ountS r ini=O pC ountO ur ( W )Tj/T1_0 11.9552 Tf12.5894 0 Td(1) = 2. Since sk ew ed and full binary trees represen t t w o extremes for the op eration coun t ratio, the op eration coun t ratio is exp ected to b e b et w een 1 and ( W )Tj/T1_0 11.9552 Tf12.4694 0 Td(1) = 2. F or IPv4, W = 32 and this ratio lies b et w een 1 and 15.5. F or IPv6, W = 128 and this ratio is b et w een 1 and 63.5. Although the n um b er of op erations b eing p erformed is an imp ortan t con tributing factor to the observ ed run time of an algorithm, the n um b er of cac he misses often has signifcan t impact. F or the algorithm of [82 ], w e estimate the n um b er of cac he

PAGE 57

43 misses to b e of the same order as the n um b er of op erations (i.e., O pC ountS r ini ). Because our algorithm is a simple p ostorder tra v ersal that visits eac h no de of the 1-bit trie exactly once, the n um b er of cac he misses for our algorithm is estimated to b e O pC ountO ur =L where L is the smaller of k and the n um b er of O pt ( M ; ; ) v alues that ft in a cac he line. The cac he miss coun t giv es our algorithm another factor of L adv an tage o v er the algorithm of [82 ]. When the cost of op erations dominates the run time, our crude analysis indicates that our algorithm will b e ab out 6 times as fast as that of [82 ] (for our test databases). When cac he miss time dominates the run time, our algorithm could b e 12 times as fast when k = 2 and 42 times as fast when k = 7. Of course, since our analysis do esn't include all of the o v erheads asso ciated with the t w o algorithms, actual sp eedups ma y b e quite dieren t. Our algorithm requires O ( W 2 k ) memory for the O pt ( ; ; ) v alues. T o see this, notice that there can b e at most W + 1 no des N whose O pt ( N ; ; ) v alues m ust b e retained at an y giv en time, and for eac h of these at most W + 1 no des, O ( W k ) O pt ( N ; ; ) v alues m ust b e retained. T o determine the optimal strides, eac h no de of the 1-bit trie m ust store the stride s that minimizes the righ t side of Equation 2.15 for eac h v alue of r F or this purp ose, eac h 1-bit trie no de needs O ( k ) space. Therefore, the memory requiremen ts of the 1-bit trie are O ( pk ). The total memory required is, therefore, O ( pk + W 2 k ). In practice, w e ma y prefer an implemen tation that uses considerably more memory If w e asso ciate a cost arra y with eac h of the p no des of the 1-bit trie, the memory requiremen t increases to O ( pW k ). The adv an tage of this increased memory implemen tation is that the optimal strides can b e recomputed in O ( W 2 k ) time (rather than O ( pW k )) follo wing eac h insert or delete of a prefx. This is so b ecause, the O pt ( N ; ; ) v alues need b e recomputed only for no des along the insert/delete path of the 1-bit trie. There are O ( W ) suc h no des.

PAGE 58

44 P 1 = 0 0 ( P 1) P 2 = 1 1 ( P 2) P 3 = 11 101 ( P 4) P 4 = 101 110 ( P 3) P 5 = 10001 111 ( P 3) P 6 = 1100 10001 ( P 5) P 7 = 110000 11000 ( P 6) P 8 = 1100000 11001 ( P 6) 1100000 ( P 8) 1100001 ( P 7) (a) Original prefxes (b) Expanded prefxes Figure 2{5: A prefx set and its expansion to four lengths N0 N1 N21 N22 N31 N32 N42 N41 N5 N6 P1 P2 P3 P4 P5 P6 P7 P8 Figure 2{6: 1-bit trie for prefxes of Figure 2{5(a) 2.3.2 An Example Figure 2{5(a) giv es a prefx set P that con tains 8 prefxes. The length of the longest prefx (P8) is 7. Figure 2{5(b) giv es the prefxes that remain when the prefxes of P are expanded in to the lengths 1, 3, 5, and 7. As w e shall see, these expanded prefxes corresp ond to an optimal 4-VST for P Figure 2{6 giv es the 1-bit trie for the prefxes of Figure 2{5. T o determine the cost, O pt ( N 0 ; 0 ; 4), of the b est 4-VST for the prefx set of Figure 2{5(a), w e m ust compute all the O pt v alues sho wn in Figure 2{7. In this fgure O pt 1 for example, refers to O pt ( N 1 ; ; ) and O pt 42 refers to O pt ( N 42 ; ; ).

PAGE 59

45 O pt 0 r = 1 2 3 4 O pt 1 r = 1 2 3 4 s = 0 128 26 20 18 s = 0 64 18 16 16 1 64 18 16 16 1 40 18 16 16 2 40 18 16 16 2 20 12 12 12 3 20 12 12 12 3 10 8 8 8 4 10 8 8 8 4 4 4 4 4 5 4 4 4 4 5 2 2 2 2 6 2 2 2 2 O pt 22 r = 1 2 3 4 O pt 21 r = 1 2 3 4 s = 0 32 12 10 10 s = 0 8 6 6 6 1 16 8 8 8 1 4 4 4 4 2 8 6 6 6 2 2 2 2 2 3 4 4 4 4 4 2 2 2 2 O pt 31 r = 1 2 3 4 s = 0 4 4 4 4 O pt 32 r = 1 2 3 4 1 2 2 2 2 s = 0 16 8 8 8 1 8 6 6 6 O pt 41 r = 1 2 3 4 2 4 4 4 4 s = 0 2 2 2 2 3 2 2 2 2 O pt 5 r = 1 2 3 4 O pt 42 r = 1 2 3 4 s = 0 4 4 4 4 s = 0 8 6 6 6 1 2 2 2 2 1 4 4 4 4 2 2 2 2 2 O pt 6 r = 1 2 3 4 s = 0 2 2 2 2 Figure 2{7: O pt v alues in the computation of O pt ( N 0 ; 0 ; 4) The O pt arra ys sho wn in Figure 2{7 are computed in p ostorder; that is, in the order N41, N31, N21, N6, N5, N42, N32, N22, N1, N0. The O pt v alues sho wn in Figure 2{7 w ere computed using Equations 2.15 through 2.18. F rom Figure 2{7, w e determine that the cost of the b est 4-VST for the giv en prefx set is O pt ( N 0 ; 0 ; 4) = 18. T o construct this b est 4-VST, w e m ust determine the strides for all no des in the b est 4-VST. These strides are easily determined if, with eac h O pt ( ; 0 ; ), w e store the s v alue that minimizes the righ t side of Equation 2.15. F or O pt ( N 0 ; 0 ; 4), this minimizing s v alue is 1. This means that the stride for the ro ot of the b est 4-VST is 1, its left subtree is empt y (b ecause N0 has an empt y

PAGE 60

46 00 01 10 11 P8 P7 00 01 10 11 )Tj0 -10.6075 TD()TjET0.1 0 0 0.1 0 0 cm3870.49 6239.37 m3994.24 5708.99 lS9.9994 0 0 9.9994 0 0 cmBT/T1_1 8.486 Tf1.0001 0 0 1.0001 360.552 620.4389 Tm(P6 P6 00 01 10 11 )TjT*()TjET0.1 0 0 0.1 0 0 cm2332.4 5514.52 459.659 212.15 reS2332.4 5620.6 m2792.05 5620.6 lS9.9994 0 0 9.9994 0 0 cmBT/T1_1 8.486 Tf1.0001 0 0 1.0001 250.9343 553.2538 Tm(P5 0 1 )TjET0.1 0 0 0.1 0 0 cm3181 7034.93 m3287.07 6858.14 lS2685.98 6221.69 m2526.87 5726.67 lS3068.32 6807.23 m2909.21 6294.53 lS3534.58 6592.95 m3605.3 6292.41 lS9.9994 0 0 9.9994 0 0 cmBT/T1_1 8.486 Tf1.0001 0 0 1.0001 325.1909 666.4077 Tm(P4 P3 P3 0 1 P1 P2 0 1 )Tj0 -10.6075 TD()Tj38.8942 67.181 Td()TjETQ BT/T1_0 11.9552 Tf1.0001 0 0 1.0001 178.6909 503.4307 Tm(Figure 2{8: Optimal 4-VST for prefxes of Figure 2{5(a) left subtree), its righ t subtree is the b est 3-VST for the subtree ro oted at N1. The minimizing s v alue for O pt ( N 1 ; 0 ; 3) is 2 (actually there is a tie b et w een s = 2 and s = 3; ties ma y b e brok en arbitrarily). Therefore, the righ t c hild of the ro ot of the b est 4-VST has a stride of 2. Its frst subtree is the b est 2-VST for N31; its second subtree is empt y; its third subtree is the b est 2-VST for N32; and its fourth subtree is empt y Con tin uing in this manner, w e obtain the 4-VST of Figure 2{8. The cost of this 4-VST is 18. 2.3.3 F aster k = 2 Algorithm The algorithm of Section 2.3.1 ma y b e used to determine the optimal 2-VST for a set of n prefxes in O ( pW ) (equal to O ( nW ) for practical prefx sets) time, where p is the n um b er of no des in the 1-bit trie and W is the length of the longest prefx. In this section, w e dev elop an O ( p )-time algorithm for this task. F rom Equation 2.13, w e see that the cost, O pt ( r oot; 2) of the b est 2-VST is

PAGE 61

47 Algorithm ComputeC( t ) // Initial in v o cation is C omputeC ( r oot ). // The C arra y and l ev el are initialized to 0 prior to initial in v o cation. // Return heigh t of tree ro oted at no de t f if ( t = n ull ) f l ev el + +; l ef tH eig ht = C omputeC ( t:l ef tC hil d ); r ig htH eig ht = C omputeC ( t:r ig htC hil d ); l ev el )Tj11.8694 0 Td()Tj/T1_0 11.9552 Tf9.2294 0 Td(; heig ht = max f l ef tH eig ht; r ig htH eig ht g + 1; C [ l ev el ] += 2 heig ht +1 ; return heig ht ; g else return )Tj/T1_0 11.9552 Tf9.2294 0 Td(1; g Figure 2{9: Algorithm to compute C using Equation 2.20. O pt ( r oot; 2) = min s 2f 1 :: 1+ heig ht ( r oot ) g f 2 s + X Q 2 D s ( r oot ) O pt ( Q; 1) g = min s 2f 1 :: 1+ heig ht ( r oot ) g f 2 s + X Q 2 D s ( r oot ) 2 1+ heig ht ( Q ) g = min s 2f 1 :: 1+ heig ht ( r oot ) g f 2 s + C ( s ) g (2.19) where C ( s ) = X Q 2 D s ( r oot ) 2 1+ heig ht ( Q ) (2.20) W e ma y compute C ( s ), 1 s 1 + heig ht ( r oot ), in O ( p ) time b y p erforming a p ostorder tra v ersal (see Figure 2{9) of the 1-bit trie ro oted at r oot (Recall that p is the n um b er of no des in the 1-bit trie.) Once w e ha v e determined the C v alues using Algorithm C omputeC (Figure 2{9), w e ma y determine O pt ( r oot; 2) and the optimal stride for the ro ot in an additional O ( heig ht ( r oot )) time using Equation 2.19. If the optimal stride for the ro ot is s then

PAGE 62

48 the second expansion lev el is lev el s (unless, s = 1 + heig ht ( r oot ), in whic h case there isn't a second expansion lev el). The stride for eac h no de at lev el s is one plus the heigh t of the subtree ro oted at that no de. The heigh t of the subtree ro oted at eac h no de w as computed b y Algorithm C omputeC and so the strides for the no des at the second expansion lev el are easily determined. 2.3.4 F aster k = 3 Algorithm Using the algorithm of Section 2.3.1 w e ma y determine the optimal 3-VST in O ( pW ) time. In this section, w e dev elop a simpler and faster O ( pW ) algorithm for this task. On practical prefx sets, the algorithm of this section runs in O ( p ) time. F rom Equation 2.13, w e see that the cost, O pt ( r oot; 3) of the b est 3-VST is O pt ( r oot; 3) = min s 2f 1 :: 1+ heig ht ( r oot ) g f 2 s + X Q 2 D s ( r oot ) O pt ( Q; 2) g = min s 2f 1 :: 1+ heig ht ( r oot ) g f 2 s + T ( s ) g (2.21) where T ( s ) = X Q 2 D s ( r oot ) O pt ( Q; 2) (2.22) Figure 2{10 giv es our algorithm to compute T ( s ), 1 s 1 + heig ht ( r oot ). The computation of O pt ( M ; 2) is done using Equations 2.19 and 2.20. In Algorithm C omputeT (Figure 2{10), the metho d al l ocate allo cates a one-dimensional arra y that is to b e used to compute the C v alues for a subtree. The allo cated arra y is initialized to zero es; it has p ositions 0 through W where W is the length of the longest prefx ( W also is 1 + heig ht ( r oot )); and when computing the C v alues for a subtree whose ro ot is at lev el j only p ositions j through W of the allo cated arra y ma y b e mo difed. The metho d deal l ocate frees a C arra y previously allo cated.

PAGE 63

49 Algorithm ComputeT( t ) // Initial in v o cation is C omputeT ( r oot ). // The T arra y and l ev el are initialized to 0 prior to initial in v o cation. // Return cost of b est 2-VST for subtree ro oted at no de t and heigh t // of this subtree. f if ( t = n ull ) f l ev el + +; // compute C v alues and heigh ts for left and righ t subtrees of t ( l ef tC ; l ef tH eig ht ) = C omputeT ( t:l ef tC hil d ); ( r ig htC ; r ig htH eig ht ) = C omputeT ( t:r ig htC hil d ); l ev el )Tj11.8694 0 Td()Tj/T1_0 11.9552 Tf9.2294 0 Td(; // compute C v alues and heigh t for t as w ell as // bestT = O pt ( t; 2) and t:str ide = stride of no de t // in this b est 2-VST ro oted at t heig ht = max f l ef tH eig ht; r ig htH eig ht g + 1; bestT = l ef tC [ l ev el ] = 2 heig ht +1 ; t:str ide = heig ht + 1; for ( in t i = 1; i < = heig ht ; i + +) f l ef tC [ l ev el + i ] += r ig htC [ l ev el + i ]; if (2 i + l ef tC [ l ev el + i ] < t:bestT ) f bestT = 2 i + l ef tC [ l ev el + i ]; t:str ide = i ; gg T [ l ev el ]+ = bestT ; deal l ocate ( r ig htC ); return ( l ef tC ; heig ht ); g else f //t is n ull al l ocate ( C ); return ( C ; )Tj/T1_0 11.9552 Tf9.2294 0 Td(1); g g Figure 2{10: Algorithm to compute T using Equation 2.22.

PAGE 64

50 The complexit y of Algorithm C omputeT is readily seen to b e O ( pW ). Once the T v alues ha v e b een computed using Algorithm C omputeT w e ma y determine O pt ( r oot; 3) and the stride of the ro ot of the optimal 3-VST in an additional O ( W ) time. The strides of the no des at the remaining expansion lev els of the optimal 3VST ma y b e determined from the t:str ide and subtree heigh t v alues computed b y Algorithm C omputeT in O ( p ) time. So the total time needed to determine the b est 3-VST is O ( pW ). When the dierence b et w een the heigh ts of the left and righ t subtrees of no des in the 1-bit trie is b ounded b y some constan t d the complexit y of Algorithm C omputeT is O ( p ). W e use an amortization sc heme to pro v e this. First, note that, exclusiv e of the recursiv e calls, the w ork done b y Algorithm C omputeT for eac h in v o cation is O ( heig ht ( t )). F or simplicit y assume that this w ork is exactly heig ht ( t ) + 1 (the 1 is for the w ork done outside the for lo op of C omputeT ). Eac h activ e C arra y will main tain a credit that is at least equal to the heigh t of the subtree it is asso ciated with. When a C arra y is allo cated, it has no credit asso ciated with it. Eac h no de in the 1-bit trie b egins with a credit of 2. When t = N 1 unit of the credits on N is used to pa y for the w ork done outside of the for lo op. The remaining unit is giv en to the C arra y l ef tC The cost of the for lo op is paid for b y the credits asso ciated with r ig htC These credits ma y fall short b y at most d + 1, b ecause the heigh t of the left subtree of N ma y b e up to d more than the heigh t of N 's righ t subtree. Adding together the initial credits on the no des and the maxim um total shortfall, w e see that p (2 + d + 1) credits are enough to pa y for all of the w ork. So, the complexit y of C omputeT is O ( pd ) = O ( p ) (b ecause d is assumed to b e a constan t). In practice, w e exp ect that the 1-bit tries for router prefxes will not b e to o sk ew ed and that the dierence b et w een the heigh ts of the left and righ t subtrees will, in fact, b e quite small. Therefore, in practice, w e exp ect C omputeT to run in O ( p ) time.

PAGE 65

51 T able 2{3: Memory required (in Kb ytes) b y b est k )Tj/T1_0 11.9552 Tf9.2294 0 Td(lev el FST Lev els( k ) 2 3 4 5 6 7 P aix 49,192 3,030 1,340 1,093 960 922 Pb 47,925 2,328 896 699 563 527 MaeW est 44,338 2,168 819 636 499 468 Aads 42,204 2,070 782 594 467 436 MaeEast 38,890 1,991 741 549 433 398 2.4 Exp erimen tal Results W e programmed our dynamic programming algorithms in C and compared their p erformance against that of the C co des for the algorithms of Sriniv asan and V arghese [82]. All co des that w ere run on a SUN w orkstation w ere compiled using the gcc compiler and optimization lev el -O2; co des run on a PC w ere compiled using Microsoft Visual C++ 6.0 and optimization lev el -O2. The co des w ere run on a SUN Ultra En terprise 4000/5000 computer as w ell as on a 2.26 GHz P en tium 4 PC. F or test data, w e used the fv e IPv4 prefx databases of T able 2{1. 2.4.1 P erformance of Fixed-Stride Algorithm T able 2{3 and Figure 2{11 sho ws the memory required b y the b est k -lev el FST for eac h of the fv e databases of T able 2{1. Note that the y -axis of Figure 2{11 uses a semilog scale. The k v alues used b y us range from a lo w of 2 to a high of 7 (corresp onding to a lo okup p erformance of at most 2 memory accesses p er lo okup to at most 7 memory accesses p er lo okup). As w as the case with the data sets used in [82], using a larger n um b er of lev els do es not increase the required memory W e note that for k = 11 and 12, [82] rep orts no decrease in memory required for three of their data sets. W e did not try suc h large k v alues for our data sets. T able 2{4 and Figure 2{12 sho w the time tak en b y b oth our algorithm and that of [82] (w e are grateful to Dr. Sriniv asan for making his fxedand v ariable-stride co des a v ailable to us) to determine the optimal strides of the b est FST that has at most k lev els. These times are for the P en tium 4 PC. Times in T able 2{5 and Figure 2{13 are for the SUN w orkstation. As exp ected, the run time of the algorithm of [82] is

PAGE 66

52 2 3 4 5 6 7 10 2 10 3 10 4 10 5 k Memory (KB) Paix Pb MaeWest Aads MaeEast Figure 2{11: Memory required (in KBytes) b y b est k -lev el FST quite insensitiv e to the n um b er of prefxes in the database. Although the run time of our algorithm is indep enden t of the n um b er of prefxes, the run time do es dep end on the v alues of nodes ( ) as these v alues determine M ( ; ) and hence determine minJ in Figure 2{3. As indicated b y the graph of Figure 2{12, the run time for our algorithm v aries only sligh tly with the database. As can b e seen, our algorithm pro vides a sp eedup of b et w een 1.5 and 3 compared to that of [82 ]. When the co des w ere run on our SUN w orkstation, the sp eedup w as b et w een 2 and 4. 2.4.2 P erformance of V ariable-Stride Algorithm T able 2{6 sho ws the memory required b y the b est k -lev el VST for eac h of the fv e databases of T able 2{1. The columns lab eled \Y es" giv e the memory required when the VST is p ermitted to ha v e Butler no des [44 ]. This capabilit y refers to the replacing of subtries with three or few er prefxes b y a single no de that con tains these prefxes [44]. The columns lab eled \No" refer to the case when Butler no des are not p ermitted (i.e., the case discussed in this c hapter). The data of T able 2{6 as w ell

PAGE 67

53 T able 2{4: Execution time (in sec) for FST algorithms, P en tium 4 PC P aix Pb MaeW est Aads MaeEast k [82] Our [82] Our [82] Our [82] Our [82 ] Our 2 5.23 3.20 5.19 3.15 5.13 3.15 5.15 3.17 5.17 3.09 3 9.99 4.87 9.73 4.73 9.98 4.81 9.96 4.90 10.00 4.73 4 14.68 6.23 14.53 6.15 14.62 6.29 14.59 6.31 14.64 6.10 5 19.54 7.36 19.42 7.31 19.42 7.40 19.15 7.42 19.45 7.28 6 24.32 9.39 24.08 8.37 24.07 8.47 24.03 8.46 24.23 8.29 7 28.99 9.48 28.72 9.42 28.68 9.45 28.68 9.38 28.76 9.34 T able 2{5: Execution time (in sec) for FST algorithms, SUN Ultra En terprise 4000/5000 P aix Pb MaeW est Aads MaeEast k [82] Our [82] Our [82] Our [82] Our [82] Our 2 39 21 41 21 39 21 37 20 37 21 3 85 30 81 30 84 31 74 31 96 31 4 123 39 124 40 128 38 122 40 130 40 5 174 46 174 48 147 46 161 45 164 46 6 194 53 201 54 190 55 194 54 190 53 7 246 62 241 63 221 63 264 62 220 62 2 3 4 5 6 7 0 5 10 15 20 25 30 k Time (usec) Paix-[16] Paix-Our Pb-[16] Pb-Our MaeWest-[16] MaeWest-Our Aads-[16] Aads-Our MaeEast-[16] MaeEast-Our Figure 2{12: Execution time (in sec) for FST algorithms, P en tium 4 PC

PAGE 68

54 2 3 4 5 6 7 0 50 100 150 200 250 300 k Time (usec) Paix[16] PaixOur Pb[16] PbOur MaeWest[16] MaeWestOur Aads[16] AadsOur MaeEast[16] MaeEastOur Figure 2{13: Execution time (in sec) for FST algorithms, SUN Ultra En terprise 4000/5000 as the memory requiremen ts of the b est FST are plotted in Figure 2{14. As can b e seen, the Butler no de pro vision has far more impact when k is small than when k is large. In fact, when k = 2 the Butler no de pro vision reduces the memory required b y the b est VST b y almost 50%. Ho w ev er, when k = 7, the reduction in memory resulting from the use of Butler no des v ersus not using them results in less than a 20% reduction in memory requiremen t. T able 2{6: Memory required (in Kb ytes) b y b est k )Tj/T1_0 11.9552 Tf9.2294 0 Td(VST P aix Pb MaeW est Aads MaeEast k No Y es No Y es No Y es No Y es No Y es 2 2,528 1,722 1,806 1,041 1,754 949 1,631 891 1,621 837 3 1,080 907 677 496 619 443 582 405 537 367 4 845 749 489 397 441 351 410 320 371 286 5 780 706 440 370 393 327 363 297 326 264 6 763 695 426 361 379 319 350 290 313 257 7 759 692 422 358 376 316 346 287 310 254

PAGE 69

55 2 3 4 5 6 7 10 2 10 3 10 4 10 5 k Memory (KB) No Butler Butler FST Figure 2{14: Memory required (in KBytes) for P aix b y b est k -VST and b est FST F or the run time comparison of the VST algorithms, w e implemen ted three v ersions of our VST algorithm of Section 2.3.1. None of these v ersions p ermitted the use of Butler no des. The frst v ersion, called the O ( pk + W 2 k ) Static Memory Implementation, is the O ( pk + W 2 k ) memory implemen tation describ ed in Section 2.3.1. The O ( W 2 k ) memory required b y this implemen tation for the cost arra ys is allo cated at compile time. During execution, memory segmen ts from this preallo cated O ( W 2 k ) memory are allo cated to no des, as needed, for their cost arra ys. The second v ersion, called the O ( pW k ) Dynamic Memory Implemen tation, dynamically allo cates a cost arra y to eac h no de of the 1-bit trie no des using C's malloc metho d. Neither the frst nor second implemen tations emplo y the fast algorithms of Sections 2.3.3 and 2.3.4. T ables 2{7 and 2{8 giv e the run time for these t w o implemen tations. The third implemen tation of our VST algorithm uses the faster k = 2 and k = 3 algorithms of Section 2.3.3 and 2.3.4 and also uses O ( pW k ) memory The O ( pW k ) memory is allo cated in one large blo c k making a single call to malloc F ollo wing

PAGE 70

56 T able 2{7: Execution times (in msec) for frst t w o implemen tations of our VST algorithm, P en tium 4 PC P aix Pb MaeW est Aads MaeEast k S D S D S D S D S D 2 34.3 107.5 17.6 56.2 31.0 50.1 15.9 46.9 12.2 40.2 3 39.1 115.4 22.4 65.2 15.2 58.2 19.0 53.1 15.1 46.5 4 47.0 131.4 28.2 74.8 20.0 66.2 23.3 57.6 16.6 54.2 5 51.5 140.7 29.6 78.0 20.3 66.2 23.2 62.0 19.9 56.0 6 59.0 146.7 32.9 82.7 27.9 69.4 26.3 71.6 21.4 62.5 7 63.7 159.3 31.0 88.6 32.8 79.0 32.7 73.3 29.4 67.1 S = O ( pk + W 2 k ) Static Memory Implemen tation D = O ( pW k ) Dynamic Memory Implemen tation T able 2{8: Execution times (in msec) for frst t w o implemen tations of our VST algorithm, SUN Ultra En terprise 4000/5000 P aix Pb MaeW est Aads MaeEast k S D S D S D S D S D 2 290 500 150 280 150 260 120 200 120 230 3 360 790 190 460 180 430 150 340 150 340 4 430 900 210 520 220 430 180 430 160 390 5 490 1140 260 610 240 570 200 520 190 470 6 530 1170 290 670 270 570 270 550 220 510 7 590 1390 330 780 300 690 300 630 260 560 S = O ( pk + W 2 k ) Static Memory Implemen tation D = O ( pW k ) Dynamic Memory Implemen tation

PAGE 71

57 T able 2{9: Execution times (in msec) for third implemen tation of our VST algorithm, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 21.0 10.6 9.0 8.2 7.3 3 27.8 15.0 13.2 12.1 10.7 4 48.5 27.6 24.6 22.9 20.6 5 56.2 32.3 28.7 26.7 24.0 6 62.1 36.4 32.5 30.4 27.1 7 69.3 40.3 36.1 33.7 30.3 T able 2{10: Execution times (in msec) for third implemen tation of our VST algorithm, SUN Ultra En terprise 4000/5000 k P aix Pb MaeW est Aads MaeEast 2 70 30 30 20 20 3 210 100 90 80 70 4 550 290 270 270 240 5 640 350 370 330 260 6 740 430 390 410 350 7 920 530 450 400 350 this, the large allo cated blo c k of memory is partitioned in to cost arra ys for the 1-bit trie no des b y our program. The run time for the third implemen tation is giv en in T ables 2{9 and 2{10. The run times for all three of our implemen tations is plotted in Figures 2{15 and 2{16. Notice that this third implemen tation is signifcan tly faster than our other O ( pW k ) memory implemen tation. Note also that this third implemen tation is also faster than the O ( pk + W 2 k ) memory implemen tation for the cases k = 2 and k = 3 (this is b ecause, in our third implemen tation, these cases use the faster algorithms of Sections 2.3.3 and 2.3.4). T o compare the run time p erformance of our algorithm with that of [82 ], w e use the times for implemen tation 3 when k = 2 or 3 and the times for the faster of implemen tations 1 and 3 when k > 3. That is, w e compare our b est times with the times for the algorithm of [82 ]. The times for the algorithm of [82] w ere obtained using their co de and running it with the Butler no de option o. Since the co de of [82] do es no dynamic memory allo cation, our use of the times for the static memory allo cation

PAGE 72

58 2 3 4 5 6 7 20 40 60 80 100 120 140 160 k Time (msec) Static Dynamic Third Figure 2{15: Execution times (in msec) for P aix for our three VST implemen tations, P en tium 4 PC 2 3 4 5 6 7 0 200 400 600 800 1000 1200 1400 k Time (msec) Static Dynamic Third Figure 2{16: Execution times (in msec) for P aix for our three VST implemen tations, SUN Ultra En terprise 4000/5000

PAGE 73

59 T able 2{11: Execution times (in msec) for our b est VST implemen tation and the VST algorithm of Sriniv asan and V arghese, P en tium 4 PC P aix Pb MaeW est Aads MaeEast k [82] Our [82 ] Our [82] Our [82] Our [82 ] Our 2 64.6 21.0 37.4 10.6 31.1 9.0 27.9 8.2 26.6 7.3 3 665.6 27.8 339.2 15.0 297.0 13.2 269.8 12.1 244.1 10.7 4 1262.7 47.0 629.8 27.6 559.4 20.0 503.2 22.9 448.5 16.6 5 1858.0 51.5 928.4 29.6 817.1 20.3 737.2 23.2 659.8 19.9 6 2441.0 59.0 1215.8 32.9 1073.2 27.9 971.4 26.3 868.9 21.4 7 3034.7 63.7 1512.7 31.0 1328.0 32.8 1209.3 32.7 1072.0 29.4 T able 2{12: Execution times (in msec) for our b est VST implemen tation and the VST algorithm of Sriniv asan and V arghese, SUN Ultra En terprise 4000/5000 P aix Pb MaeW est Aads MaeEast k [82] Our [82 ] Our [82] Our [82 ] Our [82] Our 2 190 70 130 30 50 30 40 20 40 20 3 1960 210 1230 100 360 90 320 80 280 70 4 3630 430 2330 210 700 220 590 180 530 160 5 5340 490 3440 260 1030 240 860 200 780 190 6 7510 530 4550 290 1340 270 1150 270 1020 220 7 9280 590 5650 330 1650 300 1420 300 1270 260 do es not, in an y w a y disadv an tage the algorithm of [82]. The run times, on our 2.26 GHz PC, are sho wn in T able 2{11 and these times are plotted in Figure 2{17. F or our largest database, P aix, our new algorithm tak es less than one-third the time tak en b y the algorithm of [82] when k = 2 and ab out 1 = 47 the time when k = 7. On our SUN w orkstation, as sho wn in T able 2{12 and Figure 2{18, the observ ed sp eedups for P aix ranges from a lo w of 2.7 to a high of 15.7. The observ ed sp eedups aren't as high as predicted b y our crude analysis b ecause actual sp eedup is go v erned b y b oth the op eration cost and the cac he-miss cost; further, our crude analysis do esn't accoun t for all op erations. The higher sp eedup observ ed on a PC suggests a higher relativ e cac he-miss cost on the PC (relativ e to the cost of an op eration) v ersus on a SUN w orkstation. The times rep orted in T ables 2{7{2{12 are only the times needed to determine the optimal strides for a giv en 1-bit trie. Once these strides ha v e b een determined,

PAGE 74

60 2 3 4 5 6 7 0 500 1000 1500 2000 2500 3000 3500 k Time (msec) Algorithm of [16] Algorithm of Our Figure 2{17: Execution times (in msec) for P aix for our b est VST implemen tation and the VST algorithm of Sriniv asan and V arghese, P en tium 4 PC 2 3 4 5 6 7 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 k Time (msec) Algorithm of [16] Algorithm of Our Figure 2{18: Execution times (in msec) for P aix for our b est VST implemen tation and the VST algorithm of Sriniv asan and V arghese, SUN Ultra En terprise 4000/5000

PAGE 75

61 T able 2{13: Time (in msec) to construct optimal VST from optimal stride data, P entium 4 PC k P aix Pb MaeW est Aads MaeEast 2 117.1 78.0 68.5 67.4 64.3 3 107.8 62.6 55.7 47.0 47.0 4 115.5 66.2 61.3 50.9 47.0 5 126.6 78.0 63.6 62.1 56.5 6 131.4 82.6 64.5 68.9 59.5 7 139.3 78.0 75.6 71.6 62.0 T able 2{14: Searc h time (in sec) in optimal VST, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 0.55 0.46 0.44 0.43 0.42 3 0.71 0.64 0.62 0.61 0.59 4 0.79 0.74 0.73 0.72 0.72 5 0.92 0.89 0.89 0.88 0.90 6 1.01 1.00 0.99 0.99 0.98 7 1.10 1.10 1.08 1.09 1.10 it is necessary to actually construct the optimal VST. T able 2{13 sho ws the time required to construct the optimal VST once the optimal strides are kno wn. F or our databases, the VST construction time is more than the time required to compute the optimal strides using our b est optimal stride computation implemen tation. The primary op eration p erformed on an optimal VST is a lo okup or searc h in whic h w e b egin with a destination address and fnd the longest prefx that matc hes this destination address. T o determine the a v erage lo okup/searc h time, w e searc hed for as man y addresses as there are prefxes in a database. The searc h addresses w ere obtained b y using the 32-bit expansion a v ailable in the database for all prefxes in the database. T able 2{14 and Figure 2{19 sho w the a v erage time to p erform a lo okup/searc h. As exp ected, the a v erage searc h time increases monotonically with k F or our databases, the searc h time for a 2-VST is less than or equal to half that for a 7-VST. Inserts and deletes are p erformed less frequen tly than searc hes in a VST. W e exp erimen ted with three strategies for these t w o op erations:

PAGE 76

62 2 3 4 5 6 7 0.5 0.6 0.7 0.8 0.9 1 1.1 1.2 k Search Time (usec) Figure 2{19: Searc h time (in nsec) in optimal VST for P aix, P en tium 4 PC OptVST: In this strategy the VST w as alw a ys the b est p ossible k -VST for the curren t set of prefxes. T o insert a new prefx, w e frst insert the prefx in to the 1-bit trie of all prefxes. Then, the cost arra ys on the insert path are recomputed. This is done ecien tly using implemen tation 2 (i.e., the O ( pW k ) dynamic memory implemen tation) of our VST stride computation algorithm. F ollo wing this, the optimal strides for v ertices on the insert path are computed. Since, the optimal VST for the new prefx set diers from the optimal VST for the original prefx set only along the insert path, w e mo dify the original optimal VST only along this insert path using the newly computed strides for the v ertices on this path. Deletion w orks in a similar fashion. Batc h1: In this strategy the optimal VST is computed p erio dically (sa y after a sucien t n um b er of inserts/deletes ha v e tak en place) rather than follo wing eac h insert/delete. Inserts and deletes are done directly in to the curren t VST without regard to main taining optimalit y If the insertion results in the creation of a new no de, the stride of this new no de is suc h that the sum of the strides of the no des on the path from the ro ot to this new no de equals the length of the newly inserted prefx. The deletion of a prefx ma y require us to searc h a no de

PAGE 77

63 for a replacemen t prefx of the next (lo w er) length that matc hes the deleted prefx. Batc h2: This diers from strategy Batc h1 in that inserts and deletes are done in b oth the curren t VST and in the 1-bit trie. This increases the time for an insert as w ell as for a delete. In the case of deletion, b y frst deleting from the 1-bit trie, w e determine the next (lo w er) length matc hing prefx from the delete path tak en in the 1-bit trie. This eliminates the need to searc h a no de for this next (lo w er) length matc hing prefx when deleting from the VST. The result is a net reduction in time for the delete op eration. The batc h mo des describ ed ab o v e ma y also b e useful when the insert/delete rate is sucien tly small that follo wing eac h insert or delete done as ab o v e, the optimal VST is computed in the bac kground using another pro cessor. While this computation is b eing done, routes are made using the sub optimal VST resulting from the insert or delete that w as done as describ ed for the batc h mo des. When the new optimal VST has b een computed, the new optimal VST is sw app ed with the sub optimal one. T ables 2{15-2{20 giv e the measured run times for the insert and delete op erations using eac h of the three strategies describ ed ab o v e. Figures 5{8 and 5{9 plot these times for the P aix database. F or the insert time exp erimen ts, w e started with an optimal VST for 75% of the prefxes in the giv en database and then measured the time to insert the remaining 25%. The rep orted times are the a v erage time for one insert. F or P aix and k = 2, it tak es 21 + 78 = 99 milli seconds to construct the optimal VST (time to compute optimal strides plus time to construct the VST for these strides). Ho w ev er, the cost of an incremen tal insert that main tains the optimalit y of the VST is only 50.75 micro seconds; the cost of an incremen tal delete is 51.85 micro seconds; a sp eedup of ab out 2000 o v er the from scratc h optimal VST construction!

PAGE 78

64 T able 2{15: Insertion time (in sec) for OptVST, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 50.75 49.72 49.53 49.17 208.44 3 325.95 71.25 67.28 66.23 68.19 4 146.74 165.60 126.74 122.92 99.87 5 186.22 191.17 187.68 169.06 186.36 6 2247.96 333.87 252.99 746.92 192.36 7 912.03 446.03 2453.73 445.81 375.32 T able 2{16: Deletion time (in sec) for OptVST, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 51.85 51.39 51.79 50.95 52.15 3 61.29 60.77 60.94 59.80 61.71 4 74.86 72.90 73.72 71.99 74.58 5 87.47 85.71 86.31 84.97 87.80 6 99.74 97.70 98.50 96.97 99.90 7 111.92 109.89 110.49 108.82 113.15 Although batc h insertion is considerably faster than insertion using strategy OptVST, batc h insertion increases the n um b er of lev els in the VST, and so results in slo w er searc hes. F or example, in the exp erimen ts with P aix, the batc h inserts increased the n um b er of lev els in the initial k -VST from k to 5 for k = 2, to 6 for k = 3 and 4, and to 8 for k = 5, 6, and 7. The delete times w ere measured b y starting with an optimal VST for 100% of the prefxes in the giv en database and then measuring the time to delete 25% of these prefxes. Once again, the a v erage time for a single delete is rep orted. T able 2{17: Insertion time (in sec) for Batc h1, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 1.51 1.73 1.69 1.81 1.89 3 2.37 2.97 3.40 2.58 2.79 4 2.86 3.50 4.04 3.31 3.09 5 3.63 4.33 4.52 3.54 3.93 6 4.18 5.05 6.53 4.35 5.19 7 5.00 4.98 9.03 4.58 5.19

PAGE 79

65 2 3 4 5 6 7 10 0 10 1 10 2 10 3 10 4 k Insert Time (usec) OptVST Batch1 Batch2 Figure 2{20: Insertion time (in sec) for P aix, P en tium 4 PC T able 2{18: Deletion time (in sec) for Batc h1, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 4.72 6.13 6.53 6.24 6.44 3 2.54 2.14 2.18 2.34 2.48 4 2.22 2.69 2.59 2.39 2.65 5 2.58 3.25 2.79 2.92 2.94 6 2.73 3.05 3.15 3.86 3.06 7 2.80 3.43 3.10 3.51 3.62 T able 2{19: Insertion time (in sec) for Batc h2, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 3.53 3.68 4.01 4.07 4.00 3 4.18 4.62 4.97 4.72 4.79 4 4.42 4.96 4.93 4.94 5.03 5 4.70 5.32 5.48 4.94 5.51 6 5.10 5.96 6.31 5.72 6.15 7 5.34 6.13 7.26 5.76 5.47

PAGE 80

66 T able 2{20: Deletion time (in sec) for Batc h2, P en tium 4 PC k P aix Pb MaeW est Aads MaeEast 2 5.70 7.34 7.37 7.16 7.45 3 3.59 3.59 3.67 3.63 3.49 4 3.48 3.78 3.93 3.89 3.98 5 3.67 3.77 4.05 3.79 4.05 6 3.81 4.07 4.14 3.98 4.15 7 3.87 4.19 4.23 3.99 3.90 2 3 4 5 6 7 10 0 10 1 10 2 10 3 k Delete Time (usec) OptVST Batch1 Batch2 Figure 2{21: Deletion time (in sec) for P aix, P en tium 4 PC

PAGE 81

67 2.5 Summary W e ha v e dev elop ed faster algorithms to compute the optimal strides for fxedand v ariable-stride tries than those prop osed in [82 ]. On IPv4 prefx databases and a 2.26 GHz P en tium 4 PC, our algorithm for fxed-stride tries is faster than the corresp onding algorithm of [82] b y a factor of b et w een 1.5 and 3; on a SUN Ultra En terprise 4000/5000, the sp eedup is b et w een 2 and 4. This sp eedup results from narro wing the searc h range in the dynamic-programming form ulation. Since the searc h range is at most 32 for IPv4 databases and at most 128 for IPv6 databases, the p oten tial to narro w the range (and hence sp eed up the computation) is greater for IPv6 data. Hence, w e exp ect that our narro w ed-range FST algorithm will exhibit greater sp eedup on IPv6 databases. W e are unable to v erify this exp ectation, b ecause of the non-a v ailabilit y of IPv6 prefx databases. On our PC, our algorithm to compute the strides for an optimal v ariable-stride trie is faster than the corresp onding algorithm of [82] b y a factor of b et w een 3 and 47; on our SUN w orkstation, the sp eedup is b et w een 2 and 17. Our VST stride computation metho d p ermits the insertion and remo v al of prefxes without ha ving to recompute the optimal strides from scratc h. The incremen tal insert and delete algorithms are ab out 3 orders of magnitude faster than the \from scratc h" algorithm. W e also ha v e prop osed t w o batc h strategies for the insertion and remo v al of prefxes. Although, these strategies p ermit faster insertion and deletion, they increase the heigh t of the VST, whic h results in slo wing do wn the searc h op eration. These batc h strategies are, nonetheless, useful in applications where it is practical to rebuild the optimal VST whenev er the searc h p erformance has b ecome unacceptible.

PAGE 82

CHAPTER 3 BINAR Y SEAR CH ON PREFIX LENGTH In this c hapter, w e fo cus on the collection of hash tables (CHT) sc heme of W aldv ogel et al. [87]. Let P b e the set of prefxes in a router table, and let P i b e the subset of P comprised of prefxes whose length is i In the sc heme of W aldv ogel et al. [87], w e main tain a hash table H i for ev ery P i that is not empt y H i includes the prefxes of P i as w ell as mark ers for prefxes in [ i
PAGE 83

69 P1 P2 P3 P4 P5 P6 =1* =0* =10* =1000* =100100* =1001001* (a) Prefxes H 1 H H H 4 7 6 H 2 (b) T ree for binary searc h 00* 11* 1000* 1001000* 1001001* (P1a) (P1b) (P3) (P2a) 01* 10* (P4) (P6) (P5a) (c) Expanded prefxes Figure 3{1: Con trolled prefx expansion CHT. By reducing the n um b er of hash tables in the CHT, the w orst-case n um b er of hash tables searc hed in the quest for l mp ( d ) ma y b e reduced. Prefx expansion [82] replaces a prefx of length u with 2 v )Tj/T1_5 7.9701 Tf6.5992 0 Td(u prefxes of length v v > u The new prefxes are obtained b y app ending all p ossible bit sequences of length v )Tj/T1_4 11.9552 Tf12.2293 0 Td(u to the prefx b eing expanded. So, for example, the prefx 1* ma y b e expanded to the length 2 prefxes 10* and 11* or to the length 3 prefxes 100*, 101*, 110*, and 111*. In case an expanded prefx is already presen t in the prefx set of the router table, it is dominated b y the existing prefx (the expanded prefx 10* represen ts a shorter original prefx 1* that cannot b e used to matc h destination addresses that b egin with 10 when longest-prefx matc hing is used) and so is discarded. So, if w e expand P 2 = 1 in our collection of Figure 3{1(a) to length 2, the expaned prefx P 2 b = 10 is dominated b y P 3 = 10 Figure 3{1(c) sho ws the prefxes that result when the length 1 prefxes of Figure 3{1(a) are expanded to length 2 and the length 6 prefx is expanded to length 7. Y ou ma y v erify that l mp ( d ) is the same for all d regardless of whether w e use the prefx set of Figure 3{1(a) or (c) (when the latter set is used, w e need to map bac k to the original prefx from whic h an expanded prefx came). Since, the prefxes of Figure 3{1(c) ha v e only 3 distinct lengths, the corresp onding CHT has only 3 hash tables and ma y b e searc hed for l mp ( d ) with at most 2 hash-table searc hes. Hence,

PAGE 84

70 the CHT sc heme results in faster lo okup when the prefxes of Figure 3{1(c) are used than when those of Figure 3{1(a) are used. [71, 72, 73 82] use prefx expansion to impro v e the lo okup p erformance of trie-represen tations of router tables. When reducing the n um b er of distinct lengths from u to v the c hoice of the target v lengths aects the n um b er of mark ers and prefxes that ha v e to b e stored in the resulting CHT but not the n um b er of hash tables, whic h is alw a ys v Although the n um b er of target lengths ma y b e determined from the exp ected n um b er of pac k ets to b e pro cessed p er second and the p erformance c haracteristics of the computer to b e used for this purp ose, the target lengths are determined so as to minimize the storage requiremen ts of the CHT. Consequen tly Sriniv asan and V arghese [82] form ulated the follo wing optimization problem. Exact Collection of Hash T ables Optimization Problem (ECHT) Given a set P of n pr efxes and a tar get numb er of distinct lengths k determine tar get lengths l 1 ; :::; l k such that the stor age r e quir e d by the pr efxes and markers for the pr efx set expansion ( P ) obtaine d fr om P by pr efx exp ansion to the determine d tar get lengths is minimum. When P and k are not implicit, w e use the notation E C H T ( P ; k ). F or simplicit y Sriniv asan [80 ] assumes that the strorage required b y the prefxes and mark ers for the prefx set expansion ( P ) equals the n um b er of prefxes and mark ers. W e mak e the same assumption in this c hapter. Sriniv asan [80 ] pro vides an O ( nW 2 )-time heuristic for ECHT. W e frst sho w, in Section 3.1, that the heuristic of Sriniv asan [80] ma y b e implemen ted so that its complexit y is O ( nW + k W 2 ) on practical prefx-sets. Then, in Section 3.2, w e pro vide an O ( nW 3 + k W 4 )-time algorithm for ECHT. In Section 3.3, w e form ulate an alternativ e v ersion A CHT of the ECHT problem. In this alternativ e v ersion, w e are to fnd at most k distinct target lengths to minimize storage rather than exactly k target lengths. The A CHT problem also ma y b e solv ed

PAGE 85

71 in O ( nW 3 + k W 4 ) time. In Section 3.4, w e prop ose a reduction in the searc h range used b y the heuristic of [80]. The prop osed range reduction reduces the run time b y more than 50% exclusiv e of the prepro cessing time. The reduced-range heuristic generates the same results on our b enc hmark prefx data-sets as are generated b y the full-range heuristic of [80 ]. A more accurate cost estimator than is used in the heuristic of [80] is prop osed in Section 3.5. Exp erimen tal results highligh ting the relativ e p erformance of the v arious algorithms and heuristics for ECHT and A CHT are presen ted in Section 3.6. 3.1 Heuristic of Sriniv asan The ECHT heuristic of Sriniv asan [80] uses the follo wing defnitions: E xpansionC ost ( i; j ) This is the n um b er of distinct prefxes that result when all prefxes in P q 2 P i q < j are expanded to length j F or example, when P = f 0 ; 1 ; 01 ; 100 g E xpansionC ost (1 ; 3) = 8 (note that 0* and 1* con tribute 4 prefxes eac h; 01* con tributes none b ecause its expanded prefxes are included in the expanded prefxes of 0*). E ntr ies ( j ) This is the maxim um n um b er of mark ers in H j (should j b e a target length) plus the n um b er of prefxes in P whose length is j Sriniv asan [80] uses \maxim um n um b er of mark ers" in the defnition of E ntr ies ( j ) rather than the exact n um b er of mark ers b ecause of the rep orted dicult y in computing this latter quan tit y T ( j; r ) This is an upp er b ound on the storage required b y the optimal solution to E C H T ( Q; r ), where Q P comprises all prefxes of P whose length is at most j ; the optimal solution to E C H T ( Q; r ) is required to con tain mark ers, as necessary for prefxes of P whose length exceeds j Sriniv asan [80] pro vides the follo wing dynamic programming recurrence for T ( j; r ).

PAGE 86

72 T ( j; r ) = E ntr ies ( j ) + min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f T ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E xpansionC ost ( m + 1 ; j ) g (3.1) T ( j; 1) = E ntr ies ( j ) + E xpansionC ost (1 ; j ) (3.2) W e ma y v erify the correctness of Equations 3.1 and 3.2. When r = 1, there is only 1 target length and this length is no more than j When Q has a prefx whose length is j then j m ust b e the target length. In this case, the n um b er of expanded prefxes is at most E xpansionC ost (1 ; j ) plus the n um b er of prefxes whose length is j So, the n um b er of prefxes and mark ers is at most E ntr ies ( j ) + E xpansionC ost (1 ; j ). When Q has no prefx whose length is j the optimal target length is the largest l l < j suc h that Q has a prefx whose length is l In this case, E ntr ies ( l ) + E xpansionC ost (1 ; l ) E ntr ies ( j ) + E xpansionC ost (1 ; j ) is an upp er b ound on the n um b er of prefxes and mark ers. T o compute E xpansionC ost and E ntr ies a 1-bit trie [39 ] is used. Figure 3{2 sho ws a prefx set and its corresp onding 1-bit trie. Notice that no des at lev el i (the ro ot is at lev el 0) of the 1-bit trie store prefxes whose length is i + 1. Sriniv asan [80] states ho w E xpansionC ost ( i; j ) ; 1 i < j W ma y b e computed in O ( nW 2 ) time using a 1-bit trie for P Sahni and Kim [71] ha v e observ ed that, for practical prefx sets, the 1-bit trie has O ( n ) no des. So, b y p erforming a p ostorder tra v ersal of the 1-bit trie, E xpansionC ost ( i; j ) ; 1 i < j W ma y b e computed in O ( nW ) time (note that n > W ). Details of this pro cess are pro vided in Section 3.2.1 where w e sho w ho w a closely related function ma y b e computed. F or E ntr ies ( j ), Sriniv asan [80] prop oses coun ting the n um b er of prefxes stored in lev el j )Tj/T1_0 11.9552 Tf11.0293 0 Td(1 of the 1-bit trie and the n um b er of (non-n ull) p oin ters (in the 1-bit trie) to no des at lev el j (the n um b er of p oin ters actually equals the n um b er of no des).

PAGE 87

73 P2 P3 P4 P5 P6 P7 P8 P9 P10 Prefixes P1 =010* =001* =01* =1* =0101* =00001* =00010* =00110* =01000* =01001* (a) A prefx set N31 N32 N41 N42 N43 N51 N52 N53 N54 Length 5 4 3 2 1 N2 N1 P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 (b) Corresp onding 1-bit trie Figure 3{2: Prefxes and corresp onding 1-bit trie The former giv es the n um b er of prefxes whose length is j and the latter giv es the maxim um n um b er of mark ers needed for the longer-length prefxes. Supp ose that m and j are target lengths and that no l m < l < j is a target length. The actual n um b er of prefxes and mark ers in H j ma y b e considerably less than E ntr ies ( j ) + E xpansionC ost ( m + 1 ; j ) for the follo wing reasons. An expanded prefx coun ted in E xpansionC ost ( m + 1 ; j ) ma y b e iden tical to a prefx in P whose length is j Some of the prefxes in P whose length is more than j ma y not need to lea v e a mark er in H j b ecause their length is not on an y binary searc h (sub)path that is preceded b y the length j F or example, for the binary searc h describ ed b y Figure 3{1(b), H 1 needs mark ers only for prefxes in H 2 ; not for those in H 4 H 6 and H 7 Ho w ev er, E ntr ies (1) accoun ts for mark ers needed b y prefxes in H 2 as w ell as those in H 4 H 6 and H 7 E ntr ies ( j ) do esn't accoun t for the fact that a mark er ma y b e iden tical to a prefx in whic h case the storage coun t for the mark er and the prefx together should b e 1 and not 2. F or example, in Figure 3{2(b), the mark er corresp onding to the non-n ull p oin ter to no de N 42 is iden tical to the prefx P3 and that for the non-n ull p oin ter to N 43 is iden tical to P 4. So, w e can safely reduce the v alue

PAGE 88

74 of E ntr ies (3) from 5 to 3. Note also that if the target lengths for the example of Figure 3{2(a) are 1, 3, and 5, then the n um b er of prefxes and mark ers in H 3 is 4. Ho w ev er, E xpansionC ost (2 ; 3) + E ntr ies (3) = 2 + 5 = 7. Exclusiv e of the time needed to compute E xpansionC ost and E ntr ies the complexit y of computing T ( W ; k ) and the target k lengths using Equations 3.1 and 3.2 is O ( k W 2 ) [80 ]. So, the o v erall complexit y is O ( nW 2 ) (note that n k ). As noted ab o v e, w e ma y reduce the time required to compute E xpansionC ost on practical prefx-sets b y p erforming a p ostorder tra v ersal of the 1-bit trie. Hence, for practical prefx-sets, the o v erall run time is O ( nW + k W 2 ). 3.2 Optimal-Storage Algorithm As noted in Section 3.1, the algorithm of Sriniv asan [80 ] is only a heuristic for ECHT. Since T ( W ; k ) is only an upp er b ound on the cost of an optimal solution for E C H T ( P ; k ), there is no assurance that the determined target lengths actually result in an optimal or close-to-optimal solution to E C H T ( P ; k ). In this section, w e dev elop an algorithm to determine the storage cost of an optimal solution to E C H T ( P ; k ). The algorithm is easily extended to determine the target lengths that yield this optimal storage cost. Lik e the heuristic of Sriniv asan [80], our algorithm uses dynamic programming. Ho w ev er, w e mo dify the defnition of expansion cost and in tro duce an accurate w a y to coun t the n um b er of mark ers. Although the heuristic of Sriniv asan [80] is insensitiv e to the shap e of the binary tree that describ es the binary searc h, the optimal-storage algorithm cannot b e insensitiv e to this shap e. T o see this, notice that the binary tree of Figure 3{1(b) corresp onds to the traditional w a y to program a binary searc h. In this, if l ow and up defne the curren t searc h range, then the next comparison is made at mid = b ( l ow + up ) = 2 c If instead, w e w ere to mak e the next comparison at mid = d ( l ow + up ) = 2 e the searc h is describ ed b y the binary tree of Figure 3{3. When a binary searc h is p erformed according to this tree, only H 4 need ha v e mark ers. The mark ers in H 4 are the same regardless

PAGE 89

75 H H H H H 1 2 4 7 6 Figure 3{3: Alternativ e binary tree for binary searc h of whether w e use mid = b ( l ow + up ) = 2 c or mid = d ( l ow + up ) = 2 e By using the latter defnition of mid w e eliminate mark ers from all remaining hash tables. In our dev elopmen t of the optimal-storage algorithm, w e assume that mid = d ( l ow + up ) = 2 e is used. Our dev elopmen t is easily altered for the case when mid = b ( l ow + up ) = 2 c is used. 3.2.1 Expansion Cost Defne E C ( i; j ), 1 i j W to b e the n um b er of distinct prefxes that result when all prefxes in P q 2 P i q j are expanded to length j Note that E C ( i; i ) is the n um b er of prefxes in P whose length is i W e ma y compute E C b y tra v ersing the 1-bit trie for P in a p ostorder fashion. Eac h no de x at lev el i )Tj/T1_0 11.9552 Tf9.7094 0 Td(1 of the trie main tains a lo cal set of E C ( i; j ) v alues, LE C ( i; j ), whic h is the expansion cost measured relativ e to the prefxes in the subtree of whic h x is ro ot. Some of the cases for the computation of x:LE C ( i; j ) are giv en b elo w. x:LE C ( i; i ) equals the n um b er of prefxes stored in no de x F or example, for no de N 1 of Figure 3{2(a), LE C (1 ; 1) = 1 and for no de N 54, LE C (5 ; 5) = 2. F or the remaining cases, assume i < j If x has a prefx in its left data feld (e.g., the prefx in the left data feld of no de N 32 is P 4) and also has one in its righ t data feld, then x:LE C ( i; j ) = 2 j )Tj/T1_6 7.9701 Tf6.5992 0 Td(i +1 If x has no prefxes (e.g., no des N 41 and N 42) and x has non-n ull left and righ t subtrees, then x:LE C ( i; j ) = x:l ef tC hil d:LE C ( i + 1 ; j ) + x:r ig htC hil d ( i + 1 ; j ).

PAGE 90

76 If x has a righ t prefx and a non-n ull left subtree, then x:LE C ( i; j ) = x:l ef t C hil d:LE C ( i + 1 ; j ) + 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(i The remaining cases are similar to those giv en ab o v e. One ma y v erify that E C ( i; j ) is just the sum of the LE C ( i; j ) v alues tak en o v er all no des at lev el i )Tj/T1_0 11.9552 Tf11.8693 0 Td(1 of the trie. Figure 3{4 giv es the LE C and E C v alues for the example of Figure 3{2. In this fgure, LE C 51, for example, refers to the LE C v alues for no de N 51. j = 4 5 4 1 LEC43[4,j] j = 4 0 LEC41[4,j] 2 5 j = 2 3 1 3 4 6 5 14 j = 5 1 j = 5 2 j = 4 5 0 1 j = 3 4 1 2 5 6 LEC32[3,j] LEC1[1,j] LEC51[5,j] j = 5 1 LEC53[5,j] j = 5 1 LEC52[5,j] LEC42[4,j] LEC54[5,j] LEC31[3,j] LEC2[2,j] 5 4 j = 3 1 2 4 2 3 j = 1 1 3 7 4 14 5 30 (a) LE C v alues i = 1 2 3 4 5 2 3 EC[i,j] j = 1 1 3 2 1 3 7 14 4 5 30 6 4 1 14 10 7 5 (b) E C v alues Figure 3{4: LE C and E C v alues for Figure 3{2 Since a 1-bit trie for n prefxes ma y ha v e O ( nW ) no des, w e ma y compute all E C v alues in O ( nW 2 ) time b y computing the LE C v alues as ab o v e and summing up the computed LE C v alues. A p ostorder tra v ersal suces for this. As noted in [71], the 1-bit tries for practical prefx sets ha v e O ( n ) no des. Hence, in practice, the E C v alues tak e only O ( nW ) time to compute. 3.2.2 Num b er of Mark ers Defne M C ( i; j; m ), 1 i j m W to b e the n um b er of mark ers in H j under the follo wing assumptions. The prefx set comprises only those prefxes of P whose length is at most m The target lengths include i )Tj/T1_0 11.9552 Tf12.4694 0 Td(1 (for notational con v enience, w e assume that 0 is a trivial target length for whic h H 0 is alw a ys empt y) and j but no length b et w een i )Tj/T1_0 11.9552 Tf12.7094 0 Td(1 and j Hence, prefxes whose length is i i + 1, or j are

PAGE 91

77 expanded to length j Only prefxes whose length is b et w een j + 1 and m ma y lea v e a mark er in H j F or M C (2 ; 4 ; 5) (Figure 3{2), P 6 through P 10 ma y lea v e mark ers in H 4 The candidate mark ers are obtained b y considering only the frst four bits of eac h of these prefxes. Hence, the candidate mark ers are 0000*, 0001*, 0011*, and 0100*. Ho w ev er, since the next smaller target length is 1, P 2, P 3, and P 4 will lea v e a prefx in H 4 The prefxes in H 4 are 0100*, 0101*, 0110*, 0111*, 0010*, and 0011*. So, of the candidate mark ers, only 0000* and 0001* are dieren t from the prefxes in H 4 Therefore, the mark er coun t M C (2 ; 4 ; 5) is 2. W e ma y compute all M C ( i; j; m ) v alues in O ( nW 3 ) time ( O ( nW 2 ) for practical prefx sets) using a lo cal function LM C in eac h no de of the 1-bit trie and a p ostorder tra v ersal. The metho d is v ery similar to that describ ed in Section 3.2.1 for the computation of all E C v alues. Figure 3{5 sho ws the LM C and M C v alues for our example of Figure 3{2. 0 j = 4 5 k = 4 5 1 0 0 j = 4 5 k = 4 2 0 0 j = 4 5 k = 4 5 1 0 LMC54[5,j,k] LMC42[4,j,k] LMC31[3,j,k] LMC53[5,j,k] LMC41[4,j,k] LMC43[4,j,k] LMC32[3,j,k] LMC2[2,j,k] k = 5 j = 5 0 k = 5 0 j = 5 j = 5 k = 5 0 LMC52[5,j,k] LMC51[5,j,k] k = 5 0 j = 5 5 5 4 k = 3 0 0 1 2 0 0 j = 3 4 5 5 4 k = 3 0 0 0 0 0 0 5 4 3 k = 2 j = 3 4 5 1 1 2 0 1 0 0 1 0 0 j = 2 3 4 5 k = 1 2 3 4 0 1 1 1 0 1 1 0 0 0 0 2 1 1 1 1 5 LMC1[1,j,k] j = 1 2 3 4 5 (a) LM C v alues MC[2,j,k] j = 2 3 4 5 k = 2 3 4 5 0 1 1 1 0 0 1 0 2 0 MC[5,j,k] 0 j = 5 k = 5 MC[3,j,k] j = 3 4 5 k = 3 4 5 0 0 1 0 2 0 MC[4,j,k] 0 j = 4 5 k = 4 5 4 0 MC[1,j,k] j = 1 2 3 4 5 k = 1 2 3 4 0 1 1 1 1 5 0 1 1 1 0 0 1 0 2 0 (b) M C v alues Figure 3{5: LM C and M C v alues for Figure 3{2

PAGE 92

78 3.2.3 Algorithm for ECHT Let O pt ( i; j; r ) b y the storage requiremen t of the optimal solution to E C H T ( P ; r ) under the follo wing restrictions. Only prefxes of P whose length is b et w een i and j are considered. Exactly r target lengths are used. j is one of the target lengths (ev en if there is no prefx whose length is j ). Let l max l max W b e the length of the longest prefx in P W e see that O pt (1 ; l max ; k ) is the storage requiremen t of the optimal solution to E C H T ( P ; k ). When r = 1, there is exactly one target length, j So, all prefxes m ust b e expanded to this length and there are no mark ers. Therefore, O pt ( i; j; 1) = E C ( i; j ) ; i j (3.3) When r = 2, one of the target lengths is j and the other, sa y m lies b et w een i and j )Tj/T1_0 11.9552 Tf12.3494 0 Td(1. Because w e assume mid = d ( l ow + up ) = 2 e the frst searc h is made in H j and the second in H m Consequen tly neither H j nor H m has an y mark ers. H j ( H m ) includes prefxes resulting from the expansion of prefxes of P whose length is b et w een m + 1 and j ( i and m ). So, O pt ( i; j; 2) = min i m 2. Let the r target lengths b e l 1 < l 2 < < l r Supp ose that the mid = d (1 + r ) = 2 e th target length is v Let u )Tj/T1_0 11.9552 Tf12.7093 0 Td(1 b e the largest target length suc h that u )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 < v The frst searc h of the binary searc h is done in H v The n um b er of prefxes and mark ers in H v is E C ( u; v ) + M C ( u; v ; j ). Additionally the mid )Tj/T1_0 11.9552 Tf10.7894 0 Td(1 = b ( r )Tj/T1_0 11.9552 Tf10.7894 0 Td(1) = 2 c target lengths that are less than v defne an optimal ( mid )Tj/T1_0 11.9552 Tf10.7894 0 Td(1)target-length solution for prefxes whose length is b et w een i and u )Tj/T1_0 11.9552 Tf12.7093 0 Td(1 sub ject to

PAGE 93

79 the constrain t that u )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 is a target length (notice that there are no mark ers in this solution for prefxes whose length exceeds u )Tj/T1_0 11.9552 Tf11.7494 0 Td(1) and the r )Tj/T1_1 11.9552 Tf11.6293 0 Td(m = b ( r )Tj/T1_0 11.9552 Tf11.7494 0 Td(1) = 2 c target lengths greater than v defne an optimal ( r )Tj/T1_1 11.9552 Tf12.2293 0 Td(m )-target-length solution for prefxes whose length is b et w een v + 1 and j sub ject to the constrain t that j is a target length. Hence, w e obtain the follo wing recurrence for O pt ( i; j; r ). O pt ( i; j; r ) = min i + d ( r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1) = 2 e u v j b ( r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1) = 2 c f O pt ( i; u )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; d ( r )Tj/T1_0 11.9552 Tf11.8693 0 Td(1) = 2 e ) + O pt ( v + 1 ; j; b ( r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) = 2 c ) + E C ( u; v ) + M C ( u; v ; j ) g ; 3 r j )Tj/T1_1 11.9552 Tf11.9894 0 Td(i + 1 (3.5) Using Equations 3.3{3.5 to compute O pt (1 ; 5 ; 4) for the example of Figure 3{2, w e get O pt (1 ; 5 ; 4) = min 3 u v 4 f O pt (1 ; u )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 ; 2) + O pt ( v + 1 ; 5 ; 1) + E C ( u; v ) + M C ( u; v ; j ) g = min f O pt (1 ; 2 ; 2) + O pt (4 ; 5 ; 1) + E C (3 ; 3) + M C (3 ; 3 ; 5) ; O pt (1 ; 2 ; 2) + O pt (5 ; 5 ; 1) + E C (3 ; 4) + M C (3 ; 4 ; 5) ; O pt (1 ; 3 ; 2) + O pt (5 ; 5 ; 1) + E C (4 ; 4) + M C (4 ; 4 ; 5) g = min f E C (1 ; 1) + E C (2 ; 2) + E C (4 ; 5) + E C (3 ; 3) + M C (3 ; 3 ; 5) ; E C (1 ; 1) + E C (2 ; 2) + E C (5 ; 5) + E C (3 ; 4) + M C (3 ; 4 ; 5) ; min f E C (1 ; 1) + E C (2 ; 3) ; E C (1 ; 2) + E C (3 ; 3) g + E C (5 ; 5) + E C (4 ; 4) + M C (4 ; 4 ; 5) g = min f 1 + 1 + 7 + 2 + 1 ; 1 + 1 + 5 + 4 + 2 ; min f 1 + 3 ; 3 + 2 g + 5 + 1 + 4 g = min f 12 ; 13 ; 14 g = 12 : F rom the ab o v e computations, w e see that the optimal expansion lengths are 1, 2, 3, and 5. Figure 3{6(a) sho ws the CHT structure that results when these four target

PAGE 94

80 lengths are used. The three mark ers are sho wn in b oldface, t w o of these mark ers are also prefxes ( P 3 and P 4). The storage cost is 12. P2=01* P1=1* 2(length 2) 3(length 3) 4(length 5) 1(length 1) Array of hash table pointers P10=01001* P9=01000* P8=00110* P7=00010* P6=00001* P5=01010* P5=01011* P3=001* P4=010* 000* (a) 4 target lengths P1=1* 2(length 3) 4(length 5) 1(length 1) Array of hash table pointers P10=01001* P9=01000* P8=00110* P7=00010* P6=00001* P5=01010* P5=01011* P2=011* P4=010* P3=001* 000* (b) 3 target lengths Figure 3{6: Optimal-storage CHTs for Figure 3{2 Complexit y T o solv e Equations 3.3{3.5 for O pt (1 ; l max ; k ), w e need to compute O ( k W 2 ) O pt ( i; j; r ) v alues. Eac h of these v alues ma y b e computed in O ( W 2 ) time from earlier computed O pt v alues. Hence, exclusiv e of the time needed to compute E C and M C the time to compute O pt (1 ; l max ; k ) is O ( k W 4 ). Adding in the time to compute E C and M C w e get O ( nW 3 + k W 4 ) as the o v erall time needed to solv e the ECHT problem. Of course, on practical data sets, the time is O ( nW 2 + k W 4 ). 3.3 Alternativ e F orm ulation In the E C H T ( P ; k ) problem, w e are to fnd exactly k target lengths for P that minimize the n um b er of (expanded) prefxes and mark ers (i.e., minimize storage cost). Although k is determined b y constrain ts on required lo okup p erformance, the determined k is only an upp er b ound on the n um b er of target lengths b ecause using a smaller n um b er of target lengths impro v es lo okup p erformance. The E C H T problem is form ulated with the premise that using a smaller k will lead to increased storage cost, and so in the in terest of conserving storage/memory while meeting the lo okup p erformance requiremen t, w e use the maxim um p ermissible n um b er of target lengths. Ho w ev er, this premise is not true. As an example, consider the prefx set

PAGE 95

81 P = f P 1 ; P 2 ; P 3 g = f 0 ; 00 ; 010 g The solution for E C H T ( P ; 2) uses the target lengths 2 and 3; P 1 expands to 00* and 01* but the 00* expansion is dominated b y P 2 and is discarded; no mark ers are stored in either H 2 or H 3 ; and the storage cost is 3. The solution for E C H T ( P ; 3), on the other hand, uses the target lengths 1, 2, and 3; no prefxes are expanded; H 2 needs a mark er 01* for P 3; and the total storage cost is 4! With this in mind, w e form ulate the AC H T ( P ; k ) problem in whic h w e are to fnd at most k target lengths for P that minimize the storage cost. In case of a tie, the solution with the smaller n um b er of target lengths is preferred, b ecause this solution has a reduced a v erage lo okup for the preceding example, the solution to E C H T ( P ; 3) is f 1 ; 2 ; 3 g whereas the solution to AC H T ( P ; 3) is f 2 ; 3 g F or the example of Figure 3{2, the solution to E C H T ( P ; 4) is f 1 ; 2 ; 3 ; 5 g resulting in a storage cost of 12; the solution to AC H T ( P ; 4) is f 1 ; 3 ; 5 g resulting in a storage cost that is also 12 (see Figure 3{6(b)). The AC H T problem ma y b e solv ed in the same asymptotic time as needed for the E C H T problem b y frst computing O pt ( i; j; r ), 1 i < j l max 1 r k and then fnding the r for whic h O pt (1 ; l max ; r ) is minim um, 1 r k 3.4 Reduced-Range Heuristic W e frst adapt the E C H T heuristic of Sriniv asan [80] to the AC H T problem. F or this purp ose, w e defne the function C whic h is the AC H T analog of T T o get the defnition of C simply replace E C H T ( Q; r ) b y AC H T ( Q; r ) in the defnition of T Also, w e use the same defnitions for E xpansionC ost (no w abbreviated to E C ost ) and E ntr ies as used in [80] (see Section 3.1). It is easy to see that C ( j; r ) C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) ; r > 1. A simple dynamic programming recurrence for C is

PAGE 96

82 C ( j; r ) = E ntr ies ( j ) + min m 2f 0 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) g ; j > 0 ; r > 1 (3.6) C (0 ; r ) = 0 ; C ( j; 1) = E ntr ies ( j ) + E C ost (1 ; j ) ; j > 0 (3.7) T o see the correctness of Equations 3.6 and 3.7, note that when j > 0, there m ust b e at least one target length. If r = 1, then there is exactly one target length. This target length is at most j (the target length is j when there is at least one prefx of this length) and so E ntr ies ( j ) + E C ost (1 ; j ) is an upp er b ound on the storage cost. If r > 1, let m and s m < s b e the t w o largest target lengths in the solution to AC H T ( P ; r ). m could b e at an y of the lengths 0 through j )Tj/T1_0 11.9552 Tf12.2293 0 Td(1; m = 0 w ould mean that there is only 1 target length. Hence the storage cost is b ounded b y E ntr ies ( j ) + C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ). Since w e do not kno w the v alue of m w e ma y minimize o v er all c hoices for m C (0 ; r ) = 0 is a b oundary condition. W e ma y obtain an alternativ e recurrence for C ( j; r ) in whic h the range of m on the righ t side is r )Tj/T1_0 11.9552 Tf12.1094 0 Td(1 : : : j )Tj/T1_0 11.9552 Tf12.1094 0 Td(1 rather than 0 : : : j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1. First, w e obtain the follo wing dynamic programming recurrence for C : C ( j; r ) = min f C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; T ( j; r ) g ; r > 1 (3.8) C ( j; 1) = E ntr ies ( j ) + E C ost (1 ; j ) (3.9) The rationale for Equation 3.8 is that the b est CHT that uses at most r target lengths either uses at most r )Tj/T1_0 11.9552 Tf10.3094 0 Td(1 target lengths or uses exactly r target lengths. When at most r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 target lengths are used, the cost is b ounded b y C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1), and when

PAGE 97

83 exactly r target lengths are used, the cost is b ounded b y T ( j; r ), whic h is defned b y Equation 3.1. Let U ( j; r ) b e as defned in Equation 3.10. U ( j; r ) = E ntr ies ( j ) + min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) g ; j > 0 ; r > 1 (3.10) F rom Equations 3.1 and 3.8 w e obtain C ( j; r ) = min f C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; U ( j; r ) g ; r > 1 (3.11) T o see the correctness of Equation 3.11, note that for all j and r suc h that r j; T ( j; r ) C ( j; r ). F urthermore, E ntr ies ( j ) + min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f T ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) g E ntr ies ( j ) + min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) g = U ( j; r ) (3.12) Therefore, when C ( j; r )Tj/T1_0 11.9552 Tf11.0293 0 Td(1) U ( j; r ), Equations 3.8 and 3.11 compute the same v alue for C ( j; r ). When C ( j; r )Tj/T1_0 11.9552 Tf12.5894 0 Td(1) > U ( j; r ), it app ears from Equation 3.12 that Equation 3.11 ma y compute a smaller C ( j; r ) than is computed b y Equation 3.8. Ho w ev er, this is imp ossible, b ecause C ( j; r ) = E ntr ies ( j ) + min m 2f 0 :::j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) g E ntr ies ( j ) + min m 2f r )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 ::: j )Tj/T1_4 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + E C ost ( m + 1 ; j ) g Therefore, the C ( j; r )s computed b y Equations 3.8 and 3.11 are equal.

PAGE 98

84 In the remainder of this section, w e use the reduced ranges r )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 : : : j )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 for C Heuristically the range for m (in Equation 3.6) ma y b e restricted to a range that is (often) considerably smaller than r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 : : : j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1. The narro w er range w e wish to use is max f M ( j )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 ; r ) ; M ( j; r )Tj/T1_0 11.9552 Tf11.7494 0 Td(1) ; r )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 g : : : j )Tj/T1_0 11.9552 Tf11.8694 0 Td(1, where M ( j; r ) ; r > 1 ; is the smallest m that minimizes C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( m + 1 ; j ) in Equation 3.6. Although the use of this narro w er range could result in dieren t results from what w e get using the range r )Tj/T1_0 11.9552 Tf10.7894 0 Td(1 : : : j )Tj/T1_0 11.9552 Tf10.7894 0 Td(1, on our b enc hmark prefx sets, this do esn't happ en. In the remainder of this section, w e deriv e a condition Z on the 1-bit trie that, if satisfed, guaran tees that the use of the narro w er range yields the same results as when the range r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 : : : j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 is used. Let P b e the set of prefxes represen ted b y the 1-bit trie. Let exp ( i; j ), i j b e the set of distinct prefxes obtained b y expanding the prefxes of P whose length is b et w een i and j )Tj/T1_0 11.9552 Tf12.5894 0 Td(1 to length j Note that exp ( i; i ) = ; and that j exp ( i; j ) j = E C ost ( i; j ). W e sa y that exp ( i; j ) co v ers a length j prefx p of P i p 2 exp ( i; j ). Let n ( i; j ) b e the n um b er of length j prefxes in P that are not co v ered b y a prefx of exp ( i; j ). Note that n ( i; i ) is the n um b er of length i prefxes in P The condition Z that ensures that the use of the narro w er range pro duces the same C v alues as when the range r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 : : : r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 is used is Z = E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( b; j ) 2( n ( b; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; j )) where, 0 < a < b j Lemma 9 F or every 1-bit trie, (a) E C ost ( i; j + 1) 2 E C ost ( i; j ) ; 0 < i j and (b) E C ost ( i; j ) E C ost ( i + 1 ; j ) ; 0 < i < j .

PAGE 99

85 Pro of (a) E C ost ( i; j + 1) = j exp ( i; j + 1) j = 2[ j exp ( i; j ) j + n ( i; j )] = 2 E C ost ( i; j ) + 2 n ( i; j ) 2 E C ost ( i; j ) (b) Since, exp ( i + 1 ; j ) exp ( i; j ), E C ost ( i; j ) = j exp ( i; j ) j j exp ( i + 1 ; j ) j = E C ost ( i + 1 ; j ). Lemma 10 8 ( j > 0 ; i < j )[ E ntr ies ( j ) + E C ost ( i; j ) E ntr ies ( j + 1) + E C ost ( i; j + 1)] : Pro of By defnition, E ntr ies ( j ) = n um b er of prefxes of length j plus the n um b er of no des at lev el j of the trie (this latter n um b er equals the n um b er of p oin ters from lev el j )Tj/T1_0 11.9552 Tf10.4293 0 Td(1 to lev el j ). Since eac h length j prefx expands to 2 length j + 1 prefxes, the frst term in the sum for E ntr ies ( j ) is at most E C ost ( i; j + 1) = 2. Since the subtree ro oted at eac h lev el j no de con tains at least one prefx, the second term in the sum for E ntr ies ( j ) is at most E ntr ies ( j + 1). So, E ntr ies ( j ) E C ost ( i; j + 1) = 2 + E ntr ies ( j + 1) F rom Lemma 9(a), E C ost ( i; j ) E C ost ( i; j +1) = 2. So, E ntr ies ( j )+ E C ost ( i; j ) E C ost ( i; j + 1) + E ntr ies ( j + 1). Lemma 11 8 ( j > 0 ; r > 0)[ C [ j; r ] C [ j + 1 ; r ]] : Pro of First, consider the case when r = 1. F rom Equation 3.7, w e get C ( j; 1) = E ntr ies ( j ) + E C ost (1 ; j ) and C ( j + 1 ; 1) = E ntr ies ( j + 1) + E C ost (1 ; j + 1). F rom Lemma 10, E ntr ies ( j ) + E C ost (1 ; j ) E ntr ies ( j + 1) + E C ost (1 ; j + 1). Hence, C ( j; 1) C ( j + 1 ; 1).

PAGE 100

86 Next, consider the case r > 1. F rom the defnition of M ( j; r ), it follo ws that C ( j + 1 ; r ) = E ntr ies ( j + 1) + C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j + 1) ; where 0 b = M ( j + 1 ; r ) j When b < j using Equation 3.6 and Lemma 9, w e get C ( j; r ) E ntr ies ( j ) + C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j ) E ntr ies ( j + 1) + C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j + 1) = C ( j + 1 ; r ) : When b = j C ( j + 1 ; r ) = E ntr ies ( j + 1) + C ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( j + 1 ; j + 1) C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) : The remaining lemmas of this section assume that Z is true. Lemma 12 E C ost ( a; j + 1) )Tj/T1_1 11.9552 Tf10.6694 0 Td(E C ost ( b; j + 1) E C ost ( a; j ) )Tj/T1_1 11.9552 Tf10.6693 0 Td(E C ost ( b; j )] 0 < a < b j Pro of F rom the defnition of n ( i; j ), it follo ws that E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( b; j ) = j )Tj/T1_7 7.9701 Tf6.5992 0 Td(1 X l = a n ( a; l ) 2 j )Tj/T1_5 7.9701 Tf6.5992 0 Td(l )Tj/T1_5 7.9701 Tf13.32 15.36 Td(j )Tj/T1_7 7.9701 Tf6.5992 0 Td(1 X l = b n ( b; l ) 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(l = b )Tj/T1_7 7.9701 Tf6.5992 0 Td(1 X l = a n ( a; l ) 2 j )Tj/T1_5 7.9701 Tf6.5992 0 Td(l )Tj/T1_5 7.9701 Tf13.32 15.36 Td(j )Tj/T1_7 7.9701 Tf6.5992 0 Td(1 X l = b [ n ( b; l ) )Tj/T1_1 11.9552 Tf11.8693 0 Td(n ( a; l )] 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(l

PAGE 101

87 Hence, E C ost ( a; j + 1) )Tj/T1_1 11.9552 Tf19.3094 0 Td(E C ost ( b; j + 1) = b )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 X l = a n ( a; l ) 2 j +1 )Tj/T1_3 7.9701 Tf6.5992 0 Td(l )Tj/T1_3 7.9701 Tf18.6 15.48 Td(j X l = b [ n ( b; l ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; l )] 2 j +1 )Tj/T1_3 7.9701 Tf6.5993 0 Td(l = 2[ b )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 X l = a n ( a; l ) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(l )Tj/T1_3 7.9701 Tf18.6 15.36 Td(j X l = b [ n ( b; l ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; l )] 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(l ] = 2[ b )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 X l = a n ( a; l ) 2 j )Tj/T1_3 7.9701 Tf6.5992 0 Td(l )Tj/T1_3 7.9701 Tf13.2 15.36 Td(j )Tj/T1_5 7.9701 Tf6.5992 0 Td(1 X l = b [ n ( b; l ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; l )] 2 j )Tj/T1_3 7.9701 Tf6.5993 0 Td(l ] )Tj/T1_0 11.9552 Tf9.2294 0 Td(2[ n ( b; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; j )] = 2[ E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( b; j )] )Tj/T1_0 11.9552 Tf11.8693 0 Td(2[ n ( b; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; j )] Hence, when condition Z is true, E C ost ( a; j +1) )Tj/T1_1 11.9552 Tf9.2294 0 Td(E C ost ( b; j +1) E C ost ( a; j ) )Tj/T1_1 11.9552 Tf-422.159 -23.88 Td(E C ost ( b; j ). Lemma 13 8 ( j > 0 ; r > 1)[ M ( j + 1 ; r ) M ( j; r )] : Pro of Let M ( j; r ) = a and M ( j + 1 ; r ) = b Supp ose b < a Then, C ( j; r ) = E ntr ies ( j ) + C ( a; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( a + 1 ; j ) < E ntr ies ( j ) + C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j ) since, otherwise, M ( j; r ) = b C ( j + 1 ; r ) = E ntr ies ( j + 1) + C ( b; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost [ b + 1 ; j + 1] E ntr ies ( j + 1) + C ( a; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost [ a + 1 ; j + 1] Therefore, E C ost ( a + 1 ; j ) + E C ost ( b + 1 ; j + 1) < E C ost ( b + 1 ; j ) + E C ost ( a + 1 ; j + 1) Hence, E C ost ( b + 1 ; j + 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j + 1) < E C ost ( b + 1 ; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j )

PAGE 102

88 F rom Lemma 12, this is not p ossible when condition Z is true. So, when condition Z is true, b a The next few lemmas use the function whic h is defned as ( j; r ) = C ( j; r )Tj/T1_0 11.9552 Tf-421.797 -23.88 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(C ( j; r ). Since, C ( j; r ) C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) ; ( j; r ) 0 for all j > 0 and all r > 1. Lemma 14 8 ( j > 0)[( j; 2) ( j + 1 ; 2)] Pro of If C ( j; 2) = C ( j; 1), there is nothing to pro v e as ( j + 1 ; 2) 0. The only other p ossibilit y is C ( j; 2) < C ( j; 1) (i.e., ( j; 2) > 0). In this case, the solution for AC H T ( j; 2) uses exactly 2 target lengths. F rom the recurrence for C (Equations 3.6 and 3.7), it follo ws that C ( j; 1) = E ntr ies ( j ) + E C ost (1 ; j ), and C ( j; 2) = E ntr ies ( j ) + C ( a; 1) + E C ost ( a + 1 ; j ) = E ntr ies ( j ) + E ntr ies ( a ) + E C ost (1 ; a ) + E C ost ( a + 1 ; j ) ; for some a 0 < a < j Therefore, ( j; 2) = C ( j; 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j; 2) = E C ost (1 ; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E ntr ies ( a ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost (1 ; a ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j ) : F rom Equations 3.6 and 3.7, it follo ws that C ( j + 1 ; 2) E ntr ies ( j + 1) + C ( a; 1) + E C ost ( a + 1 ; j + 1) = E ntr ies ( j + 1) + E ntr ies ( a ) + E C ost (1 ; a ) + E C ost ( a + 1 ; j + 1) Hence, ( j + 1 ; 2) E C ost (1 ; j + 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E ntr ies ( a ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost (1 ; a ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j + 1) :

PAGE 103

89 Therefore, ( j + 1 ; 2) )Tj/T1_0 11.9552 Tf19.3094 0 Td(( j; 2) E C ost (1 ; j + 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E ntr ies ( a ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost (1 ; a ) )Tj/T1_1 11.9552 Tf11.8693 0 Td(E C ost ( a + 1 ; j + 1) )Tj/T1_1 11.9552 Tf9.2294 0 Td(E C ost (1 ; j ) + E ntr ies ( a ) + E C ost (1 ; a ) + E C ost ( a + 1 ; j ) = [ E C ost (1 ; j + 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j + 1)] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ E C ost (1 ; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( a + 1 ; j )] F rom this and Lemma 12, it follo ws that when Z is true, ( j + 1 ; 2) )Tj/T1_0 11.9552 Tf12.1094 0 Td(( j; 2) 0. Lemma 15 8 ( j > 0 ; k > 2)[(( j; k )Tj/T1_0 11.9552 Tf13.0694 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf13.1893 0 Td(1)] = ) 8 ( j > 0 ; k > 2)[(( j; k ) ( j + 1 ; k )] : Pro of Assume that 8 ( j > 0 ; k > 2)[(( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)]. W e shall sho w that 8 ( j > 0 ; k > 2)[(( j; k ) ( j + 1 ; k )]. Let M ( j; k ) = b and M ( j + 1 ; k )Tj/T1_0 11.9552 Tf10.9093 0 Td(1) = c Case 1: c b ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E ntr ies ( j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( b + 1 ; j ) E ntr ies ( j ) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( b + 1 ; j ) )Tj/T1_1 11.9552 Tf9.2294 0 Td(E ntr ies ( j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( b + 1 ; j ) = ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) : Also, ( j + 1 ; k ) = C ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j + 1 ; k ) E ntr ies ( j + 1) + C ( c; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + E C ost ( c + 1 ; j + 1) )Tj/T1_1 11.9552 Tf9.2294 0 Td(E ntr ies ( j + 1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( c + 1 ; j + 1) = ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) :

PAGE 104

90 Since c b ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1). Therefore, ( j + 1 ; k ) ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ( j; k ) : Case 2: c < b Let M [ j + 1 ; k ] = a M [ j; k ] = b M [ j + 1 ; k )Tj/T1_0 11.9552 Tf11.7494 0 Td(1] = c and M [ j; k )Tj/T1_0 11.9552 Tf11.6293 0 Td(1] = d F rom Lemma 13, a b and c d Since b > c a b > c d Also, ( j; k ) = C ( j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j; k ) = [ E ntr ies ( j ) + C ( d; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( d + 1 ; j )] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ E ntr ies ( j ) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j )] and ( j + 1 ; k ) = C ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(C ( j + 1 ; k ) = [ E ntr ies ( j + 1) + C ( c; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + E C ost ( c + 1 ; j + 1)] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ E ntr ies ( j + 1) + C ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + E C ost ( a + 1 ; j + 1)] : Therefore, ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf11.8694 0 Td(( j; k ) = [ C ( c; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( c + 1 ; j + 1)] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( d; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( d + 1 ; j )] +[ C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j )] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ C ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + E C ost ( a + 1 ; j + 1)] : (3.13) Since j > b > c d = M [ j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1], E ntr ies ( j ) + C ( c; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) + E C ost ( c + 1 ; j ) E ntr ies ( j ) + C ( d; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( d + 1 ; j ) (3.14)

PAGE 105

91 F urthermore, since M ( j + 1 ; k ) = a b E ntr ies ( j + 1) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( b + 1 ; j + 1) E ntr ies ( j + 1) + C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( a + 1 ; j + 1) (3.15) Substituting Equations 3.14 and 3.15 in to Equation 3.13, w e get ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf11.8694 0 Td(( j; k ) [ E C ost ( c + 1 ; j + 1) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( b + 1 ; j + 1)] )Tj/T1_0 11.9552 Tf9.2294 0 Td([ E C ost ( c + 1 ; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( b + 1 ; j )] : Lemma 12 and c < b imply that when Z is true, E C ost ( c + 1 ; j + 1) )Tj/T1_1 11.9552 Tf12.3494 0 Td(E C ost ( b + 1 ; j + 1) E C ost ( c + 1 ; j ) )Tj/T1_1 11.9552 Tf12.1094 0 Td(E C ost ( b + 1 ; j ). Therefore, ( j + 1 ; k ) )Tj/T1_0 11.9552 Tf12.1094 0 Td(( j; k ) 0. Lemma 16 8 ( j > 0 ; k 2)[( j; k ) ( j + 1 ; k )] : Pro of F ollo ws from Lemmas 14 and 15. Lemma 17 L et k > 2 8 ( j > 0)[( j; k )Tj/T1_0 11.9552 Tf9.8293 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf9.8293 0 Td(1)] = ) 8 ( j > 0)[ M ( j; k ) M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)] : Pro of Assume that 8 ( j > 0)[( j; k )Tj/T1_0 11.9552 Tf9.2294 0 Td(1) ( j +1 ; k )Tj/T1_0 11.9552 Tf9.2294 0 Td(1)]. Supp ose that M ( j; k )Tj/T1_0 11.9552 Tf9.2294 0 Td(1) = a M ( j; k ) = b and b < a for some j; j > 0. F rom Equation 3.6, w e get C ( j; k ) = E ntr ies ( j ) + C ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + E C ost ( b + 1 ; j ) E ntr ies ( j ) + C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( a + 1 ; j ) and C ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) = E ntr ies ( j ) + C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( a + 1 ; j ) < E ntr ies ( j ) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) + E C ost ( b + 1 ; j ) : Hence, C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + C ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(2) < C ( a; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + C ( b; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(2) :

PAGE 106

92 Therefore, ( a; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) < ( b; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) : Ho w ev er, b < a and 8 ( j > 0)[( j; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) ( j + 1 ; k )Tj/T1_0 11.9552 Tf11.8694 0 Td(1)] imply that ( b; k )Tj/T1_0 11.9552 Tf11.8693 0 Td(1) ( a; k )Tj/T1_0 11.9552 Tf12.4694 0 Td(1). Since our assumption that b < a leads to a con tradiction, it m ust b e that there is no j > 0 for whic h M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) = a M ( j; k ) = b and b < a Lemma 18 8 ( j > 0 ; k > 2)[ M ( j; k ) M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1)] : Pro of F ollo ws from Lemmas 16 and 17. Theorem 2 8 ( j > 0 ; k > 2)[ M ( j; k ) max f M ( j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 ; k ) ; M ( j; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; k )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 g ] : Pro of F ollo ws from Lemmas 13 and 18 and the fact that M ( j; k ) is in the range r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 j )Tj/T1_0 11.9552 Tf11.8694 0 Td(1. Note 2 F r om L emma 16, it fol lows that whenever ( j; k ) > 0 then ( q ; k ) > 0 8 q > j Since Theorem 2 holds only when condition Z is true, w e m ust b e able to c hec k for this condition in an y implemen tation that attempts to use Theorem 2 to reduce the range for m in Equation 3.6. Unfortunately the time required to c hec k condition Z exceeds the an ticipated gain from using the narro w er range. Ho w ev er, w e can pro vide go o d reason to exp ect that condition Z will hold on almost all practical data sets (certainly the condition holds on the practical data sets a v ailable to us). Therefore, w e prop ose the use of the narro w er range in practice. Ev en if the condition fails on some data set, the p enalt y for using the narro w er range w ould b e a sub optimal solution. Since C and T are themselv es only upp er b ounds on the cost of optimal solutions, it isn't clear that m uc h is to b e lost b y solving for T and C inexactly The condition Z is E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( b; j ) 2( n ( b; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; j ))

PAGE 107

93 where, 0 < a < b j W e restate this condition in terms of the n um b er of no des at lev el j )Tj/T1_0 11.9552 Tf12.1094 0 Td(1 of the 1-bit trie for the prefx set. First, put in all missing no des so that the total n um b er of no des at lev el i of the 1-bit trie is 2 i Call the new no des dummy no des Let the n um b er of dumm y no des added to lev el j )Tj/T1_0 11.9552 Tf12.5894 0 Td(1 b e dum ( j ). A no de x dumm y or otherwise, at lev el j )Tj/T1_0 11.9552 Tf12.3494 0 Td(1 is co v ered b y a length s s < j prefx i x 's ancestor at lev el s )Tj/T1_0 11.9552 Tf10.9094 0 Td(1 con tains a prefx. Lab el a no de at lev el j )Tj/T1_0 11.9552 Tf10.9094 0 Td(1 i it is co v ered b y a prefx whose length is b et w een a and b )Tj/T1_0 11.9552 Tf10.6694 0 Td(1 and b y no prefx whose length is b et w een b and j )Tj/T1_0 11.9552 Tf12.1094 0 Td(1 (equiv alen tly trace a path to w ard the ro ot from eac h no de x at lev el j ; if the frst prefx encoun tered is at one of the lev els a )Tj/T1_0 11.9552 Tf12.3494 0 Td(1 ; ; b )Tj/T1_0 11.9552 Tf12.3493 0 Td(2, lab el x ). Let N i ( j ), 0 i 2 b e the n um b er of lab eled no des at lev el j )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 that con tain exactly i prefxes (note that a dumm y no de has no prefx). W e see that E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(E C ost ( b; j ) = 2 2 X i =0 N i ( j ) F urther, let cov ( a; j ) b e the n um b er of length j prefxes co v ered b y prefxes whose length is b et w een a and j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1. So, n ( b; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(n ( a; j ) = n ( j; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(cov ( b; j ) )Tj/T1_0 11.9552 Tf11.8694 0 Td([ n ( j; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(cov ( a; j )] = cov ( a; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(cov ( b; j ) = N 1 ( j ) + 2 N 2 ( j ) Therefore, E C ost ( a; j ) )Tj/T1_1 11.9552 Tf11.9894 0 Td(E C ost ( b; j ) )Tj/T1_0 11.9552 Tf11.9894 0 Td(2( n ( b; j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(n ( a; j )) = 2( N 0 ( j ) )Tj/T1_1 11.9552 Tf11.8694 0 Td(N 2 ( j )) So, Z is true i N 0 ( j ) N 2 ( j ). The 1-bit trie for practical data sets has b et w een 2 n and 3 n no des ([71 73]). So, for large j dum ( j ) is fairly close to 2 j )Tj/T1_5 7.9701 Tf6.5993 0 Td(1 while the n um b er of no des that ha v e 2 prefxes is at most n= 2. Since the no des that comprise N 0 ( j ) are dra wn from a m uc h larger p o ol (the p o ol is dum ( j ) plus the empt y lev el j )Tj/T1_0 11.9552 Tf11.1494 0 Td(1 no des of the 1-bit trie) while those that comprise N 2 ( j ) are dra wn from a m uc h

PAGE 108

94 Heuristic OptimalLengths( W k ) // W is length of longest prefx. // k is maxim um n um b er of target lengths desired. // Return C ( W ; k ) and compute M ( ; ). f for ( j = 1; j < = W ; j + +) f C ( j; 1) := E ntr ies ( j ) + E C ost (1 ; j ); M ( j; 1) := )Tj/T1_0 11.9552 Tf9.2294 0 Td(1; g for ( r = 1; r < = k ; r + +) C (0 ; r ) := 0; for ( r = 2; r k ; r + +) for ( j = r ; j < = W ; j + +) f // Compute C ( j; r ) minJ := max ( M ( j )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; r ) ; M ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) ; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1); minC ost := E ntr ies ( j ) + C ( minJ ; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ost ( minJ + 1 ; j ); minC ost := C ( j; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1); minL := M ( j; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1); for ( m = minJ + 1; m < j ; m + +) f cost := E ntr ies ( j ) + C ( m; r )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) + E C ost ( m + 1 ; j ); if ( cost < minC ost ) then f minC ost := cost ; minL := m ; gg C [ j; r ] := minC ost ; M [ j; r ] := minL ; g return C [ W ; k ]; g Figure 3{7: Algorithm for binary-searc h hash tables smaller p o ol (the p o ol comprises the no des at lev el j )Tj/T1_0 11.9552 Tf12.3494 0 Td(1 that ha v e 2 prefxes), w e exp ect N 0 ( j ) >> N 2 ( j ). F or small j almost all no des (dumm y or otherwise) at lev el j )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 are empt y Again, w e exp ect N 0 ( j ) N 2 ( j ). Theorem 2 leads to Heuristic OptimalL engths (Figure 3{7), whic h computes C ( W ; k ). The complexit y of this algorithm is O ( k W 2 ). Using the M v alues, the at most k storage-optimal target lengths ma y b e determined in an additional O ( k ) time. When w e add in the time needed to compute the E C ost and E ntr ies v alues, the asymptotic complexit y b ecomes O ( nW 2 ). On practical data sets, the complexit y is O ( nW + k W 2 ).

PAGE 109

95 3.5 More Accurate Cost Estimator In Section 3.1, w e stated three reasons wh y the actual n um b er of prefxes and mark ers in H j ma y b e considerably less than E ntr ies ( j ) + E xpansionC ost ( m + 1 ; j ). Let E C ( i; j ) b e as defned in Section 3.2 and let M C ost ( i; j ) ; 0 < i j W b e the n um b er of subtrees ro oted at lev el j of the 1-bit trie that con tain prefxes that are not co v ered b y a prefx whose length is b et w een i and j F rom the reasons stated in Section 3.1, it follo ws that E C ( i; j ) + M C ost ( i; j ) is a more accurate estimator of actual n um b er of mark ers and prefxes in H j Using the more accurate estimator for the n um b er of prefxes and mark ers, the dynamic programming recurrence for C (Equations 3.6 and 3.7) b ecomes C ( j; r ) = min m 2f 0 :::j )Tj/T1_6 7.9701 Tf6.5992 0 Td(1 g f C ( m; r )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) + E C ( m + 1 ; j ) + M C ost ( m + 1 ; j ) g ; j > 0 ; r > 1 (3.16) C (0 ; r ) = 0 ; C ( j; 1) = E C (1 ; j ) + M C ost (1 ; j ) ; j > 0 (3.17) Since M C ost ma y b e computed in the same asymptotic time as needed to compute E C and E C ost the asymptotic complexit y of the heuristic with the more accurate cost estimator is the same as that of the heuristic of [80]. 3.6 Exp erimen tal Results W e programmed our space-optimal algorithm, reduced-range heuristic of Section 3.4, and the more accurate cost estimator heuristic of Section 3.5 in C and compared their p erformance against that of the heuristic of Sriniv asan [80 ]. All algorithms and heuristics w ere adapted for b oth the E C H T and the AC H T problems. All co des w ere compiled using the gcc compiler and optimization lev el -O2. The co des w ere run on a SUN Ultra En terprise 4000/5000 computer. F or test data, w e used the fv e IPv4 prefx databases of T able 2{1. These databases corresp ond to

PAGE 110

96 bac kb one routers. Notice that the n um b er of no des in the 1-bit trie for eac h of our databases is b et w een 2 n and 3 n where n is the n um b er of prefxes in the database. T able 3{1 sho ws the memory (i.e., sum of n um b er of prefxes and mark ers to b e stored in the hash tables) required b y the solution to E C H T ( P ; k ) for eac h of our fv e databases. The k v alues used b y us are 3, 7, and 15 (corresp onding to a lo okup p erformance of 2, 3, and 4 memory accesses p er lo okup, resp ectiv ely). F or the t w o heuristics S (heuristic of [80]) and A C (heuristic of Section 3.5, w e pro vide b oth the memory requiremen t as estimated b y the T function as w ell as the actual requiremen t of the solution generated b y these t w o heuristics (since the t w o heuristics consisten tly generated solutions with the same actual memory requiremen t, the actual requiremen t data is pro vided in a single column). This latter quan tit y is obtained b y coun ting the n um b er of prefxes and mark ers for the k lengths determined b y the heuristic. As exp ected, the use of the more accurate cost estimator in heuristic A C results in smaller T v alues. Ho w ev er, these smaller v alues do not translate in to a reduced actual memory cost. In all cases, the use of the more accurate cost estimator did not aect the selection of the k lengths and the resulting actual n um b er of prefxes and mark ers w as the same using heuristics S and A C. The space-optimal algorithm of Section 3.2 pro duces solutions whose memory requiremen t is up to 15% less than that of the t w o heuristics. In terestingly for k = 3, the t w o heuristics generate optimal solutions for all 5 of our databases. T able 3{2 giv es the memory requiremen ts of the solutions obtained b y the t w o heuristics and the optimal algorithm for AC H T ( P ; k ). F or the cases k = 3 and 7, the memory requiremen ts of the optimal solutions as w ell as those of the heuristic solutions for the \at most k lengths" v ersion of our problem are the same as those for the \exactly k lengths v ersion". Ho w ev er, for all 5 of our databases the optimal solution for AC H T ( P ; 15) is sup erior to that for E C H T ( P ; 15). The C v alue of the heuristic solutions for AC H T ( P ; 15) are smaller than the T v alues of the heuristic

PAGE 111

97 T able 3{1: Num b er of prefxes and mark ers in solution to E C H T ( P ; k ) Database k T Actual Optimal S A C P aix 3 321,189 305,147 299,884 299,884 7 167,162 158,542 144,617 124,516 15 167,048 158,428 118,687 107,195 Pb 3 143,648 138,880 133,105 133,105 7 75,036 72,313 62,598 53,318 15 74,921 72,198 50,389 44,996 MaeW est 3 140,516 136,194 131,042 13,1042 7 66,251 63,585 55,006 48,404 15 66,157 63,491 44,592 39,620 Aads 3 117,908 114,452 109,430 109,430 7 58,732 56,600 48,436 41,864 15 58,630 56,498 38,949 34,705 MaeEast 3 103,607 100,599 95,822 95,822 7 51,075 49,111 41,672 36,151 15 50,952 48,988 33,651 29,610 solutions for E C H T ( P ; 15). With the exception of MaeW est, ho w ev er, the actual cost of the heuristic solutions for AC H T ( P ; 15) are larger than the actual costs of the heuristic solutions for E C H T ( P ; 15). This is due to the fact that T and C are only upp er b ounds on actual cost. T able 3{3 giv es the prepro cessing time (i.e., the time to compute E xpansionC ost E C E ntr ies M C and M C ost as needed b y the heuristic or optimal algorithm) for our heuristics and optimal algorithm. The prepro cessing time for the optimal algorithm is 8 to 9 times that for the heuristic of [80]. The prepro cessing time for the more accurate cost-estimator heuristic is ab out 60% more than that for the heuristic of [80]. T ables 3{4 and 3{5 giv e the times needed to solv e the dynamic programming recurrences for our heuristics and our optimal-space algorithm. In these tables, RRS and RRA C refer to the reduced-range v ersions of S and A C, resp ectiv ely As exp ected from the analyses of these metho ds, the prepro cessing time is signifcan tly larger than the time needed to solv e the dynamic programming recurrences (note that the times in

PAGE 112

98 T able 3{2: Num b er of prefxes and mark ers in solution to AC H T ( P ; k ) Database k C Actual Optimal S A C P aix 3 321,189 305,147 299,884 299,884 7 167,162 158,542 144,617 124,516 15 167,034 158,414 124,832 105,978 Pb 3 143,648 138,880 133,105 133,105 7 75,036 72,313 62,598 53,318 15 74,908 72,185 54,922 44,552 MaeW est 3 140,516 136,194 131,042 131,042 7 66,251 63,585 55,006 48,404 15 66,143 63,477 44,449 39,209 Aads 3 117,908 114,452 109,430 109,430 7 58,732 56,600 48,436 41,864 15 58,617 56,485 42,735 34,299 MaeEast 3 103,607 100,599 95,822 95,822 7 51,075 49,111 41,672 36,151 15 50,937 48,973 37,076 29,255 T able 3{3: Prepro cessing time in milliseconds Database S A C Optimal P aix 540 820 4,230 Pb 280 440 2,460 MaeW est 260 390 2,210 Aads 230 350 2,049 MaeEast 210 320 1,889

PAGE 113

99 T able 3{4: Execution time, in sec, for E C H T ( P ; k ) Database k S RRS A C RRA C Optimal P aix 3 40 18 46 21 400 7 168 68 223 76 13,660 15 344 132 442 168 24,650 Pb 3 40 19 63 21 400 7 172 73 311 97 13,720 15 329 138 333 159 24,750 MaeW est 3 41 19 39 25 410 7 168 69 196 84 14,620 15 335 134 416 146 24,670 Aads 3 38 19 39 21 420 7 164 72 263 109 13,650 15 343 139 421 154 24,550 MaeEast 3 39 19 38 22 400 7 168 74 163 82 13,690 15 346 141 332 158 25,060 T able 3{3 are in milliseconds while those in T ables 3{4 and 3{5 are in microseconds). Note also that the time for the reduced-range v ersion of eac h heuristic is less than half that of the original heuristic. 3.7 Summary W e ha v e dev elop ed optimal algorithms for the E C H T ( P ; k ) and AC H T ( P ; k ) problems; sho wn ho w the dynamic programming recurrence for the heuristic of [80] ma y b e solv ed in O ( nW + k W 2 ) time on practical data sets (in con trast, the analysis of [80 ] suggests an O ( nW 2 ) complexit y); and prop osed a reduced-range heuristic as w ell as a more accurate cost-estimator heuristic. Exp erimen tal results sho w that the reduced-range heuristic reduces the time to solv e the dynamic programming recurrences b y more than a factor of 2 while yielding the same result as the original full-range heuristic. W e are unable to compare the reduction in run time that results from our metho ds to do the prepro cessing v ersus that prop osed in [80], b ecause the co de of [80 ] is una v ailable. Although the more accurate cost-estimator heuristic results in solutions with a b etter cost estimate than those pro duced b y the heuristic of

PAGE 114

100 T able 3{5: Execution time, in sec, for AC H T ( P ; k ) Database k S RRS A C RRA C Optimal P aix 3 40 18 46 21 440 7 170 66 165 113 14,420 15 344 127 420 162 26,490 Pb 3 39 19 86 27 440 7 165 69 210 82 14,500 15 342 136 401 146 25,800 MaeW est 3 39 19 39 21 430 7 173 67 231 95 14,570 15 338 129 462 170 26,060 Aads 3 40 19 39 28 420 7 168 69 264 81 14,550 15 343 131 485 149 26,240 MaeEast 3 40 19 39 22 430 7 167 70 162 109 14,470 15 346 134 365 183 26,530 [80], the actual costs of the solutions pro duced b y the t w o heuristics are the same for our test sets. The optimal-space algorithm pro duces solutions with a memory requiremen t up to 15% less than that of the solutions pro duced b y the heuristics. Ho w ev er, the optimal-space algorithm tak es b et w een 8 and 9 times as m uc h time (prepro cessing and recurrence solution time) as do es the heuristic of [80] and b et w een 5 and 6 times the time tak en b y the more accurate cost-estimator heuristic. Ev en though the optimal-space algorithm tak es signifcan tly more time than is tak en b y an y of the heuristics, its time requiremen ts are v ery practical. Therefore, the optimal-space algorithm is recommended for applications in whic h memory conserv ation is crucial. When a near-optimal solution suces, our reduced-range heuristic should b e used. Finally although, on our test sets, our more accurate cost estimator pro duced solutions that ha v e the same cost as those obtained using the cost estimator in [80 ], there seems to b e no reason to use the less accurate estimator.

PAGE 115

CHAPTER 4 O (log n ) D YNAMIC R OUTER-T ABLE In this c hapter, w e sho w ho w to use the range enco ding idea of [44] so that longest prefx matc hing as w ell as prefx insertion and deletion can b e done in O (log n ) time. Despite the in tense researc h that has b een conducted in recen t y ears, there is no kno wn w a y to p eform longest prefx matc hes as w ell as insertion and deletion of prefxes in O (log n ) time. In Section 4.1, w e describ e the range enco ding tec hnique of [44]. W e establish a few prop erties of ranges that represen t prefxes in Section 4.2. Our O (log n ) metho d is describ ed in Section 4.3. In Section 4.4, w e presen t our exp erimen tal results. These results, obtained using real IPv4 prefx databases, indicate that the O (log n ) metho d prop osed in this c hapter represen ts a go o d alternativ e to existing metho ds in en vironmen ts where there is a signifcan t n um b er of insert and/or delete op eations. F or example, our metho d tak es more time to fnd the longest matc hing prefx than do the v ariable-stride tries of [82 ]. Ho w ev er, although these tries are optimized for longest matc hing prefx searc hes, they p erform v ery p o orly when it comes to insertion and deletion of prefxes. Our prop osed metho d handily outp erforms v ariable-stride tries on these latter op erations. 4.1 Prefxes and Ranges Lampson, Sriniv asan, and V arghese [44] ha v e prop osed a binary searc h sc heme for longest prefx matc hing. In this sc heme, eac h prefx is represen ted as a range [ s; f ], where s is the start of the range for the prefx and f is the fnish of the range for that prefx. F or example, when W = 5, the prefx P = 1* matc hes all destination addresses in the range [10000, 11111] = [16, 31]. So, for prefx P s = 16 and f = 31. Figure 4{1 sho ws a set of fv e prefxes together with the start and fnish of the range 101

PAGE 116

102 Prefx Name Prefx Range Start Range Finish P1 0 31 P2 0101* 10 11 P3 100* 16 19 P4 1001* 18 19 P5 10111 23 23 Figure 4{1: Prefxes and their ranges P3 P1 P5 P2 P4 r2 r1 r3 r5 r4 r6 r7 11 10 0 16 18 19 23 31 End P oin t > = 0 P1 P1 10 P2 P2 11 P1 P2 16 P3 P3 18 P4 P4 19 P1 P4 23 P1 P5 31 P1 (a) Pictorial represen tation of (b) T able for binary searc h prefxes and ranges Figure 4{2: Pictorial and tabular represen tation of prefxes and ranges for eac h. This fgure assumes that W = 5. The prefx P1 = *, whic h matc hes all legal destination addresses, is called the default prefx. Although a real router database ma y not include the default prefx, we assume thr oughout this chapter that this pr efx is always pr esent This assumption do es not, in an y w a y aect the v alidit y of our w ork as w e ma y simply augmen t router databases that do not include the default prefx with a default prefx whose next hop feld is n ull. Prefxes and their ranges ma y b e dra wn as nested rectangles as in Figure 4{2(a), whic h giv es the pictorial represen tation of the fv e prefxes of Figure 4{1. Lampson et al. [44] prop ose the construction of a table of distinct range endp oin ts suc h as the one sho wn in Figure 4{2(b). The distinct end p oin ts (range start and fnish p oin ts) for the prefxes of Figure 4{1 are [0, 10, 11, 16, 18, 19, 23, 31]. Let r i 1 i q 2 n b e the q distinct range-end-p oin ts for a set of n prefxes. Let r q +1 = 1 Let LM P ( d ) b e the longest matc hing prefx for the destination address

PAGE 117

103 d With eac h distinct range end-p oin t, r i 1 i q the table stores the longest matc hing prefx for destination addresses d suc h that (a) r i < d < r i +1 (this is the column lab eled \ > in Figure 4{2(b)) and (b) r i = d (column lab eled \="). No w, LM P ( d ), r 1 d r q can b e determined in O (log n ) time b y p erforming a binary searc h to fnd the unique i suc h that r i d < r i +1 If r i = d LM P ( d ) is giv en b y the \=" en try; otherwise, it is giv en b y the \ > en try F or example, since d = 20 satisfes 19 d < 23 and since d 6= 19, the \ > en try of the end p oin t 19 is used to determine that LM P (20) is P1. As noted b y Lampson et al. [44], the range end-p oin t table can b e built in O ( n ) time (this assumes that the end p oin ts are a v ailable in ascending order). Unfortunately as stated in [44], up dating the range end-p oin t table follo wing the insertion or deletion of a prefx also tak es O ( n ) time b ecause O ( n ) \ > and/or \=" en tries ma y c hange. Although Lampson et al. [44] pro vide w a ys to reduce the complexit y of the searc h for the LMP b y a constan t factor, these metho ds do not result in sc hemes that p ermit prefx insertion and deletion in O (log n ) time. 4.2 Prop erties of Prefx Ranges The length l eng th ( P ), of a prefx P is the n um b er of zero es and ones in the binary represen tation of the prefx. F or example, P1 of Figure 4{1 has a length of 0 and l eng th (P4) = 4. W is the n um b er of bits in a destination address. Hence, the n um b er of bits in the start and fnish p oin ts of a prefx also is W P = [ s; f ] is a trivial pr efx i l eng th ( P ) = W (equiv alen tly i s = f ). P is a nontrivial pr efx i l eng th ( P ) < W (equiv alen tly i s 6= f ). Prefxes P1{P4 of Figure 4{1 are non trivial while P5 is a trivial prefx. Let l sb ( x ) b e the least signifcan t bit in the binary represen tation of x F or example, l sb (32) = 0 and lsb(3) = 1. Lemma 19 If P = [ s; f ] is a nontrivial pr efx, then l sb ( s ) = 0 and l sb ( f ) = 1 Pro of Since P is non trivial, l eng th ( P ) < W Therefore, s is the bits of P follo w ed b y W )Tj/T1_1 11.9552 Tf11.8694 0 Td(l eng th ( P ) > 0 zero es and f is the bits of P follo w ed b y W )Tj/T1_1 11.9552 Tf11.8694 0 Td(l eng th ( P ) > 0 ones. Consequen tly l sb ( s ) = 0 and l sb ( f ) = 1.

PAGE 118

104 u w v x (a) In tersecting ranges (b) Nested ranges u v w x (c) Disjoin t ranges Figure 4{3: T yp es of prefx ranges Tw o ranges [ u; v ] and [ w ; x ], u v w x u w interse ct i u < w < v < x (see Figure 4{3(a)). The ranges are neste d i u w x v (see Figure 4{3(b)). The ranges are disjoint i v < w (see Figure 4{3(c)). Tw o prefxes in tersect, are nested, or are disjoin t i the corresp onding prop ert y holds with resp ect to their ranges. The follo wing lemma is implicit in [44] and other pap ers on prefx matc hing. Lemma 20 L et P i = [ s i ; f i ] and P j = [ s j ; f j ] b e two dier ent pr efxes. P i and P j ar e either neste d or disjoint (i.e., they c annot interse ct). Pro of When l eng th ( P i ) = l eng th ( P j ), the destination addresses matc hed b y P i and P j are dieren t. So, the ranges of P i and P j (and hence the prefxes) are disjoin t. When l eng th ( P i ) 6= l eng th ( P j ), w e ma y without loss of generalit y assume that l eng th ( P i ) < l eng th ( P j ). If P i is not a prefx of P j (i.e., P i and P j dier in one of the sp ecifed bits), then again, the ranges of P i and P j (and hence the prefxes) are disjoin t. If P i is a prefx of P j s i s j f j f i Consequen tly P j is nested within P i

PAGE 119

105 Lemma 21 L et P = [ s; f ] s 6= f b e a pr efx and let a = b ( s + f ) = 2 c P is the longest length pr efx that includes 1 [ a; a + 1] Pro of First observ e that f = s + 2 W )Tj/T1_6 7.9701 Tf6.5992 0 Td(g )Tj/T1_0 11.9552 Tf11.1494 0 Td(1, where g = l eng th ( P ). Since prefxes do not in tersect, an y longer (or equal) length prefx P 0 = [ s 0 ; f 0 ] that includes [ a; a + 1] m ust ha v e s s 0 a < a + 1 f 0 f F urther, s s 0 f and f 0 all ha v e the same frst g bits and s 0 and f 0 ha v e the same frst g + 1 (or more) bits. Since a and a + 1 dier in bit g + 1, P 0 cannot include [ a; a + 1]. Therefore, no prefx whose length is longer than that of P can include [ a; a + 1]. If l eng th ( P 0 ) = l eng th ( P ), P 0 = P So, P is the longest length prefx that includes [ a; a + 1]. 4.3 Represen tation Using Binary Searc h T rees 4.3.1 Represen tation Let r i 1 i q 2 n b e the distinct end p oin ts of the giv en set of n prefxes. Assume that these end p oin ts are ordered so that r i < r i +1 1 i < q Eac h of the in terv als [ r i ; r i +1 ], 1 i < q is called a b asic interval The basic in terv als of the fv e-prefx example of Figure 4{1 are [0, 10], [10, 11], [11, 16], [16, 18], [18, 19], [19, 23], and [23, 31]. These basic in terv als are lab eled r 1 through r 7 in Figure 4{2(a). T o p erform longest prefx matc hes, inserts and deletes in O (log n ) time p er op eration, w e use a collection of n + 1 binary searc h trees (CBST). Although the O (log n ) p erformance results only when eac h of the n + 1 binary searc h trees in the CBST is a balanced binary searc h tree, w e in tro duce the CBST in terms of binary searc h trees that are not necessarily balanced. Basic In terv al T ree (BIT) Of the n + 1 binary searc h trees in the CBST, one is called the b asic interval tr e e (BIT) The BIT comprises in ternal and external no des and there is one in ternal no de 1 The prefx P i = [ s i ; f i ] includes the in terv al [ a; b ] i s i a b f i .

PAGE 120

106 10 11 16 r1 r2 r3 r4 r5 r6 r7 18 31 0 19 23 (a) Basic in terv al tree P1 16 11 10 r1 P2 r3 P3 r6 r7 0 19 23 (b) Prefx tree for P 1 P2 10 r2 (c) P3 18 16 r4 P4 (d) P4 18 r5 (e) P5 (f ) (c){(f ) Prefx trees for P 2 through P 5 Figure 4{4: CBST for Figure 4{2(a) for eac h r i Since the BIT has q in ternal no des, it has q + 1 external no des. The frst and last of these, in inorder, ha v e no signifcance. The remaining q )Tj/T1_0 11.9552 Tf11.8693 0 Td(1 external no des, in inorder, represen t the q )Tj/T1_0 11.9552 Tf11.0293 0 Td(1 basic in terv als of the giv en prefx set. Figure 4{ 4(a) giv es a p ossible (w e sa y p ossible b ecause, at this time, an y binary searc h tree organization for the in ternal no des will suce) BIT for our fv e-prefx example of Figure 4{2(a). In ternal no des are sho wn as rectangles while circles denote external no des. The felds of the BIT in ternal no des are called k ey l ef tC hil d and r ig htC hil d W e describ e the structure of the BIT external no des later. Prefx T rees The remaining n binary searc h trees in the CBST are pr efx tr e es F or eac h of the n prefxes in the router table, there is exactly one prefx tree. F or eac h prefx and basic in terv al, x defne next ( x ) to b e the smallest range prefx (i.e., the longest prefx) whose range includes the range of x F or the example of Figure 4{2(a), the next ()

PAGE 121

107 P1 16 19 23 31 P5 18 0 11 10 r2 r3 P4 r5 P3 r6 r7 P2 r1 r4 Figure 4{5: V alues of next () are sho wn as left arro ws. v alues for the basic in terv als r 1 through r 7 are, resp ectiv ely P 1, P 2, P 1, P 3, P 4, P 1, and P 1. Notice that the next v alue for the range [ r i ; r i +1 ] is the same as the \ > v alue for r i in Figure 4{2(b), 1 i < q The next () v alues for the non trivial prefxes P 1 through P 4 of Figure 4{2(a) are, resp ectiv ely \-", P 1, P 1, and P 3. The next () v alues for the basic in terv als and the non trivial prefxes of Figure 4{2(a) are sho wn in Figure 4{5 as left arro ws. The prefx tree for prefx P comprises a header no de plus one no de, called a pr efx no de for ev ery non trivial prefx or basic in terv al x suc h that next ( x ) = P The prefx trees for eac h of the fv e prefxes of Figure 4{2(a) are sho wn in Figures 4{4(b)-(f ). Notice that prefx trees do not ha v e external no des and that the prefx no des of a prefx tree store the start p oin t of the range or prefx represen ted b y that prefx no de. In the fgures, the start p oin ts of the basic in terv als and prefxes are sho wn inside the prefx no des while the basic in terv al or prefx name is sho wn outside the no de.

PAGE 122

108 Notic e also that nontrivial pr efxes and b asic intervals do not stor e the value of next() explicitly. The value of next() is stor e d only in the he ader of a pr efx tr e e. 2 BIT External No des Eac h of the q )Tj/T1_0 11.9552 Tf11.7494 0 Td(1 external no des of the BIT that represen ts a basic in terv al x p oin ts to the prefx no de that represen ts this basic in terv al in the prefx tree for next ( x ). W e call this p oin ter basicI nter v al P ointer In addition, an external no de that represen ts the basic in terv al x = [ r i ; r i +1 ] has a p oin ter star tP ointer ( f inishP ointer ) whic h p oin ts to the header no de of the prefx tree for the trivial prefx (if an y) whose range start and fnish p oin ts are r i ( r i +1 ). F or example, star tP ointer for r 7 = [23,31] in Figure 4{2(a) p oin ts to the header no de for the prefx tree of the trivial prefx P 5; f inishP ointer for r 6 = [19, 23] also p oin ts to the header no de for the prefx tree of P 5; the remaining start and fnish p oin ters are n ull. 4.3.2 Longest Prefx Matc hing Notice that, b ecause of our assumption that the default prefx is alw a ys presen t, there is alw a ys a prefx in our database that matc hes an y W -bit destination address d The searc h for the longest prefx that matc hes d is done in t w o steps: Step 1 First w e start at the ro ot of the BIT and mo v e do wn to an appropriate external no de. An external no de x that represen ts the basic in terv al [ r i ; r i +1 ] is appr opriate for d i (a) d = r i and x:star tP ointer 6= nul l or (b) d = r i +1 and x:f inishP ointer 6= nul l or (c) LM P ( d ) = next ( x ). Notice that the appropriate no de for a giv en d ma y not b e unique. F or instance, for our example BIT, the external no des for b oth r 6 and r 7 are appropriate when d = 23. When d = 18, only the external no de for r 5 is appropriate. 2 If next () v alues w ere explicitly stored with basic in terv als and trivial prefxes, an up date w ould tak e O ( n ) time, b ecause O ( n ) next () v alues c hange follo wing an insert/delete.

PAGE 123

109 Step 2 If cases (a) or (b) of Step 1 apply then LM P ( d ) is obtained b y follo wing the non-n ull start or fnish p oin ter. When case (c) applies, the basic in terv al p oin ter is follo w ed in to the prefx tree corresp onding to next ( x ). The header no de of this prefx tree con tains the longest matc hing prefx for d This header no de is lo cated b y follo wing paren t p oin ters. In step 1, w e searc h for an appropriate external no de b y p erforming a series of comparisons b eginning at the ro ot of the BIT. The searc h pro cess diers from that emplo y ed to searc h a normal binary searc h tree (see, for example, [39]) only in ho w w e handle equalit y b et w een the address d and the k ey in the curren t searc h tree no de y Whenev er d equals the k ey in an in ternal no de y (i.e., d = y :k ey ) of the BIT, w e kno w that the basic in terv al [ r i ; r i +1 ] represen ted b y the righ tmost (leftmost) external no de in the left (righ t) subtree of y is suc h that r i +1 = d ( r i = d ). It is not to o dicult to see that one (or b oth) of these t w o external no des is an appropriate external no de for d T o determine whic h, w e examine the least signifcan t bit ( l sb ( k ey :y )) of k ey :y (equiv alen tly examine l sb ( d )). If l sb ( k ey :y ) = 0, then it follo ws from Lemma 19 that y :k ey = d is the start p oin t of some prefx (note that the start and fnish p oin ts of a trivial prefx are the same). Therefore, the leftmost external no de in the righ t subtree of y is an appropriate no de for d (recall that the basic in terv al for this external no de is [ r i ; r i +1 ], where r i = d ). When l sb ( y :k ey ) = 1, y :k ey = d is the fnish p oin t of some prefx and so the righ tmost external no de in the left subtree of y is an appropriate no de for d This external no de has r i +1 = d As an example, supp ose w e wish to determine LM P (11). W e start at the ro ot of the BIT of Figure 4{4(a). Since d = 11 < r oot:k ey = 18, the curren t no de y b ecome the left c hild of the ro ot. No w, since d = y :k ey and l sb ( y :k ey ) = 1, the appropriate external no de for d is the righ tmost external no de in the left subtree of y This external no de represen ts the basic in terv al r 2. Notice that next ( r 2) = P 2. As another example, consider determining LM P (18). Since d = 18 = r oot:k ey

PAGE 124

110 and l sb ( r oot:k ey ) = 0, the appropriate external no de is the leftmost external no de in the righ t subtree of the ro ot. This external no de represen ts the basic in terv al r 5 = [ r i ; r i +1 ] = [18, 19]. Once again, notice that LM P (18) = next ( r 5) = P 4. F or d = 23, w e reac h the external no de for r 6 = [ r i ; r i +1 ] = [19,23]. Since d = r i +1 and the fnish p oin ter of this external no de is non-n ull, the fnish p oin ter (this p oin ts to the header no de of the prefx tree for the trivial prefx P 5) is used to determine LM P (23) = P 5. Notice that when the router table has a trivial prefx that matc hes the destination address d this trivial prefx is LM P ( d ). Figure 4{6 giv es a high-lev el statemen t of the algorithm to determine LM P ( d ). Theorem 3 (a) A lgorithm longestMatc hingPrefx c orr e ctly fnds LM P ( d ) (b) The c omplexity of algorithm longestMatc hingPrefx is O ( heig ht ( B I T ) + heig ht ( pr ef ixT r ee ( d ))) wher e pr ef ixT r ee ( d ) is the pr efx tr e e for LM P ( d ) Pro of Correctness follo ws from the defnition of the BIT and prefx tree data structures. F or the complexit y w e note that it tak es O ( heig ht ( B I T )) time to fnd the appropriate external no de and an additional O ( heig ht ( pr ef ixT r ee ( d ))) time to fnd LM P ( d ) in case the function pr ef ix is in v ok ed. 4.3.3 Inserting a Prefx Supp ose w e wish to add the prefx P 6 = 01* = [8, 15] to the prefx set P 1P 5. Figure 4{7(a) giv es the pictorial represen tation for the prefxes P 1P 6. Relativ e to the pictorial represen tation of P 1P 5 (Figure 4{2(a)), w e see that the insertion of P 6 has created t w o new end p oin ts (8 and 15), the basic in terv al r 1 has b een split in to the basic in terv als r 1 a and r 1 b as a result of the new end p oin t 8, and the basic in terv al r 3 has b een split in to the basic in terv als r 3 a and r 3 b as a result of the new end p oin t 15. Figure 4{7(b) sho ws the pictorial represen tation for the case when P 1P 5 are augmen ted b y the prefx P 7 = 10* = [16, 23]. In this case no new end p oin ts are created and none of the basic in terv als of P 1P 5 split.

PAGE 125

111 algorithm l ong estM atching P r ef ix ( d ) f // return header no de for LM P ( d ) // fnd appropriate external no de y = ro ot of BIT; while ( y is an in ternal no de) if ( d < y :k ey ) y = y :l ef tC hil d ; else if ( d > y :k ey ) y = y :r ig htC hil d ; else // d equals y :k ey if ( l sb ( y :k ey ) is 0) f eN ode = leftmost external no de in righ t subtree of y ; if ( eN ode:star tP ointer is n ull) return ( pr ef ix ( eN ode:basicI nter v al P ointer )); else return ( eN ode:star tP ointer ); g else // l sb ( y :k ey ) is 1 f eN ode = righ t most external no de in left subtree of y ; if ( eN ode:f inishP ointer is n ull) return ( pr ef ix ( eN ode:basicI nter v al P ointer )); else return ( eN ode:f inishP ointer ); g return ( pr ef ix ( y :basicI nter v al P ointer )); g algorithm pr ef ix ( pN ode ) f // return prefx in header no de of prefx tree that con tains no de pN ode y = pN ode ; while ( y is not a header no de) y = y :par ent ; return ( y ); g Figure 4{6: Algorithm to fnd LM P ( d )

PAGE 126

112 P3 P1 16 19 31 P5 P6 11 15 P2 P4 r7 r6 r5 r4 r2 r1b r1a r3a r3b 0 8 10 18 23 (a) P3 P1 P7 10 16 19 P5 r1 11 P2 P4 r7 r6 r5 r4 r3 r2 0 18 23 31 (b) (a) Figure 4{2(a) after inserting P 6 = 01 (b) Figure 4{2(a) after inserting P 7 = 10 Figure 4{7: Pictorial represen tation of prefxes and ranges after inserting a prefx In addition to p ossibly increasing the n um b er of distinct end p oin ts, the insertion of a new prefx c hanges the next () v alue of certain prefxes and basic in terv als. The insertion of P 6 in to P 1P 5 c hanges next ( P 2) from P 1 to P 6 ( next ( r 1 b ) and next ( r 3 a ) b ecome P 6). The insertion of P 7 in to R 1R 5 c hanges next ( P 3) and next ( r 6) from P 1 to P 7. Up dating the BIT Since the default prefx is alw a ys presen t, w e need not b e concerned with insertion in to an empt y BIT. It is easy to v erify that the insertion of a new prefx will increase the n um b er of distinct end p oin ts b y 0, 1, or 2. Corresp ondingly the n um b er of basic in terv als will increase b y 0, 1, or 2. Because the n um b er of in ternal (external) no des in a BIT equals (is one more than) the n um b er of distinct end p oin ts, the n um b er of in ternal and external no des in the BIT increases b y the same amoun t as do es the n um b er of

PAGE 127

113 10 11 16 r2 r4 r5 r6 r7 18 31 r1a r1b r3b r3a 15 0 8 19 23 (a) BIT for P 1P 5 and P 6 P1 16 15 r1a r3b P6 P3 r6 r7 0 8 19 23 (b) prefx tree for P 1 P6 10 r1b P2 r3a 8 11 (c) prefx tree for P 6 Figure 4{8: Basic in terv al tree and prefx trees after inserting P 6 = 01 in to Figure 4{ 4 distinct end p oin ts. Figure 4{8(a) sho ws the BIT for P 1P 6. Since the insertion of P 7 in to the prefx set P 1P 5 do es not c hange the set of distinct end p oin ts, the BIT for P 1P 5 and P 7 has the same structure as do es that for P 1P 5. Lemma 22 L et P = [ s; f ] b e a new pr efx that is inserte d into a r outer datab ase. Assume that the insertion of P cr e ates no new end p oints. (a) If l eng th ( P ) < W the BIT is unchange d. (Even though the next value may change for sever al b asic intervals and pr efxes, these changes do not ae ct the BIT.) (b) If l eng th ( P ) = W the structur e of the BIT is unchange d. However, the start p ointer in the external no de for the b asic interval [ r i ; r i +1 ] wher e r i = s = f and the fnish p ointer in the external no de for the b asic interval [ r i ; r i +1 ] wher e

PAGE 128

114 algorithm inser tE ndP oint ( u ) f // insert the end p oin t u in to the BIT y = ro ot of BIT; while ( y is an in ternal no de) if ( u < y :k ey ) y = y :l ef tC hil d ; else if ( u > y :k ey ) y = y :r ig htC hil d ; else // u equals y :k ey f // u is not a new end p oin t if (length of new prefx is W ) f eN ode = leftmost external no de in righ t subtree of y ; up date eN ode:star tP ointer to p oin t to header no de for new prefx; eN ode = righ t most external no de in left subtree of y ; up date eN ode:f inishP ointer to p oin t to header no de for new prefx; g return ; g // u is a new end p oin t insert a new in ternal no de z with z :k ey = u b et w een y and its paren t and create a new external no de for the remaining c hild of z ; return ; g Figure 4{9: Algorithm to insert an end p oin t r i +1 = s = f change (b oth now p oint to the he ader no de for the pr efx tr e e of P ). Pro of Straigh tforw ard. T o up date the BIT as required b y the insertion of the prefx P = [ s; f ], w e insert the end p oin ts s and f in to the BIT using algorithm inser tE ndP oint of Figure 4{9. Of course, when s = f w e in v ok e inser tE ndP oint just once. The felds of the t w o external no de c hildren of the newly created in ternal no de z are easily c hanged/set to their correct v alues. When a new in ternal no de z is created, a basic in terv al [ r i ; r i +1 ] is split in to the t w o basic in terv als [ r i ; u ] and [ u; r i +1 ]. Let e 1 and e 2, resp ectiv ely b e the external no des that represen t these basic in terv als. Let e b e the external no de that represen ts the original in terv al [ r i ; r i +1 ] (note that

PAGE 129

115 ri u P ri+1 e2 e1 Figure 4{10: Splitting a basic in terv al when l sb ( u ) = 1 e is either e 1 or e 2). The start p oin ter of e 1 is the start p oin ter of e and the fnish p oin ter of e 2 is the fnish p oin ter of e When the length of the new prefx P is W the basic in terv al p oin ters of e 1 and e 2 are the same as that of e and the fnish p oin ter of e 1 and the start p oin ter of e 2 p oin t to the header no de of the prefx tree of the new prefx. When l eng th ( P ) 6= W the fnish p oin ter of e 1 and the start p oin ter of e 2 are n ull. F urther, when l eng th ( P ) 6= W and l sb ( u ) = 1 (see Figure 4{10), the basic in terv al p oin ter of e 2 is the same as that of e and the basic in terv al p oin ter of e 1 p oin ts to a new no de that is to go in to the prefx tree of the new prefx P The case when l eng th ( P ) 6= W and l sb ( u ) = 0 is similar. Theorem 4 (a) A lgorithm insertEndP oin t c orr e ctly inserts an end p oint into the BIT. (b) The c omplexity of the algorithm is O ( heig ht ( B I T )) Pro of Correctness follo ws from the defnition of a BIT. F or the complexit y w e see that it tak es O ( heig ht ( B I T )) time to exit the while lo op. The ensuing insert (if an y) of a new in ternal and external no de tak es O (1) time if the BIT is not to b e balanced and O ( heig ht ( B I T )) time if the BIT is to b e balanced. Up dating Prefx T rees When the prefx P = [ s; f ] is inserted, w e m ust create a new prefx tree for P Additionally when l eng th ( P ) < W or when l eng th ( P ) = W and s is a new end p oin t, w e m ust up date the prefx tree for the longest prefx Q = [ a; b ] suc h that a s f b (i.e., the prefx Q suc h that next ( P ) = Q ). Note that b ecause of our assumption that the default prefx is alw a ys presen t, Q exists whenev er P is not

PAGE 130

116 the default prefx. W e assume that whenev er a request is made to insert a prefx that is already in the database, w e need only up date the next-hop information asso ciated with this prefx. Therefore, the only time that Q do es not exist, w e are to simply lo cate the header no de for the default prefx and up date the next-hop information. F or the remainder of this subsection, w e assume that Q exists. Additional w ork that is to b e done includes the insertion of up to t w o new basic in terv al no des. These no des go in to the prefx trees for P and/or Q Consider the insertion of P 6 = [8 ; 15] in to P 1P 5 (Figures 4{2(a) and 4{7(a)). When P 6 is inserted, Q = P 1. Let Z b e the set of prefxes and basic in terv als x for whic h next ( x ) = Q = P 1 and the range of x is con tained within that of P 6 (i.e., P 2). The next () v alue for the prefxes and basic in terv als in Z c hanges from Q = P 1 to P 6. The basic in terv als ( r 1 and r 3) that in tersect the range of P 6 (recall from Lemma 20 that no prefx can in tersect P 6) get split in to four basic in terv als with t w o of these ha ving next v alue Q and the other t w o ha ving next v alue P 6. The prefx trees for prefxes other than Q and P 6 are unaected b y the insertion of P 6. T o mak e the ab o v e c hanges, w e use the split and join op erations [39 ] of a binary searc h tree. F or binary searc h trees T smal l and big these op erations are defned b elo w. T :spl it ( u ) : Split T in to t w o binary searc h trees smal l and big suc h that smal l has all k eys/elemen ts of T that are less than u and big has those that are greater than or equal to u j oin ( smal l ; big ) : This op eration starts with t w o binary searc h trees smal l and big with the prop ert y that all k eys in smal l are less than ev ery k ey in big and creates a binary searc h tree that includes all k eys in smal l and big T o determine, the basic in terv als and prefxes in the prefx tree of Q = P 1 whose next v alue c hanges to P 6, w e frst split the prefx tree of P 1 b y in v oking spl it (8) (8 is the start p oin t of the new prefx P 6). The resulting binary searc h trees smal l 1

PAGE 131

117 16 11 10 P1 r1 P2 r3 P7 r7 0 23 (a) prefx tree for P 1 after the insertion of P 7 16 P7 19 P3 r6 (b) prefx tree for P 7 Figure 4{11: Prefx trees after inserting P 7 = 10 in to P 1P 5 and big 1 ha v e the k eys f 0 g and f 10 ; 11 ; 16 ; 19 ; 23 g resp ectiv ely Next, w e split the binary searc h tree big 1 b y in v oking spl it (15) (15 is the fnish p oin t of P 6) to get the binary searc h trees smal l 2 and big 2, whic h ha v e the k eys f 10 ; 11 g and f 16 ; 19 ; 23 g resp ectiv ely W e no w ha v e three binary searc h trees smal l 1 with k ey f 0 g represen ting the basic in terv al f r 1 a g smal l 2 with k eys f 10 ; 11 g represen ting f r 1 b; P 2 g and big 2 with k eys f 16 ; 19 ; 23 g represen ting f P 3 ; r 6 ; r 7 g T o construct the new prefx tree for P 1, w e join smal l 1 and big 2 and then insert the basic in terv al r 3 b as w ell as the new prefx P 6. T o get the prefx tree for P 6, w e insert the basic in terv al r 1 b in to smal l 2. The resulting P 1 and P 6 prefx trees are sho wn in Figures 4{8(b) and (c). No w, consider the insertion of P 7 = [16, 23] in to P 1 )Tj/T1_4 11.9552 Tf11.0293 0 Td(P 5. Once again, Q = P 1. F ollo wing spl it (16), smal l 1 has the k eys f 0 ; 10 ; 11 g and big 1 has the k eys f 16 ; 19 ; 23 g When big 1 is split using spl it (23), w e get smal l 2 with k eys f 16 ; 19 g and big 2 with the k ey f 23 g T o get the new prefx tree for P 1, w e join smal l 1 and big 2 and then insert the new prefx P 7. The resulting tree has the k eys f 0 ; 10 ; 11 ; 16 ; 23 g (the k ey 16 represen ts P 7). smal l 2 is the tree for P 7. These prefx trees for P 1 and P 7 are sho wn in Figures 4{11(a) and (b). T o complete the discussion of the insertion op eration, w e need to describ e ho w the prefx Q is determined. When l eng th ( P ) < W Q ma y b e determined using Lemma 23. When l eng th ( P ) = W and s is a new end p oin t, Q is LM P ( s ).

PAGE 132

118 Lemma 23 L et R b e a pr efx set that includes the default pr efx L et P = [ s; f ] s 6= f (i.e., l eng th ( P ) < W ), P = 2 R b e a pr efx that is to b e inserte d into R L et a = b ( s + f ) = 2 c (a) Ther e is a unique b asic interval x of R that c ontains [ a; a + 1] (b) The longest pr efx Q 2 R that includes the interval [ s; f ] is next ( x ) Pro of (a) Since the default prefx is in R the distinct end p oin ts of R are 0 = r 1 < r 2 < ::: < r q = 2 W )Tj/T1_0 11.9552 Tf11.0293 0 Td(1. Therefore, there is a unique i suc h that r i a < a + 1 r i +1 So, x = [ r i ; r i +1 ] is the unique basic in terv al of R that con tains [ a; a i +1 ]. (b) By defnition, next ( x ) is the smallest range prefx (i.e., longest prefx) P 0 = [ s 0 ; f 0 ] of R that includes the basic in terv al [ r i ; r i +1 ]. Therefore, P 0 is the longest prefx of R that includes [ a; a + 1]. F rom Lemma 21 and P = 2 R (so P 6= P 0 ), it follo ws that l eng th ( P 0 ) < l eng th ( P ). Since prefxes do not in tersect and since b oth P and P 0 include [ a; a + 1], s 0 s a < a + 1 f f 0 F urther, since P 0 is the longest prefx of R with this prop ert y Q = P 0 = next ( x ). Figure 4{12 giv es a high-lev el description of our algorithm to up date the prefx trees. Theorem 5 (a) A lgorithm up datePrefxT rees c orr e ctly up dates a pr efx tr e e. (b) The c omplexity of the algorithm is O ( heig ht ( B I T )+ spl it ( pt )+ j oin ( pt )+ inser t ( pt )) wher e spl it ( pt ) j oin ( pt ) and inser t ( pt ) ar e, r esp e ctively, the times to split a pr efx tr e e, join two pr efx tr e es, and insert into a pr efx tr e e. Pro of Correctness follo ws from the defnition of a prefx tree. F or the complexit y w e see that it tak es O ( heig ht ( B I T )) time to determine Q In addition to determining Q at most 2 splits, 1 join, and 3 insertions in to prefx trees are done. Theorem 6 The c omplexity of the insert-pr efx op er ation is O ( heig ht ( B I T )+ spl it ( pt ) + j oin ( pt ) + inser t ( pt )) .

PAGE 133

119 algorithm updateP r ef ixT r ees ( s; f ) f // up date the prefx trees when the prefx P = [ s; f ] is inserted if ( s == 0&& f == 2 W )Tj/T1_0 11.9552 Tf11.8694 0 Td(1) f // P is the default prefx Up date next-hop feld of default prefx; return ; g if ( s == f ) f // l eng th ( P ) = W if ( P is not a new prefx) Up date next-hop feld for P ; else f Create a header no de for P 's prefx tree; if ( s is a new p oin t) f Q = LM P ( s ); Insert the basic in terv al that b egins at s in to Q ; g g return ; g // P is a non trivial prefx Determine Q using Lemma 23; if ( P == Q ) Up date next hop of Q and return ; ( smal l 1 ; big 1) = Q:spl it ( s ); ( smal l 2 ; big 2) = big 1 :spl it ( f ); Q = j oin ( smal l 1 ; big 2); Insert s (i.e., prefx P ) in to Q ; if ( f < fnish p oin t of prefx represen ted b y Q ) Insert f in to Q ; Insert basic in terv als in to Q as needed; Insert basic in terv als in to smal l 2 as needed; smal l 2 is the prefx tree for P ; g Figure 4{12: Algorithm to up date prefx trees

PAGE 134

120 Pro of F ollo ws from the complexities of inser tE ndP oint and updateP r ef ixT r ees and the observ ation that when a prefx is inserted, w e mak e at most 2 in v o cations of inser tE ndP oint and 1 of updateP r ef ixT r ees 4.3.4 Deleting a Prefx T o delete P 6 = [8 ; 15] from the database P 1 )Tj/T1_2 11.9552 Tf11.8694 0 Td(P 6 of Figure 4{7(a), w e m ust do the follo wing: 1. Delete 8 and 15 from the BIT and merge the basic in terv als r 1 a and r 1 b as w ell as r 3 a and r 3 b 2. Mo v e the prefx-tree no de for P 2, whic h is presen tly in the in prefx tree for P 6 to the prefx tree for P 1 and discard the remainder of the prefx tree for P 6. T o delete P 7 = [16 ; 23] from the database P )Tj/T1_2 11.9552 Tf11.9894 0 Td(P 5 and P 7 of Figure 4{7(b), w e m ust mo v e the prefx-tree no des for P 3 and r 6 from the prefx tree for P 7 to the prefx tree for P 1 and discard the header no de of the prefx tree for P 7. T o delete P 5 = [23 ; 23] from P 1 )Tj/T1_2 11.9552 Tf12.9494 0 Td(P 5, w e m ust remo v e 23 from the BIT, merge the basic in terv als r 6 and r 7, and discard the prefx tree for P 5. The deletion of the default prefx requires us to simply c hange the next-hop feld for this prefx to n ull (recall that the default prefx m ust b e retained in the database at all times). In the remainder of our discussion, w e assume that the prefx to b e deleted is not the default prefx. W e see that the deletion of a prefx P = [ s; f ], P 6= requires us to p erform some or all of the follo wing tasks: 1. Lo cate the prefx tree for P 2. Determine the longest prefx L whose range includes [ s; f ] ( L = P 1 in our preceding examples). 3. Determine whether s and/or f are to b e deleted from the BIT. If so, delete them. 4. Mo v e a p ortion of the prefx tree for P in to that of L and discard the remainder. 5. Merge pairs of external no des in the BIT.

PAGE 135

121 T o p erform task 1, w e observ e that when s = f the prefx tree for P ma y b e lo cated b y frst determining an external no de e of the BIT that represen ts a basic in terv al [ r i ; r i +1 ] with either r i = s or r i +1 = f In the former case, e:star tP ointer giv es us the desired prefx tree and in the latter case, e:f inishP ointer do es this. In case the p oin ter is n ull, P is not a prefx of the database. When s 6= f task 1 ma y b e p erformed using Lemma 23 to determine prefx Q using s and f If Q 6= P then P is not in the prefx database. In case the prefx to b e deleted is not in the database, the deletion algorithm terminates. A simple strategy for task 2 is to add a prefx-no de p oin ter pr ef ixN ode to the header no de of ev ery prefx tree. The prefx-no de p oin ter for the prefx S p oin ts to the unique no de N that is in one of the prefx trees and represen ts prefx S By follo wing paren t p oin ters from N w e reac h the header no de for the prefx L The prefx-p oin ter in the header no de of the prefx tree for S is set when S is inserted in to the database. Once set, this p oin ter do es not c hange. A sligh tly more in v olv ed strategy is describ ed no w. This strategy do es not require us to mak e an y c hanges to the BIT or prefx-trees structures. First note that since the prefx database con tains the default prefx and since P 6= the database con tains a unique prefx L of longest length whose range includes [ s; f ]. T o determine L let U denote the subset of database prefxes that either start at s or fnish at f (or b oth). Since P 2 U U is not empt y Let S b e the shortest prefx in U W e consider the follo wing three cases, whic h are exhaustiv e: (1) P = S (2) P 6= S and S starts at s and (3) P 6= S and S fnishes at f These three cases are sho wn pictorially in Figure 4{13. Let x b e the basic in terv al (if an y) that includes [ s )Tj/T1_0 11.9552 Tf11.8694 0 Td(1 ; s ] (note that when s = 0, there is no suc h x ) and let y b e the basic in terv al (if an y) that includes [ f ; f + 1]. W e see that, in all cases, L is the shorter of the prefxes next ( x ) and next ( y ). W e ma y determine next ( x ) ( next ( y )) b y follo wing the basic in terv al p oin ter in the BIT external no de for

PAGE 136

122 s P L f (a) P is shortest L s f P (b) P is not shortest L P f s (c) P is not shortest Figure 4{13: P = S ; P 6= S and S starts at s ; and P 6= S and S fnishes at f x ( y ) to the prefx tree for next ( x ) ( next ( y )) and then follo wing paren t p oin ters to the header no de for next ( x ) ( next ( y )). The easiest w a y to p erform task 3 is to augmen t the BIT structure so that with eac h distinct end p oin t w e main tain a coun t of the n um b er of prefxes in the database that start (fnish) at that end p oin t. When this coun t is 1, the deletion of P = [ s; f ] requires us to remo v e s ( f ) from the BIT. The insert algorithm is easily mo difed to up date the coun t felds whenev er a prefx is inserted. An alternativ e strategy whic h do esn't require us to augmen t either the BIT or prefx-trees structure, is describ ed no w. When s = f s is to b e deleted from the BIT i there is no other prefx for whic h s is an end p oin t. T o determine this, compute next ( x ), where x is the basic in terv al that includes [ s; s + 1] in case l sb ( s ) = 0 and includes [ f )Tj/T1_0 11.9552 Tf12.4693 0 Td(1 ; f ], otherwise. When l sb ( s ) = 0, s is to b e deleted i the start p oin t of next ( x ) 6= s When l sb ( s ) = 1, s is to b e deleted i the fnish p oin t of next ( x ) 6= s When s 6= f s ( f ) is to b e deleted i none of the follo wing is true (1) there is a prefx in the database whose start and

PAGE 137

123 fnish p oin ts are s ( f ), (2) next ( x ) 6= P where x is the basic in terv al (if an y) that includes [ s; s + 1] ([ f )Tj/T1_0 11.9552 Tf11.9894 0 Td(1 ; f ]), or (3) start (fnish) p oin t of L (task 2) equals s ( f ). F or task 4, w e frst delete the header no de of the prefx tree for P as w ell as the basic in terv al no des for the up to t w o basic in terv als in the prefx tree of P that are to b e merged with adjacen t basic in terv als. Call the resulting binary searc h tree P T 0 ( P ). Next, w e split the prefx tree P T ( L ) for P as in ( smal l ; big ) = P T ( L ) :spl it ( s ). The new prefx tree for L is j oin ( j oin ( smal l ; P T 0 ( P )) ; big ). T ask 5 is to b e done only when either s or f or b oth are to b e deleted. This task is easily in tegrated in to the delete s ( f ) task (task 3). Theorem 7 The c omplexity of the delete op er ation is O ( heig ht ( B I T ) + heig ht ( pt ) + spl it ( pt ) + j oin ( pt ) + del ete ( pt )) wher e heig ht ( pt ) is the height of a pr efx tr e e and del ete ( pt ) is the time to delete fr om a pr efx tr e e. Pro of T ask 1 is done b y searc hing do wn the BIT and then (p ossibly) going up a prefx tree. This tak es O ( heig ht ( B I T ) + heig ht ( pt )) time. T ask 2 requires us to go do wn the BIT and up a prefx tree once for eac h of x and y So, task 2 also tak es O ( heig ht ( B I T ) + heig ht ( pt )) time. F or task 3, w e m ust determine whether the end p oin ts s and f of the prefx that is to b e deleted are also to b e deleted and then delete these end p oin ts if so determined. F or eac h of s and f w e m ust fnd a next () v alue and then (p ossibly) delete the p oin t from the BIT. It tak es O ( heig ht ( B I T ) + heig ht ( pt )) time to determine next () and O ( heig ht ( B I T )) to delete a p oin t. F or task 4, w e m ust do up to 3 deletions from a prefx tree, p erform 1 split, and 2 joins. So, task 4 tak es O ( del ete ( pt ) + spl it ( pt ) + j oin ( pt )). Finally task 5 is in tegrated in to task 3 without an y increase in asymptotic complexit y 4.3.5 Complexit y The red-blac k tree [39] is a go o d c hoice of data structure for the binary searc h trees of the CBST. The follo wing prop erties [39] of red-blac k trees are imp ortan t to us:

PAGE 138

124 The heigh t of a red-blac k tree is logarithmic in the n um b er of no des in the tree. W e ma y insert in to, delete from, and split a red-blac k tree in O (heigh t of tree) time. Tw o red-blac k trees with n 1 and n 2 no des, resp ectiv ely ma y b e joined in O (log( n 1 n 2 )) time. F rom these prop erties and the earlier stated complexities of the searc h, insert, and delete algorithms for our prop osed CBST structure, it follo ws that w e can p erform longest prefx matc hes as w ell as prefx insertion and deletion in O (log n ) time, where n is the n um b er of prefxes in the database. When the trees of the CBST structure are implemen ted as red-blac k trees, the resulting structure is called CRBT (collection of red-blac k trees). Although the use of A VL trees in place of red-blac k trees also results in O (log n ) router-table op erations, red-blac k trees are generally b eliev ed to b e faster than A VL trees b y a constan t factor. When un balanced binary searc h trees are used in place of red-blac k trees, the complexit y of the matc h/insert/delete algorithms b ecomes O ( n ) (though the exp ected complexit y is O (log n )). Using spla y trees in place of red-blac k trees results in router-table op erations whose amortized complexit y is O (log n ). As for the space complexit y the BIT has at most 2 n in ternal and 2 n + 1 external no des. F urther, the n prefx trees together ha v e n header no des, n )Tj/T1_0 11.9552 Tf11.6293 0 Td(1 prefx no des (there is no prefx no de for the default prefx), and at most 2 n )Tj/T1_0 11.9552 Tf12.2293 0 Td(1 basic in terv al no des. So, the BIT and the prefx trees together ha v e at most 8 n no des. Therefore, the space complexit y is O ( n ). 4.3.6 Commen ts Our algorithms assume that prefxes are giv en b y the start and fnish p oin ts of their ranges. In practical databases, this ma y not b e the case; a prefx ma y b e sp ecifed b y its start p oin t and length. In this case, the fnish p oin t of the prefx ma y b e

PAGE 139

125 T able 4{1: Statistics of prefx databases obtained from IPMA pro ject on Sep 13, 2000 Database P aix Pb MaeW est Aads MaeEast Num of prefxes 85,988 35,303 30,719 27,069 22,713 Num of 32-bit prefxes 1 0 1 0 1 Num of end p oin ts 167,434 69,280 60,105 53,064 44,463 Max nesting depth 7 6 6 6 7 Avg nesting depth 2.13 1.90 1.90 1.86 1.82 Max prefx tree 76,979 44,333 38,469 36,201 32,437 Avg prefx tree 2.95 2.96 2.96 2.96 2.96 computed in O (1) time pro vided w e precompute the v alues A ( i ) = 2 i )Tj/T1_0 11.9552 Tf10.7894 0 Td(1, 0 i W The fnish p oin t of a prefx P whose start p oin t is s is s + A ( W )Tj/T1_1 11.9552 Tf11.9894 0 Td(l eng th ( P )). 4.4 Exp erimen tal Results W e programmed the CRBT sc heme in C++ and measured its p erformance using IPv4 prefx databases. The co des w ere run on a SUN Ultra En terprise 4000/5000 computer. The g++ compiler with optimization lev el -O3 w as used. F or test data, w e used the fv e IPv4 prefx databases of T able 4{1 [53 ]. In terestingly the n um b er of distinct end p oin ts is almost t wice the n um b er of prefxes in eac h database. The depth of nesting is the n um b er of prefxes that co v er a giv en basic in terv al. F or example, the depth of nesting for the basic in terv al r1 of Figure 4{2(a) is 1, b ecause prefx P1 is the only prefx that co v ers r1. The depth of nesting for r5 is 3, b ecause P1, P3 and P4 co v er r5. The maxim um depth of nesting is surprisingly almost the same for all fv e of our databases. Note that the depth of nesting rep orted in T able 4{1 includes the default prefx that w e ha v e added to the database. The a v erage nesting depth is obtained b y summing the nesting depth for all basic in terv als and dividing b y the n um b er of basic in terv als. F or our sample data, the a v erage nesting depth is v ery small. In fact, if w e eliminate the default prefx added b y us to the original databases, the a v erage depth of nesting b ecomes ab out 1. So, most of the basic in terv als are co v ered b y at most 1 prefx!

PAGE 140

126 T able 4{2: Memory for data structure (in KBytes) Database P aix Pb MaeW est Aads MaeEast Num of prefxes 85,988 35,303 30,719 27,069 22,713 Memory 16,139 6,664 5,786 5,106 4,280 2 3 4 5 6 7 10 2 10 3 10 4 10 5 k Memory (KB) optVST optVSTButler CRBT Figure 4{14: Memory required (in KBytes) b y b est k -VST and CRBT for P aix Max prefx tree is the maxim um n um b er of no des in an y of the constructucted prefx trees. This n um b er do es not include the header no de. Avg prefx tree is the a v erage n um b er of no des in a prefx tree. Although the prefx tree for the default prefx has a v ery large n um b er of no des (this prefx tree w as alw a ys the largest), the ma jorit y of the prefx trees are rather small. T able 4{2 sho ws the amoun t of memory used b y our data structure. Figure 4{14 compares the memory used b y our data structure and that used b y the the optimal v ariable-stride tries (VST) of Sriniv asan and V arghese [82]. CRBT is our collection of red-blac k trees data structure, optVST is the optimal v ariable-stride trie of [82], and optVST-Butler is the optimal v ariable-stride trie of [82] augmen ted with Butler no des. k is the heigh t of the VST, and is a user-sp ecifed parameter. The data for VSTs are tak en from [72]. Our CRBT structure tak es 6.4 times the memory required b y an optimal VST whose heigh t is 2.

PAGE 141

127 T o measure the searc h, insert, and delete times for our data structure, w e frst obtained a random p erm utation of the prefxes in the databases of [53 ]. F or eac h database, w e started with a CRBT that included the frst 75% of the prefxes (order is determined b y the random p erm utation). Then, the remaining 25% w ere inserted and the time to insert these 25% w as measured. The a v erage time for one of these inserts is rep orted in T able 4{3. F or the delete time, w e started with the CRBT for 100% of the prefxes in a database and measured the time to delete the last 25% of these prefxes. The a v erage time for one of these delete op erations is rep orted. Finally for the searc h time, w e measured the time to p erform a searc h for a destination address in eac h of the basic in terv als, and a v eraged o v er the n um b er of basic in terv als. The columns lab eled Dyn (dynamic) giv e the times for the case when the insert and delete co des use C++'s new and delete metho ds to create and free no des as required b y the insert and delete op erations, resp ectiv ely The columns lab eled Sta (static) is for co des that do not use dynamic memory allo cation/deallo cation during insert and delete op erations. Instead, w e b egin b y allo cating the maxim um n um b er of prefx trees as w ell as the maxim um n um b er of in ternal, external, and prefx no des that ma y b e needed. These allo cated no des are link ed in to four dieren t c hains, one for eac h no de t yp e. During an insert, no des are tak en from these c hains, and during a delete, no des are returned to these c hains. As the run times of T able 4{3 sho w, dynamic allo cation/deallo cation accoun ts for a signifcan t p ortion of the run time. Although one w ould, in theory exp ect the time for a searc h to b e the same when dynamic and static allo cation and deallo cation are used, the searc h times rep orted in T able 4{3 dier for three of the fv e databases. W e susp ect that this dierence is largely due to cac hing dierences resulting from the dierences in no de addresses in the t w o sc hemes. It is in teresting to note that ev en though searc h, insert, and delete are O (log n ) op erations, an insert or delete tak es ab out 25 times as m uc h time as do es a searc h when dynamic allo cation and deallo cation are used. When static

PAGE 142

128 T able 4{3: Execution time (in sec) for randomized databases Searc h Insert Delete Database Dyn Sta Dyn Sta Dyn Sta P aix 1.97 2.20 47.45 36.29 46.99 36.29 Pb 1.73 1.88 44.19 28.33 44.19 33.99 MaeW est 1.83 2.00 44.28 27.25 42.97 31.25 Aads 1.51 1.88 44.33 28.08 41.38 32.51 MaeEast 1.57 1.80 42.27 28.18 40.51 29.94 allo cation and deallo cation are used, this ratio is ab out 16. In either case, the ratio is far more than the less than t w o factor b et w een the time to insert/delete from a red-blac k tree and that to searc h a red-blac k tree. This order of magnitude jump in the ratio of insert/delete time and searc h time is due to the sev eral join and split op erations needed to insert/delete in to/from a CRBT. The times of T able 4{3 cannot b e compared with the times for corresp onding op erations on an optimal VST as rep orted b y Sahni and Kim in [72]. This is b ecause in the exp erimen ts conducted in [72], the database prefxes w ere considered in the order they app ear in eac h database rather than in a random order. F urther, in the exp erimen ts of [72], w e started with an optimal VST that con tained the frst 90% of the database prefxes and then inserted the remaining 10%. The a v erage time for eac h of these latter inserts is rep orted in [72]. The delete times are similarly obtained b y remo ving the last 10% of the prefxes from an optimal VST that initially has all 100% of the prefxes. The run times for our CRBT structure for the exp erimen t conducted in [72 ] are sho wn in T able 4{4. Notice that in this exp erimen t, the cost of an insert/delete is only 15 times that of a searc h when dynamic allo cation and deallo cation are used. When static allo cation and deallo cation are used, this ratio drops to ab out 7. Figures 4{15 through 4{17 compare the run times for the searc h, insert, and delete op erations using the P aix database and the CRBT and optimal VST structures. The searc h time using the CRBT structure is ab out 4 times that when an optimal

PAGE 143

129 T able 4{4: Execution time (in sec) for original databases Searc h Insert Delete Database Dyn Sta Dyn Sta Dyn Sta P aix 2.33 2.21 27.91 13.96 30.24 18.61 Pb 1.98 1.98 28.33 14.16 31.16 19.83 MaeW est 2.28 2.28 29.31 13.03 29.31 16.28 Aads 2.22 1.85 29.56 14.78 29.56 18.48 MaeEast 1.76 2.20 26.42 17.61 30.82 17.61 2 3 4 5 6 7 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 2.2 2.4 k Search Time (usec) OptVST CRBT Figure 4{15: Searc h time (in sec) comparison for P aix VST of heigh t k = 2 is used. Ho w ev er, when k = 2, eac h insert tak es ab out 6 times the time tak en b y our CRBT structure with dynamic allo cation/deallo cation and 12 times the time tak en b y our CRBT structure with static allo cation/deallo cation! F or the delete op eration, these ratios are 26 and 43, resp ectiv ely Note that these ratios increase as w e increase k So, although the CRBT is slo w er than optimal VSTs for the searc h op eration, it is considerably faster for the insert and delete op erations! 4.5 Summary The collection of red-blac k searc h trees (CRBT) data structure dev elop ed b y us pro vides the frst kno wn w a y to p erform longest-prefx matc hes, as w ell as prefx insert and delete in O (log n ) time. The CRBT is in teresting from b oth the theoretical and practical viewp oin ts. F rom the theoretical viewp oin t, it represen ts the frst data

PAGE 144

130 2 3 4 5 6 7 10 1 10 2 10 3 k Insert Time (usec) OptVST CRBTDyn CRBTSta Figure 4{16: Insert time (in sec) comparison for P aix 2 3 4 5 6 7 10 1 10 2 10 3 k Delete Time (usec) OptVST CRBTDyn CRBTSta Figure 4{17: Delete time (in sec) comparison for P aix

PAGE 145

131 structure to supp ort dynamic router-table op erations in O (log n ) time eac h. F rom the practical viewp oin t, w e note that the CRBT p ermits up dates to b e p erformed in m uc h less time than when structures suc h as the VST, whic h are optimized for searc h, are used. In a securit y-conscious en vironmen t, our router w ould need to op erate in a blo c king mo de (i.e., an insert/delete m ust complete b efore an y in b ound/outb ound pac k ets are forw arded). In suc h an en vironmen t, the CRBT w ould blo c k trac for ab out 1/10th the time the VST w ould. On the other hand, when trac is not blo c k ed due to an insert/delete in progress, the VST w ould pro cess pac k ets at 4 to 5 times the rate of the CRBT. In another application en vironmen t, our concern ma y b e the total time to pro cess a stream of searc h/insert/delete requests. Supp ose that for ev ery pair of insert and delete requests, there are m searc h requests. F urther, supp ose that the searc h/insert/delete times for the optimal VST are 0.5/170/800 micro seconds and that the times for the CRBT are 2.2/14/19 micro seconds (these are appro ximately the times for P aix). Then, when m > 551, the optimal VST w ould p erform b etter than the CRBT. It is w orth noting that the tec hnique dev elop ed here ma y b e used to extend the biased skip list sc heme of Ergun et al. [25 26 ] so that lo okups, inserts, and deletes ma y all b e done in O (log n ) exp ected time, while pro viding go o d exp ected p erformance for burst y access patterns (see Sahni and Kim [74 ]). Finally as noted in the in tro duction, when a compressed binary trie is used to represen t a dynamic router table, eac h of the dynamic router-table op erations tak es O ( W ) time. Since the compressed trie algorithms ha v e m uc h smaller constan ts than do the CRBT algorithms and since n 2 W the CRBT is exp ected to outp erform the compressed binary trie structure for relativ ely small v alues of n The threshold at whic h the compressed binary trie giv es b etter o v erall p erformance is higher for IPv6 than for IPv4.

PAGE 146

CHAPTER 5 D YNAMIC LOOKUP F OR BURSTY A CCESS P A TTERNS In this c hapter, w e fo cus on the managemen t of router tables for a dynamic en vironmen t (i.e., searc h, insert, and deletes are p erformed dynamically) in whic h the access pattern is burst y In a bursty access pattern the n um b er of dieren t destination addresses in an y subsequence of q pac k ets is << q That is, if the destination of the curren t pac k et is d there is a high probabilit y that d is also the destination for one or more of the next few pac k ets. The fact that In ternet pac k ets tend to b e burst y has b een noted in [18, 46], for example. Although the CRBT structure prop osed in Chapter 4 can b e used for our prop osed en vironmen t to p erform searc h, insert, and delete in O (log n ) time eac h, the CRBT structure do es not mak e explicit use of the assumed burst y nature of the access pattern. Despite the fact that the CRBT w as not dev elop ed with burst y access in mind, the cac he prop erties of con temp orary computers mak e the CRBT p erform b etter when the access pattern is burst y than when it is not. An alternativ e to the CRBT, the A CRBT, is prop osed in this c hapter. This alternativ e requires more memory than is required b y the CRBT. Ho w ev er, exp erimen ts indicate that the searc h, insert, and delete op erations are faster in the A CRBT than in the CRBT. In this c hapter, w e dev elop also t w o data structures that explicitly accoun t for the assumed burst y access pattern. The frst, biased skip lists with prefx trees (BSLPT), extends the biased skip list (BSL) structure prop osed in [25] for static router tables. This extension allo ws us to fnd longest matc hing prefxes as w ell as to insert and delete prefxes in O (log n ) exp ected time ( n is the n um b er of prefxes in the router table). The second data structure, whic h is a spla y-tree data structure called collection of spla y trees (CST), is an adaptation of the CRBT data structure prop osed 132

PAGE 147

133 b y us in Chapter 4. Using this structure, w e can fnd longest matc hing prefxes and insert and delete prefxes in O (log n ) amortized time. Exp erimen ts conducted b y us indicate that the CST structure has b est p erformance when the access pattern is v ery burst y Otherwise, the A CRBT structure p erforms b est. It should b e noted that BSLPTs ha v e O (log n ) exp ected complexit y p er op eration, CSTs ha v e O (log n ) amortized complexit y and CRBTs and A CRBTs ha v e O (log n ) w orst-case complexit y The eectiv eness of using sup erno de implemen tations of red-blac k trees to implemen t the A CRBT structure w as ev aluated exp erimen tally Our tests sho w that using a sup erno de implemen tation for the fron t end of the A CRBT generally reduces the searc h time while k eeping the insert and delete time relativ ely unc hanged. In Section 5.1, w e extend the BSL structure of [25] to BSLPT. Our prop osed CST is describ ed in Section 5.2. In Section 5.3, w e describ e the comparison of BITs and ABITs in b oth time complexit y and space complexit y In Section 5.4, w e presen t our exp erimen tal results. 5.1 Biased Skip Lists with Prefx T rees Ergun et al. [25] prop ose a biased skip list structure for burst y access. Their structure is suitable for static router tables; that is, router tables in whic h the prefx set do es not c hange (no prefxes are inserted or deleted). T o construct the initial BSL, the n prefxes of the router table are pro cessed to determine the up to 2 n )Tj/T1_0 11.9552 Tf12.1094 0 Td(1 basic in terv als they defne. The longest matc hing prefx for destination addresses that fall within eac h basic in terv al as w ell as for destination addresses that equal an end p oin t is determined. A master list of basic in terv als along with the determined longest matc hing prefx information is constructed. This list is indexed in to using a skip list structure (see [62 ] or [70 ] for a description of skip lists). Figure 5{1 sho ws a p ossible skip list structure for the basic in terv als of Figure 4{2(a). In a biased skip list, ranks are assigned to the basic in terv als. r ank ( a ) < r ank ( b ) whenev er in terv al a is accessed more recen tly than in terv al b Basic in terv al ranks are

PAGE 148

134 0 0 0 0 0 10 10 11 16 16 16 18 19 19 19 19 23 23 23 19 16 Biased skip list 0 19 23 10 11 16 18 r1 r2 r3 r4 r5 r6 r7 Master list Figure 5{1: Skip list represen tation for basic in terv als of Figure 4{2(a) used to bias the selection of in terv als that are represen ted in skip list c hains that are close to the top. Hence, searc hing for a destination address that is matc hed b y a basic in terv al of smaller rank tak es less time than when the matc hing in terv al has higher rank. If the destination of the curren t pac k et is d and the matc hing basic in terv al for d as determined b y the skip list searc h algorithm is a then r ank ( a ) b ecomes 1 and all previous ranks b et w een 1 and ol dR ank ( a ) are increased b y 1. The skip list structure is up dated follo wing eac h searc h to rerect the curren t ranks. Consequen tly searc hing for basic in terv als that w ere recen tly accessed is faster than searc hing for those that w ere last accessed a long time ago. This prop ert y mak es the BSL structure of [25] p erform b etter on burst y access patterns than on random access patterns. Regardless of the nature of the access, the exp ected time for a searc h in a BSL is O (log n ). When a router table has to supp ort the insertion and deletion of prefxes, the BSL structure of [25] b ecomes prohibitiv ely exp ensiv e. This is b ecause, an insert or delete can require us to c hange the longest matc hing prefx information stored in O ( n ) basic in terv als. T o o v ercome this problem, w e enhance the BSL structure of [25] b y adding the bac k-end CPT in Chapter 4. Notice that a BSL is functionally v ery similar to the fron t-end BIT of the CRBT structure in Chapter 4. Both are used to searc h for the matc hing basic in terv al. Unlik e the CRBT, whic h uses a

PAGE 149

135 b asicIntervalPointer to index in to a bac k-end CPT from whic h LM P ( d ) ma y b e determined for d v alues that are not matc hed b y prefxes whose length is W a BSL retains LM P ( d ) within the master-list no de for the matc hing in terv al. This is the reason insertion and deletion op erations in a BSL tak e O ( n ) time. T o o v ercome this problem, w e augmen t the BSL structure with a bac k-end CPT structure and confgure the master-list no des exactly the same as the external no des of a BIT. Note that eac h master-list no de of a BSL, lik e eac h external no de of a BIT, represen ts a basic in terv al. In an augmen ted BSL, eac h master-list no de has three p oin ters: startPointer fnishPointer and b asicIntervalPointer F or a master-list no de that represen ts the basic in terv al [ r i ; r i +1 ], startPointer ( fnishPointer ) p oin ts to the header no de of the prefx tree (in the bac k-end structure) for the prefx (if an y) whose range start and fnish p oin ts are r i ( r i +1 ). b asicIntervalPointer p oin ts to the prefx no de for the basic in terv al [ r i ; r i +1 ]. This prefx no de is in a prefx tree of the bac k-end structure. T o searc h a BSLPT for LM P ( d ), w e use the BSL searc h sc heme of [25] to searc h the fron t-end BSL for a matc hing basic in terv al. This searc h gets us to the master-list no de Q for a matc hing basic in terv al. When d equals the left (righ t) end-p oin t of the matc hing basic in terv al and startPointer ( fnishPointer ) is not n ull, LM P ( d ) is p oin ted to b y startPointer ( fnishPointer ). Otherwise, w e follo w the b asicIntervalPointer in Q to get in to a prefx tree. Then, w e follo w paren t p oin ters to the header of the prefx tree to determine the prefx that is LM P ( d ). T o insert a new prefx P = [ s; f ] ( s and f are, resp ectiv ely the range start and fnish p oin ts of P ), w e note that: 1. When neither s nor f is a new end p oin t, the n um b er of basic in terv als is unc hanged. F urther, since the b asicIntervalPointer in eac h master-list no de p oin ts to the prefx no de that represen ts the same basic in terv al, none of the b asicIntervalPointer s in the BSL c hange (what do es c hange, is the prefx tree

PAGE 150

136 these prefx no des need to b e in, but, this c hange is handled b y the insert algorithm for the bac k-end structure). Only the startPointer s and fnishPointer s in master-list no des ma y c hange when neither s nor f is a new end-p oin t. F urther, since these p oin ters are set only when there is a prefx with s = f = an end p oin t of the basic in terv al, at most t w o of these p oin ters ma y c hange when j P j = W ( j P j is the length or n um b er of bits in the prefx P ). The v alue of these c hanged p oin ters is the header no de of the prefx tree for the new prefx P When j P j < W s 6= f and no p oin ter in the BSL c hanges. 2. When s is a new end p oin t, the matc hing basic in terv al [ a; b ] for s splits in to t w o. Note that, b ecause of our assumption that the default prefx is alw a ys presen t, ev ery destination address d has a matc hing basic in terv al. The replacemen t basic in terv als for i = [ a; b ] are i 1 = [ a; s ] and i 2 = [ s; b ]. It is easy to see the follo wing: i 1 :star tP ointer = i:star tP ointer and i 2 :f inishP ointer = i:f inishP ointer When j P j = W i 1 :f inishP ointer = i 2 :star tP ointer = p oin ter to header no de of prefx tree for P ; i 1 :basicI nter v al P ointer and i 2 :basicI nter v al P ointer p oin t to prefx no des for i 1 and i 2, resp ectiv ely Both of these prefx no des are in the same prefx tree as w as the prefx no de for i This prefx tree can b e found b y follo wing paren t p oin ters from i:basicI nter v al P ointer Figure 5{2 sho ws the situation when j P j < W Since, j P j < W i 1 :f inishP ointer = i 2 :star tP ointer = n ull : Also, from Figure 5{2, w e see that i 1 :basicI nter v al P ointer = i:basic I nter v al P ointer (note that the prefx no de i:basicI nter v al P ointer no w represen ts the smaller basic in terv al i 1) and i 2 :basicI nter v al P ointer p oin ts to a new prefx no de for the in terv al i 2. This new prefx no de is to b e in the prefx tree for P .

PAGE 151

137 a b P s i1 i2 i Figure 5{2: Start p oin t s of P splits the basic in terv al [ a; b ] 3. The case when f is a new end p oin t is similar to the preceding case. 4. When b oth s and f are new end p oin ts, b oth cases (2) and (3) apply Figure 5{3 giv es the insert algorithm. The steps for modif y C P T are not pro vided, b ecause these steps are iden tical to those used to mo dify the CPT bac k-end structure of a CRBT. The in terested reader is referred to Chapter 4 for a statemen t of these steps. The algorithm to delete a prefx uses the delete from CPT algorithm in Chapter 4 to mo dify the bac k-end structure appropriately together with an algorithm to mak e the necessary c hanges to the BSL fron t end. Since these c hanges to the BSL are the in v erse of what happ ens during an insert, w e do not detail them here. When the bac k-end prefx trees are red-blac k trees, the exp ected complexit y of a searc h, insert, and delete is O (log n ). Ergun et al. [25 ] also prop ose a dynamic skip list structure (DSL) for static router tables. Although this structure is exp ected to p erform b etter than the BSL structure, Ergun et al. ha v e not implemen ted it, but ha v e c hosen to sta y with the simpler BSL structure. W e note that the DSL structure of [25] ma y also b e augmen ted with a bac k-end prefx-tree structure as describ ed here. The result is a structure in whic h searc h, insert, and delete all tak e O (log n ) exp ected time eac h. The unaugmen ted DSL structure of [25] p erforms a searc h in O (log n ) exp ected time. Ho w ev er, insert and delete tak e O ( n ) time eac h.

PAGE 152

138 algorithm inser tP r ef ix ( s; f ) f // insert a prefx handl eP oint ( s ); if ( s 6= f ) handl eP oint ( f ); modif y C P T ( s; f ) g algorithm handl eP oint ( u ) f // mo dify BSL to accoun t for the end p oin t u Searc h BSL for a matc hing basic in terv al Q for u if ( u is an end p oin t of Q ) f if ( j P j = W ) f Up date startPointer in master-list no de (if an y) for basic in terv al that starts at u Up date fnishPointer in master-list no de (if an y) for basic in terv al that fnishes at u g return ; g // u is a new end p oin t Replace the basic in terv al Q = [ a; b ] with the in terv als [ a; u ] and [ u; b ] and up date p oin ters as describ ed in text. return ; g Figure 5{3: BSLPT insert algorithm

PAGE 153

139 5.2 Collection of Spla y T rees Spla y trees [79 ] are binary searc h trees that self adjust so that the deep est encoun tered surviving no de in an y op eration b ecomes the ro ot follo wing the op eration. This self-adjusting prop ert y mak es spla y trees a promising data structure for burst y applications. Unfortunately w e cannot simply replace the use of red-blac k trees in the CRBT structure in Chapter 4 with spla y trees. This is b ecause, the red-blac k tree used to represen t the BIT in Chapter 4 uses t w o t yp es of no des: in ternal and external. Ev ery searc h has to reac h an external no de from whic h it ma y progress in to a bac k-end prefx tree. F ollo wing the spla y op eration, the reac hed external no de should b ecome the ro ot. This isn't p ermissible. With this realization, w e w ere faced with the option of either mo difying the spla y step so that the paren t of the reac hed external no de b ecame the ro ot or c hanging the BIT structure so that the new structure did not ha v e external no des. The former option w as rejected as this w ould result in a fron t-end tree whose ro ot alw a ys has an external no de as one of its c hildren. So, w e prop ose an alternativ e BIT structure (ABIT) in whic h w e ha v e only in ternal no des. When the BIT structure of a CRBT is replaced b y the ABIT structure, w e get, what w e call, the A CRBT structure. Eac h (in ternal) no de of the ABIT represen ts a single basic in terv al. Therefore, eac h ABIT no de has felds for the left and righ t end p oin ts of a basic in terv al, a left and righ t c hild p oin ter, p oin ters to LM P ( d ) when d equals either the left or righ t end p oin t of the basic in terv al, and a p oin ter to the corresp onding basic-in terv al no de in a prefx tree for the case when d lies b et w een the left and righ t end p oin ts of the basic in terv al (this is the same as the b asicIntervalPointer used in CRBT and BSLPT). Figure 5{4 sho ws the ABIT for the basic in terv als of Figure 4{2(a). In this fgure, only the left and righ t end p oin ts of eac h basic in terv al are sho wn (the LM P ( d ) v alues and the b asicIntervalPointers are not sho wn).

PAGE 154

140 0,10 10,11 11,16 18,19 16,18 19,23 23,31 r1 r2 r3 r4 r5 r6 r7 Figure 5{4: Alternativ e Base In terv al T ree corresp onding to Figure 4{2(a) The CST structure prop osed b y us for burst y applications of dynamic router tables uses the ABIT structure as the fron t end. The bac k end is the same prefx tree structure prop osed b y us in Chapter 4 except that eac h prefx tree is implemen ted as a spla y tree rather than as a red-blac k tree. The algorithms to searc h, insert, and delete for our prop osed CST structure are simple adaptations of those for the prop osed BSLPT structure. Therefore, further details are not pro vided. W e note that the amortized complexit y of these algorithms is O (log n ). Notice that spla y trees also ma y b e used to implemen t the bac k-end prefx trees of the BSLPT structure (rather than red-blac k trees). 5.3 Comparison of BITs and ABITs Space Complexit y A BIT has t w o t yp es of no des, in ternal and external. F or an n -prefx router-table, the n um b er, I of in ternal no des is at most 2 n ; the n um b er, E of external no des is I )Tj/T1_0 11.9552 Tf12.4694 0 Td(1. 1 Eac h in ternal no de has a feld for an in terv al end-p oin t plus three p oin ter 1 Although a binary tree with m in ternal no des has m + 1 external no des, the frst and last of these ha v e no signifcance in a BIT. So, these t w o external no des are not explicitly represen ted.

PAGE 155

141 felds (t w o c hildren and one paren t feld) and eac h external no de has three p oin ter felds (in to the bac k-end structure). Additionally ev ery no de has a t yp e feld that enables us to distinguish b et w een in ternal and external no des and a color feld to distinguish b et w een red and blac k no des. Supp ose that d b ytes are needed to store an in terv al end-p oin t, p for a no de p oin ter, t for the t yp e feld, and c for the color feld. W e see that the storage requiremen t of a BIT is ( d + t + c + 3 p ) I + ( t + 3 p )( I )Tj/T1_0 11.9552 Tf10.5494 0 Td(1) = ( d + 2 t + c + 6 p ) I )Tj/T1_1 11.9552 Tf12.4694 0 Td(t )Tj/T1_0 11.9552 Tf12.4694 0 Td(3 p b ytes. The ABIT for the same n -prefx router-table has I )Tj/T1_0 11.9552 Tf12.2293 0 Td(1 no des, eac h of whic h has t w o end-p oin t, one color, and 6 p oin ter felds (1 paren t, 2 c hildren, and 3 to bac k-end no des). So, the storage required b y the ABIT is (2 d + c + 6 p )( I )Tj/T1_0 11.9552 Tf11.9894 0 Td(1) = (2 d + c + 6 p ) I )Tj/T1_0 11.9552 Tf11.8694 0 Td(2 d )Tj/T1_1 11.9552 Tf11.9894 0 Td(c )Tj/T1_0 11.9552 Tf11.8694 0 Td(6 p b ytes. In theory t = c = 1 = 8 (i.e., 1 bit) is sucien t and p = 4 b ytes. F or IPv4, d = 4 b ytes. So, memor y ( B I T ) = 28 : 375 I )Tj/T1_0 11.9552 Tf12.3494 0 Td(12 : 125 b ytes and memor y ( AB I T ) = 32 : 125 I )Tj/T1_0 11.9552 Tf12.8293 0 Td(32 : 125 b ytes. The BIT structure requires ab out 12% less space than is required b y the ABIT. F or IPv6, the dieren tial is ab out 25%. In practice, ho w ev er, when co ding in a language suc h as C++, w e fnd it con v enien t to implemen t the t yp e and color feld using the data t yp e byte (note that the alternativ e t yp e boolean is 4 b ytes long in the g++ implemen tation of C++). So, in practice, t = c = 1. So, in practice, for IPv4 router tables, memor y ( B I T ) 31 I b ytes and memor y ( AB I T ) 33 I b ytes. So, for IPv4 router tables, using the con v enien t 1-b yte p er color and t yp e feld implemen tation, BIT tak es 6% less memory than is tak en b y ABIT! Time Complexit y Since the n um b er of in ternal no des in the BIT and ABIT for a giv en n -prefx router table is almost the same and since a BIT has external no des, whereas an ABIT do es not, heig ht ( AB I T ) heig ht ( B I T ) )Tj/T1_0 11.9552 Tf12.4694 0 Td(1. When searc hing for the longest matc hing prefx, the w orst-case n um b er of k ey comparisons in the BIT is heig ht ( B I T ). In an ABIT, this n um b er is appro ximately 2 heig ht ( AB I T ). So, in the w orst case, the ABIT requires appro ximately t wice as man y k ey comparisons as are required b y the

PAGE 156

142 BIT. Ho w ev er, on a v erage, the dieren tial is somewhat less, b ecuase in a BIT ev ery searc h neccessarily go es all the w a y to an external no de, whereas in an ABIT, a searc h ma y terminate at an y in ternal no de. F urther, in con temp orary computers, the larger n um b er of k ey comparisons p erformed required b y an ABIT ma y not manifest itself as an increase in measured run time. This is b ecause, the run time is dominated b y cac he misses and not b y k ey comparisons. If heig ht ( B I T ) = 10 and heig ht ( AB I T ) = 9, the w orst-case cac he-misses during a searc h is 10 for a BIT and 9 for an ABIT. So, despite the larger n um b er of k ey comprisons, a searc h in an ABIT ma y tak e 10% (on a v erage, the searc h go es to one lev el ab o v e the lo w est lev el; the a v erage searc h in an ABIT, assuming all no de accesses result in cac he misses, causes 20% few er cac he misses than is the case for a BIT) less time than tak en b y a searc h in a BIT. 5.4 Exp erimen tal Results W e programmed 2 the three burst y-access-sc hemes describ ed in this c hapter in C++ and measured the p erformance of these sc hemes as w ell as that of the CRBT sc heme in Chapter 4 using IPv4 prefx databases. Tw o implemen tations of the A CBR T sc heme w ere used. In the frst, the BIT structure w as implemen ted using con v en tional red-blac k trees (i.e., binary searc h trees with one elemen t p er no de). W e con tin ue to refer to this implemen tation as A CRBT. In the second implemen tation, the BIT structure of the A CRBT w as implemen ted using a sup erno de red-blac k tree [40]. This sup erno de implemen tation is referred to as SA CRBT. In the SA CRBT implemen tation, the BIT structure is a binary red-blac k searc h tree, eac h no de of whic h has up to 5 basic in terv als (i.e., up to 6 end p oin ts). These end p oin ts are stored in ascending order. 2 W e are grateful to the authors of [25 ] for pro viding us their C co de for the BSL structure. Our BSLPT co de utilized a C++ translation of the BSL co de of [25 ] for the fron t end and our o wn co de for the bac k end.

PAGE 157

143 F or the CST structure, w e used top-do wn spla y trees when implemen ting the ABIT and b ottom-up spla y trees for the bac k-end prefx trees. T op-do wn spla y trees w ere used for the ABIT, b ecause past exp erimen tal studies sho w that these p erform b etter than b ottom-up spla y trees in normal applications (see [40], for example). In the case of prefx trees, the searc h op eration b egins at the b ottom of the spla y tree (in normal searc h applications, this searc h w ould b egin at the ro ot). So, searc h time is optimized b y using b ottom-up rather than top-do wn spla y trees. Our implementation of the BSLPT structure also emplo y ed b ottom-up spla y trees in the bac k-end structure. The co des w ere run on a SUN Ultra En terprise 4000/5000 computer. The g++ compiler with optimization lev el -O2 w as used. F or test data, w e used the fv e IPv4 prefx databases of T able 4{1. T otal Memory Requiremen t T able 5{1 sho ws the amoun t of memory used b y eac h of the four data structures. As predicted b y our analysis of Section 5.3, the fron t-end structure of the CRBT (i.e., the BIT) tak es ab out 6% less memory than is tak en b y the ABIT of A CRBT. The bac k-end structure is iden tical in the CRBT and A CRBT. Therefore, these tak e the same amoun t of memory Both the fron t and bac k ends of the CST structure tak es less memory than their coun terparts in the CRBT and A CRBT. This is b ecause, the fron t-end spla y tree requires no paren t p oin ter, no t yp e feld, and no color feld and the bac k end spla y tree requires no color feld. The total memory required b y the CST structure is ab out 12% less than that required b y the A CRBT structure and ab out 9% less than that required b y the CRBT structure. The BSLPT, on the other hand, requires ab out t wice the memory required b y eac h of the other structures. As noted in [40], the use of a sup erno de structure reduces memory requiremen t relativ e to the meory required b y the corresp onding binary tree structure. The fron t end of the SA CRBT tak es ab out 36% less memory than is required b y the A CRBT fron t

PAGE 158

144 T able 5{1: Memory requiremen t (in KB) Sc hemes P aix Pb MaeW est Aads MaeEast CRBT F ron t End 5,068 2,097 1,819 1,606 1,346 Bac k End 6,465 2,664 2,315 2,041 1,712 T otal 11,534 4,761 4,134 3,648 3,056 A CRBT F ron t End 5,395 2,232 1,936 1,710 1,432 Bac k End 6,465 2,664 2,315 2,041 1,712 T otal 11,861 4,897 4,252 3,751 3,145 SA CRBT F ron t End 3,467 1,438 1,246 1,101 921 Bac k End 6,465 2,664 2,315 2,041 1,712 T otal 9,932 4,102 3,561 3,142 2,633 CST F ron t End 4,578 1,894 1,643 1,450 1,215 Bac k End 5,970 2,460 2,137 1,885 1,580 T otal 10,548 4,354 3,781 3,336 2,796 BSLPT F ron t End 17,530 7,698 7,015 6,496 4,578 Bac k End 5,970 2,460 2,137 1,885 1,580 T otal 23,508 10,158 9,152 8,381 6,159 end. Similar reductions in fron t and bac k end memory requiremen ts are exp ected when sup erno de implemen tations are used for the other data structures considered here. Figure 5{5 histograms the total memory required b y eac h data structure. Searc h Time T o measure the a v erage searc h time, w e frst constructed the data structure for eac h of our fv e prefx databases. Elev en sets of test data w ere used. The destination addresses in the frst set, NODUP comprised the end p oin ts of the basic in terv als corresp onding to the database b eing searc hed. These end p oin ts w ere randomly p erm uted. The data set, DUP10 (DUP20), w as constructed from NODUP b y making 10 (20) consecutiv e copies of eac h destination address. F or the data set RAN10 (RAN20), w e randomly p erm uted ev ery blo c k of 50 (100) destination addresses from DUP10 (DUP20) (note that eac h suc h blo c k has only 5 dieren t destination addresses). The remaining 6 data sets w ere constructed from the 6 trace sequences obtained from http://ita.ee.lbl.gov/html /con trib /DE C-PK T.ht ml and http://ita.ee.lbl.gov/htm l/co ntri b/LB L-P KT.h tml F our of these trace sequences represen t all wide-area trac b et w een Digital Equipmen t Corp oration and the rest

PAGE 159

145 MaeEast Aads MaeWest Pb Paix 0 5 10 15 20 25 databasesTotal Memory (MB) CRBT ACRBT SACRBT CST BSLPT Figure 5{5: T otal memory requiremen t (in MB) T able 5{2: T race sequences T race #pac k ets #addresses DEC-PKT-1 3,062,997 8,084 DEC-PKT-2 3,574,364 7,328 DEC-PKT-3 4,086,844 11,855 DEC-PKT-4 5,244,322 11,280 LBL-PKT-4 910,385 1,539 LBL-PKT-5 757,629 1,621 of the w orld for four dieren t one-hour in terv als in 1995. The remaining 2 trace sequences represen t all wide-area trac b et w een La wrence Berk eley Lab oratory and the rest of the w orld for t w o dieren t one-hour in terv als in 1994. T able 5{2 giv es the c haracteristics of these 6 traces. 3 In this table, #pac k ets is the n um b er of pac k ets in the trace sequence and #addresses is the n um b er of dieren t destination addresses for these pac k ets. 3 Only the TCP UDP and TCP SYN/FIN/RST pac k ets in the original traces are rep orted and used b y us. The ra w traces include other pac k ets that ha v e insucien t asso ciated information to b e useful.

PAGE 160

146 These data sets represen t dieren t degrees of burstiness in the access pattern. In NODUP all searc h addresses are dieren t. So, this access pattern represen ts the lo w est p ossible degree of burstiness. In DUP20, ev ery blo c k of 20 consecutiv e IP pac k ets has the same destination address, a high degree of burstiness. In RAN20, ev en though eac h destination address o ccurs 20 times, the destination addresses are not necessarily in consecutiv e pac k ets; there is some measure of temp oral spread among the recurring addresses. Since the n um b er of dieren t destination addresses in the trace data is 2 to 3 orders of magnitude less than the n um b er of pac k ets, these traces represen t a m uc h higher degree of burstiness than do the DUP and RAN data sets. F or example, in DEC-PKT-1, eac h destination address o ccurs appro ximately 379 times. Ho w ev er, temp oral burstiness (i.e., in v erse of the time b et w een the frst and last pac k ets that ha v e the same destination address) is far greater in the DUP and RAN data sets. F or the NODUP DUP and RAN data sets, the total searc h time for eac h data set w as measured and then a v eraged to get the time for a single searc h. This exp erimen t w as rep eated 10 times. So, 10 a v erage searc h times w ere obtained. The a v erage of these a v erages together with the standard deviation (SD) in the a v erages is giv en in T ables 5{3 and 5{4. Since pac k et destination-addresses giv en in the trace sequences w ere sanitized for priv acy reasons, w e randomly mapp ed the distinct addresses in the trace data, whic h are consecutiv e in tegers b eginning at 1, in to random 32-bit addresses. F or our exp erimen t, this random mapping to 32-bit addresses w as done 10 times for eac h trace and the a v erage searc h time for eac h mapping as w ell as the a v erage of the a v erages and the standard deviation of the a v erages computed. T ables 5{5 and 5{6 giv e the times for the trace sequences. Figures 5{6 and 5{7 histogram the a v erage times for all data sets. The frst thing to notice is that ev en though the CRBT and A CRBT are not designed to p erform b etter on burst y access paterns than on non-burst y ones, they

PAGE 161

147 T able 5{3: Searc h time (in sec) for CRBT, A CRBT, and SA CRBT structures on NODUP DUP and RAN data sets Sc hemes P aix Pb MaeW est Aads MaeEast CRBT NODUP 6.51 5.77 5.50 5.29 5.15 SD 0.30 0.09 0.05 0.09 0.12 DUP10 1.55 1.41 1.36 1.36 1.31 SD 0.00 0.01 0.03 0.01 0.03 DUP20 1.29 1.21 1.17 1.17 1.13 SD 0.02 0.01 0.01 0.02 0.01 RAN10 2.08 1.54 1.58 1.39 1.49 SD 0.00 0.00 0.04 0.03 0.05 RAN20 1.89 1.44 1.37 1.28 1.22 SD 0.01 0.01 0.01 0.02 0.02 A CRBT NODUP 4.59 3.69 3.54 3.39 3.14 SD 0.05 0.10 0.08 0.08 0.00 DUP10 1.13 1.08 1.06 1.04 1.02 SD 0.01 0.00 0.02 0.00 0.01 DUP20 0.98 0.91 0.89 0.88 0.86 SD 0.00 0.00 0.00 0.00 0.00 RAN10 1.74 1.54 1.40 1.37 1.26 SD 0.00 0.00 0.01 0.16 0.00 RAN20 1.44 1.10 1.05 1.00 0.98 SD 0.11 0.01 0.02 0.01 0.01 SA CRBT NODUP 4.76 3.96 3.64 3.56 3.46 SD 0.06 0.23 0.25 0.16 0.11 DUP10 1.16 0.98 0.95 0.94 0.91 SD 0.01 0.03 0.00 0.01 0.00 DUP20 0.96 0.86 0.85 0.84 0.83 SD 0.00 0.00 0.00 0.00 0.00 RAN10 1.72 1.41 1.50 1.28 1.52 SD 0.01 0.33 0.16 0.00 0.00 RAN20 1.15 1.00 0.98 0.97 0.97 SD 0.00 0.00 0.00 0.00 0.03

PAGE 162

148 T able 5{4: Searc h time (in sec) for CST and BSLPT structures on NODUP DUP and RAN data sets Sc hemes P aix Pb MaeW est Aads MaeEast CST NODUP 9.91 8.68 8.45 8.23 7.78 SD 0.38 0.06 0.07 0.09 0.11 DUP10 1.23 1.13 1.11 1.09 1.04 SD 0.00 0.00 0.00 0.00 0.01 DUP20 0.76 0.71 0.69 0.68 0.66 SD 0.00 0.00 0.00 0.00 0.00 RAN10 1.69 1.59 1.56 1.53 1.50 SD 0.00 0.00 0.00 0.00 0.01 RAN20 1.24 1.18 1.18 1.16 1.12 SD 0.00 0.00 0.00 0.01 0.00 BSLPT NODUP 88.20 77.70 75.16 73.10 66.14 SD 3.75 0.15 0.35 0.30 0.32 DUP10 9.36 8.28 8.13 7.90 7.13 SD 0.36 0.01 0.28 0.18 0.03 DUP20 5.01 4.38 4.26 4.15 3.81 SD 0.10 0.07 0.01 0.01 0.01 RAN10 9.73 8.50 8.22 8.21 7.43 SD 0.46 0.03 0.01 0.51 0.02 RAN20 5.39 4.79 4.72 4.60 4.34 SD 0.02 0.01 0.01 0.01 0.01

PAGE 163

149 T able 5{5: Searc h time (in sec) for CRBT, A CRBT, and SA CRBT structures on trace sequences Sc hemes P aix Pb MaeW est Aads MaeEast CRBT DEC-PKT-1 3.91 3.44 3.03 2.91 2.95 SD 0.16 0.13 0.17 0.16 0.13 DEC-PKT-2 3.87 3.43 3.05 2.85 3.07 SD 0.25 0.20 0.21 0.17 0.21 DEC-PKT-3 4.17 3.57 3.26 3.05 3.25 SD 0.27 0.14 0.18 0.25 0.11 DEC-PKT-4 3.99 3.49 3.13 2.92 3.18 SD 0.20 0.13 0.16 0.14 0.11 LBL-PKT-4 4.03 3.54 3.16 3.01 3.22 SD 0.19 0.14 0.17 0.15 0.19 LBL-PKT-5 4.03 3.46 3.21 3.05 3.19 SD 0.15 0.09 0.10 0.10 0.08 A CRBT DEC-PKT-1 3.53 3.08 2.87 2.61 2.77 SD 0.17 0.16 0.19 0.15 0.12 DEC-PKT-2 3.61 3.03 2.90 2.58 2.85 SD 0.22 0.19 0.21 0.17 0.19 DEC-PKT-3 3.96 3.21 2.99 2.74 2.93 SD 0.24 0.16 0.15 0.20 0.08 DEC-PKT-4 3.73 3.14 2.94 2.65 2.75 SD 0.21 0.14 0.14 0.14 0.09 LBL-PKT-4 3.65 3.19 2.97 2.68 2.97 SD 0.16 0.17 0.18 0.15 0.16 LBL-PKT-5 3.71 3.09 2.93 2.71 2.84 SD 0.16 0.09 0.11 0.11 0.11 SA CRBT DEC-PKT-1 2.81 2.47 2.16 2.07 2.05 SD 0.15 0.08 0.16 0.09 0.11 DEC-PKT-2 2.83 2.51 2.14 2.06 2.09 SD 0.22 0.13 0.14 0.11 0.15 DEC-PKT-3 2.97 2.65 2.32 2.10 2.19 SD 0.24 0.15 0.10 0.13 0.10 DEC-PKT-4 2.82 2.58 2.20 2.09 2.21 SD 0.16 0.11 0.12 0.10 0.07 LBL-PKT-4 2.91 2.56 2.22 2.14 2.13 SD 0.15 0.10 0.17 0.13 0.14 LBL-PKT-5 2.81 2.56 2.17 2.06 2.19 SD 0.12 0.08 0.08 0.10 0.05

PAGE 164

150 T able 5{6: Searc h time (in sec) for CST and BSLPT structures on trace sequences Sc hemes P aix Pb MaeW est Aads MaeEast CST DEC-PKT-1 1.35 1.41 1.46 1.43 1.42 SD 0.07 0.08 0.08 0.08 0.08 DEC-PKT-2 1.35 1.41 1.39 1.46 1.41 SD 0.09 0.09 0.09 0.09 0.11 DEC-PKT-3 1.45 1.54 1.53 1.52 1.51 SD 0.04 0.05 0.06 0.06 0.06 DEC-PKT-4 1.43 1.48 1.46 1.51 1.55 SD 0.18 0.07 0.08 0.07 0.07 LBL-PKT-4 1.21 1.27 1.26 1.28 1.30 SD 0.09 0.10 0.11 0.10 0.11 LBL-PKT-5 1.53 1.25 1.29 1.23 1.24 SD 0.09 0.07 0.05 0.05 0.07 BSLPT DEC-PKT-1 5.27 5.03 4.95 4.74 4.98 SD 0.45 0.40 0.45 0.48 0.45 DEC-PKT-2 5.19 4.97 4.79 4.58 4.75 SD 0.62 0.67 0.59 0.59 0.57 DEC-PKT-3 5.94 5.48 5.36 5.18 5.45 SD 0.39 0.41 0.39 0.38 0.45 DEC-PKT-4 5.49 5.36 5.13 4.90 5.28 SD 0.40 0.45 0.37 0.36 0.45 LBL-PKT-4 4.65 3.77 4.01 3.68 3.94 SD 0.33 0.39 0.41 0.43 0.39 LBL-PKT-5 3.95 3.74 3.89 3.53 3.76 SD 0.38 0.44 0.45 0.38 0.44

PAGE 165

151 actually do so. F or example, searc hing the database P aix tak es ab out four times as m uc h time p er searc h using the data set NODUP (non-burst y) as it do es using the data set DUP10! This is b ecause of the cac he eect{the frst searc h for destination d in CRBT p oten tially causes h cac he misses, where h is the heigh t of the BIT. Subsequen t searc hes ha v e (almost) no cac he misses b ecause the BIT and bac k-end no des needed for the searc h are still in cac he! So, computer cac hes automatically result in impro v ed p erformance for burst y access patterns! The p erformance impro v emen t b et w een RAN10 and NODUP isn't as m uc h (a factor of 3 rather than 4), b ecause of increased cac he conricts caused b y in termediate searc hes for dieren t destinations. Still, it app ears that most of the accessed no des remain in cac he long enough to service all rep eated searc hes for the same destination in RAN10. The reduction in a v erage searc h-time in going from DUP10 to DUP20 and from RAN10 to RAN20 isn't quite as dramatic|a mere 9% to 17%. The use of sup erno des in the SA CRBT generally reduces the searc h time b y a small amoun t. Similar reductions in searc h times are seen when the trace data sets are used together with the CRBT, A CRBT, and SA CRBT structures. In the case of the trace data sets, the SA CRBT pro vides a m uc h greater reduction in searc h time o v er the A CRBT than w as the case with the NODUP DUP and RAN data sets. The data structures, CST and BSLPT, that are designed to p erform b etter, through a reduction in w ork (n um b er of comparisons and n um b er of p oin ters follo w ed) on burst y access patterns sho w a m uc h greater p erformance impro v emen t when the access pattern is burst y than when it is not. F or example, on the database P aix, CST and BSLPT to ok ab out 8 to 9 times as m uc h time on the NODUP data set as they did on the DUP10 data set. A further 26% to 46% reduction in a v erage searc h-time w as evidenced when going from DUP10 to DUP20 and from RAN10 to RAN20. The a v erage searc h times for the trace data sets w as comparable to that for the DUP and RAN data sets. On the database P aix, for example, CST to ok ab out 7 times as m uc h

PAGE 166

152 time on the NODUP data set as on the DEC-PKT-1 data set. This ratio w as ab out 17 for BSLPT! Although a w orst-case and a v erage searc h in an A CRBT requires more comparisons than do es a similar searc h in a CRBT, our exp erimen ts sho w that the a v erage searc h time in an A CRBT is less than that in a CRBT. In fact, in our tests, the A CRBT w as generally faster than the CRBT and in most of our tests, the searc h time in an A CRBT w as less than that in a CRBT b y b et w een 16% and 29%. Although the CST and A CRBT w ere comp etitiv e on the DUP10 and RAN10 data sets, the CST to ok ab out 50% more time than did A CRBT on the NODUP test, and to ok ab out 20% less time than tak en b y the A CRBT on the DUP20 test. On the trace data, the CST to ok ab out one-third to one-half the time tak en b y the CRBT, A CRBT, and SA CRBT structures. The BSLPT w as no matc h for the other structures on an y of the data sets! Insert Time T o measure the a v erage insert time for eac h of the data structures, w e frst obtained a random p erm utation of the prefxes in eac h of the databases. Next, the frst 75% of the prefxes in this random p erm utation w ere inserted in to an initially empt y data structure. The time to insert the remaining 25% of the prefxes w as measured and a v eraged. This timing exp erimen t w as rep eated 10 times. T able 5{7 sho ws the a v erage of the 10 a v erage insert times as w ell as the standard deviation (SD) in the measured a v erage time. Figure 5{8 histograms the a v erage times of T able 5{7. Our insert exp erimen ts sho w that the BSLPT structure is the clear loser for this op eration. The a v erage insert in a BSLPT tak es ab out t wice as m uc h time as it do es in an y of the three tree-based fron t-end structures. The insert op eration is faster in the CST than in the A CRBT, and inserts are faster in the A CRBT than in the CRBT. An insert in the P aix database, for example, tak es 18% less time when an A CRBT is used than when a CRBT is used. The time for the CST is 14% less than

PAGE 167

153 T able 5{7: Av erage time to insert a prefx (in sec) Sc hemes P aix Pb MaeW est Aads MaeEast CRBT A V G 35.12 32.86 31.51 31.62 29.94 SD 1.35 0.00 0.54 1.03 0.00 A CRBT A V G 28.65 26.17 25.26 24.82 23.42 SD 0.39 0.35 0.67 0.62 0.85 SA CRBT A V G 33.58 30.82 29.69 29.70 28.00 SD 0.36 0.71 1.82 0.83 0.99 CST AD V 24.60 23.68 22.39 22.60 21.66 SD 0.14 0.35 0.54 1.40 0.85 BSLPT AD V 64.10 58.47 67.84 67.82 51.60 SD 0.65 1.52 0.73 0.46 0.85 T able 5{8: Av erage time to delete a prefx (in sec) Sc hemes P aix Pb MaeW est Aads MaeEast CRBT A V G 38.89 36.03 34.90 35.31 33.81 SD 1.30 0.71 0.54 0.83 0.74 A CRBT A V G 30.98 28.55 27.34 27.33 25.71 SD 0.39 0.47 0.00 0.77 0.90 SA CRBT A V G 33.95 31.27 29.82 30.88 28.70 SD 0.43 0.79 1.43 0.83 0.85 CST A V G 32.88 31.72 30.86 31.18 29.41 SD 0.31 0.00 0.62 1.62 0.85 BSLPT A V G 74.01 65.49 63.68 63.10 57.76 SD 0.77 0.89 0.96 0.99 1.11 for the A CRBT. The insert time for the A CRBT is ab out 15% less than that for the sup erno de implemen tation SA CRBT. Delete Time T o measure the a v erage delete time, w e started with the data structure for eac h database and remo v ed the last 25% of the prefxes in the database. These prefxes w ere determined using the p erm utation generated for the insert time test. Once again, the test w as run 10 times and the a v erage of the a v erages computed. T able 5{8 sho ws the a v erage time to delete a prefx as w ell as the standard deviation in this time o v er the 10 test runs. Figure 5{9 histograms the a v erage times of T able 5{8.

PAGE 168

154 As w as the case for the searc h and insert op erations, for the delete op eration to o, the BSLPT structure is the clear loser. A delete in the BSLPT structure tak es more than t wice the time it tak es in an y of the tree-based fron t-end structures. Among the tree-based fron t-end structures, the delete time is the least for the A CRBT structure. F or example, on the P aix database, a delete using CST tak es ab out 6% more time than when an A CRBT is used; a delete using CRBT tak es 25% more time than when an A CRBT is used. The use of sup erno des increases the a v erage delete time b y 10%. 5.5 Summary W e ha v e mo difed the CRBT structure prop osed in Chapter 4 b y c hanging the structure of the fron t end. Although, the mo difed structure, called the A CRBT, requires more memory than do es the CRBT, exp erimen ts conducted b y us indicate that the A CRBT is generally faster than the CRBT for the searc h, insert, and delete op erations. By replacing the fron t-end red-blac k trees of the CRBT structure in Chapter 4 with top-do wn spla y trees and the bac k-end red-blac k trees with b ottom-up spla y trees, w e arriv e at the CST data structure that tak es less memory than tak en b y either the CRBT and A CRBT. F or acutely burst y access patterns (e.g., DUP20 and the trace data sets), searc hes in the CST structure are m uc h faster than in the A CRBT structure. When the access pattern is not burst y (e.g., NODUP), searc hes in a CST tak e ab out t wice as m uc h time as they do in an A CRBT. F or the insert op eration, CSTs are sligh tly faster than A CRBTs, but for the delete op eration, the rev erse is the case. W e can add the CRBT bac k-end to the BSL structure of [25] to obtain a biased skip list structure that p ermits the insertion and deletion of prefxes in O (log n ) exp ected time. Ho w ev er, our exp erimen ts indicate that the resulting data structure is highly non-comp etitiv e with CRBTs, A CRBTs, and CSTs for the searc h, insert, and delete op erations. The BSLPT is alw a ys signifcan tly inferior to the CST on

PAGE 169

155 all op erations. Therefore, biased skip lists cannot b e recommended for ev en highly burst y access applications. Of the structures w e tested, the A CRBT is recommended for non-burst y to mo derately burst y applications, the CST is recommended for highly burst y applications. Finally w e observ e that the add-on data structures, suc h as a cac he list of most-recen t destination addresses, suggested in [25] for further impro v emen t in searc h p erformance ma y also b e used in conjunction with the data structures of this c hapter. Finally b y using sup erno de binary trees [40 ] in place of traditional one-elemen tp er-no de binary trees, w e can impro v e the searc h p erformance of our data structures. Although w e did this only for the fron t-end of the A CRBT structure, w e exp ect the results to carry o v er for the remaining structures.

PAGE 170

156 MaeEast Aads MaeWest Pb Paix 0 10 20 30 40 50 60 70 80 90 databasesSearch Time (usec) CRBT ACRBT SACRBT CST BSLPT (a) NODUP MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 7 8 9 10 databasesSearch Time (usec) (b) DUP10 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) (c) DUP20 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 7 8 9 10 databasesSearch Time (usec) (d) RAN10 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) (e) RAN20 Figure 5{6: Av erage searc h time for NODUP DUP and RAN data sets

PAGE 171

157 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) CRBT ACRBT SACRBT CST BSLPT (a) DEC-PKT-1 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) (b) DEC-PKT-2 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) (c) DEC-PKT-3 MaeEast Aads MaeWest Pb Paix 0 1 2 3 4 5 6 databasesSearch Time (usec) (d) DEC-PKT-4 MaeEast Aads MaeWest Pb Paix 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 databasesSearch Time (usec) (e) LBL-PKT-4 MaeEast Aads MaeWest Pb Paix 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 databasesSearch Time (usec) (f ) LBL-PKT-5 Figure 5{7: Av erage searc h time for trace sequences

PAGE 172

158 MaeEast Aads MaeWest Pb Paix 0 10 20 30 40 50 60 70 databasesInsert Time (usec) CRBT ACRBT SACRBT CST BSLPT Figure 5{8: Av erage time to insert a prefx MaeEast Aads MaeWest Pb Paix 0 10 20 30 40 50 60 70 80 databasesDelete Time (usec) CRBT ACRBT SACRBT CST BSLPT Figure 5{9: Av erage time to delete a prefx

PAGE 173

CHAPTER 6 CONCLUSIONS AND FUTURE W ORK In this dissertation, w e ha v e dev elop ed sev eral ecien t algorithms for IP lo okup in In ternet routers. Through analyzing and exp erimen ting with them, w e ha v e seen that the prop osed algorithms w ork w ell compared with other kno wn algorithms. W e summarize our con tributions in Section 6.1 and discuss directions for future w ork in Section 6.2. 6.1 Conclusions W e ha v e prop osed data structures for one-dimensional pac k et classifcation in this dissertation. The main con tributions of the dissertation are discussed as follo ws. Multibit T ries: In Chapter 2, w e impro v ed on the dynamic programming algorithms of [82], whic h determine the strides of optimal m ultibit fxed-stride and v ariable-stride tries, b y pro viding alternativ e dynamic programming form ulations for b oth fxedand v ariable-stride tries. While the asymptotic complexities of our algorithms are the same as those of the corresp onding algorithms of [82], exp erimen ts using real IPv4 routing table data indicate that our algorithms run considerably faster. An added feature of our v ariable-stride trie algorithm is the abilit y to insert and delete prefxes taking a fraction of the time needed to construct an optimal v ariable-stride trie from scratc h. Binary Searc h on Prefx Length: Chapter 3 considers the collection of hash tables (CHT) organization, whic h w as prop osed in [87] for an IP router table. Sriniv asan et al. [80 ] ha v e prop osed the use of con trolled prefx-expansion to reduce the n um b er of distinct prefx-lengths (also equal to the n um b er of hash tables in the CHT). The complexit y of their algorithm is O ( nW 2 ), where n is the n um b er of prefxes, and W is the length of the longest prefx. W e ha v e dev elop ed an algorithm that 159

PAGE 174

160 minimizes storage requiremen t but tak es O ( nW 3 + k W 4 ) time, where k is the desired n um b er of distinct lengths. Also, w e ha v e prop osed impro v emen ts to the heuristic of [80]. O (log n ) Dynamic Router-T able: In Chapter 4, w e prop osed a data structure in whic h prefx matc hing, prefx insertion, and deletion can eac h b e done in O (log n ) time, where n is the n um b er of prefxes in the router table. F or W -bit destination addresses, the use of binary tries enables us to determine the longest matc hing prefx as w ell as to insert and delete a prefx in O ( W ) time, indep enden t of n Since n << 2 W in real router tables, it is desirable to dev elop a data structure that p ermits these three op erations in O (log n ). Although the prop osed data structure is slo w er than optimized v ariable-stride tries for longest prefx matc hing, the prop osed data structure is considerably faster for the insert and delete op erations. Dynamic Lo okup for Burst y Access P atterns: W e ha v e dev elop ed data structures for dynamic router-tables for burst y access-patterns in Chapter 5. In this c hapter, w e frst form ulated a v arian t, A CRBT, of the CRBT data structure prop osed in Chapter 4 for dynamic router-tables. By replacing the red-blac k trees used in the A CRBT with spla y trees, w e obtain the CST structure in whic h searc h, insert, and delete tak e O (log n ) amortized time p er op eration, where n is the n um b er of prefxes in the router table. By replacing the fron t end of the CST with biased skip lists, w e obtain the BSLPT structure in whic h searc h, insert, and delete tak e O (log n ) exp ected time. The CST and BSLPT structures are designed so as to p erform m uc h b etter when the access pattern is burst y than when it is not. F or extremely burst y access patterns, the CST structure is b est. Otherwise, the A CRBT is recommended. A sup erno de implemen tation of the A CRBT usually has b etter searc h p erformance than do es the traditional one-elemen t-p er-no de implemen tation. T able 6{1 summarizes the p erformance c haracteristics of v arious data structures for the longest matc hing-prefx problem.

PAGE 175

161 T able 6{1: P erformance of data structures for longest matc hing-prefx Data Structure Searc h Up date Memory Usage Linear List O ( n ) O ( n ) O ( n ) End-p oin t Arra y O (log n ) O ( n ) O ( n ) Sets of Equal-Length Prefxes O ( + log W ) exp ected O ( p nW log W ) exp ected O ( n log W ) 1-bit tries O ( W ) O ( W ) O ( nW ) s-bit T ries O ( W =s ) O (2 s nW =s ) CRBT O (log n ) O (log n ) O ( n ) A CRBT O (log n ) O (log n ) O ( n ) BSLPT O (log n ) exp ected O (log n ) exp ected O ( n ) CST O (log n ) amortized O (log n ) amortized O ( n ) 6.2 F uture W ork Although m uc h eort has b een dev oted to the IP router tables, this area has not y et reac hed complete maturit y W e discuss some issues for future researc h directions as follo ws. IPv6 Extensions: Some router table structures that giv e a go o d lo okup p erformance for IPv4 prefxes ma y face scalabilit y problems when applied to IPv6. F or instance, a large-stride m ultibit trie to accomo date a h undred thousand IPv6 prefxes ma y exp onen tially increase the amoun t of memory; whereas, with small-stride no des, it ma y tak e high execution time for w orst-case lo okup op erations. Ev en with mo derate strides, the m ultibit trie ma y suer from either the tremendous amoun t of memory or the n um b er of cac he misses, or b oth. F uture Routers: Gigabit routers can t ypically pro vide total line capacit y of up to tens of gigabits p er second, and supp ort p ort in terface sp eeds up to OC-48 (2.5 Gbps). Curren t terabit routers are designed with the aggregate line capacit y to handle thousands of gigabits p er second and to pro vide high-scalable p erformance and high p ort densit y [1]. These routers can supp ort p ort in terface sp eeds as high as OC-192 (10 Gbps) and b ey ond. T o supp ort lo okups in future routers, order of magnitude faster lo okup times than those of curren t routers are required. W e ma y

PAGE 176

162 need to use on-c hip memory and/or SRAM memory to allo w suc h a sp eed. Th us, some scalable and ecien t data structures should b e dev elop ed to ft more than h undered of thousands prefxes in to on-c hip SRAM. Switc hing v ersus Routing: The basic dierence b et w een switc hing and routing is that switc hing uses indexing to determine the next hop for a pac k et in the address table whereas routing uses se ar ching (or lo okup ). Since indexing is O (1) operation, it ma y b e m uc h faster than an y searc h teac hnique. Because of this, man y p eople started thinking ab out replacing routers with switc hes wherev er p ossible and v endors in tro duced sev eral pro ducts in to the mark et. There are t w o v ery dieren t approac hes that com bine la y er 2 switc hing and la y er 3 routing. The frst approac h (e.g., IP switc hing [56 ] and m ulti-proto col o v er A TM (MPO A) [2]) aims at impro ving routing p erformance b y separating the transmission of net w ork con trol information from the normal data trac. Con trol trac passes through the routers to initiate a call or connection setup, whereas normal data trac can b e switc hed through the already established path. The other approac h (e.g., tag switc hing [64] and MPLS [68]) addresses W AN (wide area net w ork) route scalabilit y issues [58 ]. Routing decisions are p erformed once at the en try p oin t to the W AN and the remaining forw arding decisions within the W AN switc h infrastructure are based on switc hing tec hniques. P ac k et Classifcation: Despite the v ast amoun t of atten tion to the pac k et classifcation [4, 5, 11 28, 34, 35 43, 63 81, 83, 88 ], there is a scaling problem ev en for medium size databases when rules con tain more than 2 felds. Both QoS and securit y guaran tees require a fner discrimination of pac k ets based on felds other than the destination address. Classifers ha v e ev olv ed from frew alls [15 ] that flter out un w an ted pac k ets at the edge routers of net w orks using rule tables of up to 500 rules. The adv en t of DiServ [7] and p olicing applications has giv en an an ticipation that classifers could supp ort a few h undred thousand rules at edge routers [49]. Ho w ev er, while the curren t b est classifcation algorithms [5, 34, 35] w ork w ell for

PAGE 177

163 databases of up to (sa y) 1000 rules, they require v ery large amoun ts of memory for larger databases. W e hop e the IP lo okup algorithms describ ed in this dissertation will pro vide a brigehead for exploring these areas further.

PAGE 178

REFERENCES [1] D. Allen, T erabit routing: Simplifying the core, T ele c ommunic ations Online h ttp://www.telecommagazine.com/, Ma y 1999. Site last visited June 2003. [2] A TM F orum, Multi-proto col o v er A TM sp ecifcation, v ersion 1.1, af-mp oa0114.000, h ttp://www-comm.itsi.disa.mil/atmf/mp oa.h tml, Ma y 1999. [3] J. Aw ey a, IP router arc hitectures: An o v erview, Journal of Systems A r chite ctur e V ol. 46, 2000, 483-511. [4] F. Bab o escu, S. Singh, and G. V arghese, P ac k et classifcation for core routers: Is there an alternativ e to CAMs?, Pr o c e e dings of IEEE INF OCOM 03 April 2003. [5] F. Bab o escu and G. V arghese, Scalable pac k et classifcation, Pr o c e e dings of A CM SIGCOMM 01 Septem b er 2001. [6] F. Bak er, Requiremen ts for IP v ersion 4 routers, RF C 1812, IETF, h ttp://www.ietf.org/rfc.h tml, June 1995. [7] S. Blak e, D. Blac k, M. Carlson, E. Da vies, Z. W ang, and W. W eiss, An arc hitecture for dieren tiated services, RF C 2475, IETF, h ttp://www.ietf.org/rfc.h tml, Decem b er 1998. [8] R. Braden, D. Borman, and C. P artridge, Computing the In ternet c hec ksum, RF C 1071, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1988. [9] B. Braden, L. Zhang, S. Berson, S. Herxog, and S. Jamin, Resource reserv ation proto col (RSVP) { v ersion 1 functional sp ecifcation, RF C 2205, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1997. [10] A. Bremler-Barr, Y. Afek, and S. Har-P eled, Routing with a clue, Pr o c e e dings of A CM SIGCOMM 99 Septem b er 1999, 203-214. [11] M. Buddhik ot, S. Suri, and M. W aldv ogel, Space decomp osition tec hniques for fast la y er-4 switc hing, Pr o c e e dings of A CM SIGCOMM 01 Septem b er 2001. [12] V. Cerf, Computer net w orking: Global infrastructure for the 21st cen tury h ttp://www.cs.w ashington.edu/homes/lazo ws k a/cra/net w orks .h tml, 1995. Site last visited June 2003. [13] G. Chandranmenon and G. V arghese, T rading pac k et headers for pac k et pro cessing, IEEE T r ansactions on Networking April 1996. 164

PAGE 179

165 [14] T. Chaney A. Fingerh ut, M. Fluc k e, and J. T urner, Design of a gigabit A TM switc h, Pr o c e e dings of IEEE INF OCOM 97 Marc h 1997. [15] W. Cheswic k, S. Bello vin, and A. Rubin, Firew alls and In ternet securit y: Rep elling the wily hac k er, Addison-W esley Professional, 2nd ed., 2002, 464 pages. [16] G. Cheung and S. McCanne, Optimal routing table design for IP address lo okups under memory constrain ts, Pr o c e e dings of IEEE INF OCOM 99 Marc h 1999. [17] T. Chiueh and P Pradhan, High-p erformance IP routing table lo okup using CPU cac hing, Pr o c e e dings of IEEE INF OCOM 99 Marc h 1999. [18] K. Clay H. Braun, and G. P olyzos, A parameterizable metho dology for In ternet trac ro w profling, IEEE Journal of Sele cte d A r e as in Communic ations 1995. [19] D. Comer, Computer net w orks and In ternets with In ternet applications, 3rd ed., Pren tice Hall, NJ, 2001, 683 pages. [20] D. Decasp er, Z. Dittia, G. P arulk ar, and B. Plattner, Router plugins: A soft w are arc hitecture for next generation routers, Pr o c e e dings of A CM SIGCOMM 98 August 1998, 191-202. [21] S. Deering and R. Hinden, In ternet proto col, v ersion 6 (IPv6) sp ecifcation, RF C 2460, IETF, h ttp://www.ietf.org/rfc.h tml, Decem b er 1998. [22] M. Degermark, A. Bro dnik, S. Carlsson, and S. Pink, Small forw arding tables for fast routing lo okups, Pr o c e e dings of A CM SIGCOMM 97 Octob er 1997, 3-14. [23] W. Do eringer, G. Karjoth, and M. Nassehi, Routing on longest-matc hing prefxes, IEEE/A CM T r ansactions on Networking 4, 1, 1996, 86-97. [24] R. Dra v es, C. King, V. Sriniv asan, and B. Zill, Constructing optimal IP routing tables, Pr o c e e dings of IEEE INF OCOM 99 Marc h 1999. [25] F. Ergun, S. Mittra, S. Sahinalp, J. Sharp, and R. Sinha, A dynamic lo okup sc heme for burst y access patterns, Pr o c e e dings of IEEE INF OCOM 01 2001. [26] F. Ergun, S. Sahinalp, J. Sharp, and R. Sinha, Biased skip lists for highly sk ew ed access patterns, 3r d Workshop on A lgorithm Engine ering and Exp eriments 2001. [27] D. Estrin, D. F arinacci, A. Helm y D. Thaler, S. Deering, M. Handley V. Jacobson, C. Liu, P Sharma, and L. W ei, Proto col indep enden t m ulticast-sparse mo de (PIM-SM): Proto col sp ecifcation, RF C 2362, IETF, h ttp://www.ietf.org/rfc.h tml, June 1998. [28] A. F eldman and S. Muth ukrishnan, T radeos for pac k et classifcation, Pr o c e e dings of IEEE INF OCOM 2000 2000.

PAGE 180

166 [29] R. Fielding, J. Gett ys, J. Mogul, H. F ryst yk, L. Masin ter, P Leac h, and T. Berners-Lee, Hyp ertext transfer proto col { HTTP/1.1, RF C 2616, IETF, h ttp://www.ietf.org/rfc.h tml, June 1999. [30] V. F uller, T. Li, J. Y u, and K. V aradhan, Classless in ter-domain routing (CIDR): An address assignmen t and aggregation strategy RF C 1519, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1993. [31] M. Gra y In ternet gro wth summary h ttp://www.mit.edu/p eople/mkgra y/net/ in ternet-gro wth-sumary .h tml, 1996. Site last visited June 2003. [32] P Gupta, S. Lin, and N. McKeo wn, Routing lo okups in hardw are at memory access sp eeds, Pr o c e e dings of IEEE INF OCOM 98 April 1998. [33] P Gupta and N. McKeo wn, Dynamic algorithms with w orst-case p erformance for pac k et classifcation, IFIP Networking 2000. [34] P Gupta and N. McKeo wn, P ac k et classifcation on m ultiple felds, Pr o c e e dings of A CM SIGCOMM 99 Septem b er 1999. [35] P Gupta and N. McKeo wn, P ac k et classifcation using hierarc hical in telligen t cuttings, Hot Inter c onne cts VII August 1999. [36] P Gupta, B. Prabhak ar, and S. Bo yd, Near-optimal routing lo okups with b ounded w orst case p erformance, Pr o c e e dings of IEEE INF OCOM 2000 2000. [37] C. Hedric k, Routing information proto col, RF C 1058, IETF, h ttp://www.ietf.org/rfc.h tml, June 1988. [38] R. Hinden, Applicabilit y statemen t for the implemen tation of classless in terdomain routing (CIDR), RF C 1517, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1993. [39] E. Horo witz, S. Sahni, and D. Meh ta, F undamen tals of data structures in C++, W.H. F reeman, NY, 1995, 653 pages. [40] H. Jung and S. Sahni, Sup erno de binary searc h trees, International Journal on F oundations of Computer Scienc e T o app ear. [41] K. Kim and S. Sahni, IP lo okup b y binary searc h on prefx length, Journal of Inter c onne ction Networks V ol. 3, No. 3 & 4, 2002, 105-128. [42] J. Klensin, Simple mail transfer proto col, RF C 2821, IETF, h ttp://www.ietf.org/rfc.h tml, April 2001. [43] T.V. Lakshman and D. Stiliadis, High-sp eed p olicy-based pac k et forw arding using ecien t m ulti-dimensional range matc hing, Pr o c e e dings of A CM SIGCOMM 98 August 1998.

PAGE 181

167 [44] B. Lampson, V. Sriniv asan, and G. V arghese, IP lo okup using m ulti-w a y and m ulticolumn searc h, Pr o c e e dings of IEEE INF OCOM 98 April 1998. [45] B. Leiner, V. Cerf, D. Clark, R. Kahn, L. Kleinro c k, D. Lync h, J. P ostel, L. Rob erts, and S. W ol, A brief history of the In ternet, h ttp://www.iso c.org/in ternet/history/brief.sh tml, August 2000. Site last visited June 2003. [46] S. Lin and N. McKeo wn, A sim ulation study of IP switc hing, Pr o c e e dings of IEEE INF OCOM 2000 2000. [47] K. Lougheed and Y. Rekh ter, A b order gatew a y proto col (BGP), RF C 1163, IETF, h ttp://www.ietf.org/rfc.h tml, June 1990. [48] T. Mallory and A. Kullb erg, Incremen tal up dating of the In ternet c hec ksum, RF C 1141, IETF, h ttp://www.ietf.org/rfc.h tml, Jan uary 1990. [49] C. Matsumoto, CAM v endors consider algorithmic alternativ es, EE Times h ttp://www.eetimes.com/story/OEG20020520S0014, Ma y 2002. Site last visited June 2003. [50] A. McAuley and P F rancis, F ast routing table lo okups using CAMs, Pr o c e e dings of IEEE INF OCOM 93 1993, 1382-1391. [51] N. McKeo wn, A fast switc hed bac kplane for a gigabit switc hed router, Business Communic ations R eview V ol. 27, No. 12, Decem b er 1997. [52] N. McKeo wn, M. Izzard, A. Mekkittikul, B. Ellersic k, and M. Horo witz, The tin y tera: A pac k et switc h core, IEEE Micr o Jan uary 1997. [53] Merit, IPMA statistics, h ttp://nic.merit.edu/ipma, (snapshot on Septem b er 13, 2000), 2000. [54] D. Milo jicic, T rend w ars: In ternet tec hnology h ttp://www.computer.org/concurrency/articles/trendw ars 200 1.h tm, 2000. [55] J. Mo y OSPF v ersion 2, RF C 1247, IETF, h ttp://www.ietf.org/rfc.h tml, July 1991. [56] P Newman, G. Minshall, and L. Huston, IP switc hing and gigabit routers, IEEE Communic ations Magazine Jan uary 1997. [57] S. Nilsson and G. Karlsson, F ast address lo ok-up for In ternet routers, IEEE Br o adb and Communic ations 1998. [58] D. P assmore and J. Bransky Route once switc h man y h ttp://www.burtongroup.com/Public/WhiteP ap ers/Route Once wp.h tml, July 1997. Site last visited June 2003.

PAGE 182

168 [59] J. P ostel, In ternet proto col, RF C 791, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1981. [60] J. P ostel, T ransmission con trol proto col, RF C 793, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1981. [61] J. P ostel, User datagram proto col, RF C 768, IETF, h ttp://www.ietf.org/rfc.h tml, August 1980. [62] W. Pugh, Skip lists: a probailistic alternativ e to balanced trees, Comm. of the A CM 33, 6, 1990. [63] L. Qiu, G. V arghese, and S. Suri, F ast frew all implemen tation for soft w are and hardw are based routers, 9th International Confer enc e on Network Pr otoc ols No v em b er 2001. [64] Y. Rekh ter, B. Da vie, D. Katz, E. Rosen, and G. Sw allo w, Cisco systems' tag switc hing arc hitecture o v erview, RF C 2105, IETF, h ttp://www.ietf.org/rfc.h tml, F ebruary 1997. [65] Y. Rekh ter and T. Li, A b order gatew a y proto col 4 (BGP-4), RF C 1771, IETF, h ttp://www.ietf.org/rfc.h tml, Marc h 1995. [66] Y. Rekh ter and T. Li, An arc hitecture for IP address allo cation with CIDR, RF C 1518, IETF, h ttp://www.ietf.org/rfc.h tml, Septem b er 1993. [67] J. Reynolds and J. P ostel, Assigned n um b ers, RF C 1700, IETF, h ttp://www.ietf.org/rfc.h tml, Octob er 1994. [68] E. Rosen, A. Visw anathan, and R. Callon, Multiproto col lab el switc hing arc hitecture, RF C 3031, IETF, h ttp://www.ietf.org/rfc.h tml, Jan uary 2001. [69] M. Ruiz-Sanc hez, E. Biersac k, and W. Dabb ous, Surv ey and taxonom y of IP address lo okup algorithms, IEEE Network 2001, 8-23. [70] S. Sahni, Data structur es, algorithms, and applic ations in Java McGra w Hill, NY, 2000. [71] S. Sahni and K. Kim, Ecien t construction of fxed-stride m ultibit tries for IP lo okup, Pr o c e e dings of 8th IEEE Workshop on F utur e T r ends of Distribute d Computing Systems 2001, 178-184. [72] S. Sahni and K. Kim, Ecien t construction of v ariable-stride m ultibit tries for IP lo okup, Pr o c e e dings of IEEE Symp osium on Applic ations and the Internet (SAINT) 2002, 220-227. [73] S. Sahni and K. Kim, Ecien t construction of m ultibit tries for IP lo okup, IEEE/A CM T r ansactions on Networking T o app ear.

PAGE 183

169 [74] S. Sahni and K. Kim, Ecien t dynamic lo okup for burst y access patterns. Submitted. [75] S. Sahni and K. Kim, O (log n ) dynamic pac k et routing, Pr o c e e dings of IEEE Symp osium on Computers and Communic ations 2002, 443-448. [76] S. Sahni, K. Kim, and H. Lu, Data structures for one-dimensional pac k et classifcation using most-sp ecifc-rule matc hing, Pr o c e e dings of International Symp osium on Par al lel A r chite ctur es, A lgorithms, and Networks (ISP AN) 2002, 3-14. [77] S. Sahni, K. Kim, and H. Lu, Data structures for one-dimenstional pac k et classifcation using most-sp ecifc-rule matc hing, International Journal of F oundations of Computer Scienc e T o app ear. [78] K. Sklo w er, A tree-based routing table for Berk eley Unix, T ec hnical Rep ort, Univ ersit y of California, Berk eley 1993. [79] D. Sleator and R. T arjan, Self-adjusting binary searc h trees, Journal of the A CM 32, 1985. [80] V. Sriniv asan, F ast and ecien t In ternet lo okups, CS Ph.D Dissertation W ashington Univ ersit y August 1999. [81] V. Sriniv asan, S. Suri, and G. V arghese, P ac k et classifcation using tuple space searc h, Pr o c e e dings of A CM SIGCOMM 99 Septem b er 1999. [82] V. Sriniv asan and G. V arghese, F aster IP lo okups using con trolled prefx expansion, A CM T r ansactions on Computer Systems F eb:1-40, 1999. [83] V. Sriniv asan, G. V arghese, S. Suri, and M. W aldv ogel, F aster and scalable la y er four switc hing, Pr o c e e dings of A CM SIGCOMM 98 August 1998. [84] S. Suri, G. V arghese, and P W arkhede, Multiw a y range trees: Scalable IP lo okup with fast up dates, Pr o c e e dings of GLOBECOM 01 2001. [85] A. T ammel, Ho w to surviv e as an ISP Networld Inter op 1997. [86] D. W aitzman, C. P artridge, and S. Deering, K. Lougheed and Y. Rekh ter, Distance v ector m ulticast routing proto col, RF C 1075, IETF, h ttp://www.ietf.org/rfc.h tml, No v em b er 1988. [87] M. W aldv ogel, G. V arghese, J. T urner, and B. Plattner, Scalable high sp eed IP routing lo okups, Pr o c e e dings of A CM SIGCOMM 97 Octob er 1997, 25-36. [88] T. W o o, A mo dular approac h to pac k et classifcation: Algorithms and results, Pr o c e e dings of IEEE INF OCOM 2000 2000.

PAGE 184

BIOGRAPHICAL SKETCH Kun Suk Kim receiv ed the B.E. and M.E. degrees in Computer Engineering from the Kyungp o ok National Univ ersit y Korea, in 1992 and 1994, resp ectiv ely He w as a researc h sta mem b er at the Electronics and T elecomm unications Researc h Institute, Korea, for fv e and a half y ears from 1994. Since 1999, he has b een taking the Ph.D. degree course in the Computer and Information Science and Engineering departmen t at the Univ ersit y of Florida, Gainesville, FL. He is in terested in net w orkbased systems, telecomm unications, and computer net w orks. He is curren tly w orking on algorithms for net w orks. 170


Permanent Link: http://ufdc.ufl.edu/UFE0000956/00001

Material Information

Title: Data Structures for Static and Dynamic Router Tables
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000956:00001

Permanent Link: http://ufdc.ufl.edu/UFE0000956/00001

Material Information

Title: Data Structures for Static and Dynamic Router Tables
Physical Description: Mixed Material
Copyright Date: 2008

Record Information

Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
System ID: UFE0000956:00001


This item has the following downloads:


Full Text











DATA STRUCTURES FOR STATIC AND DYNAMIC ROUTER TABLES


By

KUN SUK KIM















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA


2003


































Copyright 2003

by

Kun Suk Kim

















To my mother,

Kyung Hwan Lee

my wife,

Hye Ryung

and my sons,

Jin Sung and Daniel















ACKNOWLEDGMENTS

I appreciate God for my exciting and wonderful time at the University of Florida.

Interactions with the many multicultural students and great professors have enriched

me in both academic and social aspects.

I would like to deeply thank Distinguished Professor Sartaj Sahni, my advisor

for guiding me through the doctoral study. I thank God for having given me such an

ideal mentor. I have learned from him how an advisor should treat his students. His

enthusiastic and devoted attitude toward teaching and research has strongly affected

me. He has inspired me to solve difficult problems and given me invaluable advices.

I consider him as a role model for my life and hope to lift my abilities up to his high

standards in the future. I also thank Drs. Randy Chow, Richard Newman, Shigang

Chen, and Janise fM. NILr for serving on my supervisory committee.

I am thankful to Venkatachary Srinivasan and Jon Sharp for giving their pro-

gramming codes and comments for the multibit trie and BSL structures, respectively.

With their help I could start my research smoothly.

Special thanks go to my former advisor, Dr. Yann-Hang Lee for supporting me

for the first two semesters at University of Florida and encouraging me even after

his moving to Arizona State University. I would like to thank former and current

members of the Korean Student Association of the CISE department at the University

of Florida for their assistance and friendship. Thanks go to Daeyoung Kim, Yoonmee

Doh, Young Joon Byun, Myung Chul Song, and many others.

I am eternally grateful to Pastor Hee Young Sohn, who is my spiritual mentor;

and to the ministers at Korean Baptist Church of Gainesville (KBCG), for continu-

ously caring about me in Jesus love. Thanks go to Jin Kun Song, Dong Yul Sung,









and other former and current members of my cell church of KBCG for sharing their

lives with me and praying for me.

I cannot adequately express my gratitude to my mother, Kyung Hwan Lee. She

has worked hard to provide me with better educational opportunities that made it

possible for me to get this degree. I would like to thank Moon Suk, my brother; and

Mi Ok, my sister for their constant support and encouragement. I am also grateful

to my parents-in-law, Dae Hun Song and Ok Jo Choi; my five elder sisters-in-law

(and their husbands), Mi Hye, Mi Ah, Hye Young, and Hye Youn; and my younger

sister-in-law, Hyo Jae for supporting me both materially and morally.

At the end I have to mention my nuclear family that has put up with my absence

for many evenings while I finished this work. Thanks go to my wife, Hye Ryung; and

my sons, Jin Sung and Daniel for their love and tolerance.

I am grateful to all for their help and guidance and hope to remember their love

forever.














TABLE OF CONTENTS
page
A C K N O W L E D G M E N T S ................................................................................................. iv

L IST O F TA B LE S ........... .... ........ ... .... ...................... .... .... .............. viii

LIST OF FIGURES ................................. .. ... ... ................. .x

A B S T R A C T .............................................. ..........................................x iii

CHAPTER

1 IN TR OD U CTION ............................................... .. ... .... .............. ..

1.1 Internet R outer .................. ........................... ............ .. ........ ..
1.1.1 Internet Protocols .................................................. ........ .. .. ....... .. 4
1.1.2 Classless Inter-Domain Routing (CIDR) ...............................................6
1.1.3 Packet Forw arding ...........................................................................8
1.2 Packet Classification ............................................ ........ ............... 10
1.3 Prior W ork .......................... .. ............... .......................... 15
1.3.1 Linear List ...................... .... ......... .. ..............................15
1.3.2 End-Point Array ........................... .. .. ...... .........................16
1.3.3 Sets of Equal-Length Prefixes..............................................16
1 .3 .4 T rie s ................................................................................................. 1 9
1.3.5 B inary Search Trees ........................................ .......................... 20
1.5 .6 O th ers ......... ........ .............................................................. .......... 2 1
1.4 D issertation O u tlin e ............................................................... .....................2 2

2 M U LTIB IT TRIES ........................................................................ ............... 23

2.1 1-Bit Tries ................. ....... .....................23
2.2 Fixed-Stride Tries...................... ........................ ......26
2 .2 .1 D efin ition ..................... ... ......... ..................................2 6
2.2.2 Construction of Optimal Fixed-Stride Tries .......................................27
2.3 V ariable-Stride Tries .......... ............................................ ............... 38
2.3.1 D definition and Construction .......... .. .............. ................................. 38
2 .3 .2 A n E x am ple........... ...... .............................. ................ .. .... ..... .. 44
2.3.3 Faster k = 2 A lgorithm ........................................ ........ ............... 46
2.3.4 Faster k = 3 A lgorithm ........................................ ........ ............... 48
2 .4 E xperim mental R esults............................................................. .....................5 1
2.4.1 Performance of Fixed-Stride Algorithm .............................................51
2.4.2 Performance of Fixed-Stride Algorithm ............... ....... ...........52
2 .5 S u m m ary ..................................................... ................ 6 7









3 BINARY SEARCH ON PREFIX LENGTH ................................ .....................68

3.1 H euristic of Srinivasan ......... ............... .................................. ............... 71
3.2 O ptim al-Storage A lgorithm ......................................... ......................... 74
3.2.1 Expansion C ost .............................................. .......................... 75
3.2.2 Number of Markers ...................... ............. ................... 76
3.2.3 Algorithm for ECHT .......................... ............. .. ............... 78
3.3 A alternative Form ulation ......... ........................................ .... .. ................. ... 80
3.4 R educed-R ange H euristic........................................... .......................... 81
3.5 More Accurate Cost Estimator.................... .............. .. ...............95
3.6 E xperim ental R esults............................................... ............................. 95
3 .7 Sum m ary ........................................................................99

4 O(log n) DYNAMIC ROUTER-TABLE ........................................................101

4.1 Prefixes and R anges ......................................................... .............. 101
4.2 Properties of Prefix Ranges................................................................. 103
4.3 Representation Using Binary Search Trees...................................................105
4.3.1 Representation........ ...... ....................... ......... .. .. ................. 105
4.3.2 Longest Prefix M atching................................ ........................ 108
4.3.3 Inserting a Prefix ............................ ..... ...................110
4.3.4 D eleting a Prefix ............................................................................ 120
4.3.5 C om plexity ....... ............................ .. .......... .... .. ............ 123
4.3.6 C om m ents .............................................. ............ .. .............. 124
4.4 Experim ental R esults.............................................. ............................ 125
4 .5 S u m m a ry .................................................................................................. 12 9

5 DYNAMIC LOOKUP FOR BURSTY ACCESS PATTERNS..........................132

5.1 Biased Skip Lists w ith Prefix Trees ......... ...... ............... ..... .......... .. 133
5.2 C collection of Splay Trees .................................................... ........ .......... 139
5.3 Comparison of BITs and ABITs ........................................................... 140
5.4 Experim ental R esults.............................................. ............................ 142
5 .5 S u m m a ry .................................................................................................. 1 5 4

6 CONCLUSIONS AND FUTURE WORK .................................... ...............159

6 .1 C o n c lu sio n s .............................................................................................. 1 5 9
6.2 Future W ork ......................................... ................... .... ........ 161

REFERENCES .................................................... .............164

BIOGRAPHICAL SKETCH ............................................................. ............... 170
















LIST OF TABLES
Table pge


2-1 Prefix databases obtained from IPMA project on Sep 13, 2000............................25

2-2 Distributions of the prefixes and nodes in the 1-bit trie for Paix...........................25

2-3 Memory required (in Kbytes) by best k-level FST..............................................51

2-4 Execution time (in /sec) for FST algorithms, Pentium 4 PC .............................53

2-5 Execution time (in /sec) for FST algorithms, SUN Ultra Enterprise
4 0 0 0 /5 0 0 0 ....................................................... ................ 5 3

2-6 Memory required (in Kbytes) by best k-level VST ............................................54

2-7 Execution times (in msec) for first two implementations of our VST
algorithm Pentium 4 PC ......................................................... ............... 56

2-8 Execution times (in msec) for first two implementations of our VST
algorithm, SUN Ultra Enterprise 4000/5000 ............................... ............... .56

2-9 Execution times (in msec) for third implementation of our VST algorithm,
P en tiu m 4 P C ..................................................... ................ 5 7

2-10 Execution times (in msec) for third implementation of our VST algorithm,
SU N U ltra Enterprise 4000/5000.................................. ...................................... 57

2-11 Execution times (in msec) for our best VST implementation and the VST
algorithm of Srinivasan and Varghese, Pentium 4 PC...........................................59

2-12 Execution times (in msec) for our best VST implementation and the VST
algorithm of Srinivasan and Varghese, SUN Ultra Enterprise 4000/5000 ............59

2-13 Time (in msec) to construct optimal VST from optimal stride data, Pentium
4 P C ........................................................................... 6 1

2-14 Search time (in /sec) in optimal VST, Pentium 4 PC .......................................61

2-15 Insertion time (in /sec) for OptVST, Pentium 4 PC...........................................64

2-16 Deletion time (in /sec) for OptVST, Pentium 4 PC ...........................................64









2-17 Insertion time (in /sec) for Batch1, Pentium 4 PC..............................................64

2-18 Deletion time (in /sec) for Batchl, Pentium 4 PC .............................................65

2-19 Insertion time (in /sec) for Batch2, Pentium 4 PC..............................................65

2-20 Deletion time (in /sec) for Batch2, Pentium 4 PC .............................................66

3-1 Number of prefixes and markers in solution to ECHT(P, k) ..............................97

3-2 Number of prefixes and markers in solution to ACHT(P, k) ..............................98

3-3 Preprocessing time in milliseconds...................... ... .......................... 98

3-4 Execution time, in /sec, for ECHT(P, k) .................................... ............... 99

3-5 Execution time, in /sec, for ACHT(P, k) ..........................................100

4-1 Statistics of prefix databases obtained from IPMA project on Sep 13, 2000 ......125

4-2 M em ory for data structure (in Kbytes) ..................................... .................126

4-3 Execution time (in /sec) for randomized databases ................. ............ .......128

4-4 Execution time (in /sec) for original databases.......................................129

5-1 M em ory require ent (in KB) ..................................................... ... .......... 144

5 -2 T race sequ en ces ........................................... ............................ .................... 14 5

5-3 Search time (in /sec) for CRBT, ACRBT, and SACRBT structures on
NODUP, DUP, and RAN data sets................ ... .... ................. 147

5-4 Search time (in /sec) for CST and BSLPT structures on NODUP, DUP,
and R A N data sets............. ...................................... ................ .. .... ...... 148

5-5 Search time (in /sec) for CRBT, ACRBT, and SACRBT structures on
trace sequ en ces............................................... ................ 14 9

5-6 Search time (in /sec) for CST and BSLPT structures on trace sequences..........150

5-7 Average tim e to insert a prefix (in /sec) .................................. ............... 153

5-8 Average time to delete a prefix (in /sec) ......................................................153

6-1 Performance of data structures for longest matching-prefix...............................161















LIST OF FIGURES
Figure page


1-1 Internet structure .................. ...................................... .. ........ .. ..

1-2 Generic router architecture .................................. ...................................... 5

1-3 Form ats for IP packet header.............................................................................

1-4 Transport protocol header form ats............................................... ........ ....... 7

1-5 R outer table exam ple ........................................... ....................................... 10

2-1 Prefixes and corresponding 1-bit trie ............................................ ...............24

2-2 Prefix expansion and fixed-stride trie ...................................... ...............27

2-3 A lgorithm for fixed-stride tries.................................... ........................... ......... 38

2-4 Two-level VST for prefixes of Figure 2-1(a) ......................................................39

2-5 A prefix set and its expansion to four lengths............................................ 44

2-6 1-bit trie for prefixes of Figure 2-5(a).............. ............................. ...............44

2-7 Opt values in the computation of Opt(NO, 0, 4) ............................................. 45

2-8 Optimal 4-VST for prefixes of Figure 2-5(a) ............. ..................................46

2-9 Algorithm to compute C using Equation 2.20 ................................................. 47

2-10 Algorithm to compute Tusing Equation 2.22...................................................49

2-11 Memory required (in Kbytes) by best k-level FST ...............................................52

2-12 Execution time (in /sec) for FST algorithms, Pentium 4 PC .............................53

2-13 Execution time (in /sec) for FST algorithms, SUN Ultra Enterprise
4000/5000 ........................................... ........................... 54

2-14 Memory required (in Kbytes) for Paix by best k-VST and best FST ....................55

2-15 Execution times (in msec) for Paix for our three VST implementations,
P entium 4 P C ........................................58

x









2-16 Execution times (in msec) for Paix for our three VST implementations,
SU N U ltra Enterprise 4000/5000.................................. ............................ .......... 58

2-17 Execution times (in msec) for Paix for our best VST implementation and
the VST algorithm of Srinivasan and Varghese, Pentium 4 PC ..........................60

2-18 Execution times (in msec) for Paix for our best VST implementation and
the VST algorithm of Srinivasan and Varghese, SUN Ultra Enterprise
4000/5000 ........................................... ........................... 60

2-19 Search time (in nsec) in optimal VST for Paix, Pentium 4 PC.............................62

2-20 Insertion time (in /sec) for Paix, Pentium 4 PC ...................................................65

2-21 Deletion time (in /sec) for Paix, Pentium 4 PC ......................................... 66

3-1 C controlled prefix expansion........................................................ ............... 69

3-2 Prefixes and corresponding 1-bit trie............................................ .................. 73

3-3 Alternative binary tree for binary search .................................... ............... 75

3-4 LEC and EC values for Figure 3-2..................................... ........................ 76

3-5 LMC and MC values for Figure 3-2............................................77

3-6 Optimal-storage CHTs for Figure 3-2......................................... ............... 80

3-7 Algorithm for binary-search hash tables..................................... ............... 94

4-1 Prefixes and their ranges ......... ....... .. ......... ........ .............. ............... 102

4-2 Pictorial and tabular representation of prefixes and ranges.............................102

4-3 Types of prefix ranges .................................. ....................................................... 104

4-4 C B ST for F igure 4-2(a).......................................................................... ..... 106

4-5 Values of next are shown as left arrows .............................. .................107

4-6 Algorithm to find LM P(d) ................................................... ...... ............... 111

4-7 Pictorial representation of prefixes and ranges after inserting a prefix .............12

4-8 Basic interval tree and prefix trees after inserting P6 = 01 into Figure 4-4.......113

4-9 A lgorithm to insert an end point................................. ....................... .. .......... 114

4-10 Splitting a basic interval when lsb(u) = 1 ..... .......... ..... ........................ 115









4-11 Prefix trees after inserting P7 = 10* into P1-P5.................... ..............117

4-12 Algorithm to update prefix trees ............................ ...............119

4-13 P = S; P S and S starts at s; and P S and S finishes atf..............................122

4-14 Memory required (in Kbytes) by best k-VST and CRBT for Paix ................126

4-15 Search time (in /sec) comparison for Paix.......... ................... .................. 129

4-16 Insert time (in /sec) comparison for Paix.................... ........ .................. 130

4-17 Delete tim e (in /sec) comparison for Paix ................................... .................. ... 130

5-1 Skip list representation for basic intervals of Figure 4-2(a) ..............................134

5-2 Start point s of P splits the basic interval [a, b] .............. .................................. 137

5-3 BSLPT insert algorithm ........... .. ........ ........................ 138

5-4 Alternative Base Interval Tree corresponding to Figure 4-2(a)...........................140

5-5 Total memory requirement (in M B) ....................................... ............... 145

5-6 Average search time for NODUP, DUP, and RAN data sets ...........................156

5-7 Average search time for trace sequences.................. ............ ............... 157

5-8 A average tim e to insert a prefix......................................................... .... .......... 158

5-9 Average time to delete a prefix.......................................... ................... 158















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy

DATA STRUCTURES FOR STATIC AND DYNAMIC ROUTER TABLES

By

Kun Suk Kim

August 2003

Chair: Sartaj K. Sahni
Major Department: Computer and Information Science and Engineering

The Internet has been growing exponentially since many users make use of new

applications and want to connect their hosts to Internet. Because of increased pro-

cessing power and high-speed communication links, packet header processing has

become a major bottleneck in Internet routing. To improve packet forwarding, our

study developed fast and efficient algorithms.

We improved on dynamic programming algorithms to determine the strides of

optimal multibit tries by providing alternative dynamic programming formulations

for both fixed- and variable-stride tries. While the '-iulpt.iti., complexities of our

algorithms are the same as those for the corresponding algorithms of [82], experiments

using real IPv4 routing table data indicate that our algorithms run considerably

faster. An added feature of our variable-stride trie algorithm is the ability to insert

and delete prefixes taking a fraction of the time needed to construct an optimal

variable-stride trie from scratch.

IP lookup in a collection of hash tables (CHT) organization can be done with

O(logldist) hash-table searches, where Idit is the number of distinct prefix-lengths

(also equal to the number of hash tables in the CHT). We developed an algorithm









that minimizes the storage required by the prefixes and markers for the resulting

set of prefixes to reduce the value of Idist by using the controlled prefix-expansion

technique. Also, we proposed improvements to the heuristic of [80].

We developed a data structure called collection of red-black trees (CRBT) in

which prefix matching, insertion, and deletion can each be done in O(logn) time,

where n is the number of prefixes in the router table. Experiments using real IPv4

routing databases indicate that although the proposed data structure is slower than

optimized variable-stride tries for longest prefix matching, the proposed data struc-

ture is considerably faster for the insert and delete operations.

We formulate a variant, alternative collection of red-black trees (ACRBT) from

the CRBT data structure to develop data structures for bursty access patterns. By

replacing the red-black trees used in the ACRBT with splay trees (or biased skip

lists), we obtained the collection of splay trees (CST) structure (or the biased skip

lists with prefix trees (BSLPT) structure) in which search, insert, and delete take

O(logn) amortized time (or O(logn) expected time) per operation, where n is the

number of prefixes in the router table. Experimental results using real IPv4 routing

databases and synthetically generated search sequences as well as trace sequences

indicate that the CST structure is best for extremely bursty access patterns. Other-

wise, the ACRBT is recommended. Our experiments also indicate that a supernode

implementation of the ACRBT usually has better search performance than does the

traditional one-element-per-node implementation.















CHAPTER 1
INTRODUCTION

The influence of Internet evolution reaches not only to the technical fields of

computer and communications but throughout society as we move toward increasing

use of online services (e.g., electronic commerce and information acquisition). The

Internet is a world-wide communication infrastructure in which individuals and their

computers interact and collaborate without regard for geographic location. Beginning

with the early research on packet switching1 and the ARPANET,2 government,

industry, and academia have been cooperating to evolve and deploy this exciting new

technology [12, 45].

The Internet is an internetwork that ties many groups of networks with a common

Internet Protocol (IP) [6, 19]. Figure 1-1 shows routers as the switch points of an

internetwork. A packet may pass through many different deployment classes of routers

from source to destination. The enterprise router, located at the lowest level in the

router hierarchy, must support tens to thousands of network routes and medium

bandwidth interfaces of 100 Mbps to 1 Gbps. Access routers are aggregation points

for corporations and residential services. Access routers must support thousands or

tens of thousands of routes. Residential terminals are connected to modem pools of

the telephone central office through plain old telephone service (POTS), cable service,



1 This technology is fundamentally different from the circuit switching that was
used by the telephone system. In a packet-switching system, data to be delivered
is broken into small chunks called packet that are labeled to show where they come
from and where they are to go.
2 This project was sponsored by U.S. Department of Defense to develop a whole
new scheme for postnuclear communication.










A


owO 'Core router
mmmo ISP access router
Enterprise router


Figure 1-1: Internet structure

or one of the digital subscriber lines (DSLs). Backbone routers require multiples of

high bandwidth ports such as OC-48 at 2.5 Gbps and OC-192 at 9.6 Gbps. Backbone

routers cover national and international areas.

With the doubling of Internet traffic every 3 months [85] and the tripling of

Internet hosts every 2 years [31], the importance of high speed scalable network

routers cannot be overemphasized. Fast networking "will play a key role in enabling

future pi'["' -- [54]. Fast networking requires fast routers; and fast routers require

fast router table lookup.

The rest of this chapter is structured as follows. Section 1.1 introduces the basics

of Internet routers. We describe the IP lookup and packet classification problems in

Section 1.2. In Section 1.3 discusses related work in these fields. Finally, Section 1.4

presents an outline of the dissertation.

1.1 Internet Router

Figure 1-2 shows the generic architecture of an IP router. Generally, a router

consists of the following basic components: the controller card, the router backplane,

and line cards [3, 51]. The CPU in the controller card performs path computations









and router table maintenance. The line cards perform inbound and outbound packet

forwarding. The router backplane transfers packets between the cards.

The basic functions in a router can be classified as routing, packet forwarding,

switching, and queueing [3, 6, 51, 58]. We discuss the each function in more detail

below.

Routing: Routing is the process of communicating to other routers and ex-

changing route information to construct and maintain the router tables that

are used by the packet-forwarding function. Routing protocols that are used

to learn about and create a view of the network's topology include the routing

information protocol (RIP) [37], open shortest path first (OSPF) [55], border

gateway protocol (BGP) [47, 65], distance vector multicast routing protocol

(DVMRP) [86], and protocol independent multicast (PIM) [27].

Packet forwarding: The router looks at each incoming packet and performs

a table lookup to decide which output port to use. This is based on the des-

tination IP address in the incoming packet. The result of this lookup may

imply a local, unicast, or multicast delivery. A local delivery is the case that

the destination address is one of the router's local addresses and the packet is

locally delivered. A unicast delivery sends the packet to a single output port.

A multicast delivery is done through a set of output ports, depending on the

multicast group membership of the router. In addition to table lookup, routers

must perform other functions.

Packet validation: This function checks to see that the recieved IPv4

packet is properly formed for the protocol before it proceeds with protocol

processing. However, because the checksum calculation is considered too

expensive, current routers hardly verify the checksum, instead assuming

that packets are transmitted through reliable media like fiber optics and

assuming that end hosts will recognize any possible corruption.









Packet lifetime control: The router adjusts the time-to-live (TTL) field

in the packet to prevent packets looping endlessly in the network. A host

sending a packet initializes the TTL with 64 (recommended by [67]) or

255 (the maximum). A packet being routed to output ports has its TTL

value decremented by 1. A packet whose TTL is zero before reaching the

destination is discarded by the router.

Checksum update: The IP header checksum must be recalculated since

the TTL field was changed. RFC 1071 [8] contains implementation tech-

niques for computing the IP checksum. If only the TTL was decremented

by 1, a router can efficiently update the checksum incrementally instead

of calculating the checksum over the entire IP header [48].

Packet switching: Packet switching is the process of moving packets from

one interface to other port interface based on the forwarding decision. Packet

switching can be done at very high speed [14, 52].

Queueing: Queueing is the action of buffering each packet in a small memory

for a short time (on the order of a few microseconds) during processing of the

packet. Queueing can be done at the input, in the switch fabric, and/or at the

output.

1.1.1 Internet Protocols

The headers of the IPv4 and IPv6 protocols [21, 59] are shown in Figure 1-3.

Unicast packets are forwarded based on the destination address field. Each router be-

tween source and destination must look at this field. Multicast packets are forwarded

based on the source network and destination group address. The protocol field defines

the transport protocol (e.g., TCP [60] and UDP [61]) that is encapsulated within this

IP packet. The type-of-service (ToS) field notices a packet's priority, its queueing,

and its dropping behavior (to the routers). Some applications such as telnet and FTP

set these flags.
































Figure 1-2: Generic router architecture


The most notable change from the IPv4 to the IPv6 header is the address length

of 128 bits. Payload length is the length of the IPv6 payload in octets. Next header

uses the same value as the IPv4 protocol field. Hop limit is decremented by 1 by each

node that forwards the packet. The packet is discarded if the hop limit is decremented

to zero. The flow ID field is added to simplify packet classification. The tuple (source

address, flow ID) uniquely identifies a flow for any nonzero flow ID.

The headers of two transport protocols, as shown in Figure 1-4, provide more

information (such as source and destination port numbers and flags that are used

to further classify packets). In TCP and UDP networks, a port is an endpoint to a

logical connection and the way a client program specifies a specific server program

on a computer in a network. These two port numbers are used to distribute packets

to the application and represent the fine-grained variety of flows. They can be used

to identify a flow within the network. Thus, applications can reserve the resources to











0
Vers HLen


1
ToS


2 3 (octet)
Packet Length


Identification Flagment Info/Offset
TTL Protocol Header Checksum
Source Address
Destination Address

IP Options (optional, variable length)

(a) Format for an IPv4 header


0 1 2 3 (octet)
Vers Traffic Class Flow ID
Payload Length Next Header Hop Limit

Source Address (128 bits)

Destination Address (128 bits)

(b) Format for an IPv6 header

Figure 1-3: Formats for IP packet header


guarantee their service requirement using appropriate signalling protocol like RSVP

[9].

Some ports have numbers that are preassigned to them, and these are known as

well-known ports. Port numbers range from 0 to 65536, but only ports numbers 0

to 1024 are reserved for privileged services and designated as well-known ports [67].

Each of these well-known port numbers specifies the port used by the server process

as its contact port. For example, port numbers 20, 25, and 80 are assigned for FTP,

simple mail transfer protocol (SMTP) [42], and hypertext transfer protocol (HTTP)

[29] servers, respectively.

1.1.2 Classless Inter-Domain Routing (CIDR)

As the Internet has evolved and grown, it faced two serious scaling problems [38].










0 1 2 3 (octet)
Source Port Destination Port
Sequence Number
Acknowlegement Number
Offset Reserved Flags Window Size
Checksum Urgent Pointer

TCP Options (optional, variable length)

(a) TCP header


0 1 2 3 (octet)
Source Port Destination Port
UDP Data Length Checksum

(b) UDP header

Figure 1-4: Transport protocol header formats


Exhaustion of IP address space: In the old Class A, B, and C address

scheme, a fundamental cause of this problem was the lack of a network class of a

size that is appropriate for a mid-sized organization. Class C with a maximum

of 254 host addresses is too small; while class B, which allows up to 65534

addresses, is too large to be densely populated. The result is inefficient use of

class B network numbers. For example, if you needed 500 addresses to configure

a network, you would be assigned Class B. However, that means 65034 unused

addresses.

Routing information overload: As the number of networks on the Internet

increased, so did the number of routes. The size and rate of growth of the router

tables in Internet routers is beyond the ability to effectively manage it.

CIDR is a mechanism to slow the growth of router tables and allow for more effi-

cient allocation of IP addresses than the old Class A, B, and C address scheme. Two

solutions to these problems were developed and adopted by the Internet community

[66, 30].









Restructuring IP address assignments: Instead of being limited to a net-

work identifier (or pijfi.r) of 8, 16, or 24 bits, CIDR uses generalized prefixes

anywhere from 13 to 27 bits. Thus, blocks of addresses can be assigned to net-

works with 32 hosts or to those with over 500,000 hosts. This allows for address

assignments that much more closely fit an organization's specific needs.

Hierarchical routing aggregation: The CIDR addressing scheme also en-

ables route aggregation in which a single high-level route entry can represent

many lower-level routes in the global router tables. Big blocks of addresses

are assigned to the large Internet Service Providers (ISPs) who then re-allocate

portions of their address blocks to their customers. For example, a tier-1 ISP

(e.g., Sprint and Pacific Bell) was assigned a CIDR address block with a prefix
of 14 bits and typically assigns its customers, who may be smaller tier-2 ISPs,

CIDR addresses with prefixes ranging from 27 bits to 18 bits. These customers

in turn re-allocate portions of their address block to their users and/or cus-

tomers (tier-3 or local ISPs). In the backbone router tables all these different

networks and hosts can be represented by the single tier-1 ISP route entry. In

this way, the growth in the number of router table entries at each level in the

network hierarchy has been significantly reduced.

1.1.3 Packet Forwarding

Consider a part of the Internet in Figure 1-5(a) to get an intuitive idea of packet

delivery. If a user in Chicago wishes to send a packet to Orlando, the packet is sent

to a router R4. The router R4 may send this packet on the communication link L3

to a router R1. The router R1 may then send the packet on link L4 to a router R5

in Orlando. R4 then sends the packet to the final destination.

An Internet router table is a set of tuples of the form (p, a), where p is a binary

string whose length is at most W (W = 32 for IPv4 destination addresses and W =

128 for IPv6), and a is an output link (or next hop). When a packet with destination









address A arrives at a router, we are to find the pair (p, a) in the router table for

which p is a longest matching prefix of A (i.e., p is a prefix of A and there is no longer

prefix q of A such that (q, b) is in the table). Once this pair is determined, the packet

is sent to output link a. The speed at which the router can route packets is limited

by the time it takes to perform this table lookup for each packet.

For example, consider a router table at the router R1 in Figure 1-5(a), shown in

Figure 1-5(b). Assume that when a packet arrives on router R1, the packet carries

the destination address 101110 in its header. In this example we assume that the

longest prefix length is 6. To forward the packet to its final destination, router R1

consults a router table, which lists each possible prefix and the corresponding output

link. The address 101110 matches both 1* and 101* in the router table, but 101* is

the longest matching prefix. Since the table indicates output link L2, the router then

switches the packet to L2.

Longest prefix routing is used because this results in smaller and more manage-

able router tables. It is impractical for a router table to contain an entry for each of

the possible destination addresses. Two of the reasons this is so are

The number of such entries would be almost one hundred million and would

triple every 3 years.

Every time a new host comes online, all router tables must incorporate the new

host's address.

By using longest prefix routing, the size of router tables is contained to a reasonable

quantity; and information about host/router changes made in one part of the Internet

need not be propagated throughout the Internet.











Chicago
(11010*)


R4


(a) Backbone routers


(b) Router table for router RI

Figure 1-5: Router table example

1.2 Packet Classification

An Internet router classifies incoming packets into flows,3 using information

contained in packet headers and a table of (classification) rules. This table is called

the rule table (equivalently, router table). The packet-header information that is used

to perform the classification is some subset of the source and destination addresses,

the source and destination ports, the protocol, protocol flags, type of service, and so


3 A flow is a set of packets that are to be treated similarly for routing purposes.


Prefix Output Link

1* L1
101* L2
11010* L3
11100* L4









on. The specific header information used for packet classification is governed by the

rules in the rule table. Each rule-table rule is a pair of the form (F, A), where F is

a filter and A is an action. The action component of a rule specifies what is to be

done when a packet that satisfies the rule filter is received. Sample actions are drop

the packet, forward the packet along a certain output link, and reserve a specified

amount of bandwidth. A rule filter F is a tuple that is comprised of one or more

fields. In the simplest case of destination-based packet forwarding, F has a single

field, which is a destination (address) prefix; and A is the next hop for packets whose

destination address has the specified prefix. For example, the rule (01*, a) states

that the next hop for packets whose destination address (in binary) begins with 01 is

a. IP multicasting uses rules in which F comprises the two fields source prefix and

destination prefix; QoS routers may use five-field rule filters (source-address prefix,

destination-address prefix, source-port range, destination-port range, and protocol);

and firewall filters may have one or more fields.

In the d-dimensional packet classification problem, each rule has a d-field filter.

Our study was concerned solely with 1-dimensional packet classification. It should be

noted, that data structures for multidimensional packet classification are usually built

on top of data structures for 1-dimensional packet classification. Therefore, the study

of data structures for 1-dimensional packet classification is fundamental to the design

and development of data structures for d-dimensional, d > 1, packet classification.

For the 1-dimensional packet classification problem, we assume that the single

field in the filter is the destination field; and that the action is the next hop for the

packet. With these assumptions, 1-dimensional packet classification is equivalent to

the destination-based packet forwarding problem. Henceforth, we use the terms rule

table and router table to mean tables in which the filters have a single field, which

is the destination address. This single field of a filter may be specified in one of two

ways:









As a range: For example, the range [35, 2096] matches all destination addresses

d such that 35 < d < 2096.

As an address/mask pair: Let xi denote the ith bit of x. The address/mask

pair a/m matches all destination addresses d for which d, = a, for all i for

which mi = 1. That is, a 1 in the mask specifies a bit position in which d

and a must agree; while a 0 in the mask specifies a don't-care bit position.

For example, the address/mask pair 101100/011101 matches the destination

addresses 101100, 101110, 001100, and 001110. When all the 1-bits of a mask

are to the left of all 0-bits, the address/mask pair specifies an address prefix.

For example, 101100/110000 matches all destination addresses that have the

prefix 10 (i.e., all destination addresses that begin with 10). In this case, the

address/mask pair is simply represented as the prefix 10*, where the denotes a

sequence of don't-care bits. If W is the length, in bits, of a destination address,

then the in 10* represents all sequences of (W 2) bits. In IPv4 the address

and mask are both 32 bits; while in IPv6 both of these are 128 bits.

Notice that every prefix may be represented as a range. For example, when

W = 6, the prefix 10* is equivalent to the range [32, 47]. A range that may be specified

as a prefix for some W is called a pi fi.r wiai. The specification 101100/011101 may

be abbreviated to ?011?0, where ? denotes a don't-care bit. This specification is not

equivalent to any single range. Also, the range specification [3,6] is not equivalent to

any single address/mask specification.

When more than one rule matches an incoming packet, a tie occurs. To select

one of the many rules that may match an incoming packet, we use a tie breaker.

Let RS be the set of rules in a rule table and let FS be the set of filters associ-

ated with these rules. rules(d, RS) (or simply rules(d) when RS is implicit) is the

subset of rules of RS that match/cover the destination address d. filters(d, FS) and









filters(d) are defined similarly. A tie occurs whenever Irules(d)l > 1 (equivalently,

Ifilters(d)l > 1).

Three popular tie breakers are

First matching rule in table: The rule table is assumed to be a linear list

([39]) of rules with the rules indexed 1 through n for an n-rule table. The

action corresponding to the first rule in the table that matches the incoming

packet is used. In other words, for packets with destination address d, the rule

of rules(d) that has least index is selected.

For our example router table corresponding to the five prefixes of Figure 4

1, rule RI is selected for every incoming packet, because PI matches every

destination address. When using the first-matching-rule criteria, we must index

the rules carefully. In our example, PI should correspond to the last rule so

that every other rule has a chance to be selected for at least one destination

address.

Highest-priority rule: Each rule in the rule table is assigned a priority. From

among the rules that match an incoming packet, the rule that has highest

priority wins is selected. To avoid the possibility of a further tie, rules are

assigned different priorities (it is actually sufficient to ensure that for every

destination address d, rules(d) does not have two or more highest-priority rules).

Notice that the first-matching-rule criteria is a special case of the highest-

priority criteria (simply assign each rule a priortiy equal to the negative of

its index in the linear list).

Most-specific-rule matching: The filter F1 is more specific than the filter

F2 iff F2 matches all packets matched by F1 plus at least one additional packet.

So, for example, the range [2,4] is more specific than [1, 6], and [5, 9] is more

specific than [5, 12]. Since [2, 4] and [8, 14] are disjoint (i.e., they have no address

in common), neither is more specific than the other. Also, since [4,14] and









[6, 20] intersect4 neither is more specific than the other. The prefix 110* is

more specific than the prefix 11*.

In most-specific-rule matching, ties are broken by selecting the matching rule

that has the most specific filter. When the filters are destination prefixes, the

most-specific-rule that matches a given destination d is the longest5 prefix in

filters(d). Hence, for prefix filters, the most-specific-rule tie breaker is equiv-

alent to the longest-matching-prefix criteria used in router tables. For our ex-

ample rule set, when the destination address is 18, the longest matching-prefix

is P4.

When the filters are ranges, the most-specific-rule tie breaker requires us to

select the most specific range in filters(d). Notice also that most-specific-

range matching is a special case of the the highest-priority rule. For example,

when the filters are prefixes, set the prefix priority equal to the prefix length.

For the case of ranges, the range priority equals the negative of the range size.

In a static rule table, the rule set does not vary in time. For these tables, we are

concerned primarily with the following metrics:

Time required to process an incoming packet: This is the time required

to search the rule table for the rule to use.

Preprocessing time: This is the time to create the rule-table data structure.

Storage requirement: That is, how much memory is required by the rule-

table data structure?



4 Two ranges [u, v] and [x, y] intersect iff u < x < v < y V x < u < y < v.

5 The length of a prefix is the number of bits in that prefix (note that the is not
used in determining prefix length). The length of PI is 0 and that of P2 is 4.









In practice, rule tables are seldom truly static. At best, rules may be added

to or deleted from the rule table infrequently. Typically, in a "static" rule table,

inserts/deletes are batched and the rule-table data structure reconstructed as needed.

In a *li,,,iii: rule table, rules are added/deleted with some frequency. For such

tables, inserts/deletes are not batched. Rather, they are performed in real time. For

such tables, we are concerned additionally with the time required to insert/delete a

rule. For a dynamic rule table, the initial rule-table data structure is constructed

by starting with an empty data structures and then inserting the initial set of rules

into the data structure one by one. So, typically, in the case of dynamic tables, the

preprocessing metric, mentioned above, is very closely related to the insert time.

In this dissertation, we focus on data structures for static and dynamic router

tables (1-dimensional packet classification) in which the filters are either prefixes

or ranges. Although some of our data structures apply equally well to all three of

the commonly used tie breakers, our focus, in this dissertation, is on longest-prefix

matching.

1.3 Prior Work

Several solutions for the IP lookup problem (i.e., finding the longest matching

prefix) have been proposed. Let LMP(d) be the longest matching-prefix for address

d.

1.3.1 Linear List

In this data structure, the rules of the rule table are stored as a linear list ([39])

L. The LMP(d) is determined by examining the prefixes in L from left to right; for

each prefix, we determine whether or not that prefix matches d; and from the set

of matching prefixes, the one with longest length is selected. To insert a rule q, we

first search the list L from left to right to ensure that L doesn't already have a rule

with the same filter as does q. Having verified this, the new rule q is added to the

end of the list. Deletion is similar. The time for each of the operations to determine









LMP(d), insert a rule, and delete a rule is O(n), where n is the number of rules in

L. The memory required is also O(n).

Note that this data structure may be used regardless of the form of the filter

(i.e., ranges, Boolean expressions, etc.) and regardless of the tie breaker in use. The

time and memory complexities are unchanged.

1.3.2 End-Point Array

Lampson, Srinivasan, and Varghese [44] proposed a data structure in which the

end points of the ranges defined by the prefixes are stored in ascending order in an

array. The LMP(d) is found by performing a binary search on this ordered array of

end points. Although Lampson et al. [44] provide ways to reduce the complexity of

the search for the LMP by a constant factor, these methods do not result in schemes

that permit prefix insertion and deletion in O(logn) time.

It should be noted that the end-point array may be used even when ties are

broken by selecting the first matching rule or the highest-priority matching rule.

Further, the method applies to the case when the filters are arbitrary ranges rather

than simply prefixes. The complexity of the preprocessing step (i.e., creation of the

array of ordered end-points) and the search for the rule to use is unchanged. Further,

the memory requirements are the same, O(n) for an n-rule table, regardless of the tie

breaker and whether the filters are prefixes or general ranges.

1.3.3 Sets of Equal-Length Prefixes

Waldvogel et al. [87] proposed a data structure to determine LMP(d) by per-

forming a binary search on prefix length. In this data structure, the prefixes in the

router table T are partitioned into the sets So, Si, ... such that S, contains all prefixes

of T whose length is i. For simplicity, we assume that T contains the default prefix

*. So, So = {*}. Next, each S, is augmented with markers that represent prefixes in

Sj such that j > i and i is on the binary search path to Sj. For example, suppose

that the length of the longest prefix of T is 32 and that the length of LMP(d) is 22.









To find LMP(d) by a binary search on length, we will first search S16 for an entry

that matches the first 16 bits of d. This search6 will need to be successful for us

to proceed to a larger length. The next search will be in 524. This search will need

to fail. Then, we will search S20 followed by S22. So, the path followed by a binary

search on length to get to S22 is S16, S24, S20, and S22. For this to be followed, the

searches in S16, S20, and S22 must succeed while that in S24 must fail. Since the

length of LMP(d) is 22, T has no matching prefix whose length is more than 22.

So, the search in S24 is guaranteed to fail. Similarly, the search in S22 is guaranteed

to succeed. However, the searches in S16 and S20 will succeed iff T has matching

prefixes of length 16 and 20. To ensure success, every length 22 prefix P places a

marker in S16 and S20, the marker in S16 is the first 16 bits of P and that in S20 is

the first 20 bits in P. Note that a marker M is placed in Si only if S, doesn't contain

a prefix equal to M. Notice also that for each i, the binary search path to Si has

O(logl,,,) = O(logW), where I r- is the length of the longest prefix in T, Sjs on

it. So, each prefix creates O(log W) markers. With each marker M in Si, we record

the longest prefix of T that matches M (the length of this longest matching-prefix is

necessarily smaller than i).

To determine LMP(d), we begin by setting leftEnd = 0 and rightEnd = lmx.

The repetitive step of the binary search requires us to search for an entry in S,,

where m = L(leftEnd + rightEnd)/2j, that equals the first m bits of d. If S, does

not have such an entry, set rightEnd = m 1. Otherwise, if the matching entry is

the prefix P, P becomes the longest matching-prefix found so far. If the matching

entry is the marker M, the prefix recorded with M is the longest matching-prefix



6 When searching S,, only the first i bits of d are used, because all prefixes in Si
have exactly i bits.









found so far. In either case, set leftEnd = m + 1. The binary search terminates

when leftEnd > rightEnd.

One may easily establish the correctness of the described binary search. Since,

each prefix creates O(logW) markers, the memory requirement of the scheme is

O(n log W). When each set S, is represented as a hash table, the data structure is

called SELPH (sets of equal length prefixes using hash tables). The expected time

to find LMP(d) is O(logW) when the router table is represented as an SELPH.

When inserting a prefix, O(log W) markers must also be inserted. With each marker,

we must record a longest-matching prefix. The expected time to find these longest

matching-prefixes is O(log2 W). In addition, we may need to update the longest-

matching prefix information stored with the O(nlog W) markers at lengths greater

than the length of the newly inserted prefix. This takes O(n log2 W) time. So, the ex-

pected insert time is O(n log2 W). When deleting a prefix P, we must search all hash

tables for markers M that have P recorded with them and then update the recorded

prefix for each of these markers. For hash tables with a bounded loading density, the

expected time for a delete (including marker-prefix updates) is O(nlog2 W). Waldvo-

gel et al. [87] have shown that by inserting the prefixes in ascending order of length,

an n-prefix SELPH may be constructed in O(n log2 W) time.

When each set is represented as a balanced search tree, the data structure is called

SELPT. In an SELPT, the time to find LMP(d) is O(lognlog W); the insert time is

O(nlogn log2 W); the delete time is O(n log n log2 W); and the time to construct the

data structure for n prefixes is O(W + n log n log2 W).

In the full version of [87], Waldvogel et al. show that by using a technique called

marker partitioning, the SELPH data structure may be modified to have a search

time of O(a + log W) and an insert/delete time of O(a -nW log W), for any a > 1.

Because of the excessive insert and delete times, the sets of equal-length prefixes

data structure is suitable only for static router tables. By using the prefix expansion









method [22, 82], we can limit the number of distinct lengths in the prefix set and so

reduce the run time by a constant factor [87].

1.3.4 Tries

IP lookup in the BSD kernel is done using the Patricia data structure [78], which

is a variant of a compressed binary trie [39]. This scheme requires O(W) memory

accesses per lookup, insert, and delete. We note that the lookup complexity of longest

prefix matching algorithms is generally measured by the number of accesses made to

main memory (equivalently, the number of cache misses). Dynamic prefix tries, which

are an extension of Patricia, and which also take O(W) memory accesses for lookup,

were proposed by Doeringer et al. [23].

For IPv4 prefix sets, Degermark et al. [22] proposed the use of a three-level trie in

which the strides are 16, 8, and 8. They propose encoding the nodes in this trie using

bit vectors to reduce memory requirements. The resulting data structure requires

at most 12 memory accesses. However, inserts and deletes are quite expensive. For

example, the insertion of the prefix 1* changes up to 215 entries in the trie's root

node. All of these changes may propogate into the compacted storage scheme of [22].

The multibit trie data structures of Srinivasan and Varghese [82] are, perhaps,

the most flexible and effective trie structure for IP lookup. Using a technique called

controlled prefix expansion, which is very similar to the technique used in [22], tries of

a predetermined height (and hence with a predetermined number of memory accesses

per lookup) may be constructed for any prefix set. Srinivasan and Varghese [82]

develop dynamic programming algorithms to obtain space optimal fixed-stride tries

(FSTs) and variable-stride tries (VSTs) of a given height.

Lampson et al. [44] proposed the use of hybrid data structures comprised of a

stride-16 root and an auxiliary data structure for each of the subtries of the stride-16

root. This auxiliary data structure could be the end-point array (since each subtrie

is expected to contain only a small number of prefixes, the number of end points in









each end-point array is also expected to be quite small). An alternative auxiliary

data structure -'-1-'. -tI .1 by Lampson et al. [44] is a 6-way search tree for IPv4 router

tables. In the case of these 6-way trees, the keys are the remaining up to 16 bits of the

prefix (recall that the stride-16 root consumes the first 16 bits of a prefix). For IPv6

prefixes, a multicolumn scheme is -i-'-2' -t .11 [44]. None of these proposed structures

is suitable for a dynamic table.

Nilsson and Karlsson [57] propose a greedy heuristic to construct optimal VSTs.

They call the resulting VSTs LC-tries (level-compressed tries). An LC-tries obtained

from a 1-bit trie by replacing full subtries of the 1-bit trie by single multibit nodes.

This replacement is done by examining the 1-bit trie top to bottom (i.e., from root

to leaves).

1.3.5 Binary Search Trees

Suri et al. [84] proposed a B-tree data structure for dynamic router tables. Using

their structure, we may find the longest matching prefix in O(log n) time. However,

inserts/deletes take O(W log n) time. The number of cache misses is O(log n) for each

operation. When W bits fit in 0(1) words (as is the case for IPv4 and IPv6 prefixes)

logical operations on W-bit vectors can be done in 0(1) time each. In this case, the

scheme of [84] takes O(log W log n) time for an insert and O(W + log n) = O(W)

time for a delete.

Several researchers ([16, 25, 26, 36, 74], for example), have investigated router

table data structures that account for bias in access patterns. Gupta, Prabhakar,

and Boyd [36], for example, propose the use of ranges. They assume that access

frequencies for the ranges are known, and they construct a bounded-height binary

search tree of ranges. This binary search tree accounts for the known range access

frequencies to obtain near-optimal IP lookup. Although the scheme of [36] performs

IP lookup in near-optimal time, changes in the access frequencies, or the insertion









or removal of a prefix require us to reconstruct the data structure, a task that takes

O(n log n) time.

Ergun et al. [25, 26] use ranges to develop a biased skip list structure that

performs longest prefix-matching in O(log n) expected time. Their scheme is designed

to give good expected performance for bursty7 access 1p., I t' I- The biased skip list

scheme of Ergun et al. [25, 26] permits inserts and deletes in O(logn) time only in

the severely restricted and impractical situation when all prefixes in the router table

are of the same length. For the more general, and practical, case when the router

table comprises prefixes of different length, their scheme takes O(n) expected time

for each insert and delete.

1.3.6 Others

Cheung and McCanne [16] developed "a model for table-driven route lookup and

cast the table design problem as an optimization problem within this model." Their

model accounts for the memory hierarchy of modern computers and they optimize

average performance rather than worst-case performance.

Gupta and McKeown [33] examine the asymptotic complexity of a related prob-

lem, packet classification. They develop two data structures, heap-on-trie (HoT) and

binary-search-tree-on-trie (BoT), for the dynamic packet classification problem. The

complexity of these data structures (for packet classification and the insertion and

deletion of rules) also is dependent on W. For d-dimensional rules, a search in a

HoT takes O(Wd) and an update (insert or delete) takes O(Wdlogn) time. The

corresponding times for a BoT are O(W/ log n) and O(W1 -1 logn), respectively.



7 In a '1,,i /,, access pattern the number of different destination addresses in any
subsequence of q packets is << q. That is, if the destination of the current packet
is d, there is a high probability that d is also the destination for one or more of the
next few packets. The fact that Internet packets tend to be bursty has been noted in
[18, 46], for example.









Hardware solutions that involve the use of content addressable memory [50] as

well as solutions that involve modifications to the Internet Protocol (i.e., the addition

of information to each packet) have also been proposed [10, 13, 56].

1.4 Dissertation Outline

The remainder of the dissertation is organized as follows. Chapters 2 and 3

concentrate on two data structures for static router table, in which the rule set does

not vary in time. In Chapter 2, we develop new dynamic programming formulations

for the construction of space optimal tries of a predetermined height. In Chapter 3,

we develop an algorithm that minimizes storage requirement for collection of hash

table optimization problem. Also, we propose improvements to the heuristic of [80].

Chapters 4 and 5 provide data structures for dynamic router tables, in which

rules are added/deleted with some frequency. In Chapter 4, we show how to use the

range encoding idea of [44] so that longest prefix matching as well as prefix insertion

and deletion can be done in O(logn) time. Chapter 5 presents the management of

router tables for a dynamic environment (i.e., search, insert, and deletes are performed

dynamically) in which the access pattern is bursty.















CHAPTER 2
MULTIBIT TRIES

In this chapter, we focus on the controlled expansion technique of Srinivasan and

Varghese [82]. In particular, we develop new dynamic programming formulations

for the construction of space optimal tries of a predetermined height. While the

2-, -vmrpt.ti," complexities of the algorithms that result from our formulations are the

same as those for the corresponding algorithms of [82], experiments using real IPv4

routing table data indicate that our algorithms run considerably faster. Our fixed-

stride trie algorithm is 2 to 4 times as fast on a SUN workstation and 1.5 to 3 times as

fast on a Pentium 4 PC. On a SUN workstation, our variable-stride trie algorithm is

between 2 and 17 times as fast as the corresponding algorithm of [82]; on a Pentium

4 PC, our algorithm is between 3 and 47 times as fast.

In Section 2.1, we describe the data structure for 1-bit tries. We develop our new

dynamic programming formulations for both fixed-stride and variable-stride tries in

Section 2.2 and 2.3, respectively. In Section 2.4, we present our experimental results.

2.1 1-Bit Tries

A 1-bit trie is a tree-like structure in which each node has a left child, left data,

right child, and right data field. Nodes at level 1 1 of the trie store prefixes whose

length is 1 (the length of a prefix is the number of bits in that prefix; the terminating

* (if present) does not count towards the prefix length). If the rightmost bit in a

prefix whose length is 1 is 0, the prefix is stored in the left data field of a node that is

at level 1- 1; otherwise, the prefix is stored in the right data field of a node that is at

level 1-1. At level i of a trie, branching is done by examining bit i (bits are numbered

from left to right beginning with the number 0, and levels are numbered with the root

being at level 0) of a prefix or destination address. When bit i is 0, we move into the









left subtree; when the bit is 1, we move into the right subtree. Figure 2-1(a) gives the

prefixes in the 8-prefix example of [82], and Figure 2-1(b) shows the corresponding

1-bit trie. The prefixes in Figure 2-1(a) are numbered and ordered as in [82]. Since

the trie of Figure 2-1(b) has a height of 6, a search into this trie may make up to 7

memory accesses. The total memory required for the 1-bit trie of Figure 2-1(b) is 20

units (each node requires 2 units, one for each pair of (child, data) fields). The 1-bit

tries described here are an extension of the 1-bit tries described in [39]. The primary

difference being that the 1-bit tries of [39] are for the case when all keys (prefixes)

have the same length.

NO
Original prefixes P5 P N
P5=0* N21 P1 N22
Pl=10* P2
P2= 111* N31
P6 N32
P3=11001* N41
P4=1* P4 3 N42
P6=1000* N5
P7=100000* N6
P8=1000000* P8

(a) 8-prefix example of (b) Corresponding 1-bit trie
[82]

Figure 2-1: Prefixes and corresponding 1-bit trie


When 1-bit tries are used to represent IPv4 router tables, the trie height may be

as much as 31. A lookup in such a trie takes up to 32 memory accesses. Table 2-1

gives the characteristics of five IPv4 backbone router prefix sets, and Table 2-2 gives

a more detailed characterization of the prefixes in the largest of these five databases,

Paix [53]. For our five databases, the number of nodes in a 1-bit trie is between 2n

and 3n, where n is the number of prefixes in the database (Table 2-1).













Table 2-1: Prefix databases obtained from IPMA project on Sep 13, 2000

Database Prefixes 16-bit prefixes 24-bit prefixes Nodes*
Paix 85,682 6,606 49,756 173,012
Pb 35,151 2,684 19,444 91,718
MaeWest 30,599 2,500 16,260 81,104
Aads 26,970 2,236 14,468 74,290
MaeEast 22,630 1,810 11,386 65,862
* The last column shows the number of nodes in the 1-bit trie representation of the
prefix database.
Note: the number of prefixes stored at level i of a 1-bit trie equals the number of
prefixes whose length is i + 1.


Table 2-2: Distributions of the prefixes and nodes in the 1-bit trie for Paix

Level Number of Number of Level Number of Number of
prefixes nodes prefixes nodes
0 0 1 16 918 5,117
1 0 2 17 1,787 8,245
2 0 4 18 5,862 12,634
3 0 7 19 3,614 15,504
4 0 11 20 3,750 20,557
5 0 20 21 5,525 26,811
6 0 36 22 7,217 32,476
7 22 62 23 49,756 37,467
8 4 93 24 12 54
9 5 169 25 26 44
10 9 303 26 12 20
11 26 561 27 5 9
12 56 1,037 28 4 5
13 176 1,933 29 1 2
14 288 3,552 30 0 1
15 6,606 6,274 31 1 1









2.2 Fixed-Stride Tries

2.2.1 Definition

Srinivasan and Varghese [82] have proposed the use of fixed-stride tries to enable

fast identification of the longest matching prefix in a router table. The stride of a

node is defined to be the number of bits used at that node to determine which branch

to take. A node whose stride is s has 2" child fields (corresponding to the 2" possible

values for the s bits that are used) and 2" data fields. Such a node requires 2" memory

units. In a fixed-stride trie (FST), all nodes at the same level have the same stride;

nodes at different levels may have different strides.

Suppose we wish to represent the prefixes of Figure 2-1(a) using an FST that

has three levels. Assume that the strides are 2, 3, and 2. The root of the trie stores

prefixes whose length is 2; the level one nodes store prefixes whose length is 5 (2 +

3); and level three nodes store prefixes whose length is 7 (2 + 3 + 2). This poses a

problem for the prefixes of our example, because the length of some of these prefixes

is different from the storeable lengths. For instance, the length of P5 is 1. To get

around this problem, a prefix with a nonpermissible length is expanded to the next

permissible length. For example, P5 = 0* is expanded to P5a = 00* and P5b =

01*. If one of the newly created prefixes is a duplicate, natural dominance rules

are used to eliminate all but one occurrence of the prefix. For instance, P4 = 1* is

expanded to P4a = 10* and P4b = 11*. However, PI = 10* is to be chosen over

P4a = 10*, because PI is a longer match than P4. So, P4a is eliminated. Because

of the elimination of duplicate prefixes from the expanded prefix set, all prefixes are

distinct. Figure 2-2(a) shows the prefixes that result when we expand the prefixes of

Figure 2-1 to lengths 2, 5, and 7. Figure 2-2(b) shows the corresponding FST whose

height is 2 and whose strides are 2, 3, and 2.

Since the trie of Figure 2-2(b) can be searched with at most 3 memory references,

it represents a time performance improvement over the 1-bit trie of Figure 2-1(b),










P5 00
Expanded prefixes P5 01
(3 levels) Pl 10
00* (P5a) P4 11
01* (P5b) 000 P6 000
10* (P1) 001 P6 001 P3
11 (P4) 010 \ 010
11100*(P2a) o0l \ l -
11101*(P2b) 100 \ 100 P2
11110*(P2c) 101 101 P2
11111*(P2d) 110 110 P2
11001* (P3) 111 111 P2
10000* (P6a)
10001*(P6b) P8 00
1000001* (P7) oP7
1000000* (P8) 11

(a) Expanded pre- (b) Corresponding fixed-stride trie
fixes

Figure 2-2: Prefix expansion and fixed-stride trie

which requires up to 7 memory references to perform a search. However, the space

requirements of the FST of Figure 2-2(b) are more than that of the corresponding

1-bit trie. For the root of the FST, we need 8 fields or 4 units; the two level 1 nodes

require 8 units each; and the level 3 node requires 4 units. The total is 24 memory

units.

We may represent the prefixes of Figure 2-1(a) using a one-level trie whose root

has a stride of 7. Using such a trie, searches could be performed making a single

memory access. However, the one-level trie would require 27 = 128 memory units.

2.2.2 Construction of Optimal Fixed-Stride Tries

In the fixed-stride trie optimization (FSTO) problem, we are given a set P of

prefixes and an integer k. We are to select the strides for a k-level FST in such

a manner that the k-level FST for the given prefixes uses the smallest amount of

memory.









For some P, a k-level FST may actually require more space than a (k 1)-level

FST. For example, when P = {00*, 01*, 10*, 11*}, the unique 1-level FST for P

requires 4 memory units while the unique 2-level FST (which is actually the 1-bit

trie for P) requires 6 memory units. Since the search time for a (k 1)-level FST

is less than that for a k-level FST, we would actually prefer (k 1)-level FSTs that

take less (or even equal) memory over k-level FSTs. Therefore, in practice, we are

really interested in determining the best FST that uses at most k levels (rather than

exactly k levels). The modified FSTO problem (MFSTO) is to determine the best

FST that uses at most k levels for the given prefix set P.

Let 0 be the 1-bit trie for the given set of prefixes, and let F be any k-level FST

for this prefix set. Let so, ..., k-1 be the strides for F. We shall say that level 0 of F

covers levels 0,..., so 1 of 0, and that level j, 0 < j < k of F covers levels a,..., b of

0, where a q= E sqq and b = E o sq 1. So, level 0 of the FST of Figure 2-2(b)

covers levels 0 and 1 of the 1-bit trie of Figure 2-1(b). Level 1 of this FST covers

levels 2, 3, and 4 of the 1-bit trie of Figure 2-1(b); and level 2 of this FST covers

levels 5 and 6 of the 1-bit trie. We shall refer to levels e = E-"o S, 0 < u < k as

the expansion levels of O. The expansion levels defined by the FST of Figure 2-2(b)

are 0, 2, and 5.

Let nodes(i) be the number of nodes at level i of the 1-bit trie 0. For the 1-bit

trie of Figure 2-1(a), nodes(0 : 6) = [1, 1, 2, 2, 2, 1, 1]. The memory required by F is

E -1 nodes(eq) 2q. For example, the memory required by the FST of Figure 2-2(b)
is nodes(0) 22 + nodes(2) 23 + nodes(5) 22 = 24.

Let T(j, r), r < j + 1, be the cost (i.e., memory requirement) of the best way to

cover levels 0 through j of O using exactly r expansion levels. When the maximum

prefix length is W, T(W 1, k) is the cost of the best k-level FST for the given

set of prefixes. Srinivasan and Varghese [82] have obtained the following dynamic

programming recurrence for T:












T(j, r) min {T(m, r 1) + nodes(m + 1) 2-m}, r > 1 (2.1)
mC{r-2.J-1}




T(j, 1) = 2j+ (2.2)

The rationale for Equation 2.1 is that the best way to cover levels 0 through j of

O using exactly r expansion levels, r > 1, must have its last expansion level at level

m + 1 of O, where m must be at least r 2 (as otherwise, we do not have enough

levels between levels 0 and m of O to select the remaining r 1 expansion levels)

and at most j 1 (because the last expansion level is < j). When the last expansion

level is level m + 1, the stride for this level is j m, and the number of nodes at this

expansion level is nodes(m + 1). For optimality, levels 0 through m of O must be

covered in the best possible way using exactly r 1 expansion levels.

As noted by Srinivasan and Varghese [82], using the above recurrence, we may

determine T(W 1, k) in O(kW2) time (excluding the time needed to compute O

from the given prefix set and determine nodes()). The strides for the optimal k-level

FST can be obtained in an additional O(k) time. Since, Equation 2.1 also may be

used to compute T(W 1, q) for all q < k in O(kW2) time, we can actually solve

the MFSTO problem in the same asymptotic complexity as required for the FSTO

problem.

We can reduce the time needed to solve the MFSTO problem by modifying the

definition of T. The modified function is C, where C(j, r) is the cost of the best FST

that uses at most r expansion levels. It is easy to see that C(j, r) < C(j, r 1), r > 1.

A simple dynamic programming recurrence for C is:


C(j, r) = min {C(m, r 1) + nodes(m + 1) 2--m}, j > 0, r > 1 (2.3)
m {--1..j- 1}









C(-1, r) = 0 and C(j, 1) = 2+1, j > 0 (2.4)


To see the correctness of Equations 2.3 and 2.4, note that when j > 0, there

must be at least one expansion level. If r = 1, then there is exactly one expansion

level and the cost is 2j+l. If r > 1, the last expansion level in the best FST could be

at any of the levels 0 through j. Let m + 1 be this last expansion level. The cost of

the covering is C(m, r 1) + nodes(m + 1) 2j-m. When j = -1, no levels of the

1-bit trie remain to be covered. Therefore, C(-1, r) = 0.

We may obtain an alternative recurrence for C(j, r) in which the range of m on

the right side is r 2..j 1 rather than -1..j 1. First, we obtain the following

dynamic programming recurrence for C:



C(j, r) = min{C(j, r- 1),T(j,)}, r> 1 (2.5)




C(j, 1) = 2j+ (2.6)

The rationale for Equation 2.5 is that the best FST that uses at most r expansion

levels either uses at most r 1 levels or uses exactly r levels. When at most r 1

levels are used, the cost is C(j, r 1), and when exactly r levels are used, the cost is

T(j, r), which is defined by Equation 2.1.

Let U(j, r) be as defined in Equation 2.7.


U(j, r)= mm {C(m, r 1) + nodes(m + 1) 2j--m (2.7)
From Equations 2.1 and 2.5 we obtain
From Equations 2.1 and 2.5 we obtain


C(j, r) =min{C(j, r- 1), U(j, r)}


(2.8)









To see the correctness of Equation 2.8, note that for all j and r such that r < j+l,

T(j, r) > C(j, r). Furthermore,




T(j, r) = mi {T(m, r 1) + nodes(m + 1) 2 -m}
mC{r-2.j -1}
> min {C(m,r 1)+ nodes(m+ 1) 23- m
mC{r-2..j 1}
= (j,r) (2.9)


Therefore, when C(j, r 1) < U(j, r), Equations 2.5 and 2.8 compute the same

value for C(j, r) (i.e., C(j, r 1)). When C(j, r 1) > U(j, r), it appears from

Equation 2.9 that Equation 2.8 may compute a smaller C(j, r) than is computed by

Equation 2.5. However, from Equation 2.3, which is equivalent to Equation 2.5, the

C(j, r) computed by Equations 2.3 and 2.5 satisifes


C(j,r) = min {C(m, r 1) + nodes(m + 1) 2j--m
m{- 1..j-l}
< min {C(m,r- 1) + nodes(m + 1) 23-m}
mC{r-2..j 1}
= (j,r)


where C(-1, r) = 0. However, when C(j, r 1) > U(j, r), the C(j, r) computed by

Equation 2.8 is U(j, r). Therefore, when C(j, r 1) > U(j, r), the C(j, r) computed

by Equation 2.8 cannot be smaller than that computed by Equation 2.5. Therefore,

the C(j, r)s computed by Equations 2.5 and 2.8 are equal.

In the remainder of this section, we use Equations 2.3 and 2.4 for C. The range

for m (in Equation 2.3) may be restricted to a range that is (often) considerably

smaller than r 2..j 1. To obtain this narrower search range, we first establish a

few properties of 1-bit tries and their corresponding optimal FSTs.

Lemma 1 For every 1-bit trie 0, (a) nodes(i) < 21, i > 0 and (b) nodes(i + j) <

2J nodes(i), j > 0, i > 0.

Proof Follows from the fact that a 1-bit trie is a binary tree. U









Let M(j, r), r > 1, be the smallest m that minimizes


C(m, r 1)+ nodes(m + 1) 2j-m,

in Equation 2.3.

Lemma 2 V(j > 0, r > 1)[M(j + 1, r) > M(j, r)].

Proof Let M(j, r) = a and M(j + 1, r) = b.

Suppose b < a. Then,


C(j, r)


= C(a, r

< C(b, r


since, otherwise, M(j, r) = b. Furthermore,


C(j + 1, r)


= C(b, r 1) + nodes(b + 1) 2j+l-b

< C(a, r 1) + nodes(a + 1) 2 2+l-a


Therefore,


nodes(a + 1) 2j-a + nodes(b + 1) 2j+l-b

< nodes(b + 1) 2j-b + nodes(a + 1) 2j+1-'


nodes(b + 1) 2j-b < nodes(a + 1) 2j-a


Hence,


2a-b nodes(b + 1) < nodes(a + 1)


This contradicts Lemma l(b). So, b > a.

Lemma 3 V(j > 0, r > 0)[C(j, r) < C(j + 1, r)].


1) + nodes(a + 1) 23-0

1) + nodes(b + 1) 2j-b









Proof The case r = 1 follows from C(j, 1) = 2j+l. So, assume r > 1. From the
definition of M, it follows that

C(j + 1, r) =C(b, r 1) + nodes(b + 1) 2j+l-b

where -1 < b = M(j + 1, r) < j. When b < j, we get

C(j, r) < C(b, r 1) + nodes(b + 1) 2-b

< C(b, r 1) + nodes(b + 1) 2j+ -b

= (j + 1, r)

When b = j,

C(j + 1, r) = C(j, r 1)+ nodes(j + 1) 2 > C(j,r 1),

since nodes(j + 1) > 0. m

The next few lemmas use the function A, which is defined as A(j, r) = C(j, r -

1) C(j, r). Since, C(j, r) < C(j, r 1), A(j, r) > 0 for all j > 0 and all r > 2.
Lemma 4 V(j > 0)[A(j, 2) < A(j + 1, 2)].
Proof If C(j, 2) = C(j, 1), there is nothing to prove as A(j + 1, 2) > 0. The only

other possibility is C(j, 2) < C(j, 1) (i.e., A(j, 2) > 0). In this case, the best cover
for levels 0 through j uses exactly 2 expansion levels. From the recurrence for C

(Equations 2.3 and 2.4), it follows that C(j, 1) = 2j+, and

C(j,2) = C(a, 1) + nodes(a + 1) 23-a

= 2+l nodes(a + 1) 2j-a,


for some a, 0 < a < j. Therefore,


A(j, 2) = C(j, 1) (j,2)

= 2j+ 23+1 nodes(a + 1) 2-a.









From Equations 2.3 and 2.4, it follows that

C(j+ 1,2) < C(a, 1)+nodes(a + 1) 2j+l-

= 2+l1 + nodes(a + 1) 2j+l-a

Hence,

A(j + 1,2) > 2j+2 2a+ nodes(a + 1) 2j+l-a

Therefore,


A(j + 1, 2) A(j, 2) > 2j+2 2a+l -_ nodes(a + 1) 2j+l-a

-2+1 + 20+1 + nodes(a + 1) 2j-a

= 2+ nodes(a + 1) 2j-a

> 2j+ 20+1 2j-a (Lemma l(a))

= 0




Lemma 5 V(j > 0, k > 2)[A(j, k-1) < A(j+1, k-1)] = V(j > 0, k > 2)[A(j, k) <

A(j + 1, k)].
Proof Assume that V(j > 0, k > 2)[A(j, k 1) < A(j + 1, k 1)]. We shall show

that V(j > 0, k > 2)[A(j, k) < A(j + 1, k)]. Let M(j, k) = b and M(j + 1, k 1) = c.
Case 1: c > b.












= C(j, k 1) C(j, k)

= C(j, k 1) C(b, k 1) nodes(b + 1) 2j-b

< C(b, k 2) + nodes(b + 1) 2j-b

-C(b, k 1) nodes(b + 1) 2j-b


A(j + 1, k)


= C(j+1, k 1) -C(j+1,k)

> C(c, k 2) + nodes(c + 1) 2+l-c

-C(c, k 1) nodes(c + 1) 2j+l-c


= A(c, 1).

Since c > b, A(b, k 1) < A(c, k 1). Therefore,

A(j + 1, A) > A(c, k 1) > A(b, k 1) > A(j, k).

Case 2: c < b.

Let M(j + 1, k) = a, M(j, k) = b, M(j + 1, k 1) = c, and M(j, k 1)

From Lemma 2, a > b and c > d. Since c < b, a > b > c> d. Also,


C(j,k- 1)-

[C(d, k 2)

-[C(b, k


- C(j, k)

+ nodes(d + 1) 2j-d]

- 1) + nodes(b + 1) 2j-b]


A(j,k)


A(b,k- 1).


Also,


A(j, k)











A(j + 1, k)


C(j + 1, k- 1)- C(j + 1, k)

[C(c, k 2) + nodes(c + 1) 2j+l-c]

-[C(a, k 1) + nodes(a + 1) 2J+l-a].


[C(c, k 2)

-[C(d, k

+[C(b, k

-[C(a, k


+ nodes(c + 1) 2j+l-c]

- 2) + nodes(d + 1) 2j-d]

- 1) + nodes(b + 1) 2j-b]

- 1) nodes(a + 1) 2+l-a]. (2.10)


Since > b > c > d = Mj, k 1),

C(c, k 2) + nodes(c + 1) 2-

> C(d, k 2) + nodes(d + 1) 2j-d

Furthermore, since M(j + 1, k) = a > b,

C(b, k 1) + nodes(b + 1) 2j+ -b

> C(a, k 1) + nodes(a + 1) 2j+1-a

Substituting Equations 2.11 and 2.12 into Equation 2.10, we get

A(j + 1, k) A(j, k) > nodes(c + 1) 2j-c nodes(b + 1) 2j-b

Lemma 1 and c < b imply nodes(c + 1) 2b-c > nodes(b + 1). Therefore,

nodes(c + 1) 2j-c > nodes(b + 1) 2j-b

So, A(j + 1, k) A(j, k) > 0.


and


Therefore,


(2.11)






(2.12)


A(j + 1, k) A(j, k)









Lemma 6 V(j > 0, k > 2)[A(j, k) < A(j + 1, k)].
Proof Follows from Lemmas 4 and 5. U

Lemma 7 Let k > 2. V(j > 0)[A(j, k- 1) A(j+ 1, k- 1)] = V(j > 0)[M(j, k) >

M(j, k 1)].

Proof Assume that V(j > 0)[A(j, k 1) < (j + 1, k 1)]. Suppose that M(j, k -

1) = a, M(j, k) = b, and b < a for some j, j > 0. From Equation 2.3, we get

C(j, k) =C(b, k 1) + nodes(b + 1) 2-b

SC(a, k 1)+nodes(a+ 1) 23-

and


C(j, k 1) = C(a, k 2) + nodes(a + 1) 23-0

< C(b, k 2) + nodes(b + 1) 2j-b

Hence,

C(b,k 1)+C(a,k 2)
Therefore,

A(a, k 1) < A(b, k 1).

However, b < a and V(j > 0)[A(j, k 1) < A(j + 1, k 1)] imply that A(b, k 1) <

A(a, k 1). Since our assumption that b < a leads to a contradiction, it must be

that there is no j > 0 for which M(j, k 1) = a, M(j, k) = b, and b < a. m

Lemma 8 V(j > 0, k > 2)[M(j, k) > M(j, k 1)].

Proof Follows from Lemmas 6 and 7. U

Theorem 1 V(j > 0, k > 2)[M(j, k) > max{M(j 1, k), M(j, k 1)}].

Proof Follows from Lemmas 2 and 8. U









Algorithm FixedStrides(W, k)
// W is length of longest prefix.
// k is maximum number of expansion levels desired.
// Return C(W 1, k) and compute M(*, E).
{
for (j 0;j < W;j ++){
C(j, 1):= 2j;
M(j, 1) := -1;}
for (r= 1;r< k;r++)
C(-1, r) := 0;
for (r = 2; r < k;r + +)
for (j = r-1; < W;+ +){
// Compute C(j, r).
minJ:= max(M(j 1, r), M(j,r 1));
minCost := C(j, r 1);
minL := M(j, r 1);
for (m = minJ; m < j; m ++){
cost := C(m, j 1) + nodes(m + 1) 2-m;
if (cost < minCost) then
{minCost:= cost; minL := m;}}
C(j, r) := minCost; M(j, r) := minL;}
return C(W 1, k);
}
Figure 2-3: Algorithm for fixed-stride tries.

Note 1 From Lemma 6, it follows that whenever A(j, k) > 0, A(q, k) > 0, Vq > j.

Theorem 1 leads to Algorithm FixedStrides (Figure 2-3), which computes C(W-

1, k). The complexity of this algorithm is O(kW2). Using the computed M values,

the strides for the OFST that uses at most k expansion levels may be determined in an

additional O(k) time. Although our algorithm has the same ;-i\lipt.iti, complexity

as does the algorithm of Srinivasan and Varghese [82], experiments conducted by us

using real prefix sets indicate that our algorithm runs faster.
2.3 Variable-Stride Tries

2.3.1 Definition and Construction

In a variable-stride trie (VST) [82], nodes at the same level may have different

strides. Figure 2-4 shows a two-level VST for the 1-bit trie of Figure 2-1. The stride










P5 00
P5 01
P1 10
P4 11

00000 P8 000
00001 P7 001 P3
00010 P6 010
00011 P6 011
00100 P6 100 P2
00101 P6 101 P2
00110 P6 110 P2
00111 P6 111 P2

11100
11101
11110
11111

Figure 2-4: Two-level VST for prefixes of Figure 2-1(a)


for the root is 2; that for the left child of the root is 5; and that for the root's right

child is 3. The memory requirement of this VBT is 4 (root) + 32 (left child of root)

+ 8 (right child of root) = 44.

Since FSTs are a special case of VSTs, the memory required by the best VST

for a given prefix set P and number of expansion levels k is less than or equal to that

required by the best FST for P and k. Despite this, FSTs may be preferred in certain

router applications "because of their simplicity and slightly faster search time" [82].

Let r-VST be a VST that has at most r levels. Let Opt(N, r) be the cost (i.e.,

memory requirement) of the best r-VST for a 1-bit trie whose root is N. Srinivasan

and Varghese [82] have obtained the following dynamic programming recurrence for

Opt(N, r).




Opt(N, r)= min {2s + Opt(Q, r- 1)}, r > 1 (2.13)
sc{1..l+height(N)}
QcDs(N)









where D,(N) is the set of all descendents of N that are at level s of N. For ex-

ample, Di(N) is the set of children of N and D2(N) is the set of grandchildren of

N. height(N) is the maximum level at which the trie rooted at N has a node. For

example, in Figure 2-1(b), the height of the trie rooted at N1 is 5. When r = 1,


Opt(N, 1) = 21+hcighl() (2.14)

Equation 2.14 is equivalent to Equation 2.2; the cost of covering all levels of N

using at most one expansion level is 2l+cight(N) When more than one expansion

level is permissible, the stride of the first expansion level may be any number s that

is between 1 and 1 + height(N). For any such selection of s, the next expansion

level is level s of the 1-bit trie whose root is N. The sum in Equation 2.13 gives

the cost of the best way to cover all subtrees whose roots are at this next expansion

level. Each such subtree is covered using at most r 1 expansion levels. It is easy to

see that Opt(R, k), where R is the root of the overall 1-bit trie for the given prefix

set P, is the cost of the best k-VST for P. Srinivasan and Varghese [82] describe a

way to determine Opt(R, k) using Equations 2.13 and 2.14. Although Srinivasan and

Varghese state that the complexity of their algorithm is O(nW2k), where n is the

number of prefixes in P and W is the length of the longest prefix, a close examination

reveals that the complexity is O(pWk), where p is the number of nodes in the 1-bit

trie. Since p = O(n) for realistic router prefix sets, the complexity of their algorithm

is O(nWk) on realistic router prefix sets.

We develop an alternative dynamic programming formulation that also permits

the computation of Opt(R, k) in O(pWk) time. However, the resulting algorithm is

considerably faster. Let


Opt(N, s, r)= 1 Opt(Q, r), s > 0, r > 1,
QeD D(N)

and let Opt(N, 0, r) = Opt(N, r). From Equations 2.13 and 2.14, we obtain:












Opt(N, 0, r) = min.{.,+heght(N)} {2 + Opt(N, s, r 1)}, r > 1 (2.15)

and




Opt(N, 0, 1) = 21+h"eght(). (2.16)

For s > 0 and r > 1, we get


Opt(N, s, r) = Z Opt(Q, r)
QecD(N)
= Opt(LeftChild(N), s 1, r)

+ Opt(RightChild(N), s- 1, r). (2.17)


For Equation 2.17, we need the following initial condition:


Opt(null, *, *) = 0 (2.18)


The number of Opt(*, *, *) values is O(pWk). Each Opt(*, *, *) value may be

computed in 0(1) time using Equations 2.15 through 2.18 provided the Opt values

are computed in postorder. Therefore, we may compute Opt(R, k) = Opt(R, 0, k) in

O(pWk) time. Although both our algorithm and that of [82] run in O(pWk) time, our

algorithm is expected to do less work. We arrive at this expectation by performing a

somewhat crude operation count analysis. In the algorithm of [82], for each value of

r (see Equation 2.13), Opt(M, r 1) is used levelM times, where levelM is the level

for node M. Adding in 1 unit for the initial storage of Opt(M, r 1), we see that a

levelM node contributes roughly levelM + 1 to the total cost of computing Opt(*, r).

Therefore, a rough operation count for the algorithm of [82] is


OpCountSrini = k Y(levelM + 1)
M









where the sum is taken over all nodes M of the 1-bit trie.

Let heightM be the height of the subtree rooted at node M of the 1-bit trie (the

height of a subtree that has only a root is 0). Our algorithm computes (heightM+ 1)k

Opt(M, *, *) values at node M. Each of these values is computed using a single

addition. So, the operation count for our algorithm is crudely estimated to be


OpCountOur = k Y(heightM + 1)
M

For our five databases Paix, Pb, MaeWest, Aads, and MaeEast, the ratios

OpCountSrini/OpCountOur are 6.7, 5.9, 5.7, 5.6, and 5.4. We can determine the

possible range for this ratio by computing the ratio for skewed as well as full binary

trees.

For a totally skewed 1-bit trie (e.g., a left or right skewed trie), the two operation

count estimates are the same. For a 1-bit trie that is a full binary tree of height W- 1,

W-1
OpCountSrini/k = (p + 1)2P
0
= (- 1)2+ 1


and
W-1
OpCountOur/k = Y(W -p)2P
0
= 2 w+ -2


So, OpCountSrini/OpCountOur ( (W 1)/2. Since skewed and full binary

trees represent two extremes for the operation count ratio, the operation count ratio

is expected to be between 1 and (W 1)/2. For IPv4, W = 32 and this ratio lies

between 1 and 15.5. For IPv6, W = 128 and this ratio is between 1 and 63.5.

Although the number of operations being performed is an important contributing

factor to the observed run time of an algorithm, the number of cache misses often

has significant impact. For the algorithm of [82], we estimate the number of cache









misses to be of the same order as the number of operations (i.e., OpCountSrini).

Because our algorithm is a simple postorder traversal that visits each node of the

1-bit trie exactly once, the number of cache misses for our algorithm is estimated

to be OpCountOur/L, where L is the smaller of k and the number of Opt(M, *, *)

values that fit in a cache line. The cache miss count gives our algorithm another

factor of L advantage over the algorithm of [82].

When the cost of operations dominates the run time, our crude analysis indicates

that our algorithm will be about 6 times as fast as that of [82] (for our test databases).

When cache miss time dominates the run time, our algorithm could be 12 times as

fast when k = 2 and 42 times as fast when k = 7. Of course, since our analysis doesn't

include all of the overheads associated with the two algorithms, actual speedups may

be quite different.

Our algorithm requires O(W2k) memory for the Opt(*, *, *) values. To see this,

notice that there can be at most W + 1 nodes N whose Opt(N, *, *) values must

be retained at any given time, and for each of these at most W + 1 nodes, O(Wk)

Opt(N, *, *) values must be retained. To determine the optimal strides, each node of

the 1-bit trie must store the stride s that minimizes the right side of Equation 2.15 for

each value of r. For this purpose, each 1-bit trie node needs O(k) space. Therefore,

the memory requirements of the 1-bit trie are O(pk). The total memory required is,

therefore, O(pk + W2k).

In practice, we may prefer an implementation that uses considerably more mem-

ory. If we associate a cost array with each of the p nodes of the 1-bit trie, the memory

requirement increases to O(pWk). The advantage of this increased memory imple-

mentation is that the optimal strides can be recomputed in O(W2k) time (rather

than O(pWk)) following each insert or delete of a prefix. This is so because, the

Opt(N, *, *) values need be recomputed only for nodes along the insert/delete path

of the 1-bit trie. There are O(W) such nodes.









PI
P2
P3
P4:
P5
P6
P7:
P8:


N21


N22


N31


N32


N41


P7

P8


N42

]N5

N6


Figure 2-6: 1-bit trie for prefixes of Figure 2-5(a)


2.3.2 An Example

Figure 2-5(a) gives a prefix set P that contains 8 prefixes. The length of the

longest prefix (P8) is 7. Figure 2-5(b) gives the prefixes that remain when the prefixes

of P are expanded into the lengths 1, 3, 5, and 7. As we shall see, these expanded

prefixes correspond to an optimal 4-VST for P. Figure 2-6 gives the 1-bit trie for

the prefixes of Figure 2-5.

To determine the cost, Opt(NO, 0, 4), of the best 4-VST for the prefix set of

Figure 2-5(a), we must compute all the Opt values shown in Figure 2-7. In this

figure Opti, for example, refers to Opt(N1, *, *) and Opt42 refers to Opt(N42, *, *).


S0* 0 *(PI)
= i* 1 (P2)
=11* 101 (P4)
S101* 110 (P3)
= 10001* 111 (P3)
=1100* 10001 (P5)
=110000* 11000 (P6)
= 1100000* 11001 (P6)
1100000 (P8)
1100001 (P7)
Original prefixes (b) Expanded prefixes

5: A prefix set and its expansion to four lengths

NO
PI P2


(a)
Figure 2









Opto r = 1 2 3 4
= 0 128 26 20 18
1 64 18 16 16
2 40 18 16 16
3 20 12 12 12
4 10 8 8 8
5 4 4 4 4
6 2 2 2 2

Opt21 r= 1 2 3 4
s=0 8 6 6 6
1 4 4 4 4
2 2 2 2 2

Opt31 r =1 2 3 4
s=0 4 4 4 4
1 2 2 2 2

Opt41 = 1 2 3 4
s=0 2 2 2 2

p0t-, r = 1 2 3 4
s=O 4 4 4 4
1 2 2 2 2

Opt r =1 2 3 4
s=0 2 2 2 2


Opt r = 1 2 3 4
s = 0 64 18 16 16
1 40 18 16 16
2 20 12 12 12
3 10 8 8 8
4 4 4 4 4
5 2 2 2 2

Opt__ r = 1 2 3 4
S=0 32 12 10 10
1 16 8 8 8
2 8 6 6 6
3 4 4 4 4
4 2 2 2 2

Opt32 r = 1 2 3 4
s=0 16 8 8 8
1 8 6 6 6
2 4 4 4 4
3 2 2 2 2

Opt42 r = 1 2 3 4
s=0 8 6 6 6
1 4 4 4 4
2 2 2 2 2


Figure 2-7: Opt values in the computation of Opt(NO, 0, 4)


The Opt arrays shown in Figure 2-7 are computed in postorder; that is, in the order

N41, N31, N21, N6, N5, N42, N32, N22, N1, NO. The Opt values shown in Figure 2-7

were computed using Equations 2.15 through 2.18.

From Figure 2-7, we determine that the cost of the best 4-VST for the given

prefix set is Opt(NO, 0, 4) = 18. To construct this best 4-VST, we must determine the

strides for all nodes in the best 4-VST. These strides are easily determined if, with

each Opt(*, 0, *), we store the s value that minimizes the right side of Equation 2.15.

For Opt(NO, 0, 4), this minimizing s value is 1. This means that the stride for the

root of the best 4-VST is 1, its left subtree is empty (because NO has an empty










0 P1
1 P2

00
P4 01
P3 10
P3 \11

0 00 P6
1 01 P6
10
/ I

0 00 P8
1 P5 01 P7
10
11

Figure 2-8: Optimal 4-VST for prefixes of Figure 2-5(a)


left subtree), its right subtree is the best 3-VST for the subtree rooted at N1. The

minimizing s value for Opt(N1, 0, 3) is 2 (actually, there is a tie between s = 2 and

s = 3; ties may be broken arbitrarily). Therefore, the right child of the root of the

best 4-VST has a stride of 2. Its first subtree is the best 2-VST for N31; its second

subtree is empty; its third subtree is the best 2-VST for N32; and its fourth subtree

is empty. Continuing in this manner, we obtain the 4-VST of Figure 2-8. The cost

of this 4-VST is 18.

2.3.3 Faster k = 2 Algorithm

The algorithm of Section 2.3.1 may be used to determine the optimal 2-VST for

a set of n prefixes in O(pW) (equal to O(nW) for practical prefix sets) time, where

p is the number of nodes in the 1-bit trie and W is the length of the longest prefix.

In this section, we develop an O(p)-time algorithm for this task.

From Equation 2.13, we see that the cost, Opt(root, 2) of the best 2-VST is









Algorithm ComputeC(t)
//Initial invocation is ComputeC(root).
// The C array and level are initialized to 0 prior to initial invocation.
// Return height of tree rooted at node t.
{
if (t! = null) {
level + +;
leftHeight = ComputeC(t.leftChild);
rightHeight = ComputeC(t.rightChild);
level -
height = max{leftHeight, rightHeight} + 1;
C[level] += 2heghwt+1
return height;
}
else return -1;
Figure 2 Algorithm to compute C using Equation 2.20.
Figure 2-9: Algorithm to compute C using Equation 2.20.


Opt(root, 2)


minsEC{..l+height(root)} {2s-- + Opt(Q, 1)}
QeD, (root)
mins {1 ..+height(root)} {2 + 21+height(Q)}
QeDs (root)
minsE{1..l+height(root)} {2S + C(s)}


C(s)= 21 +height(Q) (2.20)
QeD (root)

We may compute C(s), 1 < s < 1 + height(root), in O(p) time by performing a

postorder traversal (see Figure 2-9) of the 1-bit trie rooted at root. (Recall that p is

the number of nodes in the 1-bit trie.)

Once we have determined the C values using Algorithm ComputeC (Figure 2-9),

we may determine Opt(root, 2) and the optimal stride for the root in an additional

O(height(root)) time using Equation 2.19. If the optimal stride for the root is s, then


where


(2.19)









the second expansion level is level s (unless, s = 1 + height(root), in which case there

isn't a second expansion level). The stride for each node at level s is one plus the

height of the subtree rooted at that node. The height of the subtree rooted at each

node was computed by Algorithm ComputeC, and so the strides for the nodes at the

second expansion level are easily determined.

2.3.4 Faster k = 3 Algorithm

Using the algorithm of Section 2.3.1 we may determine the optimal 3-VST in

O(pW) time. In this section, we develop a simpler and faster O(pW) algorithm for

this task. On practical prefix sets, the algorithm of this section runs in O(p) time.

From Equation 2.13, we see that the cost, Opt(root, 3) of the best 3-VST is



Opt(root, 3) = mins{..I+height(root)} {2- + Opt(Q, 2)}
QeDD (root)
= ilnsel{..l+height(root)}{2f + T(s)} (2.21)


where



T(s)= Opt(Q, 2) (2.22)
QeD. (root)

Figure 2-10 gives our algorithm to compute T(s), 1 < s < 1 + height(root).

The computation of Opt(M, 2) is done using Equations 2.19 and 2.20. In Algorithm

ComputeT (Figure 2-10), the method allocate allocates a one-dimensional array that

is to be used to compute the C values for a subtree. The allocated array is initialized

to zeroes; it has positions 0 through W, where W is the length of the longest prefix

(W also is 1 + height(root)); and when computing the C values for a subtree whose

root is at level j, only positions j through W of the allocated array may be modified.

The method deallocate frees a C array previously allocated.














Algorithm ComputeT(t)
//Initial invocation is ComputeT(root).
// The T array and level are initialized to 0 prior to initial invocation.
// Return cost of best 2-VST for subtree rooted at node t and height
// of this subtree.
{
if (t! = null) {
level + +;
// compute C values and heights for left and right subtrees of t
(leftC, leftHeight) = ComputeT(t.leftChild);
(rightC, rightHeight) = ComputeT(t.rightChild);
level -;
// compute C values and height for t as well as
// bestT = Opt(t, 2) and t.stride = stride of node t
//in this best 2-VST rooted at t.
height = max{leftHeight, rightHeight} + 1;
bestT = leftC[level] = 2hght+l;
t.stride = height + 1;
for (int i = 1; i <= height; i + +) {
leftC[level + i] += rightC[level + i];
if (2 + leftC[level + i] < t.bestT) {
bestT = 2 + leftC[level + i];
t.stride = i;
}}
T[level]+ = bestT;
deallocate(rightC);
return (leftC, height);
}
else {//t is null
allocate(C);
return (C, -1);
}
}


Figure 2-10: Algorithm to compute T using Equation 2.22.









The complexity of Algorithm ComputeT is readily seen to be O(pW). Once

the T values have been computed using Algorithm ComputeT, we may determine

Opt(root, 3) and the stride of the root of the optimal 3-VST in an additional O(W)

time. The strides of the nodes at the remaining expansion levels of the optimal 3-

VST may be determined from the t.stride and subtree height values computed by

Algorithm ComputeT in O(p) time. So the total time needed to determine the best

3-VST is O(pW).

When the difference between the heights of the left and right subtrees of nodes in

the 1-bit trie is bounded by some constant d, the complexity of Algorithm ComputeT

is O(p). We use an amortization scheme to prove this. First, note that, exclusive

of the recursive calls, the work done by Algorithm ComputeT for each invocation is

O(height(t)). For simplicity, assume that this work is exactly height(t) + 1 (the 1

is for the work done outside the for loop of ComputeT). Each active C array will

maintain a credit that is at least equal to the height of the subtree it is associated

with. When a C array is allocated, it has no credit associated with it. Each node in

the 1-bit trie begins with a credit of 2. When t = N, 1 unit of the credits on N is

used to pay for the work done outside of the for loop. The remaining unit is given

to the C array leftC. The cost of the for loop is paid for by the credits associated

with rightC. These credits may fall short by at most d+ 1, because the height of the

left subtree of N may be up to d more than the height of N's right subtree. Adding

together the initial credits on the nodes and the maximum total shortfall, we see that

p(2 + d + 1) credits are enough to pay for all of the work. So, the complexity of

ComputeT is O(pd) = O(p) (because d is assumed to be a constant). In practice,

we expect that the 1-bit tries for router prefixes will not be too skewed and that the

difference between the heights of the left and right subtrees will, in fact, be quite

small. Therefore, in practice, we expect ComputeT to run in O(p) time.







51

Table 2-3: Memory required (in Kbytes) by best k-level FST

Levels(k) 2 3 4 5 6 7
Paix 49,192 3,030 1,340 1,093 960 922
Pb 47,925 2,328 896 699 563 527
MaeWest 44,338 2,168 819 636 499 468
Aads 42,204 2,070 782 594 467 436
MaeEast 38,890 1,991 741 549 433 398


2.4 Experimental Results

We programmed our dynamic programming algorithms in C and compared their

performance against that of the C codes for the algorithms of Srinivasan and Varghese

[82]. All codes that were run on a SUN workstation were compiled using the gcc

compiler and optimization level -02; codes run on a PC were compiled using Microsoft

Visual C++ 6.0 and optimization level -02. The codes were run on a SUN Ultra

Enterprise 4000/5000 computer as well as on a 2.26 GHz Pentium 4 PC. For test

data, we used the five IPv4 prefix databases of Table 2-1.

2.4.1 Performance of Fixed-Stride Algorithm

Table 2-3 and Figure 2-11 shows the memory required by the best k-level FST

for each of the five databases of Table 2-1. Note that the y-axis of Figure 2-11

uses a semilog scale. The k values used by us range from a low of 2 to a high of 7

(corresponding to a lookup performance of at most 2 memory accesses per lookup to

at most 7 memory accesses per lookup). As was the case with the data sets used in

[82], using a larger number of levels does not increase the required memory. We note

that for k = 11 and 12, [82] reports no decrease in memory required for three of their

data sets. We did not try such large k values for our data sets.

Table 2-4 and Figure 2-12 show the time taken by both our algorithm and that of

[82] (we are grateful to Dr. Srinivasan for making his fixed- and variable-stride codes

available to us) to determine the optimal strides of the best FST that has at most k

levels. These times are for the Pentium 4 PC. Times in Table 2-5 and Figure 2-13

are for the SUN workstation. As expected, the run time of the algorithm of [82] is







52

105
v Paix
SPb
MaeWest
Aads
S MaeEast
104


0
E






102
1021
2 3 4 5 6 7
k

Figure 2-11: Memory required (in KBytes) by best k-level FST


quite insensitive to the number of prefixes in the database. Although the run time

of our algorithm is independent of the number of prefixes, the run time does depend

on the values of nodes(*) as these values determine M(*, *) and hence determine

minJ in Figure 2-3. As indicated by the graph of Figure 2-12, the run time for

our algorithm varies only slightly with the database. As can be seen, our algorithm

provides a speedup of between z1.5 and z3 compared to that of [82]. When the

codes were run on our SUN workstation, the speedup was between Z2 and A4.

2.4.2 Performance of Variable-Stride Algorithm

Table 2-6 shows the memory required by the best k-level VST for each of the

five databases of Table 2-1. The columns labeled "Yes" give the memory required

when the VST is permitted to have Butler nodes [44]. This capability refers to the

replacing of subtries with three or fewer prefixes by a single node that contains these

prefixes [44]. The columns labeled "No" refer to the case when Butler nodes are not

permitted (i.e., the case discussed in this chapter). The data of Table 2-6 as well








53


Table 2-4: Execution time (in psec) for FST algorithms, Pentium 4 PC

Paix Pb MaeWest Aads MaeEast
k [82] Our [82] Our [82] Our [82] Our [82] Our
2 5.23 3.20 5.19 3.15 5.13 3.15 5.15 3.17 5.17 3.09
3 9.99 4.87 9.73 4.73 9.98 4.81 9.96 4.90 10.00 4.73
4 14.68 6.23 14.53 6.15 14.62 6.29 14.59 6.31 14.64 6.10
5 19.54 7.36 19.42 7.31 19.42 7.40 19.15 7.42 19.45 7.28
6 24.32 9.39 24.08 8.37 24.07 8.47 24.03 8.46 24.23 8.29
7 28.99 9.48 28.72 9.42 28.68 9.45 28.68 9.38 28.76 9.34


Table 2-5: Execution time (in pisec) for FST algorithms, SUN Ultra Enterprise
4000/5000


20

1U
-15


- -v Paix-[16]
SPaix-Our
-- Pb-[16]
Pb-Our
MaeWest-[16]
MaeWest-Our
SAads-[16]
v Aads-Our
MaeEast-[16]
MaeEast-Our


3 4 5 6 7
k


Figure 2-12: Execution time (in psec) for FST algorithms, Pentium 4 PC


Paix Pb MaeWest Aads MaeEast
k [82] Our [82] Our [82] Our [82] Our [82] Our
2 39 21 41 21 39 21 37 20 37 21
3 85 30 81 30 84 31 74 31 96 31
4 123 39 124 40 128 38 122 40 130 40
5 174 46 174 48 147 46 161 45 164 46
6 194 53 201 54 190 55 194 54 190 53
7 246 62 241 63 221 63 264 62 220 62


: : /
: : /d :


/
: : f :



/
: 2 : / :

//
.l- "
/ .
/
r --
/ ....










300


250


200
(D
0)

150
E

100
100


/ )
/









34/5
k/ ,




/
7 -
/--*---




S 3 4 5 6 7
k


Figure 2-13: Execution time (in psec)
4000/5000


SPaix-[16]
Paix-Our
Pb-[16]
Pb-Our
MaeWest-[16]
MaeWest-Our
SAads-[16]
Aads-Our
MaeEast-[16]
SMaeEast-Our


for FST algorithms, SUN Ultra Enterprise


as the memory requirements of the best FST are plotted in Figure 2-14. As can be

seen, the Butler node provision has far more impact when k is small than when k is

large. In fact, when k = 2 the Butler node provision reduces the memory required

by the best VST by almost 50%. However, when k = 7, the reduction in memory

resulting from the use of Butler nodes versus not using them results in less than a

20% reduction in memory requirement.


Table 2-6: Memory required (in Kbytes) by best k-


VST


Paix Pb MaeWest Aads MaeEast
k No Yes No Yes No Yes No Yes No Yes
2 2,528 1,722 1,806 1,041 1,754 949 1,631 891 1,621 837
3 1,080 907 677 496 619 443 582 405 537 367
4 845 749 489 397 441 351 410 320 371 286
5 780 706 440 370 393 327 363 297 326 264
6 763 695 426 361 379 319 350 290 313 257
7 759 692 422 358 376 316 346 287 310 254










105
10 -5 No Butler
Butler
-- FST
\
\ U

104


0
E

10
103m .. .. .. : : :. ...... : -:: : : .:.:.: ........ : .... ... ... ...







102
2 3 4 5 6 7
k

Figure 2-14: Memory required (in KBytes) for Paix by best k-VST and best FST


For the run time comparison of the VST algorithms, we implemented three ver-

sions of our VST algorithm of Section 2.3.1. None of these versions permitted the use

of Butler nodes. The first version, called the O(pk + W2k) Static Memory Implemen-

tation, is the O(pk + W2k) memory implementation described in Section 2.3.1. The

O(W2k) memory required by this implementation for the cost arrays is allocated at

compile time. During execution, memory segments from this preallocated O(W2k)

memory are allocated to nodes, as needed, for their cost ;,ii,\ -. The second version,

called the O(pWk) Dynamic Memory Implementation, dynamically allocates a cost

array to each node of the 1-bit trie nodes using C's malloc method. Neither the first

nor second implementations employ the fast algorithms of Sections 2.3.3 and 2.3.4.

Tables 2-7 and 2-8 give the run time for these two implementations.

The third implementation of our VST algorithm uses the faster k = 2 and k = 3

algorithms of Section 2.3.3 and 2.3.4 and also uses O(pWk) memory. The O(pWk)

memory is allocated in one large block making a single call to malloc. Following















Table 2-7: Execution times (in msec) for first two implementations of our VST algo-
rithm, Pentium 4 PC

Paix Pb MaeWest Aads MaeEast
k S D S D S D S D S D
2 34.3 107.5 17.6 56.2 31.0 50.1 15.9 46.9 12.2 40.2
3 39.1 115.4 22.4 65.2 15.2 58.2 19.0 53.1 15.1 46.5
4 47.0 131.4 28.2 74.8 20.0 66.2 23.3 57.6 16.6 54.2
5 51.5 140.7 29.6 78.0 20.3 66.2 23.2 62.0 19.9 56.0
6 59.0 146.7 32.9 82.7 27.9 69.4 26.3 71.6 21.4 62.5
7 63.7 159.3 31.0 88.6 32.8 79.0 32.7 73.3 29.4 67.1
S = O(pk + W2k) Static Memory Implementation
D = O(pWk) Dynamic Memory Implementation


Table 2-8: Execution times (in msec) for first two implementations of our VST algo-
rithm, SUN Ultra Enterprise 4000/5000


Paix Pb MaeWest Aads MaeEast
k S D S D S D S D S D
2 290 500 150 280 150 260 120 200 120 230
3 360 790 190 460 180 430 150 340 150 340
4 430 900 210 520 220 430 180 430 160 390
5 490 1140 260 610 240 570 200 520 190 470
6 530 1170 290 670 270 570 270 550 220 510
7 590 1390 330 780 300 690 300 630 260 560
S = O(pk + W2k) Static Memory Implementation
D = O(pWk) Dynamic Memory Implementation







57

Table 2-9: Execution times (in msec) for third implementation of our VST algorithm,
Pentium 4 PC
k Paix Pb MaeWest Aads MaeEast
2 21.0 10.6 9.0 8.2 7.3
3 27.8 15.0 13.2 12.1 10.7
4 48.5 27.6 24.6 22.9 20.6
5 56.2 32.3 28.7 26.7 24.0
6 62.1 36.4 32.5 30.4 27.1
7 69.3 40.3 36.1 33.7 30.3

Table 2-10: Execution times (in msec) for third implementation of our VST algo-
rithm, SUN Ultra Enterprise 4000/5000

k Paix Pb MaeWest Aads MaeEast
2 70 30 30 20 20
3 210 100 90 80 70
4 550 290 270 270 240
5 640 350 370 330 260
6 740 430 390 410 350
7 920 530 450 400 350


this, the large allocated block of memory is partitioned into cost ;,ii.'\- for the 1-bit

trie nodes by our program. The run time for the third implementation is given in

Tables 2-9 and 2-10. The run times for all three of our implementations is plotted

in Figures 2-15 and 2-16. Notice that this third implementation is significantly

faster than our other O(pWk) memory implementation. Note also that this third

implementation is also faster than the O(pk + W2k) memory implementation for the

cases k = 2 and k = 3 (this is because, in our third implementation, these cases use

the faster algorithms of Sections 2.3.3 and 2.3.4).

To compare the run time performance of our algorithm with that of [82], we

use the times for implementation 3 when k = 2 or 3 and the times for the faster of

implementations 1 and 3 when k > 3. That is, we compare our best times with the

times for the algorithm of [82]. The times for the algorithm of [82] were obtained using

their code and running it with the Butler node option off. Since the code of [82] does

no dynamic memory allocation, our use of the times for the static memory allocation

















SDynamic
-tThird


2 3 4 5 6 7
k


Figure 2-15: Execution times
Pentium 4 PC















1000

8 800

S600
I-


(in msec) for Paix for our three VST implementations,


5 6


Figure 2-16: Execution times (in msec) for Paix
SUN Ultra Enterprise 4000/5000


for our three VST implementations,


-1







59

Table 2-11: Execution times (in msec) for our best VST implementation and the
VST algorithm of Srinivasan and Varghese, Pentium 4 PC

Paix Pb MaeWest Aads MaeEast
k [82] Our [82] Our [82] Our [82] Our [82] Our
2 64.6 21.0 37.4 10.6 31.1 9.0 27.9 8.2 26.6 7.3
3 665.6 27.8 339.2 15.0 297.0 13.2 269.8 12.1 244.1 10.7
4 1262.7 47.0 629.8 27.6 559.4 20.0 503.2 22.9 448.5 16.6
5 1858.0 51.5 928.4 29.6 817.1 20.3 737.2 23.2 659.8 19.9
6 2441.0 59.0 1215.8 32.9 1073.2 27.9 971.4 26.3 868.9 21.4
7 3034.7 63.7 1512.7 31.0 1328.0 32.8 1209.3 32.7 1072.0 29.4

Table 2-12: Execution times (in msec) for our best VST implementation and the VST
algorithm of Srinivasan and Varghese, SUN Ultra Enterprise 4000/5000

Paix Pb MaeWest Aads MaeEast
k [82] Our [82] Our [82] Our [82] Our [82] Our
2 190 70 130 30 50 30 40 20 40 20
3 1960 210 1230 100 360 90 320 80 280 70
4 3630 430 2330 210 700 220 590 180 530 160
5 5340 490 3440 260 1030 240 860 200 780 190
6 7510 530 4550 290 1340 270 1150 270 1020 220
7 9280 590 5650 330 1650 300 1420 300 1270 260


does not, in any way, disadvantage the algorithm of [82]. The run times, on our 2.26

GHz PC, are shown in Table 2-11 and these times are plotted in Figure 2-17. For our

largest database, Paix, our new algorithm takes less than one-third the time taken

by the algorithm of [82] when k = 2 and about 1/47 the time when k = 7. On our

SUN workstation, as shown in Table 2-12 and Figure 2-18, the observed speedups

for Paix ranges from a low of 2.7 to a high of 15.7. The observed speedups aren't

as high as predicted by our crude analysis because actual speedup is governed by

both the operation cost and the cache-miss cost; further, our crude analysis doesn't

account for all operations. The higher speedup observed on a PC -i,- -,-t a higher

relative cache-miss cost on the PC (relative to the cost of an operation) versus on a

SUN workstation.

The times reported in Tables 2-7-2-12 are only the times needed to determine

the optimal strides for a given 1-bit trie. Once these strides have been determined,



















3500

3000

2500

82000
E
1500
F-
1000

500


2 3 4 5 6


Figure 2-17: Execution times (in msec) for Paix for our best VST implementation
and the VST algorithm of Srinivasan and Varghese, Pentium 4 PC


2 3 4 5 6


Figure 2-18: Execution times (in msec) for Paix for our best VST implementation and
the VST algorithm of Srinivasan and Varghese, SUN Ultra Enterprise
4000/5000


Algorithm of [16]
Algorithm of Our


S Algorithm of [16]
-- Algorithm of Our







61

Table 2-13: Time (in msec) to construct optimal VST from optimal stride data, Pen-
tium 4 PC


k Paix Pb MaeWest Aads MaeEast
2 117.1 78.0 68.5 67.4 64.3
3 107.8 62.6 55.7 47.0 47.0
4 115.5 66.2 61.3 50.9 47.0
5 126.6 78.0 63.6 62.1 56.5
6 131.4 82.6 64.5 68.9 59.5
7 139.3 78.0 75.6 71.6 62.0

Table 2-14: Search time (in psec) in optimal VST, Pentium 4 PC

k Paix Pb MaeWest Aads MaeEast
2 0.55 0.46 0.44 0.43 0.42
3 0.71 0.64 0.62 0.61 0.59
4 0.79 0.74 0.73 0.72 0.72
5 0.92 0.89 0.89 0.88 0.90
6 1.01 1.00 0.99 0.99 0.98
7 1.10 1.10 1.08 1.09 1.10


it is necessary to actually construct the optimal VST. Table 2-13 shows the time

required to construct the optimal VST once the optimal strides are known. For our

databases, the VST construction time is more than the time required to compute the

optimal strides using our best optimal stride computation implementation.

The primary operation performed on an optimal VST is a lookup or search in

which we begin with a destination address and find the longest prefix that matches

this destination address. To determine the average lookup/search time, we searched

for as many addresses as there are prefixes in a database. The search addresses

were obtained by using the 32-bit expansion available in the database for all prefixes

in the database. Table 2-14 and Figure 2-19 show the average time to perform a

lookup/search. As expected, the average search time increases monotonically with k.

For our databases, the search time for a 2-VST is less than or equal to half that for

a 7-VST.

Inserts and deletes are performed less frequently than searches in a VST. We

experimented with three strategies for these two operations:


















-C,,

CO07

06

0254
2 3 4 5 6 7
k

Figure 2-19: Search time (in nsec) in optimal VST for Paix, Pentium 4 PC


* OptVST: In this strategy, the VST was always the best possible k-VST for

the current set of prefixes. To insert a new prefix, we first insert the prefix

into the 1-bit trie of all prefixes. Then, the cost arrays on the insert path are

recomputed. This is done efficiently using implementation 2 (i.e., the O(pWk)

dynamic memory implementation) of our VST stride computation algorithm.

Following this, the optimal strides for vertices on the insert path are computed.

Since, the optimal VST for the new prefix set differs from the optimal VST

for the original prefix set only along the insert path, we modify the original

optimal VST only along this insert path using the newly computed strides for

the vertices on this path. Deletion works in a similar fashion.

* Batchl: In this strategy, the optimal VST is computed periodically (say, after

a sufficient number of inserts/deletes have taken place) rather than following

each insert/delete. Inserts and deletes are done directly into the current VST

without regard to maintaining optimality. If the insertion results in the creation

of a new node, the stride of this new node is such that the sum of the strides of

the nodes on the path from the root to this new node equals the length of the

newly inserted prefix. The deletion of a prefix may require us to search a node









for a replacement prefix of the next (lower) length that matches the deleted

prefix.

Batch2: This differs from strategy Batchi in that inserts and deletes are done

in both the current VST and in the 1-bit trie. This increases the time for an

insert as well as for a delete. In the case of deletion, by first deleting from the

1-bit trie, we determine the next (lower) length matching prefix from the delete

path taken in the 1-bit trie. This eliminates the need to search a node for this

next (lower) length matching prefix when deleting from the VST. The result is

a net reduction in time for the delete operation.

The batch modes described above may also be useful when the insert/delete rate

is sufficiently small that following each insert or delete done as above, the optimal

VST is computed in the background using another processor. While this computation

is being done, routes are made using the suboptimal VST resulting from the insert or

delete that was done as described for the batch modes. When the new optimal VST

has been computed, the new optimal VST is swapped with the suboptimal one.

Tables 2-15-2-20 give the measured run times for the insert and delete operations

using each of the three strategies described above. Figures 5-8 and 5-9 plot these

times for the Paix database. For the insert time experiments, we started with an

optimal VST for 75% of the prefixes in the given database and then measured the

time to insert the remaining 25%. The reported times are the average time for

one insert. For Paix and k = 2, it takes 21 + 78 = 99 milli seconds to construct

the optimal VST (time to compute optimal strides plus time to construct the VST

for these strides). However, the cost of an incremental insert that maintains the

optimality of the VST is only 50.75 micro seconds; the cost of an incremental delete

is 51.85 micro seconds; a speedup of about 2000 over the from scratch optimal VST

construction!









Table 2-15: Insertion time (in [tsec) for OptVST, Pentium 4 PC

k Paix Pb MaeWest Aads MaeEast
2 50.75 49.72 49.53 49.17 208.44
3 325.95 71.25 67.28 66.23 68.19
4 146.74 165.60 126.74 122.92 99.87
5 186.22 191.17 187.68 169.06 186.36
6 2247.96 333.87 252.99 746.92 192.36
7 912.03 446.03 2453.73 445.81 375.32


16: Deletion time


/usec) for


OptVST,


Pentium 4 PC


Although batch insertion is considerably faster than insertion using strategy

OptVST, batch insertion increases the number of levels in the VST, and so results

in slower searches. For example, in the experiments with Paix, the batch inserts

increased the number of levels in the initial k-VST from k to 5 for k = 2, to 6 for k =

3 and 4, and to 8 for k = 5, 6, and 7. The delete times were measured by starting with

an optimal VST for 100% of the prefixes in the given database and then measuring

the time to delete 25% of these prefixes. Once again, the average time for a single

delete is reported.


Table 2-17: Insertion time (in [tsec) for Batchl, Pentium 4 PC

k Paix Pb MaeWest Aads MaeEast
2 1.51 1.73 1.69 1.81 1.89
3 2.37 2.97 3.40 2.58 2.79
4 2.86 3.50 4.04 3.31 3.09
5 3.63 4.33 4.52 3.54 3.93
6 4.18 5.05 6.53 4.35 5.19
7 5.00 4.98 9.03 4.58 5.19


Table 2


k Paix Pb MaeWest Aads MaeEast
2 51.85 51.39 51.79 50.95 52.15
3 61.29 60.77 60.94 59.80 61.71
4 74.86 72.90 73.72 71.99 74.58
5 87.47 85.71 86.31 84.97 87.80
6 99.74 97.70 98.50 96.97 99.90
7 111.92 109.89 110.49 108.82 113.15

















103



2
E 10
F-


OptVST
SBatch1
Batch2

. . - - / -- -- -- - -- -
-/-








. .- .. .. ...-. ..-
- -^ -


100
2


Figure 2-20: Insertion time (in psec) for Paix, Pentium 4 PC



Table 2-18: Deletion time (in psec) for Batchl, Pentium 4 PC

k Paix Pb MaeWest Aads MaeEast
2 4.72 6.13 6.53 6.24 6.44
3 2.54 2.14 2.18 2.34 2.48
4 2.22 2.69 2.59 2.39 2.65
5 2.58 3.25 2.79 2.92 2.94
6 2.73 3.05 3.15 3.86 3.06
7 2.80 3.43 3.10 3.51 3.62



Table 2-19: Insertion time (in psec) for Batch2, Pentium 4 PC


k Paix Pb MaeWest Aads MaeEast
2 3.53 3.68 4.01 4.07 4.00
3 4.18 4.62 4.97 4.72 4.79
4 4.42 4.96 4.93 4.94 5.03
5 4.70 5.32 5.48 4.94 5.51
6 5.10 5.96 6.31 5.72 6.15
7 5.34 6.13 7.26 5.76 5.47















Table 2-20: Deletion time (in psec) for Batch2, Pentium 4 PC


3 4 5 6 7


Figure 2-21: Deletion time (in psec) for Paix, Pentium 4 PC


k Paix Pb MaeWest Aads MaeEast
2 5.70 7.34 7.37 7.16 7.45
3 3.59 3.59 3.67 3.63 3.49
4 3.48 3.78 3.93 3.89 3.98
5 3.67 3.77 4.05 3.79 4.05
6 3.81 4.07 4.14 3.98 4.15
7 3.87 4.19 4.23 3.99 3.90


10 3





() 2
a)~ 10
r)

U,
E

U,
U' 101


100
2









2.5 Summary

We have developed faster algorithms to compute the optimal strides for fixed-

and variable-stride tries than those proposed in [82]. On IPv4 prefix databases and

a 2.26 GHz Pentium 4 PC, our algorithm for fixed-stride tries is faster than the

corresponding algorithm of [82] by a factor of between 1.5 and 3; on a SUN Ultra

Enterprise 4000/5000, the speedup is between 2 and 4. This speedup results from

narrowing the search range in the dynamic-programming formulation. Since the

search range is at most 32 for IPv4 databases and at most 128 for IPv6 databases,

the potential to narrow the range (and hence speed up the computation) is greater

for IPv6 data. Hence, we expect that our narrowed-range FST algorithm will exhibit

greater speedup on IPv6 databases. We are unable to verify this expectation, because

of the non-availability of IPv6 prefix databases.

On our PC, our algorithm to compute the strides for an optimal variable-stride

trie is faster than the corresponding algorithm of [82] by a factor of between 3 and

47; on our SUN workstation, the speedup is between 2 and 17. Our VST stride

computation method permits the insertion and removal of prefixes without having

to recompute the optimal strides from scratch. The incremental insert and delete

algorithms are about 3 orders of magnitude faster than the "from -, .t,. Li algorithm.

We also have proposed two batch strategies for the insertion and removal of prefixes.

Although, these strategies permit faster insertion and deletion, they increase the

height of the VST, which results in slowing down the search operation. These batch

strategies are, nonetheless, useful in applications where it is practical to rebuild the

optimal VST whenever the search performance has become unacceptable.















CHAPTER 3
BINARY SEARCH ON PREFIX LENGTH

In this chapter, we focus on the collection of hash tables (CHT) scheme of Wald-

vogel et al. [87]. Let P be the set of prefixes in a router table, and let Pi be the subset

of P comprised of prefixes whose length is i. In the scheme of Waldvogel et al. [87],

we maintain a hash table Hi for every Pi that is not empty. H, includes the prefixes of

Pi as well as markers for prefixes in Uij
m.mlp is the longest matching-prefix for m. Consider the prefix set P = {P, ..., P6}

of Figure 3-1(a). The prefixes of P have 5 distinct lengths 1, 2, 4, 6, and 7. So,

the CHT of [87] will comprise H1, H2, H4, H6, and H7. Given a destination address

d, the longest matching-prefix, Imp(d) is found by searching the His using a binary

search. Suppose that the binary search follows a path as determined by the binary

tree of Figure 3-1(b). That is, if the first four bits of d correspond to a prefix in H4,

this prefix becomes the longest matching-prefix found so far and the search continues

to H6; if the first four bits of d correspond to a marker m in H4, then m.imp becomes

the longest-matching prefix found so far and the search continues to H6; otherwise,

the search continues to H1. The quest for Imp(d) examines at most 3 hash tables in

our example. When the number of distinct lengths is Idist, the number of hash tables

examined is O(logldist).

For the described search to work correctly, H4 must have markers for P5 and P6;

HI for P3; and H6 for P6. H1, for example, will include P1 and P2 plus the marker

1* for P3 (actually, since P2 = 1*, the marker isn't needed); while H4 will include

P4 plus the marker 1001* for P5 and P6. The Imp value for the marker 1001* is P3.

Srinivasan and Varghese [82] have proposed the use of controlled prefix-expansion

to reduce the number of distinct lengths and hence the number of hash tables in the










P1=0* 00* (Pla)
P =1* 01* (Plb)
P =1* 10* (P3)
P3=0* 11* (P2a)
P4=1000* 1000 (P4)
P5=100100* 1001000* (P5a)
P6=1001001* W H 1001001* (P6)
(a) Prefixes (b) Tree for binary (c) Expanded pre-
search fixes

Figure 3-1: Controlled prefix expansion

CHT. By reducing the number of hash tables in the CHT, the worst-case number
of hash tables searched in the quest for lmp(d) may be reduced. Prefix expansion
[82] replaces a prefix of length u with 20-" prefixes of length v, v > u. The new
prefixes are obtained by appending all possible bit sequences of length v u to the
prefix being expanded. So, for example, the prefix 1* may be expanded to the length
2 prefixes 10* and 11* or to the length 3 prefixes 100*, 101*, 110*, and 111*. In
case an expanded prefix is already present in the prefix set of the router table, it is
dominated by the existing prefix (the expanded prefix 10* represents a shorter original
prefix 1* that cannot be used to match destination addresses that begin with 10 when
longest-prefix matching is used) and so is discarded. So, if we expand P2 = 1* in our
collection of Figure 3-1(a) to length 2, the expand prefix P2b = 10* is dominated by
P3 = 10*. Figure 3-1(c) shows the prefixes that result when the length 1 prefixes of
Figure 3-1(a) are expanded to length 2 and the length 6 prefix is expanded to length
7. You may verify that lmp(d) is the same for all d regardless of whether we use the
prefix set of Figure 3-1(a) or (c) (when the latter set is used, we need to map back
to the original prefix from which an expanded prefix came). Since, the prefixes of
Figure 3-1(c) have only 3 distinct lengths, the corresponding CHT has only 3 hash
tables and may be searched for Imp(d) with at most 2 hash-table searches. Hence,









the CHT scheme results in faster lookup when the prefixes of Figure 3-1(c) are used

than when those of Figure 3-1(a) are used. [71, 72, 73, 82] use prefix expansion to

improve the lookup performance of trie-representations of router tables.

When reducing the number of distinct lengths from u to v, the choice of the

target v lengths affects the number of markers and prefixes that have to be stored in

the resulting CHT but not the number of hash tables, which is always v. Although

the number of target lengths may be determined from the expected number of packets

to be processed per second and the performance characteristics of the computer to be

used for this purpose, the target lengths are determined so as to minimize the storage

requirements of the CHT. Consequently, Srinivasan and Varghese [82] formulated the

following optimization problem.


Exact Collection of Hash Tables Optimization Problem (ECHT)

Given a set P of n prtfi.r and a lm,,., number of distinct lengths k, determine

1 1,,,, I lengths 11, ..., lk such that the storage required by the pirfi.r .- and markers for

the pr fi.r set expansion(P) obtained from P by pi fi.r expansion to the determined

q,,., I lengths is minimum.

When P and k are not implicit, we use the notation ECHT(P, k). For simplicity,

Srinivasan [80] assumes that the storage required by the prefixes and markers for the

prefix set expansion(P) equals the number of prefixes and markers. We make the

same assumption in this chapter. Srinivasan [80] provides an O(nW2)-time heuristic

for ECHT. We first show, in Section 3.1, that the heuristic of Srinivasan [80] may

be implemented so that its complexity is O(nW + kW2) on practical prefix-sets.

Then, in Section 3.2, we provide an O(nW3 + kW4)-time algorithm for ECHT. In

Section 3.3, we formulate an alternative version ACHT of the ECHT problem. In

this alternative version, we are to find at most k distinct target lengths to minimize

storage rather than exactly k target lengths. The ACHT problem also may be solved









in O(nW3 + kW4) time. In Section 3.4, we propose a reduction in the search range

used by the heuristic of [80]. The proposed range reduction reduces the run time

by more than 50% exclusive of the preprocessing time. The reduced-range heuristic

generates the same results on our benchmark prefix data-sets as are generated by

the full-range heuristic of [80]. A more accurate cost estimator than is used in the

heuristic of [80] is proposed in Section 3.5. Experimental results highlighting the

relative performance of the various algorithms and heuristics for ECHT and ACHT

are presented in Section 3.6.

3.1 Heuristic of Srinivasan

The ECHT heuristic of Srinivasan [80] uses the following definitions:

ExpansionCost(i, j)

This is the number of distinct prefixes that result when all prefixes in Pq E P,

i < q < j are expanded to length j. For example, when P = {0*, 1*, 01*, 100*},

ExpansionCost(1, 3) = 8 (note that 0* and 1* contribute 4 prefixes each; 01*

contributes none because its expanded prefixes are included in the expanded

prefixes of 0*).

Entries(j)

This is the maximum number of markers in Hj (should j be a target length) plus

the number of prefixes in P whose length is j. Srinivasan [80] uses "maximum

number of markers" in the definition of Entries(j) rather than the exact number

of markers because of the reported difficulty in computing this latter quantity.

T(j,r)

This is an upper bound on the storage required by the optimal solution to

ECHT(Q, r), where Q C P comprises all prefixes of P whose length is at

most j; the optimal solution to ECHT(Q, r) is required to contain markers, as

necessary, for prefixes of P whose length exceeds j.

Srinivasan [80] provides the following dynamic programming recurrence for T(j, r).












T(j, r) = Entries(j) + min {T(m, r 1)+ ExpansionCost(m + l,j)} (3.1)
mC{r-l ...j-1}




T(j, 1) = Entries(j) + ExpansionCost(, j) (3.2)

We may verify the correctness of Equations 3.1 and 3.2. When r = 1, there is only

1 target length and this length is no more than j. When Q has a prefix whose length

is j, then j must be the target length. In this case, the number of expanded prefixes

is at most ExpansionCost(1, j) plus the number of prefixes whose length is j. So, the

number of prefixes and markers is at most Entries(j) + ExpansionCost(1, j). When

Q has no prefix whose length is j, the optimal target length is the largest 1, 1 < j such

that Q has a prefix whose length is 1. In this case, Entries(1)+ExpansionCost(1, 1) <

Entries(j) + ExpansionCost(1, j) is an upper bound on the number of prefixes and

markers.

To compute ExpansionCost and Entries, a 1-bit trie [39] is used. Figure 3-2

shows a prefix set and its corresponding 1-bit trie. Notice that nodes at level i (the

root is at level 0) of the 1-bit trie store prefixes whose length is i + 1. Srinivasan [80]

states how ExpansionCost(i, j), 1 < i < j < W may be computed in O(nW2) time

using a 1-bit trie for P. Sahni and Kim [71] have observed that, for practical prefix

sets, the 1-bit trie has O(n) nodes. So, by performing a postorder traversal of the

1-bit trie, ExpansionCost(i, j), 1 < i < j < W may be computed in O(nW) time

(note that n > W). Details of this process are provided in Section 3.2.1 where we
show how a closely related function may be computed.

For Entries(j), Srinivasan [80] proposes counting the number of prefixes stored

in level j 1 of the 1-bit trie and the number of (non-null) pointers (in the 1-bit trie)

to nodes at level j (the number of pointers actually equals the number of nodes).










Prefixes
Length N]
Pl =1*
P2=01* 1 PI
N2
P3=001* 2
P4=010* 2 P2
P5=0101* N31 N32
P6=00001* 3 P3 P4
P7=00010* N41 N42 N43
P8=00110* 4 P5
P9=01000* N51/ N52 N53 N54
PO0=01001* 5 P6 P7 P8 P9 PO1

(a) A pre- (b) Corresponding 1-bit trie
fix set

Figure 3-2: Prefixes and corresponding 1-bit trie


The former gives the number of prefixes whose length is j and the latter gives the

maximum number of markers needed for the longer-length prefixes.

Suppose that m and j are target lengths and that no 1, m < 1 < j is a target

length. The actual number of prefixes and markers in Hj may be considerably less

than Entries(j) + ExpansionCost(m + 1, j) for the following reasons.

An expanded prefix counted in ExpansionCost(m + 1, j) may be identical to a

prefix in P whose length is j.

Some of the prefixes in P whose length is more than j may not need to leave a

marker in Hj because their length is not on any binary search (sub)path that

is preceded by the length j. For example, for the binary search described by

Figure 3-1(b), H1 needs markers only for prefixes in H2; not for those in H4,

H6, and H7. However, Entries(1) accounts for markers needed by prefixes in

H2 as well as those in H4, H6, and H7.

Entries(j) doesn't account for the fact that a marker may be identical to a

prefix in which case the storage count for the marker and the prefix together

should be 1 and not 2. For example, in Figure 3-2(b), the marker corresponding

to the non-null pointer to node N42 is identical to the prefix P3 and that for the

non-null pointer to N43 is identical to P4. So, we can safely reduce the value









of Entries(3) from 5 to 3. Note also that if the target lengths for the example

of Figure 3-2(a) are 1, 3, and 5, then the number of prefixes and markers in H3

is 4. However, ExpansionCost(2, 3) + Entries(3) = 2 + 5 = 7.

Exclusive of the time needed to compute ExpansionCost and Entries, the com-

plexity of computing T(W, k) and the target k lengths using Equations 3.1 and 3.2

is O(kW2) [80]. So, the overall complexity is O(nW2) (note that n > k). As noted

above, we may reduce the time required to compute ExpansionCost on practical

prefix-sets by performing a postorder traversal of the 1-bit trie. Hence, for practical

prefix-sets, the overall run time is O(nW + kW2).

3.2 Optimal-Storage Algorithm

As noted in Section 3.1, the algorithm of Srinivasan [80] is only a heuristic for

ECHT. Since T(W, k) is only an upper bound on the cost of an optimal solution

for ECHT(P, k), there is no assurance that the determined target lengths actually

result in an optimal or close-to-optimal solution to ECHT(P, k). In this section,

we develop an algorithm to determine the storage cost of an optimal solution to

ECHT(P, k). The algorithm is easily extended to determine the target lengths that

yield this optimal storage cost. Like the heuristic of Srinivasan [80], our algorithm

uses dynamic programming. However, we modify the definition of expansion cost and

introduce an accurate way to count the number of markers.

Although the heuristic of Srinivasan [80] is insensitive to the shape of the binary

tree that describes the binary search, the optimal-storage algorithm cannot be insensi-

tive to this shape. To see this, notice that the binary tree of Figure 3-1(b) corresponds

to the traditional way to program a binary search. In this, if low and up define the

current search range, then the next comparison is made at mid = L(low + up)/2]. If

instead, we were to make the next comparison at mid = [(low +up)/2], the search is

described by the binary tree of Figure 3-3. When a binary search is performed accord-

ing to this tree, only H4 need have markers. The markers in H4 are the same regardless









H4

H2(




Figure 3-3: Alternative binary tree for binary search

of whether we use mid = [(low + up)/2] or mid = [(low + up)/2]. By using the

latter definition of mid, we eliminate markers from all remaining hash tables. In our

development of the optimal-storage algorithm, we assume that mid = [(low+ up)/2]

is used. Our development is easily altered for the case when mid = [(low + up)/2] is

used.

3.2.1 Expansion Cost

Define EC(i, j), 1 < i < j < W, to be the number of distinct prefixes that result

when all prefixes in Pq E P, i < q < j are expanded to length j. Note that EC(i, i)

is the number of prefixes in P whose length is i.

We may compute EC by traversing the 1-bit trie for P in a postorder fashion.

Each node x at level i-1 of the trie maintains a local set of EC(i, j) values, LEC(i, j),

which is the expansion cost measured relative to the prefixes in the subtree of which

x is root. Some of the cases for the computation of x.LEC(i, j) are given below.

x.LEC(i, i) equals the number of prefixes stored in node x. For example, for

node N1 of Figure 3-2(a), LEC(1, 1) = 1 and for node N54, LEC(5, 5) = 2.

For the remaining cases, assume i < j.

If x has a prefix in its left data field (e.g., the prefix in the left data field of node

N32 is P4) and also has one in its right data field, then x.LEC(i, j) = 2j-+l.

If x has no prefixes (e.g., nodes N41 and N42) and x has non-null left and right

subtrees, then x.LEC(i, j) = x.leftChild.LEC(i+ 1, j)+x.rightChild(i+1, j).









If x has a right prefix and a non-null left subtree, then x.LEC(i, j) = x.left-

Child.LEC(i + 1, j)+ 2-.

The remaining cases are similar to those given above. One may verify that EC(i, j)

is just the sum of the LEC(i, j) values taken over all nodes at level i 1 of the trie.

Figure 3-4 gives the LEC and EC values for the example of Figure 3-2. In this

figure, LEC51, for example, refers to the LEC values for node N51.


LEC51[5j] j 5 LEC52[5j] j =5 L TrC.[ i1 j= 5
1 1 1
LEC54[5j] j = 5 LEC41[4j] j =4 5
2 02
LEC42[4j] j 4 5 LEC43[4j] j =4 5
0 1 1 4 EC[ij]j 123 4 5
i= 1 3 71430
LEC31[3j]j=3 4 5 Lit C2 jl|j 3 4 5 2 1 3 6 14
126 124 3 2 410
LEC2[2j]j=23 4 5 LEC[1j]j =12 3 4 5 4 1 7
1 3 6 14 1 3 7 1430 5 5
(a) LEC values (b) EC values

Figure 3-4: LEC and EC values for Figure 3-2


Since a 1-bit trie for n prefixes may have O(nW) nodes, we may compute all

EC values in O(nW2) time by computing the LEC values as above and summing up

the computed LEC values. A postorder traversal suffices for this. As noted in [71],

the 1-bit tries for practical prefix sets have O(n) nodes. Hence, in practice, the EC

values take only O(nW) time to compute.

3.2.2 Number of Markers

Define MC(i, j, m), 1 < i < j < m < W, to be the number of markers in Hj

under the following assumptions.

The prefix set comprises only those prefixes of P whose length is at most m.

The target lengths include i 1 (for notational convenience, we assume that

0 is a trivial target length for which Ho is always empty) and j but no length

between i 1 and j. Hence, prefixes whose length is i, i + 1, or j are











expanded to length j. Only prefixes whose length is between j + 1 and m may

leave a marker in Hj.

For MC(2, 4, 5) (Figure 3-2), P6 through P10 may leave markers in H4. The

candidate markers are obtained by considering only the first four bits of each of these

prefixes. Hence, the candidate markers are 0000*, 0001*, 0011*, and 0100*. However,

since the next smaller target length is 1, P2, P3, and P4 will leave a prefix in H4. The

prefixes in H4 are 0100*, 0101*, 0110*, 0111*, 0010*, and 0011*. So, of the candidate

markers, only 0000* and 0001* are different from the prefixes in H4. Therefore, the

marker count MC(2, 4, 5) is 2.

We may compute all MC(i, j, m) values in O(nW3) time (O(nW2) for practical

prefix sets) using a local function LMC in each node of the 1-bit trie and a postorder

traversal. The method is very similar to that described in Section 3.2.1 for the

computation of all EC values. Figure 3-5 shows the LMC and MC values for our

example of Figure 3-2.


LMC51[5j,k] k 5 LMC52[5j,k] k 5 LMC53[5j,k] k 5
j=5 0 j=5 0 j=5 0
LMC54[5j,k] k 5 LMC41[4j,k]k 4 5
j=5 0 j=4 02
LMC42[4j,k]k 4 5 5 0
j =4 0 1 LMC43[4j,k]k 4 5
5 0 j=4 01
5 0
LMC31[3,j,k] k 3 4 5
j 3 0 0 1 L.kh -; jijk -3 4 5
4 0 2 j=3 0 0 0
5 0 4 00
5 0
LMCl[1j,k]k 1 2 3 4 5
j 1 0 11 11 LMC2[2j,k] k2 3 4 5
2 0111 j 2 0111
3 0 0 1 3 001
4 02 4 02
5 0 5 0

(a) LMC values


MC[1j,k] k
j=1
2
3
4
5
MC[2j,k] k
j=2
3
4
5


123 4 5 MC[3,j,k] k 3 4 5
01111 j3 00 1
0111 4 02
001 5 0
02
0 MC[4j,k] k 4 5
j=4 04
2 3 4 5 5 0
0111
0 0 1 MC[5j,k] k 5
02 j 5 0
0

(b) MC values


Figure 3-5: LMC and MC values for Figure 3-2









3.2.3 Algorithm for ECHT

Let Opt(i, j, r) by the storage requirement of the optimal solution to ECHT(P, r)

under the following restrictions.

Only prefixes of P whose length is between i and j are considered.

Exactly r target lengths are used.

j is one of the target lengths (even if there is no prefix whose length is j).
Let lmax, Imax < W, be the length of the longest prefix in P. We see that

Opt(l, Imax, k) is the storage requirement of the optimal solution to ECHT(P, k).
When r = 1, there is exactly one target length, j. So, all prefixes must be

expanded to this length and there are no markers. Therefore,



Opt(i, j, 1) = EC(i, j), i < j (3.3)

When r = 2, one of the target lengths is j and the other, say m, lies between

i and j 1. Because we assume mid = [(low + up)/2], the first search is made in

Hj and the second in Hm. Consequently, neither Hj nor Hm has any markers. Hj

(Hm) includes prefixes resulting from the expansion of prefixes of P whose length is

between m + 1 and j (i and m). So,



Opt(i, j, 2) = min {EC(i, m) + EC(m + 1, j)}, i < j (3.4)
i~m
Consider the case r > 2. Let the r target lengths be 11 < 12 < .. < Ir. Suppose

that the mid = [(1 + r)/2]th target length is v. Let u 1 be the largest target

length such that u 1 < v. The first search of the binary search is done in H,. The

number of prefixes and markers in H, is EC(u, v) + MC(u, v, j). Additionally, the

mid- 1 = L(r 1)/2] target lengths that are less than v define an optimal (mid- 1)-

target-length solution for prefixes whose length is between i and u 1 subject to









the constraint that u 1 is a target length (notice that there are no markers in this

solution for prefixes whose length exceeds u 1) and the r m = (r 1)/2] target

lengths greater than v define an optimal (r m)-target-length solution for prefixes

whose length is between v+1 and j subject to the constraint that j is a target length.

Hence, we obtain the following recurrence for Opt(i, j, r).


Opt(i, j, r)


min {Opt(i, 1, [(r 1)/2])
i+[(r-1)/2] v + Opt(v + 1,j, (r 1)/2])+ EC(u,v)

+ MC(u,v,j)},3 < r

(3.5)


Using Equations 3.3-3.5 to compute Opt(l, 5, 4) for the example of Figure 3-2,

we get


Opt(1, 5,4)


min {Opt(l, 1, 2)+ Opt(v + 1, 5,1)+ EC(u, v) + MC(u, v, j)}
3 min{Opt(l, 2, 2) + Opt(4, 5, 1) + EC(3, 3) + MC(3, 3, 5),

Opt(1, 2, 2) + Opt(5, 5,1) + EC(3, 4) + MC(3, 4, 5),

Opt(1, 3,2) + Opt(5,5, 1)+ EC(4, 4) + MC(4, 4, 5)}

min{EC(1, 1) + EC(2, 2) + EC(4, 5) + EC(3, 3) + MC(3, 3, 5),

EC(1, 1) + EC(2, 2) + EC(5, 5) + EC(3, 4) + MC(3, 4, 5),

min{EC(1, 1) + EC(2, 3), EC(1, 2) + EC(3, 3)}

+EC(5, 5) + EC(4, 4) + MC(4, 4, 5)}

min{+ + 7+ 2 + 1,1+ 1+5+4+ 2,

min{1 + 3,3 + 2}+ 5 + 1 + 4}

min{12, 13, 14} = 12.


From the above computations, we see that the optimal expansion lengths are 1, 2,

3, and 5. Figure 3-6(a) shows the CHT structure that results when these four target










lengths are used. The three markers are shown in boldface, two of these markers are

also prefixes (P3 and P4). The storage cost is 12.


Array of hash table pointers Array of hash table pointers
l(length 1) 2(length 2) 3(length 3) 4(length 5) l(length 1) 2(length 3) 4(length 5)

P P201* P6=00001* P6=00001*
P7=00010* P7=00010*
000* P8=00110* 0001 P8=00110*
P3=001 P9=01000* P3=001* P9=01000*
P4=010 P10=01001* P4=010 P10=01001*
P5=01010* P2=011 P5=01010*
P5=01011* P5=01011*

(a) 4 target lengths (b) 3 target lengths

Figure 3-6: Optimal-storage CHTs for Figure 3-2


Complexity

To solve Equations 3.3-3.5 for Opt(l, Imra, k), we need to compute O(kW2) Opt(i, j, r)

values. Each of these values may be computed in O(W2) time from earlier computed

Opt values. Hence, exclusive of the time needed to compute EC and MC, the time

to compute Opt(l, Imax, k) is O(kW4). Adding in the time to compute EC and MC,

we get O(nW3 + kW4) as the overall time needed to solve the ECHT problem. Of

course, on practical data sets, the time is O(nW2 + kW4).

3.3 Alternative Formulation

In the ECHT(P, k) problem, we are to find exactly k target lengths for P that

minimize the number of (expanded) prefixes and markers (i.e., minimize storage cost).

Although k is determined by constraints on required lookup performance, the deter-

mined k is only an upper bound on the number of target lengths because using a

smaller number of target lengths improves lookup performance. The ECHT prob-

lem is formulated with the premise that using a smaller k will lead to increased

storage cost, and so in the interest of conserving storage/memory while meeting the

lookup performance requirement, we use the maximum permissible number of target

lengths. However, this premise is not true. As an example, consider the prefix set









P = {P1, P2, P3} = {0*, 00*, 010*}. The solution for ECHT(P, 2) uses the target

lengths 2 and 3; P1 expands to 00* and 01* but the 00* expansion is dominated by

P2 and is discarded; no markers are stored in either H2 or H3; and the storage cost

is 3. The solution for ECHT(P, 3), on the other hand, uses the target lengths 1, 2,

and 3; no prefixes are expanded; H2 needs a marker 01* for P3; and the total storage

cost is 4!

With this in mind, we formulate the ACHT(P, k) problem in which we are

to find at most k target lengths for P that minimize the storage cost. In case of

a tie, the solution with the smaller number of target lengths is preferred, because

this solution has a reduced average lookup for the preceding example, the solution

to ECHT(P, 3) is {1, 2, 3}, whereas the solution to ACHT(P, 3) is {2, 3}. For the

example of Figure 3-2, the solution to ECHT(P, 4) is {1, 2, 3, 5} resulting in a storage

cost of 12; the solution to ACHT(P, 4) is {1, 3, 5} resulting in a storage cost that is

also 12 (see Figure 3-6(b)).

The ACHT problem may be solved in the same asymptotic time as needed for

the ECHT problem by first computing Opt(i, j, r), 1 < i < j < I i, 1 < r < k and

then finding the r for which Opt(l, Imra, r) is minimum, 1 < r < k.

3.4 Reduced-Range Heuristic

We first adapt the ECHT heuristic of Srinivasan [80] to the ACHT problem.

For this purpose, we define the function C, which is the ACHT analog of T. To get

the definition of C, simply replace ECHT(Q, r) by ACHT(Q, r) in the definition

of T. Also, we use the same definitions for ExpansionCost (now abbreviated to

ECost) and Entries as used in [80] (see Section 3.1). It is easy to see that C(j, r) <

C(j, r- 1),r > 1.

A simple dynamic programming recurrence for C is












C(j, r)= Entries(j) + mmin {C(m, r 1) + ECost(m + 1, j)}, > 0, r > 1
mCG{0...j-1}
(3.6)



C(0, r) = 0, C(j, 1) = Entries(j) + ECost(l, j), j > 0 (3.7)


To see the correctness of Equations 3.6 and 3.7, note that when j > 0, there

must be at least one target length. If r = 1, then there is exactly one target length.

This target length is at most j (the target length is j when there is at least one

prefix of this length) and so Entries(j) + ECost(1,j) is an upper bound on the

storage cost. If r > 1, let m and s, m < s, be the two largest target lengths in the

solution to ACHT(P, r). m could be at any of the lengths 0 through j 1; m = 0

would mean that there is only 1 target length. Hence the storage cost is bounded by

Entries(j) + C(m, r 1) + ECost(m + 1, j). Since we do not know the value of m,

we may minimize over all choices for m. C(0, r) = 0 is a boundary condition.

We may obtain an alternative recurrence for C(j, r) in which the range of m on

the right side is r 1... j 1 rather than 0... j 1. First, we obtain the following

dynamic programming recurrence for C:



C(, r) = min{C(j, r 1), T(j, r)}, r > 1 (3.8)




C(j, 1) = Entries(j) + ECost(1, j) (3.9)

The rationale for Equation 3.8 is that the best CHT that uses at most r target

lengths either uses at most r 1 target lengths or uses exactly r target lengths. When

at most r 1 target lengths are used, the cost is bounded by C(j, r 1), and when









exactly r target lengths are used, the cost is bounded by T(j, r), which is defined by

Equation 3.1. Let U(j, r) be as defined in Equation 3.10.


U(j, r) = Entries(j) +


min {C(m, r 1) + ECost(m + 1,j)}, j > 0, r > 1

(3.10)


From Equations 3.1 and 3.8 we obtain


C(j, r) = min{C(j, r 1), U(j, r)}, r > 1


(3.11)


To see the correctness of Equation 3.11, note that for all j and r such that

r < j, T(j, r) > C(j, r). Furthermore,


Entries(j)


+ min {T(m, r 1) + ECost(m + 1,j)}

> Entries(j) + min {C(m, r 1)+ ECost(m + 1,j)}

= U(j, r) (3.12)


Therefore, when C(j, r 1) < U(j, r), Equations 3.8 and 3.11 compute the same

value for C(j, r). When C(j, r 1) > U(j, r), it appears from Equation 3.12 that

Equation 3.11 may compute a smaller C(j, r) than is computed by Equation 3.8.

However, this is impossible, because


C(j, r)


=Entries(j) + min {C(m, r 1) + ECost(m + 1, j)}
m in{0...j- 1}
< Entries(j) + mmin {C(m, r 1) + ECost(m + 1,j)}
mn{r-l ...j-l1


Therefore, the C(j, r)s computed by Equations 3.8 and 3.11 are equal.









In the remainder of this section, we use the reduced ranges r 1... j 1 for C.

Heuristically, the range for m (in Equation 3.6) may be restricted to a range that is

(often) considerably smaller than r 1... j 1. The narrower range we wish to use

is max{M(j 1, r), M(j, r 1), r 1} ... j 1, where M(j, r), r > 1, is the smallest

m that minimizes

C(m, r 1) + ECost(m + 1, j)

in Equation 3.6. Although the use of this narrower range could result in different

results from what we get using the range r 1... j 1, on our benchmark prefix sets,

this doesn't happen. In the remainder of this section, we derive a condition Z on the

1-bit trie that, if satisfied, guarantees that the use of the narrower range yields the

same results as when the range r 1... j 1 is used.

Let P be the set of prefixes represented by the 1-bit trie. Let exp(i, j), i < j,

be the set of distinct prefixes obtained by expanding the prefixes of P whose length

is between i and j 1 to length j. Note that exp(i, i) = 0 and that Iexp(i, j) =

ECost(i, j). We say that exp(i, j) covers a length j prefix p of P iff p E exp(i, j).

Let n(i, j) be the number of length j prefixes in P that are not covered by a prefix

of exp(i, j). Note that n(i, i) is the number of length i prefixes in P.

The condition Z that ensures that the use of the narrower range produces the

same C values as when the range r 1... r 1 is used is

Z = ECost(a,j) ECost(b,j) > 2(n(b, j) n(a,j))

where, 0 < a < b < j.

Lemma 9 For every 1-bit trie, (a) ECost(i, j + 1) > 2ECost(i, j), 0 < i < j, and

(b) ECost(i, j) > ECost(i + 1, j), 0 < i < j.








Proof (a)

ECost(i,j + 1) = exp(,j + 1)

= 2[|exp(i, j) + n(i,j)]

= 2ECost(i, j) + 2n(i, j)

> 2ECost(i, j)

(b) Since, exp(i 1,j) C exp(i,j), ECost(i,j) = exp(i,j)l > exp(i + 1,j)| =
ECost(i+ 1,j). )

Lemma 10 V(j > 0, i < j)[Entries(j)+ECost(i, j) < Entries(j+1)+ECost(i, j+
1)].
Proof By definition, Entries(j) = number of prefixes of length j plus the number
of nodes at level j of the trie (this latter number equals the number of pointers from
level j -1 to level j). Since each length j prefix expands to 2 length j+1 prefixes, the
first term in the sum for Entries(j) is at most ECost(i,j + 1)/2. Since the subtree
rooted at each level j node contains at least one prefix, the second term in the sum
for Entries(j) is at most Entries(j + 1). So,

Entries(j) < ECost(i, j + 1)/2 + Entries(j + 1)

From Lemma 9(a), ECost(i, j) < ECost(i, j+1)/2. So, Entries(j)+ECost(i, j) <
ECost(i, j + 1) + Entries(j + 1). m

Lemma 11 V(j > 0, r > 0)[C[j, r] < C[j + 1, r]].
Proof First, consider the case when r = 1. From Equation 3.7, we get C(j, 1) =
Entries(j) + ECost(1, j) and C(j + 1, 1)= Entries(j + 1) + ECost(1,j + 1). From
Lemma 10, Entries(j) + ECost(1,j) < Entries(j + 1) + ECost(1, + 1). Hence,
C(j, 1) < C(j + 1, 1).









Next, consider the case r > 1. From the definition of M(j, r), it follows that


C(j + 1, r) = Entries(j + 1) + C(b, r 1) + ECost(b + 1, J+ 1),


where 0 < b =


M(j + 1, r) < j. When b < j, using Equation 3.6 and Lemma 9, we


< Entries(j) + C(b, r 1

< Entries(j + 1) + C(b, r


) +ECost(b+ 1, j)

- 1)+ ECost(b + 1, + 1)


C(j + 1, r).


When b = j,


C(j + 1, r) = Entries(j + 1) + C(j, r


1)+ ECost(j + 1, j + 1) > C(j, r


The remaining lemmas of this section assume that Z is true.


Lemma 12 ECost(a, j +1)


ECost(b, j+1) > ECost(a, j)


ECost(b, j)], 0 < a <


b

Proof From the definition of n(i, j), it follows that


ECost(a, j) ECost(b, j)


j-1
Zn(a, l) 2j-
l=a
b-1
Zn(a, 1) 2
I-a


j-1
Zn(b, 1) 2j-
l=b


j-1
[n(b, 1)
lb


C(j, r)


n(a, 1)] 2j-