## Citation

- Permanent Link: https://ufdc.ufl.edu/UFE0000920/00001

## Material Information

- Title: Data Structures for Dynamic Router Table
- Creator: Lu, Haibin (*Author, Primary*)
- Copyright Date: 2008
- Language: English

## Subjects

- Subjects / Keywords: Bytes (jstor); Chopping (jstor); Data models (jstor); Data ranges (jstor); Databases (jstor); Information search (jstor); Mathematical vectors (jstor); Range searching (jstor); Siblings (jstor); Standard deviation (jstor)

## Record Information

- Source Institution: University of Florida
- Holding Location: University of Florida
- Rights Management: Copyright Haibin Lu. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
- Embargo Date: 9/9/1999
- Resource Identifier: 53314839 (OCLC)
## Full Text

DATA STRUCTURES FOR DYNAMIC ROUTER TABLE

By HAIBIN LU

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA 2003

Copyright 2003 by Haibin Lu

To my family.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my advisor, Dr. Sartaj Sahni, for his mentoring and support throughout my Ph.D. study. My research career would have been impossible without his guidance. This work was supported, in part, by the National Science Foundation under grant CCR-9911[...]. I am very grateful to Dr. Sanjay Ranka, Dr. Randy Chow, Dr. Richard Newman, and Dr. Michael Fang for serving on my Ph.D. supervisory committee and providing helpful suggestions. I want to dedicate this dissertation to my parents. Without their encouragement and hard work, I could not have thought of pursuing a doctoral degree. Finally, I would like to give my special thanks to my wife, Lan, whose caring and love enabled me to complete this work.

TABLE OF CONTENTS

- ACKNOWLEDGMENTS
- ABSTRACT
- CHAPTER 1 INTRODUCTION AND RELATED WORK
  - 1.1 Introduction
    - 1.1.1 Static Router Table
    - 1.1.2 Dynamic Router Table
  - 1.2 Related Work
    - 1.2.1 Trie
    - 1.2.2 Sets of Equal-Length Prefixes
    - 1.2.3 End-Point Array
    - 1.2.4 Multiway Range Tree
    - 1.2.5 O(log n) Dynamic Solutions
    - 1.2.6 Highest-Priority Prefix Table
    - 1.2.7 TCAM
    - 1.2.8 Others
  - 1.3 Contribution
- CHAPTER 2 O(log n) DYNAMIC ROUTER TABLE FOR PREFIXES AND RANGES
  - 2.1 Preliminaries
    - 2.1.1 Prefixes and Longest-Prefix Matching
    - 2.1.2 Ranges and Projections
    - 2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges
    - 2.1.4 Normalized Ranges
    - 2.1.5 Priority Search Trees And Ranges
  - 2.2 Prefixes
  - 2.3 Nonintersecting Ranges
  - 2.4 Conflict-Free Ranges
    - 2.4.1 Determine msr(d)
    - 2.4.2 Insert A Range
    - 2.4.3 Delete A Range
    - 2.4.4 Computing maxP and minP
    - 2.4.5 A Simple Algorithm to Compute maxP
    - 2.4.6 An Efficient Algorithm to Compute maxP
    - 2.4.7 Wrapping Up Insertion of a Range
    - 2.4.8 Wrapping Up Deletion of a Range
    - 2.4.9 Complexity
  - 2.5 Experimental Results
    - 2.5.1 Prefixes
    - 2.5.2 Nonintersecting Ranges
    - 2.5.3 Conflict-free Ranges
  - 2.6 Conclusion
- CHAPTER 3 DYNAMIC IP ROUTER TABLES USING HIGHEST-PRIORITY MATCHING
  - 3.1 Preliminaries
  - 3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs)-BOB
    - 3.2.1 The Data Structure
    - 3.2.2 Search for hpr(d)
    - 3.2.3 Insert a Range
    - 3.2.4 Red-Black-Tree Rotations
    - 3.2.5 Delete a Range
    - 3.2.6 Expected Complexity of BOB
  - 3.3 Highest-Priority Prefix-Tables (HPPTs)-PBOB
    - 3.3.1 The Data Structure
    - 3.3.2 Lookup
    - 3.3.3 Insertion and Deletion
  - 3.4 Longest-Matching Prefix-Tables (LMPTs)-LMPBOB
    - 3.4.1 The Data Structure
    - 3.4.2 Lookup
    - 3.4.3 Insertion and Deletion
  - 3.5 Implementation Details and Memory Requirement
    - 3.5.1 Memory Management
    - 3.5.2 BOB
    - 3.5.3 PBOB
    - 3.5.4 LMPBOB
  - 3.6 Experimental Results
    - 3.6.1 Test Data and Memory Requirement
    - 3.6.2 Preliminary Timing Experiments
    - 3.6.3 Run-Time Experiments
  - 3.7 Conclusion
- CHAPTER 4 A B-TREE DYNAMIC ROUTER-TABLE DESIGN
  - 4.1 Longest-Matching Prefix-Tables-LMPT
    - 4.1.1 The Prefix In B-Tree Structure-PIBT
    - 4.1.2 Finding The Longest Matching-Prefix
    - 4.1.3 Inserting A Prefix
    - 4.1.4 Inserting an endpoint
    - 4.1.5 Update interval vectors
    - 4.1.6 Deleting A Prefix
    - 4.1.7 Deleting from a Leaf Node
    - 4.1.8 Borrow from a Sibling
    - 4.1.9 Merging Two Adjacent Siblings
    - 4.1.10 Deleting from a Non-leaf Node
    - 4.1.11 Cache-Miss Analysis
  - 4.2 Highest-Priority Range-Tables
    - 4.2.1 Preliminaries
    - 4.2.2 The Range In B-Tree Structure-RIBT
    - 4.2.3 RIBT Operations
  - 4.3 Experimental Results
  - 4.4 Conclusion
- CHAPTER 5 CONCLUSION AND FUTURE WORK
  - 5.1 Conclusion
  - 5.2 Future Work
- REFERENCES
- BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

DATA STRUCTURES FOR DYNAMIC ROUTER TABLE

By Haibin Lu

August 2003

Chair: Sartaj Sahni

Major Department: Computer and Information Science and Engineering

Internet routers use router tables to classify incoming packets based on the information carried in the packet headers. Packet classification is one of the network bottlenecks, especially when a high update rate becomes necessary.
Much of the research in the router-table area has focused on static prefix tables, where an update usually requires rebuilding the whole router table. Some router-table designs rely on the relatively short IPv4 addresses to achieve the desired efficiency; these designs scale poorly with the prefix length. We propose several schemes to represent one-dimensional dynamic range tables, that is, tables in which filters are specified as ranges and into/from which rules are inserted/deleted concurrently with packet classification. Our schemes allow real-time updates and at the same time provide efficient lookup. The lookup and update complexities of our schemes are logarithmic functions of the number of filters. The first scheme, PST, which is based on priority search trees, uses the most-specific-rule tie breaker. The second scheme is called BOB (Binary search tree On Binary search tree); it uses the highest-priority tie breaker. To exploit wide cache lines and reduce the tree height, a third scheme is developed in which the top-level tree is a B-tree. This scheme also uses the highest-priority tie breaker. All three schemes are suitable for prefix filters as well as for range filters in which no two filters have intersecting ranges. In addition, the PST can handle a conflict-free range set.

CHAPTER 1

INTRODUCTION AND RELATED WORK

1.1 Introduction

Today's Internet consists of thousands of packet networks interconnected by routers. When a host sends a packet into the Internet, the routers relay the packet towards its final destination. The routers exchange routing information with each other and use the information gathered to calculate the paths to all reachable destinations. Each packet is treated independently and forwarded to a next router based on its destination address. The data structure a router uses to determine the next hop is called the router table.
Each entry in the router table is a rule of the form (address prefix, next hop). Table 1-1 shows a set of five rules. We use W to denote the maximum possible length of a prefix. In IPv4, W = 32 and in IPv6, W = 128. In Table 1-1, W is 5. The prefix P1, which matches all destination addresses, is called the default prefix. The prefix P3 matches the destination addresses between 16 and 19. If the address prefix of a rule matches the destination address carried by the incoming packet, the next hop of this rule is used to forward the packet. Address prefixes were introduced by CIDR (Classless Interdomain Routing) to deal with address depletion and router-table explosion. A consequence of CIDR's address aggregation is that several rules may have prefixes that match the same destination address. For example, the rules P1, P3 and P4 in Table 1-1 match the destination address 19. In this case, a tie breaker is needed to select one of the matching rules. Most specific matching is usually used; namely, the longest prefix matching the destination address is the winner. For our example router table, P4 is the winner for destination address 19. The other two popular tie breakers are first matching and highest priority matching. For the first matching tie breaker, the rule table is assumed to be a linear list of rules, with the rules indexed 1 through n for an n-rule table. The first rule that matches the incoming packet is used. Notice that, with the rules indexed as in Table 1-1, the rule R1 is selected for every incoming packet since it matches all destination addresses. To give other rules a chance to become the winner, we must index the rules carefully, and the default prefix should be the last rule. In highest priority matching,
each rule is assigned a priority, and the rule with the highest priority is selected from those matching the incoming packet.¹ Notice that the first matching tie breaker is a special case of the highest priority matching tie breaker (simply assign each rule a priority equal to the negative of its index in the linear list).

¹ We may assume either that all priorities are distinct or that selection among rules that have the same priority may be done in an arbitrary fashion.

Table 1-1: A router table with five rules (W = 5)

| Rule Name | Prefix Name | Prefix | Next Hop | Range Start | Range Finish |
|---|---|---|---|---|---|
| R1 | P1 | * | N1 | 0 | 31 |
| R2 | P2 | 0101* | N2 | 10 | 11 |
| R3 | P3 | 100* | N3 | 16 | 19 |
| R4 | P4 | 1001* | N4 | 18 | 19 |
| R5 | P5 | 10111 | N5 | 23 | 23 |

The query based on the destination address is usually called address lookup or packet forwarding. In general, other fields such as the source address and port numbers may also be used, and the router table consists of rules of the form (F, A), where F is a filter and A is an action. The action component of a rule specifies what is to be done when a packet that satisfies the rule filter is received. Sample actions are: drop the packet, forward the packet along a certain output link, and reserve a specified amount of bandwidth. Tie breakers similar to those mentioned earlier are used to select a rule from the set of rules that match the incoming packet. We call this problem packet classification.

1.1.1 Static Router Table

In a static rule table, the rule set does not vary in time. For these tables, we are concerned primarily with the following metrics:

1. Time required to process an incoming packet. This is the time required to search the rule table for the rule to use. We refer to this operation as a lookup.
2. Preprocessing time. This is the time to create the rule-table data structure.
3. Storage requirement. That is, how much memory is required by the rule-table data structure?

To handle updates, static schemes usually use two copies of the router table: working and shadow.
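The three tie breakers introduced with Table 1-1 can be made concrete with a small sketch. This is an illustration only, not code from the dissertation: it scans the rules linearly, and the names `RULES`, `matches`, `longest_prefix_match`, `first_match`, and `highest_priority_match` are hypothetical. Prefixes are written as bit strings with a trailing `*`, as in the table.

```python
W = 5  # maximum prefix length used in Table 1-1

# (rule name, prefix, next hop); "*" alone is the default prefix.
RULES = [
    ("R1", "*", "N1"),
    ("R2", "0101*", "N2"),
    ("R3", "100*", "N3"),
    ("R4", "1001*", "N4"),
    ("R5", "10111", "N5"),
]


def matches(prefix, d):
    """True if the W-bit destination address d begins with the prefix bits."""
    bits = prefix.rstrip("*")
    return f"{d:0{W}b}".startswith(bits)


def longest_prefix_match(rules, d):
    """Most specific matching: the longest matching prefix wins."""
    hits = [r for r in rules if matches(r[1], d)]
    return max(hits, key=lambda r: len(r[1].rstrip("*")), default=None)


def first_match(rules, d):
    """First matching: the earliest matching rule in list order wins."""
    for r in rules:
        if matches(r[1], d):
            return r
    return None


def highest_priority_match(rules, priority, d):
    """Highest priority matching; priority maps a rule name to a number."""
    hits = [r for r in rules if matches(r[1], d)]
    return max(hits, key=lambda r: priority[r[0]], default=None)


# First matching as a special case of highest priority matching:
# give each rule a priority equal to the negative of its index.
NEG_INDEX = {name: -(i + 1) for i, (name, _, _) in enumerate(RULES)}
```

For destination address 19, `longest_prefix_match` selects R4, while `first_match` and `highest_priority_match` with `NEG_INDEX` both select R1, which is why the default prefix should be indexed last under first matching.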
Lookups are done using the working table. Updates are performed in the background (either in real time on the shadow table or by batching updates and reconstructing an updated shadow at suitable intervals); periodically, the shadow replaces the working table, and the caches of the working table are flushed. In this mode of update operation, many packets may be misclassified, because the working table isn't immediately updated. The number of misclassified packets depends on the periodicity with which the working table can be replaced by an updated shadow. Further, additional memory is required for the shadow table and for periodic reconstruction of the working table. A short preprocessing time is therefore important, because it reduces the number of misclassified packets.

1.1.2 Dynamic Router Table

In practice, rule tables are seldom truly static. At best, rules may be added to or deleted from the rule table infrequently. Typically, in a "static" rule table, inserts/deletes are batched and the router-table data structure is reconstructed as needed. In a dynamic rule table, rules are added/deleted with some frequency. For such tables, inserts/deletes are not batched. Rather, they are performed in real time.

We believe that dynamic structures for router tables are becoming a necessity. First, updates occur frequently in the backbone area. Labovitz et al. [1] found that the update rate could reach as high as 1000 per second. These updates stem from route failure, route repair and route fail-over. With the number of autonomous systems continuously increasing, it is reasonable to expect the update rate to rise. The router table needs to be updated in order to reflect each route change. Second, fast processing of updates is preferred because, during batching and reconstruction, end-to-end delay increases, packet loss rises dramatically, and part of the network may experience connectivity loss. Labovitz et al.
[2] observed dramatically increased packet loss and end-to-end latency during BGP routing changes. Batching and expensive reconstruction make things worse. While BGP takes time to converge, route-repair events usually do not cause multiple announcements, and the latency for the router table to become stable after these events should depend only on the network delay and the router processing delay along the path [2]. In addition, as the BGP convergence time gets reduced, the processing delay may dominate. Pei et al. [3] reduce the convergence time from 30.3 seconds to 0.3 seconds for a failure withdrawal in the testbed by applying two consistency assertions to BGP. Macian et al. [4] emphasize the importance of supporting a high update rate. Dynamic router tables that permit high-speed inserts and deletes are essential in QoS and VAS applications [4]. For example, edge routers that do stateful filtering require high-speed updates [5].

For dynamic router tables, we are concerned additionally with the time required to insert/delete a rule. For a dynamic rule table, the initial rule-table data structure is constructed by starting with an empty data structure and then inserting the initial set of rules into the data structure one by one. So, typically, in the case of dynamic tables, the preprocessing metric mentioned above is very closely related to the insert time. For a dynamic router table, the following metrics are measured to compare performance:

1. Lookup time.
2. Insertion time. This is the time required to insert a new rule into the rule table.
3. Deletion time. This is the time required to delete a rule from the rule table.
4. Storage requirement.

Note that there is only a working table for dynamic schemes, and updates are made directly to the working table in real time. In this mode of update, no packet is improperly classified. However, packet classification/forwarding may be delayed until a preceding update completes.
To minimize this delay, updates must be done as fast as possible.

Another important metric for both static and dynamic router tables is scalability to IPv6. IPv6, the next generation of IP, uses 128-bit addresses (W = 128). Although some of the schemes in Section 1.2 work well for IPv4 (W = 32), they scale poorly with the prefix length.

1.2 Related Work

Data structures for rule tables in which each filter is an address prefix and the rule priority is the length of this prefix² have been intensely researched in recent years. We refer to rule tables of this type as longest-matching prefix-tables (LMPT). We refer to rule tables in which the filters are ranges and in which the highest-priority matching filter is used as highest-priority range-tables (HPRT). When the filters of no two rules of an HPRT intersect, the HPRT is a nonintersecting HPRT (NHPRT). Although every LMPT is also an NHPRT, an NHPRT may not be an LMPT. Ruiz-Sanchez et al. [6] review data structures for static LMPTs and Sahni et al. [7] review data structures for both static and dynamic LMPTs.

² For example, the filter 10* matches all destination addresses that begin with the bit sequence 10; the length of this prefix is 2.

1.2.1 Trie

Several trie-based data structures for LMPTs have been proposed [8, 9, 10, 11, 12, 13, 14]. Structures such as that of Doeringer et al. [10] use the path-compression technique, so the memory requirement is O(n). Because the structure is biased toward successful searches, the search is guided by the input key and inspects only the bit position stored at each internal node. When the search reaches a leaf node and does not succeed, the downward path may have to be backtracked to find the longest matching prefix. Hence the search can be carried out in O(W) time. The update operations, insert and delete, are natural in the trie structure and can also be performed in O(W) time. The number of memory accesses during these operations is O(W). For IPv6 (W = 128), O(W) memory accesses are quite expensive. Moreover, path compression reduces the height of the trie only if the prefixes are scattered sparsely inside the trie. When the number of prefixes increases, many branch nodes are needed and path compression does not have many nodes to compress. Ruiz-Sanchez et al. [6] observe that the height of the BSD version of the path-compressed trie is 26 for an IPv4 router table with 47,113 prefixes, whereas the height of a simple binary trie is only 30.

To reduce the trie height, Gupta et al. [15] use the DIR-24-8 scheme, which fully expands the binary trie at depth 24; that is, every prefix of length less than or equal to 24 is expanded into as many 24-bit prefixes as needed, and a table with 2^24 entries is used to store these expanded prefixes. Prefixes longer than 24 bits are stored in a second table. The correspondence is established by storing pointers in the first table that point to the proper entries in the second table. The first table has 2^24 entries, and each entry is 16 bits (32M bytes in total). The first bit of each entry indicates whether the next 15 bits store the next hop or a pointer into the second table. With more than 32M bytes of memory, the scheme can perform a search in at most two memory accesses. But it is not scalable to IPv6, because expanding to 24 bits already takes too much memory. Gupta et al. [15] also propose alternatives that use less memory but require more memory accesses.

Degermark et al. [9] use a similar prefix-expansion technique at multiple depths. Bitmap compression is deployed to reduce the memory requirement greatly. A router table with 40,000 rules can fit into 160K bytes. In the worst case, the number of memory accesses is nine. Huang et al. [16] fully expand the binary trie at depth 16 and also expand the subtries rooted at the nodes at depth 16 to their own depths.
Bitmap compression is also applied to reduce the memory requirement. The router tables used in the experiment can be compacted into less than 500K bytes. The worst-case number of memory accesses is three. Both schemes [9, 16] depend heavily on the prefix distribution, so it is hard to decide on a proper memory size ahead of time. For example, in the extreme case in which all n prefixes in the router table have length 32 and their first 16 bits are distinct (assume n <= 2^16), the scheme of Huang et al. [16] needs at least 2^14 n bytes.

Nilsson et al. [11] apply level compression as well as path compression to the binary trie. A binary trie is path-compressed first; then level compression is used to reduce the height of the trie further by substituting the k highest levels of the binary trie with a single degree-2^k node. Although the search complexity of the LC (level-compressed) trie is still O(W), the height of the LC-trie is around 8 for the router tables used in the authors' analysis.

These data structures [9, 11, 15], as well as that of Srinivasan et al. [12], attempt to optimize lookup time through an expensive preprocessing step. While providing very fast lookup, they have a prohibitive insert/delete time, so they are suitable only for static router-tables (i.e., tables into/from which no inserts and deletes take place).

Sahni et al. [13, 14] provide efficient constructions for fixed-stride and variable-stride multibit tries. The lookup time and memory requirement are optimized through expensive preprocessing. Aiming to improve update speed for fixed-stride multibit tries on a pipelined ASIC architecture, Basu et al. [17] describe an algorithm to optimize and balance the memory requirement across the pipeline stages.

1.2.2 Sets of Equal-Length Prefixes

Waldvogel et al. [18] have proposed a scheme that performs a binary search on hash tables organized by prefix length.
In order to support binary search, O(log W) markers are generated for each prefix, and the longest matching prefix is precomputed for each marker. This binary search scheme has an expected complexity of O(log W) for lookup. The memory requirement is bounded by O(n log W). By introducing a technique called marker partitioning in the full version of Waldvogel et al. [18], the scheme achieves O(α (nW)^(1/α) log W) insert/delete time at the cost of an increased search time of O(α + log W), for α > 1.

1.2.3 End-Point Array

An alternative adaptation of binary search to longest-prefix matching is developed in [19]. The distinct end points (start points and finish points) of the ranges defined by the prefixes are stored in ascending order in an array. The end points divide the universe into O(n) basic intervals. The longest matching prefix, LMP(d), is precomputed for each interval as well as for each end point. LMP(d) is then found by performing a binary search on this ordered array. A lookup in a table that has n prefixes takes O(log n) time. Because the schemes of [19] use expensive precomputation, they are not suited for dynamic router-tables.

1.2.4 Multiway Range Tree

Suri et al. [20] have proposed a B-tree data structure for dynamic LMPTs. Using their structure, we may find the longest matching prefix, LMP(d), in O(log_m n) time, where m is the order of the B-tree. However, inserts/deletes take O(W log_m n) time. When W bits fit in O(1) words (as is the case for IPv4 and IPv6 prefixes), logical operations on W-bit vectors can be done in O(1) time each. In this case, the scheme of Suri et al. [20] takes O(m log_2 W log n) time for an insertion and O(m log_m n + W) time for a deletion. Assuming one node fits into O(1) cache lines, the number of memory accesses that occur when the data structure of Suri et al. [20] is used is O(log_m n) per search and O(m log_m n) per update.

1.2.5 O(log n) Dynamic Solutions

Sahni et al.
[21, 22] develop data structures, called a collection of red-black trees (CRBT) and an alternative collection of red-black trees (ACRBT), that support the three operations of a dynamic LMPT in O(log n) time each. The number of cache misses is also O(log n). Sahni et al. [22] show that their ACRBT structure is easily modified to extend the biased-skip-list structure of Ergun et al. [23] so as to obtain a biased-skip-list structure for dynamic LMPTs. Using this modified biased-skip-list structure, lookup, insert, and delete can each be done in O(log n) expected time and O(log n) expected cache misses. Like the original biased-skip-list structure of Ergun et al. [23], the modified structure of Sahni et al. [22] adapts so as to perform lookups faster for bursty access patterns than for non-bursty patterns. The ACRBT structure may also be adapted to obtain a collection-of-splay-trees structure [22], which performs the three dynamic LMPT operations in O(log n) amortized time and which adapts to provide faster lookups for bursty traffic.

1.2.6 Highest-Priority Prefix Table

When an HPPT (highest-priority prefix-table) is represented as a binary trie [24], each of the three dynamic HPPT operations takes O(W) time and cache misses. Gupta et al. [25] have developed two data structures for dynamic HPRTs: heap on trie (HOT) and binary search tree on trie (BOT). The HOT structure takes O(W) time for a lookup and O(W log n) time for an insert or delete. The BOT structure takes O(W log n) time for a lookup and O(W) time for an insert/delete. The number of cache misses in a HOT and BOT is asymptotically the same as the time complexity of the corresponding operation.

1.2.7 TCAM

Ternary content-addressable memories, TCAMs, use parallelism to achieve O(1) lookup [26]. Each memory cell of a TCAM may be set to one of three states: 0, 1, and don't care. The prefixes of a router table are stored in a TCAM in descending order of prefix length. Assume that each word of the TCAM has 32 cells.
The prefix 10* is stored in a TCAM word as 10??...?, where ? denotes a don't care and there are 30 ?s in the given sequence. To do a longest-prefix match, the destination address is matched, in parallel, against every TCAM entry. With the entries kept as a sorted-by-length linear list, the longest matching prefix can be determined in O(1) time. A prefix may be inserted or deleted in O(W) time, where W is the length of the longest prefix [27]. Although TCAMs provide a simple and efficient solution for static and dynamic router tables, this solution requires special hardware, costs more, and uses more power and board space than solutions that employ SDRAMs. TCAMs also have longer latency than SDRAMs. Since a TCAM requires an arbitration module to choose the longest matching prefix, and a more complex arbitration module is needed for a larger router table, the latency of a TCAM increases with router-table size. EZchip Technologies, for example, claim that classifiers can forgo TCAMs in favor of commodity memory solutions [5, 28]. Algorithmic approaches that have lower power consumption and are conservative on board space, at the price of slightly increased search latency, are sought. "System vendors are willing to accept some latency in their searches if it means lowering the power of a line card" [28].

1.2.8 Others

Cheung et al. [29] developed a model for table-driven route lookup and cast the table design problem as an optimization problem within this model. Their model accounts for the memory hierarchy of modern computers, and they optimize average performance rather than worst-case performance. Solutions that involve modifications to the Internet Protocol (i.e., the addition of information to each packet) have also been proposed [30, 31, 32].

1.3 Contribution

We have developed data structures for dynamic router tables. The data structures use O(n) space, except that RIBT uses O(n log_m n) space. Our first data structure, PST [33, 34], uses the most specific matching tie breaker.
It permits one to search, insert, and delete in O(log n) time each. Although O(log n) time data structures for prefix tables were known prior to our work [21, 22], the PST is more memory efficient than the data structures of [21, 22]. Further, the PST is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, PSTs are the first to permit O(log n) search, insert, and delete.

The second data structure, BOB [35], works for highest-priority matching with nonintersecting ranges. The highest-priority rule that matches a destination address may be found in O(log^2 n) time; a new rule may be inserted and an old one deleted in O(log n) time. For the case when all rule filters are prefixes, the data structure PBOB (prefix BOB) permits highest-priority matching as well as rule insertion and deletion in O(W) time each. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in O(log n) time and with O(log n) cache misses. PBOB can also support the dynamic-table operations in O(log n) time and with O(log n) cache misses for nonintersecting ranges when the number of nesting levels is a constant.

To exploit wide cache lines (e.g., a 64-byte cache line), we propose B-tree data structures for dynamic router-tables for the cases when the filters are prefixes as well as when they are non-intersecting ranges. A crucial difference between our data structure for prefix filters and the B-tree router-table data structure of Suri et al. [20] is that in our data structure, each prefix is stored in O(1) B-tree nodes per B-tree level, whereas in the structure of Suri et al. [20], each prefix is stored in O(m) nodes per level (m is the order of the B-tree).
As a result of this difference, a prefix may be inserted or deleted from an n-filter router table accessing only $O(\log_m n)$ nodes of our data structure; these operations access $O(m \log_m n)$ nodes using the structure of Suri et al. [20]. Even though the asymptotic complexity of prefix insertion and deletion is the same in both B-tree structures, experiments conducted by us show that, because of the reduced cache misses for our structure, the measured average insert and delete times using our structure are about 30% less than when the B-tree structure of Suri et al. [20] is used. Further, an update operation using the B-tree structure of Suri et al. [20] will, in the worst case, make 2.5 times as many cache misses as are made when our structure is used. The asymptotic complexity to find the longest matching prefix is the same, $O(m \log_m n)$, in both B-tree structures, and in both structures this operation accesses $O(\log_m n)$ nodes. The measured time for this operation is also nearly the same for both data structures. Both B-tree structures for prefix router tables take O(n) memory. However, our structure is more memory efficient by a constant factor.

For the case of nonintersecting ranges, the highest-priority range that matches a given destination address may be found in $O(m \log_m n)$ time using our proposed B-tree data structure. The time to insert and delete a range is $O((m + D) \log_m n)$, where D is the maximum nesting depth of the ranges. Our data structure for nonintersecting ranges requires $O(n \log_m n)$ memory, and $O(\log_m n)$ nodes are accessed during an operation.

With O(log n) operation time, our data structures scale well to large router tables. Since the complexity is independent of the prefix length, our data structures are also scalable to IPv6.
Another important feature of our data structures is that nonintersecting ranges are supported naturally, whereas most existing data structures support ranges (necessary when the filters are defined for port numbers) by breaking one range into O(W) prefixes, which results in an O(W log n) memory requirement. Supporting ranges is also a nice feature for network layer addresses. The size of the range that a prefix covers must be a power of two, and the range must start at a number that is a multiple of the range size. The end points and the size of a general range, however, can be any numbers. Supporting ranges means one can allocate a range of arbitrary size to a network (AppleTalk supports this feature), and range aggregation is potentially better than prefix aggregation. For example, two disjoint prefixes can aggregate into one prefix only if their ranges are adjacent to each other and they have the same length, whereas two disjoint ranges can aggregate into one range as long as they are next to each other. So, range aggregation is expected to result in router tables that have fewer rules.

CHAPTER 2
O(log n) DYNAMIC ROUTER TABLE FOR PREFIXES AND RANGES

In this chapter, we show in Section 2.2 how priority-search trees may be used to represent dynamic prefix router tables. The resulting structure, which is conceptually simpler than the CRBT structure of Sahni et al. [21], permits lookup, insert, and delete in O(log n) time each. For range router tables, we consider the case when the best matching range is the most-specific matching range (this is the range analog of the longest matching prefix). In Section 2.3, we show that dynamic range router tables that employ most-specific range matching and in which no two ranges overlap may be efficiently represented using two priority-search trees. Using this two-priority-search-tree representation, lookup, insert, and delete can be done in O(log n) time each. The general case of non-conflicting ranges is considered in Section 2.4.
In this section, we augment the data structure of Section 2.3 with several red-black trees to obtain a range-router-table representation for non-conflicting ranges that permits lookup, insert, and delete in O(log n) time each. Section 2.1 introduces the terminology we use. In this section, we also develop the mathematical foundation that forms the basis of our data structures. Experimental results are reported in Section 2.5.

2.1 Preliminaries

2.1.1 Prefixes and Longest-Prefix Matching

The prefix 1101* matches all destination addresses that begin with 1101, and 10010* matches all destination addresses that begin with 10010. For example, when W = 5, 1101* matches the addresses {11010, 11011} = {26, 27}, and when W = 6, 1101* matches {110100, 110101, 110110, 110111} = {52, 53, 54, 55}. Suppose that a router table includes the prefixes P1 = 101*, P2 = 10010*, P3 = 01*, P4 = 1*, and P5 = 1010*. The destination address d = 1010100 is matched by the prefixes P1, P4, and P5. Since |P1| = 3 (the length of a prefix is the number of bits in the prefix), |P4| = 1, and |P5| = 4, P5 is the longest prefix that matches d. In longest-prefix routing, the next hop for a packet destined for d is given by the longest prefix that matches d.

[Figure 2-1: Relationships between pairs of ranges. (A) Two ranges are disjoint. (B) Two ranges are nested. (C) Two ranges intersect.]

2.1.2 Ranges and Projections

Definition 1 A range r = [u, v] is a pair of addresses u and v, $u \le v$. The range r represents the addresses {u, u + 1, ..., v}. start(r) = u is the start point of the range and finish(r) = v is the finish point of the range. The range r covers or matches all addresses d such that $u \le d \le v$. range(q) is a predicate that is true iff q is a range.

The start point of the range r = [3, 9] is 3 and its finish point is 9. This range covers or matches the addresses {3, 4, 5, 6, 7, 8, 9}.
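The two ideas introduced so far, a prefix viewed as a contiguous set of addresses and longest-prefix matching, can be checked with a few lines of Python. This is only an illustrative sketch: the function names `prefix_to_range` and `longest_prefix_match` are hypothetical, addresses are written as bit strings, and the linear scan has none of the efficiency of the data structures developed later in this chapter.

```python
from typing import Optional


def prefix_to_range(prefix: str, width: int) -> tuple[int, int]:
    """Return (start, finish) of the addresses matched by a prefix such as '1101*'.

    `width` plays the role of W, the address length in bits.
    """
    bits = prefix.rstrip("*")
    if len(bits) > width:
        raise ValueError("prefix is longer than the address width")
    free = width - len(bits)                 # number of don't-care bits
    start = (int(bits, 2) << free) if bits else 0
    finish = start + (1 << free) - 1         # all don't-care bits set to 1
    return start, finish


def longest_prefix_match(table: dict[str, str], address: str) -> Optional[str]:
    """Return the name of the longest prefix in `table` that matches `address`.

    Linear scan, used here only to illustrate the definition.
    """
    best_name, best_len = None, -1
    for name, prefix in table.items():
        bits = prefix.rstrip("*")
        if address.startswith(bits) and len(bits) > best_len:
            best_name, best_len = name, len(bits)
    return best_name


table = {"P1": "101*", "P2": "10010*", "P3": "01*", "P4": "1*", "P5": "1010*"}
print(prefix_to_range("1101*", 5))              # addresses 26..27
print(prefix_to_range("1101*", 6))              # addresses 52..55
print(longest_prefix_match(table, "1010100"))   # P5
```

The printed values agree with the examples in the text: 1101* covers {26, 27} for W = 5 and {52, ..., 55} for W = 6, and P5 is the longest prefix matching d = 1010100.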
In IPv4, u and v are up to 32 bits long, and in IPv6, u and v may be up to 128 bits long. The IPv4 prefix P = 0* corresponds to the range $[0, 2^{31} - 1]$. The range [3, 9] does not correspond to any single IPv4 prefix. We may draw the range r = [u, v] = {u, u + 1, ..., v} as a horizontal line that begins at u and ends at v. Figure 2-1 shows ranges drawn in this fashion.

Notice that every prefix of a prefix router table may be represented as a range. For example, when W = 6, the prefix P = 1101* matches addresses in the range [52, 55]. So, we say P = 1101* = [52, 55], start(P) = 52, and finish(P) = 55.

Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example, $[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range. For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$ is not a range.

Definition 2 Let r = [u, v] and s = [x, y] be two ranges. Let $\mathrm{overlap}(r, s) = r \cap s$.

(a) The predicate disjoint(r, s) is true iff r and s are disjoint.

$\mathrm{disjoint}(r, s) \Leftrightarrow \mathrm{overlap}(r, s) = \emptyset \Leftrightarrow v < x \vee y < u$

Figure 2-1(A) shows the two cases for disjoint sets.

(b) The predicate nested(r, s) is true iff one of the ranges is contained within the other.

$\mathrm{nested}(r, s) \Leftrightarrow \mathrm{overlap}(r, s) = r \vee \mathrm{overlap}(r, s) = s \Leftrightarrow r \subseteq s \vee s \subseteq r \Leftrightarrow x \le u \le v \le y \vee u \le x \le y \le v$
Figure 2-1(B) shows the two cases for nested sets.

(c) The predicate intersect(r, s) is true iff r and s have a nonempty intersection that is different from both r and s.

$\mathrm{intersect}(r, s) \Leftrightarrow r \cap s \neq \emptyset \wedge r \cap s \neq r \wedge r \cap s \neq s \Leftrightarrow \neg\mathrm{disjoint}(r, s) \wedge \neg\mathrm{nested}(r, s) \Leftrightarrow u < x \le v < y \vee x < u \le y < v$

Figure 2-1(C) shows the two cases for ranges that intersect. Notice that overlap(r, s) = [x, v] when $u < x \le v < y$ and overlap(r, s) = [u, y] when $x < u \le y < v$.

[2, 4] and [6, 9] are disjoint; [2, 4] and [3, 4] are nested; [2, 4] and [2, 2] are nested; [2, 8] and [4, 6] are nested; [2, 4] and [4, 6] intersect; and [3, 8] and [2, 4] intersect. [4, 4] is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4].

Lemma 1 Let r and s be two ranges. Exactly one of the following is true.
1. disjoint(r, s)
2. nested(r, s)
3. intersect(r, s)

Proof Straightforward. ∎

Definition 3 Let $R = \{r_1, \ldots, r_n\}$ be a set of n ranges. The projection, $\Pi(R)$, of R is $\Pi(R) = \bigcup_{1 \le i \le n} r_i$.
That is, $\Pi(R)$ comprises all addresses that are covered by at least one range of R. For A = {[2, 5], [3, 6], [8, 9]}, $\Pi(A)$ = {2, 3, 4, 5, 6, 8, 9}, and for B = {[4, 8], [7, 9]}, $\Pi(B)$ = {4, 5, 6, 7, 8, 9}. $\Pi(A)$ is not a range. However, $\Pi(B)$ is the range [4, 9]. Note that $\Pi(R)$ is a range iff $d \in \Pi(R)$ for every d, $u \le d \le v$, where $u = \min\{d \mid d \in \Pi(R)\}$ and $v = \max\{d \mid d \in \Pi(R)\}$.

Lemma 2 Let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of n ranges such that $\Pi(R) = [u, v]$.
(a) $u = \mathrm{minStart}(R) = \min\{\mathrm{start}(r_i)\}$ and $v = \mathrm{maxFinish}(R) = \max\{\mathrm{finish}(r_i)\}$.
(b) Let s be a range. $\Pi(R \cup \{s\})$ is a range iff $\mathrm{start}(s) \le v + 1$ and $\mathrm{finish}(s) \ge u - 1$.
(c) When $\Pi(R \cup \{s\}) = [x, y]$, $x = \min\{u, \mathrm{start}(s)\}$ and $y = \max\{v, \mathrm{finish}(s)\}$.

Proof (a) is straightforward. Figure 2-2 shows all possible cases for which $\Pi(R \cup \{s\})$ is a range; s is shown as a solid line. (b) and (c) are readily verified for each case of Figure 2-2. ∎

[Figure 2-2: Cases for Lemma 2]

2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges

Definition 4 The range r is more specific than the range s iff $r \subset s$.

[2, 4] is more specific than [1, 6], and [5, 9] is more specific than [5, 12]. Since [2, 4] and [8, 14] are disjoint, neither is more specific than the other. Also, since [4, 14] and [6, 20] intersect, neither is more specific than the other.

Definition 5 Let R be a range set. ranges(d, R) (or simply ranges(d) when R is implicit) is the subset of ranges of R that match/cover the destination address d. msr(d, R) (or msr(d)) is the most specific range of R that matches d. That is, msr(d) is the most specific range in ranges(d). msr([u, v], R) = msr(u, v, R) = r iff msr(d, R) = r, $u \le d \le v$. When R is implicit, we write msr(u, v) and msr([u, v]) in place of msr(u, v, R) and msr([u, v], R). In most-specific-range routing, the next hop for packets destined for d is given by the next-hop information associated with msr(d).
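The definitions of disjoint, nested, and intersecting ranges, and of ranges(d) and msr(d), translate directly into code. The following is a minimal sketch, assuming ranges are stored as `(start, finish)` tuples; the helper names `relation`, `ranges_matching`, and `msr` are illustrative and are not taken from the dissertation. The lookup is a linear scan, intended only to make the definitions concrete.

```python
from typing import Optional

Range = tuple[int, int]  # (start, finish), with start <= finish


def relation(r: Range, s: Range) -> str:
    """Classify a pair of ranges; exactly one of the three cases applies (Lemma 1)."""
    (u, v), (x, y) = r, s
    if v < x or y < u:
        return "disjoint"
    if (x <= u and v <= y) or (u <= x and y <= v):
        return "nested"
    return "intersect"


def ranges_matching(d: int, R: list[Range]) -> list[Range]:
    """ranges(d, R): every range of R that covers the address d."""
    return [r for r in R if r[0] <= d <= r[1]]


def msr(d: int, R: list[Range]) -> Optional[Range]:
    """msr(d, R): the matching range contained in every other matching range.

    Returns None when no range matches d, or when the matching ranges
    have no most-specific member.
    """
    matching = ranges_matching(d, R)
    for r in matching:
        if all(s[0] <= r[0] and r[1] <= s[1] for s in matching):
            return r
    return None


R1 = [(2, 4), (1, 6)]
R2 = [(4, 14), (6, 20), (6, 14), (8, 12)]
print(relation((2, 4), (4, 6)))   # intersect
print(msr(3, R1))                 # (2, 4)
print(msr(13, R2))                # (6, 14)
```

The sample sets R1 and R2 are the ones used as examples in the surrounding text.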
When R = {[2, 4], [1, 6]}, ranges(3) = {[2, 4], [1, 6]}, msr(3) = [2, 4], msr(1) = [1, 6], msr(7) = $\emptyset$, and msr(5, 6) = [1, 6]. When R = {[4, 14], [6, 20], [6, 14], [8, 12]}, msr(4, 5) = [4, 14], msr(6, 7) = [6, 14], msr(8, 12) = [8, 12], msr(13, 14) = [6, 14], and msr(15, 20) = [6, 20].

Definition 6 The range set R has a conflict iff there exists a destination address d for which $\mathrm{ranges}(d) \neq \emptyset \wedge \mathrm{msr}(d) = \emptyset$. R is conflict free iff it has no conflict. The predicate conflictFree(R) is true iff R is a conflict-free range set.

conflictFree({[2, 8], [4, 12], [4, 8]}) is true while conflictFree({[2, 8], [4, 12]}) is false. We note that our definition of conflict free is a natural extension to ranges of the definition of conflict free given by Hari et al. [36] for the case of two-dimensional prefix rules.

Definition 7 Let r and s be two intersecting ranges of the range set R. The subset $Q \subseteq R$ is a resolving subset for these two ranges iff Q is conflict free and $\Pi(Q) = \mathrm{overlap}(r, s)$. Two ranges of a range set are in conflict iff they intersect and have no resolving subset. Two ranges are conflict free iff they are not in conflict.

Lemma 3 A range set is conflict free iff it has no pair of ranges that are in conflict.

Proof Follows from the definition of a conflict-free range set. ∎

Lemma 4 Let R be a conflict-free range set. Let r be an arbitrary range. Let A be the subset of R that comprises all ranges of R that are contained in r. A is conflict free.

Proof Since R is conflict free, every pair (s, t) of intersecting ranges in A has a resolving subset B in R. From Definition 7, it follows that every range in B is contained in overlap(s, t). Hence, $B \subseteq A$. Therefore, every pair of intersecting ranges of A has a resolving subset in A. So, A is conflict free. ∎

Lemma 5 Let R be a conflict-free range set. Let A, $A \subseteq R$, be such that $\Pi(A) = r = [u, v]$.
1. $\exists B \subseteq R\,[\mathrm{conflictFree}(A \cup B) \wedge \Pi(A) = \Pi(A \cup B)]$
2. Let $s \in R$ be such that intersect(r, s). $\exists B \subseteq R\,[\Pi(B) = \mathrm{overlap}(r, s)]$
3. $R \cup \{r\}$ is conflict free.
Proof 1. Follows from Lemma 4.
2. When $r \in R$, (2) follows from the definition of a conflict-free range set. So, assume $r \notin R$. Let C comprise all ranges of A contained in s. If s intersects no range of A, $\Pi(C) = \mathrm{overlap}(r, s)$. If s intersects at least one range of A, then let $t \in A$ be an intersecting range with maximum overlap. Since R is conflict free, $\exists D \subseteq R\,[\Pi(D) = \mathrm{overlap}(t, s)]$. We see that $\Pi(C \cup D) = \mathrm{overlap}(r, s)$.
3. From parts (1) and (2) of this lemma, it follows that there is a resolving subset in $R \cup \{r\}$ for every $s \in R$ that intersects with r. Hence, $R \cup \{r\}$ is conflict free. ∎

Definition 8 $\mathrm{maxP}(u, v, R) = \max\{\mathrm{finish}(\Pi(A)) \mid A \subseteq R \wedge \mathrm{range}(\Pi(A)) \wedge \mathrm{start}(\Pi(A)) = u \wedge \mathrm{finish}(\Pi(A)) \le v\}$ is the maximum possible projection that is a range that starts at u and finishes by v. $\mathrm{minP}(u, v, R) = \min\{\mathrm{start}(\Pi(A)) \mid A \subseteq R \wedge \mathrm{range}(\Pi(A)) \wedge \mathrm{finish}(\Pi(A)) = v \wedge \mathrm{start}(\Pi(A)) \ge u\}$ is the minimum possible projection that is a range that finishes at v and starts by u. When $\nexists A \subseteq R\,[\mathrm{range}(\Pi(A)) \wedge \mathrm{start}(\Pi(A)) = u \wedge \mathrm{finish}(\Pi(A)) \le v]$, we say that maxP(u, v, R) does not exist. Similarly, minP(u, v, R) may not exist. At times, we use maxP and minP as abbreviations for maxP(u, v, R) and minP(u, v, R), respectively. We also define $\mathrm{maxY}(u, v, R) = \max\{y \mid [x, y] \in R \wedge x < u \le y < v\}$ and $\mathrm{minX}(u, v, R) = \min\{x \mid [x, y] \in R \wedge u < x \le v < y\}$. Note that maxY and minX may not exist.

Lemma 6 Let R be a conflict-free range set. Let $A = R \cup \{r\}$, where $r = [u, v] \notin R$.

$\mathrm{conflictFree}(A) \Leftrightarrow \mathrm{maxY}(u, v, R) \le \mathrm{maxP}(u, v, R) \wedge \mathrm{minX}(u, v, R) \ge \mathrm{minP}(u, v, R)$

where $\mathrm{maxY} \le \mathrm{maxP}$ ($\mathrm{minX} \ge \mathrm{minP}$) is true whenever maxY (minX) does not exist and is false when maxY (minX) exists but maxP (minP) does not.

Proof ($\Rightarrow$) Assume that A is conflict free. When neither maxY nor minX exists (this happens iff no range of R intersects r = [u, v]), $\mathrm{maxY} \le \mathrm{maxP} \wedge \mathrm{minX} \ge \mathrm{minP}$. When maxY exists, $\exists s = [x, \mathrm{maxY}] \in A \wedge x < u \le \mathrm{maxY} < v$. (Note that intersect(r, s).) Since A is conflict free, A has a (resolving) subset B for which $\Pi(B) = \mathrm{overlap}(r, s) = [u, \mathrm{maxY}]$.
Therefore, $\mathrm{maxY} \le \mathrm{maxP}$. Similarly, when minX exists, $\mathrm{minX} \ge \mathrm{minP}$.

($\Leftarrow$) Assume $\mathrm{maxY}(u, v, R) \le \mathrm{maxP}(u, v, R) \wedge \mathrm{minX}(u, v, R) \ge \mathrm{minP}(u, v, R)$. When neither maxY nor minX exists, no range of R intersects r. When maxY exists, $\exists s = [x, y] \in A\,[x < u \le y < v]$. Consider any such s = [x, y]. Since $\mathrm{maxY} \le \mathrm{maxP}$ and maxY exists, maxP exists. Hence, $\exists B \subseteq R\,[\mathrm{conflictFree}(B) \wedge \Pi(B) = [u, \mathrm{maxP}]]$. When y = maxP, B is a resolving subset for s and r in A. When y < maxP, intersect(s, [u, maxP]). Since $R \cup \{[u, \mathrm{maxP}]\}$ is conflict free (Lemma 5(3)), $R \cup \{[u, \mathrm{maxP}]\}$ (and so also R and A) has a resolving subset for s and [u, maxP]. This resolving subset is also a resolving subset for s and r. When minX exists, $\exists s = [x, y] \in A\,[u < x \le v < y]$. In a manner analogous to the proof for the case in which maxY exists, we may show that A has a resolving subset for r and each such s. Hence, in all cases, intersecting ranges of A have a resolving subset. So, A is conflict free. ∎

Lemma 7 Let R be a conflict-free range set. Let $A = R - \{r\}$ for some $r \in R$.

$\nexists B \subseteq A\,[\Pi(B) = r] \wedge \nexists s \in A\,[r \subset s] \Rightarrow \nexists C \subseteq A\,[r \subseteq \Pi(C)]$

Proof Assume

$\nexists B \subseteq A\,[\Pi(B) = r]$ (2.1)

and

$\nexists s \in A\,[r \subset s]$ (2.2)

We need to show that $\nexists C \subseteq A\,[r \subseteq \Pi(C)]$. Suppose that there is a C such that $C \subseteq A \wedge r \subseteq \Pi(C)$. From $C \subseteq A$ and Equation 2.2, it follows that

$\forall t \in C\,[\mathrm{disjoint}(r, t) \vee \mathrm{intersect}(r, t) \vee t \subset r]$ (2.3)

If $\nexists t \in C\,[\mathrm{intersect}(r, t)]$, then from Equation 2.3, we get $\forall t \in C\,[\mathrm{disjoint}(r, t) \vee t \subset r]$. From this and $r \subseteq \Pi(C)$, it follows that all destination addresses d, $d \in r$, are covered by ranges of C that are contained in r. Therefore, $\exists B \subseteq C \subseteq A\,[\Pi(B) = r]$. This contradicts Equation 2.1.

Next, suppose $\exists t \in C\,[\mathrm{intersect}(r, t)]$. Let D be the union of the resolving subsets for all of these t and r in R. Clearly, all ranges in D are contained in r. Further, let E be the subset of all ranges in C that are contained in r. It is easy to see that $D \cup E \subseteq A \wedge \Pi(D \cup E) = r$. This contradicts Equation 2.1. ∎

Lemma 8 Let R be a conflict-free range set. Let $A = R - \{r\}$, for some $r \in R$.
1.
$\exists B \subseteq A\,[\Pi(B) = r] \Rightarrow \mathrm{conflictFree}(A)$.
2. $\nexists B \subseteq A\,[\Pi(B) = r] \Rightarrow [\mathrm{conflictFree}(A) \Leftrightarrow \nexists s \in A\,[r \subset s] \vee [m, n] \in A]$, where $m = \max\{\mathrm{start}(s) \mid s \in A \wedge r \subset s\}$ and $n = \min\{\mathrm{finish}(s) \mid s \in A \wedge r \subset s\}$.

Proof For (1), we note that by replacing r by B in every resolving subset for intersecting ranges in R, we get resolving subsets that do not include r. Hence, all of these resolving subsets are present in A. So, A is conflict free.

For (2), assume that $\nexists B \subseteq A\,[\Pi(B) = r]$.

($\Rightarrow$) Assume that A is conflict free. We need to prove

$\nexists s \in A\,[r \subset s] \vee [m, n] \in A$ (2.4)

We do this by contradiction. So, assume

$\exists s \in A\,[r \subset s] \wedge [m, n] \notin A$ (2.5)

Since $\exists s \in A\,[r \subset s]$, m and n are well defined. Equation 2.5 implies that A has a range [m, y], y > n, as well as a range [x, n], x < m. Further, intersect([m, y], [x, n]) and $r \subseteq \mathrm{overlap}([m, y], [x, n]) = [m, n]$. Let B be the subset of R comprised of all ranges contained in [m, n]. From Lemma 4, it follows that B is conflict free. However, r is the projection of no subset of $C = B - \{r\}$. Further, no range of C contains r. From Lemma 7, it follows that no subset of C has a projection that contains r. In particular, C has no subset whose projection is [m, n]. Therefore, A has no subset whose projection is [m, n]. So, A has no resolving subset for [m, y] and [x, n]. Therefore, A is not conflict free, a contradiction.

($\Leftarrow$) If no range of A contains r, then r is not part of the resolving subset for any pair of intersecting ranges of R. This, together with the fact that R is conflict free, implies that A is conflict free. If $[m, n] \in A$, we can use [m, n] in place of r in any resolving subset for intersecting ranges of R. Therefore, A has a resolving subset for every pair of intersecting ranges. So, A is conflict free. ∎

Lemma 9 Let R be a conflict-free range set and let d be a destination address. If $\mathrm{ranges}(d) \neq \emptyset$, then $\mathrm{start}(\mathrm{msr}(d)) = a = \mathrm{maxStart}(\mathrm{ranges}(d)) = \max\{\mathrm{start}(r) \mid r \in \mathrm{ranges}(d)\}$ and $\mathrm{finish}(\mathrm{msr}(d)) = b = \mathrm{minFinish}(\mathrm{ranges}(d)) = \min\{\mathrm{finish}(r) \mid r \in \mathrm{ranges}(d)\}$.
Proof Since R is conflict free and $\mathrm{ranges}(d) \neq \emptyset$, $\mathrm{msr}(d) \neq \emptyset$. Assume that msr(d) = s. If $s \neq [a, b]$, then start(s) < a or finish(s) > b. Assume that start(s) < a (the case finish(s) > b is similar). Let $t \in \mathrm{ranges}(d)$ be such that start(t) = a. Now, $\mathrm{intersect}(s, t) \vee t \subset s$. Hence, $s \neq \mathrm{msr}(d)$. ∎

2.1.4 Normalized Ranges

Definition 9 [Normalized Ranges] The range set R is normalized iff one of the following is true.
1. $|R| \le 1$.
2. $|R| > 1$ and for every $r \in R$ and every $s \in R$, $r \neq s$, one of the following is true.
(a) disjoint(r, s).
(b) $\mathrm{nested}(r, s) \wedge \mathrm{start}(r) \neq \mathrm{start}(s) \wedge \mathrm{finish}(r) \neq \mathrm{finish}(s)$. That is, r and s are nested and do not have a common end point.

[Figure 2-3: Unnormalized and normalized range sets]

Figure 2-3(A) shows a range set that is not normalized (it contains ranges that intersect as well as nested ranges that have common end points). Figure 2-3(B) shows a normalized range set. Regardless of which of these two range sets is used, every destination d has the same most-specific range.

Definition 10 An ordered sequence of ranges $(r_1, \ldots, r_n)$ is a chain iff $\forall i < n\,[\mathrm{start}(r_{i+1}) = \mathrm{finish}(r_i) + 1]$. A range set R is a chain iff its ranges can be ordered so as to form a chain. chain(R) is a predicate that is true iff R is a chain.

The range sequence ([2, 4], [5, 7], [8, 12]) is a chain while ([5, 8], [12, 14]) and ([5, 8], [2, 4]) are not. The range sets {[5, 8], [2, 4]} and {[2, 4], [8, 12], [5, 7]} are chains while {[2, 4], [8, 12]} and {[2, 4], [5, 7], [8, 12], [9, 10]} are not. Note that when R is a chain, $\Pi(R) = [\mathrm{minStart}(R), \mathrm{maxFinish}(R)]$.

Lemma 10 Let N be a normalized range set.

$A \subseteq N \wedge \Pi(A) = [u, v] \Rightarrow \exists B \subseteq N\,[\mathrm{chain}(B) \wedge \Pi(B) = [u, v]]$

Proof Let B be the subset of A obtained by removing from A all ranges that are nested within at least one other range of A. Clearly, $\Pi(B) = \Pi(A) = [u, v]$. Since N is normalized and $B \subseteq N$, B is also normalized.
From Definition 9 and the fact that B has no pair of nested ranges, it follows that all ranges of B are disjoint. For disjoint ranges to have a projection that is a range, the disjoint ranges must form a chain. ∎

Lemma 11 Let N be a normalized range set.
1. N may be uniquely partitioned into a set of longest chains $CP(N) = \{C_1, \ldots, C_k\}$, $N = \bigcup_{1 \le i \le k} C_i$, such that no two chains
of CP may be combined into a single chain. CP(N) is called a canonical partitioning.
2. For all i and j, $1 \le i < j \le k$, $C_i$ and $C_j$ are either disjoint, or $C_i$ is properly contained within a range of $C_j$, or $C_j$ is properly contained within a range of $C_i$. A chain $C_i$ is properly contained within the range r iff $\Pi(C_i) \subset r$ and $C_i$ and r share no end point.

Proof Direct consequence of the definition of a normalized set and that of a chain. ∎

Figure 2-4 shows a normalized range set and its canonical partitioning into three chains.

[Figure 2-4: Partitioning a normalized range set into chains]

Next we state a chopping rule that we use to transform every conflict-free range set R into an equivalent normalized range set norm(R). By equivalent, we mean that for every destination d, the most-specific matching range is the same in R as it is in norm(R).

Definition 11 [Chopping Rule] Let $r = [u, v] \in R$, where R is a range set. chop(r, R) (or more simply chop(r) when R is implicit) is as defined below.
1. If neither maxP(u, v - 1, R) nor minP(u + 1, v, R) exists, chop(r) = r.
2. If only maxP(u, v - 1, R) exists, chop(r) = [maxP(u, v - 1, R) + 1, finish(r)].
3. If only minP(u + 1, v, R) exists, chop(r) = [start(r), minP(u + 1, v, R) - 1].
4. If both maxP(u, v - 1, R) and minP(u + 1, v, R) exist and $\mathrm{maxP}(u, v - 1, R) + 1 \le \mathrm{minP}(u + 1, v, R) - 1$, chop(r) = [maxP(u, v - 1, R) + 1, minP(u + 1, v, R) - 1].
5. If both maxP(u, v - 1, R) and minP(u + 1, v, R) exist and $\mathrm{maxP}(u, v - 1, R) + 1 > \mathrm{minP}(u + 1, v, R) - 1$, $\mathrm{chop}(r) = \emptyset$, where $\emptyset$ denotes the null range. The null range neither intersects nor is contained in any other range.

Define $\mathrm{norm}(R) = \{\mathrm{chop}(r) \mid r \in R \wedge \mathrm{chop}(r) \neq \emptyset\}$.

Lemma 12 Let R be a conflict-free range set.

$\forall r \in R\ \forall s \in R\,[s \subset r \Rightarrow [s \subset \mathrm{chop}(r) \wedge \mathrm{start}(s) \neq \mathrm{start}(\mathrm{chop}(r)) \wedge \mathrm{finish}(s) \neq \mathrm{finish}(\mathrm{chop}(r))] \vee \mathrm{disjoint}(s, \mathrm{chop}(r))]$

Proof The lemma is trivially true when $\mathrm{chop}(r) = \emptyset$ ($\mathrm{disjoint}(s, \emptyset)$ is true). So, assume that chop(r) = r'.
For the lemma to be false, either intersect(s, r') or ($r' \subseteq s$ or s and r' have a common end point). If intersect(s, r'), then either $\mathrm{start}(r') < \mathrm{start}(s) \le \mathrm{finish}(r') < \mathrm{finish}(s)$ or $\mathrm{start}(s) < \mathrm{start}(r') \le \mathrm{finish}(s) < \mathrm{finish}(r')$. Assume the former (the latter case is similar). From the chopping rule, it follows that $\exists A \subseteq R\,[\Pi(A) = [\mathrm{finish}(r') + 1, \mathrm{finish}(r)]]$. Therefore, $A \cup \{s\} \subseteq R \wedge \Pi(A \cup \{s\}) = [\mathrm{start}(s), \mathrm{finish}(r)]$. From this, $\mathrm{start}(r) \le \mathrm{start}(r') < \mathrm{start}(s)$, and the chopping rule, we get $\mathrm{finish}(\mathrm{chop}(r)) < \mathrm{start}(s)$. But $\mathrm{start}(s) \le \mathrm{finish}(r')$, a contradiction.

So, $r' \subseteq s$ or s and r' have a common end point. First consider the case $r' \subseteq s \subset r$. Suppose that $\mathrm{start}(s) \neq \mathrm{start}(r)$ (the case $\mathrm{finish}(s) \neq \mathrm{finish}(r)$ is similar). Since r' = chop(r), $\exists A \subseteq R\,[\Pi(A) = [\mathrm{finish}(r') + 1, \mathrm{finish}(r)]]$. Therefore, $\Pi(A \cup \{s\}) = [\mathrm{start}(s), \mathrm{finish}(r)]$ and $\mathrm{start}(r) < \mathrm{start}(s) \le \mathrm{start}(r')$. From the chopping rule, it follows that $\mathrm{finish}(\mathrm{chop}(r)) < \mathrm{start}(s) \le \mathrm{start}(r') \le \mathrm{finish}(r')$, a contradiction. Therefore, $s \subset r'$. If start(s) = start(r'), $\mathrm{maxP}(\mathrm{start}(r), \mathrm{finish}(r) - 1) \ge \mathrm{finish}(s)$. So, start(r') > finish(s), which contradicts $s \subset r'$. The case finish(s) = finish(r') is similar. ∎

Lemma 13 Let r and s be two intersecting ranges of a conflict-free range set R.

$\mathrm{disjoint}(\mathrm{chop}(r), \mathrm{overlap}(r, s)) \wedge \mathrm{disjoint}(\mathrm{chop}(s), \mathrm{overlap}(r, s)) \wedge \mathrm{disjoint}(\mathrm{chop}(r), \mathrm{chop}(s))$

Proof Without loss of generality, we may assume that $\mathrm{start}(r) < \mathrm{start}(s) \le \mathrm{finish}(r) < \mathrm{finish}(s)$. Since R is conflict free, $\exists A\,[A \subseteq R \wedge \Pi(A) = \mathrm{overlap}(r, s)]$. Therefore, finish(chop(r)) < start(s) and start(chop(s)) > finish(r). This proves the lemma. ∎

Lemma 14 Let R be a conflict-free range set. For every $r' \in \mathrm{norm}(R)$ there is a unique $r \in R$ such that chop(r) = r'.

Proof Let r' be any range in norm(R). Clearly, for every $r' \in \mathrm{norm}(R)$, there is at least one $r \in R$ such that chop(r) = r'. Suppose two different ranges r and s of R have r' = chop(r) = chop(s). If intersect(r, s), then from Lemma 13 we get disjoint(chop(r), chop(s)). So, $\mathrm{chop}(r) \neq \mathrm{chop}(s)$.
If nested(r, s), then from Lemma 12 it follows that $s \subset \mathrm{chop}(r) \vee \mathrm{disjoint}(s, \mathrm{chop}(r))$ when $s \subset r$, and $r \subset \mathrm{chop}(s) \vee \mathrm{disjoint}(r, \mathrm{chop}(s))$ when $r \subset s$. Consider the former case (the latter case is similar). $s \subset \mathrm{chop}(r)$ implies $\mathrm{chop}(s) \neq \mathrm{chop}(r)$. disjoint(s, chop(r)) also implies $\mathrm{chop}(s) \neq \mathrm{chop}(r)$. The final case is disjoint(r, s). In this case, clearly, $\mathrm{chop}(s) \neq \mathrm{chop}(r)$. ∎

For $r' \in \mathrm{norm}(R)$, define $\mathrm{full}(r') = \mathrm{chop}^{-1}(r') = r$, where r is the unique range in R for which chop(r) = r'. Notice that full(chop(r)) = r except when $\mathrm{chop}(r) = \emptyset$.

Lemma 15 For every conflict-free range set R, norm(R) is a normalized conflict-free range set.

Proof We shall show that norm(R) is normalized. Since a normalized range set has no intersecting ranges, every normalized range set is conflict free. If $|\mathrm{norm}(R)| \le 1$, norm(R) is normalized. So, assume that $|\mathrm{norm}(R)| > 1$. Let r' and s' be two different ranges in norm(R). We need to show that r' and s' satisfy property 2(a) or 2(b) of Definition 9. Let r = [u, v] = full(r') and s = full(s'). There are three possible cases for r and s: they either intersect, are nested, or are disjoint (Lemma 1).

Case 1: intersect(r, s). From Lemma 13, it follows that r' and s' are disjoint.

Case 2: nested(r, s). Either $s \subset r$ or $r \subset s$. Assume the former (the latter case is similar). From Lemma 12, we get

$[s \subset \mathrm{chop}(r) \wedge \mathrm{start}(s) \neq \mathrm{start}(\mathrm{chop}(r)) \wedge \mathrm{finish}(s) \neq \mathrm{finish}(\mathrm{chop}(r))] \vee \mathrm{disjoint}(s, \mathrm{chop}(r))$

$s \subset \mathrm{chop}(r) \wedge \mathrm{start}(s) \neq \mathrm{start}(\mathrm{chop}(r)) \wedge \mathrm{finish}(s) \neq \mathrm{finish}(\mathrm{chop}(r))$ implies that s' and r' are nested and do not have a common end point. disjoint(s, chop(r)) implies that s' and r' are disjoint.

Case 3: disjoint(r, s). Clearly, disjoint(r', s'). ∎

Lemma 16 Let $r' \in \mathrm{norm}(R)$, where R is a conflict-free range set.

$\nexists s' \in \mathrm{norm}(R)\,[s' \subset r'] \Rightarrow r = \mathrm{full}(r') = \mathrm{msr}(r', R)$

Proof Assume that $\nexists s' \in \mathrm{norm}(R)\,[s' \subset r']$. If $\exists d \in r'\,[r \neq \mathrm{msr}(d, R)]$, then $\exists s \subset r\,[d \in s]$. From Lemma 12, it follows that $s \subset r' \vee s \cap r' = \emptyset$. Since $d \in s \wedge d \in r'$, $s \cap r' \neq \emptyset$. Hence, $s \subset r'$.
From Lemma 4, it follows that $A = \{t \mid t \in R \wedge t \subseteq s\} \neq \emptyset$ is conflict free. From the chopping rule it follows that $\mathrm{norm}(A) \neq \emptyset$. So, $\exists t' \in \mathrm{norm}(A) \subseteq \mathrm{norm}(R)\,[t' \subseteq t = \mathrm{full}(t') \subset r']$. This violates the assumption of this lemma. Therefore, $\nexists d \in r'\,[r \neq \mathrm{msr}(d, R)]$. So, r = msr(r', R). ∎

Lemma 17 Let R be a conflict-free range set, let x be the start point of some range in R, and let y be the finish point of some range in R.
1. Let $s \in R$ be such that start(s) = x and $\mathrm{finish}(s) = \min\{\mathrm{finish}(t) \mid t \in R \wedge \mathrm{start}(t) = x\}$.
(a) $\mathrm{chop}(s) \neq \emptyset$.
(b) start(chop(s)) = x.
(c) chop(s) is the only range in norm(R) that starts at x.
2. Let $s \in R$ be such that finish(s) = y and $\mathrm{start}(s) = \max\{\mathrm{start}(t) \mid t \in R \wedge \mathrm{finish}(t) = y\}$.
(a) $\mathrm{chop}(s) \neq \emptyset$.
(b) finish(chop(s)) = y.
(c) chop(s) is the only range in norm(R) that finishes at y.

Proof We prove 1(a) through 1(c); 2(a) through 2(c) are similar. Since maxP(start(s), finish(s) - 1, R) does not exist, case 5 of the chopping rule does not apply and $\mathrm{chop}(s) \neq \emptyset$. One of the cases 1 and 3 applies. In both of these cases, start(chop(s)) = x. For 1(c), we note that the definition of a normalized set (Definition 9) implies that no two ranges of norm(R) share an end point. In particular, norm(R) can have only one range that has x as an end point. ∎

Lemma 18 Let $r' \in \mathrm{norm}(R)$, where R is a conflict-free range set.

$\mathrm{start}(\mathrm{full}(r')) \neq \mathrm{start}(r') \Rightarrow \nexists s \in R\,[\mathrm{start}(s) = \mathrm{start}(r')]$

Proof Suppose that $\mathrm{start}(\mathrm{full}(r')) \neq \mathrm{start}(r')$ and $\exists s \in R\,[\mathrm{start}(s) = \mathrm{start}(r')]$. From Lemma 17(1a and 1b), it follows that $\exists t \in R\,[\mathrm{start}(t) = \mathrm{start}(r') \wedge \mathrm{chop}(t) \neq \emptyset \wedge \mathrm{start}(\mathrm{chop}(t)) = \mathrm{start}(r')]$. Therefore, norm(R) has at least two ranges (r' and chop(t)) that start at start(r'). This contradicts Lemma 17(1c). ∎

Lemma 19 Let R be a conflict-free range set. Let $r \in R$ be such that r = msr(u, v, R) for some range [u, v]. r' = chop(r) = msr(u, v, norm(R)).

Proof From the definition of msr, it follows that there is no $s \in R$ such that $s \subset r \wedge s \cap [u, v] \neq \emptyset$. Therefore, $[u, v] \subseteq \mathrm{chop}(r)$.
Further, from Lemmas 12 and 13, it follows that norm(R) contains no $s' \subset \mathrm{chop}(r)$. So, r' = msr(u, v, norm(R)). ∎

Lemma 20 Let R be a conflict-free range set that has a subset whose projection equals [x, y]. Let $A \subseteq R$ comprise all $r \in R$ such that $r \subseteq [x, y]$.
1. $\exists B \subseteq \mathrm{norm}(R)\,[\Pi(B) = [x, y]]$
2. $C = \{\mathrm{full}(r') \mid r' \in \mathrm{norm}(R) \wedge r' \subseteq [x, y]\} \subseteq A$

Proof
1. From Lemma 4, it follows that A is conflict free. Further, since R has a subset whose projection equals [x, y], $\Pi(A) = [x, y]$. From Lemma 19, it follows that every $d \in [x, y]$ has a most-specific range in norm(A). Therefore, $\Pi(\mathrm{norm}(A)) = [x, y]$. From the definition of the chopping rule and that of A, we see that $\forall r \in A\,[\mathrm{chop}(r, A) = \mathrm{chop}(r, R)]$. So, $\mathrm{norm}(A) \subseteq \mathrm{norm}(R)$.
2. First, assume that $[x, y] \in R$. Suppose there is a range $r' \in \mathrm{norm}(R)$ such that $r' \subseteq [x, y]$ and $r = \mathrm{full}(r') \notin A$. There are three cases for r.
Case 1: disjoint(r, [x, y]). In this case, disjoint(r', [x, y]) and so $r' \not\subseteq [x, y]$.
Case 2: intersect(r, [x, y]). From Lemma 13, we get disjoint(chop(r), [x, y]). So $r' \not\subseteq [x, y]$.
Case 3: $[x, y] \subset r$. From Lemma 12 and $r' \subseteq [x, y]$, we get disjoint([x, y], chop(r)). So $r' \not\subseteq [x, y]$.
When $[x, y] \notin R$, let $R' = R \cup \{[x, y]\}$, $C' = \{\mathrm{full}(r') \mid r' \in \mathrm{norm}(R') \wedge r' \subseteq [x, y]\}$, and $A' = A \cup \{[x, y]\}$. Using the lemma case we have already proved, we get $C' \subseteq A'$. Since $\mathrm{chop}([x, y], R') = \emptyset$ and chop(s, R) = chop(s, R') for every $s \in R$, norm(R') = norm(R). Therefore, C = C'. So, $C \subseteq A'$. Finally, since $[x, y] \notin C$, $C \subseteq A$. ∎

Lemma 21 Let R be a conflict-free range set. Let $r \notin R$ be such that $R \cup \{r\}$ is conflict free.
1. $\mathrm{chop}(r, R \cup \{r\}) = \emptyset \Rightarrow \forall t \in R\,[\mathrm{chop}(t, R) = \mathrm{chop}(t, R \cup \{r\})]$.
2. Let s be the smallest range of R that contains r. Assume that s exists and that $\mathrm{chop}(r, R \cup \{r\}) \neq \emptyset$.
(a) $\forall t \in R - \{s\}\,[\mathrm{chop}(t, R) = \mathrm{chop}(t, R \cup \{r\})]$.
(b) $\mathrm{chop}(s, R) \neq \mathrm{chop}(s, R \cup \{r\}) \Rightarrow (x' = u' \wedge y' = v') \vee (x' = u' \wedge y' > v) \vee (x' < u \wedge y' = v')$, where r = [u, v], $\mathrm{chop}(r, R \cup \{r\}) = \mathrm{chop}(r, R) = [u', v']$, and chop(s, R) = [x', y'].

Proof For (1), note that $\mathrm{chop}(r, R \cup \{r\}) = \emptyset \Rightarrow \exists A \subseteq R\,[\Pi(A) = r]$.
Therefore, the addition of r to R does not affect any of the maxP and minP values.

For (2a), suppose there are two different ranges g and h in R such that $\mathrm{chop}(g, R) \neq \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) \neq \mathrm{chop}(h, R \cup \{r\})$. From the chopping rule, it follows that

$r \subset g \wedge r \subset h$ (2.6)

Therefore, $\neg\mathrm{disjoint}(g, h)$. From this and Lemma 1, we get $\mathrm{intersect}(g, h) \vee \mathrm{nested}(g, h)$. Equation 2.6 and intersect(g, h) imply $r \subseteq \mathrm{overlap}(g, h)$. From this and Lemma 13, we get $\mathrm{disjoint}(r, \mathrm{chop}(g, R)) \wedge \mathrm{disjoint}(r, \mathrm{chop}(h, R))$. Therefore, $\mathrm{chop}(g, R) = \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction. So, $\neg\mathrm{intersect}(g, h)$. If nested(g, h), we may assume, without loss of generality, that $g \subset h$. This and Equation 2.6 yield $r \subset g \subset h$. Therefore, $\mathrm{maxP}(x, y - 1, R) = \mathrm{maxP}(x, y - 1, R \cup \{r\})$ and $\mathrm{minP}(x + 1, y, R) = \mathrm{minP}(x + 1, y, R \cup \{r\})$, where h = [x, y]. So, $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction. Hence, there can be at most one range of R whose chop() value changes as a result of the addition of r. The preceding proof for the case nested(g, h) also establishes that the chop() value may change only for the range s, that is, for the smallest enclosing range of r (i.e., the smallest $s \in R\,[r \subset s]$).

For (2b), assume that $\mathrm{chop}(s, R) \neq \mathrm{chop}(s, R \cup \{r\})$. This implies that $\mathrm{chop}(s, R) \neq \emptyset$ and so x' and y' are well defined. (Note that from part (1), we get $\mathrm{chop}(r, R) \neq \emptyset$.) We consider each of the three cases for the relationship between r and chop(s, R) (Lemma 1).

Case 1: disjoint(r, chop(s, R)). This case cannot arise, because then $\mathrm{chop}(s, R) = \mathrm{chop}(s, R \cup \{r\})$.

Case 2: intersect(r, chop(s, R)). Now, either $x' < u \le y' < v$ or $u < x' \le v < y'$. Consider the former case. Since $r \subset s$, $v \le y$. When v = y, $\mathrm{minP}(u + 1, v, R \cup \{r\}) = \mathrm{minP}(x + 1, y, R) = y' + 1$. So, v' = y'. Therefore, $x' < u \wedge y' = v'$. Consider the case v < y. From the chopping rule, it follows that $\exists A \subseteq R \subset R \cup \{r\}\,[\Pi(A) = [y' + 1, y]]$.
From this, Lemma 5(2), and the fact that $R \cup \{r\}$ is conflict free, we conclude $\exists B \subseteq R \cup \{r\}\,[\Pi(B) = \mathrm{overlap}(r, [y' + 1, y]) = [y' + 1, v]]$. From this and minP(x + 1, y, R) = y' + 1, we get $\mathrm{minP}(u + 1, v, R \cup \{r\}) = y' + 1$. So, v' = y'. Once again, $x' < u \wedge y' = v'$. Using a similar argument, we may show that when $u < x' \le v < y'$, $x' = u' \wedge y' > v$.

Case 3: nested(r, chop(s, R)). So, either $r \subseteq \mathrm{chop}(s, R)$ or $\mathrm{chop}(s, R) \subset r$. First, consider all possibilities for $r \subseteq \mathrm{chop}(s, R)$. The case $x' < u \le v < y'$ cannot arise, because this implies $\mathrm{chop}(s, R) = \mathrm{chop}(s, R \cup \{r\})$. When $x' = u \le v < y'$, u' = x'. So, $x' = u' \wedge y' > v$. When $x' < u \le v = y'$, v' = y'. So, $x' < u \wedge y' = v'$. The final case is when $x' = u \le v = y'$. Now, $u' = x' \wedge y' = v'$. Using an argument similar to that used in part (2a), we may show that when $\mathrm{chop}(s, R) \subset r$, $x' = u' \wedge y' = v'$. ∎

Lemma 22 Let R, r = [u, v], s = [x, y], x', y', u', and v' be as in Lemma 21. Assume that s exists and $\mathrm{chop}(s) \neq \emptyset$.
1. $\mathrm{disjoint}(r, \mathrm{chop}(s, R)) \vee x' < u \le v < y' \Rightarrow \mathrm{chop}(s, R \cup \{r\}) = \mathrm{chop}(s, R)$.
2. $x' = u' \wedge y' = v' \Rightarrow \mathrm{chop}(s, R \cup \{r\}) = \emptyset$.
3. Suppose $x' = u' \wedge y' > v$. If maxP(v' + 1, y', R) doesn't exist, then $\mathrm{chop}(s, R \cup \{r\}) = [v + 1, y']$. If it exists, $\mathrm{chop}(s, R \cup \{r\}) = [\mathrm{maxP}(v' + 1, y', R) + 1, y']$.
4. Suppose $x' < u' \wedge y' = v'$. If minP(x', u' - 1, R) doesn't exist, then $\mathrm{chop}(s, R \cup \{r\}) = [x', u - 1]$. If it exists, $\mathrm{chop}(s, R \cup \{r\}) = [x', \mathrm{minP}(x', u' - 1, R) - 1]$.

Proof (1) follows from the proof of Lemma 21(2b). For (2), from the proof cases of Lemma 21(2b) that have $x' = u' \wedge y' = v'$, it follows that case 5 of the chopping rule applies for s in $R \cup \{r\}$. So, $\mathrm{chop}(s, R \cup \{r\}) = \emptyset$. For (3), $\mathrm{finish}(\mathrm{chop}(s, R \cup \{r\})) = y'$ follows from the proof of Lemma 21(2b). Also, we observe that $\mathrm{maxP}(x, y - 1, R \cup \{r\}) \ge v$. So, (3) can be false only when $\mathrm{maxP}(x, y - 1, R \cup \{r\}) > v$ and either (a) maxP(v' + 1, y', R) doesn't exist or (b) $\mathrm{maxP}(v' + 1, y', R) < \mathrm{maxP}(x, y - 1, R \cup \{r\})$. For (a), $\exists [c, d] \in R\,[x \le c \le v' \wedge v < d \le y']$.
For (b), ∃[c, d] ∈ R[x < c < v' ∧ v < maxP(v' + 1, y', R) < d < y']. In both cases, c ≤ u implies that r = [u, v] ⊂ [c, d] ⊂ s. This contradicts the assumption that s is the smallest enclosing range of r. Also, in both cases, c > u implies intersect(r, [c, d]). So, R ∪ {r} has a subset whose projection is [c, v]. Therefore, finish(chop(u, v, R ∪ {r})) < c < v', a contradiction. The proof for (4) is similar to that for (3). ∎

Lemma 23 Let R be a conflict-free range set. Let r = [u, v] ∈ R be such that R - {r} is conflict free.
1. chop(r, R) = ∅ ⇒ ∀t ∈ R - {r}[chop(t, R) = chop(t, R - {r})].
2. Let s = [x, y] be the smallest range of R - {r} that contains r. Assume that s exists and that chop(r, R) = [u', v'].
   (a) ∀t ∈ R - {r, s}[chop(t, R) = chop(t, R - {r})].
   (b) chop(s, R) = ∅ ⇒ chop(s, R - {r}) = [u', v'].
   (c) chop(s, R) = [x', y'] ⇒ chop(s, R - {r}) = [min{x', u'}, max{y', v'}].

Proof For (1), note that chop(r, R) = ∅ ⇒ ∃A ⊆ R - {r}[Π(A) = r]. Therefore, the removal of r from R does not affect any of the maxP and minP values. For (2a), note that by substituting R - {r} for R in Lemma 21(2a), we get ∀t ∈ R - {r, s}[chop(t, R - {r}) = chop(t, R)]. (2b) and (2c) follow from Lemma 22. ∎

Figure 2-5: An example range set R and its mapping map1(R) into points in 2D [panels (A) and (B)]

2.1.5 Priority Search Trees And Ranges

A priority-search tree (PST) [37] is a data structure that is used to represent a set of tuples of the form (key1, key2, data), where key1 ≥ 0, key2 ≥ 0, and no two tuples have the same key1 value. The data structure is simultaneously a min-tree on key2 (i.e., the key2 value in each node of the tree is ≤ the key2 value in each descendant node) and a search tree on key1. There are two common PST representations [37]:
1. In a radix priority-search tree (RPST), the underlying tree is a binary radix tree on key1.
2. In a red-black priority-search tree (RBPST), the underlying tree is a red-black tree.

McCreight [37] has suggested a PST representation of a collection of ranges with distinct finish points. This representation uses the following mapping of a range r into a PST tuple:

    (key1, key2, data) = (finish(r), start(r), data)    (2.7)

where data is any information (e.g., next hop) associated with the range. Each range r is, therefore, mapped to a point map1(r) = (x, y) = (key1, key2) = (finish(r), start(r)) in 2-dimensional space. Figure 2-5 shows a set of ranges and the equivalent set of 2-dimensional points (x, y).

McCreight [37] has observed that when the mapping of Equation 2.7 is used to obtain a point set P = map1(R) from a range set R, then ranges(d) is given by the points that lie in the rectangle (including points on the boundary) defined by x_left = d, x_right = ∞, y_top = d, and y_bottom = 0. These points are obtained using the method enumerateRectangle(x_left, x_right, y_top) = enumerateRectangle(d, ∞, d) of a PST (y_bottom is implicit and is always 0).

When an RPST is used to represent the point set P, the complexity of enumerateRectangle(x_left, x_right, y_top) is O(log maxX + s), where maxX is the largest x value in P and s is the number of points in the query rectangle. When the point set is represented as an RBPST, this complexity becomes O(log n + s), where n = |P|. A point (x, y) (and hence a range [y, x]) may be inserted into or deleted from an RPST (RBPST) in O(log maxX) (O(log n)) time [37].

2.2 Prefixes

Let R be a set of ranges such that each range represents a prefix. It is well known (see Sahni et al. [21], for example) that no two ranges of R intersect. Therefore, R is conflict free. For simplicity, assume that R includes the range that corresponds to the prefix *. With this assumption, msr(d) is defined for every d. From Lemma 9, it follows that msr(d) is the range [maxStart(ranges(d)), minFinish(ranges(d))].
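As a rough illustration of the mapping of Equation 2.7, the following Python sketch maps each range to the point (finish, start) and answers ranges(d) with a brute-force rectangle scan. The linear scan only stands in for the PST enumerateRectangle operation; it is an assumption made for illustration and has none of the PST's complexity guarantees. The function names map1, enumerate_rectangle and ranges_matching, and the sample range set, are hypothetical.

```python
import math

def map1(r):
    """Map a range (start, finish) to the 2D point (finish, start), as in Equation 2.7."""
    start, finish = r
    return (finish, start)

def enumerate_rectangle(points, x_left, x_right, y_top):
    """Brute-force stand-in for the PST operation: all points with
    x_left <= x <= x_right and 0 <= y <= y_top (boundary included)."""
    return [(x, y) for (x, y) in points
            if x_left <= x <= x_right and 0 <= y <= y_top]

def ranges_matching(R, d):
    """ranges(d): the ranges of R that match destination address d."""
    point_to_range = {map1(r): r for r in R}
    hits = enumerate_rectangle(point_to_range, d, math.inf, d)
    return sorted(point_to_range[p] for p in hits)

# Hypothetical range set with distinct finish points.
R = [(2, 4), (1, 6), (8, 14)]
print(ranges_matching(R, 3))   # the ranges with start <= 3 <= finish
```

A range matches d exactly when finish ≥ d (the x condition) and start ≤ d (the y condition), which is why the query rectangle is (d, ∞, d).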
To find this range easily, we first transform P = map1(R) into a point set transform1(P) so that no two points of transform1(P) have the same x-value. Then, we represent transform1(P) as a PST.

Definition 12 Let W be the (maximum) number of bits in a destination address (W = 32 in IPv4). Let (x, y) ∈ P. transform1(x, y) = (x', y') = (2^W x - y + 2^W - 1, y) and transform1(P) = {transform1(x, y) | (x, y) ∈ P}.

We see that 0 ≤ x' < 2^(2W) for every (x', y') ∈ transform1(P) and that no two points in transform1(P) have the same x'-value. Let PST1(P) be the PST for transform1(P). The operation enumerateRectangle(2^W d - d + 2^W - 1, ∞, d) performed on PST1 yields ranges(d). To find msr(d), we employ the minXinRectangle(x_left, x_right, y_top) operation, which determines the point in the defined rectangle that has the least x-value. It is easy to see that minXinRectangle(2^W d - d + 2^W - 1, ∞, d) performed on PST1 yields msr(d).

To insert the prefix whose range is [u, v], we insert transform1(map1([u, v])) into PST1. In case this prefix is already in PST1, we simply update the next-hop information for this prefix. To delete the prefix whose range is [u, v], we delete transform1(map1([u, v])) from PST1. When deleting a prefix, we must take care not to delete the prefix *. Requests to delete this prefix should simply result in setting the next-hop associated with this prefix to ∅.

Since minXinRectangle, insert, and delete each take O(W) (O(log n)) time when PST1 is an RPST (RBPST), PST1 provides a router-table representation in which longest-prefix matching, prefix insertion, and prefix deletion can be done in O(W) time each when an RPST is used and in O(log n) time each when an RBPST is used.

2.3 Nonintersecting Ranges

Let R be a set of nonintersecting ranges. Clearly, R is conflict free. For simplicity, assume that R includes the range z that matches all destination addresses (z = [0, 2^32 - 1] in the case of IPv4). With this assumption, msr(d) is defined for every d.
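The transformation of Definition 12 and the msr(d) query of Section 2.2 can be sketched as follows. This is a minimal Python sketch under stated assumptions: a toy address width W = 5 replaces W = 32, a linear scan stands in for the PST minXinRectangle operation, and the names min_x_in_rectangle and msr as well as the sample prefixes are hypothetical.

```python
import math

W = 5                   # assumed toy address width (the text uses W = 32 for IPv4)
FULL = (1 << W) - 1     # 2^W - 1

def transform1(x, y):
    """Definition 12: (x, y) -> (2^W * x - y + 2^W - 1, y), with x = finish, y = start."""
    return ((x << W) - y + FULL, y)

def min_x_in_rectangle(points, x_left, x_right, y_top):
    """Brute-force stand-in for the PST operation: the point with least x
    among those with x_left <= x <= x_right and 0 <= y <= y_top."""
    inside = [p for p in points
              if x_left <= p[0] <= x_right and 0 <= p[1] <= y_top]
    return min(inside) if inside else None

def msr(R, d):
    """Most specific range of R that matches d; R holds (start, finish) pairs
    that correspond to prefixes."""
    pst1 = {transform1(finish, start): (start, finish) for (start, finish) in R}
    hit = min_x_in_rectangle(pst1, (d << W) - d + FULL, math.inf, d)
    return pst1[hit] if hit is not None else None

# Hypothetical prefixes for W = 5: *, 1*, 10*, 101*
R = [(0, 31), (16, 31), (16, 23), (20, 23)]
print(msr(R, 21), msr(R, 17), msr(R, 31), msr(R, 3))
```

Because x' grows with finish and, for equal finish, shrinks as start grows, the least x' in the rectangle belongs to the matching range with the smallest finish and the largest start, which is the range that Lemma 9 identifies as msr(d).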
We may use PST1(transform1(map1(R))) to find msr(d) as described in Section 2.2. Insertion of a range r is to be permitted only if r does not intersect any of the ranges of R. Once we have verified this, we can insert r into PST1 as described in Section 2.2. Range intersection may be verified by noting that there are two cases for range intersection (Definition 2(c)). When inserting r = [u, v], we need to determine if ∃s = [x, y] ∈ R[u < x ≤ v < y ∨ x < u ≤ y < v].

We see that ∃s ∈ R[x < u ≤ y < v] iff map1(R) has at least one point in the rectangle defined by x_left = u, x_right = v - 1, and y_top = u - 1 (recall that y_bottom = 0 by default). Hence, ∃s ∈ R[x < u ≤ y < v] iff minXinRectangle(2^W u - (u - 1) + 2^W - 1, 2^W (v - 1) + 2^W - 1, u - 1) exists in PST1.

To verify ∃s ∈ R[u < x ≤ v < y], map the ranges of R into 2-dimensional points using the mapping map2(r) = (start(r), 2^W - 1 - finish(r)). Call the resulting set of mapped points map2(R). We see that ∃s ∈ R[u < x ≤ v < y] iff map2(R) has at least one point in the rectangle defined by x_left = u + 1, x_right = v, and y_top = (2^W - 1) - v - 1. To verify this, we maintain a second PST, PST2, of points in transform2(map2(R)), where transform2(x, y) = (2^W x + y, y). Hence, ∃s ∈ R[u < x ≤ v < y] iff minXinRectangle(2^W (u + 1), 2^W v + (2^W - 1) - v - 1, (2^W - 1) - v - 1) exists.

To delete a range r, we must delete r from both PST1 and PST2. Deletion of a range from a PST is similar to deletion of a prefix as discussed in Section 2.2.

The complexity of the operations to find msr(d), insert a range, and delete a range is the same as that of these operations for the case when R is a set of ranges that correspond to prefixes.

Step 1: If r = [u, v] ∈ R, update the next-hop information associated with r ∈ R and terminate.
Step 2: Compute maxP(u, v, R), minP(u, v, R), maxY(u, v, R) and minX(u, v, R).
Step 3: If maxY(u, v, R) ≤ maxP(u, v, R) ∧ minX(u, v, R) ≥ minP(u, v, R), R ∪ {r} is conflict free; otherwise, it is not. In the former case, insert transform1(map1(r)) into PST1 and transform2(map2(r)) into PST2. In the latter case, the insert operation fails.
Figure 2-6: Insert r = [u, v] into the conflict-free range set R

2.4 Conflict-Free Ranges

In this section, we extend the two-PST data structure of Section 2.3 to the general case when R is an arbitrary conflict-free range set. Once again, we assume that R includes the range z that matches all destination addresses. PST1 and PST2 are defined for the range set R as in Sections 2.2 and 2.3.

2.4.1 Determine msr(d)

Since R is conflict free, msr(d) is determined by Lemma 9. Hence, msr(d) may be obtained by performing the operation minXinRectangle(2^W d - d + 2^W - 1, ∞, d) on PST1.

2.4.2 Insert A Range

When inserting a range r = [u, v] ∉ R, we must insert transform1(map1(r)) into PST1 and transform2(map2(r)) into PST2. Additionally, we must verify that R ∪ {r} is conflict free. This verification is done using Lemma 6. Figure 2-6 gives a high-level description of our algorithm to insert a range into R. Step 1 is done by searching for transform1(map1(r)) in PST1. For Step 2, we note that

    maxY(u, v, R) = maxXinRectangle(2^W u - (u - 1) + 2^W - 1, 2^W (v - 1) + 2^W - 1, u - 1)
    minX(u, v, R) = minXinRectangle(2^W (u + 1), 2^W v + (2^W - 1) - v - 1, (2^W - 1) - v - 1)

where for maxY we use PST1 and for minX we use PST2. Section 2.4.4 describes the computation of maxP and minP. The point insertions of Step 3 are done using the standard insert algorithm for a PST [37].

Step 1: If r = z, change the next-hop information for z to ∅ and terminate.
Step 2: Delete transform1(map1(r)) from PST1 and transform2(map2(r)) from PST2 to get the PSTs for A = R - {r}. If PST1 did not have transform1(map1(r)), r ∉ R; terminate.
Step 3: Determine whether or not A has a subset whose projection equals r = [u, v].
Step 4: If A has such a subset, conclude conflictFree(A) and terminate.
Step 5: Determine whether A has a range that contains r = [u, v]. If not, conclude conflictFree(A) and terminate.
Step 6: Determine m and n as defined in Lemma 8 as follows.
    m = start(maxXinRectangle(0, 2^W u + (2^W - 1) - v, (2^W - 1) - v)) (use PST2)
    n = finish(minXinRectangle(2^W v - u + 2^W - 1, ∞, u)) (use PST1)
Step 7: Determine whether [m, n] ∈ A. If so, conclude conflictFree(A). Otherwise, conclude ¬conflictFree(A). In the latter case, reinsert transform1(map1(r)) into PST1 and transform2(map2(r)) into PST2 and disallow the deletion of r.
Figure 2-7: Delete the range r = [u, v] from the conflict-free range set R

2.4.3 Delete A Range

Suppose we are to delete the range r = [u, v]. This deletion is to be permitted iff r ≠ z and A = R - {r} is conflict free. Figure 2-7 gives a high-level description of our algorithm to delete r. Its correctness follows from Lemma 8. Step 2 employs the standard PST algorithm to delete a point [37]. For Step 3, we note that A has a subset whose projection equals r = [u, v] iff maxP(u, v, A) = v. In Section 2.4.4, we show how maxP(u, v, A) may be computed efficiently. For Step 5, we note that r = [u, v] ⊆ s = [x, y] iff x ≤ u ∧ y ≥ v. So, A has such a range iff minXinRectangle(2^W v - u + 2^W - 1, ∞, u) exists in PST1. In Step 6, we assume that maxXinRectangle and minXinRectangle return the range of R that corresponds to the desired point in the rectangle. To determine whether [m, n] ∈ A (Step 7), we search for the point (2^W n - m + 2^W - 1, m) in PST1 using the standard PST search algorithm [37]. The reinsertion into PST1 and PST2, if necessary, is done using the standard PST insert algorithm [37].

Step 1: Find r' ∈ norm(R) such that start(r') = u. If there is no such r' or start(full(r')) ≠ u ∨ finish(full(r')) > v, maxP does not exist; terminate.
Step 2: maxP = finish(r');
        while (∃s' ∈ norm(R) ∧ start(s') = maxP + 1 ∧ full(s') ⊆ [u, v])
            maxP = finish(s');
Figure 2-8: Simple algorithm to compute maxP(u, v, R), where [u, v] is a range and conflictFree(R)
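The rectangle queries used for the intersection tests of Section 2.3 and for the containment test of Step 5 of Figure 2-7 can be checked against their set-theoretic definitions on a small universe. The Python sketch below does this under stated assumptions: W = 4 is a toy address width, a linear scan stands in for the PST minXinRectangle existence test, and all function names are hypothetical.

```python
import math

W = 4                   # assumed toy address width
FULL = (1 << W) - 1     # 2^W - 1

def pst1_point(r):
    """transform1(map1(r)) for r = (start, finish)."""
    start, finish = r
    return ((finish << W) - start + FULL, start)

def pst2_point(r):
    """transform2(map2(r)) for r = (start, finish)."""
    start, finish = r
    y = FULL - finish
    return ((start << W) + y, y)

def exists_in_rectangle(points, x_left, x_right, y_top):
    """Brute-force stand-in for 'minXinRectangle(...) exists'."""
    return any(x_left <= x <= x_right and 0 <= y <= y_top for (x, y) in points)

def intersects_from_left(R, u, v):
    """True iff some [x, y] in R has x < u <= y < v (query on PST1 points)."""
    pts = [pst1_point(r) for r in R]
    return exists_in_rectangle(
        pts, (u << W) - (u - 1) + FULL, ((v - 1) << W) + FULL, u - 1)

def intersects_from_right(R, u, v):
    """True iff some [x, y] in R has u < x <= v < y (query on PST2 points)."""
    pts = [pst2_point(r) for r in R]
    return exists_in_rectangle(
        pts, (u + 1) << W, (v << W) + FULL - v - 1, FULL - v - 1)

def has_enclosing(R, u, v):
    """True iff some [x, y] in R has x <= u and y >= v (query on PST1 points)."""
    pts = [pst1_point(r) for r in R]
    return exists_in_rectangle(pts, (v << W) - u + FULL, math.inf, u)

print(intersects_from_left([(2, 4)], 3, 8), intersects_from_right([(4, 6)], 2, 4))
```

The x-bounds of each rectangle are the transformed images of the rectangle corners in the untransformed point set, so the queries select the same ranges as the inequalities they replace.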
2.4.4 Computing maxP and minP

Although maxP and minP are relatively difficult to compute using data structures such as PST1 and PST2 that directly represent R, they may be computed efficiently using data structures for norm(R). In this section, we show how to compute maxP from norm(R). The computation of minP is similar.

2.4.5 A Simple Algorithm to Compute maxP

Figure 2-8 is a high-level description of a simple, though not efficient, algorithm to compute maxP(u, v, R).

Theorem 1 Figure 2-8 correctly computes maxP(u, v, R).

Proof First consider Step 1. From Lemma 17(a), it follows that

    ∄r' ∈ norm(R)[start(r') = u] ⇒ ∄r ∈ R[start(r) = u]

Therefore, ∄r' ∈ norm(R)[start(r') = u] ⇒ ∄maxP. From Lemma 18, it follows that start(full(r')) ≠ start(r') = u ⇒ ∄s ∈ R[start(s) = start(r') = u]. So, start(full(r')) ≠ u ⇒ ∄maxP. Finally, u = start(r') = start(full(r')) implies finish(full(r')) = min{finish(t) | t ∈ R ∧ start(t) = u} (Lemma 17(1)). So, finish(full(r')) > v implies ∄s ∈ R[start(s) = u ∧ finish(s) ≤ v]. Hence, start(r') = u ∧ finish(full(r')) > v ⇒ ∄maxP. Further, when ∃r' ∈ norm(R)[start(r') = u ∧ finish(full(r')) ≤ v], maxP exists and maxP ≥ finish(full(r')) ≥ finish(r'). Therefore, Step 1 correctly identifies the case when maxP doesn't exist.

We get to Step 2 only when maxP exists. From the definition of maxP, ∃A ⊆ R[Π(A) = [u, maxP]]. From this and Lemma 20(1), we get ∃B ⊆ norm(R)[Π(B) = [u, maxP]]. Now, from Lemma 10, we get ∃D ⊆ norm(R)[chain(D) ∧ Π(D) = [u, maxP]]. From Lemma 11, it follows that D is a sub-chain of the unique chain C_i ∈ CP(norm(R)) that includes r'. Let r', s'_1, s'_2, ..., s'_q be the tail of C_i. It follows that maxP is either finish(r') or finish(s'_j) for some j in the range [1, q]. Let j be the least integer such that full(s'_j) ⊈ [u, v]. If such a j does not exist, then maxP = finish(s'_q), as norm(R) has no subset whose projection equals [u, x] for any x > finish(s'_q). So, assume that j exists. From Lemma 20(2), it follows that maxP < finish(s'_j).
Hence, Step 2 correctly determines maxP. ∎

2.4.6 An Efficient Algorithm to Compute maxP

The algorithm of Figure 2-8 takes time O(length(C_i)), where length(C_i) is the number of ranges in the chain C_i ∈ CP(norm(R)) that contains r'. We can reduce this time to O(log length(C_i)) by representing each chain of CP(norm(R)) as a red-black tree (actually, any balanced search tree structure that permits efficient join and split operations may be used). The number of red-black trees we use equals the number of chains in CP(norm(R)).

Let D = (t'_1, ..., t'_q) be a chain in CP(norm(R)). The red-black tree, RBT(D), for D has one node for each of the ranges t'_j. The key value for the node for t'_j is start(t'_j) (equivalently, finish(t'_j) may be used as the search tree key). Each node of RBT(D) has the following four values (in addition to having a t'_j and other values necessary for efficient implementation): minStartLeft, minStartRight, maxFinishLeft, and maxFinishRight. For a node p that has an empty left subtree, minStartLeft = 2^W - 1 and maxFinishLeft = 0. Similarly, when p has an empty right subtree, minStartRight = 2^W - 1 and maxFinishRight = 0. Otherwise,

    minStartLeft   = min{start(full(r')) | r' ∈ leftSubtree(p)}
    minStartRight  = min{start(full(r')) | r' ∈ rightSubtree(p)}
    maxFinishLeft  = max{finish(full(r')) | r' ∈ leftSubtree(p)}
    maxFinishRight = max{finish(full(r')) | r' ∈ rightSubtree(p)}

The collection of red-black trees representing norm(R) is augmented by an additional red-black tree endPointsTree(norm(R)) that represents the end points of the ranges in norm(R). With each point x in endPointsTree, we store a pointer to the node in RBT(D) that represents s'.

Alternatively, we may use a PST, PST3, for the range set chains = {[start(C_i), finish(C_i)] | C_i ∈ CP(norm(R))}. The points in PST3 are map1(chains); with each point in PST3, we keep a pointer to the root of the RBT for that chain. Note that since range end-points are distinct in chains, we do not need to use transform1 as used in PST1.
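For comparison with the tree-based method, the simple chain walk of Figure 2-8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: norm(R) is supplied as a hypothetical dictionary keyed by start point rather than computed by the chopping rule, and the sample data assumes a range set whose normalized ranges coincide with the original ranges (for example, pairwise disjoint ranges), which the code does not check.

```python
def simple_max_p(norm, u, v):
    """Chain walk in the style of Figure 2-8.

    norm maps start(r') -> (finish(r'), full(r')) for every r' in norm(R),
    where full(r') is a (start, finish) pair. Returns maxP(u, v, R), or None
    when maxP does not exist.
    """
    # Step 1: find r' with start(r') = u and check full(r').
    entry = norm.get(u)
    if entry is None:
        return None
    finish, (full_start, full_finish) = entry
    if full_start != u or full_finish > v:
        return None

    # Step 2: extend while the next normalized range starts at maxP + 1
    # and its full range lies inside [u, v].
    max_p = finish
    while True:
        nxt = norm.get(max_p + 1)
        if nxt is None:
            break
        next_finish, (s_start, s_finish) = nxt
        if not (u <= s_start and s_finish <= v):
            break
        max_p = next_finish
    return max_p

# Hypothetical data: three disjoint ranges, so each is assumed to be its own
# normalized range and its own full range.
norm = {10: (12, (10, 12)), 13: (15, (13, 15)), 16: (19, (16, 19))}
print(simple_max_p(norm, 10, 19), simple_max_p(norm, 10, 17))
```

The loop visits one range of the chain per iteration, which is the O(length(C_i)) cost that the red-black tree representation is meant to reduce.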
To find an end point d, we first find the smallest chain that covers d by performing the operation minXinRectangle(d, ∞, d) in PST3. Next, we follow the pointer associated with this chain to get to the corresponding RBT. Finally, a search of this RBT gets us to the RBT node for the s' with the given end point. In the sequel, we assume that endPointsTree, rather than PST3, is used. A parallel discussion is possible for the case when PST3 is used.

To implement Step 1 of Figure 2-8, we search endPointsTree for the point u. If u ∉ endPointsTree, then ∄r' ∈ norm(R)[start(r') = u]. If u ∈ endPointsTree, then we use the pointer in the node for u to get to the root of the RBT that has r'. A search in this RBT for u locates r'. We may now perform the remaining checks of Step 1 using the data associated with r'.

Suppose that maxP exists. At the start of Step 2, we are positioned at the RBT node that represents r'. This is node 0 of Figure 2-9. We need to find s' ∈ norm(R) with least start(s') such that start(s') > finish(r') ∧ full(s') ⊈ [u, v]. If there is no such s', then maxP = max{finish(root.range), root.maxFinishRight}. If such an s' exists, maxP = start(s') - 1. s' may be found in O(height(RBT)) time using a simple search process. We illustrate this process using the tree of Figure 2-9.

Figure 2-9: An example RBT

We begin at node 0. If [minStartRight, maxFinishRight] ⊆ [u, v], then s' is not in the right subtree of node 0. Since node 0 is a right child, s' is not in its parent. So, we back up to node 1 (in general, we back up to the nearest ancestor whose left subtree contains the current node). Let t' be the range in node 1. s' = t' iff t' ⊈ [u, v]. If s' ≠ t', we perform the test [minStartRight, maxFinishRight] ⊆ [u, v] at node 1 to determine whether or not s' is in the right subtree of node 1. If the test is true, we back up to node 2. Otherwise, s' is in the right subtree of node 1.
When the right subtree (if any) that contains s' is identified, we make a downward pass in this subtree to locate s'. Figure 2-10 describes this downward pass.

downwardPass(currentNode)
// currentNode is the root of a subtree all of whose ranges start at the right of u.
// This subtree contains s'. Return maxP.
while (true) {
    if ([currentNode.minStartLeft, currentNode.maxFinishLeft] ⊆ [u, v])
        // s' is not in the left subtree
        if (currentNode.range ⊆ [u, v])
            // s' ≠ currentNode.range; s' must be in the right subtree
            currentNode = currentNode.rightChild;
        else return (start(currentNode.range) - 1);
    else
        // s' is in the left subtree
        currentNode = currentNode.leftChild;
}
Figure 2-10: Find s' (and hence maxP) in a subtree known to contain s'

2.4.7 Wrapping Up Insertion of a Range

Now that we have augmented PST1 and PST2 with a collection of RBTs and an endPointsTree, whenever we insert a range r = [u, v] into R, we must update not only PST1 and PST2 as described in Section 2.4.2, but also the RBT collection and endPointsTree. To do this, we first compute chop(r, R ∪ {r}) = chop(r, R) = [u', v'] by first computing minP(u + 1, v) and maxP(u, v - 1) as described in Section 2.4.4. [u', v'] is now easily obtained from the chopping rule. Lemma 21 tells us that the only s ∈ R whose chop() value may change as a result of the insertion of r is the smallest enclosing range of r. Since z ∈ R and r ≠ z, such an s must exist. Rather than search for this s explicitly, we use the conditions of cases (2)-(4) of Lemma 22 to find s' = chop(s, R) in endPointsTree. Note that if chop(s, R) = ∅, the search in endPointsTree will not find s'; but when chop(s, R) = ∅, chop(s, R ∪ {r}) = ∅. So, no change in chop(s, R) is called for.

Note that the insertion of r may combine two chains of CP(norm(R)). In this case, we use the join operation of red-black trees to combine the RBTs corresponding to these two chains.
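The downward pass of Figure 2-10 can be exercised on a small example with the Python sketch below. It is a sketch under several assumptions, none of which come from the original implementation: a plain balanced binary search tree stands in for a red-black tree, W = 8 is a toy address width, the left-subtree test uses the left-subtree bounds minStartLeft and maxFinishLeft, the node test is applied to full(t') as in the definition of s', and the chain data is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

W = 8                                  # assumed toy address width
SENTINEL_MIN_START = (1 << W) - 1      # value used for an empty subtree
SENTINEL_MAX_FINISH = 0                # value used for an empty subtree

Range = Tuple[int, int]

@dataclass
class Node:
    rng: Range                         # normalized range t'
    full: Range                        # full(t')
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    min_start_left: int = SENTINEL_MIN_START
    max_finish_left: int = SENTINEL_MAX_FINISH
    min_start_right: int = SENTINEL_MIN_START
    max_finish_right: int = SENTINEL_MAX_FINISH

def span(node):
    """(min start(full), max finish(full)) over a subtree; sentinels if empty."""
    if node is None:
        return SENTINEL_MIN_START, SENTINEL_MAX_FINISH
    lo = min(node.full[0], node.min_start_left, node.min_start_right)
    hi = max(node.full[1], node.max_finish_left, node.max_finish_right)
    return lo, hi

def build(items):
    """Balanced search tree over (range, full) items sorted by start."""
    if not items:
        return None
    mid = len(items) // 2
    rng, full = items[mid]
    node = Node(rng, full, build(items[:mid]), build(items[mid + 1:]))
    node.min_start_left, node.max_finish_left = span(node.left)
    node.min_start_right, node.max_finish_right = span(node.right)
    return node

def inside(lo, hi, u, v):
    """True iff [lo, hi] lies within [u, v]; the sentinels always pass."""
    return u <= lo and hi <= v

def downward_pass(node, u, v):
    """Return start(s') - 1 for the leftmost s' whose full range is not in [u, v]."""
    while node is not None:
        if inside(node.min_start_left, node.max_finish_left, u, v):
            if inside(node.full[0], node.full[1], u, v):
                node = node.right          # s' must be in the right subtree
            else:
                return node.rng[0] - 1     # this node holds s'
        else:
            node = node.left               # s' is in the left subtree
    raise ValueError("the subtree does not contain s'")

# Hypothetical chain: the fourth range has a full range that extends past v = 35.
chain = [((10, 12), (10, 12)), ((13, 15), (13, 15)), ((16, 19), (16, 19)),
         ((20, 22), (20, 50)), ((23, 30), (23, 30))]
print(downward_pass(build(chain), 5, 35))
```

Each iteration moves one level down, so the pass takes time proportional to the height of the tree, provided the four augmented values are kept up to date during rebalancing.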
2.4.8 Wrapping Up Deletion of a Range

When chop(r, R) = ∅, no changes are to be made to the RBTs and endPointsTree (Lemma 23(1)). So, assume that chop(r, R) ≠ ∅. We first find s, the smallest range that contains r (see Lemma 23(2)). Note that since z ∈ R and r ≠ z, s exists. One may verify that s is one of the ranges given by the following two operations.

    minXinRectangle(2^W v - u + 2^W - 1, ∞, u)
    maxXinRectangle(0, 2^W u + 2^W - 1 - v, 2^W - 1 - v)

where the first operation is done in PST1 and the second in PST2 (both operations are done after transform1(map1([u, v])) has been deleted from PST1 and transform2(map2([u, v])) has been deleted from PST2). The ranges returned by these two operations may be compared to determine which is s.

Once we have identified s, Lemma 23(2) is used to determine chop(s, R - {r}). Assume that chop(s, R) ≠ ∅. Let chop(r, R) = r' = [u', v'] and chop(s, R) = s' = [x', y']. When s' and r' are in different RBTs (this is the case when r' ⊂ s'), chop(s, R) = chop(s, R - {r}) and the RBT that contains s' may need to be split into two RBTs. When s' and r' are in the same RBT, they are in the same chain of CP(norm(R)). If s' and r' are adjacent ranges of this chain, we may simply remove the RBT node for r' and update that for s' to reflect its new start or finish point (only one may change). When r' and s' are not adjacent ranges, the nodes for these two ranges are removed from the RBT (this may split the RBT into up to two RBTs) and chop(s, R - {r}) is inserted. Figure 2-11 shows the different cases.

2.4.9 Complexity

The portions of the search, insert, and delete algorithms that deal only with PST1 and PST2 have the same asymptotic complexity as their counterparts for the case of nonintersecting ranges (Section 2.3).
The portions that deal with the RBTs and endPointsTree require a constant number of search, insert, delete, join, and split operations on these structures. Since each of these operations takes O(log n) time on a red-black tree and since we can update the values minStartLeft, minStartRight, and so on, that are stored in the RBT nodes in the same asymptotic time as taken by an insert/delete/join/split, the overall complexity of our proposed data structure is O(log n) for each operation when RBPSTs are used for PST1 and PST2. When RPSTs are used, the search complexity is O(W) and the insert and delete complexity is O(W + log n) = O(W).

Figure 2-11: Cases when s' and r' are in the same chain of CP(norm(R)) [panels (A) to (D)]

2.5 Experimental Results

2.5.1 Prefixes

We programmed our red-black priority-search tree algorithm for prefixes (Section 2.2) in C++ and compared its performance to that of the ACRBT of Sahni et al. [22]. Recall that the ACRBT is the best performing O(log n) data structure reported in [22] for dynamic prefix-tables. For test data, we used six IPv4 prefix databases obtained from [38]. The number of prefixes in each of these databases, as well as the memory requirements for each database of prefixes using our data structure (PST) of Section 2.2 and the ACRBT structure of Sahni et al. [22], are shown in Table 2-1. The databases Paix1, Pb1, MaeWest and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained Sept. 13, 2000. Figure 2-12 is a plot of the data of Table 2-1. As can be seen, the ACRBT structure takes almost three times as much memory as is taken by the PST structure. Further, the memory requirement of the PST structure can be reduced to about 50% of that of our current implementation.

Figure 2-12: Memory usage (PST versus ACRBT, by database)
This reduction requires an n-node implementation of a priority-search tree as described in [37] rather than our current implementation, which uses 2n - 1 nodes as in [39].

Table 2-1: Memory usage

Database            Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Num of Prefixes     16172   22225   28889    31827   35303   85988
Memory (KB), PST    884     1215    1579     1740    1930    4702
Memory (KB), ACRBT  2417    3331    4327     4769    5305    12851

To obtain the mean time to find the longest matching-prefix (i.e., to perform a search), we started with a PST or ACRBT that contained all prefixes of a prefix database. Next, a random permutation of the set of start points of the ranges corresponding to the prefixes was obtained. This permutation determined the order in which we searched for the longest matching-prefix for each of these start points. The time required to determine all of these longest-matching prefixes was measured and averaged over the number of start points (equal to the number of prefixes). The experiment was repeated 20 times and the mean and standard deviation of the 20 mean times computed. Table 2-2 gives the mean time required to find the longest matching-prefix on a Sun Blade 100 workstation that has a 500MHz UltraSPARC-IIe processor and a 256KB L2 cache. The standard deviation in the mean time is also given in this table. On our Sun workstation, finding the longest matching-prefix takes about 10% to 14% less time using an ACRBT than a PST.
Table 2-2: Prefix times on a 500MHz Sun Blade 100 workstation

Operation       Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search (μsec)   PST        Mean   2.88    3.06    3.25     3.31    3.43    4.06
                           Std    0.36    0.18    0.17     0.16    0.09    0.05
                ACRBT      Mean   2.60    2.77    2.87     2.87    3.09    3.51
                           Std    0.25    0.16    0.16     0.12    0.13    0.04
Insert (μsec)   PST        Mean   3.90    4.45    4.83     5.18    5.14    6.04
                           Std    0.57    0.63    0.51     0.48    0.19    0.20
                ACRBT      Mean   21.15   23.42   24.77    25.36   25.54   28.07
                           Std    1.11    0.66    0.38     0.29    0.19    0.18
Delete (μsec)   PST        Mean   4.36    4.45    4.73     4.71    5.06    5.48
                           Std    0.91    0.63    0.53     0.00    0.19    0.16
                ACRBT      Mean   21.24   22.68   23.16    23.71   24.56   25.64
                           Std    0.95    0.55    0.49     0.35    0.26    0.21

To obtain the mean time to insert a prefix, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. This experiment was repeated 20 times and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities are given in Table 2-2 for our Sun workstation. As can be seen, insertions into a PST take between 18% and 22% of the time to insert into an ACRBT.

The mean and standard deviation data reported in Table 2-2 for the delete operation were obtained in a similar fashion by starting with a data structure that had 100% of the prefixes in the database and measuring the time to delete a randomly selected 33% of these prefixes. Deletion from a PST takes about 20% of the time required to delete from an ACRBT.

Tables 2-3 and 2-4 give the corresponding times on a 700MHz Pentium III PC and a 1.4GHz Pentium 4 PC, respectively. Both computers have a 256KB L2 cache. The run times on our 700MHz Pentium III are about one-half the times on our Sun workstation. Surprisingly, when going from the 700MHz Pentium III to the 1.4GHz Pentium 4, the measured time to find the longest matching-prefix decreased by only about 5% for PST. More surprisingly, the corresponding times for ACRBT actually increased. The net result of the slight decrease in time for PST and the increase for ACRBT is that, on our Pentium 4 PC, the PST is faster than the ACRBT on all three operations: find longest matching-prefix, insert, and delete. This somewhat surprising behavior is due to architectural differences (e.g., differences in width and size of L1 cache lines) between the Pentium III and 4 processors.

Table 2-3: Prefix times on a 700MHz Pentium III PC

Operation       Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search (μsec)   PST        Mean   1.39    1.54    1.61     1.65    1.70    1.97
                           Std    0.27    0.22    0.17     0.14    0.00    0.04
                ACRBT      Mean   1.36    1.44    1.44     1.49    1.54    1.80
                           Std    0.25    0.18    0.13     0.14    0.14    0.06
Insert (μsec)   PST        Mean   2.41    2.63    2.60     2.83    2.80    3.07
                           Std    0.87    0.30    0.53     0.43    0.40    0.14
                ACRBT      Mean   11.97   12.63   13.48    13.62   13.77   14.93
                           Std    0.95    0.67    0.24     0.48    0.35    0.18
Delete (μsec)   PST        Mean   2.32    2.38    2.49     2.45    2.55    2.91
                           Std    0.82    0.61    0.52     0.47    0.00    0.17
                ACRBT      Mean   11.69   12.55   12.95    13.01   13.40   14.10
                           Std    0.87    0.63    0.54     0.44    0.48    0.16

Figures 2-13, 2-14, and 2-15 histogram the search, insert, and delete time data of the preceding tables.

Table 2-4: Prefix times on a 1.4GHz Pentium 4 PC

Operation       Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search (μsec)   PST        Mean   1.30    1.44    1.51     1.52    1.63    1.92
                           Std    0.19    0.18    0.17     0.13    0.13    0.06
                ACRBT      Mean   1.48    1.69    1.83     1.87    1.87    2.24
                           Std    0.31    0.20    0.16     0.07    0.14    0.05
Insert (μsec)   PST        Mean   1.76    1.96    2.18     2.17    2.38    2.65
                           Std    0.41    0.69    0.00     0.44    0.35    0.18
                ACRBT      Mean   11.22   11.81   12.41    12.91   12.92   13.94
                           Std    0.41    0.60    0.41     0.44    0.26    0.18
Delete (μsec)   PST        Mean   1.76    1.69    1.92     1.93    2.00    2.22
                           Std    0.41    0.60    0.38     0.21    0.42    0.17
                ACRBT      Mean   9.46    10.39   10.54    10.42   10.92   11.64
                           Std    0.57    0.63    0.38     0.21    0.42    0.16

Figure 2-13: Time for searching longest matching prefix. (A) Sun. (B) Pentium 700MHz. (C) Pentium 1.4GHz

Figure 2-14: Time for inserting a prefix. (A) Sun. (B) Pentium 700MHz. (C) Pentium 1.4GHz

Figure 2-15: Time for deleting a prefix. (A) Sun. (B) Pentium 700MHz. (C) Pentium 1.4GHz

2.5.2 Nonintersecting Ranges

To benchmark our algorithm for nonintersecting ranges (Section 2.3), we generated three different sets of random nonintersecting ranges (see footnote 1). These, respectively, had 30000, 50000, and 80000 ranges. Table 2-5 gives the memory requirement as well as the mean times and standard deviations for search, insert, and delete. The run times are for our 700MHz Pentium III PC. The search, insert, and delete experiments were modeled after those conducted for the case of prefix databases.

Footnote 1: We resorted to randomly generated data sets because no benchmark data for nonintersecting ranges was available.

Table 2-5: Nonintersecting Ranges. 700MHz PIII

Num of Ranges             30000   50000   80000
Memory Usage (KB)         3360    5600    8960
Search (μsec)     Mean    1.92    2.19    2.51
                  Std     0.15    0.04    0.06
Insert (μsec)     Mean    8.65    9.27    9.88
                  Std     0.49    0.29    0.17
Remove (μsec)     Mean    5.75    6.42    6.81
                  Std     0.44    0.28    0.14

2.5.3 Conflict-free Ranges

Table 2-6 gives the memory required as well as the mean times and standard deviations for the case of conflict-free ranges. The range sequence used is generated so that when the ranges are inserted in sequence order, there are no conflicts. For deletion, 33% of the ranges are removed in the reverse of the insert order.

Table 2-6: Conflict-free Ranges. PIII 700MHz with 256K L2 cache

Num of Ranges in R                30000   50000   80000
Num of Ranges in norm(R)  Mean    29688   48868   76472
                          Std     18.03   42.90   60.05
Memory Usage (KB)         Mean    6240    9979    15219
                          Std     7.06    10.91   11.19
Search (μsec)             Mean    1.98    2.34    2.69
                          Std     0.07    0.09    0.06
Insert (μsec)             Mean    18.45   19.65   20.76
                          Std     0.51    0.27    0.27
Remove (μsec)             Mean    19.3    20.49   21.60
                          Std     0.41    0.13    0.29

2.6 Conclusion

We have developed data structures for dynamic router tables. Our data structures permit one to search, insert, and delete in O(log n) time each. Although O(log n) time data structures for prefix tables were known prior to our work [21, 22], our data structure is more memory efficient than the data structures of Sahni et al. [21, 22]. Further, our data structure is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, our data structures are the first to permit O(log n) search, insert, and delete.

CHAPTER 3
DYNAMIC IP ROUTER TABLES USING HIGHEST-PRIORITY MATCHING

In this chapter, we focus on data structures for dynamic NHPRTs, HPPTs and LMPTs. In Section 3.2, we develop the data structure binary tree on binary tree (BOB). This data structure is proposed for the representation of dynamic NHPRTs. Using BOB, a lookup takes O(log^2 n) time and cache misses; a new rule may be inserted and an old one deleted in O(log n) time and cache misses. For HPPTs, we propose a modified version of BOB, called PBOB (prefix BOB), in Section 3.3. Using PBOB, a lookup, rule insertion and deletion each take O(W) time and cache misses. In Section 3.4, we develop the data structure LMPBOB (longest matching-prefix BOB) for LMPTs. Using LMPBOB, the longest matching-prefix may be found in O(W) time and O(log n) cache misses; rule insertion and deletion each take O(log n) time and cache misses. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in O(log n) time and with O(log n) cache misses. Section 3.1 introduces some terminology and experimental results are presented in Section 3.6.
3.1 Preliminaries

Definition 13 A range r = [u, v] is a pair of addresses u and v, u ≤ v. The range r represents the addresses {u, u + 1, ..., v}. start(r) = u is the start point of the range and finish(r) = v is the finish point of the range. The range r matches all addresses d such that u ≤ d ≤ v.

The start point of the range r = [3, 9] is 3 and its finish point is 9. This range matches the addresses {3, 4, 5, 6, 7, 8, 9}. In IPv4, u and v are up to 32 bits long, and in IPv6, u and v may be up to 128 bits long. The IPv4 prefix P = 0* corresponds to the range [0, 2^31 − 1]. The range [3, 9] does not correspond to any single IPv4 prefix. We may draw the range r = [u, v] = {u, u + 1, ..., v} as a horizontal line that begins at u and ends at v. Figure 2-1 shows ranges drawn in this fashion.

Notice that every prefix of a prefix router-table may be represented as a range. For example, when W = 6, the prefix P = 1101* matches addresses in the range [52, 55]. So, we say P = 1101* = [52, 55], start(P) = 52, and finish(P) = 55.

Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as ∩ and ⊂ when dealing with ranges. So, for example, [2, 6] ∩ [4, 8] = [4, 6]. Note that some operations between ranges may not yield a range. For example, [2, 6] ∪ [8, 10] = {2, 3, 4, 5, 6, 8, 9, 10}, which is not a range.

Definition 14 Let r = [u, v] and s = [x, y] be two ranges. Let overlap(r, s) = r ∩ s.

(a) The predicate disjoint(r, s) is true iff r and s are disjoint.

$$\mathrm{disjoint}(r, s) \iff \mathrm{overlap}(r, s) = \emptyset \iff v < x \lor y < u$$

Figure 2-1(A) shows the two cases for disjoint sets.

(b) The predicate nested(r, s) is true iff one of the ranges is contained within the other.

$$\mathrm{nested}(r, s) \iff \mathrm{overlap}(r, s) = r \lor \mathrm{overlap}(r, s) = s \iff r \subseteq s \lor s \subseteq r \iff x \le u \le v \le y \lor u \le x \le y \le v$$

Figure 2-1(B) shows the two cases for nested sets.

(c) The predicate intersect(r, s) is true iff r and s have a nonempty intersection that is different from both r and s.

$$\mathrm{intersect}(r, s) \iff r \cap s \ne \emptyset \land r \cap s \ne r \land r \cap s \ne s \iff \lnot\mathrm{disjoint}(r, s) \land \lnot\mathrm{nested}(r, s) \iff u < x \le v < y \lor x < u \le y < v$$

Figure 2-1(C) shows the two cases for ranges that intersect. Notice that overlap(r, s) = [x, v] when u < x ≤ v < y and overlap(r, s) = [u, y] when x < u ≤ y < v.

[2, 4] and [6, 9] are disjoint; [2, 4] and [3, 4] are nested; [2, 4] and [2, 2] are nested; [2, 8] and [4, 6] are nested; [2, 4] and [4, 6] intersect; and [3, 8] and [2, 4] intersect. [4, 4] is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4].

Lemma 24 Let r and s be two ranges. Exactly one of the following is true.
1. disjoint(r, s)
2. nested(r, s)
3. intersect(r, s)

Proof Straightforward. ∎

Definition 15 The range set R is nonintersecting iff disjoint(r, s) ∨ nested(r, s) for every pair of ranges r and s ∈ R.

Definition 16 The range r is more specific than the range s iff r ⊂ s.

[2, 4] is more specific than [1, 6], and [5, 9] is more specific than [5, 12]. Since [2, 4] and [8, 14] are disjoint, neither is more specific than the other. Also, since [4, 14] and [6, 20] intersect, neither is more specific than the other.

Definition 17 Let R be a range set. ranges(d, R) (or simply ranges(d) when R is implicit) is the subset of ranges of R that match the destination address d. msr(d, R) (or msr(d)) is the most specific range of R that matches d. That is, msr(d) is the most specific range in ranges(d). msr([u, v], R) = msr(u, v, R) = r iff msr(d, R) = r, u ≤ d ≤ v. When R is implicit, we write msr(u, v) and msr([u, v]) in place of msr(u, v, R) and msr([u, v], R). hpr(d) is the highest-priority range in ranges(d). We assume that ranges are assigned priorities in such a way that hpr(d) is uniquely defined for every d.

When R = {[2, 4], [1, 6]}, ranges(3) = {[2, 4], [1, 6]}, msr(3) = [2, 4], msr(1) = [1, 6], msr(7) = ∅, and msr(5, 6) = [1, 6].
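The predicates of Definition 14 and the msr function of Definition 17 translate almost directly into code. The following C++ sketch is illustrative only: the Range struct, the function names, and the brute-force scan in msr are assumptions made for this example and are not the data structures developed in this chapter.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// A closed range [u, v] of addresses with u <= v (Definition 13).
struct Range {
    std::uint32_t u, v;
    bool operator==(const Range& o) const { return u == o.u && v == o.v; }
};

// r matches d iff u <= d <= v.
bool matches(const Range& r, std::uint32_t d) { return r.u <= d && d <= r.v; }

// Definition 14(a): no common address.
bool disjoint(const Range& r, const Range& s) { return r.v < s.u || s.v < r.u; }

// Definition 14(b): one range is contained within the other.
bool nested(const Range& r, const Range& s) {
    return (s.u <= r.u && r.v <= s.v) || (r.u <= s.u && s.v <= r.v);
}

// Definition 14(c): the ranges overlap without either containing the other.
bool intersect(const Range& r, const Range& s) {
    return !disjoint(r, s) && !nested(r, s);
}

// overlap(r, s) = r ∩ s; an empty optional stands for the empty set.
std::optional<Range> overlap(const Range& r, const Range& s) {
    if (disjoint(r, s)) return std::nullopt;
    return Range{std::max(r.u, s.u), std::min(r.v, s.v)};
}

// Brute-force msr(d, R): the matching range contained in every other
// matching range, or an empty optional when no such range exists.
std::optional<Range> msr(std::uint32_t d, const std::vector<Range>& R) {
    std::optional<Range> best;
    for (const Range& r : R)
        if (matches(r, d) && (!best || r.v - r.u < best->v - best->u)) best = r;
    if (!best) return std::nullopt;
    for (const Range& r : R)  // the narrowest match must lie inside all others
        if (matches(r, d) && !(r.u <= best->u && best->v <= r.v)) return std::nullopt;
    return best;
}
```

The linear scan costs O(|R|) per query; it is useful only as a reference against which faster structures can be checked.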
When R = {[4, 14], [6, 20], [6, 14], [8, 12]}, msr(4, 5) = [4, 14], msr(6, 7) = [6, 14], msr(8, 12) = [8, 12], msr(13, 14) = [6, 14], and msr(15, 20) = [6, 20].

Definition 18 Let r and s be two ranges.

$$r < s \iff \mathrm{start}(r) < \mathrm{start}(s) \lor (\mathrm{start}(r) = \mathrm{start}(s) \land \mathrm{finish}(r) > \mathrm{finish}(s))$$

Note that for every pair, r and s, of different ranges, either r < s or s < r.

Lemma 25 Let R be a nonintersecting range set. If r ∩ s ≠ ∅ for r, s ∈ R, then the following are true:
1. start(r) < start(s) ⟹ finish(r) ≥ finish(s).
2. finish(r) > finish(s) ⟹ start(r) ≤ start(s).

Proof Straightforward. ∎

3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs): BOB

3.2.1 The Data Structure

The data structure binary tree on binary tree (BOB) that is being proposed here for NHRTs comprises a single balanced binary search tree at the top level. This top-level balanced binary search tree is called the point search tree (PTST). For an n-rule NHRT, the PTST has at most 2n nodes (we call this the PTST size constraint). The size constraint is necessary to enable O(log n) update. With each node z of the PTST, we associate a point, point(z). The PTST is a standard red-black binary search tree (actually, any binary search tree structure that supports efficient search, insert, and delete may be used) on the point(z) values of its node set [24]. That is, for every node z of the PTST, nodes in the left subtree of z have smaller point values than point(z), and nodes in the right subtree of z have larger point values than point(z).

Let R be the set of nonintersecting ranges of the NHRT. Each range of R is stored in exactly one of the nodes of the PTST. More specifically, the root of the PTST stores all ranges r ∈ R such that start(r) ≤ point(root) ≤ finish(r); all ranges r ∈ R such that finish(r) < point(root) are stored in the left subtree of the root; all ranges r ∈ R such that point(root) < start(r) (i.e., the remaining ranges of R) are stored in the right subtree of the root. The ranges allocated to the left and right subtrees of the root are allocated to nodes in these subtrees using the just stated range allocation rule recursively. Note that the range allocation rule is quite similar to that used for interval trees [40].

For the range allocation rule to successfully allocate every r ∈ R to exactly one node of the PTST, the PTST must have at least one node z for which start(r) ≤ point(z) ≤ finish(r). Table 3-1 gives an example set of nonintersecting ranges, and Figure 3-1 shows a possible PTST for this set of ranges (we say possible, because we haven't specified how to select the point(z) values and, even with specified point(z) values, the corresponding red-black tree isn't unique). The number inside each node is point(z), and outside each node, we give ranges(z).

[Figure 3-1: A possible PTST]

Table 3-1: A nonintersecting range set

range        priority
[2, 100]     4
[2, 4]       33
[2, 3]       34
[8, 68]      10
[8, 50]      9
[10, 50]     20
[10, 35]     3
[15, 33]     5
[16, 30]     30
[54, 66]     18
[60, 65]     7
[69, 72]     10
[80, 80]     12

Let ranges(z) be the subset of ranges of R allocated to node z of the PTST.¹ Since the PTST may have as many as 2n nodes and since each range of R is in exactly one of the sets ranges(z), some of the ranges(z) sets may be empty.

The ranges in ranges(z) may be ordered using the < relation of Definition 18. Using this < relation, we put the ranges of ranges(z) into a red-black tree (any balanced binary search tree structure that supports efficient search, insert, delete, join, and split may be used) called the range search-tree or RST(z). Each node x of RST(z) stores exactly one range of ranges(z). We refer to this range as range(x).
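The range allocation rule and the ordering of Definition 18 can be tried out on the ranges of Table 3-1. The sketch below is a simplified illustration, not the thesis implementation: the PTST is a plain unbalanced binary search tree, a sorted std::vector stands in for each RST(z), and the point values 70, 30, 80, 2, and 65 are assumed for the example. All type and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Range { unsigned start, finish; int priority; };

// Definition 18: r < s iff start(r) < start(s), or the start points are
// equal and finish(r) > finish(s).
bool before(const Range& r, const Range& s) {
    return r.start < s.start || (r.start == s.start && r.finish > s.finish);
}

struct PtstNode {
    unsigned point;
    std::vector<Range> ranges;  // stand-in for RST(z), kept sorted by `before`
    std::unique_ptr<PtstNode> left, right;
    explicit PtstNode(unsigned p) : point(p) {}
};

// Plain BST insertion of a point (no red-black balancing in this sketch).
void addPoint(std::unique_ptr<PtstNode>& z, unsigned p) {
    if (!z) z = std::make_unique<PtstNode>(p);
    else if (p < z->point) addPoint(z->left, p);
    else if (p > z->point) addPoint(z->right, p);
}

// Range allocation rule: store r in the node nearest the root whose point
// r matches. Returns that node, or nullptr when r matches no point on the path.
PtstNode* allocate(PtstNode* z, const Range& r) {
    while (z) {
        if (r.finish < z->point) z = z->left.get();
        else if (r.start > z->point) z = z->right.get();
        else {
            auto pos = std::upper_bound(z->ranges.begin(), z->ranges.end(), r, before);
            z->ranges.insert(pos, r);
            return z;
        }
    }
    return nullptr;
}

// Locates the PTST node with the given point, or nullptr.
const PtstNode* findNode(const PtstNode* z, unsigned p) {
    while (z && z->point != p) z = (p < z->point ? z->left : z->right).get();
    return z;
}

// Builds the example: the ranges of Table 3-1 over the assumed points.
std::unique_ptr<PtstNode> buildExample() {
    std::unique_ptr<PtstNode> root;
    for (unsigned p : {70u, 30u, 80u, 2u, 65u}) addPoint(root, p);
    const std::vector<Range> table = {
        {2, 100, 4}, {2, 4, 33},  {2, 3, 34},   {8, 68, 10},  {8, 50, 9},
        {10, 50, 20}, {10, 35, 3}, {15, 33, 5}, {16, 30, 30}, {54, 66, 18},
        {60, 65, 7},  {69, 72, 10}, {80, 80, 12}};
    for (const Range& r : table) allocate(root.get(), r);
    return root;
}
```

With these assumed points, the node with point 30 receives the six ranges from [8, 68] through [16, 30], which are the ranges that appear in the RST example for ranges(30).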
Every node y in the left (right) subtree of node x of RST(z) has range(y) < range(x) (range(y) > range(x)). In addition, each node x stores the quantity mp(x), which is the maximum of the priorities of the ranges associated with the nodes in the subtree rooted at x. mp(x) may be defined recursively as below.

$$mp(x) = \begin{cases} p(x) & \text{if } x \text{ is a leaf} \\ \max\{mp(\mathrm{leftChild}(x)),\ mp(\mathrm{rightChild}(x)),\ p(x)\} & \text{otherwise} \end{cases}$$

where p(x) = priority(range(x)).

¹ We have overloaded the function ranges. When u is a node, ranges(u) refers to the ranges stored in node u of a PTST; when u is a destination address, ranges(u) refers to the ranges that match u.

Figure 3-2 gives a possible RST structure for ranges(30) of Figure 3-1. Each node shows (range(x), p(x), mp(x)).

[Figure 3-2: An example RST for ranges(30) of Figure 3-1. The root is ([10, 35], 3, 30). Its left child ([8, 50], 9, 20) has the children ([8, 68], 10, 10) and ([10, 50], 20, 20). Its right child ([15, 33], 5, 30) has the child ([16, 30], 30, 30).]

Lemma 26 Let z be a node in a PTST and let x be a node in RST(z). Let st(x) = start(range(x)) and fn(x) = finish(range(x)).
1. For every node y in the right subtree of x, st(y) ≥ st(x) and fn(y) ≤ fn(x).
2. For every node y in the left subtree of x, st(y) ≤ st(x) and fn(y) ≥ fn(x).

Proof For 1, we see that when y is in the right subtree of x, range(y) > range(x). From Definition 18, it follows that st(y) ≥ st(x). Further, since range(y) ∩ range(x) ≠ ∅, if st(y) > st(x), then fn(y) ≤ fn(x) (Lemma 25); if st(y) = st(x), fn(y) < fn(x) (Definition 18). The proof for 2 is similar. ∎

3.2.2 Search for hpr(d)

The highest-priority range that matches the destination address d may be found by following a path from the root of the PTST toward a leaf of the PTST. Figure 3-3 gives the algorithm. For simplicity, this algorithm finds hp = priority(hpr(d)) rather than hpr(d). The algorithm is easily modified to return hpr(d) instead.

Algorithm hp(d) {
    // return the priority of hpr(d)
    // easily extended to return hpr(d)
    hp = -1; // assuming 0 is the smallest priority value
    z = root; // root of PTST
    while (z != null) {
        if (d > point(z)) {
            RST(z)->hpRight(d, hp);
            z = rightChild(z);
        }
        else if (d < point(z)) {
            RST(z)->hpLeft(d, hp);
            z = leftChild(z);
        }
        else // d == point(z)
            return max{hp, mp(RST(z)->root)};
    }
    return hp;
}

Figure 3-3: Algorithm to find priority(hpr(d))

We begin by initializing hp = -1, and z is set to the root of the PTST. This initialization assumes that all priorities are ≥ 0. The variable z is used to follow a path from the root toward a leaf. When d > point(z), d may be matched only by ranges in RST(z) and those in the right subtree of z. The method RST(z)->hpRight(d, hp) (Figure 3-4) updates hp to reflect any matching ranges in RST(z). This method makes use of the fact that d > point(z). Consider a node x of RST(z). If d > fn(x), then d is to the right (i.e., d > finish(range(x))) of range(x) and also to the right of all ranges in the right subtree of x. Hence, we may proceed to examine the ranges in the left subtree of x. When d ≤ fn(x), range(x) as well as all ranges in the left subtree of x match d. Additional matching ranges may be present in the right subtree of x. hpLeft(d, hp) is the analogous method for the case when d < point(z).

Algorithm hpRight(d, hp) {
    // update hp to account for any ranges in RST(z) that match d
    // d > point(z)
    x = root; // root of RST(z)
    while (x != null)
        if (d > fn(x))
            x = leftChild(x);
        else {
            hp = max{hp, p(x), mp(leftChild(x))};
            x = rightChild(x);
        }
}

Figure 3-4: Algorithm hpRight(d, hp)

Complexity The complexity of the invocation RST(z)->hpRight(d, hp) is readily seen to be O(height(RST(z))) = O(log n). Consequently, the complexity of hp(d) is O(log² n). To determine hpr(d) we need only add code to the methods hp(d), hpRight(d, hp), and hpLeft(d, hp) so as to keep track of the range whose priority is the current value of hp.
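To make the two-level search concrete, the following C++ sketch implements hp(d), hpRight, and a mirror-image hpLeft on unbalanced trees. It is a minimal illustration under stated assumptions: there is no red-black balancing, the hpLeft body is derived here from Lemma 26 (the text only calls it analogous), and ranges are placed with a simplified version of the insertion walk of Section 3.2.3 using point = start(r) for new nodes. All names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Range { unsigned st, fn; int p; };  // [st, fn] with priority p >= 0

struct RstNode {                            // node of RST(z)
    Range r;
    int mp;                                 // maximum priority in this subtree
    std::unique_ptr<RstNode> left, right;
    explicit RstNode(const Range& x) : r(x), mp(x.p) {}
};

struct PtstNode {
    unsigned point;
    std::unique_ptr<RstNode> rst;           // root of RST(z), may be empty
    std::unique_ptr<PtstNode> left, right;
    explicit PtstNode(unsigned pt) : point(pt) {}
};

int mpOf(const std::unique_ptr<RstNode>& x) { return x ? x->mp : -1; }

// Ordering of Definition 18.
bool before(const Range& a, const Range& b) {
    return a.st < b.st || (a.st == b.st && a.fn > b.fn);
}

// Unbalanced BST insert into an RST, refreshing mp along the insert path.
void rstInsert(std::unique_ptr<RstNode>& x, const Range& r) {
    if (!x) { x = std::make_unique<RstNode>(r); return; }
    x->mp = std::max(x->mp, r.p);
    rstInsert(before(r, x->r) ? x->left : x->right, r);
}

// Figure 3-4; the caller guarantees d > point(z).
void hpRight(const RstNode* x, unsigned d, int& hp) {
    while (x) {
        if (d > x->r.fn) x = x->left.get();
        else { hp = std::max({hp, x->r.p, mpOf(x->left)}); x = x->right.get(); }
    }
}

// Assumed mirror image for d < point(z): compare against start points.
void hpLeft(const RstNode* x, unsigned d, int& hp) {
    while (x) {
        if (d < x->r.st) x = x->left.get();
        else { hp = std::max({hp, x->r.p, mpOf(x->left)}); x = x->right.get(); }
    }
}

// Simplified insertion walk without rebalancing.
void insert(std::unique_ptr<PtstNode>& root, const Range& r) {
    std::unique_ptr<PtstNode>* z = &root;
    while (*z) {
        if (r.fn < (*z)->point) z = &(*z)->left;
        else if (r.st > (*z)->point) z = &(*z)->right;
        else { rstInsert((*z)->rst, r); return; }
    }
    *z = std::make_unique<PtstNode>(r.st);
    rstInsert((*z)->rst, r);
}

// Figure 3-3: priority of hpr(d), or -1 when no range matches d.
int hp(const PtstNode* z, unsigned d) {
    int best = -1;
    while (z) {
        if (d > z->point) { hpRight(z->rst.get(), d, best); z = z->right.get(); }
        else if (d < z->point) { hpLeft(z->rst.get(), d, best); z = z->left.get(); }
        else return std::max(best, mpOf(z->rst));
    }
    return best;
}

// Reference answer by linear scan, used only to check the sketch.
int hpBruteForce(const std::vector<Range>& R, unsigned d) {
    int best = -1;
    for (const Range& r : R)
        if (r.st <= d && d <= r.fn) best = std::max(best, r.p);
    return best;
}
```

Because nothing is rebalanced, the trees in this sketch can degenerate into chains; the logarithmic bounds quoted in the text rely on the red-black balancing that the sketch omits.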
So, hpr(d) may be found in O(log² n) time also.

3.2.3 Insert a Range

A range r that is known to have no intersection with any of the existing ranges in the router table may be inserted using the algorithm of Figure 3-5. In the while loop, we find the node z nearest to the root such that r matches point(z) (i.e., start(r) ≤ point(z) ≤ finish(r)). If such a z exists, the range r is inserted into RST(z) using the standard red-black insertion algorithm [24]. During this insertion, it is necessary to update some of the mp values on the insert path. This update is done easily. In case the PTST has no z such that r matches point(z), we insert a new node into the PTST. This insertion is done using the method insertNewNode.

Algorithm insert(r) {
    // insert the nonintersecting range r
    z = root; // root of PTST
    while (z != null)
        if (finish(r) < point(z))
            z = leftChild(z);
        else if (start(r) > point(z))
            z = rightChild(z);
        else { // r matches point(z)
            RST(z)->insert(r);
            return;
        }
    // there is no node z such that r matches point(z)
    // insert a new node into PTST
    insertNewNode(r);
}

Figure 3-5: Algorithm to insert a nonintersecting range

To insert a new node into the PTST, we first create a new PTST node y and define point(y) and RST(y). point(y) may be set to be any destination address matched by r (i.e., any address such that start(r) ≤ point(y) ≤ finish(r) may be used). In our implementation, we use point(y) = start(r). RST(y) has only a root node and this root contains r; its mp value is priority(r). If the PTST is currently empty, y becomes the new root and we are done. Otherwise, the new node y may be inserted where the search conducted in the while loop of Figure 3-5 terminated, that is, as a child of the last non-null value of z. Following this insertion, the traditional bottom-up red-black rebalancing pass is made [24]. This rebalancing pass may require color changes and at most one rotation. Color changes do not affect the tree structure.
However, a rebalancing rotation, if performed, affects the tree structure and may lead to a violation of the range allocation rule. Rebalancing rotations are investigated in the next section. We note that if the number of nodes in the PTST was at most 2|R|, where |R| is the number of ranges prior to the insertion of a new range r, then following the insertion, |PTST| ≤ 2|R| + 1 < 2(|R| + 1), where |PTST| is the number of nodes in the PTST and |R| + 1 is the number of ranges following the insertion of r. Hence an insert does not violate the PTST size constraint.

Complexity Exclusive of the time required to perform the tasks associated with a rebalancing rotation, the time required to insert a range is O(height(PTST)) = O(log n). As we shall see in the next section, a rebalancing rotation can be done in O(log n) time. Since at most one rebalancing rotation is needed following an insert, the time to insert a range is O(log n). In case it is necessary for us to verify that the range to be inserted does not intersect an existing range, we can augment the PTST with priority search trees as in [34] and use these trees for intersection detection. The overall complexity of an insert remains O(log n).

3.2.4 Red-Black-Tree Rotations

Figures 3-6 and 3-7, respectively, show the red-black LL and RR rotations used to rebalance a red-black tree following an insert or delete (see [24]). In these figures, pt() is an abbreviation for point(). Since the remaining rotation types, LR and RL, may, respectively, be viewed as an RR rotation followed by an LL rotation and an LL rotation followed by an RR rotation, it suffices to examine LL and RR rotations alone.

[Figure 3-6: LL rotation]

[Figure 3-7: RR rotation]

Lemma 27 Let R be a set of nonintersecting ranges. Let ranges(z) ⊆ R be the ranges allocated by the range allocation rule to node z of the PTST prior to an LL or RR rotation. Let ranges'(z) be this subset for the PTST node z following the rotation. ranges(z) = ranges'(z) for all nodes z in the subtrees a, b, and c of Figures 3-6 and 3-7.

Proof Consider an LL rotation. Let ranges(subtree(x)) be the union of the ranges allocated to the nodes in the subtree whose root is x. Since the range allocation rule allocates each range r to the node z nearest the root such that r matches point(z), ranges(subtree(x)) = ranges'(subtree(y)). Further, r ∈ ranges(a) iff r ∈ ranges(subtree(x)) and finish(r) < point(y). Consequently, r ∈ ranges'(a). From this and the fact that the LL rotation doesn't change the positioning of nodes in a, it follows that for every node z in the subtree a, ranges(z) = ranges'(z). The proof for the nodes in b and c as well as for the RR rotation is similar. ∎

Let x and y be as in Figures 3-6 and 3-7. From Lemma 27, it follows that ranges(z) = ranges'(z) for all z in the PTST except possibly for z ∈ {x, y}. It is not too difficult to see that ranges'(y) = ranges(y) ∪ S and ranges'(x) = ranges(x) − S, where

$$S = \{\, r \mid r \in \mathrm{ranges}(x) \land \mathrm{start}(r) \le \mathrm{point}(y) \le \mathrm{finish}(r) \,\}$$

Since we are dealing with a set of nonintersecting ranges, all ranges in ranges(y) are nested within the ranges of S. Figure 3-8 shows the ranges of ranges(x) using solid lines and those of ranges(y) using broken lines. S is the set of ranges drawn above ranges(y) (i.e., the solid lines above the broken lines).

The range rMax of S with largest start() value may be found by searching RST(x) for the range with largest start() value that matches point(y). (Note that rMax = msr(point(y), ranges(x)).) Since RST(x) is a binary search tree of an ordered set of ranges (Definition 18), rMax may be found in O(height(RST(x))) time by following a path from the root downward. If rMax doesn't exist, S = ∅, ranges'(x) = ranges(x) and ranges'(y) = ranges(y).
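The effect of a rotation on ranges(x) and ranges(y) can be illustrated with sorted vectors standing in for RST(x) and RST(y). This is a sketch under stated assumptions, not the thesis code: a std::vector sorted by Definition 18 replaces each red-black tree, so moving S costs linear rather than logarithmic time, and the function name moveSOnRotation is hypothetical. It relies on the observation that the ranges of ranges(x) all match point(x) and are therefore pairwise nested, so in Definition 18 order they run from outermost to innermost and the members of S form a leading run that ends at rMax.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Range { unsigned st, fn; };

// Ordering of Definition 18.
bool before(const Range& a, const Range& b) {
    return a.st < b.st || (a.st == b.st && a.fn > b.fn);
}

// Moves S = { r in rangesX : st(r) <= pointY <= fn(r) } from rangesX to the
// front of rangesY and returns |S|. Both vectors must be sorted by `before`.
// When |S| > 0, rMax is the last range moved, i.e. rangesY[|S| - 1] afterwards.
std::size_t moveSOnRotation(std::vector<Range>& rangesX,
                            std::vector<Range>& rangesY,
                            unsigned pointY) {
    // The matching ranges are the outermost ones, so they precede all others.
    auto firstMiss = std::partition_point(
        rangesX.begin(), rangesX.end(),
        [pointY](const Range& r) { return r.st <= pointY && pointY <= r.fn; });
    const std::size_t count =
        static_cast<std::size_t>(firstMiss - rangesX.begin());
    // ranges'(y) = S followed by ranges(y): every range of ranges(y) is nested
    // within the ranges of S, so it is larger in the Definition 18 order.
    rangesY.insert(rangesY.begin(), rangesX.begin(), firstMiss);
    // ranges'(x) = ranges(x) - S.
    rangesX.erase(rangesX.begin(), firstMiss);
    return count;
}
```

For instance, taking ranges(x) = {[2, 100], [69, 72]} with point(y) = 30 moves only [2, 100], which is then also rMax.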
[Figure 3-8: ranges(x) and ranges(y) for LL and RR rotations. Nodes x and y are as in Figures 3-6 and 3-7]

Assume that rMax exists. We may use the split operation [24] to extract from RST(x) the ranges that belong to S. The operation RST(x)->split(small, rMax, big) separates RST(x) into an RST small of ranges < rMax (Definition 18) and an RST big of ranges > rMax. We see that RST'(x) = big and RST'(y) = join(small, rMax, RST(y)), where join [24] combines the red-black tree small with ranges < rMax, the range rMax, and the red-black tree RST(y) with ranges > rMax into a single red-black tree.

The standard split and join operations of Horowitz et al. [24] need to be modified slightly so as to update the mp values of affected nodes. This modification doesn't affect the asymptotic complexity of the split and join operations, which is logarithmic in the number of nodes in the tree being split or logarithmic in the sum of the number of nodes in the two trees being joined. So, the complexity of performing an LL or RR rotation (and hence of performing an LR or RL rotation) in the PTST is O(log n).

3.2.5 Delete a Range

Figure 3-9 gives our algorithm to delete a range r. Note that if r is one of the ranges in the PTST, then r must be in the RST of the node z that is closest to the root and such that r matches point(z). The while loop of Figure 3-9 finds this z and deletes r from RST(z).

Algorithm delete(r) {
    // delete the range r
    z = root; // root of PTST
    while (z != null)
        if (finish(r) < point(z))
            z = leftChild(z);
        else if (start(r) > point(z))
            z = rightChild(z);
        else { // r matches point(z)
            RST(z)->delete(r);
            cleanup(z);
            return;
        }
}

Figure 3-9: Algorithm to delete a range

Assume that r is, in fact, one of the ranges in our PTST. To delete r from RST(z), we use the standard red-black deletion algorithm [24] modified to update mp values as necessary.
Following the deletion of r from RST(z), we perform a cleanup operation that is necessary to maintain the size constraint of the PTST. Figure 3-10 gives the steps in the method cleanup.

Algorithm cleanup(z) {
    // maintain size constraint
    if (RST(z) is empty and the degree of z is 0 or 1)
        delete node z from the PTST and rebalance;
    while (|PTST| > 2|R|)
        delete a degree 0 or degree 1 node z with empty RST(z)
        from the PTST and rebalance;
}

Figure 3-10: Algorithm to maintain size constraint following a delete

Notice that following the deletion of r from RST(z), RST(z) may or may not be empty. If RST(z) becomes empty and the degree of node z is either 0 or 1, node z is deleted from the PTST using the standard red-black node deletion algorithm [24]. If this deletion requires a rotation (at most one rotation may be required), the rotation is done as described in Section 3.2.4. Since the number of ranges and nodes has each decreased by 1, the size constraint may be violated (this happens if |PTST| = 2|R| prior to the delete). Hence, it may be necessary to remove a node from the PTST to restore the size constraint.

If RST(z) becomes empty and the degree of z is 2, or if RST(z) does not become empty, z is not deleted from the PTST. Now, |PTST| is unchanged by the deletion of r and |R| reduces by 1. Again, it is possible that we have a size constraint violation. If so, up to two nodes may have to be removed from the PTST to restore the size constraint.

The size constraint, if violated, is restored in the while loop of Figure 3-10. This restoration is done by removing one or two (as needed) degree 0 or degree 1 nodes that have an empty RST. Lemma 28 shows that whenever the size constraint is violated, the PTST has at least one degree 0 or degree 1 node with an empty RST. So, the node z needed for deletion in each iteration of the while loop always exists.
Lemma 28 When the PTST has > 2n nodes, where n = |R|, the PTST has at least one degree 0 or degree 1 node that has an empty RST.

Proof Suppose not. Then the degree of every node that has an empty RST is 2. Let n_2 be the total number of degree 2 nodes, n_1 the total number of degree 1 nodes, n_0 the total number of degree 0 nodes, n_e the total number of nodes that have an empty RST, and n_n the total number of nodes that have a nonempty RST. Since all PTST nodes that have an empty RST are degree 2 nodes, n_2 ≥ n_e. Further, since there are only n ranges and each range is stored in exactly one RST, there are at most n nodes that have a nonempty RST, i.e., n ≥ n_n. Thus n_2 + n ≥ n_e + n_n = |PTST|, i.e., n_2 ≥ |PTST| − n. From [24], we know that n_0 = n_2 + 1. Hence,

$$n_0 + n_1 + n_2 = n_2 + 1 + n_1 + n_2 > n_2 + n_2 \ge 2|PTST| - 2n > |PTST|.$$

This contradicts n_0 + n_1 + n_2 = |PTST|. ∎

To find the degree 0 and degree 1 nodes that have an empty RST efficiently, we maintain a doubly-linked list of these nodes. Also, a doubly-linked list of degree 2 nodes that have an empty RST is maintained. When a range is inserted or deleted, PTST nodes may be added to or removed from these doubly-linked lists, and nodes may move from one list to another. The required operations can be done in O(1) time each.

Complexity It takes O(log n) time to find the PTST node z that contains the range r that is to be deleted. Another O(log n) time is needed to delete r from RST(z). The cleanup step removes up to 2 nodes from the PTST. This takes another O(log n) time. So, the overall delete time is O(log n).

3.2.6 Expected Complexity of BOB

Let maxR be the maximum number of ranges that match any destination address. So, |ranges(z)| = |RST(z)| ≤ maxR for every node z of the PTST. We may, therefore, restate the complexity of the BOB operations (lookup, insert, delete) as O(log n log maxR), O(log n), and O(log n), respectively.

Sahni et al. [21] have analyzed the prefixes in several real IPv4 prefix router-tables.
They report that a destination address is matched by about 1 prefix on average; the maximum number of prefixes that match a destination address is at most 6. Making the assumption that this analysis holds true even for real range router-tables (no data is available for us to perform such an analysis), we conclude that maxR ≤ 6. So, the expected complexity of BOB on real router-tables is O(log n) per operation.

3.3 Highest-Priority Prefix-Tables (HPPTs): PBOB

3.3.1 The Data Structure

When all rule filters are prefixes, maxR ≤ min{n, W}. Hence, if BOB is used to represent an HPPT, the search complexity is O(log n · min{log n, log W}); the insert and delete complexities are O(log n) each.

Since maxR ≤ 6 for real prefix router-tables, we may expect to see better performance using a simpler structure (i.e., a structure with smaller overhead and possibly worse asymptotic complexity) for ranges(z) than the RST structure described in Section 3.2. In PBOB, we replace the RST in each node, z, of the BOB PTST with an array linear list [41], ALL(z), of pairs of the form (pLength, priority), where pLength is a prefix length (i.e., number of bits) and priority is the prefix priority. ALL(z) has one pair for each range r ∈ ranges(z). The pLength value of this pair is the length of the prefix that corresponds to the range r and the priority value is the priority of the range r. The pairs in ALL(z) are in ascending order of pLength. Note that since the ranges in ranges(z) are nested and match point(z), the corresponding prefixes have different lengths.

3.3.2 Lookup

Figure 3-11 gives the algorithm to find the priority of the highest-priority prefix that matches the destination address d. The method maxp() returns the highest priority of any prefix in ALL(z) (note that all prefixes in ALL(z) match point(z)). The method searchALL(d, hp) examines the prefixes in ALL(z) and updates hp taking into account the priorities of those prefixes in ALL(z) that match d.
The method searchALL(d, hp) utilizes the following lemma. Consequently, it examines prefixes of ALL(z) in increasing order of length until either all prefixes have been examined or until the first (i.e., shortest) prefix that doesn't match d is examined.

Algorithm hp(d) {
    // return the priority of hpp(d)
    // easily extended to return hpp(d)
    hp = -1; // assuming 0 is the smallest priority value
    z = root; // root of PTST
    while (z != null) {
        if (d == point(z))
            return max{hp, ALL(z)->maxp()};
        ALL(z)->searchALL(d, hp);
        if (d < point(z))
            z = leftChild(z);
        else
            z = rightChild(z);
    }
    return hp;
}

Figure 3-11: Algorithm to find priority(hpp(d))

Lemma 29 If a prefix in ALL(z) doesn't match a destination address d, then no longer-length prefix in ALL(z) matches d.

Proof Let p_1 and p_2 be prefixes in ALL(z). Let l_i be the length of p_i. Assume that l_1 < l_2 and that p_1 doesn't match d. Since both p_1 and p_2 match point(z), p_2 is nested within p_1. Therefore, all destination addresses that are matched by p_2 are also matched by p_1. So, p_2 doesn't match d. ∎

One way to determine whether a length l prefix of ALL(z) matches d is to use the following lemma. The check of this lemma may be implemented using a mask to extract the most-significant bits of point(z) and d.

Lemma 30 A length l prefix p of ALL(z) matches d iff the most-significant l bits of point(z) and d are the same.

Proof Straightforward. ∎

Complexity We assume that the masking operations can be done in O(1) time each. (In IPv4, for example, each mask is 32 bits long and we may extract any subset of bits from a 32-bit integer by taking the logical and of the appropriate mask and the integer.) The number of PTST nodes reached in the while loop of Figure 3-11 is O(log n) and the time spent at each node z that is reached is linear in the number of prefixes in ALL(z) that match d.
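Lemmas 29 and 30 together give a very short inner loop. The following C++ sketch shows one way the mask test and searchALL could look for 32-bit IPv4 addresses. It is an illustration only: the names topMask, prefixMatches, and Pair, and the use of std::vector for ALL(z), are assumptions of the example rather than details taken from the thesis implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr int W = 32;  // IPv4 address width

// Mask that selects the `len` most significant of W bits (0 <= len <= W).
std::uint32_t topMask(int len) {
    // A shift by W would be undefined, so len == 0 is handled separately.
    return len == 0 ? 0u : (~std::uint32_t{0} << (W - len));
}

// Lemma 30: the length-len prefix taken from point(z) matches d iff
// point(z) and d agree on their len most significant bits.
bool prefixMatches(std::uint32_t point, int len, std::uint32_t d) {
    return ((point ^ d) & topMask(len)) == 0;
}

struct Pair { int pLength; int priority; };  // one entry of ALL(z)

// searchALL: entries are in ascending pLength order, so by Lemma 29 the scan
// may stop at the first prefix that does not match d.
void searchALL(const std::vector<Pair>& all, std::uint32_t point,
               std::uint32_t d, int& hp) {
    for (const Pair& e : all) {
        if (!prefixMatches(point, e.pLength, d)) break;
        if (e.priority > hp) hp = e.priority;
    }
}
```

As a usage example, with point(z) = 192.168.1.0 and prefixes of lengths 8, 16, and 24, the destination 192.168.255.1 matches the first two prefixes and the scan stops at the third.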
Since the PTST has at most maxR prefixes that match d, the complexity of our lookup algorithm is O(log n + maxR) = O(W) (note that log₂ n ≤ W and maxR ≤ W).

3.3.3 Insertion and Deletion

The PBOB algorithms to insert/delete a prefix are simple adaptations of the corresponding algorithms for BOB. rMax is found by examining the prefixes in ALL(x) in increasing order of length. ALL'(y) is obtained by prepending the prefixes in ALL(x) whose length is ≤ the length of rMax to ALL(y), and ALL'(x) is obtained from ALL(x) by removing the prefixes whose length is ≤ the length of rMax. The time required to find rMax is O(maxR). This is also the time required to compute ALL'(y) and ALL'(x). The overall complexity of an insert/delete operation is O(log n + maxR) = O(W).

As noted earlier, maxR ≤ 6 in practice. So, in practice, PBOB takes O(log n) time and makes O(log n) cache misses per operation.

3.4 Longest-Matching Prefix-Tables (LMPTs): LMPBOB

3.4.1 The Data Structure

Using priority = pLength, a PBOB may be used to represent an LMPT, obtaining the same performance as for an HPPT. However, we may achieve some reduction in the memory required by the data structure if we replace the array linear list that is stored in each node of the PTST by a W-bit vector, bit. bit(z)[i] denotes the ith bit of the bit vector stored in node z of the PTST; bit(z)[i] = 1 iff ALL(z) has a prefix whose length is i. We note that Suri et al. [20] use W-bit vectors to keep track of prefix lengths in their data structure also.

3.4.2 Lookup

Figure 3-12 gives the algorithm to find the length of the longest matching-prefix, lmp(d), for destination d. The method longest() returns the largest i such that bit(z)[i] = 1 (i.e., it returns the length of the longest prefix stored in node z). The method searchBitVector(d, hp, k) examines bit(z) and updates hp taking into account the lengths of those prefixes in this bit vector that match d.
The method same(k+1, point(z), d) returns true iff point(z) and d agree on their k + 1 most significant bits.

Algorithm lmp(d) {
    // return the length of lmp(d)
    // easily extended to return lmp(d)
    hp = 0; // length of lmp
    k = 0; // next bit position to examine is k+1
    z = root; // root of PTST
    while (z != null) {
        if (d == point(z))
            return max{k, z->longest()};
        bit(z)->searchBitVector(d, hp, k);
        if (d < point(z))
            z = leftChild(z);
        else
            z = rightChild(z);
    }
    return hp;
}

Figure 3-12: Algorithm to find length(lmp(d))

Algorithm searchBitVector(d, hp, k) {
    // update hp and k
    while (k < W && same(k+1, point(z), d)) {
        if (bit(z)[k+1] == 1)
            hp = k+1;
        k++;
    }
}

Figure 3-13: Algorithm to search a bit vector for prefixes that match d

The method searchBitVector(d, hp, k) (Figure 3-13) utilizes the next two lemmas.

Lemma 31 If bit(z)[i] corresponds to a prefix that doesn't match the destination address d, then bit(z)[j], j > i, corresponds to a prefix that doesn't match d.

Proof bit(z)[q] corresponds to the prefix p_q whose length is q and which equals the q most significant bits of point(z). So, p_i matches all points that are matched by p_j. Hence, if p_i doesn't match d, p_j doesn't match d either. ∎

Lemma 32 Let w and z be two nodes in a PTST such that w is a descendant of z. Suppose that z->bit(q) corresponds to a prefix p_q that matches d. Then w->bit(j), j < q, cannot correspond to a prefix that matches d.

Proof Suppose that w->bit(j) corresponds to the prefix p_j, p_j matches d, and j < q. So, p_j equals the j most significant bits of d. Since p_q matches d and also point(z), d and point(z) have the same q most significant bits. Therefore, p_j matches point(z). So, by the range allocation rule, p_j should be stored in node z and not in node w, a contradiction. ∎

Complexity We assume that the method same can be implemented using masks and Boolean operations so as to have complexity O(1).
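A mask-based same and the bit-vector search can be sketched in C++ for W = 32 as follows. Several choices here are assumptions made for the example and are not taken from the thesis: the bit vector is a std::uint32_t in which bit position i − 1 records a stored prefix of length i, the tree is linked by hand without balancing, and in the d == point(z) case the sketch returns max{hp, longest()} so that a node with an empty bit vector reports the best length found so far.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <memory>

constexpr int W = 32;

// same(k, a, b): true iff a and b agree on their k most significant bits.
bool same(int k, std::uint32_t a, std::uint32_t b) {
    if (k == 0) return true;                        // avoids a shift by W
    const std::uint32_t mask = ~std::uint32_t{0} << (W - k);
    return ((a ^ b) & mask) == 0;
}

struct Node {
    std::uint32_t point;
    std::uint32_t bits = 0;  // bit (i - 1) set iff a length-i prefix is stored
    std::unique_ptr<Node> left, right;
    explicit Node(std::uint32_t p) : point(p) {}
    bool has(int len) const { return ((bits >> (len - 1)) & 1u) != 0; }
    void set(int len) { bits |= std::uint32_t{1} << (len - 1); }
    // Length of the longest prefix stored in this node, or 0 if none.
    int longest() const {
        int i = W;
        while (i > 0 && !has(i)) --i;
        return i;
    }
};

// Figure 3-13: advance k while point(z) and d keep agreeing, recording in hp
// the length of every stored prefix passed on the way.
void searchBitVector(const Node& z, std::uint32_t d, int& hp, int& k) {
    while (k < W && same(k + 1, z.point, d)) {
        if (z.has(k + 1)) hp = k + 1;
        ++k;
    }
}

// Figure 3-12, with the assumed return value noted in the lead-in.
int lmp(const Node* z, std::uint32_t d) {
    int hp = 0, k = 0;
    while (z) {
        if (d == z->point) return std::max(hp, z->longest());
        searchBitVector(*z, d, hp, k);
        z = d < z->point ? z->left.get() : z->right.get();
    }
    return hp;
}
```

Because k never decreases across nodes, the total number of bit positions examined over one lookup is at most W, in addition to one failed comparison per node visited.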
Since a bit vector has the same number of bits as does a destination address, this assumption is consistent with the implicit assumption that arithmetic on destination addresses takes O(1) time. The total time spent in all invocations of searchBitVector is O(W + log n). The time spent in the remaining steps of lmp(d) is O(log n). So, the overall complexity of lmp(d) is O(W + log n) = O(W). Even though the time complexity is O(W), the number of cache misses is O(log n) (note that each bit vector takes the same amount of space as needed to store a destination address).

3.4.3 Insertion and Deletion

The insert and delete algorithms are similar to the corresponding algorithms for HPPTs. The essential differences are as below.

1. Rather than insert or delete a prefix from an ALL(z), we set bit(z)[l], where l is the length of the prefix being inserted or deleted, to 1 or 0, respectively.
2. For a rotation, we do not look for rMax in bit(x). Instead, we find the largest integer iMax such that the prefix that corresponds to bit(x)[iMax] matches point(y). The first (bit 0 comes before bit 1) iMax bits of bit'(y) are the first iMax bits of bit(x) and the remaining bits of bit'(y) are the same as the corresponding bits of bit(y). bit'(x) is obtained from bit(x) by setting its first iMax bits to 0.

Complexity iMax may be determined in O(log W) time using binary search; bit'(x) and bit'(y) may be computed in O(1) time using masks and Boolean operations. The remaining tasks performed during an insert or delete take O(log n) time. So, the overall complexity of an insert or delete operation is O(log n + log W) = O(log(Wn)). The number of cache misses is O(log n).

3.5 Implementation Details and Memory Requirement

3.5.1 Memory Management

We implemented our data structures in C++. Since dynamic memory allocation and deallocation using C++'s methods new and delete are very time consuming, we implemented our own methods to manage memory. We maintained our own list of free memory.
Whenever this list was exhausted, we used new to get a large chunk of memory to add to our free list. Memory was then allocated from this large chunk as needed by our data structures. Whenever memory was to be deallocated, it was put back on to our free list.

3.5.2 BOB

As described in Section 3.2, each node z of the PTST of BOB has the following fields: color, point(z), RST, leftChild, and rightChild. To improve the lookup performance of BOB, we added the following fields: maxPriority (maximum priority of the ranges in ranges(z)), minSt (smallest starting point of the ranges in ranges(z)), and maxFn (largest finish point of the ranges in ranges(z)). Correspondingly, the statements RST(z)->hpRight(d,h) and RST(z)->hpLeft(d,h) of Figure 3-3 are executed only when maxPriority > hp && d <= maxFn and maxPriority > hp && minSt < d, respectively.

With the added fields, each node of the PTST has 8 fields. For the color and maxPriority fields, we allocate 1 byte each. Assuming 4 bytes for each of the remaining fields, we get a node size of 26 bytes. For improved cache performance, it is desirable to align nodes to 4-byte memory boundaries. This alignment is simplified if the node size is an integral multiple of 4 bytes. Therefore, for practical purposes, the PTST node size becomes 28 bytes.

In our implementation of hpRight (Figure 3-4), the while loop conditional was changed from x != null to x != null && mp > hp. A corresponding change was made to hpLeft. The nodes of an RST have the following fields: color, mp, st, fn, p, leftChild, and rightChild. Using 1 byte each for the color, p, and mp fields, and 4 bytes for each of the remaining fields, the size of an RST node becomes 19 bytes. Again, for ease of alignment to 4-byte boundaries, we make the RST node size 20 bytes. In addition to nodes, every nonempty RST has the fields root (pointer to the root of the RST) and rank (rank of the red-black tree). Each of these fields is a 4-byte field.
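The byte counts above can be checked with a compiler. The sketch below is an illustration under stated assumptions, not the layout used in the dissertation's implementation. The 4-byte pointers of a 32-bit machine are modelled as 32-bit indices (the hypothetical `Index` type) so that the sizes also hold on 64-bit platforms, and the one-byte fields are placed last so that the compiler's own padding yields the 4-byte alignment described in the text.

```cpp
#include <cstdint>

using Index = std::uint32_t;  // stands in for a 4-byte pointer (assumption)

// PTST node of BOB: 6 four-byte fields + 2 one-byte fields = 26 bytes,
// rounded up to 28 by the 4-byte alignment of the 32-bit members.
struct BobPtstNode {
    std::uint32_t point;
    Index         rst;         // RST holding ranges(z)
    Index         leftChild;
    Index         rightChild;
    std::uint32_t minSt;       // smallest start point in ranges(z)
    std::uint32_t maxFn;       // largest finish point in ranges(z)
    std::uint8_t  color;
    std::uint8_t  maxPriority;
};

// RST node: 4 four-byte fields + 3 one-byte fields = 19 bytes, padded to 20.
struct RstNode {
    std::uint32_t st;
    std::uint32_t fn;
    Index         leftChild;
    Index         rightChild;
    std::uint8_t  color;
    std::uint8_t  mp;
    std::uint8_t  p;
};

// Per-RST bookkeeping: root and rank, 4 bytes each.
struct RstHeader {
    Index         root;
    std::uint32_t rank;
};

static_assert(sizeof(BobPtstNode) == 28, "PTST node should occupy 28 bytes");
static_assert(sizeof(RstNode) == 20, "RST node should occupy 20 bytes");
static_assert(sizeof(RstHeader) == 8, "RST header should occupy 8 bytes");
```

The static assertions hold on mainstream ABIs where a 32-bit integer is 4-byte aligned; on a platform with different alignment rules they would fail at compile time rather than silently report a different size.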
For the doubly-linked lists of PTST nodes with an empty RST, we used the minSt and maxFn fields to represent the left and right pointers, respectively. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty RST.

Since an instance of BOB may have up to 2n PTST nodes, n nonempty RSTs, and n RST nodes, the maximum space/memory required by BOB is 28*2n + 8*n + 20*n = 84n bytes.

3.5.3 PBOB

The required fields in each node z of the PTST of PBOB are: color, point(z), ALL, size, length, leftChild, and rightChild, where ALL is a one-dimensional array, each entry of which has the subfields pLength and priority; size is the dimension of the array; and length is the number of pairs currently in the array linear list. The array ALL initially has enough space to accommodate 4 pairs (pLength, priority). When the capacity of an ALL is exceeded, the size of the ALL is increased by 4 pairs. (Since at most 6 pairs are expected in an ALL, the size of an ALL needs to be increased at most once. In theory, an ALL may get as many as W pairs, and array doubling as in [41] may then work better than increasing the array size by 4 each time array capacity is exceeded.)

To improve the lookup performance of PBOB, the field maxPriority (maximum priority of the prefixes in ALL(z)) may be added. Note that minSt (smallest starting point of the prefixes in ALL(z)) and maxFn (largest finish point of the prefixes in ALL(z)) are easily computed from point(z) and the pLength of the shortest (i.e., first) prefix in ALL(z).
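Because every prefix stored in ALL(z) matches point(z), the shortest one covers the widest range, so its start and finish points are minSt and maxFn. A minimal C++ sketch of that computation follows. It assumes IPv4 (W = 32) and 32-bit addresses; the names `Range` and `prefixRange` are hypothetical and not taken from the dissertation.

```cpp
#include <cassert>
#include <cstdint>

constexpr int W = 32;  // IPv4 address width (assumption of this sketch)

struct Range {
    std::uint32_t start;
    std::uint32_t finish;
};

// Range matched by the prefix made of the `len` most significant bits of
// `point`. Called with the pLength of the first (shortest) pair in ALL(z),
// it yields {minSt, maxFn} for node z.
inline Range prefixRange(std::uint32_t point, int len) {
    assert(len >= 0 && len <= W);
    // Mask of the W - len "don't care" low bits. A shift by 32 would be
    // undefined behaviour, so len == 0 uses the full-width shift by 0 and
    // len == W is handled explicitly.
    const std::uint32_t low = (len == W) ? 0u : (0xFFFFFFFFu >> len);
    return Range{point & ~low, point | low};
}
```

For example, `prefixRange(0xC0A80101, 16)` gives the range 0xC0A80000 to 0xC0A8FFFF, while a length of 0 gives the whole address space and a length of 32 gives the single address point(z).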
When the nodes of the PTST are augmented with a maxPriority field, the expression ALL(z)->maxp() in Figure 3-11 may be changed to maxPriority(z), and the statement ALL(z)->searchALL(d,hp) executed only when maxPriority > hp && minSt < d && d < maxFn. Since searchALL does its first check against the shortest prefix in the array linear-list and this check tests minSt < d && d < maxFn, it is sufficient to execute the statement ALL(z)->searchALL(d,hp) only when maxPriority > hp.

Using 1 byte for each of the fields color, size, length, maxPriority, pLength, and priority, and 4 bytes for each of the remaining fields, the initial size of a PTST node of PBOB is 24 bytes. For the doubly-linked lists of PTST nodes with an empty ALL, we used the 8 bytes of memory allocated to the empty array ALL to represent the left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty ALL.

Since an instance of PBOB may have up to 2n PTST nodes, the minimum space/memory required by these 2n PTST nodes is 24 * 2n = 48n bytes. However, some PTST nodes may have more than 4 pairs in their ALL. There can be at most n/5 such nodes. So, the maximum space requirement of PBOB is 48n + 8n/5 = 49.6n bytes.

3.5.4 LMPBOB

In the case of LMPBOB, each node z of the PTST has the following fields: color, point(z), bit, leftChild, and rightChild. To improve the lookup performance of LMPBOB, the fields minLength (minimum of the lengths of the prefixes in bit(z)) and maxLength may be added. When the nodes of the PTST are augmented with a minLength and a maxLength field, we replace the statement bit(z)->searchBitVector(d,hp,k) of Figure 3-12 by

   if (same(minLength, point(z), d)) {
      hp = k = minLength;
      bit(z)->searchBitVector(d,hp,k);
   }

Observe that maxLength of LMPBOB is equivalent to maxPriority of BOB and PBOB.
Using 1 byte for each of the fields color, minLength, and maxLength; 8 bytes for bit (this analysis is for IPv4); and 4 bytes for each of the remaining fields, the size of a PTST node of LMPBOB is 23 bytes. Again, to easily align PTST nodes along 4-byte boundaries, we pad an LMPBOB PTST node so that its size is 24 bytes. For the doubly-linked lists of PTST nodes with an empty bit vector, we used the 8 bytes of memory allocated to the empty bit vector bit to represent the left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty bit vector.

Since an instance of LMPBOB may have up to 2n PTST nodes, the space/memory required by these 2n PTST nodes is 24 * 2n = 48n bytes.

3.6 Experimental Results

3.6.1 Test Data and Memory Requirement

We implemented the BOB, PBOB, and LMPBOB data structures and associated algorithms in C++ as described in Section 3.5 and measured their performance on a 1.4GHz PC. To assess the performance of these data structures, we used six IPv4 prefix databases obtained from [38] (see footnote 2). We assigned each prefix a priority equal to its length. Hence, BOB, PBOB, and LMPBOB were all used in a longest matching-prefix mode. For dynamic router-tables that use the longest matching-prefix tie breaker, the PST structure of Lu et al. [33, 34] provides O(log n) lookup, insert, and delete. So, we included the PST in our experimental evaluation of BOB, PBOB, and LMPBOB.

Footnote 2: Our experiments are limited to prefix databases because range databases are not available for benchmarking.

The number of prefixes in each of our 6 databases as well as the memory requirement for each database of prefixes are shown in Table 3-2. For the memory requirement, we performed two measurements. Measure1 gives the memory used by a data structure that is the result of a series of insertions made into an initially empty instance of the data structure.
For Measure1, less than 1% of the PTST nodes in the constructed BOB, PBOB, and LMPBOB instances are empty. So, these data structures use close to the minimum amount of memory they could use. Measure2 gives the memory used after 75% of the prefixes in the data structure constructed for Measure1 are deleted. In the resulting BOB, PBOB, and LMPBOB instances, almost half the PTST nodes are empty. The databases Paix1, Pb1, MaeWest, and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained on Sep 13, 2000. Figures 3-14 and 3-15 histogram the data of Table 3-2.

The memory required by PBOB and LMPBOB is the same when rounded to the nearest KB. This is so because the number of PTST nodes is the same in each of these structures; the minimum size of a PTST node in PBOB is 24 bytes; very few PTST nodes of PBOB are larger than 24 bytes, because the average value of |ranges(z)| is about 1 for our data sets and the maximum value is at most 6; and the size of a PTST node in LMPBOB is 24 bytes. In Measure1, the memory required by BOB is about 2.38 times that required by PBOB and LMPBOB. However, in Measure2, this ratio is about 1.75. Also, note that, in Measure1, PST takes slightly more memory than does BOB, whereas, in Measure2, BOB takes about 50% more memory than does PST. We note also that the memory requirement of PST may be reduced by about 50% using a priority-search-tree implementation different from that used in [33]. Of course, using this more memory-efficient implementation would increase the run time of PST.

3.6.2 Preliminary Timing Experiments

We performed preliminary experiments to determine the effectiveness of the changes suggested in Section 3.5. Since these changes are only to the lookup algorithm, our preliminary timing experiments measured only the lookup times for the BOB, PBOB, and LMPBOB data structures.
Table 3-2: Memory usage (KB)

Database            Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Num of Prefixes     16172  22225    28889  31827  35303  85988
Measure1  PST         884   1215     1579   1740   1930   4702
          BOB         851   1176     1526   1682   1876   4527
          PBOB        357    495      642    708    790   1901
          LMPBOB      357    495      642    708    790   1901
Measure2  PST         221    303      395    435    482   1175
          BOB         331    455      592    652    723   1760
          PBOB        189    260      338    372    413   1007
          LMPBOB      189    260      338    372    413   1007

Figure 3-14: Memory usage-measure1 (bar chart of memory per database for PST, BOB, PBOB, and LMPBOB)

Figure 3-15: Memory usage-measure2 (bar chart of memory per database for PST, BOB, PBOB, and LMPBOB)

To obtain the mean lookup-time, we started with a BOB, PBOB, or LMPBOB that contained all prefixes of a prefix database. Next, we created a list of the start points of the ranges corresponding to the prefixes in a database and then added 1 to each of these start points. Call this list L. A random permutation of L was generated and this permutation determined the order in which we searched for the longest matching-prefix for each of the addresses in L. The time required to determine all of these longest-matching prefixes was measured and averaged over the number of addresses in L (actually, since the time to perform all these lookups was too small to measure accurately, we repeated the lookup for all addresses in L several times and then averaged). The experiment was repeated 10 times, each time using a different random permutation of L, and the mean of these average times computed. The mean time for the implementation described in Section 3.5 is the base lookup-time.

For BOB, we found that omitting the predicates d < maxFn and minSt < d resulted in a mean lookup time that is approximately twice the base lookup time.
On the other hand, elimination of the predicate maxPriority > hp reduces the mean lookup time by about 2%. Even though the use of the predicate maxPriority > hp increased the lookup time slightly on our test data, we believe this is a good heuristic for data sets in which the priorities are not highly correlated with the lengths of the prefixes or ranges. So, our remaining experiments retained this predicate. Eliminating the predicate mp > hp had no noticeable effect on mean lookup time. This is to be expected on our data sets, because for these data sets, the maximum value of |ranges(z)| is ≤ maxR = 6. The predicate mp > hp is expected to be effective on data sets with a larger value of maxR. So, we retained this predicate for our remaining tests.

For PBOB, elimination of the predicate hp < maxPriority results in a very slight decrease in the mean lookup time relative to the base case. However, we expect that for data sets in which the priority isn't highly correlated with the prefix length, this predicate will actually reduce lookup time. Therefore, for further experiments, we retain this predicate in our lookup code.

In the case of LMPBOB, the introduction of the statement hp = k = minLength into the base code results in a lookup time that is 15% less than when this statement is removed.

3.6.3 Run-Time Experiments

We measured the mean lookup-time as described in Section 3.6.2. The standard deviation in the average times across the 10 repetitions described in Section 3.6.2 was also computed. These mean times and standard deviations are reported in Table 3-3. The mean times are also histogrammed in Figure 3-16. It is interesting to note that PBOB, which can handle prefix tables with arbitrary priority assignments, is actually 20% to 30% faster than PST, which is limited to prefix tables that employ the longest matching-prefix tie breaker. Further, lookups in BOB, which can handle range tables with arbitrary priorities, are slightly slower than in PST.
LMPBOB, which, like PST, is designed specifically for longest-matching-prefix lookups, is slightly inferior to the more general PBOB.

Figure 3-16: Search time (bar chart of mean search time per database for PST, BOB, PBOB, and LMPBOB)

Table 3-3: Prefix times on a 1.4GHz Pentium 4 PC with an 8K L1 data cache and a 256K L2 cache

Operation       Structure  Stat   Paix1   Pb1  MaeWest  Aads   Pb2  Paix2
Search (μsec)   PST        Mean    1.20  1.35     1.49  1.53  1.57   1.96
                           Std     0.01  0.01     0.04  0.01  0.00   0.01
                BOB        Mean    1.22  1.39     1.54  1.56  1.62   2.19
                           Std     0.01  0.02     0.02  0.02  0.02   0.01
                PBOB       Mean    0.82  0.98     1.10  1.15  1.20   1.60
                           Std     0.01  0.01     0.01  0.01  0.01   0.01
                LMPBOB     Mean    0.87  1.03     1.17  1.21  1.27   1.69
                           Std     0.01  0.01     0.01  0.01  0.01   0.01
Insert (μsec)   PST        Mean    2.17  2.35     2.53  2.60  2.64   3.03
                           Std     0.07  0.04     0.03  0.01  0.05   0.01
                BOB        Mean    1.70  1.89     2.06  2.10  2.16   2.55
                           Std     0.06  0.06     0.05  0.05  0.05   0.03
                PBOB       Mean    1.04  1.25     1.39  1.44  1.51   1.93
                           Std     0.06  0.05     0.00  0.05  0.05   0.06
                LMPBOB     Mean    1.06  1.29     1.47  1.50  1.57   1.98
                           Std     0.07  0.07     0.06  0.06  0.04   0.01
Delete (μsec)   PST        Mean    1.72  1.87     2.06  2.09  2.11   2.48
                           Std     0.04  0.05     0.05  0.06  0.04   0.06
                BOB        Mean    1.04  1.13     1.26  1.27  1.32   1.69
                           Std     0.06  0.05     0.04  0.05  0.06   0.06
                PBOB       Mean    0.68  0.82     0.90  0.91  0.97   1.30
                           Std     0.07  0.06     0.05  0.06  0.03   0.05
                LMPBOB     Mean    0.67  0.82     0.89  0.92  0.95   1.26
                           Std     0.06  0.06     0.05  0.05  0.03   0.05
Num of Copies                        15    11        9     8     8      3

To obtain the mean insert-time, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. (Once again, since the time to insert the remaining 33% of the prefixes was too small to measure accurately, we started with several copies of the data structure and inserted the 33% prefixes into each copy; measured the time to insert in all copies; and divided by the number of copies and number of prefixes
inserted). This experiment was repeated 10 times, each time starting with a different permutation of the database prefixes, and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities as well as the number of copies of each data structure we used for the inserts are given in Table 3-3. Figure 3-17 histograms the mean insert-time. As can be seen, insertions into PBOB take roughly 36% to 52% less time than do insertions into PST (per the means in Table 3-3); insertions into LMPBOB take slightly more time than do insertions into PBOB; and insertions into PST take 20% to 25% more time than do insertions into BOB.

Figure 3-17: Insert time (bar chart of mean insert time per database for PST, BOB, PBOB, and LMPBOB)

The mean and standard deviation data reported for the delete operation in Table 3-3 and Figure 3-18 were obtained in a similar fashion by starting with a data structure that had 100% of the prefixes in the database and measuring the time to delete a randomly selected 33% of these prefixes. Deletion from PBOB takes less than 50% of the time required to delete from a PST. For the delete operation, however, LMPBOB is slightly faster than PBOB. Deletions from BOB take about 40% less time than do deletions from PST.

Figure 3-18: Delete time (bar chart of mean delete time per database for PST, BOB, PBOB, and LMPBOB)

3.7 Conclusion

Table 3-4 gives the worst-case memory required by each of the data structures. The data of this table are for IPv4. When comparing these memory requirement data, we should keep in mind that BOB, PBOB, and LMPBOB have different capabilities. BOB works for highest-priority matching with nonintersecting ranges; PBOB is limited to highest-priority matching with prefixes; and LMPBOB is limited to longest-length matching with prefixes. The PST structure of Lu et al. [33] has the same restrictions as does LMPBOB.
Table 3-4: Node sizes and worst-case memory requirement in bytes for IPv4 router tables

                  BOB                 PBOB    LMPBOB  PST
Node Size         PTST(28) RST(20)    ≥24     24      28
Memory Required   84n                 49.6n   48n     56n

Table 3-5 gives the asymptotic time complexity and Table 3-6 gives the asymptotic cache misses for our data structures. In these tables, maxR is the maximum number of ranges or prefixes that match any destination address and maxL is the maximum number of cache lines needed by any of the array linear-lists stored in a PTST node. For LMPBOB, it is assumed that mask operations on W-bit vectors take O(1) time and that an entire W-bit vector can be accessed with O(1) cache misses.

Table 3-5: Time complexity

         BOB                 PBOB              LMPBOB              PST
Search   O(log n log maxR)   O(log n + maxR)   O(log n + W)        O(log n)
Insert   O(log n)            O(log n + maxR)   O(log n + log W)    O(log n)
Delete   O(log n)            O(log n + maxR)   O(log n + log W)    O(log n)

Table 3-6: Cache misses

         BOB                 PBOB              LMPBOB      PST
Search   O(log n log maxR)   O(log n + maxL)   O(log n)    O(log n)
Insert   O(log n)            O(log n + maxL)   O(log n)    O(log n)
Delete   O(log n)            O(log n + maxL)   O(log n)    O(log n)

Our experiments show that PBOB is to be preferred over PST and LMPBOB for the representation of dynamic longest-matching prefix-router-tables.
This is somewhat surprising because PBOB may be used for highest-priority prefix-router-tables, not just longest-matching prefix-router-tables. A possible reason why PBOB is faster than LMPBOB is that in LMPBOB one has to check O(W) prefix lengths, whereas in PBOB O(maxR) lengths are checked (note that in our test databases, W = 32 and maxR ≤ 6). BOB is slower than and requires more memory than PBOB when tested with longest-matching prefix-router tables. The same relative performance between BOB and PBOB is expected when filters are prefixes with arbitrary priority. Of the data structures considered in this chapter, BOB, of course, remains the only choice when the filters are ranges that have an associated priority.

Although the range allocation rule used by our data structures is similar to that used in an interval tree [40], the unique feature of our structures is the 2n size constraint. The size constraint is essential for O(log n) update.

CHAPTER 4
A B-TREE DYNAMIC ROUTER-TABLE DESIGN

In this chapter, we focus on B-tree data structures for dynamic NHPRTs and LMPTs. We are interested in the B-tree because, by varying the order of the B-tree, we can control the height of the tree and hence control the number of cache misses incurred when performing a rule-table operation. Although Suri et al. [20] have proposed a B-tree data structure for dynamic prefix-tables, their structure has the following shortcomings:

1. A prefix may be stored in O(m) nodes at each level of the order m B-tree. This results in excessive cache misses during the insert and delete operations.
2. Some of the prefix end-points are stored twice in the B-tree. This is because every endpoint is stored in a leaf node and some of the endpoints are additionally stored in interior nodes. This duplication in end-point storage increases the memory requirement.

Our proposed B-tree structure doesn't suffer from these shortcomings.
In our structure, each prefix is stored in O(1) nodes at each level, and each prefix end-point is stored once. Consequently, even though the asymptotic complexity of performing dynamic prefix-table operations is the same in both structures and the asymptotic memory requirements of both are the same, our structure is faster for the insert and delete operations and also takes less memory.

In Section 4.1, we develop our B-tree data structure, PIBT (prefix in B-tree), for dynamic prefix-tables. Our B-tree structure for non-intersecting ranges, RIBT (range in B-tree), is developed in Section 4.2. Experimental results comparing the performance of our PIBT structure, the multiway range tree (MRT) structure of Suri et al. [20], and the best binary tree structure for dynamic prefix-tables, PBOB [35], are presented in Section 4.3.

Table 4-1: An example prefix set R (W = 5)

Prefix Name   Prefix   Range Start   Range Finish
P1            001*     4             7
P2            00*      0             7
P3            1*       16            31
P4            01*      8             15
P5            10111    23            23
P6            0*       0             15

4.1 Longest-Matching Prefix-Tables-LMPT

4.1.1 The Prefix In B-Tree Structure-PIBT

A range r = [u, v] is a pair of addresses u and v, u ≤ v. The range r represents the addresses {u, u + 1, ..., v}. start(r) = u is the start point of the range and finish(r) = v is the finish point of the range. The range r matches all addresses d such that u ≤ d ≤ v. Every prefix of a prefix router-table may be represented as a range. For example, when W = 5, the prefix p = 100* matches addresses in the range [16,19]. So, we say p = 100* = [16,19], start(p) = 16, and finish(p) = 19. The length of p is 3. Table 4-1 shows a prefix set and the ranges of the prefixes.

The set of start and finish points of a collection P of prefixes is the set of endpoints, E(P), of P. When |P| = n, |E(P)| ≤ 2n. Although our PIBT structure and the MRT structure of Suri et al. [20] store the endpoints E(P) together with additional information in a B-tree (see footnote 1) [41], each structure uses a different variety of B-tree.
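The prefix-to-range mapping and the endpoint set E(P) defined above can be sketched in a few lines of C++. This is an illustration only: the string encoding of prefixes (bits followed by an optional `*`) and the helper names `toRange` and `endpoints` are assumptions made for the example, and W is limited to 32 so that addresses fit in a 32-bit integer.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <vector>

struct Range {
    std::uint32_t start;
    std::uint32_t finish;
};

// Converts a prefix such as "001*" (or a full-length address such as
// "10111") into the range of W-bit addresses it matches.
// Requires length(prefix) <= W <= 32.
inline Range toRange(const std::string& prefix, int W) {
    std::string bits = prefix;
    if (!bits.empty() && bits.back() == '*') bits.pop_back();
    const int len = static_cast<int>(bits.size());
    assert(W >= 1 && W <= 32 && len <= W);
    std::uint32_t start = 0, finish = 0;
    for (int i = 0; i < W; ++i) {
        const bool fixedBit = i < len;
        const std::uint32_t b = (fixedBit && bits[i] == '1') ? 1u : 0u;
        start  = (start << 1) | b;                     // free bits become 0
        finish = (finish << 1) | (fixedBit ? b : 1u);  // free bits become 1
    }
    return Range{start, finish};
}

// E(P): the distinct start and finish points of the prefixes in P.
// Each prefix contributes at most two points, so |E(P)| <= 2|P|.
inline std::set<std::uint32_t> endpoints(const std::vector<std::string>& P,
                                         int W) {
    std::set<std::uint32_t> E;
    for (const std::string& p : P) {
        const Range r = toRange(p, W);
        E.insert(r.start);
        E.insert(r.finish);
    }
    return E;
}
```

Applied to the six prefixes of Table 4-1 with W = 5, `endpoints` yields the eight values 0, 4, 7, 8, 15, 16, 23, and 31, which is below the bound 2n = 12 because several prefixes share a start or finish point.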
Our PIBT structure uses a B-tree in which each key (endpoint) is stored exactly once, while the MRT uses a B-tree in which each key is stored once in a leaf node and some of the keys are additionally stored in interior nodes.

Footnote 1: A B-tree of order m is an m-way search tree. If the B-tree is not empty, the root has at least two children and other internal nodes have at least $\lceil m/2 \rceil$ children. All external nodes are at the same level.

Figure 4-1: B-tree for the endpoints of the prefixes of Table 4-1 (root x holds keys 7 and 16; its children y, z, and w hold keys 0 and 4, 8 and 15, and 23 and 31, respectively)

Figure 4-2: Alternative B-tree for Figure 4-1

Figure 4-1 shows a possible order-3 B-tree for the endpoints of the prefix set of Table 4-1. In this example, each endpoint is stored in exactly one node. This example B-tree is a possible B-tree for PIBT but not for MRT. Figure 4-2 shows a possible order-3 B-tree in which each endpoint is stored in exactly one leaf node and some endpoints are also stored in interior nodes. This example B-tree is a possible B-tree for MRT but not for PIBT.

With each node x of a PIBT B-tree, we associate an interval int(x) of the destination address space $[0, 2^W - 1]$. The interval int(root) associated with the root of the B-tree is $[0, 2^W - 1]$. Let x be a B-tree node that has t keys. The format of this node is:

$t, child_0, (key_1, child_1), \ldots, (key_t, child_t)$

where $key_i$ is the ith key in the node ($key_1 < key_2 < \cdots < key_t$) and $child_i$ is a pointer to the ith subtree. In case of ambiguity, we use the notation $x.key_i$ and $x.child_i$ to refer to the ith key and child, respectively, of node x. Let $key_0 = start(int(x))$ and $key_{t+1} = finish(int(x))$. By definition,

$int_i(x) = int(child_i) = [key_i, key_{i+1}], \quad 0 \le i \le t$

For the example B-tree of Figure 4-1, $int(x) = [0, 31]$, $int_0(x) = int(y) = [0, 7]$, $int_1(x) = int(z) = [7, 16]$, $int_2(x) = int(w) = [16, 31]$, $int_0(y) = [0, 0]$, $int_1(y) = [0, 4]$, $int_2(y) = [4, 7]$, and $int_0(z) = [7, 8]$.

Node x of a PIBT has t + 1 W-bit vectors $x.interval_i$,
$0 \le i \le t$, and t W-bit vectors $x.equal_i$, $1 \le i \le t$. The lth bit of $x.interval_i$, denoted $x.interval_i[l]$, is 1 iff there is a length l prefix whose range includes $int_i(x)$ but not $int(x)$. This rule for the interval vectors is called the prefix allocation rule. For our example of Figure 4-1, $y.interval_2[3] = 1$ because prefix P1 has length 3 and range [4,7]; [4,7] includes $int_2(y) = [4, 7]$ but not $int(y) = [0, 7]$. We say that P1 is stored in $y.interval_2$ and in node y. It is easy to see that a prefix may be stored in up to m − 1 intervals of an order m B-tree node and in up to 2 nodes at each level of the B-tree. The bit $x.equal_i[l]$ is 1 iff there is a length l prefix that has a start or finish endpoint equal to $key_i$ of x. For our example, prefixes P2 and P6 have 0 as their start endpoint. Since the length of P2 is 2 and that of P6 is 1, $y.equal_1[1] = y.equal_1[2] = 1$; all other bits of $y.equal_1$ are 0.

To conserve space, leaf nodes do not have child pointers. Further, to reduce memory accesses, child pointers and interval bit-vectors are interleaved so that $child_i$ and $interval_i$ can be accessed with a single cache miss provided cache lines are long enough. In the sequel, we assume that W is sufficiently small so that this is the case. Further, we assume that bit-vector operations on W-bit vectors take O(1) time. This assumption is certainly valid for IPv4 where W = 32 and a W-bit vector may be represented as a 4-byte integer.

4.1.2 Finding The Longest Matching-Prefix

As in [20], we determine only the length of the longest prefix that matches a given destination address d. From this length and d, the longest matching-prefix, lmp(d), is easily computed. The PIBT search algorithm (Figure 4-3) employs the following lemma.

Lemma 33 Let P be a set of prefixes. If P contains a prefix whose start or finish endpoint equals d, then the longest prefix, lmp(d), that matches d has its start or finish point equal to d.
Proof Let p ∈ P be a prefix that matches d and whose start or finish endpoint equals d. Let q ∈ P be a prefix that matches d but whose start and finish endpoints are different from d. It is easy to see that the range of p is properly contained in the range of q. Therefore, p is a longer prefix than q. So, lmp(d) ≠ q. The lemma follows. □

The PIBT search algorithm first constructs a W-bit vector matchVector. When the router table has no prefix whose start or finish endpoint equals the destination address d, the constructed bit vector satisfies matchVector[l] = 1 iff there is a length l prefix that matches d. Otherwise, matchVector[l] = 1 iff there is a length l prefix whose start or finish endpoint equals d. The maximum l such that matchVector[l] = 1 is the length of lmp(d).

Complexity Analysis. Each iteration of the while loop takes $O(\log_2 m)$ time (we assume throughout this paper that, for sufficiently large m, a B-tree node is searched using a binary search) and the number of iterations is $O(\log_m n)$. The largest l such that matchVector[l] = 1 may be found in $O(\log_2 W)$ time by performing $O(\log_2 W)$ operations on the W-bit vector matchVector. So, the overall complexity is

Full Text |

I would like to give my sincere thankfulness to my advisor, Dr. Sartaj Sahni, for his mentoring and support throughout my Ph.D. study. It would be impossible to have my research career without his guidance.

This work was supported, in part, by the National Science Foundation under grant CCR-9912395.

I am very grateful to Dr. Sanjay Ranka, Dr. Randy Chow, Dr. Richard Newman, Dr. Michael Fang for serving on my Ph.D. supervisory committee and providing helpful suggestions.

I want to dedicate this dissertation to my parents. Without their encouragement and hard work, I could not think of getting a doctoral degree. Finally, I would like to give my special thanks to my wife, Lan, whose caring and love enabled me to complete this work.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
CHAPTER
1 INTRODUCTION AND RELATED WORK
1.1 Introduction
1.1.1 Static Router Table
1.1.2 Dynamic Router Table
1.2 Related Work
1.2.1 Trie
1.2.2 Sets of Equal-Length Prefixes
1.2.3 End-Point Array
1.2.4 Multiway Range Tree
1.2.5 O(log n) Dynamic Solutions
1.2.6 Highest-Priority Prefix Table
1.2.7 TCAM
1.2.8 Others
1.3 Contribution
2 O(log n) DYNAMIC ROUTER TABLE FOR PREFIXES AND RANGES
2.1 Preliminaries
2.1.1 Prefixes and Longest-Prefix Matching
2.1.2 Ranges and Projections
2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges
2.1.4 Normalized Ranges
2.1.5 Priority Search Trees And Ranges
2.2 Prefixes
2.3 Nonintersecting Ranges
2.4 Conflict-Free Ranges
2.4.1 Determine msr(d)
2.4.2 Insert A Range
2.4.3 Delete A Range
2.4.4 Computing maxP and minP
2.4.5 A Simple Algorithm to Compute maxP
2.4.6 An Efficient Algorithm to Compute maxP
2.4.7 Wrapping Up Insertion of a Range
2.4.8 Wrapping Up Deletion of a Range
2.4.9 Complexity
2.5 Experimental Results
2.5.1 Prefixes
2.5.2 Nonintersecting Ranges
2.5.3 Conflict-free Ranges
2.6 Conclusion
3 DYNAMIC IP ROUTER TABLES USING HIGHEST-PRIORITY MATCHING
3.1 Preliminaries
3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs) - BOB
3.2.1 The Data Structure
3.2.2 Search for hpr(d)
3.2.3 Insert a Range
3.2.4 Red-Black-Tree Rotations
3.2.5 Delete a Range
3.2.6 Expected Complexity of BOB
3.3 Highest-Priority Prefix-Tables (HPPTs) - PBOB
3.3.1 The Data Structure
3.3.2 Lookup
3.3.3 Insertion and Deletion
3.4 Longest-Matching Prefix-Tables (LMPTs) - LMPBOB
3.4.1 The Data Structure
3.4.2 Lookup
3.4.3 Insertion and Deletion
3.5 Implementation Details and Memory Requirement
3.5.1 Memory Management
3.5.2 BOB
3.5.3 PBOB
3.5.4 LMPBOB
3.6 Experimental Results
3.6.1 Test Data and Memory Requirement
3.6.2 Preliminary Timing Experiments
3.6.3 Run-Time Experiments
3.7 Conclusion
4 A B-TREE DYNAMIC ROUTER-TABLE DESIGN
4.1Longest-MatchingPrex-Tables|LMPT..............88 4.1.1ThePrexInB-TreeStructure|PIBT...........88 4.1.2FindingTheLongestMatching-Prex............91 4.1.3InsertingAPrex.......................92 4.1.4Insertinganendpoint.....................92 vi PAGE 7 4.1.5Updateintervalvectors....................96 4.1.6DeletingAPrex.......................97 4.1.7DeletingfromaLeafNode..................98 4.1.8BorrowfromaSibling.....................98 4.1.9MergingTwoAdjacentSiblings...............99 4.1.10DeletingfromaNon-leafNode................100 4.1.11Cache-MissAnalysis......................102 4.2Highest-PriorityRange-Tables....................104 4.2.1Preliminaries..........................104 4.2.2TheRangeInB-TreeStructure|RIBT...........105 4.2.3RIBTOperations.......................107 4.3ExperimentalResults.........................108 4.4Conclusion...............................112 5CONCLUSIONANDFUTUREWORK..................113 5.1Conclusion...............................113 5.2FutureWork..............................114 REFERENCES...................................116 BIOGRAPHICALSKETCH............................120 vii PAGE 8 Internetroutersuseroutertablestoclassifyincomingpacketsbasedonthein-formationcarriedinthepacketheaders.Packetclassicationisoneofthenetworkbottlenecks,especiallywhenahighupdateratebecomesnecessary.Muchoftheresearchintherouter-tableareahasfocusedonstaticprextables,whereupdatesusuallyrequiretherebuildingofthewholeroutertable.Somerouter-tabledesignsrelyontherelativelyshortIPv4addressestoachievedesiredeciency.However,thesedesignshavebadscalabilityintermsoftheprexlength. 
We propose several schemes to represent one-dimensional dynamic range tables, that is, tables into/from which rules are inserted/deleted concurrently with packet classification, and in which filters are specified as ranges. Our schemes allow real-time update and at the same time provide efficient lookup. The lookup and update complexities of our schemes are logarithmic functions of the number of filters. The first scheme, PST, which is based on priority search trees, uses the most specific rule tie breaker. The second scheme is called BOB (Binary search tree On Binary search tree). This scheme uses the highest priority tie breaker. In order to utilize the wide cache line size and reduce the tree height, a third scheme is developed in which the top level

1.1 Introduction

Today's Internet consists of thousands of packet networks interconnected by routers. When a host sends a packet into the Internet, the routers relay the packet towards its final destination. The routers exchange routing information with each other, and use the information gathered to calculate the paths to all reachable destinations. Each packet is treated independently and forwarded to a next router based on its destination address.

The data structure a router uses to look up the next hop is called the router table. Each entry in the router table is a rule of the form (address prefix, next hop). Table 1-1 shows a set of five rules. We use $W$ to denote the maximum possible length of a prefix. In IPv4, $W = 32$ and in IPv6, $W = 128$. In Table 1-1, $W$ is 5. The prefix P1, which matches all the destination addresses, is called the default prefix. The prefix P3 matches the destination addresses between 16 and 19. If the address prefix of a rule matches the destination address that the incoming packet carries, the next hop of this rule is used to forward the packet.
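The correspondence between a prefix and the block of addresses it matches can be made concrete with a short sketch. The helper names `prefix_to_range` and `matching_rules` are hypothetical and do not come from the dissertation; the five prefixes are those of Table 1-1 with $W = 5$, and the scan over all rules is for illustration only, not one of the data structures studied here.

```python
# Illustrative sketch only; function names and the RULES dictionary are assumptions.

def prefix_to_range(prefix: str, w: int) -> tuple[int, int]:
    """Return (start, finish) of the addresses matched by a binary prefix such as '100*'."""
    bits = prefix.rstrip("*")
    if len(bits) > w or any(b not in "01" for b in bits):
        raise ValueError(f"invalid prefix {prefix!r} for W={w}")
    free = w - len(bits)                       # number of unspecified low-order bits
    start = (int(bits, 2) << free) if bits else 0
    finish = start | ((1 << free) - 1)         # set every free bit to 1
    return start, finish


W = 5
RULES = {"P1": "*", "P2": "0101*", "P3": "100*", "P4": "1001*", "P5": "10111"}


def matching_rules(d: int) -> list[str]:
    """Names of all prefixes in RULES whose range contains destination d (linear scan)."""
    matches = []
    for name, prefix in RULES.items():
        start, finish = prefix_to_range(prefix, W)
        if start <= d <= finish:
            matches.append(name)
    return matches
```

Under these assumptions, `prefix_to_range("100*", 5)` gives the block 16 through 19 stated for P3, and a destination that falls in several blocks is exactly the situation in which a tie breaker is needed.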
Address prefixes were introduced by CIDR (Classless Interdomain Routing) to deal with address depletion and router table explosion. The result of CIDR's address aggregation is that there may be several rules whose prefixes match the destination address. For example, the rules P1, P3 and P4 in Table 1-1 match the destination address 19. In this case, a tie breaker is needed to select one of the matching rules. The most specific matching is usually used, namely, the longest prefix matching the

The other two popular tie breakers are first matching and highest priority matching. For the first matching tie breaker, the rule table is assumed to be a linear list of rules with the rules indexed 1 through $n$ for an $n$-rule table. The first rule that matches the incoming packet is used. Notice that the rule R1 is selected for every incoming packet since it matches all the destination addresses. In order to give other rules a chance to become the winner, we must index the rules carefully, and the default prefix should be the last rule.

In the highest priority matching, each rule is assigned a priority, and the rule with the highest priority is selected from those matching the incoming packet. Notice that the first matching tie breaker is a special case of the highest priority matching tie breaker (simply assign each rule a priority equal to the negative of its index in the linear list).

Table 1-1: A router table with five rules ($W = 5$)

| Rule Name | Prefix Name | Prefix | Next Hop | Range Start | Range Finish |
|-----------|-------------|--------|----------|-------------|--------------|
| R1        | P1          | *      | N1       | 0           | 31           |
| R2        | P2          | 0101*  | N2       | 10          | 11           |
| R3        | P3          | 100*   | N3       | 16          | 19           |
| R4        | P4          | 1001*  | N4       | 18          | 19           |
| R5        | P5          | 10111  | N5       | 23          | 23           |

The query based on the destination address is usually called address lookup or packet forwarding. In general, other fields such as source address and port numbers may also be used, and the router table consists of rules of the form $(F, A)$, where $F$ is a filter and $A$ is an action. The action component of a rule specifies what is

1.1.1 Static Router Table

In a static rule table, the rule set does not vary in time. For these tables, we are concerned primarily with the following metrics:
To handle update, static schemes usually use two copies, working and shadow, of the router tables. Lookups are done using the working table. Updates are performed in the background (either in real time on the shadow table or by batching updates and reconstructing an updated shadow at suitable intervals); periodically, the shadow replaces the working table, and the caches of the working table are flushed. In this mode of update operation, many packets may be misclassified, because the working copy isn't immediately updated. The number of misclassified packets depends on the periodicity with which the working table can be replaced by an updated shadow. Further, additional memory is required for the shadow table and for periodic reconstruction of the working table. It is important to have shorter preprocessing time in order to reduce the number of misclassified packets.

1.1.2 Dynamic Router Table

In practice, rule tables are seldom truly static. At best, rules may be added to or deleted from the rule table infrequently. Typically, in a "static" rule table, inserts/deletes are batched and the router-table data structure reconstructed as needed. In a dynamic rule table, rules are added/deleted with some frequency. For such tables, inserts/deletes are not batched. Rather, they are performed in real time.
We believe that dynamic structures for router tables are becoming a necessity. First, updates occur frequently in the backbone area. Labovitz et al. [1] found that the update rate could reach as high as 1000 per second. These updates stem from route failure, route repair and route fail-over. With the number of autonomous systems continuously increasing, it is reasonable to expect a rising update rate. The router table needs to be updated in order to reflect the route change. Second, fast processing of updates is preferred because during the batch and reconstruction, end-to-end delay increases, packet loss rises dramatically, and part of the network may experience connectivity loss. Labovitz et al. [2] observed dramatically increased packet loss and end-to-end latency during BGP routing changes. Batching and expensive reconstruction make things worse. While BGP takes time to converge, route-repair events usually do not cause multiple announcements, and the latency for the router table to become stable due to these events should depend only on the network delay and router processing delays along the path [2]. In addition, when the BGP convergence time gets reduced, the processing delay may dominate. Pei et al. [3] reduce the convergence time from 30.3 seconds to 0.3 seconds for a failure withdrawal in the testbed by applying two consistency assertions to BGP. Macian et al. [4] emphasize the importance of supporting a high update rate. Dynamic router tables that permit high-speed inserts and deletes are essential in QoS and VAS applications [4]. For example, edge routers that do stateful filtering require high-speed updates [5].

For dynamic router tables, we are concerned additionally with the time required to insert/delete a rule. For a dynamic rule table, the initial rule-table data structure is constructed by starting with an empty data structure and then inserting the initial set of rules into the data structure one by one. So, typically, in the case of dynamic tables, the preprocessing metric, mentioned above, is very closely related to the insert time.

For dynamic router tables, the following metrics are measured to compare the performance:
Another important metric we are concerned with for both static and dynamic router tables is the scalability to IPv6. IPv6, the next generation of IP, uses 128-bit addresses ($W = 128$). Although some of the schemes in Section 1.2 work well for IPv4 ($W = 32$), they have bad scalability in terms of the prefix length.

1.2 Related Work

Data structures for rule tables in which each filter is an address prefix and the rule priority is the length of this prefix have been intensely researched in recent years. We refer to rule tables of this type as longest-matching prefix-tables (LMPT). We refer to rule tables in which the filters are ranges and in which the highest-priority matching filter is used as highest-priority range-tables (HPRT). When the filters of no two rules of an HPRT intersect, the HPRT is a nonintersecting HPRT (NHPRT). Although every LMPT is also an NHPRT, an NHPRT may not be an LMPT.

Ruiz-Sanchez et al. [6] review data structures for static LMPTs and Sahni et al. [7] review data structures for both static and dynamic LMPTs.

1.2.1 Trie

Several trie-based data structures for LMPTs have been proposed [8, 9, 10, 11, 12, 13, 14]. Structures such as that of Doeringer et al. [10] use the path-compression technique. Thus the memory requirement is $O(n)$. The search is guided by the input key and only inspects the bit position stored at the internal node due to a successful search bias. When the search reaches the leaf node and the search does not succeed, the downward path may be backtracked to find the longest matching prefix. Hence the search can be carried out in $O(W)$ time. The update operation, insert or delete, is natural in the trie structure, and can also be performed in $O(W)$ time. The memory accesses during these operations are $O(W)$. For IPv6, $O(W = 128)$ memory accesses are quite expensive. Moreover, path compression reduces the height of the trie only if the prefixes scatter inside the trie sparsely. When the number of prefixes increases, lots of branch nodes are needed and path compression does not have many nodes to
In order to reduce the trie length, Gupta et al. [15] use the DIR-24-8 scheme, which fully expands the binary trie at depth 24, i.e., all prefixes with length less than or equal to 24 are expanded to as many 24-bit prefixes as needed, and a table with $2^{24}$ entries is used to store these expanded prefixes. For those prefixes longer than 24 bits, a second table is used to store them. The correspondence is established by storing pointers in the first table which point to the proper entries in the second table. The first table has $2^{24}$ entries, and each entry is 16 bits (32 Mbytes in total). The first bit of each entry indicates whether the next 15 bits store the next hop or a pointer into the second table. With more than 32 Mbytes of memory usage, the scheme can perform a search in at most two memory accesses. But it is not scalable to IPv6 because expanding to 24 bits already takes too much memory. Gupta et al. [15] also propose alternatives that use less memory but require more memory accesses.

Degermark et al. [9] use a similar prefix expansion technique at multiple depths. Bitmap compression is deployed to reduce the memory requirement greatly. A router table with 40,000 rules can fit into 160 Kbytes. In the worst case, the number of memory accesses is nine. Huang et al. [16] fully expand the binary trie at depth 16 and also expand the subtries rooted at the nodes in depth 16 to their own depths. The bitmap compression is also applied to reduce the memory requirement. The router tables used in the experiment can be compacted into less than 500 Kbytes. The number of worst case memory accesses is three. Both schemes [9, 16] heavily depend on the prefix distribution. It is hard to decide a proper memory size for the scheme ahead of time. For example, in the extreme case, if $n$ prefixes in the router table all have length 32, and their first 16 bits are distinct (assume $n \le 2^{16}$), the scheme [16] needs at least $2^{14} n$ bytes.

Nilsson et al. [11] apply level compression as well as path compression to the binary trie. A binary trie is path-compressed first, then level compression is used to reduce the height of the trie further by substituting the $k$ highest levels of the binary trie with a single degree-$2^k$ node. Although the search complexity of the LC (level compressed) trie is still $O(W)$, the height of the LC-trie is around 8 for the router tables used in the authors' analysis.
These data structures [9, 11, 15] as well as that of Srinivasan et al. [12] attempt to optimize lookup time through an expensive preprocessing step. While providing very fast lookup capability, they have a prohibitive insert/delete time, so they are suitable only for static router-tables (i.e., tables into/from which no inserts and deletes take place).

Sahni et al. [13, 14] provide efficient constructions for fixed-stride and variable-stride multibit tries. The lookup time and memory requirement are optimized through expensive preprocessing.

Aiming at improving update speed for fixed-stride multibit tries on a pipelined ASIC architecture, Basu et al. [17] describe an algorithm to optimize and balance the memory requirement across the pipeline stages.

1.2.2 Sets of Equal-Length Prefixes

Waldvogel et al. [18] have proposed a scheme that performs a binary search on hash tables organized by prefix length. In order to support binary search, $O(\log W)$ markers are generated for each prefix, and the longest matching prefix is precomputed for each marker. This binary search scheme has an expected complexity of $O(\log W)$ for lookup. The memory requirement is bounded by $O(n \log W)$. By introducing a technique called marker partitioning in the full version of Waldvogel et al. [18], the scheme has

1.2.3 End-Point Array

An alternative adaptation of binary search to longest-prefix matching is developed in [19]. The distinct end points (start points and finish points) of the ranges defined by the prefixes are stored in ascending order in an array. The end points divide the universe into $O(n)$ basic intervals. $LMP(d)$ is precomputed for each interval as well as for each end point. $LMP(d)$ is found by performing a binary search on this ordered array. A lookup in a table that has $n$ prefixes takes $O(\log n)$ time. Because the schemes of [19] use expensive precomputation, they are not suited for dynamic router-tables.
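The end-point array idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation of [19]: the names `build_endpoint_array` and `lookup` are hypothetical, the precomputation is done by brute force, and the prefixes are given directly as (start, finish) ranges taken from Table 1-1.

```python
# Illustrative sketch only; all names and the brute-force precomputation are assumptions.
from bisect import bisect_right


def longest_match(prefix_ranges, d):
    """Brute force: name of the narrowest range containing d, or None.
    Prefix ranges are nested or disjoint, so the narrowest one is the longest prefix."""
    best_name, best_width = None, None
    for name, (start, finish) in prefix_ranges.items():
        if start <= d <= finish and (best_width is None or finish - start < best_width):
            best_name, best_width = name, finish - start
    return best_name


def build_endpoint_array(prefix_ranges):
    """Precompute LMP for every distinct end point and for the open interval after it."""
    points = sorted({p for start, finish in prefix_ranges.values() for p in (start, finish)})
    at_point = [longest_match(prefix_ranges, p) for p in points]
    after_point = []
    for i, p in enumerate(points):
        has_gap = i + 1 < len(points) and points[i + 1] - p > 1
        # Any address strictly between two consecutive end points has the same answer.
        after_point.append(longest_match(prefix_ranges, p + 1) if has_gap else None)
    return points, at_point, after_point


def lookup(table, d):
    """Binary search for the largest end point <= d, then read the precomputed answer."""
    points, at_point, after_point = table
    i = bisect_right(points, d) - 1
    if i < 0:
        return None
    return at_point[i] if points[i] == d else after_point[i]


PREFIX_RANGES = {"P1": (0, 31), "P2": (10, 11), "P3": (16, 19), "P4": (18, 19), "P5": (23, 23)}
TABLE = build_endpoint_array(PREFIX_RANGES)
```

The lookup itself touches only the sorted array, which is where the $O(\log n)$ bound comes from; the cost that makes the scheme unsuitable for frequent updates is hidden in `build_endpoint_array`, which here simply recomputes everything.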
1.2.4 Multiway Range Tree

Suri et al. [20] have proposed a B-tree data structure for dynamic LMPTs. Using their structure, we may find the longest matching-prefix, $LMP(d)$, in $O(\log_m n)$ time. However, inserts/deletes take $O(W \log_m n)$ time. When $W$ bits fit in $O(1)$ words (as is the case for IPv4 and IPv6 prefixes), logical operations on $W$-bit vectors can be done in $O(1)$ time each. In this case, the scheme of Suri et al. [20] takes $O(m \log_2 W \log_m n)$ time for an insertion and $O(m \log_m n + W)$ for a deletion. Assuming one node can fit into $O(1)$ cache lines, the number of memory accesses that occur when the data structure of Suri et al. [20] is used is $O(\log_m n)$ per search, and $O(m \log_m n)$ per update.

1.2.5 $O(\log n)$ Dynamic Solutions

Sahni et al. [21, 22] develop data structures, called a collection of red-black trees (CRBT) and alternative collection of red-black trees (ACRBT), that support the three operations of a dynamic LMPT in $O(\log n)$ time each. The number of cache misses is also $O(\log n)$. Sahni et al. [22] show that their ACRBT structure is easily modified to extend the biased-skip-list structure of Ergun et al. [23] so as to obtain a biased-skip-list structure for dynamic LMPTs. Using this modified biased skip-list structure, lookup, insert, and delete can each be done in $O(\log n)$ expected time and $O(\log n)$ expected cache misses. Like the original biased-skip list structure of

1.2.6 Highest-Priority Prefix Table

When an HPPT (highest-priority prefix-table) is represented as a binary trie [24], each of the three dynamic HPPT operations takes $O(W)$ time and cache misses.

Gupta et al. [25] have developed two data structures for dynamic HPRTs: heap on trie (HOT) and binary search tree on trie (BOT). The HOT structure takes $O(W)$ time for a lookup and $O(W \log n)$ time for an insert or delete. The BOT structure takes $O(W \log n)$ time for a lookup and $O(W)$ time for an insert/delete. The number of cache misses in a HOT and BOT is asymptotically the same as the time complexity of the corresponding operation.
1.2.7 TCAM

Ternary content-addressable memories, TCAMs, use parallelism to achieve $O(1)$ lookup [26]. Each memory cell of a TCAM may be set to one of three states: 0, 1, and don't care. The prefixes of a router table are stored in a TCAM in descending order of prefix length. Assume that each word of the TCAM has 32 cells. The prefix 10* is stored in a TCAM word as 10??...?, where ? denotes a don't care and there are 30 ?s in the given sequence. To do a longest-prefix match, the destination address is matched, in parallel, against every TCAM entry; with a sorted-by-length linear list, the longest matching-prefix can be determined in $O(1)$ time. A prefix may be inserted or deleted in $O(W)$ time, where $W$ is the length of the longest prefix [27]. Although TCAMs provide a simple and efficient solution for static and dynamic router tables, this solution requires special hardware, costs more, and uses more power and board space than solutions that employ SDRAMs. TCAMs have longer latency than SDRAMs.

1.2.8 Others

Cheung et al. [29] developed a model for table-driven route lookup and cast the table design problem as an optimization problem within this model. Their model accounts for the memory hierarchy of modern computers, and they optimize average performance rather than worst-case performance.

Solutions that involve modifications to the Internet Protocol (i.e., the addition of information to each packet) have also been proposed [30, 31, 32].

1.3 Contribution

We have developed data structures for dynamic router tables. The data structures use $O(n)$ space except that RIBT uses $O(n \log_m n)$ space. Our first data structure, PST [33, 34], uses the most specific matching tie breaker. It permits one to search, insert, and delete in $O(\log n)$ time each. Although $O(\log n)$ time data structures for prefix tables were known prior to our work [21, 22], the PST is more memory efficient than the data structures of [21, 22]. Further, PST is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, PSTs are the first to permit $O(\log n)$ search, insert, and delete.
The second data structure, BOB [35], works for highest-priority matching with nonintersecting ranges. The highest-priority rule that matches a destination address may be found in $O(\log^2 n)$ time; a new rule may be inserted and an old one deleted in $O(\log n)$ time. For the case when all rule filters are prefixes, the data structure PBOB (prefix BOB) permits highest-priority matching as well as rule insertion and deletion in $O(W)$ time each. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in $O(\log n)$ time and with $O(\log n)$ cache misses. PBOB can also support the dynamic-table operations in $O(\log n)$ time and with $O(\log n)$ cache misses for nonintersecting ranges when the number of nesting levels is a constant.

To utilize the wide cache line size, e.g., a 64-byte cache line, we propose B-tree data structures for dynamic router-tables for the cases when the filters are prefixes as well as when they are non-intersecting ranges. A crucial difference between our data structure for prefix filters and the B-tree router-table data structure of Suri et al. [20] is that in our data structure, each prefix is stored in $O(1)$ B-tree nodes per B-tree level, whereas in the structure of Suri et al. [20], each prefix is stored in $O(m)$ nodes per level ($m$ is the order of the B-tree). As a result of this difference, a prefix may be inserted or deleted from an $n$-filter router table accessing only $O(\log_m n)$ nodes of our data structure; these operations access $O(m \log_m n)$ nodes using the structure of Suri et al. [20]. Even though the asymptotic complexity of prefix insertion and deletion is the same in both B-tree structures, experiments conducted by us show that because of the reduced cache misses for our structure, the measured average insert and delete times using our structure are about 30% less than when the B-tree structure of Suri et al. [20] is used. Further, an update operation using the B-tree structure of Suri et al. [20] will, in the worst case, make 2.5 times as many cache misses as made when our structure is used. The asymptotic complexity to find the longest matching prefix is the same, $O(m \log_m n)$, in both B-tree structures, and in

With the $O(\log n)$ operation time, our data structures scale well to large router tables. Since the complexity is independent of the prefix length, our data structures are also scalable to IPv6.
Another important feature of our data structures is that nonintersecting ranges are supported naturally, whereas most existing data structures support ranges (necessary when the filters are defined for port numbers) by breaking one range into $O(W)$ prefixes, which results in an $O(W \log n)$ memory requirement. Supporting ranges is also a nice feature for network layer addresses. The size of the range that a prefix covers must be a power of two, and the range must start at a number which is a multiple of the range size. But the end points and the size of a normal range can be any number. Supporting ranges means one can allocate a range with arbitrary size to a network (AppleTalk supports this feature), and the range aggregation is potentially better than that of prefixes. For example, two disjoint prefixes can aggregate into one prefix only if their ranges are adjacent to each other and they have the same length, whereas two disjoint ranges can aggregate into one range as long as they are next to each other. So, range aggregation is expected to result in router tables that have fewer rules.

In this chapter, we show in Section 2.2 how priority-search trees may be used to represent dynamic prefix-router-tables. The resulting structure, which is conceptually simpler than the CRBT structure of Sahni et al. [21], permits lookup, insert, and delete in $O(\log n)$ time each. For range router-tables, we consider the case when the best matching-prefix is the most-specific matching prefix (this is the range analog of longest-matching prefix). In Section 2.3, we show that dynamic range-router-tables that employ most-specific range matching and in which no two ranges overlap may be efficiently represented using two priority-search trees. Using this two-priority-search-tree representation, lookup, insert, and delete can be done in $O(\log n)$ time each. The general case of non-conflicting ranges is considered in Section 2.4. In this section, we augment the data structure of Section 2.3 with several red-black trees to obtain a range-router-table representation for non-conflicting ranges that permits lookup, insert, and delete in $O(\log n)$ time each. Section 2.1 introduces the terminology we use. In this section, we also develop the mathematical foundation that forms the basis of our data structures. Experimental results are reported in Section 2.5.
2.1 Preliminaries

2.1.1 Prefixes and Longest-Prefix Matching

The prefix 1101* matches all destination addresses that begin with 1101 and 10010* matches all destination addresses that begin with 10010. For example, when $W = 5$, 1101* matches the addresses $\{11010, 11011\} = \{26, 27\}$, and when $W = 6$, 1101* matches $\{110100, 110101, 110110, 110111\} = \{52, 53, 54, 55\}$. Suppose that a router table includes the prefixes P1 = 101*, P2 = 10010*, P3 = 01*, P4 = 1*, and

2.1.2 Ranges and Projections

Definition 1

Notice that every prefix of a prefix router-table may be represented as a range. For example, when $W = 6$, the prefix P = 1101* matches addresses in the range $[52, 55]$. So, we say P = 1101* = $[52, 55]$, $\mathrm{start}(P) = 52$, and $\mathrm{finish}(P) = 55$.

Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example, $[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range. For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$ is not a range.

Definition 2

The predicate $\mathrm{disjoint}(r, s)$ is true iff $r$ and $s$ are disjoint. $\mathrm{disjoint}(r, s) \iff \mathrm{overlap}(r, s) = \emptyset \iff v$

$[2, 4]$ and $[6, 9]$ are disjoint; $[2, 4]$ and $[3, 4]$ are nested; $[2, 4]$ and $[2, 2]$ are nested; $[2, 8]$ and $[4, 6]$ are nested; $[2, 4]$ and $[4, 6]$ intersect; and $[3, 8]$ and $[2, 4]$ intersect. $[4, 4]$ is the overlap of $[2, 4]$ and $[4, 6]$; and $\mathrm{overlap}([3, 8], [2, 4]) = [3, 4]$.

Lemma 1

Let $s$ be a range. $\Pi(R \cup \{s\})$ is a range iff $\mathrm{start}(s) \le v + 1$ and $\mathrm{finish}(s) \ge u - 1$.

(c) When $\Pi(R \cup \{s\}) = [x, y]$, $x = \min\{u, \mathrm{start}(s)\}$ and $y = \max\{v, \mathrm{finish}(s)\}$.

2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges

Definition 4

Figure 2-2: Cases for Lemma 2

$[2, 4]$ is more specific than $[1, 6]$, and $[5, 9]$ is more specific than $[5, 12]$. Since $[2, 4]$ and $[8, 14]$ are disjoint, neither is more specific than the other. Also, since $[4, 14]$ and $[6, 20]$ intersect, neither is more specific than the other.

Definition 5

We note that our definition of conflict free is a natural extension to ranges of the definition of conflict free given by Hari et al. [36] for the case of two-dimensional prefix rules.

Definition 7

Let $s \in R$ be such that $\mathrm{intersect}(r, s)$. $\exists B \subseteq R\,[\Pi(B) = \mathrm{overlap}(r, s)]$

3.

Follows from Lemma 4.

2.
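When $r \in R$, (2) follows from the definition of a conflict-free range set. So, assume $r \notin R$. Let $C$ comprise all ranges of $A$ contained in $s$. If $s$ intersects no range of $A$, $\Pi(C) = \mathrm{overlap}(r, s)$. If $s$ intersects at least one range of $A$, then

From parts (1) and (2) of this lemma, it follows that there is a resolving subset in $R \cup \{r\}$ for every $s \in R$ that intersects with $r$. Hence, $R \cup \{r\}$ is conflict free.

When $\nexists A \subseteq R\,[\mathrm{range}(\Pi(A)) \wedge \mathrm{start}(\Pi(A)) = u \wedge \mathrm{finish}(\Pi(A)) \le v]$, we say that $\mathrm{maxP}(u, v, R)$ does not exist. Similarly, $\mathrm{minP}(u, v, R)$ may not exist. At times, we use $\mathrm{maxP}$ and $\mathrm{minP}$ as abbreviations for $\mathrm{maxP}(u, v, R)$ and $\mathrm{minP}(u, v, R)$, respectively.

($\Longleftarrow$) Assume $\mathrm{maxY}(u, v, R) \le \mathrm{maxP}(u, v, R) \wedge \mathrm{minX}(u, v, R) \ge \mathrm{minP}(u, v, R)$. When neither $\mathrm{maxY}$ nor $\mathrm{minX}$ exists, no range of $R$ intersects $r$. When $\mathrm{maxY}$ exists, $\exists s = [x, y] \in A\,[x$

If $\nexists t \in C\,[\mathrm{intersect}(r, t)]$, then from Equation 2.3, we get $\forall t \in C\,[\mathrm{disjoint}(r, t) \vee t \subseteq r]$. From this and $r \subseteq \Pi(C)$, it follows that all destination addresses $d$, $d \in r$, are covered by ranges of $C$ that are contained in $r$. Therefore, $\exists B \subseteq C \subseteq A\,(\Pi(B) = r)$. This contradicts Equation 2.1.

Next, suppose $\exists t \in C\,[\mathrm{intersect}(r, t)]$. Let $D$ be the union of the resolving subsets for all of these $t$ and $r$ in $R$. Clearly, all ranges in $D$ are contained in $r$. Further, let $E$ be the subset of all ranges in $C$ that are contained in $r$. It is easy to see that $D \cup E \subseteq A \wedge \Pi(D \cup E) = r$. This contradicts Equation 2.1.

For (2), assume that $\nexists B \subseteq A\,[\Pi(B) = r]$.

($\Longrightarrow$) Assume that $A$ is conflict free. We need to prove

$\nexists s \in A\,[r \subseteq s] \vee [m, n] \in A$ (2.4)

We do this by contradiction. So, assume

$\exists s \in A\,[r \subseteq s] \wedge [m, n] \notin A$ (2.5)

Since $\exists s \in A\,[r \subseteq s]$, $m$ and $n$ are well defined. Equation 2.5 implies that $A$ has a range $[m, y]$, $y > n$, as well as a range $[x, n]$, $x$

($\Longleftarrow$) If no range of $A$ contains $r$, then $r$ is not part of the resolving subset for any pair of intersecting ranges of $R$. This, together with the fact that $R$ is conflict free, implies that $A$ is conflict free. If $[m, n] \in A$, we can use $[m, n]$ in place of $r$ in any resolving subset for intersecting ranges of $R$. Therefore, $A$ has a resolving subset for every pair of intersecting ranges. So, $A$ is conflict free.

2.1.4 Normalized Ranges

Definition 9 [Normalized Ranges] The range set $R$ is normalized iff one of the following is true.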
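Figure 2-3(A) shows a range set that is not normalized (it contains ranges that intersect as well as nested ranges that have common end-points). Figure 2-3(B) shows a normalized range set. Regardless of which of these two range sets is used, every destination $d$ has the same most-specific range.

Definition 10

Figure 2-4: Partitioning a normalized range set into chains

of $CP$ may be combined into a single chain. $CP(N)$ is called a canonical partitioning.

2. For all $i$ and $j$, $1 \le i$

If both $\mathrm{maxP}(u, v - 1, R)$ and $\mathrm{minP}(u + 1, v, R)$ exist and $\mathrm{maxP}(u, v - 1, R) + 1 > \mathrm{minP}(u + 1, v, R) - 1$, $\mathrm{chop}(r) = \emptyset$, where $\emptyset$ denotes the null range. The null range neither intersects nor is contained in any other range.

Define $\mathrm{norm}(R) = \{\mathrm{chop}(r) \mid r \in R \wedge \mathrm{chop}(r) \ne \emptyset\}$.

Lemma 12

If $\mathrm{intersect}(s, r')$, then either $\mathrm{start}(r')$

If $\mathrm{intersect}(r, s)$, then from Lemma 13 we get $\mathrm{disjoint}(\mathrm{chop}(r), \mathrm{chop}(s))$. So, $\mathrm{chop}(r) \ne \mathrm{chop}(s)$.

If $\mathrm{nested}(r, s)$, then from Lemma 12 it follows that $s \subseteq \mathrm{chop}(r) \vee \mathrm{disjoint}(s, \mathrm{chop}(r))$ when $s \subset r$ and $r \subseteq \mathrm{chop}(s) \vee \mathrm{disjoint}(r, \mathrm{chop}(s))$ when $r \subset s$. Consider the former case (the latter case is similar). $s \subseteq \mathrm{chop}(r)$ implies $\mathrm{chop}(s) \ne \mathrm{chop}(r)$. $\mathrm{disjoint}(s, \mathrm{chop}(r))$ also implies $\mathrm{chop}(s) \ne \mathrm{chop}(r)$.

The final case is $\mathrm{disjoint}(r, s)$. In this case, clearly, $\mathrm{chop}(s) \ne \mathrm{chop}(r)$.

For $r' \in \mathrm{norm}(R)$, define $\mathrm{full}(r') = \mathrm{chop}^{-1}(r') = r$, where $r$ is the unique range in $R$ for which $\mathrm{chop}(r) = r'$. Notice that $\mathrm{full}(\mathrm{chop}(r)) = r$ except when $\mathrm{chop}(r) = \emptyset$.

Lemma 15

If $|\mathrm{norm}(R)| \le 1$, $\mathrm{norm}(R)$ is normalized. So, assume that $|\mathrm{norm}(R)| > 1$. Let $r'$ and $s'$ be two different ranges in $\mathrm{norm}(R)$. We need to show that $r'$ and $s'$ satisfy

Let $s \in R$ be such that $\mathrm{start}(s) = x$ and $\mathrm{finish}(s) = \min\{\mathrm{finish}(t) \mid t \in R \wedge \mathrm{start}(t) = x\}$

(a)

Let $s \in R$ be such that $\mathrm{finish}(s) = y$ and $\mathrm{start}(s) = \max\{\mathrm{start}(t) \mid t \in R \wedge \mathrm{finish}(t) = y\}$

(a)

From Lemma 4, it follows that $A$ is conflict free. Further, since $R$ has a subset whose projection equals $[x, y]$, $\Pi(A) = [x, y]$. From Lemma 19, it follows that every $d \in [x, y]$ has a most-specific range in $\mathrm{norm}(A)$. Therefore, $\Pi(\mathrm{norm}(A)) = [x, y]$. From the definition of the chopping rule and that of $A$, we see that $\forall r \in A\,[\mathrm{chop}(r, A) = \mathrm{chop}(r, R)]$. So, $\mathrm{norm}(A) \subseteq \mathrm{norm}(R)$.

2. First, assume that $[x, y] \in R$. Suppose there is a range $r' \in \mathrm{norm}(R)$ such that $r' \subseteq [x, y]$ and $r = \mathrm{full}(r') \notin A$. There are three cases for $r$.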
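When $[x, y] \notin R$, let $R' = R \cup \{[x, y]\}$, $C' = \{\mathrm{full}(r') \mid r' \in \mathrm{norm}(R') \wedge r' \subseteq [x, y]\}$ and $A' = A \cup \{[x, y]\}$. Using the lemma case we have already proved, we get $C' \subseteq A'$. Since $\mathrm{chop}([x, y], R') = \emptyset$ and $\mathrm{chop}(s, R) = \mathrm{chop}(s, R')$ for every $s \in R$, $\mathrm{norm}(R') = \mathrm{norm}(R)$. Therefore, $C = C'$. So, $C \subseteq A'$. Finally, since $[x, y] \notin C$, $C \subseteq A$.

Let $s$ be the smallest range of $R$ that contains $r$. Assume that $s$ exists and that $\mathrm{chop}(r, R \cup \{r\}) \ne \emptyset$.

(a)

For (2a), suppose there are two different ranges $g$ and $h$ in $R$ such that $\mathrm{chop}(g, R) \ne \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) \ne \mathrm{chop}(h, R \cup \{r\})$. From the chopping rule, it follows that

$r \subset g \wedge r \subset h$ (2.6)

Therefore, $\neg\mathrm{disjoint}(g, h)$. From this and Lemma 1, we get $\mathrm{intersect}(g, h) \vee \mathrm{nested}(g, h)$. Equation 2.6 and $\mathrm{intersect}(g, h)$ imply $r \subseteq \mathrm{overlap}(g, h)$. From this and Lemma 13, we get $\mathrm{disjoint}(r, \mathrm{chop}(g, R)) \wedge \mathrm{disjoint}(r, \mathrm{chop}(h, R))$. Therefore, $\mathrm{chop}(g, R) = \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction. So, $\neg\mathrm{intersect}(g, h)$.

If $\mathrm{nested}(g, h)$, we may assume, without loss of generality, that $g \subset h$. This and Equation 2.6 yield $r \subset g \subset h$. Therefore, $\mathrm{maxP}(x, y - 1, R) = \mathrm{maxP}(x, y - 1, R \cup \{r\})$ and $\mathrm{minP}(x + 1, y, R) = \mathrm{minP}(x + 1, y, R \cup \{r\})$, where $h = [x, y]$. So, $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction.

Hence, there can be at most one range of $R$ whose chop() value changes as a result of the addition of $r$. The preceding proof for the case $\mathrm{nested}(g, h)$ also establishes that the chop() value may change only for the range $s$, that is, for the smallest enclosing range of $r$ (i.e., smallest $s \in R\,[r \subset s]$).

For (2b), assume that $\mathrm{chop}(s, R) \ne \mathrm{chop}(s, R \cup \{r\})$. This implies that $\mathrm{chop}(s, R) \ne \emptyset$ and so $x'$ and $y'$ are well defined. (Note that from part (1), we get $\mathrm{chop}(r, R) \ne \emptyset$.) We consider each of the three cases for the relationship between $r$ and $\mathrm{chop}(s, R)$ (Lemma 1).

Using an argument similar to that used in part (2a), we may show that when $\mathrm{chop}(s, R) \subseteq r$, $x' = u' \wedge y' = v'$.

Suppose $x' = u' \wedge y' > v$. If $\mathrm{maxP}(v' + 1, y', R)$ doesn't exist, then $\mathrm{chop}(s, R \cup \{r\}) = [v + 1, y']$. If it exists, $\mathrm{chop}(s, R \cup \{r\}) = [\mathrm{maxP}(v' + 1, y', R) + 1, y']$.

4.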
Suppose $x'$

For (3), $\mathrm{finish}(\mathrm{chop}(s, R \cup \{r\})) = y'$ follows from the proof of Lemma 21(2b). Also, we observe that $\mathrm{maxP}(x, y - 1, R \cup \{r\}) \ge v$. So, (3b) can be false only when $\mathrm{maxP}(x, y - 1, R \cup \{r\}) > v$ and either (a) $\mathrm{maxP}(v' + 1, y', R)$ doesn't exist or (b) $\mathrm{maxP}(v' + 1, y', R)$

2.1.5 Priority Search Trees And Ranges

A priority-search tree (PST) [37] is a data structure that is used to represent a set of tuples of the form $(key1, key2, data)$, where $key1 \ge 0$, $key2 \ge 0$, and no two tuples have the same $key1$ value. The data structure is simultaneously a min-tree on $key2$ (i.e., the $key2$ value in each node of the tree is $\le$ the $key2$ value in each descendent node) and a search tree on $key1$. There are two common PST representations [37]:

1. In a radix priority-search tree (RPST), the underlying tree is a binary radix tree on $key1$.
2. In a red-black priority-search tree (RBPST), the underlying tree is a red-black tree.

McCreight [37] has suggested a PST representation of a collection of ranges with distinct finish points. This representation uses the following mapping of a range $r$ into a PST tuple:

$(key1, key2, data) = (\mathrm{finish}(r), \mathrm{start}(r), data)$ (2.7)

where $data$ is any information (e.g., next hop) associated with the range. Each range $r$ is, therefore, mapped to a point $map1(r) = (x, y) = (key1, key2) = (\mathrm{finish}(r), \mathrm{start}(r))$ in 2-dimensional space. Figure 2-5 shows a set of ranges and the equivalent set of 2-dimensional points $(x, y)$.

McCreight [37] has observed that when the mapping of Equation 2.7 is used to obtain a point set $P = map1(R)$ from a range set $R$, then $\mathrm{ranges}(d)$ is given by

When an RPST is used to represent the point set $P$, the complexity of $enumerateRectangle(x_{left}, x_{right}, y_{top})$ is $O(\log maxX + s)$, where $maxX$ is the largest $x$ value in $P$ and $s$ is the number of points in the query rectangle. When the point set is represented as an RBPST, this complexity becomes $O(\log n + s)$, where $n = |P|$. A point $(x, y)$ (and hence a range $[y, x]$) may be inserted into or deleted from an RPST (RBPST) in $O(\log maxX)$ ($O(\log n)$) time [37].
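The mapping of Equation 2.7 can be tried out with a small sketch in which a plain list stands in for the PST and the rectangle query is a linear scan. The query rectangle used below ($x \ge d$, $0 \le y \le d$) is an assumption derived from the definition of containment, since a range $[y, x]$ contains $d$ exactly when $y \le d \le x$. The function names are hypothetical, and nothing here has the logarithmic bounds of a real RPST or RBPST.

```python
# Illustrative sketch only; a list replaces the priority-search tree, names are assumptions.
import math


def map1(r):
    """Map a range (start, finish) to the point (x, y) = (finish, start) of Equation 2.7."""
    start, finish = r
    return (finish, start)


def enumerate_rectangle(points, x_left, x_right, y_top):
    """All points with x_left <= x <= x_right and 0 <= y <= y_top (linear scan)."""
    return [(x, y) for (x, y) in points if x_left <= x <= x_right and 0 <= y <= y_top]


def ranges_containing(range_set, d):
    """Ranges of range_set that contain d, recovered from the points in the rectangle."""
    points = [map1(r) for r in range_set]
    hits = enumerate_rectangle(points, d, math.inf, d)
    return sorted((y, x) for (x, y) in hits)   # back from point (x, y) to range [y, x]


R = [(0, 31), (10, 11), (16, 19), (18, 19), (23, 23)]   # ranges of Table 1-1
```

Two of the ranges in `R` share the finish point 19. A list does not care, but a PST requires distinct $key1$ values, which is why the point set must be transformed before it can be stored in a PST.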
2.2 Prefixes

Let $R$ be a set of ranges such that each range represents a prefix. It is well known (see Sahni et al. [21], for example) that no two ranges of $R$ intersect. Therefore, $R$ is conflict free. For simplicity, assume that $R$ includes the range that corresponds to the prefix *. With this assumption, $\mathrm{msr}(d)$ is defined for every $d$. From Lemma 9, it follows that $\mathrm{msr}(d)$ is the range $[\mathrm{maxStart}(\mathrm{ranges}(d)), \mathrm{minFinish}(\mathrm{ranges}(d))]$. To find this range easily, we first transform $P = map1(R)$ into a point set $transform1(P)$ so that no two points of $transform1(P)$ have the same $x$-value. Then, we represent $transform1(P)$ as a PST.

Definition 12

performed on PST1 yields $\mathrm{ranges}(d)$. To find $\mathrm{msr}(d)$, we employ the $minXinRectangle(x_{left}, x_{right}, y_{top})$ operation, which determines the point in the defined rectangle that has the least $x$-value. It is easy to see that

$minXinRectangle(2^W d - d + 2^W - 1, \infty, d)$

performed on PST1 yields $\mathrm{msr}(d)$.

To insert the prefix whose range is $[u, v]$, we insert $transform1(map1([u, v]))$ into PST1. In case this prefix is already in PST1, we simply update the next-hop information for this prefix. To delete the prefix whose range is $[u, v]$, we delete $transform1(map1([u, v]))$ from PST1. When deleting a prefix, we must take care not to delete the prefix *. Requests to delete this prefix should simply result in setting the next-hop associated with this prefix to $\emptyset$.

Since $minXinRectangle$, insert, and delete each take $O(W)$ ($O(\log n)$) time when PST1 is an RPST (RBPST), PST1 provides a router-table representation in which longest-prefix matching, prefix insertion, and prefix deletion can be done in $O(W)$ time each when an RPST is used and in $O(\log n)$ time each when an RBPST is used.
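The lookup just described can be checked on a toy table. In the sketch below, the form $transform1(x, y) = (2^W x - y + 2^W - 1,\ y)$ is an assumption inferred from the query rectangle and from the point searched for in Section 2.4.3, because the text of Definition 12 is not available in this copy. A linear scan over a list stands in for PST1, and all function names are hypothetical.

```python
# Illustrative sketch only; transform1 is inferred from the query expressions, a list
# replaces the priority-search tree, and all names are assumptions.
import math

W = 5


def map1(r):
    """Range (start, finish) -> point (x, y) = (finish, start)."""
    start, finish = r
    return (finish, start)


def transform1(point):
    """Assumed form: (x, y) -> (2^W * x - y + 2^W - 1, y); gives distinct x-values."""
    x, y = point
    return ((x << W) - y + (1 << W) - 1, y)


def min_x_in_rectangle(points, x_left, x_right, y_top):
    """Point with the least x among those with x_left <= x <= x_right and 0 <= y <= y_top."""
    hits = [p for p in points if x_left <= p[0] <= x_right and 0 <= p[1] <= y_top]
    return min(hits) if hits else None


def msr(pst1, d):
    """Most-specific range of d via minXinRectangle(2^W d - d + 2^W - 1, infinity, d)."""
    hit = min_x_in_rectangle(pst1, (d << W) - d + (1 << W) - 1, math.inf, d)
    if hit is None:
        return None
    x, y = hit
    return (y, x >> W)        # start = y; finish is the high-order part of x


PREFIX_RANGES = [(0, 31), (10, 11), (16, 19), (18, 19), (23, 23)]   # Table 1-1
PST1 = [transform1(map1(r)) for r in PREFIX_RANGES]
```

With the ranges of Table 1-1, the two ranges that finish at 19 map to different $x$-values, and among ranges with equal finish the one with the larger start gets the smaller $x$, so the least-$x$ point in the rectangle is the narrowest range containing $d$.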
2.3 Nonintersecting Ranges

Let $R$ be a set of nonintersecting ranges. Clearly, $R$ is conflict free. For simplicity, assume that $R$ includes the range $z$ that matches all destination addresses ($z =$

Insertion of a range $r$ is to be permitted only if $r$ does not intersect any of the ranges of $R$. Once we have verified this, we can insert $r$ into PST1 as described in Section 2.2. Range intersection may be verified by noting that there are two cases for range intersection (Definition 2(c)). When inserting $r = [u, v]$, we need to determine if $\exists s = [x, y] \in R\,[u$

Figure 2-6: Insert $r = [u, v]$ into the conflict-free range set $R$

2.4 Conflict-Free Ranges

In this section, we extend the two-PST data structure of Section 2.3 to the general case when $R$ is an arbitrary conflict-free range set. Once again, we assume that $R$ includes the range $z$ that matches all destination addresses. PST1 and PST2 are defined for the range set $R$ as in Sections 2.2 and 2.3.

2.4.1 Determine $\mathrm{msr}(d)$

Since $R$ is conflict free, $\mathrm{msr}(d)$ is determined by Lemma 9. Hence, $\mathrm{msr}(d)$ may be obtained by performing the operation

$minXinRectangle(2^W d - d + 2^W - 1, \infty, d)$

on PST1.

2.4.2 Insert A Range

When inserting a range $r = [u, v] \notin R$, we must insert $transform1(map1(r))$ into PST1 and $transform2(map2(r))$ into PST2. Additionally, we must verify that $R \cup \{r\}$ is conflict free. This verification is done using Lemma 6. Figure 2-6 gives a high-level description of our algorithm to insert a range into $R$.

Step 1 is done by searching for $transform1(map1(r))$ in PST1. For Step 2, we note that

$\mathrm{maxY}(u, v, R) = maxXinRectangle(2^W u - (u - 1) + 2^W - 1,\ 2^W (v - 1) + 2^W - 1,\ u - 1)$

$\mathrm{minX}(u, v, R) = minXinRectangle(2^W (u + 1),\ 2^W v + (2^W - 1) - v - 1,\ (2^W - 1) - v - 1)$

Figure 2-7: Delete the range $r = [u, v]$ from the conflict-free range set $R$

2.4.3 Delete A Range

Suppose we are to delete the range $r = [u, v]$. This deletion is to be permitted iff $r \ne z$ and $A = R - \{r\}$ is conflict free. Figure 2-7 gives a high-level description of our algorithm to delete $r$. Its correctness follows from Lemma 8.

Step 2 employs the standard PST algorithm to delete a point [37]. For Step 3, we note that $A$ has a subset whose projection equals $r = [u, v]$ iff $\mathrm{maxP}(u, v, A) = v$. In Section 2.4.4, we show how $\mathrm{maxP}(u, v, A)$ may be computed efficiently. For Step 5, we note that $r = [u, v] \subseteq s = [x, y]$ iff $x \le u \wedge y \ge v$. So, $A$ has such a range iff

$minXinRectangle(2^W v - u + 2^W - 1, \infty, u)$

exists in PST1.
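The Step 5 test can be sketched in the same brute-force style. As before, the form of $transform1$ is an assumption inferred from the query expressions, a list of points stands in for PST1, and the names `enclosing_range` and `build_pst1` are hypothetical. The sketch only illustrates why the rectangle with left edge $2^W v - u + 2^W - 1$ and top edge $u$ captures exactly the ranges $[x, y]$ with $x \le u$ and $y \ge v$.

```python
# Illustrative sketch only; transform1 is inferred, a list replaces PST1, names are assumptions.
import math

W = 5


def transform1_of_range(r):
    """Range (start, finish) -> assumed PST1 point (2^W * finish - start + 2^W - 1, start)."""
    start, finish = r
    return ((finish << W) - start + (1 << W) - 1, start)


def build_pst1(range_set):
    return [transform1_of_range(r) for r in range_set]


def enclosing_range(pst1, r):
    """Some range [x, y] with x <= u and y >= v for r = [u, v], or None if there is none.
    Mirrors minXinRectangle(2^W v - u + 2^W - 1, infinity, u)."""
    u, v = r
    x_left = (v << W) - u + (1 << W) - 1
    hits = [p for p in pst1 if x_left <= p[0] <= math.inf and 0 <= p[1] <= u]
    if not hits:
        return None
    x, y = min(hits)          # least transformed x-value in the rectangle
    return (y, x >> W)


A = [(0, 31), (16, 19), (23, 23)]      # a range set after [18, 19] has been removed
PST1_A = build_pst1(A)
```

For example, with `A` as above, the query for $r = [18, 19]$ finds $[16, 19]$, while a set that holds only $[16, 19]$ has no range enclosing $[20, 22]$ and the query returns nothing.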
In Step 6, we assume that maxXinRectangle and minXinRectangle return the range of R that corresponds to the desired point in the rectangle. To determine whether $[m, n] \in A$ (Step 7), we search for the point $(2^W n - m + 2^W - 1, m)$ in PST1 using the standard PST search algorithm [37]. The reinsertion into PST1 and PST2, if necessary, is done using the standard PST insert algorithm [37].

PAGE 49

Figure 2-8: Simple algorithm to compute maxP(u, v, R), where [u, v] is a range and conflictFree(R)

2.4.4 Computing

2.4.5 A Simple Algorithm to Compute

Therefore, $\nexists r' \in norm(R)\,[start(r') = u] \Rightarrow \nexists\, maxP$. From Lemma 18, it follows that $start(full(r')) \ne start(r') = u \Rightarrow \nexists s \in R\,[start(s) = start(r') = u]$. So, $start(full(r')) \ne u \Rightarrow \nexists\, maxP$. Finally, $u = start(r') = start(full(r'))$ implies $finish(full(r')) = \min\{finish(t) \mid t \in R \wedge start(t) = u\}$ (Lemma 17(1)). So, $finish(full(r')) > v$ implies $\nexists s \in R\,[start(s) = u \wedge finish(s) \le v]$. Hence, $start(r') = u \wedge finish(full(r')) > v \Rightarrow \nexists\, maxP$. Further, when

PAGE 50

We get to Step 2 only when maxP exists. From the definition of maxP, $\exists A \subseteq R\,[\Pi(A) = [u, maxP]]$. From this and Lemma 20(1), we get $\exists B \subseteq norm(R)\,[\Pi(B) = [u, maxP]]$. Now, from Lemma 10, we get $\exists D \subseteq norm(R)\,[chain(D) \wedge \Pi(D) = [u, maxP]]$. From Lemma 11, it follows that D is a sub-chain of the unique chain $C_i \in CP(norm(R))$ that includes $r'$. Let $r', s'_1, s'_2, \ldots, s'_q$ be the tail of $C_i$. It follows that maxP is either $finish(r')$ or $finish(s'_j)$ for some j in the range [1, q]. Let j be the least integer such that $full(s'_j) \not\subseteq [u, v]$. If such a j does not exist, then $maxP = finish(s'_q)$ as norm(R) has no subset whose projection equals [u, x] for any $x > finish(s'_q)$. So, assume that j exists. From Lemma 20(2), it follows that maxP

PAGE 51

To implement Step 1 of Figure 2-8, we search endPointsTree for the point u. If $u \notin$ endPointsTree, then $\nexists r' \in norm(R)\,[start(r') = u]$. If $u \in$ endPointsTree, then we use the pointer in the node for u to get to the root of the RBT that has $r'$. A search in this RBT for u locates $r'$. We may now perform the remaining checks of Step 1 using the data associated with $r'$.
Suppose that maxP exists. At the start of Step 2, we are positioned at the RBT node that represents $r'$. This is node 0 of Figure 2-9. We need to find $s' \in norm(R)$ with least $s'$ such that $start(s') > finish(r') \wedge full(s') \not\subseteq [u, v]$. If there is no such $s'$, then $maxP = \max\{finish(root.range), root.maxFinishRight\}$. If such an $s'$ exists, $maxP = start(s') - 1$.

PAGE 52

Figure 2-9: An example RBT

PAGE 53

2.4.7 Wrapping Up Insertion of a Range

Now that we have augmented PST1 and PST2 with a collection of RBTs and an endPointsTree, whenever we insert a range r = [u, v] into R, we must update not only PST1 and PST2 as described in Section 2.4.2, but also the RBT collection and endPointsTree. To do this, we first compute $chop(r, R \cup \{r\}) = chop(r, R) = [u', v']$ by first computing minP(u + 1, v) and maxP(u, v - 1) as described in Section 2.4.4. $[u', v']$ is now easily obtained from the chopping rule. Lemma 21 tells us that the only $s \in R$ whose chop() value may change as a result of the insertion of r is the smallest enclosing range of r. Since $z \in R$ and $r \ne z$, such an s must exist. Rather than search for this s explicitly, we use the cases (2)-(4) conditions of Lemma 22 to find $s' = chop(s, R)$ in endPointsTree. Note that if $chop(s, R) = \emptyset$, the search in endPointsTree will not find s; but when $chop(s, R) = \emptyset$, $chop(s, R \cup \{r\}) = \emptyset$. So, no change in chop(s, R) is called for.

Note that the insertion of r may combine two chains of CP(norm(R)). In this case, we use the join operation of red-black trees to combine the RBTs corresponding to these two chains.

PAGE 54

2.4.8 Wrapping Up Deletion of a Range

When $chop(r, R) = \emptyset$, no changes are to be made to the RBTs and endPointsTree (Lemma 23(1)). So, assume that $chop(r, R) \ne \emptyset$. We first find s, the smallest range that contains r (see Lemma 23(2)). Note that since $z \in R$ and $r \ne z$, s exists. One may verify that s is one of the ranges given by the following two operations.

$\mathrm{minXinRectangle}(2^W v - u + 2^W - 1,\ \infty,\ u)$

$\mathrm{maxXinRectangle}(0,\ 2^W u + 2^W - 1 - v,\ 2^W - 1 - v)$

where the first operation is done in PST1 and the second in PST2 (both operations are done after transform1(map1([u, v])) has been deleted from PST1 and transform2(map2([u, v])) has been deleted from PST2). The ranges returned by these two operations may be compared to determine which is s.
Once we have identified s, Lemma 23(2) is used to determine $chop(s, R - \{r\})$. Assume that $chop(s, R) \ne \emptyset$. Let $chop(r, R) = r' = [u', v']$ and $chop(s, R) = s' = [x', y']$. When $s'$ and $r'$ are in different RBTs (this is the case when $r' \subset s'$), $chop(s, R) = chop(s, R - \{r\})$ and the RBT that contains $s'$ may need to be split into two RBTs. When $s'$ and $r'$ are in the same RBT, they are in the same chain of CP(norm(R)). If $s'$ and $r'$ are adjacent ranges of this chain, we may simply remove the RBT node for $r'$ and update that for $s'$ to reflect its new start or finish point (only one may change). When $r'$ and $s'$ are not adjacent ranges, the nodes for these two ranges are removed from the RBT (this may split the RBT into up to two RBTs) and $chop(s, R - \{r\})$ inserted. Figure 2-11 shows the different cases.

2.4.9 Complexity

The portions of the search, insert, and delete algorithms that deal only with PST1 and PST2 have the same asymptotic complexity as their counterparts for the case of nonintersecting ranges (Section 2.3). The portions that deal with the RBTs and endPointsTree require a constant number of search, insert, delete, join, and split

PAGE 55

2.5 Experimental Results

2.5.1 Prefixes

We programmed our red-black priority-search tree algorithm for prefixes (Section 2.2) in C++ and compared its performance to that of the ACRBT of Sahni et al. [22]. Recall that the ACRBT is the best performing O(log n) data structure reported in [22] for dynamic prefix-tables. For test data, we used six IPv4 prefix databases obtained from [38]. The number of prefixes in each of these databases as well as the memory requirements for each database of prefixes using our data structure (PST) of Section 2.2 as well as the ACRBT structure of Sahni et al. [22] are shown in Table 2-1. The databases Paix1, Pb1, MaeWest and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained Sept. 13, 2000. Figure 2-12 is a plot of the data of Table 2-1. As can be seen, the ACRBT structure takes almost three times as much memory as is taken by the PST structure. Further, the memory requirement of the PST structure can be reduced to about 50% that of our current implementation. This reduction requires an n-node implementation of a priority-search tree as described in [37] rather than our current implementation, which uses 2n - 1 nodes as in [39].

PAGE 56

Figure 2-12: Memory usage
Table 2-1: Memory usage (KB)

Database          Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Num of Prefixes   16172   22225   28889    31827   35303   85988
PST               884     1215    1579     1740    1930    4702
ACRBT             2417    3331    4327     4769    5305    12851

To obtain the mean time to find the longest matching-prefix (i.e., to perform a search), we started with a PST or ACRBT that contained all prefixes of a prefix database. Next, a random permutation of the set of start points of the ranges corresponding to the prefixes was obtained. This permutation determined the order in which we searched for the longest matching-prefix for each of these start points. The time required to determine all of these longest-matching prefixes was measured

PAGE 57

Table 2-2: Prefix times (µsec) on a 500MHz Sun Blade 100 workstation

Operation  Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search     PST        Mean   2.88    3.06    3.25     3.31    3.43    4.06
                      Std    0.36    0.18    0.17     0.16    0.09    0.05
           ACRBT      Mean   2.60    2.77    2.87     2.87    3.09    3.51
                      Std    0.25    0.16    0.16     0.12    0.13    0.04
Insert     PST        Mean   3.90    4.45    4.83     5.18    5.14    6.04
                      Std    0.57    0.63    0.51     0.48    0.19    0.20
           ACRBT      Mean   21.15   23.42   24.77    25.36   25.54   28.07
                      Std    1.11    0.66    0.38     0.29    0.19    0.18
Delete     PST        Mean   4.36    4.45    4.73     4.71    5.06    5.48
                      Std    0.91    0.63    0.53     0.00    0.19    0.16
           ACRBT      Mean   21.24   22.68   23.16    23.71   24.56   25.64
                      Std    0.95    0.55    0.49     0.35    0.26    0.21

To obtain the mean time to insert a prefix, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. This experiment was repeated 20 times and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities are given in Table 2-2 for our Sun workstation. As can be seen, insertions into a PST take between 18% and 22% the time to insert into an ACRBT!
The mean and standard deviation data reported in Table 2-2 for the delete operation were obtained in a similar fashion by starting with a data structure that

PAGE 58

Tables 2-3 and 2-4 give the corresponding times on a 700MHz Pentium III PC and a 1.4GHz Pentium 4 PC, respectively. Both computers have a 256KB L2 cache. The run times on our 700MHz Pentium III are about one-half the times on our Sun workstation. Surprisingly, when going from the 700MHz Pentium III to the 1.4GHz Pentium 4, the measured time to find the longest matching-prefix decreased by only about 5% for PST. More surprisingly, the corresponding times for ACRBT actually increased. The net result of the slight decrease in time for PST and the increase for ACRBT is that, on our Pentium 4 PC, the PST is faster than the ACRBT on all three operations: find longest matching-prefix, insert, and delete. This somewhat surprising behavior is due to architectural differences (e.g., differences in width and size of L1 cache lines) between the Pentium III and 4 processors.

Table 2-3: Prefix times (µsec) on a 700MHz Pentium III PC

Operation  Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search     PST        Mean   1.39    1.54    1.61     1.65    1.70    1.97
                      Std    0.27    0.22    0.17     0.14    0.00    0.04
           ACRBT      Mean   1.36    1.44    1.44     1.49    1.54    1.80
                      Std    0.25    0.18    0.13     0.14    0.14    0.06
Insert     PST        Mean   2.41    2.63    2.60     2.83    2.80    3.07
                      Std    0.87    0.30    0.53     0.43    0.40    0.14
           ACRBT      Mean   11.97   12.63   13.48    13.62   13.77   14.93
                      Std    0.95    0.67    0.24     0.48    0.35    0.18
Delete     PST        Mean   2.32    2.38    2.49     2.45    2.55    2.91
                      Std    0.82    0.61    0.52     0.47    0.00    0.17
           ACRBT      Mean   11.69   12.55   12.95    13.01   13.40   14.10
                      Std    0.87    0.63    0.54     0.44    0.48    0.16

Figures 2-13, 2-14, and 2-15 histogram the search, insert, and delete time data of the preceding tables.
PAGE 59

Table 2-4: Prefix times (µsec) on a 1.4GHz Pentium 4 PC

Operation  Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search     PST        Mean   1.30    1.44    1.51     1.52    1.63    1.92
                      Std    0.19    0.18    0.17     0.13    0.13    0.06
           ACRBT      Mean   1.48    1.69    1.83     1.87    1.87    2.24
                      Std    0.31    0.20    0.16     0.07    0.14    0.05
Insert     PST        Mean   1.76    1.96    2.18     2.17    2.38    2.65
                      Std    0.41    0.69    0.00     0.44    0.35    0.18
           ACRBT      Mean   11.22   11.81   12.41    12.91   12.92   13.94
                      Std    0.41    0.60    0.41     0.44    0.26    0.18
Delete     PST        Mean   1.76    1.69    1.92     1.93    2.00    2.22
                      Std    0.41    0.60    0.38     0.21    0.42    0.17
           ACRBT      Mean   9.46    10.39   10.54    10.42   10.92   11.64
                      Std    0.57    0.63    0.38     0.21    0.42    0.16

2.5.2 Nonintersecting Ranges

To benchmark our algorithm for nonintersecting ranges (Section 2.3), we generated three different sets of random nonintersecting ranges. These, respectively, had

PAGE 60

Table 2-5: Nonintersecting Ranges. 700MHz PIII

Num of Ranges              30000   50000   80000
Memory Usage (KB)          3360    5600    8960
Search (µsec)      Mean    1.92    2.19    2.51
                   Std     0.15    0.04    0.06
Insert (µsec)      Mean    8.65    9.27    9.88
                   Std     0.49    0.29    0.17
Remove (µsec)      Mean    5.75    6.42    6.81
                   Std     0.44    0.28    0.14

2.5.3 Conflict-free Ranges

Table 2-6 gives the memory required as well as the mean times and standard deviations for the case of conflict-free ranges. The range sequence used is generated so that when the ranges are inserted in sequence order, there are no conflicts. For deletion, 33% of the ranges are removed in the reverse of the insert order.
PAGE 61

Table 2-6: Conflict-free Ranges. PIII 700MHz with 256K L2 cache

Num of Ranges in R                  30000   50000   80000
Num of Ranges in norm(R)    Mean    29688   48868   76472
                            Std     18.03   42.90   60.05
Memory Usage (KB)           Mean    6240    9979    15219
                            Std     7.06    10.91   11.19
Search (µsec)               Mean    1.98    2.34    2.69
                            Std     0.07    0.09    0.06
Insert (µsec)               Mean    18.45   19.65   20.76
                            Std     0.51    0.27    0.27
Remove (µsec)               Mean    19.3    20.49   21.60
                            Std     0.41    0.13    0.29

2.6 Conclusion

We have developed data structures for dynamic router tables. Our data structures permit one to search, insert, and delete in O(log n) time each. Although O(log n) time data structures for prefix tables were known prior to our work [21, 22], our data structure is more memory efficient than the data structures of Sahni et al. [21, 22]. Further, our data structure is significantly superior on the insert and delete operations, while being competitive on the search operation.

For nonintersecting ranges and conflict-free ranges our data structures are the first to permit O(log n) search, insert, and delete.

PAGE 62

In this chapter, we focus on data structures for dynamic NHPRTs, HPPTs and LMPTs. In Section 3.2, we develop the data structure binary tree on binary tree (BOB). This data structure is proposed for the representation of dynamic NHPRTs. Using BOB, a lookup takes $O(\log^2 n)$ time and cache misses; a new rule may be inserted and an old one deleted in O(log n) time and cache misses. For HPPTs, we propose a modified version of BOB, PBOB (prefix BOB), in Section 3.3. Using PBOB, a lookup, rule insertion and deletion each take O(W) time and cache misses. In Section 3.4, we develop the data structure LMPBOB (longest matching-prefix BOB) for LMPTs. Using LMPBOB, the longest matching-prefix may be found in O(W) time and O(log n) cache misses; rule insertion and deletion each take O(log n) time and cache misses. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in O(log n) time and with O(log n) cache misses. Section 3.1 introduces some terminology and experimental results are presented in Section 3.6.

3.1 Preliminaries

Definition 13

PAGE 63

Notice that every prefix of a prefix router-table may be represented as a range. For example, when W = 6, the prefix P = 1101* matches addresses in the range [52, 55]. So, we say P = 1101* = [52, 55], start(P) = 52, and finish(P) = 55.
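The prefix-to-range correspondence in the W = 6 example can be sketched as follows. The names Range and prefixToRange are hypothetical, and the prefix is assumed to be supplied as a string of '0' and '1' characters without the trailing *; this is an illustration of the arithmetic, not the thesis code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// A range [start, finish] of destination addresses.
struct Range { std::uint32_t start, finish; };

// Converts a prefix, given as a bit string without the trailing '*',
// into the range of W-bit addresses that it matches (W <= 32 assumed).
Range prefixToRange(const std::string& bits, unsigned W) {
    assert(W <= 32 && bits.size() <= W);
    std::uint64_t value = 0;
    for (char c : bits) {
        assert(c == '0' || c == '1');
        value = (value << 1) | static_cast<std::uint64_t>(c == '1');
    }
    // The bits not fixed by the prefix range over all 0s (start) to all 1s (finish).
    const unsigned freeBits = W - static_cast<unsigned>(bits.size());
    const std::uint64_t start = value << freeBits;
    const std::uint64_t finish = start | ((std::uint64_t{1} << freeBits) - 1);
    return {static_cast<std::uint32_t>(start), static_cast<std::uint32_t>(finish)};
}
```

For the prefix 1101 with W = 6, the two free bits give start 110100 = 52 and finish 110111 = 55, which agrees with the range [52, 55] stated in the text.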
Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example, $[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range. For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$, which is not a range.

Definition 14

The predicate disjoint(r, s) is true iff r and s are disjoint. $disjoint(r, s) \iff overlap(r, s) = \emptyset \iff v$

PAGE 64

The predicate intersect(r, s) is true iff r and s have a nonempty intersection that is different from both r and s. $intersect(r, s) \iff r \cap s \ne \emptyset \wedge r \cap s \ne r \wedge r \cap s \ne s \iff \neg disjoint(r, s) \wedge \neg nested(r, s) \iff u$

PAGE 65

3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs): BOB

3.2.1 The Data Structure

The data structure binary tree on binary tree (BOB) that is being proposed here for NHRTs comprises a single balanced binary search tree at the top level. This top-level balanced binary search tree is called the point search tree (PTST). For an n-rule NHRT, the PTST has at most 2n nodes (we call this the PTST size constraint). The size constraint is necessary to enable O(log n) update. With each node z of the PTST, we associate a point, point(z). The PTST is a standard red-black binary search tree (actually, any binary search tree structure that supports efficient search, insert, and delete may be used) on the point(z) values of its node set [24]. That

PAGE 66

Let R be the set of nonintersecting ranges of the NHRT. Each range of R is stored in exactly one of the nodes of the PTST. More specifically, the root of the PTST stores all ranges $r \in R$ such that $start(r) \le point(root) \le finish(r)$; all ranges $r \in R$ such that finish(r)

PAGE 67

Table 3-1: A nonintersecting range set

priority 4 [2,4] 33 [2,3] 34 [8,68] 10 [8,50] 9 [10,50] 20 [10,35] 3 [15,33] 5 [16,30] 30 [54,66] 18 [60,65] 7 [69,72] 10 [80,80] 12

Let ranges(z) be the subset of ranges of R allocated to node z of the PTST. Since the PTST may have as many as 2n nodes and since each range of R is in exactly one of the sets ranges(z), some of the ranges(z) sets may be empty.

The ranges in ranges(z) may be ordered using the

PAGE 68

where p(x) = priority(range(x)). Figure 3-2 gives a possible RST structure for ranges(30) of Figure 3-1. Each node shows (range(x), p(x), mp(x)).
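The range predicates of Section 3.1 can be checked mechanically against the ranges listed in Table 3-1. The sketch below is only illustrative: the closed forms of disjoint and nested are assumptions written from the set-based definitions (the exact inequalities of Definition 14 are not fully legible here), and Range, table31, and nonintersecting are hypothetical names. Only the twelve ranges that are legible in Table 3-1 are included.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Range { unsigned start, finish; };

// r and s share no point.
bool disjoint(const Range& r, const Range& s) {
    return r.finish < s.start || s.finish < r.start;
}

// One of r, s contains the other (assumed meaning of nested).
bool nested(const Range& r, const Range& s) {
    return (r.start <= s.start && s.finish <= r.finish) ||
           (s.start <= r.start && r.finish <= s.finish);
}

// Nonempty intersection that differs from both r and s.
bool intersect(const Range& r, const Range& s) {
    return !disjoint(r, s) && !nested(r, s);
}

// The legible ranges of Table 3-1 (priorities omitted).
std::vector<Range> table31() {
    return {{2, 4}, {2, 3}, {8, 68}, {8, 50}, {10, 50}, {10, 35},
            {15, 33}, {16, 30}, {54, 66}, {60, 65}, {69, 72}, {80, 80}};
}

// True iff no two ranges of the set intersect.
bool nonintersecting(const std::vector<Range>& R) {
    for (std::size_t i = 0; i < R.size(); ++i)
        for (std::size_t j = i + 1; j < R.size(); ++j)
            if (intersect(R[i], R[j])) return false;
    return true;
}
```

Every pair in the table is either disjoint or nested, so the check succeeds; adding a range such as [3, 9], which partially overlaps [2, 4], would make it fail.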
Figure 3-2: An example RST for ranges(30) of Figure 3-1

Lemma 26

1. For every node y in the right subtree of x, $st(y) \ge st(x)$ and $fn(y) \le fn(x)$.
2. For every node y in the left subtree of x, $st(y) \le st(x)$ and $fn(y) \ge fn(x)$.

3.2.2 Search for

The highest-priority range that matches the destination address d may be found by following a path from the root of the PTST toward a leaf of the PTST. Figure 3-3 gives the algorithm. For simplicity, this algorithm finds hp = priority(hpr(d)) rather than hpr(d). The algorithm is easily modified to return hpr(d) instead.

PAGE 69

We begin by initializing $hp = -1$ and z is set to the root of the PTST. This initialization assumes that all priorities are $\ge 0$. The variable z is used to follow a path from the root toward a leaf. When d > point(z), d may be matched only by ranges in RST(z) and those in the right subtree of z. The method RST(z)->hpRight(d, hp) (Figure 3-4) updates hp to reflect any matching ranges in RST(z). This method makes use of the fact that d > point(z). Consider a node x of RST(z). If d > fn(x), then d is to the right (i.e., d > finish(range(x))) of range(x) and also to the right of all ranges in the right subtree of x. Hence, we may proceed to examine the ranges in the left subtree of x. When $d \le fn(x)$, range(x) as well as all ranges in the left subtree of x match d. Additional matching ranges may be present in the right subtree of x. hpLeft(d, hp) is the analogous method for the case when d

PAGE 70

3.2.3 Insert a Range

A range r that is known to have no intersection with any of the existing ranges in the router table may be inserted using the algorithm of Figure 3-5. In the while loop, we find the node z nearest to the root such that r matches point(z) (i.e., $start(r) \le point(z) \le finish(r)$). If such a z exists, the range r is inserted into RST(z) using the standard red-black insertion algorithm [24]. During this insertion, it is necessary to update some of the mp values on the insert path. This update is done easily. In case the PTST has no z such that r matches point(z), we insert a new node into the PTST. This insertion is done using the method insertNewNode.
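The hpRight descent described in Section 3.2.2 can be sketched as below. This is a reconstruction from the prose, not the code of Figure 3-4. It assumes that mp(x) is the maximum priority of the ranges stored in the subtree rooted at x; RstNode and its field layout are hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// One node of a range search tree RST(z). All ranges in RST(z) contain point(z).
struct RstNode {
    unsigned st, fn;        // start and finish of range(x)
    int p;                  // priority of range(x)
    int mp;                 // assumed: maximum priority in the subtree rooted here
    const RstNode* left;    // ranges that enclose range(x)
    const RstNode* right;   // ranges nested inside range(x)
};

// Returns hp updated with the priorities of all ranges in the tree that match d.
// Precondition: d > point(z), so a range matches d exactly when d <= fn.
int hpRight(const RstNode* x, unsigned d, int hp) {
    while (x != nullptr) {
        if (d > x->fn) {
            // d is to the right of range(x) and of every range in the right subtree.
            x = x->left;
        } else {
            // range(x) and every range in the left subtree match d.
            hp = std::max(hp, x->p);
            if (x->left != nullptr) hp = std::max(hp, x->left->mp);
            x = x->right;   // more matches may exist among the nested ranges
        }
    }
    return hp;
}
```

The loop visits one node per level, so the cost is proportional to the height of the RST. The condition `mp > hp` mentioned in Section 3.5.2 could be added to the loop test as a pruning step; it is left out here to keep the sketch close to the description above.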
To insert a new node into the PTST, we first create a new PTST node y and define point(y) and RST(y). point(y) may be set to any destination address matched by r (i.e., any address such that $start(r) \le point(y) \le finish(r)$). In our implementation, we use point(y) = start(r). RST(y) has only a root node and this root contains r; its mp value is priority(r). If the PTST is currently empty, y becomes the new root and we are done. Otherwise, the new node y may be inserted where the search conducted in the while loop of Figure 3-5 terminated. That

PAGE 71

We note that if the number of nodes in the PTST was at most $2|R|$, where $|R|$ is the number of ranges prior to the insertion of a new range r, then following the insertion, $|PTST| \le 2|R| + 1 < 2(|R| + 1)$, where $|PTST|$ is the number of nodes in the PTST and $|R| + 1$ is the number of ranges following the insertion of r. Hence an insert does not violate the PTST size constraint.

PAGE 72

3.2.4 Red-Black-Tree Rotations

Figures 3-6 and 3-7, respectively, show the red-black LL and RR rotations used to rebalance a red-black tree following an insert or delete (see [24]). In these figures, pt() is an abbreviation for point(). Since the remaining rotation types, LR and RL, may, respectively, be viewed as an RR rotation followed by an LL rotation and an LL rotation followed by an RR rotation, it suffices to examine LL and RR rotations alone.

Figure 3-6: LL rotation

Figure 3-7: RR rotation

Lemma 27

PAGE 73

Let x and y be as in Figures 3-6 and 3-7. From Lemma 27, it follows that $ranges(z) = ranges'(z)$ for all z in the PTST except possibly for $z \in \{x, y\}$. It is not too difficult to see that $ranges'(y) = ranges(y) \cup S$ and $ranges'(x) = ranges(x) - S$, where

$S = \{r \mid r \in ranges(x) \wedge start(r) \le point(y) \le finish(r)\}$

The range rMax of S with largest start() value may be found by searching RST(x) for the range with largest start() value that matches point(y). (Note that rMax = msr(point(y), ranges(x)).) Since RST(x) is a binary search tree of an ordered set of ranges (Definition 18), rMax may be found in O(height(RST(x))) time by following a path from the root downward. If rMax doesn't exist, $S = \emptyset$, $ranges'(x) = ranges(x)$ and $ranges'(y) = ranges(y)$.
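The effect of a rotation on ranges(x) and ranges(y) can be sketched on plain vectors. The code below moves the set S of ranges that match point(y) from x to y. It uses a linear pass with std::stable_partition in place of the red-black split and join that the data structure relies on, so it illustrates the set identities only; Range and moveMatching are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range { unsigned start, finish; int priority; };

// Moves S = { r in rangesX : start(r) <= pointY <= finish(r) } from rangesX
// to rangesY, so that afterwards rangesY = ranges(y) U S and rangesX = ranges(x) - S.
void moveMatching(std::vector<Range>& rangesX, std::vector<Range>& rangesY,
                  unsigned pointY) {
    auto matches = [pointY](const Range& r) {
        return r.start <= pointY && pointY <= r.finish;
    };
    // Ranges that match point(y) are gathered at the front, order preserved.
    auto firstKept = std::stable_partition(rangesX.begin(), rangesX.end(), matches);
    rangesY.insert(rangesY.end(), rangesX.begin(), firstKept);
    rangesX.erase(rangesX.begin(), firstKept);
}
```

If no range of x matches point(y), S is empty and both vectors are left unchanged, which corresponds to the case in which rMax does not exist.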
PAGE 74

Figure 3-8: ranges(x) and ranges(y) for LL and RR rotations. Nodes x and y are as in Figures 3-6 and 3-7

Assume that rMax exists. We may use the split operation [24] to extract from RST(x) the ranges that belong to S. The operation RST(x)->split(small, rMax, big) separates RST(x) into an RST small of ranges that are < rMax (Definition 18) and an RST big of ranges that are > rMax. We see that RST'(x) = big and RST'(y) = join(small, rMax, RST(y)), where join [24] combines the red-black tree small with ranges

PAGE 75

3.2.5 Delete a Range

Figure 3-9 gives our algorithm to delete a range r. Note that if r is one of the ranges in the PTST, then r must be in the RST of the node z that is closest to the root and such that r matches point(z). The while loop of Figure 3-9 finds this z and deletes r from RST(z).

Assume that r is, in fact, one of the ranges in our PTST. To delete r from RST(z), we use the standard red-black deletion algorithm [24] modified to update mp values as necessary. Following the deletion of r from RST(z) we perform a cleanup operation that is necessary to maintain the size constraint of the PTST. Figure 3-10 gives the steps in the method cleanup.

PAGE 76

Notice that following the deletion of r from RST(z), RST(z) may or may not be empty. If RST(z) becomes empty and the degree of node z is either 0 or 1, node z is deleted from the PTST using the standard red-black node deletion algorithm [24]. If this deletion requires a rotation (at most one rotation may be required) the rotation is done as described in Section 3.2.4. Since the number of ranges and the number of nodes have each decreased by 1, the size constraint may be violated (this happens if $|PTST| = 2|R|$ prior to the delete). Hence, it may be necessary to remove a node from the PTST to restore the size constraint.

If RST(z) becomes empty and the degree of z is 2 or if RST(z) does not become empty, z is not deleted from the PTST. Now, $|PTST|$ is unchanged by the deletion of r and $|R|$ reduces by 1. Again, it is possible that we have a size constraint violation. If so, up to two nodes may have to be removed from the PTST to restore the size constraint.
The size constraint, if violated, is restored in the while loop of Figure 3-10. This restoration is done by removing one or two (as needed) degree 0 or degree 1 nodes that have an empty RST. Lemma 28 shows that whenever the size constraint is violated, the PTST has at least one degree 0 or degree 1 node with an empty RST. So, the node z needed for deletion in each iteration of the while loop always exists.

Lemma 28

PAGE 77

To find the degree 0 and degree 1 nodes that have an empty RST efficiently, we maintain a doubly-linked list of these nodes. Also, a doubly-linked list of degree 2 nodes that have an empty RST is maintained. When a range is inserted or deleted, PTST nodes may be added to or removed from these doubly-linked lists and nodes may move from one list to another. The required operations can be done in O(1) time each.

3.2.6 Expected Complexity of BOB

Let maxR be the maximum number of ranges that match any destination address. So, $|ranges(z)| = |RST(z)| \le maxR$ for every node z of the PTST. We may, therefore, restate the complexity of the BOB operations (lookup, insert, delete) as O(log n log maxR), O(log n), and O(log n), respectively.

Sahni et al. [21] have analyzed the prefixes in several real IPv4 prefix router-tables. They report that a destination address is matched by about 1 prefix on average; the maximum number of prefixes that match a destination address is at most 6. Making the assumption that this analysis holds true even for real range router-tables (no data is available for us to perform such an analysis), we conclude that $maxR \le 6$. So, the expected complexity of BOB on real router-tables is O(log n) per operation.

PAGE 78

3.3 Highest-Priority Prefix-Tables (HPPTs): PBOB

3.3.1 The Data Structure

When all rule filters are prefixes, $maxR \le \min\{n, W\}$. Hence, if BOB is used to represent an HPPT, the search complexity is $O(\log n \cdot \min\{\log n, \log W\})$; the insert and delete complexities are O(log n) each.
Since $maxR \le 6$ for real prefix router-tables, we may expect to see better performance using a simpler structure (i.e., a structure with smaller overhead and possibly worse asymptotic complexity) for ranges(z) than the RST structure described in Section 3.2. In PBOB, we replace the RST in each node, z, of the BOB PTST with an array linear list [41], ALL(z), of pairs of the form (pLength, priority), where pLength is a prefix length (i.e., number of bits) and priority is the prefix priority. ALL(z) has one pair for each range $r \in ranges(z)$. The pLength value of this pair is the length of the prefix that corresponds to the range r and the priority value is the priority of the range r. The pairs in ALL(z) are in ascending order of pLength. Note that since the ranges in ranges(z) are nested and match point(z), the corresponding prefixes have different lengths.

3.3.2 Lookup

Figure 3-11 gives the algorithm to find the priority of the highest-priority prefix that matches the destination address d. The method maxp() returns the highest priority of any prefix in ALL(z) (note that all prefixes in ALL(z) match point(z)). The method searchALL(d, hp) examines the prefixes in ALL(z) and updates hp taking into account the priorities of those prefixes in ALL(z) that match d.

The method searchALL(d, hp) utilizes the following lemma. Consequently, it examines prefixes of ALL(z) in increasing order of length until either all prefixes have been examined or until the first (i.e., shortest) prefix that doesn't match d is examined.

PAGE 79

One way to determine whether a length l prefix of ALL(z) matches d is to use the following lemma. The check of this lemma may be implemented using a mask to extract the most-significant bits of point(z) and d.

Lemma 30

PAGE 80

3.3.3 Insertion and Deletion

The PBOB algorithms to insert/delete a prefix are simple adaptations of the corresponding algorithms for BOB. rMax is found by examining the prefixes in ALL(x) in increasing order of length. ALL'(y) is obtained by prepending the prefixes in ALL(x) whose length is $\le$ the length of rMax to ALL(y), and ALL'(x) is obtained from ALL(x) by removing the prefixes whose length is $\le$ the length of rMax. The time required to find rMax is O(maxR). This is also the time required to compute ALL'(y) and ALL'(x). The overall complexity of an insert/delete operation is O(log n + maxR) = O(W).
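The lookup step of Section 3.3.2 can be sketched as follows for IPv4 (W = 32). The sketch assumes that a length-l prefix stored in ALL(z) matches d exactly when point(z) and d agree on their l most-significant bits, which is how the mask-based check is described above; Pair, same, and the free-function form of searchALL are hypothetical, not the code of Figure 3-11.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One entry of ALL(z): prefix length and prefix priority.
struct Pair { std::uint8_t pLength; std::uint8_t priority; };

// True iff a and b agree on their len most-significant bits (len in 0..32).
bool same(unsigned len, std::uint32_t a, std::uint32_t b) {
    if (len == 0) return true;                      // the prefix * matches everything
    const std::uint32_t mask = ~std::uint32_t{0} << (32 - len);
    return ((a ^ b) & mask) == 0;
}

// Scans ALL(z) in ascending pLength order and stops at the first
// (shortest) prefix that fails to match d; returns the updated hp.
int searchALL(const std::vector<Pair>& all, std::uint32_t point,
              std::uint32_t d, int hp) {
    for (const Pair& e : all) {
        if (!same(e.pLength, point, d)) break;      // longer prefixes cannot match either
        hp = std::max(hp, static_cast<int>(e.priority));
    }
    return hp;
}
```

Stopping at the first mismatch is valid because the prefixes in ALL(z) are nested: if d disagrees with point(z) within the first l bits, it also disagrees within any longer prefix.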
As noted earlier, $maxR \le 6$ in practice. So, in practice, PBOB takes O(log n) time and makes O(log n) cache misses per operation.

3.4 Longest-Matching Prefix-Tables (LMPTs): LMPBOB

3.4.1 The Data Structure

Using priority = pLength, a PBOB may be used to represent an LMPT, obtaining the same performance as for an HPPT. However, we may achieve some reduction in the memory required by the data structure if we replace the array linear list that is stored in each node of the PTST by a W-bit vector, bit. bit(z)[i] denotes the ith bit of the bit vector stored in node z of the PTST; bit(z)[i] = 1 iff ALL(z) has a prefix whose length is i. We note that Suri et al. [20] use W-bit vectors to keep track of prefix lengths in their data structure also.

PAGE 81

3.4.2 Lookup

Figure 3-12 gives the algorithm to find the length of the longest matching-prefix, lmp(d), for destination d. The method longest() returns the largest i such that bit(z)[i] = 1 (i.e., it returns the length of the longest prefix stored in node z). The method searchBitVector(d, hp, k) examines bit(z) and updates hp taking into account the lengths of those prefixes in this bit vector that match d. The method same(k+1, point(z), d) returns true iff point(z) and d agree on their k + 1 most significant bits.

PAGE 82

The method searchBitVector(d, hp, k) (Figure 3-13) utilizes the next two lemmas.

Lemma 31

3.4.3 Insertion and Deletion

The insert and delete algorithms are similar to the corresponding algorithms for HPPTs. The essential differences are as below.

PAGE 83

1. Rather than insert or delete a prefix from an ALL(z), we set bit(z)[l], where l is the length of the prefix being inserted or deleted, to 1 or 0, respectively.
2. For a rotation, we do not look for rMax in bit(x). Instead, we find the largest integer iMax such that the prefix that corresponds to bit(x)[iMax] matches point(y). The first (bit 0 comes before bit 1) iMax bits of bit'(y) are the first iMax bits of bit(x) and the remaining bits of bit'(y) are the same as the corresponding bits of bit(y). bit'(x) is obtained from bit(x) by setting its first iMax bits to 0.
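The bit-vector bookkeeping of Sections 3.4.1 to 3.4.3 can be sketched for IPv4 as below. LengthBits and its member names are hypothetical. One indexing assumption is made and should be read as such: the rotation step here moves the bits for lengths 0 through iMax inclusive, since the length-iMax prefix itself matches point(y); the text's "first iMax bits" may count differently.

```cpp
#include <cassert>
#include <cstdint>

// Bit i is set iff a prefix of length i (0..32) is stored in the node.
class LengthBits {
    std::uint64_t bits_ = 0;
public:
    void insert(unsigned l) { assert(l <= 32); bits_ |= std::uint64_t{1} << l; }
    void erase(unsigned l)  { assert(l <= 32); bits_ &= ~(std::uint64_t{1} << l); }
    bool contains(unsigned l) const { return ((bits_ >> l) & 1u) != 0; }
    bool empty() const { return bits_ == 0; }

    // Length of the longest stored prefix, or -1 if the vector is empty.
    int longest() const {
        for (int i = 32; i >= 0; --i)
            if ((bits_ >> i) & 1u) return i;
        return -1;
    }

    // Rotation step (assumed inclusive of iMax): bits 0..iMax of y are taken
    // from this vector, the remaining bits of y are kept, and bits 0..iMax
    // of this vector are cleared.
    void moveLowTo(LengthBits& y, unsigned iMax) {
        assert(iMax <= 32);
        const std::uint64_t mask = (std::uint64_t{1} << (iMax + 1)) - 1;
        y.bits_ = (bits_ & mask) | (y.bits_ & ~mask);
        bits_ &= ~mask;
    }
};
```

A 64-bit word is used because IPv4 prefix lengths run from 0 to 32, which needs 33 bits; this matches the 8 bytes that Section 3.5.4 budgets for the bit field.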
3.5 Implementation Details and Memory Requirement

3.5.1 Memory Management

We implemented our data structures in C++. Since dynamic memory allocation and deallocation using C++'s methods new and delete are very time consuming, we implemented our own methods to manage memory. We maintained our own list of free memory. Whenever this list was exhausted, we used the new method to get a large chunk of memory to add to our free list. Memory was then allocated from this large chunk as needed by our data structures. Whenever memory was to be deallocated, it was put back onto our free list.

3.5.2 BOB

As described in Section 3.2, each node z of the PTST of BOB has the following fields: color, point(z), RST, leftChild, and rightChild. To improve the lookup performance of BOB, we added the following fields: maxPriority (maximum priority of

PAGE 84

With the added fields, each node of the PTST has 8 fields. For the color and maxPriority fields, we allocate 1 byte each. Assuming 4 bytes for each of the remaining fields, we get a node size of 26 bytes. For improved cache performance, it is desirable to align nodes to 4-byte memory-boundaries. This alignment is simplified if the node size is an integral multiple of 4 bytes. Therefore, for practical purposes, the PTST node-size becomes 28 bytes.

In our implementation of hpRight (Figure 3-4), the while loop conditional was changed from x != null to x != null && mp > hp. A corresponding change was made to hpLeft.

The nodes of an RST have the following fields: color, mp, st, fn, p, leftChild, and rightChild. Using 1 byte for the color, p, and mp fields each, and 4 bytes for each of the remaining fields, the size of an RST node becomes 19 bytes. Again, for ease of alignment to 4-byte boundaries, we make the RST-node size 20 bytes. In addition to nodes, every nonempty RST has the fields root (pointer to root of RST) and rank (rank of red-black tree). Each of these fields is a 4-byte field.

For the doubly-linked lists of PTST nodes with an empty RST, we used the minSt and maxFn fields to, respectively, represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty RST.
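The free-list scheme of Section 3.5.1 can be sketched as a fixed-size pool. This is a generic illustration under stated assumptions, not the allocator used in the thesis: FreeListPool, its chunk size parameter, and chunkCount are hypothetical, and the pool hands out raw storage for objects of a single type T.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hands out fixed-size slots for objects of type T. When the free list is
// exhausted, one large chunk is obtained with new and threaded onto the list.
template <typename T>
class FreeListPool {
    union Slot {
        Slot* next;                                   // link while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // payload while in use
    };
    std::vector<Slot*> chunks_;   // every chunk obtained from new, for cleanup
    Slot* free_ = nullptr;        // head of the free list
    std::size_t chunkSize_;

    void grow() {
        Slot* chunk = new Slot[chunkSize_];
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < chunkSize_; ++i) {
            chunk[i].next = free_;
            free_ = &chunk[i];
        }
    }

public:
    explicit FreeListPool(std::size_t chunkSize = 1024) : chunkSize_(chunkSize) {
        assert(chunkSize > 0);
    }
    ~FreeListPool() { for (Slot* c : chunks_) delete[] c; }
    FreeListPool(const FreeListPool&) = delete;
    FreeListPool& operator=(const FreeListPool&) = delete;

    // Returns uninitialized storage suitable for one T.
    void* allocate() {
        if (free_ == nullptr) grow();
        Slot* s = free_;
        free_ = s->next;
        return s;
    }

    // Puts a slot previously returned by allocate() back on the free list.
    void deallocate(void* p) {
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }

    std::size_t chunkCount() const { return chunks_.size(); }
};
```

A caller would typically construct an object in the returned storage with placement new and invoke its destructor explicitly before calling deallocate. The general-purpose allocator is then reached only once per chunk rather than once per node.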
PAGE 85

Since an instance of BOB may have up to 2n PTST nodes, n nonempty RSTs, and n RST nodes, the maximum space/memory required by BOB is $28 \cdot 2n + 8n + 20n = 84n$ bytes.

3.5.3 PBOB

The required fields in each node z of the PTST of PBOB are: color, point(z), ALL, size, length, leftChild, and rightChild, where ALL is a one-dimensional array, each entry of which has the subfields pLength and priority; size is the dimension of the array, and length is the number of pairs currently in the array linear list. The array ALL initially has enough space to accommodate 4 pairs (pLength, priority). When the capacity of an ALL is exceeded, the size of the ALL is increased by 4 pairs (since at most 6 pairs are expected in an ALL, the size of an ALL needs to be increased at most once; in theory, an ALL may get as many as W pairs and, in theory, using array doubling as in [41] may work better than increasing the array size by 4 each time array capacity is exceeded).

To improve the lookup performance of PBOB, the field maxPriority (maximum priority of the prefixes in ALL(z)) may be added. Note that minSt (smallest starting point of the prefixes in ALL(z)) and maxFn (largest finish point of the prefixes in ALL(z)) are easily computed from point(z) and the pLength of the shortest (i.e., first) prefix in ALL(z). When the nodes of the PTST are augmented with a maxPriority field, the expression ALL(z)->maxp() in Figure 3-11 may be changed to maxPriority(z), and the statement ALL(z)->searchALL(d, hp) executed only when

PAGE 86

Using 1 byte for each of the fields: color, size, length, maxPriority, pLength, and priority; and 4 bytes for each of the remaining fields, the initial size of a PTST node of PBOB is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty ALL, we used the 8 bytes of memory allocated to the empty array ALL to, respectively, represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty ALL.

Since an instance of PBOB may have up to 2n PTST nodes, the minimum space/memory required by these 2n PTST nodes is $24 \cdot 2n = 48n$ bytes. However, some PTST nodes may have more than 4 pairs in their ALL. There can be at most n/5 such nodes. So, the maximum space-requirement of PBOB is $48n + 8n/5 = 49.6n$ bytes.
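The worst-case totals for BOB and PBOB can be checked with a few lines of arithmetic. The function names below are hypothetical; the constants are the node sizes and counts stated in Sections 3.5.2 and 3.5.3.

```cpp
#include <cassert>
#include <cstddef>

// BOB: up to 2n PTST nodes of 28 bytes, n nonempty RSTs with 8 bytes of
// root/rank fields each, and n RST nodes of 20 bytes.
constexpr std::size_t bobMaxBytes(std::size_t n) {
    return 28 * (2 * n) + 8 * n + 20 * n;
}

// PBOB: up to 2n PTST nodes of 24 bytes, plus 8 extra bytes (4 more pairs)
// for each of the at most n/5 nodes whose ALL outgrows its initial capacity.
constexpr std::size_t pbobMaxBytes(std::size_t n) {
    return 24 * (2 * n) + 8 * (n / 5);
}

static_assert(bobMaxBytes(1000) == 84000, "84n bytes for n = 1000");
static_assert(pbobMaxBytes(1000) == 49600, "49.6n bytes for n = 1000");
```

For n = 1000 rules this gives 84000 bytes for BOB and 49600 bytes for PBOB, in line with the 84n and 49.6n figures.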
3.5.4 LMPBOB

In the case of LMPBOB, each node z of the PTST has the following fields: color, point(z), bit, leftChild, and rightChild.

To improve the lookup performance of LMPBOB, the fields minLength (minimum of lengths of prefixes in bit(z)) and maxLength may be added. When the nodes of the PTST are augmented with a minLength and a maxLength field, we replace the statement bit(z)->searchBitVector(d, hp, k) of Figure 3-12 by

PAGE 87

Using 1 byte for each of the fields: color, minLength, and maxLength; 8 bytes for bit (this analysis is for IPv4); and 4 bytes for each of the remaining fields, the size of a PTST node of LMPBOB is 23 bytes. Again, to easily align PTST nodes along 4-byte boundaries, we pad an LMPBOB PTST node so that its size is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty bit vector, we used the 8 bytes of memory allocated to the empty bit vector bit to represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty bit.

Since an instance of LMPBOB may have up to 2n PTST nodes, the space/memory required by these 2n PTST nodes is $24 \cdot 2n = 48n$ bytes.

3.6 Experimental Results

3.6.1 Test Data and Memory Requirement

We implemented the BOB, PBOB, and LMPBOB data structures and associated algorithms in C++ as described in Section 3.5 and measured their performance on a 1.4GHz PC. To assess the performance of these data structures, we used six IPv4 prefix databases obtained from [38]. We assigned each prefix a priority equal to its length. Hence, BOB, PBOB, and LMPBOB were all used in a longest matching-prefix mode. For dynamic router-tables that use the longest matching-prefix tie breaker, the PST structure of Lu et al. [33, 34] provides O(log n) lookup, insert, and delete. So, we included the PST in our experimental evaluation of BOB, PBOB, and LMPBOB.
The number of prefixes in each of our 6 databases as well as the memory requirement for each database of prefixes are shown in Table 3-2. For the memory

PAGE 88

3.6.2 Preliminary Timing Experiments

We performed preliminary experiments to determine the effectiveness of the changes suggested in Section 3.5. Since these changes are only to the lookup algorithm, our preliminary timing experiments measured only the lookup times for the BOB, PBOB, and LMPBOB data structures. To obtain the mean lookup-time, we started with a BOB, PBOB, or LMPBOB that contained all prefixes of a prefix database. Next, we created a list of the start points of the ranges corresponding to

PAGE 89

Table 3-2: Memory usage (KB)

Measure    Structure  Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Num of Prefixes       16172   22225   28889    31827   35303   85988
Measure1   PST        884     1215    1579     1740    1930    4702
           BOB        851     1176    1526     1682    1876    4527
           PBOB       357     495     642      708     790     1901
           LMPBOB     357     495     642      708     790     1901
Measure2   PST        221     303     395      435     482     1175
           BOB        331     455     592      652     723     1760
           PBOB       189     260     338      372     413     1007
           LMPBOB     189     260     338      372     413     1007

Figure 3-14: Memory usage, measure1

Figure 3-15: Memory usage, measure2

PAGE 90

For BOB, we found that omitting the predicates $d \le maxFn$ and $minSt \le d$ resulted in a mean lookup time that is approximately twice the base lookup time. On the other hand, elimination of the predicate maxPriority > hp reduces the mean lookup time by about 2%. Even though the use of the predicate maxPriority > hp increased the lookup time slightly on our test data, we believe this is a good heuristic for data sets in which the priorities are not highly correlated with the lengths of the prefixes or ranges. So, our remaining experiments retained this predicate. Eliminating the predicate mp > hp had no noticeable effect on mean lookup time. This is to be expected on our data sets, because for these data sets, the maximum value of $|ranges(z)|$ is maxR = 6. The predicate mp > hp is expected to be effective on data sets with a larger value of maxR. So, we retained this predicate for our remaining tests.

For PBOB, elimination of the predicate hp

PAGE 91

In the case of LMPBOB, the introduction of the statement hp = k = minLength into the base code results in a lookup time that is 15% less than when this statement is removed.
3.6.3 Run-Time Experiments

We measured the mean lookup-time as described in Section 3.6.2. The standard deviation in the average times across the 10 repetitions described in Section 3.6.2 was also computed. These mean times and standard deviations are reported in Table 3-3. The mean times are also histogrammed in Figure 3-16. It is interesting to note that PBOB, which can handle prefix tables with arbitrary priority assignments, is actually 20% to 30% faster than PST, which is limited to prefix tables that employ the longest matching-prefix tie breaker. Further, lookups in BOB, which can handle range tables with arbitrary priorities, are slightly slower than in PST. LMPBOB, which, like PST, is designed specifically for longest-matching-prefix lookups, is slightly inferior to the more general PBOB.

Figure 3-16: Search time

To obtain the mean insert-time, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. (Once again, since the time to insert the remaining 33% of the prefixes was too small to measure accurately, we started with several copies of the data structure and inserted the 33% prefixes into each copy; measured the time to insert in all copies; and divided by the number of copies and number of prefixes inserted.) This experiment was repeated 10 times, each time starting with a different permutation of the database prefixes, and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities as well as the number of copies of each data structure we used for the inserts are given in Table 3-3. Figure 3-17 histograms the mean insert-time. As can be seen, insertions into PBOB take between 40% and 60% less

PAGE 92

Table 3-3: Prefix times (µsec) on a 1.4GHz Pentium 4 PC with an 8K L1 data cache and a 256K L2 cache

Operation  Structure  Stat   Paix1   Pb1     MaeWest  Aads    Pb2     Paix2
Search     PST        Mean   1.20    1.35    1.49     1.53    1.57    1.96
                      Std    0.01    0.01    0.04     0.01    0.00    0.01
           BOB        Mean   1.22    1.39    1.54     1.56    1.62    2.19
                      Std    0.01    0.02    0.02     0.02    0.02    0.01
           PBOB       Mean   0.82    0.98    1.10     1.15    1.20    1.60
                      Std    0.01    0.01    0.01     0.01    0.01    0.01
           LMPBOB     Mean   0.87    1.03    1.17     1.21    1.27    1.69
                      Std    0.01    0.01    0.01     0.01    0.01    0.01
Insert     PST        Mean   2.17    2.35    2.53     2.60    2.64    3.03
                      Std    0.07    0.04    0.03     0.01    0.05    0.01
           BOB        Mean   1.70    1.89    2.06     2.10    2.16    2.55
                      Std    0.06    0.06    0.05     0.05    0.05    0.03
           PBOB       Mean   1.04    1.25    1.39     1.44    1.51    1.93
                      Std    0.06    0.05    0.00     0.05    0.05    0.06
           LMPBOB     Mean   1.06    1.29    1.47     1.50    1.57    1.98
                      Std    0.07    0.07    0.06     0.06    0.04    0.01
Delete     PST        Mean   1.72    1.87    2.06     2.09    2.11    2.48
                      Std    0.04    0.05    0.05     0.06    0.04    0.06
           BOB        Mean   1.04    1.13    1.26     1.27    1.32    1.69
                      Std    0.06    0.05    0.04     0.05    0.06    0.06
           PBOB       Mean   0.68    0.82    0.90     0.91    0.97    1.30
                      Std    0.07    0.06    0.05     0.06    0.03    0.05
           LMPBOB     Mean   0.67    0.82    0.89     0.92    0.95    1.26
                      Std    0.06    0.06    0.05     0.05    0.03    0.05
Num of Copies                15      11      9        8       8       3

PAGE 93

Figure 3-17: Insert time

The mean and standard deviation data reported for the delete operation in Table 3-3 and Figure 3-18 were obtained in a similar fashion by starting with a data structure that had 100% of the prefixes in the database and measuring the time to delete a randomly selected 33% of these prefixes. Deletion from PBOB takes less than 50% the time required to delete from a PST. For the delete operation, however, LMPBOB is slightly faster than PBOB. Deletions from BOB take about 40% less time than do deletions from PST.

3.7 Conclusion

Table 3-4 gives the worst-case memory required by each of the data structures. The data of this table are for IPv4. When comparing these memory requirement data, we should keep in mind that BOB, PBOB, and LMPBOB have different capabilities. BOB works for highest-priority matching with nonintersecting ranges; PBOB is limited to highest-priority matching with prefixes; and LMPBOB is limited to longest-length matching with prefixes. The PST structure of Lu et al. [33] has the same restrictions as does LMPBOB.

PAGE 94

Figure 3-18: Delete time

Table 3-4: Node sizes and worst-case memory requirement in bytes for IPv4 router tables.
Columns: BOB, PBOB, LMPBOB, PST
Node Size: PTST (28), RST (20); 24; 28 [...]
Memory Required: 84n [...]

Table 3-5: Time complexity (columns: BOB, PBOB, LMPBOB, PST; rows: Search, Insert, Delete) [...]

Table 3-6: Cache misses (columns: BOB, PBOB, LMPBOB, PST; rows: Search, Insert, Delete) [...]

Our experiments show that PBOB is to be preferred over PST and LMPBOB for the representation of dynamic longest-matching prefix-router-tables. This is somewhat surprising because PBOB may be used for highest-priority prefix-router-tables, not just longest-matching prefix-router-tables. A possible reason why PBOB is faster than LMPBOB is that in LMPBOB one has to check $O(W)$ prefix lengths, whereas in PBOB $O(maxR)$ lengths are checked (note that in our test databases, $W = 32$ and $maxR \le 6$). BOB is slower than and requires more memory than PBOB when tested with longest-matching prefix-router tables. The same relative performance between BOB and PBOB is expected when filters are prefixes with arbitrary priority. Of the data structures considered in this chapter, BOB, of course, remains the only choice when the filters are ranges that have an associated priority.

Although the range allocation rule used by our data structures is similar to that used in an interval tree [40], the unique feature of our structures is the $2n$ size constraint. The size constraint is essential for $O(\log n)$ update.

In this chapter, we focus on B-tree data structures for dynamic NHPRTs and LMPTs. We are interested in the B-tree, because by varying the order of the B-tree, we can control the height of the tree and hence control the number of cache misses incurred when performing a rule-table operation. Although Suri et al. [20] have proposed a B-tree data structure for dynamic prefix-tables, their structure has the following shortcomings:

1. A prefix may be stored in $O(m)$ nodes at each level of the order $m$ B-tree. This results in excessive cache misses during the insert and delete operations.
2. Some of the prefix end-points are stored twice in the B-tree. This is because every endpoint is stored in a leaf node and some of the endpoints are additionally stored in interior nodes. This duplication in end-point storage increases the memory requirement.
Our proposed B-tree structure doesn't suffer from these shortcomings. In our structure, each prefix is stored in $O(1)$ nodes at each level, and each prefix end-point is stored once. Consequently, even though the asymptotic complexity of performing dynamic prefix-table operations is the same in both structures and the asymptotic memory requirements of both are the same, our structure is faster for the insert and delete operations and also takes less memory.

In Section 4.1, we develop our B-tree data structure, PIBT (prefix in B-tree), for dynamic prefix-tables. Our B-tree structure for non-intersecting ranges, RIBT (range in B-tree), is developed in Section 4.2. Experimental results comparing the performance of our PIBT structure, the multiway range tree (MRT) structure of Suri et al. [20], and the best binary tree structure for dynamic prefix-tables, PBOB [35], are presented in Section 4.3.

Table 4-1: An example prefix set R (W = 5)

| Prefix Name | Prefix | Range Start | Range Finish |
|---|---|---|---|
| P1 | 001* | 4 | 7 |
| P2 | 00* | 0 | 7 |
| P3 | 1* | 16 | 31 |
| P4 | 01* | 8 | 15 |
| P5 | 10111 | 23 | 23 |
| P6 | 0* | 0 | 15 |

4.1 Longest-Matching Prefix-Tables: LMPT

4.1.1 The Prefix In B-Tree Structure: PIBT

A range $r = [u, v]$ is a pair of addresses $u$ and $v$, $u \le v$. The range $r$ represents the addresses $\{u, u+1, \ldots, v\}$. $start(r) = u$ is the start point of the range and $finish(r) = v$ is the finish point of the range. The range $r$ matches all addresses $d$ such that $u \le d \le v$. Every prefix of a prefix router-table may be represented as a range. For example, when W = 5, the prefix p = 100* matches addresses in the range [16, 19]. So, we say p = 100* = [16, 19], start(p) = 16, and finish(p) = 19. The length of p is 3. Table 4-1 shows a prefix set and the ranges of the prefixes.
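The prefix-to-range mapping described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `prefix_to_range` and the string encoding of prefixes (bit string with an optional trailing `*`) are assumptions made here, not notation from the thesis.

```python
def prefix_to_range(prefix, w):
    """Map a prefix such as '001*' to its (start, finish) address range.

    `prefix` is a bit string with an optional trailing '*' (hypothetical
    encoding chosen for this sketch); `w` is the address width W in bits.
    """
    bits = prefix.rstrip("*")
    if len(bits) > w:
        raise ValueError("prefix is longer than the address width")
    free = w - len(bits)                       # number of unconstrained low bits
    start = (int(bits, 2) if bits else 0) << free
    finish = start + (1 << free) - 1           # all free bits set to 1
    return start, finish


# The W = 5 example from the text: 100* covers addresses 16 through 19.
print(prefix_to_range("100*", 5))
```

With W = 5, the same function reproduces every row of the example prefix set (for instance, 001* maps to [4, 7] and 10111 maps to [23, 23]).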
The set of start and finish points of a collection $P$ of prefixes is the set of endpoints, $E(P)$, of $P$. When $|P| = n$, $|E(P)| \le 2n$. Although our PIBT structure and the MRT structure of Suri et al. [20] store the endpoints $E(P)$ together with additional information in a B-tree [41], each structure uses a different variety of B-tree. Our PIBT structure uses a B-tree in which each key (endpoint) is stored exactly once, while the MRT uses a B-tree in which each key is stored once in a leaf node and some of the keys are additionally stored in interior nodes. Figure 4-1 shows a possible order-3 B-tree for the endpoints of the prefix set of Table 4-1. In this example, each endpoint is stored in exactly one node. This example B-tree is a possible B-tree for PIBT but not for MRT.

Figure 4-1: B-tree for the endpoints of the prefixes of Table 4-1

Figure 4-2: Alternative B-tree for Figure 4-1

Figure 4-2 shows a possible order-3 B-tree in which each endpoint is stored in exactly one leaf node and some endpoints are also stored in interior nodes. This example B-tree is a possible B-tree for MRT but not for PIBT.

With each node $x$ of a PIBT B-tree, we associate an interval $int(x)$ of the destination address space $[0, 2^W - 1]$. The interval $int(root)$ associated with the root of the B-tree is $[0, 2^W - 1]$. Let $x$ be a B-tree node that has $t$ keys. The format of this node is:

$t, child_0, (key_1, child_1), \ldots, (key_t, child_t)$

where $key_i$ is the $i$th key in the node ($key_1$ [...]

4.1.2 Finding The Longest Matching-Prefix

As in [20], we determine only the length of the longest prefix that matches a given destination address $d$. From this length and $d$, the longest matching-prefix, $lmp(d)$, is easily computed. The PIBT search algorithm (Figure 4-3) employs the following lemma.

Lemma 33 [...]

The PIBT search algorithm first constructs a W-bit vector matchVector. When the router table has no prefix whose start or finish endpoint equals the destination address $d$, the constructed bit vector satisfies matchVector[l] = 1 iff there is a length l prefix that matches $d$. Otherwise, matchVector[l] = 1 iff there is a length l prefix whose start or finish endpoint equals $d$. The maximum l such that matchVector[l] = 1 is the length of $lmp(d)$.
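To make the two cases for matchVector concrete, here is a brute-force Python sketch that builds the vector directly from a list of prefixes rather than from a B-tree. It is not the PIBT search algorithm of Figure 4-3; the names `match_vector`, `length_of_lmp`, and `brute_force_lmp` are hypothetical, and the vector is stored as a list indexed by prefix length (with an unused slot 0) purely for readability.

```python
W = 5  # address width of the example prefix set

# Example prefix set (W = 5), written as bit strings without the trailing '*'.
PREFIXES = ["001", "00", "1", "01", "10111", "0"]


def prefix_range(bits, w=W):
    """Return (start, finish) of the address range covered by a prefix."""
    free = w - len(bits)
    start = int(bits, 2) << free
    return start, start + (1 << free) - 1


def match_vector(d, prefixes=PREFIXES, w=W):
    """Build mv where mv[l] == 1 according to the two cases in the text."""
    ranges = [(len(p),) + prefix_range(p, w) for p in prefixes]
    at_endpoint = [l for l, s, f in ranges if d == s or d == f]
    mv = [0] * (w + 1)
    if at_endpoint:
        # Case 2: d is an endpoint, so only prefixes starting or finishing at d count.
        for l in at_endpoint:
            mv[l] = 1
    else:
        # Case 1: d is not an endpoint, so every matching prefix sets its bit.
        for l, s, f in ranges:
            if s <= d <= f:
                mv[l] = 1
    return mv


def length_of_lmp(d):
    """Largest l with mv[l] == 1, or None when no bit is set."""
    mv = match_vector(d)
    for l in range(len(mv) - 1, -1, -1):
        if mv[l]:
            return l
    return None


def brute_force_lmp(d):
    """Reference answer: length of the longest prefix whose range contains d."""
    lengths = [len(p) for p in PREFIXES
               if prefix_range(p)[0] <= d <= prefix_range(p)[1]]
    return max(lengths) if lengths else None


# The two computations agree on every 5-bit address for this prefix set.
print(all(length_of_lmp(d) == brute_force_lmp(d) for d in range(1 << W)))
```

The agreement in the endpoint case reflects the nesting of prefix ranges: any longer prefix that matches an endpoint of a shorter prefix must share that endpoint.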
Complexity Analysis. Each iteration of the while loop takes $O(\log_2 m)$ time (we assume throughout this paper that, for sufficiently large $m$, a B-tree node is searched using a binary search) and the number of iterations is $O(\log_m n)$. The largest l such that matchVector[l] = 1 may be found in $O(\log_2 W)$ time by performing $O(\log_2 W)$ operations on the W-bit vector matchVector. So, the overall complexity is [...]

4.1.3 Inserting A Prefix

To insert a prefix $p$, we must do the following:

1. Insert $start(p)$ into the PIBT and update the corresponding equal vector. If $start(p)$ is already in the PIBT, only the corresponding equal vector is to be updated.
2. Insert $finish(p)$ (provided, of course, that $finish(p) \ne start(p)$) into the PIBT and update the corresponding equal vector. If $finish(p)$ is already in the PIBT, only the corresponding equal vector is to be updated.
3. Update the interval vectors so as to satisfy the prefix allocation rule.

4.1.4 Inserting an endpoint

The algorithm to insert an endpoint $u$ into the PIBT is an adaptation of the standard B-tree insertion algorithm [41]. We search the PIBT for a key equal to $u$. In case $u$ is already in the PIBT, the associated equal vector is updated to account for the new prefix $p$ that begins or ends at $u$ and whose length equals [...]

If $u$ is not in the PIBT, the search for $u$ terminates at a leaf $x$ of the PIBT. Let $t$ be the number of keys in $x$. The endpoint $u$ is inserted into node $x$ between $key_{i-1}$ and $key_i$, where $key_{i-1}$ [...]
Figure 4-4: Node splitting

Algorithm insertEndPoint(u, x, i) (Figure 4-5) inserts the endpoint $u$ into the leaf node $x$ of the PIBT and performs node splits as needed. It is assumed that $x.key_{i-1}$ [...]
4.1.5 Update interval vectors

Following the insertion of the endpoints of the new prefix $p$, the interval vectors in the nodes of the PIBT need to be updated to account for the new prefix $p$. The prefix allocation rule leads to the interval update algorithm of Figure 4-6. On initial invocation, $x$ is the root of the PIBT. The interval update algorithm assumes that $p$ is not the default prefix that matches all destination addresses (this prefix, if present, may be stored outside the PIBT and handled as a special case).

Figure 4-7 shows a possible set of nodes visited by $x$.

Figure 4-7: Nodes visited when updating intervals

Complexity Analysis. Algorithm updateIntervals visits at most 2 nodes on each level of the PIBT and at each node, $O(m)$ time is spent. So, the complexity of updateIntervals is $O(m \log_m n)$. This algorithm accesses $O(\log_m n)$ nodes of the PIBT.

Combining the complexities of all parts of the algorithm to insert a prefix, we see that the time to insert is $O((m + \log_2 W) \log_m n)$ and that the number of nodes accessed during a prefix insertion is $O(\log_m n)$.

4.1.6 Deleting A Prefix

To delete a prefix $p$, we do the following.

1. Remove $p$ from all interval vectors that contain $p$.
2. Update the equal vector for $start(p)$ and remove $start(p)$ from the PIBT if its equal vector is now zero.
3. If $start(p) \ne finish(p)$, update the equal vector for $finish(p)$ and remove $finish(p)$ from the PIBT if its equal vector is now zero.

The first of these steps (i.e., removing $p$ from interval vectors) is almost identical to the corresponding step (Figure 4-6) for prefix insertion. The only difference is that instead of setting $x.interval_q[length(p)]$ to 1, we now set it to 0. The time complexity of this step remains $O(m \log_m n)$ and this step accesses $O(\log_m n)$ nodes of the PIBT.

The B-tree key deletion algorithm of Sahni et al. [41] considers two cases:

1. The key to be deleted is in a leaf node.
2. The key to be deleted is in an interior (i.e., non-leaf) node.
4.1.7 Deleting from a Leaf Node

To delete the endpoint $u$, we first search the PIBT for the node $x$ that contains this endpoint. Suppose that $x$ is a leaf of the PIBT and that $u = x.key_i$. Since $u$ is an endpoint of no prefix, $x.interval_{i-1} = x.interval_i$ and $x.equal_i = 0$. $key_i$, $x.interval_i$, $x.equal_i$, and $x.child_i$ are removed from node $x$ and the keys to the right of $key_i$ together with the associated interval, equal, and child values are shifted one position left. If the number of keys that remain in $x$ is at least $\lceil m/2 \rceil$ (2 in case $x$ is the root), we are done. Otherwise, node $x$ is deficient and we do the following:

1. If a nearest sibling of $x$ has more than $\lceil m/2 \rceil$ keys, $x$ gains/borrows a key via this nearest sibling and so is no longer deficient.
2. Otherwise, $x$ merges with its nearest sibling. The merge may cause $px = parent(x)$ to become deficient, in which case this deficiency resolution process is repeated at $px$.

4.1.8 Borrow from a Sibling

Suppose that $x$'s nearest left sibling $y$ has more than $\lceil m/2 \rceil$ keys (Figure 4-8). Let $key_{t(y)}$ be the largest (i.e., rightmost) key in $y$ and let $px.key_i$ be the key in $px$ such that $px.child_{i-1} = y$ and $px.child_i = x$ (i.e., $px.key_i$ is the key in $px$ that is between $y$ and $x$).

Figure 4-8: Borrow from right sibling

The borrow operation does the following:

1. In $px$, $key_i$ and $equal_i$ are replaced by $key_{t(y)}$ and its associated equal vector.
2. In $x$, all keys and associated vectors and child pointers are shifted one right; $y.child_{t(y)}$, $y.interval_{t(y)}$, $px.key_i$, and $px.equal_i$, respectively, become $x.child_0$, $x.interval_0$, $x.key_0$, $x.equal_0$.
3. From the intervals of $y$, remove the prefixes that include the range $[px.key_{i-1}, key_{t(y)}]$ and add these removed prefixes to $px.interval_{i-1}$.
4. From $px.interval_i$, remove those prefixes that do not include the range $[key_{t(y)}, px.key_{i+1}]$ and add these removed prefixes to the intervals of $x$ other than $x.interval_0$.
5. To $x.interval_0$ (formerly $y.interval_{t(y)}$) add all prefixes originally in $px.interval_{i-1}$. Next, remove from $x.interval_0$ those prefixes that contain the range $[key_{t(y)}, px.key_{i+1}]$. Since these removed prefixes are already included in $px.interval_i$, they are not to be added again.
One may verify that following the borrow operation, we have a properly structured PIBT. Further, since the prefixes of $interval_i$ that do not include a given range may be found in $O(\log_2 W)$ time using a binary search on prefix length, the time complexity of the borrow operation is $O(m + \log_2 W)$ and the borrow operation accesses 3 nodes.

4.1.9 Merging Two Adjacent Siblings

When node $x$ is deficient and its nearest sibling $y$ has exactly $\lceil m/2 \rceil - 1$ keys, nodes $x$, $y$ and the in-between key, $px.key_i$, in the parent $px$ of $x$ are combined into a single node. The resulting single node has $2\lceil m/2 \rceil - 2$ keys. Figure 4-9 shows the situation when $y$ is the nearest right sibling of $x$.

Figure 4-9: Merge two nodes

The steps in the merge of $x$ and $y$ are:

1. The prefixes in $px.interval_{i-1}$ that do not include the range $[px.key_{i-1}, px.key_{i+1}]$ are removed from $px.interval_{i-1}$ and added to the intervals of $x$.
2. The prefixes in $px.interval_i$ that do not include the range $[px.key_{i-1}, px.key_{i+1}]$ are added to the intervals of $y$. $px.interval_i$ is removed from $px$.
3. Remove $px.key_i$ and its associated equal vector from $px$ and append to the right of $x$. Next, append the contents of $y$ to the right of the new $x$.

Since the merging of $x$ and $y$ reduces the number of keys in $px$ by 1, $px$ may become deficient. If so, the borrow/merge process is repeated at $px$. In this way, the deficiency may be propagated all the way up to the root. In case the root becomes deficient, it has no keys and so is discarded.

Complexity Analysis. Since the prefixes of $interval_i$ that do not include a given range may be found in $O(\log_2 W)$ time using a binary search on prefix length, two nodes may be merged in $O(m + \log_2 W)$ time; the number of nodes accessed during the merge is 3.

4.1.10 Deleting from a Non-leaf Node

To delete the endpoint $u = x.key_i$ from the non-leaf node $x$, $u$ is replaced by either the largest key in the subtree $x.child_{i-1}$ or the smallest key in the subtree $x.child_i$ [41]. Let $y.key_{t(y)}$ be the largest key in the subtree $x.child_{i-1}$ (Figure 4-10).
When $u$ is replaced by $y.key_{t(y)}$, it is necessary also to replace $x.equal_i$ by $y.equal_{t(y)}$. Before proceeding to remove $y.key_{t(y)}$ from the leaf node $y$, we need to adjust the interval values of nodes on the path from $x$ to $y$. Let $z$, $z \ne x$, be a node on the path from $x$ to $y$. As a result of the relocation of $y.key_{t(y)}$, $int(z)$ shrinks from $[start(int(z)), u]$ to $[start(int(z)), y.key_{t(y)}]$. So, prefixes that include the range $[start(int(z)), key_{t(y)}]$ but not the range $[start(int(z)), u]$ are to be removed from the intervals of $z$ and added to the parent of $z$. Since there are no endpoints between $y.key_{t(y)}$ and $u = x.key_i$, these prefixes that are to be removed from the intervals of $z$ must have $y.key_{t(y)}$ as an endpoint (in particular, these prefixes finish at $y.key_{t(y)}$). Hence, these prefixes are in $y.equal_{t(y)}$, and so, the number of these prefixes is $O(W)$. As we retrace the path from $y$ to $x$, the bit vectors for the set of prefixes to be removed from each node may be constructed in total time $O(\log_m n + W)$. As it takes $O(m)$ time to remove the desired prefixes from the intervals of each node $z$ and to add to the parent of $z$, the total time needed to update the interval values for all nodes on the path from $x$ to $y$ (including nodes $x$ and $y$) is $O(m \log_m n + W)$.

Figure 4-10: Deleting $x.key_i$

Let $v$ be the leftmost leaf node in the subtree $x.child_i$. For each node $z$, $z \ne x$, on the path from $x$ to $v$, $z.int$ expands from $[u, finish(int(z))]$ to $[y.key_{t(y)}, finish(int(z))]$. Since there is no prefix that has $u$ as its endpoint and since there are no endpoints between $u$ and $y.key_{t(y)}$, no interval vectors on the path from $x$ to $v$ are to be changed.

Complexity Analysis. Adding together the complexity of each step of the deletion algorithm, we get $O((m + \log_2 W) \log_m n + W)$ as the overall time complexity of the delete operation. The number of node accesses is $O(\log_m n)$.

The time complexity of the delete operation becomes $O(m \log_m n + W)$ when the search for prefixes that do not match a given range (this is done when two adjacent [...]

Figure 4-11: Merging adjacent siblings

Unfortunately, this serial search strategy for the non-matching prefixes cannot be adapted to find, in $O(W)$ time, all the matching prefixes required during the insert operation.
4.1.11 Cache-Miss Analysis

The number of cache misses that occur during the lookup operation is approximately the same for PIBTs and MRTs. A worst-case lookup will examine $\log_{m/2} n$ nodes. If a binary search is used in each examined node to determine which subtree to move to, the examination of each node will cause about $\log_2(mW/(8b))$ cache misses ($b$ is the size, in bytes, of a cache line). So, the worst-case number of cache misses [...]

For the insert operation, we count only the number of read misses (since write misses are non-blocking, these do not affect performance as much as the read misses do). Let $s$ be the size, in number of cache lines, of an MRT node. To split a node, we must read at least the right half of that node. For simplicity, we assume that the entire node is read. With this assumption, our cache-miss count will also account for cache misses that occur on the downward search pass of an insert operation. The total number of nodes that get split during an insert may be as high as $h$, where $h$ is the height of the MRT. So, the worst-case number of cache misses exclusive of those needed to update information in nodes not accessed by the search and split steps is $hs$. Besides maintaining the B-tree properties of the MRT, an insert must update the span vector (defined in [20]) stored in each of the children of a node that gets split. This requires $(m-1)h \approx mh$ span vectors to be updated at an additional cost of $mh$ cache misses (each span vector is assumed to fit in a cache line). So, the worst-case number of cache misses is approximately $h(s + m)$.

The nodes of the PIBT structure are approximately twice as large as those of the MRT. Since the worst-case heights of the PIBT and MRT are almost the same, the number of cache misses during the downward search pass and the upward node-split pass is at most $2hs$. No new nodes are accessed to update interval vectors. So, $2hs$ is a bound for the entire insert operation. Since $s \approx 8m/b$, the ratio of the worst-case misses for MRT and PIBT is approximately $(b + 8)/16$. When $b = 32$ (as it is for a PC), this ratio is 2.5. That is, the MRT will make 2.5 times as many cache misses, in the worst case, as will the PIBT during an insert operation.
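The ratio quoted above follows from dividing the two bounds: $h(s+m) / (2hs) = (s+m)/(2s)$, and substituting $s \approx 8m/b$ gives $(b+8)/16$. The short Python sketch below simply evaluates these expressions; the function names are hypothetical, and the formulas are the approximations stated in the text rather than measured values.

```python
def node_size_in_lines(m, b):
    """Approximate MRT node size s in cache lines, using s ~ 8m/b from the text."""
    return 8 * m / b


def mrt_insert_misses(h, m, b):
    """Worst-case MRT insert read misses, approximately h(s + m)."""
    return h * (node_size_in_lines(m, b) + m)


def pibt_insert_misses(h, m, b):
    """Worst-case PIBT insert read-miss bound, 2hs (PIBT nodes are about twice as large)."""
    return 2 * h * node_size_in_lines(m, b)


def miss_ratio(b):
    """Closed form of the MRT-to-PIBT ratio, (b + 8)/16."""
    return (b + 8) / 16


# For a 32-byte cache line the closed form and the direct quotient agree.
print(miss_ratio(32), mrt_insert_misses(3, 32, 32) / pibt_insert_misses(3, 32, 32))
```

Note that the height $h$ and the order $m$ cancel in the quotient, so under these approximations the ratio depends only on the cache-line size $b$.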
The PBOB of Lu et al. [35] makes $2\log_2 n$ cache misses during a worst-case insert. Since $2hs \approx (16m/b)\log_{m/2} n = 4\log_2 n$ when $m = b = 32$, an order 32 PIBT makes twice as many cache misses during a worst-case insert as does the PBOB.

The analysis for the delete operation is almost identical, and the cache-miss counts are the same as for the insert operation.

4.2 Highest-Priority Range-Tables

In this section, we extend the PIBT structure to obtain a B-tree-based data structure called RIBT (range in B-tree). The RIBT structure is for dynamic router-tables whose filters are non-intersecting ranges.

4.2.1 Preliminaries

Definition 19 [...] [4, 4] is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4]. [2, 4] and [6, 9] are disjoint; [2, 4] and [3, 4] are nested; [2, 4] and [2, 2] are nested; [2, 8] and [4, 6] are nested; [2, 4] and [4, 6] intersect; and [3, 8] and [2, 4] intersect.

Definition 20 [...] If $R$ has $|R| + 1$ distinct endpoints, then $[s, f] \in R$. [...]

Part (b) follows from the proof of Case 1. If $[s, f] \notin R$, $R$ has at least $|R| + 2$ distinct endpoints.

4.2.2 The Range In B-Tree Structure: RIBT

The RIBT is an extension of the PIBT structure to the case of an NHPRT. As in the PIBT structure, we maintain a B-tree of distinct range-endpoints. Let $x$ be a node of the RIBT B-tree. $x.int$ and $x.int_i$ are defined as for the case of the PIBT B-tree. With each endpoint $x.key_i$ in node $x$, we keep a max-heap, $equalH_i$, of ranges that have $x.key_i$ as an endpoint. As in the case of the PIBT, the default [...]

Other ranges are stored in equalH heaps as well as in intervalH max-heaps, which are the counterpart of the interval vectors used in PIBT. An RIBT node that has $t$ keys has $t$ intervalH max-heaps. The ranges stored in these max-heaps are determined by a range allocation rule that is similar to the prefix allocation rule employed by a PIBT: a range $r$ is stored in an intervalH max-heap of node $x$ iff $r$ includes $x.int_i$ for some $i$ but does not include $x.int$. As in the case of the PIBT, each range is stored in the intervalH max-heaps of at most 2 nodes at each level of the RIBT B-tree. Let $set(x)$ be the set of ranges to be stored in node $x$. Unlike the PIBT, where a prefix may be stored in several interval vectors of a node, in an RIBT, each range $r \in set(x)$ is stored in exactly one intervalH max-heap of $x$. To each range $r \in set(x)$, we assign an index $(i, j)$ such that $x.key_{i-1}$ [...]

$key_1, key_2, \ldots, key_t$ [...]
where $hpr_s$ is the highest-priority range in $set(x)$ that matches $x.int_s$, $equalHptr_s$ is a pointer to $equalH_s$, and $intervalHptr_s$ is a pointer to the intervalH max-heap whose index is $(i_s, j_s)$.

Since the total number of endpoints is at most $2n$ and since each B-tree node other than the root has at least $\lceil m/2 \rceil - 1$ keys (keys are range endpoints), the number of B-tree nodes in an RIBT is $O(n/m)$. Each B-tree node takes $O(m)$ memory exclusive of the memory required for the max-heaps. So, exclusive of the max-heap memory, we need $O(n)$ memory. Each range may be stored on $O(1)$ max-heaps at each level of the B-tree. So, the max-heap memory is $O(n \log_m n)$. Therefore, the total memory requirement of the RIBT is $O(n \log_m n)$.

4.2.3 RIBT Operations

Figure 4-12 gives the algorithm to find the priority of the highest-priority range that matches the destination address $d$. This algorithm is easily modified to find the highest-priority range that matches $d$. The algorithm differs from algorithm lengthOflmp (Figure 4-3) primarily in the absence of the break statement in the while loop. Since Lemma 33 does not extend to the case of highest-priority matching in non-intersecting ranges, it isn't possible to stop the search for $hp(d)$ following the examination of the equalH max-heap for $d$.
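The following Python sketch illustrates, by brute force, why the search cannot stop at the ranges that have $d$ as an endpoint. It is not the RIBT algorithm of Figure 4-12. The helper names (`overlap`, `relation`, `hp`, `hp_endpoint_only`) and the two-range table are hypothetical, and the classification rules in `relation` are inferred from the surviving examples in Section 4.2.1, since the definition text itself is not available here.

```python
def overlap(r, s):
    """Return the common sub-range of closed ranges r and s, or None if they share no address."""
    lo, hi = max(r[0], s[0]), min(r[1], s[1])
    return (lo, hi) if lo <= hi else None


def relation(r, s):
    """Classify two ranges as 'disjoint', 'nested', or 'intersect' (inferred rules)."""
    if overlap(r, s) is None:
        return "disjoint"
    r_contains_s = r[0] <= s[0] and s[1] <= r[1]
    s_contains_r = s[0] <= r[0] and r[1] <= s[1]
    return "nested" if (r_contains_s or s_contains_r) else "intersect"


def hp(d, table):
    """Brute-force priority of the highest-priority range in table that matches d."""
    priorities = [p for (u, v), p in table if u <= d <= v]
    return max(priorities) if priorities else None


def hp_endpoint_only(d, table):
    """Priority obtained if only ranges with d as an endpoint were examined."""
    priorities = [p for (u, v), p in table if d in (u, v)]
    return max(priorities) if priorities else None


# Hypothetical table of two nested (hence non-intersecting) ranges with priorities.
TABLE = [((2, 8), 9), ((4, 6), 1)]

# d = 4 is an endpoint of [4, 6], yet the enclosing range [2, 8] has the higher priority.
print(hp(4, TABLE), hp_endpoint_only(4, TABLE))
```

In the longest-matching-prefix setting the innermost range always wins, so endpoint information suffices; with arbitrary priorities an enclosing range may win, which is why the search continues down the tree.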
The complexity of $hp(d)$ is $O(\log_2 m \log_m n) = O(\log_2 n)$ and the number of nodes accessed is $O(\log_m n)$. The algorithms to insert and delete a range are similar to the corresponding PIBT algorithms. So, we do not describe these here. When the maximum depth of nesting of the ranges is $D$. Although $D \le n$ for ranges and [...]

4.3 Experimental Results

We implemented the B-tree router-table data structures PIBT (Section 4.1) and MRT [20] in C++ and compared their performance on a 700MHz PC. Initial experimentation with the implementations of the two B-tree structures showed that search time is optimal when the B-tree order is 32 (i.e., $m = 32$). Consequently, all experimental results reported in this section are for the case $m = 32$. To determine what benefits accrue from the use of a B-tree relative to a binary search tree, we included also the PBOB data structure of Lu et al. [35] in our performance measurements. Our experiments were conducted using six IPv4 prefix databases obtained from [38]. The databases Paix1, Pb1, MaeWest and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained Sep 13, 2000. The number of prefixes in each of our 6 [...]

Table 4-2: Memory usage. m = 32 for PIBT and MRT

| Database | Paix1 | Pb1 | MaeWest | Aads | Pb2 | Paix2 |
|---|---|---|---|---|---|---|
| Num of Prefixes | 16172 | 22225 | 28889 | 31827 | 35303 | 85988 |
| PIBT (KB) | 715 | 993 | 1292 | 1425 | 1604 | 3936 |
| MRT (KB) | 813 | 1132 | 1471 | 1621 | 1834 | 4526 |
| PBOB (KB) | 369 | 509 | 661 | 728 | 811 | 1961 |
To measure the average lookup time, for each prefix database, we generated 1000 random addresses, randAddr[0::999], that are matched by one or more of the prefixes in the database. Then, a sequence of 1 million lookups is done by generating 1 million uniformly distributed random numbers in the range [0::999]. When a random number $i$ is generated, we find lmp(randAddr[i]). From the time required for this sequence of 1 million random number generations and lookups, we subtract the time for the random number generation and divide by 1 million to get the average time per search. For each database and router-table data structure, the experiment is done 10 times and the average of the averages as well as the standard deviation of the averages computed. Since each destination in randAddr is searched for approximately 1000 times, the experiment simulates a bursty traffic environment. Since the 1000 addresses in randAddr take 4000 bytes, the pollution of L2 cache (256KB) by randAddr is less than what it would be if we generated a random sequence of 1 million addresses and saved these in an array.

Table 4-3 gives the measured average lookup times. These average times are histogrammed in Figure 4-13. As can be seen, PIBT and MRT have almost the same performance on lookup. This is to be expected as a lookup in either structure results in the same number of cache misses and also does the same amount of work. The [...]

Table 4-3: Lookup time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT. Variance is < 0.02

| Database | Paix1 | Pb1 | MaeWest | Aads | Pb2 | Paix2 |
|---|---|---|---|---|---|---|
| PIBT | 0.33 | 0.34 | 0.35 | 0.36 | 0.33 | 0.42 |
| MRT | 0.33 | 0.35 | 0.35 | 0.35 | 0.33 | 0.41 |
| PBOB | 0.49 | 0.50 | 0.53 | 0.53 | 0.51 | 0.61 |

Figure 4-13: Lookup time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

For the average update (insert/delete) time, we start by selecting 1000 prefixes from the database. Those 1000 prefixes are first removed from the data structure. Once the 1000 removals are done, the removed 1000 prefixes are inserted back into [...]

Table 4-4: Update time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

| Structure | Statistic | Paix1 | Pb1 | MaeWest | Aads | Pb2 | Paix2 |
|---|---|---|---|---|---|---|---|
| PIBT | mean | 2.89 | 3.21 | 3.32 | 3.31 | 3.53 | 4.16 |
| PIBT | std | 0.061 | 0.021 | 0.040 | 0.033 | 0.047 | 0.016 |
| MRT | mean | 4.15 | 4.37 | 4.55 | 4.47 | 5.21 | 5.69 |
| MRT | std | 0.183 | 0.034 | 0.029 | 0.043 | 0.239 | 0.030 |
| PBOB | mean | 0.42 | 0.43 | 0.44 | 0.45 | 0.46 | 0.47 |
| PBOB | std | 0.002 | 0.002 | 0.001 | 0.003 | 0.007 | 0.004 |

Figure 4-14: Update time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

As can be seen, an update in PBOB takes much less time than does an update in either MRT or PIBT. Further, an update in PIBT takes about 30% less time than does an update in MRT.

4.4 Conclusion

We have developed an alternative B-tree representation for dynamic router tables. Although our representation has the same asymptotic complexity as does the B-tree representation of Suri et al. [20], ours is faster for the update operation. This is because our structure performs updates with fewer cache misses. For the search operation, both B-tree structures take about the same time. When compared to the fastest binary tree structure, PBOB, for dynamic router tables, we see that the use of a high-degree tree enables the B-tree structure to perform better on the search operation. However, on the update operation, PBOB is decidedly superior.

5.1 Conclusion

We have developed several data structures for dynamic router tables. The novelty of our data structures lies in supporting real-time update, supporting range filters, and supporting the highest-priority matching tie breaker (BOB, PBOB, RIBT).

The first data structure, PST [33, 34], permits one to search, insert, and delete in $O(\log n)$ time each using the most specific matching tie breaker. Although $O(\log n)$ time data structures for prefix tables were known prior to our work [21, 22], PST is more memory efficient than the data structures of Sahni et al. [21, 22]. Further, PST is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, PSTs are the first to permit $O(\log n)$ search, insert, and delete. All of our data structures based on priority search trees use $O(n)$ memory.
The second data structure, BOB [35], works for highest-priority matching with nonintersecting ranges. Its variant, PBOB, works for a prefix set or a nonintersecting range set with a limited number of nesting levels (6 for IPv4 backbone router tables [38]). In order to support $O(\log n)$ time update, BOB transforms the problem of removing an empty degree-2 node from the top-level tree to the problem of removing an empty degree-1 or degree-0 node from the top-level tree. Our experiments show that PBOB is to be preferred over PST for the representation of dynamic longest-matching prefix-router-tables. But PST remains the only choice for detecting intersection and conflict between ranges in $O(\log n)$ time. On practical rule tables, [...]

The third data structure is based on the B-tree in order to utilize the wide cache line size. It is designed for prefix filters as well as non-intersecting range filters. Suri et al. [20] proposed a multi-way range tree that is also based on the B-tree. A crucial difference between our data structure for prefix filters and that of Suri et al. [20] is that in our data structure, each prefix is stored in $O(1)$ B-tree nodes per B-tree level, whereas in the structure of Suri et al. [20], each prefix is stored in $O(m)$ nodes per level ($m$ is the order of the B-tree). As a result of this difference, ours is faster for the update operation. For the search operation, both B-tree structures take about the same time. When compared to PBOB, we find that the use of a high-degree tree enables the B-tree structure to perform better on the search operation. However, on the update operation, PBOB is decidedly superior.

5.2 Future Work

It is more challenging to design data structures for multidimensional router tables. The problem of point location in a set of $n$ non-overlapping $d$-dimensional hyper-rectangles requires $O(\log n)$ time with $O(n^d)$ memory requirement or $O((\log n)^{d-1})$ time with $O(n)$ memory requirement [43]. Multidimensional classification is no easier than the point location problem since the filters can overlap. The above complexity bounds do not consider update. Thus they are for static data structures.
Almost all the existing schemes for multidimensional router tables focus on static data structures for prefix filters. Gupta et al. [43] review static data structures for multidimensional router tables. Hierarchical tries require $O(W^d)$ time for search with $O(dWn)$ memory requirement. Hierarchical tries can support update in $O(dW)$ time. Set pruning trees [44] support search in $O(dW)$ time with $O(n^d)$ memory requirement. Cross-producting [44] decomposes the search into $d$ one-dimensional [...]

Designing schemes for dynamic multidimensional router tables is even more challenging. While the Internet community is trying to reduce the workload of packet classification inside the core using technology like MPLS, classification still has to be done somewhere to aggregate and deaggregate the traffic inside a network as long as the packets still carry IP addresses, port numbers and other related fields.

REFERENCES

[1] C. Labovitz, G. Malan, and F. Jahanian, Internet routing instability, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[2] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, Delayed Internet routing convergence, ACM SIGCOMM, Stockholm, Sweden, August-September 2000.
[3] D. Pei, X. Zhao, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang, Improving BGP convergence through consistency assertions, IEEE INFOCOM, New York City, New York, USA, June 2002.
[4] C. Macian and R. Finthammer, An evaluation of the key design criteria to achieve high update rates in packet classifiers, IEEE Network, 15(6):24-29, November/December 2001.
[5] F. Baboescu, S. Singh, and G. Varghese, Packet classification for core routers: is there an alternative to CAMs? IEEE INFOCOM, San Francisco, California, USA, April 2003.
[6] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous, Survey and taxonomy of IP address lookup algorithms, IEEE Network, 15(2):8-23, March/April 2001.
[7] S. Sahni, K. Kim, and H. Lu, Data structures for one-dimensional packet classification using most-specific-rule matching, International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN), Makati City, Metro Manila, Philippines, May 2002.
[8] K. Sklower, A tree-based routing table for Berkeley Unix, Technical Report, University of California, Berkeley, 1993.
[9] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, Small forwarding tables for fast routing lookups, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[10] W. Doeringer, G. Karjoth, and M. Nassehi, Routing on longest-matching prefixes, IEEE/ACM Transactions on Networking, 4(1):86-97, 1996.
[11] S. Nilsson and G. Karlsson, Fast address look-up for Internet routers, IEEE Broadband Communications, Stuttgart, Germany, April 1998.
[12] V. Srinivasan and G. Varghese, Faster IP lookups using controlled prefix expansion, ACM SIGMETRICS Performance Evaluation Review, 26(1):1-10, 1998.
[13] S. Sahni and K. Kim, Efficient construction of fixed-stride multibit tries for IP lookup, Proceedings 8th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS), Bologna, Italy, October-November 2001.
[14] S. Sahni and K. Kim, Efficient construction of variable-stride multibit tries for IP lookup, Proceedings IEEE Symposium on Applications and the Internet (SAINT), Nara city, Nara, Japan, January-February 2002.
[15] P. Gupta, S. Lin, and N. McKeown, Routing lookups in hardware at memory access speeds, IEEE INFOCOM, San Francisco, USA, March-April 1998.
[16] N. Huang and S. Zhao, A novel IP-routing lookup scheme and hardware architecture for multigigabit switching routers, IEEE Journal on Selected Areas in Communications, 17(6):1093-1104, June 1999.
[17] A. Basu and G. Narlikar, Fast incremental updates for pipelined forwarding engines, IEEE INFOCOM, San Francisco, California, USA, April 2003.
[18] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, Scalable high speed IP routing lookups, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[19] B. Lampson, V. Srinivasan, and G. Varghese, IP lookup using multi-way and multicolumn search, IEEE INFOCOM, San Francisco, USA, March-April 1998.
[20] S. Suri, G. Varghese, and P. Warkhede, Multiway range trees: Scalable IP lookup with fast updates, GLOBECOM, San Antonio, Texas, USA, November 2001.
[21] S. Sahni and K. Kim, O(log n) dynamic packet routing, IEEE Symposium on Computers and Communications, Taormina, Italy, July 2002.
[22] S. Sahni and K. Kim, Efficient dynamic lookup for bursty access patterns, submitted.
[23] F. Ergun, S. Mittra, S. Sahinalp, J. Sharp, and R. Sinha, A dynamic lookup scheme for bursty access patterns, IEEE INFOCOM, Anchorage, Alaska, USA, April 2001.
[24] E. Horowitz, S. Sahni, and D. Mehta, Fundamentals of data structures in C++, W. H. Freeman, New York, 1995.
[25] P. Gupta and N. McKeown, Dynamic algorithms with worst-case performance for packet classification, IFIP Networking, Paris, France, May 2000.
[26] A. McAuley and P. Francis, Fast routing table lookups using CAMs, IEEE INFOCOM, San Francisco, CA, USA, March-April 1993.
[27] D. Shah and P. Gupta, Fast updating algorithms for TCAMs, IEEE Micro, 21(1):36-47, 2001.
[28] C. Matsumoto, CAM vendors consider algorithmic alternatives, http://www.eetimes.com/story/OEG20020520S0014, EE Times, May 20, 2002.
[29] G. Cheung and S. McCanne, Optimal routing table design for IP address lookups under memory constraints, IEEE INFOCOM, New York City, New York, USA, March 1999.
[30] G. Chandranmenon and G. Varghese, Trading packet headers for packet processing, IEEE Transactions on Networking, 4(2):141-152, 1996.
[31] P. Newman, G. Minshall, and L. Huston, IP switching and gigabit routers, IEEE Communications Magazine, 64-69, January 1997.
[32] A. Bremler-Barr, Y. Afek, and S. Har-Peled, Routing with a clue, ACM SIGCOMM, Cambridge, MA, USA, September 1999.
[33] H. Lu and S. Sahni, Priority search trees and dynamic router-tables, submitted.
[34] H. Lu and S. Sahni, O(log n) dynamic router-tables for ranges, IEEE Symposium on Computers and Communications, Kiris-Kemer, Turkey, June-July 2003.
[35] H. Lu and S. Sahni, Dynamic IP router-tables using highest-priority matching, submitted.
[36] A. Hari, S. Suri, and G. Parulkar, Detecting and resolving packet filter conflicts, IEEE INFOCOM, Tel-Aviv, Israel, March 2000.
[37] E. McCreight, Priority search trees, SIAM Jr. on Computing, 14(2):257-276, 1985.
[38] Merit, Ipma statistics, http://nic.merit.edu/ipma, November 25, 2001.
[39] K. Mehlhorn, Data structures and algorithms 3: Multi-dimensional searching and computational geometry, Springer Verlag, New York, 1984.
[40] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd edition, McGraw Hill, New York, 2001.
[41] S. Sahni, Data structures, algorithms, and applications in Java, McGraw Hill, New York, 2000.
[42] P. Warkhede, S. Suri, and G. Varghese, Fast packet classification for two-dimensional conflict-free filters, IEEE INFOCOM, Anchorage, Alaska, USA, April 2001.
[43] P. Gupta and N. McKeown, Algorithms for packet classification, IEEE Network, 15(2):24-32, 2001.
[44] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, Fast and scalable layer four switching, ACM SIGCOMM, Vancouver, BC, Canada, August-September 1998.
[45] A. Feldmann and S. Muthukrishnan, Tradeoffs for packet classification, IEEE INFOCOM, Tel-Aviv, Israel, March 2000.
[46] V. Srinivasan, S. Suri, and G. Varghese, Packet classification using tuple space search, ACM SIGCOMM, Cambridge, Massachusetts, USA, September 1999.
[47] P. Gupta and N. McKeown, Classification using hierarchical intelligent cuttings, IEEE Micro, 20(1):34-41, 2000. |