Data Structures for Dynamic Router Table

Material Information

Data Structures for Dynamic Router Table
LU, HAIBIN ( Author, Primary )
Copyright Date:


Subjects / Keywords:
Bytes ( jstor )
Chopping ( jstor )
Data models ( jstor )
Data ranges ( jstor )
Databases ( jstor )
Information search ( jstor )
Mathematical vectors ( jstor )
Range searching ( jstor )
Siblings ( jstor )
Standard deviation ( jstor )

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Haibin Lu. Permission granted to University of Florida to digitize and display this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Embargo Date:
Resource Identifier:
53314839 ( OCLC )


Full Text







Copyright 2003


Haibin Lu

To my family.


I would like to express my sincere gratitude to my advisor, Dr. Sartaj Sahni, for

his mentoring and support throughout my Ph.D. study. It would be impossible to

have my research career without his guidance.

This work was supported, in part, by the National Science Foundation under

grant CCR-9911:'~ ,.

I am very grateful to Dr. Sanjay Ranka, Dr. Randy Chow, Dr. Richard Newman, and

Dr. Michael Fang for serving on my Ph.D. supervisory committee and providing

helpful suggestions.

I want to dedicate this dissertation to my parents. Without their encouragement

and hard work, I could not think of getting a doctoral degree. Finally, I would like

to give my special thanks to my wife, Lan, whose caring and love enabled me to

complete this work.



ABSTRACT

1.1 Introduction
1.1.1 Static Router Table
1.1.2 Dynamic Router Table
1.2 Related Work
1.2.1 Trie
1.2.2 Sets of Equal-Length Prefixes
1.2.3 End-Point Array
1.2.4 Multiway Range Tree
1.2.5 O(log n) Dynamic Solutions
1.2.6 Highest-Priority Prefix Table
1.2.7 TCAM
1.2.8 Others
1.3 Contribution

2.1 Preliminaries
2.1.1 Prefixes and Longest-Prefix Matching
2.1.2 Ranges and Projections
2.1.3 Most-Specific-Range Routing and Conflict-Free
2.1.4 Normalized Ranges
2.1.5 Priority Search Trees And Ranges
2.2 Prefixes
2.3 Nonintersecting Ranges
2.4 Conflict-Free Ranges
2.4.1 Determine msr(d)
2.4.2 Insert A Range
2.4.3 Delete A Range
2.4.4 Computing maxP and minP
2.4.5 A Simple Algorithm to Compute maxP
2.4.6 An Efficient Algorithm to Compute maxP
2.4.7 Wrapping Up Insertion of a Range
2.4.8 Wrapping Up Deletion of a Range
2.4.9 Complexity
2.5 Experimental Results
2.5.1 Prefixes
2.5.2 Nonintersecting Ranges
2.5.3 Conflict-free Ranges
2.6 Conclusion

MATCHING

3.1 Preliminaries
3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs)-BOB
3.2.1 The Data Structure
3.2.2 Search for hpr(d)
3.2.3 Insert a Range
3.2.4 Red-Black-Tree Rotations
3.2.5 Delete a Range
3.2.6 Expected Complexity of BOB
3.3 Highest-Priority Prefix-Tables (HPPTs)-PBOB
3.3.1 The Data Structure
3.3.2 Lookup
3.3.3 Insertion and Deletion
3.4 Longest-Matching Prefix-Tables (LMPTs)-LMPBOB
3.4.1 The Data Structure
3.4.2 Lookup
3.4.3 Insertion and Deletion
3.5 Implementation Details and Memory Requirement
3.5.1 Memory Management
3.5.2 BOB
3.5.3 PBOB
3.5.4 LMPBOB
3.6 Experimental Results
3.6.1 Test Data and Memory Requirement
3.6.2 Preliminary Timing Experiments
3.6.3 Run-Time Experiments
3.7 Conclusion

4.1 Longest-Matching Prefix-Tables-LMPT
4.1.1 The Prefix In B-Tree Structure-PIBT
4.1.2 Finding The Longest Matching-Prefix
4.1.3 Inserting A Prefix
4.1.4 Inserting an endpoint
4.1.5 Update interval vectors
4.1.6 Deleting A Prefix
4.1.7 Deleting from a Leaf Node
4.1.8 Borrow from a Sibling
4.1.9 Merging Two Adjacent Siblings
4.1.10 Deleting from a Non-leaf Node
4.1.11 Cache-Miss Analysis
4.2 Highest-Priority Range-Tables
4.2.1 Preliminaries
4.2.2 The Range In B-Tree Structure-RIBT
4.2.3 RIBT Operations
4.3 Experimental Results
4.4 Conclusion

5 CONCLUSION AND FUTURE WORK

5.1 Conclusion
5.2 Future Work

REFERENCES

BIOGRAPHICAL SKETCH

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



Haibin Lu

August 2003

Chair: Sartaj Sahni
Major Department: Computer and Information Science and Engineering

Internet routers use router tables to classify incoming packets based on the

information carried in the packet headers. Packet classification is one of the network

bottlenecks, especially when a high update rate becomes necessary. Much of the

research in the router-table area has focused on static prefix tables, where updates

usually require the rebuilding of the whole router table. Some router-table designs

rely on the relatively short IPv4 addresses to achieve desired efficiency. However,

these designs scale poorly as the prefix length grows.

We propose several schemes to represent one-dimensional dynamic range tables,

that is, tables into/from which rules are inserted/deleted concurrent with packet

classification, and filters are specified as ranges. Our schemes allow real-time update

and at the same time provide efficient lookup. The lookup and update complexities

of our schemes are logarithmic functions of the number of filters. The first scheme,

PST, which is based on priority search trees, uses the most-specific-rule tie breaker.

The second scheme is called BOB (Binary search tree On Binary search tree). This

scheme uses the highest priority tie breaker. In order to utilize the wide cache line

size and reduce the tree height, a third scheme is developed in which the top level

tree is a B-Tree. This scheme also uses the highest priority tie breaker. All three

schemes are suitable for prefix filters as well as for range filters in which no two filters

have intersecting ranges. In addition, the PST also can handle a conflict-free range



1.1 Introduction

Today's Internet consists of thousands of packet networks interconnected by

routers. When a host sends a packet into the Internet, the routers relay the packet

towards its final destination. The routers exchange routing information with each

other, and use the information gathered to calculate the paths to all reachable

destinations. Each packet is treated independently and forwarded to a next router based

on its destination address.

The data structure a router uses to look up the next hop is called the router table.

Each entry in the router table is a rule of the form (address prefix, next hop). Table

1-1 shows a set of five rules. We use W to denote the maximum possible length of a

prefix. In IPv4, W = 32 and in IPv6, W = 128. In Table 1-1 W is 5. The prefix P1,

which matches all the destination addresses, is called the default prefix. The prefix

P3 matches the destination addresses between 16 and 19. If the address prefix of a

rule matches the destination address the incoming packet carries, the next hop of this

rule is used to forward the packet.

Address prefixes were introduced by CIDR (Classless Interdomain Routing) to deal

with address depletion and router table explosion. The result of CIDR's address

aggregation is that there may be several rules whose prefixes match the destination

address. For example, the rules P1, P3 and P4 in Table 1-1 match the destination

address 19. In this case, a tie breaker is needed to select one of the matching rules.

Most specific matching is usually used, namely, the longest prefix matching the

destination address is the winner. For our example router table, P4 is the winner for

destination address 19.

The other two popular tie breakers are first matching and highest priority

matching. For the first matching tie breaker, the rule table is assumed to be a linear list of rules

with the rules indexed 1 through n for an n-rule table. The first rule that matches

the incoming packet is used. Notice that, with the ordering of Table 1-1, rule R1 is selected for every incoming

packet since it matches all the destination addresses. In order to give

other rules a chance to become the winner, we must index the rules carefully, and the default

prefix should be the last rule.

In highest priority matching, each rule is assigned a priority, and the rule with

the highest priority is selected from those matching the incoming packet.1 Notice

that the first matching tie breaker is a special case of the highest priority matching

tie breaker (simply assign each rule a priority equal to the negative of its index in the

linear list).

Table 1-1: A router table with five rules (W = 5)

Rule Name   Prefix Name   Prefix   Next Hop   Range Start   Range Finish
R1          P1            *        N1         0             31
R2          P2            0101*    N2         10            11
R3          P3            100*     N3         16            19
R4          P4            1001*    N4         18            19
R5          P5            10111    N5         23            23
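
The three tie breakers can be illustrated on Table 1-1 with a small sketch. This is only an illustration under stated assumptions: the names RULES, prefix_range, matches, longest_prefix_match, first_match and highest_priority_match are hypothetical, the default prefix is written as the empty bit string, and every lookup is a linear scan rather than one of the data structures developed in this dissertation.

```python
W = 5  # address width used in Table 1-1

# (rule name, prefix bits without the trailing '*', next hop).
# The empty string stands for the default prefix '*'.
RULES = [
    ("R1", "", "N1"),
    ("R2", "0101", "N2"),
    ("R3", "100", "N3"),
    ("R4", "1001", "N4"),
    ("R5", "10111", "N5"),
]

def prefix_range(bits, w=W):
    """Return the (start, finish) address range covered by a prefix."""
    free = w - len(bits)                      # number of unspecified low-order bits
    start = (int(bits, 2) if bits else 0) << free
    return start, start + (1 << free) - 1

def matches(bits, addr, w=W):
    start, finish = prefix_range(bits, w)
    return start <= addr <= finish

def longest_prefix_match(addr):
    """Most specific matching: the longest matching prefix wins."""
    candidates = [r for r in RULES if matches(r[1], addr)]
    return max(candidates, key=lambda r: len(r[1]))

def first_match(addr):
    """First matching: scan the linear list in index order."""
    for rule in RULES:
        if matches(rule[1], addr):
            return rule
    return None

def highest_priority_match(addr, priority):
    """Highest priority matching: priority maps a rule name to a number."""
    candidates = [r for r in RULES if matches(r[1], addr)]
    return max(candidates, key=lambda r: priority[r[0]])

# Priority equal to the negated list index reproduces first matching.
NEGATED_INDEX = {rule[0]: -(i + 1) for i, rule in enumerate(RULES)}
```

With the rule order of Table 1-1, longest_prefix_match(19) selects R4, while first_match(19) and highest_priority_match(19, NEGATED_INDEX) both select R1, which is why the default prefix should be indexed last under first matching.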

The query based on the destination address is usually called address lookup or

packet forwarding. In general other fields such as source address and port numbers

may also be used, and the router table consists of the rules of the form (F, A), where

F is a filter and A is an action. The action component of a rule specifies what is

to be done when a packet that satisfies the rule filter is received. Sample actions

are drop the packet, forward the packet along a certain output link, and reserve a

specified amount of bandwidth. Tie breakers similar to those mentioned earlier are

used to select a rule from the set of rules that match the incoming packet. We call

this problem packet classification.

1 We may assume either that all priorities are distinct or that selection among rules
that have the same priority may be done in an arbitrary fashion.

1.1.1 Static Router Table

In a static rule table, the rule set does not vary in time. For these tables, we are

concerned primarily with the following metrics:

1. Time required to process an incoming packet. This is the time required to search

the rule table for the rule to use. We refer to this operation as a lookup.

2. Preprocessing time. This is the time to create the rule-table data structure.

3. Storage requirement. That is, how much memory is required by the rule-table

data structure?

To handle updates, static schemes usually use two copies of the router table,

working and shadow. Lookups are done using the working table. Updates are performed

in the background (either in real time on the shadow table or by batching updates

and reconstructing an updated shadow at suitable intervals); periodically, the shadow

replaces the working table, and the caches of the working table are flushed. In this

mode of update operation, many packets may be misclassified, because the working

copy isn't immediately updated. The number of misclassified packets depends on

the periodicity with which the working table can be replaced by an updated shadow.

Further, additional memory is required for the shadow table and for periodic

reconstruction of the working table. It is important to have a short preprocessing time in

order to reduce the number of misclassified packets.

1.1.2 Dynamic Router Table

In practice, rule tables are seldom truly static. At best, rules may be added

to or deleted from the rule table infrequently. Typically, in a "static" rule table,

inserts/deletes are batched and the router-table data structure reconstructed as needed.

In a dynamic rule table, rules are added/deleted with some frequency. For such tables,

inserts/deletes are not batched. Rather, they are performed in real time.

We believe that dynamic structures for router tables are becoming a necessity.

First, updates occur frequently in the backbone area. Labovitz et al. [1] found that the

update rate could reach as high as 1000 per second. These updates stem from route

failure, route repair, and route fail-over. With the number of autonomous systems

continuously increasing, it is reasonable to expect the update rate to rise. The router

table needs to be updated in order to reflect each route change. Second, fast processing

of updates is preferred because during batching and reconstruction, end-to-end

delay increases, packet loss rises dramatically, and part of the network may

experience connectivity loss. Labovitz et al. [2] observed dramatically increased packet

loss and end-to-end latency during BGP routing changes. Batching and expensive

reconstruction make things worse. While BGP takes time to converge, route-repair

events usually do not cause multiple announcements, and the latency for the router table

to become stable after these events should depend only on the network delay and

router processing delay along the path [2]. In addition, when the BGP convergence time

is reduced, the processing delay may dominate. Pei et al. [3] reduce the convergence

time from 30.3 seconds to 0.3 seconds for a failure withdrawal in the testbed by

applying two consistency assertions to BGP. Macian et al. [4] emphasize the importance

of supporting a high update rate. Dynamic router tables that permit high-speed

inserts and deletes are essential in QoS and VAS applications [4]. For example, edge

routers that do stateful filtering require high-speed updates [5].

For dynamic router tables, we are concerned additionally with the time required

to insert/delete a rule. For a dynamic rule table, the initial rule-table data structure

is constructed by starting with an empty data structure and then inserting the initial

set of rules into the data structure one by one. So, typically, in the case of dynamic

tables, the preprocessing metric, mentioned above, is very closely related to the insert

time.

For dynamic router tables, the following metrics are measured to compare the

schemes:

1. Lookup Time.

2. Insertion Time. This is the time required to insert a new rule into the rule

table.

3. Deletion Time. This is the time required to delete a rule from the rule table.

4. Storage requirement.

Note that there is only a working table for dynamic schemes and updates are

made directly to the working table in real time. In this mode of update, no packet is

improperly classified. However, packet classification/forwarding may be delayed until

a preceding update completes. To minimize this delay, it is necessary that update be

done as fast as possible.

Another important concern for both static and dynamic router tables

is scalability to IPv6. IPv6, the next generation of IP, uses 128-bit addresses

(W = 128). Although some of the schemes in Section 1.2 work well for IPv4 (W =

32), they scale poorly as the prefix length grows.

1.2 Related Work

Data structures for rule tables in which each filter is an address prefix and the rule

priority is the length of this prefix2 have been intensely researched in recent years.

We refer to rule tables of this type as longest-matching prefix-tables (LMPT). We

refer to rule tables in which the filters are ranges and in which the highest-priority

matching filter is used as highest-priority range-tables (HPRT). When the filters of

no two rules of an HPRT intersect, the HPRT is a nonintersecting HPRT (NHPRT).

Although every LMPT is also an NHPRT, an NHPRT may not be an LMPT.

Ruiz-Sanchez et al. [6] review data structures for static LMPTs and Sahni et

al. [7] review data structures for both static and dynamic LMPTs.

1.2.1 Trie

Several trie-based data structures for LMPTs have been proposed [8, 9, 10, 11,

12, 13, 14]. Structures such as that of Doeringer et al. [10] use the path-compression

technique. Thus the memory requirement is O(n). The search is guided by the input

key and only inspects the bit position stored at the internal node due to a successful

search bias. When the search reaches the leaf node and the search does not succeed,

the downward path may be backtracked to find the longest matching prefix. Hence

the search can be carried out in O(W) time. The update operation, insert or delete,

is natural in trie structure, and can also be performed in O(W) time. The memory

accesses during these operations are O(W). For IPv6 (W = 128), O(W) memory accesses

are quite expensive. Moreover, path compression reduces the height of trie only if

the prefixes scatter inside the trie sparsely. When the number of prefixes increases,

lots of branch nodes are needed and path compression does not have many nodes to

compress. Ruiz-Sanchez et al. [6] observe that the height of the BSD version of the

path-compressed trie is 26 for an IPv4 router table with 47,113 prefixes, and the height of

a simple binary trie is only 30.

2 For example, the filter 10* matches all destination addresses that begin with the
bit sequence 10; the length of this prefix is 2.

In order to reduce the trie height, Gupta et al. [15] use the DIR-24-8 scheme, which

fully expands the binary trie at depth 24, i.e., all prefixes with length less than or

equal to 24 are expanded into as many 24-bit prefixes as needed, and a table with 2^24

entries is used to store these expanded prefixes. Prefixes longer than 24 bits

are stored in a second table. The correspondence is established by storing

pointers in the first table which point to the proper entries in the second table. The

first table has 2^24 entries, and each entry is 16 bits (32M bytes in total). The first bit

of each entry indicates whether the next 15 bits store the next hop or a pointer into

the second table. At the cost of more than 32M bytes of memory, the scheme can perform a search

in at most two memory accesses. But it is not scalable to IPv6 because expanding to

24 bits already takes too much memory. Gupta et al. [15] also propose alternatives

that use less memory but require more memory accesses.
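
The two-table idea can be sketched at a reduced scale. The following is a minimal illustration, not the implementation of Gupta et al. [15]: the function names build_two_level and lookup_two_level are hypothetical, the first level is expanded at depth k instead of 24, and each entry is a Python tuple rather than a 16-bit word with a flag bit. It assumes 1 <= k < w.

```python
def build_two_level(rules, w, k):
    """rules: list of (prefix_bits, next_hop). Returns (table1, table2).

    table1 has 2**k entries, each ("hop", next_hop) or ("ptr", block_index).
    table2 is a list of blocks, each with 2**(w - k) next hops.
    """
    table1 = [None] * (1 << k)
    table2 = []
    # Shorter prefixes first, so longer (more specific) prefixes overwrite them.
    for bits, hop in sorted(rules, key=lambda r: len(r[0])):
        if len(bits) <= k:
            # Expand the prefix to every k-bit value it covers.
            free = k - len(bits)
            base = (int(bits, 2) if bits else 0) << free
            for i in range(base, base + (1 << free)):
                table1[i] = ("hop", hop)
        else:
            top = int(bits[:k], 2)
            entry = table1[top]
            if entry is None or entry[0] == "hop":
                # Allocate a second-level block that inherits the short-prefix hop.
                inherited = entry[1] if entry else None
                table2.append([inherited] * (1 << (w - k)))
                table1[top] = ("ptr", len(table2) - 1)
            block = table2[table1[top][1]]
            free = w - len(bits)
            base = int(bits[k:], 2) << free
            for i in range(base, base + (1 << free)):
                block[i] = hop
    return table1, table2

def lookup_two_level(table1, table2, addr, w, k):
    """At most two table accesses per lookup."""
    entry = table1[addr >> (w - k)]
    if entry is None:
        return None
    kind, value = entry
    if kind == "hop":
        return value
    return table2[value][addr & ((1 << (w - k)) - 1)]

# The rules of Table 1-1 with W = 5, expanded at depth k = 3.
TABLE_1_1 = [("", "N1"), ("0101", "N2"), ("100", "N3"), ("1001", "N4"), ("10111", "N5")]
T1, T2 = build_two_level(TABLE_1_1, 5, 3)
```

The sketch also makes the update cost visible: inserting or deleting a short prefix may rewrite many expanded entries, which is why schemes of this kind are better suited to static tables.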

Degermark et al. [9] use a similar prefix expansion technique at multiple depths.

Bitmap compression is deployed to reduce the memory requirement greatly. A

router table with 40,000 rules can fit into 160K bytes. In the worst case, the number

of memory accesses is nine. Huang et al. [16] fully expand the binary trie at depth

16 and also expand the subtries rooted at the nodes at depth 16 to their own depths.

Bitmap compression is also applied to reduce the memory requirement. The

router tables used in the experiment can be compacted into less than 500K bytes.

The number of worst case memory accesses is three. Both schemes [9, 16] heavily

depend on the prefix distribution. It is hard to decide a proper memory size for

the scheme ahead of time. For example, in the extreme case, if n prefixes in the router

table all have length 32, and their first 16 bits are distinct (assume n <= 2^16), the

scheme [16] needs at least 2^14 n bytes.

Nilsson et al. [11] apply the level compression as well as path compression to

the binary trie. A binary trie is path-compressed first, then level compression is

used to reduce the height of the trie further by substituting the k highest levels of the

binary trie with a single degree-2^k node. Although the search complexity of the LC (level

compressed) trie is still O(W), the height of the LC-trie is around 8 for the router tables

used in the authors' analysis.

These data structures [9, 11, 15] as well as Srinivasan et al. [12] attempt to

optimize lookup time through an expensive preprocessing step. While providing

very fast lookup capability, they have a prohibitive insert/delete time, so they are suitable

only for static router-tables (i.e., tables into/from which no inserts and deletes take

place).

Sahni et al. [13, 14] provide efficient constructions for fixed-stride and variable-

stride multibit tries. The lookup time and memory requirement are optimized through

expensive preprocessing.

Aiming to improve the update speed of fixed-stride multibit tries on pipelined

ASIC architectures, Basu et al. [17] describe an algorithm to optimize and balance the

memory requirement across the pipeline stages.

1.2.2 Sets of Equal-Length Prefixes

Waldvogel et al. [18] have proposed a scheme that performs a binary search on

hash tables organized by prefix length. In order to support binary search, O(log W)

markers are generated for each prefix, and the longest matching prefix is precomputed

for each marker. This binary search scheme has an expected complexity of O(log W)

for lookup. The memory requirement is bounded by O(n log W). By introducing

a technique called marker partitioning in the full version of Waldvogel et al. [18],

the scheme has O(a T7iWlog W) insert/delete time and an increased search time

O(a + log W), for a > 1.

1.2.3 End-Point Array

An alternative adaptation of binary search to longest-prefix matching is

developed in [19]. The distinct end points (start points and finish points) of the ranges

defined by the prefixes are stored in ascending order in an array. The end points

divide the universe into O(n) basic intervals. The longest matching prefix, LMP(d), is precomputed for each

interval as well as for each end point. LMP(d) is found by performing a binary search

on this ordered array. A lookup in a table that has n prefixes takes O(log n) time.

Because the schemes of [19] use expensive precomputation, they are not suited for

dynamic router-tables.
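
A minimal sketch of the end-point array idea follows. It is an illustration under stated assumptions rather than the scheme of [19]: the names build_endpoint_array and lookup_endpoint_array are hypothetical, the precomputation is done by brute force, and the stored answer is the next hop of the longest matching prefix.

```python
from bisect import bisect_right

def build_endpoint_array(rules, w):
    """rules: list of (prefix_bits, next_hop).

    Returns (points, at_point, between): the sorted distinct end points, the
    precomputed answer at each end point, and the precomputed answer for the
    addresses strictly between an end point and the next one.
    """
    ranges = []
    for bits, hop in rules:
        free = w - len(bits)
        start = (int(bits, 2) if bits else 0) << free
        ranges.append((start, start + (1 << free) - 1, len(bits), hop))

    def lmp(d):
        # Brute-force longest matching prefix, used only during precomputation.
        best = max((r for r in ranges if r[0] <= d <= r[1]),
                   key=lambda r: r[2], default=None)
        return best[3] if best else None

    points = sorted({p for s, f, _, _ in ranges for p in (s, f)})
    at_point = [lmp(p) for p in points]
    between = []
    for i, p in enumerate(points):
        # No range starts or ends strictly inside a basic interval, so every
        # address in it has the same answer as p + 1.
        has_gap = i + 1 < len(points) and points[i + 1] > p + 1
        between.append(lmp(p + 1) if has_gap else None)
    return points, at_point, between

def lookup_endpoint_array(table, d):
    """Binary search for the largest end point <= d: O(log n) time."""
    points, at_point, between = table
    i = bisect_right(points, d) - 1
    if i < 0:
        return None
    return at_point[i] if points[i] == d else between[i]

# The rules of Table 1-1 (W = 5).
EPA = build_endpoint_array(
    [("", "N1"), ("0101", "N2"), ("100", "N3"), ("1001", "N4"), ("10111", "N5")], 5)
```

A single insertion or deletion may change the precomputed answer of many basic intervals, so the whole array generally has to be recomputed; this is the update cost referred to above.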

1.2.4 Multiway Range Tree

Suri et al. [20] have proposed a B-tree data structure for dynamic LMPTs. Using

their structure, we may find the longest matching-prefix, LMP(d), in O(log_m n) time (m is the order of the B-tree).

However, inserts/deletes take O(W log_m n) time. When W bits fit in O(1) words (as

is the case for IPv4 and IPv6 prefixes) logical operations on W-bit vectors can be done

in O(1) time each. In this case, the scheme of Suri et al. [20] takes O(mlog2 W log n)

time for an insertion and O(m log_m n + W) for a deletion. Assuming one node fits into

O(1) cache lines, the number of memory accesses that occur when the data structure

of Suri et al. [20] is used is O(log_m n) per search, and O(m log_m n) per update.

1.2.5 O(log n) Dynamic Solutions

Sahni et al. [21, 22] develop data structures, called a collection of red-black trees

(CRBT) and alternative collection of red-black trees (ACRBT), that support the

three operations of a dynamic LMPT in O(log n) time each. The number of cache

misses is also O(log n). Sahni et al. [22] show that their ACRBT structure is easily

modified to extend the biased-skip-list structure of Ergun et al. [23] so as to obtain

a biased-skip-list structure for dynamic LMPTs. Using this modified biased skip-

list structure, lookup, insert, and delete can each be done in O(log n) expected time

and O(log n) expected cache misses. Like the original biased-skip list structure of

Ergun et al. [23], the modified structure of Sahni et al. [22] adapts so as to perform

lookups faster for bursty access patterns than for non-bursty patterns. The ACRBT

structure may also be adapted to obtain a collection of splay trees structure [22],

which performs the three dynamic LMPT operations in O(log n) amortized time and

which adapts to provide faster lookups for bursty traffic.

1.2.6 Highest-Priority Prefix Table

When an HPPT (highest-priority prefix-table) is represented as a binary trie [24],

each of the three dynamic HPPT operations takes O(W) time and cache misses.

Gupta et al. [25] have developed two data structures for dynamic HPRTs-heap

on trie (HOT) and binary search tree on trie (BOT). The HOT structure takes O(W)

time for a lookup and O(W log n) time for an insert or delete. The BOT structure

takes O(W log n) time for a lookup and O(W) time for an insert/delete. The number

of cache misses in a HOT and BOT is asymptotically the same as the time complexity

of the corresponding operation.

1.2.7 TCAM

Ternary content-addressable memories, TCAMs, use parallelism to achieve O(1)

lookup [26]. Each memory cell of a TCAM may be set to one of three states: 0, 1, and

don't care. The prefixes of a router table are stored in a TCAM in descending order

of prefix length. Assume that each word of the TCAM has 32 cells. The prefix 10* is

stored in a TCAM word as 10??...?, where ? denotes a don't care and there are 30 ?s in

the given sequence. To do a longest-prefix match, the destination address is matched,

in parallel, against every TCAM entry. Using a TCAM and a sorted-by-length linear list, the longest

matching-prefix can be determined in O(1) time. A prefix may be inserted or deleted

in O(W) time, where W is the length of the longest prefix [27]. Although TCAMs

provide a simple and efficient solution for static and dynamic router tables, this

solution requires special hardware, costs more, and uses more power and board space

than solutions that employ SDRAMs. TCAMs have longer latency than SDRAMs.

Since a TCAM requires an arbitration module to choose the longest matching prefix

and a more complex arbitration module is needed for a bigger router table, the

latency of a TCAM increases with router table size. EZchip Technologies, for example,

claim that classifiers can forgo TCAMs in favor of commodity memory solutions [5, 28].

Algorithmic approaches that have lower power consumption and are conservative on

board space at the price of slightly increased search latency are sought. "System

vendors are willing to accept some latency in their searches if it means lowering the

power of a line card" [28].
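
The ternary matching rule can be mimicked in software as follows. This is only a sketch of the matching semantics: the names tcam_word, build_tcam and tcam_lookup are hypothetical, and the sequential loop stands in for the comparison that TCAM hardware performs on all words in parallel.

```python
def tcam_word(prefix_bits, w):
    """Pad a prefix with don't-care cells, e.g. '10' -> '10' followed by w - 2 '?'."""
    return prefix_bits + "?" * (w - len(prefix_bits))

def build_tcam(rules, w):
    """Store (word, next_hop) pairs in descending order of prefix length."""
    ordered = sorted(rules, key=lambda r: len(r[0]), reverse=True)
    return [(tcam_word(bits, w), hop) for bits, hop in ordered]

def tcam_lookup(tcam, addr, w):
    """Return the next hop of the first (hence longest) matching word."""
    key = format(addr, f"0{w}b")
    for word, hop in tcam:  # hardware matches every word at once
        if all(c == "?" or c == k for c, k in zip(word, key)):
            return hop
    return None

# The rules of Table 1-1 (W = 5).
TCAM = build_tcam(
    [("", "N1"), ("0101", "N2"), ("100", "N3"), ("1001", "N4"), ("10111", "N5")], 5)
```

Because the words are kept sorted by prefix length, the first matching word is also the longest matching prefix; keeping that order intact is what makes an insertion or deletion cost more than a single write.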

1.2.8 Others

Cheung et al. [29] developed a model for table-driven route lookup and cast the

table design problem as an optimization problem within this model. Their model

accounts for the memory hierarchy of modern computers, and they optimize average

performance rather than worst-case performance.

Solutions that involve modifications to the Internet Protocol (i.e., the addition

of information to each packet) have also been proposed [30, 31, 32].

1.3 Contribution

We have developed data structures for dynamic router tables. The data

structures use O(n) space except that RIBT uses O(n log_m n) space. Our first data

structure, PST [33, 34], uses the most specific matching tie breaker. It permits one to

search, insert, and delete in O(log n) time each. Although O(log n) time data

structures for prefix tables were known prior to our work [21, 22], the PST is more memory

efficient than the data structures of [21, 22]. Further, PST is significantly superior on

the insert and delete operations, while being competitive on the search operation. For

nonintersecting ranges and conflict-free ranges PSTs are the first to permit O(log n)

search, insert, and delete.

The second data structure, BOB [35], works for highest-priority matching with

nonintersecting ranges. The highest-priority rule that matches a destination address

may be found in O(log^2 n) time; a new rule may be inserted and an old one deleted

in O(log n) time. For the case when all rule filters are prefixes, the data structure

PBOB (prefix BOB) permits highest-priority matching as well as rule insertion and

deletion in O(W) time each. On practical rule tables, BOB and PBOB perform

each of the three dynamic-table operations in O(log n) time and with O(log n) cache

misses. PBOB can also support the dynamic-table operations in O(log n) time and

with O(log n) cache misses for nonintersecting ranges when the number of nesting

levels is a constant.

To utilize the wide cache line size, e.g., 64-byte cache line, we propose B-tree

data structures for dynamic router-tables for the cases when the filters are prefixes as

well as when they are non-intersecting ranges. A crucial difference between our data

structure for prefix filters and the B-tree router-table data structure of Suri et al. [20]

is that in our data structure, each prefix is stored in O(1) B-tree nodes per B-tree

level, whereas in the structure of Suri et al. [20], each prefix is stored in O(m) nodes

per level (m is the order of the B-tree). As a result of this difference, a prefix may

be inserted or deleted from an n-filter router table accessing only O(log_m n) nodes

of our data structure; these operations access O(m log_m n) nodes using the structure

of Suri et al. [20]. Even though the asymptotic complexity of prefix insertion and

deletion is the same in both B-tree structures, experiments conducted by us show

that because of the reduced cache misses for our structure, the measured average

insert and delete times using our structure are about 30% less than when the B-tree

structure of Suri et al. [20] is used. Further, an update operation using the B-tree

structure of Suri et al. [20] will, in the worst case, make 2.5 times as many cache

misses as made when our structure is used. The asymptotic complexity to find the

longest matching prefix is the same, O(m log_m n), in both B-tree structures, and in

both structures, this operation accesses O(log_m n) nodes. The measured time for this

operation also is nearly the same for both data structures. Both B-tree structures

for prefix router-tables take O(n) memory. However, our structure is more memory

efficient by a constant factor. For the case of non-intersecting ranges, the

highest-priority range that matches a given destination address may be found in O(m log_m n)

time using our proposed B-tree data structure. The time to insert and delete a range

is O((m + D) log_m n), where D is the maximum nesting depth of the ranges. Our

data structure for non-intersecting ranges requires O(n log_m n) memory and O(log_m n)

nodes are accessed during an operation.

With O(log n) operation time, our data structures scale well to large

router tables. Since the complexity is independent of the prefix length, our data

structures are also scalable to IPv6.

Another important feature of our data structures is that nonintersecting ranges

are supported naturally, whereas most existing data structures support ranges

(necessary when the filters are defined for port numbers) by breaking one range into O(W)

prefixes which results in O(W log n) memory requirement. Supporting ranges is also

a nice feature for network layer addresses. The size of the range that a prefix covers must be a

power of two, and the range must start at a number which is a multiple of the range size.

But the end points and the size of a normal range can be any number. Supporting

ranges means one can allocate a range with arbitrary size to a network (AppleTalk

supports this feature) and range aggregation is potentially better than prefix

aggregation. For example, two disjoint prefixes can aggregate into one prefix only if their

ranges are adjacent to each other and they have the same length, whereas two

disjoint ranges can aggregate into one range as long as they are next to each other.

So, range aggregation is expected to result in router tables that have fewer rules.


In this chapter, we show in Section 2.2 how priority-search trees may be used to

represent dynamic prefix-router-tables. The resulting structure, which is conceptually

simpler than the CRBT structure of Sahni et al. [21], permits lookup, insert, and

delete in O(log n) time each. For range router-tables, we consider the case when the

best-matching range is the most-specific matching range (this is the range analog of

longest-matching prefix). In Section 2.3, we show that dynamic range-router-tables

that employ most-specific range matching and in which no two ranges overlap may be

efficiently represented using two priority-search trees. Using this two-priority-search-

tree representation, lookup, insert, and delete can be done in O(log n) time each. The

general case of non-conflicting ranges is considered in Section 2.4. In this section, we

augment the data structure of Section 2.3 with several red-black trees to obtain

a range-router-table representation for non-conflicting ranges that permits lookup,

insert, and delete in O(log n) time each. Section 2.1 introduces the terminology we

use. In this section, we also develop the mathematical foundation that forms the

basis of our data structures. Experimental results are reported in Section 2.5.

2.1 Preliminaries

2.1.1 Prefixes and Longest-Prefix Matching

The prefix 1101* matches all destination addresses that begin with 1101 and 10010* matches all destination addresses that begin with 10010. For example, when W = 5, 1101* matches the addresses {11010, 11011} = {26, 27}, and when W = 6, 1101* matches {110100, 110101, 110110, 110111} = {52, 53, 54, 55}. Suppose that a router table includes the prefixes P1 = 101*, P2 = 10010*, P3 = 01*, P4 = 1*, and P5 = 1010*. The destination address d = 1010100 is matched by the prefixes P1, P4, and P5. Since |P1| = 3 (the length of a prefix is the number of bits in the prefix), |P4| = 1, and |P5| = 4, P5 is the longest prefix that matches d. In longest-prefix routing, the next hop for a packet destined for d is given by the longest prefix that matches d.

Figure 2-1: Relationships between pairs of ranges. (A) Two ranges are disjoint. (B) Two ranges are nested. (C) Two ranges intersect.
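The longest-prefix rule of Section 2.1.1 can be illustrated with a brute-force sketch. It is not one of the data structures developed in this chapter; the dictionary layout and the function names `matching_prefixes` and `longest_prefix_match` are assumptions made only for this example, and prefixes are stored without their trailing `*`.

```python
def matching_prefixes(table, d):
    """Names of all prefixes (stored without the trailing '*') that match bit string d."""
    return sorted(name for name, bits in table.items() if d.startswith(bits))


def longest_prefix_match(table, d):
    """Name of the longest matching prefix, or None when no prefix matches d."""
    names = matching_prefixes(table, d)
    return max(names, key=lambda name: len(table[name])) if names else None


# Router table from the example in the text (the dict layout is hypothetical).
table = {"P1": "101", "P2": "10010", "P3": "01", "P4": "1", "P5": "1010"}
d = "1010100"
print(matching_prefixes(table, d))     # prefixes that match d
print(longest_prefix_match(table, d))  # the longest of them
```

This linear scan takes time proportional to the number of prefixes per lookup, which is exactly the cost the structures of this chapter are designed to avoid.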

2.1.2 Ranges and Projections

Definition 1 A range $r = [u, v]$ is a pair of addresses $u$ and $v$, $u \le v$. The range $r$ represents the addresses $\{u, u + 1, \ldots, v\}$. $start(r) = u$ is the start point of the range and $finish(r) = v$ is the finish point of the range. The range $r$ covers or matches all addresses $d$ such that $u \le d \le v$. $range(q)$ is a predicate that is true iff $q$ is a range.

The start point of the range r = [3, 9] is 3 and its finish point is 9. This range

covers or matches the addresses {3, 4, 5, 6, 7, 8, 9}. In IPv4, u and v are up to 32

bits long, and in IPv6, u and v may be up to 128 bits long. The IPv4 prefix P = 0*

corresponds to the range $[0, 2^{31} - 1]$. The range [3,9] does not correspond to any single

IPv4 prefix. We may draw the range r = [u, v] = {u, u + 1,..., v} as a horizontal line

that begins at u and ends at v. Figure 2-1 shows ranges drawn in this fashion.

Notice that every prefix of a prefix router-table may be represented as a range.

For example, when W = 6, the prefix P = 1101* matches addresses in the range

[52,55]. So, we say P = 1101* = [52,55], start(P) = 52, and finish(P) = 55.
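The correspondence between a prefix and its range can be sketched as follows. The function name `prefix_to_range` and the bit-string representation are assumptions for illustration; `W` is the address length as in the text.

```python
def prefix_to_range(bits, W):
    """Map a prefix (bit string without the trailing '*') to (start, finish) for W-bit addresses."""
    free = W - len(bits)                       # number of unspecified low-order bits
    start = (int(bits, 2) if bits else 0) << free
    return (start, start + (1 << free) - 1)    # the range size is 2**free


print(prefix_to_range("1101", 6))   # range of 1101* when W = 6
print(prefix_to_range("0", 32))     # range of the IPv4 prefix 0*
```

The range size is always a power of two and the start point is a multiple of that size, which is why an arbitrary range such as [3, 9] has no single-prefix equivalent.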

Since a range represents a set of (contiguous) points, we may use standard set

operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example,

$[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range.

For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$ is not a range.

Definition 2 Let $r = [u, v]$ and $s = [x, y]$ be two ranges. Let $overlap(r, s) = r \cap s$.

(a) The predicate $disjoint(r, s)$ is true iff $r$ and $s$ are disjoint.

$disjoint(r, s) \Leftrightarrow overlap(r, s) = \emptyset \Leftrightarrow v < x \lor y < u$

Figure 2-1(A) shows the two cases for disjoint sets.

(b) The predicate $nested(r, s)$ is true iff one of the ranges is contained within the other.

$nested(r, s) \Leftrightarrow overlap(r, s) = r \lor overlap(r, s) = s$

$\Leftrightarrow r \subseteq s \lor s \subseteq r$

$\Leftrightarrow x \le u \le v \le y \lor u \le x \le y \le v$

Figure 2-1(B) shows the two cases for nested sets.

(c) The predicate $intersect(r, s)$ is true iff $r$ and $s$ have a nonempty intersection that is different from both $r$ and $s$.

$intersect(r, s) \Leftrightarrow r \cap s \ne \emptyset \land r \cap s \ne r \land r \cap s \ne s$

$\Leftrightarrow \neg disjoint(r, s) \land \neg nested(r, s)$

$\Leftrightarrow u < x \le v < y \lor x < u \le y < v$

Figure 2-1(C) shows the two cases for ranges that intersect.

Notice that $overlap(r, s) = [x, v]$ when $u < x \le v < y$ and $overlap(r, s) = [u, y]$

when $x < u \le y < v$.

[2, 4] and [6, 9] are disjoint; [2,4] and [3,4] are nested; [2,4] and [2,2] are nested;

[2,8] and [4,6] are nested; [2,4] and [4,6] intersect; and [3,8] and [2,4] intersect. [4, 4]
is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4].
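The three relationships of Definition 2 translate directly into code. The sketch below represents a range [u, v] as the tuple `(u, v)` and uses `None` for the empty overlap; both choices are assumptions of this example rather than part of the text.

```python
def overlap(r, s):
    """Intersection of two ranges, or None when it is empty."""
    lo, hi = max(r[0], s[0]), min(r[1], s[1])
    return (lo, hi) if lo <= hi else None


def disjoint(r, s):
    return overlap(r, s) is None


def nested(r, s):
    # One range contains the other exactly when the overlap equals one of them.
    return overlap(r, s) in (r, s)


def intersect(r, s):
    return not disjoint(r, s) and not nested(r, s)


print(overlap((2, 4), (4, 6)), intersect((2, 4), (4, 6)))
print(overlap((3, 8), (2, 4)), nested((2, 8), (4, 6)))
```

Because `intersect` is defined as the negation of the other two predicates, exactly one of the three holds for any pair of ranges.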

Lemma 1 Let r and s be two ranges. Exactly one of the following is true.

1. disjoint(r, s)

2. nested(r, s)

3. intersect(r, s)

Proof Straightforward. ∎

Definition 3 Let $R = \{r_1, \ldots, r_n\}$ be a set of $n$ ranges. The projection, $\Pi(R)$, of $R$ is

$\Pi(R) = \bigcup_{1 \le i \le n} r_i$

That is, $\Pi(R)$ comprises all addresses that are covered by at least one range of $R$.

For $A = \{[2, 5], [3, 6], [8, 9]\}$, $\Pi(A) = \{2, 3, 4, 5, 6, 8, 9\}$, and for $B = \{[4, 8], [7, 9]\}$, $\Pi(B) = \{4, 5, 6, 7, 8, 9\}$. $\Pi(A)$ is not a range. However, $\Pi(B)$ is the range $[4, 9]$. Note that $\Pi(R)$ is a range iff $d \in \Pi(R)$ for every $d$, $u \le d \le v$, where $u = \min\{d \mid d \in \Pi(R)\}$ and $v = \max\{d \mid d \in \Pi(R)\}$.
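A small sketch of the projection and of the test for whether a projection is a range follows. It enumerates addresses explicitly, which is practical only for toy examples; the names `projection` and `is_range` are assumptions of this example.

```python
def projection(R):
    """All addresses covered by at least one range (u, v) of R."""
    return {d for (u, v) in R for d in range(u, v + 1)}


def is_range(points):
    """True iff the non-empty address set is contiguous, i.e. equals [min, max]."""
    return bool(points) and len(points) == max(points) - min(points) + 1


A = [(2, 5), (3, 6), (8, 9)]
B = [(4, 8), (7, 9)]
print(sorted(projection(A)), is_range(projection(A)))
print(sorted(projection(B)), is_range(projection(B)))
```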

Lemma 2 Let $R = \{r_1, r_2, \ldots, r_n\}$ be a set of $n$ ranges such that $\Pi(R) = [u, v]$.

(a) $u = minStart(R) = \min\{start(r_i)\}$ and $v = maxFinish(R) = \max\{finish(r_i)\}$.

(b) Let $s$ be a range. $\Pi(R \cup \{s\})$ is a range iff $start(s) \le v + 1$ and $finish(s) \ge u - 1$.

(c) When $\Pi(R \cup \{s\}) = [x, y]$, $x = \min\{u, start(s)\}$ and $y = \max\{v, finish(s)\}$.

Proof (a) is straightforward. Figure 2-2 shows all possible cases for which $\Pi(R \cup \{s\})$ is a range; $s$ is shown as a solid line. (b) and (c) are readily verified for each case of Figure 2-2. ∎

2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges

Definition 4 The range $r$ is more specific than the range $s$ iff $r \subset s$.

Figure 2-2: Cases for Lemma 2

[2, 4] is more specific than [1,6], and [5, 9] is more specific than [5, 12]. Since [2, 4]

and [8, 14] are disjoint, neither is more specific than the other. Also, since [4, 14] and

[6, 20] intersect, neither is more specific than the other.

Definition 5 Let $R$ be a range set. $ranges(d, R)$ (or simply $ranges(d)$ when $R$ is implicit) is the subset of ranges of $R$ that match/cover the destination address $d$. $msr(d, R)$ (or $msr(d)$) is the most specific range of $R$ that matches $d$. That is, $msr(d)$ is the most specific range in $ranges(d)$. $msr([u, v], R) = msr(u, v, R) = r$ iff $msr(d, R) = r$, $u \le d \le v$. When $R$ is implicit, we write $msr(u, v)$ and $msr([u, v])$ in place of $msr(u, v, R)$ and $msr([u, v], R)$. In most-specific-range routing, the next hop for packets destined for $d$ is given by the next-hop information associated with $msr(d)$.

When $R = \{[2, 4], [1, 6]\}$, $ranges(3) = \{[2, 4], [1, 6]\}$, $msr(3) = [2, 4]$, $msr(1) = [1, 6]$, $msr(7) = \emptyset$, and $msr(5, 6) = [1, 6]$. When $R = \{[4, 14], [6, 20], [6, 14], [8, 12]\}$, $msr(4, 5) = [4, 14]$, $msr(6, 7) = [6, 14]$, $msr(8, 12) = [8, 12]$, $msr(13, 14) = [6, 14]$, and $msr(15, 20) = [6, 20]$.
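The definition of msr can be checked against these examples with a direct, quadratic-time sketch. Ranges are tuples `(start, finish)`, `None` stands for the case where no most-specific range exists, and the function names are assumptions of this example.

```python
def ranges_matching(d, R):
    """ranges(d, R): the ranges of R that cover address d."""
    return [r for r in R if r[0] <= d <= r[1]]


def msr(d, R):
    """The range matching d that is contained in every other range matching d, else None."""
    cand = ranges_matching(d, R)
    for r in cand:
        if all(s[0] <= r[0] and r[1] <= s[1] for s in cand):
            return r
    return None


def msr_interval(u, v, R):
    """msr(u, v, R): the common msr of every address in [u, v], or None if they differ."""
    found = {msr(d, R) for d in range(u, v + 1)}
    return found.pop() if len(found) == 1 else None


R1 = [(2, 4), (1, 6)]
R2 = [(4, 14), (6, 20), (6, 14), (8, 12)]
print(msr(3, R1), msr(1, R1), msr(7, R1), msr_interval(5, 6, R1))
print(msr_interval(4, 5, R2), msr_interval(13, 14, R2), msr_interval(15, 20, R2))
```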

Definition 6 The range set $R$ has a conflict iff there exists a destination address $d$ for which $ranges(d) \ne \emptyset \land msr(d) = \emptyset$. $R$ is conflict free iff it has no conflict. The predicate $conflictFree(R)$ is true iff $R$ is a conflict-free range set.

$conflictFree(\{[2, 8], [4, 12], [4, 8]\})$ is true while $conflictFree(\{[2, 8], [4, 12]\})$ is false.

We note that our definition of conflict free is a natural extension to ranges of

the definition of conflict free given by Hari et al. [36] for the case of two-dimensional

prefix rules.
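Definition 6 can be tested by brute force on small range sets: a conflict exists exactly when some covered address has no most-specific range. The sketch below is only a check of the definition, not the conflict test used by the data structures of this chapter; the names `msr` and `conflict_free` are assumptions of this example.

```python
def msr(d, R):
    """Most-specific range of R matching d, or None when none exists."""
    cand = [r for r in R if r[0] <= d <= r[1]]
    for r in cand:
        if all(s[0] <= r[0] and r[1] <= s[1] for s in cand):
            return r
    return None


def conflict_free(R):
    """True iff every address covered by some range of R has a most-specific range."""
    covered = {d for (u, v) in R for d in range(u, v + 1)}
    return all(msr(d, R) is not None for d in covered)


print(conflict_free([(2, 8), (4, 12), (4, 8)]))
print(conflict_free([(2, 8), (4, 12)]))
```

In the second set, addresses 4 through 8 are covered by two intersecting ranges and neither contains the other, so no most-specific range exists there; adding [4, 8] resolves the conflict.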

Definition 7 Let $r$ and $s$ be two intersecting ranges of the range set $R$. The subset $Q \subseteq R$ is a resolving subset for these two ranges iff $Q$ is conflict free and $\Pi(Q) = overlap(r, s)$. Two ranges of a range set are in conflict iff they intersect and have no resolving subset. Two ranges are conflict free iff they are not in conflict.

Lemma 3 A range set is conflict free iff it has no pair of ranges that are in conflict.

Proof Follows from the definition of a conflict-free range set. ∎

Lemma 4 Let $R$ be a conflict-free range set. Let $r$ be an arbitrary range. Let $A$ be the subset of $R$ that comprises all ranges of $R$ that are contained in $r$. $A$ is conflict free.

Proof Since $R$ is conflict free, every pair $(s, t)$ of intersecting ranges in $A$ has a resolving subset $B$ in $R$. From Definition 7, it follows that every range in $B$ is contained in $overlap(s, t)$. Hence, $B \subseteq A$. Therefore, every pair of intersecting ranges of $A$ has a resolving subset in $A$. So, $A$ is conflict free. ∎

Lemma 5 Let $R$ be a conflict-free range set. Let $A$, $A \subseteq R$, be such that $\Pi(A) = r$.

1. $\exists B \subseteq R[conflictFree(A \cup B) \land \Pi(A) = \Pi(A \cup B)]$

2. Let $s \in R$ be such that $intersect(r, s)$. $\exists B \subseteq R[\Pi(B) = overlap(r, s)]$

3. $R \cup \{r\}$ is conflict free.

Proof

1. Follows from Lemma 4.

2. When $r \in R$, (2) follows from the definition of a conflict-free range set. So, assume $r \notin R$. Let $C$ comprise all ranges of $A$ contained in $s$. If $s$ intersects no range of $A$, $\Pi(C) = overlap(r, s)$. If $s$ intersects at least one range of $A$, then let $t \in A$ be an intersecting range with maximum overlap. Since $R$ is conflict free, $\exists D \subseteq R[\Pi(D) = overlap(t, s)]$. We see that $\Pi(C \cup D) = overlap(r, s)$.

3. From parts (1) and (2) of this lemma, it follows that there is a resolving subset in $R \cup \{r\}$ for every $s \in R$ that intersects with $r$. Hence, $R \cup \{r\}$ is conflict free. ∎

Definition 8

$maxP(u, v, R) = \max\{finish(\Pi(A)) \mid A \subseteq R \land range(\Pi(A)) \land start(\Pi(A)) = u \land finish(\Pi(A)) \le v\}$ is the maximum possible projection that is a range that starts at $u$ and finishes by $v$.

$minP(u, v, R) = \min\{start(\Pi(A)) \mid A \subseteq R \land range(\Pi(A)) \land finish(\Pi(A)) = v \land start(\Pi(A)) \ge u\}$ is the minimum possible projection that is a range that finishes at $v$ and starts by $u$.

When $\nexists A \subseteq R[range(\Pi(A)) \land start(\Pi(A)) = u \land finish(\Pi(A)) \le v]$, we say that $maxP(u, v, R)$ does not exist. Similarly, $minP(u, v, R)$ may not exist. At times, we use $maxP$ and $minP$ as abbreviations for $maxP(u, v, R)$ and $minP(u, v, R)$, respectively.

$maxY(u, v, R) = \max\{y \mid [x, y] \in R \land x < u \le y < v\}$ and $minX(u, v, R) = \min\{x \mid [x, y] \in R \land u < x \le v < y\}$. Note that $maxY$ and $minX$ may not exist.
Lemma 6 Let $R$ be a conflict-free range set. Let $A = R \cup \{r\}$, where $r = [u, v] \notin R$.

$conflictFree(A) \Leftrightarrow maxY(u, v, R) \le maxP(u, v, R) \land minX(u, v, R) \ge minP(u, v, R)$

where $maxY \le maxP$ ($minX \ge minP$) is true whenever $maxY$ ($minX$) does not exist and is false when $maxY$ ($minX$) exists but $maxP$ ($minP$) does not.

Proof ($\Rightarrow$) Assume that $A$ is conflict free. When neither $maxY$ nor $minX$ exist (this happens iff no range of $R$ intersects $r = [u, v]$), $maxY \le maxP \land minX \ge minP$. When $maxY$ exists, there is an $s = [x, maxY] \in A$ with $x < u \le maxY < v$. (Note that $intersect(r, s)$.) Since $A$ is conflict free, $A$ has a (resolving) subset $B$ for which $\Pi(B) = overlap(r, s) = [u, maxY]$. Therefore, $maxY \le maxP$. Similarly, when $minX$ exists, $minX \ge minP$.

($\Leftarrow$) Assume $maxY(u, v, R) \le maxP(u, v, R) \land minX(u, v, R) \ge minP(u, v, R)$. When neither $maxY$ nor $minX$ exist, no range of $R$ intersects $r$. When $maxY$ exists, $\exists s = [x, y] \in A[x < u \le y < v]$. Consider any such $s = [x, y]$. Since $maxY \le maxP$ and $maxY$ exists, $maxP$ exists. Hence, $\exists B \subseteq R[conflictFree(B) \land \Pi(B) = [u, maxP]]$. When $y = maxP$, $B$ is a resolving subset for $s$ and $r$ in $A$. When $y < maxP$, $intersect(s, [u, maxP])$. Since $R \cup \{[u, maxP]\}$ is conflict free (Lemma 5(3)), $R \cup \{[u, maxP]\}$ (and so also $R$ and $A$) has a resolving subset for $s$ and $[u, maxP]$. This resolving subset is also a resolving subset for $s$ and $r$.

When $minX$ exists, $\exists s = [x, y] \in A[u < x \le v < y]$. In a manner analogous to the proof for the case $maxY$ exists, we may show that $A$ has a resolving subset for $r$ and each such $s$. Hence, in all cases, intersecting ranges of $A$ have a resolving subset. So, $A$ is conflict free. ∎

Lemma 7 Let $R$ be a conflict-free range set. Let $A = R - \{r\}$ for some $r \in R$.

$\nexists B \subseteq A[\Pi(B) = r] \land \nexists s \in A[r \subseteq s] \Rightarrow \nexists C \subseteq A[r \subseteq \Pi(C)]$

Proof Assume

$\nexists B \subseteq A[\Pi(B) = r]$ (2.1)

and

$\nexists s \in A[r \subseteq s]$ (2.2)

We need to show that $\nexists C \subseteq A[r \subseteq \Pi(C)]$.

Suppose that there is a $C$ such that $C \subseteq A \land r \subseteq \Pi(C)$. From $C \subseteq A$ and Equation 2.2, it follows that

$\forall t \in C[disjoint(r, t) \lor intersect(r, t) \lor t \subset r]$ (2.3)

If $\nexists t \in C[intersect(r, t)]$, then from Equation 2.3, we get $\forall t \in C[disjoint(r, t) \lor t \subset r]$. From this and $r \subseteq \Pi(C)$, it follows that all destination addresses $d$, $d \in r$, are covered by ranges of $C$ that are contained in $r$. Therefore, $\exists B \subseteq C \subseteq A(\Pi(B) = r)$. This contradicts Equation 2.1.

Next, suppose $\exists t \in C[intersect(r, t)]$. Let $D$ be the union of the resolving subsets for all of these $t$ and $r$ in $R$. Clearly, all ranges in $D$ are contained in $r$. Further, let $E$ be the subset of all ranges in $C$ that are contained in $r$. It is easy to see that $D \cup E \subseteq A \land \Pi(D \cup E) = r$. This contradicts Equation 2.1. ∎

Lemma 8 Let $R$ be a conflict-free range set. Let $A = R - \{r\}$, for some $r \in R$.

1. $\exists B \subseteq A[\Pi(B) = r] \Rightarrow conflictFree(A)$.

2. $\nexists B \subseteq A[\Pi(B) = r] \Rightarrow [conflictFree(A) \Leftrightarrow \nexists s \in A[r \subseteq s] \lor [m, n] \in A]$, where $m = \max\{start(s) \mid s \in A \land r \subseteq s\}$, and $n = \min\{finish(s) \mid s \in A \land r \subseteq s\}$.

Proof For (1), we note that by replacing $r$ by $B$ in every resolving subset for intersecting ranges in $R$, we get resolving subsets that do not include $r$. Hence all of these resolving subsets are present in $A$. So, $A$ is conflict free.

For (2), assume that $\nexists B \subseteq A[\Pi(B) = r]$.

($\Rightarrow$) Assume that $A$ is conflict free. We need to prove

$\nexists s \in A[r \subseteq s] \lor [m, n] \in A$ (2.4)

We do this by contradiction. So, assume

$\exists s \in A[r \subseteq s] \land [m, n] \notin A$ (2.5)

Since $\exists s \in A[r \subseteq s]$, $m$ and $n$ are well defined. Equation 2.5 implies that $A$ has a range $[m, y]$, $y > n$, as well as a range $[x, n]$, $x < m$. Further, $intersect([m, y], [x, n])$ and $r \subseteq overlap([m, y], [x, n]) = [m, n]$. Let $B$ be the subset of $R$ comprised of all ranges contained in $[m, n]$. From Lemma 4, it follows that $B$ is conflict free. However, $r$ is the projection of no subset of $C = B - \{r\}$. Further, no range of $C$ contains $r$. From Lemma 7, it follows that no subset of $C$ has a projection that contains $r$. In particular, $C$ has no subset whose projection is $[m, n]$. Therefore, $A$ has no subset whose projection is $[m, n]$. So, $A$ has no resolving subset for $[m, y]$ and $[x, n]$. Therefore, $A$ is not conflict free, a contradiction.

($\Leftarrow$) If no range of $A$ contains $r$, then $r$ is not part of the resolving subset for any pair of intersecting ranges of $R$. This, together with the fact that $R$ is conflict free, implies that $A$ is conflict free. If $[m, n] \in A$, we can use $[m, n]$ in place of $r$ in any resolving subset for intersecting ranges of $R$. Therefore, $A$ has a resolving subset for every pair of intersecting ranges. So, $A$ is conflict free. ∎

Lemma 9 Let $R$ be a conflict-free range set and let $d$ be a destination address. If $ranges(d) \ne \emptyset$, then $start(msr(d)) = a = maxStart(ranges(d)) = \max\{start(r) \mid r \in ranges(d)\}$ and $finish(msr(d)) = b = minFinish(ranges(d)) = \min\{finish(r) \mid r \in ranges(d)\}$.

Proof Since $R$ is conflict free and $ranges(d) \ne \emptyset$, $msr(d) \ne \emptyset$. Assume that $msr(d) = s$. If $s \ne [a, b]$, then $start(s) < a$ or $finish(s) > b$. Assume that $start(s) < a$ (the case $finish(s) > b$ is similar). Let $t \in ranges(d)$ be such that $start(t) = a$. Now, $intersect(s, t) \lor t \subset s$. Hence, $s \ne msr(d)$. ∎
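Lemma 9 suggests a simple way to compute msr(d) for a conflict-free range set: take the largest start point and the smallest finish point among the matching ranges. The sketch below assumes, without checking, that its input is conflict free; the name `msr_by_lemma9` is hypothetical.

```python
def msr_by_lemma9(d, R):
    """msr(d) for a conflict-free R as (maxStart, minFinish) of the ranges matching d."""
    cand = [r for r in R if r[0] <= d <= r[1]]
    if not cand:
        return None
    return (max(r[0] for r in cand), min(r[1] for r in cand))


R = [(4, 14), (6, 20), (6, 14), (8, 12)]   # the conflict-free example set used earlier
for d in (5, 7, 9, 13, 16, 21):
    print(d, msr_by_lemma9(d, R))
```

For a range set that has a conflict, the pair returned need not be a member of R, so the shortcut is valid only under the lemma's hypothesis.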

2.1.4 Normalized Ranges

Definition 9 [Normalized Ranges] The range set $R$ is normalized iff one of the following is true.

1. $|R| \le 1$.

2. $|R| > 1$ and for every $r \in R$ and every $s \in R$, $r \ne s$, one of the following is true.

(a) $disjoint(r, s)$.

(b) $nested(r, s) \land start(r) \ne start(s) \land finish(r) \ne finish(s)$. That is, $r$ and $s$ are nested and do not have a common end-point.


Figure 2-3: Unnormalized and normalized range sets. (A) Unnormalized. (B) Normalized.

Figure 2-3(A) shows a range set that is not normalized (it contains ranges that

intersect as well as nested ranges that have common end-points). Figure 2-3(B) shows

a normalized range set. Regardless of which of these two range sets is used, every

destination d has the same most-specific range.
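Definition 9 can be expressed as a pairwise check. The following sketch uses tuples `(start, finish)` for ranges; the function name `is_normalized` and the sample sets are assumptions of this example, not the contents of Figure 2-3.

```python
def is_normalized(R):
    """True iff every pair of distinct ranges is disjoint or nested with no shared end-point."""
    R = list(R)
    for i, r in enumerate(R):
        for s in R[i + 1:]:
            disjoint = r[1] < s[0] or s[1] < r[0]
            # Nested without a common end-point: one lies strictly inside the other.
            strictly_inside = (r[0] < s[0] and s[1] < r[1]) or (s[0] < r[0] and r[1] < s[1])
            if not (disjoint or strictly_inside):
                return False
    return True


print(is_normalized([(2, 8), (4, 12), (4, 8)]))   # intersecting ranges and a shared end-point
print(is_normalized([(2, 3), (9, 12), (4, 8)]))   # pairwise disjoint
print(is_normalized([(1, 10), (2, 4), (6, 8)]))   # nested with no shared end-point
```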

Definition 10 An ordered sequence of ranges $(r_1, \ldots, r_n)$ is a chain iff $\forall i < n[start(r_{i+1}) = finish(r_i) + 1]$. A range set $R$ is a chain iff its ranges can be ordered so as to form a chain. $chain(R)$ is a predicate that is true iff $R$ is a chain.

The range sequence $([2, 4], [5, 7], [8, 12])$ is a chain while $([5, 8], [12, 14])$ and $([5, 8], [2, 4])$ are not. The range sets $\{[5, 8], [2, 4]\}$ and $\{[2, 4], [8, 12], [5, 7]\}$ are chains while $\{[2, 4], [8, 12]\}$ and $\{[2, 4], [5, 7], [8, 12], [9, 10]\}$ are not. Note that when $R$ is a chain, $\Pi(R) = [minStart(R), maxFinish(R)]$.
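The chain predicate can be sketched as below, reading the definition as requiring each range to start immediately after the previous one finishes, which is what the examples in the text imply. The function names are assumptions; a set is tested by sorting its ranges by start point, the only order in which a chain could be formed.

```python
def is_chain_sequence(seq):
    """True iff each range starts exactly one address after the previous range finishes."""
    return all(seq[i + 1][0] == seq[i][1] + 1 for i in range(len(seq) - 1))


def is_chain_set(R):
    """True iff the ranges of R can be ordered to form a chain."""
    return is_chain_sequence(sorted(R))


print(is_chain_sequence([(2, 4), (5, 7), (8, 12)]))
print(is_chain_sequence([(5, 8), (2, 4)]), is_chain_set({(5, 8), (2, 4)}))
print(is_chain_set({(2, 4), (5, 7), (8, 12), (9, 10)}))
```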

Lemma 10 Let $N$ be a normalized range set.

$A \subseteq N \land \Pi(A) = [u, v] \Rightarrow \exists B \subseteq N[chain(B) \land \Pi(B) = [u, v]]$

Proof Let $B$ be the subset of $A$ obtained by removing from $A$ all ranges that are nested within at least one other range of $A$. Clearly, $\Pi(B) = \Pi(A) = [u, v]$. Since $N$ is normalized and $B \subseteq N$, $B$ is also normalized. From Definition 9 and the fact that $B$ has no pair of nested ranges, it follows that all ranges of $B$ are disjoint. For disjoint ranges to have a projection that is a range, the disjoint ranges must form a chain. ∎

Lemma 11 Let $N$ be a normalized range set.

1. $N$ may be uniquely partitioned into a set of longest chains $CP(N) = \{C_1, \ldots, C_k\}$, $N = \bigcup_{1 \le i \le k} C_i$, such that no two chains of $CP$ may be combined into a single chain. $CP(N)$ is called a canonical partitioning.

2. For all $i$ and $j$, $1 \le i < j \le k$, $C_i$ and $C_j$ are either disjoint, or $C_i$ is properly contained within a range of $C_j$, or $C_j$ is properly contained within a range of $C_i$. A chain $C_i$ is properly contained within the range $r$ iff $\Pi(C_i) \subset r$ and $C_i$ and $r$ share no end point.

Proof Direct consequence of the definition of a normalized set and that of a chain. ∎

Figure 2-4 shows a normalized range set and its canonical partitioning into three chains.

Figure 2-4: Partitioning a normalized range set into chains

Next we state a chopping rule that we use to transform every conflict-free range set $R$ into an equivalent normalized range set $norm(R)$. By equivalent, we mean that for every destination $d$, the most-specific matching-range is the same in $R$ as it is in $norm(R)$.

Definition 11 [Chopping Rule] Let $r = [u, v] \in R$, where $R$ is a range set. $chop(r, R)$ (or more simply $chop(r)$ when $R$ is implicit) is as defined below.

1. If neither $maxP(u, v - 1, R)$ nor $minP(u + 1, v, R)$ exists, $chop(r) = r$.

2. If only $maxP(u, v - 1, R)$ exists, $chop(r) = [maxP(u, v - 1, R) + 1, finish(r)]$.

3. If only $minP(u + 1, v, R)$ exists, $chop(r) = [start(r), minP(u + 1, v, R) - 1]$.

4. If both $maxP(u, v - 1, R)$ and $minP(u + 1, v, R)$ exist and $maxP(u, v - 1, R) + 1 \le minP(u + 1, v, R) - 1$, $chop(r) = [maxP(u, v - 1, R) + 1, minP(u + 1, v, R) - 1]$.

5. If both $maxP(u, v - 1, R)$ and $minP(u + 1, v, R)$ exist and $maxP(u, v - 1, R) + 1 > minP(u + 1, v, R) - 1$, $chop(r) = \emptyset$, where $\emptyset$ denotes the null range. The null range neither intersects nor is contained in any other range.

Define $norm(R) = \{chop(r) \mid r \in R \land chop(r) \ne \emptyset\}$.
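The chopping rule can be prototyped directly from Definitions 8 and 11. The sketch below computes maxP and minP greedily, by repeatedly extending a projection with any eligible range that touches or overlaps it; this is a brute-force illustration, not the algorithm used by the data structures of this chapter. Ranges are tuples `(start, finish)`, `None` stands for "does not exist" and for the null range, and all function names are assumptions of this example.

```python
def max_p(u, v, R):
    """maxP(u, v, R): largest finish of a contiguous projection that starts at u and ends by v."""
    cand = [r for r in R if r[0] >= u and r[1] <= v]
    at_u = [b for (a, b) in cand if a == u]
    if not at_u:
        return None
    reach, changed = max(at_u), True
    while changed:                      # extend rightward while some range touches the projection
        changed = False
        for a, b in cand:
            if a <= reach + 1 and b > reach:
                reach, changed = b, True
    return reach


def min_p(u, v, R):
    """minP(u, v, R): smallest start of a contiguous projection that ends at v and starts by u."""
    cand = [r for r in R if r[0] >= u and r[1] <= v]
    at_v = [a for (a, b) in cand if b == v]
    if not at_v:
        return None
    reach, changed = min(at_v), True
    while changed:                      # extend leftward symmetrically
        changed = False
        for a, b in cand:
            if b >= reach - 1 and a < reach:
                reach, changed = a, True
    return reach


def chop(r, R):
    """chop(r, R) per the five cases of the chopping rule; None is the null range."""
    u, v = r
    p, q = max_p(u, v - 1, R), min_p(u + 1, v, R)
    lo = u if p is None else p + 1
    hi = v if q is None else q - 1
    return (lo, hi) if lo <= hi else None


def norm(R):
    return {c for c in (chop(r, R) for r in R) if c is not None}


print(sorted(norm([(2, 8), (4, 12), (4, 8)])))
print(sorted(norm([(4, 14), (6, 20), (6, 14), (8, 12)])))
```

The single expression for `lo` and `hi` covers cases 1 through 4 of the rule, and the final comparison yields the null range of case 5. In both sample sets the chopped ranges are normalized and give each destination the same most-specific range (via the original, unchopped range) as the input set.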

Lemma 12 Let $R$ be a conflict-free range set.

$\forall r \in R\ \forall s \in R[s \subset r \Rightarrow [s \subset chop(r) \land start(s) \ne start(chop(r)) \land finish(s) \ne finish(chop(r))] \lor disjoint(s, chop(r))]$

Proof The lemma is trivially true when $chop(r) = \emptyset$ ($disjoint(s, \emptyset)$ is true). So, assume that $chop(r) = r'$. For the lemma to be false, either $intersect(s, r')$ or ($r' \subseteq s$ or $s$ and $r'$ have a common end point).

If $intersect(s, r')$, then either $start(r') < start(s) \le finish(r') < finish(s)$ or $start(s) < start(r') \le finish(s) < finish(r')$. Assume the former (the latter case is similar). From the chopping rule, it follows that $\exists A \subseteq R[\Pi(A) = [finish(r') + 1, finish(r)]]$. Therefore, $A \cup \{s\} \subseteq R \land \Pi(A \cup \{s\}) = [start(s), finish(r)]$. From this, $start(r) \le start(r') < start(s)$, and the chopping rule, we get $finish(chop(r)) < start(s)$. But, $start(s) \le finish(r')$, a contradiction.

So, $r' \subseteq s$ or $s$ and $r'$ have a common end point. First consider the case $r' \subseteq s \subset r$. Suppose that $start(s) \ne start(r)$ (the case $finish(s) \ne finish(r)$ is similar). Since $r' = chop(r)$, $\exists A \subseteq R[\Pi(A) = [finish(r') + 1, finish(r)]]$. Therefore, $\Pi(A \cup \{s\}) = [start(s), finish(r)]$ and $start(r) < start(s) \le start(r')$. From the chopping rule, it follows that $finish(chop(r)) < start(s) \le start(r') \le finish(r')$, a contradiction.

Therefore, $s \subset r'$. If $start(s) = start(r')$, $maxP(start(r), finish(r) - 1) \ge finish(s)$. So, $start(r') > finish(s)$, which contradicts $s \subset r'$. The case $finish(s) = finish(r')$ is similar. ∎

Lemma 13 Let $r$ and $s$ be two intersecting ranges of a conflict-free range set $R$.

$disjoint(chop(r), overlap(r, s)) \land disjoint(chop(s), overlap(r, s)) \land disjoint(chop(r), chop(s))$

Proof Without loss of generality, we may assume that $start(r) < start(s) \le finish(r) < finish(s)$. Since $R$ is conflict free, $\exists A[A \subseteq R \land \Pi(A) = overlap(r, s)]$. Therefore, $finish(chop(r)) < start(s)$ and $start(chop(s)) > finish(r)$. This proves the lemma. ∎

Lemma 14 Let $R$ be a conflict-free range set. For every $r' \in norm(R)$ there is a unique $r \in R$ such that $chop(r) = r'$.

Proof Let $r'$ be any range in $norm(R)$. Clearly, for every $r' \in norm(R)$, there is at least one $r \in R$ such that $chop(r) = r'$. Suppose two different ranges $r$ and $s$ of $R$ have $r' = chop(r) = chop(s)$.

If $intersect(r, s)$, then from Lemma 13 we get $disjoint(chop(r), chop(s))$. So, $chop(r) \ne chop(s)$.

If $nested(r, s)$, then from Lemma 12 it follows that $s \subset chop(r) \lor disjoint(s, chop(r))$ when $s \subset r$ and $r \subset chop(s) \lor disjoint(r, chop(s))$ when $r \subset s$. Consider the former case (the latter case is similar). $s \subset chop(r)$ implies $chop(s) \ne chop(r)$. $disjoint(s, chop(r))$ also implies $chop(s) \ne chop(r)$.

The final case is $disjoint(r, s)$. In this case, clearly, $chop(s) \ne chop(r)$. ∎

For $r' \in norm(R)$, define $full(r') = chop^{-1}(r') = r$, where $r$ is the unique range in $R$ for which $chop(r) = r'$. Notice that $full(chop(r)) = r$ except when $chop(r) = \emptyset$.

Lemma 15 For every conflict-free range set $R$, $norm(R)$ is a normalized conflict-free range set.

Proof We shall show that $norm(R)$ is normalized. Since a normalized range set has no intersecting ranges, every normalized range set is conflict free.

If $|norm(R)| \le 1$, $norm(R)$ is normalized. So, assume that $|norm(R)| > 1$. Let $r'$ and $s'$ be two different ranges in $norm(R)$. We need to show that $r'$ and $s'$ satisfy property 2(a) or 2(b) of Definition 9. Let $r = [u, v] = full(r')$ and $s = full(s')$. There are three possible cases for $r$ and $s$: they either intersect, are nested, or are disjoint (Lemma 1).

Case 1: $intersect(r, s)$. From Lemma 13, it follows that $r'$ and $s'$ are disjoint.

Case 2: $nested(r, s)$. Either $s \subset r$ or $r \subset s$. Assume the former (the latter case is similar). From Lemma 12, we get $[s \subset chop(r) \land start(s) \ne start(chop(r)) \land finish(s) \ne finish(chop(r))] \lor disjoint(s, chop(r))$. $s \subset chop(r) \land start(s) \ne start(chop(r)) \land finish(s) \ne finish(chop(r))$ implies that $s'$ and $r'$ are nested and do not have a common end-point. $disjoint(s, chop(r))$ implies that $s'$ and $r'$ are disjoint.

Case 3: $disjoint(r, s)$. Clearly, $disjoint(r', s')$. ∎

Lemma 16 Let $r' \in norm(R)$, where $R$ is a conflict-free range set.

$\nexists s' \in norm(R)[s' \subset r'] \Rightarrow r = full(r') = msr(r', R)$

Proof Assume that $\nexists s' \in norm(R)[s' \subset r']$. If $\exists d \in r'[r \ne msr(d, R)]$, then $\exists s \subset r[d \in s]$. From Lemma 12, it follows that $s \subset r' \lor s \cap r' = \emptyset$. Since $d \in s \land d \in r'$, $s \cap r' \ne \emptyset$. Hence, $s \subset r'$. From Lemma 4, it follows that $A = \{t \mid t \in R \land t \subseteq s\} \ne \emptyset$ is conflict free. From the chopping rule it follows that $norm(A) \ne \emptyset$. So, $\exists t' \in norm(A) \subseteq norm(R)[t' \subseteq t = full(t') \subset r']$. This violates the assumption of this lemma. Therefore, $\nexists d \in r'[r \ne msr(d, R)]$. So, $r = msr(r', R)$. ∎

Lemma 17 Let $R$ be a conflict-free range set, let $x$ be the start point of some range in $R$, and let $y$ be the finish point of some range in $R$.

1. Let $s \in R$ be such that $start(s) = x$ and $finish(s) = \min\{finish(t) \mid t \in R \land start(t) = x\}$.

(a) $chop(s) \ne \emptyset$.

(b) $start(chop(s)) = x$.

(c) $chop(s)$ is the only range in $norm(R)$ that starts at $x$.

2. Let $s \in R$ be such that $finish(s) = y$ and $start(s) = \max\{start(t) \mid t \in R \land finish(t) = y\}$.

(a) $chop(s) \ne \emptyset$.

(b) $finish(chop(s)) = y$.

(c) $chop(s)$ is the only range in $norm(R)$ that finishes at $y$.

Proof We prove 1(a)-(c); 2(a)-(c) are similar. Since $maxP(start(s), finish(s) - 1, R)$ does not exist, case 5 of the chopping rule does not apply and $chop(s) \ne \emptyset$. One of the cases 1 and 3 applies. In both of these cases, $start(chop(s)) = x$. For 1(c), we note that the definition of a normalized set (Definition 9) implies that no two ranges of $norm(R)$ share an end point. In particular, $norm(R)$ can have only one range that has $x$ as an end point. ∎

Lemma 18 Let $r' \in norm(R)$, where $R$ is a conflict-free range set.

$start(full(r')) \ne start(r') \Rightarrow \nexists s \in R[start(s) = start(r')]$

Proof Suppose that $start(full(r')) \ne start(r')$ and $\exists s \in R[start(s) = start(r')]$. From Lemma 17(1a and 1b), it follows that $\exists t \in R[start(t) = start(r') \land chop(t) \ne \emptyset \land start(chop(t)) = start(r')]$. Therefore, $norm(R)$ has at least two ranges ($r'$ and $chop(t)$) that start at $start(r')$. This contradicts Lemma 17(1c). ∎

Lemma 19 Let $R$ be a conflict-free range set. Let $r \in R$ be such that $r = msr(u, v, R)$ for some range $[u, v]$. $r' = chop(r) = msr(u, v, norm(R))$.

Proof From the definition of $msr$, it follows that there is no $s \in R$ such that $s \subset r \land s \cap [u, v] \ne \emptyset$. Therefore, $[u, v] \subseteq chop(r)$. Further, from Lemmas 12 and 13, it follows that $norm(R)$ contains no $s' \subset chop(r)$. So, $r' = msr(u, v, norm(R))$. ∎

Lemma 20 Let $R$ be a conflict-free range set that has a subset whose projection equals $[x, y]$. Let $A \subseteq R$ comprise all $r \in R$ such that $r \subseteq [x, y]$.

1. $\exists B \subseteq norm(R)[\Pi(B) = [x, y]]$

2. $C = \{full(r') \mid r' \in norm(R) \land r' \subseteq [x, y]\} \subseteq A$

Proof

1. From Lemma 4, it follows that $A$ is conflict free. Further, since $R$ has a subset whose projection equals $[x, y]$, $\Pi(A) = [x, y]$. From Lemma 19, it follows that every $d \in [x, y]$ has a most-specific range in $norm(A)$. Therefore, $\Pi(norm(A)) = [x, y]$. From the definition of the chopping rule and that of $A$, we see that $\forall r \in A[chop(r, A) = chop(r, R)]$. So, $norm(A) \subseteq norm(R)$.

2. First, assume that $[x, y] \in R$. Suppose there is a range $r' \in norm(R)$ such that $r' \subseteq [x, y]$ and $r = full(r') \notin A$. There are three cases for $r$.

Case 1: $disjoint(r, [x, y])$. In this case, $disjoint(r', [x, y])$ and so $r' \not\subseteq [x, y]$.

Case 2: $intersect(r, [x, y])$. From Lemma 13, we get $disjoint(chop(r), [x, y])$. So $r' \not\subseteq [x, y]$.

Case 3: $[x, y] \subset r$. From Lemma 12 and $r' \subseteq [x, y]$, we get $disjoint([x, y], chop(r))$. So $r' \not\subseteq [x, y]$.

When $[x, y] \notin R$, let $R' = R \cup \{[x, y]\}$, $C' = \{full(r') \mid r' \in norm(R') \land r' \subseteq [x, y]\}$ and $A' = A \cup \{[x, y]\}$. Using the lemma case we have already proved, we get $C' \subseteq A'$. Since $chop([x, y], R') = \emptyset$ and $chop(s, R) = chop(s, R')$ for every $s \in R$, $norm(R') = norm(R)$. Therefore, $C = C'$. So, $C \subseteq A'$. Finally, since $[x, y] \notin C$, $C \subseteq A$. ∎

Lemma 21 Let $R$ be a conflict-free range set. Let $r \notin R$ be such that $R \cup \{r\}$ is conflict free.

1. $chop(r, R \cup \{r\}) = \emptyset \Rightarrow \forall t \in R[chop(t, R) = chop(t, R \cup \{r\})]$.

2. Let $s$ be the smallest range of $R$ that contains $r$. Assume that $s$ exists and that $chop(r, R \cup \{r\}) \ne \emptyset$.

(a) $\forall t \in R - \{s\}[chop(t, R) = chop(t, R \cup \{r\})]$.

(b) $chop(s, R) \ne chop(s, R \cup \{r\}) \Rightarrow (x' = u' \land y' = v') \lor (x' = u' \land y' > v) \lor (x' < u \land y' = v')$, where $r = [u, v]$, $chop(r, R \cup \{r\}) = chop(r, R) = [u', v']$, and $chop(s, R) = [x', y']$.

Proof For (1), note that $chop(r, R \cup \{r\}) = \emptyset \Rightarrow \exists A \subseteq R[\Pi(A) = r]$. Therefore, the addition of $r$ to $R$ does not affect any of the $maxP$ and $minP$ values.

For (2a), suppose there are two different ranges $g$ and $h$ in $R$ such that $chop(g, R) \ne chop(g, R \cup \{r\})$ and $chop(h, R) \ne chop(h, R \cup \{r\})$. From the chopping rule, it follows that

$r \subset g \land r \subset h$ (2.6)

Therefore, $\neg disjoint(g, h)$. From this and Lemma 1, we get $intersect(g, h) \lor nested(g, h)$. Equation 2.6 and $intersect(g, h)$ imply $r \subseteq overlap(g, h)$. From this and Lemma 13, we get $disjoint(r, chop(g, R)) \land disjoint(r, chop(h, R))$. Therefore, $chop(g, R) = chop(g, R \cup \{r\})$ and $chop(h, R) = chop(h, R \cup \{r\})$, a contradiction. So, $\neg intersect(g, h)$.

If $nested(g, h)$, we may assume, without loss of generality, that $g \subset h$. This and Equation 2.6 yield $r \subset g \subset h$. Therefore, $maxP(x, y - 1, R) = maxP(x, y - 1, R \cup \{r\})$ and $minP(x + 1, y, R) = minP(x + 1, y, R \cup \{r\})$, where $h = [x, y]$. So, $chop(h, R) = chop(h, R \cup \{r\})$, a contradiction.

Hence, there can be at most one range of $R$ whose chop() value changes as a result of the addition of $r$. The preceding proof for the case $nested(g, h)$ also establishes that the chop() value may change only for the range $s$, that is, for the smallest enclosing range of $r$ (i.e., smallest $s \in R[r \subset s]$).

For (2b), assume that $chop(s, R) \ne chop(s, R \cup \{r\})$. This implies that $chop(s, R) \ne \emptyset$ and so $x'$ and $y'$ are well defined. (Note that from part (1), we get $chop(r, R) \ne \emptyset$.) We consider each of the three cases for the relationship between $r$ and $chop(s, R)$ (Lemma 1).

Case 1: $disjoint(r, chop(s, R))$. This case cannot arise, because then $chop(s, R) = chop(s, R \cup \{r\})$.

Case 2: $intersect(r, chop(s, R))$. Now, either $x' < u \le y' < v$ or $u < x' \le v < y'$. Consider the former case. Since $r \subset s$, $v \le y$. When $v = y$, $minP(u + 1, v, R \cup \{r\}) = minP(x + 1, y, R) = y' + 1$. So, $v' = y'$. Therefore, $x' < u \land y' = v'$. Consider the case $v < y$. From the chopping rule, it follows that $\exists A \subseteq R \subseteq R \cup \{r\}[\Pi(A) = [y' + 1, y]]$. From this, Lemma 5(2), and the fact that $R \cup \{r\}$ is conflict free, we conclude $\exists B \subseteq R \cup \{r\}[\Pi(B) = overlap(r, [y' + 1, y]) = [y' + 1, v]]$. From this and $minP(x + 1, y, R) = y' + 1$, we get $minP(u + 1, v, R \cup \{r\}) = y' + 1$. So, $v' = y'$. Once again, $x' < u \land y' = v'$. Using a similar argument, we may show that when $u < x' \le v < y'$, $x' = u' \land y' > v$.

Case 3: $nested(r, chop(s, R))$. So, either $r \subseteq chop(s, R)$ or $chop(s, R) \subseteq r$. First, consider all possibilities for $r \subseteq chop(s, R)$. The case $x' < u \le v < y'$ cannot arise, because this implies $chop(s, R) = chop(s, R \cup \{r\})$. When $x' = u \le v < y'$, $u' = x'$. So, $x' = u' \land y' > v$. When $x' < u \le v = y'$, $v' = y'$. So, $x' < u \land y' = v'$. The final case is when $x' = u \le v = y'$. Now, $u' = x' \land y' = v'$.

Using an argument similar to that used in part (2a), we may show that when $chop(s, R) \subseteq r$, $x' = u' \land y' = v'$. ∎

Lemma 22 Let $R$, $r = [u, v]$, $s = [x, y]$, $x'$, $y'$, $u'$ and $v'$ be as in Lemma 21. Assume that $s$ exists and $chop(s) \ne \emptyset$.

1. $disjoint(r, chop(s, R)) \lor x' < u \le v < y' \Rightarrow chop(s, R \cup \{r\}) = chop(s, R)$.

2. $x' = u' \land y' = v' \Rightarrow chop(s, R \cup \{r\}) = \emptyset$.

3. Suppose $x' = u' \land y' > v$. If $maxP(v' + 1, y', R)$ doesn't exist, then $chop(s, R \cup \{r\}) = [v + 1, y']$. If it exists, $chop(s, R \cup \{r\}) = [maxP(v' + 1, y', R) + 1, y']$.

4. Suppose $x' < u' \land y' = v'$. If $minP(x', u' - 1, R)$ doesn't exist, then $chop(s, R \cup \{r\}) = [x', u - 1]$. If it exists, $chop(s, R \cup \{r\}) = [x', minP(x', u' - 1, R) - 1]$.

Proof (1) follows from the proof of Lemma 21(2b). For (2), from the proof cases of Lemma 21(2b) that have $x' = u' \land y' = v'$, it follows that case 5 of the chopping rule applies for $s$ in $R \cup \{r\}$. So, $chop(s, R \cup \{r\}) = \emptyset$.

For (3), $finish(chop(s, R \cup \{r\})) = y'$ follows from the proof of Lemma 21(2b). Also, we observe that $maxP(x, y - 1, R \cup \{r\}) \ge v$. So, (3) can be false only when $maxP(x, y - 1, R \cup \{r\}) > v$ and either (a) $maxP(v' + 1, y', R)$ doesn't exist or (b) $maxP(v' + 1, y', R) < maxP(x, y - 1, R \cup \{r\})$. For (a), $\exists [c, d] \in R[x < c < v' \land v < d < y']$. For (b), $\exists [c, d] \in R[x < c < v' \land v < maxP(v' + 1, y', R) < d < y']$. In both cases, $c < u$ implies that $r = [u, v] \subset [c, d] \subset s$. This contradicts the assumption that $s$ is the smallest enclosing range of $r$. Also, in both cases, $c > u$ implies $intersect(r, [c, d])$. So, $R \cup \{r\}$ has a subset whose projection is $[c, v]$. Therefore, $finish(chop(u, v, R \cup \{r\})) < c < v'$, a contradiction.

The proof for (4) is similar to that for (3). ∎

Lemma 23 Let $R$ be a conflict-free range set. Let $r = [u, v] \in R$ be such that $R - \{r\}$
is conflict free.

1. $chop(r, R) = \emptyset \implies \forall t \in R - \{r\}\,[chop(t, R) = chop(t, R - \{r\})]$.

2. Let $s = [x, y]$ be the smallest range of $R - \{r\}$ that contains $r$. Assume that $s$
exists and that $chop(r, R) = [u', v']$.

(a) $\forall t \in R - \{r, s\}\,[chop(t, R) = chop(t, R - \{r\})]$.

(b) $chop(s, R) = \emptyset \implies chop(s, R - \{r\}) = [u', v']$.

(c) $chop(s, R) = [x', y'] \implies chop(s, R - \{r\}) = [\min\{x', u'\}, \max\{y', v'\}]$.

Proof For (1), note that $chop(r, R) = \emptyset \implies \exists A \subseteq R - \{r\}\,[\Pi(A) = r]$. Therefore,
the removal of $r$ from $R$ does not affect any of the maxP and minP values.

For (2a) note that by substituting $R - \{r\}$ for $R$ in Lemma 21(2a), we get
$\forall t \in R - \{r, s\}\,[chop(t, R - \{r\}) = chop(t, R)]$. (2b) and (2c) follow from Lemma 22.


Figure 2-5: An example range set R (A) and its mapping map1(R) into points in 2D (B)

2.1.5 Priority Search Trees And Ranges

A priority-search tree (PST) [37] is a data structure that is used to represent a set
of tuples of the form (key1, key2, data), where $key1 \ge 0$, $key2 \ge 0$, and no two tuples
have the same key1 value. The data structure is simultaneously a min-tree on key2
(i.e., the key2 value in each node of the tree is $\le$ the key2 value in each descendant
node) and a search tree on key1. There are two common PST representations [37]:

1. In a radix priority-search tree (RPST), the underlying tree is a binary radix
tree on key1.

2. In a red-black priority-search tree (RBPST), the underlying tree is a red-black
tree.

McCreight [37] has suggested a PST representation of a collection of ranges with
distinct finish points. This representation uses the following mapping of a range r
into a PST tuple:

$(key1, key2, data) = (finish(r), start(r), data)$    (2.7)

where data is any information (e.g., next hop) associated with the range. Each
range r is, therefore, mapped to a point $map1(r) = (x, y) = (key1, key2) =
(finish(r), start(r))$ in 2-dimensional space. Figure 2-5 shows a set of ranges and
the equivalent set of 2-dimensional points (x, y).

McCreight [37] has observed that when the mapping of Equation 2.7 is used to
obtain a point set P = map1(R) from a range set R, then ranges(d) is given by
the points that lie in the rectangle (including points on the boundary) defined by
$x_{left} = d$, $x_{right} = \infty$, $y_{top} = d$, and $y_{bottom} = 0$. These points are obtained using the
method $enumerateRectangle(x_{left}, x_{right}, y_{top}) = enumerateRectangle(d, \infty, d)$ of a
PST ($y_{bottom}$ is implicit and is always 0).

When an RPST is used to represent the point set P, the complexity of

$enumerateRectangle(x_{left}, x_{right}, y_{top})$

is $O(\log maxX + s)$, where maxX is the largest x value in P and s is the number
of points in the query rectangle. When the point set is represented as an RBPST,
this complexity becomes $O(\log n + s)$, where $n = |P|$. A point (x, y) (and hence a
range [y, x]) may be inserted into or deleted from an RPST (RBPST) in $O(\log maxX)$
($O(\log n)$) time [37].
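
The mapping of Equation 2.7 and the rectangle query for ranges(d) can be sketched as follows. This is only an illustration: a brute-force list scan stands in for the PST, and the function names and the sample range set are hypothetical, not taken from the dissertation.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # (start, finish), both inclusive


def map1(r: Range) -> Tuple[int, int]:
    """Equation 2.7: a range becomes the 2D point (finish, start)."""
    start, finish = r
    return (finish, start)


def enumerate_rectangle(points, x_left, x_right, y_top):
    """Brute-force stand-in for the PST query; y_bottom is implicitly 0."""
    return [(x, y) for (x, y) in points
            if x_left <= x <= x_right and 0 <= y <= y_top]


def ranges_matching(R: List[Range], d: int) -> List[Range]:
    """ranges(d): all ranges with start <= d <= finish."""
    points = [map1(r) for r in R]
    hits = enumerate_rectangle(points, d, float("inf"), d)
    # Convert each point (finish, start) back into a range (start, finish).
    return sorted((y, x) for (x, y) in hits)


# Hypothetical range set with distinct finish points.
R = [(0, 63), (8, 15), (10, 11), (32, 47)]
print(ranges_matching(R, 10))
print(ranges_matching(R, 40))
```

A real PST answers the same query in the time bounds quoted above; the list scan here takes linear time and is meant only to make the geometry of the query concrete.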

2.2 Prefixes

Let R be a set of ranges such that each range represents a prefix. It is well known
(see Sahni et al. [21], for example) that no two ranges of R intersect. Therefore, R
is conflict free. For simplicity, assume that R includes the range that corresponds to
the prefix *. With this assumption, msr(d) is defined for every d. From Lemma 9, it
follows that msr(d) is the range [maxStart(ranges(d)), minFinish(ranges(d))]. To
find this range easily, we first transform P = map1(R) into a point set transform1(P)
so that no two points of transform1(P) have the same x-value. Then, we represent
transform1(P) as a PST.

Definition 12 Let W be the (maximum) number of bits in a destination address
(W = 32 in IPv4). Let $(x, y) \in P$. $transform1(x, y) = (x', y') = (2^W x - y + 2^W - 1, y)$
and $transform1(P) = \{transform1(x, y) \mid (x, y) \in P\}$.

We see that $0 \le x' < 2^{2W}$ for every $(x', y') \in transform1(P)$ and that no two
points in transform1(P) have the same x'-value. Let PST1(P) be the PST for
transform1(P). The operation

$enumerateRectangle(2^W d - d + 2^W - 1, \infty, d)$

performed on PST1 yields ranges(d). To find msr(d), we employ the

$minXinRectangle(x_{left}, x_{right}, y_{top})$

operation, which determines the point in the defined rectangle that has the least
x-value. It is easy to see that

$minXinRectangle(2^W d - d + 2^W - 1, \infty, d)$

performed on PST1 yields msr(d).
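
As a small illustration of Definition 12 and the msr(d) query, the sketch below applies transform1 to a handful of prefix ranges with W = 6 and answers minXinRectangle by a linear scan. The scan replaces the PST, and the helper names and the sample prefixes are assumptions made for the example.

```python
W = 6  # address width used only for this example


def transform1(x, y, w=W):
    """Definition 12: (x, y) -> (2^W * x - y + 2^W - 1, y)."""
    return ((x << w) - y + (1 << w) - 1, y)


def min_x_in_rectangle(points, x_left, x_right, y_top):
    """Linear-scan stand-in for the PST operation of the same name."""
    hits = [p for p in points if x_left <= p[0] <= x_right and p[1] <= y_top]
    return min(hits) if hits else None


def msr(R, d, w=W):
    """Most specific range of R that matches d (R holds prefix ranges)."""
    # map1 sends (start, finish) to (finish, start); transform1 is then applied.
    pts = {transform1(finish, start, w): (start, finish) for (start, finish) in R}
    x_left = (d << w) - d + (1 << w) - 1
    best = min_x_in_rectangle(list(pts), x_left, float("inf"), d)
    return pts[best] if best is not None else None


# Ranges of the prefixes *, 1*, 110* and 1101* when W = 6.
R = [(0, 63), (32, 63), (48, 55), (52, 55)]
print(msr(R, 53))  # (52, 55)
print(msr(R, 50))  # (48, 55)
print(msr(R, 40))  # (32, 63)
print(msr(R, 5))   # (0, 63)
```

Note that 110* and 1101* share the finish point 55 and * and 1* share 63; transform1 separates their x-values so that the range with the larger start point gets the smaller x'.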

To insert the prefix whose range is [u, v], we insert transform1(map1([u, v]))
into PST1. In case this prefix is already in PST1, we simply update the next-hop
information for this prefix. To delete the prefix whose range is [u, v], we delete
transform1(map1([u, v])) from PST1. When deleting a prefix, we must take care
not to delete the prefix *. Requests to delete this prefix should simply result in setting
the next-hop associated with this prefix to 0.

Since minXinRectangle, insert, and delete each take $O(W)$ ($O(\log n)$) time
when PST1 is an RPST (RBPST), PST1 provides a router-table representation in
which longest-prefix matching, prefix insertion, and prefix deletion can be done in
$O(W)$ time each when an RPST is used and in $O(\log n)$ time each when an RBPST
is used.

2.3 Nonintersecting Ranges

Let R be a set of nonintersecting ranges. Clearly, R is conflict free. For simplicity,
assume that R includes the range z that matches all destination addresses ($z =
[0, 2^{32} - 1]$ in the case of IPv4). With this assumption, msr(d) is defined for every d.
We may use PST1(transform1(map1(R))) to find msr(d) as described in Section 2.2.

Insertion of a range r is to be permitted only if r does not intersect any of the
ranges of R. Once we have verified this, we can insert r into PST1 as described in
Section 2.2. Range intersection may be verified by noting that there are two cases for
range intersection (Definition 2(c)). When inserting r = [u, v], we need to determine
if $\exists s = [x, y] \in R\,[u < x \le v < y \vee x < u \le y < v]$. We see that $\exists s \in R\,[x < u \le y < v]$
iff map1(R) has at least one point in the rectangle defined by $x_{left} = u$, $x_{right} = v - 1$,
and $y_{top} = u - 1$ (recall that $y_{bottom} = 0$ by default). Hence, $\exists s \in R\,[x < u \le y < v]$
iff $minXinRectangle(2^W u - (u - 1) + 2^W - 1,\ 2^W (v - 1) + 2^W - 1,\ u - 1)$ exists in
PST1.


To verify $\exists s \in R\,[u < x \le v < y]$, map the ranges of R into 2-dimensional points
using the mapping $map2(r) = (start(r), 2^W - 1 - finish(r))$. Call the resulting
set of mapped points map2(R). We see that $\exists s \in R\,[u < x \le v < y]$ iff map2(R)
has at least one point in the rectangle defined by $x_{left} = u + 1$, $x_{right} = v$, and
$y_{top} = (2^W - 1) - v - 1$. To verify this, we maintain a second PST, PST2, of points in
transform2(map2(R)), where $transform2(x, y) = (2^W x + y, y)$. Hence, $\exists s \in R\,[u <
x \le v < y]$ iff $minXinRectangle(2^W (u + 1),\ 2^W v + (2^W - 1) - v - 1,\ (2^W - 1) - v - 1)$
exists in PST2.
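
The two rectangle tests for range intersection can be checked against the direct definition with a short sketch. It assumes W = 6, uses list scans in place of PST1 and PST2, and all function names are hypothetical.

```python
W = 6
FULL = (1 << W) - 1  # 2^W - 1


def transform1(x, y):
    return ((x << W) - y + FULL, y)


def transform2(x, y):
    return ((x << W) + y, y)


def exists_in_rectangle(points, x_left, x_right, y_top):
    """True iff minXinRectangle would find a point (brute-force scan)."""
    return any(x_left <= x <= x_right and y <= y_top for (x, y) in points)


def intersects_any(R, u, v):
    """Intersection test for r = [u, v] using the two transformed point sets."""
    pst1 = [transform1(f, s) for (s, f) in R]         # transform1(map1(r))
    pst2 = [transform2(s, FULL - f) for (s, f) in R]  # transform2(map2(r))
    # Case x < u <= y < v, answered from the PST1 points.
    left = exists_in_rectangle(
        pst1, (u << W) - (u - 1) + FULL, ((v - 1) << W) + FULL, u - 1)
    # Case u < x <= v < y, answered from the PST2 points.
    right = exists_in_rectangle(
        pst2, (u + 1) << W, (v << W) + FULL - v - 1, FULL - v - 1)
    return left or right


def intersects_any_direct(R, u, v):
    """The same test written straight from the two intersection cases."""
    return any(u < x <= v < y or x < u <= y < v for (x, y) in R)


R = [(0, 63), (8, 15), (10, 11), (32, 47)]
print(intersects_any(R, 12, 20))  # True: [12, 20] intersects [8, 15]
print(intersects_any(R, 16, 31))  # False
```

Comparing the two functions over every pair u <= v for a small W is a cheap way to convince oneself that the rectangle corners are right.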


To delete a range r, we must delete r from both PST1 and PST2. Deletion of

a range from a PST is similar to deletion of a prefix as discussed in Section 2.2.

The complexity of each of the operations to find msr(d), insert a range, and delete
a range is the same as that of the corresponding operation for the case when R is a
set of ranges that correspond to prefixes.


Step 1: If $r = [u, v] \in R$, update the next-hop information associated with $r \in R$ and
        terminate.
Step 2: Compute maxP(u, v, R), minP(u, v, R), maxY(u, v, R) and minX(u, v, R).
Step 3: If $maxY(u, v, R) \le maxP(u, v, R) \wedge minX(u, v, R) \ge minP(u, v, R)$,
        $R \cup \{r\}$ is conflict free; otherwise, it is not. In the former case, insert
        transform1(map1(r)) into PST1 and transform2(map2(r)) into PST2. In
        the latter case, the insert operation fails.

Figure 2-6: Insert r = [u, v] into the conflict-free range set R

2.4 Conflict-Free Ranges

In this section, we extend the two-PST data structure of Section 2.3 to the

general case when R is an arbitrary conflict-free range set. Once again, we assume

that R includes the range z that matches all destination addresses. PST1 and PST2

are defined for the range set R as in Sections 2.2 and 2.3.

2.4.1 Determine msr(d)

Since R is conflict free, msr(d) is determined by Lemma 9. Hence, msr(d) may
be obtained by performing the operation

$minXinRectangle(2^W d - d + 2^W - 1, \infty, d)$

on PST1.

2.4.2 Insert A Range

When inserting a range $r = [u, v] \notin R$, we must insert transform1(map1(r))
into PST1 and transform2(map2(r)) into PST2. Additionally, we must verify that
$R \cup \{r\}$ is conflict free. This verification is done using Lemma 6. Figure 2-6 gives a
high-level description of our algorithm to insert a range into R.

Step 1 is done by searching for transform1(map1(r)) in PST1. For Step 2, we
note that

$maxY(u, v, R) = maxXinRectangle(2^W u - (u - 1) + 2^W - 1,\ 2^W (v - 1) + 2^W - 1,\ u - 1)$

$minX(u, v, R) = minXinRectangle(2^W (u + 1),\ 2^W v + (2^W - 1) - v - 1,\ (2^W - 1) - v - 1)$

where for maxY we use PST1 and for minX we use PST2. Section 2.4.4 describes
the computation of maxP and minP. The point insertions of Step 3 are done using
the standard insert algorithm for a PST [37].

Step 1: If r = z, change the next-hop information for z to 0 and terminate.
Step 2: Delete transform1(map1(r)) from PST1 and transform2(map2(r)) from
        PST2 to get the PSTs for $A = R - \{r\}$. If PST1 did not have
        transform1(map1(r)), $r \notin R$; terminate.
Step 3: Determine whether or not A has a subset whose projection equals r = [u, v].
Step 4: If A has such a subset, conclude conflictFree(A) and terminate.
Step 5: Determine whether A has a range that contains r = [u, v]. If not, conclude
        conflictFree(A) and terminate.
Step 6: Determine m and n as defined in Lemma 8 as follows.
        $m = start(maxXinRectangle(0,\ 2^W u + (2^W - 1) - v,\ (2^W - 1) - v))$ (use PST2)
        $n = finish(minXinRectangle(2^W v - u + 2^W - 1,\ \infty,\ u))$ (use PST1)
Step 7: Determine whether $[m, n] \in A$. If so, conclude conflictFree(A). Otherwise,
        conclude $\neg conflictFree(A)$. In the latter case reinsert transform1(map1(r))
        into PST1 and transform2(map2(r)) into PST2 and disallow the deletion of r.

Figure 2-7: Delete the range r = [u, v] from the conflict-free range set R

2.4.3 Delete A Range

Suppose we are to delete the range r = [u, v]. This deletion is to be permitted
iff $r \ne z$ and $A = R - \{r\}$ is conflict free. Figure 2-7 gives a high-level description of
our algorithm to delete r. Its correctness follows from Lemma 8.

Step 2 employs the standard PST algorithm to delete a point [37]. For Step 3,
we note that A has a subset whose projection equals r = [u, v] iff maxP(u, v, A) = v.
In Section 2.4.4, we show how maxP(u, v, A) may be computed efficiently. For Step
5, we note that $r = [u, v] \subseteq s = [x, y]$ iff $x \le u \wedge y \ge v$. So, A has such a range iff

$minXinRectangle(2^W v - u + 2^W - 1, \infty, u)$

exists in PST1.

In Step 6, we assume that maxXinRectangle and minXinRectangle return the
range of R that corresponds to the desired point in the rectangle. To determine
whether $[m, n] \in A$ (Step 7), we search for the point $(2^W n - m + 2^W - 1, m)$ in PST1
using the standard PST search algorithm [37]. The reinsertion into PST1 and PST2,
if necessary, is done using the standard PST insert algorithm [37].

Step 1: Find $r' \in norm(R)$ such that $start(r') = u$.
        If no such r' or $start(full(r')) \ne u \vee finish(full(r')) > v$, maxP does not exist;
Step 2: maxP = finish(r');
        while ($\exists s' \in norm(R) \wedge start(s') = maxP + 1 \wedge full(s') \subseteq [u, v]$)
            maxP = finish(s');

Figure 2-8: Simple algorithm to compute maxP(u, v, R), where [u, v] is a range and

2.4.4 Computing maxP and minP

Although maxP and minP are relatively difficult to compute using data struc-

tures such as PST1 and PST2 that directly represent R, they may be computed

efficiently using data structures for norm(R). In this section, we show how to com-

pute maxP from norm(R). The computation of minP is similar.

2.4.5 A Simple Algorithm to Compute maxP

Figure 2-8 is a high-level description of a simple, though not efficient, algorithm

to compute maxP(u, v, R).

Theorem 1 Figure 2-8 correctly computes maxP(u, v, R).

Proof First consider Step 1. From Lemma 17(a), it follows that

$\nexists r' \in norm(R)\,[start(r') = u] \implies \nexists r \in R\,[start(r) = u]$

Therefore, $\nexists r' \in norm(R)\,[start(r') = u] \implies \nexists maxP$. From Lemma 18, it
follows that $start(full(r')) \ne start(r') = u \implies \nexists s \in R\,[start(s) = start(r') = u]$.
So, $start(full(r')) \ne u \implies \nexists maxP$. Finally, $u = start(r') = start(full(r'))$ implies
$finish(full(r')) = \min\{finish(t) \mid t \in R \wedge start(t) = u\}$ (Lemma 17(1)).
So, $finish(full(r')) > v$ implies $\nexists s \in R\,[start(s) = u \wedge finish(s) \le v]$.
Hence, $start(r') = u \wedge finish(full(r')) > v \implies \nexists maxP$. Further, when
$\exists r' \in norm(R)\,[start(r') = u \wedge finish(full(r')) \le v]$, maxP exists and $maxP \ge
finish(full(r')) \ge finish(r')$. Therefore, Step 1 correctly identifies the case when
maxP doesn't exist.

We get to Step 2 only when maxP exists. From the definition of maxP, $\exists A \subseteq
R\,[\Pi(A) = [u, maxP]]$. From this and Lemma 20(1), we get $\exists B \subseteq norm(R)\,[\Pi(B) =
[u, maxP]]$. Now, from Lemma 10, we get $\exists D \subseteq norm(R)\,[chain(D) \wedge \Pi(D) =
[u, maxP]]$. From Lemma 11, it follows that D is a sub-chain of the unique chain
$C_i \in CP(norm(R))$ that includes r'. Let $r', s'_1, s'_2, \ldots, s'_q$ be the tail of $C_i$. It follows
that maxP is either $finish(r')$ or $finish(s'_j)$ for some j in the range [1, q]. Let j
be the least integer such that $full(s'_j) \not\subseteq [u, v]$. If such a j does not exist, then
$maxP = finish(s'_q)$ as norm(R) has no subset whose projection equals [u, x] for
any $x > finish(s'_q)$. So, assume that j exists. From Lemma 20(2), it follows that
$maxP < finish(s'_j)$. Hence, Step 2 correctly determines maxP. $\blacksquare$
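
For checking an implementation on small inputs, a brute-force reference for maxP may be useful. The sketch below is not the algorithm of Figure 2-8 and does not use norm(R). It assumes that the projection of a set of ranges is the union of the addresses they cover and that maxP(u, v, R) is the largest x <= v for which some subset of R has projection [u, x]; both definitions lie outside this section and are stated here as assumptions. The function name and the sample data are hypothetical.

```python
def max_p(u, v, R):
    """Brute-force maxP(u, v, R) under the assumptions stated above.

    Only ranges that lie inside [u, v] can belong to a subset whose
    projection is [u, x] with x <= v, and at least one must start at u.
    Returns None when maxP does not exist.
    """
    inside = sorted((s, f) for (s, f) in R if u <= s and f <= v)
    if not any(s == u for (s, _) in inside):
        return None
    reach = u - 1  # largest address covered contiguously from u so far
    for s, f in inside:
        if s > reach + 1:  # a gap: the projection can no longer stay a range
            break
        reach = max(reach, f)
    return reach


# Hypothetical nonintersecting range set.
R = [(0, 63), (8, 11), (12, 20), (21, 25), (30, 40)]
print(max_p(8, 30, R))  # 25
print(max_p(8, 18, R))  # 11
print(max_p(9, 30, R))  # None
```

Under these assumptions, max_p(u, v, A) == v is exactly the Step 3 test of Figure 2-7 for a subset of A whose projection equals [u, v].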

2.4.6 An Efficient Algorithm to Compute maxP

The algorithm of Figure 2-8 takes time $O(length(C_i))$, where $length(C_i)$ is the
number of ranges in the chain $C_i \in CP(norm(R))$ that contains r'. We can reduce
this time to $O(\log length(C_i))$ by representing each chain of CP(norm(R)) as a red-black
tree (actually any balanced search tree structure that permits efficient join and
split operations may be used). The number of red-black trees we use equals the
number of chains in CP(norm(R)).

Let $D = (t'_1, \ldots, t'_q)$ be a chain in CP(norm(R)). The red-black tree, RBT(D), for
D has one node for each of the ranges $t'_i$. The key value for the node for $t'_i$ is $start(t'_i)$
(equivalently, $finish(t'_i)$ may be used as the search tree key). Each node of RBT(D)
has the following four values (in addition to having a $t'_i$ and other values necessary
for efficient implementation): minStartLeft, minStartRight, maxFinishLeft, and
maxFinishRight. For a node p that has an empty left subtree, $minStartLeft =
2^W - 1$ and $maxFinishLeft = 0$. Similarly, when p has an empty right subtree,
$minStartRight = 2^W - 1$ and $maxFinishRight = 0$. Otherwise,

$minStartLeft = \min\{start(full(r')) \mid r' \in leftSubtree(p)\}$

$minStartRight = \min\{start(full(r')) \mid r' \in rightSubtree(p)\}$

$maxFinishLeft = \max\{finish(full(r')) \mid r' \in leftSubtree(p)\}$

$maxFinishRight = \max\{finish(full(r')) \mid r' \in rightSubtree(p)\}$
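
The four per-node values can be illustrated on a plain (unbalanced) binary tree. The sketch below is not a red-black tree and does no rebalancing; it only shows how the values follow from the subtree contents and how the sentinels for empty subtrees arise. The Node class, its field names, and the sample ranges are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

W = 6
MAX_ADDR = (1 << W) - 1  # 2^W - 1, the sentinel for an empty subtree


@dataclass
class Node:
    rng: Tuple[int, int]   # the range r' of norm(R) stored in this node
    full: Tuple[int, int]  # full(r'), the range of R that r' came from
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    min_start_left: int = MAX_ADDR
    max_finish_left: int = 0
    min_start_right: int = MAX_ADDR
    max_finish_right: int = 0


def summarize(p: Optional[Node]) -> Tuple[int, int]:
    """Fill the four values in every node of the subtree rooted at p.

    Returns (min start(full), max finish(full)) over the whole subtree,
    or the sentinels (2^W - 1, 0) when the subtree is empty.
    """
    if p is None:
        return (MAX_ADDR, 0)
    p.min_start_left, p.max_finish_left = summarize(p.left)
    p.min_start_right, p.max_finish_right = summarize(p.right)
    return (min(p.full[0], p.min_start_left, p.min_start_right),
            max(p.full[1], p.max_finish_left, p.max_finish_right))


# Hypothetical three-node tree keyed on start(r').
root = Node((12, 15), (12, 15),
            left=Node((8, 11), (8, 30)),
            right=Node((16, 20), (16, 20)))
print(summarize(root))
print(root.min_start_left, root.max_finish_left)
print(root.min_start_right, root.max_finish_right)
```

In a balanced tree the same values would be recomputed bottom-up along the path touched by each rotation, insert, delete, join, or split.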

The collection of red-black trees representing norm(R) is augmented by an ad-

ditional red-black tree endPointsTree(norm(R)) that represents the end points of

the ranges in norm(R). With each point x in endPointsTree, we store a pointer to

the node in RBT(D) that represents s'. Alternatively, we may use a PST, PST3, for
the range set $chains = \{[start(C_i), finish(C_i)] \mid C_i \in CP(norm(R))\}$. The points in
PST3 are map1(chains); with each point in PST3, we keep a pointer to the root of
the RBT for that chain. Note that since range end-points are distinct in chains, we do
not need to use transform1 as used in PST1. To find an end point d, we first find the
smallest chain that covers d by performing the operation $minXinRectangle(d, \infty, d)$

in PST3. Next, we follow the pointer associated with this chain to get to the cor-

responding RBT. Finally, a search of this RBT gets us to the RBT node for the s'

with the given end point. In the sequel, we assume that endPointsTree, rather than

PST3, is used. A parallel discussion is possible for the case when PST3 is used.

To implement Step 1 of Figure 2-8, we search endPointsTree for the point u.
If $u \notin endPointsTree$, then $\nexists r' \in norm(R)\,[start(r') = u]$. If $u \in endPointsTree$,
then we use the pointer in the node for u to get to the root of the RBT that has r'.
A search in this RBT for u locates r'. We may now perform the remaining checks of
Step 1 using the data associated with r'.

Suppose that maxP exists. At the start of Step 2, we are positioned at the RBT
node that represents r'. This is node 0 of Figure 2-9. We need to find the least
$s' \in norm(R)$ such that $start(s') > finish(r') \wedge full(s') \not\subseteq [u, v]$. If there is no such s',
then $maxP = \max\{finish(root.range), root.maxFinishRight\}$. If such an s' exists,
$maxP = start(s') - 1$.

Figure 2-9: An example RBT

s' may be found in $O(height(RBT))$ time using a simple search process.
We illustrate this process using the tree of Figure 2-9. We begin at node 0. If
$[minStartRight, maxFinishRight] \subseteq [u, v]$, then s' is not in the right subtree of
node 0. Since node 0 is a right child, s' is not in its parent. So, we back up to node
1 (in general, we back up to the nearest ancestor whose left subtree contains the
current node). Let t' be the range in node 1. $s' = t'$ iff $t' \not\subseteq [u, v]$. If $s' \ne t'$, we
perform the test $[minStartRight, maxFinishRight] \subseteq [u, v]$ at node 1 to determine
whether or not s' is in the right subtree of node 1. If the test is true, we back up to
node 2. Otherwise, s' is in the right subtree of node 1. When the right subtree (if
any) that contains s' is identified, we make a downward pass in this subtree to locate
s'. Figure 2-10 describes this downward pass.

// currentNode is the root of a subtree all of whose ranges start at the right of u
// This subtree contains s'. Return maxP.
while (true) {
    if ([currentNode.minStartLeft, currentNode.maxFinishLeft] ⊆ [u, v])
        // s' not in left subtree
        if (currentNode.range ⊆ [u, v])
            // s' ≠ currentNode. s' must be in right subtree.
            currentNode = currentNode.rightChild;
        else return (start(currentNode.range) - 1);
    else // s' is in left subtree
        currentNode = currentNode.leftChild;
}

Figure 2-10: Find s' (and hence maxP) in a subtree known to contain s'
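
A runnable rendering of the downward pass of Figure 2-10 is sketched below on a plain binary tree. Two assumptions are made and should be read as such: the left-subtree test uses the pair (minStartLeft, maxFinishLeft), and the containment test at a node is applied to full(r'), following the surrounding text. The Node class, helper names, and sample chain are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

W = 6
MAX_ADDR = (1 << W) - 1


@dataclass
class Node:
    rng: Tuple[int, int]   # r' in norm(R)
    full: Tuple[int, int]  # full(r')
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    min_start_left: int = MAX_ADDR  # sentinel values for an empty left subtree
    max_finish_left: int = 0


def fill(p):
    """Set the left-subtree summaries; return the summary of the subtree at p."""
    if p is None:
        return (MAX_ADDR, 0)
    lo_l, hi_l = fill(p.left)
    lo_r, hi_r = fill(p.right)
    p.min_start_left, p.max_finish_left = lo_l, hi_l
    return (min(p.full[0], lo_l, lo_r), max(p.full[1], hi_l, hi_r))


def inside(a, b):
    """True iff interval a = (lo, hi) satisfies b[0] <= lo and hi <= b[1]."""
    return b[0] <= a[0] and a[1] <= b[1]


def downward_pass(node, u, v):
    """Locate s' in a subtree known to contain it and return start(s') - 1."""
    while node is not None:
        if inside((node.min_start_left, node.max_finish_left), (u, v)):
            # s' is not in the left subtree (an empty subtree passes trivially).
            if inside(node.full, (u, v)):
                node = node.right      # s' is not this node either
            else:
                return node.rng[0] - 1  # this node holds s'
        else:
            node = node.left
    raise ValueError("precondition violated: subtree does not contain s'")


# Hypothetical chain (16,20), (21,25), (26,30), (31,33); full((26,30)) = (26,45).
a = Node((16, 20), (16, 20))
d = Node((31, 33), (31, 33))
c = Node((26, 30), (26, 45), right=d)
b = Node((21, 25), (21, 25), left=a, right=c)
fill(b)
print(downward_pass(b, 10, 40))  # 25
print(downward_pass(b, 10, 18))  # 15
```

The sentinel pair (2^W - 1, 0) for an empty subtree always passes the containment test, which is why the loop never descends into a missing child when the precondition holds.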

2.4.7 Wrapping Up Insertion of a Range

Now that we have augmented PST1 and PST2 with a collection of RBTs and
an endPointsTree, whenever we insert a range r = [u, v] into R, we must update not
only PST1 and PST2 as described in Section 2.4.2, but also the RBT collection and
endPointsTree. To do this, we first compute $chop(r, R \cup \{r\}) = chop(r, R) = [u', v']$
by first computing $minP(u + 1, v)$ and $maxP(u, v - 1)$ as described in Section 2.4.4.
[u', v'] is now easily obtained from the chopping rule. Lemma 21 tells us that the
only $s \in R$ whose chop() value may change as a result of the insertion of r is the
smallest enclosing range of r. Since $z \in R$ and $r \ne z$, such an s must exist. Rather
than search for this s explicitly, we use the conditions of cases (2)-(4) of Lemma 22 to
find $s' = chop(s, R)$ in endPointsTree. Note that if $chop(s, R) = \emptyset$, the search in
endPointsTree will not find s'; but when $chop(s, R) = \emptyset$, $chop(s, R \cup \{r\}) = \emptyset$. So,
no change in chop(s, R) is called for.

Note that the insertion of r may combine two chains of CP(norm(R)). In this

case, we use the join operation of red-black trees to combine the RBTs corresponding

to these two chains.

2.4.8 Wrapping Up Deletion of a Range

When $chop(r, R) = \emptyset$, no changes are to be made to the RBTs and endPointsTree
(Lemma 23(1)). So, assume that $chop(r, R) \ne \emptyset$. We first find s, the smallest range
that contains r (see Lemma 23(2)). Note that since $z \in R$ and $r \ne z$, s exists. One
may verify that s is one of the ranges given by the following two operations.

$minXinRectangle(2^W v - u + 2^W - 1,\ \infty,\ u)$

$maxXinRectangle(0,\ 2^W u + 2^W - 1 - v,\ 2^W - 1 - v)$

where the first operation is done in PST1 and the second in PST2 (both operations
are done after transform1(map1([u, v])) has been deleted from PST1 and
transform2(map2([u, v])) has been deleted from PST2). The ranges returned by
these two operations may be compared to determine which is s.
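
The two queries for an enclosing range can be tried out on a small example. The sketch assumes W = 6, scans dictionaries instead of querying PST1 and PST2, and takes R to be the range set after r has already been removed; the function name and the data are hypothetical.

```python
W = 6
FULL = (1 << W) - 1  # 2^W - 1


def enclosing_candidates(R, u, v):
    """Return the ranges found by the PST1 and PST2 queries for r = [u, v]."""
    # transform1(map1(r)) = (2^W * finish - start + 2^W - 1, start)
    pst1 = {((f << W) - s + FULL, s): (s, f) for (s, f) in R}
    # transform2(map2(r)) = (2^W * start + (2^W - 1 - finish), 2^W - 1 - finish)
    pst2 = {((s << W) + (FULL - f), FULL - f): (s, f) for (s, f) in R}

    # minXinRectangle(2^W v - u + 2^W - 1, infinity, u) on PST1
    hits1 = [p for p in pst1 if p[0] >= (v << W) - u + FULL and p[1] <= u]
    # maxXinRectangle(0, 2^W u + 2^W - 1 - v, 2^W - 1 - v) on PST2
    hits2 = [p for p in pst2
             if 0 <= p[0] <= (u << W) + FULL - v and p[1] <= FULL - v]

    c1 = pst1[min(hits1)] if hits1 else None  # least finish, then largest start
    c2 = pst2[max(hits2)] if hits2 else None  # largest start, then least finish
    return c1, c2


# Hypothetical nested range set, with r = (12, 20) already deleted.
R = [(0, 63), (8, 40), (10, 30)]
print(enclosing_candidates(R, 12, 20))  # ((10, 30), (10, 30))
```

In this nested example both queries return the same range; in general the two results are compared, as the text says, to decide which one is s.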

Once we have identified s, Lemma 23(2) is used to determine $chop(s, R - \{r\})$. Assume
that $chop(s, R) \ne \emptyset$. Let $chop(r, R) = r' = [u', v']$ and $chop(s, R) = s' = [x', y']$.
When s' and r' are in different RBTs (this is the case when $r' \subset s'$), $chop(s, R) =
chop(s, R - \{r\})$ and the RBT that contains s' may need to be split into two RBTs.
When s' and r' are in the same RBT, they are in the same chain of CP(norm(R)). If
s' and r' are adjacent ranges of this chain, we may simply remove the RBT node for r'
and update that for s' to reflect its new start or finish point (only one may change).
When r' and s' are not adjacent ranges, the nodes for these two ranges are removed
from the RBT (this may split the RBT into up to two RBTs) and $chop(s, R - \{r\})$ is
inserted. Figure 2-11 shows the different cases.

2.4.9 Complexity

The portions of the search, insert, and delete algorithms that deal only with
PST1 and PST2 have the same asymptotic complexity as their counterparts for the
case of nonintersecting ranges (Section 2.3). The portions that deal with the RBTs
and endPointsTree require a constant number of search, insert, delete, join, and split
operations on these structures. Since each of these operations takes $O(\log n)$ time on
a red-black tree and since we can update the values minStartLeft, minStartRight,
and so on, that are stored in the RBT nodes in the same asymptotic time as taken
by an insert/delete/join/split, the overall complexity of our proposed data structure
is $O(\log n)$ for each operation when RBPSTs are used for PST1 and PST2. When
RPSTs are used, the search complexity is $O(W)$ and the insert and delete complexity
is $O(W + \log n) = O(W)$.

Figure 2-11: Cases (A)-(D) when s' and r' are in the same chain of CP(norm(R))

2.5 Experimental Results

2.5.1 Prefixes

We programmed our red-black priority-search tree algorithm for prefixes (Section 2.2)
in C++ and compared its performance to that of the ACRBT of Sahni
et al. [22]. Recall that the ACRBT is the best performing $O(\log n)$ data structure
reported in [22] for dynamic prefix-tables. For test data, we used six IPv4 prefix
databases obtained from [38]. The number of prefixes in each of these databases as
well as the memory requirements for each database of prefixes using our data structure
(PST) of Section 2.2 as well as the ACRBT structure of Sahni et al. [22] are
shown in Table 2-1. The databases Paix1, Pb1, MaeWest and Aads were obtained on
Nov 22, 2001, while Pb2 and Paix2 were obtained Sept. 13, 2000. Figure 2-12 is a
plot of the data of Table 2-1. As can be seen, the ACRBT structure takes almost
three times as much memory as is taken by the PST structure. Further, the memory
requirement of the PST structure can be reduced to about 50% that of our current
implementation. This reduction requires an n-node implementation of a priority-search
tree as described in [37] rather than our current implementation, which uses
$2n - 1$ nodes as in [39].

Figure 2-12: Memory usage

Table 2-1: Memory usage

Database               Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Num of Prefixes        16172  22225    28889  31827  35303  85988
Memory (KB)  PST         884   1215     1579   1740   1930   4702
             ACRBT      2417   3331     4327   4769   5305  12851

To obtain the mean time to find the longest matching-prefix (i.e., to perform
a search), we started with a PST or ACRBT that contained all prefixes of a prefix
database. Next, a random permutation of the set of start points of the ranges
corresponding to the prefixes was obtained. This permutation determined the order
in which we searched for the longest matching-prefix for each of these start points.
The time required to determine all of these longest-matching prefixes was measured
and averaged over the number of start points (equal to the number of prefixes). The
experiment was repeated 20 times and the mean and standard deviation of the 20
mean times computed. Table 2-2 gives the mean time required to find the longest
matching-prefix on a Sun Blade 100 workstation that has a 500MHz UltraSPARC-IIe
processor and a 256KB L2 cache. The standard deviation in the mean time is
also given in this table. On our Sun workstation, finding the longest matching-prefix
takes about 10% to 14% less time using an ACRBT than a PST.

Table 2-2: Prefix times on a 500MHz Sun Blade 100 workstation

Database                  Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search   PST    Mean       2.88   3.06     3.25   3.31   3.43   4.06
(μsec)          Std        0.36   0.18     0.17   0.16   0.09   0.05
         ACRBT  Mean       2.60   2.77     2.87   2.87   3.09   3.51
                Std        0.25   0.16     0.16   0.12   0.13   0.04
Insert   PST    Mean       3.90   4.45     4.83   5.18   5.14   6.04
(μsec)          Std        0.57   0.63     0.51   0.48   0.19   0.20
         ACRBT  Mean      21.15  23.42    24.77  25.36  25.54  28.07
                Std        1.11   0.66     0.38   0.29   0.19   0.18
Delete   PST    Mean       4.36   4.45     4.73   4.71   5.06   5.48
(μsec)          Std        0.91   0.63     0.53   0.00   0.19   0.16
         ACRBT  Mean      21.24  22.68    23.16  23.71  24.56  25.64
                Std        0.95   0.55     0.49   0.35   0.26   0.21

To obtain the mean time to insert a prefix, we started with a random permutation
of the prefixes in a database, inserted the first 67% of the prefixes into an initially
empty data structure, measured the time to insert the remaining 33%, and computed
the mean insert time by dividing by the number of prefixes in 33% of the database.
This experiment was repeated 20 times and the mean of the mean as well as the
standard deviation in the mean computed. These latter two quantities are given
in Table 2-2 for our Sun workstation. As can be seen, insertions into a PST take
between 18% and 22% of the time to insert into an ACRBT.

The mean and standard deviation data reported in Table 2-2 for the delete
operation were obtained in a similar fashion by starting with a data structure that
had 100% of the prefixes in the database and measuring the time to delete a randomly
selected 33% of these prefixes. Deletion from a PST takes about 20% of the time required
to delete from an ACRBT.
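
The PST-to-ACRBT time ratios for insert and delete on the Sun workstation can be recomputed directly from the mean values of Table 2-2. The snippet below only redoes that arithmetic; the variable and function names are hypothetical.

```python
# Mean times in microseconds from Table 2-2, in database order:
# Paix1, Pb1, MaeWest, Aads, Pb2, Paix2.
pst_insert = [3.90, 4.45, 4.83, 5.18, 5.14, 6.04]
acrbt_insert = [21.15, 23.42, 24.77, 25.36, 25.54, 28.07]
pst_delete = [4.36, 4.45, 4.73, 4.71, 5.06, 5.48]
acrbt_delete = [21.24, 22.68, 23.16, 23.71, 24.56, 25.64]


def percent_of(a, b):
    """Element-wise a / b expressed as a percentage, rounded to one decimal."""
    return [round(100 * x / y, 1) for x, y in zip(a, b)]


insert_pct = percent_of(pst_insert, acrbt_insert)
delete_pct = percent_of(pst_delete, acrbt_delete)
print("insert:", insert_pct)
print("delete:", delete_pct)
```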

Tables 2-3 and 2-4 give the corresponding times on a 700MHz Pentium III PC
and a 1.4GHz Pentium 4 PC, respectively. Both computers have a 256KB L2 cache.
The run times on our 700MHz Pentium III are about one-half the times on our Sun
workstation. Surprisingly, when going from the 700MHz Pentium III to the 1.4GHz
Pentium 4, the measured time to find the longest matching-prefix decreased by only
about 5% for PST. More surprisingly, the corresponding times for ACRBT actually
increased. The net result of the slight decrease in time for PST and the increase
for ACRBT is that, on our Pentium 4 PC, the PST is faster than the ACRBT on
all three operations: find longest matching-prefix, insert, and delete. This somewhat
surprising behavior is due to architectural differences (e.g., differences in width and
size of L1 cache lines) between the Pentium III and 4 processors.

Table 2-3: Prefix times on a 700MHz Pentium III PC

Database                  Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search   PST    Mean       1.39   1.54     1.61   1.65   1.70   1.97
(μsec)          Std        0.27   0.22     0.17   0.14   0.00   0.04
         ACRBT  Mean       1.36   1.44     1.44   1.49   1.54   1.80
                Std        0.25   0.18     0.13   0.14   0.14   0.06
Insert   PST    Mean       2.41   2.63     2.60   2.83   2.80   3.07
(μsec)          Std        0.87   0.30     0.53   0.43   0.40   0.14
         ACRBT  Mean      11.97  12.63    13.48  13.62  13.77  14.93
                Std        0.95   0.67     0.24   0.48   0.35   0.18
Delete   PST    Mean       2.32   2.38     2.49   2.45   2.55   2.91
(μsec)          Std        0.82   0.61     0.52   0.47   0.00   0.17
         ACRBT  Mean      11.69  12.55    12.95  13.01  13.40  14.10
                Std        0.87   0.63     0.54   0.44   0.48   0.16

Figures 2-13, 2-14, and 2-15 histogram the search, insert, and delete time data

of the preceding tables.

Table 2-4: Prefix times on a 1.4GHz Pentium 4 PC

Database                  Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search   PST    Mean       1.30   1.44     1.51   1.52   1.63   1.92
(μsec)          Std        0.19   0.18     0.17   0.13   0.13   0.06
         ACRBT  Mean       1.48   1.69     1.83   1.87   1.87   2.24
                Std        0.31   0.20     0.16   0.07   0.14   0.05
Insert   PST    Mean       1.76   1.96     2.18   2.17   2.38   2.65
(μsec)          Std        0.41   0.69     0.00   0.44   0.35   0.18
         ACRBT  Mean      11.22  11.81    12.41  12.91  12.92  13.94
                Std        0.41   0.60     0.41   0.44   0.26   0.18
Delete   PST    Mean       1.76   1.69     1.92   1.93   2.00   2.22
(μsec)          Std        0.41   0.60     0.38   0.21   0.42   0.17
         ACRBT  Mean       9.46  10.39    10.54  10.42  10.92  11.64
                Std        0.57   0.63     0.38   0.21   0.42   0.16





Figure 2-13: Time for searching longest matching prefix. A)Sun. B)Pentium
700MHz. C)Pentium 1.4GHz




Figure 2-14: Time for inserting a prefix. A)Sun. B)Pentium 700MHz. C)Pentium

2.5.2 Nonintersecting Ranges

To benchmark our algorithm for nonintersecting ranges (Section 2.3), we generated
three different sets of random [1] nonintersecting ranges. These, respectively, had
30000, 50000, and 80000 ranges. Table 2-5 gives the memory requirement as well as
the mean times and standard deviations for search, insert, and delete. The run times
are for our 700MHz Pentium III PC. The search, insert, and delete experiments were
modeled after those conducted for the case of prefix databases.

[1] We resorted to randomly generated data sets because no benchmark data for
nonintersecting ranges was available.

Figure 2-15: Time for deleting a prefix. A)Sun. B)Pentium 700MHz. C)Pentium

Table 2-5: Nonintersecting Ranges. 700 MHz PIII

Num of Ranges         30000  50000  80000
Memory Usage (KB)      3360   5600   8960
Search   Mean          1.92   2.19   2.51
(μsec)   Std           0.15   0.04   0.06
Insert   Mean          8.65   9.27   9.88
(μsec)   Std           0.49   0.29   0.17
Remove   Mean          5.75   6.42   6.81
(μsec)   Std           0.44   0.28   0.14

2.5.3 Conflict-free Ranges

Table 2-6 gives the memory required as well as the mean times and standard
deviations for the case of conflict-free ranges. The range sequence used is generated
so that when the ranges are inserted in sequence order, there are no conflicts. For
deletion, 33% of the ranges are removed in the reverse of the insert order.

2.6 Conclusion

We have developed data structures for dynamic router tables. Our data structures
permit one to search, insert, and delete in $O(\log n)$ time each. Although $O(\log n)$
time data structures for prefix tables were known prior to our work [21, 22], our data
structure is more memory efficient than the data structures of Sahni et al. [21, 22].
Further, our data structure is significantly superior on the insert and delete operations,
while being competitive on the search operation.

For nonintersecting ranges and conflict-free ranges our data structures are the
first to permit $O(\log n)$ search, insert, and delete.

Table 2-6: Conflict-free Ranges. PIII 700MHz with 256K L2 cache

Num of Ranges in R            30000  50000  80000
Num of Ranges     Mean        29688  48868  76472
in norm(R)        Std         18.03  42.90  60.05
Memory Usage      Mean         6240   9979  15219
(KB)              Std          7.06  10.91  11.19
Search            Mean         1.98   2.34   2.69
(μsec)            Std          0.07   0.09   0.06
Insert            Mean        18.45  19.65  20.76
(μsec)            Std          0.51   0.27   0.27
Remove            Mean         19.3  20.49  21.60
(μsec)            Std          0.41   0.13   0.29


In this chapter, we focus on data structures for dynamic NHPRTs, HPPTs and
LMPTs. In Section 3.2, we develop the data structure binary tree on binary tree
(BOB). This data structure is proposed for the representation of dynamic NHPRTs.
Using BOB, a lookup takes $O(\log^2 n)$ time and cache misses; a new rule may be
inserted and an old one deleted in $O(\log n)$ time and cache misses. For HPPTs,
we propose a modified version of BOB, PBOB (prefix BOB), in Section 3.3. Using
PBOB, a lookup, rule insertion and deletion each take $O(W)$ time and cache misses.
In Section 3.4, we develop the data structure LMPBOB (longest matching-prefix
BOB) for LMPTs. Using LMPBOB, the longest matching-prefix may be found in
$O(W)$ time and $O(\log n)$ cache misses; rule insertion and deletion each take $O(\log n)$
time and cache misses. On practical rule tables, BOB and PBOB perform each of
the three dynamic-table operations in $O(\log n)$ time and with $O(\log n)$ cache misses.
Section 3.1 introduces some terminology, and experimental results are presented in
Section 3.6.

3.1 Preliminaries

Definition 13 A range $r = [u, v]$ is a pair of addresses u and v, $u \le v$. The range
r represents the addresses $\{u, u + 1, \ldots, v\}$. $start(r) = u$ is the start point of the range
and $finish(r) = v$ is the finish point of the range. The range r matches all addresses
d such that $u \le d \le v$.

The start point of the range r = [3, 9] is 3 and its finish point is 9. This range
matches the addresses {3, 4, 5, 6, 7, 8, 9}. In IPv4, u and v are up to 32 bits long, and
in IPv6, u and v may be up to 128 bits long. The IPv4 prefix P = 0* corresponds to
the range $[0, 2^{31} - 1]$. The range [3, 9] does not correspond to any single IPv4 prefix.
We may draw the range $r = [u, v] = \{u, u + 1, \ldots, v\}$ as a horizontal line that begins
at u and ends at v. Figure 2-1 shows ranges drawn in this fashion.

Notice that every prefix of a prefix router-table may be represented as a range.

For example, when W = 6, the prefix P = 1101* matches addresses in the range

[52,55]. So, we --v P = 1101* = [52,55], start(P) = 52, and finish(P) = 55.
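The prefix-to-range correspondence can be sketched in C++ as follows. This is only an illustration: the function name prefixToRange and the string encoding of a prefix are our assumptions, and the sketch assumes W is at most 63 so that all shifts are well defined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>

// Convert a prefix given as its bit string (e.g. "1101" for 1101*) into the
// closed range [start, finish] of W-bit addresses that it matches.
// Assumes bits.size() <= W <= 63 (hypothetical helper, not from the thesis).
std::pair<std::uint64_t, std::uint64_t>
prefixToRange(const std::string& bits, unsigned W) {
    std::uint64_t start = 0;
    for (char c : bits)
        start = (start << 1) | (c == '1' ? 1u : 0u);
    const unsigned freeBits = W - static_cast<unsigned>(bits.size());
    start <<= freeBits;                                   // pad with 0s
    const std::uint64_t finish =
        start | ((std::uint64_t{1} << freeBits) - 1);     // pad with 1s
    return {start, finish};
}
```

For W = 6 the call prefixToRange("1101", 6) yields the pair (52, 55), which agrees with the example above.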

Since a range represents a set of (contiguous) points, we may use standard set
operations and relations such as ∩ and ⊂ when dealing with ranges. So, for example,
[2, 6] ∩ [4, 8] = [4, 6]. Note that some operations between ranges may not yield a
range. For example, [2, 6] ∪ [8, 10] = {2, 3, 4, 5, 6, 8, 9, 10}, which is not a range.

Definition 14 Let r = [u, v] and s = [x, y] be two ranges. Let overlap(r, s) = r ∩ s.

(a) The predicate disjoint(r, s) is true iff r and s are disjoint.

\[
\mathrm{disjoint}(r, s) \iff \mathrm{overlap}(r, s) = \emptyset \iff v < x \lor y < u
\]

Figure 2-1(A) shows the two cases for disjoint sets.

(b) The predicate nested(r, s) is true iff one of the ranges is contained within the
other.

\[
\begin{aligned}
\mathrm{nested}(r, s) &\iff \mathrm{overlap}(r, s) = r \lor \mathrm{overlap}(r, s) = s \\
&\iff r \subseteq s \lor s \subseteq r \\
&\iff x \le u \le v \le y \lor u \le x \le y \le v
\end{aligned}
\]

Figure 2-1(B) shows the two cases for nested sets.

(c) The predicate intersect(r, s) is true iff r and s have a nonempty intersection
that is different from both r and s.

\[
\begin{aligned}
\mathrm{intersect}(r, s) &\iff r \cap s \ne \emptyset \land r \cap s \ne r \land r \cap s \ne s \\
&\iff \lnot \mathrm{disjoint}(r, s) \land \lnot \mathrm{nested}(r, s) \\
&\iff u < x \le v < y \lor x < u \le y < v
\end{aligned}
\]

Figure 2-1(C) shows the two cases for ranges that intersect. Notice that
overlap(r, s) = [x, v] when u < x ≤ v < y and overlap(r, s) = [u, y] when
x < u ≤ y < v.

[2, 4] and [6, 9] are disjoint; [2,4] and [3,4] are nested; [2,4] and [2,2] are nested;

[2,8] and [4,6] are nested; [2,4] and [4,6] intersect; and [3,8] and [2,4] intersect. [4, 4]
is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4].
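The three predicates of Definition 14 translate directly into code. The sketch below is illustrative only; the struct Range and the helper contains are names we introduce here, and addresses are assumed to fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

struct Range {
    std::uint32_t start, finish;   // closed range [start, finish], start <= finish
};

// r and s share no address
bool disjoint(Range r, Range s) {
    return r.finish < s.start || s.finish < r.start;
}

// hypothetical helper: inner is a subset of outer
bool contains(Range outer, Range inner) {
    return outer.start <= inner.start && inner.finish <= outer.finish;
}

// one of the two ranges is contained within the other
bool nested(Range r, Range s) {
    return contains(r, s) || contains(s, r);
}

// nonempty overlap that differs from both r and s
bool intersect(Range r, Range s) {
    return !disjoint(r, s) && !nested(r, s);
}
```

By construction exactly one of the three predicates holds for any pair of ranges, which is the statement of the lemma that follows.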

Lemma 24 Let r and s be two ranges. Exactly one of the following is true.

1. disjoint(r, s)

2. nested(r, s)

3. intersect(r, s)

Proof Straightforward. ∎

Definition 15 The range set R is nonintersecting iff disjoint(r, s) ∨ nested(r, s) for
every pair of ranges r and s ∈ R.

Definition 16 The range r is more specific than the range s iff r ⊂ s.

[2, 4] is more specific than [1, 6], and [5, 9] is more specific than [5, 12]. Since [2, 4]
and [8, 14] are disjoint, neither is more specific than the other. Also, since [4, 14] and
[6, 20] intersect, neither is more specific than the other.

Definition 17 Let R be a range set. ranges(d, R) (or simply ranges(d) when R is
implicit) is the subset of ranges of R that match the destination address d. msr(d, R)
(or msr(d)) is the most specific range of R that matches d. That is, msr(d) is the
most specific range in ranges(d). msr([u, v], R) = msr(u, v, R) = r iff msr(d, R) = r,
u ≤ d ≤ v. When R is implicit, we write msr(u, v) and msr([u, v]) in place of
msr(u, v, R) and msr([u, v], R). hpr(d) is the highest-priority range in ranges(d).
We assume that ranges are assigned priorities in such a way that hpr(d) is uniquely
defined for every d.

When R = {[2, 4], [1, 6]}, ranges(3) = {[2, 4], [1, 6]}, msr(3) = [2, 4], msr(1) =
[1, 6], msr(7) = ∅, and msr(5, 6) = [1, 6]. When R = {[4, 14], [6, 20], [6, 14], [8, 12]},
msr(4, 5) = [4, 14], msr(6, 7) = [6, 14], msr(8, 12) = [8, 12], msr(13, 14) = [6, 14],
and msr(15, 20) = [6, 20].
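A brute-force reading of Definition 17 is useful as a reference against which faster structures can be checked. The sketch below is ours, not the thesis implementation; it assumes the rule set is such that msr(d) and hpr(d) are well defined, returns nullptr when no range matches, and uses the hypothetical names Rule and matches.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Rule {
    std::uint32_t start, finish;   // closed range [start, finish]
    int priority;
};

bool matches(const Rule& r, std::uint32_t d) {
    return r.start <= d && d <= r.finish;
}

// msr(d): among the ranges that match d, the narrowest one.
// O(n) scan, intended only as a specification-level reference.
const Rule* msr(const std::vector<Rule>& R, std::uint32_t d) {
    const Rule* best = nullptr;
    for (const Rule& r : R)
        if (matches(r, d) &&
            (best == nullptr || r.finish - r.start < best->finish - best->start))
            best = &r;
    return best;
}

// hpr(d): among the ranges that match d, the one with highest priority.
const Rule* hpr(const std::vector<Rule>& R, std::uint32_t d) {
    const Rule* best = nullptr;
    for (const Rule& r : R)
        if (matches(r, d) && (best == nullptr || r.priority > best->priority))
            best = &r;
    return best;
}
```

The examples above do not assign priorities, so any priorities used when exercising hpr are illustrative values.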

Definition 18 Let r and s be two ranges. r < s ⟺ start(r) < start(s) ∨ (start(r) =
start(s) ∧ finish(r) > finish(s)).

Note that for every pair, r and s, of different ranges, either r < s or s < r.

Lemma 25 Let R be a nonintersecting range set. If r ∩ s ≠ ∅ for r, s ∈ R, then
the following are true:

1. start(r) < start(s) ⟹ finish(r) ≥ finish(s).

2. finish(r) > finish(s) ⟹ start(r) ≤ start(s).

Proof Straightforward. ∎

3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs)-BOB

3.2.1 The Data Structure

The data structure binary tree on binary tree (BOB) that is being proposed here

for NHRTs comprises a single balanced binary search tree at the top level. This top-

level balanced binary search tree is called the point search tree (PTST). For an n-rule

NHRT, the PTST has at most 2n nodes (we call this the PTST size constraint).

The size constraint is necessary to enable O(log n) update. With each node z of the

PTST, we associate a point, point(z). The PTST is a standard red-black binary

search tree (actually, any binary search tree structure that supports efficient search,

insert, and delete may be used) on the point(z) values of its node set [24]. That
is, for every node z of the PTST, nodes in the left subtree of z have smaller point
values than point(z), and nodes in the right subtree of z have larger point values than
point(z).

Let R be the set of nonintersecting ranges of the NHRT. Each range of R is
stored in exactly one of the nodes of the PTST. More specifically, the root of the
PTST stores all ranges r ∈ R such that start(r) ≤ point(root) ≤ finish(r); all
ranges r ∈ R such that finish(r) < point(root) are stored in the left subtree of the
root; all ranges r ∈ R such that point(root) < start(r) (i.e., the remaining ranges
of R) are stored in the right subtree of the root. The ranges allocated to the left
and right subtrees of the root are allocated to nodes in these subtrees using the just
stated range allocation rule recursively. Note that the range allocation rule is quite
similar to that used for interval trees [40].

For the range allocation rule to successfully allocate all r ∈ R to exactly one
node of the PTST, the PTST must have at least one node z for which start(r) ≤
point(z) ≤ finish(r). Table 3-1 gives an example set of nonintersecting ranges, and
Figure 3-1 shows a possible PTST for this set of ranges (we say possible, because we
haven't specified how to select the point(z) values and even with specified point(z)
values, the corresponding red-black tree isn't unique). The number inside each node
is point(z), and outside each node, we give ranges(z).

Figure 3-1: A possible PTST


Table 3-1: A nonintersecting range set

range priority
[2, 100] 4
[2, 4] 33
[2, 3] 34
[8, 68] 10
[8, 50] 9
[10, 50] 20
[10, 35] 3
[15, 33] 5
[16, 30] 30
[54, 66] 18
[60, 65] 7
[69, 72] 10
[80, 80] 12

Let ranges(z) be the subset of ranges of R allocated to node z of the PTST.[1]
Since the PTST may have as many as 2n nodes and since each range of R is in exactly
one of the sets ranges(z), some of the ranges(z) sets may be empty.

[1] We have overloaded the function ranges. When u is a node, ranges(u) refers to
the ranges stored in node u of a PTST; when u is a destination address, ranges(u)
refers to the ranges that match u.

The ranges in ranges(z) may be ordered using the < relation of Definition 18.
Using this < relation, we put the ranges of ranges(z) into a red-black tree (any
balanced binary search tree structure that supports efficient search, insert, delete,
join, and split may be used) called the range search-tree or RST(z). Each node x of
RST(z) stores exactly one range of ranges(z). We refer to this range as range(x).
Every node y in the left (right) subtree of node x of RST(z) has range(y) < range(x)
(range(y) > range(x)). In addition, each node x stores the quantity mp(x), which is
the maximum of the priorities of the ranges associated with the nodes in the subtree
rooted at x. mp(x) may be defined recursively as below.

\[
mp(x) =
\begin{cases}
p(x) & \text{if } x \text{ is a leaf} \\
\max\{mp(\mathrm{leftChild}(x)),\ mp(\mathrm{rightChild}(x)),\ p(x)\} & \text{otherwise}
\end{cases}
\]

where p(x) = priority(range(x)). Figure 3-2 gives a possible RST structure for
ranges(30) of Figure 3-1. Each node shows (range(x), p(x), mp(x)).

                      [10, 35], 3, 30
                     /               \
         [8, 50], 9, 20               [15, 33], 5, 30
         /            \                             \
[8, 68], 10, 10   [10, 50], 20, 20           [16, 30], 30, 30

Figure 3-2: An example RST for ranges(30) of Figure 3-1

Lemma 26 Let z be a node in a PTST and let x be a node in RST(z). Let st(x) =
start(range(x)) and fn(x) = finish(range(x)).
1. For every node y in the right subtree of x, st(y) ≥ st(x) and fn(y) ≤ fn(x).
2. For every node y in the left subtree of x, st(y) ≤ st(x) and fn(y) ≥ fn(x).
Proof For 1, we see that when y is in the right subtree of x, range(y) >
range(x). From Definition 18, it follows that st(y) ≥ st(x). Further, since
range(y) ∩ range(x) ≠ ∅, if st(y) > st(x), then fn(y) ≤ fn(x) (Lemma 25); if
st(y) = st(x), fn(y) < fn(x) (Definition 18). The proof for 2 is similar. ∎

3.2.2 Search for hpr(d)
The highest-priority range that matches the destination address d may be found
by following a path from the root of the PTST toward a leaf of the PTST. Figure 3-3
gives the algorithm. For simplicity, this algorithm finds hp = priority(hpr(d)) rather
than hpr(d). The algorithm is easily modified to return hpr(d) instead.

Algorithm hp(d) {
   // return the priority of hpr(d)
   // easily extended to return hpr(d)
   hp = -1; // assuming 0 is the smallest priority value
   z = root; // root of PTST
   while (z != null) {
      if (d > point(z)) {
         RST(z)->hpRight(d, hp);
         z = rightChild(z);
      }
      else if (d < point(z)) {
         RST(z)->hpLeft(d, hp);
         z = leftChild(z);
      }
      else // d == point(z)
         return max{hp, mp(RST(z)->root)};
   }
   return hp;
}

Figure 3-3: Algorithm to find priority(hpr(d))

We begin by initializing hp = -1 and z is set to the root of the PTST. This
initialization assumes that all priorities are ≥ 0. The variable z is used to follow a
path from the root toward a leaf. When d > point(z), d may be matched only by ranges in

RST(z) and those in the right subtree of z. The method RST(z)->hpRight(d,hp)

(Figure 3-4) updates hp to reflect any matching ranges in RST(z). This method

makes use of the fact that d > point(z). Consider a node x of RST(z). If d > fn(x),

then d is to the right (i.e., d > finish(range(x))) of range(x) and also to the right

of all ranges in the right subtree of x. Hence, we may proceed to examine the ranges

in the left subtree of x. When d ≤ fn(x), range(x) as well as all ranges in the left

subtree of x match d. Additional matching ranges may be present in the right subtree

of x. hpLeft(d, hp) is the analogous method for the case when d < point(z).

Algorithm hpRight(d, hp) {
   // update hp to account for any ranges in RST(z) that match d
   // d > point(z)
   x = root; // root of RST(z)
   while (x != null)
      if (d > fn(x))
         x = leftChild(x);
      else {
         hp = max{hp, p(x), mp(leftChild(x))};
         x = rightChild(x);
      }
}

Figure 3-4: Algorithm hpRight(d, hp)

Complexity The complexity of the invocation RST(z)->hpRight(d,hp) is readily
seen to be O(height(RST(z))) = O(log n). Consequently, the complexity of hp(d)
is O(log^2 n). To determine hpr(d) we need only add code to the methods hp(d),
hpRight(d, hp), and hpLeft(d, hp) so as to keep track of the range whose priority is
the current value of hp. So, hpr(d) may be found in O(log^2 n) time also.
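To make the role of the mp values concrete, the following C++ sketch implements hpRight on a hand-built RST. It is a minimal illustration under our own assumptions: the node type RstNode is hypothetical, the tree is not rebalanced, hp is returned rather than updated by reference, and a missing child contributes priority -1.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical RST node: one range [st, fn] with priority p, plus
// mp = maximum priority in the subtree rooted at this node.
struct RstNode {
    int st, fn, p;
    int mp;
    RstNode* left;
    RstNode* right;
    RstNode(int s, int f, int pr, RstNode* l = nullptr, RstNode* r = nullptr)
        : st(s), fn(f), p(pr), mp(pr), left(l), right(r) {
        if (l != nullptr) mp = std::max(mp, l->mp);
        if (r != nullptr) mp = std::max(mp, r->mp);
    }
};

// Precondition: d > point(z), so every range in this RST starts before d.
// Returns hp updated with the priorities of all ranges in the RST that match d.
int hpRight(const RstNode* x, int d, int hp) {
    while (x != nullptr) {
        if (d > x->fn) {
            // range(x) and everything in its right subtree finish before d
            x = x->left;
        } else {
            // range(x) and every range in its left subtree match d
            hp = std::max(hp, x->p);
            if (x->left != nullptr) hp = std::max(hp, x->left->mp);
            x = x->right;
        }
    }
    return hp;
}
```

Built on the RST of Figure 3-2 (the ranges stored at the node with point 30), a query with d = 40 visits three nodes and reports priority 20, the priority of [10, 50].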

3.2.3 Insert a Range

A range r that is known to have no intersection with any of the existing ranges
in the router table may be inserted using the algorithm of Figure 3-5. In the while
loop, we find the node z nearest to the root such that r matches point(z) (i.e.,
start(r) ≤ point(z) ≤ finish(r)). If such a z exists, the range r is inserted into
RST(z) using the standard red-black insertion algorithm [24]. During this insertion,
it is necessary to update some of the mp values on the insert path. This update is
done easily. In case the PTST has no z such that r matches point(z), we insert a
new node into the PTST. This insertion is done using the method insertNewNode.

Algorithm insert(r) {
   // insert the nonintersecting range r
   z = root; // root of PTST
   while (z != null)
      if (finish(r) < point(z))
         z = leftChild(z);
      else if (start(r) > point(z))
         z = rightChild(z);
      else {// r matches point(z)
         RST(z)->insert(r);
         return;
      }
   // there is no node z such that r matches point(z)
   // insert a new node into PTST
   insertNewNode(r);
}

Figure 3-5: Algorithm to insert a nonintersecting range

To insert a new node into the PTST, we first create a new PTST node y and
define point(y) and RST(y). point(y) may be set to be any destination address
matched by r (i.e., any address such that start(r) ≤ point(y) ≤ finish(r) may be
used). In our implementation, we use point(y) = start(r). RST(y) has only a root
node and this root contains r; its mp value is priority(r). If the PTST is currently
empty, y becomes the new root and we are done. Otherwise, the new node y may be
inserted where the search conducted in the while loop of Figure 3-5 terminated, that
is, as a child of the last non-null value of z. Following this insertion, the traditional
bottom-up red-black rebalancing pass is made [24]. This rebalancing pass may require
color changes and at most one rotation. Color changes do not affect the tree structure.
However, a rebalancing rotation, if performed, affects the tree structure and may lead
to a violation of the range allocation rule. Rebalancing rotations are investigated in
the next section.

We note that if the number of nodes in the PTST was at most 2|R|, where |R|
is the number of ranges prior to the insertion of a new range r, then following the
insertion, |PTST| ≤ 2|R| + 1 < 2(|R| + 1), where |PTST| is the number of nodes in
the PTST and |R| + 1 is the number of ranges following the insertion of r. Hence an
insert does not violate the PTST size constraint.

Complexity Exclusive of the time required to perform the tasks associated with
a rebalancing rotation, the time required to insert a range is O(height(PTST)) =
O(log n). As we shall see in the next section, a rebalancing rotation can be done in
O(log n) time. Since at most one rebalancing rotation is needed following an insert,

the time to insert a range is O(log n). In case it is necessary for us to verify that the

range to be inserted does not intersect an existing range, we can augment the PTST

with priority search trees as in [34] and use these trees for intersection detection. The

overall complexity of an insert remains O(log n).
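The range allocation rule and the root-to-leaf lookup can be exercised with a deliberately simplified C++ sketch. The assumptions are ours: the PTST is an unbalanced binary search tree (so no rotations arise), each node keeps its ranges in a plain vector instead of an RST, and the names PtstNode, insert, and hp are illustrative rather than the thesis code.

```cpp
#include <algorithm>
#include <cassert>
#include <memory>
#include <vector>

struct Rule { int start, finish, priority; };

struct PtstNode {
    int point;
    std::vector<Rule> ranges;               // stands in for RST(z)
    std::unique_ptr<PtstNode> left, right;
    explicit PtstNode(int p) : point(p) {}
};

// Store r in the node nearest the root whose point it matches; if no such
// node exists, create one with point = start(r), as in the text.
void insert(std::unique_ptr<PtstNode>& root, const Rule& r) {
    std::unique_ptr<PtstNode>* link = &root;
    while (*link) {
        PtstNode& z = **link;
        if (r.finish < z.point)      link = &z.left;
        else if (r.start > z.point)  link = &z.right;
        else { z.ranges.push_back(r); return; }   // r matches point(z)
    }
    link->reset(new PtstNode(r.start));
    (*link)->ranges.push_back(r);
}

// Priority of hpr(d), or -1 if no range matches (priorities assumed >= 0).
int hp(const PtstNode* z, int d) {
    int best = -1;
    while (z != nullptr) {
        for (const Rule& r : z->ranges)           // linear scan replaces hpLeft/hpRight
            if (r.start <= d && d <= r.finish) best = std::max(best, r.priority);
        if (d == z->point) break;                 // subtrees cannot match d
        z = (d < z->point) ? z->left.get() : z->right.get();
    }
    return best;
}
```

Because every range lives in exactly one node on the search path of each address it matches, scanning the nodes on a single path suffices; BOB replaces the linear scan by the O(log n) RST methods and keeps the PTST balanced.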

3.2.4 Red-Black-Tree Rotations

Figures 3-6 and 3-7, respectively, show the red-black LL and RR rotations used
to rebalance a red-black tree following an insert or delete (see [24]). In these figures,
pt() is an abbreviation for point(). Since the remaining rotation types, LR and RL,
may, respectively, be viewed as an RR rotation followed by an LL rotation and an
LL rotation followed by an RR rotation, it suffices to examine LL and RR rotations
alone.

      pt(x)                    pt(y)
      /   \         LL         /   \
   pt(y)   c      ----->      a   pt(x)
   /   \                          /   \
  a     b                        b     c

Figure 3-6: LL rotation

      pt(x)                    pt(y)
      /   \         RR         /   \
     a   pt(y)    ----->    pt(x)   c
         /   \              /   \
        b     c            a     b

Figure 3-7: RR rotation

Lemma 27 Let R be a set of nonintersecting ranges. Let ranges(z) ⊆ R be the
ranges allocated by the range allocation rule to node z of the PTST prior to an LL or
RR rotation. Let ranges'(z) be this subset for the PTST node z following the rotation.
ranges(z) = ranges'(z) for all nodes z in the subtrees a, b, and c of Figures 3-6 and
3-7.

Proof Consider an LL rotation. Let ranges(subtree(x)) be the union of the ranges
allocated to the nodes in the subtree whose root is x. Since the range allocation
rule allocates each range r to the node z nearest the root such that r matches
point(z), ranges(subtree(x)) = ranges'(subtree(y)). Further, r ∈ ranges(a) iff
r ∈ ranges(subtree(x)) and finish(r) < point(y). Consequently, r ∈ ranges'(a).
From this and the fact that the LL rotation doesn't change the positioning of nodes
in a, it follows that for every node z in the subtree a, ranges(z) = ranges'(z). The
proof for the nodes in b and c as well as for the RR rotation is similar. ∎

Let x and y be as in Figures 3-6 and 3-7. From Lemma 27, it follows that
ranges(z) = ranges'(z) for all z in the PTST except possibly for z ∈ {x, y}. It is not
too difficult to see that ranges'(y) = ranges(y) ∪ S and ranges'(x) = ranges(x) - S,
where

\[
S = \{\, r \mid r \in \mathrm{ranges}(x) \land \mathrm{start}(r) \le \mathrm{point}(y) \le \mathrm{finish}(r) \,\}
\]

Since we are dealing with a set of nonintersecting ranges, all ranges in ranges(y)

are nested within the ranges of S. Figure 3-8 shows the ranges of ranges(x) using

solid lines and those of ranges(y) using broken lines. S is the set of ranges drawn

above ranges(y) (i.e., the solid lines above the broken lines).

The range rMax of S with largest start() value may be found by searching

RST(x) for the range with largest start() value that matches point(y). (Note that

rMax = msr(point(y), ranges(x)).) Since RST(x) is a binary search tree of an
ordered set of ranges (Definition 18), rMax may be found in O(height(RST(x)))
time by following a path from the root downward. If rMax doesn't exist, S = ∅,
ranges'(x) = ranges(x) and ranges'(y) = ranges(y).

Figure 3-8: ranges(x) and ranges(y) for LL and RR rotations. Nodes x and y are
as in Figures 3-6 and 3-7

Assume that rMax exists. We may use the split operation [24] to extract from
RST(x) the ranges that belong to S. The operation

RST(x)->split(small, rMax, big)

separates RST(x) into an RST small of ranges < (Definition 18) than rMax and
an RST big of ranges > than rMax. We see that RST'(x) = big and RST'(y) =
join(small, rMax, RST(y)), where join [24] combines the red-black tree small with
ranges < rMax, the range rMax, and the red-black tree RST(y) with ranges >
rMax into a single red-black tree.

The standard split and join operations of Horowitz et al. [24] need to be modified
slightly so as to update the mp values of affected nodes. This modification doesn't
affect the asymptotic complexity, which is logarithmic in the number of nodes in the
tree being split or logarithmic in the sum of the number of nodes in the two trees
being joined, of the split and join operations. So, the complexity of performing an
LL or RR rotation (and hence of performing an LR or RL rotation) in the PTST is
O(log n).
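The effect of a rotation on ranges(x) and ranges(y) can be sketched with sorted arrays standing in for the red-black RSTs. This is an illustration under our assumptions, not the split/join implementation of [24]: both arrays are kept in the order of Definition 18, in which the ranges of S form a leading block of ranges(x), so a binary search locates the cut just past rMax. The names before and moveOnRotation are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range { int start, finish; };

// The < relation of Definition 18.
bool before(const Range& r, const Range& s) {
    return r.start < s.start || (r.start == s.start && r.finish > s.finish);
}

// rangesX and rangesY are sorted by `before` and hold ranges(x) and ranges(y).
// After the call, the ranges of x that match pointY (the set S) have been
// moved to the front of rangesY, mirroring split followed by join.
void moveOnRotation(std::vector<Range>& rangesX, std::vector<Range>& rangesY,
                    int pointY) {
    // All ranges of x are nested and ordered outermost first, so the ones
    // matching pointY form a prefix; partition_point finds its end.
    auto cut = std::partition_point(
        rangesX.begin(), rangesX.end(),
        [pointY](const Range& r) { return r.start <= pointY && pointY <= r.finish; });
    rangesY.insert(rangesY.begin(), rangesX.begin(), cut);   // S precedes ranges(y)
    rangesX.erase(rangesX.begin(), cut);
}
```

With arrays the move costs time linear in the number of ranges moved; the red-black split and join achieve the same result in logarithmic time, which is what the O(log n) rotation bound relies on.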

3.2.5 Delete a Range

Figure 3-9 gives our algorithm to delete a range r. Note that if r is one of the
ranges in the PTST, then r must be in the RST of the node z that is closest to the
root and such that r matches point(z). The while loop of Figure 3-9 finds this z
and deletes r from RST(z).

Algorithm delete(r) {
   // delete the range r
   z = root; // root of PTST
   while (z != null)
      if (finish(r) < point(z))
         z = leftChild(z);
      else if (start(r) > point(z))
         z = rightChild(z);
      else {// r matches point(z)
         RST(z)->delete(r);
         cleanup(z);
         return;
      }
}

Figure 3-9: Algorithm to delete a range

Assume that r is, in fact, one of the ranges in our PTST. To delete r from
RST(z), we use the standard red-black deletion algorithm [24] modified to update
mp values as necessary. Following the deletion of r from RST(z) we perform a cleanup
operation that is necessary to maintain the size constraint of the PTST. Figure 3-10
gives the steps in the method cleanup.

Algorithm cleanup(z) {
   // maintain size constraint
   if (RST(z) is empty and the degree of z is 0 or 1)
      delete node z from the PTST and rebalance;

   while (|PTST| > 2|R|)
      delete a degree 0 or degree 1 node z with empty
      RST(z) from the PTST and rebalance;
}

Figure 3-10: Algorithm to maintain size constraint following a delete

Notice that following the deletion of r from RST(z), RST(z) may or may not be
empty. If RST(z) becomes empty and the degree of node z is either 0 or 1, node z is
deleted from the PTST using the standard red-black node deletion algorithm [24]. If
this deletion requires a rotation (at most one rotation may be required) the rotation
is done as described in Section 3.2.4. Since the number of ranges and nodes has each
decreased by 1, the size constraint may be violated (this happens if |PTST| = 2|R|
prior to the delete). Hence, it may be necessary to remove a node from the PTST to
restore the size constraint.

If RST(z) becomes empty and the degree of z is 2 or if RST(z) does not become
empty, z is not deleted from the PTST. Now, |PTST| is unchanged by the deletion of
r and |R| reduces by 1. Again, it is possible that we have a size constraint violation.
If so, up to two nodes may have to be removed from the PTST to restore the size
constraint.

The size constraint, if violated, is restored in the while loop of Figure 3-10.
This restoration is done by removing one or two (as needed) degree 0 or degree 1
nodes that have an empty RST. Lemma 28 shows that whenever the size constraint
is violated, the PTST has at least one degree 0 or degree 1 node with an empty RST.
So, the node z needed for deletion in each iteration of the while loop always exists.

Lemma 28 When the PTST has > 2n nodes, where n = |R|, the PTST has at least
one degree 0 or degree 1 node that has an empty RST.

Proof Suppose not. Then the degree of every node that has an empty RST is 2.
Let n_2 be the total number of degree 2 nodes, n_1 the total number of degree 1 nodes,
n_0 the total number of degree 0 nodes, n_e the total number of nodes that have an
empty RST, and n_n the total number of nodes that have a nonempty RST. Since all
PTST nodes that have an empty RST are degree 2 nodes, n_2 ≥ n_e. Further, since
there are only n ranges and each range is stored in exactly one RST, there are at
most n nodes that have a nonempty RST, i.e., n ≥ n_n. Thus n_2 + n ≥ n_e + n_n =
|PTST|, i.e., n_2 ≥ |PTST| - n. From [24], we know that n_0 = n_2 + 1. Hence,
n_0 + n_1 + n_2 = n_2 + 1 + n_1 + n_2 > n_2 + n_2 ≥ 2|PTST| - 2n > |PTST|. This
contradicts n_0 + n_1 + n_2 = |PTST|. ∎

To find the degree 0 and degree 1 nodes that have an empty RST efficiently, we

maintain a doubly-linked list of these nodes. Also, a doubly-linked list of degree 2

nodes that have an empty RST is maintained. When a range is inserted or deleted,

PTST nodes may be added/removed from these doubly-linked lists and nodes may

move from one list to another. The required operations can be done in O(1) time
each.
Complexity It takes O(logn) time to find the PTST node z that contains

the range r that is to be deleted. Another O(log n) time is needed to delete r from

RST(z). The cleanup step removes up to 2 nodes from the PTST. This takes another

O(log n) time. So, the overall delete time is O(log n).

3.2.6 Expected Complexity of BOB

Let maxR be the maximum number of ranges that match any destination address.
So, |ranges(z)| = |RST(z)| ≤ maxR for every node z of the PTST. We may,
therefore, restate the complexity of the BOB operations (lookup, insert, delete) as
O(log n log maxR), O(log n), and O(log n), respectively.

Sahni et al. [21] have analyzed the prefixes in several real IPv4 prefix router-tables.
They report that a destination address is matched by about 1 prefix on
average; the maximum number of prefixes that match a destination address is at
most 6. Making the assumption that this analysis holds true even for real range
router-tables (no data is available for us to perform such an analysis), we conclude
that maxR ≤ 6. So, the expected complexity of BOB on real router-tables is O(log n)
per operation.

3.3 Highest-Priority Prefix-Tables (HPPTs)-PBOB

3.3.1 The Data Structure

When all rule filters are prefixes, maxR ≤ min{n, W}. Hence, if BOB is used to
represent an HPPT, the search complexity is O(log n min{log n, log W}); the insert
and delete complexities are O(log n) each.

Since maxR ≤ 6 for real prefix router-tables, we may expect to see better
performance using a simpler structure (i.e., a structure with smaller overhead and
possibly worse asymptotic complexity) for ranges(z) than the RST structure described
in Section 3.2. In PBOB, we replace the RST in each node, z, of the BOB PTST with an
array linear list [41], ALL(z), of pairs of the form (pLength, priority), where pLength
is a prefix length (i.e., number of bits) and priority is the prefix priority. ALL(z) has
one pair for each range r ∈ ranges(z). The pLength value of this pair is the length
of the prefix that corresponds to the range r and the priority value is the priority of
the range r. The pairs in ALL(z) are in ascending order of pLength. Note that since
the ranges in ranges(z) are nested and match point(z), the corresponding prefixes
have different lengths.

3.3.2 Lookup

Figure 3-11 gives the algorithm to find the priority of the highest-priority prefix
that matches the destination address d. The method maxp() returns the highest
priority of any prefix in ALL(z) (note that all prefixes in ALL(z) match point(z)).
The method searchALL(d,hp) examines the prefixes in ALL(z) and updates hp
taking into account the priorities of those prefixes in ALL(z) that match d.

The method searchALL(d,hp) utilizes the following lemma. Consequently, it
examines prefixes of ALL(z) in increasing order of length until either all prefixes
have been examined or until the first (i.e., shortest) prefix that doesn't match d is
examined.

Algorithm hp(d) {
   // return the priority of hpp(d)
   // easily extended to return hpp(d)
   hp = -1; // assuming 0 is the smallest priority value
   z = root; // root of PTST
   while (z != null) {
      if (d == point(z))
         return max{hp, ALL(z)->maxp()};
      ALL(z)->searchALL(d,hp);
      if (d < point(z))
         z = leftChild(z);
      else
         z = rightChild(z);
   }
   return hp;
}

Figure 3-11: Algorithm to find priority(hpp(d))

Lemma 29 If a prefix in ALL(z) doesn't match a destination address d, then no
longer-length prefix in ALL(z) matches d.

Proof Let p_1 and p_2 be prefixes in ALL(z). Let l_i be the length of p_i. Assume
that l_1 < l_2 and that p_1 doesn't match d. Since both p_1 and p_2 match point(z), p_2
is nested within p_1. Therefore, all destination addresses that are matched by p_2 are
also matched by p_1. So, p_2 doesn't match d. ∎

One way to determine whether a length l prefix of ALL(z) matches d is to use
the following lemma. The check of this lemma may be implemented using a mask to
extract the most-significant bits of point(z) and d.

Lemma 30 A length l prefix p of ALL(z) matches d iff the most-significant l bits of
point(z) and d are the same.

Proof Straightforward. ∎

Complexity We assume that the masking operations can be done in O(1) time
each. (In IPv4, for example, each mask is 32 bits long and we may extract any subset
of bits from a 32-bit integer by taking the logical and of the appropriate mask and
the integer.) The number of PTST nodes reached in the while loop of Figure 3-11
is O(log n) and the time spent at each node z that is reached is linear in the number
of prefixes in ALL(z) that match d. Since the PTST has at most maxR prefixes that
match d, the complexity of our lookup algorithm is O(log n + maxR) = O(W) (note
that log_2 n ≤ W and maxR ≤ W).
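The mask test of Lemma 30 and the early exit justified by Lemma 29 can be sketched for IPv4 (W = 32) as follows. The names sameTopBits, Entry, and the free-function form of searchALL are our assumptions; in particular, hp is returned here rather than updated in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// True iff the l most significant bits of a and b agree (0 <= l <= 32).
bool sameTopBits(std::uint32_t a, std::uint32_t b, unsigned l) {
    if (l == 0) return true;                          // avoid a shift by 32
    const std::uint32_t mask = ~std::uint32_t{0} << (32 - l);
    return ((a ^ b) & mask) == 0;
}

struct Entry {
    unsigned pLength;   // prefix length in bits
    int priority;
};

// all holds the pairs of ALL(z) in ascending order of pLength; every prefix
// in it matches pointZ. Returns hp updated with the priorities of the
// prefixes that also match d.
int searchALL(const std::vector<Entry>& all, std::uint32_t pointZ,
              std::uint32_t d, int hp) {
    for (const Entry& e : all) {
        if (!sameTopBits(pointZ, d, e.pLength))
            break;                                    // Lemma 29: longer ones cannot match
        hp = std::max(hp, e.priority);
    }
    return hp;
}
```

Each test is a constant number of word operations, so the time spent at a node is proportional to the number of matching prefixes plus one, as the complexity argument above assumes.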

3.3.3 Insertion and Deletion

The PBOB algorithms to insert/delete a prefix are simple adaptations of the
corresponding algorithms for BOB. rMax is found by examining the prefixes in ALL(x)
in increasing order of length. ALL'(y) is obtained by prepending the prefixes in
ALL(x) whose length is ≤ the length of rMax to ALL(y), and ALL'(x) is obtained
from ALL(x) by removing the prefixes whose length is ≤ the length of rMax. The
time required to find rMax is O(maxR). This is also the time required to compute
ALL'(y) and ALL'(x). The overall complexity of an insert/delete operation is
O(log n + maxR) = O(W).

As noted earlier, maxR ≤ 6 in practice. So, in practice, PBOB takes O(log n)
time and makes O(log n) cache misses per operation.

3.4 Longest-Matching Prefix-Tables (LMPTs)-LMPBOB

3.4.1 The Data Structure

Using priority = pLength, a PBOB may be used to represent an LMPT, obtaining
the same performance as for an HPPT. However, we may achieve some reduction
in the memory required by the data structure if we replace the array linear list that
is stored in each node of the PTST by a W-bit vector, bit. bit(z)[i] denotes the ith
bit of the bit vector stored in node z of the PTST; bit(z)[i] = 1 iff ALL(z) has a
prefix whose length is i. We note that Suri et al. [20] use W-bit vectors to keep track
of prefix lengths in their data structure also.

3.4.2 Lookup

Figure 3-12 gives the algorithm to find the length of the longest matching-prefix,
lmp(d), for destination d. The method longest() returns the largest i such that
bit(z)[i] = 1 (i.e., it returns the length of the longest prefix stored in node z). The
method searchBitVector(d,hp,k) examines bit(z) and updates hp taking into account
the lengths of those prefixes in this bit vector that match d. The method
same(k+1, point(z), d) returns true iff point(z) and d agree on their k + 1 most
significant bits.

Algorithm lmp(d) {
   // return the length of lmp(d)
   // easily extended to return lmp(d)
   hp = 0; // length of lmp
   k = 0; // next bit position to examine is k+1
   z = root; // root of PTST
   while (z != null) {
      if (d == point(z))
         return max{k, z->longest()};
      bit(z)->searchBitVector(d,hp,k);
      if (d < point(z))
         z = leftChild(z);
      else
         z = rightChild(z);
   }
   return hp;
}

Figure 3-12: Algorithm to find length(lmp(d))

Algorithm searchBitVector(d,hp,k) {
   // update hp and k
   while (k < W && same(k+1, point(z), d)) {
      if (bit(z)[k+1] == 1)
         hp = k+1;
      k++;
   }
}

Figure 3-13: Algorithm to search a bit vector for prefixes that match d

The method searchBitVector(d,hp,k) (Figure 3-13) utilizes the next two lemmas.
Lemma 31 If bit(z)[i] corresponds to a prefix that doesn't match the destination
address d, then bit(z)[j], j > i, corresponds to a prefix that doesn't match d.

Proof bit(z)[q] corresponds to the prefix p_q whose length is q and which equals the
q most significant bits of point(z). So, p_i matches all points that are matched by p_j.
Hence, if p_i doesn't match d, p_j doesn't match d either. ∎

Lemma 32 Let w and z be two nodes in a PTST such that w is a descendant of z.
Suppose that z->bit(q) corresponds to a prefix p_q that matches d. w->bit(j),
j < q, cannot correspond to a prefix that matches d.

Proof Suppose that w->bit(j) corresponds to the prefix p_j, p_j matches d, and
j < q. So, p_j equals the j most significant bits of d. Since p_q matches d and also
point(z), d and point(z) have the same q most significant bits. Therefore, p_j matches
point(z). So, by the range allocation rule, p_j should be stored in node z and not in
node w, a contradiction. ∎

Complexity We assume that the method same can be implemented using masks
and Boolean operations so as to have complexity O(1). Since a bit vector has the same
number of bits as does a destination address, this assumption is consistent with the
implicit assumption that arithmetic on destination addresses takes O(1) time. The
total time spent in all invocations of searchBitVector is O(W + log n). The time
spent in the remaining steps of lmp(d) is O(log n). So, the overall complexity of
lmp(d) is O(W + log n) = O(W). Even though the time complexity is O(W), the
number of cache misses is O(log n) (note that each bit vector takes the same amount
of space as needed to store a destination address).
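The bit-vector scan of Figure 3-13 can be sketched in C++ for W = 32. The storage layout is our assumption, chosen for clarity rather than taken from the thesis: bit i of a 64-bit word is set iff the node stores a prefix of length i, for 1 ≤ i ≤ 32. The counter k persists across nodes, so by Lemma 32 no bit position is examined twice along a search path.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned W = 32;

// True iff the l most significant bits of a and b agree (1 <= l <= 32).
bool same(unsigned l, std::uint32_t a, std::uint32_t b) {
    const std::uint32_t mask = ~std::uint32_t{0} << (32 - l);
    return ((a ^ b) & mask) == 0;
}

// bits: assumed layout, bit i set iff node z stores a prefix of length i.
// hp is the length of the longest matching prefix found so far and k the
// number of leading bits of d already examined; both are updated in place.
void searchBitVector(std::uint64_t bits, std::uint32_t pointZ, std::uint32_t d,
                     unsigned& hp, unsigned& k) {
    while (k < W && same(k + 1, pointZ, d)) {
        if ((bits >> (k + 1)) & 1u)
            hp = k + 1;          // a stored prefix of length k+1 matches d
        ++k;
    }
}
```

Since k only increases, the loop body runs at most W times over the whole lookup, which gives the O(W + log n) total stated above.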

3.4.3 Insertion and Deletion

The insert and delete algorithms are similar to the corresponding algorithms for

HPPTs. The essential difference are as below.

1. Rather than insert or delete a prefix from an ALL(z), we set bit(z)[1], where 1

is the length of the prefix being inserted or deleted, to 1 or 0, respectively.

2. For a rotation, we do not look for rMax in bit(x). Instead, we find the largest

integer iMax such that the prefix that corresponds to bit(x)[iMax] matches

point(y). The first (bit 0 comes before bit 1) iMax bits of bit'(y) are the

first iMax bits of bit(x) and the remaining bits of bit'(y) are the same as the

corresponding bits of bit(y). bit'(x) is obtained from bit(x) by setting its first

iMax bits to 0.

Complexity iMax may be determined in O(log W) time using binary search;

bit'(x) and bit'(y) may be computed in 0(1) time using masks and boolean operations.

The remaining tasks performed during an insert or delete take O(log n) time. So, the

overall complexity of an insert or delete operation is O(log n+log W) = O(log(Wn)).

The number of cache misses is O(log n).
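
The mask computation used during a rotation can be sketched as follows. This is an illustration under stated assumptions, not our implementation: bit vectors are held in a 64-bit word, bit i stands for the prefix of length i, and the bits moved from x to y are bits 0 through iMax inclusive. The names BitVec, lowMask, and rotateBits are hypothetical. The caller is assumed to have found iMax already and to skip the call when no prefix in bit(x) matches point(y).

```cpp
#include <cstdint>
#include <utility>

// Assumption: bit i of a vector stands for a prefix of length i (0 <= i <= 63).
using BitVec = std::uint64_t;

// Mask with bits 0..iMax set.
inline BitVec lowMask(unsigned iMax) {
    return iMax >= 63 ? ~BitVec{0} : ((BitVec{1} << (iMax + 1)) - 1);
}

// Returns {bit'(x), bit'(y)}: the prefixes of lengths 0..iMax move from x to y.
inline std::pair<BitVec, BitVec> rotateBits(BitVec bitX, BitVec bitY, unsigned iMax) {
    const BitVec m = lowMask(iMax);
    const BitVec newY = (bitX & m) | (bitY & ~m);  // low bits from x, the rest from y
    const BitVec newX = bitX & ~m;                 // x loses the moved prefixes
    return {newX, newY};
}
```

Both results need only a constant number of word operations, so the O(log W) search for iMax dominates the bit-vector work of a rotation.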

3.5 Implementation Details and Memory Requirement

3.5.1 Memory Management

We implemented our data structures in C++. Since dynamic memory allocation

and deallocation using C++'s methods new and delete are very time consuming,

we implemented our own methods to manage memory. We maintained our own list

of free memory. Whenever this list was exhausted, we used the new method to get

a large chunk of memory to add to our free list. Memory was then allocated from

this large chunk as needed by our data structures. Whenever memory was to be

deallocated, it was put back on to our free list.
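
The free-list scheme described above can be sketched as a fixed-size block pool. The class below is a minimal illustration, assuming all blocks in a pool have the same size and that blocksPerChunk is positive; the name NodePool and its interface are hypothetical and do not reproduce our actual code. For portability the sketch rounds block sizes up to the platform's maximum alignment rather than to the 4-byte boundaries discussed later in this section.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Sketch of a free-list allocator for fixed-size nodes (hypothetical interface).
class NodePool {
public:
    NodePool(std::size_t blockSize, std::size_t blocksPerChunk)
        : blockSize_(roundUp(blockSize)), blocksPerChunk_(blocksPerChunk) {}
    ~NodePool() { for (char* c : chunks_) ::operator delete(c); }
    NodePool(const NodePool&) = delete;
    NodePool& operator=(const NodePool&) = delete;

    // Take a block from the free list, fetching a new large chunk if the list is empty.
    void* allocate() {
        if (!free_) refill();
        FreeBlock* b = free_;
        free_ = b->next;
        return b;
    }
    // Put a block back on the free list; no memory is returned to the system.
    void deallocate(void* p) { free_ = new (p) FreeBlock{free_}; }

    std::size_t chunkCount() const { return chunks_.size(); }

private:
    struct FreeBlock { FreeBlock* next; };

    static std::size_t roundUp(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        if (n < sizeof(FreeBlock)) n = sizeof(FreeBlock);
        return (n + a - 1) / a * a;
    }
    // One call to operator new per chunk; the chunk is carved into blocks.
    void refill() {
        char* chunk = static_cast<char*>(::operator new(blockSize_ * blocksPerChunk_));
        chunks_.push_back(chunk);
        for (std::size_t i = 0; i < blocksPerChunk_; ++i)
            deallocate(chunk + i * blockSize_);
    }

    std::size_t blockSize_;
    std::size_t blocksPerChunk_;
    FreeBlock* free_ = nullptr;
    std::vector<char*> chunks_;
};
```

With such a pool, allocating or freeing a node costs a few pointer assignments, and the general-purpose allocator is called only once per chunk.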

3.5.2 BOB

As described in Section 3.2, each node z of the PTST of BOB has the following

fields: color, point(z), RST, leftChild, and rightChild. To improve the lookup per-

formance of BOB, we added the following fields: maxPriority (maximum priority of

the ranges in ranges(z)), minSt (smallest starting point of the ranges in ranges(z)),

and maxFn (largest finish point of the ranges in ranges(z)). Correspondingly, the

statements RST(z)->hpRight(d,hp) and RST(z)->hpLeft(d,hp) of Figure 3-3 are executed

only when maxPriority > hp && d <= maxFn and maxPriority > hp && minSt <= d,

respectively.

With the added fields, each node of the PTST has 8 fields. For the color and

maxPriority fields, we allocate 1 byte each. Assuming 4 bytes for each of the re-

maining fields, we get a node size of 26 bytes. For improved cache performance, it

is desirable to align nodes to 4-byte memory boundaries. This alignment is simplified

if the node size is an integral multiple of 4 bytes. Therefore, for practical purposes, the

PTST node-size becomes 28 bytes.
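
The size arithmetic above can be checked with a small sketch. The struct is a hypothetical layout in which 32-bit indices stand in for the 4-byte pointers of our 32-bit implementation; the field order and the names BobPtstNode and paddedBytes are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical layout of a BOB PTST node; 32-bit indices replace 4-byte pointers.
struct BobPtstNode {
    std::uint8_t  color;        // 1 byte
    std::uint8_t  maxPriority;  // 1 byte
    std::uint32_t point;        // the remaining six fields take 4 bytes each
    std::uint32_t rst;
    std::uint32_t minSt;
    std::uint32_t maxFn;
    std::uint32_t leftChild;
    std::uint32_t rightChild;
};

// Round a raw size up to the next multiple of align.
constexpr unsigned paddedBytes(unsigned raw, unsigned align) {
    return (raw + align - 1) / align * align;
}

constexpr unsigned rawBytes = 2 * 1 + 6 * 4;  // 26 bytes before padding
static_assert(rawBytes == 26, "two 1-byte fields plus six 4-byte fields");
static_assert(paddedBytes(rawBytes, 4) == 28, "padded to a 4-byte multiple");
```

On common compilers sizeof(BobPtstNode) is also 28, because the compiler inserts the same two padding bytes after the 1-byte fields, although the language does not guarantee this.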

In our implementation of hpRight (Figure 3-4), the while loop conditional was

changed from x != null to x != null && mp > hp. A corresponding change was

made to hpLeft.

The nodes of an RST have the following fields: color, mp, st, fn, p, leftChild,

and rightChild. Using 1 byte for the color, p, and mp fields each, and 4 bytes for

each of the remaining fields, the size of an RST node becomes 19 bytes. Again, for

ease of alignment to 4-byte boundaries, we make the RST-node size 20 bytes. In

addition to nodes, every nonempty RST has the fields root (pointer to root of RST)

and rank (rank of red-black tree). Each of these fields is a 4-byte field.

For the doubly-linked lists of PTST nodes with an empty RST, we used the

minSt and maxFn fields to, respectively, represent left and right pointers. So, there

is no space overhead (other than the space needed to keep track of the first node)

associated with maintaining the two doubly-linked lists of PTST nodes that have an

empty RST.

Since an instance of BOB may have up to 2n PTST nodes, n nonempty RSTs, and

n RST nodes, the maximum space/memory required by BOB is 28*2n + 8*n + 20*n =

84n bytes.

3.5.3 PBOB

The required fields in each node z of the PTST of PBOB are: color, point(z),

ALL, size, length, leftChild, and rightChild, where ALL is a one-dimensional array,

each entry of which has the subfields pLength and priority; size is the dimension

of the array, and length is the number of pairs currently in the array linear list.

The array ALL initially has enough space to accommodate 4 pairs (pLength, priority).

When the capacity of an ALL is exceeded, the size of the ALL is increased by 4

pairs (since at most 6 pairs are expected in an ALL, the size of an ALL needs to

be increased at most once; in theory, an ALL may get as many as W pairs and, in

theory, using array doubling as in [41] may work better than increasing the array size

by 4 each time array capacity is exceeded).
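
The grow-by-4 policy for an ALL can be sketched as follows. This is a minimal illustration, assuming that the pairs are kept sorted by increasing pLength (so that the shortest prefix is first) and that no two pairs in one ALL have the same pLength; the class name ArrayLinearList and its interface are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>

struct Pair { std::uint8_t pLength; std::uint8_t priority; };

// Sketch of an array linear list with initial capacity 4 that grows by 4 pairs.
class ArrayLinearList {
public:
    ArrayLinearList() : data_(new Pair[4]), size_(4), length_(0) {}
    ~ArrayLinearList() { delete[] data_; }
    ArrayLinearList(const ArrayLinearList&) = delete;
    ArrayLinearList& operator=(const ArrayLinearList&) = delete;

    // Insert p, keeping the pairs ordered by increasing pLength.
    void insert(Pair p) {
        if (length_ == size_) grow();
        unsigned i = length_;
        while (i > 0 && data_[i - 1].pLength > p.pLength) {
            data_[i] = data_[i - 1];  // shift longer prefixes one slot right
            --i;
        }
        data_[i] = p;
        ++length_;
    }
    unsigned size() const { return size_; }      // capacity of the array
    unsigned length() const { return length_; }  // number of pairs stored
    const Pair& operator[](unsigned i) const { return data_[i]; }

private:
    // Increase the capacity by 4 pairs, copying the existing pairs.
    void grow() {
        Pair* bigger = new Pair[size_ + 4];
        std::copy(data_, data_ + length_, bigger);
        delete[] data_;
        data_ = bigger;
        size_ = static_cast<std::uint8_t>(size_ + 4);
    }

    Pair* data_;
    std::uint8_t size_;
    std::uint8_t length_;
};
```

Replacing size_ + 4 by 2 * size_ in grow gives the array-doubling alternative mentioned above.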

To improve the lookup performance of PBOB, the field maxPriority (maxi-

mum priority of the prefixes in ALL(z)), may be added. Note that minSt (smallest

starting point of the prefixes in ALL(z)), and maxFn (largest finish point of the pre-

fixes in ALL(z)) are easily computed from point(z) and the pLength of the shortest

(i.e., first) prefix in ALL(z). When the nodes of the PTST are augmented with a

maxPriority field, the expression ALL(z)->maxp() in Figure 3-11 may be changed to

maxPriority(z), and the statement ALL(z)->searchALL(d,hp) executed only when

maxPriority > hp && minSt <= d && d <= maxFn.

Since searchALL does its first check against the shortest prefix in the array

linear-list and this check tests minSt <= d && d <= maxFn, it is sufficient to execute

the statement ALL(z)->searchALL(d,hp) only when maxPriority > hp.

Using 1 byte for each of the fields: color, size, length, maxPriority, pLength,

and priority; and 4 bytes for each of the remaining fields, the initial size of a PTST

node of PBOB is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty ALL, we used the 8

bytes of memory allocated to the empty array ALL to, respectively, represent left

and right pointers. So, there is no space overhead (other than the space needed to

keep track of the first node) associated with maintaining the two doubly-linked lists

of PTST nodes that have an empty ALL.

Since an instance of PBOB may have up to 2n PTST nodes, the minimum

space/memory required by these 2n PTST nodes is 24 * 2n = 48n bytes. However,

some PTST nodes may have more than 4 pairs in their ALL. Since each such node holds

at least 5 of the n prefixes, there can be at most n/5 such nodes, and each needs 8

additional bytes. So, the maximum space requirement of PBOB is 48n + 8n/5 = 49.6n bytes.


3.5.4 LMPBOB

In the case of LMPBOB, each node z of the PTST has the following fields: color,

point(z), bit, leftChild, and rightChild.

To improve the lookup performance of LMPBOB, the fields minLength (minimum

of lengths of prefixes in bit(z)) and maxLength may be added. When the nodes of

the PTST are augmented with a minLength and a maxLength field, we replace the

statement bit(z)->searchBitVector(d,hp,k) of Figure 3-12 by

if (same(minLength, point(z), d)) {

hp = k = minLength;



Observe that maxLength of LMPBOB is equivalent to maxPriority of BOB

and PBOB.

Using 1 byte for each of the fields: color, minLength, and maxLength; 8 bytes

for bit (this analysis is for IPv4); and 4 bytes for each of the remaining fields, the size

of a PTST node of LMPBOB is 23 bytes. Again, to easily align PTST nodes along

4-byte boundaries, we pad an LMP PTST node so that its size is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty bit vector, we used the

8 bytes of memory allocated to the empty bit vector bit to represent left and right

pointers. So, there is no space overhead (other than the space needed to keep track

of the first node) associated with maintaining the two doubly-linked lists of PTST

nodes that have an empty bit.

Since an instance of LMPBOB may have up to 2n PTST nodes, the

space/memory required by these 2n PTST nodes is 24 * 2n = 48n bytes.

3.6 Experimental Results

3.6.1 Test Data and Memory Requirement

We implemented the BOB, PBOB, and LMPBOB data structures and associated

algorithms in C++ as described in Section 3.5 and measured their performance

on a 1.4GHz PC. To assess the performance of these data structures, we used six IPv4

prefix databases obtained from [38].² We assigned each prefix a priority equal to its

length. Hence, BOB, PBOB, and LMPBOB were all used in a longest matching-prefix

mode. For dynamic router-tables that use the longest matching-prefix tie breaker, the

PST structure of Lu et al. [33, 34] provides O(logn) lookup, insert, and delete. So,

we included the PST in our experimental evaluation of BOB, PBOB, and LMPBOB.

The number of prefixes in each of our 6 databases as well as the memory requirement

for each database of prefixes are shown in Table 3-2. For the memory requirement,

we performed two measurements. Measure1 gives the memory used by

a data structure that is the result of a series of insertions made into an initially empty

instance of the data structure. For Measure1, less than 1% of the PTST nodes in

the constructed BOB, PBOB, and LMPBOB instances are empty. So, these data

structures use close to the minimum amount of memory they could use. Measure2

gives the memory used after 75% of the prefixes in the data structure constructed for

Measure1 are deleted. In the resulting BOB, PBOB, and LMPBOB instances, almost

half the PTST nodes are empty. The databases Paix1, Pb1, MaeWest, and Aads were

obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained on Sep 13, 2000.

Figures 3-14 and 3-15 histogram the data of Table 3-2. The memory required by PBOB

and LMPBOB is the same when rounded to the nearest KB. This is so because in

each of these structures, the number of PTST nodes is the same; the minimum size of

a PTST node in PBOB is 24 bytes, very few PTST nodes of PBOB are bigger than

24 bytes because the average value of |ranges(z)| is about 1 for our data sets and the

maximum value is at most 6; and the size of a PTST node in LMPBOB is 24 bytes. In

Measure1, the memory required by BOB is about 2.38 times that required by PBOB

and LMPBOB. However, in Measure2, this ratio is about 1.75. Also, note that, in

Measure1, PST takes slightly more memory than does BOB, whereas, in Measure2,

BOB takes about 50% more memory than does PST. We note also that the memory

requirement of PST may be reduced by about 50% using a priority-search-tree

implementation different from that used in [33]. Of course, using this more memory

efficient implementation would increase the run-time of PST.

² Our experiments are limited to prefix databases because range databases are not
available for benchmarking.

3.6.2 Preliminary Timing Experiments

We performed preliminary experiments to determine the effectiveness of the

changes suggested in Section 3.5. Since these changes are only to the lookup

algorithm, our preliminary timing experiments measured only the lookup times for the

BOB, PBOB, and LMPBOB data structures. To obtain the mean lookup-time, we

Table 3-2: Memory usage

Database                  Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Num of Prefixes           16172  22225    28889  31827  35303  85988
Measure1 (KB)  PST          884   1215     1579   1740   1930   4702
               BOB          851   1176     1526   1682   1876   4527
               PBOB         357    495      642    708    790   1901
               LMPBOB       357    495      642    708    790   1901
Measure2 (KB)  PST          221    303      395    435    482   1175
               BOB          331    455      592    652    723   1760
               PBOB         189    260      338    372    413   1007
               LMPBOB       189    260      338    372    413   1007



Figure 3-14: Memory usage-measure1

Figure 3-15: Memory usage-measure2

started with a BOB, PBOB, or LMPBOB that contained all prefixes of a prefix

database. Next, we created a list of the start points of the ranges corresponding to

the prefixes in a database and then added 1 to each of these start points. Call this

list L. A random permutation of L was generated and this permutation determined

the order in which we searched for the longest matching-prefix for each of the addresses

in L. The time required to determine all of these longest-matching prefixes was

measured and averaged over the number of addresses in L (actually, since the time

to perform all these lookups was too small to measure accurately, we repeated the

lookup for all addresses in L several times and then averaged). The experiment was

repeated 10 times, each time using a different random permutation of L, and the mean

of these average times was computed. The mean time for the implementation described

in Section 3.5 is the base lookup-time.
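
The measurement procedure just described can be sketched as a small harness. This is an illustration, not the code used for our experiments: the function name meanLookupMicros, its parameters, and the use of std::chrono and std::mt19937 are all assumptions, and lookup stands for any callable that maps a destination address to an unsigned result. The address list is assumed to be nonempty.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <random>
#include <vector>

// Mean time per lookup in microseconds, averaged over several random
// permutations of the address list (hypothetical harness).
template <class Lookup>
double meanLookupMicros(std::vector<std::uint32_t> addrs, Lookup lookup,
                        int permutations = 10, int passes = 100,
                        unsigned seed = 1) {
    std::mt19937 rng(seed);
    volatile std::uint32_t sink = 0;  // keeps the lookups from being optimized away
    double sumOfAverages = 0.0;
    for (int r = 0; r < permutations; ++r) {
        std::shuffle(addrs.begin(), addrs.end(), rng);  // new search order
        const auto t0 = std::chrono::steady_clock::now();
        // One pass is too short to time accurately, so repeat the whole list.
        for (int p = 0; p < passes; ++p)
            for (std::uint32_t d : addrs) sink = sink + lookup(d);
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::micro> dt = t1 - t0;
        sumOfAverages += dt.count() / (double(passes) * addrs.size());
    }
    return sumOfAverages / permutations;  // mean of the per-permutation averages
}
```

The standard deviation across permutations can be obtained in the same loop by also accumulating the squares of the per-permutation averages.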

For BOB, we found that omitting the predicates d <= maxFn and minSt <= d

resulted in a mean lookup time that is approximately twice the base lookup time.

On the other hand, elimination of the predicate maxPriority > hp reduces the mean

lookup time by about 2%. Even though the use of the predicate maxPriority > hp

increased the lookup time slightly on our test data, we believe this is a good heuristic

for data sets in which the priorities are not highly correlated with the lengths of the

prefixes or ranges. So, our remaining experiments retained this predicate. Eliminating

the predicate mp > hp had no noticeable effect on mean lookup time. This is to

be expected on our data sets, because for these data sets, the maximum value of

|ranges(z)| is ≤ maxR = 6. The predicate mp > hp is expected to be effective

on data sets with a larger value of maxR. So, we retained this predicate for our

remaining tests.

For PBOB, elimination of the predicate hp < maxPriority results in a very slight

decrease in the mean lookup time relative to the base case. However, we expect that

for data sets in which the priority isn't highly correlated with the prefix length, this

predicate will actually reduce lookup time. Therefore, for further experiments, we

retain this predicate in our lookup code.

In the case of LMPBOB, the introduction of the statement hp = k = minLength

into the base code results in a lookup time that is 15% less than when this statement

is removed.

3.6.3 Run-Time Experiments

We measured the mean lookup-time as described in Section 3.6.2. The standard

deviation in the average times across the 10 repetitions described in Section 3.6.2 was

also computed. These mean times and standard deviations are reported in Table 3-3.

The mean times are also histogrammed in Figure 3-16. It is interesting to note that

PBOB, which can handle prefix tables with arbitrary priority assignments is actually

20% to 30% faster than PST, which is limited to prefix tables that employ the longest

matching-prefix tie breaker. Further, lookups in BOB, which can handle range tables

with arbitrary priorities are slightly slower than in PST. LMPBOB, which, like PST,

is designed specifically for longest-matching-prefix lookups is slightly inferior to the

more general PBOB.

Figure 3-16: Search time

To obtain the mean insert-time, we started with a random permutation of the

prefixes in a database, inserted the first 67% of the prefixes into an initially empty data

structure, measured the time to insert the remaining 33%, and computed the mean

insert time by dividing by the number of prefixes in 33% of the database. (Once again,

Table 3-3: Prefix times on a 1.4GHz Pentium 4 PC with an 8K L1 data cache and a
256K L2 cache

Database                  Paix1   Pb1  MaeWest  Aads   Pb2  Paix2
Search   PST     Mean      1.20  1.35     1.49  1.53  1.57   1.96
(μsec)           Std       0.01  0.01     0.04  0.01  0.00   0.01
         BOB     Mean      1.22  1.39     1.54  1.56  1.62   2.19
                 Std       0.01  0.02     0.02  0.02  0.02   0.01
         PBOB    Mean      0.82  0.98     1.10  1.15  1.20   1.60
                 Std       0.01  0.01     0.01  0.01  0.01   0.01
         LMPBOB  Mean      0.87  1.03     1.17  1.21  1.27   1.69
                 Std       0.01  0.01     0.01  0.01  0.01   0.01
Insert   PST     Mean      2.17  2.35     2.53  2.60  2.64   3.03
(μsec)           Std       0.07  0.04     0.03  0.01  0.05   0.01
         BOB     Mean      1.70  1.89     2.06  2.10  2.16   2.55
                 Std       0.06  0.06     0.05  0.05  0.05   0.03
         PBOB    Mean      1.04  1.25     1.39  1.44  1.51   1.93
                 Std       0.06  0.05     0.00  0.05  0.05   0.06
         LMPBOB  Mean      1.06  1.29     1.47  1.50  1.57   1.98
                 Std       0.07  0.07     0.06  0.06  0.04   0.01
Delete   PST     Mean      1.72  1.87     2.06  2.09  2.11   2.48
(μsec)           Std       0.04  0.05     0.05  0.06  0.04   0.06
         BOB     Mean      1.04  1.13     1.26  1.27  1.32   1.69
                 Std       0.06  0.05     0.04  0.05  0.06   0.06
         PBOB    Mean      0.68  0.82     0.90  0.91  0.97   1.30
                 Std       0.07  0.06     0.05  0.06  0.03   0.05
         LMPBOB  Mean      0.67  0.82     0.89  0.92  0.95   1.26
                 Std       0.06  0.06     0.05  0.05  0.03   0.05
Num of Copies                15    11        9     8     8      3

since the time to insert the remaining 33%

of the prefixes was too small to measure

accurately, we started with several copies of the data structure and inserted the 33%

prefixes into each copy; measured the time to insert in all copies; and divided by the

number of copies and number of prefixes inserted). This experiment was repeated

10 times, each time starting with a different permutation of the database prefixes,

and the mean of the mean as well as the standard deviation in the mean computed.

These latter two quantities as well as the number of copies of each data structure

we used for the inserts are given in Table 3-3. Figure 3-17 histograms the mean

insert-time. As can be seen, insertions into PBOB take roughly 36% to 52% less

time than do insertions into PST (by the mean times of Table 3-3); insertions into

LMPBOB take slightly more time than do insertions into PBOB; and insertions into

PST take 20% to 25% more time than do insertions into BOB.

Figure 3-17: Insert time

The mean and standard deviation data reported for the delete operation in Table 3-3

and Figure 3-18 were obtained in a similar fashion by starting with a data

structure that had 100% of the prefixes in the database and measuring the time to

delete a randomly selected 33% of these prefixes. Deletion from PBOB takes less

than 50% of the time required to delete from a PST. For the delete operation, however,

LMPBOB is slightly faster than PBOB. Deletions from BOB take about 40%

less time than do deletions from PST.

3.7 Conclusion

Table 3-4 gives the worst-case memory required by each of the data structures.

The data of this table are for IPv4. When comparing these memory requirement

data, we should keep in mind that BOB, PBOB, and LMPBOB have different capabilities.

BOB works for highest-priority matching with nonintersecting ranges;

PBOB is limited to highest-priority matching with prefixes; and LMPBOB is limited

to longest-length matching with prefixes. The PST structure of Lu et al. [33] has the

same restrictions as does LMPBOB.

Figure 3-18: Delete time

Table 3-4: Node sizes and worst-case memory requirement in bytes for IPv4 router
tables

                   BOB                  PBOB     LMPBOB   PST
Node Size          PTST(28), RST(20)    ≥ 24     24       28
Memory Required    84n                  49.6n    48n      56n

Table 3-5 gives the asymptotic time complexity and Table 3-6 gives the asymptotic

cache misses for our data structures. In these tables, maxR is the maximum

number of ranges or prefixes that match any destination address and maxL is the

maximum number of cache-lines needed by any of the array linear-lists stored in a

PTST node. For LMPBOB, it is assumed that mask operations on W-bit vectors

take O(1) time and that an entire W-bit vector can be accessed with O(1) cache

misses.

Table 3-5: Time complexity

          BOB                  PBOB               LMPBOB                PST
Search    O(log n log maxR)    O(log n + maxR)    O(log n + W)          O(log n)
Insert    O(log n)             O(log n + maxR)    O(log n + log W)      O(log n)
Delete    O(log n)             O(log n + maxR)    O(log n + log W)      O(log n)

Table 3-6: Cache misses

          BOB                  PBOB               LMPBOB                PST
Search    O(log n log maxR)    O(log n + maxL)    O(log n)              O(log n)
Insert    O(log n)             O(log n + maxL)    O(log n)              O(log n)
Delete    O(log n)             O(log n + maxL)    O(log n)              O(log n)

Our experiments show that PBOB is to be preferred over PST and LMPBOB for

the representation of dynamic longest-matching prefix-router-tables. This is some-

what surprising because PBOB may be used for highest-priority prefix-router-tables,

not just longest-matching prefix-router-tables. A possible reason why PBOB is faster

than LMPBOB is that in LMPBOB one has to check O(W) prefix lengths, whereas

in PBOB O(maxR) lengths are checked (note that in our test databases, W = 32 and

maxR ≤ 6). BOB is slower than and requires more memory than PBOB when tested

with longest-matching prefix-router tables. The same relative performance between

BOB and PBOB is expected when filters are prefixes with arbitrary priority. Of the

data structures considered in this chapter, BOB, of course, remains the only choice

when the filters are ranges that have an associated priority.

Although the range allocation rule used by our data structures is similar to

that used in an interval tree [40], the unique feature of our structures is the 2n size

constraint. The size constraint is essential for O(log n) update.


CHAPTER 4
A B-TREE DYNAMIC ROUTER-TABLE DESIGN

In this chapter, we focus on B-tree data structures for dynamic NHPRTs and

LMPTs. We are interested in the B-tree, because by varying the order of the B-

tree, we can control the height of the tree and hence control the number of cache

misses incurred when performing a rule-table operation. Although Suri et al. [20]

have proposed a B-tree data structure for dynamic prefix-tables, their structure has

the following shortcomings:

1. A prefix may be stored in O(m) nodes at each level of the order m B-tree. This

results in excessive cache misses during the insert and delete operations.

2. Some of the prefix end-points are stored twice in the B-tree. This is because every

endpoint is stored in a leaf node and some of the endpoints are additionally

stored in interior nodes. This duplication in end-point storage increases the memory

requirement.


Our proposed B-tree structure doesn't suffer from these shortcomings. In our structure,

each prefix is stored in O(1) nodes at each level, and each prefix end-point is

stored once. Consequently, even though the asymptotic complexity of performing

dynamic prefix-table operations is the same in both structures and the asymptotic

memory requirements of both are the same, our structure is faster for the insert and

delete operations and also takes less memory.

In Section 4.1, we develop our B-tree data structure, PIBT (prefix in B-tree),

for dynamic prefix-tables. Our B-tree structure for non-intersecting ranges, RIBT

(range in B-tree), is developed in Section 4.2. Experimental results comparing the

performance of our PIBT structure, the multiway range tree (MRT) structure of Suri

et al. [20], and the best binary tree structure for dynamic prefix-tables, PBOB [35],

are presented in Section 4.3.

Table 4-1: An example prefix set R (W = 5)

Prefix Name   Prefix   Range Start   Range Finish
P1            001*     4             7
P2            00*      0             7
P3            1*       16            31
P4            01*      8             15
P5            10111    23            23
P6            0*       0             15

4.1 Longest-Matching Prefix-Tables-LMPT

4.1.1 The Prefix In B-Tree Structure-PIBT

A range r = [u, v] is a pair of addresses u and v, u ≤ v. The range r represents

the addresses {u, u + 1, ..., v}. start(r) = u is the start point of the range and

finish(r) = v is the finish point of the range. The range r matches all addresses

d such that u ≤ d ≤ v. Every prefix of a prefix router-table may be represented as

a range. For example, when W = 5, the prefix p = 100* matches addresses in the

range [16,19]. So, we say p = 100* = [16,19], start(p) = 16, and finish(p) = 19.

The length of p is 3. Table 4-1 shows a prefix set and the ranges of the prefixes.
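
The conversion from a prefix to its range can be sketched as follows. This is a minimal illustration, assuming W ≤ 32 and a prefix given as the string of '0' and '1' characters that precede the '*'; the function name prefixToRange and this string representation are assumptions made for the example.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Returns {start(p), finish(p)} for the prefix whose specified bits are
// prefixBits (the characters before '*'). Assumes prefixBits.size() <= W <= 32.
inline std::pair<std::uint32_t, std::uint32_t>
prefixToRange(const std::string& prefixBits, unsigned W) {
    std::uint64_t start = 0;
    for (char c : prefixBits) start = (start << 1) | (c == '1' ? 1u : 0u);
    const unsigned wild = W - static_cast<unsigned>(prefixBits.size());  // don't-care bits
    start <<= wild;  // start point: all don't-care bits are 0
    const std::uint64_t finish = start | ((std::uint64_t{1} << wild) - 1);  // all are 1
    return {static_cast<std::uint32_t>(start), static_cast<std::uint32_t>(finish)};
}
```

For W = 5, the call prefixToRange("100", 5) yields the range [16,19] of the example above, and the prefixes of Table 4-1 yield the listed start and finish points.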

The set of start and finish points of a collection P of prefixes is the set of

endpoints, E(P), of P. When |P| = n, |E(P)| ≤ 2n. Although our PIBT structure

and the MRT structure of Suri et al. [20] store the endpoints E(P) together

with additional information in a B-tree¹ [41], each structure uses a different variety

of B-tree. Our PIBT structure uses a B-tree in which each key (endpoint) is stored

exactly once, while the MRT uses a B-tree in which each key is stored once in a

leaf node and some of the keys are additionally stored in interior nodes. Figure 4-1

shows a possible order-3 B-tree for the endpoints of the prefix set of Table 4-1. In

this example, each endpoint is stored in exactly one node. This example B-tree is a

possible B-tree for PIBT but not for MRT.

¹ A B-tree of order m is an m-way search tree. If the B-tree is not empty, the
root has at least two children and other internal nodes have at least ⌈m/2⌉ children.
All external nodes are at the same level.

Figure 4-1: B-tree for the endpoints of the prefixes of Table 4-1 (root x holds keys 7 and 16; its children y, z, and w hold keys 0 and 4, 8 and 15, and 23 and 31, respectively)

Figure 4-2: Alternative B-tree for Figure 4-1

Figure 4-2 shows a possible order 3 B-tree in which each endpoint is stored in

exactly one leaf node and some endpoints are also stored in interior nodes. This

example B-tree is a possible B-tree for MRT but not for PIBT.

With each node x of a PIBT B-tree, we associate an interval int(x) of the destination

address space [0, 2^W - 1]. The interval int(root) associated with the root of

the B-tree is [0, 2^W - 1]. Let x be a B-tree node that has t keys. The format of this

node is:

t, child_0, (key_1, child_1), ..., (key_t, child_t)

where key_i is the ith key in the node (key_1 < key_2 < ... < key_t) and child_i is a pointer

to the ith subtree. In case of ambiguity, we use the notation x.key_i and x.child_i to

refer to the ith key and child, respectively, of node x. Let key_0 = start(int(x)) and

key_{t+1} = finish(int(x)). By definition,

int_i(x) = int(child_i) = [key_i, key_{i+1}], 0 ≤ i ≤ t

For the example B-tree of Figure 4-1, int(x) = [0, 31], int_0(x) = int(y) = [0, 7],

int_1(x) = int(z) = [7, 16], int_2(x) = int(w) = [16, 31], int_0(y) = [0, 0], int_1(y) =

[0, 4], int_2(y) = [4, 7], and int_0(z) = [7, 8].

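
The definition of int_i(x) can be sketched directly. This is a minimal illustration, assuming the keys of a node are available as a sorted vector and that an interval is a pair of 32-bit addresses; the names Interval and childIntervals are hypothetical.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Interval = std::pair<std::uint32_t, std::uint32_t>;

// Returns int_0(x), ..., int_t(x) given int(x) and the sorted keys key_1..key_t of x.
inline std::vector<Interval> childIntervals(Interval intX,
                                            const std::vector<std::uint32_t>& keys) {
    std::vector<Interval> out;
    std::uint32_t lo = intX.first;      // key_0 = start(int(x))
    for (std::uint32_t k : keys) {
        out.emplace_back(lo, k);        // int_i(x) = [key_i, key_{i+1}]
        lo = k;
    }
    out.emplace_back(lo, intX.second);  // key_{t+1} = finish(int(x))
    return out;
}
```

For the root x of Figure 4-1, childIntervals({0, 31}, {7, 16}) gives [0, 7], [7, 16], and [16, 31], which agree with the intervals listed above.
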
Node x of a PIBT has t + 1 W-bit vectors x.interval_i, 0 ≤ i ≤ t, and t W-bit

vectors x.equal_i, 1 ≤ i ≤ t. The lth bit of x.interval_i, denoted x.interval_i[l], is 1 iff

there is a length l prefix whose range includes int_i(x) but not int(x). This rule for the

interval vectors is called the prefix allocation rule. For our example of Figure 4-1,

y.interval_2[3] = 1 because prefix P1 has length 3 and range [4,7]; [4,7] includes

int_2(y) = [4, 7] but not int(y) = [0, 7]. We say that P1 is stored in y.interval_2 and

in node y. It is easy to see that a prefix may be stored in up to m - 1 intervals of an

order m B-tree node and in up to 2 nodes at each level of the B-tree.

The bit x.equal_i[l] is 1 iff there is a length l prefix that has a start or finish

endpoint equal to key_i of x. For our example, prefixes P2 and P6 have 0 as their start

endpoint. Since the length of P2 is 2 and that of P6 is 1, y.equal_1[1] = y.equal_1[2] = 1;

all other bits of y.equal_1 are 0.
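
The prefix allocation rule can be sketched as a direct test on ranges. This is an illustration that recomputes an interval vector from scratch for a given prefix set, which is not how a PIBT maintains its vectors during updates; the names Prefix, Interval, includes, and intervalVector are hypothetical, and bit l of the result is assumed to stand for prefix length l (l ≤ 63).

```cpp
#include <cstdint>
#include <vector>

struct Prefix { unsigned length; std::uint32_t start, finish; };
struct Interval { std::uint32_t lo, hi; };

// True iff the range of p includes the interval iv.
inline bool includes(const Prefix& p, Interval iv) {
    return p.start <= iv.lo && iv.hi <= p.finish;
}

// Bit l of the result is 1 iff some length-l prefix includes int_i(x) but not int(x).
inline std::uint64_t intervalVector(const std::vector<Prefix>& prefixes,
                                    Interval intI, Interval intX) {
    std::uint64_t v = 0;
    for (const Prefix& p : prefixes)
        if (includes(p, intI) && !includes(p, intX))
            v |= std::uint64_t{1} << p.length;
    return v;
}
```

With the prefixes of Table 4-1, int_2(y) = [4, 7], and int(y) = [0, 7], only bit 3 is set, in agreement with y.interval_2[3] = 1. With int_0(x) = [0, 7] and int(x) = [0, 31], bits 1 and 2 are set, because P6 and P2 include [0, 7] but not [0, 31].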

To conserve space, leaf nodes do not have child pointers. Further, to reduce

memory accesses, child pointers and interval bit-vectors are interleaved so that child_i

and interval_i can be accessed with a single cache miss provided cache lines are long

enough. In the sequel, we assume that W is sufficiently small so that this is the case.

Further, we assume that bit-vector operations on W-bit vectors take O(1) time. This

assumption is certainly valid for IPv4 where W = 32 and a W-bit vector may be

represented as a 4-byte integer.

4.1.2 Finding The Longest Matching-Prefix

As in [20], we determine only the length of the longest prefix that matches a

given destination address d. From this length and d, the longest matching-prefix,

lmp(d), is easily computed. The PIBT search algorithm (Figure 4-3) employs the

following lemma.

Lemma 33 Let P be a set of prefixes. If P contains a prefix whose start or finish

endpoint equals d, then the longest prefix, lmp(d), that matches d has its start or

finish point equal to d.

Proof Let p ∈ P be a prefix that matches d and whose start or finish endpoint

equals d. Let q ∈ P be a prefix that matches d but whose start and finish endpoints

are different from d. It is easy to see that the range of p is properly contained in

the range of q. Therefore, p is a longer prefix than q. So, lmp(d) ≠ q. The lemma

follows. ∎

The PIBT search algorithm first constructs a W-bit vector matchVector. When

the router table has no prefix whose start or finish endpoint equals the destination

address d, the constructed bit vector satisfies matchVector[l] = 1 iff there is a length

l prefix that matches d. Otherwise, matchVector[l] = 1 iff there is a length l prefix

whose start or finish endpoint equals d. The maximum l such that matchVector[l] = 1

is the length of lmp(d).
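
Finding the maximum l with matchVector[l] = 1 by a binary search over bit positions can be sketched as follows. This is a minimal illustration, assuming a 32-bit word in which the bit of value 2^l stands for matchVector[l]; under this assumption only positions 0 through 31 are representable, so an implementation that must also record length 32 would need an offset or a wider word. The function name highestSetBit is hypothetical.

```cpp
#include <cstdint>

// Returns the largest l such that bit l of v is 1, or -1 if v is 0.
// Uses O(log W) shift-and-test steps for W = 32.
inline int highestSetBit(std::uint32_t v) {
    if (v == 0) return -1;
    int pos = 0;
    for (int shift = 16; shift > 0; shift >>= 1) {
        if (v >> shift) {  // some set bit lies in the upper half
            v >>= shift;   // continue the search in that half
            pos += shift;
        }
    }
    return pos;
}
```

The loop halves the candidate range of positions at each step, which is the O(log W) bound used in the analysis; many processors also offer a single instruction for the same purpose.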

Complexity Analysis. Each iteration of the while loop takes O(log_2 m) time

(we assume throughout this paper that, for sufficiently large m, a B-tree node is

searched using a binary search) and the number of iterations is O(log_m n). The

largest l such that matchVector[l] = 1 may be found in O(log_2 W) time by performing

O(log_2 W) operations on the W-bit vector matchVector. So, the overall complexity is

O(log_2 m * log_m n + log_2 W) = O(log_2 n + log_2 W) = O(log_2(Wn)).

Full Text


Iwouldliketogivemysincerethankfulnesstomyadvisor,Dr.SartajSahni, Thisworkwassupported,inpart,bytheNationalScienceFoundationundergrantCCR-9912395. IamverygratefultoDr.SanjayRanka,Dr.RandyChow,Dr.RichardNewman,Dr.MichaelFangforservingonmyPh.D.supervisorycommitteeandprovidinghelpfulsuggestions. Iwanttodedicatethisdissertationtomyparents.Withouttheirencouragementandhardwork,Icouldnotthinkofgettingadoctoraldegree.Finally,Iwouldliketogivemyspecialthankstomywife,Lan,whosecaringandloveenabledmetocompletethiswork.iv


TABLEOFCONTENTS page ACKNOWLEDGMENTS.............................iv ABSTRACT....................................viii CHAPTER 1INTRODUCTIONANDRELATEDWORK................1 1.1Introduction..............................1 1.1.1StaticRouterTable......................3 1.1.2DynamicRouterTable....................4 1.2RelatedWork.............................6 1.2.1Trie...............................6 1.2.2SetsofEqual-LengthPrexes.................8 1.2.3End-PointArray........................9 1.2.4MultiwayRangeTree.....................9 1.2.5 O (log n )DynamicSolutions.................9 1.2.6Highest-PriorityPrexTable.................10 1.2.7TCAM.............................10 1.2.8Others.............................11 1.3Contribution..............................11 2 O (log n )DYNAMICROUTERTABLEFORPREFIXESANDRANGES14 2.1Preliminaries.............................14 2.1.1PrexesandLongest-PrexMatching............14 2.1.2RangesandProjections....................15 2.1.3Most-Specic-RangeRoutingandConict-FreeRanges..17 2.1.4NormalizedRanges......................23 2.1.5PrioritySearchTreesAndRanges..............34 2.2Prexes................................35 2.3NonintersectingRanges........................36 2.4Conict-FreeRanges.........................38 2.4.1Determine msr ( d ).......................38 2.4.2InsertARange........................38 2.4.3DeleteARange........................39 2.4.4Computing maxP and minP .................40 2.4.5ASimpleAlgorithmtoCompute maxP ...........40 2.4.6AnEcientAlgorithmtoCompute maxP .........41 v


2.4.7 Wrapping Up Insertion of a Range
2.4.8 Wrapping Up Deletion of a Range
2.4.9 Complexity
2.5 Experimental Results
2.5.1 Prefixes
2.5.2 Nonintersecting Ranges
2.5.3 Conflict-free Ranges
2.6 Conclusion
3 DYNAMIC IP ROUTER TABLES USING HIGHEST-PRIORITY MATCHING
3.1 Preliminaries
3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs): BOB
3.2.1 The Data Structure
3.2.2 Search for hpr(d)
3.2.3 Insert a Range
3.2.4 Red-Black-Tree Rotations
3.2.5 Delete a Range
3.2.6 Expected Complexity of BOB
3.3 Highest-Priority Prefix-Tables (HPPTs): PBOB
3.3.1 The Data Structure
3.3.2 Lookup
3.3.3 Insertion and Deletion
3.4 Longest-Matching Prefix-Tables (LMPTs): LMPBOB
3.4.1 The Data Structure
3.4.2 Lookup
3.4.3 Insertion and Deletion
3.5 Implementation Details and Memory Requirement
3.5.1 Memory Management
3.5.2 BOB
3.5.3 PBOB
3.5.4 LMPBOB
3.6 Experimental Results
3.6.1 Test Data and Memory Requirement
3.6.2 Preliminary Timing Experiments
3.6.3 Run-Time Experiments
3.7 Conclusion
4 A B-TREE DYNAMIC ROUTER-TABLE DESIGN
4.1 Longest-Matching Prefix-Tables: LMPT
4.1.1 The Prefix In B-Tree Structure: PIBT
4.1.2 Finding The Longest Matching-Prefix
4.1.3 Inserting A Prefix
4.1.4 Inserting an endpoint
4.1.5 Update interval vectors
4.1.6 Deleting A Prefix
4.1.7 Deleting from a Leaf Node
4.1.8 Borrow from a Sibling
4.1.9 Merging Two Adjacent Siblings
4.1.10 Deleting from a Non-leaf Node
4.1.11 Cache-Miss Analysis
4.2 Highest-Priority Range-Tables
4.2.1 Preliminaries
4.2.2 The Range In B-Tree Structure: RIBT
4.2.3 RIBT Operations
4.3 Experimental Results
4.4 Conclusion
5 CONCLUSION AND FUTURE WORK
5.1 Conclusion
5.2 Future Work
REFERENCES
BIOGRAPHICAL SKETCH


Internet routers use router tables to classify incoming packets based on the information carried in the packet headers. Packet classification is one of the network bottlenecks, especially when a high update rate becomes necessary. Much of the research in the router-table area has focused on static prefix tables, where updates usually require rebuilding the whole router table. Some router-table designs rely on the relatively short IPv4 addresses to achieve the desired efficiency. However, these designs scale poorly with the prefix length.

We propose several schemes to represent one-dimensional dynamic range tables, that is, tables into/from which rules are inserted/deleted concurrently with packet classification, and in which filters are specified as ranges. Our schemes allow real-time updates and at the same time provide efficient lookup. The lookup and update complexities of our schemes are logarithmic functions of the number of filters. The first scheme, PST, which is based on priority search trees, uses the most-specific-rule tie breaker. The second scheme is called BOB (Binary search tree On Binary search tree). This scheme uses the highest-priority tie breaker. In order to utilize the wide cache line size and reduce the tree height, a third scheme is developed in which the top level


1.1 Introduction

Today's Internet consists of thousands of packet networks interconnected by routers. When a host sends a packet into the Internet, the routers relay the packet towards its final destination. The routers exchange routing information with each other, and use the information gathered to calculate the paths to all reachable destinations. Each packet is treated independently and forwarded to a next router based on its destination address.

The data structure a router uses to determine the next hop is called the router table. Each entry in the router table is a rule of the form (address prefix, next hop). Table 1-1 shows a set of five rules. We use W to denote the maximum possible length of a prefix. In IPv4, W = 32 and in IPv6, W = 128. In Table 1-1, W is 5. The prefix P1, which matches all the destination addresses, is called the default prefix. The prefix P3 matches the destination addresses between 16 and 19. If the address prefix of a rule matches the destination address the incoming packet carries, the next hop of this rule is used to forward the packet.

Address prefixes were introduced by CIDR (Classless Interdomain Routing) to deal with address depletion and router table explosion. The result of CIDR's address aggregation is that there may be several rules whose prefixes match the destination address. For example, the rules with prefixes P1, P3 and P4 in Table 1-1 match the destination address 19. In this case, a tie breaker is needed to select one of the matching rules. The most specific matching is usually used, namely, the longest prefix matching the destination address.


The other two popular tie breakers are first matching and highest-priority matching. For the first-matching tie breaker, the rule table is assumed to be a linear list of rules, with the rules indexed 1 through n for an n-rule table. The first rule that matches the incoming packet is used. Notice that the rule R1 is selected for every incoming packet, since it matches all the destination addresses. To give other rules a chance to win, we must index the rules carefully, and the default prefix should be the last rule.

In highest-priority matching, each rule is assigned a priority, and the rule with the highest priority is selected from those matching the incoming packet. Notice that the first-matching tie breaker is a special case of the highest-priority-matching tie breaker (simply assign each rule a priority equal to the negative of its index in the linear list).

Table 1-1: A router table with five rules (W = 5)

Rule Name   Prefix Name   Prefix   Next Hop   Range Start   Range Finish
R1          P1            *        N1         0             31
R2          P2            0101*    N2         10            11
R3          P3            100*     N3         16            19
R4          P4            1001*    N4         18            19
R5          P5            10111    N5         23            23

The query based on the destination address is usually called address lookup or packet forwarding. In general, other fields such as the source address and port numbers may also be used, and the router table consists of rules of the form (F, A), where F is a filter and A is an action. The action component of a rule specifies what is


1.1.1 Static Router Table

In a static rule table, the rule set does not vary in time. For these tables, we are concerned primarily with the following metrics:
1.

To handle updates, static schemes usually use two copies, working and shadow, of the router table. Lookups are done using the working table. Updates are performed in the background (either in real time on the shadow table or by batching updates and reconstructing an updated shadow at suitable intervals); periodically, the shadow replaces the working table, and the caches of the working table are flushed. In this mode of update operation, many packets may be misclassified, because the working copy is not immediately updated. The number of misclassified packets depends on the periodicity with which the working table can be replaced by an updated shadow. Further, additional memory is required for the shadow table and for periodic reconstruction of the working table. A short preprocessing time is therefore important, because it reduces the number of misclassified packets.
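The working/shadow arrangement described above can be illustrated with a minimal Python sketch. The class and method names are hypothetical, and an exact-match dictionary stands in for a real router-table structure; the only point is that a lookup issued between an update and the next swap still sees the old rule.

```python
class ShadowedRouterTable:
    """Toy model: lookups read `working`, updates are applied to `shadow`."""

    def __init__(self, rules):
        self.working = dict(rules)   # destination -> next hop, used by lookups
        self.shadow = dict(rules)    # receives updates in the background

    def lookup(self, dest):
        return self.working.get(dest)

    def update(self, dest, next_hop):
        self.shadow[dest] = next_hop          # not yet visible to lookups

    def swap(self):
        self.working = dict(self.shadow)      # periodic replacement of the working copy


table = ShadowedRouterTable({10: "N2"})
table.update(10, "N9")
stale = table.lookup(10)   # the old next hop: this packet is misclassified
table.swap()
fresh = table.lookup(10)   # the update is visible only after the swap
```

The longer the interval between swaps, the more lookups fall into the `stale` case, which is the misclassification cost the paragraph refers to.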


1.1.2 Dynamic Router Table

In practice, rule tables are seldom truly static. At best, rules may be added to or deleted from the rule table infrequently. Typically, in a "static" rule table, inserts/deletes are batched and the router-table data structure is reconstructed as needed. In a dynamic rule table, rules are added/deleted with some frequency. For such tables, inserts/deletes are not batched. Rather, they are performed in real time.

We believe that dynamic structures for router tables are becoming a necessity. First, updates occur frequently in the backbone area. Labovitz et al. [1] found that the update rate could reach as high as 1000 per second. These updates stem from route failure, route repair and route fail-over. With the number of autonomous systems continuously increasing, it is reasonable to expect the update rate to rise. The router table needs to be updated in order to reflect the route changes. Second, fast processing of updates is preferred because, during batching and reconstruction, end-to-end delay increases, packet loss rises dramatically, and part of the network may experience connectivity loss. Labovitz et al. [2] observed dramatically increased packet loss and end-to-end latency during BGP routing changes. Batching and expensive reconstruction make things worse. While BGP takes time to converge, route-repair events usually do not cause multiple announcements, and the latency for the router table to become stable after these events should depend only on the network delay and the router processing delays along the path [2]. In addition, when the BGP convergence time is reduced, the processing delay may dominate. Pei et al. [3] reduce the convergence time from 30.3 seconds to 0.3 seconds for a failure withdrawal in their testbed by applying two consistency assertions to BGP. Macian et al. [4] emphasize the importance of supporting a high update rate. Dynamic router tables that permit high-speed inserts and deletes are essential in QoS and VAS applications [4]. For example, edge routers that do stateful filtering require high-speed updates [5].


For dynamic router tables, we are concerned additionally with the time required to insert/delete a rule. For a dynamic rule table, the initial rule-table data structure is constructed by starting with an empty data structure and then inserting the initial set of rules into the data structure one by one. So, typically, in the case of dynamic tables, the preprocessing metric, mentioned above, is very closely related to the insert time.

For dynamic router tables, the following metrics are measured to compare performance:
1.

Another important metric for both static and dynamic router tables is scalability to IPv6. IPv6, the next generation of IP, uses 128-bit addresses (W = 128). Although some of the schemes in Section 1.2 work well for IPv4 (W = 32), they scale poorly with the prefix length.


1.2 Related Work

Data structures for rule tables in which each filter is an address prefix and the rule priority is the length of this prefix have been intensely researched in recent years. We refer to rule tables of this type as longest-matching prefix-tables (LMPT). We refer to rule tables in which the filters are ranges and in which the highest-priority matching filter is used as highest-priority range-tables (HPRT). When the filters of no two rules of an HPRT intersect, the HPRT is a nonintersecting HPRT (NHPRT). Although every LMPT is also an NHPRT, an NHPRT may not be an LMPT.

Ruiz-Sanchez et al. [6] review data structures for static LMPTs and Sahni et al. [7] review data structures for both static and dynamic LMPTs.

1.2.1 Trie

Several trie-based data structures for LMPTs have been proposed [8, 9, 10, 11, 12, 13, 14]. Structures such as that of Doeringer et al. [10] use the path-compression technique. Thus the memory requirement is O(n). The search is guided by the input key and, being biased toward successful searches, inspects only the bit position stored at each internal node. When the search reaches a leaf node and does not succeed, the downward path may be backtracked to find the longest matching prefix. Hence the search can be carried out in O(W) time. The update operation, insert or delete, is natural in the trie structure, and can also be performed in O(W) time. The memory accesses during these operations are O(W). For IPv6, where W = 128, O(W) memory accesses are quite expensive. Moreover, path compression reduces the height of the trie only if the prefixes are scattered sparsely inside the trie. When the number of prefixes increases, many branch nodes are needed and path compression does not have many nodes to


In order to reduce the trie length, Gupta et al. [15] use the DIR-24-8 scheme, which fully expands the binary trie at depth 24; i.e., all prefixes with length less than or equal to 24 are expanded into as many 24-bit prefixes as needed, and a table with 2^24 entries is used to store these expanded prefixes. Prefixes longer than 24 bits are stored in a second table. The correspondence is established by storing pointers in the first table that point to the proper entries in the second table. The first table has 2^24 entries, and each entry is 16 bits (32 Mbytes in total). The first bit of each entry indicates whether the next 15 bits store the next hop or a pointer into the second table. With more than 32 Mbytes of memory, the scheme can perform a search in at most two memory accesses. But it is not scalable to IPv6, because expanding to 24 bits already takes too much memory. Gupta et al. [15] also propose alternatives that use less memory but require more memory accesses.

Degermark et al. [9] use a similar prefix expansion technique at multiple depths. Bitmap compression is deployed to reduce the memory requirement greatly. A router table with 40,000 rules can fit into 160 Kbytes. In the worst case, the number of memory accesses is nine. Huang et al. [16] fully expand the binary trie at depth 16 and also expand the subtries rooted at the nodes at depth 16 to their own depths. Bitmap compression is also applied to reduce the memory requirement. The router tables used in the experiment can be compacted into less than 500 Kbytes. The number of worst-case memory accesses is three. Both schemes [9, 16] depend heavily on the prefix distribution. It is hard to decide a proper memory size for the scheme ahead of time. For example, in the extreme case, if the n prefixes in the router table all have length 32, and their first 16 bits are distinct (assume n <= 2^16), the scheme of [16] needs at least 2^14 n bytes.
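The two-table lookup of the DIR-24-8 scheme described above can be sketched at toy scale. The following Python is an illustrative assumption, not the authors' implementation: it uses an 8-bit address split 4 + 4 instead of 32 bits split 24 + 8, and the function names `build` and `lookup` are hypothetical. It keeps the 16-bit entry layout from the text (a flag bit followed by 15 bits that hold either a next hop or a pointer).

```python
W, K = 8, 4            # toy widths (assumption); DIR-24-8 itself uses W = 32, K = 24
LOW = W - K            # address bits resolved by the second table
FLAG = 1 << 15         # first bit of a 16-bit entry: set means "pointer"


def build(prefixes):
    """prefixes: iterable of (bits, length, next_hop), next_hop in 1 .. 2**15 - 1."""
    table1 = [0] * (1 << K)          # 0 stands for "no route"
    table2 = []                      # consecutive blocks of 2**LOW next hops
    # Shorter prefixes go in first so that longer ones overwrite them.
    for bits, length, hop in sorted(prefixes, key=lambda p: p[1]):
        if length <= K:
            # Expand the prefix to every K-bit index it covers.
            base = bits << (K - length)
            for i in range(base, base + (1 << (K - length))):
                table1[i] = hop
        else:
            index = bits >> (length - K)
            if not table1[index] & FLAG:
                # Allocate a block, pre-filled with the shorter covering hop.
                block = len(table2) >> LOW
                table2.extend([table1[index]] * (1 << LOW))
                table1[index] = FLAG | block
            block = table1[index] & (FLAG - 1)
            low_bits = bits & ((1 << (length - K)) - 1)
            base = (block << LOW) + (low_bits << (W - length))
            for i in range(base, base + (1 << (W - length))):
                table2[i] = hop
    return table1, table2


def lookup(table1, table2, addr):
    entry = table1[addr >> LOW]                 # first memory access
    if entry & FLAG:                            # entry is a pointer
        block = entry & (FLAG - 1)
        return table2[(block << LOW) + (addr & ((1 << LOW) - 1))]  # second access
    return entry                                # entry is the next hop itself


# Prefixes 1*, 1001* and 100110* with next hops 1, 2 and 3.
t1, t2 = build([(0b1, 1, 1), (0b1001, 4, 2), (0b100110, 6, 3)])
```

A lookup touches at most two table entries, but every update may rewrite many expanded entries, which is why the text classifies such schemes as static.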


Nilsson et al. [11] apply level compression as well as path compression to the binary trie. A binary trie is path-compressed first; then level compression is used to reduce the height of the trie further by substituting the k highest levels of the binary trie with a single degree-2^k node. Although the search complexity of the LC (level-compressed) trie is still O(W), the height of the LC-trie is around 8 for the router tables used in the authors' analysis.

These data structures [9, 11, 15], as well as that of Srinivasan et al. [12], attempt to optimize lookup time through an expensive preprocessing step. While providing very fast lookup capability, they have a prohibitive insert/delete time, so they are suitable only for static router-tables (i.e., tables into/from which no inserts and deletes take place).

Sahni et al. [13, 14] provide efficient constructions for fixed-stride and variable-stride multibit tries. The lookup time and memory requirement are optimized through expensive preprocessing.

Aiming at improving the update speed of fixed-stride multibit tries on a pipelined ASIC architecture, Basu et al. [17] describe an algorithm to optimize and balance the memory requirement across the pipeline stages.

1.2.2 Sets of Equal-Length Prefixes

Waldvogel et al. [18] have proposed a scheme that performs a binary search on hash tables organized by prefix length. In order to support binary search, O(log W) markers are generated for each prefix, and the longest matching prefix is precomputed for each marker. This binary search scheme has an expected complexity of O(log W) for lookup. The memory requirement is bounded by O(n log W). By introducing a technique called marker partitioning in the full version of Waldvogel et al. [18], the scheme has O(


1.2.3 End-Point Array

An alternative adaptation of binary search to longest-prefix matching is developed in [19]. The distinct end points (start points and finish points) of the ranges defined by the prefixes are stored in ascending order in an array. The end points divide the universe into O(n) basic intervals. LMP(d) is precomputed for each interval as well as for each end point. LMP(d) is found by performing a binary search on this ordered array. A lookup in a table that has n prefixes takes O(log n) time. Because the schemes of [19] use expensive precomputation, they are not suited for dynamic router-tables.

1.2.4 Multiway Range Tree

Suri et al. [20] have proposed a B-tree data structure for dynamic LMPTs. Using their structure, we may find the longest matching-prefix, LMP(d), in O(log_m n) time. However, inserts/deletes take O(W log_m n) time. When W bits fit in O(1) words (as is the case for IPv4 and IPv6 prefixes), logical operations on W-bit vectors can be done in O(1) time each. In this case, the scheme of Suri et al. [20] takes O(m log_2 W log_m n) time for an insertion and O(m log_m n + W) time for a deletion. Assuming one node fits into O(1) cache lines, the number of memory accesses that occur when the data structure of Suri et al. [20] is used is O(log_m n) per search and O(m log_m n) per update.

1.2.5 Dynamic Solutions

Sahni et al. [21, 22] develop data structures, called a collection of red-black trees (CRBT) and an alternative collection of red-black trees (ACRBT), that support the three operations of a dynamic LMPT in O(log n) time each. The number of cache misses is also O(log n). Sahni et al. [22] show that their ACRBT structure is easily modified to extend the biased-skip-list structure of Ergun et al. [23] so as to obtain a biased-skip-list structure for dynamic LMPTs. Using this modified biased skip-list structure, lookup, insert, and delete can each be done in O(log n) expected time and O(log n) expected cache misses. Like the original biased-skip-list structure of


1.2.6 Highest-Priority Prefix Table

When an HPPT (highest-priority prefix-table) is represented as a binary trie [24], each of the three dynamic HPPT operations takes O(W) time and cache misses.

Gupta et al. [25] have developed two data structures for dynamic HPRTs: heap on trie (HOT) and binary search tree on trie (BOT). The HOT structure takes O(W) time for a lookup and O(W log n) time for an insert or delete. The BOT structure takes O(W log n) time for a lookup and O(W) time for an insert/delete. The number of cache misses in a HOT and BOT is asymptotically the same as the time complexity of the corresponding operation.

1.2.7 TCAM

Ternary content-addressable memories, TCAMs, use parallelism to achieve O(1) lookup [26]. Each memory cell of a TCAM may be set to one of three states: 0, 1, and don't care. The prefixes of a router table are stored in a TCAM in descending order of prefix length. Assume that each word of the TCAM has 32 cells. The prefix 10* is stored in a TCAM word as 10??...?, where ? denotes a don't care and there are 30 ?s in the given sequence. To do a longest-prefix match, the destination address is matched, in parallel, against every TCAM entry; with the entries kept as a sorted-by-length linear list, the longest matching-prefix can be determined in O(1) time. A prefix may be inserted or deleted in O(W) time, where W is the length of the longest prefix [27]. Although TCAMs provide a simple and efficient solution for static and dynamic router tables, this solution requires special hardware, costs more, and uses more power and board space than solutions that employ SDRAMs. TCAMs also have longer latency than SDRAMs.
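The TCAM matching rule above (entries sorted by descending prefix length, first match wins) can be imitated in software. This Python sketch is only an illustration with hypothetical function names: a real TCAM compares all entries in parallel, whereas the loop below scans them one by one. It uses the five prefixes of Table 1-1 with W = 5.

```python
W = 5  # address width of Table 1-1


def tcam_entry(prefix):
    """Turn a prefix such as '100*' into (value, mask, length).

    Mask bits set to 1 are the cared-for cells; 0 bits are don't-care cells.
    """
    bits = prefix.rstrip("*")
    length = len(bits)
    value = int(bits, 2) << (W - length) if bits else 0
    mask = ((1 << length) - 1) << (W - length)
    return value, mask, length


def build_tcam(rules):
    entries = [(*tcam_entry(prefix), hop) for prefix, hop in rules]
    return sorted(entries, key=lambda e: -e[2])    # descending prefix length


def tcam_lookup(tcam, addr):
    for value, mask, _, hop in tcam:               # hardware does this in parallel
        if (addr & mask) == value:
            return hop                             # first match is the longest match
    return None


tcam = build_tcam([("*", "N1"), ("0101*", "N2"), ("100*", "N3"),
                   ("1001*", "N4"), ("10111", "N5")])
```

For destination address 19 (binary 10011), the prefixes `*`, `100*` and `1001*` all match; because entries are ordered by length, the lookup returns the next hop of `1001*`.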


1.2.8 Others

Cheung et al. [29] developed a model for table-driven route lookup and cast the table design problem as an optimization problem within this model. Their model accounts for the memory hierarchy of modern computers, and they optimize average performance rather than worst-case performance.

Solutions that involve modifications to the Internet Protocol (i.e., the addition of information to each packet) have also been proposed [30, 31, 32].

1.3 Contribution

We have developed data structures for dynamic router tables. The data structures use O(n) space, except that RIBT uses O(n log_m n) space. Our first data structure, PST [33, 34], uses the most-specific-matching tie breaker. It permits one to search, insert, and delete in O(log n) time each. Although O(log n) time data structures for prefix tables were known prior to our work [21, 22], the PST is more memory efficient than the data structures of [21, 22]. Further, PST is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, PSTs are the first to permit O(log n) search, insert, and delete.


The second data structure, BOB [35], works for highest-priority matching with nonintersecting ranges. The highest-priority rule that matches a destination address may be found in O(log^2 n) time; a new rule may be inserted and an old one deleted in O(log n) time. For the case when all rule filters are prefixes, the data structure PBOB (prefix BOB) permits highest-priority matching as well as rule insertion and deletion in O(W) time each. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in O(log n) time and with O(log n) cache misses. PBOB can also support the dynamic-table operations in O(log n) time and with O(log n) cache misses for nonintersecting ranges when the number of nesting levels is a constant.

To utilize the wide cache line size, e.g., a 64-byte cache line, we propose B-tree data structures for dynamic router-tables for the cases when the filters are prefixes as well as when they are nonintersecting ranges. A crucial difference between our data structure for prefix filters and the B-tree router-table data structure of Suri et al. [20] is that in our data structure, each prefix is stored in O(1) B-tree nodes per B-tree level, whereas in the structure of Suri et al. [20], each prefix is stored in O(m) nodes per level (m is the order of the B-tree). As a result of this difference, a prefix may be inserted into or deleted from an n-filter router table by accessing only O(log_m n) nodes of our data structure; these operations access O(m log_m n) nodes using the structure of Suri et al. [20]. Even though the asymptotic complexity of prefix insertion and deletion is the same in both B-tree structures, experiments conducted by us show that, because of the reduced cache misses for our structure, the measured average insert and delete times using our structure are about 30% less than when the B-tree structure of Suri et al. [20] is used. Further, an update operation using the B-tree structure of Suri et al. [20] will, in the worst case, make 2.5 times as many cache misses as are made when our structure is used. The asymptotic complexity to find the longest matching prefix is the same, O(m log_m n), in both B-tree structures, and in


With the O(log n) operation time, our data structures scale well to large router tables. Since the complexity is independent of the prefix length, our data structures are also scalable to IPv6.

Another important feature of our data structures is that nonintersecting ranges are supported naturally, whereas most existing data structures support ranges (necessary when the filters are defined for port numbers) by breaking one range into O(W) prefixes, which results in an O(W log n) memory requirement. Supporting ranges is also a nice feature for network-layer addresses. The size of the range that a prefix covers must be a power of two, and the range must start at a number that is a multiple of the range size. But the end points and the size of a normal range can be any number. Supporting ranges means one can allocate a range of arbitrary size to a network (AppleTalk supports this feature), and range aggregation is potentially better than prefix aggregation. For example, two disjoint prefixes can aggregate into one prefix only if their ranges are adjacent to each other and they have the same length, whereas two disjoint ranges can aggregate into one range as long as they are next to each other. So, range aggregation is expected to result in router tables that have fewer rules.
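The cost of representing a range by prefixes, mentioned above, can be seen with the standard greedy splitting of a range into maximal aligned power-of-two blocks. The sketch below is an illustration in Python; the function name `range_to_prefixes` is hypothetical and the algorithm is the textbook one, not something taken from this dissertation.

```python
def range_to_prefixes(lo, hi, W):
    """Split the address range [lo, hi] into a minimal list of W-bit prefixes."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block that is aligned at lo ...
        size = lo & -lo if lo else 1 << W
        # ... shrunk until it no longer runs past hi.
        while size > hi - lo + 1:
            size >>= 1
        length = W - size.bit_length() + 1          # prefix length for this block
        bits = format(lo >> (W - length), "0{}b".format(length)) if length else ""
        prefixes.append(bits + ("*" if length < W else ""))
        lo += size
    return prefixes


single = range_to_prefixes(16, 19, 5)    # the range of prefix P3 in Table 1-1
worst = range_to_prefixes(1, 30, 5)      # a range that needs 2W - 2 prefixes
merged = range_to_prefixes(10, 15, 5)    # the adjacent ranges [10, 11] and [12, 15] as one range
```

With W = 5, the range [1, 30] needs eight prefixes, while [10, 15] is a single range that still needs two prefixes of different lengths (0101* and 011*), matching the remark that adjacent prefixes of unequal length cannot be aggregated into one prefix.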


In this chapter, we show in Section 2.2 how priority-search trees may be used to represent dynamic prefix-router-tables. The resulting structure, which is conceptually simpler than the CRBT structure of Sahni et al. [21], permits lookup, insert, and delete in O(log n) time each. For range router-tables, we consider the case when the best matching-prefix is the most-specific matching prefix (this is the range analog of longest-matching prefix). In Section 2.3, we show that dynamic range-router-tables that employ most-specific range matching and in which no two ranges overlap may be efficiently represented using two priority-search trees. Using this two-priority-search-tree representation, lookup, insert, and delete can be done in O(log n) time each. The general case of non-conflicting ranges is considered in Section 2.4. In that section, we augment the data structure of Section 2.3 with several red-black trees to obtain a range-router-table representation for non-conflicting ranges that permits lookup, insert, and delete in O(log n) time each. Section 2.1 introduces the terminology we use. In that section, we also develop the mathematical foundation that forms the basis of our data structures. Experimental results are reported in Section 2.5.

2.1 Preliminaries

2.1.1 Prefixes and Longest-Prefix Matching

The prefix 1101* matches all destination addresses that begin with 1101, and 10010* matches all destination addresses that begin with 10010. For example, when W = 5, 1101* matches the addresses {11010, 11011} = {26, 27}, and when W = 6, 1101* matches {110100, 110101, 110110, 110111} = {52, 53, 54, 55}. Suppose that a router table includes the prefixes P1 = 101*, P2 = 10010*, P3 = 01*, P4 = 1*, and


2.1.2 Ranges and Projections

Definition 1

Notice that every prefix of a prefix router-table may be represented as a range. For example, when W = 6, the prefix P = 1101* matches addresses in the range [52, 55]. So, we say P = 1101* = [52, 55], start(P) = 52, and finish(P) = 55.
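The prefix-to-range correspondence in the example above amounts to filling the free bits with all 0s for the start point and all 1s for the finish point. A small Python sketch (the function name `prefix_to_range` is hypothetical):

```python
def prefix_to_range(prefix, W):
    """Return (start, finish) of the address range matched by a W-bit prefix."""
    bits = prefix.rstrip("*")
    free = W - len(bits)                       # number of unconstrained low-order bits
    start = (int(bits, 2) if bits else 0) << free
    finish = start + (1 << free) - 1           # same high bits, free bits all 1
    return start, finish


r6 = prefix_to_range("1101*", 6)   # the example with W = 6
r5 = prefix_to_range("1101*", 5)   # the same prefix with W = 5
```

The size of the resulting range is always a power of two and its start is a multiple of that size, which is what distinguishes prefix ranges from general ranges.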


Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example, $[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range. For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$ is not a range.

Definition 2

The predicate $\mathrm{disjoint}(r, s)$ is true iff $r$ and $s$ are disjoint. $\mathrm{disjoint}(r, s) \iff \mathrm{overlap}(r, s) = \emptyset \iff v$

[2, 4] and [6, 9] are disjoint; [2, 4] and [3, 4] are nested; [2, 4] and [2, 2] are nested; [2, 8] and [4, 6] are nested; [2, 4] and [4, 6] intersect; and [3, 8] and [2, 4] intersect. [4, 4] is the overlap of [2, 4] and [4, 6]; and overlap([3, 8], [2, 4]) = [3, 4].

Lemma 1

Let $s$ be a range. $\Pi(R \cup \{s\})$ is a range iff $\mathrm{start}(s) \le v + 1$ and $\mathrm{finish}(s) \ge u - 1$.
(c) When $\Pi(R \cup \{s\}) = [x, y]$, $x = \min\{u, \mathrm{start}(s)\}$ and $y = \max\{v, \mathrm{finish}(s)\}$.

2.1.3 Most-Specific-Range Routing and Conflict-Free Ranges

Definition 4


Figure 2-2: Cases for Lemma 2

[2, 4] is more specific than [1, 6], and [5, 9] is more specific than [5, 12]. Since [2, 4] and [8, 14] are disjoint, neither is more specific than the other. Also, since [4, 14] and [6, 20] intersect, neither is more specific than the other.

Definition 5


We note that our definition of conflict free is a natural extension to ranges of the definition of conflict free given by Hari et al. [36] for the case of two-dimensional prefix rules.

Definition 7

Let $s \in R$ be such that $\mathrm{intersect}(r, s)$. $\exists B \subseteq R\,[\Pi(B) = \mathrm{overlap}(r, s)]$

3. Follows from Lemma 4.

2. When $r \in R$, (2) follows from the definition of a conflict-free range set. So, assume $r \notin R$. Let $C$ comprise all ranges of $A$ contained in $s$. If $s$ intersects no range of $A$, $\Pi(C) = \mathrm{overlap}(r, s)$. If $s$ intersects at least one range of $A$, then


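From parts (1) and (2) of this lemma, it follows that there is a resolving subset in $R \cup \{r\}$ for every $s \in R$ that intersects with $r$. Hence, $R \cup \{r\}$ is conflict free.

When $\nexists A \subseteq R\,[\mathrm{range}(\Pi(A)) \wedge \mathrm{start}(\Pi(A)) = u \wedge \mathrm{finish}(\Pi(A)) \le v]$, we say that $\mathrm{maxP}(u, v, R)$ does not exist. Similarly, $\mathrm{minP}(u, v, R)$ may not exist. At times, we use maxP and minP as abbreviations for $\mathrm{maxP}(u, v, R)$ and $\mathrm{minP}(u, v, R)$, respectively.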
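The existence condition above can be checked directly on a small range set. The following Python sketch is a brute-force illustration, not the O(log n) method developed later in the chapter. It assumes that maxP(u, v, R) denotes the largest finish point of a projection that is a range, starts at u, and finishes at or before v; that reading is inferred from the existence condition and the name, since the formal definition does not survive in this text. The function name `max_p` and the tuple representation of ranges are hypothetical.

```python
def max_p(u, v, ranges):
    """Brute-force maxP(u, v, R) under the assumed definition.

    ranges: iterable of (start, finish) tuples.
    Returns None when no subset of R has a projection that is a range
    starting at u and finishing at or before v.
    """
    # Only ranges lying inside [u, v] can take part in such a projection.
    inside = sorted(r for r in ranges if u <= r[0] and r[1] <= v)
    if not inside or inside[0][0] != u:
        return None                       # no range starts at u
    reach = u - 1
    for start, finish in inside:
        if start > reach + 1:             # gap: later ranges cannot extend the projection
            break
        reach = max(reach, finish)
    return reach


R = [(2, 4), (5, 6), (8, 9), (2, 3)]
a = max_p(2, 9, R)    # [2,4] and [5,6] chain up to 6; [8,9] is cut off by the gap at 7
b = max_p(2, 5, R)    # [5,6] does not fit inside [2,5]
c = max_p(3, 9, R)    # no range starts at 3
```

minP(u, v, R) could be sketched symmetrically, growing the projection leftward from v.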



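If $\nexists t \in C\,[\mathrm{intersect}(r, t)]$, then from Equation 2.3, we get $\forall t \in C\,[\mathrm{disjoint}(r, t) \vee t \subseteq r]$. From this and $r \subseteq \Pi(C)$, it follows that all destination addresses $d$, $d \in r$, are covered by ranges of $C$ that are contained in $r$. Therefore, $\exists B \subseteq C \subseteq A\,(\Pi(B) = r)$. This contradicts Equation 2.1.

Next, suppose $\exists t \in C\,[\mathrm{intersect}(r, t)]$. Let $D$ be the union of the resolving subsets for all of these $t$ and $r$ in $R$. Clearly, all ranges in $D$ are contained in $r$. Further, let $E$ be the subset of all ranges in $C$ that are contained in $r$. It is easy to see that $D \cup E \subseteq A \wedge \Pi(D \cup E) = r$. This contradicts Equation 2.1.

For (2), assume that $\nexists B \subseteq A\,[\Pi(B) = r]$.

($\Longrightarrow$) Assume that $A$ is conflict free. We need to prove

$\nexists s \in A\,[r \subset s] \vee [m, n] \in A$    (2.4)

We do this by contradiction. So, assume

$\exists s \in A\,[r \subset s] \wedge [m, n] \notin A$    (2.5)

Since $\exists s \in A\,[r \subset s]$, $m$ and $n$ are well defined. Equation 2.5 implies that $A$ has a range $[m, y]$, $y > n$, as well as a range $[x, n]$, $x$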

($\Longleftarrow$) If no range of $A$ contains $r$, then $r$ is not part of the resolving subset for any pair of intersecting ranges of $R$. This, together with the fact that $R$ is conflict free, implies that $A$ is conflict free. If $[m, n] \in A$, we can use $[m, n]$ in place of $r$ in any resolving subset for intersecting ranges of $R$. Therefore, $A$ has a resolving subset for every pair of intersecting ranges. So, $A$ is conflict free.

2.1.4 Normalized Ranges

Definition 9 [Normalized Ranges] The range set $R$ is normalized iff one of the following is true.
1.


Figure 2-3(A) shows a range set that is not normalized (it contains ranges that intersect as well as nested ranges that have common end-points). Figure 2-3(B) shows a normalized range set. Regardless of which of these two range sets is used, every destination d has the same most-specific range.

Definition 10


Figure 2-4: Partitioning a normalized range set into chains

of CP may be combined into a single chain. CP(N) is called a canonical partitioning.
2. For all $i$ and $j$, $1 \le i$

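If both $\mathrm{maxP}(u, v - 1, R)$ and $\mathrm{minP}(u + 1, v, R)$ exist and $\mathrm{maxP}(u, v - 1, R) + 1 > \mathrm{minP}(u + 1, v, R) - 1$, $\mathrm{chop}(r) = \emptyset$, where $\emptyset$ denotes the null range. The null range neither intersects nor is contained in any other range.

Define $\mathrm{norm}(R) = \{\mathrm{chop}(r) \mid r \in R \wedge \mathrm{chop}(r) \ne \emptyset\}$.

Lemma 12

If $\mathrm{intersect}(s, r')$, then either $\mathrm{start}(r')$ […] $\mathrm{finish}(s)$, which contradicts $s$ […] $r'$. The case $\mathrm{finish}(s) = \mathrm{finish}(r')$ is similar.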


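If $\mathrm{intersect}(r, s)$, then from Lemma 13 we get $\mathrm{disjoint}(\mathrm{chop}(r), \mathrm{chop}(s))$. So, $\mathrm{chop}(r) \ne \mathrm{chop}(s)$.

If $\mathrm{nested}(r, s)$, then from Lemma 12 it follows that $s \subseteq \mathrm{chop}(r) \vee \mathrm{disjoint}(s, \mathrm{chop}(r))$ when $s \subset r$, and $r \subseteq \mathrm{chop}(s) \vee \mathrm{disjoint}(r, \mathrm{chop}(s))$ when $r \subset s$. Consider the former case (the latter case is similar). $s \subseteq \mathrm{chop}(r)$ implies $\mathrm{chop}(s) \ne \mathrm{chop}(r)$. $\mathrm{disjoint}(s, \mathrm{chop}(r))$ also implies $\mathrm{chop}(s) \ne \mathrm{chop}(r)$.

The final case is $\mathrm{disjoint}(r, s)$. In this case, clearly, $\mathrm{chop}(s) \ne \mathrm{chop}(r)$.

For $r' \in \mathrm{norm}(R)$, define $\mathrm{full}(r') = \mathrm{chop}^{-1}(r') = r$, where $r$ is the unique range in $R$ for which $\mathrm{chop}(r) = r'$. Notice that $\mathrm{full}(\mathrm{chop}(r)) = r$ except when $\mathrm{chop}(r) = \emptyset$.

Lemma 15

If $|\mathrm{norm}(R)| \le 1$, $\mathrm{norm}(R)$ is normalized. So, assume that $|\mathrm{norm}(R)| > 1$. Let $r'$ and $s'$ be two different ranges in $\mathrm{norm}(R)$. We need to show that $r'$ and $s'$ satisfy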






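From Lemma 4, it follows that $A$ is conflict free. Further, since $R$ has a subset whose projection equals $[x, y]$, $\Pi(A) = [x, y]$. From Lemma 19, it follows that every $d \in [x, y]$ has a most-specific range in $\mathrm{norm}(A)$. Therefore, $\Pi(\mathrm{norm}(A)) = [x, y]$. From the definition of the chopping rule and that of $A$, we see that $\forall r \in A\,[\mathrm{chop}(r, A) = \mathrm{chop}(r, R)]$. So, $\mathrm{norm}(A) \subseteq \mathrm{norm}(R)$.

2. First, assume that $[x, y] \in R$. Suppose there is a range $r' \in \mathrm{norm}(R)$ such that $r' \subseteq [x, y]$ and $r = \mathrm{full}(r') \notin A$. There are three cases for $r$.

When $[x, y] \notin R$, let $R' = R \cup \{[x, y]\}$, $C' = \{\mathrm{full}(r') \mid r' \in \mathrm{norm}(R') \wedge r' \subseteq [x, y]\}$ and $A' = A \cup \{[x, y]\}$. Using the lemma case we have already proved, we get $C' \subseteq A'$. Since $\mathrm{chop}([x, y], R') = \emptyset$ and $\mathrm{chop}(s, R) = \mathrm{chop}(s, R')$ for every $s \in R$, $\mathrm{norm}(R') = \mathrm{norm}(R)$. Therefore, $C = C'$. So, $C \subseteq A'$. Finally, since $[x, y] \notin C$, $C \subseteq A$.

Let $s$ be the smallest range of $R$ that contains $r$. Assume that $s$ exists and that $\mathrm{chop}(r, R \cup \{r\}) \ne \emptyset$.
(a)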


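For (2a), suppose there are two different ranges $g$ and $h$ in $R$ such that $\mathrm{chop}(g, R) \ne \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) \ne \mathrm{chop}(h, R \cup \{r\})$. From the chopping rule, it follows that

$r \subset g \wedge r \subset h$    (2.6)

Therefore, $\neg\mathrm{disjoint}(g, h)$. From this and Lemma 1, we get $\mathrm{intersect}(g, h) \vee \mathrm{nested}(g, h)$. Equation 2.6 and $\mathrm{intersect}(g, h)$ imply $r \subseteq \mathrm{overlap}(g, h)$. From this and Lemma 13, we get $\mathrm{disjoint}(r, \mathrm{chop}(g, R)) \wedge \mathrm{disjoint}(r, \mathrm{chop}(h, R))$. Therefore, $\mathrm{chop}(g, R) = \mathrm{chop}(g, R \cup \{r\})$ and $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction. So, $\neg\mathrm{intersect}(g, h)$.

If $\mathrm{nested}(g, h)$, we may assume, without loss of generality, that $g \subset h$. This and Equation 2.6 yield $r \subset g \subset h$. Therefore, $\mathrm{maxP}(x, y - 1, R) = \mathrm{maxP}(x, y - 1, R \cup \{r\})$ and $\mathrm{minP}(x + 1, y, R) = \mathrm{minP}(x + 1, y, R \cup \{r\})$, where $h = [x, y]$. So, $\mathrm{chop}(h, R) = \mathrm{chop}(h, R \cup \{r\})$, a contradiction.

Hence, there can be at most one range of $R$ whose chop() value changes as a result of the addition of $r$. The preceding proof for the case $\mathrm{nested}(g, h)$ also establishes that the chop() value may change only for the range $s$, that is, for the smallest enclosing range of $r$ (i.e., the smallest $s \in R\,[r \subset s]$).

For (2b), assume that $\mathrm{chop}(s, R) \ne \mathrm{chop}(s, R \cup \{r\})$. This implies that $\mathrm{chop}(s, R) \ne \emptyset$ and so $x'$ and $y'$ are well defined. (Note that from part (1), we get $\mathrm{chop}(r, R) \ne \emptyset$.) We consider each of the three cases for the relationship between $r$ and $\mathrm{chop}(s, R)$ (Lemma 1).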


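Using an argument similar to that used in part (2a), we may show that when $\mathrm{chop}(s, R) \subseteq r$, $x' = u' \wedge y' = v'$.

Suppose $x' = u' \wedge y' > v$. If $\mathrm{maxP}(v' + 1, y', R)$ doesn't exist, then $\mathrm{chop}(s, R \cup \{r\}) = [v + 1, y']$. If it exists, $\mathrm{chop}(s, R \cup \{r\}) = [\mathrm{maxP}(v' + 1, y', R) + 1, y']$.
4. Suppose $x'$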


2.1.5 Priority Search Trees And Ranges

A priority-search tree (PST) [37] is a data structure that is used to represent a set of tuples of the form (key1, key2, data), where $key1 \ge 0$, $key2 \ge 0$, and no two tuples have the same key1 value. The data structure is simultaneously a min-tree on key2 (i.e., the key2 value in each node of the tree is $\le$ the key2 value in each descendent node) and a search tree on key1. There are two common PST representations [37]:
1. In a radix priority-search tree (RPST), the underlying tree is a binary radix tree on key1.
2. In a red-black priority-search tree (RBPST), the underlying tree is a red-black tree.

McCreight [37] has suggested a PST representation of a collection of ranges with distinct finish points. This representation uses the following mapping of a range $r$ into a PST tuple:

$(key1, key2, data) = (\mathrm{finish}(r), \mathrm{start}(r), data)$    (2.7)

where data is any information (e.g., next hop) associated with the range. Each range $r$ is, therefore, mapped to a point $\mathrm{map1}(r) = (x, y) = (key1, key2) = (\mathrm{finish}(r), \mathrm{start}(r))$ in 2-dimensional space. Figure 2-5 shows a set of ranges and the equivalent set of 2-dimensional points $(x, y)$.

McCreight [37] has observed that when the mapping of Equation 2.7 is used to obtain a point set $P = \mathrm{map1}(R)$ from a range set $R$, then $\mathrm{ranges}(d)$ is given by


When an RPST is used to represent the point set $P$, the complexity of $\mathrm{enumerateRectangle}(x_{left}, x_{right}, y_{top})$ is $O(\log maxX + s)$, where $maxX$ is the largest $x$ value in $P$ and $s$ is the number of points in the query rectangle. When the point set is represented as an RBPST, this complexity becomes $O(\log n + s)$, where $n = |P|$. A point $(x, y)$ (and hence a range $[y, x]$) may be inserted into or deleted from an RPST (RBPST) in $O(\log maxX)$ ($O(\log n)$) time [37].

2.2 Prefixes

Let $R$ be a set of ranges such that each range represents a prefix. It is well known (see Sahni et al. [21], for example) that no two ranges of $R$ intersect. Therefore, $R$ is conflict free. For simplicity, assume that $R$ includes the range that corresponds to the prefix *. With this assumption, $\mathrm{msr}(d)$ is defined for every $d$. From Lemma 9, it follows that $\mathrm{msr}(d)$ is the range $[\mathrm{maxStart}(\mathrm{ranges}(d)), \mathrm{minFinish}(\mathrm{ranges}(d))]$. To find this range easily, we first transform $P = \mathrm{map1}(R)$ into a point set $\mathrm{transform1}(P)$ so that no two points of $\mathrm{transform1}(P)$ have the same $x$-value. Then, we represent $\mathrm{transform1}(P)$ as a PST.

Definition 12


performed on PST1 yields $\mathrm{ranges}(d)$. To find $\mathrm{msr}(d)$, we employ the $\mathrm{minXinRectangle}(x_{left}, x_{right}, y_{top})$ operation, which determines the point in the defined rectangle that has the least $x$-value. It is easy to see that $\mathrm{minXinRectangle}(2^W d - d + 2^W - 1, \infty, d)$ performed on PST1 yields $\mathrm{msr}(d)$.

To insert the prefix whose range is $[u, v]$, we insert $\mathrm{transform1}(\mathrm{map1}([u, v]))$ into PST1. In case this prefix is already in PST1, we simply update the next-hop information for this prefix. To delete the prefix whose range is $[u, v]$, we delete $\mathrm{transform1}(\mathrm{map1}([u, v]))$ from PST1. When deleting a prefix, we must take care not to delete the prefix *. Requests to delete this prefix should simply result in setting the next hop associated with this prefix to $\emptyset$.

Since minXinRectangle, insert, and delete each take $O(W)$ ($O(\log n)$) time when PST1 is an RPST (RBPST), PST1 provides a router-table representation in which longest-prefix matching, prefix insertion, and prefix deletion can be done in $O(W)$ time each when an RPST is used and in $O(\log n)$ time each when an RBPST is used.

2.3 Nonintersecting Ranges

Let $R$ be a set of nonintersecting ranges. Clearly, $R$ is conflict free. For simplicity, assume that $R$ includes the range $z$ that matches all destination addresses ($z =$



Figure 2-6: Insert $r = [u, v]$ into the conflict-free range set $R$

2.4 Conflict-Free Ranges

In this section, we extend the two-PST data structure of Section 2.3 to the general case when $R$ is an arbitrary conflict-free range set. Once again, we assume that $R$ includes the range $z$ that matches all destination addresses. PST1 and PST2 are defined for the range set $R$ as in Sections 2.2 and 2.3.

2.4.1 Determine msr(d)

Since $R$ is conflict free, $\mathrm{msr}(d)$ is determined by Lemma 9. Hence, $\mathrm{msr}(d)$ may be obtained by performing the operation $\mathrm{minXinRectangle}(2^W d - d + 2^W - 1, \infty, d)$ on PST1.

2.4.2 Insert A Range

When inserting a range $r = [u, v] \notin R$, we must insert $\mathrm{transform1}(\mathrm{map1}(r))$ into PST1 and $\mathrm{transform2}(\mathrm{map2}(r))$ into PST2. Additionally, we must verify that $R \cup \{r\}$ is conflict free. This verification is done using Lemma 6. Figure 2-6 gives a high-level description of our algorithm to insert a range into $R$.

Step 1 is done by searching for $\mathrm{transform1}(\mathrm{map1}(r))$ in PST1. For Step 2, we note that

$\mathrm{maxY}(u, v, R) = \mathrm{maxXinRectangle}(2^W u - (u - 1) + 2^W - 1,\ 2^W (v - 1) + 2^W - 1,\ u - 1)$

$\mathrm{minX}(u, v, R) = \mathrm{minXinRectangle}(2^W (u + 1),\ 2^W v + (2^W - 1) - v - 1,\ (2^W - 1) - v - 1)$


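Figure 2-7: Delete the range $r = [u, v]$ from the conflict-free range set $R$

2.4.3 Delete A Range

Suppose we are to delete the range $r = [u, v]$. This deletion is to be permitted iff $r \ne z$ and $A = R - \{r\}$ is conflict free. Figure 2-7 gives a high-level description of our algorithm to delete $r$. Its correctness follows from Lemma 8.

Step 2 employs the standard PST algorithm to delete a point [37]. For Step 3, we note that $A$ has a subset whose projection equals $r = [u, v]$ iff $\mathrm{maxP}(u, v, A) = v$. In Section 2.4.4, we show how $\mathrm{maxP}(u, v, A)$ may be computed efficiently. For Step 5, we note that $r = [u, v] \subset s = [x, y]$ iff $x \le u \wedge y \ge v$. So, $A$ has such a range iff $\mathrm{minXinRectangle}(2^W v - u + 2^W - 1, \infty, u)$ exists in PST1.

In Step 6, we assume that maxXinRectangle and minXinRectangle return the range of $R$ that corresponds to the desired point in the rectangle. To determine whether $[m, n] \in A$ (Step 7), we search for the point $(2^W n - m + 2^W - 1, m)$ in PST1 using the standard PST search algorithm [37]. The reinsertion into PST1 and PST2, if necessary, is done using the standard PST insert algorithm [37].

Figure 2-8: Simple algorithm to compute $\mathrm{maxP}(u, v, R)$, where $[u, v]$ is a range and $\mathrm{conflictFree}(R)$

2.4.4 Computing maxP and minP

2.4.5 A Simple Algorithm to Compute maxP

Therefore, $\nexists r' \in \mathrm{norm}(R)\,[\mathrm{start}(r') = u] \Longrightarrow \nexists\,\mathrm{maxP}$. From Lemma 18, it follows that $\mathrm{start}(\mathrm{full}(r')) \ne \mathrm{start}(r') = u \Longrightarrow \nexists s \in R\,[\mathrm{start}(s) = \mathrm{start}(r') = u]$. So, $\mathrm{start}(\mathrm{full}(r')) \ne u \Longrightarrow \nexists\,\mathrm{maxP}$. Finally, $u = \mathrm{start}(r') = \mathrm{start}(\mathrm{full}(r'))$ implies $\mathrm{finish}(\mathrm{full}(r')) = \min\{\mathrm{finish}(t) \mid t \in R \wedge \mathrm{start}(t) = u\}$ (Lemma 17(1)). So, $\mathrm{finish}(\mathrm{full}(r')) > v$ implies $\nexists s \in R\,[\mathrm{start}(s) = u \wedge \mathrm{finish}(s) \le v]$. Hence, $\mathrm{start}(r') = u \wedge \mathrm{finish}(\mathrm{full}(r')) > v \Longrightarrow \nexists\,\mathrm{maxP}$. Further, when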



To implement Step 1 of Figure 2-8, we search endPointsTree for the point u. If $u \notin$ endPointsTree, then $\nexists\, r' \in norm(R)\,[start(r') = u]$. If $u \in$ endPointsTree, then we use the pointer in the node for u to get to the root of the RBT that has r'. A search in this RBT for u locates r'. We may now perform the remaining checks of Step 1 using the data associated with r'.

Suppose that maxP exists. At the start of Step 2, we are positioned at the RBT node that represents r'. This is node 0 of Figure 2-9. We need to find $s' \in norm(R)$




2.4.7 Wrapping Up Insertion of a Range

Now that we have augmented PST1 and PST2 with a collection of RBTs and an endPointsTree, whenever we insert a range r = [u, v] into R, we must update not only PST1 and PST2 as described in Section 2.4.2, but also the RBT collection and endPointsTree. To do this, we first compute $chop(r, R \cup \{r\}) = chop(r, R) = [u', v']$ by first computing minP(u + 1, v) and maxP(u, v - 1) as described in Section 2.4.4. [u', v'] is now easily obtained from the chopping rule. Lemma 21 tells us that the only $s \in R$ whose chop() value may change as a result of the insertion of r is the smallest enclosing range of r. Since $z \in R$ and $r \ne z$, such an s must exist. Rather than search for this s explicitly, we use the conditions of cases (2)-(4) of Lemma 22 to find s' = chop(s, R) in endPointsTree. Note that if $chop(s, R) = \emptyset$, the search in endPointsTree will not find s; but when $chop(s, R) = \emptyset$, $chop(s, R \cup \{r\}) = \emptyset$. So, no change in chop(s, R) is called for.

Note that the insertion of r may combine two chains of CP(norm(R)). In this case, we use the join operation of red-black trees to combine the RBTs corresponding to these two chains.


2.4.8 Wrapping Up Deletion of a Range

When $chop(r, R) = \emptyset$, no changes are to be made to the RBTs and endPointsTree (Lemma 23(1)). So, assume that $chop(r, R) \ne \emptyset$. We first find s, the smallest range that contains r (see Lemma 23(2)). Note that since $z \in R$ and $r \ne z$, s exists. One may verify that s is one of the ranges given by the following two operations.

$minXinRectangle(2^W v - u + 2^W - 1, \infty, u)$

$maxXinRectangle(0,\ 2^W u + 2^W - 1 - v,\ 2^W - 1 - v)$

where the first operation is done in PST1 and the second in PST2 (both operations are done after transform1(map1([u, v])) has been deleted from PST1 and transform2(map2([u, v])) has been deleted from PST2). The ranges returned by these two operations may be compared to determine which is s.

Once we have identified s, Lemma 23(2) is used to determine $chop(s, R - \{r\})$. Assume that $chop(s, R) \ne \emptyset$. Let chop(r, R) = r' = [u', v'] and chop(s, R) = s' = [x', y']. When s' and r' are in different RBTs (this is the case when $r' \subset s'$), $chop(s, R) = chop(s, R - \{r\})$ and the RBT that contains s' may need to be split into two RBTs. When s' and r' are in the same RBT, they are in the same chain of CP(norm(R)). If s' and r' are adjacent ranges of this chain, we may simply remove the RBT node for r' and update that for s' to reflect its new start or finish point (only one may change). When r' and s' are not adjacent ranges, the nodes for these two ranges are removed from the RBT (this may split the RBT into up to two RBTs) and $chop(s, R - \{r\})$ is inserted. Figure 2-11 shows the different cases.

2.4.9 Complexity

The portions of the search, insert, and delete algorithms that deal only with PST1 and PST2 have the same asymptotic complexity as their counterparts for the case of nonintersecting ranges (Section 2.3). The portions that deal with the RBTs and endPointsTree require a constant number of search, insert, delete, join, and split


2.5 Experimental Results

2.5.1 Prefixes

We programmed our red-black priority-search tree algorithm for prefixes (Section 2.2) in C++ and compared its performance to that of the ACRBT of Sahni et al. [22]. Recall that the ACRBT is the best performing O(log n) data structure reported in [22] for dynamic prefix-tables. For test data, we used six IPv4 prefix databases obtained from [38]. The number of prefixes in each of these databases as well as the memory requirements for each database of prefixes using our data structure (PST) of Section 2.2 as well as the ACRBT structure of Sahni et al. [22] are


shown in Table 2-1. The databases Paix1, Pb1, MaeWest and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained Sept. 13, 2000. Figure 2-12 is a plot of the data of Table 2-1. As can be seen, the ACRBT structure takes almost three times as much memory as is taken by the PST structure. Further, the memory requirement of the PST structure can be reduced to about 50% that of our current implementation. This reduction requires an n-node implementation of a priority-search tree as described in [37] rather than our current implementation, which uses 2n - 1 nodes as in [39].

Figure 2-12: Memory usage

Table 2-1: Memory usage

Database              Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Num of Prefixes       16172  22225    28889  31827  35303  85988
Memory (KB)  PST        884   1215     1579   1740   1930   4702
             ACRBT     2417   3331     4327   4769   5305  12851

To obtain the mean time to find the longest matching-prefix (i.e., to perform a search), we started with a PST or ACRBT that contained all prefixes of a prefix database. Next, a random permutation of the set of start points of the ranges corresponding to the prefixes was obtained. This permutation determined the order in which we searched for the longest matching-prefix for each of these start points. The time required to determine all of these longest-matching prefixes was measured


Table 2-2: Prefix times on a 500MHz Sun Blade 100 workstation

Operation       Structure  Stat   Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search (μsec)   PST        Mean    2.88   3.06     3.25   3.31   3.43   4.06
                           Std     0.36   0.18     0.17   0.16   0.09   0.05
                ACRBT      Mean    2.60   2.77     2.87   2.87   3.09   3.51
                           Std     0.25   0.16     0.16   0.12   0.13   0.04
Insert (μsec)   PST        Mean    3.90   4.45     4.83   5.18   5.14   6.04
                           Std     0.57   0.63     0.51   0.48   0.19   0.20
                ACRBT      Mean   21.15  23.42    24.77  25.36  25.54  28.07
                           Std     1.11   0.66     0.38   0.29   0.19   0.18
Delete (μsec)   PST        Mean    4.36   4.45     4.73   4.71   5.06   5.48
                           Std     0.91   0.63     0.53   0.00   0.19   0.16
                ACRBT      Mean   21.24  22.68    23.16  23.71  24.56  25.64
                           Std     0.95   0.55     0.49   0.35   0.26   0.21

To obtain the mean time to insert a prefix, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. This experiment was repeated 20 times and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities are given in Table 2-2 for our Sun workstation. As can be seen, insertions into a PST take between 18% and 22% the time to insert into an ACRBT!

The mean and standard deviation data reported in Table 2-2 for the delete operation were obtained in a similar fashion by starting with a data structure that


Tables 2-3 and 2-4 give the corresponding times on a 700MHz Pentium III PC and a 1.4GHz Pentium 4 PC, respectively. Both computers have a 256KB L2 cache. The run times on our 700MHz Pentium III are about one-half the times on our Sun workstation. Surprisingly, when going from the 700MHz Pentium III to the 1.4GHz Pentium 4, the measured time to find the longest matching-prefix decreased by only about 5% for PST. More surprisingly, the corresponding times for ACRBT actually increased. The net result of the slight decrease in time for PST and the increase for ACRBT is that, on our Pentium 4 PC, the PST is faster than the ACRBT on all three operations: find longest matching-prefix, insert, and delete. This somewhat surprising behavior is due to architectural differences (e.g., differences in width and size of L1 cache lines) between the Pentium III and 4 processors.

Table 2-3: Prefix times on a 700MHz Pentium III PC

Operation       Structure  Stat   Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search (μsec)   PST        Mean    1.39   1.54     1.61   1.65   1.70   1.97
                           Std     0.27   0.22     0.17   0.14   0.00   0.04
                ACRBT      Mean    1.36   1.44     1.44   1.49   1.54   1.80
                           Std     0.25   0.18     0.13   0.14   0.14   0.06
Insert (μsec)   PST        Mean    2.41   2.63     2.60   2.83   2.80   3.07
                           Std     0.87   0.30     0.53   0.43   0.40   0.14
                ACRBT      Mean   11.97  12.63    13.48  13.62  13.77  14.93
                           Std     0.95   0.67     0.24   0.48   0.35   0.18
Delete (μsec)   PST        Mean    2.32   2.38     2.49   2.45   2.55   2.91
                           Std     0.82   0.61     0.52   0.47   0.00   0.17
                ACRBT      Mean   11.69  12.55    12.95  13.01  13.40  14.10
                           Std     0.87   0.63     0.54   0.44   0.48   0.16

Figures 2-13, 2-14, and 2-15 histogram the search, insert, and delete time data of the preceding tables.


Table 2-4: Prefix times on a 1.4GHz Pentium 4 PC

Operation       Structure  Stat   Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search (μsec)   PST        Mean    1.30   1.44     1.51   1.52   1.63   1.92
                           Std     0.19   0.18     0.17   0.13   0.13   0.06
                ACRBT      Mean    1.48   1.69     1.83   1.87   1.87   2.24
                           Std     0.31   0.20     0.16   0.07   0.14   0.05
Insert (μsec)   PST        Mean    1.76   1.96     2.18   2.17   2.38   2.65
                           Std     0.41   0.69     0.00   0.44   0.35   0.18
                ACRBT      Mean   11.22  11.81    12.41  12.91  12.92  13.94
                           Std     0.41   0.60     0.41   0.44   0.26   0.18
Delete (μsec)   PST        Mean    1.76   1.69     1.92   1.93   2.00   2.22
                           Std     0.41   0.60     0.38   0.21   0.42   0.17
                ACRBT      Mean    9.46  10.39    10.54  10.42  10.92  11.64
                           Std     0.57   0.63     0.38   0.21   0.42   0.16

2.5.2 Nonintersecting Ranges

To benchmark our algorithm for nonintersecting ranges (Section 2.3), we generated three different sets of random nonintersecting ranges. These, respectively, had


Table 2-5: Nonintersecting ranges. 700MHz PIII

Num of Ranges              30000  50000  80000
Memory Usage (KB)           3360   5600   8960
Search (μsec)   Mean        1.92   2.19   2.51
                Std         0.15   0.04   0.06
Insert (μsec)   Mean        8.65   9.27   9.88
                Std         0.49   0.29   0.17
Remove (μsec)   Mean        5.75   6.42   6.81
                Std         0.44   0.28   0.14

2.5.3 Conflict-free Ranges

Table 2-6 gives the memory required as well as the mean times and standard deviations for the case of conflict-free ranges. The range sequence used is generated so that when the ranges are inserted in sequence order, there are no conflicts. For deletion, 33% of the ranges are removed in the reverse of the insert order.

2.6 Conclusion

We have developed data structures for dynamic router tables. Our data structures permit one to search, insert, and delete in O(log n) time each. Although O(log n)


time data structures for prefix tables were known prior to our work [21,22], our data structure is more memory efficient than the data structures of Sahni et al. [21,22]. Further, our data structure is significantly superior on the insert and delete operations, while being competitive on the search operation.

For nonintersecting ranges and conflict-free ranges our data structures are the first to permit O(log n) search, insert, and delete.

Table 2-6: Conflict-free ranges. PIII 700MHz with 256K L2 cache

Num of Ranges in R                  30000  50000  80000
Num of Ranges in norm(R)   Mean     29688  48868  76472
                           Std      18.03  42.90  60.05
Memory Usage (KB)          Mean      6240   9979  15219
                           Std       7.06  10.91  11.19
Search (μsec)              Mean      1.98   2.34   2.69
                           Std       0.07   0.09   0.06
Insert (μsec)              Mean     18.45  19.65  20.76
                           Std       0.51   0.27   0.27
Remove (μsec)              Mean      19.3  20.49  21.60
                           Std       0.41   0.13   0.29


In this chapter, we focus on data structures for dynamic NHPRTs, HPPTs and LMPTs. In Section 3.2, we develop the data structure binary tree on binary tree (BOB). This data structure is proposed for the representation of dynamic NHPRTs. Using BOB, a lookup takes $O(\log^2 n)$ time and cache misses; a new rule may be inserted and an old one deleted in O(log n) time and cache misses. For HPPTs, we propose a modified version of BOB, PBOB (prefix BOB), in Section 3.3. Using PBOB, a lookup, rule insertion and deletion each take O(W) time and cache misses. In Section 3.4, we develop the data structure LMPBOB (longest matching-prefix BOB) for LMPTs. Using LMPBOB, the longest matching-prefix may be found in O(W) time and O(log n) cache misses; rule insertion and deletion each take O(log n) time and cache misses. On practical rule tables, BOB and PBOB perform each of the three dynamic-table operations in O(log n) time and with O(log n) cache misses. Section 3.1 introduces some terminology, and experimental results are presented in Section 3.6.

3.1 Preliminaries

Definition 13


Notice that every prefix of a prefix router-table may be represented as a range. For example, when W = 6, the prefix P = 1101* matches addresses in the range [52, 55]. So, we say P = 1101* = [52, 55], start(P) = 52, and finish(P) = 55.

Since a range represents a set of (contiguous) points, we may use standard set operations and relations such as $\cap$ and $\subset$ when dealing with ranges. So, for example, $[2, 6] \cap [4, 8] = [4, 6]$. Note that some operations between ranges may not yield a range. For example, $[2, 6] \cup [8, 10] = \{2, 3, 4, 5, 6, 8, 9, 10\}$, which is not a range.

Definition 14 The predicate disjoint(r, s) is true iff r and s are disjoint.

$disjoint(r, s) \iff overlap(r, s) = \emptyset \iff v$
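The prefix-to-range correspondence and the range intersection used in the examples above can be sketched as follows. This is a minimal illustration that assumes prefixes are given as bit strings and W is at most 32; the names `prefixToRange` and `intersect` are hypothetical and do not come from the dissertation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>

struct Range { std::uint32_t start, finish; };

// Range of W-bit addresses matched by a prefix written as a bit string,
// e.g. "1101" with W = 6 covers 110100..110111.
Range prefixToRange(const std::string& bits, unsigned W) {
    std::uint64_t value = 0;
    for (char c : bits) value = (value << 1) | (c == '1' ? 1u : 0u);
    unsigned freeBits = W - static_cast<unsigned>(bits.size());  // unspecified low-order bits
    std::uint64_t start = value << freeBits;                      // free bits all 0
    std::uint64_t finish = start | ((1ull << freeBits) - 1);      // free bits all 1
    return Range{static_cast<std::uint32_t>(start), static_cast<std::uint32_t>(finish)};
}

// Intersection of two ranges; empty when the ranges are disjoint.
std::optional<Range> intersect(Range r, Range s) {
    std::uint32_t lo = std::max(r.start, s.start);
    std::uint32_t hi = std::min(r.finish, s.finish);
    if (lo > hi) return std::nullopt;
    return Range{lo, hi};
}
```

With W = 6, the prefix 1101 has value 13 and two free bits, which gives start = 13 · 4 = 52 and finish = 52 + 3 = 55, in agreement with the example in the text.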


3.2 Nonintersecting Highest-Priority Rule-Tables (NHRTs) - BOB

3.2.1 The Data Structure

The data structure binary tree on binary tree (BOB) that is being proposed here for NHRTs comprises a single balanced binary search tree at the top level. This top-level balanced binary search tree is called the point search tree (PTST). For an n-rule NHRT, the PTST has at most 2n nodes (we call this the PTST size constraint). The size constraint is necessary to enable O(log n) update. With each node z of the PTST, we associate a point, point(z). The PTST is a standard red-black binary search tree (actually, any binary search tree structure that supports efficient search, insert, and delete may be used) on the point(z) values of its node set [24]. That



Table 3-1: A nonintersecting range set

range      priority
[...]      4
[2,4]      33
[2,3]      34
[8,68]     10
[8,50]     9
[10,50]    20
[10,35]    3
[15,33]    5
[16,30]    30
[54,66]    18
[60,65]    7
[69,72]    10
[80,80]    12

Let ranges(z) be the subset of ranges of R allocated to node z of the PTST. Since the PTST may have as many as 2n nodes and since each range of R is in exactly one of the sets ranges(z), some of the ranges(z) sets may be empty.

The ranges in ranges(z) may be ordered using the [...] range(x)). In addition, each node x stores the quantity mp(x), which is the maximum of the priorities of the ranges associated with the nodes in the subtree


where p(x) = priority(range(x)). Figure 3-2 gives a possible RST structure for ranges(30) of Figure 3-1. Each node shows (range(x), p(x), mp(x)).

Figure 3-2: An example RST for ranges(30) of Figure 3-1

Lemma 26
1. For every node y in the right subtree of x, $st(y) \ge st(x)$ and $fn(y) \le fn(x)$.
2. For every node y in the left subtree of x, $st(y) \le st(x)$ and $fn(y) \ge fn(x)$.

3.2.2 Search for

The highest-priority range that matches the destination address d may be found by following a path from the root of the PTST toward a leaf of the PTST. Figure 3-3 gives the algorithm. For simplicity, this algorithm finds hp = priority(hpr(d)) rather than hpr(d). The algorithm is easily modified to return hpr(d) instead.
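The root-to-leaf search can be illustrated with a deliberately simplified sketch. It is not the algorithm of Figure 3-3: the PTST is an unbalanced binary search tree, each RST is replaced by a plain vector that is scanned linearly, and the mp values are not used. The sketch relies only on the allocation rule that a range is stored in the node nearest the root whose point it matches, with a new node receiving point = start(r). All type and function names are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Rule { unsigned start, finish; int priority; };

struct PtstNode {
    unsigned point;
    std::vector<Rule> ranges;                 // simplified stand-in for RST(z)
    std::unique_ptr<PtstNode> left, right;
    explicit PtstNode(unsigned p) : point(p) {}
};

// Store r in the node nearest the root whose point it matches;
// if there is none, add a new leaf with point = start(r).
void insertRule(std::unique_ptr<PtstNode>& root, const Rule& r) {
    std::unique_ptr<PtstNode>* link = &root;
    while (*link) {
        PtstNode& z = **link;
        if (r.finish < z.point)      link = &z.left;
        else if (r.start > z.point)  link = &z.right;
        else { z.ranges.push_back(r); return; }   // start <= point(z) <= finish
    }
    *link = std::make_unique<PtstNode>(r.start);
    (*link)->ranges.push_back(r);
}

// Priority of the highest-priority rule matching d, or -1 if none matches.
// Every rule matching d lies on the search path for d, so one path suffices.
int lookup(const PtstNode* z, unsigned d) {
    int hp = -1;
    while (z) {
        for (const Rule& r : z->ranges)
            if (r.start <= d && d <= r.finish && r.priority > hp) hp = r.priority;
        if (d == z->point) break;             // no descendant range can contain point(z)
        z = (d < z->point) ? z->left.get() : z->right.get();
    }
    return hp;
}
```

A single path is enough because any ancestor a of the node storing a matching range r has point(a) outside r, so d and r lie on the same side of point(a).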



3.2.3 Insert a Range

A range r that is known to have no intersection with any of the existing ranges in the router table may be inserted using the algorithm of Figure 3-5. In the while loop, we find the node z nearest to the root such that r matches point(z) (i.e., $start(r) \le point(z) \le finish(r)$). If such a z exists, the range r is inserted into RST(z) using the standard red-black insertion algorithm [24]. During this insertion, it is necessary to update some of the mp values on the insert path. This update is done easily. In case the PTST has no z such that r matches point(z), we insert a new node into the PTST. This insertion is done using the method insertNewNode.

To insert a new node into the PTST, we first create a new PTST node y and define point(y) and RST(y). point(y) may be set to be any destination address matched by r (i.e., any address such that $start(r) \le point(y) \le finish(r)$). In our implementation, we use point(y) = start(r). RST(y) has only a root node and this root contains r; its mp value is priority(r). If the PTST is currently empty, y becomes the new root and we are done. Otherwise, the new node y may be inserted where the search conducted in the while loop of Figure 3-5 terminated. That




3.2.4 Red-Black-Tree Rotations

Figures 3-6 and 3-7, respectively, show the red-black LL and RR rotations used to rebalance a red-black tree following an insert or delete (see [24]). In these figures, pt() is an abbreviation for point(). Since the remaining rotation types, LR and RL, may, respectively, be viewed as an RR rotation followed by an LL rotation and an LL rotation followed by an RR rotation, it suffices to examine LL and RR rotations alone.

Figure 3-6: LL rotation

Figure 3-7: RR rotation

Lemma 27


Let x and y be as in Figures 3-6 and 3-7. From Lemma 27, it follows that ranges(z) = ranges'(z) for all z in the PTST except possibly for $z \in \{x, y\}$. It is not too difficult to see that $ranges'(y) = ranges(y) \cup S$ and $ranges'(x) = ranges(x) - S$, where

$S = \{r \mid r \in ranges(x) \wedge start(r) \le point(y) \le finish(r)\}$

The range rMax of S with largest start() value may be found by searching RST(x) for the range with largest start() value that matches point(y). (Note that rMax = msr(point(y), ranges(x)).) Since RST(x) is a binary search tree of an ordered set of ranges (Definition 18), rMax may be found in O(height(RST(x))) time by following a path from the root downward. If rMax doesn't exist, $S = \emptyset$, ranges'(x) = ranges(x) and ranges'(y) = ranges(y).
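The effect of a rotation on the two range sets can be sketched directly from the set equalities above. This illustration holds ranges(x) and ranges(y) in vectors and moves S in linear time; it does not attempt the logarithmic-time approach via rMax on the RSTs. The function name `redistribute` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Range { unsigned start, finish; };

// Move S = { r in rangesX | start(r) <= pointY <= finish(r) } from x to y,
// so that afterwards rangesY = ranges(y) U S and rangesX = ranges(x) - S.
void redistribute(std::vector<Range>& rangesX, std::vector<Range>& rangesY,
                  unsigned pointY) {
    auto matchesY = [pointY](const Range& r) {
        return r.start <= pointY && pointY <= r.finish;
    };
    // Gather the members of S at the front of rangesX, preserving relative order.
    auto endOfS = std::stable_partition(rangesX.begin(), rangesX.end(), matchesY);
    rangesY.insert(rangesY.end(), rangesX.begin(), endOfS);
    rangesX.erase(rangesX.begin(), endOfS);
}
```

When no range of x matches point(y), S is empty and both vectors are left unchanged, which mirrors the case in which rMax does not exist.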


Figure 3-8: ranges(x) and ranges(y) for LL and RR rotations. Nodes x and y are as in Figures 3-6 and 3-7

Assume that rMax exists. We may use the split operation [24] to extract from RST(x) the ranges that belong to S. The operation

RST(x)->split(small, rMax, big)

separates RST(x) into an RST small of ranges < (Definition 18) than rMax and an RST big of ranges > than rMax. We see that RST'(x) = big and RST'(y) = join(small, rMax, RST(y)), where join [24] combines the red-black tree small, the range rMax, and the red-black tree RST(y) into a single red-black tree.

The standard split and join operations of Horowitz et al. [24] need to be modified slightly so as to update the mp values of affected nodes. This modification doesn't affect the asymptotic complexity of the split and join operations, which is logarithmic in the number of nodes in the tree being split or logarithmic in the sum of the number of nodes in the two trees being joined. So, the complexity of performing an LL or RR rotation (and hence of performing an LR or RL rotation) in the PTST is O(log n).


3.2.5 Delete a Range

Figure 3-9 gives our algorithm to delete a range r. Note that if r is one of the ranges in the PTST, then r must be in the RST of the node z that is closest to the root and such that r matches point(z). The while loop of Figure 3-9 finds this z and deletes r from RST(z).

Assume that r is, in fact, one of the ranges in our PTST. To delete r from RST(z), we use the standard red-black deletion algorithm [24] modified to update mp values as necessary. Following the deletion of r from RST(z) we perform a cleanup operation that is necessary to maintain the size constraint of the PTST. Figure 3-10 gives the steps in the method cleanup.


Notice that following the deletion of r from RST(z), RST(z) may or may not be empty. If RST(z) becomes empty and the degree of node z is either 0 or 1, node z is deleted from the PTST using the standard red-black node deletion algorithm [24]. If this deletion requires a rotation (at most one rotation may be required) the rotation is done as described in Section 3.2.4. Since the number of ranges and nodes has each decreased by 1, the size constraint may be violated (this happens if |PTST| = 2|R| prior to the delete). Hence, it may be necessary to remove a node from the PTST to restore the size constraint.

If RST(z) becomes empty and the degree of z is 2 or if RST(z) does not become empty, z is not deleted from the PTST. Now, |PTST| is unchanged by the deletion of r and |R| reduces by 1. Again, it is possible that we have a size constraint violation. If so, up to two nodes may have to be removed from the PTST to restore the size constraint.

The size constraint, if violated, is restored in the while loop of Figure 3-10. This restoration is done by removing one or two (as needed) degree 0 or degree 1 nodes that have an empty RST. Lemma 28 shows that whenever the size constraint is violated, the PTST has at least one degree 0 or degree 1 node with an empty RST. So, the node z needed for deletion in each iteration of the while loop always exists.

Lemma 28


To find the degree 0 and degree 1 nodes that have an empty RST efficiently, we maintain a doubly-linked list of these nodes. Also, a doubly-linked list of degree 2 nodes that have an empty RST is maintained. When a range is inserted or deleted, PTST nodes may be added/removed from these doubly-linked lists and nodes may move from one list to another. The required operations can be done in O(1) time each.

3.2.6 Expected Complexity of BOB

Let maxR be the maximum number of ranges that match any destination address. So, $|ranges(z)| = |RST(z)| \le maxR$ for every node z of the PTST. We may, therefore, restate the complexity of the BOB operations (lookup, insert, delete) as O(log n log maxR), O(log n), and O(log n), respectively.

Sahni et al. [21] have analyzed the prefixes in several real IPv4 prefix router-tables. They report that a destination address is matched by about 1 prefix on average; the maximum number of prefixes that match a destination address is at most 6. Making the assumption that this analysis holds true even for real range router-tables (no data is available for us to perform such an analysis), we conclude that $maxR \le 6$. So, the expected complexity of BOB on real router-tables is O(log n) per operation.


3.3 Highest-Priority Prefix-Tables (HPPTs) - PBOB

3.3.1 The Data Structure

When all rule filters are prefixes, $maxR \le \min\{n, W\}$. Hence, if BOB is used to represent an HPPT, the search complexity is $O(\log n \cdot \min\{\log n, \log W\})$; the insert and delete complexities are O(log n) each.

Since $maxR \le 6$ for real prefix router-tables, we may expect to see better performance using a simpler structure (i.e., a structure with smaller overhead and possibly worse asymptotic complexity) for ranges(z) than the RST structure described in Section 3.2. In PBOB, we replace the RST in each node, z, of the BOB PTST with an array linear list [41], ALL(z), of pairs of the form (pLength, priority), where pLength is a prefix length (i.e., number of bits) and priority is the prefix priority. ALL(z) has one pair for each range $r \in ranges(z)$. The pLength value of this pair is the length of the prefix that corresponds to the range r and the priority value is the priority of the range r. The pairs in ALL(z) are in ascending order of pLength. Note that since the ranges in ranges(z) are nested and match point(z), the corresponding prefixes have different lengths.

3.3.2 Lookup

Figure 3-11 gives the algorithm to find the priority of the highest-priority prefix that matches the destination address d. The method maxp() returns the highest priority of any prefix in ALL(z) (note that all prefixes in ALL(z) match point(z)). The method searchALL(d,hp) examines the prefixes in ALL(z) and updates hp taking into account the priorities of those prefixes in ALL(z) that match d.

The method searchALL(d,hp) utilizes the following lemma. Consequently, it examines prefixes of ALL(z) in increasing order of length until either all prefixes have been examined or until the first (i.e., shortest) prefix that doesn't match d is examined.
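The scan performed by searchALL can be sketched as below. This is an illustration only, under the assumptions that W = 32, that a larger priority value means a higher priority, and that a prefix of length k stored at node z matches d exactly when point(z) and d agree on their k most significant bits. The free-function signatures differ from the member methods named in the text, and `agreeOnTopBits` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Pair { unsigned pLength; int priority; };

// True iff a and b agree on their k most significant bits (W = 32 assumed).
bool agreeOnTopBits(unsigned k, std::uint32_t a, std::uint32_t b) {
    if (k == 0) return true;                  // the length-0 prefix matches everything
    return ((a ^ b) >> (32 - k)) == 0;
}

// all:   pairs of ALL(z) in ascending order of pLength; every prefix matches point.
// Returns hp updated with the priorities of the prefixes in ALL(z) that match d.
int searchALL(const std::vector<Pair>& all, std::uint32_t point,
              std::uint32_t d, int hp) {
    for (const Pair& p : all) {
        // Once a prefix fails to match d, every longer prefix fails as well.
        if (!agreeOnTopBits(p.pLength, point, d)) break;
        if (p.priority > hp) hp = p.priority;
    }
    return hp;
}
```

The early exit is what bounds the work by the number of matching prefixes plus one, rather than by the full length of ALL(z).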




3.3.3 Insertion and Deletion

The PBOB algorithms to insert/delete a prefix are simple adaptations of the corresponding algorithms for BOB. rMax is found by examining the prefixes in ALL(x) in increasing order of length. ALL'(y) is obtained by prepending the prefixes in ALL(x) whose length is $\le$ the length of rMax to ALL(y), and ALL'(x) is obtained from ALL(x) by removing the prefixes whose length is $\le$ the length of rMax. The time required to find rMax is O(maxR). This is also the time required to compute ALL'(y) and ALL'(x). The overall complexity of an insert/delete operation is O(log n + maxR) = O(W).

As noted earlier, $maxR \le 6$ in practice. So, in practice, PBOB takes O(log n) time and makes O(log n) cache misses per operation.

3.4 Longest-Matching Prefix-Tables (LMPTs) - LMPBOB

3.4.1 The Data Structure

Using priority = pLength, a PBOB may be used to represent an LMPT obtaining the same performance as for an HPPT. However, we may achieve some reduction in the memory required by the data structure if we replace the array linear list that is stored in each node of the PTST by a W-bit vector, bit. bit(z)[i] denotes the ith bit of the bit vector stored in node z of the PTST; bit(z)[i] = 1 iff ALL(z) has a prefix whose length is i. We note that Suri et al. [20] use W-bit vectors to keep track of prefix lengths in their data structure also.


3.4.2 Lookup

Figure 3-12 gives the algorithm to find the length of the longest matching-prefix, lmp(d), for destination d. The method longest() returns the largest i such that bit(z)[i] = 1 (i.e., it returns the length of the longest prefix stored in node z). The method searchBitVector(d,hp,k) examines bit(z) and updates hp taking into account the lengths of those prefixes in this bit vector that match d. The method same(k+1, point(z), d) returns true iff point(z) and d agree on their k+1 most significant bits.
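The per-node bit vector and the two helper methods described above can be sketched as follows. The sketch assumes W = 32 and uses a `std::bitset` with W + 1 positions so that lengths 0 through W can each be recorded; that sizing is an assumption of this illustration, not a statement about the dissertation's layout. `longestMatch` is a hypothetical, straightforward scan and is not the searchBitVector method of Figure 3-13.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

constexpr unsigned W = 32;
using BitVec = std::bitset<W + 1>;   // position i set <=> a prefix of length i is stored

// Length of the longest prefix recorded in bit, or -1 if none is recorded.
int longest(const BitVec& bit) {
    for (int i = static_cast<int>(W); i >= 0; --i)
        if (bit[i]) return i;
    return -1;
}

// True iff a and b agree on their k most significant bits.
bool same(unsigned k, std::uint32_t a, std::uint32_t b) {
    if (k == 0) return true;
    return ((a ^ b) >> (W - k)) == 0;
}

// Length of the longest recorded prefix of point that also matches d, or -1.
int longestMatch(const BitVec& bit, std::uint32_t point, std::uint32_t d) {
    for (int i = static_cast<int>(W); i >= 0; --i)
        if (bit[i] && same(static_cast<unsigned>(i), point, d)) return i;
    return -1;
}
```

Recording or removing a prefix of length l in this representation is a single `bit.set(l)` or `bit.reset(l)` call.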


The method searchBitVector(d,hp,k) (Figure 3-13) utilizes the next two lemmas.

Lemma 31

3.4.3 Insertion and Deletion

The insert and delete algorithms are similar to the corresponding algorithms for HPPTs. The essential differences are as below.


1. Rather than insert or delete a prefix from an ALL(z), we set bit(z)[l], where l is the length of the prefix being inserted or deleted, to 1 or 0, respectively.
2. For a rotation, we do not look for rMax in bit(x). Instead, we find the largest integer iMax such that the prefix that corresponds to bit(x)[iMax] matches point(y). The first (bit 0 comes before bit 1) iMax bits of bit'(y) are the first iMax bits of bit(x) and the remaining bits of bit'(y) are the same as the corresponding bits of bit(y). bit'(x) is obtained from bit(x) by setting its first iMax bits to 0.

3.5 Implementation Details and Memory Requirement

3.5.1 Memory Management

We implemented our data structures in C++. Since dynamic memory allocation and deallocation using C++'s methods new and delete are very time consuming, we implemented our own methods to manage memory. We maintained our own list of free memory. Whenever this list was exhausted, we used the new method to get a large chunk of memory to add to our free list. Memory was then allocated from this large chunk as needed by our data structures. Whenever memory was to be deallocated, it was put back onto our free list.

3.5.2 BOB

As described in Section 3.2, each node z of the PTST of BOB has the following fields: color, point(z), RST, leftChild, and rightChild. To improve the lookup performance of BOB, we added the following fields: maxPriority (maximum priority of


With the added fields, each node of the PTST has 8 fields. For the color and maxPriority fields, we allocate 1 byte each. Assuming 4 bytes for each of the remaining fields, we get a node size of 26 bytes. For improved cache performance, it is desirable to align nodes to 4-byte memory-boundaries. This alignment is simplified if node size is an integral multiple of 4 bytes. Therefore, for practical purposes, the PTST node-size becomes 28 bytes.

In our implementation of hpRight (Figure 3-4), the while loop conditional was changed from x != null to x != null && mp > hp. A corresponding change was made to hpLeft.

The nodes of an RST have the following fields: color, mp, st, fn, p, leftChild, and rightChild. Using 1 byte for the color, p, and mp fields each, and 4 bytes for each of the remaining fields, the size of an RST node becomes 19 bytes. Again, for ease of alignment to 4-byte boundaries, we make the RST-node size 20 bytes. In addition to nodes, every nonempty RST has the fields root (pointer to root of RST) and rank (rank of red-black tree). Each of these fields is a 4-byte field.

For the doubly-linked lists of PTST nodes with an empty RST, we used the minSt and maxFn fields to, respectively, represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty RST.
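The node-size arithmetic above can be checked with a few compile-time constants. The sketch only reproduces the byte counts stated in the text (1-byte and 4-byte fields, rounded up to a multiple of 4); it does not model the actual struct layout, which on a present-day 64-bit compiler would differ because pointers are 8 bytes. The names `padTo4`, `ptstRawBytes`, and `rstRawBytes` are illustrative.

```cpp
#include <cassert>
#include <cstddef>

// Round a byte count up to the next multiple of 4.
constexpr std::size_t padTo4(std::size_t bytes) { return (bytes + 3) / 4 * 4; }

// PTST node of BOB: 8 fields, of which color and maxPriority take 1 byte each
// and the remaining 6 take 4 bytes each.
constexpr std::size_t ptstRawBytes = 2 * 1 + 6 * 4;   // 26

// RST node: color, p, mp take 1 byte each; st, fn, leftChild, rightChild take 4 each.
constexpr std::size_t rstRawBytes = 3 * 1 + 4 * 4;    // 19

static_assert(ptstRawBytes == 26 && padTo4(ptstRawBytes) == 28, "PTST node size");
static_assert(rstRawBytes == 19 && padTo4(rstRawBytes) == 20, "RST node size");
```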


Since an instance of BOB may have up to 2n PTST nodes, n nonempty RSTs, and n RST nodes, the maximum space/memory required by BOB is $28 \cdot 2n + 8n + 20n = 84n$ bytes.

3.5.3 PBOB

The required fields in each node z of the PTST of PBOB are: color, point(z), ALL, size, length, leftChild, and rightChild, where ALL is a one-dimensional array, each entry of which has the subfields pLength and priority; size is the dimension of the array, and length is the number of pairs currently in the array linear list. The array ALL initially has enough space to accommodate 4 pairs (pLength, priority). When the capacity of an ALL is exceeded, the size of the ALL is increased by 4 pairs (since at most 6 pairs are expected in an ALL, the size of an ALL needs to be increased at most once; in theory, an ALL may get as many as W pairs and, in theory, using array doubling as in [41] may work better than increasing the array size by 4 each time array capacity is exceeded).

To improve the lookup performance of PBOB, the field maxPriority (maximum priority of the prefixes in ALL(z)) may be added. Note that minSt (smallest starting point of the prefixes in ALL(z)) and maxFn (largest finish point of the prefixes in ALL(z)) are easily computed from point(z) and the pLength of the shortest (i.e., first) prefix in ALL(z). When the nodes of the PTST are augmented with a maxPriority field, the expression ALL(z)->maxp() in Figure 3-11 may be changed to maxPriority(z), and the statement ALL(z)->searchALL(d,hp) executed only when


Using 1 byte for each of the fields: color, size, length, maxPriority, pLength, and priority; and 4 bytes for each of the remaining fields, the initial size of a PTST node of PBOB is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty ALL, we used the 8 bytes of memory allocated to the empty array ALL to, respectively, represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty ALL.

Since an instance of PBOB may have up to 2n PTST nodes, the minimum space/memory required by these 2n PTST nodes is $24 \cdot 2n = 48n$ bytes. However, some PTST nodes may have more than 4 pairs in their ALL. There can be at most n/5 such nodes. So, the maximum space-requirement of PBOB is $48n + 8n/5 = 49.6n$ bytes.

3.5.4 LMPBOB

In the case of LMPBOB, each node z of the PTST has the following fields: color, point(z), bit, leftChild, and rightChild.

To improve the lookup performance of LMPBOB, the fields minLength (minimum of lengths of prefixes in bit(z)) and maxLength may be added. When the nodes of the PTST are augmented with a minLength and a maxLength field, we replace the statement bit(z)->searchBitVector(d,hp,k) of Figure 3-12 by


Using 1 byte for each of the fields: color, minLength, and maxLength; 8 bytes for bit (this analysis is for IPv4); and 4 bytes for each of the remaining fields, the size of a PTST node of LMPBOB is 23 bytes. Again, to easily align PTST nodes along 4-byte boundaries, we pad an LMPBOB PTST node so that its size is 24 bytes.

For the doubly-linked lists of PTST nodes with an empty bit vector, we used the 8 bytes of memory allocated to the empty bit vector bit to represent left and right pointers. So, there is no space overhead (other than the space needed to keep track of the first node) associated with maintaining the two doubly-linked lists of PTST nodes that have an empty bit.

Since an instance of LMPBOB may have up to 2n PTST nodes, the space/memory required by these 2n PTST nodes is $24 \cdot 2n = 48n$ bytes.

3.6 Experimental Results

3.6.1 Test Data and Memory Requirement

We implemented the BOB, PBOB, and LMPBOB data structures and associated algorithms in C++ as described in Section 3.5 and measured their performance on a 1.4GHz PC. To assess the performance of these data structures, we used six IPv4 prefix databases obtained from [38]. We assigned each prefix a priority equal to its length. Hence, BOB, PBOB, and LMPBOB were all used in a longest matching-prefix mode. For dynamic router-tables that use the longest matching-prefix tie breaker, the PST structure of Lu et al. [33,34] provides O(log n) lookup, insert, and delete. So, we included the PST in our experimental evaluation of BOB, PBOB, and LMPBOB.

The number of prefixes in each of our 6 databases as well as the memory requirement for each database of prefixes are shown in Table 3-2. For the memory


3.6.2 Preliminary Timing Experiments

We performed preliminary experiments to determine the effectiveness of the changes suggested in Section 3.5. Since these changes are only to the lookup algorithm, our preliminary timing experiments measured only the lookup times for the BOB, PBOB, and LMPBOB data structures. To obtain the mean lookup-time, we


started with a BOB, PBOB, or LMPBOB that contained all prefixes of a prefix database. Next, we created a list of the start points of the ranges corresponding to

Table 3-2: Memory usage

Database                   Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Num of Prefixes            16172  22225    28889  31827  35303  85988
Measure1 (KB)   PST          884   1215     1579   1740   1930   4702
                BOB          851   1176     1526   1682   1876   4527
                PBOB         357    495      642    708    790   1901
                LMPBOB       357    495      642    708    790   1901
Measure2 (KB)   PST          221    303      395    435    482   1175
                BOB          331    455      592    652    723   1760
                PBOB         189    260      338    372    413   1007
                LMPBOB       189    260      338    372    413   1007

Figure 3-14: Memory usage, measure 1

Figure 3-15: Memory usage, measure 2


For BOB, we found that omitting the predicates $d \le maxFn$ and $minSt \le d$ resulted in a mean lookup time that is approximately twice the base lookup time. On the other hand, elimination of the predicate maxPriority > hp reduces the mean lookup time by about 2%. Even though the use of the predicate maxPriority > hp increased the lookup time slightly on our test data, we believe this is a good heuristic for data sets in which the priorities are not highly correlated with the lengths of the prefixes or ranges. So, our remaining experiments retained this predicate. Eliminating the predicate mp > hp had no noticeable effect on mean lookup time. This is to be expected on our data sets, because for these data sets, the maximum value of |ranges(z)| is maxR = 6. The predicate mp > hp is expected to be effective on data sets with a larger value of maxR. So, we retained this predicate for our remaining tests.

For PBOB, elimination of the predicate hp

In the case of LMPBOB, the introduction of the statement hp = k = minLength into the base code results in a lookup time that is 15% less than when this statement is removed.

3.6.3 Run-Time Experiments

We measured the mean lookup-time as described in Section 3.6.2. The standard deviation in the average times across the 10 repetitions described in Section 3.6.2 was also computed. These mean times and standard deviations are reported in Table 3-3. The mean times are also histogrammed in Figure 3-16. It is interesting to note that PBOB, which can handle prefix tables with arbitrary priority assignments, is actually 20% to 30% faster than PST, which is limited to prefix tables that employ the longest matching-prefix tie breaker. Further, lookups in BOB, which can handle range tables with arbitrary priorities, are slightly slower than in PST. LMPBOB, which, like PST, is designed specifically for longest-matching-prefix lookups, is slightly inferior to the more general PBOB.

Figure 3-16: Search time

To obtain the mean insert-time, we started with a random permutation of the prefixes in a database, inserted the first 67% of the prefixes into an initially empty data structure, measured the time to insert the remaining 33%, and computed the mean insert time by dividing by the number of prefixes in 33% of the database. (Once again,


since the time to insert the remaining 33% of the prefixes was too small to measure accurately, we started with several copies of the data structure and inserted the 33% prefixes into each copy; measured the time to insert in all copies; and divided by the number of copies and number of prefixes inserted). This experiment was repeated 10 times, each time starting with a different permutation of the database prefixes, and the mean of the mean as well as the standard deviation in the mean computed. These latter two quantities as well as the number of copies of each data structure we used for the inserts are given in Table 3-3. Figure 3-17 histograms the mean insert-time. As can be seen, insertions into PBOB take between 40% and 60% less

Table 3-3: Prefix times on a 1.4GHz Pentium 4 PC with an 8K L1 data cache and a 256K L2 cache

Operation       Structure  Stat   Paix1    Pb1  MaeWest   Aads    Pb2  Paix2
Search (μsec)   PST        Mean    1.20   1.35     1.49   1.53   1.57   1.96
                           Std     0.01   0.01     0.04   0.01   0.00   0.01
                BOB        Mean    1.22   1.39     1.54   1.56   1.62   2.19
                           Std     0.01   0.02     0.02   0.02   0.02   0.01
                PBOB       Mean    0.82   0.98     1.10   1.15   1.20   1.60
                           Std     0.01   0.01     0.01   0.01   0.01   0.01
                LMPBOB     Mean    0.87   1.03     1.17   1.21   1.27   1.69
                           Std     0.01   0.01     0.01   0.01   0.01   0.01
Insert (μsec)   PST        Mean    2.17   2.35     2.53   2.60   2.64   3.03
                           Std     0.07   0.04     0.03   0.01   0.05   0.01
                BOB        Mean    1.70   1.89     2.06   2.10   2.16   2.55
                           Std     0.06   0.06     0.05   0.05   0.05   0.03
                PBOB       Mean    1.04   1.25     1.39   1.44   1.51   1.93
                           Std     0.06   0.05     0.00   0.05   0.05   0.06
                LMPBOB     Mean    1.06   1.29     1.47   1.50   1.57   1.98
                           Std     0.07   0.07     0.06   0.06   0.04   0.01
Delete (μsec)   PST        Mean    1.72   1.87     2.06   2.09   2.11   2.48
                           Std     0.04   0.05     0.05   0.06   0.04   0.06
                BOB        Mean    1.04   1.13     1.26   1.27   1.32   1.69
                           Std     0.06   0.05     0.04   0.05   0.06   0.06
                PBOB       Mean    0.68   0.82     0.90   0.91   0.97   1.30
                           Std     0.07   0.06     0.05   0.06   0.03   0.05
                LMPBOB     Mean    0.67   0.82     0.89   0.92   0.95   1.26
                           Std     0.06   0.06     0.05   0.05   0.03   0.05
Num of Copies                        15     11        9      8      8      3


Figure 3-17: Insert time

The mean and standard deviation data reported for the delete operation in Table 3-3 and Figure 3-18 were obtained in a similar fashion by starting with a data structure that had 100% of the prefixes in the database and measuring the time to delete a randomly selected 33% of these prefixes. Deletion from PBOB takes less than 50% the time required to delete from a PST. For the delete operation, however, LMPBOB is slightly faster than PBOB. Deletions from BOB take about 40% less time than do deletions from PST.

3.7 Conclusion

Table 3-4 gives the worst-case memory required by each of the data structures. The data of this table are for IPv4. When comparing these memory requirement data, we should keep in mind that BOB, PBOB, and LMPBOB have different capabilities. BOB works for highest-priority matching with nonintersecting ranges; PBOB is limited to highest-priority matching with prefixes; and LMPBOB is limited


Figure3{18:Deletetimetolongest-lengthmatchingwithprexes.ThePSTstructureofLuetal.[33]hasthesamerestrictionsasdoesLMPBOB. Table3{4:Nodesizesandworst-casememoryrequirementinbytesforIPv4routertables. BOB PBOB LMPBOB PST NodeSize PTST(28)RST(20) 24 28 MemoryRequired 84n Table3{5:Timecomplexity BOB PBOB LMPBOB PST Search Insert Delete


Table 3-6: Cache misses
Columns: BOB, PBOB, LMPBOB, PST
Rows: Search, Insert, Delete

Our experiments show that PBOB is to be preferred over PST and LMPBOB for the representation of dynamic longest-matching prefix-router-tables. This is somewhat surprising because PBOB may be used for highest-priority prefix-router-tables, not just longest-matching prefix-router-tables. A possible reason why PBOB is faster than LMPBOB is that in LMPBOB one has to check $O(W)$ prefix lengths, whereas in PBOB $O(maxR)$ lengths are checked (note that in our test databases, $W = 32$ and $maxR \le 6$). BOB is slower than and requires more memory than PBOB when tested with longest-matching prefix-router-tables. The same relative performance between BOB and PBOB is expected when filters are prefixes with arbitrary priority. Of the data structures considered in this chapter, BOB, of course, remains the only choice when the filters are ranges that have an associated priority.

Although the range allocation rule used by our data structures is similar to that used in an interval tree [40], the unique feature of our structures is the $2n$ size constraint. The size constraint is essential for $O(\log n)$ update.


In this chapter, we focus on B-tree data structures for dynamic NHPRTs and LMPTs. We are interested in the B-tree because, by varying the order of the B-tree, we can control the height of the tree and hence control the number of cache misses incurred when performing a rule-table operation. Although Suri et al. [20] have proposed a B-tree data structure for dynamic prefix-tables, their structure has the following shortcomings:

1. A prefix may be stored in $O(m)$ nodes at each level of the order $m$ B-tree. This results in excessive cache misses during the insert and delete operations.
2. Some of the prefix endpoints are stored twice in the B-tree. This is because every endpoint is stored in a leaf node and some of the endpoints are additionally stored in interior nodes. This duplication in endpoint storage increases the memory requirement.

Our proposed B-tree structure does not suffer from these shortcomings. In our structure, each prefix is stored in $O(1)$ nodes at each level, and each prefix endpoint is stored once. Consequently, even though the asymptotic complexity of performing dynamic prefix-table operations is the same in both structures and the asymptotic memory requirements of both are the same, our structure is faster for the insert and delete operations and also takes less memory.

In Section 4.1, we develop our B-tree data structure, PIBT (prefix in B-tree), for dynamic prefix-tables. Our B-tree structure for non-intersecting ranges, RIBT (range in B-tree), is developed in Section 4.2. Experimental results comparing the performance of our PIBT structure, the multiway range tree (MRT) structure of Suri et al. [20], and the best binary tree structure for dynamic prefix-tables, PBOB [35], are presented in Section 4.3.

Table 4-1: An example prefix set R (W = 5)

Prefix Name   Prefix   Range Start   Range Finish
P1            001*     4             7
P2            00*      0             7
P3            1*       16            31
P4            01*      8             15
P5            10111    23            23
P6            0*       0             15

4.1 Longest-Matching Prefix-Tables (LMPT)

4.1.1 The Prefix In B-Tree Structure (PIBT)

A range $r = [u, v]$ is a pair of addresses $u$ and $v$, $u \le v$. The range $r$ represents the addresses $\{u, u+1, \ldots, v\}$. $start(r) = u$ is the start point of the range and $finish(r) = v$ is the finish point of the range. The range $r$ matches all addresses $d$ such that $u \le d \le v$. Every prefix of a prefix router-table may be represented as a range. For example, when $W = 5$, the prefix $p = 100*$ matches addresses in the range $[16, 19]$. So, we say $p = 100* = [16, 19]$, $start(p) = 16$, and $finish(p) = 19$. The length of $p$ is 3. Table 4-1 shows a prefix set and the ranges of the prefixes.

The set of start and finish points of a collection $P$ of prefixes is the set of endpoints, $E(P)$, of $P$. When $|P| = n$, $|E(P)| \le 2n$. Although our PIBT structure and the MRT structure of Suri et al. [20] store the endpoints $E(P)$ together with additional information in a B-tree [41], each structure uses a different variety of B-tree. Our PIBT structure uses a B-tree in which each key (endpoint) is stored exactly once, while the MRT uses a B-tree in which each key is stored once in a leaf node and some of the keys are additionally stored in interior nodes. Figure 4-1 shows a possible order-3 B-tree for the endpoints of the prefix set of Table 4-1. In this example, each endpoint is stored in exactly one node. This example B-tree is a possible B-tree for PIBT but not for MRT.

Figure 4-1: B-tree for the endpoints of the prefixes of Table 4-1

Figure 4-2 shows a possible order-3 B-tree in which each endpoint is stored in exactly one leaf node and some endpoints are also stored in interior nodes. This example B-tree is a possible B-tree for MRT but not for PIBT.

Figure 4-2: Alternative B-tree for Figure 4-1

With each node $x$ of a PIBT B-tree, we associate an interval $int(x)$ of the destination address space $[0, 2^W - 1]$. The interval $int(root)$ associated with the root of the B-tree is $[0, 2^W - 1]$. Let $x$ be a B-tree node that has $t$ keys. The format of this node is: $t, child_0, (key_1, child_1), \ldots, (key_t, child_t)$
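The prefix-to-range mapping and the endpoint set $E(P)$ defined at the start of Section 4.1.1 can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function names `prefix_to_range` and `endpoints` are hypothetical and do not come from the dissertation, and prefixes are written as bit strings with an optional trailing `*`.

```python
def prefix_to_range(prefix, W):
    """Map a binary prefix such as '001*' to its (start, finish) address range.

    Hypothetical helper: the prefix is a bit string, optionally ending in '*'.
    """
    bits = prefix.rstrip("*")
    if len(bits) > W or any(b not in "01" for b in bits):
        raise ValueError("prefix must be at most W binary digits")
    free = W - len(bits)                      # number of unspecified low-order bits
    start = (int(bits, 2) << free) if bits else 0
    finish = start | ((1 << free) - 1)        # set all unspecified bits to 1
    return start, finish


def endpoints(prefixes, W):
    """Return the sorted set E(P) of distinct start and finish points."""
    points = set()
    for p in prefixes:
        points.update(prefix_to_range(p, W))
    return sorted(points)


# The prefix set of Table 4-1 (W = 5).
TABLE_4_1 = ["001*", "00*", "1*", "01*", "10111", "0*"]
```

Applied to the six prefixes of Table 4-1, the sketch yields eight distinct endpoints, within the bound $|E(P)| \le 2n = 12$.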



4.1.2 Finding The Longest Matching-Prefix

As in [20], we determine only the length of the longest prefix that matches a given destination address $d$. From this length and $d$, the longest matching-prefix, $lmp(d)$, is easily computed. The PIBT search algorithm (Figure 4-3) employs the following lemma.

Lemma 33

The PIBT search algorithm first constructs a $W$-bit vector matchVector. When the router table has no prefix whose start or finish endpoint equals the destination address $d$, the constructed bit vector satisfies $matchVector[l] = 1$ iff there is a length $l$ prefix that matches $d$. Otherwise, $matchVector[l] = 1$ iff there is a length $l$ prefix whose start or finish endpoint equals $d$. The maximum $l$ such that $matchVector[l] = 1$ is the length of $lmp(d)$.

Complexity Analysis. Each iteration of the while loop takes $O(\log_2 m)$ time (we assume throughout this paper that, for sufficiently large $m$, a B-tree node is searched using a binary search) and the number of iterations is $O(\log_m n)$. The largest $l$ such that $matchVector[l] = 1$ may be found in $O(\log_2 W)$ time by performing $O(\log_2 W)$ operations on the $W$-bit vector matchVector. So, the overall complexity is $O(\log_2 m \cdot \log_m n + \log_2 W) = O(\log_2 n + \log_2 W)$.
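The last step of the search, finding the largest $l$ with $matchVector[l] = 1$ using $O(\log_2 W)$ operations, can be sketched as a binary search over bit positions. This is a minimal illustration, not the dissertation's code: the name `longest_match_length` is hypothetical, and the vector is modelled as a Python integer whose bit $l$ stands for $matchVector[l]$, with no bits set above position $W$.

```python
def longest_match_length(match_vector, W):
    """Return the largest l in [0, W] whose bit is set, or -1 if no bit is set.

    Assumption: bit l of the integer match_vector plays the role of
    matchVector[l], and no bit above position W is set.
    """
    if match_vector == 0:
        return -1
    lo, hi = 0, W                     # invariant: the highest set bit lies in [lo, hi]
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if match_vector >> mid:       # some bit at position >= mid is set
            lo = mid
        else:                         # every set bit is below mid
            hi = mid - 1
    return lo
```

The loop halves the candidate interval on every pass, so it performs about $\log_2 W$ shift-and-test operations; when $W$ fits in a machine word each of these is a constant-time operation.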


4.1.3 Inserting A Prefix

To insert a prefix $p$, we must do the following:

1. Insert $start(p)$ into the PIBT and update the corresponding equal vector. If $start(p)$ is already in the PIBT, only the corresponding equal vector is to be updated.
2. Insert $finish(p)$ (provided, of course, that $finish(p) \ne start(p)$) into the PIBT and update the corresponding equal vector. If $finish(p)$ is already in the PIBT, only the corresponding equal vector is to be updated.
3. Update the interval vectors so as to satisfy the prefix allocation rule.

4.1.4 Inserting an endpoint

The algorithm to insert an endpoint $u$ into the PIBT is an adaptation of the standard B-tree insertion algorithm [41]. We search the PIBT for a key equal to $u$. In case $u$ is already in the PIBT, the associated equal vector is updated to account for the new prefix $p$ that begins or ends at $u$ and whose length equals



Figure 4-4: Node splitting

Algorithm insertEndPoint$(u, x, i)$ (Figure 4-5) inserts the endpoint $u$ into the leaf node $x$ of the PIBT and performs node splits as needed. It is assumed that $x.key_{i-1}$

4.1.5 Update interval vectors

Following the insertion of the endpoints of the new prefix $p$, the interval vectors in the nodes of the PIBT need to be updated to account for the new prefix $p$. The prefix allocation rule leads to the interval update algorithm of Figure 4-6. On initial invocation, $x$ is the root of the PIBT. The interval update algorithm assumes that $p$ is not the default prefix that matches all destination addresses (this prefix, if present, may be stored outside the PIBT and handled as a special case).

Figure 4-7 shows a possible set of nodes visited by $x$.

Complexity Analysis. Algorithm updateIntervals visits at most 2 nodes on each level of the PIBT and at each node, $O(m)$ time is spent. So, the complexity of updateIntervals is $O(m \log_m n)$. This algorithm accesses $O(\log_m n)$ nodes of the PIBT.


Figure 4-7: Nodes visited when updating intervals

Combining the complexities of all parts of the algorithm to insert a prefix, we see that the time to insert is $O((m + \log_2 W) \log_m n)$ and that the number of nodes accessed during a prefix insertion is $O(\log_m n)$.

4.1.6 Deleting A Prefix

To delete a prefix $p$, we do the following:

1. Remove $p$ from all interval vectors that contain $p$.
2. Update the equal vector for $start(p)$ and remove $start(p)$ from the PIBT if its equal vector is now zero.
3. If $start(p) \ne finish(p)$, update the equal vector for $finish(p)$ and remove $finish(p)$ from the PIBT if its equal vector is now zero.

The first of these steps (i.e., removing $p$ from interval vectors) is almost identical to the corresponding step (Figure 4-6) for prefix insertion. The only difference is that instead of setting $x.interval_q[length(p)]$ to 1, we now set it to 0. The time complexity of this step remains $O(m \log_m n)$ and this step accesses $O(\log_m n)$ nodes of the PIBT.

The B-tree key deletion algorithm of Sahni et al. [41] considers two cases:

1. The key to be deleted is in a leaf node.
2. The key to be deleted is in an interior (i.e., non-leaf) node.
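The bookkeeping on equal vectors described for prefix insertion and deletion can be sketched in isolation. This is a simplified illustration under explicit assumptions: the class `EndpointTable` is hypothetical, a Python dictionary stands in for the B-tree of endpoints, bit $l$ of an equal vector is taken to mean that a length-$l$ prefix starts or finishes at that endpoint, and interval vectors, node splits, borrows, and merges are all omitted.

```python
class EndpointTable:
    """Hypothetical stand-in for the PIBT key set: endpoint -> equal vector."""

    def __init__(self, W):
        self.W = W
        self.equal = {}               # endpoint -> integer bit vector

    def _endpoints(self, prefix):
        bits = prefix.rstrip("*")
        free = self.W - len(bits)
        start = (int(bits, 2) << free) if bits else 0
        finish = start | ((1 << free) - 1)
        # A set collapses start and finish when they coincide (length-W prefix).
        return len(bits), {start, finish}

    def insert(self, prefix):
        length, points = self._endpoints(prefix)
        for u in points:
            # New endpoint: create its vector; existing endpoint: only set the bit.
            self.equal[u] = self.equal.get(u, 0) | (1 << length)

    def delete(self, prefix):
        # Assumes the prefix was inserted earlier.
        length, points = self._endpoints(prefix)
        for u in points:
            remaining = self.equal[u] & ~(1 << length)
            if remaining:
                self.equal[u] = remaining
            else:
                del self.equal[u]     # equal vector is now zero: drop the endpoint
```

With the prefixes of Table 4-1 inserted, deleting P1 = 001* removes endpoint 4 altogether, while endpoint 7 survives because P2 = 00* still finishes there.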


4.1.7 Deleting from a Leaf Node

To delete the endpoint $u$, we first search the PIBT for the node $x$ that contains this endpoint. Suppose that $x$ is a leaf of the PIBT and that $u = x.key_i$. Since $u$ is an endpoint of no prefix, $x.interval_{i-1} = x.interval_i$ and $x.equal_i = 0$. $key_i$, $x.interval_i$, $x.equal_i$, and $x.child_i$ are removed from node $x$, and the keys to the right of $key_i$ together with the associated interval, equal, and child values are shifted one position left. If the number of keys that remain in $x$ is at least $\lceil m/2 \rceil$ (2 in case $x$ is the root), we are done. Otherwise, node $x$ is deficient and we do the following:

1. If a nearest sibling of $x$ has more than $\lceil m/2 \rceil$ keys, $x$ gains/borrows a key via this nearest sibling and so is no longer deficient.
2. Otherwise, $x$ merges with its nearest sibling. The merge may cause $px = parent(x)$ to become deficient, in which case this deficiency resolution process is repeated at $px$.

4.1.8 Borrow from a Sibling

Suppose that $x$'s nearest left sibling $y$ has more than $\lceil m/2 \rceil$ keys (Figure 4-8). Let $key_{t(y)}$ be the largest (i.e., rightmost) key in $y$ and let $px.key_i$ be the key in $px$ such that $px.child_{i-1} = y$ and $px.child_i = x$ (i.e., $px.key_i$ is the key in $px$ that is between $y$ and $x$).

Figure 4-8: Borrow from right sibling

The borrow operation does the following:

1. In $px$, $key_i$ and $equal_i$ are replaced by $key_{t(y)}$ and its associated equal vector.
2. In $x$, all keys and associated vectors and child pointers are shifted one right; $y.child_{t(y)}$, $y.interval_{t(y)}$, $px.key_i$, and $px.equal_i$, respectively, become $x.child_0$, $x.interval_0$, $x.key_1$, $x.equal_1$.
3. From the intervals of $y$, remove the prefixes that include the range $[px.key_{i-1}, key_{t(y)}]$ and add these removed prefixes to $px.interval_{i-1}$.
4. From $px.interval_i$, remove those prefixes that do not include the range $[key_{t(y)}, px.key_{i+1}]$ and add these removed prefixes to the intervals of $x$ other than $x.interval_0$.
5. To $x.interval_0$ (formerly $y.interval_{t(y)}$) add all prefixes originally in $px.interval_{i-1}$. Next, remove from $x.interval_0$ those prefixes that contain the range $[key_{t(y)}, px.key_{i+1}]$. Since these removed prefixes are already included in $px.interval_i$, they are not to be added again.

One may verify that following the borrow operation, we have a properly structured PIBT. Further, since the prefixes of $interval_i$ that do not include a given range may be found in $O(\log_2 W)$ time using a binary search on prefix length, the time complexity of the borrow operation is $O(m + \log_2 W)$ and the borrow operation accesses 3 nodes.

4.1.9 Merging Two Adjacent Siblings

When node $x$ is deficient and its nearest sibling $y$ has exactly $\lceil m/2 \rceil - 1$ keys, nodes $x$, $y$ and the in-between key, $px.key_i$, in the parent $px$ of $x$ are combined into a single node. The resulting single node has $2\lceil m/2 \rceil - 2$ keys. Figure 4-9 shows the situation when $y$ is the nearest right sibling of $x$.

The steps in the merge of $x$ and $y$ are:

1. The prefixes in $px.interval_{i-1}$ that do not include the range $[px.key_{i-1}, px.key_{i+1}]$ are removed from $px.interval_{i-1}$ and added to the intervals of $x$.
2. The prefixes in $px.interval_i$ that do not include the range $[px.key_{i-1}, px.key_{i+1}]$ are added to the intervals of $y$. $px.interval_i$ is removed from $px$.


3. Remove $px.key_i$ and its associated equal vector from $px$ and append to the right of $x$. Next, append the contents of $y$ to the right of the new $x$.

Figure 4-9: Merge two nodes

Since the merging of $x$ and $y$ reduces the number of keys in $px$ by 1, $px$ may become deficient. If so, the borrow/merge process is repeated at $px$. In this way, the deficiency may be propagated all the way up to the root. In case the root becomes deficient, it has no keys and so is discarded.

Complexity Analysis. Since the prefixes of $interval_i$ that do not include a given range may be found in $O(\log_2 W)$ time using a binary search on prefix length, two nodes may be merged in $O(m + \log_2 W)$ time; the number of nodes accessed during the merge is 3.

4.1.10 Deleting from a Non-leaf Node

To delete the endpoint $u = x.key_i$ from the non-leaf node $x$, $u$ is replaced by either the largest key in the subtree $x.child_{i-1}$ or the smallest key in the subtree $x.child_i$ [41]. Let $y.key_{t(y)}$ be the largest key in the subtree $x.child_{i-1}$ (Figure 4-10).

When $u$ is replaced by $y.key_{t(y)}$, it is necessary also to replace $x.equal_i$ by $y.equal_{t(y)}$. Before proceeding to remove $y.key_{t(y)}$ from the leaf node $y$, we need to adjust the interval values of nodes on the path from $x$ to $y$. Let $z$, $z \ne x$, be a node on the path from $x$ to $y$. As a result of the relocation of $y.key_{t(y)}$, $int(z)$ shrinks from $[start(int(z)), u]$ to $[start(int(z)), y.key_{t(y)}]$. So, prefixes that include the range


$[start(int(z)), key_{t(y)}]$ but not the range $[start(int(z)), u]$ are to be removed from the intervals of $z$ and added to the parent of $z$. Since there are no endpoints between $y.key_{t(y)}$ and $u = x.key_i$, these prefixes that are to be removed from the intervals of $z$ must have $y.key_{t(y)}$ as an endpoint (in particular, these prefixes finish at $y.key_{t(y)}$). Hence, these prefixes are in $y.equal_{t(y)}$, and so, the number of these prefixes is $O(W)$. As we retrace the path from $y$ to $x$, the bit vectors for the set of prefixes to be removed from each node may be constructed in total time $O(\log_m n + W)$. As it takes $O(m)$ time to remove the desired prefixes from the intervals of each node $z$ and to add to the parent of $z$, the total time needed to update the interval values for all nodes on the path from $x$ to $y$ (including nodes $x$ and $y$) is $O(m \log_m n + W)$.

Figure 4-10: Deleting $x.key_i$

Let $v$ be the leftmost leaf node in the subtree $x.child_i$. For each node $z$, $z \ne x$, on the path from $x$ to $v$, $z.int$ expands from $[u, finish(int(z))]$ to $[y.key_{t(y)}, finish(int(z))]$. Since there is no prefix that has $u$ as its endpoint and since there are no endpoints between $u$ and $y.key_{t(y)}$, no interval vectors on the path from $x$ to $v$ are to be changed.

Complexity Analysis. Adding together the complexity of each step of the deletion algorithm, we get $O((m + \log_2 W) \log_m n + W)$ as the overall time complexity of the delete operation. The number of node accesses is $O(\log_m n)$.

The time complexity of the delete operation becomes $O(m \log_m n + W)$ when the search for prefixes that do not match a given range (this is done when two adjacent


Figure 4-11: Merging adjacent siblings

Unfortunately, this serial search strategy for the non-matching prefixes cannot be adapted to find, in $O(W)$ time, all the matching prefixes required during the insert operation.

4.1.11 Cache-Miss Analysis

The number of cache misses that occur during the lookup operation is approximately the same for PIBTs and MRTs. A worst-case lookup will examine $\log_{m/2} n$ nodes. If a binary search is used in each examined node to determine which subtree to move to, the examination of each node will cause about $\log_2(mW/(8b))$ cache misses ($b$ is the size, in bytes, of a cache line). So, the worst-case number of cache misses


For the insert operation, we count only the number of read misses (since write misses are non-blocking, these do not affect performance as much as the read misses do). Let $s$ be the size, in number of cache lines, of an MRT node. To split a node, we must read at least the right half of that node. For simplicity, we assume that the entire node is read. With this assumption, our cache-miss count will also account for cache misses that occur on the downward search pass of an insert operation. The total number of nodes that get split during an insert may be as high as $h$, where $h$ is the height of the MRT. So, the worst-case number of cache misses exclusive of those needed to update information in nodes not accessed by the search and split steps is $hs$. Besides maintaining the B-tree properties of the MRT, an insert must update the span vector (defined in [20]) stored in each of the children of a node that gets split. This requires $(m-1)h \approx mh$ span vectors to be updated at an additional cost of $mh$ cache misses (each span vector is assumed to fit in a cache line). So, the worst-case number of cache misses is approximately $h(s + m)$.

The nodes of the PIBT structure are approximately twice as large as those of the MRT. Since the worst-case heights of the PIBT and MRT are almost the same, the number of cache misses during the downward search pass and the upward node-split pass is at most $2hs$. No new nodes are accessed to update interval vectors. So, $2hs$ is a bound for the entire insert operation. Since $s \approx 8m/b$, the ratio of the worst-case misses for MRT and PIBT is approximately $(b + 8)/16$. When $b = 32$ (as it is for a PC), this ratio is 2.5. That is, the MRT will make 2.5 times as many cache misses, in the worst case, as will the PIBT during an insert operation.
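The ratio $(b + 8)/16$ follows from dividing the MRT bound $h(s + m)$ by the PIBT bound $2hs$ and substituting $s \approx 8m/b$. The arithmetic can be checked with a small sketch; the function names below are hypothetical and the formulas are exactly the approximations stated in the text, not measured values.

```python
def mrt_insert_misses(h, s, m):
    """Approximate worst-case read misses for an MRT insert: h(s + m)."""
    return h * (s + m)


def pibt_insert_misses(h, s):
    """Worst-case bound for a PIBT insert: 2hs, with s the MRT node size in lines."""
    return 2 * h * s


def miss_ratio(b, m=32, h=1):
    """Ratio of MRT to PIBT worst-case misses, using s ~ 8m/b cache lines."""
    s = 8 * m / b
    return mrt_insert_misses(h, s, m) / pibt_insert_misses(h, s)
```

Both the height $h$ and the order $m$ cancel in the ratio, which is why the result depends only on the cache-line size $b$.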


The PBOB of Lu et al. [35] makes $2\log_2 n$ cache misses during a worst-case insert. Since $2hs \approx (16m/b)\log_{m/2} n = 4\log_2 n$ when $m = b = 32$, an order 32 PIBT makes twice as many cache misses during a worst-case insert as does the PBOB.

The analysis for the delete operation is almost identical, and the cache-miss counts are the same as for the insert operation.

4.2 Highest-Priority Range-Tables

In this section, we extend the PIBT structure to obtain a B-tree-based data structure called RIBT (range in B-tree). The RIBT structure is for dynamic router-tables whose filters are non-intersecting ranges.

4.2.1 Preliminaries

Definition 19

$[4, 4]$ is the overlap of $[2, 4]$ and $[4, 6]$; and $overlap([3, 8], [2, 4]) = [3, 4]$. $[2, 4]$ and $[6, 9]$ are disjoint; $[2, 4]$ and $[3, 4]$ are nested; $[2, 4]$ and $[2, 2]$ are nested; $[2, 8]$ and $[4, 6]$ are nested; $[2, 4]$ and $[4, 6]$ intersect; and $[3, 8]$ and $[2, 4]$ intersect.

Definition 20

If $R$ has $|R| + 1$ distinct endpoints, then $[s, f] \in R$.
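The examples listed under Definition 19 can be reproduced with a short sketch. The definitions used here are an assumption inferred from those examples: two ranges are disjoint when they share no address, nested when one contains the other, and intersecting when they overlap but neither contains the other. The function names `overlap` and `relation` are hypothetical.

```python
def overlap(r, s):
    """Return the common sub-range of r and s as (lo, hi), or None if empty."""
    lo, hi = max(r[0], s[0]), min(r[1], s[1])
    return (lo, hi) if lo <= hi else None


def relation(r, s):
    """Classify two ranges as 'disjoint', 'nested', or 'intersect'."""
    common = overlap(r, s)
    if common is None:
        return "disjoint"
    if common == r or common == s:    # one range lies entirely inside the other
        return "nested"
    return "intersect"
```

Under this reading, a set of non-intersecting ranges is one in which every pair of ranges is classified as either disjoint or nested.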


Part (b) follows from the proof of Case 1. If $[s, f] \notin R$, $R$ has at least $|R| + 2$ distinct endpoints.

4.2.2 The Range In B-Tree Structure (RIBT)

The RIBT is an extension of the PIBT structure to the case of an NHPRT. As in the PIBT structure, we maintain a B-tree of distinct range-endpoints. Let $x$ be a node of the RIBT B-tree. $x.int$ and $x.int_i$ are defined as for the case of the PIBT B-tree. With each endpoint $x.key_i$ in node $x$, we keep a max-heap, $equalH_i$, of ranges that have $x.key_i$ as an endpoint. As in the case of the PIBT, the default



$key_1, key_2, \ldots, key_t$

where $hpr_s$ is the highest-priority range in $set(x)$ that matches $x.int_s$, $equalHptr_s$ is a pointer to $equalH_s$, and $intervalHptr_s$ is a pointer to the intervalH max-heap whose index is $(i_s, j_s)$.

Since the total number of endpoints is at most $2n$ and since each B-tree node other than the root has at least $\lceil m/2 \rceil - 1$ keys (keys are range endpoints), the number of B-tree nodes in an RIBT is $O(n/m)$. Each B-tree node takes $O(m)$ memory exclusive of the memory required for the max heaps. So, exclusive of the max-heap memory, we need $O(n)$ memory. Each range may be stored on $O(1)$ max heaps at each level of the B-tree. So, the max-heap memory is $O(n \log_m n)$. Therefore, the total memory requirement of the RIBT is $O(n \log_m n)$.

4.2.3 RIBT Operations

Figure 4-12 gives the algorithm to find the priority of the highest-priority range that matches the destination address $d$. This algorithm is easily modified to find the highest-priority range that matches $d$. The algorithm differs from algorithm lengthOflmp (Figure 4-3) primarily in the absence of the break statement in the while loop. Since Lemma 33 does not extend to the case of highest-priority matching in non-intersecting ranges, it isn't possible to stop the search for $hp(d)$ following the examination of the equalH max-heap for $d$.

The complexity of $hp(d)$ is $O(\log_2 m \log_m n) = O(\log_2 n)$ and the number of nodes accessed is $O(\log_m n)$. The algorithms to insert and delete a range are similar to the corresponding PIBT algorithms. So, we do not describe these here. When the maximum depth of nesting of the ranges is $D$. Although $D \le n$ for ranges and


4.3 Experimental Results

We implemented the B-tree router-table data structures PIBT (Section 4.1) and MRT [20] in C++ and compared their performance on a 700MHz PC. Initial experimentation with the implementations of the two B-tree structures showed that search time is optimal when the B-tree order is 32 (i.e., $m = 32$). Consequently, all experimental results reported in this section are for the case $m = 32$. To determine what benefits accrue from the use of a B-tree relative to a binary search tree, we included also the PBOB data structure of Lu et al. [35] in our performance measurements. Our experiments were conducted using six IPv4 prefix databases obtained from [38]. The databases Paix1, Pb1, MaeWest and Aads were obtained on Nov 22, 2001, while Pb2 and Paix2 were obtained Sep 13, 2000. The number of prefixes in each of our 6


Table 4-2: Memory Usage. m = 32 for PIBT and MRT

Database          Paix1   Pb1     MaeWest   Aads    Pb2     Paix2
Num of Prefixes   16172   22225   28889     31827   35303   85988
PIBT (KB)         715     993     1292      1425    1604    3936
MRT (KB)          813     1132    1471      1621    1834    4526
PBOB (KB)         369     509     661       728     811     1961

To measure the average lookup time, for each prefix database, we generated 1000 random addresses, randAddr[0..999], that are matched by one or more of the prefixes in the database. Then, a sequence of 1 million lookups is done by generating 1 million uniformly distributed random numbers in the range [0..999]. When a random number $i$ is generated, we find $lmp(randAddr[i])$. From the time required for this sequence of 1 million random number generations and lookups, we subtract the time for the random number generation and divide by 1 million to get the average time per search. For each database and router-table data structure, the experiment is done 10 times and the average of the averages as well as the standard deviation of the averages computed. Since each destination in randAddr is searched for approximately 1000 times, the experiment simulates a bursty traffic environment. Since the 1000 addresses in randAddr take 4000 bytes, the pollution of L2 cache (256KB) by randAddr is less than what it would be if we generated a random sequence of 1 million addresses and saved these in an array.

Table 4-3 gives the measured average lookup times. These average times are histogrammed in Figure 4-13. As can be seen, PIBT and MRT have almost the same performance on lookup. This is to be expected as a lookup in either structure results in the same number of cache misses and also does the same amount of work. The


Table 4-3: Lookup time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT. Variance is < 0.02

Database   Paix1   Pb1    MaeWest   Aads   Pb2    Paix2
PIBT       0.33    0.34   0.35      0.36   0.33   0.42
MRT        0.33    0.35   0.35      0.35   0.33   0.41
PBOB       0.49    0.50   0.53      0.53   0.51   0.61

Figure 4-13: Lookup time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

For the average update (insert/delete) time, we start by selecting 1000 prefixes from the database. Those 1000 prefixes are first removed from the data structure. Once the 1000 removals are done, the removed 1000 prefixes are inserted back into


Table 4-4: Update time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

                Paix1   Pb1     MaeWest   Aads    Pb2     Paix2
PIBT   mean     2.89    3.21    3.32      3.31    3.53    4.16
       std      0.061   0.021   0.040     0.033   0.047   0.016
MRT    mean     4.15    4.37    4.55      4.47    5.21    5.69
       std      0.183   0.034   0.029     0.043   0.239   0.030
PBOB   mean     0.42    0.43    0.44      0.45    0.46    0.47
       std      0.002   0.002   0.001     0.003   0.007   0.004

Figure 4-14: Update time on a Pentium III 700MHz PC. m = 32 for PIBT and MRT

As can be seen, an update in PBOB takes much less time than does an update in either MRT or PIBT. Further, an update in PIBT takes about 30% less time than does an update in MRT.
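The "about 30% less" figure can be checked directly against the mean update times of Table 4-4. The sketch below only redoes that arithmetic; the variable and function names are hypothetical, and the numbers are copied from the table.

```python
DATABASES = ["Paix1", "Pb1", "MaeWest", "Aads", "Pb2", "Paix2"]
PIBT_MEAN = [2.89, 3.21, 3.32, 3.31, 3.53, 4.16]   # Table 4-4, PIBT mean row
MRT_MEAN = [4.15, 4.37, 4.55, 4.47, 5.21, 5.69]    # Table 4-4, MRT mean row


def percent_reduction(faster, slower):
    """Percentage by which `faster` undercuts `slower`."""
    return 100.0 * (slower - faster) / slower


# Per-database reduction in mean update time of PIBT relative to MRT.
REDUCTIONS = {
    name: percent_reduction(p, q)
    for name, p, q in zip(DATABASES, PIBT_MEAN, MRT_MEAN)
}
MEAN_REDUCTION = sum(REDUCTIONS.values()) / len(REDUCTIONS)
```

On these six databases the per-database reduction ranges from roughly 26% to 32%, which is consistent with the rounded figure quoted in the text.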


4.4 Conclusion

We have developed an alternative B-tree representation for dynamic router tables. Although our representation has the same asymptotic complexity as does the B-tree representation of Suri et al. [20], ours is faster for the update operation. This is because our structure performs updates with fewer cache misses. For the search operation, both B-tree structures take about the same time. When compared to the fastest binary tree structure, PBOB, for dynamic router tables, we see that the use of a high-degree tree enables the B-tree structure to perform better on the search operation. However, on the update operation, PBOB is decidedly superior.


5.1 Conclusion

We have developed several data structures for dynamic router tables. The novelty of our data structures lies in supporting real-time update, range filters, and the highest-priority matching tiebreaker (BOB, PBOB, RIBT).

The first data structure, PST [33, 34], permits one to search, insert, and delete in $O(\log n)$ time each using the most specific matching tiebreaker. Although $O(\log n)$ time data structures for prefix tables were known prior to our work [21, 22], PST is more memory efficient than the data structures of Sahni et al. [21, 22]. Further, PST is significantly superior on the insert and delete operations, while being competitive on the search operation. For nonintersecting ranges and conflict-free ranges, PSTs are the first to permit $O(\log n)$ search, insert, and delete. All of our data structures based on priority search trees use $O(n)$ memory.

The second data structure, BOB [35], works for highest-priority matching with nonintersecting ranges. Its variant, PBOB, works for a prefix set or a nonintersecting range set with a limited number of nesting levels (6 for IPv4 backbone router tables [38]). In order to support $O(\log n)$ time update, BOB transforms the problem of removing an empty degree-2 node from the top-level tree to the problem of removing an empty degree-1 or degree-0 node from the top-level tree. Our experiments show that PBOB is to be preferred over PST for the representation of dynamic longest-matching prefix-router-tables. But PST remains the only choice for detecting intersection and conflict between ranges in $O(\log n)$ time. On practical rule tables,


The third data structure is based on the B-tree in order to utilize the wide cache line size. It is designed for prefix filters as well as non-intersecting range filters. Suri et al. [20] proposed a multi-way range tree that is also based on the B-tree. A crucial difference between our data structure for prefix filters and that of Suri et al. [20] is that in our data structure, each prefix is stored in $O(1)$ B-tree nodes per B-tree level, whereas in the structure of Suri et al. [20], each prefix is stored in $O(m)$ nodes per level ($m$ is the order of the B-tree). As a result of this difference, ours is faster for the update operation. For the search operation, both B-tree structures take about the same time. When compared to PBOB, we find that the use of a high-degree tree enables the B-tree structure to perform better on the search operation. However, on the update operation, PBOB is decidedly superior.

5.2 Future Work

It is more challenging to design data structures for multidimensional router tables. The problem of point location in a set of $n$ non-overlapping $d$-dimensional hyper-rectangles requires $O(\log n)$ time with $O(n^d)$ memory requirement or $O((\log n)^{d-1})$ time with $O(n)$ memory requirement [43]. Multidimensional classification is no easier than the point location problem since the filters can overlap. The above complexity bounds do not consider update. Thus they are for static data structures.

Almost all the existing schemes for multidimensional router tables focus on static data structures for prefix filters. Gupta et al. [43] review static data structures for multidimensional router tables. Hierarchical tries require $O(W^d)$ time for search with $O(dWn)$ memory requirement. Hierarchical tries can support update in $O(dW)$ time. Set pruning trees [44] support search in $O(dW)$ time with $O(n^d)$ memory requirement. Cross-producting [44] decomposes the search into $d$ one-dimensional




[1] C. Labovitz, G. Malan, and F. Jahanian, Internet routing instability, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[2] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, Delayed Internet routing convergence, ACM SIGCOMM, Stockholm, Sweden, August-September 2000.
[3] D. Pei, X. Zhao, L. Wang, D. Massey, A. Mankin, S. Wu, and L. Zhang, Improving BGP convergence through consistency assertions, IEEE INFOCOM, New York City, New York, USA, June 2002.
[4] C. Macian and R. Finthammer, An evaluation of the key design criteria to achieve high update rates in packet classifiers, IEEE Network, 15(6):24-29, November/December 2001.
[5] F. Baboescu, S. Singh, and G. Varghese, Packet classification for core routers: is there an alternative to CAMs? IEEE INFOCOM, San Francisco, California, USA, April 2003.
[6] M. Ruiz-Sanchez, E. Biersack, and W. Dabbous, Survey and taxonomy of IP address lookup algorithms, IEEE Network, 15(2):8-23, March/April 2001.
[7] S. Sahni, K. Kim, and H. Lu, Data structures for one-dimensional packet classification using most-specific-rule matching, International Symposium on Parallel Architectures, Algorithms, and Networks (ISPAN), Makati City, Metro Manila, Philippines, May 2002.
[8] K. Sklower, A tree-based routing table for Berkeley Unix, Technical Report, University of California, Berkeley, 1993.
[9] M. Degermark, A. Brodnik, S. Carlsson, and S. Pink, Small forwarding tables for fast routing lookups, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[10] W. Doeringer, G. Karjoth, and M. Nassehi, Routing on longest-matching prefixes, IEEE/ACM Transactions on Networking, 4(1):86-97, 1996.
[11] S. Nilsson and G. Karlsson, Fast address look-up for Internet routers, IEEE Broadband Communications, Stuttgart, Germany, April 1998.
[12] V. Srinivasan and G. Varghese, Faster IP lookups using controlled prefix expansion, ACM SIGMETRICS Performance Evaluation Review, 26(1):1-10, 1998.


[13] S. Sahni and K. Kim, Efficient construction of fixed-stride multibit tries for IP lookup, Proceedings 8th IEEE Workshop on Future Trends of Distributed Computing Systems (FTDCS), Bologna, Italy, October-November 2001.
[14] S. Sahni and K. Kim, Efficient construction of variable-stride multibit tries for IP lookup, Proceedings IEEE Symposium on Applications and the Internet (SAINT), Nara City, Nara, Japan, January-February 2002.
[15] P. Gupta, S. Lin, and N. McKeown, Routing lookups in hardware at memory access speeds, IEEE INFOCOM, San Francisco, USA, March-April 1998.
[16] N. Huang and S. Zhao, A novel IP-routing lookup scheme and hardware architecture for multigigabit switching routers, IEEE Journal on Selected Areas in Communications, 17(6):1093-1104, June 1999.
[17] A. Basu and G. Narlikar, Fast incremental updates for pipelined forwarding engines, IEEE INFOCOM, San Francisco, California, USA, April 2003.
[18] M. Waldvogel, G. Varghese, J. Turner, and B. Plattner, Scalable high speed IP routing lookups, ACM SIGCOMM, Cannes, French Riviera, France, September 1997.
[19] B. Lampson, V. Srinivasan, and G. Varghese, IP lookup using multi-way and multicolumn search, IEEE INFOCOM, San Francisco, USA, March-April 1998.
[20] S. Suri, G. Varghese, and P. Warkhede, Multiway range trees: Scalable IP lookup with fast updates, GLOBECOM, San Antonio, Texas, USA, November 2001.
[21] S. Sahni and K. Kim, O(log n) dynamic packet routing, IEEE Symposium on Computers and Communications, Taormina, Italy, July 2002.
[22] S. Sahni and K. Kim, Efficient dynamic lookup for bursty access patterns, submitted.
[23] F. Ergun, S. Mittra, S. Sahinalp, J. Sharp, and R. Sinha, A dynamic lookup scheme for bursty access patterns, IEEE INFOCOM, Anchorage, Alaska, USA, April 2001.
[24] E. Horowitz, S. Sahni, and D. Mehta, Fundamentals of data structures in C++, W. H. Freeman, New York, 1995.
[25] P. Gupta and N. McKeown, Dynamic algorithms with worst-case performance for packet classification, IFIP Networking, Paris, France, May 2000.
[26] A. McAuley and P. Francis, Fast routing table lookups using CAMs, IEEE INFOCOM, San Francisco, CA, USA, March-April 1993.
[27] D. Shah and P. Gupta, Fast updating algorithms for TCAMs, IEEE MICRO, 21(1):36-47, 2001.


[28] C. Matsumoto, CAM vendors consider algorithmic alternatives, EETIMES, May 20, 2002.
[29] G. Cheung and S. McCanne, Optimal routing table design for IP address lookups under memory constraints, IEEE INFOCOM, New York City, New York, USA, March 1999.
[30] G. Chandranmenon and G. Varghese, Trading packet headers for packet processing, IEEE Transactions on Networking, 4(2):141-152, 1996.
[31] P. Newman, G. Minshall, and L. Huston, IP switching and gigabit routers, IEEE Communications Magazine, 64-69, January 1997.
[32] A. Bremler-Barr, Y. Afek, and S. Har-Peled, Routing with a clue, ACM SIGCOMM, Cambridge, MA, USA, September 1999.
[33] H. Lu and S. Sahni, Priority search trees and dynamic router-tables, submitted.
[34] H. Lu and S. Sahni, O(log n) dynamic router-tables for ranges, IEEE Symposium on Computers and Communications, Kiris-Kemer, Turkey, June-July 2003.
[35] H. Lu and S. Sahni, Dynamic IP router-tables using highest-priority matching, submitted.
[36] A. Hari, S. Suri, and G. Parulkar, Detecting and resolving packet filter conflicts, IEEE INFOCOM, Tel-Aviv, Israel, March 2000.
[37] E. McCreight, Priority search trees, SIAM Jr. on Computing, 14(2):257-276, 1985.
[38] Merit, Ipma statistics, November 25, 2001.
[39] K. Mehlhorn, Data structures and algorithms 3: Multi-dimensional searching and computational geometry, Springer Verlag, New York, 1984.
[40] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, 2nd edition, McGraw Hill, New York, 2001.
[41] S. Sahni, Data structures, algorithms, and applications in Java, McGraw Hill, New York, 2000.
[42] P. Warkhede, S. Suri, and G. Varghese, Fast packet classification for two-dimensional conflict-free filters, IEEE INFOCOM, Anchorage, Alaska, USA, April 2001.
[43] P. Gupta and N. McKeown, Algorithms for packet classification, IEEE Network, 15(2):24-32, 2001.


[44] V. Srinivasan, G. Varghese, S. Suri, and M. Waldvogel, Fast and scalable layer four switching, ACM SIGCOMM, Vancouver, BC, Canada, August-September 1998.
[45] A. Feldmann and S. Muthukrishnan, Tradeoffs for packet classification, IEEE INFOCOM, Tel-Aviv, Israel, March 2000.
[46] V. Srinivasan, S. Suri, and G. Varghese, Packet classification using tuple space search, ACM SIGCOMM, Cambridge, Massachusetts, USA, September 1999.
[47] P. Gupta and N. McKeown, Classification using hierarchical intelligent cuttings, IEEE Micro, 20(1):34-41, 2000.