## Citation

- Permanent Link:
- https://ufdc.ufl.edu/UF00100806/00001
## Material Information
- Title:
- Algorithms for external beam dose computation
- Creator:
- Jung, Haejae (*Dissertant*); Sahni, Sartaj K. (*Thesis advisor*); Ranka, Sanjay (*Reviewer*); Palta, Jatinder (*Reviewer*); Rajasekaran, Sanguthevar (*Reviewer*); Vemuri, Baba (*Reviewer*); Chun, Paul W. (*Reviewer*)
- Place of Publication:
- Gainesville, Fla
- Publisher:
- University of Florida
- Publication Date:
- 2000
- Copyright Date:
- 2000
- Language:
- English
## Subjects
- Subjects / Keywords:
- Algorithms ( jstor ); Bytes ( jstor ); Data models ( jstor ); Datasets ( jstor ); Dosage ( jstor ); Information search ( jstor ); Input data ( jstor ); Pixels ( jstor ); Plant roots ( jstor ); Run time ( jstor ); Computational complexity ( lcsh ); Computer and Information Science and Engineering thesis, Ph. D ( lcsh ); Data structures (Computer science) ( lcsh ); Dissertations, Academic -- Computer and Information Science and Engineering -- UF ( lcsh ); Trees (Graph theory) -- Data processing ( lcsh )
- Genre:
- bibliography ( marcgt ); theses ( marcgt ); government publication (state, provincial, territorial, dependent) ( marcgt ); non-fiction ( marcgt )
## Notes
- Abstract:
- This dissertation presents novel dose computation algorithms based on the use of spatial data structures derived from octrees and quadtrees. This data structure makes approximate dose computation possible. Our methods approximate the scatter effect due to a cluster of proximate points that are far away from the point of interest. The scatter effect of the cluster is approximated by the scatter effect due to a single point at the center of this cluster. To compute the dose contribution from one point to another, ray tracing is performed to compute the radiological distance between the two points. We used a region growing scheme to reduce the number of regions, which decreases the number of intersection points. Applied to three dimensional real computed tomography (CT) data, this region growing scheme could eliminate about 80 percent of the regions produced by a regular region partitioning scheme, which is a variant of the octree based scheme. For region growing, an efficient binary search tree scheme with supernodes that have multiple elements has been developed. This supernode scheme is shown to be better than binary search trees with single element nodes in terms of space efficiency and run time performance. Our experimental results of dose computation on homogeneous phantoms show that the resultant algorithm is two to three orders of magnitude faster than the collapsed cone algorithm while achieving similar levels of accuracy.
- Subject:
- dose, binary tree, ray tracing
- Thesis:
- Thesis (Ph. D.)--University of Florida, 2000.
- Bibliography:
- Includes bibliographical references (p. 140-142).
- General Note:
- Title from first page of PDF file.
- General Note:
- Document formatted into pages; contains xii, 143 p.; also contains graphics.
- General Note:
- Vita.
- Statement of Responsibility:
- by Haejae Jung.
## Record Information
- Source Institution:
- University of Florida
- Holding Location:
- University of Florida
- Rights Management:
- All applicable rights reserved by the source institution and holding location.
- Resource Identifier:
- 47680156 ( OCLC ); 002678728 ( AlephBibNum ); ANE5955 ( NOTIS )
## Full Text

ALGORITHMS FOR EXTERNAL BEAM DOSE COMPUTATION

By

HAEJAE JUNG

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2000

Copyright 2000 by Haejae Jung

ACKNOWLEDGMENTS

I wish to express my whole-hearted appreciation and gratitude to my advisor, Distinguished Professor Dr. Sartaj K. Sahni, for giving countless hours of guidance in my research work. Without his support and patience, this research would not have been done. I would also like to thank my coadvisor Dr. Sanjay Ranka and the other members of my supervisory committee, Dr. Jatinder Palta, Dr. Sanguthevar Rajasekaran, Dr. Baba Vemuri, and Dr. Paul W. Chun, for their interest and comments. I would like to express my appreciation to Dr. Siyong Kim, Mr. Didier Rajon, and Dr. Timothy C. Zhu for their comments and kernel generation. Last, but not least, my greatest appreciation goes to my parents, my parents-in-law, my wife, and my kids. Although I am half a globe away, I am always surrounded by their love and tender care. To them I dedicate this work.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTERS
1 INTRODUCTION
2 PROBLEM DESCRIPTION
  2.1 Collapsed Cone Scheme
  2.2 The Outline of Our Hierarchical Method
3 HIERARCHICAL REPRESENTATION FOR CONVOLUTION
4 RAY TRACING ALGORITHM
  4.1 Region Partition
    4.1.1 Region Construction
    4.1.2 Region Growing
  4.2 Radiological Distance Computation
5 EFFICIENT DYNAMIC SEARCH TREES
  5.1 Rebalancing
  5.2 AVL Tree with Supernode
  5.3 Bottom-up Red-black Tree with Supernode
  5.4 Bottom-up Splay Tree with Supernode
  5.5 Top-down Splay Tree with Supernode
  5.6 Space Efficiency
  5.7 Efficient Use of Cache Memory
  5.8 Experimental Results
6 DOSE COMPUTATION ALGORITHMS
  6.1 Simple Threshold Algorithm
  6.2 Adaptive Threshold Algorithm
  6.3 Dynamic Algorithm
7 EXPERIMENTAL RESULTS
8 CONCLUSION AND FURTHER RESEARCH
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

5.1 Actual memory utilization in megabytes
5.2 AVL tree performance in base model with non-aligned supernode for 1M data (sec)
5.3 AVL tree performance in base model with aligned supernode for 1M data (sec)
5.4 AVL tree performance in base model with non-aligned supernode for 4M data (sec)
5.5 AVL tree performance in base model with aligned supernode for 4M data (sec)
5.6 Red-black tree performance in base model with non-aligned supernode for 1M data (sec)
5.7 Red-black tree performance in base model with aligned supernode for 1M data (sec)
5.8 Red-black tree performance in base model with non-aligned supernode for 4M data (sec)
5.9 Red-black tree performance in base model with aligned supernode for 4M data (sec)
5.10 Bottom-up splay tree performance in base model with non-aligned supernode for 1M data (sec)
5.11 Bottom-up splay tree performance in base model with aligned supernode for 1M data (sec)
5.12 Bottom-up splay tree performance in base model with non-aligned supernode for 4M data (sec)
5.13 Bottom-up splay tree performance in base model with aligned supernode for 4M data (sec)
5.14 Top-down splay tree performance in base model with non-aligned supernode for 1M data (sec)
5.15 Top-down splay tree performance in base model with aligned supernode for 1M data (sec)
5.16 Top-down splay tree performance in base model with non-aligned supernode for 4M data (sec)
5.17 Top-down splay tree performance in base model with aligned supernode for 4M data (sec)
5.18 Performance in hold model with aligned supernode for 1M data (sec)
5.19 Performance in hold model with aligned supernode for 4M data (sec)
5.20 Performance in stack model with aligned supernode for 1M data (sec)
5.21 Performance in stack model with aligned supernode for 4M data (sec)
5.22 Performance in queue model with aligned supernode for 1M data (sec)
5.23 Performance in queue model with aligned supernode for 4M data (sec)
5.24 Performance in histograming with aligned supernode for 1M data (sec)
5.25 Performance in histograming with aligned supernode for 4M data (sec)
5.26 Performance in histograming with aligned supernode for 16M data (sec)
5.27 Performance in histogram report with aligned supernode for 1M data (sec)
5.28 Performance in histogram report with aligned supernode for 4M data (sec)
5.29 Performance in histogram report with aligned supernode for 16M data (sec)
7.1 Error in the dose difference between collapsed cone scheme CCa48 and exhaustive convolution
7.2 Dose errors using the collapsed cone scheme for smaller number of zenith and azimuthal angles using CCa48 as the base
7.3 Dose errors for our algorithms using different threshold levels
7.4 Dose computation time for different algorithms
7.5 Dose errors using the tree schemes for the two adaptive schemes
7.6 Dose computation time in adaptive tree scheme
7.7 Parallel dose computation time of Adaptive Scheme 2 (min)

LIST OF FIGURES

1.1 Dosimetry
1.2 Basic idea
2.1 Dosimetry
2.2 A cone and its ray
2.3 Dose computation example in the collapsed cone scheme
2.4 The processing sequence of hierarchical method
3.1 An example 4 x 4 phantom
3.2 Hierarchical representation of the 4 x 4 phantom
4.1 Radiological distance
4.2 A given heterogeneous medium with two regions
4.3 Regions following merge along column
4.4 Regions following merge along row
4.5 Regions formed by the region construction step
4.6 Regions extension
4.7 Regions formed by region partitioning process
4.8 Partitioned regions representation
5.1 Supernode structure
5.2 Transformation to leaf node insertion and deletion
5.3 LL rotation
5.4 LR rotation
5.5 L splay
5.6 LL splay
5.7 LR splay
5.8 AVL tree supernode structure
5.9 Red-black tree supernode structure
5.10 Splay tree supernode structure
5.11 Supernode layout for the efficient use of cache memory
5.12 Cache aligned chunk with 3 supernodes
5.13 Four test models (I: Insert, S: Search, D: Delete, rand: random sequence, inc: increasing sequence, dec: decreasing sequence)
5.14 Average insertion time in base model with 1M data sets
5.15 Average insertion time in base model with 4M data sets
5.16 Average search time in base model with 1M data sets
5.17 Average search time in base model with 4M data sets
5.18 Average deletion time in base model with 1M data sets
5.19 Average deletion time in base model with 4M data sets
5.20 Average run time in hold model with 1M data sets
5.21 Average run time in hold model with 4M data sets
5.22 Average run time in stack model with 1M data sets
5.23 Average run time in stack model with 4M data sets
5.24 Average run time in queue model with 1M data sets
5.25 Average run time in queue model with 4M data sets
5.26 Average run time in histograming with 1M data sets
5.27 Average run time in histograming model with 4M data sets
5.28 Average run time in histograming model with 16M data sets
5.29 Average run time in histogram report with 1M data sets
5.30 Average run time in histogram report with 4M data sets
5.31 Average run time in histogram report with 16M data sets
6.1 Approximation of the effect from the region represented by a live node
6.2 Finding nearest point
6.3 Spatial binary tree of the 4 x 4 phantom
6.4 Dose vs relative error plots for voxels using our algorithm for thresholds of 3.1 cm and 2.3 cm respectively
6.5 Node relation for error estimation
7.1 Dose vs relative error plots for voxels using the two adaptive schemes
7.2 Dose contribution error in heterogeneous medium
7.3 Kernel curve showing forward effect
7.4 Dose profile in phantom with horizontal slabs

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ALGORITHMS FOR EXTERNAL BEAM DOSE COMPUTATION

By

Haejae Jung

December 2000

Chairman: Dr. Sartaj K. Sahni
Major Department: Computer and Information Science and Engineering

This dissertation presents novel dose computation algorithms based on the use of spatial data structures derived from octrees and quadtrees. This data structure makes approximate dose computation possible. Our methods approximate the scatter effect due to a cluster of proximate points which are far away from the point of interest. The scatter effect of the cluster is approximated by the scatter effect due to a single point at the center of this cluster. To compute the dose contribution from one point to another point, ray tracing is performed to compute the radiological distance between the two points. We used a region growing scheme to reduce the number of regions, which decreases the number of intersection points. This region growing scheme, applied to three dimensional real computed tomography (CT) data, could eliminate about 80 percent of the regions produced by a regular region partitioning scheme, which is a variant of the octree based scheme. For region growing, an efficient binary search tree scheme with supernodes that have multiple elements has been developed. It is shown that this supernode scheme is better than binary search trees with single element nodes in terms of space efficiency and run time performance.
Our experimental results of dose computation on homogeneous phantoms show that the resultant algorithm is two to three orders of magnitude faster than the collapsed cone algorithm while achieving similar levels of accuracy.

CHAPTER 1
INTRODUCTION

In the treatment of cancer with radiation, the radiation oncologist is faced with the problem of prescribing a treatment regime that either cures or controls the neoplasm, but does not inflict serious complications on the patient. The difficulty of this task is illustrated by the dose-response relationships between tumor/normal tissues and the radiation dose delivered. Published clinical results show normal tissue response as a very steep function of radiation dose, i.e., a 5% change in the dose delivered can result in a dramatic change in the local response of the tissue [1, 2]. Today, most computerized planning of radiation therapy is treated as a two dimensional (2D) problem, which limits the accuracy of the dose calculation. In regions with strong body contour change as well as tissue inhomogeneity, two dimensional methods can introduce clinically unacceptable error in dose computation. Most of the current algorithms do not calculate dose in the electron disequilibrium region, because only photon transport is considered. This is especially true in interface regions, such as air cavities, lung interfaces, and bone interfaces, where steep dose gradients are possible. Many full three dimensional (3D) dose calculation algorithms have been proposed, such as the differential scatter-air ratio (DSAR) method [3], equivalent tissue-air ratio (ETAR) method [4], fast Fourier transform (FFT) convolution [5, 6], delta-volume [7], and dose superposition and convolution algorithms [8, 9, 10, 11]. The most promising algorithm is the convolution/superposition method. It is a model-based algorithm that simulates the treatment situation from first principles and can account for dose in electron disequilibrium regions.
This method separates the primary interaction from the transport of secondary particles (secondary electrons and scattered photons) set in motion when the primary photon interacts. Figure 1.1 shows a geometry for dose calculation. The beam source shoots photons into a simulated geometric phantom that represents the human body. As can be seen in Figure 1.1, point B has two kinds of energy sources: one is the direct effect from the beam source (primary radiation), and the other is the scatter effect from point A after primary interaction (scatter radiation). Primary interactions are relatively easy to calculate, while the more complex secondary interactions are dealt with by convolving precomputed kernels generated by Monte Carlo calculation with the distributions of total energy released per unit mass (terma) to yield the dose distribution. The terma at any point is equal to the product of the mass attenuation coefficient and the energy fluence. This method is quite suitable for computing dose in very complex three dimensional geometries.

[Figure 1.1: Dosimetry]

The convolution based algorithms are often very slow in computational speed. Dose computation methods based on the convolution/superposition method spend much of their time in the calculation of scatter effects due to the primary dose delivered at a given point. An exhaustive calculation of scatter dose for a given point of interest requires calculating a summation of the scatter effects due to all the primary doses delivered at all the remaining voxels. Assuming all voxels to be in the region of interest, this results in a quadratic (i.e., in the number of voxels) behavior for the overall computation. Thus, in order to compute the dose in a region containing l^3 voxels, the contributions from all l^3 voxels have to be accounted for, making the order of calculation l^6. With the current 3D convolution algorithm, speed is still the major problem.
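As a concrete illustration of this quadratic behavior, a brute-force superposition over all voxel pairs might look like the following sketch. The function name, array layout, and kernel interface are our own illustrative choices, not the dissertation's code, and Euclidean distance stands in for the radiological distance of the general method:

```python
import math

def exhaustive_dose(P, kernel, spacing=1.0):
    # Brute-force superposition: each voxel's dose sums the kernel-weighted
    # primary dose (terma) of every voxel, i.e. O(n^2) work for n = w*h*d.
    w, h, d = len(P), len(P[0]), len(P[0][0])
    D = [[[0.0] * d for _ in range(h)] for _ in range(w)]
    for i in range(w):
        for j in range(h):
            for k in range(d):
                for a in range(w):
                    for b in range(h):
                        for c in range(d):
                            # Euclidean distance as a stand-in for the
                            # radiological distance (homogeneous medium).
                            r = spacing * math.dist((i, j, k), (a, b, c))
                            D[i][j][k] += P[a][b][c] * kernel(r)
    return D
```

For an l x l x l phantom this performs l^6 kernel evaluations, which is exactly the quadratic-in-n (l^6) cost described above.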
Many researchers have tried to improve the efficiency of the 3D convolution algorithm [12, 6]. The collapsed cone algorithm [12] represents the current state of the art in these efficient codes. However, its speed is still too slow for the algorithm to be used in a clinical setting for real time dose optimization. So, the development of new dose computation methods and algorithms that operate in real time is crucial. In this thesis, we develop novel dose computation algorithms based on the use of spatial data structures derived from octrees and quadtrees. Typically the secondary effects for a given voxel diminish rapidly with distance from the primary voxel. This allows for approximation of the scatter effect due to a cluster of proximate points which are far away from the point of interest. The scatter effect of the cluster is approximated by the scatter effect due to a single point at the center of this cluster. The dose of this single point is equal to the summation of the doses of the individual voxels in the cluster, as can be seen in Figure 1.2. Our experimental results on homogeneous phantoms show that the numerical error introduced can be controlled. The resultant algorithm is two to three orders of magnitude faster than the collapsed cone algorithm.

[Figure 1.2: Basic idea]

The rest of the thesis is organized as follows. Chapter 2 describes the problem, the collapsed cone algorithm, and the processing sequence of our hierarchical method. Chapter 3 describes the data structure, the spatial binary tree, and its initialization for fast dose computation. In Chapters 4 and 5, the method for computing the radiological distance and the efficient search trees that may be used in the radiological distance computation are described. Chapter 6 presents the dose computation algorithms based on our hierarchical data structures. Finally, Chapter 7 shows experimental results, and in Chapter 8, conclusions and further research are discussed.

CHAPTER 2
PROBLEM DESCRIPTION

As can be seen in Figure 2.1, the phantom is a w x h x d three dimensional region of n = whd voxels. Let P(i, j, k) be the primary dose delivered at voxel (i, j, k). This refers to the terma value, which reflects the total energy released by primary photon interactions per unit mass. The dose distribution, D(i, j, k), at this voxel is given by

    D(i,j,k) = sum_{a=1}^{w} sum_{b=1}^{h} sum_{c=1}^{d} P(a,b,c) K(|r|)        (2.1)

where r, whose length is the radiological distance between the two points, is the vector between the voxels (a, b, c) and (i, j, k); K is the kernel describing dose contribution; and K(|r|) gives the dose contribution at a voxel |r| away, from the voxel at (a, b, c) to the voxel at (i, j, k).

[Figure 2.1: Dosimetry]

The computation of each D(i, j, k) takes O(n) time when Equation 2.1 is used. So, the computation of the dose at all n points takes n^2 time. When the three dimensions w, d, and h are the same, say l, the computation is an Omega(l^6) computation. This represents too much computer time for even modest values of l. One way to reduce the computational time is to compute the dose in only a subregion (i.e., the region of interest or ROI) of the phantom. For desirable size ROIs, the computational time does not decrease to an acceptable level. So, approximations are attempted. The collapsed cone convolution method [12, 13, 14, 15, 16] is an approximation scheme used in commercial dose computation systems and is described briefly in the following section.

2.1 Collapsed Cone Scheme

In this method, D(i, j, k) at voxel (i, j, k) is computed by logically partitioning the phantom into m cones with (i, j, k) the apex of each. The line from the apex of a cone to the center of the cone's base defines a ray (see Fig. 2.2).
[Figure 2.2: A cone and its ray]

The number of rays so defined is also equal to m. The contribution to D(i, j, k) of the voxels in each cone is computed by following along its ray and computing the contribution of voxels on this ray. In a preprocessing step, a new convolution kernel, K, is computed from the dose spread array to account for the fact that each voxel along a ray actually represents several voxels (i.e., voxels or fractional voxels that are the same distance from (i, j, k)) of the phantom. This is done by summing the dose spread array values for these voxels. For each voxel (a, b, c) on the ray, the product P(a, b, c) K(r(a, b, c), phi(a, b, c)) is used to approximate the effect of all cone voxels represented by (a, b, c). Here, r(a, b, c) is the radiological distance between (a, b, c) and (i, j, k), and phi(a, b, c) is the zenith angle between (a, b, c) and (i, j, k). Since the number of computation points on a ray is generally linear in the size l of each phantom dimension, the overall computation time becomes O(m l^4). This can be represented as Algorithm 2.1.

Algorithm 2.1 Collapsed cone
// compute dose D
// P: primary dose
// K: kernel in spherical coordinate system
// density: density in 3D
CC_alg(P, K, density) {
    for each voxel (i, j, k) in the region of interest do
        for each ray do
            for each voxel (a, b, c) on a ray do
                compute angle phi and radiological distance r between (i, j, k) and (a, b, c);
                D(i, j, k) = D(i, j, k) + P(a, b, c) * K(r, phi);
            end for
        end for
    end for
}

Figure 2.3 shows an example of the collapsed cone scheme in two dimensional space. We want to compute dose D(3, 2). As can be seen from the figure, the phantom is partitioned into 8 cones according to the spherical coordinate system. Notice that the kernel that the collapsed cone scheme is using is in the polar coordinate system. A ray passes through the center of each cone.
[Figure 2.3: Dose computation example in the collapsed cone scheme]

The dose contribution of each cone to the voxel at (3, 2) is computed by following the ray and finding the voxels that intersect the ray. That is, for D(3, 2), the self dose contribution is first computed. Second, find the next voxel (4, 2) intersecting the ray and compute the dose contribution from that voxel. The third voxel meeting the ray is (4, 3), so its effect is computed. In this way, all the dose contributions from voxels (5, 3), (6, 3), (6, 4) and (7, 4) intersecting the ray are calculated. In the same way, the dose effect from the other cones can be computed.

2.2 The Outline of Our Hierarchical Method

The collapsed cone method described in the previous section still has high time complexity. We propose a new method, called the hierarchical method, that takes advantage of a spatial binary tree. Figure 2.4 shows the processing sequence of our hierarchical method. Since the convolution takes such a long time, we do preprocessing for the inputs, primary dose and density, that are represented by three dimensional arrays.

[Figure 2.4: The processing sequence of the hierarchical method (primary dose -> initialize tree; density -> region partition -> ray tracing; together with the kernel, these feed the convolution to produce dose)]

First, we construct a spatial binary tree in the bottom-up direction from the primary dose. Since each leaf node corresponds to an element (voxel) in the three dimensional array, we have to find this correspondence. This is done by traversing from the root node of the binary tree to a leaf node. Once all the leaf nodes are initialized, all the internal nodes can be initialized in linear time in the bottom-up direction. Since initialization of all the internal and leaf nodes can be done by a simple post-order tree traversal, initializing the spatial binary tree takes O(n) time in total, where n is the number of voxels.
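The bottom-up initialization just described can be sketched as follows. For brevity, this illustrative version (names are ours) uses scalar voxel centers and an implicit array tree with a power-of-two number of leaves:

```python
def build_spatial_tree(doses, centers):
    # Implicit complete binary tree in an array: node i has children
    # 2i+1 and 2i+2; the n leaves occupy the last n of the 2n-1 slots.
    n = len(doses)                       # number of voxels, a power of two
    dose = [0.0] * (2 * n - 1)
    center = [0.0] * (2 * n - 1)
    for i in range(n):                   # initialize the leaves
        dose[n - 1 + i] = doses[i]
        center[n - 1 + i] = centers[i]
    for i in range(n - 2, -1, -1):       # then every internal node, bottom-up
        l, r = 2 * i + 1, 2 * i + 2
        dose[i] = dose[l] + dose[r]      # parent dose = sum of children
        # parent center = dose-weighted average of the children's centers
        if dose[i] > 0:
            center[i] = (center[l] * dose[l] + center[r] * dose[r]) / dose[i]
        else:
            center[i] = 0.5 * (center[l] + center[r])
    return dose, center
```

Each of the 2n - 1 nodes is touched a constant number of times, which gives the O(n) total initialization time noted above.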
Another preprocessing step is performed for the density array that will be used to calculate radiological distance. Since walking through voxel by voxel slows down the convolution, we merge into a region a bunch of voxels with the same or similar density that are neighbors of one another. By restructuring in this way, we can make the convolution computation fast by walking through region by region, instead of voxel by voxel. As will be seen in the ray tracing chapter, this region partitioning process takes O(n log n) time, since a priority queue is used to select the biggest region. Then, the convolution is performed using the tree, the partitioned regions, and the kernel in a Cartesian coordinate system transformed from the dose spread array (DSA), which can be obtained from Monte Carlo simulation. This takes between O(n log n) and O(n^2) time, that is, O(l^3 log l) and O(l^6), depending on the parameter settings for the algorithm. But, according to our experimental results, we can compute the convolution with very high accuracy in O(n log n) time.

CHAPTER 3
HIERARCHICAL REPRESENTATION FOR CONVOLUTION

We propose the use of a radically different scheme. This scheme has its roots in the physics of n-body problems [17]. In this chapter, we describe the main data structure for dose computation. Rather than decompose the region around each voxel (i, j, k) into a collection of m cones, we begin with a hierarchical representation of the w x h x d phantom. The hierarchical representation used by us is a binary tree that is motivated by quadtrees and octrees [18, 19], 3-d trees [20], and the binary space partitioning tree [21] data structures. Each node in the binary tree represents a subset of the voxels in the phantom. These voxels define a w1 x h1 x d1 subregion of the phantom. Furthermore, the nodes at each level of the binary tree collectively represent all voxels of the phantom.
At each level of the binary tree, the region represented by a node is divided into halves by a plane that is perpendicular to one of the axes. The root of the binary tree represents the entire phantom. Each of its children represents one half of the phantom. Each of the grandchildren of the root represents one fourth of the phantom, and so on. For an l x l x l phantom, the decomposition tree will have approximately 3 log2 l levels. A possible way to select the cutting planes is in round robin fashion: first cut along x, then along y, then along z, and repeat this sequence until each node represents a single voxel. We illustrate the process on a two dimensional phantom (Figure 3.1). Figure 3.2 shows the spatially decomposed tree for the 4 x 4 phantom of Figure 3.1. The 16 pixels of the phantom are labeled A-P as shown in Figure 3.1.

[Figure 3.1: An example 4 x 4 phantom with pixels labeled
    A B E F
    C D G H
    I J M N
    K L O P ]

[Figure 3.2: Hierarchical representation of the 4 x 4 phantom]

The tree root in Figure 3.2 represents the entire phantom. Its two children represent regions obtained by dividing the phantom using a vertical cut through the center. So, the left child represents the pixels A-D and I-L. At the next level, a horizontal cut through the middle is used. So, the region represented by the left child of the root is cut in two. One of these represents the pixels A-D and the other the pixels I-L. The region decomposition at the next level is done using a vertical cut. Finally, a horizontal cut is used to end up with regions that represent a single pixel. Notice that the decomposition scheme readily extends to the case of phantoms with different w, h, and d values. In the round robin scheme, cuts along dimensions that have already been reduced to unit size can be omitted. The binary tree structure just described may be represented as an implicit data structure using the standard mapping of a complete binary tree into an array [22].
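One common form of that standard mapping (0-based indexing here; a 1-based variant with children 2i and 2i+1 is equally standard) is:

```python
# Implicit complete-binary-tree mapping: children and parent are computed
# arithmetically from the array index, so no child pointers are stored.
def left(i):
    return 2 * i + 1

def right(i):
    return 2 * i + 2

def parent(i):
    return (i - 1) // 2

# A 4 x 4 phantom has 16 leaf regions, so its complete tree needs
# 2 * 16 - 1 = 31 array slots; index 0 is the root (the whole phantom).
tree = [None] * 31
```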
This eliminates the need for pointer memory, and so the representation is very memory efficient.

Algorithm 3.1 Spatial binary tree initialization
// pdose: primary dose 3D array
// SBT: binary tree to construct
SBT_init(pdose, SBT) {
    for each voxel pdose[i,j,k] do
        find the corresponding leaf node of pdose[i,j,k];
        copy the primary dose into the leaf node;
        set the node center as the center of the voxel;
    end for
    for each internal node of the binary tree do
        set the sum of its two children as its primary dose;
        set the weighted center of its two children as the node center;
    end for
}

Now, let's see how to initialize the binary tree from a three dimensional array pdose with primary dose (see Algorithm 3.1). The binary tree is initialized in the bottom-up direction; that is, all the leaf nodes are initialized and then all the internal nodes are filled in up to the root node. Each leaf node has exactly a one-to-one relation to a voxel (an element) in the array pdose. To find the leaf node corresponding to a voxel (i, j, k) in the array pdose, we start at the root. Since the phantom is cut in a round-robin sequence, we can determine whether the voxel is contained in the left child or the right child. Suppose that the first cut is an x cut. If the index i is less than the cut plane, we go to the left child, since the voxel belongs to the left region. The next cut is a y cut. We can see if the index j belongs to the region corresponding to the left or the right child. In this way, using the cut planes and the indices representing the voxel, we can reach the corresponding leaf node. As an example, let's find the node of the spatial binary tree that corresponds to the element M of the phantom, whose array indices are [2,1] in Figure 3.1. Before doing this, notice that the left/right child by an x cut corresponds to the left/right region of the phantom, and the left/right child by a y cut corresponds to the up/down region of the phantom, respectively.
Since the first cut from the root node is an x cut, we know that we have to go to the right child by comparing the cut plane with the x index 2 of M. The next cut is a y cut, so the y index 1 of M is compared to the cut plane. Since the y cut is greater than the y index of the element, we again go to the right child, which represents M-P. Since the next x cut is to the right of the element M, we go to the left child {M,O}. Finally, the next y cut is below the element M and we reach the leaf node M by following the left child. When we have found the leaf node, we copy the primary dose value from the array pdose and the geometric position of the voxel into that leaf node. Here, we take the center of a voxel as the geometric position of the voxel. Once all the leaf nodes have been initialized as above, we can initialize their parents. The primary dose of each parent node is initialized with the sum of the primary doses of its two children, and the node position, representing a collection of voxels, with the positions of the two children weighted by their primary doses:

    C = (C_l x P_l + C_r x P_r) / (P_l + P_r)    (3.1)

where C, C_l, and C_r are the region center positions of the parent, left child, and right child, respectively, and P_l and P_r are the primary doses of the left and right child. This initialization scheme is applied to all the internal nodes up to the root node. Now, let's see the time complexity of the tree initialization. Suppose the number of leaf nodes is n. Since we can find the voxel corresponding to each leaf node and compute each internal node value in constant time while traversing the tree in post-order, the tree initialization takes only O(n) time.

CHAPTER 4
RAY TRACING ALGORITHM

In dose computation, the radiological distance rather than the Euclidean distance has to be calculated to compute the effect of one point on another (see Figure 4.1). Let's see what the radiological distance is.
In Figure 4.1, there are two regions, A and B, with densities ρ1 and ρ2, respectively. Also, d1, d2, and d3 are the Euclidean distances between points p and r, between r and s, and between s and q, respectively. Then the radiological distance rd between points p and q is d1 ρ1 + d2 ρ2 + d3 ρ1. To compute this radiological distance rd, we have to find the intersection points r and s, which can be done by ray tracing.

Figure 4.1: Radiological distance

So, we can describe the problem as follows:

Problem: There is a heterogeneous phantom in three dimensional space in which each voxel may have a different density and has faces parallel to the x, y, and z coordinate planes. Given a query line segment l with arbitrary direction, find, in sequence, all of its intersection points with density region boundaries.

For ray tracing, schemes based on quadtrees or octrees have been used [18, 23]. Since these schemes use a partitioning strategy that is independent of the image being partitioned, they produce more rectangular regions (in the case of 2D images) or rectangular prisms (in the case of 3D images) than are necessary. For example, a quadtree decomposition of the 2D image of Figure 4.5 leaves H, D, M, I, J, K, and L as separate rectangular regions even though all of these may be combined into a single rectangle. Another possible approach is to decompose regions into a minimal number of rectangular prisms to reduce the number of intersections between a given line segment and region boundaries. There are efficient partitioning algorithms in two dimensional space [24, 25], but minimal partitioning in three dimensional space has been proved NP-complete [26]. Even though there are optimal one-time algorithms [27, 28] and interactive algorithms for an orthogonal line segment [29, 30], there are no known efficient interactive algorithms for an arbitrary query line segment.
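The radiological distance defined at the start of this chapter is simply a density-weighted sum over the pieces of the segment. A minimal sketch, with an illustrative function name:

```python
# Sketch of the radiological-distance definition of Figure 4.1: each piece
# of the segment contributes its Euclidean length times the density of the
# region it crosses. The input format is an assumption for illustration.

def radiological_distance(pieces):
    """pieces: list of (euclidean_length, density), one per region crossed."""
    return sum(d * rho for d, rho in pieces)

# Figure 4.1 example: d1 and d3 lie in density rho1 = 1.0, d2 in rho2 = 0.6.
```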
So, we adopt a heuristic region partitioning method that decreases the number of intersection points by reducing the number of rectangular prisms partitioned in a regular fashion, so that efficient ray tracing can be performed.

4.1 Region Partition

Our region partitioning algorithm consists of two major steps:

1. Region Construction: construct a set of regions of uniform density that decomposes a given phantom.
2. Region Growing: extend regions, starting from a biggest region.

In the first step, a heterogeneous phantom is decomposed into a set of rectangular prisms in a regular way. In the second step, we extend each region by absorbing neighbor regions with the same density, starting from the biggest region. Once a region has been extended, it is fixed; this means no neighbor region can grow into an already grown region. This process continues until all live regions that have not been probed for region extension are exhausted.

4.1.1 Region Construction

An easy way to form a set of rectangular prisms partitioning a phantom is to combine two neighbor regions with the same density in a regular fashion. Along the x axis, we first merge two neighbor regions if they have the same density and the same size. Along the y axis, two regions are also merged if they have the same density and size. Along the z axis, two regions can be merged if the same two conditions hold. Then again, along the x dimension, regions can be combined. In this way, region merging continues until no more regions can be combined. In Algorithm 4.1, every time the while statement is repeated, the index increment along each dimension is doubled, starting at index 0. Figure 4.2 shows an example of a heterogeneous phantom with two kinds of density values, the shaded and unshaded regions. For simplicity, we use a two dimensional phantom on the x and y coordinates; the extension to three dimensions is straightforward. Let's first see how regions merge along columns, starting from the top left.
The two regions (pixels) [y,x] = [0,0] and [1,0] are merged into one since the two have the same density and size. Regions [2,0] and [3,0] are also merged. This merge process along columns stops when the two regions [6,7] and [7,7] are merged. In the middle of the process, the two regions [6,6] and [7,6] cannot be merged since they do not have the same density. The result of merging along columns is shown in Figure 4.3.

Algorithm 4.1 Region construction algorithm
// denarray: density array to construct regions
Region_partition( denarray )
{
  cont = true;
  inc = 2; // initial index increment size
  while ( cont ) do
    cont = false;
    for each yz coordinate do
      for each two neighbor regions along x direction do
        if same size and same density then
          combine those two regions; cont = true;
        end if
      end for
    end for
    for each xz coordinate do
      for each two neighbor regions along y direction do
        if same size and same density then
          combine those two regions; cont = true;
        end if
      end for
    end for
    for each xy coordinate do
      for each two neighbor regions along z direction do
        if same size and same density then
          combine those two regions; cont = true;
        end if
      end for
    end for
    inc = 2*inc;
  end while
}

Figure 4.2: A given heterogeneous medium with two densities

Next, the merge process is done by rows, starting from the top left again. The two regions [0:1,0] and [0:1,1] that were formed during the merge along columns are merged into one since the two conditions hold. Regions [0:1,2] and [0:1,3] are also merged. This combining process along rows finishes when the two regions [6,6] and [6:7,7] fail to merge, since they have neither the same density nor the same size. This process results in Figure 4.4. Then, the merge process parallel to the x axis continues. The merging is finished when no merge along either the x or the y dimension succeeds. The phantom resulting from the region construction process is shown in Figure 4.5.
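One axis of the regular merge can be sketched in isolation. This illustrative Python fragment merges neighbor regions of equal size and density along a single dimension; the explicit alignment test stands in for the doubled index stepping of Algorithm 4.1, and all names are assumptions:

```python
# Sketch of repeated pairwise merging along one axis: two neighbors combine
# only if they have the same size and the same density, and only at aligned
# starting indices, so merged regions keep power-of-two sizes.

def merge_rounds(densities):
    """densities: 1D list, one density per unit cell.
    Returns a list of (start, size, density) regions after merging."""
    regions = [(i, 1, d) for i, d in enumerate(densities)]
    merged = True
    while merged:
        merged, out, k = False, [], 0
        while k < len(regions):
            if (k + 1 < len(regions)
                    and regions[k][1] == regions[k + 1][1]          # same size
                    and regions[k][2] == regions[k + 1][2]          # same density
                    and regions[k][0] % (2 * regions[k][1]) == 0):  # aligned
                out.append((regions[k][0], 2 * regions[k][1], regions[k][2]))
                merged = True
                k += 2
            else:
                out.append(regions[k])
                k += 1
        regions = out
    return regions
```

A uniform row collapses to one region, while a density change blocks merging at that boundary, as in the [6,6]/[7,6] example above.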
Notice that since the region index increment is doubled every time the while statement is performed, the starting index of a combined region is always an even number and its size is always a power of two. So, this region construction process can be done in O(n) time.

Figure 4.3: Regions following merge along columns

Figure 4.4: Regions following merge along rows

Figure 4.5: Regions formed by the region construction step (the region table stores the x bounds, y bounds, and density of each region A, B, C, D, ...)

As can be seen in Figure 4.5, there are too many rectangles in the phantom after the first step of region partition, due to the regularity of the decomposition. For example, the two regions marked F and G are unnecessarily separated. So, we move on to the second step to reduce the number of regions.

4.1.2 Region Growing

Now we want to decrease the number of regions by extending each region as much as possible, which eliminates regions that have the same density as a growing region. As can be seen in Algorithm 4.2, our heuristic is to grow a biggest region first, and within it to move a largest face first outward to extend the region. If more than one region or face ties in size, a region or face can be chosen arbitrarily. The chosen face is extended outward until a region with a different density is met. While extending each face, the regions with the same density are absorbed into the growing region. If a region is fully contained within a growing face, that region is absorbed completely into the growing region. If a region is partially contained, only the part contained in the growing face is merged into the growing region, and the remainder is partitioned into rectangular prisms that are inserted into the region table. Later on, these rectangular prisms are themselves checked for region growing. Suppose that the current growing region is labeled G in Figure 4.6.
Region G can move right until it meets the left face of region D, which has a different density from the growing region G. Region A is fully absorbed into region G. For region B, the part B1 is absorbed into region G and region B shrinks to B2. Similarly, region C1 is absorbed into region G, and the remaining part is partitioned into C2 and C3, which are inserted into the region table. We attempt to extend the newly created regions B2, C2, and C3 later.

Figure 4.6: Region extension

Algorithm 4.2 Region growing algorithm
// RegionTbl: table with region information decomposed
Reg_grow( RegionTbl )
{
  for each region do
    choose the biggest region b among ungrown regions;
    for each face of b do
      choose a max face f among not-yet-extended faces;
      move f outward, listing regions, until a region with different density is met;
      for each region r listed do
        if r is fully contained in b then
          delete r from the region table;
        end if
        if r is partially contained in b then
          absorb the part contained;
          create regions for the non-contained part;
          delete region r from the region table;
        end if
      end for
    end for
    update the size of b in the region table;
  end for
}

In Figure 4.5, the biggest regions are A, B, and C. We can choose one of them, say A, arbitrarily. The largest faces are the right and left ones, and we select the right face. When we try to move the right face, we meet region E, which has a different density, so the region cannot grow in the right direction. In the left direction, the negative x direction, the region cannot be extended since the region boundary is a phantom boundary. Now, check the top face. The top boundary is also a phantom boundary, so the region cannot move up. For the bottom face, region A can be extended since the neighbor region B has the same density, and region B is absorbed. Since the bottom of region B (y=0) is a phantom boundary, the growing of region A stops. The extended region A is marked as fixed since it has been grown. The next biggest region is C.
Since this region has two faces that are phantom boundaries and its other two faces meet the regions O and N, which have different densities, it cannot be grown. The next candidates are D, E, F, and G, which have the biggest size among all the regions except the already grown regions A and C. Suppose we choose region D. For region D, the right face can be extended up to x = 7 since the neighbor region M has the same density. By this growing, region M is merged into region D. The left face is also grown, merging region H, until it meets the right face of region P. The top face cannot be moved into region C since region C has a different density. The extension of the bottom face absorbs the regions I, J, K, and L. Let region F be selected for the next growing. In the right direction, region G is merged in and deleted. Even though the left neighbor A has the same density, the left face of region F cannot be extended since region A has already been grown. The top and bottom faces cannot be grown since they face the phantom boundary and region E with a different density, respectively. This process continues until all live regions are grown and fixed. Figure 4.7 shows the resulting partitioned phantom after the second step of the region partition.

Figure 4.7: Regions formed by the region partitioning process

As can be seen, the second step has eliminated about 62% of the rectilinear regions of Figure 4.5. According to my experimental results using a three dimensional real phantom obtained from CT imaging, about 80% of the regions resulting from the first step were removed by the region growing step. Now, consider how many intersections exist between a given line segment l and the partitioned phantom. If we directly use Figure 4.5, without the region growing step, we need to find 8 intersection points; with Figure 4.7, we need to find only five. So, we can expect faster radiological distance computation.
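A single face extension of the region-growing heuristic can be sketched at cell level (the actual algorithm works on the region table, as described above; this cell-level simplification and all names are assumptions):

```python
# Sketch of one face extension: the right face of a rectangle moves outward
# one column at a time while every cell it sweeps has the growing region's
# density, absorbing same-density material and stopping at a density change.

def grow_right(grid, x0, x1, y0, y1):
    """grid[y][x] holds densities; the rectangle is x0 <= x < x1, y0 <= y < y1.
    Returns the new x1 after extending the right face as far as possible."""
    density = grid[y0][x0]
    width = len(grid[0])
    while x1 < width and all(grid[y][x1] == density for y in range(y0, y1)):
        x1 += 1
    return x1
```

The stopping rule mirrors the walkthrough above: growth halts at either a phantom boundary or the first column containing a different density.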
4.2 Radiological Distance Computation

Using the partitioned regions from the previous section, ray tracing is performed repeatedly for a given query line segment l with arbitrary direction to compute the radiological distance corresponding to l. The physical representation of the partitioned phantom is shown in Figure 4.8. Each phantom element has an index or pointer into the region table, which holds the information (region bounds and density value) of the region to which the element belongs. In this particular example, it is assumed that the densities of the white and shaded regions are 1.0 and 0.6, respectively. To compute the radiological distance of a given line segment l, the region to which endpoint p belongs is accessed and we compute the intersection point r between the line segment l and the region accessed by index 1. Then we find the next region, indexed by 2, that the line segment l passes through. This is done by stepping a very tiny distance toward the other endpoint q and computing the phantom indices to which the stepped point belongs. Then another intersection point s is found. Continuing in this way, we reach the region that contains the other endpoint q. At this point, we have found all the intersection points, and the radiological distance rd can easily be computed by summing the radiological distances of the subsegments of l:

    rd = (d1 + d2 + d5 + d6) x 1.0 + (d3 + d4) x 0.6.

Figure 4.8: Partitioned regions representation (each phantom element stores an index into a region table of x bounds, y bounds, and density)

CHAPTER 5
EFFICIENT DYNAMIC SEARCH TREES

The dynamic search tree has very wide applications. A dynamic search tree may be used as a dictionary, to find the record with a given key, or as a double ended priority queue, into which we may insert an arbitrary key and delete the min or the max key.
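The traversal just described can be sketched in one dimension, where regions are intervals rather than rectangular prisms; the far face of the current region plays the role of the intersection point. All names are illustrative:

```python
# Sketch of the ray-tracing sum over a partitioned phantom: walk from p
# toward q, find where the segment leaves the current region, and add
# (segment length) x (region density). The 3D version intersects the line
# with rectangular prisms in the same region-by-region fashion.

def radiological_distance_1d(regions, p, q):
    """regions: list of (lo, hi, density) intervals covering [p, q]; p < q."""
    rd, x = 0.0, p
    while x < q:
        lo, hi, rho = next(r for r in regions if r[0] <= x < r[1])
        nxt = min(hi, q)          # "intersection point" with the far face
        rd += (nxt - x) * rho     # weight this piece by the region density
        x = nxt                   # step into the next region
    return rd
```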
The most efficient trees have optimal time complexity O(log n), where n is the number of elements in the tree. It is well known that the AVL tree, the red-black tree, and the splay tree are among the representative tree structures that show the best performance. Let's call all these binary search trees (BSTs) with a single element per node BST01. We propose a BST with supernodes that have multiple elements in each node instead of a single element. The BST with supernodes, say BSTsn, has the following properties:

1. Every internal (non-leaf) node is always full.
2. Leaf nodes may or may not be full.
3. Tree balance is maintained only for full nodes.

Supernodes have multiple elements in them and can be implemented in several ways: the elements can be kept in a sorted list, or in an unsorted array apart from the min and max elements. In the current implementation, we distinguish the min and max elements from the other elements within a supernode to lessen the cost of maintaining the order of elements in each node. As can be seen in Figure 5.1, a supernode has left child, right child, and parent pointers, and multiple elements in addition to the min and max elements. Each element consists of a key and a value field.

Figure 5.1: Supernode structure (parent pointer; min element; interior elements; max element; left and right child pointers)

The left subtree, pointed to by the left child pointer, has keys less than the min key of the current node, and the right subtree, pointed to by the right child pointer, has keys greater than the max key of the current node. To guarantee the first property, element insertion and deletion at a non-leaf node must be transformed into leaf node insertion and deletion. This can be done using the element with the largest key in the left subtree or the one with the smallest key in the right subtree. Since using the element with the smallest key in the right subtree is symmetric, we present only the way using the element with the largest key from the left subtree of a node. Suppose an element e is to be inserted into a non-leaf node p in Figure 5.2.
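The supernode of Figure 5.1 can be sketched as a small class. This is an illustrative Python rendering, not the dissertation's implementation; the capacity constant and method names are assumptions:

```python
# Sketch of the supernode layout of Figure 5.1: cached min and max elements,
# an unsorted array for the remaining elements, and parent/child links.
# Keeping min/max separate makes the range test in search O(1) per node.

class Supernode:
    CAPACITY = 4                      # total elements per node (assumed)

    def __init__(self):
        self.min_elem = None          # (key, value) with the smallest key
        self.max_elem = None          # (key, value) with the largest key
        self.others = []              # remaining elements, order not maintained
        self.parent = self.left = self.right = None

    def n_elem(self):
        return ((self.min_elem is not None) + (self.max_elem is not None)
                + len(self.others))

    def is_full(self):
        return self.n_elem() == self.CAPACITY
```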
Since the smallest element key in node p is greater than any key in the left subtree of p, the element e is exchanged with the element with the smallest key of p, so that after the exchange the displaced minimum element is larger than every key in the left subtree of p. Then, the node with the largest key in the left subtree of p is found by following the right-most path from the left child of p. This node is marked q. Since node q does not have a right child, the exchanged element can be inserted as a new right child of q if node q is full. If q is not full, then q is a leaf node, and the element can be inserted into this node.

Figure 5.2: Transformation to leaf node insertion and deletion

Deletion of an element from a non-leaf node can also be transformed into leaf node deletion (see Figure 5.2). Suppose that an element e is to be deleted from node p. The element with the smallest key is moved into the position of the element e to be deleted. Since the largest element of the left subtree of p is smaller than the smallest element of p, the node q is found in the same way as above. Then its element with the largest key is moved up to make node p full. This process is repeated until a leaf node is reached. How to maintain tree balance efficiently is described in the next section, and detailed algorithms for AVL, red-black, and splay trees follow. Then, we show how much space efficiency can be gained, and present performance comparison results between trees with single element nodes and trees with supernodes. In the following sections, the keys in the search trees are assumed distinct.

5.1 Rebalancing

Let's briefly see how the balance of BST01 is maintained. The balance of the AVL tree is maintained by rotations, and that of the RB tree by color flips as well as rotations. For the splay tree, a splay operation is performed following each operation to obtain amortized time complexity O(log n), where n is the number of elements in the tree. Here, we assume that a BSTsn satisfies all the BSTsn properties before any rotation, color flip, or splay.
Now, let's look at BSTsn from the viewpoint of tree rebalancing. Color flipping does not cause any problem, since it does not turn a leaf node into a non-leaf node. But a rotation or splay may violate the first property of BSTsn if a non-full leaf node participates in it. Let's first see the four rotation types: LL, LR, RL, and RR. Since the rotation types RR and RL are symmetric to LL and LR respectively, we explain only the LL and LR rotation types. Figure 5.3 shows the LL rotation type for tree rebalancing. If node x is a leaf node before the rotation, it is still a leaf node after the rotation. If node x is not a leaf node before the rotation, it is still not a leaf node after the rotation. Therefore, whether node x is full or not, the first property of BSTsn is still satisfied after the rotation.

Figure 5.3: LL rotation

The rotation type LR is shown in Figure 5.4. If node y is not a leaf node before the rotation, the first property of BSTsn is not violated after the rotation, since node y is full before and after the rotation. But if node y is a leaf node before the rotation, it may not be full, so the first property of BSTsn may no longer hold after the rotation.

Figure 5.4: LR rotation

Similarly, if the start node of a splay is a leaf node that is not full, the first property of BSTsn will be violated. Since the splay types R, RL, and RR are symmetric to L, LR, and LL respectively, we consider only the L, LL, and LR splay types. Figure 5.5 shows the L splay type. If node x before the splay is a non-full leaf node, the first property of BSTsn is violated; otherwise, all the properties hold.

Figure 5.5: L splay

Figure 5.6 shows the LL splay type. If the node x before the splay is a non-full leaf node, the first property of BSTsn is violated, since node x will be a non-leaf node following the splay. Of course, if node x is a full node, no property of BSTsn is violated.

Figure 5.6: LL splay

Figure 5.7 shows the LR splay type.
If the node y before the splay is a non-full leaf node, the first property of BSTsn is violated, since node y will be a non-leaf node following the splay. If node y is a full node, no property of BSTsn is violated.

Figure 5.7: LR splay

To fix these violations of the first property of BSTsn, we could make a non-full non-leaf node full by moving data up from a leaf node after the rotation or splay, but this would slow down the insert and delete operations. Another way to handle the problem is to rotate or splay only full nodes: if the current node is a leaf node and full, we can start the rotation from this node; otherwise, we can simply start the rotation or splay from the parent of the leaf node, since the parent node must be full. Even though this method may add one more tree level in total compared with the data-move method, the burden of data movement is avoided. Also, if each node has at least three elements, the tree height of BSTsn is still smaller than that of BST01. In our implementation, this scheme is used for the AVL, red-black, and splay trees. Detailed descriptions of the AVL, red-black, and splay trees follow.

5.2 AVL Tree with Supernode

Let's start by reviewing the AVL tree in which each node has only a single element; call it the AVL01 tree. The AVL01 tree enforces that the height difference between the left subtree and the right subtree of every node is zero or one. Rebalancing is performed after the insert or delete operation of a BST.

Figure 5.8: AVL tree supernode (fields: Parent; minkey, minval; maxkey, maxval; bf; n_elem; key_1, val_1; key_2, val_2; ...; LeftChild; RightChild)

In the AVL tree with supernodes (see Figure 5.8), each node has multiple elements, and each element has a key and a value field. Parent, LeftChild, and RightChild point to the parent node, left child node, and right child node, respectively. The n_elem field holds the number of elements the node currently has, and bf is the balance factor, maintained for full nodes.
The minimum and maximum keys are denoted by minkey and maxkey, with their own value fields minval and maxval, respectively. The rest of the elements are kept in an array and denoted key_1, key_2, ..., with their own values. The rebalancing following the insert or delete operation of the BST always starts from a full node, which may be a leaf node or the parent of a leaf node. When a leaf node becomes full because an element is inserted into it, the bf of this node is set to 0 and tree rebalancing is performed from this node. When a node becomes non-full because of an element deletion, tree rebalancing is started from the parent of the non-full node.

Algorithm 5.1 AVL Search algorithm
// key: key value to find
Search( key )
{
  np = root;
  while ( np ) do
    if( key < np->minkey ) np = left child of np;
    else if( key > np->maxkey ) np = right child of np;
    else break;
    end if
  end while
  if( np is NULL ) return (NULL, NOT_FOUND);
  if( key is found in the node np ) return (np, FOUND);
  return (np, NOT_FOUND);
}

The AVL tree search starts from the root node, as can be seen in Algorithm 5.1. If the input key is smaller than the minkey of the node, go to its left child. If the key is greater than the maxkey of the node, go to its right child. Otherwise, the break statement is executed to get out of the while loop, since the key is in the range of the current node; that is, the key is at least minkey and at most maxkey. After the while loop, the node pointer np may be NULL, from the while loop condition, or may point to a node, from the break statement. If np points to an internal node of the tree and the input key is found in that node, the search function returns the pair of the node pointer np and the boolean value FOUND. Otherwise, it returns NOT_FOUND with NULL or a node pointer, denoting that there is no element in the tree corresponding to the input key. For the insert operation (Algorithm 5.2), the AVL search function is called first to see whether the input key is already in the tree.
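Algorithm 5.1 can be sketched over supernodes represented as plain dictionaries. This is an illustrative Python rendering (the dict layout is an assumption); it returns the node containing the key, or None:

```python
# Sketch of the supernode search: descend left when the key is below the
# node's min, right when above its max; otherwise the key, if present at
# all, must be inside the current node.

def sn_search(node, key):
    while node is not None:
        if key < node['minkey']:
            node = node['left']
        elif key > node['maxkey']:
            node = node['right']
        else:
            # key is within [minkey, maxkey]: look inside this node only
            return node if key in node['keys'] else None
    return None
```

Note that a miss can be detected inside a node: if the key falls in the node's range but is absent from it, no other node can contain it.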
If so, nothing needs to be done and the insert function returns FOUND. Otherwise, the input data are inserted into the tree. Consider first the case where the node pointer np returned by the search function is NULL. If the parent pp of np is non-full, the input data are inserted into pp. If node pp thereby becomes full, the balance factor bf of node pp is set to zero and tree rebalancing is performed starting from node pp. If pp is already full, a new node np is allocated and initialized with the input data key and val, and this new node is attached to the node pointed to by pp: if the input key is greater than pp's maxkey, np becomes the right child of pp; if the input key is less than pp's minkey, np is linked as the left child of pp. Now, consider the case where the node pointer np from the tree search function is not NULL. If np is not full, the input data are inserted into this node. If the data insertion makes this node full, its balance factor is set to 0 and the tree is rebalanced starting from this node. If the node np is full, the position to insert the input data must be found. The left and right children of np are checked to see whether one of them is NULL. If so, a new node is allocated, the input data are inserted into that node, and the node is attached to np.
In the case that neither child is NULL, the input data are exchanged with the min element of the node np and we find the node cp with the largest key in the left subtree of np. If that node is full, a new node temp is allocated and the exchanged data are inserted into this new node; the new node temp is linked to cp as a child node. If cp is not full, the exchanged data are inserted into this node and we check whether the node cp has become full. If full, cp's balance factor is set to 0 and tree rebalancing is performed, starting from cp.

Algorithm 5.2 AVL Insert algorithm
// key, val: input data to insert
Insert( key, val )
{
  (np, found) = Search( key );
  if( found is true ) return FOUND;
  let pp be the parent node of np;
  if np is NULL then
    if pp is NOT FULL then
      insert key and val into pp;
      if( pp is FULL ) pp->bf = 0, perform rebalancing starting from pp;
    else
      create a node np with (key, val); attach np to pp;
    end if
  else if np is NOT FULL then
    insert key and val into np;
    if( np is full ) np->bf = 0, rebalance tree starting from np;
  else if np has no left child then
    swap input data with the min data of np;
    create a node cp with (swapped data), attach cp to np as left child;
  else if np has no right child then
    swap input data with the max data of np;
    create a node cp with (swapped data), attach cp to np as right child;
  else
    swap input data with the min data of np;
    find the node cp with largest key from the left subtree of np;
    if cp is full then
      create a node temp with (swapped data);
      attach temp as right child of cp;
    else
      insert swapped data into cp;
      if( cp is FULL ) cp->bf = 0, rebalance tree starting from cp;
    end if
  end if
  return INSERTED;
}
Algorithm 5.3 AVL Delete algorithm
// key: key of element to delete
Delete( key )
{
  (np, found) = Search( key );
  if( found is false ) return NOT_FOUND;
  if np has left child then
    while np is not leaf node do
      find the node cp with largest key from the left subtree of np;
      move up the largest key element of cp into np;
      np = cp;
    end while
    delete the element with largest key from np;
  else if np has right child then
    while np is not leaf node do
      find the node cp with smallest key from the right subtree of np;
      move up the smallest key element of cp into np;
      np = cp;
    end while
    delete the element with smallest key from np;
  else
    delete the element with the input key from np;
  end if
  if( np has only one element ) dispose of node np;
  if np was full before deletion then
    rebalance tree starting from the parent node of np;
  end if
  return DELETED;
}

Now, let's see the AVLsn delete operation (Algorithm 5.3). As with the AVL tree insert function, the search function is called first to see whether the input key is in the tree. If not, this function returns NOT_FOUND, indicating that the key is not in the AVL tree. Otherwise, the element deletion procedure is performed. To delete the element whose key equals the input key, we check whether the node np returned from the search operation has a left child or a right child, or is just a leaf node. If the node np is not a leaf, it would end up as a non-full node if the element corresponding to the input key were simply deleted, so this node must be refilled. To do this, we find an element to fill it from the left or right subtree of np. If np has a left child, the element with the minkey of np is moved into the place of the element with the input key, leaving the minkey slot empty. Then, the element with the largest key in the left subtree of np is found and moved up into the vacated minkey position of node np. This process is repeated until the node pointed to by np is a leaf node.
If the np returned from the search operation has a right child, the element with the smallest key in the right subtree of np is moved up instead. This process is repeated until np becomes a leaf. Now that np is a leaf, we can delete an element from np without violating the AVLsn tree properties. If the node np had only one element before the deletion, that node is disposed of, since np would become an empty node. If node np was full before the deletion, tree rebalancing is performed starting from the parent node of np, since node np is no longer full.

5.3 Bottom-up Red-black Tree with Supernode

As with an AVL tree, a red-black tree is also a balanced binary search tree, balanced in terms of the number of black nodes on any path from the root to an external node. The red-black tree can be rebalanced efficiently in the bottom-up direction. Let's call the red-black tree with a single element per node whose rebalancing is done bottom-up RBB01. In RBB01, a newly allocated node has black color, and rebalancing is done by color flips and/or a rotation when an element is inserted into or deleted from the tree. The red-black tree with supernodes that have multiple elements, RBBsn, can be implemented in a similar way to the AVL tree; that is, rebalancing is performed only for full nodes. All non-full nodes are black. When a non-full node becomes full, this node becomes red and participates in tree balancing, which is done by color flips or rotations as for RBB01. When a full node becomes non-full due to an element deletion, the color of this node is changed to black and tree rebalancing is performed in the same way as for RBB01.

Figure 5.9: Red-black tree supernode structure (fields: Parent; minkey, minval; maxkey, maxval; color; n_elem; key_1, val_1; key_2, val_2; ...; LeftChild; RightChild)

Figure 5.9 shows the supernode structure of the RBBsn tree. The number of elements within a node is held in the n_elem field, and the node color, used for tree rebalancing, is denoted by the color field.
The two elements with the minimum and maximum keys are distinguished from the other elements. Each node has parent, left child, and right child pointers. All elements less than the minkey are in the left subtree pointed to by LeftChild, and all elements greater than the maxkey are in the right subtree pointed to by RightChild.

As can be seen in Algorithm 5.4, the search of a red-black tree starts from the root node. If the input key is less than the minkey of the current node np, np is set to its left child. If the input key is greater than the maxkey of the current node np, np is set to its right child. Otherwise, we exit the while loop, since the input key lies between the minkey and maxkey of the current node np. After the while loop, np is checked for NULL. If it is NULL, the search function returns a NULL pointer with the boolean value NOT_FOUND. If np is not NULL, the key is searched for within the current node np. If the input key is found there, np is returned with the boolean value FOUND; if not, the pair np and NOT_FOUND is returned.

Algorithm 5.4 RB Search algorithm (bottom-up)
// key: key value to find
Search( key )
{
    np = root;
    while ( np ) do
        if( key < np->minkey ) np = left child of np;
        else if( key > np->maxkey ) np = right child of np;
        else break;
        end if
    end while
    if( np is NULL ) return (np, NOT_FOUND);
    if( key is found in np ) return (np, FOUND);
    return (np, NOT_FOUND);
}

To insert input data consisting of a key and a value (see Algorithm 5.5), the search function is called to find the insertion position. If the input key is already in the tree, we return without further work. If the node np returned by the Search() function is NULL, the input data are inserted into the parent node pp of np if pp is not full. If pp is full, a new black node is created and the input data are inserted into it; the new node is then attached to pp as the left child if the input key is less than the minkey, and as the right child otherwise.
Algorithm 5.5 RB Insert algorithm (bottom-up)
// key, val: input data to insert
Insert( key, val )
{
    if( found is true ) return FOUND;
    let pp be the parent node of np;
    if np is NULL then
        if pp is NOT FULL then
            insert key and val into pp; np = pp;
        else
            create node np with (color BLACK, key, val);
            attach np to pp;
        end if
    else
        if np is NOT FULL then
            insert key and val into np;
        else
            if np has no left child then
                exchange input data with the min data of np;
                create node cp with (color BLACK, exchanged data);
                attach cp to np as left child;
            else if np has no right child then
                exchange input data with the max data of np;
                create node cp with (color BLACK, exchanged data);
                attach cp to np as right child;
            else
                exchange input data with the min data of np;
                find the node cp with largest key from the left subtree of np;
                if cp is full then
                    create node ccp with (color BLACK, exchanged data);
                    attach ccp to cp as right child; cp = ccp;
                else
                    insert exchanged data into cp;
                end if
            end if
            np = cp;
        end if
    end if
    if( np is FULL ) np->color = RED, perform tree rebalancing starting from np;
    return INSERTED;
}

If the node pointer np from the Search() function is not NULL, np is checked to see whether it is full. If not, the input data are inserted into this node. When np is full and one of its children is missing, a new black node is created, the input data are inserted into it, and the new node is linked to np as its left or right child. If np has both a left and a right child, the position to insert the input data must be found. To do this, the input data are exchanged with the element with the minkey, and then the node cp with the largest key in the left subtree of np is found. Since the exchanged data are larger than any key in the left subtree of np, they can be inserted into cp as the element with the maxkey. If cp is full, a new black node is created, initialized with the exchanged data, and attached to cp. If cp is not full, the exchanged data are simply inserted into cp.
Since the input data have now been inserted into the red-black tree, we check whether tree rebalancing is necessary: if np is full, its color is changed to red and tree rebalancing is performed using color flips or rotations. As can be seen in Algorithm 5.6, the delete operation starts by looking up the input key with the Search() function. If the key is not found in the tree, this function returns the boolean value NOT_FOUND. Otherwise, deletion from a leaf node is performed. If the node np returned by the Search() function has a left child, the node cp with the largest key in the left subtree of np is found and the largest key element of cp is moved up into np as the minkey element. This process is repeated until np ends up at a leaf node. If np has only a right child, the node cp with the smallest key in the right subtree of np is found and the element with the smallest key is moved up into np as the maxkey element; this process is repeated until np is a leaf. After this, the actual deletion at np is performed for the element that was moved up or the element with the input key, so that the minkey and maxkey remain distinguished from the other elements.
Algorithm 5.6 RB Delete algorithm (bottom-up)
// key: key of element to delete
Delete( key )
{
    if( found is false ) return NOT_FOUND;
    if np has left child then
        while np is not leaf node do
            find the node cp with largest element from the left subtree of np;
            move up the largest key element of cp to np;
            np = cp;
        end while
        delete the element with maxkey;
    else if np has right child then
        while np is not leaf node do
            find the node cp with smallest element from the right subtree of np;
            move up the smallest key element of cp to np;
            np = cp;
        end while
        delete the element with minkey;
    else
        delete the element with the input key from np;
    end if
    if np had single element before deletion then
        deallocate np;
        return DELETED;
    end if
    if np was full before deletion then
        np's color = BLACK;
        rebalance tree starting from the parent node of np;
    end if
    return DELETED;
}

After the actual deletion, np is checked to see whether it held only a single element before the deletion. If so, the node is detached from its parent and disposed of. If the leaf node np was full before the deletion, its color is set back to black and tree rebalancing is performed starting from its parent node. Once the element with the input key has been deleted, the Delete() function returns the boolean value DELETED.

5.4 Bottom-up Splay Tree with Supernode

In a bottom-up splay tree in which each node has a single element, say SPB01, each operation is followed by a splay to achieve an amortized time complexity of O(log n). If an operation ends at a failure node, the splay is performed starting from the parent node of the failure node. In the splay tree with supernodes holding multiple elements, say SPBsn, each operation is followed by a splay for a full node. If an operation ends at a failure node or a non-full node, the splay is performed starting from the first full ancestor of that node: that is, if the last visited node np of an operation is full, the splay starts from np; if it is non-full, the splay starts from the parent node of np.
If the operation ends at a failure node, the last visited node is checked to see whether it is full. If so, a splay is started from this full node; if not, a splay is performed starting from the parent node of the last visited node.

The supernode structure used for the splay tree is shown in Figure 5.10. The two elements with the minkey and maxkey are differentiated from the other elements within the node, which are stored in unordered sequence. The n_elem field counts the current number of elements in the node. The three pointers Parent, LeftChild, and RightChild link to the parent, left child, and right child of the node, respectively. All elements in the left subtree pointed to by the LeftChild field have keys less than the minkey of the current node, and all elements in the right subtree have keys greater than the maxkey of the current node.

[Figure 5.10: Splay tree supernode structure — fields: Parent, (minkey, minval), (maxkey, maxval), n_elem, (key_1, val_1), (key_2, val_2), ..., LeftChild, RightChild]

Algorithm 5.7 Splay tree Search algorithm (bottom-up)
// key: key value to find
Search( key )
{
    np = root;
    while ( np ) do
        if( key < np->minkey ) np = left child of np;
        else if( key > np->maxkey ) np = right child of np;
        else break;
        end if
    end while
    result = NOT_FOUND;
    if( np != NULL and key is found in np ) result = FOUND;
    splay starting from the first full ancestor node of np;
    return (np, result);
}

The search in Algorithm 5.7 starts from the root node. If the input key is less than the minkey of the current node, np moves to its left child; if the key is greater than the maxkey of the current node, np moves to its right child. When the input key lies between the minkey and maxkey of the current node, the break statement in the while loop is executed. After the while loop, the pair np and the boolean value FOUND are returned if the node pointer np is not NULL and the input key is found in the node np.
Otherwise, the search function returns the boolean value NOT_FOUND together with the node pointer np, which may be NULL or may point to an internal node.

To insert an element, the input key is first searched for (see Algorithm 5.8). If the input key is already in the tree, a splay is performed starting from the node np or from its parent: if np is not full, the splay starts at the parent node of np; otherwise it starts at np. When the input key is not found in the tree, the pointer np may or may not be NULL. If np is NULL, the parent node pp of np is examined. If pp is a non-full node, the input data, which consist of a key and its value, are inserted into pp. If pp is full, a new node is created and initialized with the input data; the new node is attached to pp as the left child if the input key is less than the minkey of pp, and as the right child otherwise. A splay is then performed starting at the first full ancestor node of np. When the node pointer np from the Search() function is not NULL, np is checked to see whether it is full. If not full, the input data are inserted into this node. If np points to a full node, three cases are considered. First, if np has no left child, the element with the minkey of np is extracted and the input data are inserted into np; a new node is then created, initialized with the extracted data, and attached to np as its left child. Second, if np has no right child, the element with the maxkey of np is exchanged with the input data; a newly created node is initialized with the exchanged data and attached to np as its right child.

Algorithm 5.8 Splay tree Insert algorithm (bottom-up)
// key, val: input data to insert
Insert( key, val )
{
    if( found ) splay tree starting from the first full ancestor node of np, return FOUND;
    let pp be the parent node of np;
    if np is NULL then
        if pp is NOT FULL then
            insert key and val into pp;
        else
            create a node np with (key, val);
            attach np to pp as child;
        end if
        np = pp;
    else
        if np is NOT FULL then
            insert key and val into np;
        else
            if np->LeftChild == NULL then
                exchange the input data with min data of np;
                create a node cp with (the exchanged min data);
                attach cp to np as left child;
            else if np->RightChild == NULL then
                exchange the input data with max data of np;
                create a node cp with (the exchanged max data);
                attach cp to np as right child;
            else
                exchange input data with the min data of np;
                find the node cp with largest key from the left subtree of np;
                if cp is full then
                    create node temp with (the exchanged min data);
                    attach temp to cp as right child of cp;
                else
                    insert the exchanged data into cp;
                end if
                np = cp;
            end if
        end if
    end if
    splay tree starting from the first full ancestor node of np;
    return INSERTED;
}

Finally, when the node np has both left and right children, the input data are exchanged with the minkey element of np, and the node cp with the largest element in the left subtree of np is found. If cp points to a full node, the exchanged data initialize a newly created node temp that is linked to cp as its right child, since the exchanged data are larger than any data in the left subtree of np. If cp is not full, the exchanged data are inserted into cp. After the input data have been inserted into the tree, a splay is done starting from the first full ancestor of np.

The delete operation of the splay tree in Algorithm 5.9 starts by searching for the input key in the tree. If the input key is not in the tree, a splay is performed starting at the first full ancestor node of np and the boolean value NOT_FOUND is returned.
When the input key is found in the tree, deletion of the element with the input key begins. Since all nodes other than leaves must be full, np is checked to see whether it is a leaf. If np has any child, then it is not a leaf, and the slot of the deleted element must be refilled from its left or right subtree. If np has a left child, the minkey slot is emptied by moving its element into the slot occupied by the element with the input key. Then the node cp with the largest element in the left subtree of np is found, and the largest key element of cp is moved up into the empty slot of np. This process is repeated until np points to a leaf node. At the leaf, the elements of np are rearranged so that the element with the maxkey is again distinguished from the other elements. Similarly, when the np resulting from the search function has a right child, the maxkey slot is emptied by moving its element into the slot of the element with the input key. The element with the smallest key in the right subtree of np is located and moved up into np as the element with the maxkey.
Algorithm 5.9 Splay tree Delete algorithm (bottom-up)
// key: key of element to delete
Delete( key )
{
    if found is false then
        splay tree starting from the first full ancestor node of np, return NOT_FOUND;
    end if
    if np has left child then
        move the min data to the element slot to be deleted;
        while np is not leaf node do
            find the node cp with the largest element from the left subtree of np;
            move up the largest element of cp to np;
            np = cp;
        end while
        fill out the max data of np;
    else if np has right child then
        move the max data to the element slot to be deleted;
        while np is not leaf node do
            find the node cp with smallest element from the right subtree of np;
            move up the smallest element of cp to np;
            np = cp;
        end while
        fill out the min data of np;
    else
        delete the element with the input key from np;
    end if
    splay tree starting from the first full ancestor of np;
    return DELETED;
}

This process is repeated until np points to a leaf node. Then the element with the smallest new key in the leaf node is found and distinguished. If the np from the Search() function is a leaf node, the element with the input key is deleted directly. After the deletion, a splay begins starting from the first full ancestor node of np.

5.5 Top-down Splay Tree with Supernode

As opposed to the bottom-up splay tree, the splay operation of a top-down splay tree with single-element nodes, SPT01, is performed on the way down the root-to-leaf path, while still preserving the O(log n) amortized time complexity [32]. For each operation, the input key is searched for while splaying; after the search, the node containing the input key becomes the root node. For the delete operation, if the node with the input key has both left and right children, the two subtrees are joined and the root of the joined subtree becomes the root of the overall tree.
While searching down for the node with the input key, the two splayed subtrees, LT and RT, collect all subtrees that have been split off: all nodes with keys smaller than the input key are attached to LT, and all nodes with keys greater than the input key are attached to RT. When the node with the input key is found, this node, its two subtrees, and LT and RT are assembled into a single tree with the node containing the input key as the root. In the top-down splay tree with supernodes, SPTsn, splaying is done only for full nodes. For the delete operation, leaf node conversion is performed instead of joining the two subtrees of the node with the input key.

The search operation (Algorithm 5.10) looks for the input key starting from the root node. In each iteration of the while loop, two nodes (np and cp) are compared against the key, except in the final step. After every two comparisons, the appropriate splay (LL, LR, RL, or RR), depending on the key values in the nodes, is performed, and the splayed-off subtree is attached to LT or RT.

Algorithm 5.10 Splay tree Search algorithm (top-down)
// key: key value to find
Search( key )
{
    np = root;
    while ( true ) do
        if key < np->minkey then
            cp = np->LeftChild;
            if key < cp->minkey then
                perform splay LL; np = cp->LeftChild;
            else if key > cp->maxkey then
                perform splay LR; np = cp->RightChild;
            else
                assemble subtrees;
                if( key is found in node cp ) return FOUND;
                return NOT_FOUND;
            end if
        else if key > np->maxkey then
            cp = np->RightChild;
            if key < cp->minkey then
                perform splay RL; np = cp->LeftChild;
            else if key > cp->maxkey then
                perform splay RR; np = cp->RightChild;
            else
                assemble subtrees;
                if( key is found in node cp ) return FOUND;
                return NOT_FOUND;
            end if
        else
            assemble subtrees;
            if( key is found in node np ) return FOUND;
            return NOT_FOUND;
        end if
    end while
}

When the node that covers the input key is found, all the subtrees are assembled into one. After this assembly, the search function returns FOUND if the key is found; otherwise, NOT_FOUND is returned.
As can be seen in Algorithm 5.11, the insert operation searches for the node covering the input key; we assume that such a node exists in the tree. If the node np already contains the input key, the function assembles the subtrees and returns the boolean value FOUND. If np is not full, the input data are inserted into this node, all the subtrees are merged, and the insert function returns the boolean value INSERTED, denoting that the insertion succeeded. If np has no right child, the input data are exchanged with the max data of np; a node with the exchanged data is then created and attached to np as the right child. If np has no left child, the symmetric operation is performed. When np has a left child, the input data are exchanged with the min element of np, and the node tp with the largest key in the left subtree of np is found. If tp is already full, a node with the exchanged data is created and linked to tp as a right child; if tp is not full, the exchanged data are inserted into tp as the max element. After this insertion, all subtrees are merged together and INSERTED is returned.

For the top-down splay delete (Algorithm 5.12), the node np covering the input key is found as in the search function. If the input key is not found in np, the function returns the boolean NOT_FOUND after assembling all subtrees together. If np is a leaf node, the element with the input key is deleted from np; np is disposed of if it becomes empty after the deletion. All subtrees are then assembled, making np's parent node the overall root, and the boolean DELETED is returned. When np is not a leaf node, np must be refilled from its left or right subtree. If np has a left child, the largest element of its left subtree is moved up to become the smallest element of np. Before this move-up, the min element slot of np is emptied by moving its element to the slot of the element with the input key.

Algorithm 5.11 Splay tree Insert algorithm (top-down)
// key, val: input data to insert
Insert( key, val )
{
    find the node np covering the input key, performing splay;
    if( the input key is found in np ) { assemble subtrees; return FOUND; }
    if np is not full then
        insert the input data into np; assemble subtrees; return INSERTED;
    end if
    if np has no right child then
        exchange the input data with max data of np;
        create a node cp with the exchanged data;
        attach cp to np as a right child;
        assemble subtrees; return INSERTED;
    end if
    exchange the input data with min data of np;
    if np has no left child then
        create a node cp with the exchanged data;
        attach cp to np as a left child;
        assemble subtrees; return INSERTED;
    end if
    find the node tp with largest key from the left subtree of np;
    if tp is not full then
        insert the exchanged data into tp;
    else
        create a node cp with the exchanged data;
        attach cp to tp as the right child;
    end if
    assemble subtrees;
    return INSERTED;
}

Algorithm 5.12 Splay tree Delete algorithm (top-down)
// key: key of element to delete
Delete( key )
{
    find the node np covering the input key, performing splay;
    if( key is not found in np ) { assemble subtrees; return NOT_FOUND; }
    if np is a leaf node then
        delete the element with key;
        if( np has no element ) dispose np;
        assemble subtrees; return DELETED;
    end if
    if np has left child then
        make min element of np empty;
        move up the largest element of left subtree into np;
        repeat this move-up until a leaf node tp is reached;
    else
        make max element of np empty;
        move up the smallest element of right subtree into np;
        repeat this move-up until a leaf node tp is reached;
    end if
    if( tp has no element ) dispose tp;
    assemble subtrees;
    return DELETED;
}

The move-up from the right subtree is performed symmetrically. After the repeated move-ups, we end at a leaf node tp. If tp becomes empty as a result of the move-ups, tp is removed.
Finally, all the subtrees are assembled to form a splay tree with the parent node of tp as the root, and the delete function returns DELETED.

5.6 Space Efficiency

Now let us look at the memory needed to accommodate n elements, where each element consists of a key and a value field. We assume that each node occupies an integral number of bytes and that each field other than bf, color, and n_elem is 4 bytes. Each of the fields bf, color, and n_elem requires at most 1 byte; collectively, 4 bytes are allocated to these fields in a supernode. For the search tree with single-element nodes, the space requirement S1 is (20 + 4)n bytes, where the 4 bytes are for the bf field. That is,

    S1 = 24n    (5.1)

To analyze the space requirement Ss of the supernode scheme, let m be the number of leaf nodes and c the maximum number of elements a supernode can hold. In the worst case, every leaf node has only one element, which gives the relation (m - 1)c + m = n, where m - 1 is the number of internal nodes. Hence

    m = (n + c) / (c + 1)    (5.2)

The space requirement of the supernode scheme is Ss = (2m - 1)(8c + 12 + 4) bytes, where 8c is for the c elements in a supernode, 12 is for the three pointers in each node, and 4 is for the remaining fields, namely n_elem and bf for the AVL supernode, n_elem and color for the red-black tree, and n_elem for the splay tree. Substituting Equation 5.2,

    Ss = (2m - 1)(8c + 16)
       = 8(2(n + c)/(c + 1) - 1)(c + 2)
       ≈ 16(n + c)    (5.3)

Therefore, the space efficiency Es = Ss / S1 is

    Es ≈ 2(n + c) / (3n)    (5.4)

Also, from Equations 5.1 and 5.3, the condition for Ss to be smaller than S1 is 16(n + c) < 24n. From this relationship, Ss < S1 iff c < 0.5n.

5.7 Efficient Use of Cache Memory

The performance of the dynamic search tree operations insert, delete, and search can be improved by using cache memory efficiently. Let us look at the supernode structure again.
The frequently used fields of the supernode are LeftChild, RightChild, minkey, and maxkey, since all three operations use these four fields to find the node with a given input key. So the node layout can be changed as shown in Figure 5.11.

[Figure 5.11: Supernode layout for the efficient use of cache memory — minkey, maxkey, LeftChild, and RightChild placed at the front of the node]

With this layout, we can access all four of the frequently used fields with at most one cache miss, provided all four fit into the same cache line. This happens, for example, when the supernode begins at a cache line boundary and the memory required by these four fields is no more than the size of a cache line. Typically, allocating memory aligned to a cache line boundary is more time consuming than allocating unaligned memory. Therefore, when using cache aligned memory, we allocate aligned memory in chunks that are large enough for several supernodes. Figure 5.12 shows a cache aligned memory chunk the size of 3 supernodes.

[Figure 5.12: Cache aligned chunk with 3 supernodes]

If the cache line size is LineSize and the supernode size in bytes is NS, then each supernode is cache aligned iff

    NS mod LineSize = 0    (5.5)

Since LineSize is typically a power of 2 (the size of an L1 cache line on a SUN Sparc workstation is 32 bytes and that of an L2 cache line is 64 bytes), for Equation 5.5 to hold, NS must be a multiple of LineSize. Suppose that each supernode has NNE elements, each element comprises a 4 byte key and a 4 byte value, and each of the 3 pointers in a node is 4 bytes. To have any hope of satisfying Equation 5.5, we allocate 4 bytes for the remaining fields n_elem, bf, and color (even though fewer bytes suffice). So we get

    NS = 8 * NNE + 16    (5.6)

From Equations 5.5 and 5.6, we get

    (8 * NNE + 16) mod LineSize = 0    (5.7)

Cache alignment of all supernodes in a cache aligned chunk is assured, for example, when NNE = 2, 6, 10, 14, ... and LineSize = 32 bytes.
Next, suppose we have a direct-mapped cache, as is the case for a SUN Sparc workstation. Byte b of memory maps into cache line

    CacheLine(b) = floor(b / LineSize) mod NumberOfCacheLines    (5.8)

Suppose we use cache aligned supernodes and each supernode occupies an integral number of cache lines, as required by Equation 5.5. Suppose further that our first supernode begins at byte 0. Then, assuming no memory is allocated for other purposes, each of our supernodes begins at byte i * NS for some non-negative integer i. Therefore, the frequently used 4 fields of each supernode fall into the cache line given by

    floor(i * NS / LineSize) mod NumberOfCacheLines = (i * LinesPerNode) mod NumberOfCacheLines    (5.9)

where LinesPerNode = NS / LineSize is the number of cache lines needed for a supernode. Since the number of cache lines in both the L1 and L2 caches of a computer is a power of 2, we see from Equation 5.9 that when LinesPerNode is even, the four frequently used fields of a supernode are always mapped into an even cache line; effectively, only half the cache lines get used. As long as the two numbers LinesPerNode and NumberOfCacheLines are relatively prime, the numbers generated by Equation 5.9 are uniformly distributed over { 0, 1, 2, ..., NumberOfCacheLines - 1 } by the following theorem [31].

Theorem 5.1 For integers j ( 0 <= j < n ), (j * m) mod n generates d copies of the set { 0, d, 2d, ..., n - d } of n/d numbers, where d = gcd(m, n). ([31])

By this theorem, we can say that for every m and every p such that m ( 1 <= m < p ) and p are relatively prime (that is, d = 1), the set { 0, 1, 2, ..., p - 1 } is generated by (j * m) mod p, where 0 <= j < p. Therefore, for 0 <= j,

    (j * p) mod n    (5.10)

where p and n are relatively prime, generates uniformly distributed numbers over { 0, 1, 2, ..., n - 1 }.
Since NumberOfCacheLines is a power of 2, it follows that by choosing LinesPerNode to be an odd number we ensure that Equation 5.9 gives a uniform distribution of cache lines. Note that choosing LinesPerNode such that k * LinesPerNode and NumberOfCacheLines are relatively prime, where k is an integral constant > 1, also gives a uniform utilization of cache lines. So, for example, when LinesPerNode = 6.5, we get a uniform utilization of cache lines. When a cache line is 32 bytes, cache utilization may be further improved by separating the frequently used fields of a node from the infrequently used fields; we can then arrange for the frequently used fields of two nodes to fill one cache line. Doing this complicates the code somewhat, and we do not explore it further in this thesis.

5.8 Experimental Results

As predicted by our analysis, the supernode schemes reduce memory requirements. Table 5.1 gives the actual memory required for various supernode sizes and for the one element per node scheme. For example, a one million element AVL tree using one element per node takes 24 MB (megabytes) of memory, whereas when supernodes of size 26 are used, only 12 MB of memory are needed.

Table 5.1: Actual memory utilization in megabytes

    data set  #elem/node  AVL   RBB   SPB   SPT
    1M        01          24.0  24.0  24.0  16.0
              06          14.2  14.2  14.3  14.3
              12          12.9  12.9  13.0  13.0
              22          12.3  12.3  12.3  12.3
              26          12.1  12.2  12.2  12.2
    4M        01          96.0  96.0  96.0  64.0
              06          56.6  56.6  56.9  57.0
              12          51.6  51.6  51.8  51.8
              22          49.2  49.2  49.3  49.3
              26          48.7  48.7  48.8  48.8

To compare the execution times of BST01 and BSTsn, we used four test models (base, hold, stack, and queue models), as shown in Figure 5.13, and a histograming application [34]. In the base test model (Figure 5.13 (a)), we measure insert, search, and delete times separately. Starting with an empty tree, a set of random data is inserted and the total time for these insertions is measured.
Next, a sequence of searches is done and the total search time measured. Finally, all data in the binary tree are deleted in random order and the total deletion time measured. In the hold test model [33], shown in Figure 5.13 (b), we measure the execution time when insertion and deletion operations are intermixed; the time spent initializing and destroying the tree is ignored. The intermixed insertion and deletion operations are performed so that the tree size stays roughly unchanged. In the stack model (Figure 5.13 (c)), data are inserted in increasing order and deleted in decreasing order; the insertion and deletion sequence is done twice. In the queue model (Figure 5.13 (d)), both insertions and deletions are performed in increasing order, and repeated.

[Figure 5.13: Four test models (I: Insert, S: Search, D: Delete, rand: random sequence, inc: increasing sequence, dec: decreasing sequence)]

We measured execution time under UNIX on an Ultra Sparc workstation whose L1 cache size is 16K bytes and whose L1 cache lines are 32 bytes. All programs were written in C++ and compiled using the maximum optimization option O3. The run time performance of the single element per node version of a data structure was compared with that of the supernode as well as the cache aligned supernode (each supernode begins at a cache line boundary) version of the same data structure. Notice that the single element per node versions of our data structures (other than the top-down splay tree data structure) do not use nodes that necessarily start at cache line boundaries. The speedup obtained by a supernode implementation is the run time for the single element per node data structure divided by the run time for the supernode implementation. The non-cache aligned schemes used the new and delete operators of C++ to get and free nodes.
For the cache aligned schemes, nodes were allocated from large chunks of cache aligned memory (this was done to reduce memory allocation time). The chunk size was chosen so as not to exceed the size of the L1 cache and so as to be a multiple of the node size:

    chunksize = Nnodes * LinesPerNode = floor(NumberOfCacheLines / LinesPerNode) * LinesPerNode    (5.11)

where Nnodes is the number of nodes that can be allocated from a memory chunk, and chunksize is measured in cache lines.

Base Model. The run times and speedups reported in Tables 5.2-5.17 were obtained as follows:

Insert time: Start with an empty tree and insert N = 1 million (or 4 million) different integer keys in the range [1, 2N]. The insertion sequence is obtained by generating a random permutation of the integers 1 through 2N, and then inserting the first N numbers of this random permutation.

Search time: Generate a random sequence of N different numbers (keys) in the range [1, 2N] and search for each number in this sequence once. Note that some searches will be for numbers not in the tree.

Delete time: Obtain a random permutation of the N keys in the tree and delete all keys from the tree in the order specified by this permutation.

The times and speedups reported in Tables 5.2-5.17 are averages over 10 data sets generated as above. The same 10 data sets were used for all data structures (of course, the 10 data sets used when N = 10^6 differ from those used when N = 4 x 10^6). Tables 5.2-5.5 give the results for AVL trees. Our supernode scheme outperforms the traditional single element per node scheme, and the speedup obtained depends on the supernode size. For non-cache aligned tests with N = 10^6, using supernodes with 26 keys results in a speedup of 1.7 for insert, 1.8 for search, and 1.9 for delete. The speedup numbers when N = 4 x 10^6 and each node has 26 keys are 1.5, 1.6, and 1.6. When cache aligned nodes are used, the speedup numbers for N = 10^6 and 6 elements per node are 1.9, 1.7, and 2.4.
When N = 4 × 10^6 and each supernode has 6 elements, the speedups are 1.7, 1.5, and 2.0. Tables 5.6-5.9 give our experimental results for red-black trees. The row labeled STL (Standard Template Library) in Tables 5.6 and 5.8 gives the times and speedups for the C++ STL implementation of red-black trees (the STL code of Silicon Graphics Computer Systems, Inc. (www.sgi.com/Technology/STL/) was used). This implementation allocates nodes in chunks of size sufficient to suballocate 40 single element nodes and uses the single element per node scheme. Although the STL implementation is faster than our single element per node implementation on the insert test, it is slower on the search and delete tests. When N = 10^6, our supernode scheme with non-aligned nodes and 22 elements per node yields speedups of 1.7, 1.8, and 1.8, respectively, for the three parts of our test. When cache aligned nodes are used, these speedup numbers increase to 1.9, 1.8, and 2.0. For the case when N = 4 × 10^6 and each supernode has 12 elements, the speedup numbers are 1.5, 1.6, 1.5, 1.7, 1.6, and 1.7. For bottom-up splay trees (Tables 5.10-5.13) and top-down splay trees (Tables 5.14-5.17) also, the use of supernodes results in a speedup over the traditional single element per node implementation. The row labeled Al in Tables 5.15 and 5.17 is for single element per node top-down splay trees using cache aligned nodes. This row is possible because, in a top-down splay tree, nodes do not need a parent field. Consequently, each node requires 16 bytes, so two nodes fit into an L1 cache line. Therefore, single element nodes can be allocated so that each node starts either at an L1 cache line boundary or in the middle of one without incurring memory wastage. As can be seen from the tables, the supernode scheme outperforms the single element per node scheme. Figures 5.14 through 5.19 show the performance in the base test model graphically.
Figures 5.14 and 5.15 show the average elapsed insertion time for the 1M and 4M data sets, Figures 5.16 and 5.17 show the average elapsed search time for the 1M and 4M data sets, and Figures 5.18 and 5.19 show the average elapsed deletion time for the 1M and 4M data sets.

Hold Model. In the hold model, the test data are generated in the same way as in the base model, and the search operations are replaced by a sequence of intermixed insertions and deletions. The ratio of insertions to deletions is 50:50. As can be seen in Tables 5.18 and 5.19 and Figures 5.20 and 5.21, our supernode scheme outperforms the non-aligned single element per node scheme.

Stack Model. In the stack model, test data are inserted in increasing order and deleted in the reverse (decreasing) order. As can be seen in Tables 5.20 and 5.21 and Figures 5.22 and 5.23, our supernode scheme outperforms the non-aligned single element per node scheme.

Queue Model. In the queue model, test data are inserted in increasing order and deleted in the same order. This is repeated. The results are given in Tables 5.22 and 5.23 and Figures 5.24 and 5.25. Once again, our supernode scheme outperforms the non-aligned single element per node scheme.

Histograming Application. In the histograming application, the test data are generated by a random number generator. The number of unique data (keys) is one tenth of the number of input data generated. The value field of each element is used as a counter. After histograming is finished, reporting is done by storing all the data in the tree into an array in ascending order. For the tree with supernodes, the elements in each supernode are sorted using insertion sort while the tree is traversed. As can be seen in Tables 5.24 through 5.26 and Figures 5.26 through 5.28, our supernode scheme outperforms the non-aligned single element per node scheme in histograming.
Also, Figures 5.29, 5.30, and 5.31 show that reporting from the trees with supernodes is faster, even though the data in each supernode must be sorted during reporting. Our experiments show that the supernode scheme speeds up all of the varieties of binary search trees tested by us. The improvement in run time and memory utilization comes at the expense of increased code complexity.

Table 5.2: AVL tree performance in base model with non-aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.681 (1.000)  5.947 (1.000)  8.466 (1.000)
02  6.118 (1.255)  4.923 (1.208)  6.067 (1.395)
04  5.212 (1.474)  4.182 (1.422)  6.238 (1.357)
06  4.801 (1.600)  3.936 (1.511)  5.709 (1.483)
08  4.771 (1.610)  3.784 (1.572)  5.372 (1.576)
10  4.516 (1.701)  3.888 (1.530)  5.146 (1.645)
11  4.496 (1.708)  3.770 (1.577)  5.088 (1.664)
12  4.487 (1.712)  3.707 (1.604)  4.854 (1.744)
13  4.620 (1.663)  3.708 (1.604)  4.929 (1.718)
14  4.631 (1.659)  3.632 (1.637)  4.807 (1.761)
15  4.682 (1.641)  3.633 (1.637)  4.875 (1.737)
16  4.390 (1.750)  3.469 (1.714)  4.729 (1.790)
17  4.716 (1.629)  3.761 (1.581)  4.875 (1.737)
18  4.372 (1.757)  3.421 (1.738)  4.449 (1.903)
19  4.585 (1.675)  3.820 (1.557)  4.836 (1.751)
20  4.703 (1.633)  3.649 (1.630)  4.782 (1.770)
21  4.766 (1.612)  3.596 (1.654)  4.656 (1.818)
22  4.547 (1.689)  3.489 (1.704)  4.514 (1.875)
23  4.755 (1.615)  3.783 (1.572)  4.854 (1.744)
24  4.502 (1.706)  3.392 (1.753)  4.441 (1.906)
25  4.868 (1.578)  3.734 (1.593)  4.731 (1.789)
26  4.472 (1.718)  3.374 (1.763)  4.462 (1.897)
27  4.827 (1.591)  3.923 (1.516)  4.818 (1.757)
28  4.676 (1.643)  3.627 (1.640)  4.473 (1.893)
29  4.827 (1.591)  3.792 (1.568)  4.553 (1.859)
30  4.938 (1.555)  4.011 (1.483)  4.602 (1.840)
31  4.999 (1.537)  4.097 (1.452)  4.823 (1.755)
32  4.787 (1.605)  3.580 (1.661)  4.493 (1.884)
48  5.411 (1.420)  4.489 (1.325)  4.908 (1.725)
64  5.587 (1.375)  3.895 (1.527)  4.868 (1.739)

Table 5.3: AVL tree performance in base model with aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.681 (1.000)  5.947 (1.000)  8.466 (1.000)
02  4.638 (1.656)  4.360 (1.364)  4.278 (1.979)
04  4.271 (1.798)  3.812 (1.560)  3.990 (2.122)
06  4.013 (1.914)  3.484 (1.707)  3.592 (2.357)
08  4.077 (1.884)  3.507 (1.696)  3.720 (2.276)
10  4.068 (1.888)  3.475 (1.711)  3.795 (2.231)
12  4.074 (1.885)  3.314 (1.795)  3.771 (2.245)
14  4.137 (1.857)  3.258 (1.825)  3.706 (2.284)
16  4.044 (1.899)  3.275 (1.816)  3.830 (2.210)
18  4.161 (1.846)  3.353 (1.774)  3.857 (2.195)
20  4.097 (1.875)  3.168 (1.877)  3.743 (2.262)
22  4.314 (1.780)  3.468 (1.715)  3.940 (2.149)
24  4.270 (1.799)  3.235 (1.838)  3.804 (2.226)
26  4.264 (1.801)  3.147 (1.890)  3.755 (2.255)
28  4.524 (1.698)  3.572 (1.665)  4.041 (2.095)
30  4.245 (1.809)  3.205 (1.856)  3.793 (2.232)
32  4.698 (1.635)  3.735 (1.592)  4.184 (2.023)
48  4.989 (1.540)  3.768 (1.578)  4.298 (1.970)
64  5.656 (1.358)  4.169 (1.426)  4.630 (1.829)

Table 5.4: AVL tree performance in base model with non-aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  40.572 (1.000)  30.354 (1.000)  41.868 (1.000)
02  32.889 (1.234)  25.747 (1.179)  31.186 (1.343)
04  29.404 (1.380)  23.784 (1.276)  33.390 (1.254)
06  28.240 (1.437)  22.297 (1.361)  30.490 (1.373)
08  27.358 (1.483)  21.819 (1.391)  29.242 (1.432)
10  26.986 (1.503)  21.610 (1.405)  28.107 (1.490)
11  27.346 (1.484)  22.269 (1.363)  29.115 (1.438)
12  26.638 (1.523)  21.331 (1.423)  27.813 (1.505)
13  27.056 (1.500)  22.317 (1.360)  27.533 (1.521)
14  26.513 (1.530)  21.286 (1.426)  27.345 (1.531)
15  27.167 (1.493)  21.161 (1.434)  27.880 (1.502)
16  26.143 (1.552)  20.230 (1.500)  26.684 (1.569)
17  27.038 (1.501)  21.657 (1.402)  27.844 (1.504)
18  25.863 (1.569)  19.647 (1.545)  26.309 (1.591)
19  28.061 (1.446)  22.557 (1.346)  28.528 (1.468)
20  27.170 (1.493)  21.803 (1.392)  27.665 (1.513)
21  27.681 (1.466)  21.303 (1.425)  27.715 (1.511)
22  26.172 (1.550)  19.966 (1.520)  26.011 (1.610)
23  27.714 (1.464)  22.320 (1.360)  28.163 (1.487)
24  26.204 (1.548)  19.525 (1.555)  26.050 (1.607)
25  27.356 (1.483)  21.445 (1.415)  26.863 (1.559)
26  26.347 (1.540)  19.528 (1.554)  25.626 (1.634)
27  28.034 (1.447)  22.745 (1.335)  27.823 (1.505)
28  27.480 (1.476)  21.667 (1.401)  27.292 (1.534)
29  27.088 (1.498)  21.504 (1.412)  26.114 (1.603)
30  28.646 (1.416)  22.370 (1.357)  27.677 (1.513)
31  28.772 (1.410)  22.907 (1.325)  28.003 (1.495)
32  27.343 (1.484)  20.834 (1.457)  26.275 (1.593)
48  31.095 (1.305)  24.230 (1.253)  28.897 (1.449)
64  31.435 (1.291)  22.647 (1.340)  27.672 (1.513)

Table 5.5: AVL tree performance in base model with aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  40.572 (1.000)  30.354 (1.000)  41.868 (1.000)
02  26.629 (1.524)  24.019 (1.264)  24.476 (1.711)
04  25.528 (1.589)  21.324 (1.423)  23.032 (1.818)
06  23.487 (1.727)  20.051 (1.514)  21.018 (1.992)
08  24.555 (1.652)  20.471 (1.483)  22.372 (1.871)
10  23.761 (1.708)  19.981 (1.519)  22.170 (1.888)
12  23.710 (1.711)  19.319 (1.571)  21.962 (1.906)
14  23.597 (1.719)  19.285 (1.574)  21.487 (1.949)
16  23.760 (1.708)  18.880 (1.608)  22.192 (1.887)
18  24.470 (1.658)  20.096 (1.510)  23.160 (1.808)
20  24.139 (1.681)  18.840 (1.611)  22.341 (1.874)
22  24.822 (1.635)  20.148 (1.507)  22.890 (1.829)
24  24.833 (1.634)  19.762 (1.536)  23.159 (1.808)
26  24.220 (1.675)  18.464 (1.644)  22.251 (1.882)
28  26.092 (1.555)  20.743 (1.463)  24.191 (1.731)
30  24.833 (1.634)  18.979 (1.599)  22.399 (1.869)
32  26.759 (1.516)  21.625 (1.404)  24.622 (1.700)
48  27.538 (1.473)  20.159 (1.506)  24.386 (1.717)
64  30.809 (1.317)  22.801 (1.331)  26.680 (1.569)

Table 5.6: Red-black tree performance in base model with non-aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.891 (1.000)  6.056 (1.000)  7.877 (1.000)
STL  6.699 (1.178)  6.656 (0.910)  9.486 (0.830)
02  6.050 (1.304)  4.993 (1.213)  6.195 (1.272)
04  5.393 (1.463)  4.232 (1.431)  6.378 (1.235)
06  5.062 (1.559)  3.901 (1.552)  5.697 (1.383)
08  4.839 (1.631)  3.666 (1.652)  5.340 (1.475)
10  4.676 (1.688)  3.811 (1.589)  5.039 (1.563)
11  4.750 (1.661)  3.880 (1.561)  4.978 (1.582)
12  4.629 (1.705)  3.651 (1.659)  4.830 (1.631)
13  4.597 (1.717)  3.817 (1.587)  4.820 (1.634)
14  4.573 (1.726)  3.665 (1.652)  4.908 (1.605)
15  4.743 (1.664)  3.781 (1.602)  4.798 (1.642)
16  4.658 (1.694)  3.547 (1.707)  4.530 (1.739)
17  4.763 (1.657)  3.855 (1.571)  4.853 (1.623)
18  4.535 (1.740)  3.538 (1.712)  4.526 (1.740)
19  4.641 (1.700)  3.859 (1.569)  4.875 (1.616)
20  4.858 (1.624)  3.714 (1.631)  4.724 (1.667)
21  4.676 (1.688)  3.710 (1.632)  4.619 (1.705)
22  4.700 (1.679)  3.417 (1.772)  4.436 (1.776)
23  4.804 (1.643)  3.970 (1.525)  4.633 (1.700)
24  4.621 (1.708)  3.350 (1.808)  4.531 (1.738)
25  4.781 (1.650)  3.761 (1.610)  4.728 (1.666)
26  4.704 (1.678)  3.446 (1.757)  4.417 (1.783)
27  4.904 (1.609)  3.887 (1.558)  4.711 (1.672)
28  4.710 (1.675)  3.722 (1.627)  4.707 (1.673)
29  4.914 (1.606)  3.737 (1.621)  4.603 (1.711)
30  4.922 (1.603)  3.935 (1.539)  4.646 (1.695)
31  4.995 (1.580)  3.909 (1.549)  4.776 (1.649)
32  4.898 (1.611)  3.571 (1.696)  4.549 (1.732)
48  5.481 (1.440)  4.431 (1.367)  4.997 (1.576)
64  5.730 (1.377)  3.950 (1.533)  4.771 (1.651)

Table 5.7: Red-black tree performance in base model with aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.891 (1.000)  6.056 (1.000)  7.877 (1.000)
02  4.666 (1.691)  4.287 (1.413)  4.691 (1.679)
04  4.470 (1.765)  3.821 (1.585)  4.174 (1.887)
06  4.132 (1.910)  3.786 (1.600)  3.828 (2.058)
08  4.116 (1.917)  3.502 (1.729)  3.971 (1.984)
10  4.118 (1.916)  3.514 (1.723)  3.980 (1.979)
12  4.075 (1.936)  3.428 (1.767)  3.859 (2.041)
14  4.034 (1.956)  3.450 (1.755)  3.774 (2.087)
16  4.135 (1.908)  3.400 (1.781)  3.919 (2.010)
18  4.213 (1.873)  3.560 (1.701)  3.954 (1.992)
20  4.164 (1.895)  3.427 (1.767)  3.964 (1.987)
22  4.266 (1.850)  3.510 (1.725)  3.892 (2.024)
24  4.247 (1.858)  3.558 (1.702)  3.942 (1.998)
26  4.311 (1.830)  3.208 (1.888)  3.940 (1.999)
28  4.493 (1.756)  3.689 (1.642)  4.040 (1.950)
30  4.374 (1.804)  3.423 (1.769)  3.926 (2.006)
32  4.656 (1.695)  3.816 (1.587)  4.220 (1.867)
48  4.873 (1.619)  3.829 (1.582)  4.338 (1.816)
64  5.661 (1.394)  4.310 (1.405)  4.879 (1.614)

Table 5.8: Red-black tree performance in base model with non-aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  41.429 (1.000)  31.334 (1.000)  39.312 (1.000)
STL  36.647 (1.130)  34.492 (0.908)  47.787 (0.823)
02  34.015 (1.218)  26.753 (1.171)  32.611 (1.205)
04  30.372 (1.364)  23.613 (1.327)  33.471 (1.175)
06  29.042 (1.427)  22.827 (1.373)  31.051 (1.266)
08  28.154 (1.472)  22.158 (1.414)  29.449 (1.335)
10  27.615 (1.500)  21.738 (1.441)  28.494 (1.380)
11  28.062 (1.476)  22.619 (1.385)  29.051 (1.353)
12  27.187 (1.524)  21.666 (1.446)  27.853 (1.411)
13  28.016 (1.479)  22.855 (1.371)  28.131 (1.397)
14  27.091 (1.529)  21.663 (1.446)  27.809 (1.414)
15  27.507 (1.506)  21.648 (1.447)  27.710 (1.419)
16  26.748 (1.549)  20.349 (1.540)  26.604 (1.478)
17  27.799 (1.490)  21.352 (1.467)  27.524 (1.428)
18  26.484 (1.564)  20.055 (1.562)  26.279 (1.496)
19  28.720 (1.443)  22.994 (1.363)  28.550 (1.377)
20  27.810 (1.490)  21.875 (1.432)  27.546 (1.427)
21  28.074 (1.476)  21.649 (1.447)  27.637 (1.422)
22  26.822 (1.545)  20.180 (1.553)  26.020 (1.511)
23  28.222 (1.468)  22.850 (1.371)  27.933 (1.407)
24  26.618 (1.556)  19.843 (1.579)  25.866 (1.520)
25  28.258 (1.466)  22.066 (1.420)  27.365 (1.437)
26  26.579 (1.559)  19.991 (1.567)  26.034 (1.510)
27  28.827 (1.437)  23.089 (1.357)  28.511 (1.379)
28  28.248 (1.467)  22.204 (1.411)  27.359 (1.437)
29  28.007 (1.479)  21.875 (1.432)  26.258 (1.497)
30  29.218 (1.418)  22.957 (1.365)  27.724 (1.418)
31  29.205 (1.419)  23.201 (1.351)  28.293 (1.389)
32  27.999 (1.480)  21.071 (1.487)  26.643 (1.476)
48  31.733 (1.306)  24.572 (1.275)  28.986 (1.356)
64  31.854 (1.301)  22.928 (1.367)  27.921 (1.408)

Table 5.9: Red-black tree performance in base model with aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  41.429 (1.000)  31.334 (1.000)  39.312 (1.000)
02  28.117 (1.473)  24.647 (1.271)  27.079 (1.452)
04  26.401 (1.569)  22.532 (1.391)  24.958 (1.575)
06  24.816 (1.669)  20.829 (1.504)  22.482 (1.749)
08  25.167 (1.646)  20.692 (1.514)  23.336 (1.685)
10  24.701 (1.677)  20.287 (1.545)  23.569 (1.668)
12  24.359 (1.701)  19.582 (1.600)  23.122 (1.700)
14  24.261 (1.708)  20.022 (1.565)  22.575 (1.741)
16  24.649 (1.681)  19.547 (1.603)  23.147 (1.698)
18  25.373 (1.633)  20.571 (1.523)  24.341 (1.615)
20  25.209 (1.643)  19.424 (1.613)  23.813 (1.651)
22  25.678 (1.613)  20.148 (1.555)  24.080 (1.633)
24  25.194 (1.644)  18.983 (1.651)  23.608 (1.665)
26  24 1 ; (1.661)  19.030 (1.647)  23.581 (1.667)
28  26.760 (1.548)  21.484 (1.458)  25.009 (1.572)
30  25.598 (1.618)  19.413 (1.614)  23.416 (1.679)
32  27.668 (1.497)  21.829 (1.435)  25.528 (1.540)
48  28.599 (1.449)  20.973 (1.494)  25.722 (1.528)
64  31.805 (1.303)  23.482 (1.334)  27.962 (1.406)

Table 5.10: Bottom-up splay tree performance in base model with non-aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  11.590 (1.000)  15.102 (1.000)  14.627 (1.000)
02  13.272 (0.873)  13.738 (1.099)  14.036 (1.042)
04  11.976 (0.968)  12.052 (1.253)  13.933 (1.050)
06  11.211 (1.034)  11.415 (1.323)  12.616 (1.159)
08  10.759 (1.077)  11.020 (1.370)  11.894 (1.230)
10  10.377 (1.117)  10.622 (1.422)  11.353 (1.288)
11  10.196 (1.137)  10.566 (1.429)  11.218 (1.304)
12  10.212 (1.135)  10.715 (1.409)  11.137 (1.313)
13  10.290 (1.126)  10.777 (1.401)  11.147 (1.312)
14  10.026 (1.156)  10.476 (1.442)  10.987 (1.331)
15  9.774 (1.186)  9.902 (1.525)  10.464 (1.398)
16  9.643 (1.202)  9.960 (1.516)  10.431 (1.402)
17  9.748 (1.189)  10.031 (1.506)  10.542 (1.387)
18  9.630 (1.204)  9.322 (1.620)  9.881 (1.480)
19  9.855 (1.176)  10.161 (1.486)  10.546 (1.387)
20  9.914 (1.169)  10.133 (1.490)  10.592 (1.381)
21  9.290 (1.248)  9.352 (1.615)  9.734 (1.503)
22  9.338 (1.241)  9.369 (1.612)  9.727 (1.504)
23  9.583 (1.209)  9.662 (1.563)  10.105 (1.448)
24  9.321 (1.243)  8.986 (1.681)  9.617 (1.521)
25  9.659 (1.200)  9.768 (1.546)  10.161 (1.440)
26  9.233 (1.255)  9.016 (1.675)  9.668 (1.513)
27  9.897 (1.171)  10.003 (1.510)  10.168 (1.439)
28  9.950 (1.165)  10.042 (1.504)  10.267 (1.425)
29  10.624 (1.091)  11.208 (1.347)  11.194 (1.307)
30  9.902 (1.170)  9.939 (1.519)  10.261 (1.425)
31  9.745 (1.189)  9.986 (1.512)  10.092 (1.449)
32  9.450 (1.226)  9.150 (1.650)  9.659 (1.514)
36  9.506 (1.219)  9.170 (1.647)  9.463 (1.546)
44  9.237 (1.255)  8.426 (1.792)  9.096 (1.608)
48  10.164 (1.140)  10.070 (1.500)  10.237 (1.429)
50  9.585 (1.209)  9.332 (1.618)  9.579 (1.527)
56  9.914 (1.169)  9.626 (1.569)  9.958 (1.469)
60  9.839 (1.178)  9.182 (1.645)  9.634 (1.518)
64  9.722 (1.192)  8.591 (1.758)  9.242 (1.583)
66  10.248 (1.131)  9.812 (1.539)  10.042 (1.457)

Table 5.11: Bottom-up splay tree performance in base model with aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  11.590 (1.000)  15.102 (1.000)  14.627 (1.000)
02  11.144 (1.040)  12.554 (1.203)  11.847 (1.235)
04  11.117 (1.043)  12.212 (1.237)  11.499 (1.272)
06  9.879 (1.173)  10.704 (1.411)  9.992 (1.464)
08  10.040 (1.154)  10.713 (1.410)  10.164 (1.439)
10  9.480 (1.223)  10.243 (1.474)  9.936 (1.472)
12  9.418 (1.231)  9.967 (1.515)  9.666 (1.513)
14  9.436 (1.228)  9.909 (1.524)  9.472 (1.544)
16  9.189 (1.261)  9.564 (1.579)  9.599 (1.524)
18  9.360 (1.238)  9.745 (1.550)  9.689 (1.510)
20  9.105 (1.273)  9.259 (1.631)  9.284 (1.576)
22  9.130 (1.269)  9.512 (1.588)  9.325 (1.569)
24  8.928 (1.298)  8.936 (1.690)  8.960 (1.632)
26  8.795 (1.318)  8.766 (1.723)  8.757 (1.670)
28  9.437 (1.228)  9.995 (1.511)  9.754 (1.500)
30  8.723 (1.329)  8.760 (1.724)  8.765 (1.669)
32  9.543 (1.215)  10.044 (1.504)  9.697 (1.508)
36  8.891 (1.304)  8.770 (1.722)  8.928 (1.638)
44  8.942 (1.296)  8.479 (1.781)  8.577 (1.705)
48  9.186 (1.262)  8.891 (1.699)  8.913 (1.641)
50  8.888 (1.304)  8.329 (1.813)  8.639 (1.693)
56  9.125 (1.270)  8.553 (1.766)  8.723 (1.677)
60  9.256 (1.252)  8.516 (1.773)  8.741 (1.673)
64  9.465 (1.225)  9.001 (1.678)  9.169 (1.595)
66  10.361 (1.119)  10.313 (1.464)  9.857 (1.484)

Table 5.12: Bottom-up splay tree performance in base model with non-aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  61.212 (1.000)  75.257 (1.000)  73.005 (1.000)
02  69.929 (0.875)  69.782 (1.078)  72.217 (1.011)
04  64.376 (0.951)  63.441 (1.186)  70.836 (1.031)
06  60.929 (1.005)  60.223 (1.250)  65.802 (1.109)
08  59.280 (1.033)  58.247 (1.292)  62.832 (1.162)
10  57.338 (1.068)  56.702 (1.327)  60.395 (1.209)
11  57.020 (1.074)  56.105 (1.341)  59.771 (1.221)
12  57.308 (1.068)  57.007 (1.320)  60.271 (1.211)
13  57.369 (1.067)  57.111 (1.318)  59.881 (1.219)
14  55.933 (1.094)  56.089 (1.342)  58.776 (1.242)
15  53.706 (1.140)  52.379 (1.437)  55.590 (1.313)
16  53.927 (1.135)  52.543 (1.432)  55.737 (1.310)
17  53.597 (1.142)  52.502 (1.433)  55.447 (1.317)
18  53.051 (1.154)  51.609 (1.458)  55.318 (1.320)
19  55.453 (1.104)  54.604 (1.378)  56.998 (1.281)
20  56.067 (1.092)  55.029 (1.368)  57.968 (1.259)
21  52.264 (1.171)  50.230 (1.498)  52.780 (1.383)
22  52.839 (1.158)  50.667 (1.485)  53.952 (1.353)
23  54.693 (1.119)  53.044 (1.419)  55.803 (1.308)
24  52.284 (1.171)  49.991 (1.505)  53.504 (1.364)
25  53.769 (1.138)  52.084 (1.445)  54.710 (1.334)
26  51.965 (1.178)  49.385 (1.524)  52.832 (1.382)
27  56.075 (1.092)  55.437 (1.358)  57.335 (1.273)
28  56.191 (1.089)  54.853 (1.372)  57.516 (1.269)
29  62.670 (0.977)  63.315 (1.189)  64.198 (1.137)
30  56.191 (1.089)  54.044 (1.393)  56.707 (1.287)
31  55.552 (1.102)  53.607 (1.404)  56.415 (1.294)
32  53.060 (1.154)  50.726 (1.484)  53.664 (1.360)
36  53.357 (1.147)  50.976 (1.476)  53.504 (1.364)
44  51.275 (1.194)  48.027 (1.567)  50.971 (1.432)
48  57.736 (1.060)  55.302 (1.361)  57.342 (1.273)
50  52.701 (1.161)  48.267 (1.559)  50.944 (1.433)
56  55.671 (1.100)  52.264 (1.440)  54.735 (1.334)
60  54.794 (1.117)  51.111 (1.472)  53.224 (1.372)
64  52.769 (1.160)  46.854 (1.606)  50.686 (1.440)
66  57.211 (1.070)  53.635 (1.403)  55.783 (1.309)

Table 5.13: Bottom-up splay tree performance in base model with aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  61.212 (1.000)  75.257 (1.000)  73.005 (1.000)
02  59.387 (1.031)  63.548 (1.184)  60.849 (1.200)
04  60.409 (1.013)  62.880 (1.197)  60.355 (1.210)
06  53.713 (1.140)  55.457 (1.357)  52.535 (1.390)
08  55.329 (1.106)  57.708 (1.304)  55.340 (1.319)
10  52.149 (1.174)  53.636 (1.403)  52.227 (1.398)
12  52.812 (1.159)  54.035 (1.393)  52.532 (1.390)
14  51.706 (1.184)  52.274 (1.440)  50.958 (1.433)
16  51.791 (1.182)  52.039 (1.446)  51.497 (1.418)
18  52.044 (1.176)  51.880 (1.451)  51.586 (1.415)
20  51.001 (1.200)  50.295 (1.496)  50.431 (1.448)
22  51.838 (1.181)  51.272 (1.468)  51.000 (1.431)
24  50.082 (1.222)  49.181 (1.530)  49.626 (1.471)
26  48.360 (1.266)  46.514 (1.618)  47.570 (1.535)
28  54.920 (1.115)  53.997 (1.394)  54.020 (1.351)
30  48.872 (1.252)  47.805 (1.574)  48.117 (1.517)
32  53.929 (1.135)  53.489 (1.407)  53.439 (1.366)
36  49.352 (1.240)  47.271 (1.592)  48.605 (1.502)
44  49.791 (1.229)  46.418 (1.621)  47.948 (1.523)
48  50.065 (1.223)  46.364 (1.623)  48.129 (1.517)
50  48.569 (1.260)  44.458 (1.693)  46.772 (1.561)
56  51.138 (1.197)  47.146 (1.596)  48.695 (1.499)
60  51.015 (1.200)  46.627 (1.614)  48.823 (1.495)
64  52.906 (1.157)  48.474 (1.553)  50.408 (1.448)
66  57.112 (1.072)  53.670 (1.402)  55.092 (1.325)

Table 5.14: Top-down splay tree performance in base model with non-aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.175 (1.000)  10.044 (1.000)  9.769 (1.000)
02  9.331 (0.769)  9.579 (1.049)  10.561 (0.925)
04  8.285 (0.866)  8.182 (1.228)  10.751 (0.909)
06  7.565 (0.948)  7.491 (1.341)  9.770 (1.000)
08  7.365 (0.974)  7.098 (1.415)  9.078 (1.076)
10  7.110 (1.009)  7.090 (1.417)  8.754 (1.116)
11  7.102 (1.010)  6.814 (1.474)  8.703 (1.122)
12  6.963 (1.030)  6.851 (1.466)  8.594 (1.137)
13  7.009 (1.024)  6.722 (1.494)  8.380 (1.166)
14  7.015 (1.023)  6.683 (1.503)  8.403 (1.163)
15  6.847 (1.048)  6.387 (1.573)  8.113 (1.204)
16  6.632 (1.082)  6.448 (1.558)  7.920 (1.233)
17  6.774 (1.059)  6.264 (1.603)  7.911 (1.235)
18  6.673 (1.075)  6.097 (1.647)  7.786 (1.255)
19  6.611 (1.085)  6.476 (1.551)  8.005 (1.220)
20  6.899 (1.040)  6.637 (1.513)  8.044 (1.214)
21  6.679 (1.074)  6.071 (1.654)  7.592 (1.287)
22  6.554 (1.095)  5.948 (1.689)  7.422 (1.316)
23  6.515 (1.101)  6.215 (1.616)  7.557 (1.293)
24  6.648 (1.079)  5.891 (1.705)  7.438 (1.313)
25  6.823 (1.052)  6.300 (1.594)  7.662 (1.275)
26  6.548 (1.096)  5.833 (1.722)  7.393 (1.321)
27  7.001 (1.025)  6.493 (1.547)  7.789 (1.254)
28  6.977 (1.028)  6.658 (1.509)  7.757 (1.259)
29  7.700 (0.932)  6.919 (1.452)  8.270 (1.181)
30  6.915 (1.038)  6.458 (1.555)  7.816 (1.250)
31  7.387 (0.971)  6.285 (1.598)  7.792 (1.254)
32  6.624 (1.083)  5.977 (1.680)  7.475 (1.307)
48  7.228 (0.993)  6.747 (1.489)  7.856 (1.244)
64  7.295 (0.984)  6.112 (1.643)  7.354 (1.328)

Table 5.15: Top-down splay tree performance in base model with aligned supernode for 1M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  7.175 (1.000)  10.044 (1.000)  9.769 (1.000)
Al  4.797 (1.496)  9.207 (1.091)  7.806 (1.251)
02  7.502 (0.956)  8.419 (1.193)  8.650 (1.129)
04  7.987 (0.898)  7.732 (1.299)  8.238 (1.186)
06  6.464 (1.110)  6.976 (1.440)  7.369 (1.326)
08  6.535 (1.098)  6.994 (1.436)  7.747 (1.261)
10  6.420 (1.118)  6.638 (1.513)  7.361 (1.327)
12  6.400 (1.121)  6.592 (1.524)  7.344 (1.330)
14  6.140 (1.169)  6.432 (1.562)  7.011 (1.393)
16  6.248 (1.148)  6.314 (1.591)  6.972 (1.401)
18  6.324 (1.135)  6.340 (1.584)  7.105 (1.375)
20  6.297 (1.139)  6.083 (1.651)  7.043 (1.387)
22  6.172 (1.163)  6.166 (1.629)  7.001 (1.395)
24  6.315 (1.136)  5.949 (1.688)  6.893 (1.417)
26  6.042 (1.188)  5.776 (1.739)  6.762 (1.445)
28  6.470 (1.109)  6.495 (1.546)  7.331 (1.333)
30  6.101 (1.176)  5.636 (1.782)  6.596 (1.481)
32  6.706 (1.070)  6.407 (1.568)  7.265 (1.345)
48  6.765 (1.061)  6.252 (1.607)  6.992 (1.397)
64  7.094 (1.011)  6.335 (1.585)  7.247 (1.348)

Table 5.16: Top-down splay tree performance in base model with non-aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  39.274 (1.000)  51.119 (1.000)  49.497 (1.000)
02  49.668 (0.791)  49.802 (1.026)  55.595 (0.890)
04  45.215 (0.869)  43.856 (1.166)  55.391 (0.894)
06  42.501 (0.924)  41.161 (1.242)  51.592 (0.959)
08  41.509 (0.946)  39.652 (1.289)  51.199 (0.967)
10  40.604 (0.967)  38.531 (1.327)  47.795 (1.036)
11  39.775 (0.987)  37.461 (1.365)  47.408 (1.044)
12  40.237 (0.976)  38.506 (1.328)  46.186 (1.072)
13  41.038 (0.957)  38.092 (1.342)  46.684 (1.060)
14  40.309 (0.974)  37.998 (1.345)  45.929 (1.078)
15  38.139 (1.030)  34.578 (1.478)  43.941 (1.126)
16  38.325 (1.025)  35.117 (1.456)  42.802 (1.156)
17  37.810 (1.039)  34.967 (1.462)  42.224 (1.172)
18  38.008 (1.033)  34.606 (1.477)  42.242 (1.172)
19  39.250 (1.001)  36.209 (1.412)  43.748 (1.131)
20  39.847 (0.986)  37.346 (1.369)  43.546 (1.137)
21  36.357 (1.080)  33.434 (1.529)  40.052 (1.236)
22  37.735 (1.041)  34.042 (1.502)  41.495 (1.193)
23  38.391 (1.023)  35.529 (1.439)  41.990 (1.179)
24  37.195 (1.056)  33.596 (1.522)  40.780 (1.214)
25  38.216 (1.028)  34.856 (1.467)  41.695 (1.187)
26  37.120 (1.058)  33.330 (1.534)  40.305 (1.228)
27  39.813 (0.986)  36.598 (1.397)  42.936 (1.153)
28  40.778 (0.963)  37.401 (1.367)  44.170 (1.121)
29  44.139 (0.890)  41.571 (1.230)  47.893 (1.033)
30  40.113 (0.979)  36.712 (1.392)  43.370 (1.141)
31  39.623 (0.991)  36.206 (1.412)  42.260 (1.171)
32  38.504 (1.020)  34.340 (1.489)  42.102 (1.176)
36  39.116 (1.004)  34.420 (1.485)  41.006 (1.207)
44  38.457 (1.021)  33.428 (1.529)  39.767 (1.245)
48  42.677 (0.920)  38.096 (1.342)  44.594 (1.110)
50  40.168 (0.978)  34.318 (1.490)  41.341 (1.197)
56  42.631 (0.921)  37.766 (1.354)  43.933 (1.127)
60  41.812 (0.939)  36.315 (1.408)  42.954 (1.152)
64  40.910 (0.960)  33.239 (1.538)  40.843 (1.212)
66  44.103 (0.891)  38.506 (1.328)  44.802 (1.105)

Table 5.17: Top-down splay tree performance in base model with aligned supernode for 4M data (sec)

#elem/node  Insert (Speedup)  Search (Speedup)  Delete (Speedup)
01  39.274 (1.000)  51.119 (1.000)  49.497 (1.000)
Al  28.951 (1.357)  47.407 (1.078)  41.555 (1.191)
02  41.804 (0.939)  44.375 (1.152)  45.105 (1.097)
04  40.587 (0.968)  42.851 (1.193)  44.553 (1.111)
06  36.234 (1.084)  37.627 (1.359)  39.442 (1.255)
08  38.016 (1.033)  38.683 (1.321)  41.425 (1.195)
10  36.186 (1.085)  36.106 (1.416)  39.198 (1.263)
12  36.639 (1.072)  36.448 (1.403)  40.142 (1.233)
14  35.003 (1.122)  34.767 (1.470)  38.081 (1.300)
16  36.114 (1.088)  34.947 (1.463)  39.468 (1.254)
18  36.651 (1.072)  35.154 (1.454)  41.011 (1.207)
20  35.991 (1.091)  33.885 (1.509)  39.262 (1.261)
22  36.177 (1.086)  34.422 (1.485)  39.547 (1.252)
24  35.722 (1.099)  32.990 (1.550)  39.332 (1.258)
26  34.606 (1.135)  31.316 (1.632)  37.313 (1.327)
28  38.848 (1.011)  36.478 (1.401)  40.918 (1.210)
30  35.251 (1.114)  32.120 (1.592)  37.535 (1.319)
32  38.875 (1.010)  3', ;,. (1.405)  41.031 (1.206)
36  36.568 (1.074)  32.805 (1.558)  38.714 (1.279)
44  36...2, (1.065)  33.130 (1.543)  38.293 (1.293)
48  37.653 (1.043)  32.238 (1.586)  39.186 (1.263)
50  36.236 (1.084)  31.480 (1.624)  37.283 (1.328)
56  38.258 (1.027)  33.156 (1.542)  38.705 (1.279)
60  38.480 (1.021)  32.812 (1.558)  38.795 (1.276)
64  40.169 (0.978)  34.407 (1.486)  40.263 (1.229)
66  42.875 (0.916)  37.761 (1.354)  43.152 (1.147)

(Figures 5.14-5.18 plot average time in seconds against the number of elements per node for the AVL, RBB, SPB, and SPT trees.)

Figure 5.14: Average insertion time in base model with 1M data sets
Figure 5.15: Average insertion time in base model with 4M data sets
Figure 5.16: Average search time in base model with 1M data sets
Figure 5.17: Average search time in base model with 4M data sets
Figure 5.18: Average deletion time in base model with 1M data sets