OPTIMIZATION PROBLEMS IN TELECOMMUNICATIONS AND THE INTERNET

By

CARLOS A. S. OLIVEIRA

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA
2004

To my wife Janaina.

ACKNOWLEDGMENTS

The following people deserve my sincere acknowledgments:
* My advisor, Dr. Panos Pardalos;
* Dr. Mauricio Resende, from AT&T Research Labs, who was responsible for introducing me to this University;
* My colleagues in the graduate school of the Industrial and Systems Engineering Department;
* My family, and especially my parents;
* My wife.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
1 INTRODUCTION
2 A SURVEY OF COMBINATORIAL OPTIMIZATION PROBLEMS IN MULTICAST ROUTING
  2.1 Introduction
    2.1.1 Multicast Routing
    2.1.2 Basic Definitions
    2.1.3 Applications of Multicast Routing
    2.1.4 Chapter Organization
  2.2 Basic Problems in Multicast Routing
    2.2.1 Graph Theory Terminology
    2.2.2 Optimization Goals
    2.2.3 Basic Multicast Routing Algorithms
    2.2.4 General Techniques for Creation of Multicast Routes
    2.2.5 Shortest Path Problems with Delay Constraints
    2.2.6 Delay Constrained Minimum Spanning Tree Problem
    2.2.7 Center-Based Trees and the Topological Center Problem
  2.3 Steiner Tree Problems and Multicast Routing
    2.3.1 The Steiner Tree Problem on Graphs
    2.3.2 Steiner Tree Problems with Delay Constraints
    2.3.3 The Online Version of Multicast Routing
    2.3.4 Distributed Algorithms
    2.3.5 Integer Programming Formulation
    2.3.6 Minimizing Bandwidth Utilization
    2.3.7 The Degree-Constrained Steiner Problem
    2.3.8 Other Restrictions: Non-Symmetric Links and Degree Variation
    2.3.9 Comparison of Algorithms
  2.4 Other Problems in Multicast Routing
    2.4.1 The Multicast Packing Problem
    2.4.2 The Multicast Network Dimensioning Problem
    2.4.3 The Point-to-Point Connection Problem
  2.5 Concluding Remarks
3 STREAMING CACHE PLACEMENT PROBLEMS
  3.1 Introduction
    3.1.1 Multicast Networks
    3.1.2 Related Work
  3.2 Versions of Streaming Cache Placement Problems
    3.2.1 The Tree Cache Placement Problem
    3.2.2 The Flow Cache Placement Problem
  3.3 Complexity of the Cache Placement Problems
    3.3.1 Complexity of the TSCPP
    3.3.2 Complexity of the FSCPP
  3.4 Concluding Remarks
4 COMPLEXITY OF APPROXIMATION FOR STREAMING CACHE PLACEMENT PROBLEMS
  4.1 Introduction
  4.2 Nonapproximability
  4.3 Improved Hardness Result for FSCPP
  4.4 Concluding Remarks
5 ALGORITHMS FOR STREAMING CACHE PLACEMENT PROBLEMS
  5.1 Introduction
  5.2 Approximation Algorithms for SCPP
    5.2.1 A Simple Algorithm for TSCPP
    5.2.2 A Flow-Based Algorithm for FSCPP
  5.3 Construction Algorithms for the SCPP
    5.3.1 Connecting Destinations
    5.3.2 Adding Caches to a Solution
  5.4 Empirical Evaluation
  5.5 Concluding Remarks
6 HEURISTIC ALGORITHMS FOR ROUTING ON MULTICAST NETWORKS
  6.1 Introduction
    6.1.1 The Multicast Routing Problem
    6.1.2 Contributions
  6.2 An Algorithm for the MRP
  6.3 Metaheuristic Description
    6.3.1 Improving the Construction Phase
    6.3.2 Improvement Phase
    6.3.3 Reverse Path Relinking and Postprocessing
    6.3.4 Efficient Implementation of Path Relinking
  6.4 Computational Experiments
  6.5 Concluding Remarks
7 A NEW HEURISTIC FOR THE MINIMUM CONNECTED DOMINATING SET PROBLEM ON AD HOC WIRELESS NETWORKS
  7.1 Introduction
  7.2 Algorithm for the MCDS Problem
  7.3 A Distributed Implementation
  7.4 Numerical Experiments
  7.5 Concluding Remarks
8 CONCLUSION
REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

2-1 Comparison among algorithms for the problem of multicast routing with delay constraints. k is the number of destinations. ** This algorithm is partially distributed.
2-2 Comparison among algorithms for the problem of multicast routing with delay constraints. k is the number of destinations, T_SP is the time to find a shortest path in the graph. ** In this case amortized time is the important issue, but was not analyzed in the original paper.
5-1 Computational results for different variations of Algorithm 7 and Algorithm 8.
5-2 Comparison of computational time for Algorithm 7 and Algorithm 8. All values are in milliseconds.
6-1 Summary of results for the proposed metaheuristic for the MRP. Column 9 (*) reports only the time spent in the construction phase.
7-1 Results of computational experiments for instances with 100 vertices, randomly distributed in square planar areas of size 100 x 100, 120 x 120, 140 x 140, and 160 x 160. The average solutions are taken over 30 iterations.
7-2 Results of computational experiments for instances with 150 vertices, randomly distributed in square planar areas of size 120 x 120, 140 x 140, 160 x 160, and 180 x 180. The average solutions are taken over 30 iterations.

LIST OF FIGURES

2-1 Conceptual organization of a multicast group.
3-1 Simple example for the cache placement problem.
3-2 Simple example for the Tree Cache Placement Problem.
3-3 Simple example for the Flow Cache Placement Problem.
3-4 Small graph G created in the reduction given by Theorem 2. In this example, the SAT formula is (x1 ∨ x2 ∨ ¬x3) ∧ (¬x2 ∨ x3 ∨ ¬x4) ∧ (¬x1 ∨ x3 ∨ x4).
3-5 Part of the transformation used by the FSCPP.
4-1 Example for transformation of Theorem 11.
5-1 Sample execution for Algorithm 7. In this graph, all capacities are equal to 1. Destination d2 is being added to the partial solution, and node 1 must be added to R.
5-2 Sample execution for Algorithm 8, on a graph with unitary capacities. Nodes 1 and 2 are infeasible, and therefore are candidates to be included in R.
5-3 Comparison of computational time for different versions of Algorithm 7 and Algorithm 8. Labels 'C3' to 'C10' refer to columns 3 to 10 of Table 5-2.
6-1 Comparison between the average solution costs found by the KMB heuristic and our algorithm.
7-1 Approximating the virtual backbone with a connected dominating set in a unit-disk graph.
7-2 Actions for a vertex v in the distributed algorithm.

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

OPTIMIZATION PROBLEMS IN TELECOMMUNICATIONS AND THE INTERNET

By

Carlos A.S. Oliveira

August 2004

Chair: Panos M. Pardalos
Department: Industrial and Systems Engineering

Optimization problems occur in diverse areas of telecommunications. Some problems have become classical examples of the application of operations research techniques, such as the theory of network flows.
Other opportunities for applications in telecommunications arise frequently, given the dynamic nature of the field. Every new technology presents challenges that can be addressed using appropriate optimization techniques. In this dissertation, problems occurring in telecommunications are discussed, with emphasis on applications in the Internet. First, a study of problems occurring in multicast routing is presented. Here, the objective is to allow the deployment of multicast services with minimum cost. A description of the problem is provided, and variations that occur frequently in some of these applications are discussed. Complexity results are presented for multicast problems, showing that it is NP-hard to approximate these problems effectively. Despite this, we also describe algorithms that give some guarantee of approximation. A second problem in multicast networks studied in this dissertation is the multicast routing problem. Its objective is to find a minimum cost route linking a source to destinations, subject to additional quality of service constraints. A heuristic based on a Steiner tree algorithm is proposed, and used to construct solutions for the routing problem. This construction heuristic is also used as the basis to develop a restarting method, based on the greedy randomized adaptive search procedure (GRASP). The last part of the dissertation is concerned with problems in wireless networks. Such networks have numerous applications due to their highly dynamic nature. Algorithms to compute near-optimal solutions for the minimum backbone problem are proposed, which perform in practice much better than other methods. A distributed version of the algorithm is also provided.

CHAPTER 1
INTRODUCTION

Computer networks are a relatively new communication medium that has quickly become essential for most organizations. In this dissertation, we present some optimization problems occurring in computer and telecommunications networks.
Performing optimization on such networks is important for several reasons, including cost and speed of communication. We concentrate on two types of networks that have recently received much attention. The first type is multicast systems, which are used to reliably share information with a (possibly large) group of clients. The second type of networks considered in this dissertation is wireless ad hoc systems, an important type of network with several applications. We are mostly concerned with computational issues arising in the optimization of problems occurring on telecommunications networks. Thus, although we present mathematical programming aspects for each of these problems, the main objective will be to derive efficient algorithms, with or without guarantee of approximation. The topics discussed in the dissertation are divided as follows. In Chapter 2, a survey of research in the area of multicast systems is presented. The review is used as a starting point for the topics related to multicast networks that will be discussed later in the dissertation. Chapter 3 introduces the problem that will be studied in the following chapters, the streaming cache placement problem (SCPP). Variants of this basic problem are introduced, and all variants are proved to be NP-hard. Chapter 4 is dedicated to the study of approximability properties of the different versions of the SCPP. It is shown that in general the SCPP cannot have a polynomial time approximation scheme (PTAS). This demonstrates that the SCPP is a very hard problem, not only to solve exactly but also to approximate. We also show that the directed flow version cannot be approximated within a factor of log log |D|, where D is the set of destinations. In Chapter 5, algorithms for different versions of the SCPP are proposed. Both approximation algorithms and heuristics are discussed. Initially, some algorithms with performance guarantees are proposed.
However, due to the complexity results, these algorithms in general do not give good results for problems found in practice. Heuristic algorithms are then studied, and two main strategies for construction heuristics are discussed. Results of computational experiments with these methods are presented and compared. Another problem in multicast networks is discussed in Chapter 6. The routing problem in multicast networks asks for an optimal route, i.e., a minimum cost tree connecting the source node to destinations. The routing problem for multicast networks is known to be NP-hard. We propose new heuristics, and use these heuristics to implement a greedy randomized adaptive search procedure (GRASP). In the last part of the dissertation, wireless network systems are discussed. In particular, ad hoc systems (also known as MANETs) are studied. Chapter 7 is dedicated to the problem of determining a minimum backbone for such ad hoc networks. A new algorithm for this problem is given, and the advantages of this algorithm are addressed. A distributed version of the algorithm is also proposed. Finally, in Chapter 8 general conclusions are given about the work presented in the dissertation. Future work in the area is presented, and some concluding remarks about this area of research are given.

CHAPTER 2
A SURVEY OF COMBINATORIAL OPTIMIZATION PROBLEMS IN MULTICAST ROUTING

In multicast routing, the main objective is to send data from one or more sources to multiple destinations, while at the same time minimizing the usage of resources. Examples of resources which can be minimized include bandwidth, time, and connection costs. In this chapter we survey applications of combinatorial optimization to multicast routing. We discuss the most important problems considered in this area, as well as their models. Algorithms for each of the main problems are also presented.
2.1 Introduction

A basic application of computer networks consists of sending specific data to a select, usually large, number of clients. Common examples of such applications are multimedia distribution systems (Pasquale et al., 1998), videoconferencing (Eriksson, 1994), software delivery (Han and Shahmehri, 2000), groupware (Chockler et al., 1996), and game communities (Park and Park, 1997). Multicast is a technique used to facilitate this type of information exchange, by routing data from one or more sources to a potentially large number of destinations (Deering and Cheriton, 1990). This is done in such a way that overall utilization of resources in the underlying network is minimized in some sense. To handle multicast routing, many multicast technologies have been proposed in the last decade. Examples are the MBONE (Eriksson, 1994), MOSPF (Moy, 1994a), PIM (Deering et al., 1996), core-based trees (Ballardie et al., 1993), and shared tree technologies (Chiang et al., 1998; Wei and Estrin, 1994). Each proposed technology requires the solution of (usually hard) combinatorial problems. With the proliferation of services that require multicast delivery, the associated routing methods became an important source of problems for the combinatorial optimization community. Many objectives can be devised when designing protocols, routing strategies, and overall networks that can be optimized using techniques from combinatorial optimization. In this chapter we discuss some of the combinatorial optimization problems arising in the area of multicast routing. These are very interesting in their own right, but are also closely related to other well-known problems. Thus, the cross-fertilization of ideas from combinatorial optimization and multicast networks can be beneficial to the development of improved algorithms and general techniques.
Our objective is to review some of the more interesting problems and give examples and references for the existing algorithms. We also discuss some problems recently appearing in the area of multicast networks and how they are modeled and solved in the literature.

2.1.1 Multicast Routing

The idea of sending information to a large number of users is common in systems that employ broadcasting. Radio and TV are two standard examples of broadcasting systems which are widely used. On the other hand, networks were initially designed to be used as a communication means among a relatively small number of participants. The TCP/IP protocol stack, which is the main technology underlying the Internet, uses routing protocols for delivery of packets to single destinations. Most of these protocols are based on the calculation of shortest paths. A good example of a widely used routing protocol is OSPF (Open Shortest Path First) (Moy, 1994b; Thomas II, 1998), which is used to compute routing tables for routers inside a subnetwork.

Figure 2-1: Conceptual organization of a multicast group.

In OSPF, each router in the network is responsible for maintaining a table of paths for reachable destinations. This table can be created using Dijkstra's algorithm (Dijkstra, 1959) to calculate shortest paths from the current node to all other destinations in the current subnetwork. This process can be done deterministically in polynomial time, using at most O(n^3) iterations, where n is the number of nodes involved. However, with the Internet and the increased use of large networks, the need arose for services targeting larger audiences. This phenomenon became more important due to the development of new technologies such as virtual conferencing (Sabri and Prasada, 1985), video on demand, groupware (Ellis et al., 1991), etc. This series of developments gave momentum to the creation of multicast routing protocols.
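The shortest-path computation described above can be illustrated with a minimal sketch of Dijkstra's algorithm; the graph, node names, and adjacency-list representation here are our own toy example, not taken from the dissertation.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths on a weighted graph.

    graph: dict mapping node -> list of (neighbor, cost) pairs.
    Returns a dict mapping each reachable node to its shortest-path cost,
    i.e., the kind of table an OSPF router maintains for its subnetwork.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

# Toy network: a router computing its table of shortest-path costs.
net = {
    "a": [("b", 1), ("c", 4)],
    "b": [("a", 1), ("c", 2), ("d", 5)],
    "c": [("a", 4), ("b", 2), ("d", 1)],
    "d": [("b", 5), ("c", 1)],
}
print(dijkstra(net, "a"))  # {'a': 0, 'b': 1, 'c': 3, 'd': 4}
```

With a binary heap this runs in O((n + m) log n) time for n nodes and m edges, comfortably within the polynomial bound mentioned above.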
In multicast routing, data can be sent from one or more source nodes to a set of destination nodes (see Figure 2-1). It is required that all destinations be satisfied by a stream of data. Dalal and Metcalfe (1978) were the first to give nontrivial algorithms for the routing of packets in a multicast network. From then on, many proposals have been made to create technology supporting multicast routing, such as those by Deering (1988), Eriksson (1994), and Wall (1980). Some examples of multicast protocols are PIM (Protocol Independent Multicast) (Deering et al., 1996), DVMRP (Distance-Vector Multicast Routing Protocol) (Deering and Cheriton, 1990; Waitzman et al., 1988), MOSPF (Multicast OSPF) (Moy, 1994a), and CBT (Core Based Trees) (Ballardie et al., 1993). See Levine and Garcia-Luna-Aceves (1998) for a detailed comparison of diverse technologies.

2.1.2 Basic Definitions

A multicast group is a set of nodes in a network that need to share the same piece of information. A multicast group can have one or more source nodes, and more than one destination. Note that even when there is more than one source, the same information is shared among all nodes in the group. A multicast group can be static or dynamic. Static groups cannot be changed after their creation. Starting with Wall (1980), the problem of routing information in static groups is frequently modeled as a type of Steiner tree problem. On the other hand, dynamic groups can have members added or removed at any time (Waxman, 1988). Clearly, the task of maintaining routes for dynamic groups is complicated by the fact that it is not known in advance which nodes will be added or removed. Multicast groups can also be classified according to the relative number of users, as described by Deering and Cheriton (1990). In sparse groups, the number of participants is small compared to the number of nodes in the network.
In the other situation, in which most of the nodes in the network are engaged in multicast communication, the groups involved are called pervasive groups (Waitzman et al., 1988). For more information about multicast networks in general, one can consult the surveys by A.J. Frank (1985), and Paul and Raghavan (2002). A good introduction to multicasting in IP networks is given in the Internet Draft by Semeria and Maufer (1996) (available online). Other interesting related literature includes Du and Pardalos (1993a); Pardalos and Du (1998); Wan et al. (1998); Pardalos et al. (2000, 1993); Pardalos and Khoury (1996, 1995).

2.1.3 Applications of Multicast Routing

Applications of multicast routing span a wide spectrum, from business to government and entertainment. One of the first applications of multicast routing was in audio broadcasting. In fact, the first real use of the Internet MBONE (Multimedia Backbone, created in 1992) was to broadcast audio from IETF (Internet Engineering Task Force) meetings over the Internet (Eriksson, 1994). Another important application of multicast routing is video conferencing (Yum et al., 1995), since this is a resource-intensive kind of application, where a group of users is targeted. It has requirements, such as real-time image exchange and interaction between geographically separated users, also found in other types of multimedia applications. Being closely related to the area of remote collaboration, video conferencing has received great attention during the last decade. Among others, Pasquale et al. (1998) give a detailed discussion of the utilization of multicast routing to deliver multimedia content over large networks, such as the Internet. Jia et al. (1997) and Kompella et al. (1996) also proposed algorithms for multicast routing applied to real-time video distribution and videoconferencing problems.
Many other interesting uses of multicast routing have appeared during the last decade, with examples such as video on demand, software distribution, Internet radio and TV stations, etc.

2.1.4 Chapter Organization

The remainder of this chapter is organized as follows. In Section 2.2 we give a common ground for the description of optimization problems in multicast routing. We start by giving the terminology used throughout the chapter, mainly from graph theory. Then, we discuss some of the common problems appearing in this area. In Section 2.3 we discuss delay constrained Steiner tree problems. These are the most studied problems in multicast routing from the optimization point of view, being used in diverse algorithms. Thus, we discuss many of the versions of this problem considered in the literature. In Section 2.4 we review some other optimization problems related to multicast routing. They are the multicast packing problem, the multicast network dimensioning problem, and the point-to-point connection problem. Finally, in Section 2.5 we give some concluding remarks about the subject.

2.2 Basic Problems in Multicast Routing

In this section we discuss the basic problems occurring in multicast networks. We start with an introduction to the terminology used. Then, we discuss some basic problems which are addressed in the multicast routing literature.

2.2.1 Graph Theory Terminology

Graphs in this chapter are considered to be undirected and without loops. In our applications, the nodes in a graph represent hosts, and edges represent network links. We use N(v) to denote the set of neighbors of a node v ∈ V. Also, we denote by δ(v) the number of such neighbors. With each edge (i, j) ∈ E we can associate functions representing characteristics of the network links. The most widely used functions are capacity c(i, j), cost w(i, j), and delay d(i, j), for i, j ∈ V. For each edge (i, j) ∈ E, the associated capacity c(i, j) represents the maximum amount of data that can be sent between nodes i and j. In multicasting applications this is generally given by an integer multiple of some unit of transmission capacity, so we can say that c(i, j) ∈ Z+, for all (i, j) ∈ E. The function w(i, j) is used to model any costs incurred by the use of the network link between nodes i and j. These include leasing costs, maintenance costs, etc. Some applications, such as multimedia delivery, are sensitive to transmission delays and require that the total time between the sending and the arrival of a data packet be restricted to some particular maximum value (Ferrari and Verma, 1990). The delay function d(i, j) is used to model this kind of constraint. The delay d(i, j) represents the time needed to transmit information between nodes i and j. As a typical example, video-on-demand applications may have specific requirements concerning the transmission time. Each packet i can be marked with the maximum delay d_i that can be tolerated for its transmission. In this case the routers must consider only paths where the total delay is at most d_i. A path in a graph G is a sequence of nodes v_{i_1}, ..., v_{i_j}, where (v_{i_k}, v_{i_{k+1}}) ∈ E for all k ∈ {1, ..., j-1}. In a routing problem we want to find paths from a source s to a set D of destinations, satisfying some requirements. The cost w(P) of a path P is defined as the sum of the costs of all edges (v_{i_k}, v_{i_{k+1}}) in P. A path P between nodes u and v is called a minimum path if there is no path P' in G such that w(P') < w(P). The path delay d(P) is defined as the delay incurred when routing data between nodes v_1 and v_j through the path P = (v_1, ..., v_j). In other words, d(P) = Σ_{i=1}^{j-1} d(v_i, v_{i+1}). In this chapter, we use the words edge and link interchangeably to refer to the same object. The word link is used when it is more appropriate in the application context. For more information on graph-theoretical aspects of multicast networks, see Berry (1990).
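The additive path metrics defined above can be computed directly from the edge functions; the following sketch uses a hypothetical four-node path and made-up edge values purely for illustration.

```python
def path_cost(path, w):
    """Cost w(P): sum of edge costs w(i, j) over consecutive nodes of P."""
    return sum(w[(path[k], path[k + 1])] for k in range(len(path) - 1))

def path_delay(path, d):
    """Delay d(P) = sum over i of d(v_i, v_{i+1}), the additive delay metric."""
    return sum(d[(path[k], path[k + 1])] for k in range(len(path) - 1))

# Hypothetical edge functions on a path s -> a -> b -> t.
w = {("s", "a"): 2, ("a", "b"): 3, ("b", "t"): 1}
d = {("s", "a"): 10, ("a", "b"): 5, ("b", "t"): 20}
P = ["s", "a", "b", "t"]
print(path_cost(P, w), path_delay(P, d))  # 6 35
```

A delay constraint with threshold d_i = 30 would thus reject this path (delay 35) even though its cost is low, which is exactly the tension the delay-constrained problems below formalize.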
2.2.2 Optimization Goals

Different objectives can be considered when optimizing a multicast routing problem, such as, for example, path delay, total cost of the tree, and maximum congestion. We discuss some of these objectives. Quality of service is an important consideration in network services, and it is mostly related to the time needed for data delivery. Depending on the quality of service requirements of an application, one possible goal is to minimize path delay. The best example of an application that needs this quality of service is videoconferencing. The path delay is an additive delay function, corresponding to the sum of delays incurred from source to destination, for all destinations. It is interesting to note that this problem is solvable in polynomial time, since the paths from source to destination are considered separately. Shortest path algorithms, such as Dijkstra's algorithm (Dijkstra, 1959), can be used to achieve this objective. A second objective is to minimize the total cost of the routing tree. This is again an additive metric, where we look for the minimum sum of costs for edges used in the routing tree. In this case, however, the optimization objective is considerably harder to achieve, since it can be shown to be equivalent to the minimum Steiner tree problem, a classical NP-hard problem (Garey and Johnson, 1979). Another example of an optimization goal is to minimize the maximum network congestion. The congestion on a link is defined as the difference between capacity and usage. The higher the congestion, the more difficult it is to handle failures in other links of the network. Also, higher congestion makes it harder to include new elements in an existing multicast group, and therefore is an undesirable situation in dynamic multicast. Thus, in a well designed network it is important to keep congestion at a minimum.
2.2.3 Basic Multicast Routing Algorithms

The most basic way of sending information to a multicast group is flooding. With this technique, a node sends packets through all its adjacent links. If a node v receives a packet p from node u for which it is not the destination, then v first checks if p was received before. If so, the packet does not need to be sent again. Otherwise, v simply resends the packet to all other adjacent nodes (excluding u). The formal statement of this strategy is shown in Algorithm 1. It is clear that after at most n such steps (where n is the number of nodes in the network), the packet must have reached all nodes, including the destinations. Thus, the algorithm is correct. The number of messages sent by each node is at most n. The number of messages received by v is at most n·δ(v).

Receive packet p from node u
if destination(p) = v then
    PacketReceived
else if packet was not previously processed then
    Send packet p to all nodes in N(v) \ {u}
end

Algorithm 1: Flooding algorithm for node v in a multicast network

This method of packet routing is simple, but very inefficient. The first reason is that it uses more bandwidth than required, since many nodes which are not on the path to the destination end up receiving the packet. Second, each node in the network must keep a list of all packets which it has sent, in order to avoid loops. This makes the use of flooding prohibitive for all but very small networks. Another problem, which is more difficult to solve, is how to guarantee that a packet will be delivered, since the network can become disconnected due to a link failure, for example. The reverse path-forwarding algorithm, proposed by Dalal and Metcalfe (1978), is a method used to reduce the network usage associated with the flooding technique.
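Algorithm 1 can be simulated centrally to see its message overhead; this is a sketch under our own assumptions (an unweighted adjacency-list network and a single packet), not code from the dissertation.

```python
from collections import deque

def flood(adj, source):
    """Simulate flooding of one packet: every node forwards it once to
    all neighbors except the one it came from; duplicates are dropped.
    Returns (set of nodes reached, total number of messages sent)."""
    seen = {source}                     # nodes that already processed the packet
    messages = len(adj[source])         # source sends on all its links
    queue = deque((v, source) for v in adj[source])
    while queue:
        node, came_from = queue.popleft()
        if node in seen:
            continue                    # already processed: drop, do not resend
        seen.add(node)
        for v in adj[node]:
            if v != came_from:          # resend to all other adjacent nodes
                queue.append((v, node))
                messages += 1
    return seen, messages

adj = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4], 4: [2, 3]}
reached, sent = flood(adj, 1)
print(sorted(reached), sent)
```

Even on this four-node network the packet is transmitted seven times, illustrating why flooding wastes bandwidth on anything but very small networks.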
The idea is that, for each node v and source node s in the network, v will determine, in a distributed way, the edge e = (u, v), for some u ∈ V, which is on the shortest path from s to v. This edge is called the parent link. The parent link can be determined in different ways, and a very simple method is: select e = (u, v) to be the parent link for source s if this was the first edge on which a packet from s was received. With this information, a node can selectively drop incoming packets, based on their source. If a packet p is received on a link which is not considered to be on the shortest path between the source node and the current node, then p is discarded. Otherwise, the node broadcasts p to all other adjacent links, just as in the flooding algorithm. The parent link can also be updated depending on the information received from other nodes. Other algorithms can be used to enhance this basic scheme, as discussed, e.g., by Semeria and Maufer (1996).

2.2.4 General Techniques for Creation of Multicast Routes

During the last decades a number of basic techniques were proposed for the construction of multicast routes. Diot et al. (1997) identified some of the main techniques used in the literature. They describe these techniques as being divided into source based routing, center based tree algorithms, and Steiner tree based algorithms. In source based routing, a routing tree rooted at the source node is created for each multicast group. This technique is used, for example, in the DVMRP and PIM protocols. Some implementations of source based routing make use of the reverse path-forwarding algorithm, discussed in the previous subsection (Dalal and Metcalfe, 1978). Sriram et al. (1998) observed that this technique does a poor job of routing small multicast groups, since it tries to optimize the routing tree without considering other potential users not in the current group.
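The reverse path-forwarding decision rule can be sketched as a single check against the parent link; the table layout and names below are our own illustration, not a protocol implementation.

```python
def rpf_forward(node, source, incoming_link, parent_link, adj):
    """Reverse path-forwarding decision for one packet.

    parent_link[(node, source)] is the neighbor believed to be on the
    shortest path from source to node. A packet arriving on any other
    link is dropped; otherwise it is re-broadcast on all other links.
    Returns the list of links on which the packet is forwarded.
    """
    if parent_link.get((node, source)) != incoming_link:
        return []                      # not on the reverse shortest path: drop
    return [v for v in adj[node] if v != incoming_link]

# Router r with three links; its parent link toward source s is s itself.
adj = {"r": ["s", "x", "y"]}
parents = {("r", "s"): "s"}
print(rpf_forward("r", "s", "s", parents, adj))  # ['x', 'y']
print(rpf_forward("r", "s", "x", parents, adj))  # []
```

The check replaces the per-packet duplicate list that plain flooding needs: a node only ever accepts a given source's traffic from one link.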
Among the source based routing algorithms, the Steiner tree based methods focus on minimization of tree cost. This is probably the most used approach, since it can leverage the large number of existing algorithms for the Steiner tree problem. There are many examples of this technique (such as in Bharath-Kumar and Jaffe (1983); Wall (1982); Waxman (1988); Wi and Choi (1995)), which will be discussed in Section 2.3. In contrast to source based routing, center based tree algorithms create routing trees with a specified root node. This root node is computed to have some special properties, such as, for example, being closest to all other nodes. This method is well suited to the construction of shared trees, since the root node can have properties interesting to all multicast groups. For example, if the root node is the topological center of a set of nodes, then it is the node which is closest to all members of the involved multicast groups. In the case of the topological center, the problem of finding the root node becomes NP-hard, but there are other versions of the problem which are easier to solve. An important example of the use of this idea occurs in the CBT (core-based tree) algorithm (Ballardie et al., 1993). A recent method proposed for distributing data in multicast groups is called ring based routing (Baldi et al., 1997; Ofek and Yener, 1997). The idea is to have a ring linking the nodes in a group, to minimize costs and improve reliability. Note, for example, that trees can be broken by just one link failure; on the other hand, rings are 2-connected structures, which offer a more reliable interconnection.

2.2.5 Shortest Path Problems with Delay Constraints

Given a graph G(V, E), a source node s, and a destination node t, with s, t ∈ V, the shortest path problem consists of finding a path from s to t with minimum cost. The solution of shortest path problems is required in most implementations of routing algorithms.
This problem can be solved in polynomial time using standard algorithms (Dijkstra, 1959; Bellman, 1958; Ford, 1956). However, other versions of the shortest path problem are harder, and cannot be solved exactly in polynomial time. An example of this occurs when we add delay constraints to the basic problem. The delay constraints require that the sum of the delays from the source to each destination be less than some threshold. In this case, the shortest path problem becomes NP-hard (Garey and Johnson, 1979), and therefore heuristic algorithms must be used in order to find efficient implementations (e.g., Salama et al. (1997b)). For example, Sun and Langendoerfer (1995) and Deering and Cheriton (1990) have proposed good heuristics for this problem.

Some algorithms for shortest path construction are less useful than others, due to properties of their distributed implementations. According to Cheng et al. (1989), a disadvantage of the distributed Bellman-Ford algorithm for shortest path computation is that it is difficult to recover from link failures, from the bouncing effect (Sloman and Andriopoulos, 1985) caused by loops, and from termination problems caused by disconnected segments. Thus, a chief requirement for shortest path algorithms used in multicast routing is to have a scalable distributed implementation. The problems associated with distributed requirements for shortest path algorithms are discussed by Cheng et al. (1989), who proposed a distributed algorithm to overcome such limitations.

2.2.6 Delay Constrained Minimum Spanning Tree Problem

In the minimum spanning tree (MST) problem, given a graph G(V, E), we need to find a minimum cost tree connecting all nodes in V. This problem can be solved in polynomial time by Kruskal's algorithm (Kruskal, 1956) or Prim's algorithm (Prim, 1957). However, similarly to the shortest path problem, the MST problem becomes NP-hard when delay constraints are applied to the resulting paths in the routing tree.
This fact can be easily shown, since the minimum spanning tree problem is a generalization of the minimum cost path problem.

Salama et al. (1997a) discuss the delay constrained minimum spanning tree problem. They propose a simple heuristic, which resembles Prim's algorithm, to give an approximate solution to the problem. The proposed method can be described as follows. In its first phase, the algorithm tries to incorporate links, ordered according to increasing cost, without creating cycles. At each step, the algorithm must also ensure that the current (partial) solution satisfies the delay constraints. If this is not true, then a relaxation step is carried out, which consists of the following procedure. If a node can be linked by an alternative path, while reducing the delay, then the new path is selected. If, after this relaxation step, there is still no path with a suitable delay for some node, then the algorithm fails and returns just a partial answer.

Other examples of algorithms for computing delay constrained spanning trees include the work of Chow (1991). In his paper, an algorithm for the problem of combining different routes into one single routing tree is proposed. For more information about delay constrained routing, see Salama et al. (1997c), where a comparison of diverse algorithms for this problem is performed.

2.2.7 Center-Based Trees and the Topological Center Problem

In the context of the generation of multicast routing trees, some routing technologies, such as PIM and CBT, use the technique known as center-based trees (Salama et al., 1996), which was initially developed in Wall (1982). This method can be classified as a center-based routing technique, as described in Section 2.2.4. In this approach the first step is to find the node v which is the topological center of the set of senders and receivers.
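For a small unweighted network, the topological center, the node minimizing its maximum distance to all other nodes, can be evaluated by brute force with one breadth-first search per node; a sketch (the dictionary-of-adjacency representation is an assumption of this example):

```python
from collections import deque

def topological_center(adj):
    """Return the node v minimizing max_u d(v, u) in an unweighted,
    connected graph given as {node: [neighbors]}. Brute force: one BFS
    per node, so only practical for small networks."""
    def eccentricity(v):
        dist = {v: 0}
        queue = deque([v])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return max(dist.values())
    return min(adj, key=eccentricity)
```

This exhaustive evaluation is only meant to make the definition concrete; it does not scale to the network-wide settings discussed below.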
The topological center of a graph G(V, E) is defined as the node v ∈ V which is closest to any other node in the network, i.e., the node v which minimizes max_{u∈V} d(v, u). Then, a routing tree rooted at v is constructed and used throughout the multicast session. The basic reasoning behind the algorithm is that the topological center is a better starting point for the routing tree, since it is expected to change less than other parts of the tree. This scheme departs from the idea of rooting the tree at the sender, and therefore can be extended to be used by more than one multicast group at the same time.

Finding the topological center is, however, an NP-hard problem (Ballardie et al., 1993). Thus, other related approaches try to find root nodes that are not exactly the topological center, but which can be thought of as a good approximation. Along these lines we have algorithms using core points (Ballardie et al., 1993) and also rendezvous points (Deering et al., 1994).

It is interesting to note that, for simplicity, most of the papers which try to create routing trees using center-based techniques simply disregard the NP-complete problem and try to find other approximations. It is not completely understood how good these approximations can be for practical instances. However, Calvert et al. (1995) gave an informative comparison of the different methods of choosing the center for a routing tree, based on several experiments.

2.3 Steiner Tree Problems and Multicast Routing

In this section we discuss different versions of the Steiner tree problem, and how they can be useful in solving problems arising in multicast routing. Some of the algorithms for the Steiner tree problem are also presented.

2.3.1 The Steiner Tree Problem on Graphs

Steiner tree problems are very useful in representing solutions to multicast routing problems. They are employed mostly when there is just one active multicast group and the minimum cost tree is wanted.
In the Steiner tree problem, given a graph G(V, E) and a set R ⊆ V of required nodes, we want to find a minimum cost tree connecting all nodes in R. The nodes in V \ R can be used if needed, and are called "Steiner" points. This is a classical NP-hard problem (Garey and Johnson, 1979), and has a vast literature of its own (Bauer and Varma, 1997; Du et al., 2001; Du and Pardalos, 1993b; Hwang and Richards, 1992; Hwang et al., 1992; Kou et al., 1981; Takahashi and Matsuyama, 1980; Winter, 1987; Winter and Smith, 1992). Thus, in this subsection we give only some of the most used results. For additional information about the Steiner problem, one can consult the surveys Winter (1987); Hwang and Richards (1992); Hwang et al. (1992).

One of the most well known heuristics for the Steiner tree problem was proposed by Kou et al. (1981), and is frequently referred to as the KMB heuristic. There is practical interest in this heuristic, since it has a performance guarantee of at most twice the size of the optimum Steiner tree. The steps of the KMB heuristic are shown in Algorithm 2.

    Construct a complete graph K(R, E') where the set of nodes is R. Let the
        distance d(i, j), for i, j ∈ R, be the length of the shortest path
        from i to j in G.
    Find a minimum spanning tree T of K.
    Replace each edge (i, j) in T by the complete path from i to j in G. Let
        the resulting graph be T'.
    Compute a minimum spanning tree T'' of T'.
    repeat
        r ← false
        if there is a leaf w of T'' which is not in R then
            Remove w from T''
            r ← true
        end
    until not r
Algorithm 2: Minimum spanning tree heuristic for Steiner tree.

Theorem 1 (Kou et al., 1981) Algorithm 2 has a performance guarantee of 2 − 2/p, where p = |R|.

Wall (1980) made a comprehensive study of how the KMB heuristic performs on problems occurring in real networks. For example, Doar and Leslie (1993) report that this heuristic can give much better results than the claimed guarantee, usually coming within 5% of the optimal for a large number of realistic instances.
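A compact illustration of the KMB steps follows; the adjacency-list representation and helper names are assumptions of this sketch, and no attention is paid to efficiency.

```python
import heapq, itertools

def dijkstra_all(graph, src):
    # Shortest path distances and predecessors from src.
    dist, prev, pq = {src: 0}, {}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, w in graph[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                prev[v] = u
                heapq.heappush(pq, (d + w, v))
    return dist, prev

def mst(nodes, edges):
    # Kruskal's algorithm with union-find; edges are (weight, u, v).
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    out = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            out.append((w, u, v))
    return out

def kmb(graph, required):
    # Step 1: metric closure over the required nodes.
    sp = {r: dijkstra_all(graph, r) for r in required}
    closure = [(sp[u][0][v], u, v)
               for u, v in itertools.combinations(required, 2)]
    # Step 2: MST of the closure graph.
    t1 = mst(required, closure)
    # Step 3: expand closure edges into their underlying shortest paths.
    sub_edges, sub_nodes = set(), set()
    for _, u, v in t1:
        prev = sp[u][1]
        x = v
        while x != u:
            p = prev[x]
            w = next(wt for y, wt in graph[p] if y == x)
            sub_edges.add((w, min(p, x), max(p, x)))
            sub_nodes.update((p, x))
            x = p
    # Step 4: MST of the expanded subgraph.
    t2 = mst(sub_nodes, sorted(sub_edges))
    # Step 5: repeatedly prune non-required leaves.
    edges = {(u, v) for _, u, v in t2}
    changed = True
    while changed:
        changed = False
        deg = {}
        for u, v in edges:
            deg[u] = deg.get(u, 0) + 1
            deg[v] = deg.get(v, 0) + 1
        for u, v in list(edges):
            if (deg[u] == 1 and u not in required) or \
               (deg[v] == 1 and v not in required):
                edges.discard((u, v))
                changed = True
    return edges
```

On a star network where three required nodes can only meet through a central non-required node, the heuristic correctly routes all closure edges through that Steiner point.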
Another basic heuristic for the Steiner tree problem was proposed by Takahashi and Matsuyama (1980). This heuristic works in a way similar to Dijkstra's and Prim's algorithms. The operation of the heuristic consists of growing the initial solution tree using shortest paths. Thus, it is classified as part of the broad class of path-distance heuristics. Initially, the tree is composed of the source node only. Then, at each step, the heuristic searches for a still unconnected destination d that is closest to the current tree T, and adds to T the shortest path leading to d. The algorithm stops when all required nodes have been added to the solution tree.

The Steiner tree technique for multicast routing consists of using the Steiner problem as a model for the construction of a multicast tree. In general, it is considered that there is just one source node for the multicast group. The set of required nodes is defined as the union of source and destinations. This technique is one of the most studied for multicast tree construction, with many algorithms available (Bauer and Varma, 1995; Chow, 1991; Chen et al., 1993; Kompella et al., 1992, 1993b,a; Hong et al., 1998; Kompella et al., 1996; Ramanathan, 1996). In the remainder of this and the next sections we discuss the versions of this problem which are most useful, as well as algorithms proposed for them.

In one of the first uses of the Steiner tree problem for creating multicast trees, Bharath-Kumar and Jaffe (1983) studied algorithms to optimize the cost and delay of a routing tree at the same time. Also, Waxman (1988) discusses heuristics for cost minimization using Steiner trees, taking into consideration the dynamics of inclusion and exclusion of members in a multicast group.

It is also important to note some of the limitations of the Steiner problem as a model for multicast routing. It has been pointed out by Sriram et al.
(1998) that Steiner tree techniques work best in situations where a virtual connection must be established. However, in the most general case of packet networks, like the Internet, it does not make much sense to minimize the cost of a routing tree, since each packet can take a very different route. In this case, it is more important to have distributed algorithms with low overhead. Despite this, Steiner trees are still useful as a starting point for more sophisticated algorithms.

2.3.2 Steiner Tree Problems with Delay Constraints

The simplest way of applying the Steiner tree problem to multicast networks requires that the costs of edges in the tree represent the communication costs incurred by the resulting multicast routes. In this case we can just apply a number of existing algorithms for the Steiner tree problem, such as the ones discussed in the previous section. However, most applications have additional requirements in terms of the maximum delay for delivery of information. That is the reason why the most well studied version of the Steiner tree problem applied to multicast routing is the delay constrained version (Im et al., 1997; Kompella et al., 1992, 1993b,a; Jia, 1998; Sriram et al., 1998). We give in this section some examples of methods used to give approximate solutions to this problem.

One of the strategies used to solve the delay constrained Steiner tree problem is to adapt existing heuristics by adding delay constraints. The heuristic proposed by Kompella et al. (1993b), for example, uses methods that are similar to the KMB algorithm (Kou et al., 1981). The resulting heuristic is composed of three stages. The first stage consists of finding a closure graph of constrained shortest paths between all members of a multicast group. The closure graph of G is a complete graph which has the set of nodes V(G) and, for each pair of nodes u, v ∈ V, an edge representing the cost of the shortest path between u and v.
In the second stage, Kompella's algorithm finds a constrained spanning tree of the closure graph. To do this, the heuristic uses a greedy algorithm based on edge costs to find a spanning tree with low cost. In the last stage, edges of the spanning tree found in the previous step are mapped back to the original paths in the graph. At the same time, loops are removed using the shortest path algorithm on the expanded constrained spanning tree. The time complexity of the whole procedure is O(Δn³), where Δ is the maximum delay allowed by the application. It should be noted, however, that even though it is very similar to the KMB heuristic, this algorithm does not have any proven approximation guarantee. This happens because the delay constraints make the problem much harder to approximate.

Sriram et al. (1998) proposed an algorithm for constructing delay constrained multicast trees which is optimized for sparse, static groups. Their algorithm is divided into two phases. The first phase is distributed, and works by creating delay constrained paths from the source to each destination. The paths are created using a unicast routing algorithm, so it can use information already available on the network. The second phase uses the computed paths to define a routing tree. Each path is added sequentially and cycles are removed as they appear. Basically, on iteration i, when a new path P_i is added to an existing tree T_{i-1}, each intersection P_i ∩ T_{i-1} of the path with the old tree is tested. This is necessary to determine if just the part of P_i which does not intersect can be used, while maintaining the same delay constraint. If this is possible, then the tree becomes T_i, after adding the non-intersecting part of the path. Otherwise, the algorithm must remove some parts of the old tree in order to avoid a cycle.

Another heuristic for the delay constrained Steiner tree problem is presented by Feng and Yum (1999).
This heuristic uses the idea of constructing a minimum cost tree, as well as a minimum delay tree, and then combining the resulting solutions. Recall that a shortest delay tree can be computed in polynomial time using some algorithm for shortest paths, with the delay being used as the cost function. Thus, the hard part of the algorithm consists of finding the minimum cost tree and then deciding how to combine it with the minimum delay tree. The algorithm used to compute the minimum cost, delay constrained tree is a modification of Dijkstra's algorithm which maintains each path within a specified delay constraint. To combine different trees, the algorithm employs a loop removal subroutine, which verifies whether the resulting paths still satisfy the delay constraints. The resulting complexity of this algorithm is similar to the complexity of Dijkstra's algorithm, and therefore is an improvement in terms of computation time.

Another possible method for designing good multicast routing trees is to start from algorithms for computing constrained minimum paths. This was the technique chosen by Kumar et al. (1999), who proposed two heuristics for the constrained minimum cost routing problem. In the first heuristic, which is called the "dynamic center based heuristic", the idea is to find a center node to which all destinations will be linked, using constrained minimum paths. The center node c is calculated initially by finding the pair of nodes with the highest minimum delay path, and taking c as the node in the middle of this path. Other destinations are linked using minimum delay paths with low cost. The second heuristic, called the "best effort residual delay heuristic", follows a similar idea, but this time each node added to the current routing tree T has a residual delay bound. New destinations are then linked to the tree through paths which have low cost and delay smaller than the residual delay of the connecting node v ∈ T.
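The residual-delay idea in the second heuristic can be sketched as follows; the data layout and names here are assumptions for illustration, not the authors' formulation.

```python
def best_attachment(tree_delay, candidate_paths, max_delay):
    """Pick the cheapest feasible way to attach a new destination.
    tree_delay: {tree node v: delay from source to v in the current tree}.
    candidate_paths: {attachment node v: (cost, delay) of a connecting path}.
    A path through v is feasible only if its delay fits within v's residual
    delay bound, max_delay - tree_delay[v]."""
    best = None
    for v, (cost, delay) in candidate_paths.items():
        if delay <= max_delay - tree_delay[v]:
            if best is None or cost < best[0]:
                best = (cost, v)
    return best   # (cost, attachment node), or None if infeasible
```

A node deep in the tree has little residual delay to offer, so a cheap path through it may still be rejected in favor of a costlier path through a node closer to the source.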
Delay constraints are not the only ones that have been used with the multicast routing problem. Jiang (1992) discusses another version of the multicast Steiner tree problem, this time with link capacity constraints. His work is related to video conferencing, where many users need to be source nodes during the establishment of the conference. One of the ideas used is that, as each user can become a source, a distinct multicast tree must be created for each user. He proposes some heuristics to solve this problem, with computational results for the heuristics.

As a last example, Zhu et al. (1995) proposed a heuristic for routing with delay constraints with complexity O(k|V|³ log |V|). The algorithm has two phases. In the first phase, a set of delay-bounded paths is constructed from the source to each destination, to form a delay-bounded tree. Then, in the second phase the algorithm tries to optimize this tree, by reducing the total cost at each iteration. The algorithm is also shown to be useful for optimizing objective functions other than total cost. For example, it can be used to minimize the maximum congestion in the network, after changes in the second phase to account for the new objective function. In the paper there are comparisons between the proposed heuristic and the heuristic for the Steiner tree problem proposed by Kou et al. (1981). The results show that the heuristic achieves solutions very close to those given by the algorithm for the Steiner tree.

Sparsity and Delay Minimization

Chung et al. (1997) proposed heuristics for the delay constrained minimum cost multicast routing problem, considering the structure of sparse problems. The heuristic depends on the use of other algorithms to find approximate solutions to the Steiner problem. The Steiner tree heuristic is used to return two solutions: in the second run, the cost function c is replaced by the delay function d. Thus, there are two solutions which optimize different objective functions.
The main idea of the proposed algorithm is to try to optimize the cost of the routing tree, as well as the maximum delay, at the same time. To do this, the algorithm uses a method proposed by Blokh and Gutin (1996), which is based on Lagrangian relaxation. A criticism that can be made of the work of Chung et al. (1997) is that the goal of optimizing the Steiner tree with delay cost is not what is required in most applications. For example, a solution can be optimal for this goal, yet some path from s to a destination d can still have delay greater than a constant Δ. This happens because global optimality does not imply that each source-destination path is restricted to the maximum delay.

2.3.3 The Online Version of Multicast Routing

The multicast routing problem can be generalized in the following way. Suppose that a multicast group can be increased or reduced by means of online requests posted by nodes of the network. This is a harder problem, since optimal solutions, when considering just a fixed group, can quickly become inaccurate, and even very far from the optimum, after a number of additions and removals.

Researchers in the area of multicast routing have devised some ways to deal with the problem of reconfiguring a multicast tree when inclusions and departures of members of a group occur (Aguilar et al., 1986; Waxman, 1988). A common approach consists of modifying the simple existing algorithms in order to avoid the recomputation of the entire tree for each change. However, as noted in Pasquale et al. (1998), a problem with such methods is that the global optimality of the resulting trees is lost at each change, and a very bad solution can emerge after many such local modifications. One of the difficulties of the source tree based techniques in this respect is that, for each change in the multicast group, a new tree must be computed to restore service at the required level.
The algorithms necessary to create this tree are, however, expensive, and this makes the technique unsuitable for dynamic groups. Kheong et al. (2001) proposed an algorithm to speed up the creation of multicast routing trees in the case of dynamic changes. The idea is to maintain caches of precomputed multicast trees from previous groups. The cache can be used to quickly find new paths connecting some of the members of the group. An algorithm for retrieving data from the path cache was proposed, which finds similarities between the previous and the current multicast groups. The algorithm then constructs a connecting path using parts of the paths stored in the cache.

The difficulty of adapting source based techniques to the dynamic case has motivated the appearance of specialized algorithms for the online version of the problem. For example, Waxman (1988) defines two types of online multicast heuristics. The first type allows a rearrangement of the routing tree after some number of changes, while the second type does not allow such reconfigurations. The theoretical model for this problem is given by the so-called online Steiner problem. In this version of the problem, one needs to construct a solution to a Steiner problem subject to the addition and deletion of nodes (Imase and Waxman, 1991; Westbrook and Yan, 1993; Sriram et al., 1999). This is clearly an NP-hard problem, since it is a generalization of the Steiner problem.

Waxman (1988) studied how a routing tree must be changed when new nodes are added or removed. To better describe this situation, he proposed a random graph model, where the probability that an edge exists between two nodes depends on the Euclidean distance between them. This probability decreases exponentially with the increase of the distance between the nodes. The random inclusion of links can be used to represent the random addition of new users to a multicast group.
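A sketch of the model follows. The usual form assigns edge (u, v) probability β·exp(−d(u, v)/(αL)), where L is the largest distance between any two nodes; the parameter names and default values below are conventional choices used purely for illustration.

```python
import math
import random

def waxman_graph(points, alpha=0.4, beta=0.4, seed=0):
    """Generate edges under Waxman's random graph model: the probability
    of an edge decays exponentially with the Euclidean distance between
    its endpoints. points: list of (x, y) coordinates."""
    rng = random.Random(seed)
    n = len(points)
    # L: the largest distance between any pair of nodes.
    L = max(math.dist(p, q) for p in points for q in points)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            if rng.random() < beta * math.exp(-d / (alpha * L)):
                edges.append((i, j))
    return edges
```

Larger β increases edge density overall, while larger α weakens the distance penalty; nearby nodes are always more likely to be connected than distant ones.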
Waxman also described a greedy heuristic to approximately solve instances generated according to this model. Hong et al. (1998) proposed a dynamic algorithm which is capable of handling additions and removals of elements in an existing multicast group. The algorithm is again based on the Steiner tree problem, with added delay constraints. However, to decrease the computational complexity of the problem, the authors employed the Lagrangian relaxation technique. According to their results, the algorithm finds solutions very close to the optimum when the network is sparse.

Feng and Yum (1999) devised a heuristic algorithm with the main goal of allowing easy insertion of new nodes in a multicast group. The algorithm is similar to Prim's algorithm for spanning trees in that, at each step, it takes a non-connected destination with minimum cost and tries to add this destination to the current solution. The algorithm also uses a priority queue Q where the already connected elements are stored. The key in this priority queue is the total delay between the elements and the source node. The algorithm uses a parameter k to determine how to compute the path from a destination to the current tree. Given a value of k, the algorithm computes k minimum delay paths, from the current destination d to each of the smallest k elements in the priority queue. Then, the best of these paths is chosen to be part of the routing tree. An interesting feature of the resulting algorithm is that changing the value of the parameter k will change the amount of effort needed to connect a destination. Clearly, when increasing the value of k, better results will be obtained. This algorithm facilitates the inclusion of new elements, because the same procedure can be used to grow the existing tree in order to accommodate a new node. Sriram et al.
(1999) proposed new algorithms for the online, delay constrained minimum cost multicast routing problem that try to maintain a fixed quality of service by specifying minimum delays. The algorithm is able to adapt the routing tree to changes in membership due to inclusions and exclusions of users. One of the problems they try to solve is how to determine the moment at which the tree must be recomputed, and for how long the algorithm should just make modifications to the original tree. To answer this question, the authors introduced the concept of a quality factor, which measures the usefulness of part of the routing tree to the rest of the users. When the quality factor of part of a tree decreases to a specific threshold, that part of the tree must be modified. The authors discuss a technique to rearrange the tree such that the minimum delays continue to be respected.

The first algorithm proposed by Sriram et al. (1999) starts by creating a set of delay constrained minimum cost paths. For each destination, a path is created with bandwidth greater than the required bandwidth B, and with delay less than the maximum delay Δ. The next phase uses the resulting paths to create a complete routing tree. The algorithm sequentially adds the edges in each path, and at each step removes the loops created by the addition of the path. Loops are removed in such a way that the delay constraints are not violated.

The second algorithm proposed in Sriram et al. (1999) is a distributed protocol, where initially each destination receives a message in order to add new paths to the source tree. The nodes are kept in a priority list, ordered by increasing delay requirements. According to the order in the list, the destinations receive messages, which ask them to compute parameters over the available paths, and then construct the new paths that will form the final routing tree.
A technique that has been employed by some researchers consists of using information available from unicast protocols to simplify the creation of multicast routes. For example, Baoxian et al. (2000) proposed a heuristic for routing with delay constraints which is based on the information given by OSPF. Reusing this information, the resulting algorithm can run with improved performance, in this case with complexity O(|D||V|), where D is the set of destinations. The resulting algorithm has two steps. In the first step, it checks, for each destination d_i, if there is some path from the source s to destination d_i satisfying the delay constraint. In the second step, the algorithm uses another heuristic to construct a unicast path from s to d_i. This heuristic basically constructs a path using information about predecessor nodes from the unicast protocol, as well as the delay information.

2.3.4 Distributed Algorithms

The multicast routing problem is in fact a distributed problem, since each node involved has some available processing power. Thus, it is natural to look for distributed algorithms which can use this computational power in order to reduce the time complexity. A number of papers have focused on distributed strategies for the delay constrained minimum spanning tree problem (Jia, 1998; Chen et al., 1993). A good example is the algorithm presented in Chen et al. (1993). The authors propose a heuristic that is similar to the general technique used in the KMB heuristic for Steiner trees, and to the algorithm in Kompella et al. (1993b), for example. However, the main difference is that a distributed algorithm is used to compute the minimum spanning tree, which must be computed twice during the execution of the heuristic. The method used to find the MST is based on the distributed algorithm proposed by Gallager et al. (1983). Kompella et al.
(1993a) proposed some distributed algorithms targeting applications of audio and video delivery over a network, where the restriction on maximum delay plays an important role. The authors try to improve over previous algorithms by using a distributed method. The main objective of the distributed procedure is to reduce the overall computational complexity. It must be noted, however, that, when using decentralized algorithms, some of the global information about the network becomes harder to find (for example, global connectivity). Thus, a simplified version of the algorithm, which does not use global information, must be employed. Nonetheless, according to the authors, the resulting algorithms stay within 15%-30% of the optimal solution for most test instances.

The first algorithm in Kompella et al. (1993a) is just a version of the Bellman-Ford algorithm (Bellman, 1957) which finds a minimum delay tree from the source to each destination. During the construction of each path, the algorithm verifies the cost of the available edges, and chooses the one with the lowest cost which satisfies the delay constraints. The algorithm has the objective of achieving feasibility, and therefore the results are not necessarily locally optimal. Local optimality can be achieved, however, using another optimization phase, such as a local search algorithm.

In the second algorithm, the strategy employed is similar to Prim's algorithm for minimum spanning tree construction. It consists of growing a routing tree, starting from the source node, until all destinations are reached. The resulting algorithm is specialized according to different techniques for selecting the next edge to be added to the tree. In the first edge selection strategy proposed, edges with the smallest cost are selected, such that the delay restrictions are satisfied. The second edge selection rule tries to balance the cost of the edge with the delay imposed by its use.
This is done by a weighting factor, which gives higher priority to edges with smaller delay, among edges with the same cost. The factor used for edge (i, j) is

    w'(i, j) = w(i, j) / (Δ − (D(s, i) + d(i, j))),

where D(s, i) is the minimum total delay between the source s and node i, Δ is the maximum allowed delay, and, as usual, w(i, j) and d(i, j) are the cost and delay between nodes i and j.

The authors also discuss the problem of termination, which is an important question for distributed algorithms. In this case, the problem exists because some configurations can report an infeasible problem, while feasibility could be restored by making some changes to the current solution.

Shaikh and Shin (1997) presented a distributed algorithm whose focus is to reduce the complexity of distributed versions of heuristics for the delay constrained Steiner problem. In their paper, the authors try to adapt the model of Prim's and Dijkstra's algorithms to the harder task of creating a multicast routing tree. In this way, they aim to reduce the complexity associated with the heuristics for the Steiner tree problem, while producing good solutions for the problem. The methods employed by Dijkstra's shortest path and Prim's minimum spanning tree algorithms are interesting because they require only local information about the network, and therefore they are known to perform well in distributed environments. The operation of these algorithms consists of adding at each step a new edge to the existing tree, until some termination condition is satisfied.

In the algorithm proposed by Shaikh and Shin, the main addition to the structure of Dijkstra's algorithm is a method for distinguishing between destinations and non-destination nodes. This is done by the use of an indicator function I_D, which returns 1 if and only if the argument is not a destination node. The general strategy is presented in Algorithm 3.
Note that in this algorithm the accumulated cost of a path is set to zero every time a destination node is reached. This guides the algorithm to find paths that pass through destination nodes with higher probability. This strategy is called destination-driven multicast. The resulting algorithm is simple to implement and, according to the authors, performs well in practice.

    input: G(V, E), s
    for v ∈ V do
        d[v] ← ∞
    end
    d[s] ← 0
    S ← ∅
    Q ← V    /* Q is a priority queue */
    while Q ≠ ∅ do
        v ← get_min(Q)
        S ← S ∪ {v}
        for u ∈ N(v) do
            if u ∉ S and d[u] > d[v]·I_D[v] + w(u, v) then
                d[u] ← d[v]·I_D[v] + w(u, v)
            end
        end
    end
Algorithm 3: Modification of Dijkstra's algorithm for multicast routing, proposed by Shaikh and Shin (1997).

Mokbel et al. (1999) discuss a distributed heuristic algorithm for delay constrained multicast routing, which is divided into a number of phases. The initial phase of the algorithm consists of discovering information about the nodes in the network, particularly about the delays incurred by packets. In this phase, a packet is sent from a source to all other nodes in the neighborhood. The packet is duplicated at each node, using the flooding technique. At each node visited, information about the total delay and cost experienced by the packet is collected, added to the packet, and retransmitted. Each destination will receive, within the previously defined delay time, packets with information about the paths traversed. After receiving the packets, each destination can select the resulting path with the lowest cost as the chosen path. As the last step of the initial phase, all destinations send this information to the source node.

In the second phase, the source node receives the selected paths for each destination and constructs a routing tree based on this information. This is a centralized phase, where existing heuristics can be applied to the construction of the tree.
To improve the performance of the algorithm, and to avoid an overload of packets in the network during the flooding phase, each node is required to maintain at most K packets at any time. Using this parameter, the time complexity of the whole algorithm is O(K^2 |V|^2).

Sparse Groups

An important case of multicast routing occurs when the number of sources and destinations is small compared to the whole network. This is the typical case for big instances, where just a few nodes participate in a group at each moment. For this case, Sriram et al. (1998) proposed a distributed algorithm which tries to exploit the sparsity of the problem. The algorithm initially uses information available through a unicast routing protocol to find precomputed paths in the current network. However, problems can appear when these paths induce loops in the corresponding graph. In the algorithm, such intersections are treated and removed dynamically. The algorithm starts by creating a list of destinations, ordered according to their delay constraints. Nodes with stricter delay constraints have the opportunity of searching first for paths. Each destination d_i will independently try to find a path from d_i to the source s. If during this process a node v previously added to the routing tree is found, then the process stops and a new phase starts. In this new phase, paths are generated from v to the destination d_i. The destination d_i chooses one of the paths according to a selection function SF (similar to the function used by Kompella et al. (1993a)), which is defined for each path P and given by

SF(P) = C(P) / (Δ − D_T(s, v) − D(P)),

where C(P) and D(P) are the cost and delay of path P, Δ is the maximum delay in this group, and D_T(s, v) is the current delay between the source s and node v. A problem that exists when the multicast group is allowed to have dynamic membership is that a considerable amount of time is spent in the process of connection configuration.
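A minimal sketch of applying this selection function to a list of candidate paths follows; the container shapes and names are our own, and the paper's exact bookkeeping and tie-breaking are not reproduced.

```python
def select_path(paths, max_delay, delay_sv):
    """Selects among candidate (cost, delay) pairs for paths from v to a
    destination using SF(P) = C(P) / (Delta - D_T(s, v) - D(P)); paths
    that would violate the delay bound Delta are discarded."""
    best, best_score = None, float("inf")
    for cost, delay in paths:
        slack = max_delay - delay_sv - delay
        if slack <= 0:              # the delay bound would be violated
            continue
        score = cost / slack        # cheap paths with large slack win
        if score < best_score:
            best, best_score = (cost, delay), score
    return best
```

Note how the rule balances the two goals: a path with small cost but almost no delay slack scores poorly, while a slightly costlier path that leaves room under the bound can win.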
Jia (1998) proposes a distributed algorithm which addresses this question. A new distributed algorithm is employed, which integrates the routing calculation with the connection configuration phase. Using this strategy, the number of messages necessary to set up the whole multicast group is decreased.

2.3.5 Integer Programming Formulation

Integer programming has been very useful in solving combinatorial optimization problems, via the use of relaxation and implicit enumeration methods. An example of this approach to multicast routing is given by Noronha and Tobagi (1994), who studied the routing problem using an integer programming formulation. They discuss a general version of the problem in which there are costs and delays for each link, and a set {1, ..., T} of multicast groups, where each group i has its own source s_i, a set of n_i destinations d_{i1}, ..., d_{in_i}, a maximum delay Δ_i, and a bandwidth request r_i. There is also a matrix B^i ∈ R^{n × n_i}, for each group i ∈ {1, ..., T}, of source-destination requirements. The value of B^i_{jk} is 1 if j = s_i, −1 if j = d_{ik}, and 0 otherwise. The node-edge incidence matrix is represented by A ∈ Z^{n × m}. The network considered has n nodes and m edges. The vectors W ∈ R^m, D ∈ R^m, and C ∈ R^m give respectively the costs, delays, and capacities of each link in the network. The variables in the formulation are X^1, ..., X^T (where each X^i is an m × n_i matrix), Y^1, ..., Y^T (where each Y^i is a vector with m elements), and M ∈ R^T. The variable X^i_{jk} = 1 if and only if link j is used by group i to reach destination d_{ik}. Similarly, variable Y^i_j = 1 if and only if link j is used by multicast group i. Also, variable M_i represents the delay incurred by multicast group i in the current solution. In the following formulation, the objectives of minimizing total cost and maximum delay are both considered. The constant values β_c and β_d represent the relative weight given to the minimization of the cost and to the minimization of the delays, respectively.
Using the variables shown, the integer programming formulation is given by:

min  β_c Σ_{i=1}^T r_i W^T Y^i + β_d Σ_{i=1}^T M_i                       (2.1)

subject to

A X^i = B^i,                   for i = 1, ..., T                         (2.2)
X^i_{jk} ≤ Y^i_j ≤ 1,          for i = 1, ..., T, j = 1, ..., m,
                               k = 1, ..., n_i                           (2.3)
M_i ≥ Σ_{j=1}^m D_j X^i_{jk},  for i = 1, ..., T, k = 1, ..., n_i        (2.4)
M_i ≤ Δ_i,                     for i = 1, ..., T                         (2.5)
Σ_{i=1}^T r_i Y^i ≤ C                                                    (2.6)
X^i_{jk}, Y^i_j ∈ {0, 1},      for i = 1, ..., T, j = 1, ..., m,
                               k = 1, ..., n_i.                          (2.7)

The constraints in the above integer program have the following meaning. Constraint (2.2) is the flow conservation constraint for each of the multicast groups. Constraint (2.3) determines that an edge must be selected whenever it is used by any multicast tree. Constraints (2.4) and (2.5) determine the value of the delay, since it must be greater than the sum of all delays in the current multicast group and less than the maximum acceptable delay Δ_i. Finally, constraint (2.6) says that each edge can carry at most its capacity. This is a very general formulation, and clearly cannot be solved exactly in polynomial time because of the integrality constraints (2.7). This formulation is used in Noronha and Tobagi (1994) to derive an exact algorithm for the general problem. Initially, a decomposition technique is used to break the constraint matrix into smaller parts, where each part can be solved more easily. This can be done using standard mathematical programming techniques, as shown, e.g., in Bazaraa et al. (1990). Then, a branch-and-bound algorithm is proposed for the resulting problem. In this branch-and-bound, the lower bounding procedure uses the decomposition found initially to improve the efficiency of the lower bound computation.

2.3.6 Minimizing Bandwidth Utilization

A problem that usually arises when constructing multicast trees is the trade-off between bandwidth used and total cost of the tree. Traditional algorithms for tree minimization try to reduce the total cost of the tree. However, this in general does not guarantee minimum bandwidth utilization.
On the other hand, there are algorithms for minimization of the bandwidth that do not maintain the minimum cost. For example, a greedy algorithm, as described in Fujinoki and Christensen (1999), works by connecting destinations sequentially to the source. Each destination is linked to the nearest node already connected to the source. In this way, bandwidth is saved by reusing existing paths. Fujinoki and Christensen (1999) proposed a new algorithm for maintaining dynamic multicast trees which tries to solve the trade-off problem discussed above. The algorithm, called "shortest best path tree" (SBPT), uses shortest paths to connect sources to destinations. The authors represent the distance between nodes as the minimum number of edges in the path between them. The first phase of the algorithm consists of computing the shortest paths from s to all destinations d_i. In the second phase, the algorithm performs a sequence of steps for each destination d_i ∈ D. Initially, it computes the shortest paths from d_i to all other nodes in G. Then, the algorithm takes the node u which has minimum distance from d_i and at the same time occurs in one of the shortest paths from s to d_i. By making this choice, the method tries to favor the nodes already in the routing tree, giving the smallest possible increase in the total cost.

2.3.7 The Degree-Constrained Steiner Problem

If the number of links from any node in the network is required to be a fixed value, then we have the degree-constrained version of the multicast routing problem. For some applications of multicasting, it is difficult to make a large number of copies of the same data. This is particularly true for high speed switches, where the speed requirements may prohibit in practice an unbounded number of copies of the received information. For example, in ATM networks, the number of outgoing connections can have a fixed limit (Zhong et al., 1993).
Thus, it is interesting to consider Steiner tree problems where the degree of each node is constrained. Bauer (1996) proposed algorithms for this version of the problem, and tried to construct degree-constrained multicast trees as a solution. Bauer and Varma (1995) reviewed the traditional heuristics for the Steiner tree problem, and new heuristics were given which consider the restriction on the number of adjacent nodes. They show that the heuristics for the degree-constrained Steiner tree give solutions very close to the optimum for sample instances of the general Steiner problem. They also show experimentally that, despite the restriction on the node degrees, almost all instances have feasible solutions, which have been found by the heuristics.

2.3.8 Other Restrictions: Non-Symmetric Links and Delay Variation

An interesting feature of real networks, which is not mentioned in most of the research papers, is that links are, in general, non-symmetric. The capacity in one direction can be different from the capacity in the other direction, for example, due to congestion problems on some links. Ramanathan (1996) considered this kind of restriction. In his work, the minimum cost routing tree is modeled as a minimum Steiner tree with constraints, where the network has non-symmetric links. The author proposes an approximation algorithm with a fixed worst-case guarantee. The resulting algorithm also has the nice characteristic of being parameterizable, and therefore it allows trading execution time for accuracy. Another restriction, which is normally disregarded, was considered in the approach taken by Rouskas and Baldine (1996), who proposed the minimization of the so-called delay variation. The delay variation is defined as the difference between the minimum and maximum delay induced by a specific routing tree. In some applications it is interesting that this variation stay within a specific range.
For example, it can be desirable that all nodes receive the same information at about the same time.

Table 2-1: Comparison among algorithms for the problem of multicast routing with delay constraints. k is the number of destinations. ** This algorithm is partially distributed.

Algorithm                        Guarantee   Complexity      Types of instances
KMB (Kou et al., 1981)           2           O(kn^2)         general
Takahashi and Matsuyama (1980)   2           O(kn^2)         general
Kompella et al. (1993b)          N/A         O(n^3 Δ)        general
Sriram et al. (1998)             N/A         **              sparse, static groups
Feng and Yum (1999)              N/A         O(n^2)          general
Kumar et al. (1999)              N/A         O(n^3)          center based
Jiang (1992)                     N/A         O(n^3)          capacity constrained, videoconferencing
Chung et al. (1997)              N/A         O(n^3)          sparse instances
Zhu et al. (1995)                N/A         O(kn^3 log n)   sparse instances

2.3.9 Comparison of Algorithms

A Comparison of Nondistributed Approaches

Table 2-1 gives a summary of features of the algorithms for the Steiner tree problem with delay constraints discussed in this section. Most of them have similar computational complexity, of the order of O(n^3), where n is the number of nodes in the network. The best result is obtained by Feng and Yum (1999), who combine the work of finding good solutions in terms of cost and delay. The heuristic by Chung et al. (1997) is reported to run faster than other heuristics; however, it must be noted that it is optimized for sparse instances. Regarding approximation, only the first two algorithms in the table have a known approximation guarantee (constant and equal to 2). However, it is not difficult to achieve similar performance guarantees with heuristics of the same complexity as KMB, for example. This can be done by running the heuristic with known performance guarantee, followed by the other heuristic, and then reporting the best solution.

Table 2-2: Comparison among algorithms for the problem of multicast routing with delay constraints. k is the number of destinations, T_SP is the time to find a shortest path in the graph.
** In this case amortized time is the important issue, but it was not analyzed in the original paper.

Algorithm                  Complexity         Types of instances
Kompella et al. (1993a)    O(n^3)             general online instances
Baoxian et al. (2000)      O(mn)              based on unicast information
Sriram et al. (1999)       O(n^3)             instances with QoS
Hong et al. (1998)         O(mk(k + T_SP))    dynamic, delay sensitive
Kheong et al. (2001)       N/A **             general instances, cache must be maintained

A Comparison of On-Line Approaches

Table 2-2 presents a comparison of algorithms proposed for the online version of the Steiner tree problem with delay constraints, as discussed in Section 2.3.3. These algorithms in general do not provide an approximation guarantee, due to the dynamic nature of the problem. Among the algorithms shown in Table 2-2, the one with lowest complexity is given by Baoxian et al. (2000). However, this complexity is kept low due to the dependence of the algorithm on information given by other protocols operating unicast routing. Hong et al. (1998), in contrast, consider the construction of a complete solution, and have the online issues as an additional feature. Kheong et al. (2001) also consider a technique where information is reused, but in this case from previous iterations of the algorithm. It is difficult to evaluate the complexity of the whole algorithm, since it depends on the amortized complexity over a large number of iterations. This kind of analysis is not carried out in the paper.

A Comparison of Distributed Approaches

Distributed approaches for the Steiner tree problem with delay constraints are more difficult to evaluate, in the sense that other features become important. For example, in distributed algorithms the message complexity, i.e., the number of message exchanges, is an important indicator of performance. These factors are sometimes not derived explicitly in some of the papers. For the algorithm proposed by Chen et al.
(1993), the message complexity is shown to be O(m + n(n + log n)), and the time complexity is O(n^2). Also, the worst-case ratio of the cost of the solution obtained to the cost of a minimum cost Steiner tree T is 2(1 − 1/l), where l is the number of leaves in T. On the other hand, Shaikh and Shin (1997) do not give much information about the complexity of their distributed algorithm. It can be noted, however, that the complexity is similar to that of distributed algorithms for the computation of a minimum spanning tree. Finally, Mokbel et al. (1999) derive only the total time complexity of their algorithm, which is O(K^2 n^2), where K is a constant value introduced to decrease the number of message exchanges required.

2.4 Other Problems in Multicast Routing

In this section we present some other problems occurring in multicast routing which have interesting characteristics in terms of combinatorial optimization. The first of these problems is the multicast packing problem, where the objective is to optimize the design of the entire network in order to provide capacity for a specific number of multicast groups. Then, we discuss the point-to-point connection problem, which is a generalization of the Steiner tree problem.

2.4.1 The Multicast Packing Problem

A more general view of the multicast routing problem can be found if we consider the required constraints when more than one multicast group exists. In this case, there is a number of applications that try to use the network for the purpose of establishing connections and sending information, organized in different groups. Thus, the network capacity must be shared according to the requirements of each group. These capacity constraints are modeled in what is called the multicast packing problem in networks. This problem has attracted some attention in the past few years (Wang et al., 2002; Priwan et al., 1995; Chen et al., 1998).
The congestion λ_e on edge e is given by the sum of all load imposed by the groups using e. The maximum congestion λ is then defined as the maximum of all congestions λ_e, over edges e ∈ E. If we assume that there are K multicast groups, and each group k generates an amount t_k of traffic, an integer programming formulation for the multicast packing problem is given by

min  λ                                                    (2.8)

subject to

Σ_{k=1}^K t_k x_e^k ≤ λ,   for all e ∈ E                  (2.9)
x^k ∈ {0, 1}^E,            for k = 1, ..., K,             (2.10)

where variable x_e^k is equal to one if and only if edge e is used by multicast group k. A number of approaches have been proposed for solving this problem. For example, Wang et al. (2002) discuss how to set up multiple groups using routing trees, and formalized this as a packing problem. Two heuristics were then proposed. The first one is based on known heuristics for constructing Steiner trees. The second is based on the cut-set problem. The constraints considered for the Steiner tree problem are, first, minimum cost under bounded tree depth; and second, cost minimization under bounded degree for intermediate nodes. Priwan et al. (1995) and Chen et al. (1998) proposed formulations for the multicast packing problem using integer programming. The latter authors considered two ways of modeling the routing of multicast information. In the first method, the information is sent according to a minimum cost tree among nodes in the group, which gives rise to the Steiner tree problem. In the second, more interesting version, the information is sent through a ring which visits all elements in the group, and therefore this results in a problem similar to the traveling salesman problem. Using these formulations they describe heuristics that can be applied to get approximate solutions. Comparisons were done between the two proposed formulations with respect to the quality of the solutions found to the multicast packing problem.
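For a fixed assignment of edges to groups, the objective of formulation (2.8)-(2.10) is easy to evaluate; the sketch below uses an illustrative data layout of our own.

```python
def congestion(edges, groups):
    """Evaluates the packing objective for fixed group trees: `groups`
    is a list of (t_k, edges_used_by_group_k) pairs.  Returns the
    per-edge congestion lambda_e and the maximum congestion lambda."""
    load = {e: 0 for e in edges}
    for t_k, used in groups:
        for e in used:
            load[e] += t_k
    return load, max(load.values())
```

Minimizing the second return value over all choices of group trees is exactly the packing problem; the evaluator is only the inner objective computation.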
In the integer formulation of the multicast packing problem, there is a variable x_e for each edge e ∈ E, which is equal to one if and only if this edge is selected. Each edge also has an associated cost w_e. Then, the integer formulation for the tree version of the problem is given by

min  Σ_{e ∈ E} w_e x_e                                                  (2.11)

subject to

Σ_{e ∈ δ(S)} x_e ≥ 1,   for all S ⊂ V such that u ∈ S and M ⊄ S        (2.12)
x ∈ {0, 1}^E,                                                           (2.13)

where M is the set of nodes participating in the multicast group and δ(S) represents the set of edges leaving S ⊂ V. The integer program for the ring-based version is given by

min  Σ_{e ∈ E} w_e x_e                                                  (2.14)

subject to

Σ_{e ∈ δ(v)} x_e = 2,   for all v ∈ M                                   (2.15)
Σ_{e ∈ δ(v)} x_e ≤ 2,   for all v ∈ V \ M                               (2.16)
Σ_{e ∈ δ(S)} x_e ≥ 2,   for all S ⊂ V s.t. u ∈ S and M ⊄ S             (2.17)
x ∈ {0, 1}^E.                                                           (2.18)

Here, u is any fixed element of M. The integer solution of this problem defines a ring passing through all nodes participating in group M. In Chen et al. (1998) these two problems are solved using branch-and-cut techniques, after the identification of some valid inequalities.

2.4.2 The Multicast Network Dimensioning Problem

Another interesting problem occurs when we consider the design of a new network, intended to support a specific multicast demand. This is called the multicast network dimensioning problem, and it has been treated in some recent papers (Prytz, 2002; Forsgren and Prytz, 2002; Prytz and Forsgren, 2002). According to Forsgren and Prytz (2002), the problem consists of determining the topology (which edges will be selected) and the corresponding capacity of the edges, such that a multicast service can be deployed in the resulting network. Much of the work on this problem has used mathematical programming techniques to define and give exact and approximate solutions to the problem. The technique used in Forsgren and Prytz (2002) is Lagrangian relaxation applied to an integer programming model. We assume that there are T multicast groups.
The model uses variables x_e^k ∈ {0, 1}, for k ∈ {1, ..., T} and e ∈ E, which represent whether edge e is used by group k. There are also variables z_e^l ∈ {0, 1}, for l ∈ {1, ..., L} and e ∈ E, where L is the highest possible capacity level, which determine whether the capacity level of edge e is equal to l. Now, let d_k, for k ∈ {1, ..., T}, be the bandwidth demanded by group k; c_e^l, for l ∈ {1, ..., L} and e ∈ E, be the capacity available for edge e at level l; and w_e^l, for l ∈ {1, ..., L} and e ∈ E, be the cost of using edge e at capacity level l. Also, b ∈ Z^n is the demand vector, and A ∈ R^{n × m} is the node-edge incidence matrix. We can now state the multicast network dimensioning problem using the following integer program:

min  Σ_{e ∈ E} Σ_{l=1}^L w_e^l z_e^l                                    (2.19)

subject to

Σ_{k=1}^T d_k x_e^k ≤ Σ_{l=1}^L c_e^l z_e^l,   for all e ∈ E            (2.20)
Σ_{l=1}^L z_e^l ≤ 1,                           for all e ∈ E            (2.21)
Ax = b                                                                  (2.22)
x, z integer.                                                           (2.23)

In this integer program, constraint (2.20) ensures that the bandwidth used on each edge is at most the available capacity. Constraint (2.21) selects at most one capacity level for each edge. Finally, constraint (2.22) enforces flow conservation in the resulting solution. The problem proposed above has been solved using a branch-and-cut algorithm, employing some basic types of cuts. The authors also use Lagrangian relaxation to reduce the size of the linear program that needs to be solved. Some primal heuristics have been designed to exploit certain similarities with the Steiner tree problem. These primal heuristics were used to improve the upper bounds found during the branching phase. The resulting algorithm has been able to solve instances with more than 100 nodes.

2.4.3 The Point-to-Point Connection Problem

An interesting generalization of the Steiner problem is known as the point-to-point connection problem (PPCP). In the PPCP, we are given two disjoint sets S and D of sources and destinations, respectively. We require that |S| = |D|.
However, this is not an important restriction, since for every network we can extend the set of sources using dummy nodes, if needed. As usual, there is a cost function w : E → N. The objective is to find a minimum cost forest F ⊆ E such that each destination is connected to at least one source, and similarly each source is connected to at least one destination. This problem was first proposed by Li et al. (1992), who proved that all four versions of the PPCP (directed, undirected, with fixed or non-fixed destinations) are NP-hard when p, the number of source-destination pairs, is given as input. Natu (1995) proposed a dynamic programming algorithm for p = 2 with time complexity O(mn + n^2 log n). Goemans and Williamson (1995) presented an approximation algorithm for a class of forest constrained problems, including the PPCP and the Steiner problem, that runs in O(n^2 log n) time and gives results within a factor 2 − 1/p of the optimal solution. The PPCP is useful to model situations where there are multiple sources. Some metaheuristic algorithms have been applied to the problem by Correa et al. (2003) and Gomes et al. (1998). The main idea of these methods is to design simple heuristics and combine them in a framework called asynchronous teams (Talukdar and de Souza, 1990), where each heuristic is considered an autonomous agent, capable of improving the existing solutions. Some of the heuristics proposed in Correa et al. (2003) explore basic features of optimal solutions. For example, one of the heuristics uses the triangle inequality property: given three nodes a, b, and c appearing in this order in one of the paths in the solution, the combined cost of the subpaths between a and b and between b and c must be at most the cost of a minimum path between a and c. Given a solution, we can check, for each three nodes a, b, and c in a path, whether this condition is satisfied. If it is not, then we can always improve the solution by substituting the minimum path between a and c for the more expensive subpath.
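The triangle inequality check just described can be sketched directly; the helper names are ours, with `w` mapping consecutive node pairs on the path to their cost and `shortest_cost` mapping node pairs to shortest-path costs.

```python
def triangle_violation(path, w, shortest_cost):
    """Returns the first pair (a, c) on the path whose connecting subpath
    costs more than a shortest a-c path (so the solution can be improved
    by splicing in the shortest path), or None if no such pair exists."""
    prefix = [0]                       # prefix[i] = cost of path[0..i]
    for i in range(1, len(path)):
        prefix.append(prefix[-1] + w[(path[i - 1], path[i])])
    for i in range(len(path)):
        for j in range(i + 2, len(path)):
            if prefix[j] - prefix[i] > shortest_cost[(path[i], path[j])]:
                return path[i], path[j]
    return None
```

A local-search agent in the asynchronous-teams framework could repeatedly call such a check and apply the corresponding substitution until no violating triple remains.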
2.5 Concluding Remarks

In this chapter we have shown a number of applications and problems associated with multicast routing. We have also shown that most of them are related to other important problems in the area of combinatorial optimization. The topics addressed show that this is an evolving area, still in its development stages. Moreover, most of the interesting problems can be addressed with techniques developed by the combinatorial optimization and operations research communities. We believe that in the coming years an increasing number of applications and models will continue to evolve from this field and make it an important source of problems and results.

CHAPTER 3
STREAMING CACHE PLACEMENT PROBLEMS

We study a problem in the area of multicast networks, called the streaming cache placement problem (SCPP). In the SCPP one wants to determine the minimum number of multicast routers needed to deliver content to a specified number of destinations, subject to predetermined link capacities. We initially discuss the different versions of the SCPP found in multicast network applications. Then, a transformation from the SATISFIABILITY problem is used in order to prove NP-hardness of all of these versions of the SCPP. Complexity results are derived for the cases of directed and undirected graphs, as well as under different assumptions about the type of flow in the network.

3.1 Introduction

Multicast protocols are used to send information from one or more sources to a large number of destinations using a single send operation. Networks supporting multicast protocols have become increasingly important for many organizations due to the large number of applications of multicasting, which include data distribution, videoconferencing (Eriksson, 1994), groupware (Chockler et al., 1996), and automatic software updates (Han and Shahmehri, 2000). Due to the lack of multicast support in existing networks, there is an arising need for upgrading unicast oriented networks.
Thus, there is a clear economic impact in providing support for new multicast enabled applications. In this chapter we study a problem motivated by the economical planning of multicast network implementations. The streaming cache placement problem (SCPP) has the objective of minimizing costs associated with the implementation of multicast routers. This problem has only recently received attention (Mao et al., 2003; Oliveira et al., 2003a) and presents many interesting questions still unanswered from the algorithmic and complexity theoretic points of view.

3.1.1 Multicast Networks

In multicast networks, nodes interested in a particular piece of data are called a multicast group. The main objective of such groups is to send data to destinations in the most efficient way, avoiding duplication of transmissions, and therefore saving bandwidth. With this aim, special purpose multicast protocols have been devised in the literature. Examples are the PIM (Deering et al., 1996) and core-based (Ballardie et al., 1993) distribution protocols. The basic operation in these routing protocols is to send data to a subset of nodes, duplicating the information only when necessary. Network nodes that understand a multicast protocol are called cache nodes, because they can send multiple copies of the received data. Other nodes simply act as intermediates in the multicast transmission. The main problem to be solved is deciding the route to be used by packets in such a network. One of the simplest strategies for generating multicast routes is to maintain a routing tree, linking all sources and destinations. A similar strategy, which may reduce the number of needed cache nodes, consists in determining a feasible flow from sources to destinations such that all destinations can be satisfied. A main economical problem, however, is that not all nodes understand these multicast routing protocols.
Moreover, upgrading all existing nodes can be expensive (or even impossible, when the whole network is not owned by the same company, as happens in the Internet).

Figure 3-1: Simple example for the cache placement problem.

Suppose an extreme situation, where no nodes have multicast capabilities. In this case, the only possible solution consists in sending a separate copy of the required data to each destination in the group. However, in this case instances can become quickly infeasible, as shown in Figure 3-1. Here, all edges have capacity equal to one, and nodes a and b are destinations. In this example, a feasible solution is found when r becomes a cache node. Thus, it is interesting to determine the minimum number of cache nodes required to handle a specified amount of multicast traffic, subject to link capacity constraints. This is called the streaming cache placement problem (SCPP).

Formal Description of the SCPP. Suppose that a graph G = (V, E) is given, with a capacity function c : E → Z_+, a distinguished source node s ∈ V, and a set of destination nodes D ⊆ V. It is required that data be sent from node s to each destination. Thus, we must determine a set R of cache nodes, used to retransmit data when necessary, and the amount of information carried by each edge, which is represented by variables w_e ∈ R_+, such that w_e ≤ c_e, for e ∈ E. The objective of the SCPP is to find a set R of minimum size corresponding to a flow {w_e : e ∈ E}, such that for each node v ∈ D ∪ R there is a unit flow from some node in {s} ∪ R to v, and the capacity constraints w_e ≤ c_e, for e ∈ E, are satisfied. A characterization of the set of cache nodes can be given in terms of the surplus of data at each node v ∈ V. Suppose that the number of data units sent by node v, also called its surplus, is given by a variable b_v ∈ Z. Note that the node s must send at least one unit of information, so b_s must be greater than zero.
Each destination is required to receive a unit of data, so it has a negative surplus (requirement) of −1. Now, suppose that, due to capacity constraints, we need to establish v as a cache node. Then, the surplus at this node cannot be negative, since it is also sending forward the received data. If v is also a destination, then the minimum surplus is zero (in this case it is receiving one unit and sending one unit); otherwise, b_v ≥ 1. Thus, the set of cache nodes R ⊆ V \ {s} is the one such that either b_v ≥ 0 and v ∈ D, or b_v > 0 and v ∈ V \ (D ∪ {s}), for all v ∈ R.

3.1.2 Related Work

Problems in multicast routing have been investigated by a large number of researchers in the last decade. The most studied problems relate to the design of routing tables with optimal cost. In these problems, given a set of sources and a set of destinations, the objective is to send data from sources to destinations with minimum cost. In the case in which there are no additional constraints, this reduces to the Steiner tree problem on graphs (Du et al., 2001). In other words, it is required to find a tree linking all destinations to the source, with minimum cost. Using this technique, the source and destinations are the required nodes, the remaining ones being the Steiner nodes. Many heuristic algorithms have been proposed for this kind of problem (Chow, 1991; Feng and Yum, 1999; Kompella et al., 1993b,a; Kumar et al., 1999; Salama et al., 1997b; Sriram et al., 1998). The problems above, however, assume that all nodes support a multicast protocol. This is not a realistic assumption on existing networks, since most routers do not support multicasting by default. Thus, some sort of upgrade must be applied, in terms of software or even hardware, in order to deploy multicast applications. Despite this important application, only recently have researchers started to look at this kind of problem. In Mao et al.
(2003) the streaming cache placement problem is defined in the context of Virtual Private Networks (VPNs). In this paper, the SCPP was proven to be NP-hard, using a reduction from the EXACT COVER BY 3-SETS problem, and a heuristic was proposed to solve some sample instances. However, the paper does not give details about possible versions of the problem, and proceeds directly to deriving local search heuristics. Another related problem is the cache placement problem (Li et al., 1999). Here, the objective is to place replicas of some static document on different points of a network, in order to increase accessibility and also decrease the average access time for any client. The important difference between this problem and the SCPP is that Li et al. (1999) do not consider multicast transmissions. Also, there are no restrictions on the capacity of links, and data is considered to be placed at the locations before the real operation of the network.

3.2 Versions of Streaming Cache Placement Problems

In this section, we discuss two versions of the SCPP. In the tree streaming cache placement problem (TSCPP), the objective is to find a routing tree which minimizes the number of cache nodes needed to send data from a source to a set of destinations. We also discuss a modification of this problem where we try to find any feasible flow from source to destinations, minimizing the number of cache nodes. This problem is called the flow streaming cache placement problem (FSCPP).

Figure 3-2: Simple example for the Tree Cache Placement Problem.

3.2.1 The Tree Cache Placement Problem

Consider a weighted, capacitated network G(V, E) with a source node s and a routing tree T rooted at s and spanning all nodes in V. Let D be a subset of the nodes in V that have a demand for a data stream to be sent from node s. The stream follows the path defined by T from s to the demand nodes and takes B units of bandwidth on every edge that it traverses.
For each demand node, a separate copy of the stream is sent. Edge capacities cannot be violated. Note that, depending on the network structure, an instance of this problem can easily become infeasible. To handle this, we allow stream splitters, or caches, to be located at specific nodes in the network. A single copy of the stream is sent from s to a cache node r, and from there multiple copies are sent down the tree. The optimization problem consists of finding a routing tree and locating a minimum number of cache nodes. Figure 3-2 shows a small example of this problem. In this example, if nodes a and b each require a stream (with B = 1) from s, and node r is not a cache node, then we send two units from s to r and one unit from r to a and from r to b. We get an infeasibility on edge (s, r), since two units flow on it, and it has capacity c_{s,r} = 1 < 2. However, if node r becomes a cache node, we can send one unit from s to r and then one unit from r to a and one unit from r to b. The resulting flow is now feasible. To simplify the formulation of the problem, we can assume, without loss of generality, that the bandwidth used by each message is equal to one. The tree cache placement problem (TSCPP) is defined as follows. Given a graph G(V, E) with capacities c_e on the edges, a source node s ∈ V, and a subset D ⊆ V representing the destination nodes, we want to find a spanning tree T (which determines the paths followed by a data stream from s to each v ∈ D) such that the subset R ⊆ V \ {s} of cache nodes has minimum size. For each node v ∈ D ∪ R, there must be a data stream from some node w ∈ R ∪ {s} to v, such that the sum of all streams on each edge (i, j) ∈ T does not exceed the edge capacity c_{ij}. To state the problem more formally, consider an integer programming formulation for the TSCPP.
Define the variables

y_e = 1 if edge e is in the spanning tree T, and y_e = 0 otherwise;
x_i = 1 if node i ≠ s is a cache node, and x_i = 0 otherwise;
b_i ∈ {−1, ..., |V| − 1}, the flow surplus for node i ∈ V;
w_e ∈ {0, ..., |V|}, the amount of flow in edge e ∈ E.

Given the node-arc incidence matrix A, the problem can be stated as

min Σ_{i=1}^{|V|} x_i   (3.1)

subject to

Aw = b   (3.2)
Σ_{i∈V} b_i = 0   (3.3)
b_s ≥ 1 for the source s   (3.4)
x_i − 1 ≤ b_i ≤ x_i(|V| − 1) − 1 for i ∈ D   (3.5)
x_i ≤ b_i ≤ x_i(|V| − 1) for i ∈ V \ (D ∪ {s})   (3.6)
Σ_{e∈E} y_e = |V| − 1   (3.7)
Σ_{e∈G(H)} y_e ≤ |H| − 1 for all H ⊂ V   (3.8)
0 ≤ w_e ≤ c_e y_e for e ∈ E   (3.9)
x ∈ {0, 1}^{|V|}, y ∈ {0, 1}^{|E|}   (3.10)
b ∈ Z^{|V|}, w ∈ Z_+^{|E|},   (3.11)

where G(H) is the subgraph induced by the nodes in H. Constraint (3.2) imposes flow conservation. Constraints (3.3) through (3.6) require that there be a number of data streams equal to the number of nodes in R ∪ D. Constraints (3.7) and (3.8) are the spanning tree constraints. Finally, constraint (3.9) determines the bounds for the flow variables, implying that the flow specified by w can be carried only on edges in the spanning tree.

3.2.2 The Flow Cache Placement Problem

An interesting extension of the TSCPP arises if we relax the constraints in the previous integer programming formulation that require the solution to be a tree of the graph G. Then we have the more general case of a flow sent from the source node s to the set of destination nodes D. To see why this extension is interesting, consider the example graph shown in Figure 3-3, in which all edges have capacity equal to one. If we find a solution to the TSCPP on this graph, then a stream can be sent through only one of the two edges (s, a) or (s, b). Suppose that we use edges (s, a) and (a, c). This implies that c must be a cache node, in order to satisfy demand nodes d1 and d2. Figure 3-3: Simple example for the Flow Cache Placement Problem. However, in practice the number of caches in this optimal solution for the TSCPP can be further reduced. Routing protocols, like OSPF, achieve load balancing by sending data through parallel links.
In the case of Figure 3-3, the protocol could just send another stream of data over edges (s, b) and (b, c). If this happens, we do not need a cache node, and the solution will have fewer caches. We define the Flow Cache Placement Problem (FSCPP) to be the problem of finding a feasible flow from source s to the set of destinations D, such that the number of required caches R ⊆ V \ {s} is minimized. The integer linear programming model for this problem is similar to (3.1)-(3.11), without the integer variables y and relaxing constraints (3.7)-(3.8).

3.3 Complexity of the Cache Placement Problems

We prove that both versions of the SCPP discussed above are NP-hard, using a transformation from SATISFIABILITY. This transformation also allows us to give a proof of non-approximability, by showing that it is a gap-preserving transformation.

3.3.1 Complexity of the TSCPP

In this section we prove that the TSCPP is NP-hard, by using a reduction from SATISFIABILITY (SAT) (Garey and Johnson, 1979). SAT: Given a set of clauses C_1, ..., C_m, where each clause is the disjunction of |C_i| literals (each literal is a variable x_j ∈ {x_1, ..., x_n} or its negation x̄_j), is there a truth assignment for the variables x_1, ..., x_n such that all clauses are satisfied?

Definition 1 The TSCPP-D problem is the following. Given an instance of the TSCPP and an integer k, is there a solution such that the number of cache nodes needed is at most k?

Theorem 2 The TSCPP-D problem is NP-complete.

Proof: This problem is clearly in NP, since for each instance I it is enough to give the spanning tree and the nodes in R to determine, in polynomial time, if this is a 'yes' instance. We reduce SAT to TSCPP-D. Given an instance I of SAT, composed of m clauses C_1, ..., C_m and n variables x_1, ..., x_n, we build a graph G(V, E), with c_e = 1 for all e ∈ E, and choose k = n. The set V is defined as

V = {s} ∪ {x_1, ..., x_n} ∪ {x̄_1, ..., x̄_n} ∪ {T'_1, ..., T'_n} ∪ {T''_1, ..., T''_n} ∪ {T'''_1, ..., T'''_n} ∪ {C_1, ..., C_m},

and the set E is defined as

E = ⋃_{i=1}^{n} { (s, x_i), (s, x̄_i), (x_i, T'_i), (x̄_i, T'_i), (x_i, T''_i), (x̄_i, T''_i), (x_i, T'''_i), (x̄_i, T'''_i) } ∪ ⋃_{i=1}^{m} ( { (x_j, C_i) : x_j ∈ C_i } ∪ { (x̄_j, C_i) : x̄_j ∈ C_i } ).   (3.12)

Figure 3-4 shows the construction of G for a small SAT instance. Define D = {C_1, ..., C_m} ∪ {T'_1, ..., T'_n} ∪ {T''_1, ..., T''_n} ∪ {T'''_1, ..., T'''_n}. Clearly, destination nodes T'_i, T''_i and T'''_i are there just to saturate the arcs leaving s and force one of x_i, x̄_i to be chosen as a cache node. Also, each node C_i forces the existence of at least one cache among the nodes corresponding to literals appearing in clause C_i.

Figure 3-4: Small graph G created in the reduction given by Theorem 2. In this example, the SAT formula is (x_1 ∨ x_2 ∨ x̄_3) ∧ (x̄_2 ∨ x_3 ∨ x_4) ∧ (x̄_1 ∨ x̄_3 ∨ x_4).

Suppose that the answer to the resulting TSCPP-D problem is 'yes'. Then, we assign variable x_i to true if node x_i is in R; otherwise we set x_i to false. This assignment is well defined, since exactly one of the nodes x_i, x̄_i must be selected. Clearly, this truth assignment satisfies all clauses C_i, because the demand of each node C_i is satisfied by at least one node corresponding to a literal appearing in clause C_i. Conversely, if there is a truth assignment F which makes the SAT formula satisfiable, we can use it to define the nodes which will be caches, and, by construction of G, all demands will be satisfied. Finally, the resulting construction is polynomial in size, thus SAT reduces in polynomial time to TSCPP-D. □

Input: a tree T
Output: a set R of cache nodes
forall v ∈ V do
  if v ∈ D then demand(v) ← 1 else demand(v) ← 0
end
call findR(s)
return R
procedure findR(v)
begin
  forall w such that (v, w) ∈ T do findR(w)
  (1) if v = s then return R
  else
    p ← parent(v)
    if c_{p,v} < demand(v) then
      R ← R ∪ {v}
      demand(p) ← demand(p) + 1
    else
      demand(p) ← demand(p) + demand(v)
end
Algorithm 4: Find the optimal R for a fixed tree.

As a simple consequence of this theorem, we have the following corollary.
Corollary 3 The TSCPP is NP-hard.

It is interesting to observe that the problem remains NP-hard even for unitary-capacity networks, since the proof remains the same for edges with unitary capacity. Some simple examples serve to illustrate the problem. For instance, if G is the complete graph K_n, then the optimal solution is simply a star graph with s at the center, and R = ∅. On the other hand, if the graph is a tree with n nodes, then the set of cache nodes is implied by the edges of the tree, thus the optimum is completely determined. Algorithm 4 determines an optimal set R for a given tree T. The algorithm works recursively. Initially it finds the demand for all leaves of T. Then it goes up the tree, determining whether the current node must be a cache node. The correctness of this method is proved below.

Theorem 4 Given an instance of the TSCPP which is a tree T, an optimal solution for T is given by Algorithm 4.

Proof: The proof is by induction on the height h of the subtree analyzed when Algorithm 4 arrives at line (1). If h = 0 then the number of cache nodes is clearly equal to zero. Assume that the theorem is true for trees of height h ≥ 1. If the capacity of the arc (p, v) is at least the demand at v, then there is no need for a new cache node, and therefore the solution remains optimal. If, on the other hand, (p, v) does not have enough capacity to satisfy all demand at v, then we have no choice other than making v a cache node. Combining this with the assumption that the solution for all children of v is optimal, we conclude that the new solution for a tree of height h + 1 is also optimal. □

3.3.2 Complexity of the FSCPP

We can use the transformation from SAT to TSCPP to show that the FSCPP is also NP-hard. In the case of directed edges, this is simple, since given a graph G provided by the reduction, we can give an orientation of G from the source to the destinations. This is stated in the next theorem.
Theorem 5 The FSCPP is NP-hard if the instance graph is directed.

Proof: The proof is similar to the proof of Theorem 2. We need just to make sure that the polynomial transformation given for the TSCPP-D also works for a decision version of the FSCPP. Given an instance of SAT, let G be the corresponding graph found by the reduction. We orient the edges of G from s to the destinations D, i.e., use the implicit orientation given in (3.12). It can be checked that in the resulting instance the number of cache nodes cannot be reduced by sending additional flow on edges other than the ones which form the tree in the solution of the TSCPP. Thus, the resulting R is the same, and the FSCPP is NP-hard in this case. □

Figure 3-5: Part of the transformation used for the FSCPP.

Next we prove a slightly modified theorem for the undirected version. To do this we need the following variant of SAT: 3SAT(5): Given an instance of SATISFIABILITY with at most three literals per clause and such that each variable appears in at most five clauses, is there a truth assignment that makes all clauses true? The 3SAT(5) problem is well known to be NP-complete (Garey and Johnson, 1979).

Theorem 6 The FSCPP is NP-hard if the instance graph is undirected.

Proof: When the instance of the FSCPP is undirected, the only thing that can go wrong is that some of the destinations T'_i, T''_i, or T'''_i are satisfied by flow coming from nodes C_j connected to their respective x_i, x̄_i nodes. What we need to do to prevent this is to bound the number of occurrences of each variable and add enough absorbing destinations to the subgraph corresponding to that variable. We do this by reduction from 3SAT(5). The reduction is essentially the same as the reduction from SAT to TSCPP, but now for each variable x_i we have nodes x_i, x̄_i, T'_i, T''_i, and T^k_i, for 1 ≤ k ≤ 6 (see Figure 3-5). Also, for each variable x_i we have edges (s, x_i), (s, x̄_i), (x_i, T'_i), (x̄_i, T''_i), (x_i, T^k_i), and (x̄_i, T^k_i), for 1 ≤ k ≤ 6.
We claim that in this case, for each pair of nodes x_i, x̄_i, one of them must be a cache node (which says that the corresponding variable in 3SAT(5) is true or false). This is true because, of the eight destinations not corresponding to clauses (T'_i, T''_i, and T^k_i, 1 ≤ k ≤ 6) attached to x_i, x̄_i, two can be directly satisfied from s without caches. However, the remaining six cannot be satisfied from nodes C_j linked to the current variable nodes, because there are at most five such nodes. Thus, we must have one cache node at x_i or x̄_i, for each variable x_i. It is clear that these are the only cache nodes needed to satisfy all destinations. This gives us the correct truth assignment for the original 3SAT(5) instance. Conversely, any non-satisfiable formula will transform to a FSCPP instance which needs more than n cache nodes to satisfy all destinations. Thus, the decision version of the FSCPP is NP-complete, and this implies the theorem. □

Note that there is a case of TSCPP-D that is solvable in polynomial time, namely when k = 0, i.e., determining whether any cache node is needed at all. The solution is given by the following algorithm. Run the maximum flow algorithm from node s to all nodes in D. This can be accomplished, for example, by creating a dummy destination node d and linking all nodes v ∈ D to d by arcs with capacity equal to 1. If the maximum flow from s reaches each node in D, then the answer is true, since no cache node is needed to satisfy the destinations. Otherwise, the answer must be false, because then at least one cache node is needed to satisfy all nodes in D.

3.4 Concluding Remarks

In this chapter we presented and analyzed two combinatorial optimization problems, the tree cache placement problem (TSCPP) and its flow-based, generalized version, the flow cache placement problem (FSCPP). We proved that both problems, on directed and undirected graphs, are NP-hard. For this purpose, we used a transformation from the SATISFIABILITY problem.
Many questions remain open for these problems. For example, it would be interesting to find algorithms with a better approximation guarantee, or improved non-approximability results. Some of these issues will be considered in the next chapters.

CHAPTER 4 COMPLEXITY OF APPROXIMATION FOR STREAMING CACHE PLACEMENT PROBLEMS

As shown in the previous chapter, the SCPP in its two forms is NP-hard. We improve the hardness results for the SCPP by showing that it is very difficult to give approximate solutions for such problems. General non-approximability is proved using the reduction from SATISFIABILITY given in the previous chapter. Then, we improve the approximation results for the FSCPP using a reduction from SET COVER. In particular, given k destinations, we show that the FSCPP cannot have a (log log k − δ)-approximation algorithm, for a very small δ, unless NP can be solved in subexponential time.

4.1 Introduction

We continue in this chapter the study of the streaming cache placement problem (SCPP). In the SCPP one wants to determine the minimum number of multicast routers needed to deliver content to a specified number of destinations, subject to predetermined link capacities. The SCPP is known to be NP-hard, as shown in Chapter 3. We give approximation results for the SCPP in its different versions, using properties of the SATISFIABILITY problem. We use the transformation described in the previous chapter to achieve this non-approximability result. We show that there is a fixed ε > 1 such that no SCPP problem can be approximated in polynomial time with guarantee better than ε. This is equivalent to saying that the SCPP is in the MAX SNP-hard class (Papadimitriou and Yannakakis, 1991). We are also able to improve the approximation results for the FSCPP, using a reduction from SET COVER. In this case, we are interested in general flows and directed arcs.
In particular, given k destinations, we show that the FSCPP cannot have a (log log k − δ)-approximation algorithm, for a very small δ, unless NP can be solved in subexponential time. This chapter is organized as follows. In Section 4.2 we discuss the non-approximability result for the TSCPP and FSCPP based on the SATISFIABILITY problem. Then, in Section 4.3, we discuss the improved result for the FSCPP based on SET COVER. Section 4.4 gives some concluding remarks.

4.2 Non-approximability

The transformation used in Theorem 2 provides a method for proving a non-approximability result for the TSCPP and FSCPP. We employ standard techniques, based on gap-preserving transformations. To do this we use an optimization version of 3SAT(5). MAX-3SAT(5): Given an instance of 3SAT(5), find the maximum number of clauses that can be satisfied by any truth assignment.

Definition 2 For any ε, 0 < ε < 1, an approximation algorithm with guarantee ε (or, equivalently, an ε-approximation algorithm) for a maximization problem Π is an algorithm A such that, for any instance I ∈ Π, the resulting cost A(I) of A applied to instance I satisfies ε·OPT(I) ≤ A(I), where we denote by OPT(I) the cost of the optimum solution. For minimization problems, A(I) must satisfy A(I) ≤ ε·OPT(I), for any fixed ε > 1.

The following theorem from Arora and Lund (1996) is very useful to prove hardness of approximation results.

Theorem 7 (Arora and Lund (1996)) There is a polynomial time reduction from SAT to MAX-3SAT(5) which transforms any formula φ into a formula φ' such that, for some fixed ε (ε is in fact determined in the proof of the theorem), if φ is satisfiable, then OPT(φ') = m, and if φ is not satisfiable, then OPT(φ') < (1 − ε)m, where m is the number of clauses in φ'.

In the following theorem we use this fact to show a non-approximability result for the TSCPP.

Theorem 8 The transformation used in the proof of Theorem 2 is a gap-preserving transformation from MAX-3SAT(5) to TSCPP.
In other words, given an instance φ of MAX-3SAT(5) with m clauses and n variables, we can find an instance I of TSCPP such that: if OPT(φ) = m then OPT(I) = n; and if OPT(φ) < (1 − ε)m then OPT(I) > (1 + ε₁)n, where ε is given in Theorem 7 and ε₁ = ε/15.

Proof: Suppose that φ is an instance of MAX-3SAT(5). Then, we can use the transformation given in the proof of Theorem 2 to construct a corresponding instance I of TSCPP. If φ has a solution with OPT(φ) = m, where m is the number of clauses, then by Theorem 2 we can find a solution for I such that OPT(I) = n. Now, if OPT(φ) < (1 − ε)m, then there are at least εm unsatisfied clauses. In the corresponding instance I we have at least n cache nodes, due to the constraints from nodes T'_i, T''_i and T'''_i, 1 ≤ i ≤ n. These cache nodes satisfy at most (1 − ε)m destinations corresponding to clauses. Let U be the set of unsatisfied destinations. The nodes in U can be satisfied by setting one extra cache (for a total of two, among the nodes x_j and x̄_j) for at least one variable x_j appearing in the clause corresponding to c_i, for each c_i ∈ U. Thus, the number of extra cache nodes needed to satisfy U is at least |U|/5, since a variable can appear in at most 5 clauses. We have OPT(I) ≥ n + |U|/5 ≥ n + εm/5 ≥ (1 + ε/15)n. The last inequality follows from the trivial bound m ≥ n/3. The theorem follows by setting ε₁ = ε/15. □

Definition 3 A PTAS (Polynomial Time Approximation Scheme) for a minimization problem Π is an algorithm that, for each ε > 0 and instance I ∈ Π, returns a solution A(I) such that A(I) ≤ (1 + ε)·OPT(I), and A has running time polynomial in the size of I, depending on ε (see, e.g., Papadimitriou and Steiglitz (1982), page 425).

Corollary 9 Unless P = NP, the TSCPP cannot be approximated by (1 + ε₂) for any ε₂ < ε₁, where ε₁ is given in Theorem 8, and therefore there is no polynomial time approximation scheme (PTAS) for the TSCPP.
Proof: Given an instance φ of SAT, we can use the transformation given in Theorem 7, coupled with the transformation given in the proof of Theorem 2, to give a new polynomial transformation T from SAT to TSCPP. Now, let I be the instance created by T on input φ. Suppose there is an ε₂-approximation algorithm A for TSCPP, with 0 < ε₂ < ε₁. Then, when A runs on an instance I constructed by T from a satisfiable formula φ, the result must have cost A(I) ≤ (1 + ε₂)n < (1 + ε₁)n. Otherwise, if φ is not satisfiable, then the result given by this algorithm must be greater than (1 + ε₁)n, because of the gap introduced by T. Thus, if there is an ε₂-approximation algorithm, then we can decide in polynomial time whether a formula φ is satisfiable. Assuming P ≠ NP, there is no such algorithm. The fact that there is no PTAS for the TSCPP is a consequence of this result and the definition of PTAS. □

The above theorem and corollary can be easily extended to the FSCPP. The fact that the same transformation can be used for both problems can be used to demonstrate the non-approximability result for the FSCPP as well. We state this as a corollary.

Corollary 10 Unless P = NP, the FSCPP has no PTAS.

Proof: The transformation from SAT to FSCPP is identical, so Theorem 8 is also valid for the FSCPP. This implies that the FSCPP has no PTAS, unless P = NP. □

4.3 Improved Hardness Result for FSCPP

In this section, we are interested in the case of general flows and directed arcs. This version of the problem is called the flow streaming cache placement problem (FSCPP). In particular, given k destinations, we show that the FSCPP cannot have a (log log k − δ)-approximation algorithm, for a very small δ, unless NP can be solved in subexponential time. We have shown above that, given an instance of the FSCPP, there is an ε > 0 such that the FSCPP cannot be approximated by 1 + ε, thus demonstrating that the FSCPP is MAX SNP-hard (Papadimitriou and Yannakakis, 1991) and discarding the possibility of a PTAS.
We show a stronger result: there is no approximation algorithm that can give a performance guarantee better than log log k, where k is the number of destinations. The proof is based on a reduction from the SET COVER problem. SET COVER: Given a ground set T = {t_1, ..., t_n}, with subsets S_1, ..., S_m ⊆ T, find a minimum cardinality set C ⊆ {1, ..., m} such that ⋃_{i∈C} S_i = T. It is known (Feige, 1998) that SET COVER does not have approximation algorithms with any guarantee better than O(log n). Thus, if we find a transformation from SET COVER to FSCPP that preserves approximation, we can prove a similar result for the FSCPP. We show how this transformation, which will be represented by φ: SC → FSCPP, can be done. For each instance I_SC of SET COVER, we must find a corresponding instance I_FSCPP of the FSCPP. The instance I_SC is composed of the sets T and S_1, ..., S_m, as shown above. The transformation consists of defining a capacitated graph G with a source and a set D of destinations. Let G be the graph composed of the following nodes:

V = {s} ∪ {w_1, ..., w_m} ∪ {v_1, ..., v_n} ∪ {s_1, ..., s_m}.

Also, let the edges E of the graph G be

E = {(w_i, v_j) : t_j ∈ S_i} ∪ ⋃_{i=1}^{m} {(s, w_i)} ∪ ⋃_{i=1}^{m} {(w_i, s_i)}.

In the instance of the FSCPP, the set of destination nodes D is given by D = {v_1, ..., v_n} ∪ {s_1, ..., s_m}, and s is the source node. Thus, there is a one-to-one correspondence between nodes w_i and sets S_i, for 1 ≤ i ≤ m. There is also a one-to-one correspondence between nodes v_j and ground elements t_j ∈ T, for 1 ≤ j ≤ n. There is a directed edge between the source and each node w_i, and between each node w_i and the nodes representing elements appearing in the set S_i. Each node w_i is also linked to its node s_i. Finally, each edge e has capacity c_e = 1. See an example of such a reduction in Figure 4-1. The ground set in this example is T = {t_1, ..., t_6}, and the subsets are S_1 = {t_1, t_2, t_4, t_5}, S_2 = {t_1, t_2, t_4, t_6}, and S_3 = {t_2, t_4, t_6}.

Theorem 11 The transformation described above is a polynomial time reduction from SET COVER to FSCPP.
Proof: Let I_SC be the instance of SET COVER and I_FSCPP the corresponding instance of the FSCPP. It is clear that the transformation is polynomial, since the number of edges and nodes is given by a constant multiple of the number of elements and sets in the instance of SET COVER.

Figure 4-1: Example of the transformation of Theorem 11.

We must prove that I_SC and I_FSCPP have equivalent optimal solutions. Let S' be an optimal solution for I_SC. First, we note that the destination nodes s_i, 1 ≤ i ≤ m, can be reached only from the nodes w_i, and therefore each s_i must be satisfied with flow coming from w_i. Thus, each node s_i saturates the arc entering the corresponding w_i, which means that to satisfy any other node from w_i we must make it a cache node. Then, we can clearly make R = {w_i : i ∈ S'}, and serve all remaining destinations v_1, ..., v_n, by definition of S'. Each node in R will be a cache node, and therefore R is a solution for I_FSCPP. This solution must be optimal, because otherwise we could use a smaller solution R' to construct a corresponding set S'' ⊆ {1, ..., m} with |S''| < |S'|, covering all elements of T, and therefore contradicting the fact that S' is an optimal solution for the SET COVER instance. Thus, the two instances I_SC and I_FSCPP have equivalent optimal solutions. □

Corollary 12 Given an instance I of SET COVER, and the transformation φ described above, we have OPT(I) = OPT(φ(I)).

The following theorem, proved by Feige (1998), will be useful for our main result.

Theorem 13 (Feige (1998)) If there is some ε > 0 such that a polynomial time algorithm can approximate SET COVER within (1 − ε) log n, then NP ⊆ TIME(n^{O(log log n)}).

This theorem implies that finding approximate solutions with guarantee better than (1 − ε) log n for SET COVER is equivalent to solving any problem in NP in subexponential time. It is strongly believed that this is not the case. We use this theorem and the reduction above to give a related bound for the approximation of the FSCPP.
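As an aside, the instance construction of Theorem 11 is simple enough to sketch in code. The following Python function builds I_FSCPP from a SET COVER instance; the function and node names are illustrative choices, not part of the original development:

```python
def build_fscpp_instance(n, sets):
    """Sketch of the reduction of Theorem 11: build the FSCPP instance
    (nodes, unit-capacity arcs, source, destinations) from a SET COVER
    instance with ground set {0, ..., n-1} and a list of subsets."""
    m = len(sets)
    source = "s"
    w = [f"w{i}" for i in range(m)]      # one node w_i per set S_i
    v = [f"v{j}" for j in range(n)]      # one node v_j per ground element t_j
    t = [f"s{i}" for i in range(m)]      # one saturating destination s_i per set
    nodes = [source] + w + v + t
    arcs = {}                            # arc -> capacity (all capacities are 1)
    for i in range(m):
        arcs[(source, w[i])] = 1         # the source feeds each w_i
        arcs[(w[i], t[i])] = 1           # s_i can only be fed through w_i
        for j in sets[i]:
            arcs[(w[i], v[j])] = 1       # w_i can serve the elements of S_i
    destinations = set(v) | set(t)
    return nodes, arcs, source, destinations
```

For the example of Figure 4-1 (T = {t_1, ..., t_6} and the three subsets listed above), this produces 13 nodes and 17 arcs, all with capacity one.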
To do this, we need a gap-preserving transformation from SET COVER to FSCPP, as stated in the following lemma.

Lemma 14 If I is an instance of SET COVER, then the transformation φ from SC to FSCPP described above is gap preserving, that is, it has the following properties: (a) if OPT(I) = k then OPT(φ(I)) = k; and (b) if OPT(I) > k log n then OPT(φ(I)) > k log log |D| − δ, where k is a fixed value, depending on the instance, and δ = −k log(1 − log(1 + n/2^n)/log |D|) → 0 for large n.

Proof: Part (a) is a simple consequence of Corollary 12. Now, for part (b), note that the maximum number of sets in an instance of SET COVER with n elements is 2^n. Consequently, in the instance of FSCPP created by transformation φ, |D| = m + n ≤ 2^n + n. Thus, we have log |D| ≤ log(2^n + n) = n + δ', where δ' = log(1 + n/2^n). This implies that log n ≥ log(log |D| − δ') = log log |D| + δ'', where δ'' = log(1 − δ'/log |D|). Therefore, OPT(φ(I)) > k log n ≥ k log log |D| − δ, where δ = −kδ'' (note that δ is a positive quantity). Finally, note that the quantity δ = −k log(1 − log(1 + n/2^n)/log |D|) goes to zero very fast in comparison to n; thus the value log log |D| is asymptotically optimal. □

The reduction shown in Theorem 11 is gap preserving, since it maintains an approximation gap introduced by the instances of SET COVER. Note, however, that the name "gap preserving" is misleading in this case, since the new transformation has a smaller gap than the original. Finally, we get the following result.

Theorem 15 If there is some ε > 0 such that a polynomial time algorithm A can approximate the FSCPP within (1 − ε) log log k, where k = |D|, then NP ⊆ TIME(n^{O(log log n)}).

Proof: Suppose that an instance I of SET COVER is given. The transformation φ described above can be used to find an instance φ(I) of the FSCPP. Then, A can be used to solve the problem for instance φ(I). According to Lemma 14, transformation φ reduces any gap of log n to log log k. Thus, with such an algorithm one could differentiate between instances I with a gap of log n.
But this is not possible in polynomial time, according to (Feige, 1998, Theorem 10), unless NP ⊆ TIME(n^{O(log log n)}). □

4.4 Concluding Remarks

The SCPP is a difficult combinatorial optimization problem occurring in multicast networks. We have shown that the SCPP in general cannot have approximation algorithms with guarantee better than ε, for some ε > 1. Thus, differently from other optimization problems (such as the connected dominating set in Chapter 7), the SCPP cannot have a polynomial time approximation scheme (PTAS). We have also proved that the FSCPP cannot be approximated within less than log log k, where k is the number of destinations, unless NP can be solved in subexponential time. This shows that it is very difficult to find near-optimal results for general instances of the FSCPP.

CHAPTER 5 ALGORITHMS FOR STREAMING CACHE PLACEMENT PROBLEMS

The results of the preceding chapter show that the SCPP is very difficult to solve, even if only approximate solutions are required. We describe some approximation algorithms that can be used to give solutions to the problem, and decrease the gap between known solutions and non-approximability results. We also consider practical heuristics to find good near-optimal solutions to the problem. We propose two general types of heuristics, based on complementary techniques, which can be used to give good starting solutions for the SCPP.

5.1 Introduction

In this chapter, we propose algorithms for the solution of SCPP problems. Initially, we discuss algorithms with a performance guarantee, also known as approximation algorithms. We give a general algorithm for SCPP problems, and also a better algorithm based on flow techniques. Approximation algorithms are very interesting as a way of understanding the complexity of the problem, but, especially in this case, due to the negative results shown in Chapter 4, they are not very practical.
Thus, considering the complexity issues, we propose polynomial time construction algorithms for the SCPP, based on two general techniques: adding destinations to a partial solution, and reducing the number of infeasible nodes in an initial solution. We report the results of computational experiments based on these two algorithms and their variations. This chapter is organized as follows. In Section 5.2 we present algorithms with a performance guarantee for the SCPP. In Section 5.3, we turn to algorithms without a performance guarantee, and discuss a number of possible construction strategies. Then, in Section 5.4 we proceed to an empirical evaluation of the solutions returned by the proposed construction heuristics. Final remarks and future research directions are discussed in Section 5.5.

5.2 Approximation Algorithms for SCPP

In this section, we present algorithms for the TSCPP and FSCPP and analyze their approximation guarantees. To simplify our results, we use the notation A(I) = |R ∪ {s}|, where R is the set of cache nodes found by algorithm A applied to instance I. Also, OPT(I) = |R* ∪ {s}|, where R* is an optimal set of cache nodes for instance I. Note that A(I) ≥ 1 and OPT(I) ≥ 1, which makes Definition 2 valid for our problems.

5.2.1 A Simple Algorithm for TSCPP

It is easy to construct a simple approximation algorithm for any instance of the TSCPP. We denote by δ_G(v) the degree of node v in the graph G.

Input: graph G, destinations D, source s
Output: a set R of cache nodes
STEP 1: Construct a spanning tree T of G.
STEP 2: Remove recursively all leaves of T which are not in D ∪ {s}.
STEP 3: Let S1 be the set of internal nodes v with δ_T(v) > 2.
STEP 4: Let S2 be the set of internal nodes v with δ_T(v) = 2 and v ∈ D.
STEP 5: Return R = S1 ∪ S2.
Algorithm 5: Spanning tree algorithm.

Note that steps 3 and 4 of Algorithm 5 represent a worst case for Algorithm 4. The correctness of the algorithm is shown in the next lemma.
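Before proving correctness, a direct transcription of Algorithm 5 in Python may help fix ideas. It assumes a connected graph given as an adjacency dict, uses breadth-first search for STEP 1, and never counts the source as a cache (since R ⊆ V \ {s}); this is a sketch, not the implementation used in the experiments:

```python
from collections import deque

def spanning_tree_caches(adj, s, D):
    """Sketch of Algorithm 5. adj maps each node to its neighbors in a
    connected graph G; s is the source; D is the set of destinations.
    Returns the cache set R = S1 | S2."""
    # STEP 1: construct a spanning tree T of G (here, by breadth-first search).
    tree = {u: set() for u in adj}
    seen, queue = {s}, deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                tree[u].add(w)
                tree[w].add(u)
                queue.append(w)
    # STEP 2: recursively remove leaves of T that are not in D U {s}.
    removable = deque(u for u in tree if len(tree[u]) == 1 and u not in D and u != s)
    while removable:
        u = removable.popleft()
        if u not in tree or len(tree[u]) != 1:
            continue                      # already removed, or no longer a leaf
        (p,) = tree[u]
        tree[p].discard(u)
        del tree[u]
        if len(tree[p]) == 1 and p not in D and p != s:
            removable.append(p)           # the parent became a removable leaf
    # STEPS 3-5: branching internal nodes, plus degree-2 internal destinations.
    S1 = {u for u in tree if u != s and len(tree[u]) > 2}
    S2 = {u for u in tree if u != s and len(tree[u]) == 2 and u in D}
    return S1 | S2
```

On the graph of Figure 3-2, with destinations a and b, this places a single cache at the branching node r, matching the discussion in Chapter 3.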
Lemma 16 Algorithm 5 returns a feasible solution to the TSCPP.

Proof: The operation in step 2 maintains feasibility, since leaves cannot be used to reach destinations. The result R includes all internal nodes v with δ_T(v) > 2, and all internal nodes v with δ_T(v) = 2 and v ∈ D. It suffices to prove that if δ_T(v) = 2 and v ∉ D, then v is not needed in R. Suppose that v is an internal node with δ_T(v) = 2 and v ∉ D. If the number of destinations down the tree from v is equal to 1, then v does not need to be a cache. Now, assume that the number of destinations down the tree from v is two or more. Then, there are two cases. In the first case, there is a node w between v and the destinations with δ_T(w) > 2. In this case, w is in R, and we need just to send one unit of flow from v to w; thus v does not need to be in R. In the second case, there must be some destination w with δ_T(w) = 2 between v and the other destinations. Again, in this case w will be included in R through S2. Thus v does not need to be in R. This shows that R is a feasible solution to the TSCPP. □

Lemma 17 Algorithm 5 gives an approximation guarantee of |D|.

Proof: Let us partition the set of destinations into D1 and D2, where D1 = D \ S2 and D2 = S2. Denote by D' the set of destinations which are leaves in T. Initially, note that for any tree the number of nodes v with degree δ(v) > 2 is at most |L| − 2, where L is the set of nodes with δ(v) = 1 (the leaves). But L in this case is D' ∪ {s} ⊆ D1 ∪ {s}. This implies that |S1| ≤ |D1 ∪ {s}| − 2. Thus, |R| = |S1 ∪ S2| ≤ |D1 ∪ {s}| − 2 + |D2| = |D| − 1, and A(I) = |R| + 1 ≤ |D| ≤ |D|·OPT(I), since OPT(I) ≥ 1. □

Let Δ = Δ(G) be the maximum degree of G. In the case in which all capacities c_e, for e ∈ E(G), are equal to one, we can give a better analysis of the previous algorithm, with improved performance.

Theorem 18 When c_e = 1 for all e ∈ E, Algorithm 5 is a k-approximation algorithm, where k = min{Δ(G), |D|}.
Proof: The key idea is to note that if c_e = 1 for all e ∈ E, then ⌈|D|/Δ⌉ ≤ OPT(I), for any instance I of the TSCPP. This happens because each cache node (as well as the source) can serve at most Δ destinations. Let A(I) be the value returned by Algorithm 5 on instance I. We know from the previous analysis of Lemma 17 that A(I) ≤ |D|. Thus A(I) ≤ Δ OPT(I). The theorem follows, since we know that this is also a |D|-approximation algorithm. □ 5.2.2 A Flow-based Algorithm for FSCPP In this section, we present an approximation algorithm for the FSCPP. The algorithm is based on the idea of sending flow from the source to destination nodes. We show that this algorithm performs at least as well as the previous algorithm for the TSCPP. In addition, we show that for a special class of graphs this algorithm gives essentially the optimum solution. Therefore, for this class of graphs the FSCPP is solvable in polynomial time. We now give standard definitions about network flows. For more details about the subject, see Ahuja et al. (1993). Let f(x, y) ∈ R+ be the amount of flow sent on edge (x, y), for (x, y) ∈ E. A flow is called a feasible flow if it satisfies the flow conservation constraints (3.2). Let F(f, s, t) = Σ_{(s,v)∈E} f(s, v) be the total flow sent from node s. We assume that s can send at most Σ_{(s,v)∈E} c_{sv} units of flow, and t can receive at most Σ_{(u,t)∈E} c_{ut} units of flow. A feasible flow f is a maximum flow from s to t if there is no feasible flow f' such that F(f', s, t) > F(f, s, t). A node v is a reached node from s by flow f if Σ_{(w,v)∈E} f(w, v) > 0. It is well known that when f is a maximum flow, then F(f, s, t) = C(s, t), where C(s, t) represents the minimum capacity of any set of edges separating s from t in G (the minimum cut). We also use the notation C(U, Ū) to denote the total capacity of edges linking nodes in U to nodes in Ū, where U ⊂ V and Ū = V \ U. Denote any feasible flow starting from node v by f_v. 
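The flow notions above can be made concrete with a standard shortest-augmenting-path computation (the Edmonds-Karp variant of augmenting-path max flow). The sketch below is our own illustration, using a dict-of-dicts of residual capacities; by the max-flow/min-cut relation quoted above, the returned value F(f, s, t) equals C(s, t):

```python
from collections import deque

def max_flow(cap, s, t):
    """Edmonds-Karp sketch: repeatedly augment along a shortest augmenting
    path in the residual graph. `cap` maps u -> {v: residual capacity} and
    is modified in place; returns F(f, s, t) = C(s, t)."""
    value = 0
    while True:
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:          # BFS for an augmenting path
            u = q.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                   # no augmenting path: flow is maximum
            return value
        path, v = [], t
        while parent[v] is not None:          # recover the path edges
            path.append((parent[v], v))
            v = parent[v]
        b = min(cap[u][v] for u, v in path)   # bottleneck capacity
        for u, v in path:                     # augment and update residuals
            cap[u][v] -= b
            cap[v][u] = cap[v].get(u, 0) + b
        value += b

# Small example: the cut around s has capacity 2 + 1 = 3, and the flow matches it.
cap = {'s': {'a': 2, 'b': 1}, 'a': {'t': 1, 'b': 1},
       'b': {'t': 2}, 't': {}}
assert max_flow(cap, 's', 't') == 3
```

After the call, the reached nodes in the sense defined above are exactly the nodes with positive incoming flow in the committed augmenting paths.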
The algorithm works by finding the maximum flow from s to all nodes in D. If the value of this maximum flow is F(f, s, D) ≥ |D|, then the problem is solved, since all destinations can be reached without cache nodes. Otherwise, we put all nodes reachable from s in a set Q. Then we repeat the following steps until D \ Q is empty: for all nodes v ∈ Q, compute the maximum flow f_v from v to D. Find the node v* such that f_{v*} is maximum. Then add v* to R, and add to Q all nodes reached by f_{v*}. Also, reduce the capacity of the edges in E by the amount of flow used by f_{v*}. These steps are described more formally in Algorithm 6. Q ← {s} while D \ Q ≠ ∅ do forall v ∈ Q do find the maximum flow f_v from v to D \ Q end Let v* be the node such that F(f_{v*}, v*, D) is maximum R ← R ∪ {v*} Add to Q the nodes reached by f_{v*} for each edge (u, v) ∈ E do Reduce capacity c_{u,v} by f_{v*}(u, v) end end Algorithm 6: Flow algorithm. In the following theorem, we show that the running time of this algorithm is polynomial and depends on the time needed to find the maximum flow. Denote by C_M(G) the maximum value of the minimum cut between any pair of nodes v, w ∈ V(G), i.e., C_M(G) = max_{v,w∈V} C(v, w). Similarly, we define C_m(G) = min_{v,w∈V} C(v, w). Theorem 19 Algorithm 6 has running time equal to O(n |D| T_mf / C_m(G)), where T_mf is the time needed to run the maximum flow algorithm. Proof: The most costly operations in this algorithm are calls to the maximum flow algorithm, therefore we count the number of such calls. Note that, at each of the N iterations of the while loop, a new element is added to the set of cache nodes. Thus, A(I) is equal to the number of such iterations. Let v_i be the node added to the set of cache nodes at iteration i, and Q^i be the content of set Q at iteration i of Algorithm 6. 
At each step, the number of elements of D found by the algorithm is, according to the max-flow/min-cut theorem, equal to the minimum cut from v_i to the remaining nodes in D \ Q^i (recall that all demands are unitary). Then, |D| = Σ_{i=1}^{N} C(v_i, D \ Q^i) ≥ Σ_{i=1}^{N} min_{w∈D} C(v_i, w) ≥ A(I) min_{v,w∈V} C(v, w). Thus we have A(I) ≤ |D| / C_m(G). (5.1) At each iteration of the while loop, the number of calls to the maximum flow algorithm is at most n. The total number n_c of such calls is given by n_c ≤ n|D|/C_m(G). Thus the running time of Algorithm 6 is O(n |D| T_mf / C_m(G)). □ Based on the performance analysis just shown, the following theorem gives an approximation guarantee for Algorithm 6. Theorem 20 Algorithm 6 is a k-approximation algorithm, where k = C_M(G)/C_m(G). Proof: If we denote by R the set of cache nodes in the optimal solution, we have |D| ≤ Σ_{v∈R∪{s}} C(v, D) ≤ |R ∪ {s}| max_{v,w∈V} C(v, w) = OPT(I) C_M(G). (5.2) Combining inequalities (5.1) and (5.2) results in A(I) ≤ (C_M(G)/C_m(G)) OPT(I). □ Note that the quantity C_M(G)/C_m(G) can become large. However, for some types of graphs, the preceding algorithm gives us a better understanding of the problem. For example, if the graph has maximum degree Δ(G) and fixed capacity 1, then it is easy to see that C_M(G)/C_m(G) ≤ Δ(G)/1 = Δ(G). If the edge capacity is not fixed, then C_M(G)/C_m(G) becomes at most Δ(G) c_M/c_m, where c_M represents the maximum and c_m the smallest capacity of edges in G. 5.3 Construction Algorithms for the SCPP In this section, we provide construction algorithms for the SCPP that give good results for a class of problem instances. The algorithms are based on two dual methods for constructing solutions. In the first heuristic, the method used consists of sequentially adding destinations to the current flow, until all destinations are satisfied. The second method uses the idea of turning an initial infeasible solution into a feasible one, by adding cache nodes to the existing infeasible flow. 
The general method used can be summarized as follows: at each step, a subset of the resulting solution is selected, until a complete solution is found. To select elements of the solution, an ordering function is employed, in such a way that the parts of the solution which seem most promising in terms of objective function are added first. In the remainder of this section we describe the specific techniques proposed to create solutions for the SCPP. 5.3.1 Connecting Destinations The first method we propose to construct solutions for the SCPP is based on adding destinations sequentially. The algorithm uses the fact that each feasible solution for the SCPP can be described as the union of paths from the source s to the set D of destinations. More formally, let D = {d_1, ..., d_k} be the set of destinations. Then, a feasible flow for the SCPP is the union of a set of paths P_1 ∪ ... ∪ P_k, where P_i = (s, x^i_1, ..., x^i_{j_i−1}, x^i_{j_i}) and x^i_{j_i} = d_i, for i ∈ {1, ..., k}. In the proposed algorithm, we try to construct a solution that is the union of such paths. It is assumed initially that no destination has been connected to the source, and A is the set of non-connected destinations, i.e., A = D. Also, the set of cache nodes R is initially empty. During the algorithm's execution, S represents the current subgraph of G connected to the source s. At each step of the algorithm, a path is created linking S to one of the destinations d ∈ A. First, the algorithm tries to find a path P from d to one of the nodes in R ∪ {s}. If this is not possible (represented by saying that P = nil), then the algorithm tries to find a path P from d to some node in the connected subgraph S. Figure 5-1: Sample execution for Algorithm 7. In this graph, all capacities are equal to 1. Destination d2 is being added to the partial solution, and node 1 must be added to R. 
Let w be the connection point between P and S. Then w is added to the set R of caches, since this is necessary to make the solution feasible. Finally, the residual flow available in the graph is updated, to account for the capacity used by P. The algorithm finishes when all destinations are included, and R is returned as the solution for the problem. The formal steps of the proposed procedure are described in Algorithm 7. Input: graph G, set D of destinations A ← D S ← ∅ /* current flow */ R ← ∅ /* set of cache nodes */ while A ≠ ∅ do v ← get_dest(A) /* choose a destination */ A ← A \ {v} P ← get_path(v, R, G) if P = nil then P ← get_path(v, S, G) Let w be the node connecting P to S R ← R ∪ {w} end Remove from G the capacity used in P S ← S ∪ P end return R Algorithm 7: First construction algorithm for the SCPP. A number of important decisions, however, are left unspecified in the description of the algorithm given so far. For example, there are many possible methods that can be used to select the next destination added to the current partial solution. Also, a path to a destination v can be found using diverse algorithms, which can result in different selections for a required cache node. These possible variations in the algorithm are represented by two functions, get_path(v, S) and get_dest(A). Thus, by changing the definition of these functions we can achieve different implementations. The first feature that can be changed, by defining a function get_dest : 2^V → V, is the order in which destinations are added to the final solution. Among the possible variations, we list the following, which seem most useful: give precedence to destinations closer to the source; give precedence to destinations farther from the source; give precedence to destinations in lexicographic order. The second basic decision to be made is, once a destination v is selected, what type of path will be used to join v to the rest of the graph. 
This decision is incorporated by the function get_path. A second parameter involved in the definition of get_path is the specific node w ∈ S which will be connected to v. Note that such a node always exists, since at each step of the construction there is a path from each destination to at least one node already reached by flow. Using a greedy strategy, the best way to link a new destination d is through a path from d to some node v ∈ R ∪ {s}. However, it may not be possible to find such a v, and this requires the addition of a new node to R. In both situations it is not clear what is the optimal node to be linked to the current destination. Thus, another important decision in Algorithm 7 concerns how to choose, at each step, the node to be linked to the current destination. Shortest Path Policy. Perhaps the simplest and most logical solution to the above questions is to link destination nodes using shortest paths. This policy is useful, since it can be applied to answer both questions raised above: the path is created as a shortest path, and the node v ∈ R ∪ {s} selected is the closest to d. Thus, function get_path(v, R, G) in Algorithm 7 becomes: select the node v ∈ R ∪ {s} such that dist(v, d) (the shortest path distance between v and d) is minimum; find the shortest path from d to v; and add this path to the current solution. If there is no path from d to R ∪ {s}, then let v be the node closest to d among the nodes reached by flow from s, and add v to R. Other Policies. We have tested two other methods of connecting sources to destinations. In the first method, destinations are connected through a path found using the depth first search algorithm. In this implementation, paths are followed until a node already in R, or connected to some node in R, is found. In the latter case, the connection node must be added to R. The second method employs random paths starting from the destination nodes. 
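The shortest path policy can be sketched as follows (a minimal Python sketch with unit edge lengths and symmetric, undirected capacities; the helper names and the graph representation are our own assumptions):

```python
from collections import deque

def _bfs_to_targets(cap, d, targets):
    """Shortest path (fewest edges) from d to the nearest node in `targets`,
    using only edges with positive residual capacity; None if unreachable."""
    parent, q = {d: None}, deque([d])
    while q:
        u = q.popleft()
        if u in targets:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]              # [d, ..., w] with w in targets
        for v, c in cap[u].items():
            if c > 0 and v not in parent:
                parent[v] = u
                q.append(v)
    return None

def connect_destinations(cap, s, D):
    """Sketch of Algorithm 7 under the shortest path policy. `cap` maps
    u -> {v: capacity} and is symmetric (undirected edges)."""
    R, S_nodes = set(), {s}
    for d in sorted(D):                    # lexicographic get_dest, for example
        path = _bfs_to_targets(cap, d, R | {s})
        if path is None:                   # no path to R or s: add a cache
            path = _bfs_to_targets(cap, d, S_nodes)
            R.add(path[-1])                # connection point w joins R
        for u, v in zip(path, path[1:]):   # consume one unit of capacity
            cap[u][v] -= 1
            cap[v][u] -= 1
        S_nodes |= set(path)
    return R

# The situation of Figure 5-1: all capacities 1. After d1 uses the s-1 edge,
# d2 can only reach node 1, so node 1 must become a cache.
cap = {'s': {'1': 1}, '1': {'s': 1, 'd1': 1, 'd2': 1},
       'd1': {'1': 1}, 'd2': {'1': 1}}
assert connect_destinations(cap, 's', {'d1', 'd2'}) == {'1'}
```

Swapping `_bfs_to_targets` for a depth first search, or for a random walk over positive-capacity edges, gives the other two policies described above.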
This random method is useful only as a baseline: it indicates how good the previous algorithms are when compared to a random solution. Once the policy used by the constructor is defined, it is not difficult to prove the correctness of the algorithm, as well as to determine its complexity. Theorem 21 Given an instance I of the SCPP, Algorithm 7 returns a correct solution. Proof: At each step, a new destination is linked to the source node; thus, at the end of the algorithm all destinations are connected. The paths determined by such connections are valid, since they use only available capacity (according to the information stored in the residual graph G). Nodes are added to R only when this is required to make a connection possible. Thus, the resulting set R is correct, and corresponds to a valid flow from s to the set D of destinations. □ Theorem 22 Using the closest-node policy for destination selection and the shortest-path policy for path creation, the time complexity of Algorithm 7 is O(|D| n²). Proof: The external loop is executed |D| times. The steps of highest complexity inside the while loop are exactly the ones corresponding to the get_path procedure. As we are proposing to use the shortest path algorithm for the implementation of get_path, its complexity is O(n²) (but this can be improved with more clever implementations of the shortest path algorithm). Other operations have smaller complexity, thus the total complexity of Algorithm 7 is O(|D| n²). □ Other implementations of Algorithm 7 would result in a very similar analysis of complexity. 5.3.2 Adding Caches to a Solution We propose a second general technique for creating feasible solutions for the SCPP. The algorithm consists of adding caches sequentially to an initial infeasible solution, until it becomes feasible. The steps are presented in Algorithm 8. 
At the beginning of the algorithm, the set of cache nodes R is empty, and a possibly infeasible subgraph linking the source to all destinations gives the initial flow. Such an initial infeasible solution can be created easily with any spanning tree algorithm. In the description of our procedure, we define the set I of infeasible nodes to be the nodes v ∈ V \ {s} such that Σ_{(w,v)∈E(G)} c(w, v) − Σ_{(v,w)∈E(G)} c(v, w) < b_v, where b_v is the demand of v, which can be 0 or 1. Input: graph G S ← spanning_tree(G) Remove from G the capacity on edges used by S I ← infeasible nodes in S while I ≠ ∅ (there are infeasible nodes) do v ← select_unfeasible_node() Try to find additional paths to satisfy v if a set P of paths is found then Remove from G the capacity used by P S ← S ∪ P I ← I \ {v} else R ← R ∪ {v} end end return R Algorithm 8: Second construction algorithm for the SCPP. In the while loop of Algorithm 8, the current solution is first checked for feasibility. This verification determines whether there is any node v ∈ V such that the amount of flow leaving the node, plus its demand, is greater than the arriving flow, or in other words, whether I ≠ ∅. The formal description of this verification procedure is given in Algorithm 9. If the solution is found to be infeasible, then it is necessary to improve its feasibility by increasing the number of properly balanced nodes. The correction of infeasible nodes v ∈ I is done in the body of the while loop in Algorithm 8. The procedure consists of selecting a node v from the set of infeasible nodes I in the flow graph, and trying to make it feasible by sending more data from one of the nodes w ∈ R ∪ {s}. If this can be done in such a way that v becomes feasible again, then the algorithm just needs to update the current subgraph S and the set of infeasible nodes I. Figure 5-2: Sample execution for Algorithm 8, on a graph with unitary capacities. Nodes 1 and 2 are infeasible, and therefore are candidates to be included in R. 
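The verification behind Algorithm 9 amounts to a per-node flow-balance check against the demand b_v. A minimal Python sketch (the edge-flow dict representation and the example are our own assumptions):

```python
def infeasible_nodes(flow, nodes, D, s):
    """Per-node balance check: node v != s is infeasible when
    inflow(v) - outflow(v) falls short of its demand b_v, with b_v = 1 for
    destinations and 0 otherwise. `flow` maps directed edges (u, v) to units."""
    infeasible = set()
    for v in nodes:
        if v == s:
            continue                       # the source has no balance constraint
        inflow = sum(f for (u, w), f in flow.items() if w == v)
        outflow = sum(f for (u, w), f in flow.items() if u == v)
        demand = 1 if v in D else 0
        if inflow - outflow < demand:
            infeasible.add(v)
    return infeasible

# One unit enters node 1 but two units leave it, so node 1 is infeasible
# until it either receives an additional path or becomes a cache.
flow = {('s', '1'): 1, ('1', 'd1'): 1, ('1', 'd2'): 1}
assert infeasible_nodes(flow, {'s', '1', 'd1', 'd2'}, {'d1', 'd2'}, 's') == {'1'}
```

Once node 1 is placed in R, its outgoing flow is no longer constrained by its inflow, which is exactly the repair step performed in the else branch of Algorithm 8.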
However, if v cannot receive enough additional data, then it must be added to the list of cache nodes. This clearly makes the node feasible again, since there are then no restrictions on the amount of flow departing from v. After adding v to R, the graph is modified as necessary. Examples of possible modifications are changing the flow required by v to one unit and deleting additional paths leading to v, since only one is necessary to satisfy the flow requirements. We assume that these changes are done randomly, if necessary. This construction technique can be seen as a dual of the algorithm presented in the previous subsection. In Algorithm 7 the assumption is that a partial solution can be incomplete, but it is always feasible with relation to the flow sent from s to the reached destinations. On the other hand, in Algorithm 8 a solution is always complete, in the sense that all destinations are reached; however, it does not represent a feasible flow until the end of the algorithm. Input: current solution S, destination set D forall v ∈ D do b_v ← 1 forall v ∈ V \ D do b_v ← 0 forall v ∈ V do δ ← Σ_{(w,v)∈E(S)} c(w, v) − Σ_{(v,w)∈E(S)} c(v, w) if δ < b_v then I ← I ∪ {v} end end return I /* returns the infeasible nodes */ Algorithm 9: Feasibility test for a candidate solution. Procedure select_unfeasible_node has the objective of finding the most suitable node to be processed in the current iteration. This is the main decision in the implementation of Algorithm 8, and can be made using a greedy function, which finds the best candidate according to some criterion. We propose some possible candidate functions, and determine empirically (in the next section) how these functions perform in practice. Function largest_infeasibility: select the node that has the greatest infeasibility, i.e., the node for which the leaving flow plus demand exceeds the entering flow by the largest amount (breaking ties arbitrarily). This strategy tries to add to the set R a node which can benefit most from being a cache. 
Function closest_from_source: select the infeasible node which is closest to the source. The advantage in this case is that a node v selected by this rule can help to reduce the infeasibility of other nodes farther down the path from s to the destinations. Function uniform_random: select uniformly at random a node v ∈ I to be added to R. This rule is useful for avoiding the biases existing in the previous methods. It also has the advantage of being very simple to compute, and therefore very fast. Theorem 23 Given an instance of the SCPP, Algorithm 8 returns a correct solution. Proof: In the algorithm, the set of infeasible nodes I decreases monotonically. This happens because at each step one infeasible node is selected and turned into a feasible node. Also, feasible nodes cannot become infeasible, since each operation requires that there is enough capacity in the network (this is guaranteed by the use of the residual graph G). Thus, the algorithm terminates. The data flow from the source to the destinations must be valid at the end, by definition of the set I (which must be empty at the end). Similarly, the set R must be valid, since it is used only to make nodes feasible in the case that no additional paths can be found to satisfy their requirements. Thus, the solution returned by Algorithm 8 is correct for the SCPP. □ Theorem 24 The time complexity of Algorithm 8 is O(nmK), where K is the sum of capacities in the SCPP instance. Proof: A spanning tree can be found in O(m log n) time. Then follows the while loop, which performs at most n iterations. The procedure select_unfeasible_node can be implemented in O(n) by the use of a priority queue in each of the proposed implementations. Finding paths to infeasible nodes is clearly the most costly operation in the loop. This can be performed in O(m + n) for each path, using a procedure such as depth first search. 
However, it may be necessary to run this step a number of times proportional to the sum of capacities in the graph (K), which results in O(mK). Other operations have lower complexity, thus the total complexity of Algorithm 8 is O(nmK). □ 5.4 Empirical Evaluation In this section we present computational experiments carried out with the construction algorithms proposed above. All algorithms were implemented using the C programming language (the code is available by request). The resulting program was executed on a PC with 312MB of memory and a 800MHz processor. The GNU gcc compiler (under the Linux operating system) was used, with the optimization flag -O2 enabled.

Table 5-1: Computational results for different variations of Algorithm 7 and Algorithm 8.

  n    m  |         Constructor 1               |    Constructor 2
          |   DFS  Shortest       Random        |    LI      CS      UR
 50   500 |   9.9     2.9    18.8   18.3   18.4 |    3.0     2.8     3.2
 50   800 |  12.9     4.9    29.3   29.6   28.4 |    5.0     4.8     5.2
 60   500 |  35.1     8.8    56.8   56.4   55.5 |    9.6     9.2     9.9
 60   800 |  44.7    11.7    77.6   76.9   76.8 |   12.5    11.7    12.8
 70   500 |  78.9    17.0   111.3  111.2  110.4 |   18.9    18.1    19.6
 70   800 | 102.2    20.3   139.3  140.1  139.7 |   22.5    21.4    23.3
 80   500 | 147.6    27.8   180.7  182.2  181.3 |   32.0    31.1    33.4
 80   800 | 186.1    32.4   218.3  218.8  218.9 |   36.9    36.2    38.9
 90   500 | 239.8    42.1   268.4  267.8  268.5 |   49.5    49.9    52.0
 90   800 | 285.4    47.5   312.6  313.4  313.3 |   56.5    57.4    59.8
100   500 | 344.1    60.3   368.0  369.0  369.1 |   71.3    74.2    75.8
100   800 | 398.6    67.7   420.1  421.6  422.3 |   80.3    84.3    85.4
110   500 | 466.4    84.2   482.6  484.8  485.8 |  100.3   106.1   106.2
110   800 | 525.5    92.5   541.7  545.1  543.7 |  111.9   119.5   118.4
120   500 | 599.7   115.1   611.3  614.9  613.3 |  137.6   146.9   144.9
120   800 | 668.7   125.5   676.9  680.6  679.2 |  151.9   164.0   159.4
130   500 | 748.8   157.4   751.4  755.6  754.9 |  180.3   194.6   189.0
130   800 | 824.7   169.5   823.3  827.7  827.8 |  199.3   215.7   208.5
140   500 | 910.6   213.2   906.2  909.8  908.9 |  233.4   252.4   243.9
140   800 | 995.9   228.2   984.8  990.2  989.3 |  254.6   276.3   265.5
150   500 | 1092.0  281.8  1074.4 1080.3 1080.6 |  292.0   319.6   303.6
150   800 | 1185.4  299.6  1162.6 1170.0 1169.1 |  315.8   347.0   353.6
Table 5-1 presents a summary of the results of experiments with the proposed algorithms. The first two columns give the size of the instances used. The remaining columns give the results returned by Algorithm 7 and Algorithm 8 under different policies. Each entry reported in this table represents results averaged over 30 instances of the stated size. Each instance was created using a random generator for connected graphs, first described in Gomes et al. (1998). This generator creates graphs with random distances and capacities, but guarantees that the resulting instance is connected. Destinations were also defined randomly, with the number of destinations being equal to 40% of the 