Citation
Improving resource utilization in a partitionable bus network using graph coloring and coin-changing algorithms

Material Information

Title:
Improving resource utilization in a partitionable bus network using graph coloring and coin-changing algorithms
Creator:
Woo, Tai-Kuo
Publication Date:
Language:
English
Physical Description:
x, 114 leaves : ill. ; 28 cm.

Subjects

Subjects / Keywords:
Algorithms ( jstor )
Computer technology ( jstor )
Databases ( jstor )
Gaussian distributions ( jstor )
Greedy algorithms ( jstor )
News content ( jstor )
Organizational communication ( jstor )
Simulations ( jstor )
Total communication ( jstor )
Vertices ( jstor )
Algorithms ( lcsh )
Computer and Information Sciences thesis Ph. D
Computer network architectures ( lcsh )
Dissertations, Academic -- Computer and Information Sciences -- UF
Graph theory ( lcsh )
Local area networks (Computer networks) ( lcsh )
Parallel processing (Electronic computers) ( lcsh )
Genre:
bibliography ( marcgt )
theses ( marcgt )
non-fiction ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1989.
Bibliography:
Includes bibliographical references (leaves 109-113).
Additional Physical Form:
Also available online.
General Note:
Typescript.
General Note:
Vita.
Statement of Responsibility:
Tai-Kuo Woo.

Record Information

Source Institution:
University of Florida
Holding Location:
University of Florida
Rights Management:
Copyright Tai-Kuo Woo. Permission granted to the University of Florida to digitize, archive and distribute this item for non-profit research and educational purposes. Any reuse of this item in excess of fair use or other copyright exemptions requires permission of the copyright holder.
Resource Identifier:
21139173 ( OCLC )
0030505522 ( ALEPH )



Full Text







IMPROVING RESOURCE UTILIZATION
IN A PARTITIONABLE BUS NETWORK USING
GRAPH COLORING AND COIN-CHANGING ALGORITHMS






By

TAI-KUO WOO


















A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY



UNIVERSITY OF FLORIDA 1989
























Copyright 1989

by

Tai-Kuo Woo











ACKNOWLEDGEMENTS


I would like to express my heartfelt gratitude to my adviser, Dr. Stanley Y.W. Su, for his advice and guidance. Many thanks are also due to Dr. Yuan-Chieh Chow, Dr. Herman Lam, Dr. Richard Newman-Wolfe, and Dr. Chung-Yee Lee for their encouragement. Without their help, this work would not have been finished.

I am also grateful to David Johns for his proofreading of this dissertation. Lastly, I sincerely thank my roommate, David Kern, who has helped in making this endeavor possible.

















































TABLE OF CONTENTS




ACKNOWLEDGEMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTERS

1 INTRODUCTION

2 SURVEY OF RELATED WORK

2.1 Local Area Networks
2.2 Partitionable Bus Networks
2.3 Graph Coloring Algorithms
2.4 Coin-changing Algorithms
2.5 Summary

3 GRAPH COLORING ALGORITHMS

3.1 The Graph Traversal Algorithms
3.1.1 Static Algorithm
3.1.2 Dynamic Algorithm
3.1.3 Heuristic Algorithm
3.2 Analysis of the Dynamic Graph Traversal Algorithm
3.3 Comparison of Results

4 DPBN AND GRAPH COLORING ALGORITHM

4.1 A Graph Traversal Algorithm and Its Application
4.2 Dynamic Bus Partitioning Technique
4.3 Hardware Design of A Graph Traversal Unit
4.3.1 Architecture of the Graph Traversal Unit
4.3.2 Input and Output of the Graph Traversal Unit
4.3.3 Organization of the Graph Traversal Unit
4.4 Performance Evaluation of the Partitionable Bus Networks
4.4.1 Simulation Study of DPBN
4.4.2 A Strategy for Solving the Aging Problem in DPBN
4.5 Summary

5 DPBN AND COIN-CHANGING ALGORITHMS

5.1 Some Theorems Related to Coin-Changing Algorithm
5.2 The Application of the Coin-Changing Algorithm in Partitionable Bus Networks
5.3 Performance Evaluation

6 SUMMARY, CONCLUSION, OTHER APPLICATIONS, AND FUTURE WORK

6.1 Summary
6.2 Conclusion
6.3 Other Applications
6.4 Future Work

APPENDICES

A PROBABILITY DISTRIBUTION: NORMAL DISTRIBUTION

B PROBABILITY DISTRIBUTION: EXPONENTIAL DISTRIBUTION

REFERENCES

BIOGRAPHICAL SKETCH






































LIST OF TABLES



3.1 The Approximate Values of the Expected Chromatic Number by Wood's Method and the Dynamic Graph Traversal Algorithm

4.1 A Graph Representation of a Group of Communication Requests

4.2 The Table of Conflicts among Communication Requests

4.3 The Adjacency Matrix Stored in the RAM

4.4 The Initial Values of the Registers (W, W1, W2, ..., Wr)

4.5 The Table of Communication Requests

4.6 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.5 (before graph traversal)

4.7 The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.6

4.8 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.7 (after the first graph traversal)

4.9 The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.8

4.10 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.9 (after the second graph traversal)

4.11 The Expected Communication Delays and Standard Deviations in Units for a DPBN and an Ideal Bus Network (in parentheses)

4.12 The Improvement Ratio of a DPBN over an Ideal Network

4.13 The Improvement Ratio of a DPBN over an Ideal Network When New Requests are Allowed to Join Traversals

5.1 An Example of Bus Idling

5.2 An Example of Decomposing Communication Requests













5.3 The Improvement Ratio of a DPBN Using Time Frames (Uniform Distribution of Communication Lengths)

5.4 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 60)

5.5 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 40)

5.6 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 20)

5.7 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 10)

5.8 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 40)

5.9 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 30)

5.10 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 20)

5.11 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 15)

5.12 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 10)

5.13 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 5)


































LIST OF FIGURES



2.1 An Example of Computing the Upper Bound of a Graph Using Welsh and Powell's Algorithm

3.1 An Example of Graph Traversal Using the Static Graph Traversal Algorithm

3.2 An Example of Graph Traversal Using the Dynamic Graph Traversal Algorithm

3.3 An Example of Graph Traversal Using the Heuristic Graph Traversal Algorithm

3.4 An Example of Graph Traversal on a K-partite Graph

4.1 The Architecture of a Partitionable Bus Network

4.2 A Graph Representation of Conflicts and Requests

4.3 The Graph Traversal of the Constructed Graph

4.4 The Detailed Organization of the Control Computer of the Partitionable Bus Network (DPBN)

4.5 The Organization of the Graph Traversal Unit

4.6 The Graph Traversal for the Strategy of Handling Aging


























Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy



IMPROVING RESOURCE UTILIZATION IN A PARTITIONABLE BUS NETWORK USING
GRAPH COLORING AND COIN-CHANGING ALGORITHMS

By

Tai-Kuo Woo

May 1989

Chairman: Dr. Stanley Y. W. Su
Major Department: Computer and Information Sciences


Achieving efficiency in a parallel processing environment is a fundamental problem in many computer science and engineering disciplines. In a large scale computer system, resources such as memory, secondary devices, and communication networks are usually shared by processors and processes. Resource contentions often occur, and system performance is degraded due to blocking and deadlock problems.

System performance can be improved by detecting non-conflicting processes and scheduling them for parallel processing. In this dissertation, we introduce three graph coloring algorithms for distinguishing conflicting and non-conflicting processes. The complexity of each algorithm is O(E), where E is the number of edges of the graph. By interpreting the results of the graph traversal algorithm, non-conflicting processes can be scheduled for parallel communication or processing.

Another problem dealt with in this work is the idling problem in the execution of non-conflicting processes. Since the processes may take different amounts of time to execute, if processors are assigned to process them to completion, the processors handling shorter processes will be idle after completing their tasks. In this dissertation, a coin-changing algorithm is applied to achieve better scheduling of non-conflicting processes.

Both the graph traversal algorithms and coin-changing algorithms are then applied in a dynamically partitionable bus network to demonstrate that non-conflicting communication requests can be identified and scheduled for execution in partitioned subnetworks. The performance of a dynamically partitionable bus network using these algorithms is evaluated.

This work makes the following specific contributions. First, it introduces and analyzes three graph coloring algorithms and their performance. An analytical study shows that the dynamic graph traversal algorithm has better performance than other existing graph coloring algorithms. Second, it presents a hardware design for implementing the dynamic graph traversal algorithm to meet the timing requirements of real-time environments. Third, it demonstrates the utility of the graph traversal algorithms in a partitionable bus network by analysis and simulation. Fourth, it provides an analysis of a coin-changing algorithm and its application to solving the bus idling problem in a partitionable bus network.




































CHAPTER 1
INTRODUCTION

Parallel processing is an efficient form of information processing. It allows several concurrent processes to be processed simultaneously. System resources such as communication networks, memory devices, and processors are shared among processes. One of the limiting factors to the expansion of parallel processing is performance degradation due to resource contentions that occur when multiple processes request the same resources.

Local area networks are example systems in which resource contentions degrade system performance. They are characterized by a high-bandwidth transmission medium shared by many interconnected processors within a limited area. In the past, local area networks were developed to provide limited functions such as message passing and file transfer, and the amount of data transmitted among processors was small. If the number of computers attached to the network is large and the volume of communication among processors and memory devices increases, conventional interconnection devices, such as a single shared bus or ring, cannot meet the increasing demands. Consequently, bus contention is unavoidable, and system performance degrades rapidly because of long communication delays.

Much effort has been made in developing efficient local area networks, such as the development of protocols to reduce contentions in CSMA/CD (carrier sense multiple access with collision detection) [37] and token passing [23], and the design of hardware in register insertion ring networks [17]. An alternative approach is to partition a communication bus and to allocate communication networks dynamically.



The concept of partitioning a computer system was introduced in [3,26,54]. Kartashev and Kartashev [26] propose varying the word size to meet the requirements of the task at hand by physically partitioning registers and memory words. Su and Baru [3,54] use a physically partitionable bus to allow for the formation of a number of clusters of processors for parallel processing of database operations. However, the formation of clusters of processors is based on the data distribution and the processing power needed. No specific algorithm is used to resolve the conflicts in the formation of clusters. In other words, database operations are scheduled on a first-come-first-served basis. Thus, the degree of parallelism is not extended to its potential limit.

In this dissertation, three graph coloring algorithms are proposed for identifying conflicting and non-conflicting communication requests. By assigning non-conflicting requests to different subnetworks created by manipulating switches of a dynamically partitionable bus, parallel communication can be achieved and maximized. This dissertation also presents a hardware design for implementing one of the algorithms. The performance evaluation of the partitionable bus network that uses a graph coloring algorithm for subnetwork formation is also presented.

Another problem associated with the dynamically partitionable bus network is bus idling. In a dynamically partitionable bus network, the non-conflicting communication requests carried out at the same time in the subnetworks usually have different time lengths. The subnetworks assigned to the requests that finish early have to wait for the other communication processes to finish. As a result, those subnetworks sit idle and are not fully utilized. The bus idling problem can be eliminated by using a coin-changing algorithm. The strategy works as follows. First, the length of each communication request is partitioned into a number of time frames of different sizes. For instance, if the set of frame sizes is (1, 5, 10, 15), a communication request of length 70 can be partitioned into 4 frames of size 15 and 1 frame of size 10. Each time requests are collected for graph traversals, only the requests that have been assigned a tag frame can join the graph traversals. The tag frame is designated in turn from the largest frame size to the smallest frame size (i.e., 15 → 10 → 5 → 1). There are many ways a request can be decomposed. For instance, the length 70 can be decomposed into either 2 frames of size 15 and 4 frames of size 10 (6 frames in total) or 4 frames of size 15 and 1 frame of size 10 (5 frames in total). The objective is to use the minimum number of frames for a given set of frame sizes, since a large number of frames results in a high frequency of graph traversals. The problem of minimizing the number of time frames can be transformed into a coin-changing problem, which minimizes the number of coins for a given set of coin types. A performance evaluation shows a significant increase in network throughput when the coin-changing algorithm is applied to the dynamically partitionable bus network.
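The frame decomposition described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the dissertation's own procedure: `min_frames` uses the textbook dynamic-programming solution to the coin-changing problem, while `greedy_frames` always takes the largest frame that still fits. Both function names, and the frame set (1, 5, 12) used as a counterexample, are hypothetical and introduced only for illustration.

```python
def min_frames(length, frame_sizes):
    """Decompose `length` into the fewest frames (dynamic programming)."""
    inf = float("inf")
    best = [0] + [inf] * length      # best[a] = fewest frames summing to a
    choice = [0] * (length + 1)      # choice[a] = one frame used to reach a
    for amount in range(1, length + 1):
        for size in frame_sizes:
            if size <= amount and best[amount - size] + 1 < best[amount]:
                best[amount] = best[amount - size] + 1
                choice[amount] = size
    if best[length] == inf:
        raise ValueError("length cannot be formed from the given frame sizes")
    frames = []
    while length > 0:
        frames.append(choice[length])
        length -= choice[length]
    return sorted(frames, reverse=True)


def greedy_frames(length, frame_sizes):
    """Decompose `length` by always taking the largest frame that fits."""
    frames = []
    for size in sorted(frame_sizes, reverse=True):
        count, length = divmod(length, size)
        frames.extend([size] * count)
    if length:
        raise ValueError("length cannot be formed from the given frame sizes")
    return frames


# Frame sizes and request length taken from the text.
print(min_frames(70, (1, 5, 10, 15)))     # [15, 15, 15, 15, 10]
print(greedy_frames(70, (1, 5, 10, 15)))  # [15, 15, 15, 15, 10]

# Hypothetical frame set for which the greedy choice is not minimal.
print(greedy_frames(15, (1, 5, 12)))      # [12, 1, 1, 1]
print(min_frames(15, (1, 5, 12)))         # [5, 5, 5]
```

For the frame set (1, 5, 10, 15) the two approaches agree on length 70. The second pair of calls shows that this agreement depends on the frame set, which is why the choice of frame sizes matters when a fast greedy decomposition is wanted.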

The approach presented in this dissertation can be outlined as follows:

1. Decompose each communication request into a number of time frames by a coin-changing algorithm.

2. Transform communication requests into a graph where vertices represent communication requests and edges denote the conflicts among communication requests.

3. Use graph coloring techniques to color the constructed graph so that non-conflicting communication requests are assigned the same color.

4. Manipulate the switches of the partitionable bus network to form a number of subnetworks to allow parallel communication among requests that have been assigned the same color.










This dissertation is organized as follows: Chapter 2 gives a survey of related work in local area networks, partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. Chapter 3 presents the graph traversal algorithms and their analyses. Chapter 4 delineates the application of the best graph traversal algorithm in a dynamically partitionable bus network (DPBN), the hardware design of the graph traversal unit (GTU), and the performance evaluation of the DPBN. Chapter 5 provides an analysis of the coin-changing algorithm and its application to the dynamically partitionable bus network. A conclusion is given in Chapter 6.
















CHAPTER 2
SURVEY OF RELATED WORK

This dissertation touches on a number of areas, namely local area networks, partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. This chapter provides a brief survey of these areas.

2.1 Local Area Networks

The two most common topologies of local area networks are bus and ring. Channel sharing is one of the communication techniques used in both topologies.

In a bus network, only one communication is allowed at a time. CSMA/CD and token bus are good examples of protocols used on this topology. In the token bus architecture, access to a shared bus is achieved by passing a token from station to station. The order in which the token is passed follows a "logical ring," which can be modified dynamically as stations are added to or dropped from the bus, and reestablished when token pass failures occur. A token regulates the right to access the bus. When a station receives the token, it is granted control of the medium for a period of time. When its packet transmission is over, it passes the token to the next station in a predefined sequence. This protocol requires considerable maintenance, which includes network initialization, addition of stations to the network, deletion of stations from the network, and fault management (multiple tokens, missing token). To meet these requirements, the logic of each station can be very complex, and the overhead involved in token passing wastes bandwidth.

In CSMA/CD, also known as listen before (while) talk, each station listens to the medium to determine whether the medium is available. The CD part indicates that a station listens to its own transmission to detect collisions. If the medium is idle, the station may either transmit immediately or transmit with a probability. If the medium is busy, the station may either continue to listen until the medium is sensed idle, transmit with a probability, or wait a random period before sensing the channel again. When a collision occurs, the station immediately stops transmitting and waits a random period of time before retransmission. A major disadvantage of CSMA/CD is that the delay has no deterministic bound. It is possible that a station may never be able to transmit messages when the load of the network is heavy.

In a ring network, for instance, the Cambridge Ring [42], there is a sequence of point-to-point links between stations. Messages travel over a fixed route from station to station around the loop in which each station is responsible for regenerating messages and identifying addresses. In some cases, packets can be sent along different links simultaneously. Ring access protocols play a major role in the performance of ring networks. There are three basic types of ring access protocols: token passing, empty slot, and register insertion. The token ring protocol uses a token that circulates around the ring. Each station wishing to transmit messages must wait for a free token passing by. When the transmission is over, the station passes the token to the physically adjacent station, as opposed to the logically adjacent station as in token bus networks. The disadvantages are similar to token bus networks: only one station is able to transmit messages at a time, and fault management increases the complexity of the network.

In the slotted ring, a number of fixed-length slots circulate around the ring. A station finds an available empty slot and inserts a packet of messages. When the full slot reaches its destination, the leading bit of the slot is reset to mark it empty, and the slot is free for transmission again. The main disadvantages of slotted rings are: (1) overhead bits take a significant amount of space, and (2) a data packet may not occupy the entire slot.

Each station in the register insertion ring has a shift register, a buffer which temporarily holds the packet passing through the station, and a buffer which stores the packets produced by the station. Packets which arrive at the shift register are received if the station is the addressee. For the output from the station, packets are placed in the buffer first and then transmitted to a shift register when it is idle. The fault management requires a significant amount of work.

No matter what protocol is used, the following three major problems still exist in ring networks [19]:


1. The operation of the network depends on each station's network controller to regenerate the message. Thus, if an interface or controller fails, the network is essentially broken, which raises a reliability concern. Pierce [44] suggests using a "simple circuit" to bypass failed stations. Liu of Ohio State University [33] proposes a distributed double-loop computer network (DDLCN) as a fault-tolerant distributed system. Both approaches result in a higher hardware cost.

2. A ring network must be broken to add or delete stations. Expansion of the network may result in an interruption of network service.

3. Propagation delay is proportional to the number of stations in the network. As the number of stations increases, propagation delay can be a serious problem.


Register insertion ring networks [17] allow a certain degree of parallelism; however, they require a minimum station latency of 7 bits for addressing. In contrast, a one-bit delay is sufficient for slotted and token rings. The long addressing delay degrades the performance of register insertion networks. All in all, the existing local area networks have shortcomings and need to be improved. A partitionable bus network that allows multiple communication processes to be carried out at the same time is a possible solution.

2.2 Partitionable Bus Networks

As discussed above, both types of networks (bus and ring) have disadvantages. The lack of parallel communication greatly reduces the performance of existing networks. The idea of bus partitioning was introduced in database machines to speed up data processing. There are two kinds of partitioning: logical partitioning and physical partitioning. In logical partitioning, as used in DIRECT [11], a back-end computer allocates query processors to perform tasks and handles data requests from processors. The system is logically divided into a number of subsystems, and a controller monitors the progress of each subsystem. This scheme has two weak points, as mentioned in [55]. First, the controller may become the bottleneck, since the control of operations is centralized. Second, the data movement between the mass storage and the main memory of the query processors can cause memory contentions in the interconnection network.

In contrast to logical partitioning, physical partitioning reconfigures the hardware of the computer to maximize system efficiency. Kartashev and Kartashev [26], of Dynamic Computer Architecture, Inc., vary the word size by connecting computer elements to meet the requirements of the processing tasks at hand. Each computer element consists of a processor element, a memory element, and an I/O element. Su and Baru, at the University of Florida Database Systems Development and Research Center, use a physically partitionable bus in a system called SM3 [3,54] to increase the degree of parallelism. It has three important features. First, data movement among memory modules is done by memory switching instead of transmitting data among buses. Second, inter-processor communication and synchronization are achieved by using control lines. Third, concurrent processes are executed in parallel in the clusters of processors formed by physically partitioning a common bus. The formation of clusters of processors is based on the data distribution, the processing power needed, and file sizes. No specific algorithm is used to resolve the conflicts in the formation of clusters. Thus, the degree of parallelism is not maximized. This dissertation uses a fast graph coloring algorithm for distinguishing conflicting and non-conflicting communication requests and then adopts the idea of a partitionable bus to form a number of clusters of processors for processing the non-conflicting communication requests simultaneously. Thus, the system throughput can be significantly improved.

2.3 Graph Coloring Algorithms

Increasing system throughput and reducing response time are important objectives in designing and implementing a computer system. By duplicating hardware devices and allowing computation tasks to be processed in these devices in parallel, the degree of parallelism can be increased. However, the cost of such a hardware system can be rather high. An economic alternative for achieving a high degree of parallelism is to distinguish conflicting and non-conflicting events by software through graph coloring techniques and to execute non-conflicting events in parallel on a shared hardware resource.

There are two classes of graph coloring problems: vertex coloring and edge coloring. The vertex coloring of a graph is the assignment of colors to the vertices of a graph so that no two adjacent vertices have the same color. Edge coloring is defined in a similar fashion: two edges sharing a common vertex in a graph G must be assigned different colors. It has been proven that a graph G can be edge colored with N colors if and only if the vertices of the line graph of G can be colored with N colors [41]. Thus, an edge coloring problem can be transformed into a vertex coloring problem. For the rest of this dissertation, the graph coloring problem is defined as a vertex coloring problem.

The vertex coloring technique can be applied in computer systems in the following way. If we represent the events of a computer system as the vertices of a graph and the conflicts among events as its edges, the event scheduling problem can be transformed into a vertex coloring problem. By finding the minimum number of colors for the vertices of a graph and scheduling the events (or vertices) of the same color for parallel execution, a higher degree of parallelism in a computer system can be achieved. Finding the minimum number of colors (i.e., the chromatic number) required for a given graph is an NP-complete problem [15,25]. However, it is possible to find algorithms that can assign colors to any arbitrary simple graph (i.e., a graph with neither self-loops nor multiple edges between any pair of vertices) such that the number of colors assigned is close to the chromatic number. We shall survey the existing methods that attempt to find the exact chromatic number and the approximate chromatic number below:


1. Exact chromatic number: Algorithms in this category find the chromatic number of a graph. However, they require a very large amount of computing time and memory space. For example, Christofides' algorithm [9] constructs a maximum subgraph tree for a given graph from its "maximal independent sets." Through a breadth-first search method, the shortest path from the root to the terminal nodes can be found. The maximal independent sets on the shortest path are the color classes of a chromatic coloring. Unfortunately, the subtree constructed by this algorithm can be too large to be practical for implementation.

Christofides' algorithm was improved by Wang [59], who showed that, by considering only a subset of all the maximal independent sets of a graph, the required number of steps to compute the chromatic number of a graph can be reduced by a factor of as much as (|V|/2!), where |V| is the number of vertices of a graph and 2! denotes 2 factorial. By using a depth-first search method, the amount of memory required to find the shortest path from the root to the terminal nodes can be reduced by a ratio of (|V|/8!) to (|V|/2!). However, the computing time of this algorithm grows exponentially with the size of the graph.

2. Approximate chromatic number: Because the graph coloring problem is intractable, some algorithms settle for a coloring that uses more colors than necessary. For instance, the greedy algorithm orders the vertices arbitrarily, say $V_1, V_2, V_3, \ldots, V_n$, and then colors the vertices in sequence. First, color $V_1$ with color 1, then check whether $V_1$ and $V_2$ are non-adjacent. If so, color $V_2$ with color 1; otherwise, color it with color 2. Continue this process, giving each vertex the lowest-numbered color not used by its already colored neighbors, until all the vertices are colored. The result of the greedy algorithm is highly dependent on the order of the vertices.
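The order dependence of the greedy algorithm can be seen in a small sketch. The code below is an illustration only: the function names and the four-vertex path graph are hypothetical and do not come from the dissertation. `greedy_coloring` follows the description above, and `color_classes` groups vertices by color, which is how same-colored (non-conflicting) events would be batched for parallel execution.

```python
def greedy_coloring(adjacency, order):
    """Color vertices in the given order with the lowest free color (1-based)."""
    colors = {}
    for vertex in order:
        # Colors already used by neighbors that have been colored so far.
        taken = {colors[n] for n in adjacency[vertex] if n in colors}
        color = 1
        while color in taken:
            color += 1
        colors[vertex] = color
    return colors


def color_classes(colors):
    """Group vertices by their assigned color."""
    classes = {}
    for vertex, color in colors.items():
        classes.setdefault(color, []).append(vertex)
    return classes


# Hypothetical example: the path a - b - c - d, whose chromatic number is 2.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

good = greedy_coloring(path, ["a", "b", "c", "d"])
bad = greedy_coloring(path, ["a", "d", "b", "c"])

print(max(good.values()))   # 2 colors with the first ordering
print(max(bad.values()))    # 3 colors with the second ordering
print(color_classes(good))  # {1: ['a', 'c'], 2: ['b', 'd']}
```

With the ordering a, d, b, c, both end vertices receive color 1 first, which forces a third color on c even though two colors suffice for the graph.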

Brooks [6] provides a fast way of computing an upper bound for the chromatic number of a graph. The upper bound is defined as

$$\max(d_1, d_2, d_3, \ldots, d_n)$$

where $d_i$ is the degree of vertex $i$. The estimate is valid only if the graph $G(V, E)$ is not a complete graph and the largest vertex degree of the graph is greater than or equal to three.










The method introduced by Welsh and Powell [60] first arranges the vertices in descending order of their degrees (i.e., $d_1 \ge d_2 \ge d_3 \ge \cdots \ge d_n$). It then colors the first vertex with color 1 and, scanning the list in order, gives the same color to every vertex that is not adjacent to a vertex already given that color. This process is repeated with a new color for the remaining uncolored vertices until all the vertices are colored. Welsh and Powell's algorithm provides an upper bound for the estimation of the chromatic number of a graph:

$$\max_i \min(d_i + 1, i)$$

where $i$ is the index of the vertices arranged in descending order of degrees. The example below illustrates the procedure for computing the upper bound of the chromatic number of the graph shown in Figure 2.1. The following steps constitute the procedure:

Step 1: Arrange the degrees of the vertices in descending order.

i        1  2  3  4  5
d_i      4  3  3  2  2

Step 2: Select the minimum for each pair of values $(d_i + 1, i)$, where $i$ goes from 1 to 5.

i                  1  2  3  4  5
min(d_i + 1, i)    1  2  3  3  3


























[Figure 2.1: graph with vertices A, B, C, D, and E]

Figure 2.1. An Example of Computing the Upper Bound of a Graph Using Welsh and Powell's Algorithm










Step 3: Choose the largest value among the values in Step 2.

$$\max_i \min(d_i + 1,\; i) = 3$$
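The three steps can be reproduced with a few lines of Python. This is a minimal sketch; the function name `welsh_powell_bound` is an assumption, and the degree sequence is the one used in the example above.

```python
def welsh_powell_bound(degrees):
    """Upper bound max_i min(d_i + 1, i), with the degrees sorted in
    descending order and i counted from 1."""
    ordered = sorted(degrees, reverse=True)                       # Step 1
    minima = [min(d + 1, i) for i, d in enumerate(ordered, 1)]    # Step 2
    return max(minima)                                            # Step 3


print(welsh_powell_bound([4, 3, 3, 2, 2]))  # prints 3
```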


In this example, it happens to be the chromatic number of the graph. This algorithm produces good results in some cases. However, the number of colors required may increase if the order of vertices is changed. To correct this problem, Wood [61] proposes a coloring method based on a similarity matrix. A similarity matrix S = {s_ij} is formed to determine which vertices should be colored with the same color. The values of s_ij are determined by the conflicts of vertices:

$$s_{ij} = \begin{cases} 0 & \text{if } c_{ij} = 1, \\ \sum_k (c_{ik} \wedge c_{jk}) & \text{if } c_{ij} = 0, \end{cases}$$

where k is summed over all vertices and C = {c_ij} is the conflict matrix, with c_ij = 1 if vertex i and vertex j have a conflict and c_ij = 0 if they have no conflict.
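As a rough illustration, the similarity matrix can be computed as follows. The sketch assumes that the operator combining c_ik and c_jk is a logical AND, so that s_ij counts the vertices in conflict with both i and j; it also leaves the diagonal at zero. Both choices, and the function name `similarity_matrix`, are assumptions rather than details confirmed by [61].

```python
def similarity_matrix(conflict):
    """Build S from a 0/1 conflict matrix C: s_ij = 0 when i and j conflict,
    otherwise the number of vertices k that conflict with both i and j."""
    n = len(conflict)
    sim = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and conflict[i][j] == 0:
                sim[i][j] = sum(conflict[i][k] & conflict[j][k] for k in range(n))
    return sim


# Hypothetical example: vertices 0 and 2 both conflict with vertex 1 only.
C = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
S = similarity_matrix(C)
print(S[0][2])  # prints 1
```

Vertices 0 and 2 share one conflicting neighbor and do not conflict with each other, so they are candidates for the same color.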

To color the vertices of a graph, assign a color to the pair of vertices with the highest similarity based on whether both, one, or neither of the vertices has been colored. Each time the similarity matrix is scanned, the similarity level is reduced by one. Continue this process until all the vertices are colored. The similarity matrix method is better than Welsh and Powell's method when the number of vertices is large and the graph is highly connected. However, the improvement is limited and the computation time requirement is high. Plesnik [45] partitions a graph into a spanning graph F and a number of induced subgraphs (V1, V2, V3, ..., Vk). By estimating the chromatic number of the spanning graph F through the same partitioning technique, the chromatic number of graph G is computed as the sum of the chromatic numbers of the










first f subgraphs (i.e., g1 + g2 + g3 + ... + gf), where gi is the chromatic number of subgraph Vi and f is the chromatic number of the spanning graph F. The partitioning technique is not clearly stated in the paper, and the performance of the algorithm is not available.

Other researchers have proposed factoring techniques for coloring [18,45]. In these techniques, a graph is decomposed into factors F1, F2, F3, ..., Fj (the F's are subgraphs). The chromatic number is estimated as the sum of the chromatic numbers of the subgraphs. However, good factoring techniques usually require a large amount of computing time.

Several other algorithms are available and can be found in [5,31,36,40].


All the graph coloring algorithms mentioned above share a common feature: the algorithms are rather complex. Although Welsh and Powell's method is superior to others in terms of simplicity and computation time requirement [5], it predicts the chromatic number poorly when the graph is highly connected.

Graph coloring algorithms have been used in timetable design of examinations [61]. The events scheduled are examinations, each of which requires a period of time. The problem can be transformed into a graph coloring problem by representing each examination with a vertex of a graph and drawing an edge between the vertices if the corresponding examinations can not take place concurrently. The objective is to find the minimal number of periods required to accommodate all the examinations. Newman-Wolfe [43] uses edge coloring to schedule communication requests statically in multistage interconnection networks, where I/O ports and connection lines are represented by vertices and edges of a graph, respectively. Edges assigned the same color are switched on to allow communications to go through simultaneously.










Many people recognize the usefulness of the graph coloring algorithms, but because of the intractability of the graph coloring problem, few realistic applications have been proposed so far. Most approaches are static in nature, i.e., the participating events and constraints remain unchanged during the entire period of processing. The designation of vertices as processors in multiprocessor scheduling and the representation of switch boxes by vertices in multistage interconnection networks are examples of a static scheduling scheme which does not fully reflect the dynamic nature of request contention in a computer system and is therefore unable to optimize resource utilization. The approach proposed in this dissertation is to schedule events dynamically through graph traversal algorithms. The detailed description will be given in Chapter 4. This dissertation presents three graph traversal algorithms which give good estimations of the chromatic number of a graph in O(E) time where E is the number of edges of the graph. They can be applied to speed up the process of distinguishing conflicting and non-conflicting requests. However, due to the non-uniform ending of the requests, the subnetworks assigned to the processes that finish early will be idle. The problem can be solved by fragmenting the messages of the communication requests and by using a fast coin-changing algorithm to schedule them for execution.

2.4 Coin-changing Algorithms

In distributed processing, each individual processor needs to perform tasks independently and cooperate with other processors if necessary. Many of the computing tasks require cooperation among processors, and thus result in heavy loads on networks. Performance of the system degrades because of network contentions. The approach of using a graph coloring algorithm solves the contention problem and allows for parallel communication. What remains is the bus idling problem, which can










be solved by a coin-changing algorithm. The bus idling problem is due to the fact that the communication processes that are allowed to proceed in the partitioned subnetworks take different time durations, and the subnetworks that are released by the processes of shorter durations can not be reused until all the processes are completed.

The coin-changing problem can be defined as follows. For a given set of coin types W = (W1, W2, ..., Wn), the goal is to meet a given total value T with a minimum number of coins. It is assumed that the supply of each type of coin is unlimited. Mathematically, the problem can be formulated as follows:

$$\text{Minimize } C = \sum_{i=1}^{n} X_i$$

$$\text{subject to } \sum_{i=1}^{n} W_i X_i = T,$$

where each $X_i$ is a non-negative integer.

The coin-changing problem is a simplified version of the integer programming problem. The existing work on this topic can be categorized as follows:


1. Exact algorithms: Researchers look for algorithms that can find optimal solutions for any arbitrary set of coin types. Chang and Gill [8] provide a recursive algorithmic solution which requires a considerable amount of computation time. Wright [62] proposes a solution using dynamic programming techniques to simplify the computation task. Both algorithms find optimal solutions. However, the computation time required makes the algorithms of little practical value.

2. The Greedy algorithm [8]: The Greedy algorithm is simple and fast. It takes as many of the largest coins as possible and then as many of the second largest coins as possible, etc. The procedure below implements the Greedy algorithm:












Procedure CoinChanging (W, X, T);
(* W = (W1, W2, W3, ..., Wn) is the set of coin types in ascending order;
   X = (X1, X2, X3, ..., Xn) is the representation of T with respect to W *)
begin
  for i := n downto 1 do
  begin
    Xi := T DIV Wi;     (* take as many coins of type i as possible *)
    T := T - Xi * Wi    (* total that remains to be represented *)
  end;
end;

However, the algorithm does not always find optimal solutions. In some cases, the algorithm does not find a solution even though one exists. For instance, when the set of coin types is W = (5, 11, 15, 20) and the total is T = 30, the Greedy algorithm would return the representation X = (2, 0, 0, 1) instead of the optimal representation Y = (0, 0, 2, 0). If the total is T = 26, the Greedy algorithm would not return any representation even though an optimal representation, Y = (0, 1, 1, 0), exists. In this category, researchers are trying to determine whether a set of coin types W = (W1, W2, W3, ..., Wn) will yield optimal solutions when the Greedy algorithm is applied, as discussed in [7,8,58]. For instance, in Chang and Gill's paper, a range of values of T (the sum of the values of coin types), $(W_3,\; W_n(W_n W_{n-1} + W_n - 3W_{n-1})/(W_n - W_{n-1}))$, has to be checked for optimality with a given set of coin types W. Magazine et al. [34] provide methods that would check fewer values of T.
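The two examples above can be checked with a small Python sketch that places the Greedy algorithm next to a textbook dynamic-programming solution. The dynamic program is a generic formulation, not necessarily the method of Wright [62], and the function names `greedy_change` and `optimal_change` are assumptions made for illustration.

```python
def greedy_change(coins, total):
    """Greedy representation of total; coins are in ascending order.
    Returns the list of counts, or None if a remainder is left over."""
    counts = [0] * len(coins)
    for i in range(len(coins) - 1, -1, -1):
        counts[i], total = divmod(total, coins[i])
    return counts if total == 0 else None


def optimal_change(coins, total):
    """Minimum-coin representation by dynamic programming, or None."""
    inf = float("inf")
    best = [0] + [inf] * total      # best[t] = fewest coins summing to t
    last = [-1] * (total + 1)       # index of the last coin used to reach t
    for t in range(1, total + 1):
        for i, w in enumerate(coins):
            if w <= t and best[t - w] + 1 < best[t]:
                best[t] = best[t - w] + 1
                last[t] = i
    if best[total] == inf:
        return None
    counts = [0] * len(coins)
    t = total
    while t > 0:                    # walk back through the recorded choices
        counts[last[t]] += 1
        t -= coins[last[t]]
    return counts


W = [5, 11, 15, 20]
print(greedy_change(W, 30), optimal_change(W, 30))  # prints [2, 0, 0, 1] [0, 0, 2, 0]
print(greedy_change(W, 26), optimal_change(W, 26))  # prints None [0, 1, 1, 0]
```

The dynamic program takes time proportional to n times T, which is the kind of cost that motivates the interest in coin sets for which the Greedy algorithm is already optimal.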










2.5 Summary

In order to meet the demand of a high performance local area network, both the graph coloring algorithms and coin-changing algorithms are applied in a dynamically partitionable bus network to improve the network performance. In this section, we have presented the survey of related work including partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. The application, analysis, and performance evaluation of the graph coloring algorithms and the coin-changing algorithms will be presented in the next three chapters.
















CHAPTER 3
GRAPH COLORING ALGORITHMS

The graph coloring algorithms are the foundation of this research. Three fast graph traversal algorithms which give good estimates of the chromatic number of a graph are presented. This chapter is organized as follows: Section 3.1 delineates the algorithms, including the static, dynamic, and heuristic algorithms. Section 3.2 evaluates the performance of the dynamic graph traversal algorithm. Section 3.3 shows the results of the performance evaluation.

3.1 The Graph Traversal Algorithms

The algorithms to be presented below assign colors to the vertices of a graph through graph traversals. Each traversal results in some colored vertices and "thrown-away" vertices. Through repeated traversals of the "thrown-away" vertices, all vertices are assigned colors.

3.1.1 Static Algorithm

Given a simple graph, the following steps are taken to color the vertices as illustrated in Figure 3.1.

1. Arbitrarily pick one vertex in the graph and label it as 1.

2. Assign a count to its adjacent vertices with a value that is one greater than the count of the vertex being traversed and mark the corresponding edges. Repeat this step on the new adjacent vertices until all edges are marked. If an adjacent vertex already has a count, a new count is assigned to it. Thus, vertices may have multiple counts.

3. Examine vertices with multiple counts. If the sum of any two counts associated with a vertex is odd, discard the vertex. For the remaining vertices, assign color 1 to those vertices whose sum of the counts is even. Assign color 2 to the rest of the vertices. This step colors the graph with two colors (color 1 and color 2) and discards some vertices.

4. The thrown-away vertices form a new graph which is then traversed starting from Step 1. The algorithm terminates if no thrown-away vertex remains.

Figure 3.1 shows an example of the graph traversal algorithm. In Figure 3.1a, vertex A is arbitrarily picked and is labeled as 1. The adjacent vertices of vertex A (i.e., vertex B and vertex C) are labeled as 2 as shown in Figure 3.1b. Repeat the same procedure: vertices D, E, and F are labeled as 3 in Figure 3.1c. Finally, in Figure 3.1d, vertex E is labeled with two 4's, due to its adjacency to vertex D and vertex F. So far, all the edges of the graph have been marked and vertex E has multiple counts. According to step 3 of the static graph traversal algorithm, vertices B and C are assigned color 1, vertices A, D, and F are assigned color 2, and vertex E is discarded.

Obviously, vertex E is assigned with color 3 in the next round of traversal. The approximate chromatic number of this graph produced by this algorithm is 3 which happens to be the actual chromatic number.

3.1.2 Dynamic Algorithm

The dynamic graph traversal algorithm is intended to improve on the static algorithm in the following ways:

1. Reduce the number of thrown-away vertices.

2. Simplify the labeling technique.
















[Figure 3.1, panels (a)-(d): successive labeling of vertices A through F with counts]

Figure 3.1. An Example of Graph Traversal Using the Static Graph Traversal Algorithm










3. Color the vertices of the graph "on the fly".

The dynamic graph traversal algorithm labels, colors, and throws away vertices as each vertex is traversed. A vertex whose sum of the counts is odd is discarded immediately to avoid further consideration.

The following steps form the algorithm:

1. Arbitrarily pick one vertex and label it as 1. The vertex chosen is the front vertex.

2. Label the adjacent vertices of the front vertex in random order with the 1's complement value of the count of the front vertex. For each labeling of a new vertex, if the sum of any two counts is 1, discard the vertex right away. Repeat this procedure for each vertex adjacent to the front vertices being traversed.

3. Designate the newly labeled vertices as the front vertices. Go to step 2 if there are still unlabeled vertices.

4. Go to step 1 if there are vertices that have been thrown away in step 2.
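The steps above can be sketched in Python as follows. This is one possible reading of the algorithm, not the dissertation's implementation: vertices are visited in sorted rather than random order so that the result is repeatable, a vertex is thrown away when a front vertex carrying the same label reaches it (which is when it would receive both counts 0 and 1), and a new starting vertex is picked within the same traversal whenever unlabeled vertices remain. The function name and the 5-cycle example are hypothetical.

```python
from collections import deque


def dynamic_traversal_coloring(adjacency):
    """Color a graph by repeated 0/1 traversals; returns {vertex: color}."""
    color = {}
    remaining = set(adjacency)
    next_color = 1
    while remaining:
        label, thrown = {}, set()
        for start in sorted(remaining):
            if start in label:
                continue                      # already reached in this traversal
            label[start] = 1
            front = deque([start])
            while front:
                u = front.popleft()
                if u in thrown:
                    continue                  # discarded vertices are not expanded
                for v in sorted(adjacency[u]):
                    if v not in remaining or v in thrown:
                        continue
                    if v not in label:
                        label[v] = 1 - label[u]   # 1's complement of the front count
                        front.append(v)
                    elif label[v] == label[u]:
                        thrown.add(v)             # v would carry counts 0 and 1
        for v in remaining - thrown:
            color[v] = next_color if label[v] == 1 else next_color + 1
        next_color += 2
        remaining = thrown                    # next traversal handles the discards
    return color


# Hypothetical example: a cycle on five vertices, whose chromatic number is 3.
c5 = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"},
      "D": {"C", "E"}, "E": {"D", "A"}}
coloring = dynamic_traversal_coloring(c5)
print(len(set(coloring.values())))  # prints 3
```

Each edge is examined a bounded number of times per traversal, which is in keeping with the O(E) behavior claimed for the traversal algorithms.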

Figure 3.2 demonstrates the steps of the dynamic graph traversal. In Figure 3.2a, vertex A labeled as 1 is the front vertex. Vertices B and C which are adjacent to the front vertex A are labeled as 0 in Figure 3.2b (complement value of the count of the front vertex A). Note that the counts of each vertex are checked immediately to see if the vertex needs to be thrown away. At this point, vertices B and C are the new front vertices. In Figure 3.2c, vertex C is labeled as 1 and is thrown away because it is adjacent to vertex B. As a result, since there is no more front vertex to be processed, the algorithm would randomly pick a new starting vertex from the unlabeled vertices (i.e., vertices D or E) and label it as 1, as shown in Figure 3.2d. According to step 2, vertex E is labeled as 0. So far, all the vertices have been labeled and colored. The











[Figure 3.2, panels (a)-(d): successive labeling of vertices A through F with 0's and 1's]

Figure 3.2. An Example of Graph Traversal Using the Dynamic Graph Traversal Algorithm










group of vertices with a count of 1 is colored with the first color, the group of vertices with a count of 0 is colored with the second color, and the vertex C is colored with the third color in the next round of traversal.

The advantages of this algorithm are obvious as compared to the static graph traversal algorithm. They are:

Vertices are thrown away "on the fly" as they are traversed so that the edges associated with them do not have to be traversed as they do in the static algorithm.

Fewer edges need to be traversed.

Labeling vertices with 0's and 1's is simpler than labeling them with counts.

In addition, the dynamic graph traversal algorithm has some mathematical properties which simplify the analysis of its performance. Details are given in Section 3.2.

3.1.3 Heuristic Algorithm

In some cases, both the static and dynamic algorithms will throw away a high percentage of vertices. Figure 3.3 gives an example. Suppose that vertex A is chosen as the front vertex using the dynamic algorithm and labeled as 1 and vertices B, C, D, and E are labeled as 0. If the labeling is in the direction of B->C, B->D, and B->E in the next step, three vertices (C, D, and E) will be thrown away. On the other hand, if the labeling of vertices is in the opposite direction (C->B, D->B, and E->B), only vertex B will be thrown away.

This motivates us to pursue some heuristics to further reduce the number of vertices that will be thrown away. Two heuristic rules are given below:



























[Figure 3.3: graph with vertices A, B, C, D, and E]

Figure 3.3. An Example of Graph Traversal Using the Heuristic Graph Traversal Algorithm










1. Pick the vertex with the largest degree (i.e., the largest number of edges) as the starting vertex. The idea is to expand the set of front vertices as soon as possible.

2. In exploring the adjacent vertices of a front vertex, start with the vertex of the smallest degree. The justification for this rule is to throw away the vertices that break up most odd cycles as soon as possible. The odd cycles are the ones with an odd number of vertices.

If we incorporate the heuristics proposed above to the dynamic graph traversal algorithm, the number of thrown-away vertices can be significantly reduced. The example in Figure 3.3 throws away only one vertex when the heuristic graph traversal algorithm is used.
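The effect of the two rules can be checked with a short Python sketch of a single traversal. The edge set below is inferred from the description of Figure 3.3 (A adjacent to B, C, D, and E; B also adjacent to C, D, and E) and is therefore an assumption, as are the function name `thrown_away` and the use of alphabetical order to stand in for an unlucky random order.

```python
from collections import deque


def thrown_away(adjacency, start, neighbor_order):
    """Run one 0/1 traversal from start and return the discarded vertices.
    neighbor_order decides the order in which adjacent vertices are explored."""
    label = {start: 1}
    thrown = set()
    front = deque([start])
    while front:
        u = front.popleft()
        if u in thrown:
            continue
        for v in neighbor_order(adjacency[u]):
            if v in thrown:
                continue
            if v not in label:
                label[v] = 1 - label[u]
                front.append(v)
            elif label[v] == label[u]:
                thrown.add(v)
    return thrown


graph = {"A": {"B", "C", "D", "E"}, "B": {"A", "C", "D", "E"},
         "C": {"A", "B"}, "D": {"A", "B"}, "E": {"A", "B"}}
degree = {v: len(neighbors) for v, neighbors in graph.items()}

# Without the heuristics: start at A, explore neighbors alphabetically (B first).
plain = thrown_away(graph, "A", sorted)

# With the heuristics: start at a largest-degree vertex, explore low degree first.
start = max(graph, key=degree.get)
by_degree = thrown_away(graph, start,
                        lambda nbrs: sorted(nbrs, key=lambda v: (degree[v], v)))

print(sorted(plain), sorted(by_degree))  # prints ['C', 'D', 'E'] ['B']
```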

3.2 Analysis of the Dynamic Graph Traversal Algorithm

The dynamic graph traversal algorithm has some mathematical properties which yield a simpler analysis of the bound of the approximate chromatic number. First, we state some definitions, theorems, and related terms.


Definition 3.2.1 Bipartite graph: A graph G(V, E) is a bipartite graph if there exist subsets V1 and V2 of V, where $V_1 \uplus V_2 = V$ ($\uplus$ denotes the union of disjoint sets), such that each edge of the graph (i.e., $e \in E$) has one endpoint in V1 and the other in V2. A graph is a bipartite graph if and only if it is two-colorable.


Corollary 3.2.1 The dynamic graph traversal algorithm is an efficient way of determining if a graph is two-colorable (bipartite). Applying the dynamic traversal algorithm to a graph G, if a vertex is thrown away in the middle of traversal, the graph is not two-colorable.










Corollary 3.2.2 Each traversal of the dynamic graph traversal algorithm partitions the vertices of the graph into three groups of vertices: vertices with color 1, vertices with color 2, and the thrown-away vertices.

Definition 3.2.2 Independent set: A set of vertices is said to be independent if no two of its vertices are adjacent to each other.

Definition 3.2.3 Maximal independent set: An independent set I is maximal if no other independent set in graph G contains I.

Theorem 3.2.1 For a connected graph, each traversal of the dynamic algorithm generates two maximal independent sets (i.e., the set of vertices with color 1 (set 1) and the set of vertices with color 2 (set 2)). It should be noted that a maximal independent set is not the same as a maximum independent set.

Proof of Theorem 3.2.1 The claim that the set of vertices with color 1 is a maximal independent set is based on the following argument. No vertices in set 2 can be added to set 1 without breaking its independence, since each vertex in set 2 is adjacent to at least one vertex in set 1. Furthermore, no vertex in the set of thrown-away vertices (set 3) can be added to set 1, since each vertex in set 3 is adjacent to at least one vertex in set 1 and one vertex in set 2. As a result, set 1 must be a maximal independent set. The same argument can be used to prove that set 2 is also a maximal independent set.


Theorem 3.2.2 The dynamic graph traversal algorithm produces the exact chromatic number 2 for a bipartite graph.

Proof of Theorem 3.2.2 Without loss of generality, it can be assumed that the graph G is connected. (If the graph is not connected, the algorithm simply treats each










component separately.) For a connected bipartite graph, the number of maximal independent sets is two, which can be determined by the dynamic graph traversal algorithm in the first traversal of the graph (Theorem 3.2.1). No vertices are thrown away. As a result, the algorithm generates the exact chromatic number for a bipartite graph.

Definition 3.2.4 k-partite graph: A graph whose vertex set can be partitioned as $V = V_1 \uplus V_2 \uplus \cdots \uplus V_k$ such that there exists no edge between a pair of vertices located in the same partitioned set.

Definition 3.2.5 Complete k-partite graph: A k-partite graph is complete if there exists an edge between each pair of vertices located in distinct partitioned sets.

Theorem 3.2.3 The dynamic graph traversal algorithm always produces the exact chromatic number k for a complete k-partite graph.


Proof of Theorem 3.2.3 Let G(V, E) be a complete k-partite graph with vertex partitions V1, V2, ..., and Vk. Each edge of this graph has its endpoints in two distinct partitions (see Figure 3.4a for an example of a k-partite graph). Without loss of generality, a vertex s1 in V1 is picked as the starting vertex and is assigned a count of 1. In Figure 3.4b, the adjacent vertices of the starting vertex s1 (all the vertices not in V1) are assigned a count of 0 according to the dynamic graph traversal algorithm. Now, all the vertices not in V1 are the front vertices. In the next step, pick a vertex from the set of front vertices, say, s2 in V2, for expansion. The adjacent vertices of s2 are given a count of 1 as shown in Figure 3.4c. At this step, all the vertices in V3 through Vk are thrown away because the sum of two counts is odd, and the vertices in V1 and V2 are assigned colors 1 and 2, respectively. Thus, a new complete (k-2)-partite graph is formed with vertex partitions V3, V4, ..., and Vk. By repeating the same steps, the dynamic graph traversal algorithm uses k colors for the complete k-partite graph.









[Figure 3.4, panels (a)-(c): vertex partitions V1, V2, V3, ..., Vk with starting vertices s1 and s2 and their 0/1 counts]

Figure 3.4. An Example of Graph Traversal on a K-partite Graph










Definition 3.2.6 Random graph model: A random graph model M(|V|, q) is defined in terms of the number of vertices of a graph (|V|) and the probability (q) that an edge exists between any distinct pair of vertices, where 0 < q < 1. This model consists of a set of graphs, each of which has |V| vertices and whose edges are determined by a random number generator. For a given q value, if a random number is less than q, then the edge between two vertices exists. A set of graphs is generated for the model in this manner.


The random graph model is used in the performance evaluation of the dynamic graph traversal algorithm, since it generates all possible graphs with equal probability.
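One graph of the model M(|V|, q) can be drawn with a few lines of Python. This is a minimal sketch using the standard `random` module; the function name `random_graph` and the optional `seed` parameter are assumptions added for repeatability.

```python
import random


def random_graph(num_vertices, q, seed=None):
    """Draw one graph: each distinct pair of vertices is joined by an edge
    when a uniform random number in [0, 1) is less than q."""
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(num_vertices)}
    for u in range(num_vertices):
        for v in range(u + 1, num_vertices):
            if rng.random() < q:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


g = random_graph(50, 0.5, seed=1)
edges = sum(len(neighbors) for neighbors in g.values()) // 2
print(len(g), edges)  # 50 vertices; the edge count varies with the seed
```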


Definition 3.2.7 Clique: A clique here means a maximal complete subgraph of a graph.


Theorem 3.2.4 The expected number of maximal independent sets of a random graph containing a fixed vertex has been derived in [35,59] as follows:

$$E_1 = \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d}$$

where

E1: the expected number of maximal independent sets of a random graph model containing a fixed vertex

|V|: the number of vertices in graph G

q: the probability that an edge exists between any pair of vertices in a random graph










The calculation of the number of maximal independent sets for a random graph model M(|V|, q) is done by counting the number of cliques and replacing q with (1 - q), since a maximal independent set of a graph is a clique of its complement graph [15]. The counting of cliques can be explained as follows:

1. The size of cliques ranges from 1 to |V|.

2. For each size of clique (d), there will be $\binom{d}{2} = d(d-1)/2$ edges, and the probability that all of them exist is q raised to the d(d-1)/2th power.

3. If a vertex can be added to a clique, it has to be fully connected to the vertices in the clique. The probability that this situation happens is q raised to the dth power. The probability that a vertex cannot be added to a clique is therefore $(1 - q^d)$. Since there are $(|V| - d)$ vertices not in the clique, $(1 - q^d)$ has to be raised to the $(|V| - d)$th power.

4. The product of the probabilities derived in items 2 and 3 is the probability that a clique with d vertices exists.

5. Since a fixed vertex must be included in the clique, the number of combinations for a clique of size d is

$$\binom{|V|-1}{d-1}$$

6. The product of the terms derived in items 4 and 5 is the expected number of cliques of d vertices in a random graph model with |V| vertices under the condition that a fixed vertex must be included in the clique.

7. $\bar{E}_1$ (the expected number of cliques) is calculated by taking the summation of the expected numbers of cliques for different sizes of cliques.

$$\bar{E}_1 = \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} q^{d(d-1)/2} \left(1-q^d\right)^{|V|-d}$$

8. The expected number of maximal independent sets is obtained by replacing q with (1 - q) in $\bar{E}_1$, since cliques and maximal independent sets complement each other.

$$E_1 = \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d}$$
d=1



Theorem 3.2.5 The expected number of maximal independent sets of a random graph model is

$$E_2 = \sum_{d=1}^{|V|} \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d}$$

where

E2: the expected number of maximal independent sets of a random graph model

The calculation is similar to that of Theorem 3.2.4.

Definition 3.2.8 Color class: In a colored graph, the set of vertices with the same color forms a color class.

Definition 3.2.9 Expected chromatic number (ECN): The expected value of the chromatic number for a random graph model is calculated by taking the summation of the product of each possible chromatic number and its probability.










It should be noted that the closed form for the expected chromatic number is not known at this moment. In this section, the approximate value of the expected chromatic number which can be obtained by applying the dynamic graph traversal algorithm to a random graph is computed.

Methods for coloring random graphs have been proposed by several researchers [4,14,24,32]. Various constraints on the random graphs, such as small graphs, dense graphs, etc., were introduced in their algorithms. This work considers general random graphs without constraints. In the rest of this section, the theoretical basis for evaluating the performance of the dynamic graph traversal algorithm on general random graphs is provided.

Given a random graph, the assignment of colors to its vertices is done by applying the dynamic graph traversal algorithm repeatedly to the random graph until each vertex has been assigned a color. Since each traversal produces two maximal independent sets, and each maximal independent set forms a color class, the number of maximal independent sets produced by the algorithm is equal to the approximate value of the expected chromatic number of the random graph model defined by |V| and q.

The approximate value of the expected chromatic number can be calculated as follows:

Step 1. Calculate the expected size of the maximal independent sets (ESMIS).

The expected size of maximal independent sets (ESMIS) is equal to the sum, over d from 1 to |V|, of the number of maximal independent sets with d vertices times d, divided by the total number of maximal independent sets, i.e.,

$$\mathrm{ESMIS} = \frac{\sum_{d=1}^{|V|} d \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d}}{\sum_{d=1}^{|V|} \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d}}$$

The numerator can be simplified.

$$\begin{aligned}
&\sum_{d=1}^{|V|} d \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d} \\
&\quad= \sum_{d=1}^{|V|} \frac{|V|!}{(d-1)!\,(|V|-d)!} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d} \\
&\quad= |V| \sum_{d=1}^{|V|} \frac{(|V|-1)!}{(d-1)!\,\bigl(|V|-1-(d-1)\bigr)!} (1-q)^{d(d-1)/2} \left(1-(1-q)^d\right)^{|V|-d} \\
&\quad= |V|\, E_1
\end{aligned}$$

Thus,

$$\mathrm{ESMIS} = \frac{|V|\, E_1}{E_2}$$

where E1 and E2 are the expected number of maximal independent sets containing a fixed vertex and the expected number of maximal independent sets, respectively.










Step 2. The approximate value of the expected number of maximal independent sets is calculated by dividing the number of vertices of the graph by the expected size of maximal independent sets, i.e.,

$$\mathrm{AECN} = \frac{|V|}{\mathrm{ESMIS}} = \frac{E_2}{E_1}$$

where

AECN = approximate value of the expected chromatic number

|V| = number of vertices of the random graph

ESMIS = expected size of maximal independent sets

The dynamic graph traversal algorithm partitions a random graph into a number of maximal independent sets which are picked from a pool of all possible maximal independent sets for the random graph. Depending on the patterns of random graphs, different sets of maximal independent sets will be chosen. Even for random graphs with the same pattern, the chosen maximal independent sets may vary, since the algorithm picks the starting vertex arbitrarily when traversing the graph and the order of exploring the adjacent vertices of a front vertex is random. Since each maximal independent set of the pool of maximal independent sets has the same probability to be chosen, and the expected size of the maximal independent sets in the pool can be computed as a function of |V| and q, |V|/ESMIS is the expected number of maximal independent sets that will be produced by the dynamic graph traversal algorithm. If each maximal independent set is a color class, the expected number of maximal independent sets is the approximate value for the expected chromatic number.

As an example, for a random graph with 50 vertices and the probability that an edge exists between any pair of vertices is 0.5, the following values can be derived using the above equations.











E1 = 98.78

E2 = 963.50

ESMIS = 5.14

AECN = 9.7
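These quantities can be evaluated numerically with a short Python sketch of the formulas in Theorems 3.2.4 and 3.2.5. The function names `expected_mis_counts` and `aecn` are assumptions made for illustration; for |V| = 50 and q = 0.5 the sums should come out close to the values listed above.

```python
from math import comb


def expected_mis_counts(n, q):
    """Return (E1, E2) for the random graph model M(n, q)."""
    p = 1.0 - q                      # probability that a given pair has no edge
    e1 = e2 = 0.0
    for d in range(1, n + 1):
        # probability that d given vertices form a maximal independent set
        common = p ** (d * (d - 1) // 2) * (1.0 - p ** d) ** (n - d)
        e1 += comb(n - 1, d - 1) * common   # sets containing a fixed vertex
        e2 += comb(n, d) * common           # all sets of size d
    return e1, e2


def aecn(n, q):
    """Approximate expected chromatic number |V| / ESMIS = E2 / E1."""
    e1, e2 = expected_mis_counts(n, q)
    esmis = n * e1 / e2
    return n / esmis


e1, e2 = expected_mis_counts(50, 0.5)
print(round(e1, 2), round(e2, 2), round(aecn(50, 0.5), 2))
```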

3.3 Comparison of Results

In this section, both the simulation results obtained by the method that orders vertices and uses a similarity matrix [61] and the analytical results of the dynamic graph traversal algorithm are provided. The results are shown in Table 3.1.

Table 3.1. The Approximate Values of the Expected Chromatic Number by Wood's Method and the Dynamic Graph Traversal Algorithm

|V|    q      α        β
20     0.25   3-5      3.11
20     0.50   5-8      5.11
20     0.75   8-10     7.89
50     0.25   6-8      5.43
50     0.50   10-13    9.75
50     0.75   16-20    15.73
100    0.25   10-13    8.78
100    0.50   18-22    16.52
100    0.75   28-33    27.63



In the table, |V| and q are the number of vertices and the probability that an edge exists between a pair of vertices, respectively. α is the range in which most approximate chromatic numbers produced by Wood's algorithm fall, and β is the approximate value of the expected chromatic number computed using our formula. As shown in the table, β stays in the low end of α for various values of |V| and q. As can be seen, the dynamic graph traversal algorithm produces better results. It should be noted that the comparison is based on the analytical results of the dynamic algorithm










and the simulation results of Wood's method. Since the domains of samples are not exactly the same, the difference between α and β may not be as large as is shown in Table 3.1. A comparison based on simulation results will be included in future work.
















CHAPTER 4
DPBN AND GRAPH COLORING ALGORITHM

One way to increase bus capacity is to physically partition the bus, allowing parallel communications in the fragmented segments. Figure 4.1 shows the architecture of a partitionable bus [3,54] which consists of a number of stations connected to a shared communication network in a multipoint configuration. A number of switches are used to physically partition the bus into clusters, each of which contains a number of adjacent stations. For example, by closing all switches except the kth switch, two clusters are formed; one contains stations 1 to k and the other contains k+1 to n stations where n is the total number of stations in the network. Communications among stations in one cluster can be carried out in parallel with the communications in another cluster. The control of the switches can be either centralized or distributed. In a centralized control, the control computer sets the switches for partitioning or connecting the bus. In a decentralized control, an individual station turns the switch on and off when receiving a command from the control computer.

The above idea of using a dynamically partitionable bus will be useful if there is an efficient algorithm to distinguish conflicting and non-conflicting communication requests so that the network can be properly partitioned to allow a maximum of non-conflicting requests to be processed in parallel.

In the context of a partitionable bus network, two communication requests are said to be in conflict if the range of adjacent stations defined by the sender and the receiver of one request overlaps with the range defined by the sender and the receiver of the other request. For example, if station 1 wants to send a message to station




























[Figure 4.1: Station 1, Station 2, ..., Station n connected along a bus divided by switches, with a control computer]

Figure 4.1. The Architecture of a Partitionable Bus Network








41


4 and station 3 wants to send a message to station 5 at the same time, these two requests would be in conflict since the range 1 to 4 overlaps with the range 3 to 5 and the bus can not be partitioned to allow simultaneous transmissions of these requests. Graph traversal algorithms can be applied to efficiently distinguish conflicting and non-conflicting requests.
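The definition of a conflict amounts to an overlap test on two closed ranges of station numbers. The following Python sketch shows one way to express it; the function name in_conflict is hypothetical. Two requests that merely share an end station are treated as conflicting, which agrees with the strict inequalities of the procedure in Section 4.3.1.

```python
def in_conflict(req_a, req_b):
    """Return True if two (sender, receiver) requests need overlapping bus ranges.

    Each request occupies the closed range of stations between its sender and
    its receiver, whichever direction the message travels.
    """
    lo_a, hi_a = sorted(req_a)
    lo_b, hi_b = sorted(req_b)
    return lo_a <= hi_b and lo_b <= hi_a


print(in_conflict((1, 4), (3, 5)))  # ranges 1-4 and 3-5 overlap
print(in_conflict((1, 2), (3, 4)))  # disjoint ranges
print(in_conflict((1, 2), (2, 4)))  # shared end station 2
```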

This chapter is organized as follows: Section 4.1 describes the application of the dynamic graph traversal algorithm. Section 4.2 delineates the bus partitioning technique. Section 4.3 presents the hardware design of the dynamic graph traversal algorithm. Section 4.4 gives the results of a performance evaluation.

4.1 A Graph Traversal Algorithm and Its Application

In Section 3.1, three graph traversal algorithms (static, dynamic, and heuristic) were presented. Each traversal results in some colored vertices and some "thrown-away" vertices. Through repeated traversals of the "thrown-away" vertices, all vertices are eventually assigned colors. The dynamic graph traversal algorithm has several advantages over the static traversal algorithm and is suitable for real-time applications.

The application of the dynamic algorithm in a computer system is as follows. If each vertex of the graph (Figure 3.2) represents an event of a computer system, the events corresponding to vertices B, D, and F can be processed in parallel, as can the events corresponding to vertices A and E. The corresponding event of vertex C, which is thrown away in the graph traversal, will join the next group of processes to compete for a shared system resource.

As a result of separating conflicting and non-conflicting events, the system performance can be enhanced. If the vertices in Figure 3.2 represent communication requests of a single shared bus, the system throughput can almost be doubled. In the case of a very busy network, the increase in system throughput will be even higher. The traversal time of this algorithm is negligible since it is a very simple algorithm and the time can most probably be overlapped with the communication processes.

4.2 Dynamic Bus Partitioning Technique

This section shows how the graph coloring algorithm can be applied in a dynamically partitionable bus network. The following steps are required to construct a graph for a partitionable bus network.

1. Represent each communication request from one station to another by a vertex in the graph.

2. Determine the conflicts among the communication requests. If there is a conflict between two communication requests as defined in the beginning of this chapter, draw an edge between the corresponding vertices.

3. Apply the graph traversal algorithm to the constructed graph. (Note: Some of the vertices may be thrown away as described in the graph traversal algorithm.)

4. At time t1, all the vertices (representing communication requests) with color 1 are allowed to perform communication in parallel in different clusters of stations, which are formed by physically partitioning the bus. Then, at time t2, all the vertices with color 2 are allowed to perform communication in parallel by repartitioning the bus. The vertices which have been thrown away in this traversal will join the next group of communication requests.

Tables 4.1 and 4.2 and Figures 4.2 and 4.3 illustrate the steps of the bus partitioning technique. Five communication requests are collected in a communication request table (Table 4.1). Each request is represented by a vertex in the graph; for example, the communication request from station 1 to station 2 is denoted by vertex A. Table 4.2 shows the conflicts among the communication requests. A "C" in a table entry indicates that there is a conflict between the request on the row and the request on the column. The graph representation of the communication requests shown in Figure 4.2 is constructed according to the conflicting events shown in Table 4.2. In Figure 4.2, communication requests are represented by vertices (A, B, C, D, and E) and edges are drawn to indicate their conflicts. Figure 4.3 shows the result of the graph traversal.

Table 4.1. A Graph Representation of a Group of Communication Requests

From station   To station   Graph representation (vertex)
     1              2                    A
     2              4                    B
     3              4                    C
     4              5                    D
     5              6                    E




Table 4.2. The Table of Conflicts among Communication Requests

        1-2   2-4   3-4   4-5   5-6
1-2            C
2-4      C           C     C
3-4            C           C
4-5            C     C           C
5-6                        C



Figure 4.2. A Graph Representation of Conflicts and Requests
[Figure: vertices A (1-2), B (2-4), C (3-4), D (4-5), and E (5-6), joined by edges for the conflicts listed in Table 4.2.]

Figure 4.3. The Graph Traversal of the Constructed Graph
[Figure: the graph of Figure 4.2 labeled with the counts (1, 0, or both) assigned to the vertices during the traversal.]

In this example, vertices A and C are assigned color 1, vertices B and E are assigned color 2, and vertex D is thrown away. At time t1, the communication request from station 1 to station 2 and the communication request from station 3 to station 4 can be carried out in parallel by opening the switch between station 2 and station 3 and closing all other switches. At time t2, the communication request from station 2 to station 4 and the communication request from station 5 to station 6 can be carried out in parallel by resetting the switches properly. The communication request from station 4 to station 5, which is represented by vertex D, will join the next group of communication requests. In a system that uses a dynamically partitionable bus, the bus can be properly partitioned to allow parallel communication to take place at different times. Thus, system throughput can be dramatically increased by using the graph traversal algorithm in conjunction with the partitionable network. The example given above demonstrates our approach and algorithm. The results of a simulation study of DPBN will be given in Section 4.4.
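The switch settings for each time slot of this example can be derived mechanically from the coloring. The Python sketch below is one possible way to do so and is not taken from the original design. It assumes that switch k sits between station k and station k+1 and that opening the single switch just after each request's range is sufficient; the function name open_switches_for is hypothetical.

```python
def open_switches_for(requests):
    """Pick the switches to open so each request gets its own bus segment.

    Assumptions: switch k sits between station k and station k+1, the requests
    passed in are mutually non-conflicting (same color), and one open switch
    just after each request's range is enough to separate it from the next.
    """
    ranges = sorted(tuple(sorted(r)) for r in requests)
    opened = []
    for (_, hi), (next_lo, _) in zip(ranges, ranges[1:]):
        if hi >= next_lo:
            raise ValueError("requests in one time slot must not conflict")
        opened.append(hi)  # switch between station hi and station hi + 1
    return opened


# Coloring from the example: color 1 = {A, C}, color 2 = {B, E}, D deferred.
schedule = {
    "t1": [(1, 2), (3, 4)],  # vertices A and C
    "t2": [(2, 4), (5, 6)],  # vertices B and E
}
for slot, requests in schedule.items():
    print(slot, "open switches:", open_switches_for(requests))
```

For time t1 the sketch opens the switch between stations 2 and 3, as in the example. For time t2 it opens the switch between stations 4 and 5, which is one setting that satisfies the requirement; the text does not state which setting was actually used.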

There are several notable features in our approach to solving contention problems in a computer system.

1. Flexible Representation: Vertices can be used to represent any type of request for any resource in a computer system. Any resource contention problem can be transformed into a graph coloring problem. For instance, in database management, vertices can denote database operations (JOIN, SELECT, and PROJECT), and edges can represent memory access conflicts among database operations. In multiple bus networks, partitionable networks, and multistage interconnection networks, vertices and edges can represent communication requests and conflicts in using communication network(s), respectively. This flexibility in problem representation allows for a broad range of applications of the presented algorithm. As long as a system has multiple processes competing for limited resources, the proposed approach can be applied.

2. Fast Running Time: The complexity of the graph traversal algorithm is O(E), where E is the number of edges of the graph, since each edge is processed only once. The required memory space for traversing the graph is small, and the traversal can overlap with packet communication and data processing to reduce the overhead involved (i.e., while requests are being processed in one time frame, the algorithm is used to traverse the graph for the next group of requests).

3. Multiple Ways of Interpreting the Colors Assigned to the Vertices: Depending on the application, the colors assigned to the vertices of the graph can be interpreted in a variety of ways. For instance, the colors can be interpreted as time frames in partitionable bus networks and database query processing. The requests corresponding to vertices assigned different colors must be performed in different time frames. However, in multiple bus networks, the colors can be interpreted as buses. The requests corresponding to vertices assigned different colors can be carried out on different buses in the same time frame.

4.3 Hardware Design of A Graph Traversal Unit

In a real-time environment, many requests for a resource arrive at the same time and a very fast scheduling decision needs to be made. The proposed algorithm may not meet the time requirement if it is implemented in software. In that case, a hardware implementation of the algorithm would be needed. In this section, a design of the graph traversal unit is presented.

4.3.1 Architecture of the Graph Traversal Unit

The graph traversal unit is a specialized hardware unit capable of distinguishing conflicting and non-conflicting communication requests by assigning either a "0" or a "1" to each communication request. It is connected to the control unit of the control computer and can operate in parallel with the normal operation of the control computer, which off-loads communication requests to the graph traversal unit. As shown in Figure 4.1, the control computer and the stations are attached to the partitionable bus. Figure 4.4 shows the detailed organization of the control computer, which consists of the usual components of a general-purpose computer, including disks, memory, DMA, and a control unit, together with a graph traversal unit. In general, the system works as follows:

1. The control computer collects communication requests from the stations and produces an adjacency matrix such as the one illustrated in Section 4.2. A communication request is composed of two numbers, the tag of the sender and the tag of the receiver. For each pair of communication requests, the corresponding tags are used to determine if there is a conflict. A conflict occurs if the range of consecutive stations represented by one pair of numbers overlaps with the range represented by the other pair, as shown in the following procedure.



    Procedure For Determining Conflict;
    (* Let the tags for a pair of communication requests be
       (T1a, T1b) and (T2a, T2b), respectively.
       Each pair of numbers represents a range of consecutive stations. *)
    begin
        if (T1a < T2a and T1a < T2b and T1b < T2a and T1b < T2b)
            then return no conflict
        else if (T1a > T2a and T1a > T2b and T1b > T2a and T1b > T2b)
            then return no conflict
        else
            return conflict
    end;

2. The control computer maps the adjacency matrix to the RAM of the graph traversal unit and activates the unit.

3. The graph traversal unit, driven by an internal controller, returns the results of the graph traversal to the control computer, which sets the switches on or off to partition the bus accordingly and allow messages to go through in parallel.

It should be noted that the determination of conflicts among communication requests can be done by either the CPU of the control computer or the graph traversal unit. If the graph traversal unit determines the conflicts, an additional logic unit would be required.
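As an illustration of step 1, the conflict procedure can be transliterated into Python and used to fill an adjacency matrix. This is only a sketch: the function names conflict and adjacency_matrix are hypothetical, and the choice to fill only the upper triangle follows the RAM layout described in Section 4.3.2.

```python
def conflict(t1a, t1b, t2a, t2b):
    """Transliteration of the conflict procedure: tags are station numbers."""
    if t1a < t2a and t1a < t2b and t1b < t2a and t1b < t2b:
        return False  # first range lies entirely below the second
    if t1a > t2a and t1a > t2b and t1b > t2a and t1b > t2b:
        return False  # first range lies entirely above the second
    return True


def adjacency_matrix(requests, size):
    """Fill only the upper triangle, as the RAM layout in Section 4.3.2 does.

    `size` is the RAM dimension (128 in the text); unused rows stay zero.
    """
    matrix = [[0] * size for _ in range(size)]
    for i, (a1, b1) in enumerate(requests):
        for j in range(i + 1, len(requests)):
            a2, b2 = requests[j]
            if conflict(a1, b1, a2, b2):
                matrix[i][j] = 1
    return matrix


# Requests A..E from Table 4.1, stored in an 8 x 8 RAM as in the worked example.
table_4_1 = [(1, 2), (2, 4), (3, 4), (4, 5), (5, 6)]
for row in adjacency_matrix(table_4_1, 8):
    print(*row)
```

For the five requests of Table 4.1, the printed rows agree with the RAM contents listed in Table 4.3.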

4.3.2 Input and Output of the Graph Traversal Unit

The input to the graph traversal unit is the adjacency matrix, which is stored in a RAM of size 128 x 128. The RAM is essentially a two-dimensional array in which each element indicates the adjacency of the vertex on the row and the vertex on the column. A conflict between any two communication requests is indicated by placing a 1 in the corresponding position in the adjacency matrix. Depending on the number of communication requests being processed by the graph traversal unit, the size of the RAM can be changed to suit the system's needs. The number 128 was chosen arbitrarily for illustration purposes; it is a system design parameter.

Figure 4.4. The Detailed Organization of the Control Computer of the Partitionable Bus Network (DPBN)
[Figure: block diagram of the control computer showing the internal bus, the interface to the DPBN, and the GTU. Note: GTU stands for graph traversal unit.]

The results of a graph traversal are stored in two 128 x 1 registers (the status matrix), which use two bits (one bit from each register) to show the status of each vertex: colored with the first color, colored with the second color, or thrown away for the next traversal. In addition to the status matrix, a status code is used to indicate the status of the graph traversal unit. When the results of the traversal are ready, a signal is sent to the control computer. Conversely, the control computer may issue one of three signals to specify the external function to be executed by the graph traversal unit. The external functions are as follows:


1. Clear: The RAM is cleared before the adjacency matrix is loaded.

2. Load: The control computer loads the adjacency matrix into the RAM of the graph traversal unit.

3. Go: The graph traversal unit is activated. The controller of the graph traversal unit starts executing micro-instructions and produces results. The external function Go is composed of a sequence of logical operations.


The following example illustrates the traversal of the graph in Figure 4.2 by the GTU. To simplify the illustration, the size of the RAM is reduced to 8 x 8 and the size of the registers is reduced to 8 bits.

1. The graph traversal unit is enabled.

2. The RAM is loaded with the adjacency matrix. As shown in Table 4.3, each address of the RAM (addresses 000 through 111) represents a communication request. For instance, the address 000 represents the communication request from station 1 to station 2 (vertex A), and the vertices adjacent to vertex A are those whose bits are 1 in the row of address 000. Vertex B is the only vertex adjacent to vertex A (address 000), since the second bit of row 000 is 1 and all other bits are 0's. It should be noted that only the upper triangle of the RAM is filled with 1's for conflicting communication requests. For example, the conflict between vertex A (address 000) and vertex B (address 001) is indicated by putting a 1 in the entry (1, 2) instead of in the entry (2, 1).

Table 4.3. The Adjacency Matrix Stored in the RAM

000 (vertex A)   0 1 0 0 0 0 0 0
001 (vertex B)   0 0 1 1 0 0 0 0
010 (vertex C)   0 0 0 1 0 0 0 0
011 (vertex D)   0 0 0 0 1 0 0 0
100 (vertex E)   0 0 0 0 0 0 0 0
101 (unused)     0 0 0 0 0 0 0 0
110 (unused)     0 0 0 0 0 0 0 0
111 (unused)     0 0 0 0 0 0 0 0



3. There are eight working registers (W0, W1, ..., W7) used to support the hardware graph traversal algorithm. They are initialized with the values shown in Table 4.4.

Table 4.4. The Initial Values of the Registers (W0, W1, W2, ..., W7)

W0   0 0 0 0 0 0 0 0
W1   0 0 0 0 0 0 0 0
W2   0 0 0 0 0 0 0 0
W3   1 0 0 0 0 0 0 0
W4   0 0 0 0 0 0 0 0
W5   1 1 1 1 1 1 1 1
W6   1 0 0 0 0 0 0 0
W7   0 0 0 0 0 0 0 0

Each bit of a working register represents the status of the corresponding vertex. For instance, the first bit of a working register represents the status of vertex A, the second bit represents the status of vertex B, etc. The purposes of these working registers and their relationship to the dynamic graph traversal algorithm are explained below:










In the dynamic algorithm, the starting vertex is randomly chosen. However, the starting vertex must not be an isolated vertex. W5 is used for searching for a starting vertex. This is done by searching the RAM row by row. The vertex corresponding to a row that is not null can be designated as the starting vertex. In this example, vertex A (address 000) is chosen as the starting vertex. Once the starting vertex (front vertex) is chosen, the adjacent vertices of the front vertex need to be explored. W0 stores the adjacent vertices of the front vertex. The register W0 is an 8-bit register. Each bit indicates the status (adjacent or non-adjacent) of the corresponding vertex with respect to the front vertex. For instance, when the row of address 001 of the RAM is loaded into W0, the third and fourth bits of W0 are 1's. This means that vertex C (represented by the third bit) and vertex D (represented by the fourth bit) are adjacent to vertex B (RAM address 001).

In the process of traversal, the adjacent vertices need to be assigned the current count. W1 stores the current count that is being assigned to the vertices. The bits of W1 are either all 1's or all 0's. After the expansion of each of the front vertices, the current count is replaced with its bitwise complement. In other words, the count alternates between 1 and 0. To record the adjacent vertices that have been assigned the current count (either 0 or 1), W2 and W3 are used. W2 stores the current count of the adjacent vertices. In each expansion of a front vertex, the adjacent vertices of the front vertex are assigned the current count stored in W1. If the current count stored in W1 is 1, the contents of W0 are copied to W2. This operation assigns a count of 1 to the bits of W2 where the corresponding bits in W0 are 1. If the current count stored in W1 is 0, no count needs to be assigned, since W2 is initialized to zero.










W3 stores the history of counts (accumulated counts) assigned to each vertex. It should be noted that one bit in W3 is enough for recording the history of the counts of a vertex, since multiple 1's are the same as a single 1 and multiple 0's are the same as a single 0 in evaluating whether a vertex should be thrown away or what color should be assigned to it. The current count assigned to W2 is accumulated in W3 after each expansion of a front vertex. In other words, W3 contains all the counts which have been assigned to the vertices. This can be done by taking a logical "OR" operation between W2 and W3. The first bit of W3 in this example is initialized to 1, since vertex A is designated as the starting vertex and has been assigned a count of 1.

While the current count is being assigned to each adjacent vertex, each vertex is checked to see whether the "thrown-away" condition (i.e., the vertex has at least one count of 1 and at least one count of 0) is met. The results of this check are stored in W4. The vertices corresponding to the bits with the value 1 in W4 are "thrown-away" vertices. The "thrown-away" vertices can be determined by taking a logical "AND" operation between W3 and W7 (described below).

After every vertex in the current set of front vertices has been expanded, their adjacent vertices become the new set of front vertices. W6 stores the set of front vertices. The vertices corresponding to the bits with a value of 1 are the front vertices. In our example, the first bit of W6 is initialized to 1, since vertex A is the starting vertex (front vertex). In order to distinguish between a vertex assigned a count of 0 and a new vertex (a vertex that has no count), W7 is used to keep track of the vertices that have been assigned a count of 0. A vertex that has been assigned a count of 0 is identified by putting a 1 in the corresponding bit of W7. This can be achieved by taking a logical "OR" operation between W0 and W7. In this example, W7 is initially null, since no vertices have been assigned a count of 0.

4. The adjacent vertices of the starting vertex (front vertex) are explored by loading the row of address 000 into W0 (W0 ← RAM[MAR]). Here, MAR stands for the memory address register. The contents of W0 are shown below.

W0 0 1 0 0 0 0 0 0

5. Keep track of the vertices that have been assigned a count of 0 (W7 ← W0 ∨ W7). When a front vertex is expanded, some vertices are newly traversed. The newly traversed vertices (indicated by the mark bits with 1's in W0) are added to the set of vertices which have already been assigned a count of 0 (indicated by the mark bits with 1's in W7) by taking a logical "OR" operation between W0 and W7. It should be noted that this step is necessary only when the current count is 0. The contents of W7 are shown below.

W7 0 1 0 0 0 0 0 0

6. Establish a new set of front vertices and complement the count (W6 ← W0 and W1 ← ¬W1). W1 contains the current count of the traversal. When a new set of front vertices is established, the count is replaced with its bitwise complement, as described in the algorithm. The contents of W1 and W6 are as follows:

W1 1 1 1 1 1 1 1 1

W6 0 1 0 0 0 0 0 0










7. Encode each mark bit of W6. The data stored at the encoded address of the RAM are read into W0 (W0 ← RAM[MAR]). The explanation is very similar to that of step 4: each of the front vertices is explored by loading the corresponding row of the RAM into W0. The contents of W0 are shown below.

W0 0 0 1 1 0 0 0 0

8. Assign the current count to each adjacent vertex (W2 ← W0). W0 and W1 contain the adjacent vertices of the front vertex and the current count, respectively. Since the current count is 1, this operation assigns the current count to the adjacent vertices of the front vertex. The contents of W2 are now

W2 0 0 1 1 0 0 0 0

9. Accumulate the counts (W3 ← W2 ∨ W3). W3 contains the accumulated counts and W2 contains the count assigned to the adjacent vertices. The logical "OR" operation accumulates the counts. The contents of W3 are now

W3 1 0 1 1 0 0 0 0

10. Check for "thrown-away" vertices (W4 ← W3 ∧ W7). W3 and W7 keep track of the counts of 1 and the counts of 0 assigned to the vertices, respectively. The logical "AND" operation determines the "thrown-away" vertices. The vertex corresponding to a bit position with value 1 in both W3 and W7 is a "thrown-away" vertex, since a 1 in W3 indicates that the vertex has been assigned a count of 1 and a 1 in W7 indicates that the vertex has been assigned a count of 0. The contents of W4 are now

W4 0 0 0 0 0 0 0 0










11. Since no vertex has been thrown away in the step above, a new set of front vertices is established by copying W0 to W6. At the same time, the current count is replaced with its bitwise complement (W6 ← W0 and W1 ← ¬W1). The contents of W1 and W6 are now

W1 0 0 0 0 0 0 0 0

W6 0 0 1 1 0 0 0 0

12. The mark bits with value 1 in W6 are encoded and the data read from the RAM are stored in W0 (W0 ← RAM[MAR]). The contents of W0 are now

W0 0 0 0 1 0 0 0 0

13. Assign the current count to each adjacent vertex. Since the current count stored in W1 is 0, no logical operations are required to assign the current count. The contents of W2 are now

W2 0 0 0 0 0 0 0 0

14. Keep track of the vertices that have been assigned a count of 0 (W7 ← W0 ∨ W7). The step is very similar to step 5. The contents of W7 are shown below.

W7 0 1 0 1 0 0 0 0

15. Check for "thrown-away" vertices (W4 ← W3 ∧ W7). The explanation is similar to that of step 10. The contents of W4 are now

W4 0 0 0 1 0 0 0 0










16. Encode the mark bits of 1's in W4 to clear the rows of the encoded addresses in the RAM. Since the "thrown-away" vertices are indicated by a 1 in the corresponding bit of W4, the rows of the "thrown-away" vertices in the RAM can be cleared by encoding the mark bits of 1's in W4.

17. At this step, the RAM is clear and the traversal is done. The counts of the vertices are stored in W3. The vertices corresponding to the bits with 1's are assigned the first color, and the vertices corresponding to the bits with 0's are assigned the second color. The vertices corresponding to the bits with 1's in W4 are thrown away. In this example, vertices A and C are assigned the first color, vertices B and E are assigned the second color, and vertex D is thrown away. The contents of W3 and W4 are now

W3 1 0 1 1 0 0 0 0

W4 0 0 0 1 0 0 0 0

In summary, the traversal of a graph consists of a number of repeated cycles. A cycle starts with encoding the front vertices for retrieving the data stored in the RAM, followed by assigning the current count to the adjacent vertices, accumulating the counts, and checking the thrown-away vertices. The operations are simple and can be done in a very small amount of time.
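The cycle summarized above can be mimicked in software. The Python sketch below is a rough emulation of steps 4 through 17, not the microprogram itself. Two details are assumptions that the walk-through does not spell out: the neighbors of all front vertices are merged into the next set of front vertices, and a front vertex that has just been thrown away is skipped rather than expanded. The sketch also relies on the RAM being upper-triangular, as in Table 4.3, so that the traversal always terminates. The function name traverse is hypothetical.

```python
def traverse(ram, start):
    """Emulate the register steps of the worked example on an upper-triangular RAM.

    Assumptions (not spelled out in the text): the neighbours of all front
    vertices are merged into the next front, and a front vertex that has
    just been thrown away is skipped rather than expanded.
    """
    n = len(ram)
    ram = [row[:] for row in ram]          # work on a copy; rows get cleared
    w3 = [0] * n                           # vertices that received a count of 1
    w7 = [0] * n                           # vertices that received a count of 0
    w4 = [0] * n                           # thrown-away vertices
    w6 = [0] * n                           # current front vertices
    w3[start] = w6[start] = 1              # the starting vertex gets count 1
    count = 0                              # W1: count given to the next neighbours

    while any(w6):
        next_front = [0] * n
        for v in range(n):                 # priority order: left to right
            if not w6[v] or w4[v]:
                continue
            w0 = ram[v]                    # W0 <- RAM[MAR]
            target = w3 if count == 1 else w7
            for i in range(n):
                target[i] |= w0[i]         # W3 <- W2 OR W3, or W7 <- W0 OR W7
                next_front[i] |= w0[i]
            for i in range(n):
                w4[i] = w3[i] & w7[i]      # W4 <- W3 AND W7
                if w4[i]:
                    ram[i] = [0] * n       # clear the row of a thrown-away vertex
        w6 = [f & (1 - t) for f, t in zip(next_front, w4)]
        count = 1 - count                  # W1 <- NOT W1
    return w3, w4


# Table 4.3: vertices A..E at addresses 0..4, three unused rows.
table_4_3 = [
    [0, 1, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
w3, w4 = traverse(table_4_3, start=0)
print("W3", *w3)
print("W4", *w4)
```

Started at vertex A, the emulation ends with the same W3 and W4 contents as step 17 of the example.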

4.3.3 Organization of the Graph Traversal Unit

The graph traversal unit is a microprogrammed subsystem equipped with a local bus. The details of the graph traversal unit are shown in Figure 4.5. It consists of the following components:













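Figure 4.5. The Organization of the Graph Traversal Unit
[Figure: block diagram showing the GTU controller, the MAR, the RAM (128 x 128), the decoder, the encoder, the working registers through W7, the logic unit, the accumulator (ACC), and the temporary register (TEMP).]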










1. Data memory and MAR: The data memory is a RAM of size 128 x 128 used to store the adjacency matrix of communication requests. MAR specifies the memory address for storing/retrieving data to/from the data memory.

2. Controller: It contains the micro-instructions for controlling the operations of a graph traversal.

3. Working registers (W0, W1, ..., W7): The traversal of vertices involves exploring adjacent vertices, keeping track of front vertices, and checking for "thrown-away" vertices. The working registers are designed for marking and keeping track of the status of each vertex. A register is composed of 128 mark bits, where each mark bit indicates the count or status of the corresponding vertex (communication request).

4. Logic unit and accumulator: This is where the logic operations are performed. The logic unit performs operations on two operands: one stored in the accumulator and the other stored in one of the working registers.

5. Priority encoder and temporary register: The priority encoder and address decoder are adopted from the work reported in [30,46]. Since the working registers that need to be encoded contain mostly 0's, checking bit by bit for 1's would be very slow. The priority encoder is used to encode the mark bits of registers and generate addresses for retrieving the data stored in the RAM. It works as follows. The priority encoder generates the address of the highest-priority mark bit of 1. The generated address is loaded from the address bus into the RAM address register. The address is used to read from the RAM and to clear the mark bit. For example, the working register W6 with the contents (W6 0 1 0 1 1 0 0 0) indicates that vertices 1, 3, and 4 are the front vertices. Priorities are assigned to these bits from left to right. Starting with the highest-priority mark bit of 1 (bit 1), the address 00000001 is generated and the data stored at address 00000001 of the RAM are retrieved. After the retrieval, mark bit 1 is reset to allow the next highest-priority bit to be encoded. The same procedure holds true for mark bits 3 and 4. Another purpose of the priority encoder is to generate the address of the row in the RAM that needs to be cleared. For instance, for each of the "thrown-away" vertices, the address of the corresponding row in the RAM is generated by encoding the corresponding bit in W4. Once the address is generated, the temporary register, which is initialized to zero, is used to clear the corresponding row in the RAM.

6. Address decoder: The main purpose of the address decoder is to decode the address generated by the priority encoder for resetting the mark bits. In the middle of the traversal, each of the front vertices has to be explored for adjacent vertices. After a front vertex has been explored, the address of that front vertex has to be decoded in order to reset the corresponding mark bit and allow the next mark bit of 1 to be encoded. In the example above, after the data stored at address 00000001 have been retrieved, the address 00000001 has to be decoded by the address decoder to reset the mark bit and to allow the next mark bit of 1 to be encoded.
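The interplay of the priority encoder and the address decoder in items 5 and 6 can be modeled in a few lines of Python. This is a behavioral sketch only: the hardware encoder is intended to avoid the bit-by-bit scan that the loop below performs, and the function name encode_all is hypothetical. Bit positions are numbered from 0 at the left, following the example in item 5.

```python
def encode_all(register):
    """Yield the address of each mark bit of 1, highest priority (leftmost) first.

    Bit positions are numbered from 0 at the left, following the example in
    which contents 0 1 0 1 1 0 0 0 mark vertices 1, 3, and 4. This models the
    behaviour only; it says nothing about how the hardware finds the bit.
    """
    bits = list(register)                  # local copy whose mark bits get reset
    while any(bits):
        address = bits.index(1)            # priority encoder: leftmost 1
        yield address
        bits[address] = 0                  # address decoder resets that mark bit


w6 = [0, 1, 0, 1, 1, 0, 0, 0]
print(list(encode_all(w6)))
```

For the register contents of the example, the addresses are produced in the order 1, 3, 4.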

4.4 Performance Evaluation of the Partitionable Bus Networks

In this section, the performance of the dynamically partitionable bus network (DPBN) is compared with an "ideal" local area network using simulation. In addition, an aging analysis to avoid indefinite communication delay is discussed.










4.4.1 Simulation Study of DPBN

We will compare the performance of the dynamically partitionable bus network to that of an ideal network. An ideal local area network has a communication channel which has a 100 percent utilization rate and is free from any request contention. In other words, it has the theoretically optimal performance of any one-communication-at-a-time bus network. The model of the dynamically partitionable bus network used in the simulation is described below:


1. N stations are connected to a shared communication channel in a multipoint configuration. The probability that a station will issue a communication request at any instant of time is p.

2. The probability that a station will issue two communication requests in a very short period of time as compared with the time interval of communication is very small.

3. All communication requests take the same amount of time.

4. Communication requests are independent of each other.


In general, this model is characterized by two parameters: the number of stations connected to the bus (N) and the probability that a station will issue a communication request (p).

A simulation program that models the partitionable bus networks was implemented on a VAX 8600. The following steps describe the program:


1. For each station, if the number generated by the random number generator is less than p, a request is established. The destination of the request is determined by dividing the range [0, 1) into N (the number of stations attached to the bus) half-open, equal intervals, with each interval representing a station; the station corresponding to the interval in which the random number falls is the destination of the request.

2. An adjacency matrix is established by determining the conflicts among communication requests according to the procedure given in Section 4.3.

3. The graph traversal algorithm is applied to the adjacency matrix.

4. The number of colors assigned to vertices is the communication delay in units of time frames.
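Step 1 of the program can be sketched in Python as follows. This is not the original VAX 8600 program. It assumes that stations are numbered 1 to N and that a second random number is drawn to select the destination, and it does not filter out a destination equal to the sender, since the text does not say how that case is handled. The function name generate_requests and the seed value are hypothetical.

```python
import random


def generate_requests(n_stations, p, rng):
    """Poll every station once and return a list of (sender, receiver) requests.

    Assumptions: stations are numbered 1..N, a second random number selects
    the destination, and a destination equal to the sender is not filtered
    out, since the text does not say how that case is handled.
    """
    requests = []
    for sender in range(1, n_stations + 1):
        if rng.random() < p:                       # the station issues a request
            u = rng.random()                       # u lies in [0, 1)
            receiver = int(u * n_stations) + 1     # interval index k -> station k + 1
            requests.append((sender, receiver))
    return requests


rng = random.Random(1989)                          # fixed seed for repeatability
sample = generate_requests(20, 0.5, rng)
print(len(sample), "requests:", sample)
```

Each call polls every station once, so at most N requests are produced per run, in keeping with the second assumption of the model.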


The example below shows how the communication requests are generated and processed by the program. A number of requests, as shown in Table 4.5, are established by a random number generator. The adjacency matrix shown in Table 4.6 is constructed according to the procedure in Section 4.3. A 1 in a cell indicates a conflict between the request on the row and the request on the column. For instance, the cell (4, 1) is 1 because request 4 (station 8 sends messages to station 5) is in conflict with request 1 (station 20 sends messages to station 4).

Table 4.5. The Table of Communication Requests

Request   Sending Station   Receiving Station
   1            20                  4
   2             5                 10
   3             1                  3
   4             8                  5
   5             9                  4
   6             7                 13
   7             3                 18
   8            12                  8
   9            15                  4
  10            16                 18
















Table 4.6. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.5 (before graph traversal)

0 1 0 1 1 1 1 1 1 1
1 0 0 1 1 1 1 1 1 0
0 0 0 0 0 0 1 0 0 0
1 1 0 0 1 1 1 1 1 0
1 1 0 1 0 1 1 1 1 0
1 1 0 1 1 0 1 1 1 0
1 1 1 1 1 1 0 1 1 1
1 1 0 1 1 1 1 0 1 0
1 1 0 1 1 1 1 1 0 0
1 0 0 0 0 0 1 0 0 0













Table 4.7. The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.6

Discarded Vertices
4, 5, 6, 7, 8, 9










After the first graph traversal, the vertices, as shown in Table 4.7, are discarded.

A new adjacency matrix of the discarded vertices of the first graph traversal, as shown in Table 4.8, is established.

Table 4.8. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.7 (after the first graph traversal)











The vertices discarded in the second traversal and their adjacency matrix are shown in Tables 4.9 and 4.10, respectively.

Table 4.9. The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.8

Discarded Vertices





Table 4.10. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.9 (after the second graph traversal)




0 1
1 0



In the third traversal, only vertex 4 is thrown away. The total number of colors used is seven and the communication delay is seven units.










Table 4.11 shows the expected communication delays and standard deviations in units of time frames for N = 25, 50, 75, 100 and p = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. A time frame is the duration of each communication request. The numbers in parentheses are the communication delays for an ideal network. The expected communication delays for the DPBN are obtained by taking the average of 20 simulation runs. The communication delay for each run is calculated by polling each station for a request, constructing a conflict graph, and traversing the constructed graph. The expected communication delay for the ideal network is obtained by multiplying N by p. As shown in the table, the DPBN has a much shorter communication delay.

Table 4.11. The Expected Communication Delays and Standard Deviations in Units for a DPBN and an Ideal Bus Network (in parentheses)

D1/σ (D2)   N = 25               N = 50               N = 75               N = 100
p = 0.2     3.10/1.18 (5.0)      6.50/2.85 (10.0)     9.30/2.49 (15.0)     11.40/2.93 (20.0)
p = 0.3     5.00/1.34 (7.5)      10.20/2.17 (15.0)    13.30/2.34 (22.5)    17.00/3.09 (30.0)
p = 0.4     6.80/1.77 (10.0)     12.90/2.60 (20.0)    17.40/3.08 (30.0)    23.60/4.34 (40.0)
p = 0.5     8.80/2.04 (12.5)     14.75/2.19 (25.0)    22.80/2.89 (37.5)    29.20/4.31 (50.0)
p = 0.6     9.60/2.03 (15.0)     18.75/2.81 (30.0)    25.40/2.20 (45.0)    35.90/4.39 (60.0)
p = 0.7     11.20/1.82 (17.5)    19.20/3.73 (35.0)    30.20/3.43 (52.5)    38.80/2.94 (70.0)
p = 0.8     12.35/2.57 (20.0)    22.80/2.63 (40.0)    32.10/2.72 (60.0)    40.57/5.14 (80.0)
p = 0.9     13.60/1.41 (22.5)    26.40/3.63 (45.0)    37.80/2.92 (67.5)    51.00/5.24 (90.0)



In Table 4.11, D1 and D2 are the expected communication delays for a DPBN and an ideal network, respectively, σ is the standard deviation of the expected communication delays for a DPBN, N is the number of stations attached to the network, and p is the probability that a station may issue a communication request.

Table 4.12 gives the improvement ratio for parameters N and p. It was computed by the formula below:

$$\delta = \frac{D_2 - D_1}{D_1}$$

where δ is the improvement ratio.

The improvement ratio is the decrease in network delay of a DPBN as compared to an ideal network, expressed as a fraction of the DPBN delay. With a little extra hardware, the DPBN achieves improvement ratios of around 0.7 for the larger networks, depending on the number of stations attached to the network and the frequency of requests.

Table 4.12. The Improvement Ratio of a DPBN over an Ideal Network

δ         N = 25   N = 50   N = 75   N = 100
p = 0.2   0.61     0.54     0.61     0.75
p = 0.3   0.50     0.47     0.69     0.76
p = 0.4   0.47     0.55     0.72     0.70
p = 0.5   0.42     0.69     0.65     0.71
p = 0.6   0.56     0.60     0.77     0.67
p = 0.7   0.56     0.83     0.74     0.80
p = 0.8   0.62     0.75     0.87     0.97
p = 0.9   0.65     0.70     0.78     0.76
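As a quick check, several entries of Table 4.12 can be recomputed from Table 4.11. The sketch below assumes that the ratio is (D2 - D1) / D1 with D2 = N * p; this form is inferred from the tabulated values rather than quoted from the text, and the function name is hypothetical.

```python
def improvement_ratio(d1, n, p):
    """Assumed form: (D2 - D1) / D1, where D2 = N * p is the ideal-network delay."""
    d2 = n * p
    return (d2 - d1) / d1

# (D1, N, p) triples taken from Table 4.11.
for d1, n, p in [(3.10, 25, 0.2), (6.50, 50, 0.2), (11.40, 100, 0.2)]:
    print(n, p, round(improvement_ratio(d1, n, p), 2))
```

Because the denominator is the DPBN delay, a ratio above 1 (as in Table 4.13) simply means that the ideal-network delay is more than twice the DPBN delay.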



The results in Tables 4.11 and 4.12 are obtained based on the assumption that new vertices (communication requests) are not allowed to join a graph traversal until all the old vertices (communication requests) have been assigned colors and processed. As each traversal continues, the size of the graph shrinks and so does the number of communication processes allowed on the bus. It should be noted that there is no obvious functional relationship between the improvement ratio and p in Table 4.12. Future work will include the study of the relationship between the improvement ratio and the parameters of the network (N and p).










Table 4.13 shows the improvement ratio for the case when new vertices are allowed to join the "thrown-away" vertices after each traversal. The improvement ratio increases significantly in comparison with that shown in Table 4.12 because the size of the graph is not reduced after each traversal. In this simulation, the number of new communication requests stays the same during the traversals of a simulation run. In other words, the number of requests that are allowed to proceed in a traversal is the same as the number of newly added requests. When the number of communication requests allowed to go through reaches the number of initially generated requests, the simulation run stops. However, this strategy does not reduce the network delay of an ideal network, since the number of new requests does not affect the performance of an ideal network.

One problem resulting from this improvement is the starvation problem: It is possible that some communication requests may be withheld indefinitely as the new vertices join the "thrown-away" vertices causing the old vertices to be discarded repeatedly. In the next subsection, we shall propose a strategy to overcome this shortcoming.

Table 4.13. The Improvement Ratio of a DPBN over an Ideal Network When New Requests are Allowed to Join Traversals

δ         N = 25   N = 50
p = 0.2   0.43     0.70
p = 0.3   0.57     1.00
p = 0.4   0.71     1.55
p = 0.5   0.75     1.65
p = 0.6   1.20     1.90
p = 0.7   1.20     1.90
p = 0.8   1.25     2.20
p = 0.9   1.45     2.25










4.4.2 A Strategy for Solving the Aging Problem in DPBN

To allow the communication requests that have been held up for the longest period of time to have the highest priority in the next processing of communication requests, we propose the following strategy:

1. Maintain a priority queue that keeps track of the priorities of communication requests. Before any communication request starts transmitting messages, apply the graph traversal algorithm repeatedly to the thrown-away vertices and label the vertices with a time stamp. A time stamp is composed of the number of traversals and the assigned color. The priority queue is constructed in accordance with the time stamp. A typical priority queue looks like this: 1A, 1B, 2A, 2B, 3A, ..., where 1A is the group of vertices that are assigned the first color at the very first traversal and 1B is the group of vertices that are assigned the second color at the very first traversal, etc.

2. Each time the graph traversal algorithm is executed, a group of vertices, such as the vertices with priority 1A, is chosen to join the new communication requests and all the vertices with priority 1A are taken as the starting vertices of the graph traversal. Since none of the vertices with priority 1A are adjacent to each other and all are starting vertices, they are guaranteed to get a color in this graph traversal. At this point, the communication requests corresponding to the vertices in 1A and some new communication requests are allowed to go through. In the next traversal, the vertices with priority 1B will join the new requests and all the vertices in 1B become starting vertices. This procedure is continued until the priority queue is empty and a new queue is formed.

Figure 4.6 shows an example of this strategy. First, apply the graph traversal algorithm to the graph. Vertices A and C, colored with the first color, are in group 1A, vertices B and D, colored with the second color, are in group 1B, and vertex E, which was thrown away in the first traversal, is in group 2A. Second, combine a group of vertices from the priority queue in the order of priority with new communication requests for a graph traversal. This plan not only increases system throughput but also solves the problem of starvation.
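The ordering of the priority queue can be illustrated with a short Python sketch. It assumes that each vertex already carries a time stamp of the form (traversal number, color label); the dictionary below simply restates the grouping of the Figure 4.6 example, and the function name is hypothetical. The graph traversal that produces the stamps is not reproduced here.

```python
from itertools import groupby

# Time stamps (traversal number, color label) for the Figure 4.6 example.
stamps = {"A": (1, "A"), "B": (1, "B"), "C": (1, "A"),
          "D": (1, "B"), "E": (2, "A")}

def build_priority_queue(stamps):
    """Group vertices by time stamp, earliest traversal and color first."""
    ordered = sorted(stamps, key=lambda v: (stamps[v], v))
    return [(f"{t}{c}", list(group))
            for (t, c), group in groupby(ordered, key=lambda v: stamps[v])]

queue = build_priority_queue(stamps)
for label, vertices in queue:
    print(label, vertices)
```

Each traversal would then take the group at the head of the queue as its starting vertices, which is what guarantees that the oldest requests are served first.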

4.5 Summary

In this chapter an application of the graph traversal algorithm to the dynamically partitionable bus network (DPBN) is presented. In a dynamically partitionable bus network a bus is partitioned into a number of subnetworks, and multiple communication processes are carried out at any moment. The proposed approach resolves conflicts by assigning colors to the graph where vertices represent communication requests and edges denote the conflicts among the communication requests. The communication requests corresponding to the vertices assigned the same color are carried out in parallel within subnetworks. In addition, a hardware design of graph traversal algorithm to speed up the identification and scheduling of non-conflicting communication requests is described. The performance evaluation of the partitionable bus network shows a significant increase of network throughput as compared with an ideal non-partitionable network in which no communication contention is assumed.









Figure 4.6. The Graph Traversal for the Strategy of Handling Aging

















CHAPTER 5
DPBN AND COIN-CHANGING ALGORITHMS

One of the characteristics of a distributed system is that processors compete for resources and processes are performed in parallel. Because the elapsed times of these processes vary, system resources are often idle as the processes complete and release their resources at different times. For instance, in a partitionable bus network, requests that correspond to the vertices that have been assigned the same color proceed in parallel. When a communication is over, the subnetwork assigned to that communication process will be idle and has to wait for the other parallel communication processes to complete before it can be reused in the next round of graph traversal and assignment. As an example, suppose three communication requests are selected by the graph traversal algorithm for parallel processing from a pool of communication requests (q1, q2, q3, ..., qn), as shown in Table 5.1. The durations for q1, q3, and q4 are 23, 189, and 288 time units, respectively. The process q1 will finish first, followed by q3 and q4. Since the requests do not finish at the same time, the partitioned subnetworks will not be fully utilized. The utilization rate for this example is

$$\frac{23 + 189 + 288}{288 \times 3} \approx 0.578$$

The percentage of bus idling is approximately 42.
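The utilization figure can be checked with a small calculation. The sketch below assumes that every subnetwork stays reserved until the longest of the parallel requests finishes; the function name is hypothetical.

```python
def utilization(durations):
    """Busy time divided by reserved time, where every subnetwork is held
    until the longest request in the group completes."""
    return sum(durations) / (max(durations) * len(durations))

rate = utilization([23, 189, 288])   # q1, q3, q4 from Table 5.1
print(f"utilization = {rate:.3f}, idle = {1 - rate:.3f}")
```

The computed rate is roughly 0.58, which corresponds to the idle share of about 42 percent quoted above.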

In a random distribution of the lengths of communication requests, the waste of bandwidth due to bus idling can reach 50 percent, since the expected length of communication requests is half of the maximum length.

Table 5.1. An Example of Bus Idling

Communication Request   Sending Station   Receiving Station   Duration of Request
q1                      1                 3                   23
q2                      2                 4                   50
q3                      4                 5                   189
q4                      6                 7                   288
q5                      1                 4                   211
q6                      1                 5                   100

The amount of idling time can be reduced if the message of each request is divided into a number of small time frames. Each communication request is allowed to transmit messages for the duration of a time frame and the scheduling of message transfer is based on these time frames. There are two ways to divide a message into time frames. One way is to use fixed-size time frames. Using this method, a message is divided into a number of fixed-size frames. For example, if each time frame is 23 units, q1, q3, and q4 in this example can be scheduled for parallel processing with three subnetworks during the first time frame. q3 and q4 can be scheduled for parallel processing with two subnetworks during the next 8 time frames. q4 can be completed in the next 4 time frames. Since subnetworks can be released as soon as communication requests are completed, the utilization of the communication channel can be increased. However, if a large frame size is used, some idle time would still remain. For example, in the ninth time frame in this example, the subnetwork assigned to q3 would be idle for 18 time units since q3 has only 5 time units in the last frame. If a small frame size is used, the overhead as a result of increased graph traversals and scheduling can be significant.
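The frame counts in the fixed-size example can be reproduced with the sketch below. It assumes that a request occupies whole frames and that the unused part of its last frame is idle time; the function name is hypothetical.

```python
def fixed_frames(duration, frame):
    """Return (number of frames needed, idle units in the last frame)."""
    count = -(-duration // frame)          # integer ceiling of duration / frame
    return count, count * frame - duration

for name, duration in {"q1": 23, "q3": 189, "q4": 288}.items():
    print(name, fixed_frames(duration, 23))
```

With a frame size of 23, the request of 189 units needs nine frames and leaves 18 idle units in the ninth, in line with the figures given in the paragraph above.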










Another way to divide a message is to use time frames of variable sizes. Using this method, a message is partitioned into a number of frames of variable sizes. If the set of frame sizes is chosen properly, it is possible to keep the number of time frames required for processing the communication requests to a minimum, thus reducing the number of graph traversals and the amount of process scheduling. Suppose we select a number of time frames of different lengths and assign time frames to each individual communication process using the Greedy algorithm described in Chapter 2, so that the total length of the assigned time frames equals the elapsed time of the communication process. The number of time frames assigned would then be minimized, as would the number of graph traversals. The Greedy algorithm is simple and fast in comparison to other algorithms.

This problem of assigning the proper mix of time frames of variable sizes is the same as the coin-changing problem described in Section 2.4. The Greedy algorithm takes as many of the largest coin type (time frame) as possible and then as many of the second largest coin type as possible, etc. However, it has two disadvantages. First, the Greedy algorithm does not always generate an optimal solution for a given total time T. Second, in some cases, the Greedy algorithm does not generate a solution even though a solution exists. Examples can be found in Section 2.4. Both disadvantages can be eliminated if the set of coin types (time frames) is chosen properly, for example, if the set of coin types W = (W1, W2, W3, ..., Wn) is in the form of a geometric sequence. A geometric sequence is defined as follows:


Definition 5.0.1 Geometric Sequence: A sequence W = (W1, W2, W3, ..., Wn), where $W_i < W_{i+1}$, is geometric if $W_{i+1}/W_i = r$ and r is a positive integer.


Since the problem of minimizing the number of coins is the same as the problem of minimizing time frames, the set of time frame sizes is chosen to be a geometric sequence in our approach. There are two advantages to using a geometric sequence as the set of frame sizes. First, it guarantees that the Greedy algorithm always generates a solution if a solution exists, as proven in [7]. Second, as we shall show, the Greedy algorithm always produces an optimal solution if the set of frame sizes is a geometric sequence. The proof for the second point is given in the next section. However, for a given distribution of communication lengths, there exist many possible geometric sequences. For instance, in a normal distribution of lengths of communication requests with a maximum length of 100 and a minimum length of 1, some useful geometric sequences are as follows:

W = (1,2,4,8,16,32,64)

W = (1,3,9,27,81)

W = (1,4, 16, 64)

The optimal set of frame sizes can be determined by simulation.

This chapter is organized as follows: Section 5.1 presents and proves some theorems related to the Greedy algorithm for coin-changing. Section 5.2 presents the application of the coin-changing algorithm in partitionable bus networks. A performance evaluation is given in Section 5.3.

5.1 Some Theorems Related to Coin-Changing Algorithm

The terms used in this chapter are defined below. Some of these terms and definitions are taken from [7].

Definition 5.1.1 Representation: A set of non-negative integers (X1, X2, ..., Xn) which satisfies $\sum_{i=1}^{n} X_i W_i = T$ is called a representation of T with respect to the set of coin types (W1, W2, W3, ..., Wn).

For example, X = (1, 2, 0, 1) is a representation of T = 17 with respect to W = (1, 3, 4, 10).










Definition 5.1.2 Canonical Representation: A representation obtained by taking the highest value coin type as many times as possible, then taking the next highest value coin type as many times as possible, etc. In other words, the canonical solution is the representation computed by the Greedy algorithm.

For example, the canonical representation for T = 17 in the example of Definition 5.1.1 is X = (0, 1, 1, 1).
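The canonical representation can be computed mechanically. The following Python sketch is one straightforward way to write the Greedy algorithm of Definition 5.1.2; the function name and the convention of returning None when a remainder is left over are assumptions made for illustration.

```python
def greedy_representation(total, coins):
    """Canonical (Greedy) representation of `total` for coin types given in
    ascending order. Returns a tuple of counts, or None if a remainder is left."""
    counts = [0] * len(coins)
    for i in range(len(coins) - 1, -1, -1):      # largest coin type first
        counts[i], total = divmod(total, coins[i])
    return tuple(counts) if total == 0 else None

print(greedy_representation(17, (1, 3, 4, 10)))
```

For T = 17 and W = (1, 3, 4, 10) the sketch takes one coin of value 10, one of value 4, and one of value 3, which is the canonical representation (0, 1, 1, 1) given above.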

Definition 5.1.3 Complete: A set of coin types is said to be complete if any given integer T has a canonical representation.

For example, the set W = (1, 3, 8, 10, 26) is complete, while W' = (2, 4, 8, 10) is not, since T = 13 has no representation with respect to W'.

Definition 5.1.4 Essentially Complete: A set of coin types W = (W1, W2, W3, ..., Wn) is said to be essentially complete if whenever a constant T is representable, it always has a canonical representation.

For example, W = (3, 9, 12, 27) is essentially complete, since any T that has a representation also has a canonical representation. W' = (3, 7, 12, 29) is not essentially complete since T = 9 has a representation (3, 0, 0, 0) with respect to W' but does not have a canonical representation.

Definition 5.1.5 Optimal Representation: A representation of an integer T with respect to W = (W1, W2, W3, ..., Wn) is an optimal representation if it uses a minimal number of coins.


Definition 5.1.6 Canonical Set: A set of coin types W = (W1, W2, W3, ..., Wn) is canonical if each canonical representation of T with respect to W is optimal.










Using the above definitions, we shall now present and prove a number of theorems.

The theorem below provides a sufficient condition to determine if a set of coin types is essentially complete.

Theorem 5.1.1 [7] W = (W1, W2, W3, ..., Wn) is essentially complete if and only if $W_i = K_i W_1$ for $1 \le i \le n$, where $K_i$ is a non-negative monotonically increasing integer.


Proof of Theorem 5.1.1 The proof is given in [7].


We now present and prove a theorem that provides a sufficient condition for a canonical set.

Theorem 5.1.2 W = (W1, W2, W3, ..., Wn) is a canonical set of coin types if the sequence Wi (i = 1, ..., n) is a geometric sequence with ratio r > 1, where $r = W_{k+1}/W_k$ and r is an integer.


Proof of Theorem 5.1.2 The strategy we take to prove the theorem is as follows. We want to show that if $X = (X_1, X_2, X_3, \ldots, X_n)$ is the representation of T generated by the Greedy algorithm with respect to the set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ and $Y = (Y_1, Y_2, Y_3, \ldots, Y_n)$ is an optimal representation, then X = Y.

Before we prove the theorem, we make the following four observations:

1. $Y_i < r$ for $i = 1, \ldots, n-1$. If $Y_i$ is greater than or equal to r, we can simply replace r coins of type $W_i$ with a single coin of type $W_{i+1}$. By doing so, we reduce the total number of coins used by r - 1. (That is, Y is not an optimal representation if any such $Y_i$ is greater than or equal to r.)

2. If $X_k \neq Y_k$, where k is the largest index not exceeding n at which the two representations differ, then $X_k > Y_k$. This is because the Greedy algorithm always takes the maximum possible number of coins of the highest values. Since both values are integers, $X_k$ is greater than $Y_k$ by at least 1 if $X_k \neq Y_k$.

3. $\sum_{i=1}^{k} Y_i W_i < (Y_k + 1) W_k$.

To show that the above inequality holds, we first replace $Y_i$ by $(r - 1)$ in $\sum_{i=1}^{k-1} Y_i W_i$ to obtain the following inequality:

$$\sum_{i=1}^{k} Y_i W_i \le \sum_{i=1}^{k-1} (r - 1) W_i + Y_k W_k \tag{1}$$

The above inequality holds because $Y_i < r$ according to observation 1.

By taking the summation of the geometric series W, we obtain

$$\sum_{i=1}^{k-1} (r - 1) W_i = W_k - W_1 \tag{2}$$

By substituting (2) into (1) we conclude

$$\sum_{i=1}^{k} Y_i W_i \le W_k - W_1 + Y_k W_k < (Y_k + 1) W_k$$

4. $\sum_{i=1}^{n} X_i W_i = \sum_{i=1}^{n} Y_i W_i = T$. The above equation is valid, since both X and Y are representations of T.

Based on the above four observations, we now prove the theorem by contradiction.

We first assume that $X \neq Y$. If k is the largest index such that $X_k \neq Y_k$, then $X_i = Y_i$ for all $i > k$, and the equation in observation 4 can be reduced to

$$\sum_{i=1}^{k} X_i W_i = \sum_{i=1}^{k} Y_i W_i$$

By observation 3, we have

$$\sum_{i=1}^{k} Y_i W_i < (Y_k + 1) W_k \tag{3}$$

It is obvious that

$$\sum_{i=1}^{k} X_i W_i \ge X_k W_k$$

Since $X_k > Y_k$ by at least 1, due to observation 2, we know

$$X_k W_k \ge (Y_k + 1) W_k$$

Thus,

$$\sum_{i=1}^{k} X_i W_i \ge (Y_k + 1) W_k \tag{4}$$

By (3) and (4),

$$\sum_{i=1}^{k} X_i W_i > \sum_{i=1}^{k} Y_i W_i$$

This is a contradiction to the fact that

$$\sum_{i=1}^{k} X_i W_i = \sum_{i=1}^{k} Y_i W_i$$

We conclude that X = Y; that is, $X = (X_1, X_2, X_3, \ldots, X_n)$ is the optimal representation of T.
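The statement of Theorem 5.1.2 can also be checked numerically for small cases. The sketch below is not part of the proof: it compares the number of coins used by the Greedy algorithm with the minimum found by a simple dynamic program, for every total up to an arbitrarily chosen limit. All function names are hypothetical.

```python
def greedy_count(total, coins):
    """Number of coins the Greedy algorithm uses, or None if it gets stuck."""
    used = 0
    for c in sorted(coins, reverse=True):
        q, total = divmod(total, c)
        used += q
    return used if total == 0 else None

def optimal_count(total, coins):
    """Minimum number of coins by dynamic programming, or None if impossible."""
    inf = float("inf")
    best = [0] + [inf] * total
    for t in range(1, total + 1):
        for c in coins:
            if c <= t and best[t - c] + 1 < best[t]:
                best[t] = best[t - c] + 1
    return None if best[total] == inf else best[total]

def greedy_is_optimal(coins, limit):
    """True if Greedy matches the optimum for every total from 1 to limit."""
    return all(greedy_count(t, coins) == optimal_count(t, coins)
               for t in range(1, limit + 1))

for w in [(1, 2, 4, 8, 16, 32, 64), (1, 3, 9, 27, 81), (1, 4, 16, 64)]:
    print(w, greedy_is_optimal(w, 100))
```

A check of this kind only covers the totals that are tested, so it complements the proof rather than replacing it.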


The following two corollaries show that a geometric sequence is essentially complete, and is complete if W1 = 1.

Corollary 5.1.1 W = (W1, W2, W3, ..., Wn) is essentially complete if W is a geometric sequence.

Proof of Corollary 5.1.1 The proof is obvious from Theorem 5.1.1.


Corollary 5.1.2 W = (W1, W2, W3, ..., Wn) is complete if and only if W1 = 1.


Proof of Corollary 5.1.2 The proof is obvious from the definition of Complete.










Theorem 5.1.3 Given a set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$, if $W^{(k)} = (W_1, W_2, W_3, \ldots, W_k)$ is canonical and the canonical representation of $T = pW_k$ with respect to $W^{(k+1)}$ is optimal, then $W^{(k+1)} = (W_1, W_2, W_3, \ldots, W_k, W_{k+1})$ is canonical. The constant p is determined by taking the ceiling of $W_{k+1}/W_k$ (i.e., $p = \lceil W_{k+1}/W_k \rceil$).


It should be noted that $W^{(k)}$ represents the subset of W consisting of the first k elements of W.

Proof of Theorem 5.1.3 The proof is given in [58].

Theorem 5.1.3 is also called the one-point theorem. It provides a fast way to check if a set of coin types is canonical. A set of coin types $W = (W_1, W_2, \ldots, W_n)$ is a canonical set if the first (n-1) elements of W form a canonical set and the canonical representation of $\lceil W_n/W_{n-1} \rceil W_{n-1}$ is optimal. The same rule can be applied to determine if the first (n-1) elements of W form a canonical set. For instance, to determine that W = (1, 4, 15, 20) is not a canonical set, the following steps are required:

Step 1: $W^{(1)} = (1)$ is canonical. $W^{(2)} = (1, 4)$ and $p = \lceil 4/1 \rceil = 4$.

Step 2: For $T = 4 \cdot W_1 = 4$, the representation (0, 1) obtained by the Greedy algorithm is an optimal representation. According to the theorem, $W^{(2)} = (1, 4)$ is a canonical set.

Step 3: $W^{(3)} = (1, 4, 15)$ and $p = \lceil 15/4 \rceil = 4$.

Step 4: For $T = 4 \cdot W_2 = 4 \cdot 4 = 16$, the representation (1, 0, 1) obtained by the Greedy algorithm is an optimal representation. According to the theorem, $W^{(3)} = (1, 4, 15)$ is a canonical set.

Step 5: $W^{(4)} = (1, 4, 15, 20)$ and $p = \lceil 20/15 \rceil = 2$.

Step 6: For $T = 2 \cdot W_3 = 2 \cdot 15 = 30$, the representation (2, 2, 0, 1) obtained by the Greedy algorithm is not an optimal representation since the representation (0, 0, 2, 0) for T = 30 uses fewer coins. As a result, W = (1, 4, 15, 20) is not a canonical set.
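The six steps above follow a fixed pattern, so they can be written as a loop. The sketch below is one possible reading of the one-point test: for each prefix of W it computes p, forms the single test value T, and compares the Greedy coin count with the optimum found by dynamic programming. The helper names and the return convention are hypothetical.

```python
def greedy_count(total, coins):
    """Number of coins the Greedy algorithm uses, or None if it gets stuck."""
    used = 0
    for c in sorted(coins, reverse=True):
        q, total = divmod(total, c)
        used += q
    return used if total == 0 else None

def optimal_count(total, coins):
    """Minimum number of coins by dynamic programming, or None if impossible."""
    inf = float("inf")
    best = [0] + [inf] * total
    for t in range(1, total + 1):
        for c in coins:
            if c <= t and best[t - c] + 1 < best[t]:
                best[t] = best[t - c] + 1
    return None if best[total] == inf else best[total]

def one_point_check(coins):
    """Apply the one-point test to successive prefixes of `coins` (ascending).
    Returns (prefix, T) for the first prefix that fails, or None if all pass."""
    for k in range(1, len(coins)):
        prefix = coins[:k + 1]
        p = -(-coins[k] // coins[k - 1])       # ceiling of W_{k+1} / W_k
        t = p * coins[k - 1]
        if greedy_count(t, prefix) != optimal_count(t, prefix):
            return prefix, t
    return None

print(one_point_check((1, 4, 15, 20)))
```

For W = (1, 4, 15, 20) the first two prefixes pass and the test fails at T = 30, matching Step 6.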

The above theorems serve the following purposes:

1. To establish a general rule for the construction of a canonical set (Theorem 5.1.2).

2. To determine if a set of coin types is complete or essentially complete (Corollary 5.1.1 and Corollary 5.1.2).

3. To provide an easy way to check if a set of coin types is canonical (Theorem 5.1.3).

In summary, a geometric sequence W = (W1, W2, ..., Wn) with W1 = 1 is a good choice for the set of time frames in our approach, since each communication request can be decomposed into time frames by the Greedy algorithm and the number of time frames assigned is minimal.

5.2 The Application of the Coin-Changing Algorithm in Partitionable Bus Networks

As stated before, a dynamically partitionable bus network allows multiple communication processes to be carried out at the same time. However, the bus capacity is still not fully utilized, since the message lengths of the communication requests are not the same in general. Assuming the message lengths of communication requests are uniformly distributed, the loss of throughput due to non-uniform termination of communication processes could be 50 percent or higher. Bus idling can be reduced if the scheduling of requests is based on some variable-sized time frames assigned to the communication requests. Only those requests with communication durations representable by the assigned frames can participate in the graph traversal. However, the number of time frames assigned to the communication requests needs to be minimized to reduce the number of graph traversals. The problem of minimizing the number of time frames can be transformed into a coin-changing problem if we regard the size of a time frame as the face value of a coin type and the message length of a request as the total dollar amount that the individual coins should add up to. The proposed approach is to select a set of coin types (time frames) according to the theorems provided in the previous section and use the simple Greedy algorithm to decompose each message length. The steps to be taken in this approach are as follows:


1. Select a set of time frames of different sizes that form a geometric sequence. The purpose of choosing a geometric sequence is to guarantee that a communication request of any length can be divided into the selected time frames using the Greedy algorithm. Since the Greedy algorithm is very simple, it will add little overhead to the communication request.

2. Apply the Greedy algorithm to each communication request. Each message length will thus be represented by a minimal number of time frames.

3. Start with the largest time frame, designate it as the tag frame and collect the communication requests which have at least one unit of the tag frame.

4. Transform the collected requests into a graph according to the techniques described in Chapter 3.

5. Apply the graph traversal algorithm to the constructed graph and allow the requests that correspond to the vertices with the same color to be carried out simultaneously.

6. Repeat steps 3, 4, and 5 for the next largest time frame.
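Steps 1 to 3 and step 6 can be sketched as follows. This is a simplified illustration under stated assumptions: the frame set and the three request lengths are examples, the function names are hypothetical, and the graph construction and traversal of steps 4 and 5 are left out, so the sketch only shows which requests are collected for each tag frame. It also assumes each request holds at most one unit of any frame size, which is true for the lengths used here.

```python
FRAMES = (1, 2, 4, 8, 16, 32, 64)      # a geometric sequence with ratio 2

def decompose(length, frames=FRAMES):
    """Greedy decomposition of a message length into frame counts (step 2)."""
    counts = {}
    for f in sorted(frames, reverse=True):
        counts[f], length = divmod(length, f)
    return counts

def collect_by_tag_frame(lengths, frames=FRAMES):
    """For each frame size, largest first, list the requests that hold at
    least one unit of that frame (steps 3 and 6)."""
    decomposed = {rid: decompose(n, frames) for rid, n in lengths.items()}
    rounds = []
    for tag in sorted(frames, reverse=True):
        collected = [rid for rid, counts in decomposed.items() if counts[tag] > 0]
        if collected:
            rounds.append((tag, collected))
    return rounds

# Example lengths for three requests (illustrative values).
rounds = collect_by_tag_frame({"q1": 23, "q2": 50, "q6": 100})
for tag, collected in rounds:
    print(tag, collected)
```

In a full implementation, each collected group would be turned into a conflict graph and colored before any transmission takes place.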










The example below demonstrates the approach:

Suppose there are m communication requests with lengths l1, l2, l3, ..., lm, respectively, as shown in Table 5.2. By applying the Greedy algorithm to each length of communication request with respect to the set of time frames (m1, m2, m3, ..., mn), the time of each communication request is decomposed into time frames of different lengths. For instance, l1 is made up of 3 frames of m1, 1 frame of m2, 2 frames of mn-1, and 1 frame of mn, as shown in Table 5.2. Next, all the requests having some units of time frame mn are collected for graph traversal, and the requests assigned the same color are allowed to send messages of length mn. This process is continued for each time frame mn-1, mn-2, etc.

The advantage of using the coin-changing algorithm with the graph traversal algorithm is obvious. The partitionable bus network can be more fully utilized. Since the Greedy algorithm always generates the optimal solution (i.e., the minimal number of time frames required to make up message length li with respect to the set of time frames), the overhead of graph traversals can be reduced to a minimum.

Table 5.2. An Example of Decomposing Communication Requests

        l1   l2   l3   l4   ...   lm
m1      3    0    0    1    ...   2
m2      0    2    1    1    ...   0
m3      1    1    1    0    ...   4
m4      0    5    2    2    ...   2
...
mn-1    2    0    3    0    ...   0
mn      1    2    3    4    ...   1

li: The message length of communication request i.
mi: The size of time frame i.










5.3 Performance Evaluation

In this section, the usefulness of the proposed approach is demonstrated. The model and the parameters of the dynamically partitionable bus network are first described below:


1. The architecture of the partitionable bus network has been presented in Chapter 4. However, the lengths of communication requests are not uniform.

2. The distribution of the lengths of communication requests determines the effectiveness of using the coin-changing algorithm to maximize the utility of the dynamically partitionable bus network. In this performance evaluation, a normal distribution, an exponential distribution, and a uniform distribution are simulated. Appendices A and B show the probability distributions of the normal distribution and the exponential distribution, respectively.

3. The maximal and minimal lengths of the communication requests are 100 units and 1 unit, respectively.

4. The expected length of the communication requests is 50 units.

5. The time required to change switches between time frames is negligible.


In the simulation program, the set of frame sizes (1, 2, 4, 8, 16, 32, 64) is used, which guarantees that the Greedy algorithm will generate the optimal solution. For the uniform distribution, the length of a communication request is determined by generating a random number between 0 and 1 and by multiplying that number by 100. Only one simulation run is completed for each distribution, and 100 communication requests are generated for each run. For the normal distribution and the exponential distribution, the numbers are generated by using the procedure given in [16].

Table 5.3 shows the improvement ratio of a DPBN using time frames versus a DPBN without using time frames for a uniform distribution of communication lengths. The expected number of communication processes (ENP) being carried out at any instant is 2, 3, 4, 5, 6, 7, 8, 9, and 10, respectively. The improvement ratio Δ is the fractional decrease in network delay. For instance, Δ = 0.30 means that the network delay is decreased by 30 percent.

$$\Delta = \frac{T_1 - T_2}{T_1}$$

where T1 is the total communication time of a DPBN without using time frames and T2 is the total communication time of a DPBN using time frames. As shown in Table 5.3, the network delay can be reduced by between 25 and 45 percent. The decrease in network delay grows with the ENP value.
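The tabulated ratios can be recomputed from the two totals. The sketch below assumes the ratio is (T1 - T2) / T1, which agrees with the entries of Table 5.3 to within rounding in the last digit; the function name is hypothetical.

```python
def time_frame_improvement(t1, t2):
    """Fractional decrease in total communication time when time frames are used."""
    return (t1 - t2) / t1

# (ENP, T1, T2) rows taken from Table 5.3.
for enp, t1, t2 in [(2, 6508, 4798), (4, 8119, 5076), (10, 9124, 5086)]:
    print(enp, round(time_frame_improvement(t1, t2), 4))
```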











Table 5.3. The Improvement Ratio of a DPBN Using Time Frames (Uniform Distribution of Communication Lengths)

ENP   T1     T2     Δ
2     6508   4798   0.2626
3     7381   4995   0.3232
4     8119   5076   0.3748
5     8295   5108   0.3841
6     8680   5053   0.4178
7     8836   5072   0.4260
8     8935   5081   0.4313
9     9138   5108   0.4409
10    9124   5086   0.4430

T1: The Total Communication Time of a DPBN without Using Time Frames
T2: The Total Communication Time of a DPBN Using Time Frames
Δ: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50











Tables 5.4, 5.5, 5.6, and 5.7 show the improvement ratio in an exponential distribution of communication lengths. The exponent coefficient β equals 60, 40, 20, and 10, respectively.











Table 5.4. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 60)

ENP   T1     T2     Δ
2     7163   5125   0.284
3     7752   4859   0.373
4     8277   4741   0.427
5     8741   4693   0.461
6     9134   4756   0.479
7     9325   4727   0.493
8     9475   4731   0.500
9     9573   4708   0.508
10    9720   4745   0.511

T1: The Total Communication Time of a DPBN without Using Time Frames
T2: The Total Communication Time of a DPBN Using Time Frames
Δ: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50











Table 5.5. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 40)

ENP   T1     T2     Δ
2     5575   3880   0.304
3     6119   3614   0.409
4     6728   3527   0.475
5     7261   3486   0.519
6     7715   3546   0.540
7     8066   3539   0.561
8     8299   3518   0.575
9     8404   3500   0.583
10    8722   3534   0.594

T1: The Total Communication Time of a DPBN without Using Time Frames
T2: The Total Communication Time of a DPBN Using Time Frames
Δ: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50











Table 5.6. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 20)

ENP   T1     T2     Δ
2     3060   2082   0.319
3     3381   1905   0.436
4     3837   1882   0.509
5     4166   1848   0.556
6     4553   1887   0.585
7     4868   1889   0.611
8     5061   1872   0.629
9     5232   1869   0.642
10    5493   1880   0.657

T1: The Total Communication Time of a DPBN without Using Time Frames
T2: The Total Communication Time of a DPBN Using Time Frames
Δ: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50




Full Text

PAGE 1

IMPROVING RESOURCE UTILIZATION IN A PARTITIONABLE BUS NETWORK USING GRAPH COLORING AND COIN-CHANGING ALGORITHMS By TAI-KUO WOO A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY UNIVERSITY OF FLORIDA 1989

PAGE 2

Copyright 1989 by Tai-Kuo Woo

PAGE 3

ACKNOWLEDGEMENTS I would like to express my heartfelt gratitude to my adviser, Dr. Stanley Y.W. Su, for his advice and guidance. Many thanks are also due to Dr. Yuan-Chieh Chow, Dr. Herman Lam, Dr. Richard NewmanWolfe, and Dr. ChungYee Lee for their encouragement. Without their help, this work would not have been finished. I am also grateful to David Johns for his proofreading of this dissertation. Lastly, I sincerely thank my roommate, David Kern, who has helped in making this endeavor possible. Ill

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTERS

1 INTRODUCTION

2 SURVEY OF RELATED WORK
  2.1 Local Area Networks
  2.2 Partitionable Bus Networks
  2.3 Graph Coloring Algorithms
  2.4 Coin-changing Algorithms
  2.5 Summary

3 GRAPH COLORING ALGORITHMS
  3.1 The Graph Traversal Algorithms
    3.1.1 Static Algorithm
    3.1.2 Dynamic Algorithm
    3.1.3 Heuristic Algorithm
  3.2 Analysis of the Dynamic Graph Traversal Algorithm
  3.3 Comparison of Results

4 DPBN AND GRAPH COLORING ALGORITHM
  4.1 A Graph Traversal Algorithm and Its Application
  4.2 Dynamic Bus Partitioning Technique
  4.3 Hardware Design of A Graph Traversal Unit
    4.3.1 Architecture of the Graph Traversal Unit
    4.3.2 Input and Output of the Graph Traversal Unit
    4.3.3 Organization of the Graph Traversal Unit
  4.4 Performance Evaluation of the Partitionable Bus Networks
    4.4.1 Simulation Study of DPBN
    4.4.2 A Strategy for Solving the Aging Problem in DPBN
  4.5 Summary

5 DPBN AND COIN-CHANGING ALGORITHMS
  5.1 Some Theorems Related to Coin-Changing Algorithm
  5.2 The Application of the Coin-Changing Algorithm in Partitionable Bus Networks
  5.3 Performance Evaluation

6 SUMMARY, CONCLUSION, OTHER APPLICATIONS, AND FUTURE WORK
  6.1 Summary
  6.2 Conclusion
  6.3 Other Applications
  6.4 Future Work

APPENDICES

A PROBABILITY DISTRIBUTION: NORMAL DISTRIBUTION
B PROBABILITY DISTRIBUTION: EXPONENTIAL DISTRIBUTION

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

3.1 The Approximate Values of the Expected Chromatic Number by Wood's Method and the Dynamic Graph Traversal Algorithm
4.1 A Graph Representation of a Group of Communication Requests
4.2 The Table of Conflicts among Communication Requests
4.3 The Adjacency Matrix Stored in the RAM
4.4 The Initial Values of the Registers (W0, W1, W2, ..., Wr)
4.5 The Table of Communication Requests
4.6 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.5 (before graph traversal)
4.7 The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.6
4.8 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.7 (after the first graph traversal)
4.9 The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.8
4.10 The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.9 (after the second graph traversal)
4.11 The Expected Communication Delays and Standard Deviations in Units for a DPBN and an Ideal Bus Network (in parentheses)
4.12 The Improvement Ratio of a DPBN over an Ideal Network
4.13 The Improvement Ratio of a DPBN over an Ideal Network When New Requests are Allowed to Join Traversals
5.1 An Example of Bus Idling
5.2 An Example of Decomposing Communication Requests
5.3 The Improvement Ratio of a DPBN Using Time Frames (Uniform Distribution of Communication Lengths)
5.4 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 60)
5.5 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 40)
5.6 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 20)
5.7 The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with β = 10)
5.8 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 40)
5.9 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 30)
5.10 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 20)
5.11 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 15)
5.12 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 10)
5.13 The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with σ² = 5)

LIST OF FIGURES

2.1 An Example of Computing the Upper Bound of a Graph Using Welsh and Powell's Algorithm
3.1 An Example of Graph Traversal Using the Static Graph Traversal Algorithm
3.2 An Example of Graph Traversal Using the Dynamic Graph Traversal Algorithm
3.3 An Example of Graph Traversal Using the Heuristic Graph Traversal Algorithm
3.4 An Example of Graph Traversal on a K-partite Graph
4.1 The Architecture of a Partitionable Bus Network
4.2 A Graph Representation of Conflicts and Requests
4.3 The Graph Traversal of the Constructed Graph
4.4 The Detailed Organization of the Control Computer of the Partitionable Bus Network (DPBN)
4.5 The Organization of the Graph Traversal Unit
4.6 The Graph Traversal for the Strategy of Handling Aging

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

IMPROVING RESOURCE UTILIZATION IN A PARTITIONABLE BUS NETWORK USING GRAPH COLORING AND COIN-CHANGING ALGORITHMS

By

Tai-Kuo Woo

May 1989

Chairman: Dr. Stanley Y. W. Su
Major Department: Computer and Information Sciences

Achieving efficiency in a parallel processing environment is a fundamental problem in many computer science and engineering disciplines. In a large scale computer system, resources such as memory, secondary devices, and communication networks are usually shared by processors and processes. Resource contentions often occur, and system performance is degraded due to blocking and deadlock problems. System performance can be improved by detecting non-conflicting processes and scheduling them for parallel processing. In this dissertation, we introduce three graph coloring algorithms for distinguishing conflicting and non-conflicting processes. The complexity of each algorithm is O(E), where E is the number of edges of the graph. By interpreting the results of the graph traversal algorithm, non-conflicting processes can be scheduled for parallel communication or processing.

Another problem dealt with in this work is the idling problem in the execution of non-conflicting processes. Since the processes may take different amounts of time to execute, if processors are assigned to process them to their completion, the processors of shorter processes will be idle after the completion of their tasks. In this dissertation, a coin-changing algorithm is applied to achieve better scheduling of non-conflicting processes. Both the graph traversal algorithms and coin-changing algorithms are then applied in a dynamically partitionable bus network to demonstrate that non-conflicting communication requests can be identified and scheduled for execution in partitioned subnetworks. The performance of a dynamically partitionable bus network using these algorithms is evaluated.

This work makes the following specific contributions. First, it introduces and analyzes three graph coloring algorithms and their performance. An analytical study shows that the dynamic graph traversal algorithm has better performance than other existing graph coloring algorithms. Second, it presents a hardware design for implementing the dynamic graph traversal algorithm to meet the time requirements of some real-time environments. Third, it demonstrates the utility of the graph traversal algorithms in a partitionable bus network by analysis and simulation. Fourth, it provides an analysis of a coin-changing algorithm and its application to solve the bus idling problem in a partitionable bus network.

CHAPTER 1
INTRODUCTION

Parallel processing is an efficient form of information processing. It allows several concurrent processes to be processed simultaneously. System resources such as communication networks, memory devices, and processors are shared among processes. One of the limiting factors to the expansion of parallel processing is performance degradation due to resource contentions that occur when multiple processes request the same resources.

Local area networks are example systems in which resource contentions degrade system performance. They are characterized by a high bandwidth transmission medium shared by many interconnected processors within a limited area. In the past, local area networks were developed to provide limited functions such as message passing and file transfer, and the amount of data transmitted among processors was small. If the number of computers attached to the network is large and the volume of communication among processors and memory devices increases, conventional interconnection devices, such as a single shared bus or ring, cannot meet the increasing demands. Consequently, bus contention is unavoidable, and system performance degrades rapidly because of long communication delays.

Much effort has been made in developing efficient local area networks, such as the development of protocols to reduce contentions in CSMA/CD (carrier sense multiple access with collision detection) [37] and token passing [23], and the design of hardware in register insertion ring networks [17]. An alternative approach is to partition a communication bus and to allocate communication networks dynamically.

The concept of partitioning a computer system was introduced in [3,26,54]. Kartashev and Kartashev [26] propose varying word size to meet the requirements of the task on hand by physically partitioning registers and memory words. Su and Baru [3,54] use a physically partitionable bus to allow for the formation of a number of clusters of processors for parallel processing of database operations. However, the formation of clusters of processors is based on the data distribution and processing power needed. No specific algorithm is used to resolve the conflicts in the formation of clusters. In other words, database operations are scheduled on a first-come-first-served basis. Thus, the degree of parallelism is not extended to its potential limit.

In this dissertation, three graph coloring algorithms are proposed for identifying conflicting and non-conflicting communication requests. By assigning non-conflicting requests to different subnetworks created by manipulating switches of a dynamically partitionable bus, parallel communication can be achieved and maximized. This dissertation also presents a hardware design for implementing one of the algorithms. The performance evaluation of the partitionable bus network that uses a graph coloring algorithm for subnetwork formation is also presented.

Another problem associated with the dynamically partitionable bus network is bus idling. In a dynamically partitionable bus network, the non-conflicting communication requests being carried out at the same time in the subnetworks usually are of different time lengths. The subnetworks assigned to the requests that finish early have to wait for other communication processes to finish. As a result, the subnetworks are idle and are not fully utilized. The bus idling problem can be eliminated by using a coin-changing algorithm. The strategy works as follows. First, the length of each communication request is partitioned into a number of time frames of different sizes. For instance, if the set of frame sizes is (1, 5, 10, 15), a communication request of length 70 can be partitioned into 4 frames of size 15 and 1 frame of size 10. Each time the requests are collected for graph traversals, only the requests that have been assigned a tag frame can join the graph traversals. The tag frame is designated in turn from the largest frame size to the smallest frame size (i.e., 15 → 10 → 5 → 1). There are many ways a request can be decomposed. For instance, the length 70 can be decomposed into either 2 frames of size 15 and 4 frames of size 10 or 4 frames of size 15 and 1 frame of size 10. The objective is to use the minimum number of frames for a given set of frame sizes, since a large number of frames will result in a high frequency of graph traversals. The problem of minimizing the number of time frames can be transformed into a coin-changing problem, which minimizes the number of coins for a given set of coin types. A performance evaluation shows a significant increase of network throughput by applying the coin-changing algorithm to the dynamically partitionable bus network.

The approach presented in this dissertation can be outlined as follows:

1. Decompose each communication request into a number of time frames by a coin-changing algorithm.

2. Transform communication requests into a graph where vertices represent communication requests and edges denote the conflicts among communication requests.

3. Use graph coloring techniques to color the constructed graph so that non-conflicting communication requests are assigned the same color.

4. Manipulate the switches of the partitionable bus network to form a number of subnetworks to allow parallel communication among requests that have been assigned the same color.
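
Step 1 of the outline can be illustrated with a minimal sketch. It assumes a largest-frame-first (greedy) rule; the function name `decompose_into_frames` and the dictionary representation are hypothetical and chosen only for illustration. The frame sizes (1, 5, 10, 15) and the request length 70 are the ones used in the example above. The greedy rule is not guaranteed to give the minimum number of frames for every set of frame sizes.

```python
def decompose_into_frames(length, frame_sizes):
    """Split a request length into time frames, largest frame size first.

    Returns a dict mapping frame size -> number of frames of that size.
    This is a greedy sketch; it may be non-minimal for some frame-size sets.
    """
    counts = {}
    remaining = length
    for size in sorted(frame_sizes, reverse=True):
        # Use as many frames of this size as fit into what is left.
        counts[size], remaining = divmod(remaining, size)
    if remaining != 0:
        raise ValueError("length cannot be covered with the given frame sizes")
    return counts


FRAME_SIZES = (1, 5, 10, 15)  # frame sizes from the example in the text
frames = decompose_into_frames(70, FRAME_SIZES)
total_frames = sum(frames.values())

# The alternative decomposition mentioned in the text: 2 x 15 + 4 x 10.
alternative = {15: 2, 10: 4, 5: 0, 1: 0}
alternative_total = sum(alternative.values())

print(frames, total_frames, alternative_total)
```

For the request of length 70, the sketch yields 4 frames of size 15 and 1 frame of size 10 (5 frames in total), whereas the alternative decomposition needs 6 frames and would therefore take part in more graph traversals.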

This dissertation is organized as follows. Chapter 2 gives a survey of related work in local area networks, partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. Chapter 3 presents the graph traversal algorithms and their analyses. Chapter 4 delineates the application of the best graph traversal algorithm in a dynamically partitionable bus network (DPBN), the hardware design of the graph traversal unit (GTU), and the performance evaluation of the DPBN. Chapter 5 provides an analysis of the coin-changing algorithm and its application to the dynamically partitionable bus network. A conclusion is given in Chapter 6.

CHAPTER 2
SURVEY OF RELATED WORK

This dissertation touches on a number of areas, namely local area networks, partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. This chapter provides a brief survey of these areas.

2.1 Local Area Networks

The two most common topologies of local area networks are bus and ring. Channel sharing is one of the communication techniques used in both topologies. In a bus network, only one communication is allowed at a time. CSMA/CD and token bus are good examples of protocols used on this topology.

In the token bus architecture, access to a shared bus is achieved by passing a token from station to station. The order in which the token is passed follows a "logical ring," which can be modified dynamically as stations are added to or dropped from the bus, and reestablished when token pass failures occur. A token regulates the right to access the bus. When a station receives the token, it is granted control of the medium for a period of time. When the packet transmission is over, it passes the token to the next station in a predefined sequence. This protocol requires considerable maintenance, which includes network initialization, addition of stations to the network, deletion of stations from the network, and fault management (multiple tokens, missing token). In order to meet the above requirements, the logic of each station can be very complex, and the overhead involved in token passing is a waste of bandwidth.

In CSMA/CD, also known as listen before (while) talk, each station listens to the medium to determine if the medium
is available. The CD part indicates that a station listens to its own transmission to detect collisions. If the medium is idle, the station may either transmit immediately or transmit with a given probability. If the medium is busy, the station may either continue to listen until the medium is sensed idle, transmit with a given probability, or wait a random period before sensing the channel. When collisions occur, the station immediately stops transmitting messages and waits a random period of time before retransmission. A major disadvantage of CSMA/CD is that the communication delay has no deterministic bound. It is possible that a station may never be able to transmit messages when the load of the network is heavy.

In a ring network, such as the Cambridge Ring [42], there is a sequence of point-to-point links between stations. Messages travel over a fixed route from station to station around the loop, in which each station is responsible for regenerating messages and identifying addresses. In some cases, packets can be sent along different links simultaneously. Ring access protocols play a major role in the performance of ring networks. There are three basic types of ring access protocols: token passing, empty slot, and register insertion.

The token ring protocol uses a token that circulates around the ring. Each station wishing to transmit messages must wait for a free token passing by. When the transmission is over, the station passes the token to the physically adjacent station, as opposed to the logically adjacent station as in token bus networks. The disadvantages are similar to those of token bus networks: only one station is able to transmit messages at a time, and fault management increases the complexity of the network.

In the slotted ring, a number of fixed-length slots circulate around the ring. A station finds an available empty slot and inserts a packet of messages.
When the full slot reaches its destination, the leading bit of the slot is set to empty, and the slot is free for transmission again. The main disadvantages of slotted rings are: (1) overhead bits
take a significant amount of space, and (2) a data packet may not occupy the entire slot.

Each station in the register insertion ring has a shift register, a buffer which temporarily holds the packet passing through the station, and a buffer which stores the packets produced by the station. Packets which arrive at the shift register are received if the station is the addressee. For the output from the station, packets are placed in the buffer first and then transmitted to the shift register when it is idle. Fault management requires a significant amount of work.

No matter what protocol is used, the following three major problems still exist in ring networks [19]:

1. The operation of the network depends on each station's network controller to regenerate the message. Thus, if an interface or controller fails, the network is essentially broken. This raises a concern for reliability. Pierce [44] suggests using a "simple circuit" to bypass failed stations. Liu of Ohio State University [33] proposes a distributed double-loop computer network (DDLCN) as a fault-tolerant distributed system. Both approaches result in a higher hardware cost.

2. A ring network must be broken to add or delete stations. Expansion of the network may result in an interruption of network service.

3. Propagation delay is proportional to the number of stations in the network. As the number of stations increases, propagation delay can be a serious problem.

Register insertion ring networks [17] allow a certain degree of parallelism; however, they require a minimum of a 7-bit station latency for addressing. On the other hand, only a one-bit delay is sufficient for slotted and token rings. Long addressing delay
damages the performance of register insertion networks. All in all, the existing local area networks have defects and need to be improved. A partitionable bus network that allows multiple communication processes to be carried out at the same time is a possible solution.

2.2 Partitionable Bus Networks

As discussed above, both types of networks (bus and ring) have disadvantages. The lack of parallel communication has greatly reduced the performance of existing networks. The idea of bus partitioning was introduced in database machines to speed up data processing. There are two kinds of partitioning: logical partitioning and physical partitioning.

In logical partitioning, as used in DIRECT [11], a back-end computer allocates query processors to perform tasks and handles data requests from processors. The system is divided logically into a number of subsystems, and a controller monitors the progress of each subsystem. This scheme has two weak points, as mentioned in [55]. First, the controller may become the bottleneck, since the control of operations is centralized. Second, the data movement between the mass storage and the main memory of the query processors can cause memory contentions in the interconnection network.

In contrast to logical partitioning, physical partitioning reconfigures the hardware of the computer to maximize system efficiency. Kartashev and Kartashev [26], of Dynamic Computer Architecture, Inc., vary the word size by connecting computer elements to meet the requirements of the processing tasks on hand. Each computer element consists of a processor element, a memory element, and an I/O element. Su and Baru, at the University of Florida Database Systems Development and Research Center, use a physically partitionable bus in a system called SMS [3,54] to increase the degree of parallelism. It has three important features. First, data movement among
memory modules is done by memory switching instead of transmitting data among buses. Second, inter-processor communication and synchronization are achieved by using control lines. Third, concurrent processes are executed in parallel in the clusters of processors formed by physically partitioning a common bus. The formation of clusters of processors is based on the data distribution, processing power needed, and file sizes. No specific algorithm is used to resolve the conflicts in the formation of clusters. Thus, the degree of parallelism is not maximized.

This dissertation uses a fast graph coloring algorithm for distinguishing conflicting and non-conflicting communication requests and then adopts the idea of a partitionable bus to form a number of clusters of processors for processing the non-conflicting communication requests simultaneously. Thus, the system throughput can be significantly improved.

2.3 Graph Coloring Algorithms

Increasing system throughput and reducing response time are important objectives in designing and implementing a computer system. By duplicating hardware devices and allowing computation tasks to be processed in these devices in parallel, the degree of parallelism can be increased. However, the cost of such a hardware system can be rather high. An economical alternative for achieving a high degree of parallelism is to distinguish conflicting and non-conflicting events by software through graph coloring techniques and to execute non-conflicting events in parallel on a shared hardware resource.

There are two classes of graph coloring problems: vertex coloring and edge coloring. The vertex coloring of a graph is the assignment of colors to the vertices of a graph so that no two adjacent vertices have the same color. Edge coloring is defined in a similar fashion: two edges sharing a common vertex in a graph G must be assigned different colors. It has been proven that a graph G can be edge colored
with N colors if and only if the vertices of the line graph of G can be colored with N colors [41]. Thus, an edge coloring problem can be transformed into a vertex coloring problem. For the rest of this dissertation, the graph coloring problem is defined as a vertex coloring problem.

The vertex coloring technique can be applied in computer systems in the following way. If we represent the events of a computer system as the vertices of a graph and the conflicts among events as its edges, the event scheduling problem can be transformed into a vertex coloring problem. By finding the minimum number of colors for the vertices of a graph and scheduling the events (or vertices) of the same color for parallel execution, a higher degree of parallelism in a computer system can be achieved.

Finding the minimum number of colors (i.e., the chromatic number) required for a given graph is an NP-complete problem [15,25]. However, it is possible to find algorithms that can assign colors to any arbitrary simple graph (i.e., a graph without self-loops or multiple edges between any pair of vertices) such that the number of colors assigned is close to the chromatic number. We survey below the existing methods that attempt to find the exact chromatic number and the approximate chromatic number:

1. Exact chromatic number: Algorithms in this category find the chromatic number of a graph. However, they require a very large amount of computing time and memory space. For example, Christofides' algorithm [9] constructs a maximum subgraph tree for a given graph from its "maximal independent sets." Through a breadth-first search method, the shortest path from the root to the terminal nodes can be found. The maximal independent sets on the shortest path are the color classes of chromatic coloring. Unfortunately, the subtree constructed by this algorithm can be too large to be practical for implementation.

Christofides' algorithm was improved by Wang [59], who showed that, by considering only a subset of all the maximal independent sets of a graph, the required number of steps to compute the chromatic number of a graph can be reduced by a factor of as much as $(|V|/2!)$, where $|V|$ is the number of vertices of the graph and $2!$ denotes 2 factorial. By using a depth-first search method, the amount of memory required to find the shortest path from the root to the terminal nodes can be reduced by a ratio of $(|V|/8!)$ to $(|V|/2!)$. However, the computing time of this algorithm grows exponentially with the size of the graph.

2. Approximate chromatic number: Due to the intractability of the graph coloring problem, some algorithms produce a larger number of colors for a graph than is necessary. For instance, the Greedy algorithm arranges the vertices in arbitrary order, say, $V_1, V_2, V_3, \ldots, V_n$, and then colors the vertices in sequence. First, color $V_1$ with color 1, then check if $V_1$ and $V_2$ are non-adjacent. If so, color $V_2$ with color 1, otherwise with color 2. Continue this process until all the vertices are colored. The result of the Greedy algorithm is highly dependent on the order of vertices.

Brooks [6] provides a fast way of computing the upper bound for the chromatic number of a graph. The upper bound is defined as $\max(d_1, d_2, d_3, \ldots, d_n)$, where $d_i$ is the degree of vertex $i$. The estimation is valid only if the graph $G(V, E)$ is not a complete graph and the largest degree of the vertices of the graph is greater than or equal to three.
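
The order dependence of the Greedy algorithm can be seen in a minimal sketch. The sketch assumes the usual first-fit rule (each vertex receives the smallest color not used by its already colored neighbors), which agrees with the description above for the first two vertices. The function name `greedy_coloring` and the four-vertex path graph are hypothetical and serve only as an illustration.

```python
def greedy_coloring(adjacency, order):
    """Color vertices in the given order with the smallest free color (first fit).

    adjacency maps each vertex to the set of its neighbors.
    Returns a dict mapping vertex -> color (colors are 1, 2, 3, ...).
    """
    colors = {}
    for v in order:
        # Colors already taken by neighbors that have been colored so far.
        used = {colors[u] for u in adjacency[v] if u in colors}
        c = 1
        while c in used:
            c += 1
        colors[v] = c
    return colors


# Hypothetical example: a path a - b - c - d, which needs only two colors.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

good = greedy_coloring(path, ["a", "b", "c", "d"])
bad = greedy_coloring(path, ["a", "d", "b", "c"])

print(max(good.values()), max(bad.values()))
```

With the order a, b, c, d the sketch uses two colors; with the order a, d, b, c the two end vertices both receive color 1, which forces a third color for vertex c.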

The method introduced by Welsh and Powell [60] first arranges the vertices in descending order of their degrees (i.e., $d_1 \ge d_2 \ge d_3 \ge \cdots \ge d_n$). It then colors the first vertex with color 1 and sequentially colors the rest of the vertices which are not adjacent to a previously colored vertex. This process is repeated until all the vertices are colored. Welsh and Powell's algorithm provides an upper bound for the estimation of the chromatic number of a graph:

$\max_i \min(d_i + 1, i)$

where $i$ is the index of the vertices arranged in descending order of degrees. The example below illustrates the procedure for computing the upper bound of the chromatic number of the graph shown in Figure 2.1. The following steps constitute the procedure:

Step 1: Arrange the degrees of the vertices in descending order.

$d_i$:  4  3  3  2  2

Step 2: Select the minimum for each pair of values $(d_i + 1, i)$, where $i$ goes from 1 to 5.

$\min(d_i + 1, i)$:  1  2  3  3  3

Figure 2.1. An Example of Computing the Upper Bound of a Graph Using Welsh and Powell's Algorithm

Step 3: Choose the largest value among the values in Step 2.

$\max_i \min(d_i + 1, i) = 3$

In this example, it happens to be the chromatic number of the graph.

This algorithm produces good results in some cases. However, the number of colors required may increase if the order of vertices is changed. To correct this problem, Wood [61] proposes a coloring method based on a similarity matrix. A similarity matrix $S = \{s_{ij}\}$ is formed to determine which vertices should be colored with the same color. The values of $s_{ij}$ are determined by the conflicts of vertices:

$s_{ij} = 0$ if $c_{ij} = 1$,
$s_{ij} = \sum_k (c_{ik} \wedge c_{jk})$ if $c_{ij} = 0$.

It should be noted that $k$ is summed over all vertices and $C = \{c_{ij}\}$ is the conflict matrix, where $c_{ij} = 1$ if vertex $i$ and vertex $j$ have a conflict, and $c_{ij} = 0$ if vertex $i$ and vertex $j$ have no conflict. To color the vertices of a graph, assign a color to the pair of vertices with the highest similarity based on whether both, one, or none of the vertices has been colored. Each time the similarity matrix is scanned, the similarity level is reduced by one. Continue this process until all the vertices are colored. The similarity matrix method is better than Welsh and Powell's method when the number of vertices is large and the graph is highly connected. However, the improvement is limited and the computation time requirement is high.

Plesnik [45] partitions a graph into a spanning graph F and a number of induced subgraphs $(V_1, V_2, V_3, \ldots)$. By estimating the chromatic number of the spanning graph F through the same partitioning technique, the chromatic number of graph G is computed as the sum of the chromatic numbers of the
first $f$ subgraphs (i.e., $g_1 + g_2 + g_3 + \cdots + g_f$), where $g_i$ is the chromatic number for subgraph $V_i$ and $f$ is the chromatic number for spanning graph F. The partitioning technique is not clearly stated in the paper and the performance of the algorithm is not available.

Other researchers have proposed factoring techniques for coloring [18,45]. In these techniques, a graph is decomposed into factors $F_1, F_2, F_3, \ldots, F_j$ (the F's are subgraphs). The chromatic number is estimated as the sum of the chromatic numbers of the subgraphs. However, good factoring techniques usually require a large amount of computing time. Several other algorithms are available and can be found in [5,31,36,40].

All the graph coloring algorithms mentioned above share a common feature: the algorithms are rather complex. Although Welsh and Powell's method is superior to others in terms of simplicity and computation time requirement [5], it predicts the chromatic number poorly when the graph is highly connected.

Graph coloring algorithms have been used in the timetable design of examinations [61]. The events scheduled are examinations, each of which requires a period of time. The problem can be transformed into a graph coloring problem by representing each examination with a vertex of a graph and drawing an edge between two vertices if the corresponding examinations cannot take place concurrently. The objective is to find the minimal number of periods required to accommodate all the examinations. Newman-Wolfe [43] uses edge coloring to schedule communication requests statically in multistage interconnection networks, where I/O ports and connection lines are represented by vertices and edges of a graph, respectively. Edges assigned the same color are switched on to allow communications to go through simultaneously.
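
The Welsh and Powell procedure and its upper bound, described earlier in this section, can be sketched as follows. The degree sequence (4, 3, 3, 2, 2) is the one used in the worked example. The five-vertex graph in the sketch is hypothetical: it was chosen only because it has that degree sequence, and it is not claimed to be the graph of Figure 2.1. The function names are likewise assumptions made for illustration.

```python
def welsh_powell_bound(degrees):
    """Upper bound max_i min(d_i + 1, i), degrees taken in descending order."""
    ordered = sorted(degrees, reverse=True)
    return max(min(d + 1, i) for i, d in enumerate(ordered, start=1))


def welsh_powell_coloring(adjacency):
    """Color by descending degree: each pass gives one new color to every
    uncolored vertex that is not adjacent to a vertex already in that color."""
    order = sorted(adjacency, key=lambda v: len(adjacency[v]), reverse=True)
    colors = {}
    color = 0
    while len(colors) < len(order):
        color += 1
        members = []  # vertices that received the current color in this pass
        for v in order:
            if v in colors:
                continue
            if all(u not in adjacency[v] for u in members):
                colors[v] = color
                members.append(v)
    return colors


bound = welsh_powell_bound([4, 3, 3, 2, 2])

# Hypothetical graph with degrees 4, 3, 3, 2, 2 (not the graph of Figure 2.1).
graph = {
    "v1": {"v2", "v3", "v4", "v5"},
    "v2": {"v1", "v3", "v4"},
    "v3": {"v1", "v2", "v5"},
    "v4": {"v1", "v2"},
    "v5": {"v1", "v3"},
}
coloring = welsh_powell_coloring(graph)
colors_used = max(coloring.values())

print(bound, colors_used)
```

In a timetabling reading of this sketch, the vertices would stand for examinations and each color for one examination period, so the number of colors used is the number of periods needed by this particular heuristic.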

Many people recognize the usefulness of the graph coloring algorithms, but because of the intractability of the graph coloring problem, few realistic applications have been proposed so far. Most approaches are static in nature, i.e., the participating events and constraints remain unchanged during the entire period of processing. The designation of vertices as processors in multiprocessor scheduling and the representation of switch boxes by vertices in multistage interconnection networks are examples of a static scheduling scheme which does not fully reflect the dynamic nature of request contention in a computer system and is therefore unable to optimize resource utilization.

The approach proposed in this dissertation is to schedule events dynamically through graph traversal algorithms. The detailed description will be given in Chapter 4. This dissertation presents three graph traversal algorithms which give good estimations of the chromatic number of a graph in O(E) time, where E is the number of edges of the graph. They can be applied to speed up the process of distinguishing conflicting and non-conflicting requests. However, because the requests do not end at the same time, the subnetworks assigned to the processes that finish early will be idle. The problem can be solved by fragmenting the messages of the communication requests and by using a fast coin-changing algorithm to schedule them for execution.

2.4 Coin-changing Algorithms

In distributed processing, each individual processor needs to perform tasks independently and cooperate with other processors if necessary. Many of the computing tasks require cooperation among processors, and thus result in heavy loads on networks. Performance of the system degrades because of network contentions. The approach of using a graph coloring algorithm solves the contention problem and allows for parallel communication. What remains is the bus idling problem, which can
be solved by a coin-changing algorithm. The bus idling problem is due to the fact that the communication processes that are allowed to proceed in the partitioned subnetworks take different time durations, and the subnetworks that are released by the processes of shorter durations cannot be reused until all the processes are completed.
The coin-changing problem can be defined as follows. For a given set of coin types $W = \{W_1, W_2, \ldots, W_n\}$, the goal is to meet a given total value $T$ with a minimum number of coins. It is assumed that the supply for each type of coin is unlimited. Mathematically, the problem can be formulated as follows:

Minimize $C = \sum_{i=1}^{n} X_i$ subject to $\sum_{i=1}^{n} W_i X_i = T$, where each $X_i$ is a non-negative integer.

The coin-changing problem is a simplified version of the integer programming problem. The existing work on this topic can be categorized as follows:
1. Exact algorithms: Researchers look for algorithms that can find optimal solutions for any arbitrary set of coin types. Chang and Gill [8] provide a recursive algorithmic solution which requires a considerable amount of computation time. Wright [62] proposes a solution using dynamic programming techniques to simplify the computation task. Both algorithms find optimal solutions. However, the computation time required makes the algorithms of little practical value.
2. The Greedy algorithm [8]: The Greedy algorithm is simple and fast. It takes as many of the largest coins as possible and then as many of the second largest coins as possible, etc. The procedure below implements the Greedy algorithm:

PAGE 28

Procedure Coin-Changing (W, X, T);
(* W = (W1, W2, W3, ..., Wn) is the set of coin types in increasing order;
   X = (X1, X2, X3, ..., Xn) is the representation of T with respect to W *)
begin
  For i := n downto 1 do
  begin
    Xi := T DIV Wi;     (* take as many coins of type i as fit *)
    T := T - Xi * Wi    (* amount still to be represented *)
  end;
end;

However, the algorithm does not always find optimal solutions. In some cases, the algorithm does not find a solution even though one exists. For instance, when the set of coin types is W = (5, 11, 15, 20) and the total is T = 30, the Greedy algorithm would return the representation X = (2, 0, 0, 1) instead of the optimal representation Y = (0, 0, 2, 0). If the total is T = 26, the Greedy algorithm would not return any representation even though an optimal representation, Y = (0, 1, 1, 0), exists. In this category, researchers are trying to determine if a set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ will yield optimal solutions when the Greedy algorithm is applied, as discussed in [7,8,58]. For instance, in Chang and Gill's paper, a range of values of T (the sum of the values of coin types), $\left(W_3,\; W_n (W_n W_{n-1} + W_n - 3W_{n-1}) / (W_n - W_{n-1})\right)$, has to be checked for optimality with a given set of coin types W. Magazine et al. [34] provide methods that would check fewer values of T.
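The two failure cases quoted above can be reproduced with a short Python sketch. It pairs the Greedy procedure with a plain dynamic-programming search for the optimum; the function names `greedy_change` and `optimal_change` are hypothetical, and the dynamic program is a generic textbook formulation rather than the specific method of Wright [62].

```python
def greedy_change(coins, total):
    """Greedy rule: coins are in increasing order, largest type is tried first.

    Returns (counts, remainder); a non-zero remainder means the greedy
    rule failed to represent the total.
    """
    counts = [0] * len(coins)
    for i in range(len(coins) - 1, -1, -1):
        counts[i], total = divmod(total, coins[i])
    return counts, total


def optimal_change(coins, total):
    """Minimum-coin representation by dynamic programming, or None."""
    INF = float("inf")
    best = [0] + [INF] * total           # best[t] = fewest coins summing to t
    last = [None] * (total + 1)          # index of the last coin used for t
    for t in range(1, total + 1):
        for i, w in enumerate(coins):
            if w <= t and best[t - w] + 1 < best[t]:
                best[t] = best[t - w] + 1
                last[t] = i
    if best[total] == INF:
        return None
    counts = [0] * len(coins)
    t = total
    while t > 0:                          # walk back through the choices
        counts[last[t]] += 1
        t -= coins[last[t]]
    return counts


W = [5, 11, 15, 20]
print(greedy_change(W, 30), optimal_change(W, 30))
print(greedy_change(W, 26), optimal_change(W, 26))
```

For T = 30 the greedy pass uses three coins where two suffice, and for T = 26 it ends with a remainder of 1, which corresponds to "no representation" in the text.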

PAGE 29

2.5 Summary

In order to meet the demand of a high performance local area network, both the graph coloring algorithms and coin-changing algorithms are applied in a dynamically partitionable bus network to improve the network performance. In this section, we have presented the survey of related work including partitionable bus networks, graph coloring algorithms, and coin-changing algorithms. The application, analysis, and performance evaluation of the graph coloring algorithms and the coin-changing algorithms will be presented in the next three chapters.

PAGE 30

CHAPTER 3
GRAPH COLORING ALGORITHMS

The graph coloring algorithms are the foundation of this research. Three fast graph traversal algorithms which give good estimates of the chromatic number of a graph are presented. This chapter is organized as follows: Section 3.1 delineates the algorithms including the static, dynamic, and heuristic algorithms. Section 3.2 evaluates the performance of the dynamic graph traversal algorithm. Section 3.3 shows the results of the performance evaluation.

3.1 The Graph Traversal Algorithms

The algorithms to be presented below assign colors to the vertices of a graph through graph traversals. Each traversal results in some colored vertices and "thrown-away" vertices. Through repeated traversals of the "thrown-away" vertices, all vertices are assigned colors.

3.1.1 Static Algorithm

Given a simple graph, the following steps are taken to color the vertices as illustrated in Figure 3.1.
1. Arbitrarily pick one vertex in the graph and label it as 1.
2. Assign a count to its adjacent vertices with a value that is one greater than the count of the vertex being traversed and mark the corresponding edges. Repeat this step on the new adjacent vertices until all edges are marked. If an adjacent vertex already has a count, a new count is assigned to it. Thus, vertices may have multiple counts.

PAGE 31

3. Examine vertices with multiple counts. If the sum of any two counts associated with a vertex is odd, discard the vertex. For the remaining vertices, assign color 1 to those vertices whose sum of the counts is even. Assign color 2 to the rest of the vertices. This step colors the graph with two colors (color 1 and color 2) and discards some vertices.
4. The thrown-away vertices form a new graph which is then traversed starting from Step 1. The algorithm terminates if no thrown-away vertex remains.
Figure 3.1 shows an example of the graph traversal algorithm. In Figure 3.1a, vertex A is arbitrarily picked and is labeled as 1. The adjacent vertices of vertex A (i.e., vertex B and vertex C) are labeled as 2 as shown in Figure 3.1b. Repeating the same procedure, vertices D, E, and F are labeled as 3 in Figure 3.1c. Finally, in Figure 3.1d, vertex E is labeled with two 4's, due to its adjacency to vertex D and vertex F. So far, all the edges of the graph have been marked and vertex E has multiple counts. According to step 3 of the static graph traversal algorithm, vertices B and C are assigned color 1, vertices A, D, and F are assigned color 2, and vertex E is discarded. Obviously, vertex E is assigned color 3 in the next round of traversal. The approximate chromatic number of this graph produced by this algorithm is 3, which happens to be the actual chromatic number.

3.1.2 Dynamic Algorithm

The dynamic graph traversal algorithm is intended to improve the result of the static algorithm in the following ways:
1. Reduce the number of thrown-away vertices.
2. Simplify the labeling technique.

PAGE 32

Figure 3.1. An Example of Graph Traversal Using the Static Graph Traversal Algorithm (panels a through d)

PAGE 33

3. Color the vertices of the graph "on the fly".
The dynamic graph traversal algorithm labels, colors, and throws away vertices as each vertex is traversed. A vertex whose sum of the counts is odd is discarded immediately to avoid further consideration. The following steps form the algorithm:
1. Arbitrarily pick one vertex and label it as 1. The vertex chosen is the front vertex.
2. Label the adjacent vertices of the front vertex in random order with the 1's complement value of the count of the front vertex. For each labeling of a new vertex, if the sum of any two counts is 1, discard the vertex right away. Repeat this procedure for each vertex adjacent to the front vertices being traversed.
3. Designate the newly labeled vertices as the front vertices. Go to step 2 if there are still unlabeled vertices.
4. Go to step 1 if there are vertices that have been thrown away in step 2.
Figure 3.2 demonstrates the steps of the dynamic graph traversal. In Figure 3.2a, vertex A, labeled as 1, is the front vertex. Vertices B and C, which are adjacent to the front vertex A, are labeled as 0 in Figure 3.2b (the complement value of the count of the front vertex A). Note that the counts of each vertex are checked immediately to see if the vertex needs to be thrown away. At this point, vertices B and C are the new front vertices. In Figure 3.2c, vertex C is labeled as 1 and is thrown away because it is adjacent to vertex B. As a result, since there is no more front vertex to be processed, the algorithm would randomly pick a new starting vertex from the unlabeled vertices (i.e., vertices D or E) and label it as 1, as shown in Figure 3.2d. According to step 2, vertex E is labeled as 0. So far, all the vertices have been labeled and colored. The

PAGE 34

Figure 3.2. An Example of Graph Traversal Using the Dynamic Graph Traversal Algorithm

PAGE 35

group of vertices with a count of 1 is colored with the first color, the group of vertices with a count of 0 is colored with the second color, and the vertex C is colored with the third color in the next round of traversal.
The advantages of this algorithm are obvious as compared to the static graph traversal algorithm. They are:
• Vertices are thrown away "on the fly" as they are traversed so that the edges associated with them do not have to be traversed as they do in the static algorithm.
• Fewer edges need to be traversed.
• Labeling vertices with 0's and 1's is simpler than labeling them with counts.
In addition, the dynamic graph traversal algorithm has some mathematical properties which simplify the analysis of its performance. Details are given in Section 3.2.

3.1.3 Heuristic Algorithm

In some cases, both the static and dynamic algorithms will throw away a high percentage of vertices. Figure 3.3 gives an example. Suppose that vertex A is chosen as the front vertex using the dynamic algorithm and labeled as 1 and vertices B, C, D, and E are labeled as 0. If the labeling is in the direction of B->C, B->D, and B->E in the next step, three vertices (C, D, and E) will be thrown away. On the other hand, if the labeling of vertices is in the opposite direction (C->B, D->B, and E->B), only vertex B will be thrown away. This motivates us to pursue some heuristics to further reduce the number of vertices that will be thrown away. Two heuristic rules are given below:

PAGE 36

Figure 3.3. An Example of Graph Traversal Using the Heuristic Graph Traversal Algorithm

PAGE 37

1. Pick the vertex with the largest degree (i.e., the largest number of edges) as the starting vertex. The idea is to expand the set of front vertices as soon as possible.
2. In exploring the adjacent vertices of a front vertex, start with the vertex of the smallest degree. The justification for this rule is to throw away the vertices that break up most odd cycles as soon as possible. The odd cycles are the ones with an odd number of vertices.
If we incorporate the heuristics proposed above into the dynamic graph traversal algorithm, the number of thrown-away vertices can be significantly reduced. The example in Figure 3.3 throws away only one vertex when the heuristic graph traversal algorithm is used.

3.2 Analysis of the Dynamic Graph Traversal Algorithm

The dynamic graph traversal algorithm has some mathematical properties which yield a simpler analysis of the bound of the approximate chromatic number. First, we define some theorems and related terms.
Definition 3.2.1 Bipartite graph: A graph G(V, E) is a bipartite graph if there exist subsets $V_1$ and $V_2$ of $V$, where $V_1 \uplus V_2 = V$, such that each edge of the graph (i.e., $e \in E$) has one endpoint in $V_1$ and the other in $V_2$. A graph is a bipartite graph if and only if it is two-colorable.
Corollary 3.2.1 The dynamic graph traversal algorithm is an efficient way of determining if a graph is two-colorable (bipartite). Applying the dynamic traversal algorithm to a graph G, if a vertex is thrown away in the middle of traversal, the graph is not two-colorable.
Footnote: $\uplus$ denotes the union of disjoint sets.
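One round of the dynamic traversal, with the two heuristic rules as an optional switch, might be sketched in Python as follows. The function name `dynamic_traversal`, the adjacency-list layout, and the example graph are assumptions made for illustration: the graph is a guess at the structure behind Figure 3.3 (A and B adjacent to each other and to each of C, D, and E), not a transcription of the figure.

```python
from collections import deque


def dynamic_traversal(adj, heuristic=False):
    """One round of the dynamic traversal.

    adj maps each vertex to a list of neighbours.
    Returns (ones, zeros, thrown): the two color groups and the
    thrown-away vertices.
    """
    def degree(v):
        return len(adj[v])

    # Rule 1: start from the vertex of largest degree.
    starts = sorted(adj, key=degree, reverse=True) if heuristic else list(adj)
    label, thrown = {}, set()
    for start in starts:
        if start in label:
            continue
        label[start] = 1                  # step 1: a fresh front vertex
        front = deque([start])
        while front:
            u = front.popleft()
            if u in thrown:
                continue                  # edges of discarded vertices are skipped
            # Rule 2: explore neighbours of smallest degree first.
            nbrs = sorted(adj[u], key=degree) if heuristic else adj[u]
            for v in nbrs:
                if v in thrown:
                    continue
                if v not in label:
                    label[v] = 1 - label[u]   # complement of the front's count
                    front.append(v)
                elif label[v] == label[u]:
                    thrown.add(v)         # v would hold both 0 and 1: discard
    ones = {v for v in adj if label[v] == 1 and v not in thrown}
    zeros = {v for v in adj if label[v] == 0 and v not in thrown}
    return ones, zeros, thrown


# Assumed structure for the situation described around Figure 3.3.
graph = {"A": ["B", "C", "D", "E"], "B": ["A", "C", "D", "E"],
         "C": ["A", "B"], "D": ["A", "B"], "E": ["A", "B"]}

plain = dynamic_traversal(graph)
tuned = dynamic_traversal(graph, heuristic=True)
```

On this graph the plain order discards C, D, and E, while the heuristic order discards only B. An empty `thrown` set after one round means the two groups form a proper two-coloring, which is the check stated in Corollary 3.2.1.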

PAGE 38

Corollary 3.2.2 Each traversal of the dynamic graph traversal algorithm partitions the vertices of the graph into three groups of vertices: vertices with color 1, vertices with color 2, and the thrown-away vertices.
Definition 3.2.2 Independent set: A set of vertices is said to be independent if no two of its vertices are adjacent to each other.
Definition 3.2.3 Maximal independent set: An independent set I is maximal if no other independent set in graph G contains I.
Theorem 3.2.1 For a connected graph, each traversal of the dynamic algorithm generates two maximal independent sets (i.e., the set of vertices with color 1 (set 1) and the set of vertices with color 2 (set 2)). It should be noted that a maximal independent set is not the same as a maximum independent set.
Proof of Theorem 3.2.1 The claim that the set of vertices with color 1 is a maximal independent set is based on the following argument. No vertices in set 2 can be added to set 1 without breaking its independence, since each vertex in set 2 is adjacent to at least one vertex in set 1. Furthermore, no vertex in the set of thrown-away vertices (set 3) can be added to set 1, since each vertex in set 3 is adjacent to at least one vertex in set 1 and one vertex in set 2. As a result, set 1 must be a maximal independent set. The same argument can be used to prove that set 2 is also a maximal independent set.
Theorem 3.2.2 The dynamic graph traversal algorithm produces the exact chromatic number 2 for a bipartite graph.
Proof of Theorem 3.2.2 Without loss of generality, it can be assumed that the graph G is connected. (If the graph is not connected, the algorithm simply treats each

PAGE 39

component separately.) For a connected bipartite graph, the number of maximal independent sets is two, which can be determined by the dynamic graph traversal algorithm in the first traversal of the graph (Theorem 3.2.1). No vertices are thrown away. As a result, the algorithm generates the exact chromatic number for a bipartite graph.
Definition 3.2.4 k-partite graph: A graph whose vertex set can be partitioned as $V = V_1 \uplus V_2 \uplus \cdots \uplus V_k$ such that there exists no edge between a pair of vertices located in the same partitioned set.
Definition 3.2.5 Complete k-partite graph: A k-partite graph is complete if there exists an edge between each pair of vertices located in distinct partitioned sets.
Theorem 3.2.3 The dynamic graph traversal algorithm always produces the exact chromatic number k for a complete k-partite graph.
Proof of Theorem 3.2.3 Let G(V, E) be a complete k-partite graph with vertex partitions $V_1, V_2, \ldots, V_k$. Each edge of this graph has its endpoints in two distinct partitions (see Figure 3.4a for an example of a k-partite graph). Without loss of generality, a vertex $s_1$ in $V_1$ is picked as the starting vertex and is assigned a count of 1. In Figure 3.4b, the adjacent vertices of the starting vertex $s_1$ (all the vertices not in $V_1$) are assigned a count of 0 according to the dynamic graph traversal algorithm. Now, all the vertices not in $V_1$ are the front vertices. In the next step, pick a vertex from the set of front vertices, say, $s_2$ in $V_2$, for expansion. The adjacent vertices of $s_2$ are given a count of 1 as shown in Figure 3.4c. At this step, all the vertices in $V_3$ through $V_k$ are thrown away because the sum of two counts is odd, and the vertices in $V_1$ and $V_2$ are assigned colors 1 and 2, respectively. Thus, a new (k-2)-partite graph is formed with vertex partitions $V_3, V_4, \ldots, V_k$. By repeating the same steps, the dynamic graph traversal algorithm uses k colors for the k-partite graph.
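Theorems 3.2.2 and 3.2.3 can be checked on small instances with a Python sketch that repeats the traversal on the thrown-away vertices until none remain. The helper names `color_graph` and `complete_k_partite` and the partition sizes are hypothetical; the traversal order is simply the insertion order of the vertices, standing in for the "arbitrary" choices in the algorithm.

```python
from collections import deque
from itertools import combinations


def color_graph(adj):
    """Repeat the dynamic traversal on thrown-away vertices.

    Returns a list of color classes (each a list of vertices).
    """
    remaining = list(adj)
    classes = []
    while remaining:
        active = set(remaining)           # vertices of the current round
        label, thrown = {}, set()
        for start in remaining:
            if start in label:
                continue
            label[start] = 1
            front = deque([start])
            while front:
                u = front.popleft()
                if u in thrown:
                    continue
                for v in adj[u]:
                    if v not in active or v in thrown:
                        continue
                    if v not in label:
                        label[v] = 1 - label[u]
                        front.append(v)
                    elif label[v] == label[u]:
                        thrown.add(v)
        for bit in (1, 0):                # each non-empty group is one color
            group = [v for v in remaining
                     if label[v] == bit and v not in thrown]
            if group:
                classes.append(group)
        remaining = [v for v in remaining if v in thrown]
    return classes


def complete_k_partite(sizes):
    """Adjacency lists of a complete k-partite graph; vertex = (part, index)."""
    parts = [[(i, j) for j in range(n)] for i, n in enumerate(sizes)]
    adj = {v: [] for part in parts for v in part}
    for p, q in combinations(parts, 2):
        for u in p:
            for v in q:
                adj[u].append(v)
                adj[v].append(u)
    return adj


for sizes in [(3, 4), (2, 3, 2), (2, 2, 2, 2)]:
    print(sizes, len(color_graph(complete_k_partite(sizes))))
```

For these inputs the number of color classes equals the number of partitions, as the theorems predict; the sketch is a spot check on a few graphs, not a proof.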

PAGE 40

Figure 3.4. An Example of Graph Traversal on a K-partite Graph (panels a through c)

PAGE 41

Definition 3.2.6 Random graph model: A random graph model $M(|V|, q)$ is defined in terms of the number of vertices of a graph ($|V|$) and the probability ($q$) that an edge exists between any distinct pair of vertices, where $0 < q < 1$. This model consists of a set of graphs, each of which has $|V|$ vertices and whose edges are determined by a random number generator. For a given q value, if a random number is less than q, then the edge between two vertices exists. A set of graphs is generated for the model in this manner. The random graph model is used in the performance evaluation of the dynamic graph traversal algorithm, since it generates all possible graphs with equal probability.
Definition 3.2.7 Clique: A clique here means a maximal complete subgraph of a graph.
Theorem 3.2.4 The expected number of maximal independent sets of a random graph containing a fixed vertex has been derived in [35,59] as follows:
$$E_1 = \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}$$
PAGE 42

The calculation of the number of maximal independent sets for a random graph model $M(|V|, q)$ is done by counting the number of cliques and replacing $q$ with $(1 - q)$, since a maximal independent set is a clique in the complement graph [15]. The counting of cliques can be explained as follows:
1. The size of cliques ranges from 1 to $|V|$.
2. For each size of cliques ($d$), there will be $\binom{d}{2}$ edges, and the probability is $q$ raised to the $\binom{d}{2}$th power.
3. If a vertex can be added to a clique, it has to be fully connected to the vertices in the clique. The probability that this situation happens is $q$ raised to the $d$th power. The probability that a vertex cannot be added to a clique is therefore $(1 - q^d)$. Since there are $(|V| - d)$ vertices not in the clique, $(1 - q^d)$ has to be raised to the $(|V| - d)$th power.
4. The product of the probabilities derived in items 2 and 3 is the probability that a clique with d vertices exists.
5. Since a fixed vertex must be included in the clique, the number of combinations for a clique of size d is $\binom{|V|-1}{d-1}$.
6. The product of the terms derived in items 4 and 5 is the expected number of cliques of d vertices in a random graph model with $|V|$ vertices under the condition that a fixed vertex must be included in the clique.
7. $E_1$ (the expected number of cliques) is calculated by taking the summation of the expected numbers of cliques for different sizes of cliques.

PAGE 43

8. The expected number of maximal independent sets is obtained by replacing $q$ with $(1 - q)$ in $E_1$, since cliques and maximal independent sets complement each other:
$$E_1 = \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}$$
Theorem 3.2.5 The expected number of maximal independent sets of a random graph model is
$$E_2 = \sum_{d=1}^{|V|} \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}$$
where $E_2$ is the expected number of maximal independent sets of a random graph model. The calculation is similar to Theorem 3.2.4.
Definition 3.2.8 Color class: In a colored graph, the set of vertices with the same color forms a color class.
Definition 3.2.9 Expected chromatic number (ECN): The expected value of the chromatic number for a random graph model is calculated by taking the summation of the product of each possible chromatic number and its probability.

PAGE 44

It should be noted that the closed form for the expected chromatic number is not known at this moment. In this section, the approximate value of the expected chromatic number which can be obtained by applying the dynamic graph traversal algorithm to a random graph is computed. Methods for coloring random graphs have been proposed by several researchers [4,14,24,32]. Various constraints on the random graphs, such as small graphs, dense graphs, etc., were introduced in their algorithms. This work considers general random graphs without constraints. In the rest of this section, the theoretical basis for evaluating the performance of the dynamic graph traversal algorithm on general random graphs is provided.
Given a random graph, the assignment of colors to its vertices is done by applying the dynamic graph traversal algorithm repeatedly to the random graph until each vertex has been assigned a color. Since each traversal produces two maximal independent sets, and each maximal independent set forms a color class, the number of maximal independent sets produced by the algorithm is equal to the approximate value of the expected chromatic number of the random graph model defined by $|V|$ and $q$. The approximate value of the expected chromatic number can be calculated as follows:
Step 1. Calculate the expected size of the maximal independent sets (ESMIS). The expected size of maximal independent sets (ESMIS) is equal to the sum of the number of maximal independent sets with d vertices times d vertices (where d may go from 1 to $|V|$) and divided by the total number of maximal independent sets, i.e.,

PAGE 45

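$$\mathrm{ESMIS} = \frac{\sum_{d=1}^{|V|} d \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}}{\sum_{d=1}^{|V|} \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}}$$
The numerator can be simplified:
$$\sum_{d=1}^{|V|} d \binom{|V|}{d} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d} = \sum_{d=1}^{|V|} d\, \frac{|V|!}{d!\,(|V|-d)!} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d}$$
$$= |V| \sum_{d=1}^{|V|} \binom{|V|-1}{d-1} (1-q)^{d(d-1)/2} \left(1-(1-q)^{d}\right)^{|V|-d} = |V|\, E_1$$
Thus,
$$\mathrm{ESMIS} = \frac{|V|\, E_1}{E_2}$$
where $E_1$ and $E_2$ are the expected number of maximal independent sets containing a fixed vertex and the expected number of maximal independent sets, respectively.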

PAGE 46

Step 2. The approximate value of the expected number of maximal independent sets is calculated by dividing the number of vertices of the graph by the expected size of maximal independent sets, i.e.,
$$\mathrm{AECN} = \frac{|V|}{\mathrm{ESMIS}}$$
where
AECN = approximate value of the expected chromatic number
$|V|$ = number of vertices of the random graph
ESMIS = expected size of maximal independent sets
The dynamic graph traversal algorithm partitions a random graph into a number of maximal independent sets which are picked from a pool of all possible maximal independent sets for the random graph. Depending on the patterns of random graphs, different sets of maximal independent sets will be chosen. Even for random graphs with the same pattern, the chosen maximal independent sets may vary, since the algorithm picks the starting vertex arbitrarily when traversing the graph and the order of exploring the adjacent vertices of a front vertex is random. Since each maximal independent set of the pool of maximal independent sets has the same probability to be chosen, and the expected size of the maximal independent sets in the pool can be computed as a function of $|V|$ and $q$, $|V| / \mathrm{ESMIS}$ is the expected number of maximal independent sets that will be produced by the dynamic graph traversal algorithm. If each maximal independent set is a color class, the expected number of maximal independent sets is the approximate value for the expected chromatic number. As an example, for a random graph with 50 vertices in which the probability that an edge exists between any pair of vertices is 0.5, the following values can be derived using the above equations.

PAGE 47

$E_1$ = 98.78
$E_2$ = 963.50
ESMIS = 5.14
AECN = 9.7

3.3 Comparison of Results

In this section, both the simulation results obtained by the method that orders vertices and uses a similarity matrix [61] and the analytical results of the dynamic graph traversal algorithm are provided. The results are shown in Table 3.1.

Table 3.1. The Approximate Values of the Expected Chromatic Number by Wood's Method and the Dynamic Graph Traversal Algorithm

|V|     q       α        β
20      0.25    3-5      3.11
20      0.50    5-8      5.11
20      0.75    8-10     7.89
50      0.25    6-8      5.43
50      0.50    10-13    9.75
50      0.75    16-20    15.73
100     0.25    10-13    8.78
100     0.50    18-22    16.52
100     0.75    28-33    27.63

In the table, $|V|$ and $q$ are the number of vertices and the probability that an edge exists between a pair of vertices, respectively, α is the range in which most approximate chromatic numbers produced by Wood's algorithm fall, and β is the approximate value of the expected chromatic number computed using our formula. As shown in the table, β stays at or below the low end of α for various $|V|$'s and $q$'s. As can be seen, the dynamic graph traversal algorithm produces better results. It should be noted that the comparison is based on the analytical results of the dynamic algorithm

PAGE 48

and the simulation results of Wood's method. Since the domains of samples are not exactly the same, the difference between α and β may not be as large as it is shown in Table 3.1. The comparison based on simulation results will be included in the future work.
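The analytical values in Section 3.2 can be recomputed with a short Python sketch of Theorems 3.2.4 and 3.2.5 and of Steps 1 and 2. The function names are hypothetical, and the sums are evaluated in ordinary floating point, so the last printed digit may differ slightly from the rounded figures quoted in the text.

```python
from math import comb


def expected_mis_counts(n, q):
    """Return (E1, E2) for the random graph model M(n, q).

    E1: expected number of maximal independent sets containing a fixed vertex.
    E2: expected number of maximal independent sets.
    """
    p = 1.0 - q                            # probability that an edge is absent
    e1 = e2 = 0.0
    for d in range(1, n + 1):
        # d chosen vertices are mutually non-adjacent, and none of the
        # other n - d vertices can be added to them
        common = p ** (d * (d - 1) // 2) * (1.0 - p ** d) ** (n - d)
        e1 += comb(n - 1, d - 1) * common
        e2 += comb(n, d) * common
    return e1, e2


def esmis(n, q):
    """Expected size of maximal independent sets (Step 1)."""
    e1, e2 = expected_mis_counts(n, q)
    return n * e1 / e2


def aecn(n, q):
    """Approximate expected chromatic number (Step 2)."""
    return n / esmis(n, q)


e1, e2 = expected_mis_counts(50, 0.5)
print(round(e1, 2), round(e2, 2), round(aecn(50, 0.5), 2))
```

Because ESMIS equals $|V| E_1 / E_2$, the ratio $|V| / \mathrm{ESMIS}$ reduces to $E_2 / E_1$, which gives a quick cross-check of the β column of Table 3.1.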

PAGE 49

CHAPTER 4
DPBN AND GRAPH COLORING ALGORITHM

One way to increase bus capacity is to physically partition the bus, allowing parallel communications in the fragmented segments. Figure 4.1 shows the architecture of a partitionable bus [3,54] which consists of a number of stations connected to a shared communication network in a multipoint configuration. A number of switches are used to physically partition the bus into clusters, each of which contains a number of adjacent stations. For example, by closing all switches except the kth switch, two clusters are formed; one contains stations 1 to k and the other contains stations k+1 to n, where n is the total number of stations in the network. Communications among stations in one cluster can be carried out in parallel with the communications in another cluster. The control of the switches can be either centralized or distributed. In a centralized control, the control computer sets the switches for partitioning or connecting the bus. In a decentralized control, an individual station turns the switch on and off when receiving a command from the control computer.
The above idea of using a dynamically partitionable bus will be useful if there is an efficient algorithm to distinguish conflicting and non-conflicting communication requests so that the network can be properly partitioned to allow a maximum of non-conflicting requests to be processed in parallel. In the context of a partitionable bus network, two communication requests are said to be in conflict if the range of adjacent stations defined by the sender and the receiver of one request overlaps with the range defined by the sender and the receiver of the other request. For example, if station 1 wants to send a message to station

PAGE 50

Figure 4.1. The Architecture of a Partitionable Bus Network (a control computer and Stations 1 through n on a bus divided by switches)

PAGE 51

4 and station 3 wants to send a message to station 5 at the same time, these two requests would be in conflict since the range 1 to 4 overlaps with the range 3 to 5 and the bus cannot be partitioned to allow simultaneous transmissions of these requests. Graph traversal algorithms can be applied to efficiently distinguish conflicting and non-conflicting requests.
This chapter is organized as follows: Section 4.1 describes the application of the dynamic graph traversal algorithm. Section 4.2 delineates the bus partitioning technique. Section 4.3 presents the hardware design of the dynamic graph traversal algorithm. Section 4.4 gives the results of a performance evaluation.

4.1 A Graph Traversal Algorithm and Its Application

In Section 3.1, three graph traversal algorithms (static, dynamic, and heuristic) were presented. Each traversal of the algorithm results in some colored vertices and "thrown-away" vertices. Through repeated traversals of the "thrown-away" vertices, all vertices are assigned colors. The dynamic graph traversal algorithm has several advantages over the static traversal algorithm and is suitable for real-time applications. The application of the dynamic algorithm in a computer system is as follows. If each vertex of the graph (Figure 3.2) represents an event of a computer system, the events corresponding to vertices B, D, and F can be processed in parallel, as can the events corresponding to vertices A and E. The corresponding event of vertex C, which is thrown away in the graph traversal, will join the next group of processes to compete for a shared system resource. As a result of separating conflicting and non-conflicting events, the system performance can be enhanced. If the vertices in Figure 3.2 represent communication requests of a single shared bus, the system throughput can almost be doubled. In the

PAGE 52

case of a very busy network, the increase in system throughput will be even higher. The traversal time of this algorithm is negligible since it is a very simple algorithm and the time can most probably be overlapped with the communication processes.

4.2 Dynamic Bus Partitioning Technique

This section shows how the graph coloring algorithm can be applied in a dynamically partitionable bus network. The following steps are required to construct a graph for a partitionable bus network.
1. Represent each communication request from one station to another by a vertex in the graph.
2. Determine the conflicts among the communication requests. If there is a conflict between two communication requests as defined in the beginning of this chapter, draw an edge between the corresponding vertices.
3. Apply the graph traversal algorithm to the constructed graph. (Note: Some of the vertices may be thrown away as described in the graph traversal algorithm.)
4. At time t1, all the vertices (representing communication requests) with color 1 are allowed to perform communication in parallel in different clusters of stations which are formed by physically partitioning the bus. Then, at time t2, all the vertices with color 2 are allowed to perform communication in parallel by repartitioning the bus. The vertices which have been thrown away at this traversal will join the next group of communication requests.
Tables 4.1 and 4.2 and Figures 4.2 and 4.3 illustrate the steps of the bus partitioning technique. Five communication requests are collected in a communication request table (Table 4.1). Each request is represented by a vertex in the graph; for

PAGE 53

example, the communication request from station 1 to station 2 is denoted by vertex A. Table 4.2 shows the conflicts among the communication requests. A "C" in a table entry indicates that there is a conflict between the request on the row and the request on the column. The graph representation of the communication requests shown in Figure 4.2 is constructed according to the conflicting events shown in Table 4.2. In Figure 4.2, communication requests are represented by vertices (A, B, C, D, and E) and edges are drawn to indicate their conflicts. Figure 4.3 shows the result of the graph traversal.

Table 4.1. A Graph Representation of a Group of Communication Requests

From station    To station    Graph representation (vertex)
1               2             A
2               4             B
3               4             C
4               5             D
5               6             E

Table 4.2. The Table of Conflicts among Communication Requests

        1-2    2-4    3-4    4-5    5-6
1-2            C
2-4     C             C      C
3-4            C             C
4-5            C      C             C
5-6                          C

In this example, vertices A and C are assigned color 1, vertices B and E are assigned color 2, and vertex D is thrown away. At time t1, the communication request from station 1 to station 2 and the communication request from station 3 to station 4 can be done in parallel by opening the switch between station 2 and

PAGE 54

Figure 4.2. A Graph Representation of Conflicts and Requests (vertices 1-2, 2-4, 3-4, 4-5, and 5-6)

PAGE 55

Figure 4.3. The Graph Traversal of the Constructed Graph
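The construction of the conflict graph from Table 4.1 can be sketched in Python as follows. The names `in_conflict` and `switches_to_open` are hypothetical, as is the convention that switch k sits between station k and station k+1; the switch rule shown is one simple choice that reproduces the example in the text and is not taken from the dissertation.

```python
from itertools import combinations


def in_conflict(r1, r2):
    """Two requests conflict when their station ranges overlap.

    Sharing a single station counts as an overlap, which matches the
    strict inequalities used in the conflict definition.
    """
    lo1, hi1 = sorted(r1)
    lo2, hi2 = sorted(r2)
    return not (hi1 < lo2 or hi2 < lo1)


# Table 4.1: vertex -> (from station, to station)
requests = {"A": (1, 2), "B": (2, 4), "C": (3, 4), "D": (4, 5), "E": (5, 6)}

# Table 4.2 as a set of edges of the conflict graph
edges = {(u, v) for u, v in combinations(sorted(requests), 2)
         if in_conflict(requests[u], requests[v])}


def switches_to_open(group):
    """Open the switch just after each request's range, except the last.

    Assumes the group is conflict-free and that switch k separates
    station k from station k+1; all other switches stay closed.
    """
    spans = sorted(tuple(sorted(requests[name])) for name in group)
    return [hi for lo, hi in spans[:-1]]


print(sorted(edges))
print(switches_to_open(["A", "C"]), switches_to_open(["B", "E"]))
```

For the group {A, C} the sketch opens only the switch between stations 2 and 3, as in the example; for {B, E} it opens the switch between stations 4 and 5.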

PAGE 56

station 3 and closing all other switches. At time t2, the communication request from station 2 to station 4 and the communication request from station 5 to station 6 can be done in parallel by resetting the switches properly. The communication request from station 4 to station 5, which is represented by vertex D, will join the next group of communication requests.
In a system that uses a dynamically partitionable bus, the bus can be properly partitioned to allow parallel communication to take place at different times. Thus, system throughput can be dramatically increased by using the graph traversal algorithm in conjunction with the partitionable network. The example given above demonstrates our approach and algorithm. The results of a simulation study of DPBN will be given in Section 4.4.
There are several notable features in our approach to solving contention problems in a computer system.
1. Flexible Representation: Vertices can be used to represent any type of requests for any resource in a computer system. Any resource contention problem can be transformed into a graph coloring problem. For instance, in database management, vertices can denote database operations (JOIN, SELECT, and PROJECT), and edges can represent memory access conflicts among database operations. In multiple bus networks, partitionable networks, and multistage interconnection networks, vertices and edges can represent communication requests and conflicts in using communication network(s), respectively. This flexibility in problem representation allows for a broad range of applications of the presented algorithm. As long as a system has multiple processes competing for limited resources, the proposed approach can be applied.
2. Fast Running Time: It is obvious that the complexity of the graph traversal algorithm is $O(E)$, where $E$ is the number of edges of the graph, since each


edge is processed only once. The required memory space for traversing the graph is small, and the traversal can overlap with packet communication and data processing to reduce the overhead involved (i.e., while processes are being handled in one time frame, the algorithm traverses the graph for the next group of requests).

3. Multiple Ways of Interpreting the Colors Assigned to the Vertices: Depending on the application, the colors assigned to the vertices of the graph can be interpreted in a variety of ways. For instance, the colors can be interpreted as time frames in partitionable bus networks and database query processing. The requests whose vertices are assigned different colors must then be performed in different time frames. In multiple bus networks, however, the colors can be interpreted as buses. The requests whose vertices are assigned different colors can then be carried out on different buses in the same time frame.

4.3 Hardware Design of a Graph Traversal Unit

In a real-time environment, many requests for a resource arrive at the same time and a very fast scheduling decision needs to be made. The proposed algorithm may not meet the time requirement if it is implemented in software. In that case, a hardware implementation of the algorithm would be needed. In this section, a design of the graph traversal unit is presented.

4.3.1 Architecture of the Graph Traversal Unit

The graph traversal unit is specialized hardware capable of distinguishing conflicting and non-conflicting communication requests by assigning either a "0" or a "1" to each communication request. It is connected to the control unit


of the control computer and can operate in parallel with the normal operation of the control computer, which off-loads communication requests to the graph traversal unit. As shown in Figure 4.1, the control computer and stations are suspended off the partitionable bus. Figure 4.4 shows the detailed organization of the control computer, which consists of the usual components of a general-purpose computer (disks, memory, DMA, and a control unit) and, additionally, a graph traversal unit. In general, the system works as follows:

1. The control computer collects communication requests from the stations and produces an adjacency matrix such as the one illustrated in Section 4.2. A communication request is composed of two numbers, the tag of the sender and the tag of the receiver. For each pair of communication requests, the corresponding tags are used to determine if there is a conflict. A conflict occurs if the range of consecutive stations represented by a pair of numbers overlaps with the range represented by another pair, as shown in the following procedure.

   Procedure For Determining Conflict;
   (* Let the tags for a pair of communication requests be (T1a, T1b) and (T2a, T2b), respectively. Each pair of numbers represents a range of consecutive stations. *)
   begin
      if (T1a < T2a and T1a < T2b and T1b < T2a and T1b < T2b) then
         return no conflict
      elseif (T1a > T2a and T1a > T2b and T1b > T2a and T1b > T2b) then
         return no conflict
      else
         return conflict
   end;

2. The control computer maps the adjacency matrix to the RAM of the graph traversal unit and activates the unit.

3. The graph traversal unit, driven by an internal controller, returns the results of graph traversal to the control computer, which sets switches on/off to partition the bus accordingly to allow messages to go through in parallel.

It should be noted that the determination of conflicts among communication requests can be done by either the CPU of the control computer or the graph traversal unit. If the graph traversal unit determines the conflicts, an additional logic unit would be required.

4.3.2 Input and Output of the Graph Traversal Unit

The input to the graph traversal unit is the adjacency matrix. The adjacency matrix is held in a RAM of size 128 x 128, which is essentially a two-dimensional array with each element indicating the adjacency of the vertex on the row and the vertex on the column. A conflict between any two communication requests is indicated by placing a 1 in the corresponding position in the adjacency matrix. Depending on the number of communication requests being processed by the graph traversal unit, the size of the RAM can be changed to suit the system's needs. The number 128 was chosen arbitrarily for illustration purposes. It is a system design parameter.

The results of a graph traversal are stored in two 128 x 1 registers (the status matrix), which use two bits (one bit from each register) to show the status of each vertex: colored with the first color, colored with the second color, or thrown away for the next traversal. In addition to the status matrix, a status code is used to indicate the


Figure 4.4. The Detailed Organization of the Control Computer of the Partitionable Bus Network (DPBN). Components shown: internal bus, interface to the DPBN, DMA, disks, GTU, CU, and memory. Note: GTU stands for graph traversal unit.


status of the graph traversal unit. When the results of traversal are ready, a signal is sent to the control computer. The control computer, in turn, may issue one of three signals to specify the external function to be executed by the graph traversal unit. The external functions are as follows:

1. Clear: Before the adjacency matrix is loaded, the RAM is cleared.

2. Load: The control computer loads the adjacency matrix into the RAM of the graph traversal unit.

3. Go: The graph traversal unit is activated. The controller of the graph traversal unit starts executing micro-instructions and produces results.

The external function Go is composed of a sequence of logical operations. The following example illustrates the traversal of the graph in Figure 4.2 by the GTU. To simplify the illustration, the size of the RAM is reduced to 8 x 8 and the size of the registers is reduced to 8 bits.

1. The graph traversal unit is enabled.

2. The RAM is loaded with the adjacency matrix. As shown in Table 4.3, each address of the RAM (addresses 000 through 111) represents a communication request. For instance, the address 000 represents a communication request from station 1 to station 2 (vertex A), and the vertices adjacent to vertex A are those whose bits are 1 in the row of address 000. Vertex B is the only vertex adjacent to vertex A (address 000), since the second bit of row 000 is 1 and all other bits are 0's. It should be noted that only the upper triangle of the RAM is filled with 1's if the corresponding communication requests are in conflict. For example, the


conflict between vertex A (address 000) and vertex B (address 001) is indicated by putting a 1 in the entry (1, 2) instead of in the entry (2, 1).

Table 4.3. The Adjacency Matrix Stored in the RAM

   Address           Bit:  1 2 3 4 5 6 7 8
   000 (vertex A)          0 1 0 0 0 0 0 0
   001 (vertex B)          0 0 1 1 0 0 0 0
   010 (vertex C)          0 0 0 1 0 0 0 0
   011 (vertex D)          0 0 0 0 1 0 0 0
   100 (vertex E)          0 0 0 0 0 0 0 0
   101 (unused)            0 0 0 0 0 0 0 0
   110 (unused)            0 0 0 0 0 0 0 0
   111 (unused)            0 0 0 0 0 0 0 0

3. There are eight working registers (W0, W1, ..., W7) used to support the hardware graph traversal algorithm. They are initialized with the values shown in Table 4.4. Each bit of a working register represents the status of the corresponding vertex. For instance, the first bit of a working register represents the status of vertex A, the second bit represents the status of vertex B, etc.

Table 4.4. The Initial Values of the Registers (W0, W1, W2, ..., W7)

   W0   0 0 0 0 0 0 0 0
   W1   0 0 0 0 0 0 0 0
   W2   0 0 0 0 0 0 0 0
   W3   1 0 0 0 0 0 0 0
   W4   0 0 0 0 0 0 0 0
   W5   1 1 1 1 1 1 1 1
   W6   1 0 0 0 0 0 0 0
   W7   0 0 0 0 0 0 0 0

The purposes of these working registers and their relationship to the dynamic graph traversal algorithm are explained below:


In the dynamic algorithm, the starting vertex is randomly chosen. However, the starting vertex must not be an isolated vertex. W5 is used for searching for a starting vertex. This is done by searching the RAM row by row. The corresponding vertex of a row that is not null can be designated as the starting vertex. In this example, vertex A (address 000) is chosen as the starting vertex.

Once the starting vertex (front vertex) is chosen, the vertices adjacent to the front vertices need to be explored. W0 stores the adjacent vertices of the front vertex. The register W0 is an 8-bit register. Each bit indicates the status (adjacent or non-adjacent) of the corresponding vertex with respect to the front vertex. For instance, when the row of address 001 of the RAM is loaded into W0, the third and fourth bits of W0 are 1's. This means that vertex C (represented by the third bit) and vertex D (represented by the fourth bit) are adjacent to vertex B (RAM address 001).

In the process of traversal, the adjacent vertices need to be assigned the current count. W1 stores the current count that is being assigned to the vertices. The bits of W1 are either all 1's or all 0's. After the expansion of the front vertices, the current count is replaced with its bitwise (1's) complement. In other words, the count alternates between 1 and 0.

To record the adjacent vertices assigned the current count (either 0 or 1), W2 and W3 are used. W2 stores the current count of the adjacent vertices. In each expansion of a front vertex, the adjacent vertices of the front vertex are assigned the current count stored in W1. If the current count stored in W1 is 1, the contents of W0 are copied to W2. This operation assigns a count of 1 to the bits of W2 where the corresponding bits in W0 are 1. If the current count stored in W1 is 0, no count needs to be assigned, since W2 is initialized to zero.
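The register operations described so far can be traced in software. The following is a minimal sketch, assuming the 8-bit example and the adjacency rows that follow from the conflicts among the requests of Figure 4.2. The names `RAM`, `find_start`, `assign_count`, and `complement` are illustrative and are not part of the hardware design. Registers are modeled as integers whose leftmost bit stands for vertex A.

```python
# Illustrative model only: 8-bit registers as Python integers,
# leftmost bit = vertex A. Names are hypothetical, not from the design.
N_BITS = 8
ALL_ONES = (1 << N_BITS) - 1

# Upper-triangular adjacency rows for vertices A..E (addresses 000..100),
# assumed from the worked example; addresses 101..111 are unused.
RAM = [
    0b01000000,  # 000, vertex A: adjacent to B
    0b00110000,  # 001, vertex B: adjacent to C and D
    0b00010000,  # 010, vertex C: adjacent to D
    0b00001000,  # 011, vertex D: adjacent to E
    0b00000000,  # 100, vertex E
    0, 0, 0,
]


def find_start(ram):
    """Role of W5: scan the RAM row by row, return the first non-null row."""
    for address, row in enumerate(ram):
        if row != 0:
            return address
    return None  # every vertex is isolated


def assign_count(w0, w1, w2):
    """Copy W0 into W2 when the current count in W1 is 1; else keep W2."""
    return w0 if w1 == ALL_ONES else w2


def complement(w1):
    """Flip the current count between all 0's and all 1's."""
    return w1 ^ ALL_ONES


start = find_start(RAM)        # address 000, vertex A
w0 = RAM[0b001]                # row of vertex B loaded into W0
w1 = ALL_ONES                  # current count is 1
w2 = assign_count(w0, w1, 0)   # C and D receive a count of 1
w1 = complement(w1)            # current count becomes 0
print(format(w0, "08b"), format(w2, "08b"), format(w1, "08b"))
```

Running the sketch prints `00110000 00110000 00000000`: the row of vertex B marks C and D as adjacent, both receive the current count of 1, and the count then flips to 0 for the next expansion.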


W3 stores the history of counts (accumulated counts) assigned to each vertex. It should be noted that one bit in W3 is enough for recording the history of the counts of a vertex, since multiple 1's are the same as a single 1 and multiple 0's are the same as a single 0 in evaluating whether a vertex should be thrown away or what color should be assigned to the corresponding vertex. The current count assigned to W2 is accumulated in W3 after each expansion of a front vertex. In other words, W3 contains all the counts which have been assigned to the vertices. This can be done by taking a logical "OR" operation between W2 and W3. The first bit of W3 in this example is initialized to 1, since vertex A is designated as the starting vertex and has been assigned a count of 1.

While the current count is being assigned to each adjacent vertex, each vertex is checked to see if the condition of "thrown-away" (i.e., a vertex that has at least one count of 1 and at least one count of 0) is met. The results of checking for "thrown-away" are stored in W4. The vertices whose bits have the value 1 in W4 are "thrown-away" vertices. The "thrown-away" vertices can be determined by taking a logical "AND" operation between W3 and W7 (to be given below).

After each of the set of front vertices has been expanded, the adjacent vertices of the current front vertices become a new set of front vertices. W6 stores the set of front vertices. The vertices whose bits have a value of 1 are the front vertices. In our example, the first bit of W6 is initialized to 1, since vertex A is the starting vertex (front vertex).

In order to distinguish between a vertex assigned a count of 0 and a new vertex (a vertex that has no count), W7 is used to keep track of the vertices that


have been assigned a count of 0. A vertex that has been assigned a count of 0 is identified by putting a 1 in the corresponding bit of W7. This can be achieved by taking a logical "OR" operation between W0 and W7. In this example, W7 is null, since no vertices have been assigned a count of 0.

4. The adjacent vertices of the starting vertex (front vertex) are explored by loading the row of address 000 into W0 (W0

7. Encode each mark bit of W6. The data stored in the encoded address of the RAM are read into W0 (W0

11. Since no vertex has been thrown away in the step above, a new set of front vertices is established by copying W0 to W6. At the same time, the current count is replaced with its bitwise complement (W6 <- W0 and W1 <- ~W1). The contents of W1 and W6 are now

   W1   0 0 0 0 0 0 0 0
   W6   0 0 1 1 0 0 0 0

12. The mark bits of 1 in W6 are encoded and the data read from the RAM are stored in W0 (W0

16. Encode the mark bits of 1's in W4 to clear the rows of the encoded addresses in the RAM. Since the "thrown-away" vertices are indicated by a 1 in the corresponding bit of W4, the corresponding rows of the "thrown-away" vertices in the RAM can be cleared by encoding the mark bits of 1's in W4.

17. At this step, the RAM is clear and the traversal is done. The counts of the vertices are stored in W3. The vertices whose bits are 1's are assigned the first color, and the vertices whose bits are 0's are assigned the second color. The vertices whose bits are 1's in W4 are thrown away. In this example, vertices A and C are assigned the first color, vertices B and E are assigned the second color, and vertex D is thrown away. The contents of W3 and W4 are now

   W3   1 0 1 1 0 0 0 0
   W4   0 0 0 1 0 0 0 0

In summary, the traversal of a graph consists of a number of repeated cycles. A cycle starts with encoding the front vertices for retrieving the data stored in the RAM, followed by assigning the current count to the adjacent vertices, accumulating the counts, and checking for thrown-away vertices. The operations are simple and can be done in a very small amount of time.

4.3.3 Organization of the Graph Traversal Unit

The graph traversal unit is a microprogrammed subsystem equipped with a local bus. The details of the graph traversal unit are shown in Figure 4.5. It consists of the following components:


Figure 4.5. The Organization of the Graph Traversal Unit. Components shown: GTU controller, RAM (128 x 128), decoder, working registers W0 through W7, and TEMP register.


1. Data memory and MAR: The data memory is a RAM of size 128 x 128 used to store the adjacency matrix of communication requests. The MAR specifies the memory address for storing data to, or retrieving data from, the data memory.

2. Controller: It contains the micro-instructions for controlling the operations of a graph traversal.

3. Working registers (W0, W1, ..., W7): The traversal of vertices involves exploring adjacent vertices, keeping track of front vertices, and checking for "thrown-away" vertices. The working registers are designed for marking and keeping track of the status of each vertex. A register is composed of 128 mark bits, where each mark bit indicates the count or status of the corresponding vertex (communication request).

4. Logic unit and accumulator: This is where the logic operations are performed. The logic unit performs operations on two operands: one stored in the accumulator and the other stored in one of the working registers.

5. Priority encoder and temporary register: The priority encoder and address decoder are adopted from the work reported in [30,46]. Since the working registers that need to be encoded contain mostly 0's, it is very slow to check bit by bit for 1's. The priority encoder is used to encode the mark bits of registers and generate addresses for retrieving the data stored in the RAM. It works as follows. The priority encoder generates the address of the highest priority mark bit of 1. The generated address is loaded from the address bus to the RAM address register. The address is used to read from the RAM and to clear the mark bit. For example, the working register W6 with the contents 01011000 (bits numbered from 0 at the left) indicates that vertices 1, 3, and 4 are the front


vertices. Priorities are assigned to these bits from left to right. Starting with the highest priority mark bit of 1 (bit 1), the address 00000001 is generated and the data stored at address 00000001 of the RAM is retrieved. After the retrieval, mark bit 1 is reset to allow the next highest priority bit to be encoded. The same procedure holds for mark bits 3 and 4. Another purpose of the priority encoder is to generate the address of the row in the RAM that needs to be cleared. For instance, for each of the "thrown-away" vertices, the address of the corresponding row in the RAM is generated by encoding the corresponding bit in W4. Once the address is generated, the temporary register, which is initialized to zero, is used to clear the corresponding row in the RAM.

6. Address decoder: The main purpose of the address decoder is to decode the address generated by the priority encoder for resetting the mark bits. In the middle of the traversal, each of the front vertices has to be explored for adjacent vertices. After a front vertex has been explored, the address of that front vertex has to be decoded in order to reset the corresponding mark bit and allow the next mark bit of 1 to be encoded. In the example above, after the data stored in 00000001 has been retrieved, the address 00000001 has to be decoded by the address decoder to reset the mark bit and to allow the next mark bit of 1 to be encoded.

4.4 Performance Evaluation of the Partitionable Bus Networks

In this section, the performance of the dynamically partitionable bus network (DPBN) is compared with an "ideal" local area network using simulation. In addition, an aging analysis to avoid indefinite communication delay is discussed.


4.4.1 Simulation Study of DPBN

We will compare the performance of the dynamically partitionable bus network to an ideal network. An ideal local area network has a communication channel which has a 100 percent utilization rate and is free from any request contention. In other words, it has the theoretically optimal performance of any one-communication-at-a-time bus network. The model of the dynamically partitionable bus network used in the simulation is described below:

1. N stations are connected to a shared communication channel in a multipoint configuration. The probability that a station will issue a communication request at any instant of time is p.

2. The probability that a station will issue two communication requests within a period that is very short compared with the time interval of a communication is very small.

3. All communication requests take the same amount of time.

4. Communication requests are independent of each other.

In general, this model is characterized by two parameters: the number of stations connected to the bus (N) and the probability that a station will issue a communication request (p).

A simulation program that models the partitionable bus networks was implemented on a VAX 8600. The following steps describe the program:

1. For each station, if the number generated by the random number generator is less than p, a request is established. The destination of the request is determined by dividing the range [0,1) into N (the number of stations attached to


the bus) half-open, equal intervals, with each interval representing a station. The station whose interval contains the random number is the destination of the request.

2. An adjacency matrix is established by determining the conflicts among communication requests according to the procedure given in Section 4.1.

3. The graph traversal algorithm is applied to the adjacency matrix.

4. The number of colors assigned to vertices is the communication delay in units.

The example below shows how the communication requests are generated and processed by the program. A number of requests, as shown in Table 4.5, are established by a random number generator.

Table 4.5. The Table of Communication Requests

   Request   Sending Station   Receiving Station
      1            20                  4
      2             5                 10
      3             1                  3
      4             8                  5
      5             9                  4
      6             7                 13
      7             3                 18
      8            12                  8
      9            15                  4
     10            16                 18

The adjacency matrix shown in Table 4.6 is constructed according to the procedure in Section 4.3.1. A 1 in a cell indicates a conflict between the request on the row and the request on the column. For instance, the cell (4, 1) is 1 because request 4 (station 8 sends messages to station 5) is in conflict with request 1 (station 20 sends messages to station 4).
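Steps 1 and 2 of the program can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the original VAX program: the function names are hypothetical, the destination is drawn with a second random number (the text does not say whether the same number is reused), and a station is not prevented from selecting itself. The conflict test treats each request as the range of stations between sender and receiver, which is equivalent to the procedure for determining conflict. The sketch is not claimed to reproduce Table 4.6 entry by entry.

```python
import random


def conflict(request_a, request_b):
    """Two requests conflict unless one station range lies wholly below the other."""
    low_a, high_a = sorted(request_a)
    low_b, high_b = sorted(request_b)
    return not (high_a < low_b or high_b < low_a)


def generate_requests(n_stations, p, rng):
    """Step 1: each station issues a request with probability p.

    The receiver is the station whose interval of [0, 1) contains a second
    random number (assumption: a separate draw; self-addressing not excluded).
    """
    requests = []
    for sender in range(1, n_stations + 1):
        if rng.random() < p:
            index = min(int(rng.random() * n_stations), n_stations - 1)
            requests.append((sender, index + 1))
    return requests


def adjacency_matrix(requests):
    """Step 2: symmetric 0/1 matrix with a 1 wherever two requests conflict."""
    k = len(requests)
    return [[1 if i != j and conflict(requests[i], requests[j]) else 0
             for j in range(k)] for i in range(k)]


# The ten (sender, receiver) pairs of Table 4.5.
TABLE_4_5 = [(20, 4), (5, 10), (1, 3), (8, 5), (9, 4),
             (7, 13), (3, 18), (12, 8), (15, 4), (16, 18)]
matrix = adjacency_matrix(TABLE_4_5)
print(matrix[3][0])  # cell (4, 1): request 4 against request 1
```

The printed value is 1, in agreement with the cell (4, 1) discussed above: stations 5 through 8 lie inside the range of stations 4 through 20.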


Table 4.6. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.5 (before graph traversal)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

Table 4.7. The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.6

   Discarded Vertices: 4, 5, 6, 7, 8, 9


After the first graph traversal, the vertices shown in Table 4.7 are discarded. A new adjacency matrix of the vertices discarded in the first graph traversal is established, as shown in Table 4.8.

Table 4.8. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.7 (after the first graph traversal)

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

The vertices discarded in the second traversal and their adjacency matrix are shown in Tables 4.9 and 4.10, respectively.

Table 4.9. The Discarded Vertices of the Graph Traversal on the Adjacency Matrix in Table 4.8

   Discarded Vertices: 3, 4, 5, 6

Table 4.10. The Adjacency Matrix of the Vertices in a Graph Corresponding to the Communication Requests in Table 4.9 (after the second graph traversal)

1 1 1 1 1 1

In the third traversal, only vertex 4 is thrown away. Each of the three traversals uses two colors and the one remaining vertex requires one more, so the total number of colors used is seven and the communication delay is seven units.


Table 4.11 shows the expected communication delays and standard deviations in units of time frames for N = 25, 50, 75, 100 and p = 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9. A time frame is the duration of each communication request. The numbers in parentheses are the communication delays for an ideal network. The expected communication delays for the DPBN are obtained by taking the average of 20 simulation runs. The communication delay for each run is calculated by polling each station for a request, constructing a conflict graph, and traversing the constructed graph. The expected communication delay for the ideal network is obtained by multiplying N by p. As shown in the table, the DPBN has a much shorter communication delay.

Table 4.11. The Expected Communication Delays and Standard Deviations in Units for a DPBN and an Ideal Bus Network (in parentheses)

   D1/σ (D2)   N = 25               N = 50               N = 75               N = 100
   p = 0.2      3.10/1.18 (5.0)      6.50/2.85 (10.0)     9.30/2.49 (15.0)    11.40/2.93 (20.0)
   p = 0.3      5.00/1.34 (7.5)     10.20/2.17 (15.0)    13.30/2.34 (22.5)    17.00/3.09 (30.0)
   p = 0.4      6.80/1.77 (10.0)    12.90/2.60 (20.0)    17.40/3.08 (30.0)    23.60/4.34 (40.0)
   p = 0.5      8.80/2.04 (12.5)    14.75/2.19 (25.0)    22.80/2.89 (37.5)    29.20/4.31 (50.0)
   p = 0.6      9.60/2.03 (15.0)    18.75/2.81 (30.0)    25.40/2.20 (45.0)    35.90/4.39 (60.0)
   p = 0.7     11.20/1.82 (17.5)    19.20/3.73 (35.0)    30.20/3.43 (52.5)    38.80/2.94 (70.0)
   p = 0.8     12.35/2.57 (20.0)    22.80/2.63 (40.0)    32.10/2.72 (60.0)    40.57/5.14 (80.0)
   p = 0.9     13.60/1.41 (22.5)    26.40/3.63 (45.0)    37.80/2.92 (67.5)    51.00/5.24 (90.0)

In Table 4.11, D1 and D2 are the expected communication delays for a DPBN and an ideal network, respectively, σ is the standard deviation of the expected communication delays for a DPBN, N is the number of stations attached to the network, and p is the probability that a station may issue a communication request. Table 4.12 gives the improvement ratio for parameters N and p. It was computed by the formula below:


   δ = (D2 - D1) / D1

where δ is the improvement ratio. The improvement ratio measures the decrease in network delay of a DPBN as compared to an ideal network, expressed as a fraction of the DPBN delay. With a little extra hardware, the DPBN achieves an improvement ratio of around 0.7 (that is, the delay of an ideal network is roughly 70 percent longer than that of the DPBN), depending on the number of stations attached to the network and the frequency of requests.

Table 4.12. The Improvement Ratio of a DPBN over an Ideal Network

   δ          N = 25   N = 50   N = 75   N = 100
   p = 0.2     0.61     0.54     0.61     0.75
   p = 0.3     0.50     0.47     0.69     0.76
   p = 0.4     0.47     0.55     0.72     0.70
   p = 0.5     0.42     0.69     0.65     0.71
   p = 0.6     0.56     0.60     0.77     0.67
   p = 0.7     0.56     0.83     0.74     0.80
   p = 0.8     0.62     0.75     0.87     0.97
   p = 0.9     0.65     0.70     0.78     0.76

The results in Tables 4.11 and 4.12 are obtained based on the assumption that new vertices (communication requests) are not allowed to join a graph traversal until all the old vertices (communication requests) have been assigned colors and processed. As each traversal continues, the size of the graph shrinks and so does the number of communication processes allowed on the bus. It should be noted that there is no obvious functional relationship between the improvement ratio and p in Table 4.12. Future work will include the study of the relationship between the improvement ratio and the parameters of the network (N and p).
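As a check on the definition, the improvement ratio can be recomputed from the delays in Table 4.11. The sketch below assumes the form δ = (D2 - D1) / D1, which is the form that reproduces the entries of Table 4.12, and takes the ideal delay as N times p. The function and variable names are illustrative.

```python
def improvement_ratio(d1, d2):
    """delta = (D2 - D1) / D1, with D1 the DPBN delay and D2 the ideal delay."""
    return (d2 - d1) / d1


P = 0.2
# Expected DPBN delays D1 for p = 0.2, copied from Table 4.11, keyed by N.
DPBN_DELAY = {25: 3.10, 50: 6.50, 75: 9.30, 100: 11.40}

ratios = {}
for n_stations, d1 in DPBN_DELAY.items():
    d2 = n_stations * P  # ideal network: one request per time frame
    ratios[n_stations] = round(improvement_ratio(d1, d2), 2)
    print(n_stations, ratios[n_stations])
```

The four printed ratios are 0.61, 0.54, 0.61, and 0.75, which are the entries of the p = 0.2 row of Table 4.12.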


Table 4.13 shows the improvement ratio for the case when new vertices are allowed to join the "thrown-away" vertices after each traversal. The improvement ratio increases significantly in comparison with that shown in Table 4.12 because the size of the graph is not reduced after each traversal. In this simulation, the number of new communication requests stays the same during the traversals of a simulation run. In other words, the number of requests that are allowed to proceed in a traversal is the same as the number of newly added requests. When the number of communication requests allowed to go through reaches the number of initially generated requests, the simulation run stops. However, this strategy does not reduce the network delay of an ideal network, since the number of new requests does not affect the performance of an ideal network.

One problem resulting from this improvement is starvation: some communication requests may be withheld indefinitely as the new vertices join the "thrown-away" vertices, causing the old vertices to be discarded repeatedly. In the next subsection, we shall propose a strategy to overcome this shortcoming.

Table 4.13. The Improvement Ratio of a DPBN over an Ideal Network When New Requests are Allowed to Join Traversals

   δ          N = 25   N = 50
   p = 0.2     0.43     0.70
   p = 0.3     0.57     1.00
   p = 0.4     0.71     1.55
   p = 0.5     0.75     1.65
   p = 0.6     1.20     1.90
   p = 0.7     1.20     1.90
   p = 0.8     1.25     2.20
   p = 0.9     1.45     2.25


4.4.2 A Strategy for Solving the Aging Problem in DPBN

To allow the communication requests that have been held up for the longest period of time to have the highest priority in the next processing of communication requests, we propose the following strategy:

1. Maintain a priority queue that keeps track of the priorities of communication requests. Before any communication request starts transmitting messages, apply the graph traversal algorithm repeatedly to the thrown-away vertices and label the vertices with a time stamp. A time stamp is composed of the number of the traversal and the assigned color. The priority queue is constructed in accordance with the time stamp. A typical priority queue looks like this: 1A, 1B, 2A, 2B, 3A, ..., where 1A is the group of vertices that are assigned the first color at the very first traversal and 1B is the group of vertices that are assigned the second color at the very first traversal, etc.

2. Each time the graph traversal algorithm is executed, a group of vertices, such as the vertices with priority 1A, is chosen to join the new communication requests, and all the vertices with priority 1A are taken as the starting vertices of the graph traversal. Since none of the vertices with priority 1A are adjacent to each other and all are starting vertices, they are guaranteed to get a color in this graph traversal. At this point, the communication requests corresponding to the vertices in 1A and some new communication requests are allowed to go through. In the next traversal, the vertices with priority 1B will join the new requests and all the vertices in 1B become starting vertices. This procedure is continued until the priority queue is empty and a new queue is formed.

Figure 4.6 shows an example of this strategy. First, apply the graph traversal algorithm to the graph. Vertices A and C, colored with the first color, are in group


1A, vertices B and D, colored with the second color, are in group 1B, and vertex E, which was thrown away in the first traversal, is in group 2A. Second, combine a group of vertices from the priority queue, in the order of priority, with new communication requests for a graph traversal. This plan not only increases system throughput but also solves the problem of starvation.

4.5 Summary

In this chapter, an application of the graph traversal algorithm to the dynamically partitionable bus network (DPBN) is presented. In a dynamically partitionable bus network, a bus is partitioned into a number of subnetworks, and multiple communication processes are carried out at any moment. The proposed approach resolves conflicts by assigning colors to the graph where vertices represent communication requests and edges denote the conflicts among the communication requests. The communication requests corresponding to the vertices assigned the same color are carried out in parallel within subnetworks. In addition, a hardware design of the graph traversal algorithm to speed up the identification and scheduling of non-conflicting communication requests is described. The performance evaluation of the partitionable bus network shows a significant increase of network throughput as compared with an ideal non-partitionable network in which no communication contention is assumed.


Figure 4.6. The Graph Traversal for the Strategy of Handling Aging
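The time-stamped priority queue of Section 4.4.2 can be sketched as follows. This is an illustrative model only: `build_priority_queue` is a hypothetical helper, and the traversal results are supplied by hand from the example of Figure 4.6 rather than computed by the graph traversal algorithm.

```python
from collections import deque


def build_priority_queue(traversals):
    """Label each color group with a time stamp: traversal number plus A or B.

    `traversals` is a list of (first_color, second_color) vertex sets, one
    pair per repeated traversal of the thrown-away vertices.
    """
    queue = deque()
    for number, (first_color, second_color) in enumerate(traversals, start=1):
        if first_color:
            queue.append((f"{number}A", sorted(first_color)))
        if second_color:
            queue.append((f"{number}B", sorted(second_color)))
    return queue


# Example of Figure 4.6: A and C get the first color, B and D the second,
# and E is thrown away and colored in the second traversal.
traversals = [({"A", "C"}, {"B", "D"}), ({"E"}, set())]
queue = build_priority_queue(traversals)

# The group at the head of the queue supplies the starting vertices of the
# next traversal, together with whatever new requests have arrived.
label, starting_vertices = queue.popleft()
print(label, starting_vertices)
```

The sketch prints `1A ['A', 'C']`; groups 1B and 2A remain queued for the following traversals, so no request waits longer than the length of the queue.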


CHAPTER 5
DPBN AND COIN-CHANGING ALGORITHMS

One of the characteristics of a distributed system is that processors compete for resources and processes are performed in parallel. Because the elapsed times of these processes vary, system resources are often idle as these processes are completed and resources are released at different times. For instance, in a partitionable bus network, requests that correspond to the vertices that have been assigned the same color proceed in parallel. When a communication is over, the subnetwork assigned to that communication process will be idle and has to wait for the other parallel communication processes to complete before it can be reused in the next round of graph traversal and assignment.

For instance, suppose three communication requests are selected by the graph traversal algorithm for parallel processing from a pool of communication requests (q1, q2, q3, ..., qn) as shown in Table 5.1. The durations for q1, q3, and q4 are 23, 189, and 288 time units, respectively. The process q1 will finish first, followed by q3 and q4. Since the requests do not finish at the same time, the partitioned subnetworks will not be fully utilized. The utilization rate for this example is

   (23 + 189 + 288) / (288 * 3) = 0.579

The percentage of bus idling is approximately 42. If the lengths of communication requests are uniformly distributed between zero and the maximum length, the waste of bandwidth due to bus idling can reach 50 percent, since the expected length of communication requests is half of the maximum length. The amount of idling


time can be reduced if the message of each request is divided into a number of small time frames. Each communication request is allowed to transmit messages for the duration of a time frame, and the scheduling of message transfer is based on these time frames.

Table 5.1. An Example of Bus Idling

   Communication Request   Sending Station   Receiving Station   Duration of Request
   q1                             1                  3                    23
   q2                             2                  4                    50
   q3                             4                  5                   189
   q4                             6                  7                   288
   q5                             1                  4                   211
   ...                           ...                ...                  ...
   qn                             1                  5                   100

There are two ways to divide a message into time frames. One way is to use fixed-size time frames. Using this method, a message is divided into a number of fixed-size frames. For example, if each time frame is 23 units, q1, q3, and q4 in this example can be scheduled for parallel processing with three subnetworks during the first time frame, q3 and q4 can be scheduled for parallel processing with two subnetworks during the next 8 time frames, and q4 can be completed in the next 4 time frames. Since subnetworks can be released as soon as communication requests are completed, the utilization of the communication channel can be increased. However, if a large frame size is used, some idle time would still remain. For example, in the ninth time frame in this example, the subnetwork assigned to q3 would be idle for 18 time units, since q3 has only 5 time units in the last frame. If a small frame size is used, the overhead as a result of increased graph traversals and scheduling can be significant.
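The arithmetic of the fixed-size frame example can be verified with a short calculation. The sketch below is illustrative: the helper names are hypothetical, and the framed utilization assumes that a subnetwork is held only for the frames its request occupies and is released at the next frame boundary.

```python
def frames_needed(duration, frame_size):
    """Number of fixed-size frames required to carry a request."""
    return (duration + frame_size - 1) // frame_size


def idle_in_last_frame(duration, frame_size):
    """Unused time units in the final, partly filled frame."""
    return frames_needed(duration, frame_size) * frame_size - duration


DURATIONS = {"q1": 23, "q3": 189, "q4": 288}  # from Table 5.1
FRAME = 23

total = sum(DURATIONS.values())
# Without frames, all three subnetworks are held until q4 completes.
unframed = total / (max(DURATIONS.values()) * len(DURATIONS))
# With frames, each subnetwork is held only for the frames of its request.
reserved = sum(frames_needed(d, FRAME) * FRAME for d in DURATIONS.values())
framed = total / reserved

for name, duration in DURATIONS.items():
    print(name, frames_needed(duration, FRAME),
          idle_in_last_frame(duration, FRAME))
print(round(unframed, 3), round(framed, 3))
```

The sketch reports 1, 9, and 13 frames for q1, q3, and q4, with 18 idle units in the ninth frame of q3, and a utilization that rises from 0.579 to 0.945 under the stated release assumption.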


Another way to divide a message is to use time frames of variable sizes: a message is partitioned into a number of frames of different sizes. If the set of frame sizes is chosen properly, the number of time frames required to process the communication requests can be kept to a minimum, which reduces the number of graph traversals and the amount of process scheduling. If we select a number of time frames of different lengths and assign time frames to each individual communication process using the Greedy algorithm described in Chapter 2, such that the total length of the assigned time frames equals the elapsed time of the communication process, the number of time frames assigned would be minimized, and so would the number of graph traversals. The Greedy algorithm is simple and fast in comparison to other algorithms.

This problem of assigning the proper mix of time frames of variable sizes is the same as the coin-changing problem described in Section 2.4. The Greedy algorithm takes as many of the largest coin type (time frame) as possible, then as many of the second largest coin type as possible, and so on. However, it has two disadvantages. First, the Greedy algorithm does not always generate an optimal solution for a given total time T. Second, in some cases, the Greedy algorithm does not generate a solution even though one exists. Examples can be found in Section 2.4. Both disadvantages can be eliminated if the set of coin types (time frames) is chosen properly, for example, if the set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ is in the form of a geometric sequence. A geometric sequence is defined as follows:

Definition 5.0.1 Geometric Sequence: A sequence $W = (W_1, W_2, W_3, \ldots, W_n)$, where $W_i < W_{i+1}$, is geometric if $W_{i+1}/W_i = r$ and $r$ is a positive integer.

Since the problem of minimizing the number of coins is the same as the problem of minimizing time frames, the set of time frame sizes is chosen to be a geometric


sequence in our approach. There are two advantages to using a geometric sequence as the set of frame sizes. First, it guarantees that the Greedy algorithm always generates a solution if a solution exists, as proven in [7]. Second, as we shall show, the Greedy algorithm always produces an optimal solution if the set of frame sizes is a geometric sequence. The proof of the second point is given in the next section.

However, for a given distribution of communication lengths, there exist many possible geometric sequences. For instance, in a normal distribution of lengths of communication requests with a maximum length of 100 and a minimum length of 1, some useful geometric sequences are as follows:

W = (1, 2, 4, 8, 16, 32, 64)
W = (1, 3, 9, 27, 81)
W = (1, 4, 16, 64)

The optimal set of frame sizes can be determined by simulation.

This chapter is organized as follows: Section 5.1 presents and proves some theorems related to the Greedy algorithm for coin-changing. Section 5.2 presents the application of the coin-changing algorithm in partitionable bus networks. A performance evaluation is given in Section 5.3.

5.1 Some Theorems Related to Coin-Changing Algorithm

The terms used in this chapter are defined below. Some of these terms and definitions are taken from [7].

Definition 5.1.1 Representation: A set of non-negative integers $(X_1, X_2, \ldots, X_n)$ which satisfies $\sum_{i=1}^{n} X_i W_i = T$ is called a representation of T with respect to the set of coin types $(W_1, W_2, W_3, \ldots, W_n)$. For example, X = (1, 2, 0, 1) is a representation of T = 17 with respect to W = (1, 3, 4, 10).
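As an illustration of the candidate frame-size sets and of Definition 5.1.1, the following sketch generates a geometric sequence bounded by the maximum request length and checks whether a vector of counts is a representation. The helper names `geometric_frames` and `is_representation` are hypothetical and do not come from the dissertation.

```python
def geometric_frames(ratio, max_length, first=1):
    """Frame sizes first, first*ratio, first*ratio**2, ... up to max_length."""
    if ratio < 2:
        raise ValueError("ratio must be an integer greater than 1")
    frames = []
    size = first
    while size <= max_length:
        frames.append(size)
        size *= ratio
    return frames

def is_representation(counts, frames, total):
    """Definition 5.1.1: non-negative counts X with sum(X_i * W_i) == T."""
    return (len(counts) == len(frames)
            and all(x >= 0 for x in counts)
            and sum(x * w for x, w in zip(counts, frames)) == total)

# The three candidate sets for request lengths between 1 and 100.
for r in (2, 3, 4):
    print(r, geometric_frames(r, 100))

# The example of Definition 5.1.1.
print(is_representation((1, 2, 0, 1), (1, 3, 4, 10), 17))
```

Which ratio works best for a given distribution of lengths is not decided by this sketch; as the text notes, that choice is left to simulation.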


Definition 5.1.2 Canonical Representation: A representation obtained by taking the highest-value coin type as many times as possible, then taking the next highest-value coin type as many times as possible, and so on. In other words, the canonical representation is the representation computed by the Greedy algorithm. For example, the canonical representation of T = 17 in the example of Definition 5.1.1 is X = (0, 1, 1, 1).

Definition 5.1.3 Complete: A set of coin types is said to be complete if any given integer T has a canonical representation. For example, the set W = (1, 3, 8, 10, 26) is complete, while W = (2, 4, 8, 10) is not, since T = 13 has no representation with respect to W.

Definition 5.1.4 Essentially Complete: A set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ is said to be essentially complete if, whenever a constant T is representable, it has a canonical representation. For example, W = (3, 9, 12, 27) is essentially complete, since any T that has a representation also has a canonical representation. W = (3, 7, 12, 29) is not essentially complete, since T = 9 has the representation (3, 0, 0, 0) with respect to W but does not have a canonical representation.

Definition 5.1.5 Optimal Representation: A representation of an integer T with respect to $W = (W_1, W_2, W_3, \ldots, W_n)$ is an optimal representation if it uses a minimal number of coins.

Definition 5.1.6 Canonical Set: A set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ is canonical if each canonical representation of T with respect to W is optimal.
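The definitions above can be exercised with a minimal sketch of the Greedy algorithm. The function names are hypothetical; `greedy_representation` returns the canonical representation when one exists, and `is_representable` uses a simple dynamic-programming check (an illustrative choice, not a method taken from the dissertation) to tell whether any representation exists at all.

```python
def greedy_representation(frames, total):
    """Canonical representation of `total` with respect to `frames`
    (sorted in increasing order), or None if the Greedy pass leaves
    a remainder, i.e. no canonical representation exists."""
    counts = [0] * len(frames)
    remaining = total
    for i in range(len(frames) - 1, -1, -1):      # largest coin type first
        counts[i], remaining = divmod(remaining, frames[i])
    return tuple(counts) if remaining == 0 else None

def is_representable(frames, total):
    """True if some non-negative combination of `frames` sums to `total`."""
    reachable = [True] + [False] * total
    for t in range(1, total + 1):
        reachable[t] = any(w <= t and reachable[t - w] for w in frames)
    return reachable[total]

print(greedy_representation((1, 3, 4, 10), 17))   # example of Definition 5.1.2
print(greedy_representation((3, 7, 12, 29), 9))   # example of Definition 5.1.4
print(is_representable((3, 7, 12, 29), 9))
print(is_representable((2, 4, 8, 10), 13))        # example of Definition 5.1.3
```

In these terms, a set is essentially complete when `greedy_representation` never returns `None` for a total that `is_representable` accepts.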


Using the above definitions, we shall now present and prove a number of theorems. The theorem below provides a condition for determining whether a set of coin types is essentially complete.

Theorem 5.1.1 [7] $W = (W_1, W_2, W_3, \ldots, W_n)$ is essentially complete if and only if $W_i = K_i W_1$ for $1 < i \le n$, where $K_i$ is a non-negative monotonically increasing integer.

Proof of Theorem 5.1.1 The proof is given in [7].

Now we present and prove a theorem that provides a sufficient condition for a canonical set.

Theorem 5.1.2 $W = (W_1, W_2, W_3, \ldots, W_n)$ is a canonical set of coin types if the sequence $W_i$ $(i = 1, \ldots, n)$ is a geometric sequence with ratio $r > 1$, where $r = W_{i+1}/W_i$ and $r$ is an integer.

Proof of Theorem 5.1.2 The strategy we take to prove the theorem is as follows. We want to show that if $X = (X_1, X_2, X_3, \ldots, X_n)$ is the representation of T generated by the Greedy algorithm with respect to the set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$ and $Y = (Y_1, Y_2, Y_3, \ldots, Y_n)$ is an optimal representation, then X = Y. Before we prove the theorem, we make the following four observations:

1. $Y_i < r$ for $i = 1, \ldots, n-1$. If $Y_i$ is greater than or equal to $r$, we can simply replace $r$ coins of type $W_i$ with a single coin of type $W_{i+1}$. By doing so, we reduce the total number of coins used by $r - 1$. (That is, Y is not an optimal representation if any such $Y_i$ is greater than or equal to $r$.)

2. If $X_k \ne Y_k$, where $k$ is the largest index ($k \le n$) at which X and Y differ, then $X_k > Y_k$. This is because the Greedy algorithm always takes a maximum possible number of coins


of the highest values. It is obvious that $X_k$ exceeds $Y_k$ by at least 1 if $X_k \ne Y_k$.

3. $\sum_{i=1}^{k} Y_i W_i < (Y_k + 1) W_k$. To show that this inequality holds, we first replace $Y_i$ by $(r - 1)$ in $\sum_{i=1}^{k-1} Y_i W_i$ to obtain the following inequality:

$$\sum_{i=1}^{k} Y_i W_i \le \sum_{i=1}^{k-1} (r - 1) W_i + Y_k W_k \qquad (1)$$

This inequality holds because $Y_i < r$ according to observation 1. By taking the summation of the geometric series $W_i$, we obtain

$$\sum_{i=1}^{k-1} (r - 1) W_i = W_k - W_1 \qquad (2)$$

which equals $W_k - 1$ when $W_1 = 1$ and is in any case less than $W_k$. By substituting (2) into (1) we conclude

$$\sum_{i=1}^{k} Y_i W_i < (Y_k + 1) W_k$$

4. $\sum_{i=1}^{n} X_i W_i = \sum_{i=1}^{n} Y_i W_i = T$. This equation is valid, since both X and Y are representations of T.

Based on the above four observations, we now prove the theorem by contradiction. We first assume that $X \ne Y$. If $k$ is the largest index such that $X_k \ne Y_k$, the equation in observation 4 can be reduced to

$$\sum_{i=1}^{k} X_i W_i = \sum_{i=1}^{k} Y_i W_i$$

By observation 3, we have

$$\sum_{i=1}^{k} Y_i W_i < (Y_k + 1) W_k \qquad (3)$$


It is obvious that

$$\sum_{i=1}^{k} X_i W_i \ge X_k W_k$$

Since $X_k$ exceeds $Y_k$ by at least 1, due to observation 2, we know

$$X_k W_k \ge (Y_k + 1) W_k$$

Thus,

$$\sum_{i=1}^{k} X_i W_i \ge (Y_k + 1) W_k \qquad (4)$$

By (3) and (4),

$$\sum_{i=1}^{k} X_i W_i > \sum_{i=1}^{k} Y_i W_i$$

This is a contradiction to the fact that

$$\sum_{i=1}^{k} X_i W_i = \sum_{i=1}^{k} Y_i W_i$$

We conclude that X = Y; in other words, the Greedy representation $X = (X_1, X_2, X_3, \ldots, X_n)$ is an optimal representation of T.

The following two corollaries show that a geometric sequence is essentially complete, and is complete if $W_1 = 1$.

Corollary 5.1.1 $W = (W_1, W_2, W_3, \ldots, W_n)$ is essentially complete if W is a geometric sequence.

Proof of Corollary 5.1.1 The proof is obvious from Theorem 5.1.1.

Corollary 5.1.2 $W = (W_1, W_2, W_3, \ldots, W_n)$ is complete if and only if $W_1 = 1$.

Proof of Corollary 5.1.2 The proof is obvious from the definition of Complete.
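Theorem 5.1.2 can be spot-checked numerically. The sketch below is not part of the proof and only tests totals up to a chosen limit: it compares the number of coins used by the Greedy algorithm with the minimum found by a standard dynamic-programming search. All function names are hypothetical.

```python
def greedy_count(frames, total):
    """Number of coins the Greedy algorithm uses, or None if it fails."""
    used = 0
    for w in sorted(frames, reverse=True):
        q, total = divmod(total, w)
        used += q
    return used if total == 0 else None

def optimal_counts(frames, limit):
    """best[t] = minimal number of coins that sum to t, or None if
    t is not representable (dynamic programming over 0..limit)."""
    best = [0] + [None] * limit
    for t in range(1, limit + 1):
        options = [best[t - w] + 1 for w in frames
                   if w <= t and best[t - w] is not None]
        best[t] = min(options) if options else None
    return best

def greedy_is_optimal_up_to(frames, limit):
    """True if Greedy matches the optimum for every total in 1..limit."""
    best = optimal_counts(frames, limit)
    return all(greedy_count(frames, t) == best[t] for t in range(1, limit + 1))

for frames in [(1, 2, 4, 8, 16, 32, 64), (1, 3, 9, 27, 81), (1, 4, 16, 64)]:
    print(frames, greedy_is_optimal_up_to(frames, 100))

# A non-geometric set, for contrast: W = (1, 3, 4, 10) from Definition 5.1.1.
print(greedy_is_optimal_up_to((1, 3, 4, 10), 100))
```

For the non-geometric set, T = 6 already shows the gap: Greedy takes 4 + 1 + 1 (three coins), while 3 + 3 uses two.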


Theorem 5.1.3 Given a set of coin types $W = (W_1, W_2, W_3, \ldots, W_n)$: if $W^k = (W_1, W_2, W_3, \ldots, W_k)$ is canonical and the representation of $T = pW_k$ obtained by the Greedy algorithm with respect to $W^{k+1}$ is optimal, then $W^{k+1} = (W_1, W_2, W_3, \ldots, W_k, W_{k+1})$ is canonical. The constant $p$ is determined by taking the ceiling of $W_{k+1}/W_k$ (i.e., $p = \lceil W_{k+1}/W_k \rceil$). It should be noted that $W^k$ represents the subset of W consisting of the first $k$ elements of W.

Proof of Theorem 5.1.3 The proof is given in [58].

Theorem 5.1.3 is also called the one-point theorem. It provides a fast way to check whether a set of coin types is canonical. A set of coin types $W = (W_1, W_2, \ldots, W_n)$ is a canonical set if the first $(n-1)$ elements of W form a canonical set and the Greedy representation of $\lceil W_n / W_{n-1} \rceil W_{n-1}$ is optimal. The same rule can be applied to determine whether the first $(n-1)$ elements of W are canonical. For instance, to determine that W = (1, 4, 15, 20) is not a canonical set, the following steps are required:

Step 1: $W^1 = (1)$ is canonical. $W^2 = (1, 4)$ and $p = \lceil 4/1 \rceil = 4$.

Step 2: For $T = 4 \times W_1 = 4$, the representation (0, 1) obtained by the Greedy algorithm is an optimal representation. According to the theorem, $W^2 = (1, 4)$ is a canonical set.

Step 3: $W^3 = (1, 4, 15)$ and $p = \lceil 15/4 \rceil = 4$.

Step 4: For $T = 4 \times W_2 = 4 \times 4 = 16$, the representation (1, 0, 1) obtained by the Greedy algorithm is an optimal representation. According to the theorem, $W^3 = (1, 4, 15)$ is a canonical set.

Step 5: $W^4 = (1, 4, 15, 20)$ and $p = \lceil 20/15 \rceil = 2$.

Step 6: For $T = 2 \times W_3 = 2 \times 15 = 30$, the representation (2, 2, 0, 1) obtained by the Greedy algorithm is not an optimal representation since the representation (0, 0,


2, 0) for T = 30 uses fewer coins. As a result, W = (1, 4, 15, 20) is not a canonical set.

The above theorems serve the following purposes:

1. To establish a general rule for the construction of a canonical set (Theorem 5.1.2).

2. To determine if a set of coin types is complete or essentially complete (Corollary 5.1.1 and Corollary 5.1.2).

3. To provide an easy way to check if a set of coin types is canonical (Theorem 5.1.3).

In summary, a geometric sequence $W = (W_1, W_2, \ldots, W_n)$ with $W_1 = 1$ is a good choice for the set of time frames in our approach, since each communication request can be decomposed into time frames by the Greedy algorithm, and the number of time frames assigned is minimal.

5.2 The Application of the Coin-Changing Algorithm in Partitionable Bus Networks

As stated before, a dynamically partitionable bus network allows multiple communication processes to be carried out at the same time. However, the bus capacity is still not fully utilized, since the message lengths of the communication requests are in general not the same. Assuming the message lengths of communication requests are uniformly distributed, the loss of throughput due to non-uniform termination of communication processes could be 50 percent or higher. Bus idling can be reduced if the scheduling of requests is based on variable-sized time frames assigned to the communication requests. Only those requests with communication durations representable by the assigned frames can participate in the graph traversal. However, the number of time frames assigned to the communication requests needs to


be minimized to reduce the number of graph traversals. The problem of minimizing the number of time frames can be transformed into a coin-changing problem if we regard the size of a time frame as the face value of a coin type and the message length of a request as the total dollar amount that the individual coins should add up to. The proposed approach is to select a set of coin types (time frames) according to the theorems provided in the previous section and to use the simple Greedy algorithm to decompose each message length. The steps to be taken in this approach are as follows:

1. Select a set of time frames of different sizes that form a geometric sequence. The purpose of choosing a geometric sequence is to guarantee that a communication request of any length can be divided into the selected time frames using the Greedy algorithm. Since the Greedy algorithm is very simple, it adds little overhead to the communication request.

2. Apply the Greedy algorithm to each communication request. Each message length will thus be represented by a minimal number of time frames.

3. Start with the largest time frame, designate it as the tag frame, and collect the communication requests which have at least one unit of the tag frame.

4. Transform the collected requests into a graph according to the techniques described in Chapter 3.

5. Apply the graph traversal algorithm to the constructed graph and allow the requests that correspond to the vertices with the same color to be carried out simultaneously.

6. Repeat steps 3, 4, and 5 for the next largest time frame.
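The six steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the dissertation's implementation: the conflict rule in `conflicts` (two requests conflict when the bus segments between their end stations overlap) and the first-fit coloring in `color_classes` are hypothetical stand-ins for the graph construction of Chapter 3 and the graph traversal algorithm, and all names are invented for this example.

```python
def greedy_frames(length, frames):
    """Step 2: decompose `length` into a count per frame size, largest first."""
    counts = {}
    for size in sorted(frames, reverse=True):
        counts[size], length = divmod(length, size)
    return counts

def conflicts(a, b):
    """Assumed conflict rule: the bus segments between the two end
    stations of each request overlap (a shared station counts)."""
    lo_a, hi_a = sorted(a)
    lo_b, hi_b = sorted(b)
    return lo_a <= hi_b and lo_b <= hi_a

def color_classes(ids, stations):
    """Steps 4-5 stand-in: first-fit coloring; each class is conflict-free."""
    classes = []
    for rid in ids:
        for group in classes:
            if not any(conflicts(stations[rid], stations[o]) for o in group):
                group.append(rid)
                break
        else:
            classes.append([rid])
    return classes

def schedule(requests, frames):
    """requests: {id: (sender, receiver, length)}.
    Returns a list of rounds (frame_size, [ids transmitting in parallel])."""
    stations = {rid: (s, r) for rid, (s, r, _) in requests.items()}
    left = {rid: greedy_frames(n, frames) for rid, (_, _, n) in requests.items()}
    rounds = []
    for tag in sorted(frames, reverse=True):               # steps 3 and 6
        while True:
            ready = [rid for rid in requests if left[rid][tag] > 0]
            if not ready:
                break
            for group in color_classes(ready, stations):   # steps 4 and 5
                rounds.append((tag, group))
                for rid in group:
                    left[rid][tag] -= 1
    return rounds

# Requests q_1 to q_5 of Table 5.1 with frame sizes (1, 2, 4, ..., 64), step 1.
requests = {"q1": (1, 3, 23), "q2": (2, 4, 50), "q3": (4, 5, 189),
            "q4": (6, 7, 288), "q5": (1, 4, 211)}
frames = [1, 2, 4, 8, 16, 32, 64]
plan = schedule(requests, frames)
print(len(plan), sum(tag for tag, _ in plan))   # number of rounds, total time
```

Because some request lengths exceed the sum of the frame sizes, the largest frame is used more than once for those requests; the inner loop handles this by repeating the tag frame until no request has a unit of it left.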


The example below demonstrates the approach. Suppose there are $m$ communication requests with lengths $l_1, l_2, l_3, \ldots, l_m$, respectively, as shown in Table 5.2. By applying the Greedy algorithm to the length of each communication request with respect to the set of time frames $(m_1, m_2, m_3, \ldots, m_n)$, the time of each communication request is decomposed into time frames of different lengths. For instance, $l_1$ is made up of 3 frames of $m_1$, 1 frame of $m_2$, 2 frames of $m_{n-1}$, and 1 frame of $m_n$, as shown in Table 5.2. Next, all the requests having some units of time frame $m_n$ are collected for graph traversal, and the requests assigned the same color are allowed to send messages of length $m_n$. This process is continued for each of the time frames $m_{n-1}, m_{n-2}, \ldots$, and so on.

The advantage of using the coin-changing algorithm with the graph traversal algorithm is that the partitionable bus network can be more fully utilized. Since the Greedy algorithm always generates the optimal solution (i.e., the minimal number of time frames required to make up message length $l_i$ with respect to $m$), the overhead of graph traversals can be reduced to a minimum.

Table 5.2. An Example of Decomposing Communication Requests

l_1 l_2 l_3 l_4 ... l_m
m_1 3 1 2
m_2 2 1 1
m_3 1 1 1 4
m_4 5 2 2 2
...
m_{n-1} 2 3
m_n 1 2 3 4 1

l_i: The message length of communication request i.
m_i: The size of time frame i.


5.3 Performance Evaluation

In this section, the usefulness of the proposed approach is demonstrated. The model and the parameters of the dynamically partitionable bus network are first described below:

1. The architecture of the partitionable bus network has been presented in Chapter 4. However, the lengths of communication requests are not uniform.

2. The distribution of the lengths of communication requests determines the effectiveness of using the coin-changing algorithm to maximize the utilization of the dynamically partitionable bus network. In this performance evaluation, a normal distribution, an exponential distribution, and a uniform distribution are simulated. Appendices A and B show the probability distributions of a normal distribution and an exponential distribution, respectively.

3. The maximal and minimal lengths of the communication requests are 100 units and 1 unit, respectively.

4. The expected length of the communication requests is 50 units.

5. The time required to change switches between time frames is negligible.

In the simulation program, the set of frame sizes (1, 2, 4, 8, 16, 32, 64) is used, which guarantees that the Greedy algorithm will generate the optimal solution. For the uniform distribution, the length of a communication request is determined by generating a random number between 0 and 1 and multiplying that number by 100. Only one simulation run is completed for each distribution, and 100 communication


requests are generated for each run. For the normal distribution and the exponential distribution, the numbers are generated by using the procedure given in [16].

Table 5.3 shows the improvement ratio of a DPBN using time frames versus a DPBN without time frames for a uniform distribution of communication lengths. The expected number of communication processes (ENP) being carried out at any instant is varied over 2, 3, 4, 5, 6, 7, 8, 9, and 10. The improvement ratio A is the fraction by which the network delay is decreased. For instance, A = 0.30 means a decrease of network delay by 30 percent.

$$A = \frac{T_1 - T_2}{T_1}$$

where $T_1$ is the total communication time of a DPBN without using time frames and $T_2$ is the total communication time of a DPBN using time frames. As shown in Table 5.3, the network delay can be reduced by between 25 and 45 percent. The decrease of network delay grows with the ENP value.
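As a small worked example, the improvement ratio can be computed directly from the two total communication times. The sketch reads the ratio as $A = (T_1 - T_2)/T_1$, an interpretation that is consistent with the conversion formula given in the chapter summary; the function name is hypothetical.

```python
def improvement_ratio(t_without, t_with):
    """A = (T1 - T2) / T1, where T1 is the total communication time
    without time frames and T2 the total time with time frames."""
    if t_without <= 0:
        raise ValueError("T1 must be positive")
    return (t_without - t_with) / t_without

# Row ENP = 4 of Table 5.3: T1 = 8119, T2 = 5076.
a = improvement_ratio(8119, 5076)
print(round(a, 4), round(a * 100, 1))   # ratio and percentage decrease
```

For this row the computed value rounds to the 0.3748 listed in the table.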


Table 5.3. The Improvement Ratio of a DPBN Using Time Frames (Uniform Distribution of Communication Lengths)

ENP    T_1     T_2     A
2      6508    4798    0.2626
3      7381    4995    0.3232
4      8119    5076    0.3748
5      8295    5108    0.3841
6      8680    5053    0.4178
7      8836    5072    0.4260
8      8935    5081    0.4313
9      9138    5108    0.4409
10     9124    5086    0.4430

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Tables 5.4, 5.5, 5.6, and 5.7 show the improvement ratio for an exponential distribution of communication lengths. The exponent coefficient $\beta$ equals 60, 40, 20, and 10, respectively.


Table 5.4. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with $\beta = 60$)

ENP    T_1     T_2     A
2      7163    5125    0.284
3      7752    4859    0.373
4      8277    4741    0.427
5      8741    4693    0.461
6      9134    4756    0.479
7      9325    4727    0.493
8      9475    4731    0.500
9      9573    4708    0.508
10     9720    4745    0.511

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.5. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with $\beta = 40$)

ENP    T_1     T_2     A
2      5575    3880    0.304
3      6119    3614    0.409
4      6728    3527    0.475
5      7261    3486    0.519
6      7715    3546    0.540
7      8066    3539    0.561
8      8299    3518    0.575
9      8404    3500    0.583
10     8722    3534    0.594

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.6. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with $\beta = 20$)

ENP    T_1     T_2     A
2      3060    2082    0.319
3      3381    1905    0.436
4      3837    1882    0.509
5      4166    1848    0.556
6      4553    1887    0.585
7      4868    1889    0.611
8      5061    1872    0.629
9      5232    1869    0.642
10     5493    1880    0.657

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.7. The Improvement Ratio of a DPBN Using Time Frames (Exponential Distribution of Communication Lengths with $\beta = 10$)

ENP    T_1     T_2     A
2      1577    1061    0.327
3      1735    960     0.446
4      1963    942     0.520
5      2129    920     0.567
6      2358    944     0.599
7      2517    944     0.625
8      2614    932     0.643
9      2698    930     0.655
10     2831    934     0.669

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Tables 5.8, 5.9, 5.10, 5.11, 5.12, and 5.13 show the improvement ratio for a normal distribution of communication lengths. The variance of the normal distribution varies between 5 and 40. The results show that, when a bus network is partitioned to achieve parallel communication, the use of time frames increases the network throughput significantly when the variance of the distribution is large. This is because, when the variance is large, the lengths of the communication processes spread over a wider range, resulting in a higher percentage of bus idling if the coin-changing algorithm is not used. When the variance is large, the application of the coin-changing algorithm in a dynamically partitionable bus network is therefore justifiable.


Table 5.8. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 40$)

ENP    T_1     T_2     A
2      6755    4959    0.265
3      7842    5156    0.342
4      8406    5192    0.382
5      8679    5091    0.413
6      9086    5055    0.443
7      9249    5066    0.452
8      9337    5041    0.460
9      9400    5078    0.461
10     9672    5066    0.476

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.9. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 30$)

ENP    T_1     T_2     A
2      6741    5082    0.246
3      7391    4951    0.330
4      7871    5058    0.357
5      8132    5111    0.371
6      8373    5000    0.402
7      8553    4989    0.416
8      8739    4976    0.430
9      8875    5013    0.435
10     8967    4980    0.444

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.10. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 20$)

ENP    T_1     T_2     A
2      6267    5133    0.180
3      6787    5109    0.240
4      7136    5082    0.287
5      7150    5017    0.298
6      7416    5021    0.323
7      7669    5023    0.345
8      7759    5010    0.354
9      7742    4957    0.259
10     7893    4963    0.371

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.11. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 15$)

ENP    T_1     T_2     A
2      5851    5044    0.138
3      6168    4959    0.198
4      6502    4978    0.234
5      6671    5002    0.250
6      6868    5019    0.269
7      6690    5021    0.281
8      7073    5001    0.293
9      7098    4983    0.298
10     7186    4978    0.307

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.12. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 10$)

ENP    T_1     T_2     A
2      5512    4940    0.103
3      5770    4962    0.140
4      5890    4965    0.157
5      6007    4964    0.173
6      6146    4970    0.191
7      6208    4963    0.200
8      6276    4954    0.210
9      6363    4962    0.221
10     6450    4964    0.230

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


Table 5.13. The Improvement Ratio of a DPBN Using Time Frames (Normal Distribution of Communication Lengths with $\sigma^2 = 5$)

ENP    T_1     T_2     A
2      5235    4973    0.050
3      5402    4969    0.080
4      5451    4974    0.087
5      5525    4966    0.101
6      5581    4960    0.119
7      5630    4974    0.116
8      5631    4960    0.119
9      5663    4954    0.125
10     5709    4958    0.131

T_1: The Total Communication Time of a DPBN without Using Time Frames
T_2: The Total Communication Time of a DPBN Using Time Frames
A: The Improvement Ratio of a DPBN Using Time Frames vs. a DPBN without Using Time Frames
ENP: The Number of Communication Processes That Are Being Carried Out at Any Instant.
The Number of Requests Generated = 100
The Maximum Length of a Request = 100
The Minimum Length of a Request = 1
The Mean of the Requests = 50


In summary, the reduction in network delay that the DPBN achieves by using the coin-changing algorithm depends on the following two factors:

1. The expected number of communication processes (ENP). The ENP indicates the number of communication processes that are being carried out at any instant in the network. The higher the ENP, the better the chance of having a large variation of communication lengths. As shown in all the tables above, A (the improvement ratio) increases with the ENP value. The improvement ratio can be converted to the percentage of network delay decrease by using the following equation:

$$\text{percentage of network delay decrease} = A \times 100 = \frac{T_1 - T_2}{T_1} \times 100$$

2. The variance ($\sigma^2$). As the variance of the distribution of communication lengths increases, the percentage of bus idling becomes higher in a DPBN without time frames. A DPBN that uses time frames therefore gains more in performance, since it minimizes bus idling.

It is also worth mentioning that the overhead of bus partitioning and switching may affect the choice of the frame length sequence as well as the performance.


CHAPTER 6
SUMMARY, CONCLUSION, OTHER APPLICATIONS, AND FUTURE WORK

In a multiprocessor/multicomputer system, computational tasks are carried out in parallel. Limited system resources such as communication channels, memory devices, and processors are shared among concurrent processes. The performance of a computer system can decrease rapidly due to resource contention. The problem of resource contention and allocation can be represented as a graph coloring problem, and the techniques for coloring graphs can be used to identify non-conflicting resource requests effectively.

6.1 Summary

This dissertation presents three graph traversal algorithms that give good estimates of the chromatic number of a graph, where the vertices of the graph represent events or requests and the edges connecting the vertices represent conflicts between events or requests. The complexity of each graph traversal algorithm is $O(E)$, where $E$ is the number of edges of the graph. The dynamic algorithm has better performance than the static algorithm, since it throws away vertices "on the fly" and results in fewer colors being used. The analysis of the dynamic algorithm shows that it produces better results than the simulation results of the algorithm reported by Wood [61].

In this work, the dynamic graph traversal algorithm for detecting non-conflicting resource requests is applied in a dynamically partitionable bus network (DPBN) to partition the bus network into a number of subnetworks for processing sets of non-conflicting communication requests. It distinguishes conflicting and non-conflicting


requests by assigning different colors to the vertices of a graph, which represent requests. The same color is assigned to a set of non-conflicting requests, which can be scheduled for parallel processing in the subnetworks.

In this work, the design of a special-purpose processor (a hardware design of the dynamic algorithm) that can be used to speed up the scheduling of communication requests is also presented. The special processor receives from the control computer of a partitionable bus network the adjacency matrix, which indicates conflicts among requests. It applies the dynamic graph traversal algorithm and returns the identified non-conflicting requests to the control computer. The network is partitioned to allow non-conflicting requests to proceed in parallel. Performance evaluation of the partitionable bus network that uses the dynamic graph traversal algorithm is also carried out. The results show a significant decrease of network delay as compared with an ideal unpartitioned local area network in which no conflicts are assumed.

Since the communication processes that are allowed to proceed in the partitioned subnetworks take different time durations, the subnetworks that are released by the processes of shorter durations cannot be reused until all the processes are completed. In other words, some subnetworks are not fully utilized. The performance of the dynamically partitionable bus network is further improved by reducing the subnetwork idling time. The subnetwork idling problem is solved by dividing the message of each request into a minimal number of variable-sized time frames using the coin-changing algorithm, and each communication request is allowed to transmit messages for the duration of a time frame. The coin-changing algorithm used is the Greedy algorithm, which is simple and fast.
It is shown in this work that, by choosing a proper set of coin types (time frames) such as a geometric sequence, the Greedy algorithm can always generate an optimal representation for a time requirement of a communication


request. The performance evaluation shows that the increase in network throughput depends on the variance of the distribution of communication durations and the expected number of communication processes (ENP) that are being carried out in the same time interval. The higher the variance and the ENP, the greater the network throughput increase achieved by applying the coin-changing algorithm in a dynamically partitionable bus network.

6.2 Conclusion

This research makes the following four contributions. First, it introduces three new graph traversal algorithms and shows analytically that one of the algorithms can produce better results than the algorithm introduced by Wood. Second, it demonstrates that, by applying the dynamic graph traversal algorithm in a partitionable bus network, the throughput of the network can be dramatically increased. Third, it introduces several new theorems asserting that, if a set of coin types forms a geometric sequence, the Greedy coin-changing algorithm always generates an optimal solution. Fourth, the new algorithms, the analysis and simulation results, and the theorems introduced in this work have very general applications. They can be applied to solve resource contention problems in a computer system and to improve its resource utilization and performance.

6.3 Other Applications

The approach presented in this dissertation has very broad and useful applications, since most real-world problems that deal with resource sharing and scheduling can be interpreted as a graph coloring problem. For example, for database computers, such as MICRONET [51,53], MDBS [21], DBC/1012 [57], GAMMA [12],

PAGE 113

HYPERTREE [10,47], REPT [49,50], NON-VON [20,48], Michigan's Boolean Cube-connected Multicomputer [2], GRACE [27,28,29], and DBMAC [38,39], the congestion of the communication bus due to the large amount of data transfer slows down the processing of database operations. The problem can be solved by partitioning the bus to increase network throughput and by minimizing unnecessary data transfer by identifying the sequences of operations that cause a large amount of data transfer. It works as follows. Apply the graph traversal algorithm to the graph, where vertices represent database operations and an edge is drawn between a pair of vertices if the corresponding database operations do not require a large amount of data transfer. Thus, the corresponding operations of a pair of vertices with an edge in between can be executed consecutively with minimal delay.

6.4 Future Work

This dissertation presents an alternative method for solving resource contention problems. It touches upon many areas. Some of the things we have done can be further refined and improved. They are as follows:

• Simulate the dynamic graph traversal algorithm and Wood's algorithm on the same set of randomly generated graphs for comparison.
• Investigate some open problems in graph coloring and coin-changing algorithms, such as three-coloring, heuristic rules of coloring, and sufficient conditions of a coin set that results in an optimal representation when the Greedy algorithm is applied.
• Compare the improvement of time frames of variable sizes and time frames of a fixed size on a DPBN. Decomposing the length of a request into time frames of variable sizes solves the idling problem. However, the overhead of running
the Greedy algorithm is a waste of bandwidth. An alternative is to decompose the length of a request into time frames of a fixed size. The overhead is reduced, but the number of graph traversals may be increased.
• Do more simulation runs for the simulation study of the DPBN in Chapters 4 and 5.
• Refine the model of the DPBN in Chapter 5 to include bus switching time for performance evaluation.
• Study the strategy that allows non-conflicting requests with sizes of time frames smaller than that of the tag frame to join graph traversals.
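The scheduling idea summarized in this chapter, in which non-conflicting requests receive the same color and proceed in parallel, can be sketched in Python. This is not the dynamic graph traversal algorithm of the dissertation and not Wood's algorithm; it is a plain largest-degree-first greedy coloring in the spirit of Welsh and Powell [60], substituted here only to show how an adjacency matrix of conflicts is turned into groups of requests. The function name `group_nonconflicting` and the example matrix are hypothetical.

```python
def group_nonconflicting(adjacency):
    """Group requests so that no two requests in a group conflict.

    adjacency[i][j] is 1 when requests i and j conflict, 0 otherwise.
    Uses greedy coloring with vertices ordered by decreasing degree.
    """
    n = len(adjacency)
    color = [None] * n
    # Visit the most constrained requests (highest conflict count) first.
    order = sorted(range(n), key=lambda v: -sum(adjacency[v]))
    for v in order:
        used = {color[u] for u in range(n)
                if adjacency[v][u] and color[u] is not None}
        c = 0
        while c in used:
            c += 1
        color[v] = c  # smallest color not used by a conflicting request
    groups = {}
    for v, c in enumerate(color):
        groups.setdefault(c, []).append(v)
    return [groups[c] for c in sorted(groups)]


# Hypothetical example: four requests with conflicts 0-1, 1-2 and 2-3.
CONFLICTS = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
schedule = group_nonconflicting(CONFLICTS)
```

Each group in `schedule` is a set of requests that could be served in parallel in separate subnetworks. A comparison of heuristics on randomly generated graphs, as proposed in the first item above, could reuse the same adjacency-matrix interface.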

APPENDIX A
PROBABILITY DISTRIBUTION: NORMAL DISTRIBUTION

A random variable X is said to have a normal distribution with mean $\mu$ and variance $\sigma^2$ if X has the probability density function given below:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}$$

where $-\infty < x < \infty$ and e is a constant (e ≈ 2.7183). A procedure that generates normally distributed random numbers is given below.

The procedure [16] below generates the normal distribution defined above.

Procedure Normal Distribution;
(* This procedure generates normally distributed numbers *)
(* A is a flag, initially 0, that is retained between calls *)
Begin
  C := 2 * π;
  If (A = 0) Then
  Begin
    U := U(0, 1);    (* U is the random number generator *)
    V := E(1);       (* E is the exponential number generator *)
    B := (2V)^(1/2);
    W1 := B cos(CU); (* C scales U to an angle in [0, 2π) *)
    W2 := B sin(CU);
    A := 1;
    Return W1
  End
  Else
  Begin
    A := 0;
    Return W2
  End;
End; (* Normal Distribution *)
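For readers who want to run the procedure, a Python sketch follows. It is a hedged translation rather than the code used in the simulations: the class name `NormalGenerator`, the helper `sample_normal`, and the use of Python's `random` module for the uniform and exponential generators are assumptions. The angle is taken as 2πU, which is what the constant C in the procedure suggests.

```python
import math
import random


class NormalGenerator:
    """Standard normal generator that produces values in pairs.

    The first call computes two values and returns one; the second call
    returns the stored value (the role of the flag A in the procedure).
    """

    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.spare = None  # holds W2 between calls

    def next(self):
        if self.spare is not None:
            w2, self.spare = self.spare, None
            return w2
        u = self.rng.random()          # U(0, 1)
        v = self.rng.expovariate(1.0)  # E(1), exponential with mean 1
        b = math.sqrt(2.0 * v)
        angle = 2.0 * math.pi * u      # C * U with C = 2π
        self.spare = b * math.sin(angle)
        return b * math.cos(angle)


def sample_normal(mean, std_dev, gen):
    """Scale a standard normal value to the requested mean and standard deviation."""
    return mean + std_dev * gen.next()
```

A value with mean μ and variance σ² is obtained as `sample_normal(mu, sigma, gen)`, where `gen` is a `NormalGenerator` instance; seeding the underlying `random.Random` makes a simulation run repeatable.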

APPENDIX B
PROBABILITY DISTRIBUTION: EXPONENTIAL DISTRIBUTION

A random variable X is said to have an exponential distribution if X has the probability density function given below:

$$f(x) = \begin{cases} \dfrac{1}{\beta}\, e^{-x/\beta} & \text{if } 0 < x < \infty \\ 0 & \text{otherwise} \end{cases}$$

A procedure that generates exponentially distributed random numbers is given below.

The procedure [16] below generates the exponential distribution defined above.

Procedure Exponential Distribution;
(* This procedure generates exponentially distributed numbers *)
Begin
  U := U(0, 1);  (* U is the random number generator *)
  W := -ln(U);
  Return W
End; (* Exponential Distribution *)
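The same inverse-transform step can be written as a small Python sketch. The function name `sample_exponential` and the optional `mean` parameter are assumptions added for illustration; with `mean=1.0` the function reduces to W = -ln(U) as in the procedure. The uniform value is taken as 1 - random() so that it lies in (0, 1] and the logarithm is always defined.

```python
import math
import random


def sample_exponential(mean=1.0, rng=None):
    """Return an exponentially distributed value by inverting the CDF."""
    rng = rng or random
    u = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
    return -mean * math.log(u)  # W = -ln(U), scaled by the desired mean
```

A call such as `sample_exponential(2.0, random.Random(1))` draws one value from an exponential distribution with mean 2, using a seeded generator for repeatability.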

REFERENCES

[1] C. K. Baru and S. Y. W. Su. Performance of Statistical Aggregation Operations in the SMS System. Proceedings of ACM/SIGMOD International Conference on Management of Data, Association for Computing Machinery, Boston, June 1984.
[2] C. K. Baru and O. Frieder. Implementing Relational Database Operations in a Cube-connected Multicomputer. Proceedings of the International Conference on Data Engineering, IEEE, Los Angeles, Feb. 1986.
[3] C. K. Baru and S. Y. W. Su. The Architecture of SMS: A Dynamically Partitionable Multicomputer System. IEEE Transactions on Computers, Vol. C-35, No. 9, Sept. 1986, pp. 790-801.
[4] B. Bollobás and A. G. Thomason. Random Graphs of Small Order. Annals of Discrete Math., Vol. 28, 1985, pp. 47-97.
[5] D. Brelaz. New Methods to Color the Vertices of a Graph. Communications of the ACM, Vol. 22, No. 4, 1979, pp. 251-256.
[6] R. L. Brooks. On Coloring the Nodes of a Network. Proceedings of Cambridge Philos. Soc., Vol. 37, 1941, pp. 194-197.
[7] L. Chang and J. F. Korsh. Canonical Coin Changing and Greedy Solutions. Journal of the Association for Computing Machinery, Vol. 23, No. 3, July 1976, pp. 418-422.
[8] S. K. Chang and A. Gill. Algorithmic Solution of the Change-Making Problem. Journal of the Association for Computing Machinery, Vol. 17, No. 1, Jan. 1970, pp. 113-122.
[9] N. Christofides. An Algorithm for the Chromatic Number of a Graph. Computer Journal, Vol. 14, No. 1, Feb. 1971, pp. 38-39.
[10] A. M. Despain and D. A. Patterson. X-tree: A Tree Structured Multiprocessor Computer Architecture. Proceedings of the Fifth Symposium on Computer Architecture, Palo Alto, California, 1978.
[11] D. DeWitt. A Multiprocessor Organization for Supporting Relational Database Management Systems. IEEE Transactions on Computers, Vol. C-28, No. 6, June 1979, pp. 395-406.
[12] D. J. DeWitt, R. H. Gerber, G. Graefe, M. L. Heytens, K. B. Kumar, and M. Muralikrishna. GAMMA: a Performance Dataflow Database Machine. Proceedings of the 12th International Conference on Very Large Databases, Information Processing Society of Japan, Kyoto, Japan, Aug. 1986.
[13] T. Fei, C. K. Baru, and S. Y. W. Su. SMS: A Dynamically Partitionable Multicomputer System with Switchable Main Memory Modules. Proceedings of the International Conference on Computer Data Engineering, IEEE, Los Angeles, April 1984.
[14] M. R. Garey and D. S. Johnson. The Complexity of Near-optimal Graph Coloring. Journal of Association for Computing Machinery, Vol. 23, 1976, pp. 43-49.
[15] M. R. Garey and D. S. Johnson. Computers and Intractability. W. H. Freeman and Company, San Francisco, 1979.
[16] G. S. Fishman. Principle of Discrete Event Simulation. John Wiley and Sons, New York, 1979.
[17] E. Hafner, Z. Nenadal, and M. Tschanz. A Digital Loop Communication System. IEEE Transactions on Communications, Vol. Com-22, No. 6, June 1974, pp. 877-881.
[18] S. L. Hakimi and E. Schmeichel. Chromatic Factorizations of a Graph. Journal of Graph Theory, Vol. 12, No. 2, 1988, pp. 177-182.
[19] J. L. Hammond and P. O'Reilly. Performance Analysis of Local Computer Networks. Addison-Wesley Publishing Company, Reading, Mass., 1986.
[20] B. K. Hillyer, D. E. Shaw, and A. Nigram. NON-VON's Performance on Certain Database Benchmarks. IEEE Transactions on Software Engineering, Vol. SE-12, No. 4, April 1986, pp. 577-583.
[21] D. K. Hsiao and M. J. Menon. Design and Analysis of a Multi-backend Database System for Performance Improvement, Functionality Expansion, and Capacity Growth. Technical Report NPS52-83-006, Naval Postgraduate School, Monterey, California, June 1983.
[22] IEEE Project 802 Committee. A Status Report on Local Network Standards Committee. Draft C, May 1982.
[23] IEEE Standard 802.4. Token Passing Bus Access Method and Physical Layer Specifications. Draft D, 1982.
[24] A. Johri and D. W. Matula. Probabilistic Bounds and Heuristic Algorithms for Coloring Large Random Graphs. Technical Report, Department of Computer Science and Engineering, Southern Methodist University, Dallas, Texas, 1982.
[25] R. M. Karp. Reducibility among Combinatorial Problems. In R. E. Miller and J. W. Thatcher (eds.), Complexity of Computer Computations, Plenum Press, New York, 1972.
[26] S. Kartashev and S. Kartashev. Dynamic Architecture: Problems and Solutions. Computer, Vol. 11, No. 7, July 1978, pp. 7-15.
[27] M. Kitsuregawa, H. Tanaka, and T. Moto-oka. Relational Algebra Machine GRACE. RIMS Symposia on Software Science and Engineering, 1982. Lecture Notes in Computer Science, Springer-Verlag, New York, 1983, pp. 191-212.
[28] M. Kitsuregawa, H. Tanaka, and T. Moto-oka. Architecture and Performance of Relational Algebra Machine GRACE. Proceedings of the International Conference on Parallel Processing, IEEE, Bellaire, Michigan, 1984, pp. 241-250.
[29] M. Kitsuregawa, M. Fushimi, H. Tanaka, and T. Moto-oka. Memory Management Algorithms in Pipeline Merge Sorter. In D. J. DeWitt and H. Boral (eds.), Proceedings of the Fourth International Workshop on Database Machines, Springer-Verlag, New York, 1985, pp. 208-232.
[30] H. Lam, S. Y. W. Su, F. L. C. Seeger, C. Lee, and W. R. Eisenstadt. A Special Function Unit for Database Operations Within Data-Control Flow System. Proceedings of the International Conference on Parallel Processing, IEEE, St. Charles, IL, Aug. 1987.
[31] E. L. Lawler. A Note on the Complexity of the Chromatic Number Problem. Information Processing Letters, Vol. 5, 1976, pp. 66-67.
[32] F. T. Leighton. A Graph Coloring Algorithm for Large Scheduling Problem. Journal Res. Natn. Bur. Stand., Vol. 84, 1979, pp. 489-496.
[33] M. T. Liu. Design of the Distributed Double-loop Computer Network (DDLCN). Journal of Digital Systems, Vol. 5, 1981, pp. 3-37.
[34] M. Magazine, G. M. Nemhauser, and L. E. Trotter. When the Greedy Solution Solves a Class of Knapsack Problems. Operations Research, Vol. 23, 1975, pp. 207-217.
[35] D. Matula. On the Complete Subgraphs of a Random Graph. Proceedings of the Second Chapel Hill Conference on Combinatorial Mathematics and Its Application, University of North Carolina, Chapel Hill, 1970, pp. 356-369.
[36] D. Matula, G. Marble, and J. Isaacson. Graph Coloring Algorithms in Graph Theory and Computing. Academic Press, New York, 1972, pp. 109-122.
[37] R. M. Metcalfe, D. R. Boggs, C. P. Thacker, and B. W. Lampson. Multipoint Data Communication System with Collision Detection. U.S. Patent 4063220, 1977.
[38] M. Missikoff. A Domain-Based Internal Schema for Relational Database Machines. 1982 ACM-SIGMOD Conference, Santa Monica, California, 1982.
[39] M. Missikoff and M. Terranova. The Architecture of a Relational Database Computer Known as DBMAC. In David K. Hsiao (ed.), Advanced Database Machine Architecture, Prentice-Hall, Englewood Cliffs, New Jersey, 1983.
[40] J. Mitchem. On Various Algorithms for Estimating the Chromatic Number of a Graph. Computer Journal, Vol. 10, 1976, pp. 182-183.
[41] J. L. Mott, A. Kandel, and T. P. Baker. Discrete Mathematics for Computer Scientists. Reston Publishing Company, A Prentice-Hall Company, Reston, Virginia, 1983.
[42] R. M. Needham and A. J. Herbert. The Cambridge Distributed Computing Systems. Addison-Wesley Publishing Company, Reading, Mass., 1982.
[43] R. Newman-Wolfe. Communication Issues in Parallel Computation. Ph.D. Dissertation, Computer Science Department, University of Rochester, Rochester, New York, 1986.
[44] J. R. Pierce. Network or Block Switching of Data. Bell System Technical Journal, Vol. 51, No. 6, pp. 1133-1145, 1972.
[45] J. Plesnik. Coloring of Graphs by Partitioning. Math. Slovaca, Vol. 30, No. 2, 1980, pp. 121-126.
[46] L. Raschid, T. Fei, H. Lam, and S. Y. W. Su. A Special Function Unit for Sorting and Sort-based Database Operations. IEEE Transactions on Computers (correspondence), Vol. C-35, No. 12, Dec. 1986, pp. 1071-1077.
[47] C. H. Sequin, A. M. Despain, and D. A. Patterson. Communication in X-Tree, A Modular Multiprocessor System. Proceedings of the ACM, ACM, Austin, Texas, 1978.
[48] D. E. Shaw. A Parallel Algorithm for External Sorting, NON-VON Supercomputer. Technical Report, Computer Science Department, Columbia University, New York, August 1982.
[49] R. K. Shultz. A Multiprocessor Computer Architecture for Database Support. Ph.D. Dissertation, Department of Computer Science, Iowa State University, Ames, Iowa, 1981.
[50] R. K. Shultz and R. J. Zingg. Response Time Analysis of Multiprocessor Computers for Database Support. ACM Transactions on Database Systems, Vol. 9, No. 1, 1984, pp. 100-132.
[51] S. Y. W. Su, S. Lupkiewicz, D. H. Lo, and K. Doty. MICRONET: A Microcomputer Network System for Managing Distributed Relational Databases. Proceedings of the 4th International Conference on VLDB, Berlin, West Germany, 1978.
[52] S. Y. W. Su and K. P. Mikkilineni. Parallel Algorithms and Their Implementation in MICRONET. Proceedings of the 8th International Conference on VLDB, Mexico City, Sept. 1982.
[53] S. Y. W. Su. A Microcomputer Network System for Distributed Relational Databases: Design, Implementation, and Analysis. Journal of Telecommunication Networks, Vol. 2, No. 3, 1983, pp. 307-333.
[54] S. Y. W. Su and C. K. Baru. Dynamically Partitionable Multicomputers with Switchable Memory. Journal of Parallel and Distributed Computing, Vol. 1, Nov. 1984, pp. 152-184.
[55] S. Y. W. Su. Database Computers: Principles, Architecture, and Techniques. McGraw-Hill, New York, 1988.
[56] Y. Tanaka, Y. Nozaka, and A. Masuyama. Pipeline Searching and Sorting Models as Components of a Data Flow Database Computer. Information Processing 80, North-Holland Publishing Co., Amsterdam, 1980.
[57] Teradata Corporation. DBC/1012 Data Base Computer Concepts and Facilities. Release 1.1, C02-0001-01, Los Angeles, 1984.
[58] B. N. Tien and T. C. Hu. Error Bounds and the Applicability of the Greedy Solution to the Coin-Changing Problem. Operations Research, Vol. 25, No. 3, 1977, pp. 404-418.
[59] C. C. Wang. An Algorithm for the Chromatic Number of a Graph. Journal of the Association for Computing Machinery, Vol. 21, No. 3, July 1974, pp. 385-391.
[60] D. J. A. Welsh and M. B. Powell. An Upper Bound to the Chromatic Number of a Graph and its Application to Time-Tabling Problem. Computer Journal, Vol. 10, 1967, pp. 85-86.
[61] D. C. Wood. A Technique for Coloring a Graph Applicable to Large Scale Time-Tabling Problem. Computer Journal, Vol. 12, 1969, pp. 317-319.
[62] J. W. Wright. The Change-Making Problem. Journal of the Association for Computing Machinery, Vol. 22, No. 1, Jan. 1975, pp. 125-128.

BIOGRAPHICAL SKETCH

Mr. Woo received the degrees of B.S. in engineering from National Taiwan University, M.S. in mechanical and aerospace engineering from the State University of New York at Buffalo, and M.S. in applied mathematics and computer science from Northwestern University in 1980, 1982, and 1985, respectively. While at Northwestern University, Mr. Woo wrote two published papers in the area of mathematical modeling and simulation. During the period of pursuing the Ph.D. degree at the University of Florida, he has written two papers in the area of computer networks. After graduation he plans to concentrate his research on database management and computer networking.

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stanley Y. W. Su, Chairman
Professor of Computer and Information Sciences and Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Yuan-Chieh Chow
Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Herman Lam
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Richard Newman-Wolfe
Assistant Professor of Computer and Information Sciences

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Chung-Yee Lee
Assistant Professor of Industrial and Systems Engineering

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate School and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

May 1989

Dean, College of Engineering

Dean, Graduate School

