BANYAN NETWORKS
FOR PARTITIONING MULTIPROCESSOR SYSTEMS
By
Louis Rodney Goke
A DISSERTATION PRESENTED TO THE GRADUATE COUNCIL OF
THE UNIVERSITY OF FLORIDA
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE
DEGREE OF DOCTOR OF PHILOSOPHY
UNIVERSITY OF FLORIDA 1976
ACKNOWLEDGMENTS
I wish to thank my advisor, Dr. G. J. Lipovski, for his inspiration and guidance concerning both this dissertation and computer architecture in general. It was largely for the opportunity to work with and to learn from Dr. Lipovski that I chose to continue at the University of Florida beyond the master's program, and I feel that his influence has contributed greatly to my technical and professional preparation. The research presented here began as an outgrowth of his earlier work with SW switching structures (Lipovski, 69, 70), and was undertaken originally as thesis research for the degree of Engineer. It was developed, instead, into a doctoral dissertation largely as a result of Dr. Lipovski's encouragement.
Portions of this dissertation have evolved piecemeal over a period of years, and I am indebted to a large number of typists and others who have assisted in document preparation. Most notable has been the contribution of Ms. Sylvia Hansing, who has had the perseverance and the skill to type the entire final manuscript in Mag Card form.
Special thanks go to my wife, Mary Goke, for her tolerance and understanding and for the variety of ways in which she has assisted me during the preparation of this dissertation.
TABLE OF CONTENTS
ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
SECTION
1 INTRODUCTION
1.1 The Trend Towards Numerous-Module Systems
1.2 Problems with Previous Interconnection Schemes
1.3 Banyan Partitioning Networks
2 PARTITIONING NETWORK CONCEPTS AND REQUIREMENTS
2.1 Architecture of a Partitionable System
2.2 Utilization of a Partitionable System
2.3 Requirements of a Partitioning Network
3 SOME ALTERNATE REALIZATIONS OF PARTITIONING NETWORKS
3.1 Crossbar Networks
3.2 Permutation Networks
4 BANYANS
4.1 Tree-Shaped Connections
4.2 Priority Hardware in Tree-Shaped Data Paths
4.3 Synthesizing Large Banyans from Smaller Ones
4.4 Control of Connections
4.5 Parallel and Multiplexed Networks
5 L-LEVEL BANYANS
5.1 Base and Apex Distance
5.2 Fanout and Spread
6 SW BANYANS
6.1 Previous Special Cases
6.2 Structure
6.3 Distance Properties
7 CC BANYANS
7.1 Structure
7.2 Distance Properties
8 BANYAN NETWORK SIMULATIONS
8.1 Nature of Simulations
8.2 Simulation Results
9 COST-PERFORMANCE FUNCTIONS
9.1 Functions of Interest
9.2 A Comparative Example
9.3 Optimum Fanout and Spread
10 CONCLUSIONS
APPENDIX A: MATHEMATICAL NOTATION AND TERMINOLOGY
APPENDIX B: THE THEORY OF BANYAN GRAPHS
APPENDIX C: BIDIRECTIONAL SWITCHING CIRCUITS
APPENDIX D: COMPLETE SIMULATION DATA
REFERENCES
BIOGRAPHICAL SKETCH
LIST OF TABLES
Table
5.2-1. Size and Cost Functions for Uniform Banyans
6.3-1. Base and Apex Equivalence Classes for SW Banyan in Figure 6.2-2g
8.1-1. Application of Near Apex Selection Rule to the Banyan in Figure 6.2-2g
8.1-2. Application of Near Apex Selection Rule to the Banyan in Figure 7.1-1b
8.2-1. Effects of Varying the Number of Subsystems in a Partition
8.2-2. Comparison of SW and CC Network Structures
8.2-3. Comparison of Far and Near Apex Selection Rules
8.2-4. Comparison of Standard and Modified Set-Up Rules
9.2-1. Cost and Performance Measures for Three Alternative Networks
A-1. Standard APL Notation Used in Dissertation
A-2. Extensions to APL Notation Used in Dissertation
D-1. Simulation Results for SW Banyans Using Far Apex Selection Rule and Standard Set-Up Rule
D-2. Simulation Results for SW Banyans Using Near Apex Selection Rule and Standard Set-Up Rule
D-3. Simulation Results for SW Banyans Using Far Apex Selection Rule and Modified Set-Up Rule
D-4. Simulation Results for SW Banyans Using Near Apex Selection Rule and Modified Set-Up Rule
D-5. Simulation Results for CC Banyans Using Far Apex Selection Rule and Standard Set-Up Rule
D-6. Simulation Results for CC Banyans Using Near Apex Selection Rule and Standard Set-Up Rule
D-7. Simulation Results for CC Banyans Using Near Apex Selection Rule and Modified Set-Up Rule
LIST OF FIGURES
Figure
2.1-1. Basic Architecture of a Partitionable System
2.1-2. A Partitionable System with Special Bus for Control Messages
2.2-1. Example of an Isolated Subsystem
2.2-2. Subsystems Linked by a Shared Resource Module
2.2-3. Example of Subsystems Linked for Distributed Processing
2.2-4. Example of Subsystems Linked for Array Processing
3.1-1. Crossbar Partitioning Network
3.2-1. Permutation Network Used as a Partitioning Network
4-1. Examples of Banyans
4.2-1. Example of Priority Vie
4.3-1. Banyan Synthesis
4.4-1. Setup Algorithm
4.4-2. Search Algorithm
5.1-1. Base and Apex Distances in an L-Level Banyan
6.2-1. Synthesis of an SW Banyan
6.2-2. Examples of SW Banyans
6.2-3. Synthesis of an SW Banyan from Smaller Component SW Banyans
7.1-1. Examples of CC Banyans
7.2-1. Conceptualization of Minimum Circular Distance
7.2-2. Apex Distances for CC Banyan in Figure 7.1-1a
8.1-1. Redrawn Version of SW Banyan in Figure 6.2-2g
8.1-2. Redrawn Version of CC Banyan in Figure 7.1-1b
8.1-3. Set-Up Rule Modification
8.2-1. Average Layers Required for SW Banyans Using Far Apex Selection Rule and Standard Set-Up Rule
8.2-2. Average Layers Required for SW Banyans Using Near Apex Selection Rule and Modified Set-Up Rule
8.2-3. Average Layers Required for CC Banyans Using Near Apex Selection Rule and Standard Set-Up Rule
8.2-4. Average Layers Required for CC Banyans Using Near Apex Selection Rule and Modified Set-Up Rule
9.2-1. Modification of a Crossbar Partitioning Network to Limit Fanout Requirements
C-1. Relay Used as a Bidirectional Switch
C-2. AND Gate Used as a Unidirectional Switch
C-3. Bidirectional Switch Using Standard TTL Gates
C-4. Bidirectional Switch Using Standard ECL Gates
C-5. Bidirectional Switch for LSI Using I2L Gates
Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
BANYAN NETWORKS
FOR PARTITIONING MULTIPROCESSOR SYSTEMS
By
Louis Rodney Goke
June, 1976
Chairman: Gerald J. Lipovski
Major Department: Electrical Engineering
There is a strong and growing need for switching structures suitable for interconnecting numerous processors and other resource modules in large, general purpose computing systems. For this purpose, "banyan" network structures are defined and analyzed with the use of graph theory, and their cost-performance characteristics are compared with those of alternative networks.
Techniques are proposed for utilizing banyan networks in large, general purpose, partitionable systems containing numerous microprocessors or other resources. A banyan network used in this manner can partition system resources into a wide variety of task-oriented subsystems and, when necessary, can be multiplexed to realize any possible partition. Techniques are also presented for constructing banyan networks in modular form and for controlling them in a rapid and potentially fault-tolerant manner using distributed logic in the networks themselves.
Banyan partitioning networks are shown to have significant advantages over alternative crossbar, or multiple-bus, structures for use in large systems. It is shown that banyan cost functions tend to grow more slowly with network size and that banyan networks can be expanded without limit using fixed-fanout devices. Statistical simulation results are presented indicating that banyans have a potential cost-performance advantage over large crossbar-based partitioning networks and that this advantage tends to increase with network size.
Graph theory and APL vector operations are used in characterizing theoretical banyan properties. Various subclasses of banyans are defined, providing a taxonomy of network structures and permitting the derivation of additional useful properties. The analysis is oriented towards the use of banyans as partitioning networks, but it is noted that a variety of networks proposed previously for other purposes are structurally equivalent to special cases of banyans, suggesting that the theory of banyan graphs could have much broader applications.
SECTION 1
INTRODUCTION
1.1 The Trend Towards Numerous-Module Systems
There is reason to believe that large data processing systems of the future will tend more and more to contain numerous small resource modules instead of a few large ones. For example, a computing facility requiring a large amount of processing power might contain a number of miniprocessors or microprocessors instead of a single large processor. Similarly, its primary memory might be provided by numerous low capacity modules rather than a few high capacity modules.
It has long been recognized that modular multiprocessor systems have major potential advantages over single processors in the areas of throughput, reliability, availability, and expandability. Potential throughput advantages are obvious in applications which lend themselves to parallel processing. By automatically reassigning tasks so as to bypass faulty modules, excellent system reliability and availability can be achieved, even if individual module failures are common. Furthermore, it is possible to expand the capacity of a highly modular system quickly and efficiently by simply adding more modules.
Economically, the attractiveness of multiple-processor architectures has grown with progress in integrated circuit technology. With early component technologies, large processors tended to provide the most computing power for the money, and the high costs of processors tended to make numerous-processor systems prohibitively expensive. Accordingly, early multiprocessor systems tended to contain small numbers of processors and generally were constructed only as research projects or for special applications where adequate performance could not be obtained with conventional machines (Comptre, 74).
When MSI technology and volume production made miniprocessors available for several thousand dollars each, systems with multiple miniprocessors began to be viewed as alternatives to large, single-processor, time-shared systems. These miniprocessors found considerable use as elements of geographically distributed, special purpose systems, and efforts were undertaken to develop reconfigurable, general purpose, multi-miniprocessor systems at two major universities (Baskin et al., 69, 72; Wulf and Bell, 72). Simple cost-effectiveness measures published recently by Bhandarkar and Juliussen (75) indicate that multi-miniprocessor systems with up to several tens of processors could have cost-effectiveness advantages over comparable single-processor computers.
More recently, the availability of extremely low cost LSI microprocessors has accelerated interest in multiple-processor architectures, and the trend is expected to continue (Searle and Freberg, 75). Microprocessors became recognized for their excellent cost-performance potential shortly after they were introduced (Schultz et al., 73), and improvements in both cost and performance have progressed rapidly. General purpose processors are now available for several tens of dollars, and substantial further reductions are projected. The excellent cost-effectiveness of these processors suggests that multiple-microprocessor systems are destined to replace large single-processor computers in a number of applications.
The cost, size, and power requirements of microprocessors are now low enough that systems with hundreds or even thousands of processors are reasonable to consider. In fact, a general purpose computer containing up to 512 microprocessors already is being marketed by one manufacturer (Frank, 75).
The trend towards numerous small modules is not limited to processors. Advancing LSI technologies are now challenging core technology for primary memory applications. MOS/LSI memories are beginning to compete with core memories in speed and cost; and faster, but more costly, bipolar memories are finding applications in high-performance machines such as the Texas Instruments Advanced Scientific Computer. Before long, charge coupled devices may be widely used where higher capacity and lower speed are required. Unlike core technology, the newer LSI semiconductor technologies are well suited to the fabrication of numerous small modules.
Clearly, strong incentives now exist for developing general purpose multiprocessor systems containing large numbers of resource modules. Potentially, at least, such systems could have significant advantages over large single-processor computers in the areas of throughput, reliability, availability, expandability, and cost.
1.2 Problems with Previous Interconnection Schemes
Although there are good reasons for developing general purpose multiprocessors with numerous modules, most previously used interconnection schemes are practical only for small or specialized systems. Problems with module interconnection schemes used previously will be discussed briefly in this section. More detailed surveys of these schemes have been published by Comptre (74) and by Searle and Freberg (75).
Crossbar switching structures can support high data rates but are not practical for interconnecting large numbers of modules. The number of "contacts", or switching devices, needed for a crossbar increases with the square of the number of modules connected to it, making the crossbar prohibitively expensive for very large systems. Since the fanout of switching devices in a crossbar increases linearly with the number of resource modules, this too can be a serious problem in large systems, especially if expandability is not to be limited.
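The growth rates described above can be made concrete with a short calculation. The following is only a sketch under a simplifying assumption not stated in the text: a square crossbar with one contact at every crosspoint, so that N modules need exactly N × N contacts and each line drives N contacts. The function names are hypothetical.

```python
def crossbar_contacts(num_modules: int) -> int:
    """Contacts in an assumed square crossbar: one per crosspoint."""
    return num_modules * num_modules


def crossbar_fanout(num_modules: int) -> int:
    """Contacts attached to one line of the assumed square crossbar."""
    return num_modules


# Doubling the number of modules quadruples the contacts and doubles the fanout.
for n in (16, 64, 256, 1024):
    print(f"{n:5d} modules: {crossbar_contacts(n):8d} contacts, "
          f"fanout {crossbar_fanout(n)}")
```

Under this assumption a 1024-module crossbar needs over a million contacts, which illustrates why the quadratic term dominates cost in very large systems.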
A single time-shared bus can provide flexible, inexpensive communication among a small number of modules, but bus contention problems make this approach impractical for large systems. As the number of modules on the bus increases, bus utilization increases, causing the resource modules to waste more and more of their time waiting for a non-busy bus.
Multiple time-shared busses can be used to alleviate bus contention problems, but this configuration has problems similar to those of a crossbar. In fact, the switching devices which enable each module to be connected with any bus are arranged in a crossbar configuration. Since the maximum data rate that can be handled by a bus is fixed, the number of busses required grows proportionally with the total number of modules, and the number of switching devices required grows as the square of this number.
Both single and multiple time-shared busses have fanout problems in large systems, because each module on a bus must be capable of driving all other modules attached to that bus. Thus, as is the case with a crossbar, fanout requirements grow linearly with the number of modules. In a very large system, fanout limitations can be overcome by dividing each bus into segments interfaced by bidirectional amplifiers, but this further increases network cost and increases the time required for signals to propagate across a bus.
"Multiport" systems, in effect, use crossbar structures to interconnect various classes of modules, and hence, they have cost and fanout problems like those of crossbars.
A number of vector and array organizations have been devised in which each resource module (usually containing a processor with memory) can communicate directly with only a fixed number of "nearest neighbors." These organizations generally can be extended to very large sizes without severe cost or fanout problems but are not well suited to general purpose use. Since each module can communicate directly with only a few neighbors, considerable overhead is required whenever logical data-flow patterns do not correspond closely with the hardware interconnection pattern. It has been found, generally, that efficient software is difficult to produce for these machines unless the application "fits" the architecture. Also, when a module fails, there may be no efficient way for another module to take its place, since the substitute module is not likely to have the same neighbors as the module that failed. The ILLIAC IV (Barnes et al., 68) is typical of this kind of organization.
Various distributed architectures have been used successfully in dedicated-function applications but are not well suited to large-scale, general purpose use. Efficiency in a large distributed system usually derives from the fact that each module or subsystem performs a specialized function and needs to communicate with other modules or subsystems only in very limited ways known at design time. This, of course, is not feasible in a general purpose system, which must be designed for a variety of applications not necessarily known at design time.
Thus, interconnection schemes used previously in multiple-processor systems generally are undesirable for general-purpose systems with very large numbers of modules, because their cost and fanout functions grow too rapidly with system size, because performance degrades with system size, or because they perform well only in specialized applications.
1.3 Banyan Partitioning Networks
A class of connecting networks suitable for interconnecting large numbers of resource modules in a general purpose, multiple-processor system will be defined and analyzed in this dissertation. These networks, called banyans, will be analyzed for their ability to partition the resources of a modular system into task-oriented subsystems. The work presented here concentrates on the use of banyans as partitioning networks, because this mode of operation is particularly applicable to large, general purpose, multiple-processor systems. Banyan networks also can be used in other ways, but non-partitioning applications are beyond the scope of this dissertation and will be discussed only in relating this to previous work.
Banyan partitioning networks offer a great deal of flexibility for general purpose use and have major practical advantages for use in large systems. They can economically partition the resources of large modular systems into a wide variety of subsystems. Any possible partition can be realized by paralleling several networks or by multiplexing a single network in a manner to be described later. Banyans are potentially much more economical than crossbar-based structures for large systems, because their "cost" functions increase more slowly with system size and also because they have easily satisfied fanout requirements that are independent of system size. Results will be given indicating that a cost-performance advantage over crossbar-based structures can be achieved for large systems and that a crossbar structure actually can be considered a non-optimal special case of a banyan structure. Propagation delays through a banyan network grow only logarithmically with system size, and high intra-subsystem data transfer rates can be sustained regardless of the number of subsystems realized by the network. Banyan advantages also include fail-soft capabilities, isolation of independent jobs, and rapid control algorithms which can be performed largely by distributed logic in the network itself.
A graph theoretic analysis of banyan structures will be presented. Useful theoretical results will be derived concerning the structure, cost, and performance of different classes of banyan networks. The analysis is oriented towards the use of banyans as partitioning networks, but it is expected that the theory of banyan graphs could have much broader significance. Structures graphically equivalent or similar to restricted classes of banyans have been proposed for a variety of data manipulation functions, including sorting, shifting, and permuting; so banyan theory potentially could be applied in these areas and could tie together a number of previously unrelated works.
For the reader's convenience, all proofs and other tedious theoretical material will be separated from the main text and presented in an appendix.
Simulation results will be presented concerning certain performance-related issues not resolved analytically. These results, together with those derived theoretically, will be used in assessing the cost-performance potential of banyan partitioning networks.
Most major results presented in this dissertation were published in an earlier paper (Goke and Lipovski, 73).
SECTION 2
PARTITIONING NETWORK CONCEPTS AND REQUIREMENTS
The purpose of this section is to explain how a partitioning network could be used in a general purpose, multiple-processor system and to identify requirements this application would be likely to impose on a partitioning network. A partitionable system architecture will be presented in Section 2.1. The ways in which a system of this type could be applied to different classes of problems will be discussed in Section 2.2. Requirements imposed on a partitioning network by this kind of use will be identified in Section 2.3.
2.1 Architecture of a Partitionable System
The basic architecture of a partitionable data processing system is illustrated in Figure 2.1-1. The system contains a number of resource modules, such as processors, memory modules, mass storage devices, I/O devices, or even complete computer systems. Each module is connected to a partitioning network through one or more ports. These ports generally would be bidirectional so that a module could both transmit and receive signals through the same port. For example, an input device might receive requests for data and then transmit the data requested.
The purpose of the partitioning network is to establish necessary communication paths by connecting ports together to form subsystems. For each subsystem, the partitioning network, in effect, provides a separate time-shared bus to which all ports in that subsystem are connected. Thus, the partitioning network partitions the ports of system resource modules into subsystems, and a particular resource module may belong to as many subsystems as it has ports.
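The relationship between ports, modules, and subsystems described above can be sketched as a small data structure. This is an illustrative model only; the module names, the (module, port index) representation, and the helper functions are hypothetical and do not appear in the dissertation.

```python
# Each port is a (module name, port index) pair; each subsystem is the set
# of ports connected to its bus. All names here are hypothetical.
partition = {
    "subsystem_A": {("cpu0", 0), ("mem0", 0), ("disk0", 0)},
    "subsystem_B": {("cpu1", 0), ("mem1", 0), ("disk0", 1)},
}


def is_valid_partition(partition):
    """A port may be connected to at most one subsystem bus."""
    seen = set()
    for ports in partition.values():
        if seen & ports:
            return False
        seen |= ports
    return True


def subsystems_of(module, partition):
    """Subsystems a module belongs to, one per connected port."""
    return sorted(name for name, ports in partition.items()
                  if any(m == module for m, _ in ports))
```

Here the two-port module `disk0` belongs to both subsystems, while each single-port module belongs to exactly one.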
The connections established by the partitioning network typically would be controlled by an operating system or executive and would be modified automatically to accommodate different job needs. The operating system, like any other job, could be executed by a set of resource modules linked as a subsystem.
[Figure 2.1-1. Basic Architecture of a Partitionable System. Labels: Partitioning Network; Resource Modules]
To facilitate operating system functions, some facility for passing system control messages outside the partitioning network would probably be needed. For example, user subsystems might send messages to the operating system in order to request additional resources, to release resources no longer needed, to signal job completion, or to request other changes in system configuration. Similarly, the operating system might direct system configuration changes by sending commands to the partitioning network, and it might send messages to user subsystems in order to control execution and to acknowledge service requests. Control messages such as these would likely be short and infrequent but generally should be transmitted quickly. Communication of this sort could be provided by a single time-shared bus or other simple facility linking together all processor modules and the control inputs of the partitioning network as illustrated in Figure 2.1-2. This facility for passing system control messages would tend to be a critical, but relatively inexpensive, part of a large partitionable system; so the straightforward use of redundant hardware in this facility could prevent it from limiting system reliability and would add little to overall system cost.
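The kinds of control messages listed above could be modeled roughly as follows. This is a hedged sketch rather than a protocol from the dissertation: the message kinds, the `ControlMessage` fields, the `Executive` class, and the "ack"/"denied" replies are all assumptions introduced for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Request(Enum):
    ACQUIRE = auto()    # request additional resources
    RELEASE = auto()    # release resources no longer needed
    JOB_DONE = auto()   # signal job completion


@dataclass
class ControlMessage:
    subsystem: str
    kind: Request
    modules: tuple = ()


class Executive:
    """Toy operating system tracking which modules each subsystem holds."""

    def __init__(self, free_modules):
        self.free = set(free_modules)
        self.assigned = {}  # subsystem name -> set of modules

    def handle(self, msg):
        owned = self.assigned.setdefault(msg.subsystem, set())
        if msg.kind is Request.ACQUIRE:
            wanted = set(msg.modules)
            if not wanted <= self.free:
                return "denied"
            self.free -= wanted
            owned |= wanted
        elif msg.kind is Request.RELEASE:
            released = set(msg.modules) & owned
            owned -= released
            self.free |= released
        elif msg.kind is Request.JOB_DONE:
            self.free |= owned
            del self.assigned[msg.subsystem]
        return "ack"
```

In a real system each accepted request would also cause a command to be sent to the partitioning network; that step is omitted here.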
The architecture presented here has a number of attractive features. By effectively providing a separate bidirectional bus for each subsystem, the partitioning network allows data transfers within each subsystem to take place at high rates and with little delay. Single time-shared busses are now widely used for interconnecting resource modules in small computing systems, and a variety of devices now on the market can be interfaced in this manner.
Security problems in a multiprogramming environment are greatly simplified by the fact that disjoint subsystems readily can be established to execute independent jobs. Since each subsystem bus functions independently from all others, subsystems or sets of interconnected subsystems cannot interfere with each other unless they share one or more resource modules.
[Figure 2.1-2. A Partitionable System with Special Bus for Control Messages. Labels: Partitioning Network; Network Control Commands; Control Bus; Processor Modules; Other Resource Modules]
Excellent potential exists for achieving high system reliability and availability, because each module potentially can be connected directly with any set of other modules by grouping them together as a subsystem. Thus, if a module failed, the partitioning network could connect an equivalent spare module in its place, and the repaired subsystem could continue processing as efficiently as before.
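The spare-substitution idea can be sketched as a simple set operation. The representation of modules as (type, name) pairs, the `spares` pool, and the function name are hypothetical; the sketch assumes that any idle module of the same type is an equivalent replacement.

```python
def substitute_spare(subsystem, failed, spares):
    """Return a repaired module set with `failed` replaced by a spare.

    Modules are (type, name) pairs; `spares` maps a module type to a list
    of idle modules of that type (an assumed bookkeeping structure).
    """
    kind, _ = failed
    if failed not in subsystem or not spares.get(kind):
        raise LookupError(f"cannot replace {failed}")
    replacement = spares[kind].pop()
    return (subsystem - {failed}) | {replacement}
```

The repaired subsystem contains the same number and types of modules as before, which is what allows processing to continue without reprogramming.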
The flexibility of this architecture makes it attractive for general purpose use. As will be shown in Section 2.2, it is suitable for most applications of large computers and, for most purposes, could be programmed in a very straightforward manner.
2.2 Utilization of a Partitionable System
The system architecture described in Section 2.1 is extremely flexible and could be used in many of the application areas currently dominated by large single-processor computers. It allows each job to be executed by a set of resource modules interconnected to emulate a system architecture appropriate for that job. Conventional batch jobs and small real-time jobs could be executed by isolated subsystems, each configured to function as a single-processor computer dedicated to its own job. More demanding jobs encountered in many real-time and near-real-time applications could be handled by sets of subsystems linked via shared resources to form distributed processing networks or parallel processing arrays. Potentially, any of these configurations could coexist with others and could be assembled, disassembled, or modified under operating system control to satisfy changing job requirements. In the remainder of this section, several basic techniques will be discussed for applying a partitionable system to common types of data processing problems.
The simplest way to use a partitionable system is to configure an isolated subsystem for each job. An isolated subsystem is one whose resource modules are functionally independent from those of other subsystems. Except for possible communication with a central operating system, each isolated subsystem would function independently as a separate computer system. Different numbers and types of resource modules could be assigned to different subsystems according to job requirements, and if necessary, modules of a subsystem could be added or deleted by the operating system in response to service calls from the subsystem.
Figure 2.2-1 shows a typical isolated subsystem which might be used to execute a small batch job. The subsystem bus connects the necessary resource modules so that they can function together as a small single-processor computer. Although the resource modules are functionally independent from those of other subsystems, some of them may be physically combined with those of other subsystems. For example, the module labeled "I/O Files" might actually be one of several ports to a larger file management system, which makes its ports appear to be independent by restricting the files accessible through each port.
[Figure 2.2-1. Example of an Isolated Subsystem. Labels: Subsystem Bus; I/O Files; Processor with Memory; Supplemental Memory]
An isolated subsystem could be used effectively for any job not requiring a large amount of processing power. Batch programs and small real-time programs could be written in a conventional manner and could be executed by single-processor subsystems so that, in effect, each program would have its own small computer. High system throughput could be achieved by executing many jobs concurrently with different subsystems.
Multiple processors could be interconnected for jobs too demanding for a single processor. This might be the case in certain real-time or near-real-time applications or for very large batch jobs that would take too long to execute otherwise. In some cases, two or more processors could be assigned to an isolated subsystem, but severe bus contention problems would be likely if many processors shared the same subsystem bus.
[Figure 2.2-2. Subsystems Linked by a Shared Resource Module. Labels: First Subsystem Bus; Second Subsystem Bus; Shared Resource Module; Modules Dedicated to First Subsystem; Modules Dedicated to Second Subsystem]
A more practical approach for unusually demanding jobs would be to link two or more subsystems together via shared resource modules as illustrated in Figure 2.2-2. A number of multiport modules might be included in a system for this purpose. For example, a number of multiport memory modules with arbitration hardware could be provided. To increase versatility, the memory areas accessible through each port could be limited by registers associated with the port so that each port could be used as a separate memory module when shared memory was not required.
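The per-port limiting registers can be illustrated with a toy model. This is a sketch only: the (base, limit) register pair is an assumed realization of "registers associated with the port", arbitration hardware is not modeled, and the class and method names are hypothetical.

```python
class MultiportMemory:
    """Toy multiport memory; each port sees only the window set by its registers."""

    def __init__(self, size, windows):
        self.cells = [0] * size
        # port -> (base, limit): hypothetical per-port limiting registers
        self.windows = dict(windows)

    def _translate(self, port, offset):
        base, limit = self.windows[port]
        if not 0 <= offset < limit:
            raise IndexError(f"port {port}: offset {offset} outside window")
        return base + offset

    def write(self, port, offset, value):
        self.cells[self._translate(port, offset)] = value

    def read(self, port, offset):
        return self.cells[self._translate(port, offset)]
```

With disjoint windows such as `{0: (0, 4), 1: (4, 4)}` the two ports behave like separate memory modules; with identical windows such as `{0: (0, 8), 1: (0, 8)}` they expose the same shared memory.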
Each set of linked subsystems could be configured to function as a distributed network or parallel processing array suited to the needs of its associated job. This technique would enable application programmers to run unusually demanding jobs on partitionable systems, and also could provide a fast, economical means for system designers to emulate special purpose distributed networks and array processors prior to construction.
Figure 2.2-3 illustrates how a set of subsystems might be linked for distributed processing in a simple near-real-time application. The subsystems are cascaded in assembly line fashion with each subsystem performing a different kind of processing function. Thus, throughput capacity of the distributed system is enhanced by pipelining. A data-flow diagram of the distributed system is shown in Figure 2.2-3a, and the organization of resource modules into linked subsystems is shown in Figure 2.2-3b. The first subsystem collects data from a real-time source, does some initial processing on it, and writes the results into a buffer in shared memory. The second subsystem reads data from this buffer, applies a signal processing algorithm, and writes its results into a second buffer. The third subsystem takes data from the second buffer, arranges it into the desired output format, and writes it to a display device.
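The three-stage assembly line described above can be imitated in software, with threads standing in for subsystems and queues standing in for buffers in shared memory. This is a minimal sketch under those assumptions; the three processing functions are arbitrary placeholders, not the algorithms of any real application.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the data stream


def stage(process, inbox, outbox):
    """One subsystem: read from a buffer, process, write to the next buffer."""
    while (item := inbox.get()) is not DONE:
        outbox.put(process(item))
    outbox.put(DONE)


def run_pipeline(samples):
    source, buf1, buf2, display = (queue.Queue() for _ in range(4))
    stages = [
        (lambda x: x - 1.0, source, buf1),      # data collection (placeholder)
        (lambda x: x * x, buf1, buf2),          # signal processing (placeholder)
        (lambda x: f"{x:.2f}", buf2, display),  # display formatting (placeholder)
    ]
    threads = [threading.Thread(target=stage, args=s) for s in stages]
    for t in threads:
        t.start()
    for s in samples:
        source.put(s)
    source.put(DONE)
    for t in threads:
        t.join()
    out = []
    while (item := display.get()) is not DONE:
        out.append(item)
    return out
```

Each stage sees only its own input and output buffers, so it can be written as if it were a conventional program reading from an input device and writing to an output device.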
[Figure 2.2-3. Example of Subsystems Linked for Distributed Processing. Panels: a) Data-Flow Diagram; b) Resource Module Interconnections Established by Partitioning Network]
Distributed configurations like this, which process data in assembly line fashion, should be relatively easy to design and program. Data-flow diagrams for more complex systems could be designed in a systematic, top-down manner much as sequential programs are in certain "structured programming" methodologies. Once a data-flow structure has been fully defined, each processing node could be implemented with a single-processor subsystem programmed like a conventional computer. For example, a subsystem which continually reads data from one buffer and writes its results into another could be programmed as if it were a conventional computer reading data from an input device and writing results to an output device. Any mutual exclusion protocols required for accessing shared buffers could be built into I/O routines and need not necessarily concern the application programmer. These basic design principles could be applied even to complex systems in which some processing nodes and buffers might have multiple inputs and outputs.
The practicability of distributed-system design is evidenced by the growing use of distributed processing in specialized data processing systems. Continuation of this trend, no doubt, will lead to improved design techniques and to more widespread familiarity with distributed systems among designers and programmers.
Subsystems also could be linked for parallel processing of array data. With this approach, a number of subsystems would be interconnected in a regular pattern, and each subsystem would perform the same processing function on a different data stream.
One of many possible array processing configurations is shown in Figure 2.24. In this example, four subsystems are linked in a rectangular pattern for multiplying a pair of large matrices. A (2xI) by J matrix A is to be multiplied by a J by (2xK) matrix B to obtain a (2xI) by (2xK) matrix C. The matrices A and B initially are segmented and loaded into the shared memory modules as shown. Each subsystem then computes one
[Figure 2.24. Example of Subsystems Linked for Array Processing. Subsystem Busses No. 1 through No. 4 each serve a processor with memory. Shared memory modules hold the segments A[1~I;1~J], A[I+1~2xI;1~J], B[1~J;1~K], and B[1~J;K+1~2xK]; the quadrants C[1~I;1~K], C[1~I;K+1~2xK], C[I+1~2xI;1~K], and C[I+1~2xI;K+1~2xK] are stored in the memory modules of the four subsystems.]
quadrant of the resulting matrix C and stores it in the memory module shown. By working in parallel, the four processors should be able to perform the matrix multiplication about four times as fast as a single processor.
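The quadrant decomposition can be checked with a short sketch. It assumes plain Python lists for the matrices and uses hypothetical names (`matmul`, `quadrant_product`); the four quadrant products are computed one after another here, whereas in the configuration described each would be computed by its own subsystem in parallel.

```python
def matmul(a, b):
    """Ordinary matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def quadrant_product(a, b, i_rows, k_cols):
    """Multiply a (2I x J) matrix by a (J x 2K) matrix one quadrant at a time."""
    a_parts = [a[:i_rows], a[i_rows:]]              # A[1~I], A[I+1~2xI]
    b_parts = [[row[:k_cols] for row in b],         # B[;1~K]
               [row[k_cols:] for row in b]]         # B[;K+1~2xK]
    # Each (r, c) pair stands for the work of one subsystem.
    quads = {(r, c): matmul(a_parts[r], b_parts[c])
             for r in (0, 1) for c in (0, 1)}
    top = [left + right for left, right in zip(quads[0, 0], quads[0, 1])]
    bottom = [left + right for left, right in zip(quads[1, 0], quads[1, 1])]
    return top + bottom


a = [[1, 2], [3, 4]]   # I = 1, J = 2
b = [[5, 6], [7, 8]]   # K = 1
c = quadrant_product(a, b, 1, 1)
```

Each quadrant needs only one segment of A and one segment of B, which is why each subsystem in the figure is linked to just two shared memory modules.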
A number of fixed-configuration array processors have been proposed but generally have been built only as research projects or for specialized applications.1 There is no doubt that these machines can perform certain kinds of computations at extremely high speed, but efficient software for most of them has been notoriously difficult to produce except for specialized applications where the problem "fits" the machine. A frequent source of programming difficulty and software inefficiency is the fact that processing elements are interconnected in a fixed or nearly fixed configuration, which may not have the best dimensions or interconnection pattern for a given problem. In such situations, one must either use the available processing array inefficiently or employ complicated software techniques to "fit" the problem to the machine.
Parallel array processing in a partitionable system similarly could provide very high speed array computation but need not involve some of the software problems associated with existing array processors. A processing array in a partitionable system could be configured with whatever interconnection pattern and dimensions were needed for a particular problem. Consequently, relatively straightforward software could be executed with high efficiency by array processing subsystems.
In a partitionable system, parallel array processing and other forms of processing could be used together easily and efficiently for any job with varied computational requirements. For example, a parallel processing array for signal processing might be used as a processing node of a larger distributed system configured for a real-time application; or a large batch job might involve several steps, each of which would run with a different configuration of subsystems. Such flexibility is desirable because there are many applications in which parallel array processing would be useful only for part of the required computations.
1A number of existing array processors have been surveyed by Comptre (74).
In almost any practical situation, an operating system would be necessary for effective utilization of a partitionable system. The kinds of functions performed by such an operating system would be similar to those performed by a conventional multiprogramming operating system except that partitioning network control functions would be performed instead of time-sharing overhead functions. The overhead and complexity normally required for swapping tasks in and out of execution on a single processor would be unnecessary in many situations, because each job or task could be run to completion on its own subsystem or set of subsystems. The software complexity and overhead required for controlling a partitioning network would depend on the kind of network used. As will be shown later, much of the work required for controlling a banyan partitioning network can be performed very rapidly by distributed logic in the network itself.
2.3 Requirements of a Partitioning Network
Partitionable systems have a number of attractive features for a wide range of large-computer applications, but the practicability of this approach depends on the availability of a suitable class of partitioning networks. In this section, we will identify properties that a class of partitioning networks should have for use in partitionable systems.
Cost. Reasonable cost is perhaps the most obvious practical requirement. Generally, a network will be economically practical so long as its cost per resource module port is below some application-dependent limit. Since network cost per port increases with the number of ports, this effectively limits the maximum network size that is economically practical. Thus, in order for a class of networks to be practical for large systems, its cost function should grow slowly with system size. Actual network cost, of course, depends on many factors, including component technology and packaging techniques. For comparing network structures, however, it is common practice to use the number of "contacts", or switching devices, required as a measure of network cost.
Fanout. All major families of electronic switching devices have fanout limitations; that is, each device is capable of driving only a limited number of similar devices. Thus, the number of other devices to which each switching device in a network is connected should grow very slowly, or not at all, with system size in order for a network structure to be practical for large systems.
Bidirectional switching. Data paths, and hence the devices used for switching them, must be bidirectional in a partitioning network so that data can be transferred from any resource module to any other resource module connected to the same subsystem bus. Some bidirectional
electronic switching devices suitable for this purpose have been described by Vice et al. (73). Some additional bidirectional switching circuits using standard TTL and ECL gates are described in Appendix C.
Priority hardware. Whenever several devices communicate over a bidirectional, time-shared bus, some mechanism is needed to prevent more than one device from trying to transmit on the same line at the same time. Priority hardware built into a bus is probably the fastest and most desirable mechanism for arbitrating simultaneous requests for bus use. For this reason, priority hardware is likely to be needed in a partitioning network to arbitrate conflicting requests for use of subsystem busses. The ease with which suitable priority hardware can be built into a partitioning network is, thus, an important consideration.
Speed requirements. There are three basic response times of interest in a partitioning network: the time required to rearrange connections (probably one subsystem at a time), the time required for a resource module to gain control of its subsystem's bus, and the rate at which a module can transfer data over this bus after obtaining control.
The time required to rearrange connections in a partitioning network depends largely on the complexity of the control algorithms involved and the extent to which these algorithms can be performed by hardware in the network itself, as opposed to sequential execution in an external processor. In a partitionable system used as described in Section 2.2, a subsystem generally would exist long enough for numerous messages to be transferred within that subsystem. Consequently, the time required to establish a new subsystem might be substantially greater than a typical message transfer time without significantly degrading overall performance,
especially if new subsystems can be established without disrupting communication in existing subsystems. In a large system with many subsystems, however, frequent reconfiguration may be necessary even if the average subsystem life is long. In this case, it may be necessary that new subsystems be connected very quickly or that they be connected without disrupting communication in existing subsystems or both. This is an application-dependent issue. Fast, simple control algorithms clearly are more desirable than slow, complicated ones, but the importance of this depends largely on the frequency with which the system must be reconfigured.
The time required for a resource to request and receive bus control (assuming that the bus is available) depends primarily on the speed of priority hardware used to arbitrate bus control requests. Since this must be done prior to each transmission, propagation delays in the priority hardware can significantly affect the rate at which short messages can be transmitted. In designing partitioning networks for large expandable systems, one must be careful that neither the propagation delay nor the cost of priority hardware grows unreasonably with system size.
The maximum rate at which a resource module can transfer data after gaining control of its subsystem's bus tends to be inversely proportional to the number of bidirectional switches through which the signal propagates on its way through the network. If the network is multiplexed in a manner to be described later, this maximum rate also tends to be inversely proportional to the number of multiplexed "layers". The maximum rate referred to here is the limit imposed by the partitioning network. The rate at which data is actually transferred, of course, is
limited by the resource modules also. The maximum data transfer rate allowed by the partitioning network should be high enough not to unduly slow down the resource modules. For large expandable systems, the propagation delay and the number of multiplexed layers (if used) should grow slowly with system size.
Fault tolerance. A major advantage of partitionable systems is their potential for fault tolerance. If this potential is to be realized, a system must be able to tolerate hardware failures in its partitioning network as well as in its resource modules. Thus, a partitioning network should be able to continue functioning to at least some extent in spite of limited hardware failures. It is desirable that there be more than one possible way in which to connect any given subsystem, because otherwise a single failure in the network could make certain subsystems impossible to connect, even if no demands are placed on the network by other subsystems. It also should be possible to employ control algorithms adaptable enough to bypass faulty portions of the network when establishing new subsystems.
Modularity and expandability. Modularity and expandability also are advantages of partitionable systems, and it is desirable that a partitioning network share these properties. To minimize production cost and to facilitate maintenance, it should be possible to build a partitioning network by connecting together a number of identical modules, perhaps supplied by a manufacturer as "off-the-shelf" items. It also should be possible to expand a partitioning network such that the old network becomes part of the new one instead of being replaced by it.
Partitioning flexibility. Ideally, we would like for a partitioning network to be able to partition system resource ports into subsystems
in any conceivable way, but complete flexibility in this regard may be unnecessary in practice. Since greater flexibility generally requires greater cost and complexity, it is useful to determine just how much flexibility really is needed in a partitioning network. The ways in which a network actually needs to be able to partition a system depend on the kinds of system resource modules and on the ways in which the system is to be used.
In a practical system, certain kinds of subsystems might be incapable of performing any useful function and, hence, need never exist. For example, a subsystem containing only memories might be unable to function. Similarly, for certain kinds of multiport modules, it may be pointless to ever connect two or more ports of the same module to the same subsystem bus.
In applications requiring only isolated subsystems, such as batch execution of conventional programs, a partitioning network should be able to configure any reasonable subsystem by itself but need not necessarily be able to configure all reasonable combinations of subsystems. If subsystems required for a particular set of jobs cannot all be configured at the same time, then some of the jobs simply can be executed at different times. Thus, jobs can be scheduled to avoid conflicting demands on a partitioning network just as they must be scheduled to avoid conflicting requirements for other system resources. Rescheduling jobs because of partitioning network limitations might result in less efficient resource module utilization, but would allow all jobs to execute eventually, so long as the network could configure each subsystem individually. Isolated subsystems only need to coexist sufficiently for efficient utilization of resource modules.
Linked subsystems, on the other hand, interact during execution and hence must exist concurrently. Consequently, greater partitioning flexibility is likely to be required if a system is to accommodate large sets of linked subsystems. Additional flexibility for configuring arbitrary sets of linked subsystems can be achieved either by inherent properties of a network structure or by multiplexing or paralleling certain less flexible network structures in a manner that will be described later.
SECTION 3
SOME ALTERNATE REALIZATIONS OF PARTITIONING NETWORKS
Two types of partitioning networks, based on crossbars and permutation networks, respectively, will be described in this section. These networks are presented for their conceptual significance in relating partitioning networks to other structures and also to provide a basis of comparison for the banyan networks described in following sections. As will be explained, the networks described in this chapter have certain characteristics which tend to make them impractical for very large systems.
3.1 Crossbar Networks
The network shown in Figure 3.11a is perhaps the most straightforward partitioning structure.1 It contains a number of busses, which are linked with all of the resource modules by bidirectional switching devices. Partitioning is accomplished by assigning one bus to each subsystem and connecting resources to them accordingly. For N resource modules, up to $\lfloor N/2 \rfloor$ busses may be required since this is the maximum number of nontrivial subsystems possible at any one time. A subsystem with only one resource module is trivial in the sense that it need not use the partitioning network for intra-subsystem communication.
Figure 3.11b is a graph representing the structure of this network. This representation of network structure is similar to that used by Benes (62). It uses vertices to represent data busses, or links, and uses edges to represent the switching devices, or "contacts", connecting them. Graph representations of this kind will be used with other structures later.
The network shown is represented by a bipartite graph with an edge connecting each bus with every resource module. Graphically, this structure is equivalent to a crossbar switch.
Crossbar partitioning networks have a simple, regular structure. They also have potentially low propagation delay for data transmission since data must propagate through only two switches regardless of network size. Propagation delay for priority hardware, however, would grow logarithmically with N, assuming the use of methods similar to those described by Foster (68). Although faster priority hardware is possible, substantial improvement would most likely be prohibitively expensive in very large systems.
1This structure is equivalent to the "multiple, systemwide, functionally and physically nondedicated busses" described by Thurber et al. (72).
[Figure 3.11. Crossbar Partitioning Network: a) Block Diagram (data busses, switching devices, resource modules); b) Graph Representation (data busses, resource modules)]
The principal drawbacks of large crossbar networks are their cost and fanout requirements. A network with $\lfloor N/2 \rfloor$ busses for partitioning N resource modules would require $N \times \lfloor N/2 \rfloor$ switches. Thus, the cost in terms of switching devices required tends to grow as the square of N. Since each switching device in this network is connected to $N - 1$ similar devices on the same bus, the fanout required of the devices tends to grow linearly with N. Similarly, each resource module port is connected to $\lfloor N/2 \rfloor$ switches; so the fanout capability required of resource modules grows linearly with N also.
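The growth of these quantities is easy to tabulate. The sketch below assumes the counts as read from the paragraph above: floor(N/2) busses, one switch per bus per resource module, N - 1 neighboring switches on each bus, and floor(N/2) switches on each resource module port. The function name `crossbar_cost` is hypothetical.

```python
def crossbar_cost(n):
    """Switch count and fanout figures for an n-module crossbar partitioner."""
    buses = n // 2                      # at most floor(N/2) nontrivial subsystems
    return {
        "buses": buses,
        "switches": n * buses,          # one switch per (module, bus) pair
        "switch_fanout": n - 1,         # other switches sharing the same bus
        "port_fanout": buses,           # switches hanging on one module port
    }


for n in (8, 64, 512):
    print(n, crossbar_cost(n))
```

Doubling N roughly quadruples the switch count while both fanout figures double, which is the quadratic cost and linear fanout growth noted in the text.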
3.2 Permutation Networks
It is possible to build a partitioning network from a permutation network by supplying the external links shown in Figure 3.21. A permutation network can connect, in pairs, a set of input terminals to a set of output terminals of equal size so that any desired permutation of inputs onto outputs can be realized. These connections allow transmission in either direction when bidirectional switches are used in the network. In the configuration of Figure 3.21, the network permutes the set of resource modules onto itself, allowing connected subsystems to correspond to the cycles of the permutation. By choosing a permutation with the appropriate cycles, any desired partition can be connected.
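The correspondence between partitions and permutation cycles can be made concrete with a small sketch. The function names (`permutation_for_partition`, `cycles`) and the choice of visiting each subsystem's modules in sorted order are illustrative assumptions; any cyclic ordering of a subsystem's members would serve equally well.

```python
def permutation_for_partition(partition, n):
    """Build a permutation of range(n) whose cycles are the given subsystems."""
    perm = list(range(n))  # unlisted modules map to themselves (trivial subsystems)
    for block in partition:
        members = sorted(block)
        # Link each member to the next one, wrapping around to close the cycle.
        for here, nxt in zip(members, members[1:] + members[:1]):
            perm[here] = nxt
    return perm


def cycles(perm):
    """Decompose a permutation into its cycles, smallest starting point first."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, v = [], start
        while v not in seen:
            seen.add(v)
            cycle.append(v)
            v = perm[v]
        result.append(cycle)
    return result


perm = permutation_for_partition([{0, 3, 5}, {1, 2}], 6)
```

Here modules 0, 3, and 5 form one subsystem, modules 1 and 2 form another, and module 4 is left as a one-element cycle, that is, a trivial subsystem.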
This result is theoretically significant because it implies that a minimal partitioning network for N resource modules requires no more switching devices than an N-input, N-output permutation network. It has been shown that when N is a power of 2, such a permutation network can be built with as few as $4 \times (N \log_2 N - N + 1)$ switching devices (Goldstein and Leibholz, 67; Joel, 68; Waksman, 68). Thus, the cost of this network tends to grow as $N \log_2 N$, a substantial improvement over that of a crossbar for large N. Further, the fanout required of switches in such networks is independent of N. This too is a substantial improvement over crossbar structures.
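A rough numerical comparison shows where the improvement sets in. This sketch rests on two assumptions that should be checked against the cited papers: that a minimal permutation network uses N log2 N - N + 1 two-input, two-output cells, and that each cell is counted as 4 switching devices. The crossbar figure assumes N times floor(N/2) switches. Both function names are hypothetical.

```python
def crossbar_switches(n):
    """Switches in a crossbar partitioner with floor(n/2) busses."""
    return n * (n // 2)


def permutation_switches(n):
    """Assumed switch count for a minimal permutation network, n a power of 2."""
    log2_n = n.bit_length() - 1
    if n < 2 or 1 << log2_n != n:
        raise ValueError("n must be a power of 2")
    # Assumption: n*log2(n) - n + 1 binary cells, 4 switching devices per cell.
    return 4 * (n * log2_n - n + 1)


for n in (8, 64, 1024):
    print(n, crossbar_switches(n), permutation_switches(n))
```

Under these assumptions the crossbar is actually cheaper at N = 8, and the permutation network pulls ahead only as N grows, consistent with the qualification "for large N" in the text.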
The partitioning structure of Figure 3.21 is of limited practical value, however, for several reasons. Propagation delay tends to be excessive for data transmission in large subsystems. A signal in a subsystem bus connecting I resource modules may have to propagate through the network as many as $\lfloor I/2 \rfloor$ times to reach its destination. Each time through, it must propagate through as many as $2 \log_2 N - 1$ switching devices in a minimal-cost permutation network. Network reconfiguration would be hampered by the complexity of control algorithms and by the fact that existing connections may have to be rerouted whenever a new subsystem is added. The difficulty of incorporating priority hardware into this structure also appears to be a serious drawback.
[Figure 3.21. Permutation Network Used as a Partitioning Network]
SECTION 4
BANYANS
Banyan networks, named for the East Indian fig trees of somewhat similar structure, are defined in terms of their graph representations in Definition 1.1.1 A banyan graph is a Hasse diagram of a partial ordering in which there is one and only one path from any base to any apex. A base is defined as any vertex having no arcs incident into it, an apex is any vertex with no arcs incident out from it, and all other vertices are called intermediates. When a banyan is used as a partitioning network, its bases are connected to resource modules, but its apexes and intermediates are internal to the network. Some examples of banyans are shown in Figure 41. We use a directed graph representation because it is useful for specifying the structure and its control algorithms, but the switching devices represented by edges are still bidirectional. Frequently, we will omit the arrowheads from banyan graph diagrams and let it be understood that all arcs point up.
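The defining property can be tested mechanically by counting paths. The sketch below assumes an acyclic graph stored as a dictionary `up` that maps each vertex to the vertices directly above it; the function name `is_banyan` and the four-base example graph are illustrative, not taken from the figures.

```python
from functools import lru_cache


def is_banyan(up):
    """True if every base has exactly one upward path to every apex."""
    vertices = set(up) | {w for ws in up.values() for w in ws}
    has_incoming = {w for ws in up.values() for w in ws}
    bases = vertices - has_incoming                  # no arcs incident into them
    apexes = {v for v in vertices if not up.get(v)}  # no arcs incident out

    @lru_cache(maxsize=None)
    def paths(v, apex):
        # Number of distinct upward paths from v to apex (graph assumed acyclic).
        if v == apex:
            return 1
        return sum(paths(w, apex) for w in up.get(v, ()))

    return all(paths(b, a) == 1 for b in bases for a in apexes)


# Hypothetical example: 4 bases, 4 intermediates, 4 apexes.
up = {
    "b0": ["m0", "m1"], "b1": ["m0", "m1"],
    "b2": ["m2", "m3"], "b3": ["m2", "m3"],
    "m0": ["a0", "a2"], "m1": ["a1", "a3"],
    "m2": ["a0", "a2"], "m3": ["a1", "a3"],
}
broken = dict(up, b0=["m0", "m1", "m2"])  # extra arc gives b0 two paths to a0
```

Requiring exactly one path also rules out redundant arcs that skip a level, since such an arc would create a second path between some base and some apex.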
Useful properties common to all banyan partitioning networks will be presented in Sections 4.1 through 4.5. The general class of banyan networks is quite broad, but it is expected that most useful banyan partitioning networks will be included in the more specialized categories described in Sections 5 through 7.
1All definitions and theorems discussed in this paper appear in Appendix B.
[Figure 41. Examples of Banyans: a) Irregular Banyan; b) L-Level Banyan]
4.1 Tree-Shaped Connections
In a banyan the data path established to connect the resource
modules of any subsystem always forms a tree rooted at some apex. By definition, there is a unique path from each base to each apex. A subsystem bus is formed by selecting an apex and then closing all switches along the path from each desired base to the selected apex. Since each path is unique, the resulting data path forms a tree rooted at the selected apex (Th. 1.1.1). Algorithms for locating eligible apexes and establishing the connections will be presented in Section 4.4.
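The construction of a subsystem bus can be sketched directly from this description. The code assumes the same dictionary representation as before (`up` maps each vertex to the vertices directly above it) and a hypothetical four-base banyan; `path_to_apex` and `connect` are illustrative names, and the depth-first path search stands in for whatever mechanism a real network would use.

```python
def path_to_apex(up, base, apex):
    """Return the upward path from base to apex as a list of vertices, or None."""
    if base == apex:
        return [apex]
    for w in up.get(base, ()):
        rest = path_to_apex(up, w, apex)
        if rest:
            return [base] + rest
    return None


def connect(up, bases, apex):
    """Set of switches (edges) closed to join the given bases at the apex."""
    closed = set()
    for b in bases:
        p = path_to_apex(up, b, apex)
        closed.update(zip(p, p[1:]))
    return closed


# Hypothetical banyan: 4 bases, 4 intermediates, 4 apexes.
up = {
    "b0": ["m0", "m1"], "b1": ["m0", "m1"],
    "b2": ["m2", "m3"], "b3": ["m2", "m3"],
    "m0": ["a0", "a2"], "m1": ["a1", "a3"],
    "m2": ["a0", "a2"], "m3": ["a1", "a3"],
}
bus = connect(up, ["b0", "b1", "b2"], "a0")
touched = {v for edge in bus for v in edge}
```

A connected graph is a tree exactly when it has one more vertex than it has edges, so comparing `len(touched)` with `len(bus) + 1` gives a quick check of the tree property for this example.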
Tree-shaped connections are significant because they lend themselves well to the inclusion of priority hardware and because they can afford low propagation delay with limited fanout. A method for building priority hardware into a banyan partitioning network will be described next. Propagation delay and fanout requirements of certain types of banyans will be discussed in subsequent sections.
4.2 Priority Hardware in Tree-Shaped Data Paths
The need for priority hardware in a partitioning network was described in Section 2.3. The tree-shaped nature of subsystem busses in a banyan network allows suitable priority hardware to be built into the network using the basic approach outlined in this section. Implementation will not be discussed in detail since a number of variations are possible. Although designed for use in banyan partitioning networks, the technique described here is applicable to any tree-shaped data path and is somewhat similar to that described by Foster (68). It allows priority levels to be associated with requests for bus control by resource modules and, in the event of simultaneous requests, grants control to the module with the highest priority request. Various tie-breaking schemes are possible.
In the proposed priority scheme, each resource module desiring control of its tree-shaped subsystem bus transmits a "bus request signal" apexward toward the tree's root along a set of "bus request lines." A bus request signal is an encoded number representing the priority level of the corresponding request for bus control. Modules not desiring bus control transmit bus request signals at priority level zero. Priority hardware in the network compares these signals and sends "request denial signals" to all resource modules except the one to which it grants bus control.
Figure 4.21 illustrates how the priority scheme would function. Since the priority hardware for each subsystem functions independently, only one subsystem bus is illustrated. A, B, C, and D are the bases, or resource module ports, included in this subsystem. X and Y are intermediate vertices used in the corresponding tree-shaped connection, and Z is the apex at its root. Solid lines in the diagram represent bus request lines and dashed lines represent "request denial lines," which are used to convey request denial signals. Numbers beside these lines indicate the signal values they carry in the example.
[Figure 4.21. Example of Priority ...]
To simplify the explanation, we assume that bus request and request denial lines are physically distinct from those used to convey data. These lines are switched by switching devices in the banyan just as are the data lines so that their connection patterns are correspondingly tree-shaped. They differ from the data lines, however, in that they transfer signals in only one direction and interface with priority hardware at each intermediate and apex vertex. Bus request signals originate at bases and propagate only in an apexward direction. Request denial signals are generated by the priority hardware at apex and intermediate vertices and propagate baseward.
Priority hardware at each intermediate vertex, such as X or Y, compares its incoming bus request signals and forwards the maximum on to the vertex above. It also generates a request denial signal, represented by a logical "1", on all but one of the request denial lines below it. The request denial line over which it sends no denial signal, logical "0", corresponds to the incoming bus request signal with highest priority. Ties may be broken in various ways. In the example shown, the rightmost branch with highest request priority is selected. Other tie-breaking schemes are possible, however. For example, a fairer but more complex scheme might be to select the branch whose past requests have been least recently granted.
When an intermediate receives a request denial signal from the vertex above, however, it transmits request denial signals to all
vertices below it in the connection regardless of their request priorities.
An apex, such as Z, functions exactly the same as an intermediate except that it can neither receive request denials from nor transmit bus requests to a vertex above.
In the example of Figure 4.21, bases A, B, and C are making bus requests at priority levels 3, 3, and 2, respectively. Base D does not desire bus control and consequently produces a request signal at level zero. Intermediates X and Y compare their incoming request signals and forward the maximums on to apex Z. The apex compares these requests, sends a "request granted" signal (logical "0") to the branch with highest request priority (X), and sends request denial signals (logical "1") to the other branch (Y). Vertex Y transmits denial signals to both C and D because it receives a denial from Z above. Vertex X may grant the request of either A or B since they have tied for highest priority. In this case, the tie is broken by granting B's request.
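The example can be replayed in a few lines. This sketch models only the logical behavior, not the combinational hardware: `children` lists each vertex's branches from left to right, `arbitrate` is a hypothetical name, and ties are broken in favor of the rightmost branch as in the example.

```python
def arbitrate(children, root, requests):
    """Return {base: denial}, where denial is 0 for the granted base, else 1."""
    forwarded = {}

    def request_up(v):
        # Each vertex forwards the maximum of its incoming request levels.
        kids = children.get(v, [])
        forwarded[v] = max(map(request_up, kids)) if kids else requests[v]
        return forwarded[v]

    denial = {}

    def deny_down(v, denied_from_above):
        kids = children.get(v, [])
        if not kids:
            denial[v] = denied_from_above
            return
        best = max(forwarded[k] for k in kids)
        # Tie-break as in the example: rightmost branch with the highest level.
        chosen = max(i for i, k in enumerate(kids) if forwarded[k] == best)
        for i, k in enumerate(kids):
            # A denial from above overrides the local choice.
            deny_down(k, 1 if denied_from_above or i != chosen else 0)

    request_up(root)
    deny_down(root, 0)
    return denial


children = {"Z": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
result = arbitrate(children, "Z", {"A": 3, "B": 3, "C": 2, "D": 0})
```

For the request levels in the example, `result` assigns a denial of 0 to B and 1 to A, C, and D, matching the outcome described above.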
In explaining this priority scheme, we have assumed the existence of physically separate lines for data, bus request signals, and request denial signals. This allows the priority vie to be performed in parallel by combinational logic with a worst case propagation delay approximately proportional to the longest path length from a base to the apex in the connection. Further, it allows the priority vie for one message transfer to overlap data transmission from the previous one. This could be desirable to achieve high transfer rates for short messages.
For applications in which lower rates are acceptable, however, serial implementations using fewer lines may be more economical. A number of variations are possible. For example, the same lines might
be used for the priority vie as for data transfers. During a priority vie, the bus request and request denial lines would function as described. Once the new bus master has been selected, the network, or at least that part of it used by a given subsystem, would change its operating mode and would allow these lines to function as a bidirectional data path during the message transfer.
Another possibility is to perform the priority vie sequentially in two phases. First, the bus request signals would be transmitted apexward as described. Actual transmission of these signals could be either serial or parallel. During this phase, a small register is set in each intermediate and apex vertex to record the branch from which the highest priority request was received. These registers then contain sufficient information to control routing of the request denial signals. For the second phase, the operating mode of the priority hardware is changed and request denial signals are propagated baseward using one of the lines previously used for the bus request signals.
4.3 Synthesizing Large Banyans from Smaller Ones
Large banyan networks can be synthesized recursively from smaller ones. Suppose that one has available a number of small banyan networks, perhaps supplied by a manufacturer as standard components, and one wishes to synthesize a larger network. This can be done as illustrated in Figure 4.31a by connecting the apexes of some banyans to the bases of others.
The interconnections of these banyans can be represented by a graph, as illustrated in Figure 4.31b. In this graph, called an interconnection graph, each vertex represents a banyan network. An arc from any vertex V1 to another vertex V2 means that one apex of banyan V1 is directly connected to one base of banyan V2. We assume that if there are any arcs incident into a vertex, then the corresponding banyan has exactly one base for each incident arc. Similarly, the number of apexes equals the number of arcs incident out from the corresponding vertex unless there are none. When there are no arcs incident into a vertex, the bases of the corresponding banyan become the bases of the synthesized network. Similarly, the apexes of the synthesized network are those of the component banyans with no arcs incident out.
Theorem 1.2.3 states that when banyan networks, called the component banyans, are interconnected as described, the resulting synthesized network will be a banyan if and only if the corresponding interconnection graph is a banyan. This implies that once one or more banyan structures are known, these structures can be recursively expanded to arbitrarily large sizes. Using this principle, one might construct a large banyan network by systematically interconnecting a number of smaller component banyan networks in a pattern which is itself a banyan. Suitable component banyans could be manufactured as standardized modules.
[Figure 4.31. Banyan Synthesis: a) Synthesized Network; b) Interconnection Graph]
The SW structure, discussed later, is characterized by applying this kind of recursive expansion to a crossbar, which is one of the simplest banyan structures.
4.4 Control of Connections
The tree-shaped subsystem connections in a banyan network can be established very rapidly and in a potentially fault-tolerant manner using distributed control hardware within the network. Control is accomplished by means of a setup algorithm, which establishes a subsystem connection, and a search algorithm, which locates eligible apexes. In setting up the first subsystem, any apex may be used as the root of its tree-shaped subsystem bus. Prior to setting up each additional subsystem, however, a search algorithm must be employed to select an apex such that the new connection will not interfere with those already existing.
A two-step setup algorithm is illustrated in Figure 4.41 and is justified theoretically in Theorem 1.3.1. Setup is facilitated by a single control line provided in each link of the network. First, a "one" signal is broadcast baseward from the selected apex over the control line, as illustrated in Figure 4.41a. The signal fans baseward at each vertex so that the "one" propagates to all bases. This signal sets a flip-flop in each intermediate and apex through which it passes.
In the second step, "ones" are broadcast apexward from each base in the desired subsystem, as illustrated in Figure 4.41b. In this step, the signal is OR'ed apexward at each vertex. As illustrated in Figure 4.41c, the desired connection is made by closing every switch that receives this signal from below and has a set flip-flop in the adjacent vertex above. These are the links through which control signals propagated in both steps one and two.
As described, this setup algorithm would require two steps but only one control line in each link. Unlike the data lines, this control line is always connected between vertices and does not require a bidirectional switch for each edge of the graph. Switching for the control line occurs at the vertices, where the signal is either OR'ed up or OR'ed down. With the use of two control lines rather than one, the two steps of this algorithm could be combined into one, eliminating the need for the flip-flop at each intermediate and apex.
Figure 4.4-1. Set-Up Algorithm: a) Step 1; b) Step 2; c) Final Connection.
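The two-step setup procedure described above can be sketched in software as follows. This is only an illustrative simulation, not the hardware design: the names `example_banyan`, `reachable`, and `set_up` are hypothetical, and the small test graph (a two-level network with four vertices per level) is an assumption chosen because it has exactly one path from each base to each apex.

```python
from collections import deque

def example_banyan(levels):
    """Hypothetical test graph: maps each vertex (level, index) to the vertices directly above it."""
    n = 2 ** levels
    return {(i, j): [] if i == levels else [(i + 1, j), (i + 1, j ^ (1 << i))]
            for i in range(levels + 1) for j in range(n)}

def reachable(starts, neighbours):
    """All vertices a control signal reaches when it fans out along `neighbours`."""
    seen, queue = set(starts), deque(starts)
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def set_up(up, apex, bases):
    """Return the links (lower vertex, upper vertex) whose switches are closed."""
    down = {v: [] for v in up}
    for u, above in up.items():
        for v in above:
            down[v].append(u)
    # Step 1: a "one" fans baseward from the apex and sets a flip-flop in
    # every intermediate and apex it passes (bases have nothing below them).
    flip_flops = {v for v in reachable([apex], down) if down[v]}
    # Step 2: "ones" from the selected bases are OR'ed apexward.
    signalled = reachable(bases, up)
    # Close each switch that sees the step-2 signal from below and has a
    # set flip-flop in the adjacent vertex above.
    return {(u, v) for u in signalled for v in up[u] if v in flip_flops}

links = set_up(example_banyan(2), apex=(2, 0), bases=[(0, 1), (0, 2)])
print(sorted(links))
# [((0, 1), (1, 0)), ((0, 2), (1, 2)), ((1, 0), (2, 0)), ((1, 2), (2, 0))]
```

The closed links form exactly the two base-to-apex paths, which is the tree-shaped connection the text describes.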
A two-step search algorithm is illustrated in Figure 4.4-2 and is justified theoretically in Theorem 1.3.2. The purpose of the search algorithm is to locate those apexes which are suitable for connecting a given subsystem (i.e., set of bases) without interfering with any existing connections. Two subsystem connections can interfere if and only if they have some vertex in common.
In the example illustrated, the circled vertices represent those already in use, and bases 3 and 6 are to be connected as a new subsystem. As shown in Figure 4.4-2a, control signals are first broadcast apexward simultaneously from all bases in the desired subsystem and are then OR'ed upward using the same control line used in setup. During this step, a flip-flop is set in every intermediate and apex which receives this control signal and is already in use.
In the second step, illustrated in Figure 4.4-2b, the control signals from the bases are turned off, and each vertex with a set flip-flop broadcasts a "one", which is OR'ed apexward on the same line used in step one. All apexes not receiving a "one" during this step are eligible. Final selection could then be performed by a priority circuit attached to the apexes.
Figure 4.4-2. Search Algorithm: a) Step 1; b) Step 2.
Steps one and two of this algorithm, like those of the setup
algorithm, could be combined using a second control line. With four control lines, search and setup could all be combined in one step.
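The two-step search can be simulated in the same spirit. The sketch below is an assumption-laden illustration rather than the hardware algorithm itself: `example_banyan`, `reachable`, and `eligible_apexes` are hypothetical names, the test graph is a small two-level network chosen for convenience, and an in-use apex is treated as receiving its own step-two signal.

```python
from collections import deque

def example_banyan(levels):
    """Hypothetical test graph: maps each vertex (level, index) to the vertices directly above it."""
    n = 2 ** levels
    return {(i, j): [] if i == levels else [(i + 1, j), (i + 1, j ^ (1 << i))]
            for i in range(levels + 1) for j in range(n)}

def reachable(starts, neighbours):
    """All vertices a control signal reaches when it fans out along `neighbours`."""
    seen, queue = set(starts), deque(starts)
    while queue:
        v = queue.popleft()
        for w in neighbours[v]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return seen

def eligible_apexes(up, apexes, bases, in_use):
    """Apexes that can root a connection to `bases` without touching `in_use`."""
    bases = set(bases)
    # Step 1: signals from the requested bases are OR'ed apexward; a flip-flop
    # is set in every intermediate or apex that sees the signal and is in use.
    step1 = reachable(bases, up)
    flip_flops = [v for v in step1 if v in in_use and v not in bases]
    # Step 2: each vertex with a set flip-flop broadcasts a "one" apexward.
    blocked = reachable(flip_flops, up)
    # Apexes that receive no "one" are eligible.
    return [a for a in apexes if a not in blocked]

up = example_banyan(2)
# Vertices of an existing connection: bases 1 and 2 rooted at apex 0.
in_use = {(0, 1), (0, 2), (1, 0), (1, 2), (2, 0)}
apexes = [(2, j) for j in range(4)]
print(eligible_apexes(up, apexes, [(0, 0), (0, 3)], in_use))
# [(2, 1), (2, 3)]
```

A priority circuit, as mentioned in the text, would then pick one apex from the returned list.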
Another possible variation is to provide no dedicated control lines at all and use instead the same lines that are used for data and priority signals within a subsystem. Since these lines must be treated differently by the network, the entire network could be switched between two modes as needed to facilitate either reconfiguration or intra-subsystem communication. This implementation of the control algorithms could reduce the required pin counts in network hardware, but, unlike other implementations, would require that communication be temporarily suspended in all existing subsystems whenever new ones are being set up.
The most desirable control algorithm implementation would depend on the cost-performance tradeoffs of a particular application.
The control algorithms described are inherently fault-tolerant when faulty vertices in the network can be made to appear like those already in use to the search algorithm. New connections would then be routed around faulty portions of the network just as they are routed around those portions already in use. Using the search algorithm described, however, a portion of the control circuitry in faulty cells would still need to function properly in order to make them appear like those in use. A slower search algorithm that avoids this problem has been described by Lipovski (70). Alternatively, a software search algorithm could replace the faster hardware algorithm in the event of hardware failure.
4.5 Parallel and Multiplexed Networks
Search and setup algorithms may be repeated until all desired
subsystems have been connected or until no more eligible apexes can be found. Practical banyan networks allow numerous combinations of subsystems to be configured in this way but are not necessarily capable of realizing all possible partitionings of system resources. As was discussed in Section 2.3, this degree of partitioning flexibility may suffice for certain kinds of applications. When necessary, a banyan network can always configure conflicting isolated subsystems at different times.
When greater partitioning flexibility is required, there are two solutions which potentially will allow configuration of all possible partitionings. First, several banyans can be connected in parallel. The parallel networks would function independently, but their bases would be connected to the same set of resource modules. As many subsystems as possible would be connected using the first network. Those left over would be connected in as many additional networks as required.
The other solution is to multiplex a single network so that it periodically rearranges itself to connect first one set of subsystems, then another, and so on, so that each subsystem has some time slot during which it can communicate. A partitioning network, as considered here, acts as a rearrangeable set of time-shared busses. A resource module attached to the network must request and receive control of its bus before transmitting data, and must be prepared to wait whenever the bus is not immediately available. Normally, the bus would be unavailable only when used by other resources in the same subsystem; but should it ever become temporarily unavailable for other reasons, the only effect would be to delay data transmission within the subsystem. This situation makes multiplexing possible with little or no modification of the resource modules. The system need only be designed so that any resource not currently connected by the network would "see" it as a busy bus.
Multiplexing requires that a small amount of memory be associated with each switch in the network to store the state of the switch during each time slot. With LSI, this could be done at reasonable cost by associating a small register with each switch and synchronizing all state changes from a central clock.
The techniques of parallel networks and multiplexing may be mixed to balance cost and performance. Whether a network structure is space shared with parallel hardware or time shared with multiplexing, the parallel networks and/or time slots share many properties and are called layers. The number of layers required depends on a number of factors and will be discussed later.
There also is a partial solution which could provide some increase in partitioning flexibility. Sometimes it might be possible to connect additional subsystems in a single layer by rearranging the connections of existing subsystems. This, however, would require more complex algorithms, would interrupt processing in existing subsystems during reconfiguration, and would provide only a limited increase in partitioning flexibility. Consequently, it is believed that parallel networks or
multiplexing would be of much more practical value.
SECTION 5
L-LEVEL BANYANS
An L-level banyan, defined formally in Definition 2.1, is a banyan whose vertices are arranged in levels so that switches, or arcs of the graph, exist only between vertices in adjacent levels. For example, the graphs in Figures 4-1b, 4.4-1, and 4.4-2 are L-level banyans, but 4-1a is not. There are actually L+1 levels of vertices in an L-level banyan. They are, by convention, numbered apexward from 0 through L so that all bases are in level 0 and all apexes are in level L. When we say that a banyan has L levels, however, it will be understood that it is an L-level banyan rather than an (L-1)-level banyan.
The class of L-level banyans is a proper subset of the general banyan networks discussed in Section 4, but is still broad enough to include most practical designs. As will be explained in this section, L-level banyans have additional useful properties, which make them attractive as partitioning networks.
Any path from a base to an apex in an L-level banyan has exactly L arcs; thus, the propagation delay of data through the network cannot exceed that of 2×L switches, since in the worst case, data must travel from base to apex to base.
Base and apex "distance" functions can be associated with L-level banyans and will be discussed in Section 5.1. Theoretical results concerning these functions can be used to improve the performance of an L-level banyan network.
A class of L-level banyans called "uniform" banyans will be discussed in Section 5.2. Special cases of uniform banyans called "regular" and "rectangular" banyans also will be discussed. It will be shown that measures of the size and cost of a uniform banyan can be expressed as functions of certain parameters called "fanout" and "spread". The orderly structure of the networks discussed in Section 5.2 makes them likely candidates for modular construction, and it is expected that most practical designs would fall into these categories.
5.1 Base and Apex Distance
The dyadic operators called base distance and apex distance are defined in Definition 2.1.2 for any L-level banyan. The base distance between two bases B1 and B2 specifies the minimum number of levels up into the banyan a connection must extend to connect B1 and B2. Similarly, the apex distance between two apexes A1 and A2 specifies the minimum number of levels down from the top of the banyan a connection must extend to connect A1 and A2.
Figure 5.1-1 illustrates the concepts of base and apex distance. The darkened paths represent minimal connections. The connection of apexes is presented only as a conceptual aid in explaining apex distance and would not actually occur in a banyan partitioning network.
The definitions of base and apex distance are extended to sets of bases and apexes, respectively, in the same way that point distances often are extended to sets of points. That is, the base distance between any two sets of bases SB1 and SB2 is defined to be the minimum of the base distances between B1 and B2 over all B1 ∈ SB1 and B2 ∈ SB2. The analogous extension applies to apex distance.
Base and apex distance functions are used in Theorem 2.1.7 to characterize a way of avoiding conflicts in connections established within an L-level banyan. Writing $d_B$ for base distance and $d_A$ for apex distance, Theorem 2.1.7 tells us that if
$$L < d_B(SB_1, SB_2) + d_A(A_1, A_2),$$
then subsystems SB1 and SB2 can be connected without conflict in the same layer using tree-shaped connections rooted at apexes A1 and A2, respectively.
Figure 5.1-1. Base and Apex Distances in an L-Level Banyan
There are two potentially useful interpretations of Theorem 2.1.7 which suggest ways of enhancing the performance of an L-level banyan partitioning network. First, subsystems close to each other place more stringent requirements on the separation of apexes used than do widely separated subsystems, suggesting that closely spaced subsystems are less likely to be connected in the same layer. Thus, if it is known at design time which resources of a system are most likely to be connected, one might improve performance by gerrymandering the assignment of resources to bases so that bases most likely to be connected tend to be closest. For example, each processor module might be placed close to some memory module port, and multiple ports to the same memory module might be widely separated. An operating system also could take advantage of this result by allocating closely spaced resource modules to a subsystem whenever possible. The amount of improvement thus obtainable is not estimated here since this would be highly problem dependent, but one can easily contrive extreme examples in which more than one layer would seldom or never be needed.
The second interpretation concerns the selection of apexes. The search procedure described earlier locates all apexes eligible for connecting a new subsystem in a partially occupied layer, but does not determine which of the eligible apexes is the best choice. Theorem 2.1.7 now suggests a plausible selection criterion. According to the theorem, any new subsystem can be connected if we can find some apex sufficiently distant from those already in use. Thus, apexes most distant from those in use are the most valuable in the sense that they are likely to be eligible for connecting the greatest variety of subsystems. More subsystems might then be connected in a layer by selecting each new eligible apex so as to leave as many "valuable" apexes as possible for subsequent connections. This criterion is ambiguous in some cases, but, nevertheless, is the conceptual basis for a priority rule found to improve performance in network simulations discussed in Section 8.
5.2 Fanout and Spread
A class of L-level banyans called uniform banyans is defined in Definition 2.2.1. Within each level of a uniform banyan, all vertices are alike in that each has the same number of arcs incident into it and the same number of arcs incident out from it. The arcs incident out from the vertices of a uniform banyan are characterized by an L-component vector F, called the fanout vector. Similarly, the arcs incident in are characterized by an L-component spread vector S. When F = S, the banyan has the same number of vertices in each level (Corollary 2.2.1b) and is called rectangular (Definition 2.2.2).
A regular banyan (Definition 2.2.3) is a special case of a uniform banyan, in which all vertices throughout the network are alike except, of course, for the fact that bases have no arcs incident into them and apexes have no arcs incident out. All components of a regular banyan's fanout vector, thus, are equal and can be characterized by a single scalar parameter F, called fanout. Similarly, all components of its spread vector are equal and are characterized by a spread parameter S.
Regular banyans probably would be the most economical to fabricate, because they can be built from a number of identical cells, each containing the circuitry associated with a vertex and the arcs incident into it. The fanout and fanin requirements of these cells are determined by F and S. Such modular construction could be used for other uniform banyans as well except that different kinds of cells might be needed for different levels.
Theorems 2.2.1 and 2.3.1 and their corollaries show how the numbers of arcs and vertices in various kinds of uniform banyans are related to fanout and spread. The expressions derived are summarized in Table 5.2-1. The total number of arcs in a banyan graph can be used as a measure of network cost since these arcs represent the bidirectional switching devices in the corresponding network.
It is shown in Theorem 2.4.2 that, for a given number of bases, the "cost" (number of arcs) per base of a regular rectangular banyan is minimized with respect to fanout when F = 3. Further, the cost of such a network is the same when F = 4 as it is when F = 2. Similarly, it is shown that a crude cost-performance measure, obtained by multiplying this cost function by a measure of maximum data propagation delay, is optimized when F = 7 and is very near optimal when F = 8. The cost and performance aspects of banyan partitioning networks will be discussed in greater detail in Section 9.
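These optima can be checked numerically. The sketch below assumes a regular rectangular banyan with $F^L$ bases and $F^L \times L \times F$ arcs, treats L as the real number $\log N / \log F$ for a fixed number of bases N, and takes 2L switches as the delay measure; the function names and the choice N = 4096 are illustrative assumptions.

```python
import math

def levels(fanout, bases):
    """Number of levels L satisfying fanout**L == bases (real-valued)."""
    return math.log(bases) / math.log(fanout)

def cost_per_base(fanout, bases):
    """Arcs per base: (F**L * L * F) / F**L = L * F."""
    return fanout * levels(fanout, bases)

def cost_delay(fanout, bases):
    """Crude cost-performance measure: cost per base times the 2L-switch delay."""
    return cost_per_base(fanout, bases) * 2 * levels(fanout, bases)

N = 4096  # hypothetical number of bases
fanouts = range(2, 17)
print(min(fanouts, key=lambda f: cost_per_base(f, N)))  # 3
print(min(fanouts, key=lambda f: cost_delay(f, N)))     # 7
print(math.isclose(cost_per_base(2, N), cost_per_base(4, N)))  # True
```

Because both measures factor into a term depending only on N and a term depending only on F, the best fanout does not depend on the particular N chosen here.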
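TABLE 5.2-1. Size and Cost Functions for Uniform Banyans

| Type of Banyan | Vertices in Level I | Bases | Apexes | Arcs (Cost Measure) |
|---|---|---|---|---|
| Uniform | $\times/(I \uparrow S),(I \downarrow F)$ (Th. 2.2.1) | $\times/F$ (Cor. 2.2.1a) | $\times/S$ (Cor. 2.2.1a) | $\sum_{I=1}^{L} F[I] \times (\times/(I \uparrow S),(I \downarrow F))$ (Th. 2.3.1) |
| Rectangular | $\times/F$ (Cor. 2.2.1a & 2.2.1b) | $\times/F$ (Cor. 2.2.1a) | $\times/F$ (Cor. 2.2.1a & 2.2.1b) | $(\times/F) \times (+/F)$ (Cor. 2.3.1a) |
| Regular and Rectangular | $F^L$ (Cor. 2.2.1b & 2.2.1c) | $F^L$ (Cor. 2.2.1c) | $F^L$ (Cor. 2.2.1c) | $F^L \times L \times F$ (Cor. 2.3.1b) |
| Regular | $S^I \times F^{L-I}$ (Cor. 2.2.1c) | $F^L$ (Cor. 2.2.1c) | $S^L$ (Cor. 2.2.1c) | $F \times \sum_{I=1}^{L} S^I \times F^{L-I}$ (Cor. 2.3.1c) |
| Regular and Nonrectangular | $S^I \times F^{L-I}$ (Cor. 2.2.1c) | $F^L$ (Cor. 2.2.1c) | $S^L$ (Cor. 2.2.1c) | $F \times S \times (F^L - S^L) / (F - S)$ (Cor. 2.3.1d) |

Here $\times/V$ and $+/V$ denote the product and the sum of the components of a vector V, $I \uparrow V$ denotes its first I components, and $I \downarrow V$ denotes the components that remain after the first I are dropped.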
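The size and cost expressions for a uniform banyan are easy to evaluate directly. The following is a minimal sketch, assuming the fanout and spread vectors are given as Python lists indexed from 0 (so that `F[I - 1]` plays the role of the I-th component); the function names are hypothetical.

```python
from math import prod

def vertices_in_level(F, S, I):
    """Product of the first I spread components and the fanout components after the I-th."""
    return prod(S[:I]) * prod(F[I:])

def arcs(F, S):
    """Cost measure: for each level I = 1..L, (arcs into a level-I vertex) * (vertices in level I)."""
    return sum(F[I - 1] * vertices_in_level(F, S, I) for I in range(1, len(F) + 1))

# Regular, rectangular banyan with F = S = 2 and L = 4.
F = S = [2, 2, 2, 2]
print([vertices_in_level(F, S, I) for I in range(5)])  # [16, 16, 16, 16, 16]
print(arcs(F, S))                                       # 128, i.e. F**L * L * F

# Regular, nonrectangular banyan with F = 2, S = 3, L = 2.
print(arcs([2, 2], [3, 3]))                             # 30, i.e. F*S*(F**L - S**L)/(F - S)
```

The two printed arc counts agree with the closed forms in the "Regular and Rectangular" and "Regular and Nonrectangular" rows of the table.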
SECTION 6
SW BANYANS
SW banyans (Definition 3.1.2) are a particularly interesting class of L-level banyans which can be synthesized recursively from crossbars (Definition 3.1.1) using the synthesis principle discussed in Section 4.3. SW banyans are probably the most attractive banyans for partitioning networks because they are the best understood theoretically, because they lend themselves exceptionally well to modular construction, and because they are a broad class of networks which can be varied in a number of ways to meet the needs of different applications. Additionally, the analysis of SW banyans might have much broader implications, because a number of connecting networks originally proposed for other applications are actually special cases of SW banyans.
6.1 Previous Special Cases
SW banyans are actually a generalization of a number of network structures considered previously for a variety of applications. The term "SW structure" was originally used by Lipovski (69, 70), who first proposed them for partitioning applications in a large associative processor. The structures he defined are equivalent to regular SW banyans, and the possibility of uniform (but nonregular) SW banyans was implied (Lipovski, 69). Structures graphically equivalent to regular rectangular banyans with fanout and spread equal to 2 had been proposed earlier by Batcher (68) for use as "bitonic sorters."
More recently, networks graphically equivalent to rectangular SW banyans were proposed by Lawrie (73, 75) for memory-processor communication. Lawrie defined these networks using "omega-base" representations of integers and analyzed them for his application using number theory. Lawrie also noted that interconnections between stages of an "omega network" (i.e., between levels of a rectangular banyan) are equivalent to the "perfect shuffle" connection discussed by Pease (68) and by Stone (71). Additional material on the control and applications of networks of this type was published by Lang and Stone (76).
A variety of permutation networks also have been proposed which contain special cases of SW banyans as major subgraphs. These networks are intended to permute a set of input lines onto an equal number of output lines in any desired fashion and were studied largely for telephone switching applications. Clos networks, proposed by Clos (53) and discussed further by Benes (62), contain three stages of crossbars interconnected symmetrically. If the last stage (or alternately the first, since the network is symmetrical) were removed from an n by m by r Clos network, then the remainder would be a uniform 2-level SW banyan with fanout vector n r and spread vector m r. Benes (64a, 64b, 65) analyzed a similar, but more general, class of permutation networks containing an odd number (not less than 3) of stages of crossbars interconnected symmetrically. These Benes networks, like Clos networks, are symmetrical about the center stage. If one were to remove all stages from one side of a Benes network, the remainder would be a rectangular SW network (or, equivalently, an omega network as noted by Lawrie (73)). It was subsequently shown by Goldstein and Liebholz (67) and by Waksman (68) that certain crossbars can be removed from one side of a Benes network without destroying its ability to connect all possible permutations of input lines onto output lines. The other side of such a network (including its center stage) is identical to one side of a Benes network, and hence is a rectangular SW banyan. Another class of permutation networks, called "nested tree" networks, was proposed by Joel (68) with little supporting theory. Each stage of a "nested tree" network is built from two-by-two crossbars and apparently is a regular, rectangular SW banyan with fanout and spread equal to 2.
Such common structures as crossbars and homogeneous trees¹ are also special cases of SW banyans. Crossbars are simply 1-level SW banyans and homogeneous trees are uniform SW banyans in which each component of the fanout vector equals 1.
¹ The term "homogeneous tree" is used here in the sense of Iverson (62, p. 58). All leaves of a homogeneous tree lie in the same level, and within each level, all vertices have the same degree.
We are presently concerned with SW banyans as partitioning networks, but this diversity of applications suggests that theoretical results concerning SW banyans also could be useful in other areas.
6.2 Structure
SW banyans (Definition 3.1.2) are defined recursively in terms of crossbars (Definition 3.1.1), which are simply 1level banyans. All crossbars are SW banyans. Additionally, a synthesized banyan is an SW banyan if its interconnection graph is an SW banyan and its component banyans are all crossbars. Crossbars and synthesized SW banyans are the only SW banyans. This definition of an SW banyan is somewhat simpler and more general than that published previously by the author (Goke and Lipovski, 73). Unlike the earlier definition, it does not necessarily require an SW banyan to be uniform.
Figure 6.2-1 illustrates the synthesis of an SW banyan. Figure 6.2-1a shows the interconnection of component crossbars, 6.2-1b is the corresponding interconnection graph, and 6.2-1c is the resulting synthesized SW banyan graph. Figure 6.2-2 shows some additional examples of synthesized SW banyans and their corresponding interconnection graphs, which, of course, are also SW banyans.
Properties of a synthesized SW banyan are related to those of its interconnection graph in a number of ways. All SW banyans are L-level banyans, and the number of levels in a synthesized SW banyan is one greater than the number of levels in its interconnection graph (Theorem 3.1.3). A uniform synthesized SW banyan with fanout vector F and spread vector S has a uniform interconnection graph with fanout vector $1 \downarrow F$ and spread vector $(-1) \downarrow S$, that is, F without its first component and S without its last (Theorem 3.1.5). Similarly, a uniform SW interconnection graph with fanout vector F' and spread vector S' can be used to synthesize a uniform SW banyan with fanout vector B,F' and spread vector S',A, where B is the number of bases in each bottom-level component crossbar and A is the number of apexes in each top-level component crossbar (Theorem 3.1.4). Also, the bases and apexes of a synthesized SW banyan can be mapped into those of its interconnection graph such that distances in the synthesized banyan are one greater than the corresponding distances in the interconnection graph (Theorems 3.1.6 and 3.1.7).
Figure 6.2-1. Synthesis of an SW Banyan: a) Interconnection of Crossbars; b) Interconnection Graph; c) Synthesized SW Banyan.
Figure 6.2-2. Examples of SW Banyans: a) Nonuniform SW Banyan; b) Interconnection Graph of a; c) Rectangular SW Banyan; d) Interconnection Graph of c; e) Regular SW Banyan; f) Interconnection Graph of e; g) Regular, Rectangular SW Banyan; h) Interconnection Graph of g; i) Interconnection Graph of h; j) Interconnection Graph of i.
Synthesized SW banyans lend themselves especially well to modular
construction. By definition, a synthesized SW banyan can be constructed by interconnecting crossbars in a pattern which is itself a simpler SW banyan. Crossbars are, thus, natural building blocks for SW banyans. A regular SW banyan has the advantage that its component crossbars are all identical. At most L different kinds of crossbar modules are needed for a uniform SW banyan with L levels. A manufacturer, thus, could mass produce a few standardized kinds of crossbar modules in sizes where crossbars are practical, and these modules could be interconnected in an SW pattern to produce larger networks.
We observe that it is also possible to synthesize large SW banyans by interconnecting smaller SW banyan modules in an SW pattern. For example, Figure 6.2-3 shows how an SW banyan equivalent to that of Figure 6.2-2g can be synthesized from eight component SW banyans interconnected in a crossbar pattern. In Figure 6.2-3, graphs of the component SW banyans are drawn in the usual manner using solid lines as arcs, and broken lines show how the component banyans are interconnected. Base and apex numbers correspond with those in Figure 6.2-2g. In this manner, SW banyan modules can be used as building blocks for larger SW banyan networks. This would be useful if, for example, available packaging technology made it desirable to use modules larger than the largest practical crossbar.
Figure 6.2-3. Synthesis of an SW Banyan From Smaller Component SW Banyans
6.3 Distance Properties
The base and apex distance functions of an SW banyan are metrics on its bases and apexes, respectively (Corollaries 3.2.3b and 3.2.4b). In an L-level SW banyan, each of these functions is characterized by L+1 equivalence relations with nested equivalence classes (Corollaries 3.2.3a and 3.2.4a and Theorems 3.2.5 and 3.2.6). Base distance is characterized by relations indexed 0, 1, ..., L, where B1 and B2 are related by the I-th relation if and only if the base distance between B1 and B2 is at most I (Definition 3.2.1). Similarly, apex distance is characterized by relations indexed 0, 1, ..., L, where A1 and A2 are related by the I-th relation if and only if the apex distance between A1 and A2 is at most I (Definition 3.2.2). The equivalence classes of these relations are listed in Table 6.3-1 for the SW banyan in Figure 6.2-2g. In a uniform SW banyan with fanout vector F, the I-th base relation partitions the network's bases into $\times/(I \downarrow F)$ equivalence classes with $\times/(I \uparrow F)$ elements each (Theorem 3.2.7). Similarly, the I-th apex relation partitions the network's apexes into $\times/((-I) \downarrow S)$ equivalence classes with $\times/((-I) \uparrow S)$ elements each (Theorem 3.2.8).
For reasons explained in Section 5.1, it is desirable to assign resources to bases such that resources most likely to be in the same subsystem are closest to each other. Thus, with an SW banyan, resources most likely to be in the same subsystem should be assigned to bases in the same small equivalence class. For example, suppose that eight processors and eight memory modules are to be attached to the 16 bases of the network in Figure 6.2-2g, and suppose it is known that a typical subsystem will require about as many processors as it does memories. Then chances are that more subsystems could be connected per layer if processors were attached to bases 0,2,4,...,14 and memories to bases 1,3,5,...,15 than if processors were attached to bases 0,1,...,7 and memories to bases 8,9,...,15.
TABLE 6.3-1. Base and Apex Equivalence Classes for SW Banyan in Figure 6.2-2g

| Relation (I) | Equivalence Classes of Bases |
|---|---|
| 0 | [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] |
| 1 | [0, 1] [2, 3] [4, 5] [6, 7] [8, 9] [10, 11] [12, 13] [14, 15] |
| 2 | [0, 1, 2, 3] [4, 5, 6, 7] [8, 9, 10, 11] [12, 13, 14, 15] |
| 3 | [0, 1, 2, 3, 4, 5, 6, 7] [8, 9, 10, 11, 12, 13, 14, 15] |
| 4 | [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15] |

| Relation (I) | Equivalence Classes of Apexes |
|---|---|
| 0 | [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15] |
| 1 | [0, 8] [4, 12] [2, 10] [6, 14] [1, 9] [5, 13] [3, 11] [7, 15] |
| 2 | [0, 8, 4, 12] [2, 10, 6, 14] [1, 9, 5, 13] [3, 11, 7, 15] |
| 3 | [0, 8, 4, 12, 2, 10, 6, 14] [1, 9, 5, 13, 3, 11, 7, 15] |
| 4 | [0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15] |
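The nested classes in Table 6.3-1 follow a simple arithmetic pattern, which the sketch below reproduces. It assumes the numbering used for the 16-base regular, rectangular example (bases grouped in contiguous blocks, apexes grouped by residue); the function names are hypothetical, and extending the same numbering to other fanout or spread values is an assumption rather than something stated in the text.

```python
def base_classes(I, fanout=2, levels=4):
    """Classes of bases at base distance <= I: contiguous blocks of fanout**I bases."""
    size = fanout ** I
    return [list(range(k, k + size)) for k in range(0, fanout ** levels, size)]

def apex_classes(I, spread=2, levels=4):
    """Classes of apexes at apex distance <= I: apexes congruent modulo spread**(levels - I)."""
    step = spread ** (levels - I)
    return [list(range(r, spread ** levels, step)) for r in range(step)]

for I in range(5):
    # Class counts and sizes: 16 x 1, 8 x 2, 4 x 4, 2 x 8, 1 x 16.
    print(I, len(base_classes(I)), len(base_classes(I)[0]),
          len(apex_classes(I)), len(apex_classes(I)[0]))

print(base_classes(3))     # two blocks: bases 0-7 and bases 8-15
print(apex_classes(2)[0])  # [0, 4, 8, 12], the class listed as [0, 8, 4, 12]
```

The class counts and sizes agree with the products given by Theorems 3.2.7 and 3.2.8 for F = S = 2 2 2 2.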
Base and apex distance functions are even more informative for SW banyans than they are for other L-level banyans. In an L-level banyan, these functions specify the minimum number of levels into the banyan that a connection must extend to connect two bases or apexes (Section 5.1). In an SW banyan, these functions also specify the maximum number of levels into the banyan that such a connection may extend before its two branches join. For example, if the distance between two bases of an SW banyan is 3, then any tree-shaped connection joining them will fork precisely at level 3 rather than just somewhere in level 3 or above. Accordingly, the necessary condition for a conflict in an L-level banyan (Theorem 2.1.7) is both a necessary and sufficient condition for a conflict in an SW banyan (Theorem 3.3.1).
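The necessary-and-sufficient form of the conflict condition can be checked by brute force on a small example. The sketch below is an illustration under stated assumptions, not the dissertation's proof: it uses a 3-level network with fanout and spread 2 and a butterfly-style indexing (chosen here because it yields the same pattern of nested classes as Table 6.3-1), considers only single-base connections, and all function names are hypothetical.

```python
from itertools import product

L = 3        # number of levels in the hypothetical example
N = 2 ** L   # vertices per level

def base_distance(b1, b2):
    """Smallest I such that both bases lie in the same block of 2**I bases."""
    return next(i for i in range(L + 1) if b1 >> i == b2 >> i)

def apex_distance(a1, a2):
    """Smallest I such that the apexes agree modulo 2**(L - I)."""
    return next(i for i in range(L + 1) if (a1 - a2) % (1 << (L - i)) == 0)

def path(base, apex):
    """Vertices (level, index) on the unique base-to-apex path: at level i the
    low i bits of the index come from the apex and the rest from the base."""
    return {(i, (base >> i << i) | (apex & ((1 << i) - 1))) for i in range(L + 1)}

def condition_is_exact():
    """True if two connections share a vertex exactly when distances sum to at most L."""
    for b1, a1, b2, a2 in product(range(N), repeat=4):
        conflict = bool(path(b1, a1) & path(b2, a2))
        predicted = base_distance(b1, b2) + apex_distance(a1, a2) <= L
        if conflict != predicted:
            return False
    return True

print(condition_is_exact())  # True
```

In this example a conflict occurs exactly when L is not less than the sum of the base and apex distances, which is the negation of the no-conflict condition of Theorem 2.1.7.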
SECTION 7
CC BANYANS
CC banyans (Definition 4.1.1) are a class of rectangular banyans
which are potentially useful as partitioning networks. They differ from SW banyans in that multilevel CC banyans are not synthesized from smaller banyans. The distance functions of a CC banyan differ from those of an SW banyan in that bases or apexes appear to be arranged in a circle rather than in nested equivalence classes. The distance between two bases or apexes is then determined by their separation on the circle.
Relatively few examples of the CC banyan structure are known to
exist in earlier networks. The "barrel switch" of the ILLIAC IV Processing Element is graphically equivalent to a 3level, regular CC banyan with fanout and spread equal to 4. In this application, it is used to shift 64 bits an arbitrary number of places left or right (Davis, 69).
CC banyans also are related to the "line manipulator" networks proposed by Feng (74) for performing a variety of data manipulation functions. A line manipulator is not itself a banyan because it contains multiple paths from any given base to an apex, but it contains both a CC banyan and an SW banyan as partial graphs. These partial graphs are both regular, rectangular banyans with fanout and spread equal to 2.
7.1 Structure
A CC banyan (Definition 4.1.1) is rectangular (Theorem 4.1.3) and, hence, has the same number of vertices in each level. For convenient identification, we can index these vertices as $V[0 \sim L;\ 0 \sim N-1]$, where $V[I;\ 0 \sim N-1]$ are the N vertices of level I. Hence, $V[0;\ 0 \sim N-1]$ are bases, and $V[L;\ 0 \sim N-1]$ are apexes. Let $S[1 \sim L]$ be the fanout and spread vector of this rectangular banyan. Then from each vertex $V[I;J]$ where $0 \le I < L$, there is an arc to each of the vertices $V[I+1;\ J]$, $V[I+1;\ J \oplus (\times/I \uparrow S)]$, $V[I+1;\ J \oplus 2 \times (\times/I \uparrow S)]$, ..., $V[I+1;\ J \oplus (S[I+1]-1) \times (\times/I \uparrow S)]$, where $\oplus$ denotes addition modulo N. Some examples of CC banyans are shown in Figure 7.1-1. With the vertices in each level numbered from left to right as shown, each arc from level I to level I+1 shifts circularly to the right $M \times (\times/I \uparrow S)$ places for some integer M where $0 \le M < S[I+1]$.
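The arc rule above translates directly into code. The following is a minimal sketch, assuming S is given as a Python list indexed from 0 (so `S[I]` is the text's S[I+1]); the names `cc_arcs`, `paths_to_apexes`, and `is_banyan` are hypothetical. It also checks the defining banyan property, one and only one path from each base to each apex, for the two parameter sets named in the figure.

```python
from collections import Counter
from math import prod

def cc_arcs(S):
    """Map each vertex (level, index) below the top level to the vertices it joins one level up."""
    L, N = len(S), prod(S)
    up = {}
    for I in range(L):
        shift = prod(S[:I])  # product of the first I components of S
        for J in range(N):
            up[(I, J)] = [(I + 1, (J + M * shift) % N) for M in range(S[I])]
    return up

def paths_to_apexes(S, base):
    """Count the upward paths from one base to every apex it can reach."""
    up = cc_arcs(S)
    frontier = Counter({(0, base): 1})
    for _ in S:
        nxt = Counter()
        for v, count in frontier.items():
            for w in up[v]:
                nxt[w] += count
        frontier = nxt
    return frontier

def is_banyan(S):
    """True if every base has exactly one path to every apex."""
    L, N = len(S), prod(S)
    expected = [((L, a), 1) for a in range(N)]
    return all(sorted(paths_to_apexes(S, b).items()) == expected for b in range(N))

print(cc_arcs([2, 2, 2])[(1, 7)])            # [(2, 7), (2, 1)]: shifts of 0 and 2, modulo 8
print(is_banyan([2, 2, 2]), is_banyan([3, 2]))  # True True
```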
Note that if either of the CC banyans in Figure 7.1-1 were drawn on the surface of a vertical cylinder instead of in a plane, the horizontal lines would disappear, and a comparatively simple crosshatch pattern would remain. The term "CC" is an acronym for "cylindrical crosshatch" and is based upon this conceptualization of CC banyan structure.
CC banyans can be constructed in a modular fashion using a physical configuration similar to that of an ILLIAC IV "barrel switch," which has been described by Davis (69). The "barrel switch" layout is applicable directly to any regular, 3level CC banyan with fanout and spread equal to 4, and can be extended in a straightforward manner for other CC banyans.
Figure 7.1-1. Examples of CC Banyans: a) A CC Banyan with L = 3 and S = 2 2 2; b) A CC Banyan with L = 2 and S = 3 2.
7.2 Distance Properties
The base distance function in a CC banyan is characterized in terms of minimum circular distance. Consider the integers $0 \sim N-1$ arranged in a circle as illustrated in Figure 7.2-1. The minimum circular distance between two numbers J1 and J2 is denoted by $J_1 \circ J_2$ and is defined to be the minimum number of steps, either clockwise or counterclockwise, which separate J1 from J2 on this circle. For example, if $N \ge 4$ then $((N-1) \circ 1) = (1 \circ (N-1)) = 2$. This function is a metric on the integers $0 \sim N-1$ (Theorem 4.2.3).
The distance between two bases of a CC banyan can be determined from the minimum circular distance between their indices. Any base distance, of course, must be an integer in the range $0 \sim L$. For any integer I in this range, the base distance between two bases $V[0;J_1]$ and $V[0;J_2]$ will be equal to or less than I if and only if $(J_1 \circ J_2) < \times/I \uparrow S$, where S is the fanout and spread vector of the CC banyan and where $N = \times/S$ (Theorem 4.2.4). Thus, the base distance between $V[0;J_1]$ and $V[0;J_2]$ is simply the smallest value of $I = 0 \sim L$ such that $(J_1 \circ J_2) < \times/I \uparrow S$. Base distance is a metric on the bases of a CC banyan except possibly for degenerate CC banyans in which one or more components of S are less than 2 (Theorem 4.2.6).
It is apparent from this characterization that bases are closest in terms of base distance when their indices are closest in terms of minimum circular distance. Consequently, bases in a CC banyan can be thought of as arranged in a circle like the numbers in Figure 7.21. Hence, resource module ports most likely to be assigned to the same subsystem should be attached to adjacent bases, and those least likely to be connected should be attached to bases opposite each other on the circle.
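The base distance characterization can be sketched as follows. This is a hedged illustration, not the author's program: the function names are hypothetical, S is passed as a Python tuple, and the bound is read as the product of the first I components of S (an assumption about the APL expression in the text).

```python
from math import prod


def min_circular_distance(j1: int, j2: int, n: int) -> int:
    """Fewest steps between j1 and j2 on a circle of the integers 0..n-1."""
    d = (j1 - j2) % n
    return min(d, n - d)


def cc_base_distance(j1: int, j2: int, s: tuple) -> int:
    """Smallest I in 0..L whose bound exceeds the circular distance.

    Assumption: the bound for a given I is the product of the first I
    components of s (the empty product, for I = 0, is 1).
    """
    n = prod(s)                                  # number of bases, N
    d = min_circular_distance(j1, j2, n)
    for i in range(len(s) + 1):
        if d < prod(s[:i]):
            return i
    raise ValueError("unreachable: the circular distance is always below N")
```

Under these assumptions, the distances from base 0 in a banyan with S = 2 2 2 are 0, 1, 2, 2, 3, 2, 2, 1 for bases 0 through 7, so the nearest bases are the adjacent ones on the circle and the farthest is the one directly opposite.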
Figure 7.21. Conceptualization of Minimum Circular Distance (the integers 0, 1, 2, ..., N-2, N-1 arranged in a circle)
Apex distance in a CC banyan is characterized differently from base distance, but it is still useful to think of a CC banyan's apexes as being arranged in a circle. The apex distance between two apexes V[L;J1] and V[L;J2] is the smallest integer I in the range 0 through L such that 0 = (×/(-I)↓S)|J2-J1; that is, such that ×/(-I)↓S divides J2-J1 (Corollary 4.2.5a). It may be observed, however, that since ×/(-I)↓S divides N, all of the following are equivalent.
0 = (×/(-I)↓S)|J2-J1
0 = (×/(-I)↓S)|N|J2-J1
0 = (×/(-I)↓S)|N|J1-J2
0 = (×/(-I)↓S)|J2 ⌽ J1
If one thinks of apexes V[L;0] through V[L;N-1] arranged in a circle similar to that shown in Figure 7.21, then the distance between any two apexes can be determined from their separation on the circle. Starting with any apex V[L;J] and proceeding in either direction, one can find an apex at distance I or less from V[L;J] every ×/(-I)↓S steps around the circle. For example, Figure 7.22 shows the apexes of the CC banyan in Figure 7.11a arranged in a circle. The number in parentheses next to each apex is that apex's distance from V[L;0]. This pattern of numbers simply may be rotated clockwise J places to determine distances from any other apex V[L;J].
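A minimal sketch of the apex distance rule follows. The function name is hypothetical, and the divisor is read as the product of S with its last I components dropped; that reading is an assumption about the APL expression, chosen because it gives distance 0 only for identical apexes and distance at most L for every pair.

```python
from math import prod


def cc_apex_distance(j1: int, j2: int, s: tuple) -> int:
    """Smallest I in 0..L such that the product of s with its last I
    components dropped divides j2 - j1 (assumed reading of the rule)."""
    levels = len(s)
    for i in range(levels + 1):
        divisor = prod(s[:levels - i])     # all of s for I = 0, empty product 1 for I = L
        if (j2 - j1) % divisor == 0:
            return i
    raise ValueError("unreachable: 1 divides every difference")


# Distances from apex 0 for S = 2 2 2, listed for apexes 0 through 7.
pattern = [cc_apex_distance(0, k, (2, 2, 2)) for k in range(8)]
```

With S = 2 2 2 the pattern is 0, 3, 2, 3, 1, 3, 2, 3, and the distances from any other apex J are the same pattern rotated J places, as the text describes.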
Figure 7.22. Apex Distances for CC Banyan in Figure 7.11a (apexes V[L;0] through V[L;7] arranged in a circle, each labeled in parentheses with its distance from V[L;0])
SECTION 8
BANYAN NETWORK SIMULATIONS
Although theoretical analysis of banyan structures has been fruitful in many respects, it has thus far failed to yield good quantitative measures of partitioning flexibility. Consequently, a number of banyan networks were simulated on a digital computer in order to study network performance characteristics and to assess the effects of certain design options.
The tests performed were intended primarily for comparing the effects of design variations on the partitioning flexibility of a banyan network. The simulated test conditions were not based on any particular application or job mix. They were designed to exercise a network's partitioning capabilities thoroughly, but in a conceptually simple manner. As will be explained later, these test conditions tended to be contrived "worst case" conditions in several respects and probably were more severe than normal conditions in any practical application.
The simulations tested the ability of networks to connect randomly selected partitions of system resources. Statistics were gathered concerning the numbers of parallel or multiplexed layers (Section 4.5) required and concerning the number of subsystems connected in each layer. The nature of these simulations will be described in greater detail in Section 8.1.
The simulation results, tabulated in Appendix D and discussed in Section 8.2, demonstrate how performance measures for a banyan network tend to be affected by its size, fanout-spread, and structure type and by certain control options. As will be explained in Section 9, these simulation results also indicate that banyan networks could have significant cost-performance advantages over crossbar-based networks in large systems.
8.1 Nature of Simulations
Kinds of networks simulated. Both SW and CC banyan partitioning networks were simulated. All simulated networks were regular and rectangular and had fanout-spread parameters ranging from 2 to 8. The number of bases ranged from 4 to 256, but due to computer time limitations, most tests were performed using networks with at most 64 bases.
Apex selection rules. As was discussed in Section 5.1, Theorem 2.1.7 suggests that more subsystems might be connected in a given layer if apexes for new subsystems were selected as near as possible to apexes used for existing subsystems. To assess the significance of this selection criterion, two apex selection rules were used, one of which tended to select new apexes far from those in use and the other of which tended to select new apexes near to those in use. We call these the "far rule" and the "near rule", respectively.
Both selection rules were simple, fixed-priority rules. The only difference was the way in which selection priorities were assigned to apexes.
The far rule simply selected the leftmost eligible apex, assuming that a network was laid out in the usual manner as illustrated in Figures 4.31b, 4.41, 4.42, 5.11, 6.22g, 6.22h, 6.22i, and 7.11a.
For example, if the far rule were applied to the SW banyan in Figure 6.22g, apex 0 would be first choice, apex 1 would be second choice, etc. Apex 15 would be selected only if it were the only apex eligible. Thus, with apexes numbered in this manner, apex I-1 would be the Ith choice according to the far rule. This rule tended to select apexes very distant from those already in use, because consecutive choices generally were the most widely separated apexes in terms of apex distance.
In contrast, the near rule tended to select new apexes close to those already in use, because its consecutive choices tended to be the nearest apexes in terms of apex distance. With apexes numbered in the conventional manner, apex number ⌽(S⊤(I-1)) would be the Ith choice according to the near rule, where S is the spread vector; that is, the digits of I-1, written in the mixed radix S, are reversed to give the digits of the apex number. For example, the apexes of the SW banyan in Figure 6.22g would be assigned priorities as shown below in Table 8.11. This rule would simply select the leftmost eligible apex if the banyan were laid out as shown in Figure 8.11. Similarly, the near rule would select apexes of the CC banyan in Figure 7.11b according to the priorities listed in Table 8.12. This would be equivalent to selecting the leftmost eligible apex if the CC banyan were laid out as shown in Figure 8.12.
Note that our apex numbering conventions are such that the same far and near rules are applicable to both SW and CC banyans.
TABLE 8.11. Application of Near Apex Selection Rule to the Banyan in Figure 6.22g
CHOICE NUMBER (I)    S⊤(I-1)    ⌽(S⊤(I-1))    APEX NUMBER
 1                   0000       0000           0
 2                   0001       1000           8
 3                   0010       0100           4
 4                   0011       1100          12
 5                   0100       0010           2
 6                   0101       1010          10
 7                   0110       0110           6
 8                   0111       1110          14
 9                   1000       0001           1
10                   1001       1001           9
11                   1010       0101           5
12                   1011       1101          13
13                   1100       0011           3
14                   1101       1011          11
15                   1110       0111           7
16                   1111       1111          15
TABLE 8.12. Application of Near Apex Selection Rule to the Banyan in Figure 7.11b
CHOICE NUMBER (I)    S⊤(I-1)    ⌽(S⊤(I-1))    APEX NUMBER
1                    00         00            0
2                    01         10            3
3                    10         01            1
4                    11         11            4
5                    20         02            2
6                    21         12            5
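The two priority tables above can be regenerated with a short sketch. This is an illustration under stated assumptions, not the simulation program itself: the helper names are hypothetical, S is taken to be 2 2 2 2 for the 16-apex SW banyan of Figure 6.22g, and the reversed digits are interpreted in the reversed radix, which is the reading that reproduces both tables.

```python
from math import prod


def encode(value: int, radix: tuple) -> list:
    """Digits of value in the mixed radix given by radix, most significant first."""
    digits = []
    for r in reversed(radix):
        value, d = divmod(value, r)
        digits.append(d)
    return digits[::-1]


def decode(digits: list, radix: tuple) -> int:
    """Inverse of encode: evaluate mixed-radix digits, most significant first."""
    value = 0
    for d, r in zip(digits, radix):
        value = value * r + d
    return value


def far_rule_order(s: tuple) -> list:
    """Far rule priorities: the Ith choice is apex I-1."""
    return list(range(prod(s)))


def near_rule_order(s: tuple) -> list:
    """Near rule priorities: reverse the mixed-radix digits of I-1."""
    reversed_radix = tuple(reversed(s))
    return [decode(encode(i, s)[::-1], reversed_radix) for i in range(prod(s))]


def select_apex(order: list, eligible: set):
    """Fixed-priority selection: first apex in order that is eligible, else None."""
    for apex in order:
        if apex in eligible:
            return apex
    return None
```

For example, `near_rule_order((3, 2))` gives the apex column of Table 8.12, and `select_apex` applied to that order returns apex 3 when only apexes 2, 3, and 5 are eligible.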
Setup rules. Two setup rules were simulated, the standard setup rule described in Section 4.4 and a modified rule in which the "trunk" portion of a tree-shaped connection was disconnected immediately after setup. The standard rule sometimes produced tree-shaped connections, like that denoted by heavy lines in Figure 8.13a, in which no branching existed at the apex or root. In such cases, the portion of a connection between the apex and the highest-level branch point was superfluous once the connection had been established. The modified setup rule
Figure 8.11. Redrawn Version of SW Banyan in Figure 6.22g (apex and base indices both appear from left to right in the order 0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15)

Abstract of Dissertation Presented to the Graduate Council of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
BANYAN NETWORKS FOR PARTITIONING MULTIPROCESSOR SYSTEMS
By
Louis Rodney Goke
June, 1976
Chairman: Gerald J. Lipovski
Major Department: Electrical Engineering
There is a strong and growing need for switching structures suitable for interconnecting numerous processors and other resource modules in large, general purpose computing systems. For this purpose, "banyan" network structures are defined and analyzed with the use of graph theory, and their cost-performance characteristics are compared with those of alternative networks.
Techniques are proposed for utilizing banyan networks in large, general purpose, partitionable systems containing numerous microprocessors or other resources. A banyan network used in this manner can partition system resources into a wide variety of task-oriented subsystems and, when necessary, can be multiplexed to realize any possible partition. Techniques are also presented for constructing banyan networks in modular form and for controlling them in a rapid and potentially fault-tolerant manner using distributed logic in the networks themselves.
Banyan partitioning networks are shown to have significant advantages over alternative crossbar, or multiple-bus, structures for use in large systems. It is shown that banyan cost functions tend to grow more slowly with network size and that banyan networks can be expanded without limit using fixed-fanout devices. Statistical simulation results are presented indicating that banyans have a potential cost-performance advantage over large crossbar-based partitioning networks and that this advantage tends to increase with network size.
Graph theory and APL vector operations are used in characterizing theoretical banyan properties. Various subclasses of banyans are defined, providing a taxonomy of network structures and permitting the derivation of additional useful properties. The analysis is oriented towards the use of banyans as partitioning networks, but it is noted that a variety of networks proposed previously for other purposes are structurally equivalent to special cases of banyans, suggesting that the theory of banyan graphs could have much broader applications.
SECTION 1
INTRODUCTION
1.1 The Trend Towards Numerous-Module Systems
There is reason to believe that large data processing systems of the future will tend more and more to contain numerous small resource modules instead of a few large ones. For example, a computing facility requiring a large amount of processing power might contain a number of miniprocessors or microprocessors instead of a single large processor. Similarly, its primary memory might be provided by numerous low capacity modules rather than a few high capacity modules.
It has long been recognized that modular multiprocessor systems have major potential advantages over single processors in the areas of throughput, reliability, availability, and expandability. Potential throughput advantages are obvious in applications which lend themselves to parallel processing. By automatically reassigning tasks so as to bypass faulty modules, excellent system reliability and availability can be achieved, even if individual module failures are common. Furthermore, it is possible to expand the capacity of a highly modular system quickly and efficiently by simply adding more modules.
Economically, the attractiveness of multiple-processor architectures has grown with progress in integrated circuit technology. With early component technologies, large processors tended to provide the most computing power for the money, and the high costs of processors tended to make numerous-processor systems prohibitively expensive. Accordingly, early multiprocessor systems tended to contain small numbers of processors and generally were constructed only as research projects or for special applications where adequate performance could not be obtained with conventional machines (Comptre, 74).
When MSI technology and volume production made miniprocessors available for several thousand dollars each, systems with multiple miniprocessors began to be viewed as alternatives to large, single-processor, time-shared systems. These miniprocessors found considerable use as elements of geographically distributed, special purpose systems, and efforts were undertaken to develop reconfigurable, general purpose, multi-miniprocessor systems at two major universities (Baskin et al., 69, 72; Wulf and Bell, 72). Simple cost-effectiveness measures published recently by Bhandarkar and Juliussen (75) indicate that multi-miniprocessor systems with up to several tens of processors could have cost-effectiveness advantages over comparable single-processor computers.
More recently, the availability of extremely low cost LSI microprocessors has accelerated interest in multiple-processor architectures, and the trend is expected to continue (Searle and Freberg, 75). Microprocessors became recognized for their excellent cost-performance potential shortly after they were introduced (Schultz et al., 73), and improvements in both cost and performance have progressed rapidly. General purpose processors are now available for several tens of dollars, and substantial further reductions are projected. The excellent cost-effectiveness of these processors suggests that multiple-microprocessor systems are destined to replace large single-processor computers in a number of applications.
The cost, size, and power requirements of microprocessors are now low enough that systems with hundreds or even thousands of processors are reasonable to consider. In fact, a general purpose computer containing up to 512 microprocessors already is being marketed by one manufacturer (Frank, 75).
The trend towards numerous small modules is not limited to processors. Advancing LSI technologies are now challenging core technology for primary memory applications. MOS/LSI memories are beginning to compete with core memories in speed and cost; and faster, but more costly, bipolar memories are finding applications in high-performance machines such as the Texas Instruments Advanced Scientific Computer. Before long, charge coupled devices may be widely used where higher capacity and lower speed are required. Unlike core technology, the newer LSI semiconductor technologies are well suited to the fabrication of numerous small modules.
Clearly, strong incentives now exist for developing general purpose multiprocessor systems containing large numbers of resource modules. Potentially, at least, such systems could have significant advantages over large single-processor computers in the areas of throughput, reliability, availability, expandability, and cost.
1.2 Problems with Previous Interconnection Schemes
Although there are good reasons for developing general purpose multiprocessors with numerous modules, most previously used interconnection schemes are practical only for small or specialized systems. Problems with module interconnection schemes used previously will be discussed briefly in this section. More detailed surveys of these schemes have been published by Comptre (74) and by Searle and Freberg (75).
Crossbar switching structures can support high data rates but are not practical for interconnecting large numbers of modules. The number of "contacts", or switching devices, needed for a crossbar increases with the square of the number of modules connected to it, making the crossbar prohibitively expensive for very large systems. Since the fanout of switching devices in a crossbar increases linearly with the number of resource modules, this too can be a serious problem in large systems, especially if expandability is not to be limited.
A single time-shared bus can provide flexible, inexpensive communication among a small number of modules, but bus contention problems make this approach impractical for large systems. As the number of modules on the bus increases, bus utilization increases, causing the resource modules to waste more and more of their time waiting for a non-busy bus.
Multiple time-shared busses can be used to alleviate bus contention problems, but this configuration has problems similar to those of a crossbar. In fact, the switching devices which enable each module to be connected with any bus are arranged in a crossbar configuration. Since the maximum data rate that can be handled by a bus is fixed, the number of busses required grows proportionally with the total number of modules, and the number of switching devices required grows as the square of this number.
Both single and multiple time-shared busses have fanout problems in large systems, because each module on a bus must be capable of driving all other modules attached to that bus. Thus, as is the case with a crossbar, fanout requirements grow linearly with the number of modules. In a very large system, fanout limitations can be overcome by dividing each bus into segments interfaced by bidirectional amplifiers, but this further increases network cost and increases the time required for signals to propagate across a bus.
"Multiport" systems, in effect, use crossbar structures to interconnect various classes of modules, and hence, they have cost and fanout problems like those of crossbars.
A number of vector and array organizations have been devised in which each resource module (usually containing a processor with memory) can communicate directly with only a fixed number of "nearest neighbors." These organizations generally can be extended to very large sizes without severe cost or fanout problems but are not well suited to general purpose use. Since each module can communicate directly with only a few neighbors, considerable overhead is required whenever logical data-flow patterns do not correspond closely with the hardware interconnection pattern. It has been found, generally, that efficient software is difficult to produce for these machines unless the application "fits" the architecture. Also, when a module fails, there may be no efficient way for another module to take its place, since the substitute module is not likely to have the same neighbors as the module that failed. The ILLIAC IV (Barnes et al., 68) is typical of this kind of organization.
Various distributed architectures have been used successfully in dedicated-function applications but are not well suited to large-scale, general purpose use. Efficiency in a large distributed system usually derives from the fact that each module or subsystem performs a specialized function and needs to communicate with other modules or subsystems only in very limited ways known at design time. This, of course, is not feasible in a general purpose system, which must be designed for a variety of applications not necessarily known at design time.
Thus, interconnection schemes used previously in multiple-processor systems generally are undesirable for general purpose systems with very large numbers of modules, because their cost and fanout functions grow too rapidly with system size, because performance degrades with system size, or because they perform well only in specialized applications.
1.3 Banyan Partitioning Networks
A class of connecting networks suitable for interconnecting large numbers of resource modules in a general purpose, multiple-processor system will be defined and analyzed in this dissertation. These networks, called banyans, will be analyzed for their ability to partition the resources of a modular system into task-oriented subsystems.
The work presented here concentrates on the use of banyans as partitioning networks, because this mode of operation is particularly applicable to large, general purpose, multiple-processor systems. Banyan networks also can be used in other ways, but non-partitioning applications are beyond the scope of this dissertation and will be discussed only in relating this to previous work.
Banyan partitioning networks offer a great deal of flexibility for general purpose use and have major practical advantages for use in large systems. They can economically partition the resources of large modular systems into a wide variety of subsystems. Any possible partition can be realized by paralleling several networks or by multiplexing a single network in a manner to be described later. Banyans are potentially much more economical than crossbar-based structures for large systems, because their "cost" functions increase more slowly with system size and also because they have easily satisfied fanout requirements that are independent of system size. Results will be given indicating that a cost-performance advantage over crossbar-based structures can be achieved for large systems and that a crossbar structure actually can be considered a non-optimal special case of a banyan structure. Propagation delays through a banyan network grow only logarithmically with system size, and high intra-subsystem data transfer rates can be sustained regardless of
SECTION 2
PARTITIONING NETWORK CONCEPTS AND REQUIREMENTS
The purpose of this section is to explain how a partitioning network could be used in a general purpose, multiple-processor system and to identify requirements this application would be likely to impose on a partitioning network. A partitionable system architecture will be presented in Section 2.1. The ways in which a system of this type could be applied to different classes of problems will be discussed in Section 2.2. Requirements imposed on a partitioning network by this kind of use will be identified in Section 2.3.
2.1 Architecture of a Partitionable System
The basic architecture of a partitionable data processing system is illustrated in Figure 2.11. The system contains a number of resource modules, such as processors, memory modules, mass storage devices, I/O devices, or even complete computer systems. Each module is connected to a partitioning network through one or more ports. These ports generally would be bidirectional so that a module could both transmit and receive signals through the same port. For example, an input device might receive requests for data and then transmit the data requested.
Figure 2.11. Basic Architecture of a Partitionable System
The purpose of the partitioning network is to establish necessary communication paths by connecting ports together to form subsystems. For each subsystem, the partitioning network, in effect, provides a separate time-shared bus to which all ports in that subsystem are connected. Thus, the partitioning network partitions the ports of system resource modules into subsystems, and a particular resource module may belong to as many subsystems as it has ports.
The connections established by the partitioning network typically would be controlled by an operating system or executive and would be modified automatically to accommodate different job needs. The operating system, like any other job, could be executed by a set of resource modules linked as a subsystem.
To facilitate operating system functions, some facility for passing system control messages outside the partitioning network would probably be needed. For example, user subsystems might send messages to the operating system in order to request additional resources, to release resources no longer needed, to signal job completion, or to request other changes in system configuration. Similarly, the operating system might direct system configuration changes by sending commands to the partitioning network, and it might send messages to user subsystems in order to control execution and to acknowledge service requests. Control messages such as these would likely be short and infrequent but generally should be transmitted quickly. Communication of this sort could be provided by a single time-shared bus or other simple facility linking together all processor modules and the control inputs of the partitioning network as illustrated in Figure 2.12. This facility for passing system control messages would tend to be a critical, but relatively inexpensive, part of a large partitionable system; so the straightforward use of redundant hardware in this facility could prevent it from limiting system reliability and would add little to overall system cost.
Figure 2.12. A Partitionable System with Special Bus for Control Messages
The architecture presented here has a number of attractive features. By effectively providing a separate bidirectional bus for each subsystem, the partitioning network allows data transfers within each subsystem to take place at high rates and with little delay. Single time-shared busses are now widely used for interconnecting resource modules in small computing systems, and a variety of devices now on the market can be interfaced in this manner. Security problems in a multiprogramming environment are greatly simplified by the fact that disjoint subsystems readily can be established to execute independent jobs. Since each subsystem bus functions independently from all others, subsystems or sets of interconnected subsystems cannot interfere with each other unless they share one or more resource modules. Excellent potential exists for achieving high system reliability and availability, because each module potentially can be connected directly with any set of other modules by grouping them together as a subsystem. Thus, if a module failed, the partitioning network could connect an equivalent spare module in its place, and the repaired subsystem could continue processing as efficiently as before.
The flexibility of this architecture makes it attractive for general purpose use. As will be shown in Section 2.2, it is suitable for most applications of large computers and, for most purposes, could be programmed in a very straightforward manner.
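The partitioning idea described in this section can be illustrated with a small data-structure sketch. Everything here is hypothetical and chosen only for illustration: a port is modeled as a (module name, port index) pair, a partition as a dictionary from subsystem names to sets of ports, and the helper names are not taken from the dissertation.

```python
from typing import Dict, Set, Tuple

Port = Tuple[str, int]   # (module name, port index); a hypothetical encoding


def validate_partition(subsystems: Dict[str, Set[Port]]) -> None:
    """Raise ValueError if any port is assigned to more than one subsystem."""
    seen: Set[Port] = set()
    for name, ports in subsystems.items():
        clash = seen & ports
        if clash:
            raise ValueError(f"ports {sorted(clash)} reused by subsystem {name!r}")
        seen |= ports


def subsystems_of(subsystems: Dict[str, Set[Port]], module: str) -> Set[str]:
    """Names of all subsystems that contain at least one port of the module."""
    return {name for name, ports in subsystems.items()
            if any(m == module for m, _ in ports)}


def swap_in_spare(subsystems: Dict[str, Set[Port]], name: str,
                  failed_module: str, spare_port: Port) -> None:
    """Replace every port of a failed module in one subsystem with a spare port."""
    kept = {p for p in subsystems[name] if p[0] != failed_module}
    subsystems[name] = kept | {spare_port}
    validate_partition(subsystems)
```

In this model a two-port memory module can appear in two subsystems, one port in each, which is how subsystems come to share a resource module, and a failed processor can be replaced by a spare without disturbing any other subsystem.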
2.2 Utilization of a Partitionable System

The system architecture described in Section 2.1 is extremely flexible and could be used in many of the application areas currently dominated by large single-processor computers. It allows each job to be executed by a set of resource modules interconnected to emulate a system architecture appropriate for that job. Conventional batch jobs and small real-time jobs could be executed by isolated subsystems, each configured to function as a single-processor computer dedicated to its own job. More demanding jobs encountered in many real-time and near-real-time applications could be handled by sets of subsystems linked via shared resources to form distributed processing networks or parallel processing arrays. Potentially, any of these configurations could coexist with others and could be assembled, disassembled, or modified under operating system control to satisfy changing job requirements. In the remainder of this section, several basic techniques will be discussed for applying a partitionable system to common types of data processing problems.

The simplest way to use a partitionable system is to configure an isolated subsystem for each job. An isolated subsystem is one whose resource modules are functionally independent from those of other subsystems. Except for possible communication with a central operating system, each isolated subsystem would function independently as a separate computer system. Different numbers and types of resource modules could be assigned to different subsystems according to job requirements, and if necessary, modules of a subsystem could be added or deleted by the operating system in response to service calls from the subsystem.

Figure 2.2-1 shows a typical isolated subsystem which might be used to execute a small batch job. The subsystem bus connects the
[Figure 2.2-1. Example of an Isolated Subsystem: a subsystem bus connecting I/O files, a processor with memory, and supplemental memory.]
necessary resource modules so that they can function together as a small single-processor computer. Although the resource modules are functionally independent from those of other subsystems, some of them may be physically combined with those of other subsystems. For example, the module labeled "I/O Files" might actually be one of several ports to a larger file management system, which makes its ports appear to be independent by restricting the files accessible through each port.

An isolated subsystem could be used effectively for any job not requiring a large amount of processing power. Batch programs and small real-time programs could be written in a conventional manner and could be executed by single-processor subsystems so that, in effect, each program would have its own small computer. High system throughput could be achieved by executing many jobs concurrently with different subsystems.

Multiple processors could be interconnected for jobs too demanding for a single processor. This might be the case in certain real-time or near-real-time applications or for very large batch jobs that would take too long to execute otherwise. In some cases, two or more processors could be assigned to an isolated subsystem, but severe bus contention problems would be likely if many processors shared the same subsystem bus.

A more practical approach for unusually demanding jobs would be to link two or more subsystems together via shared resource modules as illustrated in Figure 2.2-2. A number of multiport modules might be included in a system for this purpose. For example, a number of multiport memory modules with arbitration hardware could be provided. To increase versatility, the memory areas accessible through each port could be limited
[Figure 2.2-2: two subsystem busses, each with its own dedicated modules, linked through a shared resource module.]
by registers associated with the port so that each port could be used as a separate memory module when shared memory was not required.

Each set of linked subsystems could be configured to function as a distributed network or parallel processing array suited to the needs of its associated job. This technique would enable application programmers to run unusually demanding jobs on partitionable systems, and also could provide a fast, economical means for system designers to emulate special-purpose distributed networks and array processors prior to construction.

Figure 2.2-3 illustrates how a set of subsystems might be linked for distributed processing in a simple near-real-time application. The subsystems are cascaded in assembly-line fashion with each subsystem performing a different kind of processing function. Thus, throughput capacity of the distributed system is enhanced by pipelining. A data-flow diagram of the distributed system is shown in Figure 2.2-3a, and the organization of resource modules into linked subsystems is shown in Figure 2.2-3b. The first subsystem collects data from a real-time source, does some initial processing on it, and writes the results into a buffer in shared memory. The second subsystem reads data from this buffer, applies a signal processing algorithm, and writes its results into a second buffer. The third subsystem takes data from the second buffer, arranges it into the desired output format, and writes it to a display device.

Distributed configurations like this, which process data in assembly-line fashion, should be relatively easy to design and program. Data-flow diagrams for more complex systems could be designed in a systematic, top-down manner much as sequential programs are in certain "structured
[Figure 2.2-3: a) Data-Flow Diagram, with data collection, signal processing, and display processing nodes linked by buffers; b) organization of resource modules into linked subsystems.]
programming" methodologies. Once a data-flow structure has been fully defined, each processing node could be implemented with a single-processor subsystem programmed like a conventional computer. For example, a subsystem which continually reads data from one buffer and writes its results into another could be programmed as if it were a conventional computer reading data from an input device and writing results to an output device. Any mutual exclusion protocols required for accessing shared buffers could be built into I/O routines and need not necessarily concern the application programmer. These basic design principles could be applied even to complex systems in which some processing nodes and buffers might have multiple inputs and outputs.

The practicability of distributed-system design is evidenced by the growing use of distributed processing in specialized data processing systems. Continuation of this trend, no doubt, will lead to improved design techniques and to more widespread familiarity with distributed systems among designers and programmers.

Subsystems also could be linked for parallel processing of array data. With this approach, a number of subsystems would be interconnected in a regular pattern, and each subsystem would perform the same processing function on a different data stream. One of many possible array processing configurations is shown in Figure 2.2-4. In this example, four subsystems are linked in a rectangular pattern for multiplying a pair of large matrices. A (2xI) by J matrix A is to be multiplied times a J by (2xK) matrix B to obtain a (2xI) by (2xK) matrix Q. The matrices A and B initially are segmented and loaded into the shared memory modules as shown. Each subsystem then computes one
[Figure 2.2-4. Example of Subsystems Linked for Array Processing: four subsystem busses, each with a processor and memory, linked through shared memory modules holding the row segments of A and the column segments of B; each subsystem's own memory holds one quadrant of Q.]
quadrant of the resulting matrix Q and stores it in the memory module shown. By working in parallel, the four processors should be able to perform the matrix multiplication about four times as fast as a single processor.

A number of fixed-configuration array processors have been proposed but generally have been built only as research projects or for specialized applications.[1] There is no doubt that these machines can perform certain kinds of computations at extremely high speed, but efficient software for most of them has been notoriously difficult to produce except for specialized applications where the problem "fits" the machine. A frequent source of programming difficulty and software inefficiency is the fact that processing elements are interconnected in a fixed or nearly fixed configuration, which may not have the best dimensions or interconnection pattern for a given problem. In such situations, one must either use the available processing array inefficiently or employ complicated software techniques to "fit" the problem to the machine.

[1] A number of existing array processors have been surveyed by Comptre (74).

Parallel array processing in a partitionable system similarly could provide very high speed array computation but need not involve some of the software problems associated with existing array processors. A processing array in a partitionable system could be configured with whatever interconnection pattern and dimensions were needed for a particular problem. Consequently, relatively straightforward software could be executed with high efficiency by array processing subsystems.

In a partitionable system, parallel array processing and other forms of processing could be used together easily and efficiently for
any job with varied computational requirements. For example, a parallel processing array for signal processing might be used as a processing node of a larger distributed system configured for a real-time application; or a large batch job might involve several steps, each of which would run with a different configuration of subsystems. Such flexibility is desirable because there are many applications in which parallel array processing would be useful only for part of the required computations.

In almost any practical situation, an operating system would be necessary for effective utilization of a partitionable system. The kinds of functions performed by such an operating system would be similar to those performed by a conventional multiprogramming operating system except that partitioning network control functions would be performed instead of time-sharing overhead functions. The overhead and complexity normally required for swapping tasks in and out of execution on a single processor would be unnecessary in many situations, because each job or task could be run to completion on its own subsystem or set of subsystems. The software complexity and overhead required for controlling a partitioning network would depend on the kind of network used. As will be shown later, much of the work required for controlling a banyan partitioning network can be performed very rapidly by distributed logic in the network itself.
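
As an illustration of the four-subsystem matrix multiplication described earlier in this section, the following Python sketch splits the product into four quadrant computations that do not depend on one another. It is only a sequential model under stated assumptions: the function names `matmul` and `quadrant_multiply` are hypothetical, matrices are plain lists of rows, and each quadrant product stands in for the work one subsystem would perform on its shared-memory segments.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def quadrant_multiply(a, b):
    """Compute Q = A x B as four independent quadrant products.

    A is (2*I) by J and B is J by (2*K). Each matmul call below models one
    subsystem reading one row segment of A and one column segment of B.
    """
    i = len(a) // 2       # I: rows in each row segment of A
    k = len(b[0]) // 2    # K: columns in each column segment of B

    a_top, a_bottom = a[:i], a[i:]
    b_left = [row[:k] for row in b]
    b_right = [row[k:] for row in b]

    # The four products share no intermediate results, so four processors
    # could compute them concurrently.
    q11 = matmul(a_top, b_left)       # subsystem 1
    q12 = matmul(a_top, b_right)      # subsystem 2
    q21 = matmul(a_bottom, b_left)    # subsystem 3
    q22 = matmul(a_bottom, b_right)   # subsystem 4

    # Reassemble the quadrants into the full result matrix.
    top = [left + right for left, right in zip(q11, q12)]
    bottom = [left + right for left, right in zip(q21, q22)]
    return top + bottom
```

Because each quadrant needs only one segment of each input matrix, every shared memory module in such an arrangement is read by exactly two subsystems, which is consistent with the pairwise linking described for this example.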
2.3 Requirements of a Partitioning Network

Partitionable systems have a number of attractive features for a wide range of large-computer applications, but the practicability of this approach depends on the availability of a suitable class of partitioning networks. In this section, we will identify properties that a class of partitioning networks should have for use in partitionable systems.

Cost. Reasonable cost is perhaps the most obvious practical requirement. Generally, a network will be economically practical so long as its cost per resource module port is below some application-dependent limit. Since network cost per port increases with the number of ports, this effectively limits the maximum network size that is economically practical. Thus, in order for a class of networks to be practical for large systems, its cost function should grow slowly with system size. Actual network cost, of course, depends on many factors, including component technology and packaging techniques. For comparing network structures, however, it is common practice to use the number of "contacts", or switching devices, required as a measure of network cost.

Fanout. All major families of electronic switching devices have fanout limitations; that is, each device is capable of driving only a limited number of similar devices. Thus, the number of other devices to which each switching device in a network is connected should grow very slowly, or not at all, with system size in order for a network structure to be practical for large systems.

Bidirectional switching. Data paths, and hence the devices used for switching them, must be bidirectional in a partitioning network so that data can be transferred from any resource module to any other resource module connected to the same subsystem bus. Some bidirectional
electronic switching devices suitable for this purpose have been described by Vice et al. (73). Some additional bidirectional switching circuits using standard TTL and ECL gates are described in Appendix C.

Priority hardware. Whenever several devices communicate over a bidirectional, time-shared bus, some mechanism is needed to prevent more than one device from trying to transmit on the same line at the same time. Priority hardware built into a bus is probably the fastest and most desirable mechanism for arbitrating simultaneous requests for bus use. For this reason, priority hardware is likely to be needed in a partitioning network to arbitrate conflicting requests for use of subsystem busses. The ease with which suitable priority hardware can be built into a partitioning network is, thus, an important consideration.

Speed requirements. There are three basic response times of interest in a partitioning network: the time required to rearrange connections (probably one subsystem at a time), the time required for a resource module to gain control of its subsystem's bus, and the rate at which a module can transfer data over this bus after obtaining control.

The time required to rearrange connections in a partitioning network depends largely on the complexity of the control algorithms involved and the extent to which these algorithms can be performed by hardware in the network itself, as opposed to sequential execution in an external processor. In a partitionable system used as described in Section 2.2 above, a subsystem generally would exist long enough for numerous messages to be transferred within that subsystem. Consequently, the time required to establish a new subsystem might be substantially greater than a typical message transfer time without significantly degrading overall performance,
especially if new subsystems can be established without disrupting communication in existing subsystems. In a large system with many subsystems, however, frequent reconfiguration may be necessary even if the average subsystem life is long. In this case, it may be necessary that new subsystems be connected very quickly or that they be connected without disrupting communication in existing subsystems or both. This is an application-dependent issue. Fast, simple control algorithms clearly are more desirable than slow, complicated ones, but the importance of this depends largely on the frequency with which the system must be reconfigured.

The time required for a resource to request and receive bus control (assuming that the bus is available) depends primarily on the speed of priority hardware used to arbitrate bus control requests. Since this must be done prior to each transmission, propagation delays in the priority hardware can significantly affect the rate at which short messages can be transmitted. In designing partitioning networks for large expandable systems, one must be careful that neither the propagation delay nor the cost of priority hardware grows unreasonably with system size.

The maximum rate at which a resource module can transfer data after gaining control of its subsystem's bus tends to be inversely proportional to the number of bidirectional switches through which the signal propagates on its way through the network. If the network is multiplexed in a manner to be described later, this maximum rate also tends to be inversely proportional to the number of multiplexed "layers". The maximum rate referred to here is the limit imposed by the partitioning network. The rate at which data is actually transferred, of course, is
limited by the resource modules also. The maximum data transfer rate allowed by the partitioning network should be high enough not to unduly slow down the resource modules. For large expandable systems, the propagation delay and the number of multiplexed layers (if used) should grow slowly with system size.

Fault tolerance. A major advantage of partitionable systems is their potential for fault tolerance. If this potential is to be realized, a system must be able to tolerate hardware failures in its partitioning network as well as in its resource modules. Thus, a partitioning network should be able to continue functioning to at least some extent in spite of limited hardware failures. It is desirable that there be more than one possible way in which to connect any given subsystem, because otherwise a single failure in the network could make certain subsystems impossible to connect, even if no demands are placed on the network by other subsystems. It also should be possible to employ control algorithms adaptable enough to bypass faulty portions of the network when establishing new subsystems.

Modularity and expandability. Modularity and expandability also are advantages of partitionable systems, and it is desirable that a partitioning network share these properties. To minimize production cost and to facilitate maintenance, it should be possible to build a partitioning network by connecting together a number of identical modules, perhaps supplied by a manufacturer as "off-the-shelf" items. It also should be possible to expand a partitioning network such that the old network becomes part of the new one instead of being replaced by it.

Partitioning flexibility. Ideally, we would like for a partitioning network to be able to partition system resource ports into subsystems
in any conceivable way, but complete flexibility in this regard may be unnecessary in practice. Since greater flexibility generally requires greater cost and complexity, it is useful to determine just how much flexibility really is needed in a partitioning network. The ways in which a network actually needs to be able to partition a system depend on the kinds of system resource modules and on the ways in which the system is to be used.

In a practical system, certain kinds of subsystems might be incapable of performing any useful function and, hence, need never exist. For example, a subsystem containing only memories might be unable to function. Similarly, for certain kinds of multiport modules, it may be pointless to ever connect two or more ports of the same module to the same subsystem bus.

In applications requiring only isolated subsystems, such as batch execution of conventional programs, a partitioning network should be able to configure any reasonable subsystem by itself but need not necessarily be able to configure all reasonable combinations of subsystems. If subsystems required for a particular set of jobs cannot all be configured at the same time, then some of the jobs simply can be executed at different times. Thus, jobs can be scheduled to avoid conflicting demands on a partitioning network just as they must be scheduled to avoid conflicting requirements for other system resources. Rescheduling jobs because of partitioning network limitations might result in less efficient resource module utilization, but would allow all jobs to execute eventually, so long as the network could configure each subsystem individually. Isolated subsystems only need to coexist sufficiently for efficient utilization of resource modules.
Linked subsystems, on the other hand, interact during execution and hence must exist concurrently. Consequently, greater partitioning flexibility is likely to be required if a system is to accommodate large sets of linked subsystems. Additional flexibility for configuring arbitrary sets of linked subsystems can be achieved either by inherent properties of a network structure or by multiplexing or paralleling certain less flexible network structures in a manner that will be described later.
SECTION 3
SOME ALTERNATE REALIZATIONS OF PARTITIONING NETWORKS

Two types of partitioning networks, based on crossbars and permutation networks, respectively, will be described in this section. These networks are presented for their conceptual significance in relating partitioning networks to other structures and also to provide a basis of comparison for the banyan networks described in following sections. As will be explained, the networks described in this section have certain characteristics which tend to make them impractical for very large systems.
3.1 Crossbar Networks

The network shown in Figure 3.1-1a is perhaps the most straightforward partitioning structure.[1] It contains a number of busses, which are linked with all of the resource modules by bidirectional switching devices. Partitioning is accomplished by assigning one bus to each subsystem and connecting resources to them accordingly. For N resource modules, up to $\lfloor N/2 \rfloor$ busses may be required since this is the maximum number of nontrivial subsystems possible at any one time. A subsystem with only one resource module is trivial in the sense that it need not use the partitioning network for intra-subsystem communication.

[1] This structure is equivalent to the "multiple, system-wide, functionally and physically nondedicated busses" described by Thurber et al. (72).

Figure 3.1-1b is a graph representing the structure of this network. This representation of network structure is similar to that used by Benes (62). It uses vertices to represent data busses, or links, and uses edges to represent the switching devices, or "contacts", connecting them. Graph representations of this kind will be used with other structures later. The network shown is represented by a bipartite graph with an edge connecting each bus with every resource module. Graphically, this structure is equivalent to a crossbar switch.

Crossbar partitioning networks have a simple, regular structure. They also have potentially low propagation delay for data transmission since data must propagate through only two switches regardless of network size. Propagation delay for priority hardware, however, would grow logarithmically with N, assuming the use of methods similar to those described by Foster (68). Although faster priority hardware is possible,
[Figure 3.1-1. Crossbar Partitioning Network: a) Block Diagram, with data busses linked to the resource modules by switching devices; b) Graph Representation, with data busses and resource modules as vertices.]
substantial improvement would most likely be prohibitively expensive in very large systems.

The principal drawbacks of large crossbar networks are their cost and fanout requirements. A network with $L = \lfloor N/2 \rfloor$ busses for partitioning N resource modules would require $N \times \lfloor N/2 \rfloor$ switches. Thus, the cost in terms of switching devices required tends to grow as the square of N. Since each switching device in this network is connected to $N-1$ similar devices on the same bus, the fanout required of the devices tends to grow linearly with N. Similarly, each resource module port is connected to L switches; so the fanout capability required of resource modules grows linearly with N also.
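
The growth rates stated above can be tabulated with a short Python sketch. The function name `crossbar_requirements` and its return layout are assumptions made for illustration; the formulas are the ones given in this section, with the number of busses taken as the floor of N/2.

```python
def crossbar_requirements(n):
    """Return cost and fanout figures for a crossbar partitioning network.

    n is the number of resource modules. The result is a tuple of
    (busses, switches, switch_fanout, module_fanout).
    """
    busses = n // 2              # maximum number of nontrivial subsystems
    switches = n * busses        # one switch per (module, bus) pair
    switch_fanout = n - 1        # other switches sharing the same bus
    module_fanout = busses       # switches attached to each module port
    return busses, switches, switch_fanout, module_fanout


if __name__ == "__main__":
    for n in (8, 64, 1024):
        print(n, crossbar_requirements(n))
```

Doubling N roughly quadruples the switch count while doubling both fanout figures, which is the quadratic cost growth and linear fanout growth noted above.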
3.2 Permutation Networks

It is possible to build a partitioning network from a permutation network by supplying the external links shown in Figure 3.2-1. A permutation network can connect, in pairs, a set of input terminals to a set of output terminals of equal size so that any desired permutation of inputs onto outputs can be realized. These connections allow transmission in either direction when bidirectional switches are used in the network. In the configuration of Figure 3.2-1, the network permutes the set of resource modules onto itself, allowing connected subsystems to correspond to the cycles of the permutation. By choosing a permutation with the appropriate cycles, any desired partition can be connected.

This result is theoretically significant because it implies that a minimal partitioning network for N resource modules requires no more switching devices than an N-input, N-output permutation network. It has been shown that when N is a power of 2, such a permutation network can be built with as few as $4 \times ((N \times \log_2 N) - N + 1)$ switching devices (Goldstein and Leibholz, 67; Joel, 68; Waksman, 68). Thus, the cost of this network tends to grow as $N \times \log N$, a substantial improvement over that of a crossbar for large N. Further, the fanout required of switches in such networks is independent of N. This too is a substantial improvement over crossbar structures.

The partitioning structure of Figure 3.2-1 is of limited practical value, however, for several reasons. Propagation delay tends to be excessive for data transmission in large subsystems. A signal in a subsystem bus connecting I resource modules may have to propagate through the network as many as $\lfloor I/2 \rfloor$ times to reach its destination. Each time through, it must propagate through as many as $(2 \times \log_2 N) - 1$ switching
[Figure 3.2-1. Permutation Network Used as a Partitioning Network: a permutation network whose terminals are linked externally to the resource modules.]
devices in a minimal-cost permutation network. Network reconfiguration would be hampered by the complexity of control algorithms and by the fact that existing connections may have to be rerouted whenever a new subsystem is added. The difficulty of incorporating priority hardware into this structure also appears to be a serious drawback.
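
The correspondence between partitions and permutation cycles described in this section can be made concrete with a small Python sketch. All names here (`permutation_for_partition`, `cycles`, `permutation_network_switches`, `worst_case_switches_traversed`) are hypothetical, modules are numbered from 0, and the cost and delay functions simply evaluate the expressions quoted in this section under the assumption that N is a power of 2. The sketch does not model the internal routing of a permutation network.

```python
def permutation_for_partition(n, subsystems):
    """Build a permutation of range(n) whose nontrivial cycles are the
    given subsystems; modules not listed map to themselves."""
    perm = list(range(n))
    for members in subsystems:
        for pos, module in enumerate(members):
            # Each module is routed to the next member of its subsystem.
            perm[module] = members[(pos + 1) % len(members)]
    return perm


def cycles(perm):
    """Decompose a permutation into its cycles."""
    seen, result = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, node = [], start
        while node not in seen:
            seen.add(node)
            cycle.append(node)
            node = perm[node]
        result.append(cycle)
    return result


def permutation_network_switches(n):
    """Switch count 4 * (N * log2(N) - N + 1), for N a power of 2."""
    log_n = n.bit_length() - 1
    return 4 * (n * log_n - n + 1)


def worst_case_switches_traversed(n, subsystem_size):
    """Up to floor(I/2) passes, each through up to 2*log2(N) - 1 switches."""
    log_n = n.bit_length() - 1
    return (subsystem_size // 2) * (2 * log_n - 1)
```

For example, partitioning eight modules into the subsystems {0, 3, 5} and {1, 2} yields a permutation whose only nontrivial cycles are those two sets, with the remaining modules left as trivial one-module cycles.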
SECTION 4
BANYANS

Banyan networks, named for the East Indian fig trees of somewhat similar structure, are defined in terms of their graph representations in Definition 1.1.[1] A banyan graph is a Hasse diagram of a partial ordering in which there is one and only one path from any base to any apex. A base is defined as any vertex having no arcs incident into it, an apex is any vertex with no arcs incident out from it, and all other vertices are called intermediates. When a banyan is used as a partitioning network, its bases are connected to resource modules, but its apexes and intermediates are internal to the network. Some examples of banyans are shown in Figure 4-1. We use a directed graph representation because it is useful for specifying the structure and its control algorithms, but the switching devices represented by edges are still bidirectional. Frequently, we will omit the arrow heads from banyan graph diagrams and let it be understood that all arcs point up.

Useful properties common to all banyan partitioning networks will be presented in Sections 4.1 through 4.5. The general class of banyan networks is quite broad, but it is expected that most useful banyan partitioning networks will be included in the more specialized categories described in Sections 5 through 7.

[1] All definitions and theorems discussed in this paper appear in Appendix B.
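
The defining property of a banyan, one and only one path from any base to any apex, can be checked mechanically. The Python sketch below is a minimal illustration, not the formal definition of Appendix B: the function name `is_banyan` and the arc-list representation are assumptions, the graph is assumed to be acyclic with every arc pointing upward, and only the unique-path condition is tested.

```python
from functools import lru_cache


def is_banyan(arcs):
    """Check the unique-path property for a graph given as (lower, upper) arcs.

    Assumes the arcs form an acyclic graph with all arcs pointing upward.
    """
    vertices = {v for arc in arcs for v in arc}
    above = {v: [] for v in vertices}
    has_incoming = set()
    for lower, upper in arcs:
        above[lower].append(upper)
        has_incoming.add(upper)

    bases = [v for v in vertices if v not in has_incoming]  # no arcs incident into
    apexes = [v for v in vertices if not above[v]]          # no arcs incident out

    @lru_cache(maxsize=None)
    def path_count(vertex, apex):
        """Number of distinct upward paths from vertex to apex."""
        if vertex == apex:
            return 1
        return sum(path_count(nxt, apex) for nxt in above[vertex])

    return all(path_count(b, a) == 1 for b in bases for a in apexes)


# Hypothetical examples: a simple tree, and the same tree with an extra arc.
TREE = [("A", "X"), ("B", "X"), ("C", "Y"), ("D", "Y"), ("X", "Z"), ("Y", "Z")]
NOT_BANYAN = TREE + [("A", "Y")]  # A now reaches Z through both X and Y
```

With these examples, `is_banyan(TREE)` holds, while `is_banyan(NOT_BANYAN)` fails because base A has two paths to apex Z. A graph in which some base cannot reach some apex fails as well, since its path count is zero.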
[Figure 4-1. Examples of Banyans: a) Irregular Banyan; b) L-Level Banyan.]
4.1 Tree-Shaped Connections

In a banyan the data path established to connect the resource modules of any subsystem always forms a tree rooted at some apex. By definition, there is a unique path from each base to each apex. A subsystem bus is formed by selecting an apex and then closing all switches along the path from each desired base to the selected apex. Since each path is unique, the resulting data path forms a tree rooted at the selected apex (Th. 1.1 1). Algorithms for locating eligible apexes and establishing the connections will be presented in Section 4.4.

Tree-shaped connections are significant because they lend themselves well to the inclusion of priority hardware and because they can afford low propagation delay with limited fanout. A method for building priority hardware into a banyan partitioning network will be described next. Propagation delay and fanout requirements of certain types of banyans will be discussed in subsequent sections.
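
The construction described in this section, closing every switch on the path from each desired base to a selected apex, can be sketched in Python as follows. This is an illustration only, not the control algorithm of Section 4.4: the names `connect_subsystem` and `TWO_LEVEL` are hypothetical, the example graph is an assumed small two-level banyan with four bases, and a plain depth-first search stands in for whatever path-finding the network hardware would use.

```python
def connect_subsystem(arcs, bases, apex):
    """Return the set of arcs (switches) to close so that the given bases
    are joined by a data path rooted at the selected apex."""
    above = {}
    for lower, upper in arcs:
        above.setdefault(lower, []).append(upper)

    def path_to_apex(vertex):
        """Upward path from vertex to the apex as a list of arcs, or None."""
        if vertex == apex:
            return []
        for nxt in above.get(vertex, []):
            rest = path_to_apex(nxt)
            if rest is not None:
                return [(vertex, nxt)] + rest
        return None

    closed = set()
    for base in bases:
        path = path_to_apex(base)
        if path is None:
            raise ValueError(f"no path from {base} to {apex}")
        closed.update(path)
    return closed


# Assumed example: a two-level banyan with bases b0..b3 and apexes a0..a3.
TWO_LEVEL = [
    ("b0", "m0"), ("b0", "m1"), ("b1", "m0"), ("b1", "m1"),
    ("b2", "m2"), ("b2", "m3"), ("b3", "m2"), ("b3", "m3"),
    ("m0", "a0"), ("m0", "a2"), ("m1", "a1"), ("m1", "a3"),
    ("m2", "a2"), ("m2", "a0"), ("m3", "a3"), ("m3", "a1"),
]

if __name__ == "__main__":
    closed = connect_subsystem(TWO_LEVEL, ["b0", "b2", "b3"], "a0")
    touched = {v for arc in closed for v in arc}
    # A connected graph with one fewer arc than vertices is a tree.
    print(len(closed), len(touched))
```

In this example five switches are closed and six vertices are touched, so the connection is a tree rooted at a0, in line with the property stated above.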
4.2 Priority Hardware in Tree-Shaped Data Paths

The need for priority hardware in a partitioning network was described in Section 2.3. The tree-shaped nature of subsystem busses in a banyan network allows suitable priority hardware to be built into the network using the basic approach outlined in this section. Implementation will not be discussed in detail since a number of variations are possible.

Although designed for use in banyan partitioning networks, the technique described here is applicable to any tree-shaped data path, and is somewhat similar to that described by Foster (68). It allows priority levels to be associated with requests for bus control by resource modules and, in the event of simultaneous requests, grants control to the module with the highest priority request. Various tie-breaking schemes are possible.

In the proposed priority scheme, each resource module desiring control of its tree-shaped subsystem bus transmits a "bus request signal" apexward toward the tree's root along a set of "bus request lines." A bus request signal is an encoded number representing the priority level of the corresponding request for bus control. Modules not desiring bus control transmit bus request signals at priority level zero. Priority hardware in the network compares these signals and sends "request denial signals" to all resource modules except the one to which it grants bus control.

Figure 4.2-1 illustrates how the priority scheme would function. Since the priority hardware for each subsystem functions independently, only one subsystem bus is illustrated. A, B, C, and D are the bases, or resource module ports, included in this subsystem. X and Y are intermediate vertices used in the corresponding tree-shaped connection,
[Figure 4.2-1. Example of Priority Vie.]
and Z is the apex at its root. Solid lines in the diagram represent bus request lines and dashed lines represent "request denial lines," which are used to convey request denial signals. Numbers beside these lines indicate the signal values they carry in the example.

To simplify the explanation, we assume that bus request and request denial lines are physically distinct from those used to convey data. These lines are switched by switching devices in the banyan just as are the data lines so that their connection patterns are correspondingly tree-shaped. They differ from the data lines, however, in that they transfer signals in only one direction and interface with priority hardware at each intermediate and apex vertex. Bus request signals originate at bases and propagate only in an apexward direction. Request denial signals are generated by the priority hardware at apex and intermediate vertices and propagate baseward.

Priority hardware at each intermediate vertex, such as X or Y, compares its incoming bus request signals and forwards the maximum on to the vertex above. It also generates a request denial signal, represented by a logical "1", on all but one of the request denial lines below it. The request denial line over which it sends no denial signal, logical "0", corresponds to the incoming bus request signal with highest priority. Ties may be broken in various ways. In the example shown, the rightmost branch with highest request priority is selected. Other tie-breaking schemes are possible, however. For example, a fairer but more complex scheme might be to select the branch whose past requests have been least recently granted.

When an intermediate receives a request denial signal from the vertex above, however, it transmits request denial signals to all
vertices below it in the connection regardless of their request priorities. An apex, such as Z, functions exactly as an intermediate does except that it can neither receive request denials from nor transmit bus requests to a vertex above.

In the example of Figure 4.2-1, bases A, B, and C are making bus requests at priority levels 3, 3, and 2, respectively. Base D does not desire bus control and consequently produces a request signal at level zero. Intermediates X and Y compare their incoming request signals and forward the maximums to apex Z. The apex compares these requests, sends a "request granted" signal (logical "0") to the branch with highest request priority (X), and sends a request denial signal (logical "1") to the other branch (Y). Vertex Y transmits denial signals to both C and D because it receives a denial from Z above. Vertex X may grant the request of either A or B since they have tied for highest priority. In this case, the tie is broken by granting B's request.

In explaining this priority scheme, we have assumed the existence of physically separate lines for data, bus request signals, and request denial signals. This allows the priority vie to be performed in parallel by combinational logic with a worst-case propagation delay approximately proportional to the longest path length from a base to the apex in the connection. Further, it allows the priority vie for one message transfer to overlap data transmission from the previous one. This could be desirable to achieve high transfer rates for short messages.

For applications in which lower rates are acceptable, however, serial implementations using fewer lines may be more economical. A number of variations are possible. For example, the same lines might
be used for the priority vie as for data transfers. During a priority vie, the bus request and request denial lines would function as described. Once the new bus master has been selected, the network, or at least that part of it used by a given subsystem, would change its operating mode and would allow these lines to function as a bidirectional data path during the message transfer.

Another possibility is to perform the priority vie sequentially in two phases. First, the bus request signals would be transmitted apexward as described. Actual transmission of these signals could be either serial or parallel. During this phase, a small register is set in each intermediate and apex vertex to record the branch from which the highest priority request was received. These registers then contain sufficient information to control routing of the request denial signals. For the second phase, the operating mode of the priority hardware is changed and request denial signals are propagated baseward using one of the lines previously used for the bus request signals.
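The two-phase vie described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not a hardware design: the function name `priority_vie`, the dictionary encoding of the tree, and the left-to-right ordering of branches are all hypothetical. The request values are those of the Figure 4.2-1 example.

```python
def priority_vie(tree, requests, apex):
    """Model one priority vie on a tree-shaped bus.

    tree     -- maps each apex/intermediate vertex to its branches, left to right
    requests -- maps each base to its request priority (0 = no request)
    Returns the request denial signal seen by every base (1 = denied, 0 = granted).
    """
    chosen = {}  # register in each vertex: branch with the highest request

    def forward(v):
        # Phase 1: bus request signals propagate apexward; each vertex
        # forwards the maximum and records the branch it came from.
        if v not in tree:               # v is a base
            return requests[v]
        best = -1
        for branch in tree[v]:
            level = forward(branch)
            if level >= best:           # ">=" makes the rightmost branch win ties
                best, chosen[v] = level, branch
        return best

    denial = {}

    def deny(v, denied):
        # Phase 2: request denial signals propagate baseward. A vertex that is
        # itself denied denies every branch below it.
        if v not in tree:
            denial[v] = 1 if denied else 0
            return
        for branch in tree[v]:
            deny(branch, denied or branch != chosen[v])

    forward(apex)
    deny(apex, False)
    return denial


# Assumed encoding of the example: Z is the apex, X and Y are intermediates.
tree = {"Z": ["X", "Y"], "X": ["A", "B"], "Y": ["C", "D"]}
requests = {"A": 3, "B": 3, "C": 2, "D": 0}
denial = priority_vie(tree, requests, "Z")
print(denial)
```

With the rightmost tie-breaking rule, base B is the only base that receives no denial, in agreement with the example in the text. Note that this sketch always leaves exactly one base undenied, even when every request is at level zero.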
4.3 Synthesizing Large Banyans from Smaller Ones

Large banyan networks can be synthesized recursively from smaller ones. Suppose that one has available a number of small banyan networks, perhaps supplied by a manufacturer as standard components, and one wishes to synthesize a larger network. This can be done as illustrated in Figure 4.3-1a by connecting the apexes of some banyans to the bases of others.

The interconnections of these banyans can be represented by a graph, as illustrated in Figure 4.3-1b. In this graph, called an interconnection graph, each vertex represents a banyan network. An arc from any vertex V1 to another vertex V2 means that one apex of banyan V1 is directly connected to one base of banyan V2. We assume that if there are any arcs incident into a vertex, then the corresponding banyan has exactly one base for each incident arc. Similarly, the number of apexes equals the number of arcs incident out from the corresponding vertex unless there are none. When there are no arcs incident into a vertex, the bases of the corresponding banyan become bases of the synthesized network. Similarly, the apexes of the synthesized network are those of the component banyans with no arcs incident out.

Theorem 1.2.3 states that when banyan networks, called the component banyans, are interconnected as described, the resulting synthesized network will be a banyan if and only if the corresponding interconnection graph is a banyan. This implies that once one or more banyan structures are known, these structures can be recursively expanded to arbitrarily large sizes. Using this principle, one might construct a large banyan network by systematically interconnecting a number of smaller component
[Figure 4.3-1. Banyan Synthesis: a) Synthesized Network, b) Interconnection Graph]
banyan networks in a pattern which is itself a banyan. Suitable component banyans could be manufactured as standardized modules. The SW structure, discussed later, is characterized by applying this kind of recursive expansion to a crossbar, which is one of the simplest banyan structures.
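The synthesis principle can be checked on a small case with the following Python sketch. It assumes the defining property of a banyan to be that exactly one path joins each base to each apex, and it models "connecting an apex to a base" by merging the two vertices. All names (`crossbar`, `synthesize`, `is_banyan`, the vertex labels) are hypothetical and chosen only for illustration.

```python
from itertools import product


def crossbar(name, n_bases, n_apexes):
    """A crossbar as (arcs, bases, apexes); arcs point from base to apex."""
    bases = [(name, "b", i) for i in range(n_bases)]
    apexes = [(name, "a", j) for j in range(n_apexes)]
    return [(b, a) for b in bases for a in apexes], bases, apexes


def synthesize(components, links):
    """Merge components; each link (apex, base) wires an apex to a base."""
    merged = dict(links)                      # apex vertex -> base vertex
    arcs = [(merged.get(lo, lo), merged.get(hi, hi))
            for comp_arcs, _, _ in components for lo, hi in comp_arcs]
    wired_bases = set(merged.values())
    bases = [b for _, bs, _ in components for b in bs if b not in wired_bases]
    apexes = [a for _, _, aps in components for a in aps if a not in merged]
    return arcs, bases, apexes


def is_banyan(arcs, bases, apexes):
    """True if exactly one apexward path joins every base to every apex."""
    up = {}
    for lo, hi in arcs:
        up.setdefault(lo, []).append(hi)

    def paths(v, apex):
        if v == apex:
            return 1
        return sum(paths(w, apex) for w in up.get(v, []))

    return all(paths(b, a) == 1 for b, a in product(bases, apexes))


lower = [crossbar(f"L{i}", 2, 2) for i in range(2)]
upper = [crossbar(f"U{j}", 2, 2) for j in range(2)]

# Interconnection graph is itself a 2 x 2 crossbar: every L_i feeds every U_j.
good_links = [((f"L{i}", "a", j), (f"U{j}", "b", i))
              for i in range(2) for j in range(2)]
# Interconnection graph is not a banyan: both apexes of L_i feed the same U_i.
bad_links = [((f"L{i}", "a", j), (f"U{i}", "b", j))
             for i in range(2) for j in range(2)]

good = synthesize(lower + upper, good_links)
bad = synthesize(lower + upper, bad_links)
print(is_banyan(*good), is_banyan(*bad))
```

The first wiring yields a network with four bases and four apexes that passes the check; the second creates two paths between some base-apex pairs and none between others, so it fails.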
4.4 Control of Connections

The tree-shaped subsystem connections in a banyan network can be established very rapidly and in a potentially fault-tolerant manner using distributed control hardware within the network. Control is accomplished by means of a setup algorithm, which establishes a subsystem connection, and a search algorithm, which locates eligible apexes. In setting up the first subsystem, any apex may be used as the root of its tree-shaped subsystem bus. Prior to setting up each additional subsystem, however, a search algorithm must be employed to select an apex such that the new connection will not interfere with those already existing.

A two-step setup algorithm is illustrated in Figure 4.4-1 and is justified theoretically in Theorem 1.3.1. Setup is facilitated by a single control line provided in each link of the network. First, a "one" signal is broadcast baseward from the selected apex over the control line, as illustrated in Figure 4.4-1a. The signal fans baseward at each vertex so that the "one" propagates to all bases. This signal sets a flip-flop in each intermediate and apex through which it passes. In the second step, "ones" are broadcast apexward from each base in the desired subsystem, as illustrated in Figure 4.4-1b. In this step, the signal is OR'ed apexward at each vertex. As illustrated in Figure 4.4-1c, the desired connection is made by closing every switch that receives this signal from below and has a set flip-flop in the adjacent vertex above. These are the links through which control signals propagated in both steps one and two.

As described, this setup algorithm would require two steps but only one control line in each link. Unlike the data lines, this control
[Figure 4.4-1. Setup Algorithm: a) Step 1 (broadcast from the selected apex), b) Step 2 (broadcast from the selected bases), c) Final Connection]
line is always connected between vertices and does not require a bidirectional switch for each edge of the graph. Switching for the control line occurs at the vertices, where the signal is either OR'ed up or OR'ed down. With the use of two control lines rather than one, the two steps of this algorithm could be combined into one, eliminating the need for the flip-flop at each intermediate and apex.

A two-step search algorithm is illustrated in Figure 4.4-2 and is justified theoretically in Theorem 1.3.2. The purpose of the search algorithm is to locate those apexes which are suitable for connecting a given subsystem (i.e., set of bases) without interfering with any existing connections. Two subsystem connections can interfere if and only if they have some vertex in common.

In the example illustrated, the circled vertices represent those already in use, and bases 3 and 6 are to be connected as a new subsystem. As shown in Figure 4.4-2a, control signals are first broadcast apexward simultaneously from all bases in the desired subsystem and are then OR'ed upward using the same control line used in setup. During this step, a flip-flop is set in every intermediate and apex which receives this control signal and is already in use.

In the second step, illustrated in Figure 4.4-2b, the control signals from the bases are turned off, and each vertex with a set flip-flop broadcasts a "one", which is OR'ed apexward on the same line used in step one. All apexes not receiving a "one" during this step are eligible. Final selection could then be performed by a priority circuit attached to the apexes.
[Figure 4.4-2. Search Algorithm: a) Step 1, b) Step 2 (eligible apexes marked); bases are numbered 0 through 7]
Steps one and two of this algorithm, like those of the setup algorithm, could be combined using a second control line. With four control lines, search and setup could all be combined in one step.

Another possible variation is to provide no dedicated control lines at all and use instead the same lines that are used for data and priority signals within a subsystem. Since these lines must be treated differently by the network, the entire network could be switched between two modes as needed to facilitate either reconfiguration or intrasubsystem communication. This implementation of the control algorithms could reduce the required pin counts in network hardware, but, unlike other implementations, would require that communication be temporarily suspended in all existing subsystems whenever new ones are being set up. The most desirable control algorithm implementation would depend on the cost-performance tradeoffs of a particular application.

The control algorithms described are inherently fault-tolerant when faulty vertices in the network can be made to appear like those already in use to the search algorithm. New connections would then be routed around faulty portions of the network just as they are routed around those portions already in use. Using the search algorithm described, however, a portion of the control circuitry in faulty cells would still need to function properly in order to make them appear like those in use. A slower search algorithm that avoids this problem has been described by Lipovski (70). Alternatively, a software search algorithm could replace the faster hardware algorithm in the event of hardware failure.
4.5 Parallel and Multiplexed Networks

Search and setup algorithms may be repeated until all desired subsystems have been connected or until no more eligible apexes can be found. Practical banyan networks allow numerous combinations of subsystems to be configured in this way but are not necessarily capable of realizing all possible partitionings of system resources. As was discussed in Section 2.3, this degree of partitioning flexibility may suffice for certain kinds of applications. When necessary, a banyan network can always configure conflicting isolated subsystems at different times.

When greater partitioning flexibility is required, there are two solutions which potentially will allow configuration of all possible partitionings. First, several banyans can be connected in parallel. The parallel networks would function independently but their bases would be connected to the same set of resource modules. As many subsystems as possible would be connected using the first network. Those left over would be connected in as many additional networks as required.

The other solution is to multiplex a single network so that it periodically rearranges itself to connect first one set of subsystems, then another, and so on, so that each subsystem has some time slot during which it can communicate. A partitioning network, as considered here, acts as a rearrangeable set of time-shared busses. A resource module attached to the network must request and receive control of its bus before transmitting data, and must be prepared to wait whenever the bus is not immediately available. Normally, the bus would be unavailable only when used by other resources in the same subsystem;
but should it ever become temporarily unavailable for other reasons, the only effect would be to delay data transmission within the subsystem. This situation makes multiplexing possible with little or no modification of the resource modules. The system need only be designed so that any resource not currently connected by the network would "see" it as a busy bus. Multiplexing requires that a small amount of memory be associated with each switch in the network to store the state of the switch during each time slot. With LSI, this could be done at reasonable cost by associating a small register with each switch and synchronizing all state changes from a central clock.

The techniques of parallel networks and multiplexing may be mixed to balance cost and performance. Whether a network structure is space-shared with parallel hardware or time-shared with multiplexing, the parallel networks and/or time slots share many properties and are called layers. The number of layers required depends on a number of factors and will be discussed later.

There also is a partial solution which could provide some increase in partitioning flexibility. Sometimes it might be possible to connect additional subsystems in a single layer by rearranging the connections of existing subsystems. This, however, would require more complex algorithms, would interrupt processing in existing subsystems during reconfiguration, and would provide only a limited increase in partitioning flexibility. Consequently, it is believed that parallel networks or multiplexing would be of much more practical value.
SECTION 5
L-LEVEL BANYANS

An L-level banyan, defined formally in Definition 2.1, is a banyan whose vertices are arranged in levels so that switches, or arcs of the graph, exist only between vertices in adjacent levels. For example, the graphs in Figures 4-1b, 4.4-1, and 4.4-2 are L-level banyans, but 4-1a is not. There are actually L+1 levels of vertices in an L-level banyan. They are, by convention, numbered apexward from 0 through L so that all bases are in level 0 and all apexes are in level L. When we say that a banyan has L levels, however, it will be understood that it is an L-level banyan rather than an (L-1)-level banyan.

The class of L-level banyans is a proper subset of the general banyan networks discussed in Section 4, but is still broad enough to include most practical designs. As will be explained in this section, L-level banyans have additional useful properties, which make them attractive as partitioning networks.

Any path from a base to an apex in an L-level banyan has exactly L arcs; thus, the propagation delay of data through the network cannot exceed that of 2×L switches, since in the worst case, data must travel from base to apex to base.

Base and apex "distance" functions can be associated with L-level banyans and will be discussed in Section 5.1. Theoretical results concerning these functions can be used to improve the performance of an L-level banyan network.
A class of L-level banyans called "uniform" banyans will be discussed in Section 5.2. Special cases of uniform banyans called "regular" and "rectangular" banyans also will be discussed. It will be shown that measures of the size and cost of a uniform banyan can be expressed as functions of certain parameters called "fanout" and "spread". The orderly structure of the networks discussed in Section 5.2 makes them likely candidates for modular construction, and it is expected that most practical designs would fall into these categories.
5.1 Base and Apex Distance

The dyadic operators δ_B and δ_A, called base distance and apex distance, respectively, are defined in Definition 2.1.2 for any L-level banyan. The base distance B1 δ_B B2 specifies the minimum number of levels up into the banyan a connection must extend to connect two bases B1 and B2. Similarly, the apex distance A1 δ_A A2 specifies the minimum number of levels down from the top of the banyan a connection must extend to connect two apexes A1 and A2.

Figure 5.1-1 illustrates the concepts of base and apex distance. The darkened paths represent minimal connections. The connection of apexes is presented only as a conceptual aid in explaining apex distance and would not actually occur in a banyan partitioning network.

The definitions of base and apex distance are extended to sets of bases and apexes, respectively, in the same way that point distances often are extended to sets of points. That is, the base distance between any two sets of bases SB1 and SB2 is defined to be the minimum of all distances B1 δ_B B2 such that B1 ∈ SB1 and B2 ∈ SB2. The analogous extension applies to apex distance.

Base and apex distance functions are used in Theorem 2.1.7 to characterize a way of avoiding conflicts in connections established within an L-level banyan. Theorem 2.1.7 tells us that if

L < (SB1 δ_B SB2) + (A1 δ_A A2),

then subsystems SB1 and SB2 can be connected without conflict in the same layer using tree-shaped connections rooted at apexes A1 and A2, respectively.
[Figure 5.1-1. Base and Apex Distances in an L-Level Banyan. In the example shown, the apex distance between A1 and A2 is 1 and the base distance between B1 and B2 is 2.]
There are two potentially useful interpretations of Theorem 2.1.7 which suggest ways of enhancing the performance of an L-level banyan partitioning network. First, subsystems close to each other place more stringent requirements on the separation of apexes used than do widely separated subsystems, suggesting that closely spaced subsystems are less likely to be connected in the same layer. Thus, if it is known at design time which resources of a system are most likely to be connected, one might improve performance by gerrymandering the assignment of resources to bases so that bases most likely to be connected tend to be closest. For example, each processor module might be placed close to some memory module port, and multiple ports to the same memory module might be widely separated. An operating system also could take advantage of this result by allocating closely spaced resource modules to a subsystem whenever possible. The amount of improvement thus obtainable is not estimated here since this would be highly problem dependent, but one can easily contrive extreme examples in which more than one layer would seldom or never be needed.

The second interpretation concerns the selection of apexes. The search procedure described earlier locates all apexes eligible for connecting a new subsystem in a partially occupied layer, but does not determine which of the eligible apexes is the best choice. Theorem 2.1.7 now suggests a plausible selection criterion. According to the theorem, any new subsystem can be connected if we can find some apex sufficiently distant from those already in use. Thus, apexes most distant from those in use are the most valuable in the sense that they are likely to be eligible for connecting the greatest variety of subsystems.
More subsystems might then be connected in a layer by selecting each new eligible apex so as to leave as many "valuable" apexes as possible for subsequent connections. This criterion is ambiguous in some cases, but, nevertheless, is the conceptual basis for a priority rule found to improve performance in network simulations discussed in Section 8.
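The distance functions and the sufficient condition of Theorem 2.1.7 can be illustrated with a short Python sketch. It assumes a hypothetical 2-level banyan with four bases and four apexes; the wiring function `up` and the names `base_distance`, `apex_distance`, and `surely_compatible` are illustrative and are not taken from the formal definitions.

```python
L = 2        # levels in the assumed example banyan
WIDTH = 4    # vertices per level in the assumed example banyan


def up(level, i):
    """Indices of the level+1 vertices joined to vertex i of this level
    (hypothetical wiring of 2 x 2 crossbars)."""
    return {i, i ^ (1 << level)}


def reach_up(base):
    """reach[k] = level-k vertices that can be reached from the base."""
    reach = [{base}]
    for level in range(L):
        reach.append(set().union(*(up(level, i) for i in reach[-1])))
    return reach


def reach_down(apex):
    """reach[k] = level-k vertices from which the apex can be reached."""
    reach = {L: {apex}}
    for level in range(L - 1, -1, -1):
        reach[level] = {i for i in range(WIDTH) if up(level, i) & reach[level + 1]}
    return reach


def base_distance(b1, b2):
    # Minimum number of levels up at which connections from b1 and b2 can meet.
    r1, r2 = reach_up(b1), reach_up(b2)
    return next(k for k in range(L + 1) if r1[k] & r2[k])


def apex_distance(a1, a2):
    # Minimum number of levels down from the top at which a1 and a2 can meet.
    r1, r2 = reach_down(a1), reach_down(a2)
    return next(k for k in range(L + 1) if r1[L - k] & r2[L - k])


def surely_compatible(sb1, a1, sb2, a2):
    """Sufficient condition: L < (set base distance) + (apex distance)."""
    set_distance = min(base_distance(x, y) for x in sb1 for y in sb2)
    return L < set_distance + apex_distance(a1, a2)


print(base_distance(0, 1), base_distance(0, 2), apex_distance(0, 2))
print(surely_compatible({0, 1}, 0, {2, 3}, 2))
```

For a general L-level banyan a False result does not by itself prove that the two connections conflict, since the condition is only sufficient for avoiding conflict. An operating system applying the first interpretation above could use `base_distance` to prefer closely spaced bases when allocating a subsystem.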
5.2 Fanout and Spread

A class of L-level banyans called uniform banyans is defined in Definition 2.2.1. Within each level of a uniform banyan, all vertices are alike in that each has the same number of arcs incident into it and the same number of arcs incident out from it. The arcs incident out from the vertices of a uniform banyan are characterized by an L-component vector F, called the fanout vector. Similarly, the arcs incident in are characterized by an L-component spread vector S. When S = F, the banyan has the same number of vertices in each level (Corollary 2.2.1b) and is called rectangular (Definition 2.2.2).

A regular banyan (Definition 2.2.3) is a special case of a uniform banyan, in which all vertices throughout the network are alike except, of course, for the fact that bases have no arcs incident into them and apexes have no arcs incident out. All components of a regular banyan's fanout vector, thus, are equal and can be characterized by a single scalar parameter F, called fanout. Similarly, all components of its spread vector are equal and are characterized by a spread parameter S.

Regular banyans probably would be the most economical to fabricate, because they can be built from a number of identical cells, each containing the circuitry associated with a vertex and the arcs incident into it. The fanout and fanin requirements of these cells are determined by F and S. Such modular construction could be used for other uniform banyans as well except that different kinds of cells might be needed for different levels.

Theorems 2.2.1 and 2.3.1 and their corollaries show how the numbers of arcs and vertices in various kinds of uniform banyans are related to fanout and spread. The expressions derived are summarized in Table 5.2-1. The total number of arcs in a banyan graph can be used as a measure of network cost since these arcs represent the bidirectional switching devices in the corresponding network.

It is shown in Theorem 2.4.2 that, for a given number of bases, the "cost" (number of arcs) per base of a regular rectangular banyan is minimized with respect to fanout when F = 3. Further, the cost of such a network is the same when F = 4 as it is when F = 2. Similarly, it is shown that a crude cost-performance measure, obtained by multiplying this cost function by a measure of maximum data propagation delay, is optimized when F = 7 and is very near optimal when F = 8. The cost and performance aspects of banyan partitioning networks will be discussed in greater detail in Section 9.
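The fanout figures quoted above can be reproduced numerically with the following sketch. It uses the arc count (F*L)×L×F for a regular rectangular banyan with F*L bases, so the cost per base is L×F. Two things are assumptions made here for illustration: L is treated as a real number so that different fanouts can be compared at the same number of bases N, and the delay measure is taken to be 2×L switch delays, following the worst-case path length mentioned in the introduction to Section 5. The function names are hypothetical.

```python
import math


def cost_per_base(F, N):
    """Arcs per base, L x F, for a regular rectangular banyan with N = F**L bases."""
    L = math.log(N) / math.log(F)   # real-valued L, used only for comparison
    return L * F


def cost_delay(F, N):
    """Cost per base multiplied by an assumed delay measure of 2 x L switches."""
    L = math.log(N) / math.log(F)
    return (L * F) * (2 * L)


N = 4096                            # arbitrary example size
fanouts = range(2, 13)
best_cost = min(fanouts, key=lambda F: cost_per_base(F, N))
best_cost_delay = min(fanouts, key=lambda F: cost_delay(F, N))

for F in (2, 3, 4, 7, 8):
    print(F, round(cost_per_base(F, N), 2), round(cost_delay(F, N), 1))
print(best_cost, best_cost_delay)
```

Under these assumptions the cost per base is proportional to F/ln F, which is smallest near F = e, and the cost-delay product is proportional to F/(ln F)², which is smallest near F = e² (about 7.4). This is consistent with the integer optima of 3 and 7 stated in the text, and with F = 2 and F = 4 giving equal cost.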
TABLE 5.2-1. Size and Cost Functions for Uniform Banyans

Type of Banyan        Vertices in Level I       Bases            Apexes           Arcs (Cost Measure)

Uniform               ×/(I↑S),I↓F               ×/F              ×/S              Σ[I=1..L] F[I]××/(I↑S),I↓F
                      (Th. 2.2.1)               (Cor. 2.2.1a)    (Cor. 2.2.1a)    (Th. 2.3.1)

Rectangular           ×/F                       ×/F              ×/F              (×/F)×+/F
                      (Cor. 2.2.1a & 2.2.1b)    (Cor. 2.2.1a)    (Cor. 2.2.1a     (Cor. 2.3.1a)
                                                                 & 2.2.1b)

Regular and           F*L                       F*L              F*L              (F*L)×L×F
Rectangular           (Cor. 2.2.1b & 2.2.1c)    (Cor. 2.2.1c)    (Cor. 2.2.1c)    (Cor. 2.3.1b)

Regular               (S*I)×F*L-I               F*L              S*L              F×Σ[I=1..L] (S*I)×F*L-I
                      (Cor. 2.2.1c)             (Cor. 2.2.1c)    (Cor. 2.2.1c)    (Cor. 2.3.1c)

Regular and           (S*I)×F*L-I               F*L              S*L              F×S×((S*L)-F*L)÷S-F
Nonrectangular        (Cor. 2.2.1c)             (Cor. 2.2.1c)    (Cor. 2.2.1c)    (Cor. 2.3.1d)
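The expressions for a general uniform banyan in Table 5.2-1 can be evaluated directly, as in the sketch below. The function names are hypothetical, and Python's zero-based indexing is used for the vectors, so the table's F[I] appears as F[I - 1]. The first example uses fanout vector (4, 2) and spread vector (3, 2), values chosen here on the assumption that they describe a network of two 4 x 3 crossbars feeding three 2 x 2 crossbars.

```python
from math import prod


def vertices_in_level(F, S, I):
    """×/(I↑S),I↓F: the first I spread components times the last L-I fanout components."""
    return prod(S[:I]) * prod(F[I:])


def arcs(F, S):
    """Sum over I = 1..L of F[I] times the number of vertices in level I."""
    L = len(F)
    return sum(F[I - 1] * vertices_in_level(F, S, I) for I in range(1, L + 1))


# Uniform, nonrectangular example (assumed values).
F, S = (4, 2), (3, 2)
levels = [vertices_in_level(F, S, I) for I in range(len(F) + 1)]
print(levels, arcs(F, S))

# Regular, rectangular example with fanout and spread 2 and L = 4.
F2 = S2 = (2, 2, 2, 2)
print(vertices_in_level(F2, S2, 0), arcs(F2, S2))
```

In the first example the arc count equals the total number of crosspoints in the assumed component crossbars (2 × 12 plus 3 × 4). In the second, the result agrees with the closed form (F*L)×L×F given in the table for regular rectangular banyans.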
SECTION 6
SW BANYANS

SW banyans (Definition 3.1.2) are a particularly interesting class of L-level banyans which can be synthesized recursively from crossbars (Definition 3.1.1) using the synthesis principle discussed in Section 4.3. SW banyans are probably the most attractive banyans for partitioning networks because they are the best understood theoretically, because they lend themselves exceptionally well to modular construction, and because they are a broad class of networks which can be varied in a number of ways to meet the needs of different applications. Additionally, the analysis of SW banyans might have much broader implications, because a number of connecting networks originally proposed for other applications are actually special cases of SW banyans.
6.1 Previous Special Cases

SW banyans are actually a generalization of a number of network structures considered previously for a variety of applications. The term "SW structure" was originally used by Lipovski (69, 70), who first proposed them for partitioning applications in a large associative processor. The structures he defined are equivalent to regular SW banyans, and the possibility of uniform (but nonregular) SW banyans was implied (Lipovski, 69). Structures graphically equivalent to regular rectangular banyans with fanout and spread equal to 2 had been proposed earlier by Batcher (68) for use as "bitonic sorters."

More recently, networks graphically equivalent to rectangular SW banyans were proposed by Lawrie (73, 75) for memory-processor communication. Lawrie defined these networks using "omega-base" representations of integers and analyzed them for his application using number theory. Lawrie also noted that interconnections between stages of an "omega network" (i.e., between levels of a rectangular banyan) are equivalent to the "perfect shuffle" connection discussed by Pease (68) and by Stone (71). Additional material on the control and applications of networks of this type was published by Lang and Stone (76).

A variety of permutation networks also have been proposed which contain special cases of SW banyans as major subgraphs. These networks are intended to permute a set of input lines onto an equal number of output lines in any desired fashion and were studied largely for telephone switching applications. Clos networks, proposed by Clos (53) and discussed further by Benes (62), contain three stages of crossbars interconnected symmetrically. If the last stage (or alternately the first, since the network is symmetrical) were removed from an n by m
by r Clos network, then the remainder would be a uniform 2-level SW banyan with fanout vector (n, r) and spread vector (m, r). Benes (64a, 64b, 65) analyzed a similar, but more general, class of permutation networks containing an odd number (not less than 3) of stages of crossbars interconnected symmetrically. These Benes networks, like Clos networks, are symmetrical about the center stage. If one were to remove all stages from one side of a Benes network, the remainder would be a rectangular SW network (or, equivalently, an omega network as noted by Lawrie (73)).

It was subsequently shown by Goldstein and Liebholz (67) and by Waksman (68) that certain crossbars can be removed from one side of a Benes network without destroying its ability to connect all possible permutations of input lines onto output lines. The other side of such a network (including its center stage) is identical to one side of a Benes network and hence is a rectangular SW banyan. Another class of permutation networks, called "nested tree" networks, was proposed by Joel (68) with little supporting theory. Each stage of a "nested tree" network is built from two-by-two crossbars and apparently is a regular, rectangular SW banyan with fanout and spread equal to 2.

Such common structures as crossbars and homogeneous trees¹ are also special cases of SW banyans. Crossbars are simply 1-level SW banyans and homogeneous trees are uniform SW banyans in which each component of the fanout vector equals 1.

¹ The term "homogeneous tree" is used here in the sense of Iverson (62, p. 58). All leaves of a homogeneous tree lie in the same level, and within each level, all vertices have the same degree.
We are presently concerned with SW banyans as partitioning networks, but this diversity of applications suggests that theoretical results concerning SW banyans also could be useful in other areas.
6.2 Structure

SW banyans (Definition 3.1.2) are defined recursively in terms of crossbars (Definition 3.1.1), which are simply 1-level banyans. All crossbars are SW banyans. Additionally, a synthesized banyan is an SW banyan if its interconnection graph is an SW banyan and its component banyans are all crossbars. Crossbars and synthesized SW banyans are the only SW banyans.

This definition of an SW banyan is somewhat simpler and more general than that published previously by the author (Goke and Lipovski, 73). Unlike the earlier definition, it does not necessarily require an SW banyan to be uniform.

Figure 6.2-1 illustrates the synthesis of an SW banyan. Figure 6.2-1a shows the interconnection of component crossbars, 6.2-1b is the corresponding interconnection graph, and 6.2-1c is the resulting synthesized SW banyan graph. Figure 6.2-2 shows some additional examples of synthesized SW banyans and their corresponding interconnection graphs, which, of course, are also SW banyans.

Properties of a synthesized SW banyan are related to those of its interconnection graph in a number of ways. All SW banyans are L-level banyans, and the number of levels in a synthesized SW banyan is one greater than the number of levels in its interconnection graph (Theorem 3.1.3). A uniform synthesized SW banyan with fanout vector F and spread vector S has a uniform interconnection graph with fanout vector 1↓F and spread vector (-1)↓S (Theorem 3.1.5). Similarly, a uniform SW interconnection graph with fanout vector F' and spread vector S' can be used to synthesize a uniform SW banyan with fanout vector B,F' and spread vector S',A, where B is the number of bases in each bottom-level component crossbar and A is the number of apexes in each top-level
[Figure 6.2-1. Synthesis of an SW Banyan: a) Interconnection of Crossbars (three 2 x 2 crossbars and two 4 x 3 crossbars), b) Interconnection Graph, c) Synthesized SW Banyan]
[Figure 6.2-2. Examples of SW Banyans: a) Nonuniform SW Banyan, b) Interconnection Graph of a, c) Rectangular SW Banyan, d) Interconnection Graph of c, e) Regular SW Banyan, f) Interconnection Graph of e]
[Figure 6.2-2 (Continued): g) Regular, Rectangular SW Banyan (bases and apexes numbered 0 through 15), h) Interconnection Graph of g, i) Interconnection Graph of h, j) Interconnection Graph of i]
component crossbar (Theorem 3.1.4). Also, the bases and apexes of a synthesized SW banyan can be mapped into those of its interconnection graph such that distances in the synthesized banyan are one greater than the corresponding distances in the interconnection graph (Theorems 3.1.6 and 3.1.7).

Synthesized SW banyans lend themselves especially well to modular construction. By definition, a synthesized SW banyan can be constructed by interconnecting crossbars in a pattern which is itself a simpler SW banyan. Crossbars are, thus, natural building blocks for SW banyans. A regular SW banyan has the advantage that its component crossbars are all identical. At most L different kinds of crossbar modules are needed for a uniform SW banyan with L levels. A manufacturer, thus, could mass produce a few standardized kinds of crossbar modules in sizes where crossbars are practical, and these modules could be interconnected in an SW pattern to produce larger networks.

We observe that it is also possible to synthesize large SW banyans by interconnecting smaller SW banyan modules in an SW pattern. For example, Figure 6.2-3 shows how an SW banyan equivalent to that of Figure 6.2-2g can be synthesized from eight component SW banyans interconnected in a crossbar pattern. In Figure 6.2-3, graphs of the component SW banyans are drawn in the usual manner using solid lines as arcs, and broken lines show how the component banyans are interconnected. Base and apex numbers correspond with those in Figure 6.2-2g. In this manner, SW banyan modules can be used as building blocks for larger SW banyan networks. This would be useful if, for example, available packaging technology made it desirable to use modules larger than the largest practical crossbar.
[Figure 6.2-3. Synthesis of an SW Banyan From Smaller Component SW Banyans]
6.3 Distance Properties

The base and apex distance functions of an SW banyan are metrics on its bases and apexes, respectively (Corollaries 3.2.3b and 3.2.4b). In an L-level SW banyan, each of these functions is characterized by L+1 equivalence relations with nested equivalence classes (Corollaries 3.2.3a and 3.2.4a and Theorems 3.2.5 and 3.2.6). Base distance is characterized by relations β_0, β_1, ..., β_L where B1 β_I B2 if and only if (B1 δ_B B2) ≤ I, δ_B being the base distance operator (Definition 3.2.1). Similarly, apex distance is characterized by relations α_0, α_1, ..., α_L where A1 α_I A2 if and only if (A1 δ_A A2) ≤ I, δ_A being the apex distance operator (Definition 3.2.2). The equivalence classes of these relations are listed in Table 6.3-1 for the SW banyan in Figure 6.2-2g.

In a uniform SW banyan with fanout vector F, the relation β_I partitions the network's bases into ×/I↓F equivalence classes with ×/I↑F elements each (Theorem 3.2.7). Similarly, with spread vector S, relation α_I partitions the network's apexes into ×/(-I)↓S equivalence classes with ×/(-I)↑S elements each (Theorem 3.2.8).

For reasons explained in Section 5.1, it is desirable to assign resources to bases such that resources most likely to be in the same subsystem are closest to each other. Thus, with an SW banyan, resources most likely to be in the same subsystem should be assigned to bases in the same small equivalence class. For example, suppose that eight processors and eight memory modules are to be attached to the 16 bases of the network in Figure 6.2-2g, and suppose it is known that a typical subsystem will require about as many processors as it does memories. Then chances are that more subsystems could be connected per layer if processors were attached to bases 0, 2, 4, ..., 14 and memories to bases 1, 3, 5, ..., 15 than if processors were attached to bases 0, 1, ..., 7 and memories to bases 8, 9, ..., 15.
TABLE 6.31. Base and Apex Equivalence Classes for SW Banyan in Figure 6.22g
Relation   Equivalence Classes of Bases
I = 0      [0] [1] [2] [3] [4] [5] [6] [7] [8] [9] [10] [11] [12] [13] [14] [15]
I = 1      [0 1] [2 3] [4 5] [6 7] [8 9] [10 11] [12 13] [14 15]
I = 2      [0 1 2 3] [4 5 6 7] [8 9 10 11] [12 13 14 15]
I = 3      [0 1 2 3 4 5 6 7] [8 9 10 11 12 13 14 15]
I = 4      [0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15]
Relation   Equivalence Classes of Apexes
I = 0      [0] [8] [4] [12] [2] [10] [6] [14] [1] [9] [5] [13] [3] [11] [7] [15]
I = 1      [0 8] [4 12] [2 10] [6 14] [1 9] [5 13] [3 11] [7 15]
I = 2      [0 8 4 12] [2 10 6 14] [1 9 5 13] [3 11 7 15]
I = 3      [0 8 4 12 2 10 6 14] [1 9 5 13 3 11 7 15]
I = 4      [0 8 4 12 2 10 6 14 1 9 5 13 3 11 7 15]
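The nesting shown in Table 6.31 can be reproduced with a short sketch. It assumes a regular SW banyan whose bases and apexes are numbered as in Figure 6.22g, so that base classes are runs of consecutive indices and apex classes are residue classes. That numbering pattern is read off the table rather than taken from the definitions, and the function names are illustrative only.

```python
from math import prod

def base_classes(fanout, i):
    """Classes of base relation i (bases whose mutual base distance is <= i)."""
    n = prod(fanout)                 # total number of bases
    size = prod(fanout[:i])          # elements per class: product of first i fanouts
    return [list(range(start, start + size)) for start in range(0, n, size)]

def apex_classes(spread, i):
    """Classes of apex relation i (apexes whose mutual apex distance is <= i)."""
    n = prod(spread)                         # total number of apexes
    count = prod(spread[:len(spread) - i])   # number of classes: drop last i spreads
    # Assumed numbering: two apexes share a class when they agree modulo `count`.
    return [[j for j in range(n) if j % count == r] for r in range(count)]

def distance(j1, j2, classes_for, vector):
    """Smallest relation index at which j1 and j2 fall in the same class."""
    for i in range(len(vector) + 1):
        if any(j1 in c and j2 in c for c in classes_for(vector, i)):
            return i

F = S = [2, 2, 2, 2]                 # 4 levels, fanout and spread 2, 16 bases
print(base_classes(F, 2))            # the four-element base classes
print(apex_classes(S, 1))            # the two-element apex classes
print(distance(0, 1, base_classes, F), distance(0, 1, apex_classes, S))
```

Under these assumptions bases 0 and 1 are as close as two distinct bases can be, while apexes 0 and 1 are as far apart as two apexes can be, which is the asymmetry the apex selection rules of Section 8.1 exploit.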
Base and apex distance functions are even more informative for SW banyans than they are for other L-level banyans. In an L-level banyan, these functions specify the minimum number of levels into the banyan that a connection must extend to connect two bases or apexes (Section 5.1). In an SW banyan, these functions also specify the maximum number of levels into the banyan that such a connection may extend before its two branches join. For example, if the distance between two bases of an SW banyan is 3, then any tree-shaped connection joining them will fork precisely at level 3 rather than just somewhere in level 3 or above. Accordingly, the necessary condition for a conflict in an L-level banyan (Theorem 2.1.7) is both a necessary and sufficient condition for a conflict in an SW banyan (Theorem 3.3.1).
SECTION 7
CC BANYANS
CC banyans (Definition 4.1.1) are a class of rectangular banyans which are potentially useful as partitioning networks. They differ from SW banyans in that multilevel CC banyans are not synthesized from smaller banyans. The distance functions of a CC banyan differ from those of an SW banyan in that bases or apexes appear to be arranged in a circle rather than in nested equivalence classes. The distance between two bases or apexes is then determined by their separation on the circle.
Relatively few examples of the CC banyan structure are known to exist in earlier networks. The "barrel switch" of the ILLIAC IV Processing Element is graphically equivalent to a 3-level, regular CC banyan with fanout and spread equal to 4. In this application, it is used to shift 64 bits an arbitrary number of places left or right (Davis, 69).
CC banyans also are related to the "line manipulator" networks proposed by Feng (74) for performing a variety of data manipulation functions. A line manipulator is not itself a banyan because it contains multiple paths from any given base to an apex, but it contains both a CC banyan and an SW banyan as partial graphs. These partial graphs are both regular, rectangular banyans with fanout and spread equal to 2.
7.1 Structure
A CC banyan (Definition 4.1.1) is rectangular (Theorem 4.1.3) and, hence, has the same number of vertices in each level. For convenient identification, we can index these vertices as V[I;J] with 0 ≤ I ≤ L and 0 ≤ J ≤ N-1, where V[I;0], ..., V[I;N-1] are the N vertices of level I. Hence, V[0;0], ..., V[0;N-1] are bases, and V[L;0], ..., V[L;N-1] are apexes. Let F[1], ..., F[L] be the fanout and spread vector of this rectangular banyan. Then from each vertex V[I;J] where 0 ≤ I < L, there is an arc to each of the vertices V[I+1;J], V[I+1;J⊕(×/I↑F)], V[I+1;J ...
Figure 7.11. Examples of CC Banyans. a) A CC Banyan with L = 3 and F = 2 2 2. b) A CC Banyan with L = 2 and F = 3 2.
7.2 Distance Properties
The base distance function in a CC banyan is characterized in terms of minimum circular distance. Consider the integers 0 through N-1 arranged in a circle as illustrated in Figure 7.21. The minimum circular distance between two numbers J1 and J2 is defined to be the minimum number of steps, either clockwise or counterclockwise, which separate J1 from J2 on this circle. For example, if N ≥ 4 then the minimum circular distance between N-1 and 1, taken in either order, is 2. This function is a metric on the integers 0 through N-1 (Theorem 4.2.3).
The distance between two bases of a CC banyan can be determined from the minimum circular distance between their indices. Any base distance, of course, must be an integer in the range 0 through L. For any integer I in this range, the base distance between two bases V[0;J1] and V[0;J2] will be equal to or less than I if and only if the minimum circular distance between J1 and J2 is less than ×/I↑F, where F is the fanout and spread vector of the CC banyan and where N = ×/F (Theorem 4.2.4). Thus, the base distance between V[0;J1] and V[0;J2] is simply the smallest value of I in the range 0 through L such that the minimum circular distance between J1 and J2 is less than ×/I↑F. Base distance is a metric on the bases of a CC banyan except possibly for degenerate CC banyans in which one or more components of F are less than 2 (Theorem 4.2.6).
It is apparent from this characterization that bases are closest in terms of base distance when their indices are closest in terms of minimum circular distance. Consequently, bases in a CC banyan can be thought of as arranged in a circle like the numbers in Figure 7.21. Hence, resource module ports most likely to be assigned to the same subsystem should be attached to adjacent bases, and those least likely to be connected should be attached to bases opposite each other on the circle.
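The characterization of base distance above translates directly into a few lines of code. The following is a minimal sketch rather than anything taken from the dissertation; the function names are illustrative, and the fanout and spread vector is passed as a Python list.

```python
from math import prod

def min_circular_distance(j1, j2, n):
    """Fewest steps, clockwise or counterclockwise, between j1 and j2 on a circle of n."""
    d = (j1 - j2) % n
    return min(d, n - d)

def cc_base_distance(j1, j2, fanout):
    """Smallest I in 0..L with circular distance < product of the first I fanouts."""
    n = prod(fanout)
    d = min_circular_distance(j1, j2, n)
    for i in range(len(fanout) + 1):
        if d < prod(fanout[:i]):     # prod of an empty list is 1, so I = 0 means d == 0
            return i
    raise ValueError("unreachable: d is always less than n")

F = [2, 2, 2]                        # the CC banyan of Figure 7.11a, N = 8
print(min_circular_distance(7, 1, 8))
print([cc_base_distance(0, j, F) for j in range(8)])
```

Printing the distances from base 0 to every other base shows them rising toward the opposite side of the circle and falling again, which is why adjacent bases suit resources likely to share a subsystem.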
Figure 7.21. Conceptualization of Minimum Circular Distance
Apex distance in a CC banyan is characterized differently from base distance, but it is still useful to think of a CC banyan's apexes as being arranged in a circle. The apex distance between two apexes V[L;J1] and V[L;J2] is the smallest integer I in the range 0 through L such that 0 = (×/(-I)↓F) | J2-J1; that is, such that ×/(-I)↓F divides J2-J1 (Corollary 4.2.5a). Here A | B denotes the residue of B modulo A. It may be observed, however, that since ×/(-I)↓F divides N, all of the following are equivalent.
0 = (×/(-I)↓F) | J2-J1
0 = (×/(-I)↓F) | N | J2-J1
0 = (×/(-I)↓F) | N | J1-J2
0 = (×/(-I)↓F) | D, where D is the minimum circular distance between J2 and J1
If one thinks of apexes V[L;0] through V[L;N-1] arranged in a circle similar to that shown in Figure 7.21, then the distance between any two apexes can be determined from their separation on the circle. Starting with any apex V[L;J] and proceeding in either direction, one can find an apex at distance I or closer to V[L;J] every ×/(-I)↓F steps around the circle. For example, Figure 7.22 shows the apexes of the CC banyan in Figure 7.11a arranged in a circle. The number in parentheses next to each apex is that apex's distance from V[L;0]. This pattern of numbers simply may be rotated clockwise J places to determine distances from any other apex V[L;J].
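The divisibility test for apex distance can be sketched in the same way. This is an illustrative reading of the characterization above, with a hypothetical function name; it is checked here only against the CC banyan of Figure 7.11a.

```python
from math import prod

def cc_apex_distance(j1, j2, fanout):
    """Smallest I in 0..L such that the product of the first L-I fanouts divides j2 - j1."""
    levels = len(fanout)
    for i in range(levels + 1):
        step = prod(fanout[:levels - i])   # drop the last i components, multiply the rest
        if (j2 - j1) % step == 0:
            return i
    raise ValueError("unreachable: the empty product 1 divides everything")

F = [2, 2, 2]                              # Figure 7.11a: L = 3, N = 8
from_apex_0 = [cc_apex_distance(0, j, F) for j in range(8)]
from_apex_3 = [cc_apex_distance(3, j, F) for j in range(8)]
print(from_apex_0)
print(from_apex_3)
# Rotating the first pattern three places reproduces the second.
print(from_apex_3 == from_apex_0[-3:] + from_apex_0[:-3])
```

Because the test depends only on the difference J2-J1, the pattern of distances seen from any apex is a rotation of the pattern seen from apex 0, as the text observes.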
Figure 7.22. Apex Distances for CC Banyan in Figure 7.11a
SECTION 8
BANYAN NETWORK SIMULATIONS
Although theoretical analysis of banyan structures has been fruitful in many respects, it has thus far failed to yield good quantitative measures of partitioning flexibility. Consequently, a number of banyan networks were simulated on a digital computer in order to study network performance characteristics and to assess the effects of certain design options.
The tests performed were intended primarily for comparing the effects of design variations on the partitioning flexibility of a banyan network. The simulated test conditions were not based on any particular application or job mix. They were designed to exercise a network's partitioning capabilities thoroughly, but in a conceptually simple manner. As will be explained later, these test conditions tended to be contrived "worst case" conditions in several respects and probably were more severe than normal conditions in any practical application.
The simulations tested the ability of networks to connect randomly selected partitions of system resources. Statistics were gathered concerning the numbers of parallel or multiplexed layers (Section 4.5) required and concerning the number of subsystems connected in each layer. The nature of these simulations will be described in greater detail in Section 8.1.
The simulation results, tabulated in Appendix D and discussed in Section 8.2, demonstrate how performance measures for a banyan network tend to be affected by its size, fanout-spread, and structure type and by certain control options. As will be explained in Section 9, these simulation results also indicate that banyan networks could have significant cost-performance advantages over crossbar-based networks in large systems.
8.1 Nature of Simulations
Kinds of networks simulated. Both SW and CC banyan partitioning networks were simulated. All simulated networks were regular and rectangular and had fanout-spread parameters ranging from 2 to 8. The number of bases ranged from 4 to 256, but due to computer time limitations, most tests were performed using networks with at most 64 bases.
Apex selection rules. As was discussed in Section 5.1, Theorem 2.1.7 suggests that more subsystems might be connected in a given layer if apexes for new subsystems were selected as near as possible to apexes used for existing subsystems. To assess the significance of this selection criterion, two apex selection rules were used, one of which tended to select new apexes far from those in use and the other of which tended to select new apexes near to those in use. We call these the "far rule" and the "near rule", respectively.
Both selection rules were simple, fixed-priority rules. The only difference was the way in which selection priorities were assigned to apexes. The far rule simply selected the leftmost eligible apex, assuming that a network was laid out in the usual manner as illustrated in Figures 4.31b, 4.41, 4.42, 5.11, 6.22g, 6.22h, 6.22i, and 7.11a. For example, if the far rule were applied to the SW banyan in Figure 6.22g, apex 0 would be first choice, apex 1 would be second choice, etc. Apex 15 would be selected only if it were the only apex eligible. Thus, with apexes numbered in this manner, apex I-1 would be the Ith choice according to the far rule. This rule tended to select apexes very distant from those already in use, because consecutive choices generally were the most widely separated apexes in terms of apex distance.
In contrast, the near rule tended to select new apexes close to those already in use, because its consecutive choices tended to be the nearest apexes in terms of apex distance. With apexes numbered in the conventional manner, apex number (⌽S)⊥⌽(S⊤(I-1)) would be the Ith choice according to the near rule, where S is the spread vector. That is, I-1 is written as a mixed-radix number with radix vector S, the order of its digits is reversed, and the reversed digits are evaluated using the reversed radix vector. For example, the apexes of the SW banyan in Figure 6.22g would be assigned priorities as shown below in Table 8.11. This rule would simply select the leftmost eligible apex if the banyan were laid out as shown in Figure 8.11. Similarly, the near rule would select apexes of the CC banyan in Figure 7.11b according to the priorities listed in Table 8.12. This would be equivalent to selecting the leftmost eligible apex if the CC banyan were laid out as shown in Figure 8.12. Note that our apex numbering conventions are such that the same far and near rules are applicable to both SW and CC banyans.
TABLE 8.11. Application of Near Apex Selection Rule to the Banyan in Figure 6.22g
CHOICE NUMBER (I)   S⊤(I-1)    ⌽(S⊤(I-1))   APEX NUMBER
 1                  0 0 0 0    0 0 0 0       0
 2                  0 0 0 1    1 0 0 0       8
 3                  0 0 1 0    0 1 0 0       4
 4                  0 0 1 1    1 1 0 0      12
 5                  0 1 0 0    0 0 1 0       2
 6                  0 1 0 1    1 0 1 0      10
 7                  0 1 1 0    0 1 1 0       6
 8                  0 1 1 1    1 1 1 0      14
 9                  1 0 0 0    0 0 0 1       1
10                  1 0 0 1    1 0 0 1       9
11                  1 0 1 0    0 1 0 1       5
12                  1 0 1 1    1 1 0 1      13
13                  1 1 0 0    0 0 1 1       3
14                  1 1 0 1    1 0 1 1      11
15                  1 1 1 0    0 1 1 1       7
16                  1 1 1 1    1 1 1 1      15
TABLE 8.12. Application of Near Apex Selection Rule to the Banyan in Figure 7.11b
CHOICE NUMBER (I)   S⊤(I-1)    ⌽(S⊤(I-1))   APEX NUMBER
 1                  0 0        0 0           0
 2                  0 1        1 0           3
 3                  1 0        0 1           1
 4                  1 1        1 1           4
 5                  2 0        0 2           2
 6                  2 1        1 2           5
Set-up rules. Two set-up rules were simulated, the standard set-up rule described in Section 4.4 and a modified rule in which the "trunk" portion of a tree-shaped connection was disconnected immediately after set-up. The standard rule sometimes produced tree-shaped connections, like that denoted by heavy lines in Figure 8.13a, in which no branching existed at the apex or root. In such cases, the portion of a connection between the apex and the highest-level branch point was superfluous once the connection had been established. The modified set-up rule
Figure 8.11. Redrawn Version of SW Banyan in Figure 6.22g
Figure 8.12. Redrawn Version of CC Banyan in Figure 7.11b
Figure 8.13. Set-Up Rule Modification. a) Connection Initially Established. b) Connection Remaining After Set-Up Using Modified Rule.
employed the same search and set-up algorithms as the standard rule (Section 4.4), but disconnected the superfluous portion of a connection immediately after set-up, as illustrated in Figure 8.13b. The purpose of this modification was to achieve more efficient network utilization by leaving as much of the partitioning network as possible for connecting other subsystems.
Test case generation. Complete partitions of a network's bases were generated pseudorandomly. First, the number of subsystems in a partition was selected as a pseudorandom number uniformly distributed from 1 to the number of bases in the network. Then each base was assigned pseudorandomly to one of these subsystems such that all subsystems were equally probable. Thus, the number of bases assigned to any subsystem could vary and could even be zero in some cases. Subsystems then were connected one at a time, placing each in the first available layer. All subsystems of the partition were connected in this manner using as many layers as were required. Then, all subsystems were dissolved and the entire procedure was repeated for a total of 100 partitions.
To determine if the number of subsystems in a partition had any substantial effect on network loading, a few simulations also were performed in which the number of subsystems was fixed in advance instead of being selected pseudorandomly for each partition. These simulations were like those described above in all other respects, including the pseudorandom assignment of bases to subsystems.
In certain respects, the test conditions simulated tended to be "worst case" conditions more demanding than those likely to be encountered in practical applications. By assigning every base to some subsystem in each partition, we effectively simulated a situation in which
every port of every resource module was always needed by some subsystem. Also, trivial one-base subsystems were treated just like those with multiple bases in the simulations, even though partitioning network connections for one-base subsystems would be entirely superfluous from a practical standpoint. Further, by assigning bases to subsystems in the manner described, we simulated a situation in which no knowledge of base distance was used to enhance network performance. In most practical situations, however, knowledge of a network's base distance function could be used to enhance performance as suggested in Section 5.1.
Kinds of data collected. Several kinds of data were collected during each simulation. The average number of layers required for fully connecting all partitions was computed along with an estimate of the standard error of this mean. For networks multiplexed as described in Section 4.5, the average number of layers required is a useful performance measure, because it indicates how much the maximum allowable intrasubsystem data rates typically would be diminished due to network time sharing. In interpreting the values obtained, however, one should remember that the test conditions were extremely severe in that all subsystems of each partition were required to exist at the same time.
The maximum number of layers required for fully connecting all partitions was recorded also, because this indicates the maximum number of layers a network should be capable of providing when operated under comparable conditions. This empirically observed maximum was based on a limited sample, however, and should not be taken as a theoretical upper bound.
The distribution of subsystems among layers was also recorded and was expressed as a cumulative percentage of nonempty subsystems connected versus number of layers. That is, we determined the percentage of all nonempty subsystems that were connected in layer 1, the percentage connected in the first 2 layers, the percentage connected in the first 3 layers, etc. The percentage of subsystems connected in a given number of layers can be taken as an indication of how well a network limited to that number of layers would perform, assuming that the subsystems of a partition were isolated subsystems capable of existing at different times.
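The apex priority orderings and the test case generation procedure described in this section can be sketched as follows. This is an illustration under stated assumptions, not the simulation program itself: all function names are hypothetical, Python's `random` module stands in for the unspecified pseudorandom number generator, and the `fits` predicate is a placeholder for the search and set-up algorithms of Section 4.4, which are not reproduced here.

```python
import random
from math import prod

def encode(value, radices):
    """Mixed-radix digits of value, most significant first (APL: radices ⊤ value)."""
    digits = []
    for r in reversed(radices):
        digits.append(value % r)
        value //= r
    return digits[::-1]

def decode(digits, radices):
    """Value of mixed-radix digits, most significant first (APL: radices ⊥ digits)."""
    total = 0
    for d, r in zip(digits, radices):
        total = total * r + d
    return total

def far_rule_order(spread):
    """Far rule: apex I-1 is the I-th choice."""
    return list(range(prod(spread)))

def near_rule_order(spread):
    """Near rule: the I-th choice is the digit reversal of I-1 (Tables 8.11 and 8.12)."""
    reversed_radices = spread[::-1]
    return [decode(encode(k, spread)[::-1], reversed_radices)
            for k in range(prod(spread))]

def random_partition(num_bases, rng, num_subsystems=None):
    """Assign every base to one of the subsystems, all equally probable.

    Some subsystems may receive no bases at all, as noted in the text.
    """
    if num_subsystems is None:
        num_subsystems = rng.randint(1, num_bases)   # uniform on 1..num_bases
    subsystems = [[] for _ in range(num_subsystems)]
    for base in range(num_bases):
        subsystems[rng.randrange(num_subsystems)].append(base)
    return subsystems

def layers_required(subsystems, fits):
    """Place each nonempty subsystem in the first layer that accepts it."""
    layers = []
    for sub in subsystems:
        if not sub:
            continue                     # empty subsystems need no connection
        for layer in layers:
            if fits(layer, sub):         # placeholder for the Section 4.4 search
                layer.append(sub)
                break
        else:
            layers.append([sub])         # no existing layer fits: open a new one
    return len(layers)

print(near_rule_order([2, 2, 2, 2]))     # apex priorities for Figure 6.22g
print(near_rule_order([3, 2]))           # apex priorities for Figure 7.11b
print(random_partition(8, random.Random(1976)))
```

Seeding the generator explicitly mirrors the practice of testing different networks of the same size against the same set of partitions.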
8.2 Simulation Results
Effects of varying the number of subsystems in a partition. A series of simulations was performed to determine if the number of subsystems in a partition had any substantial effect on the ease with which the subsystems could be connected by a banyan network. This was of interest during the planning of subsequent simulations because we wished to generate test cases that would seriously challenge a network's connecting abilities.
Results of this series of tests are shown in Table 8.21. The same network was used in all tests. For each test, 100 partitions were generated as described in Section 8.1 using a fixed number of subsystems per partition. Since this generation procedure made it possible for the actual number of nonempty subsystems in a partition to be somewhat less than the specified number, the total number of nonempty subsystems in all 100 partitions is listed in the rightmost column for each test.
These results indicate that, except when the number of subsystems per partition is extremely small, variations in this number have little effect on the average or maximum number of layers required. The percentage of subsystems connected in the first layer grew slowly but steadily with the number of subsystems per partition, except when the number of subsystems was so small that all could be connected in the first layer. Clearly, partitions are easiest for such a structure to connect when the number of subsystems is extremely small. In fact, it has been observed that a partition can always be connected in a single-layer SW or CC banyan if it contains no more than S[1] subsystems, where S is the network's spread vector. Aside from such extreme cases, however, the number of subsystems per partition appeared to have relatively little
TABLE 8.21. Effects of Varying the Number of Subsystems in a Partition
Structure: SW. Apex Selection Rule: Far. Set-Up Rule: Standard. Number of Bases: 32. Fanout and Spread: 2.
Columns: subsystems in each partition (including empty subsystems); layers required (mean, standard error of mean, maximum); nonempty subsystems connected (percent) in 1, 2, 3, and 4 layers; total nonempty subsystems.
Subsystems  Mean   Std.Err.  Max  1 Layer  2 Layers  3 Layers  4 Layers  Total Nonempty
 2          1.0    0.0       1    100.00                                   200
 4          1.96   .020      2     60.25   100.00                          400
 8          2.10   .030      3     69.03    98.74    100.00                791
16          2.04   .020      3     75.96    99.72    100.00               1406
32          2.05   .022      3     84.42    99.71    100.00               2054
effect on the ease with which the partition could be realized, at least by the network tested.
Subsequent simulations. In subsequent simulations, the number of subsystems in each partition was selected pseudorandomly as described in Section 8.1. The pseudorandom number generator used was initialized with the same seed at the beginning of each simulation so that different networks with the same number of bases were tested with the same set of partitions. Complete results of these simulations are tabulated in Appendix D. Interpretations of these results will be discussed in the following paragraphs, and relevant portions of the results will be presented in different forms where necessary.
Effects of network size and fanout-spread on average layers required. Figures 8.21 through 8.24 show how the average number of layers required varied with the number of bases and with a network's fanout-spread parameter (F). Each mean is plotted along with a confidence interval of plus and minus two standard errors of the mean, which corresponds to a confidence level of approximately 95 percent. Notice that semi-log graphs are shown so that straight-line plots represent logarithmic functions. The lines shown were drawn visually based on the points plotted.
It is apparent from these graphs that, for each type of network, the average layers required increased with increasing network size and decreased with increasing fanout-spread. Further, wherever three or more data points were plotted for the same fanout-spread, the average layers required appear to have grown no more rapidly than a logarithmic function of the number of bases. This is evidenced by the fact that most such plots either closely approximated straight lines or else
Figure 8.21. Average Layers Required for SW Banyans Using Far Apex Selection Rule and Standard Set-Up Rule (semi-log plot of average layers required versus number of bases, 2 through 256, for F=2, F=3, F=4, and F=8)
Figure 8.22. Average Layers Required for SW Banyans Using Near Apex Selection Rule and Modified Set-Up Rule (semi-log plot of average layers required versus number of bases, 2 through 64, for F=2, F=4, and F=8)
Figure 8.23. Average Layers Required for CC Banyans Using Near Apex Selection Rule and Standard Set-Up Rule (semi-log plot of average layers required versus number of bases, 2 through 64, for F=2 and F=4)
Figure 8.24. Average Layers Required for CC Banyans Using Near Apex Selection Rule and Modified Set-Up Rule (semi-log plot of average layers required versus number of bases, 2 through 64, for F=2 and F=4)
curved downward, indicating that the average layers required grew less rapidly than a logarithmic function. The only notable exceptions to this appear at the low ends of some plots (e.g., for fanout and spread equal to 2 in Figures 8.22 and 8.24) where the plot becomes nearly horizontal as the average layers required approach 1. This anomaly is to be expected, however, since the layers required to connect a partition can never be less than 1.
Effects of network size and fanout-spread on maximum layers required. It is apparent from the tables in Appendix D that the maximum number of layers required generally was related to network size and to fanout-spread much as was the average number of layers required. That is, it tended to increase with increasing network size and tended to decrease with increasing fanout. The maximum number of layers required ranged from 1 through 4 for the various networks simulated. Since only a small range of integer values was covered, it is difficult to assess how rapidly the maximum number of layers grows with system size, but logarithmic growth appears plausible.
Distribution of subsystems among layers. It is also apparent from the tables in Appendix D that nearly all subsystems were connected in the first layer or two, even in cases where comparatively large values were obtained for the maximum and average layers required. For example, in the largest network simulated, an SW banyan with 256 bases (Table D1), over 87 percent of the subsystems were connected in the first layer and over 99 percent were connected in the first two, even though an average of 2.39 and a maximum of 4 layers were required to fully connect all partitions. This indicates that only one or a very few layers might
provide sufficient partitioning flexibility for applications involving mostly isolated subsystems.
Comparison of SW and CC networks. Simulation results for comparable SW and CC networks are shown in Table 8.22. To facilitate comparison, table entries are paired so that each row for an SW banyan is followed immediately by a row for an otherwise identical CC banyan. Network structure and control rule options are abbreviated as follows:
SW   SW banyan structure
CC   CC banyan structure
F    Far apex selection rule
N    Near apex selection rule
S    Standard set-up rule
M    Modified set-up rule
Although performance differences between the two types of networks were generally minor, SW banyans always performed as well as or better than their corresponding CC banyans. This was true for all performance measures, including average layers required, maximum layers required, and the percentage of subsystems connected in any given number of layers.
Comparison of far and near apex selection rules. Simulation results for comparable networks using far and near apex selection rules are presented in Table 8.23. Abbreviations used are the same as for Table 8.22. As predicted in Section 5.1, the near rule consistently outperformed the far rule except for one small network for which identical results were obtained with the two rules.
Comparison of standard and modified set-up rules. Simulation results for comparable networks using standard and modified set-up rules are presented in Table 8.24. Abbreviations used are the same as for
Table 8.22. As expected, the modified rule consistently outperformed the standard rule.
TABLE 8.22. Comparison of SW and CC Network Structures
Columns: network structure and control rules; fanout and spread; number of bases; layers required (mean, standard error of mean, maximum); subsystems connected (percent) in 1, 2, 3, and 4 layers.
Rules    Fanout  Bases  Mean   Std.Err.  Max  1 Layer  2 Layers  3 Layers  4 Layers
SW,F,S   2       64     2.35   .061      3     76.04    98.05    100.00
CC,F,S   2       64     2.41   .064      4     70.38    97.52     99.96    100.00
SW,N,S   2        8     1.07   .026      2     97.95   100.00
CC,N,S   2        8     1.32   .047      2     90.35   100.00
SW,N,S   2       64     2.08   .046      3     81.77    99.35    100.00
CC,N,S   2       64     2.17   .053      3     73.31    98.94    100.00
SW,N,S   4       16     1.19   .039      2     97.24   100.00
CC,N,S   4       16     1.33   .047      2     94.78   100.00
SW,N,S   4       64     1.89   .031      2     88.61   100.00
CC,N,S   4       64     1.90   .030      2     85.68   100.00
SW,N,M   2        4     1.00   0.0       1    100.00
CC,N,M   2        4     1.00   0.0       1    100.00
SW,N,M   2        8     1.05   .022      2     98.54   100.00
CC,N,M   2        8     1.14   .035      2     95.91   100.00
SW,N,M   2       16     1.44   .050      2     93.32   100.00
CC,N,M   2       16     1.57   .050      2     89.70   100.00
SW,N,M   2       32     1.80   .040      2     87.59   100.00
CC,N,M   2       32     1.89   .034      3     84.35    99.92    100.00
SW,N,M   2       64     2.05   .046      3     86.05    99.43    100.00
CC,N,M   2       64     2.12   .050      3     81.53    99.19    100.00
SW,N,M   4       16     1.03   .017      2     99.56   100.00
CC,N,M   4       16     1.07   .026      2     98.84   100.00
SW,N,M   4       64     1.74   .044      2     92.19   100.00
CC,N,M   4       64     1.82   .039      2     91.83   100.00
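As a reading aid for Table 8.22, the following sketch computes a mean, a standard error estimate, and the interval of plus and minus two standard errors used in Figures 8.21 through 8.24. The standard error formula shown (sample standard deviation divided by the square root of the sample size) is an assumption, since the text does not state how its estimate was computed. The two rows used in the example are copied from Table 8.22, and the function names are illustrative.

```python
from math import sqrt

def mean_and_standard_error(samples):
    """Sample mean and an assumed standard error estimate: s / sqrt(n)."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, sqrt(variance / n)

def two_se_interval(mean, standard_error):
    """Interval of plus and minus two standard errors about the mean."""
    return (mean - 2 * standard_error, mean + 2 * standard_error)

def intervals_overlap(a, b):
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Rows for fanout and spread 2 with 64 bases, near rule, standard set-up rule.
sw = two_se_interval(2.08, 0.046)    # SW,N,S
cc = two_se_interval(2.17, 0.053)    # CC,N,S
print(sw, cc, intervals_overlap(sw, cc))
```

For this pair of rows the two intervals overlap, which is in keeping with the observation that performance differences between the two structures were generally minor.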
TABLE 8.23. Comparison of Far and Near Apex Selection Rules
Columns: network structure and control rules; fanout and spread; number of bases; layers required (mean, standard error of mean, maximum); subsystems connected (percent) in 1, 2, 3, and 4 layers.
Rules    Fanout  Bases  Mean   Std.Err.  Max  1 Layer  2 Layers  3 Layers  4 Layers
SW,F,S   2        8     1.25   .044      2     92.69   100.00
SW,N,S   2        8     1.07   .026      2     97.95   100.00
SW,F,S   2       64     2.35   .061      3     76.04    98.05    100.00
SW,N,S   2       64     2.08   .046      3     81.77    99.35    100.00
SW,F,S   3       27     1.92   .027      2     84.36   100.00
SW,N,S   3       27     1.82   .039      2     89.55   100.00
SW,F,S   4       16     1.37   .049      2     93.76   100.00
SW,N,S   4       16     1.19   .039      2     97.24   100.00
SW,F,S   4       64     1.91   .032      3     83.40   100.00
SW,N,S   4       64     1.89   .031      2     88.61   100.00
SW,F,M   2        8     1.05   .022      2     98.54   100.00
SW,N,M   2        8     1.05   .022      2     98.54   100.00
SW,F,M   2       64     2.15   .052      3     84.17    99.02    100.00
SW,N,M   2       64     2.05   .046      3     86.05    99.43    100.00
SW,F,M   4       64     1.85   .036      2     90.15   100.00
SW,N,M   4       64     1.74   .044      2     92.19   100.00
CC,F,S   2       64     2.41   .064      4     70.38    97.52     99.96    100.00
CC,N,S   2       64     2.17   .053      3     73.31    98.94    100.00
TABLE 8.24. Comparison of Standard and Modified Set-Up Rules
Columns: network structure and control rules; fanout and spread; number of bases; layers required (mean, standard error of mean, maximum); subsystems connected (percent) in 1, 2, 3, and 4 layers.
Rules    Fanout  Bases  Mean   Std.Err.  Max  1 Layer  2 Layers  3 Layers  4 Layers
SW,F,S   2        8     1.25   .044      2     92.69   100.00
SW,F,M   2        8     1.05   .022      2     98.54   100.00
SW,F,S   2       64     2.35   .061      3     76.04    98.05    100.00
SW,F,M   2       64     2.15   .052      3     84.17    99.02    100.00
SW,F,S   4       64     1.91   .032      3     82.40    99.96    100.00
SW,F,M   4       64     1.85   .036      2     90.15   100.00
SW,N,S   2        8     1.07   .026      2     97.95   100.00
SW,N,M   2        8     1.05   .022      2     98.54   100.00
SW,N,S   2       64     2.08   .046      3     81.77    99.35    100.00
SW,N,M   2       64     2.05   .046      3     86.05    99.43    100.00
SW,N,S   4       16     1.19   .039      2     97.24   100.00
SW,N,M   4       16     1.03   .017      2     99.56   100.00
SW,N,S   4       64     1.89   .031      2     88.61   100.00
SW,N,M   4       64     1.74   .044      2     92.19   100.00
CC,N,S   2        4     1.03   .017      2     98.46   100.00
CC,N,M   2        4     1.00   0.0       1    100.00
CC,N,S   2        8     1.32   .047      2     90.35   100.00
CC,N,M   2        8     1.14   .035      2     95.91   100.00
CC,N,S   2       16     1.80   .040      2     80.84   100.00
CC,N,M   2       16     1.57   .050      2     89.70   100.00
CC,N,S   2       32     1.94   .034      3     76.72    99.76    100.00
CC,N,M   2       32     1.89   .034      3     84.35    99.92    100.00
CC,N,S   2       64     2.17   .053      3     73.31    98.94    100.00
CC,N,M   2       64     2.12   .050      3     81.53    99.19    100.00
CC,N,S   4       16     1.33   .047      2     94.78   100.00
CC,N,M   4       16     1.07   .026      2     98.84   100.00
CC,N,S   4       64     1.90   .030      2     85.68   100.00
CC,N,M   4       64     1.82   .039      2     91.83   100.00
SECTION 9
COST AND PERFORMANCE FUNCTIONS
Certain quantitative measures of a banyan network's cost and performance will be discussed in this section. In Section 9.1, we will discuss functional relationships between these measures and network size, and will compare banyan cost and performance functions with those of the crossbar partitioning structure discussed in Section 3.1. Cost and performance measures for a specific banyan network will be compared with those for alternative crossbar networks in Section 9.2. In Section 9.3, it will be shown that the fanout-spread parameter of a regular, rectangular banyan can be selected to optimize a given cost or cost-performance measure and that optimum fanout-spread values are constant with respect to network size.
9.1 Functions of Interest
Number of arcs. The number of bidirectional switching devices, or "contacts", required is a commonly used measure of the "cost" of a connecting network. This "cost" measure is proportional to the number of arcs in a banyan graph and is given by the formulas in Table 5.21. Notice that a regular, rectangular banyan with fanout and spread F requires only (F*L)×L×F arcs for a network with F*L bases, where F*L denotes F raised to the power L. Thus, such a network with N bases requires only (F⍟N)×F arcs per base, where F⍟N is the logarithm of N to the base F; this is a logarithmic function of N. A comparable crossbar-based network like that shown in Figure 3.11 would require ⌊(N÷2) arcs per base, a function which tends to grow linearly with N. Thus, the cost advantage of a regular, rectangular banyan network over a crossbar-based partitioning network improves without limit as network size increases.
Required fanout capabilities. A bidirectional switching device, corresponding to an arc of a banyan graph, must be capable of driving all other switching devices attached to the same vertex. A vertex in level I of a uniform banyan has F[I] arcs incident into it and S[I+1] arcs incident out from it, implying that each of the bidirectional switching devices attached to that vertex must be capable of driving F[I]+S[I+1]-1 similar devices. Thus, the fanout capabilities required of the bidirectional switching devices used in a uniform banyan network depend only on the fanout and spread values selected for the network and do not depend on network size. Hence, arbitrarily large banyan networks can be built using switching devices with a limited fanout capability. Similarly, each resource module port, or base, of the banyan network need only drive S[1] bidirectional switching devices regardless of network size.
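The arc counts and drive requirements above are easy to tabulate. The following sketch is illustrative only: the function names are hypothetical, the crossbar figure of N÷2 rounded down arcs per base is taken from the comparison in the text, and the fanout and spread vectors are passed as Python lists holding F[1] through F[L] and S[1] through S[L].

```python
def banyan_arcs(f, levels):
    """Total arcs in a regular, rectangular banyan: f**levels vertices per
    level, `levels` ranks of arcs, f arcs leaving each vertex."""
    return f ** levels * levels * f

def banyan_arcs_per_base(f, levels):
    """Arcs per base: (log base f of N) times f, with N = f**levels."""
    return levels * f

def crossbar_arcs_per_base(n):
    """Arcs per base assumed for the comparable crossbar-based network."""
    return n // 2

def devices_driven(fanout, spread, level):
    """F[I] + S[I+1] - 1 for a vertex in level I, where 0 < I < L."""
    return fanout[level - 1] + spread[level] - 1   # lists hold F[1..L], S[1..L]

for f, levels in [(2, 4), (2, 6), (4, 3), (2, 8)]:
    n = f ** levels
    print(n, banyan_arcs_per_base(f, levels), crossbar_arcs_per_base(n))
```

With 64 bases, for instance, either a fanout of 2 with 6 levels or a fanout of 4 with 3 levels gives 12 arcs per base, against 32 for the crossbar figure used here.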
An N-port crossbar network like that in Figure 3.1-1, however, requires each bidirectional switching device to drive as many as N-1 similar devices. Further, each resource module drives ⌊(N÷2) bidirectional switching devices. Thus, for any given family of switching devices, the maximum size of a straightforward crossbar partitioning network is limited by device fanout capabilities. This limitation can be overcome by subdividing each bus of a crossbar network into a number of segments interfaced with each other using bidirectional amplifiers. As will be exemplified in the next section, however, such modification of a crossbar partitioning network increases network cost and data propagation delays.

Priority propagation delay. For reasons explained in Section 2.3, priority hardware is likely to be needed to resolve conflicting requests for use of a subsystem bus. If priority hardware is built into an L-level banyan network as described in Section 4.2, then the propagation delay for priority signals will be proportional to the number of levels L. In a regular banyan, there are F*L bases, so the propagation delay for priority signals need grow only logarithmically with network size, assuming constant fanout F. Practically the same propagation delay for priority signals could be achieved in a crossbar network using methods described by Foster (68), and it is unlikely that a better than logarithmic growth rate could be achieved without drastically increasing cost. Thus, propagation delays for priority signals are likely to be approximately the same for both banyan and crossbar partitioning networks. With either structure it should be easy to achieve short delays which grow only logarithmically with network size.
Data propagation delay. The time required for a data signal to propagate through a connecting network is approximately proportional to the number of switching devices through which the signal must pass. The longest possible signal path in a banyan network is from a base to an apex to another base. Therefore, in an L-level banyan, a data signal would have to propagate through at most 2×L bidirectional switching devices. Hence, the worst-case data signal propagation delay in a regular banyan with N bases would be that of 2×(F⍟N) bidirectional switches, which is a logarithmic function of N.

In a simple crossbar network like that in Figure 3.1-1, a data signal must propagate through only 2 bidirectional switches regardless of network size. This, however, is feasible only in small networks which do not exceed device fanout capabilities. To construct larger crossbar networks with limited-fanout devices, one must divide large busses into small segments interfaced by bidirectional amplifiers. This modification, which will be illustrated in greater detail in Section 9.2, causes data propagation delays to grow logarithmically with system size, much as they do for banyan networks.

Thus, data propagation delays, like priority propagation delays, are likely to be approximately the same for both banyan and crossbar partitioning networks when large networks must be constructed using limited-fanout devices. With either structure, short delays which grow only logarithmically with network size are achievable.

Average layers required. Unlike a full crossbar network, a banyan network may require multiple layers in some applications, as was discussed in Section 4.5. If parallel networks are employed, then network cost
can be expected to rise in proportion to the number of parallel networks. If, on the other hand, a single network is multiplexed, then bus acquisition and data transmission times will effectively increase in proportion to the average number of "time slots", or layers, used. Preliminary empirical results presented in Section 8 indicate that the average number of layers required under artificially severe test conditions tends to grow no more rapidly than a logarithmic function of the number of bases in a network. In many banyan network applications, fewer layers might suffice or multiple layers might be entirely unnecessary.

Cost-delay product. Cost-delay products are commonly used cost-performance measures for digital circuits. Similar measures can be useful in comparing the cost-performance potentials of different connecting network structures. A simple cost-delay product for a partitioning network can be obtained by multiplying the number of arcs, or bidirectional switching devices, required for each base of the network by the maximum number of switching devices through which a data signal must propagate. If a network is multiplexed to provide multiple layers, then the rate at which data can be transferred within a subsystem is effectively divided by the number of layers used. To take this into account, the cost-delay product of a network can be multiplied by the average number of layers required to obtain a cost-delay-layer product.

Cost-delay and cost-delay-layer products grow more slowly with network size for regular, rectangular banyans than they do for crossbar partitioning networks. Let N be the number of bases of a partitioning network. For a simple crossbar like that described in Section 3.1, the number of bidirectional switching devices required per base grows linearly
with N, the data propagation delay is a constant 2, and the number of layers required is a constant 1. Hence, the cost-delay and cost-delay-layer products for simple crossbars grow linearly with N. To build large crossbars with limited-fanout devices, however, one must insert a number of bidirectional amplifiers into the network's busses, causing the data propagation delay to grow logarithmically with N. Thus, the cost-delay and cost-delay-layer products for large crossbar networks can be expected to grow as N×⍟N. In regular, rectangular banyans, however, both the data propagation delay and the number of bidirectional switches required per base grow logarithmically with N, so that the cost-delay product for such banyan networks grows as (⍟N)*2. Empirical data discussed in Section 8 indicate that the average number of layers required tends to grow no more rapidly than a logarithmic function of N. This indicates that the cost-delay-layer product for regular, rectangular banyans tends to grow no more rapidly than a function proportional to (⍟N)*3.
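The per-base measures of Section 9.1 can be tabulated with a few lines of code. The following is a minimal sketch in Python, not part of the original analysis; the function names and the dictionary layout are illustrative assumptions, and the crossbar figures assume the simple, unmodified structure with ⌊(N÷2) busses.

```python
def banyan_measures(fanout: int, levels: int) -> dict:
    """Per-base measures for a regular, rectangular banyan (fanout = spread)."""
    bases = fanout ** levels            # N = F*L in the APL notation
    arcs_per_base = levels * fanout     # (F log N) x F, since L = log base F of N
    delay = 2 * levels                  # worst case: base to apex to base
    return {
        "bases": bases,
        "arcs_per_base": arcs_per_base,
        "delay": delay,
        "cost_delay": arcs_per_base * delay,
    }


def simple_crossbar_measures(bases: int) -> dict:
    """Per-base measures for a simple crossbar with floor(N/2) busses."""
    arcs_per_base = bases // 2          # one switch per bus for every port
    delay = 2                           # port to bus to port, regardless of N
    return {
        "bases": bases,
        "arcs_per_base": arcs_per_base,
        "delay": delay,
        "cost_delay": arcs_per_base * delay,
    }


if __name__ == "__main__":
    # Cost-delay grows as (log N)**2 for the banyan but linearly for the crossbar.
    for levels in (2, 4, 6):
        b = banyan_measures(4, levels)
        c = simple_crossbar_measures(b["bases"])
        print(b["bases"], b["cost_delay"], c["cost_delay"])
```

For F = 4 and L = 4 this gives 16 arcs per base and a delay of 8 for the banyan, against 128 switches per base and a delay of 2 for a 256-port simple crossbar.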
9.2 A Comparative Example

The cost and performance measures discussed in Section 9.1 were evaluated for three 256-base networks and are summarized in Table 9.2-1. The first network listed is a regular, rectangular, 4-level SW banyan with fanout and spread equal to 4. This is not necessarily the optimum banyan structure for 256 bases, but it was selected for comparison because it is the largest banyan for which simulation results were available. The second network listed is a straightforward crossbar partitioning network as described in Section 3.1. This network places severe requirements on the fanout capabilities of both the bidirectional switches and the resource modules used. To provide a fairer basis for cost-performance comparison, a third network is listed. It is a modified crossbar partitioning network in which a number of bidirectional amplifiers have been inserted to achieve fanout requirements comparable to those of the banyan network.

The modified crossbar network is assumed to be constructed as shown in Figure 9.2-1. To limit the fanout required of the bidirectional switches and amplifiers, each of the 128 data busses in the network is divided into 32 bus segments as illustrated in Figure 9.2-1a. These bus segments are linked by a tree-shaped arrangement of bidirectional amplifiers so that they function together as a single bus. Similarly, each of the 256 resource module ports is connected to 128 bidirectional switches by a tree-shaped structure of bidirectional amplifiers as illustrated in Figure 9.2-1b. This modification requires each port of a resource module to drive only 2 bidirectional amplifiers and requires each input/output of a bidirectional amplifier or switch to drive no more than 8 similar
[Figure 9.2-1. Modification of a Crossbar Partitioning Network to Limit Fanout Requirements. a) Modified Structure of Each Data Bus. b) Modified Interface with Each Resource Module. Both panels show trees of bidirectional amplifiers leading to the bidirectional switches.]
TABLE 9.2-1
Cost and Performance Measures for Three Alternative Networks

                                        SW Banyan     Simple      Modified Crossbar
Cost or Performance Measure             F = S = 4,    Crossbar    with Limited
                                        L = 4                     Fanout

Total number of bidirectional           4,096         32,768      41,984
  switches or amplifiers required

Number of bidirectional switches        16            128         164
  or amplifiers required per base

Fanout capability required of           7             256         8
  switches and/or amplifiers

Fanout capability required of           4             128         2
  resource modules

Data propagation delay (maximum         8             2           10
  number of bidirectional switches
  or amplifiers through which data
  must propagate)

Average layers required                 2.39*         1           1

Cost-delay product                      128           256         1640

Cost-delay-layer product                305.92*       256         1640

*Based on simulation of network using "far" apex selection rule and "standard" setup rule as described in Section 8.
devices. It may be observed that this modified crossbar network is structurally equivalent to a 5-level banyan with fanout vector 1 1 8 8 4 and with spread vector 2 8 8 1 1, assuming that its bidirectional amplifiers are counted as switching devices.

Entries in Table 9.2-1 were computed as described in Section 9.1. Bidirectional amplifiers in the modified crossbar network were counted together with switches, since bidirectional amplifiers and switches require almost identical circuits and would contribute similarly to network cost. The average number of layers required using the SW banyan was obtained from the simulation results reported in Section 8 and Appendix D. The network was simulated using the "far" apex selection rule and the "standard" setup rule. For reasons explained in Section 8, it is likely that the average number of layers required would have been less if the "near" and "modified" rules had been used instead. Also, it should be remembered that the figure shown represents severe test conditions and might be significantly less in some applications.

The table indicates that the banyan network would cost about an order of magnitude less to build than either crossbar (4,096 devices against 32,768 and 41,984) and that it has the smallest cost-delay product. The cost-delay-layer products indicate that the simple crossbar network should be slightly more cost-effective than the banyan network, but this ignores the fact that the simple crossbar requires both bidirectional switches and resource modules to have much greater fanout capabilities. When the banyan network is compared with a crossbar modified to overcome these fanout problems, the cost-delay-layer products (305.92 against 1640) indicate better than a five-to-one cost-performance advantage for the banyan. Since these cost and cost-performance measures grow more slowly for banyans than for crossbar-based structures, they can be expected to favor banyans more strongly for larger networks and less strongly, or not at all, for smaller networks.

This example is intended only to illustrate certain measures of cost and performance and to demonstrate the potential cost-performance advantages of large banyan networks. A much more detailed analysis, of course, would be needed to accurately assess the cost-effectiveness of a particular network in a specific application and system environment.
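The device counts in Table 9.2-1 can be reproduced from the fanout and spread vectors alone. The sketch below is an illustration in Python, assuming the level-size and arc-count formulas stated as Theorems 2.2.1 and 2.3.1 in Appendix B; the function names are hypothetical and vectors are passed as ordinary lists indexed from level 1.

```python
from math import prod


def level_sizes(fanout, spread):
    """Vertices in levels 0..L of a uniform banyan.

    Level I holds the product of the first I spread values and the
    last L-I fanout values (Theorem 2.2.1).
    """
    levels = len(fanout)
    return [prod(spread[:i]) * prod(fanout[i:]) for i in range(levels + 1)]


def arc_count(fanout, spread):
    """Total arcs: each level-I vertex has fanout[I] arcs incident into it."""
    sizes = level_sizes(fanout, spread)
    return sum(f * sizes[i] for i, f in enumerate(fanout, start=1))


if __name__ == "__main__":
    # Regular, rectangular SW banyan with F = S = 4 and L = 4.
    print(arc_count([4, 4, 4, 4], [4, 4, 4, 4]))
    # Modified crossbar viewed as a 5-level banyan.
    print(arc_count([1, 1, 8, 8, 4], [2, 8, 8, 1, 1]))
```

Dividing each total by the 256 bases gives the per-base figures in the table, and doubling the number of levels gives the data propagation delays.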
9.3 Optimum Fanout and Spread

It is shown in Theorems 2.4.1 and 2.4.2 that the fanout-spread parameter of a regular, rectangular banyan can be selected to optimize the network's cost or cost-delay product. The number of arcs per base in an N-base regular, rectangular banyan is given by

CM[1] = F×F⍟N,

which is a measure of network cost per resource module port. The cost-delay product for such a network is given by

CM[2] = 2×L×CM[1]
      = 2×L×F×F⍟N
      = 2×(F⍟N)×F×(F⍟N)
      = 2×F×(F⍟N)*2.

Each of these functions is of the form

CM[I] = C×F×(F⍟N)*I,

where C is the appropriate proportionality constant, 1 or 2. A function of this form is minimized with respect to F when F = e*I (Theorem 2.4.1). When F is restricted to positive integer values, as it must be for real networks, the cost measure CM[1] is minimized when F = 3, and the cost-delay product CM[2] is minimized when F = 7 (Theorem 2.4.2). The cost-delay product is increased by only about 0.08%, however, when F = 8 (proof of Theorem 2.4.2). Also, network "cost" is increased by the same amount when F = 4 as it is when F = 2 (Theorem 2.4.2). The optimum values for the fanout-spread parameter F are independent of network size, except, of course, for the fact that N must be a power of F in
any regular banyan.

No optimum fanout-spread value has been determined for minimizing a network's cost-delay-layer product, because the effects of fanout and spread on the average number of layers required are not precisely known and may be application dependent. The simulation results indicate that somewhat fewer layers are required when F is large, suggesting that the cost-delay-layer product would tend to be minimized by a fanout-spread value somewhat larger than that required for minimizing the cost-delay product.
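The integer optima quoted in Section 9.3 are easy to check numerically. Because F⍟N equals (ln N)÷(ln F), the measure C×F×(F⍟N)*I depends on F only through the factor F÷(ln F)*I, which is what the following Python sketch minimizes. The function names and the candidate range 2 to 64 are assumptions made for illustration.

```python
import math


def fanout_factor(fanout: int, exponent: int) -> float:
    """F / (ln F)**I, the F-dependent part of C x F x (log_F N)**I."""
    return fanout / math.log(fanout) ** exponent


def best_integer_fanout(exponent: int, candidates=range(2, 65)) -> int:
    """Integer fanout minimizing the factor over the candidate range."""
    return min(candidates, key=lambda f: fanout_factor(f, exponent))


if __name__ == "__main__":
    print(best_integer_fanout(1))   # cost measure CM[1]
    print(best_integer_fanout(2))   # cost-delay product CM[2]
    # Relative penalty of choosing F = 8 instead of F = 7 for CM[2].
    print(fanout_factor(8, 2) / fanout_factor(7, 2) - 1.0)
```

The continuous minimum at F = e*I follows from setting the derivative of F÷(ln F)*I to zero, which gives ln F = I; the integer search simply evaluates the neighbours of that point.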
SECTION 10
CONCLUSIONS

There is a strong and growing need for switching structures suitable for interconnecting numerous processors and other resource modules in large, general-purpose computing systems. Banyan partitioning networks can be used as described in Section 2 to fulfill this need.

Banyan structures have been presented which satisfy the requirements identified in Section 2.3 and which offer potentially large cost-performance advantages over conventional "crossbar" or "multiple-bus" structures for large systems. Regular, rectangular banyans have been described whose fanout requirements are constant and whose cost-per-port functions grow only logarithmically with the number of resource module ports. This is a significant improvement over crossbar-based structures, whose fanout and cost-per-port functions both grow linearly with network size. Worst-case propagation delays in regular, rectangular banyan networks grow only logarithmically with network size, which is as good as can be achieved with any network using limited-fanout devices. Banyan partitioning networks can be controlled very rapidly and in a potentially fault-tolerant manner using distributed hardware in the network itself, as described in Section 4.4. Priority hardware for resolving bus request conflicts can be built into a banyan network easily, using the technique described in Section 4.2. As noted in Sections 6.2 and 7.1, SW and CC banyans can be constructed in modular fashion by interconnecting identical
"building blocks" suitable for use in any size network. Expansion of a banyan can be accomplished by using the old network as a component banyan of a new synthesized network as described in Section 4.3.

Adequate partitioning flexibility for many applications might be provided by single-layer banyan networks. Simulation of regular, rectangular SW and CC banyans with up to 256 bases has indicated that most subsystems of randomly selected partitions can be connected with only one layer and that all, or nearly all, can be connected with two, even under severe test conditions.

In applications where greater flexibility is needed, any partitioning of system resources into subsystems can be achieved with a multiplexed banyan network. In the simulated networks, the average number of layers required to fully connect random partitions appears to have grown no more rapidly than a logarithmic function of the number of resource module ports. As explained in Section 9, this still allows the potential cost-performance advantage of banyans over crossbar-based networks to improve without limit as network size increases.

The research reported here has focused on the use of banyan networks for partitioning applications. Hence, banyan networks were compared in Section 9 with alternative partitioning networks rather than with networks designed for different functions, such as permuting or store-and-forward message switching. It is felt, however, that the adaptation of banyan structures for such applications warrants further study.

Banyan structures have been defined and analyzed formally using graph theory. A number of useful and mathematically interesting properties of banyans have been identified. This analysis has been oriented towards
the use of banyans as partitioning networks, but as was noted in Sections 6.1 and 7, networks graphically equivalent to special cases of banyans have been proposed or used previously for many different applications. The taxonomy of network structures and the mathematical tools presented here are thus applicable to networks for a variety of other data manipulation functions, such as permuting, shifting, and sorting. Because of its generality, the theory of banyan graphs tends to tie together a number of previous works in addition to establishing useful properties of banyan partitioning networks.

Like most research endeavors, this work has left certain questions unanswered and suggests directions for further research. For banyan partitioning networks, the most pressing need for further research is in the area of performance evaluation. Simulation results reported here indicate that banyans can be much more cost-effective than crossbar-based partitioning networks in very large systems, but caution must be used in extrapolating these results to networks or operating conditions other than those actually simulated. It would be useful to extend these simulations to include larger networks and networks with different fanout and spread vectors. As explained in Section 8.1, however, the test conditions simulated may have been much more severe than those likely to be encountered in practical systems. To accurately estimate banyan performance for any specific purpose, more realistic test conditions will most likely be needed and should take operating system scheduling and resource allocation strategies into account. For example, it is suspected that the number of layers required in most practical applications could be cut drastically if base distance properties were fully exploited as suggested in Section 5.1.
It is hoped that further mathematical research will provide useful results concerning average and maximum layer requirements. A tight theoretical upper bound on the number of layers required to fully connect an arbitrary partition in an SW banyan would be very useful and is being sought by the author. Useful results concerning layer requirements tend to be difficult to derive mathematically because of the multiplicity of connection possibilities, constraints, and options that must be considered. It is felt that this is a high-risk, but potentially high-payoff, area for further research.

The application of banyan graph theory to other areas also has interesting possibilities. In particular, the author suspects that banyan theory could prove useful in the study of permutation networks.

Finally, there is a need for further engineering to develop network building blocks which can be produced economically as integrated circuits. If a banyan partitioning network were fabricated today using commercially available components, then bidirectional switches, like those discussed in Appendix C, would have to be assembled from small-scale integrated circuit gates. The TTL circuit in Figure C-3a, for example, would require one IC package per switch. To reduce the cost and physical size of a network, one would like to use large-scale integrated circuits containing many bidirectional switches plus control logic. Technology for doing this already exists, and basic techniques for constructing large banyans in modular form have been proposed here. Further work is needed, however, to design one or more specific module types which could be manufactured efficiently as standard ICs and which would be versatile enough to warrant volume production.
APPENDIX A MATHEMATICAL NOTATION AND TERMINOLOGY
Graph theoretic terms used in this dissertation were taken mostly from Berge (62), where formal definitions can be found. For the reader's convenience, these terms are explained briefly below.

A graph consists of a set of vertices and a set of arcs. When a diagram is drawn depicting a graph, vertices and arcs are represented by dots and arrows respectively. An arc is incident out from its initial vertex and incident into its terminal vertex.

A path is a sequence of arcs such that the terminal vertex of one arc is always the initial vertex of the next. A path is said to be from the initial vertex of its first arc and to the terminal vertex of its last arc, assuming that the path is finite. A path from a vertex back to that same vertex is called a circuit. A graph without circuits has a partial ordering associated with it and sometimes is referred to as a Hasse diagram. The partial ordering associated with such a graph is a relation on the graph's vertices. It asserts, for an ordered pair of vertices, that the two vertices are equal or else that there exists a path from the first to the second.

A graph is said to be nontrivial if it has two or more vertices and is called finite if it has only a finite number of vertices. A subgraph of a graph is formed by deleting zero or more of the graph's vertices and then deleting precisely those arcs whose initial and/or terminal vertices no longer exist. A partial graph of a graph is formed by deleting zero or more of the graph's arcs without changing its vertices. A partial subgraph of a graph G is a partial graph of some subgraph of G.
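The distinction between a subgraph and a partial graph can be made concrete with a small example. The Python sketch below is illustrative only; representing a graph as a set of vertices plus a set of (initial, terminal) pairs is an assumption of the sketch, not a convention of this dissertation.

```python
def subgraph(vertices, arcs, removed_vertices):
    """Delete vertices, then exactly those arcs that lost an endpoint."""
    kept = set(vertices) - set(removed_vertices)
    kept_arcs = {(u, v) for (u, v) in arcs if u in kept and v in kept}
    return kept, kept_arcs


def partial_graph(vertices, arcs, removed_arcs):
    """Delete arcs only; the vertex set is unchanged."""
    return set(vertices), set(arcs) - set(removed_arcs)


if __name__ == "__main__":
    vertices = {"a", "b", "c"}
    arcs = {("a", "b"), ("b", "c"), ("a", "c")}
    # Removing vertex "b" also removes both arcs that touch it.
    print(subgraph(vertices, arcs, {"b"}))
    # Removing arc ("a", "c") leaves all three vertices in place.
    print(partial_graph(vertices, arcs, {("a", "c")}))
```

A partial subgraph is then obtained by applying `partial_graph` to the result of `subgraph`.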
Sometimes we wish to consider a graph without regard to the direction, or orientation, of its arcs. For this, we define an edge between two vertices to be the set containing those two vertices. We say that a graph contains an edge between two vertices V1 and V2 if it contains either an arc from V1 to V2 or an arc from V2 to V1 or both. A chain is a sequence of edges in which each edge has one vertex in common with the preceding edge and the other vertex in common with the succeeding edge. A finite chain which begins and ends with the same vertex is called a cycle. A connected graph is one in which each pair of vertices can be linked by some chain. A nontrivial, finite, connected graph with no cycles is called a tree.

The notation used for mathematical expressions in this dissertation is an extension of that used in the APL programming language (Gilman and Rose, 70; IBM, 68). APL notation was adopted because it is a compact, standardized, and reasonably well-known notation encompassing a number of powerful vector operations encountered frequently in the theory of banyan graphs. Without these APL operators, it is believed that many of the results derived in Appendix B would have been notably more cumbersome to express and prove. Where necessary, standard APL notation has been extended to include sets, quantifiers, and other constructs needed for mathematical proofs. For consistency, the same APL-based notation is used for all mathematical expressions in this dissertation, even when vector operations are not involved.

APL notation differs from common mathematical notation in that expressions are always evaluated from right to left without regard for operator precedence, except where parentheses explicitly designate a different order of evaluation. For example,
A×B+C equals A×(B+C) rather than (A×B)+C. Similarly, A-B+C equals A-(B+C) rather than (A-B)+C. Some expressions in this dissertation contain redundant parentheses in order to make certain subexpressions more conspicuous and to help prevent misinterpretation by readers unaccustomed to APL notation.

Standard APL operators used in this dissertation are summarized in Table A-1. More detailed explanations of these operators can be found in any book or manual explaining the APL programming language, such as that by Gilman and Rose (70) or that by IBM (68). APL extensions used in this dissertation are summarized in Table A-2. Operators peculiar to banyan theory are not listed but are defined as needed in Appendix B.

To make it easier for readers to distinguish between vectors and scalars, vector names are underlined consistently throughout the dissertation. Additionally, standard APL conventions have been modified with respect to relational operators, such as =, <, ≤, ≥, and >, so that a conjunction of several relations can be abbreviated in the customary way.
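The right-to-left evaluation rule described above can be illustrated with a short evaluator. This is a minimal sketch in Python, assuming a flat list of numbers alternating with dyadic operators and no parentheses; the function name and the operator table are hypothetical.

```python
import operator

# Dyadic operators of the sketch; the glyphs follow Table A-1.
OPS = {
    "+": operator.add,
    "-": operator.sub,
    "×": operator.mul,
    "÷": operator.truediv,
}


def eval_right_to_left(tokens):
    """Evaluate [left, op, ..., op, right] strictly from right to left."""
    result = tokens[-1]
    # Step leftward two tokens at a time: an operator and its left operand.
    for i in range(len(tokens) - 2, 0, -2):
        op, left = tokens[i], tokens[i - 1]
        result = OPS[op](left, result)
    return result


if __name__ == "__main__":
    print(eval_right_to_left([2, "×", 3, "+", 4]))    # 2×(3+4)
    print(eval_right_to_left([10, "-", 3, "+", 2]))   # 10-(3+2)
```

With conventional precedence the two expressions would give 10 and 9; evaluated the APL way they give 14 and 5.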
TABLE A-1
Standard APL Notation Used in Dissertation

NOTATION   MEANING

A+B        A plus B.
A-B        A minus B.
-A         Negative A.
A×B        A times B.
A÷B        A divided by B.
A*B        A raised to the B power.
A⍟B        Log B base A.
⍟A         Natural logarithm of A.
A⌊B        Minimum of A and B.
⌊A         Greatest integer not exceeding A.
A⌈B        Maximum of A and B.
⌈A         Least integer not less than A.
A|B        Residue of B mod A. Result is a nonnegative number less than A
           and is congruent to B mod A.
P∧Q        P and Q.
P∨Q        P or Q.
A<B        A is less than B.
A≤B        A is less than or equal to B.
A>B        A is greater than B.
A≥B        A is greater than or equal to B.
A=B        A is equal to B.
A≠B        A is not equal to B.
V[I]       Component I of vector V.
M[I;J]     Component in row I and column J of matrix M.
A∈V        A equals some component of vector V.
+/V        Sum of all components of vector V. Result is 0 if V has no
           components.
×/V        Product of all components of vector V. Result is 1 if V has
           no components.
⌈/V        Maximum of all components of vector V.
⌊/V        Minimum of all components of vector V.
I↑V        First I components of vector V. Assuming that I is
           nonnegative, result is an I-component vector whose components
           equal the first I components of V.
(-I)↑V     Last I components of vector V. Assuming that I is
           nonnegative, result is an I-component vector whose components
           equal the final I components of V.
I↓V        All except the first I components of vector V. Assuming that
           I is nonnegative, result is the vector formed by deleting the
           first I components of V.
(-I)↓V     All except the last I components of vector V. Assuming that
           I is nonnegative, result is the vector formed by deleting the
           final I components of V.
⌽V         Reverse of vector V. Result is a vector identical to V except
           that the order of components is reversed.
V,W        Vector formed by catenating V and W. The components of V
           become the first components of the result, and the components
           of W become the final components of the result.
V⊥W        The integer obtained by decoding vector W with respect to the
           mixed base V. The components of W are interpreted as digits
           of a number in a mixed-base number system. The first
           components of W are interpreted as the most significant
           digits, the last as least significant. Vector V identifies
           the mixed base such that its first components correspond to
           the most significant digits. See APL reference for details.
V⊤I        The vector whose components represent I in the mixed-base
           number system specified by vector V. For a given mixed base
           V, V⊤ is the inverse of function V⊥ above. See APL reference
           for details.
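The decode and encode operators at the end of Table A-1 are the least familiar entries, so a small illustration may help. The following Python sketch covers only the case used in this dissertation (vectors of equal length, nonnegative integers, positive radices); the function names are assumptions, and special cases handled by real APL implementations, such as a zero radix, are ignored.

```python
def decode(base, digits):
    """Mixed-base decode: the first components are the most significant."""
    value = 0
    for radix, digit in zip(base, digits):
        value = value * radix + digit
    return value


def encode(base, value):
    """Mixed-base encode: the inverse of decode for in-range values."""
    digits = []
    for radix in reversed(base):
        value, digit = divmod(value, radix)
        digits.append(digit)
    return digits[::-1]


if __name__ == "__main__":
    # 1 hour, 2 minutes, 3 seconds expressed in seconds, and back again.
    print(decode([24, 60, 60], [1, 2, 3]))
    print(encode([24, 60, 60], 3723))
```

Note that the first component of the base never affects the result of `decode`; it matters only to `encode`, where it bounds the most significant digit.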
TABLE A-2
Extensions to APL Notation Used in Dissertation

NOTATION       MEANING

E1 ←→ E2       Expression E1 is defined to be equivalent to expression
               E2.
V[I~J]         This denotes the vector consisting of components I
               through J of vector V, provided that V is already
               defined. When used in the definition of vector V,
               however, "V[I~J]" means that the components of V are to
               be indexed by the integers I through J. This convention
               extends in the obvious way to arrays of more than one
               dimension.
∅              The null set.
[X: P(X)]      The set of all X such that P(X) is true, where P(X) is
               some expression denoting a logical function of X.
A∈S            A is an element of set S.
S⊆T            Set S is a subset of set T.
S⊂T            Set S is a proper subset of set T.
S⊇T            Set S contains set T.
S⊃T            Set S contains set T as a proper subset.
S [illegible] T
               Cartesian product of sets S and T.
               S [illegible] T ←→ [K[1~2]: (K[1]∈S)∧(K[2]∈T)]
(∃X)           There exists an X such that
+[I=J~N]       Summation from I = J to N. This is equivalent to Σ, with
               limits I = J and N, in conventional notation.
×[I=J~N]       Product from I = J to N. This is equivalent to Π, with
               limits I = J and N, in conventional notation.
V [illegible] W
               ←→ (⌽V)⊥(⌽W). This is like V⊥W except that the first
               components of vectors V and W correspond to the least
               significant digits instead of the most significant.
V [illegible] I
               ←→ ⌽(⌽V)⊤I. For a given mixed base V, this is the
               inverse of the function above.
APPENDIX B THE THEORY OF BANYAN GRAPHS
Banyans are defined and analyzed with the use of graph theory in this appendix. The graph theoretic terms used are explained in Appendix A and are mostly taken from Berge (62). The mathematical notation used is also explained in Appendix A and is basically an extension of that used in the APL programming language.

Definitions, theorems, and corollaries are numbered hierarchically. The first part of each number identifies the subsection of this appendix in which the definition, theorem, or corollary appears. For example, Theorem 3.2.1 is the first theorem in Section B.3.2. Lemmas are numbered as theorems. Corollaries of a theorem are numbered by appending letters to the corresponding theorem number.

The outline below summarizes each lemma, theorem, and corollary presented in this appendix and identifies the terms or symbols defined in each definition. It is included for the reader's convenience to simplify reference.
B.1 Banyans

Def. 1.1. Base, apex, intermediate.
Def. 1.2. Banyan, above, below.
Def. 1.3. [symbols illegible].

B.1.1 Connecting Trees

Def. 1.1.1. Connecting tree.
Th. 1.1.1. Connecting trees are trees.

B.1.2 Banyan Synthesis

Def. 1.2.1. Synthesized graph, component set, component banyan, interconnection graph.
Th. 1.2.1. In an interconnection graph, the arcs incident into a component banyan correspond to the component banyan's bases and the arcs incident out from a component banyan correspond to its apexes.
Th. 1.2.2. If V1' and V2' are distinct component banyans and V1 is a base of V1' and V2 is an apex of V2', then there exists a path from V1 to V2 in the synthesized graph iff there exists a path from V1' to V2' in its interconnection graph.
Cor. 1.2.2a. If V1 is an apex of component banyan V1' and V2 is a base of component banyan V2', then V1 [illegible] V2 in the synthesized graph iff V1' [illegible] V2' in its interconnection graph.
Th. 1.2.3. A synthesized graph is a banyan iff its interconnection graph is a banyan.
B.1.3 Control of Connections

Th. 1.3.1. The set of vertices in the connecting tree connecting an apex A with a set of bases SB in a banyan is ([illegible]A)∩[illegible]SB.
Th. 1.3.2. If SB is a set of bases and SV is a set of vertices of a banyan, then the connecting tree connecting SB with an apex A contains one or more of the vertices in SV iff A ∈ [illegible](SV∩[illegible]SB).

B.1.4 Connectability

Def. 1.4.1. Subsystem.
Def. 1.4.2. Call, callset.
Def. 1.4.3. Conflict.
Def. 1.4.4. Connectable.
Def. 1.4.5. K-Connectable.

B.2 L-Level Banyans

Def. 2.1. L-level banyan, level.

B.2.1 Base and Apex Distances

Def. 2.1.1. [symbol illegible].
Def. 2.1.2. Base distance [symbol illegible], apex distance [symbol illegible].
Th. 2.1.1. Base and apex distance operators are commutative.
Th. 2.1.2. No base or apex distance can be less than zero or greater than [illegible].
Th. 2.1.3. If B1 and B2 are bases, then 0 = B1 [illegible] B2 iff B1 = B2. If A1 and A2 are apexes, then 0 = A1 [illegible] A2 iff A1 = A2.
Th. 2.1.4. If SB1 and SB2 are subsystems, then 0 = SB1 [illegible] SB2 iff SB1∩SB2 [illegible]. If SA1 and SA2 are sets of apexes, then 0 = SA1 [illegible] SA2 iff SA1∩SA2 [illegible].
Lemma 2.1.5. If A1 and A2 are apexes (or sets of apexes), B1 and B2 are bases (or subsystems), and SC is a set of calls from B1 (or bases in B1) to A1 (or apexes in A1), then (SC [illegible] A2) [illegible] A1 [illegible] A2 and (SC [illegible] B2) [illegible] B1 [illegible] B2.
Lemma 2.1.6. Let SC1 be a callset, let A be an apex (or set of apexes), let B be a base (or subsystem), and let SC2 be a set of calls from B (or bases in B) to A (or apexes in A). Then SC1 and SC2 will not conflict with each other if L < (SC1 [illegible] B)+SC1 [illegible].
Th. 2.1.7. Let A1 and A2 be apexes (or sets of apexes), let B1 and B2 be bases (or subsystems), let SC1 be a set of calls from B1 (or bases in B1) to A1 (or apexes in A1), and let SC2 be a set of calls from B2 (or bases in B2) to A2 (or apexes in A2). Then SC1 and SC2 will not conflict with each other if L < (B1 [illegible] B2)+A1 [illegible] A2.

B.2.2 Fanout and Spread

Def. 2.2.1. Uniform, fanout vector, spread vector.
Def. 2.2.2. Rectangular.
Def. 2.2.3. Regular, fanout, spread.
Th. 2.2.1. There are ×/(I↑S),I↓F vertices in level I of a uniform banyan with fanout vector F and spread vector S.
Cor. 2.2.1a. There are ×/F bases and ×/S apexes in a uniform banyan with fanout vector F and spread vector S.
Cor. 2.2.1b. A rectangular banyan has the same number of vertices in each level.
Cor. 2.2.1c. A regular banyan with fanout F, spread S, and levels L has F*L bases, S*L apexes, and (S*I)×F*L-I vertices in level I.
Th. 2.2.2. Let V be a vertex in level I of a uniform banyan with fanout vector $\mathbf{F}$ and spread vector $\mathbf{S}$. There are $\prod_{k=J+1}^{I} \mathbf{F}[k]$ vertices below V in level J if $J \le I$.

Cor. 2.2.2a. If V is a vertex in level I of a uniform banyan with fanout vector $\mathbf{F}$ and spread vector $\mathbf{S}$, then there are $\prod_{k=1}^{I} \mathbf{F}[k]$ bases below V and $\prod_{k=I+1}^{L} \mathbf{S}[k]$ apexes above V.

B.2.3 Cost Functions

Th. 2.3.1. In a uniform banyan, the number of arcs (measure of cost) is given by $CM = \sum_{I=1}^{L} \mathbf{F}[I] \left( \prod_{k=1}^{I} \mathbf{S}[k] \prod_{k=I+1}^{L} \mathbf{F}[k] \right)$.

Cor. 2.3.1a. In a rectangular banyan, $CM = \left( \prod_{k=1}^{L} \mathbf{F}[k] \right) \left( \sum_{k=1}^{L} \mathbf{F}[k] \right)$.

Cor. 2.3.1b. In a regular, rectangular banyan, $CM = F^{L} \cdot L \cdot F = F \cdot N \cdot \log_{F} N$, where $N = F^{L}$.

Cor. 2.3.1c. In a regular banyan, $CM = F \sum_{I=1}^{L} S^{I} F^{L-I}$.

Lemma 2.3.2. If X is a real number other than 1 and L is a positive integer, then $\sum_{I=1}^{L} X^{I} = X (X^{L} - 1) / (X - 1)$.

Cor. 2.3.1d. In a regular, nonrectangular banyan, $CM = N[0] \cdot S \cdot \dfrac{(S/F)^{L} - 1}{(S/F) - 1}$, where $N[0] = F^{L}$.

B.2.4 Optimum Fanouts

Th. 2.4.1. A cost or cost/performance measure given by $CM[I] = F \cdot N \cdot (\log_{F} N)^{I}$ for a regular, rectangular banyan is minimized with respect to F when $F = e^{I}$.

Th. 2.4.2. Optimum integer values of fanout F for a regular, rectangular banyan are 3 using cost measure CM[1] above and 7 using cost/performance measure CM[2]. Also, the value of measure CM[1] is the same for F = 2 as for F = 4.
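The cost and optimum-fanout statements above can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names `arc_count` and `cost_measure` and the sample value N = 4096 are assumptions made for illustration, and vectors are stored as 0-indexed Python lists, so `fanout[I - 1]` plays the role of F[I].

```python
from math import isclose, log, prod

def arc_count(fanout, spread):
    """Arcs in a uniform banyan: sum over levels I = 1..L of F[I] * N[I]."""
    L = len(fanout)
    # N[I] = (product of the first I spreads) * (product of the last L - I fanouts)
    sizes = [prod(spread[:I]) * prod(fanout[I:]) for I in range(L + 1)]
    return sum(fanout[I - 1] * sizes[I] for I in range(1, L + 1))

def cost_measure(F, N, I):
    """CM[I] = F * N * (log_F N) ** I for a regular, rectangular banyan."""
    return F * N * (log(N) / log(F)) ** I

# Regular, rectangular case: F = 2, L = 3, so N = 8 and CM = F * N * L.
rect = arc_count([2, 2, 2], [2, 2, 2])

# Regular, nonrectangular case: F = 2, S = 3, L = 2 (hypothetical sizes).
nonrect = arc_count([2, 2], [3, 3])
closed_form = 4 * 3 * ((3 / 2) ** 2 - 1) / ((3 / 2) - 1)

# Best integer fanouts for a fixed (assumed) N, treating log_F N as a real number.
best_cm1 = min(range(2, 20), key=lambda F: cost_measure(F, 4096, 1))
best_cm2 = min(range(2, 20), key=lambda F: cost_measure(F, 4096, 2))
```

Under these assumptions the search over integer fanouts agrees with Th. 2.4.2, and the arc count of the nonrectangular example agrees with the closed form of Cor. 2.3.1d.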
142 B.3 SW Banyans B.3.1 Synthesis Def. 3.1.1. Crossbar. Th. 3.1.1. Crossbars are ~egular. A crossbar w:lth F bases and S apexes has fanout F and spread S. Def. 3.1.2. SW banyan, synthesized SW banyan, component cross bar. Th~ 3. 1. 2. SW banyans are banyans. Th. 3. l. 3. SW banyans are Llevel banyans. If an SW banyan G' with L' levels is the interconnection graph of a synthesized SW banyan G withL levels, thenL = L'+1, the bases of the component crossbars in level I of G' are the vertices of level I of G, and the apexes of the component crossbars in level I of G' are the vertices of level I+1 of G. Cor. 3.1.3a. If Vis a base or an intermediate of a synthesized SW banyan G with interconnection graph G', then there exists a unique component crossbar V' in level ~V of G' such that Vis a base of V'. Cor. 3.1.3b. If Vis an apex or an intermediate of a synthesized SW banyan G with interconnection graph G', then there exists a unique component crossbar V' in level mv)1 of G' such that Vis an apex of V' Th. 3.1.4. If a sythesized SW banyan G has an interconnection graph G' with fanout vector E' and spread vector~', then G is uniform with fanout vector B.E' and spread vector~, ,A.
143 Th. 3.1.5. If a synthesized SW banyan is uniform with fanout vector E and spread vector, then its interconnection graph is uniform with fanout vector 1 +E and spread vector ( 1) :h?.'.. Th~ 3~ 1. 6. If B1 and B2 are distinct bases of a synthesized SW banyan G with interconnection graph G' and if B1' and B2 1 are the bases of G' that contain B1 and B2 respectively as bases, then (B1'!tlB2') = (B1~2)1. Th. 3.1. 7. If A1 and A2 are distinct apexes of a synthesized SW banyan G with interconnection graph G' and if A1' and A2' are the apexes of G' that contain Al and A2 respectively as apexes, then (A1 '&142') = (A1~2)1. B.3.2 Distance Prop~rties Th. 3. 2. 1. If Bl and B2 are bases of an SW banyan and if V is a vertex such that (B1W2) ~V, then Bl~ V iff B2 V. Th. 3.2.2. If Al and A2 are apexes of an SW banyan with L levels and if Vis a vertex such that (A11\ilA2) L~V, then V Al iff V A2. Def. 3.2.1. ~X Def. 3.2.2. ~x Th. 3.2.3. The relation ~Xis both transitive and symmetric. Cor. 3.2.3a. If X O then ~Xis an equivalence relation. C 3 2 3b The base dl. stance operation is a metric on the or. bases of an SW banyan. Th. 3.2.4. The relation ~Xis both transitive and symmetric. Cor. 3~2~4a. If X o then ~X is a n equivalence rel a tion.
144 Cor. 3.2 4b. The apex distance operation ~ is a metric on the bases of an SW banyan. Th. 3.2.5. If O $ X $ z then the equivalence classes of relation ~X are subsets of those of relation ~y Th. 3.2.6. If O $ X $ Y then the equivalence classes of relation ~X are subsets of those of relation ~yTh. 3.2.7. If G is a uniform SW banyan with L levels and fanout vector f. and if I is a n integer such tha t O $ I $ L, then the relation ~I partitions the bases of G into x/IiE equivalence classes containing x/ItE bases each. Cor~ 3.2.7a. Let G be a uniform SW banyan with L levels and fanout vector E, and let I be an integer such that 1 $I$ L. Then the relation ~Ii has E[I] equivalence classes that are subsets of any given equivalence class of ~I. Th. 3.2.8. If G is a uniform SW banyan with L levels and spread vector and if I is an integer such that O $I$ L, then the relation ~I partitions the apexes of G into x/(I)tQ equivalence classes containing x/(I)tQ apexes each. Cor. 3.2.8a. Let G be a uniform SW banyan with L levels and spread vector fi., and let I be an integer such that 1 $ I :e,; L. Then the relation ~Ii has Q[L(I1)] equivalence classes that are subsets of any given equivalence class of ~I. B.3.3 Connectability Th. 3.3.l. Let SA1 and SA2 be sets o:f apexes and let SB1 and SB2 be subsystems of an SW banyan with L levels. Let 8C1 = SB1J>tSA1 and let SC2 = SB2'!4.SA2 Then the callsets SC1 and SC2 conflict with each other iff L :2: (SB1~SB2)+SA1~A2.
145 B.4 CC Banyans B.4.1 Structure Def. 4.1.1. CC Banyan. Lemma 4.1.1. Every CC banyan is the graph a partial order. Th. 4.1.2. In a CC banyan, there is exactly one path from a vertex f[I1;J1] to vertex f[I2;J2] if Ti.< I2 and (J28J1) < x/I2t. and O = (x/IH.)1J2J1; otherwise, there is no path. Cor. 4.1.2a. In a CC banyan E[I1;J1) Bl f[J2;J2l iff Ti $ I2 and (J2eJ1) < x/I2t!]_ and O = (x/Ilt.) ]J2J1. Cor~ 4.1.2b. In a CC banyan ~[J1l f[I;J2] iff (J2eJ1) < x/It.. Cor~ 4.1.2c. In a CC banyan E[I;J1] 0l[J2l iff O = (x/It.)jJ2J1. Th. 4.1. 3. A CC banyan is a rectangular banyan with L levels and with fanout/spread vector.. Also; ~[O~N1] are its bases and A[O~N1] are its bases. B.4.2 Distance Properties Def. 4.2.1. Minimum circular distance~Lemma 4.2.1. If (XeY) $Mand (ZeY) $ M, then (mz) $ M, where X, Y, Z, and Mare arbitrary integers. Lemma 4.2.2. If (YeX) $Mand (YeZ) $ M, then (XeZ) $ M, where X, Y, Z, and Mare arbitrary integers. Th. 4.2.3. is a metric on the integers O~N1. Th. 4.2.4. If ~[J1l and ~[J2J are bases of a CC banyan and I is an integer such that O $I$ L, then rn[J1]~l,i[J2]) $ I iff (J1W2) < x/ It.. Th~ 4. 2. 5. If A,[J1 l and A[J1 l are apexes of a CC bany an and I is an integer such that O ~I$ L, then (d.[J1]rM,[J2]).,:; L1 iff 0 = (x/It.)1J2J1.
146 Cor. 4.2.Sa. If A[J1] and A[J2] are apexes of a CC banyan and I is an integer such that O $I$ L, then (A[J1]1SM_[J2]) $ I iff O = (x/(,J)+!2'.) ]J2J1. Th. 4.2.6. is a metric on the bases of a CC banyan, provided that 2 $ Q[I] for every I= 1~L. Th. 4.2.7. is a metric on the apexes of a CC banyan.
B.1 Banyans

A banyan is defined to be a certain kind of graph.

Definition 1.1. A vertex of a directed graph G is called a base of G iff there are no arcs incident into it in G, is called an apex of G iff there are no arcs incident out from it in G, and is called an intermediate of G otherwise.

Definition 1.2. A banyan is a nontrivial finite graph such that its associated weak ordering $\le$ is a partial ordering¹ and such that for every base B and apex A, there is one and only one path from B to A.

If V1 and V2 are vertices of a banyan and $V1 \le V2$, then V1 is said to be below V2, and V2 is said to be above V1. Notice that by this definition, a vertex is always both above and below itself. For notational convenience, we also define the operators $\operatorname{above}$ and $\operatorname{below}$, which give us, respectively, the vertices above and below a vertex or set of vertices.

Definition 1.3. In a banyan with partial order $\le$, let V be a vertex and let SV be a set of vertices. $\operatorname{above}$ and $\operatorname{below}$ are monadic operators defined as follows:

$$\operatorname{above}(V) = \{X : V \le X\}$$
$$\operatorname{above}(SV) = \{X : (\exists Y)\,(Y \in SV) \wedge (Y \le X)\}$$
$$\operatorname{below}(V) = \{X : X \le V\}$$
$$\operatorname{below}(SV) = \{X : (\exists Y)\,(Y \in SV) \wedge (X \le Y)\}$$

¹ A directed graph is associated with a partial order iff it contains no circuits. Such graphs are sometimes called Hasse diagrams. See Berge (62, p. 12).
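Definitions 1.1 through 1.3 translate directly into a small graph routine. The sketch below is illustrative only and assumes a graph given as a list of (tail, head) arcs; the helper names `reachable`, `above`, `below`, `count_paths`, and `is_banyan` are hypothetical, and the example network (a two-level network of 2-by-2 switches) is an assumed test case rather than one taken from the text.

```python
def reachable(arcs, start, forward=True):
    """Vertices reachable from start (start included), following arcs
    forward (apexward) or backward (baseward)."""
    step = {}
    for u, v in arcs:
        a, b = (u, v) if forward else (v, u)
        step.setdefault(a, set()).add(b)
    seen, stack = {start}, [start]
    while stack:
        for w in step.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def above(arcs, vs):
    """All vertices above some vertex in vs (Definition 1.3)."""
    return set().union(*(reachable(arcs, v, True) for v in vs))

def below(arcs, vs):
    """All vertices below some vertex in vs (Definition 1.3)."""
    return set().union(*(reachable(arcs, v, False) for v in vs))

def count_paths(arcs, src, dst):
    """Number of directed paths from src to dst; assumes no circuits."""
    succ = {}
    for u, v in arcs:
        succ.setdefault(u, []).append(v)
    memo = {}
    def go(v):
        if v == dst:
            return 1
        if v not in memo:
            memo[v] = sum(go(w) for w in succ.get(v, ()))
        return memo[v]
    return go(src)

def is_banyan(arcs):
    """Definition 1.2: no circuits, and exactly one path per (base, apex)."""
    verts = {x for arc in arcs for x in arc}
    if any(u in reachable(arcs, v) for u, v in arcs):
        return False  # some arc lies on a circuit
    bases = verts - {v for _, v in arcs}   # no arcs incident into them
    apexes = verts - {u for u, _ in arcs}  # no arcs incident out from them
    return all(count_paths(arcs, b, a) == 1 for b in bases for a in apexes)

# Assumed example: vertex (i, j) is vertex j of level i; each vertex feeds
# two vertices of the next level, differing in bit i - 1 of the index.
L = 2
arcs = [((i - 1, j), (i, k))
        for i in range(1, L + 1)
        for j in range(1 << L)
        for k in (j, j ^ (1 << (i - 1)))]
```

Adding a shortcut arc such as `((0, 0), (2, 0))` creates a second path from that base to that apex, so `is_banyan` then rejects the graph.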
B.1.1 Connecting Trees

Next we show that the partial subgraph formed by all paths from a set of bases to a single apex is a tree. This is the basis for asserting in Section 4.1 that subsystems are connected by tree-shaped connections in a banyan network.

Definition 1.1.1. Let G be a banyan, let A be an apex, and let SB be a nonempty set of bases. The partial subgraph of G consisting of all arcs and vertices of all paths from bases in SB to apex A is called a connecting tree.

Theorem 1.1.1. Every connecting tree is a tree.

Proof. Let T be a connecting tree connecting apex A with a set of bases SB in banyan G. Since every vertex of T is a vertex of some path to A, T is connected. Now suppose that there exists in T a cycle C. T contains no circuits, because it is a partial subgraph of a graph G of a partial order. Since C is a cycle but not a circuit, it must have some vertex V that is the initial vertex of at least 2 arcs. These 2 or more arcs must be parts of 2 or more nonidentical paths from V to A. Thus, there exists some base B (any base below V will do) such that there are 2 or more paths from B to apex A. But this contradicts the definition of banyan G. Therefore, T contains no cycles. Since T is connected and contains no cycles, it is a tree. Q.E.D.

In a similar manner, it can be shown that the paths connecting a single base to a set of apexes likewise form a tree. Also, it is apparent that any banyan with only one apex or with only one base is a tree. These corollaries are simply pointed out for mathematical interest and will not be treated formally.
B.1.2 Banyan Synthesis

The recursive method discussed in Section 4.3 for synthesizing large banyans from smaller ones will be proven in this section.

Definition 1.2.1. Let CS be a set of banyan subgraphs of a directed graph G such that the arc sets of the banyans in CS together form a partitioning of the arcs of G and also such that for any $X \in CS$,
1) no intermediate of X is a vertex of any other element of CS,
2) no base of X is a vertex of any other element of CS unless each base of X is an apex of a different element of CS, and
3) no apex of X is a vertex of any other element of CS unless each apex of X is a base of a different element of CS.
Any such graph G is said to be synthesized, CS is called a component set of G, and the elements of CS are called component banyans of G. Now let G' be a directed graph on CS (i.e., the component banyans of G are the vertices of G') such that for any component banyans V1' and V2' in CS, there exists an arc in G' from V1' to V2' iff some apex of V1' is also a base of V2'. G' is called an interconnection graph of G.

It is possible for a synthesized graph G to have more than one component set, and hence, more than one interconnection graph. Strictly speaking, a subgraph of G is a component banyan of G if it is an element of any component set of G; i.e., if it is a vertex of any interconnection graph of G. In subsequent theorems, however, we will generally select a particular interconnection graph of G and use the term "component banyan" to refer to its vertices only.

Theorem 1.2.1. Let G be a synthesized graph with interconnection graph G' and let V' be a component banyan (i.e., a vertex of G'). Then, if
there are any arcs incident into V' in G', there is exactly one such arc for each base of V'. Also, if there are any arcs incident out from V' in G', then there is exactly one such arc for each apex of V'.

Proof. Suppose that there is an arc incident into V' in G', so that some base of V' is an apex of another component banyan. By part 2 of Definition 1.2.1, each base of V' is an apex of a different component banyan. By the definition of an interconnection graph, these component banyans correspond one-for-one with the arcs incident into V' in G'. Thus, there is one such arc for each base of V'.

Now suppose that there is an arc incident out from V' in G', so that some apex of V' is a base of another component banyan. By part 3 of Definition 1.2.1, each apex of V' is a base of a different component banyan. By the definition of an interconnection graph, these component banyans correspond one-for-one with the arcs incident out from V' in G'. Thus, there is one such arc for each apex of V'. Q.E.D.

Theorem 1.2.2. Let G be a synthesized graph with interconnection graph G', let V1' and V2' be distinct vertices of G', and let V1 and V2 be vertices of G such that V1 is a base of V1' and V2 is an apex of V2'. Then there exists a path from V1 to V2 in G iff there exists a path from V1' to V2' in G'.

Proof. First, suppose that there exists a path from V1 to V2 in G, and let A[1], A[2], ..., A[N] be the sequence of arcs in this path. Since the arc sets of the vertices of G' (component banyans) form a partitioning of the arcs of G, there exists a sequence of indices I[0], I[1], ..., I[N'] such that
1) $1 \le I[0] < I[1] < \dots < I[N'] = N$,
2) A[1], ..., A[I[0]] are arcs of the same component banyan, and
3) for each J = 0, ..., N'-1, the arcs A[I[J]+1], A[I[J]+2], ..., A[I[J+1]] belong to the same component banyan, but A[I[J]] does not belong to the same component banyan as A[I[J+1]].

Now let H[0], H[1], ..., H[N'] be the component banyans containing arcs A[I[0]], A[I[1]], ..., A[I[N']] respectively. Then H[0] is also the component banyan containing A[1], which is an arc incident out from V1 in G. But since V1 is a base of V1', all arcs incident out of V1 in G are arcs of V1'. Therefore, H[0] = V1'. Similarly, H[N'] contains arc A[I[N']] = A[N], which is incident into V2 in G. Then since V2 is an apex of V2', H[N'] = V2'.

Now for each J = 0, ..., N'-1, A[I[J]] and A[I[J]+1] are consecutive arcs of a path in G. Thus, the terminal vertex of A[I[J]] is the initial vertex of A[I[J]+1]. But since A[I[J]] is an arc of H[J] and A[I[J]+1] is an arc of H[J+1], this vertex is common to both H[J] and H[J+1]. By Definition 1.2.1, this common vertex must either be an apex of H[J] and a base of H[J+1] or else a base of H[J] and an apex of H[J+1]. It cannot be a base of H[J], however, because there exists an arc A[I[J]] incident into it in H[J]. Consequently, the common vertex is an apex of H[J] and a base of H[J+1], implying that there is an arc from H[J] to H[J+1] in G'. Therefore, H[0], H[1], ..., H[N'] are the vertices of a path from V1' to V2' in G'.

Now suppose there exists a path P' from V1' to V2' in G', and let H[0], H[1], ..., H[N'] be the sequence of vertices of G' along this path. Thus, H[0] = V1' and H[N'] = V2'. Since P' is a path in an interconnection graph, each component banyan H[I] such that $1 \le I \le N'$ has some base that
is an apex of H[I-1]. Let B[0] = V1, let B[N'+1] = V2, and for each J = 1, ..., N' let B[J] be a base of H[J] which is also an apex of H[J-1]. Then for each J = 0, ..., N', there exists a path from B[J] to B[J+1] in G, because B[J] is a base and B[J+1] is an apex of component banyan H[J]. Therefore, there exists a path from B[0] to B[N'+1] in G. Since B[0] = V1 and B[N'+1] = V2, there exists a path from V1 to V2 in G. Q.E.D.
Corollary 1.2.2a. Let G be a synthesized graph with interconnection graph G', let V1' and V2' be any vertices of G' (not necessarily distinct), and let V1 and V2 be vertices of G such that V1 is a base of V1' and V2 is an apex of V2'. Then $V1 \le V2$ in G iff $V1' \le V2'$ in G'.

Proof. First, suppose that V1' = V2'. Then $V1' \le V2'$. Also, since V1 is a base and V2 is an apex of the same component banyan, there exists a path from V1 to V2 in G and hence $V1 \le V2$. Therefore, $V1 \le V2$ iff $V1' \le V2'$. On the other hand, suppose that $V1' \ne V2'$. Then $V1 \le V2$ iff $V1' \le V2'$, by Theorem 1.2.2. Q.E.D.

Theorem 1.2.3. If G is a synthesized graph with an interconnection graph G', then G is a banyan iff G' is a banyan.

Proof. Let G be a synthesized graph with interconnection graph G'. There are three ways a directed graph like G or G' can fail to be a banyan: it could contain a circuit, there could be more than one path from some base to some apex, or there might be no path from some base to some apex. We will show that the existence of any of these conditions in G would cause the same condition to exist in G' and vice versa.

First, suppose that G contains a circuit C. Since no circuit can exist wholly within any component banyan, C must pass through at least two component banyans V1' and V2'. Thus, there exists a base V1 of V1' and an apex V2 of V2' such that both V1 and V2 are vertices of C. Since there exists a path from V1 to V2 in G, there exists a path from V1' to V2' in G' by Theorem 1.2.2. If V1 = V2 then G' contains an arc from V2' to V1'. If $V1 \ne V2$ then G contains a path from V2 to V1, implying by
Theorem 1.2.2 that G' contains a path from V2' to V1'. Thus, in either case, V1' and V2' are vertices of a circuit in G'.

Second, suppose that G contains at least two paths P1 and P2 from some base B to an apex A. Then B is a base of some base B' of G', and A is an apex of some apex A' of G'. Let V1[0], V1[1], ..., V1[N1] be the sequence of vertices along P1, and let V2[0], V2[1], ..., V2[N2] be the sequence of vertices along P2. Let I be the integer such that the first I+1 vertices of P1 equal the first I+1 vertices of P2; i.e., $V1[I+1] \ne V2[I+1]$ but $0 \le J \le I$ implies that V1[J] = V2[J]. (Thus, V1[I] = V2[I].) Let VI' be the component banyan containing V1[I] and the arcs incident out from V1[I]. Thus, V1[I+1] and V2[I+1] are also in VI'. Since $V1[I+1] \ne V2[I+1]$ and there cannot be two distinct paths from the same base to the same apex in VI', there exist two distinct apexes W1 and W2 of VI' such that W1 is a vertex of P1 and W2 is a vertex of P2. Then W1 and W2 must be bases of two distinct component banyans W1' and W2', and hence, there exist arcs from VI' to W1' and from VI' to W2' in G'. Finally, let B' be the base of G' containing B as a base. Then by Theorem 1.2.2, there exist paths in G' from B' to VI', from W1' to A', and from W2' to A'. Therefore, G' contains two distinct paths from B' to A', one of which passes through vertex W1' and the other through W2'.

Third, suppose that G contains no path from some base B to some apex A. Let B' be the base of G' that contains B as a base and let A' be the apex of G' that contains A as an apex. Then by Theorem 1.2.2, G' contains no path from B' to A'.

Thus far we have shown that if G is not a banyan, then neither is G'. The converse is proven next.
First, suppose that G' contains a circuit C'. Let V1' and V2' be the initial and terminal vertices respectively of some arc in C'. Hence, there exists a vertex V of G which is both an apex of V1' and a base of V2'. Since V cannot be both a base and an apex of the same component banyan, $V1' \ne V2'$. Thus, by Theorem 1.2.2, there exists a path from V to V in G, implying that G contains a circuit.

Second, suppose that G' contains two distinct paths P1' and P2' from some base B' to an apex A', and let V1' and V2' be distinct vertices of P1' and P2' respectively. Let B be a base of B' and let A be an apex of A'. Thus, B is also a base of G and A is an apex of G. Let V1A be an apex of V1', and let V2A be an apex of V2'. Then by Theorem 1.2.2, there exist paths in G from B to V1A and from B to V2A. The path from B to V1A must pass through some base V1B of V1', and likewise, that from B to V2A must pass through some base V2B of V2'. Then by Theorem 1.2.2, there exist paths in G from V1B to A and from V2B to A. Further, $V1B \ne V2B$ since component banyans V1' and V2' cannot have a common base. Therefore, G contains two distinct paths from B to A, one through vertex V1B and the other through V2B.

Third, suppose that G' contains no path from some base B' to some apex A'. Let A be an apex of A', and let B be a base of B'. Thus, A is an apex of G, and B is a base of G. By Theorem 1.2.2, G contains no path from B to A. Q.E.D.
B.1.3 Control of Connections

The two-step setup method described in Section 4.4 can be justified theoretically. When we broadcast a "one" baseward from an apex A, it propagates to all vertices below A. Likewise, when we broadcast "ones" apexward from a set of bases SB, they propagate to all vertices above any bases in SB. The set of vertices selected by the procedure is therefore $\operatorname{below}(A) \cap \operatorname{above}(SB)$. The next theorem shows that these are precisely the vertices of the connecting tree connecting apex A with the bases in SB.

Theorem 1.3.1. The set of vertices in the connecting tree connecting an apex A with a set of bases SB in a banyan is $\operatorname{below}(A) \cap \operatorname{above}(SB)$.

Proof. The vertices of a path from any base B to apex A are $\{V : (B \le V) \wedge (V \le A)\}$. The set of vertices in the desired tree connecting a set of bases SB to apex A is thus

$$\{V : (\exists B)\,(B \in SB) \wedge (B \le V) \wedge (V \le A)\}$$
$$= \{V : (V \le A) \wedge (\exists B)\,(B \in SB) \wedge (B \le V)\}$$
$$= \operatorname{below}(A) \cap \operatorname{above}(SB).$$

Q.E.D.

We will next prove that the two-step search procedure described in Section 4.4 correctly selects those apexes which can be used to connect a new block without interfering with those already connected.

Suppose that SV is the set of all vertices that are already in use and hence, must not be part of the new connecting tree. Suppose also that SB is the set of bases in the subsystem to be connected. In step one
of the procedure, a control signal is broadcast to all vertices above any base in SB; i.e., to $\operatorname{above}(SB)$. A flip-flop is set in each vertex in SV that receives this signal. The vertices thus selected are $SV \cap \operatorname{above}(SB)$. In step two, each of these vertices broadcasts a signal apexward so that the vertices receiving this signal are $\operatorname{above}(SV \cap \operatorname{above}(SB))$. The following theorem will show that the apexes in this set are precisely those which cannot be used to connect subsystem SB.

Theorem 1.3.2. Let SB be a set of bases and let SV be a set of vertices of a banyan. The connecting tree connecting the bases in SB with an apex A contains one or more vertices in SV iff $A \in \operatorname{above}(SV \cap \operatorname{above}(SB))$.

Proof. The set of vertices in the connecting tree connecting SB with A is $\operatorname{above}(SB) \cap \operatorname{below}(A)$. The intersection of this set with SV is nonempty iff:

$$(\exists V)\; V \in (SV \cap \operatorname{above}(SB)) \cap \operatorname{below}(A)$$
$$(\exists V)\; (V \in SV \cap \operatorname{above}(SB)) \wedge (V \in \operatorname{below}(A))$$
$$(\exists V)\; (V \in SV \cap \operatorname{above}(SB)) \wedge (V \le A)$$
$$A \in \operatorname{above}(SV \cap \operatorname{above}(SB))$$

Q.E.D.

B.1.4 Connectability

It was pointed out in Section 4.5 that a single banyan network cannot necessarily connect all desired subsystems of an arbitrary partition simultaneously. When considering any particular banyan, it is important to determine how often and under what circumstances multiplexing or additional networks are likely to be needed. We will therefore define several terms concerning the ability of a banyan network to connect multiple subsystems.

Definition 1.4.1. A subsystem is a set of bases.
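Theorems 1.3.1 and 1.3.2 can be exercised on a small example. The sketch below is an illustration under stated assumptions, not the control hardware described in Section 4.4: the three-level network of 2-by-2 switches, the vertex naming (level, index), and the helper names `closure`, `connecting_tree`, and `blocked_apexes` are all hypothetical.

```python
L = 3
N = 1 << L
# Assumed example banyan: vertex (i, j) is vertex j of level i.
arcs = [((i - 1, j), (i, k))
        for i in range(1, L + 1)
        for j in range(N)
        for k in (j, j ^ (1 << (i - 1)))]
apexes = {(L, j) for j in range(N)}

def closure(vs, forward):
    """Vertices reachable from the set vs (vs included) along the arcs."""
    step = {}
    for u, v in arcs:
        a, b = (u, v) if forward else (v, u)
        step.setdefault(a, set()).add(b)
    seen, stack = set(vs), list(vs)
    while stack:
        for w in step.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def above(vs):
    return closure(vs, True)

def below(vs):
    return closure(vs, False)

def connecting_tree(SB, A):
    """Theorem 1.3.1: vertices of the tree connecting apex A with bases SB."""
    return below({A}) & above(SB)

def blocked_apexes(SB, SV):
    """Theorem 1.3.2: apexes whose connecting tree for SB would touch SV."""
    return above(SV & above(SB)) & apexes

# Vertices already in use: subsystem {base 2, base 3} connected to apex 2.
SV = connecting_tree({(0, 2), (0, 3)}, (L, 2))
# New subsystem to be connected.
SB = {(0, 0), (0, 1)}
blocked = blocked_apexes(SB, SV)
usable = apexes - blocked
```

In this assumed example the two-step search blocks two of the eight apexes, and a direct check of every apex gives the same answer as the set expression of Theorem 1.3.2.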
Definition 1.4.2. If C is a two-element vector, or ordered pair, such that C[1] is a base and C[2] is an apex, then C is said to be a call from base C[1] to apex C[2]. A callset is a set of calls.

Calls and callsets are used to characterize connections one might wish to make in a banyan network. For any call C, there corresponds a unique path through the network from base C[1] to apex C[2]. A connecting tree connecting a subsystem SB with an apex A is characterized by a callset containing a call from each base in SB to apex A.

Two or more connections can exist simultaneously without interference in a banyan network if and only if no two of them require the same vertex. Connections which cannot coexist in a given network are said to conflict with each other. Similarly, a subsystem is said to conflict with connections or other subsystems if all possible connections (connecting trees) for that subsystem conflict with the connections or other subsystems. A set of connections and/or subsystems is said to be connectable if no two of them conflict.

Definition 1.4.3. Two calls C1 and C2 conflict with each other iff, in a given banyan, the path from C1[1] to C1[2] has some vertex in common with the path from C2[1] to C2[2]. A call C and a callset SC conflict with each other iff some element of SC conflicts with C. A subsystem SB and a call C conflict with each other iff for every apex A, the callset $SB \times \{A\}$ (the calls from each base in SB to A) conflicts with C. A call, callset, or subsystem X and a set of calls, callsets, and/or subsystems SX conflict with each other iff X conflicts with some element of SX. If the elements of two sets SX1 and SX2 are calls, callsets, and/or subsystems, then the sets conflict with each other iff some element of SX1 conflicts with some element of SX2.
Definition 1.4.4. A set of calls, callsets, and/or subsystems is connectable iff no two of its elements conflict with each other.

Connections or subsystems which cannot exist simultaneously in a given network can be connected in different layers as described in Section 4.5. A set of connections and/or subsystems is called K-connectable if it can be connected using K or fewer layers.

Definition 1.4.5. A set of calls, callsets, and/or subsystems is K-connectable iff it can be partitioned into K or fewer connectable subsets.
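The layering idea behind K-connectability can be sketched for the simple case in which each connection is already fixed as a set of vertices. The first-fit routine below is a hypothetical illustration, not a method from the text: it only yields an upper bound on the smallest K, and it does not model the freedom a subsystem has in choosing its apex. The name `greedy_layers` and the sample vertex sets are assumptions.

```python
def greedy_layers(connections):
    """First-fit assignment of fixed connections (vertex sets) to layers.

    Two connections conflict iff they share a vertex, so every layer is
    kept pairwise disjoint and is therefore a connectable subset.
    """
    layers = []
    for conn in connections:
        for layer in layers:
            if all(conn.isdisjoint(other) for other in layer):
                layer.append(conn)
                break
        else:
            layers.append([conn])  # no existing layer can take it
    return layers

# Hypothetical connections, each given by the vertices it occupies.
connections = [{1, 2, 3}, {3, 4}, {5, 6}, {2, 5}]
layers = greedy_layers(connections)
```

If the routine returns K layers, the set of connections is K-connectable in the sense of Definition 1.4.5; a smaller K may still exist, since first-fit is not guaranteed to be optimal.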
B.2 L-Level Banyans

An L-level banyan, discussed in Section 5, is simply a banyan whose vertices are arranged in levels so that direct connections can only exist between vertices in adjacent levels.

Definition 2.1. An L-level banyan is a banyan whose vertices can be partitioned into L+1 subsets SV[0], ..., SV[L] such that for any two vertices V1 and V2, an arc can exist from V1 to V2 only if $V1 \in SV[I-1]$ and $V2 \in SV[I]$ for some I = 1, ..., L. The vertices in each subset SV[I] are said to be in level I.

B.2.1 Base and Apex Distances

The base and apex distance functions discussed in Section 5.1 will be defined formally here. The formal definition is somewhat more general than that of 5.1, because it allows distance to be measured from calls and callsets as well as from bases, subsystems, apexes, and sets of apexes. The monadic operator $\operatorname{level}$, which gives the level of a vertex, is also defined for notational convenience.

Definition 2.1.1. Let V be a vertex and let SV be a set of vertices in an L-level banyan. We define the monadic operator $\operatorname{level}$ so that $\operatorname{level}(V)$ is the level of V and so that $\operatorname{level}(SV)$ is the set $\{\operatorname{level}(W) : W \in SV\}$.

Definition 2.1.2. In an L-level banyan, let B1 and B2 each be bases or subsystems, and let A1 and A2 each be apexes or sets of apexes. Also let C be a call or callset and let SV be the set of all vertices in the corresponding paths. The dyadic operators $\operatorname{bd}$ and $\operatorname{ad}$ are defined as follows:
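$$\operatorname{bd}(B1, B2) = \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2))$$
$$\operatorname{ad}(A1, A2) = L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$\operatorname{bd}(B1, C) = \operatorname{bd}(C, B1) = \min \operatorname{level}(SV \cap \operatorname{above}(B1))$$
$$\operatorname{ad}(A1, C) = \operatorname{ad}(C, A1) = L - \max \operatorname{level}(SV \cap \operatorname{below}(A1))$$

Since $\operatorname{bd}$ is defined for bases and $\operatorname{ad}$ for apexes, these operators are called base distance and apex distance respectively. Either may simply be called "distance" when the distinction is clear from context.

The following theorems express fundamental properties of base and apex distances.

Theorem 2.1.1. Base and apex distance operations are both commutative.

Proof. Let B1, B2, A1, A2, and C be as defined in Definition 2.1.2. Then

$$\operatorname{bd}(B1, B2) = \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2)) = \min \operatorname{level}(\operatorname{above}(B2) \cap \operatorname{above}(B1)) = \operatorname{bd}(B2, B1)$$

and

$$\operatorname{ad}(A1, A2) = L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2)) = L - \max \operatorname{level}(\operatorname{below}(A2) \cap \operatorname{below}(A1)) = \operatorname{ad}(A2, A1).$$

By Definition 2.1.2, $\operatorname{bd}(B1, C) = \operatorname{bd}(C, B1)$ and $\operatorname{ad}(A1, C) = \operatorname{ad}(C, A1)$. Q.E.D.

Theorem 2.1.2. In an L-level banyan, no base or apex distance can be less than zero or greater than L.

Proof. Let B1, B2, A1, A2, C, and SV be as defined in Definition 2.1.2.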
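Then each of

$$\operatorname{above}(B1) \cap \operatorname{above}(B2), \quad \operatorname{below}(A1) \cap \operatorname{below}(A2), \quad SV \cap \operatorname{above}(B1), \quad SV \cap \operatorname{below}(A1)$$

is a set of vertices of an L-level banyan. But since each vertex of an L-level banyan has a nonnegative level number not exceeding L,

$$0 \le \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2)) \le L$$
$$0 \le L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2)) \le L$$
$$0 \le \min \operatorname{level}(SV \cap \operatorname{above}(B1)) \le L$$
$$0 \le L - \max \operatorname{level}(SV \cap \operatorname{below}(A1)) \le L$$

Q.E.D.

Theorem 2.1.3. Let B1 and B2 be bases and let A1 and A2 be apexes of an L-level banyan G. Then $0 = \operatorname{bd}(B1, B2)$ iff B1 = B2, and $0 = \operatorname{ad}(A1, A2)$ iff A1 = A2.

Proof. First, suppose that $0 = \operatorname{bd}(B1, B2)$. Then $0 = \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2))$, so the set $\operatorname{above}(B1) \cap \operatorname{above}(B2)$ contains some vertex V in level 0 of G. Hence, $V \in \operatorname{above}(B1)$ and $V \in \operatorname{above}(B2)$. But the only element of $\operatorname{above}(B1)$ in level 0 is B1, and the only element of $\operatorname{above}(B2)$ in level 0 is B2. Thus, V = B1 and V = B2, so B1 = B2.

Next, suppose that B1 = B2. Then $B1 \in \operatorname{above}(B1) \cap \operatorname{above}(B2)$. Since $0 = \operatorname{level}(B1)$, $0 = \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2))$, and hence, $0 = \operatorname{bd}(B1, B2)$. Therefore, $0 = \operatorname{bd}(B1, B2)$ iff B1 = B2.

Now, suppose that $0 = \operatorname{ad}(A1, A2)$. Then $0 = L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$, so the set $\operatorname{below}(A1) \cap \operatorname{below}(A2)$ contains some vertex V in level L of G. Hence, $V \in \operatorname{below}(A1)$ and $V \in \operatorname{below}(A2)$. But the only element of $\operatorname{below}(A1)$ in level L is A1, and the only element of $\operatorname{below}(A2)$ in level L is A2. Thus, V = A1 and V = A2, so A1 = A2.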
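Next, suppose that A1 = A2. Then $A1 \in \operatorname{below}(A1) \cap \operatorname{below}(A2)$. Since $L = \operatorname{level}(A1)$,

$$L = \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$0 = L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$0 = \operatorname{ad}(A1, A2).$$

Therefore, $0 = \operatorname{ad}(A1, A2)$ iff A1 = A2. Q.E.D.

Theorem 2.1.4. Let SB1 and SB2 be subsystems and let SA1 and SA2 be sets of apexes of an L-level banyan G. Then $0 = \operatorname{bd}(SB1, SB2)$ iff $\emptyset \ne SB1 \cap SB2$, and $0 = \operatorname{ad}(SA1, SA2)$ iff $\emptyset \ne SA1 \cap SA2$.

Proof. First, suppose that $0 = \operatorname{bd}(SB1, SB2)$. Then there exist bases $B1 \in SB1$ and $B2 \in SB2$ such that $0 = \operatorname{bd}(B1, B2)$. By Theorem 2.1.3, B1 = B2. Therefore, $B1 \in SB1 \cap SB2$, so $\emptyset \ne SB1 \cap SB2$.

Next, suppose that $\emptyset \ne SB1 \cap SB2$. Then there exists a base B such that $B \in SB1$ and $B \in SB2$. But $0 = \operatorname{bd}(B, B)$, so $0 = \operatorname{bd}(SB1, SB2)$. Therefore, $0 = \operatorname{bd}(SB1, SB2)$ iff $\emptyset \ne SB1 \cap SB2$.

Now, suppose that $0 = \operatorname{ad}(SA1, SA2)$. Then there exist apexes $A1 \in SA1$ and $A2 \in SA2$ such that $0 = \operatorname{ad}(A1, A2)$. By Theorem 2.1.3, A1 = A2. Therefore, $A1 \in SA1 \cap SA2$, so $\emptyset \ne SA1 \cap SA2$.

Next, suppose that $\emptyset \ne SA1 \cap SA2$. Then there exists an apex A such that $A \in SA1$ and $A \in SA2$. But $0 = \operatorname{ad}(A, A)$, so $0 = \operatorname{ad}(SA1, SA2)$. Therefore, $0 = \operatorname{ad}(SA1, SA2)$ iff $\emptyset \ne SA1 \cap SA2$. Q.E.D.

Theorem 2.1.7, discussed in Section 5.1, will be proven next. Actually, the theorem here will be proven in a more general form that applies to arbitrary callsets, not just those of connecting trees. The proof given will be a trivial consequence of two not-so-trivial lemmas. The lemmas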
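are separated from the proof of Theorem 2.1.7 since they are felt to be of some theoretical interest in themselves.

Lemma 2.1.5. In an L-level banyan, let A1 and A2 be apexes (or sets of apexes) and let B1 and B2 be bases (or subsystems). Let SC be any set of calls from B1 (or bases in B1) to A1 (or apexes in A1). Then $\operatorname{ad}(SC, A2) \ge \operatorname{ad}(A1, A2)$ and $\operatorname{bd}(SC, B2) \ge \operatorname{bd}(B1, B2)$.

Proof. Let SV be the set of all vertices of the paths corresponding to callset SC. Hence,

$$SV \subseteq \operatorname{above}(B1) \cap \operatorname{below}(A1).$$

Therefore,

$$SV \subseteq \operatorname{above}(B1)$$
$$SV \cap \operatorname{above}(B2) \subseteq \operatorname{above}(B1) \cap \operatorname{above}(B2)$$
$$\operatorname{level}(SV \cap \operatorname{above}(B2)) \subseteq \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2))$$
$$\min \operatorname{level}(SV \cap \operatorname{above}(B2)) \ge \min \operatorname{level}(\operatorname{above}(B1) \cap \operatorname{above}(B2))$$
$$\operatorname{bd}(SC, B2) \ge \operatorname{bd}(B1, B2).$$

Similarly,

$$SV \subseteq \operatorname{below}(A1)$$
$$SV \cap \operatorname{below}(A2) \subseteq \operatorname{below}(A1) \cap \operatorname{below}(A2)$$
$$\operatorname{level}(SV \cap \operatorname{below}(A2)) \subseteq \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$\max \operatorname{level}(SV \cap \operatorname{below}(A2)) \le \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$L - \max \operatorname{level}(SV \cap \operatorname{below}(A2)) \ge L - \max \operatorname{level}(\operatorname{below}(A1) \cap \operatorname{below}(A2))$$
$$\operatorname{ad}(SC, A2) \ge \operatorname{ad}(A1, A2).$$

Q.E.D.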
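Lemma 2.1.6. In an L-level banyan, let SC1 be a callset, let A be an apex (or set of apexes), let B be a base (or subsystem), and let SC2 be a set of calls from B (or bases in B) to A (or apexes in A). Then SC1 and SC2 will not conflict with each other if $L < \operatorname{bd}(SC1, B) + \operatorname{ad}(SC1, A)$.

Proof. Let SV1 be the set of all vertices of the paths corresponding to SC1, and let SV2 be a similar vertex set for SC2. Suppose that SC1 conflicts with SC2. Then there must exist some vertex V such that

$$V \in SV1 \cap SV2.$$

But,

$$SV2 \subseteq \operatorname{above}(B).$$

Therefore,

$$V \in SV1 \cap \operatorname{above}(B)$$
$$\operatorname{level}(V) \ge \min \operatorname{level}(SV1 \cap \operatorname{above}(B)).$$

By Definition 2.1.2,

$$\operatorname{level}(V) \ge \operatorname{bd}(SC1, B). \qquad (1)$$

Similarly,

$$SV2 \subseteq \operatorname{below}(A)$$
$$V \in SV1 \cap \operatorname{below}(A)$$
$$\operatorname{level}(V) \le \max \operatorname{level}(SV1 \cap \operatorname{below}(A))$$
$$\operatorname{level}(V) \le L - \operatorname{ad}(SC1, A)$$
$$L \ge \operatorname{level}(V) + \operatorname{ad}(SC1, A). \qquad (2)$$

From (1) and (2) above,

$$L \ge \operatorname{bd}(SC1, B) + \operatorname{ad}(SC1, A).$$

Q.E.D.

Theorem 2.1.7. In an L-level banyan, let A1 and A2 be apexes or sets of apexes, let B1 and B2 be bases or subsystems, let SC1 be a set of calls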
from B1 (or bases in B1) to A1 (or apexes in A1), and let SC2 be a set of calls from B2 (or bases in B2) to A2 (or apexes in A2). Then SC1 and SC2 will not conflict with each other if $L < \operatorname{bd}(B1, B2) + \operatorname{ad}(A1, A2)$.

Proof. Suppose $L < \operatorname{bd}(B1, B2) + \operatorname{ad}(A1, A2)$. By Lemma 2.1.5,

$$\operatorname{bd}(B1, B2) + \operatorname{ad}(A1, A2) \le \operatorname{bd}(SC1, B2) + \operatorname{ad}(SC1, A2).$$

Therefore, $L < \operatorname{bd}(SC1, B2) + \operatorname{ad}(SC1, A2)$. By Lemma 2.1.6, SC1 and SC2 do not conflict. Q.E.D.

B.2.2 Fanout and Spread

We next define several terms dealing with the number of arcs incident into and out from vertices in the various levels. These properties, discussed in Section 5.2, are of practical interest because they determine the fanout and fan-in requirements of circuits used to realize a network. Also, as will be shown, they can be used to specify the size and shape of an L-level banyan.

Definition 2.2.1. An L-level banyan is uniform iff there exist vectors $\mathbf{F}[1], \dots, \mathbf{F}[L]$ and $\mathbf{S}[1], \dots, \mathbf{S}[L]$ such that for each vertex V in each level I:
1) the number of arcs incident into V is 0 if I = 0 and is $\mathbf{F}[I]$ otherwise, and
2) the number of arcs incident out from V is 0 if I = L and is $\mathbf{S}[I+1]$ otherwise.
$\mathbf{F}$ is called the fanout vector, and $\mathbf{S}$ is called the spread vector.

Definition 2.2.2. A rectangular banyan is a uniform banyan whose fanout vector $\mathbf{F}$ is equal to its spread vector $\mathbf{S}$.
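The distance operators of Definition 2.1.2 and the sufficient condition of Theorem 2.1.7 can be tried out by brute force on a small L-level banyan. The sketch below is illustrative only: the three-level network of 2-by-2 switches, the (level, index) vertex naming, and the helper names `closure`, `bd`, `ad`, `path`, and `theorem_2_1_7_holds` are assumptions made for the example.

```python
from itertools import product

L = 3
N = 1 << L
# Assumed example banyan: vertex (i, j) is vertex j of level i.
arcs = [((i - 1, j), (i, k))
        for i in range(1, L + 1)
        for j in range(N)
        for k in (j, j ^ (1 << (i - 1)))]

def closure(vs, forward):
    """Vertices reachable from the set vs (vs included) along the arcs."""
    step = {}
    for u, v in arcs:
        a, b = (u, v) if forward else (v, u)
        step.setdefault(a, set()).add(b)
    seen, stack = set(vs), list(vs)
    while stack:
        for w in step.get(stack.pop(), ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen

def above(vs):
    return closure(vs, True)

def below(vs):
    return closure(vs, False)

def level(v):
    return v[0]

def bd(SB1, SB2):
    """Base distance: lowest level holding a vertex above both arguments."""
    return min(level(v) for v in above(SB1) & above(SB2))

def ad(SA1, SA2):
    """Apex distance: L minus the highest level holding a vertex below both."""
    return L - max(level(v) for v in below(SA1) & below(SA2))

def path(b, a):
    """Vertices of the unique path from base b to apex a."""
    return above({b}) & below({a})

def theorem_2_1_7_holds():
    """Check: L < bd + ad implies the two single-call paths share no vertex."""
    bases = [(0, j) for j in range(N)]
    apexes = [(L, j) for j in range(N)]
    for b1, b2, a1, a2 in product(bases, bases, apexes, apexes):
        if L < bd({b1}, {b2}) + ad({a1}, {a2}):
            if path(b1, a1) & path(b2, a2):
                return False
    return True
```

The check covers only single calls on one assumed network, so it illustrates the theorem rather than proving it.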
Definition 2.2.3. A uniform banyan is said to be regular with fanout F and spread S iff for each I = 1~L, F = F̲[I] and S = S̲[I], where F̲ and S̲ are the fanout and spread vectors, respectively.

We will next show how the number of vertices in each level of a uniform banyan is related to the fanout and spread vectors.

Theorem 2.2.1. In a uniform banyan with fanout vector F̲ and spread vector S̲, the number of vertices N̲[I] in level I is given by

N̲[I] = ×/(I↑S̲),I↓F̲.

Proof. We will use finite induction. First, consider the case where I = 0. Let A be some apex. There are ×/F̲ choices of paths from various bases to A. By the definition of a banyan, there is one and only one path from each base to A. Therefore,

N̲[0] = ×/F̲ = ×/(0↑S̲),0↓F̲.

Next let I ∈ 1~L and suppose that N̲[I-1] = ×/((I-1)↑S̲),(I-1)↓F̲. Since there are S̲[I] arcs incident out from each vertex in level I-1, the total number of arcs from level I-1 to level I is N̲[I-1]×S̲[I], which equals ×/(I↑S̲),(I-1)↓F̲. Since F̲[I] of these arcs are incident into each vertex in level I,

N̲[I] = (×/(I↑S̲),(I-1)↓F̲)÷F̲[I] = ×/(I↑S̲),I↓F̲.

Q.E.D.

Corollary 2.2.1a.

N̲[0] = ×/F̲
N̲[L] = ×/S̲

Proof. Follows from Theorem 2.2.1 by substituting 0 and L respectively for I.
Corollary 2.2.1b. A rectangular banyan has the same number of vertices in each level.

Proof. In a rectangular banyan, the fanout vector F̲ equals the spread vector S̲; so by substitution into Theorem 2.2.1, we have

N̲[I] = ×/(I↑F̲),I↓F̲ = ×/F̲

for each level I. Q.E.D.

Corollary 2.2.1c. In a regular banyan with fanout F and spread S, the number of vertices in each level I is given by

N̲[I] = (S*I)×F*L-I,

the number of bases is

N̲[0] = F*L,

and the number of apexes is

N̲[L] = S*L.

Proof. Follows from Theorem 2.2.1 and Definition 2.2.3.

Theorem 2.2.2. Let G be a uniform banyan with L levels, with fanout vector F̲, and with spread vector S̲, and let V be a vertex in some level I of G. Let J be an integer. If 0 ≤ J < I, then there are exactly ×/J↓I↑F̲ vertices below V in level J of G. If I < J ≤ L, then there are exactly ×/(J-I)↑I↓S̲ vertices above V in level J of G.

Proof. For every 0 ≤ K ≤ L, let SV[K] be the set of all vertices in level K that are either above or below V. Thus, the vertices in SV[K] are below V iff K ≤ I and are above V iff I ≤ K.

First, consider the case where 0 ≤ J = I-1. Since G is an L-level banyan, there is exactly one arc incident into V from each vertex in SV[J]. Since G is uniform, there are F̲[I] such arcs. Thus, the number
of vertices in SV[J] is

F̲[I] = ×/(I-1)↓I↑F̲ = ×/J↓I↑F̲.

Thus, the theorem is true when 0 ≤ J = I-1.

Next, consider the case where 0 ≤ J ≤ I-2, and suppose that there are ×/(J+1)↓I↑F̲ vertices in SV[J+1]. It is apparent that SV[J] is precisely the set of initial vertices of the arcs incident into the vertices in SV[J+1]. Suppose that some vertex V1 in SV[J] were below two distinct vertices V2 and V3 in SV[J+1]. Then for some base B below V1 and for some apex A above V, there would exist two paths from B to A in G, one passing through V1, V2, and V and the other passing through V1, V3, and V. Since this is impossible in a banyan, we conclude that each vertex in SV[J] is below only one vertex in SV[J+1]. Therefore, there is exactly one vertex in SV[J] for each arc incident into a vertex in SV[J+1]. But each vertex in SV[J+1] has F̲[J+1] arcs incident into it, so the number of vertices in SV[J] is

F̲[J+1] × ×/(J+1)↓I↑F̲ = ×/J↓I↑F̲.

Thus, the theorem is also true when 0 ≤ J ≤ I-2.

Now consider the case where (I+1) = J ≤ L. Since G is an L-level banyan, there is exactly one arc incident out from V to each vertex in SV[J]. Since G is uniform, there are S̲[I+1] such arcs. Thus, the number of vertices in SV[J] is

S̲[I+1] = ×/1↑I↓S̲ = ×/(J-I)↑I↓S̲.
Thus, the theorem is true when (I+1) = J ≤ L.

Next, consider the case where (I+2) ≤ J ≤ L, and suppose that there are ×/((J-1)-I)↑I↓S̲ vertices in SV[J-1]. It is apparent that SV[J] is precisely the set of terminal vertices of the arcs incident out from the vertices in SV[J-1]. Suppose that some vertex V1 in SV[J] were above two distinct vertices V2 and V3 in SV[J-1]. Then for some base B below V and for some apex A above V1, there would exist two paths from B to A in G, one passing through V, V2, and V1 and the other through V, V3, and V1. Since this is impossible in a banyan, we conclude that each vertex in SV[J] is above only one vertex in SV[J-1]. Therefore, there is exactly one vertex in SV[J] for each arc incident out from a vertex in SV[J-1]. But each vertex in SV[J-1] has S̲[J] arcs incident out from it, so the number of vertices in SV[J] is

S̲[J] × ×/((J-1)-I)↑I↓S̲ = S̲[J] × ×/S̲[(I+1)~I+(J-1)-I]
 = S̲[J] × ×/S̲[(I+1)~J-1]
 = ×/S̲[(I+1)~J]
 = ×/(J-I)↑I↓S̲.

Thus, the theorem is also true when (I+2) ≤ J ≤ L. Q.E.D.

Corollary 2.2.2a. Let V be a vertex in level I of a uniform banyan G with L levels, with fanout vector F̲, and with spread vector S̲. There are exactly ×/I↑F̲ bases below V and exactly ×/I↓S̲ apexes above V.

Proof. Follows from Theorem 2.2.2 by substituting 0 and L respectively for J. Q.E.D.
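As an informal check on Theorems 2.2.1 and 2.2.2, the following Python sketch evaluates the level sizes and the above/below vertex counts directly from a fanout vector and a spread vector. The function names and the 0-based list layout (list element k holds vector entry k+1) are assumptions of this sketch, not notation from the text.

```python
from math import prod


def level_sizes(fanout, spread):
    """Vertices per level, N[I] = x/(I take S),(I drop F), for I = 0..L."""
    if len(fanout) != len(spread):
        raise ValueError("fanout and spread vectors must both have length L")
    levels = len(fanout)
    # The first I spreads and the last L-I fanouts are multiplied together.
    return [prod(spread[:i]) * prod(fanout[i:]) for i in range(levels + 1)]


def vertices_below(fanout, i, j):
    """x/J drop I take F: vertices in level j below one vertex in level i (j < i)."""
    return prod(fanout[j:i])


def vertices_above(spread, i, j):
    """x/(J-I) take I drop S: vertices in level j above one vertex in level i (i < j)."""
    return prod(spread[i:j])


if __name__ == "__main__":
    fanout, spread = [2, 3, 2], [3, 2, 4]
    sizes = level_sizes(fanout, spread)
    print(sizes)
    # Arcs leaving level I-1 must equal arcs entering level I.
    for i in range(1, len(fanout) + 1):
        assert sizes[i - 1] * spread[i - 1] == sizes[i] * fanout[i - 1]
```

With fanout vector 2 3 2 and spread vector 3 2 4, the level sizes are 12, 18, 12, and 24, and the number of arcs leaving each level equals the number entering the next.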
B.2.3 Cost Functions

The "cost" of a connecting network is often measured in terms of the number of contacts, or bidirectional switching devices, required. In the graph of a banyan network, each arc represents a contact (or set of contacts if parallel data paths are used); so the number of arcs is a measure of network "cost" in contacts. The number of arcs in a uniform banyan is a function of its fanout and spread vectors.

Theorem 2.3.1. In a uniform banyan with fanout vector F̲, spread vector S̲, and L levels, the number of arcs is given by

CM = +[I=1~L] F̲[I] × ×/(I↑S̲),I↓F̲.

Proof. For each I = 1~L, there are F̲[I] arcs incident into each vertex in level I, so that the total number of arcs is given by

CM = +[I=1~L] F̲[I] × N̲[I],

where N̲[I] is the number of vertices in level I. By Theorem 2.2.1,

CM = +[I=1~L] F̲[I] × ×/(I↑S̲),I↓F̲.

Q.E.D.

Corollary 2.3.1a. The number of arcs in a rectangular banyan with fanout/spread vector F̲ is given by

CM = (×/F̲) × +/F̲.

Proof. In a rectangular banyan F̲ = S̲. Therefore,

CM = +[I=1~L] F̲[I] × ×/(I↑F̲),I↓F̲
 = +[I=1~L] F̲[I] × ×/F̲
 = (×/F̲) × +[I=1~L] F̲[I]
 = (×/F̲) × +/F̲.

Q.E.D.
Corollary 2.3.1b. The number of arcs in a regular, rectangular banyan is given by

CM = (F*L)×L×F = F×N×F⍟N,

where F is the fanout, L is the number of levels, and N equals the number of bases F*L.

Proof. Follows immediately from Corollary 2.3.1a.

Corollary 2.3.1c. The number of arcs in a regular banyan is given by

CM = F × +[I=1~L] (S*I)×F*L-I,

where F is the fanout, S is the spread, and L is the number of levels.

Proof. By Theorem 2.3.1,

CM = +[I=1~L] F×(S*I)×F*L-I
 = F × +[I=1~L] (S*I)×F*L-I.

Q.E.D.

Lemma 2.3.2. Let X be a real number other than 1, and let L be a positive integer. Then

(+[I=1~L] X*I) = X×((X*L)-1)÷X-1.

Proof. Let

Z = +[I=1~L] X*I.

Then

(Z+1) = +[I=0~L] X*I          (1)
(X×Z+1) = +[I=1~L+1] X*I.     (2)

Subtracting (1) from (2) yields

((X×Z+1)-Z) = X*L+1
((X×Z)-Z) = (X*L+1)-X
(Z×X-1) = (X*L+1)-X
Since X ≠ 1,

Z = ((X*L+1)-X)÷X-1 = X×((X*L)-1)÷X-1.

Q.E.D.

Corollary 2.3.1d. The number of arcs in a regular, nonrectangular banyan is given by

CM = F×S×(N̲[L]-N̲[0])÷S-F,

where F is the fanout, S is the spread, N̲[0] is the number of bases F*L, and N̲[L] is the number of apexes S*L.

Proof. Let

K = (⍟S)÷⍟F.

Then,

(⍟S) = K×⍟F
S = F*K.          (1)

By Corollary 2.3.1c,

CM = F × +[I=1~L] (S*I)×F*L-I.          (2)

Substituting (1) into (2) yields

CM = F × +[I=1~L] (F*K×I)×F*L-I
 = F × +[I=1~L] F*(K×I)+L-I
 = (F*1+L) × +[I=1~L] F*(K×I)-I
 = (F*1+L) × +[I=1~L] (F*K-1)*I.          (3)

Since the banyan is not rectangular,

S ≠ F
(⍟S) ≠ ⍟F
K ≠ 1
(F*K-1) ≠ 1.

Thus, by Lemma 2.3.2,

(+[I=1~L] (F*K-1)*I) = (F*K-1) × ((F*L×K-1)-1)÷(F*K-1)-1.          (4)

Substituting (4) into (3) and multiplying the numerator and denominator by F yields

CM = (F*1+L) × (F*K-1) × ((F*L×K-1)-1)÷(F*K-1)-1
 = (F*L)×(F*K)×((F*L×K-1)-1)÷(F*K-1)-1
 = F×(F*K)×((F*L×K)-F*L)÷(F*K)-F.          (5)

From (1) and Corollary 2.2.1c,

(F*L) = N̲[0]
(F*L×K) = S*L = N̲[L].          (6)

Substituting (1) and (6) into (5) yields

CM = F×S×(N̲[L]-N̲[0])÷S-F.

Q.E.D.
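The arc counts of this section are easy to cross-check numerically. The sketch below sums the arcs level by level as in Theorem 2.3.1 and compares the result, for regular banyans, with the closed form obtained by applying Lemma 2.3.2 with X = S÷F to Corollary 2.3.1c, namely F×S×((S*L)-F*L)÷S-F, and with Corollary 2.3.1b in the rectangular case. The function names and the 0-based list layout are assumptions of this sketch.

```python
from math import prod


def arcs_uniform(fanout, spread):
    """Arc count of a uniform banyan, summed level by level (Theorem 2.3.1)."""
    levels = len(fanout)
    return sum(
        fanout[i - 1] * prod(spread[:i]) * prod(fanout[i:])  # F[I] x N[I]
        for i in range(1, levels + 1)
    )


def arcs_regular(f, s, levels):
    """Closed-form arc count of a regular banyan with fanout f and spread s."""
    if f == s:
        # Rectangular case (Corollary 2.3.1b): (F*L) x L x F.
        return levels * f ** (levels + 1)
    # Nonrectangular case: geometric series in S/F; the division is exact
    # because S - F always divides S**L - F**L.
    return f * s * (s ** levels - f ** levels) // (s - f)


if __name__ == "__main__":
    for f, s, levels in [(2, 3, 2), (3, 2, 2), (2, 2, 3), (4, 2, 3)]:
        brute = arcs_uniform([f] * levels, [s] * levels)
        assert brute == arcs_regular(f, s, levels)
        print(f, s, levels, brute)
```

For a single crossbar (one level) the closed form reduces to F×S, one arc per base-apex pair, which is a convenient sanity check.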
B.2.4 Optimum Fanouts

Theorems 2.4.1 and 2.4.2 show how the fanout-spread parameter of a regular, rectangular banyan can be selected to minimize either the cost or the cost-delay product discussed in Section 9.

Theorem 2.4.1. Let F be the fanout of a regular, rectangular banyan with N bases, and let I be a positive integer. Any cost or cost-performance function of the form

CM[I] = C×F×(F⍟N)*I

is minimized with respect to F when

F = e*I,

where e is the natural logarithm base, 2.71828..., and where C is a positive constant.

Proof. To minimize CM[I] with respect to F, we set its partial derivative with respect to F equal to zero and solve for F.

0 = ∂/∂F C×F×(F⍟N)*I
 = ∂/∂F C×F×((⍟N)÷⍟F)*I
 = C×((⍟N)*I) × ∂/∂F F÷(⍟F)*I

0 = ∂/∂F F÷(⍟F)*I
 = (((⍟F)*I) - F × ∂/∂F (⍟F)*I)÷(⍟F)*I×2

0 = ((⍟F)*I) - F × ∂/∂F (⍟F)*I
 = ((⍟F)*I) - F×I×((⍟F)*I-1) × ∂/∂F ⍟F
 = ((⍟F)*I) - F×I×((⍟F)*I-1)÷F

0 = (⍟F)-I
(⍟F) = I
F = e*I

Q.E.D.
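Theorem 2.4.1 can be illustrated numerically. The sketch below evaluates the cost function C×F×(F⍟N)*I at the continuous optimum e*I and at nearby integer fanouts. The function names, the choice N = 4096, and the candidate range 2 through 32 are assumptions made only for this illustration.

```python
import math


def cost(fanout, n_bases, power, c=1.0):
    """Cost function C x F x (log base F of N) ** I from Theorem 2.4.1."""
    return c * fanout * math.log(n_bases, fanout) ** power


def best_integer_fanout(n_bases, power, candidates=range(2, 33)):
    """Integer fanout in `candidates` with the smallest cost."""
    return min(candidates, key=lambda f: cost(f, n_bases, power))


if __name__ == "__main__":
    n = 4096  # illustrative network size only
    for power in (1, 2):
        optimum = math.e ** power
        print(power, optimum, cost(optimum, n, power),
              best_integer_fanout(n, power))
```

Because N enters only through the factor (⍟N)*I, the location of the minimum does not depend on the network size; changing `n` rescales every cost by the same amount.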
Theorem 2.4.2. Let CM[I] be defined as in Theorem 2.4.1. When F is restricted to positive integer values, CM[1] is minimized with respect to F when F = 3, and CM[2] is minimized with respect to F when F = 7. Further, the value of CM[1] is the same for F = 2 as for F = 4.

Proof.

CM[I] = C×F×(F⍟N)*I
 = C×F×((⍟N)÷⍟F)*I
 = (C×(⍟N)*I) × F÷(⍟F)*I.

In Theorem 2.4.1, CM[1] was found to have a single minimum at F = e, which lies between integers 2 and 3. Consequently, the optimal integer value for F is either 2 or 3. But,

((C×(⍟N)*1) × 2÷(⍟2)*1) = (C×⍟N) × 2.885,

which is greater than

((C×(⍟N)*1) × 3÷(⍟3)*1) = (C×⍟N) × 2.731.

Therefore, CM[1] is minimized with respect to F when F = 3.

Similarly, CM[2] has a single minimum at F = e*2, which lies between integers 7 and 8. The optimal integer value for F is then either 7 or 8. But,

((C×(⍟N)*2) × 7÷(⍟7)*2) = (C×(⍟N)*2) × 1.8486,

which is less than

((C×(⍟N)*2) × 8÷(⍟8)*2) = (C×(⍟N)*2) × 1.8501.

Therefore, CM[2] is minimized with respect to F when F = 7. (Note, however, that CM[2] is increased only by about 0.08% when F = 8.)

Finally, we show that CM[1] has the same value when F = 2 as when F = 4. When F = 2,

CM[1] = (C×(⍟N)) × 2÷⍟2.

When F = 4,
B.3 SW Banyans

The class of L-level banyans called SW banyans, discussed in Section 6, will be defined and analyzed formally in this section.

B.3.1 Synthesis

SW banyans are synthesized recursively from crossbars using the synthesis principle discussed in Sections 4.3 and B.1.2. In this section, we will define crossbars, define SW banyans recursively in terms of crossbars, prove that SW banyans are indeed banyans, and then analyze relationships between a synthesized SW banyan and its interconnection graph and component banyans.

Definition 3.1.1. A crossbar is a 1-level banyan.

Theorem 3.1.1. Every crossbar is regular. A crossbar with F bases and S apexes has fanout F and spread S.

Proof. Let C be a crossbar with F bases and S apexes. By Definition 3.1.1, C is a 1-level banyan. From each vertex in level 0 (i.e., from each base), there must exist exactly one path, and hence one arc, to each vertex in level 1 (i.e., to each apex). Since there are S apexes, there are S arcs incident out from each vertex in level 0. Similarly, to each apex, there must exist one arc from each base; so there are F arcs incident into each vertex in level 1. C is therefore uniform, and since L = 1, it is also regular with fanout F and spread S. Q.E.D.

Definition 3.1.2. A graph G is called an SW banyan iff either
1) G is a crossbar or
2) G is a synthesized graph whose component banyans are all crossbars and whose interconnection graph is an SW banyan.

We call SW banyans formed in this second manner synthesized SW banyans, and we call their component banyans component crossbars.

From this definition, it is apparent that if every crossbar has a property P and if P(G') implies P(G) for every synthesized SW banyan G with interconnection graph G', then P is true of every SW banyan. This induction principle will be used in proving many of the following theorems.

Theorem 3.1.2. Every SW banyan is a banyan.

Proof. Every crossbar is a banyan by Definition 3.1.1. Now let G be a synthesized SW banyan with interconnection graph G', and suppose that G' is a banyan. Then, by Theorem 1.2.3, G is a banyan. Q.E.D.

Theorem 3.1.3. Every SW banyan is an L-level banyan. Further, if an SW banyan G' with L' levels is the interconnection graph of a synthesized SW banyan G with L levels and if 0 ≤ I ≤ L', then
1) L = L'+1,
2) the bases of the component crossbars in level I of G' are the vertices of level I of G, and
3) the apexes of the component crossbars in level I of G' are the vertices of level I+1 of G.

Proof. Every crossbar is a 1-level banyan by definition. Now suppose that G is a synthesized SW banyan with interconnection graph G'. Suppose further that G' is an L'-level banyan. For each I = 0~L', let SA[I] be the set of all apexes and let SB[I] be the set
of all bases of the component crossbars in level I of G'. Thus, SB[I] = SA[I-1] for all I = 1~L', and the vertices of G can be partitioned into L'+2 subsets as follows.

SV[0] = SB[0]
SV[I] = SB[I] = SA[I-1]     (I = 1~L')
SV[L'+1] = SA[L']

Since the arcs of G are those of its component crossbars, each arc in G is from some vertex in SV[I] to some vertex in SV[I+1] for some 0 ≤ I ≤ L'. Thus, G is an (L'+1)-level banyan and SV[0], ..., SV[L'+1] as defined above are its levels. Q.E.D.

Corollary 3.1.3a. Let G be a synthesized SW banyan with interconnection graph G', and let V be either a base or an intermediate of G. Then there exists a unique component crossbar V', in the level of G' whose number equals the level number of V in G, such that V is a base of V'.

Proof. By part 2 of Theorem 3.1.3, there exists a component crossbar V' in that level of G' such that V is a base of V'. There cannot exist more than one such component crossbar, since by part 2 of Definition 1.2.1, V cannot be a base of more than one component banyan. Q.E.D.

Corollary 3.1.3b. Let G be a synthesized SW banyan with interconnection graph G', and let V be either an apex or an intermediate of G. Then there exists a unique component crossbar V', in the level of G' whose number is one less than the level number of V in G, such that V is an apex of V'.

Proof. By part 3 of Theorem 3.1.3, there exists a component crossbar V' in that level of G' such that V is an apex of V'. There cannot exist
more than one such component crossbar, since by part 3 of Definition 1.2.1, V cannot be an apex of more than one component banyan. Q.E.D.

Theorem 3.1.4. Let G be a synthesized SW banyan, and suppose that its interconnection graph G' is uniform with L' levels, fanout vector F̲', and spread vector S̲'. Suppose also that each apex of G' has A apexes and each base of G' has B bases. Then G is uniform with fanout vector B,F̲' and spread vector S̲',A.

Proof. By Theorem 3.1.3, each vertex in level 1 of G is an apex of a base of G'. But each base of G' is a crossbar with B bases, and hence, has fanout B by Theorem 3.1.1. Therefore, each vertex in level 1 of G has B arcs incident into it.

Now let I be a level number such that 2 ≤ I ≤ L, where L is the number of levels in G. By Theorem 3.1.3, each vertex in level I of G is an apex of a component crossbar in level I-1 of G'. But each component crossbar in level I-1 of G' has F̲'[I-1] bases by Theorem 1.2.1, and hence, has fanout F̲'[I-1] by Theorem 3.1.1. Therefore, each vertex in level I of G has F̲'[I-1] arcs incident into it.

Similarly, by Theorem 3.1.3, each vertex in level L-1 of G is a base of an apex of G'. But each apex of G' is a crossbar with A apexes, and hence, has spread A by Theorem 3.1.1. Therefore, each vertex in level L-1 of G has A arcs incident out from it.

Now let J be a level number such that 0 ≤ J ≤ L-2. By Theorem 3.1.3, each vertex in level J of G is a base of a component crossbar in level J of G'. But each component crossbar in level J of G' has S̲'[J+1] apexes by
Theorem 1.2.1, and hence, has spread S̲'[J+1] by Theorem 3.1.1. Therefore, each vertex in level J of G has S̲'[J+1] arcs incident out from it.

Thus, G is uniform with fanout vector B,F̲' and spread vector S̲',A. Q.E.D.

Theorem 3.1.5. If a synthesized SW banyan is uniform with fanout vector F̲ and spread vector S̲, then its interconnection graph is uniform with fanout vector 1↓F̲ and spread vector (-1)↓S̲.

Proof. Let G be a uniform synthesized SW banyan with fanout vector F̲, spread vector S̲, and L levels. Let G' be the interconnection graph of G. By Theorem 3.1.3, G' has L-1 levels.

Let V' be any component crossbar in level I of G' such that 1 ≤ I ≤ L-1. By Theorem 3.1.3, the apexes of V' are in level I+1 of G. Thus, the fanout of V' is F̲[I+1]. By Theorem 3.1.1, V' has F̲[I+1] bases; so by Theorem 1.2.1, V' has F̲[I+1] arcs incident into it in G'.

Similarly, let W' be any component crossbar in level J of G' such that 0 ≤ J ≤ L-2. By Theorem 3.1.3, the bases of W' are in level J of G. Thus, the spread of W' is S̲[J+1]. By Theorem 3.1.1, W' has S̲[J+1] apexes; so by Theorem 1.2.1, W' has S̲[J+1] arcs incident out from it in G'.

Therefore, G' is uniform with fanout vector 1↓F̲ and spread vector (-1)↓S̲. Q.E.D.
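The vector bookkeeping in Theorems 3.1.4 and 3.1.5 can be summarized in a few lines of Python. The sketch below only manipulates fanout and spread vectors; it does not build any graph. The function names are hypothetical, and the lists are 0-based copies of the vectors in the text.

```python
def synthesized_vectors(fanout_inner, spread_inner, base_size, apex_size):
    """Vectors of G from those of its interconnection graph G' (Theorem 3.1.4).

    base_size is B, the number of bases of each base crossbar of G';
    apex_size is A, the number of apexes of each apex crossbar of G'.
    """
    fanout = [base_size] + list(fanout_inner)   # B catenated with F'
    spread = list(spread_inner) + [apex_size]   # S' catenated with A
    return fanout, spread


def interconnection_vectors(fanout, spread):
    """Vectors of G' from those of a synthesized G (Theorem 3.1.5)."""
    if len(fanout) < 2 or len(fanout) != len(spread):
        raise ValueError("a synthesized SW banyan needs at least two levels")
    return list(fanout[1:]), list(spread[:-1])  # 1 drop F, (-1) drop S


if __name__ == "__main__":
    inner = ([3, 2], [2, 5])
    outer = synthesized_vectors(*inner, base_size=4, apex_size=6)
    print(outer)
    # Going back to the interconnection graph recovers the original vectors.
    assert interconnection_vectors(*outer) == inner
```

The round trip shows that the two theorems are inverse statements: synthesis prepends B to the fanout vector and appends A to the spread vector, and passing to the interconnection graph strips them again.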
183 Theorem 3.1.6. Let B1 and B2 be two distinct bases of a synthesized SW banyan G with interconnection graph G'. Let B1' and B2' be the bases of G' that contain B1 and B2 respectively as bases. Then (B1'W2') = (B1W2)1. Proof. Let I= B1~2. Then there exists a vertex Vin level I of G such that B1 V and B2 V. Let V' be the component crossbar in level I1 of G' that contains Vas an apex. By Corollary l.2.2a, B1' V' and B2' V'. Therefore, (B1'W2') s I1. Now let J = B1'W2'. Then there exists a component crossbar W' in level J of G' such that B1' W' and B2' W'. Let W be an apex of W'. By Theorem 3.1.3, Wis in level J+1 of G. By Corollary l.2.2a, B1 W and B2 W. Therefore, (B1W2) s J+1 and hence, (B1'~B2') (B1W2)1. Thus, (B1'itJB2') = (B1W2)1. Q.E D. Theorem 3.1.7. Let A1 and A2 be two distinct apexes of a synthesized SW banyan G with interconnection graph G'. Let A1' and A2' be the apexes of G' that contain A1 and A2 respectively as apexes. Then (A1' !i:M 2') = (A1!}].42)1. Proof. Let i = A12 and let L be the number of levels in G. Then there exists a vertex Vin level LI of G such that V A1 and V A2. Let V' be the component crossbar in level LI of G' that contains V as a base. By Corollary l.2.2a, V' A1' and V' A2'. Therefore, ( A 1' ~ 2') s L'(LI) where L' is the number of levels in G'. But by Theor e m 3.1.3, L' = L1. Thus, (A1'~2') s (L1)(LI) (A1 ~ 2') s J1.
184 Now let J = A1'~2'. Then there exists a component crossbar W' in level L' J of G' such that W' El A1' and W' A2'. Let W be a base of W'. By Theorem 3.1.3, Wis in level L'J of G. By Corollary 1.2.2a, W A1 and W A2. Therefore, (A12) $ L(L'J) (.41~,.42) $ L( (L1 )J) (A12) $ 1+J (A1 '2') (A1~2)1 Thus, (A1'2') = (A1tM2)1. Q.E.D.
185 B 3. 2 Distance Properties In this section, useful and mathematically interesting properties of an SW banyan's base and apex distance functions will be derived formally. The properties derived here include those discussed in Section6.2. Theorem 3.2.1. Let B1 and B2 be any two bases of an SW banyan G, and let V be a vertex in some level I of G. If (B1W2) :,; I then B1 1:9 V iff B2 V. Proof. Let L be the number of levels in G and suppose that (B1W2):,; I. Suppose first that G is a crossbar. If B1 = B2, then obviously B1 El V iff B2 El V. If B1 .tc B2, then V is an apex of G, in which case B1 V and B2 V. Thus, in either case, B1 1:9 V iff B2 E:l V. Suppose next that G is a synthesized SW banyan with interconnection graph G', and suppose that Theorem 3. 2.1 holds for G'. If I=O, then B1 = B2, in which case B1 ~Vis equivalent to B2 V by substitution. Now suppose I > O, and let V' be the component crossbar in level I1 of G' which contains Vas an apex. Also, let B1' and B2' be the bases of G' which contain B1 and B2 respectively as bases~ Since (B1W2):,; I, it follows by Theorem 3.1.6 that (B1'W2'):,; I1. Thus, by the supposition that Theorem 3.2.1 holds for G', B1' E:l V' iff B2' E:l V'. But by Corollary 1.2.2a, B1 V iff B1 1 @ V' and B2' E:l V' iff B2 1:9 V. Therefore, B1 V iff B2 8:l V. Thus, if (B1W2):,; I, then B1 1:9 V iff B2 V. Q.E~D. Theorem 3.2.2. Let A1 and A2 be any two apexes of an SW banyan G with L levels, and let V be a vertex in some level I ofG. If (A1 ra42):,; L1 then V E;l A1 iff V l$l A2.
186 Proof. Suppose that (A1~2) LI. Suppose first that G is a crossbar. If A1 = A2, then obviously V Bl A1 iff V Bl A2. If A1 A2, then V is a base of G, in which case V A1 and V fil A2. Thus, in either case, V A1 iff V A2. Suppose next that G is a synthesized SW banyan with interconnection graph G', and suppose that Theorem 3.2.2 holds for G 1 If I= o, 'then Al = A2, in which case V E;l A1 is equivalent to V A2 by substitution. Now suppose I> O, and let V 1 be the component crossbar in level I of G 1 which contains Vas a base. Also, let A1 1 and A2' be the apexes of G' which contain A1 and A2 respectively as apexes. Since (A1ra42) LI, it follows by Theorem 3.1.7 that (A1 1 liil,42 1 ) (LI)1 (A1'lii1A2 1 ) (L1)1. But L1 is the number of levels in G'. Thus, by the supposition that Theorem 3.2.2 holds for G', V'A.1 1 iff V'.4.2'. But by Corollary 1.2.2a, ~1 iff V 1 8l,4.1 1 and V'ls4.2 1 iff ~2. Therefore, ~1 iff V.4.2. Thus, if (A1&lA.2) LI, then V E;J A1 iff V A2. Q.E.D. Definition 3.2.1. For any real number X we use ~X to denote the relational operation defined by B1~xB2 (B1W2)~. where B1 and B2 are arbitrary bases of an SW banyan. Definition 3.2.2. For any real number X we use ~X to denote the relational operation defined by A1~/2 ++ (A1~2) ~X, where A1 and A2 are arbitrary bases of an SW banyan.
187 Theorem 3.2.3. For any real number X, the relation ~Xis both transitive and symmetric. Proof. Let B1, B2, and B3 be bases of an SW banyan G with L levels, and suppose that B1 ~X B2 and B2 ~X B3. Then there exists an integer I such that (B1l61B2) $; I, (B2~B3) $ I, I $ X, and I $ L. Since (B1W2) $ I, it follows that I~ X and I~ L. Since (B1~B2) I, there exists a vertex Vin level I of G such that B1 V and B2 V. By Theorem 3.2.1, B3 E:l V. Since B1 V and B3 E:l V, it follows that (B1~3) I. But I ~ X, so B1 ltlx B2. Thus, ~Xis transitive. Symmetry of ~X follows from Definition 3.2.1 and Theorem 2.1.1. Q.E.D. Corollary 3.2 3a. If Xis a non~negative real number, then ~Xis an equivalence relation. Proof. Let B be a base. By Theorem 2.1.3, 0 = m:;JB. Since Xis non negative, (B!CilB) X. Thus, B ~X B, and hence, ~Xis reflexive. ~Xis transitive and symmetric by Theorem 3.2.3. Therefore, ~Xis an equivalence relation. Q E.D. Corallary 3.2~3b. The base distance operation~ of an SW banyan G is a metric on the bases of G. Proof. Let B1, B2, and B3 be bases of G. By Theorem 2.1.2 0 B1W2 By Theorem 2.1.3, o = B1W2 iff B1 = B2. By Theorem 2.1.1, ( B 1W2) = B2 W 1. Now let X = (B1W2)r(B2W3). Thus, B1 ~X B2 and B2 ~X B 3 By Theorem 3.2.3, (B10B3) $ X. But since base distances are nonnegative, X $ (B1~B2)+(B2~B3). Therefore, (B1W3) (B1~B2)+(B 2 ~B3). Thus,~ is a metric on the bases of G. Q.E.D.
188 Theorem 3.2.4. For any real number X, the relation ~Xis both transitive and symmetric. Proof. Let A1, A2, and A3 be apexes of an SW banyan G with L levels, and suppose that A1 ~X A2 and A2 ~X A3. Then there exists an integer I such that (A11YlA2) $ I, (A2ra43) $ I, I$ X, and O $ LI. Since (A1~3) $ I, there exists a vertex V in level I of G such that V A1 and V 8) A2. By Theorem 3.2.2, V A3. Since V A1 and V A3, it follows that (A11YlA3) $ I. But I$ X, so A1 ~X A2. Thus, ~Xis transitive. Symmetry of &Ix follows from Definition 3. 2. 2 and Theorem 2.1.1. Q.E.D. Corollary 3.2.4a. If Xis a nonnegative real number, then MX is an equivalence relation. Proof. Let A be an apex. By Theorem 2.1.3, O = AIYJA. Since Xis non negative, (A5M) $ X. Thus, A ~X A, and hence, ~Xis reflexive. ~X is transitive and symmetric by Theorem 3.2.4. Therefore, ~Xis an equivalence relation. Q.E.D. Corollary 3.2.4b. The apex distance operation~ of an SW banyan G is a metric on the apexes of G. Proof. Let A1, A2, and A3 be apexes of G. By Theorem 2.1.2, 0 $ A1f'M.2. By Theorem2.1.3, o" = A11YlA2 iff A1 = A2. By Theorem 2.1.2, (A1~2) = A2li2A1. Now let X = (A11YlA2)f(A2~3). Thus, A1 ~X A2 and A2 ~X A3. By Theorem 3.2.4, (A11YlA2) $ x. But since apex distances are nonnegative, X (A1~2)+(A21YlA3). Therefore, (A1f':M2) (A1~2 )+(A2~3) Thus,~ is a metric on the bases of G. Q.E.D.
189 Theorem 3. 2. 5. Let X and Y be real numbers such that O :,; X :,; Y. Then the equivalence classes of relation ~X are subsets of those of relation Proof. Let SBX be one of the equivalence classes determined by the relation ltilX, and let SEY be an equivalence class of the relation ~y such that .t SBXn8BY. Let BC E SBXnSBY and let BX E SBX. Since both BC and BX are in SBX, (BC\6lBX):,; X. But X:,; Y, so (B~X):,; Y. Since BCE SEY, this implies that BX E SBY. Therefore, SEX~ SBY. Q.E.D. Theorem 3.2.6. Let X and Y be real numbers such that O:,; X:,; Y. Then the equivalence classes of the relation ~X are subsets of those of relation ltily. Proof. Let SAX be one of the equivalence classes determined by the relation ~x, and let SAY be an equivalence class of the relation ~y such that .t SAXnSAY. Let AC E SAXnSAY and let AX E SAX. Since both AC and AX are in SAX, (AX) :,; X. But X :,; Y, so (ACIMX) :,; Y. Since AC E SAY, this implies that AX E SAY. Therefore, SAX SAY. Q.E.D. Theorem 3.2.7. Let G be a uniform SW banyan with L levels and fanout vectorF, and let I be an integer such that O $I$ L. The relation ltilI partitions the bases of G into x/I+E equivalence classes containing x/ItE bases each. Proof. Let SB1 be one of the equivalence classes of the relation ~I. Also, let V be a vertex in level I such that Vis above some base in SB1. and let SB2 be the set of all bases below V. By Corollary 2.2.2a, SB2 contains x/ItE elements. By Theorem 3.2.1, every base in S B 1 is
190 below V, so SB1 SB2. Now let B2 be an arbitrary element of SB2 and let Bl E SB1. Both B1 and B2 are elements of SB2, and hence, are below V. Therefore, (B1W2) I. Since B1 E SB1 this implies that B2 E SB1. Thus, SB2 SB1, SB1 = SB2, and SB1 contains x/Itf.. elements. Therefore, each equivalence class of ~I contains x /Itf.. bases. By Corollary 2. 2. la, there are x /f.. bases total. Since the equiva lence classes of ~I form a partitioning of these bases, the number of equivalence classes is Q.E~D. (x/f..)f x/I+E = x/I+f... Corollary 3.2.7a. Let G be a uniform SW banyan with L levels and fanout vector E, and let I be an integer such that 1 L. Then the rela tion ~Il has E[I] equivalence classes that are subsets of any given equivalence class of ~I. Proof. Let SB be an equivalence class of ~ 1 By Theorem 3.2.7, SB contains x/I+E bases. It is apparent from Theorem 3.2.5 that the equivalence classes of ~I~i that are subsets of SB form a partitioning of SB. But by Theorem 3.2.7, each of these subsets has x/(I1)tE elements. Therefore, the number of equivalence classes of ~Il that are subsets of SB is ( x /Itf._)f x/(I1)tf.. = E[I]. Q.E.D. Theorem 3.2.8. Let G be a uniform SW banyan with L levels and fanout vectorf., and let I be an integer such that O L. The r elation ~ I partitions the apexes of G into x /(I)+~ equivalence class es containing
191 x /(I)t apexes each. Proof. Let SA1 be one of the equivalence classes of the 1 re ation IYJ r Also let V be a vertex in level LI such that Vis below some base in SA1, and let SA2 be the set of all apexes above V. By Corollary 2.2.2a, the number of elements in SA2 is x /(LI)+ = x/(I)t. By Theorem 3.2.2, every apex in SA1 is above V, so SA1 SA2. Now let A2 be an arbitrary element of SA2 and let A1 E SA1. Both A1 and A 2 are elements of SA2 and, hence, are above V. Therefore, (A1fQl.4.2) (L( LI)) = I. Since A1 E SA1, this implies that A 2 E SA1. Thus, SA 2 SA1, SA 1 = SA2, and SA1 contains x/(I) t elements. Therefore, each equivalence class of ~ I contains x/(I)t apexes. By Corollary 2 2.la, there are x /Q ape x es total. Since the equivalence classes of ~I form a partitioning of these apexes, the number of equivalence classes is Q.E.D. ( x /Q)+ x /(I)t5_ = x /( I)+S.. Corollary 3.2~8a. Let G be a uniform SW banyan with L levels, with fanout vector E, and with spread vector 5; and let I be an int eger such that 1 L. Then the relation ~Ii has 5_[ L(I1)] equ iva l en c e c lasses that are subsets of any given equivalence class of ~ I P roof. Let SA be an equivalence class of ~I By Theorem 3.2 8, SA c ontains x/(~I)t apexes. It is apparent from Theor e m 3.2.6 th at t h e e quivalence classes of ~ I ~ i that are subsets of SA form a p a rt ition ing
192 of SA. But by Theorem 3.2.8, each of these subsets has x/(~(I1))tQ elements. Therefore, the number of equivalence classes of ~Il that are subsets of SA is Q.E.D. (x/(I)tQ)x/((I1))tQ = Q[(L(I1))~L]~Q[(L(I2))~L] = Q[L(I1)].
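The class counts in Theorems 3.2.7 and 3.2.8 and their corollaries reduce to products over slices of the fanout and spread vectors: bases within distance I of one another fall into ×/I↓F classes of ×/I↑F bases each, and apexes within distance I fall into ×/(-I)↓S classes of ×/(-I)↑S apexes each. The following sketch computes these counts; the function names and the 0-based list layout are assumptions of the sketch.

```python
from math import prod


def base_classes(fanout, i):
    """(number of classes, bases per class) for base distance at most i."""
    return prod(fanout[i:]), prod(fanout[:i])


def apex_classes(spread, i):
    """(number of classes, apexes per class) for apex distance at most i."""
    levels = len(spread)
    # (-I) drop S keeps the first L-I entries; (-I) take S keeps the last I.
    return prod(spread[:levels - i]), prod(spread[levels - i:])


if __name__ == "__main__":
    fanout, spread = [2, 3, 2], [3, 2, 4]
    for i in range(len(fanout) + 1):
        print(i, base_classes(fanout, i), apex_classes(spread, i))
    # Each class at distance I splits into F[I] classes at distance I-1
    # (bases) or S[L-(I-1)] classes (apexes), as in the two corollaries.
    for i in range(1, len(fanout) + 1):
        assert base_classes(fanout, i)[1] == fanout[i - 1] * base_classes(fanout, i - 1)[1]
        assert apex_classes(spread, i)[1] == spread[len(spread) - i] * apex_classes(spread, i - 1)[1]
```

In every case the number of classes times the class size equals the total number of bases ×/F or apexes ×/S, which is the partitioning argument used in the proofs.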
193 B.3.3 Connectability The next theorem is a strengthened version of Theorem 2.1.7 appli cable to SW banyans. Theorem 3.3.1. Let SA1 and SA2 be sets of apexes and let SB1 and SB2 be subsystems of an SW banyan with L levels. Let SC1 = SB1:E1SA1 and let SC2 = SB2)!1SA2. Then the callsets SC1 and SC2 conflict with each other iff L 2 (SB1WB2)+SA1~A2. Proof. If L < (SB1~SB2)+SA1~A2 then SC1 does not conflict with SC2 by Theorem 7. Now suppose that L 2 (SB1WB2)+SA1~A2. Then there exists a level number I such that (SB1WB2) L(SA1~A2). Let B1, B2, A1, and A2 be elements of SB1, SB2, SA1, and SA2 respectively such that (B1W2) = SB10SB2 and (A1~2) = SA1~B2. Thus, (B1~B2) I and (A1~2) LI. Let C1 be the call from B1 to A1 and let C2 be the call from B2 to A2. There exists a vertex Vin level I such that V lies along the path from Bl to A1. Thus, B1 V and V A1. But by Theorems 3.2.1 and 3.2.2 respectively, B2 V and V EM.2, implying that V lies along the path from B2 to A2. Since Vis connnon to both paths, C1 conflicts with C2. But C1 E CS1 and C2 E CS2, so SC1 conflicts with SC2. Q.E.D.
B.4 CC Banyans

The class of rectangular banyans called CC banyans, discussed in Section 7, will be defined and analyzed formally in this section.

B.4.1 Structure

In this section, we will define CC banyans, characterize the existence of paths in them, and show that they are indeed rectangular banyans.

Definition 4.1.1. Let L be a positive integer, let S̲[1~L] be a vector of positive integers, and let N = ×/S̲. Also let V̲[0~L;0~N-1] be the vertices of a graph G such that for any two vertices V̲[I1;J1] and V̲[I2;J2], there exists an arc from V̲[I1;J1] to V̲[I2;J2] iff both I2 = I1+1 and J2 = N|J1+M×(×/I1↑S̲) for some M = 0~S̲[I2]-1. Any such graph G is called a CC banyan.

Throughout this section, it will be understood that G, L, S̲, N, and V̲ are as defined above. The particular CC banyan to which these quantities refer will be clear from context, because we will not deal with more than one CC banyan at a time. It will also be understood that B̲[0~N-1] = V̲[0;0~N-1] and A̲[0~N-1] = V̲[L;0~N-1].

The modulo N arithmetic operators defined below will be used frequently in this section.

X⊕Y ↔ N|X+Y
⊖X ↔ N|-X
X⊖Y ↔ X⊕⊖Y

Lemma 4.1.1. Every CC banyan is the graph of a partial order.
195 Proof. Since each arc of G is from some vertex f[I;Jl] to a vertex f[I+1;J2], G cannot contain a circuit and, hence, is the graph of a partial order. Q.E.D. Theorem 4.1.2. Let f[I1;J1l and E[I2;J2] be arbitrary vertices of a CC banyan G. If Ii < I2 and (J2eJ1) < x/I2tfi and O = (x/I1tfi)1J2J1, then G contains exactly one path from f[I1;J1] to f[I2;J2]. Otherwise, G contains no path from I[I1;J1] to f[I2;J2]. Proof. First suppose that I1 < I2. Then it is apparent from Definition 4.1.1 that any path from f[I1;J1] to f[I2;J2] must contain exactly I2I1 arcs. It is also apparent that any (I2I1)arc path from f[I1;J1] must pass through the vertex sequence f[Il ;J1], f[I1+1;J1e(x/I1tfi)xM_[1]], f[I1+2;J1e(x/I1tfi)xM_[1]eM_[2]xfi[I1+1]], K[I1+3 ;J1$( x/ I1 tfi) xM_[ 1 ]$(M_[ 2] xfi[I1+1] )ef:1_[3 ] xfi [I1+1 ] xfi [I1+2]], ... f[I1+K;J1e(x/I1+fi)x(I1+!i)~K+M_J, ... E[I2;J1e( x / I1 tfi)x (IHfih(I2I1 HM.J for some vector of integers M_[1~I2I1] where for each K, O :;; M. [K] < fi[Il+K]. Since ( (I2I1) tM.) = M, the last vertex in this sequence is simply ,E[I2;J1e(x/I1tfi)x(I1+fi)~]. But since O:;; M_[K] < fi[I1+K], there are x/I1+I2tfi possible values of M., and the possible values of (I1+ fi).tM are o ~ (x/I1+I2+!i)1. Thus, there are x /I1+I2+!i paths of length I2I1 from vertex V[I1J1] and the terminal vertices of these paths are
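$V[I_2;\ J_1]$, $V[I_2;\ J_1 \oplus (\times/I_1{\uparrow}F)]$, $V[I_2;\ J_1 \oplus (\times/I_1{\uparrow}F) \times 2]$, ..., $V[I_2;\ J_1 \oplus (\times/I_1{\uparrow}F) \times (\times/I_1{\downarrow}I_2{\uparrow}F)-1]$.

But $((\times/I_1{\uparrow}F) \times (\times/I_1{\downarrow}I_2{\uparrow}F)-1) = (\times/I_2{\uparrow}F)-(\times/I_1{\uparrow}F)$, so this set of terminal vertices is precisely

$\{\, V[I_2;X] : (0 = (\times/I_1{\uparrow}F) \mid X \ominus J_1) \wedge (X \ominus J_1) < \times/I_2{\uparrow}F \,\}$.

This set equals

$\{\, V[I_2;X] : (0 = (\times/I_1{\uparrow}F) \mid X-J_1) \wedge (X \ominus J_1) < \times/I_2{\uparrow}F \,\}$,

however, because $0 = (\times/I_1{\uparrow}F) \mid N$ and hence $((\times/I_1{\uparrow}F) \mid X \ominus J_1) = ((\times/I_1{\uparrow}F) \mid N \mid X-J_1) = (\times/I_1{\uparrow}F) \mid X-J_1$. Thus, if there exists a path from $V[I_1;J_1]$ to $V[I_2;J_2]$, then $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$ and $(J_2 \ominus J_1) < \times/I_2{\uparrow}F$. Likewise, if $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$ and $(J_2 \ominus J_1) < \times/I_2{\uparrow}F$, then there exists a path from $V[I_1;J_1]$ to $V[I_2;J_2]$. Further, since $((\times/I_1{\uparrow}F) \times (\times/I_1{\downarrow}I_2{\uparrow}F)-1) = (\times/I_2{\uparrow}F)-(\times/I_1{\uparrow}F) < \times/F = N$, the terminal vertices described above are all distinct; so there can be at most one path from $V[I_1;J_1]$ to $V[I_2;J_2]$.

Suppose, on the other hand, that $I_1 \ge I_2$. Then it is obvious from Definition 4.1.1 that $G$ contains no path from $V[I_1;J_1]$ to $V[I_2;J_2]$. Q.E.D.

Corollary 4.1.2a. Let $V[I_1;J_1]$ and $V[I_2;J_2]$ be arbitrary vertices of a CC banyan $G$, and let $\le$ be the partial order associated with $G$. Then $V[I_1;J_1] \le V[I_2;J_2]$ iff $I_1 \le I_2$ and $(J_2 \ominus J_1) < \times/I_2{\uparrow}F$ and $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$.

Proof. First suppose that $I_1 \le I_2$ and $(J_2 \ominus J_1) < \times/I_2{\uparrow}F$ and $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$. If $I_1 < I_2$ then, by Theorem 4.1.2, there is a path from $V[I_1;J_1]$ to $V[I_2;J_2]$, implying that $V[I_1;J_1] \le V[I_2;J_2]$. If, on the other hand, $I_1 = I_2$ then

$0 = (\times/I_2{\uparrow}F) \mid J_2-J_1$.

Since $0 = (\times/I_2{\uparrow}F) \mid N$,

$0 = (\times/I_2{\uparrow}F) \mid N \mid J_2-J_1 = (\times/I_2{\uparrow}F) \mid J_2 \ominus J_1$.

But,

$(J_2 \ominus J_1) < \times/I_2{\uparrow}F$.

Therefore,

$(J_2 \ominus J_1) = 0$
$J_1 = J_2$,

so $V[I_1;J_1] = V[I_2;J_2]$ and hence $V[I_1;J_1] \le V[I_2;J_2]$.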
Now, suppose that $V[I_1;J_1] \le V[I_2;J_2]$. Then either $V[I_1;J_1] = V[I_2;J_2]$ or there exists a path from $V[I_1;J_1]$ to $V[I_2;J_2]$. If there exists a path from $V[I_1;J_1]$ to $V[I_2;J_2]$, then it follows from Theorem 4.1.2 that $I_1 \le I_2$ and $(J_2 \ominus J_1) < \times/I_2{\uparrow}F$ and $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$. If, on the other hand, $V[I_1;J_1] = V[I_2;J_2]$, then $I_1 = I_2 \le I_2$ and $(J_2 \ominus J_1) = (J_2 \ominus J_2) = 0 < \times/I_2{\uparrow}F$ and $0 = ((\times/I_1{\uparrow}F) \mid 0) = ((\times/I_1{\uparrow}F) \mid J_2-J_2) = (\times/I_1{\uparrow}F) \mid J_2-J_1$. Q.E.D.

Corollary 4.1.2b. Let $B[J_1]$ be a base and let $V[I;J_2]$ be a vertex of a CC banyan $G$ with partial order $\le$. Then $B[J_1] \le V[I;J_2]$ iff $(J_2 \ominus J_1) < \times/I{\uparrow}F$.

Proof. If $B[J_1] \le V[I;J_2]$ then it follows immediately from Corollary 4.1.2a that $(J_2 \ominus J_1) < \times/I{\uparrow}F$. Now, suppose that $(J_2 \ominus J_1) < \times/I{\uparrow}F$. Clearly, $0 \le I$. Also, $0 = (1 \mid J_2-J_1) = (\times/0{\uparrow}F) \mid J_2-J_1$. Since $B[J_1] = V[0;J_1]$, it follows from Corollary 4.1.2a that $B[J_1] \le V[I;J_2]$. Q.E.D.

Corollary 4.1.2c. Let $V[I;J_1]$ be a vertex and let $A[J_2]$ be an apex of a CC banyan $G$ with partial order $\le$. Then $V[I;J_1] \le A[J_2]$ iff $0 = (\times/I{\uparrow}F) \mid J_2-J_1$.

Proof. If $V[I;J_1] \le A[J_2]$ then it follows immediately from Corollary 4.1.2a that $0 = (\times/I{\uparrow}F) \mid J_2-J_1$. Now suppose that $0 = (\times/I{\uparrow}F) \mid J_2-J_1$. Clearly $I \le L$. Also, $(J_2 \ominus J_1) < N = (\times/F) = \times/L{\uparrow}F$. Since $A[J_2] = V[L;J_2]$, it follows from Corollary 4.1.2a that $V[I;J_1] \le A[J_2]$. Q.E.D.
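The path criterion of Theorem 4.1.2 can be checked mechanically on a small example. The following is only an illustrative sketch: the function names, the 0-based Python list `F` (so `F[i]` plays the role of F[i+1] in the text), and the choice of example fanout vectors are assumptions introduced here, not part of the original development.

```python
from math import prod

def successors(F, v):
    """Arcs out of vertex v = (level, column), following Definition 4.1.1."""
    i, j = v
    if i == len(F):                 # apexes (level L) have no outgoing arcs
        return []
    n = prod(F)
    stride = prod(F[:i])            # product of the first i fanouts (empty product is 1)
    return [(i + 1, (j + m * stride) % n) for m in range(F[i])]

def count_paths(F, src, dst):
    """Brute-force number of paths with at least one arc from src to dst."""
    return sum((nxt == dst) + count_paths(F, nxt, dst)
               for nxt in successors(F, src))

def path_exists_by_theorem(F, src, dst):
    """Closed-form test taken from the statement of Theorem 4.1.2."""
    (i1, j1), (i2, j2) = src, dst
    d = (j2 - j1) % prod(F)         # J2 minus J1, modulo N
    return i1 < i2 and d < prod(F[:i2]) and d % prod(F[:i1]) == 0

def check_theorem(F):
    """Compare the closed form with brute-force counting on every vertex pair."""
    n = prod(F)
    vertices = [(i, j) for i in range(len(F) + 1) for j in range(n)]
    for src in vertices:
        for dst in vertices:
            expected = 1 if path_exists_by_theorem(F, src, dst) else 0
            if count_paths(F, src, dst) != expected:
                return False
    return True

if __name__ == "__main__":
    print(check_theorem([2, 3, 2]))
```

The brute-force counter is exponential in the number of levels, so it is only meant for very small fanout vectors such as the ones used here.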
Theorem 4.1.3. A CC banyan $G$ is a rectangular banyan with $L$ levels and with fanout/spread vector $F$. Also, $B[0], \ldots, B[N-1]$ are the bases of $G$, and $A[0], \ldots, A[N-1]$ are the apexes of $G$.

Proof. It is apparent from Definition 4.1.1 that $B[0], \ldots, B[N-1]$ have no arcs incident into them and, hence, are bases. Further, these are the only bases of $G$, because for any $I = 1 \sim L$ and any $J = 0 \sim N-1$, vertex $V[I;J]$ has an arc incident into it from $V[I-1;J]$. Also it is obvious from Definition 4.1.1 that $A[0], \ldots, A[N-1]$ are precisely those vertices of $G$ which have no arcs incident out from them, and, hence, are the apexes of $G$.

Next we will show that $G$ is a banyan. By Lemma 4.1.1, $G$ is the graph of a partial order. Next, consider an arbitrary base $V[0;J_1]$ and an arbitrary apex $V[L;J_2]$. Clearly, $0 < L$, and $(J_2 \ominus J_1) < N = (\times/F) = \times/L{\uparrow}F$, and $0 = (1 \mid J_2-J_1) = (\times/0{\uparrow}F) \mid J_2-J_1$. Thus, by Theorem 4.1.2, $G$ contains exactly one path from $V[0;J_1]$ to $V[L;J_2]$. Therefore, $G$ is a banyan.

It is apparent from Definition 4.1.1 that $G$ has $L$ levels and that for each $I = 0 \sim L-1$, there are $F[I+1]$ arcs incident out from any vertex $V[I;J]$ in level $I$. Now consider the arcs incident into an arbitrary vertex $V[I_2;J_2]$ in level $I_2$ where $1 \le I_2 \le L$. Let $V[I_1;J_1]$ be the initial vertex of one such arc. Then by Definition 4.1.1, $I_1 = I_2-1$ and $J_2 = J_1 \oplus M \times \times/I_1{\uparrow}F$ for some integer $M$ where $0 \le M \le F[I_2]-1$. Therefore, $J_1 = J_2 \ominus M \times \times/I_1{\uparrow}F$. There are $F[I_2]$ possible values of $M$. Since $0 \le M \le F[I_1+1]-1$ and $0 \le I_1 \le L-1$, it follows that $0 \le (M \times \times/I_1{\uparrow}F) < N$. Consequently, each possible value of $M$ corresponds to a different possible value of $J_1$, implying that there are $F[I_2]$ arcs incident into $V[I_2;J_2]$.
Therefore, $G$ is rectangular with fanout vector $F$ and with spread vector $F$. Q.E.D.
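As a concrete companion to Theorem 4.1.3, the sketch below builds a small CC banyan directly from Definition 4.1.1 and checks the three properties used in the proof: the fanout of each level, the spread of each level, and the existence of exactly one path from every base to every apex. The names `cc_banyan` and `is_rectangular_banyan` and the 0-based list `F` are assumptions made for illustration; this is a numerical spot check on small cases, not a substitute for the proof.

```python
from collections import Counter
from math import prod

def cc_banyan(F):
    """Return (L, N, arcs); arcs maps (level, column) to its successor vertices."""
    L, N = len(F), prod(F)
    arcs = {}
    for i in range(L):
        stride = prod(F[:i])        # product of the first i fanouts
        for j in range(N):
            arcs[(i, j)] = [(i + 1, (j + m * stride) % N) for m in range(F[i])]
    return L, N, arcs

def is_rectangular_banyan(F):
    L, N, arcs = cc_banyan(F)
    # Fanout: every vertex in level i has F[i] distinct outgoing arcs.
    fanout_ok = all(len(set(out)) == F[i] for (i, _), out in arcs.items())
    # Spread: every vertex in level i+1 has F[i] incoming arcs.
    indegree = Counter(dst for out in arcs.values() for dst in out)
    spread_ok = all(indegree[(i + 1, j)] == F[i]
                    for i in range(L) for j in range(N))
    # Banyan property: exactly one path from each base to each apex,
    # counted by propagating path counts upward one level at a time.
    unique_ok = True
    for b in range(N):
        counts = Counter({(0, b): 1})
        for _ in range(L):
            nxt = Counter()
            for v, c in counts.items():
                for w in arcs[v]:
                    nxt[w] += c
            counts = nxt
        unique_ok = unique_ok and all(counts[(L, a)] == 1 for a in range(N))
    return fanout_ok and spread_ok and unique_ok

if __name__ == "__main__":
    print(is_rectangular_banyan([2, 3, 2]))
```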
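B.4.2 Distance Properties

In this section, useful and mathematically interesting properties of a CC banyan's base and apex distance functions will be derived formally. The properties derived here include those discussed in Section 7.2.

Definition 4.2.1. The dyadic operator $\mathrm{mcd}$, called minimum circular distance, is defined by

$J_1 \mathbin{\mathrm{mcd}} J_2 \leftrightarrow (J_2 \ominus J_1) \lfloor (J_1 \ominus J_2)$

where $J_1$ and $J_2$ are integers.

Lemma 4.2.1. Let $X$, $Y$, $Z$, and $M$ be integers. If $(X \ominus Y) \le M$ and $(Z \ominus Y) \le M$, then $(X \mathbin{\mathrm{mcd}} Z) \le M$.

Proof. Suppose $(X \ominus Y) \le M$ and $(Z \ominus Y) \le M$. But $0 \le (X \ominus Y)$ and $0 \le (Z \ominus Y)$, so $((X \ominus Y)-(Z \ominus Y)) \le M$ and $((Z \ominus Y)-(X \ominus Y)) \le M$. First, consider the case where $(X \ominus Y) \le (Z \ominus Y)$. Then $0 \le ((Z \ominus Y)-(X \ominus Y)) < N$ and hence $((Z \ominus Y) \ominus (X \ominus Y)) = (Z \ominus Y)-(X \ominus Y)$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z) \le Z \ominus X = (Z \ominus Y) \ominus (X \ominus Y) = (Z \ominus Y)-(X \ominus Y) \le M$.

Next consider the case where $(Z \ominus Y) < (X \ominus Y)$. Then $0 \le ((X \ominus Y)-(Z \ominus Y)) < N$ and hence $((X \ominus Y) \ominus (Z \ominus Y)) = (X \ominus Y)-(Z \ominus Y)$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z) \le X \ominus Z = (X \ominus Y) \ominus (Z \ominus Y) = (X \ominus Y)-(Z \ominus Y) \le M$.

Thus, in either case, $(X \mathbin{\mathrm{mcd}} Z) \le M$. Q.E.D.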
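Definition 4.2.1 and Lemma 4.2.1 translate directly into a few lines of code. The sketch below is illustrative only; the function names and the explicit modulus argument `n` (standing in for N) are assumptions introduced here. Python's `%` operator returns a nonnegative result for a positive modulus, which matches the modulo N subtraction used in the text.

```python
def mcd(x, y, n):
    """Minimum circular distance between integers x and y, modulo n."""
    return min((y - x) % n, (x - y) % n)

def lemma_4_2_1_holds(n):
    """Exhaustively check Lemma 4.2.1 for all X, Y, Z, M in 0 .. n-1."""
    for x in range(n):
        for y in range(n):
            for z in range(n):
                for m in range(n):
                    hypothesis = (x - y) % n <= m and (z - y) % n <= m
                    if hypothesis and mcd(x, z, n) > m:
                        return False
    return True

if __name__ == "__main__":
    print(mcd(1, 6, 8), lemma_4_2_1_holds(7))
```

The exhaustive check covers only one small modulus at a time, so it illustrates the lemma rather than proving it.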
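Lemma 4.2.2. Let $X$, $Y$, $Z$, and $M$ be integers. If $(Y \ominus X) \le M$ and $(Y \ominus Z) \le M$ then $(X \mathbin{\mathrm{mcd}} Z) \le M$.

Proof. Suppose $(Y \ominus X) \le M$ and $(Y \ominus Z) \le M$. But $0 \le (Y \ominus X)$ and $0 \le (Y \ominus Z)$, so $((Y \ominus X)-(Y \ominus Z)) \le M$ and $((Y \ominus Z)-(Y \ominus X)) \le M$. First, consider the case where $(Y \ominus X) \le Y \ominus Z$. Then $0 \le ((Y \ominus Z)-(Y \ominus X)) < N$ and hence $((Y \ominus Z) \ominus (Y \ominus X)) = (Y \ominus Z)-(Y \ominus X)$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z) \le X \ominus Z = (Y \ominus Z) \ominus (Y \ominus X) = (Y \ominus Z)-(Y \ominus X) \le M$.

Next consider the case where $(Y \ominus Z) < Y \ominus X$. Then $0 \le ((Y \ominus X)-(Y \ominus Z)) < N$ and hence $((Y \ominus X) \ominus (Y \ominus Z)) = (Y \ominus X)-(Y \ominus Z)$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z) \le Z \ominus X = (Y \ominus X) \ominus (Y \ominus Z) = (Y \ominus X)-(Y \ominus Z) \le M$.

Thus, in either case, $(X \mathbin{\mathrm{mcd}} Z) \le M$. Q.E.D.

Theorem 4.2.3. The minimum circular distance operator $\mathrm{mcd}$ for any CC banyan is a metric on the integers $0 \sim N-1$.

Proof. Let $X$, $Y$, and $Z$ be integers in the range $0 \sim N-1$. Then,

$(X \mathbin{\mathrm{mcd}} Y) = ((Y \ominus X) \lfloor (X \ominus Y)) \ge 0$, and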
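$(X \mathbin{\mathrm{mcd}} Y) = (Y \ominus X) \lfloor (X \ominus Y) = (X \ominus Y) \lfloor (Y \ominus X) = Y \mathbin{\mathrm{mcd}} X$.

If $(X \mathbin{\mathrm{mcd}} Y) = 0$ then

$0 = (Y \ominus X) \lfloor (X \ominus Y)$
$(0 = Y \ominus X) \vee (0 = X \ominus Y)$
$(Y = X) \vee (X = Y)$
$X = Y$.

If $X = Y$ then

$(X \mathbin{\mathrm{mcd}} Y) = (X \ominus X) \lfloor (X \ominus X) = 0$.

Therefore, $(X \mathbin{\mathrm{mcd}} Y) = 0$ iff $X = Y$.

Finally, to show that $(X \mathbin{\mathrm{mcd}} Z) \le (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$, consider first the case where $(Y \ominus X) \le X \ominus Y$ and $(Z \ominus Y) \le Y \ominus Z$. Then $(X \mathbin{\mathrm{mcd}} Y) = Y \ominus X$ and $(Y \mathbin{\mathrm{mcd}} Z) = Z \ominus Y$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z)$
$\le Z \ominus X$
$= (Y \ominus X) \oplus (Z \ominus Y)$
$\le (Y \ominus X)+(Z \ominus Y)$
$= (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$.

Second, consider the case where $(X \ominus Y) < Y \ominus X$ and $(Y \ominus Z) < Z \ominus Y$. Then $(X \mathbin{\mathrm{mcd}} Y) = X \ominus Y$ and $(Y \mathbin{\mathrm{mcd}} Z) = Y \ominus Z$. Therefore,

$(X \mathbin{\mathrm{mcd}} Z) = (Z \ominus X) \lfloor (X \ominus Z)$
$\le X \ominus Z$
$= (X \ominus Y) \oplus (Y \ominus Z)$
$\le (X \ominus Y)+(Y \ominus Z)$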
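$= (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$.

Third, consider the case where $(X \ominus Y) < Y \ominus X$ and $(Z \ominus Y) \le Y \ominus Z$. Then $(X \mathbin{\mathrm{mcd}} Y) = X \ominus Y$ and $(Y \mathbin{\mathrm{mcd}} Z) = Z \ominus Y$. Let $M = (X \ominus Y) \lceil (Z \ominus Y)$. By Lemma 4.2.1, $(X \mathbin{\mathrm{mcd}} Z) \le M$. But,

$M = (X \ominus Y) \lceil (Z \ominus Y) \le (X \ominus Y)+(Z \ominus Y) = (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$,

so $(X \mathbin{\mathrm{mcd}} Z) \le (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$.

Finally, consider the case where $(Y \ominus X) \le X \ominus Y$ and $(Y \ominus Z) < Z \ominus Y$. Then $(X \mathbin{\mathrm{mcd}} Y) = Y \ominus X$ and $(Y \mathbin{\mathrm{mcd}} Z) = Y \ominus Z$. Let $M = (Y \ominus X) \lceil (Y \ominus Z)$. By Lemma 4.2.2, $(X \mathbin{\mathrm{mcd}} Z) \le M$. But,

$M = (Y \ominus X) \lceil (Y \ominus Z) \le (Y \ominus X)+(Y \ominus Z) = (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$,

so $(X \mathbin{\mathrm{mcd}} Z) \le (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$.

Thus, in every case, $(X \mathbin{\mathrm{mcd}} Z) \le (X \mathbin{\mathrm{mcd}} Y)+(Y \mathbin{\mathrm{mcd}} Z)$. Q.E.D.

Theorem 4.2.4. Let $B[J_1]$ and $B[J_2]$ be bases of a CC banyan $G$, and let $I$ be an integer such that $0 \le I \le L$. Then $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) \le I$ iff $(J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I{\uparrow}F$, where $\mathrm{bd}$ denotes the base distance operator.

Proof. First, suppose that $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) \le I$. Then there exists a vertex $V[I;J_3]$ in level $I$ of $G$ such that $B[J_1] \le V[I;J_3]$ and $B[J_2] \le V[I;J_3]$. By Corollary 4.1.2b, $(J_3 \ominus J_1) < \times/I{\uparrow}F$, and similarly, $(J_3 \ominus J_2) < \times/I{\uparrow}F$. Therefore, $(J_3 \ominus J_1) \le (\times/I{\uparrow}F)-1$ and $(J_3 \ominus J_2) \le (\times/I{\uparrow}F)-1$. By Lemma 4.2.2, $(J_1 \mathbin{\mathrm{mcd}} J_2) \le ((\times/I{\uparrow}F)-1) < (\times/I{\uparrow}F)$.

Next suppose that $(J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I{\uparrow}F$. Consider the case where $(J_2 \ominus J_1) \le J_1 \ominus J_2$. Then $(J_2 \ominus J_1) = (J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I{\uparrow}F$. Also $(J_2 \ominus J_2) = 0$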
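$< \times/I{\uparrow}F$. By Corollary 4.1.2b, $B[J_1] \le V[I;J_2]$ and $B[J_2] \le V[I;J_2]$. Therefore, $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) \le I$. On the other hand, consider the case where $(J_1 \ominus J_2) < J_2 \ominus J_1$. Then $(J_1 \ominus J_2) = (J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I{\uparrow}F$. Also, $(J_1 \ominus J_1) = 0 < \times/I{\uparrow}F$. By Corollary 4.1.2b, $B[J_2] \le V[I;J_1]$ and $B[J_1] \le V[I;J_1]$. Therefore, $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) \le I$. Thus, in either case where $(J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I{\uparrow}F$, we obtain $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) \le I$. Q.E.D.

Theorem 4.2.5. Let $A[J_1]$ and $A[J_2]$ be apexes of a CC banyan $G$, and let $I$ be an integer such that $0 \le I \le L$. Then $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) \le L-I$ iff $0 = (\times/I{\uparrow}F) \mid J_2-J_1$, where $\mathrm{ad}$ denotes the apex distance operator.

Proof. First, suppose that $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) \le L-I$. Then there exists a vertex $V[I;J_3]$ in level $I$ of $G$ such that $V[I;J_3] \le A[J_1]$ and $V[I;J_3] \le A[J_2]$. By Corollary 4.1.2c, $0 = (\times/I{\uparrow}F) \mid J_1-J_3$, and similarly, $0 = (\times/I{\uparrow}F) \mid J_2-J_3$. Therefore, $0 = (\times/I{\uparrow}F) \mid (J_2-J_3)-(J_1-J_3) = (\times/I{\uparrow}F) \mid J_2-J_1$.

Next, suppose that $0 = (\times/I{\uparrow}F) \mid J_2-J_1$. Then by Corollary 4.1.2c, $V[I;J_1] \le A[J_2]$. But $0 = ((\times/I{\uparrow}F) \mid 0) = (\times/I{\uparrow}F) \mid J_1-J_1$, so it also follows from Corollary 4.1.2c that $V[I;J_1] \le A[J_1]$. Therefore, $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) \le L-I$. Q.E.D.

Corollary 4.2.5a. Let $A[J_1]$ and $A[J_2]$ be apexes of a CC banyan $G$, and let $I$ be an integer such that $0 \le I \le L$. Then $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) \le I$ iff $0 = (\times/(-I){\downarrow}F) \mid J_2-J_1$.

Proof. Since $I = L-(L-I)$, it follows from Theorem 4.2.5 that $(A[J_1] \mathbin{\mathrm{ad}} A[J_2])$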
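$\le I$ iff $0 = (\times/(L-I){\uparrow}F) \mid J_2-J_1$. But $((L-I){\uparrow}F) = (-I){\downarrow}F$, so $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) \le I$ iff $0 = (\times/(-I){\downarrow}F) \mid J_2-J_1$. Q.E.D.

Theorem 4.2.6. Let $G$ be a CC banyan such that $2 \le F[I]$ for every $I = 1 \sim L$. Then the base distance operator $\mathrm{bd}$ is a metric on the bases of $G$.

Proof. Let $B[J_1]$, $B[J_2]$, and $B[J_3]$ be bases of $G$. By Theorem 2.1.2, $0 \le B[J_1] \mathbin{\mathrm{bd}} B[J_2]$. By Theorem 2.1.3, $0 = B[J_1] \mathbin{\mathrm{bd}} B[J_2]$ iff $B[J_1] = B[J_2]$. By Theorem 2.1.1, $(B[J_1] \mathbin{\mathrm{bd}} B[J_2]) = B[J_2] \mathbin{\mathrm{bd}} B[J_1]$.

To prove the triangle inequality, let $I_1 = B[J_1] \mathbin{\mathrm{bd}} B[J_2]$, let $I_2 = B[J_2] \mathbin{\mathrm{bd}} B[J_3]$, and let $I_3 = I_1 + I_2$. If $I_1 = 0$ then $B[J_1] = B[J_2]$ and $(B[J_1] \mathbin{\mathrm{bd}} B[J_3]) = I_2 = I_3$. If $I_2 = 0$ then $B[J_2] = B[J_3]$ and $(B[J_1] \mathbin{\mathrm{bd}} B[J_3]) = I_1 = I_3$. If $I_3 \ge L$ then $(B[J_1] \mathbin{\mathrm{bd}} B[J_3]) \le I_3$ by Theorem 2.1.2. Suppose, on the other hand, that $0 < I_1$ and $0 < I_2$ and $I_3 < L$. Then by Theorem 4.2.4, $(J_1 \mathbin{\mathrm{mcd}} J_2) < \times/I_1{\uparrow}F$ and $(J_2 \mathbin{\mathrm{mcd}} J_3) < \times/I_2{\uparrow}F$. Consequently, $((J_1 \mathbin{\mathrm{mcd}} J_2)+J_2 \mathbin{\mathrm{mcd}} J_3) < (\times/I_1{\uparrow}F) + \times/I_2{\uparrow}F$. Since $\mathrm{mcd}$ is a metric, $(J_1 \mathbin{\mathrm{mcd}} J_3) \le (J_1 \mathbin{\mathrm{mcd}} J_2)+J_2 \mathbin{\mathrm{mcd}} J_3$. Therefore,

$(J_1 \mathbin{\mathrm{mcd}} J_3) < (\times/I_1{\uparrow}F) + \times/I_2{\uparrow}F \le 2 \times \times/(I_1 \lceil I_2){\uparrow}F$.

Since $2 \le F[I_3]$,

$(J_1 \mathbin{\mathrm{mcd}} J_3) < F[I_3] \times \times/(I_1 \lceil I_2){\uparrow}F$.

Since $0 < I_1$ and $0 < I_2$, $I_3 = (I_1+I_2) > I_1 \lceil I_2$. Therefore,

$(J_1 \mathbin{\mathrm{mcd}} J_3) < \times/I_3{\uparrow}F$.

By Theorem 4.2.4, $(B[J_1] \mathbin{\mathrm{bd}} B[J_3]) \le I_3$.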
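Thus, in every case, $(B[J_1] \mathbin{\mathrm{bd}} B[J_3]) \le I_3 = (B[J_1] \mathbin{\mathrm{bd}} B[J_2])+B[J_2] \mathbin{\mathrm{bd}} B[J_3]$. Q.E.D.

Theorem 4.2.7. The apex distance operator $\mathrm{ad}$ is a metric on the apexes of a CC banyan.

Proof. Let $A[J_1]$, $A[J_2]$, and $A[J_3]$ be apexes of $G$. By Theorem 2.1.2, $0 \le A[J_1] \mathbin{\mathrm{ad}} A[J_2]$. By Theorem 2.1.3, $0 = A[J_1] \mathbin{\mathrm{ad}} A[J_2]$ iff $A[J_1] = A[J_2]$. By Theorem 2.1.1, $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) = A[J_2] \mathbin{\mathrm{ad}} A[J_1]$.

To prove the triangle inequality, let $I_1 = L-A[J_1] \mathbin{\mathrm{ad}} A[J_2]$ and let $I_2 = L-A[J_2] \mathbin{\mathrm{ad}} A[J_3]$. Hence, $(A[J_1] \mathbin{\mathrm{ad}} A[J_2]) = L-I_1$ and $(A[J_2] \mathbin{\mathrm{ad}} A[J_3]) = L-I_2$. Then, by Theorem 4.2.5, $0 = (\times/I_1{\uparrow}F) \mid J_2-J_1$ and $0 = (\times/I_2{\uparrow}F) \mid J_3-J_2$. Since $(I_1 \lfloor I_2) \le I_1$, $0 = (\times/(I_1 \lfloor I_2){\uparrow}F) \mid J_2-J_1$. Since $(I_1 \lfloor I_2) \le I_2$, $0 = (\times/(I_1 \lfloor I_2){\uparrow}F) \mid J_3-J_2$. Therefore,

$0 = (\times/(I_1 \lfloor I_2){\uparrow}F) \mid (J_3-J_2)+(J_2-J_1) = (\times/(I_1 \lfloor I_2){\uparrow}F) \mid J_3-J_1$. (1)

Since apex distances are nonnegative, $L \ge I_1$ and $L \ge I_2$. Thus,

$L \ge I_1 \lceil I_2$
$0 \ge (I_1 \lceil I_2)-L$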
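$(I_1 \lfloor I_2) \ge (I_1 \lfloor I_2)+(I_1 \lceil I_2)-L$
$(I_1 \lfloor I_2) \ge I_1+I_2-L$
$0 = (\times/(I_1+I_2-L){\uparrow}F) \mid (\times/(I_1 \lfloor I_2){\uparrow}F)$. (2)

From equations (1) and (2) above,

$0 = (\times/(I_1+I_2-L){\uparrow}F) \mid J_3-J_1$.

By Theorem 4.2.5,

$(A[J_1] \mathbin{\mathrm{ad}} A[J_3]) \le L-(I_1+I_2-L)$
$(A[J_1] \mathbin{\mathrm{ad}} A[J_3]) \le (L-I_1)+(L-I_2)$
$(A[J_1] \mathbin{\mathrm{ad}} A[J_3]) \le (A[J_1] \mathbin{\mathrm{ad}} A[J_2])+A[J_2] \mathbin{\mathrm{ad}} A[J_3]$.

Q.E.D.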
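Theorems 4.2.4 and 4.2.5 give closed forms for the two distance functions, and these can be compared against a direct search of the graph. The sketch below is hedged accordingly. It assumes, as the proofs above suggest, that the base distance of two bases is the lowest level containing a vertex reachable from both, and that the apex distance of two apexes is L minus the highest level containing a vertex from which both are reachable; the formal definitions appear earlier in the dissertation and are not reproduced here. All function names and the 0-based list `F` are illustrative assumptions.

```python
from math import prod

def mcd(x, y, n):
    """Minimum circular distance modulo n (Definition 4.2.1)."""
    return min((y - x) % n, (x - y) % n)

def base_distance(F, j1, j2):
    """Closed form from Theorem 4.2.4: least I with mcd below the I-prefix product."""
    n = prod(F)
    return next(i for i in range(len(F) + 1) if mcd(j1, j2, n) < prod(F[:i]))

def apex_distance(F, j1, j2):
    """Closed form from Theorem 4.2.5: L minus the largest I whose prefix product divides J2 - J1."""
    L = len(F)
    return L - max(i for i in range(L + 1) if (j2 - j1) % prod(F[:i]) == 0)

def reach_up(F, j):
    """reach[i] = columns of level i reachable from base j."""
    n, reach = prod(F), [{j}]
    for i, f in enumerate(F):
        stride = prod(F[:i])
        reach.append({(c + m * stride) % n for c in reach[-1] for m in range(f)})
    return reach

def reach_down(F, j):
    """reach[i] = columns of level i from which apex j is reachable."""
    n, reach = prod(F), [{j}]
    for i in reversed(range(len(F))):
        stride = prod(F[:i])
        reach.append({(c - m * stride) % n for c in reach[-1] for m in range(F[i])})
    return reach[::-1]              # reorder so the list is indexed by level

def brute_base_distance(F, j1, j2):
    r1, r2 = reach_up(F, j1), reach_up(F, j2)
    return min(i for i in range(len(F) + 1) if r1[i] & r2[i])

def brute_apex_distance(F, j1, j2):
    r1, r2 = reach_down(F, j1), reach_down(F, j2)
    return len(F) - max(i for i in range(len(F) + 1) if r1[i] & r2[i])

def check(F):
    """Compare closed forms with graph search, then test the triangle inequality."""
    cols = range(prod(F))
    formulas_ok = all(
        base_distance(F, a, b) == brute_base_distance(F, a, b)
        and apex_distance(F, a, b) == brute_apex_distance(F, a, b)
        for a in cols for b in cols)
    triangle_ok = all(
        d(F, a, c) <= d(F, a, b) + d(F, b, c)
        for d in (base_distance, apex_distance)
        for a in cols for b in cols for c in cols)
    return formulas_ok and triangle_ok

if __name__ == "__main__":
    print(check([2, 3, 2]))
```

The triangle-inequality check for base distance relies on every fanout being at least 2, which is the hypothesis of Theorem 4.2.6; the example fanout vectors are chosen to satisfy it.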
APPENDIX C BIDIRECTIONAL SWITCHING CIRCUITS
A switching device is said to be "bidirectional" if it can pass a signal in either direction rather than in just one direction. Mechanical switches and relay contacts are probably the simplest bidirectional devices. They will pass signals equally well in both directions when closed and will block signals in both directions when open. Figure C-1 shows a simple relay used as a bidirectional switch.

In contrast, the AND gate shown in Figure C-2 is a unidirectional switch since it controls signal propagation in one direction only. This circuit has the advantage, however, that it can be realized using a small, high-speed electronic circuit. Further, with just about any logic family other than diode logic, the AND gate switch can amplify the signal it controls. Power amplification of data signals can be very important in a large switching network so that data from a low-power source can be routed to many possible destinations without exceeding fanout limitations.

There are a number of simple electronic circuits which can both switch and amplify bidirectional binary signals. Standard logic families such as DTL, TTL, ECL, and I²L can be used. Most any family can be adapted for bidirectional switching so long as it supports either "wired-AND" or "wired-OR" connections. Figures C-3 and C-4 show bidirectional switching circuits which can be built from standard TTL gates and ECL gates, respectively. Figure C-5 shows a comparable I²L circuit suitable
[Figure C-1. Relay Used as a Bidirectional Switch. Diagram labels: Input or Output Signal; Signal Direction; Output or Input Signal; Control Signal.]
[Figure C-2. AND Gate Used as a Unidirectional Switch. Diagram labels: Signal Direction; Input Signal; AND; Output Signal; Control Signal.]
[Figure C-3. Bidirectional Switch Using Standard TTL Gates. Panels: a) Logic Diagram; b) Suggested Symbol for Circuit Above. Diagram labels: Left-to-Right Control Input; Right-to-Left Control Input; Left Data Input-Output; Right Data Input-Output; Wired-AND Connection; gate types 7401 and 7402.]
[Figure C-5. Bidirectional Switch for LSI Using I²L Gates. Diagram labels: Left Data Input-Output; Left-to-Right Control Input; Right-to-Left Control Input; Wired-AND Connections.]

NOTE: Suggested symbol for this circuit is same as for TTL circuit in Figure C-3.
for large scale integration. Integrated DTL bidirectional switches were constructed by Vice et al. (73), who also surveyed a number of earlier bidirectional circuits.
APPENDIX D COMPLETE SIMULATION DATA
Statistical data collected from the simulation experiments discussed in Section 8 is shown in Tables D-1 through D-7. Figures 8.2-1 through 8.2-4 and Tables 8.2-2 through 8.2-4 were derived from this data.
TABLE D-1
Simulation Results for SW Banyans Using Far Apex Selection Rule and Standard Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           4         1.03  .017                2         98.46     100.00                        195
2           8         1.25  .044                2         92.69     100.00                        342
2           16        1.73  .045                2         85.34     100.00                        689
2           32        1.95  .036                3         79.24     99.68     100.00              1233
2           64        2.35  .061                3         76.04     98.05     100.00              2458
2           128       2.72  .057                4         73.33     96.76     99.92     100.00    [?]999
3           9         1.24  .043                2         93.88     100.00                        392
3           27        1.92  .027                2         84.36     100.00                        1215
3           81        2.12  .043                3         80.48     99.37     100.00              3197
4           16        1.37  .049                2         93.76     100.00                        689
4           64        1.91  .032                3         83.40     99.96     100.00              2458
4           256       2.39  .060                4         87.42     99.09     99.95     100.00    12931
8           64        1.80  .040                2         90.15     100.00                        2458
TABLE D-2
Simulation Results for SW Banyans Using Near Apex Selection Rule and Standard Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           8         1.07  .026                2         97.95     100.00                        342
2           64        2.08  .046                3         81.77     99.35     100.00              2458
3           27        1.82  .039                2         89.55     100.00                        1215
4           16        1.19  .039                2         97.24     100.00                        689
4           64        1.89  .031                2         88.61     100.00                        2458
TABLE D-3
Simulation Results for SW Banyans Using Far Apex Selection Rule and Modified Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           8         1.05  .022                2         98.54     100.00                        342
2           64        2.15  .052                3         84.17     99.02     100.00              2458
4           64        1.85  .036                2         90.15     100.00                        2458
TABLE D-4
Simulation Results for SW Banyans Using Near Apex Selection Rule and Modified Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           4         1.00  0.0                 1         100.00                                  195
2           8         1.05  .022                2         98.54     100.00                        342
2           16        1.44  .050                2         93.32     100.00                        689
2           32        1.80  .040                2         87.59     100.00                        1233
2           64        2.05  .046                3         86.05     99.43     100.00              2458
4           16        1.03  .017                2         99.56     100.00                        689
4           64        1.74  .044                2         92.19     100.00                        2458
8           64        1.37  .049                2         97.48     100.00                        2458
TABLE D-5
Simulation Results for CC Banyans Using Far Apex Selection Rule and Standard Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
5           64        2.41  .064                4         70.38     97.52     99.96     100.00    1458
TABLE D-6
Simulation Results for CC Banyans Using Near Apex Selection Rule and Standard Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           4         1.03  .017                2         98.46     100.00                        195
2           8         1.32  .047                2         90.35     100.00                        342
2           16        1.80  .040                2         80.84     100.00                        684
2           32        1.94  .034                3         76.72     29.76     100.00              1233
2           64        2.17  .053                3         73.31     98.94     100.00              2458
4           16        1.33  .047                2         94.78     100.00                        689
4           64        1.90  .030                2         85.68     100.00                        2458
TABLE D-7
Simulation Results for CC Banyans Using Near Apex Selection Rule and Modified Set-Up Rule

Fanout and  Number    Layers Required                     Nonempty Subsystems Connected (percent) Total Nonempty
Spread      of Bases  Mean  Std. Error of Mean  Maximum   1 Layer   2 Layers  3 Layers  4 Layers  Subsystems
2           4         1.00  0.0                 1         100.00                                  195
2           8         1.14  .035                2         95.91     100.00                        342
2           16        1.57  .050                2         89.70     100.00                        689
2           32        1.89  .034                3         84.35     99.92     100.00              1233
2           64        2.12  .050                3         81.53     99.19     100.00              2458
4           16        1.07  .026                2         98.84     100.00                        689
4           64        1.82  .039                2         91.83     100.00                        2458
REFERENCES

Barnes, G. H., R. M. Brown, M. Kato, D. J. Kuck, D. L. Slotnick, and R. A. Stokes, "The ILLIAC IV Computer," IEEE Transactions on Computers, Vol. C-17, pp. 746-757, Aug., 1968.

Baskin, H. B., E. B. Horowitz, R. D. Tennison, and L. E. Rittenhouse, "A Modular Computer Sharing System," Communications of the ACM, Vol. 12, No. 10, pp. 551-559, Oct., 1969.

Baskin, H. B., B. R. Borgerson, and R. Roberts, "Prime: A Modular Architecture for Terminal-Oriented Systems," AFIPS Proc. SJCC, Vol. 40, pp. 431-437, 1972.

Batcher, K. E., "Sorting Networks and Their Applications," AFIPS Proc. SJCC, Vol. 32, pp. 307-314, 1968.

Benes, V. E. (62), "Algebraic and Topological Properties of Connecting Networks," Bell System Technical Journal, pp. 1249-1274, July, 1962.

Benes, V. E. (64a), "Permutation Groups, Complexes, and Rearrangeable Connecting Networks," Bell System Technical Journal, pp. 1619-1640, July, 1964.

Benes, V. E. (64b), "Optimal Rearrangeable Multistage Connecting Networks," Bell System Technical Journal, pp. 1641-1656, July, 1964.

Benes, V. E. (65), Mathematical Theory of Connecting Networks and Telephone Traffic, Academic Press, New York, 1965.

Berge, Claude, The Theory of Graphs, John Wiley and Sons, Inc., New York, 1962.

Bhandarkar, D. P. and J. E. Juliussen, "A Comparative Evaluation of the Cost Effectiveness of Computer Systems," in J. D. White, ed., Proc. Annual Conference of the ACM, pp. 323-324, Oct. 20-22, 1975.

Clos, C., "A Study of Non-Blocking Switching Networks," Bell System Technical Journal, pp. 406-424, March, 1953.

Comtre Corporation, Enslow, P. H., Jr., ed., Multiprocessors and Parallel Processing, John Wiley and Sons, Inc., New York, 1974.

Davis, R. L., "The ILLIAC IV Processing Element," IEEE Transactions on Computers, Vol. C-18, No. 9, pp. 800-816, Sept., 1969.

Feng, Tse-Yun, "Data Manipulating Functions in Parallel Processors and Their Implementations," IEEE Transactions on Computers, Vol. C-23, No. 3, pp. 309-318, March, 1974.

Foster, C. C., "Determination of Priority in Associative Memories," IEEE Transactions on Computers, Vol. C-17, No. 8, pp. 788-789, Aug., 1968.

Frank, R. A., "Imsai Arrays Micros for Low-Cost Power," Computerworld, Vol. IX, No. 44, pp. 1 and 3, Oct. 29, 1975.

Gilman, L. and A. J. Rose, APL/360 an Interactive Approach, John Wiley and Sons, Inc., New York, 1970.

Goke, L. R. and G. J. Lipovski, "Banyan Networks for Partitioning Multiprocessor Systems," in Lipovski, G. J. and S. A. Szygenda, ed., Proc. First Annual Symposium on Computer Architecture, IEEE Catalog no. 73CH08243C, pp. 21-28, Dec. 9-11, 1973.

Goldstein, L. J. and S. W. Leibholz, "On the Synthesis of Signal Switching Networks with Transient Blocking," IEEE Transactions on Electronic Computers, Vol. EC-16, No. 5, pp. 637-641, Oct., 1967.

IBM, APL/360 User's Manual, Publication GH20-0683-1, IBM Corp., 1968.

Iverson, K. E., A Programming Language, John Wiley and Sons, 1962.

Joel, A. E., Jr., "On Permutation Switching Networks," Bell System Technical Journal, pp. 813-822, May/June, 1968.

Lang, T. and H. S. Stone, "A Shuffle-Exchange Network with Simplified Control," IEEE Transactions on Computers, Vol. C-25, pp. 55-65, Jan., 1976.

Lawrie, D. H. (73), "Memory-Processor Connection Networks," University of Illinois Report no. UIUCDCS-R-73-557, Feb., 1973.

Lawrie, D. H. (75), "Access and Alignment of Data in an Array Processor," IEEE Transactions on Computers, Vol. C-24, No. 12, pp. 1145-1155, 1975.

Lipovski, G. J. (69), The Architecture of a Large Distributed Logic Associative Processor, Coordinated Science Laboratory Report R-424, University of Illinois, July, 1969.

Lipovski, G. J. (70), "The Architecture of a Large Associative Processor," AFIPS Proc. SJCC, Vol. 36, pp. 385-396, 1970.

Pease, M. C., "An Adaptation of the Fast Fourier Transform for Parallel Processing," Journal of the ACM, Vol. 15, pp. 252-264, April, 1968.

Schultz, G. W., R. M. Holt, and H. L. McFarland, "A Guide to Using LSI Microprocessors," Computer, pp. 13-19, June, 1973.

Searle, B. C. and D. E. Freberg, "Tutorial: Microprocessor Applications in Multiple Processor Systems," Computer, Vol. 8, No. 10, pp. 22-30, Oct., 1975.

Stone, H. S., "Parallel Processing with the Perfect Shuffle," IEEE Transactions on Computers, Vol. C-20, No. 2, pp. 153-161, Feb., 1971.

Thurber, K. J., E. D. Jensen, L. A. Jack, L. L. Kinney, P. C. Patton, and L. C. Anderson, "A Systematic Approach to the Design of Digital Bussing Structures," AFIPS Proc. FJCC, Vol. 41, pp. 719-740, 1972.

Vice, W. E., A. J. Brodersen, and G. J. Lipovski, "On Integrated Circuit Bidirectional Amplifiers," Journal of Solid State Circuits, Oct., 1973.

Waksman, A., "A Permutation Network," Journal of the ACM, Vol. 15, No. 1, pp. 159-163, Jan., 1968.

Wulf, W. A. and C. G. Bell, "C.mmp: A Multi-Mini-Processor," AFIPS Proc. FJCC, Vol. 41, pp. 765-777, 1972.
BIOGRAPHICAL SKETCH

Louis Rodney Goke was born December 6, 1946, in Memphis, Tennessee. In 1968, he received the B.S. degree in Mathematics from Christian Brothers College at Memphis. In 1971, he received the M.S.E. degree in Electrical Engineering from the University of Florida. While at the University of Florida, he held a College of Engineering Fellowship from 1968 to 1969, a teaching assistantship in the Department of Electrical Engineering from 1969 through 1970, and a research assistantship in the Department of Clinical Psychology from 1971 to 1973. Since 1973, he has been employed as a Member of the Technical Staff with Texas Instruments, where he has performed research and design work concerning operating systems, programming languages, software reliability, and software engineering methodologies.

He is a member of the Institute of Electrical and Electronics Engineers, the IEEE Computer Society, the Association for Computing Machinery, and the ACM Special Interest Group on Programming Languages. He participated in the IEEE Region 3 Student Paper Competition in 1968 and subsequently has published three technical papers, two with Dr. Keith L. Doty concerning his master's research and one with Dr. G. J. Lipovski concerning his doctoral research.
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Gerald J. Lipovski, Chairman
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Keith L. Doty
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Stanley Y. W. Su
Associate Professor of Electrical Engineering

I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Alexander R. Bednarek
Professor of Mathematics
I certify that I have read this study and that in my opinion it conforms to acceptable standards of scholarly presentation and is fully adequate, in scope and quality, as a dissertation for the degree of Doctor of Philosophy.

Frank D. Vickers
Associate Professor of Computer Science

This dissertation was submitted to the Graduate Faculty of the College of Engineering and to the Graduate Council, and was accepted as partial fulfillment of the requirements for the degree of Doctor of Philosophy.

June, 1976

Dean, Graduate School
