
Title: Distributed query processing techniques based on pipelining and data sharing in an integrated data network
Permanent Link: http://ufdc.ufl.edu/UF00082424/00001
 Material Information
Title: Distributed query processing techniques based on pipelining and data sharing in an integrated data network
Physical Description: vii, 195 leaves : ill. ; 28 cm.
Language: English
Creator: Mikkilineni, Krishna Prasad, 1959-
Publication Date: 1986
Subject: Electronic data processing -- Distributed processing   ( lcsh )
Database management   ( lcsh )
Computer networks   ( lcsh )
Electrical Engineering thesis, Ph.D.
Dissertations, Academic -- Electrical Engineering -- UF
Genre: bibliography   ( marcgt )
non-fiction   ( marcgt )
Thesis: Thesis (Ph. D.)--University of Florida, 1986.
Bibliography: Bibliography: leaves 184-194.
Statement of Responsibility: by Krishna Prasad Mikkilineni.
General Note: Typescript.
General Note: Vita.
 Record Information
Bibliographic ID: UF00082424
Volume ID: VID00001
Source Institution: University of Florida
Holding Location: University of Florida
Rights Management: All rights reserved by the source institution and holding location.
Resource Identifier: aleph - 000956999
oclc - 17313931
notis - AER9682









ACKNOWLEDGMENTS

I owe so very much to Dr. Su, who has been the

single most important influence during the past five years of

my life. I sincerely appreciate the help and kindness of Dr.

Chow and Dr. Lam. I thank both Dr. Taylor and Dr. Staudhammer

for serving on my committee. I enjoyed working with Lance

during the last year and appreciate his help in the project.

I thank Dr. Sham Navathe for his advice during the early

years of my graduate studies. I acknowledge with pleasure,

the help and support provided by Sharon during the last five

years. Many friends and family provided me with much

encouragement, and to them also I owe very much.


TABLE OF CONTENTS

ACKNOWLEDGMENTS............................ ii

ABSTRACT........................................... vi


1 INTRODUCTION ............................ 1


2 AN INTEGRATED DATA NETWORK.............. 7

2.1 Data Types and Distribution......... 9
2.2 Data Integration and User Interface. 9
2.3 Command and Data Translation ........ 11
2.4 Query Processing..................... 11
2.5 Network and Database Control Functions 12


3 QUERY PROCESSING TECHNIQUES............. 13

3.1 Pipelined Query Execution............ 13
3.2 Intermediate Result Sharing.......... 16

4 THE ARCHITECTURE OF THE IDN............... 20

4.1 A Heterogeneous Network Architecture for the IDN.. 20
4.1.1 A Broadband Inter-Connection Medium...... 20
4.1.2 Control............................ 23
4.1.3 Translation........................ 24
4.2 The Query Processing Software Architecture........ 24


5 QUERY ANALYSIS AND OPTIMIZATION......... 27

5.1 Query Analysis and Modification...... 27
5.2 Query Optimization................... 28
5.2.1 Local or Intra-Query
Optimization............... 30


5.2.2 Inter-Query Optimization or Query Matching...... 31
Some related work....................... 34
Requirements of query processing
software architecture................ 39
Query tree representation and notation.. 41
Dynamic inter-query optimization
methodology.......................... 43
An example.............................. 58
Complexity of the methodology........... 62
5.2.3 Global Query Optimization................ 66
Global query optimization procedure..... 69
Description of query pipelines.......... 72
Parameters used in the cost formulas.... 76
Cost formulas........................... 78


6 QUERY PIPELINE CONTROL METHODS.......... 101

6.1 Synchronous Control.................. 101
6.2 Asynchronous Control................. 101
6.2.1 Circular Regeneration Approach. 102
6.2.2 Accumulation Based Regeneration
Approach..................... 103

7 ANALYTICAL EVALUATION..................... 105

7.1 Evaluation Procedure................. 105
7.2 Results............................. 106
7.2.1 Pipelining..................... 106
7.2.2 Intermediate Result Sharing.... 107
7.2.3 Query Pipeline Control Methods. 113
7.2.4 Join Algorithms................ 117
7.2.5 Alternative Query Execution
Schedules.................... 130

8 PETRI-NET BASED MODELING, SIMULATION,
AND EVALUATION......................... 140

8.1 The Definition of an Extended
Petri-net........................ 142
8.2 Petri-net Modeling of the Concurrent
Query Processing in the IDN........ 144
8.3 An Interpretation of a Query Petri-net 146

8.4 Modeling of the Database Operations
in the IDN....................... 148
8.4.1 Type 1 Operation: Select....... 148
8.4.2 Type 2 Operation: Sort......... 150
8.4.3 Type 3 Operation: Join......... 153
8.5 Models of the Query Pipeline Control
Methods ............................ 155
8.6 Prototype Based Petri-net Simulation. 158
8.6.1 Prototype...................... 158
8.6.2 Petri-net Simulator........... 159
8.7 Evaluation.......................... 161
8.7.1 Pipelining...................... 162
8.7.2 Intermediate Result Sharing.... 166

9 CONCLUSION............................... 171

APPENDIX................................... 178

REFERENCES ....................................... 184

BIOGRAPHICAL SKETCH............................. 195

Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of
the Requirements for the Degree of Doctor of Philosophy

DISTRIBUTED QUERY PROCESSING TECHNIQUES BASED ON PIPELINING
AND DATA SHARING IN AN INTEGRATED DATA NETWORK

By



Krishna Prasad Mikkilineni

May 1986

Chairman: Stanley Y. W. Su
Co-Chairman: Yuan-Chieh Chow
Major Department: Electrical Engineering

Many developments in the computer industry suggest

that future data systems will likely be Integrated Data

Networks (IDNs) consisting of a heterogeneous complex of

computer systems with widely diverse data management

capabilities. Heterogeneous distributed data management is a

key problem in an IDN environment. The issues of data

integration and modeling, efficiency of query processing, and

maintenance of the reliability and integrity of the

distributed data are all equally important in heterogeneous

distributed data management. In this work, we develop an IDN

architecture based on two key parallel processing techniques

which improve the database query throughput. These are 1)

pipelined and data-flow execution of database operations, and

2) intermediate result sharing among concurrent queries.

We develop a methodology for matching concurrent

relational algebraic query trees so that common intermediate

results can be identified and shared among them. Since the

optimization of high-level relational queries is important in

achieving efficiency in their execution, we develop a

procedure for optimizing the pipelined query processing in

the IDN environment. To evaluate the alternative query

execution schedules during the optimization process, detailed

cost formulas for computing the cost of pipelined,

distributed execution of relational queries are developed.

Synchronous and asynchronous methods of controlling the data-

flow in a query pipeline are studied. Alternative algorithms

for executing relational join in the pipelined query

processing environment are also investigated. Based on the

analytical formulas developed to compute the cost of query

execution, a thorough evaluation study is undertaken and its

results are presented. These results clearly establish the

superiority of 1) the pipelined query processing approach, 2)

the asynchronous method of query pipeline control, 3) the

intermediate result sharing technique, and 4) the hash-based

join algorithms. To validate the results of the analytical

study and to obtain a more accurate estimate of the

performance of the proposed techniques, Petri-nets are used

to model the various database operations and query processing

techniques in the IDN. Using a distributed Petri-net

simulator developed on a microcomputer network, we evaluate

and establish the performance improvements due to the

proposed query processing techniques.



1 INTRODUCTION

The importance of information and the advantages of

its effective exploitation have been recognized as the keys

to achieving prosperity and solving the problems of our

complex society of the present and the future. The essential

ingredient of information is the data that have been gathered

from various sources. We believe that the future data

processing systems will be Integrated Data Networks (IDNs) of

a heterogeneous complex of computer systems that share the

resources of hardware, software, and data. This belief is

supported by the following developments and facts:

a) Affordable mini- and microcomputers pervading all
areas of our society, and the phenomenal increase in
the availability of data management software, are
extending information-handling power from the few
trained "computer experts" to large numbers of
novice users.

b) Data are naturally distributed on different sites and

are stored in different formats which match the needs

of the local applications.

c) The advances in computer communication are making it

economical to construct various types of networks in
which the distributed data are shared when needed.

d) Large private and public organizations typically

consist of a variety of software and hardware

tailored for handling various tasks.

e) Specialized computers that handle particular

functions such as database machines are becoming

increasingly popular as the need for high performance
in various areas of computer applications increases.


An Integrated Data Network (IDN) consists of a number

of heterogeneous computer systems, which may have different

hardware, software, data, and data management capabilities.

Some of the component systems of an IDN can be general-

purpose computer systems whereas others can be special

function processors. The data residing on each component

system can be in a different format and may be managed by a

different type of database or file management system. The

component systems in an IDN can support large numbers of

users, many of whom can be issuing retrieval and update

queries against the distributed data simultaneously.

Heterogeneous distributed database management which

facilitates proper sharing and efficient processing of data

is an issue of fundamental importance in the IDN systems with

the above features.

The three key issues in heterogeneous distributed

database management are 1) the modeling of all the diverse

distributed data using a common global model, 2) the

processing of queries on the global data model, and 3) the

control of the integrity, reliability, and recovery of the

distributed data in the presence of updates issued by users

and/or faults in the hardware or software. All these issues

are equally important for successfully solving the problem of

heterogeneous distributed data management in an IDN. In this

dissertation, we limit our attention to the study of the

issue of concurrent query processing. Since the size of the

heterogeneous distributed database in an IDN will be rather

large and since large numbers of users or application

programs, some of which can be real-time oriented, will be

simultaneously using the system, the efficiency of query

processing will be the main problem in heterogeneous

distributed database management.

To solve the concurrent query processing problem in

an IDN, many related issues need to be studied taking into

consideration the features of an IDN such as its parallel

potential and the speed offered by specialized component

systems. These issues include the software and hardware

architecture, techniques for query processing and its

optimization, and efficient algorithms for query execution.

Distributed database query processing has been studied

extensively within the framework of both special purpose

database machine architectures and general purpose

distributed processing systems [DAT83, CER84, YU84, JAR84A,

KIM85, BRI84, CHU82, DAY83, EPS78]. However, the introduction

of new advanced processing strategies for improving the

efficiency of query processing in an IDN requires a fresh

investigation into the problems of query processing,

optimization, architecture, and algorithms in the IDN

environment, which are the tasks carried out in this dissertation.


In this work, we assume that the relational model is

used as the common global model to model all the distributed

data in an IDN and thus concentrate only on the problem of

the distributed relational query processing in an IDN.

The relational model has been assumed because of its simplicity,

capability, and wide acceptance. Many parallel processing

architectures have been devised to improve the relational

database query processing efficiency and various innovative

algorithms and strategies have been developed to process the

relational queries efficiently over geographically

distributed databases [CER84, YU84, YU85, SHA80, VAL84,

MIS82, KIM84, HEV79A, FIS84, BER81]. In this work, we combine

two parallel architectural techniques into one powerful

query processing strategy: 1) pipelined and data-flow based

processing of database operations, and 2) sharing of

intermediate results among concurrent queries. We establish

that these techniques lead to low response time and high

throughput in query processing in an IDN.

In this dissertation, we define a distributed data
management software architecture and a hardware network
inter-connection architecture which are designed to further
enhance the query processing efficiency achieved by
pipelining and data sharing. We develop and analyze a
methodology for matching concurrent relational algebraic
queries so that appropriate intermediate results can be
shared among them and redundant operations are eliminated. A
procedure for the optimization of distributed, pipelined
execution of global queries is defined. The detailed cost
formulas, which are necessary during the query optimization
process for evaluating the alternative ways of executing a
query, are developed. Different methods of controlling the
execution of the various operations in query pipelines are
defined and analyzed.

Based on the cost formulas, which are developed to
compute the performance of a query pipeline, a comprehensive
analytical evaluation study is performed. The results of this
study establish the advantages of pipelined query processing
and intermediate data sharing and the relative performance of
various relational join algorithms in the IDN environment. We
have also verified the results of the analytical evaluation
study and demonstrated the validity of the proposed
strategies in the IDN architecture 1) by means of a detailed
modeling of the dynamics of the concurrent query processing
using Petri-nets, and 2) by simulating them on a prototype
microcomputer network.



This dissertation is organized as follows. Chapter 2

describes the general requirements and operations of an

integrated data network. Chapter 3 presents the two parallel

processing techniques which form the basis of the IDN

architecture and improve the query processing efficiency. In

Chapter 4, we present the proposed hardware network inter-

connection architecture and the distributed query processing

software architecture. In Chapter 5, the details of the query

analysis and optimization strategies are discussed. The

different methods of controlling the execution in a query

pipeline are described in Chapter 6. Chapter 7 describes the

analytical studies performed to evaluate the proposed query

processing techniques, control strategies, and algorithms. In

Chapter 8, we describe the details of modeling concurrent

query processing using Petri-nets and also describe the

results of the simulation of Petri-net models on a prototype

microcomputer network we constructed to demonstrate the

viability of the proposed IDN architecture. A summary,

conclusion, and discussion of the future research are

presented in Chapter 9.


2 AN INTEGRATED DATA NETWORK

An integrated data network is a loosely coupled

computer network of a number of heterogeneous computing

machines and databases. It may contain a number of 1)

general-purpose computers, 2) special-purpose functional

processors such as join processors, sorting machines, and

database filters [KUN80, MEN81, TON81], 3) back-end computers

(e.g., IDM500) [BRI82], 4) cellular logic devices (e.g.,

CASSM [SU79], RAP [OZK75]), 5) multi-processor systems (e.g.,


DBC [BAN79]), 6) associative memory systems (e.g., STARAN

[BAT77], NON-VON [SHA81], and 7) other non-numeric processors

such as text processing hardware [HOL79] and AI machines

[STO83, FAH83, RIE81]. The main purpose of interconnecting

dissimilar systems is to share data, hardware, and software

resources among these systems. Figure 1 illustrates the

general architecture of an integrated data network in which

component systems with different functionalities and data

management capabilities are connected through an inter-

connection network.


2.1 Data Types and Distribution

In an IDN, the databases may contain the following

three general types of data: 1) structured numeric, non-

numeric and pictorial data, 2) unstructured text or image

data, and 3) rules which may represent expert knowledge as

well as integrity and/or security constraints. Data in such a

system can come from diverse sources and are stored in a

distributed fashion in the network nodes. Some of the nodes

are capable of storing permanent data while others (e.g., a

VLSI chip for the join operation) may only participate in

performing some computational tasks over the data provided to

them by some other nodes. Databases residing on different

nodes may have been defined using different data models and

established and/or processed using different database or file

management systems. They may be stored physically in

different structures and formats which are most suitable for

supporting the local applications.

2.2 Data Integration and User Interface

A common global data model provides a homogeneous

interface to all the network users and application programs

for accessing and processing the distributed databases. This

common data model provides the network users with 1) a data

definition facility for defining an integrated network

database that is physically distributed at the different

component systems in the network, and 2) a data manipulation

facility through a global query language for processing the

integrated database. Thus, the IDN should provide the network

users with a uniform interface that is transparent to the

distribution, organization, and access of data using the

different local database management systems. Data modeling,

language, and integration problems in both homogeneous and

heterogeneous distributed database systems have been studied

in various distributed database management systems such as


SDD-1 [ROT80] and MERMAID [BRI84], and these will not be

addressed in this work.

In this research, we assume that the common global

model can be any semantically rich model capable of

integrating data modeled by all the diverse data models in

the IDN. The common global data manipulation language is

assumed to be any high level language based on the semantic

model. However, in assuming that the global semantic data

model and the global query language can be mapped easily into

relational model and relational algebraic query language

respectively, we ignore the issues of data modeling and

integration and concentrate only on relational algebraic

query processing issues. This assumption of mapping a

semantic model into the relational model in a heterogeneous

distributed database system is reasonable and has been made

in much work on distributed database management systems.

2.3 Command and Data Translation

To access the integrated database in the IDN, a

network user would formulate a query in a global query

language based on his or her view of the database. The global

query language statements must be translated into sub-queries

or commands interpretable by the component systems. Also, the

data retrieved from a component system must be translated

into structures or formats suitable for use by other

component systems or, at least, the component system that

requested the data. The command and data translation tasks

will have to be carried out either by all the component

systems or by special-purpose processors dedicated to this task.
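
As a purely hypothetical sketch of this dispatch (the capability names and the two target command forms below are invented for illustration, not taken from the dissertation), the same global selection sub-query might be rendered differently for each component system:

```python
# Hypothetical sketch: render one global selection sub-query into the
# command form each component system can interpret. The two target
# "dialects" (SQL and a flat-file SCAN command) are illustrative
# assumptions, not the IDN's actual command languages.

def to_sql(relation, column, value):
    # For a component running a relational DBMS.
    return f"SELECT * FROM {relation} WHERE {column} = '{value}'"

def to_file_scan(relation, column, value):
    # For a component that only understands file manipulation commands.
    return f"SCAN {relation}.dat FILTER {column}='{value}'"

TRANSLATORS = {"dbms": to_sql, "filesys": to_file_scan}

def translate(subquery, capability):
    """Dispatch a (relation, column, value) selection to a dialect."""
    return TRANSLATORS[capability](*subquery)

q = ("Internal_order", "status", "completed")
print(translate(q, "dbms"))
print(translate(q, "filesys"))
```

Data translation in the reverse direction (reformatting retrieved records for the requesting component) would follow the same dispatch pattern.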


2.4 Query Processing

In an IDN, the query processing capabilities of the

different component systems may differ drastically. Some

systems may have database management systems that accept

complex, high-level queries while others may have file

systems capable of receiving only file manipulation commands.

Still others may accept only parameter values for those

programs which process their local databases. Therefore, it

is necessary to design a method of processing network queries

so that the component systems with diverse query processing

capabilities can co-operatively access and manipulate the

integrated network database. Furthermore, the parallel

processing capability and the efficiency provided by the

functionally specialized processors in the network should be

fully exploited to achieve a high degree of efficiency in

query processing.

2.5 Network and Database Control Functions

To ensure the proper operation of an IDN, a number of

network and database control and management functions have to

be handled in the network. These functions include the following:


1) Global query transaction management (i.e., transaction

integrity, reliability, and concurrency control),

2) Database and network integrity, security, reliability and

recovery control,

3) Component system allocation and re-configuration, and

4) Control and synchronization of concurrent query

execution at different nodes in the network.

Based on the definition of an integrated data network

and its general requirements given above, the two proposed

parallel processing techniques are presented in the next

chapter. In this work, we shall not deal with the complex

issues of global transaction management and network control.



3 QUERY PROCESSING TECHNIQUES

In this chapter, we shall present the two query

processing techniques used in the IDN: 1) pipelined query

execution, and 2) intermediate result sharing.

3.1 Pipelined Query Execution

Pipelined execution strategies have been used

extensively in such diverse applications as computer

architecture, signal processing, and knowledge processing to

increase the degree of parallelism in the execution of small

granular operations (e.g. micro-instructions) [BAR78, DAV78,

DEN80, RUM77, TRE82, GAJ85, HWA85, MUR85]. Recently, the

pipelining and data-flow based processing approaches have

been proposed for processing large granular database

operations in several database machines [HON84, BIC81, BOR80,

BOR81B, BOR81A, GAJ83b, KIM84, FIS84] and multi-processor

systems [KEL79, TRE82, GAJ83a]. Large granular pipelining

eliminates the disadvantage of large control overhead

involved in small granular pipelining. Gajski and Peir

[GAJ85] argue that the advantages due to the decrease in

control overhead outweigh the disadvantage of the decrease in

potential concurrency inherent in large granular processing.

In the proposed IDN, pipelining and data flow

approaches are combined and used for the control and

execution of primitive database operations. The query shown

in Figure 2 illustrates this approach. This query can be

decomposed by the system into a tree of primitives as shown

in Figure 3: read, output, sort, select, project, and join.

These primitive operations can be executed in parallel in

different ways, depending on the locations of the Inventory

and Internal order relations in the nodes of the IDN and the

availability of processors that can execute the join, select,

project, and sort operations efficiently. In the pipelined,

data-flow query processing approach, the systems assigned to

work on the operations at each level of the query tree would

be "fired" to perform the operations in parallel as soon as

the input data become available. Unlike the traditional

distributed processing approach, each component system does

not wait for the completion of its operation before sending

the resulting data upward to its parent node in the query

tree. Instead, it sends a block of data to its parent node as

soon as the block is produced. In this manner, several

pipelines of data blocks are formed to carry the data

residing in the secondary storage through a number of

processors to the final destination specified by the users.

Each processor in a pipeline picks up the data blocks from

the pipeline, processes them, and produces some transformed

data blocks which are put back into the pipeline. The results

of a query are generated by the root node of a query tree.
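
A minimal sketch of this data-flow behavior (not the IDN implementation itself; the block size and the particular operators are illustrative) writes each primitive as a generator that emits blocks as soon as they are produced, so a parent operation begins work before its child finishes:

```python
# Pipelined, data-flow execution of a query tree: every operator is a
# generator over blocks of tuples. A block flows upward the moment it is
# produced, instead of waiting for the whole operation to complete.

def read(relation, block_size=2):
    """Leaf node: stream a stored relation upward in fixed-size blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]

def select(blocks, predicate):
    """Pass through only the tuples satisfying the predicate."""
    for block in blocks:
        kept = [t for t in block if predicate(t)]
        if kept:
            yield kept

def project(blocks, columns):
    """Keep only the named columns of each tuple."""
    for block in blocks:
        yield [{c: t[c] for c in columns} for t in block]

def run(root):
    """Drain the root of the pipeline: the final output of the query."""
    result = []
    for block in root:
        result.extend(block)
    return result

orders = [
    {"part": "bolt", "status": "completed", "location": "A"},
    {"part": "nut",  "status": "pending",   "location": "B"},
    {"part": "gear", "status": "completed", "location": "C"},
]
pipeline = project(
    select(read(orders), lambda t: t["status"] == "completed"),
    ["part", "location"],
)
print(run(pipeline))
```

Because generators are lazy, the `select` above consumes the first block of `read` before the second block is even produced, which is exactly the vertical overlap of query-tree levels described in the text.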





[Figure 2. An example query: Retrieve the quantities and
locations of the parts whose internal orders have a completed
status, and sort them on location.]

The major advantages offered by this query processing

technique are

1) the intermediate results at different levels of the query

tree need not be accumulated in the main memory or be

written out to the secondary storage device, thus saving

memory space and I/O time,

2) a high degree of concurrency is achieved by parallel

execution of the operations at one level of the query tree

(i.e., horizontal concurrency present in the traditional

distributed query processing) and by parallel execution

and overlapping of the operations at the different levels

of the query tree (i.e., vertical concurrency), and

3) the control overhead involved in query processing is

reduced since processors are activated not through

explicit control, but based on the availability of data

and, in some situations, the presence of demands.

3.2 Intermediate Result Sharing

Existing database management systems do not recognize

relationships among concurrent queries. Common or sharable

operations among concurrent queries are not utilized, thus

leading to the redundant execution of the same set of

operations for each query. In an integrated data network

which supports a large number of component systems and users,

we believe that many common or sharable operations among

concurrent queries are possible. For example, in an

integrated CAD/CAM system, the same product or part data can

be simultaneously used for various design analyses (e.g.,

heat transfer, stress, etc.) and in various manufacturing

processes (e.g., process planning, scheduling, etc.).

The sharing of the intermediate results among

concurrent queries has been proposed by several researchers

[CHA82, FIN82, JAR84B]. In the IDN, we combine this

technique with the pipelined query processing technique to

process concurrent queries. This approach eliminates the need

for redundant execution of the sharable operations among

concurrent queries, thus improving not only the individual

query response time but also the total system throughput. In

recent multi-user database machine performance evaluation

works [BOR84, CHO85], it has been shown that as the degree of

sharing among concurrent queries increases, the query

throughput also increases. The intermediate result sharing

approach can also save considerable storage space since the

sharable results produced by these queries do not have to be duplicated.


In this approach, when a new query enters the system

and is decomposed into a hierarchy of primitive operations,

the operations are compared against the operations of those

queries currently being executed. The purpose of this

comparison is to identify any currently executing operations

which produce results that are either identical or a

superset of those of the new operations. If so, the new query

tree will be merged with those of the existing queries to

form a query network or a network of query pipelines. The

overhead involved in the methodology of comparing a new query

tree against the executing query trees has been found to be

very small in our study (see Chapter 5). Generally, it does

not take long to determine if two queries do not have

sharable operations (e.g., they process different sets of

relations). If more time is used to match the query trees and

to check the data conditions, it usually means that some

sharable operations have been detected. The benefits of

sharing make the cost of query matching negligible.

Therefore, we believe that even if the chance of concurrent

queries having sharable operations is low in some

applications, the net overall benefits due to sharing the

intermediate results among them will be significant.
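The matching step described above can be sketched as follows. This is a minimal illustration of the idea, not the dissertation's implementation; the operation encoding and names are our own, and the fast-reject on relation names mirrors the observation that queries over disjoint sets of relations are dismissed quickly.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    """A primitive query-tree operation (illustrative encoding)."""
    kind: str          # e.g. "JOIN", "SELECT"
    sources: tuple     # names of the input relations
    condition: str = ""  # qualification or join condition

def sharable(new_op, executing_ops):
    """Return an executing operation whose result the new operation
    can reuse (an identity match), or None if none exists."""
    for old in executing_ops:
        # Cheap reject first: different relations cannot share results.
        if old.sources != new_op.sources:
            continue
        if old.kind == new_op.kind and old.condition == new_op.condition:
            return old     # identical results: merge instead of re-executing
    return None

executing = [Op("JOIN", ("ORDER", "PART"), "ORDER.part#=PART.part#")]
new = Op("JOIN", ("ORDER", "PART"), "ORDER.part#=PART.part#")
assert sharable(new, executing) is not None  # the common join is shared
```

When a match is found, the new query tree is grafted onto the producing node of the executing query network rather than scheduled as a fresh pipeline.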

Figure 5 illustrates a merged query network of the

two queries defined in Figure 4 where the results from the

common join operation are piped to the select and sort

operations. A processor which produces sharable data will

pipe the data blocks to the processors in the different query

pipelines connected to it. Thus, the data blocks, starting

from the base data stored on the secondary storage devices,

will flow through the network of query pipelines and be

processed by the component systems of the IDN to produce the

final results for the various queries.







[Figure 4. Definitions of two relations in a factory database and two queries. Query 1 (executing): retrieve the parts whose internal orders have a completed status, and sort them on location. Query 2 (new): retrieve the part descriptions of the parts whose internal orders have a completed status and are issued after June 10, 1984.]

[Figure 5. The merged query network of the two queries of Figure 4: the result of the common join operation (an identity match) is piped to both the select (order time > 061084, part description) and sort operations.]

CHAPTER 4
THE ARCHITECTURE OF THE IDN

In this chapter, we shall first briefly present the

hardware inter-connection architecture proposed for the IDN

and then outline the functions of the query processing

software architecture.

4.1 A Heterogeneous Network Architecture for the IDN

The query processing techniques presented in the

preceding section and the heterogeneous nature of the IDN

require that the inter-connection network be flexible and

re-configurable. In this section, we shall present the design

of a heterogeneous network architecture for the IDN and some

of the related issues.

4.1.1 A Broadband Inter-Connection Medium

After exploring other alternatives such as the use of

cross bar switch, shared memory, tree and banyan network type

inter-connection structures, we have chosen to use a

broadband common bus as the inter-connection medium of the

IDN. This choice is based on the requirements that we have

set for an inter-connection medium suitable for the IDN.

These requirements, derived from the capabilities required by

the proposed processing strategy and the nature of the IDN

environment itself, are the following:

1) many processors sharing intermediate results should be

able to take their inputs from a single processor that

produces those results,

2) the exchanges of data among the processors in different

query pipelines or even in a single pipeline should

proceed simultaneously,

3) the communication paths among different processors should

be re-configured dynamically and easily to enable a new

query pipeline to tap into the executing query pipelines

at different points or to let one processor participate in

more than one query pipeline,

4) the inter-connection network should be expandable and

reliable in order to support new component systems that

may be added to the IDN as user requirements change, and

5) the inter-connection network should be such that

communication among the nodes that are physically

distributed over relatively long distances is possible.

We found that none of the alternative inter-connection

networks we examined, except the broadband bus, directly

supports the above requirements. The factors that

motivated our choice of the broadband bus are as follows:

1) It is easy to realize both one-to-many broadcasting and

many-to-many simultaneous communication among the

component systems in the IDN.

2) Both bus and tree inter-connection configurations can be supported.


3) It is a flexible inter-connection structure in which

dynamic re-configurability can be easily achieved through frequency channel switching.


4) Its common bus nature is compatible with the loosely

coupled nature of the IDN.

5) It can support a high communication bandwidth.

6) The various hardware and software components associated

with the network connection through a broadband bus have

been well tested. They are commercially available, and

are financially feasible (with the exception of the high

cost of the present frequency agile modems for high data rates).


7) Future technologies such as optical fibers can support

broadband communication at a larger capacity, greater

efficiency, and lower cost than current technologies.

The use of a broadband inter-connection bus has been

proposed previously in a data flow database machine [BOR81B].

A broadband bus is a common coaxial cable medium through

which the different component systems communicate on

different frequencies using a frequency division multiplexing

scheme. One processor can broadcast to many processors using

one frequency while the other processors can do the same on

different frequencies. A processor can easily switch among

different frequencies using frequency agile modems. The

communication methodology for each frequency channel is

similar to that in any common baseband bus.
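The one-to-many, many-to-many behavior of the frequency-division scheme can be modeled in a few lines. This is only an illustrative sketch of the communication pattern; the class and method names are ours, and a real broadband bus would of course involve modem hardware, not a dictionary.

```python
class BroadbandBus:
    """Toy model: each frequency is an independent broadcast channel."""
    def __init__(self):
        self.listeners = {}            # frequency -> set of tuned nodes

    def tune(self, node, frequency):
        # A frequency-agile modem switches the node onto a channel.
        self.listeners.setdefault(frequency, set()).add(node)

    def broadcast(self, frequency, block):
        # One producer's block is delivered to every node tuned to the
        # channel; other channels carry other broadcasts concurrently.
        return {node: block for node in self.listeners.get(frequency, ())}

bus = BroadbandBus()
bus.tune("select-proc", "f1")
bus.tune("sort-proc", "f1")           # two pipelines share one producer
delivered = bus.broadcast("f1", "data-block-1")
assert set(delivered) == {"select-proc", "sort-proc"}
```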

It should be noted that some of the capabilities of a

broadband bus such as broadcasting and one-to-many

communication can be supported by other inter-connection

mediums such as baseband common bus. It is possible to

simulate the frequency division multiplexing and channel

switching of a broadband bus through time division

multiplexing and software based switching among different

processor groups in a baseband scheme. Thus, the frequency

channels of a broadband inter-connection can be considered as

logical channels which can be realized physically in many

ways by other inter-connection mediums such as baseband

common bus. We shall now discuss some of the issues connected

with network communication and control in the IDN.

4.1.2 Control

The network control functions include 1) global

transaction management, 2) network and database security,

reliability and recovery control, 3) dynamic re-

configurability of the inter-connection network, and 4)

housekeeping functions of the network. These control

functions can be completely or partially centralized or

distributed. We have chosen to centralize the network control

functions in one of the suitable nodes in the IDN. This

central control has advantages and disadvantages similar

to those commonly associated with any type of central

control. For reliability purposes, the control functions can

be duplicated or moved to another general purpose component

system in the IDN in case of failure.

4.1.3 Translation

The data, status, and command information moved among

the component systems in the IDN have to be translated from a

source format to a destination format each time they are

moved. Two approaches are commonly used for translating one

format to another. The first translates a source format to a

common canonical format and then translates the canonical

format to a destination format. The second method translates

directly from a source format to a destination format. If

there are n formats, the first approach would require 2*n

different translations and the second approach would require

n*(n-1) translations. We propose to solve this translation

problem by assigning the translation task to dedicated

processors, or translators. Any processor-to-processor

communication would go through a translator which would

translate the source format to the destination format. The

translation task can be considered as another data processing

task in between each pair of the operations in a query tree.

This approach, while increasing the data transfer time

involved in query execution, eliminates the need for each

component system to implement the translation function.

Another advantage of this approach is that it facilitates

uniform treatment of the translation task and thus increases

the flexibility and expandability of the IDN.
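The translator-count argument above is simple arithmetic and can be checked directly (the function names here are ours):

```python
def canonical_translations(n):
    # One translation into and one out of the canonical format
    # per source/destination format.
    return 2 * n

def direct_translations(n):
    # One translation per ordered pair of distinct formats.
    return n * (n - 1)

# The canonical approach needs fewer translations whenever n > 3:
assert canonical_translations(5) == 10
assert direct_translations(5) == 20
```

Dedicated translator processors sidestep the question of where these translation routines live: each component system speaks only its own format, and the pairwise or canonical conversion logic is concentrated in the translators.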

4.2 The Query Processing Software Architecture

In the integrated data network, global queries issued

from any node in the network are first transferred to a

central controller node. As noted above, the controller node

can be any node in the network that performs control and

management functions. The controller first decomposes the

received query posed in a global query language into a

hierarchical tree in which the nodes represent primitive

operations and the edges represent data flow. The

hierarchical query tree is then modified and optimized.

During the optimization process, this query tree is merged

with the query network structure currently being executed.

The different nodes of the optimized query network are then

scheduled for execution at different component systems in the

IDN. The operation commands are translated into local

commands interpretable by the assigned processors before

being sent to the processors for execution.

After the execution, the output data from one

processor are translated from the source format to

destination formats before being sent to other processors.

When the last block of results has reached the last processor

in a query pipeline, the controller frees the assigned

processors from this query pipeline for later assignment to

another query pipeline. In some cases, the controller can

assign many operations belonging to the same or different

queries to a single processor. Figure 6 illustrates the

typical sequence of operations occurring during query

processing. In the following chapter, we shall elaborate on

the main query processing functions mentioned above.












[Figure 6. The typical sequence of operations occurring during query processing.]

CHAPTER 5
QUERY ANALYSIS AND OPTIMIZATION


In this chapter, we shall first describe the query

analysis and modification functions. Query optimization in

the IDN environment is then discussed in detail.

5.1 Query Analysis and Modification

The global query received by the controller is first

parsed and decomposed into a relational algebraic query tree.

The query tree is then modified in the following ways.

1) The leaf operations in the query tree are, in most cases,

operations on the data defined in the user's view

(external view) rather than on those defined in the

global conceptual view. These operations need to be

changed into equivalent operations on the conceptual data

objects using the pre-defined external-view-to-conceptual-

view mappings stored in the system [ALL81]. In this

modification process, the query tree operations are also

modified to accommodate any pre-defined integrity and

security constraints.

2) The qualification conditions that are associated with the

operations in the query tree are transformed into

disjunctive normal form expressions based on the standard

normalization rules.

3) N-ary operations in the query tree that correlate more

than two data files are transformed into a hierarchy of

corresponding binary operations.
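Step 3 above, transforming an n-ary operation into a hierarchy of binary operations, can be sketched as a left-deep expansion (the tuple encoding is our illustration, not the dissertation's representation):

```python
def to_binary(op, relations):
    """Rewrite an n-ary operation over several data files as nested
    binary operations, e.g.
    ('JOIN', ['A','B','C']) -> ('JOIN', ('JOIN', 'A', 'B'), 'C')."""
    tree = relations[0]
    for rel in relations[1:]:
        tree = (op, tree, rel)   # each step correlates exactly two inputs
    return tree

assert to_binary("JOIN", ["ORDER", "PART", "CUSTOMER"]) == \
    ("JOIN", ("JOIN", "ORDER", "PART"), "CUSTOMER")
```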

These modification and re-structuring steps

result in a standard or canonical query tree structure which

simplifies the process of merging with the query network in

execution. Figure 8 illustrates the query modification

process to account for a view mapping and a security rule.

The definitions of two database relations, a user-issued

query and the view and access constraints associated with

this user are given in Figure 7. The two query trees shown

in Figure 8 illustrate a modification to the query tree of

the query defined in Figure 7 to account for the view and

access constraint mappings.

5.2 Query Optimization

The modified query tree is typically not in a

computationally efficient form. It is necessary for a

database management system to transform the query tree into a

computationally efficient structure whose constituent

operations are executed in a proper sequence using proper

access methods to reduce total execution time. The process

of transforming a query tree structure and execution schedule

into an efficient one using various optimization techniques

is called Query Optimization.

Most of the current commercial database systems

support query optimization only to a limited extent.

[Figure 7. Definitions of two database relations, the view mapping (the user sees the customer orders whose corresponding internal orders have a completed status), the security constraint (this user can only see the customer orders dated before March 10, 1984), and the query definition: retrieve the customer names from the completed orders.]

[Figure 8. The query tree of the query defined in Figure 7, before and after modification for the view mapping and security constraint.]

Query optimization in relational database systems has been studied

extensively [JAR84A, ULL82, DAT83, KIM82, KO80, HEV79B,

BLA77]. These studies have dealt with the issues of 1)

selecting proper execution algorithms for the operations such

as join and sort, 2) assigning costs for the computation,

data movement, and I/O transfers of the query operations, and

3) evaluating and selecting an optimal sequence for executing

the operations of a query. The query optimization techniques

developed for centralized systems have been extended for the

distributed database systems [HEV79B, BER81, APE83, BRI84,

CHU82, EPS78, YU85].

In the IDN, query optimization is performed in three

phases. These are 1) local or intra-query optimization, 2)

inter query optimization or query matching, and 3) global

query optimization.

5.2.1 Local or Intra-Query Optimization

The two steps involved in this optimization phase are

local reduction and common sub-expression reduction. In the

local reduction step, some unary operations in the query tree

can be moved downwards [SMI75] so that only small amounts of

data need to be moved up in the query tree. After the

external-view-to-conceptual-view mapping step of the query

modification process, it is possible for the modified query

tree to contain some common sub-expressions on the base data.

In the common sub-expression reduction step, these sub-

expressions are identified [ULL82, HAL76, AHO79A, AHO79B] and

combined into a single common sub-expression with some re-structuring of the query tree.


5.2.2 Inter-Query Optimization or Query Matching

In this phase, the locally optimized query is further

optimized by merging it with the query trees currently being

executed so that the advantages of intermediate data sharing

can be realized. We have named the methodology, which

detects sharable operations among concurrent queries and

merges them for the purpose of sharing the base data and the

intermediate and/or final results, the Inter-query Optimization methodology.


Inter-query optimization can be viewed as a method of

exploring the sharability relationships among concurrent

queries as contrasted with the traditional concurrency

control mechanism which deals with the conflicting

relationships among the concurrent queries and updates.

However, Inter-query optimization methods and their

performance have not been studied as extensively as the

concurrency control methods and their performance. There have

been a few proposals [GRA81, CHA81, CHA82, FIN82, JAR84B] to

collectively optimize and compile batches of queries in order

to take advantage of their inter-relationships. We call the

batched query optimization static inter-query optimization.

However, to our knowledge, no methodology has been introduced

for optimizing a new query by exploring its data-sharing

relationship with queries currently in execution. In this

work we study this issue of dynamic inter-query optimization

and address the following questions.

1) To what extent and how frequently do concurrent queries

issued by different users contain sharable operations

among them?

2) How can a database management system detect if one

operation can share the results of another?

3) Can a database management system achieve the intermediate

result sharing without incurring undue overhead that

outweighs the advantages of such sharing?

We shall first deal with the first question. It is

obvious that the extent and frequency of a set of concurrent

queries sharing some intermediate or final results are very

much application-dependent. In general, we can say that, in

many application areas, a group of users are collectively

interested in manipulating the same subset of the database at

one time and their interests move to another common subset at

another time. For example, in online transaction processing

systems such as airline reservation systems (or record-

oriented systems according to [JAR84A]), the customers are

only interested in the information about the current

available flights and reservations. Airline personnel at

ticket counters and travel agencies will be accessing the

same flight or reservation sub-database [GIF84]. Similarly,

in automated manufacturing applications, during the

production of a batch of parts, the different design,

planning, management, and control processes would all want

to access the information about the same batch of parts (in-

progress parts) [MIT84]. Figure 4 gives the definitions of

two relations in a factory database and two queries, an

executing query and a new query. We can observe from the two

queries in Figure 4 that the parts retrieved by the queries

will have a common subset. Figure 5 illustrates how the

results from a sub-tree of the executing query tree are

shared by the new query tree.

Besides the intuitive feeling from the example in

Figure 4 for the potential sharing among the concurrent query

operations, one can also estimate the number of sharable

operations by using a probabilistic model. For example, one

study [CHA82] estimates that if two queries share one

relation among 10 base relations in a database with 10%

frequency, the cost of executing two queries together is 40%

less than the total cost when they are executed separately.


The degree of sharing among concurrent queries can be

measured by the number of nodes (operations) and arcs (data

files) of their query trees that match. It can be determined

precisely through benchmarking studies of an actual system

that employs a dynamic inter-query optimization methodology.

In recent database system performance evaluation studies

[BOR84, CHO85], it has been shown that both query throughput

and response time improve as the degree of sharing among

concurrent queries increases.

In this work, we shall deal with two issues as our

response to the second and third questions raised: 1) the

development of a dynamic inter-query optimization methodology

for detecting the data-sharing relationships among relational

algebraic queries, and 2) an analysis and evaluation of the

complexity of this methodology.

Some related work

Inter-query optimization involves the merging of

query trees based on the data-sharing relationships among

their constituent operations. In graph theory, one comes

across a similar problem of identifying equivalent sub-graphs

in one or more graphs or establishing isomorphism between two

or more graphs. Also, in compilers, a similar problem of

detecting common sub-expressions among computational

expressions exists. During the code optimization phase, some

compilers can detect the equivalence relationships among the

sub-expressions of a computational expression. They then

optimize the code generated for the expression so that the

common sub-expression is evaluated only once and its results

are used by all the other dependent sub-expressions.

However, the solution techniques developed for these two

analogous problems are not directly applicable to solving the

problem of inter-query optimization in database systems

since, in the latter case, high-level queries are involved

and partial matches among database operations are meaningful.

In a relational database system that provides an

external view facility, common sub-expressions can occur in

relational query expressions. This is because the user

constructs query expressions to process against the pre-

defined external views. These expressions are modified by the

database system so that they refer to the base relations.

This modification often results in redundant common sub-

expressions. Some single query optimization studies [HAL76,

SAG80, ULL82, AHO79B] recognize this fact and propose common

sub-expression recognition and elimination during the single

query optimization process. Hall [HAL76] presents 1) a method

for detecting multiple common sub-expressions within a single

complex query and 2) a discriminant function to decide which

common sub-expression to eliminate and which to keep. Hall

[HAL76] also presents the results of an implementation of the

proposed methodology in a relational testbed. The results

show a reduction in execution time.

Several other studies [AHO79A, AHO79B, SAG80, SAG83]

have developed methods for recognizing the equivalence

relationship among a sub-class of relational algebraic

queries and have analyzed the complexities of these methods.

The queries in this sub-class are conjunctive queries

represented by tableaux. Tableaux are tables in which the

qualification conditions and the target attributes of the

operations in a query are represented as rows and columns.

The studies presented in [AHO79A, SAG80] have developed

algorithms for testing the equivalences among the entries in

a tableau (predicates in a qualification condition) and for

minimizing a single tableau or unions of tableaux so that an

exact optimum can be found. The tableau method of common sub-

expression minimization in a single query has been used for

optimizing queries in the distributed database management

system MULTIBASE [KAT81]. It is possible to translate a large

class of relational algebra queries into tableau form and use

the tableau method of finding the equivalence relationships

among them during the primitive operation matching steps of

our methodology (to be discussed in the following section).

An extension of relational algebra called multi-

relational algebra is used to handle the operations on

partitioned and replicated distributed databases in [CER83].

In this paper, the standard rules for transforming sequences

of relational algebraic operations using properties such as

commutativity and associativity have been extended and

modified into a set of equivalency preserving transformation

rules for multi-relational algebraic operations. These rules

will be helpful in detecting common sub-expressions among

concurrent queries in distributed database systems.

A batch of queries can be compiled and executed

together either because they perform better when executed

together or because a specific application requires them to be

executed as a batch. The performance

improvement due to a batched query execution can be realized

only if the query optimizer and compiler recognize the

common access paths and common operations among them and

eliminate the redundancies. In this work, we do not study

the problem of batched query optimization or static inter-

query optimization. Although static inter-query optimization

is very much similar to dynamic inter-query optimization in

principle, the specific methodology can be quite different in the following respects:


1) static inter-query optimization is performed with the

complete knowledge of all the queries in the batch whereas

dynamic inter-query optimization tries to optimize the

current query based on a limited knowledge of the results

being produced by the queries that have already been

optimized and compiled,

2) static inter-query optimization can be done off-line, and

thus can afford to perform a detailed checking to identify

the common operations or access paths among the queries

whereas dynamic inter-query optimization methodology, due

to its run-time nature, has to be fast, and

3) many combinations of common sub-expressions can be

detected and the most beneficial chosen during static

inter-query optimization, whereas dynamic inter-query

optimization due to its time limitation can only detect a

few of the possible common sub-expressions.

Taking these differences into consideration, we shall

concentrate only on the problem of dynamic inter-query optimization.


There are a number of works related to this topic.

Roussopoulos [ROU82A, ROU82B] proposes a methodology for

recognizing common sub-expressions among all the expected

queries in a database system and uses that knowledge in

logical database modeling. A logical database design is

optimized by Roussopoulos [ROU82A] by 1) recognizing the

common access paths in the expected queries, and 2) modifying

the original design to produce an optimal set of logical

access paths and indices.

Jarke [JAR84B] presents some general principles for

recognizing common sub-expressions among a set of queries in

several relational query representations: 1) relational

algebra, 2) tuple calculus, and 3) domain calculus. Sellis

and Shapiro in [SEL85] discuss the benefits of combining

multiple concurrent updates into a single update by

identifying that these updates refer to a common data item.

Finkelstein [FIN82] presents algorithms to optimize batched

collections of queries as well as ad hoc queries in relation

to the executing queries. This paper presents a framework for

deciding which temporary relations (intermediate or final

results) produced by a query or which snapshot of the query

execution to keep as well as discussing how to beneficially

share such temporary relations during later query execution.

The methodology described by Finkelstein [FIN82] can be

implemented easily in systems such as System R, which allow

caching of temporary results. Our work follows that presented

by Finkelstein [FIN82]; however, the dynamic inter-query

optimization methodology presented in the next section is

based on a relational algebraic query tree representation, in

contrast to the query graph representation used by

Finkelstein [FIN82]. Furthermore, an analysis of the

complexity of our methodology and an evaluation of the

relative benefits of intermediate result sharing are

performed in this work. Such analysis and evaluation are

absent from Finkelstein's work. We address the dynamic

inter-query optimization problem in a general framework

suitable for any centralized or distributed database system

architecture which satisfies the basic requirements

described in the next section.

Requirements on query processing software architecture

The purpose of dynamic inter-query optimization can

be realized only if the system allows the new query to

operate on the results produced or being produced by the

executing queries. Inter-query optimization can be

implemented in either centralized or distributed database

systems or in multi-processor database machines. In order to

successfully implement the proposed optimization methodology,

the software architecture of a database system should

satisfy the following requirements or provide the following facilities:


a) The database system should save the definitions of all the

concurrent queries (i.e. a description of query operations

and search conditions) until they have been executed

completely and left the system.

b) When a new query enters the database system, the system

should be able to either suspend the executing query

processes and perform dynamic inter-query optimization

before resuming the execution or optimize the new query

concurrent with the execution of other queries.

c) The database system should have a mechanism whereby the

optimization module can identify the results produced or

being produced by the executing queries. The system should

also be able to dispose of the intermediate/final results

produced by the executing queries only if no other query

is operating on them.
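Requirement (c) amounts to reference counting on shared results: a result may be disposed of only when its last reader finishes. A minimal sketch (the class and method names are our illustration):

```python
class ResultStore:
    """Track how many executing queries read each shared result."""
    def __init__(self):
        self.refs = {}                 # result name -> reader count

    def attach(self, result):
        # A query begins operating on a (possibly shared) result.
        self.refs[result] = self.refs.get(result, 0) + 1

    def release(self, result):
        """A query finishes with the result; dispose of it only when
        no other query is still operating on it."""
        self.refs[result] -= 1
        if self.refs[result] == 0:
            del self.refs[result]
            return True                # safe to reclaim the storage
        return False

store = ResultStore()
store.attach("Temp5"); store.attach("Temp5")   # two queries share Temp5
assert store.release("Temp5") is False          # one reader remains
assert store.release("Temp5") is True           # now it may be disposed of
```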

The first facility mentioned above is generally

provided by the transaction management module of a DBMS

because it needs to keep a record of the definitions of all

the queries in each transaction for recovery purpose. Many

existing database systems can provide the second facility by

either executing the optimization process with a higher

priority over the query execution processes or running them

as concurrent or time-shared processes. The third facility is

not readily available in the existing database system

architectures; however, it can be added without much difficulty.

Query tree representation and notation

The modified and locally optimized query tree is in

certain canonical form. To develop the syntactic dynamic

inter-query optimization methodology, we need to adopt a

standard representation for the query structure and

operations. This representation is presented in the

following. It contains only the information essential for the

query matching methodology; its actual implementation may

require additional information. In the following notation, we

use the symbols *,+,(), and {} with the following meaning:

[a]* means zero or more occurrences of a.

[a]+ means one or more occurrences of a.

a(b) means that b can optionally follow a.

a < {b,c} means that a can take the values from the set

containing b and c.

A:b;c means that A is defined by the value of b followed by

that of c.

We should also note that the names for the

intermediate relations in the query tree can be uniquely

assigned by the system so that they reflect the base

relations and the query operations involved in producing

those intermediate relations.

PRIMITIVE OPERATIONS < { Read, Write, Sort, Select, Project,

Join, Union, Difference, Intersect, Product, etc. }


NODE: Operation ID.; Operation definition; Sibling Ptr.; Parent

Ptr.; Subset Ptrs.; Identity ptrs.

Operation definition (format of primitive operations):

READ: Source Relation Name; Result Relation Name

WRITE: Source Relation Name; Result Relation Name

SORT: Source Relation Name; Result Relation Name

SELECT: Source Relation Name; Result Relation Name;

Qualification condition; Conjunction attribute list

PROJECT: Source Relation Name; Result Relation Name;

Target attribute list

JOIN: Relation1 Name; Relation2 Name; Result Relation Name;

Join condition; Conjunction attribute list

UNION: Relation1 Name; Relation2 Name; Result Relation Name

DIFFERENCE: Relation1 Name; Relation2 Name; Result Relation Name

PRODUCT: Relation1 Name; Relation2 Name; Result Relation Name

INTERSECT: Relation1 Name; Relation2 Name; Result Relation Name


Qualification condition: [Disjunction;]*

Disjunction: [Conjunction; OR Conjunction;]*

Conjunction: Predicate; [ AND; (NOT;) Predicate;]*

Predicate < {Type 1 predicate, Type 2 predicate}

Type 1 predicate: (Function;) Attribute Name; (Arithmetic

operator; Constant;) Relational operator; Constant

Type 2 predicate: (Function;) Attribute Name; (Arithmetic

operator; Constant;) Relational operator; (function;)

Attribute Name (; Arithmetic operator; Constant)

Relational operator < { <, <=, >, >=, =, ≠ }

Arithmetic operator < { +, -, *, / }

Function < { Count, Sum, Average, Maximum, Minimum, etc. }

Conjunction attribute list: [ Attribute Name;]+

Target attribute list: [Attribute Name;]+

Join condition: conjunction (with type 2 predicates only)

Constant < Any user specified value conforming to the data

types recognized by the system.
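For illustration only, the NODE format above might be transcribed as follows; the encoding and field names are our own choices, not the dissertation's implementation. The example instance is the SELECT operation S1 from the query tree of Figure 9.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    """One query-tree node per the NODE format (illustrative encoding)."""
    op_id: str
    definition: dict                       # per-operation-type fields
    sibling: Optional[str] = None          # Sibling Ptr.
    parent: Optional[str] = None           # Parent Ptr.
    subset_ptrs: tuple = ()                # Subset Ptrs.
    identity_ptrs: tuple = ()              # Identity Ptrs.

# SELECT node S1: qualification in disjunctive normal form (a list of
# conjunctions, each a list of predicates) plus its conjunction
# attribute list.
s1 = Node("S1",
          {"source": "Temp1", "result": "Temp3",
           "qualification": [[("status", "=", "'completed'")]],
           "conjunction_attrs": ["date", "status"]},
          parent="P1")
assert s1.definition["result"] == "Temp3"
```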

The modified query tree of Figure 8 is shown in

Figure 9 with the ID of each node on its left side and the

name of the intermediate relation produced by that node

listed to the right. Figure 10 shows the contents of each

primitive operation node in the query tree of Figure 9

according to the format specified in the above notation. In

this example, the ID of an operation is used as the value of

the parent and sibling pointer fields in the definition of

that operation.

Dynamic inter-query optimization methodology

The dynamic inter-query optimization process is

performed by first comparing the new query tree with the

query trees of the executing queries in order to detect node

or sub-tree matches among them and then, based on these

matches, merging the new query tree with the executing query




[Figure 9. The modified query tree, with the ID of each node (R1, R2, S1, P1, J1, P2, W1) on its left and the name of the intermediate relation it produces (Temp1 through Temp7) on its right.]

[Figure 10. The contents of each primitive operation node of the query tree of Figure 9, listing for each operation its ID, source relation name(s), result relation name, target attribute list or condition, and its sibling, parent, subset, and identity pointers.]



trees. The two possible types of matches are identity match

and subset match. Identity match means that two

corresponding operations in the new and executing queries

produce identical results. Subset match means that a node in

the new query tree produces results that form a subset of

those produced by the corresponding node of an executing

query tree.

For the purposes of query matching, we divide the

operations in a query tree into two classes, identity and

subset, based on the types of matches these operations have.

Operations in the identity class result only in an identity

match whereas the operations in the subset class can result

in both identity and subset matches. Among the primitive

operations mentioned earlier in this chapter, those in the

identity class are Read, Write, Union, Difference, Sort,

Product, and Divide and those in the subset class are Select,

Project, and Join. We shall describe the query merging

methodology in two levels of details. The first one gives the

general principle involved in matching a primitive operation

in the new query tree with the same type of operations in the

executing query trees. The second level gives an algorithm

for finding sub-tree matches between the new query tree and

the executing query trees.

Level 1: primitive operation matching

Identity matching. An identity match between two

operations of the same type is established by first checking

to see if the source relation names of these two operations

are identical. Subsequently, an identity relationship between

the corresponding target attributes or qualification

conditions of these operations is established. Matching the

relation or attribute names involves a simple symbol matching

which can be carried out either in a single step by matching

the hashed values of the two name strings or by a sequence of

character matches.
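As a small illustration (our own sketch, not the dissertation's code), the single-step variant can compare hashed name strings first and confirm only on a hash hit:

```python
def names_match(name1: str, name2: str) -> bool:
    """Symbol matching for relation or attribute names: a hash
    comparison rejects most mismatches in one step; a final string
    comparison confirms a hash hit, since equal hashes alone do not
    guarantee equal strings."""
    if hash(name1) != hash(name2):
        return False
    return name1 == name2
```

In practice the hash values would be computed once per name and reused across the many comparisons performed during query matching.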

Subset matching. The subset matching operation

involves 1) establishing that the operation in the new query

tree (new operation) is more restrictive than a corresponding

operation in an executing query tree (executing operation),

and 2) refining the new operation and constructing an

operation called restricting operation by eliminating from

the new operation the predicates and/or target attributes

common to those in the executing operation and by changing

its source relation name to the result relation name of the

executing operation.

Establishing a subset match

The first step in identifying a subset match is to

establish that the source relation names in the new and

executing operations are identical. The second step involves

comparing 1) the target attribute lists in the case of

Project operations, 2) the qualification conditions and

conjunction attribute lists (see Figure 11 for an example)

in the case of Select operations, and 3) the join conditions

and the conjunction attribute lists in the case of Join

operations. We shall elaborate on this second step.

Finding a subset match between two Project operations

involves establishing that the target attribute list of the

new Project operation forms a sub-list of that of the

executing Project operation. Finding a subset match between

two Select operations involves establishing that the

qualification condition of the new Select operation (NQC) is

a sub-expression of that of the executing Select (EQC). This

can be achieved by first comparing the conjunction attribute

lists of both operations and then the corresponding

predicates in both NQC and EQC. Conjunction attribute lists

in a Select/Join operation consist of ordered lists of the

attribute names in each conjunction of the qualification/join

condition. In the conjunction-attribute-list matching step,

the conjunction attribute lists of the new and executing

operations are matched to see if a) the number of

conjunctions in the new operation is equal to or less than

that in the executing operation, and b) each conjunction

attribute list in the new operation is identical to or

contains, as a sub-list, the corresponding conjunction-

attribute-list in the executing operation. If the above test

on conjunction attribute lists succeeds, a more detailed

checking of the predicates is required. Steps 2 and 4 in

Figure 11 illustrate this conjunction-attribute-list matching

performed on the three select operations defined in Table 1.



Table 1. Definitions of the select operations S1, S2, and S3.

              S1 (new)          S2 (executing)   S3 (executing)
Sr.Rel.Name   Inventory         Inventory        Inventory
Re.Rel.Name   Temp1             Temp2            Temp3
Qual.Cond.    quantity=100 &    quantity>300     (quantity<200 & location='W1') |
              location='W1'                      (location='W1' & ...)
Conj.Att.     location,         quantity         location,quantity;
list          quantity                           location,part-description

step 1:
Match the source relation names in S1 and S2. Result:
Identical. Proceed.

step 2:
Check if the conjunction attribute list of S1 is a sub-list of
that of S2. Result: No, so S1 has no match with S2. Proceed
to the matching of S1 and S3.

step 3:
Match the source relation names in S1 and S3. Result:
Identical. Proceed.

step 4:
Check if the conjunction attribute list of S1 is a sub-list
of that of S3. Result: Yes. Proceed.

step 5:
Check if the two predicates in the qualification condition of
S1 have any match with those in S3. Result: Predicate 1 in S1
is a subset of Predicate 1 in S3 and Predicate 2 in S1 is
identical to Predicate 2 in S3. So, S1 is a subset of S3.

step 6:
Construct a restricting select operation that uses the
results from S3 and replaces S1.

Result: New Select
Src. Rel. Name: Temp3
Res. Rel. Name: Temp4
Qual. Condition: quantity=100
Cnj. Att. list: quantity
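The conjunction-attribute-list test (conditions a and b described earlier) can be sketched in Python. This is our own illustration, with attribute lists encoded as plain lists of names and the containment direction taken from the prose description:

```python
def is_sublist(small, big):
    """True if every attribute name in `small` also occurs in `big`."""
    return set(small) <= set(big)

def conjunction_lists_match(new_lists, exec_lists):
    """Condition a: the new operation has no more conjunctions than
    the executing one.  Condition b: each conjunction attribute list
    in the new operation is identical to, or contains as a sub-list,
    the corresponding list in the executing operation."""
    if len(new_lists) > len(exec_lists):
        return False
    return all(is_sublist(e, n) for n, e in zip(new_lists, exec_lists))
```

Only when this coarse test succeeds does the more expensive predicate-by-predicate checking begin.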


In the predicate checking step, each predicate in NQC

is compared against the corresponding predicate (the

predicate containing the same attribute name(s)) in EQC. If

both of the predicates being compared are of Type 1, then we

use the relational operator in the predicate of NQC as the

row index and the relational operator in the predicate of EQC

as the column index for accessing Table 2. The three flags in

the indexed table entry indicate the three types of match

possible on the predicates being matched. One of these three

flags is chosen to establish the type of predicate match

based on the result of the comparison ('<', '>', or '=')

between the constants in both predicates. If both of the

predicates being compared are of Type 2, then we index into

Table 3 based on the relational operators in the predicates

of the NQC and the EQC to establish whether this match is a

subset, identity or none. If the above check results in

either identity or subset match, we proceed to the matching

of the next two corresponding predicates in NQC and EQC. A

complete subset or identity match between NQC and EQC is

established only after the last predicate in NQC results in

an identity or subset match. When one of the predicates being

compared is of Type 1 and the other is of Type 2, it is

possible that matching these two predicates might result in a

subset match. However, establishing such a match is, in

general, either not possible or difficult because variables,

whose values keep changing constantly during run-time, are


involved in this type of matching.

[Table 2. Match types for comparing two Type 1 predicates. The relational operator in the predicate of the new operation (row) and that in the predicate of the executing operation (column), each drawn from { =, <>, <, <=, >, >= }, index the table. Each entry holds three flags (I: identity match, S: subset match, N: no match), one for each possible outcome (CN < CE, CN = CE, CN > CE) of comparing the constant CN in the new operation with the constant CE in the executing operation.]

Table 3. Match types for comparing two Type 2 predicates
(I: identity match, S: subset match, N: no match).

                    Rel. operator in the predicate
                    of the executing operation
                    =    <>   <    <=   >    >=
Rel.          =     I    N    N    S    N    S
operator      <>    N    I    N    N    N    N
in the        <     N    N    I    S    N    N
predicate     <=    N    N    N    I    N    N
of the new    >     N    N    N    N    I    S
operation     >=    N    N    N    N    N    I

To simplify the predicate

checking step and limit its overhead, we have chosen not to

consider the case where the two predicates being compared are

of two different types.
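The Table 3 lookup for two Type 2 predicates can be encoded directly. The following Python is our own rendering of that table, with the two relational operators as indices:

```python
# Column order for the executing operation's relational operator.
OPS = ["=", "<>", "<", "<=", ">", ">="]

# Rows: relational operator in the new predicate; columns: operator
# in the executing predicate.  I = identity, S = subset, N = no match.
TABLE3 = {
    "=":  "INNSNS",
    "<>": "NINNNN",
    "<":  "NNISNN",
    "<=": "NNNINN",
    ">":  "NNNNIS",
    ">=": "NNNNNI",
}

def type2_match(new_op: str, exec_op: str) -> str:
    """Match type for a pair of Type 2 predicates over the same
    attribute names."""
    return TABLE3[new_op][OPS.index(exec_op)]
```

For example, type2_match("<", "<=") yields "S", since a < b implies a <= b, so the new predicate selects a subset of the executing predicate's tuples.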

The above procedure for establishing a subset match

between two Select operations also applies to the Join

operation matching. The only difference is that Type 2

predicate checking is sufficient for matching the join

conditions, whereas the checking of both Types 1 and 2

predicates need to be performed when matching the

qualification conditions of Select operations.

Construction of a restricting operation

In the case of a subset match, the executing

operation restricts the input relation that is the same as

that specified in the new operation and produces a set of

tuples which is a superset of the tuples to be produced by

the new operation. Therefore, when a subset match has been

established, the new operation needs to be modified to form a

Restricting operation which operates on the results of the

executing operation rather than the original input

relation(s). We shall now outline how Restricting operations

are constructed in the case of subset matches on project,

select, and join operations.

A Restricting project operation is constructed as

follows: The source relation name in the Restricting

operation is made the same as the result relation name in the

executing operation. The result relation name is made the

same as the result relation name in the new operation. The

target attribute list is formed from the list of the

attribute names that are in the target attribute list of the

new operation.

A Restricting select operation is constructed as

follows: The source relation name is made the same as the

result relation name in the executing operation. The result

relation name is made the same as that in the new operation.

The qualification condition is made to consist of the

conjunctions that are formed out of the predicates in NQC

that were found to have subset matches, and also those

predicates that are not present in the conjunctions of EQC

but are present in NQC. The conjunction attribute list is

formed from the lists of attribute names for each conjunction

in the qualification condition just constructed.
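The construction of a Restricting select can be sketched as follows. This Python is our own illustrative encoding, with predicates as (attribute, operator, constant) tuples and operations as dicts; the example values replay S1 and S3 from Figure 11:

```python
def build_restricting_select(new_op, exec_op, subset_preds):
    """Build the Restricting select for a new select that has a
    subset match with an executing select: keep the predicates of the
    new qualification condition (NQC) that had subset matches, plus
    those absent from the executing condition (EQC), and read from
    the executing operation's result relation."""
    exec_preds = {p for conj in exec_op["condition"] for p in conj}
    kept = []
    for conj in new_op["condition"]:
        kept_conj = [p for p in conj
                     if p in subset_preds or p not in exec_preds]
        if kept_conj:
            kept.append(kept_conj)
    return {
        "source": exec_op["result"],    # read the executing result
        "result": new_op["result"],     # keep the new result name
        "condition": kept,
        "conj_attr_lists": [[p[0] for p in conj] for conj in kept],
    }

# S1 versus the first conjunction of S3, as in Figure 11:
s1 = {"result": "Temp4", "condition": [[("quantity", "=", 100),
                                        ("location", "=", "W1")]]}
s3 = {"result": "Temp3", "condition": [[("quantity", "<", 200),
                                        ("location", "=", "W1")]]}
restricting = build_restricting_select(s1, s3, [("quantity", "=", 100)])
```

The result reads Temp3, produces Temp4, and retains only the predicate quantity=100, matching step 6 of Figure 11.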

When a new join operation in the new query is found

to have a subset match with an executing join operation, it

has to be replaced with a Restricting select operation. This

Restricting select operation removes the extra tuples in the

executing join operation to produce the same results as the

new join operation. A Restricting select operation for a

join is constructed as follows. The source relation name is

made the same as the result relation name in the executing

join. The result relation name is made the same as that in

the new join. The qualification condition is made to consist

of the conjunctions that are formed out of the predicates

that are not present in the EQC but are present in NQC. The

conjunction attribute list is formed out of the lists of

attribute names from the newly formed conjunctions.

The definition of a select operation (S1) in the new

query and the definitions of two select operations (S2 and

S3) from the executing queries are given in Table 1. Figure

11 illustrates the various steps required for comparing S1

with S2 and S3. In each step in Figure 11, the action of that

step and its result are explained briefly.

Level 2: sub-tree matching

The primitive operation matching strategy discussed

above is used for comparing the new query tree against the

executing query trees. Any of the sub-trees in the new query

tree which is identical to or a "subset" of any sub-tree in

the executing query trees is identified. Two query sub-trees

are said to be identical if all the constituting nodes in

them are identical. A sub-tree in the new query tree is said

to be a subset of that of an executing query tree if the root

node of that sub-tree has a subset match with the root node

of the other sub-tree and if all the corresponding nodes

below these root nodes are identical.

We shall now briefly discuss the general tree

traversal technique employed in finding all the possible sub-

tree matches on the new query tree and in merging the new

query tree with the executing query trees based on these

matches. (A detailed pseudocode for query tree matching and

merging algorithms is given in the Appendix.)

Matching of the new query tree starts out with a

comparison of its first leaf node against an executing

operation of the same type until an identity or subset match

is established or until all the executing operations of the

same type result in a no-match.

If a no-match is established on a node, this node and

all its ancestor nodes are marked as already processed, thus

resulting in a no-match. The matching process then proceeds

to the comparison of the next leaf node which has not been


If an identity match is established, the node is

marked as having an identity match. Furthermore, if the

parent of this new query tree node is a unary operation or if

it is a binary operation with its other child marked as

having an identity match, then that parent node is scheduled

for matching.

If a subset match is established, the matching

algorithm would continue along the path of the tree upward

toward the root to find the most restrictive superset

executing operation. When a subset match on this new node is

finally established with one of the executing operations, it

is marked as having a subset match and all its ancestor nodes

are marked as having a no-match. The next unmarked leaf node

is then scheduled for matching. Marking the last leaf node as

processed completes the tree matching process. We should note

here that, to match an intermediate level node in the new

query tree, the node needs to be compared only against the

parent nodes (including all the identity and subset

operations) of the executing query tree node that has been

established to have a match with the child node of this node.
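The traversal just described can be sketched as follows. This Python is our own simplified rendering: it scans all executing operations rather than narrowing candidates through the parent, identity, and subset pointers, and it takes the first match rather than searching upward for the most restrictive superset operation:

```python
def match_query_tree(new_tree, executing_ops, match_node):
    """Leaf-to-root matching sketch.

    new_tree: dict node_id -> {"parent": id or None,
                               "children": [ids], "op": ...}
    executing_ops: list of (exec_id, exec_op) candidates.
    match_node(new_op, exec_op) -> "identity", "subset", or None.
    Returns node_id -> ("identity"|"subset", exec_id) or "no-match".
    """
    marks = {}
    # Matching starts at the leaf nodes of the new query tree.
    pending = [n for n, v in new_tree.items() if not v["children"]]
    while pending:
        node = pending.pop()
        found = None
        for ex_id, ex_op in executing_ops:
            kind = match_node(new_tree[node]["op"], ex_op)
            if kind:
                found = (kind, ex_id)
                break
        if found and found[0] == "identity":
            marks[node] = found
            parent = new_tree[node]["parent"]
            # Schedule the parent once all of its children have
            # identity matches (trivially so for a unary parent).
            if parent is not None and all(
                    marks.get(c, ("",))[0] == "identity"
                    for c in new_tree[parent]["children"]):
                pending.append(parent)
        else:
            # A subset match or a no-match on a node rules out any
            # match on its ancestors.
            marks[node] = found if found else "no-match"
            anc = new_tree[node]["parent"]
            while anc is not None:
                marks[anc] = "no-match"
                anc = new_tree[anc]["parent"]
    return marks
```

The detailed pseudocode in the Appendix refines this outline with the pointer-based candidate narrowing.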

The above tree matching process establishes, for each

new operation, a specification of the type of match and the

ID of the matched executing operation. This information is

used for merging the new query tree with the executing query

trees. In the tree merging algorithm, the various sub-trees

in the new query tree, each of which has an identity match

for all its constituent nodes, are first identified. The new

query tree is then merged with the executing query trees so

that the parent nodes of these matched sub-trees are made

the parent nodes of the matched executing query sub-trees.

All the nodes in these sub-trees with an identity match are

then removed from the new query tree. Similarly, the various

sub-trees in the new query tree with a subset match for their

root nodes and an identity match for all the other nodes are

identified. The root nodes of these sub-trees are then

replaced by their corresponding restricting operations and

the other nodes in these sub-trees are removed from the new

query tree.

The tree merging algorithm results in a modified new

query tree with some of its nodes connected to the nodes of

the executing query trees. Since the tree merging phase is

separated from the tree matching phase, it is possible to

incorporate some optimization criteria into the tree merging

phase so that a new query tree node is connected to an

executing query only if the sharing of intermediate results

is more beneficial than executing that new query node

separately.

An example

In this section, with the help of an example, we

shall illustrate the workings of the above query tree

matching and merging algorithms. Figure 12 illustrates a new

query tree where each node is labeled by its operation ID.

Let us assume that there are only two executing queries. They

are illustrated in Figure 13 using a merged query network

where the results of the operation Q1S1 are given as input

to the operations Q1J2 of Q1 and Q2S1 of Q2.

The matching of the new query tree in Figure 12 with

the executing queries Q1 and Q2 starts with the comparison of

NQR1. NQR1 is compared against Q1R1 and is found to be

identical. The sibling operation of NQR1, namely NQR2, is

then compared against Q1R1 and Q1R2 and is found to be

identical to Q1R2. The parent operation of NQR2, namely NQJ1,

is then scheduled for comparison. NQJ1 is found to be

identical to Q1J1. The sibling of NQJ1, NQR3, is then

compared against Q1R1, Q1R2, and Q1R3 and is found to be

identical to Q1R3. NQJ2 is then compared against Q1J1, Q1J2,

and Q2J1 and is found to have no match with any of these

three operations.

[Figure 13. The merged query network of the two executing queries Q1 and Q2, in which the results of operation Q1S1 feed both Q1J2 of Q1 and Q2S1 of Q2.]

[Figure 14. The new query tree annotated with the match established on each node: identity matches on NQR1 (with Q1R1), NQR2 (Q1R2), NQR3 (Q1R3), and NQJ1 (Q1J1); no match on NQJ2 and NQW1.]

Subsequently, NQW1 is determined to have

no-match as well and the query tree matching process is

terminated. Figure 14 illustrates the new query tree with the

type of match found on each node and with the ID of the

matching executing operation.

The new query tree is then merged with Q1 and Q2

based on the four identical matches found in the above

matching process. NQR1 is first checked to see what type of

match has been established on that operation. Since NQR1 has

an identity match, its parent operation NQJ1 is then checked.

Since NQJ1 also has an identity match, and since its parent

operation NQJ2 has no match, the source relation name in NQJ2

is changed to the result relation name of Q1J1, which is

identical to NQJ1, and the identity pointer in Q1J1 is

changed to point to NQJ2. The nodes NQR1, NQR2, and NQJ1 are

then removed from the new query tree. NQR3 is now checked and

since it has an identity match, the source relation name in

NQJ2 is changed to the result relation name of Q1R3 and the

identity pointer in Q1R3 is made to point to NQJ2. This

completes the merging process and it leaves only NQJ2 and

NQW1 in the new query tree. Figure 15 illustrates the merged

query network after the new query tree has been completely

merged with the executing query network.

Complexity of the methodology

Any optimization technique should be able to provide

a saving in the optimized parameters and the saving should

far exceed the overhead of the optimization process. In this

section, our objective is to examine the issues related to

the complexity of the inter-query optimization methodology.

We shall only provide an overall analysis of the complexity of

the methodology.

The following are the essential parameters used in

deriving the formulas for the computation time required for

establishing an identity, a subset, or a no-match for

different relational algebraic operations.

En: The number of executing operations (of the same type as

the new operation) that need to be compared to find a

match with a new operation. When a leaf operation in the

new query needs to be matched, En will be equal to the

total number of the executing operations of the same type.

When a new operation at a higher level in the tree needs

to be matched, En will be equal to the number of the

executing operations of the same type that are pointed to

by the parent, identity, and subset pointers of the

executing operation which has been found to be

identical to the child of this new operation.

An: The number of attributes in the target list of a new

projection operation.

Cn: The number of conjunctions in the qualification/join

condition of a new select/join operation.

Pn: The number of predicates in a conjunction.

Ut: The computation time required for a unit operation such as

the comparison of two attributes.

Select

Identity match. The time required to establish an

identity match by comparing a new select operation against an

executing select operation consists of three components. They

are the times for 1) the source relation name matching (Ut),

2) the conjunction attribute list matching (Cn*2*Pn (the

total number of attributes in the list)*Ut), and 3) the

predicate matching (Pn*3*Ut(matching of both attributes and

the relational operator in the predicate)). To establish a

complete identity match with one executing operation, several

other executing operations may have to be compared before

finding the one that matches. We shall assume that on the

average En/2 executing operations are compared before finding

an identity match. The time required to establish a no-match

with one executing operation is assumed to be equal to half

of the time required for establishing an identity match. The

total time required for establishing an identity match is IMt

as shown below.

IMt = (En/2)*(Ut/2)*(1+2*Cn*Pn+3*Pn) + Ut*(1+2*Cn*Pn+3*Pn)

Subset match. To find a subset match, the time

required is the sum of the times required for establishing an

identity match and for constructing the restricting

operation. We shall assume that on the average Cn/2

conjunctions and Pn/2 predicates in each conjunction of the

new select operation participate in the construction of the

restricting operation. Based on this assumption, the time for

constructing source relation name, conjunction attribute

list, and qualification condition for the restricting

operation is equal to Ut*(1+(Cn*Pn/4)+(Pn/2)). The total time

required to establish a subset match is SMt where

SMt = IMt + Ut*(1+(Cn*Pn/4)+(Pn/2))
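As a numeric illustration (our own sketch, with Ut normalized to one time unit), the select-matching cost formulas can be written directly:

```python
def select_identity_cost(En, Cn, Pn, Ut=1.0):
    """IMt for a Select: on average En/2 executing operations are
    rejected at half the comparison cost each, then one full
    comparison (source name, conjunction attribute lists, and
    predicates) establishes the identity match."""
    unit = 1 + 2 * Cn * Pn + 3 * Pn
    return (En / 2) * (Ut / 2) * unit + Ut * unit

def select_subset_cost(En, Cn, Pn, Ut=1.0):
    """SMt: the identity-match cost plus the cost of constructing the
    restricting operation, where on average Cn/2 conjunctions and
    Pn/2 predicates per conjunction participate."""
    return (select_identity_cost(En, Cn, Pn, Ut)
            + Ut * (1 + Cn * Pn / 4 + Pn / 2))
```

For example, with En=10 executing selects and Cn=2 conjunctions of Pn=2 predicates each, IMt comes to 52.5 and SMt to 55.5 time units.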

No-match. A new select operation is established to

have a no-match with any of the executing operations only

after establishing a no-match with each of the En

operations. As mentioned earlier, the time required for

establishing a no-match on a single executing operation is

assumed to be equal to half of the time required for

establishing an identity match on that operation. The total

time required for establishing a no-match is NMt where

NMt = (Ut/2)*(1+2*Cn*Pn+3*Pn)*En

Join

The complexity of matching a join operation is

identical to that of the selection operation. Therefore, the

above formulas for IMt, SMt, and NMt can be applied.

Project

Identity match. The two steps involved in

establishing an identity match between a new project

operation and an executing project operation are the source

relation name matching and the target attribute list

matching. The time required for establishing an identity

match on a project operation is Ut+An*Ut. We shall assume

again that on the average En/2 operations are checked before

finding a complete identity match and on the average half of

the time required for establishing an identity match is

involved in establishing a no-match.

IMt = (Ut/2)*(En/2)*(1+An) + Ut*(1+An)

Subset match. The restricting project operation contains the

same number of target attributes as the new project

operation.

SMt = IMt + Ut*(1+An)


NMt = Ut*(1+An)*En/2

Union, Difference, Product, Divide

These are identity-class operations and so only

the complexity of identity and no match operations needs to

be calculated for them.

Identity match. The time required for finding an

identity match on any of the four operations is simply the

time for matching the source relation names.

IMt = 2*Ut*(1+En/4)


NMt = 2*Ut*(En/2)

The above formulas are used to calculate the overhead

involved in matching and merging query trees using the

proposed inter-query optimization methodology. In Chapter 7,

we shall present the advantage of intermediate result sharing

technique, taking into consideration the overhead involved.

5.2.3 Global Query Optimization

This optimization phase requires strategies to 1)

select proper sites to execute the operations of the new

query tree, 2) select the proper execution methods and access

structures for processing the query operations, and 3) choose

a proper query execution schedule that achieves a maximal

concurrency in its execution. In this section, we shall

define the global query optimization procedure. This

procedure consists of many steps that generate and evaluate

alternative versions of query pipeline schedules. Cost

formulas for calculating the query pipeline execution time

and response time are also developed and used in the

optimization procedure. The global query optimization

procedure includes steps to assign proper sites for the

execution of not only the unmatched operations of the new

query but also those matched operations which have been

established by the optimization process to be inefficient to

merge with the executing operations. Each of the matching

query operations is evaluated to determine if it would be

more beneficial to allow it to share the intermediate data

from the executing query operation or to execute it

separately. The following features of IDN have been

considered in the development of the global query

optimization strategies:

1. The differences in

a) the functionalities and capabilities of the different

component systems in the IDN,

b) the computation time and I/O time at different nodes in

the IDN,

c) the data transfer time between different component systems

in the network, and

d) the data translation time and the command translation time

necessary to produce data and commands for the component
systems.


2. The distribution of data in the IDN, namely

a) the horizontal and/or vertical fragmentations of the

conceptual data objects, and

b) the replication of the fragmented or non-fragmented data.

3. The nature of query execution (pipelined and data flow

based query processing) in the IDN.

4. Overlaps of the execution time components of the

operations in a query pipeline.

5. Alternative algorithms that are suitable for the pipelined

execution of the query operations.

Previous heterogeneous and homogeneous distributed

database query optimization studies have considered the first

two of the above features in the context of various

distributed systems such as star networks and broadcasting

networks [CHE84, ULL82, DAT83, CER84, CHI84, DAY82, DAY83,

BRI84, JAR84A, HEV81, YU82, RIC81, ONU83, APE83, EPS78,

SEL80, KER82, SUG83, YU84, YU85]. In this work, our major

objective is to extend the optimization techniques employed

in some of the above works by considering also the effect of

the last three of the above mentioned features of the IDN.

The use of pipelining and the resulting benefits of overlaps

in database query execution have been suggested before in

several studies [SMI75, BOR81B, BIC81, KOG82, KIM84, FIS84].

However, to our knowledge, the problem of query optimization

in the context of pipelined distributed query execution has

not been addressed before.

Global query optimization procedure

Step 1. In this step, the query tree resulting from

the local optimization and query matching steps is used to

generate multiple merged query tree versions. A merged query

tree version is generated based on the choice of either

merging or not merging the operations in the new query that

were identified to have a match.

Step 2. The leaf nodes in each of the merged query

tree versions consist of operations on un-fragmented as well

as fragmented base relations. In this step, the associative

and commutative rules pertaining to the relational algebraic

operations [CHU82] (especially binary operations, since unary

operations have already been rearranged in the local

optimization step) are applied so that different query tree

versions are produced by different orders or arrangements of

query operations.

Step 3. The fragmented and un-fragmented relations

that are inputs to the leaf operations in each of the query

tree versions resulting from step 2 can have multiple copies

distributed across the IDN. Each such leaf operation can thus

be executed at many alternative sites. In this step, multiple

"processing trees" are generated for each of the query tree

versions based on different combinations of assignment of

sites to execute the leaf operations.
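The combinatorial site assignment of Step 3 can be sketched as follows (our own illustration; the copies_at catalog and the dict encoding of leaf operations are assumptions):

```python
from itertools import product

def processing_trees(leaf_ops, copies_at):
    """Generate one processing tree per combination of site
    assignments, where each leaf operation may run at any site that
    stores a copy of its input relation or fragment.
    copies_at maps a relation name to the sites holding a copy."""
    site_choices = [copies_at[op["relation"]] for op in leaf_ops]
    for assignment in product(*site_choices):
        yield [dict(op, site=site)
               for op, site in zip(leaf_ops, assignment)]
```

For example, two leaf operations reading R1 (replicated at two sites) and R2 (stored at one site) yield two processing trees; the heuristics discussed later are what keep this enumeration from exploding.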

Step 4. In this step, each operation in each of the

processing tree versions resulting from Step 3 is assigned an

algorithm for executing that operation. Since some operations

in the query can be executed using many alternative

algorithms, many combinations of algorithm assignments to the

different query operations are possible. In this step, many

"preliminary schedules" (i.e. processing trees with

algorithms assigned) are generated for the processing tree

versions based on different combinations of algorithm

assignments.


Step 5. The numerous preliminary schedules resulting

from Step 4 are analyzed to formulate limbs and proper sites

are assigned for their execution. A limb is a group of query

tree operations that can be assigned to a single component

system for sequential execution. This is done to eliminate

the data communication time which would have been required if

the operations in the group were executed by separate

component systems. By assigning several operations to a

single component system, we are trading processing time for

data communication time.

Limbs are formed by analyzing a processing tree

bottom-up, starting at the intermediate nodes that are at the

next level higher than the leaf nodes of the tree. When an

intermediate node is encountered in the analysis procedure,

it is checked to see if the same site, which has been

assigned to execute its input operation(s), is capable of

performing the operation of this node and thus can be

assigned to execute the operation. If this condition is

satisfied, the limb analysis proceeds to the checking of the

next higher intermediate level node. If it is not satisfied,

the descendant nodes of this node form a limb and, for each

of the possible site assignments for the execution of this

limb, a different "intermediate schedule" (i.e. a preliminary

schedule with site assignment completed) is generated. The

limb analysis procedure terminates when either the root node

is encountered or when all the intermediate level nodes are

checked. Thus, many "intermediate schedules" corresponding to

each of the preliminary schedules produced by Step 4 are

generated in this step.
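A simplified rendering of the limb analysis might look as follows. This Python is our own sketch: it records only the intermediate nodes folded onto their inputs' site, and capable stands in for the component-system capability check:

```python
def form_limbs(order, tree, capable):
    """Walk the intermediate nodes bottom-up (lowest level first in
    `order`).  If all of a node's input operations run at one site
    and that site can perform the node's operation, fold the node
    into the current limb at that site; otherwise close the current
    limb and start a new one, leaving the node's site open for the
    alternative intermediate schedules.
    capable(site, op_type) -> bool."""
    limbs, current = [], []
    for node in order:
        sites = {tree[c]["site"] for c in tree[node]["children"]}
        if len(sites) == 1 and capable(next(iter(sites)),
                                       tree[node]["type"]):
            tree[node]["site"] = next(iter(sites))
            current.append(node)
        else:
            if current:
                limbs.append(current)
            current = [node]
            tree[node]["site"] = None   # assigned per intermediate schedule
    if current:
        limbs.append(current)
    return limbs
```

Folding a join onto the site that already holds both of its inputs, for instance, trades extra processing at that site for the elimination of two data transfers.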

Step 6. In this step, the optimization criteria such

as response time and execution time for pipelined execution

of each intermediate schedule are calculated.

Step 7. In this step, the calculated values for the

different optimization criteria associated with the different

intermediate schedules are compared. Based on this

comparison, the optimal "final query execution schedule" is
selected.


We should note here that in order to reduce the large

numbers of versions resulting from each of the first five

steps in the global optimization procedure, traditional

heuristics such as moving the unary operations down and

combining a sequence of adjacent select and project

operations into a single operation [CHU82] as well as some

new heuristics derived for the pipelined query processing

have to be incorporated into the global query optimization

procedure. In chapter 7, we shall discuss the heuristics

derived based on the results of an analytical evaluation

study on the performance of the alternative execution

schedules for various queries.

Description of query pipelines

The operations in a merged query tree that have been

assigned to different component systems in the IDN form a

network of query pipelines. Figure 16 defines a query tree

and Figure 17 illustrates the structure of the network of

pipelines established for the execution of this query tree.

The symbolic name of each query operation in Figure 16 is

shown on its left side. The query pipelines shown in Figure

17 have nine nodes and three data sources (the base

relations R1, R2, and R3) which feed data into the pipelines.

Figure 18 illustrates the relationship between the

flow of data along the query pipelines of Figure 17 and the

passage of time. We can observe from Figure 18 the following

behavior of data-flow. After the data sources start to feed

data block by block into the pipelines, there is some delay

before each node produces its first output data block. Each

node performs a filtering or transformation function on its

input(s) and, after some delay, starts to produce its output.

The first output is generally produced after receiving and

processing some number of input blocks. This behavior

continues until the first output block is produced by the

root node in the pipelines. At this time, the pipelines are

filled with data and the data continue to flow. The nodes

produce data blocks at different intervals of time. The

interval of time between two consecutive output blocks at

each node is mainly a function of the distribution of the

qualified data across the input blocks and the operation

represented by this node. The total number of data blocks
produced by a node depends on 1) the total number of input
blocks to that node, 2) the type of operation it represents,
and 3) the amount of data filtered out or added by the
operation.

[Figure 16: a sample query tree. Figure 17: the network of
query pipelines established for its execution. Figure 18:
timing diagram relating the flow of data along the pipelines
of Figure 17 to the passage of time, distinguishing data
transfer, processing, and idle intervals.]
From Figure 18, we can observe that a certain time

interval, which is shown by blank lines, elapses before a

processor starts processing the first input block which is

shown by the crossed lines. Since the processors assigned for

query execution are heterogeneous, each processor may take a

different time interval to process a single block as

illustrated in the figure by crossed lines with different

lengths. If a processor finishes processing an input block

before the next one arrives, it will be idle for some time.

The idle times are shown by solid lines. The transfer of a

data block from one processor to another and its overlap with

the processing of the next block are shown by dotted lines in

the figure.

The following is a description of the parameters
characterizing each node in a network of query pipelines. The
definitions of these parameters are illustrated in Figure 18.
These parameters are used in the derivation of the cost
formulas. We assume that estimates of the values of the
data-dependent parameters, such as input relation sizes and
query operation selectivities, and of the system-dependent
parameters, such as processor speeds and loads and network
communication speeds, are available in the system dictionary.

Parameters used in the cost formulas

Trk: Response time at node k -- The time elapsed from the
initiation of the pipeline until the first output
block from node k arrives at the next node, that is,
node k+1.

Tek: Execution time at node k -- The time elapsed from
the initiation of the pipeline until the last output
block from node k arrives at node k+1.

Tok: Average output block interval -- The average time
interval between two consecutive blocks from node k
to reach node k+1. In general, the interval between
any two consecutive blocks is different from that
between two other consecutive blocks. It depends on
the nature of the operation as well as the data
characteristics. Since it is difficult to generalize
the data characteristics, we shall consider only the
average value of these time intervals.

Tpk: Block processing time -- The time required for
performing a select operation, scalar aggregation, or
statistical aggregation on a block of data.

Tsk: Block sorting time at node k.

Tmk: Block merging time at node k.

Tnk: The time required for comparing all the tuples in one
block against all the tuples in another block, in the
case of nested block join at node k.

Tjk: The time required at node k for joining one tuple in
one block with all the tuples in another block.

Thk: Block hashing time at node k.

Ttk: A single memory block movement time at node k.

Tck: Block transfer time -- The time required for
transferring one data block between nodes k and k+1.

Trwk: Block read/write time -- The time required to read or
write one block from or to the secondary storage at
node k.

|Sk|: Output size -- The total number of output blocks
produced at node k.

|Rfk|: Reduction factor -- The ratio of the join output size
at node k to the product of the sizes of the two join
inputs.

|Bk|: Buffer size -- The size of the main memory buffer
(in number of blocks) available at node k for
buffering the input.

|C|: The number of categories in the input of a
statistical aggregate operation.

|N|: The size of a block, in number of tuples.

|Nd|: The number of distinct join attribute values in each
join input.

|Ndo|: The number of distinct join attribute values in the
total join output.

|Ndb|: The number of distinct join attribute values in each
of the join input blocks which also appear in the
other join input.
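For concreteness, the parameters above can be gathered into a single per-node record. The following Python sketch is purely illustrative; the class and field names are our own, and mirror only the symbols defined above.

```python
from dataclasses import dataclass

# Illustrative container for the per-node parameters defined above.
# Field names mirror the symbols (Trk -> Tr, |Sk| -> S, and so on);
# all values are in arbitrary but consistent time/block units.
@dataclass
class NodeParams:
    Tr: float = 0.0    # Trk: response time
    Te: float = 0.0    # Tek: execution time
    To: float = 0.0    # Tok: average output block interval
    Tp: float = 0.0    # Tpk: block processing time
    Ts: float = 0.0    # Tsk: block sorting time
    Tm: float = 0.0    # Tmk: block merging time
    Tc: float = 0.0    # Tck: block transfer time
    Trw: float = 0.0   # Trwk: block read/write time
    S: int = 0         # |Sk|: output size in blocks
    Rf: float = 1.0    # |Rfk|: reduction factor
    B: int = 0         # |Bk|: input buffer size in blocks

node = NodeParams(Tp=0.02, To=0.05, S=100)
```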

The three parameters Trk, Tek, and Tok are the

optimization parameters calculated for each node in the query

pipelines. The values calculated for the node at the root of

a query tree respectively represent the response time,

execution time, and average time interval between consecutive

output blocks for the entire query.

Cost formulas

In this section, we shall develop the cost formulas

which provide the basis for evaluating the alternative query

execution methods and schedules. Relational algebraic

operations, when processed using the pipelining approach, can

be considered to consist of three general types. Type 1

operations are unary operations which can produce output as

input data blocks are being received in a pipeline. Select

operation is an example of this type. Type 2 operations are

those which can produce output only after having examined all

their input data blocks. Sorting is an example of this type

of operation. Type 3 operations are binary operations which

can produce output as input data blocks are being received.

Relational join and set operations belong to this type. We

shall derive the cost formulas for these three types of


Our objective in the derivation of the cost formulas

is to express the optimization parameters associated with

node k (Trk, Tek, Tok) in terms of those associated with its

input nodes and the various components of the execution time

of this node. In the derivation of these cost formulas, we

assume that once a node in the query pipeline starts to

receive the input blocks, the operation is initiated and

the processing at that node completely overlaps with the

processing at its input node(s). Also, the computation, I/O,

and transfer components of the operation time are assumed to

overlap with one another. The cost formulas are derived under

various cases which include the following considerations: 1)

block processing time at node k is longer than the time

interval of the arrival of the blocks from node j, 2) block

processing time is shorter, 3) the buffer size at node k

(|Bk|) is greater than the size of the total input from node
j, and 4) the buffer size is smaller. Several other sub-

cases, which take into consideration the relationship between

the response times and average output block intervals of the

two join inputs are also examined.

Cost formulas for type 1 operations

The cost formulas associated with a node executing

this type of operation are derived separately for two cases.

In Case 1, the block processing time of node k is assumed to

be shorter than the average time interval between output

blocks from node j. In Case 2, the block processing time of

node k is assumed to be longer than the average output block

interval of node j.

Case 1: Tpk <= Toj

Execution time:

Tek = Trj + Max{|Sj|*Toj, |Sj|*|Rfk|*Tck}

/* Execution time = Response time of node j + Max{Output

size of node j * Average output block interval of node j,

Output size of node j * Reduction factor at node k * Block

transfer time} */

Response time:

Trk = Trj + Max{|Sj|*Toj, |Sj|*|Rfk|*Tck}/(|Sj|*|Rfk|)

/* Response time = Response time of node j + The time

required for producing one output block */

Average output block interval:

Tok = (Tek - Trk)/(|Sj|*|Rfk| - 1)

/* Average output block interval at node k = (Execution time

- Response time)/((Output size at node k) - 1) */

The above formula for calculating Tok is the same for

all cases and all operations and, therefore, will not be

repeated for the remaining cases.
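As a check on the Case 1 formulas above, they can be transcribed directly into code. This is a minimal sketch, assuming pipelined execution with full overlap as stated; the function name and the sample values are illustrative only.

```python
# Type 1 operation, Case 1 (Tpk <= Toj), transcribed from the
# formulas above. Sj is |Sj| and Rfk is |Rfk|, so Sj*Rfk is the
# output size |Sk| at node k.
def type1_case1(Trj, Toj, Sj, Rfk, Tck):
    Sk = Sj * Rfk
    Tek = Trj + max(Sj * Toj, Sk * Tck)        # execution time
    Trk = Trj + max(Sj * Toj, Sk * Tck) / Sk   # response time
    Tok = (Tek - Trk) / (Sk - 1)               # avg. output block interval
    return Trk, Tek, Tok

# e.g. Trj=1.0, Toj=0.1, Sj=100 blocks, Rfk=0.5, Tck=0.05
Trk, Tek, Tok = type1_case1(1.0, 0.1, 100, 0.5, 0.05)
```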


Case 2: Tpk > Toj

In this case, the processing of an input block at

node k cannot keep up with the arrival of new input blocks

from node j. If the query pipeline is operating under

synchronous control, node j has to wait for the completion of

the processing of the previous input block sent to node k

before sending a new one. If the nodes in the pipeline are

operating asynchronously, before node k finishes processing

an input block it is possible that node j can send many input

blocks. When the buffer which is used to accumulate the input

blocks at node k overflows, these overflow blocks must be

written to the secondary storage. The number of overflow

blocks at node k can be calculated based on 1) the total

amount of input sent from node j, and 2) the ratio of Tpk

and Toj. The number is equal to (|Sj| - |Sj|/(Tpk/Toj)).
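The overflow estimate can be restated in code: while node k spends Tpk on one block, node j delivers Tpk/Toj blocks, so node k keeps pace with only |Sj|/(Tpk/Toj) of the input. The helper below (an illustrative sketch, not from the dissertation) returns the portion of that backlog exceeding the buffer capacity |Bk|, i.e. the blocks written to secondary storage.

```python
# Backlog of input blocks at node k when Tpk > Toj: node j sends
# Tpk/Toj blocks in the time node k processes one, so node k keeps
# up with only Sj/(Tpk/Toj) of the Sj input blocks.
def overflow_blocks(Sj, Tpk, Toj, Bk):
    backlog = Sj - Sj / (Tpk / Toj)
    return max(0.0, backlog - Bk)   # blocks spilled to secondary storage

# 100 input blocks, Tpk twice Toj, 20-block buffer -> 30 blocks spill
spilled = overflow_blocks(100, 0.1, 0.05, 20)
```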

Execution time:

When (|Sj| - |Sj|/(Tpk/Toj)) <= |Bk|:

Tek = Trj + Max{|Sj|*Tpk, |Sj|*|Rfk|*Tck}

/* Execution time = Response time of node j + Max{Total block

processing time, Total block transfer time} */

When (|Sj| - |Sj|/(Tpk/Toj)) > |Bk|:

Tek = Trj + Max{[(|Sj|/(Tpk/Toj))*Tpk + |Bk|*Tpk + (|Sj|-

|Sj|/(Tpk/Toj)-|Bk|)*Max(Tpk,2*Trwk)], |Sj|*|Rfk|*Tck}

/* Execution time = Response time of node j + Max{The time

required for the arrival of all the input + The time required

for processing the blocks in the buffer + The time required

for processing the overflow blocks accumulated on the

secondary storage, Output size of node j * Reduction factor

at node k * Block transfer time} */

Response time:

Trk = Trj + (Tek - Trj)/(|Sj|*|Rfk|)

/* Response time = Response time of node j + The time for

producing one output block */

Cost formulas for type 2 operations

We shall derive cost formulas for three distinct

operations of this type. They are A) sorting, B) scalar

aggregation, and C) statistical aggregation. The sort operation

performs sort-merging after the last input block is received

and then produces output blocks. Scalar aggregate operations

such as maximum, minimum, and average produce a single output

value after the last input block is processed, whereas the

statistical aggregation operations partition the input into

categories and produce a number of output tuples equal to the

number of categories.

Sorting

We shall analyze the sort operation using the two-way

sort-merge algorithm. The data blocks are sorted individually

as they arrive. The merging phase begins when the last block

has arrived and been sorted. Log2(|Sj|) passes

of merging are required to produce the final sorted output.

Case 1: |Sj| <= |Bk|

Response time:

Trk = Trj + |Sj|*Max(Tsk,Toj) + |Sj|*(Log2(|Sj|)-1)*Tmk +

Tmk + Tck

/* Response time = Response time of node j + Output size at

node j * Max{Block sorting time, Average output block

interval at node j} + Merging time required for (Log2(|Sj|)-

1) passes + Block merging time + Block transfer time */

Execution time:

Tek = Trk + Max{|Sj|*Tmk, |Sj|*|Rfk|*Tck}

/* Execution time = Response time of node k + Max{The total

time required for the last pass of merging, The total block

transfer time} */
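The two Case 1 formulas can be sketched as follows; `math.log2` stands in for Log2, and the function name and sample values are illustrative only.

```python
import math

# Two-way sort-merge at node k, Case 1 (|Sj| <= |Bk|): blocks are
# sorted as they arrive, then Log2(|Sj|) merge passes follow; the
# last pass overlaps with transferring output to node k+1.
def sort_case1(Trj, Toj, Sj, Tsk, Tmk, Tck, Rfk):
    Trk = (Trj
           + Sj * max(Tsk, Toj)                # sort blocks as they arrive
           + Sj * (math.log2(Sj) - 1) * Tmk    # all but the last merge pass
           + Tmk + Tck)                        # first block of the last pass
    Tek = Trk + max(Sj * Tmk, Sj * Rfk * Tck)  # last pass vs. output transfer
    return Trk, Tek

Trk, Tek = sort_case1(1.0, 0.1, 8, 0.05, 0.02, 0.01, 1.0)
```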

Case 2: |Sj| > |Bk|

Response time:

Trk = Trj + |Bk|*Max(Toj,Tsk) + (|Sj|-|Bk|)
*Max(Tsk,Trwk,Toj) + (Log2(|Sj|)-1)*{(|Bk|*Tmk)+(|Sj|-

|Bk|)*Max(Tmk,2*Trwk)} + Max(Tmk,Trwk) + Tck

/* Response time = Response time of node j + Buffer size at

node k Max(Average output block interval at node j, Block

sorting time) + The size of the input accumulated on the

secondary storage Max(Block sorting time, Block write time,

Average output block interval at node j) + Merging time

required for (Log2(|Sj|)-1) passes + Block merging time + Block

transfer time */

Execution time:

Tek = Trk + Max{(|Bk|*Tmk + (|Sj|-
|Bk|)*Max(Tmk,2*Trwk)), |Sj|*|Rfk|*Tck}

/* Execution time = Response time at node k + Max{The time

required for the last pass of merging, The total block

transfer time} */

Scalar aggregates

We shall derive separately the cost formulas for the

scalar aggregate operations, such as maximum and average, and

for the Type 1 operations, such as select, which have

embedded scalar aggregates in their qualification conditions.

Simple scalar aggregates

Since only a single scalar value is output by simple

scalar aggregate operations, their response time and

execution time are equal and the average output block

interval is zero.

Case 1: Toj >= Tpk

Response time:

Trk = Trj + Toj*|Sj| + Tck

/* Response time = Response time at node j + Average output

block interval at node j * Output size at node j + Block

transfer time */

Case 2: Tpk > Toj

Response time:

When |Bk| >= (|Sj| - |Sj|/(Tpk/Toj)):

Trk = Trj + |Sj|*Tpk + Tck

/* Response time = Response time of node j + The time

required for processing all the input blocks + Block transfer

time */

When |Bk| < (|Sj| - |Sj|/(Tpk/Toj)):

Trk = Trj + |Bk|*Tpk + (|Sj|/(Tpk/Toj))*Tpk + (|Sj|-

|Sj|/(Tpk/Toj)-|Bk|)*Max(Tpk,Trwk) + Tck

/* Response time = Response time of node j + The time

required for processing all the blocks received directly and

the missed blocks accumulated in the buffer + The time for

processing the missed blocks accumulated on the secondary

storage + Block transfer time */

Type 1 operation embedding scalar aggregates

When a select type operation at node k contains a

scalar aggregate in its qualification condition, it is

executed in two steps. In the first step, as the input data

blocks arrive, the scalar aggregate operation specified in

the qualification condition is performed. A final output

value for the scalar aggregate is obtained after the last

input block is processed. In the second step, the specified

block operation is performed on all the input data blocks.

The input data blocks required in the second step are

obtained by accumulating the input on the secondary storage

at node k during the first step and reading them during the

second step. We can consider the node k representing a block

operation with embedded scalar aggregates as consisting of

two sub-nodes k' and k, which execute steps 1 and 2

respectively. The formulas for stage k' are similar to the

ones derived above for the simple scalar aggregate stage.

Case 1: |Sj| <= |Bk|

Execution time:

Tek = Tek' + Max{|Sj|*Tpk, |Sj|*|Rfk|*Tck}

/* Execution time = Execution time of node k' + Max{Output

size of node j * Block processing time, Output size of node

k * Block transfer time} */


Case 2: |Sj| > |Bk|

Execution time:

Tek = Tek' + Max{|Sk|*Tck, (|Bk|*Tpk + (|Sj|-
|Bk|)*Max(Tpk,2*Trwk))}

/* Execution time = Execution time of node k' + Max{Total
block transfer time, The time required for processing the
blocks in the buffer and the blocks accumulated on the
secondary storage} */

Response time:

Trk = Tek' + (Tek - Tek')/|Sk|

/* Response time = Execution time of node k' + The time
required for producing the first output block */

Statistical aggregation operations

A statistical aggregation operation divides the
tuples of a relation into categories based on their common
attribute values and computes the output values corresponding
to each category. As each input block arrives, each tuple in
the block is placed in a buffer reserved for its category
and a new output value is computed based on the accumulated
data values of the tuples in that category.

Case 1: |Sj| <= |Bk|

Response time:

Trk = Trj + |Sj|*Max{Toj,Tpk} + Tck

/* Response time = Response time at node j + Output size at
node j * Max{Average output block interval at node j, Block
processing time} + Block transfer time */




Execution time:

Tek = (Trk - Tck) + Tck*Round(|C|/|N|)

/* Execution time = Response time at node k - Block transfer

time + Block transfer time * Round(The number of categories

in the input / The number of tuples in a block) */
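The Case 1 formulas can be transcribed as below. Round(|C|/|N|) is read here as a ceiling, i.e. the number of output blocks needed to hold one result tuple per category; that reading, like the names and sample values, is our own assumption.

```python
import math

# Statistical aggregation, Case 1 (|Sj| <= |Bk|). C is |C| (number of
# categories) and N is |N| (tuples per block); ceil(C / N) output
# blocks are produced (reading Round as a ceiling -- an assumption).
def stat_agg_case1(Trj, Toj, Sj, Tpk, Tck, C, N):
    Trk = Trj + Sj * max(Toj, Tpk) + Tck
    out_blocks = math.ceil(C / N)
    Tek = (Trk - Tck) + Tck * out_blocks
    return Trk, Tek

Trk, Tek = stat_agg_case1(1.0, 0.1, 10, 0.05, 0.01, 25, 10)
```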

Case 2: |Sj| > |Bk|

Response time:

Trk = Trj + |Bk|*Tpk + (|Sj|-|Bk|)*Max{Tpk,Trwk} +

Max{Trwk,Tck}

/* Response time = Response time at node j + Buffer size at

node k * Block processing time + (Output size at node j -

Buffer size at node k)*Max{Block processing time, Block read

and write time} + Max{Block read time, Block transfer time}
*/
Execution time:

Tek = Trk - Max{Tck,Trwk} + Max{Trwk,Tck}*Round(|C|/|N|)

/* Execution time = Response time at node k - Max{Block read

time, Block transfer time} + Max{Block read time, Block

transfer time} * Round(The number of categories in the

input / The number of tuples in a block) */

Cost formulas for type 3 operations

We shall take the join operation as the

representative of this type. The cost formulas for four join

algorithms: a) nested block, b) hashing, c) sort-merge, and

d) pipelined sort-merge are derived in the following.

Nested block join

In this algorithm, when a new block arrives from each

of the join inputs, it is joined with the accumulated blocks

of the other input. The output blocks are sent to the next

node as soon as they are produced. The two inputs to the join

operation at node k are assumed to be arriving from nodes i

and j.

Case 1: |Bk| >= |Si| + |Sj|

Execution time:

Tek = Max{Tri,Trj} + Max{|Si|*Toi, |Sj|*Toj, |Si|*|Sj|*Tnk,
|Sk|*Tck}

/* Execution time at node k = Max{Response time at node i,

Response time at node j} + Max{The time taken for all the

input from node i to arrive, The time taken for all the input

from node j to arrive, The time required for joining all the

data blocks from node i with all those from node j, The total

output block transfer time} */

Response time:

Trk = Max{Tri,Trj} + (Tek - Max{Tri,Trj})/|Sk|

/* Response time at node k = Max{Response time of node i,

Response time of node j} + The time required for producing

one output block */
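The Case 1 formulas translate directly into code; the function name and sample values below are illustrative only.

```python
# Nested block join at node k, Case 1 (|Bk| >= |Si| + |Sj|): the join
# starts once both inputs have begun arriving, and joining, input
# arrival, and output transfer overlap with one another.
def nested_join_case1(Tri, Trj, Si, Toi, Sj, Toj, Tnk, Sk, Tck):
    start = max(Tri, Trj)
    Tek = start + max(Si * Toi,        # all of input i arrives
                      Sj * Toj,        # all of input j arrives
                      Si * Sj * Tnk,   # every i-block joined with every j-block
                      Sk * Tck)        # all output blocks transferred
    Trk = start + (Tek - start) / Sk   # time to the first output block
    return Trk, Tek

Trk, Tek = nested_join_case1(1.0, 2.0, 10, 0.1, 20, 0.1, 0.01, 40, 0.02)
```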

Case 2: |Si| + |Sj| > |Bk|

The portion of the buffer assigned to each of the

join inputs is assumed to be proportional to its arrival

rate. |Bki| is the size of the buffer available for the input

from node i and |Bkj| is the size of the buffer available for

the input from node j.

|Bki| = |Bk|*Toj/(Toi+Toj)

|Bkj| = |Bk|*Toi/(Toi+Toj)
Execution time:

Tek = Max{Tri,Trj} + Max{|Si|*Toi, |Sj|*Toj, |Bki|*|Bkj|*Tnk +

[(|Si|-|Bki|)*|Sj|+(|Sj|-|Bkj|)*|Si|]*Max{Tnk,Trwk}, |Sk|*Tck}
/* Execution time at node k = Max{Response time at node i,

Response time at node j} + Max{The time required for all the

input from node i to arrive, The time required for all the

input from node j to arrive, The time required for joining

the data blocks in the two input buffers (|Bki|*|Bkj|*Tnk) +

The time required for joining the data from one input that is

accumulated on the secondary storage with all the data from

the other input (((|Si|-|Bki|)*|Sj| + (|Sj|-|Bkj|)*|Si|) *

Max{Tnk,Trwk}), The total output block transfer time} */

Response time:
The formula for response time is the same as that

given in Case 1, which is Trk = Max{Tri,Trj} + (Tek -

Max{Tri,Trj})/|Sk|.

Hash-Join

In this algorithm, as the blocks from one of the two

inputs arrive, a hash table consisting of the join attribute

values and the corresponding block numbers or IDs for all the

tuples in that input block is constructed. After hashing, the

input blocks are accumulated in the main memory or the

secondary storage depending on the size of the main memory.
After all the data blocks from one input have been received,
the join values of the tuples from the other input are hashed
as they arrive from the sender or as they are read from the
accumulated area. If the value of a tuple in the second input
hashes to a value in the hash table of the first input, then
the first input block containing that tuple is read from the
accumulated area and the tuples from both inputs are joined.
To demonstrate the worst case, we shall assume that for each
tuple in the second input that hashes to a value in the first
input hash table, a different block from the first input has
to be accessed from the accumulated area. DeWitt [DEW84] has
studied the performance of various hash-join algorithms when
only portions of the hash tables can fit in the main memory.
In this work, we make the simplifying assumption that the
entire hash tables can fit in the main memory, thus avoiding
the complexity in the analysis that would result otherwise.

In the following, we shall describe the timing
equations for hash-join over a non-key attribute. Two sub-
cases, A and B, are distinguished in these equations. In the
first sub-case, we assume that the input from node j starts
to arrive after the node i input has started and continues
well after the last block from node i has been received. The
following diagram depicts this sub-case.




Sub-case A:

time=0          <--> Average interval between input blocks
Node i  |---|---|---|---|---|---|
            time=Trj                            time=Tej
Node j      |----|----|----|----|----|----|----|----|
            -> Node j starts

In the second sub-case, the input from node j is

assumed to start arriving after that of node i and is assumed

to arrive completely before all the node i input blocks have

been received. The following diagram depicts this sub-case.

Sub-case B:

time=0          <--> Average interval between input blocks
Node i  |---|---|---|---|---|---|---|---|---|---|
            time=Trj               time=Tej
Node j      |----|----|----|----|
            -> Node j starts

Many other sub-cases arise from the consideration of

the relative values of the response time and the execution

time of the two join inputs. The timing equations for these

sub-cases and for the hash-join over a key attribute have

been developed and are used in the evaluation. They are not
included here.
In our derivation of the timing equations, we

estimate the number of distinct join attribute values in the

join output at node k (INdol) to be proportional to those in

both join inputs.

|Ndo| = (|Si|*|Sj|*|Rfk|*|N|*|N|)/(((|Si|*|N|)/|Nd|)
*((|Sj|*|N|)/|Nd|))

/* Total number of tuples in the join output / (Multiplicity of

the join attribute in the input from node i {(|Si|*|N|)/|Nd|}

* Multiplicity of the join attribute in the input from node j

{(|Sj|*|N|)/|Nd|}) */

The number of distinct join attribute values in a

single input block that participates in the join (INdbl) is

estimated to be proportional to the ratio of the distinct

attribute values in the output to those in the input.

|Ndb| = |N|*(|Ndo|/|Nd|)

/* The number of tuples in a block * The ratio of the

distinct join attribute values in the join output to those

in the input */
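The two estimates can be checked in code. Note that the |Ndo| expression simplifies to |Rfk|*|Nd|^2, since the |Si|, |Sj|, and |N| factors cancel. The function name and sample values below are illustrative only.

```python
# Estimates from the formulas above. Si, Sj: input sizes in blocks;
# Rfk: reduction factor; N: tuples per block; Nd: distinct join
# attribute values per input.
def distinct_value_estimates(Si, Sj, Rfk, N, Nd):
    join_output_tuples = Si * Sj * Rfk * N * N
    mult_i = (Si * N) / Nd     # multiplicity of the join attribute in input i
    mult_j = (Sj * N) / Nd     # multiplicity of the join attribute in input j
    Ndo = join_output_tuples / (mult_i * mult_j)   # simplifies to Rfk * Nd**2
    Ndb = N * (Ndo / Nd)       # distinct joining values per input block
    return Ndo, Ndb

Ndo, Ndb = distinct_value_estimates(10, 10, 0.0001, 100, 1000)
```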

Case 1: |Bk| >= |Si| + |Sj|

Sub-case A:

Execution time:

Tek = Tri + Max{|Si|*Toi, |Si|*(Thk+Ttk) + ((|Si|*Toi+Tri-

Trj)/Toj)*Ttk} + ((|Sj|*Toj+Trj-Tri-|Si|*Toi)/Toj) *

Max{Toj, (Thk+(|Ndb|*(|Si|/|Nd|))*Tjk), |Ndb|*(|Si|/|Nd|)
*Tck} + ((|Si|*Toi+Tri-Trj)/Toj)*Max{Thk+|Ndb|*

(|Si|/|Nd|)*Tjk, |Ndb|*(|Si|/|Nd|)*Tck}

/* Execution time at node k = Response time at node i +

Max{The time required for all the node i input to arrive, The

time taken for hashing the node i input blocks and organizing

them in the main memory + The time required for moving into

the memory the node j input blocks that have arrived before

the end of node i input ((Tri+|Si|*Toi-Trj)/Toj)} + The

number of node j input blocks arriving after the end of the

node i input ((Trj+|Sj|*Toj-Tri-|Si|*Toi)/Toj) * Max{Average

time interval between the arrival of consecutive node j input

blocks, The time for hashing a node j block + The time for

joining the number of distinct tuples in the node j block

(|Ndb|) with the number of duplicate tuples having the same

distinct join attribute values in the node i input

(|Si|/|Nd|), The time for transferring the join output

produced} + Max{The time required for joining the node j

input blocks that arrived before the end of node i input

((Tri+|Si|*Toi-Trj)/Toj) with all the blocks from the node

i input, The time required for transferring the output blocks

produced} */

Response time:

Trk = Tri + (Tek-Tri)/|Sk|

/* Response time at node k = Response time at node i + The

time required for producing one output block */

Sub-case B:

In this case, the join operation is performed in a

different manner from that in sub-case A. The efficiency of

the algorithm is improved by taking advantage of the fact

that all the node j input is available at node k by the time

all the node i input has been processed. After all the node

i input blocks are hashed, the node j blocks are read from

the accumulated area and hashed. After each input block is

hashed, the duplicate tuples with the same join attribute

value are added to the bucket of blocks holding the tuples

with that value. After all the node j input is hashed, the

buckets in each input, which correspond to the join attribute
