Association algebra


Material Information

Title:
Association algebra: a mathematical foundation for object-oriented databases
Physical Description:
viii, 159 leaves : ill. ; 29 cm.
Language:
English
Creator:
Guo, Mingsen, 1947-
Publication Date:

Subjects

Genre:
bibliography   ( marcgt )
theses   ( marcgt )
non-fiction   ( marcgt )

Notes

Thesis:
Thesis (Ph. D.)--University of Florida, 1990.
Bibliography:
Includes bibliographical references (leaves 135-140).
Statement of Responsibility:
by Mingsen Guo.
General Note:
Typescript.
General Note:
Vita.

Record Information

Source Institution:
University of Florida
Rights Management:
All applicable rights reserved by the source institution and holding location.
Resource Identifier:
aleph - 001638747
notis - AHR3687
oclc - 24160849
System ID:
AA00003326:00001

Full Text












ASSOCIATION ALGEBRA:
A MATHEMATICAL FOUNDATION
FOR OBJECT-ORIENTED DATABASES








By

MINGSEN GUO


A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL
OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT
OF THE REQUIREMENTS FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY


UNIVERSITY OF FLORIDA


1990



























Copyright 1990

by

Mingsen Guo





















Dedicated to my dear wife Zhu (Susie)

and lovely daughter Jialan.


And to our parents

Jingcheng Guo and Ruiying Zhang

Shuyan Huang and Chuanxiang Chen,


this was their dream before it was mine.














ACKNOWLEDGEMENTS


I would like to express my sincere appreciation to Dr. Stanley Su, chairman of

my supervisory committee, for giving me the opportunity to work on this interesting

and important topic in the area of object-oriented database systems. Without his

patient guidance and continuous support, this work could not have been completed.

I am grateful to Dr. Herman Lam, cochairman of my supervisory committee, for his

thought-provoking suggestions on this work. I thank Dr. Sham Navathe for his com-

ments and his personal library. I thank Dr. Randy Chow for his encouragement

throughout my graduate study. I would like to thank Dr. John Staudhammer for his

time and for being on my supervisory committee.

My special thanks go to Sharon Grant, the secretary of the Database Systems

Research and Development Center, whose help has always been friendly and timely.

This research was supported by the National Science Foundation

(DMC-8814989) and the National Institute of Standards and Technology (60NANB4D0017).

The development effort is supported by the Florida High Technology and Industrial

Council (UPN88092237).















TABLE OF CONTENTS


ACKNOWLEDGMENTS

ABSTRACT

CHAPTER

1 INTRODUCTION

2 A SURVEY OF RELATED WORK

2.1 Relational Model and Relational Algebra
2.2 Existing O-O Query Languages
2.3 ENCORE O-O Data Model and Its Underlying Query Algebra

3 OVERVIEW OF O-O DATABASES AND
ASSOCIATION-BASED QUERY FORMULATION

3.1 Overview of O-O Databases
3.2 Pattern-based Query Formulation
3.3 Conclusion

4 ASSOCIATION ALGEBRA

4.1 Definitions
4.2 Relationship Between Two Patterns
4.3 Association Operators
4.4 Query Examples

5 MATHEMATICAL PROPERTIES OF OPERATORS
AND THEIR APPLICATIONS IN QUERY OPTIMIZATION
AND QUERY DECOMPOSITION

5.1 Conventional Algebraic Properties
5.2 Nesting of Two Unary Operators
5.3 Nesting of Binary Operator in Unary Operator
5.4 Cascading of Two Binary Operators
5.5 General Identities
5.6 Transformation of Operators
5.7 Applications in Query Optimization and Decomposition

6 COMPLETENESS OF THE A-ALGEBRA

7 CONCLUSION

REFERENCES

APPENDIX

BIOGRAPHICAL SKETCH
















Abstract of Dissertation Presented to the Graduate School
of the University of Florida in Partial Fulfillment of the
Requirements for the Degree of Doctor of Philosophy


ASSOCIATION ALGEBRA:
A MATHEMATICAL FOUNDATION
FOR OBJECT-ORIENTED DATABASES


By
Mingsen Guo

December 1990
Chairman: Dr. Stanley Y.W. Su
Major Department: Electrical Engineering

Existing O-O DBMSs lack a solid mathematical foundation for the manipulation

of O-O databases, optimization of queries, and the design and selection of storage

structures for supporting O-O database manipulations. An association algebra

(A-algebra) is prescribed for serving as a mathematical foundation for processing O-O

databases, which is analogous to the use of relational algebra for processing relational

databases. In this algebra, objects and their associations in an O-O database are

uniformly represented by association patterns which are manipulated by a number of

operators to produce other association patterns. Different from the relational alge-

bra, in which set operations operate on relations with union-compatible structures,

the A-algebra operators can operate on association patterns of both homogeneous and

heterogeneous structures. Different from the traditional record-based relational pro-

cessing, the A-algebra allows very complex patterns of object associations to be

directly manipulated. Pattern-based query formulation and the A-algebra operators

are described. Some mathematical properties of the algebraic operators are









presented together with their application in query decomposition and optimization.

The completeness of the A-algebra is also defined and proven. The A-algebra has

been used as the basis for the design and implementation of an object-oriented query

language, OQL, which is the query language used in a prototype Knowledge Base

Management System OSAM*.KBMS.














CHAPTER 1
INTRODUCTION


In the past two decades, techniques of data modeling have gone through two

major conceptual changes. First, in the early 1970s, E. F. Codd observed that future

database systems should allow application programs and terminal users to remain

unaffected by changes made to the internal data representation (or the storage

structure) of a database. He introduced the relational data model [COD70] and

proposed the relational algebra and relational calculus [COD72a] as the

mathematical foundation for processing relational databases. The relational model

provides two levels of data independence in a three-level architecture for a data-

base management system as shown in Figure 1.1 (figures of each chapter are

placed at the end of the chapter). At the lower level, the physical data indepen-

dence is provided, i.e., the logical representation of a relational database is a set of

relations (i.e., flat tables), which is independent of the physical (data and storage)

structures in which data are stored. At the higher level, the logical data indepen-

dence is provided, i.e., the external view remains unchanged when the logical view

of a database is modified (note that the external view remains unchanged only for

some schema modifications). Besides simple logical representation and data

independence, the fact that the relational model has a solid mathematical founda-

tion is very important and has contributed to the success of the model and the

existing relational database management systems.









However, the relational model and relational systems have some limitations.

For example, the model captures rather limited structural properties of real-world

entities or objects. The construct of aggregation hierarchy which models complex

objects and the construct of generalization which models the superclass-subclass

relationship are not provided. In the relational model, data which describe a com-

plex object are scattered among a number of normalized relations and accessing

that data involves time-consuming traversal and assembly of data stored in multi-

ple relations. The model also does not allow behavioral properties of

entities/objects to be explicitly defined.

The second conceptual change of data modeling techniques occurred in the

early 1980s. The object-oriented paradigm, first introduced in the programming

language SIMULA [DAH67] and made very popular through the language

SMALLTALK [GOL81], allows richer structural constructs and behavioral proper-

ties of objects to be specified at the logical level independent of their physical

implementations. Several features of the paradigm such as abstract data types,

inheritance, encapsulation, information hiding, polymorphism, etc. have been

shown to be useful for data modeling and system development. The object encap-

sulation concept adds a level of data independence between the physical and the

logical independence introduced in the relational model, as depicted in Figure 1.2.

It requires that the structural and behavioral properties of an object be (logically)

encapsulated in its class in the conceptual view of an O-O database. Since then, a

number of Object-Oriented (O-O) and semantic data models have been proposed

[HAM81, BAT84, KIN84, ZAN85a, ZAN85b, DAD86, MAI86, MAN86, SU86,









ZDO86, WOE86, BAN87, FIS87, HOR87, HUL87, KIM87, ROW87, CAR88,

COL89, SU89], which offer more powerful constructs for modeling the structural

and behavioral properties of objects found in advanced applications such as

CAD/CAM, CASE, and decision support systems.

An O-O semantic data model can be structurally and/or behaviorally

object-oriented [DIT86]. A structurally O-O data model is one that encompasses at least

the following characteristics:

(1) It supports the unique identification of objects, that is, each object has a

unique object identifier (surrogate) which is valid for the life-time of the

object.

(2) It categorizes those objects which can be described by the same set of charac-

teristics (attributes) into an object class.

(3) It allows aggregation (association) hierarchies to be defined.

(4) It allows generalization (association) hierarchies to be defined.

The O-O view of an application world is represented in the form of a

network of classes and associations. An object class can be either a primitive class whose

instances are of simple data types (e.g., string, integer) or a nonprimitive class

(e.g., Part, Student, Teacher). At the extensional level, instances of different

classes can be related (associated) with each other forming patterns of object asso-

ciations. A behaviorally object-oriented data model, on the other hand, is one in

which operations that describe the behavior of the objects of a class can be defined

and registered with that class. Programs or methods that implement the opera-

tions defined for an object are transparent to the user of the objects.
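
The four structural characteristics listed above can be illustrated with a small sketch. The following Python fragment is only an illustration under stated assumptions: the classes DBObject, Person, and Course, the attribute names, and the use of a counter as the source of object identifiers are all hypothetical and are not taken from any of the data models cited here.

```python
import itertools
from dataclasses import dataclass, field

# Hypothetical source of surrogates; a real system would manage these itself.
_next_oid = itertools.count(1)


@dataclass
class DBObject:
    # (1) Each object receives a unique identifier that the user cannot set.
    oid: int = field(default_factory=lambda: next(_next_oid), init=False)


@dataclass
class Person(DBObject):
    # (2) Objects described by the same attributes are grouped into one class.
    name: str = ""


@dataclass
class Student(Person):
    # (4) Generalization: Student is a subclass of Person.
    major: str = ""


@dataclass
class Course(DBObject):
    title: str = ""
    # (3) Aggregation: a Course is associated with a collection of Students.
    enrolled: list = field(default_factory=list)
```

Two Student objects created with identical attribute values remain distinct because their identifiers differ, which is the point of characteristic (1).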









For these models to be truly useful, they must provide some object manipula-

tion languages, which can take advantage of the expressive power of the models

and provide the users with simple and powerful querying facilities. Recently,

several query languages such as DAPLEX [SHI81], GEM [ZAN83, TSU84], ARIEL

[MAC85], FAD [BAN87], POSTQUEL [ROW87], EXCESS [CAR88], and others

reported in [DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89, SHA90]

have been proposed. These languages were developed based on different

paradigms. For example, DAPLEX and the query language of [MAN86] are based on

the functional paradigm. The query language of [BAN88] is based on the message

passing paradigm. Other query languages are based on the relational paradigm:

an extension of QUEL [ROW87, CAR88]; an extension of SQL [DAD86]; and an

extension of the relational algebra [COL89]. The query language of [FIS87] is

based on both functional and relational paradigms, allowing functions to be used

in object-oriented SQL (OSQL) constructs.

The above languages have an O-O flavor and have taken significant steps

towards the development of a powerful O-O query language. Query languages

such as DAPLEX [SHI81], GEM [ZAN83], ARIEL [MAC85], and the

object-oriented query language described in [BAN88], are based on the view of a

database defined in terms of objects, object classes, and their associations. A query in

these languages is formulated by specifying one class (usually a nonprimitive-class,

whose instances are real world objects) in the schema as a central class with some

path expressions. Each path expression starts from the central class and ends at

another class (usually a primitive-class, whose instances are of basic data types










such as integer, string, set, etc.). A restriction condition can be specified on the

class referenced at the end of a path expression. This class can also be specified in

the list of attributes to be retrieved. The result of a query is a set of tuples, each

of which corresponds to a single instance of the central class and contains values

related to that instance which are collected from classes specified in the list.

A major drawback of these query languages is that they do not maintain the

closure property [ALA89b]. A query language is said to be closed if the result of a

query can be further queried by other queries specified in the same language. In

the above-mentioned languages, the input to a query has an O-O representation

(i.e., a network of objects, classes, and their associations) whereas its output is a

relation which does not have the same structural and behavioral properties as the

original objects. Consequently, the result of a query cannot be further processed

by the same set of operators. The design of these languages is very much

influenced by the relational model and relational languages which are concerned

mainly with retrieval and storage operations. In O-O processing, objects in

different classes that satisfy some search conditions are subject to different user-

defined operations. The idea of collecting data to form a resulting relation does

not satisfy this processing model.

The query languages proposed in [DAD86, MAN86, BAN87, ROW87, CAR88,

COL89] use nested relations as their logical views of O-O databases. Although

these languages are closed, i.e., operators in these languages operate on nested

relations to produce nested relations, the nested relation is not a proper logical

representation for an O-O database which is basically a network structure of









object associations. Mapping from a network representation to nested relations is

an additional process. Furthermore, in order to use a nested relation to represent

complex network structures, a considerable amount of data has to be introduced

to relate these nested relations. It is our view that the query language and its

underlying algebra should directly support the manipulation of network structures.

A query algebra [SHA90] was proposed recently based on the O-O model

ENCORE [ELM89]. Although ENCORE models applications as networks of

objects, object types, and their associations, the domain of the algebra is defined

as sets of objects of the Tuple type, which is essentially the nested relation

representation since it allows the nesting of tuples. Therefore, the mapping prob-

lem addressed above still remains. In this algebra, two identical queries or two

identical operations in a single query do not give the same response, since each

produces a new object in the database. To eliminate duplicated copies of the

same newly created object, the algebra introduces operations like DupEliminate

and Coalesce, which would not have been necessary if the algebra were to directly

support the network-structured processing of O-O databases. We further observe

that the union operation in this algebra may produce a collection of objects having

the same data type but with different structures (e.g., the union of two collections

of objects of the Tuple type with different arities). Nevertheless, the other

operators introduced in the algebra are not defined to operate on collections of objects

with heterogeneous structures.

A common limitation of many existing query languages is that they cannot

express "non-association" relationships between objects easily, i.e., identify objects










in two classes that are not associated with each other while their classes are. For

example, in an O-O database, let us assume that Suppliers s1 and s2 supply Parts

p1 and p2, respectively. GEM, POSTQUEL, and several other query languages

provide the "dot" construct (Suppliers.Parts) and ARIEL provides the "of"

construct (Parts of Suppliers) to navigate from the class Suppliers to the class Parts

to produce object pairs (s1,p1 and s2,p2). However, they do not have a language

construct for specifying the semantics that s1 does not supply p2 and s2 does not

supply p1. Similarly, in functional languages, only the function Parts(Suppliers) is

provided to specify the associations of s1,p1 and s2,p2 but not the non-association

of suppliers and parts.
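
The distinction between association and non-association can be made concrete with a small sketch. The Python fragment below is illustrative only: the names supplies, associated_pairs, and non_associated_pairs are assumptions introduced here and are not constructs of any of the languages cited above.

```python
from itertools import product

# Hypothetical extensions of the classes Suppliers and Parts.
suppliers = {"s1", "s2"}
parts = {"p1", "p2"}

# Instance-level associations: s1 supplies p1 and s2 supplies p2.
supplies = {("s1", "p1"), ("s2", "p2")}


def associated_pairs(left, right, links):
    """Pairs obtained by navigating the association between two classes."""
    return {(a, b) for a, b in links if a in left and b in right}


def non_associated_pairs(left, right, links):
    """Pairs of instances that are NOT linked although their classes are."""
    return set(product(left, right)) - associated_pairs(left, right, links)
```

Navigation corresponds to associated_pairs, which yields (s1,p1) and (s2,p2). The pairs (s1,p2) and (s2,p1) are returned only by non_associated_pairs, which is the kind of result the text argues cannot be requested directly with the "dot" or "of" constructs.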

In view of the disadvantages of the existing O-O query languages, we would

like to stress the importance of using a graph as the logical representation of an

O-O database at both intensional and extensional levels as exemplified by O2

[LEC88], FAD [BAN87], and OSAM* [SU89]. The query language and its under-

lying algebra should provide constructs to directly process graphs with different

degrees of complexity. They should also support the specification of non-

associations and the processing of heterogeneous structures. Furthermore, the clo-

sure property should be maintained.

In this dissertation, we propose an association algebra (A-algebra) based on

the graph representation of O-O databases and the association-based query

formulation (refer to Chapter 3). Analogous to the development of the relational

algebra for relational databases, the development of the A-algebra provides the formal

foundation for query processing and optimization in O-O databases and for









designing O-O query languages. Unlike the record(tuple)-based relational algebra

[COD70 and COD72] and the query algebra [SHA90], the A-algebra is

association-based, i.e., the domain of the algebra is sets of association patterns

(e.g., linear structures, trees, lattices, networks, etc.) and processing an O-O

database is based on the matching and manipulation of homogeneous as well as

heterogeneous patterns of object associations. Operators of the A-algebra can be used

to navigate a network of interconnected object classes along the path of interest to

construct a complex pattern as the search condition. They can also be used to

decompose a complicated pattern into simple ones. Ten operators have been

defined for the algebra: three unary operators [A-Select (σ), A-Project (Π), and

A-Integrate (f)], and seven binary operators [Associate (*), A-Complement (|),

A-Union (+), A-Difference (-), A-Divide (÷), NonAssociate (!), and A-Intersect (•)],

where the prefix A stands for "Association". Although many of these operators

correspond to the relational algebra operators, they are different from them in

that they can operate on complicated heterogeneous structures. In this respect,

the A-algebra is more general than the relational algebra.

The rest of this dissertation is organized as follows. A detailed survey on the

relational model and the relational algebra, the existing O-O query languages, and

a recently proposed query algebra is provided in Chapter 2. The graphical

representation of O-O databases and the association-based query formulation are

described in Chapter 3 with the help of examples. Chapter 4 formally defines the

concepts of Schema Graph (SG), Object Graph (OG), and association patterns.

The formal definitions of the association operators and their simple mathematical








properties are also presented. The A-algebra expressions for some example queries

are given to demonstrate the utility of the algebra. Chapter 5 presents the

mathematical properties of the association operators and their utilities in query

optimization and query decomposition. The proofs of the mathematical properties

of the operators can be found in the Appendix. The completeness of the A-

algebra is shown in Chapter 6 and the conclusion is given in Chapter 7.























[Figure not reproduced; surviving labels: logical data independence, physical data independence]

Figure 1.1 Data independencies in relational databases






















[Figure not reproduced; surviving labels: logical data independence, encapsulation, physical data independence]

Figure 1.2 Architecture of O-O databases















CHAPTER 2
A SURVEY OF RELATED RESEARCH


This chapter surveys some of the existing work related to the development of

the A-algebra. Section 2.1 describes the relational model and the relational

algebra, while Section 2.2 surveys some existing query languages designed for O-O

semantic data models. The query algebra that recently appeared in the literature is

surveyed in Section 2.3.



2.1 Relational Model and Relational Algebra


When the hierarchical and network data models were used extensively in

information systems in the late 1960s, Codd [COD70] raised an interesting and

important question: Can application programs and terminal activities remain

invariant as the internal data representations (physical representations) change?

He asserted that the future users of large data banks must be protected from hav-

ing to know how the data were organized in the machine. Following this

rationale, he conceived the notion of data independence which suggests that the

logical organization of data should be independent of its physical representation.

Determined to demonstrate the validity of his data independence concept, he pro-

posed a relational data model based on n-ary relations.










The scheme of a relation, R, of an entity set {E1, E2, ..., En} is defined on a

set of m attributes {A1, A2, ..., Am} which correspond to m domains

{D1, D2, ..., Dm} (not necessarily distinct). Each entity (the instance of the scheme)

is represented by an m-ary tuple which has its first attribute value from D1, its

second attribute value from D2, and so forth. A set of attributes of a relation is called a

key if the entities of the relation can be uniquely identified by the values of these

attributes.

In particular, the information of the suppliers such as their names, addresses,

items they supply, and the prices of the items can be represented by the relation

SUPPLIERS of the following scheme

SUPPLIERS(SNAME, ADDRESS, ITEM, PRICE)

where the attributes SNAME and ITEM form a composite key. Data represented

in this form, which intuitively is a flat table, is the logical view of an application

world. It has nothing to do with the physical representation of the data.

When designing a database using the relational model, one is often faced with

a choice among alternative sets of relation schemes. Some choices are more favor-

able than others for various reasons. For example, the relation SUPPLIERS is not

a desirable scheme because it has the following potential problems: (1)

Redundancy: the address of the supplier is repeated once for each item supplied. (2)

Potential inconsistency (update anomalies): as a consequence of the redundancy,

the update of the address of a supplier in one tuple will leave it inconsistent with

the address in another tuple. (3) Insertion anomalies: the address of a supplier

cannot be recorded if that supplier does not currently supply at least one item

since SNAME and ITEM form a composite key of the relation SUPPLIERS. (4)

Deletion anomalies: the inverse of problem (3) is that should all the items

supplied by one supplier be deleted, we unintentionally lose the address of that

supplier.

The causes of these problems and their solutions are relevant to the func-

tional dependencies among the attributes of a relation [COD70, ULL82]. Suppose

X and Y are two sets of attributes of a relation. Y functionally depends on X (or

X functionally determines Y), denoted by X→Y, if any two tuples of the relation

having the same values in attributes X agree on the values of the attributes in Y.

The above four problems emerge if X→Y and X'→Z hold simultaneously, where

X' stands for a proper subset of X and Z a set of attributes of the relation.
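
The definition of functional dependency can be checked mechanically on a given set of tuples. The sketch below is a minimal illustration, assuming a relation is held as a list of Python dictionaries; the function name holds and the sample tuples (the supplier names, addresses, and prices) are hypothetical and are not part of the original example.

```python
def holds(relation, x, y):
    """Return True if every pair of tuples agreeing on X also agrees on Y.

    relation is a list of dicts; x and y are lists of attribute names.
    """
    seen = {}
    for t in relation:
        x_val = tuple(t[a] for a in x)
        y_val = tuple(t[a] for a in y)
        # Remember the first Y value seen for each X value and compare later ones.
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True


# Hypothetical instance of the scheme SUPPLIERS(SNAME, ADDRESS, ITEM, PRICE).
SUPPLIERS = [
    {"SNAME": "Acme", "ADDRESS": "12 Main St", "ITEM": "piston", "PRICE": 40},
    {"SNAME": "Acme", "ADDRESS": "12 Main St", "ITEM": "valve", "PRICE": 15},
    {"SNAME": "Bolt", "ADDRESS": "9 Oak Ave", "ITEM": "piston", "PRICE": 42},
]
```

On this sample, the composite key (SNAME, ITEM) determines PRICE, and SNAME alone, a proper subset of the key, determines ADDRESS. This is the situation in which the four problems arise. Such a check only tests one instance; a dependency is a constraint on all legal instances of the scheme.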

The solution to these problems is to decompose a relation based on the

functional dependencies among attributes. For example, the functional dependencies

among attributes of the relation SUPPLIERS are (SNAME, ITEM)→PRICE and

SNAME→ADDRESS, which give rise to the redundancy, update, insertion, and

deletion anomalies. It should be clear to the reader that these problems will be

eliminated if the relation SUPPLIERS is decomposed into two relations


SA(SNAME, ADDRESS) and
SIP(SNAME, ITEM, PRICE).


There is, however, a disadvantage to the above decomposition: to find the address

of a supplier who supplies item "piston", a join operation has to be applied since

ADDRESS and ITEM are logically distributed in two relations.
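
The trade-off can be seen in a short sketch. The Python fragment below decomposes a sample SUPPLIERS relation into SA and SIP by projection and then answers the "piston" query with a natural join. It is an illustration under stated assumptions: the helper functions project and natural_join and the sample tuples are hypothetical.

```python
# Hypothetical instance of SUPPLIERS(SNAME, ADDRESS, ITEM, PRICE).
SUPPLIERS = [
    {"SNAME": "Acme", "ADDRESS": "12 Main St", "ITEM": "piston", "PRICE": 40},
    {"SNAME": "Acme", "ADDRESS": "12 Main St", "ITEM": "valve", "PRICE": 15},
    {"SNAME": "Bolt", "ADDRESS": "9 Oak Ave", "ITEM": "piston", "PRICE": 42},
]


def project(relation, attrs):
    """Projection onto attrs with duplicate tuples removed."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:
            result.append(row)
    return result


def natural_join(r, s):
    """Combine tuples of r and s that agree on all shared attribute names."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]


SA = project(SUPPLIERS, ["SNAME", "ADDRESS"])          # address stored once
SIP = project(SUPPLIERS, ["SNAME", "ITEM", "PRICE"])   # one tuple per item

# The address of each supplier of "piston" now requires a join of SA and SIP.
piston_addresses = sorted(
    {t["ADDRESS"] for t in natural_join(SA, SIP) if t["ITEM"] == "piston"}
)
```

In this sample, SA holds each address exactly once, which removes the redundancy, while the query that was a simple selection on SUPPLIERS now needs the join.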










The decomposition of a relation based on the functional dependencies among

its attributes is a novel issue of normalization in the relational model. Four types

of normal forms, denoted by 1NF, 2NF, 3NF, and Boyce-Codd NF, respectively,

have been recognized in considering the functional dependency [COD70, ARM74,

and BEE77]. The Boyce-Codd NF is the strongest of these normal forms.

Relations in these normal forms may have to be further decomposed into 4NF or 5NF

to eliminate multivalued dependencies [FAG77, DEL78, and ZAN76] and join

dependencies [AHO79]. This decomposition is needed to eliminate further redun-

dancy and anomalies.

The success and popularity of the relational model and the relational

database management systems (DBMSs) are due to its simplicity in structural (tabular)

representation and its sound theoretical basis: the relational algebra and the

relational calculus [COD72a]. The relational algebra defines five primitive operators,

of which two are unary operators [Projection (Π) and Selection (σ)] and three are

binary operators [Cross-product (×), Union (+), and Difference (-)]. Other

operators such as Join, Natural-join, Set-intersection, and Set-division are also defined

in the algebra. Although these later operators are easy to use, they are not primi-

tive since they can be expressed in terms of the primitive operators.
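
That the derived operators add convenience rather than expressive power can be illustrated with a short sketch. The Python fragment below models a relation as a set of tuples, each tuple being a frozenset of (attribute, value) pairs, and builds set intersection and a theta-join from the five primitives only. The representation and all function names are assumptions made for this illustration; attribute names of the two operands of the cross-product are assumed to be disjoint.

```python
def row(**attrs):
    """Build one tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())


# The five primitive operators.
def select(rel, pred):
    return {t for t in rel if pred(dict(t))}


def project(rel, attrs):
    return {frozenset((a, v) for a, v in t if a in attrs) for t in rel}


def cross(r, s):
    # Attribute names of r and s are assumed to be disjoint.
    return {a | b for a in r for b in s}


def union(r, s):
    return r | s


def difference(r, s):
    return r - s


# Derived operators, written using the primitives only.
def intersection(r, s):
    return difference(r, difference(r, s))


def theta_join(r, s, pred):
    return select(cross(r, s), pred)
```

For example, the join of an employee relation and a department relation on matching department numbers is theta_join(EMP, DEPT, lambda t: t["DNO"] == t["DEPTNO"]), which is a selection applied to a cross-product.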

The relational algebra has the closure property, since every operator

operates on one or more relations and produces a new relation. Operators of the

relational algebra basically operate on the values of tuples in relations. Structur-

ally speaking, they are defined to operate on tuples whose structures are union-

compatible (homogeneous). The relational algebra is complete in the sense that it










has expressive power equivalent to that of the relational calculus [COD72a and

ULL82]. Because of this, it serves as the theoretical basis for the relational model.

The relational algebra has been used for the following three purposes, although it

has not been previously implemented in any existing DBMSs exactly as defined

[ULL82]:

(1) It creates a new class of query languages called algebraic languages. Based on

the relational algebra, languages that directly adopt the relational operators

can be developed, such as ISBL [TOD76] which is a close approximation to the

relational algebra. Although languages of this type are mostly procedural, it is

relatively easy to demonstrate their completeness along with the mathematical

properties of the relational algebra which can be readily applied to query

optimization and query decomposition.

(2) It not only serves as a benchmark for evaluating query languages in existing

systems, but also as the criterion for designing new languages for relational

DBMSs. A relational language will not have the necessary expressive power if

it is not relationally complete [ULL82].

(3) It provides a mathematical basis for transforming expressions in query

decomposition and (logical or conceptual) query optimization. Because it is an algebra,

the mathematical properties of the relational algebra can be explored precisely

and systematically. For query languages construed as algebraic languages,

these mathematical properties have a straightforward application [HAL76].

Query languages like SQUARE or SEQUEL having certain algebraic features

may also use these properties, since the parse of a query yields a tree in which









some nodes represent relational algebra operators [AST76]. Even if a query

language such as QUEL is a relational calculus language, its calculus-like

expressions are translated into relational algebra expressions in the QUEL

optimizer [WON76].

The total content proposed by Codd before 1979 on the relational model is

referred to as Version 1 of the relational model (RM/V1), whose modeling capabilities

were extended by Codd in 1979 [COD79] to version RM/T (T for Tasmania).

Based on these two versions, Codd [COD90] introduces Version 2 of the relational

model (RM/V2). The most important additional features in RM/V2 are as fol-

lows:

(1) A new treatment of items of data missing because they represent properties

that happen to be inapplicable to certain object instances.

(2) New features supporting all kinds of integrity constraints, especially the user-

defined integrity constraints.

(3) A more detailed account of view updatability.

(4) New features pertaining to the management of distributed databases.

It is important to recognize the fact that hierarchical and network models as

well as the relational model evolved during a time in which the primary applica-

tions of information systems were business-oriented. In an attempt to apply these

techniques to the more complicated application areas such as CAD/CAM, CASE,

and decision support, it is found that the relational model is no longer adequate

for modeling these advanced applications. The inadequacies of the relational

model are summarized as follows. First, the relational model has limited modeling










capabilities. When data are logically represented in the form of relations, the rela-

tionships among entities in these relations are represented by matching values of

the attributes or keys in one relation with values of the attributes or foreign keys

in other relations. The actual semantics among the data such as generalization

and aggregation (the abstract data type) cannot be modeled by the relational

model. Second, the relational model only models the structural aspects of entities,

and thus, ignores their behavioral aspects (e.g., system-defined and user-defined

operations). Third, in these advanced applications, the concept of data indepen-

dence should be further extended to the concept of object encapsulation, i.e., not

only should the logical representation of an object be separated from its physical

representation, but its structural and behavioral properties should be logically

encapsulated in its class. The object encapsulation concept cannot be realized in

the relational model, since the data describing an entity may be logically scattered

among several relations due to normalization [COD70, COD72b, BEE77, and

ULL82]. Fourth, entities with complex structures and complicated relationships

among entities are not representable by flat tables (relations). Finally, it cannot

represent and operate on entities with different (heterogeneous) structures.



2.2 Existing O-O Query Languages


An extensive literature search on query languages for accessing O-O databases

such as GEM [ZAN83, TSU84], ARIEL [MAC85], DAPLEX [SHI81], FAD

[BAN87], POSTQUEL [ROW87], EXCESS [CAR88], as well as other proposed

languages [STO84, DAD86, MAN86, SER86, BAN87, FIS87, BAN88, COL89,










SHA90] has been carried out. This section surveys a representative sample of

these languages. Most existing query languages have capabilities beyond those

provided by their theoretical bases. For example, the arithmetic operations and

aggregation functions provided by the relational languages are not available in the

relational algebra. Therefore, this survey is limited to those features which are

relevant to the proposed algebra.

To demonstrate the similarities and differences of these languages, the same

database schema as shown in Figure 2.1 is used for example queries written in

GEM, ARIEL, and DAPLEX. The sample schema of Figure 2.1 is for a government-owned

laboratory system where rectangles represent classes and edges (links)

represent attributes.

QUEL [STO76, WON76, and ZOO77] is a tuple-calculus-oriented query

language for the relational DBMS INGRES [STO76]. In order to avoid the ambiguity

which arises when two attributes of different relations having the same name are

addressed in a single query, QUEL uses a "dot" mechanism to qualify an attribute

of a relation (i.e., a dot is inserted between the name of the relation and the name

of the attribute). For example, Equipment.Name refers to the attribute Name of

the relation Equipment. Influenced by this mechanism, the existing O-O query

languages use similar notations for navigating the database schema from one class

to another or from one relation to other relations in systems which use relational

databases as their back-ends.

The language GEM [ZAN83,TSU84] is an extension of QUEL for the data

model DSIS which supports aggregation, generalization, and unique identification










of objects. In GEM, a class in an aggregation hierarchy that has a link emanating

to another class has the name of the latter class as the data type of one of its

attributes. For example, the class Lab has an attribute, Facility, of the type

Equipment, and has another attribute, Locality, of the type Location, and so forth. The

dot notation is used in GEM for navigating along the reference attributes (links) in

query formulation. The following GEM query retrieves the name of the manager,

the serial number of the equipment, and the address for each laboratory whose

headquarters is located in New York.


Range of Lab is Lab
Retrieve Lab.Manager.Name
Lab.Equipment.Serial#
Lab.Location.Address
Where Lab.Manager.Department.Headquarters.City = "New York"


This query returns a set of tuples in a tabular form. Each tuple contains

values for the manager's name, the equipment serial number, and the address of

the laboratory of interest.
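
The navigation expressed by the dot notation can be sketched in Python as follows. This is only an illustration under assumptions: the dictionaries, the sample values, and the helper follow are hypothetical and are not part of GEM, and every reference attribute is assumed to be single-valued.

```python
# Hypothetical objects: each dict is an object, and a reference attribute
# (link) simply holds another dict. All values are illustrative only.
hq_ny = {"City": "New York"}
hq_la = {"City": "Los Angeles"}
labs = [
    {"Manager": {"Name": "Smith", "Department": {"Headquarters": hq_ny}},
     "Equipment": {"Serial#": "E-100"},
     "Location": {"Address": "12 Main St"}},
    {"Manager": {"Name": "Jones", "Department": {"Headquarters": hq_la}},
     "Equipment": {"Serial#": "E-200"},
     "Location": {"Address": "7 Oak Ave"}},
]

def follow(obj, path):
    """Navigate a dotted path such as 'Manager.Department' starting at obj."""
    for attr in path.split("."):
        obj = obj[attr]
    return obj

# Analogue of the GEM query: qualify each Lab, then emit one flat tuple.
result = [
    (follow(lab, "Manager.Name"),
     follow(lab, "Equipment.Serial#"),
     follow(lab, "Location.Address"))
    for lab in labs
    if follow(lab, "Manager.Department.Headquarters.City") == "New York"
]
```

Note that the result is a list of flat tuples, which mirrors the tabular form of the query result.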

In the approach described in Stonebraker et al. [STO84], the dot notation is

used in a manner similar to that found in GEM to implement the abstract data

type (ADT) concept. In addition, QUEL is used as a data type to facilitate the

navigation from one relation to another. A relation may have a field of type

QUEL which may contain expressions or commands (queries). Whenever the field

is addressed in a query, these expressions, in whole or in part, will be activated.

In general, if X is the tuple variable of the relation R1, Y is a field of type QUEL

in relation R1, and the query stored in Y retrieves field Z of another relation, R2,










then the expression X.Y.Z is a field in a collection of this view. In other words,

the expression will return the values of the Z field of tuples (in R2) that are

related to X through Y. For example, let the relation Manager have a field called

OfficeInfo of type QUEL which contains a query that retrieves the telephone

number of the relation Location. The expression Manager.OfficeInfo.Tel# returns

the telephone number for each manager in a tabular format. Clearly, the imple-

mentation of QUEL as a data type provides a way to relate data in two relations

without modifying the database schema.

Instead of using the dot notation, ARIEL [MAC85] takes advantage of the

"OF" notation. The example query described for GEM can be restated as


Range of Lab is Lab
Retrieve Name OF Manager OF Lab
Serial# OF Equipment OF Lab
Address OF Location OF Lab
Where City OF Headquarters OF Department OF Manager
OF Lab = "New York"


using the "OF" notation which is linguistically more natural than using the dot

notation. However, the result of this query is also represented by a flat table

(relation).

DAPLEX [SHI81] is a functional data language. The data retrieval com-

ponent of DAPLEX is similar to the languages described above, although it is

interpreted differently. In the functional paradigm, the class having a link (i.e.,

attribute) emanating to another class is considered as a function. The function

has, by default, the name of the class to which the link points. For example,










Location(Lab) and Department(Headquarters) represent the facts that Lab has

Location and Headquarters has Department as attributes, respectively. When the

function Location(Lab) is applied to an object of the class Lab, it returns a value

which is an object in the domain class over which the attribute is defined. If the

navigation is from one class to another through a sequence of classes, a nested

function is used. For instance, the expression Name(Manager(Lab)) specifies the

name of the manager who is responsible for a laboratory. For a

particular object of Lab, the manager of the laboratory is produced first; then, the

function Name() is applied to the returned manager and returns the name of the

manager. The example query can be expressed in DAPLEX as follows.


FOR EACH Lab
SUCH THAT City (Headquarters (Department (Manager (Lab))))
= "New York"
PRINT Name (Manager (Lab)),
Serial# (Equipment (Lab)),
Address (Location (Lab))


Even though DAPLEX is based on the functional paradigm, it returns data in the

form of a relation just like in GEM and in ARIEL.
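
The functional reading of attributes can be sketched in Python as follows. The sketch is an assumption-laden illustration, not DAPLEX itself: the helper attribute and the sample object are hypothetical, and each function is assumed to be single-valued.

```python
# Hypothetical object; attribute values are illustrative only.
lab = {"Manager": {"Name": "Smith",
                   "Department": {"Headquarters": {"City": "New York"}}}}

def attribute(name):
    """Return a function that maps an object to its `name` attribute."""
    return lambda obj: obj[name]

# One function per link, named after the class the link points to.
Manager = attribute("Manager")
Department = attribute("Department")
Headquarters = attribute("Headquarters")
City = attribute("City")
Name = attribute("Name")

# Nested application is read inside-out: the manager is produced first,
# and the next function is applied to the returned object.
city = City(Headquarters(Department(Manager(lab))))
name = Name(Manager(lab))
```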

Banerjee et al. [BAN88] introduce a query language based on message pass-

ing. In the message passing paradigm, the name of a link emanating from a class

is interpreted as the name of a message which is stored within that class. One can

assume there is actually a message created by the system and having, by default,

the same name as its corresponding attribute. When such a message is sent to an

instance of the class, it returns the value of the attribute. For example, the fol-










lowing is an expression for selecting a laboratory that has a manager who belongs

to a subordinate department of its New York headquarters.


(Lab SELECT :S (:S Manager Department
Headquarters City = "New York"))


SELECT in this expression is a message sent to the class Lab. The first

argument of SELECT is :S, an iteration variable. The SELECT message iterates

over the instances of the class Lab with :S bound to one instance at a time. The

block of code within the parentheses is the second argument of SELECT, and is

executed for each value bound to :S. In this particular block, the message

Manager is sent to the instance bound to :S in order to return the related Manager

instance. Similarly, Department, Headquarters, and City are messages. To elaborate,

Department is sent to the returned Manager instance, Headquarters is sent to the

returned Department instance, and City is sent to the returned Headquarters

instance. The sign "=" is also a message which has the argument "New

York". When this message is sent to the resulting City value, it returns

a logical object TRUE or FALSE. An instance of Lab is qualified for the above

expression, if and only if the returned logical object is TRUE. The logical AND

or OR message can be sent to this object with an argument that specifies some

other condition on the instance of Lab. In principle, though not described in Ban-

erjee et al. [BAN88], similar message-based expressions can be used to retrieve

attribute values of the resulting Lab instance. The result of a query which

involves such conditions is the set of the instances of Lab along with its attribute










values and is represented in a tabular form.
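
The evaluation of such a message-based expression can be sketched in Python as follows. This is a hypothetical illustration and not the language of [BAN88]: the class Instance, its send method, and the helper functions are assumptions introduced only to show how each message is sent to the object returned by the previous one.

```python
class Instance:
    """A hypothetical object whose attributes are answered as messages."""
    def __init__(self, **attrs):
        self.attrs = attrs

    def send(self, message, *args):
        if message == "=":                      # comparison message
            return self.attrs["value"] == args[0]
        value = self.attrs[message]
        # Wrap atomic values so that they can receive further messages.
        return value if isinstance(value, Instance) else Instance(value=value)

def select(instances, block):
    """Analogue of SELECT: evaluate `block` with each instance bound in turn."""
    return [s for s in instances if block(s)]

def make_lab(city):
    """Build a Lab whose manager's department has headquarters in `city`."""
    hq = Instance(City=city)
    dept = Instance(Headquarters=hq)
    return Instance(Manager=Instance(Department=dept))

labs = [make_lab("New York"), make_lab("Boston")]
qualified = select(
    labs,
    lambda s: s.send("Manager").send("Department")
               .send("Headquarters").send("City").send("=", "New York"),
)
```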

As shown in the samples of these query languages, their query formulations,

though interpreted differently, are very similar to each other. This is evident in

the fact that the formulation of queries is accomplished by navigating the graphically

represented database schema from class to class through their respective

links. In each of these languages, however, a query operates on a database that is

structurally represented using an O-O data model and returns a result whose

structure is represented in a tabular form. Consequently, the result of a query

cannot be further queried by other queries written in the same language. There-

fore, these languages are not closed.

Another drawback of these languages is seen in their navigation mechanisms

which can only formulate queries against classes (or relations) that are interre-

lated in simpler patterns like the linear and forest structures shown in Figure 2.2a.

However, in O-O databases, the graphical patterns in which objects are interrelated

with each other are basically networks which are not restricted to plane

graphs (a graph is a plane graph if it can be drawn on a plane without any inter-

section of two edges). They can be as complicated as surface graphs (a graph is a

surface graph if it can be drawn on a surface without any intersection of two

edges). Phrasing queries against classes that are interrelated in more complicated

patterns depicted in Figure 2.2b is beyond the capabilities of these languages.

A third drawback of these languages which renders their navigation mechanisms

insufficient is that only one type of relationship (an object is related to

another object) between objects of two classes can be expressed. In fact, when










two classes are directly linked at the schema level, objects in these two classes

may have another type of relationship: an object is not related to another object.

This type of relationship represents the complement aspect of the semantics

specified for the two associated classes, such as not-a-part-of,

not-a-function-of, or is-not-a which is often needed in querying the databases.

For example, "For each laboratory, list the equipment that is not available" is a

reasonable query.

The proposed query languages [DAD86, MAN86, BAN87, ROW87, CAR88,

COL89] use nested relations as their logical views of databases. A nested relation

is a generalized relation, i.e., a recursively defined relation: the attributes of a rela-

tion can be either atomic values or another relation in which the attributes can be

a third relation, and so forth. Figure 2.3 shows an example of a nested relation.

Nested relations are particularly suitable for representing data in forest structures.

The above languages are considered to be closed, since operators in these

languages operate on nested relations and produce nested relations. However,

they also have the drawbacks mentioned above, and it is our view that a nested relation

is not a proper logical representation for an O-O database, which is a network

of objects, object classes, and their associations. Using nested relations to

represent data in network structures introduces one level of indirection. Mapping

from a network representation to nested relations is an extra process. Further-

more, in order to use a nested relation to represent complex structures, a large

amount of data has to be replicated in the representation. Figure 2.4 shows an

example of using a nested relation to represent a graph having loops. Note that










vertex F has to be replicated three times.
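
The replication can be sketched in Python as follows. The graph below is hypothetical (the edges of Figure 2.4 are not reproduced here); it is merely assumed that F is reachable from A along three different paths, so that unfolding the network into a nested, tree-shaped value must copy F once per path.

```python
# Hypothetical directed graph; only the vertex names echo Figure 2.4.
graph = {
    "A": ["B", "D", "E"],
    "B": ["F"],
    "D": ["F"],
    "E": ["F"],
    "F": [],
}

def nest(vertex):
    """Unfold the graph below `vertex` into a nested (tree-shaped) value."""
    return {vertex: [nest(child) for child in graph[vertex]]}

def count(nested, target):
    """Count how many copies of `target` appear in the nested value."""
    (vertex, children), = nested.items()
    return (vertex == target) + sum(count(c, target) for c in children)

nested = nest("A")
copies_of_f = count(nested, "F")   # one copy per path that reaches F
```

In the network representation F is stored once, whereas the nested value holds one copy for each of the three paths.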



2.3 ENCORE O-O Data Model and Its Underlying Query Algebra


In spite of the popularity of the O-O paradigm and its application in the field

of database management, the existing O-O database management systems still

lack a solid mathematical foundation for the manipulation of an O-O database

and the optimization of queries. Recently, a query algebra [SHA90] was proposed

for the ENCORE O-O data model [ELM89]. This section surveys the query algebra

as well as the ENCORE model. It also serves as a comparison to the association

algebra proposed in this dissertation.



2.3.1 The ENCORE Model


The ENCORE O-O data model [ELM89] supports abstract data types, type inheritance,

typed collections of typed objects, objects with identity, and object encapsulation.

It models an application as networks of objects, object types, and their

associations. The definition of an abstract data type in this model includes the

Name of the type, a set of Properties defined for instances of the type, and a set of

Operations which can be applied to the instances of the type. Properties reflect the

state of an object while operations may perform arbitrary actions. Properties are

typed objects that may be implemented as stored values, procedures, or functions.

The implementation of a property is invisible to the user and is assumed to return

an object of the correct type and to have no side-effects.









In addition to user-defined abstract data types and a collection of atomic

types such as Int, String, Boolean, etc. (i.e., primitive-classes), ENCORE provides

two parameterized types and a global Object type which is the supertype of all

other types. The parameterized type Set[T] defines T as the type, or supertype, of

objects in a collection having type Set, and T is called the member type of the set.

The parameterized Tuple type associates types ($T_i$) with attribute names ($A_i$) and

defines properties Get-attribute-value and operations Set-attribute-value for each

attribute. The $T_i$'s can be any database types, thus allowing nesting of tuple types.

The value of a tuple is represented as $\langle (A_1, o_1), \ldots, (A_n, o_n) \rangle$, where the

$A_i$'s are attributes of the tuple and the $o_i$'s are objects of the corresponding types.

The global supertype Object defines a family of operations for equality called

i-equality where i indicates how "deeply" a comparison of two objects must search

before finding equality. Two objects are identical when they are the same object,

i.e., they have the same identity. Identical objects are 0-equal ($=_0$ or just $=$) and,

for $i > 0$, two objects are i-equal ($=_i$) if

(1) they are both collections of the same cardinality and there is a one-to-one

correspondence between the collections such that corresponding members are

$=_{i-1}$, or

(2) they both have the same type (not a collection type) and the values of

corresponding properties are $=_{i-1}$.

Type Object also defines a stronger notion of equality called id-equality.

Two objects are id-equal at depth i if they are i-equal and graphical representa-

tions of the objects are isomorphic.
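
The recursive definition of i-equality can be sketched in Python as follows. This is a minimal sketch under assumptions and is not taken from ENCORE: the class Obj is hypothetical, collections are modeled as Python lists, atomic values (integers, strings) are assumed to be self-identifying so that equal atoms count as identical, and the one-to-one correspondence is found by trying all permutations, which is only practical for small collections. The stronger id-equality is not modeled.

```python
from itertools import permutations

class Obj:
    """A hypothetical non-collection object with a type name and properties."""
    def __init__(self, type_name, **props):
        self.type_name, self.props = type_name, props

def identical(a, b):
    # Assumption: atoms are self-identifying; other objects use identity.
    if isinstance(a, (Obj, list)) or isinstance(b, (Obj, list)):
        return a is b
    return type(a) is type(b) and a == b

def i_equal(a, b, i):
    """Return True if a and b are i-equal (collections modeled as lists)."""
    if identical(a, b):          # identical objects are i-equal for every i
        return True
    if i == 0:
        return False
    if isinstance(a, list) and isinstance(b, list):
        # (1) same cardinality and some one-to-one correspondence whose
        #     corresponding members are (i-1)-equal
        return len(a) == len(b) and any(
            all(i_equal(x, y, i - 1) for x, y in zip(a, perm))
            for perm in permutations(b))
    if isinstance(a, Obj) and isinstance(b, Obj):
        # (2) same non-collection type and (i-1)-equal property values
        return (a.type_name == b.type_name
                and a.props.keys() == b.props.keys()
                and all(i_equal(a.props[k], b.props[k], i - 1) for k in a.props))
    return False

# Two distinct address objects with the same city are 1-equal but not 0-equal.
a1, a2 = Obj("Addr", City="Boston"), Obj("Addr", City="Boston")
s1, s2 = Obj("Supplier", Address=a1), Obj("Supplier", Address=a2)
```

Here s1 and s2 become equal only at depth 2, because their Address properties are distinct objects that are themselves equal only at depth 1.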









2.3.2 The Underlying Query Algebra of ENCORE


The query algebra [SHA90] is proposed based on the O-O model ENCORE.

The domain of the query algebra is defined as a typed collection of typed objects.

A typed collection is of parameterized type Set[T] and the objects in the collection

are of type T. If objects of a collection are collected from different types, T is

their most specific common supertype in the type lattice. For example, if object s is of

type S, object p is of type P, and S is a supertype of P, the collection of objects s

and p is of type Set[S]. The query algebra is closed since the operators of the

query algebra operate on collection(s) of objects with type $\mathrm{Set}[T_i]$ and produce a

collection with type $\mathrm{Set}[T_k]$, where type $T_k$ is defined by the query.
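
The choice of the member type of a result collection can be sketched in Python as follows. The sketch assumes a hypothetical single-inheritance hierarchy (a general type lattice with multiple supertypes would need a more careful search), and the type names Q and Object as well as the helper names are illustrative assumptions.

```python
# Hypothetical single-inheritance hierarchy: type -> direct supertype.
supertype = {"P": "S", "S": "Object", "Q": "Object", "Object": None}

def ancestors(t):
    """Return t followed by its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = supertype[t]
    return chain

def member_type(types):
    """Most specific type that equals or is a supertype of every input type."""
    common = ancestors(types[0])
    for t in types[1:]:
        others = set(ancestors(t))
        common = [c for c in common if c in others]
    return common[0]
```

For instance, member_type(["S", "P"]) yields "S", which corresponds to the collection type Set[S] in the example above.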

Similar to the languages surveyed in Section 2.2, the query algebra addresses

a property of an object using 'dot' notation (e.g., s.p.q where s is an object of type

T1, p is a property of s and is of type T2, and q is a property of p and is of type

T3).

Twelve operators are defined in this algebra. We give their brief definitions

followed by some example queries to illustrate the major concepts of this algebra.

(1) The Select operation creates a collection of objects which satisfy a selection

predicate.

$\mathrm{Select}(S, p) = \{\, s \mid (s \in S) \wedge p(s) \,\}$

where p is the predicate.

(2) The Image operation is used to return a single object for each object in the

queried collection and has the form:










$\mathrm{Image}(S, f\colon T) = \{\, f(s) \mid s \in S \,\}$

where S is a collection of objects and f returns an object of type T.

(3) The Project operation extends Image by allowing the application of many

functions to an object, thus supporting the creation and maintenance of

selected relationships between objects. The relationships are stored as tuples

with Tuple type.

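$\mathrm{Project}(S, \langle (A_1, f_1), \ldots, (A_n, f_n) \rangle) =$
$\{\, \langle (A_1, f_1(s)), \ldots, (A_n, f_n(s)) \rangle \mid s \in S \,\}$

where S is of type Set[T], the $A_i$'s are unique attribute names, and each $f_i$

takes a single input of type T and returns an object of type $T_i$. Project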

returns one tuple for each object in the collection being queried. Each newly

created tuple is a new object with unique object identifier.

(4) The Ojoin operator is an explicit join operator used to create relationships

which are not defined between objects of two collections in the database. It is

essentially a Cartesian product of collections of objects, followed by a selec-

tion of result tuples. For collections S and R, the Ojoin is defined as follows:

$\mathrm{Ojoin}(S, R, A_1, A_2, p) =$
$\{\, \langle (A_1, s), (A_2, r) \rangle \mid s \in S \wedge r \in R \wedge p(s, r) \,\}$

where p is a predicate (as in Select) defined over objects from S and R. The

Ojoin operation creates new tuples in the database to store the generated

relationships. The tuples created will have unique object identifiers.

(5) Union, Difference, and Intersection are the usual set operations with object

comparisons and set membership based on object identity ($=_0$). The result of










these operations is considered to be a collection of objects of type T, where T

is the most specific common supertype (in the type lattice) of the types of the

objects in the operands.

(6) The Flatten operation is used to restructure sets of sets, and Nest and UnNest

allow the representation of tuples as flat or nested relations.

(7) For the above operators, two identical operations do not give identical

responses, since each result collection is a newly identified object in the database

and the objects in a result collection may be either existing database

objects or new tuple objects created during the operation. Operators DupEl-

iminate and Coalesce are introduced to handle situations where equal objects

are created by a query.
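
The first four operators can be sketched in Python as follows. This is an illustrative approximation under assumptions rather than the algebra of [SHA90]: collections are modeled as lists, tuples as dictionaries keyed by attribute name, object identity as Python object identity, and the sample data are hypothetical.

```python
def select(S, p):
    """Select(S, p): the objects of S that satisfy predicate p."""
    return [s for s in S if p(s)]

def image(S, f):
    """Image(S, f): one object f(s) for each s in S."""
    return [f(s) for s in S]

def project(S, spec):
    """Project(S, <(A_1, f_1), ..., (A_n, f_n)>): one new tuple per object."""
    return [{A: f(s) for A, f in spec} for s in S]

def ojoin(S, R, A_s, A_r, p):
    """Ojoin: Cartesian product of S and R filtered by p, as new tuples."""
    return [{A_s: s, A_r: r} for s in S for r in R if p(s, r)]

# Hypothetical sample data.
jobs = [{"Num": "J1", "City": "Boston"}, {"Num": "J2", "City": "Miami"}]
suppliers = [{"Ident": "S1", "City": "Boston"}]

boston = select(jobs, lambda j: j["City"] == "Boston")
nums = image(jobs, lambda j: j["Num"])
pairs = ojoin(jobs, suppliers, "J", "S", lambda j, s: j["City"] == s["City"])

# Two identical Project operations create equal but distinct tuple objects,
# which is the situation that DupEliminate and Coalesce address.
t1 = project(boston, [("J", lambda j: j)])
t2 = project(boston, [("J", lambda j: j)])
```

The tuples in t1 and t2 are different objects with equal contents, while the J attribute of each still refers to the existing Job object.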

The example queries are issued against the Supplier-Parts-Job database

shown in Figure 2.5. For the purpose of these examples, it is assumed that Type

Object is the only supertype for each of the given types.

Example 1: Find all red parts. Which suppliers can supply all of the red parts?

P-red := Select(Parts, λp p.color = "Red")
S-Pred := Select(Suppliers, λs P-red subset-of s.Inventory)

The first selection finds the red parts and the second selection finds all sup-

pliers for which the inventory includes that set of parts. The subset-of operation

is available since property Inventory and result P-red both have type Set[Part].

Example 2: What parts are needed by jobs in Boston?

BosJobs := Select(Jobs, λj j.address.city = "Boston")
BosJobParts := Project(BosJobs, λj <(J, j), (Pt, j.PartsNeeded)>)










The select operation finds the jobs in Boston and the project operation gives

information about which parts are needed for each job in Boston. The result of

the projection is of type Set[Tuple]. Note that operation NewPart (of type Job)

cannot be applied to members of BosJobParts, since they have type Tuple. How-

ever, it is appropriate for objects BosJobParts.J.

Example 3: Find all local suppliers for each job.

LocalS := Ojoin(Jobs, Suppliers, J, S, λj λs
j.address.city = s.address.city)

This Ojoin operation produces a set of tuples of type <(J,Job),(S,Supplier)>,

which is similar to a normalized relation. To get a set of suppliers for each job, a

Nest operation needs to be applied: Nest(LocalS, S).
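
The effect of applying Nest to the flat result of Example 3 can be sketched in Python as follows. The sketch rests on assumptions: the sample jobs and suppliers are hypothetical, tuples are modeled as dictionaries, the flat tuples are built by a list comprehension standing in for Ojoin, and the nest function groups by the identity of the remaining attribute values.

```python
def nest(tuples, attr):
    """Group tuples that agree on every attribute except `attr` and
    collect their `attr` values into a list (grouping is by identity)."""
    groups = {}
    for t in tuples:
        key = tuple((a, id(v)) for a, v in t.items() if a != attr)
        rest = {a: v for a, v in t.items() if a != attr}
        groups.setdefault(key, {**rest, attr: []})[attr].append(t[attr])
    return list(groups.values())

# Hypothetical sample data.
jobs = [{"Num": "J1", "City": "Boston"}, {"Num": "J2", "City": "Miami"}]
suppliers = [{"Ident": "S1", "City": "Boston"},
             {"Ident": "S2", "City": "Boston"},
             {"Ident": "S3", "City": "Miami"}]

# LocalS: flat <(J, Job), (S, Supplier)> tuples, as an Ojoin would produce.
local_s = [{"J": j, "S": s} for j in jobs for s in suppliers
           if j["City"] == s["City"]]

# Nest(LocalS, S): one tuple per job, holding the set of its local suppliers.
nested = nest(local_s, "S")
```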

From the above description, we can see that the query algebra supports

many features of O-O databases and has taken significant steps towards a powerful

O-O query algebra to serve as the mathematical foundation for O-O databases.

However, it still has the following limitations.

(1) Although ENCORE models an application as networks of types, objects,

and their associations, the domain of its underlying query algebra is defined as

collections of objects having type Set[T], which is essentially a nested relation

representation, since the member type T of the set type can be a parameter-

ized Tuple type which may in turn contain attributes of Tuple types. There-

fore, the query algebra cannot represent network-structured relationships

among objects efficiently and the mapping problem addressed before still

remains.










(2) In this algebra, two identical expressions or two identical operations in a single

expression do not give identical responses, since each result collection is a

newly identified object in the database. To eliminate duplicated copies of the

same newly created object, the algebra introduces the DupEliminate and

Coalesce operations, which would not be necessary if it directly supported the network

view of O-O databases.

(3) In this algebra, a collection may contain objects with heterogeneous structures.

For example, two objects may both be of Tuple type but with different

arities, and the union of the two objects is also a collection of objects having

Tuple type. However, other operators in this algebra are not defined to

operate on such collections.

(4) Since the query algebra is developed for a specific model (i.e., ENCORE), it is

difficult to apply to other O-O models.























































Figure 2.1 A sample schema























(a) simple query patterns (linear and forest structures)

(b) complex query patterns (plane graphs and surface graphs)

Figure 2.2 Simple and complex query patterns























































Figure 2.3 An example of a nested relation















[Graph with vertices A(a1), B(b2), D(d3), E(e2), F(f5), G(g1), H(h6)]

Figure 2.4 Using a nested relation to represent a complex structure










Type Supplier
  properties:
    Ident: string
    Address: Addr
    Inventory: Set[Part]
  operations:
    RecvOrder: Supplier, Set[Part] --> Supplier

Type Job
  properties:
    Num: string
    Address: Addr
    PartsNeeded: Set[Part]
    Preferred_Suppliers: Ordered_list[Supp
  operations:
    NewPart: Job, Part --> Job

Type Part
  properties:
    Num: string
    Address: Addr
    Color: string
    Components: Set[Tuple[<(P, Part), (Qty, Int)>]]
    Plan: drawing
    BillofMaterial: list[Part]
  operations:
    Order: Part --> Part
    SamePart: Part, Part --> Boolean

Type Addr
  properties:
    Street: string
    City: string
    State: string


Figure 2.5 A Supplier-Parts-Job database













CHAPTER 3
OVERVIEW OF O-O DATABASES
AND ASSOCIATION-BASED QUERY FORMULATION


This chapter informally introduces the graphical view of O-O databases and

illustrates the association-based query formulation mechanism. The graphical

view captures the most important characteristics of O-O databases in which

object classes and their objects are associated with each other. Based on this

view, query formulation and processing can be made by specifying and manipulat-

ing association patterns in which objects are inter-related with each other, unlike

the traditional attribute-based query formulation and processing which match

values in different relations. Since the graphical view is suitable for many O-O

data models, the association algebra developed based on this view can be used as a

general algebra for supporting these O-O databases. The graphical view of O-O

databases is formalized in the next chapter.



3.1 Overview of O-O Databases


O-O semantic data models provide a conceptual basis for defining O-O databases.

Although each model has some unique constructs that distinguish one

model from the others, there are several common structural and behavioral pro-

perties based on which an algebra can be developed and used to support these

models:










First, objects are physical entities, abstract concepts, events, processes, func-

tions or anything that an application cares to capture and represent.

Second, objects having the same structural and behavioral properties are

grouped together to form an object class. Object classes can be categorized into

two general categories: (1) the nonprimitive-class which represents a set of objects

of interest in an application world, each of which is assigned a system-wide unique

object identifier (OID) and its data are explicitly entered in a database by the

user; and (2) the primitive-class which represents a class of self-named objects

serving as a domain for defining other object classes, such as a class of symbols or

numerical values. The behavioral properties of an object class are defined in

terms of system-defined or user-defined operations (e.g., retrieve, display, delete,

insert, rotate a design object, hire an employee, etc.), which can meaningfully

operate on its objects using their corresponding programs (or methods). The

structural properties of an object class and, thus, its objects consist of two types of

data (1) descriptive data (or instance variables) which define the states of the

objects; and (2) association data which specify the relationships between its

objects and the objects of some related classes.

Third, different O-O models recognize different types of associations. Two of

the most commonly recognized associations are aggregation and generalization.

Aggregation models the a-part-of, a-function-of, or a-composition-of relation-

ship. For instance, a complex object can be modeled by an aggregation hierarchy

(abstract data type) in which a complex object is defined in terms of its associa-

tions with objects in other defined classes. Generalization models the is-a or the










superclass-subclass relationship in which an object in a subclass inherits both the

structural and the behavioral properties of its superclass(es).

Thus, from the algebra point of view, an O-O database can be viewed as a

collection of objects, grouped together in classes and interrelated through associa-

tions. It can be represented by graphs at both the intensional and the extensional

levels. At the intensional (schema) level, a database is defined by a collection of

inter-related object classes and is represented by a Schema Graph (SG). For

example, the SG for a university database is illustrated in Figure 3.1, in which

each rectangle denotes a nonprimitive-class such as a class of person objects or a

class of department objects, and each circle denotes a primitive-class such as a

class of names or ages. The associations among classes are represented by the

edges in SG. For example, there is an association between the class Course and

the class Department (an Aggregation association), and an association between the

class Person and the class Student (a Generalization association). Since the

semantic distinctions of these and other association types recognized by different

semantic models can be either hard-coded in a DBMS or declaratively specified by

some rules and used by a rule processor to govern the manipulation of the associ-

ated classes, the underlying algebra does not have to incorporate the semantics of

these association types. All it has to be concerned with is whether or not an

object class and its objects are associated with some other classes and their

objects, i.e., the edges (or associations) are type-less in SG. For example, the

semantics of inheritance can be incorporated in a query language translator which

translates a high-level language statement into its underlying algebraic representa-









tion. The algebra does not have to deal directly with the semantics of inheritance.

This is particularly important if the algebra is to be used as a general algebra for

supporting various O-O data models in which the semantics of an association type

may have slightly different meanings.

At the extensional (instance) level, a database can be viewed as a collection

of objects, grouped together in classes and inter-related through some type-less

associations; and as such it can be represented by an Object Graph (OG). For

example, the OG corresponding to a portion of the university schema graph is

shown in Figure 3.2. In this example, the Teacher object t4 is associated with two

Section objects; thereby representing the fact that he/she is teaching two sections,

sc3 and sc4. The Student object s1 is associated with Undergrad object u1 which,

in turn, is associated with Department object d1; thereby representing that s1 is

an undergraduate student who minors in the department d1. Finally, the Section

object sc2 is not associated with any object of the Student class, which represents

the fact that it is not taken by any student. Object associations expressed by

different graph patterns represent the semantic relationships among these objects

in an application world.
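
The two graph levels can be sketched in Python as follows. This is a minimal sketch under assumptions: only the classes, objects, and associations mentioned in the text above are included (it is not the full content of Figures 3.1 and 3.2), edges are modeled as unordered, type-less pairs, and the names sg_edges, og_edges, class_of, and associates are hypothetical.

```python
# Schema Graph (SG): type-less edges between classes (a portion only).
sg_edges = {frozenset(e) for e in [
    ("Teacher", "Section"), ("Section", "Student"),
    ("Student", "Undergrad"), ("Undergrad", "Department"),
]}

# Object Graph (OG): each object belongs to a class, and type-less edges
# connect associated objects.
class_of = {"t4": "Teacher", "sc2": "Section", "sc3": "Section",
            "sc4": "Section", "s1": "Student", "u1": "Undergrad",
            "d1": "Department"}
og_edges = {frozenset(e) for e in [
    ("t4", "sc3"), ("t4", "sc4"), ("s1", "u1"), ("u1", "d1"),
]}

def associates(obj, cls):
    """Objects of class `cls` that are directly associated with `obj`."""
    return sorted(o for e in og_edges if obj in e
                  for o in e - {obj} if class_of[o] == cls)

# Every object association must follow an association declared in the SG.
conforms = all(frozenset(class_of[o] for o in e) in sg_edges for e in og_edges)
```

For example, associates("t4", "Section") lists the two sections taught by t4, while associates("sc2", "Student") is empty because sc2 is not taken by any student.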



3.2 Pattern-based Query Formulation


Based on this view of an O-O database, users can query the database by

specifying patterns of object associations as search conditions. Once these

objects are selected, they can be further processed by either system-defined

operations (Retrieval, Display, Update, Insert, Delete, etc.) or user-defined










operations (RotatePart, PurchasePart, HireFaculty, etc.). For example, the fol-

lowing queries can be issued against the university database as illustrated in Fig-

ures 3.1 and 3.2 (the algebraic expressions for these queries will be given in Section

4.4).


Query 1: For all sections, get the majors of students who are taking these
sections.

To satisfy this query, we can specify a linear pattern containing the classes

Section, Student, and Department as shown in Figure 3.3a. In this pattern, a cir-

cle represents a class and an edge represents that the objects of the two adjacent

circles (classes) must be associated with each other. This pattern is called an

intensional pattern which represents that sections taken by students who major in

some departments are to be identified. The answer to this query can be found in

Figure 3.2 by checking if the objects of these three classes satisfy such a pattern.

There are five object patterns (called extensional patterns) which satisfy the inten-

sional pattern as shown in Figure 3.3b. The Section object sc2 and the Student

object s3 do not appear in these extensional patterns, since sc2 is not taken by any

student and s3 does not have a major yet. These patterns can also be identified in

two sequential steps. First, get all the patterns in which the Section objects are

associated with the Student objects. Then, if a pattern generated in the first step

(i.e., a Section-Student pair) is further associated with an object of Department, a

new pattern consisting of three objects is constructed and retained in the result;

otherwise, the pair is dropped.
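
The two sequential steps described above can be sketched as follows. This is a minimal illustration, not the evaluation strategy of any actual system; the association tables `takes` and `majors` are hypothetical data chosen only to be consistent with the five extensional patterns listed for Figure 3.3b (in particular, the assumption that s3 takes sc3 is ours).

```python
# Hypothetical object associations (assumed, not read from Figure 3.2).
takes = {("sc1", "s1"), ("sc3", "s2"), ("sc3", "s3"),
         ("sc3", "s4"), ("sc3", "s5"), ("sc4", "s7")}
majors = {("s1", "d1"), ("s2", "d3"), ("s4", "d3"),
          ("s5", "d4"), ("s7", "d6")}


def query1(takes, majors):
    # Step 1: every Section-Student pair is a candidate pattern.
    pairs = sorted(takes)
    # Step 2: extend each pair with every Department the student majors in;
    # a pair whose student has no major produces nothing and is dropped.
    return [(sec, stu, dept)
            for (sec, stu) in pairs
            for (s, dept) in sorted(majors)
            if s == stu]


patterns = query1(takes, majors)
for p in patterns:
    print(p)
```

Section sc2 never appears because it occurs in no Section-Student pair, and s3 disappears in step 2 because it has no major.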










Once these objects (as well as their associations) have been identified,

different system-defined or user-defined operations defined on their corresponding

classes can be applied to these selected objects. For example, Inform(Department)

can be an operation defined on the class Department. It sends each of the selected

departments a letter concerning the majors of the students.

Suppose there is a rule in the university that a student cannot major and

minor in the same department. To check whether there is such a case in the

database, the following query can be issued.


Query 2: List students who major and minor in the same department.

The intensional pattern for this query is shown in Figure 3.3c. It can be

formed by starting from the class Student and navigating the schema in two

traversal paths (refer to Figure 3.1). One path is from Student to Department,

which means that a student majors in a certain department; and the other path is

from Student to Department through Undergrad, which means that a student is

an undergraduate and minors in a certain department (we can see from the SG

that only undergraduates may have minors). According to the query, a single stu-

dent should associate with objects in both Undergrad and Department and these

two paths should merge at Department, thereby forming a loop. This implies two

logical AND conditions, one at the Student class and the other at the Department

class. We use double arcs to denote such conditions as shown in Figure 3.3c.

From Figure 3.2, we can see that the student s1 has his major and minor in the

department d1. This extensional pattern is depicted in Figure 3.3d.
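
The loop pattern of Query 2 can be sketched as two traversal paths that must end at the same Department object. The dictionaries below are hypothetical sample data (only the s1, u1, d1 pattern comes from the text), and the function name `query2` is ours.

```python
# Hypothetical sample associations; only s1-u1-d1 is taken from the text.
majors = {"s1": {"d1"}, "s2": {"d3"}}      # Student -> Departments (major)
undergrad_of = {"s1": "u1", "s3": "u3"}    # Student -> Undergrad instance
minors = {"u1": {"d1"}, "u3": {"d2"}}      # Undergrad -> Departments (minor)


def query2(majors, undergrad_of, minors):
    result = []
    for stu in sorted(majors):
        ug = undergrad_of.get(stu)
        if ug is None:
            # AND at Student: both branches must exist for the same student.
            continue
        # AND at Department: both paths must reach the same Department object.
        for dept in sorted(majors[stu] & minors.get(ug, set())):
            result.append((stu, ug, dept))
    return result


loops = query2(majors, undergrad_of, minors)
print(loops)
```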










Query 3: For those students taking section 300 and having majors and/or
minors, get their majors and/or minors.

There are several ways to form an intensional pattern for the query. We

may start from Section# and traverse to Student through Section and, then, navi-

gate the schema in two paths as we did for query 2. According to the query, a

student who either has a major or a minor should be included in the result (in this

database, it is assumed that graduate students do not have minors). This means

that either path of the navigation will construct a pattern that would satisfy the

query. Thus, a logical OR condition exists at Student. We use a single arc to

indicate the OR condition as shown in Figure 3.4a. Like Query 2, these two

branches merge at Department. However, this query does not require that they

merge at the same Department object. This is specified by the second OR condi-

tion at Department in Figure 3.4a.

The extensional patterns that satisfy this query have heterogeneous struc-

tures: two types of linear patterns as shown in Figure 3.4b. The first type includes

patterns that represent the minors of the undergraduates; and the second type

includes patterns that represent the majors of the students who are either

undergraduates or graduates. In both types of patterns, a student is associated with

section 300 which is assumed to be the Section# for sc3. Figure 3.4c will be

described later in Section 4.4.

We have given some example queries which specify how objects are associated

with one another. In the graphical representation of an O-O database, when

there is no edge between two objects even though there is one between their

classes, it implies that the two objects are not associated with each other. This










represents the complement aspect of the semantics between two associated classes.

It is necessary to allow a user to retrieve this type of object non-association from a

database. The following query is such an example. It can also be specified by a

pattern.


Query 4: For each teacher, list the sections which he/she does not teach.

We use a dashed line to represent the fact that two objects are not associated

with each other. Therefore, the intensional pattern for this query can be drawn as

in Figure 3.4d. There are twelve extensional patterns that match the intensional

pattern. Figure 3.4e shows a portion of them. Non-association relationships

among objects are not explicitly stored in a database. However, they can be

derived during the processing of this type of query.

Using the above examples, we hope that we have convinced the reader that

the pattern-based query formulation is suitable for query specification based on a

graphical view of an O-O database.



3.3 Conclusion

The (type-less) graphical representation of O-O databases is applicable to

most O-O data models, since it captures the essential characteristics of O-O data

models in which object classes as well as their objects are inter-related with each

other in different association patterns. Such databases can be queried by

specifying patterns in which objects of interest are associated with each other. It

should be clear that this formulation is quite different from the attribute-based

query formulation in the existing relational query languages which is based on










matching the attributes (or the key or composite key) of one relation with the

attributes (foreign keys) in other relations. A query that requires the specification

of a complex pattern of object associations can be specified in a rather straightfor-

ward manner in an association-based language, whereas in an attribute-based

language, complex nestings of query blocks or multiple queries would be required

[ALA89a].

It is our view that an algebra developed for processing data based on the

graphical view of O-O databases and the pattern-based query formulation should

satisfy the following requirements. First, it should allow direct manipulation of

complex patterns of object associations. Second, the closure property should be

maintained. Third, both association and non-association relationships among

objects should be expressible as search conditions. Fourth, it should be complete

in the sense that it can be used to describe all possible patterns in a database.

Lastly, it must be able to represent and process patterns with both homogeneous

and heterogeneous structures.






































Figure 3.1 Schema graph of a university database














Figure 3.2 Object graph (classes shown: Teacher, Section, Section#, Student, Department)













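Figure 3.3 Pattern specifications for Query 1 and Query 2
(a) Query 1, intensional pattern: Section, Student, Dept (linear)
(b) Query 1, extensional patterns: (sc1, s1, d1), (sc3, s2, d3), (sc3, s4, d3), (sc3, s5, d4), (sc4, s7, d6)
(c) Query 2, intensional pattern: Student, Undergrad, Dept (loop)
(d) Query 2, extensional pattern: (s1, u1, d1)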












Figure 3.4 Pattern specifications for Query 3 and Query 4
(a) Query 3, intensional pattern: Section# [300], Section, Student, Undergrad, Dept
(b) Query 3, extensional patterns
(c) (see Section 4.4)
(d) Query 4, intensional pattern: Teacher, Section (dashed edge)
(e) Query 4, a portion of the extensional patterns














CHAPTER 4
ASSOCIATION ALGEBRA


The association algebra (A-algebra) is defined based on a uniform representation

of an O-O database in terms of objects, object classes, and type-less associations,

as described in Chapter 3. The algebra contains a number of operators

which operate on graph structures of object associations to produce graph struc-

tures. The closure property of the algebra ensures that the result of a query can

be further manipulated by other queries.



4.1 Definitions


First, we formally define an O-O database at both schema and object levels.

Schema Graph (the intensional database):

The schema graph of an O-O database is defined as SG(C,A), where $C=\{C_i\}$
is a set of vertices representing object classes; A is a set of edges, each of
which, $A_{ij}(k)$, represents an association between classes $C_i$ and $C_j$, where k is a
number for distinguishing the edges from one another when there is more
than one edge between two vertices.

Object Graph (the extensional database):

The object graph of an O-O database is defined as OG(O,E), where $O=\{O_{ij}\}$
is a set of vertices representing object instances (the jth object in class $C_i$); and
$E=\{O_{ij} \overset{k}{-} O_{mn}\}$ is a set of edges representing the associations among object
instances. When one object instance is connected with another in the object
graph, a regular-edge (solid line) is drawn between the corresponding vertices
as $O_{ij} \overset{k}{-} O_{mn}$, which specifies that the jth object instance in class $C_i$ is
related to the nth object instance in class $C_m$ through the kth association of
classes $C_i$ and $C_m$. If two object instances $O_{ij}$ and $O_{mn}$ are not connected
in the object graph but their classes $C_i$ and $C_m$ in the corresponding SG are
directly connected, a complement-edge (dotted line) is drawn between them
and is denoted by $O_{ij} \cdots O_{mn}$.

In this O-O model, an object may participate in several classes (e.g., in a

generalization hierarchy). Its representation in a class is called an object instance.

Since in most cases in this dissertation, "object" and "object instance" can be used

interchangeably without any ambiguity, we shall use "object" unless a distinction

is required between the two.

The reason for explicitly introducing complement-edges into the OG is to

allow the A-algebra to manipulate both association and non-association between

objects of two adjacent classes. In an actual O-O database, it is not necessary to

explicitly store the complement-edges. Figure 4.1 illustrates the regular-edges and

complement-edges among the objects of three object classes. For example, we see

that section sc1 is taken by students s2 and s3 (regular-edges) and not taken by

students s1 and s4 (complement-edges).
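
Because complement-edges need not be stored, they can be derived on demand from the stored regular-edges. The sketch below assumes a simple representation (lists of object identifiers and a set of regular-edge pairs) and a hypothetical helper name `complement_edges`; it reproduces the sc1 example just given.

```python
from itertools import product


def complement_edges(objs_a, objs_b, regular_edges):
    """Pairs of objects from two adjacent classes that share no regular-edge."""
    return {(a, b) for a, b in product(objs_a, objs_b)
            if (a, b) not in regular_edges}


sections = ["sc1"]
students = ["s1", "s2", "s3", "s4"]
regular = {("sc1", "s2"), ("sc1", "s3")}   # sc1 is taken by s2 and s3

derived = complement_edges(sections, students, regular)
print(sorted(derived))
```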

The relationship between an OG and its corresponding SG is formally

described by the following proposition.

Proposition 1: An OG(O,E) is a morphism of its corresponding SG(C,A).
The mapping function $F_m$ is defined as

$F_{m1}: C_i \Rightarrow \{O_{ij}\}$, and
$F_{m2}: A_{im}(k) \Rightarrow \{O_{ij} \overset{k}{-} O_{mn}\}$.

The mapping between SG and OG is one-to-many, since a database is

dynamically changing and may have different instantiations at different times for

the same schema graph.










To define "association pattern", we first extend the concept of connected

graph in graph theory by treating complement-edges as edges, i.e., a connected

graph is a graph in which there exists at least one path between any two vertices

and each path may contain regular-edges, complement-edges, or a combination of

the two. We shall from now on use an upper-case letter to denote a class and the

corresponding lower-case letter with a subscript to denote an object instance in

that class. We shall assume that there is only one edge between any two vertices

in SG unless otherwise specified so as not to complicate the notation.


Association Pattern:

A connected subgraph of an OG is an association pattern (or pattern for
short).

By this definition, a single vertex (or object instance) in OG, which is a

connected subgraph, is also a pattern. We call it an Inner-association-pattern (or

Inner-pattern for short). It is algebraically represented by $(a_i)$ for a vertex of class

A in SG. Thus, object instances are treated as Inner-patterns in the A-algebra. A

regular-edge together with the two vertices (i.e., two Inner-patterns) it connects is

called an Inter-association-pattern (or Inter-pattern), which is represented by $(a_i b_j)$.

A complement-edge together with the two Inner-patterns it connects is called a

Complement-association-pattern (or Complement-pattern) and is represented by

$(\overline{a_i b_j})$. This pattern states that $a_i$ and $b_j$ are not associated with each other in OG.

If a path between vertices $a_i$ and $b_j$ consists of only regular-edges, it can be

represented by a Derived-inter-association-pattern (D-inter-pattern), denoted by

(aibj); otherwise, it can be represented by a Derived-complement-association-









pattern (D-complement-pattern), denoted by (aib,). When a path is represented

by a derived pattern, it simply means that two vertices are indirectly associated or

non-associated but how they are interrelated (the actual path) is of no importance.

A D-inter-pattern is treated as an Inter-pattern and a D-complement-pattern is

treated as a Complement-pattern in the algebraic operations.

The above five types of patterns are the primitive patterns, the latter four

being binary patterns. Their graphical and algebraic representations are summar-

ized in Figure 4.2a. All other connected subgraphs are called complex patterns.

For example, the complex pattern shown in Figure 4.2b1 contains three primitive

patterns: two Inter-patterns $(a_1 b_1)$ and $(b_1 d_1)$, and a Complement-pattern $(\overline{b_1 c_1})$. It

can be uniquely defined by its algebraic representation as a set of primitive

patterns, i.e., $(a_1 b_1, \overline{b_1 c_1}, b_1 d_1)$. More examples of complex patterns are shown in Figure

4.2b. From these examples, one can observe that a complex pattern can be

decomposed into a set of binary patterns which cannot be further decomposed.

This implies that, in the algebraic representation of a complex pattern, an Inner-

pattern may not occur as an element and a binary pattern may appear only once.

A pattern in this algebraic format is called a normalized pattern, otherwise it is

called an unnormalized pattern. $(b_1, b_1 c_1)$, $(b_2, b_2)$, and $(a_1 b_1, b_1 c_2, a_1 b_1)$ are examples

of unnormalized patterns. During the process of constructing an association

pattern, we always normalize it by eliminating the duplicates. The above three

patterns have the normalized forms of $(b_1 c_1)$, $(b_2)$, and $(a_1 b_1, b_1 c_2)$, respectively.
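
Normalization can be sketched as a small function over a set-of-primitives representation. The encoding is an assumption made for illustration: an Inner-pattern is the tuple `("inner", x)` and a binary pattern is `(kind, x, y)` with `kind` being `"inter"` or `"comp"`. The input is assumed to describe a connected pattern, so any Inner-pattern listed beside a binary pattern is already an endpoint of some edge.

```python
def normalize(primitives):
    """Return the normalized form of a connected pattern as a frozenset."""
    prims = list(primitives)
    # Binary patterns are non-directional, so sort the endpoints; putting them
    # in a set removes duplicates such as (a1 b1) listed twice.
    binaries = {(p[0],) + tuple(sorted(p[1:])) for p in prims if p[0] != "inner"}
    if binaries:
        # An Inner-pattern may not occur as an element of a complex pattern.
        return frozenset(binaries)
    # Only Inner-patterns were given: keep them, without duplicates.
    return frozenset(p for p in prims if p[0] == "inner")


n1 = normalize([("inner", "b1"), ("inter", "b1", "c1")])
n2 = normalize([("inner", "b2"), ("inner", "b2")])
n3 = normalize([("inter", "a1", "b1"), ("inter", "b1", "c2"), ("inter", "b1", "a1")])
print(n1, n2, n3)
```

Using a frozenset also reflects the fact that the order of primitive patterns in the algebraic representation is irrelevant.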

The definitions of OG and association pattern imply that a pattern is a

non-directional graph, i.e., $(a_i b_j) = (b_j a_i)$, and that the sequence of primitive patterns in

the algebraic representation of a complex pattern is not important, hence

$(a_i b_j, b_j c_k) = (c_k b_j, a_i b_j)$.

Based on the above definition and notion of association pattern, we view an

OG as an Association Graph (AG) and all the association patterns in AG form the

domain of the A-algebra, denoted by A.



4.2 Relationship Between Two Association Patterns


The operators of the A-algebra are defined based on the possible relationships

between two patterns in A, so that they can be used either to construct complex

patterns using simpler patterns or to decompose a complex pattern into several

patterns of simpler structures. There are four possible relationships between two

patterns $p^1$ and $p^2$: non-overlap, overlap, contain, and equal.

(1) Non-overlap: Two patterns are said to be non-overlap, denoted by p'DCp2,
if they have no common Inner-pattern.

(2) Overlap: Two patterns are said to be overlapped, denoted by pr p2, if they
have at least one common Inner-pattern.

(3) Contain: Contain is a special case of (2) when all the primitive patterns of
$p^1$ are contained in $p^2$. We say that $p^1$ is a subpattern of $p^2$ and denote this
relationship by $p^1 \subseteq p^2$.

(4) Equal: This is a special case of (3) when $p^1$ contains all the primitive
patterns of $p^2$, and vice versa. It is denoted by $p^1 = p^2$.
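
The four relationships can be told apart with a short classification routine. This is a sketch under assumed conventions: a pattern is a set of primitives encoded as `("inner", x)` or `(kind, x, y)` with endpoints in a canonical order, and the Inner-patterns at the ends of each binary pattern count as primitive patterns of that pattern. The function only tests whether the first argument is a subpattern of the second.

```python
def vertices(pattern):
    """All Inner-patterns (object instances) occurring in a pattern."""
    return {v for prim in pattern for v in prim[1:]}


def closure(pattern):
    """Primitives of a pattern plus the Inner-patterns they imply."""
    return set(pattern) | {("inner", v) for v in vertices(pattern)}


def relationship(p1, p2):
    c1, c2 = closure(p1), closure(p2)
    if c1 == c2:
        return "equal"
    if c1 < c2:
        return "contain"          # p1 is a subpattern of p2
    if vertices(p1) & vertices(p2):
        return "overlap"          # at least one common Inner-pattern
    return "non-overlap"


ab = {("inter", "a1", "b1")}
abc = {("inter", "a1", "b1"), ("inter", "b1", "c1")}
bc = {("inter", "b1", "c1")}
d = {("inner", "d1")}

print(relationship(ab, abc), relationship(ab, bc),
      relationship(ab, d), relationship(ab, ab))
```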

Before defining the association operators, we give the definition of

"Association-set", the operand of the association operators.

Association-set:

An association-set, denoted by a Greek letter $\alpha$ (or $\beta$, $\gamma$, ...), is a set of
association patterns without duplicates. $\alpha^i$ designates the ith pattern in $\alpha$, where
$\alpha^i \neq \alpha^j$ ($\forall i \neq j$). An empty set is also an association-set, denoted by $\emptyset$.

A special type of association-set is called homogeneous association-set, which

is important to the A-algebra, since some of the mathematical properties hold only

when operands are homogeneous association-sets.

Homogeneous Association-set:

An association-set is homogeneous, if

(1) all patterns are formed by the Inner-patterns (or object instances) of
the same set of object classes; and

(2) all patterns have the same number of Inner-patterns from each class in
the set; and

(3) corresponding primitive patterns belong to the same association and are
of the same type; and

(4) all patterns have the same topology.

Otherwise, it is a heterogeneous association-set.

Figure 4.3 depicts three example association-sets: $\alpha$ is homogeneous, whereas

$\beta$ is not, since one of its patterns has only one Inner-pattern of class C instead of two like

the others. $\gamma$ is not homogeneous because $\gamma^3$ contains a Complement-pattern, which

makes it different from $\gamma^1$ and $\gamma^2$ (i.e., different topologies).



4.3 Association Operators


Ten association operators are formally defined in this section: three unary

operators [A-Project ($\Pi$), A-Select ($\sigma$), and A-Integrate (f)] and seven binary

operators [Associate (*), A-Complement (|), A-Union (+), A-Difference (-),

A-Divide ($\div$), NonAssociate (!), and A-Intersect ($\bullet$)]. The examples used to explain










these operators will make use of the domain A shown in Figure 4.4. To keep the

graph simple, the Complement-patterns are not shown in the figure. The simple

mathematical properties such as commutativity, associativity, idempotency, and

nilpotency satisfied by the operators are given after each definition.



4.3.1 Notations


Notations that will be used in the subsequent sections are listed below.

A, B,...,K Denote classes.

$CL_i$ Denotes a variable for a class.

[R($CL_1$,$CL_2$)] Denotes the association between classes $CL_1$ and $CL_2$.

$a_i$ Denotes the ith Inner-pattern of class A.

@ Denotes an Inner-pattern variable.

$(a_i b_j)$ Denotes an Inter-pattern between two classes A and B.

$(\overline{a_i b_j})$ Denotes a Complement-pattern between two classes A and B.

$(a_i c_k)$ Denotes a Derived-pattern from class A to class C.

$\alpha, \beta, \gamma$,... Denote association-sets.

$\alpha^i$ Denotes the ith pattern of association-set $\alpha$.

{W},{X},{Y},... Denote sets of classes. Hence, $\alpha_{\{X\}}$ represents association-set $\alpha$
which has Inner-pattern(s) from the classes in {X}.

It should be noted that an Inner-pattern is represented by an object instance

identifier (IID), which is a system-assigned object identifier (OID) prefixed by a

class identification so that the object instances of an object in multiple classes can

be unambiguously distinguished and the fact that these object instances are










instances of the same object can easily be recognized.



4.3.2 Operators


All relational algebraic operators operate on relations of homogeneous (or

union-compatible) structures with the exception of Cartesian-product and Join.

The Cartesian-product and Join provide the mechanism to concatenate two rela-

tions of different structures into a single relation, so that it can be further manipu-

lated by other operators. In the A-algebra, all the operators are defined to operate

on association patterns of homogeneous as well as heterogeneous structures.

Therefore, the relational algebra is a special case of the A-algebra in this respect.



(1) Associate (*):

The Associate operator is a binary operator which constructs an association-

set of complex patterns by concatenating the patterns represented by two operand

association-sets. Since a pattern may involve many classes and an object class

may have more than one association with another class, it is necessary to specify

through which association the concatenation of two patterns is intended. The

Associate operation on association-sets a and f over the association R between

classes A and B is defined as follows:

$\alpha *[R(A,B)]\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, a_m b_n) : (a_m b_n) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j \,\}$

The result of an Associate operation is an association-set containing no dupli-

cates. Each of its pattern is the concatenation of two patterns (one from each









operand association-set). More specifically, if the Inner-pattern (or object $a_m$) of A

in $\alpha^i$ is associated with the Inner-pattern (or object $b_n$) of B in $\beta^j$ in the domain of

the algebra A shown in Figure 4.4, then $\alpha^i$ and $\beta^j$ are concatenated via the

primitive pattern $(a_m b_n)$.
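
A minimal sketch of the Associate operation follows. It assumes patterns are frozensets of primitives encoded as `("inner", x)` or `("inter", x, y)`, and that the association [R(A,B)] is given as a set of object pairs; the sample operands and the relation `r_bc` are hypothetical and are not the contents of Figure 4.4.

```python
def vertices(pattern):
    """All Inner-patterns (object instances) occurring in a pattern."""
    return {v for prim in pattern for v in prim[1:]}


def associate(alpha, beta, relation):
    """Concatenate patterns of alpha and beta over the pairs in `relation`."""
    result = set()
    for pa in alpha:
        for pb in beta:
            for a, b in relation:
                if a in vertices(pa) and b in vertices(pb):
                    # Keep binary primitives only: the new Inter-pattern makes
                    # any stand-alone Inner-pattern redundant (normalization).
                    kept = {p for p in pa | pb if p[0] != "inner"}
                    edge = ("inter",) + tuple(sorted((a, b)))
                    result.add(frozenset(kept | {edge}))
    return result


alpha = {frozenset({("inter", "a1", "b1")}),
         frozenset({("inner", "a2")}),
         frozenset({("inter", "a3", "b2")})}
beta = {frozenset({("inner", "c1")}),
        frozenset({("inner", "c2")}),
        frozenset({("inner", "c4")})}
r_bc = {("b1", "c1"), ("b1", "c2"), ("b3", "c4")}   # assumed [R(B,C)]

result = associate(alpha, beta, r_bc)
for pattern in result:
    print(sorted(pattern))
```

Because the result is a Python set of frozensets, it contains no duplicates, which mirrors the requirement on association-sets.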

We do not restrict A and B to be different classes in [R(A,B)], i.e.,

$\alpha *[R(A,A)]\, \beta$ is a legitimate operation, which concatenates two patterns (one from

each operand association-set) if they have a common Inner-pattern of class A.

An example of the Associate operation is shown in Figure 4.5a (for convenience

a copy of the sample database is shown in each figure for illustrating an

operation). For clarity, we use graphical notation in the figures. In the example,

$\alpha^1$ is concatenated with $\beta^1$ and $\beta^2$, respectively, due to the existence of $(b_1 c_1)$ and

$(b_1 c_2)$ in A as shown in Figure 4.4. $\alpha^2$ is dropped simply because it does not have an

Inner-pattern of class B. $\alpha^3$ is dropped because $(b_2)$ is not associated with any

Inner-pattern of class C in A. $\beta^3$ cannot be concatenated through $(c_4)$ with any

pattern in $\alpha$ because no pattern in $\alpha$ has an Inner-pattern of B that is associated

with $(c_4)$ in A. For the same reason $\beta^4$ is dropped.

For the Associate operator, [R(A,B)] can be omitted if the following condi-

tions hold: (1) both a and f are A-algebra expressions, (2) the Associate operator

operates on the last class in a linear expression a and the first class in a linear

expression f, and (3) there is a unique association between these two classes. For

example, A *[R(A,B)] B can be written as A*B, if class A is associated with class

B through the attribute [R(A,B)] of A. It should be pointed out that A-algebra

allows an attribute to be defined by a computed value (or object). For instance,










B=(A). The implementations of the function and the procedure are invisible to

the algebra. However, they should not have side effects, i.e., the computed result

must be of the same type as B.

The Associate operator is commutative and conditionally associative as

defined below:

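$\alpha *[R(A,B)]\, \beta = \beta *[R(B,A)]\, \alpha$ (commutativity)
$(\alpha_{\{X\}} *[R(A,B)]\, \beta_{\{Y\}}) *[R(C,D)]\, \gamma_{\{Z\}}$ (associativity)
$= \alpha_{\{X\}} *[R(A,B)]\, (\beta_{\{Y\}} *[R(C,D)]\, \gamma_{\{Z\}})$ (if $C \notin \{X\} \wedge B \notin \{Z\}$)
$A *[R(A,A)]\, A = A$ (idempotency)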

The associativity holds true if $\alpha$ and $\gamma$ do not have Inner-patterns of classes C

and B, respectively. Otherwise, the associativity does not hold. For example, if

$\alpha=(a_1 b_1, b_1 c_2)$, $\beta=(b_1 c_1)$, $\gamma=(d_1)$, and A is as shown in Figure 4.4 (the domain of the

algebra), then

$(\alpha *[R(A,B)]\, \beta) *[R(C,D)]\, \gamma = (a_1 b_1, b_1 c_1, b_1 c_2, c_2 d_1)$
and


$\alpha *[R(A,B)]\, (\beta *[R(C,D)]\, \gamma) = \emptyset$









(2) A-Complement (|):

The A-Complement operator is a binary operator which concatenates the

patterns of two operand association-sets over Complement-patterns. It is used to

identify the objects in two classes which are not associated with each other in A.

The A-Complement operator is defined as follows:

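$\alpha\, |[R(A,B)]\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j, \overline{a_m b_n}) : (\overline{a_m b_n}) \in [R(A,B)] \wedge a_m \in \alpha^i \wedge b_n \in \beta^j$
or $\gamma = \alpha^i : \exists(m)(a_m \in \alpha^i) \wedge \nexists(n)(b_n \in \beta)$
or $\gamma = \beta^j : \exists(n)(b_n \in \beta^j) \wedge \nexists(m)(a_m \in \alpha) \,\}$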

The result of an A-Complement operation is an association-set. Each of its

patterns is formed by concatenating two patterns (one from each operand

association-set) via a Complement-pattern $(\overline{a_m b_n})$, where $a_m$ and $b_n$ belong to $\alpha^i$

and $\beta^j$, respectively, and the Complement-pattern $(\overline{a_m b_n})$ is in A. In the special

case when $\alpha$ (or $\beta$) is an empty association-set or does not have Inner-patterns of

class A (or B), then all patterns of $\beta$ (or $\alpha$) that have Inner-patterns of B (or A) are

retained in the resulting association-set.

An example of the A-Complement operation is shown in Figure 4.5b. It

operates over the association between classes B and C. $\alpha^2$ does not appear in the

resultant association-set because it contains no Inner-patterns of B. $\alpha^1$ cannot be

A-Complemented with $\beta^1$ and $\beta^2$ because it is connected with $\beta^1$ and $\beta^2$ by

Inter-patterns $(b_1 c_1)$ and $(b_1 c_2)$ in A, respectively.

Under the same conditions as given in the Associate operator, [R(A,B)] need

not be specified with the A-Complement operator unless there is an ambiguity.

The A-Complement operator is commutative and associative. For the similar rea-









son described for the Associate operator, the associativity holds true conditionally.

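$\alpha\, |[R(A,B)]\, \beta = \beta\, |[R(B,A)]\, \alpha$ (commutativity)
$(\alpha_{\{X\}}\, |[R(A,B)]\, \beta_{\{Y\}})\, |[R(C,D)]\, \gamma_{\{Z\}}$ (associativity)
$= \alpha_{\{X\}}\, |[R(A,B)]\, (\beta_{\{Y\}}\, |[R(C,D)]\, \gamma_{\{Z\}})$ (if $C \notin \{X\} \wedge B \notin \{Z\}$)
$A\, |[R(A,A)]\, A = \emptyset$ (nilpotency)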


(3) A-Select ($\sigma$):

The A-Select is a unary operator, which operates on an association-set a to

produce a subset of patterns that satisfy a specified predicate P. A pattern in the

operand association-set is retained iff the predicates are evaluated true for that

pattern.

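$\sigma(\alpha)[P] = \{\, \gamma \mid \gamma = \alpha^i : P(\alpha^i) = \text{true} \,\}$

where $\alpha$ is defined by an algebraic expression, and $P = T_1 \theta_1 T_2 \theta_2 \cdots \theta_{n-1} T_n$. Each

term, $T_i$ ($i=1,2,\ldots,n$), is a comparison between two expressions and $\theta_i$ ($i=1,2,\ldots,n-1$) is a

Boolean operator ($\wedge$ or $\vee$). $P(\alpha^i)=\text{true}$ represents that a pattern is evaluated true for

that predicate.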

The expressions on the left- and right-hand sides of a comparison operation

may contain constants, functions, and/or operations on objects, but cannot both

be constants. The comparison terms are type sensitive, i.e., the results of the two

expressions in a term should be data of the same type for primitive-classes or both

IIDs for nonprimitive-classes. $=, >, <, \geq, \leq$, and $\neq$ are the legitimate comparisons

for numerical types; $=$ and $\neq$ for character, string, and IID types; and $=, \subset, \supset, \subseteq, \supseteq$,

and $\neq$ for set types. The comparison of two IIDs is performed by comparing their

OID portions, since IIDs are the concatenations of the class identifiers and OIDs.









A single valued object or a single IID can be treated either as its own data type in

numerical, string, or IID comparison, or as a set type containing one element in a

set comparison.

As an example of A-Select, we assume that there are two associated classes:

S for stack and Q for queue. To select associated stack and queue object pairs in

which the top and the bottom of the stack have some common object(s) with

those in the head and the tail of the queue, it can be written as

$\sigma(S*Q)[(top(S) \cup bottom(S)) \cap (head(Q) \cup tail(Q)) \neq \emptyset]$

For the top equals the head and the bottom equals the tail, we have

$\sigma(S*Q)[top(S)=head(Q) \wedge bottom(S)=tail(Q)]$
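
The two A-Select expressions above can be imitated with an ordinary filter. The sketch assumes each associated stack-queue pair is represented as a tuple `(stack, queue)` of Python tuples, with hypothetical accessor functions `top`, `bottom`, `head`, and `tail`; single values are wrapped in one-element sets for the set comparison, as the text permits.

```python
def top(s):
    return s[-1]


def bottom(s):
    return s[0]


def head(q):
    return q[0]


def tail(q):
    return q[-1]


def a_select(alpha, predicate):
    """Keep the patterns of alpha for which the predicate evaluates to true."""
    return {p for p in alpha if predicate(p)}


# Hypothetical associated (stack, queue) pairs standing in for S*Q.
pairs = {((1, 2, 3), (3, 5, 1)),
         ((1, 2, 3), (4, 5, 6)),
         ((7, 8), (7, 8))}

# Top/bottom of the stack share at least one object with head/tail of the queue.
share = a_select(pairs, lambda p: bool({top(p[0]), bottom(p[0])}
                                       & {head(p[1]), tail(p[1])}))

# Top equals head AND bottom equals tail.
same = a_select(pairs, lambda p: top(p[0]) == head(p[1])
                and bottom(p[0]) == tail(p[1]))

print(len(share), len(same))
```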


(4) A-Project ($\Pi$):

Similar to the projection operation in the relational algebra, an A-Project

operation is defined to project subpattern(s) of a pattern. However, in the rela-

tional algebra, the relationship among the projected attributes is not important.

Whereas in A-algebra, the association among the projected subpatterns must be

maintained so that the associations among the objects in these subpatterns will be

retained. The A-Project operator is defined as follows:


$\Pi(\alpha)[E, T]$

where $\alpha$ is an association-set defined by an A-algebra expression;

$E=(e_1, e_2, \ldots, e_n)$ is a set of expressions which specify subpatterns to be

projected; and $T=(t_1, t_2, \ldots, t_m)$ is a set of ordered sets of classes. Each ordered set,

$t_k$, specifies a path connecting two projected subpatterns defined by the E

expressions.

$e_i$ ($i=1,2,\ldots,n$) is a subexpression of the expression which defines $\alpha$. $e_i$ and

$e_j$ ($\forall i \neq j$) should not contain a common class. There may be many paths

connecting two subpatterns in the original pattern. The path to be retained can be

specified in tk. If a specific path is chosen, a minimal number of classes along the

path which can uniquely identify the path should be specified. The result of an

A-Project operation over a pattern is its subpatterns defined by E and some paths

defined by T that connect these subpatterns. If a path in the original pattern

consists of all Inter-patterns, a D-inter-pattern is retained. Otherwise, a D-

complement-pattern is included. Multiple paths between two projected subpat-

terns can be declared in T, if it is so desired.

Figure 4.5c shows an example of A-Project from an association-set $\alpha$ over A*B and

D. For $\alpha^1$, the subpatterns $(a_1 b_1)$ and $(d_1)$ satisfy A*B and D, respectively.

Therefore, they are kept in the result. According to the path specification stated in the

operation, a Derived-pattern $(b_1 d_1)$ is added to the result, thus $\gamma^1=(a_1 b_1, d_1, b_1 d_1)$. Its

normalized form is $\gamma^1=(a_1 b_1, b_1 d_1)$. $\gamma^2$ is produced for the same reason. Since $\alpha^3$

does not have a subpattern satisfying A *B, only (ds) is retained.



(5) NonAssociate (!):

The NonAssociate operator is a binary operator used to identify the associa-

tion patterns in one operand association-set that are not associated (over a

specified association) with any pattern in the other association-set, and vice versa,









in the domain of the algebra A. The NonAssociate operator is defined as follows:

a [R(A,B)] f ={ 7 I = (ao, ', amb): (amb,)E[R(A,B)] A amEa' A bEf
A V ((amb,),(ambJEA)(am4 a A b 4 )
k i
or 7 = a: 3(m)(amea') A A(nXb6. )
V V(b,Ef)3(k, kAm)(akEa A (akb.)E[R(A,B)])
or = i: 3(n)(befi) A i(m)(amea)
V V(a,,a)3(k, k,4n)(bE A (ab )[R(A,B)]) }

The result of a NonAssociate operation is an association-set. Each of its

patterns is formed by concatenating two patterns $\alpha^i$ and $\beta^j$ via a

Complement-pattern $(\overline{a_m b_n})$ under the condition that $\alpha^i$ is not associated with any $\beta^j$ and vice

versa. Furthermore, in the special case where the patterns of $\alpha$ (or $\beta$) have

Inner-patterns of A (or B) and cannot be concatenated with any pattern of $\beta$ (or $\alpha$), these

patterns of $\alpha$ (or $\beta$) will be retained in the result if one of the following three

conditions holds: (1) $\beta$ (or $\alpha$) is an empty association-set, (2) all patterns of $\beta$ (or $\alpha$) do

not have Inner-patterns of B (or A), or (3) all patterns of $\beta$ (or $\alpha$) that have

Inner-patterns of B (or A) can be concatenated with patterns of $\alpha$ (or $\beta$).

An example of the NonAssociate operation is shown in Figure 4.5d. In the

example, a1 and f are dropped due to the existence of (b1c,) in Figure 4.4. a2 is

dropped because it does not contain an Inner-pattern of class B. 0' is dropped

because it does not contain an Inner-pattern of class C. 71 is in the resultant

association-set because (b2) is not associated with (c4) in A as shown in Figure 4.4

and (bs) does not appear in a. 7 exists because (b2) is not associated with (c,) in A.

Note that the NonAssociate operator produces a resultant association-set

which is a subset of that produced by the A-Complement operator, because $\alpha^i$, $\beta^j$,

and $(\overline{a_m b_n})$ may form a new pattern only when $a_m$ of $\alpha^i$ does not associate with any

object of B in $\beta$ and $b_n$ of $\beta^j$ does not associate with any object of A in $\alpha$. In fact,

the NonAssociate operator can be expressed in terms of A-Complement and other

operators as follows:

$A\, ![R(A,B)]\, B = (A - \Pi(A *[R(A,B)]\, B)[A])\ |[R(A,B)]\ (B - \Pi(A *[R(A,B)]\, B)[B])$

Thus, NonAssociate is not a primitive operator in a strict sense. However, it is

very useful for query formulation and is therefore included in the set of A-algebra

operators.

Under the same conditions as given in the Associate operator, [R(A,B)] need

not be specified unless there is an ambiguity. The NonAssociate operator is com-

mutative but not associative.

$\alpha\, ![R(A,B)]\, \beta = \beta\, ![R(B,A)]\, \alpha$ (commutativity)
$A\, ![R(A,A)]\, A = \emptyset$ (nilpotency)


(6) A-Intersect ($\bullet$):

The A-Intersect operation is convenient for constructing a pattern with a

branch or a lattice structure (a pattern that has a loop), since a pattern in such

structures can be viewed as the intersection of two patterns. Conceptually, the

A-Intersect operator is equivalent to the JOIN operator in the relational algebra.

It operates on two operand association-sets over a set of specified classes. Two

patterns, one from each association-set, are combined into one if they contain the

same set of Inner-patterns for each specified class. The A-Intersect operation is

defined as follows:










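$\alpha\, \bullet\{W\}\, \beta = \{\, \gamma \mid \gamma = (\alpha^i, \beta^j) :$
$\forall(CL_k \in \{W\})\, \forall(@ \in CL_k, @ \in \alpha^i)(@ \in \beta^j)$
$\wedge\ \forall(CL_k \in \{W\})\, \forall(@ \in CL_k, @ \in \beta^j)(@ \in \alpha^i) \,\}$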

Figure 4.5e shows an example of the A-Intersect operation over classes B and

C. The resultant association-set contains four patterns, which are the intersections

$\alpha^1 \bullet \beta^1$, $\alpha^1 \bullet \beta^2$, $\alpha^2 \bullet \beta^1$, and $\alpha^2 \bullet \beta^2$, respectively, since they all have Inner-patterns

$(b_1)$ and $(c_2)$. Other patterns ($\alpha^3$, $\alpha^4$, $\beta^3$, $\beta^4$) fail to produce new patterns because

they either have no Inner-pattern in both classes B and C or have no common

Inner-pattern of class C.

The set of classes {W} can be omitted when the A-Intersect operation is performed on all the common classes of its operands, i.e., $\{W\} = \{X\} \cap \{Y\}$ is implied.

Since a lattice pattern can be transformed into a set of other simple patterns,

an A-Intersect operation for building a complex pattern can be replaced by an

Associate operation followed by an A-Select operation (see Section 4 for detail).

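The A-Intersect operator is commutative, conditionally associative, and idempotent.

$$\alpha \bullet\{W\}\; \beta = \beta \bullet\{W\}\; \alpha \qquad \text{(commutativity)}$$
$$(\alpha_{\{X\}} \bullet\{W_1\}\; \beta_{\{Y\}}) \bullet\{W_2\}\; \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\; (\beta_{\{Y\}} \bullet\{W_2\}\; \gamma_{\{Z\}}) \qquad \text{(associativity)}$$
$$\text{(if } (\{W_1\} - \{W_2\}) \cap \{Z\} = \emptyset \;\wedge\; (\{W_2\} - \{W_1\}) \cap \{X\} = \emptyset \text{)}$$
$$\alpha \bullet \alpha = \alpha \quad \text{(if } \alpha \text{ is a homogeneous association-set)} \qquad \text{(idempotency)}$$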

The associativity does not always hold because there are cases in which a pattern of $\beta$ that fails to intersect with any pattern of $\gamma$ may succeed by first intersecting with a pattern of $\alpha$ in the operation ($\bullet\{W_1\}$) and then intersecting with a pattern of $\gamma$ in the operation ($\bullet\{W_2\}$).
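The behaviour of A-Intersect can be sketched in Python under a strong simplification: a pattern is reduced to the set of objects it contains (its edges are ignored), and an object's class is read from the first letter of its name. The helper names, the naming convention, and the data are assumptions made for this example. Following the prose explanation, the sketch also requires the shared Inner-patterns to be non-empty.

```python
def class_of(obj):
    """Hypothetical naming convention: object 'b1' belongs to class 'B'."""
    return obj[0].upper()

def inner(pattern, classes):
    """Objects of each requested class that occur in a pattern."""
    return {c: frozenset(o for o in pattern if class_of(o) == c) for c in classes}

def a_intersect(alpha, beta, classes):
    """Combine a pattern of alpha with a pattern of beta when, for every class
    in `classes`, both contain the same non-empty set of objects."""
    result = set()
    for p in alpha:
        for q in beta:
            ip, iq = inner(p, classes), inner(q, classes)
            if all(ip[c] and ip[c] == iq[c] for c in classes):
                result.add(p | q)
    return result

# Invented operands, not a transcription of any figure.
alpha = {frozenset({"a1", "b1", "c2"}),
         frozenset({"a2", "b1", "c2"}),
         frozenset({"a3", "b2"})}
beta = {frozenset({"b1", "c2", "d2"}),
        frozenset({"b1", "c2", "d3"}),
        frozenset({"c4", "d4"})}

gamma = a_intersect(alpha, beta, ["B", "C"])
for pattern in sorted(sorted(p) for p in gamma):
    print(pattern)
```

With these operands, four combined patterns result, all sharing (b1) and (c2). The pattern containing only a3 and b2 has no object of class C, and the pattern containing only c4 and d4 has no object of class B, so neither contributes.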









Now we define three set operators, which are different from the correspond-

ing set operators in relational algebra, since they operate on heterogeneous struc-

tures as well as homogeneous structures.



(7) A-Integrate (f):

The A-Integrate is a unary operator. It reorganizes patterns in an

association-set according to the relationships among patterns with respect to the

classes specified. The A-Integrate operation is defined as follows:

$$\text{A-Integrate}\{W\}\,(\alpha) = \{\, \gamma \mid \gamma = (\alpha_s) : \forall(k,\ CL_n \in \{W\} \wedge o \in CL_n \wedge o \in \alpha^j \wedge \alpha^j \in \alpha_s)(o \in \alpha^k \wedge \alpha^k \in \alpha_s) \,\}$$

By this definition, a subset of patterns ($\alpha_s$) of $\alpha$ is combined into a single pattern if every object instance of classes in {W} that appears in a pattern in the subset is also contained in all other patterns in the subset. If a pattern of $\alpha$ cannot be combined with any other pattern, it is retained in the resultant association-set as it is.

If no class is specified, patterns that each have at least one object instance (of any class) in common with another are integrated into one pattern. The reorganized association-set then contains patterns which are apart from each other (refer to Section 4.2).

Figure 4.5f shows two examples. The first example shows an A-Integrate operation over class A. Patterns that have a common Inner-pattern of class A are grouped into one ($\gamma^1$ is the integration of $\alpha^1$, $\alpha^2$, and $\alpha^3$, and another resultant pattern is the integration of two further patterns of $\alpha$). All other patterns in $\alpha$ are retained in the result as they are. The second example illustrates an A-Integrate operation on the same association-set as the first example but without specifying a class. The result becomes two patterns, which are apart and are exactly the same as they appear in the original database, whereas the same primitive patterns appear more than once in the result of the first example.
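The two modes of A-Integrate can be sketched in Python under the simplifying assumption that a pattern is just the set of objects it contains and that an object's class is the first letter of its name. With classes specified, the sketch groups patterns that hold exactly the same objects of those classes; with no class specified, it merges patterns that share any object, so the results are apart from each other. The function names and the data are invented for this example.

```python
def class_of(obj):
    """Hypothetical naming convention: object 'a1' belongs to class 'A'."""
    return obj[0].upper()

def a_integrate(alpha, classes=None):
    if classes:
        # Group patterns holding the same objects of the specified classes;
        # patterns without such objects are retained as they are.
        groups, result = {}, set()
        for p in alpha:
            key = frozenset(o for o in p if class_of(o) in classes)
            if key:
                groups.setdefault(key, set()).update(p)
            else:
                result.add(p)
        result.update(frozenset(g) for g in groups.values())
        return result
    # No class specified: merge every pattern with all groups it touches.
    merged = []
    for p in alpha:
        current, rest = set(p), []
        for g in merged:
            if g & current:
                current |= g
            else:
                rest.append(g)
        merged = rest + [current]
    return {frozenset(g) for g in merged}

alpha = {frozenset({"a1", "b1"}), frozenset({"a1", "b2"}),
         frozenset({"b2", "c1"}), frozenset({"a4", "b3"}),
         frozenset({"c4", "d4"})}

by_class_a = a_integrate(alpha, {"A"})
no_class = a_integrate(alpha)
print(len(by_class_a), len(no_class))
```

Here integration over class A leaves b2 in two different result patterns, while the class-free integration yields disjoint patterns.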


(8) A-Union (+):

Similar to the UNION operation of the relational algebra, A-Union combines

two association-sets into one. However, these two association-sets can contain

heterogeneous association structures. It is important for A-algebra to be able to

operate on heterogeneous structures because some prior operations may produce

heterogeneous association-sets and may need to be further processed over the

objects of a common class against other patterns of associations. Unlike the relational algebra and other O-O query languages, union-compatibility is not a restriction in A-algebra. For this reason, A-algebra has more expressive power. Any query that can be expressed by a single expression in other languages can be expressed as a single A-algebra expression, but not vice versa. The A-Union operation is defined as follows:

$$\alpha + \beta = \{\, \gamma \mid \gamma \in \alpha \vee \gamma \in \beta \,\}$$

The A-Union operator is commutative, associative, and idempotent:

$$\alpha + \beta = \beta + \alpha \qquad \text{(commutativity)}$$
$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \qquad \text{(associativity)}$$
$$\alpha + \alpha = \alpha \qquad \text{(idempotency)}$$









(9) A-Difference (-):

The A-Difference implements the same concept as the DIFFERENCE operator in relational algebra but with two differences. First, its operands do not have to be union-compatible. Second, a pattern in the minuend is retained if it does not contain any of the patterns in the subtrahend.

$$\alpha - \beta = \{\, \gamma \mid \gamma = \alpha^i : \neg\exists(\beta^j)(\beta^j \subseteq \alpha^i) \,\}$$

The example depicted in Figure 4.5g shows that $\alpha^1$ and $\alpha^3$ are dropped since they both contain a pattern of $\beta$.
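The containment-based semantics of A-Difference, together with the heterogeneous A-Union, can be sketched in Python by modelling a pattern as the set of its primitive patterns, each written as a string such as "a1b1". This representation, the helper `P`, and the data are assumptions made for illustration only.

```python
def P(*primitives):
    """A pattern modelled as a frozenset of primitive patterns (hypothetical encoding)."""
    return frozenset(primitives)

def a_union(alpha, beta):
    """Lump two association-sets together; no union-compatibility is required."""
    return alpha | beta

def a_difference(alpha, beta):
    """Retain a pattern of alpha unless it contains some pattern of beta."""
    return {p for p in alpha if not any(q <= p for q in beta)}

# Invented operands with patterns of different shapes.
alpha = {P("a1b1", "b1c1"), P("a3b2"), P("a1b1", "b1c2")}
beta = {P("a1b1"), P("a3b3")}

print(sorted(sorted(p) for p in a_difference(alpha, beta)))
print(len(a_union(alpha, beta)))
```

In this example both patterns of `alpha` that contain the primitive pattern a1b1 are dropped, and the union simply holds all five patterns even though their structures differ.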



(10) A-Divide ($\div$):

The A-Divide operator implements the concept that a group of patterns with certain common features contains another set of patterns.

$$\alpha \div\{W\}\; \beta = \{\, \gamma \mid \gamma \in \alpha_s : \forall(\beta^k)(\beta^k \subseteq \alpha_s) \,\}$$

where $\alpha_s$ is a subset of the patterns of $\alpha$ which have common Inner-patterns for all classes of {W} and which together contain all patterns of $\beta$. If {W} is not specified, the A-Divide operation retains all the patterns of $\alpha$ if each of them contains at least one pattern of $\beta$ and they together contain all patterns of $\beta$.

Figure 4.5h shows an example of $\alpha$ being divided by $\beta$ with respect to class B. The A-Divide operation retains $\alpha^1$, $\alpha^2$, and $\alpha^3$ since they all contain Inner-pattern (b1) of B and together contain all patterns of $\beta$.
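A rough Python sketch of A-Divide with a specified class set is given below. It assumes that a pattern is the set of objects it contains, that an object's class is the first letter of its name, and that a group consists of all patterns sharing the same objects of the specified classes. A group is retained when every pattern of the divisor is contained in at least one pattern of the group. The names and data are invented, and the case without a class set is not covered.

```python
def class_of(obj):
    """Hypothetical naming convention: object 'b1' belongs to class 'B'."""
    return obj[0].upper()

def a_divide(alpha, beta, classes):
    # Group the dividend's patterns by their objects of the specified classes.
    groups = {}
    for p in alpha:
        key = frozenset(o for o in p if class_of(o) in classes)
        if key:
            groups.setdefault(key, []).append(p)
    # Keep a group only if its patterns together contain every divisor pattern.
    result = set()
    for group in groups.values():
        if all(any(q <= p for p in group) for q in beta):
            result.update(group)
    return result

alpha = {frozenset({"a1", "b1", "c1"}), frozenset({"b1", "c2", "d1"}),
         frozenset({"b1", "c4", "d4"}), frozenset({"b3", "c4"}),
         frozenset({"b2", "c3"})}
beta = {frozenset({"c1"}), frozenset({"c2", "d1"}), frozenset({"c4", "d4"})}

quotient = a_divide(alpha, beta, {"B"})
print(sorted(sorted(p) for p in quotient))
```

Only the three patterns that share b1 survive, because together they contain all three patterns of the divisor.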









4.3.3 Precedence


The precedence relationships of the above operators are as follows. Unary

operators have higher precedence than binary operators. The precedence of the

seven binary association operators is given in the following order: *, |, ,, ,

and +. Parentheses can be used to alter the precedence relationships.



4.3.4 Summary of operators


(1) Associate (*): Two patterns are concatenated via an Inter-pattern.

(2) A-Complement (|): Two patterns are concatenated via a Complement-pattern.

(3) A-Select ($\sigma$): A pattern is retained if it satisfies the predicate.

(4) A-Project ($\Pi$): A subpattern is projected from the original pattern.

(5) NonAssociate (!): Two patterns are concatenated via a Complement-pattern
only if each of them cannot be concatenated with any pattern of the other
operand via an Inter-pattern.

(6) A-Intersect ($\bullet$): Two patterns are combined into a single pattern if their common
classes have common objects.

(7) A-Integrate (f): Patterns in an association-set are combined if objects of a
specified class in a pattern are common to these patterns.

(8) A-Union (+): Two association-sets are lumped into a single set.

(9) A-Difference (-): A pattern in the minuend is retained if it does not contain
any pattern in the subtrahend.

(10) A-Divide ($\div$): A subset of patterns in the dividend that have certain common
features and contain all the patterns in the divisor is retained.










4.4 Query Examples

We have formally defined ten association operators and given their simple mathematical properties. Before exploring other properties, we give some examples to illustrate how these operators can be used to formulate queries for processing an O-O database. There can be many alternative expressions for the same

query. Choosing the best one for execution is the task of a query optimizer. The

mathematical properties of these operators can be used for that purpose.

In the following formulation of algebraic expressions, we assume that the user

is using the algebra directly instead of a high-level query language. In the latter

case, the task of generating algebraic expressions would belong to the translator.

To formulate an A-algebra expression for a query, first, we need to construct

an intensional pattern for it by navigating the schema graph of the database as

illustrated in Chapter 3. Then, each edge of the pattern is marked with an operator *, |, or ! according to the intended semantics. For simple patterns, the formulation is straightforward. For patterns with complex structures, we may have to decompose them into patterns with simpler structures. The expression for the original pattern is the A-Intersect of the expressions for the decomposed patterns.

First, we formulate expressions for Query 1 to Query 4 given in Chapter 3.

We have identified the intensional patterns for these queries (see Figure 3.3).


Query 1: For all sections, get the majors of students who are taking these
sections.

It is trivial to write an algebraic expression for Query 1, which is represented by a linear pattern. For this pattern, both edges are marked with *, and the algebraic expression can be formulated as follows:

$$\text{A-Integrate}\{Section\}\,(\Pi(Section * Student * Department)[Section, Department;\ Section{:}Department])$$

where the A-Integrate operation groups the resultant patterns by Sections.


Query 2: List students who major and minor in the same department.

For Query 2, the edges of the intensional pattern shown in Figure 3.3c are all

marked with *. Since this loop structure can be viewed as the A-Intersect of two

linear patterns involving both Student and Department, we have

$$\Pi(Student * Undergrad * Department \bullet Student * Department)[Student]$$

where the A-Project operation gets the student objects that satisfy the association

pattern as required by the query.


Query 3: For those students taking section 300 and having majors and/or
minors, get their majors and/or minors.

The expression for the intensional pattern of Query 3 is as follows:

$$Section\# * Section * (Student * Department + Student * Undergrad * Department_1)$$

where the A-Union operator is used to realize the OR condition at the class Student. As long as a student has a major or a minor, the linear pattern from Student to Department and the linear pattern from Student to Undergrad and to Department should be retained. In the expression, Department_1 is an alias of Department, which is used to distinguish major and minor departments. Since the query asks for the majors and minors of students who are taking section 300, the A-Select and A-Project operations are used. Thus, we have










$$\text{A-Integrate}\{Student\}\,(\Pi(\sigma(\alpha)[Section\# = 300])[Student, Department, Department_1;\ Student{:}Department, Student{:}Department_1])$$

where $\alpha$ is the intensional pattern given above. As shown in Figure 3.3g, the result of this expression contains the derived patterns that are specified by the clause after the semicolon in the projection operation, and it is reorganized by an A-Integrate operation. Note that Query 3 cannot be phrased in a single relational algebra expression since (a) the union operation in relational algebra

requires operands to be union-compatible, (b) using a join operation on Student

can cause a loss of information because not every student has both major and

minor, (c) the cartesian-product of the majors and minors will produce erroneous

results, and (d) no other operation in the relational algebra can combine two rela-

tions into one.


Query 4: For each teacher, list the sections which he/she does not teach.


The algebraic expression for Query 4 can be easily formulated as follows, since it is represented by a linear pattern shown in Figure 3.3h. We note that the A-Complement operator |, rather than the NonAssociate operator !, should be used for this query, since a teacher may be teaching some courses.

$$Teacher \mid Section$$
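The reason for preferring A-Complement over NonAssociate in Query 4 can be made concrete with a small Python sketch. The teacher names, the section names, and the `teaches` relation below are invented; the two set comprehensions only mimic the two operators on single-object patterns.

```python
# Hypothetical extension of the Teacher-Section association.
teachers = {"Smith", "Jones"}
sections = {"s1", "s2", "s3"}
teaches = {("Smith", "s1"), ("Smith", "s2")}

# A-Complement: every (teacher, section) pair that is not associated.
not_taught = {(t, s) for t in teachers for s in sections
              if (t, s) not in teaches}

# NonAssociate: only teachers who teach nothing, paired with
# sections that nobody teaches.
idle_teachers = teachers - {t for t, _ in teaches}
open_sections = sections - {s for _, s in teaches}
non_associated = {(t, s) for t in idle_teachers for s in open_sections}

print(sorted(not_taught))
print(sorted(non_associated))
```

The complement keeps the pair (Smith, s3) even though Smith teaches other sections, which is what the query asks for. The NonAssociate result loses every pair involving Smith.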

Several other query examples are given below. They use the schema graph

given in Figure 3.1. Their corresponding intensional patterns are depicted in Fig-

ure 4.6.










Query 5: List the names of students who teach in the same departments
as their major departments.

We can see from Figure 4.6 that the intensional pattern for this query can be constructed in two ways. One way is to decompose it into three linear patterns:

Name-Person-Student, Student-Department, and
Student-Grad-TA-Teacher-Department

The A-Intersect of these three patterns will produce a pattern that satisfies this query.

$$\Pi(Student * Person * Name \bullet Student * Department \bullet Student * Grad * TA * Teacher * Department)[Name]$$

where the first A-Intersect operation operates over Student and the second operates over Student and Department. The A-Project operation projects the names of these students.

Another way is to decompose the intensional pattern into two linear patterns:

Name-Person-Student-Department and
Student-Grad-TA-Teacher-Department

Therefore, we have an alternative expression

$$\Pi(Name * Person * Student * Department \bullet Student * Grad * TA * Teacher * Department)[Name]$$



Query 6: List the section# of those sections which have not been assigned
a room or have not been assigned a teacher.

Since the query requests sections that have not been assigned a room or a

teacher, these sections must not be connected with any room or any teacher (i.e.,










a section which does not associate with any room and teacher should also be

retained in the result). Therefore, there should be Complement-patterns between

Section and Teacher and between Section and Room, and a single arc between

these two branches as shown in Figure 4.6. We emphasize that the ! operation, instead of |, should be used to construct these two Complement-patterns. Then the algebraic expression for this query can be easily formulated as follows:

$$\Pi(Section\# * (Section \;!\; Room + Section \;!\; Teacher))[Section\#]$$


Query 7: List the names of students who take courses 6010 and 6020.

We shall show three ways of formulating an expression for this query. First, the intensional pattern for Query 7 shown in Figure 4.6 can be constructed by the A-Intersect of two linear patterns as we did for Query 5:

$$\Pi(\sigma(Name * Person * Student * Enrollment * Course * Course\#)[Course\# = 6010] \bullet \sigma(Student * Enrollment_1 * Course_1 * Course\#_1)[Course\# = 6020])[Name]$$

where Enrollment_1, Course_1, and Course#_1 are the aliases of the classes Enrollment, Course, and Course#, respectively. This ensures that the A-Intersect operation will be performed only over the Student class.

A second way is to view the original pattern as a linear pattern without restriction on Course# as follows:

Name-Person-Student-Enrollment-Course-Course#

Students who are taking both courses must participate in at least two such patterns, with Course# = 6010 and Course# = 6020, respectively. This implies an A-Divide operation. Thus, the query can be formulated as follows:










$$\Pi(Name * Person * Student * Enrollment * Course * Course\# \;\div\{Student\}\; \sigma(Course.Course\#)[Course\# = 6010 \vee Course\# = 6020])[Name]$$

where a dot in Course.Course# is used only for identifying the Course# class which is defined in the Course class. It does not represent a function or a method as in other languages. This expression can also be rewritten as follows:

$$\Pi(Name * Person * \Pi(Student * Enrollment * Course * Course\# \;\div\{Student\}\; \sigma(Course.Course\#)[Course\# = 6010 \vee Course\# = 6020])[Student])[Name]$$

which is more suitable for execution than the first since the inner A-Project gets the student objects who are taking these two courses, so that all other data associated with these students, such as Enrollment, Course, and Course#, do not have to be carried along in further processing to get the names of these students.

Details of optimization issues will be addressed in the next chapter.
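The equivalence of the intersect-based and divide-based formulations of Query 7 can be illustrated loosely in Python. The sketch flattens the Name-Person-Student-Enrollment-Course-Course# pattern into (student name, course number) pairs; the names and enrollments are invented, and the set operations only approximate the corresponding A-algebra operators.

```python
# Hypothetical enrollments as (student name, course number) pairs.
enrolled = {("Ann", 6010), ("Ann", 6020), ("Bob", 6010),
            ("Cy", 6020), ("Cy", 6030)}
required = {6010, 6020}

# Formulation 1: select each course separately, then intersect over Student.
takes_6010 = {s for s, c in enrolled if c == 6010}
takes_6020 = {s for s, c in enrolled if c == 6020}
by_intersect = takes_6010 & takes_6020

# Formulation 2: group the patterns by Student and keep the groups that
# together contain every required course (the idea behind A-Divide).
courses_of = {}
for s, c in enrolled:
    courses_of.setdefault(s, set()).add(c)
by_divide = {s for s, cs in courses_of.items() if required <= cs}

print(by_intersect, by_divide)
```

Both formulations select the same student here. The divide-based form generalizes more easily, since adding a third required course only changes the divisor.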

We stress that the above association pattern expressions represent the inter-

nal algebraic operations that need to be performed if the dynamic inheritance

method is used. The high-level query statements corresponding to these algebraic

expressions issued by the user can be much simpler due to the inheritance of attri-

butes in the generalization hierarchy or lattice.

















[Figure 4.1 Regular-edges and Complement-edges in an OG (graph over the classes Section, Student, and Course)]











[Figure 4.2 Examples of association patterns: (a) primitive association patterns, given in graphical and algebraic representation (Inner-pattern, Inter-pattern, Complement-pattern, D-Inter-pattern, and D-Complement-pattern); (b) complex association patterns]

















[Figure 4.3 Examples of association-sets]




















[Figure 4.4 A sample database association graph over classes A, B, C, and D (The Complement-patterns are not shown)]
























[Figure 4.5 Example of operations, each drawn over the sample database (The Complement-patterns are not shown):
(a) an Associate operation
(b) an A-Complement operation
(c) an A-Project operation, with the clause [(A*B, D);(B:D)]
(d) a NonAssociate operation, ![R(B,C)]
(e) an A-Intersect operation over classes B and C
(f) A-Integrate operations, over {A} and with no class specified
(g) an A-Difference operation
(h) an A-Divide operation]













[Figure 4.6 Intensional patterns of Query 5, 6, and 7]














CHAPTER 5
MATHEMATICAL PROPERTIES OF OPERATORS
AND THEIR APPLICATIONS
IN QUERY OPTIMIZATION AND QUERY DECOMPOSITION


In Section 4.3, we have shown some mathematical properties of individual operators. In this chapter, we shall study their properties systematically. The properties of A-algebra are classified into six categories: (1) conventional algebraic properties such as commutativity, associativity, idempotency, nilpotency, and distributivity; (2) nesting of two unary operations; (3) a binary operation nested in a unary operation; (4) cascading of two different binary operations; (5) general identities; and (6) operation transformation. The properties presented in this dissertation are quite exhaustive, but may not be complete. These properties provide the mathematical foundation for query decomposition and query optimization. Their utility in these two applications is also illustrated in this chapter. The proofs of properties that are marked with $\dagger$ can be found in the Appendix. Others can be proved similarly.



5.1 Conventional Algebraic Properties


To be systematic, first we list the properties given in Section 4.3 without

explanation, since they have been illustrated previously. Then, we give the pro-

perties of distributivity.









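A. Commutativity

$$\alpha *[R(A,B)]\; \beta = \beta *[R(B,A)]\; \alpha \qquad (5.1\,\dagger)$$

$$\alpha \;|[R(A,B)]\; \beta = \beta \;|[R(B,A)]\; \alpha \qquad (5.2\,\dagger)$$

$$\alpha \;![R(A,B)]\; \beta = \beta \;![R(B,A)]\; \alpha \qquad (5.3\,\dagger)$$

$$\alpha \bullet\{W\}\; \beta = \beta \bullet\{W\}\; \alpha \qquad (5.4\,\dagger)$$

$$\alpha + \beta = \beta + \alpha \qquad (5.5\,\dagger)$$

B. Associativity

$$(\alpha_{\{X\}} *[R(A,B)]\; \beta_{\{Y\}}) *[R(C,D)]\; \gamma_{\{Z\}} = \alpha_{\{X\}} *[R(A,B)]\; (\beta_{\{Y\}} *[R(C,D)]\; \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \qquad (5.6\,\dagger)$$

$$(\alpha_{\{X\}} \;|[R(A,B)]\; \beta_{\{Y\}}) \;|[R(C,D)]\; \gamma_{\{Z\}} = \alpha_{\{X\}} \;|[R(A,B)]\; (\beta_{\{Y\}} \;|[R(C,D)]\; \gamma_{\{Z\}}) \quad (C \notin \{X\} \wedge B \notin \{Z\}) \qquad (5.7\,\dagger)$$

$$(\alpha_{\{X\}} \bullet\{W_1\}\; \beta_{\{Y\}}) \bullet\{W_2\}\; \gamma_{\{Z\}} = \alpha_{\{X\}} \bullet\{W_1\}\; (\beta_{\{Y\}} \bullet\{W_2\}\; \gamma_{\{Z\}})$$
$$((\{W_1\} - \{W_2\}) \cap \{Z\} = \emptyset \;\wedge\; (\{W_2\} - \{W_1\}) \cap \{X\} = \emptyset) \qquad (5.8\,\dagger)$$

$$(\alpha + \beta) + \gamma = \alpha + (\beta + \gamma) \qquad (5.9\,\dagger)$$

C. Idempotency and Nilpotency

$$\alpha \bullet \alpha = \alpha \quad \text{(if } \alpha \text{ is a homogeneous association-set)} \qquad (5.10)$$

$$\alpha + \alpha = \alpha \qquad (5.11)$$

$$A *[R(A,A)]\; A = A \qquad (5.12)$$

$$A \;![R(A,A)]\; A = \emptyset \qquad (5.13)$$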